IE531: Algorithms for Data Analytics
Spring, 2018
Homework 1: Review of Linear Algebra, Probability & Statistics
and Computing
Due Date: March 2, 2018
c Prof. R.S. Sreenivas
Instructions
1. You can modify any of the C++ code on Compass to solve these problems, if you
want. It might help you with honing your programming skills. If these attempts
(at using C++ code) is turning out to to be intense, you can use MATLAB just
this once.
2. You will submit a PDF-version of your answers on Compass on-or-before mid-
night of the due date.
Instructions
1. (25 points) Tightness of the Chebyshev Bound: This problem is about discov-
ering distributions where the upper-bounds of the Chebyshev Inequality is tight.
First, you are going to show (by example) that there is a discrete RV where this
bound is tight. Then, you are going to present a cogent argument (no need to be
super formal here!) that there can be no continuous RV where the Chebyshev
Bound it tight.
(a) (5 points) Show that the Chebyshev Bound is tight for the discrete RV X2
f 1;0;1g, where Prob(X = 1) = Prob(X = 1) = 12k2 . That is, compute
EfXg and var(X) and plug it into the Chebyshev Bound and arrive at the
conclusion that Prob(jXj 1) = 1k2
(b) (20 points) Show that there can be no continuous distribution over the
whole real axis where the Chebyshev Bound is tight.
2. (25 points) Unit-Ball in High Dimensions: We will use the ‘4-norm to define
the unit-ball as
B(1;d;4) =f(x1; x2;:::; xd)2Rd jx41 + x42 + + x4d 1g
(a) (12.5 points) Suppose we define
S :=f(x1; x2;:::; xd)2Rd jx41 + x42 + + x4d 12g;
what fraction of the volume of B(1;d;4) does S occupy?
(b) (12.5 points) For any c > 0, prove that the fraction of the volume of
B(1;d;4) outside the slab
jx1 j cd1=4 is at most 1c3 e c4=4:
1
3. (25 points) OverlapofSpheresinHigh-Dimensions: Let x be a random sample
from the (surface of the) unit sphere in d-dimensions with the origin as center.
(a) (5 points) What is the value of Efxg?
(b) (5points) What is component-wise variance of x? That is, for i2f1;2;:::;dg
what is Ef(xi Efxig)2g?
(c) (5 points) Show that for any unit length vector u, the variance of the real-
valued random variable uTx is Pdi=1 u2i Efx2ig. Using this, compute the vari-
ance and standard deviation of uTx.
(d) (5points) Given two unit-radius spheres in d-dimensional space whose cen-
ters are separated by a distance of a, show that the volume of their intersec-
tion is at most
8e a2(d 1)=8
apd 1
times the volume of each sphere.
(e) (5 points) From your solution to problem 3d, present a verbal argument
that supports the conclusion that if the inter-center separation of the two
spheres of radius r (r is not necessarily unity) is (r=pd), then they share
very small mass. From this, make a cogent case for the conclusion that
given randomly generated points from the two distributions, one inside
each sphere, we can tell “which sphere contains which point” (i.e. clas-
sify we have a clustering algorithm that separates randomly generated data
into two spherical-groups)
4. (25points) ACounterpointtotheJohnson-LindenstraussLemma: Prove that
for every fixed dimension reduction matrix A2Rk d with k < d, there is a pair
of vectors x;y 2Rd such that the distances between their images Ax and Ay is
hugely distorted (compared to the distance between x and y).