STAT3017/7017 Homework 2
Due by Monday 27 August 2018 10:00
In this homework, we are going to consider the recent paper by researchers at Google
titled Nonlinear Random Matrix Theory for Deep Learning (NIPS 2017). When people talk
about AI, they usually mean 'Deep Learning': the idea that you can obtain amazing
results by using a cascade of neural networks. Unfortunately, although the results are good,
there is little mathematical theory to explain why it works so well. This paper suggests that
maybe we can use Random Matrix Theory to achieve this goal.
Layers in neural networks roughly have the form $Y = f(WX)$, where $X$ is a data matrix,
$W = [W_{ij}]$ is a matrix of weights, and $f : \mathbb{R} \to \mathbb{R}$ is a nonlinear activation function (e.g.,
$f(x) := \max(0, x)$) that is applied elementwise to the matrix $WX$; see Figure 2 in [1]. Fitting a
(one-layer) Deep Learning model involves finding the weight matrix $W$ given some training
data $(Y, X)$ (e.g., labels $Y$ for inputs $X$). A multi-layer model is a cascade that takes the form
of (15) in [2].
In [2], they consider the following stylised setup: $X = [X_{ij}]$ is an $n_0 \times m$ random data
matrix with i.i.d. elements $X_{ij} \sim N(0, \sigma_x^2)$ and $W = [W_{ij}]$ is a random weight matrix with i.i.d.
elements $W_{ij} \sim N(0, \sigma_w^2/n_0)$. They set $\phi = n_0/m$ and $\psi = n_0/n_1$ and consider the regime
where $n_0, n_1, m \to \infty$ while $\phi$ and $\psi$ remain constant. The function $f : \mathbb{R} \to \mathbb{R}$ must satisfy
$$\mathbb{E}[f(\sigma_w \sigma_x Z)] = 0 \quad\text{and}\quad \mathbb{E}[|f(\sigma_w \sigma_x Z)|^k] < \infty \text{ for all } k \ge 1,$$
for a random variable $Z \sim N(0,1)$. Let $Y = f(WX)$, where $f$ is applied elementwise to the
matrix $WX$; they then proceed to study the $n_1 \times n_1$ matrix $M := \frac{1}{m} Y Y^T$.
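For concreteness, this setup is easy to simulate. Below is a minimal R sketch; the values $n_0 = 400$, $\phi = \psi = 1/2$, $\sigma_w = \sigma_x = 1$, and the centred ReLU $f(x) = \max(0,x) - 1/\sqrt{2\pi}$ are illustrative choices of mine, not values prescribed by the assignment.

# Simulate the stylised one-layer setup of [2] (illustrative parameters)
set.seed(1)
n0 <- 400                          # input dimension
phi <- 0.5; psi <- 0.5             # phi = n0/m, psi = n0/n1
m  <- n0 / phi                     # number of data points
n1 <- n0 / psi                     # number of hidden units
sigma_x <- 1; sigma_w <- 1

f <- function(x) pmax(x, 0) - 1/sqrt(2*pi)   # centred ReLU, so E[f(Z)] = 0

X <- matrix(rnorm(n0 * m, sd = sigma_x), n0, m)
W <- matrix(rnorm(n1 * n0, sd = sigma_w / sqrt(n0)), n1, n0)
Y <- f(W %*% X)                    # f applied elementwise
M <- Y %*% t(Y) / m                # the n1 x n1 matrix of interest
lambda <- eigen(M, symmetric = TRUE, only.values = TRUE)$values
hist(lambda, breaks = 60, freq = FALSE, main = "Empirical spectral density of M")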
Question 1 [5 Points]
The empirical spectral density of $M$ is given by
$$\rho_M(t) = \frac{1}{n_1} \sum_{j=1}^{n_1} \delta_{\lambda_j}(t)$$
where $\lambda_1, \ldots, \lambda_{n_1}$ are the $n_1$ eigenvalues of $M$. Show that the Stieltjes transform $G$ of
$\rho_M$ is given by
$$G(z) = \frac{1}{n_1} \mathbb{E}\left[\mathrm{tr}\,(M - z I_{n_1})^{-1}\right],$$
where $I_{n_1}$ is the $n_1 \times n_1$ identity matrix and $\mathrm{tr}$ is the trace operator.
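This identity can be sanity-checked numerically: for a single realisation, $\frac{1}{n_1}\mathrm{tr}\,(M - zI_{n_1})^{-1}$ must equal $\frac{1}{n_1}\sum_{j}(\lambda_j - z)^{-1}$ exactly. A quick check in R, reusing M, n1, and lambda from the sketch above (the test point $z = -1$ is an arbitrary choice off the spectrum):

z <- -1                                          # any z outside the spectrum works
lhs <- sum(diag(solve(M - z * diag(n1)))) / n1   # resolvent trace
rhs <- mean(1 / (lambda - z))                    # eigenvalue form
all.equal(lhs, rhs)                              # TRUE up to rounding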
Question 2 [5 Points]
Let the nonlinear activation function $f_\alpha : \mathbb{R} \to \mathbb{R}$ be given by
$$f_\alpha(x) = \frac{[x]_+ + \alpha[-x]_+ - \frac{1+\alpha}{\sqrt{2\pi}}}{\sqrt{\frac{1}{2}(1+\alpha^2) - \frac{1}{2\pi}(1+\alpha)^2}}$$
where $[x]_+$ is the positive part of $x$. Show that
$$\mathbb{E}[f_\alpha(\sigma_w \sigma_x Z)] = 0 \quad\text{and}\quad \zeta := \mathbb{E}[\sigma_w \sigma_x f_\alpha'(\sigma_w \sigma_x Z)]^2 = \frac{(1-\alpha)^2}{2(1+\alpha^2) - \frac{2}{\pi}(1+\alpha)^2}$$
for $Z \sim N(0,1)$. In particular, show that when $\alpha = 1$ we can write $f_\alpha$ as a simpler (well-known)
function and that we also get $\zeta = 0$.
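Before attempting the derivation, both claims can be checked by Monte Carlo. In the sketch below, $\alpha = 0.3$ and $\sigma_w = \sigma_x = 1$ are arbitrary illustrative choices, and the derivative $f_\alpha'$ is coded analytically:

f_alpha <- function(x, a) {
  num <- pmax(x, 0) + a * pmax(-x, 0) - (1 + a) / sqrt(2*pi)
  num / sqrt((1 + a^2)/2 - (1 + a)^2/(2*pi))
}
df_alpha <- function(x, a) {   # f_alpha'(x) = 1 for x > 0, -a for x < 0 (up to scaling)
  ifelse(x > 0, 1, -a) / sqrt((1 + a^2)/2 - (1 + a)^2/(2*pi))
}
a <- 0.3
z <- rnorm(1e6)
mean(f_alpha(z, a))                              # should be close to 0
mean(df_alpha(z, a))^2                           # Monte Carlo estimate of zeta
(1 - a)^2 / (2*(1 + a^2) - (2/pi)*(1 + a)^2)     # claimed closed form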
Question 3 [5 Points]
Consider the case $\zeta = 0$ (i.e., see Section 3.2.2 in [2]). Show that the Stieltjes transform $G_{MP}$
of the Marchenko-Pastur distribution with shape $y = \phi/\psi$ satisfies the equation
$$z\,G_{MP}(z)^2 + \left( \frac{\psi}{\phi}(z - 1) + 1 \right) G_{MP}(z) + \frac{\psi}{\phi} = 0.$$
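This equation can be checked numerically (a check, not a proof): estimate $G_{MP}$ from the eigenvalues of a plain Wishart matrix with aspect ratio $n_1/m = \phi/\psi$ and substitute it into the quadratic. In the R sketch below, $n_1 = 800$, $m = 1600$, and the test point $z = 1 + i$ are illustrative choices of mine:

n1 <- 800; m <- 1600                        # shape y = n1/m = phi/psi = 1/2
A <- matrix(rnorm(n1 * m), n1, m)
lam <- eigen(A %*% t(A) / m, symmetric = TRUE, only.values = TRUE)$values
z <- complex(real = 1, imaginary = 1)       # test point off the real axis
G <- mean(1 / (lam - z))                    # empirical Stieltjes transform (cf. Question 1)
r <- m / n1                                 # psi/phi = 1/y
z * G^2 + (r * (z - 1) + 1) * G + r         # should be close to 0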
Question 4 [5 Points]
Reproduce the numerical experiment of Figure 1 in [2], but only for the case $\psi = 1$
and with $L = 1$, $L = 5$, and $L = 10$ layers.
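A minimal sketch of this multilayer experiment, assuming equal layer widths (so that $\psi = 1$ and the $L$ compositions are dimensionally compatible) and fresh Gaussian weights at every layer. The width $n = 500$, $m = 1000$, and the centred, variance-normalised ReLU below are stand-ins of mine; consult Figure 1 of [2] for the exact settings to reproduce:

# ESD of M_L = Y_L Y_L^T / m for the cascade Y_L = f(W_L ... f(W_1 X))
set.seed(2)
n <- 500; m <- 1000                  # psi = 1: every layer has width n
f <- function(x) {                   # centred ReLU, normalised so E[f(Z)^2] = 1
  (pmax(x, 0) - 1/sqrt(2*pi)) / sqrt(1/2 - 1/(2*pi))
}
esd_L <- function(L) {
  Y <- matrix(rnorm(n * m), n, m)    # input data X with sigma_x = 1
  for (l in 1:L) {
    W <- matrix(rnorm(n * n, sd = 1/sqrt(n)), n, n)
    Y <- f(W %*% Y)
  }
  eigen(Y %*% t(Y) / m, symmetric = TRUE, only.values = TRUE)$values
}
par(mfrow = c(1, 3))
for (L in c(1, 5, 10)) {
  hist(esd_L(L), breaks = 60, freq = FALSE,
       main = paste0("L = ", L), xlab = "eigenvalue")
}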
Question 5 [5 Points]
Now redo the numerical experiment for the nonlinear activation function $f_\alpha$ with $\alpha$
close to 1 (but less than 1). What is your conclusion about this numerical experiment,
and about the paper overall?
References
[1] LeCun, Bengio, and Hinton (2015). Deep learning. Nature.
[2] Pennington and Worah (2017). Nonlinear random matrix theory for deep learning. NIPS.
This homework is to be submitted through Wattle in digital form only, as per ANU policy. The
R code must be supplied. If you use any references (note: this will never count against you),
please clearly indicate which ones.
Dale Roberts - Australian National University
Last updated: August 18, 2018
