
TTIC 31020: Introduction to Statistical Machine Learning
Autumn 2017
Problem Set #4
Out: November 28, 2017
Due: [Optional] Sunday November 10, 11:59pm
Instructions
How and what to submit? Please submit your solutions electronically via Canvas.
Please submit the empirical component of the solution (Python code and the documentation of the experiments you are asked to run, including figures) in a Jupyter notebook file.
Name the notebook
<firstname>-<lastname>-sol5.ipynb.
Late submissions: No submissions will be accepted past the deadline. Note
that this is different from previous problem sets.
What is the required level of detail? When asked to derive something, please clearly
state the assumptions, if any, and strive for balance: justify any non-obvious steps, but try to
avoid superfluous explanations. When asked to plot something, please include in the .ipynb
file the figure as well as the code used to plot it. If multiple entities appear on a plot, make
sure that they are clearly distinguishable (by color or style of lines and markers). When
asked to provide a brief explanation or description, try to make your answers concise, but
do not omit anything you believe is important. If there is a mathematical answer, provide
it precisely (and accompany it with only succinct words, if appropriate).
When submitting code, please make sure it’s reasonably documented, and describe suc-
cinctly in the written component of the solution what is done where.
Collaboration policy: collaboration is allowed and encouraged, as long as you (1) write
your own solution entirely on your own, and (2) specify the names of the student(s) you
collaborated with in the notebook.
1 Neural networks
This problem set will deal with neural networks. We will complete an implementation of
neural networks, including training by back-propagation, and apply it to two data sets we
have worked with in the course: the Fashion MNIST and the consumer reviews (CR) data
sets. You can use the train/validation/test partitions for those data sets from Problem
Sets 4 and 3, respectively.
The code in the notebook has a few missing pieces. Otherwise, it implements all you
need for a basic multi-layer neural network. The network can be constructed as a sequence
of modules (layers), and a module has two essential functionalities:
forward computation (converting input into output);
backward computation (converting the error signal from the output into the error signal
for the input, as well as computing the gradient of the error with respect to the model
parameters, if any).
The additional files include implementations of general optimization tools (utils.py) and
some tools specific to the data in the two domains (CRutils.py, FMNIST_utils.py).
The main type of layer we will work with is the "fully connected layer", which is the
basic layer we covered in class; given the output z_{t-1} of the previous layer, it computes its
output as

z_t = h(W_t z_{t-1} + b_t)

where W_t is the matrix of weights, b_t is the vector of biases, and h is the activation function.
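A fully connected layer implementing this rule, together with its backward pass, might look like the following sketch (treating the nonlinearity h as a separate layer; all names and shapes are illustrative assumptions, and the notebook's actual signatures may differ):

```python
import numpy as np

class FullyConnected:
    """Sketch of a fully connected layer: z_t = W_t z_{t-1} + b_t.
    Assumes inputs of shape (batch, in_dim)."""
    def __init__(self, in_dim, out_dim, rng=None):
        rng = rng or np.random.default_rng(0)
        # Scaled random initialization (a common choice, not prescribed here).
        self.W = rng.standard_normal((in_dim, out_dim)) * np.sqrt(2.0 / in_dim)
        self.b = np.zeros(out_dim)
    def forward(self, x):
        self.x = x                      # cache the input for the backward pass
        return x @ self.W + self.b
    def backward(self, grad_out):
        # Gradients of the loss w.r.t. the parameters
        self.dW = self.x.T @ grad_out
        self.db = grad_out.sum(axis=0)
        # Error signal for the input (the previous layer's output)
        return grad_out @ self.W.T
```
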
Problem 1 [40 points]
Complete the missing code (marked with YOUR CODE HERE in the notebook) for the following:
the forward/backward passes in the fully connected layer implementation;
the forward/backward passes in the ReLU layer;
the backward pass in the SoftMax layer.
Note that the SoftMax layer is technically not a layer but a different kind of object called a
"criterion"; this is common jargon in neural network implementations for the loss "layer"
(the criterion to be optimized). It has forward/backward functionality like the network
layers, but it interacts with an external source of information: the true label values.
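A minimal sketch of such a criterion, combining softmax with the cross-entropy loss, is shown below; the class name and interface are assumptions, and the notebook's version may look different:

```python
import numpy as np

class SoftmaxCrossEntropy:
    """Sketch of a softmax + cross-entropy criterion. Unlike a layer,
    its forward pass also consumes the true labels."""
    def forward(self, scores, labels):
        # Numerically stable softmax: subtract the row max before exponentiating.
        shifted = scores - scores.max(axis=1, keepdims=True)
        exp = np.exp(shifted)
        self.probs = exp / exp.sum(axis=1, keepdims=True)
        self.labels = labels
        n = scores.shape[0]
        # Mean negative log-likelihood of the true classes.
        return -np.log(self.probs[np.arange(n), labels]).mean()
    def backward(self):
        # Gradient of the mean cross-entropy w.r.t. the input scores:
        # softmax(scores) - one_hot(labels), divided by the batch size.
        n = self.probs.shape[0]
        grad = self.probs.copy()
        grad[np.arange(n), self.labels] -= 1.0
        return grad / n
```
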
Also, note that the ReLU layer does not have trainable parameters (weights or biases);
nonetheless, it does perform computation in both the forward and the backward pass.
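As an illustration, a parameter-free ReLU layer can be sketched like this (names are assumptions, not the notebook's code):

```python
import numpy as np

class ReLU:
    """Sketch of a ReLU layer: no trainable parameters, but it still
    transforms both the forward values and the backward error signal."""
    def forward(self, x):
        self.mask = x > 0          # remember where the input was positive
        return x * self.mask
    def backward(self, grad_out):
        # The gradient passes through only where the input was positive.
        return grad_out * self.mask
```
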
Use the provided gradient check tools to test your implementation (the notebook has the
relevant calls; you just need to make sure the results look good).
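Gradient checks of this kind typically compare the analytic gradient against central finite differences. A minimal sketch of the idea follows (the provided tools' actual interface may differ):

```python
import numpy as np

def numerical_grad(f, x, eps=1e-6):
    """Estimate the gradient of a scalar-valued function f at x
    by central finite differences: (f(x + eps) - f(x - eps)) / (2 eps)
    applied coordinate by coordinate."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    for _ in it:
        idx = it.multi_index
        orig = x[idx]
        x[idx] = orig + eps
        f_plus = f(x)
        x[idx] = orig - eps
        f_minus = f(x)
        x[idx] = orig               # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad
```

If the analytic gradient from your backward pass matches this estimate to within a small tolerance, the implementation is likely correct.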
End of problem 1
Problem 2 [30 points]
Train a multi-layer network on the Fashion MNIST data, and use the best model you can
find to submit predictions on the test set to Kaggle:
https://www.kaggle.com/c/ttic-31020-hw5-fashion-mnist/overview
End of problem 2
Problem 3 [30 points]
Train a multi-layer network on the CR data, and use the best model you can find to submit
predictions on the test set to Kaggle:
https://www.kaggle.com/c/ttic-31020-hw5-cr/overview
End of problem 3
In the last two problems, you are free to experiment with architectures (number of hidden
layers, size of hidden layers), activation functions (ReLU, sigmoid, etc.) as well as with opti-
mization techniques (different update rules, dropout, weight decay, learning rate schedules).
However, please do not introduce new layer/connection types (no convolutional layers, for
instance). Many of the relevant tools (e.g., dropout) are included in the code; others (e.g.,
tanh activations) you may need to code up yourselves.
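For example, a tanh activation layer could be sketched as follows; this is a hypothetical implementation in the same style as the ReLU layer, not code from the assignment:

```python
import numpy as np

class Tanh:
    """Sketch of a tanh activation layer. No trainable parameters;
    uses the identity d/dx tanh(x) = 1 - tanh(x)^2."""
    def forward(self, x):
        self.out = np.tanh(x)      # cache the output for the backward pass
        return self.out
    def backward(self, grad_out):
        return grad_out * (1.0 - self.out ** 2)
```
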
