
Assignment 2 - Foundations of Machine Learning CSCI3151 - Dalhousie University 
 
Q1 (30%) 
Gradient descent - Logistic regression 
In this question we are going to experiment with logistic regression. This exercise focuses on the inner workings of gradient descent using a cross-entropy cost function, as learned in class.
 
a) Using the Pima Indians data set, first separate a random 20% of your data instances for validation. Then apply a feature selection algorithm that evaluates feature importance using the Pearson correlation (see the scipy documentation). Extract the two most important features based on this measure.
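For orientation, a minimal sketch of this step might look as follows (the file name diabetes.csv is an assumption about where a local copy of the data set lives; Outcome is the target column named in part b):

import pandas as pd
from scipy.stats import pearsonr

df = pd.read_csv("diabetes.csv")               # assumed local copy of the Pima Indians data set
val_df = df.sample(frac=0.2, random_state=0)   # random 20% held out for validation
train_df = df.drop(val_df.index)

# Rank features by the absolute Pearson correlation with the target
correlations = {
    col: abs(pearsonr(train_df[col], train_df["Outcome"])[0])
    for col in train_df.columns if col != "Outcome"
}
top_two = sorted(correlations, key=correlations.get, reverse=True)[:2]
print(top_two)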
b) We want to train a logistic regression model to predict the target feature Outcome. It is important that no other external package is used for this question part (pandas and numpy are OK). We want to find the weights for the logistic regression using a hand-made gradient descent algorithm. We will use cross-entropy as the cost function, and the gradient of the logistic cross-entropy to compute the weight update during gradient descent. It is OK to reuse as much as you need from the code you developed for Assignment 1. Unlike Assignment 1, we are now using a random 20% of your data instances for validation.
 
Your function should be able to return the updated weights and bias after every iteration of the 
gradient descent algorithm. 
 
Your function should be defined as follows: 
def LRGradDesc(data, target, weight_init, bias_init, learning_rate, max_iter):
 
 
And it should print lines as indicated below (note the last line with the weights):
Iteration 0: [initial train cost], [train accuracy], [validation accuracy]
Iteration 1: [train cost after first iteration], [train accuracy after first iteration], [validation accuracy after first iteration]
Iteration 2: [weights after second iteration], [train cost after second iteration], [train accuracy after second iteration], [validation accuracy after second iteration]
…
Iteration max_iter: [weights after max_iter iterations], [train cost after max_iter iterations], [train accuracy after max_iter iterations], [validation accuracy after max_iter iterations]
Final weights: [bias], [w_0], [w_1]
 
Note that you may want to print only every 100 or every 1000 iterations if max_iter is a fairly large number (but you should not run more iterations than indicated by max_iter).
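As a rough illustration only (not the required implementation), a hand-made gradient descent for logistic cross-entropy could be organized as below; the validation-accuracy column and the print-every-100-iterations logic are left out for brevity, data is assumed to be a numpy array of the two selected features, and only the final weights and bias are returned:

import numpy as np

def LRGradDesc(data, target, weight_init, bias_init, learning_rate, max_iter):
    # data: (n, 2) array of the two selected features; target: (n,) array of 0/1 Outcome values
    w = np.array(weight_init, dtype=float)
    b = float(bias_init)
    n = len(target)
    for it in range(max_iter + 1):
        p = 1.0 / (1.0 + np.exp(-(data @ w + b)))          # logistic (sigmoid) output
        eps = 1e-12                                         # guard against log(0)
        cost = -np.mean(target * np.log(p + eps) + (1 - target) * np.log(1 - p + eps))
        train_acc = np.mean((p >= 0.5) == target)
        print(f"Iteration {it}: {cost:.5f}, {train_acc:.4f}")
        if it < max_iter:                                   # no update after the final print
            # Gradient of the cross-entropy cost with respect to the weights and bias
            grad_w = data.T @ (p - target) / n
            grad_b = np.mean(p - target)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    print("Final weights:", b, *w)
    return w, b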
 
c) Discuss how the choice of learning_rate affects the fitting of the model. 
d) Compare your model with one obtained using a machine learning library's logistic regression implementation.
 
e) Retrain your model using three features of your choice. Compare both models using an ROC curve (you can use code from here to draw the ROC curve).
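One possible way to approach parts d) and e) is sketched below; X_train, y_train, X_val, y_val stand for the split produced in part a), and my_scores stands for the probabilities produced by your own LRGradDesc model on the validation set (all of these names are placeholders):

import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc

clf = LogisticRegression().fit(X_train, y_train)          # library model for comparison
lib_scores = clf.predict_proba(X_val)[:, 1]               # probability of Outcome = 1

for name, scores in [("hand-made GD", my_scores), ("library model", lib_scores)]:
    fpr, tpr, _ = roc_curve(y_val, scores)
    plt.plot(fpr, tpr, label=f"{name} (AUC = {auc(fpr, tpr):.3f})")
plt.plot([0, 1], [0, 1], linestyle="--", color="grey")    # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()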
 
 
 
Q2 (30%) 
Multi-class classification using neural networks 
In this question you will experiment with a neural network in the context of text classification, 
where a document can belong to one out of several possible categories. The main goal for you 
is to try different hyperparameters in a systematic manner so that you can propose a network 
configuration that is properly justified. You will experiment with the Reuters dataset, which can
be loaded directly from Keras: 
 
from keras.datasets import reuters 
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000) 
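A possible baseline to start experimenting from is sketched below; the bag-of-words vectorization, layer sizes, optimizer, number of epochs and batch size are illustrative choices rather than the configuration you are expected to end up with, and the seeded initializer follows the note at the end of this question:

import numpy as np
from keras import initializers, layers, models
from keras.utils import to_categorical

def vectorize(sequences, dim=10000):
    # Multi-hot encode each newswire as a bag-of-words vector of length dim
    out = np.zeros((len(sequences), dim))
    for i, seq in enumerate(sequences):
        out[i, seq] = 1.0
    return out

x_train, x_test = vectorize(train_data), vectorize(test_data)
y_train, y_test = to_categorical(train_labels), to_categorical(test_labels)

init = initializers.GlorotUniform(seed=42)                 # seeded initialization (see note below)
model = models.Sequential([
    layers.Dense(64, activation="relu", kernel_initializer=init, input_shape=(10000,)),
    layers.Dense(64, activation="relu", kernel_initializer=init),
    layers.Dense(46, activation="softmax", kernel_initializer=init),   # 46 Reuters topics
])
model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
history = model.fit(x_train, y_train, epochs=20, batch_size=512,
                    validation_split=0.2, verbose=2)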
 
a) Experiment with different hyper-parameters and report your best accuracy found. The most 
important hyperparameters that you need to experiment with in this question part are: number of 
layers, nodes per hidden layer, learning rate, and number of epochs. 
b) Describe how your convergence changes when you vary the size of your mini-batch. A plot showing cost as a function of the number of epochs would be enough. Discuss the reasons for this.
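A possible way to produce this plot (sketch only; build_model is a hypothetical helper that returns a freshly compiled copy of the network, and x_train/y_train are the vectorized data from the baseline sketch above):

import matplotlib.pyplot as plt

for batch_size in [32, 128, 512]:
    m = build_model()                        # fresh copy so the runs are comparable
    hist = m.fit(x_train, y_train, epochs=20, batch_size=batch_size, verbose=0)
    plt.plot(hist.history["loss"], label=f"batch size {batch_size}")
plt.xlabel("Epoch")
plt.ylabel("Training cost (categorical cross-entropy)")
plt.legend()
plt.show()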
c) Experiment with different regularization options (e.g. L2 and dropout). You may need to make your network larger in case you do not find much benefit from applying regularization.
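Two regularization variants you could try are sketched below; the penalty strength and dropout rate are illustrative, not recommended values:

from keras import layers, models, regularizers

reg_model = models.Sequential([
    layers.Dense(64, activation="relu", kernel_regularizer=regularizers.l2(1e-3),
                 input_shape=(10000,)),
    layers.Dropout(0.5),                                   # dropout between hidden layers
    layers.Dense(64, activation="relu", kernel_regularizer=regularizers.l2(1e-3)),
    layers.Dropout(0.5),
    layers.Dense(46, activation="softmax"),
])
reg_model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])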
 
Note: we recommend that you control your initialization parameters by means of a seed (see https://keras.io/api/layers/initializers/).
 
Q3 (10%) 
Computational graph (no code involved) 
This question aims at checking your understanding of defining arbitrary network architectures and computing any derivative involved in optimization.
 
Consider a neural network with N input units, N output units, and K hidden units. The activations 
are computed as follows: 
 
where σ denotes the logistic function, applied elementwise. The cost involves a squared difference with the target s (with a 0.5 factor) and a regularization term given by a dot product with an external vector r. More concretely:
 
a) Draw the computation graph relating x, z, h, y, and the terms of the cost.
b) Derive the backpropagation equations for computing the partial derivative of the cost with respect to W^(1). To make things simpler, you may use σ′ to denote the derivative of the logistic function.
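For illustration only, one common single-hidden-layer instantiation consistent with this description is written out below in LaTeX; the equations to use are the ones given for this question, and in particular the placement of r in the regularizer here is an assumption:

\begin{aligned}
z &= W^{(1)} x + b^{(1)}, \qquad h = \sigma(z), \qquad y = W^{(2)} h + b^{(2)},\\
\mathcal{E} &= \tfrac{1}{2}\,\lVert y - s\rVert^{2} + r^{\top} h .\\[4pt]
\frac{\partial \mathcal{E}}{\partial y} &= y - s, \qquad
\frac{\partial \mathcal{E}}{\partial h} = W^{(2)\top}(y - s) + r, \qquad
\frac{\partial \mathcal{E}}{\partial z} = \frac{\partial \mathcal{E}}{\partial h} \circ \sigma'(z),\\
\frac{\partial \mathcal{E}}{\partial W^{(1)}} &= \frac{\partial \mathcal{E}}{\partial z}\, x^{\top}.
\end{aligned}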
 
 
Q4 (30%) 
Tuning generalization 
In this question you will construct a neural network to classify a large set of low-resolution images. Unlike Q2, in this case we suggest a neural network for you to start experimenting with, but we would like you to describe the behavior of the network as you modify certain parameters. You will be reproducing some concepts mentioned during the lectures, such as the one shown on slide 8 of the Week 4 lecture on “Ensembles, regularization and feature selection”.
 
a) Use the CIFAR-100 dataset (available from Keras) 
 
from keras.datasets import cifar100 
 
(x_train_original, y_train_original), (x_test_original, y_test_original) = cifar100.load_data(label_mode='fine')
 
to train a neural network with two hidden layers using the ReLU activation function, with 500 and 
200 hidden nodes, respectively. The output layer should be defined according to the nature of 
the targets. 
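A possible starting point for this part is sketched below; flattening the images, scaling to [0, 1], and the choice of optimizer, number of epochs and batch size are illustrative assumptions:

from keras import layers, models
from keras.utils import to_categorical

x_train = x_train_original.reshape(len(x_train_original), -1).astype("float32") / 255.0
x_test = x_test_original.reshape(len(x_test_original), -1).astype("float32") / 255.0
y_train = to_categorical(y_train_original, 100)
y_test = to_categorical(y_test_original, 100)

model = models.Sequential([
    layers.Dense(500, activation="relu", input_shape=(32 * 32 * 3,)),   # first hidden layer
    layers.Dense(200, activation="relu"),                               # second hidden layer
    layers.Dense(100, activation="softmax"),                            # 100 fine-grained classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
history = model.fit(x_train, y_train, epochs=30, batch_size=128,
                    validation_data=(x_test, y_test), verbose=2)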
b) Generate a plot that shows average precision for the training and test sets as a function of the number of epochs. Indicate what a reasonable number of epochs should be.
c) Generate a plot that shows average precision for the training and test sets as a function of the number of weights/parameters (# hidden nodes). For this question part, you will be modifying the architecture that was given to you as a starting point.
d) Generate a plot that shows average precision for the training and test sets as a function of the number of instances in the training set. For this question part, you will be modifying your training set. For instance, you can run 10 experiments: in the first, use a random 10% of the training data; in the second, a random 20%; and so on until you use the entire training set. Keep the network hyperparameters constant during your experiments.
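A possible loop for these experiments (sketch only; build_model is a hypothetical helper that rebuilds the fixed architecture with fresh weights, and average_precision stands for whichever average-precision computation you use in the other parts):

import numpy as np

fractions = np.arange(0.1, 1.01, 0.1)
train_ap, test_ap = [], []
rng = np.random.default_rng(0)
for frac in fractions:
    n = int(frac * len(x_train))
    idx = rng.choice(len(x_train), size=n, replace=False)   # random subset of the training set
    m = build_model()                                        # same hyperparameters every run
    m.fit(x_train[idx], y_train[idx], epochs=30, batch_size=128, verbose=0)
    train_ap.append(average_precision(m, x_train[idx], y_train[idx]))
    test_ap.append(average_precision(m, x_test, y_test))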
e) Based on all your experiments above, define a network architecture and report accuracy and average precision for all classes.
f) Can you improve test prediction performance by using an ensemble of neural networks?
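A minimal sketch of such an ensemble (again using the hypothetical build_model helper): train several copies independently, average their predicted class probabilities, and take the argmax.

import numpy as np

members = []
for _ in range(5):
    m = build_model()                        # each member starts from its own random weights
    m.fit(x_train, y_train, epochs=30, batch_size=128, verbose=0)
    members.append(m)

mean_probs = np.mean([m.predict(x_test) for m in members], axis=0)   # average the softmax outputs
ensemble_acc = np.mean(np.argmax(mean_probs, axis=1) == y_test_original.flatten())
print("Ensemble test accuracy:", ensemble_acc)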
 
 
 
Submitting the assignment (REVISED) 
Note that you will have four separate Assignment 2 entries on Brightspace, i.e. one for each question (A2-Q1, A2-Q2, A2-Q3 and A2-Q4).
1. For each question, your assignment should be submitted as a single .ipynb file including your answers, before the deadline on Brightspace. Use markdown syntax to format your answers.
2. You can submit multiple editions of your assignment. Only the last one will be marked. It is recommended to upload a complete submission, even if you are still improving it, so that you have something in the system if your computer fails for whatever reason.
3. IMPORTANT: PLEASE NAME YOUR PYTHON NOTEBOOK FILE AS: LastName-FirstName-Assignment-N-Q.ipynb, for example Soto-Axel-Assignment-2-1.ipynb (for the first question of the second assignment). A penalty applies if the format is not correct.
4. The markers will enter your marks and their overall feedback on Brightspace. In case there is any important feedback, it will be given to you; otherwise you would need to refer to the model solutions.
 
Marking the assignment 
 
Criteria and weights. Each criterion is marked by a letter grade. The overall mark is the weighted average of the grades of the individual criteria.
For the experimental questions: 
0.2 Clarity: All steps are clearly described. The origin of all code used is clearly stated. Markdown is used effectively to format the answer to make it easier to read and grasp the main points. Links have been added to all online resources used (markdown syntax is: [AnchorText](URL) ).
0.2 Justification: Parameter choices or processes are well justified. 
0.2 Results: The results are complete. The results are presented in a manner that is easy to understand. The answer is selective in the amount and diversity of the experimental results presented: only key results that support the insights are presented, and there is no need to present every single experiment you carried out. Only the interesting results, where the behaviour of the ML model varies, are presented.
0.4 Insights: The insights obtained from the experimental results are clearly explained. The insights are connected with the concepts discussed in the lectures. The insights can also include statistical considerations (separate training-test data, cross-validation, variance). Preliminary investigation of the statistical properties of the attributes (e.g. histogram, mean, standard deviation) is included.
 
For the theoretical questions (Q3): 
0.6 Correctness: Correctness of the answer. Explanation is clear and precise. 
0.4 Neatness of explanation: The explanation is well written, well structured and easy to read. It uses well-defined and consistent notation.
 
