

Homework 1
MTH 496 – Machine Learning
Due date: Feb 11, 2018
(6 problems/2 pages)
Problem 1 (20pts). Assume the training data is given as follows: (x1, y1), (x2, y2), ..., (xM, yM). The predictor of the linear regression is defined as
pc(x) = c0 + c1 x
a) Find the loss function associated with the predictor pc(x).
b) Find the optimal values c0 and c1. (Note: show all of your steps to receive full credit.)
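(Not part of the required derivation: the closed-form answer to part (b) can be sanity-checked numerically. The sketch below uses made-up toy data and the standard least-squares formulas for a line fit.)

```python
import numpy as np

# Toy data (hypothetical values, only for checking the derivation).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

# Minimizing the squared-error loss
#   L(c) = (1/M) * sum_i (c0 + c1*x_i - y_i)^2
# gives the closed-form optimum:
c1 = ((x * y).mean() - x.mean() * y.mean()) / ((x**2).mean() - x.mean()**2)
c0 = y.mean() - c1 * x.mean()
print(c0, c1)
```

The result should agree with any off-the-shelf least-squares line fit on the same data.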
Problem 2 (30pts). Assume the training data for the classification task is given as follows: (x1, y1), (x2, y2), ..., (xM, yM), with yi ∈ {0, 1}, i = 1, 2, ..., M. Logistic regression is employed to learn this dataset.
a) What is the predictor for a given input x?
b) Show all steps for constructing the loss function of the logistic regression method.
c) What is the parameter vector c in the predictor after the first iteration of gradient descent? (Choose your own initial values.)
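(Not part of the required answer: one gradient-descent iteration for part (c) can be sketched as below. The toy dataset, the zero initial vector, and the learning rate are my own choices, as the problem allows.)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-D training set (hypothetical), labels in {0, 1}.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(x), x])  # prepend a bias column

c = np.zeros(2)   # initial parameter vector (own choice)
lr = 0.1          # learning rate (own choice)

# Gradient of the average cross-entropy loss: (1/M) * X^T (sigmoid(Xc) - y)
grad = X.T @ (sigmoid(X @ c) - y) / len(y)
c = c - lr * grad  # parameter vector after the first iteration
print(c)
```

With c = 0 every prediction starts at 0.5, so only the slope component receives a nonzero update on this symmetric toy set.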
Problem 3 (20pts). a) What is the purpose of regularization?
b) State the loss functions of linear regression and logistic regression under regularization.
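(A numerical sketch of part (b), assuming the common L2 (ridge) penalty λ‖c‖² that leaves the bias term unpenalized; other penalties such as L1 work analogously.)

```python
import numpy as np

def ridge_linear_loss(c, X, y, lam):
    # Mean squared error plus L2 penalty (bias c[0] not penalized).
    residuals = X @ c - y
    return (residuals**2).mean() + lam * np.sum(c[1:]**2)

def ridge_logistic_loss(c, X, y, lam):
    # Mean cross-entropy plus the same L2 penalty.
    p = 1.0 / (1.0 + np.exp(-(X @ c)))
    ce = -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()
    return ce + lam * np.sum(c[1:]**2)
```

Setting lam = 0 recovers the unregularized losses from Problems 1 and 2.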
Problem 4 (20pts). Assume the features of our training data are given as
(1, 20), (−3, 40), (−2, 10), (0, 30)
a) Use two different ways of normalizing features to scale all the feature values in the training data.
b) If the test data is given by (4, 25), (2, 15), find the normalized features of the test set corresponding to each normalization approach.
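(Not part of the required answer: the two usual scalings, min-max and z-score, can be sketched as below on the data of this problem. Note that the test points must be scaled with the training statistics, which is why min-max values for the test set can fall outside [0, 1].)

```python
import numpy as np

train = np.array([[1.0, 20.0], [-3.0, 40.0], [-2.0, 10.0], [0.0, 30.0]])
test = np.array([[4.0, 25.0], [2.0, 15.0]])

# Min-max scaling: (x - min) / (max - min), per feature.
mn, mx = train.min(axis=0), train.max(axis=0)
minmax_train = (train - mn) / (mx - mn)
minmax_test = (test - mn) / (mx - mn)   # may fall outside [0, 1]

# Z-score scaling: (x - mean) / std, per feature.
mu, sigma = train.mean(axis=0), train.std(axis=0)
z_train = (train - mu) / sigma
z_test = (test - mu) / sigma
print(minmax_test)
print(z_test)
```

(This sketch uses the population standard deviation, NumPy's default; using the sample standard deviation is also acceptable as long as the same statistic is applied to train and test.)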
Problem 5 (10pts). What are the differences between the k-NN and k-Means methods? Why are they called non-parametric algorithms?
Problem 6 (60pts). Assume the training data for the classification task is given as follows: (x1^(1), x2^(1), y1), (x1^(2), x2^(2), y2), ..., (x1^(M), x2^(M), yM), with yi ∈ {0, 1}, i = 1, 2, ..., M.
a) With c = (c0, c1, c2), show that the margin of the hyperplane, i.e. the distance between the two lines c^T x = 1 and c^T x = −1, is 2/√(c1² + c2²).
b) Describe the idea that leads to the formulation of the SVM's loss function (hard margin).
c) Describe the reasons for considering the soft margin.
d) What is the purpose of using kernels in SVM?
e) Why should kernels satisfy Mercer's theorem?
f) Prove that all the kernels mentioned in the lecture satisfy Mercer's conditions.
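(Not part of the required proof: the margin formula in part (a) can be checked numerically by measuring the distance between the two lines directly. The coefficients below are my own example values, with x = (1, x1, x2).)

```python
import numpy as np

c0, c1, c2 = 1.0, 3.0, 4.0   # hypothetical coefficients

# Take a point p on the line c0 + w.p = -1, chosen along the normal w,
# then measure its distance to the line c0 + w.x = 1 with the
# point-to-line distance formula.
w = np.array([c1, c2])
p = w * (-1.0 - c0) / (w @ w)          # satisfies c0 + w.p = -1
dist = abs(c0 + w @ p - 1.0) / np.linalg.norm(w)

margin = 2.0 / np.sqrt(c1**2 + c2**2)  # the formula to be proved
print(dist, margin)
```

Both quantities agree (here 2/5 = 0.4), consistent with the claimed margin 2/√(c1² + c2²).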
 
