TTIC 31020: Introduction to Statistical Machine Learning
Autumn 2017
Practice exam
This is a practice exam, intended to reflect the difficulty/format of the
real final (although not necessarily its length!)
Please read the questions carefully. Think what each question is about,
what fundamental principles are involved, and what lectures/homework may
be relevant if you want to consult the course material. Make sure you are
answering the question as asked, and not some extension of the question you
have made up. For instance, if the question is “what is likely to happen if
we do X”, do not answer the question “is it reasonable to do X”.
Your written answers, when required, are expected to fit in the designated
space. Feel free to provide additional reasoning if you feel it’s important. If
you can’t quite answer the question, but have some pertinent thoughts that
you think may earn partial credit, do write them down (succinctly!).
If you want to discuss these with the TAs or the instructor during office
hours, please do not hesitate to do so!

Problem 0
In a binary classification setting (two classes), you are given 2d data points
in d-dimensional feature space (for example, 200 points in 100-dimensional
space). You happen to know for a fact that the true distribution of each class
is a Gaussian, and that the Gaussian for each class is likely to have its own,
full (unrestricted) covariance matrix.
You have access to software capable of training two classification models:
1. linear logistic regression (linear features only), with no regularization
2. quadratic discriminant analysis, fitting a multivariate Gaussian to each
class, with no restrictions on covariance matrices.
You cannot modify the software, e.g., you cannot add regularization to (1),
or add restrictions on covariances in (2).
Part 1 [5 points]
Which of the two classification models above will you use for this problem?
Explain briefly.
Part 2 [5 points]
Which of the two models will suffer more from approximation error?
Problem 1
In a machine learning class, the probability of a student getting an A grade is
Pr(A) = 1/2, a B grade Pr(B) = µ, a C grade Pr(C) = 2µ, and a D (lowest)
grade Pr(D) = 1/2 - 3µ. This year, we are told that c students got a C, and
d students got a D. We also know that h students got either A or B - but we
do not know how many exactly got each of these two grades. So, if a is the
number of students who got an A, and b the number of students who got a
B, then a and b are unknown, with a constraint a + b = h.
We want to use the EM algorithm to find a maximum likelihood estimate for
µ, with a and b treated as hidden variables.
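
One way the setup above could be turned into an EM iteration is sketched
below in Python. This is a sketch under stated assumptions, not an official
solution: the function name em_estimate_mu, the starting value mu0, the
stopping tolerance, and the counts in the usage lines are all hypothetical.
The E-step uses the fact that, given a + b = h, the expected value of b under
the current µ is h·µ / (1/2 + µ). The M-step maximizes
(b + c) log µ + d log(1/2 - 3µ), which gives µ = (b + c) / (6 (b + c + d)).

```python
def em_estimate_mu(h, c, d, mu0=0.05, max_iter=200, tol=1e-12):
    """EM sketch for the grade model Pr(A)=1/2, Pr(B)=mu, Pr(C)=2mu, Pr(D)=1/2-3mu.

    h: number of students with an A or a B (the split a/b is hidden)
    c, d: observed numbers of C and D grades
    mu0: hypothetical starting guess, must lie in (0, 1/6)
    Assumes c + d > 0 so the M-step denominator is nonzero.
    """
    mu = mu0
    for _ in range(max_iter):
        # E-step: expected number of B grades among the h "A or B" students.
        b_hat = h * mu / (0.5 + mu)
        # M-step: closed-form maximizer of the expected complete-data log-likelihood.
        mu_new = (b_hat + c) / (6.0 * (b_hat + c + d))
        converged = abs(mu_new - mu) < tol
        mu = mu_new
        if converged:
            break
    return mu


# Hypothetical counts, chosen only for illustration.
mu_hat = em_estimate_mu(h=20, c=10, d=10)
print(mu_hat)
```

At a fixed point of this iteration the derivative of the observed-data
log-likelihood, h/(1/2 + µ) + c/µ - 3d/(1/2 - 3µ), should vanish, which gives
a quick sanity check on the returned value.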