
CS5487: Take-home quiz

2019 Semester A
Dec 12 to Dec 18
Rules:
1. This take-home quiz is an “open-book” quiz. You are permitted to use
the following materials during the quiz:
• Your lecture notes.
• Your cheatsheet from the Midterm Quiz.
• The textbook, Pattern Recognition and Machine Learning (PRML)
by Bishop.
• Any materials available on the CS5487 Canvas course page, including Problem Sets, Problem Set Solutions, Tutorial Solutions, and Panopto Recordings.
All other materials are NOT allowed. This includes web searches, research papers, other reference books, etc.
2. You cannot discuss the quiz with others, and the work that you turn in
must be your own work. You will follow the high standards of Academic
Honesty at CityU.
3. You have until Dec 18, 5pm to complete the quiz. Turn in your work on
Canvas.
Instructions:
1. Answer all questions on blank paper.
2. On the last page of your answer sheets, write the following statement:
“The work in these answer sheets is my own work. I have not discussed
this quiz with anyone else. I have only used the allowed materials.” Then
write your name, student number, date and put your signature.
3. Upload your answer sheets to Canvas.
Problem 1 Soft Adaptive-SVM [30 marks]
In this problem we will consider an adaptive-SVM (ASVM) for binary classification. Suppose we have used a dataset D0 to learn a binary linear classifier function f0(x) = w0ᵀx with decision rule y = sign(f0(x)). Since we have the classifier, we then discarded the data D0.
Now, suppose we receive a new set of data D = {(xi, yi)}, i = 1, …, n, where xi ∈ Rd are the feature vectors and yi ∈ {+1, −1} are the corresponding class labels. We wish to update our original classifier function f0(x). To do this, we will add a “delta classifier” ∆f(x) = wᵀx to adapt our original classifier f0(x) into a new classifier f(x),

f(x) = f0(x) + ∆f(x) = f0(x) + wᵀx,   (1)
where w is the parameter vector of the “delta classifier”. To handle cases when the data is not linearly separable, we introduce a slack variable ξi for each data point xi. The ASVM primal problem is

min_{w,ξ}  (1/2)‖w‖² + C Σi ξi
s.t.  yi(f0(xi) + wᵀxi) ≥ 1 − ξi, ∀i,
      ξi ≥ 0, ∀i.   (2)
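As a sanity check on the primal (this is not the dual derivation the questions ask for), note that (2) is equivalent to the unconstrained hinge-loss problem min_w (1/2)‖w‖² + C Σi max(0, 1 − yi(f0(xi) + wᵀxi)), which can be minimized by subgradient descent. A minimal numpy sketch; the toy data, step size, and iteration count are illustrative assumptions:

```python
import numpy as np

def asvm_subgradient(X, y, f0, C=1.0, iters=500, lr=0.01):
    """X: (n, d) inputs; y: (n,) labels in {+1, -1}; f0: (n,) old-classifier scores."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        margins = y * (f0 + X @ w)          # y_i (f0(x_i) + w^T x_i)
        active = margins < 1                # margin-violating points
        # subgradient of 1/2||w||^2 + C * sum_i hinge_i
        g = w - C * (y[active, None] * X[active]).sum(axis=0)
        w -= lr * g
    return w

# Toy usage: the old classifier w0 only looks at the first coordinate,
# while the true boundary also depends on the second.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])
w0 = np.array([1.0, 0.0])
w = asvm_subgradient(X, y, X @ w0)          # delta-classifier weights
```

The combined classifier is then f(x) = (w0 + w)ᵀx, i.e., the delta classifier only has to correct the old decision boundary rather than learn it from scratch.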
(a) [2 marks] Explain the role of the objective function and the constraints in the ASVM
primal problem.
(b) [5 marks] Write down the Lagrangian L(w, ξ, α, r) for the ASVM primal problem, where
α are the Lagrange multipliers for the first set of inequality constraints, and r are the
Lagrange multipliers for the second set of inequality constraints. Derive conditions for
the stationary point of L(w, ξ, α, r) w.r.t. w and ξ.
(c) [10 marks] Derive the ASVM dual problem.
(d) [3 marks] Use the KKT conditions to derive a geometric interpretation of the ASVM.
(e) [10 marks] Compare the ASVM dual in (c) with the original soft-SVM dual problem.
What is the interpretation of the ASVM dual (considering the original SVM dual)?
What is the role of the original classifier f0(x)?
. . . . . . . . .
Problem 2 Gaussian variance regression [50 marks]
Consider the regression problem where x ∈ Rd is the input vector and y ∈ R is the observation value. The training set is D = {X, y}, where X = [x1, · · · , xn] are the input vectors and y = [y1, · · · , yn]ᵀ are the output values.
In this problem, we will consider a Gaussian observation model with fixed mean µ = 0 and a variance σ² that changes as a function of x. That is, our goal is to regress the variance of the Gaussian using the inputs X and the corresponding observations y. The Gaussian observation likelihood with mean 0 is

p(y|σ²) = (1/√(2πσ²)) e^{−y²/(2σ²)}.   (3)
Since the variance should be non-negative, we define the mapping from x to the variance σ² as the exponential of a linear function

σ²(x) = e^{−wᵀx},   (4)

where w ∈ Rd is the parameter vector. Thus, the observation likelihood in terms of w and x is given by

p(y|w, x) = (1/√(2π e^{−wᵀx})) e^{−(1/2) e^{wᵀx} y²}.   (5)

We also assume a Gaussian prior on the weight vector w,

p(w) = N(w|0, Σ).   (6)
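The MAP objective in the questions below is the negative log-posterior of w. Assuming σ²(x) = e^{−wᵀx} as in (4), and dropping additive constants, −log p(y|w, X) − log p(w) = Σi [½ e^{wᵀxi} yi² − ½ wᵀxi] + ½ wᵀΣ⁻¹w. A minimal numpy sketch of this objective:

```python
import numpy as np

def neg_log_posterior(w, X, y, Sigma_inv):
    """Negative log-posterior (up to constants), assuming sigma^2(x) = exp(-w^T x).
    w: (d,) weights; X: (n, d) inputs; y: (n,) outputs; Sigma_inv: (d, d) prior precision."""
    a = X @ w                                   # a_i = w^T x_i (log-precision of y_i)
    data_term = 0.5 * np.sum(np.exp(a) * y**2 - a)
    prior_term = 0.5 * w @ Sigma_inv @ w
    return data_term + prior_term
```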
First we will consider the MAP estimate of the regression parameters w.
(a) [5 marks] Describe a real-world problem where this type of regression could be used.
(b) [5 marks] Write down the optimization problem for the MAP estimate of w.
(c) [10 marks] Derive the Newton-Raphson iterations to solve for the MAP estimate of w.
(d) [5 marks] Consider the case when the prior covariance matrix is Σ = λI. How does Σ
help to regularize the estimate of w?
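One possible shape of the Newton-Raphson iteration (this is a sketch under the assumption σ²(x) = e^{−wᵀx}, not necessarily the intended solution of (c)): with a = Xw, the gradient of the negative log-posterior is g = ½ Xᵀ(e^a ⊙ y² − 1) + Σ⁻¹w, the Hessian is H = ½ Xᵀdiag(e^a ⊙ y²)X + Σ⁻¹ (positive definite), and the update is w ← w − H⁻¹g. The synthetic data below is an illustrative assumption:

```python
import numpy as np

def map_newton(X, y, Sigma_inv, iters=20):
    """Newton-Raphson for the MAP estimate, assuming sigma^2(x) = exp(-w^T x)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        a = X @ w
        r = np.exp(a) * y**2                    # per-point curvature weights
        g = 0.5 * X.T @ (r - 1) + Sigma_inv @ w
        H = 0.5 * X.T @ (r[:, None] * X) + Sigma_inv
        w = w - np.linalg.solve(H, g)           # Newton step
    return w

# Usage on synthetic data drawn from the model itself.
rng = np.random.default_rng(1)
X_syn = rng.normal(size=(200, 3))
w_true = np.array([0.5, -0.3, 0.2])
y_syn = rng.normal(0.0, np.sqrt(np.exp(-X_syn @ w_true)))
w_hat = map_newton(X_syn, y_syn, np.eye(3))
```

Because the objective is convex in w (a sum of exponentials, linear terms, and a quadratic), the Hessian is positive definite and the Newton step is always well-defined.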
Now we will consider a non-linear version by kernelizing the regression model.
(e) [5 marks] Derive the kernel version of the regression model, i.e., let σ²∗ = e^{−α∗}, and apply the kernel trick to calculate α∗ = wᵀx∗.
(f) [10 marks] Derive the kernel version of the MAP estimation using the Newton-Raphson
iterations derived in (c).
(g) [5 marks] Discuss the role of the prior covariance Σ in the kernel regression model.
(h) [5 marks] Compare the original and kernelized algorithms in (c) and (f). What are the
advantages and disadvantages of each version?
 