Machine Learning Theory (CSC 482A/581A)
Problem Set 1 Due on Friday, February 2nd, 5pm
Instructions:
• You must write up your solutions individually.
• You may have high-level discussions with up to 2 other students registered in the course. If
you discuss problems with others, include at the top of your submission: their names, V#’s,
and the problems discussed.
• Your solutions should be clear and concise (and legible!). You are strongly encouraged to
type up your solutions in LaTeX. For any problems where you only have a partial solution,
be clear about any parts of your solution for which you have low confidence.
• If submitted through conneX, the due date is Friday, February 2nd, 7pm. This is a hard
deadline. If submitted in writing, submit at the beginning of class (12:30pm) Friday February
2nd. You are thus incentivized to submit online (using LaTeX) as you get more time!
Questions:
1. Let X = R2 and take C to be the class of concentric circles C ={cr : r≥0}, where for each
nonnegative real number r≥0, we have cr(x) = 1[bardblxbardbl2≤r]. Prove thatC is PAC learnable.
In particular, show a PAC learning algorithm which, given a training sample of size n≥ log 1δε ,
finds with probability at least 1−δ a hypothesis ˆf∈C for which R(ˆf)≤ε.
2. Devise an efficient mistake bound learner for the concept class k-term DNF verX ={0,1}d.
The runtime and mistake bound of your algorithm both should be polynomial in d; you may
treat k as a constant.
3. Let X ={0,1}d and consider PAC learning a finite concept class C. Assume that the inputs
are drawn i.i.d. from an unknown distribution P overX, and the labels are generated via the
rule Y = c(X) for some c∈C.
Let’s call this problem the “clean” problem; so, in the clean problem, the training sample
consists of random examples of the form. (X,Y) for X∼P and Y = c(X).
Next, consider the following “corrupted” problem: Each time we request a random example
(X,Y), with probability α(X) ∈ [0,1] the value of the label Y is flipped. Call the resulting
label tildewideY. Thus,
tildewideY =
braceleftBigg−Y with probability α(X)
Y with probability 1−α(X)
In the corrupted problem, the examples are of the form. (X,tildewideY), and so the labels are noisy.
1
(a) Using c and α, derive an expression for the Bayes classifier for the corrupted problem.
(b) For the remaining questions, assume that α(x) = 14 for all x ∈X. What is the Bayes
classifier for the corrupted problem?
(c) What is the Bayes risk for the corrupted problem?
(d) Let cε ∈C be a hypothesis for which Pr(cε(X) negationslash= c(X)) = ε > 0. What is the risk
(expected zero-one loss) of cε for the corrupted problem?
(e) Design an algorithm for PAC learningC given access only to corrupted labeled examples
(X1,tildewideY1),...,(Xn,tildewideYn). That is, your algorithm should, with probability at least 1−δ,
output a concept ˆf ∈C for which EX∼P[ ˆf(X) negationslash= c(X)] ≤ε. Your algorithm should be
statistically efficient (you should mention the sample size n required, and n should be
polynomial in 1ε and 1δ), but it need not be computationally efficient. Please explain why
your algorithm is correct.