
BTRY 4030 - Fall 2018 - Homework 5 Q1
Put Your Name and NetID Here
Due Tuesday, December 4, 2018
You may either respond to the questions below by editing hw5_2018_q1.Rmd to include your answers
and compiling it into a PDF document, or by handwriting your answers and scanning them in.
You may discuss the homework problems and computing issues with other students in the class. However, you
must write up your homework solution on your own. In particular, do not share your homework RMarkdown
file with other students.
Here we will add one more deletion diagnostic to our arsenal. When comparing two possible models, we often
want to ask "Does one predict future data better than the other?" One way to do this is to divide your data
into two collections of observations, $(X_1, y_1)$ and $(X_2, y_2)$, say. We use $(X_1, y_1)$ to obtain a linear regression
model, with parameters $\hat{\beta}$, and look at the prediction error $(y_2 - X_2\hat{\beta})^T (y_2 - X_2\hat{\beta})$.

This is a bit wasteful -- you could use $(X_2, y_2)$ to improve your estimate of $\hat{\beta}$. However, we can assess how
well this type of model does (for these data) as follows:
For each observation $i$:

i. Remove $(x_i, y_i)$ from the data and obtain $\hat{\beta}_{(i)}$ from the remaining $n - 1$ data points.
ii. Use this to make a prediction $\hat{y}_{(i)i} = x_i^T \hat{\beta}_{(i)}$.

Return the cross-validation error $CV = \sum_{i=1}^n (y_i - \hat{y}_{(i)i})^2$.
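The loop above can be sketched directly. This is an illustration only, not part of the assignment (the homework itself uses R/RMarkdown); the data are simulated and all values are assumptions.

```python
# Naive leave-one-out cross validation for linear regression:
# refit the model n times, each time deleting one observation.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])          # assumed true coefficients
y = X @ beta + rng.normal(size=n)

cv = 0.0
for i in range(n):
    keep = np.arange(n) != i               # drop observation i
    bhat_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    yhat_i = X[i] @ bhat_i                 # predict the held-out point
    cv += (y[i] - yhat_i) ** 2

print(cv)                                  # the CV criterion defined above
```

Note that this costs $n$ separate least-squares fits, which motivates the shortcut developed later in this question.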
This can be used to compare models that use different covariates, for example, particularly when the models
are not nested. We will see an example of this in Question 2.
Here, we will find a way to calculate CV without having to manually go through removing observations one
by one.
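One such shortcut is the standard leave-one-out identity $CV = \sum_{i=1}^n \hat{e}_i^2 / (1 - h_{ii})^2$, where $\hat{e}_i$ are the ordinary residuals and $h_{ii}$ the diagonal entries of the hat matrix; deriving it is essentially what part (c) walks you through. A quick numerical check in Python (illustrative only, with simulated data; the hat-matrix notation is the usual one, not notation defined in this assignment):

```python
# Check that one fit plus a residual rescaling reproduces the
# brute-force delete-one-at-a-time cross-validation error.
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ rng.normal(size=p) + rng.normal(size=n)

# Brute force: refit with each observation deleted.
cv_loop = sum(
    (y[i] - X[i] @ np.linalg.lstsq(X[np.arange(n) != i],
                                   y[np.arange(n) != i], rcond=None)[0]) ** 2
    for i in range(n)
)

# Shortcut: a single fit, residuals rescaled by 1 - h_ii.
H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix
e = y - H @ y                              # in-sample residuals
cv_fast = np.sum((e / (1 - np.diag(H))) ** 2)

print(abs(cv_loop - cv_fast))              # agreement up to rounding error
```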
a. We will start by considering a separate test set. As in the midterm, imagine that we have $X_2 = X_1$, but
that the errors that produce $y_2$ are independent of those that produce $y_1$. We estimate $\hat{\beta}$ using $(X_1, y_1)$:
$\hat{\beta} = (X_1^T X_1)^{-1} X_1^T y_1$. Show that the in-sample average squared error, $(y_1 - X_1\hat{\beta})^T (y_1 - X_1\hat{\beta})/n$, is
biased downwards as an estimate of $\sigma^2$, but the test-set average squared error, $(y_2 - X_2\hat{\beta})^T (y_2 - X_2\hat{\beta})/n$,
is biased upwards. (You may find the midterm solutions helpful.)
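A small Monte Carlo illustration of part (a), not a substitute for the requested proof: with $X_2 = X_1$ and $\sigma = 1$, the in-sample average squared error comes out below $\sigma^2$ and the test-set version above it. All values below are simulated assumptions.

```python
# Simulate many training/test pairs sharing the same design matrix
# and compare in-sample vs test-set average squared errors.
import numpy as np

rng = np.random.default_rng(2)
n, p, reps = 30, 5, 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = rng.normal(size=p)                  # assumed true coefficients
H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix (fixed design)

in_sample, test = [], []
for _ in range(reps):
    y1 = X @ beta + rng.normal(size=n)     # training errors
    y2 = X @ beta + rng.normal(size=n)     # independent test errors
    yhat = H @ y1                          # fitted values from (X, y1)
    in_sample.append(np.mean((y1 - yhat) ** 2))
    test.append(np.mean((y2 - yhat) ** 2))

# With sigma^2 = 1, these average roughly (n - p)/n and (n + p)/n.
print(np.mean(in_sample), np.mean(test))
```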
b. Suppose that $\beta_p = 0$; that is, the final column of $X_1$ has no impact on prediction. Show that the expected test-set
error is smaller if we remove the final column from each of $X_1$ and $X_2$ than if we don't. (This makes
using a test set a reasonable means of choosing which covariates to include.)
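The effect in part (b) can also be seen by simulation (an illustration under assumed values, not the requested proof): when the final column of the design carries a zero coefficient, the model fitted without it tends to have smaller test-set error.

```python
# Compare test-set error for the full design against the design with
# the irrelevant final column removed, averaged over many replications.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 30, 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
beta = np.array([1.0, 2.0, -1.0, 0.0])     # beta_p = 0: last column irrelevant

gap = 0.0
for _ in range(reps):
    y1 = X @ beta + rng.normal(size=n)     # training responses
    y2 = X @ beta + rng.normal(size=n)     # independent test responses
    for Xm, sign in ((X, +1), (X[:, :-1], -1)):   # full vs reduced design
        b, *_ = np.linalg.lstsq(Xm, y1, rcond=None)
        gap += sign * np.sum((y2 - Xm @ b) ** 2)

# Average excess test error of the full model over the reduced one;
# a positive value means dropping the irrelevant column helped.
print(gap / reps)
```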
c. Now we will turn to cross validation. Using the identity