Homework 11
MATH 504
1. • Reading from Elements of Statistical Learning (available in Course Documents)
– Sec 5.1, 5.2 discuss spline regression and the general idea of using basis
functions to compute regressions. In lecture I roughly followed the notation and development of these sections.
• Reading from Sauer
– Sections 5.1.1, 5.1.2. Differentiation using finite difference approximations.
– Sections 5.2.1, 5.2.2, 5.2.3. Integration using the trapezoid rule and Simpson's rule. Newton-Cotes formulas generalize Riemann and
trapezoid integration by using higher-order interpolating polynomials.
• Here is a nice webpage on spline regression in R, see section 4.5 therein.
http://data.princeton.edu/R/linearModels.html
The main point is that the R function lm is used in conjunction with
an R function, e.g. bs, that selects a spline basis and computes
the associated model/design matrix. In other words, spline regression is
viewed as a linear regression, as we discussed in class, and this view is
made explicit by using lm.
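For example, a minimal sketch of such a fit (the data frame dat and the variables x and y are placeholders, and the knots and degree are illustrative choices, not values prescribed by the reading):

    library(splines)   # provides bs(), which builds the spline design matrix
    # bs() produces the model matrix of spline basis functions; lm() then runs
    # an ordinary linear regression on those columns.
    fit <- lm(y ~ bs(x, knots = c(15, 20), degree = 3), data = dat)
    summary(fit)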
2. The file BoneMassData.txt contains bone mass data for men and women at
a variety of ages. This data comes from the book "The Elements of Statistical
Learning". In this problem, we will apply a spline regression to the dataset
using predetermined knots. Specifically, our regression model will be y ∼ S(x),
where x is the age, y is the bone mass, and S(x) is a cubic spline. For brevity,
consider only the samples taken from women and for simplicity consider two
knots with ζ1 = 15 and ζ2 = 20 (the case of more knots is no different). Then
our splines are determined by the coefficients ai of the cubic polynomial pieces,
with the requirement that S(x), S'(x), and S''(x) be continuous at the two knots.
Our goal is to find the S(x) that minimizes the sum of squared residuals
∑i (yi − S(xi))^2, where the sum runs over the N data points (xi, yi).
(a) Write S(x) = α1 h1(x) + · · · + αD hD(x), where h1(x), . . . , hD(x) form a basis
for the space of cubic splines with these knots. Show that the spline S(x) that
minimizes the sum of squared residuals is given by the coefficient vector
α = (B^T B)^{-1} B^T y, where y is the vector of yi values and B is an N × D
matrix given by Bkℓ = hℓ(xk). (We did this in class, I want you to go through the
details.)
(b) Now show that D from (a) has value D = 6. (Again, we did this in class.)
(c) Show that the six functions h1(x) = 1, h2(x) = x, h3(x) = x^2, h4(x) = x^3,
h5(x) = (x − ζ1)^3_+, and h6(x) = (x − ζ2)^3_+ (where u^3_+ means u^3 for u ≥ 0
and 0 for u < 0) form a basis for all cubic splines with knots
at ζ = 15, 20. To do this you must show (i) each hi(x) is a spline for
the two knots, (ii) the hi(x) cannot be linearly combined to give the
zero function, and (iii) each spline must be a linear combination of these
functions. (Hint: show (i) and (ii) and then use the dimensionality of the
spline space to show (iii))
(d) Now compute the regression spline S(x) and plot it along with the data
points to show the fit. (Typically it is better to do the actual fitting
using a call to lm or using a spline regression package, but it’s good to
go through the process yourself at least once.)
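A possible starting point for part (d), sketched in R; the column names age, spnbmd, and gender are assumptions about BoneMassData.txt (they match the ESL version of the data set) and may need adjusting:

    # Read the data and keep only the female samples, per the problem statement.
    d <- read.table("BoneMassData.txt", header = TRUE)
    d <- d[d$gender == "female", ]
    x <- d$age
    y <- d$spnbmd          # assumed name of the bone mass column

    # Design matrix B from the truncated power basis of part (c), knots at 15 and 20.
    pos <- function(u) pmax(u, 0)
    B <- cbind(1, x, x^2, x^3, pos(x - 15)^3, pos(x - 20)^3)

    # Least squares coefficients alpha = (B^T B)^{-1} B^T y via the normal equations.
    alpha <- solve(t(B) %*% B, t(B) %*% y)

    # Plot the data and the fitted spline S(x).
    xg <- seq(min(x), max(x), length.out = 200)
    Bg <- cbind(1, xg, xg^2, xg^3, pos(xg - 15)^3, pos(xg - 20)^3)
    plot(x, y, xlab = "age", ylab = "bone mass")
    lines(xg, Bg %*% alpha, col = "blue", lwd = 2)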
3. In this problem the true value of the derivative being approximated is f'(0) = 1. Consider the two finite
difference approximations to f'(0) discussed in Sauer Sections 5.1.1 and 5.1.2.
For h = 10^i with i = −20, −19, −18, . . . , −1, 0, calculate both finite differences.
For each h, determine how many digits in the finite difference estimate are correct (you know the true value of the derivative is 1). Note that .99991 is correct
up to 4 digits since 0.999999... = 1. Explain your results in terms of finite difference
(truncation) error and floating point error. DON'T FORGET TO SET options(digits=16).
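A possible R sketch of this computation; since part of the problem statement was garbled above, f(x) = exp(x) is used here purely as an illustrative stand-in satisfying f'(0) = 1, and the two approximations shown are the forward and centered differences from Sauer:

    options(digits = 16)

    # Illustrative stand-in for f; any smooth f with f'(0) = 1 shows the same behavior.
    f <- function(x) exp(x)

    i <- -20:0
    h <- 10^i
    forward  <- (f(h) - f(0)) / h          # two-point forward difference
    centered <- (f(h) - f(-h)) / (2 * h)   # three-point centered difference

    # Compare to the true derivative f'(0) = 1.
    data.frame(h, forward, centered,
               err.forward = abs(forward - 1), err.centered = abs(centered - 1))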
4. F(x) is an important function (essentially the cdf of a normal random variable)
that has no analytic formula and must be evaluated numerically. Write a
function, Fapprox(n, method), that approximates F(∞) by numerical integration. If method is set to the character string "riemann" or "trapezoid", use a
Riemann sum and trapezoid rule method, respectively, with n grid points. If
method is set to "useR", use the R integrate function with subdivisions set
to n. (R’s integrate function uses an adaptive grid method. In these methods, the grid is made dense in regions where the function varies, see Sauer for
further details on such methods.) Since you cannot integrate to ∞, you must
pick some reasonable cutoff. We know that the true value is F(∞) = 1/2.
Consider approximating the integral using each of the three methods; try n = 10, 100, 1000, 10000 and compare the accuracy.
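A minimal sketch of Fapprox, under the assumption (the definition of F(x) is missing from this copy) that F(x) is the integral of the standard normal density from 0 to x, which is consistent with F(∞) = 1/2; the cutoff of 10 used in place of ∞ is an arbitrary but reasonable choice:

    Fapprox <- function(n, method) {
      cutoff <- 10                             # stand-in for infinity; dnorm is negligible beyond this
      x <- seq(0, cutoff, length.out = n)      # n grid points
      dx <- x[2] - x[1]
      if (method == "riemann") {
        sum(dnorm(x[-n]) * dx)                         # left-endpoint Riemann sum
      } else if (method == "trapezoid") {
        sum((dnorm(x[-n]) + dnorm(x[-1])) / 2 * dx)    # trapezoid rule
      } else if (method == "useR") {
        integrate(dnorm, 0, cutoff, subdivisions = n)$value
      }
    }

    # Compare accuracy against the true value F(Inf) = 1/2.
    for (n in c(10, 100, 1000, 10000))
      print(c(n, Fapprox(n, "riemann"), Fapprox(n, "trapezoid"), Fapprox(n, "useR")))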