Statistics 471 - Homework 4
Due Tuesday March 20, 2018
Homework Assignment Policy and Guidelines
(a) Homework assignments should be well organized and reasonably neat. It is required
that you show your work in order to receive credit.
(b) Unless otherwise stated in a problem, please use R Markdown to write homework
answers.
(c) Homework assignments are due in class unless otherwise noted. Credit will not be
given for homework turned in late.
(d) You may be asked to submit some homework problems online, in addition to a hard
copy that you turn in in class. Such homework problems will be marked [online submission].
Your submission should be combined into one PDF or HTML document.
(e) Unless it is specifically stated otherwise, you may work on and submit your homework
in groups of 1 or 2. If you choose to work as a group of 2, both of you should contribute
significantly to the solution for every question and submit only one copy of the
homework with both your names on it. Whether you submit on your own or with a
partner, discussing homework with your fellow students is encouraged. However, after
discussions, every group must ultimately produce their own homework to be graded.
Verbatim copying of homework is absolutely forbidden.
1. [online submission] Write a function LevenbergMarquardt to implement the
Levenberg-Marquardt algorithm and use it to find the MLE of $\theta = (\theta_1, \theta_2)'$ in the
log-likelihood function (2) on Page 4 of Notes 11 for the Old Faithful data.
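As a rough illustration of the damping idea only, the sketch below applies a Levenberg-Marquardt-style update to a toy normal log-likelihood. It does not use log-likelihood (2) from Notes 11, which is not reproduced here. The function name lm_maximize, its arguments, the starting values, and the factor-of-10 damping schedule are all assumptions made for this example.

```r
# Levenberg-Marquardt-style maximiser for a generic log-likelihood (sketch).
# loglik, grad and hess are user-supplied functions of the parameter vector.
lm_maximize <- function(theta, loglik, grad, hess,
                        lambda = 1e-2, tol = 1e-5, maxiter = 200) {
  for (iter in seq_len(maxiter)) {
    g <- grad(theta)
    if (sqrt(sum(g^2)) < tol) break          # gradient is numerically zero
    # Damped Newton step: subtracting lambda * I shifts the Hessian
    # toward negative definiteness, which suits a maximisation problem.
    step <- solve(hess(theta) - lambda * diag(length(theta)), g)
    candidate <- theta - step
    if (loglik(candidate) > loglik(theta)) {
      theta <- candidate                     # accept, then damp less
      lambda <- lambda / 10
    } else {
      lambda <- lambda * 10                  # reject, then damp more
    }
  }
  list(theta = theta, iter = iter, lambda = lambda)
}

# Toy problem (assumed): normal sample, parameters (mu, s) with s = log(sigma)
set.seed(1)
x <- rnorm(50, mean = 3, sd = 2)
n <- length(x)
loglik <- function(th) -n * th[2] - sum((x - th[1])^2) / (2 * exp(2 * th[2]))
grad <- function(th) {
  e <- exp(2 * th[2])
  c(sum(x - th[1]) / e, -n + sum((x - th[1])^2) / e)
}
hess <- function(th) {
  e <- exp(2 * th[2])
  s1 <- sum(x - th[1])
  ss <- sum((x - th[1])^2)
  matrix(c(-n / e, -2 * s1 / e, -2 * s1 / e, -2 * ss / e), 2, 2)
}

fit <- lm_maximize(c(median(x), log(mad(x))), loglik, grad, hess)
c(mu = fit$theta[1], sigma = exp(fit$theta[2]))
```

For this toy model the result can be compared with the closed-form MLEs, mean(x) and sqrt(mean((x - mean(x))^2)). The same skeleton should carry over to another log-likelihood once its gradient and Hessian are supplied.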
2. Consider the dataset "shuttle.csv" discussed on Pages 17-21 of Lab3.pdf.
(a) Derive formulas for the gradient vector and the Hessian matrix of the log-likelihood
function in Eq. (2), page 18.
(b) [online submission] Write a function oringNR to implement the Newton-Raphson
algorithm to determine the maximum likelihood estimates of the regression coefficients
$\beta_0$ and $\beta_1$ in Eq. (1), page 18. Report the iteration history using the same starting
values as for coordinate ascent.
Your function should be of this form:
oringNR 0 is the nitrogen content beyond which there is
no improvement (increase) in yield.
The following data was collected on nitrogen amount and yield of a particular crop.
data.frame("nitrogen"= c(rep(0,4),rep(30,4),rep(60,4),rep(90,4),rep(120,4)),
"yield"=c(1.41,1.75,2.02,2.13,1.93,2.24,2.29,2.35,2.12,2.38,
2.49,2.57,2.16,2.20,2.28,2.49,2.34,2.45,2.59,2.62))
The first four yield values correspond to zero nitrogen application, the second four correspond
to nitrogen values of 30, etc.
(a) [online submission] Fit a simple linear regression model to the data. Display the
fitted model on a scatterplot of the data.
(b) Determine the form of the gradient vector of the model function
$$f(x; \theta) = \beta_0 + \beta_1 \min(x, N_{\max})$$
with respect to the parameter $\theta = (\beta_0, \beta_1, N_{\max})$. (Hint: For $N_{\max}$ consider the two
cases $x < N_{\max}$ and $x > N_{\max}$.)
(c) [online submission] Write a function LPmodel to fit the linear plateau model using
the Gauss-Newton method discussed in Notes #11. Use the estimated coefficients from
your simple linear regression fit as starting values for $\beta_0$ and $\beta_1$, and $\max(N) - 5$ as
the starting value for $N_{\max}$.
LPmodel=function(Y,N,tol=1e-5,maxiter=100)
{
## compute starting values inside the function
## del=change in sum of squares between iterations
return(list(b0,b1,Nmax,iter,del))
}
(Note: you can check your answer using the nls function in R.)
(d) [online submission] Display the fitted model on the scatterplot. Use different colors
for the simple linear regression and linear plateau model fits and indicate which is
which using a legend.
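The nls check mentioned in the note to part (c) might look like the following sketch. The object names crop, slr, start_vals and fit are assumptions, and pmin() stands in for the min() in the model so that the expression is vectorised. Because the model is not differentiable at the plateau point, nls can be sensitive to the starting values, so its output is a cross-check rather than a guaranteed reference answer.

```r
# Data as given in the problem statement
crop <- data.frame(
  nitrogen = c(rep(0, 4), rep(30, 4), rep(60, 4), rep(90, 4), rep(120, 4)),
  yield = c(1.41, 1.75, 2.02, 2.13, 1.93, 2.24, 2.29, 2.35, 2.12, 2.38,
            2.49, 2.57, 2.16, 2.20, 2.28, 2.49, 2.34, 2.45, 2.59, 2.62)
)

# Simple linear regression supplies the starting values for b0 and b1
slr <- lm(yield ~ nitrogen, data = crop)
start_vals <- list(
  b0 = unname(coef(slr)[1]),
  b1 = unname(coef(slr)[2]),
  Nmax = max(crop$nitrogen) - 5
)

# Linear plateau model: yield rises linearly up to Nmax, then stays flat
fit <- nls(yield ~ b0 + b1 * pmin(nitrogen, Nmax),
           data = crop, start = start_vals)
coef(fit)
```

Comparing coef(fit) with the b0, b1 and Nmax returned by LPmodel, started from the same values, gives a quick sanity check on the Gauss-Newton implementation.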
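4. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Poisson distribution with mean $\lambda$.
(a) Write down the likelihood function for $\lambda$. Show that the sample mean, $\bar{X}_n$, is the
maximum likelihood estimator for $\lambda$.
(b) The score statistic for testing $H_0 : \lambda = \lambda_0$ against a two-sided alternative is
$$S = \left( \frac{\sum_i X_i - n\lambda_0}{\sqrt{n\lambda_0}} \right)^2 = \left( \frac{\bar{X}_n - \lambda_0}{\sqrt{\lambda_0 / n}} \right)^2. \tag{1}$$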
Explain why this statistic has a chi-squared distribution (approximately) if H0 is true.
Determine its degrees of freedom.
(c) What is the corresponding likelihood ratio statistic?
(d) Show that the endpoints of a confidence interval for $\lambda$ obtained by inverting the score
test are the solutions of a quadratic equation. Show that the endpoints are always
positive unless all the data values are zero.
(e) [online submission] Create a function to determine the endpoints of a confidence
interval for $\lambda$ obtained by inverting the score test. This function should be of the form:
Score_interval = function(data,alpha)
{
#R code
return(c(lower,upper))   # return() accepts a single object, so combine the endpoints
}
where "data" is the vector containing the sample values, and "alpha" is the desired
confidence level.
(f) [online submission] Create a function to determine the endpoints of an approximate
confidence interval for $\lambda$ by inverting the likelihood ratio test. You will need to
determine the endpoints numerically, e.g. using the Newton-Raphson method discussed
in class. (Consider calling the function Score_interval to get starting values for the
endpoints.) The function should return warnings if all the data values are zero, if there
are negative values, or if some of the values are not integers. Your function should be
of the form:
LR_interval = function(data,alpha,tol=1e-9,max.iter=100)
{
#R code
return(c(lower,upper))   # return() accepts a single object, so combine the endpoints
}
where "data" is the vector containing the sample values, and "alpha" is the desired
confidence level.
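One way to sanity-check the two intervals in parts (e) and (f) is sketched below. It makes several assumptions that are not in the assignment: alpha is treated as the significance level (0.05 for a 95% interval), the Poisson mean is written lambda, the function names score_interval_sketch, lr_stat and lr_interval_check are hypothetical, and the likelihood ratio endpoints are located with uniroot() rather than the Newton-Raphson iteration the problem asks for. Input problems are handled with stopifnot() here, whereas the assignment asks for warnings.

```r
# Score interval: roots of n * (xbar - lambda)^2 / lambda = crit,
# which is a quadratic in lambda with a closed-form solution
score_interval_sketch <- function(data, alpha = 0.05) {
  n <- length(data)
  xbar <- mean(data)
  crit <- qchisq(1 - alpha, df = 1)
  centre <- xbar + crit / (2 * n)
  half <- sqrt(crit * xbar / n + crit^2 / (4 * n^2))
  c(lower = centre - half, upper = centre + half)
}

# Likelihood ratio statistic for H0: lambda = lambda0 (needs mean(data) > 0)
lr_stat <- function(lambda0, data) {
  n <- length(data)
  xbar <- mean(data)
  2 * n * (xbar * log(xbar / lambda0) - xbar + lambda0)
}

# LR interval via uniroot(), used as a cross-check on a Newton-Raphson version
lr_interval_check <- function(data, alpha = 0.05) {
  stopifnot(all(data >= 0), all(data == round(data)), sum(data) > 0)
  xbar <- mean(data)
  crit <- qchisq(1 - alpha, df = 1)
  f <- function(lambda) lr_stat(lambda, data) - crit
  # f is positive near zero and negative at xbar, so a root lies between
  lower <- uniroot(f, c(1e-300, xbar), tol = 1e-10)$root
  # f increases to the right of xbar; let uniroot extend the interval upward
  upper <- uniroot(f, c(xbar, xbar + 1), extendInt = "upX", tol = 1e-10)$root
  c(lower = lower, upper = upper)
}

x <- c(2, 0, 3, 1, 4, 2, 1, 0, 3, 2)   # made-up counts for illustration
score_interval_sketch(x)
lr_interval_check(x)
```

Both intervals should contain the sample mean, and plugging each endpoint back into the corresponding test statistic should reproduce the chi-squared critical value.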
5. [online submission] The file "horsekicks.csv" has three columns representing
the number of deaths by horse kick in the Prussian Army, the year
(1875 to 1894), and the Corps. There are numerous web sites that discuss
this data; for example,
http://mindyourdecisions.com/blog/2013/06/21/what-do-deaths-from-horse-kicks-have-to-do-with-statistics/
and
http://blog.minitab.com/blog/quality-data-analysis-and-statistics/no-horsing-around-with-the-poisson-distribution-troops.
(a) Use the tapply function to determine the total number of deaths by horse kick per
year.
(b) Use your LR_interval function from Problem 4(f) to determine an approximate 95%
confidence interval for $\lambda$, the expected number of deaths by horse kick per year, assuming
the counts (y in the data set) are a random sample from the Poisson distribution.
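For part (a), the tapply pattern can be tried on a small made-up data frame first. In the sketch below the values are invented and the column names year and corps are assumptions; only the count column y is named in the problem. Check them against names() of the data frame read from "horsekicks.csv".

```r
# Toy stand-in for the horse kick data (values are invented)
kicks <- data.frame(
  year  = c(1875, 1875, 1876, 1876, 1877, 1877),
  corps = c("I", "II", "I", "II", "I", "II"),
  y     = c(0, 2, 1, 0, 3, 1)
)

# Sum the counts y within each level of year
deaths_per_year <- tapply(kicks$y, kicks$year, sum)
deaths_per_year
```

The result is a named array with one total per year, which can be passed directly as the data argument in part (b).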