Due at the beginning of class on Wednesday January 24, 2018
All homework pages (except the top sheet) must be stapled (before you come to class).
The first page (‘top sheet’) should contain ONLY your name, student ID, discussion section,
and homework number. Use the format shown below. Do NOT staple it to the rest of your
homework.
The second page (which means a new sheet of paper, so not the back side of the first page)
should ALSO contain your name, student ID, discussion section, and homework number.
Use the format shown below.
After I collect homeworks I put all of the top sheets into a folder before passing the
homeworks to the grader. If at any point during the quarter there is a homework that you know
you turned in, but it does not show on Canvas, contact me. I will look to see if I have your
top sheet from the homework. If I have your top sheet then I will give you credit for the
homework.
It is your responsibility to make sure every homework assignment you submit has a top sheet
with your correct discussion section number. If you tell me you turned in a homework, but
there is no Canvas grade and no top sheet in my folder then you will get a 0.
You will not lose any points for not making a top sheet. But if your homework goes missing
you will have no way to prove you turned it in.
On both the first page (‘top sheet’) and second page write your name and student ID on the
top left, homework number on the top center, and section on the top right. For example,
for homework 1 if your name is John Smith, your student ID is 123456789, and you are in
section A01, then the top of your first page should look like this:

John Smith                         Homework 1                         A01
123456789

Rule                                                          Points lost if you don't follow the rule
Correct format for name, ID, homework and section number      1
Staple all pages EXCEPT the top sheet                         1
If your homework is on paper pulled out of a notebook,        1
cut off all of the fringes (from the torn horizontal
threads that attached the paper to the notebook)

Be kind to the grader.
make sure you write your name clearly (so it is easy to read)
write neatly
circle all final answers (so they are easy to find)
PART A
Use the same data you used for homework 1.
Here x = age, y = blood pressure, and n = 25 (the number of $(x_i, y_i)$ pairs). It is always a
good idea when importing data into R to make sure the data has been correctly imported. One way
to do this is with a quick check of the sample means, which are $\bar{x} = 45.44$ and $\bar{y} = 127$.
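The import check described above can be sketched as follows. This is only an illustration: the data frame `dat`, its column names `age` and `bp`, and the helper `check_import` are hypothetical, and the four toy rows stand in for the real homework 1 file.

```r
# Hypothetical toy data (NOT the homework 1 data); column names are assumptions.
dat <- data.frame(age = c(30, 40, 50, 60),
                  bp  = c(110, 120, 130, 140))
x <- dat$age
y <- dat$bp

# Hypothetical helper: compare n and the sample means with the values you expect.
check_import <- function(x, y, n.expected, xbar.expected, ybar.expected, tol = 0.01) {
  c(n.ok    = length(x) == n.expected && length(y) == n.expected,
    xbar.ok = abs(mean(x) - xbar.expected) < tol,
    ybar.ok = abs(mean(y) - ybar.expected) < tol)
}

# For the toy data the expected values are n = 4, mean(x) = 45, mean(y) = 125.
toy.check <- check_import(x, y, 4, 45, 125)
print(toy.check)
```

With the real data the call would be `check_import(x, y, 25, 45.44, 127)`, and all three entries should be TRUE. The default tolerance of 0.01 is an assumption; loosen it if the reported means are rounded.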
The simple linear regression model for this data is
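$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad \text{for } i = 1, \ldots, n \qquad (1)$$
where $Y_i$ is a random variable for the outcome of the $i$th experimental unit, $\beta_0$ and
$\beta_1$ are parameters, and $\varepsilon_i$ is a random variable. We make the following assumptions.
1. $E(\varepsilon_i) = 0$ for $i = 1, \ldots, n$
2. $\varepsilon_1, \ldots, \varepsilon_n$ have constant variance, which we define as
$\sigma^2(\varepsilon_i) = \sigma^2$ for $i = 1, \ldots, n$
3. The $\varepsilon_1, \ldots, \varepsilon_n$ are uncorrelated. That is, for any $i \neq j$ we have $\sigma(\varepsilon_i, \varepsilon_j) = 0$.
4. $\varepsilon_i \sim N$ for $i = 1, \ldots, n$ (that is, the $\varepsilon_i$ are all normally distributed)
Let $b_0$ and $b_1$ be the least squares estimates of $\beta_0$ and $\beta_1$.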
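As a reminder of how the least squares estimates and MSE can be obtained in R, here is a minimal sketch. It uses the standard closed-form least squares formulas, which this handout does not restate, and a small made-up dataset (NOT the homework data). The names `Sxx` and `Sxy` are introduced only for illustration.

```r
# Hypothetical toy data (NOT the homework 1 data).
x <- c(20, 30, 40, 50, 60)
y <- c(112, 118, 127, 131, 142)
n <- length(x)

# Sums of squares and cross-products about the means.
Sxx <- sum((x - mean(x))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))

# Least squares estimates of the slope and intercept.
b1 <- Sxy / Sxx
b0 <- mean(y) - b1 * mean(x)

# Residuals and mean squared error (n - 2 degrees of freedom).
e   <- y - (b0 + b1 * x)
MSE <- sum(e^2) / (n - 2)

# Cross-check against R's built-in linear model fit.
fit <- lm(y ~ x)
print(all.equal(unname(coef(fit)), c(b0, b1)))
print(all.equal(summary(fit)$sigma^2, MSE))
```

Computing the estimates by hand and then comparing with `lm()` is a cheap way to catch a typo in either approach.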
PART A QUESTIONS
1. Find the 99% confidence interval for $\beta_1$.
2. Using the confidence interval you calculated in question 1, do you reject or fail to reject
the null hypothesis $H_0: \beta_1 = 0$ if the significance level is $\alpha = 0.01$? (Use a two-sided
test, so the alternative hypothesis is $H_a: \beta_1 \neq 0$.) How did you make your decision?
3. Look back at the answer to question 10(e) on homework 1. Say why you get different
decisions (i.e. rejecting or failing to reject $H_0$) on question 2 in this homework and
question 10(e) in homework 1.
4. We found in question 10(d) on homework 1 p-value = 0.043 for testing $H_0: \beta_1 = 0$.
Using this information, will 0 be inside or outside of the 95% confidence interval for $\beta_1$? You
should answer this question without actually computing the 95% confidence interval,
but you may do the calculation to verify your answer is correct.
5. We found in question 10(d) on homework 1 p-value = 0.043 for testing $H_0: \beta_1 = 0$.
Using this information, will 0 be inside or outside of the 90% confidence interval for $\beta_1$? You
should answer this question without actually computing the 90% confidence interval,
but you may do the calculation to verify your answer is correct.
6. Find the 85% confidence interval for $E(Y \mid x = 40)$. In book notation this is the
confidence interval for $E(Y_h)$ with $x_h = 40$. The point estimate of $E(Y \mid x = 40)$ is $\hat{y}_h$ and the
estimate of the variance of this estimate is
$$s^2(\hat{y}_h) = MSE \left[ \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right]$$
Here is some R code to help you get started with these types of questions. The following
assumes you have already created the variables x, b0, b1, MSE and n (see homework 1).
#GET PREDICTED VALUE FOR x=40
pred40 = b0 + b1*40
#GET VARIANCE OF THE PREDICTED VALUE
x.h=40
var.EY40 = MSE*( 1/n + (x.h - mean(x))^2/sum( (x-mean(x))^2 ) )
#GET CRITICAL VALUE
alpha=1-0.85
CV=qt(1-alpha/2,n-2)
#GET LL=LOWER LIMIT AND UL=UPPER LIMIT OF THE CONFIDENCE INTERVAL
LL = pred40 - CV*sqrt(var.EY40)
UL = pred40 + CV*sqrt(var.EY40)
c(LL,UL)
7. Find the 85% confidence interval for $E(Y \mid x = 60)$.
8. For the two confidence intervals you calculated in questions 6 and 7, which one is wider?
What is the explanation for why the interval is wider?
9. Find the 85% prediction interval for $Y_{h(new)}$ for $x_h = 40$. The estimate of the variance
for the predicted value is
$$s^2(\text{pred}) = MSE + s^2(\hat{y}_h) = MSE \left[ 1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right]$$
10. Explain the difference between the 85% confidence interval you calculated in question
6 and the 85% prediction interval you calculated in question 9.
11. Which is wider: the 85% confidence interval you calculated in question 6 or the 85%
prediction interval you calculated in question 9?
12. If I gave you a new dataset, would you get the same answer for question 11? In other
words, will one of the two intervals (the 85% confidence interval or the 85% prediction
interval) always be wider?
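The interval calculations in Part A all follow the same pattern: point estimate plus or minus a t critical value times a standard error. The sketch below applies that pattern to a small made-up dataset (NOT the homework data) for a slope interval, a confidence interval for the mean response, and a prediction interval, then cross-checks the hand calculations against R's built-in `confint()` and `predict()`. The names `ci.b1`, `ci.EY` and `pi.new` are illustrative only.

```r
# Hypothetical toy data (NOT the homework 1 data).
x <- c(20, 25, 30, 35, 40, 45, 50, 55, 60, 65)
y <- c(110, 118, 115, 122, 128, 125, 133, 130, 140, 138)
n <- length(x)

# Least squares fit by hand (standard formulas) and MSE.
Sxx <- sum((x - mean(x))^2)
b1  <- sum((x - mean(x)) * (y - mean(y))) / Sxx
b0  <- mean(y) - b1 * mean(x)
MSE <- sum((y - (b0 + b1 * x))^2) / (n - 2)

# 99% confidence interval for the slope, using s^2(b1) = MSE / Sxx.
CV99  <- qt(1 - 0.01 / 2, n - 2)
ci.b1 <- b1 + c(-1, 1) * CV99 * sqrt(MSE / Sxx)

# 85% confidence interval for E(Y | x = x.h) and 85% prediction interval.
x.h      <- 40
pred     <- b0 + b1 * x.h
var.EY   <- MSE * (1 / n + (x.h - mean(x))^2 / Sxx)
var.pred <- MSE + var.EY
CV85     <- qt(1 - 0.15 / 2, n - 2)
ci.EY    <- pred + c(-1, 1) * CV85 * sqrt(var.EY)
pi.new   <- pred + c(-1, 1) * CV85 * sqrt(var.pred)

# Cross-check with the built-in functions.
fit    <- lm(y ~ x)
new.x  <- data.frame(x = x.h)
r.b1   <- confint(fit, "x", level = 0.99)
r.conf <- predict(fit, new.x, interval = "confidence", level = 0.85)
r.pred <- predict(fit, new.x, interval = "prediction", level = 0.85)
print(rbind(ci.b1, ci.EY, pi.new))
```

The built-in functions are convenient for checking work, but writing out the formula once makes it clear where each term in the standard error comes from.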
PART B
1. $s^2(b_1)$ is an estimate of the variance of $b_1$ (the least squares estimate of $\beta_1$):
$$s^2(b_1) = \frac{MSE}{\sum_{i=1}^{n} (x_i - \bar{x})^2},$$
which is an estimate of the variance of $b_1$,
$$\sigma^2(b_1) = \frac{\sigma^2(\varepsilon_i)}{\sum_{i=1}^{n} (x_i - \bar{x})^2}. \qquad (2)$$
Which of the four assumptions given in Part A are necessary for equation (2) to be
the correct formula? Hint: see 1/16/18 discussion.
2. Show that the following relationship holds.
$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i (y_i - \bar{y})$$
3. Show that the following relationship holds.
$$\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat{y}_i$$
You may use the fact that
$$\sum_{i=1}^{n} e_i = 0.$$
4. Show that for the simple linear regression model, the point $(\bar{x}, \bar{y})$ is always on the
regression line. Specifically, this means you need to show that
$$\bar{y} = b_0 + b_1 \bar{x}$$
You may NOT use the relationship $b_0 = \bar{y} - b_1 \bar{x}$.
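Before writing the algebra for questions 2 to 4, it can help to confirm numerically that each relationship holds. The sketch below does this on a small made-up dataset (NOT the homework data). A numerical check on one dataset is not a proof and does not replace the derivations asked for above.

```r
# Hypothetical toy data (NOT the homework 1 data).
x <- c(20, 30, 40, 50, 60)
y <- c(112, 118, 127, 131, 142)

# Fit with lm() so b0 and b1 come from R's own least squares solver.
fit  <- lm(y ~ x)
b0   <- unname(coef(fit)[1])
b1   <- unname(coef(fit)[2])
yhat <- fitted(fit)
e    <- resid(fit)

# Question 2: the two cross-product sums agree.
lhs2 <- sum((x - mean(x)) * (y - mean(y)))
rhs2 <- sum(x * (y - mean(y)))
print(all.equal(lhs2, rhs2))

# Question 3: observed and fitted values have the same sum (residuals sum to 0).
print(all.equal(sum(y), sum(yhat)))
print(abs(sum(e)) < 1e-8)

# Question 4: the point (mean(x), mean(y)) lies on the fitted line.
print(all.equal(mean(y), b0 + b1 * mean(x)))
```

`all.equal()` is used instead of `==` because floating-point sums rarely match exactly.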