THE AUSTRALIAN NATIONAL UNIVERSITY
RESEARCH SCHOOL OF FINANCE,
ACTUARIAL STUDIES AND STATISTICS
STAT3008/STAT7001
APPLIED STATISTICS
Assignment 1
Lecturer: Dr Tao Zou
Last Updated: Fri Sep 14 08:59:14 2018
This assignment is due at 11:00 am, Sep 26, 2017.
This assignment is worth 15% of your final grade but is optional and redeemable.
Students are expected to complete this assignment individually. Maximum points:
15.0. No partial credit is awarded for any question. Assignments can only
be submitted via the physical assignment box at the front of the reception
on Level 4, CBE Building (26C). Hard copy submission is required. Late
submission will not be accepted and the weight will roll over to your final exam.
Identical submissions are treated as cheating.
Please follow the instructions for each question exactly and write your short
answers in the answer sheet file on Wattle. You do not need to copy the
questions into the answer sheet. Please submit only your finished answer sheet
and do not paste any unrelated results. The data used in this assignment can be
found on Wattle.
The significance level for all the questions is set to be 0.05.
Course Evaluations: “CourseEvaluations.csv” contains data on course evaluations,
course characteristics, and professor characteristics for 463 courses for the academic
years 2000-2002 at the University of Texas at Austin. These data were provided by
Professor Daniel Hamermesh of the University of Texas at Austin and were used in
his paper with Amy Parker, “Beauty in the Classroom: Instructors’ Pulchritude and
Putative Pedagogical Productivity,” Economics of Education Review, August 2005,
Vol. 24, No. 4, pp. 369-376.
To investigate how the course characteristics and professor characteristics affect
the course evaluations, please use R to answer Questions 1 to 3 in the
answer sheet.
Question 1 (Multiple Linear Regression, 3.0 points)
Consider the multiple linear regression model that regresses the logarithm of
“Course_eval” on all the other variables in the dataset (please do not include
interaction terms for now).
Please answer the following questions in the answer sheet.
a) (1 point) Please use R to fit the model. What is the least squares estimate for
the coefficient of “age” (rounded to four decimal places)? Please also interpret
this estimated coefficient.
b) (1 point) Based on the “summary” function output of this fitted model, what
are the null hypothesis and the alternative hypothesis for the reported
“F-statistic”? What conclusion can you draw from this F-test?
c) (0.5 points) Based on the “summary” function output of this fitted model, if we
control for the other variables, is the mean of log(Course_eval) in the category
“male” significantly different from that in the category “female”?
d) (0.5 points) Please paste the R code for all the above analyses of Question 1 in
the answer sheet.
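
As a rough illustration of the workflow in Question 1, the sketch below fits a log-response regression on a small simulated data frame. The data frame `toy`, its columns `eval_score`, `age` and `female`, and the seed are hypothetical stand-ins. They are not the columns of “CourseEvaluations.csv”, and the numbers have no connection to the assignment's answers.

```r
# Hypothetical toy data; NOT the assignment dataset.
set.seed(1)                                   # arbitrary seed (assumption)
toy <- data.frame(
  age    = round(runif(60, 30, 70)),
  female = factor(sample(c("no", "yes"), 60, replace = TRUE))
)
# Positive response so that the logarithm is defined.
toy$eval_score <- exp(1.3 - 0.002 * toy$age + rnorm(60, sd = 0.1))

# "~ ." regresses the response on every other column, without interactions.
fit <- lm(log(eval_score) ~ ., data = toy)
s   <- summary(fit)

# Least squares estimate for one coefficient, rounded to four decimals.
b_age <- round(coef(fit)["age"], 4)

# Overall F-test (all slope coefficients equal to zero): statistic and p-value.
f      <- s$fstatistic
f_pval <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

# t-test table; the row for a two-level factor compares it with its baseline.
coef_table <- s$coefficients
print(coef_table)
```

With a log-transformed response, `exp(coef(fit)["age"])` is often read as the multiplicative change in the median of the untransformed response per one-unit increase in `age`, other variables held fixed. Whether that phrasing matches the lectures is an assumption.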
Question 2 (Model Diagnostics, 6.0 points)
Consider the multiple linear regression model in Question 1 a). Please answer the
following questions in the answer sheet.
a) (0.5 points) Based on the “summary” function output of the fitted model in
Question 1 a), please interpret the R-squared.
b) (1 point) Please paste the residuals versus fitted values plot of the fitted model
in Question 1 a) in the answer sheet. Are the assumptions of the multiple linear
regression model violated based on this plot?
c) (1 point) Please paste the Q-Q plot of the residuals based on the fitted model in
Question 1 a) in the answer sheet. What conclusions can you draw from the Q-Q
plot?
d) (1 point) Please paste the Cook’s distance plot of the fitted model in Question 1
a) in the answer sheet. Based on the criterion introduced in lectures, are there
any influential observations? Why or why not?
e) (1 point) Please find the observation with the largest Cook’s distance. (Hint: use
the “which” function in R.) Based on the “rule of thumb” cut-offs for the studentized
residual, is this observation an outlier? How would you deal with this suspected
influential observation?
f) (1 point) We have found the observation with the largest Cook’s distance in e).
Based on the “rule of thumb” cut-off for the leverage, does this observation have
distant explanatory variable values? Why or why not?
g) (0.5 points) Please paste the R code for all the above analyses of Question 2 in
the answer sheet.
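
The diagnostics named in Question 2 can be sketched as follows. The example uses the built-in `mtcars` data purely for illustration, not the assignment data. The cut-offs in the code (2 for the studentized residual, 2p/n for the leverage, 1 for Cook's distance) are commonly quoted conventions and are assumptions here; the criteria introduced in lectures take precedence.

```r
# Built-in dataset used purely for illustration; NOT the assignment data.
fit <- lm(log(mpg) ~ wt + hp + qsec, data = mtcars)

# Diagnostic plots: 1 = residuals vs fitted, 2 = normal Q-Q, 4 = Cook's distance.
plot(fit, which = 1)
plot(fit, which = 2)
plot(fit, which = 4)

# Case-influence statistics, one value per observation.
cd  <- cooks.distance(fit)
rs  <- rstudent(fit)       # externally studentized; rstandard() is the internal version
lev <- hatvalues(fit)      # leverages

# Observation with the largest Cook's distance.
i_max <- which(cd == max(cd))

# Commonly quoted cut-offs (assumptions; check against the lecture notes).
p <- length(coef(fit))     # number of regression coefficients, intercept included
n <- nobs(fit)
is_outlier     <- abs(rs[i_max]) > 2
is_highlev     <- lev[i_max] > 2 * p / n
is_influential <- cd[i_max] > 1

print(data.frame(obs = names(i_max), cooks = cd[i_max], rstudent = rs[i_max],
                 leverage = lev[i_max], is_outlier, is_highlev, is_influential))
```

Which flavour of studentized residual the lectures intend is not stated in this sheet, so swapping `rstudent()` for `rstandard()` may be appropriate.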
Question 3 (Multiple Linear Regression for Continuous and Categorical Explanatory
Variables, 3.0 points)
Consider the multiple linear regression model in Question 1 a), but now we would
like to add more explanatory variables. Please answer the following questions in
the answer sheet.
a) (0.5 points) Consider the model and the variables in Question 1 a). Now add
all the pairwise interaction terms among the explanatory variables
“Beauty”, “Female”, “Minority”, “NNenglish”, “intro”, “onecredit”, and “age”
to obtain a new model. Compute and show the sum of squared errors (SSE)
for the fitted model in this question and in Question 1 a), respectively. Which one
is smaller?
b) (0.5 points) Clearly “Minority” is an indicator variable of two categories “non-
White” and “White”. Which category is the baseline level for the model with
interactions constructed in the previous question?
c) (0.5 points) Consider the regression model with the interaction terms suggested
in Question 3 a). Suppose we are interested in testing whether the regression
model for the response log(Course_eval) in the category “a native English
speaker” is significantly different from that in the category “not a native
English speaker” when the other variables are held constant. Please use R to obtain
an appropriate test statistic and the corresponding p-value. What conclusion can
you draw from the result?
d) (0.5 points) Consider the model with interactions in Question 3 a). How do you
interpret the estimated coefficient of the interaction term between “Female”
and “intro”? Is the interaction between “Female” and “intro” significant? Why
or why not?
e) (0.5 points) What is the 90% confidence interval for the coefficient of the
interaction term between “Female” and “intro”? Please round your answer to four
decimal places. Please also interpret the meaning of this confidence interval.
f) (0.5 points) Please paste the R code for all the above analyses of Question 3 in
the answer sheet.
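
The mechanics behind Question 3 (pairwise interactions, SSE, baseline level, a nested-model F-test and a 90% confidence interval) can be sketched on simulated data. The data frame `toy` and its columns `x`, `female`, `intro` and `y` are hypothetical and unrelated to the assignment dataset. The nested-model comparison shown is one possible way to test a whole group of terms, not necessarily the approach expected in the course.

```r
# Hypothetical toy data; NOT the assignment dataset.
set.seed(2)                                   # arbitrary seed (assumption)
n   <- 200
toy <- data.frame(
  x      = rnorm(n),
  female = factor(sample(c("no", "yes"), n, replace = TRUE)),
  intro  = factor(sample(c("no", "yes"), n, replace = TRUE))
)
toy$y <- 1 + 0.3 * toy$x + 0.2 * (toy$female == "yes") + rnorm(n, sd = 0.5)

# Main-effects model, then the model with all pairwise interactions.
fit_main <- lm(y ~ x + female + intro, data = toy)
fit_int  <- lm(y ~ (x + female + intro)^2, data = toy)

# Sum of squared errors (residual sum of squares) for each fit.
sse_main <- sum(resid(fit_main)^2)
sse_int  <- sum(resid(fit_int)^2)

# By default the baseline level of a factor is its first level.
base_female <- levels(toy$female)[1]

# Extra-sum-of-squares F-test that every term involving `female` is zero.
fit_nofem <- lm(y ~ (x + intro)^2, data = toy)
ftest     <- anova(fit_nofem, fit_int)

# 90% confidence interval for one interaction coefficient, four decimals.
ci90 <- round(confint(fit_int, "femaleyes:introyes", level = 0.90), 4)

print(c(sse_main = sse_main, sse_int = sse_int))
print(ftest)
print(ci90)
```

Because the main-effects model is nested in the interaction model, the larger model's SSE can never exceed the smaller model's, so the SSE comparison alone says nothing about whether the added terms are useful.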
Question 4 (Simulation for Multiple Linear Regression, 3.0 points)
Consider the multiple linear regression model
$\mu\{Y \mid X_1, X_2\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2$ for
the observations $\{(Y_i, X_{1,i}, X_{2,i}) : i = 1, \dots, n+1\}$. The least squares
estimates $\hat\beta_0$, $\hat\beta_1$ and $\hat\beta_2$ of the coefficients $\beta_0$,
$\beta_1$ and $\beta_2$ can be obtained from the data
$\{(Y_i, X_{1,i}, X_{2,i}) : i = 1, \dots, n\}$.
Lily wants to use R to generate random samples based on the multiple linear regression
model assumptions. She follows the steps below.
Step 1: Specify $\beta_0 = 2$, $\beta_1 = 1$ and $\beta_2 = -1$.
Step 2: Suppose the observations $X_{1,1}, \dots, X_{1,n+1}$ are $1, 2, \dots, 101$, so $n = 100$.
Step 3: Generate $X_{2,1}, \dots, X_{2,n+1}$ from the $t_3$ distribution. (Hint: use the R function
“rt”.)
Step 4: Generate $E_1, \dots, E_{n+1}$ from the normal distribution with mean 0 and variance
2 [$N(0, 2)$].
Step 5: Generate $Y_i = \mu\{Y_i \mid X_{1,i}, X_{2,i}\} + E_i$, $i = 1, \dots, n+1$.
Step 6: Repeat Steps 4 and 5 1,000 times and obtain 1,000 different datasets of
$\{(Y_i, X_{1,i}, X_{2,i}) : i = 1, \dots, n+1\}$.
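
Steps 1 to 6 can be sketched in R roughly as follows. The seed and the object names (`x1`, `x2`, `mu`, `datasets`) are illustrative assumptions. Note that `rnorm` takes a standard deviation, so a variance of 2 corresponds to `sd = sqrt(2)`, and that only Steps 4 and 5 are repeated, so the explanatory variables stay fixed across the 1,000 datasets.

```r
set.seed(2018)                      # arbitrary seed (assumption), for reproducibility

# Step 1: true coefficients.
beta0 <- 2; beta1 <- 1; beta2 <- -1

# Step 2: X1 takes the values 1, 2, ..., 101, so n = 100.
n  <- 100
x1 <- 1:(n + 1)

# Step 3: X2 is drawn once from the t distribution with 3 degrees of freedom.
x2 <- rt(n + 1, df = 3)

# Mean of the response at each (X1, X2) pair.
mu <- beta0 + beta1 * x1 + beta2 * x2

# Steps 4-6: fresh N(0, 2) errors for each of the 1,000 replications.
n_rep <- 1000
datasets <- lapply(seq_len(n_rep), function(r) {
  e <- rnorm(n + 1, mean = 0, sd = sqrt(2))   # variance 2 => sd = sqrt(2)
  data.frame(y = mu + e, x1 = x1, x2 = x2)
})

str(datasets[[1]])
```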
Part 1. (1.5 points) Lei Li is a friend of Lily. Lily hands over the above 1,000
datasets of $\{(Y_i, X_{1,i}, X_{2,i}) : i = 1, \dots, n\}$ to him but she keeps the observation
$(Y_{n+1}, X_{1,n+1}, X_{2,n+1})$ for each dataset only for herself. She also does not tell him the
true values of $\beta_0$, $\beta_1$ and $\beta_2$. Based on each dataset of
$\{(Y_i, X_{1,i}, X_{2,i}) : i = 1, \dots, n\}$, Lei Li computes the least squares estimates
$\hat\beta_0$, $\hat\beta_1$ and $\hat\beta_2$ as well as the 95% confidence
interval for the mean of response given $X_1 = 2.5$ and $X_2 = 0$. Ultimately, he obtains
1,000 different confidence intervals.
Then Lily computes the mean of response $\mu\{Y \mid X_1 = 2.5, X_2 = 0\}$ and tells Lei Li this
information. Lei Li counts the number of the above 1,000 confidence intervals that
cover $\mu\{Y \mid X_1 = 2.5, X_2 = 0\}$.
Please answer the following questions in the answer sheet.
a) (0.5 points) Suppose you play the roles of both Lily and Lei Li and implement the
above steps in R. Please paste the complete R code for all the above procedures in the
answer sheet.
b) (0.5 points) How many of the confidence intervals cover
$\mu\{Y \mid X_1 = 2.5, X_2 = 0\}$ based on the output after running your R code? Please
answer this question in the answer sheet.
c) (0.5 points) Based on the result of b), interpret the 95% confidence interval for
the mean of response. Please answer this question in the answer sheet.
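
The procedure in Part 1 can be sketched as a coverage simulation. The block is self-contained and regenerates the data itself. The seed and object names are assumptions, and the count it prints depends on the seed, so it should not be read as the expected answer.

```r
set.seed(3008)                     # arbitrary seed (assumption)
beta  <- c(2, 1, -1)               # Step 1: (beta0, beta1, beta2)
n     <- 100
x1    <- 1:(n + 1)                 # Step 2
x2    <- rt(n + 1, df = 3)         # Step 3
mu    <- beta[1] + beta[2] * x1 + beta[3] * x2
n_rep <- 1000

# Lily's side: true mean response at X1 = 2.5, X2 = 0.
new_x   <- data.frame(x1 = 2.5, x2 = 0)
mu_true <- beta[1] + beta[2] * 2.5 + beta[3] * 0

covers <- logical(n_rep)
for (r in seq_len(n_rep)) {
  y   <- mu + rnorm(n + 1, sd = sqrt(2))               # Steps 4-5
  dat <- data.frame(y = y[1:n], x1 = x1[1:n], x2 = x2[1:n])  # Lei Li sees rows 1..n
  fit <- lm(y ~ x1 + x2, data = dat)
  # 95% confidence interval for the mean response at new_x.
  ci  <- predict(fit, newdata = new_x, interval = "confidence", level = 0.95)
  covers[r] <- ci[, "lwr"] <= mu_true && mu_true <= ci[, "upr"]
}

n_cover <- sum(covers)             # number of intervals covering the true mean
print(n_cover)
```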
Part 2. (1.5 points) James is another friend of Lily. Lily hands over the above
1,000 datasets of $\{(Y_i, X_{1,i}, X_{2,i}) : i = 1, \dots, n\}$ and $(X_{1,n+1}, X_{2,n+1})$ to him but
she keeps the observation of response $Y_{n+1}$ for each dataset only for herself. She
also does not tell him the true values of $\beta_0$, $\beta_1$ and $\beta_2$. Based on each dataset of
$\{(Y_i, X_{1,i}, X_{2,i}) : i = 1, \dots, n\}$, James computes the least squares estimates
$\hat\beta_0$, $\hat\beta_1$ and $\hat\beta_2$. Using those estimates and $(X_{1,n+1}, X_{2,n+1})$,
he also calculates the 95% prediction interval of the response $Y_{n+1}$. Ultimately, he
obtains one prediction interval of the response $Y_{n+1}$ for each dataset, and 1,000
different prediction intervals in total.
Then Lily tells James the values of $Y_{n+1}$ for the 1,000 datasets. For each dataset, James
counts “1” if the prediction interval covers the corresponding $Y_{n+1}$; “0”, otherwise.
Since there are 1,000 datasets, James can count the total number of “1”s in the above
procedure.
Please answer the following questions in the answer sheet.
a) (0.5 points) Suppose you play the roles of both Lily and James and implement the
above steps in R. Please paste the complete R code for all the above procedures in the
answer sheet.
b) (0.5 points) What is the total number of “1”s based on the output after running
your R code? Please answer this question in the answer sheet.
c) (0.5 points) Based on the result of b), interpret the 95% prediction interval for
$Y_{n+1}$. Please answer this question in the answer sheet.
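
Part 2 differs from Part 1 only in the target: the interval is for the held-out response $Y_{n+1}$ rather than for a mean, so `predict` is called with `interval = "prediction"`. The sketch below is self-contained, with an assumed seed and assumed object names, and the printed count depends on that seed.

```r
set.seed(7001)                     # arbitrary seed (assumption)
beta  <- c(2, 1, -1)               # Step 1: (beta0, beta1, beta2)
n     <- 100
x1    <- 1:(n + 1)                 # Step 2
x2    <- rt(n + 1, df = 3)         # Step 3
mu    <- beta[1] + beta[2] * x1 + beta[3] * x2
n_rep <- 1000

# James knows the explanatory values of observation n + 1, but not Y_{n+1}.
new_x <- data.frame(x1 = x1[n + 1], x2 = x2[n + 1])

hits <- integer(n_rep)
for (r in seq_len(n_rep)) {
  y    <- mu + rnorm(n + 1, sd = sqrt(2))              # Steps 4-5
  dat  <- data.frame(y = y[1:n], x1 = x1[1:n], x2 = x2[1:n])
  fit  <- lm(y ~ x1 + x2, data = dat)
  # 95% prediction interval for a new response at new_x.
  pint <- predict(fit, newdata = new_x, interval = "prediction", level = 0.95)
  # Lily reveals Y_{n+1}; record 1 if the interval covers it, 0 otherwise.
  hits[r] <- as.integer(pint[, "lwr"] <= y[n + 1] && y[n + 1] <= pint[, "upr"])
}

n_ones <- sum(hits)                # total number of "1"s
print(n_ones)
```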
