Using R for Economics and Statistics 2018 Take-Home Exam
The due date for this exam is 3 July, 2018. The exam solution should be in R
Markdown form. You are also required to submit a hard copy of the html or
pdf file generated from your R Markdown file.
1. (20 points) In this exercise you need to write R programs to examine the finite sample
properties of the OLS estimates in the following models:
Model 1: Y_t = β_0 + β_1 X_t + ε_t, ε_t ~ i.i.d. N(0, 1)
Model 2: Y_t = β_0 + β_1 X_t + ε_t, ε_t ~ i.i.d. t_4 (t distribution with 4 degrees of freedom)
Obtain the histograms of β̂_0 and β̂_1 based on 10,000 replications of simulated data with
the sample size being 25 and 100, respectively. In all cases, simulate the X_t from N(10, 1)
and fix them in repeated samples. Examine how each histogram changes with the
sample size and the error distribution. Test for the normality of β̂_1 in all four
cases (namely Model 1 with n = 25, 100 and Model 2 with n = 25, 100). Write a short
paragraph summarizing what you observe from the experiment.
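A minimal sketch of the simulation design (illustrative only: the true values β_0 = 1 and β_1 = 2, the reduced replication count, and the use of `shapiro.test()` are my own choices; note that `shapiro.test()` accepts at most 5,000 observations, so at the exam's full 10,000 replications you would need a subsample or a different normality test):

```r
## Simulate one (model, n) design with X fixed in repeated samples,
## then inspect the sampling distribution of the OLS estimates.
set.seed(123)
sim_ols <- function(n, R = 1000, err = c("normal", "t4")) {
  err <- match.arg(err)
  x <- rnorm(n, mean = 10, sd = 1)   # X_t ~ N(10, 1), fixed across replications
  b <- replicate(R, {
    e <- if (err == "normal") rnorm(n) else rt(n, df = 4)
    y <- 1 + 2 * x + e               # illustrative beta_0 = 1, beta_1 = 2
    coef(lm(y ~ x))
  })
  t(b)                               # R x 2 matrix of (beta0-hat, beta1-hat)
}

b <- sim_ols(25, err = "t4")
hist(b[, 2], breaks = 40, main = "beta1-hat: Model 2, n = 25")
shapiro.test(b[, 2])                 # one possible normality test
```

Repeating the call for each (model, n) combination gives the four histograms and normality tests the question asks for.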
2. (20 points) In this exercise you need to write an R program to examine the finite sample
properties of the OLS estimate of ρ, and in particular its bias, in the
following model:
Y_t = ρ Y_{t−1} + ε_t, ε_t ~ i.i.d. N(0, 1), Y_0 ~ N(0, 1/(1 − ρ²))
based on 5,000 replications using simulated data, each with a sample size of 20 (make
sure to use the same set of random seeds for different values of ρ when generating data).
Set the parameter ρ at
ρ = 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 0.995.
Plot the bias of the OLS estimates obtained from the simulation. Plot the bias function
(−2ρ/n) in the same graph. Discuss the results.
(Hint: The MLE of ρ has no exact closed-form distribution even under the normality
assumption on the error term ε_t. There is a big difference between the OLS estimator under the iid
(independent and identically distributed) assumption and under the time series assumption: in the
iid case the OLS estimator is BLUE, but not in the time series case. Phillips (1977, ECTA)
found an Edgeworth expansion to approximate the finite sample distribution of ρ̂_OLS,
which does not have a closed-form expression. White (1961, Biometrika) showed
E(ρ̂_OLS) − ρ ≈ −2ρ/n. That is why we say ρ̂_OLS is a biased estimator in finite samples
but an asymptotically unbiased estimator as n → ∞. When the sample size is not very
large, the bias may not be small. And when ρ > 0, the bias is always negative, which
is referred to as downward bias.)
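The exercise above can be sketched as follows (a reduced ρ grid and replication count are used here for speed; reseeding before each ρ reuses the same draws across parameter values, as the question requires):

```r
## Bias of the AR(1) OLS estimator vs. White's -2*rho/n approximation.
n <- 20; R <- 2000
rhos <- c(0, 0.3, 0.6, 0.9)                       # subset of the full grid
bias <- sapply(rhos, function(rho) {
  set.seed(2018)                                  # same seeds for every rho
  est <- replicate(R, {
    y <- numeric(n + 1)
    y[1] <- rnorm(1, 0, sqrt(1 / (1 - rho^2)))    # Y_0 ~ N(0, 1/(1 - rho^2))
    for (t in 2:(n + 1)) y[t] <- rho * y[t - 1] + rnorm(1)
    sum(y[-1] * y[-(n + 1)]) / sum(y[-(n + 1)]^2) # OLS slope, no intercept
  })
  mean(est) - rho
})
plot(rhos, bias, type = "b", ylab = "bias of rho-hat")
lines(rhos, -2 * rhos / n, lty = 2)               # White (1961) approximation
```

The simulated bias curve should track the −2ρ/n line, drifting below it as ρ approaches 1.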
3. (20 points) It is pretty rare to find something that behaves linearly in an environmental
system. The Y/X response may not be a straight line; humped, asymptotic,
sigmoidal, or polynomial responses are possible, and truly non-linear. In this exercise, we will
take a closer look at how polynomial regression works and practice with a case
study. There are three common patterns in data exploration: concave
(power and exponential), S-shaped (sigmoidal and logistic), and peaks and valleys
(polynomials). There are other patterns, but for now we will stick to those three.
Polynomials incorporate a predictor variable by representing it
with multiple instances of itself in successively higher orders.
Here, we use ecological data (Peake and Quinn, 1993) to investigate abundance
effects for invertebrates living in mussel beds in intertidal areas. A possible variable
configuration: response variable = number of invertebrates (INDIV); explanatory variable
= the area of each clump (AREA); additional possible response variable = species
richness of invertebrates (SPECIES).
(1) Load the data set and look at its structure, particularly normality. What's
the best guess based on the scatter plot?
(2) Add polynomial terms for the AREA variable up to the 3rd order.
(3) From (1) and (2), you have three models already; which one is the best? Use some
information criterion, such as AIC or BIC; if you know more, even better.
(4) Now let us treat the order of the polynomial as a tuning parameter. Split your data
into two parts, one for training and one for testing. Further split the training set
into training and validation sets, and use cross validation (10-fold and leave-one-out cross
validation, LOOCV) to determine which order is best, that is, which one has the
best cross-validated mean squared error.
(5) Using the best model from (4), predict the test set and compute the mean
squared prediction error (MSPE).
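A sketch of steps (2)–(4), using simulated stand-in data because the Peake and Quinn (1993) data set is not bundled with base R (the data-generating equation below is invented for illustration):

```r
library(boot)                                   # for cv.glm()

set.seed(1)
AREA  <- runif(50, 1, 100)
INDIV <- 5 + 0.8 * AREA - 0.008 * AREA^2 + rnorm(50, sd = 5)  # fake humped response
df <- data.frame(AREA, INDIV)

## (2)-(3): polynomial terms up to order 3, compared by AIC
fits <- lapply(1:3, function(k) lm(INDIV ~ poly(AREA, k), data = df))
sapply(fits, AIC)                               # smaller is better

## (4): treat the order as a tuning parameter via 10-fold CV
## (set K = nrow(df) for LOOCV)
cv_err <- sapply(1:3, function(k) {
  fit <- glm(INDIV ~ poly(AREA, k), data = df)
  cv.glm(df, fit, K = 10)$delta[1]
})
which.min(cv_err)                               # CV-chosen order
```

For the real exercise, run the CV loop on the training split only and reserve the test split for the MSPE in (5).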
4. (20 points) Consider the time series bjsera.txt of Box and Jenkins (1976). There are
197 observations. Suppose that we use the first 150 observations to perform model
estimation and the last 47 observations for forecasting evaluation. Two models are
used. The first model is an ARMA(1,1) and the second model an AR(7).
(a) Write your own R function to compute the mean squared forecast error of 1-step-ahead
out-of-sample forecasts for the two models. [You should re-estimate the model
at each forecast origin (using the data from the beginning to the forecast origin), starting
with t = 150. This is the growing window case.]
(b) Same as (a), but this time use the fixed window method.
(c) Based on the analysis you have performed, which model is more adequate? Why?
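One possible way to structure such a function (a sketch: the helper name `msfe` and its arguments are my own, and reading bjsera.txt is left commented out since the file format may need adjusting):

```r
## Mean squared 1-step-ahead forecast error, re-estimating at each origin.
msfe <- function(y, order, origin = 150, window = c("growing", "fixed")) {
  window <- match.arg(window)
  n <- length(y)
  err <- numeric(n - origin)
  for (i in origin:(n - 1)) {
    start <- if (window == "growing") 1 else i - origin + 1  # fixed width = origin
    fit <- arima(y[start:i], order = order)
    err[i - origin + 1] <- y[i + 1] - predict(fit, n.ahead = 1)$pred
  }
  mean(err^2)
}

# y <- scan("bjsera.txt")
# msfe(y, c(1, 0, 1))                    # ARMA(1,1), growing window
# msfe(y, c(7, 0, 0), window = "fixed")  # AR(7), fixed window
```

Comparing the two models' MSFE under both window schemes gives the evidence needed for (c).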
5. (20 points) (Lasso and Ridge regression) In this exercise, you are required to use Lasso
and Ridge regression to analyze the "Hitters" data from the package "ISLR".
Use all the variables in "Hitters" except "Salary" to predict "Salary".
(a) Prepare the data. Delete the observations with missing values and construct a matrix
including all the predictor variables. Note that you should convert all the non-numerical
variables to numeric, because `glmnet()` can only take numerical, quantitative inputs.
(b) By default the `glmnet()` function performs ridge regression for an automatically
selected range of λ values. However, here you are required to implement the function
over a grid of values ranging from λ = 10^10 to λ = 10^−2, essentially covering the
full range of scenarios from the null model containing only the intercept to the least
squares fit.
(c) Compare the L2 norm (the square root of the sum of the squared parameters) of the
parameters for λ = 50 and λ = 1000. Comment on your results.
(d) Split the data into train and test parts, predict the test data with λ = 4, and
compute the mean squared error. Please also run an OLS regression to predict the
test data and compare the result with the ridge regression with λ = 4.
(e) Instead of arbitrarily choosing λ = 4, here you are required to use cross-validation
to choose the tuning parameter λ.
(f) Choose the λ that results in the smallest cross-validation error from (e) to predict
the test set and compute the mean squared error, then compare it with the results
in (e).
(g) Now perform the lasso regression on the training set, use cross validation
to choose the λ which minimizes the mean squared error, and predict the test set.
Compare it with (f).
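A condensed sketch of parts (a)–(f) following the standard `glmnet` workflow (assumes the ISLR and glmnet packages are installed; the 50/50 split and the seed are arbitrary choices):

```r
library(ISLR); library(glmnet)

## (a) drop missing values; model.matrix converts factors to dummy variables
hit <- na.omit(Hitters)
x <- model.matrix(Salary ~ ., hit)[, -1]         # drop the intercept column
y <- hit$Salary

## (b) ridge (alpha = 0) over lambda from 1e10 down to 1e-2
grid <- 10^seq(10, -2, length = 100)
ridge <- glmnet(x, y, alpha = 0, lambda = grid)

## (c) the L2 norm of the coefficients shrinks as lambda grows
sqrt(sum(coef(ridge, s = 50)[-1]^2))
sqrt(sum(coef(ridge, s = 1000)[-1]^2))

## (d)-(f) train/test split, CV choice of lambda, test MSE
set.seed(1)
train <- sample(nrow(x), nrow(x) / 2)
cv <- cv.glmnet(x[train, ], y[train], alpha = 0)
pred <- predict(cv, s = cv$lambda.min, newx = x[-train, ])
mean((pred - y[-train])^2)                       # alpha = 1 gives the lasso, part (g)
```

Rerunning the last block with `alpha = 1` (and `s = 4` in place of `lambda.min` for part (d)) covers the remaining comparisons.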
