Stats 413: Applied Regression Analysis
Problem sets are due in lecture on the due date (please print out and submit in class). For
problems that require programming, please properly comment your code and submit it together
with any output. You are encouraged to collaborate on problem sets with classmates, but the final
write-up (including any code) must be your own.
1. Height and weight data (by Sanford Weisberg)
The data Htwt from http://users.stat.umn.edu/~sandy/alr4ed/data/ give "ht" = height
in centimeters and "wt" = weight in kilograms for a sample of n = 10 18-year-old girls.
Interest is in predicting weight from height.
(a) Draw a scatter plot of "wt" on the vertical axis versus "ht" on the horizontal axis. On
the basis of this plot, does a simple linear regression model make sense for these data?
Why or why not?
(b) Compute estimates of the slope and the intercept for the regression of "wt" on "ht".
Draw the fitted line on your scatterplot.
(c) Obtain the estimate of $\sigma^2$ and find the estimated standard errors of $\hat\beta_0$ and $\hat\beta_1$. Compute
the t-tests for the hypotheses that $\beta_0 = 0$ and $\beta_1 = 0$ and find the appropriate p-values
using two-sided tests.
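The quantities in parts (b) and (c) all have closed-form expressions in simple linear regression. The following is a minimal sketch in plain Python on made-up numbers (not the Htwt data); the helper name `simple_ols` and the sample values are illustrative assumptions. It stops at the t-statistics: a two-sided p-value additionally requires the CDF of the t distribution with n - 2 degrees of freedom, which the Python standard library does not provide.

```python
import math

def simple_ols(x, y):
    """Least-squares fit of y on x with an intercept (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual sum of squares and the unbiased variance estimate (n - 2 df).
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)

    # Estimated standard errors of the two coefficients.
    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + x_bar ** 2 / sxx))

    return {
        "intercept": intercept,
        "slope": slope,
        "sigma2": sigma2,
        "se_intercept": se_intercept,
        "se_slope": se_slope,
        "t_intercept": intercept / se_intercept,  # tests beta_0 = 0
        "t_slope": slope / se_slope,              # tests beta_1 = 0
    }

# Made-up illustrative data, NOT the Htwt measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = simple_ols(x, y)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

Each t-statistic is compared against a t distribution with n - 2 degrees of freedom; for the homework itself, the same numbers should match the coefficient table of whatever regression routine you use.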
2. Multi-task regression (by Andrew Ng)
Thus far, we only considered regression with scalar-valued responses. In some applications,
the response is itself a vector: $y_i \in \mathbb{R}^p$. We posit that the relationship between the features and
the vector-valued response is linear:
$$y_i^T \approx x_i^T B,$$
where $B \in \mathbb{R}^{d \times p}$ is a matrix of regression coefficients.
(a) Express the sum of squared residuals (SSR) in matrix notation (i.e. without using any
summations).
Hint: work out how to express the SSR in terms of
(b) Find the matrix of regression coefficients that minimizes the SSR.
(c) Instead of minimizing the SSR, we break up the problem into p regression problems with
scalar-valued responses. That is, we fit p linear models of the form
$$(y_i)_k \approx x_i^T \beta_k,$$
where $\beta_k \in \mathbb{R}^d$. How do the regression coefficients from the p separate regressions
compare to the matrix of regression coefficients that minimizes the SSR?
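A numerical sanity check can accompany the derivations in this problem. The sketch below assumes the features are stacked into an n x d matrix X and the responses into an n x p matrix Y, that X^T X is invertible, and that the joint fit is obtained from the normal equations X^T X B = X^T Y. All names (`transpose`, `matmul`, `solve`) and all data values are made up for illustration. It fits the responses jointly and one column at a time, then reports the largest entrywise gap between the two coefficient matrices. This is a check on one small example, not a proof.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(u * v for u, v in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a @ z = b by Gauss-Jordan elimination with partial pivoting.

    a is d x d and b is d x p, both given as lists of rows.
    """
    d = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(d)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_val = aug[col][col]
        aug[col] = [v / pivot_val for v in aug[col]]
        for r in range(d):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]

# Made-up data: n = 5 observations, d = 2 features, p = 2 responses.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.5], [1.0, 4.0]]
Y = [[1.2, 0.3], [2.1, -0.4], [2.9, -1.1], [4.2, -2.4], [5.1, -2.8]]

Xt = transpose(X)
gram = matmul(Xt, X)  # X^T X, a d x d matrix

# Joint fit: one solve with all p response columns on the right-hand side.
B_joint = solve(gram, matmul(Xt, Y))

# Separate fits: one solve per response column, then placed side by side.
columns = []
for k in range(len(Y[0])):
    y_k = [[row[k]] for row in Y]           # n x 1 response
    beta_k = solve(gram, matmul(Xt, y_k))   # d x 1 coefficients
    columns.append([b[0] for b in beta_k])
B_separate = transpose(columns)             # d x p

max_gap = max(abs(u - v)
              for row_u, row_v in zip(B_joint, B_separate)
              for u, v in zip(row_u, row_v))
print("largest entrywise difference:", max_gap)
```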
3. Predicting crime rate
Download the Boston dataset from the course website. In this problem, we will predict the
per capita crime rate using the other variables in the dataset.
(a) For each predictor, fit a simple linear regression model to predict the response. In which
of the simple linear models is there a statistically significant association between the
predictor and the response?
(b) Fit a multiple regression model to predict the response using all the other features in
the dataset. For which features can we reject the null hypothesis $H_0 : \beta_j = 0$?
(c) How do the results from (a) and (b) compare? Create a scatterplot displaying the simple
regression coefficient of each predictor from (a) on the x-axis, and the multiple regression
coefficient from (b) on the y-axis. That is, each predictor is displayed as a point on the
plot.
(d) Is there evidence of a non-linear relationship between any of the features and the response?
For each predictor $x_j$, look at the fit of the cubic model
$$y \approx \beta_0 + \beta_1 x_j + \beta_2 x_j^2 + \beta_3 x_j^3.$$
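A cubic model in a single predictor is still linear in its coefficients, so it can be fit by ordinary least squares on a design matrix whose columns are 1, x, x^2 and x^3. The sketch below illustrates this in plain Python via the normal equations. The helper names `solve_linear` and `fit_polynomial` are hypothetical, and the data are made up and noiseless (generated from a known cubic, not taken from the Boston dataset) so that the recovered coefficients can be checked against the truth.

```python
def solve_linear(a, b):
    """Solve the square system a @ z = b (b is a vector) by Gaussian
    elimination with partial pivoting followed by back substitution."""
    m = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    z = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, m))
        z[i] = (aug[i][m] - tail) / aug[i][i]
    return z

def fit_polynomial(x, y, degree=3):
    """Least-squares polynomial fit; returns [b0, b1, ..., b_degree]."""
    k = degree + 1
    design = [[xi ** j for j in range(k)] for xi in x]
    # Normal equations: (D^T D) beta = D^T y.
    gram = [[sum(row[i] * row[j] for row in design) for j in range(k)]
            for i in range(k)]
    rhs = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear(gram, rhs)

# Made-up noiseless data from y = 1 - 2x + 0.5x^2 + 0.25x^3.
true_coef = [1.0, -2.0, 0.5, 0.25]
x = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
y = [sum(c * xi ** j for j, c in enumerate(true_coef)) for xi in x]

coef = fit_polynomial(x, y, degree=3)
print([round(c, 6) for c in coef])
```

With real data, evidence of non-linearity comes from whether the quadratic and cubic coefficients are distinguishable from zero. Raw powers of a predictor can be strongly correlated, so solving the normal equations directly may be numerically fragile; library routines based on a QR decomposition or orthogonal polynomials are generally preferable.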
4. The advantage of backwardness (by Cosma Shalizi)
Some theories of economic growth suggest it is easier for poorer countries to grow than it is
for richer countries, the so-called "advantages of backwardness". One possible explanation
is that poorer countries can grow by simply importing policies and technology from richer
countries, while richer countries must innovate to grow. Since importing is easier than
innovating, all else being equal, poorer countries should grow faster than richer countries.
Download penn.csv from the course website (Canvas). Each row of this table contains the
initial population, the initial gross domestic product (GDP) per capita, the average annual
growth of the GDP, the average annual growth of the population, the average percentage
of GDP devoted to investment, and the average ratio of trade to GDP for a country over a
5-year period.
(a) Regress gdp.growth on log(gdp). Report the fitted coefficients and their p-values.
(b) Regress gdp.growth on log(gdp), pop.growth, invest, trade. Report the fitted
coefficients and their p-values.
(c) Some theories suggest the catching-up effect is only present in countries which trade
with more developed countries. Add an interaction between log(gdp) and trade to the
model from part (b). What are the relevant regression coefficients, and what are their
p-values?
(d) Summarize your findings. Do the data support the hypothesis that there are "advantages
of backwardness", undermine it, or support it occurring under certain conditions?
Please summarize your findings in a short report; this means no raw R code (unless you
deem it necessary for clarity), no raw R output, properly referencing figures, etc.
