Big Data Methods

PC session 3

Part I: Empirical Part

Use the dataset “Oilfinance” for exercises 1-21.

Ridge and lasso regression for prediction of a continuous outcome

1) Define the first variable (i.e. first column) in the data matrix to be the outcome y (price change of the RTS index in % compared to one week ago) and the remaining variables to be the predictors x (lagged levels and price changes of oil supply, stocks, indices). Show the distribution of y by means of a histogram.

2) Define a training sample containing 188 observations; the remaining observations serve as test data. Apply k-fold cross-validation (k=10) in the training data to find the optimal lambda for ridge regression (alpha=0). Report the optimal lambda for the ridge regression.

3) Next, run a ridge regression (alpha=0) in the training data with the optimal lambda and show the coefficients.

4) Predict the outcome in the test data using the optimal lambda.

5) Compute the mean squared error and the mean absolute error, and also compute the average of the absolute y in the test data (to compare it to the errors).

6) Run a ridge regression (alpha=0) in the training data with a user-provided penalty of lambda=10 and show the coefficients.

7) Predict the outcome in the test data and compute the mean squared error.

8) Run a lasso regression (alpha=1) with the same data. Apply k-fold cross-validation (k=10) in the training data to find the optimal lambda. Report the coefficients that are different from zero.

9) Predict the outcome in the test data using the optimal lambda.

10) Compute the mean squared error and the mean absolute error.

11) Predict the expected price change in % for mean values of x in the data.
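
The workflow of exercises 2 to 5 (cross-validate lambda, fit, predict, compute errors) can be illustrated with a minimal sketch. It is not the intended solution: it uses plain Python rather than the software meant for the session, synthetic data instead of Oilfinance, and a single predictor so that ridge has a simple closed form. The names ridge_fit, cv_lambda, mse and mae, the lambda grid and the random seeds are assumptions made for illustration. The penalty is added to the unscaled residual sum of squares, so the lambda values are not comparable to those of a library that scales the loss differently.

```python
import random

def ridge_fit(x, y, lam):
    """Ridge fit for one predictor; the intercept is not penalised."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / (sxx + lam)  # the penalty lam shrinks the slope toward zero
    return my - slope * mx, slope

def mse(y, pred):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y)

def mae(y, pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, pred)) / len(y)

def cv_lambda(x, y, lambdas, k=10, seed=1):
    """Return the lambda with the lowest average validation MSE over k folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint validation folds
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        errs = []
        for fold in folds:
            held = set(fold)
            x_tr = [x[i] for i in idx if i not in held]
            y_tr = [y[i] for i in idx if i not in held]
            b0, b1 = ridge_fit(x_tr, y_tr, lam)
            errs.append(mse([y[i] for i in fold], [b0 + b1 * x[i] for i in fold]))
        avg = sum(errs) / k
        if avg < best_err:
            best_lam, best_err = lam, avg
    return best_lam

# Synthetic data (an assumption; NOT the Oilfinance dataset)
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(250)]
y = [2.0 * a + rng.gauss(0, 1) for a in x]
x_train, y_train = x[:188], y[:188]  # 188 training observations
x_test, y_test = x[188:], y[188:]    # remaining observations as test data

lam_opt = cv_lambda(x_train, y_train, lambdas=[0.01, 0.1, 1, 10, 100])
b0, b1 = ridge_fit(x_train, y_train, lam_opt)
pred = [b0 + b1 * a for a in x_test]
print("optimal lambda:", lam_opt)
print("test MSE:", mse(y_test, pred), "test MAE:", mae(y_test, pred))
print("mean |y| in test data:", sum(abs(a) for a in y_test) / len(y_test))
```

Comparing the errors with the mean absolute outcome gives a sense of scale: an MAE close to the average |y| would mean the predictions add little.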

2019 Selina Gangl

Ridge and lasso regression for prediction of a binary outcome

12) Create a binary variable for y>0 (meaning that the RTS index price change is larger than zero).

13) Run a lasso logit regression for binary outcomes (setting family to binomial). Apply k-fold cross-validation (k=10) in the training data to find the optimal lambda. Report the optimal lambda.

14) Run a lasso regression (alpha=1) in the training data with the optimal lambda. Report the coefficients that are different from zero.

15) Predict the outcome in the test data using the optimal lambda.

16) Recode the predicted outcome to be one if the predicted probability is larger than 50% (=0.5). Compare this variable to the true outcomes in the test data in order to calculate the classification error rate and the share of correct classifications.
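
The recoding and comparison in exercises 12 and 16 can be sketched as follows. The price changes and predicted probabilities are made-up numbers, not output of a fitted lasso logit model, and the function names classify and error_rate are assumptions chosen for illustration.

```python
def classify(probs, threshold=0.5):
    """Recode predicted probabilities to 1 if above the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probs]

def error_rate(y_true, y_pred):
    """Share of observations whose predicted class differs from the true class."""
    return sum(a != b for a, b in zip(y_true, y_pred)) / len(y_true)

# Hypothetical test data: weekly price changes in % and predicted probabilities
y_change = [1.2, -0.4, 0.0, 3.1, -2.2]
probs = [0.8, 0.6, 0.3, 0.45, 0.1]

y_bin = [1 if v > 0 else 0 for v in y_change]  # binary outcome: y > 0
y_hat = classify(probs)

err = error_rate(y_bin, y_hat)
correct = sum(a == b for a, b in zip(y_bin, y_hat)) / len(y_bin)
print("classification error rate:", err)
print("share of correct classifications:", correct)
```

In this toy example two of the five observations are misclassified, so the error rate is 0.4 and the share of correct classifications is 0.6; the two shares always sum to one.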

Causal inference for one regressor based on double lasso without sample splitting

17) Define "brentyl1" (price per barrel of Brent crude oil in the last period, i.e. one week ago) to be the regressor d whose causal effect on y is of interest. Define the remaining regressors to be used as potential controls x for causal analysis when estimating the effect of d on y.

18) Run a LASSO with double selection of x in the treatment and outcome equations to estimate the causal effect of d on y. The effect of d is assumed to be homogeneous (does not depend on values of x or d). Report the output.

19) Re-run the command with "partialling out" rather than "double selection". Report the output.

Causal inference for one regressor based on double lasso with sample splitting

20) Apply the partialling out method with sample splitting. Use the training sample to estimate a lasso-based model for y as a function of x and for d as a function of x based on cross-validation. Then estimate the effect of d on y in the test data. Swap the roles of the test and training data and estimate the effect of d on y as the average of the effects in the two subsamples. Furthermore, compute the standard error of the estimated effect.
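
The sample-splitting procedure in exercise 20 can be sketched as follows. This is an illustration under stated assumptions, not the intended solution: it uses plain Python, synthetic data with a true effect of 1, a fixed lasso penalty lam instead of a cross-validated one, and a standard error based on the residual-on-residual score, which is one common choice. All function and variable names are hypothetical.

```python
import math
import random

def soft(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    return z - t if z > t else z + t if z < -t else 0.0

def lasso(X, y, lam, sweeps=50):
    """Coordinate descent for (1/(2n))*RSS + lam*sum|b_j|; returns (intercept, coefs)."""
    n, p = len(X), len(X[0])
    b0, b = sum(y) / n, [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # partial residual that leaves out predictor j
            r = [y[i] - b0 - sum(b[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            zj = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft(rho, lam) / zj
        b0 = sum(y[i] - sum(b[k] * X[i][k] for k in range(p)) for i in range(n)) / n
    return b0, b

def predict(model, X):
    b0, b = model
    return [b0 + sum(bk * xk for bk, xk in zip(b, row)) for row in X]

def residuals(fit_idx, est_idx, X, v, lam):
    """Fit a lasso of v on X in one subsample; return residuals of v in the other."""
    model = lasso([X[i] for i in fit_idx], [v[i] for i in fit_idx], lam)
    pred = predict(model, [X[i] for i in est_idx])
    return [v[i] - q for i, q in zip(est_idx, pred)]

# Synthetic data with a true effect of d on y equal to 1 (an assumption)
rng = random.Random(0)
n, p = 300, 4
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
d = [0.5 * row[0] + rng.gauss(0, 1) for row in X]
y = [1.0 * d[i] + X[i][0] - 0.5 * X[i][1] + rng.gauss(0, 1) for i in range(n)]

train, test = list(range(n // 2)), list(range(n // 2, n))
lam = 0.05  # fixed penalty; the exercise asks for cross-validation instead
thetas, ry_all, rd_all = [], [], []
for fit_idx, est_idx in [(train, test), (test, train)]:  # swap the roles
    ry = residuals(fit_idx, est_idx, X, y, lam)  # y with x partialled out
    rd = residuals(fit_idx, est_idx, X, d, lam)  # d with x partialled out
    thetas.append(sum(a * b for a, b in zip(rd, ry)) / sum(a * a for a in rd))
    ry_all += ry
    rd_all += rd

theta_hat = sum(thetas) / 2  # average of the effects in the two subsamples
J = sum(a * a for a in rd_all) / n
psi2 = sum((a * (b - theta_hat * a)) ** 2 for a, b in zip(rd_all, ry_all)) / n
se = math.sqrt(psi2 / J ** 2 / n)
print("effect:", theta_hat, "standard error:", se)
```

Fitting the lasso models in one subsample and forming residuals in the other keeps the overfitting of the nuisance predictions from contaminating the effect estimate, which is the point of the sample split.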

Causal inference for several regressors based on double lasso without sample splitting

21) Use the command “rlassoEffects” to estimate the causal effects of several regressors based on double lasso without sample splitting.

Lasso-based causal inference with instruments without sample splitting

Use the data “EminentDomain” from the “hdm” package. This is a dataset on judicial eminent domain decisions and contains four sub-data sets, which differ mainly in the dependent variable. Use the data for the non-metro (NM) area (in logs).

Outcome variable (y): log house price in non-metro area of circuit (=district)

Causal variable (d): number of pro-plaintiff appellate takings decisions overturning government's seizure of property in favor of private owner (indicator for protection of individual property rights)

Instruments (z): characteristics of randomly assigned judges including gender, race, religion, political affiliation,...

Control variables (x)

Define the outcome variable (y), the causal variable (d), the instruments (z), and the control variables (x).

22) Run LASSO IV estimation for the selection of controls x and instruments z.

23) Run LASSO IV estimation for the selection of z, but take all x variables as controls.

24) Run LASSO IV estimation for the selection of x, while using all of the first 20 elements in z as instruments.

Part II: Conceptual questions

We probably won’t have time to discuss this part in the PC session. You may nevertheless use it as a mock exam.

25) Compare Lasso estimation and standard OLS and comment on similarities and differences.

26) Compare ridge regression and Lasso estimation and comment on similarities and differences.

27) Explain the concept of k-fold cross-validation for picking the shrinkage factor in Lasso.

28) Explain the concept of post-Lasso double selection in OLS for performing causal inference.

29) Explain the idea of adaptive Lasso. Why might it be preferred over “conventional” Lasso?

30) Explain the concept of a “sparse” model.

31) What is the advantage of shrinkage methods compared to “classical” variable selection methods like forward selection or backward elimination?
