Big Data Methods

PC session 3

Part I: Empirical Part

Use the dataset “Oilfinance” for exercises 1-21.

Ridge and lasso regression for prediction of a continuous outcome

1) Define the first variable (i.e. the first column) in the data matrix to be the outcome y (price change of the RTS index in % compared to one week ago) and the remaining variables to be the predictors x (lagged levels and price changes of oil supply, stocks, indices). Show the distribution of y by means of a histogram.

2) Define a training sample containing 188 observations. Apply k-fold cross-validation (k=10) in the training data to find the optimal lambda for ridge regression (alpha=0). Report the optimal lambda for the ridge regression.

3) In a next step, run a ridge regression (alpha=0) in the training data with the optimal lambda and show the coefficients.

4) Predict the outcome in the test data using the optimal lambda.

5) Compute the mean squared error and the mean absolute error, and also compute the average of the absolute values of y in the test data (to compare it to the errors).

6) Run a ridge regression (alpha=0) in the training data with a user-provided penalty of lambda=10 and show the coefficients.

7) Predict the outcome in the test data and compute the mean squared error.

8) Run a lasso regression (alpha=1) with the same data. Apply k-fold cross-validation (k=10) in the training data to find the optimal lambda. Report the coefficients that are different from zero.

9) Predict the outcome in the test data using the optimal lambda.

10) Compute the mean squared error and the mean absolute error.

11) Predict the expected price change in % for mean values of x in the data.
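The workflow behind exercises 2-5 (choose lambda by 10-fold cross-validation in the training data, refit, predict in the test data, compute the errors) can be illustrated with a minimal standard-library sketch. Everything here is an assumption made for illustration: the data are simulated rather than taken from “Oilfinance”, there is a single predictor and no intercept, the lambda grid is arbitrary, and the penalty is not scaled the way a dedicated lasso/ridge library would scale it. Only the training size of 188 comes from the exercise.

```python
import random
from statistics import mean


def ridge_slope(x, y, lam):
    """One-predictor ridge without intercept: argmin_b sum((y - b*x)^2) + lam * b^2."""
    return sum(xi * yi for xi, yi in zip(x, y)) / (sum(xi * xi for xi in x) + lam)


def mse(y_true, y_pred):
    """Mean squared error."""
    return mean((a - b) ** 2 for a, b in zip(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(a - b) for a, b in zip(y_true, y_pred))


def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_choose_lambda(x, y, lambdas, k=10, seed=1):
    """Return the lambda with the smallest average held-out MSE, plus all CV errors."""
    folds = kfold_indices(len(x), k, random.Random(seed))
    cv_error = {}
    for lam in lambdas:
        fold_errors = []
        for held_out in folds:
            held = set(held_out)
            # Fit on the other k-1 folds, evaluate on the held-out fold.
            x_fit = [x[i] for i in range(len(x)) if i not in held]
            y_fit = [y[i] for i in range(len(y)) if i not in held]
            b = ridge_slope(x_fit, y_fit, lam)
            fold_errors.append(
                mse([y[i] for i in held_out], [b * x[i] for i in held_out])
            )
        cv_error[lam] = mean(fold_errors)
    return min(cv_error, key=cv_error.get), cv_error


# Simulated data (hypothetical, NOT the Oilfinance dataset).
rng = random.Random(0)
x_all = [rng.gauss(0, 1) for _ in range(250)]
y_all = [0.5 * xi + rng.gauss(0, 1) for xi in x_all]

# First 188 observations for training, the rest for testing.
x_train, y_train = x_all[:188], y_all[:188]
x_test, y_test = x_all[188:], y_all[188:]

grid = [0.01, 0.1, 1, 10, 100]  # arbitrary illustrative grid
best_lam, cv_error = cv_choose_lambda(x_train, y_train, grid, k=10)

# Refit on the full training sample with the chosen lambda, then predict in the test data.
b_hat = ridge_slope(x_train, y_train, best_lam)
pred = [b_hat * xi for xi in x_test]

print("chosen lambda:", best_lam)
print("test MSE:", mse(y_test, pred))
print("test MAE:", mae(y_test, pred))
print("mean |y| in test data:", mean(abs(v) for v in y_test))
```

Comparing the MAE with the average of |y| gives a rough sense of scale: an MAE close to the average absolute outcome suggests the predictions add little beyond predicting zero.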

2019 Selina Gangl

Ridge and lasso regression for prediction of a binary outcome

12) Create a binary variable for y>0 (meaning that the RTS index price change is larger than zero).

13) Run a lasso logit regression for binary outcomes (setting family to binomial). Apply k-fold cross-validation (k=10) in the training data to find the optimal lambda. Report the optimal lambda.

14) Run a lasso regression (alpha=1) in the training data with the optimal lambda. Report the coefficients that are different from zero.

15) Predict the outcome in the test data using the optimal lambda.

16) Recode the predicted outcome to be one if the predicted probability is larger than 50% (=0.5). Compare this variable to the true outcomes in the test data in order to calculate the classification error rate and the share of correct classifications.
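The recoding and comparison described in exercise 16 can be sketched as follows. The predicted probabilities and true outcomes below are made-up numbers for illustration only; in the exercise they would come from the lasso logit predictions and the binary y in the test data.

```python
def classify(probabilities, threshold=0.5):
    """Recode to 1 if the predicted probability is strictly larger than the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]


def error_rate(y_true, y_pred):
    """Share of observations whose predicted class differs from the true class."""
    wrong = sum(1 for a, b in zip(y_true, y_pred) if a != b)
    return wrong / len(y_true)


def correct_share(y_true, y_pred):
    """Share of observations whose predicted class equals the true class."""
    right = sum(1 for a, b in zip(y_true, y_pred) if a == b)
    return right / len(y_true)


# Hypothetical predicted probabilities and true binary outcomes.
prob = [0.9, 0.4, 0.55, 0.5, 0.2]
truth = [1, 0, 0, 1, 0]

pred = classify(prob)
print("predicted classes:", pred)
print("classification error rate:", error_rate(truth, pred))
print("share of correct classifications:", correct_share(truth, pred))
```

Note that a probability of exactly 0.5 is coded as zero here, because the exercise asks for "larger than" 50%. The two shares always sum to one.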

Causal inference for one regressor based on double lasso without sample splitting

17) Define "brentyl1" (price/barrel of crude brent oil in last period, i.e. one week ago) to be the regressor d whose causal effect on y is of interest. Define the remaining regressors to be used as potential controls x for causal analysis when estimating the effect of d on y.

18) Run a LASSO with double selection of x in the treatment and outcome equations to estimate the causal effect of d on y. The effect of d is assumed to be homogeneous (does not depend on values of x or d). Report the output.

19) Re-run the command with "partialling out" rather than "double selection". Report the output.

Causal inference for one regressor based on double lasso with sample splitting

20) Apply the partialling out method with sample splitting. Use the training sample to estimate a lasso-based model for y as a function of x and for d as a function of x, based on cross-validation. Then estimate the effect of d on y in the test data. Swap the roles of the test and training data and estimate the effect of d on y as the average of the effects in the two subsamples. Furthermore, compute the standard error of the estimated effect.
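The final steps of exercise 20 (regress the residuals of y on the residuals of d in each subsample, then average the two effects) can be sketched as below. This is only one possible way to do it and rests on labelled assumptions: the residual lists are hypothetical placeholders for the out-of-sample residuals y minus its lasso prediction and d minus its lasso prediction, the per-sample standard error uses the homoskedastic OLS formula, and the two subsample estimates are treated as independent when combining their standard errors.

```python
from math import sqrt


def residual_regression(y_res, d_res):
    """OLS of y residuals on d residuals without intercept.

    Returns (effect, standard error), using the homoskedastic formula
    se = sqrt(sigma^2 / sum(d_res^2)) with sigma^2 = SSR / (n - 1).
    """
    sdd = sum(d * d for d in d_res)
    effect = sum(d * y for d, y in zip(d_res, y_res)) / sdd
    errors = [y - effect * d for d, y in zip(d_res, y_res)]
    sigma2 = sum(e * e for e in errors) / (len(errors) - 1)
    return effect, sqrt(sigma2 / sdd)


def combine(est_a, est_b):
    """Average two (effect, se) pairs, assuming the two estimates are independent."""
    (eff_a, se_a), (eff_b, se_b) = est_a, est_b
    effect = 0.5 * (eff_a + eff_b)
    # Var of the average of two independent estimates = (se_a^2 + se_b^2) / 4.
    se = 0.5 * sqrt(se_a ** 2 + se_b ** 2)
    return effect, se


# Hypothetical residuals: models fitted in the training data, residuals taken in the test data ...
y_res_test = [0.8, -1.1, 0.3, 1.9, -0.6]
d_res_test = [0.5, -0.4, 0.1, 1.0, -0.2]
# ... and with the roles of the two samples swapped.
y_res_train = [-0.7, 1.4, 0.2, -1.3, 0.9]
d_res_train = [-0.3, 0.8, 0.0, -0.6, 0.5]

est_1 = residual_regression(y_res_test, d_res_test)
est_2 = residual_regression(y_res_train, d_res_train)
effect, se = combine(est_1, est_2)
print("effect in sample 1 (effect, se):", est_1)
print("effect in sample 2 (effect, se):", est_2)
print("averaged effect:", effect, "standard error:", se)
```

If heteroskedasticity is a concern, the per-sample standard error would need a robust formula instead; the averaging step stays the same.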

Causal inference for several regressors based on double lasso without sample splitting

21) Use the command “rlassoEffects” to perform causal inference for several regressors based on double lasso without sample splitting.

Lasso-based causal inference with instruments without sample splitting

Use the data “EminentDomain” from the “hdm” package. This is a dataset on judicial eminent domain decisions and contains four sub-data sets, which differ mainly in the dependent variables. Use the data about the non-metro (NM) area (in log).

Outcome variable (y): log house price in non-metro area of circuit (=district)

Causal variable (d): number of pro-plaintiff appellate takings decisions overturning government's seizure of property in favor of private owner (indicator for protection of individual property rights)

Instruments (z): characteristics of randomly assigned judges including gender, race, religion, political affiliation,...

Control variables (x)

Define the outcome variable (y), the causal variable (d), the instruments (z), and the control variables (x).

22) Run LASSO IV estimation for the selection of controls x and instruments z.

23) Run LASSO IV estimation for the selection of z, but take all x variables as controls.

24) Run LASSO IV estimation for the selection of x, while using all of the first 20 elements in z as instruments.

Part II: Conceptual questions

We probably won’t have time to discuss this part in the PC session, but you may use it as a mock exam.

25) Compare Lasso estimation and standard OLS and comment on similarities and differences.

26) Compare ridge regression and Lasso estimation and comment on similarities and differences.

27) Explain the concept of k-fold cross-validation for picking the shrinkage factor in Lasso.

28) Explain the concept of post-Lasso double selection in OLS for performing causal inference.

29) Explain the idea of adaptive Lasso. Why might it be preferred over “conventional” Lasso?

30) Explain the concept of a “sparse” model.

31) What is the advantage of shrinkage methods compared to “classical” variable selection methods like forward selection or backwards elimination?
