Econ 419: Homework 2 Fall 2018

Michael P. Leung
(Hint: just follow the same steps as the derivation of the normal limit for
the OLS estimator.)
2. (50 pts) In this empirical exercise, we will learn how to evaluate the robustness
of linear regression by assessing the degree of overlap and computing matching
estimators.
We will use data from the National Supported Work Demonstration (NSW)
and Current Population Survey (CPS) to look at the effect of a job training
program on earnings. Load nsw.dta and use describe to see what the variables
represent. The observations for which experimental equals one come from
NSW data, and those for which it equals zero come from CPS data. NSW data
comes from an experiment in which treated individuals were assigned to a job
training program. Treatment took place in 1975. The variables re74, re75,
and re78 are measures of earnings in years 1974, 1975, and 1978 respectively,
where re75 is measured prior to treatment assignment.
To complete this exercise, you will need to install several Stata programs.
First, enter the following into Stata:
net from http://fmwww.bc.edu/RePEc/bocode/i
Scroll down and click isvar, and a new page will pop up. Click “(click here to
install)” to install the package. Next, enter
net from http://personalpages.manchester.ac.uk/staff/mark.lunt
and install the package propensity.
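If you prefer typing to clicking, the installation steps can likely be done with net install, Stata's command for installing a named package from a given location. This is a sketch that assumes the package names isvar and propensity are exactly as they appear on the two pages above and that those pages are still online; the final line only checks that gmatch, the matching command used in part (b), can now be found.

```stata
* Command-line equivalents of the clicking steps (assumes both package
* pages are still available at these URLs)
net install isvar, from(http://fmwww.bc.edu/RePEc/bocode/i)
net install propensity, from(http://personalpages.manchester.ac.uk/staff/mark.lunt)

* Confirm that gmatch is now on the ado-path
which gmatch
```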
(a) We will first analyze the experimental sample. This means all your
commands for this question must be restricted to observations in this
sample. To do this, use the if option, e.g., summarize y if x > 5.
i. Regress earnings on treatment and all available controls, including
pre-treatment outcomes. Comment on the results.
ii. The propensity score is P(D = 1 | X), where D is the treatment indicator
and X is the vector of controls. It measures the probability of
assignment/selection into treatment based on observables, for each
subpopulation defined by X.
A common way to detect differences between the treatment and control
subpopulations (other than balance tests) is to plot the density of the
propensity score for both groups. The idea is that, in a randomized
experiment, assignment/selection into treatment is the same regardless
of the subpopulation X because D ⊥⊥ X (treatment is independent of the controls).
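The implication behind this idea fits in one line. Writing e(x) for the propensity score (notation introduced here only for illustration), independence means that conditioning on X leaves the treatment probability unchanged:

```latex
D \perp\!\!\!\perp X
\quad\Longrightarrow\quad
e(x) \equiv P(D = 1 \mid X = x) = P(D = 1) \quad \text{for every } x .
```

So in a randomized experiment every unit has the same true propensity score, and the estimated densities for treated and control units should nearly coincide, apart from sampling noise in the estimates. When selection depends on X, e(x) varies with x and the two densities can pull apart.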
A. To estimate the propensity score, we regress D on X. However,
to ensure that the fitted values from this regression are between
0 and 1, we use a logistic regression instead of a linear regression.
To do this, use the same syntax as OLS except replace reg with
logit.
B. For each observation i, given X_i, compute the predicted value of
D_i from the logit regression (the estimated propensity score). Store the
predictions in a new variable.¹
C. Plot the estimated densities of the propensity scores for the treatment
and control populations.² Remember to restrict your analysis
to the experimental sample! Comment on the resulting graph.
¹Hint: use predict.
²Hint: to estimate the densities of variables y and x on the same graph, use graph tw kdensity
y || kdensity x, legend(label(1 "y") label(2 "x")). Combine this with the if option to
restrict your analysis to the right subpopulations.
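A minimal Stata sketch of part (a)i and steps A to C, run on simulated data so that it is self-contained. All variable names (d for treatment, x1 and x2 for controls, y for the outcome, expt for the sample indicator) are hypothetical stand-ins. Every simulated observation has expt equal to 1, so the if conditions only illustrate the restriction. The logit models P(D = 1 | X) = exp(X'β)/(1 + exp(X'β)), which always lies strictly between 0 and 1. Run it as a do-file, since the /// line continuations do not work at the interactive prompt.

```stata
* Simulated stand-in for the experimental sample (all names hypothetical)
clear
set seed 419
set obs 1000
generate x1 = rnormal()
generate x2 = rnormal()
* Randomized assignment: d is drawn independently of x1 and x2
generate d = runiform() < 0.5
* The true treatment effect is 2 in this simulation
generate y = 1 + 2*d + x1 + 0.5*x2 + rnormal()
generate expt = 1

* (a)i: OLS of the outcome on treatment and controls, experimental sample only
regress y d x1 x2 if expt == 1

* A: logit of d on the controls, so the fitted values lie in (0, 1)
logit d x1 x2 if expt == 1

* B: store the fitted probabilities, i.e. the estimated propensity scores
predict pscore if expt == 1, pr

* C: overlay kernel density estimates of the score for treated and controls
twoway (kdensity pscore if d == 1 & expt == 1) ///
       (kdensity pscore if d == 0 & expt == 1), ///
       legend(label(1 "Treated") label(2 "Control"))
```

With nsw.dta, y would become re78, expt would become experimental, and d, x1, x2 would be replaced by the treatment indicator and the controls reported by describe (including re74 and re75). Because d is independent of the controls in this simulation, the two density curves should lie almost on top of each other.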
(b) The previous exercises established our experimental benchmark results.
Let’s see what happens when we use some observational data instead. We
will drop the observations in the control group of the experimental sample
and use the observations in the CPS sample as controls in their place. Your
analysis below should be restricted to this new “NSW-CPS” sample.³
i. Repeat part (a). Are your results similar to the experimental estimates?
Come up with an economic story for why or why not.
ii. To further probe differences between treatment and control, run balance
tests on pretreatment outcomes. Discuss your results.
iii. Based on our previous results, we might want to try a matching estimator
to better pair treated units with similar control units, in the
hope that units with similar observables will also have similar unobservables,
thereby reducing selection bias. Match treated and control
units with similar propensity scores (estimated in part (a)). You can
do this with the command gmatch.⁴
iv. Using only the subset of matched units (remembering to still restrict
your analysis to the NSW-CPS sample), estimate the effect of treatment
on earnings with OLS, and test the null hypothesis that it is zero. How
does the result compare to parts (a) and (b)i?
v. The quality of the matching estimator depends on the quality of our
matches. To assess the latter, plot the distributions of propensity scores
for the matched treated and control units. Discuss the result.
³Hint: generate a new indicator variable analogous to experimental and use this variable along
with the if option in future commands.
⁴Hint: For a treatment variable D and control X, the command gmatch D X if [..],
set(varname) generates a new variable varname. This assigns a unique number to each treated
unit and then the corresponding number to each control unit matched with that treated unit.
Unmatched control units are given an empty label. Note that gmatch also creates a variable diff,
which you can ignore.
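A minimal Stata sketch of part (b)i to v on simulated data in which treated units differ systematically from the controls in their pre-treatment outcome, mimicking the NSW-CPS comparison. All names (d, pre, y, nswcps, matchset) are hypothetical. The gmatch call follows the syntax in footnote 4, passing the estimated propensity score as the variable to match on; that reading is an assumption, so check help gmatch before relying on it. As before, run it as a do-file because of the /// continuations.

```stata
* Simulated stand-in for the NSW-CPS sample (all names hypothetical):
* 300 treated units and 1,700 CPS-like controls whose pre-treatment
* outcome pre comes from a different distribution (selection on X)
clear
set seed 419
set obs 2000
generate d = _n <= 300
generate pre = rnormal(2, 1) if d == 1
replace pre = rnormal(5, 2) if d == 0
* The true treatment effect is 2 in this simulation
generate y = 1 + 2*d + pre + rnormal()
* Sample flag as in the footnote 3 hint; with nsw.dta this might be
* (treat is a hypothetical name for the treatment indicator):
* generate nswcps = (experimental == 1 & treat == 1) | experimental == 0
generate nswcps = 1

* i: OLS with controls, then the propensity score, as in part (a)
regress y d pre if nswcps == 1
logit d pre if nswcps == 1
predict pscore if nswcps == 1, pr

* ii: balance test on the pre-treatment outcome (two-sample t-test)
ttest pre if nswcps == 1, by(d)

* iii: match on the propensity score, following the footnote 4 syntax
gmatch d pscore if nswcps == 1, set(matchset)

* iv: OLS on matched units only; the t statistic on d tests H0: effect = 0
regress y d if nswcps == 1 & !missing(matchset)

* v: overlap of the score among matched treated and control units
twoway (kdensity pscore if d == 1 & nswcps == 1 & !missing(matchset)) ///
       (kdensity pscore if d == 0 & nswcps == 1 & !missing(matchset)), ///
       legend(label(1 "Treated") label(2 "Matched controls"))
```

The coefficients on d from the two regressions can be compared with the value 2 built into the simulation, and the final plot can be compared with the densities for the full sample to see how much matching improved overlap.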