Econ 419: Homework 5
Fall 2018
Michael P. Leung
Due 11/29 2:05 p.m.
In Brazil, about 20 percent of adults are illiterate. Yet prior to the late ’90s, votes
were cast via paper ballots with only written instructions, and citizens had to write
down their vote. In 1998, the government rolled out electronic voting technology
using visual aids that were much simpler to understand. Fujiwara (2015) studies the
impact of this new technology on the 1998 elections when the devices were first rolled
out. He exploits the fact that, due to a limited supply of devices, the technology was
only used in municipalities with at least 40,500 registered voters, according to 1996
voter rolls.
To analyze the data, you will need to install a new package. In Stata, enter the
command search rdrobust, all and then click on
st0366 from http://www.stata-journal.com/software/sj14-4
to install. Use describe to figure out which of the variables in the dataset are needed
in the problems below.
1. Initially, our main outcome of interest will be the number of valid votes cast,
with the theory being that with a more understandable voting technology, fewer
flawed ballots will be submitted.
(a) Define the population model.
(b) Assess the plausibility of the RD identification conditions.
(c) Interpret the RD estimand, and discuss its relevance for the question being
studied.
2. Let’s begin our analysis with the most basic RD graph, plotting the outcome
against the running variable.
(a) Consider the subsample of observations for which the running variable is
between 4500 and 100,000. Using twoway, graph this subsample on a scatterplot
along with a quadratic best fit line on each side of the discontinuity
using qfit. Also display a vertical line at the discontinuity using the xline
option. (You can use the option lc(blue) with qfit to make the best fit line
blue. You can do the same with the scatterplot using the option mc(blue).)
(b) From the result in the previous question, it might not surprise you to learn
that most papers don’t display the raw data like that. Instead, they divide
the outcome into bins and plot the average outcome within each bin to get
a cleaner graph. Let x be the running variable.
i. Input egen bin_x=cut(x), at(500(4000)200000) to create a new
binned version of x.
ii. Use the egen command to generate the mean outcome variable by
values of bin_x.
iii. Redo the graph in the previous problem, except replace the scatterplot
with one plotting mean outcome against bin_x (keep the qfit
commands the same).
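The binning steps above can be sketched on simulated data. This is only an illustration: the variables x, y, d, and mean_y and the data-generating process are made up, and only the cutoff (40,500) and the bin grid come from the text. Substitute the actual variable names you find with describe.

```stata
* Simulated data only; x, y, and d are illustrative stand-ins.
clear
set seed 419
set obs 2000
generate x = 500 + 99500*runiform()          // hypothetical running variable
generate d = x >= 40500                      // 1 at or above the cutoff
generate y = 0.7 + 0.1*d + 0.000001*(x - 40500) + rnormal(0, 0.05)

* Bin the running variable: bin_x holds the left edge of each 4000-wide bin.
egen bin_x = cut(x), at(500(4000)200000)

* Average outcome within each bin.
egen mean_y = mean(y), by(bin_x)

* Binned means, plus a separate quadratic fit on each side of the cutoff.
twoway (scatter mean_y bin_x, mc(blue)) ///
       (qfit y x if x < 40500, lc(blue)) ///
       (qfit y x if x >= 40500, lc(blue)), ///
       xline(40500) legend(off)
```

Note that the qfit lines are fit to the raw observations, not to the bin means, so they match the fits from the unbinned graph.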
3. To estimate the treatment effect, implement RD using an OLS regression with a
linear specification on both sides of the discontinuity. Remember to center your
running variable at the threshold. Discuss your results. What’s the meaning of
the magnitude of the treatment effect estimate?
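One way to set up a linear specification with separate slopes on each side is with factor-variable notation, sketched here on simulated data. The variable names (x, xc, d, y) and the simulated jump of 0.1 are assumptions for illustration only.

```stata
* Simulated data only; all variable names are illustrative stand-ins.
clear
set seed 419
set obs 2000
generate x  = 500 + 99500*runiform()         // hypothetical running variable
generate xc = x - 40500                      // centered at the threshold
generate d  = xc >= 0                        // 1 at or above the cutoff
generate y  = 0.7 + 0.1*d + 0.000001*xc + rnormal(0, 0.05)

* i.d##c.xc expands to d, xc, and d*xc: a line with its own slope on each side.
* Because xc is centered, the coefficient on 1.d is the jump at the cutoff.
regress y i.d##c.xc
display "Estimated jump at the cutoff: " _b[1.d]
```

Without centering, the coefficient on d would instead measure the gap between the two fitted lines at x = 0, far from the threshold.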
4. Repeat the previous problem for two more specifications. The first should use
a quadratic specification on both sides of the discontinuity. The second should
use a cubic specification. Compare your results with the previous problem.
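Higher-order specifications can be written by repeating the interaction, as in the sketch below on simulated data. The variable names are made up, and rescaling the centered running variable to thousands is an optional choice made here only to keep the squared and cubed terms on a manageable scale; it does not change the estimated jump.

```stata
* Simulated data only; all variable names are illustrative stand-ins.
clear
set seed 419
set obs 2000
generate x  = 500 + 99500*runiform()         // hypothetical running variable
generate xk = (x - 40500)/1000               // centered, in thousands
generate d  = xk >= 0                        // 1 at or above the cutoff
generate y  = 0.7 + 0.1*d + 0.001*xk + rnormal(0, 0.05)

* Quadratic on each side: d fully interacted with xk and xk^2.
regress y i.d##c.xk##c.xk
display "Quadratic jump estimate: " _b[1.d]

* Cubic on each side: d fully interacted with xk, xk^2, and xk^3.
regress y i.d##c.xk##c.xk##c.xk
display "Cubic jump estimate: " _b[1.d]
```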
5. Next we’ll try nonparametric RD by restricting our sample to a neighborhood
around the threshold.
(a) Rerun the linear specification using the subsample of observations whose
running variable is at most 5000 more or less than the threshold. Repeat with
10000 in place of 5000. Compare your point estimates, standard errors,
and sample sizes to previous regressions.
(b) The command rdbwselect y x, kernel(uni) bwselect(IK) generates
an “optimal” bandwidth for outcome y and running variable x. The bandwidth
is saved in the stored result e(h_IK). Run this command, and then rerun
the linear specification using the subsample of observations whose running
variable is at most e(h_IK) more or less than the threshold. Compare your
point estimates, standard errors, and sample sizes to previous regressions.
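Restricting the sample to a window around the threshold only requires an if condition on the centered running variable. The sketch below uses simulated data, made-up variable names, and a bandwidth of 7500 set by hand as a placeholder; it does not call rdbwselect.

```stata
* Simulated data only; all variable names are illustrative stand-ins.
clear
set seed 419
set obs 2000
generate x  = 500 + 99500*runiform()         // hypothetical running variable
generate xc = x - 40500                      // centered at the threshold
generate d  = xc >= 0                        // 1 at or above the cutoff
generate y  = 0.7 + 0.1*d + 0.000001*xc + rnormal(0, 0.05)

* Fixed windows: keep observations within h of the threshold.
foreach h of numlist 5000 10000 {
    regress y i.d##c.xc if abs(xc) <= `h'
    display "h = `h': jump = " _b[1.d] ", N = " e(N)
}

* A bandwidth held in a scalar works the same way. Here h is set by hand;
* a stored result such as e(h_IK) could be copied into it instead.
scalar h = 7500
regress y i.d##c.xc if abs(xc) <= h
```

Copying a stored result into a scalar right after the command that produced it keeps the value available, since later estimation commands such as regress replace the contents of e().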
6. For all future problems, we will use the linear specification with the optimal
bandwidth. Make sure to rerun the rdbwselect command with each new outcome
and running variable.
Let’s look at the effect of the policy on electoral outcomes. Replace the outcome
with the share of the vote that went to right-wing parties, measured by the
variable right.
(a) Run the regression, and replicate the binned mean scatterplot for this new
outcome.
(b) Comment on the results. What’s a story for the sign of the point estimate?
7. Run balance tests using at least four other relevant variables included in your
dataset. Comment on the results and what the tests mean.
8. We’ll do a visual placebo test by adding to the binned mean plot in question
#2 the same scatterplots and quadratic fits for two new outcomes, the share of
valid votes in the 1994 elections (prior to rollout of electronic voting), and the
share of valid votes in the 2002 elections (when all municipalities used electronic
voting). Comment on the shape of the plots.
9. Lastly, we’ll implement a visual version of McCrary’s density test. Create a new
variable tabulating the number of observations within each bin of the bin_x
variable. (Hint: generate a variable ones equal to 1 for all observations. Then use
egen to create a new variable that sums ones within each bin.) Graph a scatterplot
of the new variable against bin_x, restricted to values of the running variable
between 15,500 and 100,000. Also include a vertical line at the threshold. What
does the graph indicate?
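The counting step in the hint might look like the following on simulated data. The names x and n_bin are made up for illustration, and the simulated running variable is uniform, so this plot is flat by construction and says nothing about the real data.

```stata
* Simulated data only; x and n_bin are illustrative stand-ins.
clear
set seed 419
set obs 2000
generate x = 500 + 99500*runiform()          // hypothetical running variable

* Same bins as before: bin_x holds the left edge of each 4000-wide bin.
egen bin_x = cut(x), at(500(4000)200000)

* Count observations per bin by summing a column of ones within each bin.
generate ones = 1
egen n_bin = total(ones), by(bin_x)

* Bin counts against bin location, with a vertical line at the threshold.
twoway scatter n_bin bin_x if inrange(x, 15500, 100000), xline(40500)
```

The counts are computed on the full sample before the if restriction is applied, so a bin that straddles 15,500 still reports its full count.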