Fall 2019 SCM 460 / SCM 575 Midterm Exam
Directions
• This exam is open notes, open book, open computer.
• You may use all class materials and resources available to you on the internet.
• You may not communicate with other students in any way.
• Show your work. Credit will not be given to numbers without explanation or intermediate
calculations. In particular, R output with no accompanying analysis will not receive credit.
• Supply all additional analyses and charts that are needed for your analysis, even if I have not
explicitly asked for them.
• Clearly label your final answer. If it is not clear what the final answer is, you will be penalized.
• Maximum points will be awarded to correct and concise answers.
• You have from 10/14 to 10/21 at 11:59pm to complete the exam.
• You must submit two files: (i) a Word or PDF document giving your answers and a brief explanation
of your work, and (ii) the accompanying R script. Submit these files to Blackboard
before the due date.
1. What is the effect of unemployment on crime rates? To answer this question, we’ll be using
the “unemployment.csv” dataset, posted on Blackboard. This is a dataset of 46 different cities in
Michigan, their unemployment rates, and their crime rates, in two different years. Here is a brief
data description:
• ctyid: ID number of each city.
• crmrte82: crimes per 1000 people in 1982
• unem82: unemployment rate in 1982
• crmrte87: crimes per 1000 people in 1987
• unem87: unemployment rate in 1987
(a) (1 pt) To find the effect of unemployment on crime rates, first try regressing the crime rate on
unemployment, using only observations from 1987. Interpret the regression results. Briefly
discuss any issues or problems surrounding this approach - obviously there are some omitted variables
here, so list a few and explain how these variables can cause problems in the interpretation.
(b) (3 pts) Suppose that one such omitted variable is demographics (say, race, age, and gender).
Is there any way that we can control for this variable even though it is missing from our
dataset?
Note that across such a short period of time, from 1982 to 1987, demographics tend to be time-invariant;
that is, the demographics will be roughly the same in these two years. Let’s try to take
advantage of this fact in our analysis. Taking demographics into account, our regression model for
1987 becomes
$$\mathrm{crmrte}_{i,\,yr=1987} = \beta_0 + \beta_1\,\mathrm{unem}_{i,\,yr=1987} + \beta_2\,\mathrm{demographics}_i, \tag{0.1}$$
where i represents the city ID. In plain English, the crime rate in city i in year 1987 is a linear
function of the unemployment in city i in year 1987, and of the demographics in city i (note that
this is independent of year). Similarly, our regression model for 1982 is
$$\mathrm{crmrte}_{i,\,yr=1982} = \beta_0 + \beta_1\,\mathrm{unem}_{i,\,yr=1982} + \beta_2\,\mathrm{demographics}_i. \tag{0.2}$$
Now use these two equations to estimate β1. Hint: Subtract equation (0.2) from equation (0.1) and
run the resulting regression. The differenced equation no longer involves demographics, which is missing
from our dataset anyway. This method can be generalized to control for any number of time-invariant omitted
variables.
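As an illustration of the hint, subtracting (0.2) from (0.1) term by term cancels both β0 and β2 demographics_i, leaving

$$\mathrm{crmrte}_{i,\,yr=1987} - \mathrm{crmrte}_{i,\,yr=1982} = \beta_1\left(\mathrm{unem}_{i,\,yr=1987} - \mathrm{unem}_{i,\,yr=1982}\right).$$

A minimal R sketch of this first-difference regression follows. It runs on simulated data, not on the real file: the values, the hidden `demo` variable, and the names `d_crmrte`, `d_unem`, and `fd_fit` are assumptions made only for illustration. Only the column names are taken from the data description.

```r
# Simulated stand-in for unemployment.csv: same column names, made-up values.
set.seed(1)
n <- 46
demo <- rnorm(n)                       # hypothetical unobserved, time-invariant trait
unem82 <- 10 + 2 * demo + rnorm(n)
unem87 <- 8 + 2 * demo + rnorm(n)
beta1 <- 2                             # slope used to generate the fake data
crmrte82 <- 50 + beta1 * unem82 + 15 * demo + rnorm(n)
crmrte87 <- 50 + beta1 * unem87 + 15 * demo + rnorm(n)
df <- data.frame(ctyid = 1:n, crmrte82, unem82, crmrte87, unem87)

# First differences: anything constant within a city across years drops out.
df$d_crmrte <- df$crmrte87 - df$crmrte82
df$d_unem   <- df$unem87 - df$unem82

# The intercept cancels in the equations as written; lm() keeps one by default,
# which would absorb any common change between the two years.
# Use d_crmrte ~ d_unem - 1 to match the differenced equation exactly.
fd_fit <- lm(d_crmrte ~ d_unem, data = df)
summary(fd_fit)
```

With the actual data, the simulated data frame would presumably be replaced by `df <- read.csv("unemployment.csv")`, assuming the file carries the columns listed in the data description.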
2. What is the causal effect of attending an Ivy League school instead of a public school on wage?
To answer this question we focus on NYC students who applied to Columbia University (private)
and CUNY (public). In “college.csv” you are given the following information on 2000 students:
• CBApp: Dummy variable, 1 indicates the student applied to Columbia.
• CUNYApp: Dummy variable, 1 indicates the student applied to CUNY.
• CBAdmit: Dummy variable, 1 indicates the student was admitted to Columbia.
• CUNYAdmit: Dummy variable, 1 indicates the student was admitted to CUNY.
• CBAttend: Dummy variable, 1 indicates the student enrolled in Columbia, 0 indicates the
student enrolled in CUNY.
• wage: Annual wage of the student 10 years after graduation.
(a) (1pt) Perform the regression log(wage) = β0 + β1CBAttend. Give an interpretation of β1.
Does β1 give the causal effect of attending Columbia over CUNY on wage? If not, explain why.
(b) (3pts) As I suggested in the beginning of the semester, the comparison would make more
sense if we compared wages only amongst students that applied to and got admitted to the same
universities.
Add controls to the regression in (a) to do this. We want the effect of attending Columbia,
holding constant the schools that each student applied to and was accepted to. Explain how you came
up with your controls. Interpret ALL coefficients in your regression.
Hint: Create groups based on each unique combination of admittance and application decisions,
e.g. Group 1 is the set of students who applied to both schools and got accepted to both
schools, Group 2 is the set of students who applied to both schools and only got into CUNY, etc.
Then add these groups into your regression as controls. Note that this dataset is taken from college
students, so it does not include people who were not admitted into any university.
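One possible way to build the groups described in the hint is sketched below in R. The eight rows are made up purely to show the mechanics and are not taken from "college.csv". The names `college`, `group`, and `fit` are illustrative assumptions; only the column names come from the data description.

```r
# Tiny made-up data frame using the column names from the data description.
college <- data.frame(
  CBApp     = c(1, 1, 1, 0, 1, 1, 0, 1),
  CUNYApp   = c(1, 1, 0, 1, 1, 1, 1, 0),
  CBAdmit   = c(1, 0, 1, 0, 1, 0, 0, 1),
  CUNYAdmit = c(1, 1, 0, 1, 1, 1, 1, 0),
  CBAttend  = c(1, 0, 1, 0, 0, 0, 0, 1),
  wage      = c(90000, 60000, 85000, 55000, 88000, 62000, 50000, 80000)
)

# One factor level per observed combination of application and admission
# outcomes; drop = TRUE discards combinations that never occur in the data.
college$group <- interaction(college$CBApp, college$CUNYApp,
                             college$CBAdmit, college$CUNYAdmit,
                             drop = TRUE)
table(college$group)

# Group dummies enter as controls; lm() expands the factor automatically
# and uses its first level as the reference group.
fit <- lm(log(wage) ~ CBAttend + group, data = college)
coef(fit)
```

Level labels such as `1.1.1.1` read in the order CBApp, CUNYApp, CBAdmit, CUNYAdmit. Using a single factor rather than hand-built dummies is a design choice here: it avoids listing the combinations manually.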
(c) (2 pts) How does the significance of CBAttend change between (a) and (b)? Give intuition
why this is the case. Explain why your regression in (b) accounts for the omitted variables that
you mentioned in (a). What do you conclude about the effect on wage of attending Columbia over
CUNY?