Final Coursework
Introduction to Quantitative Research Methods
The coursework will be posted on Moodle on 15 December 2017 at 2pm, and is due on 2 January
2018 at 2pm. Please follow all designated SPP submission guidelines for online submission as
detailed on the Moodle page. Late submission results in an automatic fail.
• Coursework should be submitted via the ‘Essay 2 Turnitin Submission’ link on the course
Moodle page. You will need to click the ‘Submit Paper’ link at the bottom of the page. When
presented with the ‘Submit Paper’ box, the ‘Submission Title’ should be your candidate number
(e.g. ABCD1), and you should upload your document into the box provided.
– Please remember to state only your candidate number on your coursework (your candidate
number is made up of four letters and one number e.g. ABCD1). Your name and/or student
number must not appear on your coursework.
• This is an assessed piece of coursework (worth 75% of your final module mark) for the PUBLG100A/B
module; collaboration and/or discussion of the coursework with anyone is strictly prohibited. The
rules for plagiarism apply and any cases of suspected plagiarism of published work or the work of
classmates will be taken seriously.
• The word count for this assessment is 3000 words (not including tables or the appendix). The
word limit will be strictly enforced, and submissions with more than 3000 words will have marks
deducted.
• As this is an assessed piece of work, you may not email/ask the course tutors or teaching fellows
questions about the coursework.
• Along with the coursework itself, the datasets for the coursework (vdem.Rdata and bes.Rdata)
can be found in the PUBLG100A/B page on Moodle. You will need to use the load() function
to open these in R.
• The coursework consists of 3 sections; you must complete each part of each section to achieve the
full amount of points. The points available for each section are given in the section headings.
• Where appropriate, answers should be written in complete sentences; no bulleting or outlining.
Be sure to answer all parts of the questions posed and interpret the results.
• PLEASE SUBMIT YOUR TYPE-WRITTEN ANSWERS IN ONE DOCUMENT. CREATE AT
THE END AN APPENDIX SECTION CONTAINING ALL R CODE NEEDED TO REPRODUCE
YOUR RESPONSES (you do not need to include the code that failed to run, but just the
cleaned-up version. Your code has to work when we run it). FAILURE TO INCLUDE THE R
CODE MEANS THAT THE COURSEWORK WILL BE MARKED INCOMPLETE (fail).
• You may assume the methods you have used (e.g. linear regression, logit, etc) are understood by
the reader and do not need definitions.
• Round all numbers to two digits after the decimal point.
• Do not copy and paste any raw R output (e.g. summary(lm(y ~ x))) into your answers. Create
a minimally formatted table, e.g. with the screenreg command as seen in class. If that does not
work, re-create such a table by hand.
• Assign every table and figure a title and a number and refer to the number in the text when
discussing a specific figure or table.
• All variable names in the coursework are written in code font.
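As a rough illustration of the workflow these points describe (opening an .Rdata file with load(), fitting a model, and reporting rounded estimates in a table rather than raw output), a minimal sketch might look like the following. The toy data frame and its columns x and y are hypothetical and only stand in for the coursework data. The screenreg call assumes the texreg package is installed and is skipped otherwise.

```r
# Hypothetical toy data; NOT one of the coursework datasets.
toy <- data.frame(x = c(1, 2, 3, 4), y = c(2.1, 3.9, 6.2, 7.8))

# Save to a temporary .Rdata file so that load() can be demonstrated.
path <- file.path(tempdir(), "toy.Rdata")
save(toy, file = path)
rm(toy)

# load() restores the saved objects into the workspace and
# (invisibly) returns their names as a character vector.
loaded_names <- load(path)

fit <- lm(y ~ x, data = toy)

# Estimates rounded to two digits after the decimal point.
rounded_coefs <- round(coef(fit), 2)
print(rounded_coefs)

# A minimally formatted table instead of raw summary() output
# (assumes the texreg package is available).
if (requireNamespace("texreg", quietly = TRUE)) {
  print(texreg::screenreg(fit, digits = 2))
}
```

Because load() returns the names of the restored objects, printing loaded_names is a quick way to find out what a supplied .Rdata file actually contains.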
University of Chicago December 2017
Datasets
1) Varieties of Democracy – vdem.Rdata
This data set includes several variables taken from the Varieties of Democracy project
(https://www.v-dem.net/en/). The unit of analysis is the country-year. The data here covers 161
countries for the years 1976, 1981, 1986, 1991, 1996, 2001 and 2006. Not every country is
included for every year, and there are a total of 1040 observations in the data. The variables in
the data set are:
• year – Year
• country name – Name of country
• region name – Geographic region in which the country is located
• life expectancy – Life expectancy at birth (in years)
• radio television per cap – Number of radio and television sets per capita
• log population – Logged population
• civil war – 1 if there was an intra-state war with at least 1,000 battle deaths in this
country-year, 0 otherwise
• international war – 1 if the country participated in an international armed conflict in
a given year, 0 otherwise
• urban population pct – Percentage of population living in urban areas
• oil production per cap – Value of petroleum produced per capita (US dollars)
• inequality gini – Distribution of income expressed as a Gini coefficient
• gdp per cap – Gross domestic product per capita
• inflation – Annual inflation rate
• education15 – Average years of education among citizens older than 15
• government effectiveness – A continuous measure of government effectiveness based
on the quality of public service provision amongst bureaucrats and government actors
• political stability – A continuous measure of political stability based on perceptions
of the likelihood that the government in power will be destabilized or overthrown by
possibly unconstitutional and/or violent means
• polity – Score on the polity scale (higher values indicate more democratic countries,
lower values indicate more autocratic countries)
• healthcare – A continuous variable measuring the extent to which high quality basic
healthcare is guaranteed to all (higher values indicate higher access to healthcare)
• womens civ lib – A continuous variable indicating whether women have the ability to
make meaningful decisions in key areas of their lives (higher values indicate higher levels
of civil liberties for women)
• media censorship – A continuous variable indicating whether the government directly or
indirectly attempts to censor the print or broadcast media (lower values indicate higher
levels of censorship)
• internet access – 1 if there is internet access in this country-year, 0 otherwise
2) British Election Study – bes.Rdata
This data set includes several variables taken from the 2015 British Election Study
(http://www.britishelectionstudy.com/data-objects/cross-sectional-data/). The unit of
analysis is individual respondents to a face-to-face survey. There are a total of 1669
observations in the data. The variables in the data set are:
• turnout – 1 if the respondent voted in the 2015 election, 0 otherwise
• last election – 1 if the respondent voted in the previous general election, 0 otherwise
• age – Age (in years) of the respondent
• education – Factor variable for the education of the respondent (1 = “None”, 2 =
“GCSE”, 3 = “A-level”, 4 = “Degree”, 5 = “Other”)
• income – Factor variable for the income of the respondent (Low, Mid, Hi)
• gender – Gender of the respondent (1 = “Female”, 2 = “Male”)
• religion – Religion of the respondent (1 = “None”, 2 = “Christian”, 3 = “Jewish”, 4 =
“Hindu”, 5 = “Muslim”, 6 = “Sikh”, 7 = “Buddhist”, 8 = “Other”)
• homeowner – Home-owning status of the respondent (1 = “It belongs to a Housing
Association”, 2 = “Own home on mortgage”, 3 = “Own home outright”, 4 = “Rented from
local authority”, 5 = “Rented from private landlord”)
• ethnicity – Ethnicity of the respondent (1 = “White”, 2 = “Mixed”, 3 = “Asian”, 4 =
“Black British”, 5 = “Other”)
• political interest – Political interest of the respondent (1 = “Fairly interested”, 2 =
“Not at all interested”, 3 = “Not very interested”, 4 = “Very interested”)
• encouragement – Number of times the respondent was encouraged to vote by a friend or
family member
• region – Region of the respondent (1 = “East Midlands”, 2 = “East of England”, 3 =
“London”, 4 = “North East”, 5 = “North West”, 6 = “Scotland”, 7 = “South East”, 8 =
“South West”, 9 = “Wales”, 10 = “West Midlands”, 11 = “Yorkshire and the Humber”)
• partyID – Generic party identification of the respondent (1 = “Other”, 2 = “None”, 3
= “Labour”, 4 = “Conservative”, 5 = “Liberal Democrat”, 6 = “Scottish National Party
(SNP)”, 7 = “Green Party”)
• left right – Respondent self-placement on a left-right scale (0 = Left, 10 = Right)
• attention – Respondent attention to politics (0 = Pay no attention, 10 = Pay a great
deal of attention)
• read papers – Factor variable for whether the respondent regularly reads a newspaper (1
= “No”, 2 = “Yes”)
• campaign contact – Factor variable for whether the respondent was contacted by a
political party during the campaign (1 = “No”, 2 = “Yes”)
• financial situation – Factor variable for the respondent’s impression of their financial
situation over the past year (1 = “A lot worse”, 2 = “A little worse”, 3 = “The same”, 4
= “A little better”, 5 = “A lot better”)
• democratic satisfaction – Respondent’s level of satisfaction with democracy (1 =
“Very dissatisfied”, 2 = “A little dissatisfied”, 3 = “Fairly satisfied”, 4 = “Very satisfied”)
• trust politicians – Respondent’s level of trust in UK politicians (0 = No trust, 10 =
A great deal of trust)
• eu referendum – Respondent’s intended vote in the 2016 EU referendum (1 = “I would
not vote”, 2 = “Leave the EU”, 3 = “Stay in the EU”)
• political activity – Number of times the respondent has engaged in political activities
over the past year (campaigning, signing petitions, etc)
• unemployed – 1 if the respondent is currently unemployed, 0 otherwise
• most important issue – Factor variable measuring the most important political issue to
the respondent (many categories)
Question 1 (20 pts.) – The gender gap in promotions
An important question in economics is whether female ‘role models’ help to reduce the gender
gap in career progression within organisations. To investigate this idea, we collect data on 1500
employees who applied for promotion within their firm. We will focus on the following variables:
• promotion – 1 if the employee was promoted following their application, 0 otherwise
• female employee – 1 if the employee is female, 0 otherwise
• female manager – 1 if the employee’s manager is female, 0 otherwise
• employee quality – A variable measuring the “quality” of the employee based on their
performance evaluations over the past year (the variable is measured from 0 to 10, with
higher values indicating higher quality employees)
To test the ‘role model’ effect, we specify a logistic regression with promotion as the dependent
variable, and female employee, female manager and employee quality as independent
variables. We also include the interaction between female employee and female manager.
The equation for the model we estimate is:
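\[
\operatorname{logit}(\pi_i) = \log\left(\frac{\pi_i}{1 - \pi_i}\right) = \alpha
  + \beta_1 \cdot \text{employee quality}_i
  + \beta_2 \cdot \text{female employee}_i
  + \beta_3 \cdot \text{female manager}_i
  + \beta_4 \cdot (\text{female employee}_i \cdot \text{female manager}_i)
\]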
where π_i is the probability that Y = 1 (i.e. the employee was promoted) for observation i. The
estimates from the logistic regression are shown in Table 1.
Table 1: Logistic regression for employee promotion

                                     Dependent variable:
                                          Promotion
Employee Quality                            0.326
                                           (0.036)
Female Employee                            −1.338
                                           (0.204)
Female Manager                              0.085
                                           (0.276)
Female Employee * Female Manager            0.983
                                           (0.470)
Constant                                    0.977
                                           (0.181)
Observations                                1,500
Log Likelihood                           −436.590
1. What is the null hypothesis for the interaction term β4?
2. Which coefficients are significantly different from zero at the 95% confidence level?
3. Describe the effect of employee quality on promotion using the concept of an odds ratio.
4. Describe the effects of being a female employee on promotion using odds ratios.
5. Calculate the predicted probability of promotion for the following types of employee:
• A female employee, with an employee quality of 6, working with a male manager
• A male employee, with an employee quality of 6, working with a male manager
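For readers who want a reminder of the mechanics behind odds ratios and predicted probabilities in a logit model, the following is a minimal sketch. The coefficients alpha and beta_x and the value x = 2 are hypothetical and deliberately unrelated to the estimates in Table 1. The sketch only shows the two transformations involved: exponentiating a coefficient, and applying the inverse logit to a linear predictor.

```r
# Hypothetical log-odds coefficients; NOT the estimates in Table 1.
alpha  <- -1.0   # intercept
beta_x <-  0.5   # coefficient on a continuous predictor x

# Odds ratio for a one-unit increase in x: exponentiate the coefficient.
odds_ratio <- exp(beta_x)

# Predicted probability at x = 2: compute the linear predictor (log-odds),
# then apply the inverse logit, 1 / (1 + exp(-eta)), available as plogis().
eta  <- alpha + beta_x * 2
prob <- plogis(eta)

round(c(odds_ratio = odds_ratio, probability = prob), 2)
```

With an interaction term in the model, the linear predictor for a given profile would also include the interaction coefficient multiplied by the product of the two interacted variables.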
Question 2 (40 pts.) – Determinants of life expectancy
The expected length of a life varies greatly across countries and over time. A crucial question
in the literature of politics and public health is: what determines average life expectancy? In
this section, you will use a linear regression model to help to understand this question.
2.1
Your task in this section is to develop a theoretically grounded model of life expectancy using
the varieties of democracy data (dataset 1 above). You should implement an appropriate model
for life expectancy with six theoretically important explanatory variables from the
supplied dataset. You should explain why you would expect these variables to have an effect
on life expectancy. You should not estimate several models but argue theoretically why you
chose certain variables. Think carefully about the relevant variables that you want to include –
including considering whether non-linear and/or interactive specifications of those variables are
appropriate.
In your answer, you should focus on communicating the substantive implications of the
regression that you implement. You may wish to focus on the following:
• Provide descriptive statistics and/or plots to provide the reader with an overview of the
dependent variable and the important explanatory variables that you intend to use.
• Implement a model which uses 6 explanatory variables to explain life expectancy. Do
not include the healthcare variable in this model. You should state an appropriate
hypothesis/null hypothesis for each of the variables in your model.
• Discuss the fit of your model using appropriate statistics.
• In addition to the estimated results, present quantities of interest from the model that
illustrate the relative importance of the different explanatory factors. Examine the effects
for sensible values of the independent variables, and focus your interpretation not just on
the direction of the effects, but also the magnitude of the effects.
• Evaluate your regression model with reference to the assumptions of linear regression and,
if appropriate, implement corrections when these assumptions appear to be violated.
You should write up your results as if they were to be published in a political science journal
article with a focus on communicating the substantive meaning of your results.
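The kind of analysis outlined in 2.1 could be sketched roughly as follows. Everything here is an assumption made for illustration: the data are simulated, and the names wealth, urban, war and outcome are placeholders rather than variables from vdem.Rdata. The sketch shows one way to specify a quadratic term and an interaction in lm(), extract a fit statistic, compute a quantity of interest at sensible values, and inspect the residuals.

```r
set.seed(1)
n <- 200

# Simulated, hypothetical data; NOT the Varieties of Democracy data.
sim <- data.frame(
  wealth = runif(n, 0, 10),
  urban  = runif(n, 0, 100),
  war    = rbinom(n, 1, 0.2)
)
sim$outcome <- 50 + 4 * sim$wealth - 0.2 * sim$wealth^2 +
  0.05 * sim$urban - 3 * sim$war + rnorm(n, sd = 2)

# Quadratic term via I(); urban * war expands to both main effects
# plus their product.
fit <- lm(outcome ~ wealth + I(wealth^2) + urban * war, data = sim)

# Fit statistic: adjusted R-squared.
adj_r2 <- summary(fit)$adj.r.squared

# Quantity of interest: expected outcome at the 25th and 75th percentile
# of wealth, holding urban at its mean and war at 0.
profiles <- data.frame(
  wealth = unname(quantile(sim$wealth, c(0.25, 0.75))),
  urban  = mean(sim$urban),
  war    = 0
)
pred <- predict(fit, newdata = profiles, interval = "confidence")
first_difference <- pred[2, "fit"] - pred[1, "fit"]

round(adj_r2, 2)
round(pred, 2)
round(first_difference, 2)

# Residuals against fitted values: a basic visual check for
# non-linearity and non-constant error variance.
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

Reporting the difference between two predicted values, rather than a raw coefficient, is one way to convey the magnitude of an effect when the variable enters the model non-linearly.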
2.2
You present the results of the model you developed in question 2.1 to a friend who suggests
that your model is too complicated. In particular, the friend argues that the only thing that
matters for determining average life expectancy is the level of healthcare in the country. You
decide to investigate whether your friend is right:
1. Using a statistical test you have learned on this course, evaluate your friend’s claim that
healthcare alone provides a better explanation for life expectancy than your model does.
2. Incorporate the healthcare variable into your model from 2.1 and use a statistical test
you have learned on this course to evaluate whether the model is improved by the inclusion
of this variable.
3. What is the interpretation of the healthcare variable in your new model?
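The handout does not say which statistical tests were taught on the course, so the following is only a sketch of two common ways to compare linear models. It uses simulated data with placeholder names (x1, x2, z, y). Nested models can be compared with an F-test via anova(), while non-nested models are often compared with adjusted R-squared or an information criterion such as AIC.

```r
set.seed(2)
n <- 200

# Simulated, hypothetical data; x1, x2, z and y are placeholder names.
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), z = rnorm(n))
sim$y <- 1 + 0.8 * sim$x1 - 0.5 * sim$x2 + 1.2 * sim$z + rnorm(n)

fit_base <- lm(y ~ x1 + x2, data = sim)      # original model
fit_z    <- lm(y ~ z, data = sim)            # rival model, single predictor
fit_full <- lm(y ~ x1 + x2 + z, data = sim)  # original model plus z

# Nested comparison: F-test of the null that the coefficient on z is zero.
f_test  <- anova(fit_base, fit_full)
p_value <- f_test[2, "Pr(>F)"]

# Non-nested comparison: fit_base and fit_z are not nested, so compare
# adjusted R-squared (higher is better) or AIC (lower is better) instead.
adj_r2 <- c(base = summary(fit_base)$adj.r.squared,
            z    = summary(fit_z)$adj.r.squared)

print(f_test)
round(adj_r2, 2)
round(AIC(fit_base, fit_z), 2)
```

The F-test is only valid when one model is a restricted version of the other and both are fitted to the same observations, so rows with missing values on the added variable need to be handled consistently.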
Question 3 (40 pts.) – Who votes?
Why do individuals vote in elections? This is a central question in political science, and has
important normative implications: if some types of people vote less than others, they are less
likely to have their views represented in the political process, and are less likely to benefit
from favourable public policy. In this question, you will use an appropriate limited dependent
variable model to improve our understanding of which types of citizens are more likely to vote.
3.1
The data is from the 2015 British Election Study (BES) and includes information on the political
attitudes and demographics of UK citizens. The dependent variable for this analysis is turnout,
which equals 1 if the respondent voted in the 2015 election, and 0 if they did not vote.
You should implement an appropriate limited dependent variable model with 5 theoretically
important predictors from the dataset. You should explain why you have selected the variables
that you include in the model and explain – from a theoretical perspective – why you expect
them to be important for determining whether an individual decides to vote or not. As with
question 2, you should think carefully about your choice of variables, and consider whether it
would be appropriate to include non-linear or interactive specifications of these variables.
You should focus on the following:
• Fit a model which uses 5 explanatory variables to predict turnout. Do not include the
last election variable in this model. You should state an appropriate hypothesis/null
hypothesis for each of the variables in your model.
• Provide and discuss an appropriate fit statistic for your model.
• Interpret your model in both statistical and substantive terms. You should present
predicted probabilities from the model that help to illustrate the substantive importance of
the variables in your model (i.e. simply reporting estimated log-odds ratios is not sufficient
for full marks).
• Create at least one plot of predicted probabilities from your model for a continuous
independent variable.
You should write up your results as if they were to be published in a political science journal
article with a focus on communicating the substantive meaning of your results.
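A plot of predicted probabilities of the kind requested in 3.1 could be produced along the following lines. This is a minimal sketch on simulated data: the names age_sim, member and voted are placeholders and not variables from bes.Rdata. The classification rate at the end is just one possible fit statistic.

```r
set.seed(3)
n <- 500

# Simulated, hypothetical data; NOT the British Election Study data.
sim <- data.frame(
  age_sim = round(runif(n, 18, 90)),
  member  = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
eta <- -2 + 0.04 * sim$age_sim + 0.8 * (sim$member == "Yes")
sim$voted <- rbinom(n, 1, plogis(eta))

# Logistic regression for a binary dependent variable.
fit <- glm(voted ~ age_sim + member, data = sim,
           family = binomial(link = "logit"))

# Predicted probabilities across the range of the continuous predictor,
# holding the factor at "No".
grid <- data.frame(
  age_sim = seq(18, 90, by = 1),
  member  = factor("No", levels = levels(sim$member))
)

# Build the interval on the log-odds scale, then transform to probabilities
# so that the bounds stay between 0 and 1.
link_pred  <- predict(fit, newdata = grid, type = "link", se.fit = TRUE)
grid$prob  <- plogis(link_pred$fit)
grid$lower <- plogis(link_pred$fit - 1.96 * link_pred$se.fit)
grid$upper <- plogis(link_pred$fit + 1.96 * link_pred$se.fit)

plot(grid$age_sim, grid$prob, type = "l", ylim = c(0, 1),
     xlab = "Age (simulated)", ylab = "Predicted probability")
lines(grid$age_sim, grid$lower, lty = 2)
lines(grid$age_sim, grid$upper, lty = 2)

# One possible fit statistic: share of observations classified correctly
# at a 0.5 threshold (AIC(fit) is another common choice).
correct <- mean((fitted(fit) > 0.5) == (sim$voted == 1))
round(correct, 2)
```

A classification rate is most informative when set against the naive benchmark of always predicting the modal outcome.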
3.2
You are an advisor to a UK political party. The party bosses look at your results from the model
you developed in section 3.1 and tell you that, in their experience, the most important predictor for
whether a voter votes on election day is whether they voted in the previous election. They would
like to know how your results would change when accounting for past turnout.
1. Estimate a new version of your model, this time including the last election variable.
2. Does this model provide a better fit to the data than your original model? Use a statistical
test that you have learned on this course to check.
3. How – if at all – does your interpretation of the original variables change in this new model?
If the interpretation does change, why might this be the case?
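One common way to check whether adding a variable improves a logit model is a likelihood ratio test. Whether this is the test intended by the course is an assumption. The sketch below uses simulated data with placeholder names (x, past, y), computes the test by hand, and then repeats it with anova().

```r
set.seed(4)
n <- 500

# Simulated, hypothetical data; x, past and y are placeholder names.
sim <- data.frame(x = rnorm(n), past = rbinom(n, 1, 0.6))
sim$y <- rbinom(n, 1, plogis(-0.5 + 0.6 * sim$x + 1.5 * sim$past))

fit_small <- glm(y ~ x, data = sim, family = binomial)
fit_large <- glm(y ~ x + past, data = sim, family = binomial)

# Likelihood ratio statistic: twice the gain in log-likelihood,
# compared with a chi-squared distribution whose degrees of freedom
# equal the number of added parameters.
lr_stat <- as.numeric(2 * (logLik(fit_large) - logLik(fit_small)))
df_diff <- attr(logLik(fit_large), "df") - attr(logLik(fit_small), "df")
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

# The same test via anova().
lr_table <- anova(fit_small, fit_large, test = "Chisq")

round(c(LR = lr_stat, df = df_diff, p = p_value), 2)
print(lr_table)

# Coefficients side by side, to see how the estimate on x changes
# once the additional predictor is included.
round(cbind(small = c(coef(fit_small), past = NA),
            large = coef(fit_large)), 2)
```

Both models must be estimated on the same set of observations for the test to be valid.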