Introduction to Quantitative Research Methods (PUBLG100A/B)
The coursework will be posted on Moodle on 15 December 2017 at 2pm, and is due on 8 January
2018 at 2pm. Please follow all designated SPP submission guidelines for online submission as
detailed on the PUBLG100A/B Moodle page. Late submission results in an automatic fail.
Coursework should be submitted via the ‘PUBLG100A(B) Essay 2 Turnitin Submission’ link on
the course Moodle page. You will need to click the ‘Submit Paper’ link at the bottom of the page.
When presented with the ‘Submit Paper’ box, the ‘Submission Title’ should be your candidate
number (e.g. ABCD1), and you should upload your document into the box provided.
- Please remember to state only your candidate number on your coursework (your candidate
number is made up of four letters and one number e.g. ABCD1). Your name and/or student
number must not appear on your coursework.
This is an assessed piece of coursework (worth 75% of your final module mark) for the PUBLG100A/B
module; collaboration and/or discussion of the coursework with anyone is strictly prohibited. The
rules for plagiarism apply and any cases of suspected plagiarism of published work or the work of
classmates will be taken seriously.
The word count for this assessment is 3000 words (not including tables or the appendix). The
word limit will be strictly enforced, and submissions with more than 3000 words will have marks
deducted.
As this is an assessed piece of work, you may not email/ask the course tutors or teaching fellows
questions about the coursework.
Along with the coursework itself, the datasets for the coursework (vdem.Rdata and bes.Rdata)
can be found in the PUBLG100A/B page on Moodle. You will need to use the load() function
to open these in R.
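A minimal sketch of loading the two files, assuming they have been downloaded into the current working directory. The names of the objects stored inside the files are not stated here, so the sketch simply reports whatever load() restores.

```r
# Assumption: vdem.Rdata and bes.Rdata sit in the current working directory.
files <- c("vdem.Rdata", "bes.Rdata")

for (f in files) {
  if (file.exists(f)) {
    # load() restores the saved objects and (invisibly) returns their names
    loaded <- load(f)
    cat(f, "contains:", paste(loaded, collapse = ", "), "\n")
  } else {
    cat(f, "not found in", getwd(), "\n")
  }
}
```

If a file is reported as not found, setwd() or a full path to the file is needed before calling load().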
The coursework consists of 3 sections; you must complete each part of each section to achieve the
full amount of points. The points available for each section are given in the section headings.
Where appropriate, answers should be written in complete sentences; no bulleting or outlining.
Be sure to answer all parts of the questions posed and interpret the results.
PLEASE SUBMIT YOUR TYPE-WRITTEN ANSWERS IN ONE DOCUMENT. CREATE AT
THE END AN APPENDIX SECTION CONTAINING ALL R CODE NEEDED TO REPRODUCE
YOUR RESPONSES (you do not need to include code that failed to run, just the
cleaned-up version; your code has to work when we run it). FAILURE TO INCLUDE THE R
CODE MEANS THAT THE COURSEWORK WILL BE MARKED INCOMPLETE (fail).
You may assume the methods you have used (e.g. linear regression, logit, etc.) are understood by
the reader and do not need definitions.
Round all numbers to two digits after the decimal point.
Do not copy and paste any raw R output (e.g. summary(lm(y ~ x))) into your answers. Create
a minimally formatted table, e.g. with the screenreg command as seen in class. If that does not
work, re-create such a table by hand.
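The sketch below illustrates one way to produce such a table. It uses simulated data with hypothetical variable names (x, y), and assumes the texreg package, which provides screenreg, is installed; if it is not, the sketch falls back to a hand-built table of coefficients and standard errors.

```r
# Simulated data for illustration only; not the coursework data.
set.seed(1)
d <- data.frame(x = rnorm(100))
d$y <- 1 + 2 * d$x + rnorm(100)

m <- lm(y ~ x, data = d)

if (requireNamespace("texreg", quietly = TRUE)) {
  # digits = 2 matches the two-decimal rounding rule above
  print(texreg::screenreg(m, digits = 2, custom.model.names = "Model 1"))
} else {
  # Fallback: estimates and standard errors, rounded to two decimals
  print(round(summary(m)$coefficients[, 1:2], 2))
}
```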
Assign every table and figure a title and a number and refer to the number in the text when
discussing a specific figure or table.
All variable names in the coursework are written in code font.
Final coursework PUBLG100A/B December 2017
Datasets
1) Varieties of Democracy – vdem.Rdata
This data set includes several variables taken from the Varieties of Democracy project
(https://www.v-dem.net/en/). The unit of analysis is the country-year. The data here covers 161
countries for the years 1976, 1981, 1986, 1991, 1996, 2001 and 2006. Not every country is
included for every year, and there are a total of 1040 observations in the data. The variables in
the data set are:
year – Year
country_name – Name of country
region_name – Geographic region in which the country is located
life_expectancy – Life expectancy at birth (in years)
radio_television_per_cap – Number of radio and television sets per capita
log_population – Logged population
civil_war – 1 if there was an intra-state war with at least 1,000 battle deaths in this
  country-year, 0 otherwise
international_war – 1 if the country participated in an international armed conflict in
  a given year, 0 otherwise
urban_population_pct – Percentage of population living in urban areas
oil_production_per_cap – Value of petroleum produced per capita (US dollars)
inequality_gini – Distribution of income expressed as a Gini coefficient
gdp_per_cap – Gross domestic product, per capita
inflation – Annual inflation rate
education15 – Average years of education among citizens older than 15
government_effectiveness – A continuous measure of government effectiveness based
  on the quality of public service provision amongst bureaucrats and government actors
political_stability – A continuous measure of political stability based on perceptions
  of the likelihood that the government in power will be destabilized or overthrown by
  possibly unconstitutional and/or violent means
polity – Score on the polity scale (higher values indicate more democratic countries,
  lower values indicate more autocratic countries)
healthcare – A continuous variable measuring the extent to which high quality basic
  healthcare is guaranteed to all (higher values indicate higher access to healthcare)
womens_civ_lib – A continuous variable indicating whether women have the ability to
  make meaningful decisions in key areas of their lives (higher values indicate higher levels
  of civil liberties for women)
media_censorship – A continuous variable indicating whether the government directly or
  indirectly attempts to censor the print or broadcast media (lower values indicate higher
  levels of censorship)
internet_access – 1 if there is internet access in this country-year, 0 otherwise
2) British Election Study – bes.Rdata
This data set includes several variables taken from the 2015 British Election Study
(http://www.britishelectionstudy.com/data-objects/cross-sectional-data/). The unit of
analysis is individual respondents to a face-to-face survey. There are a total of 1669
observations in the data. The variables in the data set are:
turnout – 1 if the respondent voted in the 2015 election, 0 otherwise
last_election – 1 if the respondent voted in the previous general election, 0 otherwise
age – Age (in years) of the respondent
education – Factor variable for the education of the respondent (1 = “None”, 2 =
  “GCSE”, 3 = “A-level”, 4 = “Degree”, 5 = “Other”)
income – Factor variable for the income of the respondent (Low, Mid, Hi)
gender – Gender of the respondent (1 = “Female”, 2 = “Male”)
religion – Religion of the respondent (1 = “None”, 2 = “Christian”, 3 = “Jewish”, 4 =
  “Hindu”, 5 = “Muslim”, 6 = “Sikh”, 7 = “Buddhist”, 8 = “Other”)
homeowner – Home-owning status of the respondent (1 = “It belongs to a Housing
  Association”, 2 = “Own home on mortgage”, 3 = “Own home outright”, 4 = “Rented from
  local authority”, 5 = “Rented from private landlord”)
ethnicity – Ethnicity of the respondent (1 = “White”, 2 = “Mixed”, 3 = “Asian”, 4 =
  “Black British”, 5 = “Other”)
political_interest – Political interest of the respondent (1 = “Fairly interested”, 2 =
  “Not at all interested”, 3 = “Not very interested”, 4 = “Very interested”)
encouragement – Number of times the respondent was encouraged to vote by a friend or
  family member
region – Region of the respondent (1 = “East Midlands”, 2 = “East of England”, 3 =
  “London”, 4 = “North East”, 5 = “North West”, 6 = “Scotland”, 7 = “South East”, 8 =
  “South West”, 9 = “Wales”, 10 = “West Midlands”, 11 = “Yorkshire and the Humber”)
partyID – Generic party identification of the respondent (1 = “Other”, 2 = “None”, 3
  = “Labour”, 4 = “Conservative”, 5 = “Liberal Democrat”, 6 = “Scottish National Party
  (SNP)”, 7 = “Green Party”)
left_right – Respondent self-placement on a left-right scale (0 = Left, 10 = Right)
attention – Respondent attention to politics (0 = Pay no attention, 10 = Pay a great
  deal of attention)
read_papers – Factor variable for whether the respondent regularly reads a newspaper (1
  = “No”, 2 = “Yes”)
campaign_contact – Factor variable for whether the respondent was contacted by a
  political party during the campaign (1 = “No”, 2 = “Yes”)
financial_situation – Factor variable for the respondent’s impression of their financial
  situation over the past year (1 = “A lot worse”, 2 = “A little worse”, 3 = “The same”, 4
  = “A little better”, 5 = “A lot better”)
democratic_satisfaction – Respondent’s level of satisfaction with democracy (1 =
  “Very dissatisfied”, 2 = “A little dissatisfied”, 3 = “Fairly satisfied”, 4 = “Very
  satisfied”)
trust_politicians – Respondent’s level of trust in UK politicians (0 = No trust, 10 =
  A great deal of trust)
eu_referendum – Respondent’s intended vote in the 2016 EU referendum (1 = “I would
  not vote”, 2 = “Leave the EU”, 3 = “Stay in the EU”)
political_activity – Number of times the respondent has engaged in political activities
  over the past year (campaigning, signing petitions, etc.)
unemployed – 1 if the respondent is currently unemployed, 0 otherwise
most_important_issue – Factor variable measuring the most important political issue to
  the respondent (many categories)
Question 1 (20 pts.) – The gender gap in promotions
An important question in economics is whether female ‘role models’ help to reduce the gender
gap in career progression within organisations. To investigate this idea, we collect data on 1500
employees who applied for promotion within their firm. We will focus on the following variables:
promotion – 1 if the employee was promoted following their application, 0 otherwise
female_employee – 1 if the employee is female, 0 otherwise
female_manager – 1 if the employee’s manager is female, 0 otherwise
employee_quality – A variable measuring the “quality” of the employee based on their
  performance evaluations over the past year (the variable is measured from 0 to 10, with
  higher values indicating higher quality employees)
To test the ‘role model’ effect, we specify a logistic regression with promotion as the
dependent variable, and female_employee, female_manager and employee_quality as independent
variables. We also include the interaction between female_employee and female_manager.
The equation for the model we estimate is:

$$
\begin{aligned}
\operatorname{logit}(\pi_i) = \log\left(\frac{\pi_i}{1 - \pi_i}\right) = \alpha
  &+ \beta_1\,\texttt{employee\_quality}_i \\
  &+ \beta_2\,\texttt{female\_employee}_i \\
  &+ \beta_3\,\texttt{female\_manager}_i \\
  &+ \beta_4\,(\texttt{female\_employee}_i \times \texttt{female\_manager}_i)
\end{aligned}
$$

where $\pi_i$ is the probability that Y = 1 (i.e. the employee was promoted) for observation i. The
estimates from the logistic regression are shown in Table 1.
Table 1: Logistic regression for employee promotion

                                     Dependent variable:
                                     Promotion
Employee Quality                     0.326
                                     (0.036)
Female Employee                      1.338
                                     (0.204)
Female Manager                       0.085
                                     (0.276)
Female Employee * Female Manager     0.983
                                     (0.470)
Constant                             0.977
                                     (0.181)
Observations                         1,500
Log Likelihood                       436.590

1. What is the null hypothesis for the interaction term $\beta_4$?
2. Which coefficients are significantly different from zero at the 95% confidence level?
3. Describe the effect of employee quality on promotion using the concept of an odds ratio.
4. Describe the effects of being a female employee on promotion using odds ratios.
5. Calculate the predicted probability of promotion for the following types of employee:
   - A female employee, with an employee quality of 6, working with a male manager
   - A male employee, with an employee quality of 6, working with a male manager
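
For reference, the standard identities linking logit coefficients to odds ratios and predicted probabilities are sketched below. The symbols $\eta_i$ (the linear predictor, i.e. the right-hand side of the model equation evaluated for observation i) and $\mathrm{OR}_k$ are notation introduced here for illustration only; which coefficients and covariate values to plug in is part of the question.

```latex
% Odds ratio for a one-unit increase in covariate k, other covariates held fixed
\mathrm{OR}_k = e^{\beta_k}

% Predicted probability recovered from the linear predictor \eta_i
\pi_i = \frac{e^{\eta_i}}{1 + e^{\eta_i}} = \frac{1}{1 + e^{-\eta_i}}
```

For a variable that enters an interaction, $e^{\beta_k}$ is the odds ratio only when the other interacting variable equals zero.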
Question 2 (40 pts.) – Determinants of life expectancy
The expected length of a life varies greatly across countries and over time. A crucial question
in the literature of politics and public health is: what determines average life expectancy? In
this section, you will use a linear regression model to help to understand this question.
2.1
Your task in this section is to develop a theoretically-grounded model of life expectancy using
the varieties of democracy data (dataset 1 above). You should implement an appropriate model
for life expectancy with six theoretically important explanatory variables from the
supplied dataset. You should explain why you would expect these variables to have an effect
on life expectancy. You should not estimate several models but argue theoretically why you
chose certain variables. Think carefully about which variables to include, and consider
whether non-linear and/or interactive specifications of those variables are appropriate.
In your answer, you should focus on communicating the substantive implications of the
regression that you implement. You may wish to focus on the following:
- Provide descriptive statistics and/or plots to provide the reader with an overview of the
  dependent variable and the important explanatory variables that you intend to use.
- Implement a model which uses 6 explanatory variables to explain life expectancy. Do
  not include the healthcare variable in this model. You should state an appropriate
  hypothesis/null hypothesis for each of the variables in your model.
- Discuss the fit of your model using appropriate statistics.
- In addition to the estimated results, present quantities of interest from the model that
  illustrate the relative importance of the different explanatory factors. Examine the effects
  for sensible values of the independent variables, and focus your interpretation not just on
  the direction of the effects, but also the magnitude of the effects.
- Evaluate your regression model with reference to the assumptions of linear regression and,
  if appropriate, implement corrections when these assumptions appear to be violated.
You should write up your results as if they were to be published in a political science journal
article with a focus on communicating the substantive meaning of your results.
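
The workflow described above (fit statistic, quantities of interest, assumption checks) can be sketched as follows. This is an illustration on simulated data with hypothetical variable names (x1, x2, y) and only two predictors; it is not a model answer, and the choice of variables, specification and diagnostics remains yours.

```r
# Simulated stand-in data; names and effect sizes are made up for illustration.
set.seed(42)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 60 + 3 * sim$x1 - 2 * sim$x2 + rnorm(n, sd = 2)

fit <- lm(y ~ x1 + x2, data = sim)

# Model fit: adjusted R-squared, rounded to two decimals
adj_r2 <- round(summary(fit)$adj.r.squared, 2)
print(adj_r2)

# Quantity of interest: predicted outcome at the 25th and 75th percentile
# of x1, holding x2 at its mean, with 95% confidence intervals
newdata <- data.frame(x1 = quantile(sim$x1, c(0.25, 0.75)), x2 = mean(sim$x2))
pred <- predict(fit, newdata = newdata, interval = "confidence")
print(round(pred, 2))

# Residuals against fitted values: a visual check for non-linearity
# and non-constant error variance
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```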
2.2
You present the results of the model you developed in question 2.1 to a friend who suggests
that your model is too complicated. In particular, the friend argues that the only thing that
matters for determining average life expectancy is the level of healthcare in the country. You
decide to investigate whether your friend is right:
1. Using a statistical test you have learned on this course, evaluate your friend’s claim that
healthcare alone provides a better explanation for life expectancy than your model does.
2. Incorporate the healthcare variable into your model from 2.1 and use a statistical test
you have learned on this course to evaluate whether the model is improved by the inclusion
of this variable.
3. What is the interpretation of the healthcare variable in your new model?
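
As a reminder of the mechanics, the sketch below compares two nested linear models with an F-test via anova(). It uses simulated data and hypothetical names (x1, z, y); whether an F-test or another test from the course is appropriate for each part of this question is for you to decide.

```r
# Simulated data for illustration only.
set.seed(7)
n <- 200
sim <- data.frame(x1 = rnorm(n), z = rnorm(n))
sim$y <- 1 + 0.5 * sim$x1 + 0.8 * sim$z + rnorm(n)

restricted   <- lm(y ~ x1, data = sim)      # model without z
unrestricted <- lm(y ~ x1 + z, data = sim)  # model with z added

# F-test of the null hypothesis that the added coefficient is zero
ftest <- anova(restricted, unrestricted)
print(ftest)

p_value <- ftest[2, "Pr(>F)"]
```

Note that anova() used this way requires the smaller model to be nested in the larger one and both to be fitted to the same observations.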
Question 3 (40 pts.) – Who votes?
Why do individuals vote in elections? This is a central question in political science, and has
important normative implications: if some types of people vote less than others, they are less
likely to have their views represented in the political process, and are less likely to benefit
from favourable public policy. In this question, you will use an appropriate limited dependent
variable model to improve our understanding of which types of citizens are more likely to vote.
3.1
The data is from the 2015 British Election Study (BES) and includes information on the political
attitudes and demographics of UK citizens. The dependent variable for this analysis is turnout,
which equals 1 if the respondent voted in the 2015 election, and 0 if they did not vote.
You should implement an appropriate limited dependent variable model with 5 theoretically
important predictors from the dataset. You should explain why you have selected the variables
that you include in the model and explain, from a theoretical perspective, why you expect
them to be important for determining whether an individual decides to vote or not. As with
question 2, you should think carefully about your choice of variables, and consider whether it
would be appropriate to include non-linear or interactive specifications of these variables.
You should focus on the following:
- Fit a model which uses 5 explanatory variables to predict turnout. Do not include the
  last_election variable in this model. You should state an appropriate hypothesis/null
  hypothesis for each of the variables in your model.
- Provide and discuss an appropriate fit statistic for your model.
- Interpret your model in both statistical and substantive terms. You should present
  predicted probabilities from the model that help to illustrate the substantive importance of
  the variables in your model (i.e. simply reporting estimated log-odds ratios is not sufficient
  for full marks).
- Create at least one plot of predicted probabilities from your model for a continuous
  independent variable.
You should write up your results as if they were to be published in a political science journal
article with a focus on communicating the substantive meaning of your results.
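
The sketch below shows one way to obtain and plot predicted probabilities from a logit model across the range of a continuous predictor. It uses simulated data and hypothetical names (age, voted) with a single predictor; with several predictors, the others would need to be held at chosen values in the prediction data.

```r
# Simulated data for illustration only; not the BES data.
set.seed(123)
n <- 500
sim <- data.frame(age = round(runif(n, 18, 90)))
sim$voted <- rbinom(n, 1, plogis(-1.5 + 0.05 * sim$age))

logit_fit <- glm(voted ~ age, data = sim, family = binomial(link = "logit"))

# Predicted probabilities over a grid of ages;
# type = "response" returns probabilities rather than log-odds
age_grid <- data.frame(age = seq(18, 90, by = 1))
age_grid$prob <- predict(logit_fit, newdata = age_grid, type = "response")

plot(age_grid$age, age_grid$prob, type = "l", ylim = c(0, 1),
     xlab = "Age (years)", ylab = "Predicted probability of voting")
```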
3.2
You are an advisor to a UK political party. The party bosses look at your results from the model
you developed in section 3.1 and tell you that, in their experience, the most important predictor
for whether a voter votes on election day is whether they voted in the previous election. They
would like to know how your results would change when accounting for past turnout.
1. Estimate a new version of your model, this time including the last_election variable.
2. Does this model provide a better fit to the data than your original model? Use a statistical
test that you have learned on this course to check.
3. How, if at all, does your interpretation of the original variables change in this new
model? If the interpretation does change, why might this be the case?
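
One common way to compare nested logit models is a likelihood ratio test, sketched below on simulated data with hypothetical names (x, past, voted). It is offered as an illustration of the mechanics, not as the required test.

```r
# Simulated data for illustration only.
set.seed(99)
n <- 500
sim <- data.frame(x = rnorm(n), past = rbinom(n, 1, 0.6))
sim$voted <- rbinom(n, 1, plogis(-0.5 + 0.4 * sim$x + 1.5 * sim$past))

m0 <- glm(voted ~ x, data = sim, family = binomial)         # original model
m1 <- glm(voted ~ x + past, data = sim, family = binomial)  # adds past turnout

# Likelihood ratio test via the analysis-of-deviance table
lrt <- anova(m0, m1, test = "Chisq")
print(lrt)
p_value <- lrt[2, "Pr(>Chi)"]

# The same statistic computed by hand from the two log-likelihoods
lr_stat <- as.numeric(2 * (logLik(m1) - logLik(m0)))
p_manual <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
```

Both models must be fitted to the same observations, so rows with missing values on the added variable need to be handled before the comparison.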