POLS0008 Guided Marking Exercise
In a previous assessment for POLS0008, students were asked to complete the following task:
Report brief
You have been commissioned by the Scottish Government to write a report on data related to interest in politics collected immediately before the 2014 independence referendum using a web-based interview. The data contain interest in politics scores (0-10) on different aspects of politics, attitudes to political matters and other individual characteristics.
Question 1 You should start your report with an introduction describing the data and variables you will use and present the sample characteristics in a table before going on to answer the research questions below (10 points).
Question 2 Did those who voted SNP in 2010 think Scotland gets more or less than its fair share of UK government spending? (10 points).
- Produce a cross tabulation between whether Scotland gets its fair share of UK government spending and party voted for at the 2010 general election. Describe the table in your text and report the result of an appropriate test to determine whether there is a relationship between the two variables in your table. Comment on whether your data meet the assumptions required to conduct the test and present only a final cross-tabulation that meets these assumptions.
Question 3 Does age predict interest in Scottish politics? (25 points)
- State your hypothesis and report a statistical model in a table. The model should enable you to explain variation in interest in Scottish politics using single year of age as an explanatory variable.
- Describe the findings from your model relating to your hypothesis.
- Use your model to predict interest in Scottish politics for an individual aged 90 and explain whether this is an appropriate prediction to make.
- Check your model for one assumption of your residual values using an appropriate test.
- Comment on the broader limitations of your model.
Question 4 Your report should end with a concluding section summarising the implications for political engagement in Scotland from your data analysis using no more than 200 words (10 points).
You are required to score each question from each paper on the following scale:
• Fail (<40%)
• 40-49
• 50-59
• 60-69
• 70-79
• 80+
Post your marks on Mentimeter for the three papers before 24th March 2025. We will discuss your grading at the final POLS0008 lecture.
Paper 1
Question 1
Introduction
The Scottish Referendum took place on 18 September 2014, where citizens were asked whether Scotland should be an independent country. Answers from around three thousand people were collected in a web-based interview before the referendum to analyse voter behaviors, interests, and other characteristics. The data will be used to explore how people reach decisions on who to vote for, and how this process differs among specific characteristics of elections. Different aspects of politics, attitudes to political subjects and other individual features are collected as part of the data.
Variables
Person ID Number. Used to differentiate between different participants of the web-based survey, the variable works to label the participants, without having to use the names of each respondent to the internet panel survey.
Interest in UK Politics in General. Measures the interest Scottish citizens have in UK politics in general. Scores range from 0 to 10, with 10 indicating the greatest interest and 0 standing for no interest at all in UK politics.
Interest in Scottish Politics in General. Measures the interest Scottish citizens have in Scottish politics in general. Scores range from 0 to 10, with 10 indicating the greatest interest and 0 standing for no interest at all in Scottish politics.
Interest in International Politics. Measures the interest Scottish citizens have in international politics in general. Scores range from 0 to 10, with 10 indicating the greatest interest and 0 standing for no interest at all in international politics.
Interest in the Independence Referendum. Measures the interest Scottish citizens have in the Independence Referendum. Scores range from 0 to 10, with 10 indicating the greatest interest and 0 standing for no interest at all in the Independence Referendum.
How likely it is that you will vote in the referendum. Assesses the likelihood of Scottish citizens voting in the Independence Referendum. The six different answers to this question range from “Very unlikely that I will vote” to “Very likely that I will vote”, including the “Don’t know” option.
Does Scotland get more than its fair share of UK government spending? Finds whether the Scottish citizens think Scotland gets more than its fair share of UK government spending. The answers respondents gave to this question range from “Much less than its fair share” to “Much more than its fair share”, including the “Don’t know” option.
Does London get more than its fair share of UK government spending? Finds whether the Scottish citizens think London gets more than its fair share of UK government spending. The answers respondents gave to this question range from “Much less than its fair share” to “Much more than its fair share”, including the “Don’t know” option.
Vote in 2010 General Election. Used to analyse the respondents’ past opinions on the political parties, the variable shows which party the respondents voted for in 2010 General Election. Answers are “Scottish National Party”, “Liberal Democrats”, “Conservative party”, “Labour Party”, “BNP”, “UKIP”, “Green Party”, “Plaid Cymru”, “Respect”, “Some other party”, and “Did not vote”.
Single Year of Age. Presents the age of each respondent. The internet survey was completed by people aged 16 and over, with the oldest respondents being 82.
Gender. Presents the gender of each respondent.
Current Voting Intention. Shows which party the respondents intend to vote for in the next election. Answers are “SNP”, “Lib Dem”, “Con”, “Lab”, “Other”, and “Don’t know”.
Scottish Region. Presents the Scottish regions the respondents are from. There are eight regions the respondents are reporting from.
Table 1: Sample Characteristics
Question 2
After producing a cross tabulation between whether Scotland is seen to get its fair share of UK government spending and party voted for at the 2010 general election, we obtain a table with the columns consisting of the parties respondents voted for in 2010 General Election, and the rows being the answers to the “Does Scotland get more than its fair share of UK government spending?” question. As we have to compare these two variables to see if they are related, we need to run a chi square test. For the chi square test to be conducted, the data has to meet a couple of assumptions.
The first assumption is that the data in the cells should be frequencies, rather than percentages or any other form. When we take a look at our data, we observe that it meets the first assumption, as all the cells are frequencies. The second assumption is that all the variables are mutually exclusive. The variables are “Vote in 2010 General Election” and “Does Scotland get more than its fair share of UK government spending?”, which are mutually exclusive. These two variables are both measured as categories, fulfilling a further assumption. Observations also have to be independent, and the observations in our data are.
Last assumption is that the value of the cell expecteds should be 5 or more in at least 80% of the cells, with no cell having an expected of less than one. To make our data fit this assumption, we find a new name for the less popular parties among respondents: “Other”. After getting the parties with lower frequencies all under one name, we need to reduce the fair share options into four different categories to be more efficient. After performing these steps, we check our data for cells with expected frequency that are less than 5. There is no one cell with an expected less than 5—and the minimum expected frequency of the data is 5.26. This means that our data now fulfills all of the assumptions, and we are allowed to carry on with the chi square test.
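The expected-count check described in the paragraph above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the code used for the paper. The counts are hypothetical rather than the survey data, and the function names `expected_counts` and `meets_expected_count_rule` are assumptions made for this example.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def meets_expected_count_rule(table):
    """Rule of thumb: at least 80% of expected counts are >= 5 and none is < 1."""
    cells = [e for row in expected_counts(table) for e in row]
    share_at_least_5 = sum(e >= 5 for e in cells) / len(cells)
    return share_at_least_5 >= 0.8 and min(cells) >= 1


# Hypothetical counts (rows: fair-share answers, columns: parties), NOT survey data.
before_merging = [
    [120, 40, 4, 3],
    [200, 90, 6, 5],
    [60, 150, 5, 4],
]
# Merge the two sparse party columns into a single "Other" column.
after_merging = [row[:2] + [sum(row[2:])] for row in before_merging]

print(meets_expected_count_rule(before_merging))  # False
print(meets_expected_count_rule(after_merging))   # True
```

In this made-up table the two small party columns produce several expected counts below 5, and combining them into one column removes the problem, which is the same kind of recoding the paragraph describes.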
Table 3: Cross Tabulation between opinions on Scotland getting its fair share and party voted for in 2010 election
The null hypothesis for the test is that there is no relationship between the categorical variables. Running the chi square test, we obtain the p-value of 1.211269e-152, lower than the alpha level of 0.05, as can be observed in Table 3. As the p-value is smaller than the alpha level, we reject the null hypothesis. It is understood that there is a statistically significant association between the variables. There is a relationship between the parties respondents voted for and whether respondents found Scotland to get more than its fair share of UK government spending, as confirmed by the test.
Question 3
To see if age predicts interest in Scottish politics, we firstly need to fit a model. Since we have an independent variable (Single year of age) and a dependent variable (Interest in Scottish politics), and we want to estimate the relationship between these two variables, we will fit a simple regression model. We set up the null hypothesis that our explanatory variable has no correlation with our dependent variable. As observed on Table 6, the p-values of both variables are lower than the accepted significance level of 0.05. This means that we reject the null hypothesis; we assume that there is a non-zero correlation between the variables.
Table 6: Simple Regression Model Results
Using our simple regression model, we can also predict interest in Scottish politics for individuals of all ages. For an individual aged 90, for instance, the model predicts interest in Scottish politics of 7.98. While this might seem like an appropriate prediction to make, taking another look at Table 6 will explain why this is not the case. The multiple R square value of our model is 0.0091. This indicates that our model only explains 0.91% of the variance in interest in Scottish politics. As our R² value is low, it can be stated that our model is relatively weak. The model merely associates older age with higher political interest, disregarding other factors that might play a role in explaining political interest. This makes this model unreliable in terms of prediction.
The regression model is based on a number of key assumptions. One of these assumptions is homoskedasticity or equal variances of errors. The model assumes that the variance of the residuals is equal. If it is not equal, the need for different estimation methods might arise. One way to test for this assumption is to run a Non-constant Variance Score Test. After running this test, the p-value of 0.0003 is obtained. As this is lower than the determined significance level of 0.05, we reject the null hypothesis that the variance of the residuals is equal and assume that heteroskedasticity is present. Heteroskedasticity can be fixed by manipulating the data.
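For readers unfamiliar with the test named above, the following is a minimal sketch of a score test for non-constant error variance in a one-predictor model. It assumes the test referred to is the Breusch-Pagan (Cook-Weisberg) form, which the paper does not confirm. The data are hypothetical, and the helper names `ols` and `ncv_score_test` are introduced only for this illustration.

```python
import math


def ols(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def ncv_score_test(x, y):
    """Score test for non-constant variance; returns (statistic, p-value), 1 df."""
    intercept, slope = ols(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    n = len(x)
    sigma2 = sum(e ** 2 for e in residuals) / n
    # Squared residuals scaled by the average squared residual.
    u = [e ** 2 / sigma2 for e in residuals]
    # Auxiliary regression of u on x (equivalent to regressing on fitted values
    # here, because fitted values are a linear function of x).
    a, b = ols(x, u)
    mean_u = sum(u) / n
    explained_ss = sum((a + b * xi - mean_u) ** 2 for xi in x)
    statistic = explained_ss / 2
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical data, NOT the survey: same trend, two different error patterns.
x = list(range(1, 41))
signs = [(-1) ** i for i in x]
y_constant = [2 + 0.5 * xi + s for xi, s in zip(x, signs)]            # constant spread
y_growing = [2 + 0.5 * xi + 0.2 * xi * s for xi, s in zip(x, signs)]  # spread grows with x

print(ncv_score_test(x, y_constant))
print(ncv_score_test(x, y_growing))
```

A small p-value, as in the second hypothetical series, leads to rejecting the null hypothesis of constant variance, which is the decision the paragraph reports for a p-value of 0.0003.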
While a simple regression model is efficient for the most part, there are some broader limitations to it. For example, we can only consider linear relationships with this model. Another limitation is that there might be other variables that also influence the response variable, that we are not studying in the model. This would give us a biased value of the correlation between the dependent variable and the explanatory variable. A strong correlation also does not automatically mean a cause and effect relationship, and the data has to be analyzed thoroughly before concluding that such a relation exists between the variables.
Question 4
In the present analysis, we found that political interest does not vary a lot among and within the regions of Scotland. In addition, there was no statistically significant difference found between the mean interest in Scottish politics of those aged under 25 and the national average. After assessing the variables, it was shown that there was an association between whether people think Scotland gets its fair share of government spending, and the party they voted for at the 2010 general election. It was also discovered that men are more interested in Scottish politics than women. While analysing the data, it has also come to our attention that a person interested in a specific aspect of politics is likely to be interested in different aspects of it as well. Lastly, we have observed that even though there is a correlation between age and interest in Scottish politics, this correlation is not strong enough to make predictions of political interest solely based on age.
Paper 2
Question 1
Introduction
This report aims to evaluate political engagement in Scotland. The data used to produce the results is issued from the Scottish Referendum Study 2014. It aims to evaluate Scottish interest in politics before the 2014 Scottish referendum through a web-based survey addressed to Scottish adults (aged 16 and over). In total, 4849 respondents answered the survey across all age groups and both genders. Initially, 13 questions were asked to the respondents. However, this report will only consider 9 of these variables.
Variables
The variables disregarded are ID, Interest in the referendum, Likeliness of voting in the referendum and London share in UK spending.
Interest in UK, Scottish and International politics are quantitative ratio variables.
Respondents were asked to rate their interest in these areas of politics on a 0-10 discrete scale where 0 signifies the absence of interest. Additionally, age is also considered a discrete ratio variable as a true 0 exists. However, observations only have a range of 16 to 82.
For ratio variables, the mean and standard deviation seemed to be appropriate summary statistics. The proportion of missing values was also included to indicate the reliability of these values.
Scotland share in spending is a categorical ordinal variable with a range of ‘Much less’ to ‘Much more’. Vote in 2010 elections, Gender, Voting intention and Scottish region are categorical nominal variables.
For categorical variables, proportions for each levels are presented. These proportions exclude missing values, for clarity. However, the proportion of missing values is also included in the summary to reflect the reliability of these values.
For consistency, all results will be rounded to the second decimal.
Question 2
SNP voters’ perception of Scotland’s share in UK spending
This section will analyse whether SNP voters in the 2010 elections estimate that Scotland gets its fair share of UK government spending.
Methodology
69.10% of SNP voters in 2010 in the sample believe that Scotland receives less than its fair share in UK spending. As the variables ‘Scotland share in spending’ and ‘Voting in 2010 elections’ are nominal, I produced a cross-tabulation and conducted a two-way Chi-Square test to establish the existence of a significant relationship between both variables in the population.
Does the data fit the assumptions to conduct a Chi-Square test?
• Each subject contributes to one cell: Each respondent has a unique vote and estimation of Scotland’s share in spending.
• No expected values are lower than 5: To meet this condition, parties with a low vote count were combined under the label ‘Other parties’.
The data fits the assumptions to run a Chi-Square test with the following characteristics:
• Dependent variable: Scotland’s share in UK spending.
• Explanatory variable: Vote in 2010 elections.
• NULL Hypothesis: The variables are independent.
• Alternative Hypothesis: There is a relationship between the variables. Significance level: 0.05.
As this test does not provide any information about the strength of the relationship, Cramer’s V is conducted to ascertain the level of association.
Figure 3: Cross-tabulation of Scotland’s perceived share in spending according to the party voted for in the 2010 elections.
Interpretation
This cross-table shows the number of individuals who had a certain perception of Scotland’s share in spending according to the party they voted for in 2010. As such, 7.03% of SNP voters believe that Scotland receives more than its fair share in spending. By contrast, 47.80% of Conservative voters have this perception.
After computing the Chi-Square test, we notice 24 degrees of freedom meaning that there are 24 independent pieces of information. We find a Chi-Square of 773.76. This value is high and would be improbable if the NULL Hypothesis was true. This results in a p-value of 1.21 × 10⁻¹⁵² which is far below the significance level of 0.05. Therefore, we can reject the NULL Hypothesis. Calculating Cramer’s V gives a value of 0.20. There is a weak relationship between both variables which can be generalised to the Scottish population. Consequently, the majority of SNP voters believe that Scotland’s receives less than its fair share in UK spending.
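As a reminder of how the two statistics above relate, Cramer’s V rescales the Chi-Square statistic by the sample size and the smaller table dimension: V = sqrt(Chi-Square / (n × (min(rows, columns) - 1))). The sketch below illustrates this in Python with the standard library only. The two small tables are hypothetical and the function names are assumptions for this example; the paper's own table and sample size are not reproduced here.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def cramers_v(table):
    """Cramer's V: 0 means no association, 1 means perfect association."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square_statistic(table) / (n * k))


# Hypothetical tables, NOT the survey data.
perfect_association = [[10, 0], [0, 10]]
no_association = [[10, 20], [20, 40]]

print(cramers_v(perfect_association))  # 1.0
print(cramers_v(no_association))       # 0.0
```

Because V is bounded between 0 and 1 regardless of sample size, it can be small even when the Chi-Square statistic and the p-value look extreme in a large sample.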
Question 3
Prediction of the interest in Scottish politics by age
This section will aim to define whether age is a good predictor for interest in Scottish politics.
Linear relationship
The hypothesis formulated is that age and interest in Scottish politics are positively correlated as it can be assumed that as a person grows older, politics have a stronger impact on their daily life leading to a higher interest.
Pa shows a scatterplot of the Interest in Scottish politics according to age. At first glance, no relationship is apparent. To determine the existence of a linear relationship, I ran a Pearson’s correlation test with the following characteristics:
• NULL Hypothesis: There is no relationship between the variables: true correlation is equal to zero.
• Alternative Hypothesis: There is a relationship between the variables: true correlation differs from 0
• Significance level: 0.05 which means that 5% of Type I errors are tolerated.
The correlation coefficient computed is 0.10 indicating a weak positive relationship between age and interest in Scottish politics. The p-value calculated (2.65 × 10⁻¹¹) is below the significance level of 0.05 therefore we can reject the NULL Hypothesis. As age increases so does the interest in Scottish politics.
Figure 7: Scatterplot (pa) and linear regression (pb) of the Interest in Scottish politics according to age.
Linear regression
Having ascertained the existence of a statistically significant linear relationship between age and interest in Scottish politics, a simple linear regression line can be fitted to the data (pb). The dependent variable is the interest in Scottish politics and the explanatory variable is age. The computed coefficients are represented in Figure 8 with a confidence interval of 95%.
Figure 8: Table of the regression coefficients for Interest in Scottish politics and age
The resulting equation is: y = 0.02x + 6.62 where y is Interest in Scottish politics and x is Age.
An intercept of 6.62 indicates that an individual aged 0 year would have an interest in Scottish politics of 6.62. This result does not hold much interpretative value as it is inconceivable to believe that a newborn child would be interested in politics. A gradient of 0.02 suggests that for each additional year, interest in Scottish politics grows by 0.02. Additionally, the test computed an R² of 0.92% which is the variance in Interest in Scottish politics explained by age.
Prediction of the Interest in Scottish politics for an individual aged 90 years
The linear regression equation aims to predict the Interest in Scottish politics according to a person’s age. For an individual aged 90 years, the interest predicted is 7.98. However, although this value offers an idea of what interest would be, this prediction is not appropriate as 90 is outside of the observed age range which lies between 16 and 82.
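As a quick arithmetic check for markers, the sketch below plugs age 90 into the rounded equation reported in this paper (y = 0.02x + 6.62) and flags whether the age lies outside the observed range of 16 to 82. The function names are assumptions made for this illustration. The rounded equation gives 8.42 rather than the reported 7.98. That gap is consistent with the slope having been rounded up from roughly 0.015, although the unrounded coefficients are not shown in the paper, so this is an inference rather than a confirmed value.

```python
def predict_interest(age, intercept=6.62, slope=0.02):
    """Predicted interest using the rounded coefficients reported in the paper."""
    return intercept + slope * age


def is_extrapolation(age, observed_min=16, observed_max=82):
    """True when the age lies outside the range of ages observed in the sample."""
    return not (observed_min <= age <= observed_max)


prediction = predict_interest(90)
print(round(prediction, 2))      # 8.42
print(is_extrapolation(90))      # True

# Slope implied by the reported prediction (7.98) and the reported intercept (6.62).
implied_slope = (7.98 - 6.62) / 90
print(round(implied_slope, 4))   # 0.0151
```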
Testing the model for assumptions
Linearity
In the section ‘Linear relationship’, we have established the statistical significance of a weak positive linear relationship between the variables.
Independence of errors
The observations are independent as the organisation charged with distributing the surveys uses a complex sampling methodology. Therefore, the errors are independent as well.
Equal variances of residuals
To test for homoskedasticity, I ran a non-constant variance score test with the following characteristics:
• NULL Hypothesis: The residuals have an equal variance
• Alternative Hypothesis: The residuals have an unequal variance
• Significance level: 0.05
The p-value computed is 3.04 × 10⁻⁴ therefore the NULL Hypothesis is rejected: the residuals have unequal variances. To correct this problem, the regression coefficients can be adjusted for heteroskedasticity. However, the new coefficients do not differ from the previous ones when rounded to the second decimal.
Normality of errors
To test for normality, a Q-Q Plot is produced upon which I noticed that the residuals are not normally distributed.
Outliers and influential cases
It is important to look for outliers and influential cases as they can skew the regression model. After creating an influence plot, several outliers are noticeable. However, they have a low leverage and influence. In fact, the largest outlier has a residual of -3.12 but a Cook’s value of 4.93 × 10⁻³ (low influence) and a hat-value of 1.01 × 10⁻³ (low leverage).
Limitations
This regression model is limited in several ways.
By definition, it offers an approximative representation of the relationship between age and interest in Scottish politics. It, therefore, oversimplifies the relationship, especially when taking into consideration the low linearity of the data. In fact, the model can only account for 0.92% of the data which suggests that other variables that are not taken into account by this model can explain the variance of interest in Scottish politics.
Moreover, the model fails to meet several key assumptions concerning residuals, notably homoskedasticity and normality. Although the model is quite robust against these assumptions (for example, when adapted for heteroskedasticity, the regression coefficients only slightly changed), it limits the possibilities for generalisation.
Additionally, there are several outliers in the data. These observations have a low leverage and influence on the overall model however their existence further demonstrates that the model fails to fully capture the relationship.
Question 4
Conclusion
This report explored the level of political engagement of Scottish adults prior to the 2014 referendum using a sample of 4849 respondents. It concludes that Scottish adults seemed to have a high level of political engagement across all groups although some differences were noticeable. Notably, men seemed more interested in Scottish politics than women and SNP appeared more popular in the Lothians than in other regions. However, this report cannot conclude that under 25s were more interested in Scottish politics. In fact, age did not seem to be a particularly accurate predictor of interest. This report also established the existence of a weak relationship between party endorsed and Scotland’s perceived share in UK spending and a strong correlation between interest in different areas of politics.
However, this survey was conducted before a critical political event. Contrasting these results with another sample taken after the referendum would allow to determine whether this high and homogenous political engagement is steady across time. Additionally, comparing these results with a sample taken during other political events would allow to find out whether political engagement reaches similar levels for other occasions. These comparisons would permit a further generalisation of the results in this report.
Paper 3
Question 1
Introduction
This report is based on the data from the Scottish Referendum Study in 2014, which is an internet panel survey of 4,849 Scottish adults (aged 16 and over) before the referendum vote. I will be analysing people's interest in different aspects of politics such as UK, Scottish, and International Politics. In order to achieve the objectives, I will be looking at this from different geographical and demographic profiles including age, gender, voting history and more.
In this report, the independent variables involved are the year of age, sex, Scottish regions, and the voter’s selection in the 2010 general election. While the dependent variables included are their fair share opinions, their current voting intention, as well as the interests in UK, Scottish, international politics, and the Independence Referendum.
Question 2
The fourth objective of this report is to determine whether the SNP voters think Scotland gets more than its fair share of UK government spending. In order to determine this, performing a cross-tabulation is essential as it allows us to easily interpret the situation.
Figure 4: Cross-Tabulation Table
As can be inferred from in figure 4, we can observe that out of 1110 people, 34.51% who voted SNP in 2010 believes that Scotland receives a little less than its fair share, and 34.60% believes that Scotland receives much less than its fair share, with a raw total of 383 and 384 respectively. While only 10.36% do not know or thinks Scotland gets more than its fair share, we can conclude that most of those who voted SNP in 2010 think Scotland gets less than its fair share of UK government spending.
Question 3
The seventh and final objective of this report is to determine whether the age predict interest in Scottish politics. The hypothesis is that if age of surveyed person increases, the willingness of voting will also increase, since older population may have more leisure time to pay attention to their nation’s future.
Figure 7: Linear Regression - Age Vs. Interest in Scottish Politics
Figure 8: Residual Plot
The statistical model as shown in figure 7, x-axis is the age, y-axis is their interest to vote, and the regression line is also plotted in red. The hypothesis is accepted, because the regression line does show a positive correlation between age and the interests in Scottish politics. When predicting the interest of Scottish politics for someone who is 90 years old, the result is 7.98. Using linear regression, residual values is shown in figure 8, where the difference between predicted value and actual value is displayed in the y-axis. While x-axis is the fitted value, which were obtained from plotting the residual plot using Python modules. A regression that is close to 0 means that the linear regression obtained in figure 7 is accurate, as predicted value does not differ far from actual value. Therefore, it is possible for us to use figure 7 to predict the Interest in Scottish politics with relation to ages. However, limitation of this statistical model is that the sample is collected from only 4,849 Scottish adults, which is far smaller than the population of Scotland. The regression might fit well for 4,849 samples, but not necessarily accurate when used to reflect situation of the entire population.
Furthermore, the surveyed results might be biased, where data collected favour those who are old, and at the same time has a high interest in Scottish politics.
Question 4
Conclusion
Based from the results of the different statistical test carried out, we arrive at the following conclusions: Eight regions regarding to the interest in Scottish politics have a close variation and mean interest; The Lothian region has a significantly higher interest in Scottish politics as compared to the national average, implying that the party should consider paying close attention to getting supporters from other region besides the Lothian Region; Interests in Scottish politics for voters aged less than 25 is beneath the national average, so the government of Scotland may wish to invest a particular approach to attract the attention of youth population; Men are more interested in Scottish politics than women, which implies that if the Scottish government would like to increase the interest of the population to Scottish politics, more attention should be paid on women; People have significantly higher interest in Scottish politics as compared to the other two, implying that Scottish adults pays more attention to their local political activities than UK an international ones; Finally, as the age of Scottish grew, so do their interests in Scottish politics, and this may also imply the possibility that youth population are lacking interests in politics.