MSc/MPhil Data Analysis Assessment
Attitudes to Immigration in Contemporary Britain
Due: 12 noon, Monday, Week 10 MT
Medium: PDF only
Submission: Inspera
A Professor has hired you to assist with the analysis of a UK social survey dataset on public attitudes toward immigration. The purpose of the project is twofold:
1. Identify which factors are associated with people’s views about immigration, and
2. Quantify the strength and direction of these relationships.
The Professor does not have time to carry out the analysis themselves and is relying on you to examine the data carefully, justify your analytical decisions, and summarise your findings clearly. You are expected to:
• Interrogate the dataset thoroughly, using appropriate descriptive and inferential methods.
• Explain the analytical choices you make (e.g., how you structure models, which variables you include).
• Present well-designed tables and figures that support your conclusions.
• Avoid raw, unedited R output. The Professor has a strong aversion to copy-and-paste console dumps; all results must be formatted clearly and thoughtfully.
• Produce a coherent narrative that explains what predicts attitudes toward immigration and how strongly.
• Draw conclusions grounded strictly in your analysis of the dataset. A literature review is not required and will not be assessed.
Your final report should be no more than 3000 words, excluding tables and figures. Headings, subheadings and figure/table captions do not count toward the word limit.
A bibliography is not required; you should not include one. You must include, in an appendix, all R code used to generate your results. The appendix is not included in the word count.
This code should be commented clearly, so the purpose of each block is immediately obvious.
If you are unsure about how to treat a variable, structure an analysis, present a result, or choose a model, you must make your own decision. The ability to do this independently is part of what is being assessed.
Assignment Dataset Allocation
For the purposes of this assessment, students are divided into two groups.
Your allocated dataset depends on your birth month:
• If you were born in January–June, you must work with dataset_1.RDS
• If you were born in July–December, you must work with dataset_2.RDS
This allocation is fixed and you must use the dataset assigned to you.
The file you have been assigned already contains only your allocated cases; the subset variable is for internal use only and you can ignore it.
Codebook (same for both datasets)
serial
5-digit numeric ID for each respondent.
subset
1 = Dataset 1
2 = Dataset 2
age
Age in whole years (18-99).
female
1 = Female
0 = Male
urban
1 = Urban area
0 = Rural area
london
1 = Lives in London
0 = Does not live in London
bornUK
1 = Born in the UK
0 = Born outside the UK
graduate
1 = Degree-level qualification
0 = No degree
renter
1 = Rents their home
0 = Owns their home (including mortgage)
contact
1 = Has meaningful contact with immigrants
0 = No meaningful contact
occ_class
Occupation group:
1 = manager_prof
2 = intermediate
3 = working_class
hh_inc
Gross household income (£ per year). Top-coded at £200,000.
imm_att5
Attitude toward immigration (1-5 scale):
1 = Very bad for Britain
2 = Quite bad
3 = Neither good nor bad
4 = Quite good
5 = Very good for Britain
(Higher values = more favourable)
zodiac
Birth sign (categorical):
1 = Aries
2 = Taurus
3 = Gemini
4 = Cancer
5 = Leo
6 = Virgo
7 = Libra
8 = Scorpio
9 = Sagittarius
10 = Capricorn
11 = Aquarius
12 = Pisces
AI / LLM USE POLICY FOR THIS ASSIGNMENT
You are permitted to use LLMs (ChatGPT, Claude, Copilot, etc.) under the following conditions:
1. You may use an LLM to help you with:
• debugging your R code (e.g., “Why am I getting this error?”)
• reminding you of R syntax (e.g., “How do I make a scatterplot?”)
• general conceptual understanding (e.g., “What does R-squared mean?”)
• explaining an output that you have already generated (e.g., “Here is my regression table — what is a slope coefficient?”)
These uses are acceptable as long as the analysis is your own, based on the dataset provided.
2. You may NOT use an LLM for:
• running or interpreting analyses directly from the assignment instructions without looking at your own data
• writing your report for you
• generating numerical results (means, SDs, p-values, correlations, regression coefficients, etc.)
• inventing interpretations that do not match your actual output
• selecting variables or describing patterns without referring to the actual dataset
If your report contains interpretations or claims that do NOT match your submitted R output, this will be treated as academic misconduct.
3. All numerical results must come from your own R analysis of the provided dataset.
4. All figures and tables must be produced by your own script.
Screenshots or AI-generated plots are not acceptable.
5. You are responsible for the accuracy of everything you submit.
If you use an LLM to help explain something, you must still check:
• the meaning is correct,
• the interpretation fits your actual numbers,
• the description matches your actual plot,
• nothing contradicts your analysis.
6. Your submitted R script must run and reproduce all numbers in your report.
If your text and your script do not match, that will be taken as evidence that you did not do the analysis yourself.
In short:
AI tools may help you understand and debug your work. They may NOT replace your statistical analysis, your interpretation, or your judgment. Your report must reflect your own engagement with the dataset and the course material.