Assessment 2: Take-home coding exercise
Overview: A 2,500-word (+/- 10%) take-home coding exercise that tests you on the whole of the module’s content.
Percentage of mark for module: 70%
Due date: Submitted via ELE before 1400 on 12/01/2026. Information on how to submit an assignment can be found via:
http://www.exeter.ac.uk/students/infopoints/yourinfopointservices/assessments/
4.1 Structure of the report and what is required
This assignment will test you on the aspects of doing data analysis with Python that you have covered in this module.
For this assignment, you have to pick one of the three datasets listed in the datasets section below.
You are then required to do the following:
1. Use Python to clean the dataset.
2. Use Python to create some visualisations of the data and to calculate summary statistics.
3. Produce a 2,500-word (+/- 10%) write-up.
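As a rough illustration of what steps 1 and 2 can look like in practice, here is a minimal sketch. It assumes you are using the pandas library, which is an assumption rather than a requirement of this brief. The column names and values are invented for the example and do not come from any of the assignment datasets, so your own cleaning steps will depend on the issues you actually find.

```python
import pandas as pd

# Hypothetical raw data: the column names and values are invented for
# illustration and do not come from any of the assignment datasets.
raw = pd.DataFrame(
    {
        "name": ["A", "B", "B", "C", "D"],
        "score": ["71.5", "64.0", "64.0", "n/a", "58.2"],
        "students": ["20,965", "2,237", "2,237", "18,545", None],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: no duplicate rows, numeric columns as numbers."""
    out = df.drop_duplicates().copy()
    # Non-numeric entries such as "n/a" become NaN instead of raising an error.
    out["score"] = pd.to_numeric(out["score"], errors="coerce")
    # Remove thousands separators before converting the counts to numbers.
    out["students"] = pd.to_numeric(
        out["students"].str.replace(",", "", regex=False), errors="coerce"
    )
    return out.reset_index(drop=True)


cleaned = clean(raw)

# Count the missing values that remain so they can be reported in the write-up.
missing_per_column = cleaned.isna().sum()

# Summary statistics (count, mean, std, quartiles) for the numeric columns.
summary = cleaned[["score", "students"]].describe()

print(missing_per_column)
print(summary)
```

Keeping the cleaning in one function makes it easier to describe in the write-up, because each step (removing duplicates, converting types, handling missing values) maps onto one issue you found in the data.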
The report should include:
1. A description of the dataset.
2. The issues you encountered with the data and the steps you took to clean the dataset.
3. A discussion of what your data visualisations and descriptive statistics show.
4. The Python code you used, in annotated form. If you do your analysis in a Jupyter notebook, you can include markdown cells that describe what the different parts of your code do as you go along. You can then export the Jupyter notebook as a PDF file and attach it to your report for submission. If, instead, you code in a Python script, you can include comment lines explaining what the code does. You do not have to describe every single line of code; it is enough to describe what each chunk of your code (e.g. a function or a for loop) does. You can then copy and paste the script into a Notepad or Word document to attach to your report. The annotation of your code does not count towards the word count of the report.
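To give a sense of chunk-level annotation in a Python script, here is a short sketch that uses only the standard library. The data, the function name `clean_ages` and the section labels are hypothetical and chosen for this example only. It shows one possible style, not a required template.

```python
import statistics

# --- Data -------------------------------------------------------------
# Hypothetical values invented for this illustration; "" marks a missing entry.
raw_ages = ["34", "", "27", "61", "forty", "45"]


# --- Cleaning ---------------------------------------------------------
# This function converts the raw text values to integers and drops anything
# that is missing or not a whole number, so that the statistics below are
# calculated on valid ages only.
def clean_ages(values):
    cleaned = []
    for value in values:
        text = value.strip()
        if text.isdigit():
            cleaned.append(int(text))
    return cleaned


# --- Descriptive statistics -------------------------------------------
# This block summarises the cleaned ages with the mean and the median.
ages = clean_ages(raw_ages)
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)

print(f"n={len(ages)}, mean={mean_age}, median={median_age}")
```

Each comment describes what a whole chunk does and why, rather than restating individual lines.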
There is no specific question to answer in this assignment per se. Rather, this assignment is about demonstrating your ability to clean a dataset and to explore it using visualisation and descriptive statistics. In other words, use your Python and data science skills to describe the dataset.
4.2 Datasets
There are three datasets to choose from for this assignment. You must select one and carry out the tasks above on it.
4.2.1 Dataset 1: World University Rankings - 2023
The World University Rankings 2023 dataset covers 1,799 universities across 104 countries and regions, making these the largest and most diverse university rankings to date.
This dataset contains 10 variables:
1. University Rank
2. Name of University
3. Location
4. No of student
5. No of student per staff
6. International Student
7. OverAll Score
8. Teaching Score
9. Research Score
10. Citations Score
4.2.2 Dataset 2: Airlines
Airline data contains a large amount of information regarding the functioning and efficiency of the aviation industry.
This dataset contains 15 variables:
1. Passenger ID
2. First Name
3. Last Name
4. Gender
5. Age
6. Nationality
7. Airport Name
8. Airport Country Code
9. Country Name
10. Airport Continent
11. Continents
12. Departure Date
13. Arrival Airport
14. Pilot Name
15. Flight Status
4.2.3 Dataset 3: Spotify most played - 2023
This dataset contains a comprehensive list of the most popular songs streamed on Spotify during 2023.
This dataset contains 8 variables:
1. track_name
2. artist(s)_name
3. artist_count
4. released_year
5. released_month
6. released_day
7. in_spotify_playlists
8. in_spotify_charts
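Because the release date in this dataset is split across three separate variables (year, month and day), one cleaning step you might consider is combining them into a single date and checking that the result is valid. The sketch below uses only the standard library. The rows and the helper name `to_release_date` are hypothetical and serve only to illustrate the idea.

```python
from datetime import date

# Hypothetical rows: (track, year, month, day). The values are invented.
rows = [
    ("Track A", 2023, 7, 14),
    ("Track B", 2019, 11, 29),
    ("Track C", 2023, 2, 30),  # not a real calendar date, kept to show the check
]


def to_release_date(year, month, day):
    """Return a date, or None when the three parts do not form a valid date."""
    try:
        return date(year, month, day)
    except ValueError:
        return None


# Build one release date per track and list the tracks whose date is invalid.
release_dates = {track: to_release_date(y, m, d) for track, y, m, d in rows}
invalid = [track for track, d in release_dates.items() if d is None]

print(release_dates)
print("Tracks with an invalid release date:", invalid)
```

A single date value makes it simpler to sort songs by release date or to group them by year when producing visualisations.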