STAT500 Applied Statistics – Assignment 2 Semester 2 2017
Auckland University of Technology
STAT500 Applied Statistics
Semester 2, 2017
Assignment 2
Instructions:
• Due date: Submit to Blackboard (AUTonline) by close of day on March 30th, 2018.
• This assignment is worth 10% of your final grade and will be marked out of 100 marks.
• Assignments should be submitted to Blackboard as a pdf or word document.
• The dataset is available on Blackboard (AUTonline). All data has been obtained from the World
Bank, with the exception of the "Region" variable.
You can read the dataset into R using a command like
a <- read.csv(file.choose(), header=TRUE)
• Your assignment should have a title page including your name and student ID, the name of the
paper (STAT500 Applied Statistics), the name of the assignment (Assignment 2) and the date.
• R code should be formatted in a fixed-width font such as Courier New.
• Include references (formatted using a recognised style, such as APA) for any sources that you use.
• Late submission: Failure to submit the assignment on time will result in a mark of 0 for the
assignment. If extenuating circumstances (e.g. illness) prevent the timely submission of your
assignment you can apply for special consideration. You may also apply for special consideration
if such circumstances result in your submission being incomplete. The required form is available
on Blackboard ("Assessment/Special Consideration") or from the SECMS reception in WT Level 1.
• Originality: The assignment is an individual piece of work. You are encouraged to discuss the
assignment with your lecturers and classmates; however, the work you submit must be your own.
Assignments that show similarities to work submitted by other students will be investigated for
plagiarism; we treat this issue very seriously. Plagiarism software, such as TurnItIn, may be used to
electronically compare submissions to those of other students and to documents on the internet.
Talk to your tutor or lecturer if you have any questions about this.
Question: 1 2 3 4 5 6 7 Total
Marks: 10 5 5 45 15 10 10 100
Score:
This assignment will follow the statistical enquiry cycle discussed in week 2. Your analysis should be
written as a report (2-4 pages of A4 is about right) and should follow the structure outlined in questions
1 – 6 below.
1. Problem (Total for Question 1: 10 marks)
Consult the list of variables provided in the dataset and choose a topic to explore (if your dataset is
called a, typing names(a) at the console lists the variable names available to you; in RStudio, the
Environment pane shows details of any dataset you have read in).
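A minimal sketch of a first look at the data, assuming the data frame is named a as above. Because file.choose() needs an interactive session, the sketch instead writes a tiny made-up CSV to a temporary file; the rows and values are hypothetical, and only the column names are taken from the variable list at the end of this document.

```r
# Hypothetical stand-in for the real dataset: a small made-up CSV
# written to a temporary file so the example runs without file.choose()
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Country,CountryCode,Region,GDP,Population",
  "CountryA,AAA,Region1,1.5e11,4.0e6",
  "CountryB,BBB,Region2,2.0e12,5.0e7",
  "CountryC,CCC,Region1,,1.2e6"
), tmp)

# Read the file; header = TRUE uses the first row as column names
a <- read.csv(tmp, header = TRUE)

names(a)           # variable names available for choosing a topic
dim(a)             # number of rows (countries) and columns (variables)
str(a)             # type of each column
colSums(is.na(a))  # number of missing values in each column
```

A blank field in a numeric column is read as NA, which is why CountryC shows one missing GDP value.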
You should choose a broad overall topic, and then write two specific questions to investigate. You
should also provide a brief explanation (about 100-150 words) of why you think these questions
are interesting.
Two examples of topics and questions are provided below (do not choose these examples!). The
questions must be able to be answered using data in the dataset.
Example 1:
Topic: What factors impact literacy?
Two specific questions:
• Do countries with a higher GDP also have a higher literacy rate?
• Does this relationship differ by geographic region?
Example 2:
Topic: How do CO2 emissions vary globally?
Two specific questions:
• Do countries with higher CO2 emissions also have larger agricultural sectors?
• Do CO2 emissions vary by geographic region?
2. Plan (Total for Question 2: 5 marks)
What variables will be required to answer your questions? Provide a brief explanation (in your own
words) about each of your chosen variables. You are encouraged to refer to the World Bank website
for details about variables. If you need to compute an additional variable in order to answer one of
your questions, the code used to do this should be provided.
Example:
Short Name | Full Name | Brief Description
Female Literacy Rate | Literacy rate, adult female (% of females ages 15 and above) | The percentage of females, 15 years and above, who are able to read and write simple statements.
Male Literacy Rate | ... | ...
Literacy Rate | ... | ...
GDP | ... | ...
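As an illustration of computing an additional variable, the sketch below adds total trade (exports plus imports, both as % of GDP) to a data frame a. The Exports and Imports column names come from the variable list; the new column name Trade and all the values are hypothetical.

```r
# Hypothetical toy data using two real column names from the dataset
a <- data.frame(
  Country = c("CountryA", "CountryB", "CountryC"),
  Exports = c(30, 55, NA),  # % of GDP
  Imports = c(35, 50, 20)   # % of GDP
)

# Derived variable: total trade as a percentage of GDP
a$Trade <- a$Exports + a$Imports

a
```

Because arithmetic involving NA returns NA, a country missing either input is also missing the derived value.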
3. Data (Total for Question 3: 5 marks)
Read the dataset into R and extract the data that you will need for this analysis. You can extract the
columns by referring to them by name or by number. Provide the code that you use to do this.
Example:
#Read CSV file
allglobal <- read.csv(file.choose())
#Extract chosen columns using names
global <- allglobal[, c("Country", "Region", "GDP", "TaxRevenue")]
#Extract chosen columns using column numbers
global <- allglobal[, c(1, 3, 12, 30)]
#Inspect data
head(global)
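After extracting the columns, it is worth checking their types and how many values are missing before summarising. A sketch, assuming the four-column global data frame from the example above; a small made-up version is built here so the code runs on its own. Dropping incomplete rows with complete.cases() is one option, not a requirement of the assignment.

```r
# Hypothetical toy version of the extracted data frame 'global'
global <- data.frame(
  Country = c("CountryA", "CountryB", "CountryC", "CountryD"),
  Region = c("Region1", "Region2", "Region1", "Region2"),
  GDP = c(1.5e11, 2.0e12, NA, 8.0e9),
  TaxRevenue = c(18.2, NA, 12.5, 22.0)
)

str(global)      # column types after extraction
summary(global)  # quick numeric summaries, including NA counts

# Keep only countries with no missing values in the chosen columns
complete <- global[complete.cases(global), ]
nrow(global) - nrow(complete)  # number of countries dropped
```

If rows are dropped, reporting how many were excluded keeps the analysis transparent.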
4. Analysis (Total for Question 4: 45 marks)
Explore your two questions using appropriate summary statistics and graphs (15 marks per question).
The following structure is recommended for each question:
• Clearly state the question being analysed (e.g. with a heading)
• Summary statistics and graphs. In the example above, you might use R idioms such as
– mean(global$GDP, na.rm=TRUE) or median(global$GDP, na.rm=TRUE) (why might these differ?);
na.rm=TRUE skips missing values, which would otherwise make the result NA
– hist(global$GDP)
– boxplot(GDP~Region, data=global), although in this case a logarithmic y-axis may be clearer:
boxplot(GDP~Region, data=global, log='y')
• 50-150 words explaining the key messages of your summary statistics and graphs.
Across the analysis of the two questions, you should include at least 3 different types of graphs
and 4 different summary statistics. Marks will only be awarded if the graphs, summary statistics
and explanations are correct and appropriate.
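The idioms above can be combined into a short analysis. This sketch assumes the global data frame from Question 3 but uses made-up values, including one very large economy to show why the mean and median differ. The na.rm = TRUE and use = "complete.obs" arguments are there because datasets like this one commonly contain missing values.

```r
# Hypothetical toy data standing in for 'global'; values are made up
global <- data.frame(
  Region = rep(c("Region1", "Region2"), each = 4),
  GDP = c(2e9, 5e9, 8e9, 4e11, 1e10, 3e10, 6e10, NA),
  TaxRevenue = c(12, 15, 14, 20, 18, 22, 25, 19)
)

# Summary statistics; na.rm = TRUE skips missing values
mean(global$GDP, na.rm = TRUE)    # pulled upwards by one very large economy
median(global$GDP, na.rm = TRUE)  # not affected by that single large value
sd(global$GDP, na.rm = TRUE)
IQR(global$GDP, na.rm = TRUE)

# Median GDP within each region
tapply(global$GDP, global$Region, median, na.rm = TRUE)

# Correlation between two numeric variables, using complete pairs only
cor(global$GDP, global$TaxRevenue, use = "complete.obs")

# Three different types of graph
hist(global$GDP)
boxplot(GDP ~ Region, data = global, log = "y")
plot(TaxRevenue ~ GDP, data = global, log = "x")
```

Here the mean (about 7.4e10) is far above the median (1e10), which is the kind of difference the question in the bullet list above asks you to explain.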
5. Conclusion (Total for Question 5: 15 marks)
Write one or two paragraphs (approximately 100-200 words) to summarise your findings. You should
answer your questions by referring to the appropriate analysis conducted in part 4. You should also
mention further analysis that could be conducted.
6. Critique (Total for Question 6: 10 marks)
Choose two of the graphs that you used in part 4. For each graph, explain why that type of graph was
appropriate for the analysis that you conducted (approximately 100 - 150 words total).
7. Presentation and Grammar (Total for Question 7: 10 marks)
Your report should be professionally presented and should be free of spelling and grammar errors,
and all sources should be clearly referenced. Therefore, 10 marks will be allocated based on the
overall presentation of your assignment.
Column | Short Name | Long Name | World Bank Code
1 | Country | Country Name |
2 | CountryCode | Country Code |
3 | Region | Region |
4 | AdolescentFertility | Adolescent fertility rate (births per 1,000 women ages 15-19) | [SP.ADO.TFRT]
5 | Agriculture | Agriculture, value added (% of GDP) | [NV.AGR.TOTL.ZS]
6 | CO2 | CO2 emissions (metric tons per capita) | [EN.ATM.CO2E.PC]
7 | Electric | Electric power consumption (kWh per capita) | [EG.USE.ELEC.KH.PC]
8 | Energy | Energy use (kg of oil equivalent per capita) | [EG.USE.PCAP.KG.OE]
9 | Exports | Exports of goods and services (% of GDP) | [NE.EXP.GNFS.ZS]
10 | Fertility | Fertility rate, total (births per woman) | [SP.DYN.TFRT.IN]
11 | Forest | Forest area (sq. km) | [AG.LND.FRST.K2]
12 | GDP | GDP (current US$) | [NY.GDP.MKTP.CD]
13 | GDPGrowth | GDP growth (annual %) | [NY.GDP.MKTP.KD.ZG]
14 | TechExports | High-technology exports (% of manufactured exports) | [TX.VAL.TECH.MF.ZS]
15 | Immunization | Immunization, measles (% of children ages 12-23 months) | [SH.IMM.MEAS]
16 | Imports | Imports of goods and services (% of GDP) | [NE.IMP.GNFS.ZS]
17 | Industry | Industry, value added (% of GDP) | [NV.IND.TOTL.ZS]
18 | Internet | Internet users (per 100 people) | [IT.NET.USER.P2]
19 | LifeExpectancy | Life expectancy at birth, total (years) | [SP.DYN.LE00.IN]
20 | Military | Military expenditure (% of GDP) | [MS.MIL.XPND.GD.ZS]
21 | Mobile | Mobile cellular subscriptions (per 100 people) | [IT.CEL.SETS.P2]
22 | Mortality | Mortality rate, under-5 (per 1,000 live births) | [SH.DYN.MORT]
23 | StatisticalCapacity | Overall level of statistical capacity (scale 0 - 100) | [IQ.SCI.OVRL]
24 | PopulationDensity | Population density (people per sq. km of land area) | [EN.POP.DNST]
25 | PopulationGrowth | Population growth (annual %) | [SP.POP.GROW]
26 | Population | Population, total | [SP.POP.TOTL]
27 | HIV | Prevalence of HIV, total (% of population ages 15-49) | [SH.DYN.AIDS.ZS]
28 | SecondarySchool | School enrollment, secondary (% gross) | [SE.SEC.ENRR]
29 | SurfaceArea | Surface area (sq. km) | [AG.SRF.TOTL.K2]
30 | TaxRevenue | Tax revenue (% of GDP) | [GC.TAX.TOTL.GD.ZS]
31 | PopulationGrowthUrban | Urban population growth (annual %) | [SP.URB.GROW]
32 | AgeDependency | Age dependency ratio (% of working-age population) | [SP.POP.DPND]
33 | Agricultural | Agricultural land (sq. km) | [AG.LND.AGRI.K2]
34 | AirPassengers | Air transport, passengers carried | [IS.AIR.PSGR]
35 | AirDepartures | Air transport, registered carrier departures worldwide | [IS.AIR.DPRT]
36 | ArmedForces | Armed forces personnel (% of total labor force) | [MS.MIL.TOTL.TF.ZS]
37 | ATM | Automated teller machines (ATMs) (per 100,000 adults) | [FB.ATM.TOTL.P5]
38 | BirthRate | Birth rate, crude (per 1,000 people) | [SP.DYN.CBRT.IN]
39 | Cereal | Cereal production (metric tons) | [AG.PRD.CREL.MT]
40 | ComputerExports | Computer, communications and other services (% of commercial service exports) | [TX.VAL.OTHR.ZS.WT]
41 | ComputerImports | Computer, communications and other services (% of commercial service imports) | [TM.VAL.OTHR.ZS.WT]
42 | DeathRate | Death rate, crude (per 1,000 people) | [SP.DYN.CDRT.IN]
43 | Broadband | Fixed (wired) broadband subscriptions | [IT.NET.BBND]
44 | BroadbandPer100 | Fixed (wired) broadband subscriptions (per 100 people) | [IT.NET.BBND.P2]
45 | Telephone | Fixed telephone subscriptions | [IT.MLT.MAIN]
46 | GDPperCapita | GDP per capita (current US$) | [NY.GDP.PCAP.CD]
47 | Hospital | Hospital beds (per 1,000 people) | [SH.MED.BEDS.ZS]
48 | TechExportsUSD | High-technology exports (current US$) | [TX.VAL.TECH.CD]
49 | Household | Household final consumption expenditure (current US$) | [NE.CON.PRVT.CD]
50 | HouseholdAdj | Household final consumption expenditure, etc. (current US$) | [NE.CON.PETC.CD]
51 | ICTServiceExports | ICT service exports (% of service exports, BoP) | [BX.GSR.CCIS.ZS]
52 | ICTServiceExportsUSD | ICT service exports (BoP, current US$) | [BX.GSR.CCIS.CD]
53 | ICTGoodsImports | ICT goods imports (% total goods imports) | [TM.VAL.ICTG.ZS.UN]
54 | ICTGoodsExports | ICT goods exports (% of total goods exports) | [TX.VAL.ICTG.ZS.UN]
55 | Unemployment | Long-term unemployment (% of total unemployment) | [SL.UEM.LTRM.ZS]
56 | Nurses | Nurses and midwives (per 1,000 people) | [SH.MED.NUMW.P3]
57 | PopulationChildren | Population ages 0-14 (% of total) | [SP.POP.0014.TO.ZS]
58 | PopulationAdult | Population ages 15-64 (% of total) | [SP.POP.1564.TO.ZS]
59 | PopulationElderly | Population ages 65 and above (% of total) | [SP.POP.65UP.TO.ZS]
60 | Renewable | Renewable energy consumption (% of total final energy consumption) | [EG.FEC.RNEW.ZS]
61 | SecureInternetTotal | Secure Internet servers | [IT.NET.SECR]
62 | SecureInternet | Secure Internet servers (per 1 million people) | [IT.NET.SECR.P6]