STAT 3406/STAT 4067
Assignment Two: Clustering and data visualisation
This assignment is assessed, and carries 30% weight towards your final mark for this unit. Your work
for this assignment must be submitted to the unit lecturer by 5pm on Friday, 26 October 2018.
Plagiarism: The work that you submit must be your sole effort (i.e. not copied from anyone else). If
you are found guilty of plagiarism you may be penalised.
The maximum number of pages for each question is two, including key tables and figures; other details
may be provided in an appendix. You will be marked down for exceeding this page limit.
All three of the assignment questions involve the analysis of some data, and each is worth a third of
the total marks for this assignment.
The data set mentioned in Part Three is available from LMS. If you have difficulty accessing the data
you should contact the lecturer immediately.
For each question you should hand in a mini-report. The aim of the mini-report is to
convey the aims, methodology and results of your data analysis in a concise, readable fashion. It is
strongly recommended that you structure your report into sections, along the following lines:
Introduction: Summarise the data and the aims of the analysis.
Methodology: Describe the statistical methods that you use (technical details not required).
Results: Describe the results of your analysis and their interpretations.
Discussion: Draw conclusions (based on your results) as necessary.
Marks for each mini-report will be awarded for:
Exposition: your mini-reports should be well organised. You should aim to write in a concise, yet
readable, manner.
Data visualisations: marks will be awarded for appropriate and well presented data visualisations.
Application of statistics: marks will be awarded for the correct use of appropriate statistical
techniques, and for the correct interpretation of results from these techniques.
Part One: Beer, wine and spirit choice by country
Using data, Mona Chalabi answers questions from readers for FiveThirtyEight. In an article written
in 2014 she discusses the countries in which people drink the most beer, wine and spirits:
https://fivethirtyeight.com/features/dear-mona-followup-where-do-people-drink-the-most-beer-wine-and-spirits/
The R package fivethirtyeight includes data from this article in a dataset called drinks. The dataset
contains only some of the data referenced in the article, namely the World Health Organisation data.
Consider the question: Which countries are similar based on their consumption of beer, wine and spirits?
To provide an insight into this question use at least one of K-means cluster analysis or hierarchical
clustering. Given there are 193 countries you may consider a subset. If you do so, please provide a
justification for your choice.
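As a rough illustration of the hierarchical option, the sketch below standardises three consumption variables and merges clusters by complete linkage until three remain. It is written in Python rather than R, and the six countries, their values and the helper names (standardise, complete_linkage) are all hypothetical; none of it is taken from the drinks dataset or from any prescribed method for this assignment.

```python
from itertools import combinations

# Hypothetical servings per person (beer, spirits, wine) for six made-up
# countries; these numbers are illustrative and are NOT from the drinks dataset.
names = ["P", "Q", "R", "S", "T", "U"]
servings = [
    (300, 40, 20), (280, 50, 30),   # mostly beer
    (40, 300, 20), (50, 280, 30),   # mostly spirits
    (30, 40, 300), (40, 50, 280),   # mostly wine
]

def standardise(rows):
    """Scale each column to mean 0 and sample standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((x - m) ** 2 for x in c) / (len(c) - 1)) ** 0.5
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def complete_linkage(points, k):
    """Merge the two closest clusters (complete linkage) until k remain."""
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(euclidean(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

clusters = complete_linkage(standardise(servings), k=3)
for members in clusters:
    print([names[i] for i in members])
```

Standardising first stops the variable with the largest spread from dominating the distances. Whether that is appropriate for the real data, and how many clusters to keep, are judgement calls that the mini-report should justify.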
Part Two: Similarity of NFL teams
Blythe Terrell wanted to pick a new NFL team to support, so she compiled data on 32 teams:
http://fivethirtyeight.com/features/the-rams-are-dead-to-me-so-i-answered-3352-questions-to-find-a-new-team/
The R package fivethirtyeight includes the data from this article in a dataset called
nfl_fav_team.
Consider the question: What are some of the features of NFL teams?
To provide an insight into this question use principal component analysis. Your mini-report should
include insight into the components and discussion about which teams are similar.
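To make the idea of components and loadings concrete, here is a minimal sketch of the first principal component, found by power iteration on the correlation matrix of standardised data. It is written in Python rather than R, and the five teams, the three features and all helper names are hypothetical; the values are not taken from nfl_fav_team. Power iteration is used only because it keeps the example short, and it is an assumption of this sketch rather than how any particular package computes its components.

```python
# Hypothetical scores for five made-up teams on three made-up features;
# these are illustrative only and are NOT taken from nfl_fav_team.
teams = ["Team A", "Team B", "Team C", "Team D", "Team E"]
features = ["feature_1", "feature_2", "feature_3"]
scores_raw = [
    (1, 2, 9),
    (2, 3, 8),
    (3, 5, 6),
    (4, 6, 4),
    (5, 9, 3),
]

def standardise(rows):
    """Scale each column to mean 0 and sample standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((x - m) ** 2 for x in c) / (len(c) - 1)) ** 0.5
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def correlation_matrix(z):
    """Covariance of standardised columns, i.e. the correlation matrix."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(mat, n_iter=200):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    p = len(mat)
    # Start on the first axis; this assumes the leading eigenvector has a
    # non-zero weight on the first variable, which holds for this toy data.
    v = [1.0] + [0.0] * (p - 1)
    for _ in range(n_iter):
        w = [sum(mat[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * mat[i][j] * v[j]
                     for i in range(p) for j in range(p))
    return eigenvalue, v

z = standardise(scores_raw)
eigenvalue, loadings = first_component(correlation_matrix(z))
explained = eigenvalue / len(features)   # total variance of standardised data = p
pc1_scores = [sum(l * x for l, x in zip(loadings, row)) for row in z]

print("loadings:", dict(zip(features, loadings)))
print("proportion of variance explained by PC1:", explained)
print("PC1 scores:", dict(zip(teams, pc1_scores)))
```

The signs and sizes of the loadings indicate which features move together along the component, and teams with similar scores are similar on that component. Both are the kinds of insight the mini-report asks for.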
Part Three: Marketing to airline customers
On LMS there is a dataset with 3,999 airline customer records. The spreadsheet also provides some
information on the variables in the data sheet.
Consider the question: What are the key customer segments within this data set?
To provide an insight into this question use self-organising maps. Your mini-report should include insight
into the components and discussion about which segments are similar. You should also aim to cluster the
nodes of the SOM to aid in interpretation. This article offers some guidance on how this can be achieved:
https://www.r-bloggers.com/self-organising-maps-for-customer-segmentation-using-r/
If you want a more honeycomb look to a hexagonal grid, this example is useful:
https://www.visualcinnamon.com/2013/11/how-to-create-hexagonal-heatmap-in-r
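The mechanics of a self-organising map can be sketched in a few lines. The example below is written in Python rather than R and trains a small 2 x 2 map on eight made-up customer records, then maps each record to its best matching node. Everything here is an assumption made for illustration: the records and their two variables are hypothetical and not from the LMS dataset, the grid is rectangular rather than hexagonal, and the linear decay schedules and helper names (train_som, best_matching_unit) are arbitrary choices rather than the behaviour of any particular package.

```python
import math
import random

# Hypothetical customer records (two made-up variables); illustrative only,
# NOT taken from the airline dataset on LMS.
customers = [
    (100, 1), (120, 2), (110, 1), (130, 2),
    (900, 10), (950, 12), (920, 11), (980, 12),
]

def standardise(rows):
    """Scale each column to mean 0 and sample standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((x - m) ** 2 for x in c) / (len(c) - 1)) ** 0.5
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def best_matching_unit(weights, x):
    """Index of the node whose codebook vector is closest to x."""
    return min(range(len(weights)),
               key=lambda k: sum((w - xi) ** 2 for w, xi in zip(weights[k], x)))

def train_som(data, rows, cols, n_iter=2000, lr0=0.5, seed=0):
    """Online SOM training on a rectangular grid (a simplification)."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    sigma0 = max(rows, cols) / 2
    # Initialise each node's codebook vector at a randomly chosen record.
    weights = [list(rng.choice(data)) for _ in grid]
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1 - frac)                     # learning rate decays to 0
        sigma = sigma0 * (1 - frac) + 0.5 * frac  # radius shrinks to 0.5
        x = rng.choice(data)
        bmu = best_matching_unit(weights, x)
        for k, (r, c) in enumerate(grid):
            d2 = (r - grid[bmu][0]) ** 2 + (c - grid[bmu][1]) ** 2
            h = math.exp(-d2 / (2 * sigma ** 2))  # Gaussian neighbourhood
            # Pull node k towards x, more strongly the closer it is to the BMU.
            weights[k] = [w + lr * h * (xi - w) for w, xi in zip(weights[k], x)]
    return grid, weights

z = standardise(customers)
grid, weights = train_som(z, rows=2, cols=2)
bmus = [best_matching_unit(weights, x) for x in z]
for record, b in zip(customers, bmus):
    print(record, "-> node", grid[b])
```

The trained codebook vectors (weights) are what would then be clustered, for example hierarchically, to group nodes into customer segments. A real analysis would use a larger grid and all of the variables.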