Department of Statistics
STATS 782 Statistical Computing
Assignment 3 (2020.FC)
Total: 50 marks Due: 2:00 pm NZST, Friday 29 May 2020
1. Please read these instructions carefully. Further instructions might be posted on the class
webpage.
2. Upload your soft copy (assignment source) to Canvas: the file should end in .Rmd, or possibly
.R or .Rnw. The marker may run or knit your R code, so include your name and ID in all
files. The file names should contain your UPI. RMarkdown is strongly recommended.
3. Also upload your .pdf to Canvas. Note the time difference between countries.
4. Coversheet: please make sure you do one of the following, otherwise your assignment will not be
marked:
(a) Sign the Cover Sheet and combine it with your assignment document (pdf or Word) into
a single file before submission, OR
(b) Type or write the following at the beginning of your assignment: your name (as it
appears in Canvas), your UPI, and the following statement: “I have read the declaration
on the cover sheet and confirm my agreement with it.”
5. Include everything in your report: R code (tidied up), outputs (including error/warning messages),
and your explanations (if any). Please comment on almost all of your output, especially
parts that need human interpretation, otherwise marks will be deducted. That is, you need
to convince the marker that you understand what the data or solution is saying.
6. Print some intermediate results to show how your code works step by step, if not obvious.
Comment your code if appropriate, e.g., for functions, blocks of code, and key variables.
7. Type help.start() when you open R. You need to use the online help to find details and
functions that may not be covered directly in the coursebook. This requires maturity; we
cannot cover everything in class or the coursebook.
8. Your mark for this assignment will depend on getting the right answer, the elegance/efficiency
of your approach, and the tidiness and documentation of your code/report. The Tidyverse
Style Guide or Google's R Style Guide is recommended. Marks (up to 7) will be deducted
for messy code, etc.
9. This PDF file may contain colour that is important to see.
1. [16 marks] The Ministry of Health of New Zealand provides daily updates of the status of
COVID-19 cases in the country. The basic data consists of the date of report and the number
of probable and confirmed COVID-19 cases reported that day. The data reported on April 19
is provided in the file covid19-apr19.csv.
(a) The ministry published the following plot on April 26 showing the total of reported cases
per day (confirmed + probable):
Re-create the graphic using R as closely as is sensible. Start with the same basic type of
plot in R, then adjust colours, line widths and labels. Finally, address the axes. If there are
any visual differences, describe them and explain which version you think is better, yours
or the original, and why. Note that there are small differences between the available data
and the plot¹. [4 marks]
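A minimal starting point for (a) might look like the sketch below. It assumes the CSV has one row per report date with columns named `date`, `confirmed` and `probable`; these names are hypothetical, so check `names()` on the real file. A small made-up data frame stands in for `covid19-apr19.csv` so the code runs on its own, and the colour and spacing are placeholders still to be tuned against the ministry's graphic.

```r
# Hypothetical stand-in for read.csv("covid19-apr19.csv"); the real column
# names and date format may differ.
cases <- data.frame(
  date      = as.Date("2020-03-01") + 0:4,
  confirmed = c(1, 0, 2, 5, 3),
  probable  = c(0, 1, 0, 2, 1)
)

# Total reported cases per day (confirmed + probable)
cases$total <- cases$confirmed + cases$probable

# One bar per report date; barplot() returns the bar midpoints, which are
# used here to place custom date labels on the x axis.
mids <- barplot(cases$total,
                col = "steelblue", border = NA, space = 0.2,
                xlab = "Date of report", ylab = "Number of cases")
axis(1, at = mids, labels = format(cases$date, "%b %d"), tick = FALSE)
```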
(b) In addition to the plot above, the ministry also publishes a plot of all cases known up to
a given date:
Re-create the graphic using R. Discuss any drawbacks of the rendition of the graphic.
[4 marks]
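The cases known up to a given date are the running sum of the daily totals, so `cumsum()` does the core work for (b). The sketch below uses hypothetical daily totals; drawing a step line (`type = "s"`) is an assumption, since the ministry's chart may use bars or a plain line instead.

```r
# Hypothetical daily totals (confirmed + probable) by report date
date  <- as.Date("2020-03-01") + 0:4
total <- c(1, 1, 2, 7, 4)

# Cases known up to each date = running sum of the daily totals
cumulative <- cumsum(total)

# A step line emphasises that the count only changes at each report
plot(date, cumulative, type = "s", lwd = 2, col = "steelblue",
     xlab = "Date of report", ylab = "Total cases to date")
```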
(c) Change the graphic from (a) so that it distinguishes probable from confirmed
cases. Explain your decisions and which comparisons can be performed directly by eye
in the plot. Give at least one example of a comparison that cannot be made using this
plot. [4 marks]
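One common way to split the bars is a stacked bar chart, sketched here with hypothetical counts in a 2-row matrix (one column per day). Putting confirmed cases at the bottom gives them a common baseline, so they can be compared across days. The probable segments float on top and lack that shared baseline, which is one reason some comparisons are harder to make in this kind of plot.

```r
# Hypothetical daily counts: rows are case types, columns are report dates
m <- rbind(confirmed = c(1, 0, 2, 5, 3),
           probable  = c(0, 1, 0, 2, 1))
colnames(m) <- format(as.Date("2020-03-01") + 0:4, "%b %d")

# Stacked bars (the default for a matrix): first row at the bottom
barplot(m, col = c("steelblue", "orange"), border = NA,
        legend.text = c("Confirmed", "Probable"),
        args.legend = list(x = "topleft", bty = "n"),
        xlab = "Date of report", ylab = "Number of cases")
```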
¹ The dataset file is more detailed in that it counts actual cases filed on the reported day, whereas the daily report
plots count new cases known at a given time of that day, which may include cases filed earlier.
(d) We can modify the plot from (c) so that we can directly compare the proportion
of confirmed cases to the total for each day, while keeping the modifications to a minimum,
as follows:
[Figure: "Proportion of confirmed cases"; x-axis "Date of report" (Mar 01 to Apr 15); y-axis "Proportion (in %)"]
Re-create that plot type. Did you have to sacrifice information that was available in (c)
but is no longer visible? If so, what was it? Interpret the resulting plot. [4 marks]
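Rescaling each day's bar to 100 % can be done with `prop.table(m, margin = 2)`, as in the sketch below (hypothetical counts again). Because each column is divided by its own total, the bar heights no longer carry the daily counts, and a day with zero cases would produce `NaN`.

```r
# Hypothetical daily counts: rows are case types, columns are days
m <- rbind(confirmed = c(1, 0, 2, 5, 3),
           probable  = c(0, 1, 0, 2, 1))

# Scale each column (day) so the two rows sum to 100 %
pct <- 100 * prop.table(m, margin = 2)

barplot(pct, col = c("steelblue", "orange"), border = NA,
        main = "Proportion of confirmed cases",
        xlab = "Date of report", ylab = "Proportion (in %)")
```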
2. [11 marks] Consider the following plot illustrating an optical illusion:
The plot is composed of squares that are all aligned at the same y coordinate within each row, although our
eyes make us believe that the lines are not straight. Each row is shifted by 1/4 square relative
to the adjacent rows, but the direction of the shift changes every two steps.
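The shifting rule can be made concrete before any drawing. Under one reading of the description, the steps between successive rows are +1/4, +1/4, -1/4, -1/4, and so on, so the row offsets cycle through 0, 1/4, 1/2, 1/4. The function name `row_offsets` is hypothetical, and it assumes `n >= 1`.

```r
row_offsets <- function(n) {
  # Steps of +1/4, +1/4, -1/4, -1/4, ... between successive rows, so the
  # direction of the shift flips every two rows (assumes n >= 1)
  steps <- rep(c(0.25, 0.25, -0.25, -0.25), length.out = n - 1)
  c(0, cumsum(steps))   # the first row is not shifted
}

row_offsets(9)
```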
(a) Re-create the plot using R. [5 marks]
(b) Create a function taking n as a parameter which determines how many rows of squares
there will be. Run it for values of 9, 11 and 15. [3 marks]
(c) Enhance the function from (b) by adding an argument cols which is a vector of the two
colours to be used to fill the boxes. Call it with f(n=11, cols=c("red","yellow")) and
show the resulting plot. Does the effect still work? [3 marks]
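The sketch below draws the pattern with base graphics `rect()`. It takes the number of rows `n` and a two-colour vector `cols`. It assumes row offsets that cycle through 0, 1/4, 1/2, 1/4 square widths, a fixed number of squares per row (`ncol`), and grey borders between squares. The name `cafe_wall` and the `ncol` argument are illustrative choices, not part of the task.

```r
cafe_wall <- function(n = 9, cols = c("black", "white"), ncol = 12) {
  # Offset of each row in square widths: 0, 1/4, 1/2, 1/4, 0, ... (assumed)
  offset <- c(0, 0.25, 0.5, 0.25)[(seq_len(n) - 1) %% 4 + 1]
  plot.new()
  plot.window(xlim = c(0, ncol), ylim = c(0, n), asp = 1)
  k <- -1:ncol                      # one spare square at each end of a row
  for (r in seq_len(n)) {
    rect(xleft = k + offset[r], ybottom = r - 1,
         xright = k + 1 + offset[r], ytop = r,
         col = cols[k %% 2 + 1],    # alternate the two fill colours
         border = "grey60", lwd = 2)
  }
  invisible(offset)                 # returned so the shifts can be inspected
}

cafe_wall(9)
cafe_wall(n = 11, cols = c("red", "yellow"))
```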
3. [23 marks] The dataset temp-cities.csv contains the daily low and high temperatures
for seven cities in the world over the last 20 years.
(a) Read the dataset and restrict it to the following subset: city Auckland and records from
the year 2019. Create one plot that shows both the lows and highs for every day of
2019 in Auckland. Use blue for the lows and red for the highs.
[4 marks]
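Subsetting and plotting for (a) might look like the sketch below. It assumes the file has columns `city`, `date`, `low` and `high`; these names are hypothetical. If `date` is read as text, convert it with `as.Date()` first. A tiny made-up data frame replaces the real file so the code is self-contained.

```r
# Hypothetical stand-in for read.csv("temp-cities.csv")
temps <- data.frame(
  city = rep(c("Auckland", "Tokyo"), each = 4),
  date = rep(as.Date(c("2018-12-31", "2019-01-01",
                       "2019-01-02", "2019-01-03")), 2),
  low  = c(15, 16, 14, 17,  1,  2,  0,  3),
  high = c(23, 24, 22, 25,  9, 10,  8, 11)
)

# Keep Auckland records from 2019 only
akl <- subset(temps, city == "Auckland" & format(date, "%Y") == "2019")

# Highs in red and lows in blue on a shared y range so neither is clipped
plot(akl$date, akl$high, type = "l", col = "red",
     ylim = range(akl$low, akl$high),
     xlab = "Date", ylab = "Temperature")
lines(akl$date, akl$low, col = "blue")
```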
(b) Based on the 2019 Auckland subset, compute the weekly averages of the lows and of the highs
respectively. For this purpose, the first week is the first 7 days of 2019, the second week is
the next 7 days, etc. Superimpose the averages on the plot obtained in (a). [4 marks]
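With this week definition, the week number is the 0-based day of the year divided by 7 (integer division) plus 1. The sketch below applies that to two hypothetical weeks and averages with `aggregate()`. It draws each weekly mean at the middle day of its week, which is only one possible placement. On a full year, 31 December ends up alone in a week 53.

```r
# Hypothetical 2019 Auckland subset: 14 days, i.e. exactly two weeks
akl <- data.frame(date = as.Date("2019-01-01") + 0:13,
                  low  = c(rep(15, 7), rep(17, 7)),
                  high = c(rep(23, 7), rep(25, 7)))

# $yday is 0-based (1 January = 0), so days 1-7 map to week 1, 8-14 to week 2
akl$week <- as.POSIXlt(akl$date)$yday %/% 7 + 1
weekly <- aggregate(cbind(low, high) ~ week, data = akl, FUN = mean)

# Daily series, then the weekly means drawn at the middle day of each week
mid <- as.Date("2019-01-01") + (weekly$week - 1) * 7 + 3
plot(akl$date, akl$high, type = "l", col = "red",
     ylim = range(akl$low, akl$high), xlab = "Date", ylab = "Temperature")
lines(akl$date, akl$low, col = "blue")
lines(mid, weekly$high, col = "darkred", lwd = 3)
lines(mid, weekly$low, col = "darkblue", lwd = 3)
```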
(c) Create a matrix of plots such that each plot shows all the data for one city. Make sure
that it is possible to compare values between the plots. Justify the layout you used. The
purpose of this plot is exploratory data analysis, not presentation, so you do not need
to worry about removing axes that are superfluous or labels (other than the city) at this
point. Do you see any obvious issues in the data? [4 marks]
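The key to comparability between panels is giving every panel the same `xlim` and `ylim` instead of letting each one scale itself. The sketch below stacks the cities in a single column so that the time axes line up vertically. That layout is one possible choice, and the data and column names are hypothetical.

```r
# Hypothetical data for three cities (assumed columns: city, date, low, high)
temps <- data.frame(
  city = rep(c("Auckland", "Tokyo", "Cairo"), each = 3),
  date = rep(as.Date("2019-01-01") + 0:2, 3),
  low  = c(15, 16, 14,  1,  2,  0, 10, 11,  9),
  high = c(23, 24, 22,  9, 10,  8, 20, 21, 19)
)

# Shared limits across all panels so values can be compared between plots
xlim <- range(temps$date)
ylim <- range(temps$low, temps$high, na.rm = TRUE)

cities <- unique(temps$city)
op <- par(mfrow = c(length(cities), 1), mar = c(2, 4, 1.5, 1))
for (cty in cities) {
  d <- temps[temps$city == cty, ]
  plot(d$date, d$high, type = "l", col = "red", xlim = xlim, ylim = ylim,
       main = cty, xlab = "", ylab = "Temp")
  lines(d$date, d$low, col = "blue")
}
par(op)   # restore the previous graphics settings
```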
(d) Plot a matrix of scatterplots of highs vs lows for each city. Describe what you can learn
from the plots. Do you see any technical issues with the data? [3 marks]
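One simple consistency check is counting records per city where the daily low exceeds the high. The sketch below does this with `tapply()` and adds the line high = low to each scatterplot, since points below that line are suspect. The data are hypothetical, with one inconsistent Auckland record planted on purpose. This is only one check and is not meant to cover every issue in the real file.

```r
# Hypothetical data; the fourth Auckland record has low > high on purpose
temps <- data.frame(
  city = rep(c("Auckland", "Tokyo"), each = 4),
  low  = c(15, 16, 14, 26,  1,  2,  0,  3),
  high = c(23, 24, 22, 25,  9, 10,  8, 11)
)

# Number of records per city where the low exceeds the high
bad <- with(temps, tapply(low > high, city, sum))
bad

cities <- unique(temps$city)
lims <- range(temps$low, temps$high)
op <- par(mfrow = c(1, length(cities)))
for (cty in cities) {
  d <- temps[temps$city == cty, ]
  plot(d$low, d$high, xlim = lims, ylim = lims, main = cty,
       xlab = "Low", ylab = "High", pch = 20)
  abline(0, 1, lty = 2)   # points below this line have high < low
}
par(op)
```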
(e) Compute the average low and high temperature for each city and week of the year. This is
similar to (b), but you want to average over the years as well, i.e., the average for the first
week² will be computed from temperatures on 1-7 January of all the years 2000, . . . , 2019.
Do not worry about special handling of leap years.
Plot the results. How can you interpret the resulting shapes? [4 marks]
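Averaging over the years as well only needs `city` as an extra grouping variable: the week is computed from the day of the year alone, so the same week from different years falls into one group. The sketch uses two hypothetical cities and two years, where the second year is 2 degrees warmer, so each weekly mean lies halfway between the two years.

```r
# Hypothetical data: first 14 days of 2000 and 2001 for two cities
days  <- c(as.Date("2000-01-01") + 0:13, as.Date("2001-01-01") + 0:13)
shift <- rep(c(0, 2), each = 14)          # 2001 is 2 degrees warmer
temps <- rbind(
  data.frame(city = "Auckland", date = days, low = 15 + shift, high = 23 + shift),
  data.frame(city = "Tokyo",    date = days, low =  1 + shift, high =  9 + shift)
)

# Week of the year pooled over years (leap years ignored, as allowed)
temps$week <- as.POSIXlt(temps$date)$yday %/% 7 + 1
clim <- aggregate(cbind(low, high) ~ city + week, data = temps, FUN = mean)

# One panel per city with a shared y range
op <- par(mfrow = c(1, 2))
for (cty in unique(clim$city)) {
  d <- clim[clim$city == cty, ]
  plot(d$week, d$high, type = "l", col = "red",
       ylim = range(clim$low, clim$high), main = cty,
       xlab = "Week of year", ylab = "Temperature")
  lines(d$week, d$low, col = "blue")
}
par(op)
```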
(f) Take the plot from (e) and improve it by removing superfluous axes and margins. Draw
axes only along the outer left, bottom and right edges of the entire matrix, as illustrated in
Figure 1. [4 marks]
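One way to place axes only on the outer edges is to set the inner margins to zero, keep room in the outer margins (`oma`), and decide per panel which sides get an axis. The helper `axis_sides()` below is hypothetical. It assumes the panels fill an `nc`-column grid by row. A panel gets a bottom axis when no panel sits below it, a left axis in the first column, and a right axis in the last column. The cosine curves are placeholders, not real temperatures.

```r
# Sides (1 = bottom, 2 = left, 4 = right) on which panel i, in a grid with
# nc columns filled by row and n panels in total, should draw an axis
axis_sides <- function(i, nc, n) {
  bottom <- i > n - nc          # no panel below this one
  left   <- (i - 1) %% nc == 0  # first column
  right  <- i %% nc == 0        # last column
  c(1, 2, 4)[c(bottom, left, right)]
}

weeks <- 1:52
n <- 7; nr <- 2; nc <- 4
op <- par(mfrow = c(nr, nc), mar = c(0, 0, 0, 0), oma = c(4, 4, 1, 4))
for (i in seq_len(n)) {
  # Placeholder curve; ann = FALSE suppresses per-panel axis titles
  plot(weeks, 15 + 8 * cos(2 * pi * weeks / 52), type = "l",
       ylim = c(0, 30), axes = FALSE, ann = FALSE)
  box()
  for (s in axis_sides(i, nc, n)) axis(s)
  mtext(paste("City", i), side = 3, line = -1.5, cex = 0.8)  # hypothetical label
}
mtext("Week of year", side = 1, outer = TRUE, line = 2.5)
mtext("Temperature", side = 2, outer = TRUE, line = 2.5)
par(op)
```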
² If you don’t want to split years by hand (which you can), you may find as.POSIXlt(date)$yday useful.
Figure 1: Weekly average temperature lows and highs for 2000-2019 in 7 world cities.