UNIVERSITY OF NEW SOUTH WALES
SCHOOL OF MATHEMATICS AND STATISTICS
MATH2831/2931 Linear Models
Assignment Three
Note: This assignment is due in the Tuesday lecture in Week 12.
Course (please circle) : MATH2831 / MATH2931
I (We) declare that this assessment item is my (our) own work, except where
acknowledged, and has not been submitted for academic credit elsewhere, and
acknowledge that the assessor of this item may, for the purpose of assessing this
item:
• Reproduce this assessment item and provide a copy to another member of
the University; and/or,
• Communicate a copy of this assessment item to a plagiarism checking
service (which may then retain a copy of the assessment item on its database
for the purpose of future plagiarism checking).
I (We) certify that I (We) have read and understood the University Rules in
respect of Student Academic Misconduct.
Surname    Given name    Student ID    Signature    Date
Please follow the instructions below when completing the assignment, which is worth 15% of
your final mark. You may do this assignment in groups of up to 3 people.
Instructions
• Your answers to questions 1 and 2 must be typeset in one continuous LaTeX (.pdf)
or R markdown (knitted to .pdf) document.
• Your answer to question 3 should be a separate R markdown (knitted to .pdf)
document (stapled together with the first document).
• Each question should be numbered using section or enumerate environments
in LaTeX or with hashes in R markdown.
• You must do all your calculations in R and provide all code and relevant output
using the verbatim environment in LaTeX or inside R markdown chunks.
• You must submit a hard copy printed assignment with a completed cover page
(above).
• Font size should be easily readable (10 to 12 pt).
Assignment 3 - Questions
1. (MATH2931 and MATH2831)
Andy wants to know if watering the trees in his woods will result in taller
trees, which are more valuable when sold. He waters some trees daily
with 0, 1 or 2 buckets of water for several years and measures the height
of the trees at the end of his experiment. He also knows the age of each
tree, which will have an effect on height. The dataset andydat.csv has the
following variables:
• height: the height of each tree at the end of the experiment.
• age: the age of each tree at the end of the experiment.
• buckets: the number of buckets of water poured daily on the tree.
(a) Plot Andy’s data, and comment on any relationships between variables
and potential violations of assumptions.
(b) Fit a linear model to Andy’s data (untransformed), plot residuals (all
four plots using the plot(mod) function), and comment on any assumptions
that are violated, or other problems with the model.
(c) By using transformations of the response and/or predictors, build a
model which better meets the assumptions of a linear model. Before
implementing a transformation, clearly state which assumption violation
you are attempting to remedy. Don’t go too nuts; use a maximum
of two transformations.
(d) What is Andy’s main question of interest in statistical language? Hint:
Look at past assignment questions.
(e) Use your final model to answer Andy’s main question of interest, listing
all relevant test statistics and p-values.
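The workflow in parts (a) to (c) and (e) can be sketched as follows. This is a minimal illustration, not a model answer. Because andydat.csv is not reproduced here, the data frame is simulated and only borrows the three column names; the seed, sample size, coefficients and the choice of a log transformation are all assumptions. Treating buckets as numeric is also an assumption. The right transformation for Andy's data has to be judged from his own residual plots.

```r
# Simulated stand-in for andydat.csv (assumption: same three column names).
# For the real analysis use: trees <- read.csv("andydat.csv")
set.seed(1)
n <- 90
trees <- data.frame(
  age     = sample(5:14, n, replace = TRUE),
  buckets = sample(0:2, n, replace = TRUE)
)
# Hypothetical data-generating model with multiplicative error
trees$height <- exp(0.5 + 0.15 * trees$age + 0.10 * trees$buckets +
                    rnorm(n, sd = 0.2))

# (a) Pairwise scatterplots of all variables
pairs(trees[, c("height", "age", "buckets")])

# (b) Linear model on the untransformed response, with all four residual plots
mod <- lm(height ~ age + buckets, data = trees)
par(mfrow = c(2, 2))
plot(mod)

# (c) One candidate remedy when the spread grows with the fitted value:
# log-transform the response, then recheck the same four plots
mod_log <- lm(log(height) ~ age + buckets, data = trees)
plot(mod_log)
par(mfrow = c(1, 1))

# (e) Coefficient table with estimates, t statistics and p-values
summary(mod_log)$coefficients
```

The same calls apply unchanged to the real data once trees is read from the file; only the transformation step needs to be revisited.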
2. (MATH2931 only)
Andy is accustomed to transforming the response in his own past analyses;
however, he has heard of a method called weighted least squares, where
you don’t need to transform the response. He is interested in trying this
method. In this question, you can assume you don’t need to transform the
predictors.
(a) We have multiple observations at each age. We want to use the
inverse of the sample variance of the observations at each value of
age as weights. Write down a mathematical formula for w(age_j), the
weight for any observation with age equal to age_j. You may need to
define some notation.
(b) Write R code to calculate w(age_j) for each value of age.
(c) Write R code to add a column weight to the dataset, which gives the
weight for each observation.
(d) Fit a weighted least squares model using the lm() function and the
weights you calculated.
(e) Derive a formula for externally studentized residuals in weighted
least squares regression.
Hint: Start by writing down H_W, the hat matrix in a weighted least
squares regression.
(f) Check, using an appropriate plot, that the weights you used were
adequate.
(g) Give an answer to Andy’s main question, listing all relevant test
statistics and p-values.
(h) The gls function in the nlme package is able to do weighted least
squares in a similar way to part (d) with far more compact code:
trees=read.csv("andydat.csv")
library(nlme)
gls_mod=gls(formula=..., data=trees, weights = ...) #fit model
plot(gls_mod, ...) #plot residuals to check variance assumption
where ... is replaced with appropriate expressions. You will need to
complete the code above. This is a research question; you will not find
the answer in the lecture notes. Include in your answer:
• The final code, in the same form as above.
• A detailed explanation of what each part of your code does in
words.
• Is your answer the same as in part (d)? Explain, in no more than
3 sentences, why or why not.
• To be clear, you cannot add additional lines of code to calculate
weights or create covariance matrices.
• The gls function in nlme is different from the gls.lm function
discussed in lecture notes.
• Hint: Look at ?varClasses and ?factor.
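The mechanics of parts (b), (c), (d) and (f) can be sketched as follows, again on simulated data rather than andydat.csv. The seed, the nine trees per age and the variance pattern are assumptions made only so that the code runs on its own. The hat matrix is written in the common form X (X'WX)^{-1} X'W, which may differ in notation from the lecture notes. The gls call in part (h) is deliberately left uncompleted.

```r
# Simulated stand-in for andydat.csv (assumption: same column names).
set.seed(2)
ages <- rep(5:14, each = 9)                # nine trees at each age
trees <- data.frame(age = ages, buckets = rep(0:2, times = 30))
# Hypothetical model in which the error standard deviation grows with age
trees$height <- 1 + 0.8 * trees$age + 0.5 * trees$buckets +
  rnorm(nrow(trees), sd = 0.1 * trees$age)

# (b) Sample variance of height at each age, and its inverse as the weight
s2 <- tapply(trees$height, trees$age, var)
w_by_age <- 1 / s2

# (c) Look up each observation's weight by its age
trees$weight <- as.numeric(w_by_age[as.character(trees$age)])

# (d) Weighted least squares with lm()
wls_mod <- lm(height ~ age + buckets, data = trees, weights = weight)

# Hat matrix in matrix form; its diagonal should agree with hatvalues()
X <- model.matrix(wls_mod)
W <- diag(trees$weight)
H_W <- X %*% solve(t(X) %*% W %*% X) %*% t(X) %*% W

# (f) Externally studentized residuals against fitted values: a roughly
# constant spread suggests the weights are adequate
plot(fitted(wls_mod), rstudent(wls_mod),
     xlab = "Fitted height", ylab = "Externally studentized residual")
```

For a model fitted with weights, rstudent() works from the weighted residuals, so it can serve as a numerical check on a formula derived by hand in part (e).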
3. (MATH2931 and MATH2831)
You will now need to communicate your results to Andy. You’ve had a chat
with Andy, and have concluded that he has a high school level of mathematics,
understands simple statistical concepts like mean and variance, and
has used R before to fit linear models. He does not understand any statistical
jargon like bias and estimator. You will need to send Andy an R markdown document
with the results of your analysis (MATH2831 use Q1, MATH2931 use Q2),
with enough detail that he can understand and reproduce the analysis on
a different (but similar) dataset, including checking assumptions. You will
be marked against the following criteria:
• No more than 3 printed pages using default markdown fonts.
• Appropriate level of mathematical detail for the audience.
• Appropriate level of R code and comments for the audience.
• Clear answer to Andy’s question of interest.
• Clear and reproducible code and assumption checks.
• (MATH2931 only) Clearly explain why weighted least squares might
be used instead of standard least squares. Hint: WLS is BLUE in
some circumstances.
• (MATH2931 only) Clearly explain why weighted least squares might
be used instead of transformations.
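The BLUE hint in the criteria above refers to a standard result, sketched here for reference. The notation is an assumption and may differ from the lecture notes: W is the diagonal matrix of weights and \sigma^2 is an unknown constant.

```latex
\[
  y = X\beta + \varepsilon, \qquad
  \mathrm{E}(\varepsilon) = 0, \qquad
  \mathrm{Var}(\varepsilon) = \sigma^2 W^{-1}, \qquad
  W = \mathrm{diag}(w_1, \dots, w_n).
\]
\[
  \hat\beta_W = (X^\top W X)^{-1} X^\top W y, \qquad
  \mathrm{Var}(\hat\beta_W) = \sigma^2 (X^\top W X)^{-1}.
\]
```

When the weights are known and proportional to the inverse error variances, \hat\beta_W has the smallest variance among linear unbiased estimators of \beta. Ordinary least squares is still unbiased in this setting but less precise. With weights estimated from the data, as in question 2, the result holds only approximately.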
Hint: The following R Markdown chunk options will give you a less clunky
output for residual plots:
```{r fig.height=2.5, fig.width=10}
par(mfrow=c(1,4))
plot(mod)
```