STAT4051 Assignment 5
Show all work for full credit.
20 pts
Cathedral Problem
The cathedral dataset contains the length in feet (y), the nave height (x), and the style
(Gothic or Romanesque) of 25 medieval English cathedrals. Assume that y is the response,
x is the covariate, and style is the treatment.
library(faraway)
data(cathedral, package="faraway")
a. Plot the data and indicate the different cathedral styles on the plot.
b. Determine the final model to predict cathedral length in feet.
c. Did you include a covariate in your final model? If so, quantify its benefit.
d. How much does length change for a 10-unit increase in height?
e. How much does length change for the different styles of cathedrals?
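You may also find the following sketch helpful. It illustrates the general workflow (plot by group, compare separate-slopes, parallel-lines, and treatment-only models, then quantify the covariate) on the built-in mtcars data, not the cathedral data. The variable roles (mpg as response, wt as covariate, am as treatment) and all object names are assumptions chosen for illustration only.

```r
# Illustration on the built-in mtcars data (NOT the cathedral data).
# Assumed roles: mpg = response, wt = covariate, am = treatment factor.
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

# Scatterplot with the two groups marked by colour and plotting symbol
plot(mpg ~ wt, data = cars,
     col = as.integer(cars$am), pch = as.integer(cars$am))
legend("topright", legend = levels(cars$am), col = 1:2, pch = 1:2)

fit_sep <- lm(mpg ~ wt * am, data = cars)  # separate slopes per group
fit_par <- lm(mpg ~ wt + am, data = cars)  # parallel lines (common slope)
fit_trt <- lm(mpg ~ am, data = cars)       # treatment only, no covariate

anova(fit_par, fit_sep)  # tests whether the slopes differ between groups
anova(fit_trt, fit_par)  # tests whether the covariate is needed

# One possible way to quantify the covariate's benefit:
# the ratio of error mean squares without and with the covariate
mse_ratio <- sigma(fit_trt)^2 / sigma(fit_par)^2
mse_ratio

# Estimated change in the response for a 10-unit increase in the covariate
# under the parallel-lines model
10 * coef(fit_par)["wt"]
```

Which model is "final" depends on the tests above; if the slopes differ, a single slope-based answer no longer applies and the change must be reported by group.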
Oehlert Problem 17.1
You may find the following code helpful:
prob17.1 <- read.table("http://www.stat.umn.edu/~gary/book/fcdae.data/pr17.1", header=TRUE)
a. Analyze these data with respect to the effect of pesticide on calcium in bones.
b. Write your final model. Hint: Use dummy variables to represent the various
pesticides.
c. Create a plot for your final model.
d. Estimate the average calcium concentration in bones for a diameter of 3.0 mm based
on your final model.
e. Quantify the benefit, if any, of including a covariate in the model.
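As a hedged illustration of the dummy-variable hint, the sketch below uses the built-in iris data rather than the pesticide data. It shows how R codes a factor as 0/1 columns and how to predict the response at a fixed covariate value for every group. The choice of variables and the covariate value 3.0 are assumptions made for illustration.

```r
# Illustration on the built-in iris data (NOT the pesticide data).
# Assumed roles: Sepal.Length = response, Sepal.Width = covariate,
# Species = treatment factor.
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# The design matrix shows the 0/1 dummy variables R builds for the factor;
# the first level (setosa) is the baseline absorbed into the intercept
head(model.matrix(fit), 3)

# Predicted response at a fixed covariate value, one row per group
new <- data.frame(
  Sepal.Width = 3.0,
  Species = factor(levels(iris$Species), levels = levels(iris$Species))
)
pred <- predict(fit, newdata = new)
pred

# Average of the group predictions at that covariate value
mean(pred)
```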
Glass Problem
Criminal investigations can involve classifying glass found at a crime scene into 1 of 7
categories based on chemical composition:
Type Description
1 building windows float processed
2 building windows non-float processed
3 vehicle windows float processed
4 vehicle windows non-float processed
5 containers
6 tableware
7 headlamps
At the scene of the crime, the glass left can be used as evidence… if it is correctly identified.
You may find the following code helpful:
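# The file has no header row, so read it with header=FALSE
glass <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/glass/glass.data", header=FALSE)
# Name the variables
colnames(glass) <- c("id","ri","Na","Mg","Al", "Si", "K",
"Ca", "Ba", "Fe", "Type")
# Attach only after renaming so the new names are available
attach(glass)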
a. Examine the covariance matrix. What variable(s) dominate?
b. Perform principal components on the covariance matrix. What does the first
eigenvector describe?
c. Examine the correlation matrix. What are the large correlations?
d. Perform principal components on the correlation matrix. Describe the first
eigenvector.
e. What matrix (covariance or correlation) should principal components be performed
on? Justify your choice.
f. How many components should be retained? Justify your choice.
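The contrast between the two matrices can be sketched on the built-in USArrests data (not the glass data). This is a minimal illustration, assuming prcomp is used for the principal components; the object names are hypothetical.

```r
# Illustration on the built-in USArrests data (NOT the glass data).
X <- USArrests

# Variances on the diagonal of the covariance matrix: a variable measured
# on a much larger scale will dominate an unscaled analysis
round(diag(cov(X)), 1)

pca_cov <- prcomp(X, scale. = FALSE)  # PCA on the covariance matrix
pca_cor <- prcomp(X, scale. = TRUE)   # PCA on the correlation matrix

# First eigenvector (loadings) under each choice
round(pca_cov$rotation[, 1], 2)
round(pca_cor$rotation[, 1], 2)

# Proportion of variance explained, and a scree plot
summary(pca_cor)$importance
screeplot(pca_cor, type = "lines")

# The squared standard deviations from prcomp match the eigenvalues
# of the correlation matrix
eig <- eigen(cor(X))
all.equal(eig$values, pca_cor$sdev^2)
```

The signs of the loadings are arbitrary, so an eigenvector and its negative describe the same component.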
Turtle Problem
Download the turtles.csv dataset from Canvas.
Sex is coded as Female=1 and Male=2.
a. Obtain the sample correlation matrix R for these data and discuss your findings.
b. Show that the first principal component is orthogonal to the second principal
component.
c. Write the equation for the first principal component.
d. Create a Scree plot and determine how many components should be retained.
e. Interpret the retained components.
f. Run the following code to examine the pairs of principal components. Which plot
separates the turtles by sex the most? What does this plot tell you about turtles with
respect to sex?
pairs(~PC1 + PC2 + PC3, data=<your dataset name>, col=<your variable name for sex>)
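The pairs() call above needs a data frame that holds the component scores alongside the grouping variable. A minimal sketch of one way to build it, and to check orthogonality numerically, is shown here on the built-in iris data (not the turtle data); the names pca, scores, and dot12 are hypothetical.

```r
# Illustration on the built-in iris data (NOT the turtle data).
pca <- prcomp(iris[, 1:4], scale. = TRUE)

# Orthogonality check: the dot product of two eigenvectors
# should be zero up to rounding error
v1 <- pca$rotation[, 1]
v2 <- pca$rotation[, 2]
dot12 <- sum(v1 * v2)
dot12

# Combine the scores (columns PC1, PC2, ...) with the grouping variable
scores <- data.frame(pca$x, Species = iris$Species)

# Pairwise plots of the first three components, coloured by group
pairs(~ PC1 + PC2 + PC3, data = scores, col = as.integer(scores$Species))
```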
Women's Track Problem
Download and import the dataset on Women's Track Records from Canvas. Remember to
set header=TRUE to retain the original variable names.
a. Obtain the sample correlation matrix R for these data and discuss your findings.
b. Create a Scree plot and determine how many components should be retained.
c. Interpret the first two principal components.
d. Plot the first two principal components. Identify the top three winners. Hint: rank
the nations based on the principal component scores for the first component.
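Ranking by first-component scores can be sketched as follows on the built-in USArrests data (not the track data). This is an illustration only; because the sign of a principal component is arbitrary, check the loadings to decide which end of the ranking counts as "best" before naming winners.

```r
# Illustration on the built-in USArrests data (NOT the track data).
pca <- prcomp(USArrests, scale. = TRUE)

# Scores on the first component, named by row (here, by state)
pc1 <- pca$x[, 1]

# Sort from smallest to largest score; which end is "best" depends on
# the sign of the loadings, so inspect pca$rotation[, 1] first
ranked <- sort(pc1)
head(names(ranked), 3)
tail(names(ranked), 3)

# Plot the first two components with row labels instead of points
plot(pca$x[, 1:2], type = "n")
text(pca$x[, 1:2], labels = rownames(USArrests), cex = 0.6)
```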
Food for thought -
http://homepage.divms.uiowa.edu/~dzimmer/sports-statistics/Naik-Khattree.pdf