Lab 5 Statistical Modeling: Least Squares Regression with lm
Write all parts of the lab in an R script. When you have completed the lab, paste your R script.
with functions and output into a Word document and submit to Harvey. Use the courier font for
code.
1. Fitting a line to data. Consider the following x and y data with the goal of fitting a linear
model, yhat(b,x) = b0+b1*x.
x<-c(.22,.33,.48,.49,.51) # ind data
y<-c(330,266,236,224,236) # dep data
a.) What is the correlation between x and y?
b.) Use lm to create a linear model fit to the data. Use $ and linear.fit1 to create and show
variables for the intercept b0 and slope b1.
linear.fit1 <- lm(y~x) # includes intercept by default
summary(linear.fit1)
b0 <- linear.fit1$coefficients[1]
b1 <- linear.fit1$coefficients[2]
c.) Use the code below to plot the data and the model fit. Note how predict.lm works so
you can use it later.
library(ggplot2)
ggplot(data.frame(x=x,y=y), aes(x , y)) +
geom_point() +
stat_function(fun = function(x) b0 + b1 * x) +
xlab("x") + ylab("y") +
ggtitle("Problem 1: Linear Model")
2. Polynomial fit. Consider the following x and y data with the goal of fitting a polynomial
model, yhat(b,x) = b0+b1*x+b2*x^2+b3*x^3+b4*x^4.
x2 <-c(-3,-1.5,.5,2,5)
y2 <-c(6.8,15.2,14.5,-21.2,10)
Use lm and poly to fit the model:
poly.fit2<-lm(y2~poly(x2,4))
Show the regression coefficients. Use the code from part 1 to help you plot the data and
predicted polynomial fit for this model. Hint: here is a way to create a statistical function from
a prediction model.
my.poly.fn <- function(x){predict.lm(poly.fit2,newdata=data.frame(x2=x))}
my.poly.fn(-3)
3. Power models: wind tunnel data. Consider the x and y data in the tab delimited
windtunnel.txt file. The columns correspond to velocity and force data from a hypothetical
experiment from a wind tunnel experiment. Force is measured for increasing wind speed to
estimate the drag coefficient of an object.
Load the data into a variable as follows. Before you run this code, make sure you use setwd to
change the working directory to the file’s location.
wind.data <- read.table("windtunnel.txt",header=T,sep="\t")
a.) First assume that the relationship between force and velocity is linear: F(v) = b0+b1*v. This
is a good model of air resistance for low velocities (laminar flow/low Reynolds number). Use lm
to estimate the model parameters. Plot the data and model on the same figure.
b.) A better model for the force as a function of wind-speed might be a power function:
F(v)=bo ×vb1
In order to linearize this model (bring the second model parameter out of the exponent), we need
to log the model:
lo g (F ) = lo g (b o ) + b1 × lo g (v )
You can fit this model by the following command
power.wind <- lm(log(F)~log(v),data=wind.data)
betas <- coef(power.wind)
Instead of using predict.lm for plotting, you can use the following function.
predict.power <- function(v){exp(betas[1])*v^betas[2]}
Show the parameters (beta’s) and plot the data and model on the same figure.
c.) It appears that the power function is a better fit to the data and the power is nearly squared.
This suggests that the data was collected in a turbulent-flow regime (high Reynolds number),
where the friction force is approximately F(v)=bo ×v2 and bo is related to various physical
properties of the fluid and the object. Squared velocity, as opposed to an arbitrary power, has the
advantage of having a coefficient with more sensible units. Use lm to estimate the parameters of
this new model (you need to use I(v^2)+0). Plot the data and model on the same figure.
Optional
4. The tab separated data file weatherdata.txt consists of two columns of data: months
ordered numerically 1-48 and the corresponding average temperature. There are four years of
average monthly temperatures for Tulsa from January 2008 through December 2011. The goal
of this question is to find the least squares solution for the parameter vector p of the following
model “yhat” given the first two years of data, and then see how well this model predicts future
temperatures:
where M is time in months and o=2pi/12 accounts for the periodicity of weather.
a.) Read in the weather data and fit the model using the first two years of data (Jan 2008- Dec
2009).
omega0 <- 2*pi/12
weather.model.first2years <-
lm(Temp~sin(omega0*mo)+cos(omega0*mo),data=weather.data[1:24,])
Show the parameters and plot the data and predictions for the first two years. You will need to
create a prediction function like in 3b.
b.) Using the model from the first two years (2008-2009), predict the temperature for the second
two years (2010-2011). Show the plots of the data and predictions for years 2010 and 2011.
Comment on the prediction.