ASSIGNMENT 1
MODELING MULTIVARIATE SPATIAL DATA
The goal of this assignment is to learn more about models for weather data. The idea is
to predict weather (in our case, atmospheric pressure) at some location using measurements
from the nearby weather stations. This is a challenging problem because two types of
dependence need to be modeled. The first is temporal dependence: daily measurements
for a given station are serially dependent, so measurements made on two consecutive days
will be dependent. The second is spatial dependence: measurements made on a given day
will be dependent across different weather stations.
Q1 You need to fit a single linear model to the pressure readings for 17 stations. Consider
station $j$. Because atmospheric pressure does not change very quickly, you may expect the
pressure readings at time $t$ and at time $t - 1$ to be dependent. Hence, you can use the lagged
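pressure readings as predictor variables. Let $Y_{t,j}$ be the pressure measured at time $t$ and
station $j$, $t = 1,\ldots,152$ and $j = 1,\ldots,17$. Define
$$Y_j = (Y_{2,j},\ldots,Y_{152,j})^\top, \qquad X_j = (Y_{1,j},\ldots,Y_{151,j})^\top.$$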
Here, $Y_j$ is the response variable and $X_j$ is the predictor variable (lag 1 of $Y_j$). A simple
linear model for the $j$-th station can be written as
$$Y_j = \alpha_j + \beta_j X_j. \qquad (1)$$
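In general, we have 17 pairs of coefficients, $(\alpha_j, \beta_j)$, for the 17 stations, 34 parameters in
total. We want a single model for all stations, so we assume that $\alpha_j$ depends on the spatial
covariates $\mathrm{NLAT}_j$, $\mathrm{ELON}_j$ and $\mathrm{ELEV}_j$, and that $\beta_j = \beta_0$:
$$\alpha_j = \alpha_0 + \alpha_1 \mathrm{NLAT}_j + \alpha_2 \mathrm{ELON}_j + \alpha_3 \mathrm{ELEV}_j.$$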
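You can fit the model (1) for each of the 17 stations and compare the parameter estimates,
$\hat\alpha_j$ and $\hat\beta_j$, to check whether this assumption is reasonable. Plugging it into (1) gives
$$Y_j = \alpha_0 + \beta_0 X_j + \alpha_1 \mathrm{NLAT}_j + \alpha_2 \mathrm{ELON}_j + \alpha_3 \mathrm{ELEV}_j. \qquad (2)$$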
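A vector of residuals for the $j$-th station is
$$\epsilon_j = Y_j - (\alpha_0 + \beta_0 X_j + \alpha_1 \mathrm{NLAT}_j + \alpha_2 \mathrm{ELON}_j + \alpha_3 \mathrm{ELEV}_j),$$
and the unknown parameters, $\alpha_0, \alpha_1, \alpha_2, \alpha_3$ and $\beta_0$, can be estimated by minimizing the sum of
squared residuals
$$S^2 = S^2(\alpha_0, \beta_0, \alpha_1, \alpha_2, \alpha_3) = \sum_{j=1}^{17} \epsilon_j^\top \epsilon_j, \qquad (\hat\alpha_0, \hat\beta_0, \hat\alpha_1, \hat\alpha_2, \hat\alpha_3) = \operatorname{argmin} S^2(\alpha_0, \beta_0, \alpha_1, \alpha_2, \alpha_3).$$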
You do not have to include all spatial covariates in (2); include only those covariates that
improve the fit of this model (for example, if they result in a much smaller $S^2$). Likewise,
you may want to include more lags of $Y_j$ (lag 2, lag 3) as predictor variables for $Y_j$.
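The pooled least-squares fit of model (2) can be sketched as follows. This is only an illustration in Python with NumPy, not the required R workflow: the data are synthetic, and the names `fit_pooled`, `nlat`, `elon` and `elev` are hypothetical stand-ins for the real readings and station covariates. It stacks the 17 stations into one regression and compares $S^2$ with and without the spatial covariates.

```python
import numpy as np

rng = np.random.default_rng(0)
T, J = 152, 17  # days and stations, as in the assignment

# Synthetic stand-ins for the real data (assumptions for illustration only).
nlat = rng.uniform(35.0, 45.0, J)
elon = rng.uniform(-110.0, -100.0, J)
elev = rng.uniform(0.5, 2.0, J)
Y = np.empty((T, J))
Y[0] = 1000.0 + rng.normal(0.0, 1.0, J)
for t in range(1, T):
    Y[t] = 200.0 + 0.8 * Y[t - 1] + 0.1 * nlat + rng.normal(0.0, 1.0, J)


def fit_pooled(Y, covariates):
    """Least-squares fit of model (2), stacked over all stations.

    Returns the coefficients (alpha_0, beta_0, then one alpha per covariate),
    the (T-1) x J matrix of residuals, and the sum of squared residuals S^2.
    """
    T, J = Y.shape
    resp = Y[1:]    # Y_j: readings at t = 2..T
    lag1 = Y[:-1]   # X_j: readings at t = 1..T-1
    # Column-major flattening puts all days of station 1 first, then station 2, ...
    cols = [np.ones((T - 1) * J), lag1.ravel(order="F")]
    for c in covariates:
        cols.append(np.repeat(c, T - 1))  # station-level value, constant over time
    design = np.column_stack(cols)
    y = resp.ravel(order="F")
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = (y - design @ coef).reshape((T - 1, J), order="F")
    return coef, resid, float(np.sum(resid ** 2))


coef_full, resid_full, s2_full = fit_pooled(Y, [nlat, elon, elev])
coef_none, resid_none, s2_none = fit_pooled(Y, [])
print("S^2 without covariates:", s2_none)
print("S^2 with all covariates:", s2_full)
```

Because the models are nested, adding covariates can never increase $S^2$; the question is whether the decrease is large enough to justify the extra parameters.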
For the fitted model (2), you can check for each station whether the residuals, $\epsilon_j$, have serial
dependence by using the acf() function in R and comparing the acf plots for the residuals with the
acf plots of the original data, $Y_j$. For a given vector of residuals, acf($\epsilon_j$) computes
the correlations $\mathrm{Cor}(\epsilon_{j,t}, \epsilon_{j,t+l})$ for $l = 1,2,\ldots,20$. If the model (2) is good, the residuals for
each station should be nearly independent, and hence most of the correlations calculated
by acf() should be close to zero and should not exceed the upper bound shown as a dashed line
on the plot. You should see a significant improvement compared to the acf plots of the
original data, with much smaller correlations (note that some of them can still exceed the
upper bound).
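To make the acf check concrete, here is a minimal Python/NumPy sketch of the usual sample autocorrelation estimator together with the approximate 95% bound $\pm 1.96/\sqrt{n}$ that acf() draws as dashed lines by default. The helper `sample_acf` and both series are hypothetical; the series are synthetic stand-ins for the original data and for well-behaved residuals.

```python
import numpy as np


def sample_acf(x, max_lag=20):
    """Sample autocorrelations at lags 1..max_lag (demeaned, divisor n)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    return np.array([np.sum(xc[:-l] * xc[l:]) / denom
                     for l in range(1, max_lag + 1)])


rng = np.random.default_rng(1)
n = 151

# Synthetic examples (assumptions): a strongly autocorrelated series,
# mimicking raw pressure readings, and white noise, mimicking good residuals.
noise = rng.normal(size=n)
ar1 = np.empty(n)
ar1[0] = noise[0]
for t in range(1, n):
    ar1[t] = 0.9 * ar1[t - 1] + noise[t]
white = rng.normal(size=n)

bound = 1.96 / np.sqrt(n)  # approximate 95% limit under independence
acf_raw = sample_acf(ar1)
acf_white = sample_acf(white)
print("lags beyond the bound, autocorrelated series:", int(np.sum(np.abs(acf_raw) > bound)))
print("lags beyond the bound, white noise:", int(np.sum(np.abs(acf_white) > bound)))
```

With 20 lags and a 95% bound, about one exceedance is expected by chance even for independent residuals, which is why a few correlations above the dashed line are not alarming.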
Q2 For each station $j = 1,\ldots,17$, standardize the vector of residuals, $\epsilon_j$. Check the
histograms and bivariate scatter plots for some pairs of standardized residuals, $\epsilon_j^*$. We
assume that $(\epsilon_1^*,\ldots,\epsilon_{17}^*)$ has a multivariate normal distribution. Is this a plausible assumption?
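Q3 Assume that $(\epsilon_1^*,\ldots,\epsilon_{17}^*) \sim N(0, \Sigma)$, where $\Sigma$ is a $17 \times 17$ correlation matrix with
$\Sigma_{i,j} = \exp[-r\{\mathrm{dist}_{i,j}\}^{\theta}]$, where $\mathrm{dist}_{i,j}$ is the distance between stations $i$ and $j$, $r > 0$ and $0 < \theta \le 2$. Find the maximum likelihood estimates of $r$ and $\theta$ subject to $r > 0$ and $0 < \theta \le 2$. You can do it using the nlm() or optim() functions in R.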
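As an illustration of the likelihood involved, the sketch below (Python with NumPy, on synthetic data) evaluates the negative Gaussian log-likelihood, up to an additive constant, $\tfrac{1}{2}\{n \log\det\Sigma + \sum_t e_t^\top \Sigma^{-1} e_t\}$, where $e_t$ is the vector of 17 standardized residuals on day $t$. It assumes the days are independent, which is an assumption of this sketch and not something stated above. The function names, the station coordinates and the coarse grid search are hypothetical; the grid only stands in for a numerical optimizer such as nlm() or optim().

```python
import numpy as np


def corr_matrix(dist, r, theta):
    """Sigma_{i,j} = exp(-r * dist_{i,j}^theta)."""
    return np.exp(-r * dist ** theta)


def neg_loglik(params, E, dist):
    """Negative log-likelihood (up to a constant) for rows of E ~ N(0, Sigma)."""
    r, theta = params
    if r <= 0 or not (0 < theta <= 2):
        return np.inf  # outside the admissible parameter range
    try:
        L = np.linalg.cholesky(corr_matrix(dist, r, theta))
    except np.linalg.LinAlgError:
        return np.inf  # Sigma is not numerically positive definite
    z = np.linalg.solve(L, E.T)  # whitened residuals, one column per day
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * (E.shape[0] * logdet + np.sum(z ** 2))


# Synthetic stations and residuals (assumptions for illustration only).
rng = np.random.default_rng(2)
J, n = 17, 151
xy = rng.uniform(0.0, 3.0, size=(J, 2))
dist = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=2))
true_r, true_theta = 1.0, 1.0
chol = np.linalg.cholesky(corr_matrix(dist, true_r, true_theta))
E = rng.normal(size=(n, J)) @ chol.T  # each row has covariance Sigma

# Coarse grid search as a stand-in for a numerical optimizer.
grid = [(r, th)
        for r in np.linspace(0.2, 3.0, 29)
        for th in np.linspace(0.2, 2.0, 19)]
best = min(grid, key=lambda p: neg_loglik(p, E, dist))
print("grid estimate of (r, theta):", best)
```

Returning infinity outside the admissible range is one simple way to enforce the constraints; reparameterizing (for example, optimizing over $\log r$) is another.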
Q4 Use bootstrap to construct approximate 95% confidence intervals for $r$ and $\theta$. You
do not have to use all the methods we have covered in class; just one method is good enough.
Note that to obtain the parameter estimates, $\hat r$ and $\hat\theta$, we first need to obtain a matrix of residuals,
and to do that, we need to estimate the linear model (2). What bootstrap method might be
more appropriate in this case?
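One possible scheme, shown only as a sketch, is a percentile bootstrap that resamples whole days (rows of the residual matrix) with replacement. Resampling whole days keeps the spatial dependence within each day intact but treats days as exchangeable, and it does not propagate the uncertainty from estimating model (2) unless the model is refitted inside each replicate; whether that is adequate is exactly the question posed above. The Python/NumPy code below uses a hypothetical helper `percentile_ci` and, to stay short, a simple correlation as a stand-in for the estimator of $(r, \theta)$.

```python
import numpy as np


def percentile_ci(rows, estimator, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap interval: resample rows (days) and re-estimate."""
    rng = np.random.default_rng(seed)
    n = rows.shape[0]
    stats = np.array([estimator(rows[rng.integers(0, n, size=n)])
                      for _ in range(n_boot)])
    lo, hi = np.quantile(stats, [(1 - level) / 2, (1 + level) / 2], axis=0)
    return lo, hi


def pair_corr(rows):
    """Stand-in estimator (assumption): correlation between two columns."""
    return np.corrcoef(rows[:, 0], rows[:, 1])[0, 1]


# Synthetic residuals for two stations with correlation 0.6 (illustration only).
rng = np.random.default_rng(3)
z = rng.normal(size=(151, 2))
rows = np.column_stack([z[:, 0], 0.6 * z[:, 0] + 0.8 * z[:, 1]])

est = pair_corr(rows)
lo, hi = percentile_ci(rows, pair_corr)
print("estimate:", est)
print("approximate 95% interval:", (lo, hi))
```

An estimator that returns a vector, such as $(\hat r, \hat\theta)$, works with the same function: the quantiles are then taken separately for each component.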
Report You do not have to include your code in the report, but you should explain
carefully all important steps; e.g., how you select important spatial covariates and the number of
lagged variables in (2), or why the joint normality assumption is suitable for the residuals. You
should briefly explain how you find the maximum likelihood estimates of the $r$ and $\theta$ parameters
and how you use bootstrap to obtain confidence intervals for these parameters. You may
also want to include some acf plots, histograms and scatter plots of residuals.