MA416 - Analysis of Variance - Mini-Exam 3

Directions (Please Read): The time permitted for this exam is 40 minutes. You must show all work to receive full, if any, credit (where applicable). A basic scientific or TI calculator may be used, in addition to a 3” x 5” note card (handwritten, back and front); no notes, books, search engine, cell phone, any other person, device, or resource aside from yourself can be used to get assistance on any portion of this exam. Failure to adhere to any of these (or university) policies will result in a zero for this assessment. Please try to adhere to the recommended time limits on each question block.

Disclaimer: This practice assessment is not an exhaustive list of all topics that can be assessed on the actual exam. Please reference the lecture content for a complete list of all expected topics for mastery.

[2 mins] For the following (1) – (4), circle either “True” or “False” regarding the validity of the entire statement provided.

1. True False The adjusted $R^2$ value is always between 0 and 1 for a multiple linear regression model.

2. True False If you are to remove a variable from a multiple linear regression model solely on the multicollinearity indices, you would choose the variable whose multicollinearity index is closest to 1.

3. True False Including interaction terms or higher-order polynomial terms are examples of data-driven multicollinearity.

4. True False If you are to remove a variable from a multiple linear regression model solely on the contribution of a predictor variable's model coefficient, you would choose the variable whose coefficient’s absolute value is closest to 0.

For the following (5) – (7), suppose you want to construct a multiple linear regression model. Suppose that you gather a sample of two. The response values of their ages were (25, 21), and the response values of their sleep average were (5, 3). Assume that you are using age and sleep average to predict happiness values, where your sample happiness values were (87, 75). Show all work to justify your answers. Do not use a calculator or software.

5. Construct the design matrix and response vector for your multiple linear regression model.

6. Assume that you decide to only use age to predict happiness values. Using the matrix approach, find the coefficients of your “multiple” linear regression model. Show all work to support your answer. Round to 1 decimal place if needed.

7. In continuation of (6), calculate an unbiased estimator of var(y | X = 2). Show all work to justify your answer.

8. Suppose that you have a multiple linear regression model given by:

Y = β0 + β1X1 + β2X2 + β3X3 + ε

How would you calculate the multicollinearity index associated with predictor X2? Give your answer in a “step 1, step 2, …” manner.

9. Suppose that you calculate $X(X^TX)^{-1}X^T$ to be such that your sample size was set to be equal to four and the number of predictors (p) is equal to two. You should realize that these two statements/objects cannot coexist without contradictions, implying that one row/column is missing from the matrix. One can prove that the average of the leverages for a multiple linear regression model with p predictors built on a sample of size n is equal to (p + 1)/n. Using the given information, what are all of the leverages for your data set?

10. Suppose that you have a simple linear regression model $M_A$: Y = 2.3 + 7.3X and another regression model $M_B$: Y = 1.8 + 7.1X, where their covariance matrices are given below:

Construct a test statistic that tests the claim that these simple linear regression models are parallel to one another.

Solution Guide

1. F

2. F

3. F

4. F

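5. Using the definitions, one can find: $X = \begin{pmatrix} 1 & 25 & 5 \\ 1 & 21 & 3 \end{pmatrix}$ and $y = \begin{pmatrix} 87 \\ 75 \end{pmatrix}$.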

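6. Firstly, $X^TX = \begin{pmatrix} 2 & 46 \\ 46 & 1066 \end{pmatrix}$ and $X^Ty = \begin{pmatrix} 162 \\ 3750 \end{pmatrix}$. One can then find $(X^TX)^{-1} = \frac{1}{16}\begin{pmatrix} 1066 & -46 \\ -46 & 2 \end{pmatrix}$ with $\det(X^TX) = 16$. From here we have that:

$\hat{\beta} = (X^TX)^{-1}X^Ty = \frac{1}{16}\begin{pmatrix} 192 \\ 48 \end{pmatrix} = \begin{pmatrix} 12 \\ 3 \end{pmatrix}$

Therefore, the regression model is given by $Y = 12 + 3X_1$, with $\hat{\beta}_0 = 12$ and $\hat{\beta}_1 = 3$.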
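The matrix arithmetic in (6) can be checked with a short Python sketch. It uses exact fractions so no rounding creeps in; the variable names (`xtx`, `xty`, `beta_hat`) are illustrative choices, not notation from the course.

```python
from fractions import Fraction

# Data from questions (5)-(6): age is the only predictor, happiness the response.
ages = [25, 21]
happiness = [87, 75]

# Design matrix with a leading column of ones for the intercept.
X = [[Fraction(1), Fraction(a)] for a in ages]
y = [Fraction(v) for v in happiness]

# X^T X (2x2) and X^T y (2x1), built entry by entry.
xtx = [[sum(row[i] * row[j] for row in X) for j in range(2)] for i in range(2)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(2)]

# Inverse of a 2x2 matrix: adjugate divided by the determinant.
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
inv = [[xtx[1][1] / det, -xtx[0][1] / det],
       [-xtx[1][0] / det, xtx[0][0] / det]]

# beta_hat = (X^T X)^{-1} X^T y
beta_hat = [sum(inv[i][k] * xty[k] for k in range(2)) for i in range(2)]

print(det)        # 16
print(beta_hat)   # [Fraction(12, 1), Fraction(3, 1)]
```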

7. Our predicted values are $\hat{y}_1 = 12 + 3(25) = 87$ and $\hat{y}_2 = 12 + 3(21) = 75$, which are precisely equal to our $y_1, y_2$ values. Hence the residuals are both equal to 0, and so SSE = 0. Since MSE is an unbiased estimator of the homoscedastic variance (assuming that the model assumptions are met), we have that $\widehat{\mathrm{var}}(y \mid X = 2) = \mathrm{MSE} = \mathrm{SSE}/(n-2) = 0$ (defined conditionally to be 0, since $0/(n-2) \to 0$ as $n \to 2$).
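A small Python sketch makes the degenerate case in (7) concrete: with two observations and two estimated coefficients, the line passes through both points, so SSE and the error degrees of freedom are both zero. The variable names are illustrative only.

```python
# Fitted line from (6): y_hat = 12 + 3 * age.
ages = [25, 21]
happiness = [87, 75]
b0, b1 = 12, 3

fitted = [b0 + b1 * a for a in ages]
residuals = [obs - fit for obs, fit in zip(happiness, fitted)]
sse = sum(r ** 2 for r in residuals)

# Error degrees of freedom: n minus the number of estimated coefficients.
df_error = len(ages) - 2

print(fitted)     # [87, 75]
print(sse)        # 0
print(df_error)   # 0, so SSE / df_error is the indeterminate form 0/0
```

Because the ratio is 0/0, the value 0 reported in the solution is a convention rather than something the division produces directly.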

8. The first step is to obtain the coefficients of the model $X_2 = \hat{a}_0 + \hat{a}_1 X_1 + \hat{a}_3 X_3$, and from this model, calculate its $R^2$ value.

From this $R^2$ value, the multicollinearity index for $X_2$ is given by
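Assuming the index meant here is the variance inflation factor, 1/(1 − R²), the steps can be sketched in Python. The data values and the helper names `solve` and `r_squared` are hypothetical, chosen only for illustration; they do not come from the exam.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(target, predictors):
    """R^2 from regressing `target` on `predictors` plus an intercept."""
    rows = [[1.0] + list(p) for p in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    coef = solve(xtx, xty)  # normal equations: (X^T X) coef = X^T y
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean = sum(target) / len(target)
    sse = sum((t - f) ** 2 for t, f in zip(target, fitted))
    sst = sum((t - mean) ** 2 for t in target)
    return 1.0 - sse / sst


# Hypothetical predictor values, invented purely for illustration.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
x3 = [5.0, 3.0, 6.0, 2.0, 7.0, 4.0]

r2 = r_squared(x2, [x1, x3])   # steps 1-2: regress X2 on X1 and X3, take R^2
vif = 1.0 / (1.0 - r2)         # step 3: assumed index, 1 / (1 - R^2)
print(round(r2, 4), round(vif, 2))
```

In this made-up data `x2` is roughly twice `x1`, so R² comes out close to 1 and the index is large, which is the pattern that flags a predictor as nearly redundant.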

9. Since the average leverage is $(p + 1)/n = 3/4$ and $n = 4$, this implies that $\sum_j L_j = 3$. Since the diagonal entries of the hat matrix correspond to the leverages, subtracting off the given diagonal entries from 3 gives the remaining leverage to be 0.76.
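The fact that the leverages sum to p + 1 follows from the cyclic property of the trace. A brief derivation, writing $H$ for the hat matrix (a symbol introduced here for convenience) and assuming $X$ has full column rank with an intercept column:

```latex
\sum_{j=1}^{n} L_j
  = \operatorname{tr}(H)
  = \operatorname{tr}\!\left(X (X^T X)^{-1} X^T\right)
  = \operatorname{tr}\!\left((X^T X)^{-1} X^T X\right)
  = \operatorname{tr}(I_{p+1})
  = p + 1,
\qquad
\bar{L} = \frac{p + 1}{n} = \frac{3}{4} \quad \text{for } n = 4,\ p = 2.
```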

10. The test that we will construct evidence for/against is $H_0: \beta_1^{(A)} = \beta_2^{(B)}$. Our unbiased and consistent point estimates for these parameters are $\hat{\beta}_1^{(A)} = 7.3$ and $\hat{\beta}_2^{(B)} = 7.1$. The standard error for the difference (assuming that they are independent random variables) is equal to:

$\mathrm{SE}\left(\hat{\beta}_1^{(A)} - \hat{\beta}_2^{(B)}\right) = \sqrt{6.34 + 2.74} \approx 3.0133$

Therefore, if the normality assumption is met for both models, then the random variable of the difference can be approximated with a T distribution, whose test statistic is equal to (7.3 − 7.1)/3.0133 ≈ 0.0664. The degrees of freedom for this distribution are not asked for, so I will not go into them here (it is similar to the unpooled variance T test).
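The arithmetic for (10) can be reproduced in a few lines of Python. The two variances, 6.34 and 2.74, are taken as given from the standard-error formula in the solution; the variable names are illustrative.

```python
import math

slope_a, slope_b = 7.3, 7.1    # estimated slopes of M_A and M_B
var_a, var_b = 6.34, 2.74      # slope variances used in the solution guide

# Independence is assumed, so the variances of the two slopes add.
se_diff = math.sqrt(var_a + var_b)
t_stat = (slope_a - slope_b) / se_diff

print(round(se_diff, 4))   # 3.0133
print(round(t_stat, 4))    # 0.0664
```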