STAT 3300 Final Exam: Non-Computer Portion VER A
1. Below are results from a regression performed on the same data set as your project. Recall there
were 179 observations. You may assume all the assumptions of the model below are met. Fuel
type is a categorical variable with two levels: gas and diesel.
a. (3 pts) Given the parameter estimate table above, what is the estimated regression equation for
the diesel-powered cars? You may assume all parameters are significant.
b. (3 pts) The coefficient of the “horsepower” term above is 275.20. Interpret this coefficient.
c. (3 pts) Find and interpret a confidence interval for the coefficient of the “horsepower” term you
interpreted above. Show your work.
d. (3 pts) There is a t value and p-value missing in the table above. Find the t value (t statistic) and
interpret that parameter. Show your work.
e. (3 pts) Let’s say that in reality the relationship (slope) between horsepower and price does not depend
on fuel.type, even though the p-value is .0210. Assuming alpha is .05, what type of error would we
be making here? Why?
f. (4 pts) Perform a formal 6-step hypothesis test to test the claim that the horsepower parameter
(slope) is significantly different from zero. Show all work, and draw and shade in step 2.
2. (3 pts) Consider the test
with alpha = .05 where is a regression coefficient. Assume that the actual value in the population is .
Simply shade in the area of the chart below that represents the probability of making a type II error
with respect to this test.
3. (2 pts) What is the definition of a p-value?
Below are fits of all simple linear regression models (Models 1-3), all two-way models (models with two of the three
variables; Models 4-6), and the model with all three variables (Model 7).
Model 1:
Model Variables
T P >
|t|
Model 4:
Model
Variables
T P > |t| Model 6:
Model
Variables
T P > |t|
X 4.0 .0021 X 3.9 .002 Y 6.0
|t|
Y 5.0 |t| Model 7:
Model
Variables
T P > |t|
Model 3:
Model Variables
T P >
|t|
X .5 .324 X 4.3 .120
Z .6 .345 Z 15.1 <.0001 Y 2.3 .072
Z 1.8 .0678
4. (3 pts) Conduct a forward selection using the parameter estimate tables above. Simply list the
explanatory variables (EVs) that will be selected for the final model.
5. (3 pts) Conduct a stepwise regression using the same tables above. List the EVs that will be selected.
6. (2 pts) Suppose that two variables X and Y are known to have a correlation (r) of -0.9. Which of
the following statements do we know must always be true? (3pts)
a. A regression of Y (response) on X will produce a line with a negative slope
b. There is a 95% chance that Y values will be found within 2 standard deviations of their mean
c. X is normally distributed
d. The X variable will have a larger standard deviation than the Y variable
e. 90% of Y’s variation is explained by Y’s linear relationship with X.
For questions 7 – 10: Let Ho: μ1 = μ2 and Ha: μ1 ≠ μ2, with alpha = .05.
7. (2 pts) T / F If we get a p-value of 0.001, this means we have proven that the mean of group 1 is not
equal to the mean of group 2.
8. (2 pts) T / F If we get a p-value of 0.98, this means that the evidence suggests that the mean of
group 1 is equal to the mean of group 2.
9. (2 pts) If the 95% confidence interval for was (-5.3, -1.1)
a. We are 95% confident that both and are
contained in the interval.
b. We are 95% confident that or is
contained in the interval.
c. We are 95% confident that is larger than .
d. We are 95% confident that is larger than .
e. a and c are correct
f. a and d are correct
g. None of the above are correct
10. (2 pts) Circle all that are true when calculating a correlation between x and y.
d. A negative value for the statistic r indicates x and y are strongly unassociated.
e. A near zero value of the statistic r indicates x and y are strongly unassociated.
f. A value of r = .9 always indicates a linear relationship between x and y.
11. (3 pts) We learned at least 4 methods of imputation for missing values. Simply list 3 of them.
12. (2 pts) If you remove a variable from a multiple linear regression model, which of these will always happen?
Circle all that are true.
a. R2 will increase or stay the same.
b. R2 will decrease or stay the same.
c. Adjusted R2 will increase or stay the same.
d. Adjusted R2 will decrease or stay the same.
13. (1 pt) All or nothing: Fill in each blank with the appropriate symbol.
             Mean    Standard Deviation    Linear Correlation Coefficient
Sample       ____    ____                  ____
Population   ____    ____                  ____
14. (2 pts) True or False: The glm function in R does not estimate the regression coefficients of a logistic
regression by minimizing the sum of the squared residuals (OLS); it uses MLE instead.
15. (2 pts) In the last problem, what does MLE stand for?