This question uses the automobile data set that you used for your project!
1. There is a new column called Accident Rating, a dichotomous variable that
describes whether the car is “Safe” or “Dangerous”. We would like to investigate
whether there is a relationship between the body.style of a car and its accident
rating after accounting for the horsepower of the car. Therefore, fit a
logistic regression model with accident rating as the response and horsepower and
body.style as explanatory variables. Specifically, we would like to investigate whether
there is a significant difference in the odds of being a dangerous car between a sedan
and any other body style. In your answer, provide your R code and parameter estimate
table. Be sure to interpret the appropriate parameter estimates and provide
confidence intervals.
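
As a rough illustration of the kind of fit question 1 describes, the sketch below uses simulated data, because the course data set is not reproduced here. The data frame `autos`, the column name `accident.rating`, and every simulated value are assumptions made for the example. Setting sedan as the reference level means each body.style coefficient compares that style with a sedan.

```r
# Simulated stand-in for the automobile data (all values are made up).
set.seed(1)
n <- 200
autos <- data.frame(
  horsepower = round(runif(n, 50, 250)),
  body.style = factor(sample(
    c("sedan", "hatchback", "wagon", "convertible", "hardtop"),
    n, replace = TRUE))
)
# Hypothetical response: higher horsepower raises the chance of "Dangerous".
p <- plogis(-4 + 0.03 * autos$horsepower)
autos$accident.rating <- factor(ifelse(runif(n) < p, "Dangerous", "Safe"),
                                levels = c("Safe", "Dangerous"))

# Make sedan the reference level so each body.style coefficient is the
# log odds ratio of that style versus a sedan.
autos$body.style <- relevel(autos$body.style, ref = "sedan")

# With a two-level factor response, glm() models the second level ("Dangerous").
fit <- glm(accident.rating ~ horsepower + body.style,
           data = autos, family = binomial)

# Parameter estimate table on the log-odds scale.
print(coef(summary(fit)))

# Odds ratios with Wald 95% confidence intervals.
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))
print(or_table)
```

An interval for a body.style odds ratio that excludes 1 would suggest that style differs from a sedan in its odds of being dangerous, holding horsepower fixed.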

2. Given the model you fit above, interpret the horsepower parameter estimate. Be
sure to provide confidence intervals with your answer.

3. We would like to get an idea of how well your model is performing. Perform the
necessary steps to construct a confusion table for the model fit above. Provide all
relevant R code as well as the table. Also include an estimate of the correct
classification rate and the misclassification rate.
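
One possible way to build such a confusion table is sketched below on simulated data. The data frame `d`, the column `rating`, and the horsepower-only model are simplifications assumed for the example, and the 0.5 cutoff mirrors the threshold used later in the assignment.

```r
# Simulated stand-in data (made-up values, horsepower-only model for brevity).
set.seed(2)
n <- 150
d <- data.frame(horsepower = round(runif(n, 50, 250)))
d$rating <- factor(
  ifelse(runif(n) < plogis(-4 + 0.03 * d$horsepower), "Dangerous", "Safe"),
  levels = c("Safe", "Dangerous"))

fit <- glm(rating ~ horsepower, data = d, family = binomial)

# Fitted probabilities of "Dangerous", classified with a 0.5 cutoff.
p_hat <- predict(fit, type = "response")
pred <- factor(ifelse(p_hat > 0.5, "Dangerous", "Safe"),
               levels = c("Safe", "Dangerous"))

# Rows are observed ratings, columns are predicted ratings.
conf <- table(Actual = d$rating, Predicted = pred)
print(conf)

# Correct classification rate and misclassification rate.
ccr <- sum(diag(conf)) / sum(conf)
mcr <- 1 - ccr
cat("CCR:", round(ccr, 3), " Misclassification rate:", round(mcr, 3), "\n")
```

Because the table is computed on the same data used to fit the model, these rates are likely to be somewhat optimistic.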

4. Notice that there are a couple of cars with missing accident ratings. Use your model
to estimate the probability that each of these cars is a Dangerous car. Report the
probability for each car and use it to impute the accident rating of each vehicle.
(Update the data set with these values.) Use a 0.5 threshold for deciding between
“Safe” and “Dangerous”.
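
The imputation step might look something like the following sketch. The data are simulated, the three missing rows are chosen arbitrarily, and the names `d`, `rating`, and `p_missing` are assumptions for illustration only.

```r
# Simulated stand-in data with a few ratings deliberately set to missing.
set.seed(3)
n <- 120
d <- data.frame(horsepower = round(runif(n, 50, 250)))
d$rating <- ifelse(runif(n) < plogis(-4 + 0.03 * d$horsepower),
                   "Dangerous", "Safe")
d$rating[c(5, 40, 77)] <- NA   # pretend these three ratings were not recorded
d$rating <- factor(d$rating, levels = c("Safe", "Dangerous"))

# glm() drops rows with a missing response by default (na.action = na.omit).
fit <- glm(rating ~ horsepower, data = d, family = binomial)

# Estimated probability of "Dangerous" for the cars with no rating.
missing <- is.na(d$rating)
p_missing <- predict(fit, newdata = d[missing, ], type = "response")

# Impute with a 0.5 threshold and write the values back into the data set.
d$rating[missing] <- ifelse(p_missing > 0.5, "Dangerous", "Safe")

print(data.frame(row = which(missing),
                 p_dangerous = round(p_missing, 3),
                 imputed = d$rating[missing]))
```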

5. Use your model to predict the safety rating of LeBron James’ car. It is a convertible
with 450 horsepower. Is this extrapolation? Show all R code and make sure
to report the estimated probability of his car being dangerous as well as the actual
predicted “Safe” or “Dangerous” classification. Use a 0.5 threshold for deciding
between “Safe” and “Dangerous”.

6. For this question, we would like to quantify our uncertainty about our prediction in
the last question. Find a prediction interval for the probability of LeBron James’ car
being Dangerous. Can we be 95% confident that his car is “Safe” or “Dangerous”?
Why?
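
Questions 5 and 6 could be approached along the lines of the sketch below, which again relies on simulated data and assumed names (`d`, `rating`, `new_car`). It computes a Wald 95% interval on the logit scale and transforms it back to the probability scale. This is one common choice, not necessarily the interval method intended by the course.

```r
# Simulated stand-in data (made-up values).
set.seed(4)
n <- 200
styles <- c("sedan", "hatchback", "wagon", "convertible")
d <- data.frame(
  horsepower = round(runif(n, 50, 250)),
  body.style = relevel(factor(sample(styles, n, replace = TRUE)), ref = "sedan")
)
d$rating <- factor(
  ifelse(runif(n) < plogis(-4 + 0.03 * d$horsepower), "Dangerous", "Safe"),
  levels = c("Safe", "Dangerous"))

fit <- glm(rating ~ horsepower + body.style, data = d, family = binomial)

# The new car: a convertible with 450 horsepower.
new_car <- data.frame(
  horsepower = 450,
  body.style = factor("convertible", levels = levels(d$body.style)))

# Extrapolation check: is 450 outside the observed horsepower range?
is_extrapolation <- new_car$horsepower > max(d$horsepower)

# Predict on the logit scale with a standard error, then back-transform.
pr <- predict(fit, newdata = new_car, type = "link", se.fit = TRUE)
z <- qnorm(0.975)
ci_logit <- pr$fit + c(-1, 1) * z * pr$se.fit
p_hat <- unname(plogis(pr$fit))
ci_prob <- unname(plogis(ci_logit))
label <- ifelse(p_hat > 0.5, "Dangerous", "Safe")

cat("Extrapolation:", is_extrapolation, "\n")
cat("P(Dangerous):", round(p_hat, 3), " predicted:", label, "\n")
cat("95% interval:", round(ci_prob, 3), "\n")
```

If the whole interval lies on one side of 0.5, the classification is consistent across the interval. If it straddles 0.5, the data do not support 95% confidence in either label.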

7. Now we would like to use our full data set, with the imputed values for accident
rating, to estimate the price category of our car. Another new variable has been
created for the price. This variable is called PriceCat (and PriceI) and represents the
price category a car is in (Very Inexpensive, Inexpensive, Moderate, Expensive, Very
Expensive, and Extremely Expensive). Fit a model that predicts the price category of
the car (not the actual numerical price) using a cumulative logit / ordinal logistic
regression model. Your model should use horsepower and body.style as predictors.
Display the R code and parameter estimate table for this model and use this model
to predict the probability that LeBron James’ car falls in the Very Expensive category.
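
A cumulative logit fit of this kind can be sketched with `polr()` from the MASS package, as below. The data are simulated from a made-up latent-variable model and the six categories are formed from its quantiles, so the data frame `d` and all numbers are assumptions. Only the category labels, predictor names, and `PriceCat` come from the assignment.

```r
library(MASS)  # polr() is provided by MASS, which ships with standard R installs

# Simulated stand-in data (made-up values).
set.seed(5)
n <- 300
styles <- c("sedan", "hatchback", "wagon", "convertible")
cats <- c("Very Inexpensive", "Inexpensive", "Moderate",
          "Expensive", "Very Expensive", "Extremely Expensive")
d <- data.frame(
  horsepower = round(runif(n, 50, 250)),
  body.style = relevel(factor(sample(styles, n, replace = TRUE)), ref = "sedan")
)
# Hypothetical latent price score, cut into six ordered categories.
latent <- 0.03 * d$horsepower + rlogis(n)
breaks <- quantile(latent, probs = seq(0, 1, length.out = 7))
d$PriceCat <- cut(latent, breaks = breaks, labels = cats,
                  include.lowest = TRUE, ordered_result = TRUE)

# Cumulative logit (proportional odds) model; Hess = TRUE keeps the Hessian
# so that summary() can report standard errors.
fit_ord <- polr(PriceCat ~ horsepower + body.style, data = d, Hess = TRUE)
print(coef(summary(fit_ord)))

# Category probabilities for a 450-horsepower convertible.
new_car <- data.frame(
  horsepower = 450,
  body.style = factor("convertible", levels = levels(d$body.style)))
probs <- drop(predict(fit_ord, newdata = new_car, type = "probs"))
print(round(probs, 4))
print(probs["Very Expensive"])
```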

8. Use your model above to interpret the horsepower parameter estimate. Be sure to
include a confidence interval.


BONUS:

1. Perform a cross-validation for the ordinal regression model. Make your training set 2/3
of the original data set and your test set 1/3. Find the correct classification rate (CCR).
Show all R code and the confusion matrix.
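
The 2/3 training and 1/3 test split might be set up roughly as follows. As before, the data are simulated and the names `d`, `train`, and `test` are assumptions. The seed is arbitrary, and a different seed will give a different split and CCR.

```r
library(MASS)  # for polr()

# Simulated stand-in data (made-up values).
set.seed(6)
n <- 300
styles <- c("sedan", "hatchback", "wagon", "convertible")
cats <- c("Very Inexpensive", "Inexpensive", "Moderate",
          "Expensive", "Very Expensive", "Extremely Expensive")
d <- data.frame(
  horsepower = round(runif(n, 50, 250)),
  body.style = relevel(factor(sample(styles, n, replace = TRUE)), ref = "sedan")
)
latent <- 0.03 * d$horsepower + rlogis(n)
breaks <- quantile(latent, probs = seq(0, 1, length.out = 7))
d$PriceCat <- cut(latent, breaks = breaks, labels = cats,
                  include.lowest = TRUE, ordered_result = TRUE)

# Random split: 2/3 of the rows for training, the remaining 1/3 for testing.
train_idx <- sample(n, size = round(2 * n / 3))
train <- d[train_idx, ]
test <- d[-train_idx, ]

# Fit on the training set only, then predict the class of each test car.
fit_ord <- polr(PriceCat ~ horsepower + body.style, data = train, Hess = TRUE)
pred <- predict(fit_ord, newdata = test)   # default type is "class"

# Confusion matrix and correct classification rate on the held-out cars.
conf <- table(Actual = test$PriceCat, Predicted = pred)
print(conf)
ccr <- sum(diag(conf)) / sum(conf)
cat("Test-set CCR:", round(ccr, 3), "\n")
```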


2. Fit another model of your choice that will beat the above model in terms of
cross-validated correct classification rate (based on the same training/test split of
the data from the first bonus question). Identify your model, show all R code, and
provide the CCR and the confusion matrix.


THAT’S IT! We have reached the end of Jan Term 2018 STAT 3300!
I can’t tell you how much fun I have had and how much respect I have for the amount of effort
you have put in and, frankly, how much you have learned. I feel very confident (as I hope you do
as well) that each of you has a solid understanding of the fundamentals of regression. You
now have the tools to make decisions based on data, and that is a very marketable skill. From
here, there are so many more methods and models that can be studied; I can vouch that YOU
have the talent, work ethic, and now the background to take this as far as you wish.

Thank you for a great “semester”! :)

Bivin
 
