首页 > > 详细

讲解Stat2Data、辅导Java/C++程序语言、p-value留学生讲解 讲解SPSS|辅导Web开发

Logistic Regression project

For this project, you will need the dataset Film in the Stat2Data package. This dataset is based on 100 movie reviews by the movie critic Leonard Matlin. The variables are described on page 501 of your textbook (problem 9.31). The response variable is Good (1 = a rating of 3 stars or better, 0 = any lower rating).

In the dataset, there is a variable called Origin, which specifies the country of origin. Although the textbook lists 5 values for this variable (0 = USA, 1 = Great Britain, 2 = France, 3 = Italy, 4

=Canada), there are actually 7 (additionally, 5 = Sweden and 6 = Hungary). Before you analyze the data, however, you will need to create a new variable that consolidates the 7 levels of Origin into 2: (2 points)


Create the variable English, which takes the values
o1 for the movies in English (Origin = 0 or 1 or 4)
o0 for all other movies

In the following, NewVariable refers to the new variable you created above.

Model 1: Do a Chi-square test of independence to see if there is any relationship between NewVariable and Good (whether or not the movie was rated as “good” by Leonard Maltin). If there is a relationship, use residuals or odds ratios to explain the association.

(4 points: 1 point for performing the test

1 point for checking the assumptions
1 point for the correct decision based on the p-value
1 point for the correct conclusion based on the decision)

Model 2: Create a logistic regression with Good as the response and NewVariable as the predictor, and test whether the slope for NewVariable is significantly different form 0. Is your conclusion consistent with that of Model 1?

(4 points: 1 point for setting up the Null Hypothesis and Alternative Hypothesis correctly 1 point for getting the correct p-value
1 point for the correct decision and conclusion based on the p-value

1 point for the comparing the conclusions of Models 1 and 2 and point out whether they are consistent.)

Model 3: Find the simplest, best logistic regression model for predicting whether a movie is
good from the following variables: NewVariable, Year, Time, Cast, and Description. Perform


stepwise selection starting with the additive model. Your simplest model should include NewVariable, and your most complex model should include a 5-way interaction (if a 5-way interaction is not possible, then include all the 4-way interactions).

(2 points: 1 for doing model selection where the starting model is different from the ones in the scope
1 for using the “both” direction.)

After you select your final model, write it down and use it to predict the probability of a good rating for the following film:

For Lab Sec 002: A film made in Europe in 1971 that is 90 minutes long, has 5 cast members listed, and a description that is 10 lines of text long.

For Lab Sec 003: A film made in US in 1950 that is 120 minutes long, has 8 cast members listed, and a description that is 16 lines of text long.

For Lab Sec 005: A film made in the Britain in 1982 that is 75 minutes long, has 13 cast members listed, and a description that is 5 lines of text long.

For Lab Sec 006: A film where the actors do not speak English, was made in 1933, is

140 minutes long, has 3 cast members listed, and a description that is 20 lines of text long.
(4 points: 2 points for getting the formula correct

2 points for getting the prediction correct)

联系我们
  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp
热点标签

联系我们 - QQ: 99515681 微信:codinghelp
程序辅导网!