首页 > > 详细

辅导数据结构程序、php语言程序辅导、解析R编程、Python程序讲解

The data set ‘movies’ (available at http://users.stat.umn.edu/~parky/movies.csv) contains 128 movies
released in the U.S. during 2009 that opened on more than 500 screens. The data set contains following
variables
Movie: the title of a movie in the data set
Total.Gross: Total domestic box office earning of the movie in millions of U.S. dollars
Opening: Opening weekend box office earning of the movie in millions of U.S. dollars
Screen: the number of screens on which the movie opened
Budget: estimated production budget when reported in millions of U.S. dollars
RT: the rating of the movie at the film review website (rottentomatoes.com)
Action: 1 if the movie’s genre is action, 0 otherwise.
Comedy: 1 if the movie’s genre is comedy, 0 otherwise.
In this project, you are assigned to construct one simple linear regression model and multiple linear
regression models. For each model, use the following variables.
Model Simple Linear Regression Multiple Linear Regression
Response variable Total.Gross Opening
(Potential) Explanatory
variable(s)
Opening Screens, Budget, Action
1. When you fit and assess the model, answer the following questions:
a) Simple linear regression model:
Use Opening to predict Total Gross. Do either or both of the response variable and/or the
explanatory variable need to be transformed? Include graphs/summary outputs to support
your decision. Once you created the model, write the equation, interpret each coefficient and
r-squared, and predict the total gross if a movie has the opening weekend box office earning
of 50 million dollars.
(5 pts: 1 for creating model correctly, 1 for appropriate decision on transformation and reason
(with graphic/summary output statistic), 1 for checking assumptions, 1 for correct
interpretation of intercept and slope, and r-squared, 1 for correctly estimating the total gross
value)

b) Multiple Linear Regression model:
i. Use the step() function (see Section 4.2 of the Lecture slide) to create the best,
simplest model using the {Budget, Screens, Action} to Opening. Consider square
terms but do not consider interaction terms. Do not transform. variables. Once you
create the model, use F-test to see if the overall model is significant.
(4 pts: 1 pt for creating a model, 1 for checking assumptions, 1 for writing the
hypotheses of F-test correctly, 1 for correct conclusion)

ii. Regardless of the model you created in part 1 b) i above, add the in Comedy, RT, and
their interaction into your model. Use F-test to see if the interaction term between
Comedy and RT is significant.
(3 pts: for having model with Comedy, RT, and Comedy*RT, 1 for doing an F-test, 1
for writing the hypotheses correctly)

2. Then, when you use the model, answer the following questions:
a) For the Multiple Regression model in part 1 b) i:
i. Which model is the simplest, best model? Why did you keep or eliminate variable?
(2 pts: 1 pt for citing the model obtained through model selection, 1 pt for explaining
the decision criterion such as AIC, p-value of t-test, adj. r squared, etc.)

ii. Use your model created in part 1 b) i to predict the value of Opening weekend
earning for a movie with Budget=100, Screen=3500, Action=1 (Use only the variable
values needed for your model)
(2 pts: 1 pt for using the formula found by the model selection, 1 pt for calculating
response variable correctly)
You must use R markdown to complete your project. Submit your Rmd file and the pdf version of your
report to moodle.

联系我们
  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp
热点标签

联系我们 - QQ: 99515681 微信:codinghelp
程序辅导网!