Problem
Given the following dataset:
library(foreign)
dat <- read.dta("https://stats.idre.ucla.edu/stat/stata/dae/nb_data.dta")
dat <- within(dat, {
  prog <- factor(prog, levels = 1:3, labels = c("General", "Academic", "Vocational"))
  id <- factor(id)
})
summary(dat)
##        id         gender         math          daysabs
##  1001   :  1   female:160   Min.   : 1.00   Min.   : 0.000
##  1002   :  1   male  :154   1st Qu.:28.00   1st Qu.: 1.000
##  1003   :  1                Median :48.00   Median : 4.000
##  1004   :  1                Mean   :48.27   Mean   : 5.955
##  1005   :  1                3rd Qu.:70.00   3rd Qu.: 8.000
##  1006   :  1                Max.   :99.00   Max.   :35.000
##  (Other):308
##          prog
##  General   : 40
##  Academic  :167
##  Vocational:107
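The summary shows a mean of about 6 days absent with a maximum of 35, which hints at overdispersion. A minimal sketch of how one might check this per program is given below; it rebuilds `dat` so that it runs on its own, and the names `grp_mean`, `grp_var`, and `disp` are illustrative, not part of the assignment.

```r
library(foreign)  # read.dta

# Rebuild dat exactly as above so the snippet is self-contained
dat <- read.dta("https://stats.idre.ucla.edu/stat/stata/dae/nb_data.dta")
dat <- within(dat, {
  prog <- factor(prog, levels = 1:3, labels = c("General", "Academic", "Vocational"))
  id <- factor(id)
})

# Mean and variance of days absent within each program
grp_mean <- tapply(dat$daysabs, dat$prog, mean)
grp_var  <- tapply(dat$daysabs, dat$prog, var)

# Under a Poisson model the variance/mean ratio should be close to 1
disp <- data.frame(
  mean     = as.vector(grp_mean),
  variance = as.vector(grp_var),
  ratio    = as.vector(grp_var / grp_mean),
  row.names = names(grp_mean)
)
print(round(disp, 2))
```

A ratio well above 1 in each group would suggest that a plain Poisson model may understate the variability in `daysabs`.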
Do the following:
1. Explore the data with ggplot by finding the marginal distributions for the General, Academic, and
Vocational programs and for all of the data combined.
2. Fit a negative binomial regression model and a Poisson regression model. Find the best-fitting
model within each framework.
3. Compare the best-fitting negative binomial model with the best-fitting Poisson regression model
using a likelihood ratio test. Which is better?
4. Given the following data:
newdata1 <- data.frame(math = mean(dat$math), prog = factor(1:3, levels = 1:3,
labels = levels(dat$prog)))
Predict the outcome for both the best-fitting Poisson regression model and the best-fitting negative
binomial model. How do they compare?
2018/3/19file:///G:/homework_4.html
Your solution here
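One possible outline of the four steps is sketched below. It is a starting point under stated assumptions, not a definitive answer: it assumes the outcome of interest is `daysabs`, that the candidate predictors are `math` and `prog`, and that `daysabs ~ math + prog` turns out to be the preferred specification in both frameworks (verify this with the `anova()` output). The object names (`plot_dat`, `pois_full`, `nb_full`, `lr_stat`, and so on) are illustrative.

```r
library(foreign)   # read.dta
library(ggplot2)   # plotting
library(MASS)      # glm.nb

dat <- read.dta("https://stats.idre.ucla.edu/stat/stata/dae/nb_data.dta")
dat <- within(dat, {
  prog <- factor(prog, levels = 1:3, labels = c("General", "Academic", "Vocational"))
  id <- factor(id)
})

# 1. Marginal distribution of daysabs: one panel per program plus a pooled panel
plot_dat <- rbind(
  data.frame(panel = as.character(dat$prog), daysabs = dat$daysabs),
  data.frame(panel = "All", daysabs = dat$daysabs)
)
plot_dat$panel <- factor(plot_dat$panel, levels = c(levels(dat$prog), "All"))
p <- ggplot(plot_dat, aes(x = daysabs)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(~ panel, scales = "free_y")
print(p)

# 2. Candidate models in each framework, compared as nested models
pois_full <- glm(daysabs ~ math + prog, family = poisson, data = dat)
pois_math <- update(pois_full, . ~ . - prog)
print(anova(pois_math, pois_full, test = "Chisq"))

nb_full <- glm.nb(daysabs ~ math + prog, data = dat)
nb_math <- update(nb_full, . ~ . - prog)
print(anova(nb_math, nb_full))

# 3. Likelihood ratio test: Poisson (null) versus negative binomial.
#    The models differ by one dispersion parameter. Because the Poisson
#    case sits on the boundary of the parameter space, the chi-square(1)
#    p-value is conservative.
lr_stat <- as.numeric(2 * (logLik(nb_full) - logLik(pois_full)))
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
print(c(LR = lr_stat, p = p_value))

# 4. Predicted days absent at mean math score for each program
newdata1 <- data.frame(math = mean(dat$math),
                       prog = factor(1:3, levels = 1:3, labels = levels(dat$prog)))
newdata1$pois <- predict(pois_full, newdata1, type = "response")
newdata1$nb   <- predict(nb_full, newdata1, type = "response")
print(newdata1)
```

Both models use a log link with the same predictors, so the predicted means are expected to be similar; the main difference should show up in the standard errors, which can be obtained by adding `se.fit = TRUE` to the `predict()` calls.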