
Assignment 3

STATS 3860B/9155B

Winter 2024

• Assignment 3 is due Friday, March 22, 2024, at 11:55 pm.

• You must write your answers and R code using R Markdown (template provided with Assignment 1) and generate a single PDF file. Submissions not generated by R Markdown will not be graded and will receive zero marks.

• Submissions must be done via Gradescope. You must carefully assign questions to their corresponding pages. Submissions without questions assigned to pages will not be graded. Questions with no pages assigned to them will receive zero marks.

• Always show all your work and add comments to your code explaining what you are doing.

Question 1

The dataset melanoma gives data on a sample of patients suffering from melanoma (skin cancer) cross-classified by the type of cancer and the location on the body.

suppressMessages(library(faraway))

str(melanoma)

## 'data.frame': 12 obs. of 3 variables:

##  $ count: num 22 16 19 11 2 54 33 17 10 115 ...

##  $ tumor: Factor w/ 4 levels "freckle","indeterminate",..: 1 4 3 2 1 4 3 2 1 4 ...

##  $ site : Factor w/ 3 levels "extremity","head",..: 2 2 2 2 3 3 3 3 1 1 ...

a) Display the data in a two-way table. Make a mosaic plot and comment on the evidence of independence.

b) Check for independence between site and tumour type using a Chi-squared test.

c) Fit a Poisson GLM and use it to check for independence.

d) Make a two-way table of the deviance residuals from the last model. Comment on your results.
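The steps in parts a) to d) can be sketched in R as follows. This is only one possible outline, assuming the faraway package is installed; the object names `ct` and `mod_ind` are illustrative and not part of the assignment, and the interpretation of each output is left to the written answer.

```r
library(faraway)  # provides the melanoma data

# a) Two-way table of counts (tumor type by site) and a mosaic plot
ct <- xtabs(count ~ tumor + site, data = melanoma)
ct
mosaicplot(ct, color = TRUE, main = "Melanoma: tumor type by site")

# b) Pearson chi-squared test of independence on the two-way table
chisq.test(ct)

# c) A Poisson GLM with main effects only corresponds to independence;
#    its residual deviance is compared with a chi-squared distribution
mod_ind <- glm(count ~ tumor + site, family = poisson, data = melanoma)
pchisq(deviance(mod_ind), df.residual(mod_ind), lower.tail = FALSE)

# d) Deviance residuals (the default type for a glm) in the same two-way layout
round(xtabs(residuals(mod_ind) ~ tumor + site, data = melanoma), 2)
```

Large positive or negative residuals in the last table point to the tumor/site combinations that depart most from what independence would predict.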

Question 2

The hsb data was collected as a subset of the “High School and Beyond” study conducted by the National Education Longitudinal Studies program of the National Center for Education Statistics. The variables are gender, race, socioeconomic status (SES), school type, chosen high school program type, scores on reading, writing, math, science and social studies. The response variable is the chosen high school program type (prog), which is multinomial with 3 levels.

library(faraway)

library(nnet)

data("hsb")

hsb <- hsb[,-1] ## removing first column corresponding to student ID

str(hsb)

## 'data.frame': 200 obs. of 10 variables:

##  $ gender : Factor w/ 2 levels "female","male": 2 1 2 2 2 2 2 2 2 2 ...

##  $ race   : Factor w/ 4 levels "african-amer",..: 4 4 4 4 4 4 1 3 4 1 ...

##  $ ses    : Factor w/ 3 levels "high","low","middle": 2 3 1 1 3 3 3 3 3 3 ...

##  $ schtyp : Factor w/ 2 levels "private","public": 2 2 2 2 2 2 2 2 2 2 ...

##  $ prog   : Factor w/ 3 levels "academic","general",..: 2 3 2 3 1 1 2 1 2 1 ...

##  $ read   : int 57 68 44 63 47 44 50 34 63 57 ...

##  $ write  : int 52 59 33 44 52 52 59 46 57 55 ...

##  $ math   : int 41 53 54 47 57 51 42 45 54 52 ...

##  $ science: int 47 63 58 53 53 63 53 39 58 50 ...

##  $ socst  : int 57 61 31 56 61 61 61 36 51 51 ...

a) Fit a multinomial regression model for prog (with baseline level academic) and all nine predictors.

b) Interpret the coefficients corresponding to the five subjects (scores on reading, writing, math, science and social studies) in terms of odds.

c) Regarding part b), identify which one of the five subjects gives unexpected results and suggest an explanation for this behavior. Any reasonable explanation will be accepted.
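A minimal sketch of how parts a) and b) might be set up, assuming the faraway and nnet packages are installed. The names `mod_multi` and `scores` are illustrative, and `trace = FALSE` is only there to silence the optimizer log.

```r
library(faraway)  # provides the hsb data
library(nnet)     # provides multinom()

data(hsb)
hsb <- hsb[, -1]  # drop the student ID column, as in the assignment

# "academic" is already the first factor level; relevel() just makes
# the choice of baseline explicit
hsb$prog <- relevel(hsb$prog, ref = "academic")

# a) Multinomial logit model with all nine predictors
mod_multi <- multinom(prog ~ ., data = hsb, trace = FALSE)

# b) Exponentiated coefficients: multiplicative change in the odds of each
#    non-baseline program versus academic for a one-point increase in a score
scores <- c("read", "write", "math", "science", "socst")
round(exp(coef(mod_multi)[, scores]), 3)
```

Values below 1 indicate that a higher score lowers the odds of the non-baseline program relative to academic, and values above 1 indicate the opposite, which is the pattern to inspect for part c).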

Question 3

Refer to Exercise 1 of Chapter 8 of the textbook (page 171). Work on all parts - a) to e).

Question 4

This question refers to Exercise 4 of Chapter 8 of the Faraway textbook (page 172). Work on all parts - a) to g).

Question 5

The denim dataset concerns the amount of waste in material cutting for a jeans manufacturer due to five suppliers. Consider the code below, which first removes two outliers from the dataset.

library(faraway)

data(denim)

denim <- denim[-which(denim$waste == max(denim$waste)), ] # removing 2 outliers, one per line

denim <- denim[-which(denim$waste == max(denim$waste)), ]

str(denim)

## 'data.frame': 93 obs. of 2 variables:

##  $ waste   : num 1.2 16.4 12.1 11.5 24 10.1 -6 9.7 10.2 -3.7 ...

##  $ supplier: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5 ...

##  - attr(*, "na.action")= 'omit' Named int [1:15] 70 75 80 85 90 95 98 99 100 103 ...

##   ..- attr(*, "names")= chr [1:15] "70" "75" "80" "85" ...

a) Plot the data and comment.

b) Fit the linear fixed effects model. Is the supplier significant?
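As a rough starting point for parts a) and b), the sketch below plots waste by supplier and fits a one-way fixed effects model with `lm()`. It assumes the faraway package is installed; `mod_fixed` is an illustrative name, and a boxplot is only one of several reasonable displays.

```r
library(faraway)  # provides the denim data
data(denim)

# Remove the two largest waste values, as in the assignment code
denim <- denim[-which(denim$waste == max(denim$waste)), ]
denim <- denim[-which(denim$waste == max(denim$waste)), ]

# a) Side-by-side boxplots of waste for each supplier
boxplot(waste ~ supplier, data = denim, xlab = "Supplier", ylab = "Waste")

# b) One-way fixed effects model; the F-test in the ANOVA table
#    assesses whether mean waste differs between suppliers
mod_fixed <- lm(waste ~ supplier, data = denim)
anova(mod_fixed)
```

The p-value in the `supplier` row of the ANOVA table is the quantity to report when judging significance.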