
STAT2008/STAT4038/STAT6014/STAT6038

RESEARCH SCHOOL OF FINANCE, ACTUARIAL STUDIES AND STATISTICS

REGRESSION MODELLING (STAT2008/STAT4038/STAT6014/STAT6038)

Assignment 2 for Semester 1, 2019

INSTRUCTIONS:

• This assignment is worth 20% of your overall marks for this course.
• Please submit your assignment on Wattle. When uploading to Wattle you must submit the following, combined into a single document:
  1. Your assignment/report in a PDF or Word document.
  2. The R code you have used for the assignment as an appendix.
  Failure to upload the R code will result in a penalty.
• Assignments should be typed. Scanned PDF files will not be marked and will result in a penalty. Your assignment may include some carefully edited computer output (e.g. graphs, tables) showing the results of your data analysis and a discussion of these results, as well as some carefully selected code. Please be selective about what you present and only include as many pages and as much computer output as necessary to justify your solution. It is important to be concise in your discussion of the results. Clearly label each part of your report with the part of the question that it refers to.
• Unless otherwise advised, use a significance level of 5% and two decimal places for all answers.
• Marks may be deducted if these instructions are not strictly adhered to, and marks will certainly be deducted if the total report is of an unreasonable length, i.e. more than 10 pages including graphs and tables. You may include an appendix that is in addition to the above page limits; however, the appendix will not be assessed. It will only be used if there is some question about what you have actually done.
• You may ask me (Abhinav Mehta) questions about this assignment up to 24 hours before the submission time. This will allow me enough time to respond to your questions.
• Late submissions will attract a penalty of 5% of your mark for each day of delay. No assignments will be accepted 10 days beyond the due date.
• Extensions will usually be granted on medical or compassionate grounds on production of appropriate evidence, but must have my permission no later than 24 hours before the submission date. If you are granted an extension and submit your assignment after the extended deadline, then the late submission penalty will still apply.

Question 1 [40 Marks]

A group of researchers in the US attempted to look at the pollution-related factors affecting mortality. Sixty US cities were sampled. Total age-adjusted mortality (mortality), from all causes, in deaths per 100,000 population, was measured, along with the following covariates: mean annual precipitation in inches (precipitation); median number of school years completed for persons aged 25 years or older (education); percentage of population that is non-white (nonwhite); relative pollution potential of oxides of nitrogen (nox); and relative pollution potential of sulphur dioxide (so2). "Relative pollution potential" is the product of tons emitted per day per square kilometre and a factor correcting for the city dimension and exposure. The data is available in a .csv file, pollution.

(a) [6 marks] Fit a multiple linear regression (MLR) model with mortality as the response variable and all other covariates as predictors. Is the regression model significant?

(b) [8 marks] What are the estimated coefficients of the MLR model in part (a) and the standard errors associated with these coefficients? Interpret the values of these estimated coefficients with regard to model specification.

(c) [8 marks] There is a t-test associated with each of these coefficients. Briefly explain what these tests can and cannot be used for. In your answer, be sure to mention the appropriate hypotheses that can be assessed using these t-tests.

(d) [6 marks] Construct an appropriate test of the hypothesis that education and nox are not significant contributors to the model. That is, test β_education = β_nox = 0.
(e) [6 marks] A researcher from this group suggested that a model with coefficients β_precipitation = 2, β_education = -10, β_nonwhite = 3, β_nox = 0, and β_so2 = 1 may be a better model. Can you test whether this new model is significant? How would you fit such a model, and what would be the estimate of the intercept term with these coefficients?

(f) [6 marks] One of the researchers is from the city of San Antonio and has recorded a new set of measurements on each of the predictors. The precipitation is 33, education is 11.5, nonwhite is 17.2, and nox and so2 are each 1. What do you predict the mortality rate to be? Find a 99% interval for this prediction.

Question 2 [60 Marks]

The data for this question comprises measurements on breeding pairs of land-bird species collected from 16 islands around Britain over the course of several decades, available in a .csv file, bird. For each species, the data set contains:

• an average time of extinction (extinct) on those islands where the species appeared. (This is actually the reciprocal of the average of 1/T, where T is the length of time the species remained on the island and 1/T is taken to be zero if the species did not become extinct on the island);
• the average number of nesting pairs per year, over all islands where the species appeared (nest.pair);
• the size (size) of the species (S = Small, L = Large); and
• the migratory status (mig.status) of the species (R = Resident, M = Migrant).

It is expected that species with large numbers of nesting pairs will tend to remain longer before becoming extinct. Of particular interest is whether, after accounting for the number of nesting pairs, size or migratory status has any effect.

(a) [10 marks] Fit a multiple linear regression (MLR) model with extinct as the response variable and all other covariates as predictors. Is the regression model significant? Interpret the coefficients for the categorical variables in this model. Does the coefficient support the expectation that large numbers of nesting pairs tend to delay extinction?

(b) [6 marks] As the question indicates, of particular interest is whether, after accounting for the number of nesting pairs, size or migratory status has any effect. Conduct a formal test of the hypothesis that β_Size = β_MigStatus = 0 using an appropriate anova table. Evaluate the F-statistic and the corresponding p-value.

(c) [6 marks] The Red-crested Periwinkle is a small, migratory species of bird, while the Great Plover is a large, resident species of bird. Assuming that the number of nesting pairs is the same for each species over the period, based on the model in part (a), what would you predict the difference in extinction times to be for these two species?

(d) [8 marks] A noted theory suggests that size and migratory status should contribute equally to the extinction time. Test whether the coefficients of size and mig.status are the same. Construct an appropriate model to test this hypothesis.

(e) [20 marks] Produce the appropriate diagnostic plots for the model fitted in part (a) and assess the model assumptions. Produce the relevant influence diagnostics for this model. Which data points appear to be influential in the analysis, and in what sense would you consider them influential? Also, do any points appear to be outliers? If so, to which species do these points correspond?

(f) [10 marks] Two transformations are suggested for the response variable, log(extinct) and 1/extinct. Investigate whether using these transformations improves the model fit. Comment on the assumptions of MLR for these models as compared to your original model. Which of the three models would you choose based on your analysis?
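Illustrative note: the nested-model comparisons asked for in Question 2 (testing that several coefficients are zero, or that two coefficients are equal) could be set up in R roughly as sketched below. This is a minimal sketch under stated assumptions, not a model answer. The data frame is simulated purely for illustration and is not the bird data; the column names follow the question text, while the helper columns size.S and mig.R and the object names are assumptions introduced here.

```r
# Simulated stand-in for the bird data: every value here is made up.
set.seed(1)
n <- 40
bird <- data.frame(
  nest.pair  = round(runif(n, 1, 10), 1),
  size       = factor(sample(c("L", "S"), n, replace = TRUE)),
  mig.status = factor(sample(c("M", "R"), n, replace = TRUE))
)
bird$extinct <- exp(0.5 + 0.25 * bird$nest.pair +
                    0.3 * (bird$size == "S") +
                    0.4 * (bird$mig.status == "R") +
                    rnorm(n, sd = 0.4))

# Explicit 0/1 indicators (hypothetical helper columns). They match R's
# default treatment coding, where "L" and "M" are the baseline levels.
bird$size.S <- as.numeric(bird$size == "S")
bird$mig.R  <- as.numeric(bird$mig.status == "R")

# Full model with all predictors, and a reduced model without the factors.
full    <- lm(extinct ~ nest.pair + size + mig.status, data = bird)
reduced <- lm(extinct ~ nest.pair, data = bird)

# Partial F-test of H0: both categorical coefficients are zero.
ftest <- anova(reduced, full)
print(ftest)

# H0: the two categorical coefficients are equal. Under H0 the indicators
# enter only through their sum, so the constrained model has one combined term.
# Note that the meaning of "equal" depends on the chosen baseline levels.
equal  <- lm(extinct ~ nest.pair + I(size.S + mig.R), data = bird)
eqtest <- anova(equal, full)
print(eqtest)

# Numerical influence diagnostics for the full model, largest Cook's distance first.
infl <- data.frame(leverage = hatvalues(full),
                   cooks    = cooks.distance(full),
                   rstudent = rstudent(full))
print(head(infl[order(-infl$cooks), ], 3))
```

The same reduced-versus-full pattern with anova() applies to any hypothesis that can be written as a restriction on the full model, provided both models are fitted to the same rows of data.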