
Problem 2
This problem uses the data set Prob2.csv on the class website, which contains data from an
experimental study of recidivism of 432 male prisoners, who were observed for a year after
being released from prison. The following variables are included in the data:

A data frame with 432 observations on the following variables.
week: week of first arrest after release, or censoring time.
arrest: the event indicator, equal to 1 for those arrested during the
period of the study and 0 for those who were not arrested.
fin: a factor, with levels yes if the individual received financial aid
after release from prison, and no if he did not.
age: in years at the time of release.
race: a factor with levels black and other.
wexp: a factor with levels yes if the individual had full-time work
experience prior to incarceration and no if he did not.
mar: a factor with levels married if the individual was married at the time
of release and not married if he was not.
paro: a factor coded yes if the individual was released on parole and no if
he was not.
prio: number of prior convictions.

a. (5 points) Fit a Cox model with all variables included. Provide a point estimate and 95%
confidence interval for the hazard ratio with one year increase of age.
b. (10 points) Fit a new Cox model with only the significant variables identified in part a (i.e. those
variables with p-value < 0.05). Test the proportional hazard assumption for this new Cox
model using the scaled Schoenfeld residuals. Which variable(s) violate the proportional
hazard assumption?
c. (5 points) To address the proportional hazard assumption issue for the variable(s)
identified in b, you may propose a new Cox model. How would you justify that the proportional
hazard assumption issue is resolved in the new model?
d. (5 points) Of the three models from parts a, b, and c, which model best fits this data set, and
why?
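
The workflow in parts a and b can be sketched in R as follows. This is a minimal illustration, not a solution: Prob2.csv is not available here, so the data frame `sim` is synthetic and only borrows the column names `week`, `arrest`, `age`, and `prio` from the description above. The effect sizes used in the simulation are arbitrary assumptions.

```r
library(survival)

# Synthetic stand-in for Prob2.csv (assumption: the real file is not used here).
set.seed(42)
n <- 432
sim <- data.frame(age = round(runif(n, 18, 45)), prio = rpois(n, 3))

# Exponential event times whose hazard depends on age and prio,
# administratively censored at 52 weeks (one year of follow-up).
rate <- 0.01 * exp(-0.05 * (sim$age - 25) + 0.1 * sim$prio)
t_event <- rexp(n, rate)
sim$week <- pmin(ceiling(t_event), 52)
sim$arrest <- as.integer(t_event <= 52)

# Cox proportional hazards model.
fit <- coxph(Surv(week, arrest) ~ age + prio, data = sim)

# Hazard ratio per one-year increase in age, with a Wald 95% interval.
hr_age <- exp(cbind(HR = coef(fit), confint(fit)))["age", ]
print(hr_age)

# Test of proportional hazards based on scaled Schoenfeld residuals.
# A small p-value for a covariate suggests its effect changes over time.
zph <- cox.zph(fit)
print(zph)
```

With the real data, `sim` would be replaced by the data frame read from Prob2.csv, and the formula would list the covariates the problem asks for.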
Problem 3
This problem uses the retinopathy data in the survival package (a copy Prob3.csv is on the
class website). A data description is given below.

Diabetic Retinopathy

Description:
A trial of laser coagulation as a treatment to delay diabetic
retinopathy.
Format:
A data frame with 394 observations on the following 9 variables.
id      numeric subject id
laser   type of laser used: xenon, argon
eye     which eye was treated: right, left
age     age at diagnosis of diabetes
type    type of diabetes: juvenile (diagnosis before age 20), adult
trt     0 = control eye, 1 = treated eye
futime  time to loss of vision or last follow-up
status  0 = censored, 1 = loss of vision in this eye
risk    a risk score for the eye. This high risk subset is defined
        as a score of 6 or greater in at least one eye.
Details:
The 197 patients in this dataset were a 50% random sample of the
patients with "high-risk" diabetic retinopathy as defined by the
Diabetic Retinopathy Study (DRS). Each patient had one eye
randomized to laser treatment and the other eye received no
treatment, and has two observations in the data set. For each eye,
the event of interest was the time from initiation of treatment to
the time when visual acuity dropped below 5/200 two visits in a
row. Thus there is a built-in lag time of approximately 6 months
(visits were every 3 months). Survival times in this dataset are
the actual time to vision loss in months, minus the minimum
possible time to event (6.5 months). Censoring was caused by
death, dropout, or end of the study.
References:
W. J. Huster, R. Brookmeyer and S. G. Self (1989). Modelling
paired survival data with covariates, Biometrics 45:145-156.
A. L. Blair, D. R. Hadden, J. A. Weaver, D. B. Archer, P. B.
Johnston and C. J. Maguire (1976). The 5-year prognosis for
vision in diabetes, American Journal of Ophthalmology, 81:383-396.

a. (5 points) Calculate a point estimate and 95% confidence interval for the crude hazard ratio
for loss of vision comparing the treated eye to the untreated eye.
b. (5 points) Adjust the variance for clustering of the two eyes of each individual. Calculate a
new 95% confidence interval.
c. (5 points) Now adjust for the risk score risk in each eye. Without adjusting for clustering,
try polynomial terms or a factor. Which model fits best? Explain your reasoning.
d. (5 points) Adjust for clustering by individual in the model from part (c). Give a new point
estimate and 95% confidence interval for the hazard ratio for loss of vision associated with
treatment.
e. (5 points) Using the model from part (d), plot the predicted survival for treated and untreated
patients with risk = 9. Give meaningful axis labels, differentiate the treated and untreated
curves, and include a legend.
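
The difference between parts a and b can be sketched as follows: the point estimate is unchanged, and only the variance (and therefore the interval) is recomputed with a robust sandwich estimator grouped by subject. The sketch uses the `retinopathy` data frame from the survival package rather than Prob3.csv; `hr_table` is a hypothetical helper name introduced for illustration.

```r
library(survival)

# The retinopathy data ship with the survival package.
d <- survival::retinopathy

# Crude model: treatment indicator only, model-based standard errors.
fit_crude <- coxph(Surv(futime, status) ~ trt, data = d)

# Same mean model, with a robust (sandwich) variance clustered on subject id.
fit_robust <- coxph(Surv(futime, status) ~ trt + cluster(id), data = d)

# Hypothetical helper: hazard ratio with a Wald 95% confidence interval.
# confint() uses vcov(fit), which is the robust variance for fit_robust.
hr_table <- function(fit) {
  cbind(HR = exp(coef(fit)), exp(confint(fit)))
}

print(hr_table(fit_crude))
print(hr_table(fit_robust))
```

The same `cluster(id)` term can be added to a model that also includes `risk`, as in part (d).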

Problem 4
This problem uses the data set Prob4.csv on the class website, which is a right-censored
sample of 300 observations from a parametric failure time distribution. It has follow-up time
exit and event indicator delta.
a. (5 points) Calculate and plot the Nelson-Aalen estimator of the cumulative hazard function
with log-transformed pointwise 95% confidence intervals. Be sure to include useful axis
labels.
b. (5 points) Use parametric regression models to determine whether the sample is from a
Weibull or log-logistic distribution. Explain your choice.
c. (5 points) Give point estimates and 95% confidence intervals for the rate λ and shape γ for
the best-fitting distribution from part (b).
d. (5 points) Using the point estimates of λ and γ from part (c), add the predicted cumulative
hazard function to the plot from part (a). Use color or line type to distinguish the parametric
estimate from the Nelson-Aalen estimate, and add a legend.
e. (5 points (STA 6177 Only)) Give point estimates and 95% confidence limits for the mean
and median survival times using the distribution chosen in part (b).
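
The steps in parts a through d can be sketched as follows. This is an illustration under stated assumptions, not a solution: the data are simulated (Prob4.csv is not used), the helper names `nelson_aalen` and `plot_cumhaz` are hypothetical, and the Weibull cumulative hazard is assumed to be parameterized as H(t) = λ t^γ, which may differ from the convention used in class. The log-transformed interval is H(t) exp(± z · se / H(t)).

```r
library(survival)

# Simulated right-censored sample (assumption: stands in for Prob4.csv).
set.seed(1)
n <- 300
t_true <- rweibull(n, shape = 1.5, scale = 10)
cens <- runif(n, 0, 25)
d <- data.frame(exit = pmin(t_true, cens),
                delta = as.integer(t_true <= cens))

# Nelson-Aalen estimator with log-transformed pointwise confidence limits.
nelson_aalen <- function(time, status, level = 0.95) {
  ev <- sort(unique(time[status == 1]))                          # event times
  d_j <- sapply(ev, function(t) sum(time == t & status == 1))    # events at t
  n_j <- sapply(ev, function(t) sum(time >= t))                  # at risk at t
  H <- cumsum(d_j / n_j)
  se <- sqrt(cumsum(d_j / n_j^2))
  z <- qnorm(1 - (1 - level) / 2)
  data.frame(time = ev, H = H,
             lower = H * exp(-z * se / H),
             upper = H * exp(z * se / H))
}
na <- nelson_aalen(d$exit, d$delta)

# Intercept-only parametric fits; a lower AIC indicates the better fit.
fit_w <- survreg(Surv(exit, delta) ~ 1, data = d, dist = "weibull")
fit_ll <- survreg(Surv(exit, delta) ~ 1, data = d, dist = "loglogistic")
print(c(weibull = AIC(fit_w), loglogistic = AIC(fit_ll)))

# Convert survreg's location-scale output to H(t) = lambda * t^gamma
# (assumed parameterization): gamma = 1 / scale, lambda = exp(-mu / scale).
gamma_hat <- 1 / fit_w$scale
lambda_hat <- exp(-unname(coef(fit_w)[1]) / fit_w$scale)

# Overlay the parametric cumulative hazard on the Nelson-Aalen step function.
plot_cumhaz <- function(na, lambda, gamma) {
  plot(na$time, na$H, type = "s", xlab = "Follow-up time",
       ylab = "Cumulative hazard", ylim = c(0, max(na$upper)))
  lines(na$time, na$lower, type = "s", lty = 2)
  lines(na$time, na$upper, type = "s", lty = 2)
  curve(lambda * x^gamma, add = TRUE, col = "red")
  legend("topleft", legend = c("Nelson-Aalen", "95% CI", "Weibull"),
         lty = c(1, 2, 1), col = c("black", "black", "red"))
}
```

Calling `plot_cumhaz(na, lambda_hat, gamma_hat)` draws the figure. If the log-logistic model fits better, the overlay would use that distribution's cumulative hazard instead.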