Assignment 2 - CSC/DSC 265/465 - Spring 2018 - Due March 8
Q1: The gamma density, denoted Y ~ gamma(α, β), is defined as

    f(y | α, β) = (β^α / Γ(α)) y^(α−1) e^(−βy),

where y, α, β > 0. Suppose we observe a Poisson random variable X ~ pois(λ), so that

    P(X = x) = (λ^x / x!) e^(−λ),  x = 0, 1, 2, ...,

and that λ has a prior density π(λ):

    λ ~ gamma(α0, β0)

for some fixed α0, β0.
(a) Recall the definition of the gamma function

    Γ(t) = ∫_0^∞ u^(t−1) e^(−u) du,  t > 0.

Show that if Y ~ gamma(α, β) then for any k > 0

    E[Y^k] = β^(−k) Γ(k + α) / Γ(α).

Use this to derive the mean and variance of Y (recall that Γ(t + 1) = tΓ(t)).
(b) Show that the gamma prior density is a conjugate density for λ, that is, that the posterior density for λ
given X is also gamma. Give this density precisely.
(c) Suppose we observe a random sample X1, ..., Xn from a Poisson distribution with mean λ. Denote the
sum S = Σ_{i=1}^n Xi. Define the prior density λ ~ gamma(α0, β0). Show that the Bayes estimator under
squared error loss for λ (that is, the mean of the posterior density) can be given by

    λ̂_Bayes = q X̄ + (1 − q) α0/β0,

where X̄ = S/n and q = n/(n + β0), assuming that the posterior density is conditioned directly on S
(recall that the sum of independent Poisson random variables is also a Poisson random variable).
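As a numerical sanity check of this shrinkage identity (hypothetical hyperparameters and data; the posterior here is the gamma(α0 + S, β0 + n) density derived in part (b)):

```python
# Check: for a gamma(a0, b0) prior and Poisson data with sum S, the
# posterior is gamma(a0 + S, b0 + n), whose mean equals the shrinkage
# form q * xbar + (1 - q) * a0 / b0 with q = n / (n + b0).
a0, b0 = 2.0, 4.0          # hypothetical prior hyperparameters
data = [3, 1, 4, 1, 5, 9]  # hypothetical Poisson sample
n, S = len(data), sum(data)

posterior_mean = (a0 + S) / (b0 + n)   # mean of gamma(a0 + S, b0 + n)
q = n / (n + b0)
shrinkage_form = q * (S / n) + (1 - q) * a0 / b0

print(posterior_mean, shrinkage_form)  # the two expressions agree
```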
(d) A study is to use this model, and a sample size n is planned. Suppose prior knowledge suggests that
λ = λ* for some fixed value λ*. Since sample size can be taken as a measure of precision or certainty, the
confidence in the prior belief can be expressed as a fraction of the proposed sample size, say n0 = n/10.
What values α0, β0 would be appropriate for the prior density of λ in this situation?
Q2: Suppose we observe survival times 42, 51, 51, 51, 53+, 60, 60+, 64, where T+ is a right-censored survival
time.
(a) Consider time points (t0, t1, t2, t3, t4, t5) = (0, 42, 51, 53, 60, 64). For each of these times tj give the number
at risk at time tj (the number of subjects with survival times T ≥ tj or T+ ≥ tj), and the number who
die at time tj (T = tj). Summarize these quantities in a tabular form.
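The counting in part (a) can be cross-checked mechanically; a sketch in Python (the tabulation itself is the exercise — the event indicators below are simply read off the censoring marks in the data):

```python
# Risk-set and death counts for the Q2 data.  Each observation is
# (time, event) with event = 1 for a death and 0 for a censored time.
obs = [(42, 1), (51, 1), (51, 1), (51, 1), (53, 0), (60, 1), (60, 0), (64, 1)]
time_points = [0, 42, 51, 53, 60, 64]

table = []
for t in time_points:
    at_risk = sum(1 for time, _ in obs if time >= t)          # still under observation at t
    deaths = sum(1 for time, ev in obs if time == t and ev)   # deaths exactly at t
    table.append((t, at_risk, deaths))
    print(t, at_risk, deaths)
```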
(b) Plot a Kaplan-Meier estimate of the survival curve. Do this two ways:
(i) Plot the curve directly from the numbers in the table. Use the type="s" option. Indicate all points
on the curve at which an observation was censored, using a + symbol (pch=3).
(ii) Use the survfit() function. Set options conf.int=FALSE and mark.time=TRUE. Under what
conditions are censored observations shown using the mark.time=TRUE option?
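The product-limit construction behind both plots can be sketched language-neutrally (Python here; the assignment itself uses R's survfit()). At each death time the survival estimate is multiplied by (1 − d_j / n_j), where n_j is the number at risk and d_j the number of deaths:

```python
# Product-limit (Kaplan-Meier) estimate built directly from the
# (time, event) data of Q2, as in part (b)(i).
obs = [(42, 1), (51, 1), (51, 1), (51, 1), (53, 0), (60, 1), (60, 0), (64, 1)]

death_times = sorted({t for t, ev in obs if ev})
surv = 1.0
steps = []
for t in death_times:
    n_risk = sum(1 for time, _ in obs if time >= t)
    n_deaths = sum(1 for time, ev in obs if time == t and ev)
    surv *= 1 - n_deaths / n_risk   # S(t) = prod over death times (1 - d_j / n_j)
    steps.append((t, surv))
    print(t, round(surv, 4))
```

Censored times (53+, 60+) do not create steps of their own; they only shrink later risk sets, which is why survfit() marks them separately on the curve.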
Q3: Load the data frame leuk from the library MASS. This data set contains time (survival time in weeks)
and wbc (white blood counts) for n = 33 leukaemia patients. None of the survival times are censored (in
this case event can be omitted from the Surv object). It also contains the variable ag. From the help
file: "[T]he patients were also factored into 2 groups according to the presence or absence of a morphologic
characteristic of white blood cells. Patients termed AG positive were identified by the presence of Auer rods
and/or significant granulation of the leukaemic cells in the bone marrow at the time of diagnosis."
In the article Feigl, P. & Zelen, M. (1965), Estimation of exponential survival probabilities with concomitant
information, Biometrics 21, 826–838, these lifetimes are assumed to be exponentially distributed with means
λ⁻¹, where λ is allowed to depend on the predictors wbc and ag.
(a) The Cox proportional hazards model predicts the hazard rate

    h(x) = h0(x) e^η,  x > 0,

where any predictor is incorporated into η, but not into the baseline hazard rate h0(x). Therefore, a crucial
assumption is that the hazard rate functions of all observations are proportional to h0, and therefore to
each other. This can be checked when comparing a small number of groups, but is more difficult when
using numerical predictors, since each survival time distribution is distinct, except for ties.
However, we can at least check the assumption approximately, by comparing the estimated hazard
functions for the two levels of the factor ag, assuming provisionally that the survival times within each
level are identically distributed.
(i) Show that if the hazard rates are proportional, the cumulative hazard rates will be proportional as
well.
(ii) What is the cumulative hazard function H(x) of an exponentially distributed lifetime of mean λ⁻¹?
(iii) If each survival time actually is exponentially distributed, with mean λ⁻¹ dependent on ag and
wbc, will the proportional hazard rate assumption be satisfied?
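Parts (i)-(iii) rely on the standard relationships among the hazard, cumulative hazard, and survival functions, reproduced here for reference:

```latex
H(x) \;=\; \int_0^x h(u)\,du \;=\; -\log S(x),
\qquad
S(x) \;=\; e^{-H(x)} .
```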
(iv) Calculate Kaplan-Meier estimates of the survival curves separately for the two levels of factor ag.
Plot the cumulative hazard functions. This can be done using essentially the same method used
to plot the survival curves, except that the option fun="cumhaz" is used. Make sure the option
conf.int=TRUE is used.
(v) Calculate the sample means of the survival times for each level of ag. Using these estimates,
superimpose on the plot an estimate of the cumulative hazard function obtainable by assuming
that survival times are exponentially distributed, and that the mean survival time is constant
within each level of ag. Does the proportional hazards assumption seem reasonable?
(b) Using the function coxph(), fit the Cox proportional hazards model with

    η = β1 ag + β2 log(wbc).

Assuming that survival times are exponentially distributed, use this model to estimate the proportion
by which expected survival time is reduced by a 2-fold increase in white blood cell count.
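Once β2 is estimated, the requested proportion follows from the exponential assumption: doubling wbc adds β2 log 2 to η, multiplying the hazard by 2^β2 and hence the expected survival time by 2^(−β2). A sketch with a hypothetical coefficient value (use the fitted β2 in practice):

```python
beta2 = 0.5  # hypothetical log(wbc) coefficient; substitute the coxph() estimate

hazard_ratio = 2 ** beta2        # doubling wbc multiplies the hazard by e^(beta2 * log 2)
mean_ratio = 1 / hazard_ratio    # an exponential mean scales inversely with the hazard
reduction = 1 - mean_ratio       # proportion by which mean survival time is reduced
print(round(reduction, 4))
```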
Q4: This problem will make use of the cabbages data set from the MASS library. This contains data from
a cabbage field trial. We will be interested in HeadWt (weight of the cabbage head, presumably in kg) and
VitC (ascorbic acid content, in undefined units).
(a) We are interested in the simple linear regression model

    Yi = β0 + β1 Xi + εi,

where εi ~ N(0, σε²), Yi = VitC and Xi = log(HeadWt). Plot Y against X. Using the function lm(),
calculate the least squares estimates of β0, β1, and superimpose the estimated regression line on your
plot. Is a linear relationship between X and Y plausible?
(b) We will construct a Bayesian model for the inference of (β0, β1). The marginal prior densities will be

    β0 ~ N(Ȳ, σ0²)
    β1 ~ N(0, σ1²),

and under the prior assumption, β0 and β1 are independent. The conditional densities of Yi given (β0, β1)
are Yi ~ N(β0 + β1 Xi, σε²). Given (β0, β1) the responses Yi can be assumed to be independent. Let
φ(x; μ, σ²) denote the density function for a N(μ, σ²) distribution. The joint posterior density of (β0, β1)
will therefore be proportional to

    π(β0, β1 | Y1, ..., Yn) ∝ [ ∏_{i=1}^n φ(Yi; β0 + β1 Xi, σε²) ] π(β0) π(β1),

where π(β0) = φ(β0; Ȳ, σ0²) and π(β1) = φ(β1; 0, σ1²). Create a Hastings-Metropolis algorithm to
simulate a sample from π(β0, β1 | Y1, ..., Yn). Implement the following features.
(i) Estimate Ȳ using the sample mean of the responses Yi, and estimate σε² using the MSE from
the regression model in Part (a) (using data to estimate parameters in a Bayesian model can be
referred to as the empirical Bayesian method).
(ii) Use as a proposal rule something like beta.new = beta.old + runif(2,-1,1). This means the
resulting state space is not discrete, but the algorithm will work in much the same way. Under this
proposal rule we can take Q(y2 | y1)/Q(y1 | y2) = 1 when calculating the acceptance probability.
(iii) Allow N = 100,000 transitions. Capture in a single object all sampled values of (β0, β1).
(iv) Run the algorithm twice, first setting prior variance σ0² = σ1² = σ²_prior = 100, then σ²_prior = 1000.
A parameter defining a prior density is referred to as a hyperparameter. Sometimes, this is used to
represent prior information (for example Ȳ in this model). Otherwise, the hyperparameters are
often set so as to make the prior density uniform, close to uniform, or otherwise highly variable, to
reflect uncertainty regarding the parameter. This is known as a diffuse prior.
(v) When constructing an MCMC algorithm, rather than calculate a ratio of densities, it is better
to calculate a difference Δ in log-densities, and then calculate the exponential function of the
difference, that is, e^Δ. This can be done using the log = TRUE option of the dnorm() function.
This option is generally available for density functions in R.
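A minimal sketch of such a sampler, assuming simulated data in place of the cabbages variables and a shorter run (the assignment itself uses R and N = 100,000); features (i)-(v) are marked in the comments:

```python
import math
import random

random.seed(1)

# Simulated stand-ins for X = log(HeadWt) and Y = VitC; the assignment
# uses the cabbages data and the MSE from the lm() fit in Part (a).
n = 30
x = [random.uniform(0.0, 2.0) for _ in range(n)]
y = [10.0 - 3.0 * xi + random.gauss(0.0, 1.0) for xi in x]

ybar = sum(y) / n          # feature (i): empirical-Bayes prior mean for beta0
sig2_eps = 1.0             # stand-in for the regression MSE
sig2_prior = 100.0         # hyperparameter sigma^2_prior, feature (iv)

def log_norm(z, mean, var):
    # log of the N(mean, var) density, the analogue of dnorm(..., log=TRUE)
    return -0.5 * math.log(2.0 * math.pi * var) - (z - mean) ** 2 / (2.0 * var)

def log_post(b0, b1):
    # log of the unnormalized posterior: log-likelihood plus log-priors
    ll = sum(log_norm(yi, b0 + b1 * xi, sig2_eps) for xi, yi in zip(x, y))
    return ll + log_norm(b0, ybar, sig2_prior) + log_norm(b1, 0.0, sig2_prior)

b0, b1 = ybar, 0.0
lp = log_post(b0, b1)
samples = []
for _ in range(5000):                    # the assignment asks for N = 100,000
    c0 = b0 + random.uniform(-1.0, 1.0)  # feature (ii): symmetric proposal,
    c1 = b1 + random.uniform(-1.0, 1.0)  # so the Q-ratio is 1
    lp_new = log_post(c0, c1)
    if math.log(random.random()) < lp_new - lp:  # feature (v): e^(log-difference)
        b0, b1, lp = c0, c1, lp_new
    samples.append((b0, b1))             # feature (iii): keep every state

burn = samples[2000:]
m0 = sum(s[0] for s in burn) / len(burn)
m1 = sum(s[1] for s in burn) / len(burn)
print(round(m0, 2), round(m1, 2))        # posterior means, near the true (10, -3)
```

Comparing the accept/reject step against the ratio form shows why the log version is preferred: with n likelihood terms the density ratio can underflow, while the log-difference stays well-scaled.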
(c) Construct separate histograms for β0 and β1 for each hyperparameter choice σ²_prior = 100, 1000. Also,
superimpose on each histogram the least squares estimate of β0, β1 from Part (a), as well as the confidence
interval bounds β̂_i ± t_crit·SE_i (the abline() function can be used). In general, is the Bayesian inference
for β0 and β1 consistent with the confidence intervals?
(d) When a prior is intended to be diffuse, the usual practice is to investigate the sensitivity of the posterior
density to the choice of prior. Ideally, in this case, the posterior density does not depend significantly on
the prior. The simplest way to do this is to use a range of priors, then compare the resulting posterior
densities. With this in mind, does the posterior density appear to be sensitive to the choice of σ²_prior?
Q5: [For Graduate Students] Let Y be a random variable, and let X be any random observation
X = (X1, ..., Xn). Suppose we have the density of Y conditional on X, f_{Y|X}(y|x), and the density of X,
f_X(x). The joint density of (X, Y) is then

    f_{XY}(x, y) = f_{Y|X}(y|x) f_X(x).

Then for any function h(x, y) denote the conditional expectation

    E[h(X,Y) | X = x] = E[h(x,Y) | X = x] = ∫_y h(x,y) f_{Y|X}(y|x) dy.
The law of total expectation states

    E[h(X,Y)] = ∫_{x1} ··· ∫_{xn} ∫_y h(x,y) f_{XY}(x,y) dy dx1 ··· dxn

              = ∫_{x1} ··· ∫_{xn} ∫_y h(x,y) f_{Y|X}(y|x) f_X(x) dy dx1 ··· dxn

              = ∫_{x1} ··· ∫_{xn} f_X(x) [ ∫_y h(x,y) f_{Y|X}(y|x) dy ] dx1 ··· dxn

              = ∫_{x1} ··· ∫_{xn} E[h(X,Y) | X = x] f_X(x) dx1 ··· dxn,

and can be informally summarized as E[h(X,Y)] = E[E[h(X,Y) | X]].
The problem of prediction is to determine a function δ(X) which is close to Y in some sense. We can
define the mean squared error MSE_δ of δ as

    MSE_δ = E[(Y − δ(X))²],

and the minimum mean squared error (MMSE) predictor is the one which minimizes MSE_δ. Then recall that
the Bayes (minimum expected risk) estimator for a parameter θ under squared error loss is the expected value
of θ under the posterior density. We can verify this by understanding how to derive an MMSE predictor. In
the following development we will assume that all means and variances are finite, and in general this would
need to be verified.
(a) Prove that the constant c which minimizes E[(Y − c)²] is, uniquely, c = E[Y].
(b) Define the function

    μ_Y(x) = E[Y | X = x],

which gives the expected value of Y conditional on {X = x}. Using the result of Part (a), and the law
of total expectation, show that the MMSE predictor of Y is δ(X) = μ_Y(X).
(c) Then derive the Bayes estimator for a parameter θ under squared error loss.
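As a numerical illustration of the MMSE property (simulated data; only a few linear candidate predictors are compared against the conditional mean):

```python
import random

random.seed(0)

# Monte Carlo check of the MMSE idea: with Y = X + e and e independent
# noise, E[Y | X] = X, and it should beat other (here, linear) predictors.
N = 100000
xs = [random.gauss(0.0, 1.0) for _ in range(N)]
ys = [xi + random.gauss(0.0, 1.0) for xi in xs]

def mse(a, b):
    # empirical mean squared error of the predictor delta(x) = a*x + b
    return sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(xs, ys)) / N

mmse = mse(1.0, 0.0)                     # delta(X) = E[Y | X] = X
rivals = [mse(0.5, 0.0), mse(1.0, 0.5), mse(1.5, -0.2)]
print(mmse < min(rivals))                # the conditional mean wins
```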
