#
PSTAT 115程序实验代写、R编程语言调试、data留学生程序代做
代写Database|代写Python编程

Homework 3

PSTAT 115, Fall 2020

Due on November 8, 2020 at 11:59 pm

Note: If you are working with a partner, please submit only one homework per group with both names

and whether you are taking the course for graduate credit or not. Submit your Rmarkdown (.Rmd) and the

compiled pdf on Gauchospace.

Problem 1. Cancer Research in Laboratory Mice

As a reminder from homework 2, a laboratory is estimating the rate of tumorigenesis (the formation of

tumors) in two strains of mice, A and B. They have tumor count data for 10 mice in strain A and 13 mice in

strain B. Type A mice have been well studied, and information from other laboratories suggests that type

A mice have tumor counts that are approximately Poisson-distributed. Tumor count rates for type B mice

are unknown, but type B mice are related to type A mice. Assuming a Poisson sampling distribution for

each group with rates θA and θB. We assume θA ∼ gamma(120, 10) and θB ∼ gamma(12, 1). We observe

yA = (12, 9, 12, 14, 13, 13, 15, 8, 15, 6) and yB = (11, 11, 10, 9, 9, 8, 7, 10, 6, 8, 8, 9, 7). Now we will actually

investigate evidence that Type A mice are have higher rates of tumor formation than Type B mice.

a. For n0 ∈ {1, 2, ..., 50}, obtain P r(θB < θA | yA, yB) via Monte Carlo sampling for θA ∼ gamma(120, 10)

and θB ∼ gamma(12 × n0, n0). Make a line plot of P r(θB < θA | yA, yB) vs n0. Describe how sensitive

the conclusions about the event {θB < θA} are to the prior distribution on θB.

y_A <- c(12, 9, 12, 14, 13, 13, 15, 8, 15, 6)

y_B <- c(11, 11, 10, 9, 9, 8, 7, 10, 6, 8, 8, 9, 7)

# YOUR CODE HERE

Type your answer here, replacing this text.

b. Repeat the previous part replacing the event {θB < θA} with the event {Y˜B < Y˜A}, where Y˜A and Y˜B

are samples from the posterior predictive distribution.

y_A = c(12, 9, 12, 14, 13, 13, 15, 8, 15, 6)

y_B = c(11, 11, 10, 9, 9, 8, 7, 10, 6, 8, 8, 9, 7)

# YOUR CODE HERE

Type your answer here, replacing this text.

c. In the context of this problem, describe the meaning of the events {θB < θA} and {Y˜B < Y˜A}. How are

they different?

Type your answer here, replacing this text.

2. Posterior Predictive Model Checking

Model checking and refinement is an essential part of Bayesian data analysis. Let’s investigate the adequacy

of the Poisson model for the tumor count data. Consider strain A mice only for now, and generate posterior

predictive datasets y

(1)

A , ..., y

(1000)

A . Each y

(s)

A is a sample of size nA = 10 from the Poisson distribution with

parameter θ

(s)

A , θ

(s)

A is itself a sample from the posterior distribution p(θA | yA) and yA is the observed data.

For each s, let t

(s) be the sample average divided by the sample variance of y

(s)

A .

1

a. If the Poisson model was a reasonable one, what would a “typical” value t

(s) be? Why?

y_A = c(12, 9, 12, 14, 13, 13, 15, 8, 15, 6)

# YOUR CODE HERE

Type your answer here, replacing this text.

b. In any given experiment, the realized value of t

s will not be exactly the “typical value” due to sampling

variability. Make a histogram of t

(s) and compare to the observed value of this statistic, mean(yA)

var(yA)

.

Based on this statistic, make a comment on if the Poisson model seems reasonable for these data (at

least by this one metric).

# YOUR CODE HERE

Type your answer here, replacing this text.

c. Repeat the part b) above for strain B mice, using YB and nB = 13 to generate the samples. Assume

the prior distribution p(θB) ∼ Gamma(12, 1). Again make a comment on the Poisson model fit.

y_B = c(11, 11, 10, 9, 9, 8, 7, 10, 6, 8, 8, 9, 7)

# YOUR CODE HERE

3. Interval estimation with rejection sampling.

a. Use rejection sampling to sample from the following density:

p(x) = 1

4

|sin(x)| × I{x ∈ [0, 2π]}

Use a proposal density which is uniform from 0 to 2π and generate at least 1000 true samples from p(x).

Compute and report the Monte Carlo estimate of the upper and lower bound for the 50% quantile interval

using the quantile function on your samples. Compare this to the 50% HPD region calculated on the

samples. What are the bounds on the HPD region? Report the length of the quantile interval and the total

length of the HPD region. What explains the difference? Hint: to compute the HPD use the hdi function

from the HDInterval package. As the first argument pass in density(samples), where samples is the name

of your vector of true samples from the density. Set the allowSplit argument to true and use the credMass

argument to set the total probability mass in the HPD region to 50%.

b. Plot p(x) using the curve function (base plotting) or stat_function (ggplot). Add lines corresponding

to the intervals / probability regions computed in the previous part to your plot using them segments

function. To ensure that the lines don’t overlap visually, for the HPD region set y0 and y1 to 0 and for

the quantile interval set y0 and y1 to 0.01. Make the segments for HPD region and the segment for

quantile interval different colors. Report the length of the quantile interval and the total length of the

HPD region, verifying that indeed the HPD region is smaller.

# Rejection sampling and interval construction

# YOUR CODE HERE

# hd_region is the result of calling hdi function

hd_region <- NULL # YOUR CODE HERE

print(hd_region)

print(sprintf("Total region length: %.02f", sum(hd_region[, "end"] - hd_region[,"begin"])))

quantile_interval <- NULL # YOUR CODE HERE

print(quantile_interval)

print(sprintf("Total region length: %.02f", quantile_interval[2] - quantile_interval[1]))

2

## Make the plot

# YOUR CODE HERE

3