STAT 440: Homework 4 Due: 2/11 at 3:00pm
All work must be done using RMarkdown. Turn in the code as well as the output. Clearly denote the
results of each question! If the grader has a hard time finding your answer, I will instruct them to not give
you credit!
1. The Zero-Inflated Poisson distribution is useful for modeling count processes where there are additional
zero values. It is commonly used to model the counts of rare events, where most of the time there will
be no events. Let X ∼ ZIP(p, λ) be a random variable from the zero-inflated Poisson distribution
with occurrence probability p and rate λ. Then
\[
P(X = i) =
\begin{cases}
(1 - p) + p\,e^{-\lambda}, & i = 0 \\
p\,\dfrac{\lambda^{i} e^{-\lambda}}{i!}, & i = 1, 2, \ldots
\end{cases}
\]
Random variables from a ZIP distribution can also be written as a function of two other random
variables. If X ∼ ZIP(p, λ), then
X = Y · Z
where
Y ∼ Bern(p)
Z ∼ Pois(λ)
and Y and Z are independent.
(a) Simulate 10000 iid random variables X_i ∼ ZIP(0.3, 7) and plot a histogram of the
resulting random variables.
(b) Calculate the theoretical probabilities:
P(X = i), i = 0, 1, ..., 9,
and compare them to the Monte Carlo estimates of these probabilities from your simulations.
(c) Estimate θ = E(X) where X ∼ ZIP(p, λ). Use 1000 Monte Carlo samples to estimate θ, and
give a 95% confidence interval for your estimate.
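The product representation X = Y · Z can be sketched in R as follows. This is a minimal illustration rather than a reference solution: the helper names `rzip` and `dzip` are hypothetical, Y and Z are assumed independent, and the interval uses the usual normal approximation.

```r
# Sketch: draw from ZIP(p, lambda) via the product X = Y * Z.
# Assumes Y and Z are independent; rzip and dzip are hypothetical helper names.
set.seed(440)

rzip <- function(n, p, lambda) {
  y <- rbinom(n, size = 1, prob = p)  # Y ~ Bern(p)
  z <- rpois(n, lambda)               # Z ~ Pois(lambda)
  y * z
}

dzip <- function(i, p, lambda) {
  # pmf from the piecewise definition: extra mass (1 - p) at zero
  ifelse(i == 0, (1 - p) + p * dpois(0, lambda), p * dpois(i, lambda))
}

x <- rzip(10000, p = 0.3, lambda = 7)

# Empirical vs. theoretical probabilities for i = 0, ..., 9
i <- 0:9
mc_prob <- sapply(i, function(k) mean(x == k))
comparison <- data.frame(i = i, theory = dzip(i, 0.3, 7), monte_carlo = mc_prob)

# Monte Carlo estimate of E(X) with a normal-approximation 95% interval
theta_hat <- mean(x)
se <- sd(x) / sqrt(length(x))
ci <- theta_hat + c(-1, 1) * qnorm(0.975) * se
```

Calling `hist(x)` on the simulated draws shows the spike at zero next to the Poisson-shaped bulk, and printing `comparison` puts the two sets of probabilities side by side.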
2. Use Monte Carlo to estimate the integral
\[
\theta = \int_{2}^{4} (3x^{2} - 2x - 10)\,dx.
\]
Perform this calculation m = 1000 times for each of the Monte Carlo sample sizes n = 1,000, n = 10,000,
and n = 100,000. For each n, plot a histogram of θ̂_1^(n), ..., θ̂_m^(n), and calculate the mean squared
error of the estimate,
\[
\mathrm{MSE}(n) = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{\theta}^{(n)}_{i} - \theta_{0} \right)^{2},
\]
where θ̂_i^(n) is the i-th MC estimate of θ for sample size n, and θ_0 is the true value of the integral θ.
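One common way to set this up, sketched below under stated assumptions, is to write the integral as (b − a) · E[g(U)] with U ∼ Unif(a, b) and average g over uniform draws. The function name `mc_integral` is hypothetical, the reference value comes from R's numerical `integrate()`, and the number of replications is reduced from the assignment's m = 1000 to keep the sketch fast.

```r
# Sketch: Monte Carlo integration of g over [a, b] using uniform draws.
set.seed(440)

g <- function(x) 3 * x^2 - 2 * x - 10
a <- 2
b <- 4

mc_integral <- function(n) {
  u <- runif(n, min = a, max = b)   # U ~ Unif(a, b)
  (b - a) * mean(g(u))              # (b - a) * sample mean of g(U)
}

# Numerical reference value for theta_0
theta0 <- integrate(g, lower = a, upper = b)$value

m <- 200    # reduced from m = 1000 for speed (assumption of this sketch)
n <- 1000
theta_hats <- replicate(m, mc_integral(n))

# Mean squared error of the m estimates around the reference value
mse <- mean((theta_hats - theta0)^2)

# hist(theta_hats) would show the sampling distribution for this n
```

Repeating the last block for the larger values of n should show the histogram tightening and the MSE shrinking roughly in proportion to 1/n.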
3. In the last HW, you estimated the conditional moment of the standard normal distribution, Z ∼ N(0, 1),
θ_α = E[Z | Z > α],
using Monte Carlo. Now you will do the same using importance sampling.
(a) First, use the efficient sampler you wrote in your last HW (or use the one in the posted solutions)
to estimate θ_{4.5} using Monte Carlo with 1,000 MC samples. Report the estimate θ̂_{4.5}, the time it
took to compute the estimator (most of this time will be spent drawing the Z | Z > α), and your
standard error.
(b) Now you will estimate θ_{4.5} using an importance sampler. First, use a N(μ, σ²) as your proposal
distribution. Show how to write the expectation θ_{4.5} as an expectation with respect to the N(μ, σ²)
distribution.
(c) Write the density of Z|Z > α in terms of the normal pdf and the normal cdf. Implement this
density as an R function. Use your function to plot this density for enough values of z between -1
and 10 to make the plot look smooth.
(d) Implement an importance sampler using a N(μ, σ²) as the proposal density. Again, use 1,000
samples. Estimate θ_{4.5} using your importance sampler for a few different values of μ and σ², and
calculate the standard error. For the best values of μ and σ² that you find, plot the corresponding
density on top of your plot of the density of Z | Z > α, and report θ̂_{4.5}, how long it took to compute
the estimator, and the standard error.
(e) Now try a different proposal density. Write another importance sampler that uses an Exp(λ = α)
proposal density. Run your new importance sampler using 1,000 samples, and report θ̂_{4.5}, how
long it took to compute the estimator, and the standard error.
(f) Make a table that summarizes your three estimators of θ_{4.5}. This table should contain the point
estimate, the running time, and the standard error. Which estimator do you think is the best?
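The importance-sampling idea in this question can be sketched as below: draw from a proposal q, then average z · f(z) / q(z), where f is the density of Z | Z > α. Several choices here are assumptions of the sketch and not part of the assignment: the helper names `dtrunc` and `importance_est`, the untuned values μ = 4.5 and σ = 1, and the decision to shift the exponential proposal so that it starts at α (an unshifted exponential puts most of its mass below α). The reference value uses the standard closed form E[Z | Z > α] = φ(α) / (1 − Φ(α)).

```r
# Sketch: importance sampling for theta_alpha = E[Z | Z > alpha], alpha = 4.5.
set.seed(440)
alpha <- 4.5

# Density of Z | Z > alpha: phi(z) / (1 - Phi(alpha)) on z > alpha, 0 elsewhere
dtrunc <- function(z, alpha) {
  ifelse(z > alpha, dnorm(z) / pnorm(alpha, lower.tail = FALSE), 0)
}

# Generic importance sampler: rq draws from the proposal, dq evaluates its density
importance_est <- function(n, rq, dq) {
  x <- rq(n)
  h <- x * dtrunc(x, alpha) / dq(x)   # z * f(z) / q(z)
  c(estimate = mean(h), se = sd(h) / sqrt(n))
}

# Normal proposal; mu = 4.5 and sigma = 1 are illustrative, untuned values.
# (sigma^2 > 1/2 keeps the variance of the weighted draws finite.)
is_norm <- importance_est(1000,
  rq = function(n) rnorm(n, mean = 4.5, sd = 1),
  dq = function(x) dnorm(x, mean = 4.5, sd = 1))

# Exponential proposal shifted to start at alpha
# (assumption of this sketch: X = alpha + E with E ~ Exp(rate = alpha))
is_exp <- importance_est(1000,
  rq = function(n) alpha + rexp(n, rate = alpha),
  dq = function(x) dexp(x - alpha, rate = alpha))

# Closed-form reference value: phi(alpha) / (1 - Phi(alpha))
theta_ref <- dnorm(alpha) / pnorm(alpha, lower.tail = FALSE)
```

Wrapping each call in `system.time()` gives the running times the table asks for, and `curve(dtrunc(x, alpha), from = -1, to = 10, n = 1001)` is one way to draw the target density before overlaying a proposal.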
4. Consider an iid sample X1, . . . , Xn of Bernoulli random variables with success parameter p.
(a) Write down the likelihood function L(p) and the log-likelihood ℓ(p).
(b) Find the maximum likelihood estimator p̂.
(c) Using the asymptotic theory of MLEs, what is the asymptotic distribution of p̂?
(d) Instead of using the theory of MLEs, use the CLT to find the asymptotic distribution of p̂.
(e) What is the nonasymptotic distribution of p̂? (That is, find the exact distribution of p̂.)
(f) In this part we want to visualize the sampling distribution of p̂. Suppose that p = 0.95. For
n = 10, 100, and 1000, generate 10000 Monte Carlo samples and for each n construct a histogram
of the resulting p̂. Overlay the asymptotic distributions and the exact distributions (that you
derived previously) onto the histograms (note that the exact distribution will look like a step
function).
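A rough sketch of the simulation in part (f), for a single value of n, is given below. It assumes that p̂ is taken to be the sample mean, that the asymptotic approximation is N(p, p(1 − p)/n), and that n · p̂ follows a Binomial(n, p) distribution. These should be checked against your own derivations in parts (b) through (e) and not treated as given; the variable names are illustrative.

```r
# Sketch: sampling distribution of p_hat for Bernoulli(p) data, one value of n.
set.seed(440)
p <- 0.95
n <- 100
n_mc <- 10000

# p_hat assumed to be the sample mean; the sum of n Bernoulli(p) draws is Binomial(n, p)
p_hat <- rbinom(n_mc, size = n, prob = p) / n

# Assumed asymptotic approximation: N(p, p(1 - p) / n)
asym_sd <- sqrt(p * (1 - p) / n)

# Assumed exact distribution: n * p_hat ~ Binomial(n, p), so p_hat lives on the grid k / n
k <- 0:n
exact_pmf <- dbinom(k, size = n, prob = p)

# Overlay (uncomment to plot). The pmf is multiplied by n, i.e. divided by the
# grid spacing 1/n, so that it is on the same density scale as the histogram.
# hist(p_hat, freq = FALSE, breaks = seq(-0.5, n + 0.5) / n)
# curve(dnorm(x, mean = p, sd = asym_sd), add = TRUE)
# lines(k / n, exact_pmf * n, type = "s")
```

Rerunning with n = 10 and n = 1000 shows how poorly the normal curve fits when p is close to 1 and n is small, since the exact distribution is skewed and cannot exceed 1.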