algorithm辅导、讲解R设计、R编程语言调试、辅导integration辅导Python编程|讲解数据库SQL

Lab 2: Monte Carlo integration and importance
sampling
Nial Friel
5 October 2019
Aim of this lab
This practical is focused on implementing Monte Carlo integration and the importance sampling algorithm
for two examples.
Monte Carlo integration and importance sampling
Recall that we are interested to estimate the integral
θ =Zφ(x)f(x) dx = Efφ(X).
Using (ordinary) Monte Carlo we can estimate θ asˆθn(f) = 1nφ(xi),
where x1, . . . , xn are iid draws from f(x).
Similarly, we can estimate θ using importance sampling. Recall that the key idea here is to express the
expectation wrt to f as one wrt an importance density g. So that one may then estimate the quantity of
interest, θ, using Monte Carlo by drawing from g. We can explain this idea neatly as:
Efφ(X) = Zφ(x)f(x) dx =Zφ(x)f(x)g(x)
g(x) dx = Egφ(X)w(X),
where w(x) = f(x)/g(x). We can now produce an importance sampling estimate of θ asθn(g) = 1nφ(xi)w(xi),
where now x1, . . . , xn are iid draws from g(x).
Note that importance sampling can prove useful in situations where we can’t sample from f, but crucially,
also in situations where sampling from g can lead to a reduced variance estimate of θ.
Here we first write a generic function to carry out importance sampling following the notation above.
importance.sampling <- function(f, phi, g, random.g, n) {
# Aim: Estimate Integral of phi(x)*f(x) dx, ie, E_f phi(X)
# f: density which we wish to sample from (or take expectation wrt)
# phi: function which we take expectation of
# g: density which we can sample from
# rg: random draw from density g.
# n: number of samples drawn from g.
1
y <- random.g(n)
mean(phi(y)*f(y)/g(y))}
Rejection sampling from a gamma distribution
We now implement this code to simulate from a gamma distribution. First, recall from labs 1 that we can
express Y ∼ Ga(k, λ) with k ∈ R as
Y = X1 + · · · + Xk
where Xi
iid ∼ Exp(λ). We can use this fact to simulate from a Ga(k, λ) distributions with integer-valued
k > 0:
inv.gamma.int <- function(n,k,lambda) {
x <- matrix(inv.exp(n=n*k,lambda=lambda),ncol=k)
apply(x,1,sum)}
using the inv.exp function which we also developed in lab 1 to simulate from an exponential distribution:
inv.exp <- function(n,lambda){
u <- runif(n)x <- -log(1-u)/lambdax}
The method described above for drawing from a Ga(k, λ) distribution only works for positive integers k.
However, suppose we would like to sample from X ∼ Ga(α, λ), for α, λ > 1, with density function f(x) to
estimate Ef (X).
Instead of sampling from f, we could use importance sampling with a Ga(k, λ − 1) distribution with k = bαc
as an importance function g(x). The symbol bαc means the largest integer less than α. This can be evaluated
in R using floor(α).
One of the key requirements for an importance function g(x) is that it dominates f(x). This means that
g(x) > 0 whenever f(x) = 0. In the context of f(x) being the density of a Ga(α, λ), distribution, for α, λ > 1
and g(x) being the density of a Ga(k, λ−1) distribution with k = bαc, it is easy to show that this requirement
holds.
Since α ≤ k and λ > 1, this implies that w(x) > 0 for the entire range of x > 0.
Let’s now implement importance sampling to estimate E(X) where X ∼ Ga(3.4, 3) using using the importance
sampling strategy described above. We can do this as follows:
f <- function(x) dgamma(x, 3.4, rate=3) # alpha=3.4, lambda=3
phi <- function(x) x
g <- function(x) dgamma(x,3,rate=2) # k=floor(alpha)=3, lambda=2
random.g <- function(n) inv.gamma.int(n,3,2)
y <- importance.sampling(f,phi,g,random.g,1e4)
y
2
## [1] 1.133229
This agrees well with the true value of 3.4/3 = 1.333.
Calculating the tail probabilty of a Cauchy distribution
Recall the following example in the lecture notes:
Estimate the probability θ = P(X > 2) where X follows a Cauchy distribution with density.
Using our usual strategy we express this as an expectation as follows.
Z ∞2f(x) dx =ZIA(x)f(x) dx = Ef IA(X),
where IA(x) is the indicator function taking the value 1 if x ∈ A = {x : x > 2}. Therefore we can estimate θ
using (ordinary) Monte Carlo asˆθn(f) = 1n,
where x1, . . . , xn are drawn iid from f(x). We implement this as:
mc = mean( rcauchy(1e4)>2 ) # (ordinary) Monte Carlo estimator
mc
## [1] 0.1449
We can implement importance sampling (as we did in lectures) using an importance function derived as
follows. Observe that for large x, f(x) is approximately equal to 1/(πx2). If we normalise this function so
that it is a density function with support on [2, ∞) we get
g(x) = 2x2, x > 2.
As we’ve seen in lectures, we can sample from g(x) using the inversion method:
random.g <- function(n){u <- runif(n)x <- 2/ux}
This now leads to an importance sampling estimatorˆθg =1n,
where x1, . . . , xn are an iid sample drawn from g(x), where xi = 2/ui and ui ∼ U(0, 1), for i = 1, . . . , n. We
are now in a position to implement importance sampling as
f <- function(x) dcauchy(x)
phi <- function(x) x>2
g <- function(x) 2*x^(-2)
isamp <- importance.sampling(f,phi,g,random.g,1e4)
isamp
## [1] 0.1476665
This compares well to the true value of θ = P(X > 2) is 0.5 − π
−1
tan 2 = 0.1476. Compare this to the
corresponding estimator above based on (ordinary) Monte Carlo.
Now we’ll explore how both the (ordinary) Monte Carlo and importance sampling estimators improve as the
sample size, n increases.
ns = seq(50,10000,len=995)
is <- rep(NA, 995)
mc <- rep(NA,995)
for(i in 1:995){
is[i] <- importance.sampling(f,phi,g,random.g,ns[i]) # importance sampling estimator
mc[i] <- mean( rcauchy(ns[i])>2 ) # (ordinary) Monte Carlo estimator
}
plot(ns, mc, type='l', col='red', xlab = 'sample size n', ylab='Estimate of P(X>2)', main='Estimate of P(X>2) -- Importance sampling (blue), MC (red)')
lines(ns,is, type='l', col='blue')
0 2000 4000 6000 8000 10000
0.10 0.15 0.20
Estimate of P(X>2) −− Importance sampling (blue), MC (red)
sample size n
Estimate of P(X>2)
Exercises
1. Consider again estimating θ = P(X > 2), where X follows a Cauchy density.
• Estimate the standard error of ˆθn(f) and ˆθn(g), (the (ordinary) Monte Carlo and importance sampling
estimators, respectively, of θ) for various values of n. Provide appropriate plots and comments on your
output. [15 marks]
• Provide an approximate 95% confidence interval for θ using (ordinary) Monte Carlo for n = 1, 000
and similarly another approximate 95% confidence interval for θ using importance sampling, again for
n = 1, 000. [5 marks]
2. Here we would like to estimate θ = P(X > 3) where X ∼ N(0, 1). Estimate θ using importance
sampling with an importance density g(x) distributed as N(µ, 1), for µ > 0. You may use n = 10, 000.
Experiment using different values of µ and discuss how this effects the precision the estimate of θ.
[30 marks]
Hand-in date: October 23rd at 12pm