Multilevel Sequential Monte Carlo Samplers
Alexandros Beskos1, Ajay Jasra2, Kody Law3, Raul Tempone3, Yan Zhou2
1Department of Statistical Science, University College London, London, WC1E 6BT, UK,
2Department of Statistics and Applied Probability, National University of Singapore, Singapore, 117546, SG,
3Center for Uncertainty Quantification in Computational Science and Engineering, King Abdullah University of
Science and Technology, Thuwal, 23955-6900, KSA.
Abstract
In this article we consider the approximation of expectations w.r.t. probability distributions associated to the solution of partial differential equations (PDEs); this scenario appears routinely in Bayesian inverse problems. In practice, one often has to solve the associated PDE numerically, using, for instance, finite element methods, leading to a discretisation bias, with the step-size at level $L$ denoted $h_L$. In addition, the expectation cannot be computed analytically and one often resorts to Monte Carlo methods. In the context of this problem, it is known that the introduction of the multilevel Monte Carlo (MLMC) method can reduce the amount of computational effort to estimate expectations, for a given level of error. This is achieved via a telescoping identity associated to a Monte Carlo approximation of a sequence of probability distributions with discretisation levels $+\infty > h_0 > h_1 > \cdots > h_L$. In many practical problems of interest, one cannot achieve an i.i.d. sampling of the associated sequence of probability distributions. A sequential Monte Carlo (SMC) version of the MLMC method is introduced to deal with this problem. It is shown that, under appropriate assumptions, the attractive property of a reduction of the amount of computational effort to estimate expectations, for a given level of error, can be maintained within the SMC context. The approach is numerically illustrated on a Bayesian inverse problem.
Keywords: Multilevel Monte Carlo, Sequential Monte Carlo, Bayesian Inverse Problems.
AMS subject classification: 65C30, 65Y20.
1. Introduction
Consider a sequence of probability measures $\{\eta_l\}_{l \ge 0}$ on a common measurable space $(E, \mathcal{E})$; we assume that the probabilities have a common dominating finite measure $du$ and write the densities w.r.t. $du$ as $\eta_l = \eta_l(u)$. In particular, for some known $\gamma_l : E \to \mathbb{R}^+$, we let
$$\eta_l(u) = \frac{\gamma_l(u)}{Z_l} \quad (1)$$
where the normalizing constant $Z_l = \int_E \gamma_l(u)\,du$ may be unknown. The context of interest is when the sequence of densities is associated to an `accuracy' parameter $h_l$, with $h_l \to 0$ as $l \to \infty$ and $+\infty > h_0 > h_1 > \cdots > h_\infty = 0$. This set-up is relevant in the context of discretised numerical approximations of continuum fields, as we will explain below. The objective is to compute:
$$\mathbb{E}_{\eta_\infty}[g(U)] := \int_E g(u)\,\eta_\infty(u)\,du$$
Preprint submitted to Elsevier March 26, 2015
for potentially many measurable $\eta_\infty$-integrable functions $g : E \to \mathbb{R}$. In practice one cannot treat $h_l = 0$ and must consider these distributions with $h_l > 0$.
Problems involving numerical approximations of continuum fields are discretized before being solved numerically. Finer-resolution solutions are more expensive to compute than coarser ones. Such discretizations naturally give rise to hierarchies of resolutions via the use of nested meshes. Successive solutions at refined meshes can be utilized to reduce the number of necessary solves at the finest resolutions. For the solution of linear systems, the coarsened systems are solved as preconditioners within the framework of iterative linear solvers, in order to reduce the condition number and hence the number of necessary iterations at the finer resolution. This is the principle of multi-grid methods. For Monte Carlo methods, as in the context above, a telescoping sum of associated differences at successive refinement levels can be utilized. This is so that the bias of the resulting multilevel estimator is determined by the finest level, while the variance of the estimators of the differences decays. The reduction in the variance at finer levels implies that the number of samples required to reach a given error tolerance is also reduced with increasing resolution. This procedure is then optimized to balance the extra per-sample cost at the finer levels. Overall, one can obtain a method with smaller computational effort to reach a pre-determined error than applying a standard Monte Carlo method immediately at the finest resolution [12].
Multilevel Monte Carlo (MLMC) [12] (see also [13]) methods are such that one typically sets an error threshold for a target expectation, and then sets out to attain an estimator with the prescribed error utilizing an optimal allocation of Monte Carlo resources. Within the context of [12, 14], the continuum problem is a stochastic differential equation (SDE) or a PDE with random coefficients, and the target quantity is an expectation of a functional, say $g : E \to \mathbb{R}$, of the parameter of interest $U \in E$, over an ideal measure $U \sim \eta_\infty$ that avoids discretisation. The levels are a hierarchy of refined approximations of the function space, specified in terms of a small resolution parameter, say $h_l$, for $0 \le l \le L$, thus giving rise to a corresponding sequence of approximate laws $\eta_l$. The method uses the telescopic sum
$$\mathbb{E}_{\eta_L}[g(U)] = \mathbb{E}_{\eta_0}[g(U)] + \sum_{l=1}^{L}\big\{\mathbb{E}_{\eta_l}[g(U)] - \mathbb{E}_{\eta_{l-1}}[g(U)]\big\}$$
and proceeds by coupling the consecutive probability distributions $\eta_{l-1}$, $\eta_l$. Thus, the expectations are estimated via the standard unbiased Monte Carlo averages
$$Y_l^{N_l} = \frac{1}{N_l}\sum_{i=1}^{N_l}\big\{g(U_l^{(i)}) - g(U_{l-1}^{(i)})\big\}$$
where $\{U_{l-1}^{(i)}, U_l^{(i)}\}$ are i.i.d. samples, with marginal laws $\eta_{l-1}$, $\eta_l$, respectively, carefully constructed on a joint probability space. This is repeated independently for $0 \le l \le L$. The overall multilevel
estimator will be
$$\hat{Y}^{L,\mathrm{Multi}} = \sum_{l=0}^{L} Y_l^{N_l}, \quad (2)$$
under the convention that $g(U_{-1}^{(i)}) = 0$. A simple error analysis gives that the mean squared error (MSE) is
$$\mathbb{E}\big\{\hat{Y}^{L,\mathrm{Multi}} - \mathbb{E}_{\eta_\infty}[g(U)]\big\}^2 = \underbrace{\mathbb{E}\big\{\hat{Y}^{L,\mathrm{Multi}} - \mathbb{E}_{\eta_L}[g(U)]\big\}^2}_{\text{variance}} + \underbrace{\big\{\mathbb{E}_{\eta_L}[g(U)] - \mathbb{E}_{\eta_\infty}[g(U)]\big\}^2}_{\text{bias}}. \quad (3)$$
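The estimator (2) can be sketched in a few lines of Python. Here `sample_pair(l)` is a hypothetical user-supplied routine returning one coupled draw $(g(U_l^{(i)}), g(U_{l-1}^{(i)}))$, and the toy coupling below (rounding a common Gaussian draw to a level-dependent grid) is purely illustrative:

```python
import random

random.seed(0)


def mlmc_estimate(sample_pair, L, N):
    """Multilevel estimator (2): sum over levels of the Monte Carlo
    average of N[l] coupled differences g(U_l) - g(U_{l-1})."""
    total = 0.0
    for l in range(L + 1):
        diffs = [gl - glm1 for gl, glm1 in (sample_pair(l) for _ in range(N[l]))]
        total += sum(diffs) / N[l]
    return total


def sample_pair(l):
    """Toy coupling (illustrative only): the level-l 'approximation' of
    U = Z ~ N(0,1) rounds the *same* draw Z to a grid of width 2**-l,
    so consecutive levels are strongly correlated."""
    z = random.gauss(0.0, 1.0)
    fine = round(z * 2 ** l) / 2 ** l
    coarse = round(z * 2 ** (l - 1)) / 2 ** (l - 1) if l > 0 else 0.0
    return fine, coarse


est = mlmc_estimate(sample_pair, L=6, N=[4000] * 7)  # estimates E[U_6], close to 0
```

Because the two draws in each pair share the same underlying randomness, the differences have small variance at high levels, which is exactly what makes the telescoping sum cheap to estimate.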
One can now optimally allocate $N_0, N_1, \ldots, N_L$ to minimize the variance term $\sum_{l=0}^{L} V_l/N_l$ for fixed computational cost $\sum_{l=0}^{L} C_l N_l$, where $V_l$ is the variance of $[g(U_l^{(i)}) - g(U_{l-1}^{(i)})]$ and $C_l$ the computational cost of its realisation. Using Lagrange multipliers for the above constrained optimisation, we get the optimal allocation of resources $N_l \propto \sqrt{V_l/C_l}$. In more detail, the typical chronology is that one targets an MSE, say $O(\epsilon^2)$; then (i) given a characterisation of the bias as an order of $h_l$, one determines $h_l = M^{-l}$, $l = 0, 1, \ldots, L$, for some integer $M > 1$, and chooses a horizon $L$ such that the squared bias is $O(\epsilon^2)$, and (ii) given a characterisation of $V_l$, $C_l$ as some orders of $h_l$, one optimises the required sample sizes $N_0, \ldots, N_L$ needed to give variance $O(\epsilon^2)$. Thus, a specification of the bias, variance and computational costs as functions of $h_l$ is needed.
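For concreteness, the Lagrange-multiplier allocation can be sketched as follows; the per-level variances $V_l$ and costs $C_l$ below are made-up numbers, and the scaling targets a variance budget $\epsilon^2$:

```python
import math


def optimal_allocation(V, C, eps2):
    """Minimise total cost subject to sum(V[l]/N[l]) <= eps2: the Lagrange
    solution is N_l proportional to sqrt(V_l/C_l), scaled to meet the budget."""
    scale = sum(math.sqrt(v * c) for v, c in zip(V, C)) / eps2
    return [math.ceil(scale * math.sqrt(v / c)) for v, c in zip(V, C)]


# Made-up rates: variance halves and cost doubles with each level.
V = [2.0 ** -l for l in range(5)]
C = [2.0 ** l for l in range(5)]
N = optimal_allocation(V, C, eps2=1e-4)
achieved = sum(v / n for v, n in zip(V, N))  # equals the budget 1e-4 by construction
```

With these particular rates $\sqrt{V_l C_l}$ is constant across levels, so each level receives an equal share $V_l/N_l = \epsilon^2/(L+1)$ of the variance budget.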
As a prototypical example of the above setting [12], consider the case $U = X(T)$, with $X(T)$ being the terminal position of the solution $X$ of an SDE, and $\eta_l$ the distribution of $X(T)$ under a numerical approximation with time-step $\Delta t_l = h_l$. The laws $\eta_{l-1}$, $\eta_l$ can be coupled via use of the same driving Brownian path. Invoking the relevant error analysis for SDE models, one can obtain (for $U \sim \eta_\infty$, $U_l \sim \eta_l$, defined on a common probability space):
(i) weak error $|\mathbb{E}[g(U_l) - g(U)]| = O(h_l^{\alpha})$, providing the bias $O(h_l^{\alpha})$,
(ii) strong error $\mathbb{E}|g(U_l) - g(U)|^2 = O(h_l^{\beta})$, giving the variance $V_l = O(h_l^{\beta})$,
(iii) computational cost for a realisation of $g(U_l) - g(U_{l-1})$, $C_l = O(h_l^{-\zeta})$,
for some constants $\alpha, \beta, \zeta$ related to the details of the discretisation method. The standard Euler-Maruyama method for the solution of SDEs gives the orders $\alpha = \beta = \zeta = 1$.
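This Brownian-path coupling can be sketched for a toy geometric Brownian motion $dX = \mu X\,dt + \sigma X\,dW$ (the model and parameter values are chosen here only for illustration): the fine path at level $l$ takes $M^l$ Euler-Maruyama steps, and the coarse path advances once per $M$ fine steps using the summed Brownian increments.

```python
import random

random.seed(1)


def coupled_euler_pair(l, M=2, T=1.0, mu=0.05, sigma=0.2, x0=1.0):
    """One coupled draw (X_l(T), X_{l-1}(T)) for the GBM
    dX = mu*X dt + sigma*X dW via Euler-Maruyama: the coarse path is
    driven by sums of M consecutive fine Brownian increments."""
    n_fine = M ** l
    h = T / n_fine
    xf = xc = x0
    dw_sum = 0.0
    for k in range(n_fine):
        dw = random.gauss(0.0, h ** 0.5)
        xf += mu * xf * h + sigma * xf * dw
        dw_sum += dw
        if l > 0 and (k + 1) % M == 0:  # one coarse step per M fine steps
            xc += mu * xc * (M * h) + sigma * xc * dw_sum
            dw_sum = 0.0
    return xf, (xc if l > 0 else 0.0)  # convention: level -1 contributes 0


def diff_var(l, n=2000):
    """Sample variance of the coupled difference at level l."""
    d = [a - b for a, b in (coupled_euler_pair(l) for _ in range(n))]
    m = sum(d) / n
    return sum((x - m) ** 2 for x in d) / n


v3, v6 = diff_var(3), diff_var(6)  # the variance shrinks as the level grows
```

Since $\beta = 1$ for Euler-Maruyama, the variance of the coupled difference decays like $h_l$, which is what justifies the decreasing sample sizes $N_l$ in the allocation above.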
Assuming a general context, given such rates for the bias, $V_l$ and $C_l$, one proceeds as follows. Recall that $h_l = M^{-(l+k)}$, for some integer $M > 1$. Then, targeting an error tolerance of $\epsilon$ and letting $h_L^{\alpha} = M^{-\alpha L} = O(\epsilon)$, one has $L = \log(\epsilon^{-1})/(\alpha \log(M)) + O(1)$, as in [12]. Using the optimal allocation $N_l \propto \sqrt{V_l/C_l}$, one finds that $N_l \propto h_l^{(\beta+\zeta)/2}$. Taking under consideration a target error of size $O(\epsilon)$, one sets $N_l \propto \epsilon^{-2} h_l^{(\beta+\zeta)/2} K_L$, with $K_L$ chosen to control the total error for increasing $L$.
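Under these rates, the resulting schedule can be sketched directly; the specific normalising choice $K_L = \sum_{k=0}^{L} h_k^{(\beta-\zeta)/2}$ used below is the one implied by the Lagrange allocation $N_l \propto \sqrt{V_l/C_l}$, and is an assumption of this sketch:

```python
import math


def mlmc_schedule(eps, M=2, alpha=1.0, beta=1.0, zeta=1.0):
    """Pick L so that h_L**alpha = O(eps), then allocate
    N_l = eps**-2 * h_l**((beta + zeta)/2) * K_L with
    K_L = sum_k h_k**((beta - zeta)/2)."""
    L = math.ceil(math.log(1.0 / eps) / (alpha * math.log(M)))
    h = [float(M) ** -l for l in range(L + 1)]
    K_L = sum(hk ** ((beta - zeta) / 2.0) for hk in h)
    N = [math.ceil(eps ** -2 * hl ** ((beta + zeta) / 2.0) * K_L) for hl in h]
    return L, N


# Euler-Maruyama rates alpha = beta = zeta = 1, tolerance eps = 0.01.
L, N = mlmc_schedule(0.01)
```

For $\beta = \zeta = 1$ the factor $K_L$ grows only linearly in $L$, and the sample sizes $N_l$ halve at each level while the per-sample cost doubles, so every level contributes comparably to the total cost.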