ST337 and ST405
BAYESIAN FORECASTING AND INTERVENTION
Summer 2024
1. In a clinical trial, a new vaccine is being tested. A group of n individuals are vaccinated. Record Yi = 1 if individual i has successfully developed immunity, and Yi = 0 otherwise. Assume that Y1, . . . , Yn are conditionally independent given the unknown success rate θ.
(a) Which of the probability distributions listed in Table 1 is best suited to model the data generation process? Write down the corresponding likelihood function. [2 marks]
Table 1: Probability distributions of random variable Y . B(α, β) and Γ(α) are the Beta and Gamma functions, respectively.
(b) Suppose a scientist wants to use the uniform distribution as the prior for θ.
(i) As the scientist has no preference toward any specific value of the success rate θ, specify suitable values of the parameters a and b of the uniform prior. Justify your answer. [2 marks]
(ii) Would the uniform prior be conjugate for the selected likelihood function? Justify your answer. [3 marks]
(c) Another scientist wants to use a Beta(α, β) as the prior for θ.
(i) Derive the posterior distribution for θ given Y1:n and state explicitly the distribution parameters. [4 marks]
(ii) Find the posterior mean and variance (in terms of n, α, β, and ȳ = n⁻¹ Σᵢ yᵢ). [4 marks]
(iii) Give the expression (pdf) of the prediction distribution of Yn+1 given Y1:n (Hint: the expression is a fraction of two Beta functions B(·, ·)). [5 marks]
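For reference, the conjugate Beta–Binomial update behind part (c) can be sketched numerically. The prior parameters (α = β = 2) and the data vector below are illustrative assumptions, not values from the question:

```python
# Conjugate Beta-Binomial update: prior Beta(alpha, beta), Bernoulli data y_1..y_n.
# The posterior is Beta(alpha + sum(y), beta + n - sum(y)); all values illustrative.

def beta_binomial_posterior(y, alpha, beta):
    """Return posterior parameters, mean and variance of theta | y."""
    n, s = len(y), sum(y)
    a_post, b_post = alpha + s, beta + n - s
    mean = a_post / (a_post + b_post)
    var = a_post * b_post / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
    return a_post, b_post, mean, var

y = [1, 0, 1, 1, 1, 0, 1, 1]                # hypothetical immunity indicators
a_post, b_post, mean, var = beta_binomial_posterior(y, alpha=2, beta=2)
print(a_post, b_post, round(mean, 3))       # Beta(8, 4), posterior mean 8/12
```

The posterior mean (a_post / (a_post + b_post)) is also the predictive probability that Yn+1 = 1, which is the quantity part (iii) asks for in closed form.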
2. Consider a Constant Gaussian Dynamic Linear Model (DLM) used to track the position and velocity of a high-speed train. Denote k ≥ 0 as the evenly spaced discrete time index, with time step ∆. The state of the system at time k is given by the state vector θk = (θk,1, θk,2)⊤, which includes the position θk,1 and velocity θk,2 of the train. The transitions of the train's states are subject to transition noise uk. Sensor measurements of the train's position Yk are recorded at discrete times and are subject to observation noise vk.
(a) Write down the corresponding state (including the initial state) and observation equations and give explicitly the assumptions on the noise terms. [3 marks]
(b) What are the hyperparameters of the DLM and their dimensionality? Give the explicit form of the transition matrix and the observation matrix. [2 marks]
(c) Assume the conditions of the Kalman filter are fulfilled. Let the predictive distribution of θ1 given Y0 and the noise covariances be expressed respectively as
θ1|Y0 ∼ N((0, 0)⊤, I2), cov(uk) = I2, and cov(vk) = 1,
where I2 is the 2 × 2 identity matrix.
(i) Write down the forecasting distribution of Y1 given Y0. [3 marks]
(ii) Given the observation y1 = 2, write down the filtering distribution of θ1 given Y0:1. [3 marks]
(iii) Compute the Kalman gain K1. Compare the predictive distribution of θ1 given Y0 and the filtering distribution of θ1 given Y0:1. Explain the implications of using the filtering distribution over the predictive distribution for making real-time decisions. [3 marks]
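The quantities in parts (i)–(iii) can be checked with a single Kalman update. The observation matrix H = (1, 0) (only the position is measured) is an assumption consistent with the setup in part (b):

```python
import numpy as np

# One Kalman update with the stated values: predictive theta_1 | Y_0 ~ N(0, I_2),
# observation noise variance V = 1, and assumed observation matrix H = (1, 0).
m_pred = np.zeros(2)          # predictive mean
P_pred = np.eye(2)            # predictive covariance
H = np.array([[1.0, 0.0]])
V = np.array([[1.0]])

S = H @ P_pred @ H.T + V                            # forecast variance of Y_1 | Y_0
K = P_pred @ H.T @ np.linalg.inv(S)                 # Kalman gain K_1
y1 = 2.0
m_filt = m_pred + (K @ (y1 - H @ m_pred)).ravel()   # filtering mean
P_filt = (np.eye(2) - K @ H) @ P_pred               # filtering covariance

print(S.item())      # 2.0: Y_1 | Y_0 ~ N(0, 2)
print(K.ravel())     # [0.5 0. ]
print(m_filt)        # [1. 0.]
```

Note how the filtering mean shifts toward the observation only in the position coordinate, and the position variance shrinks from 1 to 0.5 while the velocity variance is unchanged, since velocity is not directly observed.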
(d) Suppose the observation at time k + 1 is missing; derive the predictive distribution of the state θk+2 given all observations up to time k (in terms of mk+1 and Pk+1). [2 marks]
(e) Suppose U and V are the unknown covariances of the transition and observation noises, respectively. U and V have a common scaling factor σ², and they can be written as U = σ²Ũ and V = σ²Ṽ, with σ² unknown but Ũ and Ṽ known. We perform inference on σ² using both maximum likelihood estimation (MLE) and a Bayesian approach.
(i) For MLE, write down the likelihood function and explain why the Kalman filter is useful for obtaining the MLE. [2 marks]
(ii) For the Bayesian approach, without actual derivations, state the prior distribution and indicate what family the posterior distribution of σ² will belong to. [2 marks]
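The role of the Kalman filter in part (i) can be illustrated via the prediction-error decomposition: running the filter with σ² = 1 yields innovations whose true variance is σ² times the filter's reported variance, giving a closed-form MLE. The scalar local-level model and all values below are illustrative assumptions, not the exam's train model:

```python
import numpy as np

# Prediction-error decomposition for U = sigma^2 * Ut, V = sigma^2 * Vt.
# Illustrative scalar local-level model: theta_{k+1} = theta_k + u_k,
# Y_k = theta_k + v_k, with Ut = Vt = 1 and sigma^2 = 2.5 to be recovered.
rng = np.random.default_rng(2)
n, sigma2_true = 2_000, 2.5

theta = np.cumsum(rng.normal(0, np.sqrt(sigma2_true), n))   # latent states
y = theta + rng.normal(0, np.sqrt(sigma2_true), n)          # observations

m, P = 0.0, 10.0                   # rough diffuse initial predictive moments
z, St = np.empty(n), np.empty(n)
for k in range(n):
    S = P + 1.0                    # innovation variance under sigma^2 = 1
    z[k], St[k] = y[k] - m, S      # innovation and its (unit-scale) variance
    K = P / S                      # Kalman gain
    m, P = m + K * z[k], (1 - K) * P + 1.0   # filter, then one-step predict

sigma2_mle = np.mean(z ** 2 / St)  # closed-form MLE of the common scale sigma^2
print(round(sigma2_mle, 1))        # close to the true value 2.5
```

The point is that the likelihood factorises over the innovations zk, and the filter run with unit scale delivers both zk and the shape factors S̃k needed for the MLE in one pass.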
3. (a) Consider M, a univariate time series {Yk, θk}k≥0, of the form
with θ0 = u0, where, for k ≥ 0, the noises vk and uk have mean zero and are all independent.
(i) Derive the expression of the forecast function gk(δ) = E(Yk+δ | Y0:k) of M, for δ > 0, in terms of the filtering mean m̂k = E(θk | Y0:k). Justify all steps. [3 marks]
(ii) Consider the transition matrix
the observation vector H = (1, 1) and the filtering mean m̂k = (2, 2)⊤. 1) Find the forecast function. 2) Is this DLM a polynomial model? 3) Is this DLM observable? [4 marks]
(iii) Can a DLM similar to the one in Question 3(a)(ii) have forecast function gk(δ) = 3δ? Justify your answer. [2 marks]
(iv) Why are DLMs classified based on E(yk+δ | y0:k) and not on E(θk+δ | y0:k)? [2 marks]
(b) Consider a model M with transition matrix
and with observation matrix H = (2, 1).
(i) Provide the expression of M′, a canonical model similar to M, and identify its transition matrix F′ and its observation matrix H′. Justify your answers. [3 marks]
(ii) Find the similarity matrix S between M′ and M. [4 marks]
(iii) Let Id2 denote the 2 × 2 identity matrix. Since F = Id2 F′ Id2⁻¹, can you conclude that the similarity matrix is S = Id2, without performing the calculations in Question 3(b)(ii)? Justify your answer. [2 marks]
4. Consider a DLM of the following form.
with θ0 = u0, F = , H = [1 0], and uk = ϵk, where {ϵk}k≥0 is a sequence of i.i.d. random variables distributed as N(0, σ²) for some σ² > 0.
(a) Rewrite this DLM as an ARMA(p, q) model for Yk. Identify the value of p and q, as well as the AR and MA coefficients. [5 marks]
(b) Study the stability of the time series using the characteristic polynomials. Without further calculations, compare your result with the stability conditions for an AR(1) process and conclude about the influence of the moving average component. [5 marks]
(c) Write the ARMA(p, q) model obtained in Question 4(a) as an infinite-order MA process. [4 marks]
(d) Let Zk = Yk − E(Yk|Y0:k−1) be the innovation.
(i) What is the mean of Zk? Justify your answer. [2 marks]
(ii) What is the variance of Zk given Y0:k−1? Justify your answer. [2 marks]
(iii) What is the lag-δ autocovariance of Zk for δ ≠ 0? Justify your answer. [2 marks]
5. (a) Consider a real-valued distribution π with mean µ = E(θ) and variance σ² = E[(θ − µ)²], where θ ∼ π. Suppose µ is known, but σ² is unknown. The task is to estimate the variance σ².
(i) Derive the expression for V̂1, the Monte Carlo estimator of σ², and express its variance in terms of the fourth central moment µ4 = E[(θ − µ)⁴] and the variance σ² of π. [2 marks]
(ii) Derive the expression for V̂2, an importance sampling estimator of σ² with respect to a general proposal distribution q. [2 marks]
(iii) Compute the mean and variance of V̂2. [3 marks]
(iv) Is it possible for the variance of V̂2 to be lower than that of V̂1? Justify your response and provide an illustrative example [Hint: consider sampling weights ω(θ) = σ²/(θ − µ)²]. [3 marks]
(v) List two advantages of using V̂2 over V̂1, especially in light of your answer to Question 5(a)(iv). [2 marks]
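The two estimators in parts (i)–(ii) can be compared empirically. The target π = N(0, 1) (so µ = 0 and σ² = 1) and the proposal q = N(0, 2²) below are illustrative assumptions:

```python
import numpy as np

# Plain Monte Carlo vs importance sampling estimators of sigma^2 = E[(theta - mu)^2].
# Illustrative setting: pi = N(0, 1), so mu = 0 and sigma^2 = 1; proposal q = N(0, 4).
rng = np.random.default_rng(0)
N, mu = 200_000, 0.0

def norm_pdf(x, sd):
    """Density of N(0, sd^2) evaluated at x."""
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

theta = rng.normal(0.0, 1.0, N)              # draws from pi
V1 = np.mean((theta - mu) ** 2)              # Monte Carlo estimator V-hat_1

x = rng.normal(0.0, 2.0, N)                  # draws from the proposal q
w = norm_pdf(x, 1.0) / norm_pdf(x, 2.0)      # importance weights pi(x)/q(x)
V2 = np.mean(w * (x - mu) ** 2)              # importance sampling estimator V-hat_2

print(round(V1, 2), round(V2, 2))            # both near sigma^2 = 1
```

With a well-chosen proposal the weighted summands w(θ)(θ − µ)² can be far less variable than (θ − µ)² itself, which is the mechanism behind the hint in part (iv).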
(b) Consider the following smoothing distribution:
with
and with Zn, the corresponding normalising constant. It is assumed that one can sample from p0(·) and qk(· | θ) for any θ ∈ Θ.
(i) What are the two main issues when estimating integrals related to smoothing distributions? [2 marks]
(ii) Which proposal distribution sn(θ0:n) would you use? [2 marks]
(iii) Propose a way to get a sample θ0:n from sn(·). [2 marks]
(iv) In the context of self-normalised importance sampling, what would be the weight of this sample? [2 marks]
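A generic sketch of parts (ii)–(iv): sample each path sequentially from the proposal built from p0 and the qk, then self-normalise the weights. All densities below are toy assumptions standing in for the exam's p0, qk, and smoothing target:

```python
import numpy as np

# Self-normalised importance sampling for a smoothing target, on a toy model:
# proposal s_n(theta_{0:n}) = p0(theta_0) * prod_k q_k(theta_k | theta_{k-1}),
# sampled sequentially; unnormalised weights come from the observation terms.
rng = np.random.default_rng(1)
n, N = 5, 10_000
y = np.ones(n + 1)                        # hypothetical observations

paths = np.empty((N, n + 1))
paths[:, 0] = rng.normal(0.0, 1.0, N)     # theta_0 ~ p0 = N(0, 1)
for k in range(1, n + 1):
    # theta_k ~ q_k(. | theta_{k-1}) = N(theta_{k-1}, 1): transition as proposal
    paths[:, k] = rng.normal(paths[:, k - 1], 1.0)

# unnormalised log-weight: sum of (assumed) observation log-densities N(y_k; theta_k, 1)
log_w = -0.5 * np.sum((y - paths) ** 2, axis=1)
w = np.exp(log_w - log_w.max())           # subtract max for numerical stability
w /= w.sum()                              # self-normalised weights

smoothed_mean = w @ paths                 # SNIS estimate of E[theta_{0:n} | y_{0:n}]
print(smoothed_mean.shape)                # one smoothed mean per time point
```

Because the weights are normalised by their sum, the intractable constant Zn cancels, which is precisely why self-normalised importance sampling is the natural choice here.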