ST3370_B
BAYESIAN FORECASTING AND INTERVENTION
Summer 2022
Question 1
We consider the general objective of inferring an unknown parameter θ in a set Θ from observations y1:n = (y1, ..., yn) on R. We model the observations as realizations of a random vector y1:n whose law is defined by
where µ(0), µ(1), σ2, π(0) and π(1) are known parameters and N(·; µ, σ2) denotes the density of a Gaussian distribution with mean µ and variance σ2 > 0.
(a) Which conditions should π(0) and π(1) satisfy for pθ to be a density? [1 mark]
(b) Let pθ(θ) ∝ c(0)N(θ; µ(0), σ2) + c(1)N(θ; µ(1), σ2) with c(0), c(1) > 0 not depending on θ. What is the expression of pθ(θ)? [1 mark]
(c) What is the prior mean of θ? To what extent, or for which range of parameters, would you argue that it is a good summary for the prior? If needed, propose an alternative and explain your choice. [3 marks]
(d) For each j = 0, 1 consider the model
(i) What is the posterior distribution for θ(j) given that y(j)1:n = y1:n? [3 marks]
(ii) Based on the answer to Question 1(d)(i), find the marginal density py(j)1:n(y1:n). [3 marks]
(e) What is the posterior distribution for θ given that y1:n = y1:n? [Hint: use the answers to Question 1(b) and Question 1(d)] [3 marks]
(f) With reference to conjugacy,
(i) Given the expression of the prior for θ, which condition should the posterior satisfy to ensure that the prior is conjugate with respect to the likelihood? [1 mark]
(ii) Based on the answer to Question 1(e) and Question 1(f)(i), if the prior for θ is conjugate with respect to the likelihood, comment on the update of all the parameters of the posterior and their behaviour when the number of observations n diverges. [3 marks]
(g) Consider the following alternative model for the observations:
where Ber(j; π) = π^j (1 − π)^(1−j) 1{0,1}(j) denotes the density of a Bernoulli distribution with probability of success π.
What is the posterior for θ' given that y'1:n = y1:n? [2 marks]
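As an illustration of the structure behind parts (d) and (e), the sketch below assumes the specification the (not reproduced) display above appears to describe: observations yi | θ i.i.d. N(θ, σy2) with the two-component Gaussian mixture prior pθ(θ) = π(0)N(θ; µ(0), σ2) + π(1)N(θ; µ(1), σ2). Within each component the update is conjugate, and the posterior is again a two-component mixture whose weights are proportional to the prior weights times the component marginal likelihoods. All numerical values below are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical numbers purely for illustration; the exam keeps them symbolic.
mu = np.array([-1.0, 2.0])        # mu(0), mu(1): prior component means
sigma2 = 1.0                      # prior component variance sigma^2
pi_w = np.array([0.3, 0.7])       # pi(0), pi(1): prior mixture weights (sum to 1)
sigma2_y = 0.5                    # assumed observation variance
y = np.array([1.8, 2.3, 1.9])     # assumed observations y_{1:n}
n = len(y)

post_mean, post_var, log_ml = np.zeros(2), np.zeros(2), np.zeros(2)
for j in range(2):
    # Conjugate Gaussian update within component j (cf. Question 1(d)(i)).
    post_var[j] = 1.0 / (1.0 / sigma2 + n / sigma2_y)
    post_mean[j] = post_var[j] * (mu[j] / sigma2 + y.sum() / sigma2_y)
    # Marginal likelihood of y_{1:n} under component j (cf. Question 1(d)(ii)):
    # jointly Gaussian with mean mu(j)*1 and covariance sigma^2*11' + sigma_y^2*I.
    cov = sigma2 * np.ones((n, n)) + sigma2_y * np.eye(n)
    log_ml[j] = multivariate_normal(mean=np.full(n, mu[j]), cov=cov).logpdf(y)

# Posterior mixture weights (cf. Question 1(e)): proportional to the prior
# weight times the marginal likelihood of the corresponding component.
w = pi_w * np.exp(log_ml - log_ml.max())
w /= w.sum()
print("posterior weights:", w)
print("component means:", post_mean, "component variance:", post_var)
```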
Question 2
You are given two observations (y0, y1) = (10, 9) at, respectively, time step 0 and time step 1. You are asked to model them through the following constant Gaussian DLM:
(1)
where θ0 = 10 + v0, vk ~ N(·; 0, 1) and uk ~ N(·; 0, 1) are all independent for k ≥ 0.
(a) What are the hyperparameters of the DLM and their dimensionality? [1 mark]
(b) Using the initial distribution for the state and the observations (y0, y1) = (10, 9), derive the explicit expressions of the filtering and of the predictive distributions (both for the state and for the observations) recursively, starting from the filtering at time 0, p(θ0 | y0 = 10), up to the predictive for the observations at time 2, p(y2 | y0:1 = (10, 9)). [4 marks]
(c) Provide two different ways of interpreting the relation between the filtering mean and the predictive mean for the parameter at the same time step. How does the variance change between the filtering and predictive distribution? Please provide an interpretation. [3 marks]
(d) Which value for the state θ10 do you expect at time 10? [2 marks]
(e) Consider a dynamic linear model of the form
where vk ~ N(·; 0, Vk) and uk ~ N(·; 0, Uk) are all independent for k ≥ 0.
Let Cn,j,h = Cov(θn+j , θn+h | y0:n = y0:n) for j, h > 0.
(i) Which assumptions are missing for the model to be correctly specified? [1 mark]
(ii) Provide the expression of Cn,j,j in terms of the filtering distribution. [2 marks]
(iii) For a fixed j > 0, provide a recursive algorithm to compute Cn,j,h for every h ≥ j. [3 marks]
(iv) Let Dn,j,h = Cov(yn+j , yn+h | y0:n = y0:n) for j, h > 0. Provide a way to compute Dn,j,h for h>j. [2 marks]
(v) Compare Cn,j,h and Dn,j,h for the model in (1) when h > j. What changes in the comparison when h = j? [2 marks]
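A minimal numerical sketch of the recursion requested in part (b), assuming the constant DLM in (1) has the random-walk-plus-noise form yk = θk + vk, θk = θk−1 + uk with V = U = 1 and initial state θ0 ~ N(10, 1); if the display (1) differs, only the assumed lines below change.

```python
# Kalman filtering/prediction sketch for the assumed constant DLM:
#   y_k = theta_k + v_k,  theta_k = theta_{k-1} + u_k,  V = U = 1,
#   theta_0 ~ N(10, 1),  data (y_0, y_1) = (10, 9).
V, U = 1.0, 1.0
m, C = 10.0, 1.0            # predictive (prior) mean/variance for theta_0
ys = [10.0, 9.0]

for k, y in enumerate(ys):
    # Filtering update at time k: condition the state predictive on y_k.
    K = C / (C + V)                     # Kalman gain
    m, C = m + K * (y - m), (1 - K) * C
    print(f"filtering  theta_{k} | y_0:{k}: N({m:.3f}, {C:.3f})")
    # One-step-ahead predictive for the state and for the observation.
    a, R = m, C + U                     # state predictive at time k+1
    f, Q = a, R + V                     # observation predictive at time k+1
    print(f"predictive theta_{k+1}: N({a:.3f}, {R:.3f}),  y_{k+1}: N({f:.3f}, {Q:.3f})")
    m, C = a, R                         # state predictive becomes the next prior
```

Under these assumptions the loop reproduces the chain p(θ0 | y0), p(θ1 | y0), p(y1 | y0), p(θ1 | y0:1), p(θ2 | y0:1), p(y2 | y0:1) asked for in part (b).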
Question 3
Consider a time series model with the following state-space equations
(2)
where (θ0,1, θ0,2) = m0 + u0, vk and uk are all independent for k ≥ 0 and have mean zero.
(a) What is the transition matrix F of the time series in (2)? [1 mark]
(b) Consider a reparameterization of the model in (2) with θ'k = (θ'k,1, θ'k,2) = (θk,2, 3θk,1).
(i) Is (yk, θ'k)k≥0 a state space model? Justify your answer and, if this is the case, highlight the expression of the transition matrix F'. [2 marks]
(ii) Is θ'k a linear transformation of θk = (θk,1, θk,2)? If so, express the linear transformation in matrix form. [1 mark]
(iii) Use your answers to Question 3(b)(i) and Question 3(b)(ii) to find a canonical model similar to (2). [2 marks]
(iv) Find the eigenvalues of F and confirm your findings. [1 mark]
(c) You are asked to predict the position of a train and for five consecutive time intervals you observe it at y0:4 = (5, 5.1, 4.9, 4.7, 5.3) kilometers from Coventry station. Would you use the model in (2)? Justify its use or propose an alternative with reference to the findings in Question 3(b). [3 marks]
(d) Starting from (2), consider a different state-space equation for y'k that satisfies
(3)
where µ0 = b0 + w0, wk are independent for k ≥ 0, have mean zero, and are independent from (uk)k≥0 and (vk)k≥0.
(i) How would you describe qualitatively the observations (yk)k≥0 arising from this model? Propose a phenomenon that could be modelled by (3). [2 marks]
(ii) Is the state-space model in (3) observable? Justify your answer. [2 marks]
(e) Can the superposition of two state-space models be observable? Justify your answer by providing an example. [2 marks]
(f) Consider a Gaussian dynamic linear model M that is similar to model (2) and has observability matrix
and ω ≠ 2πq for every q ∈ N.
(i) What is a canonical model similar to M? [1 mark]
(ii) Let Uk and Vk denote the variances of the transition and observation noises of M. What is the canonical equivalent model? [3 marks]
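For parts (b) and (e)–(f), the sketch below shows how observability and a canonical similar model can be checked mechanically, assuming the course convention θk = F θk−1 + uk, yk = G θk + vk with a two-dimensional state. The matrices F and G used here are hypothetical stand-ins, since the displays of (2) and the observability matrix of M are not reproduced above.

```python
import numpy as np

# Hypothetical two-dimensional DLM: theta_k = F theta_{k-1} + u_k, y_k = G theta_k + v_k.
F = np.array([[0.0, 1.0],
              [3.0, 0.0]])      # hypothetical transition matrix
G = np.array([[1.0, 1.0]])      # hypothetical observation (row) vector

# Observability matrix O = [G; G F]; the model is observable iff O is invertible.
O = np.vstack([G, G @ F])
print("rank of observability matrix:", np.linalg.matrix_rank(O))

# The similarity transformation theta* = O theta gives a canonical similar model:
# transition matrix O F O^{-1} and observation vector G O^{-1} = (1, 0),
# because G is the first row of O.
O_inv = np.linalg.inv(O)
print("canonical transition matrix:\n", O @ F @ O_inv)
print("canonical observation vector:", G @ O_inv)

# Eigenvalues of F are invariant under similarity (cf. Question 3(b)(iv)).
print("eigenvalues of F:", np.linalg.eigvals(F))
```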
Question 4
Consider the random walk with noise process characterized by the following equations
(4)
where vk ~ N(·; 0, V) and uk ~ N(·; 0, U) are all independent for k ∈ Z.
(a) Is (yk)k∈Z autoregressive? Does this imply that it is stationary? Justify your answer. [2 marks]
(b) Let ek = yk − yk−1. Is (ek)k∈Z stationary? Justify your answer. [2 marks]
(c) Show that the autocovariance of (ek)k∈Z coincides with that of a moving average model MA(p) for some p. How can we find the corresponding coefficients? [2 marks]
(d) Consider the optimal Kalman gain Kk at time k of model (4).
(i) Express Kk in terms of U, V and Kk−1 and use it to prove that Kk converges to a constant K as k → +∞. [Hint: use 0 < < 1.] [2 marks]
(ii) Prove that the filtering variance satisfies P̂k = V Kk and deduce that P̂k converges to a constant P̂ as k → +∞. [1 mark]
(e) Let zk = yk − E(yk|y0:k−1) be the innovation error, considered as a function of y0:k.
(i) What is the mean of zk ? [1 mark]
(ii) What is the variance of zk? How does it change when k → +∞? [3 marks]
(iii) Show that zk and zk+δ are not correlated. [1 mark]
(iv) Express ek in terms of Kk−1 and zk. [3 marks]
(v) What can we deduce about the asymptotic behaviour of ek? [2 marks]
(vi) What can we deduce about the asymptotic behaviour of yk? [1 mark]
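For part (d), a short sketch that iterates the gain recursion and compares it with its closed-form fixed point, assuming (4) is the usual random-walk-plus-noise model yk = θk + vk, θk = θk−1 + uk with vk ~ N(0, V), uk ~ N(0, U); the values of U and V are hypothetical.

```python
import numpy as np

# Under the assumed form of (4), P_hat_k = K_k * V and the gain recursion is
#   K_k = (K_{k-1} V + U) / (K_{k-1} V + U + V),
# a contraction on [0, 1), hence convergent (cf. Question 4(d)).
U, V = 0.5, 1.0          # hypothetical noise variances for illustration
K = 0.0                  # any starting value in [0, 1)
for _ in range(50):
    K = (K * V + U) / (K * V + U + V)

# Closed-form fixed point: the positive root of V K^2 + U K - U = 0.
r = U / V                                  # signal-to-noise ratio U/V
K_limit = (-r + np.sqrt(r * r + 4 * r)) / 2
print(f"iterated gain: {K:.6f}, fixed point: {K_limit:.6f}")
print(f"limiting filtering variance P_hat = K*V = {K_limit * V:.6f}")
```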
Question 5: Compulsory question for students taking ST405.
Consider the following univariate Gaussian dynamic linear model
with θ0 = α + u0, vk ~ N(·; 0, Vk) and uk ~ N(·; 0, Uk) are all independent for k ≥ 0. You know that α ∈ [0, 1] but you have no particular opinion on its value. You decide to model it as a random variable α.
(a) Is it possible to define a state-space model (θ'k, yk)k≥0 that allows inference on α to be made? Provide the state equation, the observation equation and the initial state distribution. [3 marks]
(b) Can you use the Kalman filter on (θ'k, yk)k≥0? Justify your answer. [2 marks]
(c) You are asked to find the mean estimate of sin(α + θ0 + θ1 + θ2) conditionally on y0:2 = y0:2.
(i) Are you able to find its exact value? If not, what is the difficult part? [2 marks]
(ii) Why could we think of using a Monte Carlo method to approximate it and what obstacles do we find? [2 marks]
(iii) What is the challenge in using a standard importance sampling with target equal to p(α, θ0:2|y0:2) and proposal equal to the prior p(α, θ0:2)? [2 marks]
(iv) Provide all the steps of a sequential Monte Carlo algorithm with target equal to p(α, θ0:2|y0:2) and proposal equal to the prior p(α, θ0:2) and indicate how to use it to approximate the quantity you are interested in. [3 marks]
(v) How many times have you sampled values for α|y0:2 = y0:2 in the previous algorithm? Would it change if you introduced a resampling step? [2 marks]
(vi) Do you think that your answer to Question 5(iv) is a good algorithm? Justify your answer. [2 marks]
(vii) How can you use your answer to Question 5(iv) to find an approximation of p(α|y0:2)? [2 marks]
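A sketch of the sequential importance sampler of part (iv) with the prior as proposal, assuming a random-walk state equation θk = θk−1 + uk with θ0 = α + u0 and α ~ Uniform(0, 1), and an observation equation yk = θk + vk (the model display is not reproduced above). The data, variances and particle number below are hypothetical.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical data and variances for the assumed model.
y = np.array([0.4, 0.6, 0.5])        # hypothetical y_{0:2}
V = np.array([1.0, 1.0, 1.0])        # observation variances V_k
U = np.array([1.0, 1.0, 1.0])        # transition variances U_k
N = 10_000                           # number of particles

alpha = rng.uniform(0.0, 1.0, size=N)        # alpha sampled once from its prior
logw = np.zeros(N)
thetas = []
theta_prev = alpha
for k in range(3):
    # Propagate each particle through the prior (state) dynamics ...
    theta_k = theta_prev + rng.normal(0.0, np.sqrt(U[k]), size=N)
    # ... and reweight by the likelihood of the new observation y_k.
    logw += norm.logpdf(y[k], loc=theta_k, scale=np.sqrt(V[k]))
    thetas.append(theta_k)
    theta_prev = theta_k

w = np.exp(logw - logw.max())
w /= w.sum()                                  # self-normalised importance weights

# Importance-sampling estimate of E[sin(alpha + theta_0 + theta_1 + theta_2) | y_{0:2}].
h = np.sin(alpha + thetas[0] + thetas[1] + thetas[2])
print("estimate:", np.sum(w * h))

# A weighted histogram of alpha with weights w approximates p(alpha | y_{0:2})
# (cf. part (vii)); resampling steps would change how often alpha is refreshed.
```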