
MATH96053/MATH97084/MATH97185 Time Series Analysis

Ed Cohen
Room: 536 Huxley
email: e.cohen@imperial.ac.uk
Use Blackboard to obtain all course resources
Department of Mathematics
Imperial College London
180 Queen’s Gate, London SW7 2BZ
Video 1
Chapter 1
Introduction
1.1 Module admin and structure
Pre-requisites
Probability for Statistics MATH50010
Statistical Modelling I MATH50011 (preferable but not essential).
Course materials
Lecture notes
Figures booklets
11 weeks of lecture material delivered via Panopto videos
Non-assessed quiz questions
5 non-assessed problem sheets
Live sessions
5 problems classes
10 Q&A classes
Assessment
90% Exam
10% Coursework
1.2 What are time series?
Video 2
A time series is a series of data points indexed (or listed) in time order [wikipedia].
Any metric that is measured over time is a time series.
Time series analysis (TSA) could be described as a branch of applied stochastic
processes. We start with an indexed family of real-valued random variables
{X_t : t ∈ T}
where t is the index, here taken to be time (but it could be space). T is called the
index set. We have a state space of values of X.
Possibilities

State        Time         Notation
Continuous   Continuous   X(t)
Continuous   Discrete     X_t
Discrete     Continuous
Discrete     Discrete
In addition X could be univariate or multivariate. We shall concentrate on
discrete time. Samples are taken at equal intervals.
We wish to use TSA to characterize time series and understand structure. Our job
is to make inference on the underlying stochastic process from a single realisation -
the observed time series.
Diagram: paths/trajectories
[Hand-drawn sketch: sample paths of a stochastic process over time; example series include temperature, population, and tweets per hour sent over Twitter.]
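The point that we observe only a single realisation of the underlying process can be sketched in a few lines of Python. This is a hypothetical illustration: the AR(1)-style recursion and the function name `realisation` are my own choices for the sketch, not a model from these notes.

```python
import random

def realisation(n=100, phi=0.8, seed=None):
    """Generate one realisation x_1, ..., x_n of a simple
    discrete-time stochastic process (an AR(1)-style recursion,
    used purely for illustration)."""
    rng = random.Random(seed)
    x, xs = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs

# The process admits many possible paths/trajectories; in practice
# we observe just one of them -- the time series.
paths = [realisation(seed=s) for s in range(3)]
observed = paths[0]
print(len(observed))
```

Each element of `paths` is a different trajectory of the same process; inference must proceed from `observed` alone.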
Examples: Figures 1–4, in all cases points are joined for clarity.
[1] wind speed in a certain direction at a location, measured every 0.025s.
[2] monthly average measurements of the flow of water in the Willamette River at
Salem, Oregon.
[3] the daily record of the change in average daily frequency that tells us how well
an atomic clock keeps time on a day to day basis.
[4] the change in the level of ambient noise in the ocean from one second to the
next.
[5] part of the Epstein-Barr Virus DNA sequence (the entire sequence consists of
approximately 172,000 base pairs).
[6] Daily US Dollar/Sterling exchange rate and the corresponding returns from 1981
to 1985.
The visual appearances of these datasets are quite different. For example, consider
the wind speed and atomic clock data:
• Wind speed: adjacent points are close in value.
• Atomic clock: positive values tend to be followed by negative values.
For the numerical data, we can illustrate this using lag 1 scatter plots.
Diagram: lag 1 scatter plots
(See Figures 4a and 5.) Realisations of the series are denoted x_1, ..., x_N. So plot x_t
versus x_{t+1} as t varies from 1 to N − 1.
From these scatter plots we note the following:
[1] for the wind speed and US dollar series, the values are positively correlated.
[2] Willamette river data is similar, but points are more spread out.
[3] for the atomic clock data, the values are negatively correlated.
[4] for the ocean noise data and the US dollar returns series there is no clear clustering tendency.
We could similarly create lag-τ scatter plots by plotting x_t versus x_{t+τ} for integer
τ, but they would be unwieldy to deal with and interpret.
A better approach is to realize that the series x1,...,xN can be regarded as a
realization of the corresponding random variables X1,...,XN , and we will proceed
by studying the covariance relationships between these random variables.
[Hand-drawn sketch: a single realisation x_1, ..., x_N of the random variables X_1, ..., X_N, observed within a finite observation window in time.]
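The lag-1 clustering described above can be quantified numerically. The sketch below (function names are my own, not from the notes) builds the lag-τ pairs behind a scatter plot and computes their sample correlation:

```python
def lag_pairs(x, tau=1):
    """Return the pairs (x_t, x_{t+tau}) used in a lag-tau scatter plot."""
    return list(zip(x[:-tau], x[tau:]))

def lag_corr(x, tau=1):
    """Sample correlation between x_t and x_{t+tau}."""
    a, b = x[:-tau], x[tau:]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n
    va = sum((u - ma) ** 2 for u in a) / n
    vb = sum((v - mb) ** 2 for v in b) / n
    return cov / (va * vb) ** 0.5

# A slowly varying series (like the wind speed data) has strongly
# positively correlated adjacent values:
smooth = [t * 0.1 for t in range(100)]
print(lag_corr(smooth) > 0.9)
```

An alternating series (like the atomic clock data, where positive values tend to be followed by negative ones) would give a strongly negative lag-1 correlation instead.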
1.3 A brief aside - covariance and correlation
Video 3
The concepts of covariance and correlation will be crucial in this module, therefore
we BRIEFLY recap some of the key ideas.
Covariance
Covariance is a measure of joint variability of two random variables, X and Y say.
Defined as
cov(X, Y) ≡ E{(X − E{X})(Y − E{Y})} = E{XY} − E{X}E{Y}.
• Positive covariance ⇒ when X is above its mean, then Y also tends to be.
• Negative covariance ⇒ when X is above its mean, then Y tends to be below
its mean.
• Zero covariance means there is no relationship of this type and E{XY} = E{X}E{Y}.
Therefore, this gives a measure of linear dependency between 2 random variables.
NOTE: cov(X, X) = var(X)
All variance and covariance terms (known as the joint second moments), can be
summarized in the variance-covariance matrix (also commonly known as just the
covariance matrix ).
Define the vector X = (X, Y)^T with mean E{X} = µ = (µ_X, µ_Y)^T and variances
σ²_X and σ²_Y, respectively.

Σ ≡ E{(X − µ)(X − µ)^T} = [ σ²_X        cov(X, Y) ]
                          [ cov(X, Y)   σ²_Y      ]

This can be extended to higher dimensions. For a random vector X = (X_1, X_2, ..., X_m),
we have an m × m covariance matrix Σ = (σ_ij), where σ_ij = cov(X_i, X_j).
Σ is a symmetric positive semi-definite matrix.
[Handwritten working: expanding the product,
E{(X − E{X})(Y − E{Y})} = E{XY} − E{X}E{Y} − E{Y}E{X} + E{X}E{Y} = E{XY} − E{X}E{Y};
the diagonal entries of Σ are the variances.]
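As a quick numerical illustration of the definitions above (a hypothetical sketch; the sample values are invented for the example), the 2 × 2 variance-covariance matrix can be computed directly and checked for symmetry, with the variances on the diagonal:

```python
def cov(a, b):
    """Sample covariance of two equal-length lists of numbers."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n

x = [1.0, 2.0, 4.0, 7.0]   # invented sample values
y = [2.0, 1.0, 5.0, 6.0]

# Variance-covariance matrix: variances on the diagonal,
# cov(X, Y) off the diagonal.
sigma = [[cov(x, x), cov(x, y)],
         [cov(y, x), cov(y, y)]]

# Symmetric: cov(X, Y) = cov(Y, X); and cov(X, X) = var(X) >= 0.
print(sigma[0][1] == sigma[1][0], sigma[0][0] >= 0.0)
```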
Correlation
Correlation is a normalized measure of covariance. It is useful because covariance
scales with the variances of the variables, so its magnitude can be misleading.

Covariance has all the properties of an inner product, cov(·, ·) ≡ ⟨·, ·⟩, namely
• Bilinear: cov(aX + bY, Z) = a cov(X, Z) + b cov(Y, Z)
• Symmetric: cov(X, Y) = cov(Y, X)
• Positive definite: cov(X, X) ≥ 0 for all X.
This means we can invoke the Cauchy–Schwarz inequality:
|⟨X, Y⟩| ≤ √(⟨X, X⟩⟨Y, Y⟩), so |cov(X, Y)| ≤ √(var(X) var(Y)).
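The Cauchy–Schwarz bound, and the fact that the resulting correlation lies in [−1, 1], can be checked numerically. This is a hypothetical sketch: the simulated data and the linear relationship between x and y are invented for the example.

```python
import math
import random

rng = random.Random(0)
x = [rng.gauss(0.0, 2.0) for _ in range(1000)]
y = [0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]  # invented linear relationship

def cov(a, b):
    """Sample covariance of two equal-length lists of numbers."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n

# Cauchy-Schwarz: |cov(X, Y)| <= sqrt(var(X) var(Y))
bound = math.sqrt(cov(x, x) * cov(y, y))
rho = cov(x, y) / bound        # correlation coefficient
print(abs(cov(x, y)) <= bound, -1.0 <= rho <= 1.0)
```

Dividing the covariance by the bound gives the correlation, which is therefore guaranteed to lie between −1 and 1.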