Task:
The objective of this coursework is to propose and build a framework for batch
forecasting of fast-moving time series. Once a set of suitable forecasting models is
identified (Part A), you are asked to propose a model selection strategy that will
automatically choose the most appropriate forecasting model for each time series
individually (Part B). The proposed strategy should then be applied to all time series.
Finally, performance evaluation (Part C) and residuals diagnostics (Part D) should be
carried out.
Data:
Using the library Mcomp of the R statistical software, consider the quarterly time
series of the M3-Competition with IDs within [701, 1400] whose last digit
matches the last digit of your Student ID. For example, if
your Student ID is 1456789, then you should select all the series with IDs ending in
9, that is: 709, 719, 729, 739, ..., 1399. Following this procedure, you should end
up with a set of 70 quarterly time series. You should be able to access a single time
series (e.g. the time series 709) using the command M3[[709]]. Note that each time
series is split into an in-sample (M3[[709]]$x) and an out-of-sample (M3[[709]]$xx) set
of observations. Other useful variables include the size of the in-sample
(M3[[709]]$n), the size of the out-of-sample (equal to the required forecast horizon,
M3[[709]]$h), and the category of the data (micro, macro, industry, demographic,
finance or other, M3[[709]]$type). For simplicity, the length of the out-of-sample set
is always 8 quarters. You are expected to use only the in-sample set in order to
generate statistical forecasts for the out-of-sample set (a forecasting horizon of 8
quarters). Forecasting performance should then be evaluated by comparing
the produced forecasts with the withheld out-of-sample set of observations.
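The selection rule described above can be sketched in R as follows. This is a minimal sketch under stated assumptions: `select_ids` and `student_digit` are illustrative names that are not part of Mcomp, and the Mcomp access is guarded so that the snippet still runs where the package is not installed.

```r
# Hypothetical helper: IDs in [lower, upper] whose last digit equals `digit`.
select_ids <- function(digit, lower = 701, upper = 1400) {
  ids <- lower:upper
  ids[ids %% 10 == digit]
}

student_digit <- 9                 # assumed: last digit of Student ID 1456789
ids <- select_ids(student_digit)   # 709, 719, ..., 1399
length(ids)                        # 70

# Extract in-sample and out-of-sample data when Mcomp is available.
if (requireNamespace("Mcomp", quietly = TRUE)) {
  insample  <- lapply(ids, function(i) Mcomp::M3[[i]]$x)   # training data
  outsample <- lapply(ids, function(i) Mcomp::M3[[i]]$xx)  # withheld 8 quarters
}
```

Filtering on `ids %% 10` rather than building a sequence by hand keeps the rule correct for every possible last digit, including 0.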
Part A: Select a suitable toolbox of forecasting models
Your toolbox should contain at least one (1) time series regression model, at least
three (3) exponential smoothing models and at least two (2) ARIMA models. The
selected models should be able to capture collectively different underlying time series
characteristics (level, trend, seasonality). A full justification of the selected models
should be provided.
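One possible way to organise such a toolbox is as a named list of functions that share one interface. The sketch below is illustrative only, not a recommended answer: it uses base R stand-ins (`lm`, `HoltWinters` and `arima` from the stats package) so that it runs without extra packages, whereas the forecast package (`tslm`, `ets`, `auto.arima`) would be the more usual choice. The function names, the specific ARIMA orders and the built-in `UKgas` series used as a placeholder are all assumptions.

```r
# Every entry takes a quarterly ts `y` and a horizon `h`, and returns h point forecasts.

# Regression on a linear trend plus quarterly dummies.
fc_regression <- function(y, h) {
  f <- frequency(y)
  n <- length(y)
  season <- as.integer(cycle(y))
  train <- data.frame(y = as.numeric(y), t = seq_len(n),
                      q = factor(season, levels = seq_len(f)))
  fit <- lm(y ~ t + q, data = train)
  future_q <- ((season[n] + seq_len(h) - 1) %% f) + 1   # quarters of the next h periods
  newdata <- data.frame(t = n + seq_len(h),
                        q = factor(future_q, levels = seq_len(f)))
  as.numeric(predict(fit, newdata = newdata))
}

# Exponential smoothing family via stats::HoltWinters.
fc_hw <- function(y, h, ...) {
  as.numeric(predict(HoltWinters(y, ...), n.ahead = h))
}

# ARIMA with fixed (assumed) orders via stats::arima.
fc_arima <- function(y, h, order, seasonal = c(0, 0, 0)) {
  fit <- arima(y, order = order,
               seasonal = list(order = seasonal, period = frequency(y)))
  as.numeric(predict(fit, n.ahead = h)$pred)
}

toolbox <- list(
  regression   = fc_regression,
  ses          = function(y, h) fc_hw(y, h, beta = FALSE, gamma = FALSE),  # level
  holt         = function(y, h) fc_hw(y, h, gamma = FALSE),                # level + trend
  holt_winters = function(y, h) fc_hw(y, h, seasonal = "additive"),        # + seasonality
  arima_011    = function(y, h) fc_arima(y, h, order = c(0, 1, 1)),
  sarima       = function(y, h) fc_arima(y, h, c(0, 1, 1), c(0, 1, 1))
)

# Return NA forecasts instead of stopping when a model fails to fit.
safe_forecast <- function(model, y, h) {
  tryCatch(model(y, h), error = function(e) rep(NA_real_, h))
}

y <- UKgas                                                  # placeholder quarterly series
forecasts <- sapply(toolbox, safe_forecast, y = y, h = 8)   # 8 x 6 matrix
```

A shared `function(y, h)` interface means the same selection and evaluation code can loop over every model, and the `tryCatch` wrapper keeps a batch run over 70 series from stopping on a single failed fit.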
Part B: Select and apply a suitable model selection strategy
Using only the in-sample data, propose a suitable strategy in order to select for each
series individually the most suitable forecasting model. Justify your choice of this
strategy over alternative model selection strategies. Apply
the proposed model selection strategy to the data in order to generate forecasts for the
out-of-sample periods. You are strongly advised to consider methodologies such as
validation and/or cross-validation. The choice of validation/cross-validation
window lengths and the number of cross-validation steps should be justified.
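As one illustration of the cross-validation idea, the sketch below implements a rolling-origin evaluation on the in-sample data and picks the candidate with the lowest mean absolute error. It is a sketch under assumptions, not the required strategy: the three simple candidates stand in for a real toolbox, the names `rolling_origin_mae` and `select_model` are hypothetical, and the values `h = 8` and `n_origins = 4` are arbitrary defaults that would need their own justification.

```r
# Simple placeholder candidates; each maps (training vector, horizon) to forecasts.
candidates <- list(
  naive  = function(y, h) rep(y[length(y)], h),
  snaive = function(y, h) rep(tail(y, 4), length.out = h),   # quarterly period of 4
  drift  = function(y, h) {
    n <- length(y)
    y[n] + seq_len(h) * (y[n] - y[1]) / (n - 1)
  }
)

# Mean absolute error of one forecaster, averaged over `n_origins` rolling origins.
rolling_origin_mae <- function(y, forecaster, h = 8, n_origins = 4) {
  y <- as.numeric(y)
  n <- length(y)
  errs <- sapply(seq_len(n_origins) - 1, function(k) {
    end   <- n - h - k                  # index of the last training observation
    train <- y[seq_len(end)]
    test  <- y[(end + 1):(end + h)]     # validation window of length h
    mean(abs(test - forecaster(train, h)))
  })
  mean(errs)
}

# Score every candidate on the in-sample data and keep the best one.
select_model <- function(y, candidates, h = 8, n_origins = 4) {
  scores <- sapply(candidates,
                   function(f) rolling_origin_mae(y, f, h, n_origins))
  list(best = names(which.min(scores)), scores = scores)
}

y <- rep(c(10, 20, 30, 40), times = 10)     # toy, purely seasonal in-sample series
choice <- select_model(y, candidates)
choice$best                                  # "snaive"

# Refit the chosen model on the full in-sample set to forecast the out-of-sample periods.
final_forecast <- candidates[[choice$best]](y, 8)
```

Only in-sample observations enter the validation windows, so the withheld out-of-sample set plays no part in the choice of model.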
Part C: Performance evaluation
Evaluate the forecasts produced in the previous step using at least three appropriate
error measures. Evaluation should be carried out across time series and across
horizons. Justify the selection of these error measures over other possible candidates.
Compare the accuracy of the proposed selection strategy with that of three (3)
suitable benchmarks (for example, Naïve or Damped Exponential Smoothing) when
each of these is applied across all series. Was the application of the proposed model
selection strategy successful for this set of data? Critically discuss. For extra marks,
consider the decomposition of the analysis to different planning horizons (short-,
medium- and long-term), decomposition of the results with regard to time series
characteristics (trend and/or seasonality) as well as decomposition for the different
categories of data (micro, macro, industry, demographic, finance or other).
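The evaluation described above can be sketched with a few small functions. This is a minimal sketch on toy numbers: the choice of MAE, MAPE and MASE is one illustrative combination rather than a prescribed answer, the MASE scaling by the in-sample seasonal naive error with `m = 4` is an assumption suited to quarterly data, and the short/medium/long horizon buckets are hypothetical cut-offs.

```r
# Scale-dependent, percentage and scaled error measures.
mae  <- function(actual, fc) mean(abs(actual - fc))
mape <- function(actual, fc) 100 * mean(abs((actual - fc) / actual))
mase <- function(actual, fc, insample, m = 4) {
  scale <- mean(abs(diff(insample, lag = m)))   # in-sample seasonal naive MAE
  mean(abs(actual - fc)) / scale
}

actual   <- c(100, 200)
fc       <- c(110, 180)
insample <- 1:9
mae(actual, fc)               # 15
mape(actual, fc)              # 10
mase(actual, fc, insample)    # 3.75

# Toy absolute errors: one row per series, one column per horizon 1..8.
abs_err <- rbind(s1 = 1:8, s2 = 2 * (1:8))
per_horizon <- colMeans(abs_err)    # accuracy across series for each horizon
per_series  <- rowMeans(abs_err)    # accuracy across horizons for each series

# Assumed planning-horizon buckets (in quarters ahead).
buckets <- list(short = 1:2, medium = 3:5, long = 6:8)
by_bucket <- sapply(buckets, function(hs) mean(abs_err[, hs]))
by_bucket                           # short 2.25, medium 6, long 10.5
```

Storing errors as a series-by-horizon matrix makes the same table reusable for the proposed strategy and for each benchmark, and grouping rows by `M3[[id]]$type` would give the per-category breakdown.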
Part D: Residuals diagnostics
Select the three time series in your set where 1201 < ID < 1230 (for example, if the
last digit of your ID is 6, then the three time series should be 1206, 1216 and 1226)
and perform residuals diagnostics for the selected “optimal” method and two more methods
(one being exponential smoothing and the other a regression model).
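A residuals check of this kind might look as follows. The sketch is illustrative and rests on several assumptions: the last digit 6 is taken from the example above, the built-in `UKgas` series and a trend-plus-seasonal regression stand in for the real series and models, and the lag of 8 used for the Ljung-Box test and the ACF is an arbitrary choice.

```r
# IDs strictly between 1201 and 1230 whose last digit matches (digit 6 assumed).
digit <- 6
id_range <- 1202:1229
diag_ids <- id_range[id_range %% 10 == digit]    # 1206 1216 1226

# Placeholder series and model: log quarterly gas consumption, trend + seasonal dummies.
y <- log(UKgas)
train <- data.frame(y = as.numeric(y),
                    t = seq_along(y),
                    q = factor(as.integer(cycle(y))))
fit <- lm(y ~ t + q, data = train)
res <- residuals(fit)

# Typical checks: zero mean, no remaining autocorrelation, approximate normality.
lb      <- Box.test(res, lag = 8, type = "Ljung-Box")   # H0: no autocorrelation up to lag 8
sw      <- shapiro.test(res)                            # H0: residuals are normally distributed
res_acf <- acf(res, lag.max = 8, plot = FALSE)          # autocorrelations at lags 0 to 8

diagnostics <- c(mean_residual = mean(res),
                 ljung_box_p   = lb$p.value,
                 shapiro_p     = sw$p.value)
diagnostics
```

The same three checks can be repeated on the residuals of each method under comparison, so that the "optimal" method, the exponential smoothing model and the regression model are judged on identical criteria.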
model error measure
Regression models
Simple linear regression D
Multiple regression
ANOVA A
Logistic regression C
Exponential smoothing models (level, trend, seasonal)
SES (additive seasonal, multi seasonal) A
Naïve (trend, additive seasonal, multi seasonal) C
Moving average (trend) D
Global average
Holt’s linear trend C
Holt-Winters A
Damped trend A
ARIMA models (p, d, q)
AR A
Stationarity
MA A
Error measures
Scale-dependent (ME, MAE, MSE, ...)
Percentage measure (MPE, MAPE, ...)
Relative measure (MRAE, MdRAE, ...)