Using R Spring 2018
Assignment 1
The due date for this assignment is May 15.
1. Let’s introduce a package, "quantmod": a quantitative financial modelling and trading
framework for R, with its homepage at www.quantmod.com. This package offers a
variety of tools for downloading, extracting and displaying daily prices in the OHLC
format. (The abbreviation OHLC stands for open, high, low and close prices.)
(a) Install the package "quantmod" with the command install.packages("quantmod"). If it does
not work, please download the package and its dependencies from the web and install
them locally. Note that the data you download using "quantmod" come as an xts
object, which is constructed by the package "xts", so please install it as well. (Hint:
The command to download Yahoo's prices from Google Finance is getSymbols("YHOO",
src = "google"); the option src specifies the source from which the
data are to be downloaded. The currently available sources
include yahoo (the default), google, FRED and Oanda.) Please download S&P 500 index
prices from yahoo; the sample period is November 16, 2011 to April 5, 2012.
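Part (a) can be sketched as follows (requires an internet connection; the S&P 500 ticker on Yahoo is assumed to be "^GSPC"):

```r
# One-time installation of quantmod and its dependency xts
# (note the quotes around the package names)
if (!requireNamespace("quantmod", quietly = TRUE))
  install.packages(c("quantmod", "xts"))

library(quantmod)

# Download S&P 500 index prices (assumed ticker "^GSPC") from Yahoo
# for the sample period November 16, 2011 to April 5, 2012
getSymbols("^GSPC", src = "yahoo",
           from = "2011-11-16", to = "2012-04-05")

head(GSPC)  # an xts object with OHLC, volume and adjusted-close columns
```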
(b) Using "View" to view the data you download. And what we need is the last
column of the object, the adjusted index prices. Now please take log of this series
and plot the log price series and acf of the log price series. Comment your results.
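A minimal sketch of this step, using a simulated price path as a stand-in for the downloaded adjusted-close series (with the real data, use e.g. Ad(GSPC)):

```r
# Simulated prices standing in for the S&P 500 adjusted close
set.seed(1)
price <- 1200 * cumprod(1 + rnorm(100, mean = 5e-4, sd = 0.01))
logp  <- log(price)

plot(logp, type = "l", main = "Log price")
acf(logp, main = "ACF of log price")  # slow decay suggests trend/unit-root behavior
```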
(c) Load the packages ’forecast’, ’tseries’, ’xts’ and ’fUnitRoots’, then apply
different ADF tests to this series and comment on your results.
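A sketch of two ADF variants, again on a simulated stand-in series; tseries::adf.test uses a constant-plus-trend specification, while fUnitRoots::adfTest lets you choose the deterministic terms explicitly:

```r
library(tseries)     # adf.test
library(fUnitRoots)  # adfTest, with "nc"/"c"/"ct" specifications

set.seed(1)
logp <- log(1200 * cumprod(1 + rnorm(100, 5e-4, 0.01)))  # stand-in series

adf.test(logp)                        # constant + trend, automatic lag choice
adfTest(logp, lags = 4, type = "c")   # constant, no trend
adfTest(logp, lags = 4, type = "ct")  # constant and trend
```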
(d) Regress the log price series on a linear trend model, then plot the residuals and their
ACF and PACF. Comment on the results.
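The linear-trend regression can be sketched with base R (simulated stand-in data again):

```r
set.seed(1)
logp  <- log(1200 * cumprod(1 + rnorm(100, 5e-4, 0.01)))  # stand-in series
trend <- seq_along(logp)

fit <- lm(logp ~ trend)   # linear trend model
res <- residuals(fit)

plot(res, type = "l", main = "Residuals from linear trend")
acf(res)
pacf(res)
```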
(e) Fit an ARMA model to the residuals. You can use auto.arima() to choose
the model. Print your results.
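Automatic order selection on the trend residuals might look like this (auto.arima picks p, d, q by AICc):

```r
library(forecast)

set.seed(1)
logp <- log(1200 * cumprod(1 + rnorm(100, 5e-4, 0.01)))  # stand-in series
res  <- residuals(lm(logp ~ seq_along(logp)))

fit <- auto.arima(res)  # automatic ARMA order selection
summary(fit)
```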
(f) Take the difference of the log prices, which gives the returns of the series. Plot the
returns and their ACF and PACF.
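Differencing the log prices is one line in base R:

```r
set.seed(1)
logp <- log(1200 * cumprod(1 + rnorm(100, 5e-4, 0.01)))  # stand-in series
ret  <- diff(logp)  # log returns = first difference of log prices

plot(ret, type = "l", main = "Log returns")
acf(ret)
pacf(ret)
```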
(g) Use an ARIMA(2,1,2) model to fit the log price series and an ARMA(2,2) model to fit
the difference of the log price series. Print your results.
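A sketch with base R's arima(), on a simulated integrated ARMA(2,2) series standing in for the log prices; the two fits should produce similar ARMA coefficients:

```r
set.seed(2)
# Simulated integrated ARMA(2,2) stand-in for the log price series
x <- cumsum(arima.sim(model = list(ar = c(0.5, -0.3), ma = c(0.4, 0.2)), n = 200))

fit1 <- arima(x,       order = c(2, 1, 2))  # ARIMA(2,1,2) on the levels
fit2 <- arima(diff(x), order = c(2, 0, 2))  # ARMA(2,2) on the differences
fit1
fit2  # fit2 additionally estimates a mean (intercept) term
```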
2. The efficient market hypothesis (EMH) in finance assumes that asset prices are fair,
that information is accessible to everybody and is assimilated rapidly to adjust prices, and
that people are rational. The price is right, and there exist no arbitrage opportunities.
(a) A widely used form of the EMH is
$$r_t = \mu + \varepsilon_t, \qquad \varepsilon_t \sim WN(0, \sigma^2)$$
where $r_t = \log(P_t/P_{t-1})$ denotes the return and $P_t$ is the price of the security. In fact,
we have three forms of assumptions about the innovation $\varepsilon_t$. The first is the
white-noise assumption stated in the model above. The second is that $\varepsilon_t$
is a martingale difference sequence, in the sense that for any $t$
$$E(\varepsilon_t \mid r_{t-1}, r_{t-2}, \ldots) = E(\varepsilon_t \mid \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots) = 0.$$
The third is that $\varepsilon_t \sim IID(0, \sigma^2)$. Among these three assumptions, which one is
the strongest? Which one is the weakest?
(b) Suppose $\varepsilon_t = \sigma_t \eta_t$, where $\eta_t \sim IID\,N(0,1)$ and $\sigma_t = \sqrt{a + b\, r_{t-1}^2}$. Suppose
$a > 0$, $b > 0$, $r_t$ is stationary and $E(r_t^2)$ exists. What are the conditional mean and
variance of $\varepsilon_t$? What are the unconditional mean and variance? Is $\varepsilon_t$ a white noise,
martingale difference or IID sequence?
(c) An implication of the EMH is that the return $r_t$ is unpredictable; that is, the conditional
expectation of $r_t$ based on the information up to time $t-1$ is the same as $\mu$, so
past information is not useful for predicting $r_t$. Is this condition satisfied
under the three different assumptions?
(d) In practice, there are two ways to test the EMH. The first is to test whether $r_t$ is
a white noise; the commonly used statistic is the Ljung-Box Q statistic. This test
statistic in fact tests whether the time series is uncorrelated, and it is defined as
$$Q_m = T(T+2) \sum_{j=1}^{m} \frac{\hat{\rho}_j^2}{T-j}$$
where
$$\hat{\rho}_j = \mathrm{Corr}(r_t, r_{t-j}) = \frac{\mathrm{cov}(r_t, r_{t-j})}{\sqrt{\mathrm{var}(r_t)\,\mathrm{var}(r_{t-j})}}$$
and $m$ is a predetermined lag. Apply the Q statistic to the monthly
log return ($\log(P_t/P_{t-1})$) of the S&P 500 index over January 1985 to February 2011, and to
the square of the monthly log return, for $m = 1, 6, 12, 24$. Show your results.
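The Ljung-Box test is available in base R as Box.test(). A sketch on a simulated return series standing in for the monthly S&P 500 log returns:

```r
set.seed(3)
r <- rnorm(300, mean = 0.005, sd = 0.04)  # stand-in monthly log returns

for (m in c(1, 6, 12, 24)) {
  print(Box.test(r,   lag = m, type = "Ljung-Box"))  # test on returns
  print(Box.test(r^2, lag = m, type = "Ljung-Box"))  # test on squared returns
}
```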
(e) Another way is to rewrite the EMH as
$$\log P_t = \log P_{t-1} + \varepsilon_t$$
so that $\log P_t$ is a random walk. Apply the ADF test to the monthly log return
($\log(P_t/P_{t-1})$) of the S&P 500 index over January 1985 to February 2011 and show
your results.
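A sketch of the ADF test on a stand-in return series (on actual returns, the test typically rejects a unit root, consistent with stationary returns):

```r
library(tseries)

set.seed(3)
r <- rnorm(300, 0.005, 0.04)  # stand-in monthly log returns

adf.test(r)  # augmented Dickey-Fuller test
```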
3. Time series analysis can be used in a multitude of business applications for forecasting
a quantity into the future and explaining its historical patterns. Here are just a few
examples of possible use cases: 1) explaining seasonal patterns in sales; 2) predicting
the expected number of incoming or churning customers; 3) estimating the effect of
a newly launched product on the number of sold units; 4) detecting unusual events and
estimating the magnitude of their effect. Here we use a dataset on the number of
bicycle checkouts from a bike-sharing service, aggregated at the daily level.
Bike-sharing systems are a new generation of traditional bike rentals,
in which the whole process from membership to rental and return has become automatic.
Through these systems, a user is able to easily rent a bike from a particular position
and return it at another position. Currently, there are over 500 bike-sharing
programs around the world, comprising over 500 thousand bicycles. Today,
there is great interest in these systems due to their important role in traffic, en-
vironmental and health issues. Apart from the interesting real-world applications of bike-
sharing systems, the characteristics of the data generated by these systems make
them attractive for research. As opposed to other transport services such as bus or
subway, the duration of travel and the departure and arrival positions are explicitly recorded in
these systems. This feature turns a bike-sharing system into a virtual sensor network
that can be used for sensing mobility in the city. Hence, it is expected that most
important events in the city could be detected by monitoring these data.
(a) Load the packages and data. The bike-sharing rental process is highly correlated with
environmental and seasonal settings. For instance, weather conditions, precip-
itation, day of week, season, hour of the day, etc. can affect rental behavior.
The core dataset is the two-year historical log for the years
2011 and 2012 from the Capital Bikeshare system, Washington D.C., USA, which is
publicly available at http://capitalbikeshare.com/system-data. Please load the
packages ’forecast’, ’tseries’, ’xts’ and ’fUnitRoots’ and the dataset aggregated at the
daily level, named ’day.csv’.
(b) Examine Your Data. The variable ’cnt’ in ’day.csv’ denotes the count of
total rental bikes, including both casual and registered users. A good starting point is to
plot the series and visually examine it for any outliers, volatility, or irregularities.
Please describe your findings. R provides a convenient method for removing time
series outliers: tsclean(), part of its forecast package. tsclean() identifies and
replaces outliers using series smoothing and decomposition. Note that you should
first use the ts() command to create a time series object to pass to tsclean().
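A sketch of this step on simulated daily counts standing in for the ’cnt’ column (with the real data, use day$cnt from read.csv("day.csv")); two artificial outliers are injected so tsclean() has something to repair:

```r
library(forecast)

# Two years of simulated daily counts with weekly seasonality
set.seed(4)
cnt <- 4500 + 800 * sin(2 * pi * seq_len(730) / 7) + rnorm(730, 0, 300)
cnt[c(100, 400)] <- c(20000, 50)  # injected artificial outliers

count_ts    <- ts(cnt, frequency = 7)  # weekly frequency for daily data
count_clean <- tsclean(count_ts)       # identify and replace outliers

plot(count_ts)
lines(count_clean, col = "red")
```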
(c) Even after removing outliers, the daily data may still be pretty volatile. Visually,
we could draw a line through the series tracing its bigger troughs and peaks
while smoothing out noisy fluctuations. This line can be described by one of
the simplest, but also very useful, concepts in time series analysis, known as
a moving average. It is an intuitive concept that averages points across several
time periods, thereby smoothing the observed data into a more stable, predictable
series. Smooth the data cleaned by tsclean() with the command ma(), using order
7, and plot your results.
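A minimal sketch of the order-7 (weekly) moving average, on the same simulated stand-in series:

```r
library(forecast)

set.seed(4)
count_ts <- ts(4500 + 800 * sin(2 * pi * seq_len(730) / 7) + rnorm(730, 0, 300),
               frequency = 7)

count_ma <- ma(count_ts, order = 7)  # centred 7-day moving average

plot(count_ts, col = "grey")
lines(count_ma, col = "red")
```

Note that a centred moving average of odd order 7 leaves the first and last 3 observations undefined (NA).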
(d) Decompose Your Data. Use the command stl() to decompose your data.
Deseasonalize the data you get from (c), and plot the deseasonalized data.
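Decomposition and deseasonalization can be sketched with base R's stl(); subtracting the seasonal component is equivalent to forecast::seasadj(decomp):

```r
set.seed(4)
count_ts <- ts(4500 + 800 * sin(2 * pi * seq_len(730) / 7) + rnorm(730, 0, 300),
               frequency = 7)

decomp <- stl(count_ts, s.window = "periodic")  # seasonal/trend/remainder
plot(decomp)

# Deseasonalize by removing the seasonal component
deseason <- count_ts - decomp$time.series[, "seasonal"]
plot(deseason, main = "Deseasonalized series")
```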
(e) Stationarity. Check the stationarity of the deseasonalized data. If it is not
stationary, transform it into a stationary series. Then plot the ACF and PACF of the desea-
sonalized data and of its difference.
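A sketch of the stationarity check and differencing, on a simulated stand-in for the deseasonalized series:

```r
library(tseries)

set.seed(4)
deseason <- ts(4500 + rnorm(730, 0, 300), frequency = 7)  # stand-in series

adf.test(deseason)          # unit-root test for stationarity
count_d1 <- diff(deseason)  # first difference, in case of non-stationarity

acf(deseason);  pacf(deseason)
acf(count_d1);  pacf(count_d1)
```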
(f) Fitting an ARIMA model. Use the command auto.arima() to determine the
AR, MA and integrated orders of the ARIMA model.
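Automatic order selection might look like this (seasonal = FALSE is an assumption for the deseasonalized series; drop it to allow seasonal terms):

```r
library(forecast)

set.seed(4)
deseason <- ts(4500 + rnorm(730, 0, 300), frequency = 7)  # stand-in series

fit <- auto.arima(deseason, seasonal = FALSE)  # selects p, d, q by AICc
summary(fit)
```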
(g) Evaluate and Iterate. So now we have fitted a model that can produce a
forecast, but does it make sense? Can we trust this model? We can start by ex-
amining the ACF and PACF plots of the model residuals. If the model order parameters and
structure are correctly specified, we would expect no significant autocorrelations to be
present. Then we can adjust the orders in the model from (f) to make the model
residuals uncorrelated.
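The residual diagnostics can be sketched with base R (the ARMA(1,0) order here is only a placeholder for whatever (f) selected):

```r
set.seed(4)
deseason <- ts(4500 + rnorm(730, 0, 300), frequency = 7)  # stand-in series
fit <- arima(deseason, order = c(1, 0, 0))  # placeholder for the model from (f)

res <- residuals(fit)
acf(res);  pacf(res)                         # look for remaining autocorrelation
Box.test(res, lag = 14, type = "Ljung-Box")  # formal check of uncorrelatedness
```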
(h) Forecasting. Use the command forecast() to get the 1- to 20-step-ahead forecasts with the
model from (g) and plot them.
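The final forecasting step can be sketched as:

```r
library(forecast)

set.seed(4)
deseason <- ts(4500 + rnorm(730, 0, 300), frequency = 7)  # stand-in series
fit <- auto.arima(deseason, seasonal = FALSE)             # model from (f)/(g)

fc <- forecast(fit, h = 20)  # 1- to 20-step-ahead forecasts
plot(fc)                     # point forecasts with prediction intervals
```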