MXB361 Module 1: Complex Systems and Networks

MXB361

Module 1: Complex Systems and Networks

Assessment Part A: Complex Systems and Computation

This assignment counts for 60% of the MXB361 final score and focusses on the

material covered in weeks 1-6 of Module 1. It is due at midnight on the Tuesday of the week

following the last week of teaching of Semester 1. This is to be completed individually. The

answers will be run through plagiarism checking software (please don’t even think about it!).

Your submissions should be in the form of a report answering the questions below. You must

provide a code sample for each question where code writing is involved; and images as

requested. Give all answers to at least 2 decimal points.

Part I (Complex Systems Quantitative Analysis – 40 marks in total)

You have just landed a dream job as the head of strategy at an evil but old-fashioned,

non-quantitative foreign exchange trading hedge fund, having out-muscled, out-witted and outcomputed your rivals. You have been given a remit by the boss of the firm, Mr Fatman Rich,

to revolutionise their inept investment strategies. This you are determined to accomplish, in

order to get a massive bonus so you can buy yourself a huge jacuzzi and fill it with chocolate

and gold, or whatever other material goods you desire.

The fund invests only in the top 22 most traded “tickers”, which includes 19 national

currencies, as well as bitcoin (BTC), gold (XAU) and silver (XAG). Because you took

MXB361, your first step is to download the firm’s stock market data with a view to devising a

new, computationally sound investment strategy. The data set (‘data’ matrix) has the past 7110

days (about 30 years) of ticker prices for the 22 tickers – these are the prices daily of each

currency against the US dollar. The ticker prices, corresponding to the codes for each currency

are in the ‘tickers’ matrix and the dates are in the ‘times’ matrix.

1. Load the data into MATLAB. Look only at the last 504 days of trading prices.

Divide each ticker’s price by its initial opening price 504 days ago so that they all

start at 1. Plot these corrected prices against time (for the past two years) for all

tickers. Plot, in a 22 x 22 plot array, all the corrected prices against one another.

Provide the code you used and the respective images in an image format of your

choice. Briefly describe 5 things you see. (5 marks).

2. Using your corrected prices over the past two years of data, compute the corrected

daily returns (i.e. the % daily change in the value) of each ticker. Using this,

compute the correlation matrix of daily returns. Check in MATLAB whether the

matrix is symmetric and provide evidence whether it is or not. Visualise the

correlations matrix appropriately. What is the determinant of this matrix? In 10

words or less, give a geometric interpretation of the determinant of a matrix (5

marks).

3. The Marcenko-Pastur distribution describes asymptotic behaviour of the

eigenvalues of random matrices. Plot this distribution as a family of curves for

different parameter values, N and T, respectively, where N is the number of series

and T is their length. Give expressions for the two points along the x-axis at which

the MP distribution curves departs from the horizontal axis (i.e. where the MP

distribution is > 0), as functions of N and T. What is the significance of any of the

eigenvalues of a given non-random matrix derived from time series data falling

between these two values? What about those falling outside the range? (5 marks).

4. Study the network derived from the correlation matrix you just computed. Is it a

directed or undirected matrix? Plot a histogram of its eigenvalues and comment on

it (100 words). Plot the distribution of the node degrees and comment on it (100

words). What is the mean correlation between any two randomly chosen ticker price

returns in the past year? What is the distribution of the correlations between any

two randomly chosen ticker price returns in that time? What is the standard

deviation? Comment on the above in 250 words or less (5 marks).

5. Your evil employer’s investment strategy focusses around choosing currencies that

its analysts believe are geographic regions (Asia, Europe, etc) that are “on the rise”

and avoiding those that are “falling” relative to the USD. Naturally, having taken

MXB361, you are sceptical of such an approach absent evidence in its favour. You

decide to use community detection methods to see if there are any distinct

communities in the ticker returns over the last year and if so, if they overlap with

the GICS sectors. Your first step is to write your own community detection code.

Choose an algorithm you’ve come across (such as Potts, Louvain, etc.) or devise

your own and write a piece of code in MATLAB that takes as input an adjacency

matrix (for example, a correlation matrix) and maximises the modularity, outputting

the communities within the network and a value for the estimated maximum

modularity. Provide evidence that the method works on a small network you have

devised (5 marks).

6. In order to feed a noise-cleaned network into your community detection code, you

need to eliminate some eigenvectors from the correlations matrix you computed.

Based on you answer to Q5 above, set the eigenvalues that you think are

indistinguishable from noise to 0 and reconstitute the matrix using only the nonnoise eigenvalues/vectors. Study the degree distribution of this new

matrix/network. How, if at all, is it different from your answer in Q5? What

proportion of the eigenvalues were eliminated in this way? What does this

proportion tell you about the level of noise in the initial correlation matrix? How

would this all change if the values of T and N changes? In particular, for what

combination(s) of T and N is the correlation matrix indistinguishable from a random

matrix, and justify the reason for this claim using the expressions you reported in

your answer from Q3 (5 marks).

7. Apply your community detection code to the new, noise-cleaned matrix in order to

find modules in your data. Are there any and if so, how many? What is the

modularity maximum your code found? Plot on a world map bubbles/dots

corresponding to each currency placed over the geographic coordinates of each

country’s capital city (for bitcoin, gold and silver, just place the dots somewhere

“empty” to the side, e.g. over the Pacific). Provide the image and code. Does the

membership of these modules overlap with the geographic regions? If not, how big

is the difference? Do their “locations” make any sense, with respect to their

economic activity (you may need to do a bit of research on that latter aspect)? Where

do bitcoin, gold and silver localise to (obviously, they don’t belong to any country),

module membership-wise? Any comments on these findings? Finally, what is the

implication of these findings for your firm’s current investment strategy in Q6? (5

marks).

8. Based on the above, devise a new, computationally driven investment strategy for

your firm. Explain the outlines of your strategy to your CEO in a one-page

document that includes one image, along with an estimate of the % return your

strategy would make yearly. You may wish to discuss the effect of over-fitting

and/or keep some data for model fitting and some for validation. (5 marks).

Part II (Computational Complexity and OtherAreas of Module I – 20 marks in total)

9. “We are living in a period in human history in which new data is generated at such

a rate and mis-analysed so badly that most of us can’t make sense of the world

around us. This is similar to the 1500s when, following the invention of the printing

press but before literacy was widespread, there followed perhaps the bloodiest

period in human history”. Do you agree or disagree? Why or why not? Discuss in

500 words or so (10 marks).

10. The permanent of a matrix is defined exactly like the determinant but instead of

alternating signs between the terms, it’s all +’s, like this:

Write code to compute the permanent for a given n x n real-valued matrix. What

is the computational complexity (in time) of your algorithm? Can you find a way

to do any better? It’s thought that the permanent is computationally intractable – i.e. exponential time (in n) is required for its computation, a problem related to

“P?=NP” in ways that we don’t yet understand. Why would the determinant be

polynomial time computable (it is, otherwise we couldn’t solve systems of linear

equations or do pretty much anything useful with a computer), but the permanent,

so similar in its definition, be intractable to compute? Discuss in 500 words or so

(10 marks).

联系我们