MIDDLESEX UNIVERSITY COURSEWORK 1
CST2330

Data Analysis for Enterprise Modelling
This assignment is worth 50% of the overall grade. The submission deadline is 19:00 on Friday, January 8, 2021 (Week 12).

Contents
1 The net present value (NPV) problem (10%)
2 Optimisation and linear programming (20%)
2.1 Solution using analysis and graphs (10%)
2.2 Solution using solver (10%)
3 Data import, plotting and transformation (20%)
3.1 Plot prices (5%)
3.2 Create prices table (10%)
3.3 Convert prices to log-returns (5%)
Software and data required
We recommend RStudio, an integrated development environment for the R language. It is available on the University computers from Apps Anywhere. It can also be installed on personal computers; a free copy is available at:
https://rstudio.com/products/rstudio/
You may need to load the following libraries in RStudio:
●lpSolve: to solve linear programming problems in Task 2.
●DBI: if you need to connect to and read databases in Task 3.
●xts: for extensible timeseries objects in Task 3.
Note, however, that all tasks in this coursework can be implemented in other languages (e.g. Python, Common Lisp, Julia) or in more visual tools such as KNIME or RapidMiner, and you may use them if you are more comfortable with them.
A specific dataset is required for Task 3 as well as for the tasks in Coursework 2. It is available on My Learning in the Data folder of the CST2330 page.

1 The net present value (NPV) problem (10%)
A car manufacturer is looking to launch a new range of electric vehicles (EVs) over the next five years. This will require an initial investment of £500M and has running costs of £100M per year after that. The predicted incomes from this range are:

Year  2020  2021  2022  2023  2024  2025
£M       0    10    50   200   300   500

Notice that the initial expenditure takes place in 2020.

1. Assume a discount rate of 1% per year, and perform an analysis of the discounted cashflow. You should create a data table in R with the variables Year, Outflow and Inflow in the first three columns:

year outflow inflow net n pvf
1 2020 500 0
2 2021 100 10
3 2022 100 50
4 2023 100 200
5 2024 100 300
6 2025 100 500

Write a function to analyse the cashflow table. This function should modify the table by adding columns showing the net returns, the number of years to discount, the present value factors and the discounted returns. Your function must be able to take different cashflow tables and discount rates as input, print the modified cashflow table and return the (total) net present value (NPV) of the investment. Include your code in the report. 5 marks
2. Decide whether or not this is a worthwhile investment, justifying your answer by your results. 1 mark
3. Calculate the NPV for discount rates of 2%, 3%, 4% and 5%, giving your answers to two decimal places. 1 mark
4. Plot NPV against discount rates ranging from 0% to 5%. 1 mark
5. Use your graph to estimate the discount rate that would give an NPV of zero. What is the significance of this discount rate? (Hint: What happens at discount rates lower and higher than this figure?) 2 marks
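As an illustration of the discounting steps described in item 1, the sketch below uses Python (which this brief permits as an alternative to R) on a hypothetical three-year cashflow, not the coursework data. The function name `analyse_cashflow`, the `discounted` column and the list-of-dictionaries table are assumptions made for this example only.

```python
def analyse_cashflow(table, rate):
    """Add net, n, pvf and discounted columns to each row and return the NPV.

    table: list of dicts with keys "year", "outflow", "inflow" (oldest year first).
    rate:  discount rate per year as a fraction, e.g. 0.05 for 5%.
    """
    base_year = table[0]["year"]
    npv = 0.0
    for row in table:
        row["net"] = row["inflow"] - row["outflow"]       # net return
        row["n"] = row["year"] - base_year                # years to discount
        row["pvf"] = 1.0 / (1.0 + rate) ** row["n"]       # present value factor
        row["discounted"] = row["net"] * row["pvf"]       # discounted return
        npv += row["discounted"]
    for row in table:
        print(row)                                        # show the modified table
    return npv


# Hypothetical cashflow used only to exercise the function.
example = [
    {"year": 2020, "outflow": 100.0, "inflow": 0.0},
    {"year": 2021, "outflow": 10.0, "inflow": 60.0},
    {"year": 2022, "outflow": 10.0, "inflow": 70.0},
]
npv_example = analyse_cashflow(example, rate=0.05)
print(round(npv_example, 2))
```

Calling the function again with a different `rate` recomputes the added columns, so the same table can be reused to tabulate or plot NPV over a range of discount rates.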

2 Optimisation and linear programming (20%)
2.1 Solution using analysis and graphs (10%)
A car manufacturer produces conventional cars, which generate £4,000 profit per car, and newly developed EVs, which generate £5,000 per car. The objective is to maximise profit by selecting the right combination of conventional cars and EVs, subject to the following constraints:
●The factory can produce up to 1,000 conventional cars and up to 500 EVs per day.
●The logistics and warehouse facilities do not allow for production of more than 750 vehicles per day in total (i.e. both conventional and EVs).
●The manufacturer has a contract to produce at least 100 conventional and 50 EVs per day.
Your task is to:
1. Write the objective function. 1 mark
2. Write the constraints for production and logistics. 2 marks
3. Plot or draw the feasible region and the isoprofit line. 3 marks
4. Find the optimal numbers of conventional cars and EVs to be produced (the solution) in order to achieve the maximum profit (the optimal value). 2 marks
5. Explain the solution analytically or using the graph. 2 marks
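The graphical method relies on the fact that a linear objective attains its maximum over a bounded feasible region at one of the region's vertices. The Python sketch below checks this on a hypothetical two-variable problem (maximise 3x + 2y subject to x ≤ 4, y ≤ 3, x + y ≤ 5, x ≥ 0, y ≥ 0); the numbers and variable names are illustrative assumptions, not the car-production problem.

```python
from itertools import combinations

# Each constraint is a*x + b*y <= c, stored as (a, b, c). Hypothetical values.
constraints = [
    (1, 0, 4),    # x <= 4
    (0, 1, 3),    # y <= 3
    (1, 1, 5),    # x + y <= 5
    (-1, 0, 0),   # x >= 0
    (0, -1, 0),   # y >= 0
]
profit = (3, 2)   # objective: 3x + 2y


def feasible(x, y, tol=1e-9):
    """True if the point satisfies every constraint (within a small tolerance)."""
    return all(a * x + b * y <= c + tol for a, b, c in constraints)


# Intersect every pair of boundary lines (Cramer's rule) and keep feasible points.
vertices = []
for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
    det = a1 * b2 - a2 * b1
    if det == 0:
        continue  # parallel boundaries never intersect
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    if feasible(x, y):
        vertices.append((x, y))

# The optimum is the vertex with the largest objective value.
best = max(vertices, key=lambda v: profit[0] * v[0] + profit[1] * v[1])
best_value = profit[0] * best[0] + profit[1] * best[1]
print(best, best_value)
```

On a plot, the same point is where the isoprofit line, pushed in the direction of increasing profit, last touches the feasible region.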

2.2 Solution using solver (10%)
Assume that the manufacturer adds hybrid vehicles to their range, which generate £4,500 profit per car. The factory can produce up to 1,000 hybrid cars per day, and there is no minimum production obligation for hybrids. The warehouse constraints remain the same.
Your task is to:

1. Write the updated objective function. 1 mark
2. Write the updated constraints for production and logistics. 2 marks
3. Use the lpSolve library in R to find the solution (an optimal combination of the three types of cars) and the optimal value (the maximum profit) for this problem. Include your code in the report. 5 marks
4. What happens if the contract to supply at least 100 conventional cars is replaced by a contract for the same number of hybrid cars? 2 marks

3 Data import, plotting and transformation (20%)
In this task, you should use the crypto-candles dataset, which contains daily exchange rates between major crypto-currencies from January 2019 to September 2020. This dataset can be downloaded from the Data folder on the course’s webpage (My Learning). There are two versions of this dataset:

crypto-candles.csv crypto-candles.db
The first is a comma-separated values (csv) file, and the second is an SQLite database. You can read the csv file with the read.csv command. Alternatively, you can connect to the database and read the table candles from the db file using the dbReadTable command. You should assign the result to a variable, which you may call candles. Note that if you read data from the database, then the timestamps have to be converted into dates by the command:
Regardless of which method you use, you should now have the same dataset, the first 6 rows of which are:
            TIMESTAMP  OPEN CLOSE  HIGH   LOW     VOLUME  SYMBOL
1 2020-09-16 01:00:00 10789 10803 10803 10789   12.05823 tAAABBB
2 2020-09-14 01:00:00 10316   500 10377   500 4658.04400 tAAABBB
3 2020-09-13 01:00:00 10459 10324 10586 10238  822.69780 tAAABBB
4 2020-09-12 01:00:00 10407 10443 10489 10297  499.03366 tAAABBB
5 2020-09-11 01:00:00 10355 10406 10415 10218  810.13170 tAAABBB
6 2020-09-10 01:00:00 10302 10355 10484 10271  901.84625 tAAABBB
The column SYMBOL contains the names of the trading pairs (e.g. tBTCUSD is the exchange rate between Bitcoin and the US Dollar). Thus, each row of the dataset contains a record of the prices (open, close, high, low) and volume data on a specified date (given by the TIMESTAMP) and for each trading pair (given by SYMBOL).
The goal of this task is to make several transformations of this dataset into other formats, so that it is ready for further analysis in Coursework 2.

3.1 Plot prices (5%)
Plot closing prices against time for several (2–5) trading pairs, as in the graph below for the tBTCUSD pair:

[Figure: closing prices of BTCUSD from 2019-01-02 to 2020-09-19; horizontal axis: date (Jan 2019 to Sep 2020), vertical axis: price (4000 to 12000).]

To do this, you will need to select subsets from the data corresponding to the trading pairs of your choice (e.g. tETHUSD, tETHBTC, tIOTBTC, tEOSJPY), and then select the columns TIMESTAMP and CLOSE. Note that before plotting the subset, you can convert it into the extensible timeseries format (xts) using the command:
where the argument is your subset for a specific trading pair. 5 marks

3.2 Create prices table (10%)
Convert the dataset into a new format, which contains the closing prices for each trading pair in different columns side-by-side and ordered according to their TIMESTAMP. Thus, each row should correspond to a specific date and contain closing prices for all trading pairs, as shown below:

           tAAABBB tABSUSD ...  tBTCEUR tBTCGBP  tBTCJPY tBTCUSD
2019-01-02      NA      NA ... 3577.195  3233.9 435370.0  4048.8
2019-01-03      NA  0.0075 ... 3441.500  3101.6 423820.0  3924.3
2019-01-04      NA  0.0078 ... 3476.100  3116.6 429750.0  3954.9
2019-01-05      NA      NA ... 3432.478  3075.2 424538.3  3911.9
2019-01-06      NA      NA ... 3653.900  3285.9 452870.0  4168.4
2019-01-07      NA  0.0077 ... 3583.700  3215.3 446690.0  4113.9
Note that the table above shows only some of the columns for just a few pairs (e.g. "tAAABBB", "tBTCUSD", etc). The dataset contains more than 270 trading pairs. One possible way of converting the candles is as follows:
●Create a list (or vector) all_pairs of all trading pair names. This can be done by selecting the SYMBOL column from the data, and then removing duplicates using the function unique.
●Create an empty variable prices that will be used to assemble closing prices for all pairs.
●Run a loop for (pair in all_pairs), in which you:
1. Select the TIMESTAMP and CLOSE columns from the subset for the pair and assign the result to a temporary table (e.g. call it temp).
2. Convert the result into class xts using CLOSE as data and ordering by TIMESTAMP (see the commands in Task 3.1).
3. Add the result to the prices table using the cbind function:

●When the loop finishes, name the columns: colnames(prices) <- all_pairs
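The reshaping described in the steps above (one row per date, one column per trading pair, missing entries where a pair has no price) can also be sketched without R. The Python example below is a minimal illustration using only the standard library; the three-row sample in the long format is a hypothetical excerpt, and the names `sample`, `rows` and `table` are assumptions made for the example.

```python
import csv
import io

# Hypothetical excerpt in the long (one row per date and pair) format.
sample = """TIMESTAMP,CLOSE,SYMBOL
2019-01-03,3924.3,tBTCUSD
2019-01-02,4048.8,tBTCUSD
2019-01-03,0.0075,tABSUSD
"""
rows = list(csv.DictReader(io.StringIO(sample)))

# Unique pair names, in order of first appearance.
all_pairs = []
for row in rows:
    if row["SYMBOL"] not in all_pairs:
        all_pairs.append(row["SYMBOL"])

# One dictionary per date; None marks a missing price (NA in R).
prices = {}
for row in rows:
    date = row["TIMESTAMP"]
    prices.setdefault(date, {pair: None for pair in all_pairs})
    prices[date][row["SYMBOL"]] = float(row["CLOSE"])

# ISO dates sort correctly as strings, so sorting the keys orders the rows by time.
dates = sorted(prices)
table = [[prices[d][p] for p in all_pairs] for d in dates]

print(all_pairs)
for d, values in zip(dates, table):
    print(d, values)
```

The dimensions of the result are `len(table)` rows by `len(all_pairs)` columns, which plays the role of dim(prices) in R.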
In your report, you should print the dimensions of the resulting table using the command dim(prices) and show the first 10 rows and 5 randomly chosen columns using the command:
3.3 Convert prices to log-returns (5%)
If s(t) and s(t + 1) are the prices on two consecutive days, then the difference s(t + 1) − s(t) is called a return, while the difference of their logarithms is called a log-return:
$$\ln s(t+1) - \ln s(t) = \ln \frac{s(t+1)}{s(t)}$$
Log-returns represent price changes and are more interesting for analysis and forecasting than the prices themselves. Thus, in this task you have to convert the prices table into a table of their log_returns. This can be done by the following command:
Notice the use of as.matrix to preserve the precision of numerical operations as well as the dates as the rownames. This allows us to convert the result into the timeseries format using as.xts, which uses the dates in the rownames as the time index.
The problem is that the prices table has some data missing: notice the NA (‘not available’) entries in the prices table. This is because prices for some pairs were not available on certain dates in the original dataset (e.g. some pairs were not traded during some periods and no prices were recorded). Thus, before the log-returns can be computed, the NA entries must be filled in with some values. This can be done in the following way:
1. Fill in the NA values with the last observations (i.e. the most recent available prices). This is equivalent to assuming that the price remained the same rather than being missing, and so the log-return will be zero rather than NA.
2. In cases where no earlier prices were observed (e.g. a pair was not traded before a certain date), fill in the NA values with the next observations. This means that the log-returns will also be zero in this case rather than NA.
If the prices table is in the xts (timeseries) format, then both operations above can be performed by the function na.locf, using its argument fromLast to control the direction of the operation.
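To make the effect of the two filling steps concrete, here is a minimal Python sketch on a single hypothetical price column, with None standing in for NA. The helper names `fill_forward`, `fill_missing` and `log_returns` are assumptions for this illustration; they mimic the behaviour described above and are not part of any library.

```python
import math


def fill_forward(values):
    """Replace each None by the most recent earlier observation, if any."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def fill_missing(values):
    """Fill forward first, then fill any leading gaps from the next observation."""
    forward = fill_forward(values)
    return fill_forward(forward[::-1])[::-1]


def log_returns(prices):
    """Differences of logarithms of consecutive prices."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]


# Hypothetical column: not traded on the first day, gaps on days 3 and 5.
column = [None, 100.0, None, 110.0, None]
filled = fill_missing(column)
returns = log_returns(filled)
print(filled)
print(returns)
```

Every filled-in day contributes a log-return of zero, so only the genuine price change from 100 to 110 produces a non-zero value.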
After filling in the NA values, convert the prices table into log_returns, as described above. In your report, you should print the dimensions of the resulting table using the command dim(log_returns) and show the first 10 rows and 5 randomly chosen columns using the command:
In addition, include a couple of plots of log-returns for some of the pairs (e.g. those used in Task 3.1). For example, below is the plot of log-returns of tBTCUSD:
Presentation
Your report should be well presented. A good guide is the Publication Manual of the American Psychological Association (e.g. see http://www.apastyle.org/). At the very least, your report should be a clear, typed or neatly hand-written document with good spelling and grammar, written in easy-to-understand English. There is no word limit, but a useful report should be just long enough to describe the work. A sensible limit is about 10 pages of typed text. Beyond this, you are probably being too verbose. Tables, graphs, careful labelling and numbering are all well-established and effective presentation tools.
Things to avoid are:
●Including images or diagrams that you did not create yourself and do not have the author's permission to use (even if the image is from the Internet).
●Including graphs or diagrams that you do not describe in the text.
●Forgetting to label the axes on the charts.
●Using 3D charts to display 2D information.
●Including material irrelevant to the work.