TASK 1
We will start with Ethereum. You will be scraping the information from the blockchain explorer,
parsing it to find the data you need, and building an in-memory database of the transactions.
(Beautiful Soup is suggested for scraping.)
* Use an Ethereum block explorer (recommended https://etherscan.io/txs; alternative
https://www.etherchain.org/) and spend some time looking through the Ethereum
transactions. How big are they? How often do they seem to occur? What patterns do you
see?
* You will be working with 6 fields: transaction ID, transaction timestamp (date and time),
source address, destination address, transaction amount, and mining/relayed pool. Each
transaction will be one row. Note: we do *not* need to include the reward notes such as
“generation fees” or “transaction fees”.
* Based on your knowledge from exploring Ethereum transactions, in your writeup document,
specify EXACTLY what format each field will be. (Examples: name: string of length between 3
and 30; age: integer between 0 and 120.) Also specify what you will do with any missing,
incomplete, or scrambled data. For non-transaction data, exclude the record entirely.
* Your code should display on the screen which block it is working on, then display all the
transactions in that block, before moving on to the next block.
* Write pseudo code in your writeup document to outline your program structure and flow.
Remember, there is no specific form for pseudo code, but it should help you plan how to code
and how to break your code into manageable chunks.
* Write your actual code, and once it is working properly, run it on 100 blocks (most recent
first). Comment your code.
* For now, do not save this data; we will be writing it to a data structure next week. However,
check that your output is COMPLETELY clean. No stray quotes or brackets or spaces, etc.
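The field specification, cleaning, and block-by-block display steps above can be sketched as
follows. This is only an illustration under stated assumptions, not a required design: the
function names (`clean_field`, `validate_row`, `display_block`), the field names, the timestamp
format, and the choice to drop any row that fails validation are all hypothetical, and the
sample rows are made up. The only format facts relied on are that an Ethereum transaction hash
is `0x` followed by 64 hex characters and an address is `0x` followed by 40 hex characters.

```python
import re
from datetime import datetime

# Hypothetical field names for the six fields listed in the task.
FIELDS = ["tx_id", "timestamp", "source", "destination", "amount", "pool"]
TX_HASH_RE = re.compile(r"^0x[0-9a-fA-F]{64}$")   # 32-byte hash in hex
ADDRESS_RE = re.compile(r"^0x[0-9a-fA-F]{40}$")   # 20-byte address in hex


def clean_field(raw):
    """Strip surrounding whitespace, quotes, and brackets from a scraped string."""
    return raw.strip(" \t\r\n'\"[](){}")


def validate_row(raw_row):
    """Return a cleaned row dict, or None if any field fails its specification."""
    row = {name: clean_field(raw_row.get(name, "")) for name in FIELDS}
    if not TX_HASH_RE.match(row["tx_id"]):
        return None  # non-transaction data such as reward notes is excluded
    if not (ADDRESS_RE.match(row["source"]) and ADDRESS_RE.match(row["destination"])):
        return None
    try:
        # Assumed timestamp layout; adjust to whatever the explorer shows.
        row["timestamp"] = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
        row["amount"] = float(row["amount"])
    except ValueError:
        return None
    if row["amount"] < 0:
        return None
    return row


def display_block(block_number, raw_rows):
    """Print the block being processed, then every valid transaction in it."""
    print(f"Block {block_number}")
    kept = []
    for raw in raw_rows:
        row = validate_row(raw)
        if row is None:
            continue
        kept.append(row)
        print(",".join(str(row[name]) for name in FIELDS))
    return kept


# Made-up rows: one messy but valid, one that is not a transaction at all.
example = {
    "tx_id": ' "0x' + "ab" * 32 + '" ',
    "timestamp": "2021-01-01 12:00:00",
    "source": "[0x" + "11" * 20 + "]",
    "destination": "0x" + "22" * 20,
    "amount": " 1.5 ",
    "pool": "'SomePool'",
}
display_block(123, [example, dict(example, tx_id="generation fees")])
```

A stricter specification might also handle contract-creation transactions, which have no
destination address; the sketch simply rejects them.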
TASK 2
Now we’re going to do the same thing, but for Bitcoin. This should go much faster as you’ll be
reusing ideas from your Ethereum work. You should also be reusing some of your code; at
this point, simply copying it and modifying it is fine.
* Use a Bitcoin explorer (such as https://www.blockchain.com/btc/blocks) to look at Bitcoin
transactions. How are they similar to Ethereum transactions? How are they different?
* For Ethereum we had each transaction in one row. Will that work with Bitcoin? Why or why
not? (Hint: Bitcoin has multiple senders and receivers per transaction.) Think of your OWN
WAY to handle the multiple address problem in Bitcoin. In your writeup document describe
how you changed your data structure, fields, processing, etc., to handle this difference. Also
include your Bitcoin field specifications like you did for Ethereum. (Hint: if your Ethereum and
Bitcoin field specifications are exactly the same, you did something wrong.)
* Write your pseudo code and include it in your writeup.
* Write your actual code, and when it is working properly run it on the 100 most recent blocks.
Comment your code.
* Again, check that your output is COMPLETELY clean. No stray quotes or brackets or
spaces, etc.
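As one possible starting point for the multiple address problem (you are still expected to
design your own approach), a "long" layout stores one row per input or output instead of one
row per transaction. The sketch below is an assumption, not the expected answer: the function
name `flatten_transaction`, the `role` and `index` fields, and the sample addresses and amounts
are all hypothetical.

```python
def flatten_transaction(tx_id, timestamp, inputs, outputs):
    """Turn one Bitcoin transaction into one row per input and per output.

    inputs and outputs are lists of (address, amount_btc) tuples. A coinbase
    transaction has no sender, so its inputs list would simply be empty.
    """
    rows = []
    for role, entries in (("input", inputs), ("output", outputs)):
        for index, (address, amount) in enumerate(entries):
            rows.append({
                "tx_id": tx_id,          # repeated so rows can be regrouped later
                "timestamp": timestamp,
                "role": role,            # "input" = sender, "output" = receiver
                "index": index,          # position within the inputs or outputs
                "address": address,
                "amount": amount,
            })
    return rows


# Made-up transaction with one sender and two receivers.
rows = flatten_transaction(
    "example_tx", "2021-01-01 12:00:00",
    inputs=[("addr_A", 1.0)],
    outputs=[("addr_B", 0.7), ("addr_C", 0.2999)],
)
for row in rows:
    print(",".join(str(value) for value in row.values()))
```

The trade-off is that the transaction ID is no longer unique per row, so the field
specification has to say which combination of fields identifies a row.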
When you are done, submit your writeup document and your two Python files. You do not
need to submit any of the transaction data now.
TASK 3
Be sure the transaction data you are gathering from above is totally clean.
Update your programs from Tasks 1 and 2 to build a data frame of transactions instead of writing to
the screen. Remember, data frames are statically sized, so you should build a data frame for each
block that you process, append it to a master data frame representing all your transactions, then
start a new data frame on the next block.
Write your full transaction data for Bitcoin and for Ethereum to CSV files and retain them for
your use in future projects.
Write a random sample of 1000 transactions for Bitcoin and Ethereum.
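The per-block data frame, master data frame, CSV, and random sample steps can be sketched with
pandas as below. This is a minimal sketch under assumptions: the column names, the helper names
(`block_to_frame`, `build_master`, `write_outputs`), and the sample rows are hypothetical.
Instead of appending to the master frame inside the loop, the sketch collects the per-block
frames in a list and joins them once with `pd.concat`, which avoids copying the master frame on
every block.

```python
import io

import pandas as pd

# Hypothetical column names; use your own field specification.
COLUMNS = ["tx_id", "timestamp", "source", "destination", "amount", "pool"]


def block_to_frame(block_number, rows):
    """Build one data frame for a single block from a list of row dicts."""
    frame = pd.DataFrame(rows, columns=COLUMNS)
    frame.insert(0, "block", block_number)  # remember which block each row came from
    return frame


def build_master(blocks):
    """blocks is an iterable of (block_number, rows); returns one combined frame."""
    frames = [block_to_frame(number, rows) for number, rows in blocks]
    return pd.concat(frames, ignore_index=True)


def write_outputs(master, full_target, sample_target, n=1000, seed=0):
    """Write the full data and a random sample of up to n rows as CSV."""
    master.to_csv(full_target, index=False)
    sample = master.sample(n=min(n, len(master)), random_state=seed)
    sample.to_csv(sample_target, index=False)
    return sample


# Made-up rows for two blocks; the targets may be file paths or open buffers.
row = {"tx_id": "t1", "timestamp": "2021-01-01 12:00:00", "source": "a",
       "destination": "b", "amount": 1.5, "pool": "SomePool"}
master = build_master([(2, [row, dict(row, tx_id="t2")]), (1, [dict(row, tx_id="t3")])])
full_csv, sample_csv = io.StringIO(), io.StringIO()
write_outputs(master, full_csv, sample_csv)
print(full_csv.getvalue())
```

Passing file names such as `"ethereum_full.csv"` (a hypothetical name) instead of the
`io.StringIO` buffers writes the files to disk.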
This week we dive into a particular component of statistical arbitrage trading strategies,
specifically pairs trading. Here is a basic overview.
In finance, many investors look for a stationary stock price series (note that we are
referring to the price here, not the stock’s return). Stationary means that the mean and
variance of the stock price series are constant over time (i.e., the mean of daily
prices for Tesla in July is the same as the mean of the daily prices for Tesla in June). The point
is, if we can identify stocks that are “stationary,” then it will be easier for us to predict the
future price movement of those stocks, because if we can confirm that the mean and variance
are fixed, then we can assume the stocks will follow a certain distribution pattern.
That said, it is not easy to find “stationary” stocks; most of the time, we can’t find any.
Therefore, people construct portfolios with 2 (or more) stocks to achieve a stationary
portfolio. In that case, those stocks are referred to as being “cointegrated.”
Cointegration is not the same as correlation.
Correlation is concerned with whether the daily returns of the 2 stocks move in the same
direction, whereas cointegration is concerned with whether the difference between the 2 stocks’
daily price series will diverge over the long term (i.e., for Stock X and Stock Y, the portfolio is
Z=a*X+b*Y; if the Z series is stationary, then we can do pairs trading by investing in a units of
Stock X and b units of Stock Y).
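To make the portfolio Z=a*X+b*Y concrete, the sketch below builds a synthetic pair of price
series and estimates the weights with an ordinary least squares regression of Y on X. This is an
illustration under assumptions: the data is simulated rather than real, the helper names
(`make_pair`, `ols_slope_intercept`) are hypothetical, and regression is only one common way to
choose the weights. It does not test for stationarity; a formal test such as the augmented
Dickey-Fuller test would be needed for that.

```python
import random
import statistics


def make_pair(n=1000, seed=42):
    """Simulate X as a random walk and Y as 2*X + 5 plus noise (made-up data)."""
    rng = random.Random(seed)
    x = [100.0]
    for _ in range(n - 1):
        x.append(x[-1] + rng.gauss(0, 1))       # non-stationary price series
    y = [2.0 * xi + 5.0 + rng.gauss(0, 1) for xi in x]
    return x, y


def ols_slope_intercept(x, y):
    """Least squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


x, y = make_pair()
hedge_ratio, intercept = ols_slope_intercept(x, y)

# Spread series: Y minus its fitted value from X.
z = [yi - hedge_ratio * xi - intercept for xi, yi in zip(x, y)]

print("estimated hedge ratio:", round(hedge_ratio, 3))
print("std of Y:", round(statistics.pstdev(y), 3))
print("std of spread Z:", round(statistics.pstdev(z), 3))
```

In the notation above, the spread corresponds to Z=a*X+b*Y with a = -hedge_ratio and b = 1, up
to a constant offset. Each individual series wanders, while the spread stays close to zero.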
Go to Yahoo Finance and download all historical price data for Bitcoin (BTC) and Ethereum
(ETH).
Load both, dropping adjusted close as it has no meaning for these items.
Add a column for Daily Return %.
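The load, drop, and daily return steps can be sketched with pandas as follows. The column names
(`Date`, `Close`, `Adj Close`) are assumed to match the downloaded CSV header, so check them
against your files; the helper name `load_prices` and the sample prices are hypothetical. The
return is computed from the closing price, which is also an assumption.

```python
import io

import pandas as pd


def load_prices(source):
    """Load a price CSV, drop adjusted close, and add a daily return column."""
    prices = pd.read_csv(source, parse_dates=["Date"])
    # errors="ignore" keeps this working if the column is already absent.
    prices = prices.drop(columns=["Adj Close"], errors="ignore")
    prices = prices.sort_values("Date").reset_index(drop=True)
    # Percent change from the previous day's close; the first row has no prior day.
    prices["Daily Return %"] = prices["Close"].pct_change() * 100
    return prices


# Made-up data in the assumed layout; pass a file path for real data.
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2021-01-01,95,105,90,100,100,1000\n"
    "2021-01-02,100,115,99,110,110,1200\n"
    "2021-01-03,110,111,95,99,99,900\n"
)
print(load_prices(sample_csv))
```

Calling `load_prices` once per downloaded file (for example, one for BTC and one for ETH) gives
two frames with matching columns.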