
Financial Technology: Methods and Practice

Final Exam

Due: May 6, 2024 (Midnight Central Time)

Instructions:

- This is a take-home, individual exam. There are 29 tasks in total: 14 in Part I, 13 in Part II, and 2 in Part III.

  - Collaboration on the final exam violates academic integrity.

- Submissions are due at midnight (Central Time) on May 6. Technical problems are not valid grounds for a deadline extension.

- Your submission must include the following:

  1. A concise write-up that contains all your answers, including all necessary illustrations (e.g., critical code, plots, and tables, if applicable). Acceptable formats are .docx, .pdf, and .html.

     - Failure to submit a complete write-up may result in a 20% deduction from your final exam score.

  2. Your full code in one or two separate files. Acceptable formats are .R, .ipynb, .Rmd, and .py.

     - Non-executable code may result in a final exam score close to 0.

     - Code execution time should be reasonable (under 10 minutes for Part I and under 30 minutes for Part II). Longer execution times may result in a 10% deduction from your final exam score.

       - You may earn a small bonus if your code runs very quickly.

     - The following packages may be used in the exam:

       - R: dplyr, lubridate, tidyr, ggplot2, zoo, lfe (felm)

       - Python: pandas, numpy, statsmodels, linearmodels (linearmodels.iv), matplotlib

- Use of ChatGPT or other AI tools for storytelling/open questions is strictly prohibited.

  - Including phrases such as "I am an AI tool" in an answer will result in a grade of 0 for that question. I may also report potential violations of Olin’s Academic Integrity policies.

- Include proper, verifiable citations for storytelling/open questions (if applicable).

Part I: Consumer credit sub-project

Introduction

In this exercise, you will examine whether more restrictive debt collection regulations lead consumers to borrow from non-traditional credit sources such as payday lenders. Creditors, especially traditional creditors such as credit card companies, commonly turn to third-party debt collectors to collect past-due payments. Payday lenders, by contrast, do not usually rely on third-party debt collectors.

On Canvas, under Files–second half–final exam–data–Part I, you can find the following two datasets:

a) state.csv: contains macroeconomic variables (e.g., unemployment rates and income per capita) for each state over time. It also includes the year in which a state adopted restrictions on debt collection practices (variable legislation_year). This data is at the state-year level.

b) consumer_credit.xlsx: contains consumer credit information. Consumers in this dataset reside in counties on state borders. This data is at the consumer-state-year level.

Tasks:

Read and process state legislation data

1. Load state.csv into R (or Python) as a data frame ‘state’. Report the number of observations, mean, median, standard deviation, min, and max of state income per capita ("income"), state population ("pop"), and state medical expenditures per capita ("health_exp").

2. Create a variable index (index_{s,t}), which equals 0 before the first debt collection legislation change in state s, 1 from the year of the first legislation change onward, 2 from the year the same state enacts a second legislation change, and 3 from the year it enacts a third. For instance, Illinois had two regulation changes, in 2005 and 2013, so its index should equal 0 during 2000-2004, 1 during 2005-2012, and 2 in 2013 and afterward. Alabama never had a regulation change during the sample period, so its index should always equal 0. This variable measures the restrictiveness of debt collection legislation.
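A minimal pandas sketch of the index construction, assuming the legislation change years for each state have already been gathered from legislation_year into a dictionary (the name change_years and the toy panel are hypothetical, not the exam data). The index is the number of change years that are less than or equal to the current year, which reproduces the Illinois and Alabama examples.

```python
import pandas as pd

# Hypothetical toy panel: one row per state-year (the real state.csv layout may differ).
state = pd.DataFrame({
    "state": ["IL"] * 4 + ["AL"] * 4,
    "year": [2004, 2005, 2012, 2013] * 2,
})

# Assumption: change years per state were collected from legislation_year beforehand;
# states with no change map to an empty list.
change_years = {"IL": [2005, 2013], "AL": []}

def count_changes(row):
    """Number of legislation changes enacted in or before this state-year."""
    return sum(y <= row["year"] for y in change_years.get(row["state"], []))

state["index"] = state.apply(count_changes, axis=1)
print(state)
```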

3. Create a variable post (post_{s,t}). Its value is 0 before the first legislation change in state s and 1 from that year onward. For instance, for Illinois its value should equal 0 during 2000-2004 and 1 starting from 2005. For Alabama, its value should always equal 0.

4. Create a variable treatment (treatment_s). Its value equals 1 for states that adopt restrictions on debt collection practices and 0 otherwise. It should always equal 1 for Illinois and 0 for Alabama.
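One possible way to derive post and treatment from index, sketched in pandas on a hypothetical toy panel. Here post is 1 whenever index is at least 1, and treatment is 1 for any state whose post equals 1 in at least one sample year. This treats a state whose first change falls after the sample ends as untreated.

```python
import pandas as pd

# Toy panel that already carries index from task 2 (values are illustrative).
state = pd.DataFrame({
    "state": ["IL", "IL", "IL", "AL", "AL", "AL"],
    "year": [2004, 2005, 2013, 2004, 2005, 2013],
    "index": [0, 1, 2, 0, 0, 0],
})

# post = 1 from the first legislation change onward, i.e. whenever index >= 1.
state["post"] = (state["index"] >= 1).astype(int)

# treatment = 1 for every row of a state that is treated in some sample year.
state["treatment"] = state.groupby("state")["post"].transform("max")
print(state)
```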

Read and process consumer credit data

5. Load consumer_credit.xlsx as a data frame ‘consumer_credit’. Report the mean, median, and standard deviation of these four variables: credit score, W2 income, payday loan amount (“payday_loan”), and total traditional loan amount (“total_loan”). Present the summary statistics in one table.

6. Create two variables: the logarithm of the total traditional loan amount and the logarithm of the payday loan amount. Set each log variable to 0 when the corresponding amount (payday loan or total traditional loan) is 0.
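A short sketch of the zero-safe log transform, assuming the columns are named payday_loan and total_loan as in task 5 (the output names log_payday_loan and log_total_loan are hypothetical). This rule, log(x) for x > 0 and 0 otherwise, is not the same as np.log1p, which computes log(1 + x).

```python
import numpy as np
import pandas as pd

# Illustrative loan amounts; zeros must map to 0 instead of -inf.
df = pd.DataFrame({"payday_loan": [0.0, 500.0, 1.0], "total_loan": [2000.0, 0.0, 10.0]})

for col in ["payday_loan", "total_loan"]:
    amounts = df[col]
    # Non-positive amounts become NaN before np.log (no warning), then np.where sets them to 0.
    df["log_" + col] = np.where(amounts > 0, np.log(amounts.where(amounts > 0)), 0.0)

print(df)
```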

7. Generate a variable called ‘income_quintile’ within each year. Its value ranges from 1 to 5, where 1 represents the lowest W2 income group and 5 the highest. Create a bar chart with the income quintile on the x-axis and the median total traditional loan amount for each quintile on the y-axis.
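A pandas sketch of within-year quintiles on toy data, assuming hypothetical column names w2_income and total_loan. pd.qcut with labels=False returns bins 0-4, so 1 is added to obtain quintiles 1-5.

```python
import pandas as pd

# Toy consumer-year data; w2_income and total_loan are assumed column names.
cc = pd.DataFrame({
    "year": [2010] * 5 + [2011] * 5,
    "w2_income": [10, 20, 30, 40, 50, 15, 25, 35, 45, 55],
    "total_loan": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})

# Quintiles computed separately within each year: 1 = lowest income, 5 = highest.
cc["income_quintile"] = cc.groupby("year")["w2_income"].transform(
    lambda s: pd.qcut(s, 5, labels=False) + 1
)

# Median total traditional loan amount per quintile: the input for the bar chart.
median_loan = cc.groupby("income_quintile")["total_loan"].median()
print(median_loan)
# With matplotlib installed, median_loan.plot(kind="bar") would draw the chart.
```

If many consumers share the same income, pd.qcut can fail because of non-unique bin edges. A common workaround is to rank first, e.g. pd.qcut(s.rank(method="first"), 5, labels=False).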

8. Merge the consumer_credit data frame with the state data frame by state and year.

Regressions

9. Regress each credit outcome on index, controlling for W2 income. The credit outcomes are credit score, the log of total traditional loan amount, and the log of payday loan amount. Report the coefficients (along with their t-statistics and p-values) for the three regressions.
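A numpy-only sketch of an OLS regression that reports coefficients, standard errors, t-statistics, and p-values, run on simulated data rather than the exam data. The helper name ols is hypothetical. Its p-values use a large-sample normal approximation via math.erfc. statsmodels, which is on the allowed package list, reports exact t-distribution p-values and is usually more convenient.

```python
import math
import numpy as np

def ols(y, X):
    """OLS with an intercept: coefficients, classical SEs, t-stats, two-sided p-values.

    p-values use a large-sample normal approximation, not the exact t distribution.
    """
    X = np.column_stack([np.ones(len(y)), X])            # prepend intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)                      # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    t = beta / se
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in t])  # P(|Z| > |t|)
    return beta, se, t, p

# Simulated data (not the exam data); the true coefficients 5 and 1.0 are arbitrary.
rng = np.random.default_rng(0)
n = 5000
index = rng.integers(0, 3, n)
income = rng.normal(50, 10, n)
score = 600 + 5 * index + 1.0 * income + rng.normal(0, 20, n)

beta, se, t, p = ols(score, np.column_stack([index, income]))
print(f"index: coef={beta[1]:.3f}, se={se[1]:.3f}, t={t[1]:.2f}, p={p[1]:.4f}")
```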

10. Run the following 2SLS regression:

Stage 1: Regress log of total traditional loan amount on index, controlling for W2 income.

Stage 2: Regress log of payday loan amount on the fitted value of log of total traditional loan amount, controlling for W2 income.

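Show your second-stage results. Your output should report the correct 2SLS standard errors, which differ from the standard errors obtained by naively running the second stage as OLS on the fitted values.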
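A small numpy simulation (hypothetical data, not the exam data) shows why a manual two-step procedure gives the wrong standard errors. The 2SLS coefficients come from the second-stage regression on the fitted values, but the residuals used for the variance must be computed with the actual regressor x rather than x_hat. The sketch computes both versions under homoskedasticity.

```python
import numpy as np

def add_const(*cols):
    """Stack an intercept column with the given 1-D arrays."""
    return np.column_stack([np.ones(len(cols[0])), *cols])

# Simulated data; the coefficients are arbitrary. The unobserved shock u moves both
# x (log total loan) and y (log payday loan), so plain OLS of y on x is biased.
rng = np.random.default_rng(1)
n = 20000
index = rng.integers(0, 3, n)
income = rng.normal(0, 1, n)
u = rng.normal(0, 1, n)
x = 1.0 - 0.5 * index + 0.3 * income + u + rng.normal(0, 1, n)
y = 2.0 - 0.8 * x + 0.2 * income + u + rng.normal(0, 1, n)

# Stage 1: x on the instrument (index) and the control (income).
Z = add_const(index, income)
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

# Stage 2: y on x_hat and the control.
X_hat = add_const(x_hat, income)
beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
XtX_inv = np.linalg.inv(X_hat.T @ X_hat)
k = X_hat.shape[1]

# Correct 2SLS residuals use the ACTUAL regressor x, not x_hat.
resid = y - add_const(x, income) @ beta
se_2sls = np.sqrt(np.diag(resid @ resid / (n - k) * XtX_inv))

# Naive SEs, as reported by a plain OLS on the fitted values.
resid_naive = y - X_hat @ beta
se_naive = np.sqrt(np.diag(resid_naive @ resid_naive / (n - k) * XtX_inv))

print(f"beta_x={beta[1]:.3f}, 2SLS SE={se_2sls[1]:.4f}, naive SE={se_naive[1]:.4f}")
```

In practice, IV2SLS from linearmodels.iv (linearmodels is on the allowed package list) or felm in R computes the corrected standard errors directly.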

11. Run a difference-in-differences (DID) regression: regress the log of payday loan amount on treatment × post, controlling for W2 income. Based on the sign of the interaction coefficient, is your result consistent with task 10?

12. Rerun the DID specified in task 11 for each income quintile. Generate a table that collects the regression coefficients (along with their t-statistics, standard errors, and p-values) on the interaction term.
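A sketch of the per-quintile DID loop on simulated data. It collects the coefficient, standard error, t-statistic, and p-value on the interaction term into one table. The column names and the ols_stats helper are hypothetical, and the p-values again use a normal approximation.

```python
import math
import numpy as np
import pandas as pd

def ols_stats(y, X):
    """Coef, SE, t, normal-approximation p-value for the intercept and each column of X."""
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    t = beta / se
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in t])
    return beta, se, t, p

# Simulated data (not the exam data); the interaction effect of 0.5 is arbitrary.
rng = np.random.default_rng(2)
n = 10000
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),
    "post": rng.integers(0, 2, n),
    "w2_income": rng.normal(50, 10, n),
    "income_quintile": rng.integers(1, 6, n),
})
df["treat_post"] = df["treatment"] * df["post"]
df["log_payday_loan"] = (1 + 0.5 * df["treat_post"] - 0.01 * df["w2_income"]
                         + rng.normal(0, 1, n))

rows = []
for q, g in df.groupby("income_quintile"):
    beta, se, t, p = ols_stats(g["log_payday_loan"].to_numpy(),
                               g[["treat_post", "w2_income"]].to_numpy())
    # Position 1 is the interaction term (position 0 is the intercept).
    rows.append({"quintile": q, "coef": beta[1], "se": se[1], "t": t[1], "p": p[1]})

did_table = pd.DataFrame(rows)
print(did_table)
```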

Storytelling

13. Explain why changes in debt collection legislation (index) are a good instrument in task 10.

14. State legislators tighten debt collection regulations to protect consumers. Do your results support the view that more restrictive debt collection regulations benefit consumers? Why or why not?

Part II: Investment

Introduction

In this exercise, you will analyze Robinhood investors' holding behavior and its effect on the stock market.

On Canvas, under Files–second half–final exam–data–Part II, you can find the following three datasets:

a) robinhood_users_holding.csv (or RDS or pickle): It reports the number of Robinhood users holding a particular stock after 2 pm ET and before 5 pm ET. The original data is from Robintrack. The data is sorted by ticker and timestamp.

b) crsp_dret.csv: It reports each stock’s ticker, trading volume, and return on each trading day. The data is from CRSP (Center for Research in Security Prices).

c) ff_daily.csv: Fama-French daily stock factors. The data is from Professor Ken French’s Data Library.

Tasks:

Tasks 1-4: Read and process data

1. Read crsp_dret.csv as a data frame called “crsp_daily” and convert the variable “date” to date format. Sort the data frame by ticker and date. Next, report the summary statistics (mean, median, standard deviation, and number of non-missing observations) for return ("ret") and trading volume ("vol").

2. Read the Robinhood holding data (CSV, RDS, or pickle) and convert the column "timestamp" to timestamp format. Generate a new column "date" that contains only the date part of the timestamp (format "yyyy-mm-dd").

3. Generate a data frame “user_daily” at the stock-day level. It should include the following three columns:

1) "ticker": stock ticker

2) "date": "yyyy-mm-dd" format

3) "users_close": for each stock and day, the last available user holding count before the 4 pm market close ("before" excludes 4 pm itself).

Now you have the number of users for each stock at the daily level, stored in “user_daily”. Next, report summary statistics for "users_close".

Now you may delete the intraday Robinhood data to release some memory.
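A pandas sketch of the intraday-to-daily step on a toy table. It assumes the timestamps are already in ET and that the holdings column is called users_holding (a hypothetical name). Filtering on hour < 16 keeps observations up to 15:59:59 and drops 4 pm itself, matching the definition of "before". The last remaining row per ticker and date is then kept.

```python
import pandas as pd

# Toy intraday holdings; column names other than ticker/timestamp are assumptions.
rh = pd.DataFrame({
    "ticker": ["AAPL", "AAPL", "AAPL", "AAPL", "TSLA"],
    "timestamp": ["2020-03-02 14:30:00", "2020-03-02 15:45:00", "2020-03-02 16:00:00",
                  "2020-03-03 15:10:00", "2020-03-02 15:59:00"],
    "users_holding": [100, 110, 120, 130, 50],
})
rh["timestamp"] = pd.to_datetime(rh["timestamp"])
rh["date"] = rh["timestamp"].dt.strftime("%Y-%m-%d")      # "yyyy-mm-dd" string

# Keep strictly-before-4-pm rows, then the last one per stock-day.
before_close = rh[rh["timestamp"].dt.hour < 16]
user_daily = (
    before_close.sort_values(["ticker", "timestamp"])
    .groupby(["ticker", "date"])
    .tail(1)
    .loc[:, ["ticker", "date", "users_holding"]]
    .rename(columns={"users_holding": "users_close"})
    .reset_index(drop=True)
)
print(user_daily)
```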

4. Generate the following variables in user_daily:

1) "userchg": the daily change in the number of users holding the stock (users_close(i, t) − users_close(i, t − 1)).

2) "abnormal_userchg": the difference between a stock’s user change on day t (userchg) and that stock’s average user change over the previous five days (the average userchg from day t − 5 to day t − 1).

3) "abnormal_userchg_lag": abnormal_userchg lagged by one day for each stock. You may use the lag() function from the dplyr package (or the shift() method in pandas).

Next, report summary statistics for the above 3 variables.
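A sketch of the three variables on a toy single-stock series. It assumes user_daily is sorted by ticker and date, and that "previous five days" means the previous five available rows for the stock (trading days, not calendar days). Shifting userchg by one row before the 5-day rolling mean keeps day t out of its own benchmark.

```python
import pandas as pd

# Toy data for a single stock; users_close values are made up.
user_daily = pd.DataFrame({
    "ticker": ["AAPL"] * 8,
    "date": pd.date_range("2020-03-02", periods=8, freq="D"),
    "users_close": [100, 110, 125, 125, 140, 150, 170, 160],
}).sort_values(["ticker", "date"])

# 1) userchg(i, t) = users_close(i, t) - users_close(i, t - 1), within each stock.
user_daily["userchg"] = user_daily.groupby("ticker")["users_close"].diff()

# 2) Benchmark = mean userchg over t-5..t-1: shift one row, then a 5-row rolling mean.
prev5_mean = user_daily.groupby("ticker")["userchg"].transform(
    lambda s: s.shift(1).rolling(5).mean()
)
user_daily["abnormal_userchg"] = user_daily["userchg"] - prev5_mean

# 3) One-day lag within each stock.
user_daily["abnormal_userchg_lag"] = (
    user_daily.groupby("ticker")["abnormal_userchg"].shift(1)
)
print(user_daily)
```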

Tasks 5-7: Merge Robinhood user holding data with CRSP returns, then find the determinants of Robinhood investors’ trading behavior

5. Merge user_daily (from task 4) and crsp_daily (from task 1) by ticker and date. Keep only tickers and dates available in both datasets (inner merge). The merged data frame should be called “user_merged”. Sort user_merged by ticker and date. Report summary statistics for all variables (excluding ticker and date) in this data frame.

6. In the data frame user_merged, create a variable “extreme_absolute_return”. Its value equals 1 if the stock's absolute return ranks in the top 20 on day t, and 0 otherwise. Next, create a variable "extreme_absolute_return_lag" as the one-day lag of “extreme_absolute_return” for each stock. (Hint: you may need to rank the absolute returns on each day first.)
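A sketch of the daily top-20 flag on toy data with 25 stocks per day. Ranking uses method="first", an assumption about tie handling that guarantees exactly 20 flagged stocks per day. Other tie rules could flag more or fewer.

```python
import pandas as pd

# Toy merged data: 25 stocks on each of 2 days (made-up returns).
tickers = [f"S{i:02d}" for i in range(25)]
user_merged = pd.DataFrame({
    "ticker": tickers * 2,
    "date": ["2020-03-02"] * 25 + ["2020-03-03"] * 25,
    "ret": [(-1) ** i * i / 100 for i in range(25)] + [i / 1000 for i in range(25)],
}).sort_values(["ticker", "date"])

# Rank |ret| within each date; rank 1 = largest absolute return.
abs_rank = user_merged["ret"].abs().groupby(user_merged["date"]).rank(
    ascending=False, method="first"
)
user_merged["extreme_absolute_return"] = (abs_rank <= 20).astype(int)

# One-day lag within each stock (rows are sorted by ticker and date).
user_merged["extreme_absolute_return_lag"] = (
    user_merged.groupby("ticker")["extreme_absolute_return"].shift(1)
)
print(user_merged.head())
```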

7. Regress abnormal_userchg on extreme_absolute_return_lag, controlling for abnormal_userchg_lag and year fixed effects. Report the regression result. Does your result indicate that Robinhood users prefer stocks with extreme returns?

Tasks 8-13: Do stocks heavily bought by Robinhood investors underperform the market? Evidence from a portfolio analysis

8. In the data frame user_merged, drop rows where “ret” or “abnormal_userchg_lag” is missing.

9. Create single-sort decile portfolios based on abnormal_userchg_lag. The portfolios are rebalanced daily and equally weighted. For each date, assign each stock to a decile (1 to 10) based on abnormal_userchg_lag, then calculate the average return for each decile and date.

10. Reshape the portfolio data to wide format. After reshaping, you should have 11 variables: one date variable and 10 decile portfolios. The decile portfolios should be named “decile_1”, “decile_2”, …, and “decile_10”. Create a variable called "decile_10_minus_1" that holds the returns of a portfolio that is long decile_10 and short decile_1.
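A pandas sketch of tasks 9 and 10 on simulated data. Deciles are formed within each date with pd.qcut, equal-weighted returns are averaged per date and decile, and the result is pivoted to one column per decile before the long-short series is formed. Column names follow the task; the data are made up.

```python
import numpy as np
import pandas as pd

# Simulated stock-day panel (made-up numbers): 3 dates x 50 stocks.
rng = np.random.default_rng(3)
panel = pd.DataFrame({
    "date": pd.date_range("2020-03-02", periods=3).repeat(50),
    "ticker": np.tile([f"S{i}" for i in range(50)], 3),
    "abnormal_userchg_lag": rng.normal(size=150),
    "ret": rng.normal(0, 0.02, size=150),
})

# Task 9: deciles 1..10 within each date, then the equal-weighted mean return per cell.
panel["decile"] = panel.groupby("date")["abnormal_userchg_lag"].transform(
    lambda s: pd.qcut(s, 10, labels=False) + 1
)
port = panel.groupby(["date", "decile"])["ret"].mean().reset_index()

# Task 10: long-to-wide (one column per decile), then the long-short portfolio.
wide = port.pivot(index="date", columns="decile", values="ret")
wide.columns = [f"decile_{int(d)}" for d in wide.columns]
wide = wide.reset_index()
wide["decile_10_minus_1"] = wide["decile_10"] - wide["decile_1"]
print(wide.shape)
```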

11. Create a table that summarizes the 11 portfolios’ annualized mean, annualized standard deviation, and the t-statistic for the null hypothesis that the average return equals 0. Assume 252 trading days per year.
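A sketch of the summary table on simulated daily returns, using the simple scaling conventions (mean × 252, standard deviation × √252). The t-statistic tests whether the mean daily return is zero. Annualizing does not change it, because the mean and its standard error scale by the same factor.

```python
import numpy as np
import pandas as pd

# Simulated daily returns; the three columns stand in for the 11 portfolios.
rng = np.random.default_rng(4)
daily = pd.DataFrame(rng.normal(0.0005, 0.01, size=(1000, 3)),
                     columns=["decile_1", "decile_10", "decile_10_minus_1"])

n_obs = daily.count()
summary = pd.DataFrame({
    "ann_mean": daily.mean() * 252,                            # daily mean x 252
    "ann_std": daily.std() * np.sqrt(252),                     # daily std x sqrt(252)
    "t_stat": daily.mean() / (daily.std() / np.sqrt(n_obs)),   # H0: mean daily return = 0
})
print(summary)
```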

12. Merge the above portfolio returns with the Fama-French daily factors by date. Regress decile_10_minus_1 on the Fama-French five factors (mktrf, smb, rmw, cma, and hml). Report the regression results. What is the annualized alpha (the intercept)?
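A numpy sketch of the five-factor regression on simulated data. Annualized alpha is computed as 252 times the daily intercept, a simple non-compounded convention; compounding, (1 + alpha)^252 − 1, is an alternative. One practical detail when using the real files: Ken French's daily factors are reported in percent, while CRSP returns are decimals, so the factors need to be divided by 100 before the merge.

```python
import numpy as np

# Simulated daily data (not real factors); the alpha and loadings are arbitrary.
rng = np.random.default_rng(5)
n = 2000
factors = rng.normal(0, 0.01, size=(n, 5))        # columns: mktrf, smb, rmw, cma, hml
loadings = np.array([0.3, 0.5, -0.2, 0.1, 0.0])
long_short = -0.0004 + factors @ loadings + rng.normal(0, 0.005, n)

# OLS of the long-short return on a constant plus the five factors.
X = np.column_stack([np.ones(n), factors])
coef = np.linalg.lstsq(X, long_short, rcond=None)[0]
alpha_daily = coef[0]
alpha_annual = alpha_daily * 252                     # simple (non-compounded) annualization
print(f"daily alpha {alpha_daily:.5f}, annualized alpha {alpha_annual:.3f}")
```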

13. Does the decile_10_minus_1 portfolio underperform or outperform the benchmarks? List potential reason(s) that explain(s) the sign (positive or negative) of the alpha. A few sentences should suffice.

Part III: Open questions

1. Financial technology may benefit or harm consumers. Utilize the arguments, facts, or examples discussed in our FIN 450F/550F class to justify both views. Include proper citations in your answer.

2. Choose one question to answer from the following:

1) Explain the tradeoff of Type I and Type II errors in the context of default prediction.

2) Compare the advantages and disadvantages of ‘pure’ robo-advisors (e.g., Wealthfront) and traditional asset managers’ robo-advising solutions (e.g., Fidelity).
