A bank wants to use the performance of an in-house credit product to create a risk model. A sample of
applicants for the original credit product was selected. Credit bureau data describing these individuals (at
the time of application) was recorded and stored in the CREDIT data set. The ultimate disposition of the
loan was determined (paid off or bad debt). For loans rejected at the time of application, a disposition was
inferred from credit bureau records on loans obtained in a similar time frame.
Variable         Type  Len  Label
BanruptcyInd     Num   8    Bankruptcy Indicator
CollectCnt       Num   8    Number Collections
DerogCnt         Num   8    Number Public Derogatories
InqCnt06         Num   8    Number Inquiries 6 Months
InqFinanceCnt24  Num   8    Number Finance Inquiries 24 Months
InqTimeLast      Num   8    Time Since Last Inquiry
TARGET           Num   8    0 = Paid off; 1 = Bad debt
TL50UtilCnt      Num   8    Number Trade Lines 50 pct Utilized
TL75UtilCnt      Num   8    Number Trade Lines 75 pct Utilized
TLBadCnt24       Num   8    Number Trade Lines Bad Debt 24 Months
TLBadDerogCnt    Num   8    Number Bad Debt plus Public Derogatories
TLBalHCPct       Num   8    Percent Trade Line Balance to High Credit
TLCnt            Num   8    Total Open Trade Lines
TLCnt03          Num   8    Number Trade Lines Opened 3 Months
TLCnt12          Num   8    Number Trade Lines Opened 12 Months
TLCnt24          Num   8    Number Trade Lines Opened 24 Months
TLDel3060Cnt24   Num   8    Number Trade Lines 30 or 60 Days 24 Months
TLDel60Cnt       Num   8    Number Trade Lines Currently 60 Days or Worse
TLDel60Cnt24     Num   8    Number Trade Lines 60 Days or Worse 24 Months
TLDel60CntAll    Num   8    Number Trade Lines 60 Days or Worse Ever
TLDel90Cnt24     Num   8    Number Trade Lines 90+ 24 Months
TLMaxSum         Num   8    Total High Credit All Trade Lines
TLOpen24Pct      Num   8    Percent Trade Lines Open 24 Months
TLOpenPct        Num   8    Percent Trade Lines Open
TLSatCnt         Num   8    Number Trade Lines Currently Satisfactory
TLSatPct         Num   8    Percent Satisfactory to Total Trade Lines
TLSum            Num   8    Total Balance All Trade Lines
TLTimeFirst      Num   8    Time Since First Trade Line
TLTimeLast       Num   8    Time Since Last Trade Line
1) What is the number of missing values for the TLSum variable in the sample?
2) After dropping the missing values, what percentage of observations in the sample has TARGET=1?
3) Randomly split the data set so that the training data set includes 60% of the original observations
and the test data set includes the remaining 40%.
4) Create a logistic regression model with all variables as the predictors, with the exception of
the TARGET and ID variables.
5) What percentage of the observations in the test data set is correctly predicted by the logistic
regression?
6) In the test data set, consider only those observations for which the actual value of the target variable
equals 1 (TARGET=1). What percentage of these observations is correctly predicted by the
logistic regression?
7) What is the predicted probability that applicant 66 (ID = 66) will default on a loan?
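
The steps in questions 1 through 7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a prescribed method: it assumes pandas and scikit-learn are available, and because the CREDIT data set is not reproduced here it runs on synthetic stand-in data built by a hypothetical helper, make_synthetic_credit, with only a few of the listed variables. The numbers it prints are therefore not answers to the questions. Note also that scikit-learn's LogisticRegression applies L2 regularization by default, so its coefficients may differ slightly from an unpenalized fit in other software.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split


def make_synthetic_credit(n=500, seed=0):
    """Hypothetical stand-in for the CREDIT data set (synthetic values only)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "ID": np.arange(1, n + 1),
        "DerogCnt": rng.poisson(1.0, n),
        "InqCnt06": rng.poisson(3.0, n),
        "TLSum": rng.gamma(2.0, 10000.0, n),
    })
    # Make bad debt more likely with more derogatories and inquiries.
    logit = (-2.0 + 0.6 * df["DerogCnt"] + 0.2 * df["InqCnt06"]).to_numpy()
    df["TARGET"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
    # Blank out 25 TLSum values to mimic missing data; applicant 66 is kept
    # complete so that step 7 can be demonstrated.
    candidates = df.index[df["ID"] != 66].to_numpy()
    missing_rows = rng.choice(candidates, size=25, replace=False)
    df.loc[missing_rows, "TLSum"] = np.nan
    return df


credit = make_synthetic_credit()

# 1) Number of missing values in TLSum.
n_missing = int(credit["TLSum"].isna().sum())

# 2) Drop rows with missing values, then compute the share with TARGET = 1.
clean = credit.dropna()
pct_bad = 100 * clean["TARGET"].mean()

# 3) Random 60/40 split into training and test data (fixed seed for repeatability).
train, test = train_test_split(clean, train_size=0.6, random_state=42)

# 4) Logistic regression on every column except TARGET and ID.
features = [c for c in clean.columns if c not in ("TARGET", "ID")]
model = LogisticRegression(max_iter=1000)
model.fit(train[features], train["TARGET"])

# 5) Share of all test observations predicted correctly (accuracy).
pred = model.predict(test[features])
accuracy = 100 * accuracy_score(test["TARGET"], pred)

# 6) Share of actual TARGET = 1 test observations predicted as 1 (sensitivity).
sensitivity = 100 * recall_score(test["TARGET"], pred)

# 7) Predicted probability of bad debt for the applicant with ID = 66.
applicant = clean.loc[clean["ID"] == 66, features]
p_default = model.predict_proba(applicant)[0, 1]

print(f"Missing TLSum values: {n_missing}")
print(f"TARGET=1 after dropping missing: {pct_bad:.1f}%")
print(f"Test accuracy: {accuracy:.1f}%")
print(f"Test sensitivity: {sensitivity:.1f}%")
print(f"P(bad debt | ID=66): {p_default:.3f}")
```

With the real data, the call to make_synthetic_credit would be replaced by loading CREDIT into a DataFrame. The default 0.5 probability cutoff used by predict is an assumption of this sketch; a different cutoff would change the results in steps 5 and 6.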
DECLARATION OF INDEPENDENT WORK
I hereby declare that I have not received any help from other people on any part of the first exam
for Predictive Modeling II offered in Fall 2018. The document I submitted for the exam contains
only my independent work.
I confirm that I have not committed plagiarism in the accomplishment of this work.
I accept the academic penalties that may be imposed for violations of the above.