Data Analysis and Statistical Inference with R - Spring 2018
Homework 3
DUE IN: Friday, 02.03.2018 at 23.59
HOW: electronically in pdf-format via submission to www.turnitin.com
Class id: depends on lab group (see announcement on piazza.com)
enrollment password: 20TiTaNic18
Please register for the class on turnitin ahead of time.
GROUP WORK: is allowed with a maximum of 2 persons per group. PLEASE stay within the
same group throughout the semester. Only one solution is accepted and graded per group.
Please include the names of all group members on each assignment.
HOW MANY: There will be a total of six homework assignments in this semester. We will do
a random selection of questions to be graded. Each week a total of ten points can be gained.
Only the five best homework assignments will be counted.
DUE DATES: 16.02., 23.02., 02.03., 09.03., 16.03., 23.03. (tentatively, subject to change)
FORMAT: Please do the required analyses and provide answers in complete sentences. Provide
the R syntax for the commands. Extract and report those statistics that are
relevant; do not copy complete R output without providing proper answers to the assignment
questions. Integrate requested figures or tables into your document and give a brief verbal
comment/caption on them.
Credit card approvals
You work at American Express and you are in charge of deciding on credit card applications. Based
on data on previous decisions on credit card applications, you try to find some general rule for credit
card approval.
The data set creditcard (an R data set, stored in the file creditcard.Rdata on campusnet)
contains fifteen variables. You are only interested in the following six variables:
Gender Is applicant female or male? (female = 0, male = 1)
Children Do children live in applicant’s household? (no = 0, yes = 1)
MaritalStatus Is applicant married? (not married = 0, married = 1)
HomeOwner Does applicant own a home? (no = 0, yes = 1)
SavingsType Different types of savings accounts (regular = 1, money market = 2, or certificates of
deposit (CDs) = 3)
CreditCard Dichotomous indicator of whether or not the credit card application was approved
(no = 0, yes = 1)
1. First of all, tabulate the CreditCard variable.
(a) (half a point) For how many applicants was the credit card application approved?
(b) (1 point) What are the odds of getting a credit card application denied?
(c) (1 point) What is the "risk" (i.e. probability) of getting a credit card application
approved?
2. Cross-tabulate the variables Children and CreditCard.
(a) (1 point) For how many applicants with children in their household was the credit card
application denied?
(b) (1 point) How many applicants without children in their household got the credit card
application approved?
(c) (half a point) Draw a mosaic plot visualising the contingency table of credit card approval
and whether children live in the applicant’s household.
3. You continue with your analysis of the relationship between Children and CreditCard.
(a) (1 point) Are applicants with children in their household less likely (as measured in
odds) to get credit card applications approved than others? Calculate the odds ratio for
getting a credit card application approved comparing applicants with children in their
household with those without.
(b) (1 point) Are applicants with children less likely (as measured in risk) to get credit
card applications denied than applicants without children? Calculate the relative risk
for getting a credit card application denied comparing applicants with children in their
household to those without.
(c) (half a point) Looking at the mosaic plot created in Question 2c, how strong is the
relationship between the two variables Children and CreditCard?
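As a reminder of the quantities asked for in Questions 1 and 3, the definitions can be written as
follows. The cell notation n_{ij} (the number of applicants with Children = i and CreditCard = j)
is an assumption introduced here for illustration; it is not part of the data set.

```latex
% Assumed notation: n_{ij} = number of applicants with Children = i and CreditCard = j
% Odds of an event A with probability P(A)
\[
\operatorname{odds}(A) = \frac{P(A)}{1 - P(A)}
\]
% Odds ratio for approval: applicants with children vs. applicants without
\[
\mathrm{OR}_{\text{approved}} = \frac{n_{11} / n_{10}}{n_{01} / n_{00}}
\]
% Relative risk of denial: applicants with children vs. applicants without
\[
\mathrm{RR}_{\text{denied}} = \frac{n_{10} / (n_{10} + n_{11})}{n_{00} / (n_{00} + n_{01})}
\]
```

A value of 1 for either ratio indicates no difference between the two groups.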
4. Now, you assess the relationship between Children and CreditCard using the χ²-statistic.
(a) (1 point) Calculate the χ²-statistic to assess the relationship between Children and
CreditCard.
(b) (1.5 points) Calculate the expected frequencies under the assumption that the presence of
children in the household has no effect on credit card approval. For which cells are expected
frequencies higher than the observed ones?
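For reference, the standard definitions behind Question 4 can be sketched as below. The symbols
O_{ij}, E_{ij}, n_{i.}, n_{.j} and n are notation assumed here for illustration (observed count,
expected count, row total, column total and overall sample size).

```latex
% Assumed notation: O_{ij} observed count, n_{i\cdot} row total,
% n_{\cdot j} column total, n overall sample size
% Expected frequency of cell (i, j) if the two variables are unrelated
\[
E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}
\]
% Chi-squared statistic: sum over all cells of the table
\[
\chi^2 = \sum_{i} \sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
\]
```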
5. In the following, perform the analyses separately for female and male applicants.
(a) (half a point) Calculate the χ²-statistic to assess the relationship between Children and
CreditCard.
(b) (1.5 points) Calculate the expected frequencies under the assumption that the presence of
children in the household has no effect on credit card approval. For which cells are expected
frequencies higher than the observed ones?
(c) (half a point) Do the results differ for the two sexes?
6. (2.5 points) Visualise the relationships using mosaic plots. Do the differences between females
and males in relation to credit card approval and children in the household become visible in
the plots? Give reasons for your answer.
7. (2.5 points) Compute the odds ratios to assess the relationship between Children and CreditCard
for males and females separately. Are the results more in line with the χ²-statistic or more
in line with the mosaic plot?
8. Now, you assess the relationships between SavingsType and CreditCard.
(a) (half a point) Calculate the χ²-statistic to assess the relationship between SavingsType
and CreditCard.
(b) (1 point) Calculate the Phi coefficient, the contingency coefficient and Cramér’s V to
assess the relationship between SavingsType and CreditCard.
(c) (half a point) Why is there no result for the Phi coefficient?
(d) (half a point) Visualise the relationship between credit card approval and savings type.
Which of the statistics used for these data comes closest to the visual representation of the
relationship’s strength?
9. (2.5 points) Since there are too few cases for savings type 3 (CDs), you exclude applicants
having this savings type and re-run the analysis of Question 8. Summarize the results
and comment on the differences.
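The association measures named in Questions 8 and 9 are all derived from the χ²-statistic. Their
usual textbook definitions are sketched below; the symbols n, r and c (sample size, number of rows
and number of columns of the contingency table) are notation assumed here for illustration.

```latex
% Assumed notation: n sample size, r number of rows, c number of columns
% Phi coefficient
\[
\phi = \sqrt{\frac{\chi^2}{n}}
\]
% Contingency coefficient
\[
C = \sqrt{\frac{\chi^2}{\chi^2 + n}}
\]
% Cramer's V
\[
V = \sqrt{\frac{\chi^2}{n \,\bigl(\min(r, c) - 1\bigr)}}
\]
```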