
Data Visualisation and Analytics Assignment 3

Department of Econometrics and Business Statistics, Monash University

Due Date: 24th October 2019 at 1PM

A Implementing kNN classification (10 Marks)

This part of the assignment involves kNN classification of a dataset of 140 bank customers and must be

completed by ALL students. Note that this assignment is based on simulated data and each student has

their own personalised dataset. You must enter your student ID number before downloading your unique

dataset. The data can be downloaded here.

In the dataset, for each customer, data were collected on the following variables:

• Name : Customer name.

• Default: Did customer fail to pay back loan (Default) or successfully pay back the loan (No Default).

• WeeklyIncome : Income per week.

• EmploymentDuration : Time spent in current job.

• WeeklySpend : Average amount of money spent per week.

• Children : Number of children.

• Age : Customer's age.

• Sample : Whether the customer is in the training sample or test sample.

The objective is to predict on the basis of Weekly Income, Employment Duration, Weekly Spend, Number of

Children and Age whether a customer will default. The training sample can be used for determining a rule

for prediction and the test sample for evaluation. You may assume that the costs of both types of incorrect

prediction are equal. All numerical variables have been standardised by subtracting the mean and dividing

by the standard deviation of the training sample. You do NOT need to standardise the data.
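As a rough illustration of the workflow described above (fit a rule on the training sample, evaluate it on the test sample, and treat both kinds of error as equally costly), the following Python sketch classifies by majority vote among the k nearest training points and reports the test misclassification rate. The function names, the toy data, and the choice of k = 3 are assumptions made for illustration only; they are not part of the assignment and the toy points are not drawn from any student's dataset.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Euclidean distance from x to every training point, paired with that point's label.
    dists = sorted((math.dist(p, x), label) for p, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def misclassification_rate(train_X, train_y, test_X, test_y, k):
    """Share of test points whose predicted label differs from the true label."""
    wrong = sum(knn_predict(train_X, train_y, x, k) != y
                for x, y in zip(test_X, test_y))
    return wrong / len(test_y)

# Hypothetical, already-standardised toy data (two features only, for brevity).
train_X = [(-1.0, -0.5), (-0.8, 0.2), (0.9, 1.1), (1.2, 0.7), (0.1, -1.0)]
train_y = ["No Default", "No Default", "Default", "Default", "No Default"]
test_X = [(1.0, 1.0), (-0.9, 0.0)]
test_y = ["Default", "Default"]

rate = misclassification_rate(train_X, train_y, test_X, test_y, k=3)
print(rate)  # 0.5 for this toy data: one of the two test points is misclassified
```

With equal costs for both error types, the plain misclassification rate is a natural criterion; unequal costs would call for a weighted version instead.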

Once you have downloaded your data, complete the assignment by going to this Google form and answering

all questions. You must be signed in to your Monash email account when submitting the form for this to

work. To help you prepare your answers, a PDF version of the form is available on Moodle.

B Analysis of classification methods (10 Marks)

The second part of the assignment is to be submitted as a hard copy. Students enrolled in the ETX2250 unit

code should submit into the mailbox of Joan Tan. Students enrolled in the ETF5922 unit code should submit

into the mailbox of Anastasios Panagiotelis. Both of these can be found on level 5, Building H of the Caulfield

Campus. A soft copy can be submitted via Moodle as a backup, but you still MUST submit a hard copy.

B.1 Loan Approval (For ETX2250 Students Only)

You are consulting for a bank that currently uses k-nearest neighbours with k = 1 to determine whether a

customer will default on a loan or not default. The features used in this model are weekly spending (measured

in dollars) and duration in the current job (measured in years).

1. Explain why the data need to be standardised before carrying out kNN classification. (2 Marks)

2. Suppose a customer arrives who has been in their job for 5 years (standardised value 0.75) and a weekly

spend of $129.17 (standardised value of 1). Using Figure 1, determine whether the bank predicts that

this customer defaults or does not default? (1 Mark)

[Scatter plot: Training Data for Loan Approval. Axes: Weekly Spending (Standardised) and Employment Duration (Standardised); points labelled Default or No Default.]

Figure 1: Training data used by bank to determine loan approvals. The features are standardised. The bank

uses k nearest neighbours with k=1 to predict default.


3. Suppose the same customer who has been in their job for 5 years (standardised value 0.75) plans to

reduce their weekly spend to $101.44 (standardised value of 0.25). Using Figure 1, determine whether

the bank predicts that this customer defaults or does not default? (1 Mark)

4. Suppose the same customer who has been in their job for 5 years plans to reduce their weekly spend to

$92.20 (standardised value of 0). Using Figure 1, determine whether the bank predicts that this customer

defaults or does not default? (1 Mark)

5. With respect to Questions 2 to 4 discuss a limitation(s) of the bank’s method. (1 Mark)

6. How could you address the limitation(s) discussed in your answer to Question 5 while still using k

nearest neighbour classification? (2 Marks)

7. How could linear discriminant analysis overcome the problem discussed in Question 5? (2 Marks)
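The standardised weekly spend values quoted in Questions 2 to 4 can be checked against the usual formula z = (x − mean) / sd. The mean and standard deviation below are not stated in the assignment; they are inferred from the two quoted pairs $92.20 ↔ 0 and $129.17 ↔ 1, so treat them as assumptions. The function name is likewise hypothetical.

```python
def standardise(x, mean, sd):
    """Standardise x by subtracting the mean and dividing by the standard deviation."""
    return (x - mean) / sd

# Inferred (not given in the text): a spend of $92.20 maps to 0, so it is the mean;
# a spend of $129.17 maps to 1, so it lies one standard deviation above the mean.
spend_mean = 92.20
spend_sd = 129.17 - 92.20

# Question 3 quotes $101.44 with a standardised value of 0.25.
z = round(standardise(101.44, spend_mean, spend_sd), 2)
print(z)  # 0.25
```

The employment duration values cannot be checked the same way, because only one pair (5 years ↔ 0.75) is quoted.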

B.2 Multiclass classification (For ETF5922 Students only)

You are consulting for a client that would like to build a method for predicting brand choice in the

telecommunications industry. You have data on the following:

• Brand: Choice of brand for telecommunications. Either Telstra, Optus or Vodafone

• Income: Yearly income measured in dollars

• Age: Age measured in years

Four potential classification methods should be considered:

• k Nearest neighbours classification with k = 3

• k Nearest neighbours classification with k = 13

• Linear Discriminant Analysis (LDA)

• Quadratic Discriminant Analysis (QDA)

Your task is to evaluate ALL of these methods and recommend one method to be used by the client. You

must describe

1. The process and criteria used to evaluate the methods.

2. Any other considerations that are important in evaluating the methods.

3. Any limitations of the analysis.

Summarise your results in a report (should not be more than 1000 words and will probably be less). Any

conclusions you make must be supported by evidence.
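One possible evaluation process, offered only as a sketch, is to estimate each method's misclassification rate by cross-validation and compare the estimates. The Python code below does this for the two kNN candidates (k = 3 and k = 13) on synthetic three-class data. Everything in it is an assumption for illustration: the data are simulated rather than real, the class centres and spread are arbitrary, the function names are hypothetical, and LDA and QDA are not implemented here (the same loop could wrap any classifier).

```python
import math
import random
from collections import Counter

def knn_predict(train, x, k):
    """Majority vote among the k training rows nearest to x (Euclidean distance).
    Ties go to the class seen first among the nearest neighbours."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cv_error(data, k, folds=5):
    """Cross-validated misclassification rate of kNN with k neighbours."""
    errors = 0
    for f in range(folds):
        held_out = data[f::folds]  # rows whose index is congruent to f modulo folds
        train = [row for i, row in enumerate(data) if i % folds != f]
        errors += sum(knn_predict(train, x, k) != y for x, y in held_out)
    return errors / len(data)

# Synthetic, already-standardised (income, age) pairs; the centres are arbitrary.
rng = random.Random(0)
centres = {"Telstra": (1.0, 1.0), "Optus": (-1.0, 0.5), "Vodafone": (0.0, -1.0)}
data = [((rng.gauss(cx, 0.7), rng.gauss(cy, 0.7)), brand)
        for brand, (cx, cy) in centres.items() for _ in range(40)]
rng.shuffle(data)  # mix the classes before splitting into folds

results = {k: cv_error(data, k) for k in (3, 13)}
for k, err in results.items():
    print(f"k = {k}: cross-validated misclassification rate = {err:.3f}")
```

Because income in dollars and age in years are on very different scales, real data would need to be standardised before the distance calculation; the synthetic data above are generated on a common scale to keep the sketch short.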
