NAME:
SID:
EE6435 Homework 2
February 10, 2020
Homework 2 is due at 11:59 PM on Feb. 20. Please submit your homework
via Canvas. No late work will be graded. You may type your solutions or submit
a scanned version of your handwritten solutions. Some problems do not allow
partial credit. Where partial credit is allowed, a partially correct answer
receives half of the full mark, to keep the grading consistent. For example, if
the answer to Problem 5 is partially correct, the mark is 2.5 pts.
Problem 1 (15 pts)
Obtain one of the data sets available at the UCI Machine Learning Repository and apply
the three visualization techniques from Lecture 2 (histogram, scatter plot, box plot). For the
scatter plot, you may choose any two attributes. You can use Excel, MATLAB, R, or Weka to
produce the required visualization results.
Weka: https://www.cs.waikato.ac.nz/ml/weka/
UCI: https://archive.ics.uci.edu/ml/index.php
I suggest the following two data sets at UCI: Student Academics Performance Data Set and
Bank Marketing Data Set.
In your submission, clearly describe which tool(s) you used to generate the results.
Describe any code/commands you used as well. Then attach the figures.
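If you script the figures rather than build them interactively, the following minimal Python/matplotlib sketch produces all three required plot types. The data and attribute names here are placeholders, not a UCI data set; substitute the attributes you actually chose (and note that Python is not on the list of suggested tools, so check that it is acceptable):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt

# Placeholder data standing in for two numeric attributes of a UCI data set.
rng = np.random.default_rng(0)
attr1 = rng.normal(50, 10, 200)
attr2 = attr1 * 0.5 + rng.normal(0, 5, 200)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(attr1, bins=20)            # histogram of one attribute
axes[0].set_title("Histogram of attr1")
axes[1].scatter(attr1, attr2, s=10)     # scatter plot of two attributes
axes[1].set_title("attr1 vs attr2")
axes[2].boxplot([attr1, attr2])         # box plots, one per attribute
axes[2].set_xticklabels(["attr1", "attr2"])
axes[2].set_title("Box plots")
fig.tight_layout()
fig.savefig("problem1_plots.png")
```

Saving to a file (here the made-up name `problem1_plots.png`) gives you a figure you can attach directly to the submission.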
Problem 2 (15 pts, no partial credit)
Consider the training examples shown in Table 1 for a binary classification problem.
(a) What is the entropy of this collection of training examples with respect to the positive
and negative classes?
(b) For a3, which is a continuous attribute, compute the information gain for every possible
binary split.
(c) What is the best split (between a1 and a2) according to the Gini index?
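Parts (a)–(c) are hand calculations, but you can sanity-check your arithmetic with a few lines of Python. The sketch below implements the standard definitions of entropy, Gini index, and impurity reduction for a split; the counts in the example are made up, not taken from Table 1:

```python
import math

def entropy(counts):
    """Entropy of a class distribution given raw class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gini(counts):
    """Gini index of a class distribution given raw class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def split_gain(parent_counts, child_counts_list, impurity=entropy):
    """Impurity reduction of a split: parent impurity minus the
    weighted average impurity of the child nodes."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * impurity(child)
                   for child in child_counts_list)
    return impurity(parent_counts) - weighted

# Illustrative only (NOT Table 1): a parent node with 5 positive and
# 5 negative examples, split into children with (4, 1) and (1, 4).
print(round(entropy([5, 5]), 4))                       # 1.0
print(round(split_gain([5, 5], [[4, 1], [1, 4]]), 4))  # 0.2781
```

Passing `impurity=gini` to `split_gain` gives the corresponding Gini-based comparison for part (c).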
Problem 3 (20 pts, no partial credit)
Consider the training examples in Table 2, where X, Y, and Z are attributes.
(a) Construct a two-level decision tree using the greedy approach described in Lecture 3. Use
entropy as the splitting criterion.
(b) For the induced tree, what are the error rates of all the leaf nodes?
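The greedy approach picks, at each node, the attribute whose split gives the largest information gain, then recurses on the children. If you want to verify an attribute choice, the sketch below computes information gain and selects the best binary attribute; the toy records are illustrative and are NOT the Table 2 data (in the toy data the class is fully determined by X):

```python
import math
from collections import Counter

def entropy_of(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(records, attr):
    """Entropy reduction from splitting `records` on attribute `attr`."""
    labels = [r["class"] for r in records]
    gain = entropy_of(labels)
    for v in set(r[attr] for r in records):
        subset = [r["class"] for r in records if r[attr] == v]
        gain -= len(subset) / len(records) * entropy_of(subset)
    return gain

def best_attribute(records, attrs):
    """Greedy choice: the attribute with the largest information gain."""
    return max(attrs, key=lambda a: info_gain(records, a))

# Illustrative toy data (NOT Table 2): class depends only on X.
toy = [
    {"X": 0, "Y": 0, "class": "C1"},
    {"X": 0, "Y": 1, "class": "C1"},
    {"X": 1, "Y": 0, "class": "C2"},
    {"X": 1, "Y": 1, "class": "C2"},
    {"X": 1, "Y": 0, "class": "C2"},
    {"X": 0, "Y": 1, "class": "C1"},
]
print(best_attribute(toy, ["X", "Y"]))  # X
```

Applying `best_attribute` at the root and again within each branch reproduces the two-level greedy construction; the leaf error rate is then the fraction of examples at the leaf that do not belong to its majority class.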
Table 1: Data set for Problem 2
Instance  a1  a2  a3   Target Class
1         T   T   1.0  +
2         T   T   4.0  +
3         T   F   5.0  −
4         F   F   4.0  +
5         F   T   7.0  −
6         F   T   6.0  −
7         F   F   8.0  −
8         T   F   7.0  +
9         F   T   3.0  −
Table 2: Data set for Problem 3
X  Y  Z  Num. of Class C1 examples  Num. of Class C2 examples
0  0  0  10                         15
0  0  1  0                          10
0  1  0  0                          20
0  1  1  45                         10
1  0  0  8                          42
1  0  1  12                         8
1  1  0  5                          0
1  1  1  5                          10
Problem 4 (15 pts)
Apply a decision tree classifier to the Iris data set, and also to the data set you chose in
Problem 1, using Weka. Submit and explain the decision trees produced by Weka.
You need to install Weka on your computer: https://www.cs.waikato.ac.nz/ml/weka/
In addition, the above website provides YouTube clips about the classification models.
Problem 5 (5 pts)
CityU has a set of rules to decide the academic status (i.e., class) of an undergraduate student
using GPA-related attributes. These rules can be found in the file at Canvas/files/data/.
Instead of applying these rules directly, represent them as a decision tree (no training is needed).