COMP9318 ASSIGNMENT 1

COMP9318 (20T1) ASSIGNMENT 1

DUE ON 20:59 12 APR, 2020 (SUN)

Q1. (40 marks)

Consider the following base cuboid Sales with four tuples and the aggregate function

SUM:

Location T ime Item Quantity

Sydney 2005 PS2 1400

Sydney 2006 PS2 1500

Sydney 2006 Wii 500

Melbourne 2005 XBox 360 1700

Location, T ime, and Item are dimensions and Quantity is the measure. Suppose the

system has built-in support for the value ALL.

(1) List the tuples in the complete data cube of R in a tabular form with 4 attributes,

i.e., Location, T ime, Item, SUM(Quantity)?

(2) Write down an equivalent SQL statement that computes the same result (i.e., the

cube). You can only use standard SQL constructs, i.e., no CUBE BY clause.

(3) Consider the following ice-berg cube query:

SELECT Location, Time, Item, SUM(Quantity)

FROM Sales

CUBE BY Location, Time, Item

HAVING COUNT(*) > 1

Draw the result of the query in a tabular form.

(4) Assume that we adopt a MOLAP architecture to store the full data cube of R, with

the following mapping functions:

Draw the MOLAP cube (i.e., sparse multi-dimensional array) in a tabular form

of (ArrayIndex, V alue). You also need to write down the function you chose to

map a multi-dimensional point to a one-dimensioinal point.

Q2. (30 marks)

Consider the given similarity matrix. You are asked to perform group average hierarchical clustering on this dataset.

You need to show the steps and final result of the clustering algorithm. You will show

the final results by drawing a dendrogram. The dendrogram should clearly show the order

in which the points are merged.

p1 p2 p3 p4 p5

p1 1.00 0.10 0.41 0.55 0.35

p2 0.10 1.00 0.64 0.47 0.98

p3 0.41 0.64 1.00 0.44 0.85

p4 0.55 0.47 0.44 1.00 0.76

p5 0.35 0.98 0.85 0.76 1.00

Q3. (30 marks)

Algorithm 1: k-means(D, k)

Data: D is a dataset of n d-dimensional points; k is the number of clusters.

1 Initialize k centers C = [c1, c2, . . . , ck];

2 canStop ← false;

3 while canStop = false do

4 Initialize k empty clusters G = [g1, g2, . . . , gk];

5 for each data point p ∈ D do

6 cx ← NearestCenter(p, C);

7 gcx

.append(p);

8 for each group g ∈ G do

9 ci ← ComputeCenter(g);

10 return G;

Consider the (slightly incomplete) k-means clustering algorithm as depicted in Algorithm 1.

COMP9318 (20T1) ASSIGNMENT 1 3

(1) Assume that the stopping criterion is till the algorithm converges to the final k

clusters. Can you insert several lines of pseudo-code to the algorithm to implement

this logic? You are not allowed to change the first 7 lines though.

(2) The cost of k clusters is just the total cost of each group gi

, or formally

dist() is the Euclidean distance. Now show that the cost of k clusters as evaluated at

the end of each iteration (i.e., after Line 9 in the current algorithm) never increases.

(You may assume d = 2)

(3) Prove that the cost of clusters obtained by k-means algorithm always converges to

a local minima. You can make use of the previous conclusion even if you have not

proved it.

In fact, the two loops (Lines 5–7 and Lines 8–9) never increases the cost. Hint 1.

Submission

Please write down your answers in a file named ass1.pdf. You must write down

your name and student ID on the first page. You should typeset your answers in

LATEX or MS Word. We do not accept handwritten answers.

You can submit your file by

give cs9318 ass1 ass1.pdf

Late Penalty. -10% per day for the first two days, and -20% for each of the followingdays.

联系我们

QQ：99515681
邮箱：99515681@qq.com
工作时间：8:00-21:00
微信：codinghelp

热点文章

辅导 comm2000 creating socia... 2026-01-08
讲解 isen1000 – introductio... 2026-01-08
讲解 cme213 radix sort讲解 c... 2026-01-08
辅导 csc370 database讲解迭代 2026-01-08
讲解 ca2401 a list of colleg... 2026-01-08
讲解 nfe2140 midi scale play... 2026-01-08
讲解 ca2401 the universal li... 2026-01-08
辅导 engg7302 advanced compu... 2026-01-08
辅导 comp331/557 – class te... 2026-01-08
讲解 soft2412 comp9412 exam辅... 2026-01-08
讲解 scenario # 1 honesty讲解... 2026-01-08
讲解 002499 accounting infor... 2026-01-08
讲解 comp9313 2021t3 project... 2026-01-08
讲解 stat1201 analysis of sc... 2026-01-08
辅导 stat5611: statistical m... 2026-01-08
辅导 mth2010-mth2015 - multi... 2026-01-08
辅导 eeet2387 switched mode ... 2026-01-08
讲解 an online payment servi... 2026-01-08
讲解 textfilter辅导 r语言 2026-01-08
讲解 rutgers ece 434 linux o... 2026-01-08

热点标签

msinm014/msing014/msing014b

联系我们 - QQ: 99515681 微信：codinghelp

程序辅导网！