首页 > > 详细

CS5481 Data Engineering Assignment 3

 CS5481 Data Engineering    Assignment 3 

Deadline:  6‐DEC‐2019 (Friday), noon (no late submission will be accepted) 
Submission:   Canvas 
Format:  word or pdf 
Points to note:  
o Different books may have slightly different descriptions of concepts, algorithms and 
terminologies. To ensure uniformity in marking, you must follow the convention used in 
the lecture slides or our textbook (Database System Concepts). Other conventions will 
not be accepted. 
o You must show the steps clearly.  The marker will not give you marks if s/he cannot 
understand your work. 
o This is an individual assignment. You must work on your own.  Check 
http://www6.cityu.edu.hk/ah/plagiarism.htm for “The Problem of Plagiarism”.
1.  Transactions [35%] 
(a)  Consider the following two transactions. 
T1: read(A); A:=A+2; write(A); read(B); B:=B*3; write(B); 
T2: read(B); B:=B*2; write(B); read(A); A:=A+3; write(A); 
Assume that the two data items A and B are integers.  Also assume that the two 
transactions preserve database consistency in isolation.   
(i) Assume A = 1 and B = 2 initially.  Demonstrate that both serial schedules, i.e., 
(T1, T2) and (T2, T1), are equivalent by showing the final values of the two data 
items in the database. 
(ii) Give a concurrent schedule1
 that is equivalent to a serial schedule, i.e., giving 
the same final values in the database as a serial schedule. 
(iii) Give a concurrent schedule1
 that results in an inconsistent state and show the 
final values of the two data items in the database. 
(b)  The two transactions in Part (a) can be simplified as follows by showing read and 
write instructions only.  
T1: r1(A); w1(A); r1(B); w1(B); 
T2: r2(B); w2(B); r2(A); w2(A); 
(i) Give all possible schedules2 (including serial schedule) of the two transactions 
that are conflict‐equivalent to the serial schedule (T1, T2). 
(ii) Give all possible schedules2
 (including serial schedule) of the two transactions 
that are conflict‐equivalent to the serial schedule (T2, T1). 
(c)  Transaction T2 in Part (b) has been changed as follows, so A is processed before B.    
T1: r1(A); w1(A); r1(B); w1(B); 
T2: r2(A); w2(A); r2(B); w2(B); 
Give all possible conflict‐serializable schedules2
 (including serial schedules) of the two 
transactions. 
1
 Present the schedule like those shown on slides 13-14 of lec5. 2
 Present the schedule with read and write instructions only. 
CS5481 Data Engineering    Assignment 3 
2.  Distributed Query Processing [65%] 
Consider the following relations of a database of an Internet company. 
User (ip, user‐id, region) 
Visit (ip, dest‐ip, last‐visit‐date) 
Website (dest‐ip, url, region) 
For each user, there is an ip, user‐id, and region where the user is located in.  For each 
website, there is an dest‐ip, url, and region where the website is located in.  When a user 
visit a website, the last‐visit‐date will be stored or updated if the record already exists. 
The three relations User, Visit and Website are stored at Site 1, Site 2, and Site 3, 
respectively and the following information is available. 
 number of records in User = 120,000 
 number of records in Visit = 300,000 
 number of records in Website = 3,000 
 20% of the users never visit any website 
 size of ip, dest‐ip = 4 bytes 
 size of user‐id = 12 bytes 
 size of region = 10 bytes  
 size of last‐visit‐date = 8 bytes 
 size of url = 20 bytes 
The following SQL query submitted at Site 3 (and therefore the query result is needed at Site 
3) retrieves the user‐id, url and last‐visit‐date of visited website of all users who are located 
in the same region where the website is located in. 
SELECT    User.user‐id, Website.url, Visit.last‐visit‐date
FROM    User, Visit, Website
WHERE    User.ip =Visit.ip     AND
    Website.dest‐ip = Visit.dest‐ip AND
    User.region = Website.region    
(a) Suppose the performance criterion is to minimize the data transmission cost in at most 
four data transfers between different sites.  Based on the available information, for each 
of the following estimated number of output tuples, suggest a strategy for executing this 
query and find the data transmission cost in number of bytes transferred in each step 
and compute the total data transmission cost at the end. 
(i) 290,000 
(ii) 90,000 
(iii) 50,000 
(b) If there is no limitation on number of data transfers, revise your strategies and re‐
compute the data transmission costs in part (a), if necessary. 
    
联系我们
  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp
热点标签

联系我们 - QQ: 99515681 微信:codinghelp
程序辅导网!