CS5481 Data Engineering Assignment 3
Deadline: 6‐DEC‐2019 (Friday), noon (no late submission will be accepted)
Submission: Canvas
Format: word or pdf
Points to note:
o Different books may have slightly different descriptions of concepts, algorithms and
terminologies. To ensure uniformity in marking, you must follow the convention used in
the lecture slides or our textbook (Database System Concepts). Other conventions will
not be accepted.
o You must show the steps clearly. The marker will not give you marks if s/he cannot
understand your work.
o This is an individual assignment. You must work on your own. Check
http://www6.cityu.edu.hk/ah/plagiarism.htm for “The Problem of Plagiarism”.
1. Transactions [35%]
(a) Consider the following two transactions.
T1: read(A); A:=A+2; write(A); read(B); B:=B*3; write(B);
T2: read(B); B:=B*2; write(B); read(A); A:=A+3; write(A);
Assume that the two data items A and B are integers. Also assume that the two
transactions preserve database consistency in isolation.
(i) Assume A = 1 and B = 2 initially. Demonstrate that both serial schedules, i.e.,
(T1, T2) and (T2, T1), are equivalent by showing the final values of the two data
items in the database.
(ii) Give a concurrent schedule1
that is equivalent to a serial schedule, i.e., giving
the same final values in the database as a serial schedule.
(iii) Give a concurrent schedule1
that results in an inconsistent state and show the
final values of the two data items in the database.
(b) The two transactions in Part (a) can be simplified as follows by showing read and
write instructions only.
T1: r1(A); w1(A); r1(B); w1(B);
T2: r2(B); w2(B); r2(A); w2(A);
(i) Give all possible schedules2 (including serial schedule) of the two transactions
that are conflict‐equivalent to the serial schedule (T1, T2).
(ii) Give all possible schedules2
(including serial schedule) of the two transactions
that are conflict‐equivalent to the serial schedule (T2, T1).
(c) Transaction T2 in Part (b) has been changed as follows, so A is processed before B.
T1: r1(A); w1(A); r1(B); w1(B);
T2: r2(A); w2(A); r2(B); w2(B);
Give all possible conflict‐serializable schedules2
(including serial schedules) of the two
transactions.
1
Present the schedule like those shown on slides 13-14 of lec5. 2
Present the schedule with read and write instructions only.
CS5481 Data Engineering Assignment 3
2. Distributed Query Processing [65%]
Consider the following relations of a database of an Internet company.
User (ip, user‐id, region)
Visit (ip, dest‐ip, last‐visit‐date)
Website (dest‐ip, url, region)
For each user, there is an ip, user‐id, and region where the user is located in. For each
website, there is an dest‐ip, url, and region where the website is located in. When a user
visit a website, the last‐visit‐date will be stored or updated if the record already exists.
The three relations User, Visit and Website are stored at Site 1, Site 2, and Site 3,
respectively and the following information is available.
number of records in User = 120,000
number of records in Visit = 300,000
number of records in Website = 3,000
20% of the users never visit any website
size of ip, dest‐ip = 4 bytes
size of user‐id = 12 bytes
size of region = 10 bytes
size of last‐visit‐date = 8 bytes
size of url = 20 bytes
The following SQL query submitted at Site 3 (and therefore the query result is needed at Site
3) retrieves the user‐id, url and last‐visit‐date of visited website of all users who are located
in the same region where the website is located in.
SELECT User.user‐id, Website.url, Visit.last‐visit‐date
FROM User, Visit, Website
WHERE User.ip =Visit.ip AND
Website.dest‐ip = Visit.dest‐ip AND
User.region = Website.region
(a) Suppose the performance criterion is to minimize the data transmission cost in at most
four data transfers between different sites. Based on the available information, for each
of the following estimated number of output tuples, suggest a strategy for executing this
query and find the data transmission cost in number of bytes transferred in each step
and compute the total data transmission cost at the end.
(i) 290,000
(ii) 90,000
(iii) 50,000
(b) If there is no limitation on number of data transfers, revise your strategies and re‐
compute the data transmission costs in part (a), if necessary.