DATABASE SYSTEMS
Assignment 2
Important Note: Students must work on assignments individually. You may not discuss
the speci c questions in this assignment, nor their solutions with any other student. You
may not copy any solution, in whole or in part. You may not share any solution with other
students or on the Internet. You are encouraged to discuss the general concepts involved in
the questions in the context of completely di erent examples, for examples, the questions
discussed in class. If you are in doubt as to what constitutes acceptable discussion, please
ask! I take academic integrity very seriously. You can nd information on academic integrity
here: http://students.sfu.ca/academicintegrity.html
We will use the schema available in Piazza, under Assignments.
Assignment. Write the following queries in SQL.
(01) Total number of orders placed in 2016 by customers of each country. If all customers
from a speci c country did not place any orders in 2016, the country appears in the
output with 0 total orders. Return all countries and corresponding total counts.
(02) For each product type calculate the average number of times products of that type
have been included in an order (taking into account quantities). Orders that do not
include any product of a certain type do not contribute to the average for that type
(rather than contributing 0). Return the product type and the corresponding average;
the latter must be a number with exactly two decimal places. Types of products that
have never been ordered are not included in the output.
(03) Find invoices that have been taxed. The amount of such invoices is the total of the
order they refer to plus 20% of tax. Return the invoice ID. Keep in mind the datatype
of \amount" in invoices and pay special attention to rounding.
(04) For each type of product, nd the customer who ordered the highest number of products
of that type (taking into account quantities). Return the product type and the ID of
the customer. If two or more customers are tied for a speci c product type they will
all be included in the output. If no products of a speci c type have ever been ordered,
that type will not be in the output.
1
(05) For each customer, calculate the number of orders placed and the average spend,
which is the average total (taking into account quantities and unit prices of ordered
products) across all orders placed by that customer. Return the customer ID, the
number of orders, and the corresponding average spend. The latter must be a number
with exactly two decimal places. Orders without detail must be considered in the
calculation of the average as having a total of 0. Customers who did not place any
orders will be included in the output with 0 orders (not 0.00 or anything else, just 0)
and NULL average spend.
(06) For each product type, calculate the number of orders consisting only of products of
that type. Return the product type and the number of orders.
(07) Find \poor readers": customers who have not purchased any books in 2016. Return
the customer ID.
(08) Find customers who spent less than 50 in music in the period between January and
June 2016 (extremes included). Return the customer ID and the corresponding spend.
Customers who have not bought any music in the given period must be returned with
0.00 spend (precisely in this form).
(09) For each customer who has placed at least two orders, calculate the longest number
of days ever elapsed between any one order and the next. The only case in which
this interval (which you can calculate as the di erence of two dates) will be zero is
for customers who placed at least two orders and all of them were placed in the same
day. Whether an order has a detail is irrelevant. Return the customer ID and the
corresponding interval (which will be an integer).
(10) Find customers who have placed at least 5 orders, all in di erent dates, and the interval
between any one of their orders and the next (not on the same date) is less than 30 days
on average. The interval between any two orders placed in the same day (which is 0)
does not contribute to the average. Whether an order has a detail or not is irrelevant.
Return the customer ID only.
To some extent, this query is similar to the previous one, but subtly di erent. Take a
moment to think before doing copy & paste.
Submission instructions.
• You will be submitting your work through CourSys. What to submit:
a .zip le consisting of 10 les with the SQL queries (so that we could run them).
Before producing the .zip le, put all your queries in a directory (folder) called
Lastname-Firstname-queries. Then produce .zip le of that folder.
• Each query must be written in a text le named .sql where is the two-digit
number in the list of queries above. For example, the rst query will be written in le
01.sql and the last one in le 10.sql.
2
• Each le consists of a single SQL statement, terminated by semicolon (;). Submitted
les that do not contain exactly one semicolon will be discarded when the submission
is processed and consequently they will not be assessed (as if they were not submitted).
Please pay attention to this; even if it looks like a trivial detail, it is not.
• Files can be submitted more than once, in which case the previously submitted version
will be overwritten.
• Before submitting your les, you should check that they run smoothly in SQL Server
in CSIL, using the exact database schema provided in the link above.
• As le history is not recorded, les whose latest version is submitted after the deadline
(as reported in the log) will not be considered for assessment, unless you have been
granted a homework extension.
Assessment. The report section for Part 3 describing any additional testing is worth 10
marks. Your queries will be executed on 5 database instances without nulls. Each query is
worth 10 marks, which are allocated as follows:
• 2 marks are given for each test database on which the query returns all and only the
correct answers.
• 1 mark is given for each test database on which the query returns at least 50% of the
correct answers (but not all of them) and no wrong answers.
The queries will not be assessed for their performance, as long as they terminate after a
\reasonable" time; each query should not take more than a few seconds to run. Style. will
not be assessed this time either: you could write a query on single line, but this is not
recommended for your own sake.
Test data. Two of the ve instances on which your queries will be assessed are have been
created and loaded with data.
Query answers. The answers your queries are supposed to return on the given database
instances are available in CSV format in Piazza, under Resources/Assignment 1. The order
in which the rows appear in the answer to your queries is irrelevant for this assignment (no
ordering is enforced on the answers, so the DBMS will output rows in an arbitrary order).
The names of the columns in the answers are also irrelevant (the CSV les we provide have
no header) so they can be renamed as you wish in the SELECT clause. What is important
is the number of columns and the order in which they appear in the output: (1; 2) is not
the same as (2; 1; 1). Your query gives a fully correct answer if it outputs all and only the
rows (with repetitions, if that’s the case) listed in the corresponding CSV le, no matter on
which line.