2807ICT Programming Principles
Mark: This assignment is marked out of 100 and contributes 19% of the course
total.
Conditions: This assignment is strictly individual; no group or team work is
permitted. Students may discuss the problems with anyone, but may not view
or reproduce another person’s code, and may not show or share their own code
with other people (with the exception of teaching staff) or post it on any
online service.
1 Background
Aliens exist! Scientists have intercepted a communications stream from another planet for the first time.
Initial analysis has shown that the data stream is in fact the aliens’ version of Twitter.
The United Nations have decided that we will reach out to these aliens and contact them, but only
after we have learned as much as we can about them. At present we cannot translate their language, but
we can try to learn about them as a society and as individuals by textual analysis of the stream.
2 The data
The data to analyse is saved in two text files:
• follows.txt, which contains the information about which other users each user follows; and
• stream.txt, which is the actual alien Twitter stream of tweets.
2.1 User names
In the text files a user name is a sequence of alphanumeric characters, length >= 0; for example,
mrmagoo23.
2.2 follows.txt
The file follows.txt contains the follow graph. Each line is of the form
user0 user1 user2 ...
where each gap represents some whitespace. It means that user0 follows user1, user2, ... If a user doesn’t
follow anyone, the line looks like this:
user0
The lines are in no particular order.
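As a minimal sketch of how this format might be read, the Python fragment below builds a dictionary from each user to the set of users they follow. The function name parse_follows and the choice of a dictionary of sets are illustrative assumptions, not requirements of the assignment.

```python
def parse_follows(lines):
    """Map each user to the set of users they follow.

    `lines` is any iterable of strings, such as an open text file.
    """
    follows = {}
    for line in lines:
        names = line.split()  # splits on any run of whitespace
        if not names:
            continue  # assumption: blank lines carry no information
        user, followed = names[0], names[1:]
        # setdefault + update tolerates a user appearing on several lines
        follows.setdefault(user, set()).update(followed)
    return follows


# Hypothetical sample lines in the follows.txt format.
sample = ["andrew sally mrmagoo23\n", "sally\n", "mrmagoo23   andrew\n"]
graph = parse_follows(sample)
print(sorted(graph["andrew"]))  # ['mrmagoo23', 'sally']
```

With a real file, the same function could be called as parse_follows(open("follows.txt")), since iterating over a text file yields its lines.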
2.3 stream.txt
The file stream.txt contains the Twitter stream in chronological order. Each line is one of:
• ordinary tweet
• retweet
  user0 RT any text...
  where user0 is the username of the user who has retweeted an original tweet authored by user1.
• direct message
  user0 @user1 any text...
  where user0 is the author of a private message, normally seen only by user1.
Mentions of other users can appear within the content of any tweet (any text...). Mentions start with
@ followed by a user name. The user name is terminated by the end of the line or any non-alphanumeric
character. Any user mentioned will see that tweet, even if it is a direct message. A tweeter might make
a mistake, so a mentioned user, or direct message addressee, might not actually exist.
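The mention rule above can be sketched with a regular expression. This is only one possible approach; the names MENTION, find_mentions and existing_mentions are hypothetical, and the pattern assumes that "alphanumeric" means ASCII letters and digits and that a mention names at least one character.

```python
import re

# '@' followed by one or more ASCII letters or digits. The match stops at
# the first non-alphanumeric character or at the end of the line.
# Assumption: user names contain ASCII alphanumerics only.
MENTION = re.compile(r"@([A-Za-z0-9]+)")


def find_mentions(text):
    """Return the set of user names mentioned in `text`."""
    return set(MENTION.findall(text))


def existing_mentions(text, known_users):
    """Keep only the mentions that name a known user."""
    return find_mentions(text) & set(known_users)


print(sorted(find_mentions("hi @sally, meet @mrmagoo23!")))
# ['mrmagoo23', 'sally']
```

Returning a set means a user mentioned twice in the same tweet is reported once, and filtering against the known users handles a tweeter's mistakes.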
3 Tasks
Your tasks for this assignment are to write Python programs that analyse one or more of the data files to
answer some questions we have about the alien Twitter users.
3.1 Task 1: Who are the most social users? (10 marks)
The more social a user is, the more other users the user follows. Write a program that prints the user
name of the most social user. In the case of a tie, print the user names of all the most social users, one
per line, in lexicographic order. Sample output:
andrew
mrmagoo23
sally
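One way to handle the tie rule is to find the largest follow count first and then collect every user who reaches it. The sketch below assumes the follow graph is already held as a dictionary of sets; the function name most_social and the sample data are made up for illustration.

```python
def most_social(follows):
    """Return the users who follow the most others, sorted by name.

    `follows` maps each user name to the set of users they follow.
    """
    if not follows:
        return []
    best = max(len(followed) for followed in follows.values())
    return sorted(user for user, followed in follows.items()
                  if len(followed) == best)


# Hypothetical follow graph with a two-way tie.
follows = {
    "sally": {"andrew", "mrmagoo23"},
    "andrew": {"sally", "mrmagoo23"},
    "mrmagoo23": {"andrew"},
}
for name in most_social(follows):
    print(name)
# andrew
# sally
```

Python's sorted compares strings by character code, so upper-case letters sort before lower-case ones; whether that matches the intended lexicographic order is worth checking against the data.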
3.2 Task 2: Who are the top tweeters? (20 marks)
Write a program that prints the usernames of the top n tweeters and how many tweets they authored. n
is a number prompted for and entered by the user. The top tweeters authored the most original tweets
(not retweets). Sample output for n = 5:
1034 mrwordy
 999 chatty3 d555lucy
  10 blabberMcBlabberFace john mrmagoo23
Note:
• how the output is formatted in a table with justified columns;
• that users with the same numbers of tweets were printed in lexicographic order; and
• that even though only the top 5 were requested, because of a tie, more than 5 were printed.
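The tie and layout rules in these notes could be sketched as follows. The functions top_n_rows and format_rows are hypothetical, the counts are invented, and the layout (one row per distinct count, with the count right-justified) is only one reading of the sample output.

```python
def top_n_rows(counts, n):
    """Return (count, [names]) rows for the top n users, ties included."""
    if n <= 0 or not counts:
        return []
    ranked = sorted(counts.values(), reverse=True)
    # Count held by the n-th ranked user; everyone at or above it is kept.
    cutoff = ranked[min(n, len(ranked)) - 1]
    groups = {}
    for user, count in counts.items():
        if count >= cutoff:
            groups.setdefault(count, []).append(user)
    return [(count, sorted(groups[count]))
            for count in sorted(groups, reverse=True)]


def format_rows(rows):
    """Right-justify the counts so the name column lines up."""
    if not rows:
        return []
    width = max(len(str(count)) for count, _ in rows)
    return [f"{count:>{width}} {' '.join(names)}" for count, names in rows]


# Hypothetical tweet counts per user.
counts = {"mrwordy": 1034, "chatty3": 999, "d555lucy": 999, "john": 10,
          "mrmagoo23": 10, "blabberMcBlabberFace": 10, "quiet1": 2}
for line in format_rows(top_n_rows(counts, 5)):
    print(line)
```

Because the fifth-ranked user has 10 tweets and two others share that count, six users are printed even though n is 5.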
3.3 Task 3: Who are the top quotables? (20 marks)
Write a program that prints the usernames of the top n quotables and how many times in total their
tweets were retweeted. n is a number prompted for and entered by the user. The top quotables are those
users whose tweets were retweeted the most often. Format the output in the same manner as for task 2.
3.4 Task 4: Who sees the most tweets? (30 marks)
Write a program that prints the usernames of the top n users that would have seen the most tweets and
how many tweets they would have seen. A user will see a tweet if it comes from a user they follow, or they
are mentioned in the tweet, including as addressees in direct messages and as the authors of retweeted
tweets. Format the output in the same manner as for task 2.
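The visibility rule can be sketched as a set computation per tweet. Everything below is an illustration under stated assumptions: tweets are taken to be already parsed into (author, mentioned users, is_direct) tuples, direct messages are assumed not to reach the author's followers, authors are assumed not to count as seeing their own tweets, and mentions of unknown users are ignored. The names invert and viewers are hypothetical.

```python
from collections import Counter


def invert(follows):
    """Turn user -> followed users into user -> followers."""
    followers = {}
    for user, followed in follows.items():
        for other in followed:
            followers.setdefault(other, set()).add(user)
    return followers


def viewers(author, mentioned, is_direct, followers, known_users):
    """Return the set of users who would see one tweet."""
    seen_by = set(mentioned) & set(known_users)  # drop unknown users
    if not is_direct:
        # Assumption: direct messages are not shown to followers.
        seen_by |= followers.get(author, set())
    seen_by.discard(author)  # assumption: authors are not counted
    return seen_by


# Hypothetical follow graph and already-parsed tweets.
follows = {"andrew": {"sally"}, "mrmagoo23": {"sally", "andrew"},
           "sally": set()}
followers = invert(follows)
known = set(follows)
tweets = [
    ("sally", {"andrew", "ghost"}, False),  # ordinary tweet with mentions
    ("sally", {"andrew"}, True),            # direct message to andrew
    ("andrew", set(), False),               # ordinary tweet, no mentions
]
seen = Counter()
for author, mentioned, is_direct in tweets:
    seen.update(viewers(author, mentioned, is_direct, followers, known))
print(sorted(seen.items()))  # [('andrew', 2), ('mrmagoo23', 2)]
```

Using a set per tweet means a user who both follows the author and is mentioned is counted once for that tweet.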
4 Report (10 marks)
Prepare a report in PDF format that:
• identifies:
  - the name of the course;
  - the year and trimester;
  - the title of the assessment item;
  - your student number;
  - your name; and
  - the name of your lab tutor;
• and, for each task:
  - states the name(s) of the program file(s) for this task;
  - outlines in English how your program is supposed to work, particularly with reference to the
    data structures it uses;
  - states whether or not you think your program has worked; and
  - includes a screenshot of your program’s results, even if that is just an error message.
There is no set minimum or maximum length for this report, but clarity, completeness, and brevity will
be appreciated.
5 Code presentation (10 marks)
Your code will be assessed on its readability as well as its correctness. Use commenting to:
• identify the task;
• document the purpose and types of variables;
• document your functions’ purposes, parameters and returned values; and
• explain the purpose of sections of your code.
Assume the reader of your program knows Python at least as well as you do.
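As an illustration only, the fragment below shows one way the commenting points above might look in practice. The function count_followed and its comment layout are hypothetical and are not a required style.

```python
# Task 1: find the most social users (comment identifying the task).


def count_followed(follows):
    """Count how many users each user follows.

    Parameters:
        follows (dict[str, set[str]]): maps a user name to the set of
            user names that user follows.

    Returns:
        dict[str, int]: maps each user name to the number of users
            they follow.
    """
    # totals (dict[str, int]): number of followed users per user name
    totals = {}

    # Section: tally the size of each user's follow set.
    for user, followed in follows.items():
        totals[user] = len(followed)
    return totals


print(count_followed({"andrew": {"sally", "mrmagoo23"}, "sally": set()}))
# {'andrew': 2, 'sally': 0}
```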
6 Submission
Submit your assignment in a zip file to Learning@Griffith. The zip file should contain only:
• your programs as text files; and
• your PDF report.