Twitter is a social networking website where users can post very short messages known as "tweets".

Each Twitter user can choose to "follow" other users, which means that they see those users' tweets. 
A Twitter user sees the tweets of users they are "following", and their tweets are seen by their 
"followers" (the users who follow them).
All the "follow" connections define a network among Twitter users, and it's quite interesting to look for 
patterns in the connections. Tools like Twiangulate let you explore questions like "what connections do 
my two friends have in common?". In this assignment, you'll write a program that lets you ask 
questions (or "queries") about a Twitter dataset.
Any tool for exploring the Twitterverse must get its data from Twitter itself. Twitter provides an API to 
allow programmers to write programs that interact with Twitter and extract data from it. In general, an 
API is a module that defines functions for accessing underlying data and performing other tasks 
without having to know how that data is actually stored and retrieved.
To make this assignment more manageable for you, we will assume that the information we need has 
already been extracted from Twitter and stored in a file.
How to tackle this assignment
This is your first experience designing a program of this size. We are providing detailed advice to help 
you break the task down into manageable pieces.
Make sure your twitterverse_functions.py module runs without error before submitting. If you have 
syntax errors in your module, comment them out before submitting, or we will not be able to test your 
functions!
The Twitter Data File
A Twitter data file contains a series of one or more user profiles, one after the other. Each user profile 
has the following elements, in this order:
A line containing a non-blank, non-empty username. You may assume that usernames are unique 
(a single username will not occur more than once in the file) and that usernames do not contain 
any whitespace.
A line for the user's actual name. If they did not provide a name, this line will be blank.
A line for the user's location, or a blank line if they did not provide one.
A line for the URL of a website, or a blank line if they did not provide one.
Zero or more lines for the user's bio, then a line with nothing but the keyword ENDBIO on it. This 
marks the end of the bio, and is not considered part of it. (You may assume that no bio has the string 
ENDBIO within it.) If the user did not provide a bio, the ENDBIO line will come immediately after the 
website line, with no blank line in between.
Zero or more lines each containing the username of someone that this user is following, then a line 
with the keyword END on it. (You may assume that no one has END as their username.) A user 
cannot be on his or her own following list. You may assume that every user on a following list has a 
user profile in the Twitter data file.
Notice that the keywords act as separators in this file. All of their letters are capitalised, and the 
keywords contain no punctuation.
Examples
Here is a sample user profile that might occur among many in a file:
tomCruise
Tom Cruise
Los Angeles, CA
http://www.tomcruise.com
Official TomCruise.com crew tweets. We love you guys! 
Visit us at Facebook!
ENDBIO
katieH
NicoleKidman
END
The file data.txt is a smallish example of a complete Twitter data file (and was made by hand) and the 
file rdata.txt (see starter code) is a much larger example (and is made from real data extracted from 
Twitter). These should help you confirm your understanding of the file format and will also be useful in 
testing your program.
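The file format above can be read one profile at a time. What follows is a minimal sketch, not the required design: the function name read_profiles and the dictionary keys ('name', 'location', 'web', 'bio', 'following') are assumptions chosen for illustration, and the sample text is built in memory rather than read from data.txt.

```python
import io


def read_profiles(data_file):
    """Return a dict mapping each username to a dict describing that user.

    data_file is an open text file (or file-like object) in the Twitter
    data file format. The key names used here are illustrative assumptions.
    """
    profiles = {}
    username = data_file.readline().strip()
    while username:
        profile = {
            'name': data_file.readline().strip(),
            'location': data_file.readline().strip(),
            'web': data_file.readline().strip(),
        }
        # Bio: zero or more lines, up to but not including ENDBIO.
        bio_lines = []
        line = data_file.readline()
        while line and line.strip() != 'ENDBIO':
            bio_lines.append(line.strip())
            line = data_file.readline()
        profile['bio'] = '\n'.join(bio_lines)
        # Following list: zero or more usernames, up to but not including END.
        following = []
        line = data_file.readline()
        while line and line.strip() != 'END':
            following.append(line.strip())
            line = data_file.readline()
        profile['following'] = following
        profiles[username] = profile
        username = data_file.readline().strip()
    return profiles


# The sample profile from above, held in memory for the demonstration.
sample = io.StringIO(
    'tomCruise\n'
    'Tom Cruise\n'
    'Los Angeles, CA\n'
    'http://www.tomcruise.com\n'
    'Official TomCruise.com crew tweets. We love you guys!\n'
    'Visit us at Facebook!\n'
    'ENDBIO\n'
    'katieH\n'
    'NicoleKidman\n'
    'END\n'
)
profiles = read_profiles(sample)
print(profiles['tomCruise']['following'])
```

The `while line and ...` tests stop the loops at the end of the file, so a file that is missing a keyword does not cause an infinite loop.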
Cycles in the data
Although a user cannot be on their own following or followers lists, there can be "loops" (we call them 
"cycles") such as this: user A can be following B who is following A. This is the shortest possible cycle. 
Of course, cycles can be longer.
The Query File
Note that the word "query" just means "question". In computer science, we use it to mean a request 
for information. For this assignment, a query will be provided in a file. Below we will review the high 
level parts of the query, look at an example, and then describe the format of the query file.
Overview
A query has three components: a search specification, a filter specification, and a presentation 
specification.
The search specification describes how to generate a list of Twitter usernames, starting with an initial 
username (a list of length one) and then finding their followers or people they are following, then 
people that are those people's followers or who they are following, and so on. When processing the 
search specification, don't try to do anything to avoid cycles. For instance, if the search specification 
says to find the people who user A is following, and from there the people they are following, you 
could find yourself back at user A. Don't try to avoid that.
After processing the search specification, we have a list of Twitter usernames. Its length could be 
zero. For example, if the initial username is 'adalovelace' and the search specification contains a 
single 'followers' keyword, then the length of the list will be zero if 'adalovelace' has no followers.
The filter specification describes how to filter the list of usernames produced by the search 
specification. The filtering can be based on
whether or not they are following a particular user,
whether or not a particular user is their follower,
whether their name contains a particular string (case-insensitive), or
whether their location contains a particular string (case-insensitive).
After processing the filter specification, we have a possibly reduced list of usernames.
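The four kinds of filter can be sketched as a single test applied to each username in turn. This is only an illustration under stated assumptions: the data layout (a dict of per-user dicts with 'name', 'location' and 'following' keys) and the function names are hypothetical, and the keywords 'follower' and 'name-includes' are assumed spellings; only 'following' and 'location-includes' appear in this handout's example query.

```python
def keep_user(data, username, filter_name, value):
    """Return True if username passes one filter.

    data maps each username to a dict with 'name', 'location' and
    'following' entries (an assumed layout for this sketch).
    """
    profile = data[username]
    if filter_name == 'following':
        # Keep users who are following the user named by value.
        return value in profile['following']
    if filter_name == 'follower':
        # Keep users who have value as a follower, i.e. value follows them.
        return username in data.get(value, {}).get('following', [])
    if filter_name == 'name-includes':
        return value.lower() in profile['name'].lower()
    if filter_name == 'location-includes':
        return value.lower() in profile['location'].lower()
    raise ValueError('unknown filter: ' + filter_name)


def apply_filters(data, usernames, filters):
    """Return the usernames that pass every (filter_name, value) pair."""
    result = usernames
    for filter_name, value in filters:
        result = [u for u in result
                  if keep_user(data, u, filter_name, value)]
    return result


# A tiny made-up dataset, not taken from any of the provided files.
data = {
    'tomCruise': {'name': 'Tom Cruise', 'location': 'Los Angeles, CA',
                  'following': ['c']},
    'h': {'name': 'H', 'location': 'Toronto, ON', 'following': ['c']},
    'i': {'name': 'I', 'location': 'Irvine, CA', 'following': []},
    'c': {'name': 'C', 'location': '', 'following': ['i']},
}
filters = [('following', 'c'), ('location-includes', 'CA')]
print(apply_filters(data, ['i', 'h', 'tomCruise'], filters))
```

Each filter is applied to the output of the previous one, so a user is kept only if they pass all of them.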
Once the search results have been found and filtered, the presentation specification describes how 
the output should be presented. It specifies on what basis the results should be sorted, and whether 
the results should be presented in a short or long format.
Example query
Here is an example query:
SEARCH
tomCruise
following
following
following
FILTER
following c
location-includes CA
PRESENT
sort-by popularity
format long
The search specification in this particular query has four steps.
Start with a list containing the username to start the search from; i.e., ['tomCruise']. Let's call that list 
L1.
The search keyword 'following' says to replace each username p in L1 with the usernames of the 
users who p is following. This yields a new list, L2.
For the next 'following' keyword, we start with L2 and repeat the same operation as in the previous 
step, yielding another list, L3.
For the final 'following' keyword, we start with L3 and repeat that operation one last time, yielding list 
L4.
Notice that each step yields a list of zero or more usernames that is the input to the next step. There 
should be no duplicates in the final results list. Duplicates should be removed after each step.
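The steps above can be sketched as a loop that applies one operation at a time and removes duplicates after each step. This is a sketch under assumptions: the function names are hypothetical, the data is simplified to a dict mapping each username to the list of users they follow, and the order of the usernames within each list is a choice made here, not something the handout fixes.

```python
def search_step(data, usernames, operation):
    """Replace each username p with the users p is following (or with
    p's followers), and return the result without duplicates.

    data maps each username to the list of usernames they follow
    (a simplified layout assumed for this sketch).
    """
    result = []
    for p in usernames:
        if operation == 'following':
            found = data[p]
        else:
            # 'followers': every user whose following list contains p.
            found = [u for u in data if p in data[u]]
        for u in found:
            if u not in result:  # remove duplicates as we go
                result.append(u)
    return result


def run_search(data, start, operations):
    """Apply each operation in turn, starting from the list [start]."""
    current = [start]
    for operation in operations:
        current = search_step(data, current, operation)
    return current


# A made-up dataset containing the cycle a -> b -> a.
data = {'a': ['b', 'c'], 'b': ['a'], 'c': ['a', 'b'], 'd': []}
print(run_search(data, 'a', ['following', 'following']))
```

In this made-up dataset the second step leads back to 'a', and nothing in the code tries to prevent that, which matches the instruction not to avoid cycles.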
The Twitter data file diagram_data.txt (see starter code) contains the follower/following relationships 
as represented by this diagram. For those relationships, the search specification above would yield 
this list of usernames: ['i', 'j', 'h', 'k', 'tomCruise']. Make sure that you can see how the four lists, ending 
with this final one, are generated. Notice that the final list contains the users you can get to in three 
"steps" of the "following" relationship, starting from 'tomCruise'.
The final list generated by the search specification becomes the input to the filter specification. For our 
current example, the filter specification says that the list should be filtered in this way: a user should 
be kept only if they are following user 'c' and have a location that includes the string 'CA'. Notice that 
the resulting list of usernames is just ['tomCruise'].
The presentation specification says to present the results in long format and to order the users 
according to their popularity.
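As an illustration of how the three parts of the example query fit together, here is a minimal sketch that splits the query's lines into its search, filter and presentation components. It is based only on the example query shown above; the function name parse_query and the shape of the returned dictionary are assumptions, and the full query file format may include details this sketch does not handle.

```python
def parse_query(lines):
    """Return a dict describing the query given as a list of lines.

    The returned layout is an assumption made for this sketch:
    {'search': {'username': ..., 'operations': [...]},
     'filter': [(keyword, value), ...],
     'present': {keyword: value, ...}}
    """
    query = {
        'search': {'username': None, 'operations': []},
        'filter': [],
        'present': {},
    }
    section = None
    for raw_line in lines:
        line = raw_line.strip()
        if line == '':
            continue
        if line in ('SEARCH', 'FILTER', 'PRESENT'):
            section = line  # a keyword line starts a new section
        elif section == 'SEARCH':
            if query['search']['username'] is None:
                query['search']['username'] = line
            else:
                query['search']['operations'].append(line)
        elif section == 'FILTER':
            keyword, value = line.split(maxsplit=1)
            query['filter'].append((keyword, value))
        elif section == 'PRESENT':
            keyword, value = line.split(maxsplit=1)
            query['present'][keyword] = value
    return query


example = [
    'SEARCH', 'tomCruise', 'following', 'following', 'following',
    'FILTER', 'following c', 'location-includes CA',
    'PRESENT', 'sort-by popularity', 'format long',
]
query = parse_query(example)
print(query['search'])
print(query['filter'])
print(query['present'])
```

A filter or presentation line with no value after the keyword would raise a ValueError in this sketch; a real solution would need to decide how to treat such input.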
 