首页 >
> 详细

CS103 - Pagerank

1 Introduction

You will write a program to rank webpages in an artificial webgraph. Your program will implement Pagerank algorithm

[1] used by Google to order search results. Pagerank is not the only algorithm used currently by google to order search

results, but it is the first used by the company[2].

This assignment requires you to create C++ classes to model objects in a webgraph such as webpages. You will also

use file I/O and stringstreams to read and write text files containing web information. Further, you will understand

how to implement directed graphs in C++ that represent webpages and their outbound links. Pages are represented

by vertices, while links between pages are directed edges connecting pages.

In this document, the word link is used interchangeably with hyperlink.

2 Background

The world wide web (WWW) is a network of web pages that are connected through hyperlinks. WWW is described by

a webgraph: vertices of the graph correspond to webpages, the directed edges between vertices represent hyperlinks.

Figure 1: Example of a webgraph

Figure 1 shows an artificial webgraph. The graph shows that there is a link from isi.edu to usc.edu. It also shows

that usc.edu and ucla.edu both have links to nsf.gov.

You will create a Page class that represent webpages. A webpage object should contain information about the

page name and the links from the page.

Links can be represented by arrays, vectors or lists. Each webpage should store the page ids of the pages it is

referencing – have an outbound links to them.

3 Pagerank

Pagerank [1] assigns numerical weights to pages in a webgraph. The purpose of the weights is to measure the importance

of a page among the rest. The algorithm attempts to model random surfer’s behavior who clicks on links at random.

The pagerank of a page represents the probability that a random surfer end up visiting that page.

Pagerank is defined as:

PR(A) =

∑

v∈BA

PR(v)

L(v)

1

where PR(A) is the Pagerank of page A, PR(v) is the Pagerank of pages v which link to page A, L(v) is the number

of outbound links on page v, and BA is the set of pages that link to A

Figure 2: Pagerank example

Consider the graph in Figure 2, where A links to C, B links A, C links to A and B, and D has no incoming links.

We get the following equations for the Pagerank calculation:

Since we are dealing with probabilities we have PR(A) + PR(B) + PR(C) + PR(D) = 1. We need three more

equations to calculate pageranks, we pick:

PR(A) = PR(B) +

PR(C)

2

PR(B) =

PR(C)

2

+ PR(D)

PR(C) = PR(A)

Solving the above equations we get

PR(A) =

2

5

= 0.4 PR(B) =

1

5

= 0.2 PR(C) =

2

5

= 0.4 PR(D) = 0.0 (1)

4 Prelab

The purpose of the prelab is to give you intuition into the Pagerank. Before getting into the details, you need to get

familiar with the project files and skeleton code.

We provided you a skeleton code. You can check the code by going to Vocareum, select the PA, launch the terminal

and type:

$ls -1

gen_web.py #Python script to generate random graphs

graph_20_1_random.png #Picture visualizing a graph with 20 pages.

graph_20_1_random.txt #Description of a graph with 20 pages - visualized in graph_20_1_random.png

graph_30_0 .5 _random.png

graph_30_0 .5 _random.txt

graph_rep.png #Picture visualizing a graph shown in Figure 2

graph_rep.txt #Description of a graph shown in Figure 2

Makefile #Makefile to compile your code.

page.h #Models a page in the webgraph.

page_rank.cpp #main() function that reads graph description from a file ,calcuate

#pagerank and write results into a file.

readme.txt #readme file , where you answer prelab questions.

web.h #Web class contains all Pages and implement pagerank algorithm

Generate a random graph by using the gen web.py script. Make sure you are in the assignment directory and

run:

./ gen_web.py -N 4 -s 1 -o mygraph.txt -g

This generates a random webgraph with 4 pages similar to the one in Figure 3. The graph description is at

mygraph.txt. A picture of the graph is at mygraph.png.

Calculate pageranks of the pages in the graph you generated. Let PR(id = k) be the page rank of the page

with id=k. Formulate four equations starting with PR(id = 0) + PR(id = 1) + PR(id = 2) + PR(id = 3) = 1

and solve them by any mean you want. Pageranks for pages in the graph shown in Figure 3 is: PR(id = 0) = 25 ,

PR(id = 1) = 15 , PR(id = 2) =

1

5 and PR(id = 3) =

1

5 .

Note: In your readme.txt, include the equations and solution for the pageranks of the graph you generated. Make

sure to show your work.

2

Figure 3: A Random Webgraph

5 Randomwalk and Pagerank

Calculating Pagerank analytically for large webgraph - millions of pages - is computationally challenging due to the

large number of variables, and equations. It requires solving millions of equations.

Since PageRank is trying to measure the behavior of a random web surfers traversing the web, we can simulate

those surfers. You will use a randomwalk to model the behavior of a random web surfer. In randomwalk, a walker

(modeling a web surfers) starts at a random page and keeps moving from one page to another by selecting one of the

outgoing edges randomly. In order to simulate, we need some way model the web graph and the links. You will need

to create classes to represent pages and their connections. Since the pagerank of a page represents the probability

that a random surfer eventually visits that page, you are going to simulate a large number of surfers who randomly

click on links. After some time, the population at each page is an estimate of the page rank.

The algorithm is as follows.

Let S be the number o f s imu la t i on i t e r a t i o n s .

Let N i s the number o f walkers .

Divide the walkers equa l l y a c r o s s pages .

f o r i = 1 : S do

Each walker chooses a l i n k randomly from i t s cur rent page

and walk over the edge to a new page

done

fo r each page p do

pagerank (p) = the propor t ion o f walkers in page p .

done

(a) time t = 0 (b) time t = 1

Figure 4: Random surfers behavior

Suppose at time t = 0 the four walkers, red,green, blue and yellow, are distributed uniformly among the pages in

the graph shown in Figure 4. Walker green has only one link to click, which is B, while Red has two options A or B.

Yellow and blue both have one link to clicks. Red chooses one randomly let assume it is A. At time t = 1, page A

has two walkers , while B and A have one each. After simulating randomwalks for some time, you can calculate the

pagerank. The proportion of the walkers on a page is an estimate of the pagerank.

3

6 File format

Your program is required to read and writing from a textfile containing information about the webgraph. The format

of the file is as below.

1: Number of webpages in the file

2: pageid 0

3:

4:

5:

... ...

n-3: pageid k

n-2:

n-1:

n:

Below is an example of a file describing the graph shown in Figure 2.

4

0

A. com

0 .0

2

1

B. com

0 .0

0

2

C. com

0 .0

0 1

3

D. com

0 .0

1

7 Procedure

1. Implement a class to model a web page and its outgoing links following the specification in Figure 5.

The class Page is already defined in page.h. You need to implement the class in page.cpp. You may also need

to declare new methods in page.h and implement them in page.cpp. The class should contain:

Figure 5: Page class

(a) An integer to represent the pageid.

(b) A string to represent the page URL.

(c) An array or vector to model the outbound links of the page.

(d) Methods to access and modify internal data.

2. Implement a class to model the webgraph following the specification in Figure 6. The class should be able to

read/write a graph description, and calculate pagerank.

The class Web defined in web.h models the webgraph. You need to implement web.cpp and you can declare

additional methods and data members in web.h

4

Figure 6: Web class

(a) A list of the pages in the webgraph.

(b) A method read graph() to read a graph from the file.

(c) A method write graph() to write a webgraph to a file.

(d) A method to calculate the pagerank.

3. The main program is already implemented in page rank.cpp. The main() creates an object of type Web and

instructs it to read from a file, then calculate pagerank and finally, write the graph into a file.

Through command line arguments main() takes a graph description file, output file, number of random surfers

and number of simulation iterations.

To run the main you can type the following from the command line arguments:

$./ pagerank

4. Run your program with the example provided with the project files.

(a) To calculate the page rank of a graph in shown in Figure 2:

$./ pagerank graph_rep.txt graph_out.txt 1000 16000

This reads the webgraph information from graph rep.txt – graph depicted in figure2, performs randomwalk

to calculate pagerank with 1000 walkers for 16000 simulation steps, and write the webgraph with the

pagerank in graph out.txt. The output graph should look like, which is pretty close to what we calculated

in Equation 1

4

0

A

0.397

2

1

B

0.199

0

2

C

0.404

0 1

3

D

0

1

(b) Calculate the pagerank of graph 20 1 random.txt by running

./ pagerank graph_20_1_random.txt graph_20_1_random_out.txt 10000 16000

Page rank of hrs5lwt8.mil should be close to 0.3595, 2005b.gov 0.3672, and s3b.com is 0.2733.

(c) Pagerank of graph 30 0.5 random.txt

./ pagerank graph_30_0 .5 _random.txt graph_30_0 .5 _random_out.txt 10000 16000

Pagerank of vnm1um.edu is 0.3631, 1cr.mil is 0.311 and 2s.net is 0.3259.

8 Submission

In addition to completed web.h, web.cpp, page.h, page.cpp and page rank.cpp files, you need to include a readme.txt

with the solution to the pagerank equations for the random graph you generated. Be sure to include mygraph.txt

as well.

5

References

[1] L. Page, S. Brin, R. Motwani, and T. Winograd, “The pagerank citation ranking: Bringing order to the web.,”

Technical Report 1999-66, Stanford InfoLab, November 1999. Previous number = SIDL-WP-1999-0120.

[2] Wikipedia, “Pagerank — wikipedia, the free encyclopedia,” 2017. [Online; accessed 28-October-2017].

联系我们

- QQ：99515681
- 邮箱：99515681@qq.com
- 工作时间：8:00-23:00
- 微信：codinghelp2

- Cs2461-10实验程序代做、代写java，C/C++，Python编程设 2021-03-02
- 代写program程序语言、代做python，C++课程程序、代写java编 2021-03-02
- Programming课程代做、代写c++程序语言、Algorithms编程 2021-03-02
- 代写csc1-Ua程序、代做java编程设计、Java实验编程代做 代做留学 2021-03-02
- 代做program编程语言、代写python程序、代做python设计编程 2021-03-02
- 代写data编程设计、代做python语言程序、Python课程编程代写 代 2021-03-02
- Cse 13S程序实验代做、代写c++编程、C/C++程序语言调试 代写留学 2021-03-02
- Mat136h5编程代做、C/C++程序调试、Python，Java编程设计 2021-03-01
- 代写ee425x实验编程、代做python，C++，Java程序设计 帮做c 2021-03-01
- Cscc11程序课程代做、代写python程序设计、Python编程调试 代 2021-03-01
- 代写program编程、Python语言程序调试、Python编程设计代写 2021-03-01
- 代做r语言编程|代做database|代做留学生p... 2021-03-01
- Data Structures代写、代做r编程课程、代做r程序实验 帮做ha 2021-03-01
- 代做data留学生编程、C++，Python语言代写、Java程序代做 代写 2021-03-01
- 代写aps 105编程实验、C/C++程序语言代做 代写r语言程序|代写py 2021-03-01
- Fre6831 Computational Finance 2021-02-28
- Sta141b Assignment 5 Interactive Visu... 2021-02-28
- Eecs2011a-F20 2021-02-28
- Comp-251 Final Asssessment 2021-02-28
- 代写cs1027课程程序、代做java编程语言、代写java留学生编程帮做h 2021-02-28