Page rank algorithm

Jung Hoon Kim
N5, Room 2239
E-mail: junghoon.kim@kaist.ac.kr

2014.01.14

KAIST Knowledge Service Engineering
Data Mining Lab.


Introduction
 First introduced by Sergey Brin & Larry Page in 1998
 The ranking algorithms in use in 1996 were not suitable for the Web
 The number of Web pages grew rapidly
 In 1996, the query “classification technique” returned 10 million relevant pages
 Content-similarity methods are easily spammed
 They are vulnerable to spam pages


Basic
 The PageRank algorithm rests on two principles
 A hyperlink from a page pointing to another page is an implicit conveyance of authority to the target page. Thus, the more in-links a page i receives, the more prestige page i has.
 Pages that point to page i also have their own prestige scores. A page with a higher prestige score pointing to i is more important than a page with a lower prestige score pointing to i.
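
The two principles can be illustrated on a toy graph. This is a minimal sketch, not the PageRank computation itself: the `links` dictionary, the page names, and the prestige values passed to `weighted_in_links` are all made up for illustration.

```python
from collections import Counter

# Hypothetical toy Web graph: page -> pages it links to.
links = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}

def in_link_counts(links):
    """Principle 1: count the in-links each page receives."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in targets:
            counts[target] += 1
    return dict(counts)

def weighted_in_links(links, prestige):
    """Principle 2: sum the prestige of the pages pointing to each page."""
    scores = {page: 0.0 for page in links}
    for source, targets in links.items():
        for target in targets:
            scores[target] += prestige[source]
    return scores

counts = in_link_counts(links)
# Assumed prestige scores, chosen only for illustration.
scores = weighted_in_links(links, {"A": 1.0, "B": 1.0, "C": 5.0, "D": 1.0})
```

Here C receives two in-links and D only one, yet D ends up with the higher weighted score because its single in-link comes from a page with more prestige.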


Principle
 Hyperlink trick

 A node with many incoming links is more important


Authority
 What a person with more authority says is more important

 John is a computer scientist
 Alice is a cook

Big picture
 A famous person is one with many incoming edges

Cyclic problem
 On the Web, there are many cycles like this

 This example has the cycle A->B->E
 As a result, the scores keep increasing without bound
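
The runaway scores can be reproduced with a small sketch. The update rule here is an assumption used only for illustration (each page gets 1 plus the scores of the pages linking to it), and the graph contains just the cycle A->B->E->A.

```python
# Hypothetical graph containing only the cycle A -> B -> E -> A.
links = {"A": ["B"], "B": ["E"], "E": ["A"]}

def naive_authority_round(links, scores):
    """One round: each page gets 1 plus the scores of pages linking to it."""
    new_scores = {page: 1.0 for page in links}
    for source, targets in links.items():
        for target in targets:
            new_scores[target] += scores[source]
    return new_scores

scores = {page: 1.0 for page in links}
history = []  # score of page A after each round
for _ in range(5):
    scores = naive_authority_round(links, scores)
    history.append(scores["A"])
```

The score of A grows by one in every round and never settles, which is the behavior the random surfer model is meant to avoid.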


Random surfer trick
 To avoid problems such as cycles, they adopted the random surfer model
 From any node, the surfer can jump to any other node
 This solves the cycle problem
 Nodes with many incoming links still get a high rank
 The probability of following a link is called the damping factor (d)
 In Google's initial model d = 0.85, so a random jump happens with probability 1 - d = 0.15


Test
 Result of a 1000-step simulation
 Nearly correct
 D and A have a high rank
 A has only one incoming link
 Expressing the ranks as percentages makes them easy to compare
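
A simulation of this kind might look like the sketch below. It is an illustration under stated assumptions: the graph is hypothetical (not the one on the slide), `jump_prob` is the assumed probability of a random jump, and the function name and fixed seed are choices made here for reproducibility.

```python
import random

def simulate_random_surfer(links, steps=1000, jump_prob=0.15, seed=0):
    """Return the percentage of visits each page receives."""
    rng = random.Random(seed)
    pages = sorted(links)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        out_links = links[current]
        # Jump to a random page with probability jump_prob,
        # or whenever the current page has no out-links.
        if not out_links or rng.random() < jump_prob:
            current = rng.choice(pages)
        else:
            current = rng.choice(out_links)
    return {page: 100.0 * count / steps for page, count in visits.items()}

# Hypothetical graph: A receives links from E, C, and D.
links = {"A": ["B"], "B": ["E"], "E": ["A"], "C": ["A"], "D": ["A"]}
percentages = simulate_random_surfer(links)
```

Reporting visit counts as percentages of all steps makes pages directly comparable, and the values sum to 100.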


 Example


Solving the cycle problem


Formula


[Figure: example graph with nodes a, b, c, and i; labels 1, 3, 2]

Formula
 Mathematically, we have a system of n linear equations in n unknowns.
 P = (P1, P2, P3, …, Pn)
 A is the adjacency matrix, so the system can be written as a single matrix equation
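
One common way to write this system is sketched below. The symbols are assumptions not named on the slide: \mathcal{E} is the set of hyperlinks, O_j is the number of out-links of page j, and A is taken to be the adjacency matrix with each row scaled by the out-degree.

```latex
P(i) = \sum_{(j,i) \in \mathcal{E}} \frac{P(j)}{O_j},
\qquad
A_{ij} =
\begin{cases}
  \dfrac{1}{O_i} & \text{if } (i,j) \in \mathcal{E} \\[4pt]
  0 & \text{otherwise}
\end{cases},
\qquad
P = A^{T} P
```

Row i of A^{T} P is \sum_j A_{ji} P(j), which reproduces the sum on the left for each page i.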

Example


Linear Algebra
 Formula
 P is an eigenvector with the corresponding eigenvalue of 1.
 1 is the largest eigenvalue, and the PageRank vector P is the principal eigenvector
 To calculate P, we can use the power iteration algorithm


Condition
 This holds only if A is a stochastic matrix and is irreducible and aperiodic
 We can view the graph as a Markov chain
 Each Web page is a state (node) and each hyperlink is a transition
 A is not a stochastic matrix because it has a zero row (row 5). A zero row means the page has no out-links.
 We fix this by adding a complete set of outgoing links from each such page i to all the pages on the Web
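
The fix for zero rows can be sketched in a few lines. The 3-page matrix below is a made-up example (not the matrix from the slides), and the function names are chosen here for illustration.

```python
def is_row_stochastic(A, tol=1e-9):
    """True if every entry is non-negative and every row sums to 1."""
    return all(min(row) >= 0 and abs(sum(row) - 1.0) <= tol for row in A)

def fix_dangling_rows(A):
    """Replace each all-zero row with uniform links to all n pages."""
    n = len(A)
    fixed = []
    for row in A:
        if sum(row) == 0:
            fixed.append([1.0 / n] * n)  # page with no out-links
        else:
            fixed.append(list(row))
    return fixed

# Hypothetical example: the last page has no out-links.
A = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
A_fixed = fix_dangling_rows(A)
```

After the fix, every row sums to 1, so the matrix is stochastic; the rows that already had out-links are unchanged.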

Modified version


Irreducible
 If for some pair of nodes u and v there is no path from u to v, A is not irreducible.
 If there is a path from u to v for every pair of nodes u and v, A is irreducible.
 A state i is periodic with period k > 1 if k is the largest number such that all paths leading from state i back to state i have a length that is a multiple of k. If a state is not periodic, it is aperiodic. A Markov chain is aperiodic if all states are aperiodic.
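
Irreducibility can be checked directly from the definition by testing whether every node reaches every other node. This is a small sketch on hypothetical graphs; the function names are chosen here, and the check is the plain reachability test rather than an optimized algorithm.

```python
def reachable(links, start):
    """Return the set of nodes reachable from start (including start)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in links.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def is_irreducible(links):
    """True if there is a path from u to v for every pair of nodes."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    return all(reachable(links, node) == nodes for node in nodes)

# Hypothetical graphs: a 3-cycle, and a chain that ends at E.
cycle = {"A": ["B"], "B": ["E"], "E": ["A"]}
chain = {"A": ["B"], "B": ["E"], "E": []}
```

The 3-cycle passes the check, but every path from a node back to itself has a length that is a multiple of 3, so it is still periodic. Irreducibility alone is therefore not enough.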


Page Rank
 It is easy to deal with the above two problems with a single strategy
 We add a link from each page to every page and give each link a small transition probability controlled by a parameter d
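
The resulting model is often written as below. This sketch assumes the convention in which d is the probability of following a link (so 1 - d is the probability of a random jump), A is the row-stochastic transition matrix, e is a column vector of n ones, and the ranks are normalized so that \sum_i P(i) = 1.

```latex
P = \left( (1-d)\,\frac{e\,e^{T}}{n} + d\,A^{T} \right) P
\quad\Longrightarrow\quad
P(i) = \frac{1-d}{n} + d \sum_{j=1}^{n} A_{ji}\,P(j)
```

Each added link carries the small probability (1 - d)/n. Every entry of the combined matrix is then positive, which makes the chain both irreducible and aperiodic.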


Page Rank
 The PageRank values of Web pages can be computed with the power iteration method, which produces the principal eigenvector with an eigenvalue of 1
 The iteration ends when the PageRank values converge, that is, when they no longer change much between iterations
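
A power iteration of this kind can be sketched as follows. The assumptions are labeled in the code: d is taken as the probability of following a link (0.85 here), pages without out-links spread their rank over all pages, every link target appears as a key of `links`, and the graph is hypothetical.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Power iteration; assumes every link target is also a key of links."""
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(max_iter):
        # Rank held by pages without out-links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        for source in pages:
            for target in links[source]:
                new_rank[target] += d * rank[source] / len(links[source])
        # Stop when the values no longer change much.
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank

# Hypothetical graph: A receives links from E, C, and D.
links = {"A": ["B"], "B": ["E"], "E": ["A"], "C": ["A"], "D": ["A"]}
ranks = pagerank(links)
```

The ranks sum to 1 in every iteration, so multiplying by 100 gives percentages. The stopping threshold `tol` trades accuracy against the number of iterations.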


Real Page rank
 Dealing with Web spam is the most important issue
 Using the same random surfer constant for every page and computing the ranks of all pages requires many iterations
 Currently, Google uses more than 200 factors to rank Web pages


Thank you

