1. 1
The Maths behind Web search engines
(PageRank)
2010
Dante Vsevolod Zubov
2. 2
Web search in a nutshell
Storing pages
Use depth-first search to discover new pages (crawling)
Ranking results
Human-generated ranking
− Yahoo! Directory
Automated ranking
− Text
− Metadata (title, keywords, etc.)
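The crawling bullet above mentions depth-first search. A minimal sketch of that traversal follows. It runs on an in-memory toy link graph rather than real HTTP fetches. The names `crawl_depth_first`, `get_links` and `toy_web` are hypothetical and chosen only for illustration.

```python
def crawl_depth_first(start, get_links):
    """Return the pages reachable from `start` in depth-first discovery order.

    `get_links(page)` is an assumed callback returning the outlinks of a page;
    a real crawler would fetch and parse the page here.
    """
    discovered = []
    seen = set()
    stack = [start]
    while stack:
        page = stack.pop()
        if page in seen:
            continue
        seen.add(page)
        discovered.append(page)
        # Push links in reverse so the first outlink is explored first.
        for target in reversed(get_links(page)):
            if target not in seen:
                stack.append(target)
    return discovered


# Hypothetical toy web: page name -> list of outlinks.
toy_web = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["A"],
    "D": [],
}
order = crawl_depth_first("A", lambda page: toy_web.get(page, []))
print(order)
```

Swapping the stack for a queue would turn the same loop into a breadth-first crawl.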
3. 3
Motivation for PageRank
Prioritization of results is crucial
PageRank uses the link structure of the Web to
rank pages by their “importance”
Named after Larry Page (co-founder of Google)
4. 4
Simple PageRank
P_i is a page
r: page → [0,1] is the PageRank function
B_{P_i} is the set of pages pointing to P_i (backlinks)
|P_j| is the number of outlinks pointing from P_j
Calculate by iterating:
r_{k+1}(P_i) = \sum_{P_j \in B_{P_i}} r_k(P_j) / |P_j|
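The iteration can be sketched in a few lines of Python: in every round each page splits its current rank equally among its outlinks, and a page's new rank is the sum of what its backlinks send it. This sketch assumes a uniform starting rank of 1/n, a fixed number of rounds, and no pages without outlinks. The function name `simple_pagerank` and the three-page graph are hypothetical.

```python
def simple_pagerank(outlinks, iterations=50):
    """Iterate the simple PageRank update on a dict: page -> list of outlinks.

    Assumes every page has at least one outlink and no duplicate links.
    """
    pages = list(outlinks)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # assumed uniform starting rank
    for _ in range(iterations):
        new_rank = {p: 0.0 for p in pages}
        for pj, targets in outlinks.items():
            share = rank[pj] / len(targets)  # r(P_j) / |P_j|
            for pi in targets:
                new_rank[pi] += share  # P_j is a backlink of P_i
        rank = new_rank
    return rank


# Hypothetical toy graph: A links to B and C, B links to C, C links to A.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = simple_pagerank(toy_links)
print(ranks)
```

On this toy graph the ranks settle near 0.4 for A, 0.2 for B and 0.4 for C.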
6. 6
Random Surfer model
• on any page, the random surfer follows one of the
outlinks at random with probability d
(damping factor, usually taken as 0.85);
• with probability (1-d) the random surfer gets bored
and selects some page from the entire Web at
random;
• if the page does not have outlinks then the random
surfer is “teleported” to some random page on
the Web.
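The three rules above can be tried out with a small Monte Carlo simulation: walk the graph for many steps and record how often each page is visited. This is only a sketch under assumptions. The function name `simulate_random_surfer`, the step count, the fixed seed and the four-page toy web are all hypothetical, and the visit frequencies only approximate the PageRank values.

```python
import random


def simulate_random_surfer(outlinks, d=0.85, steps=100_000, seed=0):
    """Estimate page importance as the visit frequency of a random surfer."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    pages = list(outlinks)
    visits = {p: 0 for p in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        targets = outlinks[current]
        if targets and rng.random() < d:
            # Follow one of the outlinks at random (probability d).
            current = rng.choice(targets)
        else:
            # Bored (probability 1 - d) or no outlinks: jump anywhere.
            current = rng.choice(pages)
    return {p: count / steps for p, count in visits.items()}


# Hypothetical toy web; page D has no outlinks.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A", "D"], "D": []}
freq = simulate_random_surfer(toy_web)
print(freq)
```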
10. 10
Matrix representation
The matrix M on the right is the column-normalized
version of the adjacency matrix we saw earlier.
l(P_i, P_j) = 0 if P_j does not link to P_i
l(P_i, P_j) = 1/|P_j| if P_j links to P_i
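As an illustration of the definition of l(P_i, P_j), the sketch below builds M as a list of rows from a dictionary of outlinks. The function name `link_matrix` and the three-page graph are hypothetical, and the sketch assumes no page lists the same outlink twice.

```python
def link_matrix(pages, outlinks):
    """Build M with M[i][j] = 1/|P_j| if P_j links to P_i, else 0."""
    index = {p: i for i, p in enumerate(pages)}
    n = len(pages)
    M = [[0.0] * n for _ in range(n)]
    for pj in pages:
        targets = outlinks[pj]
        for pi in targets:
            # Row = target page P_i, column = source page P_j.
            M[index[pi]][index[pj]] = 1.0 / len(targets)
    return M


# Hypothetical toy graph: A links to B and C, B links to C, C links to A.
pages = ["A", "B", "C"]
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
M = link_matrix(pages, toy_links)
for row in M:
    print(row)
```

Each column of a page that has outlinks sums to 1; a page without outlinks would leave an all-zero column.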
11. 11
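Existence and uniqueness of R
Let E be the N by N matrix with all its elements
equal to 1; then ER = 1 (the all-ones vector),
because the entries of R sum to 1.
R = dMR + ((1-d)/N) 1 = (dM + ((1-d)/N) E) R
Call the matrix in the middle M'; then R is a
1-eigenvector of M' (an eigenvector with eigenvalue 1)
M' is a (column) stochastic matrix
By the Perron–Frobenius theorem, R does in fact
exist and is unique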
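A numerical check of the eigenvector claim can be sketched with power iteration: repeatedly multiply a starting vector by M' until it stops changing. The sketch assumes M' = dM + ((1-d)/N) E, the form used in the Wikipedia article listed in the references, and a column-stochastic M with no dangling pages. The names `google_matrix` and `power_iteration` and the 3 by 3 example are hypothetical.

```python
def google_matrix(M, d=0.85):
    """Return M' = d*M + ((1-d)/N)*E for a square list-of-rows matrix M."""
    n = len(M)
    return [[d * M[i][j] + (1.0 - d) / n for j in range(n)] for i in range(n)]


def power_iteration(A, tol=1e-12, max_iter=1000):
    """Approximate the vector R with A R = R, starting from uniform ranks."""
    n = len(A)
    r = [1.0 / n] * n
    for _ in range(max_iter):
        new_r = [sum(A[i][j] * r[j] for j in range(n)) for i in range(n)]
        if sum(abs(x - y) for x, y in zip(new_r, r)) < tol:
            return new_r
        r = new_r
    return r


# Hypothetical column-stochastic M for pages A, B, C
# (A links to B and C, B links to C, C links to A).
M = [[0.0, 0.0, 1.0],
     [0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0]]
M_prime = google_matrix(M)
R = power_iteration(M_prime)
print(R)
```

Because every column of M' sums to 1, the entries of R keep summing to 1 throughout the iteration, so no renormalization step is needed in this sketch.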
13. 13
Questions?
References:
Brin, S. and Page, L. (1998) The Anatomy of a Large-Scale Hypertextual Web Search Engine.
Langville, A. N. and Meyer, C. D. (2006) Google's PageRank and Beyond, Chapter 4.
Wikipedia, PageRank. http://en.wikipedia.org/wiki/PageRank (11/2010)