1
The Maths behind Web search engines
(PageRank)
2010
Dante Vsevolod Zubov
2
Web search in a nutshell

Storing pages

Use depth-first search to discover new pages
(crawling)

Ranking results

Human generated ranking
− Yahoo! Directory

Automated ranking
− Text
− Meta data (title, keywords etc.)
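
The crawling step can be pictured as a graph traversal. A minimal sketch, assuming the Web is given as an in-memory dictionary (`toy_web` and `crawl_dfs` are hypothetical names standing in for fetching a page and extracting its links):

```python
def crawl_dfs(links, start):
    """Return pages reachable from `start` in depth-first discovery order.

    `links` maps a page to the list of pages it links to; it is a toy
    stand-in for downloading a page and parsing its outlinks.
    """
    discovered = []
    seen = set()
    stack = [start]
    while stack:
        page = stack.pop()
        if page in seen:
            continue
        seen.add(page)
        discovered.append(page)
        # Push outlinks in reverse so the first outlink is explored first.
        for target in reversed(links.get(page, [])):
            if target not in seen:
                stack.append(target)
    return discovered


toy_web = {"A": ["B", "C"], "B": ["D"], "C": ["A"], "D": []}
order = crawl_dfs(toy_web, "A")
```

A real crawler would also need URL normalization, politeness delays and persistent storage, none of which is shown here.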
3
Motivation for the PageRank

prioritization of results is crucial

PageRank uses the link structure of the Web to
rank pages by their “importance”

named after Larry Page (co-founder of Google)
4
Simple PageRank
$P_i$ is a page
$r$: page → [0,1] is the PageRank function
$B_{P_i}$ is the set of pages pointing to $P_i$ (backlinks)
$|P_j|$ is the number of outlinks pointing from $P_j$
Calculate by iterating:
$$r_{k+1}(P_i) = \sum_{P_j \in B_{P_i}} \frac{r_k(P_j)}{|P_j|}$$
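
A minimal sketch of this iteration, assuming each page passes its current rank in equal shares to the pages it links to and that every page has at least one outlink. The function name, the uniform starting value and the three-page web are illustrative assumptions:

```python
def simple_pagerank(outlinks, iterations=50):
    """Iterate r(P_i) = sum over backlinks P_j of r(P_j) / |P_j|.

    `outlinks` maps each page to the list of pages it links to.
    Every page is assumed to have at least one outlink.
    """
    pages = list(outlinks)
    rank = {p: 1.0 / len(pages) for p in pages}  # uniform start (assumption)
    for _ in range(iterations):
        new_rank = {p: 0.0 for p in pages}
        for source, targets in outlinks.items():
            share = rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: A -> B, C; B -> C; C -> A.
ranks = simple_pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

On this toy web the values settle near 0.4 for A and C and 0.2 for B.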
5
Simple PageRank

Will r converge in all cases?

What about pages with no outlinks?
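
Both questions can be probed numerically. The sketch below uses two hypothetical toy graphs. On a two-page cycle the ranks swap forever when the start is not uniform, and on a graph with a sink the total rank drains away:

```python
def simple_step(outlinks, rank):
    """One application of the simple PageRank update."""
    new_rank = {p: 0.0 for p in outlinks}
    for source, targets in outlinks.items():
        for target in targets:
            new_rank[target] += rank[source] / len(targets)
    return new_rank


# Two pages linking only to each other: the ranks oscillate.
cycle = {"A": ["B"], "B": ["A"]}
r0 = {"A": 1.0, "B": 0.0}
r1 = simple_step(cycle, r0)
r2 = simple_step(cycle, r1)

# Page S has no outlinks, so the rank it receives is lost.
with_sink = {"A": ["S"], "S": []}
leak = {"A": 0.5, "S": 0.5}
for _ in range(2):
    leak = simple_step(with_sink, leak)
total_after = sum(leak.values())
```

Here `r2` equals `r0` again, and `total_after` is 0 although the ranks started out summing to 1.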
6
Random Surfer model
• on any page, the random surfer follows one of the
outlinks, chosen at random, with some probability d
(the damping factor, usually taken as 0.85);
• with probability (1-d) the random surfer gets bored
and selects some page from the entire Web at
random;
• if the page does not have outlinks, then the random
surfer is “teleported” to some random page on
the Web.
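
The model can be simulated directly. A minimal sketch, assuming a fixed seed for reproducibility and a hypothetical four-page web. The visit frequencies it returns approximate the ranks that the model defines:

```python
import random


def random_surfer(outlinks, d=0.85, steps=100_000, seed=0):
    """Estimate visit frequencies of the random surfer by simulation."""
    rng = random.Random(seed)
    pages = list(outlinks)
    visits = {p: 0 for p in pages}
    page = rng.choice(pages)
    for _ in range(steps):
        visits[page] += 1
        targets = outlinks[page]
        if targets and rng.random() < d:
            page = rng.choice(targets)   # follow a random outlink
        else:
            page = rng.choice(pages)     # bored, or stuck on a sink
    return {p: count / steps for p, count in visits.items()}


# Hypothetical toy web; page D is a sink.
freq = random_surfer({"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": []})
```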
7
Adjusted PageRank

$B_{P_i}$ now includes all of the sink pages
(a page with no outlinks is treated as linking to every page)

$r$ is a discrete probability distribution
(its values sum to 1)
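
Written out, the adjusted rank usually takes the following form. This is a reconstruction from the random surfer model, with d the damping factor and N the total number of pages:

```latex
r(P_i) = \frac{1-d}{N} + d \sum_{P_j \in B_{P_i}} \frac{r(P_j)}{|P_j|},
\qquad
\sum_{i=1}^{N} r(P_i) = 1
```

For a sink page $P_j$ the convention assumed here is $|P_j| = N$, since it counts as linking to every page.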
8
PageRank example
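
As a small numeric illustration (a hypothetical four-page web, not the example shown on the slide), the adjusted PageRank can be computed as follows. The sketch assumes damping factor d = 0.85 and treats a sink as linking to every page:

```python
def pagerank(outlinks, d=0.85, tol=1e-10, max_iter=1000):
    """Adjusted PageRank: damping factor d, sinks link to every page."""
    pages = list(outlinks)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        new_rank = {p: (1.0 - d) / n for p in pages}
        for source in pages:
            # A sink is treated as if it linked to all n pages.
            targets = outlinks[source] or pages
            share = d * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical four-page web; D has no outlinks.
example = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": []}
ranks = pagerank(example)
best = max(ranks, key=ranks.get)
```

On this graph C comes out slightly ahead of A, and D, which nobody links to, receives only the teleportation share.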
9
Matrix representation

The link structure of the Web
can be represented by an
adjacency matrix

Define
10
Matrix representation

The matrix M on the right is the column-normalized
version of the adjacency matrix we saw earlier.

$l(P_i, P_j) = 0$ if $P_j$ does not link to $P_i$

$l(P_i, P_j) = 1/|P_j|$ if $P_j$ links to $P_i$
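
A minimal sketch of building M from link lists, assuming no page lists the same outlink twice. The function name and the three-page web are illustrative:

```python
def link_matrix(outlinks):
    """Return (pages, M) with M[i][j] = l(P_i, P_j).

    M[i][j] is 1/|P_j| if page j links to page i and 0 otherwise,
    so every column belonging to a page with outlinks sums to 1.
    """
    pages = list(outlinks)
    index = {p: i for i, p in enumerate(pages)}
    n = len(pages)
    M = [[0.0] * n for _ in range(n)]
    for source, targets in outlinks.items():
        j = index[source]
        for target in targets:
            M[index[target]][j] = 1.0 / len(targets)
    return pages, M


# Hypothetical web: A -> B, C; B -> C; C -> A.
pages, M = link_matrix({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
column_sums = [sum(M[i][j] for i in range(3)) for j in range(3)]
```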
11
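Existence and uniqueness of R

Let E be the N by N matrix with all its elements
equal to 1; then ER = 1 (the vector of ones), because
the elements of R sum to 1.

The PageRank equation can therefore be written as
$R = \left(\frac{1-d}{N} E + dM\right) R$.
Call the matrix in the middle M'; then R is a
1-eigenvector of M'.

M' is a (column-)stochastic matrix with strictly
positive elements.

By the Perron–Frobenius theorem, R does in fact
exist and is unique (up to scaling, which is fixed by
requiring the elements of R to sum to 1).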
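
The stochastic property can be checked in one line. A sketch, assuming $M' = \frac{1-d}{N} E + dM$ with $0 < d < 1$ and every column of M summing to 1 (sink columns already replaced by uniform ones):

```latex
\sum_{i=1}^{N} M'_{ij}
  = \frac{1-d}{N} \cdot N + d \sum_{i=1}^{N} M_{ij}
  = (1-d) + d
  = 1
  \quad \text{for every column } j,
\qquad
M'_{ij} \ge \frac{1-d}{N} > 0 .
```

The second inequality is the positivity that the Perron–Frobenius argument relies on.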
12
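Computation of R

Algebraic method
$$R = \frac{1-d}{N} (I - dM)^{-1} \mathbf{1}$$

Iterative method
$$R_{k+1} = dMR_k + \frac{1-d}{N} \mathbf{1}$$

repeat iteration until $\lVert R_{k+1} - R_k \rVert < \varepsilon$
for some small tolerance $\varepsilon$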
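
Both methods can be sketched in a few lines of plain Python. The sketch assumes a column-stochastic M given as nested lists, and solves the linear system $(I - dM)R = \frac{1-d}{N}\mathbf{1}$ by elimination instead of forming the inverse. The function names, the tolerance and the toy matrix are illustrative assumptions:

```python
def iterate(M, d=0.85, eps=1e-12, max_iter=10_000):
    """Repeat R <- d*M*R + (1-d)/N until the change is below eps."""
    n = len(M)
    R = [1.0 / n] * n
    for _ in range(max_iter):
        new_R = [(1.0 - d) / n + d * sum(M[i][j] * R[j] for j in range(n))
                 for i in range(n)]
        change = sum(abs(a - b) for a, b in zip(new_R, R))
        R = new_R
        if change < eps:
            break
    return R


def solve(M, d=0.85):
    """Solve (I - d*M) R = (1-d)/N * 1 by Gauss-Jordan elimination."""
    n = len(M)
    # Augmented matrix [I - d*M | (1-d)/N].
    A = [[(1.0 if i == j else 0.0) - d * M[i][j] for j in range(n)]
         + [(1.0 - d) / n] for i in range(n)]
    for col in range(n):
        # Partial pivoting: bring the largest entry of the column up.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(n):
            if row != col:
                factor = A[row][col] / A[col][col]
                A[row] = [a - factor * b for a, b in zip(A[row], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Column-stochastic M for a hypothetical web A -> B, C; B -> C; C -> A.
M = [[0.0, 0.0, 1.0],
     [0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0]]
R_iter = iterate(M)
R_alg = solve(M)
```

The two results agree to within the tolerance. For a Web-sized matrix only the iterative method is practical, because each step needs just one sparse matrix-vector product.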
13
Questions?
References:

Brin, S. and Page, L. (1998). The Anatomy of a
Large-Scale Hypertextual Web Search Engine.

Langville, A. N. and Meyer, C. D. (2006). Google's
PageRank and Beyond, Chapter 4.

Wikipedia. PageRank. http://en.wikipedia.org/wiki/PageRank (11/2010)
