# 5 Understanding Page Rank

An introductory lecture on the Google PageRank algorithm stressing the mathematical underpinnings. Based on the excellent book by Langville & Meyer.

Published in: Education, Technology

### 5 Understanding Page Rank

1. **Understanding Google's PageRank™** — Amy Langville and Carl Meyer, *Google's PageRank and Beyond: The Science of Search Engine Rankings*, Princeton University Press, 2006.
2. **Review: The Search Engine**
3. **An Elegant Formula**
   - $\pi^T = \pi^T(\alpha S + (1 - \alpha)E)$
   - Google's (Brin & Page) PageRank™ equation.
   - US Patent #6285999, filed 1998, granted 2001.
   - This formula resolves the world's largest matrix calculation.
4. **$\pi^T = \pi^T(\alpha S + (1 - \alpha)E)$**
   - Derived from a formula B&P worked out in graduate school (itself derived from the traditional bibliometrics research literature):
   - $r(P_i) = \sum_{P_j \in B_{P_i}} \dfrac{r(P_j)}{|P_j|}$
   - Essential characteristic: high-ranking pages associate with high-ranking pages.
5. **$\pi^T = \pi^T(\alpha S + (1 - \alpha)E)$**
   - $r(P_i) = \sum_{P_j \in B_{P_i}} \dfrac{r(P_j)}{|P_j|}$
   - Where: $r(P_i)$ is the rank of a given page; $B_{P_i}$ is the set of pages that link back to $P_i$; $r(P_j)$ is the rank of a back-linking page; $|P_j|$ is the number of out-links on page $P_j$.
   - Must be applied to a set of linked pages, i.e. a graph.
   - To do this we analyze the graph to find each page's out-links and back-links.
   - Therefore. . .
6. **$\pi^T = \pi^T(\alpha S + (1 - \alpha)E)$**
   - A site graph like this: *(figure: site graph of six pages, 1–6)*
7. **$\pi^T = \pi^T(\alpha S + (1 - \alpha)E)$**
   - becomes a directed graph like this: *(figure: directed graph over pages 1–6)*
8. **But there's a problem**
   - Nothing's ranked!
   - $r(P_i) = \sum_{P_j \in B_{P_i}} \dfrac{r(P_j)}{|P_j|}$ needs the ranks $r(P_j)$ of the back-linking pages as input, and those are exactly the values we are trying to compute.
9. **The solution. . . sort of**
   - Start by assuming all the ranks are equal. In this example each page is just 1 of 6, so the initial rank of every page is 1/6.
   - Then you keep feeding the numbers through the formula until you get a ranking.
   - This results in a rank matrix. . .
10. **Directed graph iterative node values**

    | Page  | $r_0$ | $r_1$ | $r_2$ | Rank after iteration 2 |
    |-------|-------|-------|-------|------------------------|
    | $P_1$ | 1/6   | 1/18  | 1/36  | 5 |
    | $P_2$ | 1/6   | 5/36  | 1/18  | 4 |
    | $P_3$ | 1/6   | 1/12  | 1/36  | 5 |
    | $P_4$ | 1/6   | 1/4   | 17/72 | 1 |
    | $P_5$ | 1/6   | 5/36  | 11/72 | 3 |
    | $P_6$ | 1/6   | 1/6   | 14/72 | 2 |
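The two iterations in the table can be reproduced exactly. Here is a minimal sketch in Python using exact fractions; the matrix `H` encodes the six-page directed graph read off the slides (page $P_i$ splits its rank evenly over its out-links):

```python
from fractions import Fraction as F

# Hyperlink matrix for the six-page graph on the slides:
# H[i][j] is 1/|P_i| if page i links to page j, else 0. Row P2 is all
# zeros because P2 has no out-links (a "dangling" page).
H = [
    [0, F(1, 2), F(1, 2), 0, 0, 0],        # P1 -> P2, P3
    [0, 0, 0, 0, 0, 0],                    # P2 (dangling)
    [F(1, 3), F(1, 3), 0, 0, F(1, 3), 0],  # P3 -> P1, P2, P5
    [0, 0, 0, 0, F(1, 2), F(1, 2)],        # P4 -> P5, P6
    [0, 0, 0, F(1, 2), 0, F(1, 2)],        # P5 -> P4, P6
    [0, 0, 0, 1, 0, 0],                    # P6 -> P4
]

def step(r):
    """One iteration: each page passes its rank, split evenly, along its out-links."""
    return [sum(r[j] * H[j][i] for j in range(6)) for i in range(6)]

r0 = [F(1, 6)] * 6  # all ranks start equal
r1 = step(r0)
r2 = step(r1)
print([str(x) for x in r1])  # ['1/18', '5/36', '1/12', '1/4', '5/36', '1/6']
print([str(x) for x in r2])  # ['1/36', '1/18', '1/36', '17/72', '11/72', '7/36']
```

Note that `Fraction` reduces 14/72 to 7/36, and that `sum(r1)` is 5/6 rather than 1: rank leaks out through the dangling page $P_2$, one of the defects the adjustments on slides 16–17 repair.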
11. **This can't go on forever**
    - Some values are equivalent (ties). In the interest of speed and efficiency, we need to know whether the ranks converge: will we eventually break all ties, or will we keep iterating indefinitely and never have a decisive ranking?
    - To determine this, the formula must be transformed using a binary adjacency transformation and Markov chain theory.
12. **Convert the iterative calculation to a matrix calculation** using a binary adjacency transformation, with the ranks held in a $1 \times n$ vector:

    $$H = \begin{bmatrix}
    0 & 1/2 & 1/2 & 0 & 0 & 0 \\
    0 & 0 & 0 & 0 & 0 & 0 \\
    1/3 & 1/3 & 0 & 0 & 1/3 & 0 \\
    0 & 0 & 0 & 0 & 1/2 & 1/2 \\
    0 & 0 & 0 & 1/2 & 0 & 1/2 \\
    0 & 0 & 0 & 1 & 0 & 0
    \end{bmatrix}$$

    (rows and columns are indexed by $P_1, \dots, P_6$; row $i$ gives the probability of moving from $P_i$ to each other page)
13. **Now you can treat each row as a vector**, i.e. a set of values. *(the same matrix as slide 12, with one row highlighted)*
14. **This is a sparse matrix** (most of its entries are zero). That's good. *(the same matrix as slide 12)*
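Sparsity is what makes web-scale PageRank feasible: you store and touch only the nonzero entries. A hedged sketch of the same matrix in sparse, adjacency-list form (the 0–5 indexing of $P_1$–$P_6$ is my own convention):

```python
# Sparse form of the hyperlink matrix: keep only source -> targets.
# Storage and per-iteration work are proportional to the number of
# links, not to the n*n entries of a dense matrix.
links = {
    0: [1, 2],     # P1 -> P2, P3
    2: [0, 1, 4],  # P3 -> P1, P2, P5
    3: [4, 5],     # P4 -> P5, P6
    4: [3, 5],     # P5 -> P4, P6
    5: [3],        # P6 -> P4
}                  # P2 (index 1) has no out-links, so it is simply absent

def sparse_step(r):
    """One iteration of r <- r H, visiting only the nonzero entries."""
    nxt = [0.0] * len(r)
    for src, targets in links.items():
        share = r[src] / len(targets)  # rank split evenly over out-links
        for t in targets:
            nxt[t] += share
    return nxt

r1 = sparse_step([1 / 6] * 6)
print(r1)  # same values as the dense iteration: 1/18, 5/36, 1/12, 1/4, 5/36, 1/6
```

For the real web graph (billions of pages, a handful of out-links each) this representation is the only practical option.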
15. **$\pi^T = \pi^T(\alpha S + (1 - \alpha)E)$**
    - So now this: $r(P_i) = \sum_{P_j \in B_{P_i}} \dfrac{r(P_j)}{|P_j|}$
    - Has become this: $\pi^{(k+1)T} = \pi^{(k)T} H$
    - We only need a couple more adjustments.
16. **$\pi^T = \pi^T(\alpha S + (1 - \alpha)E)$**
    - Sometimes people teleport to a page: they just enter the URL and go. And just as easily, they can teleport out. To account for this, B&P added two adjustments.
    - $\alpha S$ accounts for people who reach a dead end and jump to another page within a site. $\alpha$ is the weighted probability that someone keeps following links rather than leaving.
    - $S$ is a matrix of probable page destinations.
17. **$\pi^T = \pi^T(\alpha S + (1 - \alpha)E)$**
    - What about people who jump out to a completely new destination? To account for this, B&P added the final adjustments.
    - $1 - \alpha$ is the complementary weighted probability that someone will leave and go to a completely new site.
    - $E$ is a random teleportation matrix of probable page destinations.
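Putting slides 16 and 17 together, the full damped iteration $\pi^T \leftarrow \pi^T(\alpha S + (1-\alpha)E)$ can be sketched as follows. This is a hedged sketch, not Google's implementation: $S$ is realized by spreading dangling-page mass uniformly, $E$ is the uniform teleportation matrix, and $\alpha = 0.85$ is the value commonly attributed to Brin & Page. The example graph is the six-page one from the earlier slides.

```python
def pagerank(links, n, alpha=0.85, tol=1e-12):
    """Power iteration on the Google matrix G = alpha*S + (1-alpha)*E.

    links maps page index -> list of out-link targets. S fixes dangling
    pages by spreading their rank uniformly; E teleports uniformly.
    """
    pi = [1.0 / n] * n
    while True:
        nxt = [0.0] * n
        dangling = 0.0
        for i in range(n):
            outs = links.get(i, [])
            if outs:
                for t in outs:            # follow links with probability alpha
                    nxt[t] += alpha * pi[i] / len(outs)
            else:
                dangling += pi[i]         # dead end: the S adjustment
        for i in range(n):                # teleportation: the E adjustment
            nxt[i] += (alpha * dangling + (1 - alpha)) / n
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt

# Six-page example graph (indices 0-5 stand for P1-P6)
links = {0: [1, 2], 2: [0, 1, 4], 3: [4, 5], 4: [3, 5], 5: [3]}
pi = pagerank(links, 6)
print(pi)  # a proper probability vector: entries sum to 1, P4 ranks highest
```

Because the resulting Google matrix is stochastic, irreducible, and aperiodic, Markov chain theory guarantees this iteration converges to a unique positive $\pi$ from any starting vector, which answers the convergence question slide 11 raised.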