A history of PageRank from the numerical computing perspective

We'll survey some of the underlying ideas from Google's PageRank algorithm along the lines of Massimo Franceschet's CACM history.

There are some slight liberties I've taken to make it more accessible.

  1. Numerical computing & Google’s PageRank
     DAVID F. GLEICH, CS 197 PRESENTATION
  2. Hey Katie, do you have a date for Valentine’s Day? It was 1234567890 in 2009.
  3. Thanks Internet!
     http://school.discoveryeducation.com/clipart/clip/stk-fgr6.html
     http://listsoplenty.com/pix/tag/cartoon
     https://www.facebook.com/ProgrammersJokes
     http://www.feld.com/wp/archives/2009/02/unix-time-1234567890-on-valentines-day.html
  4. Thanks Internet! Thanks Google! (Same image credits as slide 3.)
  5. How did Google get started?
  6. How did Google get started? … with an idea … on the shoulders of giants!
  7. LEO KATZ
  8. Vannevar Bush
     “wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them, ready to be dropped into the memex and there amplified”
     -- “As we may think”, The Atlantic, July 1945
  9. Sir Tim Berners-Lee
     “We should work towards a universal linked information system … to allow a place for any information or reference one felt was important and a way of finding it afterwards.”
     -- Founding proposal for “the mesh”, 1989
  10. … the mesh became the web … the web became a mess ... “finding it afterwards”? Hah!
  11. Larry Page, Sergey Brin
      • Grad students at Stanford
      • Worked with Terry Winograd (artificial intelligence)
      • Created a web-search algorithm called “backrub”
      • Spun off a company “Googol”
      • Worth about $20 billion each
  12. A cartoon websearch primer
      1. Crawl webpages
      2. Analyze webpage text (information retrieval)
      3. Analyze webpage links
      4. Fit measures to human evaluations
      5. Produce rankings
      6. Continuously update
  13. SportsIllustrated.com
      BobsPortsIllustrated.com
  14. 1 2to 3
  15. What pages are important?
      Those that people visit a lot!
      How do we check?
      Create a model of how people visit the web.
  16. What pages are important?
      The Google random surfer
      • Follows a random link with probability alpha: “random clicks”
      • Goes anywhere with probability (1-alpha): “random jumps”
  17. This is a Markov chain!
  18. Andrei Markov
      • Studied sequences of random variables.
      • The probability that the random variable takes a particular value depends only on its current value.
      • The “page id” is the “random variable” in the Markov chain!
  19. Oskar Perron, Georg Frobenius
      • Simultaneously discovered when a Markov chain has an “average”
      • The “average” of the web? It’s the probability of finding the random surfer at a page.
      • In 1907
  20. What pages are important?
      Perron and Frobenius proved the following algorithm always converges to a solution…
        set prob[i] = 0 for all pages
        set p to a random page
        for t = 1 to ...
          increment prob[p]
          if rand() < alpha, set p to a random neighbor of p
          else, set p to a random page
  21. Richard von Mises
      • Created “the power method”
      • An efficient algorithm to “average” a Markov chain
      • It updates the probabilities of all pages at once.
      “Praktische Verfahren der Gleichungsauflösung”, R. von Mises and H. Pollaczek-Geiringer, 1929
  22. What pages are important?
      Using the von Mises method …
        set prob[i] = 1/n for all pages
        for t = 1 to about 80
          set newprob[i] = 0 for all pages
          for all links from page i to page j
            set newprob[j] += prob[i]/deg[i]
          for all pages i
            set prob[i] = alpha*newprob[i] + (1-alpha)/n
  23. That algorithm underlying Google’s analysis of the web is from 1929!
  24. Leo Katz
  25. That’s not quite right, Wikipedia! Leo Katz
  26. A new status index (1953), Leo Katz
      A paper about how information spreads in groups …
      “For example, the information that the new high-school principal is unmarried and handsome might occasion a violent reaction in a ladies garden club and hardly a ripple of interest in a luncheon group of the local chamber of commerce. On the other hand, the luncheon group might be anything but apathetic in its response to information concerning a fractional change in credit buying restrictions announced by the federal government.”
  27. … there were many other shoulders too …
  28. Gene Golub
      Popularized numerical computing with matrices via the informal “Golub thesis”: “anything worth computing can be stated as a matrix problem”
      William Kahan
      Formalized IEEE-754 floating point arithmetic. Made it possible to compute with probabilities as “real numbers” instead of discrete counts.
  29. Credits
      Most pictures taken from Google image search.
      Original idea from Massimo Franceschet, “PageRank: Standing on the shoulders of giants”
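
The two pseudocode slides (the random surfer on slide 20 and the von Mises power method on slide 22) can be sketched in Python roughly as follows. This is a minimal illustration, not Google's implementation. The 4-page graph `LINKS`, the value `alpha = 0.85`, the step count, the seed, and the function names are all assumptions made for the example. It also assumes every page has at least one outgoing link, a case the slides do not address.

```python
import random


def power_method(links, alpha=0.85, iters=80):
    """Slide 22: update the probabilities of all pages at once."""
    n = len(links)
    prob = [1.0 / n] * n
    for _ in range(iters):
        newprob = [0.0] * n
        # Each page i splits its probability evenly among the pages it links to.
        for i, outs in enumerate(links):
            for j in outs:
                newprob[j] += prob[i] / len(outs)
        # Mix in the "random jump" to any page with probability (1 - alpha).
        prob = [alpha * x + (1 - alpha) / n for x in newprob]
    return prob


def random_surfer(links, alpha=0.85, steps=200_000, seed=0):
    """Slide 20: follow a single surfer and count visits to each page."""
    rng = random.Random(seed)
    n = len(links)
    visits = [0] * n
    p = rng.randrange(n)
    for _ in range(steps):
        visits[p] += 1
        if rng.random() < alpha:
            p = rng.choice(links[p])   # "random click": follow a link
        else:
            p = rng.randrange(n)       # "random jump": go anywhere
    # Turn raw visit counts into fractions so they are comparable to prob.
    return [v / steps for v in visits]


# Hypothetical 4-page web: LINKS[i] lists the pages that page i links to.
LINKS = [[1, 2], [2], [0], [0, 2]]

if __name__ == "__main__":
    exact = power_method(LINKS)
    sampled = random_surfer(LINKS)
    for page, (a, b) in enumerate(zip(exact, sampled)):
        print(f"page {page}: power method {a:.3f}, random surfer {b:.3f}")
```

On a graph this small the two estimates should agree closely. The simulation needs many steps to average out its randomness, while the power method reaches a stable answer in the "about 80" sweeps the slide mentions.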