- 1. Numerical computing & Google’s PageRank DAVID F. GLEICH, CS 197 PRESENTATION
- 2. Hey Katie, do you have a date for Valentine’s Day? It was 1234567890 in 2009.
- 3. Thanks Internet! http://school.discoveryeducation.com/clipart/clip/stk-fgr6.html http://listsoplenty.com/pix/tag/cartoon https://www.facebook.com/ProgrammersJokes http://www.feld.com/wp/archives/2009/02/unix-time-1234567890-on-valentines-day.html
- 4. Thanks Internet! Thanks Google! http://school.discoveryeducation.com/clipart/clip/stk-fgr6.html http://listsoplenty.com/pix/tag/cartoon https://www.facebook.com/ProgrammersJokes http://www.feld.com/wp/archives/2009/02/unix-time-1234567890-on-valentines-day.html
- 5. How did Google get started?
- 6. How did Google get started? … with an idea … … on the shoulders of giants!
- 7. LEO KATZ
- 8. Vannevar Bush “wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them, ready to be dropped into the memex and there amplified” -- “As We May Think,” The Atlantic, July 1945
- 9. Sir Tim Berners-Lee “We should work towards a universal linked information system … to allow a place for any information or reference one felt was important and a way of finding it afterwards.” -- Founding proposal for “the mesh”, 1989
- 10. … the mesh became the web … the web became a mess … “finding it afterwards”? Hah!
- 11. Larry Page and Sergey Brin • Grad students at Stanford • Worked with Terry Winograd (artificial intelligence) • Created a web-search algorithm called “backrub” • Spun off a company, “Googol” • Worth about $20 billion each
- 12. A cartoon websearch primer 1. Crawl webpages 2. Analyze webpage text (information retrieval) 3. Analyze webpage links 4. Fit measures to human evaluations 5. Produce rankings 6. Continuously update
- 13. SportsIllustrated.com BobsPortsIllustrated.com
- 14. 1 2 to 3
- 15. What pages are important? Those that people visit a lot! How do we check? Create a model of how people visit the web.
- 16. What pages are important? The Google random surfer • Follows a random link with probability alpha (“random clicks”) • Jumps to any page with probability (1-alpha) (“random jumps”)
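
The two bullets above can be written as a single transition probability. This is a sketch under stated assumptions, not a formula from the slides: the symbols $X_t$ (the page visited at step $t$), $A_{ij}$, $\deg(i)$ (the number of links leaving page $i$), and $n$ (the number of pages) are notation introduced here, every page is assumed to have at least one outgoing link, and a "random jump" is assumed to pick uniformly among all $n$ pages.

```latex
\[
  \Pr(X_{t+1} = j \mid X_t = i)
    = \alpha \, \frac{A_{ij}}{\deg(i)} + \frac{1 - \alpha}{n},
  \qquad
  A_{ij} =
  \begin{cases}
    1 & \text{if page } i \text{ links to page } j, \\
    0 & \text{otherwise.}
  \end{cases}
\]
```

Summing over all pages $j$ gives $\alpha + (1 - \alpha) = 1$, so each row is a valid probability distribution.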
- 17. This is a Markov chain!
- 18. Andrei Markov • Studied sequences of random variables. • The probability that the random variable takes a particular next value depends only on its current value. • The “page id” is the “random variable” in the Markov chain!
- 19. Oskar Perron and Georg Frobenius • Simultaneously discovered when a Markov chain has an “average” • The “average” of the web? It’s the probability of finding the random surfer at a page. • In 1907
- 20. What pages are important? Perron and Frobenius proved the following algorithm always converges to a solution…

      set prob[i] = 0 for all pages
      set p to a random page
      for t = 1 to ...
        increment prob[p]
        if rand() < alpha, set p to a random neighbor of p
        else, set p to a random page
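
The simulation above can be sketched in Python roughly as follows. This is an illustration under assumptions, not the presenter's code: the `links` dictionary format, the function name `simulate_random_surfer`, the value `alpha=0.85`, the fixed step count, and the handling of pages without links are all choices made here.

```python
import random


def simulate_random_surfer(links, alpha=0.85, steps=100_000, seed=0):
    """Estimate how often a single random surfer visits each page.

    links: dict mapping each page to the list of pages it links to
           (a hypothetical input format chosen for this sketch).
    """
    rng = random.Random(seed)
    pages = list(links)
    visits = {page: 0 for page in pages}
    p = rng.choice(pages)  # start at a random page
    for _ in range(steps):
        visits[p] += 1
        # Follow a random link with probability alpha. Assumption: a page
        # with no outgoing links always triggers a random jump instead.
        if links[p] and rng.random() < alpha:
            p = rng.choice(links[p])
        else:
            p = rng.choice(pages)
    # Normalize the counts so they read as visit probabilities.
    return {page: count / steps for page, count in visits.items()}


# Tiny three-page example graph (made up for illustration).
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
freq = simulate_random_surfer(links)
print(freq)
```

The estimates are noisy because they come from one long random walk; more steps give steadier numbers.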
- 21. Richard von Mises • Created “the power method” • An efficient algorithm to “average” a Markov chain • It updates the probabilities of all pages at once. “Praktische Verfahren der Gleichungsauflösung”, R. von Mises and H. Pollaczek-Geiringer, 1929
- 22. What pages are important? Using the von Mises method …

      set prob[i] = 1/n for all pages
      for t = 1 to about 80
        set newprob[i] = 0 for all pages
        for all links from page i to page j
          set newprob[j] += prob[i]/deg[i]
        for all pages i
          set prob[i] = alpha*newprob[i] + (1-alpha)/n
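
A minimal Python sketch of the update above, assuming the graph is given as a dictionary of outgoing links. The function name `pagerank_power`, the input format, `alpha=0.85`, and the treatment of pages with no outgoing links are assumptions made for this example; the slide itself does not say how such pages are handled.

```python
def pagerank_power(links, alpha=0.85, iterations=80):
    """Update the probabilities of all pages at once, repeatedly.

    links: dict mapping each page to the list of pages it links to
           (a hypothetical input format chosen for this sketch).
    """
    pages = list(links)
    n = len(pages)
    prob = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        newprob = {page: 0.0 for page in pages}
        for i, targets in links.items():
            if targets:
                # Page i passes an equal share of its probability along
                # each of its deg[i] outgoing links.
                share = prob[i] / len(targets)
                for j in targets:
                    newprob[j] += share
            else:
                # Assumption: a page with no links spreads its probability
                # uniformly over all pages, so the total stays equal to 1.
                for j in pages:
                    newprob[j] += prob[i] / n
        prob = {page: alpha * newprob[page] + (1 - alpha) / n for page in pages}
    return prob


# Tiny three-page example graph (made up for illustration).
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
rank = pagerank_power(links)
print(rank)
```

Unlike a single-surfer simulation, this version has no randomness: each pass is a deterministic update of every page's probability.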
- 23. The algorithm underlying Google’s analysis of the web is from 1929!
- 24. Leo Katz
- 25. That’s not quite right, Wikipedia! Leo Katz
- 26. A new status index (1953), Leo Katz. A paper about how information spreads in groups … “For example, the information that the new high-school principal is unmarried and handsome might occasion a violent reaction in a ladies' garden club and hardly a ripple of interest in a luncheon group of the local chamber of commerce. On the other hand, the luncheon group might be anything but apathetic in its response to information concerning a fractional change in credit buying restrictions announced by the federal government.”
- 27. … there were many other shoulders too …
- 28. Gene Golub: Popularized numerical computing with matrices via the informal “Golub thesis”: “anything worth computing can be stated as a matrix problem”. William Kahan: Formalized IEEE-754 floating point arithmetic. Made it possible to compute with probabilities as “real numbers” instead of discrete counts.
- 29. Credits Most pictures taken from Google image search. Original idea from Massimo Franceschet. “PageRank: Standing on the shoulders of giants”