Numerical computing &
Google’s PageRank

DAVID F. GLEICH, CS 197 PRESENTATION
Hey Katie, do you have a
  date for Valentine’s Day? 




It was
1234567890
in 2009.
Thanks Internet!
                                
  http://school.discoveryeducation.com/clipart/clip/stk-fgr6.html
  http://listsoplenty.com/pix/tag/cartoon
  https://www.facebook.com/ProgrammersJokes
  http://www.feld.com/wp/archives/2009/02/unix-time-1234567890-on-valentines-day.html
Thanks Google!
How did Google get started?
… with an idea … 
… on the shoulders of giants!
LEO KATZ
Vannevar Bush
“wholly new forms of
encyclopedias will appear,
ready made with a mesh of
associative trails running
through them, ready to be
dropped into the memex and
there amplified” 
-- “As we may think” The Atlantic, July 1945
Sir Tim Berners-Lee
“We should work towards a
universal linked information
system … to allow a place for
any information or reference
one felt was important and a
way of finding it afterwards.”
 -- Founding proposal for “the mesh”, 1989
… the mesh became the web 
… the web became a mess
... “finding it afterwards”? Hah!
Larry Page
Sergey Brin
•  Grad students at Stanford
•  Worked with Terry Winograd
   (artificial intelligence)
•  Created a web-search
   algorithm called “backrub”
•  Spun off a company, “Google”, named after the number “googol”
•  Worth about $20 billion each
A cartoon web-search primer
1.  Crawl webpages
2.  Analyze webpage text (information retrieval)
3.  Analyze webpage links
4.  Fit measures to human evaluations
5.  Produce rankings
6.  Continuously update
SportsIllustrated.com

BobsPortsIllustrated.com
What pages are
important?
Those that people visit a lot!
How do we check?
Create a model of how people
visit the web.
What pages are
important?
The Google random surfer
•  Follows a random link with
   probability alpha
   “random clicks”
•  Goes anywhere with
   probability (1-alpha)
   “random jumps”
This is a Markov chain!
Andrei Markov
•  Studied sequences of random
   variables.
•  The probability that the random
   variable takes a particular next value
   depends only on its current value.
•  The “page id” is the “random
   variable” in the Markov chain!
Oskar Perron
Georg Frobenius
•  Simultaneously discovered
   when a Markov chain has an
   “average” 
•  The “average” of the web? It’s
   the probability of finding the
   random surfer at a page.
•  In 1907
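As a rough sketch in notation that does not appear on the slides: let x_j be the long-run probability of finding the random surfer at page j, n the number of pages, and deg(i) the number of links leaving page i (assumed to be nonzero for every page). The random surfer model then asks for the probabilities that satisfy:

```latex
x_j = \alpha \sum_{i \to j} \frac{x_i}{\deg(i)} + \frac{1 - \alpha}{n},
\qquad \sum_{j=1}^{n} x_j = 1 .
```

The first term collects the “random clicks” arriving from pages that link to j; the second term is the “random jump” share that every page receives equally.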
What pages are
important?
Perron and Frobenius proved the
following algorithm always
converges to a solution…
set prob[i] = 0 for all pages
set p to a random page
for t = 1 to ...
  increment prob[p]
  if rand() < alpha,
    set p to a random neighbor of p
  else, set p to a random page
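The simulation above can be sketched in Python roughly as follows. This is an illustration, not Google's implementation: the function name, the three-page example graph, the value alpha = 0.85, the fixed step count, and the rule that a page without out-links always triggers a random jump are all assumptions added here. The visit counts are divided by the number of steps so that they read as probabilities.

```python
import random


def simulate_random_surfer(links, alpha=0.85, steps=100_000, seed=0):
    """Estimate how often a single random surfer visits each page.

    `links` maps each page to the list of pages it links to.
    """
    rng = random.Random(seed)
    pages = list(links)
    visits = {page: 0 for page in pages}
    p = rng.choice(pages)  # start at a random page
    for _ in range(steps):
        visits[p] += 1
        neighbors = links[p]
        # Assumption: a page with no out-links always causes a random jump.
        if neighbors and rng.random() < alpha:
            p = rng.choice(neighbors)  # "random click"
        else:
            p = rng.choice(pages)      # "random jump"
    # Turn visit counts into visit frequencies.
    return {page: count / steps for page, count in visits.items()}


# Hypothetical three-page web: a -> b, a -> c, b -> c, c -> a
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
freq = simulate_random_surfer(example_links)
for page, value in sorted(freq.items()):
    print(page, round(value, 3))
```

The estimates only settle slowly as the number of steps grows, which is one reason to prefer a method that updates all pages at once.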
Richard von Mises
•  Created “the power method”
•  An efficient algorithm to
   “average” a Markov chain
•  It updates the probabilities of
   all pages at once.
“Praktische Verfahren der Gleichungsauflösung”
R. von Mises and H. Pollaczek-Geiringer, 1929
What pages are
important?
Using the von Mises method …

set prob[i] = 1/n for all pages
for t = 1 to about 80
  set newprob[i] = 0 for all pages
  for all links from page i to page j
    set newprob[j] += prob[i]/deg[i]
  for all pages i
    set prob[i] = alpha*newprob[i] +
                   (1-alpha)/n
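A minimal Python sketch of the update above, under stated assumptions: the function name, the dictionary-based graph, the example links, and alpha = 0.85 are illustrative choices that do not come from the slides. Like the pseudocode, it assumes every page has at least one out-link; otherwise some probability would leak out of the system.

```python
def pagerank_power_method(links, alpha=0.85, iterations=80):
    """Update the probabilities of all pages at once, `iterations` times.

    `links` maps each page to the list of pages it links to.
    Assumption: every page has at least one out-link.
    """
    pages = list(links)
    n = len(pages)
    prob = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        newprob = {page: 0.0 for page in pages}
        # Each page i splits its probability evenly over its out-links.
        for i, out_links in links.items():
            for j in out_links:
                newprob[j] += prob[i] / len(out_links)
        # Mix in the "random jump" share that every page receives.
        prob = {page: alpha * newprob[page] + (1 - alpha) / n
                for page in pages}
    return prob


# Hypothetical three-page web: a -> b, a -> c, b -> c, c -> a
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank_power_method(example_links)
for page, value in sorted(ranks.items()):
    print(page, round(value, 4))
```

Each pass shrinks the remaining error by roughly a factor of alpha, so about 80 passes are enough for several digits of accuracy when alpha is 0.85.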
The algorithm underlying
Google’s analysis of the web is
from 1929!
Leo Katz
That’s not quite right, Wikipedia!
Leo Katz
A new status index (1953)
Leo Katz
A paper about how information spreads in groups … 
“For example, the information that the new high-
school principal is unmarried and handsome might
occasion a violent reaction in a ladies' garden club
and hardly a ripple of interest in a luncheon group of
the local chamber of commerce. On the other hand,
the luncheon group might be anything but apathetic
in its response to information concerning a fractional
change in credit buying restrictions announced by the
federal government.”
… there were many other
    shoulders too …
Gene Golub
                             
Popularized numerical computing with
matrices via the informal “Golub thesis”:
“anything worth computing can be
stated as a matrix problem”
                             




William Kahan
Formalized IEEE-754 floating point arithmetic.
Made it possible to compute with probabilities
as “real numbers” instead of discrete counts.
Credits



Most pictures taken from Google image search.
Original idea from Massimo Franceschet.
“PageRank: Standing on the shoulders of giants”

A history of PageRank from the numerical computing perspective
