Markov Chains as methodology used by PageRank to
rank the Web Pages on Internet.
Sergio S. Guirreri - www.guirreri.host22....

PageRank and Markov Chain

A brief introduction to the methodology used by PageRank to rank the webpages.


  1. Markov Chains as methodology used by PageRank to rank the Web Pages on Internet. Sergio S. Guirreri - www.guirreri.host22.com. Google Technology User Group (GTUG) of Palermo. 5th March 2010.
  2. Overview
     1 Concepts on Markov-Chains.
     2 The idea of the PageRank algorithm.
     3 The PageRank algorithm.
     4 Solving the PageRank algorithm.
     5 Conclusions.
     6 Bibliography.
     7 Internet web sites.
  7. Concepts on Markov-Chains. Stochastic Process and Markov-Chains.
     Consider a stochastic process $\{X_n;\ n = 0, 1, 2, \dots\}$ with values in a set $E$, called the state space, whose elements are called the states of the process. Assume that the set $E$ is finite or countable.
     Definition. A Markov Chain is a stochastic process $X_n$ that satisfies the following property:
     $$\mathrm{Prob}\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \dots, X_0 = i_0\} = \mathrm{Prob}\{X_{n+1} = j \mid X_n = i\} = p_{ij}(n),$$
     where $E$ is the state space and $j, i, i_{n-1}, \dots, i_0 \in E$, $n \in \mathbb{N}$.
     The transition probability matrix $P$ of the process $X_n$ is composed of the entries $p_{ij}$, $\forall i, j \in E$.
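The Markov property can be illustrated with a short simulation. This is only a sketch under stated assumptions: the 3-state matrix `P` and the helper names `step` and `simulate` are hypothetical, and the transition probabilities are taken to be independent of $n$ (a time-homogeneous chain), which is narrower than the $p_{ij}(n)$ in the definition above.

```python
import random

# Hypothetical 3-state chain (not from the slides).
# Row i holds p_ij = Prob{X_{n+1} = j | X_n = i}, so every row sums to 1.
P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]

def step(state, rng):
    """Draw the next state using only the current state (the Markov property)."""
    return rng.choices(range(len(P)), weights=P[state])[0]

def simulate(start, n_steps, seed=0):
    """Return the path X_0, X_1, ..., X_n of the chain."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        path.append(step(path[-1], rng))
    return path

path = simulate(start=0, n_steps=10)
print(path)
```

Note that `step` never looks at earlier entries of `path`: the next state is drawn from the row of `P` selected by the current state alone.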
  9. The idea of the PageRank algorithm. PageRank's idea.
     The idea behind the PageRank algorithm is similar to that of the impact factor used to rank journals [Page et al.(1999)] [Brin and Page(1998)] [Langville et al.(2008)].
     PageRank, the impact factor of the Internet. The impact factor of a journal is defined as the average number of citations per recently published paper in that journal. By regarding each web page as a journal, this idea was extended to measure the importance of a web page in the PageRank algorithm.
  14. The idea of the PageRank algorithm. Elements of the PageRank.
      To illustrate the PageRank algorithm I define the following variables [Ching and Ng(2006)]:
      - let $N$ be the total number of web pages in the web;
      - let $k$ be the number of outgoing links of web page $j$;
      - let $Q$ be the so-called hyperlink matrix with elements
      $$Q_{ij} = \begin{cases} \dfrac{1}{k} & \text{if web page } i \text{ is an outgoing link of web page } j; \\ 0 & \text{otherwise;} \end{cases} \qquad Q_{i,i} > 0 \ \ \forall i. \tag{1}$$
      The hyperlink matrix $Q$ can be regarded as the transition probability matrix of a Markov chain. One may regard a surfer on the net as a random walker and the web pages as the states of the Markov chain.
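As a rough illustration of definition (1), the sketch below builds $Q$ for a tiny web. The 4-page link structure and the function name `hyperlink_matrix` are hypothetical; each page is assumed to list itself among its outgoing links so that the condition $Q_{i,i} > 0$ holds.

```python
# Hypothetical 4-page web: links[j] lists the pages that page j links to.
# Assumption: every page links to itself, so that Q[i][i] > 0 as required in (1).
links = {
    0: [0, 1, 2],
    1: [1, 2],
    2: [2, 0],
    3: [3, 0, 2],
}

def hyperlink_matrix(links):
    """Build Q with Q[i][j] = 1/k if page j links to page i, else 0,
    where k is the number of outgoing links of page j."""
    n = len(links)
    Q = [[0.0] * n for _ in range(n)]
    for j, targets in links.items():
        k = len(targets)
        for i in targets:
            Q[i][j] = 1.0 / k
    return Q

Q = hyperlink_matrix(links)

# Each column sums to 1: a surfer on page j always moves to some page.
column_sums = [sum(Q[i][j] for i in range(len(Q))) for j in range(len(Q))]
print(column_sums)
```

With this convention the columns, not the rows, of $Q$ are probability distributions, which matches the later form $p = \alpha Q p + \dots$ where $Q$ multiplies a column vector.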
  18. The PageRank algorithm. The PageRank with irreducible Markov Chain.
      Assuming that the Markov chain is irreducible (a) and aperiodic (b), the steady-state probability distribution $(p_1, p_2, \dots, p_N)^T$ of the states (web pages) exists.
      (a) A Markov chain is irreducible if all states communicate with each other.
      (b) A chain is periodic if there exists $k > 1$ such that the interval between two visits to some state $s$ is always a multiple of $k$. A chain is aperiodic if no such $k > 1$ exists.
      The PageRank. Each $p_i$ is the proportion of time that the surfer spends visiting web page $i$. The higher the value of $p_i$, the more important web page $i$ is. The PageRank of web page $i$ is then defined as $p_i$.
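The reading of $p_i$ as a proportion of time can be checked empirically on a toy example. The sketch below is an illustration only: the 3-page web and the function name `visit_frequencies` are hypothetical, and the chain was chosen to be irreducible and aperiodic (every page links to itself and page 0 is reachable from everywhere). For this particular chain the steady state can be worked out by hand as $(3/7, 2/7, 2/7)$.

```python
import random
from collections import Counter

# Hypothetical 3-page web: from page j the surfer picks one of links[j] uniformly.
links = {0: [0, 1, 2], 1: [1, 0], 2: [2, 0]}

def visit_frequencies(links, n_steps, seed=0):
    """Follow a random surfer and return the fraction of steps spent on each page."""
    rng = random.Random(seed)
    page = 0
    visits = Counter()
    for _ in range(n_steps):
        page = rng.choice(links[page])
        visits[page] += 1
    return [visits[i] / n_steps for i in sorted(links)]

freq = visit_frequencies(links, n_steps=200_000)
exact = [3 / 7, 2 / 7, 2 / 7]  # steady state of this toy chain, derived by hand
print(freq, exact)
```

The observed frequencies should lie close to the exact steady state, and page 0, which receives links from every page, ends up with the largest share.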
  20. The PageRank algorithm. The PageRank with reducible Markov Chain.
      Since the matrix $Q$ can be reducible, to ensure that the steady-state probability distribution exists and is unique the following matrix $P$ must be considered:
      $$P = \alpha \begin{pmatrix} Q_{11} & Q_{12} & \dots & Q_{1N} \\ Q_{21} & Q_{22} & \dots & Q_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ Q_{N1} & Q_{N2} & \dots & Q_{NN} \end{pmatrix} + \frac{(1-\alpha)}{N} \begin{pmatrix} 1 & 1 & \dots & 1 \\ 1 & 1 & \dots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \dots & 1 \end{pmatrix} \tag{2}$$
      where $0 < \alpha < 1$; the most popular values of $\alpha$ are $0.85$ and $(1 - 1/N)$.
      Interpretation of PageRank. The idea of the PageRank (2) is that, for a network of $N$ web pages, each web page has an inherent importance of $(1-\alpha)/N$. If a page $P_i$ has an importance of $p_i$, then it contributes an importance of $\alpha p_i$, which is shared among the web pages that it points to.
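A minimal sketch of formula (2) follows. The function name `google_matrix` and the 3-page matrix `Q` are hypothetical; `Q` is deliberately reducible (page 2 links only to itself, so a surfer who arrives there never leaves), which is the situation the damping term is meant to repair.

```python
def google_matrix(Q, alpha=0.85):
    """Return P = alpha * Q + (1 - alpha) / N * (all-ones matrix), as in (2)."""
    n = len(Q)
    teleport = (1.0 - alpha) / n
    return [[alpha * Q[i][j] + teleport for j in range(n)] for i in range(n)]

# Hypothetical reducible example; columns sum to 1.
# Pages 0 and 1 link to themselves and to each other, page 2 only to itself.
Q = [
    [0.5, 0.5, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
]

P = google_matrix(Q, alpha=0.85)

# Every entry of P is strictly positive, so every page can be reached from
# every other page in one step, and the columns still sum to 1.
column_sums = [sum(P[i][j] for i in range(3)) for j in range(3)]
print(P, column_sums)
```

Because all entries of `P` are positive, the associated chain is irreducible and aperiodic even though the chain defined by `Q` alone is not.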
  21. The PageRank algorithm. The PageRank with reducible Markov Chain.
      Solving the following linear system of equations, subject to the normalization constraint, one can obtain the importance of web page $P_i$:
      $$\begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{pmatrix} = \alpha \begin{pmatrix} Q_{11} & Q_{12} & \dots & Q_{1N} \\ Q_{21} & Q_{22} & \dots & Q_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ Q_{N1} & Q_{N2} & \dots & Q_{NN} \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{pmatrix} + \frac{(1-\alpha)}{N} \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \tag{3}$$
      Since $\sum_{i=1}^{N} p_i = 1$, (3) can be rewritten as
      $$(p_1, p_2, \dots, p_N)^T = P\,(p_1, p_2, \dots, p_N)^T.$$
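The step from (3) to the eigenvector form can be spelled out. Writing $p = (p_1, \dots, p_N)^T$ and $e$ for the all-ones column vector (a symbol introduced here only for brevity), matrix (2) reads $P = \alpha Q + \frac{1-\alpha}{N}\, e\, e^{T}$, so that:

```latex
\begin{aligned}
P\,p &= \alpha Q\,p + \frac{1-\alpha}{N}\, e\,\bigl(e^{T} p\bigr) \\
     &= \alpha Q\,p + \frac{1-\alpha}{N}\, e
        \qquad \text{since } e^{T} p = \sum_{i=1}^{N} p_i = 1 .
\end{aligned}
```

The last line is the right-hand side of (3), hence $p = P\,p$: the PageRank vector is an eigenvector of $P$ associated with the eigenvalue 1.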
  26. Solving the PageRank algorithm. The power method.
      The power method is an iterative method for computing the dominant eigenvalue of a matrix and its corresponding eigenvector. Given an $n \times n$ matrix $A$, the hypotheses of the power method are:
      - there is a single dominant eigenvalue, so the eigenvalues can be sorted as $|\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \dots \ge |\lambda_n|$;
      - there is a linearly independent set of $n$ eigenvectors $\{u^{(1)}, u^{(2)}, \dots, u^{(n)}\}$, so that $A u^{(i)} = \lambda_i u^{(i)}$, $i = 1, \dots, n$.
  33. Solving the PageRank algorithm. The power method.
      The initial vector $x^{(0)}$ can be written as
      $$x^{(0)} = a_1 u^{(1)} + a_2 u^{(2)} + \dots + a_n u^{(n)}.$$
      Applying the matrix $A$ to the initial vector $k$ times:
      $$A^k x^{(0)} = a_1 A^k u^{(1)} + a_2 A^k u^{(2)} + \dots + a_n A^k u^{(n)} = a_1 \lambda_1^k u^{(1)} + a_2 \lambda_2^k u^{(2)} + \dots + a_n \lambda_n^k u^{(n)}.$$
      Dividing by $\lambda_1^k$:
      $$\frac{A^k x^{(0)}}{\lambda_1^k} = a_1 u^{(1)} + a_2 \left(\frac{\lambda_2}{\lambda_1}\right)^k u^{(2)} + \dots + a_n \left(\frac{\lambda_n}{\lambda_1}\right)^k u^{(n)}.$$
      Since $\frac{|\lambda_i|}{|\lambda_1|} < 1$ for $i = 2, \dots, n$, we have $\lim_{k \to \infty} \frac{|\lambda_i|^k}{|\lambda_1|^k} = 0$, and therefore, provided $a_1 \neq 0$,
      $$A^k x^{(0)} \approx a_1 \lambda_1^k u^{(1)}.$$
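The derivation above can be sketched in a few lines of code. This is an illustration under assumptions, not a reference implementation: the names `mat_vec` and `power_method` are hypothetical, the iterate is renormalized at every step instead of dividing by $\lambda_1^k$ (which is unknown in practice), and the eigenvalue is estimated with a Rayleigh quotient. The test matrix has eigenvalues 3 and 1, so the hypotheses of the method hold.

```python
import math

def mat_vec(A, x):
    """Return the matrix-vector product A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def power_method(A, x0, n_iter=100):
    """Approximate the dominant eigenvalue of A and a unit-length eigenvector."""
    norm = math.sqrt(sum(v * v for v in x0))
    x = [v / norm for v in x0]
    lam = 0.0
    for _ in range(n_iter):
        y = mat_vec(A, x)
        # Rayleigh quotient x.(A x) with |x| = 1: current estimate of lambda_1.
        lam = sum(xi * yi for xi, yi in zip(x, y))
        # Renormalize so the iterate neither overflows nor vanishes.
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return lam, x

# Small example with eigenvalues 3 and 1; the dominant eigenvector is along (1, 1).
A = [[2.0, 1.0], [1.0, 2.0]]
lam, u = power_method(A, x0=[1.0, 0.0])
print(lam, u)
```

The starting vector $(1, 0)$ has a nonzero component along $(1, 1)$, so the contribution of the second eigenvector shrinks by a factor of $1/3$ at every iteration.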
  38. Conclusions. The power method and PageRank. Results.
      - The matrix $P$ of the PageRank algorithm is a stochastic matrix, therefore its largest eigenvalue is 1.
      - The convergence rate of the power method depends on the ratio $\left|\frac{\lambda_2}{\lambda_1}\right|$.
      - It has been shown by [Haveliwala and Kamvar(2003)] that the second largest eigenvalue of $P$ satisfies $|\lambda_2| \le \alpha$, with $0 \le \alpha \le 1$.
      - Since $\lambda_1 = 1$, the convergence rate depends on $\alpha$.
      - The most popular value for $\alpha$ is 0.85. With this value it has been shown that the power method on a web data set of over 80 million pages converges in about 50 iterations.
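The dependence of the convergence rate on $\alpha$ can be observed on a toy example. The sketch below is hedged accordingly: the function name `pagerank`, the 4-page matrix `Q`, the tolerance, and the choice of starting with all mass on page 0 are assumptions made for illustration. The iteration uses the form $p \leftarrow \alpha Q p + (1-\alpha)/N$, which equals $p \leftarrow P p$ whenever the entries of $p$ sum to 1.

```python
def pagerank(Q, alpha=0.85, tol=1e-10, max_iter=10_000):
    """Power iteration p <- alpha*Q*p + (1-alpha)/N; returns (p, iterations used)."""
    n = len(Q)
    p = [1.0] + [0.0] * (n - 1)  # assumed start: all mass on page 0
    for k in range(1, max_iter + 1):
        new_p = [
            alpha * sum(Q[i][j] * p[j] for j in range(n)) + (1.0 - alpha) / n
            for i in range(n)
        ]
        change = sum(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if change < tol:
            return p, k
    return p, max_iter

# Hypothetical reducible web; columns sum to 1.
# Pages 0 and 1 link to themselves and each other, page 2 only to itself,
# page 3 links to itself, to page 0 and to page 2.
Q = [
    [0.5, 0.5, 0.0, 1 / 3],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1 / 3],
    [0.0, 0.0, 0.0, 1 / 3],
]

iterations = {}
for alpha in (0.5, 0.85, 0.95):
    p, k = pagerank(Q, alpha)
    iterations[alpha] = k
print(iterations)
```

In this example the number of iterations needed to reach the tolerance grows as $\alpha$ approaches 1, in line with the bound $|\lambda_2| \le \alpha$: a larger $\alpha$ gives more weight to the link structure but slows the power method down.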
  39. Conclusions. Many thanks to GTUG Palermo, and see you at the next meeting!
  40. Bibliography.
      - Brin, S. and Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1-7), 107–117.
      - Ching, W. and Ng, M. (2006). Markov Chains: Models, Algorithms and Applications. Springer Science + Business Media, Inc.
      - Haveliwala, T. and Kamvar, M. (2003). The second eigenvalue of the Google matrix. Technical report, Stanford University.
      - Langville, A., Meyer, C., and Fernández, P. (2008). Google's PageRank and beyond: the science of search engine rankings. The Mathematical Intelligencer, 30(1), 68–69.
      - Page, L., Brin, S., Motwani, R., and Winograd, T. (1999). The PageRank Citation Ranking: Bringing Order to the Web.
  41. Internet web sites.
      - Jon Atle Gulla (2007) - From Google Search to Semantic Exploration. - Norwegian University of Science Technology - www.slideshare.net/sveino/semantics-and-search?type=presentation
      - Steven Levy (2010) - Exclusive: How Google's Algorithm Rules the Web - Wired Magazine - www.wired.com/magazine/2010/02/ff_google_algorithm/
      - Ann Smarty (2009) - Let's Try to Find All 200 Parameters in Google Algorithm - Search Engine Journal - www.searchenginejournal.com/200-parameters-in-google-algorithm/15457/