Markov Chains as methodology used by PageRank to
rank the Web Pages on Internet.
Sergio S. Guirreri - www.guirreri.host22.com
Google Technology User Group (GTUG) of Palermo.
5th March 2010
Overview
1 Concepts on Markov-Chains.
2 The idea of the PageRank algorithm.
3 The PageRank algorithm.
4 Solving the PageRank algorithm.
5 Conclusions.
6 Bibliography.
7 Internet web sites.
Concepts on Markov-Chains.
Stochastic Process and Markov-Chains.
Consider the stochastic process
\[ \{X_n;\ n = 0, 1, 2, \dots\} \]
with values in a set E, called the state space, whose elements are called the
states of the process.
Assume that the set E is finite or countable.
Definition
A Markov chain is a stochastic process $X_n$ that satisfies the following property:
\[ \mathrm{Prob}\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \dots, X_0 = i_0\} = \mathrm{Prob}\{X_{n+1} = j \mid X_n = i\} = p_{ij}(n) \]
where E is the state space and $j, i, i_{n-1}, \dots, i_0 \in E$, $n \in \mathbb{N}$.
The transition probability matrix P of the process $X_n$ is composed of the $p_{ij}$,
$\forall i, j \in E$.
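As a minimal sketch of the definition above, the following Python fragment stores a transition matrix as nested lists and samples a path in which each step depends only on the current state. The three-state matrix `P`, the helper names `is_stochastic` and `sample_path`, and the seed are illustrative assumptions, not part of the talk; the chain is also assumed to be time-homogeneous, so that $p_{ij}$ does not depend on n.

```python
import random

# Hypothetical 3-state transition matrix: P[i][j] = Prob{X_{n+1} = j | X_n = i}.
P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]

def is_stochastic(matrix, tol=1e-12):
    """Return True if every entry is non-negative and every row sums to 1."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) < tol
        for row in matrix
    )

def sample_path(matrix, start, steps, rng):
    """Sample X_0, ..., X_steps; the next state depends only on the current one."""
    path = [start]
    for _ in range(steps):
        current = path[-1]
        # Draw the next state j with probability matrix[current][j].
        nxt = rng.choices(range(len(matrix)), weights=matrix[current])[0]
        path.append(nxt)
    return path

rng = random.Random(0)  # fixed seed so the sketch is reproducible
path = sample_path(P, start=0, steps=10, rng=rng)
```

Note that here the rows of `P` sum to 1, following the convention $p_{ij} = \mathrm{Prob}\{X_{n+1} = j \mid X_n = i\}$ of the definition.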
The idea of the PageRank algorithm.
PageRank’s idea.
The idea behind the PageRank algorithm is similar to that of the impact
factor index used to rank journals [Page et al.(1999)]
[Brin and Page(1998)] [Langville et al.(2008)].
PageRank: the impact factor of the Internet.
The impact factor of a journal is defined as the average number of citations
per recently published paper in that journal.
By regarding each web page as a journal, this idea was extended to
measure the importance of a web page in the PageRank algorithm.
Elements of the PageRank.
To illustrate the PageRank algorithm I define the following variables
[Ching and Ng(2006)]:
let N be the total number of web pages in the web;
let k be the number of outgoing links of web page j;
let Q be the so-called hyperlink matrix with elements
\[ Q_{ij} = \begin{cases} \dfrac{1}{k} & \text{if web page } i \text{ is an outgoing link of web page } j; \\ 0 & \text{otherwise;} \end{cases} \qquad Q_{ii} > 0 \ \ \forall i. \tag{1} \]
The hyperlink matrix Q can be regarded as the transition probability matrix of
a Markov chain.
One may regard a surfer on the net as a random walker and the web pages as
the states of the Markov chain.
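To make definition (1) concrete, here is a small sketch that builds Q for a toy web of four pages. The `links` dictionary and the function name `hyperlink_matrix` are hypothetical and chosen only for illustration; each page is assumed to link to itself so that $Q_{ii} > 0$ holds.

```python
# Hypothetical toy web of 4 pages: links[j] lists the pages that page j points to.
# Following the assumption Q_ii > 0 in (1), each page also links to itself.
links = {
    0: [0, 1, 2],
    1: [1, 2],
    2: [0, 2],
    3: [0, 2, 3],
}

def hyperlink_matrix(links):
    """Build Q with Q[i][j] = 1/k if page j links to page i, where k = len(links[j])."""
    n = len(links)
    Q = [[0.0] * n for _ in range(n)]
    for j, targets in links.items():
        k = len(targets)
        for i in targets:
            Q[i][j] = 1.0 / k
    return Q

Q = hyperlink_matrix(links)
# Each column j describes where the surfer goes from page j, so it sums to 1.
column_sums = [sum(Q[i][j] for i in range(len(Q))) for j in range(len(Q))]
```

With this convention Q is column-stochastic: column j, not row j, holds the probabilities of leaving page j.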
The PageRank algorithm.
The PageRank with irreducible Markov Chain.
Assuming that the Markov chain is irreducible [a] and aperiodic [b], the
steady-state probability distribution $(p_1, p_2, \dots, p_N)^T$ of the states (web
pages) exists.
[a] A Markov chain is irreducible if all states communicate with each other.
[b] A chain is periodic if there exists k > 1 such that the interval between two visits to some
state s is always a multiple of k. A chain is aperiodic if no such k exists.
The PageRank
Each $p_i$ is the proportion of time that the surfer spends visiting web page i.
The higher the value of $p_i$, the more important web page i is.
The PageRank of web page i is then defined as $p_i$.
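The interpretation of $p_i$ as a proportion of time can be illustrated by simulating a random surfer. The sketch below uses a hypothetical two-page chain that is irreducible and aperiodic; the matrix, the function name `visit_fractions`, the seed and the number of steps are all assumptions made for the example.

```python
import random

# Hypothetical 2-page web, column convention as in (1):
# Q[i][j] is the probability of moving from page j to page i.
Q = [
    [0.9, 0.5],
    [0.1, 0.5],
]

def visit_fractions(Q, steps, rng, start=0):
    """Simulate a random surfer and return the fraction of steps spent on each page."""
    n = len(Q)
    counts = [0] * n
    page = start
    for _ in range(steps):
        counts[page] += 1
        # Column `page` of Q holds the probabilities of the next page.
        weights = [Q[i][page] for i in range(n)]
        page = rng.choices(range(n), weights=weights)[0]
    return [c / steps for c in counts]

rng = random.Random(1)  # fixed seed for reproducibility
fractions = visit_fractions(Q, steps=100_000, rng=rng)

# For this chain the steady state solves p = Q p with p_0 + p_1 = 1,
# which gives p = (5/6, 1/6).
steady_state = [5 / 6, 1 / 6]
```

For a long enough run the simulated fractions should be close to the steady state, so page 0 would receive the higher PageRank in this toy example.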
The PageRank with reducible Markov Chain
Since the matrix Q can be reducible, the following matrix P must be considered
to ensure that the steady-state probability distribution exists and is unique:
\[ P = \alpha \begin{pmatrix} Q_{11} & Q_{12} & \dots & Q_{1N} \\ Q_{21} & Q_{22} & \dots & Q_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ Q_{N1} & Q_{N2} & \dots & Q_{NN} \end{pmatrix} + \frac{1 - \alpha}{N} \begin{pmatrix} 1 & 1 & \dots & 1 \\ 1 & 1 & \dots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \dots & 1 \end{pmatrix} \tag{2} \]
where $0 < \alpha < 1$; the most popular values of $\alpha$ are 0.85 and $(1 - 1/N)$.
Interpretation of PageRank
The idea of the PageRank (2) is that, for a network of N web pages, each web
page has an inherent importance of $(1 - \alpha)/N$.
If a page $P_i$ has an importance of $p_i$, then it contributes an importance of
$\alpha p_i$, which is shared among the web pages that it points to.
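A short sketch of equation (2) follows. The function name `google_matrix` and the two-page matrix are hypothetical; the example is chosen to be reducible so that the effect of the second term is visible.

```python
def google_matrix(Q, alpha):
    """Return P = alpha * Q + (1 - alpha) / N * (all-ones matrix), as in (2)."""
    n = len(Q)
    teleport = (1.0 - alpha) / n
    return [[alpha * Q[i][j] + teleport for j in range(n)] for i in range(n)]

# Hypothetical reducible chain: page 1 links only to itself, so once the
# surfer reaches page 1 there is no way back to page 0.
Q = [
    [0.5, 0.0],
    [0.5, 1.0],
]
P = google_matrix(Q, alpha=0.85)
```

Every entry of `P` is strictly positive, so any page can be reached from any other page in one step, while each column still sums to 1.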
Solving the following linear system of equations, subject to the normalization
constraint, one obtains the importance of each web page $P_i$:
\[ \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{pmatrix} = \alpha \begin{pmatrix} Q_{11} & Q_{12} & \dots & Q_{1N} \\ Q_{21} & Q_{22} & \dots & Q_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ Q_{N1} & Q_{N2} & \dots & Q_{NN} \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{pmatrix} + \frac{1 - \alpha}{N} \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \tag{3} \]
Since
\[ \sum_{i=1}^{N} p_i = 1, \]
(3) can be rewritten as
\[ (p_1, p_2, \dots, p_N)^T = P\,(p_1, p_2, \dots, p_N)^T. \]
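The step from (3) to the form with P relies only on the entries of the vector summing to 1. A small numerical check, with a hypothetical three-page matrix `Q` and an arbitrary probability vector `p` (both assumptions for illustration), might look like this:

```python
alpha = 0.85
# Hypothetical 3-page hyperlink matrix, column-stochastic as in (1).
Q = [
    [0.5, 0.0, 1 / 3],
    [0.5, 0.5, 1 / 3],
    [0.0, 0.5, 1 / 3],
]
N = len(Q)
# P as defined in (2).
P = [[alpha * Q[i][j] + (1 - alpha) / N for j in range(N)] for i in range(N)]

def matvec(M, v):
    """Multiply matrix M (list of rows) by column vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

p = [0.2, 0.5, 0.3]  # any vector whose entries sum to 1

# Right-hand side of (3): alpha * Q p + (1 - alpha) / N * (1, ..., 1)^T
rhs_system = [alpha * q + (1 - alpha) / N for q in matvec(Q, p)]
# Right-hand side of the rewritten form: P p
rhs_matrix = matvec(P, p)
```

The two right-hand sides agree up to rounding because the all-ones matrix applied to `p` returns the sum of its entries, which is 1, in every component.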
Solving the PageRank algorithm.
The power method.
The power method is an iterative method for computing the dominant eigenvalue
of a matrix and a corresponding eigenvector.
Given an n × n matrix A, the hypotheses of the power method are:
there is a single dominant eigenvalue, so the eigenvalues can be sorted as
\[ |\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \dots \ge |\lambda_n|; \]
there is a linearly independent set of n eigenvectors
\[ \{u^{(1)}, u^{(2)}, \dots, u^{(n)}\} \]
so that
\[ A u^{(i)} = \lambda_i u^{(i)}, \quad i = 1, \dots, n. \]
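The initial vector $x^{(0)}$ can be written as
\[ x^{(0)} = a_1 u^{(1)} + a_2 u^{(2)} + \dots + a_n u^{(n)}. \]
Iterating the initial vector with the matrix A gives
\[ A^k x^{(0)} = a_1 A^k u^{(1)} + a_2 A^k u^{(2)} + \dots + a_n A^k u^{(n)} = a_1 \lambda_1^k u^{(1)} + a_2 \lambda_2^k u^{(2)} + \dots + a_n \lambda_n^k u^{(n)}. \]
Dividing by $\lambda_1^k$,
\[ \frac{A^k x^{(0)}}{\lambda_1^k} = a_1 u^{(1)} + a_2 \left( \frac{\lambda_2}{\lambda_1} \right)^k u^{(2)} + \dots + a_n \left( \frac{\lambda_n}{\lambda_1} \right)^k u^{(n)}. \]
Since, for $i = 2, \dots, n$,
\[ \frac{|\lambda_i|}{|\lambda_1|} < 1 \;\Rightarrow\; \lim_{k \to \infty} \frac{|\lambda_i|^k}{|\lambda_1|^k} = 0 \;\Rightarrow\; A^k x^{(0)} \approx a_1 \lambda_1^k u^{(1)} \]
for large k, provided $a_1 \neq 0$.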
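The derivation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `power_method`, the 2 × 2 matrix and the starting vector are hypothetical, and the rescaling at each step (a common practical addition, not shown on the slide) replaces the division by $\lambda_1^k$.

```python
import math

def power_method(A, x0, iterations=50):
    """Approximate the dominant eigenvalue and eigenvector of A by power iteration."""
    x = list(x0)
    for _ in range(iterations):
        # Multiply by A, then rescale so the entries do not overflow or vanish.
        y = [sum(a * xj for a, xj in zip(row, x)) for row in A]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    # Rayleigh quotient x^T A x (x has unit length) estimates the eigenvalue.
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    eigenvalue = sum(xi * axi for xi, axi in zip(x, Ax))
    return eigenvalue, x

# Hypothetical 2 x 2 example with eigenvalues 3 and 1.
A = [[2.0, 1.0], [1.0, 2.0]]
eigenvalue, eigenvector = power_method(A, x0=[1.0, 0.0])
```

Here the starting vector has a nonzero component along the dominant eigenvector (1, 1), and the error shrinks like $(1/3)^k$, in line with the ratio $|\lambda_2| / |\lambda_1|$.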
Conclusions.
The power method and PageRank.
Results.
The matrix P of the PageRank algorithm is a stochastic matrix, therefore
its largest eigenvalue is 1.
The convergence rate of the power method depends on the ratio $|\lambda_2| / |\lambda_1|$.
It has been shown by [Haveliwala and Kamvar(2003)] that the second
largest eigenvalue of P satisfies
\[ |\lambda_2| \le \alpha, \quad 0 \le \alpha \le 1. \]
Since $\lambda_1 = 1$, the convergence rate depends on $\alpha$.
The most popular value for $\alpha$ is 0.85. With this value, the power method
on a web data set of over 80 million pages has been reported to converge
in about 50 iterations.
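To see the role of $\alpha$ in practice, the following sketch runs the power method on P for a toy web and records how much the iterate changes at each step. The four-page matrix, the function name `pagerank`, the tolerance and the uniform starting vector are assumptions for illustration; P is never formed explicitly, since for a vector summing to 1 the product with P equals the right-hand side of (3).

```python
def pagerank(Q, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power iteration x <- alpha * Q x + (1 - alpha) / N, stopping when the
    L1 change between iterates drops below tol. Returns (x, list of changes)."""
    n = len(Q)
    x = [1.0 / n] * n  # start from the uniform distribution
    changes = []
    for _ in range(max_iter):
        Qx = [sum(Q[i][j] * x[j] for j in range(n)) for i in range(n)]
        new_x = [alpha * q + (1.0 - alpha) / n for q in Qx]
        change = sum(abs(a - b) for a, b in zip(new_x, x))
        changes.append(change)
        x = new_x
        if change < tol:
            break
    return x, changes

# Hypothetical 4-page hyperlink matrix, column-stochastic with Q_ii > 0.
Q = [
    [1 / 3, 0.0, 0.5, 1 / 3],
    [1 / 3, 0.5, 0.0, 0.0],
    [1 / 3, 0.5, 0.5, 1 / 3],
    [0.0, 0.0, 0.0, 1 / 3],
]
ranks, changes = pagerank(Q, alpha=0.85)
iterations_used = len(changes)
```

Because Q is column-stochastic, each recorded change is at most $\alpha$ times the previous one, so a smaller $\alpha$ needs fewer iterations for the same tolerance.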
Many thanks to GTUG Palermo,
and see you at the next meeting!
Bibliography.
Brin, S. and Page, L. (1998).
The anatomy of a large-scale hypertextual Web search engine.
Computer Networks and ISDN Systems, 30(1-7), 107–117.
Ching, W. and Ng, M. (2006).
Markov Chains: Models, Algorithms and Applications.
Springer Science + Business Media, Inc.
Haveliwala, T. and Kamvar, M. (2003).
The second eigenvalue of the Google matrix.
Technical report, Stanford University.
Langville, A., Meyer, C., and Fernández, P. (2008).
Google’s PageRank and beyond: the science of search engine rankings.
The Mathematical Intelligencer, 30(1), 68–69.
Page, L., Brin, S., Motwani, R., and Winograd, T. (1999).
The PageRank Citation Ranking: Bringing Order to the Web.
Internet web sites.
Jon Atle Gulla (2007) - From Google Search to Semantic Exploration. -
Norwegian University of Science and Technology -
www.slideshare.net/sveino/semantics-and-search?type=presentation
Steven Levy (2010) - Exclusive: How Google’s Algorithm Rules the Web - Wired
Magazine - www.wired.com/magazine/2010/02/ff_google_algorithm/
Ann Smarty (2009) - Let’s Try to Find All 200 Parameters in Google Algorithm -
Search Engine Journal -
www.searchenginejournal.com/200-parameters-in-google-algorithm/15457/.
Sergio S. Guirreri - www.guirreri.host22.com (Google Technology User Group (GTUG) of Palermo.)Markov Chains as methodology used by PageRank to rank the Web Pages on Inte5th March 2010 14 / 14

PageRank and Markov Chain

  • 1.
    Markov Chains asmethodology used by PageRank to rank the Web Pages on Internet. Sergio S. Guirreri - www.guirreri.host22.com Google Technology User Group (GTUG) of Palermo. 5th March 2010 Sergio S. Guirreri - www.guirreri.host22.com (Google Technology User Group (GTUG) of Palermo.)Markov Chains as methodology used by PageRank to rank the Web Pages on Inte5th March 2010 1 / 14
  • 2.
    Overview 1 Concepts onMarkov-Chains. 2 The idea of the PageRank algorithm. 3 The PageRank algorithm. 4 Solving the PageRank algorithm. 5 Conclusions. 6 Bibliography. 7 Internet web sites. Sergio S. Guirreri - www.guirreri.host22.com (Google Technology User Group (GTUG) of Palermo.)Markov Chains as methodology used by PageRank to rank the Web Pages on Inte5th March 2010 2 / 14
  • 3.
    Concepts on Markov-Chains. StochasticProcess and Markov-Chains. Let assume the following stochastic process {Xn; n = 0, 1, 2, . . . } with values in a set E, called the state space, while its elements are called state of the process. Sergio S. Guirreri - www.guirreri.host22.com (Google Technology User Group (GTUG) of Palermo.)Markov Chains as methodology used by PageRank to rank the Web Pages on Inte5th March 2010 3 / 14
  • 4.
    Concepts on Markov-Chains. StochasticProcess and Markov-Chains. Let assume the following stochastic process {Xn; n = 0, 1, 2, . . . } with values in a set E, called the state space, while its elements are called state of the process. Let assume the set E is finite or countable. Sergio S. Guirreri - www.guirreri.host22.com (Google Technology User Group (GTUG) of Palermo.)Markov Chains as methodology used by PageRank to rank the Web Pages on Inte5th March 2010 3 / 14
  • 5.
    Concepts on Markov-Chains. StochasticProcess and Markov-Chains. Let assume the following stochastic process {Xn; n = 0, 1, 2, . . . } with values in a set E, called the state space, while its elements are called state of the process. Let assume the set E is finite or countable. Sergio S. Guirreri - www.guirreri.host22.com (Google Technology User Group (GTUG) of Palermo.)Markov Chains as methodology used by PageRank to rank the Web Pages on Inte5th March 2010 3 / 14
  • 6.
    Concepts on Markov-Chains. StochasticProcess and Markov-Chains. Let assume the following stochastic process {Xn; n = 0, 1, 2, . . . } with values in a set E, called the state space, while its elements are called state of the process. Let assume the set E is finite or countable. Definition A Markov Chain is a stochastic process Xn that hold the following feature: Prob{Xn+1 = j|Xn = i, Xn−1 = in−1, . . . , X0 = i0} = = Prob{Xn+1 = j|Xn = i} = pij(n) where E is the state space set and j, i, in−1, . . . , i0 ∈ E, n ∈ N. Sergio S. Guirreri - www.guirreri.host22.com (Google Technology User Group (GTUG) of Palermo.)Markov Chains as methodology used by PageRank to rank the Web Pages on Inte5th March 2010 3 / 14
  • 7.
    Concepts on Markov-Chains. StochasticProcess and Markov-Chains. Let assume the following stochastic process {Xn; n = 0, 1, 2, . . . } with values in a set E, called the state space, while its elements are called state of the process. Let assume the set E is finite or countable. Definition A Markov Chain is a stochastic process Xn that hold the following feature: Prob{Xn+1 = j|Xn = i, Xn−1 = in−1, . . . , X0 = i0} = = Prob{Xn+1 = j|Xn = i} = pij(n) where E is the state space set and j, i, in−1, . . . , i0 ∈ E, n ∈ N. The transition probability matrix P of the process Xn is composed of pij, ∀i, j ∈ E. Sergio S. Guirreri - www.guirreri.host22.com (Google Technology User Group (GTUG) of Palermo.)Markov Chains as methodology used by PageRank to rank the Web Pages on Inte5th March 2010 3 / 14
  • 8.
    The idea ofthe PageRank algorithm. PageRank’s idea. The idea behind the PageRank algorithm is similar to the idea of the impact factor index used to rank the Journals [Page et al.(1999)] [Brin and Page(1998)] [Langville et al.(2008)]. Sergio S. Guirreri - www.guirreri.host22.com (Google Technology User Group (GTUG) of Palermo.)Markov Chains as methodology used by PageRank to rank the Web Pages on Inte5th March 2010 4 / 14
  • 9.
    The idea ofthe PageRank algorithm. PageRank’s idea. The idea behind the PageRank algorithm is similar to the idea of the impact factor index used to rank the Journals [Page et al.(1999)] [Brin and Page(1998)] [Langville et al.(2008)]. PageRank the impact factor of Internet. The impact factor of a journal is defined as the average number of citations per recently published papers in that journal. By regarding each web page as a journal, this idea was then extended to measure the importance of the web page in the PageRank Algorithm. Sergio S. Guirreri - www.guirreri.host22.com (Google Technology User Group (GTUG) of Palermo.)Markov Chains as methodology used by PageRank to rank the Web Pages on Inte5th March 2010 4 / 14
  • 10.
    The idea ofthe PageRank algorithm. Elements of the PageRank. To illustrate the PageRank algorithm I define the following variables [Ching and Ng(2006)]: let be N the total number of web pages in the web. Sergio S. Guirreri - www.guirreri.host22.com (Google Technology User Group (GTUG) of Palermo.)Markov Chains as methodology used by PageRank to rank the Web Pages on Inte5th March 2010 5 / 14
  • 11.
    The idea ofthe PageRank algorithm. Elements of the PageRank. To illustrate the PageRank algorithm I define the following variables [Ching and Ng(2006)]: let be N the total number of web pages in the web. let be k the outgoing links of web page j. Sergio S. Guirreri - www.guirreri.host22.com (Google Technology User Group (GTUG) of Palermo.)Markov Chains as methodology used by PageRank to rank the Web Pages on Inte5th March 2010 5 / 14
  • 12.
    The idea ofthe PageRank algorithm. Elements of the PageRank. To illustrate the PageRank algorithm I define the following variables [Ching and Ng(2006)]: let be N the total number of web pages in the web. let be k the outgoing links of web page j. let be Q the so called hyperlink matrix with elements: Qij =    1 k if web page i is an outgoing link of web page j; 0 otherwise; Qi,i > 0 ∀i. (1) Sergio S. Guirreri - www.guirreri.host22.com (Google Technology User Group (GTUG) of Palermo.)Markov Chains as methodology used by PageRank to rank the Web Pages on Inte5th March 2010 5 / 14
  • 13.
    The idea ofthe PageRank algorithm. Elements of the PageRank. To illustrate the PageRank algorithm I define the following variables [Ching and Ng(2006)]: let be N the total number of web pages in the web. let be k the outgoing links of web page j. let be Q the so called hyperlink matrix with elements: Qij =    1 k if web page i is an outgoing link of web page j; 0 otherwise; Qi,i > 0 ∀i. (1) Sergio S. Guirreri - www.guirreri.host22.com (Google Technology User Group (GTUG) of Palermo.)Markov Chains as methodology used by PageRank to rank the Web Pages on Inte5th March 2010 5 / 14
  • 14.
    The idea ofthe PageRank algorithm. Elements of the PageRank. To illustrate the PageRank algorithm I define the following variables [Ching and Ng(2006)]: let be N the total number of web pages in the web. let be k the outgoing links of web page j. let be Q the so called hyperlink matrix with elements: Qij =    1 k if web page i is an outgoing link of web page j; 0 otherwise; Qi,i > 0 ∀i. (1) The hyperlink matrix Q can be regarded as a transition probability matrix of a Markov chain. One may regard a surfer on the net as a random walker and the web pages as the states of the Markov chain. Sergio S. Guirreri - www.guirreri.host22.com (Google Technology User Group (GTUG) of Palermo.)Markov Chains as methodology used by PageRank to rank the Web Pages on Inte5th March 2010 5 / 14
  • 18.
    The PageRank algorithm. The PageRank with irreducible Markov Chain.
    Assuming that the Markov chain is irreducible (a) and aperiodic (b), the steady-state probability distribution $(p_1, p_2, \ldots, p_N)^T$ of the states (web pages) exists.
    (a) A Markov chain is irreducible if all states communicate with each other.
    (b) A chain is periodic if there exists k > 1 such that the interval between two visits to some state s is always a multiple of k. A chain is aperiodic if no such k > 1 exists.
    The PageRank.
    Each $p_i$ is the proportion of time that the surfer spends visiting web page i.
    The higher the value of $p_i$, the more important web page i is.
    The PageRank of web page i is then defined as $p_i$.
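The reading of each p_i as a proportion of time can be checked by simulation. The sketch below is an illustration under stated assumptions: the three-page matrix `Q`, the function name `simulate_surfer`, the starting page and the fixed seed are all hypothetical choices. The chain is irreducible and aperiodic, and solving p = Qp by hand for this Q gives p = (1/3, 2/9, 4/9), so the visit frequencies should land close to those values.

```python
import random


def simulate_surfer(q, steps, seed=0):
    """Return the fraction of steps a random surfer spends on each page.

    q is column-stochastic: q[i][j] is the probability of moving
    from page j to page i.
    """
    rng = random.Random(seed)
    n = len(q)
    visits = [0] * n
    page = 0  # arbitrary starting page
    for _ in range(steps):
        visits[page] += 1
        column = [q[i][page] for i in range(n)]
        page = rng.choices(range(n), weights=column)[0]
    return [v / steps for v in visits]


# Page 0 links to 0, 1, 2; page 1 links to 1, 2; page 2 links to 2, 0.
Q = [
    [1 / 3, 0.0, 0.5],
    [1 / 3, 0.5, 0.0],
    [1 / 3, 0.5, 0.5],
]
frequencies = simulate_surfer(Q, steps=100_000)
print(frequencies)
```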
  • 20.
    The PageRank algorithm. The PageRank with reducible Markov Chain.
    Since the matrix Q can be reducible, to ensure that the steady-state probability exists and is unique the following matrix P must be considered:
    \[
    P = \alpha
    \begin{pmatrix}
      Q_{11} & Q_{12} & \cdots & Q_{1N} \\
      Q_{21} & Q_{22} & \cdots & Q_{2N} \\
      \vdots & \vdots & \ddots & \vdots \\
      Q_{N1} & Q_{N2} & \cdots & Q_{NN}
    \end{pmatrix}
    + \frac{1-\alpha}{N}
    \begin{pmatrix}
      1 & 1 & \cdots & 1 \\
      1 & 1 & \cdots & 1 \\
      \vdots & \vdots & \ddots & \vdots \\
      1 & 1 & \cdots & 1
    \end{pmatrix} \tag{2}
    \]
    where $0 < \alpha < 1$; the most popular values of $\alpha$ are 0.85 and $(1 - 1/N)$.
    Interpretation of PageRank.
    The idea of the PageRank (2) is that, for a network of N web pages, each web page has an inherent importance of $(1-\alpha)/N$. If a page $P_i$ has an importance of $p_i$, then it will contribute an importance of $\alpha p_i$, which is shared among the web pages that it points to.
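A small sketch of equation (2) follows. The matrix `Q` below is a hypothetical reducible example (page 2 links only to itself, so a surfer who reaches it never leaves), and `google_matrix` is an illustrative name. After mixing with the uniform matrix, every entry of P is strictly positive and each column still sums to 1.

```python
def google_matrix(q, alpha=0.85):
    """Return P = alpha * Q + (1 - alpha) / N * (matrix of ones), as in (2)."""
    n = len(q)
    uniform = (1.0 - alpha) / n
    return [[alpha * q[i][j] + uniform for j in range(n)] for i in range(n)]


# Reducible toy chain: column j holds the transition probabilities out of page j.
Q = [
    [0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 1.0],
]
P = google_matrix(Q, alpha=0.85)

column_sums = [sum(P[i][j] for i in range(3)) for j in range(3)]
min_entry = min(min(row) for row in P)
print(column_sums, min_entry)
```

Because every entry of P is at least (1 − α)/N, every page can be reached from every other page in one step, which is what makes the modified chain irreducible and aperiodic.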
  • 21.
    The PageRank algorithm. The PageRank with reducible Markov Chain.
    Solving the following linear system of equations, subject to the normalization constraint, one can obtain the importance of web page $P_i$:
    \[
    \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{pmatrix}
    = \alpha
    \begin{pmatrix}
      Q_{11} & Q_{12} & \cdots & Q_{1N} \\
      Q_{21} & Q_{22} & \cdots & Q_{2N} \\
      \vdots & \vdots & \ddots & \vdots \\
      Q_{N1} & Q_{N2} & \cdots & Q_{NN}
    \end{pmatrix}
    \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{pmatrix}
    + \frac{1-\alpha}{N}
    \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \tag{3}
    \]
    Since $\sum_{i=1}^{N} p_i = 1$, equation (3) can be rewritten as
    \[
    (p_1, p_2, \ldots, p_N)^T = P \, (p_1, p_2, \ldots, p_N)^T .
    \]
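The step from (3) to the eigenvector form can be spelled out as follows. The shorthand $\mathbf{p} = (p_1, \ldots, p_N)^T$ and $\mathbf{1} = (1, \ldots, 1)^T$ is introduced here for brevity and is not notation from the slides.

```latex
\begin{align*}
P\,\mathbf{p}
  &= \Bigl(\alpha Q + \frac{1-\alpha}{N}\,\mathbf{1}\mathbf{1}^T\Bigr)\mathbf{p}
     && \text{by (2)} \\
  &= \alpha Q\,\mathbf{p} + \frac{1-\alpha}{N}\,\mathbf{1}\,\bigl(\mathbf{1}^T\mathbf{p}\bigr) \\
  &= \alpha Q\,\mathbf{p} + \frac{1-\alpha}{N}\,\mathbf{1}
     && \text{since } \mathbf{1}^T\mathbf{p} = \sum_{i=1}^{N} p_i = 1 .
\end{align*}
```

The last line is the right-hand side of (3), so under the normalization constraint (3) holds exactly when $\mathbf{p} = P\,\mathbf{p}$.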
  • 26.
    Solving the PageRank algorithm. The power method.
    The power method is an iterative method for computing the dominant eigenvalue of a matrix and its corresponding eigenvector.
    Given an n × n matrix A, the hypotheses of the power method are:
    there is a single dominant eigenvalue, so the eigenvalues can be sorted as
    \[ |\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_n| ; \]
    there is a linearly independent set of n eigenvectors
    \[ \{u^{(1)}, u^{(2)}, \ldots, u^{(n)}\} \]
    so that
    \[ A u^{(i)} = \lambda_i u^{(i)}, \qquad i = 1, \ldots, n. \]
  • 33.
    Solving the PageRank algorithm. The power method.
    The initial vector $x^{(0)}$ can be written as
    \[ x^{(0)} = a_1 u^{(1)} + a_2 u^{(2)} + \cdots + a_n u^{(n)} . \]
    Iterating the initial vector with the matrix A:
    \[
    A^k x^{(0)} = a_1 A^k u^{(1)} + a_2 A^k u^{(2)} + \cdots + a_n A^k u^{(n)}
                = a_1 \lambda_1^k u^{(1)} + a_2 \lambda_2^k u^{(2)} + \cdots + a_n \lambda_n^k u^{(n)} .
    \]
    Dividing by $\lambda_1^k$:
    \[
    \frac{A^k x^{(0)}}{\lambda_1^k}
      = a_1 u^{(1)} + a_2 \left(\frac{\lambda_2}{\lambda_1}\right)^{k} u^{(2)} + \cdots + a_n \left(\frac{\lambda_n}{\lambda_1}\right)^{k} u^{(n)} .
    \]
    Since $|\lambda_i| / |\lambda_1| < 1$ for $i = 2, \ldots, n$, we have $\lim_{k \to \infty} |\lambda_i|^k / |\lambda_1|^k = 0$, and therefore, provided $a_1 \neq 0$,
    \[ A^k x^{(0)} \approx a_1 \lambda_1^k u^{(1)} \quad \text{for large } k. \]
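The derivation above can be sketched in a few lines of Python. This is an illustrative variant, not the slides' exact formulation: instead of dividing by λ1^k at the end, the vector is rescaled by its largest component at every step, which keeps the numbers bounded and yields the eigenvalue estimate as a by-product. The function name `power_method` and the 2 × 2 test matrix are assumptions; that matrix has eigenvalues 3 and 1, with dominant eigenvector proportional to (1, 1).

```python
def power_method(a, x0, iterations=50):
    """Approximate the dominant eigenvalue and eigenvector of matrix a.

    Assumes a single dominant eigenvalue and a starting vector x0 with a
    nonzero component along the dominant eigenvector.
    """
    x = list(x0)
    eigenvalue = 0.0
    for _ in range(iterations):
        # y = A x
        y = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]
        # The component of largest magnitude serves as the scale factor.
        eigenvalue = max(y, key=abs)
        x = [v / eigenvalue for v in y]
    return eigenvalue, x


A = [[2.0, 1.0], [1.0, 2.0]]
lam, u = power_method(A, [1.0, 0.0])
print(lam, u)
```

Here the error shrinks by a factor of about |λ2/λ1| = 1/3 per iteration, so 50 iterations are far more than this small example needs.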
  • 38.
    Conclusions. The power method and PageRank. Results.
    The matrix P of the PageRank algorithm is a stochastic matrix, therefore its largest eigenvalue is 1.
    The convergence rate of the power method depends on the ratio $|\lambda_2 / \lambda_1|$.
    It has been shown by [Haveliwala and Kamvar(2003)] that the second largest eigenvalue of P satisfies $|\lambda_2| \le \alpha$ for $0 \le \alpha \le 1$.
    Since $\lambda_1 = 1$, the convergence rate depends on $\alpha$.
    The most popular value for $\alpha$ is 0.85. With this value it has been reported that the power method on a web data set of over 80 million pages converges in about 50 iterations.
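To make the role of α concrete, the sketch below runs the iteration of equation (3) on a toy three-page web and counts the steps until successive vectors differ by less than a tolerance. The matrix `Q`, the function name `pagerank`, the tolerance and the stopping rule are all illustrative assumptions, not details from the slides. The last line evaluates 0.85^50, which is about 3e-4 and indicates how far the error bound α^k has fallen after 50 iterations.

```python
def pagerank(q, alpha=0.85, tol=1e-10, max_iter=1000):
    """Iterate p <- alpha * Q p + (1 - alpha) / N until the L1 change is below tol.

    q is column-stochastic. Returns the rank vector and the number of
    iterations performed.
    """
    n = len(q)
    p = [1.0 / n] * n  # start from the uniform distribution
    for iteration in range(1, max_iter + 1):
        new_p = [
            alpha * sum(q[i][j] * p[j] for j in range(n)) + (1.0 - alpha) / n
            for i in range(n)
        ]
        change = sum(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if change < tol:
            return p, iteration
    return p, max_iter


# Page 0 links to 0, 1, 2; page 1 links to 1, 2; page 2 links to 2, 0.
Q = [
    [1 / 3, 0.0, 0.5],
    [1 / 3, 0.5, 0.0],
    [1 / 3, 0.5, 0.5],
]
ranks, iterations = pagerank(Q, alpha=0.85)
best_page = max(range(3), key=lambda i: ranks[i])
bound_after_50 = 0.85 ** 50
print(ranks, iterations, best_page, bound_after_50)
```

A smaller α gives faster convergence but lets the uniform term dominate the link structure, which is the trade-off behind the popular choice of 0.85.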
  • 39.
    Conclusions.
    Many thanks to GTUG Palermo, and see you at the next meeting!
  • 40.
    Bibliography.
    Brin, S. and Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1-7), 107–117.
    Ching, W. and Ng, M. (2006). Markov Chains: Models, Algorithms and Applications. Springer Science + Business Media, Inc.
    Haveliwala, T. and Kamvar, S. (2003). The second eigenvalue of the Google matrix. Technical report, Stanford University.
    Langville, A., Meyer, C., and Fernández, P. (2008). Google’s PageRank and beyond: the science of search engine rankings. The Mathematical Intelligencer, 30(1), 68–69.
    Page, L., Brin, S., Motwani, R., and Winograd, T. (1999). The PageRank Citation Ranking: Bringing Order to the Web.
  • 41.
    Internet web sites.
    Jon Atle Gulla (2007) - From Google Search to Semantic Exploration - Norwegian University of Science and Technology - www.slideshare.net/sveino/semantics-and-search?type=presentation
    Steven Levy (2010) - Exclusive: How Google’s Algorithm Rules the Web - Wired Magazine - www.wired.com/magazine/2010/02/ff_google_algorithm/
    Ann Smarty (2009) - Let’s Try to Find All 200 Parameters in Google Algorithm - Search Engine Journal - www.searchenginejournal.com/200-parameters-in-google-algorithm/15457/