1.
The Effect of New Links on Google PageRank By Hui Xie Apr , 07
2.
Computing PageRank <ul><li>Matrix representation </li></ul><ul><li>Let $P$ be an $n \times n$ matrix and $p_{ij}$ be the entry in the $i$-th row and $j$-th column. </li></ul><ul><li>If page $i$ has $k > 0$ outgoing links: </li></ul><ul><li>$p_{ij} = 1/k$ if page $i$ has a link to page $j$ </li></ul><ul><li>$p_{ij} = 0$ if there is no link from $i$ to $j$ </li></ul><ul><li>If page $i$ has no outgoing links: </li></ul><ul><li>$p_{ij} = 1/n$ for $j = 1, \dots, n$ </li></ul>
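The construction above can be sketched in a few lines of Python. This is only an illustration: the function name `hyperlink_matrix`, the dictionary `links`, and the 0-based page indices are assumptions made for the example and do not come from the slides.

```python
def hyperlink_matrix(links, n):
    """Build the row-stochastic matrix P from out-link lists.

    links[i] is the list of distinct pages that page i links to
    (0-based indices, an assumption of this sketch).
    """
    P = []
    for i in range(n):
        targets = links.get(i, [])
        if targets:
            # Page i has k > 0 outgoing links: 1/k on each linked page.
            k = len(targets)
            row = [0.0] * n
            for j in targets:
                row[j] += 1.0 / k
        else:
            # Page i has no outgoing links: uniform row 1/n.
            row = [1.0 / n] * n
        P.append(row)
    return P


# Hypothetical three-page web: page 0 links to 1 and 2, page 1 links to 2,
# page 2 has no outgoing links.
links = {0: [1, 2], 1: [2], 2: []}
P = hyperlink_matrix(links, 3)
```

Every row of `P` sums to 1, which is what makes the matrix usable as a transition matrix later on.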
3.
Google matrix <ul><li>$G = cP + (1-c)\frac{1}{n} e e^T$ </li></ul><ul><li>$e = (1, \dots, 1)^T$ </li></ul><ul><li>$G$ is a stochastic matrix: $Ge = e$ </li></ul><ul><li>There exists a unique column vector $\pi$ such that </li></ul><ul><li>$\pi^T G = \pi^T$, $\pi^T e = 1$ </li></ul><ul><li>$\pi^T = \frac{1-c}{n} e^T (I - cP)^{-1}$ </li></ul>
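One common way to approximate the vector π numerically is power iteration; the slides do not say which method they have in mind, so the following is just a minimal sketch. The function name `pagerank`, the tolerance, and the example matrix are assumptions of the example. Because π sums to 1, multiplying by G reduces to multiplying by cP and adding (1-c)/n to every entry, so G is never formed explicitly.

```python
def pagerank(P, c=0.85, tol=1e-12, max_iter=1000):
    """Approximate pi with pi^T G = pi^T by power iteration.

    Uses pi^T G = c * pi^T P + (1 - c)/n * e^T, valid because pi sums to 1.
    """
    n = len(P)
    pi = [1.0 / n] * n                      # start from the uniform vector
    for _ in range(max_iter):
        new = [(1.0 - c) / n] * n           # contribution of (1-c)(1/n) e e^T
        for i in range(n):
            for j in range(n):
                new[j] += c * pi[i] * P[i][j]
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Hypothetical three-page example; the last row is a page without out-links.
P = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0],
     [1 / 3, 1 / 3, 1 / 3]]
pi = pagerank(P)
```

In this small example the page that receives the most links (the third one) ends up with the largest PageRank.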
4.
Discrete Time Markov Chains <ul><li>A sequence of random variables $\{X_n\}$ is called a Markov chain if it has the Markov property: $P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \dots, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i)$ </li></ul><ul><li>States are usually labeled $\{(0,)1,2,\dots\}$ </li></ul><ul><li>The state space can be finite or infinite </li></ul>
5.
Transition Probability <ul><li>Probability to jump from state $i$ to state $j$: $p_{ij} = P(X_{n+1} = j \mid X_n = i)$ </li></ul><ul><li>Assumed stationary: independent of the time $n$ </li></ul><ul><li>Transition probability matrix: </li></ul><ul><li>$P = (p_{ij})$ </li></ul><ul><li>Two-state Markov chain: </li></ul>
6.
Side Topic: Markov Chains <ul><li>A discrete-time stochastic process is a sequence of random variables $\{X_0, X_1, \dots, X_n, \dots\}$ where $0, 1, \dots, n, \dots$ are discrete points in time. </li></ul><ul><li>A Markov chain is a discrete-time stochastic process defined over a finite (or countably infinite) set of states $S$ in terms of a matrix $P$ of transition probabilities. </li></ul><ul><li>Memorylessness property: for a Markov chain, </li></ul><ul><li>$\Pr[X_{t+1} = j \mid X_0 = i_0, X_1 = i_1, \dots, X_t = i] = \Pr[X_{t+1} = j \mid X_t = i]$ </li></ul>
7.
Side Topic: Markov Chains <ul><li>Let $\pi_i(t)$ be the probability of being in state $i$ at time step $t$. </li></ul><ul><li>Let $\pi(t) = [\pi_0(t), \pi_1(t), \dots]$ be the vector of probabilities at time $t$. </li></ul><ul><li>For an initial probability distribution $\pi(0)$, the probabilities at time $n$ are </li></ul><ul><li>$\pi(n) = \pi(0) P^n$ </li></ul><ul><li>A probability distribution $\pi$ is stationary if $\pi = \pi P$ </li></ul><ul><li>$P(X_{m+n} = j \mid X_m = i) = P(X_n = j \mid X_0 = i) = P^n(i,j)$ </li></ul>
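As a small worked example of these formulas, the sketch below propagates an initial distribution through a two-state chain and compares the result with the stationary distribution. The jump probabilities `a` and `b` and the helper names are made up for the illustration. For a two-state chain with jump probabilities a (state 0 to 1) and b (state 1 to 0), the stationary distribution is known to be (b/(a+b), a/(a+b)).

```python
def step(dist, P):
    """One time step: returns the row vector dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def distribution_at(dist0, P, n_steps):
    """Distribution after n_steps, i.e. dist0 * P^n_steps."""
    dist = list(dist0)
    for _ in range(n_steps):
        dist = step(dist, P)
    return dist


# Hypothetical two-state chain: 0 -> 1 with probability a, 1 -> 0 with probability b.
a, b = 0.3, 0.1
P = [[1 - a, a],
     [b, 1 - b]]

pi_100 = distribution_at([1.0, 0.0], P, 100)    # start in state 0
stationary = [b / (a + b), a / (a + b)]         # closed form for two states
```

After 100 steps the distribution is numerically indistinguishable from `stationary`, and applying `step` to `stationary` leaves it unchanged.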
8.
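Absorbing Markov chain <ul><li>Define a discrete-time absorbing Markov chain </li></ul><ul><li>$\{X_t,\ t = 0, 1, \dots\}$ with the state space $\{0, 1, \dots, n\}$, </li></ul><ul><li>where transitions between the states $1, \dots, n$ are governed by the matrix $cP$, and the state 0 is absorbing. </li></ul><ul><li>The transition matrix is $\begin{pmatrix} 1 & 0 \\ (1-c)e & cP \end{pmatrix}$: since each row of $cP$ sums to $c$, every state $1, \dots, n$ moves to state 0 with probability $1-c$. </li></ul>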
9.
<ul><li>Random walk interpretation (here $\alpha$ plays the role of $c$) </li></ul><ul><li>The walk starts at a uniformly chosen web page </li></ul><ul><li>At each step, if currently at page $p$: </li></ul><ul><li>With probability $\alpha$, go to a uniformly chosen out-neighbor of $p$ </li></ul><ul><li>With probability $1 - \alpha$, stop </li></ul>
10.
<ul><li>Let $N_j$ be the total number of visits to state $j$ before absorption, including the visit at time $t = 0$ if $X_0 = j$. Formally, $N_j = \sum_{t=0}^{\infty} \mathbf{1}\{X_t = j\}$. </li></ul><ul><li>Then $z_{ij} = [(I - cP)^{-1}]_{ij} = E(N_j \mid X_0 = i)$ </li></ul><ul><li>Let $q_{ij}$ be the probability of reaching the state $j$ before absorption if the initial state is $i$. Then we have </li></ul>
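The expected visit counts can be approximated numerically by summing the series I + cP + (cP)^2 + …, whose t-th term contributes the probability of being at a given page at time t before absorption. The sketch below is one possible way to do this; the function name `fundamental_matrix`, the number of terms, and the example matrix are assumptions of the example. It also recovers PageRank from the column sums, following the closed formula on the Google matrix slide, and derives the return probabilities from the relation z_ii = 1/(1 - q_ii) used later in the deck.

```python
def fundamental_matrix(P, c=0.85, terms=400):
    """Approximate Z = (I - cP)^(-1) by the truncated series sum_t (cP)^t."""
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    Z = [row[:] for row in identity]       # t = 0 term: the visit at time 0
    term = [row[:] for row in identity]    # holds (cP)^t
    for _ in range(terms):
        term = [[c * sum(term[i][k] * P[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]
        for i in range(n):
            for j in range(n):
                Z[i][j] += term[i][j]
    return Z


# Hypothetical three-page example.
c = 0.85
P = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0],
     [1 / 3, 1 / 3, 1 / 3]]
n = len(P)
Z = fundamental_matrix(P, c)

# PageRank from the closed formula: pi_j = (1 - c)/n * sum_i z_ij.
pagerank_from_Z = [(1.0 - c) / n * sum(Z[i][j] for i in range(n)) for j in range(n)]

# Return probabilities q_ii recovered from z_ii = 1 / (1 - q_ii).
return_probability = [1.0 - 1.0 / Z[i][i] for i in range(n)]
```

Each row of `Z` sums to 1/(1 - c), the expected number of steps the walk makes before it is absorbed.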
11.
<ul><li>Theorem. Let $X$ denote a Markov chain with state space $E$. The total number of visits to a state $j \in E$ under the condition that the chain starts in state $i$ is given by </li></ul><ul><li>$P(N_j = m \mid X_0 = j) = q_{jj}^{m-1}(1 - q_{jj})$ </li></ul><ul><li>and for $i \neq j$ </li></ul><ul><li>$P(N_j = m \mid X_0 = i) = 1 - q_{ij}$ if $m = 0$ </li></ul><ul><li>$P(N_j = m \mid X_0 = i) = q_{ij}\, q_{jj}^{m-1}(1 - q_{jj})$ if $m \geq 1$ </li></ul><ul><li>Corollary. For all $i, j \in E$ the relations </li></ul><ul><li>$z_{ii} = (1 - q_{ii})^{-1}$ and $z_{ij} = q_{ij} z_{jj}$ hold </li></ul>
12.
Outgoing links from $i$ do not affect $q_{ji}$ for any $j \neq i$. So by changing its outgoing links, a page can control its PageRank only up to multiplication by the factor $z_{ii} = 1/(1 - q_{ii})$. Since $0 \leq q_{ii} \leq c^2$, we have $1 \leq z_{ii} \leq (1 - c^2)^{-1} \approx 3.6$ for $c = 0.85$.
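The arithmetic behind this bound is easy to check. In the sketch below, `gain_factor` is a hypothetical helper name. The upper end of the range is attained, for example, by two pages that link only to each other: the walk leaves page 1 with probability c and comes straight back with probability c, so the return probability is exactly c squared, and the expected number of visits is the geometric series in c squared.

```python
def gain_factor(q_ii):
    """The factor z_ii = 1 / (1 - q_ii) for a given return probability q_ii."""
    return 1.0 / (1.0 - q_ii)


c = 0.85
lower = gain_factor(0.0)       # a page the walk never returns to
upper = gain_factor(c ** 2)    # largest return probability, about 3.60

# Two pages linking only to each other: visits to page 1 happen at times
# 0, 2, 4, ... with probabilities 1, c^2, c^4, ...
z_11_series = sum((c * c) ** t for t in range(500))
```

The series value agrees with `upper`, confirming that the factor of roughly 3.6 cannot be improved for c = 0.85.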
13.
Rank-one update of Google PageRank <ul><li>Page 1, with $k_0$ old links, has $k_1$ newly created links to pages $2, \dots, k_1 + 1$ </li></ul><ul><li>Let $k = k_0 + k_1$ and let $p_1^T$ be the first row of the matrix $P$ </li></ul><ul><li>Updated hyperlink matrix </li></ul>
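The formula for the updated matrix is not reproduced in this transcript. Under the definition of P, and assuming the new targets were not linked from page 1 before and that k_0 > 0, the new first row puts weight 1/k on every old and new target, which amounts to scaling the old row by k_0/k and adding 1/k at each new target. Only the first row changes, which is why the update has rank one. The sketch below illustrates this reading; the function name `add_links` and the example row are assumptions, and pages are indexed from 0.

```python
def add_links(row, k0, new_targets):
    """Row of P for a page after adding links to new_targets.

    Assumes the page had k0 > 0 old links and that none of new_targets
    was linked before (assumptions of this sketch).
    """
    k1 = len(new_targets)
    k = k0 + k1
    updated = [k0 / k * p for p in row]     # old links: 1/k0 becomes 1/k
    for j in new_targets:
        updated[j] += 1.0 / k               # each new link gets 1/k
    return updated


# Hypothetical five-page web: page 1 (index 0) links to the last two pages
# and adds new links to the pages with indices 1 and 2.
p1 = [0.0, 0.0, 0.0, 0.5, 0.5]
new_p1 = add_links(p1, k0=2, new_targets=[1, 2])
```

Here `new_p1` spreads the weight evenly over the four linked pages, and the row still sums to 1.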
15.
<ul><li>According to (9), the ranking of page 1 increases when </li></ul>Since $z_{11} = 1/(1 - q_{11})$ and $z_{i1} = q_{i1} z_{11}$ for $i > 1$, the above is equivalent to
16.
<ul><li>Hence, page 1 increases its ranking when it refers to pages that are characterized by a high value of $q_{i1}$. These must be pages that refer to page 1 or at least belong to the same Web community. Here, by a Web community we mean a set of Web pages that a surfer can reach from one another in a relatively small number of steps. </li></ul>
18.
The PageRank of page $j$ increases if … If several new links are added, then the PageRank of page $j$ might actually decrease even if this page receives one of the new links. Such a situation occurs when most of the newly created links point to “irrelevant” pages.
19.
<ul><li>For instance, let $j = 2$ and assume that there is no hyperlink path from pages $3, \dots, k_1 + 1$ to page 2. Then $z_{ij}$ is close to zero for $i = 3, \dots, k_1 + 1$, and the PageRank of page 2 will increase only if $(c/k_1)\, z_{22} > z_{12}$, which is not necessarily true, especially if $z_{12}$ and $k_1$ are considerably large. </li></ul>
20.
Asymptotic analysis <ul><li>Let $\tau_j$ be the stopping time of the first visit to the state $j$ </li></ul><ul><li>Let $M_{ij} = E(\tau_j \mid X_0 = i)$ be the average time needed to reach $j$ starting from $i$ (the mean first passage time) </li></ul>
21.
Optimal Linking Strategy <ul><li>Consider a page $i = 1, \dots, n$ and assume that $i$ has links to pages $i_1, \dots, i_k$ distinct from $i$. Further, let $m_{ij}(c)$ be the mean first passage time from page $i$ to page $j$ for the Google transition matrix $G$ with parameter $c$. </li></ul>
22.
<ul><li>Outgoing links from $i$ do not affect $m_{ji}(c)$ for any $j \neq i$. Thus, by linking from $i$ to $j$, one can only alter $k$, which means that the owner of page $i$ has very little control over its PageRank. The best the owner can do is to link to only one page $j^*$ such that $m_{j^* i}(c) = \min_j m_{ji}(c)$. </li></ul>Note that (surprisingly) the PageRank of $j^*$ plays no role here.
23.
<ul><li>Theorem. The optimal linking strategy for a Web page is to have only one outgoing link pointing to a Web page with a shortest mean first passage time back to the original page. </li></ul>
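The theorem can be checked numerically on a toy example. The sketch below, under assumptions of my own (a hypothetical four-page web with 0-based indices, simple fixed-point iteration for the mean first passage times, power iteration for PageRank, and made-up helper names), picks the target with the shortest mean first passage time back to page 0 and compares it with a brute-force search over all single-link choices.

```python
def google_matrix(links, n, c):
    """G = cP + (1-c)(1/n) e e^T built directly from out-link lists."""
    G = []
    for i in range(n):
        out = links[i]
        if out:
            row = [(1.0 - c) / n] * n
            for j in out:
                row[j] += c / len(out)
        else:
            row = [1.0 / n] * n             # page without out-links
        G.append(row)
    return G


def mean_first_passage_to(G, target, sweeps=2000):
    """Iterate m_j = 1 + sum_{k != target} G[j][k] * m_k for all j != target."""
    n = len(G)
    m = [0.0] * n
    for _ in range(sweeps):
        m = [0.0 if j == target else
             1.0 + sum(G[j][k] * m[k] for k in range(n) if k != target)
             for j in range(n)]
    return m


def pagerank(G, iters=500):
    """Stationary vector of G by power iteration."""
    n = len(G)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * G[i][j] for i in range(n)) for j in range(n)]
    return pi


n, c, page = 4, 0.85, 0
others = {1: [0], 2: [3], 3: [1]}           # hypothetical links of the other pages
candidates = [j for j in range(n) if j != page]

# Passage times to `page` do not depend on the out-links of `page` itself.
m = mean_first_passage_to(google_matrix({**others, page: [1]}, n, c), page)
best_by_passage = min(candidates, key=lambda j: m[j])

# Brute force: PageRank of `page` for every possible single outgoing link.
rank_if_linking_to = {
    j: pagerank(google_matrix({**others, page: [j]}, n, c))[page]
    for j in candidates
}
best_by_rank = max(rank_if_linking_to, key=rank_if_linking_to.get)
```

In this example both criteria select the same target, the page that links directly back to page 0, in line with the statement of the theorem.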
24.
Conclusions <ul><li>Our main conclusion is that a Web page cannot significantly manipulate its PageRank by changing its outgoing links. </li></ul><ul><li>Furthermore, keeping a logical hyperlink structure and linking to a relevant Web community is the most sensible and rewarding policy. </li></ul>