
Hui xie 591r_presentation



Published in: Technology, Education


  1. 1. The Effect of New Links on Google PageRank By Hui Xie Apr , 07
  2. 2. Computing PageRank <ul><li>Matrix representation </li></ul><ul><li>Let $P$ be an $n \times n$ matrix and $p_{ij}$ be the entry in the $i$-th row and $j$-th column. </li></ul><ul><li>If page $i$ has $k > 0$ outgoing links: </li></ul><ul><li>$p_{ij} = 1/k$ if page $i$ has a link to page $j$ </li></ul><ul><li>$p_{ij} = 0$ if there is no link from $i$ to $j$ </li></ul><ul><li>If page $i$ has no outgoing links: </li></ul><ul><li>$p_{ij} = 1/n$ for $j = 1, \dots, n$ </li></ul>
  3. 3. Google matrix <ul><li>$G = cP + (1-c)(1/n)\, e e^T$ </li></ul><ul><li>$e = (1, \dots, 1)^T$ </li></ul><ul><li>$G$ is a stochastic matrix: $Ge = e$ </li></ul><ul><li>There exists a unique column vector $\pi$ such that </li></ul><ul><li>$\pi^T G = \pi^T$, $\pi^T e = 1$ </li></ul><ul><li>$\pi^T = \frac{1-c}{n}\, e^T (I - cP)^{-1}$ </li></ul>
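The construction on the two slides above can be sketched in code. This is a minimal illustration rather than the author's implementation: the names `hyperlink_matrix` and `pagerank`, the toy link graph, and the choice of power iteration instead of the closed-form inverse are all assumptions made for the example, and pages are indexed from 0.

```python
def hyperlink_matrix(links, n):
    """Build the row-stochastic matrix P from {page: [distinct targets]}.

    A page with k > 0 outgoing links gets 1/k on each target;
    a page with no outgoing links gets the uniform row 1/n.
    """
    P = []
    for i in range(n):
        out = links.get(i, [])
        if out:
            row = [0.0] * n
            for j in out:
                row[j] = 1.0 / len(out)
        else:
            row = [1.0 / n] * n
        P.append(row)
    return P


def pagerank(P, c=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for pi^T = pi^T G with G = cP + (1-c)(1/n) e e^T."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        # Since pi sums to 1, the teleportation part of pi^T G is (1-c)/n per entry.
        new = [(1.0 - c) / n + c * sum(pi[i] * P[i][j] for i in range(n))
               for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Hypothetical 4-page graph; page 3 has no outgoing links.
links = {0: [1, 2], 1: [2], 2: [0]}
P = hyperlink_matrix(links, 4)
pi = pagerank(P)
print(pi)
```

The matrix $G$ is never formed explicitly here: because the entries of $\pi$ sum to 1, multiplying by the rank-one term only adds the constant $(1-c)/n$ to every component.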
  4. 4. Discrete Time Markov Chains <ul><li>A sequence of random variables $\{X_n\}$ is called a Markov chain if it has the Markov property: $P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \dots, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i)$ </li></ul><ul><li>States are usually labeled {(0,)1,2,…} </li></ul><ul><li>The state space can be finite or infinite </li></ul>
  5. 5. Transition Probability <ul><li>Probability to jump from state $i$ to state $j$: $p_{ij} = P(X_{n+1} = j \mid X_n = i)$ </li></ul><ul><li>Assume stationarity: $p_{ij}$ is independent of the time $n$ </li></ul><ul><li>Transition probability matrix: </li></ul><ul><li>$P = (p_{ij})$ </li></ul><ul><li>Two-state MC: </li></ul>
  6. 6. Side Topic: Markov Chains <ul><li>A discrete-time stochastic process is a sequence of random variables $\{X_0, X_1, \dots, X_n, \dots\}$, where $0, 1, \dots, n, \dots$ are discrete points in time. </li></ul><ul><li>A Markov chain is a discrete-time stochastic process defined over a finite (or countably infinite) set of states $S$ in terms of a matrix $P$ of transition probabilities. </li></ul><ul><li>Memorylessness property: for a Markov chain, </li></ul><ul><li>$\Pr[X_{t+1} = j \mid X_0 = i_0, X_1 = i_1, \dots, X_t = i] = \Pr[X_{t+1} = j \mid X_t = i]$ </li></ul>
  7. 7. Side Topic: Markov Chains <ul><li>Let $\pi_i(t)$ be the probability of being in state $i$ at time step $t$. </li></ul><ul><li>Let $\pi(t) = [\pi_0(t), \pi_1(t), \dots]$ be the vector of probabilities at time $t$. </li></ul><ul><li>For an initial probability distribution $\pi(0)$, the probabilities at time $n$ are </li></ul><ul><li>$\pi(n) = \pi(0) P^n$ </li></ul><ul><li>A probability distribution $\pi$ is stationary if $\pi = \pi P$ </li></ul><ul><li>$P(X_{m+n} = j \mid X_m = i) = P(X_n = j \mid X_0 = i) = P^n(i, j)$ </li></ul>
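  8. 8. Absorbing Markov chain <ul><li>Define a discrete-time absorbing Markov chain </li></ul><ul><li>$\{X_t,\ t = 0, 1, \dots\}$ with the state space $\{0, 1, \dots, n\}$, </li></ul><ul><li>where transitions between the states $1, \dots, n$ are governed by the matrix $cP$, and the state 0 is absorbing. </li></ul><ul><li>The transition matrix is $\begin{pmatrix} 1 & 0^T \\ (1-c)e & cP \end{pmatrix}$: since $P$ is stochastic, each state $1, \dots, n$ moves to the absorbing state 0 with the remaining probability $1 - c$. </li></ul>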
  9. 9. <ul><li>Random walk interpretation </li></ul><ul><li>The walk starts at a uniformly chosen web page </li></ul><ul><li>At each step, if currently at page $p$: </li></ul><ul><li>with probability $\alpha$, go to a uniformly chosen out-neighbor of $p$ </li></ul><ul><li>with probability $1 - \alpha$, stop (here $\alpha$ plays the role of the parameter $c$ used on the other slides) </li></ul>
  10. 10. <ul><li>Let $N_j$ be the total number of visits to state $j$ before absorption, including the visit at time $t = 0$ if $X_0 = j$. Formally, $N_j = \sum_{t=0}^{\infty} \mathbf{1}\{X_t = j\}$. </li></ul><ul><li>Then $z_{ij} = [(I - cP)^{-1}]_{ij} = E(N_j \mid X_0 = i)$ </li></ul><ul><li>Let $q_{ij}$ be the probability of reaching the state $j$ before absorption if the initial state is $i$. Then we have $z_{ij} = q_{ij} z_{jj}$ for $i \neq j$. </li></ul>
  11. 11. <ul><li>Theorem. Let $X$ denote a Markov chain with state space $E$. The total number of visits to a state $j \in E$, under the condition that the chain starts in state $i$, is given by </li></ul><ul><li>$P(N_j = m \mid X_0 = j) = q_{jj}^{m-1}(1 - q_{jj})$ for $m \geq 1$, </li></ul><ul><li>and for $i \neq j$ </li></ul><ul><li>$P(N_j = m \mid X_0 = i) = 1 - q_{ij}$ if $m = 0$, </li></ul><ul><li>$P(N_j = m \mid X_0 = i) = q_{ij}\, q_{jj}^{m-1}(1 - q_{jj})$ if $m \geq 1$. </li></ul><ul><li>Corollary. For all $i, j \in E$ the relations </li></ul><ul><li>$z_{ii} = (1 - q_{ii})^{-1}$ and $z_{ij} = q_{ij} z_{jj}$ ($i \neq j$) hold. </li></ul>
  12. 12. Outgoing links from $i$ do not affect $q_{ji}$ for any $j \neq i$. So by changing its outgoing links, a page can control its PageRank only up to multiplication by the factor $z_{ii} = 1/(1 - q_{ii})$. For $0 \leq q_{ii} \leq c^2$, $1 \leq z_{ii} \leq (1 - c^2)^{-1} \approx 3.6$ for $c = 0.85$.
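The quantities $z_{ij}$ and the bound on $z_{ii}$ can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper name `visits_matrix` and the 3-page graph are hypothetical, $Z = (I - cP)^{-1}$ is approximated by the truncated series $\sum_t (cP)^t$ rather than by a matrix inverse, and the graph has no self-links, so a return to a page takes at least two steps and $q_{ii} \leq c^2$.

```python
def visits_matrix(P, c=0.85, terms=400):
    """Approximate Z = (I - cP)^{-1} by the truncated series sum_{t=0}^{terms} (cP)^t."""
    n = len(P)
    Z = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in Z]  # current power (cP)^t, starting at t = 0
    for _ in range(terms):
        term = [[c * sum(term[i][k] * P[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]
        Z = [[Z[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return Z


c = 0.85
# Hypothetical graph without self-links: 0 -> 1, 2;  1 -> 2;  2 -> 0.
P = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
n = len(P)
Z = visits_matrix(P, c)

# PageRank from pi^T = (1-c)/n e^T Z, i.e. scaled column sums of Z.
pi = [(1.0 - c) / n * sum(Z[i][j] for i in range(n)) for j in range(n)]

# Return probabilities recovered from z_ii = 1 / (1 - q_ii).
q_return = [1.0 - 1.0 / Z[i][i] for i in range(n)]

bound = 1.0 / (1.0 - c * c)
print(pi, q_return, bound)
```

With 400 terms the neglected tail is of order $c^{400}$, which is far below floating-point precision for $c = 0.85$.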
  13. 13. Rank-one update of Google PageRank <ul><li>Page 1, which has $k_0$ old links, creates $k_1$ new links to pages $2, \dots, k_1 + 1$ </li></ul><ul><li>Let $k = k_0 + k_1$, and let $p_1^T$ be the first row of the matrix $P$ </li></ul><ul><li>Updated hyperlink matrix </li></ul>
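  14. 15. <ul><li>According to (9) the ranking of page 1 increases when $\frac{c}{k_1} \sum_{i=2}^{k_1+1} z_{i1} > z_{11} - 1$. </li></ul>Since $z_{11} = 1/(1 - q_{11})$ and $z_{i1} = q_{i1} z_{11}$ for $i > 1$, the above is equivalent to $\frac{c}{k_1} \sum_{i=2}^{k_1+1} q_{i1} > q_{11}$.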
  15. 16. <ul><li>Hence, page 1 increases its ranking when it refers to pages that are characterized by a high value of $q_{i1}$. These must be pages that refer to page 1 or at least belong to the same Web community. Here, by a Web community we mean a set of Web pages that a surfer can reach from one another in a relatively small number of steps. </li></ul>
  16. 17. The PageRank of page $j \neq 1$ increases if $\frac{c}{k_1} \sum_{i=2}^{k_1+1} z_{ij} > z_{1j}$.
  17. 18. The PageRank of page $j \neq 1$ increases if $\frac{c}{k_1} \sum_{i=2}^{k_1+1} z_{ij} > z_{1j}$. If several new links are added, then the PageRank of page $j$ might actually decrease even if this page receives one of the new links. This situation occurs when most of the newly created links point to “irrelevant” pages.
  18. 19. <ul><li>For instance, let $j = 2$ and assume that there is no hyperlink path from pages $3, \dots, k_1 + 1$ to page 2. Then $z_{i2}$ is close to zero for $i = 3, \dots, k_1 + 1$, and the PageRank of page 2 will increase only if $(c/k_1)\, z_{22} > z_{12}$, which is not necessarily true, especially if $z_{12}$ and $k_1$ are considerably large. </li></ul>
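The effect of new links can be explored numerically on a small graph. The sketch below is an illustration under several assumptions, not the derivation from the slides: the criterion is taken to be $\frac{c}{k_1} \sum_{i} z_{ij} > z_{1j}$ for a page $j$ other than the linking page (with $z_{11} - 1$ on the right-hand side for the linking page itself), which is consistent with the special case $(c/k_1) z_{22} > z_{12}$ above; the helper names and the 5-page graph are hypothetical; pages are indexed from 0, so the linking page is page 0.

```python
def visits_matrix(P, c, terms=400):
    """Approximate Z = (I - cP)^{-1} by the truncated series sum_t (cP)^t."""
    n = len(P)
    Z = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in Z]
    for _ in range(terms):
        term = [[c * sum(term[i][k] * P[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]
        Z = [[Z[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return Z


def pagerank_from_visits(Z, c):
    """pi^T = (1-c)/n e^T Z, i.e. scaled column sums of Z."""
    n = len(Z)
    return [(1.0 - c) / n * sum(Z[i][j] for i in range(n)) for j in range(n)]


def add_links(P, page, new_targets):
    """Copy of P in which `page` also links to `new_targets`.

    Assumes `page` already has at least one link and that none of its
    existing links points to a page in `new_targets`.
    """
    old = [j for j, p in enumerate(P[page]) if p > 0]
    targets = set(old) | set(new_targets)
    Q = [row[:] for row in P]
    Q[page] = [1.0 / len(targets) if j in targets else 0.0 for j in range(len(P))]
    return Q


c = 0.85
# Hypothetical graph: 0 -> 1;  1 -> 2;  2 -> 0, 3;  3 -> 4;  4 -> 0.
P = [[0.0, 1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.0, 0.5, 0.0],
     [0.0, 0.0, 0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0, 0.0, 0.0]]
page, new_targets = 0, [3]
k1 = len(new_targets)

Z = visits_matrix(P, c)
before = pagerank_from_visits(Z, c)
after = pagerank_from_visits(visits_matrix(add_links(P, page, new_targets), c), c)

rows = []
for j in range(len(P)):
    own = 1.0 if j == page else 0.0  # the "- 1" term applies to the linking page only
    criterion = (c / k1) * sum(Z[i][j] for i in new_targets) - Z[page][j] + own
    rows.append((j, after[j] > before[j], criterion > 0))
    print(j, before[j], after[j], criterion)
```

In this example the criterion uses only the matrix $Z$ of the original graph, yet its sign agrees with the direct before-and-after comparison for every page.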
  19. 20. Asymptotic analysis <ul><li>Let $\tau_j$ be the stopping time of the first visit to the state $j$ </li></ul><ul><li>Let $m_{ij} = E(\tau_j \mid X_0 = i)$ be the average time needed to reach $j$ starting from $i$ (the mean first passage time) </li></ul>
  20. 21. Optimal Linking Strategy <ul><li>Consider a page $i = 1, \dots, n$ and assume that $i$ has links to pages $i_1, \dots, i_k$ distinct from $i$. Further, let $m_{ij}(c)$ be the mean first passage time from page $i$ to page $j$ for the Google transition matrix $G$ with parameter $c$. </li></ul>
  21. 22. <ul><li>Outgoing links from $i$ do not affect $m_{ji}(c)$ for any $j \neq i$. Thus, by linking from $i$ to $j$, one can only alter $k$. This means that the owner of page $i$ has very little control over its PageRank. The best that the owner can do is to link to only one page $j^*$ such that $m_{j^* i}(c) = \min_j m_{ji}(c)$. </li></ul>Note that (surprisingly) the PageRank of $j^*$ plays no role here.
  22. 23. <ul><li>Theorem. The optimal linking strategy for a Web page is to have only one outgoing link pointing to a Web page with a shortest mean first passage time back to the original page. </li></ul>
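The theorem can be illustrated on a toy graph. The sketch below rests on the standard fact that, for the irreducible chain $G$, the stationary probability of page $i$ equals the reciprocal of its mean return time, so with a single outgoing link to $j$ the return time is $1 + c\, m_{ji}(c) + \frac{1-c}{n} \sum_{l \neq i} m_{li}(c)$ and only the term $m_{ji}(c)$ depends on the choice of $j$. The function names, the 5-page graph, and the fixed-point iteration used to obtain the passage times are assumptions made for the example; pages are indexed from 0.

```python
def google_matrix(P, c):
    """G = cP + (1-c)(1/n) e e^T."""
    n = len(P)
    return [[c * P[i][j] + (1.0 - c) / n for j in range(n)] for i in range(n)]


def mean_first_passage_to(G, target, sweeps=3000):
    """Fixed-point iteration for m_j = 1 + sum_{k != target} G[j][k] * m_k.

    Returns m with m[j] the mean first passage time from j to `target`;
    m[target] is the mean return time to `target`.
    """
    n = len(G)
    m = [0.0] * n
    for _ in range(sweeps):
        m = [1.0 + sum(G[j][k] * m[k] for k in range(n) if k != target)
             for j in range(n)]
    return m


def stationary(G, iters=2000):
    """Power iteration for the stationary distribution of G."""
    n = len(G)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * G[i][j] for i in range(n)) for j in range(n)]
    return pi


c = 0.85
# Hypothetical graph: 0 -> 1;  1 -> 2;  2 -> 0, 3;  3 -> 4;  4 -> 0.
P = [[0.0, 1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.0, 0.5, 0.0],
     [0.0, 0.0, 0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0, 0.0, 0.0]]
page = 0
candidates = [1, 2, 3, 4]

# Passage times into `page` do not depend on the outgoing links of `page`.
m = mean_first_passage_to(google_matrix(P, c), page)

# PageRank of `page` when its only outgoing link points to candidate j.
rank_if_linked_to = {}
for j in candidates:
    Q = [row[:] for row in P]
    Q[page] = [1.0 if k == j else 0.0 for k in range(len(P))]
    rank_if_linked_to[j] = stationary(google_matrix(Q, c))[page]

best_by_pagerank = max(candidates, key=rank_if_linked_to.get)
best_by_passage = min(candidates, key=lambda j: m[j])
print(best_by_pagerank, best_by_passage)
```

In this graph page 4 links straight back to page 0, so it has the shortest mean first passage time to page 0, and linking only to it gives page 0 its largest PageRank among the four single-link choices.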
  23. 24. Conclusions <ul><li>Our main conclusion is that a Web page cannot significantly manipulate its PageRank by changing its outgoing links. </li></ul><ul><li>Furthermore, keeping a logical hyperlink structure and linking to a relevant Web community is the most sensible and rewarding policy. </li></ul>