
# Hui Xie 591r_presentation

Published in: Technology, Education

1. The Effect of New Links on Google PageRank, by Hui Xie (Apr , 07)
2. Computing PageRank
   - Matrix representation
   - Let $P$ be an $n \times n$ matrix and $p_{ij}$ be the entry in the $i$-th row and $j$-th column.
   - If page $i$ has $k > 0$ outgoing links:
     - $p_{ij} = 1/k$ if page $i$ has a link to page $j$;
     - $p_{ij} = 0$ if there is no link from $i$ to $j$.
   - If page $i$ has no outgoing links:
     - $p_{ij} = 1/n$ for $j = 1, \dots, n$.
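A small sketch of the construction of $P$ above. It assumes a hypothetical helper `hyperlink_matrix` and a made-up 4-page link structure. Pages are indexed from 0 rather than 1. Python's `fractions` module keeps the entries exact.

```python
from fractions import Fraction


def hyperlink_matrix(out_links, n):
    """Build the n x n hyperlink matrix P following the rules above.

    out_links maps a 0-based page index to the pages it links to.
    Pages absent from the dict (or with an empty list) have no outgoing links.
    """
    P = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        targets = set(out_links.get(i, []))   # duplicate links count once
        if targets:
            k = len(targets)
            for j in targets:
                P[i][j] = Fraction(1, k)      # p_ij = 1/k for each of the k outgoing links
        else:
            for j in range(n):
                P[i][j] = Fraction(1, n)      # no outgoing links: p_ij = 1/n
    return P


# Hypothetical 4-page web; page 3 has no outgoing links.
links = {0: [1, 2], 1: [2], 2: [0]}
P = hyperlink_matrix(links, 4)
for row in P:
    print(" ".join(str(x) for x in row))
```

This prints the rows `0 1/2 1/2 0`, `0 0 1 0`, `1 0 0 0` and `1/4 1/4 1/4 1/4`. Every row sums to 1, so $P$ is stochastic even with a dangling page.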
3. Google matrix
   - $G = cP + (1-c)\frac{1}{n}ee^T$, where $e = (1, \dots, 1)^T$ and $c \in (0, 1)$ is the damping factor.
   - $G$ is a stochastic matrix: $Ge = e$.
   - There exists a unique column vector $\pi$ such that $\pi^T G = \pi^T$ and $\pi^T e = 1$.
   - Because $\pi^T e = 1$, the first equation reads $c\pi^T P + \frac{1-c}{n}e^T = \pi^T$, so $\pi^T = \frac{1-c}{n} e^T (I - cP)^{-1}$.
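The two descriptions of $\pi$ can be checked against each other numerically. The first is the fixed point of $\pi^T G = \pi^T$, found by power iteration. The second is the closed form $\pi^T = \frac{1-c}{n}e^T(I - cP)^{-1}$, found by solving a linear system. The sketch below does both in plain Python. The helper names and the 4-page example are hypothetical, and $c = 0.85$ is assumed.

```python
def hyperlink_matrix(out_links, n):
    """Float version of P; pages without outgoing links get a uniform row."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        targets = set(out_links.get(i, []))
        for j in (targets if targets else range(n)):
            P[i][j] = 1.0 / (len(targets) if targets else n)
    return P


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def pagerank_power(P, c=0.85, tol=1e-13, max_iter=10_000):
    """Iterate pi^T <- pi^T G = c pi^T P + (1 - c)/n e^T (uses pi^T e = 1)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1 - c) / n + c * sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        done = sum(abs(a - b) for a, b in zip(new, pi)) < tol
        pi = new
        if done:
            break
    return pi


def pagerank_closed_form(P, c=0.85):
    """Solve pi^T (I - cP) = (1 - c)/n e^T, i.e. (I - cP)^T pi = (1 - c)/n e."""
    n = len(P)
    A = [[(1.0 if i == j else 0.0) - c * P[j][i] for j in range(n)] for i in range(n)]
    return solve(A, [(1 - c) / n] * n)


links = {0: [1, 2], 1: [2], 2: [0]}   # hypothetical; page 3 has no outgoing links
P = hyperlink_matrix(links, 4)
print([round(x, 6) for x in pagerank_power(P)])
print([round(x, 6) for x in pagerank_closed_form(P)])
```

The two printed vectors agree. Power iteration only needs vector-matrix products, and the error shrinks roughly by a factor $c$ per step. The closed form is the expression the later slides build on.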
4. Discrete Time Markov Chains
   - A sequence of random variables $\{X_n\}$ is called a Markov chain if it has the Markov property: $P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \dots, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i)$.
   - States are usually labeled $\{0, 1, 2, \dots\}$ (or starting from 1).
   - The state space can be finite or infinite.
5. Transition Probability
   - $p_{ij}$: the probability of jumping from state $i$ to state $j$.
   - Assume the chain is stationary (time-homogeneous): $p_{ij}$ does not depend on time.
   - Transition probability matrix: $P = (p_{ij})$.
   - Two-state MC:
6. Side Topic: Markov Chains
   - A discrete-time stochastic process is a sequence of random variables $\{X_0, X_1, \dots, X_n, \dots\}$, where $0, 1, \dots, n, \dots$ are discrete points in time.
   - A Markov chain is a discrete-time stochastic process defined over a finite (or countably infinite) set of states $S$ in terms of a matrix $P$ of transition probabilities.
   - Memorylessness property: for a Markov chain, $\Pr[X_{t+1} = j \mid X_0 = i_0, X_1 = i_1, \dots, X_t = i] = \Pr[X_{t+1} = j \mid X_t = i]$.
7. Side Topic: Markov Chains
   - Let $\pi_i(t)$ be the probability of being in state $i$ at time step $t$.
   - Let $\pi(t) = [\pi_0(t), \pi_1(t), \dots]$ be the vector of probabilities at time $t$.
   - For an initial probability distribution $\pi(0)$, the probabilities at time $n$ are $\pi(n) = \pi(0) P^n$.
   - A probability distribution $\pi$ is stationary if $\pi = \pi P$.
   - $P(X_{m+n} = j \mid X_m = i) = P(X_n = j \mid X_0 = i) = P^n(i, j)$.
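The recursion $\pi(n) = \pi(0)P^n$ and the stationarity condition $\pi = \pi P$ can be seen on a generic two-state chain. The switching probabilities `a` and `b` below are made-up values, not the two-state chain from the slides. For this chain the stationary distribution is $(b/(a+b),\ a/(a+b))$.

```python
def step(dist, P):
    """One step of pi(t + 1) = pi(t) P, with dist a row vector."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def distribution_at(dist0, P, steps):
    """pi(n) = pi(0) P^n, computed by repeated vector-matrix products."""
    dist = list(dist0)
    for _ in range(steps):
        dist = step(dist, P)
    return dist


# Hypothetical two-state chain: leave state 0 w.p. a, leave state 1 w.p. b.
a, b = 0.3, 0.1
P = [[1 - a, a],
     [b, 1 - b]]

pi_100 = distribution_at([1.0, 0.0], P, 100)   # start in state 0
stationary = [b / (a + b), a / (a + b)]        # solves pi = pi P for this chain
print(pi_100, stationary)
```

The distance to the stationary vector shrinks by the second eigenvalue $1 - a - b = 0.6$ at each step. After 100 steps the two printed vectors agree to machine precision.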
8. Absorbing Markov chain
   - Define a discrete-time absorbing Markov chain $\{X_t, t = 0, 1, \dots\}$ with state space $\{0, 1, \dots, n\}$,
   - where transitions between the states $1, \dots, n$ are governed by the matrix $cP$, and state 0 is absorbing.
   - The transition matrix is $\begin{pmatrix} 1 & 0^T \\ (1-c)e & cP \end{pmatrix}$: each row of $cP$ sums to $c$, and the remaining probability $1 - c$ leads to the absorbing state 0.
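The block transition matrix of the absorbing chain can be assembled directly from $P$. The sketch below makes a few assumptions. The helper `absorbing_chain` and the 4-page web are hypothetical. Web page $i$ (0-based) becomes chain state $i + 1$, and state 0 is the absorbing state.

```python
def hyperlink_matrix(out_links, n):
    """Float version of P; pages without outgoing links get a uniform row."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        targets = set(out_links.get(i, []))
        for j in (targets if targets else range(n)):
            P[i][j] = 1.0 / (len(targets) if targets else n)
    return P


def absorbing_chain(P, c=0.85):
    """Transition matrix on states {0, 1, ..., n}; state 0 is absorbing.

    Row 0 is (1, 0, ..., 0); row i >= 1 is (1 - c, c * P[i - 1]).
    """
    n = len(P)
    T = [[1.0] + [0.0] * n]
    for row in P:
        T.append([1 - c] + [c * p for p in row])
    return T


links = {0: [1, 2], 1: [2], 2: [0]}   # hypothetical; page 3 has no outgoing links
T = absorbing_chain(hyperlink_matrix(links, 4))
for row in T:
    print([round(x, 4) for x in row])
```

Every row of `T` sums to 1, so it is a proper transition matrix. Restricting it to states $1, \dots, n$ leaves exactly the substochastic block $cP$.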
9. Random walk interpretation
   - The walk starts at a uniformly chosen web page.
   - At each step, if currently at page $p$:
     - with probability $\alpha$, go to a uniformly chosen out-neighbor of $p$;
     - with probability $1 - \alpha$, stop.
   - Here $\alpha$ plays the role of the damping factor $c$.
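The random walk can be simulated to estimate PageRank. With a uniform start, $E(N_j) = \frac{1}{n}\sum_i z_{ij}$, where $N_j$ counts visits to $j$ and $z_{ij} = [(I - cP)^{-1}]_{ij}$. Combined with $\pi^T = \frac{1-c}{n}e^T(I - cP)^{-1}$, this gives $\pi_j = (1-c)\,E(N_j)$.

The sketch below assumes $\alpha = c = 0.85$, a fixed random seed and a made-up 4-page web. As in $P$, a page without outgoing links jumps to a uniformly chosen page. The function names are hypothetical.

```python
import random


def hyperlink_matrix(out_links, n):
    """Float version of P; pages without outgoing links get a uniform row."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        targets = set(out_links.get(i, []))
        for j in (targets if targets else range(n)):
            P[i][j] = 1.0 / (len(targets) if targets else n)
    return P


def pagerank_power(P, c=0.85, iters=300):
    """Reference values: iterate pi <- c pi P + (1 - c)/n."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [(1 - c) / n + c * sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def monte_carlo_pagerank(out_links, n, c=0.85, walks=100_000, seed=1):
    """Estimate PageRank by simulating the stopping random walk.

    Assumes each list in out_links contains distinct targets.
    """
    rng = random.Random(seed)
    visits = [0] * n
    for _ in range(walks):
        page = rng.randrange(n)                 # start at a uniformly chosen page
        while True:
            visits[page] += 1                   # count every visit, including t = 0
            if rng.random() >= c:               # with probability 1 - c: stop
                break
            targets = out_links.get(page, [])
            # with probability c: follow a uniformly chosen outlink
            # (a page without outlinks jumps to a uniformly chosen page, as in P)
            page = rng.choice(targets) if targets else rng.randrange(n)
    # pi_j = (1 - c) * E(N_j) for a uniform start
    return [(1 - c) * v / walks for v in visits]


links = {0: [1, 2], 1: [2], 2: [0]}   # hypothetical; page 3 has no outgoing links
estimate = monte_carlo_pagerank(links, 4)
exact = pagerank_power(hyperlink_matrix(links, 4))
print([round(x, 3) for x in estimate])
print([round(x, 3) for x in exact])
```

Each walk lasts $1/(1-c) \approx 6.7$ steps on average. With $10^5$ walks the estimate typically lands within a few thousandths of the exact vector.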
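10.
    - Let $N_j$ be the total number of visits to state $j$ before absorption, including the visit at time $t = 0$ if $X_0 = j$. Formally, $N_j = \sum_{t=0}^{\infty} \mathbf{1}\{X_t = j\}$.
    - Then $z_{ij} = [(I - cP)^{-1}]_{ij} = E(N_j \mid X_0 = i)$, since $(I - cP)^{-1} = \sum_{t \ge 0} (cP)^t$ and $[(cP)^t]_{ij} = P(X_t = j \mid X_0 = i)$.
    - Let $q_{ij}$ be the probability of reaching state $j$ before absorption if the initial state is $i$ (for $i = j$, the probability of returning to $j$). Then we have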
11. Theorem. Let $X$ denote a Markov chain with state space $E$. The distribution of the total number of visits to a state $j \in E$, given that the chain starts in state $i$, is
    - $P(N_j = m \mid X_0 = j) = q_{jj}^{m-1}(1 - q_{jj})$ for $m \ge 1$,
    - and, for $i \neq j$, $P(N_j = m \mid X_0 = i) = 1 - q_{ij}$ if $m = 0$, and $P(N_j = m \mid X_0 = i) = q_{ij}\, q_{jj}^{m-1}(1 - q_{jj})$ if $m \ge 1$.
    - Corollary. For all $i, j \in E$ the relations $z_{jj} = (1 - q_{jj})^{-1}$ and $z_{ij} = q_{ij} z_{jj}$ hold.
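The corollary can be checked numerically by computing $z_{ij}$ and $q_{ij}$ independently. The sketch below inverts $I - cP$ by Gauss-Jordan elimination. It gets $q_{ij}$ from the first-step equations $h_i = c\sum_k p_{ik} h_k$ ($i \neq j$, $h_j = 1$), solved by fixed-point iteration. The return probability is $q_{jj} = c\sum_k p_{jk} h_k$. The example web, $c = 0.85$ and the helper names are all hypothetical.

```python
def hyperlink_matrix(out_links, n):
    """Float version of P; pages without outgoing links get a uniform row."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        targets = set(out_links.get(i, []))
        for j in (targets if targets else range(n)):
            P[i][j] = 1.0 / (len(targets) if targets else n)
    return P


def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def reach_probabilities(P, j, c=0.85, iters=500):
    """h[i] = probability of reaching j before absorption from i (h[j] = 1)."""
    n = len(P)
    h = [1.0 if i == j else 0.0 for i in range(n)]
    for _ in range(iters):   # contraction with factor c, so this converges
        h = [1.0 if i == j else c * sum(P[i][k] * h[k] for k in range(n)) for i in range(n)]
    return h


def return_probability(P, j, c=0.85):
    """q_jj: take one step from j, then reach j again before absorption."""
    h = reach_probabilities(P, j, c)
    return c * sum(P[j][k] * h[k] for k in range(len(P)))


c = 0.85
links = {0: [1, 2], 1: [2], 2: [0]}   # hypothetical; page 3 has no outgoing links
P = hyperlink_matrix(links, 4)
n = len(P)
Z = inverse([[(1.0 if i == j else 0.0) - c * P[i][j] for j in range(n)] for i in range(n)])

for j in range(n):
    q_jj = return_probability(P, j, c)
    print(f"z_{j}{j} = {Z[j][j]:.6f}, 1/(1 - q_{j}{j}) = {1 / (1 - q_jj):.6f}")
```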
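12. Outgoing links from $i$ do not affect $q_{ji}$ for any $j \neq i$, because a path from $j$ that reaches $i$ for the first time never uses the links of $i$. Since $z_{ji} = q_{ji} z_{ii}$, the PageRank of $i$ is $\pi_i = \frac{1-c}{n}\sum_j z_{ji} = \frac{1-c}{n}\big(1 + \sum_{j \neq i} q_{ji}\big) z_{ii}$. So by changing its outgoing links, a page can control its PageRank only up to multiplication by the factor $z_{ii} = 1/(1 - q_{ii})$. Without self-links, a return to $i$ takes at least two steps, each continued with probability $c$, so $0 \le q_{ii} \le c^2$ and $1 \le z_{ii} \le (1 - c^2)^{-1} \approx 3.6$ for $c = 0.85$.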
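A two-page cycle shows that the upper bound $q_{ii} = c^2$ is attained. The example assumes page 0 links only to page 1 and page 1 only to page 0. Then $I - cP$ has the explicit inverse used below. The same numbers also check the formula $\pi_i = \frac{1-c}{n}(1 + \sum_{j \ne i} q_{ji})z_{ii}$.

```python
c = 0.85

# Hypothetical two-page web: page 0 links only to page 1, page 1 only to page 0.
# Then I - cP = [[1, -c], [-c, 1]], whose inverse is [[1, c], [c, 1]] / (1 - c^2).
det = 1 - c * c
Z = [[1 / det, c / det],
     [c / det, 1 / det]]

z_00 = Z[0][0]
q_00 = 1 - 1 / z_00          # from z_ii = 1/(1 - q_ii): the return probability, here c^2
q_10 = Z[1][0] / Z[0][0]     # from z_ji = q_ji z_ii: here c
pi_0 = (1 - c) / 2 * (1 + q_10) * z_00   # pi_i = (1-c)/n (1 + sum_{j != i} q_ji) z_ii

print(round(z_00, 4), round(q_00, 4), round(q_10, 4), round(pi_0, 4))
# prints: 3.6036 0.7225 0.85 0.5
```

Here $z_{00}$ reaches its maximum $1/(1-c^2)$. By symmetry each page still ends up with PageRank $1/2$.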
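13. Rank-one update of Google PageRank
    - Page 1, with $k_0$ old links, creates $k_1$ new links to pages $2, \dots, k_1 + 1$.
    - Let $k = k_0 + k_1$, and let $p_1^T$ be the first row of the matrix $P$.
    - Updated hyperlink matrix: $\tilde P = P + e_1 (\tilde p_1 - p_1)^T$ with $\tilde p_1^T = \frac{k_0}{k} p_1^T + \frac{1}{k}\sum_{j=2}^{k_1+1} e_j^T$, where $e_j$ denotes the $j$-th unit vector.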
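Because only one row of $P$ changes, the new PageRank can be obtained without re-inverting $I - cP$. This sketch uses the Sherman-Morrison formula. The slides do not show which update formula they use, so treat this as one standard choice.

Let $d = \tilde p_1 - p_1$ and $Z = (I - cP)^{-1}$. Then $I - c\tilde P = (I - cP) - c\,e_1 d^T$, and Sherman-Morrison gives $\tilde\pi^T = \pi^T + c\,\pi_1\, d^T Z / (1 - c\, d^T Z e_1)$.

The example web, $c = 0.85$ and the helper names are hypothetical. Page indices are 0-based, so the slide's page 1 is index 0. The new targets are assumed not to be among the old ones.

```python
def hyperlink_matrix(out_links, n):
    """Float version of P; pages without outgoing links get a uniform row."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        targets = set(out_links.get(i, []))
        for j in (targets if targets else range(n)):
            P[i][j] = 1.0 / (len(targets) if targets else n)
    return P


def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def fundamental(P, c):
    """Z = (I - cP)^{-1}."""
    n = len(P)
    return inverse([[(1.0 if i == j else 0.0) - c * P[i][j] for j in range(n)] for i in range(n)])


def pagerank_from_Z(Z, c):
    """pi^T = (1 - c)/n e^T Z, i.e. pi_j = (1 - c)/n times column sum j of Z."""
    n = len(Z)
    return [(1 - c) / n * sum(Z[i][j] for i in range(n)) for j in range(n)]


def add_links_rank_one(P, Z, pi, page, k0, new_targets, c):
    """PageRank after `page` (k0 old links) adds links to new_targets, via Sherman-Morrison."""
    n = len(P)
    k = k0 + len(new_targets)
    new_row = [k0 / k * p for p in P[page]]          # (k0/k) p^T
    for j in new_targets:
        new_row[j] += 1 / k                           # + (1/k) e_j^T
    d = [a - b for a, b in zip(new_row, P[page])]
    dZ = [sum(d[i] * Z[i][j] for i in range(n)) for j in range(n)]   # d^T Z
    scale = c * pi[page] / (1 - c * dZ[page])
    return [pi[j] + scale * dZ[j] for j in range(n)]


c = 0.85
n = 7
# Hypothetical web: cycle 0 -> 2 -> 1 -> 0 and a separate cycle 3 -> 4 -> 5 -> 6 -> 3.
old = {0: [2], 1: [0], 2: [1], 3: [4], 4: [5], 5: [6], 6: [3]}
P = hyperlink_matrix(old, n)
Z = fundamental(P, c)
pi = pagerank_from_Z(Z, c)

# Page 0 keeps its k0 = 1 old link and adds new links to pages 1 and 3.
pi_update = add_links_rank_one(P, Z, pi, page=0, k0=1, new_targets=[1, 3], c=c)

new = dict(old)
new[0] = old[0] + [1, 3]
pi_direct = pagerank_from_Z(fundamental(hyperlink_matrix(new, n), c), c)
print([round(x, 4) for x in pi_update])
print([round(x, 4) for x in pi_direct])
```

The update costs a few vector-matrix products once $Z$ is known. Both printed vectors should coincide.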
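14. According to (9), the ranking of page 1 increases when $\frac{1}{k_1}\sum_{i=2}^{k_1+1} z_{i1} > p_1^T Z e_1$, where $Z = (I - cP)^{-1}$. Since $z_{11} = 1/(1 - q_{11})$, $z_{i1} = q_{i1} z_{11}$ for $i > 1$, and $p_1^T Z e_1 = (z_{11} - 1)/c$ (from $Z = I + cPZ$), the above is equivalent to $\frac{1}{k_1}\sum_{i=2}^{k_1+1} q_{i1} > \frac{q_{11}}{c}$.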
15. Hence, page 1 increases its ranking when it refers to pages that are characterized by a high value of $q_{i1}$. These must be pages that refer to page 1 or at least belong to the same Web community. Here, by a Web community we mean a set of Web pages that a surfer can reach from one another in a relatively small number of steps.
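The community argument can be tested on a small example. Under the rank-one update, a short derivation gives a condition for page 1's PageRank to increase: $\frac{1}{k_1}\sum_{i=2}^{k_1+1} q_{i1} > q_{11}/c$. It uses $z_{i1} = q_{i1}z_{11}$ and $z_{11} = 1 + c\,p_1^T Z e_1$. The sketch compares this condition with a direct recomputation of PageRank.

The web is hypothetical. Pages 0, 2 and 1 form a 3-cycle (the "community"), and pages 3 to 6 form an unrelated 4-cycle. Page 0 plays the role of the slide's page 1, and $c = 0.85$ is assumed.

```python
def hyperlink_matrix(out_links, n):
    """Float version of P; pages without outgoing links get a uniform row."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        targets = set(out_links.get(i, []))
        for j in (targets if targets else range(n)):
            P[i][j] = 1.0 / (len(targets) if targets else n)
    return P


def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def fundamental(out_links, n, c):
    """Z = (I - cP)^{-1} for the web described by out_links."""
    P = hyperlink_matrix(out_links, n)
    return inverse([[(1.0 if i == j else 0.0) - c * P[i][j] for j in range(n)] for i in range(n)])


def pagerank_from_Z(Z, c):
    n = len(Z)
    return [(1 - c) / n * sum(Z[i][j] for i in range(n)) for j in range(n)]


c = 0.85
n = 7
old = {0: [2], 1: [0], 2: [1], 3: [4], 4: [5], 5: [6], 6: [3]}
Z = fundamental(old, n, c)
pi_old = pagerank_from_Z(Z, c)
q_00 = 1 - 1 / Z[0][0]                      # z_11 = 1/(1 - q_11), 0-based
q = [Z[i][0] / Z[0][0] for i in range(n)]   # q_i1 = z_i1 / z_11 (used for i != 0)


def predicts_increase(new_targets):
    """Derived condition: mean of q_i1 over the new targets > q_11 / c."""
    return sum(q[i] for i in new_targets) / len(new_targets) > q_00 / c


def actually_increases(new_targets):
    new = dict(old)
    new[0] = old[0] + list(new_targets)
    return pagerank_from_Z(fundamental(new, n, c), c)[0] > pi_old[0]


for targets in ([1], [3], [1, 3, 4, 5, 6]):
    print(targets, predicts_increase(targets), actually_increases(targets))
```

In these three cases the prediction and the recomputation agree. Only linking to page 1, which links straight back to page 0, raises page 0's PageRank. Links into the unrelated cycle lower it.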
17. The PageRank of page $j \neq 1$ increases if $\frac{c}{k_1}\sum_{i=2}^{k_1+1} z_{ij} > z_{1j}$. However, if several new links are added, the PageRank of page $j$ might actually decrease even if this page receives one of the new links. Such a situation occurs when most of the newly created links point to "irrelevant" pages.
18. For instance, let $j = 2$ and assume that there is no hyperlink path from pages $3, \dots, k_1 + 1$ to page 2. Then $z_{i2}$ is close to zero for $i = 3, \dots, k_1 + 1$, and the PageRank of page 2 will increase only if $(c/k_1)\, z_{22} > z_{12}$. This is not necessarily true, especially if $z_{12}$ and $k_1$ are considerably large.
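The example can be made concrete. The same rank-one analysis gives a condition for a page $j \neq 1$: its PageRank rises exactly when $\frac{c}{k_1}\sum_{i=2}^{k_1+1} z_{ij} > z_{1j}$. The sketch checks this on a hypothetical web. Pages 0, 2 and 1 form a 3-cycle, and pages 3 to 6 form a separate 4-cycle with no path to page 1. Page 0 stands for the slide's page 1 and page 1 for the slide's page 2; $c = 0.85$ is assumed.

```python
def hyperlink_matrix(out_links, n):
    """Float version of P; pages without outgoing links get a uniform row."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        targets = set(out_links.get(i, []))
        for j in (targets if targets else range(n)):
            P[i][j] = 1.0 / (len(targets) if targets else n)
    return P


def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def fundamental(out_links, n, c):
    P = hyperlink_matrix(out_links, n)
    return inverse([[(1.0 if i == j else 0.0) - c * P[i][j] for j in range(n)] for i in range(n)])


def pagerank_from_Z(Z, c):
    n = len(Z)
    return [(1 - c) / n * sum(Z[i][j] for i in range(n)) for j in range(n)]


c = 0.85
n = 7
old = {0: [2], 1: [0], 2: [1], 3: [4], 4: [5], 5: [6], 6: [3]}
Z = fundamental(old, n, c)
pi_old = pagerank_from_Z(Z, c)


def predicts_increase_for(j, new_targets):
    """Derived condition (0-based, page 0 adds the links): (c/k1) sum_i z_ij > z_0j."""
    k1 = len(new_targets)
    return c / k1 * sum(Z[i][j] for i in new_targets) > Z[0][j]


def pagerank_after(new_targets):
    new = dict(old)
    new[0] = old[0] + list(new_targets)
    return pagerank_from_Z(fundamental(new, n, c), c)


for targets in ([1], [1, 3, 4, 5, 6]):
    print(targets, predicts_increase_for(1, targets), pagerank_after(targets)[1] > pi_old[1])
```

Page 1 receives a new link in both cases. Its PageRank rises when that is the only new link. It falls when four more links go to the unrelated cycle, matching the "irrelevant pages" effect described above.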
19. Asymptotic analysis
    - Let $\tau_j = \min\{t \ge 1 : X_t = j\}$ be the stopping time of the first visit to state $j$.
    - $m_{ij} = E(\tau_j \mid X_0 = i)$ is the average time needed to reach $j$ starting from $i$ (mean first passage time).
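For the Google matrix $G$, mean first passage times come from first-step equations. For $i \neq j$, $m_{ij} = 1 + \sum_{k \neq j} g_{ik} m_{kj}$. The mean return time is then $m_{jj} = 1 + \sum_{k \ne j} g_{jk} m_{kj}$.

The sketch solves these equations and checks the standard identity $\pi_j = 1/m_{jj}$ (Kac's formula), which links PageRank to mean return times. The example web, $c = 0.85$ and the helper names are hypothetical.

```python
def hyperlink_matrix(out_links, n):
    """Float version of P; pages without outgoing links get a uniform row."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        targets = set(out_links.get(i, []))
        for j in (targets if targets else range(n)):
            P[i][j] = 1.0 / (len(targets) if targets else n)
    return P


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def google_matrix(P, c=0.85):
    n = len(P)
    return [[c * P[i][j] + (1 - c) / n for j in range(n)] for i in range(n)]


def mean_first_passage(G, j):
    """m[i] = E(tau_j | X_0 = i) with tau_j = min{t >= 1 : X_t = j}."""
    n = len(G)
    others = [i for i in range(n) if i != j]
    A = [[(1.0 if a == b else 0.0) - G[a][b] for b in others] for a in others]
    x = solve(A, [1.0] * len(others))     # (I - G restricted to others) m = 1
    m = [0.0] * n
    for idx, i in enumerate(others):
        m[i] = x[idx]
    m[j] = 1 + sum(G[j][k] * m[k] for k in others)   # mean return time to j
    return m


def stationary(G):
    """Solve pi^T G = pi^T with pi^T e = 1 (last equation replaced by normalization)."""
    n = len(G)
    A = [[(1.0 if i == j else 0.0) - G[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    return solve(A, [0.0] * (n - 1) + [1.0])


c = 0.85
links = {0: [1, 2], 1: [2], 2: [0]}   # hypothetical; page 3 has no outgoing links
G = google_matrix(hyperlink_matrix(links, 4), c)
pi = stationary(G)
for i in range(4):
    print(i, round(pi[i], 6), round(1 / mean_first_passage(G, i)[i], 6))   # pi_i vs 1/m_ii
```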
20. Optimal Linking Strategy
    - Consider a page $i = 1, \dots, n$ and assume that $i$ has links to pages $i_1, \dots, i_k$ distinct from $i$. Further, let $m_{ij}(c)$ be the mean first passage time from page $i$ to page $j$ for the Google transition matrix $G$ with parameter $c$.
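21. Outgoing links from $i$ do not affect $m_{ji}(c)$ for any $j \neq i$. A first-step analysis gives
    $$m_{ii}(c) = 1 + \frac{c}{k}\sum_{l=1}^{k} m_{i_l i}(c) + \frac{1-c}{n}\sum_{j \neq i} m_{ji}(c),$$
    and $\pi_i = 1/m_{ii}(c)$. Thus, by linking from $i$ to $j$, one can only alter $k$ and which of the fixed values $m_{ji}(c)$ enter the average, so the owner of page $i$ has very little control over its PageRank. The best he can do is to link only to one page $j^*$ such that $m_{j^* i}(c) = \min_{j \neq i} m_{ji}(c)$. Note that (surprisingly) the PageRank of $j^*$ plays no role here.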
22. Theorem. The optimal linking strategy for a Web page is to have only one outgoing link, pointing to a Web page with a shortest mean first passage time back to the original page.
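The theorem can be checked by brute force on a tiny web. The sketch tries every nonempty set of outgoing links for page 0 to pages 1 to 4, and records page 0's PageRank for each. It then compares the best result with the single link to $j^* = \arg\min_{j \ne 0} m_{j0}(c)$.

The $m_{j0}(c)$ are computed once, with page 0 left without links. This is valid because these values do not depend on page 0's own row. The web, $c = 0.85$ and all helper names are hypothetical, and sets that include a self-link are not considered.

```python
from itertools import combinations


def hyperlink_matrix(out_links, n):
    """Float version of P; pages without outgoing links get a uniform row."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        targets = set(out_links.get(i, []))
        for j in (targets if targets else range(n)):
            P[i][j] = 1.0 / (len(targets) if targets else n)
    return P


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def pagerank(out_links, n, c=0.85):
    """Closed form: (I - cP)^T pi = (1 - c)/n e."""
    P = hyperlink_matrix(out_links, n)
    A = [[(1.0 if a == b else 0.0) - c * P[b][a] for b in range(n)] for a in range(n)]
    return solve(A, [(1 - c) / n] * n)


def passage_times_to(out_links, n, target, c=0.85):
    """m[j] = mean first passage time from j to target under G (for j != target)."""
    P = hyperlink_matrix(out_links, n)
    G = [[c * P[a][b] + (1 - c) / n for b in range(n)] for a in range(n)]
    others = [a for a in range(n) if a != target]
    A = [[(1.0 if a == b else 0.0) - G[a][b] for b in others] for a in others]
    x = solve(A, [1.0] * len(others))
    return {a: x[idx] for idx, a in enumerate(others)}


c = 0.85
n = 5
i = 0                                    # the page choosing its outgoing links
others_links = {1: [0, 2], 2: [3], 3: [4], 4: [1]}   # hypothetical rest of the web
candidates = [1, 2, 3, 4]


def rank_of_i(targets):
    links = dict(others_links)
    links[i] = list(targets)
    return pagerank(links, n, c)[i]


m = passage_times_to(others_links, n, i, c)   # independent of page i's own links
j_star = min(candidates, key=lambda j: m[j])
best = max((rank_of_i(s), s) for k in range(1, len(candidates) + 1)
           for s in combinations(candidates, k))
print("j* =", j_star, "single-link PageRank:", round(rank_of_i([j_star]), 6))
print("brute-force best:", best[1], round(best[0], 6))
```

The brute-force optimum coincides with the single link to $j^*$. As the slide notes, nothing in this choice depends on the PageRank of $j^*$ itself.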
23. Conclusions
    - Our main conclusion is that a Web page cannot significantly manipulate its PageRank by changing its outgoing links.
    - Furthermore, keeping a logical hyperlink structure and linking to a relevant Web community is the most sensible and rewarding policy.