Slide notes
  • Why does the Power Method work?
  • Assume that λ1 equals 1 and all other eigenvalues are strictly less than 1 in magnitude.
  • Note that for traditional problems, λ2 is often close to 1, so the power method is slow; in our case, however, λ2 is small.
  • Note: the derivation given here is slightly different from the one in the paper; the one here is perhaps more intuitive, while the one in the paper is more compact.
  • Transcript

    • 1. Extrapolation Methods for Accelerating PageRank Computations Sepandar D. Kamvar Taher H. Haveliwala Christopher D. Manning Gene H. Golub Stanford University
    • 2. Motivation
      • Problem:
        • Speed up PageRank
      • Motivation:
        • Personalization
        • “Freshness”
      Note: PageRank computations don’t get faster as computers do. [Illustration: a search for “Giants” returns “1. The Official Site of the San Francisco Giants” for one user and “1. The Official Site of the New York Giants” for another.]
    • 3. Outline
      • Definition of PageRank
      • Computation of PageRank
      • Convergence Properties
      • Outline of Our Approach
      • Empirical Results
    • 4. Link Counts: [Diagram: Sep’s Home Page linked by 2 important pages (Yahoo!, CNN); Taher’s Home Page linked by 2 unimportant pages (DB Pub Server, CS361)]
    • 5. Definition of PageRank
      • The importance of a page is given by the importance of the pages that link to it.
      x_i = Σ over pages j that link to i of x_j / N_j, where x_i is the importance of page i and N_j is the number of outlinks from page j.
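The update rule above can be sketched in a few lines of Python. The graph, page names, and uniform initialization below are made-up illustrations, not data from the talk:

```python
# Toy sketch of the PageRank rule: the importance of page i is the sum,
# over pages j linking to i, of (importance of j) / (outlinks of j).
# The graph here is hypothetical.
links = {
    "yahoo": ["cnn", "taher"],   # yahoo links to cnn and taher
    "cnn": ["yahoo"],
    "taher": ["yahoo"],
}
x = {p: 1.0 / len(links) for p in links}  # start with uniform importance

def update(x, links):
    new = {p: 0.0 for p in x}
    for j, outs in links.items():
        for i in outs:
            new[i] += x[j] / len(outs)    # j splits its importance evenly
    return new

x = update(x, links)
print(x)  # yahoo: 2/3, cnn: 1/6, taher: 1/6
```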
    • 6. Definition of PageRank: [Diagram: example graph over Yahoo!, CNN, DB Pub Server, Taher, and Sep, with link weights 1/2, 1/2, 1, 1 and page importances 0.1, 0.1, 0.1, 0.05, 0.25]
    • 7. PageRank Diagram: Initialize all nodes to rank 0.333.
    • 8. PageRank Diagram: Propagate ranks across links (multiplying by link weights).
    • 9. PageRank Diagram: [intermediate ranks from the diagram]
    • 10. PageRank Diagram: [intermediate ranks from the diagram]
    • 11. PageRank Diagram: [intermediate ranks from the diagram]
    • 12. PageRank Diagram: After a while, the ranks converge to 0.4, 0.4, 0.2.
    • 13. Computing PageRank
      • Initialize:
      • Repeat until convergence:
      x_i = Σ over pages j that link to i of x_j / N_j, where x_i is the importance of page i and N_j is the number of outlinks from page j.
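The full iteration can be sketched with NumPy. The 3-page graph below is hypothetical, chosen so that the ranks converge to the 0.4, 0.4, 0.2 shown on slide 12; this is an illustration, not the authors’ code:

```python
import numpy as np

# Repeat x <- P^T x until convergence (slide 13), for a made-up
# 3-page graph; row j of P gives page j's outlink probabilities.
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [1.0, 0.0, 0.0]])
x = np.full(3, 1.0 / 3.0)              # initialize uniformly
for _ in range(200):
    x_new = P.T @ x                    # propagate importance across links
    if np.abs(x_new - x).sum() < 1e-12:
        break
    x = x_new
print(x)  # converges to [0.4, 0.4, 0.2]
```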
    • 14. Matrix Notation: the update can be written as a matrix-vector product, x^(k+1) = P^T x^(k). [Slide shows a numeric example.]
    • 15. Matrix Notation: Find x that satisfies x = P^T x. [Slide shows the same numeric example.]
    • 16. Power Method
      • Initialize:
      • Repeat until convergence:
    • 17. A side note
      • PageRank doesn’t actually use P^T. Instead, it uses A = cP^T + (1−c)E^T.
      • So the PageRank problem is really: find x that satisfies x = Ax,
      • not: find x that satisfies x = P^T x.
    • 18. Power Method
      • And the algorithm is really . . .
      • Initialize:
      • Repeat until convergence:
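A sketch of the damped iteration from slides 17–18, applying A = cP^T + (1−c)E^T without ever forming the dense matrix A. The graph, the value c = 0.85, and the uniform teleport vector are illustrative assumptions:

```python
import numpy as np

# Power method on A = c*P^T + (1-c)*E^T (slides 17-18). When x sums
# to 1, the E^T x term reduces to a fixed teleport vector v, so A is
# never formed explicitly; only (sparse) products with P^T are needed.
c = 0.85                                # hypothetical damping factor
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [1.0, 0.0, 0.0]])
n = P.shape[0]
v = np.full(n, 1.0 / n)                 # uniform teleport distribution
x = v.copy()
for _ in range(500):
    x_new = c * (P.T @ x) + (1.0 - c) * v
    if np.abs(x_new - x).sum() < 1e-13:
        break
    x = x_new
print(x)  # the PageRank vector: a fixed point of x = A x
```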
    • 19. Outline
      • Definition of PageRank
      • Computation of PageRank
      • Convergence Properties
      • Outline of Our Approach
      • Empirical Results
    • 20. Power Method: Express x^(0) in terms of the eigenvectors of A: x^(0) = u1 + α2 u2 + α3 u3 + α4 u4 + α5 u5
    • 21. Power Method: x^(1) = u1 + α2 λ2 u2 + α3 λ3 u3 + α4 λ4 u4 + α5 λ5 u5
    • 22. Power Method: x^(2) = u1 + α2 λ2^2 u2 + α3 λ3^2 u3 + α4 λ4^2 u4 + α5 λ5^2 u5
    • 23. Power Method: x^(k) = u1 + α2 λ2^k u2 + α3 λ3^k u3 + α4 λ4^k u4 + α5 λ5^k u5
    • 24. Power Method: As k grows, the λ_i^k terms vanish, leaving x^(k) ≈ u1
    • 25. Why does it work?
      • Imagine our n × n matrix A has n distinct eigenvectors u_i.
      • Then, you can write any n-dimensional vector as a linear combination of the eigenvectors of A: x^(0) = u1 + α2 u2 + α3 u3 + α4 u4 + α5 u5
    • 26. Why does it work?
      • From the last slide: x^(0) = u1 + α2 u2 + α3 u3 + α4 u4 + α5 u5
      • To get the first iterate, multiply x^(0) by A.
      • The first eigenvalue is 1, so A u1 = u1.
      • Therefore: x^(1) = A x^(0) = u1 + α2 λ2 u2 + α3 λ3 u3 + α4 λ4 u4 + α5 λ5 u5, where all of λ2, …, λ5 are less than 1 in magnitude.
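The decay of the non-dominant components can be checked numerically. Below, a hypothetical diagonalizable A is constructed with eigenvalues 1, 0.5, 0.2 (made-up values, not the Google matrix), and the iterate’s coordinates in the eigenbasis are tracked across iterations:

```python
import numpy as np

# Build A = U diag(1, 0.5, 0.2) U^{-1}, so its eigenvectors are the
# columns of U. Starting from x^(0) = u1 + u2 + u3, each multiplication
# by A leaves the u1 component alone and shrinks u2, u3 by 0.5 and 0.2.
U = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
lam = np.array([1.0, 0.5, 0.2])
A = U @ np.diag(lam) @ np.linalg.inv(U)

x = U @ np.ones(3)                      # x^(0) = u1 + u2 + u3
for k in range(1, 4):
    x = A @ x
    coeffs = np.linalg.solve(U, x)      # coordinates in the eigenbasis
    print(k, coeffs)                    # -> [1.0, 0.5**k, 0.2**k]
```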
    • 27. Power Method: [Diagram repeats the eigenvector decomposition for x^(0), x^(1), and x^(2), showing the u2…u5 components shrinking by a factor of λ_i each iteration.]
    • 28. Convergence
      • The smaller λ2, the faster the convergence of the Power Method: after k iterations, the u_i components have shrunk by factors of λ_i^k.
    • 29. Our Approach: Estimate the components of the current iterate in the directions of the second and third eigenvectors, and eliminate them.
    • 30. Why this approach?
      • For traditional problems:
        • A is smaller, often dense.
        • λ2 is often close to λ1, making the power method slow.
      • In our problem,
        • A is huge and sparse
        • More importantly, λ2 is small.¹
      • Therefore, the Power Method is actually much faster than other methods.
      ¹ “The Second Eigenvalue of the Google Matrix” (dbpubs.stanford.edu/pub/2003-20)
    • 31. Using Successive Iterates: [Diagram: the iterate x^(0) shown against the eigenvectors u1…u5]
    • 32. Using Successive Iterates: [Diagram: iterates x^(0) and x^(1)]
    • 33. Using Successive Iterates: [Diagram: iterates x^(0), x^(1), x^(2)]
    • 34. Using Successive Iterates: [Diagram: iterates x^(0), x^(1), x^(2)]
    • 35. Using Successive Iterates: [Diagram: combining the iterates yields x’ = u1]
    • 36. How do we do this?
      • Assume x^(k) can be written as a linear combination of the first three eigenvectors (u1, u2, u3) of A.
      • Compute an approximation to the u2 and u3 components, and subtract it from x^(k) to get x^(k)’.
    • 37. Assume
      • Assume that x^(k) can be represented by the first 3 eigenvectors of A: x^(k) = u1 + α2 u2 + α3 u3
    • 38. Linear Combination
      • Let’s take some linear combination of these 3 iterates: β1 x^(k) + β2 x^(k+1) + β3 x^(k+2)
    • 39. Rearranging Terms
      • We can rearrange the terms to get:
      Goal: Find β1, β2, β3 so that the coefficients of u2 and u3 are 0, and the coefficient of u1 is 1.
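A toy version of this cancellation can be written down directly if the eigenvalues are known. The sketch below cheats on exactly the point the paper solves (it uses the true λ’s, which in practice must be estimated from the iterates themselves), so it shows only why suitable β’s recover u1; the matrix and eigenvalues 1, 0.5, 0.2 are made up:

```python
import numpy as np

# Toy illustration of slides 37-39. With x^(j) = u1 + a2*l2^j*u2 +
# a3*l3^j*u3, choosing beta_j so that sum_j beta_j * l_i^j equals 1
# for i = 1 and 0 for i = 2, 3 makes the combination of three
# successive iterates exactly u1.
U = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
lam = np.array([1.0, 0.5, 0.2])
A = U @ np.diag(lam) @ np.linalg.inv(U)

x0 = U @ np.ones(3)                     # x^(0) = u1 + u2 + u3
iters = [x0, A @ x0, A @ (A @ x0)]      # x^(0), x^(1), x^(2)

# Rows of V are [1, l_i, l_i^2]; solve V beta = (1, 0, 0).
V = np.vander(lam, 3, increasing=True)
beta = np.linalg.solve(V, np.array([1.0, 0.0, 0.0]))
x_star = sum(b * it for b, it in zip(beta, iters))
print(x_star)  # recovers u1 = [1, 1, 1] up to floating-point error
```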
    • 40. Summary
      • We make an assumption about the current iterate.
      • We solve for the dominant eigenvector as a linear combination of the next three iterates.
      • We use a few iterations of the Power Method to “clean it up”.
    • 41. Outline
      • Definition of PageRank
      • Computation of PageRank
      • Convergence Properties
      • Outline of Our Approach
      • Empirical Results
    • 42. Results: Quadratic Extrapolation speeds up convergence, even though extrapolation was applied only 5 times.
    • 43. Results: Extrapolation dramatically speeds up convergence for high values of c (c = 0.99).
    • 44. Take-home message
      • Speeds up PageRank by a fair amount, but not by enough for true Personalized PageRank.
      • Ideas are useful for further speedup algorithms.
      • Quadratic Extrapolation can be used for a whole class of problems.
    • 45. The End
      • Paper available at http://dbpubs.stanford.edu/pub/2003-16
