Upcoming SlideShare
×

# Pagerank

• 89 views

More in: Technology
• Comment goes here.
Are you sure you want to
Be the first to comment
Be the first to like this

Total Views
89
On Slideshare
0
From Embeds
0
Number of Embeds
0

Shares
0
0
Likes
0

No embeds

### Report content

No notes for slide

### Transcript

• 1. The PageRank Citation Ranking: Bringing Order to the Web Larry Page etc. Stanford University Presented by Guoqiang Su & Wei Li
• 2. Contents
• Motivation
• Related work
• Page Rank & Random Surfer Model
• Implementation
• Application
• Conclusion
• 3. Motivation
• Web: heterogeneous and unstructured
• Free of quality control on the web
• Commercial interest to manipulate ranking
• 4. Related Work
• Clustering methods of link structure
• Hubs & Authorities Model
• Link Structure of the Web
• Approximation of importance / quality
• 6. PageRank
• Pages with lots of backlinks are important
• Backlinks coming from important pages convey more importance to a page
• Problem: Rank Sink
• 7. Rank Sink
• Page cycles pointed by some incoming link
• Problem: this loop will accumulate rank but never distribute any rank outside
• 8. Escape Term
• Solution: Rank Source
• c is maximized and = 1
• E(u) is some vector over the web pages
• – uniform, favorite page etc.
• 9. Matrix Notation
• R is the dominant eigenvector and c is the dominant eigenvalue of because c is maximized
• 10. Computing PageRank
• - initialize vector over web pages
• loop:
• - new ranks sum of normalized backlink ranks
• - compute normalizing factor
• - control parameter
• while - stop when converged
• 11. Random Surfer Model
• Page Rank corresponds to the probability distribution of a random walk on the web graphs
• E(u) can be re-phrased as the random surfer gets bored periodically and jumps to a different page and not kept in a loop forever
• 12. Implementation
• Computing resources
• — 24 million pages
• — 75 million URLs
• Memory and disk storage
• Weight Vector
• (4 byte float)
• Matrix A
• (linear access)
• 13. Implementation (Con't)
• Unique integer ID for each URL
• Sort and Remove dangling links
• Rank initial assignment
• Iteration until convergence
• 14. Convergence Properties
• Graph (V, E) is an expander with factor  if for all (not too large) subsets S: |As|   |s|
• Eigenvalue separation: Largest eigenvalue is sufficiently larger than the second-largest eigenvalue
• Random walk converges fast to a limiting probability distribution on a set of nodes in the graph.
• 15. Convergence Properties (con't)
• PageRank computation is O(log(|V|)) due to rapidly mixing graph G of the web.
• 16. Personalized PageRank
• Rank Source E can be initialized :
• – uniformly over all pages: e.g. copyright
• warnings, disclaimers, mailing lists archives
•  result in overly high ranking
• – total weight on a single page, e.g . Netscape, McCarthy
•  great variation of ranks under different single pages as rank source
• – and everything in-between, e.g. server root pages
•  allow manipulation by commercial interests
• 17. Applications I
• Estimate web traffic
• – Server/page aliases
• – Link/traffic disparity, e.g. porn sites, free web-mail
• – Citation counts have been used to predict future citations
• – very difficult to map the citation structure of the web completely
• – avoid the local maxima that citation counts get stuck in and get better performance
• 18. Applications II - Ranking Proxy
• Annotating links by PageRank (bar graph)
• Not query dependent
• 19. Issues
• Users are no random walkers
• – Content based methods
• Starting point distribution
• – Actual usage data as starting vector
• Reinforcing effects/bias towards main pages
• How about traffic to ranking pages?
• No query specific rank