Your SlideShare is downloading. ×
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

# Pagerank (1)

165
views

Published on

Published in: Technology

0 Comments
0 Likes
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
Your message goes here
• Be the first to comment

• Be the first to like this

No Downloads
Views
Total Views
165
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
0
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

### Transcript

• 1. The PageRank Citation Ranking: Bringing Order to the Web Larry Page etc. Stanford University Presented by Guoqiang Su & Wei Li
• 2. Contents
• Motivation
• Related work
• Page Rank & Random Surfer Model
• Implementation
• Application
• Conclusion
• 3. Motivation
• Web: heterogeneous and unstructured
• Free of quality control on the web
• Commercial interest to manipulate ranking
• 4. Related Work
• Academic citation analysis
• Link-based analysis
• Clustering methods of link structure
• Hubs & Authorities Model
• 5. Backlink
• Link Structure of the Web
• Approximation of importance / quality
• 6. PageRank
• Pages with lots of backlinks are important
• Backlinks coming from important pages convey more importance to a page
• Problem: Rank Sink
• 7. Rank Sink
• Page cycles pointed by some incoming link
• Problem: this loop will accumulate rank but never distribute any rank outside
• 8. Escape Term
• Solution: Rank Source
• c is maximized and = 1
• E(u) is some vector over the web pages
• – uniform, favorite page etc.
• 9. Matrix Notation
• R is the dominant eigenvector and c is the dominant eigenvalue of because c is maximized
• 10. Computing PageRank
• - initialize vector over web pages
• loop:
• - new ranks sum of normalized backlink ranks
• - compute normalizing factor
• - add escape term
• - control parameter
• while - stop when converged
• 11. Random Surfer Model
• Page Rank corresponds to the probability distribution of a random walk on the web graphs
• E(u) can be re-phrased as the random surfer gets bored periodically and jumps to a different page and not kept in a loop forever
• 12. Implementation
• Computing resources
• — 24 million pages
• — 75 million URLs
• Memory and disk storage
• Weight Vector
• (4 byte float)
• Matrix A
• (linear access)
• 13. Implementation (Con't)
• Unique integer ID for each URL
• Sort and Remove dangling links
• Rank initial assignment
• Iteration until convergence
• Add back dangling links and Re-compute
• 14. Convergence Properties
• Graph (V, E) is an expander with factor  if for all (not too large) subsets S: |As|   |s|
• Eigenvalue separation: Largest eigenvalue is sufficiently larger than the second-largest eigenvalue
• Random walk converges fast to a limiting probability distribution on a set of nodes in the graph.
• 15. Convergence Properties (con't)
• PageRank computation is O(log(|V|)) due to rapidly mixing graph G of the web.
• 16. Personalized PageRank
• Rank Source E can be initialized :
• – uniformly over all pages: e.g. copyright
• warnings, disclaimers, mailing lists archives
•  result in overly high ranking
• – total weight on a single page, e.g . Netscape, McCarthy
•  great variation of ranks under different single pages as rank source
• – and everything in-between, e.g. server root pages
•  allow manipulation by commercial interests
• 17. Applications I
• Estimate web traffic
• – Server/page aliases
• – Link/traffic disparity, e.g. porn sites, free web-mail
• Backlink predictor
• – Citation counts have been used to predict future citations
• – very difficult to map the citation structure of the web completely
• – avoid the local maxima that citation counts get stuck in and get better performance
• 18. Applications II - Ranking Proxy
• Surfer's Navigation Aid
• Annotating links by PageRank (bar graph)
• Not query dependent
• 19. Issues
• Users are no random walkers
• – Content based methods
• Starting point distribution
• – Actual usage data as starting vector
• Reinforcing effects/bias towards main pages
• How about traffic to ranking pages?
• No query specific rank
• Linkage spam
• – PageRank favors pages that managed to get other pages to link to
• them
• – Linkage not necessarily a sign of relevancy, only of promotion
• (advertisement…)
• 20. Evaluation I
• 21. Evaluation II
• 22. Conclusion
• PageRank is a global ranking based on the web's graph structure
• PageRank use backlinks information to bring order to the web
• PageRank can separate out representative pages as cluster center
• A great variety of applications