# Pagerank

## by José Luis Vera Chávez on Jun 11, 2010

## Pagerank Presentation Transcript

• The PageRank Citation Ranking: Bringing Order to the Web. Larry Page et al., Stanford University. Presented by Guoqiang Su & Wei Li
• Contents
• Motivation
• Related work
• PageRank & Random Surfer Model
• Implementation
• Application
• Conclusion
• Motivation
• Web: heterogeneous and unstructured
• No quality control on the web
• Commercial interests have an incentive to manipulate rankings
• Related Work
• Clustering methods of link structure
• Hubs & Authorities Model
• Link Structure of the Web
• Approximation of importance / quality
• PageRank
• Pages with lots of backlinks are important
• Backlinks coming from important pages convey more importance to a page
• Problem: Rank Sink
• Rank Sink
• A cycle of pages pointed to by some incoming link
• Problem: this loop will accumulate rank but never distribute any rank outside
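
The rank sink can be reproduced on a toy graph. The sketch below is an illustration, not something taken from the slides: the four-page graph, the page names `a` to `d`, and the helper `simple_pagerank_step` are all hypothetical. It applies the simplified rule (each page splits its rank equally among its out-links) with no rank source, so the loop between `c` and `d` ends up holding nearly all the rank.

```python
def simple_pagerank_step(ranks, out_links):
    """One step of simplified PageRank: each page shares its rank equally
    among the pages it links to. There is no rank source (escape term)."""
    new_ranks = {page: 0.0 for page in ranks}
    for page, targets in out_links.items():
        share = ranks[page] / len(targets)
        for target in targets:
            new_ranks[target] += share
    return new_ranks


# Hypothetical graph: a <-> b is an ordinary cycle that leaks into c via b;
# c <-> d is a loop with no links back out, i.e. a rank sink.
out_links = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}

ranks = {page: 0.25 for page in out_links}
for _ in range(50):
    ranks = simple_pagerank_step(ranks, out_links)

sink_rank = ranks["c"] + ranks["d"]
print(f"rank held by the c/d loop: {sink_rank:.6f}")
print(f"rank left for a and b:     {ranks['a'] + ranks['b']:.2e}")
```

Total rank is conserved at every step, so whatever `a` and `b` lose is exactly what the loop gains.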
• Escape Term
• Solution: Rank Source
• c is maximized and the ranks sum to 1 ($\|R\|_1 = 1$)
• E(u) is some vector over the web pages
• – e.g. uniform over all pages, or concentrated on a favorite page
• Matrix Notation
• R is the dominant eigenvector, and c the dominant eigenvalue, of the link matrix, because c is maximized
• Computing PageRank
• - initialize a rank vector over the web pages
• loop:
• - new ranks: sum of normalized backlink ranks
• - compute the normalizing factor
• - compute the control parameter
• while the control parameter exceeds the tolerance - stop when converged
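
The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenters' implementation: the function name `pagerank`, the toy graph, the damping value 0.85, and the tolerance `epsilon` are all choices made here, since the transcript gives none of them. The normalizing factor `d` is the rank lost in one step (through damping and through pages without out-links), and it is handed back to the pages in proportion to the rank source `E`.

```python
def pagerank(out_links, source, damping=0.85, epsilon=1e-10, max_iterations=1000):
    """Iterative PageRank with a rank source.

    out_links: dict mapping each page to the list of pages it links to.
    source:    dict mapping each page to its rank-source weight E(u);
               assumed here to sum to 1.
    damping:   assumed scaling of the link-following step (not from the slides).
    """
    pages = list(out_links)
    # Initialize a rank vector over the pages (uniform is one possible choice).
    ranks = {page: 1.0 / len(pages) for page in pages}

    for _ in range(max_iterations):
        # New ranks: sum of normalized backlink ranks.
        new_ranks = {page: 0.0 for page in pages}
        for page, targets in out_links.items():
            if not targets:
                continue  # a page without out-links passes nothing on
            share = damping * ranks[page] / len(targets)
            for target in targets:
                new_ranks[target] += share

        # Normalizing factor: the rank that went missing in this step.
        d = sum(ranks.values()) - sum(new_ranks.values())
        # Redistribute the missing rank according to the rank source.
        for page in pages:
            new_ranks[page] += d * source[page]

        # Control parameter: L1 distance between successive rank vectors.
        delta = sum(abs(new_ranks[page] - ranks[page]) for page in pages)
        ranks = new_ranks
        if delta <= epsilon:
            break  # converged
    return ranks


# Hypothetical graph containing a rank sink (the c <-> d loop).
out_links = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}
uniform_source = {page: 1.0 / len(out_links) for page in out_links}

ranks = pagerank(out_links, uniform_source)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

With the rank source in place, `a` and `b` keep a nonzero rank even though the `c`/`d` loop never links back to them.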
• Random Surfer Model
• PageRank corresponds to the stationary probability distribution of a random walk on the web graph
• E(u) can be rephrased as follows: the random surfer periodically gets bored and jumps to a different page, so is never kept in a loop forever
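
The random surfer can be simulated directly. The sketch below is a rough Monte Carlo illustration, with everything in it assumed for the example: the function name `simulate_random_surfer`, the toy graph, the 15% chance of getting bored, the uniform choice of the page to jump to, and the fixed seed. The fraction of time spent on each page approximates the limiting distribution of the walk.

```python
import random


def simulate_random_surfer(out_links, steps, jump_probability=0.15, seed=0):
    """Estimate how often a random surfer visits each page.

    At every step the surfer either gets bored (with probability
    jump_probability) and jumps to a page chosen uniformly at random,
    or follows a randomly chosen out-link of the current page.
    """
    rng = random.Random(seed)
    pages = list(out_links)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        targets = out_links[current]
        if not targets or rng.random() < jump_probability:
            current = rng.choice(pages)    # bored (or stuck): jump anywhere
        else:
            current = rng.choice(targets)  # follow a random out-link
    return {page: count / steps for page, count in visits.items()}


# Hypothetical graph: the c <-> d loop would trap a surfer who never jumps.
out_links = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}
frequencies = simulate_random_surfer(out_links, steps=100_000)
for page, frequency in sorted(frequencies.items()):
    print(page, round(frequency, 3))
```

Because of the jumps, the surfer keeps returning to `a` and `b` instead of circling between `c` and `d` forever.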
• Implementation
• Computing resources
• — 24 million pages
• — 75 million URLs
• Memory and disk storage
• Weight vector: 4-byte float per entry
• Matrix A: linear access
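
The figures above allow a quick size estimate for the weight vector. The arithmetic below assumes one 4-byte weight per URL and decimal megabytes; the variable names are made up for the example.

```python
NUM_URLS = 75_000_000   # URLs, from the slide
BYTES_PER_WEIGHT = 4    # one single-precision float per URL

weight_vector_bytes = NUM_URLS * BYTES_PER_WEIGHT
weight_vector_megabytes = weight_vector_bytes / 1_000_000
print(f"weight vector: {weight_vector_megabytes:.0f} MB")
```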
• Implementation (cont'd)
• Unique integer ID for each URL
• Sort the link structure and remove dangling links
• Initial assignment of ranks
• Iteration until convergence
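
The preprocessing steps can be sketched as follows. This is one possible reading of them, not the implementation described in the talk: the helper names `assign_ids` and `remove_dangling_links` and the example URLs are hypothetical, and a dangling link is taken to mean a link to a page that has no out-links. Removing such links can expose new ones, so the removal repeats until nothing changes.

```python
def assign_ids(url_links):
    """Map each URL to a unique integer ID and return the links as ID pairs,
    sorted by source ID and then target ID."""
    ids = {}
    id_links = []
    for source_url, target_url in url_links:
        source_id = ids.setdefault(source_url, len(ids))
        target_id = ids.setdefault(target_url, len(ids))
        id_links.append((source_id, target_id))
    id_links.sort()
    return ids, id_links


def remove_dangling_links(id_links):
    """Repeatedly drop links whose target page has no out-links."""
    links = list(id_links)
    while True:
        pages_with_out_links = {source for source, _ in links}
        kept = [(s, t) for s, t in links if t in pages_with_out_links]
        if len(kept) == len(links):
            return kept
        links = kept


# Hypothetical crawl: d has no out-links, so c -> d is dangling; once that
# link is gone, c has no out-links either, so b -> c goes as well.
url_links = [
    ("http://a.example/", "http://b.example/"),
    ("http://b.example/", "http://a.example/"),
    ("http://b.example/", "http://c.example/"),
    ("http://c.example/", "http://d.example/"),
]
ids, id_links = assign_ids(url_links)
cleaned = remove_dangling_links(id_links)
print(ids)
print(cleaned)
```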
• Convergence Properties
• Graph (V, E) is an expander with factor $\alpha$ if for all (not too large) subsets S: $|A_S| \geq \alpha |S|$, where $A_S$ is the neighborhood of S
• Eigenvalue separation: Largest eigenvalue is sufficiently larger than the second-largest eigenvalue
• Random walk converges fast to a limiting probability distribution on a set of nodes in the graph.
• Convergence Properties (cont'd)
• PageRank converges in O(log |V|) iterations because the web graph G is rapidly mixing.
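
The link between eigenvalue separation and convergence speed can be made concrete with the standard power-iteration bound. This is a textbook result, not something stated on the slides, and it assumes a diagonalizable matrix whose eigenvalues satisfy $|\lambda_1| > |\lambda_2| \geq \dots$; here $R_k$ is the normalized iterate after $k$ steps and $R$ is the dominant eigenvector.

```latex
\| R_k - R \| = O\!\left( \left| \frac{\lambda_2}{\lambda_1} \right|^{k} \right)
\qquad\Longrightarrow\qquad
k \approx \frac{\log(1/\epsilon)}{\log\left| \lambda_1 / \lambda_2 \right|}
\text{ iterations to reach tolerance } \epsilon
```

The larger the gap between the two largest eigenvalues, the fewer iterations are needed.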
• Personalized PageRank
• Rank Source E can be initialized:
• – uniformly over all pages: e.g. copyright warnings, disclaimers, mailing list archives
• → these pages end up ranked too highly
• – with total weight on a single page, e.g. Netscape, McCarthy
• → great variation of ranks under different single pages as rank source
• – and everything in between, e.g. server root pages
• → allows manipulation by commercial interests
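
The effect of the rank source can be seen by running the same computation with two different choices of E. The sketch below is illustrative only: the function name `personalized_pagerank`, the toy graph, the damping value 0.85, and the fixed iteration count are assumptions made here, and every page is assumed to have at least one out-link.

```python
def personalized_pagerank(out_links, source, damping=0.85, iterations=200):
    """PageRank in which the rank source E decides where the rank that is
    not passed along links re-enters the graph.

    source: dict mapping each page to E(u), assumed to sum to 1.
    """
    pages = list(out_links)
    ranks = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        # Start each page with its share of the rank source ...
        new_ranks = {page: (1.0 - damping) * source[page] for page in pages}
        # ... then add the rank arriving over backlinks.
        for page, targets in out_links.items():
            for target in targets:
                new_ranks[target] += damping * ranks[page] / len(targets)
        ranks = new_ranks
    return ranks


# Hypothetical graph and two rank sources: uniform, and all weight on page a.
out_links = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}
uniform_source = {page: 0.25 for page in out_links}
single_page_source = {"a": 1.0, "b": 0.0, "c": 0.0, "d": 0.0}

uniform_ranks = personalized_pagerank(out_links, uniform_source)
focused_ranks = personalized_pagerank(out_links, single_page_source)
print("uniform E:     ", {p: round(r, 3) for p, r in uniform_ranks.items()})
print("all E on 'a':  ", {p: round(r, 3) for p, r in focused_ranks.items()})
```

Putting all of E on `a` raises the rank of `a` and of the pages close to it, which is the variation the single-page case refers to.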
• Applications I
• Estimate web traffic
• – Server/page aliases
• – Link/traffic disparity, e.g. porn sites, free web-mail
• – Citation counts have been used to predict future citations
• – very difficult to map the citation structure of the web completely
• – PageRank avoids the local maxima that citation counts get stuck in and performs better
• Applications II - Ranking Proxy
• Annotating links by PageRank (bar graph)
• Not query dependent
• Issues
• Users are not random walkers
• – Content based methods
• Starting point distribution
• – Actual usage data as starting vector
• Reinforcing effects/bias towards main pages
• How about traffic to ranking pages?
• No query specific rank