The Pagerank
Upcoming SlideShare
Loading in...5

The Pagerank






Total Views
Views on SlideShare
Embed Views



1 Embed 1 1



Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment

The Pagerank The Pagerank Presentation Transcript

  • The PageRank Citation Ranking: Bringing Order to the Web Larry Page etc. Stanford University Presented by Guoqiang Su & Wei Li
  • Contents
    • Motivation
    • Related work
    • Page Rank & Random Surfer Model
    • Implementation
    • Application
    • Conclusion
  • Motivation
    • Web: heterogeneous and unstructured
    • Free of quality control on the web
    • Commercial interest to manipulate ranking
  • Related Work
    • Academic citation analysis
    • Link-based analysis
    • Clustering methods of link structure
    • Hubs & Authorities Model
  • Backlink
    • Link Structure of the Web
    • Approximation of importance / quality
  • PageRank
    • Pages with lots of backlinks are important
    • Backlinks coming from important pages convey more importance to a page
    • Problem: Rank Sink
  • Rank Sink
    • Page cycles pointed by some incoming link
    • Problem: this loop will accumulate rank but never distribute any rank outside
  • Escape Term
    • Solution: Rank Source
    • c is maximized and = 1
    • E(u) is some vector over the web pages
    • – uniform, favorite page etc.
  • Matrix Notation
    • R is the dominant eigenvector and c is the dominant eigenvalue of because c is maximized
  • Computing PageRank
    • - initialize vector over web pages
    • loop:
    • - new ranks sum of normalized backlink ranks
    • - compute normalizing factor
    • - add escape term
    • - control parameter
    • while - stop when converged
  • Random Surfer Model
    • Page Rank corresponds to the probability distribution of a random walk on the web graphs
    • E(u) can be re-phrased as the random surfer gets bored periodically and jumps to a different page and not kept in a loop forever
  • Implementation
    • Computing resources
    • — 24 million pages
    • — 75 million URLs
    • Memory and disk storage
    • Weight Vector
    • (4 byte float)
    • Matrix A
    • (linear access)
  • Implementation (Con't)
    • Unique integer ID for each URL
    • Sort and Remove dangling links
    • Rank initial assignment
    • Iteration until convergence
    • Add back dangling links and Re-compute
  • Convergence Properties
    • Graph (V, E) is an expander with factor  if for all (not too large) subsets S: |As|   |s|
    • Eigenvalue separation: Largest eigenvalue is sufficiently larger than the second-largest eigenvalue
    • Random walk converges fast to a limiting probability distribution on a set of nodes in the graph.
  • Convergence Properties (con't)
    • PageRank computation is O(log(|V|)) due to rapidly mixing graph G of the web.
  • Personalized PageRank
    • Rank Source E can be initialized :
    • – uniformly over all pages: e.g. copyright
    • warnings, disclaimers, mailing lists archives
    •  result in overly high ranking
    • – total weight on a single page, e.g . Netscape, McCarthy
    •  great variation of ranks under different single pages as rank source
    • – and everything in-between, e.g. server root pages
    •  allow manipulation by commercial interests
  • Applications I
    • Estimate web traffic
    • – Server/page aliases
    • – Link/traffic disparity, e.g. porn sites, free web-mail
    • Backlink predictor
    • – Citation counts have been used to predict future citations
    • – very difficult to map the citation structure of the web completely
    • – avoid the local maxima that citation counts get stuck in and get better performance
  • Applications II - Ranking Proxy
    • Surfer's Navigation Aid
    • Annotating links by PageRank (bar graph)
    • Not query dependent
  • Issues
    • Users are no random walkers
    • – Content based methods
    • Starting point distribution
    • – Actual usage data as starting vector
    • Reinforcing effects/bias towards main pages
    • How about traffic to ranking pages?
    • No query specific rank
    • Linkage spam
    • – PageRank favors pages that managed to get other pages to link to
    • them
    • – Linkage not necessarily a sign of relevancy, only of promotion
    • (advertisement…)
  • Evaluation I
  • Evaluation II
  • Conclusion
    • PageRank is a global ranking based on the web's graph structure
    • PageRank use backlinks information to bring order to the web
    • PageRank can separate out representative pages as cluster center
    • A great variety of applications