    PageRank Presentation Transcript

    • The PageRank Citation Ranking: Bringing Order to the Web. Larry Page et al., Stanford University. Presented by Guoqiang Su & Wei Li
    • Contents
      • Motivation
      • Related work
      • PageRank & Random Surfer Model
      • Implementation
      • Application
      • Conclusion
    • Motivation
      • Web: heterogeneous and unstructured
      • No quality control on the web
      • Commercial interests in manipulating rankings
    • Related Work
      • Academic citation analysis
      • Link-based analysis
      • Clustering methods of link structure
      • Hubs & Authorities Model
    • Backlink
      • Link Structure of the Web
      • Approximation of importance / quality
    • PageRank
      • Pages with lots of backlinks are important
      • Backlinks coming from important pages convey more importance to a page
      • Problem: Rank Sink
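The two intuitions above can be written as a single recursive definition. This is a sketch of the simplified ranking; the symbols $B_u$ (the set of pages linking to $u$), $N_v$ (the number of outgoing links of page $v$), and the normalizing constant $c$ are notation assumed here, since the slide text does not spell them out:

$$R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v}$$

Each page $v$ splits its rank evenly among its $N_v$ outgoing links, so a backlink from a highly ranked page with few outgoing links contributes the most.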
    • Rank Sink
      • A cycle of pages that is pointed to by some incoming link but has no links leading out
      • Problem: the loop accumulates rank but never distributes any rank outside
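A small experiment can make the rank sink visible. The sketch below is illustrative only: the four-page graph, the `propagate` helper, and the uniform starting ranks are all assumptions made for the example. Pages 1 and 2 form a cycle that receives a link from page 0 but links to nothing outside itself.

```python
def propagate(links, ranks):
    """One step of plain rank propagation: each page splits its rank
    evenly among the pages it links to (no escape term)."""
    new = {p: 0.0 for p in ranks}
    for u, outs in links.items():
        for v in outs:
            new[v] += ranks[u] / len(outs)
    return new

# Hypothetical graph: 0 <-> 3 exchange links, 0 also points into the cycle 1 <-> 2.
links = {0: [1, 3], 3: [0], 1: [2], 2: [1]}
ranks = {p: 0.25 for p in links}

for _ in range(100):
    ranks = propagate(links, ranks)

# Share of the total rank that ended up trapped in the cycle.
sink_share = ranks[1] + ranks[2]
print(f"rank held by the cycle: {sink_share:.6f}")
```

Total rank is conserved at every step, but the share held by pages 0 and 3 halves every two iterations, so nearly all rank ends up inside the cycle.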
    • Escape Term
      • Solution: Rank Source
      • c is maximized and $\|R\|_1 = 1$
      • E(u) is some vector over the web pages
      • – uniform, favorite page etc.
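Written out, the ranking with a rank source looks as follows. This is a reconstruction under assumed notation: $B_u$ is the set of pages linking to $u$ and $N_v$ is the number of outgoing links of page $v$; the slide itself only names $c$ and $E(u)$.

$$R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v} + c\,E(u), \qquad \|R\|_1 = 1,\ c \text{ maximized}$$

The extra term $c\,E(u)$ feeds rank into every page that $E$ covers, so rank trapped in a cycle is continually redistributed.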
    • Matrix Notation
      • R is the dominant eigenvector and c is the dominant eigenvalue of $A + E \times \mathbf{1}$, because c is maximized
    • Computing PageRank
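      • $R_0 \leftarrow S$: initialize vector over web pages
      • loop:
      • $R_{i+1} \leftarrow A R_i$: new ranks are the sum of normalized backlink ranks
      • $d \leftarrow \|R_i\|_1 - \|R_{i+1}\|_1$: compute normalizing factor
      • $R_{i+1} \leftarrow R_{i+1} + dE$: add escape term
      • $\delta \leftarrow \|R_{i+1} - R_i\|_1$: control parameter
      • while $\delta > \epsilon$: stop when converged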
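The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `pagerank`, the dictionary-of-lists graph representation, the uniform start vector and rank source, and the scaling factor `c = 0.85` applied to the propagated ranks are all assumptions made for the example.

```python
def pagerank(links, E=None, c=0.85, eps=1e-10, max_iter=1000):
    """links maps each page to the list of pages it links to.
    E is the rank source (must sum to 1); defaults to uniform."""
    pages = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(pages)
    if E is None:
        E = {p: 1.0 / n for p in pages}
    R = {p: 1.0 / n for p in pages}          # initialize vector over web pages
    for _ in range(max_iter):
        # new ranks: sum of normalized backlink ranks, scaled by c (assumed)
        new = {p: 0.0 for p in pages}
        for u, outs in links.items():
            for v in outs:
                new[v] += c * R[u] / len(outs)
        # normalizing factor: rank lost through scaling and dangling pages
        d = sum(R.values()) - sum(new.values())
        # add escape term
        for p in pages:
            new[p] += d * E[p]
        # control parameter: L1 distance between successive rank vectors
        delta = sum(abs(new[p] - R[p]) for p in pages)
        R = new
        if delta <= eps:                     # stop when converged
            break
    return R

# Hypothetical four-page graph.
example = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(example)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

Because the lost rank `d` is handed back through `E`, the total rank stays at 1 in every iteration, and pages without outgoing links need no special case in this sketch.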
    • Random Surfer Model
      • PageRank corresponds to the stationary probability distribution of a random walk on the web graph
      • E(u) can be rephrased as follows: the random surfer periodically gets bored and jumps to a different page, so is never kept in a loop forever
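The random surfer model can be checked with a direct simulation. The sketch below is an illustration under assumptions: the function name `simulate_surfer`, the jump probability of 0.15, the uniform choice of the jump target, and the small example graph are not from the slides. It also assumes every link target appears as a key of `links`.

```python
import random

def simulate_surfer(links, steps=50_000, jump_prob=0.15, seed=0):
    """Follow random links; with probability jump_prob (or at a page with
    no outgoing links) get bored and jump to a uniformly random page."""
    rng = random.Random(seed)
    pages = sorted(links)
    visits = {p: 0 for p in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        outs = links[current]
        if not outs or rng.random() < jump_prob:
            current = rng.choice(pages)      # bored: jump somewhere else
        else:
            current = rng.choice(outs)       # follow a random outgoing link
    return {p: visits[p] / steps for p in pages}

# Hypothetical graph: pages 0, 1 and 3 all link to page 2, which links back to 0.
links = {0: [2], 1: [2], 2: [0], 3: [2]}
freq = simulate_surfer(links)
for page, share in sorted(freq.items()):
    print(page, round(share, 3))
```

The visit frequencies approximate the rank vector: pages 2 and 0 are visited far more often than pages 1 and 3, which are reached only through random jumps.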
    • Implementation
      • Computing resources
      • – 24 million pages
      • – 75 million URLs
      • Memory and disk storage
      • Weight Vector
      • (4 byte float)
      • Matrix A
      • (linear access)
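The memory needed for the weight vector follows from the figures above. As a rough worked calculation, assuming one 4-byte float per URL and decimal megabytes:

$$75 \times 10^{6}\ \text{URLs} \times 4\ \text{bytes} = 3 \times 10^{8}\ \text{bytes} \approx 300\ \text{MB}$$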
    • Implementation (cont'd)
      • Unique integer ID for each URL
      • Sort the link structure and remove dangling links
      • Rank initial assignment
      • Iteration until convergence
      • Add back dangling links and Re-compute
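The first two preprocessing steps can be sketched as follows. This is an illustration only: the helper names `assign_ids` and `remove_dangling`, the edge-list representation, and the choice to repeat the removal until no dangling links remain are assumptions. A dangling link is taken here to be a link whose target has no outgoing links.

```python
def assign_ids(edges):
    """Map each URL to a unique integer ID, in order of first appearance."""
    ids = {}
    for src, dst in edges:
        for url in (src, dst):
            if url not in ids:
                ids[url] = len(ids)
    return ids

def remove_dangling(edges):
    """Sort the links, then repeatedly drop links whose target has no
    outgoing links. Returns (kept_edges, removed_edges)."""
    edges = sorted(set(edges))
    removed = []
    while True:
        sources = {src for src, _ in edges}
        dangling = [e for e in edges if e[1] not in sources]
        if not dangling:
            return edges, removed
        removed.extend(dangling)
        edges = [e for e in edges if e[1] in sources]

# Hypothetical URLs, used only for illustration.
url_edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d")]
ids = assign_ids(url_edges)
int_edges = [(ids[s], ids[t]) for s, t in url_edges]
kept, removed = remove_dangling(int_edges)
print(kept, removed)
```

The `removed` list is kept so that the dangling links can be added back once the ranks of the remaining pages have converged.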
    • Convergence Properties
      • Graph (V, E) is an expander with factor $\alpha$ if for all (not too large) subsets S: $|A_S| \ge \alpha |S|$
      • Eigenvalue separation: Largest eigenvalue is sufficiently larger than the second-largest eigenvalue
      • Random walk converges fast to a limiting probability distribution on a set of nodes in the graph.
    • Convergence Properties (cont'd)
      • PageRank converges in roughly O(log(|V|)) iterations because the web graph G is rapidly mixing.
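The link between eigenvalue separation and fast convergence can be made explicit with the standard error bound for power iteration. This is a general property of the method rather than a result from the slides; $R_k$ denotes the normalized rank vector after $k$ iterations, $R^{*}$ its limit, and $\lambda_1$, $\lambda_2$ the largest and second-largest eigenvalues in magnitude (notation assumed here):

$$\|R_k - R^{*}\|_1 = O\!\left(\left|\frac{\lambda_2}{\lambda_1}\right|^{k}\right)$$

A larger gap between $\lambda_1$ and $\lambda_2$ therefore means fewer iterations are needed to reach a given tolerance.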
    • Personalized PageRank
      • Rank Source E can be initialized:
      • – uniformly over all pages
      • → copyright warnings, disclaimers, and mailing list archives receive overly high rankings
      • – with total weight on a single page, e.g. Netscape, McCarthy
      • → great variation of ranks under different single pages as rank source
      • – anything in between, e.g. server root pages
      • → allows manipulation by commercial interests
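The effect of the rank source can be compared on a toy graph. The sketch below is illustrative only: the function name `personalized_pagerank`, the weight 0.85 on link-following, the fixed iteration count, and the page names are assumptions. It also assumes every page has at least one outgoing link and that `E` sums to 1.

```python
def personalized_pagerank(links, E, c=0.85, iters=200):
    """Iterate R <- (1 - c) * E + c * (rank propagated along links).
    Assumes every page has outgoing links and E sums to 1."""
    pages = sorted(links)
    R = dict(E)
    for _ in range(iters):
        new = {p: (1.0 - c) * E[p] for p in pages}
        for u in pages:
            share = c * R[u] / len(links[u])
            for v in links[u]:
                new[v] += share
        R = new
    return R

# Hypothetical site structure.
links = {
    "home": ["news", "blog"],
    "news": ["home"],
    "blog": ["home", "fan"],
    "fan": ["blog"],
}

uniform_E = {p: 1.0 / len(links) for p in links}
single_E = {p: (1.0 if p == "fan" else 0.0) for p in links}

uniform = personalized_pagerank(links, uniform_E)
single = personalized_pagerank(links, single_E)
for p in sorted(links):
    print(p, round(uniform[p], 3), round(single[p], 3))
```

Putting the total weight on `fan` raises its rank and that of its neighbour `blog`, while `news` drops, which illustrates how strongly ranks vary with the chosen rank source.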
    • Applications I
      • Estimate web traffic
      • – Server/page aliases
      • – Link/traffic disparity, e.g. porn sites, free web-mail
      • Backlink predictor
      • – Citation counts have been used to predict future citations
      • – very difficult to map the citation structure of the web completely
      • – PageRank avoids the local maxima that citation counts get stuck in and performs better
    • Applications II - Ranking Proxy
      • Surfer's Navigation Aid
      • Annotating links by PageRank (bar graph)
      • Not query dependent
    • Issues
      • Users are not random walkers
      • – Content based methods
      • Starting point distribution
      • – Actual usage data as starting vector
      • Reinforcing effects/bias towards main pages
      • How about traffic to ranking pages?
      • No query specific rank
      • Linkage spam
      • – PageRank favors pages that managed to get other pages to link to them
      • – Linkage not necessarily a sign of relevancy, only of promotion (advertisement…)
    • Evaluation I
    • Evaluation II
    • Conclusion
      • PageRank is a global ranking based on the web's graph structure
      • PageRank uses backlink information to bring order to the web
      • PageRank can separate out representative pages as cluster centers
      • A great variety of applications