Relating Web Characteristics with Link-Based Ranking

    Relating Web Characteristics with Link-Based Ranking - Presentation Transcript

    1. Relating Web Characteristics • Ricardo Baeza-Yates, Carlos Castillo • Universidad de Chile
    2. Agenda • Introduction • Link-based ranking • Web structure • Web characteristics • Web usage • Web dynamics • Conclusions
    3. Introduction: Sample • Web sample: .CL domain in the year 2000 • 670,000 pages in 7,500 domains • 15 KB average page size • Collection from the TodoCL web search engine
    4. Introduction: Emphasis • Broder et al.: Graph Structure in the Web (2000) – Page-based structure based on strongly connected components – The Web graph is not a random graph – Process: cut & paste model • Ours is mostly a site-based analysis – Trying to make Web structure meaningful
    5. Introduction: The Empire
    6. Introduction: One Map
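    7. Link ranking: Pagerank • $\mathrm{Pagerank}(p) = \frac{q}{N} + (1 - q) \sum_{i=1}^{k} \mathrm{Pagerank}(r_i)$, where $r_1, \dots, r_k$ are the pages that point to page $p$ and $q/N$ is the probability of a random jump over the number of pages (in the standard formulation each term of the sum is divided by the number of outgoing links of $r_i$) • Brin & Page, 1998 • Currently used by Google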
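The Pagerank recurrence on slide 7 can be evaluated by repeated substitution. The following is a minimal sketch, not the authors' implementation: the function name `pagerank`, the default `q=0.15`, the fixed iteration count, and the handling of pages without outgoing links are all assumptions made for illustration. It also assumes the standard formulation, in which each page splits its score evenly among the pages it links to.

```python
def pagerank(links, q=0.15, iterations=100):
    """Iterate Pagerank(p) = q/N + (1 - q) * sum(Pagerank(r) / outdegree(r)).

    links: dict mapping each page to the list of pages it links to.
    q: probability of a random jump (0.15 is an assumed default).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the random-jump share q/N.
        new_rank = {p: q / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among its out-links.
                share = (1 - q) * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no out-links spreads its score
                # evenly over all pages, so the scores keep summing to 1.
                share = (1 - q) * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page graph, used only to exercise the function.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
example_rank = pagerank(example_links)
```

In this toy graph page `c` collects links from both other pages, so it ends up with the highest score.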
    8. Link ranking: Hubs & Authorities • HITS algorithm (Kleinberg, 1998) • A good authority is a page pointed to by good hubs, so we assume that it has good content • A good hub is a page that points to good authorities, so we assume it is a good set of links • Linear system calculated by numerical iteration
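The mutual definition of hubs and authorities on slide 8 can be sketched as a numerical iteration. This is an illustrative version under stated assumptions: the function name `hits`, the fixed iteration count, and the Euclidean normalization after each round are choices made here, and the query-specific subgraph selection of the original HITS algorithm is left out.

```python
import math


def hits(links, iterations=50):
    """Return (hub, authority) score dicts for a link graph.

    links: dict mapping each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A good authority is pointed to by good hubs.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # A good hub points to good authorities.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize both vectors so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical graph: two link pages pointing to two content pages.
example_links = {"h1": ["a1", "a2"], "h2": ["a1"], "a1": [], "a2": []}
example_hub, example_auth = hits(example_links)
```

Here `a1` is linked from both hubs and gets the higher authority score, while `h1` links to both authorities and gets the higher hub score.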
    9. Link ranking: Distribution • <2% with relevant Pagerank • 9% with relevant hub score • 2-3% with relevant authority score
    10. Link ranking: Correlation • Hub score, authority score and Pagerank do not seem to be correlated
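The slides do not say how the correlation between the three scores was measured. One common way to check such a claim is the Pearson correlation coefficient over the per-page score lists; the sketch below assumes that choice, and the function name `pearson` and the convention of returning 0.0 for a constant list are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        # Assumed convention: a constant list carries no correlation.
        return 0.0
    return covariance / (spread_x * spread_y)
```

A value near 0 for each pair of measures (hub score, authority score, Pagerank) would be consistent with the observation on the slide; values near 1 or -1 would contradict it.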
    11. Link ranking: Sites • Which measure to use for sites? • Average score – But good sites can have lots of bad pages • Maximum score – But one good page cannot be all that is needed to be a good site • Sum of the scores of all pages – Natural for Pagerank
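The three candidate site measures on slide 11 can be computed side by side from per-page scores. This is a minimal sketch: the function name `site_scores` is hypothetical, and grouping pages by the host name of their URL is an assumption about what counts as a site.

```python
from collections import defaultdict
from urllib.parse import urlparse


def site_scores(page_scores):
    """Aggregate page scores per site as average, maximum and sum.

    page_scores: dict mapping a page URL to its score.
    """
    by_site = defaultdict(list)
    for url, score in page_scores.items():
        # Assumption: a site is identified by the host part of the URL.
        by_site[urlparse(url).netloc].append(score)
    return {
        site: {
            "average": sum(scores) / len(scores),
            "maximum": max(scores),
            "sum": sum(scores),
        }
        for site, scores in by_site.items()
    }


# Hypothetical page scores, used only to exercise the function.
example_scores = site_scores({
    "http://a.cl/": 0.5,
    "http://a.cl/x": 0.1,
    "http://b.cl/": 0.3,
})
```

In this example `a.cl` and `b.cl` tie on the average, while the maximum and the sum both favor `a.cl`, which shows how the choice of measure changes the site ranking.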
    12. Link ranking: Sites Graph • 90% relevant site-Pagerank • It’s harder to have a good hub than a good authority (site)
    13. Web Structure: Basis • The Web graph has structure: MAIN, IN, OUT, ISLANDS
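The four components named on slide 13 can be derived from a link graph with plain graph traversal. The sketch below assumes the usual bow-tie definitions: MAIN is the largest strongly connected component, IN is everything that can reach MAIN, OUT is everything reachable from MAIN, and ISLANDS is the rest. The slides do not spell these definitions out, so they are an assumption, as are the function names. The component search is quadratic and only suitable for small graphs.

```python
from collections import deque


def reachable(start_nodes, adjacency):
    """Breadth-first search: all nodes reachable from start_nodes."""
    seen = set(start_nodes)
    queue = deque(start_nodes)
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def bow_tie(links):
    """Split a link graph into MAIN, IN, OUT and ISLANDS."""
    nodes = set(links)
    reverse = {}
    for source, targets in links.items():
        for target in targets:
            nodes.add(target)
            reverse.setdefault(target, []).append(source)

    # A node's strongly connected component is the set of nodes it can
    # both reach and be reached from; MAIN is the largest such component.
    main, assigned = set(), set()
    for node in sorted(nodes):
        if node in assigned:
            continue
        component = reachable([node], links) & reachable([node], reverse)
        assigned |= component
        if len(component) > len(main):
            main = component

    downstream = reachable(main, links)
    upstream = reachable(main, reverse)
    return {
        "MAIN": main,
        "IN": upstream - main,
        "OUT": downstream - main,
        "ISLANDS": nodes - upstream - downstream,
    }


# Hypothetical six-node graph, used only to exercise the function.
example_parts = bow_tie({
    "in1": ["m1"],
    "m1": ["m2"],
    "m2": ["m1", "out1"],
    "isl1": ["isl2"],
})
```

With these definitions, nodes that hang off IN or OUT without touching MAIN also fall into ISLANDS; a finer classification would need extra categories.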
    14. Web Structure: Basis (cont.) • The MAIN component has structure: MAIN-MAIN, MAIN-IN, MAIN-OUT, MAIN-NORM (shown between IN and OUT)
    15. Web Structure: Sketch
    16. Web Structure: Degree
    17. Web Structure: Sizes
    18. Web Structure: Preferences
    19. Web Structure: Preferences [figure panels: Real, ODP, TodoCL; labels: MAIN, OUT]
    20. Web Structure: Various
    21. Web Structure: Link Scores
    22. Web Dynamics: Ages • The kernel of the Web comes from the past
    23. Web Dynamics: By Component
    24. Web Dynamics: Pagerank • Pagerank is biased against newer pages
    25. Web Dynamics: Hubs & Authorities [figure: authority score and hub score versus age (months)]
    26. Conclusions • Pagerank/HITS do not seem to be correlated – And Pagerank is biased toward older pages • Site ranking can help to make good human-selected directories • Finding good pages is not so simple • Characterizing Web structure gives valuable insight – Web Graph Mining is just starting

    Carlos Castillo, 3 years ago


    CC Attribution License
