Dr. Searcher and Mr. Browser:
       A unified hyperlink-click graph
Barbara Poblete, Carlos Castillo, Aristides Gionis
   ...
In this talk




       New unified web graph: hyperlink-click graph
       Union of the hyperlink graph and the click grap...
Motivation



     Two types of web graphs: click and hyperlink graph
     Represent two most common tasks of users:
     ...
Motivation...


  Drawbacks:
    § Hyperlink graph: adversarial increase in PageRank scores,
      i.e., spam or link farm...
Motivation...


  Drawbacks:
    § Hyperlink graph: adversarial increase in PageRank scores,
      i.e., spam or link farm...
Contribution

  Introduce the hyperlink-click graph,
  union of hyperlink and click graph
  Consider random walk on the hy...
Applications



  Uses of the hyperlink-click graph:

      Ranking of documents
      Query ranking and query recommendat...
Web graphs
                               hyperlink-click graph


    clicked                     clicked     neighbor doc...
Random walks on web graphs


  Random walk on the hyperlink graph
      PH = αNH + (1 − α)1H
  Random walk on the click gr...
Random walks on web graphs


  Random walk on the hyperlink graph
      PH = αNH + (1 − α)1H
  Random walk on the click gr...
Random walks on web graphs


  Random walk on the hyperlink graph
      PH = αNH + (1 − α)1H
  Random walk on the click gr...
Web graphs
                               hyperlink-click graph


    clicked                     clicked     neighbor doc...
Evaluation and results




      Validate utility of the random walk scores on the
      hyperlink-click graph
      Compa...
Evaluation and results...



   We focus on two tasks in which a good ranking method
   should perform well:

       ranki...
Evaluation and results...


   Dataset:
       Yahoo! query log
       9 K seed documents extracted from the query log
   ...
Evaluation and results...




   Two types of data:
       Without sponsored results: query log only with
       algorithm...
Evaluation and results...


   Random walk evaluation
      we compute scores on the complete datasets, but
      we evalu...
Evaluation and results...


   dmoz:
   ΠZ : Our first measure is the normalized sum of the π scores
        of DZ document...
Evaluation and results...


   User study:

       13 users
       1 710 accessments
       users expressed not neutral op...
Evaluation and results...


   dmoz:
      Macro-evaluation: captures the overall scores of
      high-quality documents
 ...
Evaluation and results...


   dmoz:
      Macro-evaluation: captures the overall scores of
      high-quality documents
 ...
Evaluation and results...


   dmoz
     metric                        macro
              without sponsored results with ...
Evaluation and results...




   User study

      metric    without sponsored results   with sponsored results
        Γ ...
Evaluation and results...
   Exclude documents with very similar scores
Evaluation and results...

         Summary of ΓZ values for dmoz:
         1.000
                                        ...
Evaluation and results...

         Summary of ΠZ values for dmoz:
         0.900
                                        ...
Conclusions



      We study random walk on a unified web graph
      Intended to model user searching and browsing behavi...
Future work



     Analyze how to deal with the inherent bias that exists in
     any ranking technique based on usage mi...
Thank you!
Upcoming SlideShare
Loading in …5
×

Dr. Searcher and Mr. Browser: A unified hyperlink-click graph

2,338 views

Published on

Barbara Poblete, Aristides Gionis, Carlos Castillo: \"Dr. Searcher and Mr. Browser: a unified hyperlink-click graph\". Proceedings of CIKM, Napa Valley, CA, USA, October 2008.

Published in: Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
2,338
On SlideShare
0
From Embeds
0
Number of Embeds
45
Actions
Shares
0
Downloads
26
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Dr. Searcher and Mr. Browser: A unified hyperlink-click graph

  1. 1. Dr. Searcher and Mr. Browser: A unified hyperlink-click graph Barbara Poblete, Carlos Castillo, Aristides Gionis University Pompeu Fabra, Yahoo! Research Barcelona, Spain CIKM 2008
  2. 2. In this talk New unified web graph: hyperlink-click graph Union of the hyperlink graph and the click graph Random walk generates robust results that match or are better than the results obtained on either individual graph
  3. 3. Motivation Two types of web graphs: click and hyperlink graph Represent two most common tasks of users: searching and browsing Correspond to two prototypical ways of looking for information: asking and exploring Edges capture semantic relations among nodes: e.g., similarity and authority endorsement
  4. 4. Motivation... Drawbacks: § Hyperlink graph: adversarial increase in PageRank scores, i.e., spam or link farms § Click graph: sparsity, inherent bias in web-search engine ranking, dependency on textual match and click spam Our claim: Hyperlink and click graphs are complementary and can be used to alleviate the shortcomings of each other
  5. 5. Motivation... Drawbacks: § Hyperlink graph: adversarial increase in PageRank scores, i.e., spam or link farms § Click graph: sparsity, inherent bias in web-search engine ranking, dependency on textual match and click spam Our claim: Hyperlink and click graphs are complementary and can be used to alleviate the shortcomings of each other
  6. 6. Contribution Introduce the hyperlink-click graph, union of hyperlink and click graph Consider random walk on the hyperlink-click graph 1 Model user behavior more accurately 2 Show that ranking with the hyperlink-click graph is similar to the best performance by either of the two graphs 3 The unified graph compensates where either graph performs poorly 4 more robust and fail-safe
  7. 7. Applications Uses of the hyperlink-click graph: Ranking of documents Query ranking and query recommendation Similarity search Spam detection
  8. 8. Web graphs hyperlink-click graph clicked clicked neighbor documents queries documents frequency(q) click graph hyperlink graph
  9. 9. Random walks on web graphs Random walk on the hyperlink graph PH = αNH + (1 − α)1H Random walk on the click graph PC = αNC + (1 − α)1C Random walk on the hyperlink-click graph PHC = αβNC + α(1 − β)NH + (1 − α)1
  10. 10. Random walks on web graphs Random walk on the hyperlink graph PH = αNH + (1 − α)1H Random walk on the click graph PC = αNC + (1 − α)1C Random walk on the hyperlink-click graph PHC = αβNC + α(1 − β)NH + (1 − α)1
  11. 11. Random walks on web graphs Random walk on the hyperlink graph PH = αNH + (1 − α)1H Random walk on the click graph PC = αNC + (1 − α)1C Random walk on the hyperlink-click graph PHC = αβNC + α(1 − β)NH + (1 − α)1
  12. 12. Web graphs hyperlink-click graph clicked clicked neighbor documents queries documents frequency(q) click graph hyperlink graph
  13. 13. Evaluation and results Validate utility of the random walk scores on the hyperlink-click graph Compare scores with those produced from the hyperlink and the click graph
  14. 14. Evaluation and results... We focus on two tasks in which a good ranking method should perform well: ranking high-quality documents and ranking pairs of documents The evaluation is centered on analyzing the dissimilarities among the different models
  15. 15. Evaluation and results... Dataset: Yahoo! query log 9 K seed documents extracted from the query log - documents with at least 10 clicks 61 K queries with at least one click to the seed documents Using a web crawl expanded to 144 million documents The expansion considers all in- and out-neighbors of all documents in the seed set seed set.
  16. 16. Evaluation and results... Two types of data: Without sponsored results: query log only with algorithmic results With sponsored results:
  17. 17. Evaluation and results... Random walk evaluation we compute scores on the complete datasets, but we evaluate only on documents in the intersection of the three graphs Two tasks: 1 ranking high-quality documents (dmoz), 2 ranking pairs of documents (user evaluation).
  18. 18. Evaluation and results... dmoz: ΠZ : Our first measure is the normalized sum of the π scores of DZ documents. d∈DZπ(d) ΠZ = d∈DC π(d) ΓZ : Goodman-Kruskal Gamma Γ measure between the rankings D −A Γ= D +A
  19. 19. Evaluation and results... User study: 13 users 1 710 accessments users expressed not neutral opinion in 32% of document pairs Use Γ to measure pairs of documents on which rankings agree with users
  20. 20. Evaluation and results... dmoz: Macro-evaluation: captures the overall scores of high-quality documents ΠZ and ΓZ are computed considering all the documents in the evaluation set Micro-evaluation: at query level ΠZ and ΓZ in the evaluation set are reduced to only those documents clicked from a particular query. (User study: micro-evaluation)
  21. 21. Evaluation and results... dmoz: Macro-evaluation: captures the overall scores of high-quality documents ΠZ and ΓZ are computed considering all the documents in the evaluation set Micro-evaluation: at query level ΠZ and ΓZ in the evaluation set are reduced to only those documents clicked from a particular query. (User study: micro-evaluation)
  22. 22. Evaluation and results... dmoz metric macro without sponsored results with sponsored results ΓZ GC ≈ GHC > GH GC > GHC > GH ΠZ GH ≈ GHC > GC GH ≈ GHC > GC metric micro without sponsored results with sponsored results ΓZ GC ≈ GHC > GH GHC ≈ GC > GH ΠZ GHC ≈ GC > GH GHC ≈ GH ≈ GC
  23. 23. Evaluation and results... User study metric without sponsored results with sponsored results Γ GC > GHC > GH GH > GHC > GC
  24. 24. Evaluation and results... Exclude documents with very similar scores
  25. 25. Evaluation and results... Summary of ΓZ values for dmoz: 1.000 click graph hyperlink-click graph hyperlink graph 0.800 0.600 0.400 value 0.200 0.000 -0.200 Macro (with variations) Macro (no variations) Micro (with variations) Micro (no variations)
  26. 26. Evaluation and results... Summary of ΠZ values for dmoz: 0.900 click graph hyperlink-click graph 0.800 hyperlink graph 0.700 0.600 value 0.500 0.400 0.300 0.200 Macro (with variations) Macro (no variations) Micro (with variations) Micro (no variations)
  27. 27. Conclusions We study random walk on a unified web graph Intended to model user searching and browsing behavior Several combinations of metrics are used for evaluation Experimental evaluation shows: The unified graph is always close to the best performance of either the click or the hyperlink graph
  28. 28. Future work Analyze how to deal with the inherent bias that exists in any ranking technique based on usage mining Study other web mining applications 1 Link and click spam detection 2 Similarity search 3 Query recommendation
  29. 29. Thank you!

×