Determining Relevance Rankings from Search Click Logs

Dr. Carson Kai-Sang Leung
Inderjeet Singh
(Database and Data Mining Lab)
   Introduction

   Problem

   Solution Methodology

   Evaluation




   Mining user behaviour/preferences
   Predicting document relevance
   Re-ranking the search results
   Comparing different ranking functions (train/test)
   Optimizing advertisement performance
   Generating query suggestions

   How big are these logs?
    ◦ 10+ terabytes of entries each day
    ◦ Composed of billions of distinct (query, URL) pairs
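
To make the (query, URL) granularity concrete, here is a minimal sketch that aggregates click counts per distinct pair; the tab-separated log format (query, URL, click flag) is a hypothetical assumption purely for illustration:

    from collections import Counter

    def aggregate_clicks(lines):
        """Count clicks per distinct (query, URL) pair.

        Assumes a hypothetical log format: query <TAB> url <TAB> clicked (0/1).
        """
        counts = Counter()
        for line in lines:
            query, url, clicked = line.rstrip("\n").split("\t")
            if clicked == "1":
                counts[(query, url)] += 1
        return counts

    log = [
        "ddr3 memory\thttp://example.com/a\t1",
        "ddr3 memory\thttp://example.com/b\t0",
        "ddr3 memory\thttp://example.com/a\t1",
    ]
    print(aggregate_clicks(log))
    # Counter({('ddr3 memory', 'http://example.com/a'): 2})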

Documents/results are presented in order of their relevance to the query.
Many ranking factors are considered when ranking these results; these
factors depend on the query, the document, and the query-document pair.

Related directions: improving ranking based on user preferences
(likes/dislikes), personalized search + social search, and recency
(temporal) ranking.
[Figure: comparative value of Google search rankings; David Green, blog]
[Figure: # of clicks received; CIKM'09 Tutorial]
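
Click counts like these are typically broken down by result position to expose position bias. A minimal sketch, assuming a hypothetical per-session click-vector format:

    def ctr_by_position(sessions):
        """Empirical click-through rate per rank position.

        Each session is a list of 0/1 click flags, one per result position
        (a hypothetical input format for illustration).
        """
        shows, clicks = {}, {}
        for session in sessions:
            for pos, clicked in enumerate(session, start=1):
                shows[pos] = shows.get(pos, 0) + 1
                clicks[pos] = clicks.get(pos, 0) + clicked
        return {pos: clicks[pos] / shows[pos] for pos in shows}

    print(ctr_by_position([[1, 0, 0], [1, 1, 0], [0, 0, 1]]))
    # {1: 0.666..., 2: 0.333..., 3: 0.333...}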
Trust factor: users prefer certain URLs/domains over others,
e.g., wikipedia.com, stackoverflow.com, Yahoo Answers, about.com

What is missing in previous models?
 Modelling the trust factor (a sketch follows this list)

 Clicks on sponsored results

 Related queries/searches (sidebars)

 Realistic and flexible assumptions on user behaviour
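
One way the missing trust factor could enter a click model is as a per-domain multiplier on the click probability. A minimal sketch; the factorization, parameter names, and prior values are illustrative assumptions, not the model proposed here:

    from urllib.parse import urlparse

    # Illustrative trust priors; in practice these would be learned from logs.
    TRUST = {"wikipedia.com": 0.9, "stackoverflow.com": 0.85}
    DEFAULT_TRUST = 0.5

    def click_probability(p_examine, p_attractive, url):
        """P(click) = P(examine) * P(attractive | examined) * trust(domain)."""
        domain = urlparse(url).netloc
        return p_examine * p_attractive * TRUST.get(domain, DEFAULT_TRUST)

    print(click_probability(0.8, 0.6, "http://wikipedia.com/wiki/DDR3"))  # 0.432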




1. Informational query – “DDR3 memory”, “SATA 3 hard drives”,
   “American history”
2. Navigational query – “gmail”, “digg”, “CIBC”, “CIBC credit cards”
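
A common heuristic for telling these two types apart (an addition beyond the slides) is click entropy: navigational queries concentrate clicks on one URL, while informational queries spread them across many. The threshold below is an illustrative assumption:

    import math

    def click_entropy(url_clicks):
        """Shannon entropy of a query's click distribution over URLs."""
        total = sum(url_clicks.values())
        return -sum((c / total) * math.log2(c / total)
                    for c in url_clicks.values() if c > 0)

    def classify(url_clicks, threshold=1.0):
        return "navigational" if click_entropy(url_clicks) < threshold else "informational"

    print(classify({"gmail.com": 950, "mail.google.com": 50}))  # navigational
    print(classify({"a": 300, "b": 250, "c": 250, "d": 200}))   # informational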




Flowchart of the assumed per-result user behaviour (the slide shows the
same chain repeated for two result positions):

  Snippet examined?    -- No --> move on to the next result
          | Yes
  Snippet attractive?  -- No --> move on to the next result
          | Yes (click)
  Enough utility?      -- No --> continue to the next result
          | Yes
         End
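
A minimal simulation of this examine -> click -> satisfied cascade. The per-result probabilities and the examination parameter are illustrative assumptions:

    import random

    def simulate_session(results, p_examine=0.9, seed=None):
        """Walk the flowchart once per result position.

        Each result is a pair (p_attractive, p_utility):
        p_attractive = P(click | snippet examined),
        p_utility    = P(satisfied | clicked).
        Returns the list of clicked rank positions.
        """
        rng = random.Random(seed)
        clicks = []
        for rank, (p_attractive, p_utility) in enumerate(results, start=1):
            if rng.random() >= p_examine:     # Snippet examined? No:
                continue                      #   move on to the next result
            if rng.random() >= p_attractive:  # Snippet attractive? No:
                continue                      #   move on to the next result
            clicks.append(rank)               # Yes: user clicks the result
            if rng.random() < p_utility:      # Enough utility? Yes:
                break                         #   end of session
        return clicks

    print(simulate_session([(0.7, 0.5), (0.5, 0.4), (0.3, 0.6)], seed=42))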
1. Make realistic and flexible assumptions on user behaviour
   (session modelling).
2. Consider trust bias (the trust factor).
3. Order the results for a particular query by the relevance scores
   predicted by the model.
4. Compare this ordering to the editorial ranking: the model is good
   if the two orderings agree to a considerable extent (see the
   rank-agreement sketch below).
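
Agreement between the model's ordering and the editorial ranking can be quantified with a rank correlation; the slides do not name a measure, so Kendall's tau is an assumed choice here:

    from itertools import combinations

    def kendall_tau(order_a, order_b):
        """Kendall rank correlation between two orderings of the same items.

        +1: the orderings agree exactly; -1: fully reversed.
        (scipy.stats.kendalltau offers the same with tie handling.)
        """
        pos_a = {doc: i for i, doc in enumerate(order_a)}
        pos_b = {doc: i for i, doc in enumerate(order_b)}
        concordant = discordant = 0
        for x, y in combinations(order_a, 2):
            if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
                concordant += 1
            else:
                discordant += 1
        n = len(order_a)
        return (concordant - discordant) / (n * (n - 1) / 2)

    model_order = ["d1", "d3", "d2", "d4"]      # by predicted relevance
    editorial_order = ["d1", "d2", "d3", "d4"]  # by human judgement
    print(kendall_tau(model_order, editorial_order))  # 0.666... -> fair agreement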

1. Deploy the model as a feature/factor for predicting relevance in a
   learning-to-rank algorithm.
2. Derive the retrieval/ranking function.
3. If the metric gains over the baseline ranking function, the model's
   insights can be used as a feature in the ranking function (see the
   sketch after this list).
4. Test the ranking function with different classes of queries for
   metric gains.
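
How the model's output might be wired in as one more learning-to-rank feature; purely illustrative plumbing, not a specific library's API:

    def add_click_model_feature(feature_rows, model_scores):
        """Append the click model's predicted relevance to each
        (query, document) feature vector before training the ranker."""
        return [row + [score] for row, score in zip(feature_rows, model_scores)]

    baseline = [[0.20, 0.70],   # e.g. text-match and link-based features
                [0.90, 0.10]]
    scores = [0.85, 0.30]       # click-model relevance per document
    print(add_click_model_feature(baseline, scores))
    # [[0.2, 0.7, 0.85], [0.9, 0.1, 0.3]]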

Metrics
• Discounted Cumulative Gain (DCG)
• Normalized DCG (NDCG)
• Precision
• Recall
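
For concreteness, a minimal sketch of the first two metrics; the slides do not fix a gain/discount variant, so the common (2^rel - 1) / log2(rank + 1) form is an assumption:

    import math

    def dcg(relevances, k=None):
        """Discounted Cumulative Gain over a ranked list of graded relevances."""
        rels = relevances[:k] if k else relevances
        return sum((2 ** rel - 1) / math.log2(rank + 1)
                   for rank, rel in enumerate(rels, start=1))

    def ndcg(relevances, k=None):
        """DCG normalized by the ideal (relevance-sorted) DCG."""
        ideal = dcg(sorted(relevances, reverse=True), k)
        return dcg(relevances, k) / ideal if ideal > 0 else 0.0

    print(round(ndcg([3, 2, 3, 0, 1, 2]), 3))  # 0.949 for this ordering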

Two types of data
1. Search click logs (from real or meta search engines)
2. The LETOR (LEarning TO Rank) benchmark dataset for
   information retrieval




[Guo et al., 2009]

[Chapelle and Zhang, 2009]
   David Green Blog. http://davidgreen.com/comparative-value-of-google-search-rankings (accessed 20 April 2011)

   Fan Guo and Chao Liu. Statistical models for web search click log analysis. Tutorial at CIKM 2009.

   Fan Guo, Chao Liu, and Yi-Min Wang. Efficient multiple-click models in web search. In Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM), Barcelona, Spain, pages 124-131. ACM, 9-11 February 2009.

   Olivier Chapelle and Ya Zhang. A dynamic Bayesian network click model for web search ranking. In Proceedings of the 18th International Conference on World Wide Web (WWW), Madrid, Spain, pages 1-10. ACM, 20-24 April 2009.
[Tmcnet.com Blog]


Editor's Notes

  • #17
    User Browsing Model (UBM) [Dupret and Piwowarski, 2008]
    Dynamic Bayesian Model (DBM) [Chapelle and Zhang, 2009]
    Session Utility Model (SUM) [Dupret and Liao, 2010]
    Independent Click Model (ICM) [Guo et al., 2009]
    Dependent Click Model (DCM) [Guo et al., 2009]