Current Approaches in Search Result Diversification
Mario Sangiorgio

  1. Current Approaches in Search Result Diversification. Mario Sangiorgio
  2. Presentation outline: problem definition; what is diversity?; the relevance/diversity trade-off; performance evaluation; open issues and conclusions.
  3. Why is result diversification needed? A couple of real-life examples.
  4. Ambiguous query: "Flash"
  5. Unambiguous query: "Nuclear power plant"
  6. Problem definition: search result diversification is an optimization problem that aims to find k items, a subset of all relevant results, that are both the most relevant and the most diverse.
  7. What is needed: a relevance measure, a diversity measure, and a diversification objective.
  8. The result diversification process: items are ranked by relevance, diversity is measured, and the two measures are combined to produce the final ranking.
  9. What is diversity?
  10. How can items be diverse? Word sense diversity, from ambiguous queries; information source diversity, from unambiguous queries.
  11. Measures of diversity: diversity is tightly coupled with the concept of similarity. Several measures have emerged to address different aspects of the problem: semantic distance, categorical distance, and novel information.
  12. Semantic distance: diversifies on content dissimilarity. It uses the min-hashing scheme to get the sketch of a document, $S_d = \{MH_{h_1}(d), \ldots, MH_{h_n}(d)\}$. Distance is computed from the Jaccard similarity of the sketches: $sim(u, v) = \frac{|S_u \cap S_v|}{|S_u \cup S_v|}$, $d(u, v) = 1 - sim(u, v)$. It does not work well when the documents differ greatly in length or when the sketch size is small.
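The sketch-based distance on slide 12 can be illustrated with a minimal Python sketch. The shingle size, the number of hash functions, the use of seeded `blake2b` hashes, and all function names are assumptions made for illustration; the slide does not specify them. Each sketch entry is tagged with the index of its hash function so that set intersection compares like with like.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of word k-grams of a text (assumed document representation)."""
    words = text.lower().split()
    if len(words) <= k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def hash_value(seed, item):
    """One member of a family of hash functions, selected by an integer seed."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def min_hash_sketch(text, n=64, k=3):
    """Sketch S_d: for each of n hash functions, the minimum hash over the shingles."""
    items = shingles(text, k)
    return {(seed, min(hash_value(seed, s) for s in items)) for seed in range(n)}


def semantic_distance(text_u, text_v, n=64, k=3):
    """d(u, v) = 1 - |S_u & S_v| / |S_u | S_v|, computed on the two sketches."""
    s_u = min_hash_sketch(text_u, n, k)
    s_v = min_hash_sketch(text_v, n, k)
    return 1.0 - len(s_u & s_v) / len(s_u | s_v)


doc_a = "adobe flash player is a browser plugin for multimedia"
doc_b = "the flash is a comic book superhero from dc comics"
print(semantic_distance(doc_a, doc_a))  # identical documents share every sketch entry
print(semantic_distance(doc_a, doc_b))
```

A small `n` makes the estimate noisy, which is one way to read the slide's caveat about small sketch sizes.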
  13. Categorical distance: emphasizes word sense diversification. It is based on metadata (a taxonomy). The measure is a weighted tree distance: $d(u, v) = \sum_{i=lca(u,v)}^{l(u)} \frac{1}{2^{e(i-1)}} + \sum_{i=lca(u,v)}^{l(v)} \frac{1}{2^{e(i-1)}}$. Examples of taxonomy categories: /Top/Health vs /Top/Finance; /Top/Sport/Racing vs /Top/Sport/Football.
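A weighted tree distance over taxonomy paths might be computed as in the following sketch. Several points are assumptions, since the slide's formula is hard to read: the edge entering depth i is given weight 1/2^(e(i-1)), the root category has depth 1, and the sums run only over edges below the lowest common ancestor so that identical categories get distance 0. The function names are hypothetical.

```python
def category_path(category):
    """Split a taxonomy path such as '/Top/Sport/Racing' into its components."""
    return [part for part in category.split("/") if part]


def lca_depth(path_u, path_v):
    """Depth of the lowest common ancestor: length of the shared path prefix."""
    depth = 0
    for a, b in zip(path_u, path_v):
        if a != b:
            break
        depth += 1
    return depth


def edge_weight(depth, e=1.0):
    """Assumed weight of the edge entering a node at the given depth."""
    return 1.0 / 2 ** (e * (depth - 1))


def categorical_distance(cat_u, cat_v, e=1.0):
    """Sum of edge weights on the path from u up to the LCA and down to v."""
    path_u, path_v = category_path(cat_u), category_path(cat_v)
    lca = lca_depth(path_u, path_v)
    down_u = sum(edge_weight(i, e) for i in range(lca + 1, len(path_u) + 1))
    down_v = sum(edge_weight(i, e) for i in range(lca + 1, len(path_v) + 1))
    return down_u + down_v


print(categorical_distance("/Top/Health", "/Top/Finance"))             # 1.0
print(categorical_distance("/Top/Sport/Racing", "/Top/Sport/Football"))  # 0.5
```

Under these assumptions, categories that split deeper in the taxonomy are closer than categories that split near the root, which matches the intent of the two example pairs on the slide.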
  14. Novel information: diversifies in a general sense on content dissimilarity; good for subtopics. Results are represented with unigram language models (as used in natural language processing). For each document, the Kullback-Leibler divergence measures how much novel information it brings into the set, that is, how many extra bits are needed to describe the new document using only the documents already selected.
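The novelty measure on slide 14 can be sketched as the Kullback-Leibler divergence, in bits, between the unigram model of a candidate document and the model of the documents already selected. Whitespace tokenization, additive smoothing with a hypothetical parameter `mu`, and the function names are assumptions; the slide does not say how the language models are estimated.

```python
import math
from collections import Counter


def unigram_model(tokens, vocabulary, mu=0.1):
    """Additively smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocabulary)
    return {word: (counts[word] + mu) / total for word in vocabulary}


def kl_divergence_bits(p, q):
    """KL(p || q) in bits; q must be non-zero wherever p is non-zero."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p if p[w] > 0)


def novelty(candidate, selected_docs, mu=0.1):
    """Extra bits needed to encode the candidate using the selected set's model."""
    cand_tokens = candidate.lower().split()
    sel_tokens = [t for doc in selected_docs for t in doc.lower().split()]
    vocabulary = set(cand_tokens) | set(sel_tokens)
    p = unigram_model(cand_tokens, vocabulary, mu)
    q = unigram_model(sel_tokens, vocabulary, mu)
    return kl_divergence_bits(p, q)


selected = ["flash player download adobe"]
print(novelty("adobe flash player update", selected))     # mostly known vocabulary
print(novelty("flash gordon comic superhero", selected))  # mostly new vocabulary
```

Smoothing is what keeps the divergence finite when the candidate uses words the selected set has never seen.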
  15. Diversity measures, open issues. Some aspects are not taken into account: intrinsic properties of the document, genre of the document, sentiment regarding the topic.
  16. The relevance/diversity optimization problem
  17. Diversification objectives: it has been proved impossible to find a function that has all the required properties: scale invariance, consistency, richness, stability, independence of irrelevant attributes, monotonicity, strength of relevance, strength of similarity.
  18. Diversification objectives. Several functions have been proposed:
      Max sum (no stability)
      Max min (no consistency nor stability)
      Max sum of max score (maximizes relevancy and then diversity)
      Mono objective (no consistency)
      Categorical (results have to cover a set of categories)
      Max product (based on the already chosen results)
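Slide 18 only names the objectives. As an illustration, the sketch below assumes commonly used forms of two of them: max sum as (k - 1) times the total relevance plus 2λ times the total pairwise distance, and max min as the minimum relevance plus λ times the minimum pairwise distance. The toy relevance scores, distances, the trade-off parameter `lam`, and the brute-force search are all assumptions for demonstration on a tiny input.

```python
from itertools import combinations


def max_sum(subset, relevance, distance, lam=1.0):
    """Assumed max-sum objective: total relevance plus total pairwise distance."""
    k = len(subset)
    rel = sum(relevance[u] for u in subset)
    div = sum(distance(u, v) for u, v in combinations(subset, 2))
    return (k - 1) * rel + 2 * lam * div


def max_min(subset, relevance, distance, lam=1.0):
    """Assumed max-min objective: worst relevance plus smallest pairwise distance."""
    rel = min(relevance[u] for u in subset)
    div = min((distance(u, v) for u, v in combinations(subset, 2)), default=0.0)
    return rel + lam * div


def best_subset(relevance, distance, k, objective, lam=1.0):
    """Exhaustive search over all k-subsets; only feasible for tiny inputs."""
    candidates = combinations(sorted(relevance), k)
    return max(candidates, key=lambda s: objective(s, relevance, distance, lam))


# Toy data: "a" and "b" are relevant near-duplicates, "c" is less relevant but different.
relevance = {"a": 0.9, "b": 0.8, "c": 0.3}
pairwise = {frozenset("ab"): 0.1, frozenset("ac"): 0.9, frozenset("bc"): 0.8}


def distance(u, v):
    return pairwise[frozenset((u, v))]


print(best_subset(relevance, distance, 2, max_sum))          # ('a', 'c')
print(best_subset(relevance, distance, 2, max_sum, lam=0.0))  # ('a', 'b')
```

With `lam=0` both objectives fall back to pure relevance, so the near-duplicate pair wins; with `lam=1` the diverse pair is preferred.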
  19. Diversification algorithms: finding the best solution is an NP-hard problem. The algorithm depends on the objective function; approximation and greedy algorithms are used. Open issues: Is off-line pre-computation applicable? Are there efficient data structures?
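A greedy heuristic of the kind slide 19 alludes to can be sketched as follows. The scoring rule here is an MMR-style marginal gain (relevance traded off against the distance to the nearest item already picked), which is a swapped-in illustration and not necessarily the algorithm the presentation had in mind; the parameter `lam`, the toy data, and the function names are assumptions.

```python
def greedy_diversify(relevance, distance, k, lam=0.5):
    """Pick up to k items, each time taking the best relevance/diversity trade-off."""
    selected = []
    remaining = sorted(relevance)  # sorted so that ties break deterministically

    while remaining and len(selected) < k:

        def marginal_score(item):
            # The first pick is made on relevance alone.
            if not selected:
                return relevance[item]
            nearest = min(distance(item, chosen) for chosen in selected)
            return (1 - lam) * relevance[item] + lam * nearest

        best = max(remaining, key=marginal_score)
        selected.append(best)
        remaining.remove(best)

    return selected


# Toy data: "a" and "b" are relevant near-duplicates, "c" is less relevant but different.
relevance = {"a": 0.9, "b": 0.8, "c": 0.3}
pairwise = {frozenset("ab"): 0.1, frozenset("ac"): 0.9, frozenset("bc"): 0.8}


def distance(u, v):
    return pairwise[frozenset((u, v))]


print(greedy_diversify(relevance, distance, k=2))           # ['a', 'c']
print(greedy_diversify(relevance, distance, k=2, lam=0.0))  # ['a', 'b']
```

Each step costs one pass over the remaining items and the current selection, so the whole run is polynomial, in contrast to the exhaustive search an exact solution would need.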
  20. How to evaluate diversity in search
  21. Data sets for the evaluation. Full text: TREC Interactive; top results from a commercial search engine. Structured data: taxonomies (Open Directory Project). Ground truth: Wikipedia disambiguation pages; judgements from Amazon Mechanical Turk. Task-specific standard datasets are needed.
  22. Benchmarks, adapted from existing metrics:
      Alpha-nDCG: normalized discounted cumulative gain
      Subtopic recall and precision: number of subtopics covered
      User intent: the distribution of results should reflect what the user is asking for
      Comparison against the optimum
  23. Alpha-nDCG: based on information nuggets (answers to a question). A document is relevant when it contains a nugget needed by the user. The quality of results is graded by human assessors. The more nuggets the set contains, the better.
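The nugget-based metric on slide 23 can be sketched in Python as below. The sketch assumes the usual formulation in which a nugget already seen m times contributes (1 - α)^m, gains are discounted by log2(1 + rank), and the ideal ranking used for normalization is approximated greedily. The nugget judgments, document names, and function names are made up for illustration.

```python
import math


def gain(doc, nuggets, seen, alpha):
    """Gain of a document: each nugget is discounted by how often it was already seen."""
    return sum((1 - alpha) ** seen.get(n, 0) for n in nuggets.get(doc, ()))


def alpha_dcg(ranking, nuggets, alpha=0.5):
    """Discounted cumulative gain with a redundancy penalty on repeated nuggets."""
    seen, total = {}, 0.0
    for rank, doc in enumerate(ranking, start=1):
        total += gain(doc, nuggets, seen, alpha) / math.log2(1 + rank)
        for n in nuggets.get(doc, ()):
            seen[n] = seen.get(n, 0) + 1
    return total


def greedy_ideal_ranking(nuggets, depth, alpha=0.5):
    """Approximate the ideal ranking by always taking the highest-gain document."""
    seen, ranking = {}, []
    remaining = sorted(nuggets)
    while remaining and len(ranking) < depth:
        best = max(remaining, key=lambda d: gain(d, nuggets, seen, alpha))
        ranking.append(best)
        remaining.remove(best)
        for n in nuggets[best]:
            seen[n] = seen.get(n, 0) + 1
    return ranking


def alpha_ndcg(ranking, nuggets, alpha=0.5):
    """alpha-DCG of the ranking divided by alpha-DCG of the greedy ideal ranking."""
    ideal = alpha_dcg(greedy_ideal_ranking(nuggets, len(ranking), alpha), nuggets, alpha)
    return alpha_dcg(ranking, nuggets, alpha) / ideal if ideal > 0 else 0.0


# Hypothetical judgments: which nuggets each document contains.
nuggets = {"d1": {"a", "b"}, "d2": {"a"}, "d3": {"c"}}
print(alpha_ndcg(["d1", "d3", "d2"], nuggets))  # novel nuggets first
print(alpha_ndcg(["d2", "d1", "d3"], nuggets))  # redundant nugget ranked early
```

Setting `alpha=0` removes the redundancy penalty, so repeated nuggets count in full.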
  24. Subtopic recall and precision. Is the result set exhaustive? $\text{s-recall at } k = \frac{\text{number of subtopics covered by the first } k \text{ documents}}{\text{total number of subtopics}}$. Is the result set efficient? $\text{s-precision at } r = \frac{minRank(S_{opt}, r)}{minRank(S, r)}$.
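The two ratios on slide 24 can be sketched as follows, reading minRank(S, r) as the smallest rank at which system S reaches subtopic recall r, and minRank(S_opt, r) as the smallest number of documents any ordering needs to reach it. The optimum is found here by brute force, which only works for tiny collections; the judgments and function names are assumptions for illustration.

```python
from itertools import combinations


def covered(docs, subtopics):
    """Union of the subtopics of the given documents."""
    return set().union(*(subtopics.get(d, set()) for d in docs))


def s_recall(ranking, subtopics, k):
    """Fraction of all subtopics covered by the first k documents."""
    total = len(covered(subtopics, subtopics))
    return len(covered(ranking[:k], subtopics)) / total


def min_rank(ranking, subtopics, r):
    """Smallest k at which the ranking reaches s-recall r, or None if it never does."""
    for k in range(1, len(ranking) + 1):
        if s_recall(ranking, subtopics, k) >= r:
            return k
    return None


def optimal_min_rank(subtopics, r):
    """Smallest number of documents that can reach s-recall r (brute force)."""
    docs = sorted(subtopics)
    for size in range(1, len(docs) + 1):
        for combo in combinations(docs, size):
            if s_recall(list(combo), subtopics, size) >= r:
                return size
    return None


def s_precision(ranking, subtopics, r):
    """minRank of the optimal system divided by minRank of the evaluated ranking."""
    return optimal_min_rank(subtopics, r) / min_rank(ranking, subtopics, r)


# Hypothetical judgments: which subtopics each document covers.
subtopics = {"d1": {"a"}, "d2": {"a"}, "d3": {"b", "c"}, "d4": {"c"}}
ranking = ["d1", "d2", "d3", "d4"]
print(s_recall(ranking, subtopics, 2))       # one of three subtopics after two documents
print(s_precision(ranking, subtopics, 1.0))  # optimum needs 2 documents, ranking needs 3
```

Finding the true optimum is a set-cover problem, so a practical evaluation would likely replace the brute-force search with a greedy approximation.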
  25. Conclusions: diversification can really improve the quality of search results. There is still some work to do in order to achieve good results in all possible scenarios.
  26. Open issues: there is room for improvement in defining new diversity types and metrics. Ranking functions should take diversity into account from the beginning, in an integrated process. Datasets to evaluate each notion of diversity should be built.
  27. References:
      Minack, E., Demartini, G., Nejdl, W.: Current Approaches to Search Result Diversification. In: Proceedings of ISWC '09
      Gollapudi, S., Sharma, A.: An Axiomatic Approach for Result Diversification. In: Proceedings of WWW '09
      Zhai, C.X., Cohen, W.W., Lafferty, J.: Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval. In: Proceedings of SIGIR '03
      Agrawal, R., Gollapudi, S., Halverson, A., Ieong, S.: Diversifying Search Results. In: Proceedings of WSDM '09
      Clough, P., Sanderson, M., Abouammoh, M., Navarro, S., Paramita, M.: Multiple Approaches to Analysing Query Diversity. In: Proceedings of SIGIR '09
      Clarke, C.L., Kolla, M., Cormack, G.V., Vechtomova, O., Ashkan, A., Büttcher, S., MacKinnon, I.: Novelty and Diversity in Information Retrieval Evaluation. In: Proceedings of SIGIR '08
