Current Approaches in
Search Result Diversification
         Mario Sangiorgio
Presentation outline
Problem definition

What is diversity?

The relevance/diversity trade-off

Performance evaluation

Open issues and conclusions
Why is result diversification needed?

   A couple of real life examples
Ambiguous query


     Flash
Unambiguous query



 Nuclear power plant
Problem definition


Search result diversification is an optimization
problem: from the set of all relevant results, select
the k items that are together both most relevant
and most diverse.
What is needed



Relevance measure       Diversity measure




           Diversification objective
The result diversification process



Items are ranked by relevance

Diversity is measured

The two measures are used to get the final ranking
What is diversity?
How can items be diverse?

  Word sense diversity,
from ambiguous queries




                  Information source
                     diversity, from
                  unambiguous queries
Measures of diversity

Diversity is tightly coupled with the concept of
                     similarity

To address the different aspects of the problem
         several measures emerged:
             Semantic distance
            Categorical distance
             Novel information
Semantic distance
Diversifies on content dissimilarity

Uses the min-hashing scheme to get the sketch of a document:
$S_d = \{ MH_{h_1}(d), \ldots, MH_{h_n}(d) \}$

Distance is computed from Jaccard similarity:
$sim(u, v) = \frac{|S_u \cap S_v|}{|S_u \cup S_v|}$
$d(u, v) = 1 - sim(u, v)$

Does not work well when the documents differ greatly in length or when the sketch size is small
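
The min-hash sketch and the Jaccard-based distance can be illustrated with a minimal Python sketch. The word-shingle size, the SHA-1 based hash family, and the function names (`shingles`, `sketch`, `semantic_distance`) are assumptions made for illustration; the slide does not specify them.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (k is an assumed parameter)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, item):
    # One member of an assumed hash family: a seeded 64-bit hash of the shingle.
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def sketch(text, n=64, k=3):
    """S_d: for each of n hash functions, keep the minimum hash over the shingles."""
    items = shingles(text, k)
    if not items:
        return frozenset()
    # Each element is tagged with its hash index so that only values produced
    # by the same hash function can match.
    return frozenset((seed, min(_hash(seed, s) for s in items)) for seed in range(n))


def semantic_distance(text_u, text_v, n=64, k=3):
    """d(u, v) = 1 - |S_u & S_v| / |S_u | S_v|, computed on the sketches."""
    s_u, s_v = sketch(text_u, n, k), sketch(text_v, n, k)
    union = s_u | s_v
    if not union:
        return 0.0
    return 1.0 - len(s_u & s_v) / len(union)
```

A small sketch size `n` makes the estimate noisy, and a very short document has few shingles to hash, which is consistent with the limitation noted on the slide.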
Categorical distance
Emphasizes word sense diversification

  It is based on metadata (Taxonomy)

The measure is a weighted tree distance:
$d(u, v) = \sum_{i=lca(u,v)}^{l(u)} \frac{1}{2^{e(i-1)}} + \sum_{i=lca(u,v)}^{l(v)} \frac{1}{2^{e(i-1)}}$


        Examples of taxonomies:
      /Top/Health vs /Top/Finance
/Top/Sport/Racing vs /Top/Sport/Football
Novel information
Diversifies on content dissimilarity in a general sense. Good for subtopics

Results are represented with unigram language models (as used in natural language processing)

For each document, the Kullback-Leibler divergence measures how much novel information it brings into the set

That is, how many extra bits are needed to describe the new document using only the documents already selected in the set
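
The novelty computation can be sketched in Python as follows. The additive smoothing constant `mu`, the whitespace tokenization, and the function names are assumptions for illustration; the slide only states that unigram language models are compared with the Kullback-Leibler divergence.

```python
import math
from collections import Counter


def unigram_model(texts, vocab, mu=0.01):
    """Smoothed unigram distribution over vocab (mu is an assumed smoothing constant)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values()) + mu * len(vocab)
    return {word: (counts[word] + mu) / total for word in vocab}


def novelty(candidate, selected, mu=0.01):
    """KL(candidate || selected set) in bits.

    Roughly: the extra bits needed to encode the candidate document with a
    model estimated only from the documents already selected.
    """
    vocab = set(candidate.lower().split())
    for text in selected:
        vocab.update(text.lower().split())
    p = unigram_model([candidate], vocab, mu)
    q = unigram_model(selected, vocab, mu)
    return sum(p[w] * math.log2(p[w] / q[w]) for w in vocab)
```

A document identical to the selected set scores zero, while a document using different words scores higher, so ranking candidates by `novelty` favours results that add new information.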
Diversity measures: open issues

  Some aspects not taken into account:

    intrinsic properties of the document

          genre of the document

       sentiment regarding the topic
The relevance/diversity
 optimization problem
Diversification objectives
It has been proven that no function can have
all the required properties at once:

               scale invariance
                 consistency
                   richness
                    stability
     independence of irrelevant attributes
                 monotonicity
            strength of relevance
            strength of similarity
Diversification objectives
Several functions proposed:

Max sum (no stability)
Max min (no consistency nor stability)
Max sum of max score (maximizes relevancy and then diversity)
Mono objective (no consistency)
Max product (it is based on the already chosen results)
Categorical (results have to cover a set of categories)
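
To make two of these objectives concrete, here is a minimal Python sketch that scores a candidate result set under a max-sum and a max-min objective. The exact weighting (the `k - 1` factor, the `2 * lam` factor, and the trade-off parameter `lam`) follows the general shape used in the axiomatic literature but should be read as an assumption, since the slide gives no formulas.

```python
from itertools import combinations


def max_sum(selected, relevance, distance, lam=1.0):
    """Sum of relevance plus sum of pairwise distances (assumed weighting)."""
    k = len(selected)
    rel = sum(relevance[u] for u in selected)
    div = sum(distance(u, v) for u, v in combinations(selected, 2))
    return (k - 1) * rel + 2 * lam * div


def max_min(selected, relevance, distance, lam=1.0):
    """Worst relevance plus worst pairwise distance in the set."""
    pairs = list(combinations(selected, 2))
    min_dist = min(distance(u, v) for u, v in pairs) if pairs else 0.0
    return min(relevance[u] for u in selected) + lam * min_dist


# Toy data (hypothetical): a and b are near-duplicates, c is less relevant but different.
relevance = {"a": 0.9, "b": 0.8, "c": 0.3}
pair_distance = {
    frozenset("ab"): 0.1,
    frozenset("ac"): 0.9,
    frozenset("bc"): 0.8,
}


def distance(u, v):
    return pair_distance[frozenset((u, v))]
```

On this toy data both objectives prefer the set `["a", "c"]` over `["a", "b"]`, because the gain in diversity outweighs the loss in relevance.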
Diversification algorithms
Finding the best solution is an NP-hard problem

The algorithm depends on the objective function:
Approximation
Greedy

Open issues:
Is off-line pre-computation applicable?
Are there efficient data structures?
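
As an illustration of the greedy family, the following Python sketch builds the result list one item at a time, trading relevance against distance to the items already chosen. It is a generic greedy heuristic in the style of maximal marginal relevance, not the specific algorithm of any of the cited papers; the scoring rule and the parameter `lam` are assumptions.

```python
def greedy_diversify(candidates, relevance, distance, k, lam=0.5):
    """Pick k items greedily.

    The first pick is the most relevant item. Each later pick maximizes
    (1 - lam) * relevance + lam * (distance to the closest selected item).
    """
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:

        def score(u):
            if not selected:
                return relevance[u]
            closest = min(distance(u, v) for v in selected)
            return (1 - lam) * relevance[u] + lam * closest

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical results for the ambiguous query "Flash".
relevance = {"adobe_flash_1": 0.9, "adobe_flash_2": 0.85, "flash_superhero": 0.5}


def distance(u, v):
    # Assumed toy distance: small within the same sense, large across senses.
    same_sense = u.startswith("adobe") == v.startswith("adobe")
    return 0.1 if same_sense else 0.9


print(greedy_diversify(relevance, relevance, distance, k=2))
# ['adobe_flash_1', 'flash_superhero']
```

With `lam=0` the procedure falls back to a pure relevance ranking. Each step costs one pass over the remaining candidates, which is why greedy schemes are attractive when the exact problem is NP-hard.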
How to evaluate diversity in
         search
Data set for the evaluation
                     Full text
                 TREC Interactive
   Top results from commercial search engine

               Structured data
      Taxonomies (Open Directory Project)

                 Ground truth
        Wikipedia disambiguation pages
   Judgements from Amazon Mechanical Turk

Task-specific standard datasets are needed
Benchmarks
Adaptation from existing metrics:

Alpha-nDCG: normalized discounted cumulative gain
Subtopic recall and precision: number of subtopics covered
User intent: results distribution should reflect what the user is asking for
Comparison against the optimum
Alpha-nDCG

  Based on information nuggets (Answer to a
                 question)

A document is relevant when it contains a nugget
              needed by the user

 Quality of results graded by human assessors

    The more nuggets the result set contains, the better
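
A minimal Python sketch of how alpha-nDCG can be computed from nugget judgements. It assumes binary judgements (a document either contains a nugget or not), a log2 rank discount, and a greedily built ideal ranking used for normalization; the function names and the data layout are hypothetical.

```python
import math


def alpha_dcg(ranking, nuggets, alpha=0.5, depth=None):
    """Discounted gain where a nugget seen m times before is worth (1 - alpha) ** m."""
    seen = {}
    score = 0.0
    for rank, doc in enumerate(ranking[:depth], start=1):
        doc_nuggets = nuggets.get(doc, set())
        gain = sum((1 - alpha) ** seen.get(n, 0) for n in doc_nuggets)
        for n in doc_nuggets:
            seen[n] = seen.get(n, 0) + 1
        score += gain / math.log2(1 + rank)
    return score


def greedy_ideal(nuggets, alpha=0.5, depth=None):
    """Approximate the ideal ranking by always taking the highest-gain document next."""
    remaining = list(nuggets)
    limit = len(remaining) if depth is None else depth
    seen, ideal = {}, []
    while remaining and len(ideal) < limit:
        best = max(
            remaining,
            key=lambda d: sum((1 - alpha) ** seen.get(n, 0) for n in nuggets[d]),
        )
        ideal.append(best)
        remaining.remove(best)
        for n in nuggets[best]:
            seen[n] = seen.get(n, 0) + 1
    return ideal


def alpha_ndcg(ranking, nuggets, alpha=0.5, depth=None):
    ideal_score = alpha_dcg(greedy_ideal(nuggets, alpha, depth), nuggets, alpha, depth)
    if ideal_score == 0:
        return 0.0
    return alpha_dcg(ranking, nuggets, alpha, depth) / ideal_score


# Hypothetical judgements: d1 and d2 answer the same question, d3 a different one.
nuggets = {"d1": {"n1"}, "d2": {"n1"}, "d3": {"n2"}}
```

With these judgements the ranking `["d1", "d3", "d2"]` scores higher than `["d1", "d2", "d3"]`, because the second copy of nugget `n1` is discounted by `alpha`.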
Subtopic recall and precision

Is the result set exhaustive?
$\text{s-recall at } k = \frac{\text{number of subtopics covered by the first } k \text{ documents}}{\text{total number of subtopics}}$

Is the result set efficient?
$\text{s-precision at } r = \frac{minRank(S_{opt}, r)}{minRank(S, r)}$
Conclusions


Diversification can really improve the quality of search
                       results

There is still work to do to achieve
   good results in all possible scenarios
Open issues

  There is room for improvement in defining new
           diversity types and metrics

Ranking functions should take diversity into account
  from the beginning, in an integrated process

  Datasets to evaluate each notion of diversity
                should be built
References
Minack, E., Demartini, G., Nejdl, W.: Current Approaches to Search Result Diversification. In: Proceedings of ISWC '09

Gollapudi, S., Sharma, A.: An Axiomatic Approach for Result Diversification. In: Proceedings of WWW '09

Zhai, C.X., Cohen, W.W., Lafferty, J.: Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval. In: Proceedings of SIGIR '03

Agrawal, R., Gollapudi, S., Halverson, A., Ieong, S.: Diversifying Search Results. In: Proceedings of WSDM '09

Clough, P., Sanderson, M., Abouammoh, M., Navarro, S., Paramita, M.: Multiple Approaches to Analysing Query Diversity. In: Proceedings of SIGIR '09

Clarke, C.L., Kolla, M., Cormack, G.V., Vechtomova, O., Ashkan, A., Büttcher, S., MacKinnon, I.: Novelty and Diversity in Information Retrieval Evaluation. In: Proceedings of SIGIR '08