Probabilistic Models of Novel
Document Rankings for
Faceted Topic Retrieval
Ben Carterette and Praveen Chandar
Dept. of Computer and Information Science
University of Delaware
Newark, DE
( CIKM '09 )
Date: 2010/05/03
Speaker: Lin, Yi-Jhen
Advisor: Dr. Koh, Jia-Ling
Agenda
Introduction
- Motivation, Goal
Faceted Topic Retrieval
- Task, Evaluation
Faceted Topic Retrieval Models
- 4 kinds of models
Experiment & Results
Conclusion
Introduction - Motivation
Modeling documents as independently
relevant does not necessarily provide the
optimal user experience.
A traditional evaluation measure would reward System1, since it has higher recall.
Introduction - Motivation
Actually, we prefer System2 (since it carries more information).
System2 is better!
Introduction
Novelty and diversity have become part of the definition of relevance and of evaluation measures.
They can be achieved by retrieving documents that are relevant to the query but cover different facets of the topic.
We call this task faceted topic retrieval!
Introduction - Goal
The faceted topic retrieval system must be able to find a small set of documents that covers all of the facets:
3 documents that cover 10 facets are preferable to 5 documents that cover the same 10 facets.
Faceted Topic Retrieval - Task
Define the task in terms of:
Information need: a faceted topic retrieval information need is one that has a set of answers - facets - that are clearly delineated.
How that need is best satisfied: each answer is fully contained within at least one document.
Faceted Topic Retrieval - Task
Information need
Facets (a set of answers):
- invest in next generation technologies
- increase use of renewable energy sources
- invest in renewable energy sources
- double ethanol in gas supply
- shift to biodiesel
- shift to coal
Faceted Topic Retrieval
A query: a short list of keywords
Output: a ranked list of documents D1, D2, ..., Dn that contains as many unique facets as possible.
Faceted Topic Retrieval - Evaluation
S-recall
S-precision
Redundancy
Evaluation – an example for S-recall and S-precision
Total: 10 facets (assume the facets in different documents are non-overlapping)
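With the facets per document known, S-recall can be computed directly. A minimal sketch (assuming each ranked document is represented as a set of facet identifiers; the representation is our own choice for illustration, not from the paper):

```python
def s_recall(ranking, total_facets, k):
    """S-recall at k: fraction of all facets covered by the top-k documents."""
    covered = set()
    for doc_facets in ranking[:k]:
        covered |= set(doc_facets)
    return len(covered) / total_facets

def min_rank(ranking, total_facets, r):
    """Smallest rank k at which S-recall reaches level r (None if never reached)."""
    for k in range(1, len(ranking) + 1):
        if s_recall(ranking, total_facets, k) >= r:
            return k
    return None
```

S-precision at a recall level r can then be taken as the ratio of the optimal ranking's min_rank to the system ranking's min_rank at that level.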
Evaluation – an example for Redundancy
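Redundancy penalizes re-retrieving facets that already appeared higher in the ranking. One simple way to operationalize the counting (an illustrative definition of ours, not necessarily the paper's exact formula):

```python
def redundancy(ranking, k):
    """Fraction of facet occurrences in the top-k that repeat an already-seen facet."""
    seen, total, redundant = set(), 0, 0
    for doc_facets in ranking[:k]:
        for f in doc_facets:
            total += 1
            if f in seen:
                redundant += 1
            seen.add(f)
    return redundant / total if total else 0.0
```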
Faceted topic retrieval models
4 kinds of models
- MMR (Maximal Marginal Relevance)
- Probabilistic Interpretation of MMR
- Greedy Result Set Pruning
- A Probabilistic Set-Based Approach
1. MMR
2. Probabilistic Interpretation of MMR
Let c1=0, c3=c4
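MMR greedily selects, at each step, the document that trades off query relevance against the maximum similarity to the already-selected documents. A sketch (rel and sim are placeholder scoring functions, not the paper's estimators):

```python
def mmr_rerank(docs, rel, sim, lam):
    """Greedy MMR: at each step pick the document maximizing
    lam * relevance - (1 - lam) * max similarity to selected docs."""
    selected, remaining = [], list(docs)
    while remaining:
        best = max(remaining, key=lambda d: lam * rel[d]
                   - (1 - lam) * max((sim(d, s) for s in selected), default=0.0))
        selected.append(best)
        remaining.remove(best)
    return selected
```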
3. Greedy Result Set Pruning
First, rank without considering novelty (in order of relevance).
Second, step down the list of documents and prune documents with similarity greater than some threshold ϴ:
i.e., at rank i, remove any document Dj, j > i, with sim(Dj, Di) > ϴ.
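The pruning rule above amounts to a single pass over the relevance-ordered list. A minimal sketch (sim is a placeholder similarity function):

```python
def prune(ranking, sim, theta):
    """Keep documents in relevance order, dropping any document whose
    similarity to an already-kept document exceeds theta."""
    kept = []
    for d in ranking:
        if all(sim(d, k) <= theta for k in kept):
            kept.append(d)
    return kept
```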
4. A Probabilistic Set-Based Approach
P(F ϵ D): the probability that the document set D contains facet F.
The probability that a facet Fj occurs in at least one document in a set D is
P(Fj ϵ D) = 1 − ∏_{Di ϵ D} (1 − P(Fj ϵ Di))
The probability that all of the facets in a set F are captured by the documents D is
P(F ⊆ D) = ∏_j P(Fj ϵ D)
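Once per-document facet probabilities are available, both quantities are direct products. A sketch, with p given as one row of probabilities per facet (row j holds P(Fj ϵ Di) for each document Di in D):

```python
import math

def p_facet_in_set(p_ji):
    """P(Fj in D) = 1 - prod_i (1 - P(Fj in Di)) for one facet over the docs in D."""
    return 1 - math.prod(1 - p for p in p_ji)

def p_all_facets(p):
    """P(F subset of D): product over facets, treating facet coverage as independent."""
    return math.prod(p_facet_in_set(row) for row in p)
```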
4. A Probabilistic Set-Based Approach
4.1 Hypothesizing Facets
4.2 Estimating Document-Facet
Probabilities
4.3 Maximizing Likelihood
4.1 Hypothesizing Facets
Two unsupervised probabilistic methods:
- Relevance modeling
- Topic modeling with LDA
Instead of extracting facets directly as a particular word or phrase, we build a "facet model" P(w|F).
4.1 Hypothesizing Facets
Since we do not know the facet
terms or the set of documents
relevant to the facet, we will
estimate them from the retrieved
documents
Obtain m models from the top m
retrieved documents by taking each
document along with its k nearest
neighbors as the basis for a facet
model
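The neighbor-pooling step above can be sketched as follows, using simple maximum-likelihood term frequencies for P(w|F) (the paper estimates these with a relevance model instead; all names here are illustrative):

```python
from collections import Counter

def facet_models(docs, neighbors, m):
    """Build m facet models P(w|F): one per top-ranked document, pooled
    with its nearest neighbors, estimated by relative term frequency."""
    models = []
    for i in range(m):
        group = [i] + list(neighbors[i])          # document plus its k nearest neighbors
        counts = Counter(w for j in group for w in docs[j])
        total = sum(counts.values())
        models.append({w: c / total for w, c in counts.items()})
    return models
```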
Relevance modeling
Estimate m "facet models" P(w|Fj) from a set of retrieved documents using the so-called RM2 approach:
DFj : the set of documents relevant to facet Fj
fk : facet terms
Topic modeling with LDA
The probabilities P(w|Fj) and P(Fj) can be found through expectation maximization.
4.2 Estimating Document-Facet Probabilities
Both the facet relevance model and the LDA model produce generative probabilities P(Di|Fj):
P(Di|Fj) is the probability that sampling terms from the facet model Fj will produce document Di.
4.3 Maximizing Likelihood
Define the likelihood function.
Constraint:
K : the hypothesized minimum number of documents required to cover the facets
Maximizing L(y) is an NP-hard problem.
Approximate solution: for each facet Fj, take the document Di with maximum P(Di|Fj).
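The approximate solution reduces to one greedy pick per facet, with duplicate picks collapsing into a single document. A sketch (a straightforward reading of the rule above; tie-breaking and ordering details are our own):

```python
def greedy_cover(p_doc_given_facet):
    """Approximate the NP-hard cover: for each facet Fj, take the document Di
    with maximum P(Di|Fj); repeated picks collapse into one document."""
    chosen = []
    for facet, doc_probs in p_doc_given_facet.items():
        best = max(doc_probs, key=doc_probs.get)
        if best not in chosen:
            chosen.append(best)
    return chosen
```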
Experiment - Data
A query: a short list of keywords
The top 130 documents (D1, D2, ..., D130) are retrieved with a query-likelihood language model.
Experiment - Data
The top 130 retrieved documents (D1, D2, ..., D130) are judged by 2 assessors.
For 60 queries:
- 44.7 relevant documents per query
- each document contains 4.3 facets
- 39.2 unique facets on average (roughly one unique facet per relevant document)
- Agreement: 72% of all relevant documents were judged relevant by both assessors
Experiment - Data
TDT5 sample topic definition: query and judgments
Experiment – Retrieval Engines
Using the Lemur toolkit:
- LM baseline: a query-likelihood language model
- RM baseline: pseudo-relevance feedback with a relevance model
- MMR: query-similarity scores from the LM baseline and cosine similarity for novelty
- AvgMix (Prob. MMR): the probabilistic MMR model using query-likelihood scores from the LM baseline and the AvgMix novelty score
- Pruning: removing documents from the LM baseline based on cosine similarity
- FM: the set-based facet model
Experiment – Retrieval Engines
FM: the set-based facet model
- FM-RM: each of the top m documents and its k nearest neighbors becomes a "facet model" P(w|Fj); then compute the probability P(Di|Fj)
- FM-LDA: use LDA to discover subtopics zj and obtain P(zj|D); we extract 50 subtopics
Experiments - Evaluation
Use five-fold cross-validation to train and test systems:
- 48 queries in four folds to train model parameters
- those parameters are used to obtain ranked results on the remaining 12 queries
At the minimum optimal rank, we report S-recall, redundancy, and MAP.
Results
Conclusion
We defined a type of novelty retrieval task called faceted topic retrieval: retrieve the facets of an information need in a small set of documents.
We presented two novel models: one that prunes a retrieval ranking, and one formally motivated probabilistic model.
Both models are competitive with MMR, and outperform another probabilistic model.
More Related Content

What's hot

Boolean,vector space retrieval Models
Boolean,vector space retrieval Models Boolean,vector space retrieval Models
Boolean,vector space retrieval Models
Primya Tamil
 
Tdm probabilistic models (part 2)
Tdm probabilistic  models (part  2)Tdm probabilistic  models (part  2)
Tdm probabilistic models (part 2)KU Leuven
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information Retrieval
Bhaskar Mitra
 
Analytical learning
Analytical learningAnalytical learning
Analytical learning
swapnac12
 
Neural Information Retrieval: In search of meaningful progress
Neural Information Retrieval: In search of meaningful progressNeural Information Retrieval: In search of meaningful progress
Neural Information Retrieval: In search of meaningful progress
Bhaskar Mitra
 
Probabilistic models (part 1)
Probabilistic models (part 1)Probabilistic models (part 1)
Probabilistic models (part 1)KU Leuven
 
Inductive analytical approaches to learning
Inductive analytical approaches to learningInductive analytical approaches to learning
Inductive analytical approaches to learning
swapnac12
 
Neural Models for Document Ranking
Neural Models for Document RankingNeural Models for Document Ranking
Neural Models for Document Ranking
Bhaskar Mitra
 
Survey of Generative Clustering Models 2008
Survey of Generative Clustering Models 2008Survey of Generative Clustering Models 2008
Survey of Generative Clustering Models 2008
Roman Stanchak
 
G04124041046
G04124041046G04124041046
G04124041046
IOSR-JEN
 
Information retrieval 7 boolean model
Information retrieval 7 boolean modelInformation retrieval 7 boolean model
Information retrieval 7 boolean model
Vaibhav Khanna
 
Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)
Bhaskar Mitra
 
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTEA FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
ijnlc
 
Information Retrieval 02
Information Retrieval 02Information Retrieval 02
Information Retrieval 02
Jeet Das
 
Topic Extraction on Domain Ontology
Topic Extraction on Domain OntologyTopic Extraction on Domain Ontology
Topic Extraction on Domain Ontology
Keerti Bhogaraju
 
Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Sean Golliher
 
Ir 02
Ir   02Ir   02
Ir 09
Ir   09Ir   09
Ir 03
Ir   03Ir   03
The effect of number of concepts on readability of schemas 2
The effect of number of concepts on readability of schemas 2The effect of number of concepts on readability of schemas 2
The effect of number of concepts on readability of schemas 2
Saman Sara
 

What's hot (20)

Boolean,vector space retrieval Models
Boolean,vector space retrieval Models Boolean,vector space retrieval Models
Boolean,vector space retrieval Models
 
Tdm probabilistic models (part 2)
Tdm probabilistic  models (part  2)Tdm probabilistic  models (part  2)
Tdm probabilistic models (part 2)
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information Retrieval
 
Analytical learning
Analytical learningAnalytical learning
Analytical learning
 
Neural Information Retrieval: In search of meaningful progress
Neural Information Retrieval: In search of meaningful progressNeural Information Retrieval: In search of meaningful progress
Neural Information Retrieval: In search of meaningful progress
 
Probabilistic models (part 1)
Probabilistic models (part 1)Probabilistic models (part 1)
Probabilistic models (part 1)
 
Inductive analytical approaches to learning
Inductive analytical approaches to learningInductive analytical approaches to learning
Inductive analytical approaches to learning
 
Neural Models for Document Ranking
Neural Models for Document RankingNeural Models for Document Ranking
Neural Models for Document Ranking
 
Survey of Generative Clustering Models 2008
Survey of Generative Clustering Models 2008Survey of Generative Clustering Models 2008
Survey of Generative Clustering Models 2008
 
G04124041046
G04124041046G04124041046
G04124041046
 
Information retrieval 7 boolean model
Information retrieval 7 boolean modelInformation retrieval 7 boolean model
Information retrieval 7 boolean model
 
Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)
 
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTEA FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
 
Information Retrieval 02
Information Retrieval 02Information Retrieval 02
Information Retrieval 02
 
Topic Extraction on Domain Ontology
Topic Extraction on Domain OntologyTopic Extraction on Domain Ontology
Topic Extraction on Domain Ontology
 
Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)
 
Ir 02
Ir   02Ir   02
Ir 02
 
Ir 09
Ir   09Ir   09
Ir 09
 
Ir 03
Ir   03Ir   03
Ir 03
 
The effect of number of concepts on readability of schemas 2
The effect of number of concepts on readability of schemas 2The effect of number of concepts on readability of schemas 2
The effect of number of concepts on readability of schemas 2
 

Similar to Probabilistic Models of Novel Document Rankings for Faceted Topic Retrieval

Document ranking using qprp with concept of multi dimensional subspace
Document ranking using qprp with concept of multi dimensional subspaceDocument ranking using qprp with concept of multi dimensional subspace
Document ranking using qprp with concept of multi dimensional subspace
Prakash Dubey
 
Tanvi Motwani- A Few Examples Go A Long Way
Tanvi Motwani- A Few Examples Go A Long WayTanvi Motwani- A Few Examples Go A Long Way
Tanvi Motwani- A Few Examples Go A Long WayTanvi Motwani
 
Probablistic information retrieval
Probablistic information retrievalProbablistic information retrieval
Probablistic information retrieval
Nisha Arankandath
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information Retrieval
Nik Spirin
 
Topic modeling of marketing scientific papers: An experimental survey
Topic modeling of marketing scientific papers: An experimental surveyTopic modeling of marketing scientific papers: An experimental survey
Topic modeling of marketing scientific papers: An experimental survey
ICDEcCnferenece
 
probabilistic ranking
probabilistic rankingprobabilistic ranking
probabilistic rankingFELIX75
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligencevini89
 
Information extraction for Free Text
Information extraction for Free TextInformation extraction for Free Text
Information extraction for Free Textbutest
 
4-IR Models_new.ppt
4-IR Models_new.ppt4-IR Models_new.ppt
4-IR Models_new.ppt
BereketAraya
 
4-IR Models_new.ppt
4-IR Models_new.ppt4-IR Models_new.ppt
4-IR Models_new.ppt
BereketAraya
 
EFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATA
EFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATAEFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATA
EFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATA
csandit
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptbutest
 
Building Learning to Rank (LTR) search reranking models using Large Language ...
Building Learning to Rank (LTR) search reranking models using Large Language ...Building Learning to Rank (LTR) search reranking models using Large Language ...
Building Learning to Rank (LTR) search reranking models using Large Language ...
Sujit Pal
 
Information retrieval as statistical translation
Information retrieval as statistical translationInformation retrieval as statistical translation
Information retrieval as statistical translation
Bhavesh Singh
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxKalpit Desai
 
BoTLRet: A Template-based Linked Data Information Retrieval
 BoTLRet: A Template-based Linked Data Information Retrieval BoTLRet: A Template-based Linked Data Information Retrieval
BoTLRet: A Template-based Linked Data Information Retrieval
National Inistitute of Informatics (NII), Tokyo, Japann
 
Models for Information Retrieval and Recommendation
Models for Information Retrieval and RecommendationModels for Information Retrieval and Recommendation
Models for Information Retrieval and Recommendation
Arjen de Vries
 
Presentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data MiningPresentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data Miningbutest
 
Topic Models Based Personalized Spam Filter
Topic Models Based Personalized Spam FilterTopic Models Based Personalized Spam Filter
Topic Models Based Personalized Spam Filter
Sudarsun Santhiappan
 

Similar to Probabilistic Models of Novel Document Rankings for Faceted Topic Retrieval (20)

Document ranking using qprp with concept of multi dimensional subspace
Document ranking using qprp with concept of multi dimensional subspaceDocument ranking using qprp with concept of multi dimensional subspace
Document ranking using qprp with concept of multi dimensional subspace
 
Tanvi Motwani- A Few Examples Go A Long Way
Tanvi Motwani- A Few Examples Go A Long WayTanvi Motwani- A Few Examples Go A Long Way
Tanvi Motwani- A Few Examples Go A Long Way
 
Probablistic information retrieval
Probablistic information retrievalProbablistic information retrieval
Probablistic information retrieval
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information Retrieval
 
Topic modeling of marketing scientific papers: An experimental survey
Topic modeling of marketing scientific papers: An experimental surveyTopic modeling of marketing scientific papers: An experimental survey
Topic modeling of marketing scientific papers: An experimental survey
 
probabilistic ranking
probabilistic rankingprobabilistic ranking
probabilistic ranking
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
 
Topicmodels
TopicmodelsTopicmodels
Topicmodels
 
Information extraction for Free Text
Information extraction for Free TextInformation extraction for Free Text
Information extraction for Free Text
 
4-IR Models_new.ppt
4-IR Models_new.ppt4-IR Models_new.ppt
4-IR Models_new.ppt
 
4-IR Models_new.ppt
4-IR Models_new.ppt4-IR Models_new.ppt
4-IR Models_new.ppt
 
EFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATA
EFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATAEFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATA
EFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATA
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
 
Building Learning to Rank (LTR) search reranking models using Large Language ...
Building Learning to Rank (LTR) search reranking models using Large Language ...Building Learning to Rank (LTR) search reranking models using Large Language ...
Building Learning to Rank (LTR) search reranking models using Large Language ...
 
Information retrieval as statistical translation
Information retrieval as statistical translationInformation retrieval as statistical translation
Information retrieval as statistical translation
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptx
 
BoTLRet: A Template-based Linked Data Information Retrieval
 BoTLRet: A Template-based Linked Data Information Retrieval BoTLRet: A Template-based Linked Data Information Retrieval
BoTLRet: A Template-based Linked Data Information Retrieval
 
Models for Information Retrieval and Recommendation
Models for Information Retrieval and RecommendationModels for Information Retrieval and Recommendation
Models for Information Retrieval and Recommendation
 
Presentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data MiningPresentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data Mining
 
Topic Models Based Personalized Spam Filter
Topic Models Based Personalized Spam FilterTopic Models Based Personalized Spam Filter
Topic Models Based Personalized Spam Filter
 

More from YI-JHEN LIN

Adaptive relevance feedback in information retrieval
Adaptive relevance feedback in information retrievalAdaptive relevance feedback in information retrieval
Adaptive relevance feedback in information retrieval
YI-JHEN LIN
 
BioSnowball_Automated Population of Wikis
BioSnowball_Automated Population of WikisBioSnowball_Automated Population of Wikis
BioSnowball_Automated Population of Wikis
YI-JHEN LIN
 
PQC_Personalized Query Classification
PQC_Personalized Query ClassificationPQC_Personalized Query Classification
PQC_Personalized Query Classification
YI-JHEN LIN
 
Query Dependent Pseudo-Relevance Feedback based on Wikipedia
Query Dependent Pseudo-Relevance Feedback based on WikipediaQuery Dependent Pseudo-Relevance Feedback based on Wikipedia
Query Dependent Pseudo-Relevance Feedback based on Wikipedia
YI-JHEN LIN
 
MrKNN_Soft Relevance for Multi-label Classification
MrKNN_Soft Relevance for Multi-label ClassificationMrKNN_Soft Relevance for Multi-label Classification
MrKNN_Soft Relevance for Multi-label Classification
YI-JHEN LIN
 
Visual Summarization of Web Pages
Visual Summarization of Web PagesVisual Summarization of Web Pages
Visual Summarization of Web Pages
YI-JHEN LIN
 

More from YI-JHEN LIN (6)

Adaptive relevance feedback in information retrieval
Adaptive relevance feedback in information retrievalAdaptive relevance feedback in information retrieval
Adaptive relevance feedback in information retrieval
 
BioSnowball_Automated Population of Wikis
BioSnowball_Automated Population of WikisBioSnowball_Automated Population of Wikis
BioSnowball_Automated Population of Wikis
 
PQC_Personalized Query Classification
PQC_Personalized Query ClassificationPQC_Personalized Query Classification
PQC_Personalized Query Classification
 
Query Dependent Pseudo-Relevance Feedback based on Wikipedia
Query Dependent Pseudo-Relevance Feedback based on WikipediaQuery Dependent Pseudo-Relevance Feedback based on Wikipedia
Query Dependent Pseudo-Relevance Feedback based on Wikipedia
 
MrKNN_Soft Relevance for Multi-label Classification
MrKNN_Soft Relevance for Multi-label ClassificationMrKNN_Soft Relevance for Multi-label Classification
MrKNN_Soft Relevance for Multi-label Classification
 
Visual Summarization of Web Pages
Visual Summarization of Web PagesVisual Summarization of Web Pages
Visual Summarization of Web Pages
 

Recently uploaded

Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 

Recently uploaded (20)

Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 

Probabilistic Models of Novel Document Rankings for Faceted Topic Retrieval

  • 1. Probabilistic Models of NovelProbabilistic Models of Novel Document Rankings forDocument Rankings for Faceted Topic RetrievalFaceted Topic Retrieval Ben Cartrette and Praveen Chandar Dept. of Computer and Information Science University of Delaware Newark, DE ( CIKM ’09 ) Date: 2010/05/03 Speaker: Lin, Yi-Jhen Advisor: Dr. Koh, Jia-Ling
  • 2. AgendaAgenda Introduction - Motivation, Goal Faceted Topic Retrieval - Task, Evaluation Faceted Topic Retrieval Models - 4 kinds of models Experiment & Results Conclusion
  • 3. Introduction - MotivationIntroduction - Motivation Modeling documents as independently relevant does not necessarily provide the optimal user experience.
  • 4. Traditional evaluation measure would reward System1 since it has higher recall Introduction - MotivationIntroduction - Motivation Actually, we prefer System2 (since it has more information) System2 is better !
  • 5. IntroductionIntroduction Novelty and diversity become the new definition of relevance and evaluation measures . They can be achieved through retrieving documents that are relevant to query, but cover different facets of the topic. we call faceted topic retrieval !
  • 6. Introduction - GoalIntroduction - Goal The faceted topic retrieval system must be able to find a small set of documents that covers all of the facets 3 documents that cover 10 facets is preferable to 5 documents that cover 10 facets
  • 7. Faceted Topic Retrieval - TaskFaceted Topic Retrieval - Task Define the task in terms of Information need : A faceted topic retrieval information need is one that has a set of answers – facets – that are clearly delineated How that need is best satisfied : Each answer is fully contained within at least one document
  • 8. Faceted Topic Retrieval - TaskFaceted Topic Retrieval - Task Information need invest in next generation technologies increase use of renewable energy sources Invest in renewable energy sources double ethanol in gas supply shift to biodiesel shift to coal Facets (a set of answers)
  • 9. Faceted Topic Retrieval A query: a short list of keywords. Output: a ranked list of documents D1, D2, …, Dn that contains as many unique facets as possible.
  • 10. Faceted Topic Retrieval - Evaluation S-recall S-precision Redundancy
  • 11. Evaluation – an example for S-recall and S-precision Total: 10 facets (assume all facets in documents are non-overlapping)
  • 12. Evaluation – an example for Redundancy
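The two rank-based measures can be made concrete with a small sketch. This is a hypothetical implementation, not the authors' code: `s_recall` follows the standard subtopic-recall definition (facets covered by the top-k documents over all facets), and `redundancy` is one plausible reading of the paper's redundancy measure (the fraction of retrieved facet occurrences that repeat an already-seen facet).

```python
def s_recall(ranked_facets, total_facets, k):
    """Subtopic recall at rank k: fraction of all facets
    covered by the top-k documents."""
    covered = set().union(*ranked_facets[:k])
    return len(covered) / total_facets

def redundancy(ranked_facets, k):
    """Fraction of facet occurrences in the top k that repeat
    an already-seen facet (one plausible formalization)."""
    seen, repeats, occurrences = set(), 0, 0
    for facets in ranked_facets[:k]:
        for f in facets:
            occurrences += 1
            if f in seen:
                repeats += 1
            seen.add(f)
    return repeats / occurrences if occurrences else 0.0

# Toy ranking: each document is the set of facet ids it covers.
system2 = [{1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10}]
print(s_recall(system2, 10, 3))   # 1.0 -- all 10 facets in 3 documents
print(redundancy(system2, 3))     # 0.0 -- no facet repeated
```

In this toy example, 3 documents cover all 10 facets with no repetition, which is exactly the behavior the goal slide asks a faceted topic retrieval system to reward.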
  • 13. Faceted topic retrieval models 4 kinds of models - MMR (Maximal Marginal Relevance) - Probabilistic Interpretation of MMR - Greedy Result Set Pruning - A Probabilistic Set-Based Approach
  • 14. 1. MMR 2. Probabilistic Interpretation of MMR Let c1 = 0, c3 = c4
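A minimal sketch of MMR re-ranking, assuming toy bag-of-words documents and cosine similarity; the `rel` scores here are invented stand-ins for the query-likelihood scores the paper uses:

```python
def cosine(x, y):
    """Cosine similarity between two sparse term-count vectors."""
    num = sum(x.get(t, 0) * y.get(t, 0) for t in x)
    den = (sum(v * v for v in x.values()) ** 0.5) * \
          (sum(v * v for v in y.values()) ** 0.5)
    return num / den if den else 0.0

def mmr_rerank(rel, sim, lam=0.7, k=None):
    """Maximal Marginal Relevance: greedily pick the document that
    maximizes lam * relevance - (1 - lam) * max similarity to the
    documents already selected."""
    candidates, selected = set(rel), []
    k = k or len(rel)
    while candidates and len(selected) < k:
        def score(d):
            novelty_penalty = max((sim(d, s) for s in selected), default=0.0)
            return lam * rel[d] - (1 - lam) * novelty_penalty
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# d1 and d2 are near-duplicates; d3 covers different terms.
docs = {"d1": {"a": 1, "b": 1}, "d2": {"a": 1, "b": 1}, "d3": {"c": 1}}
rel = {"d1": 0.9, "d2": 0.85, "d3": 0.5}
sim = lambda a, b: cosine(docs[a], docs[b])
print(mmr_rerank(rel, sim, lam=0.5))  # ['d1', 'd3', 'd2']
```

With lam = 0.5 the novel d3 is promoted above the duplicate d2; with lam = 1.0 the ranking degenerates to pure relevance order.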
  • 15. 3. Greedy Result Set Pruning First, rank without considering novelty (in order of relevance). Second, step down the list of documents and prune documents with similarity greater than some threshold ϴ, i.e., at rank i, remove any document Dj, j > i, with sim(Dj, Di) > ϴ.
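The pruning procedure is simple enough to sketch directly; `sim` and `theta` are placeholders for the cosine similarity and a tuned threshold, and the toy documents are invented:

```python
def prune_ranking(ranked, sim, theta=0.8):
    """Greedy result-set pruning: walk down the relevance ranking
    and drop any document whose similarity to a document kept at a
    higher rank exceeds theta."""
    kept = []
    for d in ranked:
        if all(sim(d, earlier) <= theta for earlier in kept):
            kept.append(d)
    return kept

# d1 and d2 are duplicates, so d2 is pruned at theta = 0.8.
docs = {"d1": {"a": 1, "b": 1}, "d2": {"a": 1, "b": 1}, "d3": {"c": 1}}

def cosine(x, y):
    num = sum(x.get(t, 0) * y.get(t, 0) for t in x)
    den = (sum(v * v for v in x.values()) ** 0.5) * \
          (sum(v * v for v in y.values()) ** 0.5)
    return num / den if den else 0.0

sim = lambda a, b: cosine(docs[a], docs[b])
print(prune_ranking(["d1", "d2", "d3"], sim, 0.8))  # ['d1', 'd3']
```

Unlike MMR, pruning never reorders the surviving documents; it only removes near-duplicates from an existing relevance ranking.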
  • 16. 4. A Probabilistic Set-Based Approach P(F ∈ D): the probability that D contains F. The probability that a facet Fj occurs in at least one document in a set D is P(Fj ∈ D) = 1 − ∏_{Di ∈ D} (1 − P(Fj ∈ Di)); the probability that all of the facets in a set F are captured by the documents D is P(F ⊆ D) = ∏_j P(Fj ∈ D).
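Under the independence assumptions on this slide, both coverage probabilities can be computed directly. A sketch with made-up document-facet probabilities:

```python
from math import prod

def p_facet_covered(p_doc_facet, doc_set, j):
    """P(Fj in D): probability that facet j occurs in at least one
    document of the set, assuming independence across documents:
    1 - prod_i (1 - P(Fj in Di))."""
    return 1.0 - prod(1.0 - p_doc_facet[d][j] for d in doc_set)

def p_all_covered(p_doc_facet, doc_set, facets):
    """P(F subset of D): probability that every facet in F is
    covered by the document set, assuming independence across facets."""
    return prod(p_facet_covered(p_doc_facet, doc_set, j) for j in facets)

# p_doc_facet[d][j] = estimated P(facet j occurs in document d).
p = {"d1": {0: 0.9, 1: 0.1}, "d2": {0: 0.2, 1: 0.8}}
print(p_all_covered(p, ["d1", "d2"], [0, 1]))  # 0.92 * 0.82 = 0.7544
```

Facet 0 is covered with probability 1 − 0.1·0.8 = 0.92 and facet 1 with 1 − 0.9·0.2 = 0.82, so the pair covers both facets with probability about 0.754.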
  • 17. 4. A Probabilistic Set-Based Approach 4.1 Hypothesizing Facets 4.2 Estimating Document-Facet Probabilities 4.3 Maximizing Likelihood
  • 18. 4.1 Hypothesizing Facets Two unsupervised probabilistic methods: relevance modeling and topic modeling with LDA. Instead of extracting facets directly from any particular word or phrase, we build a "facet model" P(w|F).
  • 19. 4.1 Hypothesizing Facets Since we do not know the facet terms or the set of documents relevant to each facet, we estimate them from the retrieved documents: obtain m models from the top m retrieved documents by taking each document along with its k nearest neighbors as the basis for a facet model.
  • 20. Relevance modeling Estimate m "facet models" P(w|Fj) from a set of retrieved documents using the so-called RM2 approach, where DFj is the set of documents relevant to facet Fj and the fk are facet terms.
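To illustrate facet-model estimation, here is a deliberately simplified sketch: an RM1-style average of unsmoothed document language models over the documents DFj assumed relevant to a facet. The paper uses the more involved RM2 estimator, so this is an illustrative stand-in, and the toy documents are invented:

```python
from collections import Counter

def facet_model(docs, doc_set):
    """Simplified relevance-model estimate of P(w|Fj): average the
    maximum-likelihood language models of the documents in DFj.
    (RM1-style stand-in for the paper's RM2 estimator.)"""
    model = Counter()
    for d in doc_set:
        counts = Counter(docs[d])
        total = sum(counts.values())
        for w, c in counts.items():
            model[w] += (c / total) / len(doc_set)
    return dict(model)

# Hypothetical facet: a document plus its nearest neighbor.
docs = {"d1": "solar wind solar".split(),
        "d2": "wind turbine power".split()}
m = facet_model(docs, ["d1", "d2"])
print(m)  # a distribution over facet terms, summing to 1
```

The output is a term distribution P(w|Fj) that concentrates mass on terms shared across the facet's documents ("wind" here), which is the behavior the facet models are meant to capture.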
  • 21. Topic modeling with LDA The probabilities P(w|Fj) and P(Fj) can be found through expectation maximization.
  • 22. 4.2 Estimating Document-Facet Probabilities Both the facet relevance model and the LDA model produce generation probabilities P(Di|Fj): the probability that sampling terms from the facet model Fj will produce document Di.
  • 23. 4.3 Maximizing Likelihood Define the likelihood function. Constraint: K is the hypothesized minimum number of documents required to cover the facets. Maximizing L(y) is an NP-hard problem. Approximate solution: for each facet Fj, take the document Di with the maximum estimated probability for that facet.
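The approximate solution can be sketched as a greedy per-facet selection. Using P(Fj ∈ Di) as the per-facet score is an assumption here (the slide does not spell out the exact quantity), and the probability table is invented:

```python
def greedy_cover(p_doc_facet, facets):
    """Greedy approximation to the NP-hard set-likelihood
    maximization: for each facet Fj, pick the document with the
    highest probability of containing it; the union of those picks
    is the retrieved set."""
    selected = []
    for j in facets:
        best = max(p_doc_facet, key=lambda d: p_doc_facet[d].get(j, 0.0))
        if best not in selected:
            selected.append(best)
    return selected

# p[d][j] = estimated P(facet j occurs in document d).
p = {"d1": {0: 0.9, 1: 0.1, 2: 0.2},
     "d2": {0: 0.3, 1: 0.8, 2: 0.1},
     "d3": {0: 0.1, 1: 0.2, 2: 0.7}}
print(greedy_cover(p, [0, 1, 2]))  # ['d1', 'd2', 'd3']
```

Because documents already selected for one facet are reused for free, the resulting set tends to be small, matching the goal of covering all facets in as few documents as possible.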
  • 24. Experiment - Data A query: a short list of keywords. The top 130 documents D1, …, D130 are retrieved by a query-likelihood language model.
  • 25. Experiment - Data For 60 queries, 2 assessors judged the top 130 retrieved documents: 44.7 relevant documents per query; each document contains 4.3 facets; 39.2 unique facets on average (roughly one unique facet per relevant document). Agreement: 72% of all relevant documents were judged relevant by both assessors.
  • 26. Experiment - Data TDT5 sample topic definition (query and judgments)
  • 27. Experiment - Retrieval Engines Using the Lemur toolkit  LM baseline: a query-likelihood language model  RM baseline: pseudo-feedback with a relevance model  MMR: query-similarity scores from the LM baseline and cosine similarity for novelty  AvgMix (Prob MMR): the probabilistic MMR model using query-likelihood scores from the LM baseline and the AvgMix novelty score  Pruning: removing documents from the LM baseline based on cosine similarity  FM: the set-based facet model
  • 28. Experiment - Retrieval Engines FM: the set-based facet model  FM-RM: each of the top m documents and its k nearest neighbors becomes a "facet model" P(w|Fj); then compute the probability P(Di|Fj)  FM-LDA: use LDA to discover subtopics zj and get P(zj|D); we extract 50 subtopics
  • 29. Experiments - Evaluation Use five-fold cross-validation to train and test systems: 48 queries in four folds are used to train model parameters, which are then used to obtain ranked results on the remaining 12 queries. At the minimum optimal rank, we report S-recall, redundancy, and MAP.
  • 32. Conclusion We defined a type of novelty retrieval task called faceted topic retrieval: retrieve the facets of an information need in a small set of documents. We presented two novel models: one that prunes a retrieval ranking, and one formally-motivated probabilistic model. Both models are competitive with MMR and outperform another probabilistic model.