Can Short Queries Be Even Shorter?
University of Delaware
Long/Verbose Queries Have Been Extensively Studied
• Example long query here …
However …
Can short queries have a similar property?
Family Leave Law (ROBUST04 qid:648): MAP 0.2725
Family Leave: MAP 0.4679
• A subquery of the short query could be better!
A high-level overview
• A comparison of the best subqueries with the original queries for TREC collections:
Collection   Orig. Queries   Best Subqueries   Diff.
Disk12       0.2597          0.2880            +10.9%
ROBUST04     0.2399          0.2772            +15.5%
AQUAINT      0.2107          0.2426            +15.1%
WT2G         0.3285          0.3580            +9.0%
WT10G        0.1720          0.2051            +19.2%
GOV2         0.3060          0.3221            +5.3%
On Average   0.2528          0.2821            +12.5%
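The Diff. column can be reproduced as a relative improvement over the original query. The sketch below is only an illustration: the dictionary and function names are made up here, and it assumes Diff. means (best - orig) / orig. Under that reading, the "On Average" difference matches the mean of the per-collection gains rather than the gain between the two average scores.

```python
# Scores copied from the overview table: (original queries, best subqueries).
scores = {
    "Disk12": (0.2597, 0.2880),
    "ROBUST04": (0.2399, 0.2772),
    "AQUAINT": (0.2107, 0.2426),
    "WT2G": (0.3285, 0.3580),
    "WT10G": (0.1720, 0.2051),
    "GOV2": (0.3060, 0.3221),
}


def relative_gain(orig: float, best: float) -> float:
    """Relative improvement of `best` over `orig`, in percent (assumed definition)."""
    return (best - orig) / orig * 100.0


# Per-collection gains and their plain average.
gains = {name: relative_gain(orig, best) for name, (orig, best) in scores.items()}
average_gain = sum(gains.values()) / len(gains)

for name, gain in gains.items():
    print(f"{name:<10} {gain:+.1f}%")
print(f"{'Average':<10} {average_gain:+.1f}%")
```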
Question:
Original query: “Family Leave Law”
Best subquery: “Family Leave” ?
• Can we identify those optimal subqueries?
• How do we identify them?
We formulate it as a Subquery Ranking Problem
Original query: Family Leave Law
Subquery           Features   Label (MAP)
Family Leave Law   F          0.2725
Family             F          0.0029
Leave              F          0.2477
Law                F          0.0000
Family Leave       F          0.4679
Leave Law          F          0.0639
Family Law         F          0.0046
Steps: Extract (subquery to features), Learn (features to label)
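A minimal sketch of the candidate generation behind this formulation, assuming a subquery is any non-empty subset of the query terms kept in their original order. The function name is hypothetical. The MAP labels are the ones shown on the slide and are sorted directly here; in the actual setting a learned ranker would have to predict this ordering from features, since labels are unknown at query time.

```python
from itertools import combinations


def enumerate_subqueries(query: str) -> list[str]:
    """All non-empty term subsets of the query, longest first, original term order kept."""
    terms = query.split()
    subqueries = []
    for size in range(len(terms), 0, -1):
        for combo in combinations(terms, size):
            subqueries.append(" ".join(combo))
    return subqueries


# MAP labels from the slide (ROBUST04 qid:648).
map_labels = {
    "Family Leave Law": 0.2725,
    "Family": 0.0029,
    "Leave": 0.2477,
    "Law": 0.0000,
    "Family Leave": 0.4679,
    "Leave Law": 0.0639,
    "Family Law": 0.0046,
}

candidates = enumerate_subqueries("Family Leave Law")
# Oracle ranking by label; a trained model would replace `map_labels` here.
ranked = sorted(candidates, key=lambda q: map_labels[q], reverse=True)

for subquery in ranked:
    print(f"{map_labels[subquery]:.4f}  {subquery}")
```

A query with n terms yields 2^n - 1 candidates, which stays small for short queries.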
Then the key is the features
Family Leave: Features
Previously Proposed Features (for verbose queries)
• Statistical: TF, IDF, Collection TF, Collection IDF, Mutual Information
• Query: Similarity with Orig., Contain Stopwords?
• Post-Retrieval: Query Drift, Query Scope, Clarity Score, Weighted Information Gain
The Problem of Previously Proposed Features
Query terms:  Family   Leave   Law
IDFs:         13.26    12.39   8.98
• Remove the term with the lowest IDF
• When to stop removing?
• Other features do not work well (details in the paper)
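The stopping problem can be seen by running the lowest-IDF removal to the end. This is a sketch of that heuristic only, with a hypothetical function name and the IDF values taken from the slide; it is not presented as the method of any particular prior paper.

```python
# IDF values from the slide.
idf = {"Family": 13.26, "Leave": 12.39, "Law": 8.98}


def idf_reduction_steps(terms: list[str], idf: dict[str, float]) -> list[str]:
    """Repeatedly drop the lowest-IDF term and record every intermediate query."""
    current = list(terms)
    steps = [" ".join(current)]
    while len(current) > 1:
        lowest = min(current, key=lambda term: idf[term])
        current.remove(lowest)
        steps.append(" ".join(current))
    return steps


steps = idf_reduction_steps(["Family", "Leave", "Law"], idf)
for step in steps:
    print(step)
```

The heuristic passes through "Family Leave" (MAP 0.4679) but then continues to "Family" (MAP 0.0029). The IDF values alone give no signal for where to stop.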
New features are proposed to tackle the problem
• Post-retrieval
• Focus on term relationship
• From document-level features to term-level features
• 3 categories of features
  • Term Proximity based Features
  • Term Score based Features
  • Compactness and Positions of Term Score Tensors
Term Proximity based Features (PXM)
• Term Dependency Model [Metzler05]
Family Leave Law
• Already know it is a law code
• Occur together
• In that order
How to capture the feature?
How to Capture PXM?
• Use a proximity query
  #combine(#uw4(family leave) #ow4(family leave))
  #uw4: unordered window of 4; #ow4: ordered window of 4
• Explore the ranking scores
  Proximity ranking scores: 0.5894 0.5632 0.5323 0.4927
  Original ranking scores:  0.6288 0.6109 0.6099 0.5912
• Feature functions over the proximity ranking scores: MIN, MAX, MAX-MIN, MAX/MIN, SUM, MEAN, STD, GMEAN
• Correlation between the proximity ranking scores and the original ranking scores
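A rough sketch of how these two kinds of PXM features could be computed from the score lists on the slide. Several points are assumptions made for illustration: the function names, the use of population standard deviation for STD, the use of Pearson correlation, and the pairing of the two lists document by document. The slide does not specify any of these.

```python
import math
import statistics

# Scores as shown on the slide.
proximity_scores = [0.5894, 0.5632, 0.5323, 0.4927]
original_scores = [0.6288, 0.6109, 0.6099, 0.5912]


def score_features(scores: list[float]) -> dict[str, float]:
    """The eight summary functions named on the slide, applied to one score list."""
    lowest, highest = min(scores), max(scores)
    return {
        "MIN": lowest,
        "MAX": highest,
        "MAX-MIN": highest - lowest,
        "MAX/MIN": highest / lowest,
        "SUM": sum(scores),
        "MEAN": statistics.fmean(scores),
        "STD": statistics.pstdev(scores),  # population std is an assumption
        "GMEAN": statistics.geometric_mean(scores),
    }


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equally long lists (assumed correlation measure)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


features = score_features(proximity_scores)
correlation = pearson(proximity_scores, original_scores)

for name, value in features.items():
    print(f"{name:<8} {value:.4f}")
print(f"corr     {correlation:.4f}")
```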
Term Score based Features (TS)
• TF-IDF Constraint [Fang2011]
  SVM: 99, Tutorial: 1   vs.   SVM: 50, Tutorial: 50
  Counter-intuitive
• We instead look at the term scores…
Term Score based Features (TS)
• We look at the term scores…
• Colors indicate relevance probability
• Queries have different term score distributions:
  • One term is more important
  • Terms are of relatively equal importance
How to Capture TS?
• Explore the ranking scores of terms
Individual term scores:
        Family   Leave    Law      feature func (max)
doc1    0.2123   0.4596   0.0038   0.4596
doc2    0.2346   0.4087   0.0002   0.4087
doc3    0.2016   0.4456   0.0016   0.4456
doc4    0.1946   0.4213   0.0027   0.4213
doc5    0.1942   0.3928   0.0059   0.3928
Final feature: feature func (mean) over the per-document values = 0.4256
Feature funcs: MIN, MAX, MAX-MIN, MAX/MIN, SUM, MEAN, STD, GMEAN
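The two-step aggregation on this slide (one feature function across the terms of each document, then a second across documents) can be sketched as follows. The function and dictionary names are hypothetical, the column-to-term assignment follows the order shown on the slide, and population standard deviation is assumed for STD.

```python
import statistics

# Per-document term scores from the slide, in the order (Family, Leave, Law).
term_scores = {
    "doc1": [0.2123, 0.4596, 0.0038],
    "doc2": [0.2346, 0.4087, 0.0002],
    "doc3": [0.2016, 0.4456, 0.0016],
    "doc4": [0.1946, 0.4213, 0.0027],
    "doc5": [0.1942, 0.3928, 0.0059],
}

# The eight feature functions listed on the slide.
FEATURE_FUNCS = {
    "MIN": min,
    "MAX": max,
    "MAX-MIN": lambda xs: max(xs) - min(xs),
    "MAX/MIN": lambda xs: max(xs) / min(xs),
    "SUM": sum,
    "MEAN": statistics.fmean,
    "STD": statistics.pstdev,  # population std is an assumption
    "GMEAN": statistics.geometric_mean,
}


def ts_feature(scores_by_doc: dict[str, list[float]], per_doc: str, across_docs: str) -> float:
    """Apply `per_doc` to each document's term scores, then `across_docs` to the results."""
    per_doc_values = [FEATURE_FUNCS[per_doc](scores) for scores in scores_by_doc.values()]
    return FEATURE_FUNCS[across_docs](per_doc_values)


feature = ts_feature(term_scores, "MAX", "MEAN")
print(f"TS(MAX, MEAN) = {feature:.4f}")
```

Each pair of function names gives one feature, so eight functions at both levels yield up to 64 TS features per subquery under this reading.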
Compactness and Positions of Term Score Tensors (TCP)
• Normalized Query Commitment (NQC) [Shtok2012]
Document ranking scores:
  0.5894 0.5632 0.5323 0.4927
  0.6678 0.5632 0.4896 (larger gaps)
Quote: “Higher deviation value was correlated with potentially lower query drift, and thus indicating the better effectiveness”
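The "larger gap" intuition can be checked on the two score lists from the slide by comparing their standard deviations. This is only a sketch of the deviation part: the published NQC also normalizes by a corpus-level score, which the slide does not show, so that step is omitted here. The variable and function names are hypothetical, and population standard deviation is an assumption.

```python
import statistics

# The two document ranking score lists shown on the slide.
tight_scores = [0.5894, 0.5632, 0.5323, 0.4927]
spread_scores = [0.6678, 0.5632, 0.4896]


def score_deviation(scores: list[float]) -> float:
    """Standard deviation of retrieval scores (corpus normalization omitted)."""
    return statistics.pstdev(scores)


tight_dev = score_deviation(tight_scores)
spread_dev = score_deviation(spread_scores)

print(f"tight list:  {tight_dev:.4f}")
print(f"spread list: {spread_dev:.4f}")
```

The list with the larger gaps has the higher deviation, which is the property the quote associates with lower query drift.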
Compactness and Positions of Term Score Tensors (TCP)
• We instead look at the term scores…
• Term scores as tensors in multi-dimensional space
• Best subquery has more compact tensors
• But clustered at different locations
(Figure: relevant documents vs. non-relevant documents)
Compactness of Tensors
• Mean and standard deviation of the distances between the tensors and their centroid
Tensor Closeness to Diagonal (CDG)
• The distance from the tensors' centroid to the diagonal line in multi-dimensional space
• Mean and standard deviation of the distances from the tensors to the diagonal line
Tensor Closeness to Nearest Axis (CNA)
• The distance from the tensors' centroid to the nearest axis in multi-dimensional space
• Mean and standard deviation of the distances from the tensors to the nearest axis
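A geometric sketch of the three TCP feature groups, treating each document's term scores as a point in term space. The assumptions are mine, not the slides': the diagonal is taken as the line through the origin along the all-ones direction, axes pass through the origin, distances are Euclidean, and the standard deviation is the population form. All function and key names are hypothetical, and the sample points reuse the term scores from the TS slide purely as data.

```python
import math
import statistics

Point = list[float]


def centroid(points: list[Point]) -> Point:
    """Coordinate-wise mean of the points."""
    return [statistics.fmean(coords) for coords in zip(*points)]


def distance_to_diagonal(p: Point) -> float:
    """Distance to the line x1 = x2 = ... = xd; the projection is (mean, ..., mean)."""
    mean = statistics.fmean(p)
    return math.sqrt(sum((x - mean) ** 2 for x in p))


def distance_to_nearest_axis(p: Point) -> float:
    """Smallest distance to any coordinate axis through the origin."""
    norm_sq = sum(x * x for x in p)
    return min(math.sqrt(max(norm_sq - x * x, 0.0)) for x in p)


def mean_std(values: list[float]) -> tuple[float, float]:
    return statistics.fmean(values), statistics.pstdev(values)


def tcp_features(points: list[Point]) -> dict[str, float]:
    c = centroid(points)
    tc_mean, tc_std = mean_std([math.dist(p, c) for p in points])
    cdg_mean, cdg_std = mean_std([distance_to_diagonal(p) for p in points])
    cna_mean, cna_std = mean_std([distance_to_nearest_axis(p) for p in points])
    return {
        "TC_mean": tc_mean, "TC_std": tc_std,
        "CDG_centroid": distance_to_diagonal(c), "CDG_mean": cdg_mean, "CDG_std": cdg_std,
        "CNA_centroid": distance_to_nearest_axis(c), "CNA_mean": cna_mean, "CNA_std": cna_std,
    }


# Sample points: (Family, Leave, Law) term scores from the TS slide.
points = [
    [0.2123, 0.4596, 0.0038],
    [0.2346, 0.4087, 0.0002],
    [0.2016, 0.4456, 0.0016],
    [0.1946, 0.4213, 0.0027],
    [0.1942, 0.3928, 0.0059],
]
features = tcp_features(points)
for name, value in features.items():
    print(f"{name:<13} {value:.4f}")
```

Points near the diagonal correspond to terms of roughly equal importance, while points near an axis correspond to one dominant term, which matches the two term score distributions described earlier in the deck.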
Experiments
Collection   #qry   |QL|=2    |QL|=3     |QL|=4
Disk12       150    30(20%)   37(25%)    41(27%)
ROBUST04     250    75(33%)   147(59%)   17(7%)
AQUAINT      50     21(42%)   27(54%)    1(2%)
WT2G         50     24(48%)   23(46%)    0(0%)
WT10G        100    30(30%)   25(25%)    20(20%)
GOV2         150    44(29%)   65(43%)    35(23%)
Keep Drop
Experiments - mapping labels from AP to integer
Experiments - LambdaMART with other features
• Mutual Information (MI)
• Collection Term Frequency (CTF)
• Document Frequency (DF)
• Inverse Document Frequency (IDF)
• Min Document Term Frequency (MINTF) and Max Document Term Frequency (MAXTF)
• Average Document Term Frequency (AVGTF) and Standard Deviation Document Term Frequency (STDTF)
• Average Document Term Frequency with IDF (AVGTFIDF) and with Collection Occurrence Probability (AVGTFCOP)
• Simplified Clarity Score (SCS)
Results
Collection   OG       SR                  UB
Disk12       0.3216   0.3309 (+2.89%)     0.3372 (+4.85%)
ROBUST04     0.2506   0.2566 (+2.39%)     0.2662 (+6.23%)
AQUAINT      0.2063   0.2091 (+1.36%)     0.2184 (+5.87%)
WT2G         0.2983   0.2983 (+0.00%)     0.3083 (+3.35%)
WT10G        0.2544   0.2663 (+4.68%)     0.2738 (+7.63%)

Collection   OG       SR                  UB
Disk12       0.2597   0.2833 (+9.09%)     0.2880 (+10.90%)
ROBUST04     0.2399   0.2643 (+10.17%)    0.2772 (+15.55%)
AQUAINT      0.2107   0.2323 (+10.25%)    0.2426 (+15.14%)
WT2G         0.3285   0.3380 (+2.89%)     0.3580 (+8.98%)
WT10G        0.1720   0.1949 (+13.31%)    0.2051 (+19.24%)
GOV2         0.3060   0.3113 (+1.73%)     0.3221 (+5.26%)
Panel labels: |QL|=2, |QL|=3
Feature Analysis
Feature groups: Basic, PXM, TS, TCP
• Performance difference
• The larger the difference, the more important the feature
Feature Analysis
Basic Features   AVGTFCOP          SCS               CTF
Diff.            0.2294 (-15.5%)   0.2363 (-12.9%)   0.2370 (-12.7%)

TCP              TCP(TC)           TCP(CDG)          TCP(CNA)
Diff.            0.2342 (-14.0%)   0.2359 (-13.6%)   0.2329 (-14.7%)

PXM              PXM(h)            PXM(corr)
Diff.            0.2341 (-14.2%)   0.2364 (-13.3%)

TS               TS1               TS2               TS3
Diff.            0.2337 (-13.5%)   0.2256 (-16.2%)   0.2259 (-16.1%)
TS1: TS(MAX/MIN,SUM); TS2: TS(SUM,SUM); TS3: TS(GMEAN,MEAN)
Related Work – Query Reduction
• Statistical Features
  • TF-IDF based
  • Mutual Information
  • Domain specific
• Query Features
  • Similarity with the original query
• Term Dependency Features
  • Tree-based dependency
• Post-Retrieval Features
  • Query-document relevance scores
  • Weighted Information Gain
  • Query drift
Thank You!
Q & A
Can Short Queries Be Optimized Through Subquery Identification

  • 1. Can Short Queries Be Even Shorter? University of Delaware 1
  • 2. Long/Verbose Queries Had Been Extensively Studied 2 • Example long query here …
  • 3. 3 However … Can short queries have the similar property?
  • 4. 4 Family Leave Law (ROBUST04 qid:648) 0.2725 MAP However … Can short queries have the similar property?
  • 5. 5 Family Leave Law (ROBUST04 qid:648) 0.2725 MAP Family Leave 0.4679 However … Can short queries have the similar property?
  • 6. 6 Family Leave Law (ROBUST04 qid:648) 0.2725 MAP Family Leave 0.4679 However … Can short queries have the similar property? • Subquery of the short query could be better!
  • 7. A high level overview • A comparison between the Best Subqueries with the Original Queries for TREC collections: 7 Collection Orig. Queries Best Subqueries Diff. Disk12 0.2597 0.2880 +10.9% ROBUST04 0.2399 0.2772 +15.5% AQUAINT 0.2107 0.2426 +15.1% WT2G 0.3285 0.3580 +9.0% WT10G 0.1720 0.2051 +19.2% GOV2 0.3060 0.3221 +5.3% On Average 0.2528 0.2821 +12.5%
  • 8. 8 Question: “Family Leave Law” Original Query “Family Leave” Best Subquery ? • Can we identify those optimal subqueries? • How do identify?
  • 9. We formulate it as a Subquery Ranking Problem Family Leave Law 9 Family Leave Law Family Leave Leave Law Family Law F F F F F F F 0.2725 0.0029 0.2477 0.0000 0.4679 0.0639 0.0046 LearnExtract Subquery Features Label(MAP )
  • 10. Then the key is the Features 10 Family Leave Features
  • 11. Previously Proposed Features 11 Previously Proposed Features (for verbose query) Statistical Query Post-Retrieval TF IDF Collection TF Collection IDF Mutual Information Similarity with Orig. Contain Stopwords? Query Drift Query Scope Clarity Score Weighted Information Gain Family Leave Features
  • 12. The Problem of Previously Proposed Features • IDFs: Family 13.26, Leave 12.39, Law 8.98
  • 13. The Problem of Previously Proposed Features • Remove the term with the lowest IDF • IDFs: Family 13.26, Leave 12.39, Law 8.98
  • 14. The Problem of Previously Proposed Features • Remove the term with the lowest IDF • When to stop removing? • IDFs: Family 13.26, Leave 12.39, Law 8.98
  • 15. The Problem of Previously Proposed Features • Remove the term with the lowest IDF • When to stop removing? • Other features do not work well (details in the paper) • IDFs: Family 13.26, Leave 12.39, Law 8.98
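The lowest-IDF heuristic from slides 12 to 15 can be sketched as a greedy loop. This is a minimal Python illustration, assuming the three IDF values belong to Family, Leave, and Law in that order; `greedy_idf_reduction` is a hypothetical name. The loop yields a chain of ever-shorter candidates but nothing in it says which one to keep, which is the stopping problem the slides point out.

```python
from typing import Dict, List


def greedy_idf_reduction(idf: Dict[str, float]) -> List[str]:
    """Repeatedly drop the lowest-IDF term, recording each intermediate query."""
    terms = list(idf)  # dict order preserves the original term order
    chain = [" ".join(terms)]
    while len(terms) > 1:
        lowest = min(terms, key=lambda t: idf[t])
        terms.remove(lowest)
        chain.append(" ".join(terms))
    return chain


# ASSUMPTION: IDFs paired with terms in slide order.
IDF = {"family": 13.26, "leave": 12.39, "law": 8.98}
print(greedy_idf_reduction(IDF))
```

For this query the first removal happens to produce "family leave", but IDF alone gives no signal that the second removal should not be made.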
  • 16. New features are proposed to tackle the problem • Post-retrieval • Focus on term relationship • document level features → term level features
  • 17. New features are proposed to tackle the problem • Post-retrieval • Focus on term relationship • document level features → term level features • 3 categories of features • Term Proximity based Features • Term Score based Features • Compactness and Positions of Term Score Tensors
  • 18. Term Proximity based Features (PXM) • Term Dependency Model [Metzler05] • Family Leave Law
  • 19. Term Proximity based Features (PXM) • Term Dependency Model [Metzler05] • Family Leave Law • Already know it is a law code • Occur together • In that order
  • 20. Term Proximity based Features (PXM) • Term Dependency Model [Metzler05] • Family Leave Law • Already know it is a law code • Occur together • In that order • How to capture the feature?
  • 21. How to Capture PXM? • Use proximity query: #combine(#uw4(family leave) #ow4(family leave)) • #uw4: Unordered Window of 4 • #ow4: Ordered Window of 4
  • 22. How to Capture PXM? • Use proximity query: #combine(#uw4(family leave) #ow4(family leave)) • #uw4: Unordered Window of 4 • #ow4: Ordered Window of 4 • Explore the ranking scores • Proximity ranking scores 0.5894 0.5632 0.5323 0.4927 → MIN, MAX, MAX-MIN, MAX/MIN, SUM, MEAN, STD, GMEAN • Correlation between proximity ranking scores (0.5894 0.5632 0.5323 0.4927) and original ranking scores (0.6288 0.6109 0.6099 0.5912)
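The PXM steps on slides 21 and 22 can be sketched in Python as: build the proximity query string, aggregate the ranking scores it returns, and correlate them with the original query's ranking scores. This is a sketch under stated assumptions rather than the authors' code. The helper names are hypothetical, the operator spelling (`#uw4`, `#ow4`) is copied from the slide, STD is taken as the population standard deviation, Pearson correlation is assumed, and the score lists are the example values from slide 22 (read here as proximity scores first, original scores second).

```python
import math
from typing import Dict, Sequence


def proximity_query(terms: Sequence[str], window: int = 4) -> str:
    """Query string in the slide's notation: unordered plus ordered window."""
    body = " ".join(terms)
    return f"#combine(#uw{window}({body}) #ow{window}({body}))"


def aggregate(scores: Sequence[float]) -> Dict[str, float]:
    """The eight aggregates named on the slide; assumes strictly positive scores."""
    n = len(scores)
    mean = sum(scores) / n
    return {
        "MIN": min(scores),
        "MAX": max(scores),
        "MAX-MIN": max(scores) - min(scores),
        "MAX/MIN": max(scores) / min(scores),
        "SUM": sum(scores),
        "MEAN": mean,
        "STD": math.sqrt(sum((s - mean) ** 2 for s in scores) / n),
        "GMEAN": math.exp(sum(math.log(s) for s in scores) / n),
    }


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length score lists (assumed measure)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


PROXIMITY = [0.5894, 0.5632, 0.5323, 0.4927]  # example values from slide 22
ORIGINAL = [0.6288, 0.6109, 0.6099, 0.5912]

print(proximity_query(["family", "leave"]))
features = aggregate(PROXIMITY)
features["CORR"] = pearson(PROXIMITY, ORIGINAL)
for name, value in features.items():
    print(f"{name:8s} {value:.4f}")
```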
  • 23. Term Score based Features (TS) • TF-IDF Constraint [Fang2011] • Query “SVM Tutorial”: SVM 99, Tutorial 1 vs. SVM 50, Tutorial 50 • Counterintuitive
  • 24. Term Score based Features (TS) • TF-IDF Constraint [Fang2011] • Query “SVM Tutorial”: SVM 99, Tutorial 1 vs. SVM 50, Tutorial 50 • Counterintuitive • We instead look at the term scores…
  • 25. Term Score based Features (TS) • We look at the term scores… • Colors show relevance probability • Queries have different term score distributions: one term is more important, or terms are of relatively equal importance
  • 26. How to Capture TS? • Explore the ranking scores of terms • Individual term scores per document:
        Doc    Family   Leave    Law      feature func (max)
        doc1   0.2123   0.4596   0.0038   0.4596
        doc2   0.2346   0.4087   0.0002   0.4087
        doc3   0.2016   0.4456   0.0016   0.4456
        doc4   0.1946   0.4213   0.0027   0.4213
        doc5   0.1942   0.3928   0.0059   0.3928
      • feature funcs: MIN, MAX, MAX-MIN, MAX/MIN, SUM, MEAN, STD, GMEAN • feature func (mean) over the per-document values gives the Final Feature: 0.4256
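The two-stage aggregation on slide 26 can be reproduced directly: one feature function collapses each document's term scores to a single value, and a second one collapses those values across documents. The Python below is a minimal sketch with a hypothetical helper name, `ts_feature`; the column-to-term assignment (Family, Leave, Law) is assumed from the slide order. With `max` then `mean` it recovers the slide's final value.

```python
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple

# Term scores from slide 26, one tuple per document.
# ASSUMPTION: tuple positions correspond to (Family, Leave, Law).
TERM_SCORES: Dict[str, Tuple[float, float, float]] = {
    "doc1": (0.2123, 0.4596, 0.0038),
    "doc2": (0.2346, 0.4087, 0.0002),
    "doc3": (0.2016, 0.4456, 0.0016),
    "doc4": (0.1946, 0.4213, 0.0027),
    "doc5": (0.1942, 0.3928, 0.0059),
}


def ts_feature(
    term_scores: Dict[str, Sequence[float]],
    per_doc: Callable[[Sequence[float]], float],
    across_docs: Callable[[Sequence[float]], float],
) -> float:
    """Apply `per_doc` within each document, then `across_docs` over the results."""
    return across_docs([per_doc(scores) for scores in term_scores.values()])


value = ts_feature(TERM_SCORES, max, mean)
print(round(value, 4))  # 0.4256, the "Final Feature" shown on the slide
```

Swapping in other pairs from the slide's list (for example `sum` and `sum`) gives the other TS variants.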
  • 27. Compactness and Positions of Term Score Tensors (TCP) • Normalized Query Commitment (NQC) [Shtok2012] • Document ranking scores: 0.5894 0.5632 0.5323 0.4927 vs. 0.6678 0.5632 0.4896 (Larger Gap) • Quote: “Higher deviation value was correlated with potentially lower query drift, and thus indicating the better effectiveness”
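The deviation idea quoted on slide 27 can be illustrated by comparing the spread of the two score lists shown there. The Python below is a rough sketch, not NQC itself: the published measure also normalizes the deviation by a corpus-level score, which is left out here because the slide gives no value for it. Reading the numbers as two separate lists, and using the population standard deviation, are both assumptions.

```python
import math
from typing import Sequence


def score_std(scores: Sequence[float]) -> float:
    """Population standard deviation of a list of document ranking scores."""
    mean = sum(scores) / len(scores)
    return math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))


# Score lists as they appear on slide 27 (grouping assumed).
SMALLER_GAPS = [0.5894, 0.5632, 0.5323, 0.4927]
LARGER_GAPS = [0.6678, 0.5632, 0.4896]

print(f"smaller gaps: {score_std(SMALLER_GAPS):.4f}")
print(f"larger gaps:  {score_std(LARGER_GAPS):.4f}")
```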
  • 28. Compactness and Positions of Term Score Tensors (TCP) • We instead look at the term scores… • Term scores as tensors in multi-dimensional space • Plotted: Relevant Documents vs. NonRelevant Documents
  • 29. Compactness and Positions of Term Score Tensors (TCP) • We instead look at the term scores… • Term scores as tensors in multi-dimensional space • Best subquery has more compact tensors • But clustered at different locations • Plotted: Relevant Documents vs. NonRelevant Documents
  • 30. Compactness of Tensors • Mean and standard deviation of the distances between the tensors and their centroid
  • 31. Tensor Closeness to Diagonal (CDG) • The distance from the tensors' centroid to the diagonal line in multi-dimensional space • Mean and standard deviation of the distances from the tensors to the diagonal line
  • 32. Tensor Closeness to Nearest Axis (CNA) • The distance from the tensors' centroid to the nearest axis in multi-dimensional space • Mean and standard deviation of the distances from the tensors to the nearest axis
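The three geometric quantities on slides 30 to 32 can be sketched by treating each document's term scores as a point with one coordinate per query term. The Python below is an illustration under assumptions the slides do not settle: Euclidean distance, population standard deviation, and hypothetical function names. The sample points reuse the per-document term scores from slide 26 purely as example input.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def mean_std(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and population standard deviation (population form is an assumption)."""
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def centroid(points: Sequence[Point]) -> List[float]:
    return [sum(p[i] for p in points) / len(points) for i in range(len(points[0]))]


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dist_to_diagonal(p: Point) -> float:
    """Distance from p to the line x1 = x2 = ... = xn."""
    t = sum(p) / len(p)  # the orthogonal projection of p onto the diagonal is (t, ..., t)
    return math.sqrt(sum((x - t) ** 2 for x in p))


def dist_to_nearest_axis(p: Point) -> float:
    """Distance from p to the closest coordinate axis."""
    total = sum(x * x for x in p)
    # Distance to axis i is sqrt(total - p[i]^2), so the largest |p[i]| minimizes it.
    return math.sqrt(max(total - max(x * x for x in p), 0.0))


def compactness(points: Sequence[Point]) -> Tuple[float, float]:
    """Mean and std of the distances between the points and their centroid."""
    c = centroid(points)
    return mean_std([euclidean(p, c) for p in points])


def closeness(points: Sequence[Point], dist: Callable[[Point], float]) -> Tuple[float, float, float]:
    """(distance of the centroid, mean distance, std of distances) under `dist`."""
    m, s = mean_std([dist(p) for p in points])
    return dist(centroid(points)), m, s


# Example input only: per-document term scores from slide 26.
POINTS = [
    (0.2123, 0.4596, 0.0038),
    (0.2346, 0.4087, 0.0002),
    (0.2016, 0.4456, 0.0016),
    (0.1946, 0.4213, 0.0027),
    (0.1942, 0.3928, 0.0059),
]

print("compactness:", compactness(POINTS))
print("CDG:", closeness(POINTS, dist_to_diagonal))
print("CNA:", closeness(POINTS, dist_to_nearest_axis))
```

A point near the diagonal has terms of similar score, while a point near an axis is dominated by one term, which matches the two score distributions contrasted earlier in the deck.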
  • 33. Experiments
        Collection   #qry   |QL|=2     |QL|=3      |QL|=4
        Disk12       150    30 (20%)   37 (25%)    41 (27%)
        ROBUST04     250    75 (33%)   147 (59%)   17 (7%)
        AQUAINT      50     21 (42%)   27 (54%)    1 (2%)
        WT2G         50     24 (48%)   23 (46%)    0 (0%)
        WT10G        100    30 (30%)   25 (25%)    20 (20%)
        GOV2         150    44 (29%)   65 (43%)    35 (23%)
      • Keep • Drop
  • 34. Experiments - mapping labels from AP to Integer
  • 35. Experiments - LambdaMART with other features • Mutual Information (MI) • Collection Term Frequency (CTF) • Document Frequency (DF) • Inverse Document Frequency (IDF) • Min Document Term Frequency (MINTF) and Max Document Term Frequency (MAXTF) • Average Document Term Frequency (AVGTF) and Standard Deviation Document Term Frequency (STDTF) • Average Document Term Frequency with IDF (AVGTFIDF) and with Collection Occurrence Probability (AVGTFCOP) • Simplified Clarity Score (SCS)
  • 36. Results
        Collection   OG       SR                  UB
        Disk12       0.3216   0.3309 (+2.89%)     0.3372 (+4.85%)
        ROBUST04     0.2506   0.2566 (+2.39%)     0.2662 (+6.23%)
        AQUAINT      0.2063   0.2091 (+1.36%)     0.2184 (+5.87%)
        WT2G         0.2983   0.2983 (+0.00%)     0.3083 (+3.35%)
        WT10G        0.2544   0.2663 (+4.68%)     0.2738 (+7.63%)
        Collection   OG       SR                  UB
        Disk12       0.2597   0.2833 (+9.09%)     0.2880 (+10.90%)
        ROBUST04     0.2399   0.2643 (+10.17%)    0.2772 (+15.55%)
        AQUAINT      0.2107   0.2323 (+10.25%)    0.2426 (+15.14%)
        WT2G         0.3285   0.3380 (+2.89%)     0.3580 (+8.98%)
        WT10G        0.1720   0.1949 (+13.31%)    0.2051 (+19.24%)
        GOV2         0.3060   0.3113 (+1.73%)     0.3221 (+5.26%)
      • |QL|=2 |QL|=3
  • 37. Feature Analysis • Feature groups compared: Basic, PXM, TS, TCP • Performance Difference • The larger the difference, the more important the feature
  • 38. Feature Analysis
      • Basic Features (Diff.): AVGTFCOP 0.2294 (-15.5%), SCS 0.2363 (-12.9%), CTF 0.2370 (-12.7%)
      • TCP (Diff.): TCP(TC) 0.2342 (-14.0%), TCP(CDG) 0.2359 (-13.6%), TCP(CNA) 0.2329 (-14.7%)
      • PXM (Diff.): PXM(h) 0.2341 (-14.2%), PXM(corr) 0.2364 (-13.3%)
      • TS (Diff.): TS1 0.2337 (-13.5%), TS2 0.2256 (-16.2%), TS3 0.2259 (-16.1%)
      • TS1: TS(MAX/MIN,SUM); TS2: TS(SUM,SUM); TS3: TS(GMEAN,MEAN)
  • 39. Related Work – Query Reduction • Statistical Features: TF-IDF based, Mutual Information, Domain specific • Query Features: Similarity to Original Query • Term Dependency Features: Tree-based dependency • Post Retrieval Features: Query-document Relevance Scores, Weighted Information Gain, Query drift