TPDL'11: Query Operators Shown Beneficial for Improving Search Results

TPDL’11: International Conference on Theory
and Practice of Digital Libraries
September 25-29, Berlin, Germany

Query Operators Shown Beneficial for
Improving Search Results

Gilles Hubert, Guillaume Cabanac,
Christian Sallaberry, Damien Palacio


Outline

1. Context: Operators in Search Queries
2. Methodology: Assessing the effects of query operators
3. Experiments and Results: Potential of effectiveness yielded by operators
4. Conclusion and Future Work



1. Context: Operators in Search Queries

Search Engines Offer Query Operators

Information need
“I’m looking for research projects funded in the DL domain”

[Figure: the information need expressed as a regular query and as a query with operators]

 Various Operators
 Quotation marks, must appear (+), boosting operator (^), Boolean operators, proximity operators…



 Case 1: What designers of search engines may expect



 Case 2: What users of search engines may believe



 Case 3: What designers of search engines may fear


Usage of Query Operators
 Quantitative Studies
[Figure: share of queries with operators (y-axis, 0% to 25%) by year (x-axis, 1999 to 2007). Data points: Altavista [Silverstein et al., 1999], Excite [Jansen et al., 2000], Excite [Spink et al., 2001], Google+MSN Search+Yahoo! [White and Morris, 2007]]

 Possible Explanations
 Unknown features?
 No improvement observed?


 Qualitative Studies
 Users
 Average users not comfortable with “advanced means of searching”
[Jansen et al., 2000]
 Expert users resort to query operators more frequently
[Hölscher and Strube, 2000; Lucas and Topi, 2002; White and Morris, 2007]

 Information Needs
 More used in dedicated search
[Jansen and Pooch, 2001]
 Difficulty in finding information (e.g., complex information needs)
[Aula et al., 2010]

 Appropriateness
 Operators used in a “semantically appropriate manner”
[Eastman and Jansen, 2004]



 Effects of Query Operators on Effectiveness

 Eastman and Jansen studied queries with operators
 Real users: AOL, Google and MSN Search
 Operators: AND, OR, MUST APPEAR and PHRASE

 No statistically significant improvement in P@10

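P@10, the measure named above, is the fraction of relevant documents among the first ten results. A minimal sketch, assuming a ranked list of document identifiers and a set of judged-relevant identifiers (the function name and the toy data are illustrative, not taken from the study):

```python
def precision_at_k(ranked_docs, relevant, k=10):
    """Fraction of the top-k ranked documents that are relevant."""
    top_k = ranked_docs[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    return hits / k


# Toy example: 20 retrieved documents, 4 judged relevant (hypothetical ids).
ranked = [f"d{i}" for i in range(1, 21)]
relevant = {"d1", "d3", "d10", "d15"}
print(precision_at_k(ranked, relevant))  # 0.3 (d1, d3 and d10 are in the top 10)
```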



 Study on 20% of all queries
 Expert users
 Complex needs (Queries with operators)




 What about the other 80% of all queries?!
 Average users
 Regular queries (no operators)




2. Methodology: Assessing the effects of query operators

Our Research Questions

Q = Do query operators lead to improved search results?

Q1 = Maximum gain in effectiveness when enriching a query with operators?
Q2 = Do users succeed in formulating better queries involving operators?


Our Methodology in a Nutshell
[Figure: a regular query and its query variants with operators V1, V2, V3, V4, …, VN]


Overview of the Methodology

[Figure: evaluation pipeline]
query, preOps, postOps → Query Variant Generator → {v1, …, vi, …, vn}
vi, corpus, IR model → Search Engine → l(vi)
l(vi), qrels, metrics → Evaluation Procedure → measures of effectiveness
Legend: usual evaluation framework in IR; components introduced for this study
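The pipeline above can be sketched as a loop that runs every query variant through a search engine and scores each result list against the relevance judgments (qrels). This is a minimal sketch under stated assumptions: `toy_search` and `TOY_RUNS` are hypothetical stand-ins for a real engine such as Terrier, the document ids are invented, and Average Precision is used as the metric.

```python
def average_precision(ranked_docs, relevant):
    """AP: mean of the precision values at the rank of each relevant document."""
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def evaluate_variants(variants, search, relevant):
    """Run each variant through `search` and score its result list."""
    return {v: average_precision(search(v), relevant) for v in variants}


# Hypothetical result lists standing in for a real search engine.
TOY_RUNS = {
    "encryption equipment export": ["d3", "d1", "d4", "d2"],
    "encryption +equipment +export": ["d1", "d2", "d3", "d4"],
}


def toy_search(query):
    return TOY_RUNS[query]


qrels = {"d1", "d2"}  # invented relevance judgments for one topic
scores = evaluate_variants(list(TOY_RUNS), toy_search, qrels)
best = max(scores, key=scores.get)
print(scores)
print(best)
```

In this toy run the regular query scores an AP of 0.5 and the variant with operators an AP of 1.0, so the variant is selected as the best one for the topic.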



3. Experiments and Results: Potential of effectiveness yielded by operators

Experiment Settings
 Standard Test Collections
 TREC-7
 TREC-8

 Query Operators
 Must appear (+)
 Term boosting (^N)

 Variant Generation
 Must appear ‘+’ only
 Boost ‘^’ only with weights ^10, ^20, ^30, ^40, and ^50
 Both ‘+’ and ‘^’

 Search engine
 Terrier with various models: BM25, DFR_BM25, InL2, PL2, TF_IDF

Query variants generated with preOps and postOps
Variant #   Query variant
1           encryption equipment export
2           encryption +equipment +export
…           …
124         encryption +equipment export^10
…           …
338         encryption^30 equipment^40 export^50
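The variant generation described above can be sketched as a Cartesian product over per-term forms. This is a minimal sketch, not the authors' generator: it assumes each term is either left plain, prefixed with '+', or suffixed with one boost weight, and never carries both operators at once (the example variants on the slide never show a term with both). The names `PRE_OPS`, `POST_OPS`, `term_forms` and `generate_variants` are illustrative.

```python
from itertools import product

# Assumed operator sets, using the weights listed on the slide.
PRE_OPS = ("+",)
POST_OPS = ("^10", "^20", "^30", "^40", "^50")


def term_forms(term, pre_ops=PRE_OPS, post_ops=POST_OPS):
    """Return the plain term plus one form per operator (never both at once)."""
    return [term] + [op + term for op in pre_ops] + [term + op for op in post_ops]


def generate_variants(query, pre_ops=PRE_OPS, post_ops=POST_OPS):
    """Enumerate every combination of per-term forms for a whitespace-split query."""
    forms = [term_forms(t, pre_ops, post_ops) for t in query.split()]
    return [" ".join(combo) for combo in product(*forms)]


variants = generate_variants("encryption equipment export")
print(len(variants))  # 343, i.e. 7 forms per term over 3 terms
print(variants[0])    # encryption equipment export (the regular query)

# '+' only: pass an empty tuple of post-operators.
plus_only = generate_variants("encryption equipment export", post_ops=())
print(len(plus_only))  # 8
```

The unfiltered product yields 343 variants for a three-term query, whereas the slide numbers variants up to 338. Which combinations were dropped is not stated here, so the sketch filters nothing.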


Results
 TREC-7 per Topic Analysis: Boxplots
 ‘+’ and ‘^’



Results
 Per Topic Analysis: Boxplot
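[Figure: boxplot of AP (Average Precision, y-axis, 0.1 to 0.4) per topic (x-axis, tick shown: 32). Top of the box plot: query variant with the highest AP. Marker: AP of TREC’s regular query. Bottom of the box plot: query variant with the lowest AP]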


Results
 TREC-7 per Topic Analysis
 ‘+’ and ‘^’
MAP of TREC’s regular queries = 0.1554
MAP of the highest-AP query variants (┬) = 0.2099
Gain: +35.1%


Results
 TREC-8 per Topic Analysis
 ‘+’ and ‘^’
MAP of TREC’s regular queries = 0.1840
MAP of the highest-AP query variants (┬) = 0.2288
Gain: +24.3%
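The gains reported on the two per-topic slides are relative improvements of one MAP over another. The sketch below recomputes them and shows one way an upper-bound MAP could be obtained, assuming it is the mean over topics of the best AP reached by any variant. That reading is an assumption based on the boxplot legend, and the function names and toy AP values are illustrative.

```python
from statistics import mean


def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


def upper_bound_map(ap_by_topic):
    """Mean over topics of the highest AP reached by any variant (assumed definition)."""
    return mean(max(aps.values()) for aps in ap_by_topic.values())


# MAP values from the slides.
print(round(relative_gain(0.1554, 0.2099), 1))  # 35.1 (TREC-7)
print(round(relative_gain(0.1840, 0.2288), 1))  # 24.3 (TREC-8)

# Toy AP values for two topics and two variants (invented numbers).
toy_aps = {
    "topic_a": {"v1": 0.2, "v2": 0.4},
    "topic_b": {"v1": 0.1, "v2": 0.05},
}
print(upper_bound_map(toy_aps))
```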


Results
 Global Analysis: MAP
 ‘+’ only

           TREC-7 MAP                     TREC-8 MAP
Model      Baseline  VOP     Gain (%)     Baseline  VOP     Gain (%)
BM25       0.1677    0.1836   9.5**       0.1957    0.2154  10.2*
DFR_BM25   0.1683    0.1843   9.5**       0.1965    0.2162  10.0*
InL2       0.1710    0.1852   8.3**       0.1996    0.2172   8.8*
PL2        0.1554    0.1826  17.5**       0.1840    0.2106  14.5**
TF_IDF     0.1674    0.1833   9.5**       0.1964    0.2158   9.9**
Statistical significance is denoted by ‘*’ for p < 0.05 (‘**’ for p < 0.01)


Results
 ‘^’ only

           TREC-7 MAP                     TREC-8 MAP
Model      Baseline  VOP     Gain (%)     Baseline  VOP     Gain (%)
BM25       0.1677    0.2027  20.9**       0.1957    0.2312  18.1**
DFR_BM25   0.1683    0.2034  20.9**       0.1965    0.2316  17.9**
InL2       0.1710    0.2059  20.4**       0.1996    0.2352  17.8**
PL2        0.1554    0.1926  23.9**       0.1840    0.2173  18.1**
TF_IDF     0.1674    0.2026  21.0**       0.1964    0.2312  17.7**


Results
 ‘+’ and ‘^’

           TREC-7 MAP                     TREC-8 MAP
Model      Baseline  VOP     Gain (%)     Baseline  VOP     Gain (%)
BM25       0.1677    0.2132  27.1**       0.1957    0.2381  21.7**
DFR_BM25   0.1683    0.2133  26.7**       0.1965    0.2387  21.5**
InL2       0.1710    0.2144  25.4**       0.1996    0.2407  20.6**
PL2        0.1554    0.2099  35.1**       0.1840    0.2288  24.3**
TF_IDF     0.1674    0.2131  27.3**       0.1964    0.2383  21.3**



4. Conclusion and Future Work

Conclusions
 H: the Proper Use of Query Operators Improves Search Results

 Methodology to Validate H

 Standard IR Test Collections: TREC-7 and TREC-8

 Must Appear (+) and Boosting Operators (^)

 Findings
 Observed gain up to 35.1%
 Statistically significant
 For all tested IR models and collections

 Users Should Use Query Operators More Often


Future Work
 Short Term
 Experimenting with our methodology in various contexts
 Additional IR collections

 Additional IR models

 Additional query operators

 Medium Term
 Address Q2: Do users succeed in formulating queries with operators,
so that these lead to a significant gain in effectiveness?
 Study other factors
 Number of terms

 Selection of terms

 Long Term
 Additional dimensions of information
 Geographic IR


Thank you
