TPDL’11: International Conference on Theory
              and Practice of Digital Libraries
              September 25-29, Berlin, Germany




  Query Operators Shown Beneficial for
       Improving Search Results


    Gilles Hubert, Guillaume Cabanac,
    Christian Sallaberry, Damien Palacio
Query Operators Shown Beneficial for Improving Search Results      G. Hubert et al.



                                Outline

1. Context                    Operators in Search Queries

2. Methodology                Assessing the effects of query operators

3. Experiments and Results   Potential of effectiveness yielded by operators

4. Conclusion and Future Work




1. Context  Operators in Search Queries                                    G. Hubert et al.



Search Engines Offer Query Operators

                                    Information need
               “I’m looking for research projects funded in the DL domain”



              Regular query                              Query with operators




    Various Operators
         Quotation marks, Must appear (+), boosting operator (^),
          Boolean operators, proximity operators…

     Case 1: What designers of search engines may expect
     Case 2: What users of search engines may believe
     Case 3: What designers of search engines may fear
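The two operators studied later in this talk, must appear (+) and term boosting (^N), can be illustrated with a toy ranker. This is a minimal sketch under stated assumptions: `parse_query`, `toy_rank` and `DOCS` are hypothetical names, `+` is assumed to drop every document lacking the term, `^N` is assumed to multiply the term's contribution by N, and the score is a plain sum of boosted term frequencies rather than an IR model such as BM25 or PL2. Real engines apply these operators inside their own weighting models, so only the qualitative effect carries over.

```python
import re
from collections import Counter

# Hypothetical syntax modelled on the slides: '+term' (must appear) and 'term^N' (boost).
TERM_RE = re.compile(r"^(?P<plus>\+?)(?P<term>[^\s^+]+)(?:\^(?P<boost>\d+(?:\.\d+)?))?$")


def parse_query(query: str) -> list[tuple[str, bool, float]]:
    """Return (term, must_appear, boost) for each whitespace-separated token."""
    parsed = []
    for token in query.split():
        m = TERM_RE.match(token)
        if m is None:
            raise ValueError(f"unsupported token: {token!r}")
        boost = float(m.group("boost")) if m.group("boost") else 1.0
        parsed.append((m.group("term").lower(), m.group("plus") == "+", boost))
    return parsed


def toy_rank(query: str, docs: dict[str, str]) -> list[str]:
    """Rank documents by the sum of boost * term frequency (a toy scorer, not BM25/PL2).

    Documents missing any '+' term are filtered out before scoring.
    """
    terms = parse_query(query)
    scores = {}
    for doc_id, text in docs.items():
        tf = Counter(text.lower().split())
        if any(must and tf[term] == 0 for term, must, _ in terms):
            continue  # a must-appear term is absent: drop the document
        scores[doc_id] = sum(boost * tf[term] for term, _, boost in terms)
    # Highest score first; ties broken by document id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Toy collection for the need "research projects funded in the DL domain".
DOCS = {
    "d1": "research projects funded digital libraries",
    "d2": "research projects in physics",
    "d3": "digital libraries digital libraries survey",
}

if __name__ == "__main__":
    print(toy_rank("research projects digital libraries", DOCS))
    print(toy_rank("research projects digital libraries^3", DOCS))
    print(toy_rank("+funded research projects digital libraries^3", DOCS))
```

On this toy collection, boosting `libraries` moves `d3` ahead of `d1`, and adding `+funded` leaves only `d1`.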
1. Context  Operators in Search Queries                                                            G. Hubert et al.


Usage of Query Operators
    Quantitative Studies
                             Excite
        Altavista      [Jansen et al. 2000]
[Silverstein et al., 1999]                  Excite                                 Google+MSN Search+Yahoo!
                                      [Spink et al., 2001]                          [White and Morris, 2007]
                                   25%
          Queries with operators




                                   20%

                                   15%

                                   10%

                                   5%

                                   0%
                                         1999   2000   2001   2002   2003   2004   2005   2006   2007



    Possible Explanations
         Unknown features?
         No improvement observed?                                                                            8
1. Context  Operators in Search Queries                                               G. Hubert et al.


Usage of Query Operators
    Qualitative Studies
         Users
              Average users not comfortable with “advanced means of searching”
               [Jansen et al., 2000]
              Expert users recourse to query operators more frequently
               [Hölscher and Strube, 2000; Lucas and Topi, 2002; White and Morris, 2007]


         Information Needs
              More used in dedicated search
               [Jansen and Pooch, 2001]
              Difficulty in finding information (e.g., complex information needs)
               [Aula et al., 2010]


         Appropriateness
              Operators used in a “semantically appropriate manner”
               [Eastman and Jansen, 2004]

                                                                                                 9
1. Context  Operators in Search Queries                               G. Hubert et al.


Usage of Query Operators
    Effects of Query Operators on Effectiveness



                      Eastman and Jansen studied queries with operators
                         Real users: AOL, Google and MSN Search
                         Operators: AND, OR, MUST APPEAR and PHRASE

                         No statistically significant improvement P@10




                                   [Eastman and Jansen, 2003]
                                                                                10
1. Context  Operators in Search Queries                         G. Hubert et al.


Usage of Query Operators
    Effects of Query Operators on Effectiveness



                      Study on 20% of all queries
                           Expert users
                           Complex needs (Queries with operators)




                                    [Eastman and Jansen, 2003]
                                                                          11
1. Context  Operators in Search Queries                        G. Hubert et al.


Usage of Query Operators
    Effects of Query Operators on Effectiveness



                      What about the other 80% of all queries ?!
                           Average users
                           Regular queries (no operators)




                                    [Eastman and Jansen, 2001]
                                                                         12
2. Methodology  Assessing the effects of query operators            G. Hubert et al.


Our Research Questions



    Q = Do query operators lead to improved search results?

    Q1 = What is the maximum gain in effectiveness when enriching a query with operators?

    Q2 = Do users succeed in formulating better queries involving operators?
2. Methodology  Assessing the effects of query operators                   G. Hubert et al.


Our Methodology in a Nutshell
                                                                            . VN
                                                                     V4 . .
                                                               V3
                                                         V2
          Regular query                        V1: Query variant with operators




                                                     
                                                            
                                                                      




                                 
                                                                                  15
3. Methodology  Assessing the effects of query operators                      G. Hubert et al.


Overview of the Methodology


preOps

postOps         Query Variant      {v1, … , vi, …, vn}
                 Generator
query

corpus                                  Search           l(vi)
IR model                                Engine
                                                                 Evaluation
qrels                                                                         measures of
                                                                 Procedure    effectiveness
metrics




             Usual evaluation framework in IR

             Components introduced for this study
                                                                                       16
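The Evaluation Procedure turns each result list l(vi) into measures of effectiveness using the qrels. Below is a minimal sketch of the two measures that appear in this talk, P@10 (used by Eastman and Jansen) and AP/MAP (used in the experiments). The function names are hypothetical, rankings are plain lists of document ids, and qrels are reduced to sets of relevant ids (binary relevance); in practice this step is typically delegated to a standard tool such as trec_eval.

```python
def precision_at_k(ranking: list[str], relevant: set[str], k: int = 10) -> float:
    """P@k: relevant documents among the top k, divided by k (missing ranks count as non-relevant)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc_id in ranking[:k] if doc_id in relevant) / k


def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Non-interpolated AP with binary relevance.

    Sums the precision at the rank of each relevant document retrieved and divides by the
    total number of relevant documents, so relevant documents never retrieved contribute 0.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    seen = set()
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant and doc_id not in seen:
            seen.add(doc_id)  # ignore duplicate ids in the ranking
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: dict[str, list[str]], qrels: dict[str, set[str]]) -> float:
    """MAP: mean AP over all judged topics; a topic without a run gets AP = 0."""
    if not qrels:
        raise ValueError("qrels must contain at least one topic")
    return sum(average_precision(runs.get(t, []), rel) for t, rel in qrels.items()) / len(qrels)
```

For example, the ranking d1, d2, d3, d4 with relevant set {d1, d3, d5} has AP = (1/1 + 2/3) / 3 ≈ 0.556, since d5 is never retrieved.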
3. Experiments and Results  Potential of effectiveness yielded by operators                      G. Hubert et al.


Experiment Settings
    Standard Test Collections
         TREC-7
         TREC-8

    Query Operators
         Must appear (+)
         Term boosting (^N)

    Variant Generation
         Must appear ‘+’ only
         Boost ‘^’ only, with weights ^10, ^20, ^30, ^40, and ^50
         Both ‘+’ and ‘^’

    Example of query variants generated with preOps and postOps
         Variant #   Query variant
         1           encryption      equipment       export
         2           encryption      +equipment      +export
         …           …
         124         encryption      +equipment      export^10
         …           …
         338         encryption^30   equipment^40    export^50

    Search Engine
         Terrier with various models: BM25, DFR_BM25, InL2, PL2, TF_IDF
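The Query Variant Generator can be sketched as a Cartesian product over per-term decorations: each term may receive an optional prefix operator (preOps, here `+`) and an optional suffix operator (postOps, here `^10` to `^50`). This is a sketch under assumptions: `term_forms` and `query_variants` are hypothetical names, and letting a single term carry both operators (e.g. `+export^10`) in the combined setting is a guess, since the slides do not say; the enumeration order also does not reproduce the variant numbers shown above. For a query of n terms it yields 2^n variants for ‘+’ only, 6^n for ‘^’ only, and 12^n when both may combine, the operator-free query included.

```python
from collections.abc import Iterator
from itertools import product

BOOST_WEIGHTS = (10, 20, 30, 40, 50)  # boost weights listed on the slide


def term_forms(term: str, pre_ops: tuple[str, ...], post_ops: tuple[str, ...]) -> list[str]:
    """All decorations of one term: an optional prefix operator and an optional suffix operator."""
    return [pre + term + post for pre in ("",) + pre_ops for post in ("",) + post_ops]


def query_variants(
    query: str,
    pre_ops: tuple[str, ...] = ("+",),
    post_ops: tuple[str, ...] = tuple(f"^{w}" for w in BOOST_WEIGHTS),
) -> Iterator[str]:
    """Yield every query variant as a string; the first one is the operator-free query."""
    per_term = [term_forms(t, pre_ops, post_ops) for t in query.split()]
    for combo in product(*per_term):
        yield " ".join(combo)


if __name__ == "__main__":
    q = "encryption equipment export"
    print(len(list(query_variants(q, post_ops=()))))  # '+' only
    print(len(list(query_variants(q, pre_ops=()))))   # '^' only
    print(len(list(query_variants(q))))               # both, combinable on one term
```

For the three-term example query this gives 8, 216, and 1,728 variants respectively, which shows why exhaustive generation is only practical for short queries.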


Results
    TREC-7 per Topic Analysis: Boxplots
         ‘+’ and ‘^’




Results
    Per Topic Analysis: Boxplot
         Figure: one boxplot per topic (x-axis: topics) of the AP (Average Precision, y-axis) obtained
         by the query variants, marking the query variant with the highest AP, the query variant
         with the lowest AP, and the AP of TREC’s regular query


Results
    TREC-7 per Topic Analysis (‘+’ and ‘^’)
         MAP of TREC’s regular queries = 0.1554
         MAP of the best query variant per topic = 0.2099 (+35.1%)


Results
    TREC-8 per Topic Analysis (‘+’ and ‘^’)
         MAP of TREC’s regular queries = 0.1840
         MAP of the best query variant per topic = 0.2288 (+24.3%)
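The per-topic figures can be reproduced from a table of AP values: for each topic, keep the best-scoring query variant, then average over topics. The sketch assumes a hypothetical layout `ap_by_topic[topic][variant]` and that the operator-free query is one of the variants (as variant #1 is in the example); `oracle_map` and `boxplot_markers` are illustrative names. Because the best variant is selected with the relevance judgments, the resulting MAP cannot fall below the baseline and measures the potential of operators (the maximum gain targeted by Q1) rather than what users actually achieve (Q2).

```python
from statistics import mean


def oracle_map(ap_by_topic: dict[str, dict[str, float]], baseline: str) -> tuple[float, float, float]:
    """Return (baseline MAP, best-variant MAP, relative gain in %).

    ap_by_topic[topic][variant] holds the AP of one query variant on one topic;
    `baseline` is the key of the regular (operator-free) query in every topic.
    """
    base_map = mean(aps[baseline] for aps in ap_by_topic.values())
    # Oracle selection: the best variant is picked per topic using the relevance judgments.
    best_map = mean(max(aps.values()) for aps in ap_by_topic.values())
    return base_map, best_map, 100.0 * (best_map - base_map) / base_map


def boxplot_markers(aps: dict[str, float], baseline: str) -> dict[str, float]:
    """The three values highlighted for one topic on the boxplots."""
    return {"lowest": min(aps.values()), "highest": max(aps.values()), "regular": aps[baseline]}
```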


Results
    Global Analysis: MAP
          ‘+’ only


                   TREC-7                           TREC-8
     Model         Baseline   VOP       Gain (%)    Baseline   VOP       Gain (%)
     BM25          0.1677     0.1836    9.5**       0.1957     0.2154    10.2*
     DFR_BM25      0.1683     0.1843    9.5**       0.1965     0.2162    10.0*
     InL2          0.1710     0.1852    8.3**       0.1996     0.2172    8.8*
     PL2           0.1554     0.1826    17.5**      0.1840     0.2106    14.5**
     TF_IDF        0.1674     0.1833    9.5**       0.1964     0.2158    9.9**
     Baseline: MAP of TREC’s regular queries; VOP: MAP of the best query variant with operators per topic;
     Gain: relative improvement of VOP over Baseline
                Statistical significance is denoted by ‘*’ for p < 0.05 (‘**’ for p < 0.01)


Results
    Global Analysis: MAP
          ‘^’ only


                   TREC-7                           TREC-8
     Model         Baseline   VOP       Gain (%)    Baseline   VOP       Gain (%)
     BM25          0.1677     0.2027    20.9**      0.1957     0.2312    18.1**
     DFR_BM25      0.1683     0.2034    20.9**      0.1965     0.2316    17.9**
     InL2          0.1710     0.2059    20.4**      0.1996     0.2352    17.8**
     PL2           0.1554     0.1926    23.9**      0.1840     0.2173    18.1**
     TF_IDF        0.1674     0.2026    21.0**      0.1964     0.2312    17.7**
                Statistical significance is denoted by ‘*’ for p < 0.05 (‘**’ for p < 0.01)


Results
    Global Analysis: MAP
          ‘+’ and ‘^’


                   TREC-7                           TREC-8
     Model         Baseline   VOP       Gain (%)    Baseline   VOP       Gain (%)
     BM25          0.1677     0.2132    27.1**      0.1957     0.2381    21.7**
     DFR_BM25      0.1683     0.2133    26.7**      0.1965     0.2387    21.5**
     InL2          0.1710     0.2144    25.4**      0.1996     0.2407    20.6**
     PL2           0.1554     0.2099    35.1**      0.1840     0.2288    24.3**
     TF_IDF        0.1674     0.2131    27.3**      0.1964     0.2383    21.3**
                Statistical significance is denoted by ‘*’ for p < 0.05 (‘**’ for p < 0.01)
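The gain columns are relative MAP improvements, which can be checked directly from the reported values: for PL2 on TREC-7, (0.2099 − 0.1554) / 0.1554 ≈ 35.1%. The slides do not say which significance test produced the asterisks; as a swapped-in illustration only, the sketch below implements a paired randomization (permutation) test over per-topic scores, a common choice in IR evaluation. `relative_gain` and `paired_randomization_test` are hypothetical names, and the number of trials and the seed are arbitrary.

```python
import random


def relative_gain(baseline_map: float, variant_map: float) -> float:
    """Relative MAP improvement in percent, as in the Gain (%) columns."""
    return 100.0 * (variant_map - baseline_map) / baseline_map


def paired_randomization_test(base: list[float], other: list[float],
                              trials: int = 10_000, seed: int = 0) -> float:
    """Two-sided p-value for the mean per-topic difference between two systems.

    Under the null hypothesis the two scores of a topic are exchangeable, so each trial
    randomly swaps them (i.e. flips the sign of that topic's difference).
    """
    if len(base) != len(other) or not base:
        raise ValueError("need two non-empty, equally long lists of per-topic scores")
    rng = random.Random(seed)
    diffs = [o - b for b, o in zip(base, other)]
    observed = abs(sum(diffs))  # the sum is proportional to the mean for a fixed topic count
    extreme = 0
    for _ in range(trials):
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted) >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (extreme + 1) / (trials + 1)
```

In use, `base` would hold the per-topic AP of the regular queries and `other` the per-topic AP of the selected variants for one model and collection.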
4. Conclusion and Future Work                           G. Hubert et al.


Conclusions
    H: the Proper Use of Query Operators Improves Search Results

    Methodology to Validate H

    Standard IR Test Collections: TREC-7 and TREC-8

    Must Appear (+) and Boosting Operators (^)

    Findings
       Observed MAP gain of up to 35.1% when the best query variant is selected per topic
       Statistically significant
       For all tested IR models and collections



 Users Should Use Query Operators More Often


Future Work
    Short Term
         Applying our methodology in various contexts
             Additional IR collections

             Additional IR models

             Additional query operators




    Medium Term
         Address Q2: Do users succeed in formulating queries with operators,
          so that these lead to a significant gain in effectiveness?
         Study other factors
             Number of terms

             Selection of terms




    Long Term
         Additional dimensions of information
            Geographic IR




           Thank you

« T'as pensé à retweeter mon article ? » Enjeux, limites et critique de la bi...
 
Émergence de l’open access « gris » : LibGen et Sci-Hub
Émergence de l’open access « gris » : LibGen et Sci-HubÉmergence de l’open access « gris » : LibGen et Sci-Hub
Émergence de l’open access « gris » : LibGen et Sci-Hub
 
Sur les étagères des bibliothèques numériques clandestines:
Sur les étagères des bibliothèques numériques clandestines: Sur les étagères des bibliothèques numériques clandestines:
Sur les étagères des bibliothèques numériques clandestines:
 
Les altmetrics : estimer l'engouement pour la recherche sur les médias sociaux
Les altmetrics : estimer l'engouement pour la recherche sur les médias sociauxLes altmetrics : estimer l'engouement pour la recherche sur les médias sociaux
Les altmetrics : estimer l'engouement pour la recherche sur les médias sociaux
 
A Journey in Scientometrics: quantitative studies of science at the crossroad...
A Journey in Scientometrics: quantitative studies of science at the crossroad...A Journey in Scientometrics: quantitative studies of science at the crossroad...
A Journey in Scientometrics: quantitative studies of science at the crossroad...
 
Bibliogifts ? Les bibliothèques clandestines de l'édition scientifique
Bibliogifts ? Les bibliothèques clandestines de l'édition scientifiqueBibliogifts ? Les bibliothèques clandestines de l'édition scientifique
Bibliogifts ? Les bibliothèques clandestines de l'édition scientifique
 
Le renfort des liens forts - dynamique relationnelle du coauthorship
Le renfort des liens forts - dynamique relationnelle du coauthorshipLe renfort des liens forts - dynamique relationnelle du coauthorship
Le renfort des liens forts - dynamique relationnelle du coauthorship
 

TPDL'11: Query Operators Shown Beneficial for Improving Search Results

  • 1. TPDL’11: International Conference on Theory and Practice of Digital Libraries, September 25-29, Berlin, Germany
      Query Operators Shown Beneficial for Improving Search Results
      Gilles Hubert, Guillaume Cabanac, Christian Sallaberry, Damien Palacio
  • 2. Query Operators Shown Beneficial for Improving Search Results (G. Hubert et al.)
      Outline
      1. Context: Operators in Search Queries
      2. Methodology: Assessing the effects of query operators
      3. Experiments and Results: Potential of effectiveness yielded by operators
      4. Conclusion and Future Work
  • 4. 1. Context: Operators in Search Queries
      Search Engines Offer Query Operators
      - Information need: “I’m looking for research projects funded in the DL domain”
      - Regular query vs. query with operators
      - Various operators: quotation marks, must appear (+), boosting operator (^), Boolean operators, proximity operators…
  • 5. Search Engines Offer Query Operators
      - Information need: “I’m looking for research projects funded in the DL domain”
      - Regular query vs. query with operators
      - Case 1: What designers of search engines may expect
  • 6. Search Engines Offer Query Operators
      - Information need: “I’m looking for research projects funded in the DL domain”
      - Regular query vs. query with operators
      - Case 2: What users of search engines may believe
  • 7. Search Engines Offer Query Operators
      - Information need: “I’m looking for research projects funded in the DL domain”
      - Regular query vs. query with operators
      - Case 3: What designers of search engines may fear
  • 8. Usage of Query Operators
      Quantitative Studies
      [Figure: share of queries with operators (0% to 25%) from 1999 to 2007, as reported for Excite [Jansen et al., 2000], AltaVista [Silverstein et al., 1999], Excite [Spink et al., 2001], and Google + MSN Search + Yahoo! [White and Morris, 2007]]
      Possible Explanations
      - Unknown features?
      - No improvement observed?
  • 9. Usage of Query Operators
      Qualitative Studies
      - Users
        - Average users are not comfortable with “advanced means of searching” [Jansen et al., 2000]
        - Expert users resort to query operators more frequently [Hölscher and Strube, 2000; Lucas and Topi, 2002; White and Morris, 2007]
      - Information Needs
        - Used more in dedicated search [Jansen and Pooch, 2001]
        - Difficulty in finding information (e.g., complex information needs) [Aula et al., 2010]
      - Appropriateness
        - Operators are used in a “semantically appropriate manner” [Eastman and Jansen, 2004]
  • 10. Usage of Query Operators
      Effects of Query Operators on Effectiveness
      - Eastman and Jansen studied queries with operators
        - Real users: AOL, Google and MSN Search
        - Operators: AND, OR, MUST APPEAR and PHRASE
        - No statistically significant improvement in P@10 [Eastman and Jansen, 2003]
  • 11. Usage of Query Operators
      Effects of Query Operators on Effectiveness
      - Study on 20% of all queries
        - Expert users
        - Complex needs (queries with operators) [Eastman and Jansen, 2003]
  • 12. Usage of Query Operators
      Effects of Query Operators on Effectiveness
      - What about the other 80% of all queries?!
        - Average users
        - Regular queries (no operators) [Eastman and Jansen, 2001]
  • 14. 2. Methodology: Assessing the effects of query operators
      Our Research Questions
      - Q: Do query operators lead to improved search results?
        - Q1: What is the maximum gain in effectiveness when enriching a query with operators?
        - Q2: Do users succeed in formulating better queries involving operators?
  • 15. Our Methodology in a Nutshell
      [Figure: a regular query and its query variants V1, V2, V3, V4, …, VN, where each Vi is a query variant with operators]
  • 16. Overview of the Methodology
      [Figure: the Query Variant Generator takes a query together with preOps and postOps and produces the variants {v1, …, vi, …, vn}; the Search Engine runs each vi over the corpus with an IR model and returns the result list l(vi); the Evaluation Procedure compares l(vi) with the qrels using effectiveness metrics and outputs measures of effectiveness. The legend distinguishes the usual evaluation framework in IR from the components introduced for this study.]
  • 18. 3. Experiments and Results: Potential of effectiveness yielded by operators
      Experiment Settings
      - Standard test collections: TREC-7, TREC-8
      - Query operators: must appear (+), term boosting (^N)
      - Variant generation
        - Must appear ‘+’ only
        - Boost ‘^’ only, with weights ^10, ^20, ^30, ^40 and ^50
        - Both ‘+’ and ‘^’
      - Search engine: Terrier with various models: BM25, DFR_BM25, InL2, PL2, TF_IDF
      Query variants generated with preOps and postOps (example):
        Variant #   Query variant
        1           encryption equipment export
        2           encryption +equipment +export
        …           …
        124         encryption +equipment export^10
        …           …
        338         encryption^30 equipment^40 export^50
  • 19. Results
      [Figure: TREC-7 per-topic analysis, boxplots of query variants using ‘+’ and ‘^’]
  • 20. Results: Per-Topic Analysis
      [Figure: one boxplot per topic of the AP (Average Precision, axis 0 to 0.4) of its query variants; markers show the query variant with the highest AP, the AP of TREC’s regular query, and the query variant with the lowest AP; x-axis: topics]
  • 21. Results: TREC-7 Per-Topic Analysis (‘+’ and ‘^’)
      - MAP of the regular queries = 0.1554
      - MAP⊤ (best query variant per topic) = 0.2099, a gain of +35.1%
  • 22. Results: TREC-8 Per-Topic Analysis (‘+’ and ‘^’)
      - MAP of the regular queries = 0.1840
      - MAP⊤ (best query variant per topic) = 0.2288, a gain of +24.3%
  • 23. Results: Global Analysis, MAP (‘+’ only)
                TREC-7 MAP                  TREC-8 MAP
      Model     Baseline  VOP     Gain (%)  Baseline  VOP     Gain (%)
      BM25      0.1677    0.1836  9.5**     0.1957    0.2154  10.2*
      DFR_BM25  0.1683    0.1843  9.5**     0.1965    0.2162  10.0*
      InL2      0.1710    0.1852  8.3**     0.1996    0.2172  8.8*
      PL2       0.1554    0.1826  17.5**    0.1840    0.2106  14.5**
      TF_IDF    0.1674    0.1833  9.5**     0.1964    0.2158  9.9**
      Statistical significance is denoted by ‘*’ for p < 0.05 (‘**’ for p < 0.01).
  • 24. Results: Global Analysis, MAP (‘^’ only)
                TREC-7 MAP                  TREC-8 MAP
      Model     Baseline  VOP     Gain (%)  Baseline  VOP     Gain (%)
      BM25      0.1677    0.2027  20.9**    0.1957    0.2312  18.1**
      DFR_BM25  0.1683    0.2034  20.9**    0.1965    0.2316  17.9**
      InL2      0.1710    0.2059  20.4**    0.1996    0.2352  17.8**
      PL2       0.1554    0.1926  23.9**    0.1840    0.2173  18.1**
      TF_IDF    0.1674    0.2026  21.0**    0.1964    0.2312  17.7**
      Statistical significance is denoted by ‘*’ for p < 0.05 (‘**’ for p < 0.01).
  • 25. Results: Global Analysis, MAP (‘+’ and ‘^’)
                TREC-7 MAP                  TREC-8 MAP
      Model     Baseline  VOP     Gain (%)  Baseline  VOP     Gain (%)
      BM25      0.1677    0.2132  27.1**    0.1957    0.2381  21.7**
      DFR_BM25  0.1683    0.2133  26.7**    0.1965    0.2387  21.5**
      InL2      0.1710    0.2144  25.4**    0.1996    0.2407  20.6**
      PL2       0.1554    0.2099  35.1**    0.1840    0.2288  24.3**
      TF_IDF    0.1674    0.2131  27.3**    0.1964    0.2383  21.3**
      Statistical significance is denoted by ‘*’ for p < 0.05 (‘**’ for p < 0.01).
  • 27. 4. Conclusion and Future Work
      Conclusions
      - H: The Proper Use of Query Operators Improves Search Results
      - Methodology to Validate H
        - Standard IR Test Collections: TREC-7 and TREC-8
        - Must Appear (+) and Boosting (^) Operators
      - Findings
        - Observed gain of up to 35.1%
        - Statistically significant
        - For all tested IR models and collections
      - Users Should Use Query Operators More Often
  • 28. Future Work
      - Short Term
        - Applying our methodology in various contexts
          - Additional IR collections
          - Additional IR models
          - Additional query operators
      - Medium Term
        - Address Q2: Do users succeed in formulating queries with operators, so that these lead to a significant gain in effectiveness?
        - Study other factors
          - Number of terms
          - Selection of terms
      - Long Term
        - Additional dimensions of information
          - Geographic IR
  • 29. TPDL’11: International Conference on Theory and Practice of Digital Libraries September 25-29, Berlin, Germany Thank you
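Slide 18 describes generating query variants by decorating the terms of a regular query with the must-appear operator (+) and boosts (^10 to ^50). The Python sketch below enumerates such variants. It assumes that preOps are operators placed before a term (here only +) and postOps are operators placed after it (here ^N), and that each term is decorated independently; the names PRE_OPS, POST_OPS, term_forms and generate_variants are hypothetical, and the enumeration order (hence the variant numbers) need not match the authors' generator.

```python
from itertools import product

# Hypothetical operator sets: preOps go before a term, postOps after it.
# The empty string means "no operator" for that position.
PRE_OPS = ["", "+"]
POST_OPS = [""] + [f"^{w}" for w in (10, 20, 30, 40, 50)]


def term_forms(term, pre_ops, post_ops):
    """All ways of decorating one term with a prefix and a suffix operator."""
    return [f"{pre}{term}{post}" for pre, post in product(pre_ops, post_ops)]


def generate_variants(query, pre_ops=PRE_OPS, post_ops=POST_OPS):
    """Yield every variant obtained by decorating each term independently.

    The first variant yielded is the regular query itself (no operators).
    """
    per_term = [term_forms(t, pre_ops, post_ops) for t in query.split()]
    for combo in product(*per_term):
        yield " ".join(combo)


if __name__ == "__main__":
    query = "encryption equipment export"
    plus_only = list(generate_variants(query, post_ops=[""]))   # '+' only
    boost_only = list(generate_variants(query, pre_ops=[""]))   # '^' only
    both = list(generate_variants(query))                       # '+' and '^'
    print(len(plus_only), len(boost_only), len(both))           # 8 216 1728
```

Restricting post_ops to [""] gives the ‘+’-only setting and restricting pre_ops to [""] gives the ‘^’-only setting. In the combined setting this sketch also lets a single term carry both operators (e.g. +export^10); the slides do not say whether the authors' generator did so. The number of variants grows exponentially with the number of query terms.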
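Slides 10, 16 and 20 rely on standard effectiveness measures: P@10 (used by Eastman and Jansen) and Average Precision (AP), whose mean over topics is MAP. The following is a minimal Python sketch of both, computed for one ranked result list l(vi) against the relevant documents listed in the qrels. It assumes binary relevance and a ranking without duplicate documents; the function names are illustrative and this is not the code of Terrier or trec_eval.

```python
def precision_at_k(ranking, relevant, k=10):
    """P@k: share of the top-k ranked documents that are relevant.

    Divides by k even when fewer than k documents were retrieved, which is
    the usual convention for P@10.
    """
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking, relevant):
    """Non-interpolated Average Precision (AP) of one ranked list.

    Adds the precision at every rank holding a relevant document, then
    divides by the total number of relevant documents in the qrels, so
    relevant documents that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


if __name__ == "__main__":
    # Hypothetical result list l(vi) and relevant set for one topic.
    ranking = ["d3", "d1", "d7", "d2"]
    relevant = {"d1", "d2", "d9"}
    print(average_precision(ranking, relevant))  # (1/2 + 2/4) / 3
    print(precision_at_k(ranking, relevant))     # 2 / 10
```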
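Slides 20 to 25 compare the MAP of the regular queries with the MAP obtained when, for each topic, the best-performing query variant is kept, and report the relative gain in percent. The sketch below assumes per-topic AP values are already available, one list per topic with the regular query first (as variant 1 on slide 18); the function names and the toy data are hypothetical. The gain formula reproduces the table figures, e.g. for the PL2 row of the ‘+’ and ‘^’ table on TREC-7: (0.2099 − 0.1554) / 0.1554 ≈ 35.1%.

```python
from statistics import mean


def baseline_map(variant_aps):
    """MAP of the regular queries, assumed to be the first AP of each topic."""
    return mean(aps[0] for aps in variant_aps.values())


def oracle_map(variant_aps):
    """MAP when each topic keeps its best-performing query variant."""
    return mean(max(aps) for aps in variant_aps.values())


def relative_gain(baseline, improved):
    """Relative improvement in percent, as in the Gain (%) columns."""
    return 100.0 * (improved - baseline) / baseline


if __name__ == "__main__":
    # Toy per-topic AP values (hypothetical), regular query first.
    variant_aps = {
        "t1": [0.10, 0.25, 0.05],
        "t2": [0.40, 0.30],
    }
    base = baseline_map(variant_aps)   # mean(0.10, 0.40) = 0.25
    best = oracle_map(variant_aps)     # mean(0.25, 0.40) = 0.325
    print(round(relative_gain(base, best), 1))       # 30.0
    # PL2 row of the '+' and '^' table, TREC-7.
    print(round(relative_gain(0.1554, 0.2099), 1))   # 35.1
```

Because the best variant is selected per topic using the relevance judgments, this MAP measures the maximum gain operators can offer (Q1), not what users would actually obtain when writing queries themselves (Q2).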
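The tables on slides 23 to 25 mark gains as significant at p < 0.05 or p < 0.01, but the slides do not name the test used. As an illustration only, here is a two-sided paired randomization (sign-flip) test over per-topic AP differences, one common choice in IR evaluation; a paired t-test (for example scipy.stats.ttest_rel) is another. The function name, the trial count and the sample data are assumptions.

```python
import random


def paired_randomization_test(baseline, treatment, trials=10000, seed=0):
    """Two-sided paired randomization (sign-flip) test on per-topic scores.

    Under the null hypothesis the two runs are interchangeable on every
    topic, so each per-topic difference is equally likely to be positive
    or negative. Returns a Monte Carlo estimate of the p-value.
    """
    if len(baseline) != len(treatment) or not baseline:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [t - b for b, t in zip(baseline, treatment)]
    observed = abs(sum(diffs))
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    extreme = 0
    for _ in range(trials):
        # Randomly swap the two runs' labels on each topic.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= observed:
            extreme += 1
    # Counting the observed assignment itself avoids reporting p = 0.
    return (extreme + 1) / (trials + 1)


if __name__ == "__main__":
    # Hypothetical per-topic AP values for a baseline run and a variant run.
    base = [0.12, 0.30, 0.05, 0.22, 0.18, 0.40, 0.09, 0.15]
    best = [0.20, 0.35, 0.11, 0.30, 0.18, 0.52, 0.14, 0.21]
    print(paired_randomization_test(base, best))
```

In practice the two lists would hold one AP value per topic for the baseline run and for the run being compared, in the same topic order.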