TPDL'11: Query Operators Shown Beneficial for Improving Search Results
1. TPDL’11: International Conference on Theory and Practice of Digital Libraries
September 25-29, Berlin, Germany
Query Operators Shown Beneficial for Improving Search Results
Gilles Hubert, Guillaume Cabanac, Christian Sallaberry, Damien Palacio
2. Outline
1. Context: Operators in Search Queries
2. Methodology: Assessing the effects of query operators
3. Experiments and Results: Potential of effectiveness yielded by operators
4. Conclusion and Future Work
4. 1. Context: Operators in Search Queries
Search Engines Offer Query Operators
Information need: “I’m looking for research projects funded in the DL domain”
[Figure: a regular query shown next to a query with operators]
Various operators: quotation marks, must appear (+), boosting operator (^), Boolean operators, proximity operators…
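The operators listed on this slide can be illustrated with a small sketch. The example below assumes a Lucene-style syntax (quotation marks for phrases, `+` for must appear, `^` for boosting); the `Term` class, the `build_query` helper, and the two example queries are hypothetical and are not taken from the slides, whose query screenshots did not survive.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Term:
    """One query term plus the operators applied to it (hypothetical helper)."""
    text: str
    phrase: bool = False           # wrap the text in quotation marks
    must_appear: bool = False      # prefix with the must-appear operator (+)
    boost: Optional[float] = None  # suffix with the boosting operator (^weight)


def render(term: Term) -> str:
    """Render a single term in the assumed Lucene-style syntax."""
    out = f'"{term.text}"' if term.phrase else term.text
    if term.must_appear:
        out = "+" + out
    if term.boost is not None:
        out = f"{out}^{term.boost:g}"
    return out


def build_query(terms: List[Term]) -> str:
    """Join the rendered terms with spaces into one query string."""
    return " ".join(render(t) for t in terms)


# A regular query: plain keywords only.
regular = build_query(
    [Term("research"), Term("projects"), Term("funded"),
     Term("digital"), Term("libraries")]
)

# The same need expressed with operators (illustrative choice of operators).
with_ops = build_query([
    Term("research projects", phrase=True),
    Term("funded", must_appear=True),
    Term("digital libraries", phrase=True, boost=2),
])

print(regular)   # research projects funded digital libraries
print(with_ops)  # "research projects" +funded "digital libraries"^2
```

Whether a given search engine interprets these symbols this way depends on the engine; the sketch only shows how the two kinds of query differ in form.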
5. 1. Context: Operators in Search Queries
Search Engines Offer Query Operators
Case 1: What designers of search engines may expect
6. 1. Context: Operators in Search Queries
Search Engines Offer Query Operators
Case 2: What users of search engines may believe
7. 1. Context: Operators in Search Queries
Search Engines Offer Query Operators
Case 3: What designers of search engines may fear
8. 1. Context: Operators in Search Queries
Usage of Query Operators
Quantitative Studies
[Figure: share of queries with operators (y-axis, 0% to 25%) by year (x-axis, 1999 to 2007), as reported by query-log studies: Altavista [Silverstein et al., 1999], Excite [Jansen et al. 2000], Excite [Spink et al., 2001], Google+MSN Search+Yahoo! [White and Morris, 2007]]
Possible Explanations
Unknown features?
No improvement observed?
9. 1. Context: Operators in Search Queries
Usage of Query Operators
Qualitative Studies
Users
Average users are not comfortable with “advanced means of searching” [Jansen et al., 2000]
Expert users resort to query operators more frequently [Hölscher and Strube, 2000; Lucas and Topi, 2002; White and Morris, 2007]
Information Needs
Operators are used more in dedicated search [Jansen and Pooch, 2001]
Operators are used when information is difficult to find (e.g., complex information needs) [Aula et al., 2010]
Appropriateness
Operators are used in a “semantically appropriate manner” [Eastman and Jansen, 2004]
10. 1. Context: Operators in Search Queries
Usage of Query Operators
Effects of Query Operators on Effectiveness
Eastman and Jansen studied queries with operators [Eastman and Jansen, 2003]
Real users: AOL, Google and MSN Search
Operators: AND, OR, MUST APPEAR and PHRASE
No statistically significant improvement in P@10
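For reference, P@10 is the fraction of relevant documents among the first ten results returned for a query. A minimal Python sketch of the measure follows; the function name and the toy document IDs are illustrative assumptions and do not come from the study.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    # Standard P@k divides by k, even when fewer than k results are returned.
    return hits / k


# Toy data (hypothetical document IDs, not from the slides)
ranking = ["d3", "d7", "d1", "d9", "d4", "d2", "d8", "d5", "d6", "d0"]
relevant = {"d3", "d1", "d2", "d42"}
p10 = precision_at_k(ranking, relevant, k=10)
print(f"P@10 = {p10:.2f}")
```

Because P@10 only looks at the first page of results, it can miss improvements further down the ranking, which is one reason the later slides report AP and MAP instead.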
11. 1. Context: Operators in Search Queries
Usage of Query Operators
Effects of Query Operators on Effectiveness
Study on 20% of all queries
Expert users
Complex needs (Queries with operators)
[Eastman and Jansen, 2003]
12. 1. Context: Operators in Search Queries
Usage of Query Operators
Effects of Query Operators on Effectiveness
What about the other 80% of all queries?!
Average users
Regular queries (no operators)
[Eastman and Jansen, 2001]
14. 2. Methodology: Assessing the effects of query operators
Our Research Questions
Q = Do query operators lead to improved search results?
Q1 = Maximum gain in effectiveness when enriching a query with operators?
Q2 = Do users succeed in formulating better queries involving operators?
15. 2. Methodology: Assessing the effects of query operators
Our Methodology in a Nutshell
[Figure: a regular query is turned into N query variants with operators: V1, V2, V3, V4, …, VN]
16. 2. Methodology: Assessing the effects of query operators
Overview of the Methodology
[Figure: pipeline diagram with three components]
Query Variant Generator: takes the query, preOps and postOps; outputs the query variants {v1, …, vi, …, vn}
Search Engine: takes each variant vi, the corpus and the IR model; outputs l(vi)
Evaluation Procedure: takes l(vi), the qrels and the metrics; outputs measures of effectiveness
Legend: usual evaluation framework in IR; components introduced for this study
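The three-stage flow above (generate variants, search with each variant, evaluate each result list) can be sketched in Python as follows. Everything here is a toy stand-in under stated assumptions: `plus_variants`, `make_search`, `make_evaluate` and `run_pipeline` are hypothetical names, the "search engine" merely counts matching terms, and the measure is precision at rank 2. None of it reproduces Terrier or the authors' implementation.

```python
from itertools import product
from typing import Callable, Dict, List, Set


def plus_variants(query: str) -> List[str]:
    """Toy Query Variant Generator: each term is either plain or prefixed with '+'."""
    terms = query.split()
    return [" ".join(p + t for p, t in zip(prefixes, terms))
            for prefixes in product(["", "+"], repeat=len(terms))]


def make_search(corpus: Dict[str, str]) -> Callable[[str], List[str]]:
    """Toy Search Engine: rank by number of matching terms; '+' terms are mandatory."""
    def search(variant: str) -> List[str]:
        tokens = variant.split()
        scored = []
        for doc_id, text in corpus.items():
            words = set(text.split())
            if any(t.startswith("+") and t[1:] not in words for t in tokens):
                continue  # a must-appear term is missing from this document
            score = sum(t.lstrip("+") in words for t in tokens)
            if score:
                scored.append((-score, doc_id))
        return [doc_id for _, doc_id in sorted(scored)]
    return search


def make_evaluate(qrels: Set[str], k: int = 2) -> Callable[[List[str]], float]:
    """Toy Evaluation Procedure: precision at rank k against the relevance judgments."""
    def evaluate(ranked: List[str]) -> float:
        return sum(d in qrels for d in ranked[:k]) / k
    return evaluate


def run_pipeline(query: str,
                 generate: Callable[[str], List[str]],
                 search: Callable[[str], List[str]],
                 evaluate: Callable[[List[str]], float]) -> Dict[str, float]:
    """Score every variant of the query: generate, then search, then evaluate."""
    return {variant: evaluate(search(variant)) for variant in generate(query)}


# Hypothetical four-document corpus and relevance judgments
corpus = {
    "d1": "encryption equipment export rules",
    "d2": "export statistics for fruit",
    "d3": "encryption equipment history",
    "d4": "equipment export licence",
}
qrels = {"d1", "d4"}
scores = run_pipeline("encryption equipment export", plus_variants,
                      make_search(corpus), make_evaluate(qrels))
for variant, score in scores.items():
    print(f"{score:.2f}  {variant}")
```

Passing the three components in as callables mirrors the diagram's split between the usual IR evaluation framework (search engine, evaluation procedure) and the component added for the study (variant generator).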
18. 3. Experiments and Results: Potential of effectiveness yielded by operators
Experiment Settings
Standard Test Collections
TREC-7
TREC-8
Query Operators
Must appear (+)
Term boosting (^N)
Variant Generation
Must appear ‘+’ only
Boost ‘^’ only with weights ^10, ^20, ^30, ^40, and ^50
Both ‘+’ and ‘^’
Search engine
Terrier with various models: BM25, DFR_BM25, InL2, PL2, TF_IDF
Query variants generated with preOps and postOps (example)
Variant #   Query variant
1           encryption equipment export
2           encryption +equipment +export
…           …
124         encryption +equipment export^10
…           …
338         encryption^30 equipment^40 export^50
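The three generation modes listed above can be sketched as a small Python enumerator. This is an illustration under assumptions, not the authors' generator: the names `term_options` and `generate_variants` are hypothetical, and in the combined mode each term carries either ‘+’ or a boost but never both, which matches the example variants shown but is not stated on the slide.

```python
from itertools import product

BOOSTS = [10, 20, 30, 40, 50]  # boost weights listed on the slide


def term_options(term, mode):
    """Decorations allowed for one term under a generation mode."""
    plain = [term]
    must = ["+" + term]
    boosted = [f"{term}^{w}" for w in BOOSTS]
    if mode == "plus":
        return plain + must
    if mode == "boost":
        return plain + boosted
    if mode == "both":
        # Assumption: a term is plain, must-appear, or boosted (never '+' and '^' together)
        return plain + must + boosted
    raise ValueError(f"unknown mode: {mode}")


def generate_variants(query, mode):
    """Enumerate every combination of per-term decorations; the first is the regular query."""
    options = [term_options(term, mode) for term in query.split()]
    return [" ".join(combo) for combo in product(*options)]


query = "encryption equipment export"
for mode in ("plus", "boost", "both"):
    print(mode, len(generate_variants(query, mode)))
```

For a three-term query this enumeration yields 343 variants in the combined mode, whereas the example table on the slide ends at variant 338, so the authors' exact enumeration and ordering may differ from this sketch.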
19. 3. Experiments and Results: Potential of effectiveness yielded by operators
Results
TREC-7 per Topic Analysis: Boxplots
‘+’ and ‘^’
20. 3. Experiments and Results: Potential of effectiveness yielded by operators
Results
Per Topic Analysis: Boxplot
[Figure: how to read the boxplot of one topic. x-axis: topics; y-axis: AP (Average Precision). The top of the boxplot is the query variant with the highest AP, the bottom is the query variant with the lowest AP, and a marker shows the AP of TREC’s regular query.]
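Each point in such a boxplot is the AP obtained by one query variant on one topic. A minimal Python sketch of AP follows, assuming a ranking without duplicate documents; the function name and the toy IDs are illustrative and not taken from the slides.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values measured at the rank of each relevant document.

    Relevant documents that are never retrieved contribute zero,
    because the sum is divided by the total number of relevant documents.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


# Toy data: relevant documents retrieved at ranks 1, 3 and 5; "d4" is never retrieved
ranking = ["d1", "d5", "d2", "d6", "d3"]
relevant = {"d1", "d2", "d3", "d4"}
ap = average_precision(ranking, relevant)
print(f"AP = {ap:.4f}")
```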
21. 3. Experiments and Results: Potential of effectiveness yielded by operators
Results
TREC-7 per Topic Analysis, ‘+’ and ‘^’
MAP = 0.1554
MAP ┬ = 0.2099 (+35.1%)
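Read together with the boxplot legend, MAP ┬ appears to be the mean, over topics, of the AP of the best variant for each topic, while MAP is the mean AP of the regular queries. That reading is an interpretation of the slides, not a definition they state. A minimal Python sketch under that assumption, with made-up AP values and hypothetical function names:

```python
from statistics import mean


def map_regular(ap_by_topic):
    """MAP of the regular queries: mean over topics of the regular query's AP."""
    return mean(topic["regular"] for topic in ap_by_topic.values())


def map_top(ap_by_topic):
    """Assumed reading of MAP-top: mean over topics of the highest AP of any variant.

    The regular query counts as a variant (variant 1 in the example table),
    so this value can never fall below map_regular.
    """
    return mean(max([topic["regular"], *topic["variants"]])
                for topic in ap_by_topic.values())


# Made-up AP values for three hypothetical topics
ap_by_topic = {
    "T1": {"regular": 0.20, "variants": [0.10, 0.25, 0.30]},
    "T2": {"regular": 0.40, "variants": [0.35, 0.38]},
    "T3": {"regular": 0.00, "variants": [0.05, 0.02]},
}
baseline = map_regular(ap_by_topic)
top = map_top(ap_by_topic)
print(f"MAP = {baseline:.4f}, MAP top = {top:.4f}")
```

Under this reading the figure is an upper bound: it tells how much could be gained if the best variant were chosen for every topic, which corresponds to research question Q1 rather than Q2.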
22. 3. Experiments and Results: Potential of effectiveness yielded by operators
Results
TREC-8 per Topic Analysis, ‘+’ and ‘^’
MAP = 0.1840
MAP ┬ = 0.2288 (+24.3%)
23. 3. Experiments and Results: Potential of effectiveness yielded by operators
Results
Global Analysis: MAP, ‘+’ only
Model      TREC-7 Baseline  TREC-7 VOP  TREC-7 gain (%)  TREC-8 Baseline  TREC-8 VOP  TREC-8 gain (%)
BM25       0.1677           0.1836      9.5**            0.1957           0.2154      10.2*
DFR_BM25   0.1683           0.1843      9.5**            0.1965           0.2162      10.0*
InL2       0.1710           0.1852      8.3**            0.1996           0.2172      8.8*
PL2        0.1554           0.1826      17.5**           0.1840           0.2106      14.5**
TF_IDF     0.1674           0.1833      9.5**            0.1964           0.2158      9.9**
Statistical significance is denoted by ‘*’ for p < 0.05 (‘**’ for p < 0.01)
24. 3. Experiments and Results: Potential of effectiveness yielded by operators
Results
Global Analysis: MAP, ‘^’ only
Model      TREC-7 Baseline  TREC-7 VOP  TREC-7 gain (%)  TREC-8 Baseline  TREC-8 VOP  TREC-8 gain (%)
BM25       0.1677           0.2027      20.9**           0.1957           0.2312      18.1**
DFR_BM25   0.1683           0.2034      20.9**           0.1965           0.2316      17.9**
InL2       0.1710           0.2059      20.4**           0.1996           0.2352      17.8**
PL2        0.1554           0.1926      23.9**           0.1840           0.2173      18.1**
TF_IDF     0.1674           0.2026      21.0**           0.1964           0.2312      17.7**
Statistical significance is denoted by ‘*’ for p < 0.05 (‘**’ for p < 0.01)
25. 3. Experiments and Results: Potential of effectiveness yielded by operators
Results
Global Analysis: MAP, ‘+’ and ‘^’
Model      TREC-7 Baseline  TREC-7 VOP  TREC-7 gain (%)  TREC-8 Baseline  TREC-8 VOP  TREC-8 gain (%)
BM25       0.1677           0.2132      27.1**           0.1957           0.2381      21.7**
DFR_BM25   0.1683           0.2133      26.7**           0.1965           0.2387      21.5**
InL2       0.1710           0.2144      25.4**           0.1996           0.2407      20.6**
PL2        0.1554           0.2099      35.1**           0.1840           0.2288      24.3**
TF_IDF     0.1674           0.2131      27.3**           0.1964           0.2383      21.3**
Statistical significance is denoted by ‘*’ for p < 0.05 (‘**’ for p < 0.01)
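The percentage column in these tables is the relative gain of the VOP value over the baseline, 100 × (VOP − Baseline) / Baseline. As a sanity check, the Python sketch below recomputes it for the TREC-7 ‘+’ and ‘^’ figures copied from the table; the function and variable names are illustrative. Since the published MAP values are themselves rounded to four decimals, a recomputed gain may differ from the reported one in the last digit.

```python
def relative_gain(baseline, vop):
    """Relative improvement of vop over baseline, in percent."""
    return 100.0 * (vop - baseline) / baseline


# (baseline MAP, VOP MAP, reported gain in %) for TREC-7, '+' and '^', copied from the table
trec7_both = {
    "BM25": (0.1677, 0.2132, 27.1),
    "DFR_BM25": (0.1683, 0.2133, 26.7),
    "InL2": (0.1710, 0.2144, 25.4),
    "PL2": (0.1554, 0.2099, 35.1),
    "TF_IDF": (0.1674, 0.2131, 27.3),
}

recomputed = {model: relative_gain(base, vop)
              for model, (base, vop, _) in trec7_both.items()}
for model, gain in recomputed.items():
    reported = trec7_both[model][2]
    print(f"{model:9s} recomputed {gain:5.1f}%  reported {reported:5.1f}%")
```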
27. 4. Conclusion and Future Work
Conclusions
H: The Proper Use of Query Operators Improves Search Results
Methodology to Validate H
Standard IR Test Collections: TREC-7 and TREC-8
Must Appear (+) and Boosting Operators (^)
Findings
Observed MAP gain up to 35.1% (PL2 on TREC-7, ‘+’ and ‘^’ combined)
Statistically significant
For all tested IR models and collections
Users Should Use Query Operators More Often
28. 4. Conclusion and Future Work
Future Work
Short Term
Experimenting with our methodology in various contexts
Additional IR collections
Additional IR models
Additional query operators
Medium Term
Address Q2: Do users succeed in formulating queries with operators, so that these lead to a significant gain in effectiveness?
Study other factors
Number of terms
Selection of terms
Long Term
Additional dimensions of information
Geographic IR