Digital documents support and shape people’s daily activities. In Computer Science, such documents are the cornerstone of two areas of research: Digital Libraries and Information Retrieval. In this presentation, we discuss research questions that we addressed in these areas, such as:
* Digital Libraries:
- How to transpose paper-based annotations into digital documents?
- How to measure the social validity of a statement according to the argumentative discussion it sparked off?
- How to harness a quiescent capital present in any organization: its documents?
* Information Retrieval:
- Is document tie-breaking affecting the evaluation of Information Retrieval systems?
- How to retrieve documents matching keywords and spatiotemporal constraints?
- Do operators in search queries (e.g., ‘+’, ‘^’) improve the effectiveness of search results?
Each question gives us the opportunity to recall background knowledge, such as how to evaluate the effectiveness of a search engine.
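To make the tie-breaking question concrete, here is a minimal sketch of one standard effectiveness measure, average precision, computed on a toy run in which two documents share the same score. The document ids, scores, and relevance judgment are invented for illustration, and breaking ties by document id is only an assumed convention, not a description of the method used in the presentation.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision of a ranked list against a set of relevant doc ids."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / len(relevant) if relevant else 0.0


def rank_by_score(scores: Dict[str, float], ties_ascending_id: bool) -> List[str]:
    """Sort by decreasing score; tied documents are ordered by id."""
    by_id = sorted(scores, reverse=not ties_ascending_id)
    # sorted() is stable, so documents with equal scores keep their id order.
    return sorted(by_id, key=lambda d: scores[d], reverse=True)


# Hypothetical run: d2 and d3 are tied, and only d3 is relevant.
scores = {"d1": 0.9, "d2": 0.5, "d3": 0.5, "d4": 0.1}
relevant = {"d3"}

ap_ascending = average_precision(rank_by_score(scores, True), relevant)
ap_descending = average_precision(rank_by_score(scores, False), relevant)
print(ap_ascending, ap_descending)
```

With these toy values the same system scores one third under one tie-breaking order and one half under the other, which illustrates why the handling of ties can matter when systems are compared.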
Finally, we discuss some of our work related to Scientometrics, which may be defined as the study of science with scientific methods. We applied Information Retrieval techniques to documents extracted from scientific Digital Libraries. We plan to present our findings on the following questions:
- How to recommend researchers according to their research topics and social clues?
- What is the landscape of research in Information Systems from the perspective of gatekeepers?
- What if submission date influenced the acceptance of conference papers?
Through this journey at the crossroads of Digital Libraries, Information Retrieval, and Scientometrics, we wish to pass on our enthusiasm for these subjects to academics and students alike.
Generative models of online discussion threads (ASONAM 2018 tutorial) - Pablo Aragón
Online discussion is a core feature of numerous social media platforms and has attracted increasing attention from academia for several reasons, e.g., the resolution of problems in collaborative editing, question answering and e-learning platforms, the response of online communities to news events, and online political and civic participation. Discussions on the Internet commonly occur as an exchange of written messages among two or more participants. These conversations are often represented as threads, which are initiated by a user posting a starting message (a post), after which other users reply to either the post or earlier replies. Given this sequential posting behavior, online discussion threads follow a tree network structure.
Different modeling approaches have been proposed to identify the mechanisms governing the network structure of threads. Statistical models of this type aim to reproduce the growth of discussion threads through different features, often related to human behavior. This is why they are usually called generative models: they not only estimate the statistical significance of their corresponding features but also reproduce the temporal arrival patterns of the messages that form a discussion thread. The parameters of these models make it possible to compare different platforms and communities, and they can even help to assess the impact of design choices and user interface changes on the way discussions unfold. Therefore, we aim to provide participants with state-of-the-art tools and methods for the analysis, diagnosis, management and improvement of online discussion platforms and communities.
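As an illustration of what a generative model of a thread can look like, the sketch below grows a reply tree one message at a time. It assumes a simple rule in which a new comment picks its parent with probability proportional to the number of replies that message already has, with an extra weight on the initial post. The function names and the `root_bias` parameter are hypothetical, and this is not presented as any specific model covered in the tutorial.

```python
import random
from typing import List, Optional


def grow_thread(n_comments: int, root_bias: float = 2.0,
                seed: Optional[int] = None) -> List[int]:
    """Return parents, where parents[i] is the message that message i replies to.

    Message 0 is the initial post (parent -1). Each new comment chooses its
    parent with probability proportional to (1 + replies already received),
    multiplied by root_bias for the initial post (an assumed feature).
    """
    rng = random.Random(seed)
    parents = [-1]
    replies = [0]  # replies[i] = number of direct replies to message i
    for new_id in range(1, n_comments + 1):
        weights = [1.0 + r for r in replies]
        weights[0] *= root_bias
        parent = rng.choices(range(new_id), weights=weights, k=1)[0]
        parents.append(parent)
        replies[parent] += 1
        replies.append(0)
    return parents


def depth(parents: List[int], node: int) -> int:
    """Number of reply hops between a message and the initial post."""
    hops = 0
    while parents[node] != -1:
        node = parents[node]
        hops += 1
    return hops


thread = grow_thread(50, root_bias=2.0, seed=1)
max_depth = max(depth(thread, i) for i in range(len(thread)))
print(len(thread), max_depth)
```

Because every message has exactly one parent that was posted earlier, the result is always a tree. Changing `root_bias` shifts the simulated threads between wide, shallow shapes and deeper chains of replies.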
A preliminary approach to knowledge integrity risk assessment in Wikipedia p... - Pablo Aragón
Wikipedia is one of the main repositories of free knowledge available today, with a central role in the Web ecosystem. For this reason, it can also be a battleground for actors trying to impose specific points of view or even spreading disinformation online. There is a growing need to monitor its "health" but this is not an easy task. Wikipedia exists in over 300 language editions and each project is maintained by a different community, with their own strengths, weaknesses and limitations. In this paper, we introduce a taxonomy of knowledge integrity risks across Wikipedia projects and a first set of indicators to assess internal risks related to community and content issues, as well as external threats such as the geopolitical and media landscape. On top of this taxonomy, we offer a preliminary analysis illustrating how the lack of editors' geographical diversity might represent a knowledge integrity risk. These are the first steps of a research project to build a Wikipedia Knowledge Integrity Risk Observatory.
Ostrom’s crypto-principles? Towards a commons-based approach for the use of B... - David Rozas
Slides from a presentation at "Science, politics, activism and citizenship". Redes CTS & Catalan Society for the History of Science and Technology (Valencia, 31/05/2018).
How Blockchains Are Transforming Adult Education - John Domingue
Slides from a session at the 9th Pan Commonwealth Forum giving an overview of the technology and concrete examples of how it is being used today to transform adult learning in a number of regions.
When the Wikipedians talk: network and tree structure of Wikipedia discussion... - David Laniado
Talk pages play a fundamental role in Wikipedia as the place for discussion and communication. In this work we use the comments on these pages to extract and study three networks, corresponding to different kinds of interactions. We find evidence of a specific assortativity profile which differentiates article discussions from personal conversations. An analysis of the tree structure of the article talk pages makes it possible to capture patterns of interaction, and reveals structural differences among the discussions about articles from different semantic areas.
On Thursday 19 November 2015, the British Embassy in Paris hosted a second trilateral workshop with French, German and British delegates from the research, government and business sectors to discuss the importance of energy storage.
Engaging Your Community Through Cultural Heritage Digital Libraries - Karen S Calhoun
Based on the book Exploring Digital Libraries, this ALA Techsource webinar examines cultural heritage collections in the context of the social web and online communities. Calhoun and Brenner explore the possibilities and provide examples of digital libraries' shift toward social platforms, along the way discussing how to increase discoverability and community engagement, for instance through crowdsourcing.
The Impact of Digitization in Rhetoric and Practice: A Review of Budget Cuts ... - sara_allain
This presentation was first given at the CAIS Archives conference in Dundee, Scotland, on 25 April 2013 by Sara Allain and Kelli Babcock.
It is provided here under a Creative Commons Attribution-NonCommercial 3.0 License (http://creativecommons.org/licenses/by-nc/3.0/)
Evaluating Digital Scholarship, Alison Byerly - NITLE
While a number of professional organizations have produced valuable guidelines for evaluation of digital work, many colleges and universities have yet to establish clear protocols and practices for applying them. Alison Byerly, College Professor and former Provost and Executive Vice President at Middlebury College, who has co-led workshops on evaluating digital scholarship at the MLA convention, will review major issues to be considered in the evaluation of digital work, such as: presentation of medium-specific materials, documentation of multiple roles in collaborative work, changing forms of peer review, and identification of appropriate reviewers. She will then talk briefly about how these issues can best be approached from the perspective of the candidate who wishes to present his or her work effectively to review committees, as well as from the perspective of colleagues who wish to provide a well-informed evaluation of such work.
In Praise of Interdisciplinary Research through Scientometrics - Guillaume Cabanac
Keynote talk at the workshop on Bibliometric-enhanced Information Retrieval (BIR) co-located with the ECIR 2015 conference.
http://www.gesis.org/en/events/events-archive/conferences/ecirworkshop2015/
How communities curate knowledge & how ontologists can help -Eurecom--2015-01-19 - jodischneider
Invited talk 2015-01-19 at EURECOM.
Two themes:
How do communities curate knowledge?
and
How can information technology help?
Q: How do communities curate knowledge?
A: Communities curate knowledge by discussing evidence and applying community standards to it.
In Wikipedia, 4 questions are used to evaluate borderline articles:
Notability – Is the topic appropriate for our encyclopedia?
Sources – Is the article well-sourced?
Maintenance – Can we maintain this article?
Bias – Is the article neutral? POV appropriately weighted?
Q: How can information technology help?
A: Information technology can organize evidence based on the criteria communities use.
In Wikipedia, we developed an alternate interface for deletion discussions.
How they might connect in a digital context. Invited keynote presentation in DARIAH workshop Practices and Context in Contemporary Annotation Activities. University of Hamburg, 29 October, 2015.
Presentation to the NITLE Reed College Learning Management Systems meeting (http://nitle.org/index.php/nitle/opportunities/fall_2006/learning_management_systems_at_liberal_arts_colleges).
Immersive Recommendation incorporates cross-platform and diverse personal digital traces into recommendations. Our context-aware topic modeling algorithm systematically profiles users' interests based on their traces from different contexts, and our hybrid recommendation algorithm makes high-quality recommendations by fusing users' personal profiles, item profiles, and existing ratings. The proposed model showed significant improvement over the state-of-the-art algorithms, suggesting the value of using this new user-centric recommendation model to improve recommendation quality, including in cold-start situations.
Adoption of the ORCID identifier: the case of Toulouse universities - Guillaume Cabanac
Paper published in the proceedings of Inforsid 2020 (see http://inforsid.fr/actes/2020/INFORSID_2020_p19-34.pdf and https://doi.org/10.1002/leap.1451).
Research information systems collect and give visibility to the scientific output of researchers. Disambiguating researchers is essential to avoid merging the output of several people (the case of homonyms). The ORCID initiative provides each researcher with an identifier that points to their affiliations and bibliography. Funding agencies (ANR and ERC) and scholarly journals encourage the adoption of ORCID. We present a method to quantify this adoption according to the discipline and employment category of an institution's publishing staff. The proof of concept is carried out on data about the 6,471 staff members attached to the 150 laboratories of the Toulouse site. With manual validation, we match their identities against the 7.3 million profiles on orcid.org. We observe a growing adoption of ORCID, with disparities across disciplines. Surprisingly, some profiles are created solely to obtain an ORCID, with neither affiliation nor bibliography filled in. These "empty" profiles are of little use for the identity disambiguation task. To our knowledge, no other study of this scale has been published on ORCID adoption at a multidisciplinary university site. The proposed method is replicable, and future studies may seek to compare situations and how they evolve over time.
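The quantification step described in this abstract, adoption broken down by discipline and employment category, can be sketched as a simple grouped rate. The record layout, the sample staff list, and the placeholder identifiers below are all invented for illustration; the abstract does not describe the actual data model or matching procedure, so this is only a minimal sketch under those assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical record: (discipline, employment category, ORCID iD or None).
Record = Tuple[str, str, Optional[str]]


def adoption_rates(staff: Iterable[Record]) -> Dict[Tuple[str, str], float]:
    """Share of staff with an ORCID iD per (discipline, category) group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [with iD, total]
    for discipline, category, orcid in staff:
        group = (discipline, category)
        counts[group][1] += 1
        if orcid:
            counts[group][0] += 1
    return {g: with_id / total for g, (with_id, total) in counts.items()}


# Made-up sample; the iD strings are placeholders, not real ORCID iDs.
sample = [
    ("Computer Science", "Professor", "placeholder-id-1"),
    ("Computer Science", "Professor", None),
    ("Physics", "PhD student", "placeholder-id-2"),
]
rates = adoption_rates(sample)
print(rates)
```

A fuller analysis along the lines of the abstract would also need to flag "empty" profiles, since an identifier without affiliation or bibliography contributes little to disambiguation.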
Invited talk at the CORIA 2019 conference (Conference francophone sur la Recherche d'Information et ses Applications)
https://coria-earia2019.projet.liris.cnrs.fr/Programme/keynotes/
Research data form an exceptionally rich material for our community. Publications are structured texts, interconnected through the bibliographic references that support their arguments. Authorship involves individuals grouped, and sometimes even ranked, within collectives of co-authors that reconfigure over time. The nature of each person's contribution now tends to be made explicit. Each affiliation is anchored in a territory, and the content of the research sometimes is too. The impact of the knowledge produced materializes explicitly through citations and implicitly through eponyms and other evocations of schools of thought. The boundaries of disciplines and the research front, which separates the known from the unknown, are constantly evolving. All this knowledge circulates in the academic sphere; some of it reaches the general public, who relay it on social networks and in the press, feeding altmetrics that attest to this science–society percolation.
This talk will present a variety of research tasks that query this material to shed light on the genesis and evolution of social worlds and knowledge in science. This is interdisciplinary work at the crossroads of computer science, scientometrics (the quantitative study of science and innovation), and the humanities and social sciences. I wish to pass on my enthusiasm for these issues and to promote the themes of the Bibliometric-enhanced Information Retrieval (BIR) workshop that I co-organize at ECIR.
Harnessing the (dormant) document capital of an organization: exploitat... - Guillaume Cabanac
Talk at the 10th study days of the Département Archive et Médiathèque of Université Toulouse Jean Jaurès, on the theme "User and reader participation in the digital context: what impact on professional practices?"
https://www.irit.fr/publis/IRIS/2019_DDAME_C.pdf
How to analyze a collective mobilization on social media networks... - Guillaume Cabanac
PragmaTIC seminar on ICT, associated practices, and their social impact.
Université Toulouse 2, 28 September 2017
https://web.archive.org/web/20170928/http://sms.univ-tlse2.fr/accueil-sms/comunitic/seminaire-journee-d-etudes/seminaire-pragmatic-les-usages-des-medias-sociaux-dans-le-cadre-des-mobilisations-collectives-au-bresil--524448.kjsp
Talk at the workshop “Women and men in science: Do we need gender metrics?” held on 27 April 2017 at Université Toulouse 2 - Jean Jaurès
https://www.irit.fr/~Guillaume.Cabanac/docs/workshopGenderScienceLabexSMS2017.pdf
The emergence of "grey" open access: LibGen and Sci-Hub as clande... - Guillaume Cabanac
PragmaTIC seminar on ICT, associated practices, and their social impact.
Université Toulouse 2, 20 October 2016
https://web.archive.org/web/20161009/http://sms.univ-tlse2.fr/accueil-sms/comunitic/seminaire-journee-d-etudes/seminaire-pragmatic-programmation-2016-2017-451614.kjsp?RH=actions-SMS
"Did you remember to retweet my paper?" Stakes, limits, and critique of alternative bi... - Guillaume Cabanac
"Did you remember to retweet my paper?"
Stakes, limits, and critique of alternative bibliometrics via altmetrics
Guillaume Cabanac, Associate Professor (MCF), Université Toulouse 3, Institut de Recherche en Informatique de Toulouse
The impact of a scientific result is traditionally estimated by the number of citations that the associated publication attracts. However, this indicator cannot estimate how research is received outside the academic sphere in the short term, whether in the press or on social media (Twitter, Facebook, etc.).
It is in this context that complementary indicators called "altmetrics" have been developed since 2012 to reflect the enthusiasm expressed for scientific results. Altmetrics are now integrated into the platforms of scientific publishers (such as Elsevier, PLOS, Springer, and Wiley) and into researchers' online CVs (on impactstory.org, for example).
Measuring the general public's interest in science: do altmetrics achieve this laudable goal? Recent studies suggest that the bulk of the activity captured by altmetrics comes from researchers themselves... Could scientists who are keen on social networks have hijacked this indicator, unconsciously or deliberately, to boost their e-reputation?
"When a measure becomes a target, it ceases to be a good measure" -- Goodhart's law
https://en.wikipedia.org/wiki/Goodhart%27s_law
Some recent references:
- Colquhoun, D., & Plested, A. (2014). Scientists don't count: Why you should ignore Altmetrics and other bibliometric nightmares. DC's Improbable Science [Blog post]. Available from: http://wp.me/p2ZpqR-1EJ
- González-Valiente, C. L., Pacheco-Mendoza, J., & Arencibia-Jorge, R. (2016). A review of altmetrics as an emerging discipline for research evaluation. Learned Publishing. doi:10.1002/leap.1043
- Ke, Q., Ahn, Y.-Y., & Sugimoto, C. R. (2016). A Systematic Identification and Analysis of Scientists on Twitter. arXiv preprint available from http://arxiv.org/abs/1608.06229
https://openeval2016.sciencesconf.org
https://openeval2016.sciencesconf.org/data/program/Resume_Guillaume_Cabanac.pdf
The emergence of "grey" open access: LibGen and Sci-Hub - Guillaume Cabanac
ELICO seminar "Observing the socio-economic dynamics of scientific publishing: qualitative and bibliometric approaches"
http://web.archive.org/web/20160511081918/http://www.elico-recherche.eu/actualites/actualites-du-laboratoire/programme-du-seminaire-elico-2015-16
Altmetrics: estimating enthusiasm for research on social media - Guillaume Cabanac
Doccitanist study day: "Scientific evaluation: whom to believe and why?"
8 October 2015
http://web.archive.org/web/20151008092117/http://doccitanist.lirmm.fr/spip.php?article273&lang=fr
Bibliogifts? The clandestine libraries of scientific publishing - Guillaume Cabanac
Invited talk at the 6th edition of the "Réseau des bibliothèques" day, Université Fédérale de Toulouse, 9 June 2015
http://web.archive.org/web/20150606154341/http://bibliotheques.univ-toulouse.fr/actualite/journee-reseau-des-bibliotheques-2015
Strengthening strong ties: the relational dynamics of coauthorship - Guillaume Cabanac
Strengthening strong ties: the relational dynamics of coauthorship
The case of computer science (1980-2010)
RÉSOCIT study days
http://www.irit.fr/~Guillaume.Cabanac/docs/resocit2015.pdf
Talk at the 6th study days of the Département Archive et Médiathèque of Université Toulouse Jean Jaurès, on the theme "Visibility and legitimacy of information: how to be 'seen in a good light' in the digital context?"
Program: http://www.irit.fr/publis/SIG/2015_DAM_C.pdf
The simplified electron and muon model, Oscillating Spacetime: The Foundation...RitikBhardwaj56
Discover the Simplified Electron and Muon Model: A New Wave-Based Approach to Understanding Particles delves into a groundbreaking theory that presents electrons and muons as rotating soliton waves within oscillating spacetime. Geared towards students, researchers, and science buffs, this book breaks down complex ideas into simple explanations. It covers topics such as electron waves, temporal dynamics, and the implications of this model on particle physics. With clear illustrations and easy-to-follow explanations, readers will gain a new outlook on the universe's fundamental nature.
Strategies for Effective Upskilling is a presentation by Chinwendu Peace in a Your Skill Boost Masterclass organisation by the Excellence Foundation for South Sudan on 08th and 09th June 2024 from 1 PM to 3 PM on each day.
A Strategic Approach: GenAI in EducationPeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
বাংলাদেশের অর্থনৈতিক সমীক্ষা ২০২৪ [Bangladesh Economic Review 2024 Bangla.pdf] কম্পিউটার , ট্যাব ও স্মার্ট ফোন ভার্সন সহ সম্পূর্ণ বাংলা ই-বুক বা pdf বই " সুচিপত্র ...বুকমার্ক মেনু 🔖 ও হাইপার লিংক মেনু 📝👆 যুক্ত ..
আমাদের সবার জন্য খুব খুব গুরুত্বপূর্ণ একটি বই ..বিসিএস, ব্যাংক, ইউনিভার্সিটি ভর্তি ও যে কোন প্রতিযোগিতা মূলক পরীক্ষার জন্য এর খুব ইম্পরট্যান্ট একটি বিষয় ...তাছাড়া বাংলাদেশের সাম্প্রতিক যে কোন ডাটা বা তথ্য এই বইতে পাবেন ...
তাই একজন নাগরিক হিসাবে এই তথ্য গুলো আপনার জানা প্রয়োজন ...।
বিসিএস ও ব্যাংক এর লিখিত পরীক্ষা ...+এছাড়া মাধ্যমিক ও উচ্চমাধ্যমিকের স্টুডেন্টদের জন্য অনেক কাজে আসবে ...
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics
1. Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics
http://bit.ly/rguCabanac2012
Guillaume Cabanac
guillaume.cabanac@univ-tlse3.fr
March 28th, 2012
2. Outline of these Musings
Musings at the Crossroads of DL, IR, and SCIM Guillaume Cabanac
Digital Libraries
• Collective annotations
• Social validation of discussion threads
• Organization-based document similarity
Information Retrieval
• The tie-breaking bias in IR evaluation
• Geographic IR
• Effectiveness of query operators
Scientometrics
• Recommendation based on topics and social clues
• Landscape of research in Information Systems
• The submission-date bias in peer-reviewed conferences
4. Question DL-1
How to transpose paper-based annotations into digital documents?
Digital Libraries
• Collective annotations
• Social validation of discussion threads
• Organization-based document similarity
Guillaume Cabanac, Max Chevalier, Claude Chrisment, Christine Julien. “Collective annotation: Perspectives for information retrieval improvement.” RIAO’07: Proceedings of the 8th conference on Information Retrieval and its Applications, pages 529–548. CID, May 2007.
5. From Individual Paper-based Annotation …
Characteristics of paper annotation
Centuries-old activity: more than 4 centuries old
Numerous applicative contexts: theology, science, literature …
Personal use: “active reading” (Adler & van Doren, 1972)
Collective use: review process, opinion exchange …
[Figure: timeline of annotated works with marks at 1541, 1630, 1790, 1830, 1881, and 1998. Items shown: annotated Bible, 1541 (Lortsch, 1910); Fermat’s last theorem (Kleiner, 2000); annotations from Blake, Keats… (Jackson, 2001); Les Misérables, Victor Hugo; US students (Marshall, 1998)]
6. … to Collective Digital Annotations
Hardcopy: hard to share ⇒ ‘lost’
Web servers (Ovsiannikov et al., 1999)
author 87%, reader 13%
Annotation server: a discussion thread
1993 to 2005: ComMentor … iMarkup … Yawas … Amaya …
> 20 annotation systems (Cabanac et al., 2005)
7. Digital Document Annotation: Examples
W3C Annotea / Amaya (Kahan et al., 2002): a reader’s comment, a discussion thread
Arakne, featuring “fluid annotations” (Bouvin et al., 2002)
8. Collective Annotations
Reviewed 64 systems designed during 1989–2008
Collective Annotation
Objective data
• Owner, creation date
• Anchoring point within the document. Granularity: whole document, words…
Subjective information
• Comments, various marks: stars, underlined text…
• Annotation types: support/refutation, question…
• Visibility: public, private, group…
Purpose-oriented annotation categories
• Annotation remark
• Annotation reminder
• Annotation argumentation
Personal Annotation Space
9. Question DL-2
How to measure the social validity of a statement according to the argumentative discussion it sparked off?
Digital Libraries
• Collective annotations
• Social validation of discussion threads
• Organization-based document similarity
Guillaume Cabanac, Max Chevalier, Claude Chrisment, Christine Julien. “Social validation of collective annotations: Definition and experiment.” Journal of the American Society for Information Science and Technology, 61(2):271–287, Feb. 2010, Wiley. DOI:10.1002/asi.21255
10. Social Validation of Argumentative Debates
Scalability issue: which annotations should I read?
Social validation = degree of consensus of the group
11. Social Validation of Argumentative Debates
Before: annotation magma
After: filtered display
Informing readers about how validated each annotation is
12. Social Validation Algorithms
Overview
Two proposed algorithms
• Empirical Recursive Scoring Algorithm (Cabanac et al., 2005)
• Bipolar Argumentation Framework Extension, based on Artificial Intelligence research works (Cayrol & Lagasquie-Schiex, 2005)
Validity scale: –1 = socially refuted, 0 = socially neutral, 1 = socially confirmed
[Figure: four example cases (case 1 to case 4) involving annotations A and B]
13. Social Validation Algorithm
Example: computing the social validity of a debated annotation
14. Validation with a User-study
Aim: social validation vs human perception of consensus
Design
Corpus: 13 discussion threads = 222 annotations + answers
Task of a participant
• Label opinion type
• Infer overall opinion
Volunteer subjects
53
119
15. Experimenting the Social Validation of Debates
Q1 Do people agree when labeling opinions?
Kappa coefficient (Fleiss, 1971; Fleiss et al., 2003)
Inter-rater agreement among n > 2 raters
Weak agreement, with variability ⇒ subjective task
[Figure: value of Kappa per debate Id, with agreement bands “Poor” and “Fair to good”]
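The Kappa coefficient named on this slide measures agreement among more than two raters. A minimal sketch of Fleiss' kappa follows, assuming every item is labeled by the same number of raters. The function name and the labels in the usage note are hypothetical and are not taken from the study.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for items that are each labeled by the same number of raters.

    ratings[i] is the list of category labels given to item i.
    Raises ZeroDivisionError when all ratings fall in a single category,
    because chance agreement is then 1.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number of ratings")

    counts = [Counter(r) for r in ratings]
    categories = sorted({c for cnt in counts for c in cnt})

    # Share of all assignments that went to each category.
    p_cat = [sum(cnt[c] for cnt in counts) / (n_items * n_raters)
             for c in categories]

    # Per-item agreement: share of rater pairs that agree on the item.
    p_item = [(sum(v * v for v in cnt.values()) - n_raters)
              / (n_raters * (n_raters - 1))
              for cnt in counts]

    p_observed = sum(p_item) / n_items
    p_chance = sum(p * p for p in p_cat)
    return (p_observed - p_chance) / (1 - p_chance)
```

For instance, `fleiss_kappa([["support", "support", "refute"], ["support", "refute", "refute"]])` is negative: the three raters agree less often than chance alone would predict.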
16. Experimenting the Social Validation of Debates
Q2 How well does SV approximate HP?
HP = Human Perception of consensus
SV = Social Validation algorithm
1. Test whether HP and SV are different (p < 0.05)
⇒ Student’s paired t-test: (p = 0.20) > (α = 0.05), so no significant difference is detected
2. Correlate HP and SV
⇒ Pearson’s coefficient of correlation r
r(HP, SV) = 0.48 shows a weak correlation
[Figure: density of HP – SV, y = p(HP – SV); example: HP = SV for 24 % of all cases]
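The two statistics used on this slide can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the sample values in the usage note are made up, and turning the t statistic into a p-value additionally needs the Student t distribution with n − 1 degrees of freedom, which is not implemented here.

```python
import math


def paired_t_statistic(x, y):
    """t statistic of Student's paired t-test (n - 1 degrees of freedom)."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(variance / n)


def pearson_r(x, y):
    """Pearson's coefficient of correlation between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)
```

With HP and SV given as two lists of validity scores, one pair per debate, `paired_t_statistic(hp, sv)` addresses step 1 and `pearson_r(hp, sv)` addresses step 2.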
17. Question DL-3
How to harness a quiescent capital present in any community: its documents?
Digital Libraries
• Collective annotations
• Social validation of discussion threads
• Organization-based document similarity
Guillaume Cabanac, Max Chevalier, Claude Chrisment, Christine Julien. “Organization of digital resources as an original facet for exploring the quiescent information capital of a community.” International Journal on Digital Libraries, 11(4):239–261, Dec. 2010, Springer. DOI:10.1007/s00799-011-0076-6
18. Documents as a Quiescent Wealth
Personal Documents
Filtered, validated, organized information…
… relevant to activities in the organization
Paradox: profitable, but under-exploited
Reason 1 – folders and files are private
Reason 2 – manual sharing
Reason 3 – automated sharing
Consequences
People resort to resources available outside of the community
Weak ROI ⇒ why look outside when it’s already there?
19. How to Benefit from Documents in a Community?
Mapping the documents of the community
SOM [Kohonen, 2001] Umap [Triviumsoft] TreeMap [Fekete & Plaisant, 2001]…
Limitations
Find the documents with the same topics as D
Find documents that colleagues use with D
→ concept of usage: grouping documents ⇆ keeping stuff in common
20. How to Benefit from Documents in a Community?
Organization-based similarities
• inter-folder
• inter-document
• inter-user
21. How to Help People to Discover/Find/Use Documents?
Purpose: offering a global view of people and their documents
Based on document contents
Based on document usage/organization
Requirement: non-intrusiveness and confidentiality
Operational needs
Find documents
With related materials
With complementary materials
Seeking people ⇆ seeking documents
Managerial needs
Visualize the global/individual activity
Work position → required documents
22. Proposed System: Static Aspect
4 views = {documents, people} × {group, unit}
1. Group of documents
Main topics
Usage groups
2. A single document
Who to liaise with?
What to read?
3. Group of people
Community of interest
Community of use
4. A single person
Interests
Similar users (potential help)
24. Question IR-1
Is document tie-breaking affecting the evaluation of Information Retrieval systems?
Information Retrieval
• The tie-breaking bias in IR evaluation
• Geographic IR
• Effectiveness of query operators
Guillaume Cabanac, Gilles Hubert, Mohand Boughanem, Claude Chrisment. “Tie-breaking Bias: Effect of an Uncontrolled Parameter on Information Retrieval Evaluation.” M. Agosti, N. Ferro, C. Peters, M. de Rijke, and A. F. Smeaton (Eds.) CLEF’10: Proceedings of the 1st Conference on Multilingual and Multimodal Information Access Evaluation, volume 6360 of LNCS, pages 112–123. Springer, Sep. 2010. DOI:10.1007/978-3-642-15998-5_13
25. Measuring the Effectiveness of IR Systems
User-centered vs. System-focused [Spärck Jones & Willett, 1997]
Evaluation campaigns
1958 Cranfield, UK
1992 TREC (Text Retrieval Conference), USA
1999 NTCIR (NII Test Collection for IR Systems), Japan
2001 CLEF (Cross-Language Evaluation Forum), Europe
…
“Cranfield” methodology
Task
Test collection
Corpus
Topics
Qrels
Measures: MAP, P@X ...
using trec_eval [Voorhees, 2007]
26. Runs are Reordered Prior to Their Evaluation
Qrels = 〈qid, iter, docno, rel〉
Run = 〈qid, iter, docno, rank, sim, run_id〉
Reordering by trec_eval: qid asc, sim desc, docno desc
Effectiveness measure = f (intrinsic_quality, )
MAP, P@X, MRR…
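The reordering rule stated on this slide (qid asc, sim desc, docno desc) can be sketched as below. This is a simplified illustration and not trec_eval's actual code: runs are reduced to hypothetical (qid, docno, sim) tuples, and the document identifiers in the usage note are invented.

```python
def reorder_like_trec_eval(run):
    """Sort a run by qid ascending, then sim descending, then docno descending.

    run: list of (qid, docno, sim) tuples. The rank supplied by the
    system is ignored, as the slide describes.
    """
    # Python's sort is stable, so sort by the least significant key first.
    ordered = sorted(run, key=lambda r: r[1], reverse=True)      # docno desc
    ordered = sorted(ordered, key=lambda r: r[2], reverse=True)  # sim desc
    return sorted(ordered, key=lambda r: r[0])                   # qid asc
```

Given two documents tied at sim = 0.8, `reorder_like_trec_eval([("1", "AP880101", 0.8), ("1", "WSJ870101", 0.8), ("1", "FT911", 0.9)])` places `WSJ870101` above `AP880101` purely because of its name, whatever the system intended.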
27. Consequences of Run Reordering
Measures of effectiveness for an IRS s
RR(s,t): 1/rank of the 1st relevant document, for topic t
P(s,t,d): precision at document d, for topic t
AP(s,t): average precision for topic t
MAP(s): mean average precision
Sensitive to document rank
Tie-breaking bias
Is the Wall Street Journal collection more relevant than Associated Press?
[Figure: example involving two participants, Chris and Ellen]
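The four measures listed on this slide all depend on document ranks, which is why reordering tied documents can change them. A minimal sketch with binary relevance follows. The function names are hypothetical, and precision is computed at a cutoff k rather than at a named document d.

```python
def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def precision_at(ranking, relevant, k):
    """Share of relevant documents among the top k."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking, relevant):
    """Mean of the precision values observed at each relevant document."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, qrels):
    """Average of AP over all topics; both arguments are keyed by topic id."""
    return sum(average_precision(rankings[t], qrels[t]) for t in qrels) / len(qrels)
```

For the ranking `["d1", "d2", "d3", "d4"]` with relevant set `{"d2", "d4"}`, RR is 0.5 and AP is (1/2 + 2/4) / 2 = 0.5.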
28. What we Learnt: Beware of Tie-breaking for AP
Small effect on MAP, larger effect on AP
Measure bounds: AP_Realistic ≤ AP_Conventional ≤ AP_Optimistic
[Figure: padre1, adhoc’94]
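The bounds on this slide can be illustrated by breaking ties in three ways. The sketch below assumes that the realistic strategy puts relevant documents last within a group of tied documents, that the optimistic strategy puts them first, and that the conventional strategy orders tied documents by docno descending. The slide itself does not spell out these definitions, and all names and data are hypothetical.

```python
def average_precision(ranking, relevant):
    """Mean of the precision values observed at each relevant document."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def order_run(run, relevant, strategy="conventional"):
    """Order one topic's run, given as a list of (docno, sim) pairs.

    Ties on sim are broken according to the strategy (assumed definitions):
    conventional: docno descending
    realistic:    relevant documents last, then docno descending
    optimistic:   relevant documents first, then docno descending
    """
    # Stable sorts, applied from the least to the most significant key.
    ordered = sorted(run, key=lambda r: r[0], reverse=True)  # docno desc
    if strategy == "realistic":
        ordered = sorted(ordered, key=lambda r: r[0] in relevant)
    elif strategy == "optimistic":
        ordered = sorted(ordered, key=lambda r: r[0] in relevant, reverse=True)
    elif strategy != "conventional":
        raise ValueError("unknown strategy: " + strategy)
    ordered = sorted(ordered, key=lambda r: r[1], reverse=True)  # sim desc
    return [docno for docno, _ in ordered]


run = [("WSJ5", 0.8), ("AP5", 0.8), ("AP8", 0.8), ("FT1", 0.9)]
relevant = {"AP8"}
ap = {s: average_precision(order_run(run, relevant, s), relevant)
      for s in ("realistic", "conventional", "optimistic")}
```

On this toy run the single relevant document lands at rank 4, 3, or 2 depending on the strategy, so AP ranges from 0.25 to 0.5 although the system's scores never change.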
29. Question IR-2
How to retrieve documents matching keywords and spatiotemporal constraints?
Information Retrieval
• The tie-breaking bias in IR evaluation
• Geographic IR
• Effectiveness of query operators
Damien Palacio, Guillaume Cabanac, Christian Sallaberry, Gilles Hubert. “On the evaluation of geographic information retrieval systems: Evaluation framework and case study.” International Journal on Digital Libraries, 11(2):91–109, June 2010, Springer. DOI:10.1007/s00799-011-0070-z
30. Geographic Information Retrieval
Query = “Road trip around Aberdeen summer 1982”
Search engines
Topic: term ∈ {road, trip, Aberdeen, summer}
Geographic: spatial ∈ {AberdeenCity, AberdeenCounty…}; temporal ∈ [21-JUN-1982 .. 22-SEP-1982]; term ∈ {road, trip, Aberdeen, summer}
≈ 1/6 queries = geographic queries
Excite (Sanderson et al., 2004)
AOL (Gan et al., 2008)
Yahoo! (Jones et al., 2008)
⇒ Current issue worth studying
31. The Internals of a Geographic IR System
3 dimensions to process
Topical, spatial, temporal
1 index per dimension
Topical: bag of words, stemming, weighting, comparing with VSM…
Spatial: spatial entity detection, spatial relation resolution…
Temporal: temporal entity detection…
Query processing with sequential filtering
e.g., priority to topic, then filtering according to other dimensions
Issue: effectiveness of GIRSs vs state-of-the-art IRSs?
Hypothesis: GIRSs better than state-of-the-art IRSs
32. Case Study: the PIV GIR System
Indexing: one index per dimension
Topical = Terrier IRS; Spatial = tiling; Temporal = tiling
Retrieval
Identification of the 3 dimensions in the query
Routing towards each index
Combination of results with CombMNZ [Fox & Shaw, 1993; Lee, 1997]
33. Case Study: the PIV GIR System
Principle of CombMNZ and Borda Count
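CombMNZ fuses several result lists by summing each document's normalized scores and multiplying that sum by the number of lists that retrieved the document. The sketch below is an illustration and not the PIV implementation. It assumes min-max normalization and counts a list as soon as the document appears in it. The three input dictionaries stand for hypothetical topical, spatial, and temporal results.

```python
def min_max_normalize(scores):
    """Rescale a {docno: score} mapping to the [0, 1] interval."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}


def comb_mnz(result_lists):
    """Fuse result lists; each list is a {docno: raw score} mapping.

    Fused score = (sum of normalized scores) * (number of lists that
    retrieved the document). Returns (docno, score) pairs, best first.
    """
    normalized = [min_max_normalize(r) for r in result_lists if r]
    docs = set().union(*normalized)
    fused = {}
    for doc in docs:
        scores = [n[doc] for n in normalized if doc in n]
        fused[doc] = sum(scores) * len(scores)
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


topical = {"d1": 10, "d2": 5, "d3": 0}
spatial = {"d2": 0.9, "d3": 0.1}
temporal = {"d2": 3, "d1": 1}
fused = comb_mnz([topical, spatial, temporal])
```

Here `d2` comes first because all three lists retrieved it, even though `d1` has the best topical score.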
34. Case Study: the PIV GIR System
Gain in effectiveness
35. Question IR-3
Do operators in search queries improve the effectiveness of search results?
Information Retrieval
• The tie-breaking bias in IR evaluation
• Geographic IR
• Effectiveness of query operators
Gilles Hubert, Guillaume Cabanac, Christian Sallaberry, Damien Palacio. “Query Operators Shown Beneficial for Improving Search Results.” S. Gradmann, F. Borri, C. Meghini, H. Schuldt (Eds.) TPDL’11: Proceedings of the 1st International Conference on Theory and Practice of Digital Libraries, volume 6966 of LNCS, pages 118–129. Springer, Sep. 2011. DOI:10.1007/978-3-642-24469-8_14
36. Search Engines Offer Query Operators
Information need
“I’m looking for research projects funded in the DL domain”
Regular query vs. query with operators
Various operators
Quotation marks, must appear (+), boosting operator (^), Boolean operators, proximity operators…
38. Our Methodology in a Nutshell
[Figure: a regular query compared with query variants with operators V1, V2, V3, V4, …, VN]
39. Effectiveness of Query Operators
TREC-7 per Topic Analysis: Boxplots
‘+’ and ‘^’
40. Effectiveness of Query Operators
Per Topic Analysis: Box plot
[Figure: box plot per topic; x-axis: Topics; y-axis: AP (Average Precision); annotations: AP of TREC’s regular query, query variant with highest AP, query variant with lowest AP]
41. Effectiveness of Query Operators
TREC-7 Per Topic Analysis
‘+’ and ‘^’
MAP = 0.1554
MAP ┬ = 0.2099
+35.1%
43. Question SCIM-1
How to recommend researchers according to their research topics and social clues?
Scientometrics
• Recommendation based on topics and social clues
• Landscape of research in Information Systems
• The submission-date bias in peer-reviewed conferences
Guillaume Cabanac. “Accuracy of inter-researcher similarity measures based on topical and social clues.” Scientometrics, 87(3):597–620, June 2011, Springer. DOI:10.1007/s11192-011-0358-1
44. Recommendation of Literature (McNee et al., 2006)
Collaborative filtering
Principle: mining the preferences of researchers
→ those who liked this paper also liked…
Snowball effect / fad
Innovation?
Relevance of theme?
Cognitive filtering
Principle: mining the contents of articles
→ profile of resources (researcher, articles)
→ citation graph
Hybrid approach
????
45. Foundations: Similarity Measures Under Study
Model
Coauthors graph: authors ↔ authors
Venues graph: authors ↔ conferences / journals
Social similarities
Inverse degree of separation: length of the shortest path
Strength of the tie: number of shortest paths
Shared conferences: number of shared conference editions
Thematic similarity
Cosine on Vector Space Model, d_i = (w_{i,1}, …, w_{i,n}), built on titles (doc / researcher)
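The thematic similarity on this slide is a cosine between term-weight vectors built from titles. A minimal sketch follows, assuming raw term frequencies as weights and whitespace tokenization. The slide does not state the actual weighting scheme, and the function names are hypothetical.

```python
import math
from collections import Counter


def tf_vector(titles):
    """Bag-of-words term frequencies built from a list of titles."""
    return Counter(word for title in titles for word in title.lower().split())


def cosine(u, v):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

A researcher's profile can be built by passing all of their titles to `tf_vector`; two researchers whose titles share no word then get a similarity of 0.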
46. Computing Similarities with Social Clues
Task of literature review
Requirement: topical relevance
Preference: social proximity (meetings, project…)
⇒ re-rank topical results with social clues
Combination with CombMNZ (Fox & Shaw, 1993)
Final result: list of recommended researchers
[Figure: degree of separation, strength of ties, and shared conferences are fused with CombMNZ into a social list; the social list and the topical list (∩) are fused with CombMNZ into the TS list]
47. Evaluation Design
Comparison of recommendations and researchers’ perception
Q1: Effectiveness of topical (only) recommendations?
Q2: Gain due to integrating social clues?
IR experiments: Cranfield paradigm (TREC…)
Does the search engine retrieve relevant documents?
Input: topic, corpus → search engine x
Assessor: doc relevant? → relevance judgments ({0, 1} binary, [0, N] gradual) → qrels
trec_eval → effectiveness measures: Mean Average Precision, Normalized Discounted Cumulative Gain
topic   S1      S2
1       0.5687  0.6521
…       …       …
50      0.7124  0.7512
avg     0.6421  0.7215
improvement: +12.3 %
significance: p < 0.05 (paired t-test)
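Normalized Discounted Cumulative Gain, one of the measures named on this slide, rewards rankings that place highly graded items early. The sketch below uses one common formulation (gain divided by log2 of rank + 1) and normalizes against the ideal ordering of the judged list only. Other variants exist, and this is not claimed to match trec_eval's output digit for digit.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of a list of graded judgments, in rank order."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG divided by the DCG of the same judgments sorted best first."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

For example, `ndcg([3, 2, 1])` is 1.0 because the list is already in ideal order, while `ndcg([0, 1])` drops to about 0.63 because the only relevant item sits at rank 2.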
48. Evaluating Recommendations
The Cranfield paradigm of the previous slide, transposed to recommendations:
Input: name of a researcher (instead of a topic), corpus → recommender system (topical, topical + social) → Top 25
Assessor: the researcher, asked “With whom would you like to chat for improving your research?” → relevance judgments ({0, 1} binary, [0, N] gradual) → qrels
trec_eval → effectiveness measures: Mean Average Precision, Normalized Discounted Cumulative Gain
Results table: one row per subject (#subjects) instead of one row per topic
49. Experiment
Features
Data: dblp.xml (713 MB = 1.3M publications for 811,787 researchers)
Subjects: 90 researchers contacted by mail
74 researchers began to fill in the questionnaire; 71 completed it
Interface for assessing recommendations
50. Experiments: Profile of the Participants
Experience of the 71 subjects: Mdn = 13 years
Productivity of the 71 subjects: Mdn = 15 publications
[Figures: number of participants by seniority (years); number of participants by number of publications]
51. Empirical Validation of our Hypothesis
Strong baseline ⇒ effective approach based on VSM
+8.49 % = significant improvement (p < 0.05; n = 70) of topical recommendations by social clues
[Figure: NDCG (axis from 0.5 to 1) of Topical vs. Topical + social recommendations, with the following gains]
global: +8.49 %
productivity < 15 publications: +10.39 %
productivity ≥ 15 publications: +7.03 %
experience < 13 years: +6.50 %
experience ≥ 13 years: +10.22 %
52. Question SCIM-2
What is the landscape of research in Information Systems from the perspective of gatekeepers?
Scientometrics
• Recommendation based on topics and social clues
• Landscape of research in Information Systems
• The submission-date bias in peer-reviewed conferences
Guillaume Cabanac. “Shaping the landscape of research in Information Systems from the perspective of editorial boards: A scientometric study of 77 leading journals.” Journal of the American Society for Information Science and Technology, 63, to appear in 2012, Wiley. DOI:10.1002/asi.22609
53. Landscape of Research in Information Systems
The gatekeepers of science
54. Landscape of Research in Information Systems
The 77 core peer-reviewed IS journals in the WoS
55. Landscape of Research in Information Systems
Exploratory data analysis
56. Landscape of Research in Information Systems
Exploratory data analysis
57. Landscape of Research in Information Systems
Topical map of the IS field
58. Landscape of Research in Information Systems
Most influential gatekeepers
59. Landscape of Research in Information Systems
Number of gatekeepers per country
60. Landscape of Research in Information Systems
Geographic and gender diversity
61. Question SCIM-3
What if submission date influenced the acceptance of conference papers?
Scientometrics
• Recommendation based on topics and social clues
• Landscape of research in Information Systems
• The submission-date bias in peer-reviewed conferences
Guillaume Cabanac. “What if submission date influenced the acceptance of conference papers?” Submitted to the Journal of the American Society for Information Science and Technology, Wiley.
62. Conferences Affected by a Submission-Date Bias?
Peer-review
63. The Submission-Date Bias
Dataset from the ConfMaster conference management system
64. The Submission-Date Bias
Influence of submission date on bids
65. The Submission-Date Bias
Influence of submission date on average marks
66. Conclusion
Digital Libraries
• Collective annotations
• Social validation of discussion threads
• Organization-based document similarity
Information Retrieval
• The tie-breaking bias in IR evaluation
• Geographic IR
• Effectiveness of query operators
Scientometrics
• Recommendation based on topics and social clues
• Landscape of research in Information Systems
• The submission-date bias in peer-reviewed conferences