Musings at the Crossroads of
Digital Libraries, Information Retrieval,
and Scientometrics
http://bit.ly/rguCabanac2012
Guillaume Cabanac
guillaume.cabanac@univ-tlse3.fr
March 28th, 2012
Outline of these Musings
Musings at the Crossroads of DL, IR, and SCIM Guillaume Cabanac
Digital Libraries
• Collective annotations
• Social validation of discussion threads
• Organization-based document similarity
Information Retrieval
• The tie-breaking bias in IR evaluation
• Geographic IR
• Effectiveness of query operators
Scientometrics
• Recommendation based on topics and social clues
• Landscape of research in Information Systems
• The submission-date bias in peer-reviewed conferences
Digital Libraries
• Collective annotations
• Social validation of discussion threads
• Organization-based document similarity
Question DL-1
How to transpose paper-based
annotations into digital documents?
Guillaume Cabanac, Max Chevalier, Claude Chrisment, Christine Julien. “Collective annotation: Perspectives for
information retrieval improvement.” RIAO’07: Proceedings of the 8th Conference on Information Retrieval and its
Applications, pages 529–548. CID, May 2007.
From Individual Paper-based Annotation …
• Characteristics of paper annotation
  • Centuries-old practice: more than four centuries of history
  • Numerous application contexts: theology, science, literature …
  • Personal use: “active reading” (Adler & van Doren, 1972)
  • Collective use: review process, opinion exchange …
[Figure: timeline of annotated works, with marks at 1541, 1630, 1790, 1830, 1881 and 1998: annotated Bible (Lortsch, 1910); Fermat’s last theorem (Kleiner, 2000); annotations from Blake, Keats… (Jackson, 2001); Les Misérables, Victor Hugo; US students (Marshall, 1998)]
… to Collective Digital Annotations
• Hardcopy annotations: hard to share ⇒ ‘lost’
• 1993–2005: > 20 annotation systems (Cabanac et al., 2005)
  • ComMentor … iMarkup … Yawas … Amaya …
[Figure labels: author 87%, reader 13%; Web servers (Ovsiannikov et al., 1999); annotation server; a discussion thread]
Digital Document Annotation: Examples
• W3C Annotea / Amaya (Kahan et al., 2002)
[Screenshot labels: a reader’s comment; discussion thread]
• Arakne, featuring “fluid annotations” (Bouvin et al., 2002)
Collective Annotations
• Reviewed 64 systems designed during 1989–2008
• Collective annotation
  • Objective data
    • Owner, creation date
    • Anchoring point within the document. Granularity: whole document, words…
  • Subjective information
    • Comments, various marks: stars, underlined text…
    • Annotation types: support/refutation, question…
    • Visibility: public, private, group…
• Purpose-oriented annotation categories
  • Annotation remark
  • Annotation reminder
  • Annotation argumentation
[Figure label: Personal Annotation Space]
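The attributes listed on this slide could be captured in a small data structure. This is only an illustrative sketch: the class and field names (`Annotation`, `anchor`, `kind`, `visibility`, `replies`) and the helper `thread_size` are assumptions made for the example, not the schema of any of the reviewed systems.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List


class Visibility(Enum):
    PUBLIC = "public"
    PRIVATE = "private"
    GROUP = "group"


@dataclass
class Annotation:
    # Objective data
    owner: str
    created: datetime
    anchor: str  # hypothetical locator: whole document, a passage, a word...
    # Subjective information
    comment: str = ""
    kind: str = "remark"  # e.g. "support", "refutation", "question"
    visibility: Visibility = Visibility.PRIVATE
    # Replies turn a single annotation into a discussion thread
    replies: List["Annotation"] = field(default_factory=list)


def thread_size(annotation: Annotation) -> int:
    """Count an annotation plus every reply below it in its thread."""
    return 1 + sum(thread_size(reply) for reply in annotation.replies)
```

Nesting `replies` inside the annotation keeps a whole discussion thread reachable from its root, which is the shape the social validation slides work on.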
Digital Libraries
• Collective annotations
• Social validation of discussion threads
• Organization-based document similarity
Question DL-2
How to measure the social validity of
a statement according to the
argumentative discussion it sparked off?
Guillaume Cabanac, Max Chevalier, Claude Chrisment, Christine Julien. “Social validation of collective
annotations: Definition and experiment.” Journal of the American Society for Information Science and
Technology, 61(2):271–287, Feb. 2010, Wiley. DOI:10.1002/asi.21255
Social Validation of Argumentative Debates
• Scalability issue
  • Which annotations should I read?
• Social validation = degree of consensus of the group
Social Validation of Argumentative Debates
• Informing readers about how validated each annotation is
[Figure: before, an annotation magma; after, a filtered display]
Social Validation Algorithms
• Overview
[Figure: validity scale from –1 (socially refuted) through 0 (socially neutral) to 1 (socially confirmed); four cases (case 1 to case 4) involving annotations A and B]
• Two proposed algorithms
  • Empirical Recursive Scoring Algorithm (Cabanac et al., 2005)
  • Bipolar Argumentation Framework Extension, based on Artificial Intelligence research (Cayrol & Lagasquie-Schiex, 2005)
Social Validation Algorithm
• Example
  • Computing the social validity of a debated annotation
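To give a feel for what a recursive score on the –1 … 1 scale can look like, here is a toy sketch. It is not the published Empirical Recursive Scoring Algorithm, whose formula these slides do not give: the weighting rule, the `Node` class and the `CONFIRM`/`REFUTE` constants are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

CONFIRM, REFUTE = 1, -1


@dataclass
class Node:
    stance: int = CONFIRM  # stance towards the parent (ignored for the root)
    replies: List["Node"] = field(default_factory=list)


def social_validity(node: Node) -> float:
    """Toy recursive score in [-1, 1]; 0 means socially neutral."""
    if not node.replies:
        return 0.0  # nobody reacted: neither confirmed nor refuted
    total = 0.0
    for reply in node.replies:
        # Assumed rule: a reply weighs more when it is itself confirmed.
        # validity -1 -> weight 0, validity 0 -> weight 0.5, validity 1 -> weight 1
        weight = (1.0 + social_validity(reply)) / 2.0
        total += reply.stance * weight
    return total / len(node.replies)
```

With this rule, a refutation that has itself been refuted counts for less than an undisputed one, so a thread such as A ← refuted by B ← refuted by C leaves A only mildly refuted.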
Validation with a User-study
• Design
  • Corpus: 13 discussion threads = 222 annotations + answers
  • Task of a participant
    • Label opinion type
    • Infer overall opinion
  • Volunteer subjects [figure labels: 53, 119]
• Aim: social validation vs. human perception of consensus
Experimenting with the Social Validation of Debates
• Q1: Do people agree when labeling opinions?
  • Kappa coefficient (Fleiss, 1971; Fleiss et al., 2003): inter-rater agreement among n > 2 raters
  • Weak agreement, with variability ⇒ subjective task
[Figure: value of Kappa (agreement) per debate Id, with bands “Poor” and “Fair to good”]
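For readers unfamiliar with the Kappa coefficient named above, the following is a minimal sketch of Fleiss' kappa in plain Python. The function name and the input layout (one row per item, one column per category) are choices made for this example; the study's actual data are not reproduced here.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """counts[i][j] = number of raters who put item i in category j.

    Every item must be rated by the same number of raters (at least 2).
    Undefined (division by zero) when all ratings fall in one category.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Share of all assignments that went to each category
    p_category = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Agreement observed on each item
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_category)
    return (p_observed - p_expected) / (1 - p_expected)
```

A value of 1 means perfect agreement, while values at or below 0 mean agreement no better than chance.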
Experimenting with the Social Validation of Debates
• Q2: How well does SV approximate HP?
  • HP = Human Perception of consensus
  • SV = Social Validation algorithm
1. Test whether HP and SV are different (p < 0.05)
   ⇒ Student’s paired t-test: (p = 0.20) > (α = 0.05), so no significant difference is detected
2. Correlate HP and SV
   ⇒ Pearson’s coefficient of correlation r
   r(HP, SV) = 0.48 shows a weak correlation
[Figure: density of HP – SV, y = p(HP – SV); example: HP = SV for 24 % of all cases]
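The two statistics used for Q2 can be sketched with the standard library alone. The function names are choices made for this example, and the sketch stops at the t statistic: turning it into a p-value needs the Student t distribution, for which SciPy's `scipy.stats.ttest_rel` (and `scipy.stats.pearsonr` for the correlation) would be the usual tools.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(x, y):
    """t statistic of the paired differences x[i] - y[i] (n - 1 degrees of freedom)."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def pearson_r(x, y):
    """Pearson's coefficient of correlation between two equally long samples."""
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(spread_x * spread_y)
```

Here `x` and `y` would hold one HP value and one SV value per debate, paired by position.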
Digital Libraries
• Collective annotations
• Social validation of discussion threads
• Organization-based document similarity
Question DL-3
How to harness a quiescent capital
present in any community:
its documents?
Guillaume Cabanac, Max Chevalier, Claude Chrisment, Christine Julien. “Organization of digital resources as an
original facet for exploring the quiescent information capital of a community.” International Journal on Digital
Libraries, 11(4):239–261, Dec. 2010, Springer. DOI:10.1007/s00799-011-0076-6
Documents as a Quiescent Wealth
• Personal documents
  • Filtered, validated, organized information…
  • … relevant to activities in the organization
• Paradox: profitable, but under-exploited
  • Reason 1: folders and files are private
  • Reason 2: manual sharing
  • Reason 3: automated sharing
• Consequences
  • People resort to resources available outside of the community
  • Weak ROI ⇒ why look outside when the information is already there?
How to Benefit from Documents in a Community?
• Mapping the documents of the community
  • SOM [Kohonen, 2001], Umap [Triviumsoft], TreeMap [Fekete & Plaisant, 2001]…
• Limitations
  • Find the documents with the same topics as D
  • Find documents that colleagues use with D
  → concept of usage: grouping documents ⇆ keeping stuff in common
How to Benefit from Documents in a Community?
• Organization-based similarities
  • inter-folder
  • inter-document
  • inter-user
How to Help People to Discover/Find/Use Documents?
• Purpose: offering a global view of people and their documents
  • Based on document contents
  • Based on document usage/organization
• Requirement: non-intrusiveness and confidentiality
• Operational needs
  • Find documents
    • With related materials
    • With complementary materials
  • Seeking people ⇆ seeking documents
• Managerial needs
  • Visualize the global/individual activity
  • Work position → required documents
Proposed System: Static Aspect
4 views = {documents, people} × {group, unit}
1. Group of documents
   • Main topics
   • Usage groups
2. A single document
   • Who to liaise with?
   • What to read?
3. Group of people
   • Community of interest
   • Community of use
4. A single person
   • Interests
   • Similar users (potential help)
Question IR-1
Is document tie-breaking
affecting the evaluation of
Information Retrieval systems?
Information Retrieval
• The tie-breaking bias in IR evaluation
• Geographic IR
• Effectiveness of query operators
Guillaume Cabanac, Gilles Hubert, Mohand Boughanem, Claude Chrisment. “Tie-breaking Bias: Effect of an
Uncontrolled Parameter on Information Retrieval Evaluation.” M. Agosti, N. Ferro, C. Peters, M. de Rijke, and A. F.
Smeaton (Eds.) CLEF’10: Proceedings of the 1st Conference on Multilingual and Multimodal Information Access
Evaluation, volume 6360 of LNCS, pages 112–123. Springer, Sep. 2010. DOI:10.1007/978-3-642-15998-5_13
Measuring the Effectiveness of IR systems
• User-centered vs. system-focused [Spärck Jones & Willett, 1997]
• Evaluation campaigns
  • 1958 Cranfield, UK
  • 1992 TREC (Text Retrieval Conference), USA
  • 1999 NTCIR (NII Test Collection for IR Systems), Japan
  • 2001 CLEF (Cross-Language Evaluation Forum), Europe
  • …
• “Cranfield” methodology
  • Task
  • Test collection
    • Corpus
    • Topics
    • Qrels
  • Measures: MAP, P@X ... using trec_eval [Voorhees, 2007]
Runs are Reordered Prior to Their Evaluation
• Qrels = 〈qid, iter, docno, rel〉
• Run = 〈qid, iter, docno, rank, sim, run_id〉
• Reordering by trec_eval: qid asc, sim desc, docno desc
• Effectiveness measure (MAP, P@X, MRR…) = f(intrinsic_quality, [second argument: icon not recoverable])
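The reordering rule on this slide (qid ascending, sim descending, docno descending) can be reproduced in a few lines. This is a sketch of the rule as stated here, not trec_eval's own code; the `RunLine` type and the document identifiers are made up for the example.

```python
from typing import List, NamedTuple


class RunLine(NamedTuple):
    qid: int
    docno: str
    sim: float


def reorder(run: List[RunLine]) -> List[RunLine]:
    """Sort by qid asc, sim desc, docno desc, using successive stable sorts."""
    by_docno = sorted(run, key=lambda line: line.docno, reverse=True)
    by_sim = sorted(by_docno, key=lambda line: line.sim, reverse=True)
    return sorted(by_sim, key=lambda line: line.qid)


run = [
    RunLine(1, "AP880101", 0.8),   # tied with the WSJ document
    RunLine(1, "WSJ870101", 0.8),
    RunLine(1, "FT911", 0.9),
]
for line in reorder(run):
    print(line.qid, line.docno, line.sim)
```

The two documents tied at 0.8 end up ordered by their identifiers alone, so the WSJ document is ranked above the AP one whatever rank the system originally gave them.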
Consequences of Run Reordering
• Measures of effectiveness for an IRS s
  • RR(s,t): 1/rank of the 1st relevant document, for topic t
  • P(s,t,d): precision at document d, for topic t
  • AP(s,t): average precision for topic t
  • MAP(s): mean average precision
  ⇒ Sensitive to document rank
• Tie-breaking bias
  • Is the Wall Street Journal collection more relevant than Associated Press?
[Figure labels: Chris, Ellen]
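The measures named on this slide follow their textbook definitions, sketched below. Function and argument names are choices made for this example; a ranking is a list of docnos and `relevant` is the set of docnos judged relevant for the topic.

```python
def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant document, 0 if none is retrieved."""
    for rank, docno in enumerate(ranking, start=1):
        if docno in relevant:
            return 1.0 / rank
    return 0.0


def precision_at(ranking, relevant, k):
    """Fraction of the top-k documents that are relevant."""
    return sum(1 for docno in ranking[:k] if docno in relevant) / k


def average_precision(ranking, relevant):
    """Mean of the precision values at each relevant document retrieved,
    divided by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, docno in enumerate(ranking, start=1):
        if docno in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(topics):
    """topics: list of (ranking, relevant) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in topics) / len(topics)
```

Swapping two adjacent documents, which is exactly what tie-breaking can do, is enough to change these values: with relevant documents `d2` and `d4`, the ranking `d1, d2, d3, d4` has AP 0.5 while `d2, d1, d3, d4` has AP 0.75.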
What we Learnt: Beware of Tie-breaking for AP
• Small effect on MAP, larger effect on AP
• Measure bounds: AP_Realistic ≤ AP_Conventional ≤ AP_Optimistic
[Figure: run padre1, adhoc’94]
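The bounds can be illustrated on a toy run. The strategy names come from the slide; how each one breaks ties (relevant documents last, docno descending, relevant documents first) is an assumption of this sketch, and the documents and scores are invented.

```python
def average_precision(ranking, relevant):
    hits, total = 0, 0.0
    for rank, docno in enumerate(ranking, start=1):
        if docno in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def rank_with_strategy(run, relevant, strategy):
    """run: list of (docno, sim). Ties on sim are broken per strategy."""
    if strategy == "conventional":   # docno descending
        tie_order = sorted(run, key=lambda d: d[0], reverse=True)
    elif strategy == "realistic":    # assumed: relevant documents last
        tie_order = sorted(run, key=lambda d: d[0] in relevant)
    elif strategy == "optimistic":   # assumed: relevant documents first
        tie_order = sorted(run, key=lambda d: d[0] in relevant, reverse=True)
    else:
        raise ValueError(strategy)
    # Stable sort on sim keeps the tie order chosen above
    final = sorted(tie_order, key=lambda d: d[1], reverse=True)
    return [docno for docno, _ in final]


run = [("a", 0.5), ("b", 0.5), ("c", 0.5)]  # three documents tied on sim
relevant = {"b"}
for strategy in ("realistic", "conventional", "optimistic"):
    ranking = rank_with_strategy(run, relevant, strategy)
    print(strategy, ranking, round(average_precision(ranking, relevant), 3))
```

On this toy run the same system output yields an AP of 1/3, 1/2 or 1 depending only on how the tie is broken.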
Question IR-2
How to retrieve documents
matching keywords and
spatiotemporal constraints?
Information Retrieval
• The tie-breaking bias in IR evaluation
• Geographic IR
• Effectiveness of query operators
Damien Palacio, Guillaume Cabanac, Christian Sallaberry, Gilles Hubert. “On the evaluation of geographic
information retrieval systems: Evaluation framework and case study.” International Journal on Digital Libraries,
11(2):91–109, June 2010, Springer. DOI:10.1007/s00799-011-0070-z
Geographic Information Retrieval
• Query = “Road trip around Aberdeen summer 1982”
• Search engines
  • Topical: term ∈ {road, trip, Aberdeen, summer}
  • Geographic: term ∈ {road, trip, Aberdeen, summer}
    spatial ∈ {AberdeenCity, AberdeenCounty…}
    temporal ∈ [21-JUN-1982 .. 22-SEP-1982]
• ≈ 1/6 queries = geographic queries
  • Excite (Sanderson et al., 2004)
  • AOL (Gan et al., 2008)
  • Yahoo! (Jones et al., 2008)
⇒ Current issue worth studying
The Internals of a Geographic IR System
• 3 dimensions to process
  • Topical, spatial, temporal
• 1 index per dimension
  • Topical: bag of words, stemming, weighting, comparing with VSM…
  • Spatial: spatial entity detection, spatial relation resolution…
  • Temporal: temporal entity detection…
• Query processing with sequential filtering
  • e.g., priority to theme, then filtering according to other dimensions
• Issue: effectiveness of GIRSs vs. state-of-the-art IRSs?
  • Hypothesis: GIRSs better than state-of-the-art IRSs
Case Study: the PIV GIR System
• Indexing: one index per dimension
  • Topical = Terrier IRS; Spatial = tiling; Temporal = tiling
• Retrieval
  • Identification of the 3 dimensions in the query
  • Routing towards each index
  • Combination of results with CombMNZ [Fox & Shaw, 1993; Lee, 1997]
Case Study: the PIV GIR System
• Principle of CombMNZ and Borda Count
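As a reminder of the CombMNZ principle: each document's scores are summed across the input lists, and the sum is multiplied by the number of lists that gave it a non-zero score. The sketch below assumes the scores were already normalized to [0, 1] before fusion; the function name, the per-dimension dictionaries and the scores are invented for the example, and Borda Count is not covered.

```python
from collections import defaultdict


def comb_mnz(score_lists):
    """score_lists: one dict {docno: score in [0, 1]} per input ranking.

    Returns (docno, fused score) pairs, best first.
    """
    summed = defaultdict(float)
    hits = defaultdict(int)
    for scores in score_lists:
        for docno, score in scores.items():
            if score > 0:
                summed[docno] += score
                hits[docno] += 1
    fused = {docno: summed[docno] * hits[docno] for docno in summed}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


topical = {"d1": 0.9, "d2": 0.4}
spatial = {"d2": 0.8, "d3": 0.7}
temporal = {"d2": 0.5}
for docno, score in comb_mnz([topical, spatial, temporal]):
    print(docno, round(score, 2))
```

Document `d2` is never the top result of any single index, yet it comes first after fusion because all three dimensions retrieved it.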
Case Study: the PIV GIR System
• Gain in effectiveness
Question IR-3
Do operators in search queries improve
the effectiveness of search results?
Information Retrieval
• The tie-breaking bias in IR evaluation
• Geographic IR
• Effectiveness of query operators
Gilles Hubert, Guillaume Cabanac, Christian Sallaberry, Damien Palacio. “Query Operators Shown Beneficial for
Improving Search Results.” S. Gradmann, F. Borri, C. Meghini, H. Schuldt (Eds.) TPDL’11: Proceedings of the 1st
International Conference on Theory and Practice of Digital Libraries, volume 6966 of LNCS, pages 118–129.
Springer, Sep. 2011. DOI:10.1007/978-3-642-24469-8_14.
Search Engines Offer Query Operators
• Information need: “I’m looking for research projects funded in the DL domain”
• Various operators
  • Quotation marks, must appear (+), boosting operator (^), Boolean operators, proximity operators…
[Figure: regular query vs. query with operators]
Our Research Questions
Our Methodology in a Nutshell
[Figure: a regular query compared with its query variants with operators V1, V2, V3, V4, …, VN]
Effectiveness of Query Operators
• TREC-7 per-topic analysis: boxplots
  • ‘+’ and ‘^’
Effectiveness of Query Operators
• Per-topic analysis: box plot
[Figure: box plot per topic; x-axis: Topics; y-axis: AP (Average Precision); legend: AP of TREC’s regular query, query variant with highest AP, query variant with lowest AP]
Effectiveness of Query Operators
• TREC-7 per-topic analysis
  • ‘+’ and ‘^’
• MAP [marker lost in extraction] = 0.1554
• MAP ┬ = 0.2099
• +35.1%
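The +35.1% figure is the relative gain between the two MAP values shown on this slide. A one-function check (the function name is a choice made for this example):

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


print(round(relative_gain(0.1554, 0.2099), 1))  # 35.1
```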
Question SCIM-1
How to recommend researchers
according to their research topics
and social clues?
Scientometrics
• Recommendation based on topics and social clues
• Landscape of research in Information Systems
• The submission-date bias in peer-reviewed conferences
Guillaume Cabanac. “Accuracy of inter-researcher similarity measures based on topical and social clues.”
Scientometrics, 87(3):597–620, June 2011, Springer. DOI:10.1007/s11192-011-0358-1
Recommendation of Literature (McNee et al., 2006)
• Collaborative filtering
  • Principle: mining the preferences of researchers
    → those who liked this paper also liked…
  • Snowball effect / fad
  • Innovation?
  • Relevance of theme?
• Cognitive filtering
  • Principle: mining the contents of articles
    → profile of resources (researcher, articles)
    → citation graph
• Hybrid approach
Foundations: Similarity Measures Under Study
• Model
  • Coauthors graph: authors ↔ authors
  • Venues graph: authors ↔ conferences / journals
• Social similarities
  • Inverse degree of separation: length of the shortest path
  • Strength of the tie: number of shortest paths
  • Shared conferences: number of shared conference editions
• Thematic similarity
  • Cosine on Vector Space Model, $d_i = (w_{i1}, \dots, w_{in})$, built on titles (doc / researcher)
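Two of these measures are easy to sketch: the cosine between term-weight vectors, and the length and number of shortest paths in a coauthor graph (from which the inverse degree of separation and the strength of the tie are derived). The function names, the raw term-frequency weighting and the toy graph are assumptions made for illustration; the weighting scheme actually used in the study is not specified on the slide.

```python
from collections import Counter, deque
from math import sqrt


def cosine(a, b):
    """Cosine between two sparse vectors given as {term: weight} mappings."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def shortest_paths(graph, source, target):
    """Breadth-first search returning (length, number) of shortest paths.

    graph: {node: iterable of neighbours}. Returns (None, 0) if unreachable.
    """
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                count[neighbour] = count[node]
                queue.append(neighbour)
            elif dist[neighbour] == dist[node] + 1:
                count[neighbour] += count[node]
    return dist.get(target), count.get(target, 0)


coauthors = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
length, n_paths = shortest_paths(coauthors, "A", "D")
print(1 / length, n_paths)  # inverse degree of separation, strength of the tie

titles_1 = Counter("tie breaking bias".split())
titles_2 = Counter("tie breaking evaluation".split())
print(round(cosine(titles_1, titles_2), 3))
```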
Computing Similarities with Social Clues
• Task of literature review
  • Requirement: topical relevance
  • Preference: social proximity (meetings, projects…)
  ⇒ re-rank topical results with social clues
• Combination with CombMNZ (Fox & Shaw, 1993)
• Final result: list of recommended researchers
[Figure: degree of separation, strength of ties, shared conferences → CombMNZ → social list; topical list ∩ social list → CombMNZ → TS list]
Evaluation Design
• Comparison of recommendations and researchers’ perception
  • Q1: Effectiveness of topical (only) recommendations?
  • Q2: Gain due to integrating social clues?
• IR experiments: Cranfield paradigm (TREC…)
  • Does the search engine retrieve relevant documents?
[Figure: a search engine takes a topic and a corpus as input; an assessor judges whether each doc is relevant; the relevance judgments, binary {0, 1} or gradual [0, N], form the qrels; trec_eval computes effectiveness measures (Mean Average Precision, Normalized Discounted Cumulative Gain)]

topic          S1       S2
1              0.5687   0.6521
…              …        …
50             0.7124   0.7512
avg            0.6421   0.7215
improvement    +12.3 %
significance   p < 0.05 (paired t-test)
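Normalized Discounted Cumulative Gain, the graded measure named on this slide, can be sketched as follows. This uses the common log2 discount and takes the ideal ordering from the judged list itself, which assumes every relevant item appears in that list; trec_eval's own implementation may differ in such details.

```python
from math import log2


def dcg(gains):
    """Discounted cumulative gain of graded judgments listed in rank order."""
    return sum(gain / log2(rank + 1) for rank, gain in enumerate(gains, start=1))


def ndcg(gains):
    """DCG divided by the DCG of the best possible ordering of the same gains."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

A list that already ranks the most relevant items first scores 1, and pushing a relevant item down lowers the score: `ndcg([0, 1])` is about 0.63.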
Evaluating Recommendations
[Figure: the evaluation pipeline of the previous slide (assessor, relevance judgments, qrels, trec_eval, effectiveness measures, S1 vs. S2 table) relabeled for recommendation: search engine → recommender system (topical; topical + social); topic → name of a researcher; assessor → researcher, asked “With whom would you like to chat for improving your research?”; other labels: #subjects, Top 25]
Experiment
• Features
  • Data: dblp.xml (713 MB = 1.3M publications for 811,787 researchers)
  • Subjects: 90 researchers contacted by mail; 74 began to fill in the questionnaire; 71 completed it
• Interface for assessing recommendations
Experiments: Profile of the Participants
• Experience of the 71 subjects: Mdn = 13 years
• Productivity of the 71 subjects: Mdn = 15 publications
[Figures: number of participants by seniority; number of participants by number of publications]
Empirical Validation of our Hypothesis
• Strong baseline ⇒ effective approach based on VSM
• +8.49 % = significant improvement (p < 0.05; n = 70) of topical recommendations by social clues
[Figure: NDCG (axis from 0.5 to 1) of topical vs. topical + social recommendations, per group]

group                            gain (topical + social vs. topical)
global                           +8.49 %
productivity: < 15 publications  +10.39 %
productivity: ≥ 15 publications  +7.03 %
experience: < 13 years           +6.50 %
experience: ≥ 13 years           +10.22 %
Question SCIM-2
What is the landscape of research in
Information Systems from the
perspective of gatekeepers?
Scientometrics
• Recommendation based on topics and social clues
• Landscape of research in Information Systems
• The submission-date bias in peer-reviewed conferences
Guillaume Cabanac. “Shaping the landscape of research in Information Systems from the perspective of editorial
boards: A scientometric study of 77 leading journals.” Journal of the American Society for Information Science
and Technology, 63, to appear in 2012, Wiley. DOI:10.1002/asi.22609
Landscape of Research in Information Systems
• The gatekeepers of science
Landscape of Research in Information Systems
• The 77 core peer-reviewed IS journals in the WoS
Landscape of Research in Information Systems
• Exploratory data analysis
Landscape of Research in Information Systems
• Exploratory data analysis
Landscape of Research in Information Systems
• Topical map of the IS field
Landscape of Research in Information Systems
• Most influential gatekeepers
Landscape of Research in Information Systems
• Number of gatekeepers per country
Landscape of Research in Information Systems
• Geographic and gender diversity
Question SCIM-3
What if submission date influenced the
acceptance of conference papers?
Scientometrics
• Recommendation based on topics and social clues
• Landscape of research in Information Systems
• The submission-date bias in peer-reviewed conferences
Guillaume Cabanac. “What if submission date influenced the acceptance of conference papers?” Submitted to
the Journal of the American Society for Information Science and Technology, Wiley.
Conferences Affected by a Submission-Date Bias?
• Peer review
The Submission-Date Bias
• Dataset from the ConfMaster conference management system
The Submission-Date Bias
• Influence of submission date on bids
The Submission-Date Bias
• Influence of submission date on average marks
Conclusion
Digital Libraries
• Collective annotations
• Social validation of discussion threads
• Organization-based document similarity
Information Retrieval
• The tie-breaking bias in IR evaluation
• Geographic IR
• Effectiveness of query operators
Scientometrics
• Recommendation based on topics and social clues
• Landscape of research in Information Systems
• The submission-date bias in peer-reviewed conferences
Thank you
http://www.irit.fr/~Guillaume.Cabanac
Twitter: @tafanor

Anatomy of Social Networks, a guide for social media strategists
 
Calhoun Rbms Rev June 2008
Calhoun Rbms Rev June 2008Calhoun Rbms Rev June 2008
Calhoun Rbms Rev June 2008
 

Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics

  • 1. Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics. http://bit.ly/rguCabanac2012 Guillaume Cabanac, guillaume.cabanac@univ-tlse3.fr, March 28th, 2012
  • 2. Outline of these Musings. Musings at the Crossroads of DL, IR, and SCIM, Guillaume Cabanac. Digital Libraries: • Collective annotations • Social validation of discussion threads • Organization-based document similarity. Information Retrieval: • The tie-breaking bias in IR evaluation • Geographic IR • Effectiveness of query operators. Scientometrics: • Recommendation based on topics and social clues • Landscape of research in Information Systems • The submission-date bias in peer-reviewed conferences
  • 3. Outline of these Musings. Digital Libraries: • Collective annotations • Social validation of discussion threads • Organization-based document similarity. Information Retrieval: • The tie-breaking bias in IR evaluation • Geographic IR • Effectiveness of query operators. Scientometrics: • Recommendation based on topics and social clues • Landscape of research in Information Systems • The submission-date bias in peer-reviewed conferences
  • 4. Digital Libraries: • Collective annotations • Social validation of discussion threads • Organization-based document similarity. Question DL-1: How to transpose paper-based annotations into digital documents? Guillaume Cabanac, Max Chevalier, Claude Chrisment, Christine Julien. “Collective annotation: Perspectives for information retrieval improvement.” RIAO’07: Proceedings of the 8th conference on Information Retrieval and its Applications, pages 529–548. CID, May 2007.
  • 5. From Individual Paper-based Annotation … • Characteristics of paper annotation • Centuries-old activity: more than four centuries old • Numerous application contexts: theology, science, literature … • Personal use: “active reading” (Adler & van Doren, 1972) • Collective use: review process, opinion exchange … • Timeline figure (1541, 1630, 1790, 1830, 1881, 1998): annotated Bible (Lortsch, 1910); Fermat’s last theorem (Kleiner, 2000); annotations from Blake, Keats… (Jackson, 2001); Les Misérables, Victor Hugo; US students (Marshall, 1998)
  • 6. … to Collective Digital Annotations • Hard to share ⇒ ‘lost’ hardcopy • author 87%, reader 13% • 1993–2005: ComMentor … iMarkup … Yawas … Amaya … > 20 annotation systems (Cabanac et al., 2005) • Web servers (Ovsiannikov et al., 1999), annotation server • a discussion thread
  • 7. Digital Document Annotation: Examples • W3C Annotea / Amaya (Kahan et al., 2002): a reader’s comment, discussion thread • Arakne, featuring “fluid annotations” (Bouvin et al., 2002)
  • 8. Collective Annotations • Reviewed 64 systems designed during 1989–2008 • Collective annotation • Objective data: owner, creation date; anchoring point within the document (granularity: whole document, words…) • Subjective information: comments, various marks (stars, underlined text…); annotation types (support/refutation, question…); visibility (public, private, group…) • Purpose-oriented annotation categories: annotation remark, annotation reminder, annotation argumentation • Personal Annotation Space
  • 9. Digital Libraries: • Collective annotations • Social validation of discussion threads • Organization-based document similarity. Question DL-2: How to measure the social validity of a statement according to the argumentative discussion it sparked off? Guillaume Cabanac, Max Chevalier, Claude Chrisment, Christine Julien. “Social validation of collective annotations: Definition and experiment.” Journal of the American Society for Information Science and Technology, 61(2):271–287, Feb. 2010, Wiley. DOI:10.1002/asi.21255
  • 10. Social Validation of Argumentative Debates • Scalability issue: which annotations should I read? • Social validation = degree of consensus of the group
  • 11. Social Validation of Argumentative Debates • Before: annotation magma • After: filtered display • Informing readers about how validated each annotation is
  • 12. Social Validation Algorithms • Overview: validity ranges from –1 (socially refuted) through 0 (socially neutral) to 1 (socially confirmed); the figure shows cases 1 to 4 for annotations A and B • Two proposed algorithms • Empirical Recursive Scoring Algorithm (Cabanac et al., 2005) • Bipolar Argumentation Framework Extension, based on Artificial Intelligence research works (Cayrol & Lagasquie-Schiex, 2005)
  • 13. Social Validation Algorithm • Example: computing the social validity of a debated annotation
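The slides name two algorithms but do not spell them out, so the following is only a toy sketch of the general idea (a recursive score in [–1, 1] driven by confirming and refuting replies). It is not the published Empirical Recursive Scoring Algorithm nor the bipolar argumentation extension. The `Annotation` class, the `stance` field, the `social_validity` function, and the weighting rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    # Hypothetical model: +1 if this reply confirms its parent,
    # -1 if it refutes its parent, 0 for the root annotation.
    stance: int = 0
    replies: list = field(default_factory=list)


def social_validity(node: Annotation) -> float:
    """Toy recursive score in [-1, 1]; 0 means socially neutral."""
    if not node.replies:
        return 0.0  # nobody reacted: neutral
    total = 0.0
    norm = 0.0
    for reply in node.replies:
        # Assumed weighting: a reply that was itself refuted counts less
        # (weight 0 when fully refuted, 2 when fully confirmed).
        weight = 1.0 + social_validity(reply)
        total += reply.stance * weight
        norm += weight
    return total / norm if norm > 0 else 0.0


# One confirming reply, and one refuting reply that was itself refuted.
refuted_objection = Annotation(stance=-1, replies=[Annotation(stance=-1)])
debated = Annotation(replies=[Annotation(stance=+1), refuted_objection])
balanced = Annotation(replies=[Annotation(stance=+1), Annotation(stance=-1)])
```

Under these assumptions, the refuted objection loses all of its weight, so `debated` ends up socially confirmed, while `balanced` stays neutral.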
  • 14. Validation with a User Study • Aim: social validation vs. human perception of consensus • Design • Corpus: 13 discussion threads = 222 annotations + answers • Task of a participant: label opinion type; infer overall opinion • Volunteer subjects: 53 119
  • 15. Experimenting the Social Validation of Debates • Q1: Do people agree when labeling opinions? • Kappa coefficient (Fleiss, 1971; Fleiss et al., 2003): inter-rater agreement among n > 2 raters • Weak agreement, with variability ⇒ subjective task • Figure: value of Kappa (agreement) per debate id, with “Poor” and “Fair to good” bands
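For readers unfamiliar with the Kappa coefficient cited on slide 15, here is a minimal sketch of the standard Fleiss' kappa formula for more than two raters. The function name `fleiss_kappa` and the count-matrix input layout are assumptions for illustration; the study's actual data and tooling are not given in the slides.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a matrix where counts[i][j] is the number of
    raters who assigned item i to category j (same rater count per item)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement on each item, then averaged over items.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1.0 - p_e)


# Three raters, two opinion labels, two annotations (made-up counts).
unanimous = [[3, 0], [0, 3]]
split = [[2, 1], [1, 2]]
```

With these made-up counts, `unanimous` gives a kappa of 1 (perfect agreement) and `split` gives a negative value (agreement below chance).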
  • 16. Experimenting the Social Validation of Debates • Q2: How well does SV approximate HP? • HP = Human Perception of consensus • SV = Social Validation algorithm 1. Test whether HP and SV are different (p < 0.05) ⇒ Student’s paired t-test: (p = 0.20) > (α = 0.05), so no significant difference is detected 2. Correlate HP and SV ⇒ Pearson’s coefficient of correlation r: r(HP, SV) = 0.48 shows a weak correlation • Figure: density of HP – SV, y = p(HP – SV); example: HP = SV for 24 % of all cases
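The two statistics on slide 16 can be sketched with the standard library alone. This is an illustration of the textbook formulas, not the study's analysis script. The helper names `paired_t_statistic` and `pearson_r` are hypothetical, and the sample scores are made up. Turning the t statistic into a p-value needs the Student t distribution, for instance via `scipy.stats.ttest_rel`, which is left out here to keep the sketch dependency-free.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(x, y):
    """Return (t, degrees_of_freedom) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # t = mean difference divided by its standard error.
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


def pearson_r(x, y):
    """Pearson's coefficient of correlation between x and y."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Made-up consensus scores in [-1, 1] for four debates.
hp = [0.5, -0.2, 0.8, 0.1]   # human perception
sv = [0.4, 0.0, 0.6, 0.3]    # social validation algorithm
t_stat, dof = paired_t_statistic(hp, sv)
r = pearson_r(hp, sv)
```

Note that a p-value above α only means the test failed to detect a difference between HP and SV; it does not prove that they are equal, which is why the correlation is reported as a second check.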
  • 17. Digital Libraries: • Collective annotations • Social validation of discussion threads • Organization-based document similarity. Question DL-3: How to harness a quiescent capital present in any community: its documents? Guillaume Cabanac, Max Chevalier, Claude Chrisment, Christine Julien. “Organization of digital resources as an original facet for exploring the quiescent information capital of a community.” International Journal on Digital Libraries, 11(4):239–261, Dec. 2010, Springer. DOI:10.1007/s00799-011-0076-6
  • 18. Documents as a Quiescent Wealth • Personal documents • Filtered, validated, organized information… • … relevant to activities in the organization • Paradox: profitable, but under-exploited • Reason 1 – folders and files are private • Reason 2 – manual sharing • Reason 3 – automated sharing • Consequences • People resort to resources available outside of the community • Weak ROI ⇒ why look outside when it’s already there?
  • 19. How to Benefit from Documents in a Community? • Mapping the documents of the community • SOM [Kohonen, 2001], Umap [Triviumsoft], TreeMap [Fekete & Plaisant, 2001]… • Limitations • Find the documents with the same topics as D • Find documents that colleagues use with D → concept of usage: grouping documents ⇆ keeping stuff in common
  • 20. How to Benefit from Documents in a Community? • Organization-based similarities • inter-folder • inter-document • inter-user
  • 21. How to Help People to Discover/Find/Use Documents? • Purpose: offering a global view of … people and their documents • Based on document contents • Based on document usage/organization • Requirement: non-intrusiveness and confidentiality • Operational needs • Find documents • With related materials • With complementary materials • Seeking people ⇆ seeking documents • Managerial needs • Visualize the global/individual activity • Work position → required documents
  • 22. Proposed System: Static Aspect • 4 views = {documents, people} × {group, unit} 1. Group of documents: main topics; usage groups 2. A single document: who to liaise with? what to read? 3. Group of people: community of interest; community of use 4. A single person: interests; similar users (potential help)
  • 23. Outline of these Musings
      • Digital Libraries: collective annotations; social validation of discussion threads; organization-based document similarity
      • Information Retrieval: the tie-breaking bias in IR evaluation; Geographic IR; effectiveness of query operators
      • Scientometrics: recommendation based on topics and social clues; landscape of research in Information Systems; the submission-date bias in peer-reviewed conferences
  • 24. Question IR-1: Does document tie-breaking affect the evaluation of Information Retrieval systems?
      • Section: Information Retrieval (the tie-breaking bias in IR evaluation; Geographic IR; effectiveness of query operators)
      • Reference: Guillaume Cabanac, Gilles Hubert, Mohand Boughanem, Claude Chrisment. “Tie-breaking Bias: Effect of an Uncontrolled Parameter on Information Retrieval Evaluation.” In M. Agosti, N. Ferro, C. Peters, M. de Rijke, and A. F. Smeaton (Eds.), CLEF’10: Proceedings of the 1st Conference on Multilingual and Multimodal Information Access Evaluation, volume 6360 of LNCS, pages 112–123. Springer, Sept. 2010. DOI:10.1007/978-3-642-15998-5_13
  • 25. Measuring the Effectiveness of IR systems
      • User-centered vs. system-focused [Spärck Jones & Willett, 1997]
      • Evaluation campaigns
          • 1958 Cranfield, UK
          • 1992 TREC (Text Retrieval Conference), USA
          • 1999 NTCIR (NII Test Collection for IR Systems), Japan
          • 2001 CLEF (Cross-Language Evaluation Forum), Europe
          • …
      • “Cranfield” methodology
          • Task
          • Test collection: corpus, topics, qrels
          • Measures: MAP, P@X ... using trec_eval [Voorhees, 2007]
  • 26. Runs are Reordered Prior to Their Evaluation
      • Qrels = 〈qid, iter, docno, rel〉
      • Run = 〈qid, iter, docno, rank, sim, run_id〉
      • Reordering by trec_eval: qid asc, sim desc, docno desc
      • Effectiveness measure (MAP, P@X, MRR…) = f(intrinsic_quality, )
  • 27. Consequences of Run Reordering
      • Measures of effectiveness for an IRS s
          • RR(s,t): 1/rank of the 1st relevant document, for topic t
          • P(s,t,d): precision at document d, for topic t
          • AP(s,t): average precision for topic t
          • MAP(s): mean average precision
      • These measures are sensitive to document rank
      • Tie-breaking bias: is the Wall Street Journal collection more relevant than Associated Press?
      • Figure labels: Chris, Ellen
  • 28. What we Learnt: Beware of Tie-breaking for AP
      • Small effect on MAP, larger effect on AP
      • Measure bounds: AP_Realistic ≤ AP_Conventional ≤ AP_Optimistic
      • Example shown: padre1, adhoc’94
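The reordering rule and the three AP bounds on the tie-breaking slides can be illustrated with a small Python sketch. It is only a sketch under stated assumptions: the slides give the trec_eval ordering (sim desc, docno desc) and the inequality, but not the definitions of the bounds, so "realistic" is assumed here to rank tied relevant documents last and "optimistic" to rank them first. The function names and the toy run are hypothetical.

```python
from typing import List, Set, Tuple

Run = List[Tuple[str, float]]  # (docno, sim) pairs for a single topic


def reorder(run: Run, relevant: Set[str], strategy: str) -> List[str]:
    """Return docnos ranked by sim desc, breaking ties per strategy."""
    # Base tie-break used by trec_eval according to the slides: docno desc.
    docs = sorted(run, key=lambda p: p[0], reverse=True)
    if strategy == "realistic":
        # Assumption: tied relevant documents are ranked last.
        docs = sorted(docs, key=lambda p: p[0] in relevant)
    elif strategy == "optimistic":
        # Assumption: tied relevant documents are ranked first.
        docs = sorted(docs, key=lambda p: p[0] not in relevant)
    elif strategy != "conventional":
        raise ValueError(f"unknown strategy: {strategy}")
    # Python's sort is stable, so the tie order set above is preserved.
    docs = sorted(docs, key=lambda p: p[1], reverse=True)
    return [docno for docno, _ in docs]


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """AP: mean of the precision values at each relevant document."""
    hits, total = 0, 0.0
    for rank, docno in enumerate(ranking, start=1):
        if docno in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Toy run: three documents tied at sim = 0.8, one relevant document.
toy_run: Run = [("AP1", 0.8), ("WSJ1", 0.8), ("AP2", 0.8), ("WSJ2", 0.5)]
toy_relevant = {"AP2"}
bounds = {
    s: average_precision(reorder(toy_run, toy_relevant, s), toy_relevant)
    for s in ("realistic", "conventional", "optimistic")
}
```

In this toy run the relevant document carries an "AP" docno, so the conventional docno-descending rule places a tied "WSJ" document ahead of it, which is the effect the Wall Street Journal question on slide 27 points at.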
  • 29. Question IR-2: How to retrieve documents matching keywords and spatiotemporal constraints?
      • Section: Information Retrieval (the tie-breaking bias in IR evaluation; Geographic IR; effectiveness of query operators)
      • Reference: Damien Palacio, Guillaume Cabanac, Christian Sallaberry, Gilles Hubert. “On the evaluation of geographic information retrieval systems: Evaluation framework and case study.” International Journal on Digital Libraries, 11(2):91–109, June 2010, Springer. DOI:10.1007/s00799-011-0070-z
  • 30. Geographic Information Retrieval
      • Query = “Road trip around Aberdeen summer 1982”
      • Search engines
          • Topic: term ∈ {road, trip, Aberdeen, summer}
          • Geographic: spatial ∈ {AberdeenCity, AberdeenCounty…}; temporal ∈ [21-JUN-1982 .. 22-SEP-1982]; term ∈ {road, trip, Aberdeen, summer}
      • ≈ 1/6 queries = geographic queries
          • Excite (Sanderson et al., 2004)
          • AOL (Gan et al., 2008)
          • Yahoo! (Jones et al., 2008)
      • ⇒ Current issue worth studying
  • 31. The Internals of a Geographic IR System
      • 3 dimensions to process: topical, spatial, temporal
      • 1 index per dimension
          • Topical: bag of words, stemming, weighting, comparing with VSM…
          • Spatial: spatial entity detection, spatial relation resolution…
          • Temporal: temporal entity detection…
      • Query processing with sequential filtering, e.g., priority to theme, then filtering according to other dimensions
      • Issue: effectiveness of GIRSs vs state-of-the-art IRSs?
      • Hypothesis: GIRSs better than state-of-the-art IRSs
  • 32. Case Study: the PIV GIR System
      • Indexing: one index per dimension (topical = Terrier IRS; spatial = tiling; temporal = tiling)
      • Retrieval
          • Identification of the 3 dimensions in the query
          • Routing towards each index
          • Combination of results with CombMNZ [Fox & Shaw, 1993; Lee, 1997]
  • 33. Case Study: the PIV GIR System
      • Principle of CombMNZ and Borda Count
  • 34. Case Study: the PIV GIR System
      • Gain in effectiveness
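The CombMNZ fusion named in the PIV case study can be sketched as follows: each document's normalized scores are summed over the result lists, then multiplied by the number of lists that retrieved it. This is a minimal sketch, not the PIV implementation; the min-max normalization, the function names, and the toy scores are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one result list to [0, 1] (assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_mnz(result_lists: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Fuse result lists: (sum of normalized scores) x (number of lists
    that retrieved the document), ranked by decreasing fused score."""
    summed: Dict[str, float] = defaultdict(float)
    hits: Dict[str, int] = defaultdict(int)
    for scores in result_lists:
        if not scores:
            continue  # an empty list contributes nothing
        for doc, s in min_max(scores).items():
            summed[doc] += s
            hits[doc] += 1
    fused = {doc: summed[doc] * hits[doc] for doc in summed}
    # Ties in fused score are broken by docno to keep the output deterministic.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy result lists, one per dimension (hypothetical scores).
topical = {"d1": 12.0, "d2": 8.0, "d3": 4.0}
spatial = {"d2": 0.9, "d3": 0.3}
temporal = {"d2": 0.5, "d4": 1.0}
fused_ranking = comb_mnz([topical, spatial, temporal])
```

Document d2 is retrieved by all three indexes, so the multiplier favors it over d1 even though d1 has the highest topical score.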
  • 35. Question IR-3: Do operators in search queries improve the effectiveness of search results?
      • Section: Information Retrieval (the tie-breaking bias in IR evaluation; Geographic IR; effectiveness of query operators)
      • Reference: Gilles Hubert, Guillaume Cabanac, Christian Sallaberry, Damien Palacio. “Query Operators Shown Beneficial for Improving Search Results.” In S. Gradmann, F. Borri, C. Meghini, H. Schuldt (Eds.), TPDL’11: Proceedings of the 1st International Conference on Theory and Practice of Digital Libraries, volume 6966 of LNCS, pages 118–129. Springer, Sept. 2011. DOI:10.1007/978-3-642-24469-8_14
  • 36. Search Engines Offer Query Operators
      • Information need: “I’m looking for research projects funded in the DL domain”
      • Two formulations: regular query; query with operators
      • Various operators: quotation marks, must appear (+), boosting operator (^), Boolean operators, proximity operators…
  • 37. Our Research Questions
  • 38. Our Methodology in a Nutshell
      • Regular query
      • Query variants with operators: V1, V2, V3, V4, …, VN
  • 39. Effectiveness of Query Operators
      • TREC-7 per-topic analysis: boxplots
      • ‘+’ and ‘^’
  • 40. Effectiveness of Query Operators
      • Per-topic analysis: box plot
      • [Figure: AP (Average Precision) per topic for 32 topics, showing the AP of TREC’s regular query, the query variant with the highest AP, and the query variant with the lowest AP]
  • 41. Effectiveness of Query Operators
      • TREC-7 per-topic analysis
      • ‘+’ and ‘^’: MAP = 0.1554 vs. MAP ┬ = 0.2099 (+35.1%)
  • 42. Outline of these Musings
      • Digital Libraries: collective annotations; social validation of discussion threads; organization-based document similarity
      • Information Retrieval: the tie-breaking bias in IR evaluation; Geographic IR; effectiveness of query operators
      • Scientometrics: recommendation based on topics and social clues; landscape of research in Information Systems; the submission-date bias in peer-reviewed conferences
  • 43. Question SCIM-1: How to recommend researchers according to their research topics and social clues?
      • Section: Scientometrics (recommendation based on topics and social clues; landscape of research in Information Systems; the submission-date bias in peer-reviewed conferences)
      • Reference: Guillaume Cabanac. “Accuracy of inter-researcher similarity measures based on topical and social clues.” Scientometrics, 87(3):597–620, June 2011, Springer. DOI:10.1007/s11192-011-0358-1
  • 44. Recommendation of Literature (McNee et al., 2006)
      • Collaborative filtering
          • Principle: mining the preferences of researchers → those who liked this paper also liked…
          • Snowball effect / fad
          • Innovation?
          • Relevance of theme?
      • Cognitive filtering
          • Principle: mining the contents of articles → profile of resources (researcher, articles) → citation graph
      • Hybrid approach
  • 45. Foundations: Similarity Measures Under Study
      • Model
          • Coauthors graph: authors ↔ authors
          • Venues graph: authors ↔ conferences / journals
      • Social similarities
          • Inverse degree of separation: length of the shortest path
          • Strength of the tie: number of shortest paths
          • Shared conferences: number of shared conference editions
      • Thematic similarity
          • Cosine on the Vector Space Model, d_i = (w_i1, …, w_in), built on titles (doc / researcher)
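Two of the measures on the similarity slide can be sketched in a few lines of Python: the cosine between title-based term vectors and the inverse degree of separation in the coauthors graph. This is a rough illustration only. The slide does not say how the weights w_ij are computed, so raw term frequencies are assumed; returning 0.0 for unconnected researchers (or for a researcher compared with themselves) is also an assumption, and the helper names and the toy graph are hypothetical.

```python
import math
from collections import Counter, deque
from typing import Dict, Iterable, Optional, Set


def title_vector(titles: Iterable[str]) -> Counter:
    """Bag of words over titles; raw term frequency is an assumed weighting."""
    return Counter(word for title in titles for word in title.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def degree_of_separation(
    graph: Dict[str, Set[str]], src: str, dst: str
) -> Optional[int]:
    """Length of the shortest coauthorship path (breadth-first search)."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # no path between the two researchers


def inverse_degree_of_separation(
    graph: Dict[str, Set[str]], a: str, b: str
) -> float:
    """1 / shortest-path length; 0.0 when unconnected or a == b (assumption)."""
    dist = degree_of_separation(graph, a, b)
    return 1.0 / dist if dist else 0.0


# Toy coauthors graph (hypothetical researchers).
coauthors = {
    "ann": {"bob"},
    "bob": {"ann", "cid"},
    "cid": {"bob"},
    "dan": set(),
}
topical_sim = cosine(
    title_vector(["Tie breaking bias"]), title_vector(["tie breaking"])
)
social_sim = inverse_degree_of_separation(coauthors, "ann", "cid")
```

The strength-of-tie and shared-conferences measures would need the number of shortest paths and the venues graph respectively, which this sketch leaves out.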
  • 46. Computing Similarities with Social Clues
      • Task of literature review
          • Requirement: topical relevance
          • Preference: social proximity (meetings, project…) ⇒ re-rank topical results with social clues
      • Combination with CombMNZ (Fox & Shaw, 1993)
      • Final result: list of recommended researchers
      • [Diagram: degree of separation, strength of ties, shared conferences → CombMNZ → social list; topical list ∩ social list → CombMNZ → TS list]
  • 47. Evaluation Design
      • Comparison of recommendations and researchers’ perception
          • Q1: Effectiveness of topical (only) recommendations?
          • Q2: Gain due to integrating social clues?
      • IR experiments: Cranfield paradigm (TREC…)
          • Does the search engine retrieve relevant documents?
      • [Diagram: input (topic, corpus) → search engine x → documents; assessor: doc relevant? → relevance judgments ({0, 1} binary or [0, N] gradual) → qrels → trec_eval → effectiveness measures (Mean Average Precision, Normalized Discounted Cumulative Gain)]
      • Example results table (topic: S1, S2): 1: 0.5687, 0.6521; …; 50: 0.7124, 0.7512; avg: 0.6421, 0.7215; improvement +12.3 %; significance p < 0.05 (paired t-test)
  • 48. Evaluating Recommendations
      • Cranfield-style evaluation transposed to recommendations
          • Input topic: name of a researcher
          • Search engine: recommender system (topical vs. topical + social), top 25
          • Assessor: the researcher, asked « With whom would you like to chat for improving your research? »
          • Topics: #subjects
      • [Diagram: relevance judgments ({0, 1} binary or [0, N] gradual) → qrels → trec_eval → effectiveness measures (Mean Average Precision, Normalized Discounted Cumulative Gain)]
      • Example results table (topic: S1, S2): 1: 0.5687, 0.6521; …; 50: 0.7124, 0.7512; avg: 0.6421, 0.7215; improvement +12.3 %; significance p < 0.05 (paired t-test)
  • 49. Experiment
      • Features
          • Data: dblp.xml (713 MB = 1.3M publications for 811,787 researchers)
          • Subjects: 90 researchers contacted by mail; 74 began to fill in the questionnaire; 71 completed it
      • Interface for assessing recommendations
  • 50. Experiments: Profile of the Participants
      • Experience of the 71 subjects: Mdn = 13 years
      • Productivity of the 71 subjects: Mdn = 15 publications
      • [Figures: number of participants by seniority and by number of publications]
  • 51. Empirical Validation of our Hypothesis
      • Strong baseline ⇒ effective approach based on VSM
      • +8.49 % = significant improvement (p < 0.05; n = 70) of topical recommendations by social clues
      • [Figure: NDCG (axis from 0.5 to 1) of topical vs. topical + social recommendations]
          • Global: +8.49 %
          • Productivity: < 15 publications +10.39 %; ≥ 15 publications +7.03 %
          • Experience: < 13 years +6.50 %; ≥ 13 years +10.22 %
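The NDCG values and relative improvements reported on the evaluation slides can be reproduced in spirit with the sketch below. It assumes one common formulation of DCG (gain divided by log2 of rank + 1) and computes the ideal ranking from the judged items of the same list; the slides do not state which variant trec_eval was configured with, and the function names are hypothetical.

```python
import math
from typing import Sequence


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain: gain_i / log2(i + 1), ranks from 1."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg(gains: Sequence[float]) -> float:
    """DCG normalized by the DCG of the same gains in ideal (sorted) order."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


def relative_improvement(baseline: float, variant: float) -> float:
    """Relative gain of variant over baseline, in percent."""
    return (variant - baseline) / baseline * 100.0


# Graded judgments, in ranked order, for one hypothetical recommendation list.
example_ndcg = ndcg([3, 2, 0, 1])
```

A per-subject NDCG computed this way for the topical and the topical + social lists would then be averaged and compared, with a paired t-test on the per-subject values as named on the evaluation design slide.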
  • 52. Question SCIM-2: What is the landscape of research in Information Systems from the perspective of gatekeepers?
      • Section: Scientometrics (recommendation based on topics and social clues; landscape of research in Information Systems; the submission-date bias in peer-reviewed conferences)
      • Reference: Guillaume Cabanac. “Shaping the landscape of research in Information Systems from the perspective of editorial boards: A scientometric study of 77 leading journals.” Journal of the American Society for Information Science and Technology, 63, to appear in 2012, Wiley. DOI:10.1002/asi.22609
  • 53. Landscape of Research in Information Systems: the gatekeepers of science
  • 54. Landscape of Research in Information Systems: the 77 core peer-reviewed IS journals in the WoS
  • 55. Landscape of Research in Information Systems: exploratory data analysis
  • 56. Landscape of Research in Information Systems: exploratory data analysis
  • 57. Landscape of Research in Information Systems: topical map of the IS field
  • 58. Landscape of Research in Information Systems: most influential gatekeepers
  • 59. Landscape of Research in Information Systems: number of gatekeepers per country
  • 60. Landscape of Research in Information Systems: geographic and gender diversity
  • 61. Question SCIM-3: What if submission date influenced the acceptance of conference papers?
      • Section: Scientometrics (recommendation based on topics and social clues; landscape of research in Information Systems; the submission-date bias in peer-reviewed conferences)
      • Reference: Guillaume Cabanac. “What if submission date influenced the acceptance of conference papers?” Submitted to the Journal of the American Society for Information Science and Technology, Wiley.
  • 62. Conferences Affected by a Submission-Date Bias? Peer review
  • 63. The Submission-Date Bias: dataset from the ConfMaster conference management system
  • 64. The Submission-Date Bias: influence of submission date on bids
  • 65. The Submission-Date Bias: influence of submission date on average marks
  • 66. Conclusion
      • Digital Libraries: collective annotations; social validation of discussion threads; organization-based document similarity
      • Information Retrieval: the tie-breaking bias in IR evaluation; Geographic IR; effectiveness of query operators
      • Scientometrics: recommendation based on topics and social clues; landscape of research in Information Systems; the submission-date bias in peer-reviewed conferences

Editor's Notes

  1. FDD structure, in context