SlideShare a Scribd company logo
Giovanni Colavizza
Matteo Romanello (@mr56k)
Frédéric Kaplan (@frederickaplan)
The References of References:
Enriching Library Catalogs via Domain-Specific
Reference Mining
1
Goal
2
Empowering scholars in
the Humanities with better
IR systems
Motivation - the Scholar
Issues: lack of data [Sula and Miller, 2014] leads to absence of
services: estimated coverage of Web of Science for Humanities
circa 13% [Mingers and Leydesdorff, 2015].
3
Sciences:
Google Scholar
English
mainly papers
Lower-cost information
gathering
Humanities:
no Google Scholar-like system
multiple languages
mainly monographs
Higher-cost information
gathering
Motivation - the Footnote
How humanists cite? Footnotes [see e.g. Hellqvist, 2009]
4
Motivation - the Archive
Approximately half citations to primary sources [Wiberley Jr., 2009]
5
Motivation - the Scholar reloaded
6
Proposal: Enriching library catalogs
7
Use reference monographs, the “canon” of the
domain, to extract references to the rest of the
literature and enrich library catalogs.
Project: Linked Books
Focused on a case study/domain:
the history of Venice.
Partners so far:
• Ca’ Foscari University Library System
• Biblioteca Marciana
• Istituto Veneto di Scienze, Lettere ed Arti
• Archivio di Stato di Venezia
• EPFL
8
The Pipeline
9
Corpus selection
10
Result: 1904 monographs, 701 with
a structured list of references.
Use the means of the library:
1- Consultation shelves
2- Dewey and subject classification
3- Scholarly bibliographies
4- Keyword search
The Pipeline - Digitization
11
Digitization
12
1,904 monographs + ~1,000 journal issues
The Pipeline - Annotation/Extraction/Parsing
13
Annotation
14
• annotated 27% of 701 monographs (with reference list)
• 3.8% of all digitized pages (with references)
• annotators identified 33 citation styles, divided into 6 families
• Yes, humanities scholars love customized reference styles!
Reference Extraction/Parsing
15
[Klinkhammer b-i-secondary-full] [Lutz, i-secondary-full] [L’occupazione i-
secondary-full] [tedesca i-secondary-full] [in i-secondary-full] [Italia i-secondary-full]
[1943-1945, i-secondary-full] [Torino, i-secondary-full] [Bollati i-secondary-full]
[Boringhieri i-secondary-full] [1993 i-secondary-full].
Klinkhammer Lutz, L’occupazione tedesca in Italia 1943-1945,
Torino, Bollati Boringhieri 1993.
[Klinkhammer author] [Lutz, author] [L’occupazione title] [tedesca title]
[in title] [Italia title] [1943-1945, title] [Torino, publicationplace] [Bollati
publisher] [Boringhieri publisher] [1993 publicationyear].
Extraction/Parsing - Evaluation
16
Extraction/Parsing - Confusion Matrix
17
null
author
title
abbrev. (E)
monograph (E)
Task 1
F1 score
(avg) 0.806
class=“null” 0.609
Task 2
F1 score
(avg) 0.842
class=“end abbreviated” 0.242
The Pipeline - Lookup
18
Lookup
19
1. Against OPAC SBN (via API)
Steps:
1. search candidates by title
2. match reference metadata
3. assign each candidate a
confidence score
4. return set of candidates
Evaluation:
• 2k references (out of 181k)
• 41.7% no candidates
• 58.3% with candidates:
• 72.3% -> first candidate
correct
Goal: disambiguation of references
Issues:
• OCR errors -> impact on search by title (low recall)
• API as a “black box” + bottleneck of search by title
Lookup
20
2. Against metadata of digitized books
Lookup
Goal: verify cohesiveness of digitized corpus
Method:
• based on SBN lookup
• but lookup against digitization
metadata
• tuned to maximize precision
• returns 1 or no matches
Evaluation*:
• 500 references (out of 181k)
• precision ~ 1.00
• recall > 0.95
Result:
• only 7% of references extracted from 701 monographs point
inwards (i.e. towards the 1904 monographs)
21
Core of the discipline
co-citation network from
extracted references*
giant component = 59%
of selected corpus
books in the giant
component -> core of
reference works on
history of Venice
giant component ->
32.5% with only works in
consultation
Conclusions and Outlook
22
data- and citation-driven approach to assess and
exploit, from an IR point of view, domain-specific
library holdings on the history of Venice
next big challenge: extraction, consolidation and
disambiguation of references contained within
footnotes (journals)
Giovanni Colavizza
Matteo Romanello (@mr56k)
Frédéric Kaplan (@frederickaplan)
Thank you!
go.epfl.ch/linkedbooks
23

More Related Content

Similar to The References of References: Enriching Library Catalogs via Domain-Specific Reference Mining

Exploratory computing: designing discovery-driven user experiences
Exploratory computing: designing discovery-driven user experiencesExploratory computing: designing discovery-driven user experiences
Exploratory computing: designing discovery-driven user experiences
Luigi Spagnolo
 
INF 100 Tutorial
INF 100 TutorialINF 100 Tutorial
INF 100 Tutorial
Chris Chan
 
Dissertations 5 ref, plagiarism, own crit-analysis
Dissertations 5   ref, plagiarism, own crit-analysisDissertations 5   ref, plagiarism, own crit-analysis
Dissertations 5 ref, plagiarism, own crit-analysisStudy Hub
 
Goldminers of the Digital Age: How Libraries are Selecting, Presenting, and D...
Goldminers of the Digital Age: How Libraries are Selecting, Presenting, and D...Goldminers of the Digital Age: How Libraries are Selecting, Presenting, and D...
Goldminers of the Digital Age: How Libraries are Selecting, Presenting, and D...
Northern California Technical Processes Group
 
Linked Books - DH Venice Fall School 2014
Linked Books - DH Venice Fall School 2014Linked Books - DH Venice Fall School 2014
Linked Books - DH Venice Fall School 2014
Giovanni Colavizza
 
Romanello tokyo
Romanello tokyoRomanello tokyo
Romanello tokyo
Matteo Romanello
 
Bibiliography Footnotes Oral Presentation PPT 17.pptx
Bibiliography Footnotes Oral Presentation PPT 17.pptxBibiliography Footnotes Oral Presentation PPT 17.pptx
Bibiliography Footnotes Oral Presentation PPT 17.pptx
Jamshi8
 
Dissertations 5 ref, plagiarism, own crit-analysis [handout]
Dissertations 5   ref, plagiarism, own crit-analysis [handout]Dissertations 5   ref, plagiarism, own crit-analysis [handout]
Dissertations 5 ref, plagiarism, own crit-analysis [handout]Study Hub
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
Angelo Salatino
 
Towards knowledge maintenance in scientific digital libraries with the keysto...
Towards knowledge maintenance in scientific digital libraries with the keysto...Towards knowledge maintenance in scientific digital libraries with the keysto...
Towards knowledge maintenance in scientific digital libraries with the keysto...
jodischneider
 
02 Literature search and reviewing_1.pptx
02  Literature search and reviewing_1.pptx02  Literature search and reviewing_1.pptx
02 Literature search and reviewing_1.pptx
ssusere05ec21
 
Portland Place School June 2018
Portland Place School June 2018 Portland Place School June 2018
Portland Place School June 2018
EISLibrarian
 
Order #185993101 writers choice (5 pages, 4 slides)type of serv
Order #185993101 writers choice (5 pages, 4 slides)type of servOrder #185993101 writers choice (5 pages, 4 slides)type of serv
Order #185993101 writers choice (5 pages, 4 slides)type of serv
JUST36
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology:  A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology:  A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
Angelo Salatino
 
Print source literature 24 March 2023.pptx
Print source literature 24 March 2023.pptxPrint source literature 24 March 2023.pptx
Print source literature 24 March 2023.pptx
sanjaychavan62
 
Collection evaluation techniques for academic libraries
Collection evaluation techniques for academic libraries Collection evaluation techniques for academic libraries
Collection evaluation techniques for academic libraries
ALISS
 
Writing Right: Teaching Writing Conventions Specific to a Discipline
Writing Right: Teaching Writing Conventions Specific to a DisciplineWriting Right: Teaching Writing Conventions Specific to a Discipline
Writing Right: Teaching Writing Conventions Specific to a DisciplineRobert Domanski
 
Arc 323 human studies in architecture fall 2018 lecture 3-literature review
Arc 323 human studies in architecture fall 2018 lecture 3-literature reviewArc 323 human studies in architecture fall 2018 lecture 3-literature review
Arc 323 human studies in architecture fall 2018 lecture 3-literature review
Galala University
 
McGill Library and your thesis
McGill Library and your thesisMcGill Library and your thesis
McGill Library and your thesis
Pamela Harrison
 
Where data and journal content collide: what does it mean to ‘publish your da...
Where data and journal content collide: what does it mean to ‘publish your da...Where data and journal content collide: what does it mean to ‘publish your da...
Where data and journal content collide: what does it mean to ‘publish your da...
EDINA, University of Edinburgh
 

Similar to The References of References: Enriching Library Catalogs via Domain-Specific Reference Mining (20)

Exploratory computing: designing discovery-driven user experiences
Exploratory computing: designing discovery-driven user experiencesExploratory computing: designing discovery-driven user experiences
Exploratory computing: designing discovery-driven user experiences
 
INF 100 Tutorial
INF 100 TutorialINF 100 Tutorial
INF 100 Tutorial
 
Dissertations 5 ref, plagiarism, own crit-analysis
Dissertations 5   ref, plagiarism, own crit-analysisDissertations 5   ref, plagiarism, own crit-analysis
Dissertations 5 ref, plagiarism, own crit-analysis
 
Goldminers of the Digital Age: How Libraries are Selecting, Presenting, and D...
Goldminers of the Digital Age: How Libraries are Selecting, Presenting, and D...Goldminers of the Digital Age: How Libraries are Selecting, Presenting, and D...
Goldminers of the Digital Age: How Libraries are Selecting, Presenting, and D...
 
Linked Books - DH Venice Fall School 2014
Linked Books - DH Venice Fall School 2014Linked Books - DH Venice Fall School 2014
Linked Books - DH Venice Fall School 2014
 
Romanello tokyo
Romanello tokyoRomanello tokyo
Romanello tokyo
 
Bibiliography Footnotes Oral Presentation PPT 17.pptx
Bibiliography Footnotes Oral Presentation PPT 17.pptxBibiliography Footnotes Oral Presentation PPT 17.pptx
Bibiliography Footnotes Oral Presentation PPT 17.pptx
 
Dissertations 5 ref, plagiarism, own crit-analysis [handout]
Dissertations 5   ref, plagiarism, own crit-analysis [handout]Dissertations 5   ref, plagiarism, own crit-analysis [handout]
Dissertations 5 ref, plagiarism, own crit-analysis [handout]
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 
Towards knowledge maintenance in scientific digital libraries with the keysto...
Towards knowledge maintenance in scientific digital libraries with the keysto...Towards knowledge maintenance in scientific digital libraries with the keysto...
Towards knowledge maintenance in scientific digital libraries with the keysto...
 
02 Literature search and reviewing_1.pptx
02  Literature search and reviewing_1.pptx02  Literature search and reviewing_1.pptx
02 Literature search and reviewing_1.pptx
 
Portland Place School June 2018
Portland Place School June 2018 Portland Place School June 2018
Portland Place School June 2018
 
Order #185993101 writers choice (5 pages, 4 slides)type of serv
Order #185993101 writers choice (5 pages, 4 slides)type of servOrder #185993101 writers choice (5 pages, 4 slides)type of serv
Order #185993101 writers choice (5 pages, 4 slides)type of serv
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology:  A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology:  A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 
Print source literature 24 March 2023.pptx
Print source literature 24 March 2023.pptxPrint source literature 24 March 2023.pptx
Print source literature 24 March 2023.pptx
 
Collection evaluation techniques for academic libraries
Collection evaluation techniques for academic libraries Collection evaluation techniques for academic libraries
Collection evaluation techniques for academic libraries
 
Writing Right: Teaching Writing Conventions Specific to a Discipline
Writing Right: Teaching Writing Conventions Specific to a DisciplineWriting Right: Teaching Writing Conventions Specific to a Discipline
Writing Right: Teaching Writing Conventions Specific to a Discipline
 
Arc 323 human studies in architecture fall 2018 lecture 3-literature review
Arc 323 human studies in architecture fall 2018 lecture 3-literature reviewArc 323 human studies in architecture fall 2018 lecture 3-literature review
Arc 323 human studies in architecture fall 2018 lecture 3-literature review
 
McGill Library and your thesis
McGill Library and your thesisMcGill Library and your thesis
McGill Library and your thesis
 
Where data and journal content collide: what does it mean to ‘publish your da...
Where data and journal content collide: what does it mean to ‘publish your da...Where data and journal content collide: what does it mean to ‘publish your da...
Where data and journal content collide: what does it mean to ‘publish your da...
 

More from Giovanni Colavizza

Sul ruolo dell’umanista nelle Digital Humanities
Sul ruolo dell’umanista nelle Digital HumanitiesSul ruolo dell’umanista nelle Digital Humanities
Sul ruolo dell’umanista nelle Digital Humanities
Giovanni Colavizza
 
La Venice Time Machine e alcune sfide dei progetti “Big Science” nelle discip...
La Venice Time Machine e alcune sfide dei progetti “Big Science” nelle discip...La Venice Time Machine e alcune sfide dei progetti “Big Science” nelle discip...
La Venice Time Machine e alcune sfide dei progetti “Big Science” nelle discip...
Giovanni Colavizza
 
A Cliometrics’ view on the Garzoni database
A Cliometrics’ view on the Garzoni databaseA Cliometrics’ view on the Garzoni database
A Cliometrics’ view on the Garzoni database
Giovanni Colavizza
 
Venice 1740 Reconstruction
Venice 1740 ReconstructionVenice 1740 Reconstruction
Venice 1740 Reconstruction
Giovanni Colavizza
 
Notes de bas de page: d’un outil savant aux hyperliens
Notes de bas de page: d’un outil savant aux hyperliensNotes de bas de page: d’un outil savant aux hyperliens
Notes de bas de page: d’un outil savant aux hyperliens
Giovanni Colavizza
 
Introduction to the Venice Time Machine
Introduction to the Venice Time MachineIntroduction to the Venice Time Machine
Introduction to the Venice Time Machine
Giovanni Colavizza
 
Mapping Early Modern News Networks
Mapping Early Modern News NetworksMapping Early Modern News Networks
Mapping Early Modern News Networks
Giovanni Colavizza
 
Report on Ongoing Digitisation and Information System Design for VTM
Report on Ongoing Digitisation and Information System Design for VTMReport on Ongoing Digitisation and Information System Design for VTM
Report on Ongoing Digitisation and Information System Design for VTM
Giovanni Colavizza
 
Mapping the News Networks in XVII Italy
Mapping the News Networks in XVII ItalyMapping the News Networks in XVII Italy
Mapping the News Networks in XVII Italy
Giovanni Colavizza
 
Garzoni conference 11 October 2014
Garzoni conference 11 October 2014Garzoni conference 11 October 2014
Garzoni conference 11 October 2014
Giovanni Colavizza
 
Leipzig Functional Categorisation 11/12/2013
Leipzig Functional Categorisation 11/12/2013Leipzig Functional Categorisation 11/12/2013
Leipzig Functional Categorisation 11/12/2013
Giovanni Colavizza
 
Udine Digital Humanities 19/11/2013
Udine Digital Humanities 19/11/2013Udine Digital Humanities 19/11/2013
Udine Digital Humanities 19/11/2013Giovanni Colavizza
 
Venezia Biblioteche e Digital Humanities 28/10/2013
Venezia Biblioteche e Digital Humanities 28/10/2013Venezia Biblioteche e Digital Humanities 28/10/2013
Venezia Biblioteche e Digital Humanities 28/10/2013Giovanni Colavizza
 
Mainz Expert Workshop on Controlled Vocabularies 10/10/2013
Mainz Expert Workshop on Controlled Vocabularies 10/10/2013Mainz Expert Workshop on Controlled Vocabularies 10/10/2013
Mainz Expert Workshop on Controlled Vocabularies 10/10/2013Giovanni Colavizza
 

More from Giovanni Colavizza (14)

Sul ruolo dell’umanista nelle Digital Humanities
Sul ruolo dell’umanista nelle Digital HumanitiesSul ruolo dell’umanista nelle Digital Humanities
Sul ruolo dell’umanista nelle Digital Humanities
 
La Venice Time Machine e alcune sfide dei progetti “Big Science” nelle discip...
La Venice Time Machine e alcune sfide dei progetti “Big Science” nelle discip...La Venice Time Machine e alcune sfide dei progetti “Big Science” nelle discip...
La Venice Time Machine e alcune sfide dei progetti “Big Science” nelle discip...
 
A Cliometrics’ view on the Garzoni database
A Cliometrics’ view on the Garzoni databaseA Cliometrics’ view on the Garzoni database
A Cliometrics’ view on the Garzoni database
 
Venice 1740 Reconstruction
Venice 1740 ReconstructionVenice 1740 Reconstruction
Venice 1740 Reconstruction
 
Notes de bas de page: d’un outil savant aux hyperliens
Notes de bas de page: d’un outil savant aux hyperliensNotes de bas de page: d’un outil savant aux hyperliens
Notes de bas de page: d’un outil savant aux hyperliens
 
Introduction to the Venice Time Machine
Introduction to the Venice Time MachineIntroduction to the Venice Time Machine
Introduction to the Venice Time Machine
 
Mapping Early Modern News Networks
Mapping Early Modern News NetworksMapping Early Modern News Networks
Mapping Early Modern News Networks
 
Report on Ongoing Digitisation and Information System Design for VTM
Report on Ongoing Digitisation and Information System Design for VTMReport on Ongoing Digitisation and Information System Design for VTM
Report on Ongoing Digitisation and Information System Design for VTM
 
Mapping the News Networks in XVII Italy
Mapping the News Networks in XVII ItalyMapping the News Networks in XVII Italy
Mapping the News Networks in XVII Italy
 
Garzoni conference 11 October 2014
Garzoni conference 11 October 2014Garzoni conference 11 October 2014
Garzoni conference 11 October 2014
 
Leipzig Functional Categorisation 11/12/2013
Leipzig Functional Categorisation 11/12/2013Leipzig Functional Categorisation 11/12/2013
Leipzig Functional Categorisation 11/12/2013
 
Udine Digital Humanities 19/11/2013
Udine Digital Humanities 19/11/2013Udine Digital Humanities 19/11/2013
Udine Digital Humanities 19/11/2013
 
Venezia Biblioteche e Digital Humanities 28/10/2013
Venezia Biblioteche e Digital Humanities 28/10/2013Venezia Biblioteche e Digital Humanities 28/10/2013
Venezia Biblioteche e Digital Humanities 28/10/2013
 
Mainz Expert Workshop on Controlled Vocabularies 10/10/2013
Mainz Expert Workshop on Controlled Vocabularies 10/10/2013Mainz Expert Workshop on Controlled Vocabularies 10/10/2013
Mainz Expert Workshop on Controlled Vocabularies 10/10/2013
 

Recently uploaded

Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdfMudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
frank0071
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
 
Anemia_ types_clinical significance.pptx
Anemia_ types_clinical significance.pptxAnemia_ types_clinical significance.pptx
Anemia_ types_clinical significance.pptx
muralinath2
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
Gokturk Mehmet Dilci
 
Mudde & Rovira Kaltwasser. - Populism in Europe and the Americas - Threat Or...
Mudde &  Rovira Kaltwasser. - Populism in Europe and the Americas - Threat Or...Mudde &  Rovira Kaltwasser. - Populism in Europe and the Americas - Threat Or...
Mudde & Rovira Kaltwasser. - Populism in Europe and the Americas - Threat Or...
frank0071
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
RitabrataSarkar3
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Ana Luísa Pinho
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
KrushnaDarade1
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
muralinath2
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
silvermistyshot
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
Columbia Weather Systems
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
University of Maribor
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
muralinath2
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
RASHMI M G
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
IshaGoswami9
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
yqqaatn0
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
TinyAnderson
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
zeex60
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
PRIYANKA PATEL
 

Recently uploaded (20)

Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdfMudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
Anemia_ types_clinical significance.pptx
Anemia_ types_clinical significance.pptxAnemia_ types_clinical significance.pptx
Anemia_ types_clinical significance.pptx
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
Mudde & Rovira Kaltwasser. - Populism in Europe and the Americas - Threat Or...
Mudde &  Rovira Kaltwasser. - Populism in Europe and the Americas - Threat Or...Mudde &  Rovira Kaltwasser. - Populism in Europe and the Americas - Threat Or...
Mudde & Rovira Kaltwasser. - Populism in Europe and the Americas - Threat Or...
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
 

The References of References: Enriching Library Catalogs via Domain-Specific Reference Mining

  • 1. Giovanni Colavizza Matteo Romanello (@mr56k) Frédéric Kaplan (@frederickaplan) The References of References: Enriching Library Catalogs via Domain-Specific Reference Mining 1
  • 2. Goal 2 Empowering scholars in the Humanities with better IR systems
  • 3. Motivation - the Scholar Issues: lack of data [Sula and Miller, 2014] leads to absence of services: estimated coverage of Web of Science for Humanities circa 13% [Mingers and Leydesdorff, 2015]. 3 Sciences: Google Scholar English mainly papers Lower-cost information gathering Humanities: no Google Scholar-like system multiple languages mainly monographs Higher-cost information gathering
  • 4. Motivation - the Footnote How humanists cite? Footnotes [see e.g. Hellqvist, 2009] 4
  • 5. Motivation - the Archive Approximately half citations to primary sources [Wiberley Jr., 2009] 5
  • 6. Motivation - the Scholar reloaded 6
  • 7. Proposal: Enriching library catalogs 7 Use reference monographs, the “canon” of the domain, to extract references to the rest of the literature and enrich library catalogs.
  • 8. Project: Linked Books Focused on a case study/domain: the history of Venice. Partners so far: • Ca’ Foscari University Library System • Biblioteca Marciana • Istituto Veneto di Scienze, Lettere ed Arti • Archivio di Stato di Venezia • EPFL 8
  • 10. Corpus selection 10 Result: 1904 monographs, 701 with a structured list of references. Use the means of the library: 1- Consultation shelves 2- Dewey and subject classification 3- Scholarly bibliographies 4- Keyword search
  • 11. The Pipeline - Digitization 11
  • 12. Digitization 12 1,904 monographs + ~1,000 journal issues
  • 13. The Pipeline - Annotation/Extraction/Parsing 13
  • 14. Annotation 14 • annotated 27% of 701 monographs (with reference list) • 3.8% of all digitized pages (with references) • annotators identified 33 citation styles, divided into 6 families • Yes, humanities scholars love customized reference styles!
  • 15. Reference Extraction/Parsing 15 [Klinkhammer b-i-secondary-full] [Lutz, i-secondary-full] [L’occupazione i- secondary-full] [tedesca i-secondary-full] [in i-secondary-full] [Italia i-secondary-full] [1943-1945, i-secondary-full] [Torino, i-secondary-full] [Bollati i-secondary-full] [Boringhieri i-secondary-full] [1993 i-secondary-full]. Klinkhammer Lutz, L’occupazione tedesca in Italia 1943-1945, Torino, Bollati Boringhieri 1993. [Klinkhammer author] [Lutz, author] [L’occupazione title] [tedesca title] [in title] [Italia title] [1943-1945, title] [Torino, publicationplace] [Bollati publisher] [Boringhieri publisher] [1993 publicationyear].
  • 17. Extraction/Parsing - Confusion Matrix 17 null author title abbrev. (E) monograph (E) Task 1 F1 score (avg) 0.806 class=“null” 0.609 Task 2 F1 score (avg) 0.842 class=“end abbreviated” 0.242
  • 18. The Pipeline - Lookup 18
  • 19. Lookup 19 1. Against OPAC SBN (via API) Steps: 1. search candidates by title 2. match reference metadata 3. assign each candidate a confidence score 4. return set of candidates Evaluation: • 2k references (out of 181k) • 41.7% no candidates • 58.3% with candidates: • 72.3% -> first candidate correct Goal: disambiguation of references Issues: • OCR errors -> impact on search by title (low recall) • API as a “black box” + bottleneck of search by title
  • 20. Lookup 20 2. Against metadata of digitized books Lookup Goal: verify cohesiveness of digitized corpus Method: • based on SBN lookup • but lookup against digitization metadata • tuned to maximize precision • returns 1 or no matches Evaluation*: • 500 references (out of 181k) • precision ~ 1.00 • recall > 0.95 Result: • only 7% of references extracted from 701 monographs point inwards (i.e. towards the 1904 monographs)
  • 21. 21 Core of the discipline co-citation network from extracted references* giant component = 59% of selected corpus books in the giant component -> core of reference works on history of Venice giant component -> 32.5% with only works in consultation
  • 22. Conclusions and Outlook 22 data- and citation-driven approach to assess and exploit, from an IR point of view, domain-specific library holdings on the history of Venice next big challenge: extraction, consolidation and disambiguation of references contained within footnotes (journals)
  • 23. Giovanni Colavizza Matteo Romanello (@mr56k) Frédéric Kaplan (@frederickaplan) Thank you! go.epfl.ch/linkedbooks 23