SlideShare a Scribd company logo
Challenging knowledge extraction to support

the curation of documentary evidence in the humanities
Sciknow 2019
19 November 2019
Marina del Rey, California, United States
Enrico Daga and Enrico Motta
The Open University
Feedback:	@enridaga	|	www.enridaga.net	
Paper: http://oro.open.ac.uk/67961/
Background
• The identification and cataloguing of documentary evidence from textual
corpora is an important part of empirical research in the humanities.
• Semantic databases of documentary evidence: a recent trend
• The Listening Experience Database Project (LED) (over 10.000 unique
experiences) - https://led.kmi.open.ac.uk/
• READ-IT: Reading Europe Advanced Data Investigation Tool - https://readit-
project.eu/
• Two-step workflow:
1. Identification + Retrieval (*)
2. Cataloguing and curation (*)	“Capturing	Themed	Evidence,	a	Hybrid	Approach”		
							Details	tomorrow	afternoon	…
Knowledge Extraction (KE)
• Bet: metadata curation could be supported with KE methods
• KE: automatic or semi-automatic derivation of formal symbolic knowledge from
unstructured or semi-structured sources
• Approaches in the literature vary in task / scope:
• (Named) Entity Recognition and Classification (Person, Work, Time, Place, …)
• Entity Linking (DBpedia, Gazetteers)
• Relation Extraction (listener of, in place)
• Event extraction (Performance)
• Machine reading
Example #1
"I then went to Amsterdam to conduct Oedipus at the
Concertgebouw, which was celebrating its fortieth
anniversary by a series of sumptuous musical
productions. The fine Concertgebouw orchestra, always
at the same high level, the magnificent male choruses
from the Royal Apollo Society, soloists of the first rank -
among them Mme Hélène Sadoven as Jocasta, Louis van
Tulder as Oedipus, and Paul Huf, an excellent reader -
and the way in which my work was received by the public,
have left a particularly precious memory that I recall with
much enjoyment."
listener: Igor Strawinsky
time: in the beginning of 1928
place: Amsterdam
opera: Oedipus Rex
/by: Igor Strawinsky
performer: Concertgebouw orch.
environment: Public
Igor Stravinksy
An Autobiography (1936), p. 139.
https://led.kmi.open.ac.uk/entity/lexp/1435674909834
Example #2
"Music is certainly a pleasure that may be reckoned
intellectual, and we shall never again have it in the
perfection it is this year, because Mr. Handel will not
compose any more! Oratorios begin next week, to
my great joy, for they are the highest entertainment
to me."
listener: Mrs Delany
time: March, 1737
place: London
opera: Operas and Oratorios
/by: G. F. Handel
environment: Public
From: Mary Granville, and Augusta Hall (ed.),
Autobiography and Correspondence of Mary Granville, Mrs
Delany: with interesting Reminiscences of King George the
Third and Queen Charlotte, volume 1 (London, 1861), p.
594.
https://led.kmi.open.ac.uk/entity/lexp/1444424772006
Feedback:	@enridaga	|	www.enridaga.net
Analysis
• Scope: 7.3% of the LED with sources available (archive.org) and
including DBpedia entities as place or agent, 690 excerpts from 26
books.
• Focus on Entity Recognition: Listener & Place
1. Find the position of the evidence text back in the original source
2. Check where the DBpedia entity (listener or place) is mentioned
• Analysis is clearly limited and results are preliminary, although …
Analysis
• Q1 - in the excerpt? The place is mentioned in the
excerpt in 25.9% cases. The listener only in 13.4%.
• Q2 - near the excerpt? Only 10% of the times the
place mention is less than 5 paragraphs from the
excerpt. The agent, in 4% of the cases.
• Q3 - in the source? 83.2% of the times the place is
mentioned at least once in the source. In 11.4% the
place hasn’t been found.
• Q4 - in the meta? 64.8% of the listeners are also
the authors of the text - 5874 cases in LED.
Distance	of	entity	(in	n	of	paragraphs)
Challenges
• Implicit information, based on inference requiring expertise (e.g. Mr Handel is
G.F Handel, Oedipus is “Oedipus Rex”)
• The role of contextual knowledge is fundamental (1) in identifying the agent
from the metadata of the source; (2) common sense inference (“in the beginning
of 1928”)
• Entities can exist in distributed, heterogeneous resources (encyclopaedic KBs,
domain-specific taxonomies, gazetteers, …)
• Machine reading generates an ontology formalising the discourse in the text,
reducing the task to one of ontology alignment (not a simplification!)
• Cultural studies typically coin novel concepts (ListeningExperience) with original
schemas. Portability of the methods is even more at risk!
Thank you
Questions?
Feedback:	@enridaga	|	www.enridaga.net

More Related Content

Similar to Challenging knowledge extraction to support
the curation of documentary evidence in the humanities

Kilian Schmidtner/Klaus Thoden - The Missing Links: Defining the Mamluk Empi...
Kilian Schmidtner/Klaus Thoden - The Missing Links:  Defining the Mamluk Empi...Kilian Schmidtner/Klaus Thoden - The Missing Links:  Defining the Mamluk Empi...
Kilian Schmidtner/Klaus Thoden - The Missing Links: Defining the Mamluk Empi...
Digitised Manuscripts to Europeana
 
Keynote: Conflicting Cultures of Knowledge - D. Oldman - ESWC SS 2014
Keynote: Conflicting Cultures of Knowledge - D. Oldman - ESWC SS 2014 Keynote: Conflicting Cultures of Knowledge - D. Oldman - ESWC SS 2014
Keynote: Conflicting Cultures of Knowledge - D. Oldman - ESWC SS 2014
eswcsummerschool
 
Digital Scholarship Intersection
Digital Scholarship IntersectionDigital Scholarship Intersection
Digital Scholarship Intersection
David De Roure
 
Doing the Digital: How Scholars Learned to Stop Worrying and Love the Computer
Doing the Digital: How Scholars Learned to Stop Worrying and Love the ComputerDoing the Digital: How Scholars Learned to Stop Worrying and Love the Computer
Doing the Digital: How Scholars Learned to Stop Worrying and Love the Computer
Andrew Prescott
 
[Akhil gupta, james_ferguson]_anthropological_loca(book4_you)
[Akhil gupta, james_ferguson]_anthropological_loca(book4_you)[Akhil gupta, james_ferguson]_anthropological_loca(book4_you)
[Akhil gupta, james_ferguson]_anthropological_loca(book4_you)
marce c.
 
[Akhil gupta, james_ferguson]_anthropological_loca(book_zz.org)
[Akhil gupta, james_ferguson]_anthropological_loca(book_zz.org)[Akhil gupta, james_ferguson]_anthropological_loca(book_zz.org)
[Akhil gupta, james_ferguson]_anthropological_loca(book_zz.org)
Tshuvunga Bembele
 
Irish Studies - making library data work harder
Irish Studies - making library data work harderIrish Studies - making library data work harder
Irish Studies - making library data work harder
lisld
 
"It will discourse most eloquent music": Sonify Variants of Hamlet
"It will discourse most eloquent music": Sonify Variants of Hamlet"It will discourse most eloquent music": Sonify Variants of Hamlet
"It will discourse most eloquent music": Sonify Variants of Hamlet
Iain Emsley
 
Jb dariah-annotation-workshop
Jb dariah-annotation-workshopJb dariah-annotation-workshop
Jb dariah-annotation-workshop
John Bradley
 
The Past, the Present and the Future of Dissecting Literary Texts: From Mora...
The Past, the Present and the Future of Dissecting Literary Texts: From Mora...The Past, the Present and the Future of Dissecting Literary Texts: From Mora...
The Past, the Present and the Future of Dissecting Literary Texts: From Mora...
Dilip Barad
 
Digital Humanities: A brief introduction to the field
Digital Humanities: A brief introduction to the fieldDigital Humanities: A brief introduction to the field
Digital Humanities: A brief introduction to the field
aelang
 
Digital Cultural Heritage and the new EU Framework Programme
Digital Cultural Heritage and the new EU Framework ProgrammeDigital Cultural Heritage and the new EU Framework Programme
Digital Cultural Heritage and the new EU Framework Programme
locloud
 
Tectonic Storytelling with Open Source and Digital Object Identifiers - a cas...
Tectonic Storytelling with Open Source and Digital Object Identifiers - a cas...Tectonic Storytelling with Open Source and Digital Object Identifiers - a cas...
Tectonic Storytelling with Open Source and Digital Object Identifiers - a cas...
Peter Löwe
 
What's the (European) story - Alexander Badenoch
What's the (European) story - Alexander BadenochWhat's the (European) story - Alexander Badenoch
What's the (European) story - Alexander Badenoch
EUscreen
 
Making the Leap (Julian Richards)
Making the Leap (Julian Richards)Making the Leap (Julian Richards)
Making the Leap (Julian Richards)
Onroerend Erfgoed
 
Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...
Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...
Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...
Alessandro Adamou
 
Roger Malina ucsb final
Roger Malina ucsb finalRoger Malina ucsb final
Roger Malina ucsb final
roger malina
 
Ucla2013
Ucla2013Ucla2013
Ucla2013
piero scaruffi
 
Genericity versus expressivity – reflections about the semantics of interoper...
Genericity versus expressivity – reflections about the semantics of interoper...Genericity versus expressivity – reflections about the semantics of interoper...
Genericity versus expressivity – reflections about the semantics of interoper...
Andrea Scharnhorst
 
Research Data Management for the Humanities and Social Sciences
Research Data Management for the Humanities and Social SciencesResearch Data Management for the Humanities and Social Sciences
Research Data Management for the Humanities and Social Sciences
Martin Donnelly
 

Similar to Challenging knowledge extraction to support
the curation of documentary evidence in the humanities (20)

Kilian Schmidtner/Klaus Thoden - The Missing Links: Defining the Mamluk Empi...
Kilian Schmidtner/Klaus Thoden - The Missing Links:  Defining the Mamluk Empi...Kilian Schmidtner/Klaus Thoden - The Missing Links:  Defining the Mamluk Empi...
Kilian Schmidtner/Klaus Thoden - The Missing Links: Defining the Mamluk Empi...
 
Keynote: Conflicting Cultures of Knowledge - D. Oldman - ESWC SS 2014
Keynote: Conflicting Cultures of Knowledge - D. Oldman - ESWC SS 2014 Keynote: Conflicting Cultures of Knowledge - D. Oldman - ESWC SS 2014
Keynote: Conflicting Cultures of Knowledge - D. Oldman - ESWC SS 2014
 
Digital Scholarship Intersection
Digital Scholarship IntersectionDigital Scholarship Intersection
Digital Scholarship Intersection
 
Doing the Digital: How Scholars Learned to Stop Worrying and Love the Computer
Doing the Digital: How Scholars Learned to Stop Worrying and Love the ComputerDoing the Digital: How Scholars Learned to Stop Worrying and Love the Computer
Doing the Digital: How Scholars Learned to Stop Worrying and Love the Computer
 
[Akhil gupta, james_ferguson]_anthropological_loca(book4_you)
[Akhil gupta, james_ferguson]_anthropological_loca(book4_you)[Akhil gupta, james_ferguson]_anthropological_loca(book4_you)
[Akhil gupta, james_ferguson]_anthropological_loca(book4_you)
 
[Akhil gupta, james_ferguson]_anthropological_loca(book_zz.org)
[Akhil gupta, james_ferguson]_anthropological_loca(book_zz.org)[Akhil gupta, james_ferguson]_anthropological_loca(book_zz.org)
[Akhil gupta, james_ferguson]_anthropological_loca(book_zz.org)
 
Irish Studies - making library data work harder
Irish Studies - making library data work harderIrish Studies - making library data work harder
Irish Studies - making library data work harder
 
"It will discourse most eloquent music": Sonify Variants of Hamlet
"It will discourse most eloquent music": Sonify Variants of Hamlet"It will discourse most eloquent music": Sonify Variants of Hamlet
"It will discourse most eloquent music": Sonify Variants of Hamlet
 
Jb dariah-annotation-workshop
Jb dariah-annotation-workshopJb dariah-annotation-workshop
Jb dariah-annotation-workshop
 
The Past, the Present and the Future of Dissecting Literary Texts: From Mora...
The Past, the Present and the Future of Dissecting Literary Texts: From Mora...The Past, the Present and the Future of Dissecting Literary Texts: From Mora...
The Past, the Present and the Future of Dissecting Literary Texts: From Mora...
 
Digital Humanities: A brief introduction to the field
Digital Humanities: A brief introduction to the fieldDigital Humanities: A brief introduction to the field
Digital Humanities: A brief introduction to the field
 
Digital Cultural Heritage and the new EU Framework Programme
Digital Cultural Heritage and the new EU Framework ProgrammeDigital Cultural Heritage and the new EU Framework Programme
Digital Cultural Heritage and the new EU Framework Programme
 
Tectonic Storytelling with Open Source and Digital Object Identifiers - a cas...
Tectonic Storytelling with Open Source and Digital Object Identifiers - a cas...Tectonic Storytelling with Open Source and Digital Object Identifiers - a cas...
Tectonic Storytelling with Open Source and Digital Object Identifiers - a cas...
 
What's the (European) story - Alexander Badenoch
What's the (European) story - Alexander BadenochWhat's the (European) story - Alexander Badenoch
What's the (European) story - Alexander Badenoch
 
Making the Leap (Julian Richards)
Making the Leap (Julian Richards)Making the Leap (Julian Richards)
Making the Leap (Julian Richards)
 
Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...
Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...
Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...
 
Roger Malina ucsb final
Roger Malina ucsb finalRoger Malina ucsb final
Roger Malina ucsb final
 
Ucla2013
Ucla2013Ucla2013
Ucla2013
 
Genericity versus expressivity – reflections about the semantics of interoper...
Genericity versus expressivity – reflections about the semantics of interoper...Genericity versus expressivity – reflections about the semantics of interoper...
Genericity versus expressivity – reflections about the semantics of interoper...
 
Research Data Management for the Humanities and Social Sciences
Research Data Management for the Humanities and Social SciencesResearch Data Management for the Humanities and Social Sciences
Research Data Management for the Humanities and Social Sciences
 

More from Enrico Daga

Citizen Experiences in Cultural Heritage Archives: a Data Journey
Citizen Experiences in Cultural Heritage Archives: a Data JourneyCitizen Experiences in Cultural Heritage Archives: a Data Journey
Citizen Experiences in Cultural Heritage Archives: a Data Journey
Enrico Daga
 
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
Enrico Daga
 
Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.
Enrico Daga
 
Knowledge graph construction with a façade - The SPARQL Anything Project
Knowledge graph construction with a façade - The SPARQL Anything ProjectKnowledge graph construction with a façade - The SPARQL Anything Project
Knowledge graph construction with a façade - The SPARQL Anything Project
Enrico Daga
 
Trying SPARQL Anything with MEI
Trying SPARQL Anything with MEITrying SPARQL Anything with MEI
Trying SPARQL Anything with MEI
Enrico Daga
 
The SPARQL Anything project
The SPARQL Anything projectThe SPARQL Anything project
The SPARQL Anything project
Enrico Daga
 
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
Enrico Daga
 
Capturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid ApproachCapturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid Approach
Enrico Daga
 
Ld4 dh tutorial
Ld4 dh tutorialLd4 dh tutorial
Ld4 dh tutorial
Enrico Daga
 
OU RSE Tutorial Big Data Cluster
OU RSE Tutorial Big Data ClusterOU RSE Tutorial Big Data Cluster
OU RSE Tutorial Big Data Cluster
Enrico Daga
 
CityLABS Workshop: Working with large tables
CityLABS Workshop: Working with large tablesCityLABS Workshop: Working with large tables
CityLABS Workshop: Working with large tables
Enrico Daga
 
Propagating Data Policies - A User Study
Propagating Data Policies - A User StudyPropagating Data Policies - A User Study
Propagating Data Policies - A User Study
Enrico Daga
 
Linked Data at the OU - the story so far
Linked Data at the OU - the story so farLinked Data at the OU - the story so far
Linked Data at the OU - the story so far
Enrico Daga
 
Propagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsPropagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data Flows
Enrico Daga
 
A bottom up approach for licences classification and selection
A bottom up approach for licences classification and selectionA bottom up approach for licences classification and selection
A bottom up approach for licences classification and selection
Enrico Daga
 
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL EndpointsA BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
Enrico Daga
 
Early Analysis and Debuggin of Linked Open Data Cubes
Early Analysis and Debuggin of Linked Open Data CubesEarly Analysis and Debuggin of Linked Open Data Cubes
Early Analysis and Debuggin of Linked Open Data Cubes
Enrico Daga
 

More from Enrico Daga (17)

Citizen Experiences in Cultural Heritage Archives: a Data Journey
Citizen Experiences in Cultural Heritage Archives: a Data JourneyCitizen Experiences in Cultural Heritage Archives: a Data Journey
Citizen Experiences in Cultural Heritage Archives: a Data Journey
 
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
 
Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.
 
Knowledge graph construction with a façade - The SPARQL Anything Project
Knowledge graph construction with a façade - The SPARQL Anything ProjectKnowledge graph construction with a façade - The SPARQL Anything Project
Knowledge graph construction with a façade - The SPARQL Anything Project
 
Trying SPARQL Anything with MEI
Trying SPARQL Anything with MEITrying SPARQL Anything with MEI
Trying SPARQL Anything with MEI
 
The SPARQL Anything project
The SPARQL Anything projectThe SPARQL Anything project
The SPARQL Anything project
 
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
 
Capturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid ApproachCapturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid Approach
 
Ld4 dh tutorial
Ld4 dh tutorialLd4 dh tutorial
Ld4 dh tutorial
 
OU RSE Tutorial Big Data Cluster
OU RSE Tutorial Big Data ClusterOU RSE Tutorial Big Data Cluster
OU RSE Tutorial Big Data Cluster
 
CityLABS Workshop: Working with large tables
CityLABS Workshop: Working with large tablesCityLABS Workshop: Working with large tables
CityLABS Workshop: Working with large tables
 
Propagating Data Policies - A User Study
Propagating Data Policies - A User StudyPropagating Data Policies - A User Study
Propagating Data Policies - A User Study
 
Linked Data at the OU - the story so far
Linked Data at the OU - the story so farLinked Data at the OU - the story so far
Linked Data at the OU - the story so far
 
Propagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsPropagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data Flows
 
A bottom up approach for licences classification and selection
A bottom up approach for licences classification and selectionA bottom up approach for licences classification and selection
A bottom up approach for licences classification and selection
 
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL EndpointsA BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
 
Early Analysis and Debuggin of Linked Open Data Cubes
Early Analysis and Debuggin of Linked Open Data CubesEarly Analysis and Debuggin of Linked Open Data Cubes
Early Analysis and Debuggin of Linked Open Data Cubes
 

Recently uploaded

ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
xclpvhuk
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
wyddcwye1
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 

Recently uploaded (20)

ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 

Challenging knowledge extraction to support
the curation of documentary evidence in the humanities

  • 1. Challenging knowledge extraction to support
 the curation of documentary evidence in the humanities Sciknow 2019 19 November 2019 Marina del Rey, California, United States Enrico Daga and Enrico Motta The Open University Feedback: @enridaga | www.enridaga.net Paper: http://oro.open.ac.uk/67961/
  • 2. Background • The identification and cataloguing of documentary evidence from textual corpora is an important part of empirical research in the humanities. • Semantic databases of documentary evidence: a recent trend • The Listening Experience Database Project (LED) (over 10.000 unique experiences) - https://led.kmi.open.ac.uk/ • READ-IT: Reading Europe Advanced Data Investigation Tool - https://readit- project.eu/ • Two-step workflow: 1. Identification + Retrieval (*) 2. Cataloguing and curation (*) “Capturing Themed Evidence, a Hybrid Approach” Details tomorrow afternoon …
  • 3.
  • 4.
  • 5. Knowledge Extraction (KE) • Bet: metadata curation could be supported with KE methods • KE: automatic or semi-automatic derivation of formal symbolic knowledge from unstructured or semi-structured sources • Approaches in the literature vary in task / scope: • (Named) Entity Recognition and Classification (Person, Work, Time, Place, …) • Entity Linking (DBpedia, Gazetteers) • Relation Extraction (listener of, in place) • Event extraction (Performance) • Machine reading
  • 6. Example #1 "I then went to Amsterdam to conduct Oedipus at the Concertgebouw, which was celebrating its fortieth anniversary by a series of sumptuous musical productions. The fine Concertgebouw orchestra, always at the same high level, the magnificent male choruses from the Royal Apollo Society, soloists of the first rank - among them Mme Hélène Sadoven as Jocasta, Louis van Tulder as Oedipus, and Paul Huf, an excellent reader - and the way in which my work was received by the public, have left a particularly precious memory that I recall with much enjoyment." listener: Igor Strawinsky time: in the beginning of 1928 place: Amsterdam opera: Oedipus Rex /by: Igor Strawinsky performer: Concertgebouw orch. environment: Public Igor Stravinksy An Autobiography (1936), p. 139. https://led.kmi.open.ac.uk/entity/lexp/1435674909834
  • 7. Example #2 "Music is certainly a pleasure that may be reckoned intellectual, and we shall never again have it in the perfection it is this year, because Mr. Handel will not compose any more! Oratorios begin next week, to my great joy, for they are the highest entertainment to me." listener: Mrs Delany time: March, 1737 place: London opera: Operas and Oratorios /by: G. F. Handel environment: Public From: Mary Granville, and Augusta Hall (ed.), Autobiography and Correspondence of Mary Granville, Mrs Delany: with interesting Reminiscences of King George the Third and Queen Charlotte, volume 1 (London, 1861), p. 594. https://led.kmi.open.ac.uk/entity/lexp/1444424772006 Feedback: @enridaga | www.enridaga.net
  • 8. Analysis • Scope: 7.3% of the LED with sources available (archive.org) and including DBpedia entities as place or agent, 690 excerpts from 26 books. • Focus on Entity Recognition: Listener & Place 1. Find the position of the evidence text back in the original source 2. Check where the DBpedia entity (listener or place) is mentioned • Analysis is clearly limited and results are preliminary, although …
  • 9. Analysis • Q1 - in the excerpt? The place is mentioned in the excerpt in 25.9% cases. The listener only in 13.4%. • Q2 - near the excerpt? Only 10% of the times the place mention is less than 5 paragraphs from the excerpt. The agent, in 4% of the cases. • Q3 - in the source? 83.2% of the times the place is mentioned at least once in the source. In 11.4% the place hasn’t been found. • Q4 - in the meta? 64.8% of the listeners are also the authors of the text - 5874 cases in LED. Distance of entity (in n of paragraphs)
  • 10. Challenges • Implicit information, based on inference requiring expertise (e.g. Mr Handel is G.F Handel, Oedipus is “Oedipus Rex”) • The role of contextual knowledge is fundamental (1) in identifying the agent from the metadata of the source; (2) common sense inference (“in the beginning of 1928”) • Entities can exist in distributed, heterogeneous resources (encyclopaedic KBs, domain-specific taxonomies, gazetteers, …) • Machine reading generates an ontology formalising the discourse in the text, reducing the task to one of ontology alignment (not a simplification!) • Cultural studies typically coin novel concepts (ListeningExperience) with original schemas. Portability of the methods is even more at risk!