Knowledge Graph Maintenance
Prof. Paul Groth | @pgroth | pgroth.com | indelab.org

Thanks to Daniel Daza, Corey Harper, Thiviyan Thanapalsingam, Niels ten
Oever, Marieke van Erp, Valentin Vogelman and Frank van Harmelen

NEC Lab Europe

January 22, 2021
We investigate intelligent systems that support
people in their work with data and information from
diverse sources.

In this area, we perform applied and fundamental
research informed by empirical insights into data
science practice.

Current topics:

• Automated Knowledge Base Construction

• Data Search + Data Provenance

• Data Management for Machine Learning

• Causality for machine learning on messy data 

indelab.org
Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure
Written by Nadia Eghbal
Source:

https://www.fordfoundation.org/work/learning/research-reports/roads-and-bridges-the-unseen-labor-behind-our-digital-infrastructure/
Knowledge Graphs
Source:

Azzaoui, K., Jacoby, E., Senger, S., Rodríguez, E. C., Loza, M., Zdrazil, B., … Ecker, G. F. (2013). Scientific
competency questions as the basis for semantically enriched open pharmacological space development. Drug
Discovery Today, 18(17–18), 843–852. https://doi.org/10.1016/j.drudis.2013.05.008
Source:

https://www.biocuration2019.org/about
Source:

https://www.wired.com/story/inside-the-alexa-friendly-world-of-wikidata/
Source:

https://stats.wikimedia.org/v2/#/en.wikipedia.org/contributing/user-edits/normal||2001-01-01~2019-09-01|~total|
Crowdsourcing
100,000s of hand annotated examples
The TAC Relation Extraction Dataset
Source:

Zhang, Yuhao, et al. "Position-aware attention and supervised data improve slot filling." Proceedings of the 2017
Conference on Empirical Methods in Natural Language Processing. 2017.

Karen Fort, Gilles Adda, Kevin Bretonnel Cohen. Amazon Mechanical Turk: Gold Mine or Coal Mine?. Computational
Linguistics, Massachusetts Institute of Technology Press (MIT Press), 2011, pp.413-420. 10.1162/COLI_a_00057
[Figure: a KOS (knowledge organization system) of linked concepts (Concept1, Concept2, Concept3) and its sources of change: professional curators, non-professional contributors, literature, software, data, and society & politics. The figure groups the numbered factors below as (4, 5, 6), (7, 8, 9), (3), and (1, 2).]
1. dealing with changing cultural and societal
norms, specifically to address or correct bias;
2. political influence
3. new concepts and terminology arising from
discoveries or change in perspective within a
technical/scientific community
4. gardening
5. incremental contributorship
6. progressive formalization
7. software and automation
8. integration of large numbers of data sources
9. variance in algorithm training data
Source:

Michael Lauruhn and Paul Groth. “Sources of Change for Modern Knowledge Organization Systems.” Knowledge Organization 43, no. 8 (2016).
[Figure: universal schema pipeline for ontology expansion. Box labels: Content; Open Information Extraction; Triple Extraction; Entity Resolution; Concept Resolution; Taxonomy; Matrix Construction; Surface form relations; Structured relations; Universal schema; Matrix Factorization; Factorization model; Matrix Completion; Predicted relations; Curation; Knowledge graph. Scale labels: 14M SD articles; 475M triples; 3.3 million relations; 49M relations; ~15k -> 1M entries.]
Paul Groth, Sujit Pal, Darin McBeath, Brad Allen, Ron Daniel
“Applying Universal Schemas for Domain Specific Ontology Expansion”
5th Workshop on Automated Knowledge Base Construction (AKBC) 2016
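The matrix factorization and matrix completion steps named on this slide can be illustrated with a small sketch. It is not the pipeline from the cited paper: the toy matrix, the column names, the logistic loss, and the hyperparameters are all assumptions chosen for illustration. Rows stand for entity pairs, columns for surface-form and structured relations, and the factorization scores a cell that was never observed.

```python
import numpy as np

# Hypothetical toy data: rows are entity pairs, columns are relations.
# Columns 0 and 1 play the role of surface-form relations, column 2 a structured one.
COLUMNS = ["X inhibits Y", "X is an inhibitor of Y", "inhibitor_of"]
OBSERVED = [
    (0, 0, 1), (0, 1, 1), (0, 2, 1),   # pair 0: all three relations observed
    (1, 0, 1), (1, 1, 1),              # pair 1: structured relation missing
    (2, 0, 0), (2, 1, 0), (2, 2, 0),   # pairs 2 and 3: negative examples
    (3, 0, 0), (3, 1, 0), (3, 2, 0),
]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def factorize(observed, n_rows, n_cols, rank=4, epochs=500, lr=0.1, reg=0.01, seed=0):
    """Logistic matrix factorization trained with plain SGD on observed cells."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_rows, rank))   # entity-pair embeddings
    Q = 0.1 * rng.standard_normal((n_cols, rank))   # relation embeddings
    for _ in range(epochs):
        for r, c, y in observed:
            grad = sigmoid(P[r] @ Q[c]) - y          # derivative of the log loss
            p_row = P[r].copy()                      # keep old value for Q's update
            P[r] -= lr * (grad * Q[c] + reg * P[r])
            Q[c] -= lr * (grad * p_row + reg * Q[c])
    return P, Q

def predict(P, Q, r, c):
    """Score in [0, 1] for the cell (entity pair r, relation c)."""
    return float(sigmoid(P[r] @ Q[c]))

P, Q = factorize(OBSERVED, n_rows=4, n_cols=len(COLUMNS))
print("score for unobserved cell (pair 1, inhibitor_of):", predict(P, Q, 1, 2))
```

High-scoring unobserved cells are candidates for the curation step rather than facts to be added automatically.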
Knowledge Graph Curation and Refinement
Apply ML
Link Prediction
Inductive Prediction
Daniel Daza, Michael Cochez, and Paul Groth. "Inductive Entity
Representations from Text via Link Prediction." arXiv preprint
arXiv:2010.03496 (2020). Accepted to WebConf 2021
Transformer
Evaluation Scenarios
• Transductive (standard) - predict links between entities seen at
training time

• Dynamic - predict links between entities where an entity unseen at
training time can be in the head, tail, or both positions of the triple.

• Situation: adding entities to a KG

• Transfer - predict links on entities that are unseen at training time.

• Situation: predicting links on an unseen KG
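One simplified reading of the three scenarios above is a check on how the test entities overlap with the training entities. The sketch below is an assumption for illustration, not the evaluation code from the cited paper; the function name and the toy triples are hypothetical.

```python
def classify_test_set(train_triples, test_triples):
    """Label a test set by how its entities overlap with the training entities.

    Triples are (head, relation, tail) tuples.
    """
    train_entities = {e for h, _, t in train_triples for e in (h, t)}
    test_entities = {e for h, _, t in test_triples for e in (h, t)}
    unseen = test_entities - train_entities
    if not unseen:
        return "transductive"   # every test entity was seen at training time
    if unseen == test_entities:
        return "transfer"       # no test entity was seen: an unseen KG
    return "dynamic"            # unseen entities mixed into a known KG

train = [("a", "r", "b"), ("b", "r", "c")]
print(classify_test_set(train, [("a", "r", "c")]))  # all entities seen
print(classify_test_set(train, [("a", "r", "x")]))  # new entity x in the tail
print(classify_test_set(train, [("x", "r", "y")]))  # entirely new entities
```

The set-level check is a simplification: in the dynamic scenario an individual triple may also have unseen entities in both positions.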
Dynamic Transfer
Representations for other tasks
https://github.com/dfdazac/blp
“john lennon, parents”
“bicycle holiday nature”
“Airports in Germany”
“What is the longest river?”.
Future: Sub-graph Prediction
Future: Learning KG Pipelines End-to-End
Paul T. Groth, Antony Scerri, Ron Daniel, Bradley P. Allen:

End-to-End Learning for Answering Structured Queries Directly over Text. DL4KG@ESWC 2019: 57-70
KG POPULATION & THE LONG TAIL
The Importance of Attributes
• Extract numeric patterns (12 ± 3, 53–55, 0.245)
• Extract corresponding units (°C, μM, hours, h, MPa)
• Nanoamperes (nA) for neural cell Rheobase values
• Megapascals (MPa) for compressive strength of concrete
• Milligrams per Kilogram (mg/kg) for administered drug dosages
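A rough sketch of the two extraction steps above, using a regular expression. The pattern, the short unit list, and the sample sentence are assumptions made for illustration; real text needs far more units and number formats, which is part of why the task is harder than it looks.

```python
import re

NUMBER = r"\d+(?:\.\d+)?"
# Small illustrative unit list taken from the slide; longer alternatives come first.
UNIT = r"°C|[µμ]M|mg/kg|MPa|nA|hours|h"

QUANTITY = re.compile(
    rf"(?<![\w.])(?P<value>{NUMBER})"                    # 12, 53, 0.245
    rf"(?:\s*(?P<op>[±–-])\s*(?P<second>{NUMBER}))?"     # optional "± 3" or "–55"
    rf"\s*(?P<unit>{UNIT})?(?![A-Za-z\d])"               # optional unit, not a word prefix
)

def extract_quantities(text):
    """Return one dict per numeric pattern found, with its unit if recognized."""
    return [
        {
            "value": m.group("value"),
            "op": m.group("op"),
            "second": m.group("second"),
            "unit": m.group("unit"),
        }
        for m in QUANTITY.finditer(text)
    ]

TEXT = ("Samples were kept at 12 ± 3 °C for 24 hours; "
        "strength was 53–55 MPa and the dose 0.245 mg/kg.")
for quantity in extract_quantities(TEXT):
    print(quantity)
```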
Units of Measurement Corey Harper
Simple Annotation - Surprisingly Hard
https://competitions.codalab.org/competitions/25770
Task List
Guidelines for applying the model
More and more detailed guidelines...
And started annotating, a paragraph at a time...
NLP Competitions -- Labs Online Lecture -- Thorne, Harper
But the paragraphs got messy...
2020-12-07
https://github.com/harperco/measeval
Some Inter-annotator Agreement Info (Krippendorff’s Alpha)
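For reference, Krippendorff's alpha is 1 - D_o / D_e, the observed disagreement divided by the disagreement expected by chance. The sketch below implements only the nominal variant for categorical labels. Whether the numbers on this slide were computed with this variant is not stated, so treat it as an assumption; the function name and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: list of lists; each inner list holds the labels that annotators
    gave to one item (missing annotations are simply left out).
    """
    pairable = [u for u in units if len(u) >= 2]   # items with at least 2 labels
    n = sum(len(u) for u in pairable)
    if n <= 1:
        raise ValueError("need at least one item with two or more labels")

    # Coincidence matrix: each ordered label pair within an item, weighted by 1/(m-1).
    coincidences = Counter()
    for u in pairable:
        for a, b in permutations(u, 2):
            coincidences[(a, b)] += 1.0 / (len(u) - 1)

    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b]
                   for a in totals for b in totals if a != b) / (n - 1)
    if expected == 0:
        return float("nan")   # only one label value used: alpha is undefined
    return 1.0 - observed / expected

# Four items, two annotators each; they disagree on the last item only.
labels = [["a", "a"], ["a", "a"], ["b", "b"], ["a", "b"]]
print(round(krippendorff_alpha_nominal(labels), 4))  # 8/15, about 0.5333
```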
Evaluation Output
Evaluation Process
Local Evaluation Micro Averages
• Can run micro averages

• Broken down:

  • By entity / relation class

  • By document (paragraph)

  • By class & document

• Could add additional analysis
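The breakdowns listed above can be sketched as one micro-averaging function with a pluggable grouping key. This is an illustration under assumptions, not the competition's evaluation script: annotations are hypothetical (document, class, start, end) tuples, matching is exact, and the class names are placeholders.

```python
from collections import defaultdict

def prf(tp, fp, fn):
    """Precision, recall, and F1 from pooled counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def micro_scores(gold, predicted, key=lambda ann: "all"):
    """Micro-averaged scores, pooled within each group returned by key(annotation)."""
    counts = defaultdict(lambda: [0, 0, 0])        # [tp, fp, fn] per group
    for ann in predicted:
        counts[key(ann)][0 if ann in gold else 1] += 1
    for ann in gold - predicted:
        counts[key(ann)][2] += 1
    return {group: prf(*c) for group, c in counts.items()}

# Hypothetical annotations: (document, class, start offset, end offset).
gold = {("d1", "Quantity", 0, 5), ("d1", "Unit", 3, 5),
        ("d2", "Quantity", 10, 12), ("d2", "MeasuredEntity", 20, 30)}
predicted = {("d1", "Quantity", 0, 5), ("d1", "Unit", 2, 5),
             ("d2", "Quantity", 10, 12)}

print(micro_scores(gold, predicted))                                   # overall
print(micro_scores(gold, predicted, key=lambda a: a[1]))               # by class
print(micro_scores(gold, predicted, key=lambda a: a[0]))               # by document
print(micro_scores(gold, predicted, key=lambda a: (a[0], a[1])))       # by class & document
```

Further analyses can be added by passing a different key function.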
LINKING & IDENTITY
Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 2129–2137
Marseille, 11–16 May 2020
© European Language Resources Association (ELRA), licensed under CC-BY-NC
Towards Entity Spaces
Marieke van Erp*, Paul Groth†
* KNAW Humanities Cluster - DHLab, Amsterdam, NL
† University of Amsterdam, Amsterdam, NL
marieke.van.erp@dh.huc.knaw.nl, p.groth@uva.nl
Abstract
Entities are a central element of knowledge bases and are important input to many knowledge-centric tasks including text analysis.
For example, they allow us to find documents relevant to a specific entity irrespective of the underlying syntactic expression within a
document. However, the entities that are commonly represented in knowledge bases are often a simplification of what is truly being
referred to in text. For example, in a knowledge base, we may have an entity for Germany as a country but not for the more fuzzy concept
of Germany that covers notions of German Population, German Drivers, and the German Government. Inspired by recent advances in
contextual word embeddings, we introduce the concept of entity spaces - specific representations of a set of associated entities with
near-identity. Thus, these entity spaces provide a handle to an amorphous grouping of entities. We developed a proof-of-concept for
English showing how, through the introduction of entity spaces in the form of disambiguation pages, the recall of entity linking can be
improved.
Keywords: entity, identity, knowledge representation, entity linking
1. Introduction
Entities are a central element for knowledge bases and text analysis tasks (Balog, 2018). However,
the way in which entities are represented in knowledge bases and how subsequent tools use these
representations are a simplification of the complexity of many entities. For example, the entity
Germany in Wikidata as represented by wikidata:Q183 focuses on its properties as a location and
geopolitical entity due to its membership as an instance of sovereign state, country, federal state,
republic, social state, legal state, and administrative territorial entity. Similarly, in DBpedia
(version 2016-10), Germany is represented as entity of type populated place and some subtypes such
as yago:WikicatFederalCountries and yago:WikicatMemberStatesOfTheEuropeanUnion.¹
¹ Germany also has rdf:type dbo:Person but we assume this is a glitch.
However, when the term Germany is used in text, it can take on many meanings that all have
‘something to do’ with Germany as it is represented in knowledge bases, but are all not quite the same:
(1) Germany imported 47,600 sheep from Britain last
year, nearly half of total imports.
(2) German July car registrations up 14.2 pct yr / yr.
(3) Australia last won the Davis Cup in 1986, but they
were beaten finalists against Germany three years
ago under Fraser’s guidance.
In Example (1), Germany refers partly to the location, but a location usually cannot take on an
active role, such that the entity ‘importing’ the sheep is most likely a referent to the German meat
industry. Germany in Example (2), refers to the German population buying and registering more cars
than a year before. Finally, in Example (3), Germany refers to the German Davis Cup team from 1993
(the news article is from 1996). In the AIDA-YAGO dataset, this entity is tagged as
dbp:Germany_Davis_Cup_team but this presents us with another layer of identities, namely that every
year, or every couple of years, the German Davis cup team consists of different players. In 1993,
the German Davis cup team consisted of Michael Stich and Marc-Kevin Goellner, in 1996 of David
Prinosil and Hendrik Dreekmann and at the time of writing this article in 2019 of Alexander Zverev
and Philipp Kohlschreiber. Both MAG (Moussallem et al., 2017) and DBpedia spotlight (Daiber et al.,
2013a) annotate Australia and Germany in Example (3) as dbp:Australia and dbp:Germany respectively.
While both the annotations and automatic linkages are close to the identity of the entity in
resolving these referents to dbp:Germany, we argue this is an underspecification and highlights a
larger problem with identity representation in knowledge bases.
Collapsing of identities has been a frequent topic within Semantic Web discourse. However, most
discussions have focused on issues with owl:sameAs links (McCusker and McGuinness, 2010; Raad et
al., 2018). However, the problem of simplified entity representations (e.g. the collapsing of
identities) also occurs before the creation of such owl:sameAs links. Specifically, with the fact
that most knowledge bases represent a single or limited number of an entity’s facets. In this paper,
we analyse the extent of the problem by connecting Semantic Web representations of identity to
linguistic representations of entities, namely coreference and near-identity. To overcome this
identity problem, we argue for the introduction of explicit representations of near-identity within
knowledge bases. We term these explicit representations - entity spaces. We illustrate how the
introduction of entity spaces can boost the performance of state-of-the-art entity linking pipelines.
Our contributions are: 1) the definition of entity spaces; 2) a prototype showing the use of entity
spaces over multiple entity linking pipelines; and 3) experiments on 13 English entity linking
datasets showing the impact of a more tolerant approach to entity linking made possible through
entity spaces.
Our code and experimental results are available via https://github.com/MvanErp/entity-spaces.
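The "more tolerant approach to entity linking" can be sketched as a recall measure that also accepts a predicted entity when it shares an entity space with the gold entity. This is an illustrative assumption, not the code from the repository above: the function, the mention ids, the contents of the entity space, and the gold label for Australia are hypothetical.

```python
def linking_recall(gold_links, predicted_links, entity_spaces=()):
    """Fraction of gold mentions linked correctly.

    gold_links, predicted_links: dict mapping mention id -> entity id.
    entity_spaces: iterable of sets of near-identical entities. A prediction
    counts as correct if it equals the gold entity or shares a space with it.
    """
    if not gold_links:
        return 0.0
    hits = 0
    for mention, gold in gold_links.items():
        pred = predicted_links.get(mention)
        if pred is None:
            continue
        same_space = any(gold in space and pred in space for space in entity_spaces)
        if pred == gold or same_space:
            hits += 1
    return hits / len(gold_links)

# Example (3) from the text: the gold annotation is the Davis Cup team,
# while the linkers return the country.
gold = {"m1": "dbp:Germany_Davis_Cup_team",
        "m2": "dbp:Australia_Davis_Cup_team"}   # hypothetical gold label
predicted = {"m1": "dbp:Germany", "m2": "dbp:Australia"}

# Hypothetical entity space for Germany only.
spaces = [{"dbp:Germany", "dbp:Germany_Davis_Cup_team"}]

print(linking_recall(gold, predicted))           # strict matching
print(linking_recall(gold, predicted, spaces))   # tolerant matching via entity spaces
```

With strict matching neither mention counts as correct; with the entity space the Germany mention does, which is the kind of recall gain the abstract describes.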
A good question:
Thinking about an answer:
Situated Dialogue
• Computer mediated
dialogue is omnipresent

• Key observation: dialogue is
situated. It is not only about
the “back-and-forth”
between parties but also about the
larger environment (e.g.
documents, concepts,
projects, world knowledge).

• Not just chat
Complex Environments (e.g. Standards
Organizations)
in-sight-it.github.io
conversationkg - kg extraction from dialogue
Knowledge Engineering Revisited
• Knowledge graphs are built ad-hoc 

• 100s of components (extractors, scrapers, quality,
scoring, user feedback, …)

• Unique for each organization

• Existing knowledge engineering theory does not apply:

• Assumes small scale

• Assumes slow change

• People-centric

• Expressive representations 

• Needed: an updated theory and methods for knowledge
engineering designed for the demands of modern
knowledge graphs
knowledgescientist.org
Conclusion
• Knowledge graphs require maintenance

• Maintenance is frequently people work

• Need for new ML based methods & new human + machine workflows

• Interested? Happy to talk more
Paul Groth | @pgroth | pgroth.com | indelab.org

More Related Content

What's hot

Minimal viable-datareuse-czi
Minimal viable-datareuse-cziMinimal viable-datareuse-czi
Minimal viable-datareuse-czi
Paul Groth
 
More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?
Paul Groth
 
The Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for ScienceThe Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for Science
Paul Groth
 
From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge Graphs
Paul Groth
 
The need for a transparent data supply chain
The need for a transparent data supply chainThe need for a transparent data supply chain
The need for a transparent data supply chain
Paul Groth
 
Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the Web
Rinke Hoekstra
 
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)Rinke Hoekstra
 
An Ecosystem for Linked Humanities Data
An Ecosystem for Linked Humanities DataAn Ecosystem for Linked Humanities Data
An Ecosystem for Linked Humanities Data
Rinke Hoekstra
 
Prov-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance VisualizationProv-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance Visualization
Rinke Hoekstra
 
Sources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsSources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization Systems
Paul Groth
 
Machines are people too
Machines are people tooMachines are people too
Machines are people too
Paul Groth
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Paul Groth
 
Cognitive data
Cognitive dataCognitive data
Cognitive data
Sören Auer
 
Describing Scholarly Contributions semantically with the Open Research Knowle...
Describing Scholarly Contributions semantically with the Open Research Knowle...Describing Scholarly Contributions semantically with the Open Research Knowle...
Describing Scholarly Contributions semantically with the Open Research Knowle...
Sören Auer
 
Managing Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseManaging Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS case
Rinke Hoekstra
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture Data
Paul Groth
 
Towards an Open Research Knowledge Graph
Towards an Open Research Knowledge GraphTowards an Open Research Knowledge Graph
Towards an Open Research Knowledge Graph
Sören Auer
 
Data Discovery and Visualization
Data Discovery and VisualizationData Discovery and Visualization
Data Discovery and Visualization
Dr. Neil Brittliff
 
Interlinking Data and Knowledge in Enterprises, Research and Society with Lin...
Interlinking Data and Knowledge in Enterprises, Research and Society with Lin...Interlinking Data and Knowledge in Enterprises, Research and Society with Lin...
Interlinking Data and Knowledge in Enterprises, Research and Society with Lin...
Christoph Lange
 
Self adaptive based natural language interface for disambiguation of
Self adaptive based natural language interface for disambiguation ofSelf adaptive based natural language interface for disambiguation of
Self adaptive based natural language interface for disambiguation of
Nurfadhlina Mohd Sharef
 

What's hot (20)

Minimal viable-datareuse-czi
Minimal viable-datareuse-cziMinimal viable-datareuse-czi
Minimal viable-datareuse-czi
 
More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?
 
The Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for ScienceThe Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for Science
 
From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge Graphs
 
The need for a transparent data supply chain
The need for a transparent data supply chainThe need for a transparent data supply chain
The need for a transparent data supply chain
 
Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the Web
 
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
 
An Ecosystem for Linked Humanities Data
An Ecosystem for Linked Humanities DataAn Ecosystem for Linked Humanities Data
An Ecosystem for Linked Humanities Data
 
Prov-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance VisualizationProv-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance Visualization
 
Sources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsSources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization Systems
 
Machines are people too
Machines are people tooMachines are people too
Machines are people too
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
 
Cognitive data
Cognitive dataCognitive data
Cognitive data
 
Describing Scholarly Contributions semantically with the Open Research Knowle...
Describing Scholarly Contributions semantically with the Open Research Knowle...Describing Scholarly Contributions semantically with the Open Research Knowle...
Describing Scholarly Contributions semantically with the Open Research Knowle...
 
Managing Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseManaging Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS case
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture Data
 
Towards an Open Research Knowledge Graph
Towards an Open Research Knowledge GraphTowards an Open Research Knowledge Graph
Towards an Open Research Knowledge Graph
 
Data Discovery and Visualization
Data Discovery and VisualizationData Discovery and Visualization
Data Discovery and Visualization
 
Interlinking Data and Knowledge in Enterprises, Research and Society with Lin...
Interlinking Data and Knowledge in Enterprises, Research and Society with Lin...Interlinking Data and Knowledge in Enterprises, Research and Society with Lin...
Interlinking Data and Knowledge in Enterprises, Research and Society with Lin...
 
Self adaptive based natural language interface for disambiguation of
Self adaptive based natural language interface for disambiguation ofSelf adaptive based natural language interface for disambiguation of
Self adaptive based natural language interface for disambiguation of
 

Similar to Knowledge Graph Maintenance

The web of data: how are we doing so far
The web of data: how are we doing so farThe web of data: how are we doing so far
The web of data: how are we doing so far
Elena Simperl
 
PPOL 650Issue Analysis Paper InstructionsYou will submit a p.docx
PPOL 650Issue Analysis Paper InstructionsYou will submit a p.docxPPOL 650Issue Analysis Paper InstructionsYou will submit a p.docx
PPOL 650Issue Analysis Paper InstructionsYou will submit a p.docx
ChantellPantoja184
 
PDS Unit - 1 Introdiction to DS.ppt
PDS Unit - 1 Introdiction to DS.pptPDS Unit - 1 Introdiction to DS.ppt
PDS Unit - 1 Introdiction to DS.ppt
ssuser52a19e
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
Sören Auer
 
Frontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text AnalysisFrontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text Analysis
Jonathan Stray
 
Omitola birmingham cityuniv
Omitola birmingham cityunivOmitola birmingham cityuniv
Omitola birmingham cityuniv
Tope Omitola
 
Data dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNLData dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNL
Anubhav Jain
 
Identifying semantics characteristics of user’s interactions datasets through...
Identifying semantics characteristics of user’s interactions datasets through...Identifying semantics characteristics of user’s interactions datasets through...
Identifying semantics characteristics of user’s interactions datasets through...
Fernando de Assis Rodrigues
 
Semantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenzaSemantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenza
Giorgia Lodi
 
Analysis on the Demand of Top Talent Introduction in Big Dat
Analysis on the Demand of Top Talent Introduction in Big DatAnalysis on the Demand of Top Talent Introduction in Big Dat
Analysis on the Demand of Top Talent Introduction in Big Dat
MadonnaJacobsenfp
 
Analysis on the Demand of Top Talent Introduction in Big Dat.docx
Analysis on the Demand of Top Talent Introduction in Big Dat.docxAnalysis on the Demand of Top Talent Introduction in Big Dat.docx
Analysis on the Demand of Top Talent Introduction in Big Dat.docx
greg1eden90113
 
Analysis on the Demand of Top Talent Introduction in Big Dat.docx
Analysis on the Demand of Top Talent Introduction in Big Dat.docxAnalysis on the Demand of Top Talent Introduction in Big Dat.docx
Analysis on the Demand of Top Talent Introduction in Big Dat.docx
jack60216
 
Analysis on the Demand of Top Talent Introduction in Big Dat.docx
Analysis on the Demand of Top Talent Introduction in Big Dat.docxAnalysis on the Demand of Top Talent Introduction in Big Dat.docx
Analysis on the Demand of Top Talent Introduction in Big Dat.docx
SHIVA101531
 
Analysis on the Demand of Top Talent Introduction in Big Dat.docx
Analysis on the Demand of Top Talent Introduction in Big Dat.docxAnalysis on the Demand of Top Talent Introduction in Big Dat.docx
Analysis on the Demand of Top Talent Introduction in Big Dat.docx
daniahendric
 
Andy Petrella_Med@Scale by Data Fellas: Scalable and Interoperable Genomics d...
Andy Petrella_Med@Scale by Data Fellas: Scalable and Interoperable Genomics d...Andy Petrella_Med@Scale by Data Fellas: Scalable and Interoperable Genomics d...
Andy Petrella_Med@Scale by Data Fellas: Scalable and Interoperable Genomics d...
Dataconomy Media
 
Data Enthusiasts London: Scalable and Interoperable data services. Applied to...
Data Enthusiasts London: Scalable and Interoperable data services. Applied to...Data Enthusiasts London: Scalable and Interoperable data services. Applied to...
Data Enthusiasts London: Scalable and Interoperable data services. Applied to...
Andy Petrella
 
Linked Data Tutorial (Florianópolis)
Linked Data Tutorial (Florianópolis)Linked Data Tutorial (Florianópolis)
Linked Data Tutorial (Florianópolis)
Oscar Corcho
 
A Document Exploring System on LDA Topic Model for Wikipedia Articles
A Document Exploring System on LDA Topic Model for Wikipedia ArticlesA Document Exploring System on LDA Topic Model for Wikipedia Articles
A Document Exploring System on LDA Topic Model for Wikipedia Articles
ijma
 
Follow directions below using original work only. Based .docx
Follow directions below using original work only. Based .docxFollow directions below using original work only. Based .docx
Follow directions below using original work only. Based .docx
keugene1
 
Follow directions below using original work only. Based .docx
Follow directions below using original work only. Based .docxFollow directions below using original work only. Based .docx
Follow directions below using original work only. Based .docx
budbarber38650
 

Similar to Knowledge Graph Maintenance (20)

The web of data: how are we doing so far
The web of data: how are we doing so farThe web of data: how are we doing so far
The web of data: how are we doing so far
 
PPOL 650Issue Analysis Paper InstructionsYou will submit a p.docx
PPOL 650Issue Analysis Paper InstructionsYou will submit a p.docxPPOL 650Issue Analysis Paper InstructionsYou will submit a p.docx
PPOL 650Issue Analysis Paper InstructionsYou will submit a p.docx
 
PDS Unit - 1 Introdiction to DS.ppt
PDS Unit - 1 Introdiction to DS.pptPDS Unit - 1 Introdiction to DS.ppt
PDS Unit - 1 Introdiction to DS.ppt
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
 
Frontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text AnalysisFrontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text Analysis
 
Omitola birmingham cityuniv
Omitola birmingham cityunivOmitola birmingham cityuniv
Omitola birmingham cityuniv
 
Data dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNLData dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNL
 
Identifying semantics characteristics of user’s interactions datasets through...
Identifying semantics characteristics of user’s interactions datasets through...Identifying semantics characteristics of user’s interactions datasets through...
Identifying semantics characteristics of user’s interactions datasets through...
 
Semantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenzaSemantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenza
 
Analysis on the Demand of Top Talent Introduction in Big Dat
Analysis on the Demand of Top Talent Introduction in Big DatAnalysis on the Demand of Top Talent Introduction in Big Dat
Analysis on the Demand of Top Talent Introduction in Big Dat
 
Analysis on the Demand of Top Talent Introduction in Big Dat.docx
Analysis on the Demand of Top Talent Introduction in Big Dat.docxAnalysis on the Demand of Top Talent Introduction in Big Dat.docx
Analysis on the Demand of Top Talent Introduction in Big Dat.docx
 
Analysis on the Demand of Top Talent Introduction in Big Dat.docx
Analysis on the Demand of Top Talent Introduction in Big Dat.docxAnalysis on the Demand of Top Talent Introduction in Big Dat.docx
Analysis on the Demand of Top Talent Introduction in Big Dat.docx
 
Analysis on the Demand of Top Talent Introduction in Big Dat.docx
Analysis on the Demand of Top Talent Introduction in Big Dat.docxAnalysis on the Demand of Top Talent Introduction in Big Dat.docx
Analysis on the Demand of Top Talent Introduction in Big Dat.docx
 
Analysis on the Demand of Top Talent Introduction in Big Dat.docx
Analysis on the Demand of Top Talent Introduction in Big Dat.docxAnalysis on the Demand of Top Talent Introduction in Big Dat.docx
Analysis on the Demand of Top Talent Introduction in Big Dat.docx
 
Andy Petrella_Med@Scale by Data Fellas: Scalable and Interoperable Genomics d...
Andy Petrella_Med@Scale by Data Fellas: Scalable and Interoperable Genomics d...Andy Petrella_Med@Scale by Data Fellas: Scalable and Interoperable Genomics d...
Andy Petrella_Med@Scale by Data Fellas: Scalable and Interoperable Genomics d...
 
Data Enthusiasts London: Scalable and Interoperable data services. Applied to...
Data Enthusiasts London: Scalable and Interoperable data services. Applied to...Data Enthusiasts London: Scalable and Interoperable data services. Applied to...
Data Enthusiasts London: Scalable and Interoperable data services. Applied to...
 
Linked Data Tutorial (Florianópolis)
Linked Data Tutorial (Florianópolis)Linked Data Tutorial (Florianópolis)
Linked Data Tutorial (Florianópolis)
 
A Document Exploring System on LDA Topic Model for Wikipedia Articles
A Document Exploring System on LDA Topic Model for Wikipedia ArticlesA Document Exploring System on LDA Topic Model for Wikipedia Articles
A Document Exploring System on LDA Topic Model for Wikipedia Articles
 
Follow directions below using original work only. Based .docx
Follow directions below using original work only. Based .docxFollow directions below using original work only. Based .docx
Follow directions below using original work only. Based .docx
 
Follow directions below using original work only. Based .docx
Follow directions below using original work only. Based .docxFollow directions below using original work only. Based .docx
Follow directions below using original work only. Based .docx
 

Knowledge Graph Maintenance

  • 1. Knowledge Graph Maintenance Prof. Paul Groth | @pgroth | pgroth.com | indelab.org Thanks to Daniel Daza, Corey Harper, Thiviyan Thanapalsingam, Niels ten Oever, Marieke van Erp, Valentin Vogelman and Frank van Harmelen NEC Lab Europe, January 22, 2021
  • 2. We investigate intelligent systems that support people in their work with data and information from diverse sources. In this area, we perform applied and fundamental research informed by empirical insights into data science practice. Current topics: • Automated Knowledge Base Construction • Data Search + Data Provenance • Data Management for Machine Learning • Causality for machine learning on messy data indelab.org
  • 3. Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure. Written by Nadia Eghbal. Source: https://www.fordfoundation.org/work/learning/research-reports/roads-and-bridges-the-unseen-labor-behind-our-digital-infrastructure/
  • 5. Source: Azzaoui, K., Jacoby, E., Senger, S., Rodríguez, E. C., Loza, M., Zdrazil, B., … Ecker, G. F. (2013). Scientific competency questions as the basis for semantically enriched open pharmacological space development. Drug Discovery Today, 18(17–18), 843–852. https://doi.org/10.1016/j.drudis.2013.05.008
  • 10. Crowdsourcing: 100,000s of hand-annotated examples. The TAC Relation Extraction Dataset. Source: Zhang, Yuhao, et al. "Position-aware attention and supervised data improve slot filling." Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017. Karen Fort, Gilles Adda, Kevin Bretonnel Cohen. Amazon Mechanical Turk: Gold Mine or Coal Mine?. Computational Linguistics, Massachusetts Institute of Technology Press (MIT Press), 2011, pp.413-420. 10.1162/COLI_a_00057
  • 11. Concept1 Concept2 Concept3 KOS Professional Curators Literature Software Non-professional contributors Data ⚐ Society & Politics (4, 5, 6) (7, 8, 9) (3) (1, 2)
    1. dealing with changing cultural and societal norms, specifically to address or correct bias
    2. political influence
    3. new concepts and terminology arising from discoveries or change in perspective within a technical/scientific community
    4. gardening
    5. incremental contributorship
    6. progressive formalization
    7. software and automation
    8. integration of large numbers of data sources
    9. variance in algorithm training data
    Source: Michael Lauruhn and Paul Groth. “Sources of Change for Modern Knowledge Organization Systems.” Knowledge Organization 43, no. 8 (2016).
  • 12. Content Universal schema Surface form relations Structured relations Factorization model Matrix Construction Open Information Extraction Entity Resolution Matrix Factorization Knowledge graph Curation Predicted relations Matrix Completion Taxonomy Triple Extraction Concept Resolution 14M SD articles 475 M triples 3.3 million relations 49 M relations ~15k -> 1M entries Paul Groth, Sujit Pal, Darin McBeath, Brad Allen, Ron Daniel “Applying Universal Schemas for Domain Specific Ontology Expansion” 5th Workshop on Automated Knowledge Base Construction (AKBC) 2016 Knowledge Graph Curation and Refinement
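
The matrix construction and matrix factorization steps on this slide can be illustrated with a small sketch. It is not the pipeline from the cited paper: the toy matrix, the relation names, and the plain logistic loss (which treats every unobserved cell as a negative) are assumptions chosen to keep the example short. Rows stand for entity pairs, columns for surface-form and structured relations, and the factorized scores for unobserved cells play the role of "predicted relations".

```python
import numpy as np

def factorize(observed, rank=2, steps=500, lr=0.1, seed=0):
    """Logistic matrix factorization of a binary (entity pair x relation) matrix.

    Returns a matrix of the same shape with a score in [0, 1] for every cell.
    Simplification: unobserved cells are treated as negatives.
    """
    rng = np.random.default_rng(seed)
    n_pairs, n_rels = observed.shape
    pair_vecs = rng.normal(scale=0.1, size=(n_pairs, rank))
    rel_vecs = rng.normal(scale=0.1, size=(n_rels, rank))
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(pair_vecs @ rel_vecs.T)))
        err = probs - observed  # gradient of the logistic loss w.r.t. the logits
        # Simultaneous gradient step on both factor matrices.
        pair_vecs, rel_vecs = (pair_vecs - lr * (err @ rel_vecs),
                               rel_vecs - lr * (err.T @ pair_vecs))
    return 1.0 / (1.0 + np.exp(-(pair_vecs @ rel_vecs.T)))

# Hypothetical columns: two surface forms and one structured relation.
relations = ["X treats Y", "X is used for Y", "therapy_for"]
observed = np.array([
    [1, 1, 1],   # pair seen with both surface forms and the structured relation
    [1, 1, 0],   # same surface forms, structured relation missing
    [0, 0, 0],   # unrelated pair
    [1, 0, 1],
], dtype=float)

scores = factorize(observed)
# scores[1, 2] is the model's score for the missing structured relation.
```

High-scoring unobserved cells would be candidates for curation rather than facts to add automatically.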
  • 15. Inductive Prediction Daniel Daza, Michael Cochez, and Paul Groth. "Inductive Entity Representations from Text via Link Prediction." arXiv preprint arXiv:2010.03496 (2020). Accepted to WebConf 2021
  • 20. Evaluation Scenarios • Transductive (standard) - predict links between entities seen at training time • Dynamic - predict links where an entity unseen at training time can be in the head, tail or both positions of the triple. • Situation: adding entities to a KG • Transfer - predict links between entities that are all unseen at training time. • Situation: predicting links on an unseen KG
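
One way to make the three scenarios concrete is to compare the entities of a test set with those seen in training. The sketch below is only an illustration under that reading of the slide; the function name and the toy triples are hypothetical and not taken from the BLP code.

```python
def evaluation_scenario(train_triples, test_triples):
    """Label a test set by how its entities overlap with the training entities."""
    train_entities = {e for h, _, t in train_triples for e in (h, t)}
    test_entities = {e for h, _, t in test_triples for e in (h, t)}
    unseen = test_entities - train_entities
    if not unseen:
        return "transductive"   # every test entity was seen during training
    if unseen == test_entities:
        return "transfer"       # no test entity was seen: an unseen KG
    return "dynamic"            # new entities mixed in at head, tail or both

train = [("amsterdam", "capital_of", "netherlands"),
         ("berlin", "capital_of", "germany")]

seen_only = [("berlin", "located_in", "germany")]
with_new_entity = [("munich", "located_in", "germany")]
new_graph = [("paris", "capital_of", "france")]
```

Here `seen_only` is a transductive test set, `with_new_entity` is dynamic because "munich" is new while "germany" is known, and `new_graph` is a transfer setting.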
  • 22. Representations for other tasks https://github.com/dfdazac/blp
  • 24. “john lennon, parents” “bicycle holiday nature” “Airports in Germany” “What is the longest river?”
  • 26. Future: Learning KG Pipelines End-to-End. Paul T. Groth, Antony Scerri, Ron Daniel, Bradley P. Allen: End-to-End Learning for Answering Structured Queries Directly over Text. DL4KG@ESWC 2019: 57-70
  • 28. KG POPULATION & THE LONG TAIL
  • 29. The Importance of Attributes
  • 30. Units of Measurement (Corey Harper) • Extract numeric patterns (12 ± 3, 53–55, 0.245) • Extract corresponding units (°C, μM, hours, h, MPa) • Nanoamperes (nA) for neural cell Rheobase values • Megapascals (MPa) for compressive strength of concrete • Milligrams per Kilogram (mg/kg) for administered drug dosages
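
As a rough illustration of the two extraction steps, the sketch below pairs the numeric patterns from the slide (plain values, ranges, and ± intervals) with a small closed list of units. The regular expression, the unit list, and the example sentence are assumptions for illustration; a real extractor would need far broader coverage.

```python
import re

NUMBER = r"\d+(?:\.\d+)?"
QUANTITY = re.compile(
    # A value, optionally followed by "± n", "– n" or "- n" (interval or range) ...
    rf"(?P<value>{NUMBER}(?:\s*(?:±|–|-)\s*{NUMBER})?)"
    # ... and then one unit from a small illustrative list ("hours" before "h").
    rf"\s*(?P<unit>°C|μM|mg/kg|MPa|nA|hours|h)\b"
)

def extract_quantities(text):
    """Return (value, unit) pairs found in the text, in order of appearance."""
    return [(m.group("value"), m.group("unit")) for m in QUANTITY.finditer(text)]

sentence = ("Rheobase was 53–55 nA, compressive strength reached 12 ± 3 MPa, "
            "and the dose was 0.245 mg/kg.")
quantities = extract_quantities(sentence)
```

Numbers without a recognised unit are skipped here, which is one of the reasons the annotation task described on the next slides turns out to be harder than it looks.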
  • 31. Simple Annotation - Surprisingly Hard
  • 35. More and more detailed guidelines...
  • 36. And started annotating, a paragraph at a time...
  • 37. NLP Competitions -- Labs Online Lecture -- Thorne, Harper But the paragraphs got messy... 2020-12-07
  • 41. Some Inter-annotator Agreement Info (Krippendorff’s Alpha)
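
For readers unfamiliar with the measure, here is a minimal sketch of Krippendorff's alpha for nominal labels. The slide does not say which variant or tooling was used, so the nominal distance and the toy labels below are assumptions. Each unit is the list of labels that different annotators assigned to the same item; units with fewer than two labels are ignored.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: list of lists; each inner list holds the labels given to one item.
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an item with a single label cannot be paired
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n < 2:
        raise ValueError("need at least one item with two or more labels")
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b]
                   for a in totals for b in totals if a != b) / (n - 1)
    if expected == 0:
        return 1.0  # only one label in use; alpha is undefined, reported as 1.0 here
    return 1.0 - observed / expected

perfect = krippendorff_alpha_nominal([["Quantity", "Quantity"], ["Unit", "Unit"]])
chance = krippendorff_alpha_nominal([["Quantity", "Unit"], ["Quantity", "Quantity"]])
```

A value of 1 means full agreement and a value near 0 means agreement no better than chance.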
  • 42. Evaluation Output
  • 43. Evaluation Process
  • 44. Local Evaluation Micro Averages • Can run micro averages • Broken down: • By entity / relation class • By document (paragraph) • By class & document • Could add additional analysis
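
The breakdowns listed on this slide can be sketched as a micro-averaged precision/recall/F1 computed per grouping key. This is an illustrative sketch, not the competition's evaluation script: annotations are assumed to be exact-match tuples of (document, class, start, end), and the class names are hypothetical.

```python
from collections import defaultdict

def micro_prf(gold, pred, key=lambda item: "all"):
    """Micro-averaged (precision, recall, F1) per group, using exact matching."""
    counts = defaultdict(lambda: [0, 0, 0])  # true pos., false pos., false neg.
    for item in pred:
        counts[key(item)][0 if item in gold else 1] += 1
    for item in gold - pred:
        counts[key(item)][2] += 1
    result = {}
    for group, (tp, fp, fn) in counts.items():
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        result[group] = (p, r, f)
    return result

# Annotations as (document, class, start, end); values are made up.
gold = {("p1", "Quantity", 0, 5), ("p1", "Unit", 6, 9), ("p2", "Quantity", 3, 7)}
pred = {("p1", "Quantity", 0, 5), ("p1", "Unit", 6, 8), ("p2", "Quantity", 3, 7)}

overall = micro_prf(gold, pred)
by_class = micro_prf(gold, pred, key=lambda a: a[1])
by_document = micro_prf(gold, pred, key=lambda a: a[0])
by_class_and_document = micro_prf(gold, pred, key=lambda a: (a[1], a[0]))
```

Passing a different `key` function is all that is needed to add further analyses, for example grouping by annotator or by span length.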
  • 45. KG POPULATION & THE LONG TAIL
  • 46. LINKING & IDENTITY
    Towards Entity Spaces. Marieke van Erp (KNAW Humanities Cluster - DHLab, Amsterdam, NL; marieke.van.erp@dh.huc.knaw.nl), Paul Groth (University of Amsterdam, Amsterdam, NL; p.groth@uva.nl). Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 2129–2137, Marseille, 11–16 May 2020. © European Language Resources Association (ELRA), licensed under CC-BY-NC.

    Abstract: Entities are a central element of knowledge bases and are important input to many knowledge-centric tasks including text analysis. For example, they allow us to find documents relevant to a specific entity irrespective of the underlying syntactic expression within a document. However, the entities that are commonly represented in knowledge bases are often a simplification of what is truly being referred to in text. For example, in a knowledge base, we may have an entity for Germany as a country but not for the more fuzzy concept of Germany that covers notions of German Population, German Drivers, and the German Government. Inspired by recent advances in contextual word embeddings, we introduce the concept of entity spaces - specific representations of a set of associated entities with near-identity. Thus, these entity spaces provide a handle to an amorphous grouping of entities. We developed a proof-of-concept for English showing how, through the introduction of entity spaces in the form of disambiguation pages, the recall of entity linking can be improved.

    Keywords: entity, identity, knowledge representation, entity linking

    1. Introduction
    Entities are a central element for knowledge bases and text analysis tasks (Balog, 2018). However, the way in which entities are represented in knowledge bases and how subsequent tools use these representations are a simplification of the complexity of many entities. For example, the entity Germany in Wikidata as represented by wikidata:Q183 focuses on its properties as a location and geopolitical entity due to its membership as an instance of sovereign state, country, federal state, republic, social state, legal state, and administrative territorial entity. Similarly, in DBpedia (version 2016-10), Germany is represented as entity of type populated place and some subtypes such as yago:WikicatFederalCountries and yago:WikicatMemberStatesOfTheEuropeanUnion.[1] However, when the term Germany is used in text, it can take on many meanings that all have ‘something to do’ with Germany as it is represented in knowledge bases, but are all not quite the same:
    (1) Germany imported 47,600 sheep from Britain last year, nearly half of total imports.
    (2) German July car registrations up 14.2 pct yr / yr.
    (3) Australia last won the Davis Cup in 1986, but they were beaten finalists against Germany three years ago under Fraser’s guidance.
    In Example (1), Germany refers partly to the location, but a location usually cannot take on an active role, such that the entity ‘importing’ the sheep is most likely a referent to the German meat industry. Germany in Example (2), refers to the German population buying and registering more cars than a year before. Finally, in Example (3), Germany refers to the German Davis Cup team from 1993 (the news article is from 1996). In the AIDA-YAGO dataset, this entity is tagged as dbp:Germany_Davis_Cup_team but this presents us with another layer of identities, namely that every year, or every couple of years, the German Davis cup team consists of different players. In 1993, the German Davis cup team consisted of Michael Stich and Marc-Kevin Goellner, in 1996 of David Prinosil and Hendrik Dreekmann and at the time of writing this article in 2019 of Alexander Zverev and Philipp Kohlschreiber.

    Both MAG (Moussallem et al., 2017) and DBpedia spotlight (Daiber et al., 2013a) annotate Australia and Germany in Example (3) as dbp:Australia and dbp:Germany respectively. While both the annotations and automatic linkages are close to the identity of the entity in resolving these referents to dbp:Germany, we argue this is an underspecification and highlights a larger problem with identity representation in knowledge bases.

    Collapsing of identities has been a frequent topic within Semantic Web discourse. However, most discussions have focused on issues with owl:sameAs links (McCusker and McGuinness, 2010; Raad et al., 2018). However, the problem of simplified entity representations (e.g. the collapsing of identities) also occurs before the creation of such owl:sameAs links. Specifically, with the fact that most knowledge bases represent a single or limited number of an entity’s facets. In this paper, we analyse the extent of the problem by connecting Semantic Web representations of identity to linguistic representations of entities, namely coreference and near-identity. To overcome this identity problem, we argue for the introduction of explicit representations of near-identity within knowledge bases. We term these explicit representations - entity spaces. We illustrate how the introduction of entity spaces can boost the performance of state-of-the-art entity linking pipelines.

    Our contributions are: 1) the definition of entity spaces; 2) a prototype showing the use of entity spaces over multiple entity linking pipelines; and 3) experiments on 13 English entity linking datasets showing the impact of a more tolerant approach to entity linking made possible through entity spaces. Our code and experimental results are available via https://github.com/MvanErp/entity-spaces.

    [1] Germany also has rdf:type dbo:Person but we assume this is a glitch.

    A good question: Thinking about an answer:
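
The "more tolerant approach to entity linking" mentioned in the excerpt can be sketched as follows. This is not the code from the linked repository: the function, the mention identifiers, and the toy entity space are hypothetical. A predicted link counts as a hit when it equals the gold entity or, in tolerant mode, when both belong to the same entity space.

```python
def linking_recall(gold_links, predicted_links, entity_spaces=None):
    """Recall of entity links; with entity_spaces, near-identical entities also match."""
    spaces_of = {}
    for name, members in (entity_spaces or {}).items():
        for entity in members:
            spaces_of.setdefault(entity, set()).add(name)
    hits = 0
    for mention, gold in gold_links.items():
        pred = predicted_links.get(mention)
        if pred is None:
            continue  # the linker produced nothing for this mention
        same_space = spaces_of.get(pred, set()) & spaces_of.get(gold, set())
        if pred == gold or same_space:
            hits += 1
    return hits / len(gold_links) if gold_links else 0.0

# Hypothetical gold annotations and linker output for two mentions.
gold = {"m1": "dbp:Germany_Davis_Cup_team", "m2": "dbp:Britain"}
predicted = {"m1": "dbp:Germany", "m2": "dbp:Britain"}
spaces = {"Germany": {"dbp:Germany", "dbp:Germany_Davis_Cup_team"}}

strict = linking_recall(gold, predicted)
tolerant = linking_recall(gold, predicted, spaces)
```

With these toy inputs strict matching finds one of the two links, while tolerant matching also accepts dbp:Germany for the Davis Cup team because both sit in the same space.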
  • 47. Situated Dialogue • Computer-mediated dialogue is omnipresent • Key observation: dialogue is situated. It is not only about the “back-and-forth” between parties but also about the larger environment (e.g. documents, concepts, projects, world knowledge). • Not just chat
  • 48. Complex Environments (e.g. Standards Organizations) in-sight-it.github.io
  • 49. conversationkg - kg extraction from dialogue
  • 50. Concept1 Concept2 Concept3 KOS Professional Curators Literature Software Non-professional contributors Data ⚐ Society & Politics (4, 5, 6) (7, 8, 9) (3) (1, 2)
    1. dealing with changing cultural and societal norms, specifically to address or correct bias
    2. political influence
    3. new concepts and terminology arising from discoveries or change in perspective within a technical/scientific community
    4. gardening
    5. incremental contributorship
    6. progressive formalization
    7. software and automation
    8. integration of large numbers of data sources
    9. variance in algorithm training data
    Source: Michael Lauruhn and Paul Groth. “Sources of Change for Modern Knowledge Organization Systems.” Knowledge Organization 43, no. 8 (2016).
  • 51. Knowledge Engineering Revisited • Knowledge graphs are built ad hoc • 100s of components (extractors, scrapers, quality, scoring, user feedback, ...) • Unique for each organization • Existing knowledge engineering theory does not apply: • Assumes small scale • Assumes slow change • People-centric • Expressive representations • Need: an updated theory and methods for knowledge engineering designed for the demands of modern knowledge graphs
  • 53. Conclusion • Knowledge graphs require maintenance • Maintenance is frequently people work • Need for new ML-based methods & new human + machine workflows • Interested? Happy to talk more Paul Groth | @pgroth | pgroth.com | indelab.org