Knowledge Graph Curation:
A Practical Framework
Elwin Huaman and Dieter Fensel
Semantic Technology Institute (STI) Innsbruck
Department of Computer Science,
University of Innsbruck, Austria
IJCKG 2021
Elwin Huaman | IJCKG 2021 | 08/12/2021
Outline
● What?
Basics - Research questions
● How?
Approach - Solution
● Why?
Motivation
What?
Basics - Research questions
Elwin Huaman | IJCKG 2021 | 23/11/2021
What are Knowledge Graphs (KGs)?
Over the last decade, the creation and especially the maintenance of large KGs have gained attention.
Which KG is best for me?
What about their:
● Quality
● Correctness
● Completeness
What is Knowledge Graph Curation?
[Fensel et al., 2020]
Knowledge graph curation is one part of the knowledge graph lifecycle.
How to curate KGs?
● How to assess their quality?
● How to improve their correctness?
● How to improve their completeness?
How?
Approach - Solution
How to Curate Knowledge Graphs?
[Fensel et al., 2020]
The first step in curating KGs is to evaluate their quality.
How to assess KG quality?
1. Accessibility
2. Accuracy
3. Appropriate amount
4. Believability
5. Completeness
6. Concise representation
7. Consistent representation
8. Cost-effectiveness
9. Ease of manipulation
10. Ease of operation
11. Ease of understanding
12. Free-of-error
13. Interoperability
14. Objectivity
15. Relevancy
16. Reputation
17. Security
18. Timeliness
19. Traceability
20. Variety
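The slides name the dimensions but do not say how per-dimension results are combined. As a minimal sketch, assuming each dimension has already been scored in [0, 1] and that a use case supplies hypothetical weights (the weighted mean below is our assumption, not necessarily the framework's formula), an overall score could be aggregated like this:

```python
from typing import Dict


def weighted_quality_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-dimension scores (each in [0, 1]) into a weighted mean.

    Dimensions with a non-positive weight or without a score are ignored,
    so the result also lies in [0, 1].
    """
    used = {d: w for d, w in weights.items() if w > 0 and d in scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no positively weighted dimension has a score")
    for d in used:
        if not 0.0 <= scores[d] <= 1.0:
            raise ValueError(f"score for {d!r} must be in [0, 1]")
    return sum(scores[d] * w for d, w in used.items()) / total_weight


# Hypothetical scores for one KG and hypothetical weights for one use case.
scores = {"Accuracy": 0.9, "Completeness": 0.6, "Timeliness": 0.5}
weights = {"Accuracy": 2.0, "Completeness": 1.0, "Timeliness": 1.0}
print(weighted_quality_score(scores, weights))
```

Changing only the weights lets the same measurements rank KGs differently for different use cases, which is one way to approach the "which KG is best for me?" question.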
How to Curate Knowledge Graphs?
[Fensel et al., 2020]
The cleaning task aims to improve the correctness of KGs.
How to improve KG correctness?
❏ Detecting errors
❏ Correcting errors
● Verification
○ Check schema conformance and integrity constraints.
■ RDFUnit, SHACL, ShEx, SPIN, Stardog ICV, ...
● Validation
○ Compare with the "real" world, a.k.a. fact checking.
■ COPAAL, DeFacto, FactCheck, FacTify, Leopard, Surface, Tracy
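To make the difference between verification and validation concrete, here is a minimal plain-Python sketch over a KG stored as a set of (subject, predicate, object) tuples. It is not SHACL, ShEx, or any of the tools listed; the constraint format (a per-property maximum count plus a value parser), the `ex:` identifiers, and the trusted reference set are hypothetical. Verification checks the KG only against constraints, while validation compares it with an external source.

```python
from datetime import date
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def verify(kg: Set[Triple],
           max_count: Dict[str, int],
           parsers: Dict[str, Callable[[str], object]]) -> List[str]:
    """Verification: report violations of (hypothetical) integrity constraints.

    max_count maps a property to the maximum number of values per subject;
    parsers maps a property to a function that raises ValueError on bad values.
    """
    violations: List[str] = []
    counts: Dict[Tuple[str, str], int] = {}
    for s, p, o in sorted(kg):
        counts[(s, p)] = counts.get((s, p), 0) + 1
        parse = parsers.get(p)
        if parse is not None:
            try:
                parse(o)
            except ValueError:
                violations.append(f"{s} {p}: malformed value {o!r}")
    for (s, p), n in sorted(counts.items()):
        if p in max_count and n > max_count[p]:
            violations.append(f"{s} {p}: {n} values, at most {max_count[p]} allowed")
    return violations


def validate(kg: Set[Triple], reference: Set[Triple], checked: Set[str]) -> List[Triple]:
    """Validation: return triples of the checked properties that a trusted
    reference source does not confirm (a crude stand-in for fact checking)."""
    return sorted(t for t in kg if t[1] in checked and t not in reference)


kg = {
    ("ex:alice", "ex:birthDate", "1990-05-17"),
    ("ex:alice", "ex:birthDate", "17/05/1990"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
}
reference = {
    ("ex:alice", "ex:birthDate", "1990-05-17"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
}
print(verify(kg, {"ex:birthDate": 1}, {"ex:birthDate": date.fromisoformat}))
print(validate(kg, reference, {"ex:birthDate", "ex:worksFor"}))
```

In this toy KG, verification reports the malformed date and the second birth date (cardinality violation), while validation flags only the date the reference does not confirm.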
How to Curate Knowledge Graphs?
[Fensel et al., 2020]
The enrichment task aims to improve the completeness of KGs.
How to improve KG completeness?
❏ Finding relevant KGs
❏ Duplicate detection
❏ Entity fusion
● Duplicate detection
○ Identifying duplicates of the same entity within a single KG or across several KGs.
■ ADEL, DDaaS, Dedupe, DuDe, Duke, Legato, LIMES, SERIMI, Silk, …
● Entity fusion
○ Resolving conflicting property value assertions.
■ FAGI, Sieve, SLIPO Toolkit, …
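A minimal sketch of both enrichment steps, assuming entities are dictionaries of property values with a `label` key and that entity IDs such as `kg1:e1` are hypothetical. Duplicate detection here compares labels with Python's `difflib` similarity ratio against an arbitrary threshold of 0.85, and fusion resolves conflicts by majority vote; both are simple illustrative choices, not the strategies of the tools listed above.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple

Entity = Dict[str, str]  # property -> value; "label" is used for matching


def label_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_duplicates(entities: Dict[str, Entity],
                    threshold: float = 0.85) -> List[Tuple[str, str]]:
    """Duplicate detection: pairs of entity IDs whose labels are similar enough.

    Every pair is compared (quadratic cost); larger-scale tools typically
    reduce the candidate pairs with blocking or indexing first.
    """
    return [(i, j) for i, j in combinations(sorted(entities), 2)
            if label_similarity(entities[i]["label"], entities[j]["label"]) >= threshold]


def fuse(records: List[Entity]) -> Entity:
    """Entity fusion: for each property keep the most frequent value;
    ties go to the value encountered first."""
    fused: Entity = {}
    for prop in sorted({p for r in records for p in r}):
        values = [r[prop] for r in records if prop in r]
        fused[prop] = Counter(values).most_common(1)[0][0]
    return fused


entities = {
    "kg1:e1": {"label": "Innsbruck", "country": "Austria"},
    "kg2:e7": {"label": "innsbruck", "country": "Austria", "state": "Tyrol"},
    "kg2:e9": {"label": "Salzburg", "country": "Austria"},
}
pairs = find_duplicates(entities)
print(pairs)
# Fuse the matched pair with a conflicting record from a hypothetical third source.
conflicting = {"label": "Innsbruck", "country": "Germany"}
print(fuse([entities[i] for pair in pairs for i in pair] + [conflicting]))
```

Majority vote is only one possible conflict-resolution strategy; others could prefer the most recent value or the most trusted source.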
How to Curate Knowledge Graphs?
[Fensel et al., 2020]
Knowledge Graph Curation Framework
[Figure. Components: Assessing KGs (quality metrics, weights); Mapping & Indexing KGs; Domain Specification; Verifier (constraints); Validator (instance validation, triple validation, validation strategies, configuration, learning configuration); Instance Matching; Entity Fusion (fusion strategies). Reports kept in datastores: Assessment Report, Verification Report, Validation Report, Duplicates Report, Fusion Report.]
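The figure suggests a staged workflow in which each component writes a report to a datastore. A minimal sketch of that orchestration, assuming a KG held as a set of triples and using hypothetical stand-in stages (a triple count for assessment, an empty-value check for verification, and a fixed triple from an imaginary external KG for enrichment); in practice each stage would wrap a dedicated tool:

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Report = Dict[str, object]
Stage = Callable[[Set[Triple]], Tuple[Set[Triple], Report]]


def run_curation(kg: Set[Triple],
                 stages: List[Tuple[str, Stage]]) -> Tuple[Set[Triple], Dict[str, Report]]:
    """Apply stages in order; each returns a (possibly modified) KG and a report,
    which is stored in an in-memory 'datastore' keyed by stage name."""
    datastore: Dict[str, Report] = {}
    for name, stage in stages:
        kg, report = stage(kg)
        datastore[name] = report
    return kg, datastore


# Hypothetical stand-in stages.
def assess(kg: Set[Triple]) -> Tuple[Set[Triple], Report]:
    return kg, {"triples": len(kg)}


def verify(kg: Set[Triple]) -> Tuple[Set[Triple], Report]:
    # Toy constraint: an object value must not be empty; violators are removed.
    bad = {t for t in kg if not t[2]}
    return kg - bad, {"removed": sorted(bad)}


def enrich(kg: Set[Triple]) -> Tuple[Set[Triple], Report]:
    # Toy enrichment: add facts from an imaginary external KG if missing.
    external = {("ex:innsbruck", "ex:country", "ex:austria")}
    added = external - kg
    return kg | added, {"added": sorted(added)}


kg = {("ex:innsbruck", "rdfs:label", "Innsbruck"), ("ex:innsbruck", "ex:mayor", "")}
curated, reports = run_curation(
    kg, [("assessment", assess), ("verification", verify), ("enrichment", enrich)])
print(sorted(curated))
print(reports)
```

Keeping one report per stage lets a curator trace what each step changed, which also helps with reproducibility.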
Why?
Motivation
Why Curate Knowledge Graphs?
Are they needed?
World Wide Web
E.g. Google as a Query Answering Engine
Virtual intelligent agents
E.g. Bots
Physical intelligent agents
E.g. Autonomous cars
In May 2016, Joshua Brown was killed when his car's autopilot mistook a very long vehicle (large wheelbase) for a traffic sign.
This is what the autopilot “saw”.
Why not connect the car to a Knowledge Graph containing traffic data, which would simply know that there is no traffic sign at that location?
Insights & Limitations
❏ Assessment
❏ Automation
❏ Cost-effectiveness
❏ Dynamic data
❏ Prevention
❏ Reproducibility
❏ Reusability
❏ User-in-the-loop
❏ Scalability
Summary
● A practical framework
○ Assessment
■ Quality Dimensions
○ Cleaning
■ Verification
■ Validation
○ Enrichment
■ Duplicate Detection
■ Entity Fusion
● Insights and limitations
Thank you!
@ElwinHuaman