Towards Knowledge Graphs Validation
through Weighted Knowledge Sources
Elwin Huaman, Amar Tauqeer, and Anna Fensel
Semantic Technology Institute (STI) Innsbruck
Department of Computer Science,
University of Innsbruck, Austria
KGSWC 2021 Next Generation
Elwin Huaman | KGSWC2021 | 23/11/2021
Outline
● What?
Basics - Research questions
● How?
Approach - Solution
● Why?
Use cases
What?
Basics - Research questions
What?
Weighted knowledge sources are data sources that have different weights (or degree of
importance) for different application scenarios.
Weighted Knowledge Sources
Which KG is best for me?
● Quality - fitness for use
● Whether data complies with the user's needs
● Dependent on tasks
● … and more statements can be added
● Forming a graph
● Graphs can be created independently
● … and can be integrated
● We can add more statements
What?
Knowledge Graphs are very large semantic nets that integrate various and heterogeneous
information sources to represent knowledge about certain domains of discourse.
Knowledge Graphs
Figure: example knowledge graph (prefix : <http://example.org/>). Entities: :anna, :carol, :luis, :cs101, :cs102, :Puno. Literals: Anna, 21, Carol, Luis, None, Programming, Algebra, Puno. Relationships: :name, :age, :knows, :enrolledIn, :subject, :birthPlace, and a sameAs link. Legend: Entity, Literal, Relationship, sameAs relationship.
● Basic statement (or triple)
What?
The Knowledge Graphs Validation task aims at measuring whether statements from KGs are
semantically correct and correspond to the so-called "real" world.
The University of Innsbruck is located in the city of Innsbruck
A simple statement or triple.
A triple = (subject, predicate, object)
Knowledge Graphs Validation
A triple: (University of Innsbruck, is located in, City of Innsbruck)
An RDF triple: <http://example.com/University_of_Innsbruck> <http://schema.org/containedInPlace> <http://example.com/Innsbruck>
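As a minimal sketch of the (subject, predicate, object) structure, the statement about the University of Innsbruck can be held in a small Python type and written out as one N-Triples line. The `Triple` class and its `to_ntriples` method are illustrative names, not part of the Validator.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A statement as (subject, predicate, object); all three are IRIs here."""
    subject: str
    predicate: str
    object: str

    def to_ntriples(self) -> str:
        # N-Triples wraps each IRI in angle brackets and ends the line with " ."
        return f"<{self.subject}> <{self.predicate}> <{self.object}> ."


statement = Triple(
    subject="http://example.com/University_of_Innsbruck",
    predicate="http://schema.org/containedInPlace",
    object="http://example.com/Innsbruck",
)
print(statement.to_ntriples())
```

This sketch only covers IRI objects; a literal object (such as a name or an age) would need quoting rather than angle brackets.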
What?
prefix : <http://example.org/>
prefix e: <http://example.com/>
prefix so: <http://schema.org/>
● Wrong instance assertion
E.g. :anna is a Person, not a Product
What needs to be fixed?
Figure: the example graph annotated with errors. Entities: :anna, :carol, :cs101, :Puno, e:luis, e:Puno. Types: so:Course, so:Product, so:Place. Literals: Anna, 21, Carol, Luis, None, Programming, Puno, cs101. Relationships: so:knows, so:teaches, so:name, so:age, so:birthPlace, so:courseCode, and a sameAs link. Legend: Type, Entity, Literal, Relationship, sameAs relationship.
● Wrong property value assertion
E.g. so:knows is semantically wrong
● Wrong equality assertion
E.g. :Puno and e:Puno are related, but not
the same
● …
Knowledge Graphs Validation
Compute a confidence score for every triple (or statement) and instance in KGs. The computed score is
based on finding the same instances across different weighted knowledge sources and comparing their
features.
What?
Towards Knowledge Graphs Validation through Weighted
Knowledge Sources
Figure: the Validator takes the KG, the reliable KGs, and the weights as input and outputs a score (labelled [0.1] on the slide).
How?
Approach - Solution
Input: The user has two options: (a) provide a SPARQL endpoint from which to fetch the data, or (b)
load a dataset in Turtle format.
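For option (a), a hedged sketch of how a query could be prepared for a SPARQL endpoint over HTTP GET, using only the Python standard library. The endpoint URL, the query, and the function name `build_sparql_request` are assumptions for illustration; the request is built but not sent. Option (b) would need an RDF parser for Turtle, which is not shown here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and query, for illustration only.
ENDPOINT = "http://example.org/sparql"
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL query request via HTTP GET."""
    # The SPARQL protocol passes the query text in the "query" parameter.
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for the JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would actually fetch the data; that step is left out so the sketch runs without network access.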
How?
Towards Knowledge Graphs Validation through Weighted Knowledge Sources
Figure: Validator pipeline with the input KG and the reliable KGs.
Mapping: The validator maps the input KG and the external sources to a common format.
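One way to picture the mapping step is as a per-source renaming of properties onto shared keys. This is a sketch under assumptions: the source names, property names, and the `to_common_format` helper are hypothetical and do not describe the Validator's actual mapping rules.

```python
# Hypothetical per-source property mappings onto shared keys.
PROPERTY_MAP = {
    "user_kg": {"schema:name": "name", "schema:telephone": "phone"},
    "source_a": {"label": "name", "phoneNumber": "phone"},
}


def to_common_format(source: str, record: dict) -> dict:
    """Rename a record's properties to the shared keys; drop unmapped ones."""
    mapping = PROPERTY_MAP[source]
    return {mapping[prop]: value for prop, value in record.items() if prop in mapping}


raw = {"label": "Hotel Alpha", "phoneNumber": "+43 000", "stars": 4}
print(to_common_format("source_a", raw))
```

Once every source uses the same keys, values for one property can be compared directly across sources.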
How?
Towards Knowledge Graphs Validation through Weighted Knowledge Sources
Figure: Validator pipeline with the input KG, the reliable KGs, and the Mapping step (DS).
Instance Matching: The Validator asks the user to define at least two properties (e.g., name and geo
coordinates) to be used in the instance matching process.
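A minimal sketch of matching on the two example properties, name and geo coordinates. The similarity measure (`difflib.SequenceMatcher`), the haversine distance, the thresholds, and the sample records are all assumptions chosen for illustration, not the Validator's actual matching logic.

```python
import math
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (haversine) distance in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlam = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def is_match(a: dict, b: dict, min_name_sim: float = 0.8, max_km: float = 0.5) -> bool:
    """Records match only if BOTH properties agree (thresholds are assumptions)."""
    close = distance_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km
    return close and name_similarity(a["name"], b["name"]) >= min_name_sim


# Hypothetical records: one from the user's KG, two candidates from a source.
kg_hotel = {"name": "Hotel Alpha", "lat": 47.2692, "lon": 11.4041}
candidates = [
    {"name": "hotel alpha", "lat": 47.2693, "lon": 11.4040},  # same place
    {"name": "Hotel Alpha", "lat": 48.2082, "lon": 16.3738},  # same name, far away
]
print([is_match(kg_hotel, c) for c in candidates])
```

Requiring both properties to agree is what keeps the second candidate out: the name alone would match, but the coordinates are hundreds of kilometres apart.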
How?
Towards Knowledge Graphs Validation through Weighted Knowledge Sources
Figure: Validator pipeline with the input KG, the reliable KGs, and the Mapping (DS) and Instance matching steps.
Confidence Measurement / Triple validation: Calculates a confidence score indicating whether the property value
in the user’s KG matches the corresponding property value in various external sources.
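The idea of a weighted triple score can be sketched as a weighted average of value similarities across sources. The formula, the string similarity measure, the source names, and the weights below are assumptions made for this example; the slides do not spell out the exact computation.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def triple_confidence(kg_value: str, source_values: dict, weights: dict) -> float:
    """Weighted average of value similarities over the sources that have a value."""
    total = sum(weights[source] for source in source_values)
    if total == 0:
        return 0.0
    weighted = sum(
        weights[source] * similarity(kg_value, value)
        for source, value in source_values.items()
    )
    return weighted / total


# Hypothetical weights and values for the property "name" of one instance.
weights = {"source_a": 0.75, "source_b": 0.25}
source_values = {"source_a": "hotel alpha", "source_b": "ZZZ"}
print(triple_confidence("Hotel Alpha", source_values, weights))
```

With these inputs the highly weighted source agrees and the lightly weighted one does not, so the score lands close to the weight of the agreeing source.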
How?
Towards Knowledge Graphs Validation through Weighted Knowledge Sources
Figure: Validator pipeline with the input KG, the reliable KGs, Mapping (DS), Instance matching, and Confidence Measurement (Triple validation, Weights), with an output score labelled [0.1].
Confidence Measurement / Instance validation: Computes the aggregated score from the attribute
space of an instance.
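A sketch of one possible aggregation: the instance score as a (optionally weighted) mean of the per-property triple scores. The choice of a mean, the property names, and the function name `instance_confidence` are assumptions for illustration.

```python
from typing import Optional


def instance_confidence(triple_scores: dict, property_weights: Optional[dict] = None) -> float:
    """Aggregate per-property triple scores into one instance score (weighted mean)."""
    if not triple_scores:
        return 0.0
    # Without explicit weights, every property counts equally.
    weights = property_weights or {prop: 1.0 for prop in triple_scores}
    total = sum(weights[prop] for prop in triple_scores)
    if total == 0:
        return 0.0
    return sum(weights[prop] * score for prop, score in triple_scores.items()) / total


# Hypothetical triple scores for one hotel instance.
scores = {"name": 1.0, "address": 0.5, "phone": 0.0}
print(instance_confidence(scores))
print(instance_confidence(scores, {"name": 2.0, "address": 1.0, "phone": 1.0}))
```

Property weights let a user say, for example, that a correct name matters more than a correct phone number for their application.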
How?
Towards Knowledge Graphs Validation through Weighted Knowledge Sources
Figure: Validator pipeline with the input KG, the reliable KGs, Mapping (DS), Instance matching, and Confidence Measurement (Triple validation, Instance validation, Weights), with output scores labelled [0.1].
Output: The computed scores for triples and instances are shown in a graphical user interface.
How?
Towards Knowledge Graphs Validation through Weighted Knowledge Sources
Figure: complete Validator pipeline: input KG and reliable KGs, Mapping (DS), Instance matching, and Confidence Measurement (Triple validation, Instance validation, Weights), with output scores labelled [0.1].
Evaluation I:
● Dataset: A subset of the Tirol Knowledge
Graph (~15 billion statements).
○ 50 Hotel instances
● Baseline: We performed a manual validation.
○ Precision, Recall, and F-measure
● Result: F-measure of at least 75% on
address, name, and phone properties.
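For reference, precision, recall, and F-measure against a manual baseline can be computed as below. The instance identifiers are made up for this sketch and are unrelated to the reported results.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple:
    """Return (precision, recall, F-measure) of predicted items against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: values marked correct by the tool vs. by manual validation.
tool_correct = {"h1", "h2", "h5"}
manual_correct = {"h1", "h2", "h3", "h4"}
precision, recall, f1 = precision_recall_f1(tool_correct, manual_correct)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Here two of the three tool decisions agree with the manual baseline (precision 2/3), and two of the four manually confirmed values are found (recall 1/2).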
How?
Towards Knowledge Graphs Validation through Weighted Knowledge Sources
Figure: Comparison of precision, recall, and F-measure scores over the manual and semi-automatic validation.
Evaluation II:
● Dataset: Pantheon dataset (11,341 famous biographies)
○ 2530 politician instances
● Setup: We defined two external sources.
○ Wikidata and DBpedia.
● Result: ~15 minutes.
○ Overall recall scores are
■ 0.36% (DBpedia)
■ 0.49% (Wikidata)
How?
Towards Knowledge Graphs Validation through Weighted Knowledge Sources
Figure: The recall score results of the validation of politician instances.
Why?
Use cases
Why?
Towards Knowledge Graphs Validation through Weighted Knowledge Sources
Use cases:
Semantic correctness of a triple.
E.g. validating whether the displayed data about a person or business is correct,
based on different sources.
Linking different knowledge sources.
E.g. linking an instance of the user’s KG with the matched
instance in Wikidata.
Validating static data.
E.g. checking whether the addresses of hotels are
up to date and correctly shown by external sources.
Insights & Limitations
❏ Assessment
❏ Automation
❏ Cost-effectiveness
❏ Dynamic-data
❏ Scalability
Summary
● A Validation framework
○ Mapping
○ Instance Matching
○ Confidence Measurement
■ Triple validation
■ Instance Validation
○ GUI
● Use cases
● Insights and limitations
Acknowledgement
Univ.-Prof. Dr. Fensel Dieter
Assoc.-Prof. Dr. Fensel Anna
Tauqeer Amar M.Sc.
MindLab (mindlab.ai)
WordLiftNG (wordlift.io/ng/)
STI
Projects
Next Generation
