Knowledge graphs have been conceived to collect heterogeneous data and knowledge about large domains, e.g. medical or engineering domains, and to allow versatile access to such collections by means of querying and logical reasoning. A surge of methods has responded to additional requirements in recent years. (i) Knowledge graph embeddings use similarity and analogy of structures to speculatively add to the collected data and knowledge. (ii) Queries with shapes and schema information can be typed to provide certainty about results. We survey both developments and find that the development of techniques happens in disjoint communities that mostly do not understand each other, thus limiting the proper and most versatile use of knowledge graphs.
I claim that none of the commonly used embedding methods capture any semantics.
It's fine if you want to move from a symbolic to a numeric or geometric representation, but when you do, don't throw the semantic baby out with the symbolic bathwater.
I argue that a useful definition of semantics is "predictable inference". This makes it possible to have semantics outside a logical framework.
A methodological warning from 1976: don't fool yourself that wishful mnemonics in your knowledge graph are "semantics". A knowledge graph without a schema/ontology is therefore just a data graph, without much semantics.
Finally, a discussion of some embedding methods that do manage to take semantics into account (TransOWL, ball embeddings like ELEm and EmEL++, and box embeddings like BoxEL and Box^2EL).
So: even if you do move to a non-symbolic representation (numerical, geometric), make sure you keep the semantics: don't throw the semantic baby out with the symbolic bathwater.
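To make the contrast concrete, here is a minimal sketch (toy 2-D vectors, not any published model's parameters) of a translational embedding score, which by itself captures no logical semantics, next to a box-embedding containment test, whose geometry directly mirrors concept subsumption (C ⊑ D):

```python
import math

# TransE-style scoring: a triple (h, r, t) is deemed plausible
# when the translated head h + r lies close to the tail t.
def transe_score(h, r, t):
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Box embedding: each concept is a box [lo, hi] per dimension;
# C is subsumed by D when C's box lies entirely inside D's box,
# so the geometry itself encodes the entailment C ⊑ D.
def box_subsumed(c_lo, c_hi, d_lo, d_hi):
    return all(dl <= cl and ch <= dh
               for cl, ch, dl, dh in zip(c_lo, c_hi, d_lo, d_hi))

# Toy example (vectors invented for illustration):
h, r, t = [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]
print(transe_score(h, r, t))                          # 0.0: triple fits perfectly
print(box_subsumed([1, 1], [2, 2], [0, 0], [3, 3]))   # True: inner box inside outer
```

The point of the contrast: a low TransE score only says "this triple resembles the training data", while box containment is a geometric statement that can be read as a predictable inference.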
The vector space model, or term vector model, is an algebraic model for representing text documents as vectors of identifiers, such as index terms. It is used in information filtering, information retrieval, indexing and relevancy ranking. Its first use was in the SMART Information Retrieval System.
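A minimal sketch of the idea, assuming simple whitespace tokenization and raw term counts (real systems usually weight terms, e.g. with tf-idf):

```python
from collections import Counter
import math

# Represent each document as a vector of index-term counts,
# then compare documents by the cosine of the angle between the vectors.
def term_vector(text):
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = lambda w: math.sqrt(sum(c * c for c in w.values()))
    return dot / (norm(u) * norm(v)) if dot else 0.0

d1 = term_vector("information retrieval and information filtering")
d2 = term_vector("retrieval of information")
print(round(cosine(d1, d2), 3))   # similarity in (0, 1]: shared terms raise the score
```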
Making Decisions - From Software Architecture Theory to Practice, by Paris Avgeriou
Keynote at the 20th IEEE International Conference on Software Architecture (ICSA 2023).
Abstract: Around the mid-2000s we had a ‘lightbulb moment’: architecting is more about making design decisions than drawing boxes and lines with UML or Architecture Description Languages. The enthusiasm of shifting the architecting paradigm was followed by a frenzy of research work in ontologies, methods and tools to harness Architecture Knowledge and empower architects in their decision making. Are we there yet? In this talk, I will revisit the state of the art and give my personal account of what worked, what failed, and how to move forward, especially in order to actually impact industry practice.
This chapter covers social media analytics, social network analysis, text analytics, stopwords, tokenization, n-grams, trend analytics, TF-IDF, and stemming and lemmatization.
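As a taste of the chapter's topics, here is a hedged sketch of tf-idf over a toy corpus (the three documents are invented for illustration):

```python
import math

# tf-idf: weight a term by its frequency in a document, discounted by
# how many documents in the corpus contain it, so corpus-wide terms score low.
def tokenize(text):
    return text.lower().split()

def tf_idf(term, doc_tokens, corpus_tokens):
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for d in corpus_tokens if term in d)
    idf = math.log(len(corpus_tokens) / df) if df else 0.0
    return tf * idf

corpus = [tokenize(t) for t in
          ["social media analytics", "social network analysis", "text analytics"]]
print(tf_idf("social", corpus[0], corpus))   # lower: appears in 2 of 3 docs
print(tf_idf("media", corpus[0], corpus))    # higher: appears in only 1 doc
```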
Keynote presentation from the ECBS conference. The talk is about how to use machine learning and AI to improve software engineering, drawing on experiences from our project in Software Center (www.software-center.se).
Artificial intelligence and machine learning capabilities are growing at an unprecedented rate. These technologies have many widely beneficial applications, ranging from machine translation to medical image analysis. Countless more such applications are being developed and can be expected over the long term. Less attention has historically been paid to the ways in which artificial intelligence can be used maliciously. This report surveys the landscape of potential security threats from malicious uses of artificial intelligence technologies, and proposes ways to better forecast, prevent, and mitigate these threats. We analyze, but do not conclusively resolve, the question of what the long-term equilibrium between attackers and defenders will be. We focus instead on what sorts of attacks we are likely to see soon if adequate defenses are not developed.
Research about artificial intelligence (A.I.), by Alị Ŕỉźvị
These slides contain no extraordinary textual information, but they are very creative: animations relevant to the topic were added to keep the audience from getting bored.
History of AI, Current Trends, Prospective Trajectories, by Giovanni Sileno
Talk given at the 2nd Winter Academy on Artificial Intelligence and International Law of the Asser Institute. The birth of AI: the Dartmouth workshop. The biggest AI waves: classic symbolic AI (reasoning, knowledge systems, problem-solving) and machine learning (induction). Current problems: explainability, trustworthiness, impact on and transformation of society and people, and the rise of artificially dumber systems.
The training content covers:
- Basics of Artificial Intelligence
- Penetration of AI in our daily lives
- A few examples and use cases
- A brief look at what the future with AI looks like
Slides from HR Talks on Future of work: AI vs. Human.
Organized by HR Hub in Bucharest, on 23 Jan 2017.
Topics discussed:
* Automation
* AI
* Impact on HR
Past, Present and Future of Generative AI, by abhishek36461
Generative AI creates new content (images, text, music) based on learned patterns.
It learns from vast examples and can produce original, unseen works.
Capable of blending learned elements to generate unique outputs.
Can produce customized creations based on specific prompts.
Improves and refines its output over time with more data and feedback.
Technology for everyone - AI ethics and Bias, by Marion Mulder
Slides from my talk at #ToonTechTalks on 27 September 2018.
We all see the great potential AI is bringing us. But is it really bringing it to everyone? How do we ensure under-represented groups are included and vulnerable people are protected? What do we do when our technology is unintentionally biased and discriminates against certain groups? And what if the data and AI are correct, but the side effect is that some groups are put at risk? These are all questions we need to think about as we advance technology for the benefit of humanity.
Sharing what I've learned from my work in diversity, digital and from following great minds in this field such as Joanna Bryson, Virginia Dignum, Rumman Chowdhury, Juriaan van Diggelen, Valerie Frissen, Catelijne Muller, and many more.
Introduction to artificial intelligence
Artificial intelligence (AI) is intelligence demonstrated by machines, unlike the natural intelligence displayed by humans and animals, which involves consciousness and emotionality. The distinction between the former and the latter categories is often revealed by the acronym chosen. 'Strong' AI is usually labelled as AGI (Artificial General Intelligence), while attempts to emulate 'natural' intelligence have been called ABI (Artificial Biological Intelligence). Leading AI textbooks define the field as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals.[3] Colloquially, the term "artificial intelligence" is often used to describe machines (or computers) that mimic "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving".
Vulnerability in AI
1- Introduction to AI
2- Vulnerability
3- The impact of AI on vulnerability management
4- Use of AI in cybersecurity
5- Vulnerability Management
6- Conclusion
AI Workshops at Computers In Libraries 2024, by Brian Pichman
While AI holds tremendous potential for libraries, it also comes with significant concerns and the potential for harm. We find ourselves sailing uncertain waters; there are few guardrails governing AI's use. Even as we acknowledge this truth, we must also note that library staff are already experimenting with the use of AI chatbots (most commonly ChatGPT), generative AI design tools (like Midjourney), and other variations of AI technology. In short, we have great potential, pitfalls, and a total lack of clarity. It is only through the thoughtful development of policy, procedure, and professionals that we can hope to articulate a vision for the ethical use of AI in our libraries. Join this conversation about new disruptive technology, take a deep breath, and get to work laying a foundation of policy guidelines and staff development to navigate the uncertain road ahead.
This interactive and hands-on workshop allows you to play and experiment with new tools which will spark ideas for the future of your library and community activities. It focuses on OpenAI’s API and how to get started building personalities in AI. It explores various tools to create AI images, videos, and more. Filled with tips, it will definitely be fun!
Workshop on "Building Successful Pipelines for Predictive Analytics in Healthcare" delivered by Danielle Belgrave, PhD, Researcher at Microsoft Research, Cambridge, UK.
Applications of Artificial Intelligence - Past, Present & Future, by Jamie Gannon
This presentation in Ignite format gives a brief look into the applications of Artificial Intelligence. Starting from the humble beginnings and working its way through present day and finally the future possibilities of Artificial intelligence.
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-..., by Steffen Staab
Data spaces in distributed environments should be allowed to evolve in agile ways, providing data space owners with large flexibility about which data they store. Agility and heterogeneity, however, jeopardize data exchanges because representations may build on varying ontologies, and data consumers may not rely on the semantic correctness of their queries in the context of semantically heterogeneous, evolving data spaces. Graph data spaces are one example of a powerful model for representing and querying data whose semantics may change over time. To assert and enforce conditions on individual graph data spaces, shape languages (e.g., SHACL) have been developed. We investigate the question of how querying and programming can be guarded by reasoning over SHACL constraints in a distributed setting, and we sketch a picture of how a future landscape based on semantically heterogeneous data spaces might look.
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018, by Sri Ambati
This talk was recorded in London on Oct 30, 2018 and can be viewed here: https://youtu.be/p4iAnxwC_Eg
The good news is building fair, accountable, and transparent machine learning systems is possible. The bad news is it’s harder than many blogs and software package docs would have you believe. The truth is nearly all interpretable machine learning techniques generate approximate explanations, that the fields of eXplainable AI (XAI) and Fairness, Accountability, and Transparency in Machine Learning (FAT/ML) are very new, and that few best practices have been widely agreed upon. This combination can lead to some ugly outcomes!
This talk aims to make your interpretable machine learning project a success by describing fundamental technical challenges you will face in building an interpretable machine learning system, defining the real-world value proposition of approximate explanations for exact models, and then outlining viable techniques for debugging, explaining, and testing machine learning models.
Mateusz is a software developer who loves all things distributed and machine learning, and hates buzzwords. His favourite hobby is data juggling.
He obtained his M.Sc. in Computer Science from AGH UST in Krakow, Poland, during which he did an exchange at L’ECE Paris in France and worked on distributed flight booking systems. After graduation he moved to Tokyo to work as a researcher at Fujitsu Laboratories on machine learning and NLP projects, where he is still based.
Data-centric AI and the convergence of data and model engineering: opportunit..., by Paolo Missier
A keynote talk given at the IDEAL 2023 conference (Evora, Portugal, Nov 23, 2023).
Abstract.
The past few years have seen the emergence of what the AI community calls "Data-centric AI", namely the recognition that some of the limiting factors in AI performance are in fact in the data used for training the models, as much as in the expressiveness and complexity of the models themselves. One analogy is that of a powerful engine that will only run as fast as the quality of the fuel allows. A plethora of recent literature has started to explore the connection between data and models in depth, along with startups that offer "data engineering for AI" services. Some concepts are well-known to the data engineering community, including incremental data cleaning, multi-source integration, or data bias control; others are more specific to AI applications, for instance the realisation that some samples in the training space are "easier to learn from" than others. In this "position talk" I will suggest that, from an infrastructure perspective, there is an opportunity to efficiently support patterns of complex pipelines where data and model improvements are entangled in a series of iterations. I will focus in particular on end-to-end tracking of data and model versions, as a way to support MLDev and MLOps engineers as they navigate through a complex decision space.
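One way to picture the end-to-end version tracking mentioned above is content addressing: hash every data and model artifact so that any change yields a new version id, letting a prediction be traced back to the exact (data, model) pair that produced it. This is only an illustrative sketch, not the speaker's actual infrastructure; all names and values are hypothetical:

```python
import hashlib
import json

# Content-address each artifact: serialize it deterministically and hash it.
# Any change to the training data or the model settings yields a new id.
def version_id(artifact):
    blob = json.dumps(artifact, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]

data_v = version_id({"rows": [[1, 2], [3, 4]], "source": "sensors"})
# The model version embeds the data version, so lineage is end-to-end:
model_v = version_id({"type": "logistic", "lr": 0.1, "data": data_v})
print(data_v, model_v)   # changing any row or hyperparameter changes both ids
```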
Presentation of the Semantic Knowledge Graph research paper at the 2016 IEEE 3rd International Conference on Data Science and Advanced Analytics (Montreal, Canada - October 18th, 2016)
Abstract—This paper describes a new kind of knowledge representation and mining system which we are calling the Semantic Knowledge Graph. At its heart, the Semantic Knowledge Graph leverages an inverted index, along with a complementary uninverted index, to represent nodes (terms) and edges (the documents within intersecting postings lists for multiple terms/nodes). This provides a layer of indirection between each pair of nodes and their corresponding edge, enabling edges to materialize dynamically from underlying corpus statistics. As a result, any combination of nodes can have edges to any other nodes materialize and be scored to reveal latent relationships between the nodes. This provides numerous benefits: the knowledge graph can be built automatically from a real-world corpus of data, new nodes - along with their combined edges - can be instantly materialized from any arbitrary combination of preexisting nodes (using set operations), and a full model of the semantic relationships between all entities within a domain can be represented and dynamically traversed using a highly compact representation of the graph. Such a system has widespread applications in areas as diverse as knowledge modeling and reasoning, natural language processing, anomaly detection, data cleansing, semantic search, analytics, data classification, root cause analysis, and recommendation systems. The main contribution of this paper is the introduction of a novel system - the Semantic Knowledge Graph - which is able to dynamically discover and score interesting relationships between any arbitrary combination of entities (words, phrases, or extracted concepts) through dynamically materializing nodes and edges from a compact graphical representation built automatically from a corpus of data representative of a knowledge domain.
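The core mechanism can be sketched in a few lines: terms are nodes, and an edge between two terms materializes from the intersection of their postings lists. The Jaccard score below is a simplification of the corpus-statistics relatedness scoring the paper describes, and the toy documents are invented:

```python
# Toy corpus: doc id -> text.
docs = {
    1: "graph knowledge semantic",
    2: "semantic search graph",
    3: "anomaly detection",
}

# Build the inverted index: term (node) -> postings list (set of doc ids).
index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

def edge(term_a, term_b):
    shared = index[term_a] & index[term_b]     # the edge materializes dynamically
    union = index[term_a] | index[term_b]
    return shared, len(shared) / len(union)    # (edge documents, overlap score)

print(edge("semantic", "graph"))   # ({1, 2}, 1.0): the two terms always co-occur here
```

Because edges are computed on demand from set operations over postings lists, no edge needs to be stored explicitly, which is what makes the representation compact.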
Predictive Model and Record Description with Segmented Sensitivity Analysis (..., by Greg Makowski
Describing a predictive data mining model can provide a competitive advantage for solving business problems with a model. The SSA approach can also provide reasons for the forecast for each record. This can help drive investigations into fields and interactions during a data mining project, as well as identify "data drift" between the original training data and the current scoring data. I am working on an open source version of SSA, first in R.
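The exact SSA algorithm is the author's; the following is only a generic perturbation-based sketch of the idea of per-record reasons, with a made-up linear stand-in model:

```python
# Generic sensitivity sketch (not the author's exact SSA algorithm):
# perturb each field of a record and rank fields by how much the model's
# score moves, yielding per-record "reasons" for a forecast.
def score(record):                     # stand-in model: a fixed linear score
    w = {"income": 0.5, "age": 0.01, "debt": -0.3}
    return sum(w[k] * v for k, v in record.items())

def reasons(record, delta=1.0):
    base = score(record)
    impact = {}
    for field in record:
        bumped = dict(record, **{field: record[field] + delta})
        impact[field] = score(bumped) - base
    return sorted(impact, key=lambda f: abs(impact[f]), reverse=True)

print(reasons({"income": 2.0, "age": 30.0, "debt": 1.0}))
# ['income', 'debt', 'age']: fields ordered by influence on this record's score
```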
In the last decade, several Scientific Knowledge Graphs (SKGs) were released, representing scientific knowledge in a structured, interlinked, and semantically rich manner. But what kind of information do they describe? How have they been built? What can we do with them? In this lecture, I will first provide an overview of well-known SKGs, like Microsoft Academic Graph, Dimensions, and others. Then, I will present the Academia/Industry DynAmics (AIDA) Knowledge Graph, which describes 21M publications and 8M patents according to i) the research topics drawn from the Computer Science Ontology, ii) the type of the authors' affiliations (e.g., academia, industry), and iii) 66 industrial sectors (e.g., automotive, financial, energy, electronics) from the Industrial Sectors Ontology (INDUSO). Finally, I will showcase a number of tools and approaches using such SKGs, supporting researchers, companies, and policymakers in making sense of research dynamics.
Natural language processing (NLP) is an area of artificial intelligence that helps computers understand and interpret human language. Innovations in artificial intelligence, deep learning and computational hardware are helping make major strides in NLP research. While the applications are many, it is important to understand the kinds of problems NLP techniques can help solve.
In this master class, we will introduce ten key NLP techniques that are predominantly used in the industry.
- Question Answering
- Neural Machine Translation
- Topic Summarization
- Natural Language Inference
- Semantic Role Labeling
- Text Classification
- Sentiment Analysis
- Relation extraction
- Goal-Oriented Dialogue
- Semantic Parsing
We will also illustrate a case study on NLP in Python using the QuSandbox.
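As a flavour of one technique from the list, here is a toy lexicon-based sentiment scorer (real sentiment systems are learned from data; the lexicon here is invented):

```python
# Toy sentiment analysis: sum per-word polarity scores from a tiny lexicon.
LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}

def sentiment(text):
    s = sum(LEXICON.get(tok, 0) for tok in text.lower().split())
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"

print(sentiment("a great and good result"))   # positive
print(sentiment("a terrible outcome"))        # negative
```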
Driverless AI Hands-on Focused on Machine Learning Interpretability - H2O.ai, by Sri Ambati
Presented at #H2OWorld 2017 in Mountain View, CA.
Enjoy the video: https://youtu.be/axIqeaUhow0.
Learn more about H2O.ai: https://www.h2o.ai/.
Follow @h2oai: https://twitter.com/h2oai.
- - -
Abstract:
Usage of AI and machine learning models is likely to become more commonplace as larger swaths of the economy embrace automation and data-driven decision-making. While these predictive systems can be quite accurate, they have in the past been treated as inscrutable black boxes that produce only numeric predictions with no accompanying explanations. Unfortunately, recent studies and recent events have drawn attention to mathematical and sociological flaws in prominent weak AI and ML systems, but practitioners usually don’t have the right tools to pry open machine learning black boxes and debug them. This presentation introduces several new approaches that increase transparency, accountability, and trustworthiness in machine learning models. If you are a data scientist or analyst and you want to explain a machine learning model to your customers or managers (or if you have concerns about documentation, validation, or regulatory requirements), then this presentation is for you!
Multi-Model Data Query Languages and Processing Paradigms, by Jiaheng Lu
Specifying users' interests with a formal query language is typically a challenging task, which becomes even harder in the context of multi-model data management because we have to deal with data variety. Such data usually lack a unified schema to help users issue their queries, or have an incomplete schema as data come from disparate sources. Multi-Model DataBases (MMDBs) have emerged as a promising approach for dealing with this task as they are capable of accommodating and querying the multi-model data in a single system. This tutorial aims to offer a comprehensive presentation of a wide range of query languages for MMDBs and to compare their properties from multiple perspectives. We will discuss the essence of cross-model query processing and provide insights on the research challenges and directions for future work. The tutorial will also offer the participants hands-on experience in applying MMDBs to issue multi-model data queries.
Symbolic Background Knowledge for Machine LearningSteffen Staab
Machine learning aims at learning complex functions from data. Very often, this challenge remains ill-defined given the available amount of data, however, background knowledge that is available as knowledge graphs, ontologies or symbolic (physical) equations allows for an improved specification of the targeted solution. In this talk, we want to discuss several use cases that include symbolic background knowledge as regularizing priors, as constraints or as other inductive biases into machine learning tasks.
Soziale Netzwerke und Medien: Multi-disziplinäre Ansätze für ein multi-dimens...Steffen Staab
Präsentation von Oul Han und Steffen Staab
Workshop "Soziale Netzwerke und Medien" auf dem Treffen des Fakultätentags Informatik, 14. November 2019, Hamburg
Web Futures: Inclusive, Intelligent, SustainableSteffen Staab
Almost from its very beginning, the Web has been ambivalent.
It has facilitated freedom for information, but this also included the freedom to spread misinformation. It has faciliated intelligent personalization, but at the cost of intrusion into our private lifes. It has included more people than any other system before, but at the risk of exploiting them.
The Web is full of such ambivalences and the usage of artificial intelligences threatens to further amplify these ambivalences. To further the good and to contain the negative consequences, we need a research agenda studying and engineering the Web, as well as numerous activities by societies at large. In this talk, I will present and discuss a joint effort by an interdisciplinary team of Web Scientists to prepare and pursue such an agenda.
Concepts in Application Context ( How we may think conceptually )Steffen Staab
Formal concept analysis (FCA) derives a hierarchy of concepts
in a formal context that relates objects with attributes. This approach is very well aligned with the traditions of Frege, Saussure and Peirce, which relate a signifier (e.g. a word/an attribute) to a mental concept evoked by this word and meant to refer to a specific object in the real world. However, in the practice of natural languages as well as artificial languages (e.g. programming languages), the application context
often constitutes a latent variable that influences the interpretation of a signifier. We present some of our current work that analyzes the usage of words in natural language in varying application contexts as well as the usage of variables in programming languages in varying application contexts in order to provide conceptual constraints on these signifiers.
Storing and Querying Semantic Data in the CloudSteffen Staab
Daniel Janke and Steffen Staab. Tutorial at Reasoning Web
With proliferation of semantic data, there is a need to cope with trillions of triples by horizontally scaling data management in the cloud. To this end one needs to advance (i) strategies for data placement over compute and storage nodes, (ii) strategies for distributed query processing, and (iii) strategies for handling failure of compute and storage nodes. In this tutorial, we want to review challenges and how they have been addressed by research and development in the last 15 years.
Talk at Leopoldina Symposium on Digitization and its Effects on Man and Society
(Die Digitalisierung und ihre Auswirkungen auf Mensch und Gesellschaft)
leopoldina.org/de/veranstaltungen/veranstaltung/event/2464/
The evolution of the Web should move forward in an upward spiral that cylces between guiding values, engineering and science. Guiding values should comprise social values as well as system principles that further stabilization and growth of the Web. Principles I will talk about will include social inclusion, connectedness and fairness. Example efforts improve Web access for disabled, critically access Web structures and Web growth, and try to transfer knowledge about previously found patterns of Web growth to analogous cases.
(Semi-)Automatic analysis of online contentsSteffen Staab
How can media and discourse analyses combine approaches from humanities and statistical methods to deeply analyse large amounts of online contents.
Invited talk at Fachgruppen-Workshop der Deutschen Gesellschaft für Publizistik und Kommunikationswissenschaft
Soziale Medien – Echo-Kammer oder öffentlicher Raum?
Ansätze zur computergestützten Analyse von Internet-Korpora
6. Oktober 2016, Karlsruher Institut für Technologie (KIT)
Joint Keynote at Int. Conference on Knowledge Engineering and Semantic Web and Prague Computer Science Seminar, Prague, September 22, 2016
The challenges of Big Data are frequently explained by dealing with Volume, Velocity, Variety and Veracity. The large variety of data in organizations results from accessing different information systems with heterogeneous schemata or ontologies. In this talk I will present the research efforts that target the management of such broad data.
They include: (i) an integrated development environment for programming with broad data, (ii) a query language that allows for typing of query results, (iii) a typed lambda-calculus based on description logics, and (iv) efficient access to data repositories via schema indices.
We use metadata of various kind to improve and enrich text document clustering using an extension of Latent Dirichlet Allocation (LDA). The methods are fully implemented, evaluated and software is available on github.
These are the slides of an invited talk I gave September 8 at the Alexandria Workshop of TPDL-2016: http://alexandria-project.eu/events/3rd-workshop/
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Knowledge graphs for knowing more and knowing for sure
1. Knowledge graphs for knowing more and knowing for sure
Steffen Staab, @ststaab
KI – Institute for Artificial Intelligence
https://www.ki.uni-stuttgart.de
https://semanux.com
https://southampton.ac.uk/research/institutes-centres/web-internet-science
2. Conference on Information and Knowledge Management
provides an international forum for presentation and discussion of research on information and knowledge management, as well as recent advances on data and knowledge bases.
Rather: Conference on Large Language Models?
Let’s explore the role of knowledge bases/graphs!
3. Plan for my talk
1. What is a Knowledge Graph?
2. Some Applications of Knowledge Graphs
3. Knowledge Graphs for Knowing for Sure
4. Knowledge Graphs for Knowing More
5. Large Language Models as Knowledge Bases
6. Large Language Models as AI Assistants
5. What is a Knowledge Graph?
A model for knowledge structures with:
• Concepts (e.g. C22.0)
• Entities (e.g. Patient2342)
• Relations (e.g. treatedBy)
• Labels / Values (e.g. „liver tumor“ / „PhValue 7.5“)
6. What does a knowledge graph do for us? What are the difficulties?
Queries
• Scalability to billions of facts
• Answering with facts, predictions, recommendations
Example from medical project:
• Foundational Model of Anatomy: 75,000 concepts, 120,000 labels, > 2 million facts
• Not even patient data yet!
7. What does a knowledge graph do for us? What are the difficulties?
Queries over Ontologies & Facts
• How to develop and integrate ontologies?
• How to provide facts?
• Reasoning? Learning? Guarantees?
Example from medical project:
• Foundational Model of Anatomy
• RadLex
• ICD-10
8. What does a knowledge graph do for us? What are the difficulties?
Queries over Ontologies & Facts – what can be represented?
• Provenance
• Uncertainty
• Time
• …
Example from medical project:
• Patient history
• Patient measurements
11. Wonderful resource – but not representative
(02.11.2020, Steffen Staab, Universität Stuttgart, @ststaab, https://www.ipvs.uni-stuttgart.de/departments/ac/)
13. Application 2: KG for Circular Factory
Co-design of product and production: the Knowledge Graph contains knowledge about design, production and product, including plans, sensor measurements and intra-logistics.
17. Applications Imply Wealth of Requirements
• Updates and deletions with dependencies [EKAW18], also at the ontological level [KR2020]
• Federation [WWW08]
• Lacking: views with deletions and updates
• Transaction locking [ESWC2013]
• Lacking: recent standards (SHACL) and optimistic schemes
• Uncertainties
• Managing identities („does a re-designed column preserve its identity?“)
• ...
(Status legend on slide: rudimentary / available / research)
18. Sliding scale of knowledge graph requirements
Encyclopedic KGs (currently fashionable research):
• Facts are reported often: Who is Douglas Adams? What is the capital of France?
• Head of the distribution of world knowledge on the Web
• Answers with high-precision retrieval desired
Engineering KGs (“we build a system”, under-researched):
• Point facts exist once: w3476 instOf AngleGrinder, faceGear4223 maxDeviation 0.3mm
• Processes are important
• Answers must be correct
19. Observation 1
A lot of research in Knowledge Graphs builds on the assumption that we want to query encyclopedias, but we have many other requirements in industry.
21. Scenario in Architecture, Engineering, Construction (AEC)
(Diagram: apps A, B and C accessing knowledge graphs KG 1, KG 2 and KG 3)
22. SOLID Project
https://solidproject.org/
• People store their data securely in decentralized data stores – Pods
• People control access to the data in their Pod
• Standard, open, and interoperable data formats and protocols
Focus: authentication & authorization
23. Can my app B work on my KG2?
(Diagram: apps A, B and C accessing knowledge graphs KG 1, KG 2 and KG 3)
24. Example: How old are the students?
Query for all students, access age – the query fails during evaluation:

let students = query { SELECT ?x WHERE {?x a Student. } }
for student in students do
    printfn "%A" (student.age)

(Example graph: alice and bob are instances of Student, a subclass of Person; bob studiesAt University b1 and has matrNr 211... and name "Bob"; alice has age 25 and name "Alice".)
[ESOP17,ISWC19]
25. Example: How old are the students?

let students = query { SELECT ?x WHERE {?x a Student. } }
for student in students do
    printfn "%A" (student.age)

Should we use this relation on this signifier? It depends on:
1. The conceptualization of the data source
2. The querying of the data source
3. The software code
(Same example graph as on the previous slide.)
26. Closed-world conceptualization of classes and relations
SHACL – SHApes Constraint Language
• SHACL shapes are integrity constraints
• Namespaces omitted for brevity

:StudentShape a :NodeShape ;
    :targetClass :Student ;
    :class :Person ;
    :property [
        :path :studiesAt ;
        :minCount 1 ;
        :class :University ;
    ] .

:PersonShape a :NodeShape ;
    :targetClass :Person ;
    :property [
        :path :name ;
        :minCount 1 ;
        :datatype xsd:string ;
    ] .
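The closed-world reading of such shapes can be illustrated with a small hand-rolled check. This is only a sketch of the idea, not a SHACL engine (in practice one would use a validator such as pySHACL); the triple list and the helper names below are invented for this example.

```python
# Hand-rolled sketch of a SHACL-style minCount check over a toy triple list.
# Not a SHACL engine; graph and shape encodings are invented for illustration.

graph = [
    ("alice", "type", "Student"),
    ("alice", "studiesAt", "uni1"),
    ("bob", "type", "Student"),      # bob violates minCount 1 on studiesAt
]

def targets(triples, cls):
    """All nodes declared to be of type cls (the shape's targetClass)."""
    return [s for (s, p, o) in triples if p == "type" and o == cls]

def check_min_count(triples, target_class, path, min_count):
    """Return the target nodes that violate a minCount constraint on path."""
    violations = []
    for node in targets(triples, target_class):
        count = sum(1 for (s, p, o) in triples if s == node and p == path)
        if count < min_count:
            violations.append(node)
    return violations

# StudentShape's ":path :studiesAt ; :minCount 1" constraint:
print(check_min_count(graph, "Student", "studiesAt", 1))  # ['bob']
```

Under the closed-world reading such a violation is an error; under the open-world semantics of plain RDFS/OWL, bob might simply have an unrecorded university.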
27. Closed-world conceptualization of code (1)
Type checking discovers (potential) run-time errors:

let students = query { SELECT ?x WHERE {?x a Student. } }  // set of all students (StudentShape)
for student in students do                                 // one value of the StudentShape set
    printfn "%A" (student.age)

Not allowed, since StudentShape ⊈ ≥1 age.⊤ when considering all conceptually possible RDF graphs.
28. Closed-world conceptualization of code (2)
• Access: matrNr
• No error during evaluation (on this graph)
• Unsafe: rejected by type checking, the conceptualization is not guaranteed

let students = query { SELECT ?x WHERE {?x a Student. } }
for student in students do
    printfn "%A" (student.matrNr)

(Same example graph as before.)
29. Closed-world conceptualization of code (3)
• Query for: matrNr
• Type-safe access: matrNr is inferred to be given for all values of student

let students = query { SELECT ?x WHERE {?x matrNr ?y. } }
for student in students do
    printfn "%A" (student.matrNr)

(Same example graph as before.)
[ESOP17,ISWC19]
30. Determine type safety
1. Use available SHACL constraints
2. Infer additional SHACL constraints from queries
3. Type check using inference

let students = query { SELECT ?x WHERE {?x a Student. } }  // query shape (2) including StudentShape (1)
for student in students do                                 // one value of the StudentShape set
    printfn "%A" (student.name)

Inference (3): StudentShape ⊆ PersonShape and PersonShape ⊆ ≥1 name.⊤ in all possible graphs.
[ESOP17,ISWC19]
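The three steps can be caricatured in a few lines: shape containments plus the properties each shape guarantees decide statically whether an attribute access can fail. This is an illustrative sketch only; the dictionaries and function names are assumptions, not the actual type system of [ESOP17,ISWC19].

```python
# Illustrative sketch of shape-based type checking: shape containments (from
# SHACL) plus the properties each shape guarantees (minCount >= 1) decide
# statically whether an attribute access can fail at run time.

# (1) containments read off the SHACL constraints: StudentShape <= PersonShape
supershapes = {"StudentShape": {"PersonShape"}}

# properties guaranteed (minCount >= 1) by each shape
guaranteed = {"PersonShape": {"name"}, "StudentShape": set()}

def shape_chain(shape):
    """The shape itself plus its supershapes (flat hierarchy in this toy)."""
    return {shape} | supershapes.get(shape, set())

def access_is_safe(shape, prop):
    """(3) An access is type safe iff some shape on the chain guarantees prop."""
    return any(prop in guaranteed.get(s, set()) for s in shape_chain(shape))

print(access_is_safe("StudentShape", "name"))  # True:  student.name is safe
print(access_is_safe("StudentShape", "age"))   # False: student.age may fail
```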
31. Scenario: Can my app B work on my view of KG1?
(Diagram: apps A, B and C accessing knowledge graphs KG 1, KG 2 and KG 3)
32. Shapes to Shapes
“Which data can App B expect?” – s2s(Sin, q) → Sout
Input shape (“Every Person is an Agent”):
Sin = { :Person ⊑ :Agent }
Input query defining the view:
q = CONSTRUCT {
    ?x a :Person .
    ?y a :Agent
} WHERE {
    ?x a :Person .
    ?y a :Agent
}
[Seifer2023]
33. Tracing Query Concepts (and Relations)
Sin = { :Person1 ⊑ :Agent }
q = CONSTRUCT {
    ?x a :Person3 .
    ?y a :Agent
} WHERE {
    ?x a :Person2 .
    ?y a :Agent
}
Sout = { :Person3 ⊑ :Agent }
Are the concepts :Person1, :Person2 and :Person3 the same? Yes!
[Seifer2023]
34. Tracing Query Concepts (and Relations)
Sin = { :Person1 ⊑ :Agent }
q = CONSTRUCT {
    ?x a :Person3 .
    ?y a :Agent
} WHERE {
    ?x a :Person2 .
    ?x a :Teacher .
    ?y a :Agent
}
Sout = { :Person3 ⊑ :Agent }
Are the concepts :Person1, :Person2 and :Person3 still the same? NO!
A hard problem even for restricted query and constraint languages.
[Seifer2023]
35. Observation 2
KG problems occur at the ontological and at the fact level. Knowledge Graph technologies lack crucial capabilities for guaranteeing results.
39. Finding and Exploiting Patterns of Similarity & Analogy
(Diagram: persons Steffen, Frank and Ingo connected via worksFor, locatedIn, livesIn and birthdate to the Stuttgart, Koblenz and Wolverhampton areas; cases annotated as “prediction impossible” vs. “prediction possible”.)
40. Knowing More than What is Stated in a Knowledge Graph
A. Bordes et al. [TransE 2013]
Correct [2013]: “TransE significantly outperforms state-of-the-art methods in link prediction on two knowledge bases.”
Misleading: “Our work focuses on modeling multi-relational data from KBs (Wordnet [9] and Freebase [1] in this paper), with the goal of providing an efficient tool to complete them by automatically adding new facts, without requiring extra knowledge.”
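The translational idea behind TransE can be sketched in a few lines: a triple (h, r, t) is scored by how close h + r lands to t. The 2-dimensional vectors below are hand-picked toy values, not learned embeddings.

```python
# Toy sketch of TransE's translational scoring: a triple (h, r, t) is
# plausible when ||h + r - t|| is small. Vectors are invented, not learned.

emb = {
    "Steffen":   (0.0, 0.0),
    "Frank":     (0.0, 1.0),
    "Stuttgart": (1.0, 0.0),
    "Koblenz":   (1.0, 1.0),
}
rel = {"livesIn": (1.0, 0.0)}    # one shared translation vector per relation

def score(h, r, t):
    """Negative Euclidean distance ||h + r - t||; higher = more plausible."""
    (hx, hy), (rx, ry), (tx, ty) = emb[h], rel[r], emb[t]
    return -(((hx + rx - tx) ** 2 + (hy + ry - ty) ** 2) ** 0.5)

# "Link prediction" by analogy: rank candidate tails for (Frank, livesIn, ?)
candidates = ["Stuttgart", "Koblenz"]
best = max(candidates, key=lambda t: score("Frank", "livesIn", t))
print(best)  # Koblenz
```

The prediction rests purely on geometric analogy to other livesIn triples; nothing guarantees that the added fact is true, which is why calling this “completion” is misleading.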
41. Geometric Reasoning with EL Ontology A-Box
Concept assertion C(a) is modeled as geometric membership: the point for instance a lies inside the box for concept C.
[ISWC2022]
42. Geometric Reasoning with EL Ontology T-Box
• Box affine transformation
• Box entailment
• Box intersection
• Box disjointedness
[ISWC2022]
43. Geometric Reasoning with EL Ontology A-Box
• Concept assertion C(a): geometric membership – the point for a lies inside the box for C
• Role assertion r(a, b): affine transformation Tr between the two points a and b
[ISWC2022]
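The geometric view can be sketched with axis-aligned boxes: a concept assertion holds when the instance point lies inside the concept box, and concept subsumption corresponds to box containment. The boxes and the point below are illustrative assumptions in the spirit of ELEm/BoxEL-style embeddings, not output of the [ISWC2022] method.

```python
# Sketch of box-embedding reasoning: concepts are axis-aligned boxes,
# instances are points. C(a) holds geometrically when the point lies inside
# the box, and C ⊑ D when box(C) is contained in box(D).

def inside(point, box):
    """Concept assertion C(a): each coordinate of a within the box bounds."""
    lo, hi = box
    return all(l <= p <= h for p, l, h in zip(point, lo, hi))

def contained(inner, outer):
    """Entailment C ⊑ D: box(C) lies entirely within box(D)."""
    (ilo, ihi), (olo, ohi) = inner, outer
    return all(o <= i for i, o in zip(ilo, olo)) and \
           all(i <= o for i, o in zip(ihi, ohi))

student = ((0.0, 0.0), (1.0, 1.0))    # box for concept Student
person = ((-1.0, -1.0), (2.0, 2.0))   # box for concept Person
alice = (0.5, 0.5)                    # point for instance alice

print(inside(alice, student))      # True: Student(alice)
print(contained(student, person))  # True: Student ⊑ Person
print(inside(alice, person))       # True: follows geometrically
```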
44. Geometric Reasoning with Fact Attributions in ShrinkE
• A primal triple is modeled as a spatial spanning (from a point to a box)
• Qualifiers are modeled as a (monotonic) spatial shrinking of the box
• Qualifier implication and exclusion are geometrically modeled as box containment and disjointedness
[ACL2023]
Check out https://kg-beyond-triple.github.io/
45. Geometric Reasoning with Fact Attributions in ShrinkE
• Box embedding
• Box shrinking is a box-to-box transform that monotonically shrinks the box’s size
[ACL2023]
46. Datasets for evaluating knowledge graph embeddings
• WD50k: excerpt from Wikidata
• JF17K: excerpt from Freebase
• WikiPeople: excerpt from Wikidata
• FB15k-237: excerpt from Freebase
• …
Many datasets, but all biased in the same direction.
47. Selecting manufacturing measurement technology in immature production processes
(PhD thesis in preparation by Fabian Sasse, KIT)
49. Observation 3
Knowledge Graph embedding techniques do not complete knowledge graphs; they perform similarity and analogical reasoning. Evaluations of Knowledge Graph embedding methods remain biased towards encyclopedic knowledge.
54. Knowledge in text
Statistically frequent knowledge:
• Commonsense knowledge: “cows eat grass”, “apples fall towards earth if unsupported”
• Commonsense expert knowledge: “the halting problem is undecidable”, “3SAT is NP-complete”
“Point knowledge”:
• Steffen Staab is a professor at the University of Stuttgart
55. Language models smoothen probability distributions
• Smoothing a data set: create an approximating function that preserves patterns in the data, while leaving out noise or fine-scale structures. [Shortened from Wikipedia]
• Laplacian smoothing for Naïve Bayes:
argmax_c P(c | x1, …, xn) ≈ argmax_c P(c) · P(x1 | c) ⋯ P(xn | c)
• Smoothing for language models [ACL14]: P(wn | wn−k ⋯ wn−1) must not be 0 for unobserved sequences wn−k ⋯ wn−1 wn
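The requirement that unseen continuations must not get probability 0 can be illustrated with add-one (Laplacian) smoothing for a bigram model. The tiny corpus is invented for the example; [ACL14] builds on the more sophisticated modified Kneser-Ney scheme.

```python
# Add-one (Laplacian) smoothing for a bigram language model: every count is
# incremented before normalizing, so an unseen continuation never gets
# probability 0. Tiny invented corpus for illustration only.
from collections import Counter

corpus = "cows eat grass cows eat hay".split()
vocab = set(corpus) | {"apples"}            # "apples" is never seen after "eat"
V = len(vocab)

bigrams = Counter(zip(corpus, corpus[1:]))  # counts c(w_prev, w)
unigrams = Counter(corpus)                  # counts c(w_prev)

def p_smoothed(w_prev, w):
    """P(w | w_prev) = (c(w_prev, w) + 1) / (c(w_prev) + V)."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + V)

print(p_smoothed("eat", "grass"))   # seen bigram: 2/7
print(p_smoothed("eat", "apples"))  # unseen bigram, but still 1/7 > 0
```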
56. Smoothing is the core task of Large Language Models
• What other terms could appear in a masked position?
• High “temperature” → diversity of answers
• Varying answers for “Write a poem about <your name>”
63. Few-shot in-context learning on KB question answering
Tianle Li, Xueguang Ma, Alex Zhuang, Yu Gu, Yu Su, and Wenhu Chen. 2023. Few-shot In-context Learning on Knowledge Base Question Answering. In ACL-2023.
67. Do not (always) go with the flow
• Knowing for Sure: research is required for dealing with federated, overlapping KGs with multiple authorities
• Knowing More: know what you get, and evaluate not only with encyclopedic KGs
• LLMs as knowledge bases: commonsense knowledge; frequently observed knowledge
• LLMs as AI assistants: entering and retrieving “point knowledge”
68. Thank you!
Steffen Staab
Analytic Computing, KI – Institute for Artificial Intelligence
Universität Stuttgart, Universitätsstraße 32, 70569 Stuttgart
E-Mail: Steffen.staab@ki.uni-stuttgart.de
Web: ki.uni-stuttgart.de
Many thanks go to my PhD students, PostDocs and collaborators who made the work portrayed in this talk possible – check out the references!
I am hiring a PostDoc and a PhD student for the circular factory project!
69. References related to Knowing More
1. [Potyka23] Nico Potyka, Yuqicheng Zhu, Evgeny Kharlamov and Steffen Staab. Uncertainty-aware Knowledge Extraction from Large Language Models using Social Choice Theory. TechReport.
2. [ISWC2022] B. Xiong, N. Potyka, T.-K. Tran, M. Nayyeri, S. Staab. Faithful Embeddings for EL++ Knowledge Bases. In: 21st International Semantic Web Conference (ISWC2022).
3. [SIGIR23] J. Lu, J. Shen, B. Xiong, W. Ma, S. Staab, C. Yang. HiPrompt: Few-Shot Biomedical Knowledge Fusion via Hierarchy-Oriented Prompting. In: Proceedings of ACM SIGIR-2023, Taipei, Taiwan, July 23-27, 2023.
4. [ISWC2023] M. Nayyeri, Z. Wang, M. M. Akter, M. Mohtashim, Md R. Al Hasan Rony, J. Lehmann, S. Staab. Integrating Knowledge Graph Embeddings and Pre-trained Language Models in Hypercomplex Spaces. In: 22nd Int. Semantic Web Conference (ISWC2023), Athens, GR, November 6-10, 2023.
5. [TransE 2013] Bordes, Antoine, et al. Translating Embeddings for Modeling Multi-relational Data. Advances in Neural Information Processing Systems 26 (2013).
70. 1. [ISWC19] M. Leinberger, P. Seifer, C. Schon, R. Lämmel, S. Staab. Type Checking Program Code using SHACL. In: Proc.
of Int. Semantic Web Conference (ISWC-2019). Auckland, New Zealand, October 2019.
2. [Seifer2023] Philipp Seifer, Daniel Hernández, Ralf Lämmel, Steffen Staab. From Shapes to Shapes: Inferring SHACL
Shapes for Results of SPARQL CONSTRUCT Queries. TechReport.
3. [ESOP17] M. Leinberger, R. Lämmel, S. Staab. The essence of functional programming on semantic data. In 26th
European Symposium on Programming (ESOP 2017), Uppsala, SE, 22 - 29 Apr 2017, pp. 750-776.
4. [CAAD Futures 2023] D. Elshani, D. Hernandez, A. Lombardi, L. Siriwardena, T. Schwinn, A. Fisher, S. Staab, A. Menges,
T. Wortmann. Building Information Validation and Reasoning Using Semantic Web Technologies. In: Computer-Aided
Architectural Design. CAAD Futures 2023. Springer, Cham, 2023.
5. [KR2020] T. Rienstra, C. Schon, S. Staab. Concept Contraction in the Description Logic EL. In: Principles of Knowledge
Representation and Reasoning: Proceedings of the Seventeenth International Conference, KR 2020, pp. 723-732.
6. [EKAW18] C. Schon, S. Staab, P. Kügler, P. Kestel, B. Schleich, S. Wartzack. Metaproperty-guided Deletion from the
Instance-Level of a Knowledge Base. In: Proc. of EKAW 2018, 21st International Conference on Knowledge Engineering
and Knowledge Management, November 12-16, 2018, Nancy, France, Springer 2018.
7. [ESWC2013] S. Scheglmann, S. Staab, M. Thimm, G. Gröner. Locking for Concurrent Transactions on Ontologies. In: 10th
Extended Semantic Web Conference (ESWC2013), Montpellier, France, May 26-30, 2013.
8. [WWW08] S. Schenk, S. Staab. Networked Graphs: A Declarative Mechanism for SPARQL Rules, SPARQL Views and RDF
Data Integration on the Web. In: Proc. of WWW-2008, 17th Int. World Wide Web Conference, Beijing, China, April 21-25,
2008, pp. 585-594.
References related to Knowing for Sure
[ACL14] R. Pickhardt, T. Gottron, M. Körner, P. G. Wagner, T. Speicher, S. Staab. A Generalized Language Model as the
Combination of Skipped n-grams and Modified Kneser Ney Smoothing. In: Proc. of ACL-2014 - The 52nd Annual Meeting of the
Association for Computational Linguistics. Baltimore, June 22-27, 2014.
02.11.2020
Steffen Staab, Universität Stuttgart, @ststaab, https://www.ipvs.uni-stuttgart.de/departments/ac/
Others
Editor's Notes
If it looks like a duck, walks like a duck and quacks like a duck, then it just may be a duck.
Huey, Dewey, and Louie live in Duckburg
750 million triples, fast growing, not easy to manage
status: proposal for funding by 20 PIs, mostly engineering, mostly from KIT
7 year excellence cluster at Uni Stuttgart
medical knowledge graphs and applications may be found on either side
Now I am going to present the geometric interpretation and the corresponding loss term for each axiom.
In the ABox, we have two types of axioms: concept assertions C(a) and role assertions r(a, b).
For a concept assertion, the geometric interpretation is that the point of instance a should lie inside the box of class C. The loss therefore enforces that every dimension of point a lies between the lower-left corner and the upper-right corner of box C.
A role assertion r(a, b) says that a stands in relation r to b; the geometric interpretation is that point a, after the affine transformation associated with r, should be near point b.
The corresponding loss term minimizes the L2 distance between the transformed point of a and point b.
We proved that our loss terms satisfy a soundness guarantee: a loss term is zero if and only if the corresponding geometric interpretation is satisfied.
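The two ABox loss terms described in these notes can be sketched as follows. This is a simplified illustration, not the exact formulation from the paper: the function names are my own, and I assume the affine map for a role r is a per-dimension scaling plus a translation.

```python
# Sketch of the two ABox loss terms (illustrative; names and the
# diagonal-scaling form of the affine map are assumptions).

def concept_assertion_loss(point, box_low, box_high):
    """Loss for C(a): zero iff every coordinate of the point lies
    inside the box [box_low, box_high], per-dimension hinge otherwise."""
    loss = 0.0
    for x, lo, hi in zip(point, box_low, box_high):
        loss += max(lo - x, 0.0) ** 2 + max(x - hi, 0.0) ** 2
    return loss

def role_assertion_loss(point_a, point_b, scale, shift):
    """Loss for r(a, b): squared L2 distance between the affinely
    transformed point a and point b; zero iff they coincide."""
    return sum((s * xa + t - xb) ** 2
               for xa, xb, s, t in zip(point_a, point_b, scale, shift))

# Soundness in the sense above: the loss is zero exactly when the
# geometric interpretation is satisfied.
assert concept_assertion_loss([0.5, 0.5], [0.0, 0.0], [1.0, 1.0]) == 0.0
assert concept_assertion_loss([1.5, 0.5], [0.0, 0.0], [1.0, 1.0]) > 0.0
assert role_assertion_loss([1.0, 2.0], [3.0, 5.0], [2.0, 2.0], [1.0, 1.0]) == 0.0
```

In a real embedding model these terms would be computed over tensors and summed across all asserted axioms before gradient descent; the scalar version here only shows the zero-iff-satisfied structure.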
[Hypertext2008]
In her keynote, Yulan talked about voting in order to improve confidence – though I have also observed non-i.i.d. behaviour, in which case voting may be harmful.
Usefulness may be an issue.
In the following I will give you webpage content about a soccer club. Represent the facts that you find in this text in RDF turtle notation. Efficient VfB wins at Union Berlin. VfB's winning streak continues. At 1. FC Union Berlin, the team with the chest ring prevails 3:0. It is the sixth win in a row and the first against Union in the Bundesliga. The course of the match: VfB went into the duel at 1. FC Union Berlin with a starting line-up changed in two positions. Maxi Mittelstädt and Dan-Axel Zagadou started in place of Pascal Stenzel and Hiroki Ito (both on the bench). The team with the chest ring, black on this day, began the match confidently and had noticeably more possession in the opening quarter of an hour. VfB quickly converted its larger share of play into a deserved lead. Who else but Serhou Guirassy could have scored the 1:0 (see "The goals"). The boys from Cannstatt kept controlling the match but had to make a substitution after barely half an hour: Serhou Guirassy left the pitch with muscular problems in the back of his left thigh, and Deniz Undav came on for him. VfB nevertheless remained the dominant team until the half-time whistle. Silas and Deniz Undav seal the result. After the restart, a hard-fought match developed with many situations between the penalty areas. At first neither team could create clear chances. In the 60th minute, however, Jamie Leweling had a great chance to make it 2:0; the 22-year-old, free in front of goal in a promising position, was denied by Union's goalkeeper Frederik Rönnow. In the 77th minute, Alexander Nübel was wide awake at the other end and cleared the situation against the onrushing Kevin Behrens. Shortly afterwards, VfB made it a comfortable 2:0 on a counter-attack: substitute Silas scored his third goal of the season.
Deniz Undav headed in the final goal for 3:0. In the end, VfB won deservedly, because it used its chances consistently and allowed the opponent hardly any chances over the whole match. Read the full course of the match in the VfB live ticker. The goals: 16th minute: Serhou Guirassy heads in a cross by Anthony Rouault for 1:0. It is the VfB striker's 14th goal of the season. 81st minute: Silas gets the ball via Karazor and Millot, beats the advanced Union defenders and finishes calmly for 2:0. 88th minute: VfB wins the ball near the opponent's penalty area, Wooyeong Jeong crosses from the right to Deniz Undav, who heads in centrally for 3:0. The voices: VfB head coach Sebastian Hoeneß: "It was a mature performance from us. We played like grown-ups. The opponent's periods of pressure were never very pronounced. That we pulled the game so clearly to our side in the end makes me proud. We are on a run at the moment and we want to keep it going as long as possible." Chris Führich: "It was a huge team performance from us. We knew how difficult it is to win here. We carried out our tactics from the first to the last minute. It is also very important to stand firm against Union's long balls and powerful players. We managed that well, and in the end we also deservedly won." Maxi Mittelstädt: "It is a nice feeling that we won. We showed a mature performance and let little go wrong. Of course there were also phases we had to get through. We have a broad and strong squad. Today it was again important what impulses the substitutes brought into the match. The collective makes us strong at the moment. Sometimes you have to pinch yourself given the winning streak, but we have also earned it over the past months and weeks.
I am looking forward to the coming challenges." The special features: Berlin-born Maximilian Mittelstädt made his first start for VfB. New national-team player Chris Führich played his 75th competitive match in the shirt with the red chest ring this Saturday. Former VfB player Rani Khedira made his comeback for Berlin against the club from Cannstatt after a lengthy calf injury. Referee Bastian Dankert and his assistant René Rohde each officiated their 150th Bundesliga match. Never before in its club history has VfB had 21 points after eight matchdays. The next matches: Next Saturday, VfB hosts TSG Hoffenheim in the MHPArena. That match is already sold out, as is the home match against Borussia Dortmund on November 11. The members' presale for the home cup match against 1. FC Union Berlin on Tuesday evening, October 31, at 6 p.m. is under way, as is the presale for the home match against SV Werder Bremen on Saturday, December 2, at 6:30 p.m. To the VfB online shop.
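For illustration, here are a few of the match facts rendered in Turtle, as the prompt above requests. This is a hand-written sketch with a hypothetical ex: vocabulary, not an actual model output:

```turtle
@prefix ex: <http://example.org/soccer#> .

ex:UnionBerlin_vs_VfB a ex:Match ;
    ex:homeTeam ex:FCUnionBerlin ;
    ex:awayTeam ex:VfBStuttgart ;
    ex:winner ex:VfBStuttgart ;
    ex:finalScore "0:3" .

ex:goal1 a ex:Goal ;
    ex:inMatch ex:UnionBerlin_vs_VfB ;
    ex:scorer ex:SerhouGuirassy ;
    ex:minute 16 .

ex:goal2 a ex:Goal ;
    ex:inMatch ex:UnionBerlin_vs_VfB ;
    ex:scorer ex:Silas ;
    ex:minute 81 .

ex:goal3 a ex:Goal ;
    ex:inMatch ex:UnionBerlin_vs_VfB ;
    ex:scorer ex:DenizUndav ;
    ex:minute 88 .
```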