1) Knowledge graphs are a representation of knowledge that is useful for modern AI systems. They describe entities and their relationships in a way that facilitates logical reasoning and access to large-scale factual knowledge.
2) Knowledge graphs can be created through manual effort, but recent approaches aim to construct them automatically at large scale, for example by extracting knowledge graphs from Wikipedia and other wikis.
3) Knowledge graphs are useful ingredients for AI, supporting natural language processing, automated reasoning, and machine learning approaches through knowledge graph embeddings that represent entities as vectors. However, the meaning carried by the individual dimensions is lost, which presents challenges.
This presentation shows approaches for knowledge graph construction from Wikipedia and other Wikis that go beyond the "one entity per page" paradigm. We see CaLiGraph, which extracts entities from categories and listings, as well as DBkWik, which extracts and integrates information from thousands of Wikis.
Using knowledge graphs in data mining typically requires a propositional, i.e., vector-shaped representation of entities. RDF2vec is an example for generating such vectors from knowledge graphs, relying on random walks for extracting pseudo-sentences from a graph, and utilizing word2vec for creating embedding vectors from those pseudo-sentences. In this talk, I will give insights into the idea of RDF2vec, possible application areas, and recently developed variants incorporating different walk strategies and training variations.
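To make that pipeline concrete, the following is a minimal, illustrative sketch of the RDF2vec idea on a handful of made-up triples, using gensim's word2vec: the triples, walk depth, walk count, and hyperparameters are arbitrary choices for illustration, not the settings of the actual RDF2vec implementation.

```python
# Minimal sketch of the RDF2vec idea: random walks over a triple set,
# then word2vec over the resulting pseudo-sentences (toy data, illustrative only).
import random
from gensim.models import Word2Vec

triples = [
    ("The_Downward_Spiral", "artist", "Nine_Inch_Nails"),
    ("Nine_Inch_Nails", "member", "Trent_Reznor"),
    ("The_Downward_Spiral", "genre", "Industrial_Rock"),
]

# adjacency: subject -> list of (predicate, object)
adj = {}
for s, p, o in triples:
    adj.setdefault(s, []).append((p, o))

def random_walk(start, depth=4):
    """One pseudo-sentence: alternating entity and predicate tokens."""
    walk, node = [start], start
    for _ in range(depth):
        neighbours = adj.get(node, [])
        if not neighbours:
            break
        p, o = random.choice(neighbours)
        walk += [p, o]
        node = o
    return walk

walks = [random_walk(e) for e in adj for _ in range(10)]
model = Word2Vec(sentences=walks, vector_size=50, window=5, min_count=1, sg=1)
print(model.wv["The_Downward_Spiral"][:5])  # embedding vector for an entity
```

In a real setting, the walks would be extracted from a large knowledge graph such as DBpedia, and the resulting vectors would be fed into a downstream learning task.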
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati... (Heiko Paulheim)
Knowledge Graphs are often used as a symbolic representation mechanism for representing knowledge in data intensive applications, both for integrating corporate knowledge as well as for providing general, cross-domain knowledge in public knowledge graphs such as Wikidata. As such, they have been identified as a useful way of injecting background knowledge in data analysis processes. To fully harness the potential of knowledge graphs, latent representations of entities in the graphs, so-called knowledge graph embeddings, show superior performance, but sacrifice one central advantage of knowledge graphs, i.e., the explicit symbolic knowledge representation. In this talk, I will shed some light on the usage of knowledge graphs and embeddings in data analysis, and give an outlook on research directions which aim at combining the best of both worlds.
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block (Heiko Paulheim)
Starting with Cyc in the 1980s, the collection of general knowledge in machine-interpretable form has been considered a valuable ingredient in intelligent and knowledge-intensive applications. Notable contributions in the field include the Wikipedia-based datasets DBpedia and YAGO, as well as the collaborative knowledge base Wikidata. Since Google coined the term in 2012, such collections are most often referred to as knowledge graphs. Besides such open knowledge graphs, many companies have started using corporate knowledge graphs as a means of information representation.
In this talk, I will look at two ongoing projects related to the extraction of knowledge graphs from Wikipedia and other Wikis. The first new dataset, CaLiGraph, aims at the generation of explicit formal definitions from categories, and the extraction of new instances from list pages. In its current release, CaLiGraph contains 200k axioms defining classes, and more than 7M typed instances. In the second part, I will look at the transfer of the DBpedia approach to a multitude of arbitrary Wikis. The first such prototype, DBkWik, extracts data from Fandom, a Wiki farm hosting more than 400k different Wikis on various topics. Unlike DBpedia, which relies on a large user base for crowdsourcing an explicit schema and extraction rules, as well as on the "one-page-per-entity" assumption, DBkWik has to address various challenges in the fields of schema learning and data integration. In its current release, DBkWik contains more than 11M entities, and has been found to be highly complementary to DBpedia.
Machine Learning with and for Semantic Web Knowledge Graphs (Heiko Paulheim)
Large-scale cross-domain knowledge graphs, such as DBpedia or Wikidata, are some of the most popular and widely used datasets of the Semantic Web. In this paper, we introduce some of the most popular knowledge graphs on the Semantic Web. We discuss how machine learning is used to improve those knowledge graphs, and how they can be exploited as background knowledge in popular machine learning tasks, such as recommender systems.
Knowledge Graphs, such as DBpedia, YAGO, or Wikidata, are valuable resources for building intelligent applications like data analytics tools or recommender systems. Understanding what is in those knowledge graphs is a crucial prerequisite for selecting a Knowledge Graph for a task at hand. Hence, Knowledge Graph profiling - i.e., quantifying the structure and contents of knowledge graphs, as well as their differences - is essential for fully utilizing the power of Knowledge Graphs. In this paper, I will discuss methods for Knowledge Graph profiling, depict crucial differences between the big, well-known Knowledge Graphs, like DBpedia, YAGO, and Wikidata, and take a glance at current developments of new, complementary Knowledge Graphs such as DBkWik and WebIsALOD.
How are Knowledge Graphs created?
What is inside public Knowledge Graphs?
Addressing typical problems in Knowledge Graphs (errors, incompleteness)
New Knowledge Graphs: WebIsALOD, DBkWik
RDF2vec is a method for creating embedding vectors for entities in knowledge graphs. In this talk, I introduce the basic idea of RDF2vec, as well as the latest extensions and developments, such as the use of different walk strategies, the order-aware flavour of RDF2vec, RDF2vec for dynamic knowledge graphs, and more.
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph (Heiko Paulheim)
From a bird's eye view, the DBpedia Extraction Framework takes a MediaWiki dump as input and turns it into a knowledge graph. In this talk, I discuss the creation of the DBkWik knowledge graph by applying the DBpedia Extraction Framework to thousands of Wikis.
The original Semantic Web vision foresees describing entities in a way that the meaning can be interpreted both by machines and humans. Following that idea, large-scale knowledge graphs capturing a significant portion of knowledge have been developed. In the recent past, vector space embeddings of Semantic Web knowledge graphs - i.e., projections of a knowledge graph into a lower-dimensional, numerical feature space (a.k.a. latent feature space) - have been shown to yield superior performance in many tasks, including relation prediction, recommender systems, or the enrichment of predictive data mining tasks. At the same time, those projections describe an entity as a numerical vector, without any semantics attached to the dimensions. Thus, embeddings are as far from the original Semantic Web vision as can be. As a consequence, the results achieved with embeddings - as impressive as they are in terms of quantitative performance - are most often not interpretable, and it is hard to obtain a justification for a prediction, e.g., an explanation why an item has been suggested by a recommender system. In this paper, we make a claim for semantic embeddings and discuss possible ideas towards their construction.
Data-driven Joint Debugging of the DBpedia Mappings and Ontology (Heiko Paulheim)
DBpedia is a large-scale, cross-domain knowledge graph extracted from Wikipedia. For the extraction, crowd-sourced mappings from Wikipedia infoboxes to the DBpedia ontology are utilized. In this process, different problems may arise: users may create wrong and/or inconsistent mappings, use the ontology in an unforeseen way, or change the ontology without considering all possible consequences. In this paper, we present a data-driven approach to discover problems in mappings as well as in the ontology and its usage in a joint, data-driven process. We show both quantitative and qualitative results about the problems identified, and derive proposals for altering mappings and refactoring the DBpedia ontology.
Knowledge graphs are used in various applications and have been widely analyzed. A question that is not very well researched is: what is the price of their production? In this paper, we propose ways to estimate the cost of those knowledge graphs. We show that the cost of manually curating a triple is between $2 and $6, and that automatically created knowledge graphs are cheaper by a factor of 15 to 150 (i.e., 1c to 15c per statement). Furthermore, we advocate for taking cost into account as an evaluation metric, showing the correspondence between cost per triple and semantic validity as an example.
Fast Approximate A-box Consistency Checking using Machine Learning (Heiko Paulheim)
Ontology reasoning is typically a computationally intensive operation. While soundness and completeness of results is required in some use cases, for many others, a sensible trade-off between computation effort and correctness of results makes more sense. In this paper, we show that it is possible to approximate a central task in reasoning, i.e., A-box consistency checking, by training a machine learning model which approximates the behavior of a reasoner for a specific ontology. On four different datasets, we show that such learned models consistently achieve an accuracy above 95% at less than 2% of the runtime of a reasoner, using a decision tree with no more than 20 inner nodes. For example, this allows for validating 293M Microdata documents against the schema.org ontology in less than 90 minutes, compared to 18 days required by a state-of-the-art ontology reasoner.
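As a rough sketch of the idea described above (not the paper's actual features, datasets, or reasoner), approximating consistency verdicts with a small decision tree could look as follows; the feature vectors and labels below are synthetic stand-ins for A-box descriptions and reasoner output.

```python
# Sketch: approximate a reasoner's A-box consistency verdicts with a small decision tree.
# X stands in for feature vectors describing A-boxes (e.g., class/property usage counts),
# y for consistent/inconsistent labels obtained by running a real reasoner on a sample.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(5000, 20))          # synthetic feature vectors
y = (X[:, 0] + X[:, 3] > 9).astype(int)           # synthetic "inconsistent" labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = DecisionTreeClassifier(max_leaf_nodes=21)   # keeps the tree at ~20 inner nodes
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```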
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data (Sören Auer)
Over the past 4 years, the Semantic Web activity has gained momentum with the widespread publishing of structured data as RDF. The Linked Data paradigm has therefore evolved from a practical research idea into a very promising candidate for addressing one of the biggest challenges of computer science: the exploitation of the Web as a platform for data and information integration. To translate this initial success into a world-scale reality, a number of research challenges need to be addressed: the performance gap between relational and RDF data management has to be closed, coherence and quality of data published on the Web have to be improved, provenance and trust on the Linked Data Web must be established, and generally the entrance barrier for data publishers and users has to be lowered. This tutorial will discuss approaches for tackling these challenges. As an example of a successful Linked Data project we will present DBpedia, which leverages Wikipedia by extracting structured information and by making this information freely accessible on the Web. The tutorial will also outline some recent advances in DBpedia, such as the mappings Wiki, DBpedia Live, as well as the recently launched DBpedia benchmark.
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top (Heiko Paulheim)
Large knowledge bases, such as DBpedia, are most often created heuristically due to scalability issues. In the building process, both random as well as systematic errors may occur. In this paper, we focus on finding systematic errors, or anti-patterns, in DBpedia. We show that by aligning the DBpedia ontology to the foundational ontology DOLCE-Zero, and by combining reasoning and clustering of the reasoning results, errors affecting millions of statements can be identified at a minimal workload for the knowledge base designer.
Knowledge graph embeddings are a mechanism that projects each entity in a knowledge graph to a point in a continuous vector space. It is commonly assumed that those approaches project two entities close to each other if they are similar and/or related. In this talk, I take a closer look at the roles of similarity and relatedness with respect to knowledge graph embeddings, and discuss how the well-known embedding mechanism RDF2vec can be tailored towards focusing on similarity, relatedness, or both.
Linguistic Linked Open Data, Challenges, Approaches, Future Work (Sebastian Hellmann)
Hellmann keynote TKE (2016), Challenges, Approaches and Future Work for Linguistic Linked Open Data (LLOD)
While the Linguistic Linked Open Data (LLOD) Cloud (http://linguistic-lod.org/) has evolved beyond expectations - thanks to the effort of a vibrant community - overall progress has to be seen under a more scrutinizing light.
Initial challenges which have been formulated by Christian Chiarcos, Sebastian Nordhoff and me as early as 2011[1][2] have been discussed extensively in the LDL, MLODE and NLP & DBpedia workshop series and in several W3C community groups. In particular, the LIDER FP7 project (http://www.lider-project.eu/) - originally conceived to tackle these challenges and build a Linguistic Linked Open Data Cloud - rather gave them more shape and uncovered that there is yet quite a long road ahead to solve problems such as proper metadata, contextualisation of knowledge, data quality, hosting, open licensing and provenance, timely updated network links, knowledge integration and interoperability on the largest possible scale - the Web.
The invited talk attempts to give a full account of these abovementioned challenges and presents and critically evaluates pertinent efforts and approaches including evolving standards such as the NLP Interchange Format (NIF)[3][4], DataID[5], SHACL[6], lemon[7] and the LIDER guidelines[8] as well as practical services such as LingHub[9], LODVader[10], RDFUnit[11] (just to mention a few).
As a glimmer of hope, the talk will conclude with the recent efforts of the DBpedia community to coordinate the creation of a public data infrastructure for a large, multilingual, semantic knowledge graph, which is, of course, not a panacea or golden hammer, but a potential step in the right direction to bridge the gap between language and knowledge.
________________
[1] Towards a Linguistic Linked Open Data cloud : The Open Linguistics Working Group (http://www.atala.org/IMG/pdf/Chiarcos-TAL52-3.pdf ) Christian Chiarcos, Sebastian Hellmann, and Sebastian Nordhoff. TAL 52(3):245 - 275 (2011)
[2] Linked Data in Linguistics. Representing Language Data and Metadata (http://www.springer.com/computer/ai/book/978-3-642-28248-5 ) Christian Chiarcos, Sebastian Nordhoff, and Sebastian Hellmann (Eds.). Springer, Heidelberg, (2012)
[3] http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core
[4] https://www.w3.org/community/ld4lt/
[5] http://wiki.dbpedia.org/projects/dbpedia-dataid
[6] http://w3c.github.io/data-shapes/shacl/
[7] https://www.w3.org/2016/05/ontolex/
[8] http://www.lider-project.eu/guidelines
[9] http://linghub.lider-project.eu/
[10] http://lodvader.aksw.org/
[11] http://aksw.org/Projects/RDFUnit
Linked Open Data Publications through Wikidata & Persistent Identification... (PACKED vzw)
In order for museums to truly reap the benefits of publishing their collections online in a sustainable way, PACKED vzw presents the results of its Linked open data project as a best practice guide for the Flemish heritage sector.
Observations on Annotations – From Computational Linguistics and the World Wi... (Georg Rehm)
Georg Rehm. Observations on Annotations – From Computational Linguistics and the World Wide Web to Artificial Intelligence and back again. Annotation in Scholarly Editions and Research: Function – Differentiation – Systematization, University of Wuppertal, Germany. February 20-22, 2019. Invited keynote talk.
A summary of DBpedia's History and a detailed analysis of challenges and solutions.
We show how the Linked Data Cloud evolved around DBpedia and also what problems we and other data projects encountered. We included a section on the new solutions that will lead DBpedia into a bright future.
GIS Day 2015: Geoinformatics, Open Source and Videos - a library perspective (Peter Löwe)
Digital audiovisual content has become an important communication channel in Science. The TIB|AV-Portal for audiovisual scientific-technical information meets the requirements to preserve such content and to provide innovative services for search and retrieval. Quality checked audiovisual content from Open Source Geoinformatics communities is constantly being acquired for the portal as a part of TIB's mission to preserve relevant content in applied computer sciences for science, industry, and the general public.
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ... (Heiko Paulheim)
In the past years, sophisticated methods for extracting knowledge graphs from Wikipedia, like DBpedia, YAGO, and CaLiGraph, have been developed. In this talk, I revisit some of these methods and examine if and how they can be replaced by prompting a large language model like ChatGPT.
Weakly Supervised Learning for Fake News Detection on Twitter (Heiko Paulheim)
The problem of automatic detection of fake news in social media, e.g., on Twitter, has recently drawn some attention. Although, from a technical perspective, it can be regarded as a straightforward, binary classification problem, the major challenge is the collection of large enough training corpora, since manual annotation of tweets as fake or non-fake news is an expensive and tedious endeavor. In this paper, we discuss a weakly supervised approach, which automatically collects a large-scale, but very noisy training dataset comprising hundreds of thousands of tweets. During collection, we automatically label tweets by their source, i.e., trustworthy or untrustworthy source, and train a classifier on this dataset. We then use that classifier for a different classification target, i.e., the classification of fake and non-fake tweets. Although the labels are not accurate according to the new classification target (not all tweets by an untrustworthy source need to be fake news, and vice versa), we show that despite this noisy, inaccurate dataset, it is possible to detect fake news with an F1 score of up to 0.9.
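A minimal sketch of the weak-supervision idea follows (toy data, not the paper's pipeline or features): tweets are labeled by the trustworthiness of their source, a text classifier is trained on those noisy labels, and the same model is then applied to the actual fake/non-fake target.

```python
# Sketch of weak supervision: label training tweets by source trustworthiness,
# then reuse the trained classifier for fake/non-fake prediction (toy data only).
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

tweets = ["official report confirms new figures",
          "SHOCKING: you won't believe this cure",
          "study published in peer-reviewed journal",
          "they don't want you to know the truth"]
source_label = [0, 1, 0, 1]   # 0 = trustworthy source, 1 = untrustworthy source (noisy proxy labels)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(tweets, source_label)

# At prediction time, the same model is applied to the actual target: fake vs. non-fake tweets.
print(clf.predict(["you won't believe what happened next"]))
```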
Combining Ontology Matchers via Anomaly Detection (Heiko Paulheim)
In ontology alignment, there is no single best performing matching algorithm for every matching problem. Thus, most modern matching systems combine several base matchers and aggregate their results into a final alignment. This combination is often based on simple voting or averaging, or uses existing matching problems for learning a combination policy in a supervised setting. In this paper, we present the COMMAND matching system, an unsupervised method for combining base matchers, which uses anomaly detection to produce an alignment from the results delivered by several base matchers. The basic idea of our approach is that in a large set of potential mapping candidates, the scarce actual mappings should be visible as anomalies against the majority of non-mappings. The approach is evaluated on different OAEI datasets and shows a competitive performance with state-of-the-art systems.
Gathering Alternative Surface Forms for DBpedia Entities (Heiko Paulheim)
Wikipedia is often used as a source of surface forms, or alternative reference strings for an entity, required for entity linking, disambiguation, or coreference resolution tasks. Surface forms have been extracted in a number of works from Wikipedia labels, redirects, disambiguations, and anchor texts of internal Wikipedia links, which we complement with anchor texts of external Wikipedia links from the Common Crawl web corpus. We tackle the problem of the quality of Wikipedia-based surface forms, which has not been raised before. We create a gold standard for the dataset quality evaluation, which reveals the surprisingly low precision of the Wikipedia-based surface forms. We propose filtering approaches that allow boosting the precision from 75% to 85% for a random entity subset, and from 45% to more than 65% for the subset of popular entities. The filtered surface form dataset as well as the gold standard are made publicly available.
Mining the Web of Linked Data with RapidMiner (Heiko Paulheim)
Lots of data from different domains is published as Linked Open Data. While there are quite a few browsers for that data, as well as intelligent tools for particular purposes, a versatile tool for deriving additional knowledge by mining the Web of Linked Data is still missing. In this challenge entry, we introduce the RapidMiner Linked Open Data extension. The extension hooks into the powerful data mining platform RapidMiner, and offers operators for accessing Linked Open Data in RapidMiner, allowing for using it in sophisticated data analysis workflows without the need to know SPARQL or RDF. As an example, we show how statistical data on scientific publications, published as an RDF data cube, can be linked to further datasets and analyzed using additional background knowledge from various LOD datasets.
Data Mining with Background Knowledge from the Web - Introducing the RapidMin... (Heiko Paulheim)
Many data mining problems can be solved better if more background knowledge is added: predictive models can become more accurate, and descriptive models can reveal more interesting findings. However, collecting and integrating background knowledge is tedious manual work. In this paper, we introduce the RapidMiner Linked Open Data Extension, which can extend a dataset at hand with additional attributes drawn from the Linked Open Data (LOD) cloud, a large collection of publicly available datasets on various topics. The extension contains operators for linking local data to open data in the LOD cloud, and for augmenting it with additional attributes. In a case study, we show that the prediction error of car fuel consumption can be reduced by 50% by adding additional attributes, e.g., describing the automobile layout and the car body configuration, from Linked Open Data.
Detecting Incorrect Numerical Data in DBpedia (Heiko Paulheim)
DBpedia is a central hub of Linked Open Data (LOD). Being based on crowd-sourced contents and heuristic extraction methods, it is not free of errors. In this paper, we study the application of unsupervised numerical outlier detection methods to DBpedia, using the Interquartile Range (IQR), Kernel Density Estimation (KDE), and various dispersion estimators, combined with different semantic grouping methods. Our approach reaches 87% precision, and has led to the identification of 11 systematic errors in the DBpedia extraction framework.
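To illustrate the simplest of these detectors, here is a small IQR-based outlier check on made-up numeric property values for one semantic group; the actual approach combines several detectors and grouping strategies.

```python
# Sketch of IQR-based outlier detection for a numeric property,
# applied to values of entities in one semantic group; values are made up.
import numpy as np

values = np.array([8.0, 12.0, 10.5, 9.8, 11.2, 950.0, 10.1])  # one implausible value

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]
print(outliers)  # -> [950.]
```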
Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection (Heiko Paulheim)
Links between datasets are an essential ingredient of Linked Open Data. Since the manual creation of links is expensive at large-scale, link sets are often created using heuristics, which may lead to errors. In this paper, we propose an unsupervised approach for finding erroneous links. We represent each link as a feature vector in a higher dimensional vector space, and find wrong links by means of different multi-dimensional outlier detection methods. We show how the approach can be implemented in the RapidMiner platform using only off-the-shelf components, and present a first evaluation with real-world datasets from the Linked Open Data cloud showing promising results, with an F-measure of up to 0.54, and an area under the ROC curve of up to 0.86.
Extending DBpedia with Wikipedia List Pages (Heiko Paulheim)
Thanks to its wide coverage and general-purpose ontology, DBpedia is a prominent dataset in the Linked Open Data cloud. DBpedia's content is harvested from Wikipedia's infoboxes, based on manually created mappings. In this paper, we explore the use of a promising source of knowledge for extending DBpedia, i.e., Wikipedia's list pages. We discuss how a combination of frequent pattern mining and natural language processing (NLP) methods can be leveraged in order to extend both the DBpedia ontology, as well as the instance information in DBpedia. We provide an illustrative example to show the potential impact of our approach and discuss its main challenges.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Adjusting primitives for graph : SHORT REPORT / NOTES (Subhajit Sahu)
Notes on adjusting primitives for graph algorithms, like PageRank. Compressed Sparse Row (CSR) is an adjacency-list-based graph representation (a small CSR illustration follows the notes below).
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
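For readers unfamiliar with the CSR layout referenced in the notes above, the following small sketch illustrates the data structure and the multiply primitive, using scipy as a stand-in for the OpenMP/CUDA code actually benchmarked there; the example graph is made up.

```python
# Illustration of the Compressed Sparse Row (CSR) layout used as the adjacency
# structure in the notes above (scipy stands in for the C++/CUDA implementation).
import numpy as np
from scipy.sparse import csr_matrix

# adjacency matrix of a 4-node graph (row i lists the out-neighbours of node i)
dense = np.array([[0, 1, 1, 0],
                  [0, 0, 1, 0],
                  [1, 0, 0, 1],
                  [0, 0, 0, 0]])
A = csr_matrix(dense)
print(A.indptr)    # row offsets, e.g. [0 2 3 5 5]
print(A.indices)   # column indices of the non-zeros, row by row
x = np.ones(4)
print(A @ x)       # sparse matrix-vector multiply, the "multiply" primitive above
```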
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
2. AI Ingredients
"OK, Google, when will the final season of Money Heist be on Netflix?"
"The fifth season of Money Heist will be released on September 3rd and December 3rd."
3. AI Ingredients
"Are there any other series by the same creator?"
"Álex Pina has also created White Lines, The Pier, and Locked Up."
4. AI Ingredients
● What does an AI system like Google Assistant need?
– Speech recognition, interpretation, and synthesis
– A knowledge base
– Logical reasoning
– …
● ...there are many other ingredients to AI
– e.g., machine learning, computer vision, ...
5. AI Ingredients
● Four components of AI required to pass a Turing test [1]:
– Natural language processing
– Knowledge representation
– Automated reasoning
– Machine learning
[1] Russell, Norvig: Artificial Intelligence, A Modern Approach
6. It’s an Unequal Field
[1] Google Trends, 2021
7. Human Intelligence Ingredients
● System 1 (think: 2+2)
– Fast
– Intuitive
– Unconscious
– Prone to biases
● System 2 (think: 342+735)
– Slow
– Explicit
– Conscious
– Tedious (hence: lazy)
[1] Kahneman: Thinking, Fast and Slow
8. Fast and Slow AI
● Kahneman for AI [1]
– System 1: ML, statistics, heuristics
– System 2: explicit reasoning, knowledge representation, explanations
● Neuro-symbolic or hybrid AI uses both components
[1] Booch et al. (AAAI 2021): Thinking Fast and Slow in AI
9. Knowledge Graphs for AI
[Figure: knowledge graph fragment answering "OK, Google, when will the final season of Money Heist be on Netflix?" – the series is connected via "has part" edges to its seasons, which carry "release date" values 2020-04-03 and 2021-09-03]
10. Knowledge Graphs for AI
[Figure: the same knowledge graph fragment extended with "creator" and "cast" edges, used to answer the follow-up question "Are there any other series by the same creator?"]
11. AIs on the Shoulders of Giants
● Current knowledge graphs [1]
– Open data
– Millions of entities
– Billions of facts
● Facilitate AIs' access to
– Large-scale factual knowledge (note: not common sense knowledge)
– e.g., for explanations
[1] Heist et al. (2021): Knowledge Graphs on the Web – An Overview
12. Knowledge What?
• Knowledge Graphs on the Web
• Everybody talks about them, but what is a Knowledge Graph?
Journal paper review (Natasha Noy, Google, June 2015):
"Please define what a knowledge graph is – and what it is not."
13. Knowledge Graphs for AI
● Approaches since the 80s
– Cyc (and OpenCyc)
– DBpedia & YAGO
– Wikidata
– Linked Open Data Cloud
14. Knowledge What?
• Working definition [1]: a Knowledge Graph
– mainly describes instances and their relations in the world
• Unlike an ontology
• Unlike, e.g., WordNet
– Defines possible classes and relations in a schema or ontology
• i.e., we know the types of things that are in our graphs
– Has a flexible schema
• Unlike a relational database
– Covers various domains
• Unlike, e.g., Geonames
[1] Paulheim (2017): Knowledge Graph Refinement – A Survey of Approaches and Evaluation Methods
16. Knowledge What?
● Google uses the knowledge graph...
– for augmenting and improving search results
– for integrating data from various sources
● Some numbers [1]
– >5 billion entities
– >500 billion facts (i.e., edges)
[1] https://blog.google/products/search/about-knowledge-graph-and-knowledge-panels/
17. A Bit of History
• Cyc (started by Douglas Lenat in 1984)
– Encyclopedic collection of knowledge
– Estimation: 350 person years and 250,000 rules should do the job of collecting the essence of the world's knowledge
• The present (as of June 2017)
– ~1,000 person years, $120M total development cost
– 21M axioms and rules
18. A Bit of Business
● Does that scale?
– A few back-of-the-envelope calculations [1]
● Cyc contains...
– 21M statements and rules (roughly: "edges")
– $120M development costs
→ $5.71 per statement
● Google's Knowledge Graph
– 500 billion statements
– $2.571 trillion
● (that's ~15 times Google's net revenue in 2020)
[1] Paulheim (2018): How much is a Triple? Estimating the Cost of Knowledge Graph Creation
19. Crowdsourcing Knowledge Graphs
● Freebase (launched 2007)
– Collaborative editing (like Wikipedia)
– Acquired by Google in 2010
– Shut down in 2016
● Wikidata (launched 2012)
– Free, collaborative
– Collects data from different sources
– Today: one of the largest publicly available,
free knowledge graphs
20. The Business Side of Crowdsourcing Knowledge Graphs
● Freebase: created by laymen
– Assumption: adding a statement to Freebase equals adding a sentence to Wikipedia
• English Wikipedia up to April 2011: 41M working hours [1]
• size in April 2011: 3.6M pages, avg. 36.4 sentences each
• Using US minimum wage: $2.25 per sentence
→ $2.25 per statement
● Total cost of creating Freebase: $6.75B
– Acquired by Google for $60-$300M
[1] Geiger, Halfaker (2013): Using edit sessions to measure participation in Wikipedia
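The slide's numbers can be reproduced with a short back-of-the-envelope calculation; note that the hourly wage ($7.25, the 2011 US federal minimum) and the total number of Freebase statements (roughly 3 billion) are assumptions not stated on the slide.

```python
# Back-of-the-envelope reconstruction of the slide's Freebase estimate.
# Assumptions not on the slide: US minimum wage of $7.25/hour (2011),
# and roughly 3 billion statements in Freebase.
hours = 41e6                        # working hours spent on English Wikipedia up to April 2011
wage = 7.25                         # assumed US minimum wage, $/hour
sentences = 3.6e6 * 36.4            # pages * average sentences per page
cost_per_sentence = hours * wage / sentences
print(round(cost_per_sentence, 2))  # ~2.27, rounded on the slide to $2.25
print(2.25 * 3e9)                   # ~6.75e9, i.e. the slide's $6.75B total
```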
21. Towards Automatic Knowledge Graph Construction
● Modern AI needs Massive Amounts of Knowledge
● Manual/crowdsourced creation
– Costly
– Does not work at scale
"OK, Google, when will the final season of Money Heist be on Netflix?"
22. Creating Knowledge Graphs from Wikipedia
● Why start from scratch?
– If we already have (semi-)structured knowledge
at our fingertips
● Structured knowledge in Wikipedia
– Infoboxes (cf. Google’s Knowledge Panels)
– Categories
23. Turning Wikipedia into a Knowledge Graph
● First Observation:
– Many Wikipedia pages are about an entity
– For example: people, places, organizations, works…
24. Turning Wikipedia into a Knowledge Graph
● Further Observations:
– Articles are interlinked
– Some links have explicit meaning
– There are also numbers and dates
25. Turning Wikipedia into a Knowledge Graph
● Putting the Pieces Together
[Figure: example graph – The_Downward_Spiral has artist Nine_Inch_Nails and was released 1994-03-08; Trent_Reznor is a member of Nine_Inch_Nails and producer of The_Downward_Spiral]
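Written out as RDF, the figure's fragment corresponds to a handful of triples; the namespaces below are illustrative placeholders rather than the actual DBpedia URIs, and the edge directions are guessed from the figure.

```python
# The figure's graph fragment written out as RDF triples (illustrative namespaces,
# not the exact DBpedia vocabulary).
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import XSD

EX = Namespace("http://example.org/resource/")
P = Namespace("http://example.org/property/")

g = Graph()
g.add((EX.The_Downward_Spiral, P.artist, EX.Nine_Inch_Nails))
g.add((EX.The_Downward_Spiral, P.released, Literal("1994-03-08", datatype=XSD.date)))
g.add((EX.Nine_Inch_Nails, P.member, EX.Trent_Reznor))
g.add((EX.The_Downward_Spiral, P.producer, EX.Trent_Reznor))

print(g.serialize(format="turtle"))
```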
26. Knowledge Graphs based on Wikipedia
● DBpedia: launched 2007
– Mapping infoboxes to node classes (e.g., „Person“, „Album“)
– Mapping infobox keys to edge labels (e.g., „artist“, „member“)
– Crowd-sourced mappings
● YAGO: launched 2008
– Using article categories in Wikipedia as classes
– Mapping infobox keys to edge labels
– Expert-created mappings
– Also contains temporal facts
27. Again: A Bit of Business
● DBpedia: 4.9M LOC, 2.2M LOC for mappings
– software project development: ~37 LOC per hour (Devanbu et al., 1996)
– we use German PhD salaries as a cost estimate
→ 1.85c per statement
● We save by a factor of >100!
28. How Big is Big Enough?
● DBpedia and YAGO
– Constrained by the size (i.e., number of entries) of Wikipedia
– Currently ~6M
● Commonly used recommender system benchmarks have a coverage of… [1]
– ...85% for movies
– ...63% for music artists
– ...31% for books
https://grouplens.org/datasets/
[1] Di Noia et al.: SPRank: Semantic Path-based Ranking for Top-n Recommendations using Linked Open Data. In: ACM TIST, 2016
30. Exploiting More Structure in Wikipedia
● Listings and categories also are structures
● They commonly share…
– a type (e.g., musician, book, …) and/or
– a common relation
● member of the same band
● book by the same author
● actor playing in the same film
… e.g., to
● the entity that represents the page
● ...or an entity mentioned somewhere
31. Exploiting More Structure in Wikipedia
● CaLiGraph [1]
– Extracts entities from listings
– Derives definitions from categories and list titles
● e.g., „Death Metal Bands“ → genre = Death_Metal
● 15M entities
– incl. 8M from listings
[1] Heist, Paulheim: Information Extraction from Co-Occurring Similar Entities. In: The Web Conference, 2021
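As a toy illustration of the kind of pattern behind such definitions (not CaLiGraph's actual parsing, which relies on far more sophisticated linguistic and statistical analysis), a category name like "Death Metal Bands" can be turned into a type and relation assertion:

```python
# Toy illustration of deriving an axiom from a category name;
# the real CaLiGraph pipeline works quite differently.
def axiom_from_category(category_name):
    # "Death Metal Bands" -> (type=Band, genre=Death_Metal)
    tokens = category_name.split()
    if tokens[-1].lower() == "bands":
        genre = "_".join(tokens[:-1])
        return {"type": "Band", "genre": genre}
    return None

print(axiom_from_category("Death Metal Bands"))
# -> {'type': 'Band', 'genre': 'Death_Metal'}
```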
33. Beyond Wikipedia
● Regarding DBpedia and YAGO as a black box
– Input: a copy of Wikipedia
– Output: a knowledge graph
● If we have that black box
– Can’t we input any Wiki?
Magic ;-)
34. Beyond Wikipedia
● There are thousands of Wikis
– Plus farms that host thousands themselves
● One of the largest farms: Fandom
35. Beyond Wikipedia
● Integration of Information from Multiple Wikis
● Challenges:
– Duplicate detection
– Few conventions
– Contradictions
[1] Hertling, Paulheim (2020): DBkWik: Extracting and Integrating Knowledge from Thousands of Wikis. Knowledge and Information Systems 62(6): 2169-2190
36. The Story so Far
● We’ve come from AI building blocks:
– Natural language processing
– Knowledge representation
– Automated reasoning
– Machine learning
● How do we put the blocks together?
37. Using Knowledge Graphs as an Ingredient in AI
● Automated Reasoning
– The combination of reasoning and knowledge graphs has a long tradition
– Think of rules on the knowledge graph
– Example: artists on metal albums are metal artists
<Y artist X>, <Y genre Z> → <X genre Z>
[Figure: The_Downward_Spiral has artist Nine_Inch_Nails and genre Metal; the rule infers genre Metal for Nine_Inch_Nails]
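The semantics of the slide's rule can be illustrated on a toy triple set in a few lines; this is only meant to show what the rule infers, not how a real reasoner works.

```python
# Applying the rule <Y artist X>, <Y genre Z> -> <X genre Z> to a toy triple set.
triples = {
    ("The_Downward_Spiral", "artist", "Nine_Inch_Nails"),
    ("The_Downward_Spiral", "genre", "Metal"),
}

inferred = {
    (x, "genre", z)
    for (y1, p1, x) in triples if p1 == "artist"
    for (y2, p2, z) in triples if p2 == "genre" and y1 == y2
}
print(inferred)  # {('Nine_Inch_Nails', 'genre', 'Metal')}
```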
38. Using Knowledge Graphs as an Ingredient in AI
● Knowledge Graphs are graphs
– hence the name ;-)
● Most learning tools are tabular
39. Using Knowledge Graphs as an Ingredient in AI
● How to create tabular representations of entities in knowledge graphs?
– Easy: data values (e.g., release date)
– Easy: edges with single occurrences (e.g., birth place)
– Complex: edges with multiple occurrences (e.g., starring)
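One classical propositionalization strategy, sketched on made-up data: single-valued properties become ordinary columns, while multi-valued properties such as "starring" are unrolled into binary indicator features.

```python
# Single-valued edges become columns, multi-valued edges become
# binary indicator features (toy data, one of several possible strategies).
import pandas as pd

entities = {
    "Film_A": {"release": 1994, "starring": {"Actor_1", "Actor_2"}},
    "Film_B": {"release": 2001, "starring": {"Actor_2"}},
}

rows = []
for name, props in entities.items():
    row = {"entity": name, "release": props["release"]}
    for actor in props["starring"]:
        row[f"starring_{actor}"] = 1
    rows.append(row)

print(pd.DataFrame(rows).fillna(0))
```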
40. Hybrid AI with Knowledge Graphs
● Graphs to vectors!
– Representation learning, aka embeddings
● Approaches (not limited to)
– Language modeling adaptations (RDF2vec, KGlove, …)
– Tensor factorization (RESCAL, DistMult, ...)
– Link prediction (TransE and its descendants)
– Graph Neural Networks (e.g., GCN)
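As a pointer to what the link-prediction family does, the TransE intuition ("head plus relation should land near tail") can be written in two lines of numpy; the vectors below are random placeholders, and a real model would be trained, e.g., with a margin loss over corrupted triples.

```python
# TransE scoring intuition: a triple (h, r, t) is plausible
# if head + relation is close to tail in the embedding space (random toy vectors).
import numpy as np

rng = np.random.default_rng(0)
h, r, t = (rng.normal(size=50) for _ in range(3))

score = -np.linalg.norm(h + r - t)   # higher (less negative) = more plausible
print(score)
```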
41. Knowledge Graph Embeddings
● A recent hype trend
– Each node (and edge) in the graph is represented as a point
– Similar nodes are close in that space
42. Knowledge Graph Embeddings
● What do we win?
– Each entity is a numeric vector
– Learning tools can be used easily
● What do we lose?
– Dimensions do not carry meaning anymore
43. Quo Vadis?
● Knowledge Graphs are also consumable for humans
– (think: explainable AI)
– but vectors are not!
● We are missing an important building block
– in Kahneman's terms: we forged system 2 into a new system 1 instead
– Holy grail: interpretable embeddings
44. Summary
● AI Ingredients
– AIs need knowledge
– e.g., conversational agents need to know about entities in the world
● Knowledge Graphs
– One representation paradigm for such knowledge
– There are plenty of freely available KGs
– Can be used for explainable AI