1) Knowledge graphs are a representation of knowledge that is useful for modern AI systems. They describe entities and their relationships in a way that facilitates logical reasoning and access to large-scale factual knowledge.
2) Knowledge graphs can be created through manual effort but recent approaches aim to construct them automatically at large scale, such as extracting knowledge graphs from Wikipedia and other wikis.
3) Knowledge graphs are useful ingredients for AI, supporting natural language processing, automated reasoning, and machine learning. For machine learning, knowledge graph embeddings represent entities as vectors; however, the individual dimensions of those vectors no longer carry meaning, which makes them hard for humans to interpret.
2. AI Ingredients
OK, Google, when will the final season of Money Heist be on Netflix?
The fifth season of Money Heist will be released on September 3rd and December 3rd.
3. AI Ingredients
Are there any other series by the same creator?
Álex Pina has also created White Lines, The Pier, and Locked Up.
4. AI Ingredients
● What does an AI system like Google Assistant need?
– Speech recognition, interpretation, and synthesis
– A knowledge base
– Logical reasoning
– …
● ...there are many other ingredients to AI
– e.g., machine learning, computer vision, ...
16.11.21
Heiko Paulheim 4
5. AI Ingredients
● Four components of AI required to pass a Turing test [1]:
– Natural language processing
– Knowledge representation
– Automated reasoning
– Machine learning
[1] Russell, Norvig: Artificial Intelligence, A Modern Approach
6. It’s an Unequal Field
[1] Google Trends, 2021
7. Human Intelligence Ingredients
● System 1 (think: 2+2)
– Fast
– Intuitive
– Unconscious
– Prone to biases
● System 2 (think: 342+735)
– Slow
– Explicit
– Conscious
– Tedious (hence: lazy)
[1] Kahneman: Thinking, Fast and Slow
8. Fast and Slow AI
● Kahneman for AI [1]
– System 1: ML, statistics, heuristics
– System 2: Explicit reasoning, knowledge representation, explanations
● Neuro-symbolic or hybrid AI uses both components
[1] Booch et al. (AAAI 2021): Thinking Fast and Slow in AI
9. Knowledge Graphs for AI
[Figure: excerpt of a knowledge graph around Money Heist, with two "has part" edges and two "release date" edges (2021-09-03 and 2020-04-03), used to answer "OK, Google, when will the final season of Money Heist be on Netflix?"]
10. Knowledge Graphs for AI
[Figure: the same knowledge graph excerpt, extended with "creator" and "cast" edges, used to answer "Are there any other series by the same creator?"]
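The two questions from the dialogue can be answered by following edges in such a graph. The following is a minimal sketch, not how Google Assistant works: the node names (e.g., `Money_Heist_S5`), the assignment of the two release dates to seasons, and the helper functions are assumptions made for illustration; only the edge labels, dates, and series names come from the slides.

```python
# Toy triple store: (subject, predicate, object).
# Node identifiers are hypothetical; edge labels follow the figure.
TRIPLES = [
    ("Money_Heist", "has part", "Money_Heist_S4"),
    ("Money_Heist", "has part", "Money_Heist_S5"),
    ("Money_Heist_S4", "release date", "2020-04-03"),
    ("Money_Heist_S5", "release date", "2021-09-03"),
    ("Money_Heist", "creator", "Álex_Pina"),
    ("White_Lines", "creator", "Álex_Pina"),
    ("The_Pier", "creator", "Álex_Pina"),
    ("Locked_Up", "creator", "Álex_Pina"),
]


def objects(subject, predicate):
    """All objects o with an edge subject --predicate--> o."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects s with an edge s --predicate--> obj."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def latest_release(series):
    """Release date of the most recent part of a series."""
    dates = [
        date
        for part in objects(series, "has part")
        for date in objects(part, "release date")
    ]
    return max(dates)  # ISO dates sort chronologically as strings


def other_series_by_creator(series):
    """Other works that share a creator with the given series."""
    result = []
    for creator in objects(series, "creator"):
        for work in subjects("creator", creator):
            if work != series and work not in result:
                result.append(work)
    return result


print(latest_release("Money_Heist"))
print(other_series_by_creator("Money_Heist"))
```

The second question only works because the first one established which entity "the same creator" refers to; the graph itself stores no dialogue state.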
11. AIs on the Shoulders of Giants
● Current knowledge graphs [1]
– Open data
– Millions of entities
– Billions of facts
● They facilitate AI access to
– Large-scale factual knowledge (note: not common sense knowledge)
– e.g., for explanations
[1] Heist et al. (2021): Knowledge Graphs on the Web – An Overview
12. Knowledge What?
• Knowledge Graphs on the Web
• Everybody talks about them, but what is a Knowledge Graph?
Journal Paper Review, (Natasha Noy, Google, June 2015):
“Please define what a knowledge graph is – and what it is not.”
13. Knowledge Graphs for AI
● Approaches since the 80s
– Cyc (and OpenCyc)
– DBpedia & YAGO
– Wikidata
– Linked Open Data Cloud
14. Knowledge What?
• Working definition [1]: a Knowledge Graph
– mainly describes instances and their relations in the world
• Unlike an ontology
• Unlike, e.g., WordNet
– Defines possible classes and relations in a schema or ontology
• i.e., we know the types of things that are in our graphs
– Has a flexible schema
• Unlike a relational database
– Covers various domains
• Unlike, e.g., Geonames
[1] Paulheim (2017): Knowledge Graph Refinement – A Survey of Approaches and Evaluation Methods
16. Knowledge What?
● Google uses the knowledge graph...
– for augmenting and improving search results
– for integrating data from various sources
● Some numbers [1]
– >5 billion entities
– >500 billion facts (i.e., edges)
[1] https://blog.google/products/search/about-knowledge-graph-and-knowledge-panels/
17. A Bit of History
• Cyc (started by Douglas Lenat in 1984)
– Encyclopedic collection of knowledge
– Estimation: 350 person years and 250,000 rules should do the job of collecting the essence of the world’s knowledge
• The present (as of June 2017)
– ~1,000 person years, $120M total development cost
– 21M axioms and rules
18. A Bit of Business
● Does that Scale?
– A few back-of-the-envelope calculations [1]
● Cyc contains...
– 21M statements and rules (roughly: “edges”)
– $120M development costs
→ $5.71 per statement
● Google’s Knowledge Graph
– 500 billion statements
– $2.571 trillion
● (that’s ~15 times Google’s net revenue in 2020)
[1] Paulheim (2018): How much is a Triple? Estimating the Cost of Knowledge Graph Creation.
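The per-statement figure for Cyc is a simple division of the two numbers on the slide. A minimal sketch of that arithmetic follows; the function name is made up for illustration, and treating all 21M statements and rules as equally costly is the simplifying assumption behind the estimate.

```python
def cost_per_statement(total_cost_usd, num_statements):
    """Average cost of one statement, assuming all statements cost the same."""
    return total_cost_usd / num_statements


# Numbers from the slide: $120M development cost, 21M statements and rules.
cyc = cost_per_statement(120e6, 21e6)
print(f"${cyc:.2f} per statement")
```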
16.11.21
Heiko Paulheim 18
19. Crowdsourcing Knowledge Graphs
● Freebase (launched 2007)
– Collaborative editing (like Wikipedia)
– Acquired by Google in 2010
– Shut down in 2016
● Wikidata (launched 2012)
– Free, collaborative
– Collects data from different sources
– Today: one of the largest publicly available, free knowledge graphs
20. The Business Side of Crowdsourcing Knowledge Graphs
● Freebase: created by laymen
– Assumption: adding a statement to Freebase equals adding a sentence to Wikipedia
• English Wikipedia up to April 2011: 41M working hours [1]
• size in April 2011: 3.6M pages, avg. 36.4 sentences each
• Using US minimum wage: $2.25 per sentence
→ $2.25 per statement
● Total cost of creating Freebase: $6.75B
– Acquired by Google for $60-$300M
[1] Geiger, Halfaker (2013): Using edit sessions to measure participation in Wikipedia
21. Towards Automatic Knowledge Graph Construction
● Modern AI needs Massive Amounts of Knowledge
● Manual/crowdsourced creation
– Costly
– Does not work at scale
OK, Google, when will the final season of Money Heist be on Netflix?
22. Creating Knowledge Graphs from Wikipedia
● Why start from scratch?
– If we already have (semi-)structured knowledge at our fingertips
● Structured knowledge in Wikipedia
– Infoboxes (cf. Google’s Knowledge Panels)
– Categories
23. Turning Wikipedia into a Knowledge Graph
● First Observation:
– Many Wikipedia pages are about an entity
– For example: people, places, organizations, works…
24. Turning Wikipedia into a Knowledge Graph
● Further Observations:
– Articles are interlinked
– Some links have explicit meaning
– There are also numbers and dates
25. Turning Wikipedia into a Knowledge Graph
● Putting the Pieces Together
[Figure: graph excerpt with the nodes Nine_Inch_Nails, The_Downward_Spiral, Trent_Reznor, and the date 1994-03-08, connected by edges labeled "artist", "released", "member", and "producer"]
26. Knowledge Graphs based on Wikipedia
● DBpedia: launched 2007
– Mapping infoboxes to node classes (e.g., “Person”, “Album”)
– Mapping infobox keys to edge labels (e.g., “artist”, “member”)
– Crowd-sourced mappings
● YAGO: launched 2008
– Using article categories in Wikipedia as classes
– Mapping infobox keys to edge labels
– Expert-created mappings
– Also contains temporal facts
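The mapping idea can be sketched in a few lines. This is an illustration only and does not reproduce DBpedia's or YAGO's actual mapping languages or extraction code: the dictionaries, the function `infobox_to_triples`, the infobox name strings, and the unmapped `cover` key are assumptions; the entity, class, and edge labels are taken from the slides.

```python
# Hypothetical mapping tables in the spirit of the slide:
# infobox templates map to node classes, infobox keys map to edge labels.
INFOBOX_TO_CLASS = {
    "Infobox album": "Album",
    "Infobox person": "Person",
}
KEY_TO_EDGE = {
    "artist": "artist",
    "released": "released",
    "producer": "producer",
}


def infobox_to_triples(page_title, infobox_name, infobox):
    """Turn one parsed infobox into (subject, predicate, object) triples."""
    triples = []
    node_class = INFOBOX_TO_CLASS.get(infobox_name)
    if node_class is not None:
        triples.append((page_title, "type", node_class))
    for key, value in infobox.items():
        edge = KEY_TO_EDGE.get(key)
        if edge is not None:  # keys without a mapping are skipped
            triples.append((page_title, edge, value))
    return triples


triples = infobox_to_triples(
    "The_Downward_Spiral",
    "Infobox album",
    {"artist": "Nine_Inch_Nails", "released": "1994-03-08", "cover": "cover.jpg"},
)
for triple in triples:
    print(triple)
```

Keys without a mapping produce no edge, which is one reason why the coverage of the mappings matters for the size of the resulting graph.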
27. Again: A Bit of Business
● DBpedia: 4.9M LOC, 2.2M LOC for mappings
– software project development: ~37 LOC per hour (Devanbu et al., 1996)
– we use German PhD salaries as a cost estimate
→ 1.85c per statement
● We save by a factor of >100!
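The savings factor follows from the per-statement costs quoted on the slides ($5.71 for Cyc, $2.25 for Freebase, 1.85 cents for DBpedia). A small sketch of the comparison, with the dictionary and function names chosen here for illustration:

```python
# Per-statement costs in USD, as quoted on the slides.
COST_PER_STATEMENT = {
    "Cyc": 5.71,
    "Freebase": 2.25,
    "DBpedia": 0.0185,
}


def savings_factor(manual, automatic):
    """How many times cheaper the automatic approach is per statement."""
    return COST_PER_STATEMENT[manual] / COST_PER_STATEMENT[automatic]


for manual in ("Cyc", "Freebase"):
    print(f"{manual} vs. DBpedia: factor {savings_factor(manual, 'DBpedia'):.0f}")
```

Both comparisons stay above 100, which is the claim made on the slide.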
28. How Big is Big Enough?
● DBpedia and YAGO
– Constrained by the size (i.e., number of entries) of Wikipedia
– Currently ~6M
● Commonly used recommender system benchmarks have a coverage of… [1]
– ...85% for movies
– ...63% for music artists
– ...31% for books
https://grouplens.org/datasets/
[1] Di Noia et al.: SPRank: Semantic Path-based Ranking for Top-n Recommendations using Linked Open Data. In: ACM TIST, 2016
30. Exploiting More Structure in Wikipedia
● Listings and categories are also structures
● Their entries commonly share…
– a type (e.g., musician, book, …) and/or
– a common relation
● member of the same band
● book by the same author
● actor playing in the same film
– …with the relation pointing, e.g., to
● the entity that represents the page
● ...or an entity mentioned somewhere
31. Exploiting More Structure in Wikipedia
● CaLiGraph [1]
– Extracts entities from listings
– Derives definitions from categories and list titles
● e.g., “Death Metal Bands” → genre = Death_Metal
● 15M entities
– incl. 8M from listings
[1] Heist, Paulheim: Information Extraction from Co-Occurring Similar Entities. In: The Web Conference, 2021
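To make the "Death Metal Bands" example concrete, here is a deliberately naive sketch of deriving a definition from a category title and applying it to the category's members. It is not CaLiGraph's actual method: the label lookup table, the substring matching, both function names, and the band identifiers are assumptions for illustration.

```python
# Hypothetical lookup of genre labels to genre entities.
GENRE_LABELS = {
    "death metal": "Death_Metal",
    "jazz": "Jazz",
}


def derive_definition(category_title):
    """Guess a (relation, value) restriction from a category or list title."""
    title = category_title.lower()
    for label, entity in GENRE_LABELS.items():
        if label in title:
            return ("genre", entity)
    return None


def apply_definition(category_title, members):
    """Assert the derived restriction for every member of the category."""
    definition = derive_definition(category_title)
    if definition is None:
        return []
    relation, value = definition
    return [(member, relation, value) for member in members]


# "Band_A" and "Band_B" are placeholder members, not real entities.
print(apply_definition("Death Metal Bands", ["Band_A", "Band_B"]))
```

Plain substring matching breaks quickly on real titles, so this only conveys the principle that a shared title can yield one fact per listed entity.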
33. Beyond Wikipedia
● Regarding DBpedia and YAGO as a black box
– Input: a copy of Wikipedia
– Output: a knowledge graph
● If we have that black box
– Can’t we input any Wiki?
Magic ;-)
34. Beyond Wikipedia
● There are thousands of wikis
– Plus farms that host thousands themselves
● One of the largest farms: Fandom
35. Beyond Wikipedia
● Integration of Information from Multiple Wikis
● Challenges:
– Duplicate detection
– Few conventions
– Contradictions
[1] Hertling, Paulheim (2020): DBkWik: Extracting and Integrating Knowledge from Thousands of Wikis. Knowledge and Information Systems 62(6): 2169-2190
36. The Story so Far
● We’ve come from AI building blocks:
– Natural language processing
– Knowledge representation
– Automated reasoning
– Machine learning
● How do we put the blocks together?
37. Using Knowledge Graphs as an Ingredient in AI
● Automated Reasoning
– The combination of reasoning and knowledge graphs has a long tradition
– Think of rules on the knowledge graph
– Example: artists on metal albums are metal artists
<Y artist X>, <Y genre Z> → <X genre Z>
[Figure: the nodes The_Downward_Spiral, Nine_Inch_Nails, and Metal, connected by one "artist" edge and two "genre" edges]
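The rule <Y artist X>, <Y genre Z> → <X genre Z> can be applied mechanically to a set of triples. A minimal sketch, assuming triples are stored as (subject, predicate, object) tuples; the function name is made up, and a real reasoner would handle arbitrary rules rather than this single hard-coded one.

```python
def apply_genre_rule(triples):
    """Apply <Y artist X>, <Y genre Z> -> <X genre Z> once.

    Returns only the newly inferred triples.
    """
    facts = set(triples)
    inferred = set()
    for y, predicate, x in facts:
        if predicate != "artist":
            continue
        for y2, predicate2, z in facts:
            if y2 == y and predicate2 == "genre":
                new_fact = (x, "genre", z)
                if new_fact not in facts:
                    inferred.add(new_fact)
    return inferred


facts = {
    ("The_Downward_Spiral", "artist", "Nine_Inch_Nails"),
    ("The_Downward_Spiral", "genre", "Metal"),
}
print(apply_genre_rule(facts))
```

The nested loop is quadratic in the number of triples; it is meant to show the logic, not to scale.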
38. Using Knowledge Graphs as an Ingredient in AI
● Knowledge Graphs are graphs
– hence the name ;-)
● Most learning tools are tabular
39. Using Knowledge Graphs as an Ingredient in AI
● How to create tabular representations of entities in knowledge graphs?
– Easy: data values (e.g., release date)
– Easy: edges with single occurrences (e.g., birth place)
– Complex: edges with multiple occurrences (e.g., starring)
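One possible way to handle the three cases is sketched below. Data values and single-occurrence edges become one column each; for edges with multiple occurrences, the sketch uses one binary column per value plus a count column, which is only one of several conceivable encodings. The function name, the entity names, and the date are placeholders invented for illustration.

```python
def entity_to_row(entity, triples, single_valued, multi_valued):
    """Build one feature row (column name -> value) for an entity."""
    row = {}
    for s, p, o in triples:
        if s != entity:
            continue
        if p in single_valued:
            row[p] = o  # one column per predicate
        elif p in multi_valued:
            row[f"{p}={o}"] = 1  # one binary column per (predicate, value)
            row[f"{p}_count"] = row.get(f"{p}_count", 0) + 1
    return row


# Placeholder data; only the edge labels come from the slide.
triples = [
    ("Film_X", "release date", "2000-01-01"),
    ("Film_X", "starring", "Actor_A"),
    ("Film_X", "starring", "Actor_B"),
    ("Film_Y", "starring", "Actor_A"),
]
row = entity_to_row(
    "Film_X", triples, single_valued={"release date"}, multi_valued={"starring"}
)
print(row)
```

With one binary column per value, the table grows with the number of distinct objects in the graph, which is the practical reason the third case is harder than the first two.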
40. Hybrid AI with Knowledge Graphs
● Graphs to vectors!
– Representation learning aka embeddings
● Approaches (among others)
– Language modeling adaptations (RDF2vec, KGloVe, …)
– Tensor factorization (RESCAL, DistMult, ...)
– Link prediction (TransE and its descendants)
– Graph Neural Networks (e.g., GCN)
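As one concrete instance, TransE models an edge (h, r, t) as a translation, so that h + r should land close to t. The sketch below shows only the scoring step. The two-dimensional vectors are set by hand for illustration, not learned, and training (minimizing the distance for observed edges relative to corrupted ones) is omitted.

```python
import math


def transe_distance(h, r, t):
    """Euclidean distance ||h + r - t||; smaller means more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


# Hand-set toy vectors (assumption; real embeddings are learned).
ENTITY = {
    "The_Downward_Spiral": [1.0, 0.0],
    "Nine_Inch_Nails": [1.0, 1.0],
    "Trent_Reznor": [3.0, 2.0],
}
RELATION = {"artist": [0.0, 1.0]}

plausible = transe_distance(
    ENTITY["The_Downward_Spiral"], RELATION["artist"], ENTITY["Nine_Inch_Nails"]
)
implausible = transe_distance(
    ENTITY["The_Downward_Spiral"], RELATION["artist"], ENTITY["Trent_Reznor"]
)
print(plausible, implausible)
```

Ranking candidate tails by this distance is how such a model is used for link prediction.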
41. Knowledge Graph Embeddings
● A recent hype trend
– Each node (and edge) in the graph is represented as a point
– Similar nodes are close in that space
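"Close in that space" is typically measured with a distance or similarity function over the vectors. A minimal sketch using cosine similarity; the two-dimensional vectors are invented for illustration and do not come from any trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(entity, embeddings):
    """The other entity whose vector is most similar to the given one."""
    candidates = (e for e in embeddings if e != entity)
    return max(candidates, key=lambda e: cosine(embeddings[entity], embeddings[e]))


# Invented toy vectors (assumption): two TV series and one band.
EMBEDDINGS = {
    "Money_Heist": [0.9, 0.1],
    "Locked_Up": [0.8, 0.2],
    "Nine_Inch_Nails": [0.1, 0.9],
}
print(most_similar("Money_Heist", EMBEDDINGS))
```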
42. Knowledge Graph Embeddings
● What do we win?
– Each entity is a numeric vector
– Learning tools can be used easily
● What do we lose?
– Dimensions do not carry meaning anymore
43. Quo Vadis?
● Knowledge Graphs are also consumable for humans
– (think: explainable AI)
– but vectors are not!
● We are missing an important building block
– in Kahneman’s terms: we forged system 2 into a new system 1 instead
– Holy grail: interpretable embeddings
44. Summary
● AI Ingredients
– AIs need knowledge
– e.g., conversational agents: need to know about entities in the world
● Knowledge Graphs
– One representation paradigm for such knowledge
– There are plenty of freely available KGs
– Can be used for explainable AI