1. Introduction to concepts
HINF 6230
Knowledge Graphs
Presented by: Ali Daowd,
Ph.D. candidate, NICHE research group, Faculty of Computer Science,
Dalhousie University.
Adapted by: Jaber Rad,
Ph.D. candidate, NICHE research group, Faculty of Computer Science,
Dalhousie University.
2. Agenda
• What is a knowledge graph?
• Why is it called a knowledge graph?
• Why are knowledge graphs important?
• Google knowledge graph
• Drug repurposing knowledge graph
• The knowledge graph lifecycle
• Knowledge graph creation workflow
• Property graphs
3. References
• Kejriwal, M. (2019). Domain-specific knowledge graph
construction.
• Robinson, I., Webber, J., & Eifrem, E. (2015). Graph
databases: New opportunities for connected data.
• Blumauer, A., & Nagy, H. (2020). The knowledge graph cookbook.
4. What Is A Knowledge Graph?
• Knowledge: “understanding of a science, art,
technique, or other domains”
• https://www.merriam-webster.com/dictionary/knowledge
• Graph: “a structure amounting to a set of objects in
which some pairs of objects are in some sense
related”
• https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)
• Knowledge Graph (KG): a graph of data
representing human knowledge and its underlying
semantics
5. What Is A Knowledge Graph?
• Data in KG represent real-world entities, their
attributes, and semantic relations linking them
• In its simplest form, a KG is a set of triples, each representing an
assertion, i.e., a fact about a real-world entity
• A triple is a 3-tuple {h, r, t}, where h (head) and t (tail) are entities, and r
is the relation between h and t
6. What Is A Knowledge Graph?
• “Dalhousie University is a public research university in
Nova Scotia”
7. What Is A Knowledge Graph?
• “Dalhousie University is a public research university in
Nova Scotia”
• Entity #1: Dalhousie University
• Entity #2: public research university
• Entity #3: Nova Scotia
8. What Is A Knowledge Graph?
• “Dalhousie University is a public research university in
Nova Scotia”
• {Dalhousie University, is_a, public research university}
• {Dalhousie University, located_in, Nova Scotia}
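The two triples above can be held in a very small data structure. The following is a minimal sketch, assuming a plain in-memory set is enough; the names `Triple`, `kg`, and `tails` are illustrative and do not come from any KG library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    head: str      # h: the entity the assertion is about
    relation: str  # r: the relation linking head to tail
    tail: str      # t: the entity the head is related to


# The knowledge graph is simply a set of asserted triples.
kg = {
    Triple("Dalhousie University", "is_a", "public research university"),
    Triple("Dalhousie University", "located_in", "Nova Scotia"),
}


def tails(graph, head, relation):
    """Return every tail t such that (head, relation, t) is asserted in graph."""
    return {t.tail for t in graph if t.head == head and t.relation == relation}


print(tails(kg, "Dalhousie University", "located_in"))  # {'Nova Scotia'}
```

Asking "where is Dalhousie University located?" then becomes a lookup by head and relation.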
9. Wikidata
• Wikidata.org is a great example of an open knowledge
base
• Entities are called ‘items’
• Each item has a unique identifier, label, description, and
aliases
• Relations to other entities are called ‘statements’
• Each statement is a property-value pair
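The item structure described here can be sketched as two small record types. This only illustrates the shape of an item; it is not the Wikidata data model or API. The identifier is a deliberately fake placeholder, and the alias and property names are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Statement:
    property: str  # name of the property, e.g. "instance of"
    value: str     # value of the property, often another item


@dataclass
class Item:
    identifier: str
    label: str
    description: str
    aliases: List[str] = field(default_factory=list)
    statements: List[Statement] = field(default_factory=list)


dal = Item(
    identifier="Q_PLACEHOLDER",  # hypothetical, NOT a real Wikidata identifier
    label="Dalhousie University",
    description="public research university in Nova Scotia",
    aliases=["Dal"],  # illustrative alias
    statements=[
        Statement("instance of", "public research university"),
        Statement("located in", "Nova Scotia"),
    ],
)


def to_triples(item):
    """Flatten an item's statements into (label, property, value) triples."""
    return [(item.label, s.property, s.value) for s in item.statements]


for triple in to_triples(dal):
    print(triple)
```

Each statement flattens to one triple whose head is the item itself.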
11. Why Is It Called A Knowledge Graph?
• Used to be called a knowledge base
• Terminology started to change when Google
introduced the Google Knowledge Graph
• The real reason is that triples are easier to
understand when visualized as a graph
• Entities as nodes
• Relation between entities as edges
12. Why Is It Called A Knowledge Graph?
• {Dalhousie University, is_a, public research university}
• {Dalhousie University, located_in, Nova Scotia}
[Figure: two directed edges. Dalhousie University --Is_a--> Public research university; Dalhousie University --Located_in--> Nova Scotia]
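Turning triples into a graph is mechanical: every head and tail becomes a node, and every relation becomes a labeled, directed edge. A minimal sketch with plain Python containers follows; the variable names are illustrative.

```python
from collections import defaultdict

triples = [
    ("Dalhousie University", "is_a", "public research university"),
    ("Dalhousie University", "located_in", "Nova Scotia"),
]

nodes = set()                   # entities become nodes
out_edges = defaultdict(list)   # head -> list of (relation, tail) edges

for head, relation, tail in triples:
    nodes.update((head, tail))
    out_edges[head].append((relation, tail))

print(len(nodes))                         # 3
print(out_edges["Dalhousie University"])
```

The two triples share a head, so the graph has three nodes rather than four, which is exactly what drawing the triples as a graph makes visible.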
13. Why Are Knowledge Graphs
Important?
• For us humans:
• Help reduce information overload
• Provide an intuitive data structure that we can explore
• Are an excellent tool for knowledge-driven tasks
• For machines:
• Reduce the gap between data and semantics
• Enable powerful graph analysis techniques
• Are a key component of many AI tasks
14. Google Knowledge Graph
• Google uses KGs to improve
its search engine – “things,
not strings”
• Google graphs YouTube video
15. Knowledge Graphs for Drug
Repurposing
• Drug repurposing (or repositioning) is an emerging
research discipline that seeks to use existing drugs for new
therapeutic indications, i.e., to target new diseases
• A drug repurposing KG makes use of public, manually curated databases, e.g.,
DrugBank, Reactome, Therapeutic Target Database,
PharmGKB
• Integrates and normalizes heterogeneous data sources
• Captures interactions between genetic, molecular, biological,
anatomical, therapeutic, and disease entities
16. Knowledge Graphs for Drug
Repurposing
Source: DRKG - Drug Repurposing Knowledge Graph for
Covid-19
18. Interdisciplinary Domain
• Creation of KGs requires expertise in:
• Natural language processing
• Information extraction, relation extraction, entity linking
• Knowledge engineering
• KG construction, rule-based reasoning
• Databases
• RDF triple store, graph database
• Data science & machine learning
• Domain-specific KGs will require domain-specific experts,
analysts, informaticians, etc.
19. KG Creation Workflow
1. Data acquisition from multiple heterogeneous
sources
2. Knowledge extraction (named-entity recognition,
entity resolution, relation extraction)
3. Knowledge representation
20. Data Acquisition
• Where does the data come from?
• Raw unstructured data from text, webpages,
images, literature
• Structured data from relational databases, social
networks
21. Knowledge Extraction
• Important task when using raw data to build the KG
• Named-Entity Recognition (NER)
• Given raw text, the NER system detects segments of
text that refer to entities and classifies each extracted
mention by entity type
• NER methods: classical rule-based, supervised,
semi-supervised, and deep learning-based
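The classical rule-based approach can be as simple as a dictionary (gazetteer) lookup. The sketch below assumes a two-entry gazetteer with made-up type names (`ORGANIZATION`, `LOCATION`); real NER systems are far more robust, and nothing here reflects a particular library.

```python
import re

# Hypothetical gazetteer: surface form -> entity type.
GAZETTEER = {
    "Dalhousie University": "ORGANIZATION",
    "Nova Scotia": "LOCATION",
}


def recognize_entities(text):
    """Return (start, end, mention, entity_type) tuples in text order."""
    mentions = []
    for name, entity_type in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            mentions.append((match.start(), match.end(), match.group(), entity_type))
    return sorted(mentions)


text = "Dalhousie University is a public research university in Nova Scotia"
for start, end, mention, entity_type in recognize_entities(text):
    print(start, end, mention, entity_type)
```

Detection is the span (start, end) and classification is the type attached to it, the two halves of the NER task.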
22. Knowledge Extraction
• Entity relation extraction:
• Detecting and classifying semantic relations between
entities
• Methods: rule-based, supervised, semi-supervised, and
unsupervised
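A rule-based relation extractor can be sketched with a single hand-written pattern. The pattern below is an assumption built only for sentences of the form "X is a Y in Z" and would miss almost everything else; it shows the idea of mapping a surface pattern to typed relations.

```python
import re

# Hand-written rule: "<head> is a/an <class> in <place>"
IS_A_IN = re.compile(r"^(?P<head>.+?) is an? (?P<cls>.+?) in (?P<place>.+?)\.?$")


def extract_relations(sentence):
    """Return the triples licensed by the rule, or an empty list if it does not apply."""
    match = IS_A_IN.match(sentence.strip())
    if match is None:
        return []
    head = match.group("head")
    return [
        (head, "is_a", match.group("cls")),
        (head, "located_in", match.group("place")),
    ]


sentence = "Dalhousie University is a public research university in Nova Scotia"
for triple in extract_relations(sentence):
    print(triple)
```

Supervised and unsupervised methods replace the hand-written rule with a learned model, but the output is the same: typed relations between entity pairs.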
23. Knowledge Extraction
• A well-known knowledge extraction tool for
biomedicine is SemRep
• UMLS-based application
• Extracts semantic triples from biomedical literature
in PubMed (subject-PREDICATE-object)
• E.g., “We used hemofiltration to treat a patient
with digoxin overdose that was complicated by
refractory hyperkalemia”
• Hemofiltration-TREATS-Patients
• Digoxin overdose-PROCESS_OF-Patients
• hyperkalemia-COMPLICATES-Digoxin overdose
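Strings written in the subject-PREDICATE-object notation used above are easy to load as triples. The sketch below parses that notation only; it is an assumption about the slide's shorthand and does not describe SemRep's actual output format.

```python
import re

# subject, then an upper-case predicate between hyphens, then object.
SPO = re.compile(r"^(?P<subject>.+?)-(?P<predicate>[A-Z_]+)-(?P<object>.+)$")


def parse_predication(line):
    """Split 'subject-PREDICATE-object' into a (subject, predicate, object) tuple."""
    match = SPO.match(line.strip())
    if match is None:
        raise ValueError(f"not in subject-PREDICATE-object form: {line!r}")
    return match.group("subject"), match.group("predicate"), match.group("object")


lines = [
    "Hemofiltration-TREATS-Patients",
    "Digoxin overdose-PROCESS_OF-Patients",
    "hyperkalemia-COMPLICATES-Digoxin overdose",
]
triples = [parse_predication(line) for line in lines]

# Everything asserted about "Digoxin overdose", as subject or object.
about = [t for t in triples if "Digoxin overdose" in (t[0], t[2])]
print(about)
```

Once parsed, the three predications form a small graph in which "Digoxin overdose" and "Patients" are shared nodes.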
24. Knowledge Extraction
• Semantic Medline is the web-based application for
SemRep
• Free to use once you register for a UMLS license
25. Knowledge Representation
• Most KGs are implemented as Resource Description
Framework (RDF) triples – the de facto standard for
KGs
• RDF is a W3C standard of the Semantic Web
• Focus on interoperability and information exchange
• Makes information on the web, and the relations within
it, machine-understandable
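In RDF, each part of a triple is typically identified by an IRI, and a graph can be serialized one triple per line in the N-Triples syntax. The sketch below writes that syntax by hand, assuming a hypothetical `http://example.org/` namespace; a real project would normally use an RDF library and established vocabularies instead.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def iri(name):
    """Turn a plain name into an IRI reference in the example namespace."""
    return f"<{BASE}{quote(name.replace(' ', '_'))}>"


def to_ntriples(triples):
    """Serialize (head, relation, tail) triples as N-Triples, one statement per line."""
    return "\n".join(f"{iri(h)} {iri(r)} {iri(t)} ." for h, r, t in triples)


triples = [
    ("Dalhousie University", "is_a", "public research university"),
    ("Dalhousie University", "located_in", "Nova Scotia"),
]
print(to_ntriples(triples))
```

Because every name is a globally unique IRI, triples from different sources can be merged without ambiguity, which is the interoperability focus noted above.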
26. Knowledge Representation
• More recently, property graph data models have gained
popularity
• Focus on data storage, querying, and
developers/applications
• Unlike the Semantic Web and RDF, property graphs are not
standardized; multiple vendors have introduced their own
schemas and query languages (Cypher, Gremlin, PGQL,
etc.)
29. Knowledge Representation
• Why are RDF stores not as popular as property
graphs?
• RDF is a complex standard; property graphs provide
similar services with less complexity
• Developers are more familiar with property graphs, and RDF
adds an unnecessary level of complexity
• Even semantic web founders acknowledge the
shortcomings: “Why the semantic web will never work”
• Interesting blog post on differences between RDF and
property graphs
30. Property Graph
• Property graph characteristics:
• Contains nodes and relationships
• Nodes have one or more labels and key-value pair
properties (i.e., attributes)
• Relationships are labeled, directed, and always have a
start node and an end node
• Relationships also have key-value pair properties
• Mostly quantitative properties: weight, cost, distance, rating,
time interval, etc.
• Together, a relation’s direction and label add semantic
meaning to the structuring of nodes
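These characteristics map directly onto two record types. The sketch below is an in-memory illustration, not the storage model of any graph database; the labels `University` and `Province` and the `weight` property are assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class Node:
    labels: Set[str]                                            # one or more labels
    properties: Dict[str, Any] = field(default_factory=dict)   # key-value attributes


@dataclass
class Relationship:
    type: str    # the relationship's label
    start: Node  # direction runs from start ...
    end: Node    # ... to end; both are always present
    properties: Dict[str, Any] = field(default_factory=dict)


dal = Node({"University"}, {"name": "Dalhousie University"})
ns = Node({"Province"}, {"name": "Nova Scotia"})
located_in = Relationship("LOCATED_IN", dal, ns, {"weight": 1.0})  # hypothetical weight


def describe(rel):
    """Render a relationship in an arrow notation similar to Cypher patterns."""
    return f"({rel.start.properties['name']})-[:{rel.type}]->({rel.end.properties['name']})"


print(describe(located_in))  # (Dalhousie University)-[:LOCATED_IN]->(Nova Scotia)
```

Unlike a bare triple, both the nodes and the relationship carry their own properties.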
32. Property Graph
• Property graphs are “whiteboard-friendly”
• Data model can simply be a sketch on a whiteboard
Source: https://neo4j.com/developer/guide-data-modeling/
33. Property Graph
• Property graphs are “whiteboard-friendly”
• Whiteboard sketch formalized a bit
Source: https://neo4j.com/developer/guide-data-modeling/
34. Property Graph
• Property graphs are “whiteboard-friendly”
• Node/relationship labels and properties added
Source: https://neo4j.com/developer/guide-data-modeling/
35. Property Graph
• Property graphs are “whiteboard-friendly”
• Final model in graph DB
Source: https://neo4j.com/developer/guide-data-modeling/
36. Popular Property Graphs
• Neo4j
• By far the most popular property graph database. Neo4j supports
large graph structures and is free to download and use
• Amazon Neptune
• Supports both property graph-based and RDF-based
models
• OrientDB
37. Neo4j
• The next tutorials will focus on Neo4j and Cypher (the
query language)
• The goal is to expose students to popular new technologies
used in academia and industry
• Not enough time to learn everything about Neo4j and
Cypher, so you’ll learn the basics
38. Cypher
• Suppose that I have a Neo4j graph containing
information on my circle of friends and all their favorite
donair restaurants in Halifax
[Figure: example graph. Nodes: Self {name: Ali}; Friend {name: Ahmad}; Friend {name: Chris}; Restaurant {name: Tony’s donair}; Restaurant {name: KoD}; City {city_name: Halifax}. Edges: Friend_of (x2), Likes (x2), Located_in (x2)]
39. Cypher
• I want to find all donair restaurants in Halifax that my
friends like
• MATCH (:Self)-[:Friend_of]->(:Friend)-[:Likes]->
(r:Restaurant)-[:Located_in]->(:City {city_name:
'Halifax'}) RETURN r.name
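To see what the pattern in this query asks for, it helps to walk the same path by hand. The Python sketch below rebuilds the example graph in memory and follows Friend_of, then Likes, then Located_in. It is not how Neo4j executes Cypher, the node ids are hypothetical handles, and which friend likes which restaurant is an assumption made for illustration.

```python
# Nodes: id -> (label, properties). The ids are hypothetical handles.
nodes = {
    "ali": ("Self", {"name": "Ali"}),
    "ahmad": ("Friend", {"name": "Ahmad"}),
    "chris": ("Friend", {"name": "Chris"}),
    "tonys": ("Restaurant", {"name": "Tony's donair"}),
    "kod": ("Restaurant", {"name": "KoD"}),
    "halifax": ("City", {"city_name": "Halifax"}),
}

# Directed relationships: (start, type, end).
# ASSUMPTION: the Likes pairings below are made up for this sketch.
edges = [
    ("ali", "Friend_of", "ahmad"),
    ("ali", "Friend_of", "chris"),
    ("ahmad", "Likes", "tonys"),
    ("chris", "Likes", "kod"),
    ("tonys", "Located_in", "halifax"),
    ("kod", "Located_in", "halifax"),
]


def step(starts, rel_type, end_label):
    """Follow rel_type edges out of starts, keeping end nodes that carry end_label."""
    return {e for s, r, e in edges
            if s in starts and r == rel_type and nodes[e][0] == end_label}


selves = {n for n, (label, _) in nodes.items() if label == "Self"}
friends = step(selves, "Friend_of", "Friend")
liked = step(friends, "Likes", "Restaurant")
in_halifax = {
    r for r in liked
    if any(nodes[c][1].get("city_name") == "Halifax"
           for c in step({r}, "Located_in", "City"))
}
restaurant_names = sorted(nodes[r][1]["name"] for r in in_halifax)
print(restaurant_names)  # ['KoD', "Tony's donair"]
```

One difference worth noting: this sketch collects a set of restaurants, whereas a Cypher query returns one row per matching path, so a restaurant liked by several friends would appear several times unless DISTINCT is used.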
40. To Do Before The Next Tutorial
1. Familiarize yourself with Neo4j:
https://neo4j.com/developer/get-started/
2. Start a Neo4j sandbox:
https://neo4j.com/sandbox/?ref=developer-start
• Start with the ‘Movies’ pre-built project and follow
tutorial instructions
3. Download Neo4j desktop version:
https://neo4j.com/download/
41. Knowledge Graph Project
• The purpose of the project is to expose students to the latest
graph technologies and methods
• Requirements for the project:
• Neo4j graph database – desktop version
• Microsoft Excel or R
42. Knowledge Graph Project
• Each student will receive their own dataset
• You’re not expected to become Neo4j experts or learn Cypher
in a short period of time, so all Cypher scripts required to
import and analyze the data will be provided
• You’re required to apply various graph algorithms (using
provided scripts) and interpret the output
• You may be required to do additional simple analysis in
Excel (e.g., histograms, frequencies)
• The objective here is to make students aware of existing
methods and technologies
43. Conclusion
• KGs are gaining popularity in research and industry due to
their wide range of uses
• In their simplest form, KGs are essentially a collection of
semantic triples {s, p, o}
• Triples can be easily represented as nodes and edges in a
graph – hence, knowledge graph
• RDF is the de facto model for KGs, but property graphs are
gaining traction due to their versatility
• RDF is complex, and some describe it as ‘overkill’
• Property graphs are easy to learn and use, even for people
without technical background