Introduction to concepts
HINF 6230
Knowledge Graphs
Presented by: Ali Daowd,
Ph.D. candidate, NICHE research group, Faculty of Computer Science,
Dalhousie University.
Adapted by: Jaber Rad,
Ph.D. candidate, NICHE research group, Faculty of Computer Science,
Dalhousie University.
Agenda
• What is a knowledge graph?
• Why is it called a knowledge graph?
• Why are knowledge graphs important?
• Google knowledge graph
• Drug repurposing knowledge graph
• The knowledge graph lifecycle
• Knowledge graph creation workflow
• Property graphs
References
• Kejriwal, M. (2019). Domain-specific knowledge graph construction.
• Robinson, I., Webber, J., & Eifrem, E. (2015). Graph databases: new opportunities for connected data.
• Blumauer, A., & Nagy, H. The Knowledge Graph Cookbook.
What Is A Knowledge Graph?
• Knowledge: “understanding of a science, art,
technique, or other domains”
• https://www.merriam-webster.com/dictionary/knowledge
• Graph: “a structure amounting to a set of objects in
which some pairs of objects are in some sense
related”
• https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)
• Knowledge Graph (KG): a graph of data
representing human knowledge and its underlying
semantics
What Is A Knowledge Graph?
• Data in a KG represent real-world entities, their attributes, and the semantic relations linking them
• In its simplest form, a KG is a set of triples, each representing an assertion – i.e., a fact about real-world entities
• A triple is a 3-tuple {h, r, t}, where h (head) and t (tail) are entities, and r is the relation between h and t
What Is A Knowledge Graph?
• “Dalhousie University is a public research university in Nova Scotia”
• Entity #1: Dalhousie University
• Entity #2: public research university
• Entity #3: Nova Scotia
• {Dalhousie University, is_a, public research university}
• {Dalhousie University, located_in, Nova Scotia}
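As a minimal sketch of the triple idea, the two assertions about Dalhousie University can be held as {h, r, t} tuples and queried directly. The `Triple` type and the `tails` helper are illustrative names chosen here, not part of any particular KG library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    head: str      # h: the entity the assertion is about
    relation: str  # r: the relation linking head and tail
    tail: str      # t: the entity the head is related to


# The two triples derived from the example sentence.
kg = {
    Triple("Dalhousie University", "is_a", "public research university"),
    Triple("Dalhousie University", "located_in", "Nova Scotia"),
}


def tails(kg, head, relation):
    """Return every tail t such that {head, relation, t} is asserted in the KG."""
    return sorted(t.tail for t in kg if t.head == head and t.relation == relation)


print(tails(kg, "Dalhousie University", "located_in"))  # ['Nova Scotia']
```

Storing the triples in a set means that asserting the same fact twice does not create a duplicate.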
Wikidata
• Wikidata.org is a great example of an open knowledge
base
• Entities are called ‘items’
• Each item has a unique identifier, label, description, and
aliases
• Relations to other entities are called ‘statements’
• Each statement is a property-value pair
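The item structure described above can be sketched as a small record type. This is only an illustration of the shape (identifier, label, description, aliases, and statements as property-value pairs). The identifier is a placeholder, the field values are made up for the example and not copied from Wikidata, and the `Item` class is not the Wikidata API.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    identifier: str                  # unique identifier of the item
    label: str                       # main human-readable name
    description: str                 # short disambiguating phrase
    aliases: list[str] = field(default_factory=list)
    # Each statement is a (property, value) pair.
    statements: list[tuple[str, str]] = field(default_factory=list)

    def values(self, prop):
        """Return the values of all statements that use the given property."""
        return [value for p, value in self.statements if p == prop]


dal = Item(
    identifier="Q-PLACEHOLDER",      # placeholder, not a real Wikidata identifier
    label="Dalhousie University",
    description="public research university in Nova Scotia",
    aliases=["Dal"],                 # illustrative alias
    statements=[
        ("instance of", "public research university"),
        ("located in", "Nova Scotia"),
    ],
)

print(dal.values("located in"))  # ['Nova Scotia']
```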
Why Is It Called Knowledge Graph?
• Used to be called knowledge base
• Terminology started to change when Google
introduced the Google Knowledge Graph
• The underlying reason is that triples are easier to understand when visualized as a graph
• Entities as nodes
• Relations between entities as edges
Why Is It Called Knowledge Graph?
• {Dalhousie University, is_a, public research university}
• {Dalhousie University, located_in, Nova Scotia}
[Figure: the two triples drawn as a graph. Edges: Dalhousie University, Is_a, Public research university; Dalhousie University, Located_in, Nova Scotia]
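The step from triples to a graph can be sketched in a few lines: every head and tail becomes a node, and every triple becomes a labeled, directed edge. The `build_graph` function is a hypothetical helper written for this illustration.

```python
from collections import defaultdict

triples = [
    ("Dalhousie University", "is_a", "public research university"),
    ("Dalhousie University", "located_in", "Nova Scotia"),
]


def build_graph(triples):
    """Turn triples into a node set and an adjacency list of labeled edges."""
    nodes = set()
    edges = defaultdict(list)
    for head, relation, tail in triples:
        nodes.update((head, tail))            # entities become nodes
        edges[head].append((relation, tail))  # the relation becomes a directed edge
    return nodes, dict(edges)


nodes, edges = build_graph(triples)
for relation, tail in edges["Dalhousie University"]:
    print(f"Dalhousie University -[{relation}]-> {tail}")
```

Both triples share the same head, so the resulting graph has three nodes and two edges leaving Dalhousie University.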
Why Are Knowledge Graphs Important?
• For us humans:
• Help reduce information overload
• Provide an intuitive data structure that we can explore
• Are an excellent tool for knowledge-driven tasks
• For machines:
• Reduce the gap between data and semantics
• Enable the use of powerful graph analysis techniques
• Are a key component of many AI tasks
Google Knowledge Graph
• Google uses KGs to improve
its search engine – “things,
not strings”
• Google graphs YouTube video
Knowledge Graphs for Drug Repurposing
• Drug repurposing (or repositioning) is an emerging research discipline that seeks new therapeutic indications for existing drugs – i.e., to target new diseases
• Drug repurposing KGs make use of public, manually curated databases – e.g., DrugBank, Reactome, Therapeutic Target Database, PharmGKB
• They integrate and normalize heterogeneous data sources
• They capture interactions between genetic, molecular, biological, anatomical, therapeutic, and disease entities
[Figure: drug repurposing knowledge graph. Source: DRKG - Drug Repurposing Knowledge Graph for Covid-19]
The Knowledge Graph Lifecycle
[Figure: the knowledge graph lifecycle. Source: The Knowledge Graph Cookbook]
Interdisciplinary Domain
• Creation of KGs requires expertise in:
• Natural language processing
• Information extraction, relation extraction, entity linking
• Knowledge engineering
• KG construction, rule-based reasoning
• Databases
• RDF triple store, graph database
• Data science & machine learning
• Domain-specific KGs will require domain-specific experts,
analysts, informaticians, etc.
KG Creation Workflow
1. Data acquisition from multiple heterogeneous sources
2. Knowledge extraction (named-entity recognition, entity resolution, relation extraction)
3. Knowledge representation
Data Acquisition
• Where does the data come from?
• Raw unstructured data from text, webpages, images, literature
• Structured data from relational databases, social networks
Knowledge Extraction
• Important task when using raw data to build the KG
• Named-Entity Recognition (NER)
• Given raw text, an NER system detects the text segments that mention entities and classifies each mention by entity type
• NER methods: classical rule-based, supervised, semi-supervised, and deep learning-based
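To make the two NER steps (detect, then classify) concrete, here is a minimal sketch of the classical rule-based approach using a dictionary lookup (a gazetteer). The gazetteer entries and the type names ORGANIZATION and LOCATION are assumptions made for this example. Real systems rely on much larger resources or learned models.

```python
import re

# Hypothetical gazetteer: known entity names mapped to an entity type.
GAZETTEER = {
    "Dalhousie University": "ORGANIZATION",
    "Nova Scotia": "LOCATION",
}


def recognize_entities(text):
    """Detect gazetteer mentions in text and classify each one.

    Returns (start, end, mention, entity_type) tuples sorted by position.
    """
    mentions = []
    for name, entity_type in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            mentions.append((match.start(), match.end(), match.group(), entity_type))
    return sorted(mentions)


text = "Dalhousie University is a public research university in Nova Scotia"
for start, end, mention, entity_type in recognize_entities(text):
    print(f"{mention} [{entity_type}] at {start}:{end}")
```

A lookup like this cannot find entities missing from the dictionary, which is one motivation for the supervised and deep learning-based methods listed above.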
Knowledge Extraction
• Entity relation extraction:
• Detecting and classifying semantic relations between
entities
• Methods: rule-based, supervised, semi-supervised, and unsupervised
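A rule-based relation extractor can be sketched with hand-written sentence patterns. The two rules below are assumptions tailored to the Dalhousie example sentence and would not generalize. They only illustrate how a pattern maps matched text to a relation label.

```python
import re

# Each rule pairs a relation label with a pattern exposing named groups h and t.
RULES = [
    ("is_a", re.compile(r"^(?P<h>.+?) is an? (?P<t>.+?)(?: in .+)?$")),
    ("located_in", re.compile(r"^(?P<h>.+?) is .+ in (?P<t>.+)$")),
]


def extract_relations(sentence):
    """Apply every rule to the sentence and return the matching {h, r, t} triples."""
    sentence = sentence.strip().rstrip(".")
    triples = []
    for relation, pattern in RULES:
        match = pattern.match(sentence)
        if match:
            triples.append((match["h"], relation, match["t"]))
    return triples


sentence = "Dalhousie University is a public research university in Nova Scotia"
for triple in extract_relations(sentence):
    print(triple)
```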
Knowledge Extraction
• A well-known knowledge extraction tool for
biomedicine is SemRep
• UMLS-based application
• Extracts semantic triples from biomedical literature
in PubMed (subject-PREDICATE-object)
• E.g., “We used hemofiltration to treat a patient
with digoxin overdose that was complicated by
refractory hyperkalemia”
• Hemofiltration-TREATS-Patients
• Digoxin overdose-PROCESS_OF-Patients
• hyperkalemia-COMPLICATES-Digoxin overdose
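Once predications like these are available, loading them into a KG is mostly a parsing step. The sketch below parses the subject-PREDICATE-object notation used on this slide. It assumes predicates consist only of capital letters and underscores, and it does not reflect SemRep's actual output file format.

```python
import re

# subject-PREDICATE-object, where PREDICATE is capital letters and underscores.
PREDICATION = re.compile(r"^(?P<subject>.+)-(?P<predicate>[A-Z_]+)-(?P<object>.+)$")


def parse_predication(line):
    """Split one 'subject-PREDICATE-object' string into a triple."""
    match = PREDICATION.match(line.strip())
    if match is None:
        raise ValueError(f"not a subject-PREDICATE-object string: {line!r}")
    return (match["subject"], match["predicate"], match["object"])


lines = [
    "Hemofiltration-TREATS-Patients",
    "Digoxin overdose-PROCESS_OF-Patients",
    "hyperkalemia-COMPLICATES-Digoxin overdose",
]
triples = [parse_predication(line) for line in lines]
for triple in triples:
    print(triple)
```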
Knowledge Extraction
• Semantic Medline is the web-based application for
SemRep
• Free to use once you register for a UMLS license
Knowledge Representation
• Most KGs are implemented as Resource Description Framework (RDF) triples – the de facto standard for KGs
• RDF is a W3C standard of the Semantic Web
• Focus on interoperability and information exchange
• Makes information on the web, and the relations between pieces of information, machine-understandable
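To give a feel for RDF triples, the sketch below writes the Dalhousie triples in N-Triples syntax, where each line is a subject, predicate, and object followed by a period. The `http://example.org/` namespace and the `iri` helper are assumptions for illustration. A real project would reuse established vocabularies and typically an RDF library.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def iri(name):
    """Build an IRI reference in angle brackets from a plain name."""
    return f"<{BASE}{quote(name.replace(' ', '_'))}>"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples, one statement per line."""
    return "\n".join(f"{iri(s)} {iri(p)} {iri(o)} ." for s, p, o in triples)


triples = [
    ("Dalhousie University", "is_a", "public research university"),
    ("Dalhousie University", "located_in", "Nova Scotia"),
]
print(to_ntriples(triples))
```

Because every entity and relation is identified by an IRI, two datasets that use the same IRIs can be merged by concatenating their triples, which is the interoperability benefit noted above.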
Knowledge Representation
• More recently, property graph data models have gained popularity
• Focus on data storage, querying, and developers/applications
• Unlike the Semantic Web and RDF, property graphs are not standardized: multiple vendors have introduced their own schemas and query languages (Cypher, Gremlin, PGQL, etc.)
Knowledge Representation
• Popularity of different databases since 2013
[Figure: database popularity trend. Source: http://db-engines.com]
• Popularity of property graphs vs. RDF stores
[Figure: popularity of property graphs vs. RDF stores. Source: http://db-engines.com]
Knowledge Representation
• Why are RDF stores not as popular as property graphs?
• RDF is a complex standard; property graphs provide similar services with less complexity
• Developers are more familiar with property graphs; RDF adds an unnecessary level of complexity
• Even semantic web founders acknowledge the shortcomings: “Why the semantic web will never work”
• Interesting blog post on differences between RDF and property graphs
Property Graph
• Property graph characteristics:
• Contains nodes and relationships
• Nodes have one or more labels and key-value pair properties (i.e., attributes)
• Relationships are labeled, directed, and always have a start node and an end node
• Relationships also have key-value pair properties
• Mostly quantitative properties: weight, cost, distance, rating, time interval, etc.
• Together, a relationship’s direction and label add semantic meaning to the structuring of nodes
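The characteristics listed above translate almost directly into two record types. This is a sketch of the data model only, not of how Neo4j or any other product stores graphs. The class names and the `rating` value are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set[str]                                 # one or more labels
    properties: dict = field(default_factory=dict)   # key-value attributes

    def __post_init__(self):
        if not self.labels:
            raise ValueError("a node needs at least one label")


@dataclass
class Relationship:
    label: str                                       # relationships are labeled
    start: Node                                      # direction runs from start ...
    end: Node                                        # ... to end
    properties: dict = field(default_factory=dict)   # e.g. weight, cost, rating


friend = Node({"Friend"}, {"name": "Ahmad"})
restaurant = Node({"Restaurant"}, {"name": "KoD"})
# The rating is a made-up value showing a quantitative relationship property.
likes = Relationship("Likes", friend, restaurant, {"rating": 5})

print(likes.start.properties["name"], likes.label, likes.end.properties["name"])
```

Requiring both `start` and `end` in the constructor mirrors the rule that a relationship always has a start node and an end node.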
Property Graph
[Figure: example property graph. Source: https://neo4j.com/developer/guide-data-modeling/]
Property Graph
• Property graphs are “whiteboard-friendly”
• Data model can simply be a sketch on a whiteboard
• Whiteboard sketch formalized a bit
• Node/relationship labels and properties added
• Final model in graph DB
[Figures: the four modeling stages listed above. Source: https://neo4j.com/developer/guide-data-modeling/]
Popular Property Graph Databases
• Neo4j
• By far the most popular property graph database. Neo4j supports large graph structures and is free to download and use
• Amazon Neptune
• Supports both property graph-based and RDF-based models
• OrientDB
Neo4j
• The next tutorials will focus on Neo4j and Cypher (the
query language)
• Goal is to expose students to popular new technologies
used in academia and industry
• Not enough time to learn everything about Neo4j and
Cypher, so you’ll learn the basics
Cypher
• Suppose that I have a Neo4j graph containing
information on my circle of friends and all their favorite
donair restaurants in Halifax
[Figure: example property graph. Nodes: Self {name: Ali}; Friend {name: Ahmad}; Friend {name: Chris}; Restaurant {name: Tony’s donair}; Restaurant {name: KoD}; City {city_name: Halifax}. Relationships: Friend_of (Self to each Friend), Likes (Friend to Restaurant), Located_in (Restaurant to City)]
Cypher
• I want to find all donair restaurants in Halifax that my friends like
• MATCH (:Self)-[:Friend_of]->(:Friend)-[:Likes]->(r:Restaurant)-[:Located_in]->(:City {city_name: 'Halifax'}) RETURN r.name
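To show what the pattern in this query asks the database to do, here is a plain-Python sketch that walks the same path (Self, Friend_of, Friend, Likes, Restaurant, Located_in, City) over an in-memory copy of the example graph. It is not how Neo4j executes Cypher. The node ids are invented, and which friend likes which restaurant is an assumption, since the figure does not say.

```python
# Node id -> (label, properties). The ids are hypothetical.
nodes = {
    "ali": ("Self", {"name": "Ali"}),
    "ahmad": ("Friend", {"name": "Ahmad"}),
    "chris": ("Friend", {"name": "Chris"}),
    "tonys": ("Restaurant", {"name": "Tony's donair"}),
    "kod": ("Restaurant", {"name": "KoD"}),
    "halifax": ("City", {"city_name": "Halifax"}),
}

# Directed relationships as (start, label, end).
relationships = [
    ("ali", "Friend_of", "ahmad"),
    ("ali", "Friend_of", "chris"),
    ("ahmad", "Likes", "tonys"),   # assumed pairing
    ("chris", "Likes", "kod"),     # assumed pairing
    ("tonys", "Located_in", "halifax"),
    ("kod", "Located_in", "halifax"),
]


def follow(node_id, rel_label, target_label):
    """Ids reached from node_id over rel_label whose node has target_label."""
    return [
        end
        for start, label, end in relationships
        if start == node_id and label == rel_label and nodes[end][0] == target_label
    ]


def restaurants_friends_like(city_name):
    """Names of restaurants in the given city that friends of Self like."""
    names = []
    for node_id, (label, _) in nodes.items():
        if label != "Self":
            continue
        for friend in follow(node_id, "Friend_of", "Friend"):
            for restaurant in follow(friend, "Likes", "Restaurant"):
                for city in follow(restaurant, "Located_in", "City"):
                    if nodes[city][1].get("city_name") == city_name:
                        names.append(nodes[restaurant][1]["name"])
    return names


print(restaurants_friends_like("Halifax"))
```

Each nested loop corresponds to one hop in the query pattern, which is why the single-line Cypher statement is so much more compact.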
To Do Before The Next Tutorial
1. Familiarize yourself with Neo4j:
https://neo4j.com/developer/get-started/
2. Start a Neo4j sandbox:
https://neo4j.com/sandbox/?ref=developer-start
• Start with the ‘Movies’ pre-built project and follow
tutorial instructions
3. Download Neo4j desktop version:
https://neo4j.com/download/
Knowledge Graph Project
• Purpose of the project is to expose students to the latest graph technologies and methods
• Requirements for the project:
• Neo4j graph database – desktop version
• Microsoft Excel or R
Knowledge Graph Project
• Each student will receive their own dataset
• You’re not expected to become Neo4j experts or learn Cypher in a short period of time, so all Cypher scripts required to import and analyze the data will be provided
• You’re required to apply various graph algorithms (using provided scripts) and interpret the output
• You may be required to do additional simple analysis in Excel (e.g., histograms, frequencies)
• Objective here is to make students aware of existing methods and technologies
Conclusion
• KGs are gaining popularity in research and industry due to their wide range of uses
• In their simplest form, KGs are essentially a collection of semantic triples {s, p, o} (subject, predicate, object)
• Triples can be easily represented as nodes and edges in a graph – hence, knowledge graph
• RDF is the de facto model for KGs, but property graphs are gaining traction due to their versatility
• RDF is complex, and some describe it as ‘overkill’
• Property graphs are easy to learn and use, even for people without a technical background

More Related Content

Similar to Introduction_to_knowledge_graph.pdf

Linked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & MuseumsLinked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & MuseumsJon Voss
 
An Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jAn Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jDebanjan Mahata
 
Lynch & Dirks - Platforms for Open Research - Charleston Conference 2011
Lynch & Dirks  - Platforms for Open Research - Charleston Conference 2011Lynch & Dirks  - Platforms for Open Research - Charleston Conference 2011
Lynch & Dirks - Platforms for Open Research - Charleston Conference 2011Lee Dirks
 
Linked Data at the OU - the story so far
Linked Data at the OU - the story so farLinked Data at the OU - the story so far
Linked Data at the OU - the story so farEnrico Daga
 
RDAP14: Learning to Curate Panel
RDAP14: Learning to Curate Panel RDAP14: Learning to Curate Panel
RDAP14: Learning to Curate Panel ASIS&T
 
IASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with TriplesIASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with TriplesDr.-Ing. Thomas Hartmann
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked DataEUCLID project
 
Hide the Stack: Toward Usable Linked Data
Hide the Stack:Toward Usable Linked DataHide the Stack:Toward Usable Linked Data
Hide the Stack: Toward Usable Linked Dataaba-sah
 
Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...Lucy McKenna
 
Linked Data for the Masses: The approach and the Software
Linked Data for the Masses: The approach and the SoftwareLinked Data for the Masses: The approach and the Software
Linked Data for the Masses: The approach and the SoftwareIMC Technologies
 
Linked Energy Data Generation
Linked Energy Data GenerationLinked Energy Data Generation
Linked Energy Data GenerationFilip Radulovic
 
Global lodlam_communities and open cultural data
Global lodlam_communities and open cultural dataGlobal lodlam_communities and open cultural data
Global lodlam_communities and open cultural dataMinerva Lin
 
DubJug: Neo4J and Open Data
DubJug: Neo4J and Open DataDubJug: Neo4J and Open Data
DubJug: Neo4J and Open DataScott Sosna
 
Rachel Proudfoot JIBS-RLUK event July 2012
Rachel Proudfoot JIBS-RLUK event July 2012Rachel Proudfoot JIBS-RLUK event July 2012
Rachel Proudfoot JIBS-RLUK event July 2012sherif user group
 
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...Sarah Anna Stewart
 
Cro presentation for library jan13v2
Cro presentation for library jan13v2Cro presentation for library jan13v2
Cro presentation for library jan13v2NeilStewartCity
 
Industry@RuleML2015 DataGraft
Industry@RuleML2015 DataGraftIndustry@RuleML2015 DataGraft
Industry@RuleML2015 DataGraftRuleML
 

Similar to Introduction_to_knowledge_graph.pdf (20)

Semantics and Machine Learning
Semantics and Machine LearningSemantics and Machine Learning
Semantics and Machine Learning
 
Linked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & MuseumsLinked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & Museums
 
An Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jAn Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4j
 
Lynch & Dirks - Platforms for Open Research - Charleston Conference 2011
Lynch & Dirks  - Platforms for Open Research - Charleston Conference 2011Lynch & Dirks  - Platforms for Open Research - Charleston Conference 2011
Lynch & Dirks - Platforms for Open Research - Charleston Conference 2011
 
Linked Data at the OU - the story so far
Linked Data at the OU - the story so farLinked Data at the OU - the story so far
Linked Data at the OU - the story so far
 
RDAP14: Learning to Curate Panel
RDAP14: Learning to Curate Panel RDAP14: Learning to Curate Panel
RDAP14: Learning to Curate Panel
 
November 19, 2014 NISO Virtual Conference: Can't We All Work Together?: Inter...
November 19, 2014 NISO Virtual Conference: Can't We All Work Together?: Inter...November 19, 2014 NISO Virtual Conference: Can't We All Work Together?: Inter...
November 19, 2014 NISO Virtual Conference: Can't We All Work Together?: Inter...
 
IASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with TriplesIASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with Triples
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked Data
 
Hide the Stack: Toward Usable Linked Data
Hide the Stack:Toward Usable Linked DataHide the Stack:Toward Usable Linked Data
Hide the Stack: Toward Usable Linked Data
 
Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...
 
Linked Data for the Masses: The approach and the Software
Linked Data for the Masses: The approach and the SoftwareLinked Data for the Masses: The approach and the Software
Linked Data for the Masses: The approach and the Software
 
Linked Energy Data Generation
Linked Energy Data GenerationLinked Energy Data Generation
Linked Energy Data Generation
 
Global lodlam_communities and open cultural data
Global lodlam_communities and open cultural dataGlobal lodlam_communities and open cultural data
Global lodlam_communities and open cultural data
 
DubJug: Neo4J and Open Data
DubJug: Neo4J and Open DataDubJug: Neo4J and Open Data
DubJug: Neo4J and Open Data
 
Rachel Proudfoot JIBS-RLUK event July 2012
Rachel Proudfoot JIBS-RLUK event July 2012Rachel Proudfoot JIBS-RLUK event July 2012
Rachel Proudfoot JIBS-RLUK event July 2012
 
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
 
Cro presentation for library jan13v2
Cro presentation for library jan13v2Cro presentation for library jan13v2
Cro presentation for library jan13v2
 
Today's forecast for your campus: BLUEcloud
 Today's forecast for your campus: BLUEcloud Today's forecast for your campus: BLUEcloud
Today's forecast for your campus: BLUEcloud
 
Industry@RuleML2015 DataGraft
Industry@RuleML2015 DataGraftIndustry@RuleML2015 DataGraft
Industry@RuleML2015 DataGraft
 

Recently uploaded

‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555kikilily0909
 
TOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptxTOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptxdharshini369nike
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaPraksha3
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫qfactory1
 
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxTwin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxEran Akiva Sinbar
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptArshadWarsi13
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRlizamodels9
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsssuserddc89b
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Welcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayWelcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayZachary Labe
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett SquareIsiahStephanRadaza
 
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsHajira Mahmood
 

Recently uploaded (20)

‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555
 
TOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptxTOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptx
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫
 
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxTwin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Welcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayWelcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work Day
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett Square
 
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 

Introduction_to_knowledge_graph.pdf

  • 1. 1 Introduction to concepts HINF 6230 Knowledge Graphs 1 Presented by: Ali Daowd, Ph.D. candidate, NICHE research group, Faculty of Computer Science, Dalhousie University. Adapted by: Jaber Rad, Ph.D. candidate, NICHE research group, Faculty of Computer Science, Dalhousie University.
  • 2. Agenda • What is a knowledge graph? • Why is it called a knowledge graph? • Why are knowledge graphs important? • Google knowledge graph • Drug repurposing knowledge graph • The knowledge graph lifecycle • Knowledge graph creation workflow • Property graphs 2
  • 3. References • Kejriwal, M. (2019). Domain-specific knowledge graph construction • Robinson, I., Webber, J., & Eifrem, E. (2015). Graph databases: new opportunities for connected data. • Blumauer, A., Nahy, H., The Knowledge Graph Cookbook 3
  • 4. What Is A Knowledge Graph? • Knowledge: “understanding of a science, art, technique, or other domains” • https://www.merriam-webster.com/dictionary/knowledge • Graph: “a structure amounting to a set of objects in which some pairs of objects are in some sense related” • https://en.wikipedia.org/wiki/Graph_(discrete_mathematics) • Knowledge Graph (KG): a graph of data representing human knowledge and its underlying semantics 4
  • 5. What Is A Knowledge Graph? • Data in KG represent real-world entities, their attributes, and semantic relations linking them • In its simplest form, a KG is a set of triples representing an assertion – i.e., facts about real-world entities • Triple is a 3-tuple {h, r, t}, where h and t are entities, and r is the relation between h, t 5
  • 6. What Is A Knowledge Graph? • “Dalhousie University is a public research university in Nova Scotia” 6
  • 7. What Is A Knowledge Graph? • “Dalhousie University is a public research university in Nova Scotia” • Entity #1: Dalhousie University • Entity #2: public research university • Entity #3: Nova Scotia 7
  • 8. What Is A Knowledge Graph? • “Dalhousie University is a public research university in Nova Scotia” • {Dalhousie University, is_a, public research university} • {Dalhousie University, located_in, Nova Scotia} 8
  • 9. Wikidata • Wikidata.org is a great example of an open knowledge base • Entities are called ‘items’ • Each item has a unique identifier, label, description, and aliases • Relations to other entities are called ‘statements’ • Property-value pair 9
  • 11. Why Is It Called Knowledge Graph? • Used to be called knowledge base • Terminology started to change when Google introduced the Google Knowledge Graph • The real reason is because triples are better understood when visualized in a graph • Entities as nodes • Relation between entities as edges 11
  • 12. Why Is It Called Knowledge Graph? • {Dalhousie University, is_a, public research university} • {Dalhousie University, located_in, Nova Scotia} 12 Dalhousie University Public research university Is_a Dalhousie University Nova Scotia Located_in
  • 13. Why Are Knowledge Graphs Important? • For us humans: • Help to reduce information overload • Provides an intuitive data structure that we can explore • Excellent tool for knowledge-driven tasks • For machines: • Reduces gap between data and semantics • Makes use of powerful graph analysis techniques • Key aspect for many AI tasks 13
  • 14. Google Knowledge Graph • Google uses KGs to improve its search engine – “things, not strings” • Google graphs YouTube video 14
  • 15. Knowledge Graphs for Drug Repurposing • Drug repurposing (or repositioning) is an emerging research discipline to use existing drugs for new therapeutic indications – i.e., to target new diseases • Makes use of public manually-curated databases – e.g., DrugBank, Reactome, Therapeutic Target Database, PharmGKB • Integrates and normalizes heterogenous data sources • Captures interactions between genetic, molecular, biological, anatomical, therapeutic, and disease entities 15
  • 16. Knowledge Graphs for Drug Repurposing 16 Source: DRKG - Drug Repurposing Knowledge Graph for Covid-19
  • 17. The Knowledge Graph Lifecycle 17 Source: The Knowledge Graph Cookbook
  • 18. Interdisciplinary Domain • Creation of KGs requires expertise in: • Natural language processing • Information extraction, relation extraction, entity linking • Knowledge engineering • KG construction, rule-based reasoning • Databases • RDF triple store, graph database • Data science & machine learning • Domain-specific KGs will require domain-specific experts, analysts, informaticians, etc. 18
  • 19. KG Creation Workflow 1. Data acquisition from multiple heterogeneous sources 2. Knowledge extraction (named-entity recognition, entity resolution, relation extraction) 3. Knowledge representation
  • 20. Data Acquisition • Where does the data come from? • Raw unstructured data from text, webpages, images, literature • Structured data from relational databases, social networks
  • 21. Knowledge Extraction • Important task when using raw data to build the KG • Named-Entity Recognition (NER) • Given raw text, an NER system detects the segments of text that refer to entities and classifies each extracted mention by entity type • NER methods: classical rule-based, supervised, semi-supervised, and deep learning-based
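As a rough illustration of the classical rule-based end of that spectrum, the sketch below does NER by dictionary lookup. The gazetteer contents, the entity type names, and the function `recognize_entities` are all assumptions made for this example; a real system would use a far larger lexicon or a trained model.

```python
import re

# Hypothetical gazetteer: surface form -> entity type.
GAZETTEER = {
    "Dalhousie University": "ORGANIZATION",
    "Nova Scotia": "LOCATION",
    "Halifax": "LOCATION",
}


def recognize_entities(text):
    """Return (start, end, mention, entity_type) for each gazetteer match, in text order."""
    mentions = []
    for surface, entity_type in GAZETTEER.items():
        # Detection step: find every whole-word occurrence of the surface form.
        for match in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            # Classification step: the type comes straight from the gazetteer.
            mentions.append((match.start(), match.end(), match.group(), entity_type))
    return sorted(mentions)


sentence = "Dalhousie University is a public research university in Nova Scotia"
entities = recognize_entities(sentence)
for start, end, mention, entity_type in entities:
    print(f"{mention!r} [{start}:{end}] -> {entity_type}")
```

The lookup is case-sensitive and exact, so it misses spelling variants and unseen entities, which is one reason supervised and deep learning-based methods are often preferred.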
  • 22. Knowledge Extraction • Entity relation extraction: • Detecting and classifying semantic relations between entities • Methods: rule-based, supervised, semi-supervised, and unsupervised
  • 23. Knowledge Extraction • A well-known knowledge extraction tool for biomedicine is SemRep • UMLS-based application • Extracts semantic triples from biomedical literature in PubMed (subject-PREDICATE-object) • E.g., “We used hemofiltration to treat a patient with digoxin overdose that was complicated by refractory hyperkalemia” • Hemofiltration-TREATS-Patients • Digoxin overdose-PROCESS_OF-Patients • hyperkalemia-COMPLICATES-Digoxin overdose 23
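The subject-PREDICATE-object strings above can be turned into triples with a small parser. Note that this sketch parses the notation used on the slide, not SemRep's actual output format; the function name `parse_predication` and the regular expression are assumptions for illustration.

```python
import re

# Predications copied from the slide, in subject-PREDICATE-object notation.
predications = [
    "Hemofiltration-TREATS-Patients",
    "Digoxin overdose-PROCESS_OF-Patients",
    "hyperkalemia-COMPLICATES-Digoxin overdose",
]

# The predicate is assumed to be the upper-case token between two hyphens.
PATTERN = re.compile(r"(.+?)-([A-Z_]+)-(.+)")


def parse_predication(text):
    """Split 'subject-PREDICATE-object' into a (subject, predicate, object) tuple."""
    match = PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a subject-PREDICATE-object string: {text!r}")
    return match.groups()


parsed = [parse_predication(p) for p in predications]
for subject, predicate, obj in parsed:
    print(f"{{{subject}, {predicate}, {obj}}}")
```

Once in tuple form, these predications have the same {h, r, t} shape as the triples introduced earlier and can be loaded into a graph in the same way.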
  • 24. Knowledge Extraction • Semantic Medline is the web-based application for SemRep • Free to use once you register for a UMLS license 24
  • 25. Knowledge Representation • Most KGs are implemented as Resource Description Framework (RDF) triples – the de facto standard for KGs • RDF is a Semantic Web standard • Focuses on interoperability and information exchange • Makes information on the web, and the relations within it, machine-understandable
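To make the RDF representation concrete, the sketch below writes the Dalhousie triples in N-Triples, the simplest line-based RDF serialization (`<subject> <predicate> <object> .`). The `http://example.org/` namespace and the helper names are placeholders assumed for this example; a real KG would reuse established vocabularies and would typically rely on an RDF library rather than string formatting.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, for illustration only

triples = [
    ("Dalhousie University", "is_a", "public research university"),
    ("Dalhousie University", "located_in", "Nova Scotia"),
]


def iri(name):
    """Mint an IRI for a name by replacing spaces and percent-encoding the rest."""
    return "<" + BASE + quote(name.replace(" ", "_")) + ">"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    return [f"{iri(s)} {iri(p)} {iri(o)} ." for s, p, o in triples]


lines = to_ntriples(triples)
print("\n".join(lines))
```

Because every subject, predicate, and object is a global identifier, two datasets that use the same IRIs can be merged by simply concatenating their triples, which is the interoperability point made on the slide.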
  • 26. Knowledge Representation • More recently, property graph data models have gained popularity • Focus on data storage, querying, and developers/applications • Unlike the Semantic Web and RDF, property graphs are not standardized; multiple vendors have introduced their own schemas and query languages (Cypher, Gremlin, PGQL, etc.)
  • 27. Knowledge Representation • Popularity of different databases since 2013 27 Source: http://db-engines.com
  • 28. Knowledge Representation • Popularity of property graphs vs. RDF stores 28 Source: http://db-engines.com
  • 29. Knowledge Representation • Why are RDF stores not as popular as property graphs? • RDF is a complex standard; property graphs provide similar services with less complexity • Developers are more familiar with property graphs; RDF adds an unnecessary level of complexity • Even Semantic Web founders acknowledge the shortcomings: “Why the semantic web will never work” • Interesting blog post on differences between RDF and property graphs
  • 30. Property Graph • Property graph characteristics: • Contains nodes and relationships • Nodes have one or more labels and key-value pair properties (i.e., attributes) • Relationships are labeled, directed, and always have a start node and an end node • Relationships also have key-value pair properties • Mostly quantitative properties: weight, cost, distance, rating, time interval, etc. • Together, a relationship’s direction and label add semantic meaning to the structuring of nodes
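These characteristics map naturally onto two small data structures. The sketch below is a plain-Python illustration, not how any particular graph database stores data; the class names, the example nodes, and the `rating` value are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # one or more labels
    properties: dict = field(default_factory=dict)   # key-value attributes


@dataclass
class Relationship:
    type: str                                        # the relationship's label
    start: Node                                      # direction runs from start ...
    end: Node                                        # ... to end
    properties: dict = field(default_factory=dict)   # e.g. weight, cost, rating


def describe(rel):
    """Render a relationship as (start)-[TYPE {props}]->(end)."""
    return f"({rel.start.properties['name']})-[{rel.type} {rel.properties}]->({rel.end.properties['name']})"


# Hypothetical example data.
ahmad = Node({"Friend"}, {"name": "Ahmad"})
kod = Node({"Restaurant"}, {"name": "KoD"})
likes = Relationship("Likes", ahmad, kod, {"rating": 5})

print(describe(likes))
```

Swapping `start` and `end` would assert that the restaurant likes the person, which shows how direction and label together carry the meaning.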
  • 32. Property Graph • Property graphs are “whiteboard-friendly” • Data model can simply be a sketch on a whiteboard 32 Source: https://neo4j.com/developer/guide-data-modeling/
  • 33. Property Graph • Property graphs are “whiteboard-friendly” • Whiteboard sketch formalized a bit 33 Source: https://neo4j.com/developer/guide-data-modeling/
  • 34. Property Graph • Property graphs are “whiteboard-friendly” • Node/relationship labels and properties added 34 Source: https://neo4j.com/developer/guide-data-modeling/
  • 35. Property Graph • Property graphs are “whiteboard-friendly” • Final model in graph DB 35 Source: https://neo4j.com/developer/guide-data-modeling/
  • 36. Popular Property Graphs • Neo4j • By far the most popular property graph database. Neo4j supports large graph structures and is free to download and use • Amazon Neptune • Supports both property graph-based and RDF-based models • OrientDB
  • 37. Neo4j • The next tutorials will focus on Neo4j and Cypher (the query language) • Goal is to expose students to popular new technologies used in academia and industry • Not enough time to learn everything about Neo4j and Cypher, so you’ll learn the basics 37
  • 38. Cypher • Suppose that I have a Neo4j graph containing information on my circle of friends and all their favorite donair restaurants in Halifax • Diagram nodes: Self {name: Ali}; Friend {name: Ahmad}; Friend {name: Chris}; Restaurant {name: Tony’s donair}; Restaurant {name: KoD}; City {city_name: Halifax} • Diagram relationships: two Friend_of, two Likes, two Located_in
  • 39. Cypher • I want to find all donair restaurants in Halifax that my friends like • MATCH (:Self)-[:Friend_of]->(:Friend)-[:Likes]->(r:Restaurant)-[:Located_in]->(:City {city_name: 'Halifax'}) RETURN r.name
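To see what the query's path pattern does, the same traversal can be sketched over an in-memory edge list in Python. The exact wiring of the graph (which friend likes which restaurant) is not recoverable from the slide, so the edges below are assumptions; the function names are also illustrative.

```python
# Assumed edges, as (start, relationship, end); the slide does not state
# which friend likes which restaurant.
edges = [
    ("Ali", "Friend_of", "Ahmad"),
    ("Ali", "Friend_of", "Chris"),
    ("Ahmad", "Likes", "Tony's donair"),
    ("Chris", "Likes", "KoD"),
    ("Tony's donair", "Located_in", "Halifax"),
    ("KoD", "Located_in", "Halifax"),
]


def neighbors(node, relationship):
    """Follow outgoing edges of one relationship type from a node."""
    return [end for start, rel, end in edges if start == node and rel == relationship]


def restaurants_friends_like(me, city):
    """Walk Self -> Friend -> Restaurant -> City, one hop per relationship in the pattern."""
    found = []
    for friend in neighbors(me, "Friend_of"):
        for restaurant in neighbors(friend, "Likes"):
            if city in neighbors(restaurant, "Located_in") and restaurant not in found:
                found.append(restaurant)
    return found


print(restaurants_friends_like("Ali", "Halifax"))
```

Unlike the Cypher query, this sketch removes duplicates, so a restaurant liked by several friends is listed once. The nested loops mirror the three hops that Cypher expresses declaratively in a single pattern.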
  • 40. To Do Before The Next Tutorial 1. Familiarize yourself with Neo4j: https://neo4j.com/developer/get-started/ 2. Start a Neo4j sandbox: https://neo4j.com/sandbox/?ref=developer-start • Start with the ‘Movies’ pre-built project and follow tutorial instructions 3. Download Neo4j desktop version: https://neo4j.com/download/ 40
  • 41. Knowledge Graph Project • Purpose of the project is to expose students to the latest graph technologies and methods • Requirements for the project: • Neo4j graph database – desktop version • Microsoft Excel or R
  • 42. Knowledge Graph Project • Each student will receive their own dataset • You’re not expected to become Neo4j experts or learn Cypher in a short period of time, so all Cypher scripts required to import and analyze the data will be provided • You’re required to apply various graph algorithms (using provided scripts) and interpret the output • You may be required to do additional simple analysis in Excel (e.g., histograms, frequencies) • Objective here is to make students aware of existing methods and technologies
  • 43. Conclusion • KGs are gaining popularity in research and industry due to their wide range of uses • In its simplest form, a KG is essentially a collection of semantic triples {s, p, o} • Triples can be easily represented as nodes and edges in a graph – hence, knowledge graph • RDF is the de facto model for KGs, but property graphs are gaining traction due to their versatility • RDF is complex, and some describe it as ‘overkill’ • Property graphs are easy to learn and use, even for people without a technical background