Evotec - How can Knowledge Graphs support Drug Discovery

Polina Shpudeiko
Scientific Programmer, Computational Biology
How can Knowledge Graphs
Support Drug Discovery?
Graph Summit Neo4j, Frankfurt, October 10th, 2023

Agenda
1. Why do we need knowledge graphs in drug discovery?
2. How can we build them to solve our challenges?
3. What can be done with the power of graphs?
4. Where will it lead us?

Why do we need knowledge
graphs in drug discovery?

Integration of public and internal knowledge
Towards a comprehensive understanding of diseases and therapies
Public knowledge Internal knowledge

The life science space is diverse
Navigating the complexity of biology, chemistry and clinics
Genes
Proteins
Mutations
Tissues
Pathways
Cell types
Compounds
Diseases

The life science space is diverse
Various databases capture the structured public knowledge
And these are only a few examples...
Genes
Proteins Compounds
Mutations
Tissues
Diseases Pathways
Cell types

Literature space is adding more complexity
Scientific articles are the key for sharing novel knowledge
Statistics
There are approximately 30,000 journals in the world, with a growth rate of 5-7% per year
Scientific research is evolving rapidly, with an annual influx of approximately 2 million new articles
There are already 36 million articles in the open source database for articles
The knowledge in these articles can be extracted using natural language processing (NLP)

The drug discovery process as our in-house data source
Each step generates novel insights and requires dedicated expertise
Areas of interest: Disease biology; Screening and compound chemistry; Clinics
Process steps: Target ID and validation → Hit ID and optimisation → Lead optimisation → Candidate selection → Candidate profiling → Clinical trial

Mission – harmonize data,
understand diseases and support
the development of new therapies
Integration of public and internal knowledge
Combining the two will lead us towards the discovery of novel targets at Evotec
Public knowledge Internal knowledge

How can we build them
to solve our challenges?

Challenge of managing diverse data

Experimental data
The public ontology space is not standardized: multiple distinct ontologies for diseases
The public ontology space is incomplete: ontologies do not cover cutting-edge science and novel associations
The ontological space is complex and incomplete
Stable and reliable data models and custom ontologies are essential

Knowledge graph as harmonisation tool
Bringing together heterogeneous biological data in one place
• 15 databases
• 30 million nodes and 100 million connections, and counting
• Deep understanding of ontologies (hierarchical/semantic connections between different entities) was required for harmonisation of diseases and traits
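The harmonisation step can be illustrated with a minimal sketch. It assumes that equivalent disease terms from different ontologies are linked by cross-references, and it groups them under one canonical ID with a union-find structure. The function name `harmonise_terms` and the IDs such as `ONTO_A:001` are hypothetical placeholders, not the identifiers or the method used at Evotec.

```python
def harmonise_terms(xrefs):
    """Group ontology term IDs that are linked by cross-references.

    xrefs: iterable of (term_id, term_id) pairs, each stating that the two
    IDs describe the same concept (an assumption of this sketch).
    Returns a dict mapping every term ID to a canonical ID, chosen here as
    the lexicographically smallest ID in its group.
    """
    parent = {}

    def find(term):
        # Register unseen terms as their own root, then walk up to the root.
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for left, right in xrefs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            # Keep the smaller root so the result is deterministic.
            keep, drop = sorted((root_left, root_right))
            parent[drop] = keep

    return {term: find(term) for term in list(parent)}


# Hypothetical cross-references between three disease ontologies.
example_xrefs = [
    ("ONTO_A:001", "ONTO_B:17"),
    ("ONTO_B:17", "ONTO_C:9"),
    ("ONTO_A:002", "ONTO_C:4"),
]
canonical = harmonise_terms(example_xrefs)
```

In a graph database the canonical ID would typically become the single disease node that all source-specific IDs point to.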

• Extract public knowledge from scientific articles with NLP
• Overlay de novo mined knowledge with the ontological database space
Knowledge graph as integration tool
Integration of literature data with NLP approaches
Pathways
Tissue
Genes
Compounds
Diseases
Traits
Mutations
NLP
Article

Example from the figure: PMID, prevent, Depressive disorder, BAIAP2
Knowledge graph
• Natural Language Processing (NLP) extracts the key entities mentioned in the articles
• NLP-powered search engines can understand the context and semantics of queries
• Ontologies help to harmonize the extracted knowledge in one graph
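As a rough illustration of how extracted entities become graph content, the sketch below uses simple dictionary matching as a stand-in for a real NLP pipeline. It maps the matched terms to harmonised IDs and emits one co-mention record per article, gene and disease. The dictionaries, the ID scheme and the article ID are hypothetical, and the example sentence is a toy input, not a quoted finding.

```python
import re
from itertools import product

# Hypothetical dictionaries mapping surface forms to harmonised IDs.
GENE_TERMS = {"baiap2": "GENE:BAIAP2"}
DISEASE_TERMS = {"depressive disorder": "DISEASE:depressive_disorder"}


def find_entities(text, dictionary):
    """Return the IDs of all dictionary terms found as whole words in text."""
    lowered = text.lower()
    found = set()
    for surface, entity_id in dictionary.items():
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            found.add(entity_id)
    return found


def co_mentions(article_id, text):
    """Return sorted (article, gene, disease) triples for one article."""
    genes = find_entities(text, GENE_TERMS)
    diseases = find_entities(text, DISEASE_TERMS)
    return sorted(
        (article_id, gene, disease) for gene, disease in product(genes, diseases)
    )


# Toy input; "ARTICLE:1" is a placeholder, not a real PMID.
triples = co_mentions(
    "ARTICLE:1", "Toy sentence mentioning BAIAP2 and depressive disorder."
)
```

Each triple could then be loaded as an article node connected to a gene node and a disease node.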
Knowledge graph as tool for integration
Integration of literature data with NLP approaches

• Signatures are a representation of internal experimental knowledge
• Example: genes whose expression changes in response to therapy
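One possible way to picture a signature as graph data is sketched below. It assumes a signature is a set of genes with log2 fold changes and turns it into directed edges from a signature node to gene nodes. The threshold, the edge labels `UP`/`DOWN` and all IDs are illustrative assumptions, not Evotec's data model.

```python
def signature_edges(signature_id, log2_fold_changes, threshold=1.0):
    """Turn an expression signature into (signature, direction, gene, value) edges.

    log2_fold_changes: dict mapping gene ID to log2 fold change.
    Genes whose absolute change is below the threshold are left out.
    """
    edges = []
    for gene, change in sorted(log2_fold_changes.items()):
        if abs(change) >= threshold:
            direction = "UP" if change > 0 else "DOWN"
            edges.append((signature_id, direction, gene, change))
    return edges


# Hypothetical signature of a response to a therapy.
example_edges = signature_edges(
    "SIG:therapy_x", {"GENE_A": 2.3, "GENE_B": -1.5, "GENE_C": 0.2}
)
```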
Pathways
Tissue
Genes
Compounds
Diseases
Traits
Mutations
NLP
Article
Knowledge graph as tool for unification
Combining internal knowledge with public data
Signatures

What can be done with
the power of graphs?

Disease tree ontology:
Fibrosis
Integration of Public and Internal data in graph
Using public knowledge for target identification from patient-derived signatures
Graph representation
of a Signature

• Expression data in a large patient cohort can be enriched with heterogeneous data from public (NLP, pathways, cell types) and internal (in-house signatures) resources
• This allows us to better understand the underlying mechanisms that drive the disease
Patient stratification-based signature
Integration of patient signatures and experimental models
Translational research from animal to human
In vivo
signatures
In vitro
signatures

Figure columns: Kidney Diseases, Genes, Any Connected Disease
• Disease space is defined by the parent term of the ontology (Kidney Disease)
• All NLP co-mentions of child diseases with genes are collected
• To determine specificity, all other diseases that were co-mentioned are added
• Co-mention edges are weighted by the number of unique articles
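The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions: co-mentions are given as (article, disease, gene) records, and only the direct children of the parent term are used. The function name `build_disease_space` and all IDs are hypothetical.

```python
from collections import defaultdict


def build_disease_space(parent_term, child_terms, co_mentions):
    """Build weighted disease-gene edges for one disease space.

    child_terms: dict mapping a parent disease ID to a set of child IDs.
    co_mentions: iterable of (article_id, disease_id, gene_id) records.
    Returns a dict {(disease_id, gene_id): number of unique articles}.
    """
    co_mentions = list(co_mentions)
    # Step 1: the disease space is the parent term plus its child diseases.
    in_space = set(child_terms.get(parent_term, ())) | {parent_term}
    # Step 2: collect the genes co-mentioned with any disease in the space.
    space_genes = {gene for _, disease, gene in co_mentions if disease in in_space}
    # Step 3: add every disease co-mentioned with those genes, including
    # diseases outside the space, so that specificity can be judged later.
    articles = defaultdict(set)
    for article, disease, gene in co_mentions:
        if gene in space_genes:
            articles[(disease, gene)].add(article)
    # Step 4: weight each edge by the number of unique articles.
    return {edge: len(ids) for edge, ids in articles.items()}


# Hypothetical ontology fragment and co-mention records.
children = {"kidney_disease": {"polycystic_kidney"}}
records = [
    ("A1", "polycystic_kidney", "G1"),
    ("A2", "polycystic_kidney", "G1"),
    ("A2", "polycystic_kidney", "G1"),  # same article twice, counted once
    ("A3", "neoplasm", "G1"),
    ("A4", "neoplasm", "G2"),  # G2 never appears in the kidney space
]
space = build_disease_space("kidney_disease", children, records)
```

Counting unique articles rather than raw mentions keeps one article that repeats a pair many times from dominating the edge weight.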
Defining molecular disease spaces
Based on internal experimental data and NLP-mined external knowledge

Defining molecular disease spaces
Identification of kidney-specific genes in the embeddings of kidney disease space
Figure labels: Neoplasms, Infectious Diseases, Kidney Diseases, Polycystic Kidney Diseases
• Genes associated with genetic kidney diseases
• Genes involved in infectious diseases
• Genes which take part in cancer and are not specific to the disease space of interest
• Genes which drive kidney diseases are the most important target candidates
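The slide identifies specific genes from graph embeddings. As a much simpler stand-in, and explicitly not the embedding approach shown here, the sketch below scores each gene by the share of its co-mention weight that falls inside the disease space of interest. The function name `specificity_scores` and the input weights are hypothetical.

```python
def specificity_scores(edge_weights, in_space):
    """Score genes by the fraction of co-mention weight inside a disease space.

    edge_weights: dict {(disease_id, gene_id): weight}.
    in_space: set of disease IDs that belong to the space of interest.
    Returns a dict {gene_id: score between 0 and 1}.
    """
    totals, inside = {}, {}
    for (disease, gene), weight in edge_weights.items():
        totals[gene] = totals.get(gene, 0) + weight
        if disease in in_space:
            inside[gene] = inside.get(gene, 0) + weight
    return {
        gene: inside.get(gene, 0) / total
        for gene, total in totals.items()
        if total > 0
    }


# Hypothetical weights: G1 is mostly kidney-related, G2 mostly cancer-related.
weights = {
    ("polycystic_kidney", "G1"): 3,
    ("neoplasm", "G1"): 1,
    ("polycystic_kidney", "G2"): 1,
    ("neoplasm", "G2"): 9,
}
scores = specificity_scores(weights, {"polycystic_kidney"})
```

A gene with a high score is mentioned mainly with diseases inside the space, while a low score points to a gene that is broadly involved elsewhere, such as in cancer.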

Sharing of the data insights
NeoDash solution for internal knowledge sharing

Where will it lead us?

Summary and outlook
• Graphs are powerful tools for data harmonization in the diverse life science space, bringing ontologies together
• Bringing public and internal knowledge together in one place with graphs allowed us to characterize internal signatures in the most efficient way
• Applying diverse graph algorithms helps us uncover hidden insights in our data, such as identifying the genes specific to the disease of interest that have the highest potential for Target ID

Polina Shpudeiko
Scientific Programmer, Computational Biology
polina.shpudeiko@evotec.com
