SERVIER Pegasus - Knowledge graph for the early phases of new drug discovery

Neo4j - GraphSummit Paris - 08/06/2023
Pegasus – Knowledge Graph
For The Early Drug Discovery
Jeremy Grignard, PhD
Research & Data Scientist
Institut de Recherches Servier

Discovering A New Drug: A Long, Expensive And Risky Process
[Figure: drug discovery pipeline]
Research: Discovery (Exploratory, Target Identification, Screening), Preclinical
Clinical: Phase I, Phase II, Phase III
10^6 perturbators → 10^1 perturbators → Candidate
10-15 years - 2 billion Euros - High failure rate
Formulation of a causal hypothesis between a target and a disease
Strategic Objective: Obtain an MA for a chemical or biological entity / combination of entities every 3 years
How to improve the early drug discovery phases in order to increase the success
rate of drug candidates in clinical phases?

Our Mission - Data Sciences & Data Management
Efficiently guide early research projects using computational methods supported
by experimental capabilities
We rely on 3 interconnected activities:
• High throughput design of efficient perturbators
• Explainable project-oriented selection of relevant profiled perturbators
• Analysis of large, heterogeneous datasets and models to ensure target tractability and support rational decision making
4 main interconnected areas of expertise: Profiling, Systems Biology, Sequence Designs, Knowledge Graph

Useful Data Sources For Therapeutic Projects And Our Activities
Large data - heterogeneous pharmaco-biological concepts
• Genes, transcripts, proteins
• Ontologies (genomics, phenotypic)
• Static Maps
• Diseases
• In Vivo / In Vitro models
• Perturbators
- Chemicals (small compounds, PROTACs, probes)
- Target (shRNA, CRISPR, siRNA, overexpression)
- Antisense oligonucleotides (ASOs)
- Antibody
• Fingerprints
How to capitalize on and link heterogeneous data to bring value to therapeutic projects and support decision making?

Pegasus – Knowledge Graph For The Early Drug Discovery
Rationale & Context
• Integration of heterogeneous data
- Labeled property graph with Neo4j
• 46,371,784 entities – 66 labels
• 331,570,883 relations – 14 types
• Flexible data model
- Model evolved over time
- Can be easily changed depending on new requests
• Efficient data preparation (hours), import (minutes), storage and query
• By and for Servier
Data Model
Pegasus is built to answer questions and give valuable insights extremely quickly
Answering a question is like traversing paths in a graph
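To make the idea of answering a question by traversing paths concrete, here is a minimal Python sketch. It is not Pegasus code: the toy graph, its node names and the `find_path` helper are all hypothetical, and a real query would run in Neo4j over the actual data model.

```python
from collections import deque

# Hypothetical toy edges (treated as undirected); not real Pegasus data.
EDGES = [
    ("Compound_1", "Activity_1"),
    ("Activity_1", "Gene_A"),
    ("Gene_A", "Pathway_P"),
    ("Pathway_P", "Disease_D"),
    ("Compound_2", "Activity_2"),
    ("Activity_2", "Gene_B"),
]

def build_graph(edges):
    """Build an adjacency map: node -> set of neighbouring nodes."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def find_path(graph, start, goal):
    """Breadth-first search; returns a shortest path as a list, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in sorted(graph[node]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

GRAPH = build_graph(EDGES)
# "Is Compound_1 connected to Disease_D, and through what?"
print(find_path(GRAPH, "Compound_1", "Disease_D"))
```

The question "is this compound linked to this disease?" becomes "does a path exist?", and the path itself is the explanation.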

Pegasus – Knowledge Graph For The Early Drug Discovery
[Figure: data flow into and out of Pegasus]
Process: Raw Data → Preparation → Primary Data → Aggregation → Aggregated Data → PEGASUS (Added Value)
Data → Information → Knowledge
Exposure
Applications: Target Id Card, DFSL, ASOs design, Phenotypic screening deconvolution, Action Models, Prediction Models, uORF
Outcomes: Report, Report, Cpds list, List of targets, List of ASOs
Process: Informed compounds library

ASOs (Antisense Oligonucleotides) Design
Characterization Of ASOs Off-target Effects
Given (100,000) ASOs designed for a target, quickly prioritize the ones (784) to screen
MATCH path=(:Asos)-[:HAS_ACTIVITIES]-(:ActivityAsos)-[:ON_TARGETS]-(:Transcript)-
[:HAS_REFERENCES]-(:Transcript)-[:TRANSCRIPTS]-(:Gene)-[:HAS_REFERENCES]-(:Gene)-
[:HAS_ACTIVITY]-(:ActivityGeneEssential)-[:ON_TARGETS]-(:Disease) RETURN path;
Industrial Applications
ASOs designed, active and non-toxic:
• X and Y (preclinical)
• A, B and C (LO)
• O and P (Research)
Note for X:
• Treating a baby with epileptic encephalopathy
• 6,143 designed ASOs
• 2,073 → 372 essential genes
• 1,344 → 400 developmental genes
• 784 screened ASOs (mouse, neurons)
• 13 active ASOs
• 8/13 no off-target
• 3/13 best activities
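The prioritization step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ASO identifiers, gene names and the `prioritize` function are hypothetical, and the actual selection criteria used at Servier are not described in the slides.

```python
# Hypothetical annotations: ASO id -> genes hit through off-target transcripts.
OFF_TARGET_GENES = {
    "ASO_001": {"GENE_1"},
    "ASO_002": set(),
    "ASO_003": {"GENE_2", "GENE_3"},
    "ASO_004": {"GENE_4"},
}
ESSENTIAL_GENES = {"GENE_1"}       # hypothetical
DEVELOPMENTAL_GENES = {"GENE_3"}   # hypothetical

def prioritize(off_targets, essential, developmental):
    """Keep ASOs with no off-target in essential or developmental genes,
    ranked by number of off-target genes (fewest first, ties by name)."""
    risky = essential | developmental
    kept = [aso for aso, genes in off_targets.items() if not (genes & risky)]
    return sorted(kept, key=lambda aso: (len(off_targets[aso]), aso))

print(prioritize(OFF_TARGET_GENES, ESSENTIAL_GENES, DEVELOPMENTAL_GENES))
```

In the graph, the off-target gene sets would come from paths such as the one in the Cypher query on this slide rather than from a hand-written dictionary.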

Automated Focused Library Design
[Figure: graph schema. Labels: GENE, TRANSCRIPT, PROTEIN, HAS REFERENCE, TRANSCRIPT IN, TRANSLATE IN, HAS HOMOLOGY]
Protein Embeddings
• Encode functional and structural properties of proteins using LMs
Auto-encoder

[Figure: graph schema. Labels: GENE, TRANSCRIPT, PROTEIN, CONTEXT, PERTURBATOR, IS ACTIVE ON]

[Figure: graph schema. Labels: GENE, TRANSCRIPT, PROTEIN, CONTEXT, PERTURBATOR, HAS PREDICTED ACTIVITY]
Historical activity database
Multitask learning
Federated deep learning:
- Unbiased
- Sequence based
- Unlimited number of molecules
- Very limited number of targets

[Figure: graph schema. Labels: PERTURBATOR, CHEMICALLY SIMILAR]
Fingerprints + metric (Tanimoto) + threshold
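As an illustration of "fingerprints + metric (Tanimoto) + threshold", here is a minimal Python sketch. Fingerprints are represented as sets of "on" bit indices; the compound names, the bit values and the 0.5 threshold are hypothetical, and a real pipeline would compute fingerprints with a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared 'on' bits divided by the union of 'on' bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

def similar_pairs(fingerprints, threshold):
    """All compound pairs whose Tanimoto similarity exceeds the threshold."""
    names = sorted(fingerprints)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = tanimoto(fingerprints[a], fingerprints[b])
            if score > threshold:
                pairs.append((a, b, round(score, 3)))
    return pairs

# Hypothetical fingerprints (sets of bit positions that are set).
FINGERPRINTS = {
    "cpd_1": {1, 4, 7, 9},
    "cpd_2": {1, 4, 7, 12},
    "cpd_3": {2, 3, 5},
}
print(similar_pairs(FINGERPRINTS, threshold=0.5))
```

Each pair that passes the threshold would become one CHEMICALLY SIMILAR relation between two PERTURBATOR nodes, with the score stored as a property.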

[Figure: graph schema. Labels: PERTURBATOR, CHEMICALLY SIMILAR, PHENOTYPICALLY SIMILAR]
Phenotypic: Cell Painting + threshold
• Biology
• Polypharmacology
• Perturbation agnostic: bridge the gap between Chemistry & Biology
• Medium throughput
Auto-encoder
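A phenotypic similarity can be thresholded in the same spirit. The sketch below assumes cosine similarity over small numeric profile vectors; the slides do not state which metric is used on Cell Painting profiles or auto-encoder embeddings, so the metric, the profiles and the 0.9 threshold are all assumptions. It also shows the "perturbation agnostic" point: a compound and a CRISPR perturbation are compared in the same space.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length profile vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def similar_to(profiles, query, threshold):
    """Names of all other perturbators whose profile is close to the query's."""
    ref = profiles[query]
    return sorted(
        name for name, vec in profiles.items()
        if name != query and cosine_similarity(ref, vec) > threshold
    )

# Hypothetical morphological profiles (e.g. embedded Cell Painting features).
PROFILES = {
    "cpd_1": [1.0, 0.0, 2.0],
    "crispr_gene_A": [2.0, 0.0, 4.0],  # different modality, same phenotype direction
    "cpd_2": [0.0, 3.0, 0.0],
}
print(similar_to(PROFILES, "cpd_1", threshold=0.9))
```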

[Figure: graph schema. Labels: PERTURBATOR, CHEMICALLY SIMILAR, PHENOTYPICALLY SIMILAR, TRANSCRIPTOMICALLY SIMILAR]
Transcriptomic: L1000 + cQuery
• Biology
• Perturbation agnostic: bridge the gap between Chemistry & Biology

Automated Focused Library Design: Filtering out compounds / Off Targets
[Figure: graph schema. Labels: PERTURBATOR, CHEMICALLY DEFINED AS PAINS, ESSENTIAL GENE, OFF TARGET (×3), LOW SPECIFICITY]

Automated Focused Library Design: Filtering out compounds
[Figure: graph schema. Labels: PERTURBATOR, COMMERCIALLY UNAVAILABLE, COMMERCIALLY AVAILABLE, INTERNALLY AVAILABLE]
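The filtering criteria on the last two slides (PAINS, off-targets on essential genes, low specificity, availability) can be combined into a single predicate. The sketch below is hypothetical: the `Perturbator` fields, the `max_off_targets` cutoff standing in for "low specificity", and the example library are illustrative assumptions, not the Pegasus schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Perturbator:
    name: str
    is_pains: bool = False              # flagged as a pan-assay interference compound
    hits_essential_gene: bool = False   # has an off-target on an essential gene
    n_off_targets: int = 0              # proxy for (lack of) specificity
    availability: str = "internal"      # "internal", "commercial" or "unavailable"

def keep(p, max_off_targets=3):
    """True if the perturbator passes every filter (cutoff is an assumption)."""
    return (
        not p.is_pains
        and not p.hits_essential_gene
        and p.n_off_targets <= max_off_targets
        and p.availability != "unavailable"
    )

LIBRARY = [
    Perturbator("cpd_1"),
    Perturbator("cpd_2", is_pains=True),
    Perturbator("cpd_3", hits_essential_gene=True),
    Perturbator("cpd_4", n_off_targets=10),
    Perturbator("cpd_5", availability="unavailable"),
    Perturbator("cpd_6", availability="commercial"),
]
FOCUSED = [p.name for p in LIBRARY if keep(p)]
print(FOCUSED)
```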

[Figure: graph schema. Labels: GENE, TRANSCRIPT, PROTEIN, CONTEXT, DISCARDED, LEVEL OF CONFIDENCE, PERTURBATOR]

[Figure: graph schema. Labels: NOVELTY, PERTURBATOR]

Automated Focused Library Design: Moving away from the target of interest (cellular assay)
[Figure: graph schema. Labels: PPI, Pathway, PART OF, PATHWAY, PERTURBATOR]

Automated Focused Library Design: Moving away from the target of interest (cellular assay)
[Figure: graph schema. Labels: INVOLVED IN, PATHWAY, Disease, DISEASE, PERTURBATOR]

Automated Focused Library Design: Hypothesis testing
[Figure: graph schema. Labels: GENE, TRANSCRIPT, PROTEIN, CONTEXT, DISCARDED, PATHWAY, DISEASE, PERTURBATOR]

Target based hypothesis
[Figure: graph schema. Labels: DISCARDED, PATHWAY, DISEASE, PERTURBATOR]
Perturbator 1, Target 1, hypothesis 1: Predicted activity, Embedding homology, Translation, Transcription

Perturbator 1, Target 1, hypothesis 1: Predicted activity, Embedding homology
Perturbator 2, Target 1, hypothesis 2: Chemical similarity, Experimental activity
...
Perturbator N (± 1,000), Target 1, hypothesis M
Medium scale screening campaign
Hit rate enrichment vs. regular pilot screen library (null hypothesis)
Validated hits: Perturbator 3, Perturbator 13, Perturbator 23
Validated hits + Validated / unvalidated hypothesis
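One simple way to quantify "hit rate enrichment vs. a regular pilot screen library (null hypothesis)" is an enrichment factor plus a one-sided binomial tail probability. This is a sketch under assumptions: the slides do not say which statistic is used, and the counts and baseline rate below are made up for illustration.

```python
from math import comb

def enrichment_factor(hits, n_screened, baseline_rate):
    """Observed hit rate of the focused library divided by the baseline hit rate."""
    return (hits / n_screened) / baseline_rate

def binomial_tail(hits, n_screened, baseline_rate):
    """P(X >= hits) for X ~ Binomial(n_screened, baseline_rate):
    probability of at least this many hits if the focused library
    were no better than the pilot library (the null hypothesis)."""
    return sum(
        comb(n_screened, k) * baseline_rate**k * (1 - baseline_rate) ** (n_screened - k)
        for k in range(hits, n_screened + 1)
    )

# Hypothetical numbers: 4 hits among 50 focused compounds, 2% baseline hit rate.
HITS, N_SCREENED, BASELINE = 4, 50, 0.02
print("enrichment factor:", enrichment_factor(HITS, N_SCREENED, BASELINE))
print("p-value under the null:", binomial_tail(HITS, N_SCREENED, BASELINE))
```

A small tail probability suggests the enrichment is unlikely to be chance, which supports the hypotheses used to assemble the focused library.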

Automated Focused Library Design: Hit deconvolution
[Figure: graph schema. Labels: HITS]

[Figure: graph schema. Labels: GENE, TRANSCRIPT, PROTEIN, CONTEXT, HITS, DISCARDED, PATHWAY, DISEASE, CRISPR, shRNA, cDNA, TRANSCRIPTOMIC SIMILARITY (×4)]
Transcriptomic: L1000 + cQuery
• Biology
• Perturbation agnostic: bridge the gap between Chemistry & Biology

[Figure: graph schema. Labels: shRNA, cDNA, TRANSCRIPTOMIC SIMILARITY (×4), PHENOTYPIC SIMILARITY]

Automated Focused Library Design: Modality Expansion
[Figure: graph schema. Labels: shRNA, cDNA, ASOs, PREDICTED ACTIVE]

Simple Overview With Results
Given any target, identify compounds having activity, their analogues and specificity
MATCH path=(g:Gene)-[:HAS_REFERENCES]-(:Gene)-[:ON_TARGETS]-(:ActivityChemical)-
[:HAS_ACTIVITIES]-(:ChemicalChembl)-[r:HAS_SIMILARITIES]-(:ChemicalServier)
WHERE g.geneId = 'target X' AND r.tanimoto_similarity > 0.6 RETURN path;
Success Stories
Target X:
• Identification of 984 Servier compounds chemically similar to compounds having an activity
• MST validation: hit rate 15%
Target Y:
• From Project Leader: 4 potential reference compounds
• Identification of 47 chemically similar compounds
• Using annotations in Pegasus → completely non-specific profile of the reference compounds

Therapeutic Target Environment Exploration (ID card)
Pegasus Application
Given any target, present a target ID card
Success Stories
For plenty of targets: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z
- Genes / transcripts / proteins identifiers, cross-references, isoforms
- Biological processes, cellular localization
- Protein half lives
- Gene essentiality
- In Vivo models
- Gene / Disease associations
- SNP
- Pathways
- Perturbators

Conclusion
Knowledge graphs are well suited for integrating
• Large amounts of heterogeneous and sparse data (as typically seen in pharmaceutical research)
This data structure is guiding our therapeutic projects on various aspects
• And allows us to seamlessly integrate exploratory cutting-edge AI approaches
Systematic generation of data (in silico & in vitro)
• Is a requirement to properly feed those databases and generate knowledge out of them
Communication
• Neo4j Health Care & Life Sciences Workshop; Symposium Servier – AI to New Drug Development;
Servier Corporate Strategy & Executive Director; France Culture « La Méthode Scientifique »…

Perspective
"Objects are characterized by the way they interact. If an object has no
interactions, influences nothing, acts on nothing, emits no light, does not attract, does not
repel, cannot be touched, has no smell, etc.,
it is as if it did not exist.
To speak of objects that never interact is to speak of things that, even if
they did exist, do not concern us.
We do not even understand what it could mean to say that such things 'exist'.
The world that we know, that touches us, that interests us, what we
call 'reality', is the vast network of interacting entities that manifest themselves
to one another by interacting, and of which we are part.
It is this network (Pegasus) that we are interested in."
Helgoland by Carlo Rovelli

Acknowledgements
J.P. Stephan
N. Boisseau
I. S. Khader
S. Lotfi
A.L. Ong
A. Gohier
T. Dorval
DSDM Team
TA(s)
PA(s)

JUMP-CP: Joint Undertaking In Morphological Profiling - Cell Painting
Phenotypic Fingerprint
High-Content Screening - Cell Painting Fingerprint Analysis
Compounds, CRISPR → Fingerprints (morphological descriptors)
Phenotypic fingerprints induced by compounds screened in HCS show various phenotypic responses and reveal dose-response effects with different mechanisms of action
[Figure labels: negative control and no-effect compounds; dose response]

Target Expansion
Protein Fingerprint
Protein Embeddings
• Encode functional and structural properties of proteins using LMs
Fingerprint Analysis
X, Y, Z, A, B, C
Protein fingerprints (embeddings) seem to carry sequence and domain function information
Ongoing work: interpretability + sub-domain similarity
Proteins → Fingerprints (embedded sequences)

Cancer Dependency Map (DepMap)
Cellular Model and Chemical Fingerprints
Achilles & CCLE
• Identification of cell lines that express a target of interest (experimental validation for CISH)
Fingerprint Analysis (PRISM 1D)
• Phenotypic similarity to identify compounds sharing the same phenotypic effects
Targets / MOAs / Chemical Similarity In Clusters
Phenotypic similarity
Drugs → Fingerprints (activity in cell lines)
Similarity of cellular model fingerprints induced by compounds in DepMap PRISM reveals a correlation (in some clusters) between chemical structures and phenotypic responses

CMAP – L1000
Chemical / CRISPR / sh Fingerprints
Dimensional Reduction Cluster - MOA / Targets Analysis
Compounds identified within the cluster, obtained by reduction of L1000 fingerprints, have the same mechanism of action (MOA) and are active on the same target family (HDAC*)
Compounds, CRISPR, sh, … → Fingerprints (L1000 signatures)

CMAP – L1000
Chemical / CRISPR / sh Fingerprints
Fingerprint Similarity (Phenotypic – DepMap) / Fingerprint Similarity (Chemical)
Compounds identified within the cluster, obtained by reduction of L1000 fingerprints, have the same phenotypic effects (in DepMap PRISM) and are not chemically similar, while sharing the same mechanism of action and activity on the same target family (HDAC*)

Our PA DSDM activities Linked To Pegasus (so far)
Support Therapeutic Projects
Therapeutic target environment exploration (ID card)
• Get target annotations: isoforms, biological processes, localization, half-lives, essentiality, models, diseases, SNP, pathways, perturbators activity
Optimized design of Antisense Oligonucleotides (ASO)
• Characterize ASOs off-target effects to prioritize ASOs to screen
Design new experiments (e.g. coupling uORFs / ASOs)
• Identify target transcripts having uORFs that can be targeted by ASOs to increase protein expression
Identification of focused screening libraries
• Identify relevant perturbators, validate biological hypotheses, and expand from Servier compound library to phenotypic, omics, cellular spaces
These activities are based on heterogeneous pharmaco-biological data already present in Pegasus and under exploration for future integration

Chemical / Activity On Targets
Chemical Fingerprint
Chemicals / Activity extraction
• Extraction, annotations (e.g. PAINS, frequent hitter, reactive), and standardization
Given a target of interest, quick identification of compounds having activity, analogues and compound specificity
Chemicals → Fingerprints (e.g. FCFP6)

Knowledge representation formalisms for linking heterogeneous data
Resource Description Framework (RDF) graph
• Atomic description of data: triple (subject, predicate, object)
• Sharing of consistent information and data
• Proven data model
• Difficulty identifying common semantics for the atomic information to describe
• RDF model not very flexible
Labeled Property Graph (LPG)
• Data in the form of entities (nodes) and relations
• Graph analysis, in-depth path search, bulk data import
• Flexibility of the data model: nodes and relations
Shortcomings of the RDF formalism for our research activities → Choice of LPG
Lack of concepts → Design and implementation of Pegasus (LPG)
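The difference between the two formalisms can be sketched with plain Python data structures. This is an illustration only: the identifiers, the `isActiveOn` predicate and the property values are hypothetical, and neither structure is an actual RDF store or Neo4j API.

```python
# RDF-style: every fact is one atomic (subject, predicate, object) triple.
TRIPLES = [
    ("cpd_1", "rdf:type", "Chemical"),
    ("gene_A", "rdf:type", "Gene"),
    ("cpd_1", "isActiveOn", "gene_A"),
]

# LPG-style: nodes and relations both carry labels/types and key-value properties.
NODES = {
    "cpd_1": {"labels": {"Chemical"}, "properties": {"source": "internal"}},
    "gene_A": {"labels": {"Gene"}, "properties": {"symbol": "A"}},
}
RELATIONS = [
    {
        "start": "cpd_1",
        "type": "IS_ACTIVE_ON",
        "end": "gene_A",
        "properties": {"assay": "hypothetical_assay", "value": 7.2},
    },
]

def triples_about(triples, subject):
    """All (predicate, object) pairs stated about a subject."""
    return [(p, o) for s, p, o in triples if s == subject]

def relation_properties(relations, start, rel_type, end):
    """Properties stored directly on a relation, or None if it does not exist."""
    for rel in relations:
        if (rel["start"], rel["type"], rel["end"]) == (start, rel_type, end):
            return rel["properties"]
    return None

print(triples_about(TRIPLES, "cpd_1"))
print(relation_properties(RELATIONS, "cpd_1", "IS_ACTIVE_ON", "gene_A"))
```

In the property graph, annotations such as the assay or the measured value sit directly on the relation; with plain triples, annotating the `isActiveOn` statement itself would require additional triples.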

Multiple databases identify functionally identical resources, such as genes
Modeling of functionally identical concepts by several entities linked by cross-references

The concepts of genes and proteins are often mixed, and the concept of transcript is generally absent
Modeling of genes, transcripts, proteins and functional units by distinct entities linked by relations

Some concepts of chemical and biological perturbators are absent
Modeling of perturbators by distinct entities

No concept of phenotypic signatures or phenotypic similarities in existing representations
Modeling of phenotypic similarities by relations between phenotypic signatures (entities)

Relations linking graph nodes are direct in existing representations
Intermediate entity to link and contextually annotate several entities
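The intermediate-entity pattern can be sketched as follows. The `Asos`, `ActivityAsos` and `Transcript` labels and the `HAS_ACTIVITIES` / `ON_TARGETS` relation types appear in the ASO query earlier in the deck; the `Context` label, the `IN_CONTEXT` relation type, the node identifiers and the helper function are hypothetical additions for illustration.

```python
# Instead of one direct ASO -> transcript edge, an activity node sits in between,
# so that any number of other entities (e.g. an experimental context) can attach to it.
NODE_LABELS = {
    "aso_1": "Asos",
    "act_1": "ActivityAsos",   # the intermediate entity
    "tx_1": "Transcript",
    "ctx_1": "Context",        # hypothetical label
}
EDGES = [
    ("aso_1", "HAS_ACTIVITIES", "act_1"),
    ("act_1", "ON_TARGETS", "tx_1"),
    ("act_1", "IN_CONTEXT", "ctx_1"),  # hypothetical relation type
]

def entities_linked_by(activity, edges):
    """Map relation type -> node for every edge touching the intermediate node."""
    linked = {}
    for start, rel_type, end in edges:
        if start == activity:
            linked[rel_type] = end
        elif end == activity:
            linked[rel_type] = start
    return linked

print(entities_linked_by("act_1", EDGES))
```

A single direct edge can only connect two nodes; routing through the activity node lets one measurement tie together a perturbator, a target and a context.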

Generation platform to automatically process heterogeneous, multi-source
pharmaco-biological data sources and generate Pegasus
