Neo4j - GraphSummit Paris - 08/06/2023
Pegasus – Knowledge Graph For The Early Drug Discovery
Jeremy Grignard, PhD
Research & Data Scientist
Institut de Recherches Servier
Discovering A New Drug: A Long, Expensive And Risky Process
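[Figure: drug discovery pipeline. Research (discovery: exploratory, target identification, screening; then preclinical) is followed by clinical Phases I, II and III. The funnel narrows from 10^6 perturbators to 10^1 perturbators and finally to a candidate.]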
10-15 years - 2 billion Euros - High failure rate
Formulation of a causal hypothesis between a target and a disease
Strategic Objective
Obtain a marketing authorization (MA) for a chemical or biological entity, or a combination of entities, every 3 years
How can we improve the early drug discovery phases to increase the success rate of drug candidates in clinical phases?
Our Mission - Data Sciences & Data Management
Efficiently guide early research projects using computational methods supported by experimental capabilities
We rely on 3 interconnected activities:
• High-throughput design of efficient perturbators
• Explainable, project-oriented selection of relevant profiled perturbators
• Analysis of large, heterogeneous datasets and models to ensure target tractability and support rational decision-making
4 main interconnected areas of expertise: Profiling, Systems Biology, Sequence Designs, Knowledge Graph
Useful Data Sources For Therapeutic Projects And Our Activities
Large data, heterogeneous pharmaco-biological concepts:
• Genes, transcripts, proteins
• Ontologies (genomics, phenotypic)
• Static Maps
• Diseases
• In Vivo / In Vitro models
• Perturbators
- Chemicals (small compounds, PROTACs, probes)
- Target (shRNA, CRISPR, siRNA, overexpression)
- Antisense oligonucleotides (ASOs)
- Antibodies
• Fingerprints
How can we capitalize on and link heterogeneous data to bring value to therapeutic projects and support decision-making?
Pegasus – Knowledge Graph For The Early Drug Discovery
Rationale & Context
• Integration of heterogeneous data
- Labeled property graph with Neo4j
• 46,371,784 entities – 66 labels
• 331,570,883 relations – 14 types
• Flexible data model
- Model has evolved over time
- Can easily be changed in response to new requests
• Efficient data preparation (hours), import (minutes), storage and querying
• By and for Servier
Data Model
Pegasus is built to answer questions and provide valuable insights very quickly
Answering a question amounts to traversing paths in the graph
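As a minimal illustration of "answering a question is traversing paths", the Python sketch below runs a breadth-first search over a toy in-memory graph and returns the shortest path from a start node to any node with a given label. The node IDs, the data and the ASSOCIATED_WITH relationship type are hypothetical. In Pegasus, traversals like this are written as Cypher MATCH patterns and executed by Neo4j, not with code like this.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Toy labeled property graph (hypothetical data): node id -> label
NODES: Dict[str, str] = {
    "cpd1": "Chemical",
    "act1": "ActivityChemical",
    "geneA": "Gene",
    "dis1": "Disease",
}

# Typed relationships (source, type, target); ASSOCIATED_WITH is hypothetical
EDGES: List[Tuple[str, str, str]] = [
    ("cpd1", "HAS_ACTIVITIES", "act1"),
    ("act1", "ON_TARGETS", "geneA"),
    ("geneA", "ASSOCIATED_WITH", "dis1"),
]


def build_adjacency(edges: List[Tuple[str, str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Index relationships in both directions, like an undirected Cypher pattern."""
    adj: Dict[str, List[Tuple[str, str]]] = {}
    for src, rel, dst in edges:
        adj.setdefault(src, []).append((rel, dst))
        adj.setdefault(dst, []).append((rel, src))
    return adj


def shortest_path_to_label(start: str, target_label: str, nodes: Dict[str, str],
                           edges: List[Tuple[str, str, str]]) -> Optional[List[str]]:
    """Breadth-first search from `start`. Return the first (shortest) path, alternating
    node ids and relationship types, that ends on a node carrying `target_label`,
    or None if no such node is reachable."""
    adj = build_adjacency(edges)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node != start and nodes[node] == target_label:
            return path
        for rel, nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [rel, nxt])
    return None


print(shortest_path_to_label("cpd1", "Disease", NODES, EDGES))
# prints ['cpd1', 'HAS_ACTIVITIES', 'act1', 'ON_TARGETS', 'geneA', 'ASSOCIATED_WITH', 'dis1']
```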
Pegasus – Knowledge Graph For The Early Drug Discovery
[Figure: Pegasus workflow. Raw data are prepared into primary data, aggregated into aggregated data and loaded into PEGASUS, adding value along the chain data → information → knowledge (process, exposure).
Applications: target ID card, DFSL, ASO design, phenotypic screening deconvolution, action models, prediction models, uORF.
Outcomes: reports, compound lists, lists of targets, lists of ASOs, processes, informed compound libraries.]
ASOs (Antisense Oligonucleotides) Design
Characterization of ASO Off-Target Effects
Given 100,000 ASOs designed for a target, quickly prioritize the ones to screen (784)
// Paths from each ASO to its target transcripts, their genes, gene-essentiality activities and diseases
MATCH path = (:Asos)-[:HAS_ACTIVITIES]-(:ActivityAsos)-[:ON_TARGETS]-(:Transcript)
      -[:HAS_REFERENCES]-(:Transcript)-[:TRANSCRIPTS]-(:Gene)
      -[:HAS_REFERENCES]-(:Gene)-[:HAS_ACTIVITY]-(:ActivityGeneEssential)
      -[:ON_TARGETS]-(:Disease)
RETURN path;
Industrial Applications
ASOs designed, active and non-toxic:
• X and Y (preclinical)
• A, B and C (lead optimization, LO)
• O and P (research)
Note for X:
• Treating a baby with epileptic encephalopathy
• 6,143 designed ASOs
• 2,073 → 372 essential genes
• 1,344 → 400 developmental genes
• 784 ASOs screened (mouse, neurons)
• 13 active ASOs
• 8/13 with no off-target
• 3/13 with the best activities
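The deck shows the path query and the resulting funnel but not the ranking rule. The Python sketch below shows one hypothetical way to prioritize ASOs once each has been mapped to its off-target genes. It discards any ASO with an off-target on an essential or developmental gene, then prefers those with the fewest remaining off-targets. Both the data structure and the rule are assumptions made for illustration.

```python
from typing import Dict, List, Set


def prioritize_asos(
    off_targets: Dict[str, Set[str]],
    essential_genes: Set[str],
    developmental_genes: Set[str],
) -> List[str]:
    """Keep ASOs with no off-target on an essential or developmental gene,
    then rank the survivors by number of remaining off-target genes (fewest first)."""
    flagged = essential_genes | developmental_genes
    kept = [aso for aso, genes in off_targets.items() if not genes & flagged]
    # Fewest off-targets first; ties broken by name for a deterministic order
    return sorted(kept, key=lambda aso: (len(off_targets[aso]), aso))


# Hypothetical example: off-target genes per ASO, as a path query could return them
example = {"ASO-1": {"G1"}, "ASO-2": set(), "ASO-3": {"ESS1"}, "ASO-4": {"G2", "G3"}}
print(prioritize_asos(example, essential_genes={"ESS1"}, developmental_genes={"DEV1"}))
# prints ['ASO-2', 'ASO-1', 'ASO-4']
```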
Automated Focused Library Design
[Diagram: GENE, TRANSCRIPT and PROTEIN nodes linked by TRANSCRIPT IN, TRANSLATE IN, HAS REFERENCE and HAS HOMOLOGY relations]
Protein Embeddings
• Encode functional and structural properties of proteins using language models (LMs)
• Auto-encoder
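One way to turn protein embeddings into a homology-like relation is to compare them with cosine similarity and keep the pairs above a cut-off. The sketch below assumes each protein already has a fixed-length embedding vector, for instance from a protein language model. The 0.9 threshold, the toy vectors and the function names are hypothetical, and the deck does not state which similarity measure Pegasus actually uses.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embedding_homologs(query: str, embeddings: Dict[str, List[float]],
                       threshold: float = 0.9) -> List[Tuple[str, float]]:
    """Proteins whose embedding is at least `threshold` cosine-similar to the query,
    best first: candidates for a HAS HOMOLOGY-style link (hypothetical cut-off)."""
    q = embeddings[query]
    hits = []
    for name, vec in embeddings.items():
        if name == query:
            continue
        score = cosine_similarity(q, vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


toy = {"P1": [1.0, 0.0, 0.0], "P2": [0.9, 0.1, 0.0], "P3": [0.0, 1.0, 0.0]}
print(embedding_homologs("P1", toy))
```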
Automated Focused Library Design
[Diagram: PERTURBATOR linked to GENE / TRANSCRIPT / PROTEIN, in a given CONTEXT, by IS ACTIVE ON relations]
Automated Focused Library Design
[Diagram: PERTURBATOR linked to GENE / TRANSCRIPT / PROTEIN, in a given CONTEXT, by HAS PREDICTED ACTIVITY relations]
Historical activity database
Multitask learning
Federated deep learning:
- Unbiased
- Sequence-based
- Unlimited number of molecules
- Very limited number of targets
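The slide does not say how the federated model is trained. As an illustration only, the sketch below shows the aggregation step of federated averaging (FedAvg), a widely used scheme in which partners train locally and share only model parameters. Here the parameters are flattened into plain lists of floats. A real multitask deep-learning model would be handled by a deep-learning framework.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """One FedAvg aggregation round: each parameter of the shared model becomes the
    average of the clients' local values, weighted by local training-set size.
    Raw activity data never leaves a client; only parameters are exchanged."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one weight vector and one size per client")
    n_params = len(client_weights[0])
    if any(len(w) != n_params for w in client_weights):
        raise ValueError("all clients must share the same model shape")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of training examples must be positive")
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical partners with a 2-parameter model trained on 1 and 3 examples
print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # prints [2.5, 3.5]
```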
Automated Focused Library Design
[Diagram: PERTURBATORs linked to each other by CHEMICALLY SIMILAR relations, alongside GENE / TRANSCRIPT / PROTEIN / CONTEXT]
Fingerprints + metric (Tanimoto) + threshold
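The "fingerprints + metric (Tanimoto) + threshold" recipe can be sketched in a few lines of Python, representing each binary fingerprint as the set of its "on" bit indices. The library contents are hypothetical. The default 0.6 threshold mirrors the cut-off used in the target-to-analogue Cypher query in this deck. Computing the fingerprints themselves (e.g., FCFP6) would require a cheminformatics toolkit and is not shown.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of 'on' bits:
    |A & B| / (|A| + |B| - |A & B|); defined as 0.0 when both are empty."""
    shared = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - shared
    return shared / union if union else 0.0


def chemically_similar(query_fp: Set[int], library: Dict[str, Set[int]],
                       threshold: float = 0.6) -> List[Tuple[str, float]]:
    """Library compounds whose Tanimoto similarity to the query strictly exceeds
    `threshold`, most similar first."""
    hits = []
    for compound_id, fp in library.items():
        score = tanimoto(query_fp, fp)
        if score > threshold:
            hits.append((compound_id, score))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Hypothetical library: S1 shares 4 of 5 bits with the query, S2 only 3 of 5
library = {"S1": {1, 2, 3, 4, 5}, "S2": {1, 2, 3, 5}, "S3": {7, 8}}
print(chemically_similar({1, 2, 3, 4}, library))  # prints [('S1', 0.8)]
```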
Automated Focused Library Design
[Diagram: PERTURBATORs linked by CHEMICALLY SIMILAR and PHENOTYPICALLY SIMILAR relations, alongside GENE / TRANSCRIPT / PROTEIN / CONTEXT]
Phenotypic: Cell Painting + threshold
• Biology
• Polypharmacology
• Perturbation-agnostic: bridges the gap between chemistry and biology
• Medium throughput
Auto-encoder
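Phenotypic similarity can be sketched the same way. This assumes each perturbator is summarized by a vector of aggregated Cell Painting features, or by a learned embedding such as an auto-encoder output. Pearson correlation and the 0.8 cut-off are assumptions made for illustration, not the metric documented for Pegasus.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long feature profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0  # constant profile -> no correlation


def phenotypically_similar(query: str, profiles: Dict[str, Sequence[float]],
                           threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Perturbators whose Cell Painting profile correlates with the query's
    at or above `threshold` (hypothetical cut-off), best first."""
    hits = []
    for name, profile in profiles.items():
        if name == query:
            continue
        r = pearson(profiles[query], profile)
        if r >= threshold:
            hits.append((name, r))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


toy = {"C1": [1.0, 2.0, 3.0, 4.0], "C2": [2.0, 4.0, 6.0, 8.0], "C3": [4.0, 3.0, 2.0, 1.0]}
print(phenotypically_similar("C1", toy))
```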
Automated Focused Library Design
[Diagram: PERTURBATORs linked by CHEMICALLY, PHENOTYPICALLY and TRANSCRIPTOMICALLY SIMILAR relations, alongside GENE / TRANSCRIPT / PROTEIN / CONTEXT]
Transcriptomic: L1000 + cQuery
• Biology
• Polypharmacology
• Perturbation-agnostic: bridges the gap between chemistry and biology
• Medium throughput
Automated Focused Library Design: Filtering Out Compounds / Off-Targets
[Diagram: PERTURBATORs CHEMICALLY DEFINED AS PAINS, with OFF-TARGET activity on an ESSENTIAL GENE, or with LOW SPECIFICITY (several OFF-TARGETs) are filtered out, alongside GENE / TRANSCRIPT / PROTEIN / CONTEXT]
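The filtering step can be expressed as a simple predicate applied to each candidate. This Python sketch is hypothetical. The Perturbator record, the rule that any off-target on an essential gene disqualifies a compound, and the max_off_targets specificity cut-off are illustrative assumptions, not Servier's actual rules.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Perturbator:
    name: str
    is_pains: bool                                   # flagged as a PAINS substructure
    targets: Set[str] = field(default_factory=set)   # genes it is (predicted) active on


def filter_library(
    candidates: Iterable[Perturbator],
    target_of_interest: str,
    essential_genes: Set[str],
    max_off_targets: int = 2,  # hypothetical specificity cut-off
) -> List[str]:
    """Drop PAINS, compounds hitting an essential gene as an off-target, and
    low-specificity compounds (more than `max_off_targets` off-targets)."""
    kept = []
    for p in candidates:
        if p.is_pains:
            continue
        off_targets = p.targets - {target_of_interest}
        if off_targets & essential_genes:
            continue
        if len(off_targets) > max_off_targets:
            continue
        kept.append(p.name)
    return kept


library = [
    Perturbator("P1", False, {"T"}),
    Perturbator("P2", True, {"T"}),                 # PAINS
    Perturbator("P3", False, {"T", "ESS1"}),        # essential-gene off-target
    Perturbator("P4", False, {"T", "A", "B", "C"}), # low specificity
]
print(filter_library(library, "T", {"ESS1"}))  # prints ['P1']
```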
Automated Focused Library Design: Filtering Out Compounds
[Diagram: PERTURBATORs classified as COMMERCIALLY UNAVAILABLE, COMMERCIALLY AVAILABLE or INTERNALLY AVAILABLE, alongside GENE / TRANSCRIPT / PROTEIN / CONTEXT]
Automated Focused Library Design
[Diagram: PERTURBATORs DISCARDED according to their LEVEL OF CONFIDENCE]
Automated Focused Library Design
[Diagram: PERTURBATORs DISCARDED according to NOVELTY]
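The availability, confidence and novelty filters can likewise be chained into a single triage function. Everything in this sketch is an assumption made for illustration: the availability labels, the 0 to 1 confidence score, the reading of "novelty" as "not already a known active on the target", and the preference for internally available compounds over commercial ones.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Preference order among sourceable compounds (assumption: in-house first)
AVAILABILITY_RANK = {"internal": 0, "commercial": 1}


@dataclass
class Candidate:
    name: str
    availability: str       # "internal", "commercial" or "unavailable"
    confidence: float       # hypothetical 0-1 score of the supporting evidence
    known_for_target: bool  # hypothetical novelty flag: already a known active


def triage(candidates: Iterable[Candidate], min_confidence: float = 0.5,
           require_novelty: bool = True) -> List[str]:
    """Discard unavailable, low-confidence and (optionally) non-novel candidates,
    then order the rest: internal before commercial, higher confidence first."""
    kept = []
    for c in candidates:
        if c.availability not in AVAILABILITY_RANK and c.availability != "unavailable":
            raise ValueError(f"unknown availability: {c.availability!r}")
        if c.availability == "unavailable":
            continue  # cannot be sourced for the screen
        if c.confidence < min_confidence:
            continue  # weak supporting evidence
        if require_novelty and c.known_for_target:
            continue  # adds nothing new to the library
        kept.append(c)
    kept.sort(key=lambda c: (AVAILABILITY_RANK[c.availability], -c.confidence, c.name))
    return [c.name for c in kept]


pool = [
    Candidate("A", "internal", 0.9, False),
    Candidate("B", "commercial", 0.95, False),
    Candidate("C", "unavailable", 0.99, False),
    Candidate("D", "internal", 0.3, False),
]
print(triage(pool))  # prints ['A', 'B']
```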
Automated Focused Library Design: Moving Away from the Target of Interest (Cellular Assay)
[Diagram: the target PROTEIN is expanded to its PPI partners and to the PATHWAYs it is PART OF; DISCARDED perturbators are set aside]
Automated Focused Library Design: Moving Away from the Target of Interest (Cellular Assay)
[Diagram: PATHWAYs INVOLVED IN a DISEASE connect the target environment to the disease]
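Moving away from the target through protein-protein interactions and pathways amounts to a small neighborhood expansion in the graph. The sketch below is a hypothetical Python illustration over plain sets. It takes one PPI hop plus pathway co-membership, optionally restricted to pathways involved in the disease. A real expansion would run in Cypher over Pegasus and would probably weight or filter interactions by evidence.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def expand_targets(
    target: str,
    ppi_edges: Iterable[Tuple[str, str]],
    pathway_members: Dict[str, Set[str]],
    disease_pathways: Optional[Set[str]] = None,
) -> Set[str]:
    """Proteins one PPI hop away from `target`, plus co-members of the pathways the
    target is PART OF. If `disease_pathways` is given, only pathways INVOLVED IN
    the disease are used for the pathway step."""
    expanded: Set[str] = set()
    for a, b in ppi_edges:  # undirected protein-protein interactions
        if a == target:
            expanded.add(b)
        elif b == target:
            expanded.add(a)
    for pathway, members in pathway_members.items():
        if disease_pathways is not None and pathway not in disease_pathways:
            continue
        if target in members:
            expanded |= members
    expanded.discard(target)
    return expanded


ppi = [("T", "P1"), ("P2", "T"), ("P3", "P4")]
pathways = {"PW1": {"T", "P5"}, "PW2": {"T", "P6"}, "PW3": {"P7"}}
print(sorted(expand_targets("T", ppi, pathways, disease_pathways={"PW1"})))
# prints ['P1', 'P2', 'P5']
```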
Automated Focused Library Design: Hypothesis Testing
Target-based hypotheses:
• Perturbator 1 → Target 1 (hypothesis 1): predicted activity, embedding homology, translation / transcription
• Perturbator 2 → Target 1 (hypothesis 2): chemical similarity, experimental activity, translation / transcription
• …
• Perturbator N → Target 1 (hypothesis M), with N ≈ 1,000
Medium-scale screening campaign → validated hits (e.g., perturbators 3, 13 and 23)
Outcomes: validated hits plus validated / unvalidated hypotheses, and hit-rate enrichment vs. a regular pilot screening library (null hypothesis)
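The slide compares the focused library's hit rate with that of a regular pilot screening library, which serves as the null hypothesis, but it does not name a statistical test. A common choice is a one-sided Fisher exact test on the 2×2 table of hits and non-hits. The sketch below computes this test from the hypergeometric distribution using only the Python standard library. The numbers in the usage line are hypothetical, not results from the project.

```python
from math import comb
from typing import Tuple


def hit_rate_enrichment(focused_hits: int, focused_size: int,
                        pilot_hits: int, pilot_size: int) -> Tuple[float, float]:
    """Compare a focused library with a regular pilot library (the null hypothesis).

    Returns (enrichment, p_value). Enrichment is the ratio of hit rates; p_value is
    the one-sided Fisher exact test P(X >= focused_hits), where X is hypergeometric:
    the number of all observed hits that would fall in the focused library if both
    libraries shared the same hit rate."""
    if focused_size <= 0 or pilot_size <= 0:
        raise ValueError("libraries must be non-empty")
    if not (0 <= focused_hits <= focused_size and 0 <= pilot_hits <= pilot_size):
        raise ValueError("hit counts must lie between 0 and the library size")
    focused_rate = focused_hits / focused_size
    pilot_rate = pilot_hits / pilot_size
    if pilot_rate > 0:
        enrichment = focused_rate / pilot_rate
    else:  # no pilot hits: ratio is unbounded, or undefined (0/0)
        enrichment = float("inf") if focused_rate > 0 else float("nan")

    total = focused_size + pilot_size      # all screened perturbators
    total_hits = focused_hits + pilot_hits
    upper_tail = sum(
        comb(total_hits, k) * comb(total - total_hits, focused_size - k)
        for k in range(focused_hits, min(focused_size, total_hits) + 1)
    )
    return enrichment, upper_tail / comb(total, focused_size)


# Hypothetical campaign: 30 hits in 1,000 focused compounds vs. 10 in a 1,000-compound pilot
print(hit_rate_enrichment(30, 1000, 10, 1000))
```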
Automated Focused Library Design: Hit Deconvolution
[Diagram: HITS linked by TRANSCRIPTOMIC SIMILARITY to genetic perturbations (CRISPR, shRNA, cDNA) acting on GENE / TRANSCRIPT / PROTEIN]
Transcriptomic: L1000 + cQuery
• Biology
• Polypharmacology
• Perturbation-agnostic: bridges the gap between chemistry and biology
• Medium throughput
Automated Focused Library Design: Hit Deconvolution
[Diagram: HITS linked to CRISPR / shRNA / cDNA perturbations by both TRANSCRIPTOMIC SIMILARITY and PHENOTYPIC SIMILARITY]
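Hit deconvolution by signature similarity can be sketched as a vote. Each genetic perturbation (CRISPR, shRNA, cDNA) whose signature matches the hit counts as support for its target gene, and genes backed by more modalities rank first. This Python sketch is hypothetical. It assumes the hit acts as an inhibitor, so loss-of-function signatures are expected to correlate positively and overexpression (cDNA) signatures negatively. It also uses Pearson correlation with an arbitrary 0.7 cut-off rather than the L1000 / cQuery scoring mentioned on the slide.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

# Assumed sign of the expected match per modality (hit treated as an inhibitor):
# knockout / knockdown should resemble it, overexpression should look opposite.
EXPECTED_SIGN = {"CRISPR": 1.0, "shRNA": 1.0, "cDNA": -1.0}


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long signatures."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("signatures must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def deconvolve_hit(hit_signature: Sequence[float],
                   genetic_signatures: List[Tuple[str, str, Sequence[float]]],
                   threshold: float = 0.7) -> List[str]:
    """Rank candidate target genes for a hit from (gene, modality, signature) tuples:
    more supporting modalities first, then best sign-adjusted score, then name."""
    support: Dict[str, Set[str]] = defaultdict(set)
    best: Dict[str, float] = defaultdict(float)
    for gene, modality, signature in genetic_signatures:
        score = EXPECTED_SIGN[modality] * pearson(hit_signature, signature)
        if score >= threshold:
            support[gene].add(modality)
            best[gene] = max(best[gene], score)
    return sorted(support, key=lambda g: (-len(support[g]), -best[g], g))
```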
Automated Focused Library Design: Modality Expansion
[Diagram: from the HITS and their GENE / TRANSCRIPT / PROTEIN targets, other modalities (CRISPR, shRNA, cDNA, ASOs) are linked through PREDICTED ACTIVE relations]
Automated Focused Library Design
Simple Overview With Results
Given any target, identify compounds with activity on it, their analogues and their specificity
// Compounds active on the target (ChEMBL) and their Servier analogues (Tanimoto > 0.6)
MATCH path = (g:Gene)-[:HAS_REFERENCES]-(:Gene)-[:ON_TARGETS]-(:ActivityChemical)
      -[:HAS_ACTIVITIES]-(:ChemicalChembl)-[r:HAS_SIMILARITIES]-(:ChemicalServier)
WHERE g.geneId = 'target X' AND r.tanimoto_similarity > 0.6
RETURN path;
Success Stories
Target X:
• Identification of 984 Servier compounds chemically similar to compounds with known activity
• MST validation: 15% hit rate
Target Y:
• From the project leader: 4 potential reference compounds
• Identification of 47 chemically similar compounds
• Annotations in Pegasus revealed a completely non-specific profile for the reference compounds
Therapeutic Target Environment Exploration (ID card)
Pegasus Application
Given any target, present a target ID card
Success Stories
For many targets: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z
- Gene / transcript / protein identifiers, cross-references, isoforms
- Biological processes, cellular localization
- Protein half-lives
- Gene essentiality
- In vivo models
- Gene / disease associations
- SNPs
- Pathways
- Perturbators
Conclusion
Knowledge graphs are well suited for integrating
• Large amounts of heterogeneous and sparse data (as typically seen in pharmaceutical research)
This data structure guides many aspects of our therapeutic projects
• And allows us to seamlessly integrate exploratory, cutting-edge AI approaches
Systematic data generation (in silico and in vitro)
• Is required to properly feed these databases and generate knowledge from them
Communication
• Neo4j Health Care & Life Sciences Workshop; Symposium Servier – AI to New Drug Development; Servier Corporate Strategy & Executive Director; France Culture « La Méthode Scientifique »…
Perspective
"Objects are characterized by the way they interact. If an object has no interactions, influences nothing, acts on nothing, emits no light, attracts nothing, repels nothing, cannot be touched, senses nothing, and so on, it is as if it did not exist.
To speak of objects that never interact is to speak of things that, even if they existed, would not concern us.
We do not even understand what it could mean to say that such things 'exist'.
The world we know, the world that touches us and interests us, what we call 'reality', is the vast network of interacting entities that manifest themselves to one another by interacting, and of which we are a part.
It is this network (Pegasus) that we are interested in."
Helgoland by Carlo Rovelli
Acknowledgements
J.P. Stephan
N. Boisseau
I. S. Khader
S. Lotfi
A.L. Ong
A. Gohier
T. Dorval
DSDM Team
TA(s)
PA(s)
JUMP-CP: Joint Undertaking In Morphological Profiling – Cell Painting
Phenotypic Fingerprint
High-Content Screening – Cell Painting Fingerprint Analysis
[Figure: fingerprints (morphological descriptors) of compounds and CRISPR perturbations, with negative controls / no-effect compounds and dose-response series highlighted]
Phenotypic fingerprints induced by compounds screened in HCS show various phenotypic responses and reveal dose-response effects across different mechanisms of action
Target Expansion
Protein Fingerprint
Protein Embeddings
• Encode functional and structural properties of proteins using language models (LMs)
Fingerprint Analysis
[Figure: proteins X, Y, Z, A, B, C positioned by their fingerprints (embedded sequences)]
Protein fingerprints (embeddings) seem to carry sequence and domain-function information
Ongoing work: interpretability and sub-domain similarity
Cancer Dependency Map (DepMap)
Cellular Model and Chemical Fingerprints
Achilles & CCLE
• Identification of cell lines expressing a target of interest (experimental validation for CISH)
Fingerprint Analysis (PRISM 1D)
• Phenotypic similarity to identify compounds sharing the same phenotypic effects
[Figure: drug fingerprints (activity in cell lines) grouped by phenotypic similarity, annotated with targets / MOAs / chemical similarity within clusters]
Similarity of the cellular-model fingerprints induced by compounds in DepMap PRISM reveals a correlation (in some clusters) between chemical structures and phenotypic responses
CMAP – L1000
Chemical / CRISPR / sh Fingerprints
Dimensional Reduction Cluster – MOA / Targets Analysis
Compounds identified within a cluster obtained by dimensionality reduction of L1000 fingerprints share the same mechanism of action (MOA) and are active on the same target family (HDAC*)
[Figure: fingerprints (L1000 signatures) of compounds and of CRISPR, sh, … perturbations]
CMAP – L1000
Chemical / CRISPR / sh Fingerprints
Fingerprint Similarity (Phenotypic – DepMap) | Fingerprint Similarity (Chemical)
Compounds identified within the cluster obtained by dimensionality reduction of L1000 fingerprints have the same phenotypic effects (in DepMap PRISM) and are not chemically similar, while sharing the same mechanism of action and activity on a target family (HDAC*)
Our PA DSDM Activities Linked to Pegasus (So Far)
Support Therapeutic Projects
Therapeutic target environment exploration (ID card)
• Get target annotations: isoforms, biological processes, localization, half-lives, essentiality, models, diseases, SNPs, pathways, perturbator activity
Optimized design of antisense oligonucleotides (ASOs)
• Characterize ASO off-target effects to prioritize which ASOs to screen
Design of new experiments (e.g., coupling uORFs and ASOs)
• Identify target transcripts with uORFs that can be targeted by ASOs to increase protein expression
Identification of focused screening libraries
• Identify relevant perturbators, validate biological hypotheses, and expand from the Servier compound library to phenotypic, omics and cellular spaces
These activities are based on heterogeneous pharmaco-biological data already present in Pegasus or being explored for future integration
Chemical / Activity on Targets
Chemical Fingerprint
Chemicals / Activity extraction
• Extraction, annotation (e.g., PAINS, frequent hitters, reactive compounds) and standardization
Given a target of interest, quick identification of compounds with activity, their analogues and compound specificity
[Figure: chemical fingerprints (e.g., FCFP6)]
Knowledge representation formalisms for linking heterogeneous data
Resource Description Framework (RDF) graph vs. Labeled Property Graph (LPG)
RDF graph:
• Atomic description of data: triple (subject, predicate, object)
• Sharing of consistent information and data
• Proven data model
• Difficulty identifying a common semantics for the atomic information to be described
• RDF model not very flexible
Labeled property graph:
• Data as entities (nodes) and relationships
• Graph analysis, in-depth path search, bulk data import
• Flexible data model: nodes and relationships
Shortcomings of the RDF formalism for our research activities → choice of LPG
Missing concepts → design and implementation of Pegasus (LPG)
Limitation of existing representations → modeling choice in Pegasus:
• Multiple databases identify functionally identical resources, such as genes → Functionally identical concepts are modeled as several entities linked by cross-references
• Gene and protein concepts are often conflated, and the transcript concept is generally absent → Genes, transcripts, proteins and functional units are modeled as distinct entities linked by relationships
• Some chemical and biological perturbator concepts are absent → Perturbators are modeled as distinct entities
• Existing representations have no concept of phenotypic signatures or phenotypic similarity → Phenotypic similarities are modeled as relationships between phenotypic signatures (entities)
• In existing representations, relationships link nodes directly → An intermediate entity links and contextually annotates several entities
Generation platform that automatically processes heterogeneous pharmaco-biological data sources from multiple origins and generates Pegasus
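The modeling choice of an intermediate entity that carries the context of a relationship can be illustrated with a small Python sketch. ActivityChemical, HAS_ACTIVITIES and ON_TARGETS are taken from the Cypher queries in this deck. The PropertyGraph class, the node IDs and the pIC50 / assay properties are hypothetical. The to_triples method shows how the same facts flatten into RDF-style (subject, predicate, object) triples.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class PropertyGraph:
    """Minimal labeled property graph: labeled nodes with properties, typed relationships."""
    nodes: Dict[str, Dict[str, Any]] = field(default_factory=dict)  # id -> {"label": ..., **props}
    rels: List[Triple] = field(default_factory=list)                # (source, type, target)
    n_activities: int = 0

    def add_node(self, node_id: str, label: str, **props: Any) -> None:
        self.nodes[node_id] = {"label": label, **props}

    def add_activity(self, compound: str, gene: str, **context: Any) -> str:
        """Link compound and gene through an intermediate activity node that
        carries the contextual annotations, instead of a direct edge."""
        self.n_activities += 1
        act_id = f"activity_{self.n_activities}"
        self.add_node(act_id, "ActivityChemical", **context)
        self.rels.append((compound, "HAS_ACTIVITIES", act_id))
        self.rels.append((act_id, "ON_TARGETS", gene))
        return act_id

    def to_triples(self) -> List[Triple]:
        """Flatten the same facts into RDF-style (subject, predicate, object) triples."""
        triples = [
            (node_id, key, str(value))
            for node_id, props in self.nodes.items()
            for key, value in props.items()
        ]
        return triples + list(self.rels)


g = PropertyGraph()
g.add_node("cpd1", "ChemicalServier")
g.add_node("geneA", "Gene")
act = g.add_activity("cpd1", "geneA", pIC50=7.2, assay="biochemical")
for triple in g.to_triples():
    print(triple)
```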

Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
 
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should Know
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
De mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEDe mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FME
 
AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning Framework
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
Agnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in KrakówAgnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in Kraków
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
Top Mobile App Development Companies 2024
Top Mobile App Development Companies 2024Top Mobile App Development Companies 2024
Top Mobile App Development Companies 2024
 
iGaming Platform & Lottery Solutions by Skilrock
iGaming Platform & Lottery Solutions by SkilrockiGaming Platform & Lottery Solutions by Skilrock
iGaming Platform & Lottery Solutions by Skilrock
 

SERVIER Pegasus - Knowledge graph for the early research phases of new drug discovery

  • 1. Neo4j - GraphSummit Paris - 08/06/2023. Pegasus – Knowledge Graph For The Early Drug Discovery. Jeremy Grignard, PhD, Research & Data Scientist, Institut de Recherches Servier
  • 2. Discovering a New Drug: A Long, Expensive and Risky Process. Stages: Research (Exploratory, Discovery), Preclinical, Clinical; target identification and screening, then Phase I, Phase II and Phase III. Funnel: 10⁶ perturbators → 10¹ perturbators → candidate. 10-15 years, 2 billion euros, high failure rate. Key early step: formulation of a causal hypothesis between a target and a disease. Strategic objective: obtain a marketing authorization (MA) for a chemical or biological entity, or a combination of entities, every 3 years. How can the early drug discovery phases be improved to increase the success rate of drug candidates in the clinical phases?
  • 3. Our Mission: Data Sciences & Data Management. Efficiently guide early research projects using computational methods supported by experimental capabilities. We rely on 3 interconnected activities: • High-throughput design of efficient perturbators • Explainable, project-oriented selection of relevant profiled perturbators • Analysis of large, heterogeneous datasets and models to ensure target tractability and support rational decision-making. 4 main interconnected areas of expertise: Profiling, Systems Biology, Sequence Designs, Knowledge Graph.
  • 4. Useful Data Sources for Therapeutic Projects and Our Activities. Large data, heterogeneous pharmaco-biological concepts: • Genes, transcripts, proteins • Ontologies (genomic, phenotypic) • Static maps • Diseases • In vivo / in vitro models • Perturbators: chemicals (small compounds, PROTACs, probes); target-level genetic tools (shRNA, CRISPR, siRNA, overexpression); antisense oligonucleotides (ASOs); antibodies • Fingerprints. How can we capitalize on and link heterogeneous data to bring value to therapeutic projects and support decision-making?
  • 5. Pegasus – Knowledge Graph For The Early Drug Discovery. Rationale & Context: • Integration of heterogeneous data as a labeled property graph in Neo4j • 46,371,784 entities, 66 labels • 331,570,883 relations, 14 types • Flexible data model: it has evolved over time and can easily be changed in response to new requests • Efficient data preparation (hours), import (minutes), storage and querying • Built by and for Servier. Data Model: Pegasus is built to answer questions and give valuable insights very quickly; answering a question amounts to traversing paths in the graph.
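To make "answering a question amounts to traversing paths" concrete, here is a minimal, hypothetical sketch of a labeled property graph and a path-pattern matcher that mimics a Cypher chain such as (:Asos)-[:HAS_ACTIVITIES]-(:ActivityAsos)-[:ON_TARGETS]-(:Transcript). The class and method names are assumptions for illustration only; Pegasus itself runs on Neo4j and is queried with Cypher, whose matching semantics are richer than this toy.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy in-memory labeled property graph (illustrative only, not Neo4j)."""

    def __init__(self):
        self.labels = {}               # node id -> label
        self.props = {}                # node id -> property dict
        self.adj = defaultdict(list)   # node id -> [(relationship type, neighbour id)]

    def add_node(self, node_id, label, **props):
        self.labels[node_id] = label
        self.props[node_id] = props

    def add_rel(self, src, rel_type, dst):
        # Index both directions so patterns match regardless of direction,
        # like the undirected -[:TYPE]- syntax in the Cypher queries.
        self.adj[src].append((rel_type, dst))
        self.adj[dst].append((rel_type, src))

    def match_path(self, pattern):
        """pattern = [label0, rel1, label1, rel2, label2, ...]; returns lists of node ids."""
        paths = [[n] for n, lab in self.labels.items() if lab == pattern[0]]
        for i in range(1, len(pattern), 2):
            rel_type, label = pattern[i], pattern[i + 1]
            paths = [
                path + [nxt]
                for path in paths
                for r, nxt in self.adj[path[-1]]
                # Simplification: never revisit a node within one path.
                if r == rel_type and self.labels[nxt] == label and nxt not in path
            ]
        return paths


g = PropertyGraph()
g.add_node("aso1", "Asos")
g.add_node("act1", "ActivityAsos", score=0.9)
g.add_node("tx1", "Transcript")
g.add_rel("aso1", "HAS_ACTIVITIES", "act1")
g.add_rel("act1", "ON_TARGETS", "tx1")
print(g.match_path(["Asos", "HAS_ACTIVITIES", "ActivityAsos", "ON_TARGETS", "Transcript"]))
```

Each step of the pattern only extends paths that already matched, which is why typed relationships and labels keep traversals cheap: the search never looks at nodes that cannot satisfy the next hop.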
  • 6. Pegasus – Knowledge Graph For The Early Drug Discovery. Process: raw data → (preparation) → primary data → (aggregation) → aggregated data → PEGASUS → (added value); data → information → knowledge. Process exposure and applications: target ID card, DFSL, ASO design, phenotypic screening deconvolution, action models, prediction models, uORF. Outcomes: reports, compound lists, lists of targets, lists of ASOs, informed compound library.
  • 7. ASOs (Antisense Oligonucleotides) Design: Characterization of ASO Off-Target Effects. Given 100,000 ASOs designed for a target, quickly prioritize the ones to screen (784). Query: MATCH path=(:Asos)-[:HAS_ACTIVITIES]-(:ActivityAsos)-[:ON_TARGETS]-(:Transcript)-[:HAS_REFERENCES]-(:Transcript)-[:TRANSCRIPTS]-(:Gene)-[:HAS_REFERENCES]-(:Gene)-[:HAS_ACTIVITY]-(:ActivityGeneEssential)-[:ON_TARGETS]-(:Disease) RETURN path; Industrial applications, ASOs designed, active and non-toxic: • X and Y (preclinical) • A, B and C (lead optimization, LO) • O and P (research). Note for X: • Treating a baby with epileptic encephalopathy • 6,143 designed ASOs • 2,073 → 372 essential genes • 1,344 → 400 developmental genes • 784 screened ASOs (mouse, neurons) • 13 active ASOs • 8/13 with no off-target • 3/13 with the best activities
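One plausible way to turn such query results into a screening shortlist is sketched below. It assumes, hypothetically, that the paths returned by the query have already been collected into a dictionary mapping each ASO id to the set of genes it is predicted to hit besides its intended target, and that essential and developmental gene sets are known. The ranking rule is an assumption for illustration, not the rule used for the 784 screened ASOs.

```python
def prioritize_asos(off_targets, essential_genes, developmental_genes, top_n):
    """Rank ASOs so that those with the fewest risky off-targets come first.

    off_targets maps an ASO id to the set of gene ids it is predicted to hit
    besides the intended target (hypothetical input format).
    """
    def risk(aso_id):
        genes = off_targets[aso_id]
        # Lexicographic key: essential-gene hits first, then developmental-gene
        # hits, then total off-targets; the id breaks ties deterministically.
        return (len(genes & essential_genes),
                len(genes & developmental_genes),
                len(genes),
                aso_id)

    return sorted(off_targets, key=risk)[:top_n]


# Hypothetical ASO ids and gene ids, for illustration only.
off_targets = {
    "ASO-1": {"G1", "G2"},
    "ASO-2": {"G3"},
    "ASO-3": set(),
    "ASO-4": {"G2"},
}
print(prioritize_asos(off_targets, essential_genes={"G1"}, developmental_genes={"G2"}, top_n=2))
```

A lexicographic key keeps the policy explicit: no number of clean developmental-gene results can compensate for an off-target on an essential gene.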
  • 8. Automated Focused Library Design. Target space: GENE, TRANSCRIPT and PROTEIN nodes connected by HAS REFERENCE, TRANSCRIPT IN, TRANSLATE IN and HAS HOMOLOGY relations. Protein embeddings: encode functional and structural properties of proteins using LMs; auto-encoder.
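The slide does not state how protein embeddings become HAS HOMOLOGY relations. A minimal sketch, assuming cosine similarity between embedding vectors and a hypothetical threshold of 0.9, could look like this; the vectors are toy values, not real protein-LM outputs.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def homology_edges(embeddings, threshold=0.9):
    """Return (protein_a, protein_b, score) for protein pairs whose embeddings are close,
    i.e. candidate HAS HOMOLOGY relations (threshold is an assumption)."""
    edges = []
    for (pa, ea), (pb, eb) in combinations(sorted(embeddings.items()), 2):
        score = cosine(ea, eb)
        if score >= threshold:
            edges.append((pa, pb, score))
    return edges


embeddings = {  # toy 3-dimensional vectors; real protein-LM embeddings are much larger
    "P1": [1.0, 0.0, 0.2],
    "P2": [0.9, 0.1, 0.2],
    "P3": [0.0, 1.0, 0.0],
}
print(homology_edges(embeddings))
```

The all-pairs loop is quadratic; at the scale of a full proteome, an approximate nearest-neighbour index would normally replace it.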
  • 9. Automated Focused Library Design. PERTURBATOR IS ACTIVE ON GENE / TRANSCRIPT / PROTEIN, within a CONTEXT.
  • 10. Automated Focused Library Design. PERTURBATOR HAS PREDICTED ACTIVITY on GENE / TRANSCRIPT / PROTEIN, within a CONTEXT. Sources: historical activity database; multitask learning; federated deep learning (unbiased, sequence-based, unlimited number of molecules, very limited number of targets).
  • 11. Automated Focused Library Design. PERTURBATOR CHEMICALLY SIMILAR to PERTURBATOR: fingerprints + metric (Tanimoto) + threshold. Nodes: GENE, TRANSCRIPT, PROTEIN, CONTEXT.
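The "fingerprints + metric (Tanimoto) + threshold" recipe can be sketched as follows. For binary fingerprints with a and b on-bits and c bits in common, the Tanimoto coefficient is c / (a + b - c). The sketch stores fingerprints as Python sets of on-bit indices (an assumption; in practice the bits would come from a cheminformatics toolkit such as RDKit) and uses the > 0.6 cutoff that appears in the query on slide 30.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two binary fingerprints stored as sets of on-bit indices:
    |A & B| / |A | B|, computed as c / (a + b - c)."""
    if not fp_a and not fp_b:
        return 0.0  # convention for two empty fingerprints
    common = len(fp_a & fp_b)
    return common / (len(fp_a) + len(fp_b) - common)


def chemically_similar(query_fps, library_fps, threshold=0.6):
    """Return (query id, library id, score) for pairs above the threshold,
    i.e. candidate CHEMICALLY SIMILAR relations."""
    pairs = []
    for qid, qfp in query_fps.items():
        for lid, lfp in library_fps.items():
            score = tanimoto(qfp, lfp)
            if score > threshold:
                pairs.append((qid, lid, score))
    return pairs


# Hypothetical compound ids and toy fingerprints.
query_fps = {"active_ref": {1, 2, 3, 4, 5}}
library_fps = {"lib_1": {1, 2, 3, 4, 6}, "lib_2": {7, 8}}
print(chemically_similar(query_fps, library_fps))
```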
  • 12. Automated Focused Library Design. PERTURBATORS CHEMICALLY SIMILAR and PHENOTYPICALLY SIMILAR. Phenotypic similarity: Cell Painting + threshold (auto-encoder) • Biology • Polypharmacology • Perturbation-agnostic: bridges the gap between chemistry and biology • Medium throughput. Nodes: GENE, TRANSCRIPT, PROTEIN, CONTEXT.
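The slide pairs Cell Painting profiles with a threshold but does not name the similarity measure. As a sketch only, assuming each perturbator is summarized by a numeric profile vector (raw morphological features or an auto-encoder encoding) and using Pearson correlation with a hypothetical 0.8 threshold:

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length feature vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def phenotypically_similar(profiles, threshold=0.8):
    """Return (id_a, id_b, r) for every profile pair whose correlation reaches the threshold."""
    ids = sorted(profiles)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            if r >= threshold:
                pairs.append((a, b, r))
    return pairs


profiles = {  # hypothetical 4-feature profiles
    "cpd_1": [0.1, 0.5, 0.9, 0.3],
    "cpd_2": [0.2, 0.6, 1.0, 0.4],   # same shape as cpd_1, shifted
    "cpd_3": [0.9, 0.1, 0.2, 0.8],   # roughly the opposite pattern
}
print(phenotypically_similar(profiles))
```

Because the measure ignores which perturbation produced a profile, the same relation can link a compound to a CRISPR knockout, which is what "perturbation-agnostic" buys.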
  • 13. Automated Focused Library Design. PERTURBATORS CHEMICALLY SIMILAR, PHENOTYPICALLY SIMILAR and TRANSCRIPTOMICALLY SIMILAR. Transcriptomic similarity: L1000 + cQuery • Biology • Polypharmacology • Perturbation-agnostic: bridges the gap between chemistry and biology • Medium throughput. Nodes: GENE, TRANSCRIPT, PROTEIN, CONTEXT.
  • 14. Automated Focused Library Design: Filtering Out Compounds / Off-Targets. PERTURBATORS CHEMICALLY DEFINED AS PAINS; OFF TARGET on an ESSENTIAL GENE; several OFF TARGET relations indicating LOW SPECIFICITY. Nodes: GENE, TRANSCRIPT, PROTEIN, CONTEXT.
  • 15. Automated Focused Library Design: Filtering Out Compounds. Availability of each PERTURBATOR: COMMERCIALLY UNAVAILABLE, COMMERCIALLY AVAILABLE or INTERNALLY AVAILABLE. Nodes: GENE, TRANSCRIPT, PROTEIN, CONTEXT.
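The PAINS, essential-gene off-target, specificity and availability filters from the two slides above can be chained as a simple pipeline. This is a sketch under stated assumptions: the Candidate fields, the order of the filters and the cutoff of 3 off-targets for "low specificity" are hypothetical choices, not values from the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A perturbator annotated with the properties used by the filters (hypothetical fields)."""
    cpd_id: str
    is_pains: bool = False
    off_targets: set = field(default_factory=set)
    availability: str = "unavailable"  # "internal", "commercial" or "unavailable"


def filter_library(candidates, essential_genes, max_off_targets=3):
    """Apply the filters in order and record the first reason each compound is discarded."""
    kept, discarded = [], {}
    for c in candidates:
        if c.is_pains:
            discarded[c.cpd_id] = "PAINS"
        elif c.off_targets & essential_genes:
            discarded[c.cpd_id] = "off-target on essential gene"
        elif len(c.off_targets) > max_off_targets:
            discarded[c.cpd_id] = "low specificity"
        elif c.availability not in ("internal", "commercial"):
            discarded[c.cpd_id] = "commercially unavailable"
        else:
            kept.append(c.cpd_id)
    return kept, discarded


candidates = [  # hypothetical compounds
    Candidate("C1", availability="internal"),
    Candidate("C2", is_pains=True, availability="commercial"),
    Candidate("C3", off_targets={"ESS1"}, availability="commercial"),
    Candidate("C4", off_targets={"T1", "T2", "T3", "T4"}, availability="internal"),
    Candidate("C5"),
    Candidate("C6", off_targets={"T1"}, availability="commercial"),
]
kept, discarded = filter_library(candidates, essential_genes={"ESS1"})
print(kept)
print(discarded)
```

Recording the first failing reason, rather than a bare yes/no, keeps the selection explainable, in line with the "explainable project-oriented selection" goal on slide 3.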
  • 16. Automated Focused Library Design. PERTURBATORS DISCARDED based on their LEVEL OF CONFIDENCE. Nodes: GENE, TRANSCRIPT, PROTEIN, CONTEXT.
  • 17. Automated Focused Library Design. PERTURBATORS DISCARDED based on NOVELTY. Nodes: GENE, TRANSCRIPT, PROTEIN, CONTEXT.
  • 18. Automated Focused Library Design: Moving Away From the Target of Interest (Cellular Assay). Protein-protein interactions (PPI) and pathways (PART OF PATHWAY). Nodes: GENE, TRANSCRIPT, PROTEIN, CONTEXT, PERTURBATOR, DISCARDED.
  • 19. Automated Focused Library Design: Moving Away From the Target of Interest (Cellular Assay). Pathways INVOLVED IN DISEASE. Nodes: GENE, TRANSCRIPT, PROTEIN, CONTEXT, PERTURBATOR, DISCARDED, PATHWAY, DISEASE.
  • 20. Automated Focused Library Design: Hypothesis Testing. Nodes: GENE, TRANSCRIPT, PROTEIN, CONTEXT, PERTURBATOR, DISCARDED, PATHWAY, DISEASE.
  • 21. Automated Focused Library Design: Hypothesis Testing. Target-based hypothesis: Perturbator 1 → Target 1, hypothesis 1 (predicted activity, embedding homology, translation, transcription). Nodes: GENE, TRANSCRIPT, PROTEIN, CONTEXT, PERTURBATOR, DISCARDED, PATHWAY, DISEASE.
  • 23. Automated Focused Library Design: Hypothesis Testing. Target-based hypotheses: Perturbator 1 → Target 1, hypothesis 1 (predicted activity, embedding homology, translation, transcription); Perturbator 2 → Target 1, hypothesis 2 (chemical similarity, experimental activity, translation, transcription). Nodes: GENE, TRANSCRIPT, PROTEIN, CONTEXT, PERTURBATOR, DISCARDED, PATHWAY, DISEASE.
  • 24. Automated Focused Library Design: Hypothesis Testing. Target-based hypotheses: Perturbator 1 → Target 1, hypothesis 1 (predicted activity, embedding homology, translation, transcription); Perturbator 2 → Target 1, hypothesis 2 (chemical similarity, experimental activity, translation, transcription); … Perturbator N → Target 1, hypothesis M. Nodes: GENE, TRANSCRIPT, PROTEIN, CONTEXT, PERTURBATOR, DISCARDED, PATHWAY, DISEASE.
  • 25. Automated Focused Library Design: Hypothesis Testing. Target-based hypotheses: Perturbator 1 → Target 1, hypothesis 1 (predicted activity, embedding homology, translation, transcription); Perturbator 2 → Target 1, hypothesis 2 (chemical similarity, experimental activity, translation, transcription); … Perturbator N (about 1,000) → Target 1, hypothesis M. A medium-scale screening campaign measures hit-rate enrichment against a regular pilot screening library (null hypothesis). Perturbators 3, 13 and 23: validated hits. Outcome: validated hits plus validated / unvalidated hypotheses.
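The comparison against the regular pilot library can be quantified as a fold enrichment plus a one-sided test. The sketch below uses a binomial test, treating the pilot library's hit rate as the null rate; this choice of test and all numbers in the example are assumptions for illustration, not results from the campaign. The tail probability is summed in log space so that libraries of about 1,000 compounds do not overflow floating point.

```python
from math import exp, lgamma, log, log1p


def enrichment(hits_focused, n_focused, hits_pilot, n_pilot):
    """Fold enrichment of the focused library's hit rate over the pilot library's."""
    return (hits_focused / n_focused) / (hits_pilot / n_pilot)


def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")

    def log_pmf(i):
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log1p(-p))

    return sum(exp(log_pmf(i)) for i in range(max(k, 0), n + 1))


# Hypothetical campaign: 50 hits among 1,000 focused compounds,
# 20 hits among 2,000 compounds of the regular pilot library.
fold = enrichment(50, 1000, 20, 2000)
p_value = binomial_sf(50, 1000, 20 / 2000)
print(f"enrichment = {fold:.1f}x, one-sided p = {p_value:.2e}")
```

Treating the pilot hit rate as exact ignores its own sampling error; when the pilot library is small, Fisher's exact test on the 2x2 table of hits and non-hits is the more conservative choice.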
  • 26. Automated Focused Library Design: Hit Deconvolution. HITS placed in the graph. Nodes: GENE, TRANSCRIPT, PROTEIN, CONTEXT, DISCARDED, PATHWAY, DISEASE.
  • 27. Automated Focused Library Design: Hit Deconvolution. HITS linked by TRANSCRIPTOMIC SIMILARITY to CRISPR, shRNA and cDNA perturbations. Transcriptomic similarity: L1000 + cQuery • Biology • Polypharmacology • Perturbation-agnostic: bridges the gap between chemistry and biology • Medium throughput. Nodes: GENE, TRANSCRIPT, PROTEIN, CONTEXT, DISCARDED, PATHWAY, DISEASE.
  • 28. Automated Focused Library Design: Hit Deconvolution. HITS linked by TRANSCRIPTOMIC SIMILARITY to CRISPR, shRNA and cDNA perturbations, and by PHENOTYPIC SIMILARITY. Nodes: GENE, TRANSCRIPT, PROTEIN, CONTEXT, DISCARDED, PATHWAY, DISEASE.
  • 29. Automated Focused Library Design: Modality Expansion. ASOs PREDICTED ACTIVE added alongside the HITS and the CRISPR, shRNA and cDNA perturbations. Nodes: GENE, TRANSCRIPT, PROTEIN, CONTEXT, DISCARDED, PATHWAY, DISEASE.
  • 30. Automated Focused Library Design: Simple Overview With Results. Given any target, identify compounds having activity, their analogues and their specificity. Query: MATCH path=(g:Gene)-[:HAS_REFERENCES]-(:Gene)-[:ON_TARGETS]-(:ActivityChemical)-[:HAS_ACTIVITIES]-(:ChemicalChembl)-[r:HAS_SIMILARITIES]-(:ChemicalServier) WHERE g.geneId = 'target X' AND r.tanimoto_similarity > 0.6 RETURN path; Success stories. Target X: • Identification of 984 Servier compounds chemically similar to compounds having an activity • MST validation: hit rate 15%. Target Y: • From the project leader: 4 potential reference compounds • Identification of 47 chemically similar compounds • Annotations in Pegasus revealed a completely non-specific profile for the reference compounds
  • 31. Therapeutic Target Environment Exploration (ID Card). Pegasus application: given any target, present a target ID card.
  • 32. Therapeutic Target Environment Exploration (ID Card). Pegasus application: given any target, present a target ID card. Success stories, for many targets (A to Z): • Gene / transcript / protein identifiers, cross-references, isoforms • Biological processes, cellular localization • Protein half-lives • Gene essentiality • In vivo models • Gene / disease associations • SNPs • Pathways • Perturbators
  • 33. Conclusion. Knowledge graphs are well suited for integrating large amounts of heterogeneous, sparse data (as typically seen in pharmaceutical research). This data structure guides our therapeutic projects in various respects and allows us to seamlessly integrate exploratory, cutting-edge AI approaches. Systematic data generation (in silico and in vitro) is a requirement to properly feed these databases and generate knowledge from them. Communication: Neo4j Health Care & Life Sciences Workshop; Symposium Servier – AI to New Drug Development; Servier Corporate Strategy & Executive Director; France Culture « La Méthode Scientifique »…
  • 34. Perspective. “Objects are characterized by the way they interact. If an object has no interactions, influences nothing, acts on nothing, emits no light, neither attracts nor repels, cannot be touched, feels nothing, and so on, it is as if it did not exist. To speak of objects that never interact is to speak of things that, even if they existed, would be of no concern to us. We do not even understand what it could mean to say that such things ‘exist’. The world we know, which touches us, which interests us, what we call ‘reality’, is the vast network of interacting entities that manifest themselves to one another by interacting, and of which we are part. It is this network (Pegasus) that we are interested in.” Helgoland by Carlo Rovelli
  • 35. Acknowledgements: J.P. Stephan, N. Boisseau, I. S. Khader, S. Lotfi, A.L. Ong, A. Gohier, T. Dorval, DSDM Team, TA(s), PA(s)
  • 37. JUMP-CP (Joint Undertaking in Morphological Profiling): Cell Painting Phenotypic Fingerprint. High-content screening (HCS) with Cell Painting; fingerprint analysis of compounds and CRISPR perturbations, with morphological descriptors as fingerprints. Phenotypic fingerprints induced by compounds screened in HCS show various phenotypic responses and reveal dose-response effects with different mechanisms of action, compared with negative controls and no-effect compounds.
  • 38. Target Expansion: Protein Fingerprint. Protein embeddings encode functional and structural properties of proteins using LMs. Fingerprint analysis (X, Y, Z, A, B, C): protein fingerprints (embedded sequences) seem to carry sequence and domain-function information. Ongoing work: interpretability and sub-domain similarity.
  • 39. Cancer Dependency Map (DepMap): Cellular Model and Chemical Fingerprints. Achilles & CCLE: identification of cell lines that express a target of interest (experimental validation for CISH). Fingerprint analysis (PRISM 1D): phenotypic similarity to identify compounds sharing the same phenotypic effects; targets, MOAs and chemical similarity within clusters. Drug fingerprints = activity in cell lines. The similarity of cellular-model fingerprints induced by compounds in DepMap PRISM reveals a correlation (in some clusters) between chemical structures and phenotypic responses.
  • 40. CMap – L1000 Chemical / CRISPR / sh Fingerprints: Dimensionality Reduction, Cluster MOA / Target Analysis. Compounds identified within a cluster obtained by dimensionality reduction of L1000 fingerprints (signatures of compounds, CRISPR, sh, …) share the same mechanism of action (MOA) and are active on the same target family (HDAC*).
  • 41. CMap – L1000 Chemical / CRISPR / sh Fingerprints: Phenotypic (DepMap) and Chemical Fingerprint Similarity. Compounds identified within a cluster obtained by dimensionality reduction of L1000 fingerprints have the same phenotypic effects (in DepMap PRISM) and are not chemically similar, while sharing the same mechanism of action and activity on the same target family (HDAC*).
  • 42. Our PA DSDM Activities Linked to Pegasus (So Far): Support for Therapeutic Projects. Therapeutic target environment exploration (ID card) • Get target annotations: isoforms, biological processes, localization, half-lives, essentiality, models, diseases, SNPs, pathways, perturbator activity. Optimized design of antisense oligonucleotides (ASOs) • Characterize ASO off-target effects to prioritize the ASOs to screen. Design of new experiments (e.g., coupling uORFs and ASOs) • Identify target transcripts with uORFs that can be targeted by ASOs to increase protein expression. Identification of focused screening libraries • Identify relevant perturbators, validate biological hypotheses, and expand from the Servier compound library to phenotypic, omics and cellular spaces. These activities rely on heterogeneous pharmaco-biological data already present in Pegasus or being explored for future integration.
  • 43. Chemical / Activity on Targets: Chemical Fingerprint. Chemicals / activity extraction: extraction, annotation (e.g., PAINS, frequent hitters, reactive compounds) and standardization. Given a target of interest, quickly identify compounds having activity, their analogues and compound specificity. Chemical fingerprints (e.g., FCFP6).
  • 44. Knowledge Representation Formalisms for Linking Heterogeneous Data. Resource Description Framework (RDF) graph: atomic description of data as triples (subject, predicate, object); sharing of consistent information and data • Proven data model • Difficult to identify a common semantics for the atomic information to be described • RDF model not very flexible. Labeled property graph (LPG): data as entities (nodes) and relationships; graph analysis, deep path search, bulk data import; flexible data model (nodes and relationships). Shortcomings of the RDF formalism for our research activities → choice of LPG. Missing concepts → design and implementation of Pegasus (LPG).
  • 45. Multiple databases identify functionally identical resources, such as genes. → Functionally identical concepts are modeled as several entities linked by cross-references.
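Because one gene can appear as several entities linked by cross-references, a common downstream need is to collapse each connected group of identifiers into one concept. A minimal sketch using a union-find structure is shown below; the identifier strings are hypothetical placeholders, and the assumption is that the cross-reference relations (such as HAS_REFERENCES in the queries above) have been exported as pairs of ids.

```python
class DisjointSet:
    """Union-find over identifiers; each resulting set is one functionally identical concept."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def group_cross_references(references):
    """references: iterable of (id_a, id_b) cross-reference pairs."""
    ds = DisjointSet()
    for a, b in references:
        ds.union(a, b)
    groups = {}
    for node in ds.parent:
        groups.setdefault(ds.find(node), set()).add(node)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical identifiers from three source databases.
refs = [("DB_A:g1", "DB_B:x9"), ("DB_B:x9", "DB_C:42"), ("DB_A:g2", "DB_C:77")]
print(group_cross_references(refs))
```

Keeping the source entities separate in the graph and grouping them only at query time preserves each database's own identifiers and provenance, which fits the modeling choice on this slide.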
  • 46. Gene and protein concepts are often mixed, and the transcript concept is generally absent. → Genes, transcripts, proteins and functional units are modeled as distinct entities connected by relationships.
  • 47. Some chemical and biological perturbator concepts are missing. → Perturbators are modeled as distinct entities.
  • 48. Existing representations have no concept of phenotypic signatures or phenotypic similarities. → Phenotypic similarities are modeled as relationships between phenotypic signatures (entities).
  • 49. In existing representations, the relationships connecting graph nodes are direct. → An intermediate entity connects several entities and annotates them contextually.
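A relationship always connects exactly two nodes, so a direct (perturbator)-[:ACTIVE_ON]-(target) edge has no natural place for the context of each measurement. The intermediate-entity pattern, visible in node labels such as ActivityAsos and ActivityChemical in the queries above, can be sketched in memory as follows. The field names and values are hypothetical; the point is only that each activity record joins a perturbator, a target and a context while carrying its own annotations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """Intermediate entity joining a perturbator, a target and a context (hypothetical fields)."""
    perturbator: str
    target: str
    context: str   # e.g. cell line or assay in which the effect was measured
    value: float   # e.g. a potency measurement
    source: str    # provenance of the measurement


def activities_on(activities, target, context=None):
    """Activities on a target, optionally restricted to one context."""
    return [a for a in activities
            if a.target == target and (context is None or a.context == context)]


activities = [  # hypothetical records
    Activity("cpd_1", "GENE_X", "cell_line_A", 7.2, "source_1"),
    Activity("cpd_1", "GENE_X", "cell_line_B", 5.1, "source_2"),
    Activity("cpd_2", "GENE_Y", "cell_line_A", 6.0, "source_1"),
]
for a in activities_on(activities, "GENE_X", context="cell_line_A"):
    print(a.perturbator, a.value, a.source)
```

The same compound on the same target yields two distinct activity records here, one per context; with a single direct edge these measurements would have to be merged or duplicated.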
  • 50. Generation platform that automatically processes heterogeneous pharmaco-biological data sources of multiple origins and generates Pegasus.