Data-Driven Approaches to
Medicinal Chemistry
How Large-Scale Normalized
Data Empowers Drug Discovery
RICT 2015
Drug Discovery and Selection
Barberan Olivier
Senior Product Manager Reaxys Medicinal Chemistry
July 1
The Lead Optimization Challenge:
Optimizing early substances into potential drugs
• Potency & selectivity
• DMPK properties
• Physical properties
• Safety pharmacology
Opportunity: Knowledge-driven drug design using a structure-activity relationship knowledge base
• General descriptor-property relationships
• Sub-structural alerts
• QSAR
• Matched molecular pair analyses
• Predictive pharmacology
Etc…
Cumming, J.G., Davis, A.M. et al. Nat. Rev. Drug Disc. (2013) 12, 948–962
Data → Knowledge → Predictions
• Data normalization
• Taxonomies
• Quality control
Etc…
“Those who cannot remember the past are condemned to repeat it.”
George Santayana: Life of Reason, Reason in Common Sense, Scribner's, 1905, page 284
Elsevier Solution for Lead Optimization: Reaxys Medicinal Chemistry
• Integration of high-value data sources supporting lead finding and lead optimization
• Substances: 1M; biological results: 3.5M
• Substances: 3.5M; biological results: 8M
• Substances: 4.2M; biological results: 22M
• Substances: 6M; biological results: 29M
Extract → Transform → Load
• Data normalization (parameters, units, etc.)
• Structure normalization
• Taxonomies (targets, species, cell lines, tissues/organs, bioassays)
Reaxys Medicinal Chemistry Coverage
• Substances: chemical structure, name, code and synonyms of compounds; calculated physicochemical properties (log P, HBA, HBD, PSA, RotB); Lipinski rules
• Druggable targets: explore the target affinity patterns of chemical compounds
• In vitro and cell-based assays: in vitro assays (binding, second messenger, etc.) and cell-based assays, for example aggregation, angiogenesis, apoptosis, cell differentiation and cell cycle
• Animal disease models: Zucker rats as an obesity model, ovariectomized rats in osteoporosis, treatment of glaucoma, xenografted animals with tumors to test antineoplastic drugs
• Pharmacokinetic and ADME properties: metabolic stability, intrinsic clearance, elimination half-life, bioavailability, in vivo clearance
• Toxicity: cytotoxicity, cardiotoxicity, chronic toxicity
Reaxys Medicinal Chemistry: Journals Coverage
• 345,000 articles from more than 5,000 journals, from 1980 to the present, are included in Reaxys Medicinal Chemistry.
• Some articles stored in Reaxys Medicinal Chemistry are older than 1980.
• Elsevier and other publishers are covered.
• Medicinal chemistry journals are the cornerstone of Reaxys Medicinal Chemistry, but pharmacology, biology and chemistry journals are also included.
Reaxys Medicinal Chemistry: Patent Excerption Examples
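Chemical Diversity per Target: JAK3 and NPY5
Diversity is counted as Bemis-Murcko (B&M) scaffolds.

JAK3                   Substances   Diversity (B&M scaffolds)
RMC (patents only)         43,365       16,045
RMC (articles only)         3,283        2,199
RMC (all)                  45,715       17,828
ChEMBL                      2,443        1,490

NPY5                   Substances   Diversity (B&M scaffolds)
RMC (patents only)         12,698        5,700
RMC (articles only)         2,537        1,014
RMC (all)                  14,544        5,963
ChEMBL                      1,483          652

- Patents increase chemical diversity several-fold versus articles alone (16,045 vs. 2,199 scaffolds for JAK3; 5,700 vs. 1,014 for NPY5)
- Patents represent around 90% of the overall chemical diversity (about 90% of RMC scaffolds for JAK3, over 95% for NPY5)
- RMC covers about 12 times (JAK3) and 9 times (NPY5) as many scaffolds as ChEMBL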
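The percentages on this slide can be recomputed from the scaffold counts. The minimal Python sketch below assumes that "90%" and "95%" are the share of RMC scaffolds found only in patents, and that "+1196%" and "+914%" compare RMC with ChEMBL; both readings are interpretations of unlabeled annotations, not statements from the slide.

```python
# Bemis-Murcko scaffold counts copied from the JAK3 and NPY5 tables on this slide.
SCAFFOLDS = {
    "JAK3": {"patent_only": 16045, "articles_only": 2199, "rmc": 17828, "chembl": 1490},
    "NPY5": {"patent_only": 5700, "articles_only": 1014, "rmc": 5963, "chembl": 652},
}


def summarize(counts):
    """Patent share of RMC scaffolds and fold-changes against other sources."""
    return {
        # fraction of all RMC scaffolds that come from patents only
        "patent_share": counts["patent_only"] / counts["rmc"],
        # how many times more scaffolds patents give than articles alone
        "patents_vs_articles": counts["patent_only"] / counts["articles_only"],
        # how many times more scaffolds RMC holds than ChEMBL
        "rmc_vs_chembl": counts["rmc"] / counts["chembl"],
    }


for target, counts in SCAFFOLDS.items():
    s = summarize(counts)
    print(f"{target}: patent share {s['patent_share']:.1%}, "
          f"patents/articles {s['patents_vs_articles']:.1f}x, "
          f"RMC/ChEMBL {s['rmc_vs_chembl']:.1f}x")
```

Running it gives patent shares of 90.0% (JAK3) and 95.6% (NPY5) and RMC/ChEMBL ratios of about 12.0 and 9.1. These match the slide's "+1196%" and "+914%" if those figures are read as 1196% and 914% of the ChEMBL count rather than as increases.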
Putting Data to Work |
Hit to Lead: Virtual Screening
Ligand-Based Virtual Screening Using Reaxys Medicinal Chemistry
Objective
• Describe an in silico screening approach using Reaxys Medicinal Chemistry
Case study on T-type calcium channels
Ligand-Based In Silico Screening
• A simple target-name search returns all results
• Filter on active compounds (pX > 7)
• Answers: 130 compounds and 1,200 experimental data points
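The pX > 7 filter corresponds to a potency better than 100 nM, since pX = -log10(potency in mol/L). A minimal sketch of that filter, assuming potency values arrive with a unit string; the unit table, the record layout and the compound IDs are hypothetical, and keeping a compound when any one of its measurements passes is a design choice here, not necessarily the one Reaxys applies.

```python
import math

# Factors converting common potency units to mol/L (hypothetical unit spellings).
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def px(value, unit):
    """pX = -log10(potency in mol/L), e.g. pIC50 or pKi."""
    return -math.log10(value * UNIT_TO_MOLAR[unit])


def active_compounds(records, threshold=7.0):
    """Group measurements by compound and keep compounds with any pX above threshold.

    records: iterable of (compound_id, value, unit) tuples.
    Returns {compound_id: [pX values above threshold]}.
    """
    actives = {}
    for cid, value, unit in records:
        p = px(value, unit)
        if p > threshold:
            actives.setdefault(cid, []).append(p)
    return actives


# Hypothetical records; IDs and values are made up for illustration.
records = [("cpd-1", 45.0, "nM"), ("cpd-2", 2.5, "uM"),
           ("cpd-3", 0.08, "uM"), ("cpd-1", 120.0, "nM")]
print(sorted(active_compounds(records)))  # ['cpd-1', 'cpd-3']
```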
Ligand-Based In Silico Screening
• 130 query structures, searched against the flat file
• Representation & chemical space: molecular descriptors & fingerprints
• Virtual screening by pharmacophoric similarity: 314 hits
• "Drug-like" filtering, then selection on:
  1. Molecular diversity and chemical originality
  2. Compound availability
• 39 compounds ordered for testing
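The screening step ranks library compounds by their similarity to the query structures. The sketch below uses max-Tanimoto similarity on fingerprint bit sets as a stand-in for the pharmacophoric similarity named on the slide; the fingerprints, the 0.5 threshold and all IDs are hypothetical, and the drug-like, diversity and availability filters are not reproduced.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient between two sets of 'on' fingerprint bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def screen(queries, library, threshold=0.5):
    """Max-similarity ligand-based screen.

    queries, library: dicts mapping an ID to a set of fingerprint bit positions.
    Returns (library_id, best_similarity) pairs at or above the threshold,
    most similar first.
    """
    hits = []
    for lib_id, fp in library.items():
        # similarity to the closest query structure
        best = max(tanimoto(fp, q) for q in queries.values())
        if best >= threshold:
            hits.append((lib_id, best))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy fingerprints; real ones would come from a cheminformatics toolkit.
queries = {"q1": {1, 2, 3, 4}, "q2": {10, 11, 12}}
library = {"a": {1, 2, 3, 5}, "b": {10, 11}, "c": {20, 21}}
for lib_id, sim in screen(queries, library):
    print(lib_id, round(sim, 2))
```

Taking the maximum over all queries means a library compound only needs to resemble one known active, which suits a diverse query set such as 130 actives.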
FEATURES
The Reaxys Medicinal Chemistry Flatfile
• Substance information (~26 million substances)
  – The substances are delivered as a series of SD files containing all structures from Reaxys in Molfile format, together with their identification data and a list of available facts and reactions for each compound
  – Unstructured substances are included as empty Molfiles
• Bioactivity data (> 29 million bioactivity data points)
  – The bioactivity data are delivered as a series of linked data files in XML format, using the Resource Description Framework (RDF), compliant with the OpenPHACTS guidelines
  – The XML files contain information on bioassays, citations, bioactivity data points, substance facts and bioactivity targets
  – This includes pharmacokinetic and ADME property data and toxicity data
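Because the substances ship as SD files, records can be read with any SD-aware cheminformatics toolkit. For illustration only, here is a dependency-free Python sketch of the SD layout ("$$$$" record separators, "> <FIELD>" data items after each Molfile's "M  END" line). The field names ID and NAME are hypothetical and not necessarily those used in the Reaxys flatfile; the linked RDF/XML bioactivity files would need an RDF or XML parser instead.

```python
def parse_sdf(text):
    """Split SD-file text into records: the Molfile block plus '> <FIELD>' data items.

    Records end with a '$$$$' line. Data items follow the 'M  END' line of each
    Molfile; a blank line ends each item's value.
    """
    records = []
    for block in text.split("$$$$"):
        if not block.strip():
            continue  # skip the empty tail after the last '$$$$'
        mol, sep, data = block.partition("M  END")
        record = {"molblock": (mol + sep).strip("\n")}
        name, values = None, []
        for line in data.splitlines() + [""]:
            if line.startswith(">") and "<" in line:
                # header such as '> <NAME>'; the field name sits between < and >
                name, values = line[line.index("<") + 1 : line.rindex(">")], []
            elif name is not None and line.strip():
                values.append(line)
            elif name is not None:
                record[name] = "\n".join(values)
                name = None
        records.append(record)
    return records


# Hypothetical two-record file; the second mimics an empty Molfile.
SAMPLE = """
  hypothetical example

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <ID>
123

> <NAME>
aspirin

$$$$

  empty Molfile (unstructured substance)

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <ID>
456

$$$$
"""
for rec in parse_sdf(SAMPLE):
    print({k: v for k, v in rec.items() if k != "molblock"})
```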
Biological Activity
Electrophysiology experiments: screening at 10 µM on Cav3.2 T-type channels
[Figure: peak current inhibition (%) for each of the 39 tested compounds at 10 µM]
• 9 compounds with > 75% inhibition
• 15 compounds with > 50% inhibition
ADMET Properties Influencing Medicinal Chemistry Design
Prediction of ADMET Properties Influencing Medicinal Chemistry Design
Step 0
• logD7.4
• Protein binding
• Solubility
• Metabolic stability
• hERG
• Etc…
Step 1
• Rat PPB (plasma protein binding)
• Hu heps (human hepatocytes)
• CYP inhibition
• Caco2
• NaV1.5
• Etc…
Step 2
• logD7.4
• Solubility
• Protein binding
• hERG
• Rat PPB
• Metabolic stability
• CYP inhibition
• Caco2
• Etc.
AstraZeneca’s global hERG QSAR model (ref. 70 in Cumming et al.) has contributed to reducing the synthesis of ‘red flag’ compounds (compounds measured to have a hERG potency of < 1 µM) from 25.8% of all compounds tested in 2003 to only 6% in 2010.
Cumming, J.G., Davis, A.M. et al. Nat. Rev. Drug Disc. (2013) 12, 948–962
Predictive Modeling of Chemical Solubility
Case Study: Solubility Modeling
Overview
• Reaxys holds a large amount of compound data reported in the literature
• Using the Reaxys API, one can access the data and create predictive models
• This example uses aqueous solubility as reported in the literature. This is not the intrinsic solubility of the neutral molecule; it is whatever the authors reported, which is generally measured at neutral pH
• Every value has a reference that can be read to verify where the value came from
Model Making Process
• Extracted 6893 aqueous solubilities reported in g/L in Reaxys
  – Converted the reported values to molarity using the molecular weight, and computed logS
  – Averaged the values when a compound had multiple reports
• Created a KNIME workflow to do the analysis
• Used CDK molecular descriptors and R to build a simple solubility model
  – Simple multiple linear regression model (“lm” in R)
• Also created a model that reports the solubility of the most similar compound
  – This approach has been reported to be surprisingly effective
  – It works well for Reaxys because of the large number of compounds with solubilities
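The unit conversion, averaging and nearest-neighbour ideas above can be sketched in a few lines of Python (the original work used KNIME, CDK and R). The sketch assumes averaging is done in log units, which the slide does not specify; fingerprints are hypothetical bit sets, and the regression step is not reproduced.

```python
import math
from collections import defaultdict


def log_s(solubility_g_per_l, mol_weight):
    """logS: log10 of the molar solubility, from g/L and molecular weight (g/mol)."""
    return math.log10(solubility_g_per_l / mol_weight)


def average_log_s(reports):
    """Average logS per compound over multiple literature reports.

    reports: iterable of (compound_id, solubility_g_per_l, mol_weight).
    """
    per_compound = defaultdict(list)
    for cid, sol, mw in reports:
        per_compound[cid].append(log_s(sol, mw))
    return {cid: sum(v) / len(v) for cid, v in per_compound.items()}


def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprint bit sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbour_log_s(query_fp, training):
    """Predict logS as the value of the most similar training compound.

    training: dict compound_id -> (fingerprint bit set, logS).
    """
    best = max(training, key=lambda cid: tanimoto(query_fp, training[cid][0]))
    return training[best][1]


# Hypothetical data: two reports for compound "a" (MW 100 g/mol).
averaged = average_log_s([("a", 1.0, 100.0), ("a", 0.1, 100.0)])
print(averaged)
```

Averaging in log units keeps one outlying report from dominating when reported values differ by orders of magnitude.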
Relevant Data Where and When They Are Needed
ELSS content integrated into the existing environment of tools and processes
• Input: set of compound structures, list of target names, patent numbers
• Search, retrieve & process element: script, Pipeline Pilot or KNIME node
• Output: bioactivity values, compound structures, chemical properties
• Further processing:
  – Visualisation, Spotfire input
  – Reporting, dashboard production
  – Excel tables
  – QSAR/QSPR modeling
  – Hit-to-lead optimization
  – Reaction modeling
  – Text mining
KNIME Workflow for Solubility Modeling
[Figure: KNIME workflow]
• The same workflow can be created in Pipeline Pilot
• It can auto-update with new data in Reaxys, since it pulls directly from the server
Solubility Modeling: Predicted vs. Actual
• Predicted vs. actual log(S [M])
• One could filter for a “better” subset of compounds
• The model shows more scatter than recent work; within this framework, various descriptors can be tried to find those that work best
R summary of the regression model:
  Residual standard error: 1.253 on 3437 degrees of freedom
  Multiple R-squared: 0.5728, Adjusted R-squared: 0.5636
  F-statistic: 62.29 on 74 and 3437 DF, p-value: < 2.2e-16
• Recent work of Yalkowsky: 1642 selected compounds
  – Used group-contribution methods
  – Standard error of 0.8 log units
  – Int J Pharm. 2008 Aug 6;360(1-2):122-47. doi: 10.1016/j.ijpharm.2008.04.028
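As a consistency check on the R summary, adjusted R² and the overall F statistic follow from R², the 74 predictors and the 3437 residual degrees of freedom: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1) and F = (R²/p) / ((1 - R²)/(n - p - 1)). This assumes the model includes an intercept, which also implies that 3437 + 74 + 1 = 3512 compounds entered the regression.

```python
def adjusted_r2(r2, n_predictors, residual_df):
    """Adjusted R^2 of a linear model with intercept; n = residual_df + p + 1."""
    n = residual_df + n_predictors + 1
    return 1 - (1 - r2) * (n - 1) / residual_df


def f_statistic(r2, n_predictors, residual_df):
    """Overall F statistic of a linear model with intercept."""
    return (r2 / n_predictors) / ((1 - r2) / residual_df)


print(round(adjusted_r2(0.5728, 74, 3437), 4))  # 0.5636
print(round(f_statistic(0.5728, 74, 3437), 1))  # 62.3
```

The first value reproduces R's 0.5636; the second comes out near 62.3, close to the reported 62.29, which R computes from the unrounded R².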
Conclusions
• The properties reported in Reaxys can be used to create predictive models
• Much more biological information is available in Reaxys Medicinal Chemistry
• All sources are referenced
• The API allows easy access to the data outside the web user interface for modeling
• One can build several kinds of models and show all results, or make a consensus determination
Safety Pharmacology: Avoiding hERG Inhibition
Why avoid hERG inhibition?
Case Study: Which 5-HT2A antagonists have low affinity for the hERG channel?
• 5-HT2A receptor antagonism is thought to contribute to the therapeutic effect of several clinically effective and potential atypical antipsychotics, as well as several antidepressants.
• The ability of selective 5-HT2A receptor antagonists to interfere with the heightened state of dopamine activity without altering basal tone suggests that these drugs possess antipsychotic activity and may provide the basis for new therapies for psychosis and drug dependence.
• Search for 5-HT2A antagonists
• Search for compounds tested on hERG
Click on the heatmap overlay to retrieve the 5-HT2A antagonists tested on hERG (combine hit sets)
The following heatmap displays 99 5-HT2A antagonists that were also tested on the hERG channel.
Most active 5-HT2A antagonists (~10 nM) with low affinity for hERG
How to avoid hERG inhibition
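The "combine hit sets" step amounts to intersecting two result lists and ranking compounds by their selectivity window. A minimal sketch, assuming both hit sets are exported as pIC50 values per compound; the cut-offs (pIC50 ≥ 8, i.e. about 10 nM, and a 2-log-unit window) and the compound data are hypothetical choices, not the criteria used on the slide.

```python
def select_selective(ht2a_pic50, herg_pic50, min_ht2a=8.0, min_window=2.0):
    """Combine two hit sets and keep potent, hERG-sparing 5-HT2A antagonists.

    ht2a_pic50, herg_pic50: dicts compound_id -> pIC50 on each target.
    Only compounds measured on both targets (the hit-set intersection) are kept,
    and of those only ones with 5-HT2A pIC50 >= min_ht2a (8.0 ~ 10 nM)
    and a 5-HT2A minus hERG window of at least min_window log units.
    """
    both = ht2a_pic50.keys() & herg_pic50.keys()
    selected = []
    for cid in both:
        window = ht2a_pic50[cid] - herg_pic50[cid]
        if ht2a_pic50[cid] >= min_ht2a and window >= min_window:
            selected.append((cid, round(window, 2)))
    # widest selectivity window first
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Hypothetical pIC50 values for illustration only.
ht2a = {"a": 8.1, "b": 8.5, "c": 6.0, "d": 9.0}
herg = {"a": 5.0, "b": 7.5, "c": 4.5}
print(select_selective(ht2a, herg))  # [('a', 3.1)]
```

Compound "d" is dropped despite its potency because it was never tested on hERG, which mirrors why the heatmap only shows antagonists present in both hit sets.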
Prediction of Cardiotoxic Drugs Related to hERG Blockade
Introduction
• QT interval prolongation can lead to serious arrhythmias, which can be fatal.
• A large number of non-cardiac drugs have been reported to induce QT prolongation; the number keeps growing, and some blockbuster medicines have been withdrawn.
• hERG appears to be the main target behind this adverse side effect.
• In silico models could be rapid and powerful tools to screen out potential hERG blockers as early as possible in the discovery process.
Extraction & Methodology
• Molecules tested on hERG (Kv11.1) → hERG data set: 640 molecules
• Subsets of molecules according to detailed biological protocols
• 2D molecular descriptor sets
• Recursive partitioning (RP), MOE QuaSAR Classify → predictive models
• Cross-validation and external validation
[Figure: representation of the hERG data set ligands within the chemical space of the NCI database, along the first two PCA axes]
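Recursive partitioning builds a classification tree by repeatedly splitting the data on the descriptor threshold that best separates the classes. The sketch below is a generic, Gini-based version in plain Python; it does not reproduce MOE QuaSAR Classify's algorithm or settings, and the descriptor values and class labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows, labels):
    """(descriptor index, threshold) giving the largest drop in weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for j in range(len(rows[0])):
        for t in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (j, t), score
    return best


def grow(rows, labels, depth=0, max_depth=3, min_size=2):
    """Recursively partition the data; leaves hold the majority class label."""
    split = None
    if depth < max_depth and len(labels) >= min_size:
        split = best_split(rows, labels)
    if split is None:
        return max(set(labels), key=labels.count)
    j, t = split
    left = [i for i, row in enumerate(rows) if row[j] <= t]
    right = [i for i, row in enumerate(rows) if row[j] > t]
    return (j, t,
            grow([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth, min_size),
            grow([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth, min_size))


def predict(tree, row):
    """Follow the splits from the root down to a leaf label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


# Hypothetical descriptor rows (e.g. two VSA-type descriptors) and hERG classes.
rows = [[1.0, 5.0], [2.0, 4.0], [3.0, 6.0], [7.0, 5.0], [8.0, 4.0], [9.0, 6.0]]
labels = ["high", "high", "high", "weak", "weak", "weak"]
tree = grow(rows, labels)
print(tree)  # (0, 3.0, 'high', 'weak')
print(predict(tree, [2.5, 9.0]))  # high
```

The resulting tree is a set of readable descriptor thresholds, which is one reason tree models suit the "interpretable models" that designers ask for.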
Model 3: HIGH (1 µM) vs. WEAK (10 µM) hERG activity classes
Training set: 50 high / 50 weak molecules; test set: 46 high / 9 weak molecules

          Training (5-fold CV)        Test (external)
          Relevant     P_VSA          Relevant     P_VSA
High      48/50        47/50          45/46        42/46
Weak      49/50        49/50          8/9          9/9
All       97%          96%            96%          93%

Correct classifications determined by 5-fold cross-validation (training) and by external validation (test) for each descriptor set ("Relevant" and "P_VSA").
[Figure: recursive partitioning trees for the two descriptor sets; split descriptors include SlogP, SlogP_VSA2, SlogP_VSA7, SMR_VSA1, SMR_VSA4, SMR_VSA5, SMR_VSA6, PEOE_VSA+0, PEOE_VSA+1, PEOE_VSA_FHYD and vsa_pol]
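The "All" row can be recomputed from the per-class counts. The sketch also reports balanced accuracy (the mean of the per-class accuracies), an extra metric that is not on the slide but is useful here because the test set is imbalanced (46 high vs. 9 weak).

```python
# Correct classifications per class from the Model 3 table: (correct, total).
results = {
    ("Training", "Relevant"): {"High": (48, 50), "Weak": (49, 50)},
    ("Training", "P_VSA"):    {"High": (47, 50), "Weak": (49, 50)},
    ("Test", "Relevant"):     {"High": (45, 46), "Weak": (8, 9)},
    ("Test", "P_VSA"):        {"High": (42, 46), "Weak": (9, 9)},
}


def overall_accuracy(per_class):
    """Pooled fraction of correctly classified molecules."""
    correct = sum(c for c, _ in per_class.values())
    total = sum(n for _, n in per_class.values())
    return correct / total


def balanced_accuracy(per_class):
    """Mean of per-class accuracies, insensitive to class imbalance."""
    return sum(c / n for c, n in per_class.values()) / len(per_class)


for key, per_class in results.items():
    print(key, f"{overall_accuracy(per_class):.0%}", f"{balanced_accuracy(per_class):.0%}")
```

Overall accuracy reproduces 97%, 96%, 96% and 93%; balanced accuracy on the test set is 93% for the Relevant set and 96% for P_VSA, so the ranking of the two descriptor sets depends on the metric chosen.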
Conclusion
• Reaxys Medicinal Chemistry made it possible to retrieve a high-quality data set with chemical diversity and homogeneous biological activities.
• Relevant predictive models of hERG activity have been designed using recursive partitioning with 2D molecular descriptors.
• Based on Reaxys Medicinal Chemistry, a fast virtual screening approach could be used as an early tool in the drug discovery process to avoid cardiotoxic side effects related to hERG blockade.
• Even if models perform well, designers need interpretable models.
Mining Reaxys Medicinal Chemistry for Metabolic Stability
How to Search Metabolism Data for a Certain Phenotype
• Bioassay category → parameter
• Broad search (all parameters) or precise search (by parameter)
Pyrrolidine versus azetidine metabolic stability
How to Access Metabolism Details
Details available: tissue/organ, cell, fraction, enzyme, substrate
Metabolic Stability Export
Showcase 1: Pyrrolidine Metabolic Stability
Pyrrolidines are known to be metabolically unstable. Are there pyrrolidines with an intrinsic clearance in microsomes below 20 µL/min/mg protein? What do the complete structures of these compounds look like?
An overall search on the intrinsic clearance (mL/min/g or µL/min/mg protein) of pyrrolidines in Reaxys Medicinal Chemistry returns 1031 substances and 1777 clearance results, extracted from 138 citations.
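The two clearance units in this search are numerically equivalent (1 mL/min/g = 1000 µL per min per 1000 mg = 1 µL/min/mg), so results can be pooled before applying the 20 µL/min/mg cut-off. A minimal sketch, assuming exported rows of (substance ID, value, unit); the unit spellings, IDs and values are hypothetical.

```python
# Multipliers that express each clearance unit in µL/min/mg protein.
# 1 mL/min/g = 1000 µL / (1 min x 1000 mg) = 1 µL/min/mg, so both factors are 1.
TO_UL_PER_MIN_PER_MG = {"uL/min/mg": 1.0, "mL/min/g": 1.0}


def stable_results(results, cutoff=20.0):
    """Keep intrinsic clearance results below cutoff (µL/min/mg protein).

    results: iterable of (substance_id, value, unit) tuples.
    """
    kept = []
    for sid, value, unit in results:
        clint = value * TO_UL_PER_MIN_PER_MG[unit]
        if clint < cutoff:
            kept.append((sid, clint))
    return kept


# Hypothetical export rows for illustration.
rows = [("pyr-1", 12.0, "uL/min/mg"), ("pyr-2", 35.0, "mL/min/g"),
        ("pyr-3", 8.0, "mL/min/g"), ("pyr-1", 25.0, "uL/min/mg")]
kept = stable_results(rows)
print(kept)                               # [('pyr-1', 12.0), ('pyr-3', 8.0)]
print(sorted({sid for sid, _ in kept}))   # ['pyr-1', 'pyr-3']
```

Filtering per result rather than per substance keeps pyr-1 even though one of its two measurements is above the cut-off; requiring all results to pass would be an equally reasonable, stricter choice.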
More Stable Pyrrolidine Compounds
• The 10 pyrrolidine compounds with the lowest intrinsic clearance are displayed
Showcase 2: Azetidine Metabolic Stability
• Different cyclic amines might be tolerated. What is known about the metabolic stability of azetidines? Are they more stable than pyrrolidines? What modifications of the azetidine are known?
Among the stable azetidines (CLint < 20 µL/min/mg protein), the following scaffolds were found.
What modifications of the azetidine are known?
Are Azetidines More Stable than Pyrrolidines?
Based on the graph of azetidine and pyrrolidine clearance results found in RMC, azetidines appear in general to be more stable than pyrrolidines.
• 40% of the azetidine clearance results are below 20 µL/min/mg protein, but only 30% of the pyrrolidine results.
• The difference is even clearer in the 20 to 100 µL/min/mg protein range, where 42% of the azetidine results fall but only 20% of the pyrrolidine results. Together, about 82% of azetidine results vs. 50% of pyrrolidine results lie below 100 µL/min/mg protein.
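The comparison above bins clearance results into ranges and reports each range's share per ring system. A minimal sketch, assuming CLint values already expressed in µL/min/mg protein; how the slide treats values exactly at 20 or 100 is not stated, so the bin boundaries here are an assumption, and the example values are hypothetical.

```python
from bisect import bisect_right

# Bin edges in µL/min/mg protein: <20, 20 to 100, and >=100.
EDGES = [20.0, 100.0]
LABELS = ["<20", "20-100", ">=100"]


def bin_shares(clearances):
    """Fraction of clearance results in each range (expects a non-empty list)."""
    counts = [0] * len(LABELS)
    for value in clearances:
        # bisect_right puts a value equal to an edge into the higher bin
        counts[bisect_right(EDGES, value)] += 1
    return {label: n / len(clearances) for label, n in zip(LABELS, counts)}


def compare(series):
    """series: dict ring system -> list of CLint values; prints shares per bin."""
    for name, values in series.items():
        shares = bin_shares(values)
        print(name, {label: f"{share:.0%}" for label, share in shares.items()})


# Hypothetical values for illustration only.
compare({"azetidine": [5, 15, 25, 50, 150], "pyrrolidine": [10, 40, 120, 200, 300]})
```

Comparing shares rather than raw counts matters here because the two ring systems have very different numbers of results in the database.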
Lead Optimization: Exploration of Structural Features of a Lead Series of Compounds
Exploration of Structural Features of a Lead Series of Compounds
• Within NK3, the 3,4-dichlorophenyl group appears to be an important structural feature
• Are there more target classes in which the 3,4-diCl-Phe group plays an important role?
• Does the 3,4-diCl-Phe group cause a certain activity profile?
• Are other 3,4-diX-Phe structures known, and what is their pharmacological profile?
• Are there other di-substitution patterns (other than 3,4) with a strong pharmacological response?
Does the 3,4-diCl-Phe group cause a certain activity profile?
• Substructure search for the 3,4-dichlorophenyl fragment
• 30% of the substances containing the 3,4-dichlorophenyl fragment have a bioactivity below 0.1 µM
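The 30% figure can be read as the share of substructure hits whose best reported activity is below 0.1 µM. A sketch under that reading, assuming the substructure search has already been run and its bioactivities exported in µM; the data are hypothetical.

```python
def fraction_potent(activities, cutoff_um=0.1):
    """Share of substances whose best (lowest) activity is below cutoff_um.

    activities: iterable of (substance_id, activity_in_uM) pairs for substructure hits.
    """
    best = {}
    for sid, value in activities:
        # keep the most potent (lowest) value seen for each substance
        best[sid] = min(value, best.get(sid, float("inf")))
    return sum(v < cutoff_um for v in best.values()) / len(best)


# Hypothetical substructure-search export: three substances, five activities.
hits = [("s1", 0.05), ("s1", 2.0), ("s2", 0.5), ("s3", 1.2), ("s3", 0.2)]
print(f"{fraction_potent(hits):.0%}")  # 33%
```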
Are there more target classes in which the 3,4-diCl-Phe group plays an important role?
• Select bioactivities below 0.1 µM
• Target profile of 3,4-dichlorophenyl: targets are ranked by count of bioactivities; yellow bars indicate the count of bioactivities below 0.1 µM
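A target profile like this one counts bioactivities per target and, separately, those below 0.1 µM. A minimal sketch, assuming exported (target, activity in µM) pairs; the min_count parameter mirrors the "at least 100 bioactivities" filter used in the next target profile, and the example data are hypothetical.

```python
from collections import Counter


def target_profile(bioactivities, cutoff_um=0.1, min_count=1):
    """Rank targets by number of bioactivities; also count those below cutoff_um.

    bioactivities: iterable of (target_name, activity_in_uM) pairs.
    Returns (target, total_count, potent_count) tuples, most-tested target first.
    """
    total, potent = Counter(), Counter()
    for target, value in bioactivities:
        total[target] += 1
        if value < cutoff_um:
            potent[target] += 1
    return [(t, n, potent[t]) for t, n in total.most_common() if n >= min_count]


# Hypothetical data for illustration only.
data = [("NK3", 0.01), ("NK3", 0.5), ("H1", 0.05), ("NK3", 0.02),
        ("SERT", 3.0), ("H1", 1.0)]
print(target_profile(data))  # [('NK3', 3, 2), ('H1', 2, 1), ('SERT', 1, 0)]
```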
Target profile of 3,4-dichlorophenyl with an affinity below 0.1 µM and at least 100 bioactivities
• The 3,4-dichlorophenyl group is also involved in off-targets, mainly CNS-related:
  – Off-targets/CNS adverse effect: addiction/psychostimulant
  – Off-targets/CNS adverse effect: attention/perception
  – Off-targets/CNS adverse effect: learning/memory
• Substances tested on NK3 were not tested on other targets, except one substance tested on histamine H1
Are other 3,4-diX-Phe structures known, and what is their pharmacological profile?
3,4-Difluorophenyl | 3,4-Dibromophenyl | 3,4-Dibromophenyl
Target profile of 3,4-diX-phenyl with an affinity below 0.1 µM
Are there other di-substitution patterns (other than 3,4) with a strong pharmacological response?
2,4-Dichlorophenyl | 2,5-Dichlorophenyl | 2,3-Dichlorophenyl
• Cannabinoid receptors (CB1 and CB2)
• Melanocortin 4
• 5-HT2C
• Amyloid precursor protein (APP)
• Dopamine receptors (D2 and D3)
• p38α
• Cytochrome P450 3A4 (potential drug-drug interactions)
Target profile of X,Y-dichlorophenyl with an affinity below 0.1 µM
Reaxys Medicinal Chemistry Accelerates Drug Discovery through Knowledge-Based Design
• Mine large data sets to find hits (virtual screening)
• Mine large data sets to accelerate understanding and derive useful medicinal chemistry knowledge
• Apply this knowledge to propose and evaluate new, better molecules that fulfil the multi-objective design needs of lead optimisation
• Apply this to develop clinical candidates faster
• Apply this knowledge base to repurpose drugs
