SlideShare a Scribd company logo
MedChemica | 2016
ACS Philadelphia 2016
Extracting and exploiting
medicinal chemistry ADMET
knowledge automatically from
public and large pharma data
MedChemica
ACS August 2016
MedChemica | 2016
ACS Philadelphia 2016
Better compounds designed from Data
•  Improved compounds quicker
•  Applicable ideas
•  Confident design decisions
•  Help when stuck
•  Clearly describable plans
•  Maximizing value from ADMET testing
•  Pursuing dead-end series
•  Pursuing dead-end projects
•  Running out of time or $
Essentials
Gains
Pains
MedChemica | 2016
ACS Philadelphia 2016
Key findings:
•  Secure sharing of large scale ADMET knowledge
between large Pharma is possible
•  The collaboration generated great synergy
•  Many findings are highly significant
•  MMP is a great tool for idea generation 
•  The rules have been used in drug-discovery projects
and generated meaningful results
•  MMPA methodology can be extended to extract
pharmacophores  
MedChemica | 2016
ACS Philadelphia 2016
Grand Rule database
Better medicinal chemistry by sharing knowledge not data & structures
MedChemica | 2016
ACS Philadelphia 2016
Barriers Broken to Sharing Knowledge
Data
Integrity and
curation
Knowledge
extraction
algorithms
Consortium
building to
share
knowledge Into the minds of
chemists
MedChemica | 2016
ACS Philadelphia 2016
Data Integrity and Curation
Structural
•  Extensive standards for
inclusion of mixtures,
chiral compounds, salt
forms
•  Tautomer and charge
state canonicalisation
client side
•  Automated validation of
structures run client side
= “clean” comparable
structures submitted to
pair finding
Measured Data
•  Assay protocols reviewed
prior to merging
•  Precise documentation
on unit definitions and
data reporting standards
•  Option to share standard
compound measured
values
•  Automated extensive
data validation checks
prior to merging data
“client	
  side”	
  =	
  behind	
  Pharma	
  firewall	
  
MedChemica | 2016
ACS Philadelphia 2016
IP security
•  Contributing company structures are NOT shared
•  Contributing company identifiers are NOT shared
•  Each contributing company receives it’s OWN
custom GRD copy to enable drill back to OWN
example data
•  Minimum of n=6 example pairs required for
inclusion in the GRD
•  Enforced limits of shared substructure sizes
•  Option for “time-blocking” sharing of data to
withhold data < 6 months old for IP submission
MedChemica | 2016
ACS Philadelphia 2016
Barriers Broken to Sharing Knowledge
Data
Integrity and
curation
Knowledge
extraction
algorithms
Consortium
building to
share
knowledge Into the minds of
chemists
✓	
  
MedChemica | 2016
ACS Philadelphia 2016
MCPairs Platform
•  Extract rules using Advanced Matched Molecular Pair Analysis
•  Knowledge is captured as transformations
–  divorced from structures => sharable
Measured
Data
rule
finder Exploitable
Knowledge
MC Expert
Enumerator
System
Problem molecule
Solution molecules
Pharmacophores
& toxophores
SMARTS
matching
Alerts	
  
Virtual	
  screening	
  
Library	
  design	
  
Protect	
  the	
  IP	
  jewels	
  
MedChemica | 2016
ACS Philadelphia 2016
•  Matched Molecular Pairs – Molecules
that differ only by a particular, well-
defined structural transformation
•  Transformation with environment capture –
MMPs can be recorded as transformations
from A B
•  Environment is essential to understand
chemistry
Statistical analysis
•  Learn what effect the transformation has had on ADMET properties in
the past
Griffen,	
  E.	
  et	
  al.	
  Matched	
  Molecular	
  Pairs	
  as	
  a	
  Medicinal	
  Chemistry	
  Tool.	
  Journal	
  of	
  Medicinal	
  Chemistry.	
  2011,	
  54(22),	
  pp.7739-­‐7750.	
  	
  
	
  
Advanced MMPA
Δ Data
A-B1
2
2
3
3
3
4
4
4
12
23
3
34
4
4
A 	
   	
   	
   	
  B 	
  	
  
MedChemica | 2016
ACS Philadelphia 2016
Environment really matters
HMe:
•  Median Δlog(Solubility)
•  225 different
environments
2.5log	
  
1.5log	
  
HMe:	
  
•  Median Δlog(Clint)
Human microsomal
clearance
•  278 different
environments
MedChemica | 2016
ACS Philadelphia 2016
More environment = right detail
HMe Solubility:
•  225 different environments
MedChemica | 2016
ACS Philadelphia 2016
HF What effect on Clearance?
•  Median Δlog(Clint) Human microsomal clearance
•  37 different environments
2	
  fold	
  improvement	
   2	
  fold	
  worse	
  
Increase	
  
clearance	
  
decrease	
  
clearance	
  
MedChemica | 2016
ACS Philadelphia 2016
Identify and group matching SMIRKS
Calculate statistical parameters for each unique
SMIRKS (n, median, sd, se, n_up/n_down)
Is n ≥ 6?
Not enough data:
ignore transformation
Is the |median| ≤ 0.05 and the
intercentile range (10-90%) ≤ 0.3?
Perform two-tailed binomial test on the
transformation to determine the
significance of the up/ down frequency
transformation is
classified as ‘neutral’
Transformation classified as
‘NED’ (No Effect Determined)
Transformation classified as
‘increase’ or ‘decrease’
depending on which direction the
property is changing
pass	fail	
yes	no	
yes	no	
Rule selection
0 +ve-ve
Median data difference
Neutral	
   Increase	
  Decrease	
  
NED	
  
MedChemica | 2016
ACS Philadelphia 2016
Barriers Broken to Sharing Knowledge
Data
Integrity and
curation
Knowledge
extraction
algorithms
Consortium
building to
share
knowledge Into the minds of
chemists
✓	
  
✓	
  
MedChemica | 2016
ACS Philadelphia 2016
Merging knowledge
•  Use the transforms that
are robust in both
companies to calibrate
assays.
•  Once the assays are
calibrated against each
other the transform
data can be combined
to build support in
poorly exemplified
transforms
•  Methodology
precedented in other
fields
CalibrateRobust
Robust
Weak
Weak
Discover
Novel
Pharma	
  1	
  
Pharma	
  2	
  
MedChemica | 2016
ACS Philadelphia 2016
Merging Datasets
•  Datasets are standardized by comparison of
transformations shared by contributing companies
•  Transformations are examined at the “pair example”
level
•  Minimum of 6 transformations, each with a minimum of 6
pairs (42 compounds bare minimum) required to
standardise
•  “calibration factors” extracted to standardize the
datasets to a common value – mean of calibration
factors 0.94, typical range 0.8-1.2.
•  Datasets with too few common transformations have
standard compound measurements shared for
calibration.
“Blinded”	
  source	
  of	
  
transformaLons	
  
Current Knowledge sets – GRDv3
Numbers of statistically valid transforms
Grouped Datasets Number of Rules
logD7.4 153449
Merged solubility 46655
In vitro microsomal clearance:
Human, rat ,mouse, cyno, dog
88423
In vitro hepatocyte clearance :
Human, rat ,mouse, cyno, dog 26627
MCDK permeability A-B / B – A efflux 1852
Cytochrome P450 inhibition:
2C9, 2D6 , 3A4 , 2C19 , 1A2 40605
Cardiac ion channels
NaV 1.5 , hERG ion channel inhibition
15636
Glutathione Stability 116
plasma protein or albumin binding
Human, rat ,mouse, cyno, dog
64622
MedChemica | 2016
ACS Philadelphia 2016
Pharma 1 100k rules
Pharma 2 92k rules
Pharma 3 37k rules
5.8k rules in common (pre-merge) ~ 2%
New Rules 88k
~26% of total
Merge	
  
Combining	
  data	
  yields	
  brand	
  new	
  rules	
  
Gains:	
  	
  300	
  -­‐	
  900%	
  
Merging knowledge – GRDv1
MedChemica | 2016
ACS Philadelphia 2016
Key findings:
•  Secure sharing of large scale ADMET knowledge
between large Pharma is possible
•  The collaboration generated great synergy
•  Many findings are highly significant.
•  MMP is a great tool for idea generation. 
•  The rules have been used in drug-discovery projects
and generated meaningful results
•  MMPA methodology can be extended to extract
pharmacophores  
MedChemica | 2016
ACS Philadelphia 2016
Barriers Broken to Sharing Knowledge
Data
Integrity and
curation
Knowledge
extraction
algorithms
Consortium
building to
share
knowledge Into the minds of
chemists
✓	
  
✓	
  
✓	
  
MedChemica | 2016
ACS Philadelphia 2016
Integrating knowledge exploitation
..many possible connections
..high Stability and flexibility**
Rule
Database
MCExpert
API server
** api.medchemica.com has
delivered 18 months of uptime and
drives SaltTraX (Elixir)
Approx 50 chemists use the tool
RESTful
API
Chemistry	
  Shape	
  
and	
  electrostaHcs	
  
MedChemica | 2016
ACS Philadelphia 2016
Results from Hao Zheng…..
MedChemica | 2016
ACS Philadelphia 2016
MedChemica | 2016
ACS Philadelphia 2016
- Fix hERG problem whilst maintaining potency
Waring et al, Med. Chem. Commun., (2011), 2, 775
Glucokinase Activators
MMPA
∆pEC50: -0.1 ∆logD: -0.6 ∆hERG pIC50 :-0.5
n=33 n=32 n=22
MMPA
∆pEC50: +0.3 ∆logD: +0.3 ∆hERG pIC50 :-0.3
n=20 n=23 n=19
MMPA
∆pEC50: -0.1 ∆logD: -0.6 ∆hERG pIC50 :-0.5
n=27 n=27 n=7
MedChemica | 2016
ACS Philadelphia 2016
A Less Simple Example
Increase logD and gain solubility
Property	 Number	of	
Observa2ons	
Direc2on	 Mean	Change	 Probability	
logD	 8	 Increase	 1.2	 100%	
Log(Solubility)	 14	 Increase	 1.4	 92%	
What	is	the	effect	on	lipophilicity	and	
solubility?	
Roche	data	is	inconclusive!	(2	pairs	
for	logD,	1	pair	for	solubility)	
logD	=	2.65	
KineMc	solubility	=	84	µg/ml	
IC50	SST5	=	0.8	µM	
logD	=	3.63	
KineMc	solubility	=	>452	µg/ml	
IC50	SST5	=	0.19	µM	
Ques2on:	
Available	
Sta2s2cs:	
Roche	
Example:	
MedChemica | 2016
ACS Philadelphia 2016
Solving a tBu metabolism issue
Benchmark	
compound	
Predicted	to	offer	most	improvement	in	microsomal	stability	(in	at	least	1	species	/	assay)	
											R2	
	
R1	
tBu	 Me	 Et	 iPr	
99	
392	
16	
64	
78	
410	
53	
550	
99	
288	
78	
515	
41	
35	
98	
327	
92	
372	
24	
247	
35	
128	
24	
62	
60	
395	
39	
445	
3	
21	
20	
27	
57	
89	
54	
89	
•  Data shown are Clint for HLM and MLM (top and bottom, respectively)
R1	 R2	R1	tBu	
Roger Butlin
Rebecca Newton
Allan Jordan
MedChemica | 2016
ACS Philadelphia 2016
- Fix hERG problem whilst maintaining potency
Waring et al, Med. Chem. Commun., (2011), 2, 775
Glucokinase Activators
MMPA
∆pEC50: -0.1 ∆logD: -0.6 ∆hERG pIC50 :-0.5
n=33 n=32 n=22
MMPA
∆pEC50: +0.3 ∆logD: +0.3 ∆hERG pIC50 :-0.3
n=20 n=23 n=19
MMPA
∆pEC50: -0.1 ∆logD: -0.6 ∆hERG pIC50 :-0.5
n=27 n=27 n=7
MedChemica | 2016
ACS Philadelphia 2016
A Less Simple Example
Increase logD and gain solubility
Property	
   Number	
  of	
  
Observa@ons	
  
Direc@on	
   Mean	
  Change	
   Probability	
  
logD	
   8	
   Increase	
   1.2	
   100%	
  
Log(Solubility)	
   14	
   Increase	
   1.4	
   92%	
  
What	
  is	
  the	
  effect	
  on	
  lipophilicity	
  and	
  
solubility?	
  
Roche	
  data	
  is	
  inconclusive!	
  (2	
  pairs	
  
for	
  logD,	
  1	
  pair	
  for	
  solubility)	
  
logD	
  =	
  2.65	
  
KineLc	
  solubility	
  =	
  84	
  µg/ml	
  
IC50	
  SST5	
  =	
  0.8	
  µM	
  
logD	
  =	
  3.63	
  
KineLc	
  solubility	
  =	
  >452	
  µg/ml	
  
IC50	
  SST5	
  =	
  0.19	
  µM	
  
Ques@on:	
  
Available	
  
Sta@s@cs:	
  
Roche	
  
Example:	
  
MedChemica | 2016
ACS Philadelphia 2016
The application helped lead
optimization in project
27
•  193
compounds
•  Enumerated
Objective: improve metabolic stability
MMP
Enumeration
Calculated Property
Docking
8 compounds
synthesized
MedChemica | 2016
ACS Philadelphia 2016
Solving a tBu metabolism issue
Benchmark	
  
compound	
  
Predicted	
  to	
  offer	
  most	
  improvement	
  in	
  microsomal	
  stability	
  (in	
  at	
  least	
  1	
  species	
  /	
  assay)	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  R2	
  
	
  
R1	
  
tBu	
   Me	
   Et	
   iPr	
  
99	
  
392	
  
16	
  
64	
  
78	
  
410	
  
53	
  
550	
  
99	
  
288	
  
78	
  
515	
  
41	
  
35	
  
98	
  
327	
  
92	
  
372	
  
24	
  
247	
  
35	
  
128	
  
24	
  
62	
  
60	
  
395	
  
39	
  
445	
  
3	
  
21	
  
20	
  
27	
  
57	
  
89	
  
54	
  
89	
  
•  Data shown are Clint for HLM and MLM (top and bottom, respectively)
R1	
   R2	
  R1	
  tBu	
  
Roger Butlin
Rebecca Newton
Allan Jordan
MedChemica | 2016
ACS Philadelphia 2016
Early successes
From GRDv1 May 2014
29
J.	
  Med.	
  Chem.,	
  2015,	
  58	
  (23),	
  pp	
  9309–9333	
  
DOI:	
  10.1021/acs.jmedchem.5b01312	
  
MedChemica | 2016
ACS Philadelphia 2016
Comparison of Merck in-house MMPA with SALTMinerTM
Structure:
ADMET Issue: hERG
Lead A2A receptor antagonist
compound in Merck Parkinson's
project
138 suggestion molecules with
predicted improvement in hERG
binding
How many match the results of
Merck?
•  Also shows potent binding
to the hERG ion channel
•  Deng et al performed in-
house MMPA on hERG
binding compound data
and have published 18
resulting fluorobenzene
transformations, which they
have synthesized and
tested for hERG activity
Deng	
  et	
  al,	
  	
  
Bioorg.	
  &	
  Med	
  Chem	
  Let	
  (2015),	
  
doi:	
  hhp://dx.doi.org/10.1016/
j.bmcl.2015.05.036	
  	
  
MedChemica | 2016
ACS Philadelphia 2016
R	
  group:	
  
Measured	
  hERG	
  
pIC50	
  change	
  
-­‐1.187	
   -­‐1.149	
   -­‐1.038	
   -­‐1.215	
   -­‐1.157	
   -­‐0.149	
   -­‐1.487	
   -­‐1.133	
  
GRD	
  median	
  
historic	
  pIC50	
  
change	
  
0	
   -­‐0.171	
   -­‐0.1	
   -­‐0.283	
   -­‐0.219	
   -­‐0.318	
   -­‐0.159	
   -­‐0.103	
  
Results:
8 out of the 18 fluorobenzene transformations produced by Merck were also suggested
by MCExpert to decrease hERG binding:
Searching the GRD for transformations that increase hERG there were none that
matched the remaining 10 of 18 transformations in the paper.
MCExpert also suggested an additional 50 fluorobenzene replacements to decrease
hERG binding NOT mentioned in the publication.
MedChemica | 2016
ACS Philadelphia 2016
Key findings:
•  Secure sharing of large scale ADMET knowledge
between large Pharma is possible
•  The collaboration generated great synergy
•  Many findings are highly significant.
•  MMP is a great tool for idea generation. 
•  The rules have been used in drug-discovery projects
and generated meaningful results
•  MMPA methodology can be extended to extract
pharmacophores  
MedChemica | 2016
ACS Philadelphia 2016
Barriers Broken to Sharing Knowledge
Data
Integrity and
curation
Knowledge
extraction
algorithms
Consortium
building to
share
knowledge Into the minds of
chemists
✓	
  
✓	
  
✓	
   ✓	
  
MedChemica | 2016
ACS Philadelphia 2016
Pharmacophores and Toxophores
by extended analysis from the MMPA
PharmacophoresBigData Stats
Matched
Pairs
Finding
Public and in-
house potency
data
Mining transform sets to find influential fragments
Identify the ‘Z’ fragments associated with a
significant number of potency increasing changes –
irrespective of what they are replaced with
‘Z’ is ‘worse than anything you replace it with’
Fragment A Fragment B	
  
Change in binding
measurement
Public
Data
Find
Matched
Pairs
Find Potent
Fragments
+2.7	
  
+3.2	
  
+0.6	
  
+0.6	
  
Identify the ‘A’ fragments associated with a
significant number of potency decreasing changes
– irrespective of what they are replaced with
‘A’ is ‘better than anything you replace it with’
A	
  
+2.1	
  +2.2	
  
+1.4	
  
+0.4	
  
+1.8	
  
Z	
  
pKi/
pIC50
Compounds with
destructive fragment
Compounds with
constructive fragments
Generate	
  Pharmacophore	
  dyads	
  by	
  
permutaLng	
  all	
  the	
  fragments	
  with	
  
the	
  shortest	
  path	
  between	
  them	
  
MedChemica | 2016
ACS Philadelphia 2016
Toxophores - Detailed, specific & transparent
Dopamine D2 receptor human
Actual: 9.5
Predicted: 9.1
Mean with: 8.0
Mean without: 6.6
Odds Ratio: 340
Dopamine Transporter
Actual: 9.1
Predicted: 8.6
Mean with: 8.3
Mean without: 6.7
Odds Ratio: 407
GABA-A
Actual: 9.0
Predicted: 8.7
Mean with: 8.0
Mean without: 6.8
Odds Ratio: 1506
β1 adrenergic receptor
Actual: 7.8
Predicted: 7.7
Mean with: 6.5
Mean without: 5.7
Odds Ratio: 1501
Find Potent
Fragments
Matched
Pairs
Finding
Find
Pharmacophore
Dyads
Public and in-
house potency
data
MedChemica | 2016
ACS Philadelphia 2016
Prediction of unseen new molecules
The acid test…
•  Vascular endothelial growth factor receptor 2 tyrosine kinase (KDR)
•  Inhibitors have oncology and ophthalmic indications
•  Large dataset in CHEMBL
•  10 fold cross validated PLS model
•  Selected model by minimised RMSEP
37
Compounds 4466
Matched Pairs 288100
Fragments 678
Pharmacophore dyads 787
RMSEP 0.8
R2 0.64
Y-scrambled R2 0.0
ROC 0.95
Geomean odds ratio 80
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
● ●
●
●
●
●
●
●
●
●●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
● ●●
●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●●
●
●
●
●●●●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
4
6
8
10
5 7 9
pIC50_pred
pIC50
MedChemica | 2016
ACS Philadelphia 2016
Novartis Predictions From Our Model
Domain of Applicability….
Actual: 8.4[1]
Predicted: 7.5
38
Actual: 7.6[1]
Predicted: 7.5
1. J MedChem(2016), Bold et al.
2. MedChem Lett (2016), Mainolfi et al.
Actual: 7.7[2]
Predicted: 7.1
Actual: 9.0[2]
Predicted: Out of Domain
MedChemica | 2016
ACS Philadelphia 2016
Target
Number	
  of	
  
compounds	
  
Number	
  of	
  
compound	
  
pairs	
  
Number	
  of	
  
Fragments	
  
Number	
  of	
  
Pharmacophore	
  
dyads	
  aoer	
  
filtering	
  
R2	
   RMSEP	
   ROC	
  
odds_raLo	
  
(geomean)	
  
Acetylcholine esterase - human 383	
   27755	
   44	
   10	
   0.43	
   1.57	
   0.80	
   4	
  
β 1 adrenergic receptor 505	
   145447	
   276	
   313	
   0.64	
   0.70	
   0.96	
   833	
  
Androgen receptor 1064	
   113163	
   186	
   46	
   0.47	
   0.77	
   0.86	
   140	
  
CB1 canabinnoid receptor 1104	
   88091	
   165	
   90	
   0.61	
   1.02	
   0.87	
   96	
  
CB2 canabinnoid receptor 1112	
   82130	
   194	
   158	
   0.19	
   0.85	
   0.64	
   5.7	
  
Dopamine D2 receptor - human 3873	
   230962	
   483	
   602	
   0.42	
   0.88	
   0.69	
   110	
  
Dopamine D2 receptor - rat 1807	
   118736	
   267	
   377	
   0.29	
   0.85	
   0.78	
   125	
  
Dopamine Transporter 1470	
   106969	
   282	
   336	
   0.58	
   0.73	
   0.88	
   141	
  
GABA A receptor 848	
   39494	
   106	
   167	
   0.70	
   0.76	
   0.97	
   560	
  
hERG ion channel 4189	
   242261	
   392	
   76	
   0.61	
   0.96	
   0.92	
   55	
  
5HT2a receptor 642	
   50870	
   197	
   267	
   0.61	
   0.59	
   0.83	
   600	
  
Monoamine oxidase 264	
   15439	
   44	
   11	
   0.12	
   1.25	
   0.48	
   181	
  
Muscarinic acetylcholine receptor
M1
628	
   48200	
   97	
   510	
   0.62	
   0.94	
   0.89	
   48	
  
µ opioid receptor 1128	
   37184	
   33	
   11	
   0.69	
   1.30	
   0.87	
   81	
  
Critical safety target analysis
•  Build models using 10-fold cross validated PLS
•  Assess using ROC / BEDROC, R2 vs 100 fold y-scrambled R2 and geomean odds ratio
39
Public
Data
Find
Matche
d Pairs
Pharmacophores
Find
Pharmacophore
dyads
Find Potent
Fragments
MedChemica | 2016
ACS Philadelphia 2016
MCBiophore GUI screenshot
Assay Image Mean_with Mean_without PLS_coeff Path SMARTS1 SMARTS2 n_examples odds_ratio
2
VEGFR 8.3 6.4 0.71 [c]c[c][c] Cc1ccc[c]c1[n] [c]/C(=N/O)/C 18 259.8
3
VEGFR 8.2 6.4 0.17 [c] [c]c1cc(cc[c]1)/C(=N/OC)/CCc1ccc[c]c1[n] 17 257.6
0
VEGFR 8.1 6.4 -0.01 [CH3] [cH]c([cH])/C(=N/OC)/C[c]/C(=N/OCC)/C 8 20.2
1
VEGFR 8.1 6.4 -0.01 [CH3] [c]c1cc(cc[c]1)/C(=N/OC)/C[c]/C(=N/OCC)/C 8 151.2
Detailed
results in
excel
MedChemica | 2016
ACS Philadelphia 2016
Matched Molecular Pair
data A data B
data C data D
data E data F
Chemical Transformations
Δ data A B
Δ data C D
Δ data E F
Chemical Transformations
Δ data A B
Δ data C D
Δ data E F
Δ data G H
Δ data I J
Δ data K L
Matched Molecular Pair Analysis (MMPA) enables SAR sharing
Without sharing underlying structures and data
Grand
Rule
Database
Enumeration
Rate-My-Idea
GRD-Browser
ChEMBL Tox database Toxophores
MC-
Biophore
MedChemica | 2016
ACS Philadelphia 2016
Take home messages
•  Systematic “Big Data” analysis of ADMET data yields a system
that SUGGESTS solution molecules for medicinal chemists
•  Better decision making / Faster project progression
•  Knowledge Sharing IS possible between pharma companies
–  This yields HIGH VALUE databases and VAST new knowledge
•  Pharmacophores and toxophores can be extracted from MMPA
MedChemica | 2016
ACS Philadelphia 2016
Collaborators and Users - experience
MedChemica | 2016
ACS Philadelphia 2016
A Collaboration of the willing
Craig Bruce OE
John Cumming Roche
David Cosgrove C4XD
Andy Grant★
Martin Harrison Elixir
Huw Jones Base360
Al Rabow Consulting
David Riley AZ
Graeme Robb AZ
Attilla Ting AZ
Howard Tucker retired
Dan Warner Myjar
Steve St-Galley Syngenta
David Wood JDR
Lauren Reid MedChemica
Shane Monague MedChemica
Jessica Stacey MedChemica
Andy Barker Consulting
Pat Barton AZ
Andy Davis AZ
Andrew Griffin Elixir
Phil Jewsbury AZ
Mike Snowden AZ
Peter Sjo AZ
Martin Packer AZ
Manos Perros Entasis Therapeutics
Nick Tomkinson AZ
Martin Stahl Roche
Jerome Hert Roche
Martin Blapp Roche
Torsten Schindler Roche
Paula Petrone Roche
Christian Kramer Roche
Jeff Blaney Genentech
Hao Zheng Genentech
Slaton Lipscomb Genentech
Alberto Gobbi Genentech

More Related Content

What's hot

Extracting medicinal chemistry knowledge by a secured Matched Molecular Pair ...
Extracting medicinal chemistry knowledge by a secured Matched Molecular Pair ...Extracting medicinal chemistry knowledge by a secured Matched Molecular Pair ...
Extracting medicinal chemistry knowledge by a secured Matched Molecular Pair ...
Ed Griffen
 
Emerging Challenges for Artificial Intelligence in Medicinal Chemistry
Emerging Challenges for Artificial Intelligence in Medicinal ChemistryEmerging Challenges for Artificial Intelligence in Medicinal Chemistry
Emerging Challenges for Artificial Intelligence in Medicinal Chemistry
Ed Griffen
 
MedChemica Levinthal Lecture at Openeye CUP XX 2020
MedChemica Levinthal Lecture at Openeye CUP XX 2020MedChemica Levinthal Lecture at Openeye CUP XX 2020
MedChemica Levinthal Lecture at Openeye CUP XX 2020
Ed Griffen
 
DataPharmaNovember2016
DataPharmaNovember2016DataPharmaNovember2016
DataPharmaNovember2016Pfizer
 
How predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinarHow predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinar
Ann-Marie Roche
 
SMi Group's AI in Drug Discovery 2020 conference
SMi Group's AI in Drug Discovery 2020 conferenceSMi Group's AI in Drug Discovery 2020 conference
SMi Group's AI in Drug Discovery 2020 conference
Dale Butler
 
Reaxys rmc unified platform_ webinar_
Reaxys rmc unified platform_ webinar_Reaxys rmc unified platform_ webinar_
Reaxys rmc unified platform_ webinar_
Ann-Marie Roche
 
Big Data in Pharma - Overview and Use Cases
Big Data in Pharma - Overview and Use CasesBig Data in Pharma - Overview and Use Cases
Big Data in Pharma - Overview and Use Cases
Josef Scheiber
 
MDC Connects: Make the Molecules that Matter
MDC Connects: Make the Molecules that MatterMDC Connects: Make the Molecules that Matter
MDC Connects: Make the Molecules that Matter
Medicines Discovery Catapult
 
Webinar: New RMC - Your lead_optimization Solution June082017
Webinar: New RMC - Your lead_optimization Solution June082017Webinar: New RMC - Your lead_optimization Solution June082017
Webinar: New RMC - Your lead_optimization Solution June082017
Ann-Marie Roche
 
MDC Connects: Synthetic chemistry aspects of drug discovery and early develop...
MDC Connects: Synthetic chemistry aspects of drug discovery and early develop...MDC Connects: Synthetic chemistry aspects of drug discovery and early develop...
MDC Connects: Synthetic chemistry aspects of drug discovery and early develop...
Medicines Discovery Catapult
 
AI is the Future of Drug Discovery
AI is the Future of Drug DiscoveryAI is the Future of Drug Discovery
AI is the Future of Drug Discovery
David Leahy
 
Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Ankur Khanna
 
Data mining (DM) in the pharmaceutical industry
Data mining (DM) in the pharmaceutical industryData mining (DM) in the pharmaceutical industry
Data mining (DM) in the pharmaceutical industry
lurdhu agnes
 
Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)
Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)
Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)
Hellmuth Broda
 
Knovel lss webinar
Knovel lss webinarKnovel lss webinar
Knovel lss webinar
Ann-Marie Roche
 
MedChemica Large scale analysis and sharing of Medicinal chemistry Knowledge ...
MedChemica Large scale analysis and sharing of Medicinal chemistry Knowledge ...MedChemica Large scale analysis and sharing of Medicinal chemistry Knowledge ...
MedChemica Large scale analysis and sharing of Medicinal chemistry Knowledge ...
Ed Griffen
 
Pistoia alliance debates analytics 15-09-2015 16.00
Pistoia alliance debates   analytics 15-09-2015 16.00Pistoia alliance debates   analytics 15-09-2015 16.00
Pistoia alliance debates analytics 15-09-2015 16.00
Pistoia Alliance
 
SMi Group's 14th annual Drug Design 2015 conference
SMi Group's 14th annual Drug Design 2015 conferenceSMi Group's 14th annual Drug Design 2015 conference
SMi Group's 14th annual Drug Design 2015 conference
Dale Butler
 

What's hot (20)

Extracting medicinal chemistry knowledge by a secured Matched Molecular Pair ...
Extracting medicinal chemistry knowledge by a secured Matched Molecular Pair ...Extracting medicinal chemistry knowledge by a secured Matched Molecular Pair ...
Extracting medicinal chemistry knowledge by a secured Matched Molecular Pair ...
 
Emerging Challenges for Artificial Intelligence in Medicinal Chemistry
Emerging Challenges for Artificial Intelligence in Medicinal ChemistryEmerging Challenges for Artificial Intelligence in Medicinal Chemistry
Emerging Challenges for Artificial Intelligence in Medicinal Chemistry
 
MedChemica Levinthal Lecture at Openeye CUP XX 2020
MedChemica Levinthal Lecture at Openeye CUP XX 2020MedChemica Levinthal Lecture at Openeye CUP XX 2020
MedChemica Levinthal Lecture at Openeye CUP XX 2020
 
DataPharmaNovember2016
DataPharmaNovember2016DataPharmaNovember2016
DataPharmaNovember2016
 
How predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinarHow predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinar
 
SMi Group's AI in Drug Discovery 2020 conference
SMi Group's AI in Drug Discovery 2020 conferenceSMi Group's AI in Drug Discovery 2020 conference
SMi Group's AI in Drug Discovery 2020 conference
 
Reaxys rmc unified platform_ webinar_
Reaxys rmc unified platform_ webinar_Reaxys rmc unified platform_ webinar_
Reaxys rmc unified platform_ webinar_
 
Big Data in Pharma - Overview and Use Cases
Big Data in Pharma - Overview and Use CasesBig Data in Pharma - Overview and Use Cases
Big Data in Pharma - Overview and Use Cases
 
MDC Connects: Make the Molecules that Matter
MDC Connects: Make the Molecules that MatterMDC Connects: Make the Molecules that Matter
MDC Connects: Make the Molecules that Matter
 
Webinar: New RMC - Your lead_optimization Solution June082017
Webinar: New RMC - Your lead_optimization Solution June082017Webinar: New RMC - Your lead_optimization Solution June082017
Webinar: New RMC - Your lead_optimization Solution June082017
 
MDC Connects: Synthetic chemistry aspects of drug discovery and early develop...
MDC Connects: Synthetic chemistry aspects of drug discovery and early develop...MDC Connects: Synthetic chemistry aspects of drug discovery and early develop...
MDC Connects: Synthetic chemistry aspects of drug discovery and early develop...
 
AI is the Future of Drug Discovery
AI is the Future of Drug DiscoveryAI is the Future of Drug Discovery
AI is the Future of Drug Discovery
 
Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma
 
Data mining (DM) in the pharmaceutical industry
Data mining (DM) in the pharmaceutical industryData mining (DM) in the pharmaceutical industry
Data mining (DM) in the pharmaceutical industry
 
Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)
Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)
Big Data and its Impact on Industry (Example of the Pharmaceutical Industry)
 
Knovel lss webinar
Knovel lss webinarKnovel lss webinar
Knovel lss webinar
 
Machine learning in computational docking
Machine learning in computational dockingMachine learning in computational docking
Machine learning in computational docking
 
MedChemica Large scale analysis and sharing of Medicinal chemistry Knowledge ...
MedChemica Large scale analysis and sharing of Medicinal chemistry Knowledge ...MedChemica Large scale analysis and sharing of Medicinal chemistry Knowledge ...
MedChemica Large scale analysis and sharing of Medicinal chemistry Knowledge ...
 
Pistoia alliance debates analytics 15-09-2015 16.00
Pistoia alliance debates   analytics 15-09-2015 16.00Pistoia alliance debates   analytics 15-09-2015 16.00
Pistoia alliance debates analytics 15-09-2015 16.00
 
SMi Group's 14th annual Drug Design 2015 conference
SMi Group's 14th annual Drug Design 2015 conferenceSMi Group's 14th annual Drug Design 2015 conference
SMi Group's 14th annual Drug Design 2015 conference
 

Viewers also liked

Pharmacophore extraction from Matched Molecular Pair Analysis
Pharmacophore extraction from Matched Molecular Pair AnalysisPharmacophore extraction from Matched Molecular Pair Analysis
Pharmacophore extraction from Matched Molecular Pair Analysis
Ed Griffen
 
Collaborative Medicinal Chemistry Research
Collaborative Medicinal Chemistry ResearchCollaborative Medicinal Chemistry Research
Collaborative Medicinal Chemistry Research
David Andrews
 
Using Matched Molecular Pairs To Cluster Compounds
Using Matched Molecular Pairs To Cluster CompoundsUsing Matched Molecular Pairs To Cluster Compounds
Using Matched Molecular Pairs To Cluster Compounds
Willem van Hoorn
 
5 Signs Your Critical Assets Are Costing You More Than You Think
5 Signs Your Critical Assets Are Costing You More Than You Think5 Signs Your Critical Assets Are Costing You More Than You Think
5 Signs Your Critical Assets Are Costing You More Than You Think
Stephen Dobson
 
Fashion e-commerce design cases
Fashion e-commerce design casesFashion e-commerce design cases
Fashion e-commerce design cases
Louise Roose
 
Modulating PIP2 levels in Cells via PIP5K inhibition
Modulating PIP2 levels in Cells via PIP5K inhibitionModulating PIP2 levels in Cells via PIP5K inhibition
Modulating PIP2 levels in Cells via PIP5K inhibitionDavid Andrews
 
Anu Peltola & Mikael Gummerus, FROSMO, Winning the future of UX, 26.10.2016
Anu Peltola & Mikael Gummerus, FROSMO, Winning the future of UX, 26.10.2016Anu Peltola & Mikael Gummerus, FROSMO, Winning the future of UX, 26.10.2016
Anu Peltola & Mikael Gummerus, FROSMO, Winning the future of UX, 26.10.2016
Frosmo
 
Electio - Technology Metals Investment Brochure
Electio - Technology Metals Investment BrochureElectio - Technology Metals Investment Brochure
Electio - Technology Metals Investment Brochure
Electio Middle East
 
創薬に必要なデータ統合の現在と未来
創薬に必要なデータ統合の現在と未来創薬に必要なデータ統合の現在と未来
創薬に必要なデータ統合の現在と未来
Mitsuteru Nakao
 
Frede space up paris 2013
Frede space up paris 2013Frede space up paris 2013
Frede space up paris 2013
Jędrzej Górski
 
Recruit, Retain, Realize - How Third Party Transactional Data Can Power Your ...
Recruit, Retain, Realize - How Third Party Transactional Data Can Power Your ...Recruit, Retain, Realize - How Third Party Transactional Data Can Power Your ...
Recruit, Retain, Realize - How Third Party Transactional Data Can Power Your ...
Doug Oldfield
 
807 103康八上 my comic book
807 103康八上 my comic book807 103康八上 my comic book
807 103康八上 my comic book
Ally Lin
 
Reclaiming the idea of the University
Reclaiming the idea of the UniversityReclaiming the idea of the University
Reclaiming the idea of the University
Richard Hall
 
Scala play-framework
Scala play-frameworkScala play-framework
Scala play-framework
Abdhesh Kumar
 
6º básico a semana 09 al 13 de mayo (1)
6º básico a semana 09  al 13 de  mayo (1)6º básico a semana 09  al 13 de  mayo (1)
6º básico a semana 09 al 13 de mayo (1)
Colegio Camilo Henríquez
 
Marquette Social Listening presentation
Marquette Social Listening presentationMarquette Social Listening presentation
Marquette Social Listening presentation
7Summits
 
Giveandget.com
Giveandget.comGiveandget.com
Giveandget.com
Hegedűs Zsolt
 

Viewers also liked (20)

Pharmacophore extraction from Matched Molecular Pair Analysis
Pharmacophore extraction from Matched Molecular Pair AnalysisPharmacophore extraction from Matched Molecular Pair Analysis
Pharmacophore extraction from Matched Molecular Pair Analysis
 
Collaborative Medicinal Chemistry Research
Collaborative Medicinal Chemistry ResearchCollaborative Medicinal Chemistry Research
Collaborative Medicinal Chemistry Research
 
Using Matched Molecular Pairs To Cluster Compounds
Using Matched Molecular Pairs To Cluster CompoundsUsing Matched Molecular Pairs To Cluster Compounds
Using Matched Molecular Pairs To Cluster Compounds
 
5 Signs Your Critical Assets Are Costing You More Than You Think
5 Signs Your Critical Assets Are Costing You More Than You Think5 Signs Your Critical Assets Are Costing You More Than You Think
5 Signs Your Critical Assets Are Costing You More Than You Think
 
Fashion e-commerce design cases
Fashion e-commerce design casesFashion e-commerce design cases
Fashion e-commerce design cases
 
Healthcare sherpa rcm services
Healthcare sherpa rcm servicesHealthcare sherpa rcm services
Healthcare sherpa rcm services
 
Modulating PIP2 levels in Cells via PIP5K inhibition
Modulating PIP2 levels in Cells via PIP5K inhibitionModulating PIP2 levels in Cells via PIP5K inhibition
Modulating PIP2 levels in Cells via PIP5K inhibition
 
Anu Peltola & Mikael Gummerus, FROSMO, Winning the future of UX, 26.10.2016
Anu Peltola & Mikael Gummerus, FROSMO, Winning the future of UX, 26.10.2016Anu Peltola & Mikael Gummerus, FROSMO, Winning the future of UX, 26.10.2016
Anu Peltola & Mikael Gummerus, FROSMO, Winning the future of UX, 26.10.2016
 
Electio - Technology Metals Investment Brochure
Electio - Technology Metals Investment BrochureElectio - Technology Metals Investment Brochure
Electio - Technology Metals Investment Brochure
 
創薬に必要なデータ統合の現在と未来
創薬に必要なデータ統合の現在と未来創薬に必要なデータ統合の現在と未来
創薬に必要なデータ統合の現在と未来
 
Ingles
InglesIngles
Ingles
 
Gamification review 1
Gamification review 1Gamification review 1
Gamification review 1
 
Frede space up paris 2013
Frede space up paris 2013Frede space up paris 2013
Frede space up paris 2013
 
Recruit, Retain, Realize - How Third Party Transactional Data Can Power Your ...
Recruit, Retain, Realize - How Third Party Transactional Data Can Power Your ...Recruit, Retain, Realize - How Third Party Transactional Data Can Power Your ...
Recruit, Retain, Realize - How Third Party Transactional Data Can Power Your ...
 
807 103康八上 my comic book
807 103康八上 my comic book807 103康八上 my comic book
807 103康八上 my comic book
 
Reclaiming the idea of the University
Reclaiming the idea of the UniversityReclaiming the idea of the University
Reclaiming the idea of the University
 
Scala play-framework
Scala play-frameworkScala play-framework
Scala play-framework
 
6º básico a semana 09 al 13 de mayo (1)
6º básico a semana 09  al 13 de  mayo (1)6º básico a semana 09  al 13 de  mayo (1)
6º básico a semana 09 al 13 de mayo (1)
 
Marquette Social Listening presentation
Marquette Social Listening presentationMarquette Social Listening presentation
Marquette Social Listening presentation
 
Giveandget.com
Giveandget.comGiveandget.com
Giveandget.com
 

Similar to MedChemica - Extracting and exploiting medicinal chemistry ADMET knowledge automatically from public and large pharma data

Sharing and standards christopher hart - clinical innovation and partnering...
Sharing and standards   christopher hart - clinical innovation and partnering...Sharing and standards   christopher hart - clinical innovation and partnering...
Sharing and standards christopher hart - clinical innovation and partnering...
Christopher Hart
 
Insights from Building the Future of Drug Discovery with Apache Spark with Lu...
Insights from Building the Future of Drug Discovery with Apache Spark with Lu...Insights from Building the Future of Drug Discovery with Apache Spark with Lu...
Insights from Building the Future of Drug Discovery with Apache Spark with Lu...
Databricks
 
An Approach to Combining Disparate Clinical Study Data across Multiple Sponso...
An Approach to Combining Disparate Clinical Study Data across Multiple Sponso...An Approach to Combining Disparate Clinical Study Data across Multiple Sponso...
An Approach to Combining Disparate Clinical Study Data across Multiple Sponso...
imgcommcall
 
Maximize Your Understanding of Operational Realities in Manufacturing with Pr...
Maximize Your Understanding of Operational Realities in Manufacturing with Pr...Maximize Your Understanding of Operational Realities in Manufacturing with Pr...
Maximize Your Understanding of Operational Realities in Manufacturing with Pr...
Bigfinite
 
2016 Standardization of Laboratory Test Coding - PHI Conference
2016 Standardization of Laboratory Test Coding - PHI Conference2016 Standardization of Laboratory Test Coding - PHI Conference
2016 Standardization of Laboratory Test Coding - PHI ConferenceMegan Sawchuk
 
Microsoft: A Waking Giant in Healthcare Analytics and Big Data
Microsoft: A Waking Giant in Healthcare Analytics and Big DataMicrosoft: A Waking Giant in Healthcare Analytics and Big Data
Microsoft: A Waking Giant in Healthcare Analytics and Big Data
Dale Sanders
 
Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
effective data sharing for a learning healthcare system
effective data sharing for a learning healthcare systemeffective data sharing for a learning healthcare system
effective data sharing for a learning healthcare system
Paul Houston
 
Microsoft: A Waking Giant In Healthcare Analytics and Big Data
Microsoft: A Waking Giant In Healthcare Analytics and Big DataMicrosoft: A Waking Giant In Healthcare Analytics and Big Data
Microsoft: A Waking Giant In Healthcare Analytics and Big Data
Health Catalyst
 
Curlew Research Brussels 2014 Electronic Data & Knowledge Management
Curlew Research Brussels 2014 Electronic Data & Knowledge ManagementCurlew Research Brussels 2014 Electronic Data & Knowledge Management
Curlew Research Brussels 2014 Electronic Data & Knowledge Management
Nick Lynch
 
Explainable AI in Drug Hunting
Explainable AI in Drug HuntingExplainable AI in Drug Hunting
Explainable AI in Drug Hunting
Ed Griffen
 
Database Designing in CDM
Database Designing in CDMDatabase Designing in CDM
Database Designing in CDM
ClinosolIndia
 
DataFAIRy bioassays pilot -- lessons learned and future outlook
DataFAIRy bioassays pilot -- lessons learned and future outlookDataFAIRy bioassays pilot -- lessons learned and future outlook
DataFAIRy bioassays pilot -- lessons learned and future outlook
Isabella Feierberg
 
PROJECT softwares (28 May 14)
PROJECT softwares (28 May 14)PROJECT softwares (28 May 14)
PROJECT softwares (28 May 14)Preeti Sirohi
 
JR's Lifetime Advanced Analytics
JR's Lifetime Advanced AnalyticsJR's Lifetime Advanced Analytics
JR's Lifetime Advanced AnalyticsChase Hamilton
 
JR's Lifetime Advanced Analytics
JR's Lifetime Advanced AnalyticsJR's Lifetime Advanced Analytics
JR's Lifetime Advanced Analyticsd-Wise Technologies
 
Common Protocol Template (CPT) Initiative - Implementation Toolkit Executive ...
Common Protocol Template (CPT) Initiative - Implementation Toolkit Executive ...Common Protocol Template (CPT) Initiative - Implementation Toolkit Executive ...
Common Protocol Template (CPT) Initiative - Implementation Toolkit Executive ...
TransCelerate
 
Translating Medical Research for Patient Findability
Translating Medical Research for Patient FindabilityTranslating Medical Research for Patient Findability
Translating Medical Research for Patient Findabilitymapanzer
 

Similar to MedChemica - Extracting and exploiting medicinal chemistry ADMET knowledge automatically from public and large pharma data (20)

Sharing and standards christopher hart - clinical innovation and partnering...
Sharing and standards   christopher hart - clinical innovation and partnering...Sharing and standards   christopher hart - clinical innovation and partnering...
Sharing and standards christopher hart - clinical innovation and partnering...
 
Insights from Building the Future of Drug Discovery with Apache Spark with Lu...
Insights from Building the Future of Drug Discovery with Apache Spark with Lu...Insights from Building the Future of Drug Discovery with Apache Spark with Lu...
Insights from Building the Future of Drug Discovery with Apache Spark with Lu...
 
NARDA
NARDANARDA
NARDA
 
An Approach to Combining Disparate Clinical Study Data across Multiple Sponso...
An Approach to Combining Disparate Clinical Study Data across Multiple Sponso...An Approach to Combining Disparate Clinical Study Data across Multiple Sponso...
An Approach to Combining Disparate Clinical Study Data across Multiple Sponso...
 
Maximize Your Understanding of Operational Realities in Manufacturing with Pr...
Maximize Your Understanding of Operational Realities in Manufacturing with Pr...Maximize Your Understanding of Operational Realities in Manufacturing with Pr...
Maximize Your Understanding of Operational Realities in Manufacturing with Pr...
 
2016 Standardization of Laboratory Test Coding - PHI Conference
2016 Standardization of Laboratory Test Coding - PHI Conference2016 Standardization of Laboratory Test Coding - PHI Conference
2016 Standardization of Laboratory Test Coding - PHI Conference
 
2016 LabHIT Vision
2016 LabHIT Vision2016 LabHIT Vision
2016 LabHIT Vision
 
Microsoft: A Waking Giant in Healthcare Analytics and Big Data
Microsoft: A Waking Giant in Healthcare Analytics and Big DataMicrosoft: A Waking Giant in Healthcare Analytics and Big Data
Microsoft: A Waking Giant in Healthcare Analytics and Big Data
 
Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...
 
effective data sharing for a learning healthcare system
effective data sharing for a learning healthcare systemeffective data sharing for a learning healthcare system
effective data sharing for a learning healthcare system
 
Microsoft: A Waking Giant In Healthcare Analytics and Big Data
Microsoft: A Waking Giant In Healthcare Analytics and Big DataMicrosoft: A Waking Giant In Healthcare Analytics and Big Data
Microsoft: A Waking Giant In Healthcare Analytics and Big Data
 
Curlew Research Brussels 2014 Electronic Data & Knowledge Management
Curlew Research Brussels 2014 Electronic Data & Knowledge ManagementCurlew Research Brussels 2014 Electronic Data & Knowledge Management
Curlew Research Brussels 2014 Electronic Data & Knowledge Management
 
Explainable AI in Drug Hunting
Explainable AI in Drug HuntingExplainable AI in Drug Hunting
Explainable AI in Drug Hunting
 
Database Designing in CDM
Database Designing in CDMDatabase Designing in CDM
Database Designing in CDM
 
DataFAIRy bioassays pilot -- lessons learned and future outlook
DataFAIRy bioassays pilot -- lessons learned and future outlookDataFAIRy bioassays pilot -- lessons learned and future outlook
DataFAIRy bioassays pilot -- lessons learned and future outlook
 
PROJECT softwares (28 May 14)
PROJECT softwares (28 May 14)PROJECT softwares (28 May 14)
PROJECT softwares (28 May 14)
 
JR's Lifetime Advanced Analytics
JR's Lifetime Advanced AnalyticsJR's Lifetime Advanced Analytics
JR's Lifetime Advanced Analytics
 
JR's Lifetime Advanced Analytics
JR's Lifetime Advanced AnalyticsJR's Lifetime Advanced Analytics
JR's Lifetime Advanced Analytics
 
Common Protocol Template (CPT) Initiative - Implementation Toolkit Executive ...
Common Protocol Template (CPT) Initiative - Implementation Toolkit Executive ...Common Protocol Template (CPT) Initiative - Implementation Toolkit Executive ...
Common Protocol Template (CPT) Initiative - Implementation Toolkit Executive ...
 
Translating Medical Research for Patient Findability
Translating Medical Research for Patient FindabilityTranslating Medical Research for Patient Findability
Translating Medical Research for Patient Findability
 

Recently uploaded

Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
University of Maribor
 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
sonaliswain16
 
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of LipidsGBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
Areesha Ahmad
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Studia Poinsotiana
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
kejapriya1
 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
sachin783648
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
fafyfskhan251kmf
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
muralinath2
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
IqrimaNabilatulhusni
 
GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
Areesha Ahmad
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Ana Luísa Pinho
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
moosaasad1975
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
pablovgd
 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
YOGESH DOGRA
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
yqqaatn0
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
tonzsalvador2222
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
 
Toxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and ArsenicToxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and Arsenic
sanjana502982
 

Recently uploaded (20)

Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
 
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of LipidsGBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
 
GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
Toxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and ArsenicToxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and Arsenic
 

MedChemica - Extracting and exploiting medicinal chemistry ADMET knowledge automatically from public and large pharma data

  • 1. MedChemica | 2016 ACS Philadelphia 2016 Extracting and exploiting medicinal chemistry ADMET knowledge automatically from public and large pharma data MedChemica ACS August 2016
  • 2. MedChemica | 2016 ACS Philadelphia 2016 Better compounds designed from Data •  Improved compounds quicker •  Applicable ideas •  Confident design decisions •  Help when stuck •  Clearly describable plans •  Maximizing value from ADMET testing •  Pursuing dead-end series •  Pursuing dead-end projects •  Running out of time or $ Essentials Gains Pains
  • 3. MedChemica | 2016 ACS Philadelphia 2016 Key findings: •  Secure sharing of large scale ADMET knowledge between large Pharma is possible •  The collaboration generated great synergy •  Many findings are highly significant •  MMP is a great tool for idea generation  •  The rules have been used in drug-discovery projects and generated meaningful results •  MMPA methodology can be extended to extract pharmacophores  
  • 4. MedChemica | 2016 ACS Philadelphia 2016 Grand Rule database Better medicinal chemistry by sharing knowledge not data & structures
  • 5. MedChemica | 2016 ACS Philadelphia 2016 Barriers Broken to Sharing Knowledge Data Integrity and curation Knowledge extraction algorithms Consortium building to share knowledge Into the minds of chemists
  • 6. MedChemica | 2016 ACS Philadelphia 2016 Data Integrity and Curation Structural •  Extensive standards for inclusion of mixtures, chiral compounds, salt forms •  Tautomer and charge state canonicalisation client side •  Automated validation of structures run client side = “clean” comparable structures submitted to pair finding Measured Data •  Assay protocols reviewed prior to merging •  Precise documentation on unit definitions and data reporting standards •  Option to share standard compound measured values •  Automated extensive data validation checks prior to merging data “client  side”  =  behind  Pharma  firewall  
  • 7. MedChemica | 2016 ACS Philadelphia 2016 IP security •  Contributing company structures are NOT shared •  Contributing company identifiers are NOT shared •  Each contributing company receives it’s OWN custom GRD copy to enable drill back to OWN example data •  Minimum of n=6 example pairs required for inclusion in the GRD •  Enforced limits of shared substructure sizes •  Option for “time-blocking” sharing of data to withhold data < 6 months old for IP submission
  • 8. MedChemica | 2016 ACS Philadelphia 2016 Barriers Broken to Sharing Knowledge Data Integrity and curation Knowledge extraction algorithms Consortium building to share knowledge Into the minds of chemists ✓  
  • 9. MedChemica | 2016 ACS Philadelphia 2016 MCPairs Platform •  Extract rules using Advanced Matched Molecular Pair Analysis •  Knowledge is captured as transformations –  divorced from structures => sharable Measured Data rule finder Exploitable Knowledge MC Expert Enumerator System Problem molecule Solution molecules Pharmacophores & toxophores SMARTS matching Alerts   Virtual  screening   Library  design   Protect  the  IP  jewels  
  • 10. MedChemica | 2016 ACS Philadelphia 2016 •  Matched Molecular Pairs – Molecules that differ only by a particular, well- defined structural transformation •  Transformation with environment capture – MMPs can be recorded as transformations from A B •  Environment is essential to understand chemistry Statistical analysis •  Learn what effect the transformation has had on ADMET properties in the past Griffen,  E.  et  al.  Matched  Molecular  Pairs  as  a  Medicinal  Chemistry  Tool.  Journal  of  Medicinal  Chemistry.  2011,  54(22),  pp.7739-­‐7750.       Advanced MMPA Δ Data A-B1 2 2 3 3 3 4 4 4 12 23 3 34 4 4 A        B    
  • 11. MedChemica | 2016 ACS Philadelphia 2016 Environment really matters HMe: •  Median Δlog(Solubility) •  225 different environments 2.5log   1.5log   HMe:   •  Median Δlog(Clint) Human microsomal clearance •  278 different environments
  • 12. MedChemica | 2016 ACS Philadelphia 2016 More environment = right detail HMe Solubility: •  225 different environments
  • 13. MedChemica | 2016 ACS Philadelphia 2016 HF What effect on Clearance? •  Median Δlog(Clint) Human microsomal clearance •  37 different environments 2  fold  improvement   2  fold  worse   Increase   clearance   decrease   clearance  
  • 14. MedChemica | 2016 ACS Philadelphia 2016 Identify and group matching SMIRKS Calculate statistical parameters for each unique SMIRKS (n, median, sd, se, n_up/n_down) Is n ≥ 6? Not enough data: ignore transformation Is the |median| ≤ 0.05 and the intercentile range (10-90%) ≤ 0.3? Perform two-tailed binomial test on the transformation to determine the significance of the up/ down frequency transformation is classified as ‘neutral’ Transformation classified as ‘NED’ (No Effect Determined) Transformation classified as ‘increase’ or ‘decrease’ depending on which direction the property is changing pass fail yes no yes no Rule selection 0 +ve-ve Median data difference Neutral   Increase  Decrease   NED  
  • 15. MedChemica | 2016 ACS Philadelphia 2016 Barriers Broken to Sharing Knowledge Data Integrity and curation Knowledge extraction algorithms Consortium building to share knowledge Into the minds of chemists ✓   ✓  
  • 16. MedChemica | 2016 ACS Philadelphia 2016 Merging knowledge •  Use the transforms that are robust in both companies to calibrate assays. •  Once the assays are calibrated against each other the transform data can be combined to build support in poorly exemplified transforms •  Methodology precedented in other fields CalibrateRobust Robust Weak Weak Discover Novel Pharma  1   Pharma  2  
  • 17. MedChemica | 2016 ACS Philadelphia 2016 Merging Datasets •  Datasets are standardized by comparison of transformations shared by contributing companies •  Transformations are examined at the “pair example” level •  Minimum of 6 transformations, each with a minimum of 6 pairs (42 compounds bare minimum) required to standardise •  “calibration factors” extracted to standardize the datasets to a common value – mean of calibration factors 0.94, typical range 0.8-1.2. •  Datasets with too few common transformations have standard compound measurements shared for calibration. “Blinded”  source  of   transformaLons  
  • 18. Current Knowledge sets – GRDv3 Numbers of statistically valid transforms Grouped Datasets Number of Rules logD7.4 153449 Merged solubility 46655 In vitro microsomal clearance: Human, rat ,mouse, cyno, dog 88423 In vitro hepatocyte clearance : Human, rat ,mouse, cyno, dog 26627 MCDK permeability A-B / B – A efflux 1852 Cytochrome P450 inhibition: 2C9, 2D6 , 3A4 , 2C19 , 1A2 40605 Cardiac ion channels NaV 1.5 , hERG ion channel inhibition 15636 Glutathione Stability 116 plasma protein or albumin binding Human, rat ,mouse, cyno, dog 64622
  • 19. MedChemica | 2016 ACS Philadelphia 2016 Pharma 1 100k rules Pharma 2 92k rules Pharma 3 37k rules 5.8k rules in common (pre-merge) ~ 2% New Rules 88k ~26% of total Merge   Combining  data  yields  brand  new  rules   Gains:    300  -­‐  900%   Merging knowledge – GRDv1
  • 20. MedChemica | 2016 ACS Philadelphia 2016 Key findings: •  Secure sharing of large scale ADMET knowledge between large Pharma is possible •  The collaboration generated great synergy •  Many findings are highly significant. •  MMP is a great tool for idea generation.  •  The rules have been used in drug-discovery projects and generated meaningful results •  MMPA methodology can be extended to extract pharmacophores  
  • 21. MedChemica | 2016 ACS Philadelphia 2016 Barriers Broken to Sharing Knowledge Data Integrity and curation Knowledge extraction algorithms Consortium building to share knowledge Into the minds of chemists ✓   ✓   ✓  
  • 22. MedChemica | 2016 ACS Philadelphia 2016 Integrating knowledge exploitation ..many possible connections ..high Stability and flexibility** Rule Database MCExpert API server ** api.medchemica.com has delivered 18 months of uptime and drives SaltTraX (Elixir) Approx 50 chemists use the tool RESTful API Chemistry  Shape   and  electrostaHcs  
  • 23. MedChemica | 2016 ACS Philadelphia 2016 Results from Hao Zheng…..
  • 24. MedChemica | 2016 ACS Philadelphia 2016 MedChemica | 2016 ACS Philadelphia 2016 - Fix hERG problem whilst maintaining potency Waring et al, Med. Chem. Commun., (2011), 2, 775 Glucokinase Activators MMPA ∆pEC50: -0.1 ∆logD: -0.6 ∆hERG pIC50 :-0.5 n=33 n=32 n=22 MMPA ∆pEC50: +0.3 ∆logD: +0.3 ∆hERG pIC50 :-0.3 n=20 n=23 n=19 MMPA ∆pEC50: -0.1 ∆logD: -0.6 ∆hERG pIC50 :-0.5 n=27 n=27 n=7 MedChemica | 2016 ACS Philadelphia 2016 A Less Simple Example Increase logD and gain solubility Property Number of Observa2ons Direc2on Mean Change Probability logD 8 Increase 1.2 100% Log(Solubility) 14 Increase 1.4 92% What is the effect on lipophilicity and solubility? Roche data is inconclusive! (2 pairs for logD, 1 pair for solubility) logD = 2.65 KineMc solubility = 84 µg/ml IC50 SST5 = 0.8 µM logD = 3.63 KineMc solubility = >452 µg/ml IC50 SST5 = 0.19 µM Ques2on: Available Sta2s2cs: Roche Example: MedChemica | 2016 ACS Philadelphia 2016 Solving a tBu metabolism issue Benchmark compound Predicted to offer most improvement in microsomal stability (in at least 1 species / assay) R2 R1 tBu Me Et iPr 99 392 16 64 78 410 53 550 99 288 78 515 41 35 98 327 92 372 24 247 35 128 24 62 60 395 39 445 3 21 20 27 57 89 54 89 •  Data shown are Clint for HLM and MLM (top and bottom, respectively) R1 R2 R1 tBu Roger Butlin Rebecca Newton Allan Jordan
  • 25. MedChemica | 2016 ACS Philadelphia 2016 - Fix hERG problem whilst maintaining potency Waring et al, Med. Chem. Commun., (2011), 2, 775 Glucokinase Activators MMPA ∆pEC50: -0.1 ∆logD: -0.6 ∆hERG pIC50 :-0.5 n=33 n=32 n=22 MMPA ∆pEC50: +0.3 ∆logD: +0.3 ∆hERG pIC50 :-0.3 n=20 n=23 n=19 MMPA ∆pEC50: -0.1 ∆logD: -0.6 ∆hERG pIC50 :-0.5 n=27 n=27 n=7
  • 26. MedChemica | 2016 ACS Philadelphia 2016 A Less Simple Example Increase logD and gain solubility Property   Number  of   Observa@ons   Direc@on   Mean  Change   Probability   logD   8   Increase   1.2   100%   Log(Solubility)   14   Increase   1.4   92%   What  is  the  effect  on  lipophilicity  and   solubility?   Roche  data  is  inconclusive!  (2  pairs   for  logD,  1  pair  for  solubility)   logD  =  2.65   KineLc  solubility  =  84  µg/ml   IC50  SST5  =  0.8  µM   logD  =  3.63   KineLc  solubility  =  >452  µg/ml   IC50  SST5  =  0.19  µM   Ques@on:   Available   Sta@s@cs:   Roche   Example:  
  • 27. MedChemica | 2016 ACS Philadelphia 2016 The application helped lead optimization in project 27 •  193 compounds •  Enumerated Objective: improve metabolic stability MMP Enumeration Calculated Property Docking 8 compounds synthesized
  • 28. MedChemica | 2016 ACS Philadelphia 2016 Solving a tBu metabolism issue Benchmark   compound   Predicted  to  offer  most  improvement  in  microsomal  stability  (in  at  least  1  species  /  assay)                        R2     R1   tBu   Me   Et   iPr   99   392   16   64   78   410   53   550   99   288   78   515   41   35   98   327   92   372   24   247   35   128   24   62   60   395   39   445   3   21   20   27   57   89   54   89   •  Data shown are Clint for HLM and MLM (top and bottom, respectively) R1   R2  R1  tBu   Roger Butlin Rebecca Newton Allan Jordan
  • 29. MedChemica | 2016 ACS Philadelphia 2016 Early successes From GRDv1 May 2014 29 J.  Med.  Chem.,  2015,  58  (23),  pp  9309–9333   DOI:  10.1021/acs.jmedchem.5b01312  
  • 30. MedChemica | 2016 ACS Philadelphia 2016 Comparison of Merck in-house MMPA with SALTMinerTM Structure: ADMET Issue: hERG Lead A2A receptor antagonist compound in Merck Parkinson's project 138 suggestion molecules with predicted improvement in hERG binding How many match the results of Merck? •  Also shows potent binding to the hERG ion channel •  Deng et al performed in- house MMPA on hERG binding compound data and have published 18 resulting fluorobenzene transformations, which they have synthesized and tested for hERG activity Deng  et  al,     Bioorg.  &  Med  Chem  Let  (2015),   doi:  hhp://dx.doi.org/10.1016/ j.bmcl.2015.05.036    
  • 31. MedChemica | 2016 ACS Philadelphia 2016 R  group:   Measured  hERG   pIC50  change   -­‐1.187   -­‐1.149   -­‐1.038   -­‐1.215   -­‐1.157   -­‐0.149   -­‐1.487   -­‐1.133   GRD  median   historic  pIC50   change   0   -­‐0.171   -­‐0.1   -­‐0.283   -­‐0.219   -­‐0.318   -­‐0.159   -­‐0.103   Results: 8 out of the 18 fluorobenzene transformations produced by Merck were also suggested by MCExpert to decrease hERG binding: Searching the GRD for transformations that increase hERG there were none that matched the remaining 10 of 18 transformations in the paper. MCExpert also suggested an additional 50 fluorobenzene replacements to decrease hERG binding NOT mentioned in the publication.
  • 32. MedChemica | 2016 ACS Philadelphia 2016 Key findings: •  Secure sharing of large scale ADMET knowledge between large Pharma is possible •  The collaboration generated great synergy •  Many findings are highly significant. •  MMP is a great tool for idea generation.  •  The rules have been used in drug-discovery projects and generated meaningful results •  MMPA methodology can be extended to extract pharmacophores  
  • 33. MedChemica | 2016 ACS Philadelphia 2016 Barriers Broken to Sharing Knowledge Data Integrity and curation Knowledge extraction algorithms Consortium building to share knowledge Into the minds of chemists ✓   ✓   ✓   ✓  
  • 34. MedChemica | 2016 ACS Philadelphia 2016 Pharmacophores and Toxophores by extended analysis from the MMPA PharmacophoresBigData Stats Matched Pairs Finding Public and in- house potency data
  • 35. Mining transform sets to find influential fragments Identify the ‘Z’ fragments associated with a significant number of potency increasing changes – irrespective of what they are replaced with ‘Z’ is ‘worse than anything you replace it with’ Fragment A Fragment B   Change in binding measurement Public Data Find Matched Pairs Find Potent Fragments +2.7   +3.2   +0.6   +0.6   Identify the ‘A’ fragments associated with a significant number of potency decreasing changes – irrespective of what they are replaced with ‘A’ is ‘better than anything you replace it with’ A   +2.1  +2.2   +1.4   +0.4   +1.8   Z   pKi/ pIC50 Compounds with destructive fragment Compounds with constructive fragments Generate  Pharmacophore  dyads  by   permutaLng  all  the  fragments  with   the  shortest  path  between  them  
  • 36. MedChemica | 2016 ACS Philadelphia 2016 Toxophores - Detailed, specific & transparent Dopamine D2 receptor human Actual: 9.5 Predicted: 9.1 Mean with: 8.0 Mean without: 6.6 Odds Ratio: 340 Dopamine Transporter Actual: 9.1 Predicted: 8.6 Mean with: 8.3 Mean without: 6.7 Odds Ratio: 407 GABA-A Actual: 9.0 Predicted: 8.7 Mean with: 8.0 Mean without: 6.8 Odds Ratio: 1506 β1 adrenergic receptor Actual: 7.8 Predicted: 7.7 Mean with: 6.5 Mean without: 5.7 Odds Ratio: 1501 Find Potent Fragments Matched Pairs Finding Find Pharmacophore Dyads Public and in- house potency data
  • 37. MedChemica | 2016 ACS Philadelphia 2016 Prediction of unseen new molecules The acid test… •  Vascular endothelial growth factor receptor 2 tyrosine kinase (KDR) •  Inhibitors have oncology and ophthalmic indications •  Large dataset in CHEMBL •  10 fold cross validated PLS model •  Selected model by minimised RMSEP 37 Compounds 4466 Matched Pairs 288100 Fragments 678 Pharmacophore dyads 787 RMSEP 0.8 R2 0.64 Y-scrambled R2 0.0 ROC 0.95 Geomean odds ratio 80 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ●● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ●●●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 4 6 8 10 5 7 9 pIC50_pred pIC50
  • 38. MedChemica | 2016 ACS Philadelphia 2016 Novartis Predictions From Our Model Domain of Applicability…. Actual: 8.4[1] Predicted: 7.5 38 Actual: 7.6[1] Predicted: 7.5 1. J MedChem(2016), Bold et al. 2. MedChem Lett (2016), Mainolfi et al. Actual: 7.7[2] Predicted: 7.1 Actual: 9.0[2] Predicted: Out of Domain
  • 39. MedChemica | 2016 ACS Philadelphia 2016 Target Number  of   compounds   Number  of   compound   pairs   Number  of   Fragments   Number  of   Pharmacophore   dyads  aoer   filtering   R2   RMSEP   ROC   odds_raLo   (geomean)   Acetylcholine esterase - human 383   27755   44   10   0.43   1.57   0.80   4   β 1 adrenergic receptor 505   145447   276   313   0.64   0.70   0.96   833   Androgen receptor 1064   113163   186   46   0.47   0.77   0.86   140   CB1 canabinnoid receptor 1104   88091   165   90   0.61   1.02   0.87   96   CB2 canabinnoid receptor 1112   82130   194   158   0.19   0.85   0.64   5.7   Dopamine D2 receptor - human 3873   230962   483   602   0.42   0.88   0.69   110   Dopamine D2 receptor - rat 1807   118736   267   377   0.29   0.85   0.78   125   Dopamine Transporter 1470   106969   282   336   0.58   0.73   0.88   141   GABA A receptor 848   39494   106   167   0.70   0.76   0.97   560   hERG ion channel 4189   242261   392   76   0.61   0.96   0.92   55   5HT2a receptor 642   50870   197   267   0.61   0.59   0.83   600   Monoamine oxidase 264   15439   44   11   0.12   1.25   0.48   181   Muscarinic acetylcholine receptor M1 628   48200   97   510   0.62   0.94   0.89   48   µ opioid receptor 1128   37184   33   11   0.69   1.30   0.87   81   Critical safety target analysis •  Build models using 10-fold cross validated PLS •  Assess using ROC / BEDROC, R2 vs 100 fold y-scrambled R2 and geomean odds ratio 39 Public Data Find Matche d Pairs Pharmacophores Find Pharmacophore dyads Find Potent Fragments
  • 40. MedChemica | 2016 ACS Philadelphia 2016 MCBiophore GUI screenshot Assay Image Mean_with Mean_without PLS_coeff Path SMARTS1 SMARTS2 n_examples odds_ratio 2 VEGFR 8.3 6.4 0.71 [c]c[c][c] Cc1ccc[c]c1[n] [c]/C(=N/O)/C 18 259.8 3 VEGFR 8.2 6.4 0.17 [c] [c]c1cc(cc[c]1)/C(=N/OC)/CCc1ccc[c]c1[n] 17 257.6 0 VEGFR 8.1 6.4 -0.01 [CH3] [cH]c([cH])/C(=N/OC)/C[c]/C(=N/OCC)/C 8 20.2 1 VEGFR 8.1 6.4 -0.01 [CH3] [c]c1cc(cc[c]1)/C(=N/OC)/C[c]/C(=N/OCC)/C 8 151.2 Detailed results in excel
  • 41. MedChemica | 2016 ACS Philadelphia 2016 Matched Molecular Pair data A data B data C data D data E data F Chemical Transformations Δ data A B Δ data C D Δ data E F Chemical Transformations Δ data A B Δ data C D Δ data E F Δ data G H Δ data I J Δ data K L Matched Molecular Pair Analysis (MMPA) enables SAR sharing Without sharing underlying structures and data Grand Rule Database Enumeration Rate-My-Idea GRD-Browser ChEMBL Tox database Toxophores MC- Biophore
  • 42. MedChemica | 2016 ACS Philadelphia 2016 Take home messages •  Systematic “Big Data” analysis of ADMET data yields a system that SUGGESTS solution molecules for medicinal chemists •  Better decision making / Faster project progression •  Knowledge Sharing IS possible between pharma companies –  This yields HIGH VALUE databases and VAST new knowledge •  Pharmacophores and toxophores can be extracted from MMPA
  • 43. MedChemica | 2016 ACS Philadelphia 2016 Collaborators and Users - experience
  • 44. MedChemica | 2016 ACS Philadelphia 2016 A Collaboration of the willing Craig Bruce OE John Cumming Roche David Cosgrove C4XD Andy Grant★ Martin Harrison Elixir Huw Jones Base360 Al Rabow Consulting David Riley AZ Graeme Robb AZ Attilla Ting AZ Howard Tucker retired Dan Warner Myjar Steve St-Galley Syngenta David Wood JDR Lauren Reid MedChemica Shane Monague MedChemica Jessica Stacey MedChemica Andy Barker Consulting Pat Barton AZ Andy Davis AZ Andrew Griffin Elixir Phil Jewsbury AZ Mike Snowden AZ Peter Sjo AZ Martin Packer AZ Manos Perros Entasis Therapeutics Nick Tomkinson AZ Martin Stahl Roche Jerome Hert Roche Martin Blapp Roche Torsten Schindler Roche Paula Petrone Roche Christian Kramer Roche Jeff Blaney Genentech Hao Zheng Genentech Slaton Lipscomb Genentech Alberto Gobbi Genentech