MedChemica | 2017
CICAG RSC Liverpool June 2017
Extracting medicinal chemistry
knowledge by a secured Matched
Molecular Pair Analysis platform:
standardization of SMIRKS enables
knowledge exchange
Dr Alexander Dossetter
MedChemica
CICAG Structure Representation meeting, 22 June 2017
Liverpool University, UK
NCE drug approvals have not increased enough
…something has to change
Data: Food and Drug Administration website, https://www.fda.gov
Attrition in the Pharmaceutical Industry: Reasons, Implications, and Pathways Forward
By Alexander Alex, C. John Harris, Dennis A. Smith; Wiley 2016
[Chart: NCE drug approvals by year, 1971-2016; y-axis 0-60. Mean of each 5-year period: 15.5, 18.0, 24.2, 21.4, 26.2, 36.8, 23.6, 22.2, 33]
Actual spending / Chemistry everywhere
Paul, S. M. et al. How to improve R&D productivity: the pharmaceutical
industry’s grand challenge. Nat. Rev. Drug Discovery 2010, 9, 203
Medium-sized companies: R&D spend in one year is $1.7 billion,
of which 34% is spent in H2L and LO
Better Knowledge =
Fewer Compounds =
Lower the cost
Chemistry CD Quality influences
the success and speed
Chemistry controls the productivity
and quality
Where is the “Handbook of Medicinal Chemistry”?
•  Case study collections
•  “War stories” & anecdotes
•  Broad, highly general rules (e.g. Lipinski)
Where is the evidence based quantitative
guide to medicinal chemistry?
Here is the story of the building of the
‘Grand Rule Database’.
A Multi-pharma Med Chem Textbook
based on a thorough AI study.
Grand Rule
Database
v3
What we actually need is UNSUPERVISED
Machine Learning
What?	Where?	Why?
Large datasets	Large Pharma	Access to all the “actives” and “inactives”
Algorithms to extract structure	Matched Molecular Pair Analysis	All combinations considered [O(n²) problem], accurate structures, speed, finds counter-intuitive rules
Compute resource	Within Pharma	IP secure
Storage	Secure with VM	Multi-T-bytes
Ability to visualize and apply the results	Modern web tools and REST API	Chemists understand it and use it
Grand Rule database
Better medicinal chemistry by sharing knowledge not data & structures
MMP
finder
MCPairs=	
Kramer, C.; Ting, A.; Zheng, H.; Hert, J.; Schindler, T.; Stahl, M.; Robb, G.; Crawford, J.; Blaney, J.; Montague, S.; Leach, A. G.; Dossetter, A. G.; Griffen, E. J. Learning Medicinal Chemistry ADMET rules from Cross-company MMPA. J. Med. Chem. Submitted.
Finding Matched Pairs and
Cheminformatics
•  Challenge:
– Matched pair finding is an O(n²) process,
so it will be “Big Data”
– What is the best matched pair finding
technique?
– Once the pairs are found, how do you
encode the output so knowledge can be
shared securely?
– Once there is knowledge, how do
chemists use it?
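To give a feel for why pair finding becomes a "Big Data" problem, here is a minimal sketch that counts the candidate pairs a naive all-against-all comparison would have to consider. The function name and the example compound counts are illustrative assumptions, not figures from the talk.

```python
from itertools import combinations


def candidate_pair_count(n: int) -> int:
    """Number of unordered compound pairs among n compounds: n(n-1)/2."""
    return n * (n - 1) // 2


# Sanity check against explicit enumeration on a tiny, made-up set.
compounds = ["cpd_a", "cpd_b", "cpd_c", "cpd_d"]
enumerated = len(list(combinations(compounds, 2)))

# Hypothetical collection sizes, chosen only to show the quadratic growth.
for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9,} compounds -> {candidate_pair_count(n):,} candidate pairs")
```

A thousandfold increase in compounds gives roughly a millionfold increase in candidate pairs, which is why the choice of pair-finding technique matters.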
Barriers Broken to Sharing Knowledge
✓ Data integrity and curation
✓ Knowledge extraction algorithms
✓ Consortium building to share knowledge
✓ Into the minds of chemists
Grand Rule Database v3
MCPairs
New – stereochemistry standard
Standardisation
(Units, Species, Routes, Aggregation)
All agreed by Consortium (all in the MCPairs system and documentation)
– use public ontology and taxonomy where possible
22 standard units
Linear scale / categorical
Shared Assay Standard
Linear scale / categorical
1962 Species
Barriers Broken to Sharing Knowledge
✓ Data integrity and curation
  Knowledge extraction algorithms
  Consortium building to share knowledge
  Into the minds of chemists
Matched pair methodology
There are two techniques: Fragment and Index (F&I, H/R) and MCSS
A – CHEMBL156639 B - CHEMBL2387702
A – CHEMBL100461 B –CHEMBL103900
MCSS ✓, F&I ✗ MCSS ✗ , F&I ✓
MCSS ✓, F&I ✗
MCSS ✓, F&I ✗
MCSS ✗, F&I ✗ MCSS ✗, F&I ✗
MCSS ✗ , F&I ✓
MCSS ✓, F&I ✗
The two techniques find different chemistry….
Does the Matched Pair method really matter?
Using only one technique will miss between 12%
and 56% of pairings
Target	Number of compounds	Common pairings	FI only pairings	MCSS only pairings	Total pairings	FI only %	Common %	MCSS only %
VEGF	4466	14631	17172	14823	46626	37	31	32
Dopamine Transporter	1470	4480	8930	3497	16907	53	26	21
GABAA	848	2500	1722	4205	8427	20	30	50
D2 human	3873	12995	13811	13098	39904	35	33	33
D2 rat	1807	5408	6595	7346	19349	34	28	38
Acetylcholine esterase	383	536	725	1434	2695	27	20	53
Monoamine oxidase	264	653	1156	246	2055	56	32	12
min						20	20	12
max						56	33	53
[Venn diagram: FI, MCSS and their common pairings]
Lukac, I.; Zarnecka, J.; Griffen, E. J.; Dossetter, A. G.; St-Gallay, S.; Enoch, S.; Madden, J.; Leach, A. G. “Turbocharging matched molecular pair analysis; optimizing the identification and analysis of pairs.” J. Chem. Inf. Model. Submitted
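The percentage columns in the table above follow directly from the pairing counts. The sketch below recomputes them for two rows taken from the table; the function and key names are illustrative assumptions.

```python
def pairing_split(common: int, fi_only: int, mcss_only: int) -> dict:
    """Share of all pairings found by both methods, by FI only, and by MCSS only."""
    total = common + fi_only + mcss_only

    def pct(count: int) -> int:
        return round(100 * count / total)

    return {
        "total": total,
        "FI only %": pct(fi_only),
        "common %": pct(common),
        "MCSS only %": pct(mcss_only),
    }


# Counts copied from the VEGF and Monoamine oxidase rows of the table.
vegf = pairing_split(common=14631, fi_only=17172, mcss_only=14823)
mao = pairing_split(common=653, fi_only=1156, mcss_only=246)
print(vegf)
print(mao)
```

Running FI alone misses the "MCSS only" share and running MCSS alone misses the "FI only" share, which is where the 12% to 56% range quoted on the slide comes from.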
•  Matched Molecular Pairs – Molecules that differ only by a particular, well-defined structural transformation
•  Transformation with environment capture – MMPs can be recorded as transformations from A → B
•  Environment is essential to understand chemistry
Griffen, E. et al. Matched Molecular Pairs as a Medicinal Chemistry Tool. Journal of Medicinal Chemistry. 2011, 54(22), pp. 7739-7750.
Advanced MMPA with MCPairs
[Figure: molecules A and B of a matched pair with atoms numbered 1-4; Δ Data = A-B]
Environment is key and we need to capture it in our chemical encoding…
How do we encode the chemical
transformation?
•  Requirements
–  Lightweight – using as few bytes as possible
–  Hashable – allows database indexing
–  Can be used with Chem Toolkits to generate
product molecules from chemist’s input
–  SMIRKS / Reaction SMARTS (RDKit) fit the bill
–  Issue – need to automatically generate
SMIRKS from the matched pairs
•  New algorithm required
•  Canonicalisation is required so SMIRKS are
consistent from one organisation to another
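A minimal sketch of why a canonical, hashable string matters: if two organisations emit the identical SMIRKS text for the same transformation, a hash of that text can serve as a shared database key without any structures being exchanged. The use of SHA-256, the toy in-memory index and the example Δ values are assumptions for illustration only, not the MCPairs implementation. The SMIRKS string is the 3-atom rule quoted later in this deck.

```python
import hashlib


def smirks_key(smirks: str) -> str:
    """Fixed-length hex key for a canonical SMIRKS string (SHA-256, an assumed choice)."""
    return hashlib.sha256(smirks.encode("utf-8")).hexdigest()


# 3-atom rule as quoted on the "Standardising Canonicalised SMIRKS" slide.
RULE = ("[O:1]([H])[c:2]([c:3])[n:4]>>"
        "[c:3][c:2]([n:4])[O:1][C]([H])([H])[C]([H])([H])([H])")

# Toy index: key -> list of (organisation, property change). Values are made up.
index = {}
for organisation, delta in [("Pharma 1", 0.4), ("Pharma 2", 0.6)]:
    index.setdefault(smirks_key(RULE), []).append((organisation, delta))

print(len(index), "unique transformation;", len(index[smirks_key(RULE)]), "observations")
```

The pooling only works if both sides canonicalise identically: a single differing character gives a different key, which is the motivation for the standardisation described here.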
Standardising Canonicalised SMIRKS
CHEMBL309689 CHEMBL2331793
3-Atom rule: [O:1]([H])[c:2]([c:3])[n:4]>>[c:3][c:2]([n:4])[O:1][C]([H])([H])[C]([H])([H])([H])
Highly specific explicit H
Key mapped atom
With a 4-atom transform the environment is more complex
4-Atom rule: [O:1]([H])[c:2]1[c:3]([c:4][o:5][n:6]1)[C:7]([H])([H])>>[C]([H])([H])([H])[C]([H])([H])[O:1][c:2]1[c:3]([c:4][o:5][n:6]1)[C:7]([H])([H])
Note: mapped atoms run left to right 1, 2, 3, 4 … n
Rules change depending on rule size, environment and symmetry
Without explicit H, critical information is lost and incorrect products are generated
With OpenEye ChemTK, SMIRKS operate at 99.2% reliability and functionality
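The left-to-right numbering convention can be checked with plain string handling. The sketch below extracts the atom-map indices from each side of the two rules quoted on this slide and confirms that the reactant side runs 1…n and that the product side carries the same set of mapped atoms. It is a text-level check only and does not parse or validate the chemistry; the function names are illustrative assumptions.

```python
import re


def map_indices(fragment: str) -> list:
    """Atom-map numbers in the order they appear in a SMIRKS fragment."""
    return [int(m) for m in re.findall(r":(\d+)\]", fragment)]


def check_mapping(smirks: str) -> tuple:
    """Return (reactant maps run 1..n, product carries the same mapped atoms)."""
    reactant, _, product = smirks.partition(">>")
    r_maps = map_indices(reactant)
    p_maps = map_indices(product)
    ordered = r_maps == list(range(1, len(r_maps) + 1))
    same_atoms = sorted(p_maps) == r_maps
    return ordered, same_atoms


RULE_3 = ("[O:1]([H])[c:2]([c:3])[n:4]>>"
          "[c:3][c:2]([n:4])[O:1][C]([H])([H])[C]([H])([H])([H])")
RULE_4 = ("[O:1]([H])[c:2]1[c:3]([c:4][o:5][n:6]1)[C:7]([H])([H])>>"
          "[C]([H])([H])([H])[C]([H])([H])[O:1][c:2]1[c:3]([c:4][o:5][n:6]1)[C:7]([H])([H])")

print(check_mapping(RULE_3), check_mapping(RULE_4))
```

The unmapped product atoms (the added ethyl group and its explicit hydrogens) are ignored by the check, since they carry no map index.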
[Figure: atoms labelled with atom environment radius (orange, 1-4) and atom map index (blue, 1-4)]
Reaction SMARTS variation (for RDKit)
CHEMBL309689 CHEMBL2331793
3-Atom rule: [O;H1:1][c:2]([c:3])[n:4]>>[c:3][c:2]([n:4])[O;H0:1][C;H2][C;H3]
Hydrogens are within SMARTS (note H0 in product)
4-Atom rule: [O;H1:1][c:2]1[c:3]([c:4][o:5][n:6]1)[C;H2:7]>>[C;H3][C;H2][O;H0:1][c:2]1[c:3]([c:4][o:5][n:6]1)[C;H2:7]
RDKit canonical reaction SMARTS work for ~95% of examples.
Without the hydrogen SMARTS in the mappings, critical chemical information is
lost and products are not formed
[Figure: atoms labelled with atom environment radius (orange, 1-4) and atom map index (blue, 1-4)]
How do you find Knowledge?
Rule selection 101
1. Identify and group matching SMIRKS
2. Calculate statistical parameters for each unique SMIRKS (n, median, sd, se, n_up/n_down)
3. Is n ≥ 6?
   no: not enough data, ignore transformation
   yes: go to step 4
4. Is the |median| ≤ 0.05 and the interpercentile range (10-90%) ≤ 0.3?
   yes: transformation is classified as ‘neutral’
   no: go to step 5
5. Perform two-tailed binomial test on the transformation to determine the significance of the up/down frequency
   pass: transformation classified as ‘increase’ or ‘decrease’ depending on which direction the property is changing
   fail: transformation classified as ‘NED’ (No Effect Determined)
[Figure: median data difference axis (-ve, 0, +ve) showing Decrease, Neutral and Increase regions, with NED]
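The decision flow on this slide can be sketched in a few lines of standard-library Python. The thresholds (n ≥ 6, |median| ≤ 0.05, 10-90% range ≤ 0.3) come from the slide; the significance level of 0.05, the use of the "inclusive" quantile method, the branch order and the function names are assumptions made for this illustration.

```python
from math import comb
from statistics import median, quantiles


def binomial_two_tailed_p(n_up: int, n_down: int) -> float:
    """Two-tailed p-value for an up/down split under a fair-coin null (p = 0.5)."""
    n = n_up + n_down
    k = max(n_up, n_down)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def classify(deltas, alpha=0.05):
    """Classify one transformation from its list of pair differences."""
    if len(deltas) < 6:
        return "ignore"  # not enough data
    med = median(deltas)
    cuts = quantiles(deltas, n=10, method="inclusive")  # 10th ... 90th percentiles
    if abs(med) <= 0.05 and (cuts[-1] - cuts[0]) <= 0.3:
        return "neutral"
    n_up = sum(d > 0 for d in deltas)
    n_down = sum(d < 0 for d in deltas)
    if binomial_two_tailed_p(n_up, n_down) < alpha:
        return "increase" if med > 0 else "decrease"
    return "NED"  # No Effect Determined


# Made-up example data, one list of property differences per transformation.
print(classify([0.5, 0.6, 0.4, 0.7, 0.5, 0.6]))    # all six pairs go up
print(classify([0.5, -0.5, 0.6, -0.6, 0.4, -0.4]))  # evenly split directions
```

With a fair-coin null, six pairs all moving in the same direction give p = 2/64 ≈ 0.031, so six is the smallest n at which a unanimous direction passes a 0.05 threshold in this sketch.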
Barriers Broken to Sharing Knowledge
✓ Data integrity and curation
✓ Knowledge extraction algorithms
  Consortium building to share knowledge
  Into the minds of chemists
Merging knowledge
•  Use the transforms that
are robust in both
companies to calibrate
assays.
•  Once the assays are
calibrated against each
other the transform
data can be combined
to build support in
poorly exemplified
transforms
•  Methodology
precedented in other
fields
[Diagram: Pharma 1 and Pharma 2 transform sets, labelled Calibrate (Robust, Robust), Weak, Weak, and Discover Novel]
Merging Datasets
•  Datasets are standardized by comparison of
transformations shared by contributing companies
•  Transformations are examined at the “pair example”
level
•  Minimum of 6 transformations, each with a minimum of 6
pairs (42 compounds bare minimum) required to
standardise
•  “Calibration factors” are extracted to standardize the
datasets to a common value: mean of calibration
factors 0.94, typical range 0.8-1.2.
•  Datasets with too few common transformations have
standard compound measurements shared for
calibration.
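The slides do not say how a calibration factor is computed. As one plausible reading only, the sketch below treats it as a single multiplicative scale between two companies' measurements of the same transformations, estimated as a least-squares slope through the origin. The method, the function name and the example numbers are all assumptions, not the consortium's procedure; only the minimum of 6 shared transformations is taken from the slide.

```python
def calibration_factor(reference_changes, other_changes, min_shared=6):
    """Least-squares slope through the origin mapping `other` onto `reference`.

    Each list holds the property change observed for the same shared
    transformations, in the same order, in two different datasets.
    """
    if len(reference_changes) != len(other_changes):
        raise ValueError("lists must describe the same transformations")
    if len(reference_changes) < min_shared:
        raise ValueError(f"need at least {min_shared} shared transformations")
    denominator = sum(o * o for o in other_changes)
    if denominator == 0:
        raise ValueError("other dataset shows no change; factor undefined")
    numerator = sum(r * o for r, o in zip(reference_changes, other_changes))
    return numerator / denominator


# Made-up changes for six shared transformations; the reference assay reads 10% lower.
other = [0.5, -0.4, 1.0, 0.2, -0.8, 0.6]
reference = [0.9 * x for x in other]
factor = calibration_factor(reference, other)
print(round(factor, 2))
```

Under this reading, multiplying the second dataset's changes by the factor puts both datasets on a common scale before the transform data are pooled.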
“Blinded” source of transformations
Current Knowledge sets – June 2015
Numbers of statistically valid transforms
Grouped Datasets	Number of Rules
logD7.4	153449
Merged solubility	46655
In vitro microsomal clearance: Human, rat, mouse, cyno, dog	88423
In vitro hepatocyte clearance: Human, rat, mouse, cyno, dog	26627
MDCK permeability A-B / B-A efflux	1852
Cytochrome P450 inhibition: 2C9, 2D6, 3A4, 2C19, 1A2	40605
Cardiac ion channels: NaV 1.5, hERG ion channel inhibition	15636
Glutathione stability	116
Plasma protein or albumin binding: Human, rat, mouse, cyno, dog	64622
Grand Rule
Database
v3
Barriers Broken to Sharing Knowledge
✓ Data integrity and curation
✓ Knowledge extraction algorithms
✓ Consortium building to share knowledge
✓ Into the minds of chemists
Grand Rule Database v3
MCPairs
Exploiting Knowledge for Compound
Optimization
[Diagram: Measured Data → rule finder (MCPairs) → Exploitable Knowledge → MCExpert System; Problem molecule → MCExpert System → New molecule suggestions]
“..it’s like asking 150 of your peers
for ideas in just a few seconds” –
AZ Principal Scientist
“It’s like asking 150 of your peers for ideas…”
Suggested Molecules with “heat map” of Rules for 26 in-vitro endpoints
- “…an MPO Gold Mine” – Roche consortium member
Ask for a demo
More examples of Success
MedChemica | 2016
ACS Philadelphia 2016
- Fix hERG problem whilst maintaining potency
Waring et al, Med. Chem. Commun., (2011), 2, 775
Glucokinase Activators
MMPA
∆pEC50: -0.1 ∆logD: -0.6 ∆hERG pIC50 :-0.5
n=33 n=32 n=22
MMPA
∆pEC50: +0.3 ∆logD: +0.3 ∆hERG pIC50 :-0.3
n=20 n=23 n=19
MMPA
∆pEC50: -0.1 ∆logD: -0.6 ∆hERG pIC50 :-0.5
n=27 n=27 n=7
A Less Simple Example
Increase logD and gain solubility
Property	Number of Observations	Direction	Mean Change	Probability
logD	8	Increase	1.2	100%
Log(Solubility)	14	Increase	1.4	92%
Question: What is the effect on lipophilicity and solubility?
Available Statistics: Roche data is inconclusive! (2 pairs for logD, 1 pair for solubility)
Roche Example:
logD = 2.65; Kinetic solubility = 84 µg/ml; IC50 SST5 = 0.8 µM
logD = 3.63; Kinetic solubility = >452 µg/ml; IC50 SST5 = 0.19 µM
Thompson, M. J. et al. J. Med. Chem., 2015, 58 (23), pp 9309–9333
DOI: 10.1021/acs.jmedchem.5b01312
Collaborators and Users
Survey - 17 out of 19 organisations
said the GRD aided project
progression
Can we understand efflux? – MDR1 / PGP
Metrabase
http://www-metrabase.ch.cam.ac.uk
1911 compounds: substrate Y/N
Pair Finding → Rule Finding: only 826 compound pairs, 1 “borderline” rule
Property analysis: MDR1 substrate = ↑ hydrogen bond donors, ↑ hydrogen bond acceptors, ↑ PSA
Public transporter data:
•  Not quantitative
•  Not enough
•  Too diverse
•  Trivial conclusions
See also: Drug Discov Today. 2012 Apr;17(7-8):343-51. doi: 10.1016/j.drudis.2011.11.003
Global Absorption Analysis by MMPA
Combined knowledge from a large number of peer pharma
Secure Analysis of in-vitro absorption
Good Absorption improves
•  Efficacy
•  Safety (lower dose / less off target)
Medicinal Chemistry demands answers
•  Low dose oral bio-availability – how?
•  MDR1 resistance in oncology
•  Brain penetration for CNS diseases
•  Only rough and ready “rules”
available – trial and error victims
Problem
•  These expensive assays preclude ANY
one company having enough
knowledge
•  Extreme paucity of data in literature
Solution
•  ≥10 companies’ worth of data
•  A typical pharma has ~10000 results
•  MedChemica has the technology and
standardization to perform this analysis
MedChemica’s offer
$13600 per organisation to produce a
new absorption database
Absorption
Rule Database
v1
Barriers Broken to Sharing Knowledge
✓ Data integrity and curation
✓ Knowledge extraction algorithms
✓ Consortium building to share knowledge
✓ Into the minds of chemists
Grand Rule Database v3
MCPairs
Key findings:
•  Secure sharing of large scale ADMET knowledge
between large Pharma is possible
•  The collaboration generated great synergy
•  Standardisation of Units, Species, Assays, MMPA
environment, Canonical SMIRKS enabled sharing
•  MMP is a great tool for idea generation
•  The rules have been used in drug-discovery projects
and yield a clear business case for sharing
A Collaboration of the willing
Craig Bruce OE
John Cumming Roche
David Cosgrove C4XD
Andy Grant★
Martin Harrison Elixir
Huw Jones Base360
Al Rabow Consulting
David Riley AZ
Graeme Robb AZ
Attilla Ting AZ
Howard Tucker retired
Dan Warner Myjar
Steve St-Gallay Syngenta
David Wood JDR
Lauren Reid MedChemica
Shane Montague MedChemica
Jessica Stacey MedChemica
Andy Barker Consulting
Pat Barton AZ
Andy Davis AZ
Andrew Griffin Elixir
Phil Jewsbury AZ
Mike Snowden AZ
Peter Sjo AZ
Martin Packer AZ
Manos Perros Entasis Therapeutics
Nick Tomkinson AZ
Martin Stahl Roche
Jerome Hert Roche
Martin Blapp Roche
Torsten Schindler Roche
Paula Petrone Roche
Christian Kramer Roche
Jeff Blaney Genentech
Hao Zheng Genentech
Slaton Lipscomb Genentech
Alberto Gobbi Genentech
Celebrity Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl S...
bellared2
 
Direct instructions, towards hundred fold yield,layering,budding,grafting,pla...
Direct instructions, towards hundred fold yield,layering,budding,grafting,pla...Direct instructions, towards hundred fold yield,layering,budding,grafting,pla...
Direct instructions, towards hundred fold yield,layering,budding,grafting,pla...
Dr. sreeremya S
 
Phytoremediation: Harnessing Nature's Power with Phytoremediation
Phytoremediation: Harnessing Nature's Power with PhytoremediationPhytoremediation: Harnessing Nature's Power with Phytoremediation
Phytoremediation: Harnessing Nature's Power with Phytoremediation
Gurjant Singh
 
Testing the Son of God Hypothesis (Jesus Christ)
Testing the Son of God Hypothesis (Jesus Christ)Testing the Son of God Hypothesis (Jesus Christ)
Testing the Son of God Hypothesis (Jesus Christ)
Robert Luk
 
Potential of Marine renewable and Non renewable energy.pptx
Potential of Marine renewable and Non renewable energy.pptxPotential of Marine renewable and Non renewable energy.pptx
Potential of Marine renewable and Non renewable energy.pptx
J. Bovas Joel BFSc
 
Composting blue materials - Joshua Cabell
Composting blue materials - Joshua CabellComposting blue materials - Joshua Cabell
Composting blue materials - Joshua Cabell
Faculty of Applied Chemistry and Materials Science
 
Structure of Sperm / Spermatozoon .pdf
Structure of  Sperm / Spermatozoon  .pdfStructure of  Sperm / Spermatozoon  .pdf
Structure of Sperm / Spermatozoon .pdf
SELF-EXPLANATORY
 
AN EMPIRE ACROSS THE THREE CONTINENTS.pptx
AN EMPIRE ACROSS THE THREE CONTINENTS.pptxAN EMPIRE ACROSS THE THREE CONTINENTS.pptx
AN EMPIRE ACROSS THE THREE CONTINENTS.pptx
kalpnayadav03021986
 
SCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptx
SCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptxSCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptx
SCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptx
WALTONMARBRUCAL
 

Recently uploaded (20)

A NICER VIEW OF THE NEAREST AND BRIGHTEST MILLISECOND PULSAR: PSR J0437−4715
A NICER VIEW OF THE NEAREST AND BRIGHTEST MILLISECOND PULSAR: PSR J0437−4715A NICER VIEW OF THE NEAREST AND BRIGHTEST MILLISECOND PULSAR: PSR J0437−4715
A NICER VIEW OF THE NEAREST AND BRIGHTEST MILLISECOND PULSAR: PSR J0437−4715
 
Classification and role of plant nutrients - Roxana Madjar
Classification and role of plant nutrients - Roxana MadjarClassification and role of plant nutrients - Roxana Madjar
Classification and role of plant nutrients - Roxana Madjar
 
Detection of the elusive dangling OH ice features at ~2.7 μm in Chamaeleon I ...
Detection of the elusive dangling OH ice features at ~2.7 μm in Chamaeleon I ...Detection of the elusive dangling OH ice features at ~2.7 μm in Chamaeleon I ...
Detection of the elusive dangling OH ice features at ~2.7 μm in Chamaeleon I ...
 
Possible Anthropogenic Contributions to the LAMP-observed Surficial Icy Regol...
Possible Anthropogenic Contributions to the LAMP-observed Surficial Icy Regol...Possible Anthropogenic Contributions to the LAMP-observed Surficial Icy Regol...
Possible Anthropogenic Contributions to the LAMP-observed Surficial Icy Regol...
 
How Does TaskTrain Integrate Workflow and Project Management Efficiently.pdf
How Does TaskTrain Integrate Workflow and Project Management Efficiently.pdfHow Does TaskTrain Integrate Workflow and Project Management Efficiently.pdf
How Does TaskTrain Integrate Workflow and Project Management Efficiently.pdf
 
Adjusted NuGOweek 2024 Ghent programme flyer
Adjusted NuGOweek 2024 Ghent programme flyerAdjusted NuGOweek 2024 Ghent programme flyer
Adjusted NuGOweek 2024 Ghent programme flyer
 
Analytical methods for blue residues characterization - Oana Crina Bujor
Analytical methods for blue residues characterization - Oana Crina BujorAnalytical methods for blue residues characterization - Oana Crina Bujor
Analytical methods for blue residues characterization - Oana Crina Bujor
 
All-domain Anomaly Resolution Office Supplement to Oak Ridge National Laborat...
All-domain Anomaly Resolution Office Supplement to Oak Ridge National Laborat...All-domain Anomaly Resolution Office Supplement to Oak Ridge National Laborat...
All-domain Anomaly Resolution Office Supplement to Oak Ridge National Laborat...
 
Simulations of pulsed overpressure jets: formation of bellows and ripples in ...
Simulations of pulsed overpressure jets: formation of bellows and ripples in ...Simulations of pulsed overpressure jets: formation of bellows and ripples in ...
Simulations of pulsed overpressure jets: formation of bellows and ripples in ...
 
BIOPHYSICS Interactions of molecules in 3-D space-determining binding and.pptx
BIOPHYSICS Interactions of molecules in 3-D space-determining binding and.pptxBIOPHYSICS Interactions of molecules in 3-D space-determining binding and.pptx
BIOPHYSICS Interactions of molecules in 3-D space-determining binding and.pptx
 
Gametogenesis: Male gametes Formation Process / Spermatogenesis .pdf
Gametogenesis: Male gametes Formation Process / Spermatogenesis .pdfGametogenesis: Male gametes Formation Process / Spermatogenesis .pdf
Gametogenesis: Male gametes Formation Process / Spermatogenesis .pdf
 
Celebrity Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl S...
Celebrity Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl S...Celebrity Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl S...
Celebrity Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl S...
 
Direct instructions, towards hundred fold yield,layering,budding,grafting,pla...
Direct instructions, towards hundred fold yield,layering,budding,grafting,pla...Direct instructions, towards hundred fold yield,layering,budding,grafting,pla...
Direct instructions, towards hundred fold yield,layering,budding,grafting,pla...
 
Phytoremediation: Harnessing Nature's Power with Phytoremediation
Phytoremediation: Harnessing Nature's Power with PhytoremediationPhytoremediation: Harnessing Nature's Power with Phytoremediation
Phytoremediation: Harnessing Nature's Power with Phytoremediation
 
Testing the Son of God Hypothesis (Jesus Christ)
Testing the Son of God Hypothesis (Jesus Christ)Testing the Son of God Hypothesis (Jesus Christ)
Testing the Son of God Hypothesis (Jesus Christ)
 
Potential of Marine renewable and Non renewable energy.pptx
Potential of Marine renewable and Non renewable energy.pptxPotential of Marine renewable and Non renewable energy.pptx
Potential of Marine renewable and Non renewable energy.pptx
 
Composting blue materials - Joshua Cabell
Composting blue materials - Joshua CabellComposting blue materials - Joshua Cabell
Composting blue materials - Joshua Cabell
 
Structure of Sperm / Spermatozoon .pdf
Structure of  Sperm / Spermatozoon  .pdfStructure of  Sperm / Spermatozoon  .pdf
Structure of Sperm / Spermatozoon .pdf
 
AN EMPIRE ACROSS THE THREE CONTINENTS.pptx
AN EMPIRE ACROSS THE THREE CONTINENTS.pptxAN EMPIRE ACROSS THE THREE CONTINENTS.pptx
AN EMPIRE ACROSS THE THREE CONTINENTS.pptx
 
SCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptx
SCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptxSCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptx
SCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptx
 

Extracting medicinal chemistry knowledge by a secured Matched Molecular Pair Analysis platform: standardization of SMIRKS enables knowledge exchange

  • 1. MedChemica | 2017 CICAG RSC Liverpool June 2017 Extracting medicinal chemistry knowledge by a secured Matched Molecular Pair Analysis platform: standardization of SMIRKS enables knowledge exchange. Dr Alexander Dossetter, MedChemica. CICAG Structure Representation meeting, 22 June 2017, Liverpool University, UK
  • 2. NCE drug approvals have not increased enough… something has to change. Data: Food and Drug Administration website, https://www.fda.gov. Attrition in the Pharmaceutical Industry: Reasons, Implications, and Pathways Forward, by Alexander Alex, C. John Harris, Dennis A. Smith; Wiley 2016. [Chart: NCE drug approvals by year, 1971 to 2016, vertical axis 0 to 60; mean for each 5-year period: 15.5, 18.0, 24.2, 21.4, 26.2, 36.8, 23.6, 22.2, 33]
  • 3. Actual spending / Chemistry everywhere. Paul, S. M. et al. How to improve R&D productivity: the pharmaceutical industry’s grand challenge, Nat. Rev. Drug Discovery 2010, 9, 203. Medium-sized companies’ R&D spend in one year: $1.7 billion, of which 34% is spent in H2L and LO. Better Knowledge = Fewer Compounds = Lower the cost. Chemistry CD Quality influences the success and speed. Chemistry controls the productivity and quality.
  • 4. Where is the “Handbook of Medicinal Chemistry”? • Case study collections • “War stories” & anecdotes • Broad, highly general rules (e.g. Lipinski). Where is the evidence-based quantitative guide to medicinal chemistry? Here is the story of the building of the ‘Grand Rule Database’: a multi-pharma Med Chem textbook based on a thorough AI study. Grand Rule Database v3
  • 5. What we actually need is UNSUPERVISED Machine Learning
      What? | Where? | Why?
      Large datasets | Large Pharma | Access to all the “actives” and “inactives”
      Algorithms to extract structure | Matched Molecular Pair Analysis | All combinations considered [O(n²) problem], accurate structures, speed, finds counter-intuitive rules
      Compute resource | Within Pharma | IP secure
      Storage | Secure with VM | Multi-T-bytes
      Ability to visualize and apply the results | Modern web tools and REST API | Chemists understand it and use it
  • 6. Grand Rule database: better medicinal chemistry by sharing knowledge, not data & structures. MMP finder: MCPairs. Kramer, C.; Ting, A.; Zheng, H.; Hert, J.; Schindler, T.; Stahl, M.; Robb, G.; Crawford, J.; Blaney, J.; Montague, S.; Leach, A. G.; Dossetter, A. G.; Griffen, E. J. Learning Medicinal Chemistry ADMET rules from Cross-company MMPA. J. Med. Chem. Submitted.
  • 7. Finding Matched Pairs and Cheminformatics • Challenge: – Matched Pair finding is an O(n²) process, so it will be “BigData” – What is the best matched pair finding technique? – Once the pairs are found, how do you encode the output so knowledge can be shared securely? – Once there is knowledge, how do chemists use it?
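To make the O(n²) point concrete, the number of candidate compound pairs that a pair finder has to consider can be counted directly. This is a small illustrative sketch; the function name and the example set sizes are assumptions chosen here, not figures from the talk.

```python
def candidate_pair_count(n_compounds: int) -> int:
    """Number of unordered compound pairs among n compounds: n(n-1)/2."""
    return n_compounds * (n_compounds - 1) // 2


# Hypothetical collection sizes, chosen only to show the quadratic growth.
for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9} compounds -> {candidate_pair_count(n):>15} candidate pairs")
```

A thousand-fold increase in compounds gives roughly a million-fold increase in candidate pairs, which is why the pair-finding technique and compute resource matter.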
  • 8. Barriers Broken to Sharing Knowledge • Data Integrity and curation ✓ • Knowledge extraction algorithms ✓ • Consortium building to share knowledge ✓ • Into the minds of chemists ✓ Grand Rule Database v3, MCPairs
  • 9. New – stereochemistry standard. Standardisation (Units, Species, Routes, Aggregation): all agreed by Consortium (all in the MCPairs system and documentation) – use public ontology and taxonomy where possible. 22 standard units; Linear scale / categorical; Shared Assay Standard; Linear scale / categorical; 1962 Species
  • 10. Barriers Broken to Sharing Knowledge • Data Integrity and curation ✓ • Knowledge extraction algorithms • Consortium building to share knowledge • Into the minds of chemists
  • 11. Matched pair methodology. There are two techniques: Frag and Index (H/R) and MCSS. [Figure: example pairs, A – CHEMBL156639, B – CHEMBL2387702; A – CHEMBL100461, B – CHEMBL103900; pairings marked by technique: MCSS ✓, F&I ✗; MCSS ✗, F&I ✓; MCSS ✓, F&I ✗; MCSS ✓, F&I ✗; MCSS ✗, F&I ✗; MCSS ✗, F&I ✗; MCSS ✗, F&I ✓; MCSS ✓, F&I ✗] The two techniques find different chemistry…
  • 12. Does the Matched Pair method really matter? Using only one technique will miss between 12% and 56% of pairings.
      Target | Number of compounds | Common pairings | FI only | MCSS only | Total | FI only % | Common % | MCSS only %
      VEGF | 4466 | 14631 | 17172 | 14823 | 46626 | 37 | 31 | 32
      Dopamine Transporter | 1470 | 4480 | 8930 | 3497 | 16907 | 53 | 26 | 21
      GABAA | 848 | 2500 | 1722 | 4205 | 8427 | 20 | 30 | 50
      D2 human | 3873 | 12995 | 13811 | 13098 | 39904 | 35 | 33 | 33
      D2 rat | 1807 | 5408 | 6595 | 7346 | 19349 | 34 | 28 | 38
      Acetylcholine esterase | 383 | 536 | 725 | 1434 | 2695 | 27 | 20 | 53
      Monoamine oxidase | 264 | 653 | 1156 | 246 | 2055 | 56 | 32 | 12
      min | | | | | | 20 | 20 | 12
      max | | | | | | 56 | 33 | 53
      Lukac, I.; Zarnecka, J.; Griffen, E.J.; Dossetter, A.G.; St-Gallay, S.; Enoch, S.; Madden, J.; Leach, A.G. “Turbocharging matched molecular pair analysis; optimizing the identification and analysis of pairs.” J. Chem. Inf. Model. Submitted
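The percentage columns follow from the pairing counts by simple arithmetic. As a worked check, the sketch below recomputes the VEGF row from the counts on the slide; the function name `pair_shares` is an illustrative assumption.

```python
def pair_shares(common: int, fi_only: int, mcss_only: int) -> dict:
    """Percentage of all pairings found by both techniques or by only one."""
    total = common + fi_only + mcss_only
    counts = {"FI only": fi_only, "common": common, "MCSS only": mcss_only}
    return {name: round(100 * count / total) for name, count in counts.items()}


# VEGF row from the slide: common, FI only, MCSS only pairings.
vegf = pair_shares(common=14631, fi_only=17172, mcss_only=14823)
print(vegf)
```

Read this way, a workflow that uses only Frag and Index misses the "MCSS only" share, and one that uses only MCSS misses the "FI only" share, which is where the 12% to 56% range comes from.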
  • 13. Advanced MMPA with MCPairs • Matched Molecular Pairs – Molecules that differ only by a particular, well-defined structural transformation • Transformation with environment capture – MMPs can be recorded as transformations from A → B • Environment is essential to understand chemistry. Griffen, E. et al. Matched Molecular Pairs as a Medicinal Chemistry Tool. Journal of Medicinal Chemistry. 2011, 54(22), pp. 7739-7750. [Figure: structures A and B with atom labels 1 to 4; Δ Data A-B] Environment is key and we need to capture it in our chemical encoding…
  • 14. How do we encode the chemical transformation? • Requirements – Lightweight: using as few bytes as possible – Hashable: allows database indexing – Can be used with Chem Toolkits to generate product molecules from chemist’s input – SMIRKS / Reaction SMARTS (RDKit) fit the bill – Issue: need to automatically generate SMIRKS from the matched pairs • New algorithm required • Canonicalisation is required so SMIRKS are consistent from one organisation to another
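The "lightweight" and "hashable" requirements can be illustrated with a fixed-length digest of a transformation string. This is a minimal sketch and not the MCPairs implementation: the choice of SHA-1 and the helper name `smirks_key` are assumptions made here for illustration, and the example string is the 3-atom reaction SMARTS shown later in this deck.

```python
import hashlib


def smirks_key(smirks: str) -> str:
    """Fixed-length hex digest usable as a database index key (hypothetical helper)."""
    return hashlib.sha1(smirks.encode("utf-8")).hexdigest()


rule = "[O;H1:1][c:2]([c:3])[n:4]>>[c:3][c:2]([n:4])[O;H0:1][C;H2][C;H3]"
key = smirks_key(rule)
print(len(rule.encode("utf-8")), "bytes of text ->", len(key), "hex characters")
```

Two organisations obtain the same key only if they write the same transformation as the same string, which is why canonicalisation is listed as a requirement.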
  • 15. Standardising Canonicalised SMIRKS (CHEMBL309689, CHEMBL2331793)
      3-Atom rule (annotations: highly specific explicit H; key mapped atom):
      [O:1]([H])[c:2]([c:3])[n:4]>>[c:3][c:2]([n:4])[O:1][C]([H])([H])[C]([H])([H])([H])
      With a 4-atom transform the environment is more complex. 4-Atom rule:
      [O:1]([H])[c:2]1[c:3]([c:4][o:5][n:6]1)[C:7]([H])([H])>>[C]([H])([H])([H])[C]([H])([H])[O:1][c:2]1[c:3]([c:4][o:5][n:6]1)[C:7]([H])([H])
      Note: mapped atoms run left to right 1,2,3,4…n. Rules change depending on rule size, environment and symmetry. Without explicit H, critical information is lost and incorrect products are generated. With OpenEye ChemTK, SMIRKS operate at 99.2% reliability and functionality. [Figure legend: orange – atom env radius; blue – atom map index]
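Two properties stated on this slide can be checked at the string level: every mapped atom on the left-hand side reappears on the right, and the map indices on the left run 1,2,3,4…n. The sketch below is a plain regular-expression sanity check, assuming map indices always appear as `:N]`. It does not parse chemistry and is not the canonicalisation algorithm itself; the function names are illustrative.

```python
import re

# Matches the atom map index in tokens such as [O:1] or [C;H2:7].
MAP_INDEX = re.compile(r":(\d+)\]")


def map_indices(side: str) -> list:
    """Atom map indices in the order they appear in one side of a SMIRKS."""
    return [int(index) for index in MAP_INDEX.findall(side)]


def maps_are_conserved(smirks: str) -> bool:
    """True when both sides carry exactly the same set of map indices."""
    left, right = smirks.split(">>")
    return sorted(map_indices(left)) == sorted(map_indices(right))


def left_maps_run_in_order(smirks: str) -> bool:
    """True when the left-hand side is numbered 1, 2, 3, ... n from left to right."""
    left = map_indices(smirks.split(">>")[0])
    return left == list(range(1, len(left) + 1))


three_atom_rule = (
    "[O:1]([H])[c:2]([c:3])[n:4]>>"
    "[c:3][c:2]([n:4])[O:1][C]([H])([H])[C]([H])([H])([H])"
)
print(maps_are_conserved(three_atom_rule), left_maps_run_in_order(three_atom_rule))
```

A check like this only catches bookkeeping errors. Whether a SMIRKS generates the correct product still has to be tested with a chemistry toolkit.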
  • 16. Reaction SMARTS variation (for RDKit) (CHEMBL309689, CHEMBL2331793)
      3-Atom rule:
      [O;H1:1][c:2]([c:3])[n:4]>>[c:3][c:2]([n:4])[O;H0:1][C;H2][C;H3]
      Hydrogens are within SMARTS (note H0 in product).
      4-Atom rule:
      [O;H1:1][c:2]1[c:3]([c:4][o:5][n:6]1)[C;H2:7]>>[C;H3][C;H2][O;H0:1][c:2]1[c:3]([c:4][o:5][n:6]1)[C;H2:7]
      RDKit canonical reaction SMARTS work for ~95% of examples. Without the hydrogen SMARTS in mappings, critical chemical information is lost and products are not formed. [Figure legend: orange – atom env radius; blue – atom map index]
  • 17. How do you find Knowledge? Rule selection 101
      1. Identify and group matching SMIRKS.
      2. Calculate statistical parameters for each unique SMIRKS (n, median, sd, se, n_up/n_down).
      3. Is n ≥ 6? No: not enough data, ignore transformation.
      4. Is the |median| ≤ 0.05 and the intercentile range (10-90%) ≤ 0.3? Yes: transformation is classified as ‘neutral’.
      5. No: perform two-tailed binomial test on the transformation to determine the significance of the up/down frequency. Pass: transformation classified as ‘increase’ or ‘decrease’ depending on which direction the property is changing. Fail: transformation classified as ‘NED’ (No Effect Determined).
      [Plot: median data difference (-ve, 0, +ve) with regions Neutral, Increase, Decrease, NED]
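The rule-selection flow can be sketched in a few lines of Python. The thresholds n ≥ 6, |median| ≤ 0.05 and 10-90% range ≤ 0.3 come from the slide. Everything else is an assumption made for this sketch: the branch order read off the flattened flowchart, the 0.05 significance level, the exact binomial test against an equal up/down split, ignoring zero changes in that test, and the percentile interpolation method.

```python
from math import comb
from statistics import median, quantiles

MIN_PAIRS = 6          # from the slide
NEUTRAL_MEDIAN = 0.05  # from the slide
NEUTRAL_SPREAD = 0.3   # from the slide (10-90% intercentile range)
ALPHA = 0.05           # assumption: significance level is not given in the talk


def two_tailed_binomial_p(n_up: int, n_down: int) -> float:
    """Exact two-tailed p-value against a 50:50 up/down split."""
    n = n_up + n_down
    if n == 0:
        return 1.0
    k = max(n_up, n_down)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


def classify(deltas: list) -> str:
    """Classify one transformation from its per-pair property changes."""
    if len(deltas) < MIN_PAIRS:
        return "ignore"
    deciles = quantiles(deltas, n=10, method="inclusive")
    spread = deciles[-1] - deciles[0]  # 90th minus 10th percentile
    if abs(median(deltas)) <= NEUTRAL_MEDIAN and spread <= NEUTRAL_SPREAD:
        return "neutral"
    n_up = sum(d > 0 for d in deltas)
    n_down = sum(d < 0 for d in deltas)
    if two_tailed_binomial_p(n_up, n_down) > ALPHA:
        return "NED"
    return "increase" if n_up > n_down else "decrease"


print(classify([0.4, 0.6, 0.5, 0.7, 0.3, 0.5, 0.6, 0.4]))
```

Under the assumed 0.05 level, six pairs all moving in the same direction give p = 2/64 ≈ 0.031 and can pass, whereas five give p = 2/32 = 0.0625 and cannot.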
  • 18. Barriers Broken to Sharing Knowledge • Data Integrity and curation ✓ • Knowledge extraction algorithms ✓ • Consortium building to share knowledge • Into the minds of chemists
  • 19. Merging knowledge • Use the transforms that are robust in both companies to calibrate assays. • Once the assays are calibrated against each other, the transform data can be combined to build support in poorly exemplified transforms. • Methodology precedented in other fields. [Diagram: Pharma 1 and Pharma 2 transform sets, labelled Calibrate, Robust, Robust, Weak, Weak, Discover, Novel]
  • 20. Merging Datasets • Datasets are standardized by comparison of transformations shared by contributing companies • Transformations are examined at the “pair example” level • Minimum of 6 transformations, each with a minimum of 6 pairs (42 compounds bare minimum), required to standardise • “Calibration factors” extracted to standardize the datasets to a common value – mean of calibration factors 0.94, typical range 0.8-1.2 • Datasets with too few common transformations have standard compound measurements shared for calibration. “Blinded” source of transformations
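The eligibility test on this slide (at least 6 shared transformations, each with at least 6 pairs) is easy to express in code. The slide does not say how a calibration factor is computed, so the sketch below assumes one plausible definition purely for illustration: the through-origin least-squares slope of one company's median changes against the other's. The data layout, function names and example numbers are all hypothetical.

```python
MIN_SHARED_TRANSFORMS = 6    # from the slide
MIN_PAIRS_PER_TRANSFORM = 6  # from the slide


def shared_robust_transforms(stats_a: dict, stats_b: dict) -> list:
    """Keys present in both datasets with enough pairs in each.

    stats_*: dict mapping a transform key to (median_change, n_pairs).
    """
    return sorted(
        key
        for key in stats_a.keys() & stats_b.keys()
        if stats_a[key][1] >= MIN_PAIRS_PER_TRANSFORM
        and stats_b[key][1] >= MIN_PAIRS_PER_TRANSFORM
    )


def calibration_factor(stats_a: dict, stats_b: dict):
    """Assumed definition: through-origin slope of B's medians on A's medians."""
    shared = shared_robust_transforms(stats_a, stats_b)
    if len(shared) < MIN_SHARED_TRANSFORMS:
        return None  # too few common transformations to standardise
    sum_ab = sum(stats_a[k][0] * stats_b[k][0] for k in shared)
    sum_aa = sum(stats_a[k][0] ** 2 for k in shared)
    return sum_ab / sum_aa if sum_aa else None


# Hypothetical data: six shared transforms, company 2 reads about 10% lower.
pharma_1 = {f"T{i}": (0.10 * i, 8) for i in range(1, 7)}
pharma_2 = {f"T{i}": (0.09 * i, 7) for i in range(1, 7)}
print(calibration_factor(pharma_1, pharma_2))
```

When the function returns `None`, the slide's fallback applies: standard compound measurements are shared for calibration instead.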
  • 21. Current Knowledge sets – June 2015. Numbers of statistically valid transforms. Grand Rule Database v3
      Grouped Datasets | Number of Rules
      logD7.4 | 153449
      Merged solubility | 46655
      In vitro microsomal clearance: Human, rat, mouse, cyno, dog | 88423
      In vitro hepatocyte clearance: Human, rat, mouse, cyno, dog | 26627
      MDCK permeability A-B / B-A efflux | 1852
      Cytochrome P450 inhibition: 2C9, 2D6, 3A4, 2C19, 1A2 | 40605
      Cardiac ion channels: NaV 1.5, hERG ion channel inhibition | 15636
      Glutathione Stability | 116
      Plasma protein or albumin binding: Human, rat, mouse, cyno, dog | 64622
  • 22. Barriers Broken to Sharing Knowledge • Data Integrity and curation ✓ • Knowledge extraction algorithms ✓ • Consortium building to share knowledge ✓ • Into the minds of chemists ✓ Grand Rule Database v3, MCPairs
  • 23. Exploiting Knowledge for Compound Optimization. [Diagram elements: Measured Data; rule finder (MCPairs); Exploitable Knowledge; MCExpert System; Problem molecule; New molecule suggestions] “...it’s like asking 150 of your peers for ideas in just a few seconds” – AZ Principal Scientist
  • 24. “It’s like asking 150 of your peers for ideas…” Suggested molecules with “heat map” of Rules for 26 in-vitro endpoints. “…an MPO Gold Mine” – Roche consortium member. Ask for a demo
  • 25. More examples of Success. MedChemica | 2016 ACS Philadelphia 2016
      Fix hERG problem whilst maintaining potency. Waring et al, Med. Chem. Commun., (2011), 2, 775. Glucokinase Activators
      MMPA ∆pEC50: -0.1, ∆logD: -0.6, ∆hERG pIC50: -0.5 (n=33, n=32, n=22)
      MMPA ∆pEC50: +0.3, ∆logD: +0.3, ∆hERG pIC50: -0.3 (n=20, n=23, n=19)
      MMPA ∆pEC50: -0.1, ∆logD: -0.6, ∆hERG pIC50: -0.5 (n=27, n=27, n=7)
      A Less Simple Example: increase logD and gain solubility
      Question: What is the effect on lipophilicity and solubility? Roche data is inconclusive! (2 pairs for logD, 1 pair for solubility)
      Available Statistics:
      Property | Number of Observations | Direction | Mean Change | Probability
      logD | 8 | Increase | 1.2 | 100%
      Log(Solubility) | 14 | Increase | 1.4 | 92%
      Roche Example: logD = 2.65, Kinetic solubility = 84 µg/ml, IC50 SST5 = 0.8 µM; logD = 3.63, Kinetic solubility = >452 µg/ml, IC50 SST5 = 0.19 µM
      Thompson, M.J. et al. J. Med. Chem., 2015, 58 (23), pp 9309–9333. DOI: 10.1021/acs.jmedchem.5b01312
  • 26. Collaborators and Users Survey - 17 out of 19 organisations said the GRD aided project progression
  • 27. Can we understand efflux? – MDR1 / PGP. Metrabase, http://www-metrabase.ch.cam.ac.uk: 1911 compounds, substrate Y/N. Pair Finding, Rule Finding, Property analysis. MDR1 substrate = ↑ hydrogen bond donors, ↑ hydrogen bond acceptors, ↑ PSA. Only 826 compound pairs; 1 “borderline” rule. Public transporter data: • Not quantitative • Not enough • Too diverse • Trivial conclusions. See also: Drug Discov Today. 2012 Apr;17(7-8):343-51. doi: 10.1016/j.drudis.2011.11.003
  • 28. Global Absorption Analysis by MMPA: combined knowledge from a large number of peer pharma. Secure analysis of in-vitro absorption. Good absorption improves • Efficacy • Safety (lower dose / less off target). Medicinal Chemistry demands answers • Low dose oral bioavailability – how? • MDR1 resistance in oncology • Brain penetration for CNS diseases • Only rough and ready “rules” available – trial and error victims. Problem • These expensive assays preclude ANY one company having enough knowledge • Extreme paucity of data in literature. Solution • ≥10 companies’ worth of data • A typical pharma has ~10000 results • MedChemica has the technology and standardization to perform this analysis. MedChemica’s offer: $13600 per organisation to produce a new absorption database. Absorption Rule Database v1
  • 29. Barriers Broken to Sharing Knowledge • Data Integrity and curation ✓ • Knowledge extraction algorithms ✓ • Consortium building to share knowledge ✓ • Into the minds of chemists ✓ Grand Rule Database v3, MCPairs
  • 30. Key findings: • Secure sharing of large scale ADMET knowledge between large Pharma is possible • The collaboration generated great synergy • Standardisation of Units, Species, Assays, MMPA environment and Canonical SMIRKS enabled sharing • MMP is a great tool for idea generation • The rules have been used in drug-discovery projects and yield a clear business case for sharing
  • 31. A Collaboration of the willing Craig Bruce OE John Cumming Roche David Cosgrove C4XD Andy Grant★ Martin Harrison Elixir Huw Jones Base360 Al Rabow Consulting David Riley AZ Graeme Robb AZ Attilla Ting AZ Howard Tucker retired Dan Warner Myjar Steve St-Galley Syngenta David Wood JDR Lauren Reid MedChemica Shane Monague MedChemica Jessica Stacey MedChemica Andy Barker Consulting Pat Barton AZ Andy Davis AZ Andrew Griffin Elixir Phil Jewsbury AZ Mike Snowden AZ Peter Sjo AZ Martin Packer AZ Manos Perros Entasis Therapeutics Nick Tomkinson AZ Martin Stahl Roche Jerome Hert Roche Martin Blapp Roche Torsten Schindler Roche Paula Petrone Roche Christian Kramer Roche Jeff Blaney Genentech Hao Zheng Genentech Slaton Lipscomb Genentech Alberto Gobbi Genentech