SlideShare a Scribd company logo
1 of 30
Download to read offline
Joint CICAG and Cambridge Cheminformatics Network Meeting
19th Feb 2014

Using Matched Molecular Series as
a Predictive Tool To Optimize
Biological Activity
Noel O’Boyle and Roger Sayle
NextMove Software

Jonas Boström and Adrian Gill
AstraZeneca
Matched pairs &
series
Matched (Molecular) Pairs

1.6

[Cl, F]

3.5
Coined by Kenny and Sadowski in 2005*
Easier to predict differences in the values of a property
than it is to predict the value itself
* Chemoinformatics in drug discovery, Wiley, 271–285.
Matched Pair usage
• Successfully used for:
– Rationalising and predicting physicochemical
property changes
– Finding bioisosteres

• Not very successful in improving activity
– Activity changes dependent on binding environment

• Various approaches to address this
– Incorporate atom environment (WizePairZ and
Papadatos et al JCIM, 2010, 50, 1872)
– Incorporate protein environment (VAMMPIRE
and 3D Matched Pairs)
Looking beyond matched Pairs
• Consider the following ‘trivial’ inference
– If we know that [Cl>F] in a particular case, it would
increase the likelihood that [Br>F]

• Using known orderings of matched pairs, we
can make improved inferences about other
matched pairs
– Not captured by matched pair analysis

• Matched (Molecular) Series
Matched SERIES of LENGTH 2 = MP

1.6

3.5

[Cl, F]
Matched Series of length 3

1.6

3.5

2.1

[Cl, F, NH2]
Matched Series Literature

• “Matching molecular series” introduced by
Wawer and Bajorath JMC 2011, 54, 2944
– Subsequent papers use MMS to investigate SAR
transfer, mechanism hopping, visualisation of SAR
networks and SAR matrices

• Only a single other paper on MMS
– Mills et al Med Chem Commun 2012, 3, 174
Algorithm to find matched Series
Index
(Scaffold)
Fragment

Matched
Series

+
+

Index
Collate

+

• Hussain and Rea JCIM 2010, 50, 339
– Fragment molecules at acyclic single bonds
• Single-cut only, scaffold >= 5, R group <= 12

– Index each fragment based on the other
– A matched series will be indexed together

Matched
Series
dATASET
Matched series from ChEMBL16 IC50 binding assays
N=2: 211,989
N=3: 52,341
N=4: 24,426
N=5: 13,792
N=6:
9,197
SAR Transfer
CHEMBL768956
COX-2 inhibition

CHEMBL772766
COX-1 inhibition

R Group

CHEMBL768956 (pIC50)

CHEMBL772766 (pIC50)

SMe

??

5.92

NH2

??

OMe

6.68

Me

6.10

4.82

Cl

5.92

4.75

F

5.82

4.59

Et

5.81

4.54

CF3

5.70

<4.00

H

5.62

4.26

COOH

4.23

<3.60

Rank order

5.88

Potential SAR
transfer

5.59

0.93 rank order
correlation
Strengths and weaknesses
• High confidence in predictions if sufficiently
long series with correlated activities (or their
rank order)
– Not always able to find such a series
– For short series will typically find 10s/100s/1000s
of matching series with low confidence

• Suited to pairwise comparison within focused
dataset
– Dense SAR matrix from target with well-explored
SAR
Preferred orders
in matched series
Preferred orders: Halides (N=2)
For an ordered matched series (i.e. A>B>C>…), there are
N! ways of arranging the R Groups:
Series

Observations*

F>H

8250

H>F

7338

Would expect 7794 for each assuming the order is
random
– We can calculate enrichment
*Dataset is ChEMBL16 IC50 data for binding assays (transformed to pIC50 values)
Preferred orders: Halides (N=2)
For an ordered matched series (i.e. A>B>C>…), there are
N! ways of arranging the R Groups:
Series

Enrichment

Observations

F>H

1.06*

8250

H>F

0.94*

7338

Would expect 7794 for each assuming the order is
random
– We can calculate enrichment
*Significant at 0.05 level according to binomial test after correcting for
multiple testing (Bonferroni with N-1)
Preferred orders: Halides (N=3)
Series

Enrichment

Observations

Cl > F > H

1.85*

1185

H > F > Cl

1.08

690

F > Cl > H

0.88*

566

Cl > H > F

0.79*

504

F > H > Cl

0.78*

503

H > Cl > F

0.63*

401
Preferred orders: Halides (N=4)
Series

Enrichment

Observations

Br > Cl > F > H

5.62*

230

Cl > Br > F > H

2.79*

114

H > F > Cl > Br

1.69*

69

F > Cl > Br > H

1.47

60

Br > Cl > H > F

1.39

57

Cl > Br > H > F

0.88

36

…

…

…

H > F > Br > Cl

0.73

30

…

…

…

Cl > H > F > Br

0.49*

20

H > Br > F > Cl

0.49*

20

Cl > H > Br > F

0.46*

19

Br > F > H > Cl

0.44*

18

H > Cl > Br > F

0.44*

18

F > H > Br > Cl

0.42*

17

H > Cl > F > Br

0.37*

15

F > Br > H > Cl

0.34*

14

Br > H > F > Cl

0.22*

9

N=2: Max = 1.06, Min = 0.94
N=3: Max = 1.85, Min = 0.63
N=4: Max = 5.62, Min = 0.22
Longer series exhibit greater
preferences

If [H>F>Cl] is observed, will Br
increase activity further?
128 observations of [H>F>Cl]
but only 9 where [Br>H>F>Cl]

Don’t forget sampling bias
Matsy:
Prediction using
Matched Series
Find R Groups that increase activity
Query
A>B

R Group

Observations

D
E
C
…

3
1
4
…

Obs that
increase
activity
3
1
1

A>B>C
C>A>B
D>A>B>C
D>A>C>B
E>D>A>B
…
% that
increase
activity
100
100
25
…
Example
Query:
R Group

>

>

Observations

% that increase
activity

53

75

28

71

22

63

41

58

36

58

40 proteins including:
22 GPCRs (muscarinic
acetylcholine, glucagon,
endothelin, angiotensin)
5 oxidoreductases
(cytochrome P450,
cyclooxygenase)
3 acyltransferases
3 hydrolases
Example
Query:
R Group

>

>

Observations

% that increase
activity

23

39

24

37

97

35

21

33

21

33

9 proteins including:
3 proteases (HIV-1, cathepsin K)
2 kinases (serine/threonine
protein kinase ATR, CDK2)
1 GPCR
CHEMBL1953234
PARP-1 inhibition (Poly[ADP-Ribose] Polymerase 1)
[Me>Cl>H>F>CF3]

R

Remove most active and predict:
[?>Cl>H>F>CF3]

Prediction ranked Me as 2nd most likely, on the basis of 23
observations of which 7 (30%) showed improvement

R

CHEMBL956577
Inverse agonist at Histamine H3
receptor
[Me>Cl>H>F>CF3]
Topliss Decision
Tree
Rational Stepwise scheme for
Substituted Phenyl

Topliss, J. G. Utilization of Operational Schemes for Analog
Synthesis in Drug Design. J. Med. Chem. 1972, 15, 1006–1011.
Data-Driven Stepwise scheme for
Substituted Phenyl

Using Matsy and ChEMBL 16 IC50 binding data
DEMO of drag-and-drop interface
In summary
• Longer matched series (N>2) show an increased
preference for particular activity orders
• This can be exploited to predict R groups that
will increase activity
– Predictions are typically based on data from a range
of targets and structures

• Completely knowledge-based
– Can link predictions to particular targets/structures
– Predictions refined based on new results
– Data-hungry
Using Matched Molecular Series as a
Predictive Tool To Optimize Biological
Activity
http://nextmovesoftware.com
noel@nextmovesoftware.com
@nmsoftware

More Related Content

Viewers also liked

David Evans, Eli-Lilly, 'Field-Aligned Matched Pairs'
David Evans, Eli-Lilly, 'Field-Aligned Matched Pairs'David Evans, Eli-Lilly, 'Field-Aligned Matched Pairs'
David Evans, Eli-Lilly, 'Field-Aligned Matched Pairs'Cresset
 
Representation and display of non-standard peptides using semi-systematic ami...
Representation and display of non-standard peptides using semi-systematic ami...Representation and display of non-standard peptides using semi-systematic ami...
Representation and display of non-standard peptides using semi-systematic ami...NextMove Software
 
Chemistry and reactions from non-US patents
Chemistry and reactions from non-US patentsChemistry and reactions from non-US patents
Chemistry and reactions from non-US patentsNextMove Software
 
Peptide Informatics - Bridging the gap between small-molecule and large-molec...
Peptide Informatics - Bridging the gap between small-molecule and large-molec...Peptide Informatics - Bridging the gap between small-molecule and large-molec...
Peptide Informatics - Bridging the gap between small-molecule and large-molec...NextMove Software
 
Cursuri open septembrie
Cursuri open septembrieCursuri open septembrie
Cursuri open septembrieCatalin Lazar
 
Heat transfer by evaporation
Heat transfer by evaporationHeat transfer by evaporation
Heat transfer by evaporationsherinshaju
 
Planificacion de riesgos._esquema
Planificacion de riesgos._esquemaPlanificacion de riesgos._esquema
Planificacion de riesgos._esquemaBryan Ryan
 
Everything You Need To Know About Creating and Maintaining Donor Relationships
Everything You Need To Know About Creating and Maintaining Donor RelationshipsEverything You Need To Know About Creating and Maintaining Donor Relationships
Everything You Need To Know About Creating and Maintaining Donor RelationshipsBloomerang
 
communication design philosophy
communication design philosophycommunication design philosophy
communication design philosophyYoshinori Kawamura
 

Viewers also liked (15)

David Evans, Eli-Lilly, 'Field-Aligned Matched Pairs'
David Evans, Eli-Lilly, 'Field-Aligned Matched Pairs'David Evans, Eli-Lilly, 'Field-Aligned Matched Pairs'
David Evans, Eli-Lilly, 'Field-Aligned Matched Pairs'
 
Representation and display of non-standard peptides using semi-systematic ami...
Representation and display of non-standard peptides using semi-systematic ami...Representation and display of non-standard peptides using semi-systematic ami...
Representation and display of non-standard peptides using semi-systematic ami...
 
Chemistry and reactions from non-US patents
Chemistry and reactions from non-US patentsChemistry and reactions from non-US patents
Chemistry and reactions from non-US patents
 
InChI for Large Molecules
InChI for Large MoleculesInChI for Large Molecules
InChI for Large Molecules
 
Peptide Informatics - Bridging the gap between small-molecule and large-molec...
Peptide Informatics - Bridging the gap between small-molecule and large-molec...Peptide Informatics - Bridging the gap between small-molecule and large-molec...
Peptide Informatics - Bridging the gap between small-molecule and large-molec...
 
XP再解釈
XP再解釈XP再解釈
XP再解釈
 
Cursuri open septembrie
Cursuri open septembrieCursuri open septembrie
Cursuri open septembrie
 
Fases Tecnológicas
Fases TecnológicasFases Tecnológicas
Fases Tecnológicas
 
Heat transfer by evaporation
Heat transfer by evaporationHeat transfer by evaporation
Heat transfer by evaporation
 
Planificacion de riesgos._esquema
Planificacion de riesgos._esquemaPlanificacion de riesgos._esquema
Planificacion de riesgos._esquema
 
Redistribution into OSPF
Redistribution into OSPFRedistribution into OSPF
Redistribution into OSPF
 
Everything You Need To Know About Creating and Maintaining Donor Relationships
Everything You Need To Know About Creating and Maintaining Donor RelationshipsEverything You Need To Know About Creating and Maintaining Donor Relationships
Everything You Need To Know About Creating and Maintaining Donor Relationships
 
communication design philosophy
communication design philosophycommunication design philosophy
communication design philosophy
 
Aggression in health_web
Aggression in health_webAggression in health_web
Aggression in health_web
 
Cursuri open august
Cursuri open augustCursuri open august
Cursuri open august
 

Similar to Using Matched Molecular Series as a Predictive Tool To Optimize Biological Activity

12 ch ken black solution
12 ch ken black solution12 ch ken black solution
12 ch ken black solutionKrunal Shah
 
Prediction Of Bioactivity From Chemical Structure
Prediction Of Bioactivity From Chemical StructurePrediction Of Bioactivity From Chemical Structure
Prediction Of Bioactivity From Chemical StructureJeremy Besnard
 
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSISFUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSISIrene Pochinok
 
Quality perception of coding artifacts and packet loss in networked video com...
Quality perception of coding artifacts and packet loss in networked video com...Quality perception of coding artifacts and packet loss in networked video com...
Quality perception of coding artifacts and packet loss in networked video com...soojin kim
 
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...cscpconf
 
beckerbr_Agilent-users-meeting
beckerbr_Agilent-users-meetingbeckerbr_Agilent-users-meeting
beckerbr_Agilent-users-meetingBridget Becker
 
eggs_project_interm
eggs_project_intermeggs_project_interm
eggs_project_intermRoopan Verma
 
Feature Selection Method Based on Chaotic Maps and Butterfly Optimization Alg...
Feature Selection Method Based on Chaotic Maps and Butterfly Optimization Alg...Feature Selection Method Based on Chaotic Maps and Butterfly Optimization Alg...
Feature Selection Method Based on Chaotic Maps and Butterfly Optimization Alg...Tarek Gaber
 
Affect of Metabolic Obesity and Body Mass Index in Coronary Artery Diseases
Affect of Metabolic Obesity and Body Mass Index in Coronary Artery DiseasesAffect of Metabolic Obesity and Body Mass Index in Coronary Artery Diseases
Affect of Metabolic Obesity and Body Mass Index in Coronary Artery DiseasesNikhil Gupta
 
Open-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databasesOpen-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databasesGreg Landrum
 
Diabetes data - model assessment using R
Diabetes data - model assessment using RDiabetes data - model assessment using R
Diabetes data - model assessment using RGregg Barrett
 
Discovery of Ligand-Protein Interactome
Discovery of Ligand-Protein InteractomeDiscovery of Ligand-Protein Interactome
Discovery of Ligand-Protein InteractomePurdue University
 
Factor Analysis for Exploratory Studies
Factor Analysis for Exploratory StudiesFactor Analysis for Exploratory Studies
Factor Analysis for Exploratory StudiesManohar Pahan
 
The physics of computational drug discovery
The physics of computational drug discoveryThe physics of computational drug discovery
The physics of computational drug discoveryShourjya Sanyal
 
Thesis (presentation)
Thesis (presentation)Thesis (presentation)
Thesis (presentation)nlt2390
 
An Artificial Immune Network for Multimodal Function Optimization on Dynamic ...
An Artificial Immune Network for Multimodal Function Optimization on Dynamic ...An Artificial Immune Network for Multimodal Function Optimization on Dynamic ...
An Artificial Immune Network for Multimodal Function Optimization on Dynamic ...Fabricio de França
 

Similar to Using Matched Molecular Series as a Predictive Tool To Optimize Biological Activity (20)

12 ch ken black solution
12 ch ken black solution12 ch ken black solution
12 ch ken black solution
 
Prediction Of Bioactivity From Chemical Structure
Prediction Of Bioactivity From Chemical StructurePrediction Of Bioactivity From Chemical Structure
Prediction Of Bioactivity From Chemical Structure
 
201977 1-1-3-pb
201977 1-1-3-pb201977 1-1-3-pb
201977 1-1-3-pb
 
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSISFUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
 
Quality perception of coding artifacts and packet loss in networked video com...
Quality perception of coding artifacts and packet loss in networked video com...Quality perception of coding artifacts and packet loss in networked video com...
Quality perception of coding artifacts and packet loss in networked video com...
 
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...
 
beckerbr_Agilent-users-meeting
beckerbr_Agilent-users-meetingbeckerbr_Agilent-users-meeting
beckerbr_Agilent-users-meeting
 
eggs_project_interm
eggs_project_intermeggs_project_interm
eggs_project_interm
 
Feature Selection Method Based on Chaotic Maps and Butterfly Optimization Alg...
Feature Selection Method Based on Chaotic Maps and Butterfly Optimization Alg...Feature Selection Method Based on Chaotic Maps and Butterfly Optimization Alg...
Feature Selection Method Based on Chaotic Maps and Butterfly Optimization Alg...
 
Affect of Metabolic Obesity and Body Mass Index in Coronary Artery Diseases
Affect of Metabolic Obesity and Body Mass Index in Coronary Artery DiseasesAffect of Metabolic Obesity and Body Mass Index in Coronary Artery Diseases
Affect of Metabolic Obesity and Body Mass Index in Coronary Artery Diseases
 
Estimating a Population Mean
Estimating a Population MeanEstimating a Population Mean
Estimating a Population Mean
 
Open-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databasesOpen-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databases
 
Diabetes data - model assessment using R
Diabetes data - model assessment using RDiabetes data - model assessment using R
Diabetes data - model assessment using R
 
Discovery of Ligand-Protein Interactome
Discovery of Ligand-Protein InteractomeDiscovery of Ligand-Protein Interactome
Discovery of Ligand-Protein Interactome
 
Unit 4.pptx
Unit 4.pptxUnit 4.pptx
Unit 4.pptx
 
Factor Analysis for Exploratory Studies
Factor Analysis for Exploratory StudiesFactor Analysis for Exploratory Studies
Factor Analysis for Exploratory Studies
 
The physics of computational drug discovery
The physics of computational drug discoveryThe physics of computational drug discovery
The physics of computational drug discovery
 
Jmessp13420104
Jmessp13420104Jmessp13420104
Jmessp13420104
 
Thesis (presentation)
Thesis (presentation)Thesis (presentation)
Thesis (presentation)
 
An Artificial Immune Network for Multimodal Function Optimization on Dynamic ...
An Artificial Immune Network for Multimodal Function Optimization on Dynamic ...An Artificial Immune Network for Multimodal Function Optimization on Dynamic ...
An Artificial Immune Network for Multimodal Function Optimization on Dynamic ...
 

More from NextMove Software

CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...NextMove Software
 
Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...NextMove Software
 
CINF 35: Structure searching for patent information: The need for speed
CINF 35: Structure searching for patent information: The need for speedCINF 35: Structure searching for patent information: The need for speed
CINF 35: Structure searching for patent information: The need for speedNextMove Software
 
A de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESA de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESNextMove Software
 
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionRecent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionNextMove Software
 
Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...NextMove Software
 
Comparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsComparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsNextMove Software
 
Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...NextMove Software
 
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...NextMove Software
 
Recent improvements to the RDKit
Recent improvements to the RDKitRecent improvements to the RDKit
Recent improvements to the RDKitNextMove Software
 
Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...NextMove Software
 
Digital Chemical Representations
Digital Chemical RepresentationsDigital Chemical Representations
Digital Chemical RepresentationsNextMove Software
 
Challenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptionsChallenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptionsNextMove Software
 
PubChem as a Biologics Database
PubChem as a Biologics DatabasePubChem as a Biologics Database
PubChem as a Biologics DatabaseNextMove Software
 
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...NextMove Software
 
CINF 13: Pistachio - Search and Faceting of Large Reaction Databases
CINF 13: Pistachio - Search and Faceting of Large Reaction DatabasesCINF 13: Pistachio - Search and Faceting of Large Reaction Databases
CINF 13: Pistachio - Search and Faceting of Large Reaction DatabasesNextMove Software
 
Building on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfilesBuilding on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfilesNextMove Software
 
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...NextMove Software
 
Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)NextMove Software
 

More from NextMove Software (20)

DeepSMILES
DeepSMILESDeepSMILES
DeepSMILES
 
CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...
 
Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...
 
CINF 35: Structure searching for patent information: The need for speed
CINF 35: Structure searching for patent information: The need for speedCINF 35: Structure searching for patent information: The need for speed
CINF 35: Structure searching for patent information: The need for speed
 
A de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESA de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILES
 
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionRecent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
 
Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...
 
Comparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsComparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule Implementations
 
Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...
 
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
 
Recent improvements to the RDKit
Recent improvements to the RDKitRecent improvements to the RDKit
Recent improvements to the RDKit
 
Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...
 
Digital Chemical Representations
Digital Chemical RepresentationsDigital Chemical Representations
Digital Chemical Representations
 
Challenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptionsChallenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptions
 
PubChem as a Biologics Database
PubChem as a Biologics DatabasePubChem as a Biologics Database
PubChem as a Biologics Database
 
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
 
CINF 13: Pistachio - Search and Faceting of Large Reaction Databases
CINF 13: Pistachio - Search and Faceting of Large Reaction DatabasesCINF 13: Pistachio - Search and Faceting of Large Reaction Databases
CINF 13: Pistachio - Search and Faceting of Large Reaction Databases
 
Building on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfilesBuilding on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfiles
 
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
 
Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)
 

Recently uploaded

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

Using Matched Molecular Series as a Predictive Tool To Optimize Biological Activity

  • 1. Joint CICAG and Cambridge Cheminformatics Network Meeting 19th Feb 2014 Using Matched Molecular Series as a Predictive Tool To Optimize Biological Activity Noel O’Boyle and Roger Sayle NextMove Software Jonas Boström and Adrian Gill AstraZeneca
  • 3. Matched (Molecular) Pairs 1.6 [Cl, F] 3.5 Coined by Kenny and Sadowski in 2005* Easier to predict differences in the values of a property than it is to predict the value itself * Chemoinformatics in drug discovery, Wiley, 271–285.
  • 4. Matched Pair usage • Successfully used for: – Rationalising and predicting physicochemical property changes – Finding bioisosteres • Not very successful in improving activity – Activity changes dependent on binding environment • Various approaches to address this – Incorporate atom environment (WizePairZ and Papadatos et al JCIM, 2010, 50, 1872) – Incorporate protein environment (VAMMPIRE and 3D Matched Pairs)
  • 5. Looking beyond matched Pairs • Consider the following ‘trivial’ inference – If we know that [Cl>F] in a particular case, it would increase the likelihood that [Br>F] • Using known orderings of matched pairs, we can make improved inferences about other matched pairs – Not captured by matched pair analysis • Matched (Molecular) Series
  • 6. Matched SERIES of LENGTH 2 = MP 1.6 3.5 [Cl, F]
  • 7. Matched Series of length 3 1.6 3.5 2.1 [Cl, F, NH2]
  • 8. Matched Series Literature • “Matching molecular series” introduced by Wawer and Bajorath JMC 2011, 54, 2944 – Subsequent papers use MMS to investigate SAR transfer, mechanism hopping, visualisation of SAR networks and SAR matrices • Only a single other paper on MMS – Mills et al Med Chem Commun 2012, 3, 174
  • 9. Algorithm to find matched Series Index (Scaffold) Fragment Matched Series + + Index Collate + • Hussain and Rea JCIM 2010, 50, 339 – Fragment molecules at acyclic single bonds • Single-cut only, scaffold >= 5, R group <= 12 – Index each fragment based on the other – A matched series will be indexed together Matched Series
  • 10. dATASET Matched series from ChEMBL16 IC50 binding assays N=2: 211,989 N=3: 52,341 N=4: 24,426 N=5: 13,792 N=6: 9,197
  • 12. CHEMBL768956 COX-2 inhibition CHEMBL772766 COX-1 inhibition R Group CHEMBL768956 (pIC50) CHEMBL772766 (pIC50) SMe ?? 5.92 NH2 ?? OMe 6.68 Me 6.10 4.82 Cl 5.92 4.75 F 5.82 4.59 Et 5.81 4.54 CF3 5.70 <4.00 H 5.62 4.26 COOH 4.23 <3.60 Rank order 5.88 Potential SAR transfer 5.59 0.93 rank order correlation
  • 13. Strengths and weaknesses • High confidence in predictions if sufficiently long series with correlated activities (or their rank order) – Not always able to find such a series – For short series will typically find 10s/100s/1000s of matching series with low confidence • Suited to pairwise comparison within focused dataset – Dense SAR matrix from target with well-explored SAR
  • 15. Preferred orders: Halides (N=2) For an ordered matched series (i.e. A>B>C>…), there are N! ways of arranging the R Groups: Series Observations* F>H 8250 H>F 7338 Would expect 7794 for each assuming the order is random – We can calculate enrichment *Dataset is ChEMBL16 IC50 data for binding assays (transformed to pIC50 values)
  • 16. Preferred orders: Halides (N=2) For an ordered matched series (i.e. A>B>C>…), there are N! ways of arranging the R Groups: Series Enrichment Observations F>H 1.06* 8250 H>F 0.94* 7338 Would expect 7794 for each assuming the order is random – We can calculate enrichment *Significant at 0.05 level according to binomial test after correcting for multiple testing (Bonferroni with N-1)
  • 17. Preferred orders: Halides (N=3) Series Enrichment Observations Cl > F > H 1.85* 1185 H > F > Cl 1.08 690 F > Cl > H 0.88* 566 Cl > H > F 0.79* 504 F > H > Cl 0.78* 503 H > Cl > F 0.63* 401
  • 18. Preferred orders: Halides (N=4) Series Enrichment Observations Br > Cl > F > H 5.62* 230 Cl > Br > F > H 2.79* 114 H > F > Cl > Br 1.69* 69 F > Cl > Br > H 1.47 60 Br > Cl > H > F 1.39 57 Cl > Br > H > F 0.88 36 … … … H > F > Br > Cl 0.73 30 … … … Cl > H > F > Br 0.49* 20 H > Br > F > Cl 0.49* 20 Cl > H > Br > F 0.46* 19 Br > F > H > Cl 0.44* 18 H > Cl > Br > F 0.44* 18 F > H > Br > Cl 0.42* 17 H > Cl > F > Br 0.37* 15 F > Br > H > Cl 0.34* 14 Br > H > F > Cl 0.22* 9 N=2: Max = 1.06, Min = 0.94 N=3: Max = 1.85, Min = 0.63 N=4: Max = 5.62, Min = 0.22 Longer series exhibit greater preferences If [H>F>Cl] is observed, will Br increase activity further? 128 observations of [H>F>Cl] but only 9 where [Br>H>F>Cl] Don’t forget sampling bias
  • 20. Find R Groups that increase activity Query A>B R Group Observations D E C … 3 1 4 … Obs that increase activity 3 1 1 A>B>C C>A>B D>A>B>C D>A>C>B E>D>A>B … % that increase activity 100 100 25 …
  • 21. Example Query: R Group > > Observations % that increase activity 53 75 28 71 22 63 41 58 36 58 40 proteins including: 22 GPCRs (muscarinic acetylcholine, glucagon, endothelin, angiotensin) 5 oxidoreductases (cytochrome P450, cyclooxygenase) 3 acyltransferases 3 hydrolases
  • 22. Example Query: R Group > > Observations % that increase activity 23 39 24 37 97 35 21 33 21 33 9 proteins including: 3 proteases (HIV-1, cathepsin K) 2 kinases (serine/threonine protein kinase ATR, CDK2) 1 GPCR
  • 23. CHEMBL1953234 PARP-1 inhibition (Poly[ADP-Ribose] Polymerase 1) [Me>Cl>H>F>CF3] R Remove most active and predict: [?>Cl>H>F>CF3] Prediction ranked Me as 2nd most likely, on the basis of 23 observations of which 7 (30%) showed improvement R CHEMBL956577 Inverse agonist at Histamine H3 receptor [Me>Cl>H>F>CF3]
  • 25. Rational Stepwise scheme for Substituted Phenyl Topliss, J. G. Utilization of Operational Schemes for Analog Synthesis in Drug Design. J. Med. Chem. 1972, 15, 1006–1011.
  • 26.
  • 27. Data-Driven Stepwise scheme for Substituted Phenyl Using Matsy and ChEMBL 16 IC50 binding data
  • 29. In summary • Longer matched series (N>2) show an increased preference for particular activity orders • This can be exploited to predict R groups that will increase activity – Predictions are typically based on data from a range of targets and structures • Completely knowledge-based – Can link predictions to particular targets/structures – Predictions refined based on new results – Data-hungry
  • 30. Using Matched Molecular Series as a Predictive Tool To Optimize Biological Activity http://nextmovesoftware.com noel@nextmovesoftware.com @nmsoftware