SlideShare a Scribd company logo
1 of 25
Download to read offline
Standardized Representations of
ELN Reactions for Categorization
and Duplicate/Variation
Identification
Roger Sayle and daniel lowe
NextMove Software, Cambridge, UK
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
Overview
• Electronic Lab Notebooks (ELNs) are a rich source of
experimental observations of the synthetic methods
used by the pharmaceutical industry, especially failed
and low yield reactions.
• Recent successes in data mining this information
have provided insights to improving the productivity
and reducing costs in medicinal chemistry programs.
• This presentation describes the challenge of a small
but vital aspect of this process: reaction pivoting.
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
Context: the hazelnut pipeline
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
HazELNut Filbert NameRXN Cobnut
Accelrys
Pipeline Pilot
(AstraZeneca, AbbVie
& Hoffmann-La Roche)
ChemAxon
JChem Cartridge
(GlaxoSmithKline
& Novartis)
Elsevier Reaxys
(Hoffmann-La Roche,
AstraZeneca)
Perkin Elmer Informatics
(formerly CambridgeSoft)
eNotebook v9, v11 or v13
Oracle Server
version 10 or 11
Microsoft Windows or Linux
The pharmaceutical industry is increasingly making use
of reaction data warehouses from ELN data to better
share and learn from the experience of in-house and
CRO chemists.
Example (actionable) insight
The clear trend between Suzuki coupling success rate and
predicted octanol-water partition co-efficient.
Data: Pickett et al., ACS Med. Chem. Lett. 2(1):28, 2011
97 232
340
525
474 202
63
55 119 127 141 80 36 8
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
<1.0 1.0-2.0 2.0-3.0 3.0-4.0 4.0-5.0 5.0-6.0 >6.0
Reaction Product Predicted cLogP
Sucessful Reactions Failed Reactions
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
big data mining confirmation of
Nadine-churcher hypothesis
On 16,335 Suzuki coupling reactions extracted from US
patent applications between 2001 and 2012.
Nadine, Hattotuwagama and Churcher ,“Lead-Oriented Synthesis: A New Opportunity for
Synthetic Chemistry”, Angew. Chem. Int. Ed, 51:1114 2012.
LogP Mean Yield N Obs
< 1.0 52.89% 196
1.0 – 2.0 56.02% 1155
2.0 – 3.0 56.72% 2881
3.0 – 4.0 58.14% 4071
4.0 – 5.0 57.26% 3186
5.0 – 6.0 59.25% 2126
> 6.0 63.83% 2720
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
categorization of ELN reactions
1. J. Carey, D. Laffan, C. Thomson, M. Williams, Org. Biomol. Chem. 2337, 2006.
2. S. Roughley and A. Jordan, J. Med. Chem. 54:3451-3479, 2011.
34%
17%
5%
2%
3%
6%
10%
1%
15%
2%
5% Heteroatom alkylation and arylation
Acylation and related processes
C-C bond formations
Heterocycle formation
Protections
Deprotections
Reductions
Oxidations
Functional group conversion
Functional group addition
Resolution
Problem summary
• Categorizing reactions and observing trends requires
transforming data from the representations used in
electronic lab notebooks, which are primarily
designed to reliably and efficiently capture
intellectual property, into “cleaned up” forms more
suitable for analysis and analytics.
• Often overlooked is the “pivoting” from document or
experiment-based view of data to the de-duplicated
reaction (and variation) view of the same data.
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
simplest example of duplication
• In a “declassified” subset of 18237 reactions from
AstraZeneca’s medchem ELN, there only 11451
unique experiments (63%), without any reaction
normalization.
• The most duplicated reaction (41 times) is the
Buchwald-Hartwig amination below.
Reaction variations in reaxys
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
When is a million really a million?
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
Although this data set contains
1,196,165 reaction instances,
there are only 744,043 unique
reactions, even without any
structure or role normalization.
http://nextmovesoftware.com/blog
Complication #1: molecule
normalization
• Some duplicates are caused by alternate chemistry
representations requiring normalization of SMILES.
• This problems can be solved by Reaction InChIs.
• EN01585-15
• EN01995-47
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
Pharmaceutical patent example
US2002/0103226 [Merck] US2008/0139505 [Lilly]
Phosphodiesterase-4 inhibitors Glutamate receptor potentiators
[0398] Step 2 (Scheme 3) Preparation 168
(4-methoxyphenoxy)acetamide oxime N-hydroxy-2-(4-methoxy-phenoxy)-acetamidine
Both InChI=1S/C9H12N2O3/c1-13-7-2-4-8(5-3-7)14-6-9(10)11-12/h2-5,12H,6H2,1H3,(H2,10,11)
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
Complication #2: superatoms
becomes
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
Complication #3: correcting
chemdraw’s superatom expansion
Chemist drew Chemist Intended Chemist got
DMF
K2CO3
HBTU
mCPBA
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
Complication #4: reaction role
normalization
• Some duplicates are caused by inconsistent reaction
roles (reactants vs. agents) in the chemist’s sketch.
• EN00104-06
• EN00104-47
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
What is Atom-Mapping?
Mapping
algorithm
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
Atom-Mapping for reaction role
identification/normalization
• Atom mappings can distinguish reactants from
solvents and catalysts, by whether they
contribute atoms to the product(s).
but this requires reasonable atom mapping…
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
Atom mapping +classification
0
10
20
30
40
50
60
70
80
90
100
Atom mapping
algorithms alone
Combined with
NameRXN
Percentofreactionswithallproduct
atomsmapped
Marvin 6.0
ChemDraw 12
Consensus
Result
Verified /
Recognised
by
NameRXN
(71%)
Complication #5: eln sections
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
Would the full ELN reaction please stand up?
Complication #6: agents matter
From http://en.wikipedia.org/wiki/Diazonium_compound
Sandmeyer reaction
Benzenediazonium chloride heated with cuprous chloride
disolved in HCl to yield chlorobenzene.
C6H5N2
+ + CuCl → C6H5Cl + N2 + Cu+
Gatterman reaction
Benzenediazonium chloride is warmed with copper powder and
HCl to yield chlorobenzene.
C6H5N2
+ + CuCl → C6H5Cl + N2 + Cu+
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
AZ examples with same parent
• EN01325-20
• EN01325-22
• EN01325-25
• EN01325-27
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
examples with same grandparent
• EN00930-16
• EN00930-25
• EN00930-60
Results & Lessons learned
• Exact duplicates are extremely common.
– 62.2% of USPTO data set is unique (744043/1196165).
– Some large pharmaceutical ELNs < 50% unique.
• True reaction variations are relatively rare.
– 684688 unique products without normalization.
– 675026 unique tautomer/protomer products (56.4%).
– Over 90% of deduplicated reactions have unique products.
• For many use cases the reaction InChI or the product
InChI may be sufficient for use as a key to join on.
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
conclusions
• Transforming the ELN content into analysis friendly
data sets is perhaps more complicated than it might
first appear.
• This presentation describes the challenge of
“reaction pivoting” and several technical solutions
devised by the authors to address this problem.
• The resulting de-duplicated and categorized
reactions are useful in current work investigating the
influence of reaction conditions on yield, and step
choice and ordering in multi-step syntheses.
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
acknowledgements
• Nick Tomkinson, AstraZeneca R&D, Alderley Park, UK.
• Thierry Kogej, Astra Zeneca R&D, Mölndal, Sweden.
• Mick Kappler, Hoffmann-La Roche, Nutley, NJ, USA.
• Plamen Petrov, AstraZeneca R&D, Mölndal, Sweden.
• Colin Batchelor, Royal Society of Chemistry, UK.
• Grégori Gerebtzoff, Hoffmann-La Roche, Basel, Switzerland.
• Thank you for you time. Questions?
247th ACS National Meeting, Dallas, TX, Monday 17th March 2014

More Related Content

What's hot

CINF 51: Analyzing success rates of supposedly 'easy' reactions
CINF 51: Analyzing success rates of supposedly 'easy' reactionsCINF 51: Analyzing success rates of supposedly 'easy' reactions
CINF 51: Analyzing success rates of supposedly 'easy' reactionsNextMove Software
 
Automatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patentsAutomatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patentsNextMove Software
 
Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...NextMove Software
 
Unlocking chemical information from tables and legacy articles
Unlocking chemical information from tables and legacy articlesUnlocking chemical information from tables and legacy articles
Unlocking chemical information from tables and legacy articlesNextMove Software
 
CHAS 31: Encoding reactive chemical hazards and incompatibilities in an alert...
CHAS 31: Encoding reactive chemical hazards and incompatibilities in an alert...CHAS 31: Encoding reactive chemical hazards and incompatibilities in an alert...
CHAS 31: Encoding reactive chemical hazards and incompatibilities in an alert...NextMove Software
 
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...NextMove Software
 
Chemistry and reactions from non-US patents
Chemistry and reactions from non-US patentsChemistry and reactions from non-US patents
Chemistry and reactions from non-US patentsNextMove Software
 
Chemical structure representation in PubChem
Chemical structure representation in PubChemChemical structure representation in PubChem
Chemical structure representation in PubChemNextMove Software
 
CINF 18: Wikipedia and Wiktionary as resources for chemical text mining
CINF 18: Wikipedia and Wiktionary as resources for chemical text miningCINF 18: Wikipedia and Wiktionary as resources for chemical text mining
CINF 18: Wikipedia and Wiktionary as resources for chemical text miningNextMove Software
 
Substructure Search Face-off
Substructure Search Face-offSubstructure Search Face-off
Substructure Search Face-offNextMove Software
 
Classification, representation and analysis of cyclic peptides and peptide-li...
Classification, representation and analysis of cyclic peptides and peptide-li...Classification, representation and analysis of cyclic peptides and peptide-li...
Classification, representation and analysis of cyclic peptides and peptide-li...NextMove Software
 
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]NextMove Software
 
Sketchy sketches hiding chemistry in plain sight
Sketchy sketches hiding chemistry in plain sightSketchy sketches hiding chemistry in plain sight
Sketchy sketches hiding chemistry in plain sightNextMove Software
 
Acs 2013 indianapolis_cvsp
Acs 2013 indianapolis_cvspAcs 2013 indianapolis_cvsp
Acs 2013 indianapolis_cvspKen Karapetyan
 
Challenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptionsChallenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptionsNextMove Software
 
Resolving cryptic needles to molecular structures: The GtoPdb experience
Resolving cryptic needles to molecular structures: The GtoPdb experienceResolving cryptic needles to molecular structures: The GtoPdb experience
Resolving cryptic needles to molecular structures: The GtoPdb experienceChris Southan
 
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...Ken Karapetyan
 
Tackling the difficult areas of chemical entity extraction: Misspelt chemical...
Tackling the difficult areas of chemical entity extraction: Misspelt chemical...Tackling the difficult areas of chemical entity extraction: Misspelt chemical...
Tackling the difficult areas of chemical entity extraction: Misspelt chemical...dan2097
 
In grammars we trust: LeadMine, a knowledge driven solution
In grammars we trust: LeadMine, a knowledge driven solutionIn grammars we trust: LeadMine, a knowledge driven solution
In grammars we trust: LeadMine, a knowledge driven solutionNextMove Software
 

What's hot (20)

CINF 51: Analyzing success rates of supposedly 'easy' reactions
CINF 51: Analyzing success rates of supposedly 'easy' reactionsCINF 51: Analyzing success rates of supposedly 'easy' reactions
CINF 51: Analyzing success rates of supposedly 'easy' reactions
 
Automatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patentsAutomatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patents
 
Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...
 
Unlocking chemical information from tables and legacy articles
Unlocking chemical information from tables and legacy articlesUnlocking chemical information from tables and legacy articles
Unlocking chemical information from tables and legacy articles
 
CHAS 31: Encoding reactive chemical hazards and incompatibilities in an alert...
CHAS 31: Encoding reactive chemical hazards and incompatibilities in an alert...CHAS 31: Encoding reactive chemical hazards and incompatibilities in an alert...
CHAS 31: Encoding reactive chemical hazards and incompatibilities in an alert...
 
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
 
Chemistry and reactions from non-US patents
Chemistry and reactions from non-US patentsChemistry and reactions from non-US patents
Chemistry and reactions from non-US patents
 
Chemical structure representation in PubChem
Chemical structure representation in PubChemChemical structure representation in PubChem
Chemical structure representation in PubChem
 
CINF 18: Wikipedia and Wiktionary as resources for chemical text mining
CINF 18: Wikipedia and Wiktionary as resources for chemical text miningCINF 18: Wikipedia and Wiktionary as resources for chemical text mining
CINF 18: Wikipedia and Wiktionary as resources for chemical text mining
 
Substructure Search Face-off
Substructure Search Face-offSubstructure Search Face-off
Substructure Search Face-off
 
Classification, representation and analysis of cyclic peptides and peptide-li...
Classification, representation and analysis of cyclic peptides and peptide-li...Classification, representation and analysis of cyclic peptides and peptide-li...
Classification, representation and analysis of cyclic peptides and peptide-li...
 
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
 
Sketchy sketches hiding chemistry in plain sight
Sketchy sketches hiding chemistry in plain sightSketchy sketches hiding chemistry in plain sight
Sketchy sketches hiding chemistry in plain sight
 
Acs 2013 indianapolis_cvsp
Acs 2013 indianapolis_cvspAcs 2013 indianapolis_cvsp
Acs 2013 indianapolis_cvsp
 
Challenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptionsChallenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptions
 
Resolving cryptic needles to molecular structures: The GtoPdb experience
Resolving cryptic needles to molecular structures: The GtoPdb experienceResolving cryptic needles to molecular structures: The GtoPdb experience
Resolving cryptic needles to molecular structures: The GtoPdb experience
 
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
 
Data model
Data modelData model
Data model
 
Tackling the difficult areas of chemical entity extraction: Misspelt chemical...
Tackling the difficult areas of chemical entity extraction: Misspelt chemical...Tackling the difficult areas of chemical entity extraction: Misspelt chemical...
Tackling the difficult areas of chemical entity extraction: Misspelt chemical...
 
In grammars we trust: LeadMine, a knowledge driven solution
In grammars we trust: LeadMine, a knowledge driven solutionIn grammars we trust: LeadMine, a knowledge driven solution
In grammars we trust: LeadMine, a knowledge driven solution
 

Viewers also liked

Representation and display of non-standard peptides using semi-systematic ami...
Representation and display of non-standard peptides using semi-systematic ami...Representation and display of non-standard peptides using semi-systematic ami...
Representation and display of non-standard peptides using semi-systematic ami...NextMove Software
 
2005 04 05 SRI ELN Architecture
2005 04 05 SRI ELN Architecture2005 04 05 SRI ELN Architecture
2005 04 05 SRI ELN ArchitectureSimon Coles
 
2010 01 27 Surveying the ELN Landscape
2010 01 27 Surveying the ELN Landscape2010 01 27 Surveying the ELN Landscape
2010 01 27 Surveying the ELN LandscapeSimon Coles
 
Sharepoint conf 5 - g mills
Sharepoint conf 5 - g millsSharepoint conf 5 - g mills
Sharepoint conf 5 - g millsMIchael Carey
 
Why paperless lab is just the first step towards a smart lab
Why paperless lab is just the first step towards a smart labWhy paperless lab is just the first step towards a smart lab
Why paperless lab is just the first step towards a smart labOSTHUS
 
Heiner Oberkampf: Semantics for Integrated Analytical Laboratory Processes – ...
Heiner Oberkampf: Semantics for Integrated Analytical Laboratory Processes – ...Heiner Oberkampf: Semantics for Integrated Analytical Laboratory Processes – ...
Heiner Oberkampf: Semantics for Integrated Analytical Laboratory Processes – ...Semantic Web Company
 
Electronic Lab Notebooks
Electronic Lab NotebooksElectronic Lab Notebooks
Electronic Lab NotebooksKristin Briney
 
Most important features when choosing an electronic lab notebook
Most important features when choosing an electronic lab notebookMost important features when choosing an electronic lab notebook
Most important features when choosing an electronic lab notebooksciNote LLC
 
Lab Notebooks as Data Management (SLA Winter Virtual Conference 2012)
Lab Notebooks as Data Management (SLA Winter Virtual Conference 2012)Lab Notebooks as Data Management (SLA Winter Virtual Conference 2012)
Lab Notebooks as Data Management (SLA Winter Virtual Conference 2012)Kristin Briney
 
Peptide Informatics - Bridging the gap between small-molecule and large-molec...
Peptide Informatics - Bridging the gap between small-molecule and large-molec...Peptide Informatics - Bridging the gap between small-molecule and large-molec...
Peptide Informatics - Bridging the gap between small-molecule and large-molec...NextMove Software
 
SLAS 2017 - "Multiple Research Platforms: One Single Data Sharing Portal"
SLAS 2017 - "Multiple Research Platforms:  One Single Data Sharing Portal"SLAS 2017 - "Multiple Research Platforms:  One Single Data Sharing Portal"
SLAS 2017 - "Multiple Research Platforms: One Single Data Sharing Portal"CSols, Inc.
 
Using Matched Molecular Series as a Predictive Tool To Optimize Biological Ac...
Using Matched Molecular Series as a Predictive Tool To Optimize Biological Ac...Using Matched Molecular Series as a Predictive Tool To Optimize Biological Ac...
Using Matched Molecular Series as a Predictive Tool To Optimize Biological Ac...NextMove Software
 
Peptide line notations for biologics registration and patent filings
Peptide line notations for biologics registration and patent filingsPeptide line notations for biologics registration and patent filings
Peptide line notations for biologics registration and patent filingsNextMove Software
 
Using Matched Series to decide what compound to make next
Using Matched Series to decide what compound to make nextUsing Matched Series to decide what compound to make next
Using Matched Series to decide what compound to make nextNextMove Software
 
jQuery sans jQuery
jQuery sans jQueryjQuery sans jQuery
jQuery sans jQuerygoldoraf
 
LNUG - A year with AWS
LNUG - A year with AWSLNUG - A year with AWS
LNUG - A year with AWSAndrew Clarke
 
Protecting Your SsaSets 01.07.10
Protecting Your SsaSets 01.07.10Protecting Your SsaSets 01.07.10
Protecting Your SsaSets 01.07.10michael keyes
 

Viewers also liked (20)

Representation and display of non-standard peptides using semi-systematic ami...
Representation and display of non-standard peptides using semi-systematic ami...Representation and display of non-standard peptides using semi-systematic ami...
Representation and display of non-standard peptides using semi-systematic ami...
 
InChI for Large Molecules
InChI for Large MoleculesInChI for Large Molecules
InChI for Large Molecules
 
2005 04 05 SRI ELN Architecture
2005 04 05 SRI ELN Architecture2005 04 05 SRI ELN Architecture
2005 04 05 SRI ELN Architecture
 
2010 01 27 Surveying the ELN Landscape
2010 01 27 Surveying the ELN Landscape2010 01 27 Surveying the ELN Landscape
2010 01 27 Surveying the ELN Landscape
 
Sharepoint conf 5 - g mills
Sharepoint conf 5 - g millsSharepoint conf 5 - g mills
Sharepoint conf 5 - g mills
 
Why paperless lab is just the first step towards a smart lab
Why paperless lab is just the first step towards a smart labWhy paperless lab is just the first step towards a smart lab
Why paperless lab is just the first step towards a smart lab
 
Heiner Oberkampf: Semantics for Integrated Analytical Laboratory Processes – ...
Heiner Oberkampf: Semantics for Integrated Analytical Laboratory Processes – ...Heiner Oberkampf: Semantics for Integrated Analytical Laboratory Processes – ...
Heiner Oberkampf: Semantics for Integrated Analytical Laboratory Processes – ...
 
Electronic Lab Notebooks
Electronic Lab NotebooksElectronic Lab Notebooks
Electronic Lab Notebooks
 
Most important features when choosing an electronic lab notebook
Most important features when choosing an electronic lab notebookMost important features when choosing an electronic lab notebook
Most important features when choosing an electronic lab notebook
 
Lab Notebooks as Data Management (SLA Winter Virtual Conference 2012)
Lab Notebooks as Data Management (SLA Winter Virtual Conference 2012)Lab Notebooks as Data Management (SLA Winter Virtual Conference 2012)
Lab Notebooks as Data Management (SLA Winter Virtual Conference 2012)
 
Peptide Informatics - Bridging the gap between small-molecule and large-molec...
Peptide Informatics - Bridging the gap between small-molecule and large-molec...Peptide Informatics - Bridging the gap between small-molecule and large-molec...
Peptide Informatics - Bridging the gap between small-molecule and large-molec...
 
SLAS 2017 - "Multiple Research Platforms: One Single Data Sharing Portal"
SLAS 2017 - "Multiple Research Platforms:  One Single Data Sharing Portal"SLAS 2017 - "Multiple Research Platforms:  One Single Data Sharing Portal"
SLAS 2017 - "Multiple Research Platforms: One Single Data Sharing Portal"
 
How NOSQL Paid off for Telenor
How NOSQL Paid off for TelenorHow NOSQL Paid off for Telenor
How NOSQL Paid off for Telenor
 
Using Matched Molecular Series as a Predictive Tool To Optimize Biological Ac...
Using Matched Molecular Series as a Predictive Tool To Optimize Biological Ac...Using Matched Molecular Series as a Predictive Tool To Optimize Biological Ac...
Using Matched Molecular Series as a Predictive Tool To Optimize Biological Ac...
 
Peptide line notations for biologics registration and patent filings
Peptide line notations for biologics registration and patent filingsPeptide line notations for biologics registration and patent filings
Peptide line notations for biologics registration and patent filings
 
Using Matched Series to decide what compound to make next
Using Matched Series to decide what compound to make nextUsing Matched Series to decide what compound to make next
Using Matched Series to decide what compound to make next
 
jQuery sans jQuery
jQuery sans jQueryjQuery sans jQuery
jQuery sans jQuery
 
LNUG - A year with AWS
LNUG - A year with AWSLNUG - A year with AWS
LNUG - A year with AWS
 
El periodico en el aula
El periodico en el aula El periodico en el aula
El periodico en el aula
 
Protecting Your SsaSets 01.07.10
Protecting Your SsaSets 01.07.10Protecting Your SsaSets 01.07.10
Protecting Your SsaSets 01.07.10
 

Similar to Standardized Representations of ELN Reactions for Categorization and Duplicate/Variation Identification

Metabolic engineering approaches in medicinal plants
Metabolic engineering approaches in medicinal plantsMetabolic engineering approaches in medicinal plants
Metabolic engineering approaches in medicinal plantsN Poorin
 
Data drivenapproach to medicinalchemistry
Data drivenapproach to medicinalchemistryData drivenapproach to medicinalchemistry
Data drivenapproach to medicinalchemistryAnn-Marie Roche
 
Efficient Searching and Similarity of Unmapped Reactions: Application to ELN ...
Efficient Searching and Similarity of Unmapped Reactions: Application to ELN ...Efficient Searching and Similarity of Unmapped Reactions: Application to ELN ...
Efficient Searching and Similarity of Unmapped Reactions: Application to ELN ...NextMove Software
 
EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...
EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...
EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...Kamel Mansouri
 
Extracting Synthetic Knowledge from Reaction Databases - ARChem at the 246th ACS
Extracting Synthetic Knowledge from Reaction Databases - ARChem at the 246th ACSExtracting Synthetic Knowledge from Reaction Databases - ARChem at the 246th ACS
Extracting Synthetic Knowledge from Reaction Databases - ARChem at the 246th ACSSimBioSys_Inc
 
CSMC EPP June 3, 2011 (slides)
CSMC EPP June 3, 2011 (slides)CSMC EPP June 3, 2011 (slides)
CSMC EPP June 3, 2011 (slides)Simon Millar
 
Simplifying Intact Molecular Weight Determination for ADCs
Simplifying Intact Molecular Weight Determination for ADCsSimplifying Intact Molecular Weight Determination for ADCs
Simplifying Intact Molecular Weight Determination for ADCsSCIEX
 
Unc slides on computational toxicology
Unc slides on computational toxicologyUnc slides on computational toxicology
Unc slides on computational toxicologySean Ekins
 
OECD Guidline on acute and chronic toxicity
OECD Guidline on acute and chronic toxicityOECD Guidline on acute and chronic toxicity
OECD Guidline on acute and chronic toxicityShital Magar
 
DOE Applications in Process Chemistry Presentation
DOE Applications in Process Chemistry PresentationDOE Applications in Process Chemistry Presentation
DOE Applications in Process Chemistry Presentationsaweissman
 
Mixtures: informatics for formulations and consumer products
Mixtures: informatics for formulations and consumer productsMixtures: informatics for formulations and consumer products
Mixtures: informatics for formulations and consumer productsAlex Clark
 
Using rule based classifiers for the predictive analysis of breast cancer rec...
Using rule based classifiers for the predictive analysis of breast cancer rec...Using rule based classifiers for the predictive analysis of breast cancer rec...
Using rule based classifiers for the predictive analysis of breast cancer rec...Alexander Decker
 
11.using rule based classifiers for the predictive analysis of breast cancer ...
11.using rule based classifiers for the predictive analysis of breast cancer ...11.using rule based classifiers for the predictive analysis of breast cancer ...
11.using rule based classifiers for the predictive analysis of breast cancer ...Alexander Decker
 
A new approach for summarizing SemRep predications
A new approach for summarizing SemRep predications A new approach for summarizing SemRep predications
A new approach for summarizing SemRep predications Vahid Taslimitehrani
 
Structure identification using high resolution mass spectrometry data and the...
Structure identification using high resolution mass spectrometry data and the...Structure identification using high resolution mass spectrometry data and the...
Structure identification using high resolution mass spectrometry data and the...Andrew McEachran
 

Similar to Standardized Representations of ELN Reactions for Categorization and Duplicate/Variation Identification (20)

Metabolic engineering approaches in medicinal plants
Metabolic engineering approaches in medicinal plantsMetabolic engineering approaches in medicinal plants
Metabolic engineering approaches in medicinal plants
 
Data drivenapproach to medicinalchemistry
Data drivenapproach to medicinalchemistryData drivenapproach to medicinalchemistry
Data drivenapproach to medicinalchemistry
 
Efficient Searching and Similarity of Unmapped Reactions: Application to ELN ...
Efficient Searching and Similarity of Unmapped Reactions: Application to ELN ...Efficient Searching and Similarity of Unmapped Reactions: Application to ELN ...
Efficient Searching and Similarity of Unmapped Reactions: Application to ELN ...
 
Modern analytical chemistry
Modern analytical chemistryModern analytical chemistry
Modern analytical chemistry
 
QSAR Modeling.pdf
QSAR Modeling.pdfQSAR Modeling.pdf
QSAR Modeling.pdf
 
EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...
EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...
EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...
 
The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
 
foglar book.pdf
foglar book.pdffoglar book.pdf
foglar book.pdf
 
Extracting Synthetic Knowledge from Reaction Databases - ARChem at the 246th ACS
Extracting Synthetic Knowledge from Reaction Databases - ARChem at the 246th ACSExtracting Synthetic Knowledge from Reaction Databases - ARChem at the 246th ACS
Extracting Synthetic Knowledge from Reaction Databases - ARChem at the 246th ACS
 
CSMC EPP June 3, 2011 (slides)
CSMC EPP June 3, 2011 (slides)CSMC EPP June 3, 2011 (slides)
CSMC EPP June 3, 2011 (slides)
 
Simplifying Intact Molecular Weight Determination for ADCs
Simplifying Intact Molecular Weight Determination for ADCsSimplifying Intact Molecular Weight Determination for ADCs
Simplifying Intact Molecular Weight Determination for ADCs
 
SETAC HTS SFlood
SETAC HTS SFlood SETAC HTS SFlood
SETAC HTS SFlood
 
Unc slides on computational toxicology
Unc slides on computational toxicologyUnc slides on computational toxicology
Unc slides on computational toxicology
 
OECD Guidline on acute and chronic toxicity
OECD Guidline on acute and chronic toxicityOECD Guidline on acute and chronic toxicity
OECD Guidline on acute and chronic toxicity
 
DOE Applications in Process Chemistry Presentation
DOE Applications in Process Chemistry PresentationDOE Applications in Process Chemistry Presentation
DOE Applications in Process Chemistry Presentation
 
Mixtures: informatics for formulations and consumer products
Mixtures: informatics for formulations and consumer productsMixtures: informatics for formulations and consumer products
Mixtures: informatics for formulations and consumer products
 
Using rule based classifiers for the predictive analysis of breast cancer rec...
Using rule based classifiers for the predictive analysis of breast cancer rec...Using rule based classifiers for the predictive analysis of breast cancer rec...
Using rule based classifiers for the predictive analysis of breast cancer rec...
 
11.using rule based classifiers for the predictive analysis of breast cancer ...
11.using rule based classifiers for the predictive analysis of breast cancer ...11.using rule based classifiers for the predictive analysis of breast cancer ...
11.using rule based classifiers for the predictive analysis of breast cancer ...
 
A new approach for summarizing SemRep predications
A new approach for summarizing SemRep predications A new approach for summarizing SemRep predications
A new approach for summarizing SemRep predications
 
Structure identification using high resolution mass spectrometry data and the...
Structure identification using high resolution mass spectrometry data and the...Structure identification using high resolution mass spectrometry data and the...
Structure identification using high resolution mass spectrometry data and the...
 

More from NextMove Software

Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...NextMove Software
 
A de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESA de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESNextMove Software
 
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionRecent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionNextMove Software
 
Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...NextMove Software
 
Comparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsComparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsNextMove Software
 
Recent improvements to the RDKit
Recent improvements to the RDKitRecent improvements to the RDKit
Recent improvements to the RDKitNextMove Software
 
Digital Chemical Representations
Digital Chemical RepresentationsDigital Chemical Representations
Digital Chemical RepresentationsNextMove Software
 
PubChem as a Biologics Database
PubChem as a Biologics DatabasePubChem as a Biologics Database
PubChem as a Biologics DatabaseNextMove Software
 
Building on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfilesBuilding on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfilesNextMove Software
 
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...NextMove Software
 
Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)NextMove Software
 
Challenges in Chemical Information Exchange
Challenges in Chemical Information ExchangeChallenges in Chemical Information Exchange
Challenges in Chemical Information ExchangeNextMove Software
 
RDKit UGM 2016: Higher Quality Chemical Depictions
RDKit UGM 2016: Higher Quality Chemical DepictionsRDKit UGM 2016: Higher Quality Chemical Depictions
RDKit UGM 2016: Higher Quality Chemical DepictionsNextMove Software
 
GHS and NFPA diamonds: where they come from and how they can be useful
GHS and NFPA diamonds: where they come from and how they can be usefulGHS and NFPA diamonds: where they come from and how they can be useful
GHS and NFPA diamonds: where they come from and how they can be usefulNextMove Software
 
Line notations for nucleic acids (both natural and therapeutic)
Line notations for nucleic acids (both natural and therapeutic)Line notations for nucleic acids (both natural and therapeutic)
Line notations for nucleic acids (both natural and therapeutic)NextMove Software
 
Which is the best fingerprint for medicinal chemistry?
Which is the best fingerprint for medicinal chemistry?Which is the best fingerprint for medicinal chemistry?
Which is the best fingerprint for medicinal chemistry?NextMove Software
 
CINF 29: Visualization and manipulation of Matched Molecular Series for decis...
CINF 29: Visualization and manipulation of Matched Molecular Series for decis...CINF 29: Visualization and manipulation of Matched Molecular Series for decis...
CINF 29: Visualization and manipulation of Matched Molecular Series for decis...NextMove Software
 

More from NextMove Software (18)

DeepSMILES
DeepSMILESDeepSMILES
DeepSMILES
 
Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...
 
A de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESA de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILES
 
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionRecent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
 
Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...
 
Comparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsComparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule Implementations
 
Recent improvements to the RDKit
Recent improvements to the RDKitRecent improvements to the RDKit
Recent improvements to the RDKit
 
Digital Chemical Representations
Digital Chemical RepresentationsDigital Chemical Representations
Digital Chemical Representations
 
PubChem as a Biologics Database
PubChem as a Biologics DatabasePubChem as a Biologics Database
PubChem as a Biologics Database
 
Building on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfilesBuilding on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfiles
 
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
 
Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)
 
Challenges in Chemical Information Exchange
Challenges in Chemical Information ExchangeChallenges in Chemical Information Exchange
Challenges in Chemical Information Exchange
 
RDKit UGM 2016: Higher Quality Chemical Depictions
RDKit UGM 2016: Higher Quality Chemical DepictionsRDKit UGM 2016: Higher Quality Chemical Depictions
RDKit UGM 2016: Higher Quality Chemical Depictions
 
GHS and NFPA diamonds: where they come from and how they can be useful
GHS and NFPA diamonds: where they come from and how they can be usefulGHS and NFPA diamonds: where they come from and how they can be useful
GHS and NFPA diamonds: where they come from and how they can be useful
 
Line notations for nucleic acids (both natural and therapeutic)
Line notations for nucleic acids (both natural and therapeutic)Line notations for nucleic acids (both natural and therapeutic)
Line notations for nucleic acids (both natural and therapeutic)
 
Which is the best fingerprint for medicinal chemistry?
Which is the best fingerprint for medicinal chemistry?Which is the best fingerprint for medicinal chemistry?
Which is the best fingerprint for medicinal chemistry?
 
CINF 29: Visualization and manipulation of Matched Molecular Series for decis...
CINF 29: Visualization and manipulation of Matched Molecular Series for decis...CINF 29: Visualization and manipulation of Matched Molecular Series for decis...
CINF 29: Visualization and manipulation of Matched Molecular Series for decis...
 

Recently uploaded

GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 

Recently uploaded (20)

GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 

Standardized Representations of ELN Reactions for Categorization and Duplicate/Variation Identification

  • 1. Standardized Representations of ELN Reactions for Categorization and Duplicate/Variation Identification Roger Sayle and daniel lowe NextMove Software, Cambridge, UK 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
  • 2. Overview • Electronic Lab Notebooks (ELNs) are a rich source of experimental observations of the synthetic methods used by the pharmaceutical industry, especially failed and low yield reactions. • Recent successes in data mining this information have provided insights to improving the productivity and reducing costs in medicinal chemistry programs. • This presentation describes the challenge of a small but vital aspect of this process: reaction pivoting. 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
  • 3. Context: the hazelnut pipeline 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014 HazELNut Filbert NameRXN Cobnut Accelrys Pipeline Pilot (AstraZeneca, AbbVie & Hoffmann-La Roche) ChemAxon JChem Cartridge (GlaxoSmithKline & Novartis) Elsevier Reaxys (Hoffmann-La Roche, AstraZeneca) Perkin Elmer Informatics (formerly CambridgeSoft) eNotebook v9, v11 or v13 Oracle Server version 10 or 11 Microsoft Windows or Linux The pharmaceutical industry is increasingly making use of reaction data warehouses from ELN data to better share and learn from the experience of in-house and CRO chemists.
  • 4. Example (actionable) insight The clear trend between Suzuki coupling success rate and predicted octanol-water partition co-efficient. Data: Pickett et al., ACS Med. Chem. Lett. 2(1):28, 2011 97 232 340 525 474 202 63 55 119 127 141 80 36 8 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% <1.0 1.0-2.0 2.0-3.0 3.0-4.0 4.0-5.0 5.0-6.0 >6.0 Reaction Product Predicted cLogP Sucessful Reactions Failed Reactions 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
  • 5. big data mining confirmation of Nadine-churcher hypothesis On 16,335 Suzuki coupling reactions extracted from US patent applications between 2001 and 2012. Nadine, Hattotuwagama and Churcher ,“Lead-Oriented Synthesis: A New Opportunity for Synthetic Chemistry”, Angew. Chem. Int. Ed, 51:1114 2012. LogP Mean Yield N Obs < 1.0 52.89% 196 1.0 – 2.0 56.02% 1155 2.0 – 3.0 56.72% 2881 3.0 – 4.0 58.14% 4071 4.0 – 5.0 57.26% 3186 5.0 – 6.0 59.25% 2126 > 6.0 63.83% 2720 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
  • 6. 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014 categorization of ELN reactions 1. J. Carey, D. Laffan, C. Thomson, M. Williams, Org. Biomol. Chem. 2337, 2006. 2. S. Roughley and A. Jordan, J. Med. Chem. 54:3451-3479, 2011. 34% 17% 5% 2% 3% 6% 10% 1% 15% 2% 5% Heteroatom alkylation and arylation Acylation and related processes C-C bond formations Heterocycle formation Protections Deprotections Reductions Oxidations Functional group conversion Functional group addition Resolution
  • 7. Problem summary • Categorizing reactions and observing trends requires transforming data from the representations used in electronic lab notebooks, which are primarily designed to reliably and efficiently capture intellectual property, into “cleaned up” forms more suitable for analysis and analytics. • Often overlooked is the “pivoting” from document or experiment-based view of data to the de-duplicated reaction (and variation) view of the same data. 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
  • 8. 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014 simplest example of duplication • In a “declassified” subset of 18237 reactions from AstraZeneca’s medchem ELN, there only 11451 unique experiments (63%), without any reaction normalization. • The most duplicated reaction (41 times) is the Buchwald-Hartwig amination below.
  • 9. Reaction variations in reaxys 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
  • 10. When is a million really a million? 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014 Although this data set contains 1,196,165 reaction instances, there are only 744,043 unique reactions, even without any structure or role normalization. http://nextmovesoftware.com/blog
  • 11. Complication #1: molecule normalization • Some duplicates are caused by alternate chemistry representations requiring normalization of SMILES. • This problems can be solved by Reaction InChIs. • EN01585-15 • EN01995-47 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
  • 12. Pharmaceutical patent example US2002/0103226 [Merck] US2008/0139505 [Lilly] Phosphodiesterase-4 inhibitors Glutamate receptor potentiators [0398] Step 2 (Scheme 3) Preparation 168 (4-methoxyphenoxy)acetamide oxime N-hydroxy-2-(4-methoxy-phenoxy)-acetamidine Both InChI=1S/C9H12N2O3/c1-13-7-2-4-8(5-3-7)14-6-9(10)11-12/h2-5,12H,6H2,1H3,(H2,10,11) 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
  • 13. Complication #2: superatoms becomes 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
  • 14. Complication #3: correcting chemdraw’s superatom expansion Chemist drew Chemist Intended Chemist got DMF K2CO3 HBTU mCPBA 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
  • 15. Complication #4: reaction role normalization • Some duplicates are caused by inconsistent reaction roles (reactants vs. agents) in the chemist’s sketch. • EN00104-06 • EN00104-47 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
  • 16. 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014 What is Atom-Mapping? Mapping algorithm
  • 17. 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014 Atom-Mapping for reaction role identification/normalization • Atom mappings can distinguish reactants from solvents and catalysts, by whether they contribute atoms to the product(s). but this requires reasonable atom mapping…
  • 18. 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014 Atom mapping +classification 0 10 20 30 40 50 60 70 80 90 100 Atom mapping algorithms alone Combined with NameRXN Percentofreactionswithallproduct atomsmapped Marvin 6.0 ChemDraw 12 Consensus Result Verified / Recognised by NameRXN (71%)
  • 19. Complication #5: eln sections 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014 Would the full ELN reaction please stand up?
  • 20. Complication #6: agents matter From http://en.wikipedia.org/wiki/Diazonium_compound Sandmeyer reaction Benzenediazonium chloride heated with cuprous chloride disolved in HCl to yield chlorobenzene. C6H5N2 + + CuCl → C6H5Cl + N2 + Cu+ Gatterman reaction Benzenediazonium chloride is warmed with copper powder and HCl to yield chlorobenzene. C6H5N2 + + CuCl → C6H5Cl + N2 + Cu+ 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
  • 21. 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014 AZ examples with same parent • EN01325-20 • EN01325-22 • EN01325-25 • EN01325-27
  • 22. 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014 examples with same grandparent • EN00930-16 • EN00930-25 • EN00930-60
  • 23. Results & Lessons learned • Exact duplicates are extremely common. – 62.2% of USPTO data set is unique (744043/1196165). – Some large pharmaceutical ELNs < 50% unique. • True reaction variations are relatively rare. – 684688 unique products without normalization. – 675026 unique tautomer/protomer products (56.4%). – Over 90% of deduplicated reactions have unique products. • For many use cases the reaction InChI or the product InChI may be sufficient for use as a key to join on. 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
  • 24. conclusions • Transforming the ELN content into analysis friendly data sets is perhaps more complicated than it might first appear. • This presentation describes the challenge of “reaction pivoting” and several technical solutions devised by the authors to address this problem. • The resulting de-duplicated and categorized reactions are useful in current work investigating the influence of reaction conditions on yield, and step choice and ordering in multi-step syntheses. 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014
  • 25. acknowledgements • Nick Tomkinson, AstraZeneca R&D, Alderley Park, UK. • Thierry Kogej, Astra Zeneca R&D, Mölndal, Sweden. • Mick Kappler, Hoffmann-La Roche, Nutley, NJ, USA. • Plamen Petrov, AstraZeneca R&D, Mölndal, Sweden. • Colin Batchelor, Royal Society of Chemistry, UK. • Grégori Gerebtzoff, Hoffmann-La Roche, Basel, Switzerland. • Thank you for you time. Questions? 247th ACS National Meeting, Dallas, TX, Monday 17th March 2014