CINF 13, ACS Fall 2017, Washington, D.C.
pistachio
Search and Faceting of Large Reaction Databases
John Mayfield, Daniel Lowe, Roger Sayle
What do Synthetic Chemists Want from Their Reaction Systems?
Data, Classification, Diagrams, Search
HazELNut, Filbert, NameRxn, Cobnut
Accelrys Pipeline Pilot (AstraZeneca, AbbVie & Hoffmann-La Roche)
ChemAxon JChem Cartridge (GlaxoSmithKline & Novartis)
Elsevier Reaxys (Hoffmann-La Roche, AstraZeneca, Merck)
PerkinElmer Informatics (formerly CambridgeSoft) eNotebook v9, v11 or v13, or Symyx ELN v5.x or v6.x
Oracle Server version 10, 11 or
Microsoft Windows, Linux or Mac OS
Infrastructure for liberating and processing reactions from Electronic Lab Notebooks (ELNs)
To 7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (Peakdale) (220 mg, 1.025 mmol) and (3,4-dimethoxyphenyl)boronic acid (187 mg, 1.025 mmol) in 1,4-dioxane (3 mL) and water (1.5 mL) was added sodium carbonate (435 mg, 4.10 mmol) and tetrakis(triphenylphosphine)palladium(0) (110 mg, 0.095 mmol). The reaction was heated in the microwave at 80° C. for 2 hours and at 100° C. for a further 2 hours. The solvent was removed and the residue was suspended in DMSO, filtered and purified by MDAP. Appropriate fractions were combined and the solvent removed to give 7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (25 mg, 7%) as a yellow solid.
US 2016/16966 A1 [0517]
Daniel M. Lowe. Extraction of chemical structures and reactions from the literature. Ph.D. Thesis, University of Cambridge, 2012.
Product Properties
7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid: 25 mg, 7% yield, yellow solid
Reactant Properties
7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid: 220 mg, 1.025 mmol
(3,4-dimethoxyphenyl)boronic acid: 187 mg, 1.025 mmol
Agent Properties
1,4-dioxane: 3 mL
water: 1.5 mL
sodium carbonate: 435 mg, 4.10 mmol
tetrakis(triphenylphosphine)palladium(0): 110 mg, 0.095 mmol
DMSO
Unstructured text to a structured reaction table (LeadMine + ChemicalTagger)
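Once quantities sit in structured fields they can be sanity-checked. The following is a minimal sketch, assuming quantity strings shaped like those in the table above (e.g. "435 mg, 4.10 mmol"): it parses mass and amount, derives the implied molar mass, and compares it with an expected value. The helper names and the 5% tolerance are hypothetical, and this is not how LeadMine or ChemicalTagger work internally.

```python
import re

# Hypothetical unit tables covering the quantity strings in the reaction table.
MASS_UNITS = {"g": 1.0, "mg": 1e-3}
AMOUNT_UNITS = {"mol": 1.0, "mmol": 1e-3}
QUANTITY = re.compile(r"(\d+(?:\.\d+)?)\s*([A-Za-z]+)")


def parse_quantities(text):
    """Return (grams, moles) from e.g. '435 mg, 4.10 mmol'; absent values are None."""
    grams = moles = None
    for value, unit in QUANTITY.findall(text):
        if unit in MASS_UNITS:
            grams = float(value) * MASS_UNITS[unit]
        elif unit in AMOUNT_UNITS:
            moles = float(value) * AMOUNT_UNITS[unit]
    return grams, moles


def implied_molar_mass(text):
    """Molar mass (g/mol) implied by a mass and an amount, or None if either is missing."""
    grams, moles = parse_quantities(text)
    if grams is None or not moles:
        return None
    return grams / moles


def is_consistent(text, expected_molar_mass, tolerance=0.05):
    """True if the implied molar mass is within `tolerance` (a fraction) of the expected one."""
    implied = implied_molar_mass(text)
    if implied is None:
        return False
    return abs(implied - expected_molar_mass) / expected_molar_mass <= tolerance


if __name__ == "__main__":
    # Sodium carbonate (Na2CO3) has a molar mass of about 105.99 g/mol.
    print(implied_molar_mass("435 mg, 4.10 mmol"))     # roughly 106.1
    print(is_consistent("435 mg, 4.10 mmol", 105.99))  # True
    print(is_consistent("435 mg, 4.10 mol", 105.99))   # False: a mol/mmol slip
```

A unit slip such as "mol" for "mmol" makes the implied molar mass 1000 times too small, so even a loose tolerance catches it.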
Christos Nicolaou et al. The Proximal Lilly Collection: Mapping, Exploring and Exploiting Feasible Chemical Space. J. Chem. Inf. Model., 2016, 56 (7), pp 1253–1266.
Nadine Schneider et al. Big Data from Pharmaceutical Patents: A Computational Analysis of Medicinal Chemists’ Bread and Butter. J. Med. Chem., 2016, 59 (9), pp 4385–4402.
Nadine Schneider et al. Development of a Novel Fingerprint for Chemical Reactions and Its Application to Large-Scale Reaction Classification and Similarity. J. Chem. Inf. Model., 2015, 55 (1), pp 39–53.
Nadine Schneider et al. What’s What: The (Nearly) Definitive Guide to Reaction Role Assignment. J. Chem. Inf. Model., 2016, 56 (12), pp 2336–2346.
Connor Coley et al. Prediction of Organic Reaction Outcomes Using Machine Learning. ACS Cent. Sci., 2017, 3 (5), pp 434–443.
Data impact
Public subset released in 2014 as CC-Zero
Pistachio expands the scope of the data and uses atom-atom maps from NameRxn
Example 26. Epizyme Inc. 1-phenoxy-3-(alkylamino)-propan-2-ol derivatives as CARM1 inhibitors and uses thereof (US 09718816 B2), Aug. 1, 2017
John May, et al. Sketchy Sketches: Hiding Chemistry in Plain Sight. Seventh Joint Sheffield Conference on Cheminformatics, 2016.
Figure: sketched multi-step synthesis (Steps 1 to 4, etc.)
sketch extraction
NextMove’s	Praline
total reactions over time
Figure: cumulative reaction details (0 to 3.5M) by year, 1997 to 2018, for EPO Applications, EPO Grants, USPTO Applications and USPTO Grants
reaction DIAGRAMS
Good reaction diagrams are essential in communicating synthetic chemistry
Layout can be stored or generated
• When extracting from text, layout must be generated
• Generated diagrams can be unsatisfactory for display
Figure: the US 2016/16966 A1 [0517] reaction generated from SMILES and depicted by ChemDraw, OEChem, ChemAxon and BIOVIA
diagram improvements
Typical workarounds:
• Separately render molecules
• Hide agents and list separately
What do humans do?
• Wrap products below
• Abbreviate functional groups and agents
• Orientate reactants to products and vice versa
• Hide agents and list as text
Figure: Pistachio + CDK depictions (abbreviated; abbreviated and aligned), generated from SMILES for US 2016/16966 A1 [0517]
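Two of the human conventions listed above, hiding agents as text and wrapping products below, can be sketched without a depiction toolkit. This is a minimal sketch, assuming plain reaction SMILES (reactants>agents>products, no CXSMILES extensions) and pre-computed bounding-box widths for each molecule; the function names, arrow width and gap are hypothetical, and this is not Pistachio's or CDK's layout code.

```python
def split_reaction_smiles(rxn):
    """Split 'reactants>agents>products' into three lists of '.'-separated components."""
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError("reaction SMILES needs exactly two '>' separators")
    return tuple([c for c in part.split(".") if c] for part in parts)


def place_row(widths, start=0.0, gap=10.0):
    """Left-to-right x offsets for boxes of the given widths; returns (offsets, right_edge)."""
    offsets, x = [], start
    for w in widths:
        offsets.append(x)
        x += w + gap
    right_edge = x - gap if widths else start
    return offsets, right_edge


def wrap_layout(reactant_widths, product_widths, max_width, arrow_width=50.0, gap=10.0):
    """Place reactants, arrow and products on one row; if that row is wider than
    max_width, keep reactants and arrow on row 0 and wrap products to row 1.
    Positions are (x, row) pairs."""
    r_x, r_end = place_row(reactant_widths, 0.0, gap)
    arrow_x = r_end + gap
    p_x, p_end = place_row(product_widths, arrow_x + arrow_width + gap, gap)
    row = 0
    if p_end > max_width:
        # Too wide: restart the products at the left margin on the next row.
        p_x, _ = place_row(product_widths, 0.0, gap)
        row = 1
    return {
        "rows": row + 1,
        "reactants": [(x, 0) for x in r_x],
        "arrow": (arrow_x, 0),
        "products": [(x, row) for x in p_x],
    }


if __name__ == "__main__":
    reactants, agents, products = split_reaction_smiles("CCO.CC(=O)O>[H+]>CCOC(C)=O")
    print("Conditions:", ", ".join(agents))              # agents shown as text
    print(wrap_layout([100, 80], [120], max_width=300))  # products wrap to row 1
```

The sketch does not handle reactants that alone exceed the page width; a fuller layout would also wrap reactants and could repeat the arrow at the start of the second row.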
reaction detail view
Assigns names to 900+ reaction types using transformations
Can guarantee a perfect atom-atom mapping
• Atom-atom mapping is an output, not an input
• MCS mappers struggle with rearrangements, e.g. 4.1.6 Cyclic Beckmann rearrangement
namerxn
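In reaction SMILES and SMARTS an atom-atom mapping is written as a class number at the end of a bracket atom, e.g. [CH2:2]. Whatever produces the mapping, a consumer can check it cheaply: every mapped product atom should have exactly one partner among the reactants. This is a minimal sketch using only regular expressions; the helper names are hypothetical, it does not validate the SMILES itself, and it is not NameRxn's algorithm.

```python
import re
from collections import Counter

# Map numbers appear last inside bracket atoms, e.g. [CH2:2] or [*:1].
MAP_NUMBER = re.compile(r"\[[^\]]*?:(\d+)\]")


def map_numbers(smiles):
    """Count how often each atom-map number occurs in a SMILES/SMARTS fragment."""
    return Counter(int(n) for n in MAP_NUMBER.findall(smiles))


def check_mapping(rxn):
    """Return (missing, duplicated): product map numbers absent from the reactants,
    and map numbers used more than once on either side."""
    reactants, _agents, products = rxn.split(">")
    r, p = map_numbers(reactants), map_numbers(products)
    missing = sorted(n for n in p if n not in r)
    duplicated = sorted({n for side in (r, p) for n, count in side.items() if count > 1})
    return missing, duplicated


if __name__ == "__main__":
    esterification = "[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][OH:6]>>[CH3:1][C:2](=[O:3])[O:6][CH3:5]"
    print(check_mapping(esterification))  # ([], [])
```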
concepts and rxno
1 Heteroatom alkylation and arylation
  1.7 O-substitution
    1.7.1 Chan-Lam ether coupling
    1.7.2 Diazomethane esterification
    1.7.3 Ethyl esterification
    1.7.4 Hydroxy to methoxy
    1.7.5 Hydroxy to triflyloxy
    1.7.6 Methyl esterification
    1.7.n
2 Acylation and related processes
  2.6 O-acylation to ester
    2.6.1 Ester Schotten-Baumann
    2.6.2 Esterification (generic)
    2.6.3 Fischer-Speier esterification
    2.6.4 Baeyer-Villiger oxidation
    2.6.5 Yamaguchi esterification
    2.6.6 Hydroxy to imidazolecarbonyloxy
    2.6.7 Imidazolecarbonyl to ester
    2.6.8 Hydroxy to acetoxy
    2.6.9 Steglich esterification
    2.6.n
Esterification (7)
Chan-Lam coupling (3)
Schotten-Baumann Reaction (9)
RXNO: http://github.com/rsc-ontologies/rxno
result FACETS
Provides a summary of the key concepts in the results
Cuts through the information deluge to refine the search
• Reaction Types (NextMove ontology tree)
• Drug Targets (ChEMBL ontology tree)
• Disease Targets (MeSH ontology tree)
• Yields
• Affiliation (NextMove ontology tree)
• Publication Date, Documents, Authors
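For tree-shaped facets such as reaction types, a result counted under a leaf class should also count towards every ancestor, so a user can drill down from "1 Heteroatom alkylation and arylation" to "1.7.1 Chan-Lam ether coupling". This is a minimal sketch, assuming dotted class codes like the NameRxn numbers on the concepts slide; other trees would need an explicit parent lookup instead of string splitting, and the function names are hypothetical.

```python
from collections import Counter


def ancestors(code):
    """'1.7.1' -> ['1', '1.7', '1.7.1']: the class itself plus all of its parents."""
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def rollup(codes):
    """Count each result under its own class and under every ancestor class."""
    counts = Counter()
    for code in codes:
        counts.update(ancestors(code))
    return counts


def render(counts, labels):
    """Indented facet tree, parents before children, with optional display labels."""
    lines = []
    for code in sorted(counts, key=lambda c: [int(p) for p in c.split(".")]):
        name = f"{code} {labels[code]}" if code in labels else code
        lines.append(f"{'  ' * code.count('.')}{name} ({counts[code]})")
    return lines


if __name__ == "__main__":
    results = ["1.7.1", "1.7.4", "2.6.1", "2.6.1"]
    labels = {
        "1": "Heteroatom alkylation and arylation", "1.7": "O-substitution",
        "1.7.1": "Chan-Lam ether coupling", "1.7.4": "Hydroxy to methoxy",
        "2": "Acylation and related processes", "2.6": "O-acylation to ester",
        "2.6.1": "Ester Schotten-Baumann",
    }
    print("\n".join(render(rollup(results), labels)))
```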
facet calculation
Resource-expensive: O(n) in the size of the result set
• Client, server, or database?
• Overhead of copying and transferring data that is not needed
• Calculate when requested, or up-front?
Custom cartridge: 2.9 seconds to summarise all 6.6 million rows (Intel(R) Core(TM) i7-6900K CPU @ 3.20GHz)
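Whichever tier does the work, computing facets is a single pass over the result set: each row updates one counter per facet, so the cost grows linearly with the number of rows (times the number of facets). This is a minimal in-memory sketch, assuming rows are dictionaries with hypothetical "year" and "yield" keys and 10%-wide yield bins; it says nothing about how the custom cartridge is implemented.

```python
from collections import Counter, defaultdict


def yield_bin(percent, width=10):
    """Bucket a percentage yield, e.g. 7 -> '0-9%', 80 -> '80-89%', 100 -> '90-100%'."""
    if percent is None:
        return "unknown"
    low = min(int(percent) // width * width, 100 - width)
    high = 100 if low + width >= 100 else low + width - 1
    return f"{low}-{high}%"


def facet_counts(rows, facets):
    """One pass over `rows`; `facets` maps a facet name to a function row -> bucket."""
    counts = defaultdict(Counter)
    for row in rows:
        for name, bucket_of in facets.items():
            counts[name][bucket_of(row)] += 1
    return counts


if __name__ == "__main__":
    rows = [{"year": 2016, "yield": 7}, {"year": 2016, "yield": 82}, {"year": 2017, "yield": None}]
    facets = {"year": lambda r: r["year"], "yield": lambda r: yield_bin(r["yield"])}
    for name, counter in facet_counts(rows, facets).items():
        print(name, dict(counter))
```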
one entry point
Systematic Name, Date Range, Trivial Name, Yield Range, Affiliation, Reaction SMARTS, Disease Target, Document, Line Formula, SMILES, InChI, Author, Protein Target, Collection, Reaction Type (NameRxn), SMARTS, Source
…and logical combinations thereof
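A single entry point has to guess what kind of term the user typed before it can route it. This is a minimal heuristic sketch with hypothetical category names and simple surface patterns; names, authors, targets and the like fall through to "text" and would need dictionary lookup, and the SMILES test is deliberately crude (a bare "CO", for instance, is read as a structure).

```python
import re

# Hypothetical query categories and surface patterns.
BRACKET_ATOM = re.compile(r"\[[^\]]*\]")
SMILES_CHARS = re.compile(r"[BCNOPSFIbcnops0-9@+\-=#$()/\\.%*:]+")
DATE_RANGE = re.compile(r"\d{4}\s*(?:-|to)\s*\d{4}")
YIELD = re.compile(r"yield\s+(?:of\s+)?\d{1,3}\s*%", re.IGNORECASE)


def looks_like_smiles(q):
    """Heuristic: no spaces or commas, at least one letter, and after removing
    bracket atoms and Cl/Br only organic-subset atoms and SMILES symbols remain."""
    if any(ch.isspace() for ch in q) or "," in q or not any(ch.isalpha() for ch in q):
        return False
    rest = BRACKET_ATOM.sub("", q).replace("Cl", "").replace("Br", "")
    return rest == "" or SMILES_CHARS.fullmatch(rest) is not None


def classify_query(q):
    """Guess which kind of term a free-text query is."""
    q = q.strip()
    if q.startswith("InChI="):
        return "inchi"
    if DATE_RANGE.fullmatch(q):
        return "date_range"
    if YIELD.fullmatch(q):
        return "yield"
    if q.count(">") == 2 and not any(ch.isspace() for ch in q):
        return "reaction"
    if looks_like_smiles(q):
        return "structure"
    return "text"


if __name__ == "__main__":
    for query in ["c1ccccc1", "7H-purine", "2012-2016", "yield of 80%", "DMSO"]:
        print(query, "->", classify_query(query))
```

Logical combinations ("... and ...") would be split first and each part classified on its own.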
suggestions
Based on global frequency
Based on context frequency
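One way to read the two suggestion modes is as two ranking signals for prefix completion: frequency across the whole database, and frequency within the current context (for example, the current result set). This is a minimal sketch, assuming both signals are available as term counts; the weighting scheme is hypothetical and the counts below are toy values, not Pistachio data.

```python
from collections import Counter


def suggest(prefix, global_counts, context_counts=None, limit=5, context_weight=10.0):
    """Complete `prefix` from known terms, ranked by global frequency plus a
    weighted bonus for frequency in the current context (if supplied)."""
    prefix = prefix.lower()
    context_counts = context_counts or Counter()
    candidates = [term for term in global_counts if term.lower().startswith(prefix)]

    def score(term):
        return context_weight * context_counts.get(term, 0) + global_counts[term]

    # Highest score first; ties broken alphabetically for stable output.
    return sorted(candidates, key=lambda term: (-score(term), term))[:limit]


if __name__ == "__main__":
    toy_global = Counter({"Suzuki coupling": 500, "Sulfonamide synthesis": 300,
                          "Sonogashira coupling": 200, "Amide coupling": 900})
    toy_context = Counter({"Sonogashira coupling": 40, "Suzuki coupling": 1})
    print(suggest("s", toy_global))               # global ranking
    print(suggest("s", toy_global, toy_context))  # context-aware ranking
```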
structure search technology
NextMove’s Arthor Technology
Up to 100x faster than the state of the art
Combination of SMARTS compilation and efficient storage
Preliminary PostgreSQL integration
Total time   System (database)
36s          Arthor
56m          BIOVIA Direct (Oracle)
1h           Bingo (NoSQL)
1h54m        Bingo (PostgreSQL)
2h6m         Bingo (Oracle)
2h41m        JChem (Oracle)
5h9m         RDCart (PostgreSQL)
13h54m       pgchem (PostgreSQL)
1d1h52m      mychem (MySQL)
3d1h13m      orchem (Oracle)
Benchmark: ~3.5K queries against ~7M structures (eMolecules 2014), all run on the same hardware.
John May and Roger Sayle, Substructure Search Face-off, May 2015
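The benchmark's mixed-unit durations are easier to compare as ratios. This short sketch converts each total time from the table above to seconds and divides by the Arthor time; the duration parser is written for this table's compact format only.

```python
import re

SECONDS_PER_UNIT = {"d": 86400, "h": 3600, "m": 60, "s": 1}

# Total run times from the benchmark table above.
TIMINGS = {
    "Arthor": "36s",
    "BIOVIA Direct (Oracle)": "56m",
    "Bingo (NoSQL)": "1h",
    "Bingo (PostgreSQL)": "1h54m",
    "Bingo (Oracle)": "2h6m",
    "JChem (Oracle)": "2h41m",
    "RDCart (PostgreSQL)": "5h9m",
    "pgchem (PostgreSQL)": "13h54m",
    "mychem (MySQL)": "1d1h52m",
    "orchem (Oracle)": "3d1h13m",
}


def to_seconds(duration):
    """Parse compact durations such as '36s', '2h6m' or '3d1h13m' into seconds."""
    parts = re.findall(r"(\d+)([dhms])", duration)
    if not parts or "".join(n + u for n, u in parts) != duration:
        raise ValueError(f"unrecognised duration: {duration!r}")
    return sum(int(n) * SECONDS_PER_UNIT[u] for n, u in parts)


def speedups(timings, reference="Arthor"):
    """How many times longer each system took than the reference system."""
    base = to_seconds(timings[reference])
    return {name: to_seconds(t) / base for name, t in timings.items()}


if __name__ == "__main__":
    for name, factor in sorted(speedups(TIMINGS).items(), key=lambda kv: kv[1]):
        print(f"{name:24s} {factor:8.1f}x")
```

For example, BIOVIA Direct's 56 minutes works out to about 93 times Arthor's 36 seconds.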
Intention can be refined by qualifiers
• Role: {structure} product
• Substructure: {structure} substructure; {structure} substructure product
• Make/Break: Synthesis of {structure}
• Combined with other terms: {structure} substructure product and yield of 80%
refining structure search
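The qualifier patterns above suggest a small grammar: a structure followed by optional qualifiers, or "Synthesis of" followed by a structure, with further terms joined by "and". This is a minimal sketch, assuming the structure is a single whitespace-free token and that "Synthesis of" means the structure must be formed in the product; the class and field names are hypothetical and this is not Pistachio's parser.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

ROLES = {"product", "reactant", "agent"}


@dataclass
class StructureQuery:
    structure: str
    substructure: bool = False      # match as a substructure, not an exact structure
    role: Optional[str] = None      # restrict to product/reactant/agent
    make_break: bool = False        # structure must be formed by the reaction
    filters: List[str] = field(default_factory=list)  # other terms, e.g. 'yield of 80%'


def parse_query(text):
    """Parse '{structure} [substructure] [role] and ...' or 'Synthesis of {structure} and ...'."""
    head, *filters = re.split(r"\s+and\s+", text.strip(), flags=re.IGNORECASE)
    synthesis = re.fullmatch(r"synthesis\s+of\s+(\S+)", head, flags=re.IGNORECASE)
    if synthesis:
        return StructureQuery(synthesis.group(1), role="product", make_break=True, filters=filters)
    structure, *qualifiers = head.split()
    query = StructureQuery(structure, filters=filters)
    for word in qualifiers:
        if word.lower() == "substructure":
            query.substructure = True
        elif word.lower() in ROLES:
            query.role = word.lower()
        else:
            raise ValueError(f"unknown qualifier: {word!r}")
    return query


if __name__ == "__main__":
    print(parse_query("7H-purine substructure product and yield of 80%"))
    print(parse_query("Synthesis of 7H-purine"))
```

Reaction-type words such as "chlorination" are rejected here; supporting them would need another branch that looks terms up in the reaction ontology.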
Find: 7H-purine substructure product
Find: Synthesis of 7H-purine
make/break example
Find: 7H-purine-8-one substructure chlorination
Find: [*:1][CH2:2]Cl>>[*:1][CH2:2]F
NameRxn example
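In the transformation query [*:1][CH2:2]Cl>>[*:1][CH2:2]F, the mapped atoms are the parts that persist, while the unmapped Cl and F are what leaves and what arrives. This is a minimal sketch that extracts those unmapped atoms from a simple reaction SMARTS, assuming no recursive SMARTS and only organic-subset symbols outside brackets; the helper names are hypothetical, and real SMARTS handling needs a cheminformatics toolkit.

```python
import re
from collections import Counter

# Atom tokens in a simple SMARTS: bracket atoms, Cl/Br before C/B, the other
# organic-subset symbols, aromatic lowercase symbols and the '*' wildcard.
ATOM_TOKEN = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFI]|[bcnops]|\*")
HAS_MAP = re.compile(r":\d+\]$")


def unmapped_atoms(fragment):
    """Count atom tokens that carry no atom-map number."""
    return Counter(t for t in ATOM_TOKEN.findall(fragment) if not HAS_MAP.search(t))


def leaving_and_incoming(rxn_smarts):
    """Unmapped atoms only on the left (leaving) and only on the right (incoming)."""
    reactants, _agents, products = rxn_smarts.split(">")
    left, right = unmapped_atoms(reactants), unmapped_atoms(products)
    return left - right, right - left


if __name__ == "__main__":
    leaving, incoming = leaving_and_incoming("[*:1][CH2:2]Cl>>[*:1][CH2:2]F")
    print(dict(leaving), dict(incoming))  # {'Cl': 1} {'F': 1}
```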
Acknowledgements
Noel O’Boyle (NextMove Software), Egon Willighagen (CDK)
James Davison, Matt Swain (Vernalis)
pistachio
http://www.nextmovesoftware.com/pistachio.html
Come find me around ACS for a demo!
See also: CINF 90

CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
 
Building on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfilesBuilding on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfiles
 
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
 
Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)
 
Challenges in Chemical Information Exchange
Challenges in Chemical Information ExchangeChallenges in Chemical Information Exchange
Challenges in Chemical Information Exchange
 

Recently uploaded

SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxAleenaTreesaSaji
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfNAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfWadeK3
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 

Recently uploaded (20)

SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptx
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfNAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 

CINF 13: Pistachio - Search and Faceting of Large Reaction Databases

  • 1. Pistachio: Search and Faceting of Large Reaction Databases. John Mayfield, Daniel Lowe, Roger Sayle. CINF 13, ACS Fall 2017, Washington, D.C.
  • 2. What do Synthetic Chemists Want from Their Reaction Systems? Data, Classification, Diagrams, Search
  • 4. Infrastructure for liberating and processing reactions from Electronic Lab Notebooks (ELNs). Tools: HazELNut, Filbert, NameRxn, Cobnut. Environments shown:
    Accelrys Pipeline Pilot (AstraZeneca, AbbVie & Hoffmann-La Roche)
    ChemAxon JChem Cartridge (GlaxoSmithKline & Novartis)
    Elsevier Reaxys (Hoffmann-La Roche, AstraZeneca, Merck)
    PerkinElmer Informatics (formerly CambridgeSoft) eNotebook v9, v11 or v13, or Symyx ELN v5.x or v6.x
    Oracle Server version 10, 11 or …
    Microsoft Windows, Linux or Mac OS
  • 5. Example reaction text, US 2016/16966 A1 [0517]: "To 7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (Peakdale) (220 mg, 1.025 mmol) and (3,4-dimethoxyphenyl)boronic acid (187 mg, 1.025 mmol) in 1,4-dioxane (3 mL) and water (1.5 mL) was added sodium carbonate(435 mg, 4.10 mmol) and tetrakis(triphenylphosphine)palladium(0) (110 mg, 0.095 mmol). The reaction was heated in the microwave at 80° C. for 2 hours and at 100° C. for a further 2 hours. The solvent was removed and the residue was suspended in DMSO, filtered and purified by MDAP. Appropriate fractions were combined and the solvent removed to give 7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (25 mg, 7%) as a yellow solid." Source: Daniel M. Lowe. Extraction of chemical structures and reactions from the literature. Ph.D. Thesis, University of Cambridge, 2012.
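The quantities in this paragraph are internally consistent, and it is easy to check that with a little arithmetic: millimoles are milligrams divided by molecular weight, and the yield is product millimoles over limiting-reagent millimoles. The sketch below is only an illustration; the molecular formulas (C7H3ClN2O4 for the chloro acid, C15H12N2O6 for the product, Na2CO3) are derived here from the names, and the rounded standard atomic weights are an assumption of this example, not data from the slide.

```python
import re

# Rounded standard atomic weights (g/mol); only the elements this example needs.
ATOMIC_WEIGHTS = {"H": 1.008, "B": 10.81, "C": 12.011, "N": 14.007,
                  "O": 15.999, "Na": 22.990, "Cl": 35.45}


def molecular_weight(formula: str) -> float:
    """Molecular weight in g/mol for a simple formula such as 'C7H3ClN2O4'."""
    parts = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    # Reject anything the element/count pattern did not fully consume.
    if "".join(sym + count for sym, count in parts) != formula:
        raise ValueError(f"unsupported formula: {formula!r}")
    return sum(ATOMIC_WEIGHTS[sym] * (int(count) if count else 1)
               for sym, count in parts)


def millimoles(mass_mg: float, formula: str) -> float:
    """mg divided by g/mol gives mmol."""
    return mass_mg / molecular_weight(formula)


def percent_yield(product_mg: float, product_formula: str,
                  limiting_mg: float, limiting_formula: str) -> float:
    """Isolated yield relative to the limiting reagent, in percent."""
    return 100.0 * millimoles(product_mg, product_formula) / millimoles(
        limiting_mg, limiting_formula)


if __name__ == "__main__":
    sm = millimoles(220, "C7H3ClN2O4")   # chloro acid starting material
    base = millimoles(435, "Na2CO3")     # sodium carbonate
    y = percent_yield(25, "C15H12N2O6", 220, "C7H3ClN2O4")
    print(f"{sm:.3f} mmol, {base:.2f} mmol, {y:.1f}% yield")
```

With these weights the starting material comes out at about 1.025 mmol and the sodium carbonate at about 4.10 mmol, matching the text; the computed yield is about 7.7%, close to the reported 7%.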
  • 6. Unstructured text to a structured reaction table with LeadMine + ChemicalTagger (US 2016/16966 A1 [0517]; Daniel M. Lowe, Ph.D. Thesis, University of Cambridge, 2012):
    Product
      7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid: 25 mg, 7% yield, yellow solid
    Reactants
      7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid: 220 mg, 1.025 mmol
      (3,4-dimethoxyphenyl)boronic acid: 187 mg, 1.025 mmol
    Agents
      1,4-dioxane: 3 mL
      water: 1.5 mL
      sodium carbonate: 435 mg, 4.10 mmol
      tetrakis(triphenylphosphine)palladium(0): 110 mg, 0.095 mmol
      DMSO
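The slide credits LeadMine and ChemicalTagger for this conversion; the sketch below reproduces neither. It only illustrates one small sub-step under a simplifying assumption: that quantities appear as parenthesised phrases such as "(220 mg, 1.025 mmol)" or "(25 mg, 7%)". A plain regular expression pulls those phrases into structured fields; associating each quantity with the right compound name is the hard part that real entity recognition and grammar-based parsing handle.

```python
import re
from typing import Dict, List, Union

NUMBER = r"(\d+(?:\.\d+)?)"
# "(amount unit)" optionally followed by ", n mmol" / ", n mol" or ", n%".
QUANTITY = re.compile(
    rf"\({NUMBER}\s*(mg|g|mL|L)"
    rf"(?:,\s*(?:{NUMBER}\s*(mmol|mol)|{NUMBER}\s*%))?\)"
)


def extract_quantities(text: str) -> List[Dict[str, Union[float, str]]]:
    """Return one dict per parenthesised quantity phrase, in text order."""
    results = []
    for m in QUANTITY.finditer(text):
        q: Dict[str, Union[float, str]] = {"amount": float(m.group(1)),
                                           "unit": m.group(2)}
        if m.group(3):                      # molar amount present
            q["moles"] = float(m.group(3))
            q["mole_unit"] = m.group(4)
        if m.group(5):                      # percentage yield present
            q["yield_pct"] = float(m.group(5))
        results.append(q)
    return results
```

Phrases such as "(Peakdale)" or "(3,4-dimethoxyphenyl)" do not match, because a number must be followed directly by a mass or volume unit.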
  • 7. Data impact: a public subset was released in 2014 as CC-Zero; Pistachio expands the scope of the data and uses atom-atom maps from NameRxn. References:
    Christos Nicolaou et al. The Proximal Lilly Collection: Mapping, Exploring and Exploiting Feasible Chemical Space. J. Chem. Inf. Model., 2016, 56 (7), pp 1253–1266.
    Nadine Schneider et al. Big Data from Pharmaceutical Patents: A Computational Analysis of Medicinal Chemists’ Bread and Butter. J. Med. Chem., 2016, 59 (9), pp 4385–4402.
    Nadine Schneider et al. Development of a Novel Fingerprint for Chemical Reactions and Its Application to Large-Scale Reaction Classification and Similarity. J. Chem. Inf. Model., 2015, 55 (1), pp 39–53.
    Nadine Schneider et al. What’s What: The (Nearly) Definitive Guide to Reaction Role Assignment. J. Chem. Inf. Model., 2016, 56 (12), pp 2336–2346.
    Connor Coley et al. Prediction of Organic Reaction Outcomes Using Machine Learning. ACS Cent. Sci., 2017, 3 (5), pp 434–443.
  • 8. Sketch extraction with NextMove’s Praline: Example 26 (Step 1, Step 2, Step 3, Step 4, etc.) from Epizyme Inc., 1-phenoxy-3-(alkylamino)-propan-2-ol derivatives as CARM1 inhibitors and uses thereof (US 09718816 B2), Aug. 1, 2017. See: John May et al. Sketchy Sketches: Hiding Chemistry in Plain Sight. Seventh Joint Sheffield Conference on Cheminformatics, 2016.
  • 9. Total reactions over time: [chart of cumulative reaction details by year, 1997 to 2018, y-axis 0 to 3.5M; series: EPO Applications, EPO Grants, USPTO Applications, USPTO Grants]
  • 10. What do Synthetic Chemists Want from Their Reaction Systems? Data, Classification, Diagrams, Search
  • 11. Reaction diagrams: good reaction diagrams are essential in communicating synthetic chemistry. Layout can be stored or generated: when extracting from text, layout must be generated, and generated diagrams can be unsatisfactory for display.
  • 13. Diagrams generated from SMILES by ChemAxon and BIOVIA for US 2016/16966 A1 [0517].
  • 14. Diagram improvements. Typical workarounds: render molecules separately; hide agents and list them separately. What humans do: wrap products below; abbreviate functional groups and agents; orient reactants to products and vice versa; hide agents and list them as text.
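The "hide agents and list as text" workaround starts from the three fields of a reaction SMILES, `reactants>agents>products`. The sketch below separates those fields and produces a one-line text layout with the agents written over the arrow. It is an illustration, not how any of the renderers on the previous slide work; the `names` lookup (SMILES to display label) is a hypothetical input, and component-level grouping with parentheses in reaction SMILES is ignored.

```python
from typing import Dict, List, Optional, Tuple


def split_reaction_smiles(rxn: str) -> Tuple[List[str], List[str], List[str]]:
    """Split 'reactants>agents>products' into three lists of components."""
    smiles = rxn.split(" ", 1)[0]          # drop any CXSMILES/title suffix
    parts = smiles.split(">")
    if len(parts) != 3:
        raise ValueError(f"not a reaction SMILES: {rxn!r}")
    reactants, agents, products = ([c for c in p.split(".") if c] for p in parts)
    return reactants, agents, products


def text_layout(rxn: str, names: Optional[Dict[str, str]] = None) -> str:
    """Draw only reactants and products; list agents as text over the arrow."""
    names = names or {}
    reactants, agents, products = split_reaction_smiles(rxn)
    label = ", ".join(names.get(a, a) for a in agents)
    arrow = f" --[{label}]--> " if label else " --> "
    return " + ".join(reactants) + arrow + " + ".join(products)


if __name__ == "__main__":
    print(text_layout("CC(=O)O.OCC>[H+]>CC(=O)OCC.O", {"[H+]": "acid catalyst"}))
```

A real renderer would lay out the reactant and product depictions and place the agent text above the arrow; the string output just makes the separation of roles visible.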
  • 17. What do Synthetic Chemists Want from Their Reaction Systems? Data, Classification, Diagrams, Search
  • 18. NameRxn: assigns names to 900+ reactions using transformations (example shown: 4.1.6 Cyclic Beckmann rearrangement). It can guarantee a perfect atom-atom mapping because the atom-atom mapping is an output, not an input; MCS (maximum common substructure) mappers struggle with rearrangements.
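In reaction SMILES, an atom-atom mapping is written as `:n` labels on bracket atoms, so the mapping that a tool emits as output can be read back and sanity-checked. The sketch below is not a mapper and not NameRxn's code; it only extracts the labels from each side and checks two basic invariants: every mapped product atom has a reactant partner, and a mapped atom never changes element. The regular expression is a simplification assumed for this example (it handles isotopes, charges, chirality and aromatic lowercase atoms, but not every SMILES corner case).

```python
import re
from typing import Dict, List

# A mapped bracket atom such as "[CH3:1]", "[OH2:4]" or "[13C@@H:7]".
MAPPED_ATOM = re.compile(r"\[\d*([A-Za-z][a-z]?)[^\]:]*:(\d+)\]")


def atom_maps(side: str) -> Dict[int, str]:
    """Map number -> element symbol for one side of a reaction."""
    maps: Dict[int, str] = {}
    for m in MAPPED_ATOM.finditer(side):
        num = int(m.group(2))
        if num in maps:
            raise ValueError(f"map number {num} used twice")
        maps[num] = m.group(1).capitalize()   # aromatic 'c' compares equal to 'C'
    return maps


def check_mapping(rxn: str) -> List[str]:
    """Return a list of problems; an empty list means the basic checks pass."""
    reactants, _agents, products = rxn.split(" ", 1)[0].split(">")
    r_maps, p_maps = atom_maps(reactants), atom_maps(products)
    problems = []
    for num, elem in sorted(p_maps.items()):
        if num not in r_maps:
            problems.append(f"product atom :{num} has no reactant partner")
        elif r_maps[num] != elem:
            problems.append(f"atom :{num} changes element {r_maps[num]} -> {elem}")
    return problems
```

For a Fischer esterification mapped so that the ester oxygen comes from the alcohol and the water oxygen from the acid, `check_mapping` returns an empty list; passing these checks does not prove the mapping is chemically correct, only that it is consistent.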
  • 20. Concepts and RXNO. NameRxn classification tree (excerpt):
    1 Heteroatom alkylation and arylation
      1.7 O-substitution
        1.7.1 Chan-Lam ether coupling
        1.7.2 Diazomethane esterification
        1.7.3 Ethyl esterification
        1.7.4 Hydroxy to methoxy
        1.7.5 Hydroxy to triflyloxy
        1.7.6 Methyl esterification
        1.7.n …
    2 Acylation and related processes
      2.6 O-acylation to ester
        2.6.1 Ester Schotten-Baumann
        2.6.2 Esterification (generic)
        2.6.3 Fischer-Speier esterification
        2.6.4 Baeyer-Villiger oxidation
        2.6.5 Yamaguchi esterification
        2.6.6 Hydroxy to imidazolecarbonyloxy
        2.6.7 Imidazolecarbonyl to ester
        2.6.8 Hydroxy to acetoxy
        2.6.9 Steglich esterification
        2.6.n …
    Concepts: Esterification (7), Chan-Lam coupling (3), Schotten-Baumann Reaction (9)
    RXNO: http://github.com/rsc-ontologies/rxno
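The dotted NameRxn codes make the tree implicit: every prefix of a code ("2", "2.6", "2.6.3") is an ancestor node. A concept, by contrast, cuts across branches, since esterifications appear under both 1.7 and 2.6. The sketch below encodes the excerpt from this slide, derives ancestor paths from the codes, and groups leaf classes into a concept. Grouping by a keyword in the class name is a hypothetical stand-in for a curated concept mapping such as RXNO annotations, not how Pistachio assigns concepts.

```python
from typing import Dict, List

# Excerpt of the NameRxn classes shown on the slide (code -> name).
CLASSES: Dict[str, str] = {
    "1": "Heteroatom alkylation and arylation",
    "1.7": "O-substitution",
    "1.7.1": "Chan-Lam ether coupling",
    "1.7.2": "Diazomethane esterification",
    "1.7.3": "Ethyl esterification",
    "1.7.4": "Hydroxy to methoxy",
    "1.7.5": "Hydroxy to triflyloxy",
    "1.7.6": "Methyl esterification",
    "2": "Acylation and related processes",
    "2.6": "O-acylation to ester",
    "2.6.1": "Ester Schotten-Baumann",
    "2.6.2": "Esterification (generic)",
    "2.6.3": "Fischer-Speier esterification",
    "2.6.4": "Baeyer-Villiger oxidation",
    "2.6.5": "Yamaguchi esterification",
    "2.6.6": "Hydroxy to imidazolecarbonyloxy",
    "2.6.7": "Imidazolecarbonyl to ester",
    "2.6.8": "Hydroxy to acetoxy",
    "2.6.9": "Steglich esterification",
}


def ancestors(code: str) -> List[str]:
    """'2.6.3' -> ['2', '2.6', '2.6.3']: each prefix is a node in the tree."""
    parts = code.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def path_names(code: str) -> List[str]:
    """Human-readable path from the root class down to this code."""
    return [CLASSES[c] for c in ancestors(code)]


def concept_members(keyword: str) -> List[str]:
    """Leaf codes (three levels deep) whose name mentions the keyword.

    Hypothetical: a real system would use a curated concept annotation.
    """
    kw = keyword.lower()
    return sorted(code for code, name in CLASSES.items()
                  if code.count(".") == 2 and kw in name.lower())
```

On this excerpt, `concept_members("esterification")` collects seven leaf classes from two different branches of the tree.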
  • 21. Result facets: provide a summary over the key concepts of the results, cutting through the information deluge to refine the search. Facets:
    Reaction Types (NextMove ontology tree)
    Drug Targets (ChEMBL ontology tree)
    Disease Targets (MeSH ontology tree)
    Yields
    Affiliation (NextMove ontology tree)
    Publication Date, Documents, Authors
  • 22. Facet calculation: resource expensive, O(n) in the size of the result set. Open questions: compute on the client, the server, or in the database? Overhead of copying and transferring data that is not needed. Calculate when requested or up-front? Custom cartridge: 2.9 seconds to summarise all 6.6 million rows on an Intel(R) Core(TM) i7-6900K CPU @ 3.20GHz.
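Facet calculation is O(n) because every row in the result set has to be visited once; the point is to do that single pass close to the data so the rows themselves never travel. The sketch below shows such a pass over a hypothetical row schema `(NameRxn code, yield %, year)`; it is not the custom cartridge mentioned on the slide. Each row is counted under every ancestor of its reaction class, so parent nodes in the ontology tree show subtree totals, and yields are bucketed into 10% bins (an arbitrary choice for this example).

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical row schema: (NameRxn code, yield in percent or None, year).
Row = Tuple[str, Optional[float], int]


def ancestors(code: str) -> List[str]:
    """'2.6.3' -> ['2', '2.6', '2.6.3']."""
    parts = code.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def facet_counts(rows: Iterable[Row]) -> Dict[str, Counter]:
    """Compute all facets in one pass over the result set."""
    facets: Dict[str, Counter] = {
        "reaction_type": Counter(), "yield": Counter(), "year": Counter()}
    for code, yield_pct, year in rows:
        # Count under every ancestor so tree nodes show subtree totals.
        facets["reaction_type"].update(ancestors(code))
        if yield_pct is not None:
            lo = min(int(yield_pct // 10) * 10, 90)   # 100% falls in 90-100%
            facets["yield"][f"{lo}-{lo + 10}%"] += 1
        facets["year"][year] += 1
    return facets
```

Because the function consumes an iterable, it works the same whether rows come from a list, a database cursor or a generator, which is where the "client, server, or database?" question on the slide comes in.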
  • 23. What do Synthetic Chemists Want from Their Reaction Systems? Data, Classification, Diagrams, Search
  • 24. One entry point for: Systematic Name, Date Range, Trivial Name, Yield Range, Affiliation, Reaction SMARTS, Disease Target, Document, Line Formula, SMILES, InChI, Author, Protein Target, Collection, Reaction Type (NameRxn), SMARTS, Source, …and logical combinations thereof.
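With a single entry point, the system has to guess what kind of term each query is before it can route it. The classifier below is a hypothetical heuristic, not Pistachio's logic. The order of the checks matters: "C1CCCCC1" fits the element/count pattern of a molecular formula, so a formula is only accepted when it contains a digit and no element symbol repeats. Some inputs stay ambiguous (for example "NaCl" falls through to "name"), which is one reason suggestions and explicit qualifiers help.

```python
import re

# Organic-subset atoms, aromatic atoms, bracket atoms, bonds, rings, branches.
SMILES_TOKENS = re.compile(
    r"(?:Cl|Br|[BCNOPSFI]|[bcnops]|\[[^\]]+\]|[0-9=#$()\\/%@+\-.:*])+"
)
FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")
DATE_RANGE = re.compile(r"\d{4}\s*-\s*\d{4}")
YIELD_RANGE = re.compile(r"\d+(?:\.\d+)?\s*-\s*\d+(?:\.\d+)?\s*%")


def classify_query(q: str) -> str:
    """Guess the query type; checks run from most to least specific."""
    q = q.strip()
    if q.startswith("InChI="):
        return "inchi"
    if ">" in q:                       # reaction SMILES/SMARTS separator
        return "reaction"
    if DATE_RANGE.fullmatch(q):
        return "date_range"
    if YIELD_RANGE.fullmatch(q):
        return "yield_range"
    if FORMULA.fullmatch(q) and any(ch.isdigit() for ch in q):
        symbols = re.findall(r"[A-Z][a-z]?", q)
        if len(symbols) == len(set(symbols)):   # Hill-style: no repeats
            return "formula"
    if SMILES_TOKENS.fullmatch(q):
        return "smiles"
    return "name"
```

Anything that is not recognised falls back to a name search, which is the safest default for free text.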
  • 25. Suggestions: based on global frequency, and based on context frequency.
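The two kinds of suggestion differ only in which counts drive the ranking. A minimal sketch, assuming a dictionary of global term frequencies and an optional dictionary of frequencies within the current result context (both hypothetical inputs; the terms and counts in the test are made up): candidates are filtered by prefix, then ranked by context frequency first, global frequency second, and alphabetically to break ties.

```python
from typing import Dict, List, Optional


def suggest(prefix: str, global_counts: Dict[str, int],
            context_counts: Optional[Dict[str, int]] = None,
            k: int = 3) -> List[str]:
    """Top-k completions for a prefix, preferring terms common in context."""
    p = prefix.lower()
    context_counts = context_counts or {}
    candidates = [t for t in global_counts if t.lower().startswith(p)]
    # Context frequency first, then global frequency, then alphabetical.
    candidates.sort(key=lambda t: (-context_counts.get(t, 0), -global_counts[t], t))
    return candidates[:k]
```

Without context the ranking reduces to global frequency, so the same function serves both cases on the slide.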
  • 26. Structure search technology: NextMove’s Arthor. Up to 100x faster than state-of-the-art, through a combination of SMARTS compilation and efficient storage; preliminary PostgreSQL integration.
    Benchmark: ~3.5K queries against ~7M structures (eMolecules 2014), all on the same hardware (John May and Roger Sayle, Substructure Search Face-off, May 2015):
      Arthor: 36s
      BIOVIA Direct (Oracle): 56m
      Bingo (NoSQL): 1h
      Bingo (PostgreSQL): 1h54m
      Bingo (Oracle): 2h6m
      JChem (Oracle): 2h41m
      RDCart (PostgreSQL): 5h9m
      pgchem (PostgreSQL): 13h54m
      mychem (MySQL): 1d1h52m
      orchem (Oracle): 3d1h13m
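The benchmark times mix units (36s, 56m, 1d1h52m), so comparing them takes a small conversion. The sketch below parses the compact durations into seconds and divides each system's total time by Arthor's; the timings are copied from the slide, while the parsing format is an assumption based on how they are written there.

```python
import re
from typing import Dict

UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}

# Total benchmark times from the slide.
TIMINGS: Dict[str, str] = {
    "Arthor": "36s",
    "BIOVIA Direct (Oracle)": "56m",
    "Bingo (NoSQL)": "1h",
    "Bingo (PostgreSQL)": "1h54m",
    "Bingo (Oracle)": "2h6m",
    "JChem (Oracle)": "2h41m",
    "RDCart (PostgreSQL)": "5h9m",
    "pgchem (PostgreSQL)": "13h54m",
    "mychem (MySQL)": "1d1h52m",
    "orchem (Oracle)": "3d1h13m",
}


def parse_duration(text: str) -> int:
    """'1d1h52m' -> seconds."""
    parts = re.findall(r"(\d+)([dhms])", text)
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(int(n) * UNIT_SECONDS[u] for n, u in parts)


def speedups(timings: Dict[str, str], baseline: str = "Arthor") -> Dict[str, float]:
    """How many times longer each system took than the baseline."""
    base = parse_duration(timings[baseline])
    return {name: parse_duration(t) / base
            for name, t in timings.items() if name != baseline}


if __name__ == "__main__":
    for name, factor in speedups(TIMINGS).items():
        print(f"{name:24s} {factor:8.1f}x")
```

Against the two next-fastest systems the ratios come out at roughly 93x (BIOVIA Direct) and 100x (Bingo NoSQL).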
  • 27. Refining structure search: intention can be refined by qualifiers.
    Role: {structure} product
    Substructure: {structure} substructure; {structure} substructure product
    Make/Break: Synthesis of {structure}
    Combined with other terms: {structure} substructure product and yield of 80%
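The qualifier forms on this slide amount to a small query grammar. The parser below is a hypothetical sketch of that grammar, not Pistachio's parser: the "reactant" and "agent" roles, and treating "yield of 80%" as a lower bound, are assumptions made for illustration. It splits on "and", reads the structure and its qualifiers from the first clause, and turns the remaining clauses into filters.

```python
import re
from dataclasses import dataclass
from typing import Optional

ROLES = ("reactant", "agent", "product")   # only "product" appears on the slide


@dataclass
class StructureQuery:
    structure: str
    substructure: bool = False        # False: match the exact structure
    role: Optional[str] = None        # None: the structure may play any role
    synthesis: bool = False           # "Synthesis of X": X is made in the reaction
    min_yield: Optional[float] = None # assumed to be a lower bound


def parse_query(text: str) -> StructureQuery:
    clauses = re.split(r"\s+and\s+", text.strip())
    head, extra = clauses[0], clauses[1:]

    m = re.fullmatch(r"(?i)synthesis of\s+(\S+)", head)
    if m:
        query = StructureQuery(m.group(1), role="product", synthesis=True)
    else:
        structure, *qualifiers = head.split()
        query = StructureQuery(structure)
        for word in qualifiers:
            w = word.lower()
            if w == "substructure":
                query.substructure = True
            elif w in ROLES:
                query.role = w
            else:
                raise ValueError(f"unknown qualifier: {word!r}")

    for clause in extra:
        y = re.fullmatch(r"(?i)yield of\s+(\d+(?:\.\d+)?)\s*%", clause.strip())
        if y is None:
            raise ValueError(f"unsupported filter: {clause!r}")
        query.min_yield = float(y.group(1))
    return query
```

Keeping the parsed intent in a small record like this makes it straightforward to combine with the other filter types and with facet refinement.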
  • 30. Acknowledgements: Noel O’Boyle (NextMove Software), Egon Willighagen (CDK), James Davison, Matt Swain (Vernalis). Pistachio: http://www.nextmovesoftware.com/pistachio.html. Come find me around ACS for a demo! See also: CINF 90.