CINF 13: Pistachio - Search and Faceting of Large Reaction Databases

We have previously described the extraction of reactions from US and European patents. This talk will discuss the assembly of over six million extracted reaction details consisting of the connection tables, procedure, quantities, solvents, catalysts and yields into a searchable "read-only" Electronic Lab Notebook.

In addition to reaction details, concepts including diseases, drug targets, and assignees are recognised from the patent documents and normalised to appropriate ontologies. Each normalised term is paired with the reaction details found in the document to allow intuitive cross-concept querying (e.g. "GlaxoSmithKline C-C Bond Formation greater than 80% yield Myocardial Infarction"). Reactions are classified and assigned to leaves in the RXNO Ontology. The ontologies are used to provide organisation, faceting, and filtering of results. The reaction classification also provides a precise atom mapping that facilitates structural transformation queries and can improve reaction diagram layout.

Through improvements in substructure search technology we will demonstrate several types of chemical synthesis queries that can be efficiently answered. The combination of high performance chemical searching and additional document terms provides a powerful exploratory and trend analysis tool for chemists.

Published in: Science

  1. CINF 13, ACS Fall 2017, Washington, D.C. Pistachio: Search and Faceting of Large Reaction Databases. John Mayfield, Daniel Lowe, Roger Sayle
  2. What do Synthetic Chemists Want from Their Reaction Systems? Data, Classification, Diagrams, Search
  4. Infrastructure for liberating and processing reactions from Electronic Lab Notebooks (ELNs). HazELNut, Filbert, NameRXN, Cobnut. Accelrys Pipeline Pilot (AstraZeneca, AbbVie & Hoffmann-La Roche); ChemAxon JChem Cartridge (GlaxoSmithKline & Novartis); Elsevier Reaxys (Hoffmann-La Roche, AstraZeneca, Merck). Perkin Elmer Informatics (formerly CambridgeSoft) eNotebook v9, v11 or v13, or Symyx ELN v5.x or v6.x. Oracle Server version 10, 11 or Microsoft Windows, Linux or Mac OS
  5. To 7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (Peakdale) (220 mg, 1.025 mmol) and (3,4-dimethoxyphenyl)boronic acid (187 mg, 1.025 mmol) in 1,4-dioxane (3 mL) and water (1.5 mL) was added sodium carbonate (435 mg, 4.10 mmol) and tetrakis(triphenylphosphine)palladium(0) (110 mg, 0.095 mmol). The reaction was heated in the microwave at 80° C. for 2 hours and at 100° C. for a further 2 hours. The solvent was removed and the residue was suspended in DMSO, filtered and purified by MDAP. Appropriate fractions were combined and the solvent removed to give 7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (25 mg, 7%) as a yellow solid. [0517] US 2016/16966 A1. Daniel M. Lowe. Extraction of chemical structures and reactions from the literature. Ph.D. Thesis, University of Cambridge, 2012
  6. Unstructured text to a structured reaction table (LeadMine + Chemical Tagger). US 2016/16966 A1 [0517]: To 7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (Peakdale) (220 mg, 1.025 mmol) and (3,4-dimethoxyphenyl)boronic acid (187 mg, 1.025 mmol) in 1,4-dioxane (3 mL) and water (1.5 mL) was added sodium carbonate (435 mg, 4.10 mmol) and tetrakis(triphenylphosphine)palladium(0) (110 mg, 0.095 mmol). The reaction was heated in the microwave at 80° C. for 2 hours and at 100° C. for a further 2 hours. The solvent was removed and the residue was suspended in DMSO, filtered and purified by MDAP. Appropriate fractions were combined and the solvent removed to give 7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (25 mg, 7%) as a yellow solid. Product Properties: 7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (25 mg, 7% yield, yellow solid). Reactant Properties: 7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (220 mg, 1.025 mmol); (3,4-dimethoxyphenyl)boronic acid (187 mg, 1.025 mmol). Agent Properties: 1,4-dioxane (3 mL); water (1.5 mL); sodium carbonate (435 mg, 4.10 mmol); tetrakis(triphenylphosphine)palladium(0) (110 mg, 0.095 mmol); DMSO. Daniel M. Lowe. Extraction of chemical structures and reactions from the literature. Ph.D. Thesis, University of Cambridge, 2012
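As an illustration of what a structured reaction table buys you, the row data for paragraph [0517] can be held in a small record type and queried. This is a minimal sketch: the `Component` and `ReactionRecord` classes and their field names are hypothetical and are not the output format of LeadMine or Chemical Tagger. Quantities are taken from the patent paragraph.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Component:
    """One row of the reaction table (hypothetical schema)."""
    name: str
    role: str  # "reactant", "agent" or "product"
    mass_mg: Optional[float] = None
    amount_mmol: Optional[float] = None
    volume_ml: Optional[float] = None


@dataclass
class ReactionRecord:
    """A reaction extracted from one patent paragraph (hypothetical schema)."""
    source: str
    paragraph: str
    components: List[Component] = field(default_factory=list)
    yield_percent: Optional[float] = None

    def by_role(self, role: str) -> List[Component]:
        """Return all components with the given role."""
        return [c for c in self.components if c.role == role]

    def equivalents(self, name: str, reference: str) -> float:
        """Molar ratio of one component to a reference component."""
        amounts = {c.name: c.amount_mmol for c in self.components}
        return amounts[name] / amounts[reference]


ACID = "7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid"
BORONIC = "(3,4-dimethoxyphenyl)boronic acid"
PRODUCT = ("7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]"
           "pyridazine-2-carboxylic acid")

record = ReactionRecord(
    source="US 2016/16966 A1",
    paragraph="[0517]",
    yield_percent=7.0,
    components=[
        Component(ACID, "reactant", mass_mg=220, amount_mmol=1.025),
        Component(BORONIC, "reactant", mass_mg=187, amount_mmol=1.025),
        Component("1,4-dioxane", "agent", volume_ml=3.0),
        Component("water", "agent", volume_ml=1.5),
        Component("sodium carbonate", "agent", mass_mg=435, amount_mmol=4.10),
        Component("tetrakis(triphenylphosphine)palladium(0)", "agent",
                  mass_mg=110, amount_mmol=0.095),
        Component("DMSO", "agent"),
        Component(PRODUCT, "product", mass_mg=25),
    ],
)

# Base loading relative to the chloro acid (roughly four equivalents).
base_equiv = record.equivalents("sodium carbonate", ACID)
print(f"{base_equiv:.1f} equiv. sodium carbonate")
```

Once the quantities are fields rather than prose, checks such as stoichiometric ratios or "yield above 80%" become simple comparisons.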
  7. Data impact. Public subset released in 2014 as CC-Zero. Pistachio expands the scope of the data and uses Atom-Atom Maps from NameRxn. Christos Nicolaou et al. The Proximal Lilly Collection: Mapping, Exploring and Exploiting Feasible Chemical Space. J. Chem. Inf. Model., 2016, 56 (7), pp 1253–1266; Nadine Schneider et al. Big Data from Pharmaceutical Patents: A Computational Analysis of Medicinal Chemists’ Bread and Butter. J. Med. Chem., 2016, 59 (9), pp 4385–4402; Nadine Schneider et al. Development of a Novel Fingerprint for Chemical Reactions and Its Application to Large-Scale Reaction Classification and Similarity. J. Chem. Inf. Model., 2015, 55 (1), pp 39–53; Nadine Schneider et al. What’s What: The (Nearly) Definitive Guide to Reaction Role Assignment. J. Chem. Inf. Model., 2016, 56 (12), pp 2336–2346; Connor Coley et al. Prediction of Organic Reaction Outcomes Using Machine Learning. ACS Cent. Sci., 2017, 3 (5), pp 434–443
  8. sketch extraction: NextMove’s Praline. Example 26, US 09718816 B2 (Aug. 1, 2017), Epizyme Inc., 1-phenoxy-3-(alkylamino)-propan-2-ol derivatives as CARM1 inhibitors and uses thereof. [Figure: scheme split into Step 1, Step 2, Step 3, Step 4, etc.] John May, et al. Sketchy Sketches: Hiding Chemistry in Plain Sight. Seventh Joint Sheffield Conference on Cheminformatics. 2016
  9. total reactions over time [Chart: cumulative reaction details, 0 to 3.5M, by year from 1997 to 2018; series: EPO Applications, EPO Grants, USPTO Applications, USPTO Grants]
  10. What do Synthetic Chemists Want from Their Reaction Systems? Data, Classification, Diagrams, Search
  11. reaction DIAGRAMS. Good reaction diagrams are essential in communicating synthetic chemistry. Layout can be stored or generated: • When extracting from text, layout must be generated • Generated diagrams can be unsatisfactory for display
  12. [Figure: ChemDraw and OEChem depictions] Generated from SMILES for US 2016/16966 A1 [0517]
  13. [Figure: ChemAxon and BIOVIA depictions] Generated from SMILES for US 2016/16966 A1 [0517]
  14. diagram improvements. Typical workarounds: • Separately render molecules • Hide agents and list separately. What do humans do: • Wrap products below • Abbreviate functional groups and agents • Orientate reactants to products and vice versa • Hide agents and list as text
  15. [Figure: Pistachio+CDK (Abbreviated) and Pistachio+CDK (Abbreviated+Aligned) depictions] Generated from SMILES for US 2016/16966 A1 [0517]
  16. reaction detail view
  17. What do Synthetic Chemists Want from Their Reaction Systems? Data, Classification, Diagrams, Search
  18. namerxn. Assigns names to 900+ reactions using transformations. Can guarantee perfect Atom-Atom Mapping: • Atom-Atom Mapping is an output not an input • MCS mappers struggle with rearrangements. Example: 4.1.6 Cyclic Beckmann rearrangement
  19. concepts and rxno. 1 Heteroatom alkylation and arylation, .7 O-substitution: .1 Chan-Lam ether coupling; .2 Diazomethane esterification; .3 Ethyl esterification; .4 Hydroxy to methoxy; .5 Hydroxy to triflyloxy; .6 Methyl esterification; .n. 2 Acylation and related processes, .6 O-acylation to ester: .1 Ester Schotten-Baumann; .2 Esterification (generic); .3 Fischer-Speier esterification; .4 Baeyer-Villiger oxidation; .5 Yamaguchi esterification; .6 Hydroxy to imidazolecarbonyloxy; .7 Imidazolecarbonyl to ester; .8 Hydroxy to acetoxy; .9 Steglich esterification; .n
  20. concepts and rxno. 1 Heteroatom alkylation and arylation, .7 O-substitution: .1 Chan-Lam ether coupling; .2 Diazomethane esterification; .3 Ethyl esterification; .4 Hydroxy to methoxy; .5 Hydroxy to triflyloxy; .6 Methyl esterification; .n. 2 Acylation and related processes, .6 O-acylation to ester: .1 Ester Schotten-Baumann; .2 Esterification (generic); .3 Fischer-Speier esterification; .4 Baeyer-Villiger oxidation; .5 Yamaguchi esterification; .6 Hydroxy to imidazolecarbonyloxy; .7 Imidazolecarbonyl to ester; .8 Hydroxy to acetoxy; .9 Steglich esterification; .n. Esterification (7); Chan-Lam coupling (3); Schotten-Baumann Reaction (9). RXNO: http://github.com/rsc-ontologies/rxno
  21. result FACETS. Provides summary over the key concepts of results. Cut through information deluge and refine search. • Reaction Types (NextMove ontology tree) • Drug Targets (ChEMBL ontology tree) • Disease Targets (MESH ontology tree) • Yields • Affiliation (NextMove ontology tree) • Publication Date, Documents, Authors
  22. facet calculation. Resource expensive: O(n) in the size of the result set. • Client, server, or database? • Overhead copying and transferring data that is not needed • Calculate when requested or up-front? Custom cartridge: 2.9 seconds to summarise all 6.6 million rows on an Intel(R) Core(TM) i7-6900K CPU @ 3.20GHz
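To make the O(n) cost concrete, here is a minimal sketch of a facet count over hierarchical reaction-class codes such as 1.7.1 or 4.1.6, rolled up to a chosen depth of the tree. The function name and the sample codes are illustrative assumptions; this is not the custom cartridge mentioned on the slide.

```python
from collections import Counter
from typing import Dict, Iterable


def facet_counts(codes: Iterable[str], depth: int) -> Dict[str, int]:
    """Count results per ontology node at the given depth.

    Each code is a dotted path such as "1.7.1"; truncating it to
    `depth` parts gives the ancestor node to which the hit is credited.
    One pass over the result set, so the cost grows linearly with it.
    """
    counts: Counter = Counter()
    for code in codes:
        prefix = ".".join(code.split(".")[:depth])
        counts[prefix] += 1
    return dict(counts)


# Illustrative result set: one class code per matching reaction.
hits = ["1.7.1", "1.7.6", "2.6.3", "2.6.1", "1.7.4"]

print(facet_counts(hits, depth=1))
print(facet_counts(hits, depth=2))
```

Because every row in the result set must be visited, where this loop runs (client, server, or database) decides how much data has to be copied before counting can start.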
  23. What do Synthetic Chemists Want from Their Reaction Systems? Data, Classification, Diagrams, Search
  24. one entry point. Systematic Name, Trivial Name, Line Formula, SMILES, InChI, SMARTS, Reaction SMARTS, Reaction Type (NameRxn), Yield Range, Date Range, Affiliation, Author, Disease Target, Protein Target, Document, Collection, Source …and logical combinations thereof
  25. suggestions. Based on global frequency. Based on context frequency.
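The difference between the two ranking modes can be sketched as follows, assuming each reaction record is reduced to a set of normalised terms. The `suggest` function and the sample records are hypothetical and only illustrate the idea; they do not describe how Pistachio computes suggestions.

```python
from collections import Counter
from typing import AbstractSet, List, Sequence


def suggest(records: Sequence[AbstractSet[str]], prefix: str,
            context: AbstractSet[str] = frozenset(),
            limit: int = 5) -> List[str]:
    """Rank completions of `prefix` by how often they occur.

    With an empty context every record is counted (global frequency).
    With a context, only records containing all context terms are
    counted (context frequency).
    """
    counts: Counter = Counter()
    wanted = prefix.lower()
    for terms in records:
        if not context <= terms:
            continue  # record does not match the query so far
        for term in terms:
            if term not in context and term.lower().startswith(wanted):
                counts[term] += 1
    # Most frequent first; ties broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:limit]]


# Illustrative data only: one set of terms per reaction.
records = [
    {"Suzuki coupling", "GlaxoSmithKline"},
    {"Suzuki coupling", "GlaxoSmithKline"},
    {"Suzuki coupling", "Novartis"},
    {"Steglich esterification", "Novartis"},
    {"Steglich esterification", "Novartis"},
    {"Steglich esterification", "Novartis"},
    {"Steglich esterification", "GlaxoSmithKline"},
]

global_order = suggest(records, "s")
context_order = suggest(records, "s", context={"GlaxoSmithKline"})
print(global_order)
print(context_order)
```

In this toy data the globally most common completion differs from the most common one once an affiliation has already been typed, which is the case context-based ranking is meant to handle.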
  26. structure search technology. NextMove’s Arthor Technology. Up to 100x faster than state-of-the-art. Combination of SMARTS compilation and efficient storage. Preliminary PostgreSQL integration. Benchmark: ~3.5K queries against ~7M structures (eMolecules 2014), all on the same hardware: Arthor 36s; BIOVIA Direct (Oracle) 56m; Bingo (NoSQL) 1h; Bingo (PostgreSQL) 1h54m; Bingo (Oracle) 2h6m; JChem (Oracle) 2h41m; RDCart (PostgreSQL) 5h9m; pgchem (PostgreSQL) 13h54m; mychem (MySQL) 1d1h52m; orchem (Oracle) 3d1h13m. John May and Roger Sayle, Substructure Search Face-off, May 2015
  27. refining structure search. Intention can be refined by qualifiers. Role: {structure} product. Substructure: {structure} substructure; {structure} substructure product. Make/Break: Synthesis of {structure}. Combined with other terms: {structure} substructure product and yield of 80%
  28. make/break example. Find: 7H-purine substructure product. Find: Synthesis of 7H-purine
  29. Namerxn example. Find: 7H-purine-8-one substructure chlorination. Find: [*:1][CH2:2]Cl>>[*:1][CH2:2]F
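To show how the Reaction SMARTS query above can be read, the following sketch splits it into its reactant and product patterns and lists the atom map numbers on each side. It uses only string handling, does not validate SMARTS, and the helper names are hypothetical; a real system would use a cheminformatics toolkit to parse and match the pattern.

```python
import re
from typing import Set, Tuple

# Atom map numbers appear as ":<digits>" just before a closing bracket.
MAP_RE = re.compile(r":(\d+)\]")
# A bracket atom that carries an atom map, e.g. "[CH2:2]".
MAPPED_ATOM_RE = re.compile(r"\[[^\]]*:\d+\]")


def split_reaction(smarts: str) -> Tuple[str, str, str]:
    """Split 'reactants>agents>products' into its three parts."""
    parts = smarts.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")
    return parts[0], parts[1], parts[2]


def atom_maps(pattern: str) -> Set[int]:
    """Collect the atom map numbers used in one side of the reaction."""
    return {int(m) for m in MAP_RE.findall(pattern)}


def unmapped_part(pattern: str) -> str:
    """Strip mapped bracket atoms, leaving what is gained or lost."""
    return MAPPED_ATOM_RE.sub("", pattern)


query = "[*:1][CH2:2]Cl>>[*:1][CH2:2]F"
reactants, agents, products = split_reaction(query)

# Mapped atoms are carried through; the unmapped atoms are the change.
conserved = atom_maps(reactants) & atom_maps(products)
lost, gained = unmapped_part(reactants), unmapped_part(products)
print(conserved, lost, gained)
```

Read this way, the query asks for reactions in which a mapped CH2 group keeps its neighbour while the attached chlorine is replaced by fluorine.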
  30. Acknowledgements: Noel O’Boyle (NextMove Software), Egon Willighagen (CDK), James Davison, Matt Swain (Vernalis). pistachio: http://www.nextmovesoftware.com/pistachio.html Come find me around ACS for a demo! See also: CINF 90