Evaluating the Quality and
Performance of Automatic Atom
      Mapping Algorithms

                Daniel Lowe and Roger Sayle
                    NextMove Software
                       Cambridge, UK

 ACS National Meeting, Philadelphia, USA 20th August 2012
What is Atom-Mapping?




                                                      Mapping
                                                      algorithm




Why Perform Atom-Mapping?

• Assigning roles to reagents

• Normalization of reactions for registration




Why Perform Atom-Mapping?

• More precise database searches
  – Solvents/catalysts can be distinguished from
    reactants
  – Allows the relationship between the reactant
    atoms and product atoms to be made explicit
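
A minimal sketch of how a mapping lets roles be assigned: a left-hand component that contributes at least one mapped atom to the product is treated as a reactant, and anything else as a solvent/catalyst. The function names and the example reaction (acetyl chloride and methanol in dichloromethane) are illustrative assumptions and are not taken from any of the evaluated tools.

```python
import re

# Matches the atom-map number inside a bracket atom, e.g. the 1 in [CH3:1].
MAP_NUMBER = re.compile(r":(\d+)\]")


def map_numbers(smiles):
    """Return the set of atom-map numbers present in a SMILES string."""
    return {int(n) for n in MAP_NUMBER.findall(smiles)}


def assign_roles(mapped_rxn):
    """Split the left-hand side of a mapped reaction SMILES into roles.

    Assumes the usual reactants>agents>products layout with no extensions
    appended after the products.
    """
    lhs, agents, rhs = mapped_rxn.split(">")
    product_maps = map_numbers(rhs)
    roles = {"reactants": [], "agents": [s for s in agents.split(".") if s]}
    for component in lhs.split("."):
        # A component is a genuine reactant if any of its atoms ends up
        # in the product according to the mapping.
        if map_numbers(component) & product_maps:
            roles["reactants"].append(component)
        else:
            roles["agents"].append(component)
    return roles


rxn = "[CH3:1][C:2](=[O:3])Cl.[CH3:4][OH:5].ClCCl>>[CH3:1][C:2](=[O:3])[O:5][CH3:4]"
print(assign_roles(rxn))
```

Here the unmapped dichloromethane contributes no atoms to the product, so it is classed as an agent even though it was written on the reactant side.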




Example
• I want to find reactions converting an alkene
  to a cyclopropane so I search for C=C>>C1CC1




Why Perform Atom-Mapping?

• Identifying suspect reactions:




Qualities to look for in an atom
        mapping algorithm
• Chemically plausible atom mappings
• Ability to distinguish genuine reactants from
  solvents/catalysts
• Support for unbalanced reactions
  – Side product not specified
  – Reactant stoichiometry > 1
• Fast run-time


Algorithms Evaluated

      Vendor:Program                  Version
      ChemAxon:Marvin                 5.10.1
      GGA:Indigo                      1.1
      InfoChem:ICMAP                  5.10
      PerkinElmer:ChemDraw Ultra      12.0




Methodology

      Test set                                                         Reactions
      Pharmaceutical ELN subset                                        18,244
      ChemReact68 database                                             67,926
      SPRESI database subset                                           5,230
      Reactions extracted from 2008-2011 USPTO patent applications*    562,872



* Lowe, D. M. Automated Extraction of Reactions from the Patent Literature.
243rd ACS National Meeting & Exposition, San Diego, CA, March 27, 2012.

Methodology-cont.

• Reaction SMILES were used as input and
  output for all algorithms except ICMAP
• Input and output were converted to and from
  RDF for use with ICMAP
• Indigo was run with its default configuration
  and with more lenient settings for matching
  valences, charges and bond orders
• Marvin was configured to use its best
  quality mapping strategy
Ability to map all product atoms
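
One way such a measure could be computed from mapped reaction SMILES output is sketched below, assuming a product atom counts as mapped when it is written as a bracket atom with a map number. The regular expressions and the `product_coverage` name are illustrative assumptions, and this is not necessarily how the figures in the talk were obtained.

```python
import re

# Tokenises SMILES atoms: bracket atoms first, then two-letter organic-subset
# symbols, then one-letter aliphatic and aromatic symbols.
ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# A bracket atom that ends with an atom-map number, e.g. [CH3:1].
MAPPED = re.compile(r":\d+\]$")


def product_coverage(mapped_rxn):
    """Fraction of product atoms that carry an atom-map number.

    Assumes a plain reactants>agents>products string. Explicit [H] atoms,
    if present, are counted like any other atom.
    """
    products = mapped_rxn.split(">")[2]
    atoms = ATOM.findall(products)
    if not atoms:
        return 0.0
    mapped = sum(1 for atom in atoms if MAPPED.search(atom))
    return mapped / len(atoms)


full = "[CH3:1][C:2](=[O:3])Cl.[CH3:4][OH:5]>>[CH3:1][C:2](=[O:3])[O:5][CH3:4]"
partial = "[CH3:1][C:2](=[O:3])Cl.CO>>[CH3:1][C:2](=[O:3])OC"
print(product_coverage(full), product_coverage(partial))
```

A reaction counts as fully mapped under this sketch only when the returned fraction is 1.0.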




C-C bonds broken




Speed Comparison




  Average reagents per reaction:    1.7    3.6    1.6    4.0
Simple mappings




        Marvin/ChemDraw/Indigo/ICMAP
Simple mappings




        Marvin/ChemDraw/Indigo/ICMAP
More complicated mappings




                                        Marvin




                                   ChemDraw

More complicated mappings



                                        Indigo




                                        ICMAP


Reuse of reactants




Reuse of reactants




                                        Marvin
Reuse of reactants




                                   ChemDraw
Reuse of reactants




                                        Indigo
Reuse of reactants




                                        ICMAP
Single Atom Mapping




                               ICMAP/Marvin




                          ChemDraw/Indigo

Bugs and quirks

• Marvin
  – 2 unsuccessful mappings produced unchecked
    exceptions rather than checked exceptions
• ChemDraw
  – Hydrogen on aromatic atoms missing in SMILES
     output
• Indigo
  – Calculation of valency fails for aromatic sulfur


Bugs and quirks

• ICMAP
  – Single atom products are interpreted as empty
    molecules or occasionally replaced by a product
    from a previous reaction (bug reported)
  – Input files must be < 2 GB and use DOS line endings
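
The two input constraints can be handled in a preprocessing step. The sketch below is one possible approach, not InfoChem's tooling: the helper names are hypothetical, the limit is assumed to mean 2 GiB (the slide does not say), and the whole file is read into memory for simplicity.

```python
from pathlib import Path

# Assumption: "2 GB" is taken as 2 GiB; adjust if the real limit differs.
MAX_BYTES = 2 * 1024**3


def to_dos_line_endings(data):
    """Normalise LF, CR and CRLF line endings in a bytes object to CRLF."""
    unix = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unix.replace(b"\n", b"\r\n")


def prepare_rdf(src, dst):
    """Write a CRLF copy of src to dst, refusing output at or over the limit."""
    converted = to_dos_line_endings(Path(src).read_bytes())
    if len(converted) >= MAX_BYTES:
        raise ValueError(
            f"{src} is {len(converted)} bytes after conversion; split it first"
        )
    Path(dst).write_bytes(converted)


print(to_dos_line_endings(b"$RDFILE 1\n$DATM\n"))
```

The size is checked after conversion because adding a carriage return to every line can push a file that was just under the limit over it.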




Conclusions

• ICMAP produced the best quality mappings on
  the tested sets

• Atom mapping isn’t as simple as finding a
  maximum common subgraph mapping

• In all the algorithms there were aspects that
  could be improved to yield appreciable
  benefits
Acknowledgements

• Ed Griffen and Nick Tomkinson, AstraZeneca
• Andrew Wooster, GSK
• Hans Kraut, InfoChem


• Thank you for your time.




