SlideShare a Scribd company logo
1 of 18
Download to read offline
The anatomy of a
chemical reaction:
Dissection by machine
learning algorithms
Alex M. Clark, Ph.D.
August 2014
© 2015 Molecular Materials Informatics, Inc.
http://molmatinf.com
MOLECULAR MATERIALS INFORMATICS
21st Century Publishing
2
chemist
experiment
write up
confirm
μ pub
URI
viewing
searching
machine
learning
MOLECULAR MATERIALS INFORMATICS
All your byte are belong to us
• Just because a reaction scheme is digital…
• … doesn’t mean it’s of any use to a computer.
3
MOLECULAR MATERIALS INFORMATICS
Production Raster Graphics
4
Generic molfile
15 16 0 0 0 0 0 0 0 0999 V2000
-3.9510 4.0500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-5.2500 3.3000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.6519 3.3000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-5.2500 1.8000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-3.9510 1.0500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.6519 1.8000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.2306 3.7694 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.3482 2.5611 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.2240 1.3407 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
-6.5490 4.0500 0.0000 N 0 3 0 0 0 0 0 0 0 0 0 0
-7.8481 3.3000 0.0000 O 0 5 0 0 0 0 0 0 0 0 0 0
-6.5490 5.5500 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
1.1518 2.5673 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.9072 1.2714 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.8964 3.8695 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
1 3 2 0 0 0 0
2 4 2 0 0 0 0
4 5 1 0 0 0 0
5 6 2 0 0 0 0
6 3 1 0 0 0 0
3 7 1 0 0 0 0
7 8 2 0 0 0 0
8 9 1 0 0 0 0
9 6 1 0 0 0 0
2 10 1 0 0 0 0
10 11 1 0 0 0 0
10 12 2 0 0 0 0
8 13 1 0 0 0 0
13 14 1 0 0 0 0
13 15 1 0 0 0 0
M CHG 2 10 1 11 -1
M END
MOLECULAR MATERIALS INFORMATICS
Production Vector Graphics
• Manuscripts usually delivered as PDFs:
5
MOLECULAR MATERIALS INFORMATICS
Spreadsheets
• Data gives the impression of organisation
• Very high degrees of freedom, nothing for structures
6
MOLECULAR MATERIALS INFORMATICS
Common Scheme
7
MOLECULAR MATERIALS INFORMATICS
Digitally Friendly
8
primary
reactant
secondary
reactants
catalyst
solvent
intermediate
byproducts
final
product
reagent
MOLECULAR MATERIALS INFORMATICS
Representation
• For machines: representation
must be very rigidly defined
• For humans: can generate
diagram programmatically
• MDL RXN/RDfile ~50% there
• DataSheet XML with Experiment
aspect
http://molmatinf.com/fmtaspect.html
9
StructureStep Role
1
1
1
1
1
1
2
2
Reactant
Reagent
Product
Product
Stoich.
1
1
1
1
1
1
1
1
Reactant
Reagent
Reagent
Reagent
MOLECULAR MATERIALS INFORMATICS
Balancing
10
MOLECULAR MATERIALS INFORMATICS
Quantities
11
MOLECULAR MATERIALS INFORMATICS
Green Metrics
12
• Totals for reactants, products & waste
• For each non-waste product: yield, PMI, E-factor,
Atom-E… always calculated, always recorded
MOLECULAR MATERIALS INFORMATICS 13
Reaction Transforms
• Reaction = specific description of experiment
1 2 3
4
1 2
3
4
• Transform = the generic form of a reaction
MOLECULAR MATERIALS INFORMATICS 14
Convenience
• Apply to a molecule...
10 g
MOLECULAR MATERIALS INFORMATICS 15
Decision MakingProductSearchResults
Yield PMI E-factor
Atom
Economy
100% 2.18 1.18 100%
84% 12.49 11.49 93.3%
82% 19.17 18.17 87.4%
100% 8.26 7.26 56.3%
63% 8.93 7.93 73.3%
MOLECULAR MATERIALS INFORMATICS
Model Building
• Most reaction data is noisy and incomplete
• Imagine opportunities with quantity & quality...
16
1
2
3
4
5 1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
• For example: model solvent substitution
MOLECULAR MATERIALS INFORMATICS
Conclusions & Future
• Most published reactions intractible to machines
• Most reaction informatics formats 50% complete
• Full description has immediate benefits...
• ... eventual large scale machine learning.
• μPublications with provenance: the path to open
repositories - but requires attention to content
17
Acknowledgments
http://molmatinf.com
http://molsync.com
http://cheminf20.org
@aclarkxyz
• Antony Williams
• Sean Ekins
• Leah McEwen
• Open data advocates
• Inquiries to
info@molmatinf.com

More Related Content

Similar to Machine learning dissects chemical reactions

Chemoinformatics in Action
Chemoinformatics in ActionChemoinformatics in Action
Chemoinformatics in ActionSSA KPI
 
PEP Functional Proteomics Technology
PEP Functional Proteomics TechnologyPEP Functional Proteomics Technology
PEP Functional Proteomics TechnologyXing Wang
 
Fast & Micro GC Presentation
Fast & Micro GC PresentationFast & Micro GC Presentation
Fast & Micro GC Presentationspparker
 
Assessing the consistency, quality, and completeness of the Reviewed Event Bu...
Assessing the consistency, quality, and completeness of the Reviewed Event Bu...Assessing the consistency, quality, and completeness of the Reviewed Event Bu...
Assessing the consistency, quality, and completeness of the Reviewed Event Bu...Ivan Kitov
 
EUGM 2013 - Anh Kiet Tran Minh (CNRS): French Academic Compound Library: the ...
EUGM 2013 - Anh Kiet Tran Minh (CNRS): French Academic Compound Library: the ...EUGM 2013 - Anh Kiet Tran Minh (CNRS): French Academic Compound Library: the ...
EUGM 2013 - Anh Kiet Tran Minh (CNRS): French Academic Compound Library: the ...ChemAxon
 
Industry 4.0 v2.0 4x3 (are sunum) (1)
Industry 4.0 v2.0 4x3 (are sunum) (1)Industry 4.0 v2.0 4x3 (are sunum) (1)
Industry 4.0 v2.0 4x3 (are sunum) (1)Mustafa Kuğu
 
BILS 2015 Christoph Herwig
BILS 2015 Christoph HerwigBILS 2015 Christoph Herwig
BILS 2015 Christoph HerwigGBX Events
 
Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...
Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...
Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...NextMove Software
 
Adventures in Metabolite Profiling with an Accurate Mass QTof
Adventures in Metabolite Profiling with an Accurate Mass QTofAdventures in Metabolite Profiling with an Accurate Mass QTof
Adventures in Metabolite Profiling with an Accurate Mass QTofMicroConstants
 
II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceDr. Haxel Consult
 
Impact of Laboratory Automation on quality and TRT. Evaluating and Selecting...
Impact of  Laboratory Automation on quality and TRT. Evaluating and Selecting...Impact of  Laboratory Automation on quality and TRT. Evaluating and Selecting...
Impact of Laboratory Automation on quality and TRT. Evaluating and Selecting...Moustafa Rezk
 
EUGM 2014 - Roger Sayle (NextMove Software): Implementing ISO standard 11238 ...
EUGM 2014 - Roger Sayle (NextMove Software): Implementing ISO standard 11238 ...EUGM 2014 - Roger Sayle (NextMove Software): Implementing ISO standard 11238 ...
EUGM 2014 - Roger Sayle (NextMove Software): Implementing ISO standard 11238 ...ChemAxon
 
Irish Renewable Energy Summit 230212 Final
Irish Renewable Energy Summit 230212 FinalIrish Renewable Energy Summit 230212 Final
Irish Renewable Energy Summit 230212 Finalbenhanley77
 

Similar to Machine learning dissects chemical reactions (20)

Chemoinformatics in Action
Chemoinformatics in ActionChemoinformatics in Action
Chemoinformatics in Action
 
PEP Functional Proteomics Technology
PEP Functional Proteomics TechnologyPEP Functional Proteomics Technology
PEP Functional Proteomics Technology
 
Fast & Micro GC Presentation
Fast & Micro GC PresentationFast & Micro GC Presentation
Fast & Micro GC Presentation
 
Chang Sha, China
Chang Sha, ChinaChang Sha, China
Chang Sha, China
 
Assessing the consistency, quality, and completeness of the Reviewed Event Bu...
Assessing the consistency, quality, and completeness of the Reviewed Event Bu...Assessing the consistency, quality, and completeness of the Reviewed Event Bu...
Assessing the consistency, quality, and completeness of the Reviewed Event Bu...
 
EUGM 2013 - Anh Kiet Tran Minh (CNRS): French Academic Compound Library: the ...
EUGM 2013 - Anh Kiet Tran Minh (CNRS): French Academic Compound Library: the ...EUGM 2013 - Anh Kiet Tran Minh (CNRS): French Academic Compound Library: the ...
EUGM 2013 - Anh Kiet Tran Minh (CNRS): French Academic Compound Library: the ...
 
Mr Benjamin Soffer
Mr Benjamin SofferMr Benjamin Soffer
Mr Benjamin Soffer
 
Data integration and building a profile for yourself as an online scientist
Data integration and building a profile for yourself as an online scientistData integration and building a profile for yourself as an online scientist
Data integration and building a profile for yourself as an online scientist
 
Industry 4.0 v2.0 4x3 (are sunum) (1)
Industry 4.0 v2.0 4x3 (are sunum) (1)Industry 4.0 v2.0 4x3 (are sunum) (1)
Industry 4.0 v2.0 4x3 (are sunum) (1)
 
BILS 2015 Christoph Herwig
BILS 2015 Christoph HerwigBILS 2015 Christoph Herwig
BILS 2015 Christoph Herwig
 
Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...
Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...
Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...
 
Energy Saving Calculations for Recommissioning and Design
Energy Saving Calculations for Recommissioning and DesignEnergy Saving Calculations for Recommissioning and Design
Energy Saving Calculations for Recommissioning and Design
 
Adventures in Metabolite Profiling with an Accurate Mass QTof
Adventures in Metabolite Profiling with an Accurate Mass QTofAdventures in Metabolite Profiling with an Accurate Mass QTof
Adventures in Metabolite Profiling with an Accurate Mass QTof
 
ChemSpider and Traveling the Internet via Chemical Structures Cheminformatics...
ChemSpider and Traveling the Internet via Chemical Structures Cheminformatics...ChemSpider and Traveling the Internet via Chemical Structures Cheminformatics...
ChemSpider and Traveling the Internet via Chemical Structures Cheminformatics...
 
II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in Nice
 
Machine Learning Impact on IoT - Part 2
Machine Learning Impact on IoT - Part 2Machine Learning Impact on IoT - Part 2
Machine Learning Impact on IoT - Part 2
 
Impact of Laboratory Automation on quality and TRT. Evaluating and Selecting...
Impact of  Laboratory Automation on quality and TRT. Evaluating and Selecting...Impact of  Laboratory Automation on quality and TRT. Evaluating and Selecting...
Impact of Laboratory Automation on quality and TRT. Evaluating and Selecting...
 
EUGM 2014 - Roger Sayle (NextMove Software): Implementing ISO standard 11238 ...
EUGM 2014 - Roger Sayle (NextMove Software): Implementing ISO standard 11238 ...EUGM 2014 - Roger Sayle (NextMove Software): Implementing ISO standard 11238 ...
EUGM 2014 - Roger Sayle (NextMove Software): Implementing ISO standard 11238 ...
 
Irish Renewable Energy Summit 230212 Final
Irish Renewable Energy Summit 230212 FinalIrish Renewable Energy Summit 230212 Final
Irish Renewable Energy Summit 230212 Final
 
Future of Bibliographic Systems: Designing a Roadmap to a new Bibliographic I...
Future of Bibliographic Systems: Designing a Roadmap to a new Bibliographic I...Future of Bibliographic Systems: Designing a Roadmap to a new Bibliographic I...
Future of Bibliographic Systems: Designing a Roadmap to a new Bibliographic I...
 

More from Alex Clark

Mixtures QSAR: modelling collections of chemicals
Mixtures QSAR: modelling collections of chemicalsMixtures QSAR: modelling collections of chemicals
Mixtures QSAR: modelling collections of chemicalsAlex Clark
 
Mixtures InChI: a story of how standards drive upstream products
Mixtures InChI: a story of how standards drive upstream productsMixtures InChI: a story of how standards drive upstream products
Mixtures InChI: a story of how standards drive upstream productsAlex Clark
 
Mixtures as first class citizens in the realm of informatics
Mixtures as first class citizens in the realm of informaticsMixtures as first class citizens in the realm of informatics
Mixtures as first class citizens in the realm of informaticsAlex Clark
 
Mixtures: informatics for formulations and consumer products
Mixtures: informatics for formulations and consumer productsMixtures: informatics for formulations and consumer products
Mixtures: informatics for formulations and consumer productsAlex Clark
 
Coordination InChI (2019)
Coordination InChI (2019)Coordination InChI (2019)
Coordination InChI (2019)Alex Clark
 
Chemical mixtures: File format, open source tools, example data, and mixtures...
Chemical mixtures: File format, open source tools, example data, and mixtures...Chemical mixtures: File format, open source tools, example data, and mixtures...
Chemical mixtures: File format, open source tools, example data, and mixtures...Alex Clark
 
Bringing bioassay protocols to the world of informatics, using semantic annot...
Bringing bioassay protocols to the world of informatics, using semantic annot...Bringing bioassay protocols to the world of informatics, using semantic annot...
Bringing bioassay protocols to the world of informatics, using semantic annot...Alex Clark
 
ACS CINF Luncheon talk (Boston 2018)
ACS CINF Luncheon talk (Boston 2018)ACS CINF Luncheon talk (Boston 2018)
ACS CINF Luncheon talk (Boston 2018)Alex Clark
 
Autonomous model building with a preponderance of well annotated assay protocols
Autonomous model building with a preponderance of well annotated assay protocolsAutonomous model building with a preponderance of well annotated assay protocols
Autonomous model building with a preponderance of well annotated assay protocolsAlex Clark
 
Representing molecules with minimalism: A solution to the entropy of informatics
Representing molecules with minimalism: A solution to the entropy of informaticsRepresenting molecules with minimalism: A solution to the entropy of informatics
Representing molecules with minimalism: A solution to the entropy of informaticsAlex Clark
 
CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...
CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...
CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...Alex Clark
 
BioAssay Express
BioAssay ExpressBioAssay Express
BioAssay ExpressAlex Clark
 
Compact models for compact devices: Visualisation of SAR using mobile apps
Compact models for compact devices: Visualisation of SAR using mobile appsCompact models for compact devices: Visualisation of SAR using mobile apps
Compact models for compact devices: Visualisation of SAR using mobile appsAlex Clark
 
Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)
Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)
Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)Alex Clark
 
Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013
Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013
Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013Alex Clark
 
Open Drug Discovery Teams @ Hacking Health Montreal
Open Drug Discovery Teams @ Hacking Health MontrealOpen Drug Discovery Teams @ Hacking Health Montreal
Open Drug Discovery Teams @ Hacking Health MontrealAlex Clark
 
Pistoia Alliance App Strategy
Pistoia Alliance App StrategyPistoia Alliance App Strategy
Pistoia Alliance App StrategyAlex Clark
 
Mobile+Cloud: a viable replacement for desktop cheminformatics?
Mobile+Cloud: a viable replacement for desktop cheminformatics?Mobile+Cloud: a viable replacement for desktop cheminformatics?
Mobile+Cloud: a viable replacement for desktop cheminformatics?Alex Clark
 
Practical cheminformatics workflows with mobile apps
Practical cheminformatics workflows with mobile appsPractical cheminformatics workflows with mobile apps
Practical cheminformatics workflows with mobile appsAlex Clark
 
Alex M. Clark, CINF, ACS 2012 Philadelphia
Alex M. Clark, CINF, ACS 2012 PhiladelphiaAlex M. Clark, CINF, ACS 2012 Philadelphia
Alex M. Clark, CINF, ACS 2012 PhiladelphiaAlex Clark
 

More from Alex Clark (20)

Mixtures QSAR: modelling collections of chemicals
Mixtures QSAR: modelling collections of chemicalsMixtures QSAR: modelling collections of chemicals
Mixtures QSAR: modelling collections of chemicals
 
Mixtures InChI: a story of how standards drive upstream products
Mixtures InChI: a story of how standards drive upstream productsMixtures InChI: a story of how standards drive upstream products
Mixtures InChI: a story of how standards drive upstream products
 
Mixtures as first class citizens in the realm of informatics
Mixtures as first class citizens in the realm of informaticsMixtures as first class citizens in the realm of informatics
Mixtures as first class citizens in the realm of informatics
 
Mixtures: informatics for formulations and consumer products
Mixtures: informatics for formulations and consumer productsMixtures: informatics for formulations and consumer products
Mixtures: informatics for formulations and consumer products
 
Coordination InChI (2019)
Coordination InChI (2019)Coordination InChI (2019)
Coordination InChI (2019)
 
Chemical mixtures: File format, open source tools, example data, and mixtures...
Chemical mixtures: File format, open source tools, example data, and mixtures...Chemical mixtures: File format, open source tools, example data, and mixtures...
Chemical mixtures: File format, open source tools, example data, and mixtures...
 
Bringing bioassay protocols to the world of informatics, using semantic annot...
Bringing bioassay protocols to the world of informatics, using semantic annot...Bringing bioassay protocols to the world of informatics, using semantic annot...
Bringing bioassay protocols to the world of informatics, using semantic annot...
 
ACS CINF Luncheon talk (Boston 2018)
ACS CINF Luncheon talk (Boston 2018)ACS CINF Luncheon talk (Boston 2018)
ACS CINF Luncheon talk (Boston 2018)
 
Autonomous model building with a preponderance of well annotated assay protocols
Autonomous model building with a preponderance of well annotated assay protocolsAutonomous model building with a preponderance of well annotated assay protocols
Autonomous model building with a preponderance of well annotated assay protocols
 
Representing molecules with minimalism: A solution to the entropy of informatics
Representing molecules with minimalism: A solution to the entropy of informaticsRepresenting molecules with minimalism: A solution to the entropy of informatics
Representing molecules with minimalism: A solution to the entropy of informatics
 
CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...
CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...
CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...
 
BioAssay Express
BioAssay ExpressBioAssay Express
BioAssay Express
 
Compact models for compact devices: Visualisation of SAR using mobile apps
Compact models for compact devices: Visualisation of SAR using mobile appsCompact models for compact devices: Visualisation of SAR using mobile apps
Compact models for compact devices: Visualisation of SAR using mobile apps
 
Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)
Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)
Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)
 
Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013
Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013
Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013
 
Open Drug Discovery Teams @ Hacking Health Montreal
Open Drug Discovery Teams @ Hacking Health MontrealOpen Drug Discovery Teams @ Hacking Health Montreal
Open Drug Discovery Teams @ Hacking Health Montreal
 
Pistoia Alliance App Strategy
Pistoia Alliance App StrategyPistoia Alliance App Strategy
Pistoia Alliance App Strategy
 
Mobile+Cloud: a viable replacement for desktop cheminformatics?
Mobile+Cloud: a viable replacement for desktop cheminformatics?Mobile+Cloud: a viable replacement for desktop cheminformatics?
Mobile+Cloud: a viable replacement for desktop cheminformatics?
 
Practical cheminformatics workflows with mobile apps
Practical cheminformatics workflows with mobile appsPractical cheminformatics workflows with mobile apps
Practical cheminformatics workflows with mobile apps
 
Alex M. Clark, CINF, ACS 2012 Philadelphia
Alex M. Clark, CINF, ACS 2012 PhiladelphiaAlex M. Clark, CINF, ACS 2012 Philadelphia
Alex M. Clark, CINF, ACS 2012 Philadelphia
 

Recently uploaded

VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 

Recently uploaded (20)

VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 

Machine learning dissects chemical reactions

  • 1. The anatomy of a chemical reaction: Dissection by machine learning algorithms Alex M. Clark, Ph.D. August 2014 © 2015 Molecular Materials Informatics, Inc. http://molmatinf.com
  • 2. MOLECULAR MATERIALS INFORMATICS 21st Century Publishing 2 chemist experiment write up confirm μ pub URI viewing searching machine learning
  • 3. MOLECULAR MATERIALS INFORMATICS All your byte are belong to us • Just because a reaction scheme is digital… • … doesn’t mean it’s of any use to a computer. 3
  • 4. MOLECULAR MATERIALS INFORMATICS Production Raster Graphics 4 Generic molfile 15 16 0 0 0 0 0 0 0 0999 V2000 -3.9510 4.0500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 -5.2500 3.3000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 -2.6519 3.3000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 -5.2500 1.8000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 -3.9510 1.0500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 -2.6519 1.8000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 -1.2306 3.7694 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 -0.3482 2.5611 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 -1.2240 1.3407 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 -6.5490 4.0500 0.0000 N 0 3 0 0 0 0 0 0 0 0 0 0 -7.8481 3.3000 0.0000 O 0 5 0 0 0 0 0 0 0 0 0 0 -6.5490 5.5500 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 1.1518 2.5673 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 1.9072 1.2714 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 1.8964 3.8695 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 1 2 1 0 0 0 0 1 3 2 0 0 0 0 2 4 2 0 0 0 0 4 5 1 0 0 0 0 5 6 2 0 0 0 0 6 3 1 0 0 0 0 3 7 1 0 0 0 0 7 8 2 0 0 0 0 8 9 1 0 0 0 0 9 6 1 0 0 0 0 2 10 1 0 0 0 0 10 11 1 0 0 0 0 10 12 2 0 0 0 0 8 13 1 0 0 0 0 13 14 1 0 0 0 0 13 15 1 0 0 0 0 M CHG 2 10 1 11 -1 M END
  • 5. MOLECULAR MATERIALS INFORMATICS Production Vector Graphics • Manuscripts usually delivered as PDFs: 5
  • 6. MOLECULAR MATERIALS INFORMATICS Spreadsheets • Data gives the impression of organisation • Very high degrees of freedom, nothing for structures 6
  • 8. MOLECULAR MATERIALS INFORMATICS Digitally Friendly 8 primary reactant secondary reactants catalyst solvent intermediate byproducts final product reagent
  • 9. MOLECULAR MATERIALS INFORMATICS Representation • For machines: representation must be very rigidly defined • For humans: can generate diagram programmatically • MDL RXN/RDfile ~50% there • DataSheet XML with Experiment aspect http://molmatinf.com/fmtaspect.html 9 StructureStep Role 1 1 1 1 1 1 2 2 Reactant Reagent Product Product Stoich. 1 1 1 1 1 1 1 1 Reactant Reagent Reagent Reagent
  • 12. MOLECULAR MATERIALS INFORMATICS Green Metrics 12 • Totals for reactants, products & waste • For each non-waste product: yield, PMI, E-factor, Atom-E… always calculated, always recorded
  • 13. MOLECULAR MATERIALS INFORMATICS 13 Reaction Transforms • Reaction = specific description of experiment 1 2 3 4 1 2 3 4 • Transform = the generic form of a reaction
  • 14. MOLECULAR MATERIALS INFORMATICS 14 Convenience • Apply to a molecule... 10 g
  • 15. MOLECULAR MATERIALS INFORMATICS 15 Decision MakingProductSearchResults Yield PMI E-factor Atom Economy 100% 2.18 1.18 100% 84% 12.49 11.49 93.3% 82% 19.17 18.17 87.4% 100% 8.26 7.26 56.3% 63% 8.93 7.93 73.3%
  • 16. MOLECULAR MATERIALS INFORMATICS Model Building • Most reaction data is noisy and incomplete • Imagine opportunities with quantity & quality... 16 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 • For example: model solvent substitution
  • 17. MOLECULAR MATERIALS INFORMATICS Conclusions & Future • Most published reactions intractible to machines • Most reaction informatics formats 50% complete • Full description has immediate benefits... • ... eventual large scale machine learning. • μPublications with provenance: the path to open repositories - but requires attention to content 17
  • 18. Acknowledgments http://molmatinf.com http://molsync.com http://cheminf20.org @aclarkxyz • Antony Williams • Sean Ekins • Leah McEwen • Open data advocates • Inquiries to info@molmatinf.com