Capturing Chemistry in XML/CML. J. A. Townsend*, S. E. Adams*, J. M. Goodman*, P. Murray-Rust*, C. A. Waudby*. ACS, March 2004. * Unilever Centre for Molecular Informatics, University of Cambridge
The Agony of Publication - Loss: The World; Sad; The Scientist; The Lab; Journals; Web Pages
The Vision-1: Human-readable: mp 65-66 °C. Machine-readable: <scalar dictRef="ccml:mp" units="units:c" minValue="65" maxValue="66"/>
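The pairing of human-readable and machine-readable forms on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' tooling: the helper name `melting_point_scalar` is hypothetical, and the only things taken from the slide are the element name `scalar` and its four attributes.

```python
import xml.etree.ElementTree as ET


def melting_point_scalar(min_celsius, max_celsius):
    """Build a <scalar> element shaped like the one on the slide.

    Hypothetical helper: attribute names and the ccml:mp / units:c
    values are copied from the slide, everything else is assumed.
    """
    attributes = {
        "dictRef": "ccml:mp",      # dictionary entry for melting point
        "units": "units:c",        # units dictionary entry for Celsius
        "minValue": str(min_celsius),
        "maxValue": str(max_celsius),
    }
    return ET.Element("scalar", attributes)


scalar = melting_point_scalar(65, 66)

# Machine-readable form: serialised XML.
xml_text = ET.tostring(scalar, encoding="unicode")

# Human-readable form rebuilt from the same attributes.
human_text = f"mp {scalar.get('minValue')}-{scalar.get('maxValue')} \u00b0C"
```

Both strings come from the same element, so the text a chemist reads and the data a program reads cannot drift apart.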
The Vision-2 [bullets not preserved]; But also [bullets not preserved]
Our Approach [bullets not preserved]
Machine Parsing of Chemistry: Structured (CompChem); Semi-Structured (Articles); Unstructured (Discussion). MACHINE PARSING? Structured documents and data in XML.
How? Article (semi-structured): Abstract, Discussion, Experimental. Add Structure; Parse with Regular Expressions; Legacy to CML converters.
Regular Expressions: Melting point, two possible syntaxes: "m.p. > 23.5 °C" and "mp 23.5 – 25 °C". Pattern annotations: capital or lowercase 'm'; maybe '.'; lowercase 'p'; any punctuation; maybe whitespace; 0 or more digits; maybe degrees sign; capital 'C'.
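The pattern itself did not survive in the transcript, so the following is a minimal sketch assembled from the annotations above, not the expression OSCAR actually used. The names `MP_PATTERN` and `parse_melting_point`, the named groups, and the choice of accepted qualifier characters are all assumptions made for illustration.

```python
import re

# Assumed reconstruction: each piece mirrors one annotation from the slide.
MP_PATTERN = re.compile(
    r"""
    [Mm]\.?                      # capital or lowercase 'm', maybe '.'
    p\.?                         # lowercase 'p', maybe '.'
    \s*                          # maybe whitespace
    (?P<qualifier>[<>~])?        # punctuation such as '>' (assumed set)
    \s*
    (?P<low>\d+(?:\.\d+)?)       # first temperature
    (?:\s*[-\u2013]\s*           # hyphen or en dash between range ends
       (?P<high>\d+(?:\.\d+)?))? # optional second temperature
    \s*
    \u00b0?                      # maybe degrees sign
    \s*C                         # capital 'C'
    """,
    re.VERBOSE,
)


def parse_melting_point(text):
    """Return qualifier, low and high values, or None if nothing matches."""
    match = MP_PATTERN.search(text)
    if match is None:
        return None
    high = match.group("high")
    return {
        "qualifier": match.group("qualifier"),
        "low": float(match.group("low")),
        "high": float(high) if high is not None else None,
    }


single = parse_melting_point("m.p. > 23.5 \u00b0C")
ranged = parse_melting_point("mp 23.5 \u2013 25 \u00b0C")
```

Both syntaxes from the slide go through the same pattern; the second temperature is simply absent when the text reports a single value.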
CML - XML For Chemistry [bullets not preserved]. J. Chem. Inf. Comput. Sci., 2003, 43, 757
The CML Family: Controlled XML Namespaces: CMLCore – compounds and properties; CMLReact – reactions; CMLSpect – spectra; *CMLComp – compChem; CMLCryst – crystallography and condensed matter. Interoperates with HTML, MathML, SVG, *AniML+, *ThermoML$, etc. + spectra: ANSI/JCAMP. $ thermochemistry: NIST. J. Chem. Inf. Comput. Sci., 2003, 43, 757
Case Studies: Parsing output from 750,000 MOPAC jobs; High-throughput parsing of journals
CompChem Logs: Coordinates; Molecular Formula; Calculation Type; Point Group; Dipole; Total Energy
Loss From CompChem: Coordinates; Molecular Formula; Calculation Type; Dipole; Total Energy; Ionisation Potential
Parsing Data: CompChem Output: Coordinates, Energy Levels, Vibrations. General Parsers. CML File: Input/jobControl, Coordinates, Energy Level, Vibration (CMLCore, CMLCore, CMLComp, CMLSpect)
Display Process 1: CompChem Log; CML; Xindice; XSLT
Display Process 2: compChem Output: Coordinates, Energy Levels, Vibrations. CML File: Input/jobControl (CMLCore, CMLCore, CMLComp, CMLSpect). XSLT Display: 3D structure, electronic properties; Normal modes; 2D structure, thermodynamic properties
Parsing Data: Dictionary Entry: The pointgroup of a molecule ... The Schoenflies convention is normally used, but Hermann-Mauguin is also allowed. Units entry: D [debye]; ParentSI: c.m; Multiplier: 3.335641E-30; CGS units for electric dipole
Dictionaries: <scalar dictRef="ccml:mp" units="units:c" minValue="65" maxValue="66"/> Linked to CML schema. Accesses CCML namespace: id="meltrange" term="Melting range" definition="Minimum and maximum values of melting range in degrees Celsius". Units dictionary: id="celsius" name="Celsius" parentSI="k" multiplierToSI="1" constantToSI="273.15" abbreviation="C" unitType="temp"
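One use of a units dictionary like this is converting stored values to SI. The sketch below assumes the conversion is `value * multiplierToSI + constantToSI`, which is the natural reading of the two attribute names but is not spelled out on the slide. The `UnitEntry` class, the `UNITS` table, and the key `units:debye` are hypothetical; the numeric values come from the Celsius entry here and the debye entry on the previous slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitEntry:
    """One units-dictionary entry (field names adapted from the slide)."""
    unit_id: str
    parent_si: str
    multiplier_to_si: float
    constant_to_si: float = 0.0

    def to_si(self, value):
        # Assumed rule: scale first, then add the constant offset.
        return value * self.multiplier_to_si + self.constant_to_si


# Hypothetical in-memory dictionary keyed by the value of the units attribute.
UNITS = {
    "units:c": UnitEntry("celsius", "k", 1.0, 273.15),
    "units:debye": UnitEntry("debye", "c.m", 3.335641e-30),
}


def range_to_si(attributes):
    """Convert a scalar's minValue/maxValue to the parent SI unit."""
    unit = UNITS[attributes["units"]]
    low = unit.to_si(float(attributes["minValue"]))
    high = unit.to_si(float(attributes["maxValue"]))
    return low, high, unit.parent_si


scalar_attributes = {
    "dictRef": "ccml:mp",
    "units": "units:c",
    "minValue": "65",
    "maxValue": "66",
}
low_k, high_k, si_unit = range_to_si(scalar_attributes)
```

Because the conversion lives in the dictionary rather than in the data, a melting range stored in Celsius and a dipole stored in debye go through the same code path.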
OSCAR: Open Source Chemistry Analysis Routines. Sponsored by the Royal Society of Chemistry (Cambridge). Mounted on http://www.rsc.org/
Article Structure: Article: Front Matter; Abstract; Introduction; Discussion; Experimental; References; Results
Article Structure: Experimental: Synthesis; Set up; Analysis; Compound Name
Information Checked / Extracted [bullets not preserved]
OSCAR Parsing Data: H NMR; Nature; HRMS
OSCAR Data Found: Results from one paper
OSCAR Error Checking: Serious Error; Warning Type 1; Warning Type 2
OSCAR Error Checking: ~30 errors / warnings searched for. This article has: 4 errors; 2 warnings (type 1); 30 warnings (type 2). Example: Elemental analysis incorrect; calculations are for a different molecular formula
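A check of the elemental-analysis kind can be sketched as follows. This is not OSCAR's code: the function names, the small table of approximate atomic masses, the 0.05 percentage-point tolerance, and the example formulas are all assumptions chosen for illustration. The idea is to recompute the expected composition from the stated molecular formula and flag elements whose reported "calculated" percentage disagrees.

```python
import re

# Approximate standard atomic masses; extend as needed.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def percent_composition(formula):
    """Mass percentage of each element in a simple formula such as C6H12O6."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    total = sum(ATOMIC_MASS[s] * n for s, n in counts.items())
    return {s: 100.0 * ATOMIC_MASS[s] * n / total for s, n in counts.items()}


def mismatched_elements(formula, reported_calc, tolerance=0.05):
    """Elements whose reported calculated % disagrees with the formula."""
    expected = percent_composition(formula)
    return sorted(
        symbol
        for symbol, value in reported_calc.items()
        if abs(expected.get(symbol, 0.0) - value) > tolerance
    )


# Reported values that agree with the stated formula.
consistent = mismatched_elements("C6H12O6", {"C": 40.00, "H": 6.71})

# Reported values that were evidently calculated for another formula.
inconsistent = mismatched_elements("C6H12O6", {"C": 52.14, "H": 13.13})
```

An empty result means the reported numbers are consistent with the formula; a non-empty result is the kind of case the slide describes, where the calculation was done for a different molecular formula.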
OSCAR Data Presentation
OSCAR Speed: A typical paper contains ca. 20 compounds. JOC (Feb 2004) contains ~600 compounds; OSCAR could extract and tabulate in under 5 minutes. OBC (Feb 2004) contains ~300 compounds; OSCAR could extract and tabulate in under 3 minutes. High throughput, high precision
OSCAR Accuracy: 92% of data correctly identified; 3% incorrect author entry; 5% missed. False positives: 3%. Test set: 437 items, ~10,000 data fields, working with current regular expressions
XML-CML Databases: CML; Journals; Theses; CompChem; Xindice. XMLDb can support > 250,000 molecules. Millisecond retrieval on INChI, properties
Capturing Molecules: Encourage chemists to [bullets not preserved]
NLP & Parsing Names: KEY: Locant; Characteristic Group; Monovalent parent hydride; Multiplier; Heterocyclic parent hydride
Thank You: Unilever; RSC; Jonathan Goodman; Sam Adams; Fraser Norton; Chris Waudby; Yong Zhang

How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Strategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsStrategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering Teams
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
Connecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKConnecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAK
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 

Capturing Chemistry In XML

  • 1. Capturing Chemistry in XML/CML. J. A. Townsend*, S. E. Adams*, J. M. Goodman*, P. Murray-Rust*, C. A. Waudby*. ACS March 2004. * Unilever Centre for Molecular Informatics, University of Cambridge
  • 2. The Agony Of Publication - Loss: The World
  • 3. The Agony Of Publication - Loss: The World, Sad, The Scientist, The Lab, Journals, Web Pages
  • 4. The Vision-1. Human-readable: mp 65-66 °C. Machine-readable: <scalar dictRef="ccml:mp" units="units:c" minValue="65" maxValue="66"/>
  • 5.
  • 6.
  • 7. Machine Parsing of Chemistry: Structured (CompChem); Semi-Structured (Articles); Unstructured (Discussion). Machine parsing? Structured documents and data in XML
  • 8. How? Article (semi-structured): Abstract, Discussion, Experimental. Add Structure. Parse with Regular Expressions. Legacy to CML converters
  • 9.
  • 10.
  • 11. The CML Family. Controlled XML Namespaces: CMLCore – compounds and properties; CMLReact – reactions; CMLSpect – spectra*; CMLComp – compChem; CMLCryst – crystallography and condensed matter. Interoperates with HTML, MathML, SVG, *AniML+, *ThermoML$, etc. (+ spectra: ANSI/JCAMP; $ thermochemistry: NIST). J. Chem. Inf. Comp. Sci., 2003, 43, 757
  • 12. Case Studies: Parsing output from 750,000 MOPAC jobs; High-throughput parsing of journals
  • 13. CompChem Logs: Coordinates, Molecular Formula, Calculation Type, Point Group, Dipole, Total Energy
  • 14. Loss From CompChem: Coordinates, Molecular Formula, Calculation Type, Dipole, Total Energy, Ionisation Potential
  • 15. Loss From CompChem: Coordinates, Molecular Formula, Calculation Type, Dipole, Total Energy, Ionisation Potential
  • 16. Parsing Data: CompChem Output; Coordinates, Energy Levels, Vibrations; Coordinates, Energy Level, Vibration; CML File: CMLCore, CMLCore, CMLComp, CMLSpect; Input/jobControl; General Parsers
  • 17. Display Process 1: CompChem Log, Xindice, CML, XSLT
  • 18. Display Process 2: CML File: CMLCore, CMLCore, CMLComp, CMLSpect; compChem Output; 3D structure, electronic properties; Coordinates, Energy Levels, Vibrations; Input/jobControl; XSLT Display; Normal modes; 2D structure, thermodynamic properties
  • 19. Parsing Data. Dictionary Entry: The point group of a molecule ... The Schoenflies convention is normally used, but Hermann-Mauguin is also allowed. D [debye]: CGS units for electric dipole; ParentSI: c.m; Multiplier: 3.335641E-30
  • 20. Dictionaries: <scalar dictRef="ccml:mp" units="units:c" minValue="65" maxValue="66"/>. Linked to CML schema. Accesses CCML namespace. Units dictionary: id="celsius" name="Celsius" parentSI="k" multiplierToSI="1" constantToSI="273.15" abbreviation="C" unitType="temp"; id="meltrange" term="Melting range" definition="Minimum and maximum values of melting range in degrees Celsius"
  • 21. OSCAR: Open Source Chemistry Analysis Routines. Sponsored by the Royal Society of Chemistry (Cambridge). Mounted on http://www.rsc.org/
  • 22. Article Structure. Article: Front Matter, Abstract, Introduction, Discussion, Experimental, References, Results
  • 23. Article Structure. Article: Front Matter, Abstract, Introduction, Discussion, Experimental, References, Results
  • 24. Article Structure. Article: Front Matter, Abstract, Introduction, Discussion, Experimental, References, Results
  • 25. Article Structure. Article: Front Matter, Abstract, Introduction, Discussion, Experimental, References, Results
  • 26. Article Structure. Article: Front Matter, Abstract, Introduction, Discussion, Experimental, References, Results
  • 27. Article Structure. Article: Front Matter, Abstract, Introduction, Discussion, Experimental, References, Results. Experimental: Synthesis, Set up, Analysis, Compound Name
  • 28.
  • 29. OSCAR Parsing Data: H NMR, Nature, HRMS
  • 30. OSCAR Parsing Data
  • 31. OSCAR Data Found: Results from one paper
  • 32. OSCAR Error Checking: Serious Error, Warning Type 1, Warning Type 2
  • 33. OSCAR Error Checking: ~30 errors / warnings searched for. This article has: 4 errors, 2 warnings (type 1), 30 warnings (type 2). Elemental analysis, incorrect – calculations are for a different molecular formula
  • 34. OSCAR Data Presentation
  • 35. OSCAR Speed: A typical paper contains ca. 20 compounds. JOC (Feb 2004) contains ~600 compounds; OSCAR could extract and tabulate in under 5 minutes. OBC (Feb 2004) contains ~300 compounds; OSCAR could extract and tabulate in under 3 minutes. High throughput, high precision
  • 36. OSCAR Accuracy: 92% of data correctly identified; 3% incorrect author entry; 5% missed. 437 items, ~10,000 data fields in test set, working with current Regular Expressions. False positives: 3%
  • 37. XML-CML Databases: CML; Journals, Theses, CompChem; XMLDb can support > 250,000 molecules; Millisecond retrieval on INChI, properties; Xindice
  • 38.
  • 39. NLP & Parsing Names. KEY: Locant, Characteristic Group, Monovalent parent hydride, Multiplier, Heterocyclic parent hydride
  • 40. Thank You: Unilever, RSC, Jonathan Goodman, Sam Adams, Fraser Norton, Chris Waudby, Yong Zhang
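
As an illustration of the approach on slides 4, 8 and 20, the sketch below parses a melting point from running text with a regular expression, emits a CML-style scalar element, and converts a value to SI using the Celsius entry of the units dictionary. It is not OSCAR's code. The pattern MP_PATTERN, the functions mp_to_scalar and to_si, and the choice to set minValue equal to maxValue when only a single temperature is given are assumptions made for illustration. The attribute names (dictRef, units, minValue, maxValue) and the conversion constants (multiplierToSI, constantToSI) are taken from the slides.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern (not OSCAR's actual expression). It accepts forms such
# as "mp 65-66 °C" and "m.p. 23.5 – 25 °C", plus a single value like "mp 120 C".
MP_PATTERN = re.compile(
    r"\b[Mm]\.?\s?p\.?\s*"                            # "mp", "Mp", "m.p.", "m. p."
    r"(?P<low>\d+(?:\.\d+)?)"                         # first (or only) temperature
    r"(?:\s*[-\u2013]\s*(?P<high>\d+(?:\.\d+)?))?"    # optional upper bound after - or en dash
    r"\s*\u00b0?\s*C\b"                               # optional degree sign, capital C
)

# Celsius entry as shown on the units-dictionary slide.
CELSIUS = {"parentSI": "k", "multiplierToSI": 1.0, "constantToSI": 273.15}


def mp_to_scalar(text):
    """Return a <scalar> element for the first melting point in text, or None."""
    match = MP_PATTERN.search(text)
    if match is None:
        return None
    low = match.group("low")
    # Assumption: a single temperature is recorded as a zero-width range.
    high = match.group("high") or low
    return ET.Element(
        "scalar",
        {"dictRef": "ccml:mp", "units": "units:c", "minValue": low, "maxValue": high},
    )


def to_si(value, unit):
    """Convert a value to its parent SI unit: value * multiplier + constant."""
    return value * unit["multiplierToSI"] + unit["constantToSI"]


if __name__ == "__main__":
    element = mp_to_scalar("white solid, mp 65-66 \u00b0C")
    print(ET.tostring(element, encoding="unicode"))
    print(to_si(float(element.get("minValue")), CELSIUS))  # lower bound in kelvin
```

Keeping the pattern and the units entry as data rather than logic mirrors the dictionary-driven design the slides describe: supporting another property or unit would mean adding an entry, not changing the parser.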