Validating the ChemSpider Open Spectral Database NMR Collection using ACD/Labs Verification Algorithms

Ryan Sasaki [1], Sergey Golotvin [2] and Antony Williams [3]

[1] Advanced Chemistry Development, Inc. (ACD/Labs)
    Tel: (416) 368-3435, Fax: (416) 368-5596, Toll Free: 1-800-304-3988
    Email: info@acdlabs.com, www.acdlabs.com
[2] ACD Moscow Inc., Moscow, Russian Federation
[3] ChemSpider, Royal Society of Chemistry, 904 Tamaras Circle, Wake Forest, North Carolina 27587, USA

Introduction
ChemSpider is a free online database of over 26 million unique chemical compounds drawn from over 400 different sources, including government laboratories, chemical vendors, and public resources. ChemSpider allows its users to deposit data including structures, properties, links to external resources, and various forms of spectral data. ChemSpider has aggregated over 2000 high-quality NMR spectra and continues to expand as the community deposits additional data. The data are generally validated by the community, but a batch-wise verification of all 1D 1H and 13C NMR spectral data in the database was performed using ACD/Labs NMR verification software.

Sources of Spectral Data
Databases of structures with associated NMR assignments are available as commercial or open data. However, databases of NMR spectral curves are less common and generally limited to metabonomics data (for example, the BMRB (ref. 1) and DrugBank (ref. 2)). One component of the ChemSpider project is to gather, host, and make available a structure-searchable database of spectral data: 1D/2D NMR, IR, Raman, and MS. The majority of data are deposited by users of ChemSpider. Spectra can be submitted as JCAMP-DX files (for 1D spectra) or as images/PDF (for 1D or 2D spectra). To deposit a spectrum, a user simply searches ChemSpider for the associated structure and uploads the JCAMP-DX or image form of the spectrum. Community-based curators validate and annotate the data as appropriate to ensure that only the highest quality data are available in the database. As the data collection grew, a batch-wise validation of the data quality was required, and ACD/Labs NMR verification software was used to perform the analysis.

ACD/Labs NMR Verification Routines
The ACD/Labs approach to 1H NMR verification consists of two steps:
1) The experimental spectrum with an attached chemical structure is automatically processed and analyzed. Analysis includes automated peak picking, integration, and multiplicity analysis (extraction of coupling constants and coupling patterns). In addition, all extraneous signals present in the spectrum are identified (e.g., solvent, reference, and known admixtures).
2) Chemical shift, integration, and multiplicity information are predicted for the proposed chemical structure and compared with the related properties extracted from the experimental spectrum. A comparison is then made based on an auto-assignment procedure (ref. 3) that finds the best possible fit as the minimum of a special objective function.

A similar approach is taken for 13C NMR verification, but it compares the experimental and predicted chemical shift values and peak heights. In both cases the output of the verification procedure is a Match Factor metric (0-1) that indicates the level of consistency between the proposed structure and the experimental spectrum. For the purpose of the 1H NMR study, structure-spectrum pairs that generated a match factor >0.8 were considered consistent. For 13C NMR, a match factor of >0.75 was considered consistent.

Analysis of Data
The ACD/Labs automated 1H and 13C verification routines were run on the NMR spectra dataset from ChemSpider. The results of this procedure are shown in Figure 1.

[Figure 1: two pie charts with legend Consistent, Ambiguous, Inconsistent. Segment labels: (A) 77%, 16%, 7%; (B) 67%, 25%, 8%.]
Figure 1: (A) The ACD/Labs 1H verification methodology suggests that 77% of the 744 NMR spectra submitted to ChemSpider were consistent with the proposed chemical structure. (B) The ACD/Labs 13C verification methodology suggests that 67% of the 704 NMR spectra submitted to ChemSpider were consistent with the proposed chemical structure.

Identified Issues with the Data
Structures that were deemed inconsistent by the ACD/Labs system were manually reviewed. The most frequent reason for inconsistent 1H NMR verification results was a spectrum in which multiple components were observed, i.e., a mixture of isomers. Typically these were recognized from two signals in close proximity with partial integrals (for example, 0.6H and 0.4H instead of 1H). Manual inspection of all inconsistent results revealed 22 such cases where mixtures were present.

Figure 2: Example of a 1H NMR spectrum with a mixture of components as evidenced by integral values.

Other issues encountered include spectra with low resolution, incorrect spectrometer frequency, unknown solvents, and of course a series of incorrectly proposed structures.

Inconsistent results for the 13C NMR data were also evaluated. Close inspection revealed that the main cause was poor signal-to-noise (S/N), which led to the absence of 13C peaks for quaternary carbons. As a result, the software was unable to find peaks corresponding to quaternary carbons in many proposed structures, and thus a significant number of inconsistent results were observed.

Conclusions
ChemSpider is an online structure database that allows the community to participate in the deposition of additional data. A growing NMR spectral curve data collection is available to download. In this way a major reference source of Open NMR data can be provided. The validation of the existing set of spectral data has been performed using ACD/Labs NMR Verification routines. The data validation work highlighted a number of errors in the data, which have now been resolved, and also provided a thorough test of the algorithms on real-world data.

References
1) Biological Magnetic Resonance Bank: http://www.bmrb.wisc.edu/
2) DrugBank: http://www.drugbank.ca/
3) Automated Structure Verification Based on 1H NMR Prediction. S.S. Golotvin, E. Vodopianov, B.A. Lefebvre, A.J. Williams, and T.D. Spitzer (GSK). Magn. Reson. Chem., 44 (5) 524–538, 2006.
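As an illustration of the consistency criteria quoted above, the following minimal Python sketch applies the stated match-factor cut-offs (>0.8 for 1H, >0.75 for 13C) and converts the Figure 1 percentages into approximate spectrum counts. It is not the ACD/Labs implementation: the names CONSISTENT_THRESHOLD, is_consistent, and approximate_count are hypothetical, and because the poster does not state where the boundary between "ambiguous" and "inconsistent" lies, the sketch only distinguishes consistent from not consistent.

```python
# Cut-offs quoted in the poster; all names below are illustrative assumptions.
CONSISTENT_THRESHOLD = {"1H": 0.80, "13C": 0.75}


def is_consistent(match_factor: float, nucleus: str) -> bool:
    """Return True if a match factor (0-1) exceeds the poster's cut-off."""
    if not 0.0 <= match_factor <= 1.0:
        raise ValueError("match factor must lie between 0 and 1")
    if nucleus not in CONSISTENT_THRESHOLD:
        raise ValueError(f"unsupported nucleus: {nucleus!r}")
    # The poster uses a strict inequality (">"), so the cut-off itself fails.
    return match_factor > CONSISTENT_THRESHOLD[nucleus]


def approximate_count(percent: float, total: int) -> int:
    """Convert a rounded percentage back into an approximate spectrum count."""
    return round(total * percent / 100)


if __name__ == "__main__":
    print(is_consistent(0.85, "1H"))    # above the 1H cut-off
    print(is_consistent(0.78, "1H"))    # below the 1H cut-off
    print(is_consistent(0.78, "13C"))   # above the 13C cut-off
    print(approximate_count(77, 744))   # 572.88, i.e. roughly 573 1H spectra
    print(approximate_count(67, 704))   # 471.68, i.e. roughly 472 13C spectra
```

Because the percentages in Figure 1 are rounded, the counts derived this way are estimates and not figures reported by the authors.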

) these were observed based on two signals in close proximity with Tel: (416) 368-3435 partial integrals (for example 0.6H and 0.4H instead of 1H). Manual Fax: (416) 368-5596 Toll Free: 1-800-304-3988 inspection of all inconsistent results revealed 22 such cases where Email: info@acdlabs.com mixtures were present. www.acdlabs.com
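
The match factor thresholds and the partial-integral symptom of mixtures described in the poster can be illustrated with a short Python sketch. This is not ACD/Labs code and makes no claim about how the verification software works internally. The function names and the `tolerance` cutoff are hypothetical, chosen only for illustration; the 0.8 (1H) and 0.75 (13C) thresholds are the ones quoted in the text.

```python
# Illustrative sketch only; not the ACD/Labs implementation.
from typing import Dict, List

# Consistency thresholds quoted in the poster text.
CONSISTENT_THRESHOLD: Dict[str, float] = {"1H": 0.8, "13C": 0.75}


def is_consistent(match_factor: float, nucleus: str) -> bool:
    """Return True when a structure-spectrum pair exceeds the quoted threshold."""
    if not 0.0 <= match_factor <= 1.0:
        raise ValueError("match factor must lie in the range 0-1")
    return match_factor > CONSISTENT_THRESHOLD[nucleus]


def partial_integrals(integrals: List[float], tolerance: float = 0.2) -> List[float]:
    """Return integrals that are far from a whole number of protons.

    `tolerance` is a hypothetical cutoff, not a value taken from the poster.
    """
    return [value for value in integrals if abs(value - round(value)) > tolerance]


if __name__ == "__main__":
    print(is_consistent(0.85, "1H"))
    print(is_consistent(0.78, "13C"))
    # The 0.6H and 0.4H signals mirror the mixture example in the text.
    print(partial_integrals([1.0, 0.6, 0.4, 2.05, 3.0]))
```

A spectrum whose integrals include values such as 0.6H and 0.4H would be flagged by `partial_integrals` as a candidate for manual review, which is the role the poster assigns to manual inspection of inconsistent results.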