1
Automated Structure Annotation and Curation for MassBank: Potential and Pitfalls
Emma Schymanski
Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg.
Email: emma.schymanski@uni.lu
Michael A. Stravs (Eawag, Dübendorf, Switzerland)
Tobias Schulze (Helmholtz Centre for Environmental Research, Germany)
Antony J. Williams (NCCT, US EPA, Research Triangle Park, NC, USA)
The views expressed in this presentation are those of the authors and do not necessarily reflect the views or policies of the U.S. Environmental Protection Agency.
2
MassBank: Japan, Europe, America ….
Horai et al. 2010. JMS, 45(7) pp 703-714. DOI: 10.1002/jms.1777
www.massbank.jp, www.massbank.eu, http://mona.fiehnlab.ucdavis.edu/
o MassBank started as a public repository in Japan, 2006
o No standard analytical method
o Includes many different data types (GC, LC, MS, MS/MS, HR, LR, AM…)
o Contributor is responsible for data quality
o NORMAN network of reference laboratories, research centres and related organisations for monitoring of emerging environmental substances
o Many different laboratories with different instruments & reference standards
o “Emerging substances” and TPs: not yet widely known; not yet in databases
o NORMAN joined MassBank in 2012 and founded MassBank.EU
o MassBank.JP and MassBank.EU are quite similar …
o MoNA (MassBank of North America) is the latest in the collection
o Completely different database concept
3
MassBank: Crossing the World
Image: www.massbank.jp
4
MassBank Now
o www.massbank.jp & www.massbank.eu
MassBank now has 46,334 spectra* from 32 contributing institutes!
Contributions from European NORMAN member institutes
*Spectra numbers from http://mona.fiehnlab.ucdavis.edu/downloads
10,668 MS/MS
5
MassBank Now
Image: www.massbank.eu
http://massbank.eu/MassBank
6
Example Mass Spectrum
MassBank Now
7
European MassBank
http://massbank.eu/MassBank
o MassBank.EU was founded late 2012, hosted at UFZ, Leipzig, Germany
o 16,017 MS/MS spectra; 1,232 substances from NORMAN members
o Tentative/unknown/literature spectra on massbank.eu (not massbank.jp)
8
European MassBank
Image: www.massbank.eu
http://massbank.eu/MassBank
9
Creating High-Quality Mass Spectra
Automatic MS and MS/MS
Recalibration and Clean-up
Remove interfering peaks
Spectral Annotation with
- Experimental Details
- Compound Information
https://github.com/MassBank/RMassBank/
http://bioconductor.org/packages/RMassBank/
Stravs, Schymanski, Singer and Hollender, 2013, Journal of Mass Spectrometry, 48, 89–99. DOI: 10.1002/jms.3131
16,004 (61 %*) MS/MS spectra
1,269 (18 %*) substances
*% of all open LC-MS/MS data
10
Record Specifications – Well Defined
https://github.com/MassBank/MassBank-web/tree/master/Documentation
11
Concept of RMassBank Processing
o Connect spectrum and compound information
o The user needs to contribute only a bare minimum of information
o Identification:
• Only the user knows what compound has been measured
• At least one form of (unambiguous) compound identifier required
Typically: internal ID, name, SMILES, retention time
• Use web services to fill in the rest => reduces manual errors
o Measurement:
• Measurement parameters, methods, settings are consistent
• Added in batch form via a settings file
Stravs et al, 2013, JMS, 48, 89–99. DOI: 10.1002/jms.3131
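The compound-list idea on this slide can be sketched as a small check that every entry carries the minimum identifiers before anything else is filled in. This is only an illustration in Python, not RMassBank itself (which is an R package): the column names `ID`, `Name`, `SMILES` and `RT`, the function `load_compound_list`, and the example values are all assumptions made for this sketch.

```python
import csv
import io

# Hypothetical column names; the real RMassBank compound list may differ.
REQUIRED = ("ID", "Name", "SMILES", "RT")


def load_compound_list(text):
    """Parse a CSV compound list and separate rows lacking an identifier."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    compounds, problems = [], []
    for row in reader:
        # Each row needs an internal ID plus a structure identifier (SMILES here).
        if not row["ID"].strip() or not row["SMILES"].strip():
            problems.append(row)
        else:
            compounds.append(row)
    return compounds, problems


# Made-up example values, for illustration only.
example = """ID,Name,SMILES,RT
1,Benzene,c1ccccc1,5.2
2,Unknown compound,,7.9
"""
compounds, problems = load_compound_list(example)
```

Rows that end up in `problems` would go back to the user, since only the user knows which compound was measured.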
12
Concept of RMassBank Processing
o Web services: let these do the work for you!
o CACTUS Chemical Identifier Resolver
o http://cactus.nci.nih.gov/chemical/structure
o SMILES (c1ccccc1) to InChIKey (UHOVQNZJYSORNB-UHFFFAOYSA-N)
o Chemical Translation Service (CTS)
o http://cts.fiehnlab.ucdavis.edu/
o Names, CAS #, InChI and Identifiers (IDs, if available): PubChem CID, ChemSpider, ChEBI, HMDB, KEGG, LipidMaps
o PubChem
o https://pubchem.ncbi.nlm.nih.gov/
o PubChem CID
o ChemSpider
o http://www.chemspider.com/
o ChemSpider ID
Stravs et al, 2013, JMS, 48, 89–99. DOI: 10.1002/jms.3131
Behind the scenes …
OpenBabel & rcdk
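As a rough sketch of the SMILES to InChIKey lookup mentioned for CACTUS, the snippet below builds a resolver URL and tidies the response. It assumes the resolver follows the pattern `<base>/<identifier>/<representation>`, that `stdinchikey` is an accepted representation, and that the response may carry an `InChIKey=` prefix. The function names are hypothetical, and `smiles_to_inchikey` needs network access, so it is defined but not called here.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URL from the slide (https assumed to work in place of http).
CACTUS_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def cactus_url(identifier, representation="stdinchikey"):
    """Build a resolver URL of the form <base>/<identifier>/<representation>."""
    # Percent-encode the identifier so characters such as '#' or '/' survive.
    return f"{CACTUS_BASE}/{quote(identifier, safe='')}/{representation}"


def strip_inchikey_prefix(response_text):
    """Drop an optional 'InChIKey=' prefix and surrounding whitespace."""
    text = response_text.strip()
    prefix = "InChIKey="
    return text[len(prefix):] if text.startswith(prefix) else text


def smiles_to_inchikey(smiles, timeout=10):
    """Query the service; requires network access and may fail or time out."""
    with urlopen(cactus_url(smiles), timeout=timeout) as resp:
        return strip_inchikey_prefix(resp.read().decode("utf-8"))
```

Any value returned by such a lookup would still be exported for manual verification rather than trusted blindly.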
13
Auto-Completed Compound List
o Exported to user for manual verification
14
MassBank Naming Issues – Contributor Defined
15
Chemicals we purchase are not always “MS-ready”
Schymanski & Williams, 2017, ES&T, 51 (10), pp 5357–5359. DOI: 10.1021/acs.est.7b01908
16
MassBank/CompTox Curation of External Data
o A “nice” example: 4,4'-Bis(2-sulfostyryl)biphenyl
Purchased: CAS: 27344-41-8
DTXSID6036467
Registered: CAS: 38775-22-3 (UFZ)
DTXSID7047017
17
Two MassBank Lists on CompTox Chem. Dashboard
Image: www.massbank.eu
18
MassBank Reference Collection
https://comptox.epa.gov/dashboard/chemical_lists/massbankref
o All mass spectra measured with reference standards (Level 1)
19
MassBank EU Special Cases
o Literature Spectra, Supporting Information, Transformation Products, Complex Mixtures …
20
Creating High-Quality Mass Spectra II
Automatic MS and MS/MS
Recalibration and Clean-up
Remove interfering peaks
Spectral Annotation with
- Experimental Details
- Compound Information
https://github.com/MassBank/RMassBank/
http://bioconductor.org/packages/RMassBank/
MS/MS for further processing
Knowns, Suspects, Unknowns…
21
Confidence Levels for Tentative Structures
Schymanski, Jeon, Gulde, Fenner, Ruff, Singer & Hollender (2014) ES&T, 48 (4), 2097-2098. DOI: 10.1021/es5002105
o Annotation is the key to communicating information
Identification confidence, with example and minimum data requirements:
Level 1: Confirmed structure by reference standard (MS, MS2, RT, Reference Std.)
Level 2: Probable structure
a) by library spectrum match (MS, MS2, Library MS2)
b) by diagnostic evidence (MS, MS2, Exp. data)
Level 3: Tentative candidate(s): structure, substituent, class (MS, MS2, Exp. data)
Level 4: Unequivocal molecular formula, e.g. C6H5N3O4 (MS isotope/adduct)
Level 5: Exact mass of interest, e.g. 192.0757 (MS)
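The level scheme can be read as a ladder: the more evidence is available, the lower the level number. The sketch below encodes that reading in Python as a deliberately simplified illustration. It is not the authors' automated procedure, the boolean flags and the function name `confidence_level` are hypothetical, and real assignments need expert judgement about the quality of each piece of evidence.

```python
def confidence_level(has_exact_mass, has_formula=False, has_candidates=False,
                     has_library_match=False, has_diagnostic_evidence=False,
                     has_reference_standard=False):
    """Return the best (lowest-numbered) level the evidence supports, or None."""
    if not has_exact_mass:
        return None  # without an exact mass of interest there is nothing to annotate
    if has_reference_standard:
        return "1"   # confirmed structure by reference standard
    if has_library_match:
        return "2a"  # probable structure by library spectrum match
    if has_diagnostic_evidence:
        return "2b"  # probable structure by diagnostic evidence
    if has_candidates:
        return "3"   # tentative candidate(s)
    if has_formula:
        return "4"   # unequivocal molecular formula
    return "5"       # exact mass of interest only
```

For example, `confidence_level(True, has_formula=True)` gives `"4"`, while adding `has_library_match=True` moves the same feature to `"2a"`.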
22
Automating Confidence Levels
Schymanski et al, 2014, ES&T. DOI: 10.1021/es5002105 & Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7
23
Suspect Screening: Benzotriazole TPs
Huntscha et al. 2014, ES&T, 48(8), 4435-4443.
1H-BT .eu
24
Surfactant Screening From Literature
Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374
Literature sources
o Formulas, masses (ions), retention times and intensities
o Spectra of selected compounds (different instruments)
Gonzalez et al. Rapid Comm. Mass Spec. 2008, 22: 1445-54
Lara-Martin et al. EST. 2010, 44: 1670-1676
massbank.eu
39 literature spectra (so far)
25
Supporting Information Mass Spectra in MassBank
Schymanski et al, 2014, ES&T. DOI: 10.1021/es5002105
o Implementation of “levels” enables creation of supporting information collections
o Several Eawag tentative/unknown collections following the 2014 Eawag Level Scheme (DOI 10.1021/es5002105)
• Gulde et al 2016 (DOI 10.1021/acs.est.6b01301)
• TPs already found in GNPS! http://goo.gl/NmO4tx
• Rösch et al 2016 (DOI 10.1021/acs.est.5b05186)
• …and many more
26
Several Years of Automated Curation Issues … but…
Discussions will move to https://github.com/MassBank/MassBank-Curation
o Curation discussion on Github …
27
…we have enabled world-wide MS exchange!
Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7
NORMAN Suspect List Exchange:
http://www.norman-network.com/?q=node/236
Tentatively Identified Spectra:
http://goo.gl/0t7jGp
Hits in GNPS MassIVE datasets:
TPs in skin: http://goo.gl/NmO4tx
Surfactants: http://goo.gl/7sY9Pf
28
Acknowledgements
Questions?
NORMAN MassBank:
www.massbank.eu
CompTox Chemistry
Dashboard:
https://comptox.epa.gov/
Contact:
emma.schymanski@uni.lu
29
Target, Suspect and Non-Target Screening
[Workflow figure]
Sampling -> extraction (SPE) -> HPLC separation -> HR-MS/MS
o Target analysis (knowns): targets found
o Suspect screening (suspects): suspects found
o Non-target screening (no prior knowledge): masses of interest (molecular formula), then database search or structure generation
o Suspects: spectrum search (spectral match)
Candidate selection (retention time, MS/MS, calculated properties)
Confirmation and quantification of compounds present
Time, Effort & Number of Compounds….
30
Supporting Evidence for Homologues
Stravs et al. (2013), J. Mass Spectrom, 48(1):89-99. DOI: 10.1002/jms.3131
[Structure: SPA-9C, m+n=6]
Formulas: http://sourceforge.net/projects/genform/
Meringer et al, 2011, MATCH 65, 259-290
Data: Schymanski et al. 2014, ES&T, 48: 1811-1818. DOI: 10.1021/es4044374
Chromatography and MS/MS Annotation
Literature: LIT00034,35
Sample: ETS00002
Standard: ETS00016,17,19,20
https://github.com/MassBank/RMassBank/
31
Cross-Linking Homologues in the Dashboard
CDK Depict
https://www.slideshare.net/AntonyWilliams/markush-enumeration-to-manage-mesh-and-manipulate-substances-of-unknown-or-variable-composition
32
Cross-Linking Homologues in the Dashboard
Schymanski, Grulke, Williams et al, in prep. & Williams et al. 2017 J. Cheminformatics 9:61 DOI: 10.1186/s13321-017-0247-6
https://comptox.epa.gov/dashboard/chemical_lists/eawagsurf
33
Enhancing Access to Mass Spectral Information
Vinaixa, Schymanski, Navarro, Neumann, Salek, Yanes, 2016, TrAC, DOI: 10.1016/j.trac.2015.09.005
= HMDB, GNPS, MassBank, ReSpect
Compound lists provided by: S. Stein, R. Mistrik, Agilent
Most libraries still have many unique entries – intercomparability?
34
SPLASH – Communicate between libraries
Wohlgemuth et al., 2016, Nature Biotechnology 34, 1099-1101, DOI: 10.1038/nbt.3689
splash10 - 0002 - 0900000000 - b112e4e059e1ecf98c5f
[version] - [top10] - [histogram] - [hash of full spectrum]
http://mona.fiehnlab.ucdavis.edu/#/spectra/splash/splash10-0002-0900000000-b112e4e059e1ecf98c5f
https://www.google.ch/search?q=splash10-0002-0900000000-b112e4e059e1ecf98c5f
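The block layout shown on this slide, [version] - [top10] - [histogram] - [hash of full spectrum], can be taken apart with a few lines of Python. This is a minimal sketch that only splits and labels the blocks as the slide names them. It does not compute or validate a SPLASH, and the names `Splash` and `parse_splash` are made up for illustration.

```python
from typing import NamedTuple


class Splash(NamedTuple):
    version: str
    top10: str
    histogram: str
    spectrum_hash: str


def parse_splash(splash):
    """Split a SPLASH string into its four dash-separated blocks."""
    # Tolerate the spaced form used on the slide ("splash10 - 0002 - ...").
    parts = splash.replace(" ", "").split("-")
    if len(parts) != 4 or not parts[0].startswith("splash"):
        raise ValueError(f"not a SPLASH identifier: {splash!r}")
    return Splash(*parts)


example = parse_splash("splash10-0002-0900000000-b112e4e059e1ecf98c5f")
```

Because the identifier is a plain string, the same value can be pasted into a library query or a general web search, as the two URLs on this slide do.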
35
Homologues … UVCBs … Complex Mixtures
Schymanski, Grulke, Williams et al, in prep. & Williams et al. 2017 J. Cheminformatics 9:61 DOI: 10.1186/s13321-017-0247-6
https://comptox.epa.gov/dashboard/chemical_lists/eawagsurf

Automated Structure Annotation and Curation for MassBank: Potential and Pitfalls

  • 1. 1 Automated Structure Annotation and Curation for MassBank: Potential and Pitfalls Emma Schymanski Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg. Email: emma.schymanski@uni.lu Michael A. Stravs (Eawag, Dübendorf, Switzerland) Tobias Schulze (Helmholtz Centre for Environmental Research, Germany) Antony J. Williams (NCCT, US EPA, Research Triangle Park, NC, USA) The views expressed in this presentation are those of the authors and do not necessarily reflect the views or policies of the U.S. Environmental Protection Agency.
  • 2. 2 MassBank: Japan, Europe, America …. Horai et al. 2010. JMS, 45(7) pp 703-714. DOI: 10.1002/jms.1777 www.massbank.jp, www.massbank.eu, http://mona.fiehnlab.ucdavis.edu/ o MassBank started as a public repository in Japan, 2006 o No standard analytical method o Include many different data types (GC, LC, MS, MS/MS, HR, LR, AM…) o Contributor is responsible for data quality o NORMAN network of reference laboratories, research centres and related organisations for monitoring of emerging environmental substances o Many different laboratories with different instruments & reference standards o “Emerging substances” and TPs: not yet widely known; not yet in databases o NORMAN joined MassBank in 2012 and founded MassBank.EU o MassBank.JP and MassBank.EU are quite similar … o MoNA (MassBank of North America) is the latest in the collection o Completely different database concept
  • 3. 3 MassBank: Crossing the World Image: www.massbank.jp
  • 4. 4 MassBank Now o www.massbank.jp & www.massbank.eu MassBank now has 46,334 spectra* from 32 contributing institutes! Contributions from European NORMAN member institutes *Spectra numbers from http://mona.fiehnlab.ucdavis.edu/downloads 10,668 MS/MS
  • 7. 7 European MassBank http://massbank.eu/MassBank o MassBank.EU was founded late 2012, hosted at UFZ, Leipzig, Germany o 16,017 MS/MS spectra; 1,232 substances from NORMAN members o Tentative/unknown/literature spectra on massbank.eu (not massbank.jp)
  • 8. 8 European MassBank Image: www.massbank.eu http://massbank.eu/MassBank o MassBank.EU was founded late 2012, hosted at UFZ, Leipzig, Germany o 16,017 MS/MS spectra; 1,232 substances from NORMAN members o Tentative/unknown/literature spectra on massbank.eu (not massbank.jp)
  • 9. 9 Creating High-Quality Mass Spectra Automatic MS and MS/MS Recalibration and Clean-up Remove interfering peaks Spectral Annotation with - Experimental Details - Compound Information https://github.com/MassBank/RMassBank/ http://bioconductor.org/packages/RMassBank/ Stravs, Schymanski, Singer and Hollender, 2013, Journal of Mass Spectrometry, 48, 89–99. DOI: 10.1002/jms.3131 16,004 (61 %*) MS/MS spectra 1,269 (18 %*) substances *% of all open LC-MS/MS data
  • 10. 10 Record Specifications – Well Defined https://github.com/MassBank/MassBank-web/tree/master/Documentation
  • 11. 11 Concept of RMassBank Processing o Connect spectrum and compound information o User need to contribute a bare minimum of information o Identification: • Only the user knows what compound has been measured • At least one form of (unambiguous) compound identifier required Typically: internal ID, name, SMILES, retention time • Use web services to fill in the rest => reduces manual errors o Measurement: • Measurement parameters, methods, settings are consistent • Added in batch form via a settings file Stravs et al, 2013, JMS, 48, 89–99. DOI: 10.1002/jms.3131
  • 12. Concept of RMassBank Processing
      o Web services: let these do the work for you!
      o CACTUS Chemical Identifier Resolver
          o http://cactus.nci.nih.gov/chemical/structure
          o SMILES (c1ccccc1) to InChIKey (UHOVQNZJYSORNB-UHFFFAOYSA-N)
      o Chemical Translation Service (CTS)
          o http://cts.fiehnlab.ucdavis.edu/
          o Names, CAS #, InChI and Identifiers (IDs, if available): PubChem CID, ChemSpider, ChEBI, HMDB, KEGG, LipidMaps
      o PubChem
          o https://pubchem.ncbi.nlm.nih.gov/
          o PubChem CID
      o ChemSpider
          o http://www.chemspider.com/
          o ChemSpider ID
      Stravs et al, 2013, JMS, 48, 89–99. DOI: 10.1002/jms.3131
      Behind the scenes … OpenBabel & rcdk
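The SMILES-to-InChIKey lookup on slide 12 can be sketched as a plain HTTP request against the CACTUS resolver. This is a minimal illustration, not how RMassBank itself calls the service: the helper names `cactus_url` and `resolve` are hypothetical, and the use of `https` and of the `stdinchikey` representation are assumptions about the resolver's URL scheme (`<base>/<identifier>/<representation>`).

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URL from the slide; https is assumed here.
CACTUS_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def cactus_url(identifier, representation="stdinchikey"):
    """Build a resolver URL of the form <base>/<identifier>/<representation>."""
    # Percent-encode the identifier so characters such as '#' survive in the path.
    return f"{CACTUS_BASE}/{quote(identifier, safe='')}/{representation}"


def resolve(identifier, representation="stdinchikey", timeout=10):
    """Fetch the resolver's plain-text answer (requires network access)."""
    with urlopen(cactus_url(identifier, representation), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


# Only the URL is built here, so the example runs offline.
print(cactus_url("c1ccccc1"))
```

Calling `resolve("c1ccccc1")` would send the request for benzene, the example on the slide. The result should still be verified manually, as the exported compound list on the next slide suggests.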
  • 13. Auto-Completed Compound List
      o Exported to user for manual verification
  • 14. MassBank Naming Issues – Contributor Defined
  • 15. Chemicals we purchase are not always “MS-ready”
      Schymanski & Williams, 2017, ES&T, 51 (10), pp 5357–5359. DOI: 10.1021/acs.est.7b01908
  • 16. MassBank/CompTox Curation of External Data
      o A “nice” example: 4,4'-Bis(2-sulfostyryl)biphenyl
      Purchased: CAS: 27344-41-8, DTXSID6036467
      Registered: CAS: 38775-22-3 (UFZ), DTXSID7047017
  • 17. Two MassBank Lists on CompTox Chem. Dashboard
      Image: www.massbank.eu
  • 18. MassBank Reference Collection
      https://comptox.epa.gov/dashboard/chemical_lists/massbankref
      o All mass spectra measured with reference standards (Level 1)
  • 19. MassBank EU Special Cases
      o Literature Spectra, Supporting Information, Transformation Products, Complex Mixtures …
  • 20. Creating High-Quality Mass Spectra II
      Automatic MS and MS/MS Recalibration and Clean-up
      Remove interfering peaks
      Spectral Annotation with
        - Experimental Details
        - Compound Information
      https://github.com/MassBank/RMassBank/
      http://bioconductor.org/packages/RMassBank/
      MS/MS for further processing
      Knowns, Suspects, Unknowns…
  • 21. Confidence Levels for Tentative Structures
      Schymanski, Jeon, Gulde, Fenner, Ruff, Singer & Hollender (2014) ES&T, 48 (4), 2097-2098. DOI: 10.1021/es5002105
      o Annotation is the key to communicating information
      Identification confidence | Minimum data requirements | Example
      Level 1: Confirmed structure by reference standard | MS, MS2, RT, Reference Std. | [structure drawing]
      Level 2: Probable structure
        a) by library spectrum match | MS, MS2, Library MS2 |
        b) by diagnostic evidence | MS, MS2, Exp. data |
      Level 3: Tentative candidate(s): structure, substituent, class | MS, MS2, Exp. data |
      Level 4: Unequivocal molecular formula | MS isotope/adduct | C6H5N3O4
      Level 5: Exact mass of interest | MS | 192.0757
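The level scheme on slide 21 reads as a decision cascade: the strongest available evidence determines the level. The function below is a simplified sketch of that reading, not the automated procedure from the cited papers. The function name and the boolean flags are hypothetical, and real assignments need expert judgement about the quality of each piece of evidence.

```python
def confidence_level(reference_standard=False, library_match=False,
                     diagnostic_evidence=False, candidate_structures=False,
                     unequivocal_formula=False):
    """Return the level label supported by the strongest evidence flag that is set."""
    if reference_standard:        # MS, MS2, RT and a reference standard
        return "1"
    if library_match:             # MS, MS2 and a library MS2 match
        return "2a"
    if diagnostic_evidence:       # MS, MS2 and diagnostic experimental data
        return "2b"
    if candidate_structures:      # tentative candidate(s): structure, substituent, class
        return "3"
    if unequivocal_formula:       # formula from MS isotope/adduct information
        return "4"
    return "5"                    # only an exact mass of interest


print(confidence_level(library_match=True, unequivocal_formula=True))
```

Checking the flags from strongest to weakest means extra lower-level evidence never lowers the reported level.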
  • 22. Automating Confidence Levels
      Schymanski et al, 2014, ES&T. DOI: 10.1021/es5002105 & Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7
  • 23. Suspect Screening: Benzotriazole TPs
      Huntscha et al. 2014, ES&T, 48(8), 4435-4443.
      [Figure label: 1H-BT]
  • 24. Surfactant Screening From Literature
      Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374
      Literature sources
      o Formulas, masses (ions), retention times and intensities
      o Spectra of selected compounds (different instruments)
      Gonzalez et al. Rapid Comm. Mass Spec. 2008, 22: 1445-54
      Lara-Martin et al. EST. 2010, 44: 1670-1676
      massbank.eu: 39 literature spectra (so far)
  • 25. Supporting Information Mass Spectra in MassBank
      Schymanski et al, 2014, ES&T. DOI: 10.1021/es5002105
      o Implementation of “levels” enables creation of supporting information collections
      o Several Eawag tentative/unknown collections following the 2014 Eawag Level Scheme (DOI 10.1021/es5002105)
          • Gulde et al 2016 (DOI 10.1021/acs.est.6b01301)
          • TPs already found in GNPS! http://goo.gl/NmO4tx
          • Rösch et al 2016 (DOI 10.1021/acs.est.5b05186)
          • …and many more
  • 26. Several Years of Automated Curation Issues
      o Curation discussion on GitHub …
      Discussions will move to https://github.com/MassBank/MassBank-Curation
      … but…
  • 27. …we have enabled world-wide MS exchange!
      Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7
      NORMAN Suspect List Exchange: http://www.norman-network.com/?q=node/236
      Tentatively Identified Spectra: http://goo.gl/0t7jGp
      Hits in GNPS MassIVE datasets:
        TPs in skin: http://goo.gl/NmO4tx
        Surfactants: http://goo.gl/7sY9Pf
  • 29. Target, Suspect and Non-Target Screening
      [Workflow figure; recoverable labels:]
      Sample preparation and measurement: sampling, extraction (SPE); HPLC separation; HR-MS/MS
      Target analysis (knowns): targets found
      Suspect screening (suspects): suspects found; suspects spectrum search, spectral match
      Non-target screening (no prior knowledge): masses of interest (molecular formula); database search; structure generation
      Candidate selection (retention time, MS/MS, calculated properties)
      Confirmation and quantification of compounds present
      Arrow label: Time, Effort & Number of Compounds…
  • 30. Supporting Evidence for Homologues
      Stravs et al. (2013), J. Mass Spectrom, 48(1):89-99. DOI: 10.1002/jms.3131
      [Structure drawing: SPA-9C, m+n=6]
      Formulas: http://sourceforge.net/projects/genform/ (Meringer et al, 2011, MATCH 65, 259-290)
      Data: Schymanski et al. 2014, ES&T, 48: 1811-1818. DOI: 10.1021/es4044374
      Chromatography and MS/MS Annotation
      Literature: LIT00034,35
      Sample: ETS00002
      Standard: ETS00016,17,19,20
      https://github.com/MassBank/RMassBank/
  • 31. Cross-Linking Homologues in the Dashboard
      CDK Depict
      https://www.slideshare.net/AntonyWilliams/markush-enumeration-to-manage-mesh-and-manipulate-substances-of-unknown-or-variable-composition
  • 32. Cross-Linking Homologues in the Dashboard
      Schymanski, Grulke, Williams et al, in prep. & Williams et al. 2017 J. Cheminformatics 9:61 DOI: 10.1186/s13321-017-0247-6
      https://comptox.epa.gov/dashboard/chemical_lists/eawagsurf
  • 33. Enhancing Access to Mass Spectral Information
      Viniaxa, Schymanski, Navarro, Neumann, Salek, Yanes, 2016, TrAC, DOI: 10.1016/j.trac.2015.09.005
      [Figure legend: HMDB, GNPS, MassBank, ReSpect]
      Compound lists provided by: S. Stein, R. Mistrik, Agilent
      Most libraries still have many unique entries – intercomparability?
  • 34. SPLASH – Communicate between libraries
      Wohlgemuth et al., 2016, Nature Biotechnology 34, 1099-1101, DOI: 10.1038/nbt.3689
      splash10-0002-0900000000-b112e4e059e1ecf98c5f
      [version]-[top10]-[histogram]-[hash of full spectrum]
      http://mona.fiehnlab.ucdavis.edu/#/spectra/splash/splash10-0002-0900000000-b112e4e059e1ecf98c5f
      https://www.google.ch/search?q=splash10-0002-0900000000-b112e4e059e1ecf98c5f
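The four-block layout of the SPLASH on slide 34 can be illustrated by splitting the identifier on its hyphens. This sketch only parses an existing SPLASH string; it does not compute one from a spectrum. The `Splash` type and `parse_splash` helper are hypothetical names, and the field names follow the block labels on the slide.

```python
from typing import NamedTuple


class Splash(NamedTuple):
    """The four hyphen-separated blocks of a SPLASH, named as on the slide."""
    version: str
    top10: str
    histogram: str
    spectrum_hash: str


def parse_splash(splash):
    """Split a SPLASH string into its version, top10, histogram and hash blocks."""
    parts = splash.strip().split("-")
    if len(parts) != 4:
        raise ValueError(f"expected 4 hyphen-separated blocks, got {len(parts)}")
    return Splash(*parts)


parsed = parse_splash("splash10-0002-0900000000-b112e4e059e1ecf98c5f")
print(parsed.version, parsed.histogram)
```

Two libraries holding the same spectrum should carry the same full string, which is why the identifier can be pasted into MoNA or a web search as shown on the slide.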
  • 35. Homologues … UVCBs … Complex Mixtures
      Schymanski, Grulke, Williams et al, in prep. & Williams et al. 2017 J. Cheminformatics 9:61 DOI: 10.1186/s13321-017-0247-6
      https://comptox.epa.gov/dashboard/chemical_lists/eawagsurf