
Curating and Sharing Structures and Spectra for the Environmental Community


The increasing popularity of high-mass-accuracy, non-target mass spectrometry methods has yielded extensive identification efforts based on spectral and chemical compound databases in the environmental community and beyond. Increasingly, new methods rely on open data resources. Candidate structures are often retrieved by exact mass or molecular formula from large resources such as PubChem, ChemSpider or the EPA CompTox Chemistry Dashboard. Smaller, selective lists of chemicals (also called “suspect lists”) can be used to perform more efficient annotation. Mass spectral libraries can then be used to increase the confidence in tentative identifications. Additional metadata (e.g. exposure and hazard information, reference and data source information) can be extremely useful to prioritize substances of high environmental interest. Exchanging information and “sharing structural linkages” between these resources requires extensive curation to ensure that the right information is linked and shared accurately, yet many valuable datasets come from scientists and regulators with little formal cheminformatics training. This talk will cover curation efforts undertaken to map spectral libraries (e.g. MassBank.EU, mzCloud) and suspect lists from the NORMAN Suspect Exchange (http://www.norman-network.com/?q=node/236) to unique chemical identifiers associated with the US EPA CompTox Chemistry Dashboard. The curation workflow draws on years of experience, as well as contact with the original data providers, to enable open access to valuable, curated datasets that support environmental scientists and the broader research community (e.g. https://comptox.epa.gov/dashboard/chemical_lists). Note: This abstract does not reflect US EPA policy.
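The exact-mass retrieval and suspect-list annotation described in the abstract can be illustrated with a minimal Python sketch. It is not the code behind MetFrag, the NORMAN Suspect Exchange or the CompTox Dashboard. The function names, the three-entry suspect list and the 5 ppm tolerance are assumptions chosen for illustration; the compound names and formulas are taken from the triazine slide later in the deck, where the value 229.1094 given for C9H16ClN5 agrees with the neutral monoisotopic mass computed here.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207117,
    "Cl": 34.96885268,
}


def monoisotopic_mass(formula):
    """Sum isotope masses for a simple formula such as 'C9H16ClN5'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def ppm_error(measured, theoretical):
    """Relative mass deviation in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def match_suspects(measured_mass, suspects, tol_ppm=5.0):
    """Return names of suspects whose mass lies within tol_ppm of the measurement."""
    hits = []
    for name, formula in suspects:
        if abs(ppm_error(measured_mass, monoisotopic_mass(formula))) <= tol_ppm:
            hits.append(name)
    return hits


# Hypothetical mini suspect list (names and formulas as spelled on the slides).
SUSPECTS = [
    ("Terbutylazine", "C9H16ClN5"),
    ("Propazine", "C9H16ClN5"),
    ("Simazine", "C7H12ClN5"),
]

print(match_suspects(229.1094, SUSPECTS))
```

Both C9H16ClN5 isomers are returned for the same mass, which is why exact mass alone cannot separate them and why spectra and metadata are needed to rank the candidates.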

Published in: Science


  1. Curating and Sharing Structures and Spectra for the Environmental Community. Emma Schymanski, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg. Email: emma.schymanski@uni.lu. Antony J. Williams (NCCT, US EPA, Research Triangle Park, NC, USA). Image © www.seanoakley.com/ The views expressed in this presentation are those of the authors and do not necessarily reflect the views or policies of the U.S. Environmental Protection Agency.
  2. The Goal: to identify as many substances as possible in environmental samples with high resolution mass spectrometry. NORMAN Digital Sample Freezing Platform: retrospective screening across Europe (Alygizakis et al., in prep.).
  3. The Goal: to identify as many substances as possible in environmental samples with high resolution mass spectrometry. To do this we need: mass spectra in reference libraries; metadata (and fancy computational methods … but that’s another story …).
  4. The Power of the Metadata (Top 1 ranks). Schymanski et al., 2017, J. Cheminf., DOI: 10.1186/s13321-017-0207-1. www.casmi-contest.org
  5. The Power of the Metadata (Top 1 ranks). Schymanski et al., 2017, J. Cheminf., DOI: 10.1186/s13321-017-0207-1. www.casmi-contest.org
  6. The Power of the Metadata (Top 1 ranks). Schymanski et al., 2017, J. Cheminf., DOI: 10.1186/s13321-017-0207-1. www.casmi-contest.org
  7. MetFrag2.3: Non-target Identification. Status: 2010 => 2016. Ruttkies, Schymanski, Wolf, Hollender, Neumann (2016) J. Cheminf., DOI: 10.1186/s13321-016-0115-9. [Figure: MetFrag2.3 workflow. Labels: m/z [M-H]- 213.9637; ChemSpider or PubChem, ± 5 ppm; MS/MS peaks (m/z, intensity): 134.0054, 339689; 150.0001, 77271; 213.9607, 632466; 5 ppm, 0.001 Da; RT: 4.54 min, 355 InChI/RTs; References, External Refs, Data Sources, RSC Count, PubMed Count; Suspect Lists; Elements: C, N, S.]
  8. Non-target Identification and Metadata. Example from Schymanski et al. 2014, ES&T, 48: 1811-1818, DOI: 10.1021/es4044374. Helps prioritize interesting candidates rapidly … assuming candidates are in databases … One candidate stands out already. Ruttkies, Schymanski, Wolf, Hollender, Neumann (2016) J. Cheminf., DOI: 10.1186/s13321-016-0115-9. https://msbi.ipb-halle.de/MetFragBeta/
  9. Suspect Screening Allows Efficient Data Exploration
  10. NORMAN Suspect Exchange & SusDat. Schymanski, Aalizadeh et al., in prep. [Figure labels: References, Full Lists, InChIKeys.] http://www.norman-network.com/?q=node/236
  11. Now 23 lists available online … from small to large! Specialised Lists through to Market Lists
  12. … but not all are what they seem …
  13. Example: Eawag Surfactant List. Schymanski et al. 2014, ES&T, 48: 1811-1818, DOI: 10.1021/es4044374. https://comptox.epa.gov/dashboard/chemical_lists/eawagsurf
  14. Eawag Surfactant List (after many late nights …). Schymanski et al. 2014, ES&T, 48: 1811-1818, DOI: 10.1021/es4044374. https://comptox.epa.gov/dashboard/chemical_lists/eawagsurf
  15. Eawag Surfactant List in CompTox Dashboard. CDK Depict. Schymanski, Grulke, Williams et al., in prep. & Williams et al. 2017, J. Cheminformatics 9:61, DOI: 10.1186/s13321-017-0247-6. https://comptox.epa.gov/dashboard/chemical_lists/eawagsurf
  16. Cross-Linking with Lists in CompTox Dashboard. See next slide …
  17. Supporting Evidence for Homologues: Chromatography and MS/MS Annotation. [Structure: SPA-9C, m+n=6.] Formulas: http://sourceforge.net/projects/genform/ (Meringer et al., 2011, MATCH 65, 259-290). Data: Schymanski et al. 2014, ES&T, 48: 1811-1818, DOI: 10.1021/es4044374. Literature: LIT00034,35; Sample: ETS00002; Standard: ETS00016,17,19,20. Stravs et al. (2013), J. Mass Spectrom., 48(1): 89-99, DOI: 10.1002/jms.3131. https://github.com/MassBank/RMassBank/
  18. Using Generic Structures for Screening/Linking: https://github.com/schymane/RChemMass/
  19. Toolkit compatibility will be vital for us … https://github.com/cdk/depict/issues/7
  20. … to enable world-wide exchange of suspects. Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7. NORMAN Suspect List Exchange: http://www.norman-network.com/?q=node/236 ; Tentatively Identified Spectra: http://goo.gl/0t7jGp ; Hits in GNPS MassIVE datasets: TPs in skin: http://goo.gl/NmO4tx ; Surfactants: http://goo.gl/7sY9Pf
  21. Acknowledgements. emma.schymanski@uni.lu. EU Grant 603437. Stellan Fischer (KEMI). Further Information: www.massbank.eu ; http://www.norman-network.com/?q=node/236 ; https://github.com/MassBank/RMassBank/ ; https://comptox.epa.gov/dashboard/ ; https://wwwen.uni.lu/lcsb/
  23. 2015: European Non-target Screening Trial. Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7. Croatian Water, RWS
  24. Homologous Series Detection: search for discrete mass differences. [Structure drawings with repeating units m and n.] M. Loos & H. Singer, 2017, J. Cheminf., DOI: 10.1186/s13321-017-0197-z & Schymanski et al. 2014, ES&T, DOI: 10.1021/es4044374. http://www.envihomolog.eawag.ch/
  25. Surfactant Screening From Literature. Schymanski et al. (2014), ES&T, 48: 1811-1818, DOI: 10.1021/es4044374. Literature sources: formulas, masses (ions), retention times and intensities; spectra of selected compounds (different instruments). Gonzalez et al., Rapid Comm. Mass Spec. 2008, 22: 1445-54; Lara-Martin et al., EST 2010, 44: 1670-1676.
  26. 2015: European Non-target Screening Trial. Schymanski et al., 2014, ES&T, DOI: 10.1021/es5002105 & Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7. [Figure: non-target HR-MS(/MS) acquisition feeds target screening (start: Level 1), suspect screening (start: Level 3) and non-target screening (start: Level 5); inputs: target list, suspect list, peak picking or XICs. Identification confidence increases from Level 5 to Level 1, with “downgrading” on contradictory evidence.] Level 1: Confirmed Structure, by reference standard. Level 2: Probable Structure, by library/diagnostic evidence. Level 3: Tentative Candidate(s), suspect, substructure, class. Level 4: Unequivocal Molecular Formula, insufficient structural evidence. Level 5: Mass of Interest, multiple detection, trends, …
  27. Reported Identification Depended on Source List. Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7. C9H16ClN5, m/z 229.1094 Da: Terbutylazine (Detects: 12; # Refs: 220), Sebutylazine (Detects: 3; # Refs: 51), Propazine (Detects: 3; # Refs: 201), plus one position marked “(no related compound at this mass)”. C7H12ClN5, m/z 201.0781 Da: Simazine (Detects: 4; # Refs: 518), Terbutylazine-desethyl (Detects: 9; # Refs: 92), Sebutylazine-desethyl (Detects: 1; # Refs: 14), plus one position marked “(no related compound at this mass)”. C7H13N5O, m/z 183.1120 Da: Terbutylazine-desethyl-2-hydroxy (Detects: 2; # Refs: 57), Sebutylazine-desethyl-2-hydroxy (Detects: 0; # Refs: 3), Simazine-2-hydroxy (Detects: 2; # Refs: 66), plus one position marked “(no related compound at this mass)”.
  28. Enter: NORMAN Suspect Exchange … part of the NORMAN Databases Collection. Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7
  29. Eawag Surfactant List in CompTox Dashboard. CDK Depict. https://www.slideshare.net/AntonyWilliams/markush-enumeration-to-manage-mesh-and-manipulate-substances-of-unknown-or-variable-composition
  30. Curation never stops … (many) more registrations … Cleaning up lists to remove errors. Undefined mixtures (UVCBs).
  31. Target, Suspect and Non-Target Screening. [Figure: workflow. Sampling, extraction (SPE), HPLC separation and HR-MS/MS lead to three routes. Knowns: target analysis, targets found. Suspects: suspect screening, suspects found; suspects spectrum search, spectral match. No prior knowledge: non-target screening, masses of interest (molecular formula), database search, structure generation. Then candidate selection (retention time, MS/MS, calculated properties) and confirmation and quantification of compounds present. Axis label: Time, Effort & Number of Compounds …]
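The "search for discrete mass differences" named on the Homologous Series Detection slide can be sketched in a few lines of Python. This is a simplified illustration only, not the enviHomolog algorithm (which is not described in the slides): the function name, the 0.001 Da tolerance, the minimum series length and the peak masses below are all hypothetical, and retention-time trends are ignored.

```python
# Monoisotopic mass of one CH2 repeating unit, about 14.01565 Da.
CH2 = 12.0 + 2 * 1.00782503


def find_series(masses, delta=CH2, tol=0.001, min_length=3):
    """Chain masses whose consecutive members differ by delta within tol (Da)."""
    ordered = sorted(masses)
    used = set()      # indices already assigned to a reported series
    series = []
    for i in range(len(ordered)):
        if i in used:
            continue
        chain = [i]
        for j in range(i + 1, len(ordered)):
            if j in used:
                continue
            diff = ordered[j] - ordered[chain[-1]]
            if abs(diff - delta) <= tol:
                chain.append(j)       # next homologue found
            elif diff > delta + tol:
                break                 # sorted input: no later mass can match
        if len(chain) >= min_length:
            used.update(chain)
            series.append([ordered[k] for k in chain])
    return series


# Hypothetical peak list: four peaks spaced by CH2 plus two unrelated peaks.
PEAKS = [150.5, 200.0, 214.0157, 228.0313, 242.0470, 310.2]

print(find_series(PEAKS))
```

Passing a different `delta`, for example the mass of another repeating unit, reuses the same search for other series; a real implementation would also need to handle overlapping series and check chromatographic behaviour.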
