
Non-targeted analysis supported by data and cheminformatics delivered via the US-EPA CompTox Chemicals Dashboard


Non-targeted analysis (NTA) uses high-resolution mass spectrometry to better understand the identity of a wide variety of chemicals present in environmental samples (and other matrices). However, data processing remains challenging due to the vast number of chemicals detected in samples, the software and computational requirements of data processing, and the inherent uncertainty in confidently identifying chemicals from candidate lists. Analysis of the resultant mass spectrometry information relies on cheminformatics to identify and rank chemicals, and the US EPA has developed functionality within the CompTox Chemicals Dashboard to address challenges related to this analysis. These tools include the generation of “MS-Ready” structures to optimize database searching, retention time prediction for candidate reduction, consensus ranking using chemical metadata, and in silico MS/MS fragmentation prediction for spectral matching. Combining these tools into a comprehensive workflow improves certainty in candidate identification. This presentation will review how the CompTox Chemicals Dashboard, an open chemistry resource, provides a freely available software tool to support structure identification and NTA through its flexible search capabilities, rich data for ~875,000 chemical substances, and visualization approaches. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
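
The metadata-based ranking mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes a simple rule (most associated data sources first, ties broken by literature article count); this rule, the `Candidate` schema, and all counts below are hypothetical placeholders and are not taken from the Dashboard or its actual ranking method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate returned by a mass or formula search (hypothetical schema)."""
    name: str
    data_sources: int  # number of associated data sources (placeholder value)
    articles: int      # number of associated literature articles (placeholder value)


def rank_candidates(candidates):
    """Order candidates so the one with the most data sources comes first.

    Ties are broken by article count. This is an assumed scoring rule,
    not the Dashboard's own algorithm.
    """
    return sorted(candidates, key=lambda c: (c.data_sources, c.articles), reverse=True)


# Made-up candidates sharing one formula; the counts are not real data.
hits = [
    Candidate("candidate A", data_sources=3, articles=12),
    Candidate("candidate B", data_sources=41, articles=250),
    Candidate("candidate C", data_sources=3, articles=40),
]

ranked = rank_candidates(hits)
for position, c in enumerate(ranked, start=1):
    print(position, c.name, c.data_sources, c.articles)
```

Sorting on a tuple keeps the rule easy to read and to change, for example by adding a further count as a third element.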

Published in: Science


  1. Non-targeted analysis supported by data and cheminformatics delivered via the CompTox Chemicals Dashboard
     Antony Williams¹, Alex Chao², Tom Transue³, Tommy Cathey³, Elin Ulrich¹ and Jon Sobus¹
     1) Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, RTP, NC
     2) Oak Ridge Institute of Science and Education (ORISE) Research Participant, RTP, NC
     3) GDIT, Research Triangle Park, North Carolina, United States
     November 2019, SETAC, Toronto, Canada
     The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
  2. An intro to the Dashboard
     • Freely available web-based database from the National Center for Computational Toxicology
     • Providing data for 875,000 substances including
       – Experimental and predicted physicochemical properties
       – In vivo toxicity data harvested from dozens of public resources
       – In vitro bioactivity data for thousands of chemicals and assays
       – Exposure data including chemicals in consumer products
       – Real time predictions for >20 physchem and toxicological endpoints
     • Dashboard is used by mass spectrometrists for chemical identification
     • A quick view of general capabilities…
  3. CompTox Chemicals Dashboard: 875k Chemical Substances
  4. Detailed Chemical Pages
  5. Access to Chemical Hazard Data
  6. Sources of Exposure to Chemicals
  7. Link Access: Links based on chemical identifiers to dozens of online resources, including analytical data
  8. MassBank of North America
  9. “MS-ready” structures
  10. Overview of MS-Ready Structures
      • All structure-based chemical substances are algorithmically processed to
        – Split multicomponent chemicals into individual structures
        – Desalt and neutralize individual structures
        – Remove stereochemical bonds from all chemicals
  11.
  12. MS-Ready Mappings Set: All substances containing component
  13. Mass/Formula Searching and Metadata Ranking
  14. Advanced Searches: Mass Search
  15. Advanced Searches: Mass Search
  16. MS-Ready Structures for Formula Search
  17. MS-Ready Mappings
      • EXACT Formula: C10H16N2O8: 3 Hits
  18. MS-Ready Mappings
      • Same Input Formula: C10H16N2O8
      • MS Ready Formula Search: 125 Chemicals
  19. Candidate ranking using metadata
  20. Data Source Ranking of “known unknowns”
      • A mass and/or formula search targets a chemical that is unknown to the analyst but is a known chemical contained within a reference database
      • The most likely candidate chemicals have the most associated data sources, the most associated literature articles, or both
      • Diagram labels: C14H22N2O3; 266.16304; Chemical Reference Database; Sorted candidate structures
  21. Dashboard Metadata for Ranking
      • Chosen dashboard metadata to rank candidates
        – Associated data sources
          • Lists in the underlying database (more about lists later)
          • Associated data sources in PubChem
          • Specific source types (e.g. water, surfactants, pesticides)
        – Number of associated literature articles (PubMed)
        – Chemicals in the environment: the number of products/categories containing the chemical is an important source of data (from the CPDat database)
  22. Comparing Search Performance
      • When dashboard contained 720k chemicals
      • Only 3% of ChemSpider size
      • What was the comparison in performance?
  23. SAME dataset for comparison
  24. How did performance compare? For the same 162 chemicals, Dashboard outperforms ChemSpider for both Mass and Formula Ranking
  25. Data Quality is important
      • Data quality in free web-based databases!
  26. Will the correct Microcystin LR Stand Up? ChemSpider Skeleton Search
  27. Comparing ChemSpider Structures
  28. Batch Searching mass and formula
  29. Batch Searching
      • Singleton searches are useful but we work with thousands of masses and formulae!
      • Typical questions
        – What is the list of chemicals for the formula CxHyOz?
        – What is the list of chemicals for a mass +/- error?
        – Can I get chemical lists in Excel files? In SDF files?
        – Can I include properties in the download file?
  30. Batch Searching Formula/Mass
  31. Searching batches using MS-Ready Formula (or mass) searching
  32. Chemical Lists
  33. Chemical Lists
  34. EPAHFR: Hydraulic Fracturing
  35. PFAS lists of Chemicals
  36. Research in Progress
  37. Predicted Mass Spectra
      • MS/MS spectra prediction for ESI+, ESI-, and EI
      • Predictions generated and stored for >800,000 structures, to be accessible via Dashboard
  38. Search Expt. vs. Predicted Spectra
  39. Search Expt. vs. Predicted Spectra
  40. Spectral Viewer Comparison
  41. Prototype Development
  42. Conclusion
      • Dashboard access to data for ~875,000 chemicals
      • MS-Ready data facilitates structure identification
      • Related metadata facilitates candidate ranking
      • Relationship mappings and chemical lists of great utility
      • Dashboard and contents are one part of the solution
      • New developments in progress, especially API development, will be very enabling…
  43. Acknowledgements
      • IT Development team, especially Jeff Edwards and Jeremy Dunne
      • Chris Grulke for the ChemReg system
      • Andrew McEachran (now at Agilent)
      • The curation team focused on data quality
  44. Contact Antony Williams US EPA Office of Research and Development Center for Computational Toxicology and Exposure EMAIL: ORCID:
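
The "mass +/- error" question from the batch searching slide can be sketched in Python as follows. This is an illustrative sketch only: the function names and the 5 ppm tolerance are assumptions, the parser handles only plain formulae (no brackets, charges, or isotope labels), and nothing here calls or reproduces the Dashboard's own search. The two formulae are the ones quoted in the slides.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes, rounded to nine
# decimals. Only the elements needed for the formulae in the slides are listed.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.007825032,
    "N": 14.003074004,
    "O": 15.994914620,
}


def monoisotopic_mass(formula):
    """Return the monoisotopic mass of a plain formula such as 'C14H22N2O3'."""
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    # Reject input that the simple element/count pattern does not fully cover.
    if "".join(symbol + count for symbol, count in tokens) != formula:
        raise ValueError(f"unsupported formula: {formula!r}")
    return sum(MONOISOTOPIC[symbol] * (int(count) if count else 1)
               for symbol, count in tokens)


def within_ppm(observed, theoretical, ppm):
    """True if the observed mass lies within +/- ppm of the theoretical mass."""
    return abs(observed - theoretical) <= theoretical * ppm / 1e6


# Hypothetical batch: one observed mass checked against candidate formulae.
observed_mass = 266.16304
formulae = ["C14H22N2O3", "C10H16N2O8"]
matches = [f for f in formulae
           if within_ppm(observed_mass, monoisotopic_mass(f), ppm=5)]

for f in formulae:
    print(f, round(monoisotopic_mass(f), 5))
print("matches within 5 ppm:", matches)
```

With these atomic masses, C14H22N2O3 evaluates to the 266.16304 shown on slide 20, so it is the only one of the two formulae that falls inside the window. A relative (ppm) tolerance is used here instead of a fixed mass error because instrument mass accuracy is commonly specified that way; swapping in an absolute window only changes `within_ppm`.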