Using the US EPA’s CompTox Chemistry Dashboard for structure identification and non-targeted analyses

High-resolution mass spectrometry (HRMS) and non-targeted analysis (NTA) are advancing the identification of emerging contaminants in environmental matrices, improving the means by which exposure analyses can be conducted. However, confident structure identification of unknowns in NTA remains a challenge for analytical chemists. Structure identification requires integration of complementary data types such as reference databases, fragmentation prediction tools, and retention time prediction models. The goal of this research is to optimize and implement structure identification functionality within the US EPA’s CompTox Chemistry Dashboard, an open chemistry resource and web application containing data for ~760,000 substances. Rank-ordering candidates by the number of data sources associated with their chemical records within the Dashboard (Data Source Ranking) improves the identification of unknowns by bringing the most likely candidate structures to the top of a search results list. Database searching has been further optimized with the generation of MS-Ready Structures: structures that are de-salted, stripped of stereochemistry, and separated from mixtures to replicate the form of a chemical observed via HRMS. Functionality for batch searching of molecular formulae and monoisotopic masses was designed and released to improve searching efforts. Finally, a scoring-based identification scheme was developed, optimized, and surfaced via the Dashboard using multiple data streams contained within the database underlying the Dashboard. The scoring-based identification scheme improved the identification of unknowns over previous efforts that used data source ranking alone. Combining these steps within an open chemistry resource provides a freely available software tool for structure identification and NTA. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.

Published in: Science

  1. Using the US EPA’s CompTox Chemistry Dashboard for structure identification and non-targeted analyses
     Antony Williams (1), Andrew D. McEachran (3), Seth Newton (2), Kristin Isaacs (2), Katherine Phillips (2), Nancy Baker (1), Chris Grulke (1) and Jon R. Sobus (2)
     (1) National Center for Computational Toxicology, U.S. Environmental Protection Agency, RTP, NC
     (2) National Exposure Research Laboratory, U.S. Environmental Protection Agency, RTP, NC
     (3) Oak Ridge Institute for Science and Education (ORISE) Research Participant, Research Triangle Park, NC
     March 2018 ACS Spring Meeting, New Orleans
     http://www.orcid.org/0000-0002-2668-4821
     The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
  2. The CompTox Chemistry Dashboard
     • A publicly accessible website delivering access to:
       – ~760,000 chemicals with related property data
       – Experimental and predicted physicochemical property data
       – Experimental Human and Ecological hazard data
       – Integration to “biological assay data” for 1000s of chemicals
       – Information regarding consumer products containing chemicals
       – Links to other agency websites and public data resources
       – “Literature” searches for chemicals using public resources
       – “Batch searching” for thousands of chemicals
       – DOWNLOADABLE Open Data for reuse and repurposing
  3. CompTox Chemistry Dashboard (https://comptox.epa.gov/dashboard)
  4. 1 of ~761,000 Chemical Pages
  5. Access to Chemical Hazard Data
  6. Sources of Exposure to Chemicals
  7. Dashboard for Structure ID
     • Structure Identification using the dashboard
       – Formula/mass-based searching
       – 1 chemical at a time
  8. Advanced Searches
  9. Advanced Searches: Mass Based Search
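
A mass-based search usually matches an observed mass against candidate masses within a tolerance. The slide does not say how the Dashboard defines its window, so the following is only a minimal sketch assuming a symmetric parts-per-million (ppm) tolerance around a reference mass; the function names `ppm_window` and `within_ppm` are hypothetical.

```python
from typing import Tuple


def ppm_window(mass: float, ppm: float) -> Tuple[float, float]:
    """Return (low, high) bounds for a symmetric +/- ppm tolerance around mass."""
    delta = mass * ppm / 1_000_000
    return mass - delta, mass + delta


def within_ppm(observed: float, reference: float, ppm: float) -> bool:
    """True if the observed mass lies inside the ppm window of the reference mass."""
    low, high = ppm_window(reference, ppm)
    return low <= observed <= high


# Illustrative values only: a 5 ppm window around a mass of 191.1310.
low, high = ppm_window(191.1310, 5.0)
print(f"{low:.4f} to {high:.4f}")
```

A tighter ppm value returns fewer candidates, at the risk of excluding the correct one when instrument mass accuracy is worse than assumed.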
  10. Advanced Searches
  11. Formula Searches
  12. Exact Formula Search: C12H17NO, 298 Chemicals
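
Formula and mass searches are linked through the monoisotopic mass of a formula. As a rough illustration, the sketch below computes that mass for simple formulae such as C12H17NO. The element table is deliberately small, the masses are rounded to six decimals, and `monoisotopic_mass` is a hypothetical helper, not part of the Dashboard.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope, rounded to 6 decimals.
# Only a handful of elements are included in this sketch.
MONOISOTOPIC = {
    "C": 12.000000,
    "H": 1.007825,
    "N": 14.003074,
    "O": 15.994915,
    "F": 18.998403,
    "P": 30.973762,
    "S": 31.972071,
    "Cl": 34.968853,
}


def monoisotopic_mass(formula: str) -> float:
    """Sum monoisotopic masses for a flat formula such as 'C12H17NO'.

    Parentheses, charges, and isotope labels are not supported.
    """
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    # Reject input that the simple pattern could not fully consume.
    if "".join(symbol + count for symbol, count in tokens) != formula:
        raise ValueError(f"Unsupported formula syntax: {formula!r}")
    return sum(MONOISOTOPIC[symbol] * int(count or 1) for symbol, count in tokens)


print(round(monoisotopic_mass("C12H17NO"), 4))
```

An element missing from the table raises a `KeyError`, which is preferable to silently returning a wrong mass.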
  13. Dashboard for Structure ID
     • Structure Identification using the dashboard
       – Formula/mass-based searching
       – 1 chemical at a time
       – Distilling structures into “MS-Ready form”
  14. Specific Data-Mappings: “MS-Ready Structures”
  15. Diphenhydramine: 15 Total MS-Ready Mappings
  16. “MS Ready” Formula Search: C12H17NO, 354 Chemicals
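
The abstract describes MS-Ready structures as de-salted, stripped of stereochemistry, and separated from mixtures. The toy sketch below mimics those three steps with plain string operations on SMILES. It is not the Dashboard's workflow, which the source does not detail; a real implementation would use a cheminformatics toolkit. The `COUNTER_IONS` set and the function name are assumptions made for illustration, and the code assumes no ring-closure bonds span the `.` separator.

```python
# Hypothetical, deliberately tiny set of counter-ion and solvent fragments to drop.
COUNTER_IONS = {"Cl", "[Cl-]", "Br", "[Br-]", "[Na+]", "[K+]", "O"}


def ms_ready_components(smiles: str) -> list:
    """Split a SMILES string into components, drop listed counter-ions,
    and remove stereochemistry markers (@, /, \\) from what remains."""
    components = []
    for part in smiles.split("."):  # mixture separation
        if part in COUNTER_IONS:  # crude de-salting
            continue
        flat = part.replace("@", "").replace("/", "").replace("\\", "")
        if flat not in components:  # keep each distinct component once
            components.append(flat)
    return components


# Diphenhydramine hydrochloride reduces to the free base.
print(ms_ready_components("CN(C)CCOC(c1ccccc1)c1ccccc1.Cl"))
```

Because several salts, stereoisomers, and mixtures collapse onto the same component, one MS-Ready form can map back to many registered substances, which is consistent with the larger hit count reported for the MS-Ready formula search.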
  17. Dashboard for Structure ID
     • Structure Identification using the dashboard
       – Formula/mass-based searching
       – 1 chemical at a time
       – Distilling structures into “MS-Ready form”
       – Ranking based on metadata
  18. Identifying Known Unknowns by reference ranking
  19. Data source ranking using the Dashboard (DOI: 10.1007/s00216-016-0139-z)
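
Data source ranking orders the candidates for a formula or mass by how many sources reference each one. A minimal sketch follows, assuming each candidate carries a `data_sources` count; the identifiers and counts are invented for illustration.

```python
# Hypothetical candidates sharing one formula; counts are made up.
candidates = [
    {"id": "candidate-A", "data_sources": 3},
    {"id": "candidate-B", "data_sources": 41},
    {"id": "candidate-C", "data_sources": 12},
]


def rank_by_data_sources(cands):
    """Return candidates sorted by descending data source count.

    sorted() is stable, so ties keep their original relative order.
    """
    return sorted(cands, key=lambda c: c["data_sources"], reverse=True)


def rank_of(ranked, candidate_id):
    """Return the 1-based position of a candidate in a ranked list."""
    for position, cand in enumerate(ranked, start=1):
        if cand["id"] == candidate_id:
            return position
    raise KeyError(candidate_id)


ranked = rank_by_data_sources(candidates)
print([c["id"] for c in ranked], rank_of(ranked, "candidate-C"))
```

Tracking the rank position of a known compound across many test cases is one way to judge whether a ranking criterion brings correct structures toward the top.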
  20. Additional Metadata Ranking
     • US EPA CompTox Chemistry Dashboard Data Sources
     • “CPDat” Consumer Product Database
     • PubChem Data Source Count
     • PubMed Reference Count
  21. Additional Metadata Ranking, C12H17NO: 354 Chemicals
  22. Additional Metadata Ranking, C12H17NO: 354 Chemicals
  23. Top Ranked Chemical
  24. Additional data streams in development
     • US EPA CompTox Chemistry Dashboard Data Sources
     • “CPDat” Consumer Product Database
     • PubChem Data Source Count
     • PubMed Reference Count
     • Retention Time Prediction
     • Predicted Environmental Media Occurrence
     • Presence in Lists
     [Figure: score chart (axis 0 to 4) for C7H7NO3 candidates DTXSID5024506, DTXSID3020962, DTXSID0026961, DTXSID2022591, DTXSID9059208, DTXSID1052298, DTXSID5075365, DTXSID2062535, DTXSID0046066, DTXSID90197716; legend: Data Sources, Retention Time, Media Occurrence, Method Compatibility]
     $SC_{TOTAL} = SC_{DS} + SC_{PM} + SC_{RT} + SC_{MO} + \cdots$
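
The total score on this slide is a sum of per-stream scores. The slide does not state how each stream is scaled, so the sketch below assumes every stream is normalized to the range 0 to 1 by dividing by the largest value among the candidates, and that a missing value contributes 0. The stream names, candidate identifiers, and numbers are all hypothetical.

```python
# Assumed stream names, loosely following the data streams listed on the slide.
STREAMS = ("data_sources", "pubmed_refs", "retention_time_fit", "media_occurrence")


def total_scores(candidates):
    """Sum max-normalized stream values per candidate (an assumed scaling)."""
    maxima = {
        stream: max((vals.get(stream, 0.0) for vals in candidates.values()), default=0.0)
        for stream in STREAMS
    }
    totals = {}
    for cand_id, vals in candidates.items():
        totals[cand_id] = sum(
            vals.get(stream, 0.0) / maxima[stream]
            for stream in STREAMS
            if maxima[stream] > 0  # skip streams with no signal at all
        )
    return totals


# Invented example values for two candidates.
example = {
    "cand-A": {"data_sources": 40, "pubmed_refs": 100,
               "retention_time_fit": 1.0, "media_occurrence": 1.0},
    "cand-B": {"data_sources": 10, "pubmed_refs": 400,
               "retention_time_fit": 0.5},
}
totals = total_scores(example)
for cand_id in sorted(totals, key=totals.get, reverse=True):
    print(cand_id, round(totals[cand_id], 2))
```

With equal weights, a candidate that scores moderately on every stream can outrank one that dominates a single stream; weighting the terms differently would be a separate design decision that the slide leaves open.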
  25. “Chemicals Detected in Water”
  26. Dashboard for Structure ID
     • Structure Identification using the dashboard
       – Formula/mass-based searching
       – 1 chemical at a time
       – Distilling structures into “MS-Ready form”
       – Ranking based on metadata
       – Batch searching of formulae and masses
  27. Batch Search
  28. Batch Search
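
Batch searching submits many masses or formulae at once instead of one chemical at a time. The sketch below imitates this against a small local table rather than the Dashboard itself; the table contents, identifiers, and the 5 ppm default are assumptions for illustration only.

```python
from bisect import bisect_left, bisect_right

# Hypothetical local table of (monoisotopic mass, identifier), sorted by mass.
TABLE = sorted([
    (153.0426, "cand-1"),
    (191.1310, "cand-2"),
    (191.1310, "cand-3"),
    (255.1623, "cand-4"),
])
MASSES = [mass for mass, _ in TABLE]


def batch_mass_search(query_masses, ppm=5.0):
    """Map each query mass to the identifiers within +/- ppm of it."""
    results = {}
    for query in query_masses:
        delta = query * ppm / 1_000_000
        # Binary search for the slice of the sorted table inside the window.
        start = bisect_left(MASSES, query - delta)
        stop = bisect_right(MASSES, query + delta)
        results[query] = [ident for _, ident in TABLE[start:stop]]
    return results


for mass, hits in batch_mass_search([191.1312, 100.0]).items():
    print(mass, hits)
```

Sorting once and using binary search keeps each lookup logarithmic in the table size, which matters when thousands of features from an HRMS run are queried together.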
  29. Excel Output
  30. Batch Search Integration to MetFrag (http://c-ruttkies.github.io/MetFrag/projects/metfragweb/)
  31. MetFrag Input File
  32. Batch Search Integration to MetFrag (http://c-ruttkies.github.io/MetFrag/projects/metfragweb/)
  33. The Dashboard to Support MS-Analysis: MS-Ready Structures Underpin Analysis
  34. Downloadable Data
  35. Future Work: Combined Substructure/Formula Searching
  36. Future Work: Searching Against Predicted Spectra
  37. Future Work: Searching Against Predicted Spectra
     • CFM-ID predicted spectra generated for 700,000 chemicals
       – Positive ion, Negative ion, Electron Impact
       – Three energies
  38. Future Work: Scoring scheme into results
     $SC_{TOTAL} = SC_{DS} + SC_{PM} + SC_{RT} + SC_{MO} + \cdots$
  39. Conclusion
     • The CompTox Chemistry Dashboard provides access to data for ~760,000 chemicals
     • High-quality curated data and rich metadata facilitate mass spec analysis
     • “MS-Ready” processed data enables structure identification
  40. Acknowledgments
     • The CompTox Chemistry Dashboard team
     • NERL colleagues:
       – Jon Sobus, Elin Ulrich, Mark Strynar, Seth Newton (NTA Analysis)
       – Katherine Phillips, Kathie Dionisio, Kristin Isaacs (Consumer Products Database)
     • Emma Schymanski, Luxembourg Center for Systems Biomedicine (MS-ready/NTA)
  41. Contact
     Antony Williams
     US EPA Office of Research and Development
     National Center for Computational Toxicology (NCCT)
     Williams.Antony@epa.gov
     ORCID: https://orcid.org/0000-0002-2668-4821
