Structure Identification Using High Resolution Mass Spectrometry Data and the EPA’s Chemistry Dashboard

The iCSS Chemistry Dashboard is a publicly accessible dashboard provided by the National Center for Computational Toxicology at the US-EPA. It serves a number of purposes, including providing the chemistry database that underpins many of our public-facing projects (e.g., ToxCast and ExpoCast). Its data and search functions offer a valuable route to structure identification from mass spectrometry data. With an underlying database of over 720,000 chemicals, the dashboard has already been used to help identify chemicals present in house dust, and it can also be applied to many other purposes, e.g., identifying agrochemicals in waste streams. This presentation will review the EPA’s platform and the underlying algorithms used for compound identification from high-resolution mass spectrometry data. We will also discuss progress towards a high-throughput non-targeted analysis platform for use by the mass spectrometry community. This abstract does not reflect U.S. EPA policy.

Published in: Science

  1. Structure Identification Using High Resolution Mass Spectrometry Data and the EPA Chemistry Dashboard
     Antony J. Williams†, Andrew McEachran, Jon Sobus, Chris Grulke, Jennifer Smith, Michelle Krzyzanowski, Jordan Foster and Jeff Edwards
     National Center for Computational Toxicology, U.S. Environmental Protection Agency, RTP, NC
     ACS Fall Meeting, Philadelphia, PA, August 21-25, 2016
     The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA.
     http://www.orcid.org/0000-0002-2668-4821; @ChemConnector on Twitter
  2. Comparing Analysis Approaches
     • Targeted analysis: we know exactly what we’re looking for (10s to 100s of chemicals)
     • Suspect screening analysis (SSA): we have chemicals of interest (100s to 1,000s of chemicals)
     • Non-targeted analysis (NTA): we have no preconceived lists (1,000s to 10,000s of chemicals)
       - In dust, soil, food, air, water, products, plants, animals, and… us!
  3. General Goals of SSA/NTA
     Example: 1 dust sample, negative ionization mode, 300 extracted “molecular features”
     1) Prioritize “molecular features”
     2) Correctly assign formulas
     3) Correctly assign structures
     4) Determine chemical sources
     5) Predict chemical concentrations
     [Figure: steps (1) to (5) lead from a molecular feature to an assigned formula (e.g., C17H19NO3) and a predicted concentration (e.g., 12 µg/g), and ultimately to exposure]
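Assigning a formula to a molecular feature (step 2) rests on comparing measured masses with theoretical ones. The sketch below computes the monoisotopic mass of the example formula C17H19NO3 and the m/z of its deprotonated [M-H]- ion, the species typically observed in negative ionization mode. It is a minimal illustration under stated assumptions: `parse_formula`, `monoisotopic_mass` and `deprotonated_mz` are hypothetical helper names, the parser handles only plain element symbols with counts (no brackets, charges or isotope labels), and only C, H, N and O are supported.

```python
import re
from collections import Counter

# Monoisotopic masses (u) of the most abundant isotope of each element.
# Only C, H, N and O are included; other elements raise a KeyError.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}
PROTON_MASS = 1.007276466812  # mass of a proton (u)


def parse_formula(formula: str) -> Counter:
    """Parse a simple formula such as 'C17H19NO3' into element counts.
    Hypothetical helper: no brackets, charges or isotope labels."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def monoisotopic_mass(formula: str) -> float:
    """Neutral monoisotopic mass: sum of atom counts times isotope masses."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def deprotonated_mz(formula: str) -> float:
    """m/z of the singly charged [M-H]- ion (neutral mass minus a proton)."""
    return monoisotopic_mass(formula) - PROTON_MASS


print(f"M      = {monoisotopic_mass('C17H19NO3'):.4f}")  # M      = 285.1365
print(f"[M-H]- = {deprotonated_mz('C17H19NO3'):.4f}")    # [M-H]- = 284.1292
```

Removing H+ rather than a hydrogen atom is what makes the electron bookkeeping come out right: the anion keeps the electron, so the shift is one proton mass.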
  4. Non-targeted analysis challenges
     • 3,000-5,000 molecular features in a given sample
     • Current technologies can identify up to 5% of them
     • How can we improve identification?
       - Simple workflows
       - Reliable formula prediction (instrument)
       - Accurate ranking of likelihood (databases)
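Formula prediction from an accurate mass can be illustrated with a brute-force enumeration. The following is a minimal sketch, not the instrument vendors' or the Dashboard's algorithm: it enumerates CcHhNnOo compositions within arbitrary, hypothetical element limits, keeps those within an assumed 5 ppm tolerance of a neutral monoisotopic mass, and discards compositions whose ring-plus-double-bond equivalent (RDBE) is not a non-negative integer. `candidate_formulas` and `ppm_error` are hypothetical names.

```python
from itertools import product

# Monoisotopic masses (u) of the most abundant isotopes
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error of an observed mass relative to a theoretical one, in ppm."""
    return (observed - theoretical) / theoretical * 1e6


def _term(symbol: str, count: int) -> str:
    # Omit zero counts and write a count of 1 as the bare symbol
    if count == 0:
        return ""
    return symbol if count == 1 else f"{symbol}{count}"


def candidate_formulas(neutral_mass, ppm_tol=5.0, max_c=30, max_n=5, max_o=10):
    """Enumerate CHNO formulas whose monoisotopic mass is within ppm_tol
    of neutral_mass, best match first. Element limits are arbitrary."""
    hits = []
    for c, n, o in product(range(1, max_c + 1), range(max_n + 1), range(max_o + 1)):
        # Hydrogen count cannot exceed the saturated limit 2C + N + 2
        for h in range(2 * c + n + 3):
            mass = c * MASS["C"] + h * MASS["H"] + n * MASS["N"] + o * MASS["O"]
            error = ppm_error(neutral_mass, mass)
            if abs(error) > ppm_tol:
                continue
            # RDBE must be a non-negative integer for an even-electron neutral
            rdbe = c - h / 2 + n / 2 + 1
            if rdbe < 0 or rdbe != int(rdbe):
                continue
            formula = _term("C", c) + _term("H", h) + _term("N", n) + _term("O", o)
            hits.append((formula, error))
    return sorted(hits, key=lambda hit: abs(hit[1]))


for formula, error in candidate_formulas(285.1365)[:5]:
    print(f"{formula:<12} {error:+.2f} ppm")
```

Even this toy version usually returns several formulas within a few ppm, which is why database lookups and ranking (rather than mass accuracy alone) matter for identification.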
  5. The General Approach: analytical instruments, computational tools and workflows, and databases
  6. Previous Work with Suspect Screening
  7. We’re on the Right Path…
     • …but there is certainly room for improvement
     • Thousands of molecular features (not unique)
     • 33 confirmed chemicals
     • State-of-the-art SSA yields <5% confirmed IDs
     • So what else is in these (and other) samples?
  8. 2012: Definitive Study of Known-Unknowns Using ChemSpider
  9. 2012: Definitive Study of Known-Unknowns Using ChemSpider
  10. ChemSpider
      • http://www.chemspider.com
      • Grows daily with new depositors and annotations
      • [Figure: example data sources]
  11. Our New Dashboard: https://comptox.epa.gov
  12. Bisphenol A
  13. Physicochemical Properties
  14. Bioassay Screening Data
  15. Functional Use and Composition
  16. External Links
  17. National Environmental Methods Index
  18. Advanced Search
  19. Formula Searching: formulae matching Bisphenol A
  20. Formula Search Results
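A formula search returns every record whose molecular formula matches the query. One detail any such search has to handle is that the same composition can be written in different element orders. The minimal sketch below normalizes formulas to Hill notation (C first, then H, then the remaining elements alphabetically; everything alphabetical when no carbon is present) before comparing them. The records and the helper names `hill_notation` and `formula_search` are hypothetical; this is not the Dashboard's implementation.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Element counts for a plain formula (no brackets or charges)."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def hill_notation(formula: str) -> str:
    """Hill order: C, then H, then other elements A-Z; with no carbon,
    all elements are sorted alphabetically."""
    counts = parse_formula(formula)
    if "C" in counts:
        others = sorted(el for el in counts if el not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + others
    else:
        order = sorted(counts)
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)


def formula_search(query: str, records: list[dict]) -> list[dict]:
    """Records whose formula has exactly the query's composition."""
    target = hill_notation(query)
    return [rec for rec in records if hill_notation(rec["formula"]) == target]


# Hypothetical records, for illustration only
records = [
    {"name": "Bisphenol A", "formula": "C15H16O2"},
    {"name": "hypothetical isomer", "formula": "H16C15O2"},
    {"name": "Bisphenol F", "formula": "C13H12O2"},
]
print([rec["name"] for rec in formula_search("C15H16O2", records)])
# ['Bisphenol A', 'hypothetical isomer']
```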
  21. Download to Excel
  22. Download as SDF File
  23. SDF File Opened in ChemFolder
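An SDF export can also be post-processed outside a desktop viewer such as ChemFolder. The following is a minimal reader sketch, assuming the standard SDF layout: a molfile block, then data items introduced by `> <FIELD>` lines whose values run until a blank line, with each record terminated by `$$$$`. The function names and the field names in the sample (`FORMULA`, `MONOISOTOPIC_MASS`) are hypothetical and are not necessarily what the Dashboard export uses. The connection table is left uninterpreted; a cheminformatics toolkit such as RDKit is the usual choice for real structure handling.

```python
import re

# Matches SDF data headers such as "> <FORMULA>" or ">  <FORMULA> (1)"
DATA_HEADER = re.compile(r"^>.*?<([^>]+)>")


def _parse_record(lines):
    """Title line plus data items of one SDF record."""
    record = {"title": lines[0].strip() if lines else ""}
    i = 0
    while i < len(lines):
        header = DATA_HEADER.match(lines[i])
        if header:
            values = []
            i += 1
            # A data item's value runs until the next blank line
            while i < len(lines) and lines[i].strip():
                values.append(lines[i].strip())
                i += 1
            record[header.group(1)] = "\n".join(values)
        i += 1
    return record


def parse_sdf(text):
    """Split SDF text into records at each "$$$$" terminator."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(_parse_record(current))
            current = []
        else:
            current.append(line)
    return records


# Hypothetical one-record file with an empty connection table
sample = """Bisphenol A
  hypothetical example

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <FORMULA>
C15H16O2

> <MONOISOTOPIC_MASS>
228.115

$$$$
"""
for rec in parse_sdf(sample):
    print(rec["title"], rec["FORMULA"], float(rec["MONOISOTOPIC_MASS"]))
# Bisphenol A C15H16O2 228.115
```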
  24. Rank-ordering all of those hits?
      • With so many hits, how do you rank-order them by formula or by mass?
  25. Comparing Performance: 721k structures
  26. Bisphenol A as an example: ChemSpider, 1,564 structures
  27. Bisphenol A as an example: Dashboard, 215 structures
  28. A more pointed example: C15H15N3O2, 6,926 results
  29. A more pointed example: C15H15N3O2, 94 results
  30. Particulate Matter: Antibiotics Used in Animal Production
      • TYL = Tylosin; MON = Monensin; TC = Tetracycline; OTC = Oxytetracycline; CTC = Chlortetracycline
      • [Figure: antibiotics in beef commercial feed]
      • McEachran AD, Blackwell BR, Hanson JD, Wooten KJ, Mayer GD, Cox SB, Smith PN. 2015. Antibiotics, bacteria, and antibiotic resistance genes: aerial transport from cattle feed yards via particulate matter. Environ Health Perspect 123:337-343. DOI:10.1289/EHP.1408555
  31. Rank-ordering Comparisons
      Mean rank position:
      | Agricultural source          | # Compounds | Mass-based: Dashboard | Mass-based: ChemSpider | Formula-based: Dashboard | Formula-based: ChemSpider |
      |------------------------------|-------------|-----------------------|------------------------|--------------------------|---------------------------|
      | Wastewater land application¹ | 34          | 1.3                   | 1.8                    | 1.1                      | 1.1                       |
      | Cattle feedyard²             | 5           | 1.0                   | 1.0                    | 1.0                      | 1.0                       |
      Rank position / total number of results:
      | Compound          | Mass-based: Dashboard | Mass-based: ChemSpider | Formula-based: Dashboard | Formula-based: ChemSpider |
      |-------------------|-----------------------|------------------------|--------------------------|---------------------------|
      | Tylosin           | 1/1                   | 1/28                   | 1/1                      | 1/25                      |
      | Monensin          | 1/1                   | 1/39                   | 1/1                      | 1/24                      |
      | Tetracycline      | 1/38                  | 1/4008                 | 1/11                     | 1/355                     |
      | Oxytetracycline   | 1/16                  | 1/3271                 | 1/3                      | 1/110                     |
      | Chlortetracycline | 1/23                  | 1/2545                 | 1/3                      | 1/77                      |
      ¹ McEachran AD, Shea D, Bodnar W, Nichols EG. 2016. Pharmaceutical occurrence in groundwater and surface waters in forests land-applied with municipal wastewater. Environ Toxicol Chem 35:898-905. DOI:10.1002/etc.3216
      ² McEachran AD, Blackwell BR, Hanson JD, Wooten KJ, Mayer GD, Cox SB, Smith PN. 2015. Antibiotics, bacteria, and antibiotic resistance genes: aerial transport from cattle feed yards via particulate matter. Environ Health Perspect 123:337-343. DOI:10.1289/EHP.1408555
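  32. Chemical Identification: Dashboard vs ChemSpider
      Monoisotopic mass search (±0.005 amu); hits sorted by number of references (ChemSpider) or number of data sources (Dashboard). Columns #1 to #5+ give the number of compounds found at each rank position.
      | Source of list              | # Compounds | Search tool | Mean position | Median position | #1 | #2 | #3 | #4 | #5+ |
      |-----------------------------|-------------|-------------|---------------|-----------------|----|----|----|----|-----|
      | McEachran et al. wastewater | 34          | ChemSpider  | 1.8           | 1               | 28 | 5  | 0  | 0  | 1   |
      |                             |             | Dashboard   | 1.3           | 1               | 31 | 2  | 0  | 0  | 1   |
      | Misc. NTA compounds         | 13          | ChemSpider  | 2             | 1               | 7  | 5  | 0  | 0  | 1   |
      |                             |             | Dashboard   | 1.7           | 1               | 10 | 2  | 0  | 0  | 1   |
      | Bade et al. (2016)          | 19          | ChemSpider  | 2.1           | 1               | 11 | 2  | 5  | 0  | 1   |
      |                             |             | Dashboard   | 1.6           | 1               | 12 | 3  | 3  | 1  | 0   |
      | Rager et al. (2016)         | 24          | ChemSpider  | 2.25          | 1               | 15 | 2  | 1  | 2  | 4   |
      |                             |             | Dashboard   | 1.08          | 1               | 22 | 2  | 0  | 0  | 0   |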
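The ranking idea on this slide (retrieve every chemical within ±0.005 of the query monoisotopic mass, then sort the hits by how many data sources list them) can be sketched as follows. The candidate names, masses and source counts are hypothetical and are not Dashboard data; `Candidate`, `mass_search`, `rank_by_sources` and `rank_position` are illustrative names. The output uses the same "rank position/total results" format as the comparison tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    monoisotopic_mass: float
    data_sources: int  # how many data sources list this chemical (hypothetical field)


def mass_search(candidates, query_mass, tol=0.005):
    """Keep candidates whose monoisotopic mass is within +/- tol of the query."""
    return [c for c in candidates if abs(c.monoisotopic_mass - query_mass) <= tol]


def rank_by_sources(hits):
    """Most widely sourced chemical first; ties keep their input order."""
    return sorted(hits, key=lambda c: c.data_sources, reverse=True)


def rank_position(ranked, name) -> Optional[int]:
    """1-based position of a named chemical in a ranked list, or None."""
    for position, cand in enumerate(ranked, start=1):
        if cand.name == name:
            return position
    return None


# Hypothetical candidates and source counts, for illustration only
candidates = [
    Candidate("known chemical", 228.11503, 85),
    Candidate("isomer 1", 228.11503, 12),
    Candidate("isomer 2", 228.11503, 3),
    Candidate("near-isobar", 228.1186, 40),  # 0.0036 away: inside the window
    Candidate("off-mass", 228.1250, 60),     # 0.0100 away: outside the window
]
ranked = rank_by_sources(mass_search(candidates, 228.1150))
print(f"{rank_position(ranked, 'known chemical')}/{len(ranked)}")  # 1/4
```

Because the sort is stable, candidates with equal source counts stay in retrieval order; a real tool would need an explicit tie-breaking rule.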
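  33. Dashboard vs ChemSpider Ranking Summary (162 individual chemicals searched in total)
      |                             | Mass-based: Dashboard | Mass-based: ChemSpider | Formula-based: Dashboard | Formula-based: ChemSpider |
      |-----------------------------|-----------------------|------------------------|--------------------------|---------------------------|
      | Cumulative average position | 1.3                   | 2.2                    | 1.2                      | 1.4                       |
      | % in #1 position            | 85%                   | 70%                    | 88%                      | 80%                       |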
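The summary statistics in these comparisons (mean and median rank position, share of compounds ranked #1, and the #1 to #5+ distribution) follow directly from the list of rank positions. The sketch below recomputes them for the Rager et al. (2016) Dashboard row, whose 22 compounds at position 1 and 2 at position 2 fully determine the ranks. Rows with entries in the #5+ column cannot be reconstructed exactly this way, because the exact positions are not shown. `summarize_ranks` is a hypothetical name.

```python
from collections import Counter
from statistics import mean, median


def summarize_ranks(positions):
    """Summarize the 1-based rank positions of known chemicals."""
    counts = Counter(min(p, 5) for p in positions)  # pool ranks >= 5 as "#5+"
    return {
        "mean": round(mean(positions), 2),
        "median": median(positions),
        "pct_first": round(100 * counts[1] / len(positions), 1),
        "distribution": {("#5+" if k == 5 else f"#{k}"): counts[k] for k in range(1, 6)},
    }


# Rager et al. (2016), Dashboard: 22 chemicals at position 1, 2 at position 2
print(summarize_ranks([1] * 22 + [2] * 2))
# {'mean': 1.08, 'median': 1.0, 'pct_first': 91.7, 'distribution': {'#1': 22, '#2': 2, '#3': 0, '#4': 0, '#5+': 0}}
```

The computed mean of 1.08 matches the value reported in the comparison table for that row.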
  34. Functional Use to Sort Candidates
      [Figure: candidate structures labeled by functional use: anti-cancer drug, microbiological indicator dye, textile/product dye]
  35. Future Work
      • Rank-ordering based on other criteria
      • Already testing QSARs to build retention time models for ranking
      • External links to methods, e.g., CDC NIOSH
      • Formula identification using isotope profiles
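As one illustration of how an isotope profile can constrain a formula, the textbook 13C rule estimates the number of carbons from the M+1/M intensity ratio. Under a binomial model with 13C abundance p, the ratio for n carbons is n·p/(1 − p). The sketch below applies that rule and uses it to filter a hypothetical candidate list. It is an assumption-laden simplification, not the approach the authors plan to use: it takes p ≈ 0.0107 and ignores contributions from 2H, 15N, 17O and heavier elements, so real ratios will be slightly higher than the 13C-only prediction. All function names are hypothetical.

```python
import re

C13_ABUNDANCE = 0.0107  # approximate natural abundance of 13C (about 1.07%)


def m1_ratio_from_carbons(n_carbons: int, p: float = C13_ABUNDANCE) -> float:
    """Theoretical M+1/M ratio from 13C alone.
    Binomial model: P(one 13C) / P(no 13C) = n * p / (1 - p)."""
    return n_carbons * p / (1 - p)


def carbons_from_m1_ratio(ratio: float, p: float = C13_ABUNDANCE) -> int:
    """Invert the 13C-only model to estimate the number of carbons."""
    return round(ratio * (1 - p) / p)


def carbon_count(formula: str) -> int:
    """Number of carbons in a plain formula ('Cl' is not counted as C)."""
    match = re.search(r"C(?![a-z])(\d*)", formula)
    if match is None:
        return 0
    return int(match.group(1)) if match.group(1) else 1


def filter_by_isotope_ratio(formulas, observed_ratio, tolerance=1):
    """Keep formulas whose carbon count is within tolerance of the estimate."""
    estimate = carbons_from_m1_ratio(observed_ratio)
    return [f for f in formulas if abs(carbon_count(f) - estimate) <= tolerance]


# Hypothetical candidates (all nominal mass 285) and observed ratio
print(carbons_from_m1_ratio(0.184))  # 17
print(filter_by_isotope_ratio(["C17H19NO3", "C16H19N3O2", "C13H19NO6"], 0.184))
# ['C17H19NO3', 'C16H19N3O2']
```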
  36. Engaging the MS Community
      • ToxCast chemicals: what impurities/interaction products are found?
  37. Conclusions
      • Our NTA research is focused on understanding our exposure to chemicals
      • New dashboard with a focus on high-quality data; no large database will be perfect!
      • Specific searches and functionality are being developed with non-targeted analysis in mind
      • The Dashboard outperforms ChemSpider, a community-standard database, in ranking chemicals of environmental concern
      • Early work on new rank-ordering approaches shows that we can improve things even further
  38. Acknowledgements
      • EPA NCCT: Chris Grulke, Jeff Edwards, Ann Richard, Jordan Foster, Jennifer Smith, Andrew McEachran*, Michelle Krzyzanowski
      • EPA NERL: Kathie Dionisio, Katherine Phillips, Jon Sobus, Mark Strynar, Elin Ulrich, Seth Newton
      • * = ORISE Participant
