Environmental Chemistry Compound Identification Using High Resolution Mass Spectrometry Data Integrated to the EPA Chemistry Dashboard


There is a growing need for rapid chemical screening and prioritization to inform regulatory decision-making on thousands of chemicals in the environment. We have previously used high-resolution mass spectrometry to examine household vacuum dust samples using liquid chromatography time-of-flight mass spectrometry (LC-TOF/MS). Using a combination of exact mass, isotope distribution, and isotope spacing, molecular features were matched with a list of chemical formulas from the EPA’s Distributed Structure-Searchable Toxicity (DSSTox) database. This has further developed our understanding of how openly available chemical databases, together with the appropriate searches, could be used for the purpose of compound identification. We report here on the utility of the EPA’s iCSS Chemistry Dashboard for the purpose of compound identification using searches against a database of over 720,000 chemicals. We also examine the benefits of QSAR prediction for the purpose of retention time prediction to allow for alignment of both chromatographic and mass spectral properties. This abstract does not reflect U.S. EPA policy.

Published in: Science

  1. Environmental Chemistry Compound Identification Using High Resolution Mass Spectrometry Data Integrated to the EPA Chemistry Dashboard. Antony J. Williams, Andrew McEachran, Jon Sobus, Seth Newton, Elin Ulrich, Chris Grulke, Kamel Mansouri, Jennifer Smith and Jeff Edwards. November 14th, 2016, Eastern Analytical Symposium 2016. http://www.orcid.org/0000-0002-2668-4821. The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA.
  2. Who is NCCT?
     • National Center for Computational Toxicology – part of EPA’s Office of Research and Development
     • Research driven by EPA’s Chemical Safety for Sustainability Research Program
       – Develop new approaches to evaluate the safety of chemicals
       – Integrate advances in biology, biotechnology, chemistry, exposure science and computer science
     • Goal - To identify chemical exposures that may disrupt biological processes and cause adverse outcomes.
  3. Exposure Data Cannot Keep Pace with Regulatory Needs. TSCA: > 84,000. P.P. Egeghy et al. Sci Total Environ. 414 (2012) 159–166
  4. Introducing Our Latest Dashboard: https://comptox.epa.gov
     • >720,000 chemicals
     • >14 years assembling data
  5. Bisphenol A
  6. Physicochemical Properties
  7. ToxCast Bioassay Screening Data: Useful Meta Data
  8. Functional Use and Composition: VERY Useful Meta Data
  9. Dashboard: External Links to Analytical Methods and Data
  10. National Environmental Methods Index
  11. Previous Work with Suspect-Screening. ONE ASPECT of the dashboard is to support Non-targeted Analysis
  12. Rank-Ordering of “Known-Unknowns” using ChemSpider
  13. Some history…
     • 2007 A Hobby Project
     • 2009 ChemSpider Acquired
     • May 2015 Joined EPA – what we are showing is very new
  14. Advanced MS Searches
  15. Formula Search: Found 8 results for 'C8H14ClN5'
  16. Monoisotopic Mass Search: Found 344 results for '215.096 ± 0.005 amu'
  17. Download to Excel
  18. Download as SDF file
  19. Does the Dashboard Add Value? 721k structures
  20. Does the Dashboard Add Value?
     • Remember:
       – Focus on high quality data and curation
       – Data sources include EPA data sources and a focus on environmental chemistry
     • No “dilution” by chemical vendors
  21. Dilution Example… Morphine Skeleton
  22. ChemSpider 6982 Results!!! Search for C15H15N3O2: Tacedinaline, Methyl Red, C.I. Disperse Yellow 3
  23. Same top hits – different ranking. 90 hits only versus 6926 hits. (18, 17, 4) Tacedinaline, Methyl Red, C.I. Disperse Yellow 3
  24. Using Meta-Data to Sort Candidates: Anti-cancer Drug; Microbiological Indicator Dye; Textile/Product Dye
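  25. Chemical Identification: Dashboard vs ChemSpider. Sorted by number of references (ChemSpider) or data sources (Dashboard). Monoisotopic Mass (+/- 0.005 amu) Search. Columns #1 to #5+ give the number of compounds found at each position in the sorted results.

     | Source of List | # of Compounds | Search Tool | Mean Position | Median Position | #1 | #2 | #3 | #4 | #5+ |
     |---|---|---|---|---|---|---|---|---|---|
     | McEachran et al Wastewater | 34 | ChemSpider | 1.8 | 1 | 28 | 5 | 0 | 0 | 1 |
     | | | Dashboard | 1.3 | 1 | 31 | 2 | 0 | 0 | 1 |
     | Misc. NTA Compounds | 13 | ChemSpider | 2 | 1 | 7 | 5 | 0 | 0 | 1 |
     | | | Dashboard | 1.7 | 1 | 10 | 2 | 0 | 0 | 1 |
     | Bade et al (2016) | 19 | ChemSpider | 2.1 | 1 | 11 | 2 | 5 | 0 | 1 |
     | | | Dashboard | 1.6 | 1 | 12 | 3 | 3 | 1 | 0 |
     | Rager et al (2016) | 24 | ChemSpider | 2.25 | 1 | 15 | 2 | 1 | 2 | 4 |
     | | | Dashboard | 1.08 | 1 | 22 | 2 | 0 | 0 | 0 |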
  26. Dashboard vs ChemSpider: Ranking Summary

     | | Mass-based Searching: Dashboard | Mass-based Searching: ChemSpider | Formula Based Searching: Dashboard | Formula Based Searching: ChemSpider |
     |---|---|---|---|---|
     | Cumulative Average Position | 1.3 | 2.2 | 1.2 | 1.4 |
     | % in #1 Position | 85% | 70% | 88% | 80% |

     • Selected peer-reviewed publications
     • 162 total individual chemicals in search
  27. Active vs Deleted CASRN
  28. Collisions in CAS Numbers
  29. But there are MANY CASRNs!
     • http://web.stanford.edu/group/swain/cinf/casreg/snumber.html
  30. How Bad Can It Get??
  31. How Bad Can It Get? This one is 316 Deleted CASRN
  32. Our OPEN Data is available…
     • Various types of data at FTP download site: ftp://newftp.epa.gov/COMPTOX/Sustainable_Chemistry_Data/Chemistry_Dashboard
  33. Coming December 2016: Batch Searching Names/CASRNs
     • What are these chemicals?
  34. Coming December 2016: Batch Searching…
  35. Coming December 2016: Download to Excel
  36. Batch Searching of Molecular Formula
  37. Metadata included for Ranking
  38. Need for “MS-Ready Structures”
  39. “QSAR-Ready Structures”
     • For the purpose of building QSAR Models we already “standardize” structures
       – Desalt/Neutralize
       – Desolvate
       – Remove stereochemistry
     • Some minor tweaks get us “MS-ready Structures”. ALREADY in our database.
  40. “QSAR-Ready Structures”
     • Mass and Formula-based searches will be based on MS-ready structures but connected to the original chemical (with name, CAS, rank ordering)
     • MS-ready structures and substance mappings will be available as Open Data
  41. Future Work
     • Continue to research rank-ordering approaches
     • Working on “retention time prediction”
     • Search for adducts (+Na, +K, +NH4) and handle decarboxylation, loss of water etc.
     • Expand link outs to Mass Spec databases – Thermo’s mzCloud, Massbank, etc.
     • Predicting metabolites and degradants
     • Optimize web services for the community
  42. Acknowledgements. EPA NCCT: Chris Grulke, Jeff Edwards, Ann Richard, Jennifer Smith, Andrew McEachran*. EPA NERL: Jon Sobus, Seth Newton, Elin Ulrich. * = ORISE Participant
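
The formula search on slide 15 and the monoisotopic mass search on slide 16 can be related with a small calculation. The sketch below is not the Dashboard's implementation; the function names and the element table are assumptions made for illustration. It sums the masses of the most abundant isotopes for a formula and checks whether the result falls inside a mass window such as 215.096 ± 0.005 amu.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
# Only the elements needed for this illustration are listed.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "Cl": 34.96885268,
}

def monoisotopic_mass(formula: str) -> float:
    """Sum isotope masses for a simple formula such as 'C8H14ClN5'."""
    total = 0.0
    # Each match is (element symbol, optional count); a missing count means 1.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[symbol] * (int(count) if count else 1)
    return total

def within_window(mass: float, target: float, tolerance: float) -> bool:
    """True if mass lies in [target - tolerance, target + tolerance]."""
    return abs(mass - target) <= tolerance

mass = monoisotopic_mass("C8H14ClN5")
print(f"{mass:.4f}")
print(within_window(mass, 215.096, 0.005))
```

With these isotope masses, the slide 15 formula falls inside the slide 16 window, which is consistent with a formula search being a narrower query than a mass-window search. The parser handles only flat formulas without parentheses, charges, or isotope labels.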

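The rank-position counts in the slide 25 table can be pooled to see how often each tool places the correct compound first. The sketch below copies the #1 to #5+ counts from that table; the dictionary layout and the function name are assumptions made for illustration. It covers only the 90 compounds in those four lists, so the pooled fractions need not match the slide 26 summary, which is based on 162 chemicals.

```python
# Counts at positions #1, #2, #3, #4, #5+ copied from the slide 25 table,
# keyed by (source list, search tool).
COUNTS = {
    ("McEachran et al Wastewater", "ChemSpider"): [28, 5, 0, 0, 1],
    ("McEachran et al Wastewater", "Dashboard"): [31, 2, 0, 0, 1],
    ("Misc. NTA Compounds", "ChemSpider"): [7, 5, 0, 0, 1],
    ("Misc. NTA Compounds", "Dashboard"): [10, 2, 0, 0, 1],
    ("Bade et al (2016)", "ChemSpider"): [11, 2, 5, 0, 1],
    ("Bade et al (2016)", "Dashboard"): [12, 3, 3, 1, 0],
    ("Rager et al (2016)", "ChemSpider"): [15, 2, 1, 2, 4],
    ("Rager et al (2016)", "Dashboard"): [22, 2, 0, 0, 0],
}

def total_compounds(tool: str) -> int:
    """Number of compounds searched with the given tool across all lists."""
    return sum(sum(c) for (_, t), c in COUNTS.items() if t == tool)

def top1_fraction(tool: str) -> float:
    """Fraction of compounds that the tool ranked in position #1."""
    hits = sum(c[0] for (_, t), c in COUNTS.items() if t == tool)
    return hits / total_compounds(tool)

for tool in ("ChemSpider", "Dashboard"):
    print(tool, total_compounds(tool), f"{top1_fraction(tool):.1%}")
```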
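
Slide 41 lists searching for adducts (+Na, +K, +NH4) and handling decarboxylation and loss of water as future work. A minimal sketch of the arithmetic involved follows, assuming singly charged positive ions; the function and constant names are hypothetical, and [M+H]+ is added here for comparison even though the slide does not list it. This is not how the Dashboard performs the search.

```python
# Monoisotopic masses (u) and the electron mass; a positive ion has lost one electron.
ELECTRON = 0.00054858
C = 12.0
H = 1.00782503
N = 14.00307401
O = 15.99491462
NA = 22.98976928
K = 38.96370668

# Mass added to the neutral molecule M to obtain each singly charged adduct.
ADDUCT_SHIFTS = {
    "[M+H]+": H - ELECTRON,
    "[M+Na]+": NA - ELECTRON,
    "[M+K]+": K - ELECTRON,
    "[M+NH4]+": N + 4 * H - ELECTRON,
}

# Neutral losses named on the slide.
NEUTRAL_LOSSES = {
    "H2O": 2 * H + O,   # loss of water
    "CO2": C + 2 * O,   # decarboxylation
}

def adduct_mz(neutral_mass: float) -> dict:
    """m/z of each singly charged adduct for a neutral monoisotopic mass."""
    return {name: neutral_mass + shift for name, shift in ADDUCT_SHIFTS.items()}

def protonated_after_loss(neutral_mass: float, loss: str) -> float:
    """m/z of [M+H-loss]+ for a neutral loss key in NEUTRAL_LOSSES."""
    return neutral_mass + ADDUCT_SHIFTS["[M+H]+"] - NEUTRAL_LOSSES[loss]

for name, mz in adduct_mz(215.0938).items():
    print(name, f"{mz:.4f}")
```

A search tool could run this in reverse, subtracting each shift from an observed m/z to get candidate neutral masses before querying by monoisotopic mass.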