Using the US EPA’s CompTox
Chemistry Dashboard for structure
identification and non-targeted analyses
Antony Williams1, Andrew D. McEachran3, Seth Newton2,
Kristin Isaacs2, Katherine Phillips2, Nancy Baker1,
Chris Grulke1 and Jon R. Sobus2
1) National Center for Computational Toxicology, U.S. Environmental Protection Agency, RTP, NC
2) National Exposure Research Laboratory, U.S. Environmental Protection Agency, RTP, NC
3) Oak Ridge Institute of Science and Education (ORISE) Research Participant, Research Triangle Park, NC
March 2018
ACS Spring Meeting, New Orleans
http://www.orcid.org/0000-0002-2668-4821
The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
The CompTox Chemistry Dashboard
• A publicly accessible website delivering access:
– ~760,000 chemicals with related property data
– Experimental and predicted physicochemical property data
– Experimental Human and Ecological hazard data
– Integration to “biological assay data” for 1000s of chemicals
– Information regarding consumer products containing chemicals
– Links to other agency websites and public data resources
– “Literature” searches for chemicals using public resources
– “Batch searching” for thousands of chemicals
– DOWNLOADABLE Open Data for reuse and repurposing
1
CompTox Chemistry Dashboard
https://comptox.epa.gov/dashboard
2
1 of ~761,000 Chemical Pages
3
Access to Chemical Hazard Data
4
Sources of Exposure to Chemicals
5
Dashboard for Structure ID
• Structure Identification using the dashboard
– Formula/mass-based searching – 1 chemical at a time
6
Advanced Searches
7
Advanced Searches
Mass Based Search
8
Advanced Searches
9
Formula Searches
10
Exact Formula Search: C12H17NO
298 Chemicals
11
Dashboard for Structure ID
• Structure Identification using the dashboard
– Formula/mass-based searching – 1 chemical at a time
– Distilling structures into “MS-Ready form”
12
Specific Data-Mappings
“MS-Ready Structures”
13
Diphenhydramine
15 Total MS-Ready Mappings
14
“MS Ready” Formula Search C12H17NO
354 Chemicals
15
Dashboard for Structure ID
• Structure Identification using the dashboard
– Formula/mass-based searching – 1 chemical at a time
– Distilling structures into “MS-Ready form”
– Ranking based on metadata
16
Identifying Known Unknowns
by reference ranking
17
Data source ranking using
the Dashboard
18
DOI: 10.1007/s00216-016-0139-z
Additional Metadata Ranking
• US EPA CompTox Chemistry Dashboard Data Sources
• “CPDat” Consumer Product Database
• PubChem Data Source Count
• PubMed Reference Count
19
20
Additional Metadata Ranking
C12H17NO: 354 Chemicals
21
Additional Metadata Ranking
C12H17NO: 354 Chemicals
Top Ranked Chemical
22
Additional data streams
in development
• US EPA CompTox Chemistry Dashboard Data Sources
• “CPDat” Consumer Product Database
• PubChem Data Source Count
• PubMed Reference Count
• Retention Time Prediction
• Predicted Environmental Media Occurrence
• Presence in Lists
23
0 1 2 3 4
DTXSID5024506
DTXSID3020962
DTXSID0026961
DTXSID2022591
DTXSID9059208
DTXSID1052298
DTXSID5075365
DTXSID2062535
DTXSID0046066
DTXSID90197716
C7H7NO3
Data Sources Retention Time
Media Occurrence Method Compatibility
𝑆𝐶 𝑇𝑂𝑇𝐴𝐿 = 𝑆𝐶 𝐷𝑆 + 𝑆𝐶 𝑃𝑀 + 𝑆𝐶 𝑅𝑇 + 𝑆𝐶 𝑀𝑂 + ⋯
“Chemicals Detected in Water”
24
Dashboard for Structure ID
• Structure Identification using the dashboard
– Formula/mass-based searching – 1 chemical at a time
– Distilling structures into “MS-Ready form”
– Ranking based on metadata
– Batch searching of formulae and masses
25
Batch Search
26
Batch Search
27
Excel Output
28
Batch Search Integration to MetFrag
http://c-ruttkies.github.io/MetFrag/projects/metfragweb/
29
MetFrag Input File
30
Batch Search Integration to MetFrag
http://c-ruttkies.github.io/MetFrag/projects/metfragweb/
31
The Dashboard to Support
MS-Analysis
32
MS-Ready
Structures
Underpin
Analysis
Downloadable Data
33
Future Work: Combined
Substructure/Formula Searching
34
Future Work: Searching Against
Predicted Spectra
35
Future Work: Searching Against
Predicted Spectra
• CFM-ID predicted spectra generated for
700,000 chemicals
– Positive ion, Negative ion, Electron Impact
– Three energies
36
Future Work
Scoring scheme into results
37
𝑆𝐶 𝑇𝑂𝑇𝐴𝐿 = 𝑆𝐶 𝐷𝑆 + 𝑆𝐶 𝑃𝑀 + 𝑆𝐶 𝑅𝑇 + 𝑆𝐶 𝑀𝑂 + ⋯
Conclusion
• The CompTox Chemistry Dashboard provides
access to data for ~760,000 chemicals
• High quality curated data and rich metadata
facilitates mass spec analysis
• “MS-Ready” processed data enables structure
identification
38
Acknowledgments
• The CompTox Chemistry Dashboard team
• NERL colleagues:
– Jon Sobus, Elin Ulrich, Mark Strynar, Seth Newton (NTA Analysis)
– Katherine Phillips, Kathie Dionisio, Kristin Isaacs (Consumer Products
Database)
• Emma Schymanski – Luxembourg Center for
Systems Biomedicine (MS-ready/NTA)
39
Contact
Antony Williams
US EPA Office of Research and Development
National Center for Computational Toxicology (NCCT)
Williams.Antony@epa.gov
ORCID: https://orcid.org/0000-0002-2668-4821
40

Using the US EPA’s CompTox Chemistry Dashboard for structure identification and non-targeted analyses

  • 1.
    Using the USEPA’s CompTox Chemistry Dashboard for structure identification and non-targeted analyses Antony Williams1, Andrew D. McEachran3, Seth Newton2, Kristin Isaacs2, Katherine Phillips2, Nancy Baker1, Chris Grulke1 and Jon R. Sobus2 1) National Center for Computational Toxicology, U.S. Environmental Protection Agency, RTP, NC 2) National Exposure Research Laboratory, U.S. Environmental Protection Agency, RTP, NC 3) Oak Ridge Institute of Science and Education (ORISE) Research Participant, Research Triangle Park, NC March 2018 ACS Spring Meeting, New Orleans http://www.orcid.org/0000-0002-2668-4821 The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
  • 2.
    The CompTox ChemistryDashboard • A publicly accessible website delivering access: – ~760,000 chemicals with related property data – Experimental and predicted physicochemical property data – Experimental Human and Ecological hazard data – Integration to “biological assay data” for 1000s of chemicals – Information regarding consumer products containing chemicals – Links to other agency websites and public data resources – “Literature” searches for chemicals using public resources – “Batch searching” for thousands of chemicals – DOWNLOADABLE Open Data for reuse and repurposing 1
  • 3.
  • 4.
    1 of ~761,000Chemical Pages 3
  • 5.
    Access to ChemicalHazard Data 4
  • 6.
    Sources of Exposureto Chemicals 5
  • 7.
    Dashboard for StructureID • Structure Identification using the dashboard – Formula/mass-based searching – 1 chemical at a time 6
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
    Exact Formula Search:C12H17NO 298 Chemicals 11
  • 13.
    Dashboard for StructureID • Structure Identification using the dashboard – Formula/mass-based searching – 1 chemical at a time – Distilling structures into “MS-Ready form” 12
  • 14.
  • 15.
  • 16.
    “MS Ready” FormulaSearch C12H17NO 354 Chemicals 15
  • 17.
    Dashboard for StructureID • Structure Identification using the dashboard – Formula/mass-based searching – 1 chemical at a time – Distilling structures into “MS-Ready form” – Ranking based on metadata 16
  • 18.
    Identifying Known Unknowns byreference ranking 17
  • 19.
    Data source rankingusing the Dashboard 18 DOI: 10.1007/s00216-016-0139-z
  • 20.
    Additional Metadata Ranking •US EPA CompTox Chemistry Dashboard Data Sources • “CPDat” Consumer Product Database • PubChem Data Source Count • PubMed Reference Count 19
  • 21.
  • 22.
  • 23.
  • 24.
    Additional data streams indevelopment • US EPA CompTox Chemistry Dashboard Data Sources • “CPDat” Consumer Product Database • PubChem Data Source Count • PubMed Reference Count • Retention Time Prediction • Predicted Environmental Media Occurrence • Presence in Lists 23 0 1 2 3 4 DTXSID5024506 DTXSID3020962 DTXSID0026961 DTXSID2022591 DTXSID9059208 DTXSID1052298 DTXSID5075365 DTXSID2062535 DTXSID0046066 DTXSID90197716 C7H7NO3 Data Sources Retention Time Media Occurrence Method Compatibility 𝑆𝐶 𝑇𝑂𝑇𝐴𝐿 = 𝑆𝐶 𝐷𝑆 + 𝑆𝐶 𝑃𝑀 + 𝑆𝐶 𝑅𝑇 + 𝑆𝐶 𝑀𝑂 + ⋯
  • 25.
  • 26.
    Dashboard for StructureID • Structure Identification using the dashboard – Formula/mass-based searching – 1 chemical at a time – Distilling structures into “MS-Ready form” – Ranking based on metadata – Batch searching of formulae and masses 25
  • 27.
  • 28.
  • 29.
  • 30.
    Batch Search Integrationto MetFrag http://c-ruttkies.github.io/MetFrag/projects/metfragweb/ 29
  • 31.
  • 32.
    Batch Search Integrationto MetFrag http://c-ruttkies.github.io/MetFrag/projects/metfragweb/ 31
  • 33.
    The Dashboard toSupport MS-Analysis 32 MS-Ready Structures Underpin Analysis
  • 34.
  • 35.
  • 36.
    Future Work: SearchingAgainst Predicted Spectra 35
  • 37.
    Future Work: SearchingAgainst Predicted Spectra • CFM-ID predicted spectra generated for 700,000 chemicals – Positive ion, Negative ion, Electron Impact – Three energies 36
  • 38.
    Future Work Scoring schemeinto results 37 𝑆𝐶 𝑇𝑂𝑇𝐴𝐿 = 𝑆𝐶 𝐷𝑆 + 𝑆𝐶 𝑃𝑀 + 𝑆𝐶 𝑅𝑇 + 𝑆𝐶 𝑀𝑂 + ⋯
  • 39.
    Conclusion • The CompToxChemistry Dashboard provides access to data for ~760,000 chemicals • High quality curated data and rich metadata facilitates mass spec analysis • “MS-Ready” processed data enables structure identification 38
  • 40.
    Acknowledgments • The CompToxChemistry Dashboard team • NERL colleagues: – Jon Sobus, Elin Ulrich, Mark Strynar, Seth Newton (NTA Analysis) – Katherine Phillips, Kathie Dionisio, Kristin Isaacs (Consumer Products Database) • Emma Schymanski – Luxembourg Center for Systems Biomedicine (MS-ready/NTA) 39
  • 41.
    Contact Antony Williams US EPAOffice of Research and Development National Center for Computational Toxicology (NCCT) Williams.Antony@epa.gov ORCID: https://orcid.org/0000-0002-2668-4821 40