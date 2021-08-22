
Web-based access to data for >600 disinfection by-products via the EPA CompTox Chemicals Dashboard

Science
Aug. 22, 2021

The US EPA’s CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard) is a freely available web-based application providing access to data for ~900,000 chemical substances, the majority of which are represented as chemical structures. The Dashboard also provides access to segregated lists of chemicals of specific interest to stakeholders, in particular a list of hundreds of disinfection by-product (DBP) chemicals reported in the literature and detected in the laboratory using mass spectrometric techniques. Many of these chemicals have explicit chemical structures that have been confirmed using purchased or synthesized reference standards. Others are ambiguous: the positional isomer cannot be defined explicitly, but the formula and mass spectral fragmentation are sufficient to define a class of chemicals (e.g. dichlorophenol). Such chemicals may be represented with ambiguous chemical structure forms, so-called Markush structures, and mapped to the individual class members. For chemicals in the Dashboard, the available data can include a wide array of computed and measured physicochemical properties, in vitro high-throughput screening data and in vivo toxicity data, product use information extracted from safety data sheets, and integrated chemical linkages to a growing list of literature, toxicology, and analytical chemistry websites. Since DBP chemicals are primarily identified using mass spectrometric techniques, specific search types have been developed to directly support the non-targeted screening community, enabling cohesive workflows to support data generation for the detection and assessment of environmental exposures to chemicals contained within the database. This presentation will provide an overview of the Dashboard, the ongoing expansion of the DBP chemical list, and specific functionality supporting identification of DBPs by mass spectrometry.
This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
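
The mass- and formula-based searching described in the abstract comes down to comparing a measured mass with the monoisotopic masses of candidate formulae within an error window. The Python sketch below illustrates that idea only; it is not the Dashboard's implementation. The function names (`monoisotopic_mass`, `mass_window`, `within_window`) are hypothetical, the element table covers only C, H, N, O, Cl and Br with masses rounded to six decimals, and the parser assumes flat formulae without brackets, charges or isotope labels.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes, rounded to 6 decimals.
# Assumption: only the elements needed for this illustration are listed.
MONOISOTOPIC_MASS = {
    "C": 12.000000,
    "H": 1.007825,
    "N": 14.003074,
    "O": 15.994915,
    "Cl": 34.968853,
    "Br": 78.918338,
}

_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def monoisotopic_mass(formula):
    """Return the monoisotopic mass of a flat formula such as 'C6H4Cl2O'."""
    if not _FORMULA.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for element, count in _TOKEN.findall(formula):
        if element not in MONOISOTOPIC_MASS:
            raise ValueError(f"element not in table: {element}")
        # A missing count means a single atom of that element.
        total += MONOISOTOPIC_MASS[element] * (int(count) if count else 1)
    return total


def mass_window(measured_mass, ppm):
    """Return (low, high) bounds of a +/- ppm window around a measured mass."""
    delta = measured_mass * ppm / 1e6
    return measured_mass - delta, measured_mass + delta


def within_window(measured_mass, candidate_mass, ppm):
    """True if the candidate mass falls inside the ppm window of the measurement."""
    low, high = mass_window(measured_mass, ppm)
    return low <= candidate_mass <= high
```

For example, `monoisotopic_mass("C6H4Cl2O")` gives the mass shared by all dichlorophenol isomers, which is why a mass or formula alone can define the class but not the specific positional isomer.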

License: CC Attribution-NonCommercial License

  1. Web-based access to data for >600 disinfection by-products via the EPA CompTox Chemicals Dashboard
      Antony Williams (1), Chris Grulke (1) and Susan Richardson (2)
      (1) Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, RTP, NC
      (2) University of South Carolina, Department of Chemistry and Biochemistry, Columbia, SC 29208
      August 2021 ACS Fall Meeting, Atlanta
      http://www.orcid.org/0000-0002-2668-4821
      The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
  2. EPA’s CompTox Chemicals Dashboard
      A publicly accessible website delivering:
      • ~883,000 chemicals with related property data
      • Experimental and predicted physicochemical property data
      • Integration to “biological assay data” for 1000s of chemicals
      • Information regarding consumer products containing chemicals
      • Links to other agency websites and public data resources
      • “Literature” searches for chemicals using public resources
      • “Batch searching” for thousands of chemicals
      • Downloadable Open Data for reuse and repurposing
  3. A single app integrating… SEARCH, TOX DATA, BIOACTIVITY, SIMILARITY, READ-ACROSS, PUBMED, BATCH SEARCH
  4. CompTox Chemicals Dashboard https://comptox.epa.gov/dashboard
      883k Chemical Substances; >906k in the next release
  5. EPA Drinking Water Requirements
  6. Chemical Lists on the Dashboard: >300 lists and growing
  7. EPAHFR: Hydraulic Fracturing
  8. The advantage of lists
      • Pulls a relevant dataset of chemicals into a single list
      • Download of the file provides relevant data to user: structure, CASRN, Names, InChI
      • “Send to batch” provides access to all other data of interest: hazard, in vitro bioactivity, exposure, properties, relationship mappings to salts and so much more
  9. Disinfectants
      • Where do you find the best list? You ask an expert who reviews the science!
      • Extract, register and map the data between parents and by-products
  10. Download File – SDF
  11. Download File – Excel
      Masses and Formulae support Mass Spectrometry
  12. Mass & Formula Searching
  13. Advanced Searches: Mass Search
  14. Advanced Searches: Mass Search
  15. MS-Ready Structures for Formula Search
  16. “MS-Ready Structures” https://doi.org/10.1186/s13321-018-0299-2
  17. (no slide text)
  18. MS-Ready Mappings
  19. MS-Ready Mappings Set
  20. MS-Ready Mappings
      • EXACT Formula: C10H16N2O8: 3 Hits
  21. MS-Ready Mappings
      • Same Input Formula: C10H16N2O8
      • MS Ready Formula Search: 125 Chemicals
  22. MS-Ready Mappings
      • 125 chemicals returned in total
        – 8 of the 125 are single component chemicals
        – 3 of the 8 are isotope-labeled
        – 3 are neutral compounds and 2 are charged
  23. Candidate ranking
  24. Data Source Ranking of “known unknowns”
      • Mass and/or formula is for an unknown chemical but contained within a reference database
      • Most likely candidate chemicals have the most associated data sources, most associated lit. articles or both
      Figure labels: C14H22N2O3, 266.16304; Chemical Reference Database; Sorted candidate structures
  25. Is a bigger database better?
      • ChemSpider was 26 million chemicals then
      • Much BIGGER today
      • Is bigger better??
  26. Using Metadata for Ranking
      • Use available metadata to rank candidates
        – Associated data sources
          • Associated lists in the underlying database
          • Associated data sources in PubChem
          • Specific types (e.g. water, surfactants, pesticides etc.)
        – Number of associated literature articles (PubMed)
        – Chemicals in the environment: the number of products/categories containing the chemical is a very important source of data
  27. Comparing Search Performance
      • Dashboard content was 720k chemicals
      • Only 3% of ChemSpider size
      • What was the comparison in performance?
  28. SAME dataset for comparison
  29. How did performance compare?
      For the same 162 chemicals, Dashboard outperforms ChemSpider
  30. How did performance compare?
  31. Batch Searching
      • Singleton searches are useful but we work with thousands of masses and formulae!
      • Typical questions
        – What is the list of chemicals for the formula CxHyOz
        – What is the list of chemicals for a mass +/- error
        – Can I get chemical lists in Excel files? In SDF files?
        – Can I include properties in the download file?
  32. Batch Searching Formula/Mass
  33. Searching batches using MS-Ready Formula (or mass) searching
  34. In Progress
  35. Work in Progress
      • Predicted Spectra for candidate ranking
        – Viewing and Downloading pre-predicted spectra
        – Search spectra against the database
  36. Predicted Mass Spectra http://cfmid.wishartlab.com/
      • MS/MS spectra prediction for ESI+, ESI-, and EI
      • Predictions generated and stored for >800,000 structures, to be accessible via Dashboard
  37. Search Expt. vs. Predicted Spectra
  38. CFM-ID Predicted Library Available
      • Predictions generated and stored for >700,000 structures
      • Python code to score experimental vs predicted spectra
      • Cosine dot product match score calculation
      Slide footer: August 26, 2019; Nontargeted screening of wastewater for water reuse using mass spectrometry; Current Advances in Water Analysis
  39. Prototype Development: Structure/substructure search
  40. Conclusion
      • Dashboard access to data for ~875,000 chemicals
      • MS-Ready data facilitates structure identification
      • Related metadata facilitates candidate ranking
      • Relationship mappings and chemical lists of great utility
      • Dashboard and contents are one part of the solution
      • New API and Web Services are in development
  41. Acknowledgements
      • CCTE IT development team
      • All scientists within CCTE that provide data and feedback on the Dashboard
      • Ann Richard and the CCTE curation team
  42. Contact
      Antony Williams, US EPA Office of Research and Development, National Center for Computational Toxicology
      EMAIL: Williams.Antony@epa.gov
      ORCID: https://orcid.org/0000-0002-2668-4821
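
Slide 38 mentions Python code that scores experimental against predicted spectra using a cosine dot product. The slides do not show that code, so what follows is only a minimal sketch of a cosine match score under stated assumptions: spectra are lists of `(m/z, intensity)` pairs, peaks are aligned by rounding m/z into fixed-width bins (the 0.01 Da default is an arbitrary illustrative choice), and no intensity or mass weighting is applied. The names `bin_spectrum` and `cosine_score` are hypothetical.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.01):
    """Sum peak intensities into integer m/z bins of the given width (Da)."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned


def cosine_score(experimental, predicted, bin_width=0.01):
    """Cosine similarity between two binned spectra, in the range 0.0 to 1.0.

    Returns 0.0 when either spectrum has no intensity.
    """
    a = bin_spectrum(experimental, bin_width)
    b = bin_spectrum(predicted, bin_width)
    # Dot product over the bins that both spectra populate.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Scoring one experimental spectrum against the predicted spectrum of each candidate and sorting by score gives a simple ranking. Fixed-width binning can split peaks that fall on either side of a bin edge, so a tolerance-based peak matcher may be preferable in practice.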
