
Informatics Approaches supporting UVCB Chemicals in the US-EPA CompTox Chemicals Dashboard



The National Center for Computational Toxicology (NCCT) at the US Environmental Protection Agency has measured, assembled and delivered an enormous quantity and diversity of data for the environmental sciences. This includes high-throughput in vitro screening data, legacy in vivo animal data, functional use data, exposure models, and chemical databases with associated properties. The CompTox Chemicals Dashboard provides access to data associated with ~875,000 chemical substances, approximately 20,000 of which are Unknown or Variable Composition, Complex reaction products and Biological materials (UVCB substances). The dashboard is increasingly applied to support “non-targeted analysis” (NTA), the identification of chemical substances using analytical science, specifically mass spectrometry (MS). Even when complex mixtures are analyzed, MS approaches identify individual chemical constituents of the mixture, and many of these remain ambiguous in terms of substitution patterns. This talk will review our efforts to use generic and Markush representations and enumeration approaches to map structure candidates identified through NTA to the growing chemistry content within the Dashboard. We will also discuss how enumeration approaches can help in profiling UVCB chemicals for physicochemical parameter ranges and how this information can be of value for hazard and risk assessment. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
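To make the idea of enumerating a bounded Markush structure concrete, the following is a minimal Python sketch for polychlorinated biphenyls (PCBs), one of the chemical groups covered in the talk. It is an illustration only and not the Dashboard's enumeration method: it assumes a biphenyl core with ten substitutable positions, treats ring flips and ring exchange as equivalent (free rotation about the central bond, so atropisomers are ignored), and all function names are hypothetical.

```python
from itertools import combinations, product

# Positions 0-4 stand for ring A carbons 2,3,4,5,6;
# positions 5-9 stand for ring B carbons 2',3',4',5',6'.
RING_SIZE = 5

def transform(position, flip_a, flip_b, swap):
    """Apply one symmetry operation of the biphenyl core to a position."""
    ring, index = divmod(position, RING_SIZE)
    if (ring == 0 and flip_a) or (ring == 1 and flip_b):
        index = RING_SIZE - 1 - index  # 2<->6, 3<->5, 4 unchanged
    if swap:
        ring = 1 - ring  # exchange the two rings
    return ring * RING_SIZE + index

# All 8 combinations of (flip ring A, flip ring B, swap rings).
SYMMETRY_OPS = list(product((False, True), repeat=3))

def canonical(pattern):
    """Smallest equivalent pattern, used as a unique key for a congener."""
    return min(
        tuple(sorted(transform(p, *op) for p in pattern))
        for op in SYMMETRY_OPS
    )

def enumerate_congeners():
    """Map chlorine count -> sorted list of unique substitution patterns."""
    by_chlorine_count = {}
    for n_cl in range(1, 2 * RING_SIZE + 1):
        unique = {
            canonical(combo)
            for combo in combinations(range(2 * RING_SIZE), n_cl)
        }
        by_chlorine_count[n_cl] = sorted(unique)
    return by_chlorine_count

congeners = enumerate_congeners()
total = sum(len(patterns) for patterns in congeners.values())
print(total)  # 209, the known number of PCB congeners
```

Each enumerated pattern could then be turned into a concrete structure and passed to a property predictor, which is how an enumeration of this kind would yield a physicochemical parameter range for the group as a whole.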

Published in: Science


  1. Informatics Approaches supporting UVCB Chemicals in the US-EPA CompTox Chemicals Dashboard • Antony Williams (1), Chris Grulke (1) and Emma Schymanski (2) • (1) Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, RTP, NC • (2) Luxembourg Center for Systems Biomedicine, University of Luxembourg, Luxembourg, Europe • November 2019, SETAC, Toronto, Canada • The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
  2. Overview • The CompTox Chemicals Dashboard - web-based database of 875k substances • Associated data including: – Experimental and predicted physicochemical data – In vivo hazard data – In vitro bioactivity screening data – Link farm to tens of public resources • Integrated modules – read-across, lit search • Data mappings and searches supporting Mass Spectrometry & structure identification
  3. CompTox Chemicals Dashboard • 875k Chemical Substances
  4. BASIC Search
  5. Detailed Chemical Pages
  6. Human and Eco Hazard Data • ToxVal Database contains the following data: – ~700,000 toxicity values – ~30 sources of data – ~22,000 sub-sources – ~5,000 journals cited – ~70,000 literature citations
  7. In Vitro Bioassay Screening • ToxCast and Tox21
  8. Related Substances – Transformation Products, “Monomer-Polymer” • What No Structures???
  9. “UVCB” Chemical Substances
  10. UVCB Chemicals
  11. Lots of UVCBs in Commerce…
  12. Example UVCBs with no structures • Aroclor 1254 • Toxaphene • Xylenes • Demeton • Aroclor 1248 • Asbestos • Technical chlordane • Coke oven emissions • Creosote • Diesel engine exhaust • Nickel refinery dust • Nickel, soluble salts • Refractory ceramic fibers • Toluene diisocyanate
  13. What is Toxaphene? “Complex, but reproducible mixture of at least 175 distinct C10-chloro compounds, having an approximate overall empirical formula of C10H10Cl8; the 2 most active components are a C10H10Cl8 compound and a C10H11Cl7 compound which had been elucidated as 2,2,5-endo,6-exo,8,9,10-heptachlorobornane. Produced by the chlorination of camphene to 67-69% chlorine by weight and made up of compds. of C10H8Cl10, C10H(18-n)Cl(n) (mostly polychlorobornanes) and C10H(16-n)Cl(n) (polychlorobornenes and/or polychlorotricyclenes) with n = 6 to 9”
  14. Toxaphene has rich data
  15. UVCB: Mapped Chemicals Polymer individual components
  16. Linear Alkylbenzenesulfonates • Ambiguous representations of some structures are possible • “Markush” representations can be valuable for describing and mapping chemicals
  17. “Markush Structures”
  18. Chemical “grouping” – PCBs, PBDEs
  19. Some Markush chemicals are bounded • Previously 1-1 mapping
  20. Now Markush Enumeration
  21. UVCB: Complex Surfactants
  22. UVCB: Complex Surfactants
  23. Chemical Lists and Categories
  24. EPAHFR: Hydraulic Fracturing
  25. TSCA Inventory
  26. TSCA Inventory List • Active Non-confidential portion • Over 11,500 out of 23,500 have NO STRUCTURES
  27. Batch Searching
  28. Batch Searching • Singleton searches are useful but people generally want data on LOTS of chemicals! • Typical questions – What is the list of chemicals for the formula CxHyOz? – What is the list of chemicals for a mass +/- error? – Can I get chemical lists in Excel files? In SDF files? – Can I include properties in the download file?
  29. Batch Search Names • Excel Download
  30. Add Other Data of Interest
  31. Related Substance Relationships
  32. Conclusion • Building an integrated hub for environmental chemistry data to serve computational toxicology • Transparent access to data and models – file downloads, SQL data dumps and web services • Expansion of functionality to serve all data streams generated by NCCT across the agency & community • Data QUALITY is a key focus - ongoing curation • Ongoing API development will provide enhanced access to data streams
  33. Acknowledgements • EPA-RTP • An enormous team of contributors from CCTE, especially the IT software development team • Our curation team for their care and focus on data quality • Multiple centers and laboratories across the EPA • Many public domain databases and open data contributors
  34. Contact • Antony Williams, CCTE, US EPA Office of Research and Development, ORCID:
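
The “Batch Searching” slide asks for the list of chemicals matching a formula or a mass +/- error. As a rough illustration of what a mass-window lookup involves, the sketch below builds a small candidate list from the toxaphene formula families quoted on the “What is Toxaphene?” slide (C10H(18-n)Cl(n) and C10H(16-n)Cl(n), n = 6 to 9) and filters it by a ppm tolerance. This is an assumption-laden sketch, not the Dashboard's search or API: the function names are hypothetical, masses are neutral monoisotopic masses (no adducts or charge correction), and only C, H and Cl are supported.

```python
import re

# Monoisotopic masses of the most abundant isotopes (12C, 1H, 35Cl).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "Cl": 34.96885268}

def parse_formula(formula):
    """Return element counts for a simple formula such as 'C10H11Cl7'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

def monoisotopic_mass(formula):
    """Neutral monoisotopic mass; raises KeyError for unsupported elements."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())

def search_by_mass(candidates, target_mass, ppm):
    """Return the candidate formulas whose mass lies within +/- ppm of target."""
    tolerance = target_mass * ppm * 1e-6
    return [
        formula
        for formula in candidates
        if abs(monoisotopic_mass(formula) - target_mass) <= tolerance
    ]

# Candidate list built from the two toxaphene formula families, n = 6 to 9.
candidates = [f"C10H{18 - n}Cl{n}" for n in range(6, 10)]
candidates += [f"C10H{16 - n}Cl{n}" for n in range(6, 10)]

# Hypothetical observed neutral mass, taken here as the exact mass of C10H11Cl7.
observed = monoisotopic_mass("C10H11Cl7")
hits = search_by_mass(candidates, observed, ppm=5)
print(hits)
```

In practice an observed m/z would first be converted to a neutral mass for the assumed adduct, and the candidate list would come from a database query rather than a hand-built list.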