Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Delivering web-based access to data and algorithms to support computational toxicology: the US-EPA CompTox Chemicals Dashboard

32 views

Published on

The US-EPA National Center for Computational Toxicity (NCCT) has been generating data and building software applications and web-based chemistry databases for over a decade. During this period the center has analyzed thousands of chemicals in hundreds of bioassays, has researched high-throughput physicochemical property measurements and investigated approaches for high throughput toxicokinetics. NCCT continues to expand the battery of assays and number of chemicals under examination and is now investigating the application of transcriptomics. In parallel to these experimental efforts, and to support our efforts to develop new approaches to prioritize chemicals based on potential human health risks, we aggregate and curate data streams of various types to support prediction models. Over the past few years some of the data have been delivered through prototype web-based “dashboards” for public consumption. The latest of these web applications, the CompTox Chemicals Dashboard, is an integrated access point to obtain information associated with 875,000 chemical substances and providing experimental and predicted data of various types. This includes physicochemical and fate and transport data, bioactivity data, exposure data and integrated literature searches. Real-time predictions and generalized read-across are possible and advanced search capabilities are available to support EPA-related projects including mass spectrometry non-targeted analysis. This presentation will provide an overview of the CompTox Chemicals Dashboard and the its role in delivering access to the outputs of NCCT. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.

Published in: Science
  • Be the first to comment

Delivering web-based access to data and algorithms to support computational toxicology: the US-EPA CompTox Chemicals Dashboard

  1. 1. Delivering web-based access to data and algorithms to support computational toxicology: US EPA CompTox Chemicals Dashboard Antony Williams, Chris Grulke, Ann Richard, Richard Judson, Grace Patlewicz, Imran Shah, John Wambaugh, Katie Paul-Friedman, Jeremy Dunne and Jeff Edwards National Center for Computational Toxicology, U.S. Environmental Protection Agency, RTP, NC August 2019 ACS Fall Meeting, San Diego http://www.orcid.org/0000-0002-2668-4821 The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
  2. 2. Overview • The CompTox Chemicals Dashboard - web- based database of 875k substances • Associated data including: – Experimental and predicted physicochemical data – In vivo hazard data – In vitro bioactivity screening data – Link farm to tens of public resources • Integrated modules – read-across, lit search • Data mappings and searches supporting Mass Spectrometry & structure identification 1
  3. 3. CompTox Chemicals Dashboard https://comptox.epa.gov/dashboard 2 875k Chemical Substances
  4. 4. BASIC Search 3
  5. 5. Detailed Chemical Pages 4
  6. 6. Experimental and Predicted Data 5
  7. 7. Transparency for prediction models 6
  8. 8. OPERA Predicted Properties 7 OPERA Models: https://github.com/kmansouri/OPERA
  9. 9. Access to Chemical Hazard Data 8
  10. 10. Hazard Data from “ToxVal_DB” • ToxVal Database contains following data: –~800,000 toxicity values –~30 sources of data –~22,000 sub-sources –~5000 journals cited –~70,000 literature citations 9
  11. 11. In Vitro Bioassay Screening ToxCast and Tox21 10
  12. 12. In Vitro Bioassay Screening ToxCast and Tox21 11
  13. 13. Bioactivity: Downloadable Data https://www.epa.gov/chemical-research/exploring-toxcast-data- downloadable-data 12
  14. 14. Sources of Exposure to Chemicals 13
  15. 15. Sources of Exposure to Chemicals 14
  16. 16. An “Executive Summary” Quick Look Tox Info 15
  17. 17. Identifiers to Support Searches 16
  18. 18. Quality Control of the Database • We have full time curators checking data 17
  19. 19. Built-in “Modules” 18
  20. 20. Abstract Sifter for Excel 19
  21. 21. Literature Searching 20
  22. 22. Literature Searching 21
  23. 23. Literature Searching 22
  24. 24. Generalized Read-Across (GenRA) 23
  25. 25. Related Publications 24
  26. 26. Mapped Relationships 25
  27. 27. Relationships in the Data 26
  28. 28. 27
  29. 29. “MS-Ready Structures” https://doi.org/10.1186/s13321-018-0299-2 28
  30. 30. Bisphenol A 27 Total MS-Ready Mappings 29
  31. 31. Related Substances – Transformation Products, “Monomer-Polymer” 30 What No Structures???
  32. 32. “UVCB” Chemical Substances 31
  33. 33. UVCB Chemicals 32
  34. 34. UVCB: Complex Surfactants 33
  35. 35. “Markush Structures” https://en.wikipedia.org/wiki/Markush_structure 34
  36. 36. UVCB: Complex Surfactants 35
  37. 37. Chemical Lists and Categories 36
  38. 38. Category example – PAHs 37
  39. 39. EPAHFR: Hydraulic Fracturing 38
  40. 40. List of Assays
  41. 41. From Assay to Chemicals… 40
  42. 42. Other Searches 41
  43. 43. Product/Use Categories 42
  44. 44. Lubricant 43
  45. 45. Lots of UVCBS in Commerce…. 44
  46. 46. Other Searches Chemical-Biology 45
  47. 47. Assay/Gene Search 46
  48. 48. Assay/Gene Search Finds associated assays/chemicals 47
  49. 49. Mass/Formula Searching and Metadata Ranking 48
  50. 50. Advanced Searches Mass Search 49
  51. 51. Advanced Searches Mass Search 50
  52. 52. MS-Ready Structures for Formula Search 51
  53. 53. MS-Ready Mappings • EXACT Formula: C10H16N2O8: 3 Hits 52
  54. 54. MS-Ready Mappings • Same Input Formula: C10H16N2O8 • MS Ready Formula Search: 125 Chemicals 53
  55. 55. Mass Spec Focused Applications 54
  56. 56. Mass Spec Focused Applications 55
  57. 57. Batch Searching 56
  58. 58. Batch Searching • Singleton searches are useful but people generally want data on LOTS of chemicals! • Typical questions – What is the list of chemicals for the formula CxHyOz – What is the list of chemicals for a mass +/- error – Can I get chemical lists in Excel files? In SDF files? – Can I include properties in the download file? 57
  59. 59. Aggregate data for a list of chemicals 58
  60. 60. Batch Search Names 59 Excel Download
  61. 61. Add Other Data of Interest 60
  62. 62. Batch Search in specific lists 61
  63. 63. Built in Checks… 62
  64. 64. Related Substance Relationships
  65. 65. Batch Searching Formula/Mass 64
  66. 66. Searching batches using MS-Ready Formula (or mass) searching 65
  67. 67. Real-Time Predictions 66
  68. 68. Real-Time Predictions 67
  69. 69. Real-Time Predictions with detailed calculation reports 68
  70. 70. Real-Time Predictions with detailed calculation reports 69
  71. 71. Open Data Download Files 70
  72. 72. Downloadable Data 71
  73. 73. Work in Progress Prototypes in Development 72
  74. 74. Predicted Mass Spectra http://cfmid.wishartlab.com/ • MS/MS spectra prediction for ESI+, ESI-, and EI • Predictions generated and stored for >800,000 structures, to be accessible via Dashboard 73
  75. 75. Search Expt. vs. Predicted Spectra
  76. 76. Search Expt. vs. Predicted Spectra
  77. 77. Spectral Viewer Comparison 76
  78. 78. Prototype Development 77
  79. 79. In Progress : pKa Prediction Model • pKa prediction models based on Open Data Set of 8000 chemicals – acidic, basic and amphoteric chemicals • Accepted for publication to Journal of Cheminformatics78
  80. 80. Conclusion • Building an integrated hub for environmental chemistry data to serve computational toxicology • Transparent access to data and models – file downloads, SQL data dumps and web services • Expansion of functionality to serve all data streams generated by NCCT across the agency & community 79 • Data QUALITY is a key focus - ongoing curation • Ongoing API development will provide enhanced access to data streams
  81. 81. Acknowledgements EPA-RTP • An enormous team of contributors from NCCT, especially the IT software development team • Our curation team for their care and focus on data quality • Multiple centers and laboratories across the EPA • Many public domain databases and open data contributors
  82. 82. Contact Antony Williams NCCT, US EPA Office of Research and Development, Williams.Antony@epa.gov ORCID: https://orcid.org/0000-0002-2668-4821 81 https://doi.org/10.1186/s13321-017-0247-6

×