US EPA CompTox Chemicals Dashboard Data Integration Hub to Support Environmental Science

A presentation given at the Global Marine Summit at UNC Wilmington on October 9, 2019. It gave an overview of the EPA CompTox Chemicals Dashboard, with a specific focus on providing access to data on chemicals such as algal toxins and mycotoxins.

Published in: Science

  1. US EPA CompTox Chemicals Dashboard Data Integration Hub to Support Environmental Science
     Antony Williams, Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, RTP, NC
     Global Marine Summit 2019, UNCW, Wilmington, NC
     http://www.orcid.org/0000-0002-2668-4821
     The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
  2. A little bit about me…
     • NMR spectroscopist by training
     • …ultimately focused on CASE Analysis (Computer-Assisted Structure Elucidation)
  3. CASE Analysis – Elucidating VERY complex chemical structures
  4. A little bit about me…
     • We built this free website…
     • …that has about 100,000 users a day…
  5. Bringing large databases and CASE together
     • Application of computer-assisted structure elucidation using ACD/Structure Elucidator and data obtained from the ChemSpider database hosted by the RSC
  6. Today I represent US EPA…
  7. CompTox Chemicals Dashboard
     • A publicly accessible website delivering access:
       – ~875,000 chemicals with related property data
       – Searchable by chemical, product use, gene and assay (ToxCast)
       – Experimental and predicted physicochemical property data
       – “Bioactivity data” for the ToxCast/Tox21 project
       – Links to other agency websites and public data resources
       – “Literature” searches for chemicals using public resources
       – “Batch searching” for thousands of chemicals
       – DOWNLOADABLE Open Data for reuse and repurposing
  8. CompTox Chemicals Dashboard
     https://comptox.epa.gov/dashboard
     875k Chemical Substances
  9. Type-ahead Search
  10. Substring Search: Enniatin (10/29)
  11. Chemical Details Page
  12. Experimental & Predicted Properties
  13. Open Source Prediction Models
  14. OPERA Predicted Properties
      OPERA Models: https://github.com/kmansouri/OPERA
  15. Plus Real-Time Predictions
  16. Toxicity Estimation Software Tool
  17. Access to Chemical Hazard Data
  18. Hazard Data from “ToxVal_DB”
      • The ToxVal Database contains the following data:
        – ~800,000 toxicity values
        – ~30 sources of data
        – ~22,000 sub-sources
        – ~5,000 journals cited
        – ~70,000 literature citations
  19. In Vitro Bioassay Screening ToxCast and Tox21
  20. In Vitro Bioassay Screening ToxCast and Tox21
  21. Identifiers to Support Searches
  22. Built in “Modules”
  23. Literature Searching
  24. Literature Searching
  25. Literature Searching
  26. Sifting retrieved articles
  27. Direct Link to PubMed
  28. Abstract Sifter for Excel
  29. Mapped Relationships
  30. Relationships in the Data
      All chemicals: Same Formula
  31. Relationships in the Data
      All chemicals: Same Formula
  32. Relationships in the Data
      Structure search the web
  33. Structure search the web
  34. Similar Compounds
  35. Related Substances – Metabolites and Transformation Products
  36. “External Links” to >70 sites
  37. External Links: CTD
  38. External Links: MassBank of North America
  39. Chemical Lists
  40. >200 Lists of Chemicals
  41. Filtered Search on Toxins
  42. Mycotoxins with MS Data
  43. EPA Algal Toxins
      https://www.epa.gov/cyanohabs
  44. Algal Toxins
  45. Hazard Data for 25/54 Algal Toxins
  46. And who wants to draw these?
  47. When you can download them…
  48. DO WE REALLY NEED ANOTHER DATABASE?
  49. Data Quality is important
      • Data quality in free web-based databases!
  50. Will the correct Microcystin LR Stand Up?
      ChemSpider Skeleton Search
  51. Comparing ChemSpider Structures
  52. Comparing ChemSpider Structures
  53. Other Searches
  54. Delivering a Better Database
      • An ideal database would provide:
        – Curated CAS Number-Name mappings with “correct” chemical structures
      • We have full time curators checking data
  55. Names to CASRN Mappings
  56. Subtleties
      (figure labels: E/Z-stereochemistry; E-stereochemistry; “4-Decene”)
  57. Crowdsourced Curation
  58. Batch Searching
  59. Batch Searching
      • Singleton searches are useful but people generally want data on LOTS of chemicals!
      • Typical questions
        – What is the list of chemicals for the formula CxHyOz?
        – What is the list of chemicals for a mass +/- error?
        – Can I get chemical lists in Excel files? In SDF files?
        – Can I include properties in the download file?
  60. Batch searching
  61. Public release had 16/17 mycotoxins. Last one registered
  62. Add Other Data of Interest
  63. Related Substance Relationships
  64. MASS AND FORMULA SEARCHING
  65. Advanced Searches
      Mass and Formula Based Search
  66. Advanced Searches
      Mass and Formula Based Search
  67. Batch Searching Formula/Mass
  68. WORK IN PROGRESS
  69. Predicted Mass Spectra
      http://cfmid.wishartlab.com/
      • MS/MS spectra prediction for ESI+, ESI-, and EI
      • Predictions generated and stored for >800,000 structures, to be accessible via Dashboard
  70. Search Expt. vs. Predicted Spectra
  71. Search Expt. vs. Predicted Spectra
  72. Spectral Viewer Comparison
  73. Prototype Development
  74. Prototype Development
  75. Agilent Dataset
      • Agilent: “Mycotoxins and Metabolites Personal Compound Database and Library”
      • Registered for next release…
  76. Please help
      • Help grow the lists of Mycotoxins and Algal Toxins – please suggest additions
      • Next up – structures of microviridins…
      • Email me at williams.antony@epa.gov
  77. Conclusion
      • Building an integrated hub for environmental chemistry
      • Transparent access to data and models
      • Data QUALITY is a key focus - ongoing curation
      • Microcystins and algal toxins are two growing “lists”
  78. Acknowledgements
      EPA-RTP
      • An enormous team of contributors from NCCT, especially the IT software development team
      • Our curation team for their care and focus on data quality
      • Multiple centers and laboratories across the EPA
      • Many public domain databases and open data contributors
  79. Contact
      Antony Williams, NCCT, US EPA Office of Research and Development
      Williams.Antony@epa.gov
      ORCID: https://orcid.org/0000-0002-2668-4821
      https://doi.org/10.1186/s13321-017-0247-6
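
Slide 59 lists typical batch-search questions, among them "what is the list of chemicals for a mass +/- error". A minimal Python sketch of that kind of lookup follows. It is not the Dashboard's implementation or API: the function names, the tiny in-memory candidate list, and the 5 ppm default tolerance are illustrative assumptions, and only a handful of elements are supported.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of a few elements.
# Deliberately incomplete: this table is an assumption made for the sketch.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207117,
}


def monoisotopic_mass(formula):
    """Return the monoisotopic mass of a simple formula such as 'C6H12O6'."""
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    # Reject anything the simple pattern did not fully consume (brackets, charges, ...).
    if "".join(element + count for element, count in tokens) != formula:
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for element, count in tokens:
        if element not in MONOISOTOPIC:
            raise ValueError(f"element not in table: {element!r}")
        total += MONOISOTOPIC[element] * int(count or 1)
    return total


def search_by_mass(candidates, target_mass, ppm=5.0):
    """Return names whose monoisotopic mass lies within +/- ppm of target_mass."""
    hits = []
    for name, formula in candidates.items():
        error_ppm = abs(monoisotopic_mass(formula) - target_mass) / target_mass * 1e6
        if error_ppm <= ppm:
            hits.append(name)
    return sorted(hits)


# Hypothetical stand-in for a chemical list; a real list would be far larger.
CANDIDATES = {
    "glucose": "C6H12O6",
    "fructose": "C6H12O6",
    "caffeine": "C8H10N4O2",
}

if __name__ == "__main__":
    print(search_by_mass(CANDIDATES, 180.0634))
```

With this toy list, a query at 180.0634 returns both glucose and fructose, which share the formula C6H12O6. Mass alone cannot separate isomers, so a mass or formula search typically yields a candidate list that still needs other evidence to narrow down.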
