Using open data, services and source software to deliver the EPA CompTox Chemicals Dashboard

The US EPA CompTox Chemicals Dashboard website provides access to various data types associated with ~900,000 chemical substances and supports the needs of the National Center for Computational Toxicology. The dashboard both consumes data, models and open source code from the open community and delivers data and services back to that community. It offers web-based access to aggregated and integrated chemistry, biology and toxicology data hosted in multiple databases, integrates with an underlying chemical registration system, and uses both commercial and open QSAR/QSPR models to deliver predicted data for the chemicals. The open source software we use includes tools for InChI generation, structure drawing, and real-time prediction of both toxicity and physicochemical endpoints. This presentation will provide an overview of the dashboard, review our usage of open source code to deliver the web application, and discuss our present and planned contributions to open science. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.

Published in: Science

Using open data, services and source software to deliver the EPA CompTox Chemicals Dashboard

  1. 1. Using open data, open services, and open source software to deliver the EPA CompTox Chemicals Dashboard. Antony Williams (1), Chris Grulke (1), Kamel Mansouri (2), Jeremy Dunne (1) and Jeff Edwards (1). (1) National Center for Computational Toxicology, U.S. Environmental Protection Agency, RTP, NC; (2) Integrated Laboratory Systems, Research Triangle Park, NC. 2019 ACS Fall Meeting, San Diego. http://www.orcid.org/0000-0002-2668-4821 The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA.
  2. 2. CompTox Chemicals Dashboard https://comptox.epa.gov/dashboard 875k Chemical Substances
  3. 3. BASIC Search
  4. 4. Detailed Chemical Pages
  5. 5. CompTox Chemicals Dashboard
      • Total data landscape includes:
        – ~875,000 chemical substances
        – Experimental & predicted physchem property data
        – Experimental Human and Ecological hazard data
        – Bioactivity data for 1000s of chemicals
        – Consumer products containing chemicals
        – “Literature” searches for chemicals using PubMed
        – Real time prediction of physchem/toxicity endpoints
  6. 6. CompTox Chemicals Dashboard
      • To make this happen we CONSUME open data and open source software
      • We also PRODUCE open data and both free and open source software
      • This presentation is an overview of what we consume and what we produce…
  7. 7. CONSUMER Integrated Wikipedia Snippet • Integrated Wikipedia snippet linked out to full article
  8. 8. PhysChem Data and Predictions
  9. 9. CONSUMER: Available Data 2010 files underlying EPI Suite
  10. 10. Data required thorough curation
  11. 11. CONSUMER KNIME workflows for curation
  12. 12. PRODUCER All curated data available
  13. 13. PRODUCER TEST and OPERA Predictions
  14. 14. Transparency for prediction models
  15. 15. PRODUCER OPERA Standalone Application
  16. 16. PRODUCER: Open Source https://github.com/kmansouri/OPERA
  17. 17. PRODUCER: Other prediction models TEST Desktop Software
  18. 18. Bioactivity Data: Tox21 and ToxCast
  19. 19. PRODUCER In Vitro Bioassay Screening
  20. 20. PRODUCER Bioactivity: Downloadable Data
  21. 21. CONSUMER PubChem Widgets - Bioactivities
  22. 22. Literature Data
  23. 23. CONSUMER PubChem Widgets - Articles
  24. 24. CONSUMER PubChem Widgets - Patents
  25. 25. CONSUMER: PubMed Services Literature Searching
  26. 26. CONSUMER: PubMed Services Literature Searching
  27. 27. CONSUMER: PubMed Services Literature Searching
  28. 28. PRODUCER Abstract Sifter for Excel
  29. 29. Real-Time Predictions
  30. 30. CONSUMER EPAM Ketcher
  31. 31. CONSUMER: Ketcher Drawing TEST Real Time Predictions
  32. 32. Ketcher Drawing TEST Real Time Predictions
  33. 33. TEST detailed calculation reports
  34. 34. PRODUCER: TEST Software https://www.epa.gov/chemical-research/toxicity-estimation-software-tool-test
  35. 35. PRODUCER: TEST Web Services https://www.epa.gov/sites/production/files/2018-08/documents/webtest_users_guide.pdf
  36. 36. PRODUCER Web Services
  37. 37. PRODUCER: Web Services https://actorws.epa.gov/actorws/
      • Dozens of web services to provide access to data
      • Data in UI, JSON and XML format
  38. 38. PRODUCER InChIKey to DTXCIDs https://actorws.epa.gov/actorws/dsstox/v02/msready?identifier=UVOFGKIRTCCNKG-UHFFFAOYSA-N
  39. 39. CONSUMER of our services MassBank mapping to Dashboard
  40. 40. Open Data Sharing
  41. 41. CONSUMER NORMAN Suspect List Exchange
  42. 42. PRODUCER Curated Chemical Lists
  43. 43. PRODUCER EPAHFR: Hydraulic Fracturing
  44. 44. PRODUCER Publishing Open Data: CPDat
  45. 45. …and then reused in PubChem
  46. 46. PRODUCER Downloadable Data
  47. 47. Work in Progress
      • Multiple projects in progress that use Open Source software
        – Structure, substructure and similarity searching
        – MS-Ready and QSAR-Ready data preparation
        – WebTEST “Batch” predictions
        – OPERA model predictions
        – Prediction of MS fragmentation patterns and matching to searched experimental spectra
      • New services and Full API will be openly available (in future releases)
  48. 48. CONSUMER: Prototype Development EPAM Ketcher + Bingo NoSQL
  49. 49. CONSUMER EPAM Bingo NoSQL
  50. 50. CONSUMER CFM-ID Fragmentation Prediction [Figure: +ESI product ion spectrum for Cpd 101, Pseudoephedrine (rt 12.111 min, (M+H)+, precursor m/z 166.1226, Frag=125.0V, CID@20.0); axes: Counts (x10^4) vs. Mass-to-Charge (m/z); labeled peaks at m/z 148.1120, 117.0697, 133.0888, 91.0539, 70.0644, 106.0655, 120.0814, 79.0535, 100.1109, 93.0679; structure shown: Ephedrine] Slide footer: August 26, 2019, Nontargeted screening of wastewater for water reuse using mass spectrometry, Current Advances in Water Analysis
  51. 51. PRODUCER CFM-ID Predicted Library
      • Predictions generated and stored for >700,000 structures
      • Python code to score experimental vs predicted spectra
      • Cosine dot product match score calculation
  52. 52. PRODUCER Published data is on FigShare
  53. 53. Conclusion
      • Dashboard access to data for ~875,000 chemicals
      • Dashboard CONSUMES a lot of open source libraries and open data
      • We PRODUCE open models and data for the community in exchange
      • We are committed to an open API to provide more complete data access and real time predictions
  54. 54. Acknowledgements
      • NCCT IT development team
      • Tommy Cathey, ACTOR Web Services
      • Nancy Baker, Abstract Sifter
      • Todd Martin & Valery Tkachenko, WebTEST
      • Kathie Dionisio & Kristin Isaacs, CPDat
      • Thanks to Emma Schymanski, University of Luxembourg, for coordinating all efforts with the NORMAN Network for curation of lists on the Suspect Exchange
  55. 55. Contact: Antony Williams, NCCT, US EPA Office of Research and Development, Williams.Antony@epa.gov ORCID: https://orcid.org/0000-0002-2668-4821 https://doi.org/10.1186/s13321-017-0247-6
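
Slide 38 shows an InChIKey lookup against the ACToR web services as a plain URL. A minimal Python sketch of building and calling that request might look like the following. The base URL and the `identifier` parameter are copied from the slide; the function names, the assumption that the response body is JSON, and the assumption that the endpoint is still reachable are all illustrative and not confirmed by the presentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL as printed on slide 38; not verified to be live.
MSREADY_URL = "https://actorws.epa.gov/actorws/dsstox/v02/msready"


def build_msready_url(inchikey: str) -> str:
    """Build the lookup URL for one InChIKey using the slide's `identifier` parameter."""
    return f"{MSREADY_URL}?{urlencode({'identifier': inchikey})}"


def fetch_msready(inchikey: str, timeout: float = 10.0):
    """Hypothetical helper: fetch the URL and parse the body as JSON (format assumed)."""
    with urlopen(build_msready_url(inchikey), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `build_msready_url("UVOFGKIRTCCNKG-UHFFFAOYSA-N")` reproduces the URL shown on the slide. `fetch_msready` is kept separate so the URL construction can be checked without network access.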

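
Slide 51 mentions Python code that scores experimental against predicted spectra with a cosine dot product. The slides do not show that code, so the following is only a sketch of the general technique under stated assumptions: spectra are lists of (m/z, intensity) pairs, peaks are matched by rounding m/z into fixed-width bins (the 0.01 bin width is an arbitrary choice here), and no intensity weighting is applied. The function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def bin_spectrum(peaks: List[Peak], bin_width: float = 0.01) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins keyed by bin index."""
    binned: Dict[int, float] = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(experimental: List[Peak], predicted: List[Peak],
                 bin_width: float = 0.01) -> float:
    """Cosine similarity between two binned spectra; 0.0 if either is empty."""
    a = bin_spectrum(experimental, bin_width)
    b = bin_spectrum(predicted, bin_width)
    # Dot product over bins present in both spectra.
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With non-negative intensities the score lies between 0 (no shared bins) and 1 (identical relative intensities), so candidate structures can be ranked by how closely their predicted spectrum matches the measured one.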