Structure identification approaches using the EPA CompTox Chemicals Dashboard to support mass spectrometry analyses

High-resolution mass spectrometry (HRMS) and non-targeted analysis (NTA) are used to identify emerging contaminants and chemical signatures of interest detected in various media. At the US Environmental Protection Agency, the CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard) is an open chemistry resource and web-based application that contains data for ~900,000 substances and supports non-targeted and suspect screening analyses. Search functionality includes identifier searches (e.g. systematic names, trade names and CAS Registry Numbers) and mass- and formula-based searches; prototype developments include combined substructure-mass/formula searches and searching experimental mass spectral data against predicted fragmentation spectra. A specific type of data mapping in the database uses “MS-Ready” structures: all registered substances are processed to separate multi-component chemicals into their individual components, remove stereochemical bonds, and desalt and neutralize the resulting structures. This MS-Ready processing supports batch searching by either mass or formula to identify candidate chemicals and their mapped substances. A number of chemical lists (https://comptox.epa.gov/dashboard/chemical_lists) have also been developed to support the identification of chemicals related to agrochemistry, specifically pesticides (both active and inert constituents), insecticides, and their metabolites and environmental breakdown products. This presentation will provide an overview of how the CompTox Chemicals Dashboard supports mass spectrometry based structure identification and non-targeted analysis of chemicals in agrochemistry. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.

  1. Structure identification approaches using the EPA CompTox Chemicals Dashboard to support mass spectrometry analyses. Antony Williams, Charles Lowe, Alex Chao, Elin Ulrich and Jon Sobus, Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, RTP, NC. August 2021 ACS Fall Meeting, Atlanta. http://www.orcid.org/0000-0002-2668-4821 The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA.
  2. EPA’s CompTox Chemicals Dashboard. A publicly accessible website delivering: - ~883,000 chemicals with related property data - Experimental and predicted physicochemical property data - Integration to “biological assay data” for 1000’s of chemicals - Information regarding consumer products containing chemicals - Links to other agency websites and public data resources - “Literature” searches for chemicals using public resources - “Batch searching” for thousands of chemicals - Downloadable Open Data for reuse and repurposing
  3. A single app integrating… SEARCH TOX DATA BIOACTIVITY SIMILARITY READ-ACROSS PUBMED BATCH SEARCH
  4. Detailed Chemical Pages
  5. Sources of Exposure to Chemicals
  6. Physicochemical properties and environmental fate and transport
  7. Link farm to public resources
  8. Mass Spec Links
  9. NIST WebBook https://webbook.nist.gov/chemistry/
  10. MassBank of North America https://mona.fiehnlab.ucdavis.edu
  11. Chemical lists
  12. >300 Chemical Lists (and growing)
  13. “Volatilome” Human Breath
  14. “Volatilome” Saliva
  15. Disinfection By-Products
  16. Tire Crumb Rubber (298)
  17. Hydraulic Fracturing (1640)
  18. PFAS Lists
  19. Related Searches to Support Mass Spectrometry
  20. Find me “related structures”: Formula-Based Search
  21. Select Chemicals of Interest
  22. Find me “related structures”: Based on Structure Similarity
  23. Find me “related structures”: Based on Structure Similarity
  24. Find me “related structures”: Structure Similarity – sort on mass
  25. Mass & Formula Searching
  26. Advanced Searches: Mass Search
  27. Advanced Searches: Mass Search
  28. MS-Ready Structures for Formula Search
  29. “MS-Ready Structures” https://doi.org/10.1186/s13321-018-0299-2
  30. (no slide text)
  31. MS-Ready Mappings • EXACT Formula: C10H16N2O8: 3 Hits
  32. MS-Ready Mappings • Same Input Formula: C10H16N2O8 • MS-Ready Formula Search: 125 Chemicals
  33. MS-Ready Mappings • 125 chemicals returned in total – 8 of the 125 are single component chemicals – 3 of the 8 are isotope-labeled – 3 are neutral compounds and 2 are charged
  34. MS-Ready Mappings
  35. MS-Ready Mappings Set
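
The difference between an exact formula search and an MS-Ready formula search, as contrasted on the MS-Ready Mappings slides, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the record layout, the four entries in `SUBSTANCES`, and the function name `build_indexes` are hypothetical and do not reflect the Dashboard's data model, content, or API. Only the molecular formulae themselves are standard chemistry.

```python
from collections import defaultdict

# Hypothetical toy records, NOT Dashboard data:
# (substance name, formula as registered, formula of its MS-Ready form).
SUBSTANCES = [
    ("EDTA", "C10H16N2O8", "C10H16N2O8"),
    ("EDTA disodium salt", "C10H14N2Na2O8", "C10H16N2O8"),
    ("EDTA tetrasodium salt", "C10H12N2Na4O8", "C10H16N2O8"),
    ("Caffeine", "C8H10N4O2", "C8H10N4O2"),
]


def build_indexes(records):
    """Index substances by registered formula and by MS-Ready formula."""
    exact = defaultdict(list)
    ms_ready = defaultdict(list)
    for name, registered_formula, ms_ready_formula in records:
        exact[registered_formula].append(name)
        ms_ready[ms_ready_formula].append(name)
    return exact, ms_ready


exact_index, ms_ready_index = build_indexes(SUBSTANCES)

query = "C10H16N2O8"
exact_hits = exact_index.get(query, [])        # only the neutral parent
ms_ready_hits = ms_ready_index.get(query, [])  # parent plus mapped salts

print("Exact formula hits:   ", exact_hits)
print("MS-Ready formula hits:", ms_ready_hits)
```

In this toy data the exact search finds one record while the MS-Ready search also returns the salts that collapse to the same desalted, neutral form, which mirrors the direction of the 3-versus-125 difference reported on the slides.
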
  36. Candidate ranking
  37. Data Source Ranking of “known unknowns” • Mass and/or formula is for an unknown chemical but contained within a reference database • Most likely candidate chemicals have the most associated data sources, most associated lit. articles or both (diagram labels: C14H22N2O3, 266.16304; Chemical Reference Database; Sorted candidate structures)
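
The formula and mass quoted on this slide (C14H22N2O3, 266.16304) are related by a simple monoisotopic mass calculation. The sketch below is a minimal illustration, not Dashboard code: the function names are hypothetical, the parser handles only flat formulae without brackets or charges, and the isotope table is deliberately limited to C, H, N and O.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes.
# Deliberately limited to the four elements needed for this example.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307400,
    "O": 15.99491462,
}


def parse_formula(formula):
    """Parse a flat formula such as 'C14H22N2O3' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


mass = monoisotopic_mass("C14H22N2O3")
print(f"{mass:.5f}")  # prints 266.16304
```

An element missing from the table raises a `KeyError`, which is intentional here: silently skipping an unknown element would give a wrong mass.
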
  38. Is a bigger database better? • ChemSpider was 26 million chemicals then • Much BIGGER today • Is bigger better??
  39. Using Metadata for Ranking • Use available metadata to rank candidates – Associated data sources • Associated lists in the underlying database • Associated data sources in PubChem • Specific types (e.g. water, surfactants, pesticides etc.) – Number of associated literature articles (PubMed) – Chemicals in the environment – the number of products/categories containing the chemical is a very important source of data
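
The ranking idea on this slide can be sketched as a sort over candidate metadata. This is a minimal illustration under stated assumptions: the `Candidate` fields, the placeholder candidates, their counts, and the tie-breaking order are all hypothetical, and the slides do not specify how the Dashboard actually weights or combines these signals.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str                # placeholder label, not a real chemical
    data_sources: int        # number of associated data sources
    pubmed_articles: int     # number of associated literature articles
    product_categories: int  # number of products/categories containing it


def rank_candidates(candidates):
    """Order candidates so the most heavily referenced ones come first.

    Assumed ordering: data sources, then literature count, then product
    categories. This is an illustrative choice only.
    """
    return sorted(
        candidates,
        key=lambda c: (c.data_sources, c.pubmed_articles, c.product_categories),
        reverse=True,
    )


# Made-up candidates that share one formula or mass.
candidates = [
    Candidate("candidate A", data_sources=3, pubmed_articles=0, product_categories=0),
    Candidate("candidate B", data_sources=41, pubmed_articles=120, product_categories=5),
    Candidate("candidate C", data_sources=41, pubmed_articles=15, product_categories=9),
]

ranked = rank_candidates(candidates)
for position, candidate in enumerate(ranked, start=1):
    print(position, candidate.name)
```

A weighted score would be an equally plausible design; a lexicographic sort is used here only because it needs no tuning parameters.
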
  40. Comparing Search Performance • Dashboard content was 720k chemicals • Only 3% of ChemSpider size • What was the comparison in performance?
  41. SAME dataset for comparison
  42. How did performance compare? For the same 162 chemicals, Dashboard outperforms ChemSpider
  43. How did performance compare?
  44. Will the correct Microcystin LR Stand Up? ChemSpider Skeleton Search
  45. Comparing ChemSpider Structures
  46. Comparing ChemSpider Structures
  47. Other Searches
  48. Batch Searches
  49. List of Opioids – Presence in Lists?
  50. Batch Search Names • Excel Download
  51. Batch Search in specific lists
  52. Opioids and Metabolites (160)
  53. Batch Searching • We work with thousands of masses/formulae! • Typical questions – What is the list of chemicals for the formula CxHyOz – What is the list of chemicals for a mass +/- error – Can I get chemical lists in Excel files? In SDF files? – Can I include properties in the download file?
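
The "mass +/- error" question on this slide amounts to a tolerance-window lookup repeated for each query mass. The sketch below assumes the error is expressed in parts per million and uses a three-entry in-memory reference list; both are assumptions for illustration, the function names are hypothetical, and nothing here calls the Dashboard.

```python
def ppm_window(mass, ppm):
    """Return the (low, high) mass bounds for a tolerance given in ppm."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


def batch_mass_search(query_masses, reference, ppm=5.0):
    """For each query mass, list reference entries inside the ppm window."""
    results = {}
    for query in query_masses:
        low, high = ppm_window(query, ppm)
        results[query] = [label for label, mass in reference if low <= mass <= high]
    return results


# Toy reference list: (formula label, monoisotopic mass). Not Dashboard data.
reference = [
    ("C14H22N2O3", 266.16304),
    ("C8H10N4O2", 194.08038),
    ("C6H12O6", 180.06339),
]

results = batch_mass_search([266.1635, 194.0900], reference, ppm=5.0)
for query, hits in results.items():
    print(query, hits)
```

With a 5 ppm window the first query matches C14H22N2O3, while the second is about 50 ppm away from C8H10N4O2 and returns no hits.
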
  54. Batch Searching Formula/Mass
  55. Searching batches using MS-Ready Formula (or mass) searching
  56. Benefits of Open Data
  57. API services and Open Data • Available API and web services • Open Data available for download
  58. Web Services https://actorws.epa.gov/actorws/ • Dozens of web services to provide access to data • Data in UI, JSON and XML format
  59. Example: InChIKey to DTXCIDs https://actorws.epa.gov/actorws/dsstox/v02/msready?identifier=UVOFGKIRTCCNKG-UHFFFAOYSA-N
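
The example URL on this slide can be assembled programmatically for a batch of InChIKeys. The sketch below only builds the request URL and makes no network call: the base path and the `identifier` parameter are taken from the slide, while the function name, the InChIKey pattern check, and any assumption that the endpoint is still online are mine. The response format is not shown on the slide and is not assumed here.

```python
import re
from urllib.parse import urlencode

# Base path as shown on the slide; current availability is not verified.
BASE_URL = "https://actorws.epa.gov/actorws/dsstox/v02/msready"

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, hyphen-separated.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def msready_url(inchikey):
    """Build the MS-Ready lookup URL for one InChIKey (hypothetical helper)."""
    if not INCHIKEY_PATTERN.match(inchikey):
        raise ValueError(f"Not a valid InChIKey layout: {inchikey!r}")
    return f"{BASE_URL}?{urlencode({'identifier': inchikey})}"


url = msready_url("UVOFGKIRTCCNKG-UHFFFAOYSA-N")
print(url)
```

Validating the key layout before building the URL catches copy-and-paste damage, such as the stray space that appears in the slide transcript's version of this link.
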
  60. MassBank mapping to Dashboard
  61. NORMAN Suspect List Exchange https://www.norman-network.com/?q=node/236
  62. Integration to MetFrag in place https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0299-2
  63. Work in Progress
  64. Prototype Work in Progress • CFM-ID – Viewing and Downloading pre-predicted spectra – Search spectra against the database • Structure/substructure/similarity search • The EPA NTA WebApp • Access to API and web services
  65. Predicted Mass Spectra http://cfmid.wishartlab.com/ • MS/MS spectra prediction for ESI+, ESI-, and EI • Predictions generated and stored for >800,000 structures, to be accessible via Dashboard
  66. CFM-ID Predicted Library Available • Predictions generated and stored for >800,000 structures • Python code to score experimental vs predicted spectra • Cosine dot product match score calculation (slide footer: August 26, 2019, Nontargeted screening of wastewater for water reuse using mass spectrometry, Current Advances in Water Analysis)
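
A cosine dot product match score between an experimental and a predicted spectrum can be sketched as follows. This is not the Python code referred to on the slide, which is not shown: the fixed-width m/z binning, the absence of intensity or m/z weighting, the function names, and the example peaks are all assumptions made for illustration.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(spectrum_a, spectrum_b, bin_width=0.01):
    """Cosine similarity of two binned spectra; 0.0 if either is empty."""
    a = bin_spectrum(spectrum_a, bin_width)
    b = bin_spectrum(spectrum_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up peak lists: (m/z, intensity).
experimental = [(91.05, 100.0), (119.08, 40.0)]
predicted_same = [(91.05, 50.0), (119.08, 20.0)]   # same pattern, scaled
predicted_other = [(77.04, 100.0), (105.03, 60.0)]  # no shared peaks

score_same = cosine_score(experimental, predicted_same)
score_other = cosine_score(experimental, predicted_other)
print(round(score_same, 3), round(score_other, 3))
```

Because the cosine score normalizes by vector length, uniformly scaling the intensities of one spectrum does not change the score; only the relative peak pattern matters.
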
  67. Search Expt. vs. Predicted Spectra
  68. Search Expt. vs. Predicted Spectra
  69. Spectral Viewer Comparison
  70. CASMI 2012-2017 revisited • Application of metadata candidate ranking and CFM-ID to all five years of CASMI data
  71. (no slide text)
  72. Published: Alex Chao et al
  73. NTA WebApp: Input Page • NTA data input files • Parameters for NTA clean-up / database searching (Jeff Minucci, NTA WebApp Development)
  74. Visualizations of Data – Cluster 1 Chemical Candidate Information
  75. Patterns of Cluster Results • Cluster 1: Primarily drugs (e.g. diphenhydramine, labetalol, clindamycin)
  76. Patterns of Cluster Results • Cluster 2: Likely xenobiotics (ethanolamides)
  77. Patterns of Cluster Results • Cluster 7: Mostly human metabolism chemicals (acyl carnitines, amino acids, peptides)
  78. Prototype Development
  79. Method Amenability Prediction (Charlie Lowe) • Why? • Chromatography-mass spectrometry can be LC or GC • Which phase is more appropriate for which chemicals?
  80. Ongoing Work • Data sources to date • MassBank of North America: 9,275 chemicals for non-derivatized GC; 846 chemicals for derivatized GC; 816 chemicals for APCI+; 454 chemicals for APCI-; 4,907 chemicals for ESI+; 3,430 chemicals for ESI- • EPA Non-targeted Analysis Collaborative Trial (ENTACT): 886 chemicals for non-derivatized GC; 44 chemicals for derivatized GC; 774 chemicals for APCI+; 431 chemicals for APCI-; 1,113 chemicals for ESI+; 648 chemicals for ESI-
  81. Conclusion • Dashboard access to data for ~883,000 chemicals • MS-Ready data facilitates structure identification • Related metadata facilitates candidate ranking • Relationship mappings and chemical lists of great utility • Curation and mutual sharing of chemical lists is important (e.g. NORMAN)
  82. Acknowledgements • NCCT IT development team • Tommy Cathey, ACTOR Web Services • Nancy Baker, Abstract Sifter • Todd Martin & Valery Tkachenko, WebTEST • Kathie Dionisio & Kristin Isaacs, CPDat • Thanks to Emma Schymanski, University of Luxembourg, for coordinating all efforts with the NORMAN Network for curation of lists on the Suspect Exchange
  83. Contact: Antony Williams, US EPA Office of Research and Development, National Center for Computational Toxicology. EMAIL: Williams.Antony@epa.gov ORCID: https://orcid.org/0000-0002-2668-4821


