
Chemical identification of unknowns in high resolution mass spectrometry using the CompTox Chemicals Dashboard

Non-targeted and suspect screening studies using high resolution mass spectrometry (HRMS) have revolutionized the detection of chemicals in complex matrices. However, data processing remains challenging due to the vast number of chemicals detected in samples, the software and computational requirements of data processing, and the inherent uncertainty in confidently identifying chemicals from candidate lists. The US EPA has developed functionality within the CompTox Chemicals Dashboard (https://comptox.epa.gov) to address challenges related to data processing and analysis in HRMS. These tools include the generation of “MS-Ready” structures to optimize database searching, retention time prediction for candidate reduction, consensus ranking using chemical metadata, and in silico MS/MS fragmentation prediction for spectral matching. Combining these tools into a comprehensive workflow improves certainty in candidate identification. This presentation will introduce the tools and combined workflow, including visualization and access via the CompTox Chemicals Dashboard. These tools, data, and visualization approaches within an open chemistry resource provide a publicly available software tool to support structure identification and non-targeted analyses. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
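
The candidate reduction and ranking steps named in the abstract can be pictured with a minimal Python sketch. This is not the Dashboard's implementation or API: the `Candidate` fields, the 5 ppm and 1 minute tolerances, and the use of a data-source count as the ranking metadata are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one database hit (field names are assumptions)."""
    name: str
    monoisotopic_mass: float  # neutral monoisotopic mass, Da
    predicted_rt: float       # predicted retention time, minutes
    source_count: int         # number of data sources listing the chemical


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def shortlist(candidates, observed_mass, observed_rt, ppm_tol=5.0, rt_tol=1.0):
    """Filter by mass and retention time, then rank by metadata.

    Steps, in order:
      1. keep candidates whose mass is within ppm_tol of the observed mass,
      2. keep candidates whose predicted RT is within rt_tol of the observed RT,
      3. sort the survivors by source_count, highest first.
    """
    kept = [
        c for c in candidates
        if abs(ppm_error(observed_mass, c.monoisotopic_mass)) <= ppm_tol
        and abs(c.predicted_rt - observed_rt) <= rt_tol
    ]
    return sorted(kept, key=lambda c: c.source_count, reverse=True)


if __name__ == "__main__":
    # Placeholder candidates with made-up values, for illustration only.
    hits = [
        Candidate("A", 292.0907, 4.0, 10),
        Candidate("B", 292.0907, 4.5, 50),
        Candidate("C", 292.0907, 9.0, 99),   # fails the retention time filter
        Candidate("D", 292.2000, 4.0, 500),  # fails the mass filter
    ]
    for c in shortlist(hits, observed_mass=292.0910, observed_rt=4.2):
        print(c.name, c.source_count)
```

Filtering before ranking keeps a heavily cited but implausible candidate from reaching the top of the list purely on its metadata.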

Published in: Science


  1. Applications of the US EPA’s CompTox Chemicals Dashboard to support structure identification and chemical forensics using mass spectrometry. Antony Williams (1) and Andrew D. McEachran (2, 3). (1) National Center for Computational Toxicology, U.S. Environmental Protection Agency, RTP, NC; (2) Oak Ridge Institute for Science and Education (ORISE) Research Participant, RTP, NC; (3) Present address: Agilent Inc., Santa Clara, CA. March 18th 2019, Pittcon, Philadelphia. http://www.orcid.org/0000-0002-2668-4821 The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA.
  2. CompTox Chemicals Dashboard • A publicly accessible website delivering access: – ~875,000 chemicals with related property data – Searchable by chemical, product use, gene and assay (ToxCast) – Experimental and predicted physicochemical property data – “Bioactivity data” for the ToxCast/Tox21 project – Links to other agency websites and public data resources – “Literature” searches for chemicals using public resources – “Batch searching” for thousands of chemicals – DOWNLOADABLE Open Data for reuse and repurposing
  3. CompTox Chemicals Dashboard https://comptox.epa.gov/dashboard
  4. BASIC Search
  5. Detailed Chemical Pages
  6. Access to Chemical Hazard Data
  7. In Vitro Bioassay Screening: ToxCast and Tox21
  8. Sources of Exposure to Chemicals
  9. Identifiers to Support Searches
  10. Literature Searches and Links
  11. Link Access
  12. NIST WebBook https://webbook.nist.gov/chemistry/
  13. MassBank of North America https://mona.fiehnlab.ucdavis.edu
  14. m/z CLOUD https://www.mzcloud.org/
  15. SIMPLE APPLICATIONS FOR MASS SPEC.
  16. Find me “opioids like Morphine”: Structure Similarity
  17. Find me “opioids like Morphine”: Structure Similarity
  18. Find me “opioids like Morphine”: Structure Similarity – sort on mass
  19. Find me “opioids like Morphine”: Formula-Based Search
  20. Select Chemicals of Interest
  21. Prune to list of interest
  22. Sort for “morphine” by string
  23. Literature Searching
  24. Literature Searching
  25. Literature Searching
  26. FOCUSED CHEMICAL LISTS OF INTEREST
  27. Chemical Lists
  28. PFAS lists of Chemicals
  29. EPAHFR: Hydraulic Fracturing
  30. Aggregate data for a list of chemicals
  31. Batch Search Names: Excel Download
  32. Add Other Data of Interest
  33. Batch Search in specific lists
  34. MASS AND FORMULA SEARCHING
  35. Advanced Searches: Mass and Formula Based Search
  36. Advanced Searches: Mass and Formula Based Search
  37. Batch Searching • Singleton searches are useful but we work with thousands of masses and formulae! • Typical questions – What is the list of chemicals for the formula CxHyOz? – What is the list of chemicals for a mass +/- error? – Can I get chemical lists in Excel files? In SDF files? – Can I include properties in the download file?
  38. Batch Searching Formula/Mass
  39. DO WE REALLY NEED ANOTHER DATABASE?
  40. Is a bigger database better? • ChemSpider was 26 million chemicals then • Much BIGGER today • Is bigger better?
  41. Comparing Search Performance • Dashboard content was 720k chemicals • Only 3% of ChemSpider size • What was the comparison in performance?
  42. SAME dataset for comparison
  43. How did performance compare?
  44. Data Quality is important • Data quality in free web-based databases!
  45. Will the correct Microcystin LR Stand Up? ChemSpider Skeleton Search
  46. Comparing ChemSpider Structures
  47. Comparing ChemSpider Structures
  48. Other Searches
  49. Delivering a Better Database • An ideal database would provide: – Curated CAS Number-Name mappings with “correct” chemical structures • We have full time curators checking data
  50. Names to CASRN Mappings
  51. Subtleties: E/Z-stereochemistry, E-stereochemistry, “4-Decene”
  52. CAS Registry Numbers
  53. Alternative Synonyms
  54. SPECIFIC APPLICATIONS TO MASS SPEC.
  55. Mass Spec Focused Applications
  56. Mass Spec Focused Applications
  57. “MS-Ready Structures” https://doi.org/10.1186/s13321-018-0299-2
  58. Specific Data-Mappings: “MS-Ready Structures”
  59. (no slide text)
  60. MS-Ready Mappings
  61. MS-Ready Mappings Set
  62. Advanced Searches: Mass Search
  63. Advanced Searches: Mass Search
  64. MS-Ready Structures for Formula Search
  65. MS-Ready Structures Batch Searches
  66. MS-Ready Mappings • EXACT Formula: C10H16N2O8: 3 Hits
  67. MS-Ready Mappings • Same Input Formula: C10H16N2O8 • MS Ready Formula Search: 125 Chemicals
  68. MS-Ready Mappings • 125 chemicals returned in total – 8 of the 125 are single component chemicals – 3 of the 8 are isotope-labeled – 3 are neutral compounds and 2 are charged
  69. Complexity to Simplicity - Use Lists: 125 Chemicals – 7 in EPAHFR
  70. Complexity to Simplicity: 125 Chemicals – 7 in the list
  71. Searching batches using MS-Ready Formula (or mass) searching
  72. UVCB CHEMICAL SUBSTANCES
  73. UVCB Chemicals
  74. Many Hydraulic Fracturing Chemicals are “Complex”
  75. “Markush Structures” https://en.wikipedia.org/wiki/Markush_structure
  76. UVCB: Complex Surfactants
  77. UVCB: Complex Surfactants
  78. UVCB: Complex Surfactants
  79. WORK IN PROGRESS
  80. Work in Progress • CFM-ID – Viewing and Downloading pre-predicted spectra – Search spectra against the database
  81. Predicted Mass Spectra http://cfmid.wishartlab.com/ • MS/MS spectra prediction for ESI+, ESI-, and EI • Predictions generated and stored for >800,000 structures, to be accessible via Dashboard
  82. Search Expt. vs. Predicted Spectra
  83. Search Expt. vs. Predicted Spectra
  84. Spectral Viewer Comparison
  85. Work in Progress • CFM-ID – Viewing and Downloading pre-predicted spectra – Search spectra against the database • Retention Time Index Prediction
  86. Retention Time Prediction for Ranking
  87. Moving to Relative Retention Times
  88. Work in Progress • CFM-ID – Viewing and Downloading pre-predicted spectra – Search spectra against the database • Retention Time Index Prediction • Structure/substructure/similarity search
  89. Prototype Development
  90. Prototype Development
  91. Work in Progress • CFM-ID – Viewing and Downloading pre-predicted spectra – Search spectra against the database • Retention Time Index Prediction • Structure/substructure/similarity search • Integration of predicted ion mobility data
  92. Collision Cross Section Prediction
  93. Work in Progress • CFM-ID – Viewing and Downloading pre-predicted spectra – Search spectra against the database • Retention Time Index Prediction • Structure/substructure/similarity search • Integration of predicted ion mobility data • Access to API and web services for programmatic access
  94. API services and Open Data • Groups waiting on our API and web services • Mass Spec companies instrument integration • Release will be in iterations but for now our data are available
  95. SIDE EFFECTS OF SHARING OPEN DATA
  96. NORMAN Suspect List Exchange https://www.norman-network.com/?q=node/236
  97. Integration to MetFrag in place https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0299-2
  98. Conclusion • Dashboard access to data for ~875,000 chemicals • MS-Ready data facilitates structure identification • Related metadata facilitates candidate ranking • Relationship mappings and chemical lists of great utility • Dashboard and contents are one part of the solution • Future releases will offer even more utility • We are committed to open API development over time
  99. Acknowledgements • THANK YOU for the invitation! • IT Development team – especially Jeff Edwards and Jeremy Dunne • Chris Grulke for the ChemReg system • NERL colleagues – Jon Sobus, Elin Ulrich, Mark Strynar, Seth Newton • Emma Schymanski, LCSB, Luxembourg • The NORMAN Network and all contributors
  100. Contact: Antony Williams, US EPA Office of Research and Development, National Center for Computational Toxicology • EMAIL: Williams.Antony@epa.gov • ORCID: https://orcid.org/0000-0002-2668-4821
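
The batch-search questions on slide 37 ("a mass +/- error") and the formula C10H16N2O8 used on slides 66 and 67 can be made concrete with a short Python sketch. It is a standalone calculation, not a call to any Dashboard service. The element table covers only C, H, N and O, the parser handles only simple formulae without brackets or charges, and the 5 ppm tolerance is an assumed value.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes, rounded.
# Only the four elements needed for this example are included.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727647  # mass of a proton, Da


def monoisotopic_mass(formula: str) -> float:
    """Neutral monoisotopic mass of a simple formula such as 'C10H16N2O8'."""
    mass = 0.0
    # Each match is (element symbol, optional count); a missing count means 1.
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def mass_window(mass: float, ppm: float = 5.0) -> tuple[float, float]:
    """Lower and upper search bounds for a mass with a +/- ppm tolerance."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


def adduct_mz(mass: float, mode: str) -> float:
    """m/z of the singly protonated (ESI+) or deprotonated (ESI-) ion."""
    if mode == "ESI+":
        return mass + PROTON
    if mode == "ESI-":
        return mass - PROTON
    raise ValueError(f"unsupported mode: {mode}")


if __name__ == "__main__":
    neutral = monoisotopic_mass("C10H16N2O8")
    low, high = mass_window(neutral, ppm=5.0)
    print(f"neutral mass: {neutral:.4f} Da, window: {low:.4f} to {high:.4f}")
    print(f"[M+H]+: {adduct_mz(neutral, 'ESI+'):.4f}")
    print(f"[M-H]-: {adduct_mz(neutral, 'ESI-'):.4f}")
```

Working from the neutral mass matches the idea behind the MS-Ready searches: the observed ion is converted back to a neutral form before candidates are looked up.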
