
Identification of toxicants and metabolites


A criminal story on the identification of small molecules from mass spectrometry data

Published in: Technology


  1. 1. Identifying toxicants and metabolites (well, annotating in most cases). Steffen Neumann, Leibniz Institute of Plant Biochemistry, Mass Spectrometry and Bioinformatics group [email_address]
  2. 2. A criminal story CSI: You want to identify somebody, based on one or more of the following pieces of evidence: <ul><li>You know the weight
  3. 3. You know the first name
  4. 4. You know the fingerprint
  5. 5. You know the fingerprint, but he's not in the files
  6. 6. And if nobody else has seen him ? </li></ul>
  7. 7. I know his weight <ul><li>So you can do a search in KEGG, PubChem, ChemSpider, ChEBI, CAS, …
  8. 8. (Usually) no extra software needed
  9. 9. And obtain zillions of hits
  10. 10. Caveat: you know the monoisotopic mass, not the average mass </li></ul>www.msu.edu/~gallego7/MassSpect/
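The monoisotopic/average caveat above can be made concrete with a short calculation. This is a minimal sketch, assuming a tiny hand-picked table of atomic masses and a simple formula parser (`parse_formula` and `mass` are illustrative helpers, not part of any tool named in the slides).

```python
import re

# Atomic masses in Da for a few elements (assumed, illustrative subset).
# Monoisotopic = mass of the lightest stable isotope; average = natural abundance mix.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.99491462}
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

def parse_formula(formula):
    """Turn 'C6H12O6' into {'C': 6, 'H': 12, 'O': 6} (no brackets, no charges)."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

def mass(formula, table):
    """Sum the atomic masses from the given table over all atoms in the formula."""
    return sum(table[element] * n for element, n in parse_formula(formula).items())

glucose_mono = mass("C6H12O6", MONOISOTOPIC)
glucose_avg = mass("C6H12O6", AVERAGE)
print(f"monoisotopic: {glucose_mono:.4f} Da, average: {glucose_avg:.3f} Da")
```

For glucose the two values differ by roughly 0.09 Da, which is far more than a 10 ppm search window at this mass, so searching a database with the wrong kind of mass would miss the compound.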
  11. 11. DB Mass Distribution <ul><li>KEGG (3/2010) </li><ul><li>12788 entries
  12. 12. Median mass : 310 </li></ul><li>PubChem Compound (3/2010) </li><ul><li>26.167.050 entries
  13. 13. Median mass : 383 </li></ul></ul>
  14. 14. Search for a given mass <ul><li>KEGG hits with given ppm
  15. 15. 3000 ppm @ 300 Da -> nominal mass
  16. 16. Hardly any improvement below 10 ppm: -> same elemental composition </li></ul>
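The ppm search above can be sketched as follows. This is a minimal illustration, assuming a hypothetical in-memory mini database (`TOY_DB`) instead of KEGG or PubChem; the function names are made up for this example.

```python
def ppm_window(mass, ppm):
    """Return (low, high) mass bounds for a relative tolerance given in ppm."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta

def search_by_mass(query_mass, ppm, database):
    """Return the sorted names of all entries whose mass lies inside the ppm window."""
    low, high = ppm_window(query_mass, ppm)
    return sorted(name for name, m in database.items() if low <= m <= high)

# Hypothetical mini database: name -> monoisotopic mass in Da.
TOY_DB = {
    "aspirin (C9H8O4)": 180.04226,
    "glucose (C6H12O6)": 180.06339,
    "fructose (C6H12O6)": 180.06339,
    "theobromine (C7H8N4O2)": 180.06473,
    "caffeine (C8H10N4O2)": 194.08038,
}

for ppm in (3000, 10, 5):
    print(ppm, "ppm:", search_by_mass(180.06339, ppm, TOY_DB))
```

At 3000 ppm the window is about half a dalton wide here, so everything with nominal mass 180 is returned. Tightening the window removes other elemental compositions, but glucose and fructose share one formula and stay tied no matter how accurate the mass is.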
  17. 17. I know his first name <ul><li>Calculate the elemental composition from an accurate mass and a proper isotopic pattern
  18. 18. Add N-rule, DBE, H/C ratios, MS/MS etc. to narrow down relevant formulas ( Caveat: don't filter out compounds of interest! )
  19. 19. So you can do a better search in KEGG, PubChem, ChemSpider, ChEBI, CAS, …
  20. 20. Not zillions of hits, but still too many </li></ul>
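Two of the formula filters named above, the nitrogen rule and double bond equivalents (DBE), are simple enough to sketch. This is a minimal example restricted to neutral CHNO formulas; the function names are illustrative, and real formula generators apply many more heuristics.

```python
def dbe(c, h, n=0, halogens=0):
    """Double bond equivalents (rings + double bonds): C - (H + X)/2 + N/2 + 1."""
    return c - (h + halogens) / 2 + n / 2 + 1

def passes_nitrogen_rule(nominal_mass, n):
    """Neutral molecule: an odd nominal mass goes with an odd number of nitrogens."""
    return nominal_mass % 2 == n % 2

def is_plausible_neutral(c, h, n, o):
    """Keep a CHNO formula only if its DBE is a non-negative integer and the N-rule holds.

    For plain CHNO formulas the two checks overlap; both are shown for clarity.
    """
    nominal = 12 * c + h + 14 * n + 16 * o
    d = dbe(c, h, n)
    return d >= 0 and d == int(d) and passes_nitrogen_rule(nominal, n)

print(dbe(6, 6))                         # benzene, C6H6
print(dbe(8, 10, n=4))                   # caffeine, C8H10N4O2
print(is_plausible_neutral(2, 10, 0, 1)) # C2H10O: more H than two carbons can carry
```

The caveat from the slide applies here too: these rules assume a neutral, even-electron molecule, so applying them to the formula of an ion or a radical would filter out correct answers.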
  21. 21. Chemical Space beyond DBs <ul><li>Somewhat exaggerated, 97% of KEGG <1000Da
  22. 22. Estimated Metabolites: 2.500 Human, 5.000 Arabidopsis, 200.000 Plant kingdom ( D.Strack, K.Saito ) </li></ul>http://fiehnlab.ucdavis.edu
  23. 23. But wait … <ul><li>General-purpose (GP) databases contain uncharged compounds
  24. 24. But MS observes ions: Adduct ? Fragment ?
  25. 25. Search for these “modifications” (e.g. ARMeC, HMDB, KNApSAcK, …) </li></ul>doi:10.1038/nprot.2007.512
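To see why a neutral-mass database cannot be searched directly with an observed m/z, the expected ion masses for a few common singly charged adducts can be computed. This is a minimal sketch, assuming a small adduct table built from standard atomic masses; the set of adducts and the names `ADDUCTS`, `adduct_mz` and `expected_ions` are choices made for this example.

```python
ELECTRON = 0.00054858        # Da
PROTON = 1.00727646          # mass of H+ in Da
SODIUM_ATOM = 22.98976928    # monoisotopic mass of Na in Da

# Mass shift from neutral M to the singly charged ion (assumed, illustrative subset).
ADDUCTS = {
    "[M+H]+": PROTON,
    "[M+Na]+": SODIUM_ATOM - ELECTRON,
    "[M-H]-": -PROTON,
}

def adduct_mz(neutral_mass, adduct):
    """m/z of a singly charged ion formed from a neutral molecule of the given mass."""
    return neutral_mass + ADDUCTS[adduct]

def expected_ions(neutral_mass):
    """All adduct m/z values for one neutral mass, rounded to 4 decimals."""
    return {name: round(adduct_mz(neutral_mass, name), 4) for name in ADDUCTS}

print(expected_ions(180.06339))  # glucose, monoisotopic mass
```

Each additional adduct in the table is one more mass to look up per database entry, which is where the multiplication of false positives comes from.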
  26. 26. KNApSAcK <ul><li>Species/Metabolite associations
  27. 27. Manually extracted from literature
  28. 28. Java Application or simple Web Form
  29. 29. Search possible adduct modifications </li></ul>Metabolite 49553 entries metabolite-species pair 95077 entries last update 2010/03/04
  30. 30. Heuristics to the rescue <ul><li>Searching all adducts multiplies false positives
  31. 31. Better “undo” that before searching: </li><ul><li>Commercial ACD IntelliXtract
  32. 32. Bioconductor CAMERA calculates [M] </li></ul></ul>http://www.acdlabs.com/
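The idea of "undoing" adducts before searching can be illustrated with a toy heuristic: if two co-eluting peaks imply the same neutral mass under two different adduct hypotheses, that mass is a good candidate for [M]. This is only a sketch of the general principle under assumed adduct shifts and an assumed tolerance; it is not the algorithm used by CAMERA or IntelliXtract.

```python
from itertools import combinations

# Mass shift from neutral M to the positive ion, in Da (assumed, illustrative subset).
SHIFTS = {
    "[M+H]+": 1.00727646,
    "[M+NH4]+": 18.03382555,
    "[M+Na]+": 22.98922070,
}

def neutral_mass_hypotheses(mz_values, tol=0.002):
    """Find peak pairs whose m/z values imply the same neutral mass under two adducts.

    Returns a list of (neutral_mass, adduct_of_lower_mz, adduct_of_higher_mz).
    """
    hits = []
    for mz_a, mz_b in combinations(sorted(mz_values), 2):
        for name_a, shift_a in SHIFTS.items():
            for name_b, shift_b in SHIFTS.items():
                if name_a == name_b:
                    continue
                m_a, m_b = mz_a - shift_a, mz_b - shift_b
                if abs(m_a - m_b) <= tol:
                    hits.append((round((m_a + m_b) / 2, 4), name_a, name_b))
    return hits

# Hypothetical co-eluting peaks: two glucose adducts plus one unrelated peak.
print(neutral_mass_hypotheses([181.0707, 203.0526, 150.0]))
```

Only the pair consistent with [M+H]+ and [M+Na]+ survives, so a single neutral mass is searched instead of one mass per peak and adduct.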
  33. 33. I know his fingerprint <ul><li>Spectra can be interpreted to some degree
  34. 34. Spectral libraries with reference spectra
  35. 35. GC/MS also produces nice spectra </li></ul>Tandem MS and MS^n in space (QqQ, QqTOF) or in time (Orbitrap / ion trap). Molecules fragment upon collision -> CID. Different MS parameters -> different spectra
  36. 36. Spectra library: NIST <ul><li>Last release: NIST'08
  37. 37. 190.000 EI spectra 14.800 ESI spectra (all nominal mass)
  38. 38. NIST MS Search
  39. 39. Plain record format .msp </li></ul>
  40. 40. Spectra library www.massbank.jp <ul><li>Many Instruments, 24.000 spectra
  41. 41. ESI (and EI)
  42. 42. Rich Web Interface
  43. 43. Batch Search
  44. 44. Web Services </li></ul>Other DBs with MS/MS: METLIN, HMDB, MMCD
  45. 45. Actually, what is “like”? <ul><li>Comparing spectra is old: H.S. Hertz, R.A. Hites, and K. Biemann, “Identification of Mass Spectra by computer searching a file of known spectra,” Anal. Chem., 1971
  46. 46. But not really solved: what you actually want to compare are the compounds / peptides behind the spectra
  47. 47. Peak Count and its variants
  48. 48. Cosine distance
  49. 49. Absolute, relative or “ranked” intensities </li></ul>
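Two of the similarity measures listed above, peak count and cosine, can be sketched in a few lines. This is a minimal version, assuming spectra are lists of `(mz, intensity)` tuples and a greedy peak matching within a fixed m/z tolerance; library search tools typically add intensity weighting and m/z scaling that are left out here.

```python
import math

def match_peaks(query, library, tol=0.01):
    """Greedily pair each query peak with the closest unused library peak within tol Da."""
    pairs, used = [], set()
    for mz_q, int_q in query:
        best = None
        for j, (mz_l, _) in enumerate(library):
            if j in used or abs(mz_q - mz_l) > tol:
                continue
            if best is None or abs(mz_q - mz_l) < abs(mz_q - library[best][0]):
                best = j
        if best is not None:
            used.add(best)
            pairs.append((int_q, library[best][1]))
    return pairs

def peak_count(query, library, tol=0.01):
    """Number of query peaks that find a partner in the library spectrum."""
    return len(match_peaks(query, library, tol))

def cosine_similarity(query, library, tol=0.01):
    """Cosine of the angle between the two intensity vectors (1.0 = identical)."""
    dot = sum(a * b for a, b in match_peaks(query, library, tol))
    norm_q = math.sqrt(sum(i * i for _, i in query))
    norm_l = math.sqrt(sum(i * i for _, i in library))
    if norm_q == 0 or norm_l == 0:
        return 0.0
    return dot / (norm_q * norm_l)

spec_a = [(100.0, 1.0), (150.0, 0.5)]
spec_b = [(100.005, 1.0), (200.0, 0.5)]
print(peak_count(spec_a, spec_b), cosine_similarity(spec_a, spec_b))
```

A cosine distance, as named on the slide, is simply one minus this similarity. Swapping the intensities for their ranks before calling `cosine_similarity` gives the "ranked" variant.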
  50. 50. X-Rank <ul><li>Find matching peaks between spectra
  51. 51. Use ranked instead of absolute/relative intensities
  52. 52. Probability that a rank from an experimental spectrum matches a rank from a library spectrum
  53. 53. Requires training on given database </li></ul>
  54. 54. X-Rank probabilities <ul><li>Probability that the 1st peak in the 1st spectrum matches the 1st peak in the 2nd spectrum
  55. 55. Model distribution(s) of (mis-) matching spectra
  56. 56. -> highest for correct compound </li></ul>DOI: 10.1021/ac900954d
  57. 57. Machine (in-)dependence ? <ul><li>MassBank subset: 700 cpds, 8785 spectra
  58. 58. evaluate QqQ (query) + QTOF (DB)
  59. 59. Oberacher: 402 cpds, 3759 spectra library (QqQ,QTOF,QLit,LIT-FT)
  60. 60. Training of scoring function </li></ul>-> Not as bad as expected DOI 10.1002/jms.1525
  61. 61. I know others, but not their fingerprints <ul><li>Predicting spectra is tough
  62. 62. ACD Fragmenter, Waters, Bruker
  63. 63. MassFrontier (Highchem / Thermo) </li><ul><li>Originally for EI spectra interpretation
  64. 64. Fragmentation reactions (from literature) </li></ul><li>Fragment Identificator http://www.cs.helsinki.fi/group/sysfys/software/fragid/ </li></ul>http://www.biocompare.com/Articles/ApplicationNote/1471/
  65. 65. MassFrontier <ul><li>Searching PubChem 2006 with MS² spectra: 102 test compounds / 500 spectra </li></ul><ul><ul><li>Retrieve exact mass candidates (ca 272)
  66. 66. Fragmentation analysis for each candidate
  67. 67. Score: #explained peaks </li></ul></ul><ul><li>Correct: #4 (median), 75% < rank 18 </li></ul>
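The evaluation scheme above (retrieve candidates, predict fragments, score by the number of explained peaks, report the rank of the correct compound) can be sketched as follows. The candidate names and fragment masses are hypothetical, and the fragment prediction itself is assumed to come from an external tool; only the scoring and ranking step is shown.

```python
def explained_peaks(measured_mz, predicted_fragments, tol=0.01):
    """Count measured peaks that lie within tol Da of any predicted fragment."""
    return sum(
        any(abs(mz - frag) <= tol for frag in predicted_fragments)
        for mz in measured_mz
    )

def rank_candidates(measured_mz, candidates, tol=0.01):
    """candidates: name -> predicted fragment m/z list. Best score first, ties by name."""
    scored = [
        (name, explained_peaks(measured_mz, frags, tol))
        for name, frags in candidates.items()
    ]
    return sorted(scored, key=lambda item: (-item[1], item[0]))

def worst_case_rank(ranking, name):
    """1-based rank, counting every candidate with an equal or better score."""
    score = dict(ranking)[name]
    return sum(1 for _, s in ranking if s >= score)

# Hypothetical measured spectrum and candidate fragment lists.
measured = [91.05, 119.05, 147.04]
candidates = {
    "candidate A": [91.05, 119.05, 147.04],
    "candidate B": [91.05, 60.00],
    "candidate C": [119.05, 147.04, 30.00],
}
ranking = rank_candidates(measured, candidates)
print(ranking, worst_case_rank(ranking, "candidate C"))
```

Because the score is an integer count, ties are frequent, so it matters whether a reported rank is the best case, the worst case, or an average over tied candidates.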
  68. 68. MetFrag <ul><li>Search upstream DBs: KEGG/PubChem/ChemSpider
  69. 69. In-silico fragmentation to “explain” peaks
  70. 70. Score: #peaks and BDE </li></ul>
  71. 71. MetFrag <ul><li>Many similar results with (good) tied ranks </li><ul><li>Similarity clustering (Tanimoto fingerprint >0.95)
  72. 72. Don't expect to discriminate positional isomers (or stereo …) </li></ul><li>Correct: #4 (median), 75% < rank 12 </li></ul>
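Grouping near-identical candidates by Tanimoto similarity, as mentioned above, can be sketched like this. It is a minimal example, assuming fingerprints are given as sets of "on" bit indices and using a simple greedy clustering; the slides do not say which fingerprint type or clustering procedure MetFrag uses.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def cluster(fingerprints, threshold=0.95):
    """Greedy single pass: join the first cluster whose first member is similar enough."""
    clusters = []  # each cluster is a list of names; the first name is its representative
    for name, fp in fingerprints.items():
        for members in clusters:
            if tanimoto(fp, fingerprints[members[0]]) > threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters

# Hypothetical fingerprints: "a" and "b" differ in one bit, "c" is clearly different.
fingerprints = {
    "a": set(range(40)),
    "b": set(range(40)) | {99},
    "c": set(range(20)),
}
print(cluster(fingerprints))
```

Reporting one rank per cluster instead of one per structure keeps tied positional isomers from inflating the rank of the correct answer.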
  73. 73. Nobody else saw him ? <ul><li>PubChem is large, even huge … but not complete
  74. 74. Fewer than 100 structures: </li><ul><li>Evaluation of GC/MS & MetFrag (E. Schymanski RRP=0.33) </li></ul><li>C23H48 (324 Da) -> 5.7 million structural isomers (Sasaki, “Structure elucidation system using structural information from multisources: CHEMICS”, J. Chem. Inf. Comput. Sci., 1985)
  75. 75. Determination of molecular structures using tandem MS (United States Patent 7197402) </li></ul>
  76. 76. Beyond MS <ul><li>Optical: </li><ul><li>UV, Fluorescence </li></ul><li>Biological: </li><ul><li>Biosynthesis Mutants (Plant) </li></ul><li>Physicochemical: </li><ul><li>Retention time
  77. 77. log Kow (ChemSpider) </li></ul></ul>-> Caveat: “If any one of your hints (m/z, RT, Fragmentation, …) is wrong, then the whole assignment is wrong.” (D. S. Richards, What is Structure Elucidation, Structure 2010, Hinckley, UK, 24 February 2010)
  78. 78. BTW: What is identified?! Levels of identification defined by the Metabolomics Standards Initiative (MSI): <ul><li>Non-novels identified via authentic compound: </li></ul><ul><ul><li>RT+MS or RT+NMR or exact mass+MS² or “exact mass+isotope pattern” (?!)
  79. 79. Identical MS conditions </li></ul></ul><ul><li>Putatively annotated compounds
  80. 80. Putatively characterized compound classes
  81. 81. Unknowns (sic!) </li></ul>10.1007/s11306-007-0082-2
  82. 82. Mass Spectrometry and Bioinformatics Metabolite profiling pipeline: <ul><ul><li>XCMS, CAMERA, Rdisop
  83. 83. www.bioconductor.org </li></ul></ul>Metabolite Identification: <ul><ul><li>Spectral library: www.massbank.jp
  84. 84. Search PubChem with tandem-MS: msbi.ipb-halle.de/MetFrag/
  85. 85. Best of both worlds: MetFusion </li></ul></ul>
  86. 86. Questions ?
