ASMS Fall 2018 Metabolomics Informatics Workshop Peak Picking

1
Principles of Peak Picking and Alignment
Emma L. Schymanski
FNR ATTRACT Fellow and PI in Environmental Cheminformatics
Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg
Email: emma.schymanski@uni.lu
…and many colleagues who contributed to my science over the years!
ASMS Fall Meeting, San Francisco, California, November 29-30, 2018
Image©www.seanoakley.com/
https://tinyurl.com/asmsfall2018-peaks
How many peaks will a peak picker pick if a peak picker only picks peaks?

2
(nevertheless, I will do my best!)
DISCLAIMER!
MS1
MS2
Two very different worlds …

3
Presenting Peak Picking: Plan
o Why Peak Pick
o Terminology
• Peak Picking vs Centroid vs Profile …
o Peak Picking & Peak Pickers
• “best of” xcms and enviPick
• Peak Picking in Pictures
• Peak Picking Parameters
• Alleviating Peak Picking Parameter Panic
o Alignment ( / Profiling)
• “best of” xcms and enviMass
o Peak Picking Pointers
o Don’t just listen to me … do it!

4
Why Peak Pick (I)
Example scheme of liquid chromatography - mass spectrometry
Image © www.planetorbitrap.com/q-exactive
Sampling
Extraction (SPE)
HPLC separation
HR-MS/MS

5
Why Peak Pick (II)
This is what the output “really” looks like …
Image © www.planetorbitrap.com/q-exactive

6
Why Peak Pick (III)
Identification = turning numbers into structures
N
N
N
S
CH3
NHNH
CH3
CH3
CH3
N
N
N
S
CH3
NHNHCH3
CH3
OH
P
O
S
SO
CH3
CH3
CH3
P OHS
S
O
CH3
CH3
OH
CH3
S
O
O
OH
CH3
CH3
S
N
S
O
O
OH
S
O
O
OH
CH3
CH3
S
O
O
OH
CH3
CH3
S
O
O
OH
CH3
CH3
S
O
O
OH
CH3
CH3
S
O
O
OH
CH3
CH3
N
N
N
S
NHNH
CH3
CH3
CH3
NH2
OH
O
massbank.eu

7
TERMINOLOGY!
o Peak picking can be multi-directional, i.e.
• in mass… or time…

8
Mass: Centroid vs Profile Data (enviPat)
https://www.envipat.eawag.ch/index.php and Loos et al Anal. Chem. 87(11), 5738-5744. DOI: 10.1021/acs.analchem.5b00941

9
Mass: Centroid vs Profile Data (enviPat)
https://www.envipat.eawag.ch/index.php and Loos et al Anal. Chem. 87(11), 5738-5744. DOI: 10.1021/acs.analchem.5b00941

10
TERMINOLOGY!
http://proteowizard.sourceforge.net/
o Peak picking can be multi-directional (mass, time)
• Peak picking in Proteowizard MSConvert is “centroiding” masses
(turning profile mode data into centroided data for efficient processing)

11
Peak Picking (in time)
Source: R. Tautenhahn, C. Böttcher, S. Neumann, BMC Bioinformatics 2008, 9:504. DOI: 10.1186/1471-2105-9-504
o Peak picking along time axis (chromatographic peaks)

12
Peak Picking
Source: R. Tautenhahn, C. Böttcher, S. Neumann, BMC Bioinformatics 2008, 9:504. DOI: 10.1186/1471-2105-9-504

13
Peak Picking
Source: Johannes Rainer; http://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html

14
Peak Picking
Source: Johannes Rainer; http://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html
Several Samples Overlaid
Red = KO
Blue = wild type
Rectangle = chromatographic
peaks identified per sample

15
Peak Picking
o Several options for peak picking
• XCMS and centWave
• Tautenhahn et al 2008 DOI: 10.1186/1471-2105-9-504
• http://bioconductor.org/packages/xcms/
• MZmine 2
• Pluskal et al 2010 DOI: 10.1186/1471-2105-11-395
• http://mzmine.github.io/
• enviPick / enviMass
• Loos 2018 DOI: 10.5281/zenodo.1213098
• http://www.looscomputing.ch/eng/enviMass/overview.htm
• Plenty of other open, research and vendor options ...

16
Peak Picking
o Result is something like this (from Formulator output):

17
Peak Picking – XCMS & XCMS Online
o http://bioconductor.org/packages/xcms/

18
Peak Picking – XCMS & XCMS Online
o https://xcmsonline.scripps.edu/

19
Peak Picking – enviMass and enviPick
o http://www.looscomputing.ch/eng/enviMass/overview.htm
o R packages …

20
Peak Picking in Pictures
http://www.looscomputing.ch/eng/enviMass/topics/peakpicking.htm
Red = peaks
Grey = noise

21
Peak Picking .. Somewhat simpler picture

22
centWave – Gaussian with “Mexican Hat”
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-504

23

24

25
But … peaks are not perfect!
o See enviMass website for explanation …

26
Critical Point: Separating Peaks from Baseline

27
Peak Picking Parameters
o There are a lot of options to tweak!
• I will just run through (main) centWave parameters
• enviPick is too complicated => further reading!

28
Peak Picking Parameters: centWave
ppm maximal tolerated m/z deviation in consecutive scans, in
ppm (parts per million)
NOTE: dependent on your mass spectrometer

29
peakwidth Chromatographic peak width, given as range (min,max) in seconds
NOTE: highly dependent on your chromatography!

30
snthresh Signal to noise ratio cutoff

31
prefilter prefilter=c(k,I). Prefilter step for the first phase. Mass traces are
only retained if they contain at least k peaks with intensity >= I
Only one “stick” so will
fail recommended prefilter
settings

32
Too Many Peak Picking Parameters ???????
https://bioconductor.org/packages/
release/bioc/vignettes/IPO/inst/doc
/IPO.html
o IPO to the rescue!
o Parameter
optimization for
xcms-based
workflows …
o Libiseller et al
2015, DOI:
10.1186/s12859-015-0562-8
IPO = Isotopologue Parameter Optimization

33
Too Many Peak Picking Parameters ???????

34
RECAP: Why Peak Pick?
Identification = turning numbers into structures
N
N
N
S
CH3
NHNH
CH3
CH3
CH3
N
N
N
S
CH3
NHNHCH3
CH3
OH
P
O
S
SO
CH3
CH3
CH3
P OHS
S
O
CH3
CH3
OH
CH3
S
O
O
OH
CH3
CH3
S
N
S
O
O
OH
S
O
O
OH
CH3
CH3
S
O
O
OH
CH3
CH3
S
O
O
OH
CH3
CH3
S
O
O
OH
CH3
CH3
S
O
O
OH
CH3
CH3
N
N
N
S
NHNH
CH3
CH3
CH3
NH2
OH
O
massbank.eu

35
o Instruments change over time …
o Before we can do fancy statistics, we need to make sure
our samples are comparable!

36
Alignment
http://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html#3_initial_data_inspection
o Alignment / Profiling => which peaks belong together
across large sample sets?

37
Alignment
http://www.looscomputing.ch/eng/enviMass/topics/profiling.htm
o “Profiling” in enviMass

38
Alignment ~= Retention Time Correction
http://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html#3_initial_data_inspection
o Many algorithms and methods …
o Before:

39
Alignment ~= Retention Time Correction
http://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html#5_alignment
o Many algorithms and methods …
o After (Obiwarp algorithm in xcms)

40
Before Alignment
After Alignment

41
Changes over samples
http://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html#5_alignment
o Difference between adjusted and raw retention times
along the retention time axis

42
Some advice …
o Peak pickers are designed to pick the perfect peak
• But life is never perfect and peaks are no different
o Pick the peak picker that is best for your situation
• Convenience, ease of use, designed for your data, …
• The optimal choice is usually a compromise
o Be sceptical (visualise your data, reality check it, etc.)
• But don’t go overboard in evaluating peak pickers … remember
your (real) goal …

43
Peak Picking Overlap (centWave paper)

44
Verify with EIC Extraction [these are NOT picked]
https://github.com/schymane/ReSOLUTION/blob/master/R/RMB_EIC_prescreen.R
No peak at all
Nice peak, MSMS
Peak, no MSMS
Noise with MSMS (careful!)
Isobars with MSMS (careful!)*
Looking for chemicals known
to be present in the sample

45
Just because you find a peak …
ENTACT Project: https://www.epa.gov/sites/production/files/2018-06/documents/comptox_cop_6-28-18.pdf
o Mix 505: One candidate with this mass/formula
• DTXSID9040001, C9H8O4
o One chemical…
How many
peaks?

46
…doesn’t mean it’s your compound of interest!

47
Beware of artefacts!
o Your results also depend on the acquisition data!

48
Further reading DOING! [Vendor independent]
o Don’t just take my word for it … don’t just read about it
… DO IT. There are so many ways to try it out …
complete with sample data! [Open Science!]
o http://bioconductor.org/packages/release/bioc/vignettes/x
cms/inst/doc/xcms.html
o http://www.looscomputing.ch/eng/enviMass/overview.htm
o An interface that many enjoy, likely comes with example
data but requires a login …
o https://xcmsonline.scripps.edu/

49
Further reading DOING! [Vendor independent]
o http://mzmine.github.io/
o http://prime.psc.riken.jp/Metabolomics_Software/MS-DIAL/
o MS-DIAL

50
Acknowledgements
emma.schymanski@uni.lu
Further Information:
http://bioconductor.org/packages/xcms/
http://www.looscomputing.ch/eng/enviMass/overview.htm
https://xcmsonline.scripps.edu/
http://mzmine.github.io/
EU Grant
603437
The CompMS Community (proxy photo)

52
Quality Control of Data
Slide c/o Michael Stravs
o Always visualise results … never take anything for granted

53
Homologues: Challenge Peak Pickers but are Present!
Stravs et al. (2013), J. Mass Spectrom, 48(1):89-99. DOI: 10.1002/jms.3131
OHSO
O
CH3
O
OH
m n
SPA-9C
m+n=6
www.massbank.eu ACCESSIONS (LAS, SPACs):
Literature MS/MS LIT00034, LIT00037
Std Mix., Sample ETS00012, ETS00018https://github.com/MassBank/RMassBank/
Tentatively Identified Spectra:
http://goo.gl/0t7jGp

54
Be wary of instrument specific phenomena!
o R package nontarget: satellite peak removal

55
Be wary of instrument specific phenomena II
o Orbitrap-specific calibration issues (not observed in TOF)

ASMS Fall 2018 Metabolomics Informatics Workshop Peak Picking

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to ASMS Fall 2018 Metabolomics Informatics Workshop Peak Picking

Similar to ASMS Fall 2018 Metabolomics Informatics Workshop Peak Picking (20)

Recently uploaded

Recently uploaded (20)

ASMS Fall 2018 Metabolomics Informatics Workshop Peak Picking