The document summarizes the ADAP-GC software for deconvoluting co-eluting metabolites from GC/TOF-MS data in metabolomics studies. It describes how GC-MS results in co-elution of metabolites due to its lower resolution compared to LC-MS. ADAP-GC uses a three-step process to deconvolute co-eluting peaks: 1) identifying chromatographic peak features, 2) selecting model peak features for each component, and 3) constructing a spectrum for each component through constrained optimization. The software allows automated processing, alignment, and export of metabolite identities and quantities for downstream analysis.
Pittcon 3 5-2014
1. ADAP-GC: Deconvolution of Co-Eluting Metabolites from GC/TOF-MS Data for Metabolomics Studies
Xiuxia Du
Department of Bioinformatics and Genomics
University of North Carolina at Charlotte
2. Outline
• Background
• Why is deconvolution necessary?
• How is deconvolution done in ADAP-GC?
• ADAP-GC software
• Next step
3. GC-MS vs. LC-MS
[Figure: metabolomics chemical classes arranged from less polar to more polar, with the ranges covered by GC/MS, by both techniques (overlap), and by LC/MS]
Alkylsilyl derivatives
Eicosanoids
Essential oils
Esters
Perfumes
Terpenes
Waxes
Volatiles
Carotenoids
Flavonoids
Lipids
Alcohols
Alkaloids
Amino acids
Catecholamines
Fatty acids
Phenolics
Polar organics
Prostaglandins
Steroids
Organic acids
Organic amines
Nucleosides
Ionic species
Nucleotides
Polyamines
Figure 1. Classes of chemicals and the analytical techniques with which they are
4. Ionization
• Electron ionization (EI)
• Hard method
• Small molecules, 1-1000 Da
• Electrospray ionization (ESI)
• Soft method
• Small molecules, peptides, proteins, up to 200,000 Da
5. EI
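$\mathrm{M} + e^- \rightarrow \mathrm{M}^{+\bullet} + 2e^-$
EI fragmentation of CH3OH
$\mathrm{CH_3OH} \rightarrow \mathrm{CH_3OH^+}$
$\mathrm{CH_3OH} \rightarrow \mathrm{CH_2{=}OH^+} + \mathrm{H}$
$\mathrm{CH_3OH} \rightarrow \mathrm{CH_3^+} + \mathrm{OH}$
$\mathrm{CH_2{=}OH^+} \rightarrow \mathrm{CH{\equiv}OH^+} + \mathrm{H}$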
6. EI breaks up molecules in predictable ways.
[Figure label: Molecular ion]
9. Deconvolution
As a review, let's look at the deconvolution process. AMDIS
considers the peak shapes of all extracted ions and their apex
retention times (RT). In this example, only some of the
extracted ion chromatograms (EICs) are overlaid for clarity
with the apex spectrum (Figure 1A).
Figure 1A. Extracted ion chromatograms (EICs) of ions 50, 75, 160, 170, 185, 280, and 310, after de-skewing. Ions 50, 170, and 280 have the same shape and the same retention time. The other ions differ: ion 75 (late retention time), ion 185 (shape and early retention time), ion 310 (early retention time), ion 160 (shape).
Figure 1B shows the EICs after the different peak shapes or RTs are eliminated from Figure 1A. Ions 50, 170, 280 and a few others remain.
Ion 160 EIC has the same RT as ions 50, 170 and 280, but has
a different peak shape. Ion 185 has a different peak shape and
an earlier RT. Ions 75 and 310 have similar peak shapes but
they have different RTs.
www.agilent.com
10. Deconvolution
Figure 1B. Extracted ion chromatograms (EICs) of ions 50, 170, and 280. Only the ions in black have the same shape and retention time, as shown by 50, 170, 280, plus others.
Figure 1B shows the EICs after the different peak shapes or RTs are eliminated from Figure 1A. Ions 50, 170, 280 and a few others remain.
Figure 1A-1C. Simplified deconvolution process (continued).
11. Deconvolution
Figure 1C. Extracted ion chromatograms (EICs) of ions 50, 170, and 280. These deconvoluted ions are grouped together as a component.
Figure 1C shows all of the ions in black that have similar peak shapes and RTs, within the criteria set earlier by the analyst. These are grouped together and referred to as a component by AMDIS.
Figure 1A-1C. Simplified deconvolution process (continued).
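The grouping idea on the preceding slides (keep only ions whose EICs share an apex retention time and a peak shape) can be sketched in a few lines. This is an illustration only, not the AMDIS algorithm: the function names, the apex-normalized shape comparison, and the tolerance values are assumptions made for the example, and the traces are made-up numbers that mimic ions 50, 170, 280, 160, 310, and 75.

```python
def apex_index(trace):
    """Index of the highest point of an EIC trace."""
    return max(range(len(trace)), key=lambda i: trace[i])


def normalized(trace):
    """Scale a trace so that its apex equals 1 (assumes a positive apex)."""
    top = max(trace)
    return [v / top for v in trace]


def group_component(eics, ref_mz, max_apex_shift=0, shape_tolerance=0.1):
    """Return the m/z values whose EIC matches the reference EIC in both
    apex position and apex-normalized shape. Both thresholds are
    hypothetical analyst-set criteria."""
    ref = eics[ref_mz]
    ref_apex = apex_index(ref)
    ref_shape = normalized(ref)
    members = []
    for mz in sorted(eics):
        trace = eics[mz]
        same_rt = abs(apex_index(trace) - ref_apex) <= max_apex_shift
        shape_diff = max(abs(a - b) for a, b in zip(normalized(trace), ref_shape))
        if same_rt and shape_diff <= shape_tolerance:
            members.append(mz)
    return members


# Made-up traces sampled at the same scan times.
eics = {
    50: [0, 2, 6, 10, 6, 2, 0],
    170: [0, 1, 3, 5, 3, 1, 0],
    280: [0, 4, 12, 20, 12, 4, 0],
    160: [0, 8, 9, 10, 9, 8, 0],  # same apex, different (broader) shape
    310: [2, 6, 10, 6, 2, 0, 0],  # same shape, earlier apex
    75: [0, 0, 2, 6, 10, 6, 2],   # same shape, later apex
}
component = group_component(eics, ref_mz=50)
print(component)  # [50, 170, 280]
```

Ions 160, 310, and 75 are rejected for the same reasons given in the figure annotations: a different shape, an earlier apex, and a later apex, respectively.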
12. GC-EI-MS data processing workflow
[Workflow diagram. Stages shown: peak detection, deconvolution, alignment, denoising, baseline correction, library search, EIC extraction, raw MS data]
13. EIC extraction
• For low-resolution mass measurement: relatively easy
• For high-resolution mass measurement: more involved
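For the low-resolution case, EIC extraction can be as simple as binning every measured m/z to its nearest integer mass and tracking intensity per scan. The sketch below assumes a hypothetical in-memory layout (a list of scans, each a retention time plus a list of (m/z, intensity) pairs); it is not the ADAP-GC implementation.

```python
from collections import defaultdict


def extract_eics(scans):
    """Build one intensity trace per nominal (integer) mass.

    scans: list of (retention_time, [(mz, intensity), ...]) tuples.
    Returns {nominal_mass: [intensity in scan 0, scan 1, ...]}.
    """
    n_scans = len(scans)
    eics = defaultdict(lambda: [0.0] * n_scans)
    for scan_index, (_rt, peaks) in enumerate(scans):
        for mz, intensity in peaks:
            # Unit-mass binning: adequate for low-resolution data only.
            eics[round(mz)][scan_index] += intensity
    return dict(eics)


scans = [
    (10.0, [(73.1, 100.0), (147.0, 20.0)]),
    (10.1, [(72.9, 150.0)]),
    (10.2, [(73.0, 90.0), (147.2, 35.0)]),
]
eics = extract_eics(scans)
print(eics[73])   # [100.0, 150.0, 90.0]
print(eics[147])  # [20.0, 0.0, 35.0]
```

High-resolution data is more involved because integer binning would merge distinct ions; a tolerance-based grouping of m/z values would be needed instead.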
14. Peak picking
• Each EIC chromatographic peak is characterized by its apex elution time,
left and right boundary, peak height, and peak shape.
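The peak attributes listed on this slide can be illustrated with a small sketch. It assumes a single, already isolated peak in the trace and uses a simple rule for boundaries (walk outward from the apex while the intensity keeps decreasing); the function name and the boundary rule are assumptions for illustration, not the ADAP-GC peak picker.

```python
def characterize_peak(times, intensities):
    """Describe the tallest peak in a trace by apex time, boundaries,
    height, and apex-normalized shape. Assumes a positive apex."""
    apex = max(range(len(intensities)), key=lambda i: intensities[i])
    height = intensities[apex]

    # Walk left and right while the signal is still strictly falling.
    left = apex
    while left > 0 and intensities[left - 1] < intensities[left]:
        left -= 1
    right = apex
    while right < len(intensities) - 1 and intensities[right + 1] < intensities[right]:
        right += 1

    return {
        "apex_time": times[apex],
        "left_time": times[left],
        "right_time": times[right],
        "height": height,
        "shape": [v / height for v in intensities[left:right + 1]],
    }


times = [10.0, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8]
intensities = [1, 1, 4, 9, 12, 8, 3, 1, 1]
peak = characterize_peak(times, intensities)
print(peak["apex_time"], peak["left_time"], peak["right_time"], peak["height"])
# 10.4 10.1 10.7 12
```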
15. Background
ADAP-GC
ADAP-LC
ADAP-Stats
ADAP-CAGT
An automated data analysis pipeline for GC-TOF-MS metabonomics studies. Journal of proteome research 2010, 9 (11), 5974-81.
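ADAP-GC 1.0
Deconvolution
Let the abundance values of two EICs be
$\vec{e}_x = \{a_1, a_2, \ldots, a_n\}$
$\vec{e}_y = \{b_1, b_2, \ldots, b_n\}$
Then, the similarity between the two EICs can be measured by
$$r = \frac{\vec{e}_x \cdot \vec{e}_y}{\lVert \vec{e}_x \rVert \, \lVert \vec{e}_y \rVert}$$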
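The similarity measure above is the normalized dot product (cosine) of the two abundance vectors. A minimal sketch, with the function name chosen for illustration and the two EICs assumed to be sampled at the same scans:

```python
import math


def eic_similarity(ex, ey):
    """Normalized dot product of two equal-length abundance vectors."""
    dot = sum(a * b for a, b in zip(ex, ey))
    norm_x = math.sqrt(sum(a * a for a in ex))
    norm_y = math.sqrt(sum(b * b for b in ey))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # convention chosen here for an all-zero EIC
    return dot / (norm_x * norm_y)


same_shape = eic_similarity([1, 2, 3], [2, 4, 6])   # proportional traces
no_overlap = eic_similarity([1, 0, 0], [0, 0, 1])   # traces never co-occur
print(round(same_shape, 6), no_overlap)  # 1.0 0.0
```

Two EICs that differ only in scale score 1, which is why the measure suits comparing the peak shapes of ions with very different abundances.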
16. Why ADAP-GC 2.0?
ADAP-GC 2.0: Deconvolution of Coeluting Metabolites from GC/TOF-MS Data for Metabolomics Studies. Analytical chemistry 2012, 84 (15), 6619-29.
• m/z 43, 73, and 117: shared by uridine and n-eicosanoic acid
• m/z 217: unique to uridine
• m/z 132: unique to n-eicosanoic acid
18. ADAP-GC 2.0
• An EIC peak could result from the elution of a single component or of multiple co-eluting components.
• The chromatographic peak feature (CPF) is defined.
• Simple CPFs and composite CPFs are identified.
• Deconvolution is performed in four steps:
§ determination of deconvolution windows
§ selection of model CPFs
§ construction of spectrum for each component
§ correction of splitting issues
19. Selection of Model CPFs
• Step 1: select good candidates
• Step 2: determine the number of components by hierarchical clustering of the good candidates
• Step 3: determine the model CPF for each component
$$\text{sharpness} = \sum_{i=2}^{p} \frac{I_i - I_{i-1}}{I_{i-1}} + \sum_{i=p}^{n-1} \frac{I_i - I_{i+1}}{I_{i+1}}$$
$$\text{total score} = c_1 (\text{mass}) + c_2 (\text{gaussian similarity}) + c_3 (\text{apex intensity}) + c_4 (\text{SNR})$$
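The two scoring formulas can be sketched as follows. The slide does not define its symbols, so this example assumes that I is the sequence of intensities across a CPF, p is the apex position, and n is the number of points. The weights c1 to c4 and the scaling of the four terms are not given in the slides, so the values used below are placeholders.

```python
def sharpness(intensities):
    """Sum of relative rises up to the apex plus relative falls after it.

    Uses 0-based indexing; assumes all intensities are positive and that
    the apex (index p) is the maximum of the sequence.
    """
    p = max(range(len(intensities)), key=lambda i: intensities[i])
    rise = sum(
        (intensities[i] - intensities[i - 1]) / intensities[i - 1]
        for i in range(1, p + 1)
    )
    fall = sum(
        (intensities[i] - intensities[i + 1]) / intensities[i + 1]
        for i in range(p, len(intensities) - 1)
    )
    return rise + fall


def total_score(mass, gaussian_similarity, apex_intensity, snr,
                weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the four terms; the weights are placeholders."""
    c1, c2, c3, c4 = weights
    return c1 * mass + c2 * gaussian_similarity + c3 * apex_intensity + c4 * snr


print(sharpness([1, 2, 4, 2, 1]))  # 4.0
print(sharpness([3, 4, 4.5, 4, 3]) < sharpness([1, 2, 4, 2, 1]))  # True
```

A narrow, steep peak scores higher than a broad, flat one, which is the property that makes sharpness useful for picking good model CPF candidates.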
20. Construction of Spectrum
• Each composite CPF is a linear summation of model CPFs.
• Weights are determined by constrained optimization.
• The weights that correspond to the same model CPF yield the spectrum of a component.
$$E = \sum_{i=1}^{n} \left( X[i] - \sum_{k=1}^{K} a_k M_k[i] \right)^2$$
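A minimal sketch of the fitting step, assuming the constraint is non-negativity of the weights (the slide says only "constrained optimization"). Here X is a composite CPF, M_k are the model CPFs, and E is minimized by projected gradient descent, a simple solver chosen for this illustration rather than the one ADAP-GC uses. The model shapes, m/z values, and function name are made up.

```python
def fit_weights(x, models, iterations=2000):
    """Minimize sum_i (x[i] - sum_k a[k] * models[k][i])**2 with a[k] >= 0."""
    n, k = len(x), len(models)
    # Twice the sum of squared model entries bounds the curvature of E,
    # so its inverse is a safe fixed step size.
    bound = 2.0 * sum(v * v for m in models for v in m)
    if bound == 0.0:
        raise ValueError("model CPFs must not all be zero")
    step = 1.0 / bound
    a = [0.0] * k
    for _ in range(iterations):
        residual = [x[i] - sum(a[j] * models[j][i] for j in range(k)) for i in range(n)]
        gradient = [-2.0 * sum(residual[i] * models[j][i] for i in range(n)) for j in range(k)]
        # Gradient step followed by projection onto a >= 0.
        a = [max(0.0, a[j] - step * gradient[j]) for j in range(k)]
    return a


model_cpfs = [[0, 1, 2, 1, 0, 0], [0, 0, 1, 2, 1, 0]]
composite_cpfs = {
    73: [0, 3, 8, 7, 2, 0],    # 3 * first model + 2 * second model
    217: [0, 5, 10, 5, 0, 0],  # 5 * first model only
}

# The weights that belong to the same model CPF form that component's spectrum.
spectra = [{} for _ in model_cpfs]
for mz, cpf in composite_cpfs.items():
    for component, weight in enumerate(fit_weights(cpf, model_cpfs)):
        spectra[component][mz] = weight

print([{mz: round(w, 3) for mz, w in s.items()} for s in spectra])
```

In this made-up case the first component receives weights of about 3 at m/z 73 and 5 at m/z 217, while the second receives about 2 at m/z 73 and 0 at m/z 217, so the ion at 217 is assigned to the first component only.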
24. Alignment
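• Component-based: the same component across samples is identified based on spectrum and retention time similarity
$$\text{score}_{\text{total}}(s_i, s_j) = 0.9\,\text{score}_{\text{spec}}(s_i, s_j) + 0.1\,\text{score}_{\text{RT}}(s_i, s_j)$$
$$\text{score}_{\text{RT}} = 1 - \frac{\Delta RT}{w}$$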
An automated data analysis pipeline for GC-TOF-MS metabonomics studies. Journal of proteome research 2010, 9 (11), 5974-81.
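The alignment score can be sketched as below. The 0.9 and 0.1 weights come from the slide. Everything else is an assumption for illustration: the spectrum score is taken to be the normalized dot product of the two spectra, ΔRT is taken as the absolute retention time difference, w is treated as a retention time window, and the window value of 0.1 is a placeholder.

```python
import math


def spectrum_score(spec_a, spec_b):
    """Normalized dot product of two spectra given as {mz: intensity}."""
    dot = sum(v * spec_b.get(mz, 0.0) for mz, v in spec_a.items())
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rt_score(rt_a, rt_b, window):
    """1 - dRT / w; not clipped, so it goes negative outside the window."""
    return 1.0 - abs(rt_a - rt_b) / window


def alignment_score(comp_a, comp_b, window):
    """0.9 * spectrum similarity + 0.1 * retention time similarity."""
    return (0.9 * spectrum_score(comp_a["spectrum"], comp_b["spectrum"])
            + 0.1 * rt_score(comp_a["rt"], comp_b["rt"], window))


a = {"rt": 21.2955, "spectrum": {73: 100.0, 217: 40.0}}
b = {"rt": 21.3097, "spectrum": {73: 100.0, 217: 40.0}}
score = alignment_score(a, b, window=0.1)
print(round(score, 4))  # 0.9858
```

With identical spectra the score is dominated by the 0.9 spectral term, so a small retention time shift between samples costs little.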
25. Alignment
• For each component, the best representative spectrum across all of the samples is determined
RT: 21.2955 21.2938 21.2938 21.2947 21.3097
21.2955 21.2972 21.2963 21.3038 21.3030
27. Export
• Identity and quantity in .csv files
• Spectra in .msp format that can be read by NIST MS Search software and other library search tools
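As a rough illustration of the .msp export, the sketch below writes one spectrum as a text record with a `Name:` line, a `Num Peaks:` line, and one m/z and intensity pair per line. It covers only these basic fields; the function name and the component name are made up, and the exact fields written by ADAP-GC are not stated in the slides.

```python
def to_msp(name, peaks):
    """Format one spectrum as a minimal MSP text record.

    peaks: list of (mz, intensity) pairs.
    A blank line terminates the record so records can be concatenated.
    """
    lines = [f"Name: {name}", f"Num Peaks: {len(peaks)}"]
    lines += [f"{mz} {intensity}" for mz, intensity in peaks]
    return "\n".join(lines) + "\n\n"


record = to_msp("Component 1", [(73, 999), (217, 450)])
print(record, end="")
```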
34. Next step
• Many parameters must be pre-specified in the current data
processing.
• How to reduce reliance on parameter settings?
• Is an adaptive workflow possible?
35. Acknowledgement
• Du lab
§ Wenxin Jiang
§ Yan Ni
§ Peter Pham
§ Kyle Suttlemyre
§ Fei Xu
§ Wenchao Zhang
36. Acknowledgement
• Dr. Wei Jia’s group @ UNC-Greensboro
§ Yunping Qiu
§ Guoxiang Xie
§ Xiaojiao Zheng
• Dr. Steve Zeisel’s group @ UNC-Chapel Hill
• Mingming Su @ DHMRI