concentration. In addition to peak-alignment-based methods
of spectral deconvolution,14 spectral fitting approaches that
match individual template spectra for each metabolite are
commonly used in an attempt to reduce the influence of signal
overlap on quantification.15
By virtue of the spectral matching
process (that typically uses multiple peak fits in the spectrum to
provide a best estimate of concentration), spectral fitting has the
advantage over simple spectral integration in that it is inherently
less affected by background variation arising from the sample
matrix, and from spectral artifacts such as the residual water peak
in aqueous sample spectra. However, spectral fitting is more
time-consuming and may be prone to user error or subjectivity.
A common approach used to assess quantification
accuracy in biofluid spectra is the use of traditional “spike-in”
experiments, whereby authentic standards are added to the
sample at known concentrations. However, these experiments
are typically conducted in the context of an invariant back-
ground, which is often not representative of the “real world”
scenario where baseline signals from different samples vary due
to numerous matrix effects.
Additional measures of the spectral quality and reliability of
individual measurements made in metabolic profiling studies,
that characterize performance in real sample sets, are therefore
of potential utility to the metabolic profiling community.
Mixture Design. Mixture design experiments are routinely
used for the selection of optimal criteria for production pro-
cesses, formulation, and more generally in the characterization
of relationships between response and system composition.
There are numerous designs that can be used, depending on
the constraints placed on the mixture components; a simplex-
lattice design reflects one of the simplest designs, and is
described as follows: “A {q, m} simplex-lattice design for q com-
ponents consists of points defined by the following coordinate
settings: the proportions assumed by each component take the
m+1 equally spaced values from 0 to 1
$$x_i = 0, \frac{1}{m}, \frac{2}{m}, \ldots, 1 \quad \text{for } i = 1, 2, \ldots, q$$
...and all possible combinations (mixtures) of the proportions
from this equation are used.”16
The proportions must sum to
unity. For example, a {3,3} simplex-lattice design represents
three components (q = 3), each of which has four (m + 1 = 4)
equally spaced possible levels (0, 1/3, 2/3, 1), and
therefore has ten possible mixture combinations.
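The count of ten mixtures can be checked by enumerating the lattice directly. The following is a minimal Python sketch, not taken from the original work; the function name `simplex_lattice` is hypothetical. It generates every combination of the m + 1 levels for q components and keeps those whose proportions sum to unity.

```python
from fractions import Fraction
from itertools import product


def simplex_lattice(q, m):
    """Return all {q, m} simplex-lattice points as tuples of Fractions."""
    points = []
    # Each component takes a numerator 0..m, i.e. a proportion 0, 1/m, ..., 1.
    for numerators in product(range(m + 1), repeat=q):
        if sum(numerators) == m:  # proportions must sum to unity
            points.append(tuple(Fraction(k, m) for k in numerators))
    return points


design = simplex_lattice(3, 3)
for point in design:
    print(", ".join(str(x) for x in point))
print("number of mixtures:", len(design))
```

For q = 3 and m = 3 the enumeration yields the three pure components, six binary mixtures, and the centroid (1/3, 1/3, 1/3).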
We propose that mixing different biofluid samples in known
proportions according to a mixture design (such as a simplex
lattice) will produce a sample set that enables metabolite behavior
across the sample compositional space to be characterized by
regression of the design against the metabolite response. In an
ideal situation, the observed response of an individual metabolite
will exactly follow the mixture design, and a perfect fit will be
achieved. In reality, matrix effects and confounding signal overlap
may reduce the accuracy of metabolite responses and reduce the
correspondence with the mixture design. Thus, this approach
allows the reproducibility of individual metabolites to be assessed,
and those that are adversely affected by matrix effects or signal
overlap to be identified.
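Under ideal linear mixing, the expected response of a metabolite in any mixture is simply the volume-weighted mean of its responses in the unmixed samples. A minimal Python sketch of this relationship follows; the function name and the pure-sample values are hypothetical and chosen only for illustration.

```python
def expected_response(volumes_ul, pure_responses):
    """Volume-weighted mean response of a mixture, assuming ideal linear mixing."""
    total_volume = sum(volumes_ul)
    weighted = sum(v * r for v, r in zip(volumes_ul, pure_responses))
    return weighted / total_volume


# Hypothetical responses of one metabolite in three unmixed samples (arbitrary units).
pure = (9.0, 3.0, 0.0)

print(expected_response((300, 0, 0), pure))      # pure first component
print(expected_response((200, 100, 0), pure))    # 2:1 binary mixture
print(expected_response((100, 100, 100), pure))  # centroid of the design
```

Departures of the measured response from these expected values are what the regression against the design is intended to detect.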
Here we have applied this strategy of mixing intact biofluids,
according to a predetermined experimental mixture design, to
compare the performance of two commonly used metabolite
quantification methods in the context of “real world” ¹H NMR
metabonomic analysis. The potential benefits of incorporating
a designed mixture component in metabonomic analyses, as a
method of assessing the accuracy of metabolite quantification,
are discussed. We suggest that this strategy may have general
benefits and applicability in metabolic profiling studies.
■ MATERIALS AND METHODS
Chemicals. D2O was obtained from Goss Scientific
(Nantwich, U.K.). All other reagents were of analytical grade
and obtained from Sigma-Aldrich (Poole, U.K.).
Experimental Design. A schematic of the experimental
design is shown in Figure 1, with details of each step discussed in turn
below.
Sample Collection and Preparation. Urine samples were
obtained from an existing large-scale toxicological study
resource.10,17,18 Sprague−Dawley rats (n = 7) were individually
housed in standard metabolism cages (21 ± 3 °C, relative
humidity 55 ± 15%) and acclimatized for six days prior to the
start of the study (t = 0 h). A standard diet (Purina chow 5002)
and fresh water (acidified to pH 2.5 using HCl to prevent
microbial growth) were available to each animal ad libitum.
Urine samples used in the current study were collected during
Figure 1. Schematic showing the overall approach described. Three
different urines were mixed in known proportions according to a
mixture design (1, 2). Concentrations of metabolites were determined
by ¹H NMR spectroscopy (3). Spectral fitting and spectral integration
were both used for quantification (4). The mixture design data (Y block)
were used in a PLS regression against the metabolite concentration data
(X block) to generate model metrics (5).
Table 1. Sample Composition for Designed Biofluid Mixtures Used in This Study Following a {3,3} Simplex-Lattice Mixture Design

                              volume (μL)
sample    rat urine    rat urine    human urine    sodium phosphate
number    0−8 h        8−24 h       spot sample    buffer
1         300          0            0              300
2         0            300          0              300
3         0            0            300            300
4         200          100          0              300
5         200          0            100            300
6         100          200          0              300
7         0            200          100            300
8         100          0            200            300
9         0            100          200            300
10        100          100          100            300
Analytical Chemistry Article
dx.doi.org/10.1021/ac400449f | Anal. Chem. 2013, 85, 6674−6681
two periods (0−8 h, 8−24 h) from control animals. Urine
voided by the animals was collected in the metabolism cage
into a container cooled by dry ice. Samples were subsequently
stored at −40 °C. Each sample underwent two freeze−thaw
cycles before use in this work as a consequence of realiquoting.
Further study information has previously been published.17
Additionally, a spot urine sample (5 mL) was obtained from a
healthy human volunteer, according to established protocols,
including filtration to remove cellular material (0.2 μm Minisart
16534K, Sartorius, Germany), and immediate storage at −40 °C
until required for analysis. Urine samples were prepared follow-
ing established protocols for NMR metabolome analysis.19
Urine samples were defrosted, vortex mixed (30 s, RT), and
centrifuged (16000 g, RT, 10 min) to remove particulate matter.
To provide sufficient total sample, for each collection period,
450 μL of each rat urine sample was pooled (total volume 3150 μL
per collection period). The three urines (two pooled rat urines,
one human spot urine) were mixed according to a {3,3} simplex-
lattice mixture design (Table 1), with each mixed sample having
a volume of 300 μL. These mixed samples were then buffered by
the addition of 300 μL sodium phosphate buffer (pH 7.4, 0.2 M,
80:20 H2O:D2O (v/v)) containing sodium 3-(trimethylsilyl)-
[2,2,3,3-²H₄]propionate (TSP, 1 mM). Samples were vortex
mixed (30 s, RT), and a 550 μL aliquot was transferred to a 96-well
autosampler plate. The mixed samples were prepared in triplicate
from the pooled rat samples and the human spot urine. The
preparation order was randomized.
NMR Spectral Acquisition and Processing. ¹H NMR
spectra were acquired on a Bruker AVANCE DRX600 NMR
spectrometer (Bruker Biospin, Rheinstetten, Germany) operat-
ing at 14.1 T (600.29 MHz ¹H NMR frequency) using a PH FI
TXI 600SB 5 mm probe maintained at 300 K. Samples were
introduced to the probe using a BEST flow-injection system
(Bruker) in a randomized order. Gradient shimming was used
immediately prior to spectral acquisition to ensure high field
homogeneity. Spectra were acquired using a
standard 1D pulse sequence (RD-90°-t1-90°-tm-90°-AQ).20 The
t1 delay and the mixing time (tm) were set to 3 μs and 100 ms,
respectively. Each spectrum was collected as the sum of 128 free
induction decays (FIDs) into 32K complex
data points, with a spectral width of 12019.23 Hz (20 ppm),
giving the FID a native resolution of 0.366 Hz/pt and an
acquisition time (AQ) of 1.36 s. A 2 s relaxation delay (RD)
was used between pulses. A presaturation pulse was applied
to the water resonance (δH = 4.7 ppm) during RD and tm.
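The quoted acquisition time and digital resolution follow from the spectral width and the number of points. The short Python check below is a sketch under one stated assumption that is not confirmed by the text: that the 32K points are counted in the Bruker TD sense (real and imaginary points together), so that AQ = TD / (2 × SW). With that convention the reported values are reproduced; the variable names are illustrative only.

```python
# Values quoted in the text
spectral_width_hz = 12019.23
n_points = 32 * 1024  # "32K" data points

# Assumption: points counted as real + imaginary together (Bruker TD convention),
# so one complex pair is sampled every 1 / SW seconds.
acquisition_time_s = n_points / (2 * spectral_width_hz)

# Digital resolution when the spectral width is spread over 32K points.
resolution_hz_per_pt = spectral_width_hz / n_points

print(f"AQ = {acquisition_time_s:.3f} s")
print(f"resolution = {resolution_hz_per_pt:.4f} Hz/pt")
```

If the points were instead counted as complex pairs, the same formula would give an acquisition time roughly twice as long, which is why the counting convention is flagged as an assumption here.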
Processing of the raw NMR data for analysis using a targeted
integration approach was carried out using XWINNMR
software (Bruker Biospin, Rheinstetten, Germany), with each
FID being multiplied by an exponential weighting func-
tion equivalent to a line broadening of 1 Hz prior to Fourier
transformation. Resulting frequency-domain spectra were
referenced to TSP (δH = 0.00 ppm) and interpolated from
32K to ∼42K data points using a cubic spline function
to regularize the abscissa and improve calibration accuracy
Figure 2. ¹H NMR spectra of urine samples used in this study: (A) pooled rat urine 0−8 h collection (n = 7), (B) pooled rat urine 8−24 h collection
(n = 7), and (C) human urine spot collection. Spectra were acquired at an observation frequency of 600 MHz using a standard 1D pulse sequence
with water presaturation.
(final resolution 0.29 Hz/pt) prior to analysis using in-house
scripts running in the Matlab (The Mathworks, Natick) computing
environment.
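The exponential weighting step can be illustrated with a short sketch. This is not the XWINNMR or in-house Matlab processing itself; it is a minimal NumPy example on a synthetic single-resonance FID, with the resonance offset, decay constant, and number of points chosen arbitrarily, and complex sampling at a dwell time of 1/SW assumed. Multiplying the FID by exp(−π·LB·t) adds LB hertz of Lorentzian linewidth to each peak after Fourier transformation.

```python
import numpy as np


def exponential_window(n_points, dwell_s, lb_hz):
    """Weighting function exp(-pi * LB * t), adding lb_hz of Lorentzian linewidth."""
    t = np.arange(n_points) * dwell_s
    return np.exp(-np.pi * lb_hz * t)


# Synthetic FID for illustration only (all values below are hypothetical).
sw_hz = 12019.23
n_points = 16384
dwell_s = 1.0 / sw_hz  # assumes complex (quadrature) sampling
t = np.arange(n_points) * dwell_s
fid = np.exp(2j * np.pi * 500.0 * t) * np.exp(-t / 0.5)  # 500 Hz offset, 0.5 s decay

window = exponential_window(n_points, dwell_s, lb_hz=1.0)
spectrum = np.fft.fftshift(np.fft.fft(fid * window))
freq_hz = np.fft.fftshift(np.fft.fftfreq(n_points, d=dwell_s))

peak_hz = freq_hz[np.argmax(np.abs(spectrum))]
print(f"peak found at {peak_hz:.1f} Hz")
```

The cubic spline interpolation onto a regular chemical shift axis is omitted from this sketch.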
Metabolite Quantification. Fifty-four metabolites were
quantified using both the spectral fitting approach and a
targeted spectral integration approach. Spectral fitting was
performed using Chenomx NMR Suite 4.6 (Chenomx Inc.,
Edmonton, Canada); reference spectra from the Chenomx
600 MHz library were combined so as to best approximate each
acquired urine spectrum, and the relative concentrations of
each metabolite present were determined by reference to the internal
TSP standard15 (Table 2). For the targeted spectral integration
approach, spectral regions were defined for each of the
metabolites of interest (Table 2), with a width sufficient to
encapsulate the majority of the peak across the entire set of
spectra (determined manually by spectral overlay). The integral
area of these regions was calculated using an in-house routine
in Matlab. Probabilistic quotient normalization21 (PQN) was
applied to remove variation originating from intersample dif-
ferences in urinary dilution. Chemometric analysis of meta-
bolite concentration data was completed using SIMCA-P+ 12
(Umetrics, Umeå, Sweden). Principal component analysis
(PCA, using metabolite concentration data) was conducted.
Partial least-squares regression (PLS, using metabolite concen-
tration data and the experimental design matrix) allowed
goodness-of-fit (R²) and goodness-of-prediction (Q²) estimates
to be made for each metabolite.22
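The core of probabilistic quotient normalization can be sketched in a few lines. The example below is a simplified illustration rather than the in-house routine used in the study: it assumes the reference profile is the median across samples (the text does not state which reference was used), it skips any preliminary total-area normalization, and the function name `pqn_normalize` is hypothetical. Each sample is divided by the median of its variable-wise quotients against the reference.

```python
import numpy as np


def pqn_normalize(data, reference=None):
    """Probabilistic quotient normalization of a samples x variables array.

    Returns the normalized array and the estimated dilution factor per sample.
    """
    data = np.asarray(data, dtype=float)
    if reference is None:
        # Assumption: use the median profile across samples as the reference.
        reference = np.median(data, axis=0)
    usable = reference > 0  # avoid dividing by zero-valued reference variables
    quotients = data[:, usable] / reference[usable]
    dilution = np.median(quotients, axis=1)  # most probable quotient per sample
    return data / dilution[:, None], dilution


# Three hypothetical samples with an identical profile at different dilutions.
profile = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
samples = np.vstack([profile * 1.0, profile * 2.0, profile * 4.0])

normalized, dilution = pqn_normalize(samples)
print(dilution)
print(normalized)
```

Because the three rows differ only by a global dilution factor, they become identical after normalization, leaving only composition-related variation for subsequent modeling.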
■ RESULTS AND DISCUSSION
Quantification of Metabolites Using NMR. As detailed
in the Materials and Methods section, three different urines
were mixed according to a {3,3} simplex lattice design, and
analyzed in triplicate by ¹H NMR spectroscopy. Representative
spectra are shown in Figure 2. A total of 54 metabolites were
successfully quantified in these samples using both a targeted
spectral fitting approach (involving the fitting of individual
reference metabolite spectra to the spectra acquired for each
sample mixture), and a targeted spectral integration approach
(involving the integration of a representative spectral region).
Metabolite data are given in Table 2.
Comparison of metabolite concentrations
across the three samples containing only one of the three urine
components (i.e., the corners of the simplex lattice design) showed
considerable variation between the rat and human samples
(Supporting Information Figure S1). Some metabolites were
present or absent in only one of the samples (2-hydroxyisobutyrate,
2-oxoisocaproate, fumarate, N,N-dimethylglycine, oxaloacetate,
tryptophan). Other metabolites spanned up to 2 orders of
magnitude in their absolute concentration in these samples;
those with the greatest variation included 1-methylnicotina-
mide, 1,3-dimethylurate, betaine, and allantoin.
PCA/PLS Modeling and Effect of Normalization.
Principal component analysis (PCA) and partial least-squares
(PLS) regression are widely used multivariate analysis tools
based on latent variable methods.23,24 For each quantification
approach, the metabolite concentrations (X-matrix) were
modeled by PCA in an unsupervised manner, and also
modeled against the experimental mixture design (Y-
matrix) using PLS.
PCA of the data set before and after PQN was conducted
and showed differences in the variation captured by the two
methods of quantification (Supporting Information, Figure S2).
Prior to normalization, the largest variation in each data set was
attributable to the sample dilution. Upon normalization, the
next largest source of variation in the spectral fitting data set was revealed
to be the mixture design, whereas in the spectral integration data
set, it was related to nuisance variation, driven by outlying
samples, and attributable to inferior water suppression.
PLS analysis was used to assess the fit of the relative
metabolite variation to the mixture design. The component
scores for the PLS models (Figure 3) clearly showed the experi-
mental design, as anticipated. Switching the X and Y matrices
of the PLS model also allowed calculation of the goodness-
of-fit (R²) and goodness-of-prediction (Q²) values for each
metabolite against the {3,3} simplex-lattice design (Figure 3
and Table 2). It can be seen that a large proportion of metabolites have high
Q² values, indicating that the latent structure of these data
follows that of the mixture proportions. It was recognized that in
these models, overall urinary dilution would have the effect of
artificially enhancing some of these values as a consequence of
introducing structure into the concentration profiles. PQN
concentrations were subsequently modeled and indicated that
once the global dilution factor was removed from the con-
centration data, several metabolites displayed greatly reduced
Q² values. Removing the dominating dilution difference means
that the Q² statistic reported for each metabolite gives a more
realistic representation of the fit of the metabolite
response to the design, in the presence of a variable background.
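The meaning of R² and Q² for a single metabolite can be illustrated with a simplified stand-in for the PLS models. The sketch below does not reproduce the SIMCA calculation: it swaps in ordinary least squares on a linear mixture model (response regressed on the three proportions, no separate intercept) with leave-one-out cross-validation, and the function name and response values are hypothetical. R² is computed from the fitted residuals and Q² from the cross-validated prediction residuals (PRESS).

```python
import numpy as np

# Proportions of the three components for the ten {3,3} simplex-lattice mixtures.
DESIGN = np.array([
    [3, 0, 0], [0, 3, 0], [0, 0, 3], [2, 1, 0], [2, 0, 1],
    [1, 2, 0], [0, 2, 1], [1, 0, 2], [0, 1, 2], [1, 1, 1],
], dtype=float) / 3.0


def r2_q2(design, response):
    """R2 from a least-squares fit and Q2 from leave-one-out prediction."""
    design = np.asarray(design, dtype=float)
    response = np.asarray(response, dtype=float)
    ss_total = np.sum((response - response.mean()) ** 2)

    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    rss = np.sum((response - design @ coef) ** 2)

    press = 0.0
    for i in range(len(response)):
        keep = np.arange(len(response)) != i  # leave sample i out
        coef_i, *_ = np.linalg.lstsq(design[keep], response[keep], rcond=None)
        press += (response[i] - design[i] @ coef_i) ** 2

    return 1.0 - rss / ss_total, 1.0 - press / ss_total


# Hypothetical metabolite that follows the design exactly ...
ideal = DESIGN @ np.array([9.0, 3.0, 0.0])
# ... and the same metabolite with an arbitrary distortion added.
distorted = ideal + np.array([1.5, -2.0, 0.5, 2.0, -1.0, 0.0, 1.0, -1.5, 2.5, -3.0])

print("ideal:     R2 = %.3f, Q2 = %.3f" % r2_q2(DESIGN, ideal))
print("distorted: R2 = %.3f, Q2 = %.3f" % r2_q2(DESIGN, distorted))
```

In this simplified setting Q² can never exceed R², and it becomes negative when the cross-validated predictions are worse than simply predicting the mean response.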
Several factors can explain a low Q² value, including
(a) changes in chemical shift as a consequence of pH variation,
Figure 3. Partial least-squares regression analysis scores plots
indicating latent structure of the TSP-normalized data: (A) spectral
integration data set and (B) spectral fitting data set. It can be seen that
the samples recapitulate the {3,3} simplex-lattice mixture design in the
score space. Samples (triplicates) are colored according to proportions
of the three component urines (Table 1).
(b) the absence of a metabolite in one or more of the
component samples (in this case the rat and human urines),
(c) low signal-to-noise ratio (S/N) in the measurement, and (d) overlap of spectral
features. Where identified during analysis, these influences are
indicated (Table 2).
Comparison of the PQN data revealed that several
metabolites exhibited high Q² values in PLS models from
both quantification approaches, as shown in Figure 4. Of these,
the highest were N,N-dimethylglycine, succinate, trans-
aconitate, and 2-oxoglutarate. The Q² value for several
metabolites was substantially different when comparing the two
methods. Those performing well only in the targeted spectral
fitting approach included 1-methylnicotinamide, trigonelline,
and tyrosine. Conversely, those performing well only in the
targeted integration approach included N,N-dimethylformamide,
allantoin, 3-indoxylsulfate, dimethylamine, malonate, guanidinoacetate,
and ethanol. Oxaloacetate, phenylacetylglycine, 1,6-anhydro-
β-D-glucose, and glucose exhibited low, or subzero, Q² values in
both models.
In summary, metabolites artificially well modeled as a con-
sequence of the urinary dilution factor may be shown to be poorly
modeled following normalization (e.g., an overlapped peak on a
variable background). Conversely, metabolites apparently
poorly modeled may have their concentration structure across
the experimental design revealed (e.g., a small peak on a variable
background). Metabolite resonances that are well resolved and lie in
spectral regions with little background variation should be
well modeled and exhibit a high Q² value. It should be noted
Figure 4. Goodness-of-fit (R², green bars) and goodness-of-prediction (Q², blue bars) metrics generated across the {3,3} simplex-lattice mixture
design data for 54 metabolites for (A) spectral fitting data set normalized to TSP, (B) spectral integration data set normalized to TSP, (C) spectral
fitting data set normalized using PQN, and (D) spectral integration data set normalized using PQN.
that metabolites that are invariant across all the component
samples might artificially appear to be poorly quantified as a
consequence of their low difference in signal relative to the
background noise. In this study, we deliberately included samples
of the same type (urine), but of varying similarity (rat (day) vs
rat (night) vs human) to produce a spectral set with contrasting
metabolite concentrations and background matrix effects.
We chose a simple mixture design as an exemplar of the
approach, but other designs are possible. For example, this might
have particular value when the back-
ground/matrix is expected to change substantially between the samples
in each class (e.g., toxicological interventions that result in
proteinuria, clinical samples containing high concentrations of
treatment excipients). Samples pooled according to class and
titrated to give linear combinations with known proportions
would characterize the sample compositional space between these
classes. In practice, the approach described could be adapted
for use in larger sample sets; post hoc selection of samples that
are identified by their profiles as being at contrasting extremes
could be used in this way to characterize the individual metabolite
behavior (linearity and matrix effect in relation to a varying
background) across the sample compositional space.
■ CONCLUSION
The approach we report combines the use of sample mixing to
encode sample spectra according to a known experimental
design, with multivariate analysis that allows the theoretical and
observed responses to be compared. We propose that the Q²
statistic is a suitable index with which to make this comparison.
This statistic provides an unbiased estimate of how reliable the
quantification of a particular spectral feature is across the
sample compositional space, and thus of which features can be safely
interpreted from the urinary data. We found PQN suitable to
remove nuisance variation attributable to gross sample dilution,
and this procedure helped reveal the variation of interest, namely that
related to the experimental design. Broad agreement between
targeted spectral fitting and targeted spectral integration approaches
was observed, but differences were seen in the response of metabolites with
peaks in overlapped or baseline-dominated spectral regions. This
approach, which efficiently exploits the information contained in
several samples simultaneously, has general applicability and can be
used as an additional metric for profile quality assessment when
conducting biomarker discovery research using spectroscopic
platforms. We suggest that the method offers good complemen-
tarity to measures of analytical reproducibility obtained by replicate
analysis of individual samples.
■ ASSOCIATED CONTENT
Supporting Information
Additional material as described in the text. This material is
available free of charge via the Internet at http://pubs.acs.org.
■ AUTHOR INFORMATION
Corresponding Author
*E-mail: toby.athersuch@imperial.ac.uk.
Notes
The authors declare no competing financial interest.
■ ACKNOWLEDGMENTS
The authors wish to acknowledge the Consortium of
Metabonomic Toxicology (COMET), comprising Bristol-Myers
Squibb, Hoffmann-La Roche Pharmaceuticals, Pfizer Inc.,
Eli Lilly & Co., and Novo Nordisk, for the provision of rat urine
samples.
■ REFERENCES
(1) Nicholson, J. K.; Lindon, J. C.; Holmes, E. Xenobiotica 1999, 29,
1181−1189.
(2) Nicholson, J. K.; Connelly, J.; Lindon, J. C.; Holmes, E. Nat. Rev.
Drug Discovery 2002, 1, 153−161.
(3) Nicholson, J. K.; Lindon, J. C. Nature 2008, 455, 1054−1056.
(4) Keun, H. C. Biomarker discovery for drug development and
translational medicine using metabonomics. Oncogenes Meet Metabo-
lism: From Deregulated Genes to a Broader Understanding of Tumour
Physiology; Kroemer, G., Mumberg, D., Keun, K., Riefke, B., Steger-
Hartman, T., Petersen, K., Eds.; Springer: Berlin, 2008; Vol. 4, pp 79−
98.
(5) Keun, H. C.; Athersuch, T. J. Pharmacogenomics 2007, 8, 731−
741.
(6) Nicholson, J. K.; Holmes, E.; Kinross, J. M.; Darzi, A. W.; Takats,
Z.; Lindon, J. C. Nature 2012, 491, 384−392.
(7) Holmes, E.; Loo, R. L.; Stamler, J.; Bictash, M.; Yap, I. K.; Chan,
Q.; Ebbels, T.; De Iorio, M.; Brown, I. J.; Veselkov, K. A.; Daviglus, M.
L.; Kesteloot, H.; Ueshima, H.; Zhao, L.; Nicholson, J. K.; Elliott, P.
Nature 2008, 453, 396−400.
(8) Athersuch, T. J. Bioanalysis 2012, 4, 2207−2212.
(9) Keun, H. C.; Ebbels, T. M.; Antti, H.; Bollard, M. E.; Beckonert,
O.; Schlotterbeck, G.; Senn, H.; Niederhauser, U.; Holmes, E.; Lindon,
J. C.; Nicholson, J. K. Chem. Res. Toxicol. 2002, 15, 1380−1386.
(10) Lindon, J. C.; Keun, H. C.; Ebbels, T. M. D.; Pearce, J. M. T.;
Holmes, E.; Nicholson, J. K. Pharmacogenomics 2005, 6, 691−699.
(11) Forgue, P.; Halouska, S.; Werth, M.; Xu, K.; Harris, S.; Powers,
R. J. Proteome Res. 2006, 5, 1916−1923.
(12) Holmes, E.; Nicholson, J. K. Ernst Schering Found. Symp. Proc.
2007, 227−249.
(13) Ellis, J. K.; Athersuch, T. J.; Thomas, L. D.; Teichert, F.; Perez-
Trujillo, M.; Svendsen, C.; Spurgeon, D. J.; Singh, R.; Jarup, L.; Bundy,
J. G.; Keun, H. C. BMC Med. 2012, 10, 61.
(14) Veselkov, K. A.; Lindon, J. C.; Ebbels, T. M.; Crockford, D.;
Volynkin, V. V.; Holmes, E.; Davies, D. B.; Nicholson, J. K. Anal.
Chem. 2009, 81, 56−66.
(15) Weljie, A. M.; Newton, J.; Mercier, P.; Carlson, E.; Slupsky, C.
M. Anal. Chem. 2006, 78, 4430−4442.
(16) accessed January 2013.
(17) Athersuch, T. J.; Keun, H.; Tang, H.; Nicholson, J. K. J. Pharm.
Biomed Anal. 2006, 40, 410−416.
(18) Ebbels, T. M. D.; Keun, H. C.; Beckonert, O. P.; Bollard, M. E.;
Lindon, J. C.; Holmes, E.; Nicholson, J. K. J. Proteome Res. 2007, 6,
4407−4422.
(19) Beckonert, O.; Keun, H. C.; Ebbels, T. M.; Bundy, J.; Holmes,
E.; Lindon, J. C.; Nicholson, J. K. Nat. Protoc. 2007, 2, 2692−2703.
(20) Neuhaus, D.; Ismail, I. M.; Chung, C. W. J. Magn. Reson. Series A
1996, 118, 256−263.
(21) Dieterle, F.; Ross, A.; Schlotterbeck, G.; Senn, H. Anal. Chem.
2006, 78, 4281−4290.
(22) Keun, H. C.; Ebbels, T. M. D.; Antti, H.; Bollard, M. E.;
Beckonert, O.; Holmes, E.; Lindon, J. C.; Nicholson, J. K. Anal. Chim.
Acta 2003, 490, 265−276.
(23) Wold, S.; Esbensen, K.; Geladi, P. Chemometr. Intell. Lab. 1987,
2, 37−52.
(24) Wold, S.; Ruhe, A.; Wold, H.; Dunn, W. J. Siam J. Sci. Comput.
1984, 5, 735−743.