Raman spectroscopy can be used to identify molecules by the characteristic scattering of light from a laser. Each Raman-active dye label has a unique spectral signature, comprising the locations and amplitudes of its peaks. The Raman spectrum is discretised into a multivariate observation that is highly collinear, so it lends itself to a reduced-rank representation. We introduce a sequential Monte Carlo (SMC) algorithm to separate this signal into a series of peaks plus a smoothly-varying baseline, corrupted by additive white noise. By incorporating this representation into a Bayesian functional regression, we can quantify the relationship between dye concentration and peak intensity. We also estimate the model evidence using SMC to investigate long-range dependence between peaks. These methods have been implemented as an R package, using RcppEigen and OpenMP.
Bayesian modelling and computation for Raman spectroscopy
1. Raman Spectroscopy Functional Model Bayesian Computation Experimental Results Conclusion
Matt Moores
Department of Statistics
University of Warwick
Oxford computational statistics & machine learning reading group
March 11, 2016
Acknowledgements
University of Warwick
Mark Girolami
Jake Carson
Karla Monterrubio Gómez
University of Strathclyde
Kirsten Gracie
Karen Faulds
Duncan Graham
Funded by the EPSRC grant “In Situ Nanoparticle Assemblies for
Healthcare Diagnostics and Therapy” (ref: EP/L014165/1) and an
Award for Postdoctoral Collaboration from the EPSRC Network on
Computational Statistics & Machine Learning (ref: EP/K009788/2)
Outline
1 Raman Spectroscopy
2 Functional Model
3 Bayesian Computation
4 Experimental Results
Model Choice
Multivariate Calibration
Statistical properties
Each Raman-active dye has a unique spectral signature:
Peaks correspond to vibrational modes of the molecule
Shift in wavenumber proportional to change in energy state
Smoothly-varying baseline (background fluorescence)
[Figure: photon counts against Raman shift ∆ν̃ (cm⁻¹), 200 to 1700, one spectrum per replicate (A to E)]
Surface-enhanced Raman scattering (SERS)
Raman signal enhanced by proximity to nanoparticles
Functionalisation using antibodies
Illustration courtesy Kirsten Gracie (U. Strathclyde)
Additive functional model of a SERS spectrum
Separate the hyperspectral signal into 3 components:
\[ y_i(\tilde{\nu}) = \xi_i(\tilde{\nu}) + s(\tilde{\nu}) + \epsilon \tag{1} \]
where:
$y_i(\tilde{\nu})$ is an observed SERS spectrum, discretised at multiple wavenumbers $\nu_j \in \mathcal{V}$
$\xi_i(\tilde{\nu})$ is a smooth baseline function
$s(\tilde{\nu})$ is the spectral signature of the dye molecule
$\epsilon$ is additive, zero-mean white noise with variance $\sigma^2$
Baseline
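Penalised spline:
\[ \xi_i(\tilde{\nu}) = \sum_{m=1}^{M} B_m(\tilde{\nu}) \, \alpha_{i,m} \tag{2} \]
\[ \pi(\alpha_{i,\cdot}) \sim \mathcal{N}_M(0, \Sigma_\lambda) \tag{3} \]
where $B_m(\tilde{\nu})$ are Demmler-Reinsch or B-spline basis functions
[Figure: basis functions $B_m(\tilde{\nu})$ against ∆ν̃ (cm⁻¹), 550 to 800]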
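The baseline in equation (2) can be sketched with the B-spline basis from R's `splines` package. This is only an illustration: the wavenumber grid, the number of basis functions `M` and the coefficient values are hypothetical. An independent Gaussian draw stands in for the penalised prior, because the form of $\Sigma_\lambda$ is not given on the slide.

```r
library(splines)  # distributed with base R

# Hypothetical wavenumber grid (cm^-1); not the data from the talk.
wavenumbers <- seq(550, 800, length.out = 251)

# Cubic B-spline basis with M columns. With intercept = TRUE the basis
# functions sum to one at every wavenumber.
M <- 12
B <- bs(wavenumbers, df = M, degree = 3, intercept = TRUE)

# Illustrative coefficients alpha_{i,m}. Assumption: an independent Gaussian
# draw replaces the penalised prior N_M(0, Sigma_lambda).
set.seed(1)
alpha <- rnorm(M, mean = 0, sd = 100)

# Baseline xi_i(nu) = sum_m B_m(nu) * alpha_{i,m}, evaluated on the grid.
baseline <- as.vector(B %*% alpha)
```

Plotting `baseline` against `wavenumbers` shows a smooth curve whose flexibility is controlled by `M`.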
Spectral signature
An additive mixture of radial basis functions:
\[ s(\tilde{\nu}) = \sum_{p=1}^{P} f(\tilde{\nu} \mid \ell_p, A_p, \varphi_p) \tag{4} \]
where:
$\ell_p$ is the location of peak $p$
$A_p$ is the amplitude
$\varphi_p$ is the scale (broadening)
Squared exponential
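The kernel function is proportional to the Gaussian density:
\[ f(\nu_j \mid \ell_p, A_p, \varphi_p) = A_p \exp\left\{ -\frac{(\nu_j - \ell_p)^2}{2\varphi_p^2} \right\} \tag{5} \]
\[ \mathrm{FWHM} = 2\sqrt{2 \ln 2}\; \varphi_p \tag{6} \]
[Figure: intensity (a.u.) against ∆ν̃ (cm⁻¹), 1300 to 1800]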
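As a quick check of equations (5) and (6), the kernel can be written as an R function and evaluated half a FWHM away from its centre. The function names and the parameter values below are hypothetical, chosen only for illustration; `loc` denotes the peak location.

```r
# Squared exponential peak, equation (5): amplitude A, location loc, scale phi.
peak_gaussian <- function(nu, loc, A, phi) {
  A * exp(-(nu - loc)^2 / (2 * phi^2))
}

# Full width at half maximum, equation (6).
fwhm_gaussian <- function(phi) 2 * sqrt(2 * log(2)) * phi

# Hypothetical peak parameters.
loc <- 1500; A <- 10000; phi <- 10

# The kernel falls to half of its amplitude at loc +/- FWHM / 2.
half_width <- fwhm_gaussian(phi) / 2
ratio <- peak_gaussian(loc + half_width, loc, A, phi) / A
```

Here `ratio` equals 0.5 up to floating-point rounding, since the exponent reduces to $-\ln 2$.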
Lorentzian
Long-range dependence between peaks can be modelled using a kernel proportional to the Cauchy density:
\[ f(\nu_j \mid \ell_p, A_p, \varphi_p) = A_p \, \frac{\varphi_p^2}{(\nu_j - \ell_p)^2 + \varphi_p^2} \tag{7} \]
\[ \mathrm{FWHM} = 2\varphi_p \tag{8} \]
[Figure: intensity (a.u.) against ∆ν̃ (cm⁻¹), 1300 to 1800]
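The heavier tails of the Lorentzian are what allow distant peaks to influence each other. The sketch below compares the two kernels ten scale units from the centre. The function names and parameter values are hypothetical; `loc` denotes the peak location.

```r
# Lorentzian peak, equation (7).
peak_lorentzian <- function(nu, loc, A, phi) {
  A * phi^2 / ((nu - loc)^2 + phi^2)
}

# Squared exponential peak, equation (5), for comparison.
peak_gaussian <- function(nu, loc, A, phi) {
  A * exp(-(nu - loc)^2 / (2 * phi^2))
}

# Hypothetical peak parameters.
loc <- 1500; A <- 10000; phi <- 10

# Half of the amplitude is reached at loc +/- phi, so FWHM = 2 * phi (equation 8).
half_max <- peak_lorentzian(loc + phi, loc, A, phi)

# Intensity ten scale units away from the centre of the peak.
tail_lorentzian <- peak_lorentzian(loc + 10 * phi, loc, A, phi)  # A / 101
tail_gaussian   <- peak_gaussian(loc + 10 * phi, loc, A, phi)    # A * exp(-50)
```

At this distance the Lorentzian still contributes about 1% of its amplitude, while the squared exponential contribution is numerically negligible.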
Informative priors
Obtained from manual peak fitting of independent data:
[Figure, panel (a): scale parameters ϕ (cm⁻¹), kernel density estimate compared with a lognormal density]
[Figure, panel (b): amplitudes A (arbitrary units), kernel density estimate compared with truncated normal and gamma densities]
Multivariate calibration
Signal intensity depends linearly on dye concentration, from the limit of detection (LOD) up to monolayer coverage:
\[ A_p = \beta_p c_i, \qquad c_{LOD} \le c_i < c_{MLC} \tag{9} \]
where:
$c_i$ is the nanomolar (nM) concentration of the dye in observation $i$
$\beta_p$ is a linear regression coefficient
$c_{LOD}$ is based on the signal-to-noise ratio
$c_{MLC}$ is proportional to the surface area of the nanoparticles
Jones et al. (1999) Anal. Chem. 71(3): 596–601
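The linear relationship in equation (9) can be illustrated with a least squares fit through the origin. This is a deliberately simplified sketch, not the Bayesian functional regression described in the talk. The concentrations, slope and noise level are hypothetical values used to simulate data.

```r
set.seed(42)

# Hypothetical calibration data: five replicates at each of five concentrations (nM).
conc <- rep(c(0.13, 0.26, 0.39, 0.52, 0.65), each = 5)
beta_true <- 800   # assumed slope, in arbitrary units per nM
amplitude <- beta_true * conc + rnorm(length(conc), sd = 5)

# Least squares fit of A = beta * c, with no intercept as in equation (9).
fit <- lm(amplitude ~ 0 + conc)
beta_hat <- unname(coef(fit))
```

Omitting the intercept encodes the assumption that the peak vanishes at zero concentration. The fit is only meaningful for concentrations between $c_{LOD}$ and $c_{MLC}$.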
Markov chain Monte Carlo
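MCMC targeting the joint posterior $\pi(\beta, \varphi, \alpha, \sigma \mid y_i(\tilde{\nu}))$
Algorithm 1 Marginal Metropolis-Hastings
(a prime marks a proposed value, $\circ$ marks the current state)
1: Draw random walk proposals for the peaks: $\beta'$, $\varphi'$
2: Propose baseline $\alpha'_{i,\cdot} \sim q(\alpha_{i,\cdot} \mid y_i(\tilde{\nu}), \beta', \varphi', \sigma)$
3: Propose $\sigma' \sim q(\sigma \mid y_i(\tilde{\nu}), \beta', \varphi', \alpha'_{i,\cdot})$
4: Compute the marginal acceptance ratio:
\[ \rho = \frac{p(y_i(\tilde{\nu}) \mid \beta', \varphi', \alpha', \sigma')}{p(y_i(\tilde{\nu}) \mid \beta^\circ, \varphi^\circ, \alpha^\circ, \sigma^\circ)} \cdot \frac{q(\alpha^\circ \mid \cdot)\, q(\sigma^\circ \mid \cdot)\, \pi(\beta')\, \pi(\varphi')\, \pi(\alpha')\, \pi(\sigma')}{q(\alpha' \mid \cdot)\, q(\sigma' \mid \cdot)\, \pi(\beta^\circ)\, \pi(\varphi^\circ)\, \pi(\alpha^\circ)\, \pi(\sigma^\circ)} = \frac{p(y_i(\tilde{\nu}) \mid \beta', \varphi')\, \pi(\beta')\, \pi(\varphi')}{p(y_i(\tilde{\nu}) \mid \beta^\circ, \varphi^\circ)\, \pi(\beta^\circ)\, \pi(\varphi^\circ)} \]
5: Accept $\beta'$, $\varphi'$, $\alpha'_{i,\cdot}$, $\sigma'$ jointly with probability $\min(1, \rho)$
(assuming peak locations $\ell_p$ and number of peaks $P$ are fixed)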
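The random walk and accept/reject steps can be sketched for a much simpler problem: one Lorentzian peak at a known location, with no baseline and a known noise level. This is plain random walk Metropolis-Hastings, not the marginal algorithm above, because steps 2 and 3 are omitted. The synthetic data, step sizes and the choice of a truncated normal prior for the amplitude are all assumptions made for illustration.

```r
set.seed(2016)

# Lorentzian kernel with amplitude A, location loc and scale phi.
peak <- function(nu, loc, A, phi) A * phi^2 / ((nu - loc)^2 + phi^2)

# Synthetic spectrum (hypothetical values): a single peak, no baseline.
nu <- seq(1400, 1600, by = 2)
loc <- 1500; sigma <- 50
y <- peak(nu, loc, A = 5000, phi = 20) + rnorm(length(nu), sd = sigma)

# Unnormalised log posterior of (A, phi), treating loc and sigma as known.
log_post <- function(A, phi) {
  if (A <= 0 || phi <= 0) return(-Inf)            # outside the prior support
  sum(dnorm(y, peak(nu, loc, A, phi), sigma, log = TRUE)) +
    dnorm(A, 3449, 5672, log = TRUE) +             # normal truncated at zero, up to a constant
    dlnorm(phi, log(25.27) - 0.4^2 / 2, 0.4, log = TRUE)
}

n_iter <- 10000
chain <- matrix(NA_real_, n_iter, 2, dimnames = list(NULL, c("A", "phi")))
current <- c(A = 4000, phi = 25)
lp_current <- log_post(current["A"], current["phi"])

for (t in seq_len(n_iter)) {
  # Symmetric random walk proposal, so the proposal density cancels in rho.
  proposal <- current + rnorm(2, sd = c(20, 0.1))
  lp_proposal <- log_post(proposal["A"], proposal["phi"])
  # Accept with probability min(1, rho), computed on the log scale.
  if (log(runif(1)) < lp_proposal - lp_current) {
    current <- proposal
    lp_current <- lp_proposal
  }
  chain[t, ] <- current
}

# Posterior means after discarding the first 2000 iterations as burn-in.
post_mean <- colMeans(chain[-(1:2000), ])
```

Working on the log scale avoids numerical underflow in the likelihood ratio. Proposals with a non-positive amplitude or scale have zero prior density and are always rejected.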
Sequential Monte Carlo
Particle-based method targeting a sequence of partial posteriors $\pi_i(\beta, \varphi, \alpha_{1:i,\cdot}, \sigma \mid Y_{1:i,\tilde{\nu}})$
Algorithm 2 SMC
1: Initialise $\varphi^{(q)}, \beta^{(q)}, \alpha^{(q)}, \sigma^{(q)} \;\; \forall\, q \in \{1, \dots, Q\}$
2: Initialise importance weights, $w_0^{(q)} = \frac{1}{Q}$
3: for all observations $i = 1, \dots, n$ do
4:   Update importance weights:
\[ w_i^{(q)} \propto w_{i-1}^{(q)} \; p\left( y_i(\tilde{\nu}) \mid \varphi^{(q)}, \beta^{(q)}, \alpha_{i,\cdot}^{(q)}, \sigma^{(q)} \right) \tag{10} \]
5:   Resample particles if $\mathrm{ESS}_i$ is below threshold
6:   for all particles $q \in \{1, \dots, Q\}$ do
7:     Update $\varphi^{(q)}, \beta^{(q)}, \alpha^{(q)}, \sigma^{(q)}$ using Algorithm 1
8:   end for
9: end for
Chopin (2002) Biometrika 89(3): 539–551
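The structure of Algorithm 2 can be sketched on a toy problem where the exact posterior is known. The model below (a Gaussian mean with known variance) is an assumption chosen for illustration and is unrelated to the spectral model. The move step is a single random walk Metropolis-Hastings update rather than Algorithm 1, and the resampling threshold of Q/2 is also an assumption.

```r
set.seed(11)

# Toy model (assumption): y_i ~ N(theta, sigma^2), with prior theta ~ N(0, tau^2).
n <- 20; sigma <- 1; tau <- 2
y <- rnorm(n, mean = 1.5, sd = sigma)

Q <- 2000
theta <- rnorm(Q, 0, tau)   # step 1: initialise particles from the prior
w <- rep(1 / Q, Q)          # step 2: equal importance weights

# Unnormalised log density of the partial posterior pi_i, one value per particle.
log_partial <- function(theta, i) {
  dnorm(theta, 0, tau, log = TRUE) +
    vapply(theta,
           function(th) sum(dnorm(y[seq_len(i)], th, sigma, log = TRUE)),
           numeric(1))
}

for (i in seq_len(n)) {
  # Step 4: reweight by the likelihood of observation i, then normalise.
  w <- w * dnorm(y[i], theta, sigma)
  w <- w / sum(w)

  # Step 5: multinomial resampling when the effective sample size is low.
  if (1 / sum(w^2) < Q / 2) {
    theta <- theta[sample.int(Q, Q, replace = TRUE, prob = w)]
    w <- rep(1 / Q, Q)
  }

  # Steps 6-8: one random walk MH move per particle. The move leaves pi_i
  # invariant, so the weights are unchanged.
  proposal <- theta + rnorm(Q, sd = 0.5)
  accept <- log(runif(Q)) < log_partial(proposal, i) - log_partial(theta, i)
  theta[accept] <- proposal[accept]
}

# Weighted particle estimate, compared with the conjugate posterior mean.
post_mean_smc <- sum(w * theta)
post_mean_exact <- (sum(y) / sigma^2) / (1 / tau^2 + n / sigma^2)
```

The MH move restores diversity after resampling, which would otherwise leave many duplicated particles for a static parameter.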
Model evidence
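Algorithm 2 provides a consistent, unbiased estimate of the marginal likelihood:
\[ \frac{Z_i}{Z_{i-1}} = \sum_{q=1}^{Q} w_{i-1}^{(q)} \; p\left( y_i(\tilde{\nu}) \mid \Theta_k^{(q)} \right) \tag{11} \]
\[ p(Y \mid M_k) \approx Z_n = \prod_{i=1}^{n} \frac{Z_i}{Z_{i-1}} \tag{12} \]
where:
$\Theta_k^{(q)} = \left( \varphi_k^{(q)}, \beta_k^{(q)}, \alpha_k^{(q)}, \sigma_k^{(q)} \right)$ are the parameters of model $M_k$ for particle $q$
$Z_n$ is the normalising constant
$Z_0 = 1$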
Del Moral, Doucet & Jasra (2006) JRSS B 68(3): 411–436
Pitt, Silva, Giordani & Kohn (2012) J. Econom. 171(2): 134–151
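Equations (11) and (12) can be checked on a toy model whose marginal likelihood has a closed form. The Gaussian mean model below is an assumption chosen for illustration. To keep the sketch short, the particles are drawn once from the prior and are neither resampled nor moved, which is adequate only because there are few observations.

```r
set.seed(3)

# Toy model (assumption): y_i ~ N(theta, sigma^2), with prior theta ~ N(0, tau^2).
n <- 5; sigma <- 1; tau <- 2
y <- rnorm(n, mean = 1, sd = sigma)

Q <- 50000
theta <- rnorm(Q, 0, tau)   # particles from the prior
w <- rep(1 / Q, Q)          # normalised weights w_0
log_Z <- 0                  # log Z_0 = 0

for (i in seq_len(n)) {
  lik <- dnorm(y[i], theta, sigma)
  ratio <- sum(w * lik)          # equation (11): Z_i / Z_{i-1}
  log_Z <- log_Z + log(ratio)    # equation (12), accumulated on the log scale
  w <- w * lik / ratio           # normalised weights w_i
}

# Exact log marginal likelihood: y ~ N(0, sigma^2 I + tau^2 J).
log_Z_exact <- -n / 2 * log(2 * pi) -
  0.5 * ((n - 1) * log(sigma^2) + log(sigma^2 + n * tau^2)) -
  (sum(y^2) - tau^2 * sum(y)^2 / (sigma^2 + n * tau^2)) / (2 * sigma^2)
```

Accumulating the logarithm of each ratio avoids underflow when the product in equation (12) runs over many observations.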
Thermodynamic integration
An alternative approach is to use the path sampling identity:
\[ \log \frac{Z_n}{Z_0} = \int_0^1 \mathbb{E}_\gamma \left[ \frac{d \log q_\gamma(\cdot)}{d\gamma} \right] d\gamma \tag{13} \]
where $\gamma = \frac{i}{n}$ identifies the sequence of partial posterior distributions $p_\gamma = \frac{q_\gamma}{Z_i} = \pi_i(\beta, \varphi, \alpha_{1:i,\cdot}, \sigma \mid Y_{1:i,\tilde{\nu}})$
This integral cannot be evaluated exactly, so it must be approximated using a numerical integration method.
The expectation $\mathbb{E}_\gamma$ can be estimated using a weighted sum over the SMC particles.
Zhou, Johansen & Aston (2015) arXiv:1303.3123 [stat.ME]
Gelman & Meng (1998) Statist. Sci. 13(2): 163–185
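The numerical integration in equation (13) can be sketched with the trapezoidal rule. Two substitutions are made here, both assumptions for illustration. The path is a likelihood-tempered one, $q_\gamma(\theta) = p(y \mid \theta)^\gamma \pi(\theta)$, rather than the data-tempered sequence of partial posteriors, so that $d \log q_\gamma / d\gamma = \log p(y \mid \theta)$. The expectation is estimated from exact draws, which are available for this toy Gaussian model, instead of weighted SMC particles.

```r
set.seed(5)

# Toy model (assumption): y_i ~ N(theta, sigma^2), with prior theta ~ N(0, tau^2).
n <- 5; sigma <- 1; tau <- 2
y <- rnorm(n, mean = 1, sd = sigma)

# log p(y | theta), vectorised over theta.
log_lik <- function(theta) {
  -n / 2 * log(2 * pi * sigma^2) -
    (sum(y^2) - 2 * theta * sum(y) + n * theta^2) / (2 * sigma^2)
}

# Exact draws from p_gamma, which is Gaussian for this model.
draw_theta <- function(gamma, size) {
  prec <- 1 / tau^2 + gamma * n / sigma^2
  rnorm(size, mean = (gamma * sum(y) / sigma^2) / prec, sd = sqrt(1 / prec))
}

# Temperature grid, concentrated near zero where the integrand changes fastest.
gammas <- seq(0, 1, length.out = 101)^5

# Monte Carlo estimate of E_gamma[log p(y | theta)] at each temperature.
expected <- vapply(gammas,
                   function(g) mean(log_lik(draw_theta(g, 5000))),
                   numeric(1))

# Trapezoidal rule for the integral in equation (13); Z_0 = 1 for a proper prior.
log_Z_ti <- sum(diff(gammas) * (head(expected, -1) + tail(expected, -1)) / 2)

# Exact log marginal likelihood, for comparison.
log_Z_exact <- -n / 2 * log(2 * pi) -
  0.5 * ((n - 1) * log(sigma^2) + log(sigma^2 + n * tau^2)) -
  (sum(y^2) - tau^2 * sum(y)^2 / (sigma^2 + n * tau^2)) / (2 * sigma^2)
```

A uniform grid would place too few points near $\gamma = 0$, where the expected log-likelihood changes most rapidly, hence the power-law spacing.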
R package 'serrsBayes'
library(serrsBayes)
library(hyperSpec)
ramanSpectra <- read.spc("mfutils.spc")
wavenumbers <- wl(ramanSpectra)
# peak locations (the remaining values are elided on the slide)
peakLoc <- wl2i(ramanSpectra, c(964, 1138, 1218, ...))
# informative priors for the peaks and baseline
lPriors <- list(scale.mu=log(25.27) - (0.4^2)/2,
    scale.sd=0.4, bl.smooth=1.25e-3, bl.knots=200,
    amp.mu=3449, amp.sd=5672, noise.sd=3,
    noise.nu=length(wavenumbers)*nrow(ramanSpectra),
    beta.sd=1000)
result <- fitPeaksWithBaselineSMC(wavenumbers,
    ramanSpectra[[]], peakLoc, lPriors)
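The expression used for `scale.mu` looks like a mean correction for a lognormal prior. Assuming that `scale.mu` and `scale.sd` are the mean and standard deviation of log ϕ (an interpretation, not something the slide states), the short check below shows that the implied prior mean of ϕ is 25.27 cm⁻¹. The variable names are hypothetical.

```r
# Assumed lognormal parameters for the peak scale, copied from the prior list.
scale_mu <- log(25.27) - (0.4^2) / 2
scale_sd <- 0.4

# The mean of a lognormal distribution is exp(meanlog + sdlog^2 / 2).
prior_mean <- exp(scale_mu + scale_sd^2 / 2)

# The median is exp(meanlog), which lies below the mean.
prior_median <- exp(scale_mu)

# Central 95% prior interval for the scale parameter.
prior_interval <- qlnorm(c(0.025, 0.975), meanlog = scale_mu, sdlog = scale_sd)
```

Subtracting half the log-scale variance lets the prior mean be specified directly on the original scale.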
Model Choice
The Bayes factor (log10 BF) correctly identifies the generative model for simulated spectra with Gaussian (194) and Lorentzian (-160) peaks:
Table: Simulation study
Data        M_k       log10 p(Y | M_k)
Gaussian    sq. exp.  -2011
Gaussian    Cauchy    -2205
Lorentzian  sq. exp.  -2163
Lorentzian  Cauchy    -2004
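The log10 Bayes factors follow from the table by subtraction, taking the squared exponential model over the Cauchy model. The vector names below are hypothetical. The tabulated evidences are rounded, which presumably explains why the Lorentzian difference comes out as -159 here rather than the -160 quoted above.

```r
# log10 evidence for each kernel, copied from the simulation study table.
sq_exp <- c(Gaussian = -2011, Lorentzian = -2163)
cauchy <- c(Gaussian = -2205, Lorentzian = -2004)

# log10 Bayes factor of the squared exponential model against the Cauchy model.
# Positive values favour squared exponential peaks, negative values favour Cauchy.
log10_bf <- sq_exp - cauchy
```

The sign of each entry matches the kernel that generated the simulated data.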
Model Choice for TAMRA
The Bayes factor favours Lorentzian peaks (log10 BF = -32) for tetramethylrhodamine (TAMRA):
Table: Observed spectra (TAMRA)
Data   M_k       log10 p(Y | M_k)
TAMRA  sq. exp.  -2257
TAMRA  Cauchy    -2225
This indicates very strong evidence of long-range dependence between peaks in SERS spectra.
Kass & Raftery (1995) JASA 90(430): 773–795
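Kass & Raftery grade evidence on the scale of twice the natural log of the Bayes factor, where values above 10 count as very strong. The conversion for the TAMRA result can be sketched as follows; the variable names are hypothetical.

```r
# log10 evidence for TAMRA, copied from the table.
log10_sq_exp <- -2257
log10_cauchy <- -2225

# log10 Bayes factor in favour of the Cauchy (Lorentzian) model.
log10_bf_cauchy <- log10_cauchy - log10_sq_exp

# Convert to the 2 * ln(BF) scale used by Kass & Raftery.
two_ln_bf <- 2 * log(10) * log10_bf_cauchy

# Values above 10 on this scale are described as very strong evidence.
very_strong <- two_ln_bf > 10
```

A log10 Bayes factor of 32 corresponds to roughly 147 on this scale, far above the threshold.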
TAMRA at 0.13 nM
[Figure: intensity (a.u.) against ∆ν̃ (cm⁻¹), 500 to 2000; legend: baseline-corrected spectra, posterior mean, 3σε]
TAMRA at 0.65 nM
[Figure: intensity (a.u.) against ∆ν̃ (cm⁻¹), 500 to 2000; legend: baseline-corrected spectra, posterior mean, 3σε]
Summary
Semi-parametric model of hyperspectral data:
Joint estimation of baseline & peaks
Continuous representation of discretised spectra
Data tempering using SMC
Bayesian model choice for long-range dependence
Ongoing work:
Peak detection (estimation of peak locations $\ell_p$ & number of peaks $P$)
Informed source separation for multiplex spectra
Spatio-temporal correlation between spectra
37. Appendix
For Further Reading I
Moores, Gracie, Carson, Faulds, Graham & Girolami
Bayesian modelling and quantification of Raman spectroscopy.
in prep.
Gracie, Moores, Smith, Harding, Girolami, Graham, & Faulds
Preferential attachment of specific fluorescent dyes and dye
labelled DNA sequences in a SERS multiplex.
Anal. Chem., 88(2): 1147–1153, 2016.
Zhong, Girolami, Faulds & Graham
Bayesian methods to detect dye-labelled DNA oligonucleotides in
multiplexed Raman spectra.
J. R. Stat. Soc. Ser. C, 60(2): 187–206, 2011.
Zhou, Johansen & Aston
Towards Automatic Model Comparison: An Adaptive Sequential
Monte Carlo Approach.
arXiv:1303.3123 [stat.ME], 2015.
For Further Reading II
Chopin
A Sequential Particle Filter Method for Static Models.
Biometrika, 89(3): 539–551, 2002.
Pitt, Silva, Giordani & Kohn
On some properties of Markov chain Monte Carlo simulation
methods based on the particle filter.
J. Econometrics 171(2): 134–151, 2012.
Jones, McLaughlin, Littlejohn, Sadler, Graham & Smith
Quantitative Assessment of Surface-Enhanced Resonance
Raman Scattering for the Analysis of Dyes on Colloidal Silver.
Anal. Chem., 71(3): 596–601, 1999.
Ramsay & Silverman
Functional Data Analysis, 2nd ed.
Springer, 2005.
SERRS
Surface-enhanced: proximity to a nanoplasmonic substrate
(silver/gold colloid)
Resonance: tune excitation wavelength to an electronic
transition of the molecule
Illustration courtesy Jake Carson (U. Warwick)