1. The document discusses the challenges in computing the CMBR likelihood function exactly due to the large size of the covariance matrix. Approximations like the Gaussian likelihood are generally used.
2. It describes how the likelihood function depends on the angular power spectrum estimators Cl, which provide a sufficient statistic to compress the full CMB data. However, the distribution of Cl is non-Gaussian.
3. It outlines the different methods used to compute the likelihood function for different multipole ranges in the WMAP 7-year data release, including direct computation for low-l and quadratic estimators for high-l.
Anomaly detection and data imputation within time series
likelihood_p1.pdf
1. Cosmological Microwave Background Radiation
Likelihood Analysis - Part 1
Jayanti Prasad
Inter-University Centre for Astronomy & Astrophysics (IUCAA)
Pune, India (411007)
September 12, 2011
2. Plan of the talk
I CMBR Likelihood
I Gaussian Likelihood
I Optimal Quadratic Estimator
I WMAP Likelihood
3. CMBR Likelihood
I CMBR temperature and polarization observations can
constrain cosmological parameters if the likelihood function
can be computed exactly.
I Computing the likelihood function exactly in a brute force way
is computationally challenging since it involves inversion of
the covariance matrix i.e., O(N3) computation.
L =
1
(2π)N/2
p
|C|
exp
−dt
C−1
d
, (1)
where d is the data vector and C is the covariance matrix.
I Temperature and polarizations fields are correlated and partial
sky coverage correlates power at different ls.
I Since the exact computation of likelihood function is
challenging, approximations are generally made (Gaussian
Likelihood, Gibbs sampling etc).
4. Gaussian Likelihood
I The likelihood function for the temperature fluctuations
observed by a noiseless experiment with full-sky coverage has
the form:
L(T|Cl ) ∝
1
p
|S|
exp[−(TS
−1
T)/2] (2)
or
L(T|Cl ) =
Y
lm
1
p
Cl
exp[−|alm|
2
/(2Cl )] (3)
where
T(n̂) =
X
lm
almYlm; and Sij =
X
l
((2l + 1)/(4π))Cl Pl (n̂i .n̂j ) (4)
I Since observe only one sky, we cannot measure the power
spectra directly, but instead form the rotationally invariant
estimators, CXY
l , for full-sky CMB maps given by
Ĉ
XY
l = (1/(2l + 1))
m=l
X
m=−l
a
X
lma
Y
lm (5)
where ĈXY
l = CXY
l (true power spectrum). Note that
the likelihood has a maximum when Cl = Ĉl so Ĉl is the MLE.
[Hamimeche Lewis (2008)]
5. I Since alm are assumed to be Gaussian and statistically
isotropic, they have independent distributions (for |m| ≥ 0).
The probability of a set of alm at a given l is given by:
−2lnP(alm|Cl ) =
m=l
X
m=−l
[a
0
lmC−1
l alm + ln|2πCl |]
= (2l + 1)[Tr(Ĉl C−1
l ) + ln|Cl |] + const. (6)
I The fact that this likelihood for Cl depends only on the ĈXY
l
shows that on the full sky the CMB data can losslessly be
compressed to a set of power spectrum estimators, that
contain all the relevant information about the posterior
distribution i.e., Ĉl is a sufficient statistic for the likelihood.
6. I If the likelihood of the theory power spectrum Cl as a function
of the measured estimators Ĉl were Gaussian, the likelihood
could be calculated straightforwardly from the measured Ĉl .
I However the distribution is non-Gaussian because for a given
temperature power spectrum Cl , the Ĉl is a sum of squares of
Gaussian harmonic coefficients, have a (reduced) χ2
distribution. This makes it difficult to use “χ2 per degree of
freedom” as “goodness of fit”.
7. Maximum Likelihood Estimation
L =
1
(2π)N/2
p
|C|
exp
−(x − µ)C−1
(x − µ)t
(7)
or
2 log L = 2L = ln|C|+(x −µ)C−1
(x −µ)t
= Tr[lnC +C−1
D] (8)
where the covariance matrix C and data matrices D are given by
C = (x − µ)(x − µ)t
and D = (x − µ)(x − µ)t
(9)
differentiating equation (8) with respect to to θi
2L,i = Tr
C−1
C,i − C−1
C,i C−1
D + C−1
D,i
(10)
note that
, =
∂
∂θi
;
∂
∂θi
(C
−1
) = −C
−1 ∂C
∂θi
C
−1
and
∂
∂θi
(lnC) = C
−1 ∂C
∂θi
(11)
[Bond et al. (1998); Tegmark et al. (1997)]
8. Fisher Information Matrix
I
Fij = hL,ij i =
1
2
Tr Ai Aj + C−1
Mij
, (12)
where
Ai = (lnC),i and Mij = D,ij (13)
for isotropic CMBR fluctuations µ = 0 and
Cij = δij
Cl +
4πσ2
N
eθ2
bl(l+1)
(14)
where σ is the RMS pixel noise and θb = 0.425FWHM.
I From the above equations
Fij =
l=lmax
X
l=2
2l + 1
2
Cl +
4πσ2
N
eθ2
bl(l+1)
−2
∂Cl
∂θi
∂Cl
∂θj
(15)
9. Quadratic Estimator
I The best fit model i.e., parameter set θ̄, corresponds to a
point in the parameter space at which
∂L
∂θ
|θ=θ̄ = 0 (16)
We can use a root finding method to obtain θ̄ in the
following way:
I We begin with some best guess value θ(0) and approximate
the derived to the first order
∂L(θ̄)
∂θi
=
∂L(θ(0)
)
∂θi
+
θ̄j − θ
(0)
j
∂2
L(θ(0)
)
∂θi ∂θj
(17)
setting the LHS zero
θ̄j − θ
(0)
j
≈ −
∂2
L(θ(0)
)
∂θi ∂θj
#−1
∂L(θ(0)
)
∂θi
δθ (18)
for the next step we can write
θ
(1)
= θ
(0)
+ δθ (19)
and keep iterating till we get convergence.
[Bond et al. (1998); Durrer (2008)]
10. I
∂L
∂θi
=
1
2
dt
C−1
C,i C−1
d − Trace C−1
C,i
(20)
and
Fij =
∂2L
∂θi ∂θj
=
1
2
Trace
C−1
C,i C−1
C,i
(21)
I
θ(1)
= θ(0)
+
1
2
Fij
dt
C−1
C,i C−1
d − Trace C−1
C,i
(22)
I The advantage of this method is that for a given starting
parameter the Fisher matrix is readily calculated from the
covariance matrix alone, without having to know the data.
I The above estimated parameters θ(1) depend quadratically on
the data vector d so they are called a quadratic estimators.
I This method is faster than the grid based methods, however,
there are problems. This method is Levenberg-Marquardt
method as is implemented in GSL.
11.
12. Error Estimate
I A good first estimate for 1σ error bars are the diagonal elements of
the Fisher matrix at the maximum, i.e., the best fit point.
hδθi δθj i = F−1
ij (23)
I These are the true 1σ errors only if the distribution is Gaussian in
the parameters θ which usually it is not.
I The Fisher matrix defines an ellipse around θ̄ in the parameter space
via the equation δθt
F(θ̄)δθ = 1
I The principal directions of this error ellipse are parallel to the
eigenvectors of F and their half length is given by the square root of
the eigenvalues of F
I The total width of the error ellipse in a given direction j at θ̄ is
2/
√
Fjj and 1/
√
Fjj is the error of the parameter j if all other
parameters are known and are equal to θ̄.
I In practice we do not know the other parameters any better than j
so the true error in j is given by the size of the projection of the
error matrix onto the j axis.
13.
14. WMAP Likelihood
I At large l likelihood is approximated as Gaussian
2lnLGauss ∝
X
ll0
(Cl − Ĉl )Qll0 (Cl − Ĉl ) (24)
where Qll0 , the curvature matrix, is the inverse of the power
spectrum covariance matrix.
I The power spectrum covariance encodes the uncertainties in
the power spectrum due to cosmic variance, detector noise,
point sources, the sky cut, and systematic errors.
I Since the likelihood function for the power spectrum is slightly
non-Gaussian, equation (24) is a systematically biased
estimator so a log normal function is suggested
2lnLLN ∝
X
ll0
(zl − ẑl )Dll0 (zl − ẑl ) (25)
where zl = (Cl + Nl ) and Dll0 = (Cl + Nl )Qll0 (Cl + Nl )
I It has been argued that the following estimator is more
accurate then the the Gaussian or log normal only [Verde et al.
(2003)]
L = (1/3)LGauss + (2/3)LLN (26)
15. WMAP seven years Likelihood
I For low-l TT ( l ≤ 32) the likelihood of a model is computed
directly from the 7-year Internal Linear Combination (ILC)
maps.
I For high-l (l 32) TT MASTER pseudo-Cl quadratic
estimator is used.
I For low-l polarization, (l 23) TE, EE, BB, pixel-space
estimator is used.
I For high-l TE (l 23) MASTER quadratic estimator is being
used.
I Given a best-fit model one asks how well the model fits the data.
Given that the likelihood function is non-Gaussian, answering the
question is not as straightforward as testing the χ2
per dof of the
best-fit model. Instead one resorts to Monte Carlo simulations and
compare the absolute likelihood obtained from fitting the flight data
to an ensemble of simulated values.
[Larson et al. (2011)]
16. Summary Discussion
I I have discussed the difficulties and approximations used to
compute the CMBR likelihood function.
I My aim was to understand why and how the likelihood
function is computed differently for different values of l and
different type of power spectra.
I I have also discussed various estimators related to angular
power spectrum and the likelihood function.
I I was not able to discuss the actual computation of the
various components of the likelihood function in the WMAP
likelihood code.
I It has been suggested (by Dodelson) that the Nelder Mead
method or downhill simplex method (commonly known
amoeba) can be be used for finding the best fit parameters.
I Another method called NEWUOA which is used for
unconstrained optimization without derivatives, is also
suggested (by Antony Lewis ) to find the best fit parameters,
optimization of the likelihood function.
17. References
Bond, J. R., Jaffe, A. H., Knox, L. 1998, Phys. Rev. D , 57, 2117
Dodelson, S. 2003, Modern cosmology (San Diego, U.S.A.: Academic
Press)
Durrer, R. 2008, The Cosmic Microwave Background (Cambridge
University Press)
Hamimeche, S., Lewis, A. 2008, Phys. Rev. D , 77, 103013
Larson, D., et al. 2011, Astrophys. J. Suppl. Ser. , 192, 16
Tegmark, M., Taylor, A. N., Heavens, A. F. 1997, Astrophys. J. , 480,
22
Verde, L., et al. 2003, Astrophys. J. Suppl. Ser. , 148, 195