14 S. Baraldo et al.
properties of the tissue, and it is usually estimated from the exponential decay of the signal with respect to the b-value, an MR acquisition parameter. The assumption of isotropy is common and reasonable in various cases, such as breast and prostate cancer (see, for example, [7, 10]). In many practical situations it may not be possible to collect more than a few measurements at different b-values, which limits the accuracy of the estimation. Reducing the total number of measurements needed to achieve a given accuracy lowers costs and shortens the time the patient must spend in the MR procedure (the experience may be unpleasant, especially when a total-body MR must be performed). The purpose of this work is to compare different frequentist and Bayesian approaches to the estimation of the ADC, highlighting their statistical properties and computational issues.

2 Rice-Distributed Diffusion MR Signals

2.1 The Rice Distribution

The random variables we deal with derive from the complex signal $w = w_r + i w_i$ measured in diffusion MR. It is usual to assume that both $w_r$ and $w_i$ are affected by Gaussian noise with equal, constant variance, i.e. $w_r \sim N(\nu\cos\vartheta, \sigma^2)$ and $w_i \sim N(\nu\sin\vartheta, \sigma^2)$, with $\nu \in \mathbb{R}^+$ and $\vartheta \in [0, 2\pi)$. The quantity at hand is the modulus $M$ of this signal, which then has a Rice (or Rician) distribution, denoted $M \sim \mathrm{Rice}(\nu, \sigma^2)$. Its density has the form

$$ f_M(m \mid \nu, \sigma) = \frac{m}{\sigma^2} \exp\left(-\frac{m^2 + \nu^2}{2\sigma^2}\right) I_0\left(\frac{m\nu}{\sigma^2}\right) \mathbb{1}_{(0,+\infty)}(m), \tag{1} $$

where $I_0$ is the zeroth-order modified Bessel function of the first kind (see ). Using the series expansion of $I_0$, one can deduce a different, equivalent definition of a Rician random variable as $M = \sigma\sqrt{R}$, where $R$ is a noncentral $\chi^2_2$ variable that can be expressed as a mixture of $\chi^2(2P + 2)$ distributions with $P \sim \mathrm{Poisson}(\nu^2/2\sigma^2)$.
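The two characterizations of the Rice distribution, the modulus of a complex Gaussian signal and $\sigma\sqrt{R}$ with $R$ a Poisson mixture of $\chi^2(2P+2)$ variables, can be checked against each other and against density (1) by simulation. The following Python sketch is a minimal illustration, not the authors' code: it uses only the standard library, evaluates $I_0$ by its power series (an assumption that is adequate for the moderate arguments used here but not for very large $m\nu/\sigma^2$), and draws Poisson variates with Knuth's multiplication method. All function names are hypothetical.

```python
import math
import random


def bessel_i0(x):
    """I0(x) from its power series sum_k ((x/2)^k / k!)^2; fine for moderate x."""
    q = (x / 2.0) ** 2
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return total


def rice_pdf(m, nu, sigma):
    """Rice density (1); zero outside (0, +inf)."""
    if m <= 0.0:
        return 0.0
    s2 = sigma * sigma
    return m / s2 * math.exp(-(m * m + nu * nu) / (2.0 * s2)) * bessel_i0(m * nu / s2)


def rice_modulus(nu, sigma, rng):
    """Modulus of w = w_r + i w_i with independent Gaussian channels."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    w_r = rng.gauss(nu * math.cos(theta), sigma)
    w_i = rng.gauss(nu * math.sin(theta), sigma)
    return math.hypot(w_r, w_i)


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for small means."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def rice_mixture(nu, sigma, rng):
    """M = sigma * sqrt(R), R ~ chi^2(2P + 2), P ~ Poisson(nu^2 / (2 sigma^2))."""
    p = poisson(nu * nu / (2.0 * sigma * sigma), rng)
    r = rng.gammavariate(p + 1.0, 2.0)  # chi^2 with 2p + 2 d.o.f. is Gamma(p + 1, scale 2)
    return sigma * math.sqrt(r)


nu, sigma, n_draws = 2.0, 1.0, 20000
rng = random.Random(12345)

# Riemann sum of the density on (0, 12]; the tail beyond 12 is negligible here.
h = 0.001
pdf_integral = h * sum(rice_pdf(i * h, nu, sigma) for i in range(1, 12001))

# Both constructions should give E[M^2] = nu^2 + 2 sigma^2 = 6.
mean_sq_modulus = sum(rice_modulus(nu, sigma, rng) ** 2 for _ in range(n_draws)) / n_draws
mean_sq_mixture = sum(rice_mixture(nu, sigma, rng) ** 2 for _ in range(n_draws)) / n_draws
print(pdf_integral, mean_sq_modulus, mean_sq_mixture)
```

Agreement of the two sample moments with $\nu^2 + 2\sigma^2$ is only a sanity check of the equivalence, not a proof.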
The Poisson mixture formulation is particularly useful for sampling from a Rice distribution, as it allows an easy implementation of a Gibbs sampler.

2.2 Rice Exponential Regression

Diffusion MR aims at computing the diffusion tensor field on a portion of tissue, and this is achieved by analyzing the influence of water diffusion on the measured signal under different experimental settings. In particular, the classical model relating the magnitude signal to the acquisition parameters and the three-dimensional diffusion tensor $D$ is the Stejskal–Tanner equation
Estimation Approaches for ADC in MR Signals 15

$$ \nu_{\mathbf g} = \nu_0 \exp(-\mathbf{g}^T D \mathbf{g}\, b), \tag{2} $$

where $\nu_{\mathbf g}$ is the "real" intensity signal we want to measure, $\nu_0$ is the signal at $b = 0$ and the vector $\mathbf g \in \mathbb{R}^3$ is the applied magnetic gradient. The b-value is a function of other acquisition settings, which we omit since their description and discussion are beyond the scope of this article. See, for example,  for an overview of MR techniques, including diffusion MR, and a discussion of various issues and recent advances in this field.

In general, even in the ideal noiseless case, at least six observations are needed to determine the components of the symmetric, positive definite diffusion tensor $D$, obtained by varying the direction $\mathbf g$ of the magnetic field gradient. However, if the tissue under study can be considered isotropic, the diffusion tensor has the simpler form $D = \alpha I$, where $\alpha$ is the ADC, a scalar parameter, and $I$ is the identity matrix. This reduces model (2) to

$$ \nu = \nu_0 \exp(-\alpha b) \tag{3} $$

for any unit vector $\mathbf g$ (in the following, we omit it for ease of notation).

Equation (3) describes the phenomenon pointwise on the tissue region of interest. In this study we consider the pixels of a diffusion MR sequence of images as independent and focus on the estimation problem for a single point in space. We do not consider a spatial model for the ADC field: although it could be a useful way to filter noise and to capture underlying tissue structures, for diagnostic purposes it may be preferable to present the physician with an estimate that has not been artificially smoothed.

3 Estimation Methods

In this section we present different methods for the estimation of $\alpha$, the unknown parameter of interest. We consider a sample of signal intensities on a single pixel, $M_i \sim \mathrm{Rice}(\nu_0 e^{-\alpha b_i}, \sigma^2)$, $i = 1, \dots, n$, and their respective realizations $\mathbf m = (m_1, \dots, m_n)$ at b-values $\mathbf b = (b_1, \dots, b_n)$.
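The single-pixel sampling model just stated can be simulated directly from model (3). In the minimal Python sketch below (illustrative only; all function names are hypothetical), $b$ is expressed in units of 1,000 s/mm², as in Fig. 1, and the phase $\vartheta$ is set to 0 without loss of generality, since the distribution of the modulus does not depend on it. The helper `noise_crossing` returns the b-value at which the noiseless signal $\nu_0 e^{-\alpha b}$ falls to the noise level $\sigma$.

```python
import math
import random


def stejskal_tanner(nu0, alpha, b):
    """Noiseless isotropic model (3)."""
    return nu0 * math.exp(-alpha * b)


def b_grid(n, b_max=1.0):
    """n equally spaced b-values on [0, b_max] (units of 1,000 s/mm^2)."""
    return [b_max * i / (n - 1) for i in range(n)]


def simulate_pixel(nu0, alpha, b_values, sigma, rng):
    """One pixel: M_i ~ Rice(nu0 exp(-alpha b_i), sigma^2), i = 1..n."""
    sample = []
    for b in b_values:
        nu = stejskal_tanner(nu0, alpha, b)
        w_r = rng.gauss(nu, sigma)   # real channel carries the signal (phase 0)
        w_i = rng.gauss(0.0, sigma)  # imaginary channel: pure noise
        sample.append(math.hypot(w_r, w_i))
    return sample


def noise_crossing(nu0, alpha, sigma):
    """b at which nu0 exp(-alpha b) = sigma (non-positive if nu0 <= sigma)."""
    return math.log(nu0 / sigma) / alpha


b = b_grid(10)
m = simulate_pixel(nu0=2.0, alpha=3.0, b_values=b, sigma=1.0, rng=random.Random(0))
print([round(x, 3) for x in m])
print(noise_crossing(2.0, 3.0, 1.0))  # about 0.23, i.e. roughly 230 s/mm^2
```

For $\nu_0 = 2$, $\alpha = 3$ most of the grid lies beyond the crossing point, so most observations are dominated by noise; for $\nu_0 = 8$, $\alpha = 0.7$ the crossing falls outside $[0, 1]$.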
The dispersion parameter $\sigma^2$ is usually measured over regions where almost pure noise is observed, and then used as a known parameter in the subsequent estimates. This estimate of $\sigma^2$ is considered reliable, since it can be based on a very large number of pixels, so we consider the case of a known dispersion parameter.

We consider nonlinear least squares, maximum likelihood and three Bayesian point estimators. For a simple $\mathrm{Rice}(\nu, \sigma^2)$ random variable, an iterative method-of-moments estimator has been proposed in , but this technique has no straightforward extension to the case of a covariate-dependent $\nu$, and the moment equations would be difficult to invert in the case considered here. Moreover, under the model assumptions presented in Sect. 2, a decoupling of noise and signal in
the fashion of signal detection theory (SDT) could not be pursued (see  for a brief presentation and a Bayesian implementation of the classical SDT scheme).

The different estimation methods for the pair $(\nu_0, \alpha)$ presented here will be tested under different signal-to-noise ratios (SNRs) $\nu/\sigma$ in Sect. 4.

3.1 Nonlinear Regression

A standard approach for the estimation of $\nu_0$ and $\alpha$ is to solve a nonlinear least squares problem, which is equivalent to approximating $M_i = \nu_0 \exp(-\alpha b_i) + \varepsilon_i$ for $i = 1, \dots, n$, where the $\varepsilon_i$ are iid, zero-mean Gaussian noise terms. The estimators $\hat\nu_0^{LS}$ and $\hat\alpha^{LS}$ are defined as

$$ (\hat\nu_0^{LS}, \hat\alpha^{LS}) = \operatorname*{argmin}_{(\nu_0, \alpha)} \sum_{i=1}^n \left(m_i - \nu_0 e^{-\alpha b_i}\right)^2, $$

for $\nu_0, \alpha > 0$, which is equivalent to the solution of the following equations:

$$ \begin{cases} \nu_0 \sum_{i=1}^n e^{-2\alpha b_i} = \sum_{i=1}^n m_i e^{-\alpha b_i}, \\ \nu_0 \sum_{i=1}^n b_i e^{-2\alpha b_i} = \sum_{i=1}^n m_i b_i e^{-\alpha b_i}. \end{cases} \tag{4} $$

The nonlinear regression approximation is inconsistent with the phenomenon under study, most evidently because the noise term is then symmetric and can take any real value. This inconsistency is negligible for high SNR values, since in this case a $\mathrm{Rice}(\nu, \sigma^2)$ distribution approaches a $N(\nu, \sigma^2)$, but it becomes important with "intermediate" and low SNRs. In , the behavior of the Rice distribution with $\sigma = 1$ fixed and varying $\nu$ is examined, observing that normality can be considered a good approximation for $\nu/\sigma > 2.64$ approximately, but the sample variance approaches $\sigma^2$ only for SNR values greater than 5.19. Even for pixels with high SNRs at $b = 0$, for large b-values the real signal could reach the same order of magnitude as the noise, depending on the unknown value of $\alpha$, and this could lead to strongly biased estimates.
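The source of this bias can be made concrete through the mean of a Rice variable. A standard result, not stated in the text above, is $E[M] = \sigma\sqrt{\pi/2}\, L_{1/2}(-\nu^2/2\sigma^2)$, with $L_{1/2}$ a Laguerre function; equivalently $E[M] = \sigma\sqrt{\pi/2}\, e^{-z}\,[(1 + 2z) I_0(z) + 2z I_1(z)]$ with $z = \nu^2/(4\sigma^2)$. Least squares fits $\nu_0 e^{-\alpha b}$ to magnitudes whose mean never falls below $\sigma\sqrt{\pi/2} \approx 1.25\sigma$, however small the true signal becomes. The Python sketch below (standard library only, series Bessel functions adequate for moderate arguments, hypothetical names) evaluates this mean.

```python
import math


def bessel_i0_i1(x):
    """I0(x) and I1(x) from their power series; fine for moderate x >= 0."""
    q = (x / 2.0) ** 2
    t0, t1 = 1.0, x / 2.0  # k = 0 terms of the I0 and I1 series
    s0, s1, k = t0, t1, 0
    while t0 > 1e-17 * s0:
        k += 1
        t0 *= q / (k * k)
        t1 *= q / (k * (k + 1))
        s0 += t0
        s1 += t1
    return s0, s1


def rice_mean(nu, sigma):
    """E[M] for M ~ Rice(nu, sigma^2)."""
    z = nu * nu / (4.0 * sigma * sigma)
    i0, i1 = bessel_i0_i1(z)
    return sigma * math.sqrt(math.pi / 2.0) * math.exp(-z) * ((1.0 + 2.0 * z) * i0 + 2.0 * z * i1)


# Noiseless signal vs. mean magnitude along the curve nu0 = 2, alpha = 3 (sigma = 1).
for b in (0.0, 0.25, 0.5, 1.0):
    nu = 2.0 * math.exp(-3.0 * b)
    print(f"b={b:.2f}  nu={nu:.3f}  E[M]={rice_mean(nu, 1.0):.3f}")
```

At high SNR, $E[M] \approx \nu + \sigma^2/(2\nu)$, which is why the normal approximation becomes acceptable there.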
However, the least squares approach is computationally simpler and quicker to carry out, since the first equation in (4) expresses $\nu_0$ as a function of $\alpha$, so that computing the estimates requires just a one-dimensional optimization.

3.2 Maximum Likelihood

The maximum likelihood approach takes into account the asymmetry of the signal distribution and always provides admissible values of the parameters. The objective function is the log-likelihood
$$ l(\nu_0, \alpha \mid \mathbf m, \mathbf b, \sigma^2) = \log L(\nu_0, \alpha \mid \mathbf m, \mathbf b, \sigma^2) = \sum_{i=1}^n \log f_{M_i}(m_i \mid \nu_0 e^{-\alpha b_i}, \sigma) = -\frac{1}{2\sigma^2} \sum_{i=1}^n \nu_0^2 e^{-2\alpha b_i} + \sum_{i=1}^n \log I_0\left(\frac{m_i \nu_0 e^{-\alpha b_i}}{\sigma^2}\right) + \text{const}, $$

where $f_{M_i}$ is the Rice density (1), for $i = 1, \dots, n$, and the constant does not depend on $(\nu_0, \alpha)$. The ML estimator is then

$$ (\hat\nu_0^{ML}, \hat\alpha^{ML}) = \operatorname*{argmax}_{(\nu_0, \alpha)} l(\nu_0, \alpha \mid \mathbf m, \mathbf b, \sigma^2), $$

for $\nu_0, \alpha > 0$. Looking for stationary points of $l$ and using the fact that $I_0'(x) = I_1(x)$, we obtain the following estimating equations:

$$ \begin{cases} \nu_0 \sum_{i=1}^n e^{-2\alpha b_i} = \sum_{i=1}^n \dfrac{I_1(m_i \nu_0 e^{-\alpha b_i}/\sigma^2)}{I_0(m_i \nu_0 e^{-\alpha b_i}/\sigma^2)}\, m_i e^{-\alpha b_i}, \\[2ex] \nu_0 \sum_{i=1}^n b_i e^{-2\alpha b_i} = \sum_{i=1}^n \dfrac{I_1(m_i \nu_0 e^{-\alpha b_i}/\sigma^2)}{I_0(m_i \nu_0 e^{-\alpha b_i}/\sigma^2)}\, m_i b_i e^{-\alpha b_i}. \end{cases} $$

Notice that these score equations differ from (4) only by the Bessel function ratios $I_1(m_i \nu_0 e^{-\alpha b_i}/\sigma^2)/I_0(m_i \nu_0 e^{-\alpha b_i}/\sigma^2)$, which multiply the observations $m_i$. In particular, this factor shrinks the observations, since $0 < I_1(x)/I_0(x) < 1$ for $x > 0$, and it increases asymptotically to 1 for large SNRs, so that the score equations tend to (4). As shown in , the maximum likelihood estimator for $\nu$ obtained from an iid sample $M_1, \dots, M_n \sim \mathrm{Rice}(\nu, \sigma^2)$ with known $\sigma^2$ becomes exactly 0 when the moment estimator for $E[M^2] = \nu^2 + 2\sigma^2$ becomes inadmissible, i.e. when $\sum_{i=1}^n M_i^2/n - 2\sigma^2 \le 0$, even if the true value of $\nu$ is larger than 0. The case of Rice exponential regression suffers from a similar problem in a nontrivial way, and would require $\sigma^2$ to be estimated together with the other parameters to keep the parameter values coherent with the model. We do not address this problem here, but efforts in this direction are currently in progress.

3.3 Bayesian Approaches

We also consider three different estimators based on the Bayesian posterior distribution: its mean, its median, and its mode. To allow an easy implementation in BUGS code, we introduce a slightly different formulation of the model. If $M \sim \mathrm{Rice}(\nu, \sigma^2)$, then $R = M^2/\sigma^2$ has a noncentral $\chi^2$ distribution with 2 degrees of freedom and noncentrality parameter $\lambda = \nu^2/(2\sigma^2)$. Now let $R_1, \dots, R_n$ be
the random sample considered, with $R_i = M_i^2/\sigma^2$ and $M_i \sim \mathrm{Rice}(\nu_0 e^{-\alpha b_i}, \sigma^2)$ for $i = 1, \dots, n$, and let $\mathbf r = (r_1, \dots, r_n)$ be the observations from this sample. Let $\pi(\nu_0)$ and $\pi(\alpha)$ be the prior distributions of the two unknown parameters, and denote by $f_{R_i}(r_i \mid \lambda_i)$ the noncentral $\chi^2$ density of each $R_i$, with noncentrality parameter $\lambda_i = \nu_0^2 e^{-2\alpha b_i}/(2\sigma^2)$. The joint posterior distribution of $\nu_0$ and $\alpha$ is then

$$ p(\nu_0, \alpha \mid \mathbf r, \mathbf b, \sigma^2) \propto \prod_{i=1}^n f_{R_i}(r_i \mid \lambda_i)\, \pi(\nu_0)\, \pi(\alpha). $$

As anticipated in Sect. 2, a noncentral $\chi^2$ variable with noncentrality $\lambda$ can be sampled as a mixture of $\chi^2(2P + 2)$ distributions with $P \sim \mathrm{Poisson}(\lambda)$. This allows an easy BUGS implementation of these estimators.

4 Simulation Study

We compared five estimators for $\alpha$, namely least squares (LS), maximum likelihood (ML), and the posterior mean (PMe), median (PMd) and mode (PMo), in terms of bias, variance and mean square error. For the two frequentist approaches, ranges for the possible parameter values have been chosen, namely $\nu_0 \in [0.1, 10]$ and $\alpha \in [0.1, 5]$, while the fixed parameter $\sigma^2$ has always been set equal to 1. For the Bayesian point estimators we chose uninformative uniform priors, with the same support as the ranges chosen for LS and ML. The first two estimators have been computed with R 2.12.2 (see ), using built-in optimization functions: optimize for the one-dimensional minimization required in LS, and optim with the L-BFGS-B method for the likelihood maximization, with starting values $(\nu_0^{start}, \alpha^{start}) = (1, 1)$. Bayesian posterior distributions have been computed using a Gibbs sampler implemented in JAGS (see ). In particular, the following model code (valid for any program supporting the BUGS language) was used:

    model {
      for (i in 1:n) {
        # noncentrality of R[i] = M[i]^2 / sigma^2
        lambda[i] <- (nu0 * nu0) * exp(-2 * alpha * b[i]) / (2 * sigma * sigma)
        p[i] ~ dpois(lambda[i])   # Poisson mixing variable
        k[i] <- 2 * p[i] + 2      # degrees of freedom of the mixture component
        M[i] ~ dchisqr(k[i])      # data node: holds r[i] = m[i]^2 / sigma^2
      }
      alpha ~ dunif(0.1, 5)
      nu0 ~ dunif(0.1, 10)
    }

As the model code shows, the uniform priors have supports equal to the search ranges for LS and ML. For each simulated sample, 10,000 Gibbs sampling iterations have been run with a thinning of 10, and standard diagnostics revealed good behavior of the generated chains.
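For readers without a BUGS engine, the posterior summaries can be approximated on a grid, which also gives a crude ML estimate. The Python sketch below is a stand-in, not the procedure used in the study: a regular grid replaces both the JAGS Gibbs sampler and the L-BFGS-B optimization, with uniform priors on $\nu_0 \in [0.1, 10]$ and $\alpha \in [0.1, 5]$ as in the model code. It uses the Rice likelihood of $\mathbf m$ directly, which gives the same posterior as the noncentral $\chi^2$ likelihood of $\mathbf r$ because the change of variable $r = m^2/\sigma^2$ has a Jacobian free of $(\nu_0, \alpha)$. Taking the three point estimators as the mean, median and mode of the marginal posterior of $\alpha$ is an assumption, as are the grid size, the asymptotic expansion of $\log I_0$ for large arguments, and all names.

```python
import math
import random


def log_i0(x):
    """log I0(x): power series for x < 50, asymptotic expansion above."""
    if x < 50.0:
        q, term, total, k = (x / 2.0) ** 2, 1.0, 1.0, 0
        while term > 1e-17 * total:
            k += 1
            term *= q / (k * k)
            total += term
        return math.log(total)
    return x - 0.5 * math.log(2.0 * math.pi * x) + math.log1p(1.0 / (8.0 * x) + 9.0 / (128.0 * x * x))


def log_lik(nu0, alpha, m, b, sigma):
    """Rice log-likelihood of (nu0, alpha), up to an additive constant."""
    s2 = sigma * sigma
    total = 0.0
    for mi, bi in zip(m, b):
        nu = nu0 * math.exp(-alpha * bi)
        total += -nu * nu / (2.0 * s2) + log_i0(mi * nu / s2)
    return total


def grid_estimates(m, b, sigma, n_grid=60, nu0_range=(0.1, 10.0), alpha_range=(0.1, 5.0)):
    """Grid ML estimate and marginal posterior summaries of alpha (uniform priors)."""
    def mids(lo, hi):
        step = (hi - lo) / n_grid
        return [lo + (j + 0.5) * step for j in range(n_grid)]

    nu0s, alphas = mids(*nu0_range), mids(*alpha_range)
    ll = [[log_lik(v, a, m, b, sigma) for v in nu0s] for a in alphas]  # ll[alpha][nu0]
    top = max(max(row) for row in ll)
    j_ml, i_ml = max(((j, i) for j in range(n_grid) for i in range(n_grid)),
                     key=lambda ji: ll[ji[0]][ji[1]])
    # Flat priors: posterior proportional to likelihood; sum out nu0 on the grid.
    marg = [sum(math.exp(v - top) for v in row) for row in ll]
    z = sum(marg)
    post = [w / z for w in marg]
    mean = sum(a * w for a, w in zip(alphas, post))
    cdf, median = 0.0, alphas[-1]
    for a, w in zip(alphas, post):
        cdf += w
        if cdf >= 0.5:
            median = a
            break
    mode = alphas[max(range(n_grid), key=lambda j: post[j])]
    return {"nu0_ml": nu0s[i_ml], "alpha_ml": alphas[j_ml], "alpha_mean": mean,
            "alpha_median": median, "alpha_mode": mode, "posterior": post}


rng = random.Random(7)
b = [i / 9 for i in range(10)]  # 10 b-values on [0, 1], units of 1,000 s/mm^2
m = [math.hypot(rng.gauss(4.0 * math.exp(-bi), 1.0), rng.gauss(0.0, 1.0)) for bi in b]
est = grid_estimates(m, b, sigma=1.0)
print({k: round(v, 3) for k, v in est.items() if k != "posterior"})
```

With flat priors on the box, the joint posterior mode coincides with the constrained ML estimate up to grid resolution; the marginal mode of $\alpha$ computed here generally differs from both.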
We chose b-values in a typical range for diffusion MR machine settings, i.e. from 0 to 1,000 s/mm², on equally spaced grids of n = 5, 10, 15, 20, 25, 30 points. Different simulations have been run with parameter values $\nu_0 = 2, 4, 8$, which represent a low, an intermediate and a high SNR, and $\alpha = 0.7, 1, 3$, typical low, intermediate, and high physiological values of the ADC.

It must be reported that, in cases of low SNR, the ML estimator reached the boundaries of the optimization region in various simulations. In the combination $n = 5$, $\nu_0 = 2$, $\alpha = 3$, only 45% of the simulations gave ML estimates that converged to a value inside the predefined parameter search ranges, while in the other cases this proportion was around 70% when $\alpha = 1$ and around 100% when $\alpha = 0.7$. These degenerate results have been removed for the computation of bias and variance.

Figure 1 displays the decaying exponential curves we aim to estimate in the nine different combinations of $\nu_0$ and $\alpha$, along with a horizontal line at level $\sigma$ representing the order of magnitude of the noise relative to the signal. As the simulations show, the quality of the estimates depends both on the SNR at $b = 0$ and on the ADC.

Figure 2 shows the behavior of the bias of the estimators for $\alpha$ with different sample sizes n. As for the frequentist estimators (LS and ML), there is no uniform ordering across the considered values of n when the signal decays slowly ($\alpha = 0.7$), but in the other cases, when noise is stronger along the curve, the maximum likelihood estimate is always less biased than the least squares one; notice also that the bias of the least squares estimates does not seem to decrease as n increases over the considered values.
Concerning the three Bayesian estimators, no striking differences arise among them, while with respect to the frequentist estimators they have comparable or higher bias in many cases, with the exception of the "worst case" $\nu_0 = 2$, $\alpha = 3$, where they are uniformly more accurate.

As for the variance, analyzed in Fig. 3, the LS estimator almost always shows the best performance, except for low sample sizes when $\alpha = 3$. The other estimators have similar performances and behaviors at different sample sizes n, with ML and PMe having strikingly higher variance in some noisy cases. As expected, the variance decreases notably for all estimators as n increases in most combinations of parameters, but with very low SNR ($\nu_0 = 2$) the only estimator whose variance shows empirical convergence to 0 is LS.

An overall index of estimator performance is the mean square error (MSE). Since the MSE is the sum of the squared bias and the variance, the relative orders of magnitude of these two quantities play an important role. As can be seen from Fig. 4, the LS estimator has the lowest MSE when $\alpha = 0.7, 1$, but exhibits the worst performance in the critical cases of high ADC, where the Bayesian estimators seem to work better.

Results for $\nu_0$ are not detailed here, but it is worth mentioning that, since the two parameters must be estimated jointly, the precision and accuracy of their estimators influence each other. In any case, the estimators for $\nu_0$ show a more classical behavior: the LS estimator is in all cases less accurate but more precise (high bias and low variance), and the consistency of all estimators is evident as n increases. The summary plots for the MSE of the estimators for $\nu_0$ are shown in Fig. 5.
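The bias, variance and MSE summaries discussed above can be reproduced in outline with a small Monte Carlo experiment. The Python sketch below (standard library only, hypothetical names) computes the LS estimator through the profile form of (4): for fixed $\alpha$ the first equation gives $\nu_0$ in closed form, clipped to $[0.1, 10]$ (the constrained minimizer, since the residual sum of squares is quadratic in $\nu_0$), and $\alpha$ is then found by golden-section search on $[0.1, 5]$. Golden-section search is a stand-in for the one-dimensional search done with R's optimize in the study and assumes the profile criterion is unimodal on that interval. The design mirrors Sect. 4 ($\sigma = 1$, equally spaced b-values on $[0, 1]$ in units of 1,000 s/mm²), but the printed numbers are illustrative and are not the results shown in the figures.

```python
import math
import random

LO_NU0, HI_NU0, LO_ALPHA, HI_ALPHA = 0.1, 10.0, 0.1, 5.0


def profile_nu0(alpha, m, b):
    """First equation of (4): nu0 as a function of alpha, clipped to the search range."""
    num = sum(mi * math.exp(-alpha * bi) for mi, bi in zip(m, b))
    den = sum(math.exp(-2.0 * alpha * bi) for bi in b)
    return min(max(num / den, LO_NU0), HI_NU0)


def rss(alpha, m, b):
    """Profile residual sum of squares."""
    nu0 = profile_nu0(alpha, m, b)
    return sum((mi - nu0 * math.exp(-alpha * bi)) ** 2 for mi, bi in zip(m, b))


def golden_min(f, lo, hi, tol=1e-7):
    """Golden-section search for the minimizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = f(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)


def ls_fit(m, b):
    alpha = golden_min(lambda a: rss(a, m, b), LO_ALPHA, HI_ALPHA)
    return profile_nu0(alpha, m, b), alpha


def monte_carlo(nu0, alpha, n, reps, sigma=1.0, seed=0):
    """Empirical bias, variance and MSE of the LS estimator of alpha."""
    rng = random.Random(seed)
    b = [i / (n - 1) for i in range(n)]
    est = []
    for _ in range(reps):
        m = [math.hypot(rng.gauss(nu0 * math.exp(-alpha * bi), sigma), rng.gauss(0.0, sigma))
             for bi in b]
        est.append(ls_fit(m, b)[1])
    mean = sum(est) / reps
    var = sum((e - mean) ** 2 for e in est) / reps
    mse = sum((e - alpha) ** 2 for e in est) / reps
    return mean - alpha, var, mse


for nu0, alpha in ((8.0, 0.7), (2.0, 3.0)):
    bias, var, mse = monte_carlo(nu0, alpha, n=10, reps=200)
    print(f"nu0={nu0}, alpha={alpha}: bias={bias:.3f}, var={var:.4f}, MSE={mse:.4f}")
```

Because the variance is computed with divisor `reps`, the returned values satisfy MSE = bias² + variance exactly, up to rounding.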
Fig. 1 Stejskal–Tanner model in simulation parameter combinations. b-values are expressed in 1,000 s/mm² [figure: 3 × 3 panels, nu0 = 2, 4, 8 (rows) by alpha = 0.7, 1, 3 (columns); horizontal axis b]
Fig. 2 Bias of estimators for α. Bold lines: solid = LS, dashed = ML; slim lines: solid = PMe, dashed = PMd, dotted = PMo [figure: 3 × 3 panels, nu0 = 2, 4, 8 (rows) by alpha = 0.7, 1, 3 (columns); horizontal axis n, vertical axis log(bias²)]
Fig. 4 MSE of estimators for α. Bold lines: solid = LS, dashed = ML; slim lines: solid = PMe, dashed = PMd, dotted = PMo [figure: 3 × 3 panels, nu0 = 2, 4, 8 (rows) by alpha = 0.7, 1, 3 (columns); horizontal axis n, vertical axis log(MSE)]
Fig. 5 MSE of estimators for ν0. Bold lines: solid = LS, dashed = ML; slim lines: solid = PMe, dashed = PMd, dotted = PMo [figure: 3 × 3 panels, nu0 = 2, 4, 8 (rows) by alpha = 0.7, 1, 3 (columns); horizontal axis n, vertical axis log(MSE)]
5 Conclusions

In this work, we proposed different methods for estimating the ADC pixelwise from diffusion MR signals, following the Rice noise model and the Stejskal–Tanner equation for magnitude decay. The presented estimators exhibit different features that should be taken into account when approaching real data. The least squares approach is the fastest and has low variance, but becomes less accurate as the conditional signal distribution at different b-values departs further from normality. The maximum likelihood estimator is slightly slower, requiring a nonlinear maximization over two variables, and has the lowest bias in many cases but, as pointed out before, it may diverge with samples from noisy signals. Bayesian estimators are the most expensive in terms of computational cost and may require further tuning to improve their performance; they are the best in terms of mean square error at the high ADC tested here and offer the advantage of providing the whole posterior distribution for inferential purposes, whereas inferential tools for LS and ML must currently rely on normal approximations, which may not be reliable with small sample sizes and low SNRs. Future studies will focus on inferential aspects and on efficient extensions of these estimation methods to full MR images. The simultaneous estimation of the dispersion parameter $\sigma^2$ will also be developed and tested, at the cost of some additional computational effort in the estimation algorithms.

References

1. Abramowitz, M., Stegun, I.A. (eds.): Handbook of Mathematical Functions. Dover, New York (1964)
2. Koay, C.G., Basser, P.J.: Analytically exact correction scheme for signal extraction from noisy magnitude MR signals. J. Magn. Reson. 179, 317–322 (2006)
3. Landini, L., Positano, V., Santarelli, M.F. (eds.): Advanced Image Processing in Magnetic Resonance Imaging. CRC, West Palm Beach (2005)
4. Lee, M.: BayesSDT: Software for Bayesian inference with signal detection theory. Behav. Res. Methods 40(3), 450–456 (2008)
5. Plummer, M.: JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling. In: Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003). Vienna, Austria (2003)
6. R Development Core Team: R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2009). URL http://www.r-project.org. Available at: http://cran.r-project.org/, accessed December 2012.
7. Sato, C., Naganawa, S., Nakamura, T., Kumada, H., Miura, S., Takizawa, O., Ishigaki, T.: Differentiation of noncancerous tissue and cancer lesions by apparent diffusion coefficient values in transition and peripheral zones of the prostate. J. Magn. Reson. Imag. 21(3), 258–262 (2005)
8. Sijbers, J., den Dekker, A.J., Scheunders, P., Van Dyck, D.: Maximum likelihood estimation of Rician distribution parameters. IEEE Trans. Med. Imag. 17, 357–361 (1998)
9. Walker-Samuel, S., Orton, M., McPhail, L.D., Robinson, S.P.: Robust estimation of the apparent diffusion coefficient (ADC) in heterogeneous solid tumors. Magn. Reson. Med. 62(2), 420–429 (2009)
10. Woodhams, R., Matsunaga, K., Iwabuchi, K., Kan, S., Hata, H., Kuranami, M., Watanabe, M., Hayakawa, K.: Diffusion-weighted imaging of malignant breast tumors: the usefulness of apparent diffusion coefficient (ADC) value and ADC map for the detection of malignant breast tumors and evaluation of cancer extension. J. Comput. Assist. Tomo. 29(5), 644–649 (2005)