IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 1, JANUARY 2012

familiar with basic CS terminology and only review the most relevant CS concepts.

We address the particular issue with noisy CS recovery. Typically, a noise-aware CS recovery is formulated to minimize the $\ell_1$ norm of the recovered signal under the constraint that the $\ell_2$ norm of the residuals is bounded by a factor proportional to the noise level. While the use of the $\ell_1$ norm on the recovered signal leads to a tractable convex optimization problem, the use of the $\ell_2$ norm on the residuals in the existing CS theory has been a common practice without much investigation. Two facts could explain this. First, the use of the $\ell_2$ norm, apart from keeping the problem convex, simplifies the derivation of recovery algorithms. Second, the measurement noise is approximately Gaussian; this makes the use of the $\ell_2$ norm on the residuals a somewhat optimal choice. If the signal is sparse, CS theory tells us that the recovery is stable in the sense that the error is upper bounded by a factor proportional to the noise level [?], [?], [?].

However, it is well known in the literature that, in many practical situations, the noise behavior is impulsive and that the probability density function has a much heavier tail than the Gaussian counterpart. In image processing, impulsive noise, which includes salt-and-pepper noise and random-valued noise, is a common model for causes such as bit errors in transmission, malfunctioning pixels, faulty memory locations [?], and buffer overflow [?]. This motivates a number of impulsive noise suppression methods (see [?], [?], [?], and the references therein). If the impulsive ambient noise enters a CS imaging application at the sensory level, the compressed data will be contaminated. Consequently, this leads to a larger error and reduced performance. One could argue that the regularization parameter can be adjusted so that the recovery error is still finite. However, the $\ell_2$ norm of the residual can be large due to outliers, thus reducing recovery efficiency.

We address this issue by integrating robust statistics and the CS theory. We propose a new formulation for CS called robust CS, following the principle of robust statistics [?], i.e., by using a convex but subquadratic cost function on the residuals. This new cost function places less weight on large residuals, giving it the ability to suppress outliers while retaining the optimality of the standard formulation when the noise is Gaussian.

Our contributions are twofold. First, we show that the new CS formulation can be solved readily by using the majorization–minimization (MM) framework that solves iteratively a series of simpler problems whose solutions approach that of the main problem. These simpler problems are similar to that in the existing CS formulation; thus, state-of-the-art CS solvers can be deployed readily. Second, we prove that the new formulation can reduce effectively the recovery error bound and show that the improvement is related directly to the portion and the strength of the outliers in the noise samples. Our claims are verified through a number of numerical studies, all of which reveal the advantage of the robust CS formulation over the standard CS formulation.

II. PROPOSED METHOD

A. Problem Settings

Without loss of generality, assume that signal $x \in \mathbb{R}^N$ is $S$-sparse in basis $\Psi = I$. In this model, there are only $S$ nonzero entries in $x$, and their locations are unknown. We are interested in the problem of sampling $x$ in a nonadaptive and compressive manner and recovering $x$ from the compressed measurements. CS uses the measurement matrix $\Phi \in \mathbb{R}^{M \times N}$, where $M \ll N$, to obtain

$$y = \Phi x + n \tag{1}$$

where $n$ is the measurement noise. The CS measurement matrix $\Phi$ is required to satisfy some stable embedding conditions for stable recovery [?]. This is reflected in the restricted isometry property (RIP) constant $\delta_S$, which is the smallest value such that, for all $S$-sparse vectors $x \in \mathbb{R}^N$, the following holds:

$$(1 - \delta_S)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_S)\|x\|_2^2. \tag{2}$$

Intuitively, the RIP constant indicates the degree of orthogonality between the columns in $\Phi$. In the CS literature, $\Phi$ is constructed usually from random ensembles as they provide good RIP with high probability. It is noted that, as $M \ll N$, the recovery of $x$ from $y$ in (1) is typically ill-posed. However, under the assumption that $x$ is sparse, the CS theory has shown that the following formulation can provide a reliable recovery with an error upper bounded by $O(\|n\|_2)$:

$$(P_1)\quad \hat{x} = \arg\min_x \tfrac{1}{2}\|y - \Phi x\|_2^2 + \lambda\|x\|_1 \tag{3}$$

for a suitable value of the regularization parameter $\lambda$. Hereafter, we shall refer to (3) as the standard CS formulation. Specialized algorithms to solve this formulation include those in [?]–[?] and [?].

B. Proposed Formulation

Formulation (3) is essentially a tradeoff between two goals: model fitting (via minimizing the $\ell_2$ norm of the residuals) and promotion of sparsity (via minimizing the $\ell_1$ norm of the signal). When the noise is Gaussian, this objective function is optimal in the maximum likelihood sense. However, when the noise is impulsive, the theory of robust statistics indicates that it would be prone to larger errors. Impulsive noise is characterized by a small percentage of samples having extremely large values, and its modeling has been studied widely in the literature [?], [?]. One could argue that by adjusting the regularization parameter $\lambda$, the effect of outliers can be reduced. However, the recovered signal could be far from optimal as the parameters are adjusted considerably to cope with outliers.

It is known from the theory of robust statistics¹ that a better strategy is to replace the quadratic cost function on residual $\|y - \Phi x\|_2^2$ in (3) with a less rapidly increasing cost function $g(x)$ so that the objective function can be written as

$$f(x) = g(x) + \lambda\|x\|_1 \tag{4}$$

and the robust CS recovery is obtained by solving

$$(P_1^r)\quad \hat{x} = \arg\min_x f(x). \tag{5}$$

Although there is a wide range of cost functions in the robust statistics literature, we use $g(x) = \sum_{i=1}^{M} \rho(y_i - (\Phi x)_i)$, where $\rho(r)$ is the Huber's penalty function (soft limiter) given as follows:²

$$\rho(r) = \begin{cases} \dfrac{r^2}{2\sigma^2}, & |r| \le k\sigma^2 \\[4pt] -\dfrac{k^2\sigma^2}{2} + k|r|, & |r| > k\sigma^2 \end{cases} \tag{6}$$

and its derivative is given by

$$\psi(r) = \rho'(r) = \begin{cases} \dfrac{r}{\sigma^2}, & |r| \le k\sigma^2 \\[4pt] k\,\mathrm{sgn}(r), & |r| > k\sigma^2. \end{cases} \tag{7}$$

¹ The robust statistical literature is well established, and for background material, the reader is referred to [?].

² Please see [?] for a discussion on how to select the parameters. We also note that the Huber function is chosen simply because of its simplicity. Other robust cost functions for specific impulsive distributions are also available.
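As a concrete illustration of the Huber penalty (6), its derivative (7), and the robust loss $g$, the following is a minimal sketch in plain Python. It assumes the parameterization in which the penalty is quadratic, $r^2/(2\sigma^2)$, for $|r| \le k\sigma^2$ and linear beyond that threshold; the function names `huber_rho`, `huber_psi`, and `robust_loss` are hypothetical and do not come from the paper.

```python
import math


def huber_rho(r, k, sigma):
    """Huber penalty (assumed form): quadratic for |r| <= k*sigma^2, linear beyond."""
    threshold = k * sigma ** 2
    if abs(r) <= threshold:
        return r * r / (2.0 * sigma ** 2)
    # Linear branch, offset so that the two branches meet at the threshold.
    return k * abs(r) - 0.5 * (k * sigma) ** 2


def huber_psi(r, k, sigma):
    """Derivative of the Huber penalty: the residual is clipped at +/- k."""
    threshold = k * sigma ** 2
    if abs(r) <= threshold:
        return r / sigma ** 2
    return math.copysign(k, r)


def robust_loss(residuals, k, sigma):
    """g = sum_i rho(r_i), evaluated on a list of residuals r_i = y_i - (Phi x)_i."""
    return sum(huber_rho(r, k, sigma) for r in residuals)
```

With $k = 2$ and $\sigma = 1$, a residual of 1 falls in the quadratic zone while a residual of 4 falls in the linear zone, so a single large outlier contributes far less to `robust_loss` than it would to a squared-error loss.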
Essentially, this cost function is quadratic where the noise samples are most concentrated and linear where the outliers reside. Problem $P_1$ is an extreme case of $P_1^r$ when one uses $\rho(r) = r^2$ (up to a constant scaling that can be absorbed into $\lambda$). We note that the proposed objective function $f(x)$ is a convex function; hence, any local minimum is also a global minimum. Furthermore, under typical CS settings, where the CS matrix $\Phi$ is drawn from random ensembles and its dimensions are practically meaningful with $N \gg M \gg 1$, it can be shown that the proposed problem has a unique solution with a probability of almost one.³

In theory, it might be possible to use a smooth version of $\|x\|_1$ (a sum over $i = 1, \dots, n$ of a smooth function of $x_i$) to convert (5) to a simple objective function that can be solved by the gradient method. However, as $x$ is likely to be sparse and the Hessian is ill-conditioned, this approach suffers from extremely slow convergence and larger errors [?]. In what follows, we propose a computationally efficient method to solve (5) by exploiting the power of existing CS solvers for (3).

C. Proposed Algorithm

Compared with the standard CS formulation (3), the proposed robust CS formulation (5) is different only in the choice of $\rho(x)$. Existing methods often exploit the specific nature of the CS formulation. For example, the gradient projection method [?] converts the original CS formulation to a bound constrained quadratic programming problem. However, it appears difficult to do so with the robust CS formulation because of the subquadratic part $\rho$.

We deviate from the approaches that solve (5) directly. Rather, we rely on a popular technique in optimization, which is known by different names such as bound optimization, surrogate functions, or explicit proximal methods [?]. It is now better known as the MM framework [?]. Our proposed approach is to leave the $\ell_1$-norm term in (5) intact and majorize the loss function $g(x)$ with the suitable quadratic majorization $l(x)$ recursively so that the familiar form of standard CS can be realized. The main idea of the MM framework is to consider recursively a series of problems, i.e., the upper bound of the original problem, such that those alternative problems are easier to solve and that the solutions of these problems converge to the solution of the original problem.

Two related works by Nesterov [?] and Beck and Teboulle [3] solve a general CS formulation. They both propose MM algorithms where the quadratic majorization is isotropic; thus, they are slightly more restrictive than what is described here. In both [?] and [?], an increased computational burden is spent on optimizing the surrogate function to find the optimal Lipschitz constant. Here, we follow the approach in robust statistics of explicitly specifying the majorization. This leads to algorithmic simplicity and stability. We note, however, that both [?] and [?] have extended the basic MM framework in that the update rule also involves historical points to speed up the convergence. As a result, they both achieve quadratic convergence. Here, we only use the basic update rule of the MM framework, and linear convergence is obtained. While it appears likely that the same strategy can also be used in our algorithm to achieve quadratic convergence, this is beyond the scope of this paper. Our main goal is to prove that the new formulation is solvable with existing CS machinery and provide more theoretical insights via statistical analysis.

Majorization of the Robust Loss Function: We note that the Huber penalty function has a quadratic behavior for small residuals and a linear behavior for large residuals, and its curvature at any point should not exceed the curvature of a quadratic function. The point of interest at the outer iteration $k$, where the precise meaning of $k$ will be clearer in the discussion to follow, is denoted as $\hat{x}^{(k)}$. It is always possible to majorize $g(x)$ at $\hat{x}^{(k)}$ by the following quadratic function $l^{(k)}(x)$, given as

$$l^{(k)}(x) = g(\hat{x}^{(k)}) + \tfrac{1}{2}(x - \hat{x}^{(k)})^T \Phi^T W \Phi (x - \hat{x}^{(k)}) - (x - \hat{x}^{(k)})^T \Phi^T \psi(y - \Phi\hat{x}^{(k)}), \tag{8}$$

where we have used notation $\psi(r) = [\psi(r_1), \dots, \psi(r_n)]^T$. The majorization can be verified easily by three facts: First, $l^{(k)}(\hat{x}^{(k)}) = g(\hat{x}^{(k)})$; second, $\nabla l^{(k)}(x)|_{x=\hat{x}^{(k)}} = \nabla g(x)|_{x=\hat{x}^{(k)}} = -\Phi^T \psi(y - \Phi\hat{x}^{(k)})$; and last, for the Hessian, we have

$$\nabla\nabla^T l^{(k)}(x)|_{x=\hat{x}^{(k)}} - \nabla\nabla^T g(x)|_{x=\hat{x}^{(k)}} = \Phi^T \left[ W - \mathrm{diag}\!\left( \psi'\big(y_i - (\Phi\hat{x}^{(k)})_i\big) \right) \right] \Phi. \tag{9}$$

Due to the fact that $|\psi'(x)|$ is bounded, which is a property of robust penalty functions, it is always possible to find $W$ such that $g(x)$ is majorized at $\hat{x}^{(k)}$ by $l^{(k)}(x)$.

We note importantly that (8) can be rearranged after straightforward manipulation so that it becomes the familiar quadratic form found in standard CS as follows:

$$l^{(k)}(x) = \tfrac{1}{2}(v^{(k)} - \Phi x)^T W (v^{(k)} - \Phi x) + C \tag{10}$$

where

$$C = g(\hat{x}^{(k)}) - \tfrac{1}{2}\,\psi(y - \Phi\hat{x}^{(k)})^T W^{-1} \psi(y - \Phi\hat{x}^{(k)}) \tag{11}$$

$$v^{(k)} = W^{-1}\psi(y - \Phi\hat{x}^{(k)}) + \Phi\hat{x}^{(k)}. \tag{12}$$

Successive Standard CS Approximation: Returning to (5), as $g(x)$ is majorized by $l^{(k)}(x)$ at $\hat{x}^{(k)}$, it follows that $f(x)$ is majorized by

$$h^{(k)}(x) = l^{(k)}(x) + \lambda\|x\|_1 \tag{13}$$

at $\hat{x}^{(k)}$. In the MM framework, we optimize $h^{(k)}(x)$ instead of optimizing $f(x)$ directly. Suppose that the minimizer of $h^{(k)}(x)$ is $\hat{x}^{(k+1)}$; then, the generated sequence $\{\hat{x}^{(k)}\}$ can be shown to converge to the solution of (5). The intuition is illustrated in Fig. 1. The minimization of $h^{(k)}(x)$ leads to $\hat{x}^{(k+1)}$, which then becomes the next iterative point for majorizing $f(x)$. This procedure leads to the iterative minimization of $f(x)$. Indeed, by the definition of majorization, we have $h^{(k)}(x) \ge f(x)$. According to the definition of $\hat{x}^{(k+1)}$, it follows that

$$f(\hat{x}^{(k)}) = h^{(k)}(\hat{x}^{(k)}) \ge h^{(k)}(\hat{x}^{(k+1)}) \ge f(\hat{x}^{(k+1)}) \tag{14}$$

where the equality only occurs at the global optimum. Since $f(x)$ is convex and bounded from below, this implies $\lim_{k\to\infty} f(\hat{x}^{(k)}) = \min_x f(x)$; thus, $\{\hat{x}^{(k)}\}$ converges to the global optimum.

Choice of $W$: Roughly speaking, a majorization that is closer to the actual function is better. Two popular choices in the robust statistics literature [?] are the following:

1) The modified residual (MR) method, i.e., $W = \beta I$, where $\beta \ge \max|\psi'(x)|$;

2) Iteratively reweighted least-squares (IRLS) method, i.e., $W$ is a diagonal matrix with entries $w_{ii}^{(k)} = \psi(r_i^{(k)})/r_i^{(k)}$ and residuals $r^{(k)} = y - \Phi\hat{x}^{(k)}$ when $\psi(r)/r$ is a monotonically decreasing

³ A proof is available upon request.
function for $r \ge 0$, such as the case of the Huber's penalty function.

Fig. 1. Comparison function illustration.

We now show how existing CS solvers can be used readily to solve the robust CS formulation.

Proposition 1: Suppose that the MR choice $W = \beta I$ is made. Problem $(P_1^r)$ can be solved iteratively as follows. Starting from some initial estimate $\hat{x}^{(0)}$, the update at iteration $k$ is a solution of the following standard CS problem:

$$\hat{x}^{(k+1)} = \arg\min_x \tfrac{\beta}{2}\|v^{(k)} - \Phi x\|_2^2 + \lambda\|x\|_1 \tag{15}$$

where $v^{(k)} = (1/\beta)\,\psi(y - \Phi\hat{x}^{(k)}) + \Phi\hat{x}^{(k)}$. Sequence $\{\hat{x}^{(k)}\}$ converges to the global solution of (5).

The proof of this result is simply a verification. If one uses the IRLS choice for $W$, the iterative procedure instead solves

$$\hat{x}^{(k+1)} = \arg\min_x (v^{(k)} - \Phi x)^T W^{(k)} (v^{(k)} - \Phi x) + 2\lambda\|x\|_1$$

where $W^{(k)}$ needs to be recalculated at each iteration. For the MR choice, many existing CS solvers can be used without modification. If, on the other hand, IRLS is chosen, one may need to modify the CS solver in the case where, e.g., the CS matrix $\Phi$ is not defined explicitly for efficient implementation such as in the partial discrete cosine transform (DCT) matrix. The other difference is that the MR majorization is global (when $\beta \ge \max_x |\psi'(x)|$), whereas the IRLS may not be so; hence, backtracking might be required with the IRLS choice [?]. Overall, IRLS has fewer outer iterations than MR for a given convergence criterion.

Convergence Rate: We note that each step of the proposed algorithm is also an optimization problem (15), whose convergence rate is largely dependent on which CS solver is used. Here, we are more interested in the convergence rate of the outer loop, i.e., how fast the sequence $\{\hat{x}^{(k)}\}$ generated in (15) converges to the global optimum in (5), assuming that (15) can be solved with high accuracy. Clearly, this convergence rate depends on how closely $g(x)$ is approximated by $l(x)$. Being an MM algorithm, the proposed method inherits the convergence property of typical MM algorithms, which is linear [?]. By following the same approach as in [?], one can obtain the following result for the MR choice.

Proposition 2: Suppose that $x^*$ is the solution of (5) and that the level sets of $f(x)$ are bounded such that

$$\|x - x^*\| \le R \quad \forall x \in X : f(\hat{x}^{(0)}) \ge f(x) \tag{16}$$

where $\hat{x}^{(0)}$ is the initial point. Then, sequence $\hat{x}^{(k)}$ converges linearly according to

$$f(\hat{x}^{(k)}) - f(x^*) \le \frac{2\beta R^2}{k + 2}. \tag{17}$$

The proof of this result is based on arguments similar to those in [?] and, for completeness, is included in the Appendix. We also note that, as the objective function is strictly convex and bounded from below, the assumption of bounded level sets in the proposition is satisfied for the problem being considered.

III. ANALYSIS

A. Optimality

The choice of the regularization parameter $\lambda$, which is a model selection problem, is crucial for a convex relaxation technique to work properly. Typically, this problem is solved for a set of $\lambda$, which is often called the regularization path, and the value of the objective function is examined carefully to select the optimal value. The optimal value of $\lambda$ is selected typically when constraints in the formulation are satisfied. For example, in the standard CS formulation (3), the regularization parameter is chosen such that $\|y - \Phi\hat{x}\|_2 \le \eta$, where $\eta = \|n\|_2$. However, the actual $n$ is not known, and $\eta$ is estimated typically using, e.g., the concentration property of Gaussian random variables. In other words, given an estimate of the noise variance $\sigma^2$, it can be shown that $\|n\|_2 \le \sigma\sqrt{M + 2\sqrt{2M}}$ with high probability; thus, $\|y - \Phi\hat{x}\|_2 \le \sigma\sqrt{M + 2\sqrt{2M}}$ is used commonly to select the regularization parameter $\lambda$. The noise variance $\sigma^2$ can be estimated using standard statistical methods.

To extend the selection method to the robust CS formulation and instead of using $\|n\|_2$ as the bound on the $\ell_2$ norm of the residual, we propose to use $2\sigma^2 g(x)$ to bound $2\sigma^2 g(\hat{x})$, where $g(\cdot)$ is defined right after (5). If one sets $\sigma$ to be large enough, $2\sigma^2 g(x) \to \|n\|_2^2$, and this approaches the selection criterion in the standard CS formulation. To compute this bound, the scale parameter $\sigma$ has to be estimated by using, e.g., the median absolute deviation from the median as follows:

$$\mathrm{MAD}(n) = 1.4826\ \mathrm{median}_i\,\big|n_i - \mathrm{median}_j(n_j)\big| \tag{18}$$

which is a standard robust statistical method [?]. To ensure that the majority of normal noise samples are bounded with high probability in a similar manner to the standard CS case, we use $\hat{\sigma} = 2\,\mathrm{MAD}(n)$, i.e., a scaling constant of 2 is selected. It can be seen that, if this scale is chosen to be very large, the robust CS formulation approaches the standard CS formulation.

Along the regularization path, it is widely known that, when $\lambda$ is large, the solution of the standard CS (and also of robust CS) tends to zero. This upper bound on $\lambda$ is useful in practice to set up the regularization path. This value can be found from the optimality conditions, i.e., the gradient at a point is zero. However, as the objective function is not differentiable at point $x = 0$, we need to use subdifferential calculus instead [?]. It follows that $x = 0$ is optimal when

$$\|\Phi^T \psi(y)\|_\infty \le \lambda. \tag{19}$$

Therefore, in practice, the regularization path should be $[0, \lambda_{\max}]$, where $\lambda_{\max} = \|\Phi^T \psi(y)\|_\infty$.
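The MR iteration of Proposition 1, the MAD scale estimate (18), and the regularization-path bound $\lambda_{\max}$ in (19) can be sketched together as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain Python lists for a small dense $\Phi$; it assumes the Huber derivative $\psi(r) = r/\sigma^2$ for $|r| \le k\sigma^2$ and $k\,\mathrm{sgn}(r)$ otherwise, so that $\beta = 1/\sigma^2$ bounds $\psi'$; and it swaps in a basic iterative shrinkage-thresholding (ISTA) loop as the inner solver for (15), whereas the experiments in this paper use the l1_ls solver. All function names are hypothetical.

```python
import math
import statistics


def psi(r, k, sigma):
    """Huber derivative (assumed form): r/sigma^2, clipped to +/- k."""
    if abs(r) <= k * sigma ** 2:
        return r / sigma ** 2
    return math.copysign(k, r)


def mad_scale(n):
    """1.4826 * median_i |n_i - median_j(n_j)|, as in (18)."""
    med = statistics.median(n)
    return 1.4826 * statistics.median([abs(v - med) for v in n])


def matvec(A, x):
    """Compute A x for a dense matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Compute A^T y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft(u, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    return math.copysign(max(abs(u) - t, 0.0), u)


def lambda_max(Phi, y, k, sigma):
    """||Phi^T psi(y)||_inf: for lambda at or above this value, x = 0 is optimal."""
    return max(abs(g) for g in rmatvec(Phi, [psi(r, k, sigma) for r in y]))


def ista(Phi, v, lam, beta, x0, iters=200):
    """Inner solver (ISTA, an assumption) for (beta/2)||v - Phi x||^2 + lam ||x||_1."""
    # Squared Frobenius norm: a simple upper bound on the top eigenvalue of Phi^T Phi.
    lip = sum(a * a for row in Phi for a in row)
    step = 1.0 / (beta * lip)
    x = list(x0)
    for _ in range(iters):
        res = [vi - pi for vi, pi in zip(v, matvec(Phi, x))]
        grad = [-beta * g for g in rmatvec(Phi, res)]
        x = [soft(xi - step * gi, step * lam) for xi, gi in zip(x, grad)]
    return x


def robust_cs_mr(Phi, y, lam, k, sigma, outer=30):
    """Outer MM loop with the MR choice W = beta * I."""
    beta = 1.0 / sigma ** 2  # upper bound on psi' for the assumed Huber form
    x = [0.0] * len(Phi[0])
    for _ in range(outer):
        Px = matvec(Phi, x)
        # Modified measurements v = (1/beta) * psi(y - Phi x) + Phi x.
        v = [psi(yi - pi, k, sigma) / beta + pi for yi, pi in zip(y, Px)]
        x = ista(Phi, v, lam, beta, x)  # warm start from the current iterate
    return x
```

For a quick sanity check one can take $\Phi = I$, where each coordinate decouples: with $y = (5, 0.5, 0)$, $k = 2$, $\sigma = 1$, and $\lambda = 1$, the first coordinate should settle where $\psi(y_1 - x_1) = \lambda$, i.e., at $x_1 = 4$, while the other two stay at zero. Any standard CS solver could replace `ista` without changing the outer loop.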
B. Bounds on Recovery Error

To study the recovery error of constrained $\ell_1$ minimization for the signal model (1), it is more convenient to express the recovery as

$$\min_{x \in \mathcal{C}} \|x\|_1 \tag{20}$$

where $\mathcal{C}$ denotes the constraint set. For the standard CS recovery, the constraint set is defined as

$$\mathcal{C} = \{x : \|y - \Phi x\|_2 \le \eta\} \tag{21}$$

where $\eta$ is a suitable upper bound on the $\ell_2$ norm of the noise, i.e., $\|n\|_2 \le \eta$, which is already discussed in the preceding section. For the proposed robust CS, the constraint set is defined as

$$\mathcal{C} = \{x : g(x) \le t\} \tag{22}$$

for some suitable $t$, as discussed in the preceding section. Here, $g(x)$ is the subquadratic cost function in (4). For notational simplicity, we introduce notation $\|y - \Phi x\|_\rho^2 = 2\sigma^2 g(x)$ so that one can view $\|\cdot\|_\rho$ as a subquadratic vector norm in a loose sense (this norm depends also on $k$ and $\sigma$). Due to the definition of the Huber's penalty function, we note importantly that $\|x\|_\rho \le \|x\|_2$ for any vector $x$. Hence, the above constraint set can be written as

$$\mathcal{C} = \{x : \|y - \Phi x\|_\rho^2 \le \mu^2\} \tag{23}$$

where $\mu^2$ is a suitable upper bound of the squared subquadratic norm of the noise $n$. Due to the definition of the subquadratic norm, there exists $\mu$ such that

$$\mu \le \eta; \quad \|n\|_\rho \le \mu. \tag{24}$$

Then, by denoting $\hat{x}$ as the solution of the proposed robust CS recovery, it also satisfies

$$\|y - \Phi\hat{x}\|_\rho \le \mu. \tag{25}$$

For the sake of the following discussion, we assume that such $k$, $\sigma$, $\eta$, and $\mu$ have been specified properly (as discussed in the preceding section) such that the above inequalities hold. We also assume that the error bound of CS recovery is consistent with the proof in [?]. The following result indicates that the robust CS recovery has a lower error bound than that of the standard CS recovery.

Theorem 1: In the presence of impulsive noise, the robust CS recovery has an error bound less than or equal to that of the standard CS recovery.

Proof: We start from an important observation that, for a given signal model $y = \Phi x + n$ with fixed $\Phi$, the $\ell_2$-norm error bound of standard CS recovery is proportional to $\|n\|_2$ by a universal constant that depends only on the RIP property of $\Phi$. This implies that, if we have another signal model $v = \Phi x + w$, where $\|w\|_2 \le \|n\|_2$, then the error bound of the latter model is less than that of the former model. Please note that this is only for the error bound and that no statement is made on the actual error. Our strategy is to prove by construction that the robust CS recovery is equivalent to the standard CS recovery with noise $w$ whose $\ell_2$ norm is smaller than that of the original noise $n$.

Indeed, we first recall from Section II that the proposed robust CS is convex and has a unique solution with a probability of almost 1. Suppose that the robust CS solution is $\hat{x}$, and define $z = \Phi(x - \hat{x})$. Similar to the MM framework and to approximate the objective function, we find an equivalent signal model at solution $\hat{x}$. While we do not specify what $\hat{x}$ is, the existence of such a signal model and the above observation allow us to conclude on the error bound of the robust CS recovery of the original signal model.

We note that, under the robust CS formulation, the following inequalities hold:

$$\|n\|_\rho^2 \le \mu^2; \quad \|n + z\|_\rho^2 \le \mu^2. \tag{26}$$

Next, we define the following index sets:

$$I_1 = \{i : |n_i + z_i| \le k\sigma^2\}; \quad I_2 = \{i : |n_i + z_i| > k\sigma^2\}. \tag{27}$$

We propose to construct the elements of $w$ as follows:

1) For $i \in I_1$, we note that, because $|n_i + z_i| \le k\sigma^2$, then $2\sigma^2\rho(n_i + z_i) = (n_i + z_i)^2$. Hence, we set $w_i = n_i$, which implies that $(w_i + z_i)^2 = 2\sigma^2\rho(n_i + z_i)$.

2) For $i \in I_2$, without loss of generality, we assume that $n_i + z_i \ge 0$ (case $n_i + z_i < 0$ is proven similarly). The motivation of the construction of $w_i$ in this case is illustrated in Fig. 2. Here, we aim to find $w_i$ such that $(w_i + z_i)^2 = 2\sigma^2\rho(n_i + z_i)$. It can be seen easily that, with $w_i = \sqrt{2\sigma^2\rho(n_i + z_i)} - z_i$, we have $(w_i + z_i)^2 = 2\sigma^2\rho(n_i + z_i)$ and $w_i \le n_i$.

Fig. 2. Illustration for the construction proof when $n_i + z_i \ge 0$.

From this construction, it follows that, by taking the sums over indices $I_1$ and $I_2$:

$$\|w\|_2^2 = \sum_{i \in I_1} w_i^2 + \sum_{i \in I_2} w_i^2 \tag{28}$$

$$\begin{aligned} \|w + z\|_2^2 &= \sum_{i \in I_1} (w_i + z_i)^2 + \sum_{i \in I_2} (w_i + z_i)^2 \\ &= \sum_{i \in I_1} (n_i + z_i)^2 + \sum_{i \in I_2} 2\sigma^2\rho(n_i + z_i) \\ &= \sum_{i \in I_1} 2\sigma^2\rho(n_i + z_i) + \sum_{i \in I_2} 2\sigma^2\rho(n_i + z_i) \\ &= \|n + z\|_\rho^2. \end{aligned} \tag{29}$$

Then, we have

$$\|w\|_2 \le \eta; \quad \|w + z\|_2 \le \mu \tag{30}$$
where $\eta$ is such that $\|n\|_2 \le \eta$. The existence of such $w$ indicates that the error bound is improved. Indeed, applying the triangle inequality, we have

$$\|z\|_2 \le \|w\|_2 + \|w + z\|_2. \tag{31}$$

Then, it follows from [?] that the $\ell_1$-norm minimization solution $\hat{x}$ satisfies

$$\|\hat{x} - x\|_2 \le C\,\|\Phi(\hat{x} - x)\|_2 \tag{32}$$

where $C = 2\sqrt{1 + \delta_{2S}}/(1 - (1 + \sqrt{2})\delta_{2S})$ is a universal constant, $\delta_{2S}$ is a $2S$-RIP constant of $\Phi$, and $S$ is the number of nonzero entries in $x$. Together with definition $z = \Phi(x - \hat{x})$, it follows that the error bound of robust CS is $C(\eta + \mu)$. Comparing with the error bound of CS as $2C\eta$ and noting that $\|n\|_\rho \le \mu$ and $\mu \le \eta$, the proof follows.

C. Discussion

While the above result indicates that doing a robust CS recovery is likely to yield smaller error, it does not indicate what the gain would depend on. Here, we aim to answer this question. To do so, we need to characterize the noise more explicitly. There are many statistical models for impulsive noise. For simplicity, we consider a two-term Gaussian mixture model whose probability distribution function is

$$f = (1 - \epsilon)\,\mathcal{N}(0, \sigma^2) + \epsilon\,\mathcal{N}(0, \kappa\sigma^2). \tag{33}$$

The nominal background noise is represented by the first term, and the effect of impulsiveness is captured in the second term. This model has been used regularly in modeling practical environmental noise [?] as it serves as an approximation to Middleton's Class A noise model. While $\epsilon$ denotes the portion of outliers in the noise, $\kappa$ indicates their strength. The total noise variance is $\sigma_{\mathrm{total}}^2 = (1 - \epsilon)\sigma^2 + \epsilon\kappa\sigma^2$.

First, we use a conservative inequality to lower bound the improvement as follows:

$$2C\eta - C(\eta + \mu) \ge C(\eta - \mu). \tag{34}$$

As $C$ is a constant, the error bound improvement of $O(\eta - \mu)$ depends only on the gap between $\eta$ and $\mu$, which in turn depends on the impulsiveness of the noise. Under the assumption of the Gaussian mixture model for impulsive noise, it can be shown easily by computing $E[n_i^2]$ and $E[2\sigma^2\rho(n_i)]$ that

$$\eta^2 = O\big((1 - \epsilon)\sigma^2 + \epsilon\kappa\sigma^2\big) \tag{35}$$

$$\mu^2 = O\big((1 - \epsilon)\sigma^2 + \epsilon k(2\sqrt{\kappa}\cdot 0.8 - k)\sigma^2\big) \approx O\big(\sigma^2(1 - \epsilon + 1.6k\sqrt{\kappa}\,\epsilon)\big) \tag{36}$$

where we have made use of the fact that, for a zero-mean Gaussian random variable $Z \sim \mathcal{N}(0, \sigma^2)$, $E[|Z|] = \sigma\sqrt{2/\pi} \approx 0.8\sigma$, and that $\sqrt{\kappa} \gg k$ under an impulsive noise assumption. Thus, by applying the following inequality for $a, b, c \ge 0$, and $b \ge c$:

$$\sqrt{a + b} - \sqrt{a + c} \ge \frac{b - c}{2\sqrt{a + b}} \tag{37}$$

one obtains

$$\eta - \mu \ge O\!\left(\frac{\sigma\epsilon\kappa\,(1 - 1.6k/\sqrt{\kappa})}{2\sqrt{1 - \epsilon + \epsilon\kappa}}\right). \tag{38}$$

Then, under the assumption that $\kappa$ is large and $\epsilon$ is small, we can approximate the term on the right by $O(\sigma\epsilon\kappa)$. This implies that the gain is related directly to the percentage of the outliers $\epsilon$ in the noise as well as the strength of the outliers $\kappa$.

IV. EXPERIMENTS

We present a number of numerical studies to demonstrate the robustness of the proposed formulation in CS applications where there are outliers in the noise. We use model (33) for modeling impulsive noise. The SNR is defined as $10\log(1/\sigma_{\mathrm{total}}^2)$.

For the CS settings, we consider the measurement matrix $\Phi$ to be a partial DCT matrix, which is obtained by randomly selecting $M$ out of $N$ rows of the full DCT matrix. The actual realization of the partial DCT matrix is known to be not important in CS applications due to the concentration property of the random ensemble constructed from the DCT matrix. The choice of $M$ follows the common practice of CS [?], i.e., $M$ is at least 3–5 times the intrinsic dimension, which is the minimum number of (significant) nonzero coefficients necessary to represent an image under some basis functions. In all experiments, we choose $M = 0.5N$ to ensure that this CS requirement is always met. We also select the l1_ls algorithm [?] as the solver for the standard CS formulation as this algorithm produces very accurate results due to the nature of its interior-point method. For the l1_ls algorithm, we set the relative dual gap of 1% as the convergence criterion. In addition, we do not consider the postprocessing of the results as it may further introduce errors [?].

We note importantly that, due to the projection mechanism of CS sensing, i.e., mapping from the high-dimensional space $\mathbb{R}^N$ to the low-dimensional space $\mathbb{R}^M$, the CS measurement $y$ tends to have a noise-like appearance and does not directly carry the semantic meaning that the original image does. Thus, it may be difficult, if not impossible, to localize the strongly corrupted samples simply by examining $y$. Even if the corrupted samples can be localized, one may have no other options (e.g., interpolation or imputation) but to discard the bad samples, which is not efficient for recovery.

A. Recovery of the Random-Bar Image

We consider a synthetic image of random bars, as shown in Fig. 3, which has been used in the previous CS literature [?]. The Haar wavelets are used as the basis functions because the image is sparse in this transform domain. In particular, with a three-level Haar wavelet decomposition, the number of nonzero wavelet coefficients is 5% of the dimension of the image. We obtain the CS measurements on this image, recover the Haar wavelet coefficients, and compare the recovered image with the original one in a similar manner to that of previous work [?], [?].

First, we repeat the recovery over a number of runs where only the Gaussian noise varies and measure the average output peak SNR (PSNR) of the recovered images using the standard CS and the proposed robust CS formulations. Here, the regularization parameter is computed automatically, as discussed above. In the presence of Gaussian noise, the dependence of the output PSNR on the amount of noise is depicted in Fig. 4. Clearly, the performance of the proposed CS formulation is almost equivalent to that of the standard CS formulation. When the noise power is smaller, both formulations approach perfect CS recovery, which is exhibited by high output PSNRs.

Next, we compute similarly the average output PSNR when impulsive noise is present instead. The fraction of outliers in impulsive noise is $\epsilon = 0.1$, and its strength is $\kappa = 100$. The PSNR performance is shown in Fig. 4. Clearly, the proposed robust CS formulation outperforms consistently the standard CS formulation by several decibels over the simulated range for the noise. At high SNRs, the gap between the proposed CS formulation and the standard CS formulation appears to
become smaller, possibly due to the robustness of $\ell_1$-norm regularization to small noise. Typical recovered images for both formulations are shown in Fig. 3. The proposed CS formulation yields images that are typically less distorted than the others.

Fig. 3. Random-bar example of CS recovery for the compared formulations in the presence of impulsive noise with SNR = 10 dB.

Fig. 4. Comparison of output PSNR (random bars).

To examine the main result in the theorem on the gain of the proposed formulation versus impulsiveness, we fix the noise at SNR = 10 dB, vary parameter $\kappa$ from 100 to 1000 (extremely impulsive), and measure the output PSNRs for both formulations. The results, which are depicted in Fig. 5, clearly show that, as expected, the standard CS formulation is dependent only on the noise power and not the characteristics of the noise (i.e., the approximately constant output PSNR over the impulsiveness range) and that the improvement in the PSNR of the proposed CS formulation is larger as the level of impulsiveness is increased.

Fig. 5. Influence of impulsiveness.

B. Recovery of an MRI Image

We consider an MRI image, as shown in Fig. 6. As shown in [?], MRI is very relevant to CS as the sensing cost can be reduced considerably without compromising the quality of the image. In contrast with the previous random-bar image where the Haar wavelet coefficients are truly sparse (5% nonzeros), the wavelet coefficients of this MRI image are not sparse but approximately follow an exponential decay. In the CS literature, this is often referred to as compressible, which reflects more accurately the characteristics of images in real life. The purpose of this paper is to examine the quality of the recovered images from CS samples under the influence of impulsive noise. To do so, we also use the partial DCT matrix to obtain the CS samples. The noise power is fixed at SNR = 20 dB. The impulsive noise is the same as the previous experiment: $\epsilon = 0.1$ and $\kappa = 100$. The recovered MRI images for both standard and robust CS formulations are shown in Fig. 6. Quantitatively, it is observed that the PSNR of the recovered image by the proposed CS formulation has a slightly higher value than that obtained by the standard CS formulation at an improvement of 1 dB. We also examine the residual image from both formulations, as shown in Fig. 7, and find that there are more hot spots in the residual images obtained from the standard CS formulation than for those obtained from the proposed formulation.

V. CONCLUSION

We have presented a new approach to improving CS recovery in the presence of impulsive noise. By using a robust cost function on the residuals, we are able to suppress large outliers in the measurement noise. This results in an improved recovery because the regularization parameter is not influenced significantly by these outliers, meaning that the recovered signal is not being driven further toward 0. We also show that an iterative algorithm can be developed readily under the MM framework to utilize the power and computational efficiency of the existing CS solvers to obtain the solution of the new formulation. Most importantly, we have established a theoretical guarantee on the improvement of the upper bound of the recovery error. The numerical studies on both synthetic and real images show that the proposed formulation achieves equivalent performance when the noise is indeed Gaussian, but an improvement is found when the noise is heavily impulsive. The proposed method can be used to improve further CS recovery upon an inspection of the residuals for impulsiveness.
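For readers who wish to reproduce the noise conditions of the experiments, the two-term Gaussian mixture model (33) at a prescribed SNR can be sketched as below. This is an illustrative sketch only: it assumes that the logarithm in the SNR definition $10\log(1/\sigma_{\mathrm{total}}^2)$ is base 10 and that the signal power is normalized to one; the function name `impulsive_noise` and its parameters are hypothetical.

```python
import math
import random


def impulsive_noise(m, snr_db, eps=0.1, kappa=100.0, rng=None):
    """Draw m samples from (1 - eps) N(0, s2) + eps N(0, kappa * s2).

    The background variance s2 is chosen so that the total variance
    (1 - eps) * s2 + eps * kappa * s2 matches SNR = 10 log10(1 / total_var).
    """
    rng = rng or random.Random(0)  # fixed seed for repeatability (an assumption)
    total_var = 10.0 ** (-snr_db / 10.0)
    s2 = total_var / ((1.0 - eps) + eps * kappa)
    std_background = math.sqrt(s2)
    std_outlier = math.sqrt(kappa * s2)
    samples = []
    for _ in range(m):
        # With probability eps the sample is an outlier with inflated variance.
        std = std_outlier if rng.random() < eps else std_background
        samples.append(rng.gauss(0.0, std))
    return samples
```

With the defaults `eps=0.1` and `kappa=100.0`, about one sample in ten is drawn with a standard deviation ten times that of the background, which matches the settings $\epsilon = 0.1$ and $\kappa = 100$ quoted in the experiments.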
Fig. 6. Recovery of an MRI image using standard and robust CS formulations.

Fig. 7. MRI residual images.

APPENDIX I

First, due to the convexity of $g(x)$, it holds that

$$g(x) \ge g(\hat{x}^{(k)}) + \nabla g(\hat{x}^{(k)})^T (x - \hat{x}^{(k)}). \tag{39}$$

Thus, according to the definition of $\hat{x}^{(k+1)}$

$$f(\hat{x}^{(k+1)}) \le \min_x \left\{ f(x) + \tfrac{\beta}{2}\|x - \hat{x}^{(k)}\|_2^2 \right\}. \tag{40}$$

It can be seen easily that the generated sequence $\{\hat{x}^{(k)}\}$ satisfies $f(\hat{x}^{(k)}) \le f(\hat{x}^{(0)})$ so that $\|\hat{x}^{(k)} - x^*\|_2 \le R$. Using the upper bound (40) over segment $[\hat{x}^{(k)}, x^*]$, we have

$$\begin{aligned} f(\hat{x}^{(k+1)}) &\le \min_{\theta \in [0,1]} \left\{ f\big(\theta x^* + (1 - \theta)\hat{x}^{(k)}\big) + \tfrac{\beta}{2}\big\|\theta x^* + (1 - \theta)\hat{x}^{(k)} - \hat{x}^{(k)}\big\|_2^2 \right\} \\ &\le \min_{\theta \in [0,1]} \left\{ \tfrac{\beta\theta^2}{2}\|\hat{x}^{(k)} - x^*\|_2^2 - \theta\big(f(\hat{x}^{(k)}) - f(x^*)\big) + f(\hat{x}^{(k)}) \right\} \\ &\le \min_{\theta \in [0,1]} \left\{ \tfrac{\beta\theta^2 R^2}{2} - \theta\big(f(\hat{x}^{(k)}) - f(x^*)\big) + f(\hat{x}^{(k)}) \right\}. \end{aligned}$$

Without the constraint, the right-hand side attains a minimum at $\theta = (f(\hat{x}^{(k)}) - f(x^*))/(\beta R^2)$. However, as $\theta \in [0, 1]$, it can be seen easily that, if $f(\hat{x}^{(0)}) - f(x^*) \ge \beta R^2$, then the optimal $\theta = 1$; thus, $f(\hat{x}^{(1)}) - f(x^*) \le \beta R^2/2$. Otherwise, and for $k \ge 1$, we always have $f(\hat{x}^{(k)}) - f(x^*) \le \beta R^2$; thus

$$f(\hat{x}^{(k+1)}) - f(x^*) \le f(\hat{x}^{(k)}) - f(x^*) - \frac{\big(f(\hat{x}^{(k)}) - f(x^*)\big)^2}{2\beta R^2}.$$

From this and using the fact that $f(\hat{x}^{(k)}) - f(x^*) \ge f(\hat{x}^{(k+1)}) - f(x^*)$, one can derive easily the result on a linear convergence rate, as stated in the proposition.

ACKNOWLEDGMENT

The authors would like to thank Dr. B. Adams for proofreading the manuscript, and the anonymous reviewers for providing constructive feedback that helped improve technical clarity.

REFERENCES

[1] A. Auslender and M. Teboulle, "Interior gradient and proximal methods for convex and conic optimization," SIAM J. Optim., vol. 16, no. 3, pp. 697–725, 2006.

[2] L. Bar, A. Brook, N. Sochen, and N. Kiryati, "Deblurring of color images corrupted by impulsive noise," IEEE Trans. Image Process., vol. 16, no. 4, pp. 1101–1111, Apr. 2007.

[3] A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM J. Imag. Sci., vol. 2, no. 1, pp. 183–202, Jan. 2009.

[4] J. Bobin, J.-L. Starck, and R. Ottensamer, "Compressed sensing in astronomy," IEEE J. Sel. Topics Signal Process., vol. 2, no. 5, pp. 718–726, Oct. 2008.

[5] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.