


IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 1, JANUARY 2012

Improved Image Recovery From Compressed Data Contaminated With Impulsive Noise

Duc-Son Pham, Member, IEEE, and Svetha Venkatesh, Senior Member, IEEE

Abstract: Compressed sensing (CS) is a new information sampling theory for acquiring sparse or compressible data with much fewer measurements than those otherwise required by the Nyquist/Shannon counterpart. This is particularly important for some imaging applications such as magnetic resonance imaging or in astronomy. However, in the existing CS formulation, the use of the $\ell_2$ norm on the residuals is not particularly efficient when the noise is impulsive. This could lead to an increase in the upper bound of the recovery error. To address this problem, we consider a robust formulation for CS to suppress outliers in the residuals. We propose an iterative algorithm for solving the robust CS problem that exploits the power of existing CS solvers. We also show that the upper bound on the recovery error in the case of non-Gaussian noise is reduced and then demonstrate the efficacy of the method through numerical studies.

Index Terms: Compressed sensing (CS), image compression, impulsive noise, inverse problems, robust recovery, robust statistics.

Manuscript received March 02, 2010; revised August 26, 2010, December 02, 2010 and February 07, 2011; accepted July 11, 2011. Date of publication September 12, 2011; date of current version December 16, 2011. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Birsen Yazici.
The authors are with the Institute for Multisensor and Content Analysis, Curtin University, Perth, WA 6845, Australia.
Color versions of one or more of the figures in this paper are available online.
Digital Object Identifier 10.1109/TIP.2011.2162418
1057-7149/$26.00 © 2011 IEEE

I. INTRODUCTION

Compressed sensing (CS) [8], [12] is a new direct information sampling theory specifically for the acquisition and recovery of sparse or compressible data as an alternative to the existing Nyquist/Shannon sampling counterpart to exploit the characteristics of the signals. This theory is essentially a direct information sampling scheme. Such a scheme is crucial for some applications where reducing the sensing cost is desirable, such as MRI [25].

The CS theory has led to numerous computationally efficient recovery methods such as pursuit algorithms [11], [26], [28], optimization algorithms [16], [23], a complexity regularization algorithm [19], and Bayesian methods [22]. The CS theory has been found useful in a number of imaging applications, including MRI [25], astronomy [4], and high-SNR image compression [18]. We refer the reader to the CS repository for background material and current development in this area. In the following, we assume that the reader is
familiar with basic CS terminology and only review the most relevant CS concepts.

We address the particular issue of noisy CS recovery. Typically, a noise-aware CS recovery is formulated to minimize the $\ell_1$ norm of the recovered signal under the constraint that the $\ell_2$ norm of the residuals is bounded by a factor proportional to the noise level. While the use of the $\ell_1$ norm on the recovered signal leads to a tractable convex optimization problem, the use of the $\ell_2$ norm on the residuals in the existing CS theory has been a common practice without much investigation. Two facts could explain this. First, the use of the $\ell_2$ norm, apart from keeping the problem convex, simplifies the derivation of recovery algorithms. Second, the measurement noise is approximately Gaussian; this makes the use of the $\ell_2$ norm on the residuals a somewhat optimal choice. If the signal is sparse, CS theory tells us that the recovery is stable in the sense that the error is upper bounded by a factor proportional to the noise level [6], [7], [13].

However, it is well known in the literature that, in many practical situations, the noise behavior is impulsive and that the probability density function has a much heavier tail than the Gaussian counterpart. In image processing, impulsive noise, including salt-and-pepper noise and random-valued noise, is a common model for causes such as bit errors in transmission, malfunctioning pixels, faulty memory locations [9], and buffer overflow [17]. This motivates a number of impulsive noise suppression methods (see [2], [10], [33], and the references therein). If the impulsive ambient noise enters a CS imaging application at the sensory level, the compressed data will be contaminated. Consequently, this leads to a larger error and reduced performance. One could argue that the regularization parameter can be adjusted so that the recovery error is still finite. However, the $\ell_2$ norm of the residual can be large due to outliers, thus reducing recovery efficiency.

We address this issue by integrating robust statistics and the CS theory. We propose a new formulation for CS called robust CS, following the principle of robust statistics [20], i.e., by using a convex but subquadratic cost function on the residuals. This new cost function places less weight on large residuals, giving it the ability to suppress outliers while retaining the optimality of the standard formulation when the noise is Gaussian.

Our contributions are twofold. First, we show that the new CS formulation can be solved readily by using the majorization-minimization (MM) framework, which iteratively solves a series of simpler problems whose solutions approach that of the main problem. These simpler problems are similar to that in the existing CS formulation; thus, state-of-the-art CS solvers can be deployed readily. Second, we prove that the new formulation can effectively reduce the recovery error bound and show that the improvement is related directly to the portion and the strength of the outliers in the noise samples. Our claims are verified through a number of numerical studies, all of which reveal the advantage of the robust CS formulation over the standard CS formulation.

II. PROPOSED METHOD

A. Problem Settings

Without loss of generality, assume that signal $x \in \mathbb{R}^N$ is $S$-sparse in basis $\Psi = I$. In this model, there are only $S$ nonzero entries in $x$, and their locations are unknown. We are interested in the problem of sampling $x$ in a nonadaptive and compressive manner and recovering $x$ from the compressed measurements. CS uses the measurement matrix $\Phi \in \mathbb{R}^{M \times N}$, where $M \ll N$, to obtain

$$ y = \Phi x + n \tag{1} $$

where $n$ is the measurement noise. The CS measurement matrix $\Phi$ is required to satisfy some stable embedding conditions for stable recovery [7]. This is reflected in the restricted isometry property (RIP) constant $\delta_S$, which is the smallest value such that, for all $S$-sparse vectors $x \in \mathbb{R}^N$, the following holds:

$$ (1 - \delta_S)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_S)\|x\|_2^2. \tag{2} $$

Intuitively, the RIP constant indicates the degree of orthogonality between the columns of $\Phi$. In the CS literature, $\Phi$ is usually constructed from random ensembles as they provide good RIP with high probability. It is noted that, as $M \ll N$, the recovery of $x$ from $y$ in (1) is typically ill-posed. However, under the assumption that $x$ is sparse, the CS theory has shown that the following formulation can provide a reliable recovery with an error upper bounded by $O(\|n\|_2)$:

$$ (P_1)\quad \hat{x} = \arg\min_{x \in \mathbb{R}^N} \|y - \Phi x\|_2^2 + \lambda \|x\|_1 \tag{3} $$

for a suitable value of the regularization parameter $\lambda$. Hereafter, we shall refer to (3) as the standard CS formulation. Specialized algorithms to solve this formulation include those in [14]–[16] and [24].

B. Proposed Formulation

Formulation (3) is essentially a tradeoff between two goals: model fitting (via minimizing the $\ell_2$ norm of the residuals) and promotion of sparsity (via minimizing the $\ell_1$ norm of the signal). When the noise is Gaussian, this objective function is optimal in the maximum likelihood sense. However, when the noise is impulsive, the theory of robust statistics indicates that it would be prone to larger errors. Impulsive noise is characterized by a small percentage of samples having extremely large values, and its modeling has been studied widely in the literature [27], [30]. One could argue that, by adjusting the regularization parameter $\lambda$, the effect of outliers can be reduced. However, the recovered signal could be far from optimal as the parameters are adjusted considerably to cope with outliers.

It is known from the theory of robust statistics¹ [20] that a better strategy is to replace the quadratic cost function on the residual $\|y - \Phi x\|_2^2$ in (3) with a less rapidly increasing cost function $g(x)$ so that the objective function can be written as

$$ f(x) = g(x) + \lambda \|x\|_1 \tag{4} $$

and the robust CS recovery is obtained by solving

$$ (P_1^r)\quad \hat{x} = \arg\min_{x \in \mathbb{R}^N} f(x). \tag{5} $$

¹The robust statistical literature is well established, and for background material, the reader is referred to [20].

Although there is a wide range of cost functions in the robust statistics literature, we use $g(x) = \sum_{i=1}^{M} \rho(y_i - (\Phi x)_i)$, where $\rho(r)$ is Huber's penalty function (soft limiter) given as follows:²

$$ \rho(r) = \begin{cases} \dfrac{r^2}{2\sigma^2}, & |r| \le k\sigma^2 \\[4pt] -\dfrac{k^2\sigma^2}{2} + k|r|, & |r| > k\sigma^2 \end{cases} \tag{6} $$

and its derivative is given by

$$ \psi(r) = \rho'(r) = \begin{cases} \dfrac{r}{\sigma^2}, & |r| \le k\sigma^2 \\[4pt] k\,\mathrm{sgn}(r), & |r| > k\sigma^2. \end{cases} \tag{7} $$

²Please see [20] for a discussion on how to select the parameters. We also note that the Huber function is chosen simply because of its simplicity. Other robust cost functions for specific impulsive distributions are also available.
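As a rough illustration of (6) and (7), the following is a minimal NumPy sketch of the Huber penalty, its derivative, and the resulting loss $g(x)$. It assumes the soft-limiter form in which the quadratic region is $|r| \le k\sigma^2$; the function names (`huber_rho`, `huber_psi`, `robust_loss`) are hypothetical and not taken from the paper.

```python
import numpy as np

def huber_rho(r, sigma, k):
    """Huber penalty as in (6): quadratic for |r| <= k*sigma^2, linear beyond.

    Assumed form: r^2 / (2 sigma^2) inside, k|r| - k^2 sigma^2 / 2 outside.
    """
    r = np.asarray(r, dtype=float)
    thresh = k * sigma**2
    quad = r**2 / (2.0 * sigma**2)
    lin = k * np.abs(r) - 0.5 * k**2 * sigma**2
    return np.where(np.abs(r) <= thresh, quad, lin)

def huber_psi(r, sigma, k):
    """Derivative as in (7): r / sigma^2 limited to the interval [-k, k]."""
    r = np.asarray(r, dtype=float)
    return np.clip(r / sigma**2, -k, k)

def robust_loss(x, Phi, y, sigma, k):
    """g(x) = sum_i rho(y_i - (Phi x)_i), the data-fit term in (4)."""
    return float(np.sum(huber_rho(y - Phi @ x, sigma, k)))
```

Under this form the two branches meet at $|r| = k\sigma^2$, where both equal $k^2\sigma^2/2$, so the penalty is continuous with a continuous derivative.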
Essentially, this cost function is quadratic where the noise samples are most concentrated and linear where the outliers reside. Problem $P_1$ is an extreme case of $P_1^r$ when one uses $\rho(r) = r^2$. We note that the proposed objective function $f(x)$ is a convex function; hence, any local minimum is also a global minimum. Furthermore, under typical CS settings, where the CS matrix $\Phi$ is drawn from random ensembles and its dimensions are practically meaningful with $N \gg M \gg 1$, it can be shown that the proposed problem has a unique solution with a probability of almost one.³

³A proof is available upon request.

In theory, it might be possible to use a smooth version of $\|x\|_1$ such as $\sum_{i=1}^{n} x_i^2 / \sqrt{\mu^2 + x_i^2}$ to convert (5) to a simple objective function that can be solved by the gradient method. However, as $x$ is likely to be sparse and the Hessian is ill-conditioned, this approach suffers from extremely slow convergence and larger errors [24]. In what follows, we propose a computationally efficient method to solve (5) by exploiting the power of existing CS solvers for (3).

C. Proposed Algorithm

Compared with the standard CS formulation (3), the proposed robust CS formulation (5) is different only in the choice of $\rho(x)$. Existing methods often exploit the specific nature of the CS formulation. For example, the gradient projection method [14] converts the original CS formulation to a bound-constrained quadratic programming problem. However, it appears difficult to do so with the robust CS formulation because of the subquadratic part $\rho$.

We deviate from the approaches that solve (5) directly. Rather, we rely on a popular technique in optimization, which is known by different names such as bound optimization, surrogate functions, or explicit proximal methods [1]. It is now better known as the MM framework [21]. Our proposed approach is to leave the $\ell_1$-norm term in (5) intact and majorize the loss function $g(x)$ with a suitable quadratic majorization $l(x)$ recursively so that the familiar form of standard CS can be realized. The main idea of the MM framework is to consider recursively a series of problems, i.e., upper bounds of the original problem, such that those alternative problems are easier to solve and that the solutions of these problems converge to the solution of the original problem.

Two related works by Nesterov [29] and Beck and Teboulle [3] solve a general CS formulation. They both propose MM algorithms where the quadratic majorization is isotropic; thus, they are slightly more restrictive than what is described here. In both [3] and [29], an increased computational burden is spent on optimizing the surrogate function to find the optimal Lipschitz constant. Here, we follow the approach in robust statistics of explicitly specifying the majorization. This leads to algorithmic simplicity and stability. We note, however, that both [29] and [3] have extended the basic MM framework in that the update rule also involves historical points to speed up the convergence. As a result, they both achieve quadratic convergence. Here, we only use the basic update rule of the MM framework, and linear convergence is obtained. While it appears likely that the same strategy can also be used in our algorithm to achieve quadratic convergence, this is beyond the scope of this paper. Our main goal is to prove that the new formulation is solvable with existing CS machinery and to provide more theoretical insights via statistical analysis.

Majorization of the Robust Loss Function: We note that the Huber penalty function has a quadratic behavior for small residuals and a linear behavior for large residuals, and its curvature at any point should not exceed the curvature of a quadratic function. The point of interest at the outer iteration $k$, where the precise meaning of $k$ will be clearer in the discussion to follow, is denoted as $\hat{x}^{(k)}$. It is always possible to majorize $g(x)$ at $\hat{x}^{(k)}$ by the following quadratic function $l^{(k)}(x)$, given as

$$ l^{(k)}(x) = g(\hat{x}^{(k)}) + \tfrac{1}{2}\,(x - \hat{x}^{(k)})^T \Phi^T W \Phi\, (x - \hat{x}^{(k)}) - (x - \hat{x}^{(k)})^T \Phi^T \psi(y - \Phi\hat{x}^{(k)}), \tag{8} $$

where we have used notation $\psi(r) = [\psi(r_1), \ldots, \psi(r_n)]^T$. The majorization can be verified easily by three facts: First, $l^{(k)}(\hat{x}^{(k)}) = g(\hat{x}^{(k)})$; second, $\nabla l^{(k)}(x)|_{x=\hat{x}^{(k)}} = \nabla g(x)|_{x=\hat{x}^{(k)}} = -\Phi^T \psi(y - \Phi\hat{x}^{(k)})$; and last, for the Hessian, we have

$$ \nabla\nabla^T l^{(k)}(x)|_{x=\hat{x}^{(k)}} - \nabla\nabla^T g(x)|_{x=\hat{x}^{(k)}} = \Phi^T \left( W - \mathrm{diag}\left\{ \psi'\big(y_i - (\Phi\hat{x}^{(k)})_i\big) \right\} \right) \Phi. \tag{9} $$

Due to the fact that $|\psi'(x)|$ is bounded, which is a property of robust penalty functions, it is always possible to find $W$ such that $g(x)$ is majorized at $\hat{x}^{(k)}$ by $l^{(k)}(x)$.

We note importantly that (8) can be rearranged after straightforward manipulation so that it becomes the familiar quadratic form found in standard CS as follows:

$$ l^{(k)}(x) = \tfrac{1}{2}\,(v^{(k)} - \Phi x)^T W (v^{(k)} - \Phi x) + C \tag{10} $$

where

$$ C = g(\hat{x}^{(k)}) - \tfrac{1}{2}\,\psi(y - \Phi\hat{x}^{(k)})^T W^{-1} \psi(y - \Phi\hat{x}^{(k)}) \tag{11} $$

$$ v^{(k)} = W^{-1}\psi(y - \Phi\hat{x}^{(k)}) + \Phi\hat{x}^{(k)}. \tag{12} $$

Successive Standard CS Approximation: Returning to (5), as $g(x)$ is majorized by $l^{(k)}(x)$ at $\hat{x}^{(k)}$, it follows that $f(x)$ is majorized by

$$ h^{(k)}(x) = l^{(k)}(x) + \lambda\|x\|_1 \tag{13} $$

at $\hat{x}^{(k)}$. In the MM framework, we optimize $h^{(k)}(x)$ instead of optimizing $f(x)$ directly. Suppose that the minimizer of $h^{(k)}(x)$ is $\hat{x}^{(k+1)}$; then, the generated sequence $\{\hat{x}^{(k)}\}$ can be shown to converge to the solution of (5). The intuition is illustrated in Fig. 1. The minimization of $h^{(k)}(x)$ leads to $\hat{x}^{(k+1)}$, which then becomes the next iterative point for majorizing $f(x)$. This procedure leads to the iterative minimization of $f(x)$. Indeed, by the definition of majorization, we have $h^{(k)}(x) \ge f(x)$. According to the definition of $\hat{x}^{(k+1)}$, it follows that

$$ f(\hat{x}^{(k)}) = h^{(k)}(\hat{x}^{(k)}) \ge h^{(k)}(\hat{x}^{(k+1)}) \ge f(\hat{x}^{(k+1)}) \tag{14} $$

where the equality only occurs at the global optimum. Since $f(x)$ is convex and bounded from below, this implies $\lim_{k\to\infty} f(\hat{x}^{(k)}) = \min_x f(x)$; thus, $\{\hat{x}^{(k)}\}$ converges to the global optimum.

Choice of $W$: Roughly speaking, a majorization that is closer to the actual function is better. Two popular choices in the robust statistics literature [20] are the following:

1) The modified residual (MR) method, i.e., $W = \beta I$, where $\beta \ge \max |\psi'(x)|$;

2) The iteratively reweighted least-squares (IRLS) method, i.e., $W$ is a diagonal matrix with entries $w_{ii}^{(k)} = \psi(r_i^{(k)})/r_i^{(k)}$ and residuals $r^{(k)} = y - \Phi\hat{x}^{(k)}$ when $\psi(r)/r$ is a monotonically decreasing
function for $r \ge 0$, such as the case of Huber's penalty function.

Fig. 1. Comparison function illustration.

We now show how existing CS solvers can be used readily to solve the robust CS formulation.

Proposition 1: Suppose that the MR choice $W = \beta I$ is made. Problem $(P_1^r)$ can be solved iteratively as follows. Starting from some initial estimate $\hat{x}^{(0)}$, the update at iteration $k$ is a solution of the following standard CS problem:

$$ \hat{x}^{(k+1)} = \arg\min_{x \in \mathbb{R}^N} \|v^{(k)} - \Phi x\|_2^2 + \frac{2\lambda}{\beta}\|x\|_1 \tag{15} $$

where $v^{(k)} = (1/\beta)\,\psi(y - \Phi\hat{x}^{(k)}) + \Phi\hat{x}^{(k)}$. Sequence $\{\hat{x}^{(k)}\}$ converges to the global solution of (5).

The proof of this result is simply a verification. If one uses the IRLS choice for $W$, the iterative procedure instead solves

$$ \hat{x}^{(k+1)} = \arg\min_{x} \,(v^{(k)} - \Phi x)^T W^{(k)} (v^{(k)} - \Phi x) + 2\lambda\|x\|_1 $$

where $W^{(k)}$ needs to be recalculated at each iteration. For the MR choice, many existing CS solvers can be used without modification. If, on the other hand, IRLS is chosen, one may need to modify the CS solver in the case where, e.g., the CS matrix $\Phi$ is not defined explicitly for efficient implementation, such as in the partial discrete cosine transform (DCT) matrix. The other difference is that the MR majorization is global (when $\beta \ge \max_x |\psi'(x)|$), whereas the IRLS may not be so; hence, backtracking might be required with the IRLS choice [5]. Overall, IRLS requires fewer outer iterations than MR for a given convergence criterion.

Convergence Rate: We note that each step of the proposed algorithm is also an optimization problem (15), whose convergence rate is largely dependent on which CS solver is used. Here, we are more interested in the convergence rate of the outer loop, i.e., how fast the sequence $\{\hat{x}^{(k)}\}$ generated in (15) converges to the global optimum in (5), assuming that (15) can be solved with high accuracy. Clearly, this convergence rate depends on how closely $g(x)$ is approximated by $l(x)$. Being an MM algorithm, the proposed method inherits the convergence property of typical MM algorithms, which is linear [21]. By following the same approach as in [29], one can obtain the following result for the MR choice.

Proposition 2: Suppose that $x^*$ is the solution of (5) and that the level sets of $f(x)$ are bounded such that

$$ \|x - x^*\| \le R \quad \forall x \in X : f(x) \le f(\hat{x}^{(0)}) \tag{16} $$

where $\hat{x}^{(0)}$ is the initial point. Then, sequence $\hat{x}^{(k)}$ converges linearly according to

$$ f(\hat{x}^{(k)}) - f(x^*) \le \frac{2\beta R^2}{k + 2}. \tag{17} $$

The proof of this result is based on arguments similar to those in [29] and, for completeness, is included in the Appendix. We also note that, as the objective function is strictly convex and bounded from below, the assumption of bounded level sets in the proposition is satisfied for the problem being considered.

III. ANALYSIS

A. Optimality

The choice of the regularization parameter $\lambda$, which is a model selection problem, is crucial for a convex relaxation technique to work properly. Typically, this problem is solved for a set of $\lambda$, which is often called the regularization path, and the value of the objective function is examined carefully to select the optimal value. The optimal value of $\lambda$ is typically selected when constraints in the formulation are satisfied. For example, in the standard CS formulation (3), the regularization parameter is chosen such that $\|y - \Phi\hat{x}\|_2 \le \epsilon$, where $\epsilon = \|n\|_2$. However, the actual $\epsilon$ is not known and is typically estimated using, e.g., the concentration property of Gaussian random variables. In other words, given an estimate of the noise variance $\sigma^2$, it can be shown that $\|n\|_2 \le \sigma\sqrt{M + 2\sqrt{2M}}$ with high probability; thus, $\|y - \Phi\hat{x}\|_2 \le \sigma\sqrt{M + 2\sqrt{2M}}$ is commonly used to select the regularization parameter [7]. The noise variance $\sigma^2$ can be estimated using standard statistical methods.

To extend the selection method to the robust CS formulation, instead of using $\|n\|_2$ as the bound on the $\ell_2$ norm of the residual, we propose to use $2\sigma^2 g(x)$ to bound $2\sigma^2 g(\hat{x})$, where $g(\cdot)$ is defined right after (5). If one sets $k$ to be large enough, $2\sigma^2 g(x) \to \|n\|_2^2$, and this approaches the selection criterion in the standard CS formulation. To compute this bound, the scale parameter $\sigma$ has to be estimated by using, e.g., the median absolute deviation from the median as follows:

$$ \mathrm{MAD}(n) = 1.4826\ \mathrm{median}_i\, |n_i - \mathrm{median}_j(n_j)| \tag{18} $$

which is a standard robust statistical method [20]. To ensure that the majority of normal noise samples are bounded with high probability in a similar manner to the standard CS case, we use $\hat{\sigma} = c\,\mathrm{MAD}(n)$, where constant $c = 2$ is selected. It can be seen that, if $c$ is chosen to be very large, the robust CS formulation approaches the standard CS formulation.

Along the regularization path, it is widely known that, when $\lambda$ is large, the solution of the standard CS (and also of robust CS) tends to zero. This upper bound on $\lambda$ is useful in practice to set up the regularization path. This value can be found from the optimality conditions, i.e., the gradient at a point is zero. However, as the objective function is not differentiable at point $x = 0$, we need to use subdifferential calculus instead [23]. It follows that condition $x = 0$ is optimal when

$$ \lambda \ge \|\Phi^T \psi(y)\|_\infty. \tag{19} $$

Therefore, in practice, the regularization path should be $[0, \lambda_{\max}]$, where $\lambda_{\max} = \|\Phi^T \psi(y)\|_\infty$.
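The outer iteration of Proposition 1, the MAD scale estimate (18), and the upper end of the regularization path (19) can be sketched together as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes the MR weight as a multiple $\beta = 1/\sigma^2$ of the identity (the largest slope of the Huber derivative in the soft-limiter form), and it uses a plain iterative soft-thresholding loop as a stand-in for a dedicated CS solver. All function names are hypothetical.

```python
import numpy as np

def huber_rho(r, sigma, k):
    # Huber penalty: quadratic for |r| <= k*sigma^2, linear beyond (assumed form).
    quad = r**2 / (2.0 * sigma**2)
    lin = k * np.abs(r) - 0.5 * k**2 * sigma**2
    return np.where(np.abs(r) <= k * sigma**2, quad, lin)

def huber_psi(r, sigma, k):
    # Derivative of the Huber penalty: r / sigma^2 limited to [-k, k].
    return np.clip(r / sigma**2, -k, k)

def soft(u, t):
    # Soft-thresholding, the proximal operator of t * ||.||_1.
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def mad_scale(n):
    # Median absolute deviation from the median, scaled as in (18).
    n = np.asarray(n, dtype=float)
    return 1.4826 * np.median(np.abs(n - np.median(n)))

def objective(x, Phi, y, lam, sigma, k):
    # f(x) = sum_i rho(y_i - (Phi x)_i) + lam * ||x||_1.
    return float(np.sum(huber_rho(y - Phi @ x, sigma, k)) + lam * np.sum(np.abs(x)))

def lambda_max(Phi, y, sigma, k):
    # Smallest lam for which x = 0 is optimal, as in (19).
    return float(np.max(np.abs(Phi.T @ huber_psi(y, sigma, k))))

def robust_cs_mr(Phi, y, lam, sigma, k, outer=30, inner=50):
    """MM iteration with the modified-residual majorization W = beta * I.

    Each outer step builds the pseudo-measurement v and approximately solves
    the standard problem ||v - Phi x||^2 + (2 lam / beta) ||x||_1 with ISTA
    (an assumed stand-in for any standard CS solver), warm-started at x.
    """
    beta = 1.0 / sigma**2                 # upper bound on psi' for this penalty
    L = np.linalg.norm(Phi, 2) ** 2       # squared spectral norm of Phi
    x = np.zeros(Phi.shape[1])
    history = [objective(x, Phi, y, lam, sigma, k)]
    for _ in range(outer):
        v = Phi @ x + huber_psi(y - Phi @ x, sigma, k) / beta
        for _ in range(inner):
            grad = Phi.T @ (Phi @ x - v)  # gradient of 0.5 * ||v - Phi x||^2
            x = soft(x - grad / L, lam / (beta * L))
        history.append(objective(x, Phi, y, lam, sigma, k))
    return x, history
```

Because each ISTA step with step size $1/L$ does not increase the surrogate and the surrogate upper-bounds $f$, the recorded objective values are non-increasing even when the inner problem is solved only approximately, which mirrors the descent chain in (14). Choosing $\lambda$ above `lambda_max(...)` leaves the iterate at zero, which gives a quick sanity check when setting up a regularization path.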
B. Bounds on Recovery Error

To study the recovery error of constrained $\ell_1$ minimization for the signal model (1), it is more convenient to express the recovery as

$$ \min_{x \in \mathcal{C}} \|x\|_1 \tag{20} $$

where $\mathcal{C}$ denotes the constraint set. For the standard CS recovery, the constraint set is defined as

$$ \mathcal{C} = \{x : \|y - \Phi x\|_2 \le \epsilon\} \tag{21} $$

where $\epsilon$ is a suitable upper bound on the $\ell_2$ norm of the noise, i.e., $\|n\|_2 \le \epsilon$, which is already discussed in the preceding section. For the proposed robust CS, the constraint set is defined as

$$ \mathcal{C} = \{x : g(x) \le t\} \tag{22} $$

for some suitable $t$, as discussed in the preceding section. Here, $g(x)$ is the subquadratic cost function in (4). For notational simplicity, we introduce notation $\|y - \Phi x\|_*^2 = 2\sigma^2 g(x)$ so that one can view $\|\cdot\|_*$ as a subquadratic vector norm in a loose sense (this norm depends also on $\sigma$ and $k$). Due to the definition of Huber's penalty function, we note importantly that $\|x\|_* \le \|x\|_2$ for any vector $x$. Hence, the above constraint set can be written as

$$ \mathcal{C} = \{x : \|y - \Phi x\|_*^2 \le \epsilon_r^2\} \tag{23} $$

where $\epsilon_r^2$ is a suitable upper bound of the squared subquadratic norm of the noise $n$. Due to the definition of the subquadratic norm, there exists $\epsilon_r$ such that

$$ \epsilon_r \le \epsilon, \qquad \|n\|_* \le \epsilon_r. \tag{24} $$

Then, by denoting $\hat{x}$ as the solution of the proposed robust CS recovery, it also satisfies

$$ \|y - \Phi\hat{x}\|_* \le \epsilon_r. \tag{25} $$

For the sake of the following discussion, we assume that such $\sigma$, $k$, $\epsilon$, and $\epsilon_r$ have been specified properly (as discussed in the preceding section) such that the above inequalities hold. We also assume that the error bound of CS recovery is consistent with the proof in [6]. The following result indicates that the robust CS recovery has a lower error bound than that of the standard CS recovery.

Theorem 1: In the presence of impulsive noise, the robust CS recovery has an error bound less than or equal to that of the standard CS recovery.

Proof: We start from an important observation that, for a given signal model $y = \Phi x + n$ with fixed $\Phi$, the $\ell_2$-norm error bound of standard CS recovery is proportional to $\|n\|_2$ by a universal constant that depends only on the RIP property of $\Phi$ [6]. This implies that, if we have another signal model $v = \Phi x + w$, where $\|w\|_2 \le \|n\|_2$, then the error bound of the latter model is less than that of the former model. Please note that this is only for the error bound and that no statement is made on the actual error. Our strategy is to prove by construction that the robust CS recovery is equivalent to the standard CS recovery with noise $w$ whose $\ell_2$ norm is smaller than that of the original noise $n$.

Indeed, we first recall from Section II that the proposed robust CS is convex and has a unique solution with a probability of almost 1. Suppose that the robust CS solution is $\hat{x}$, and define $z = \Phi(x - \hat{x})$. Similar to the MM framework and to approximate the objective function, we find an equivalent signal model at solution $\hat{x}$. While we do not specify what $\hat{x}$ is, the existence of such a signal model and the above observation allow us to conclude on the error bound of the robust CS recovery of the original signal model.

We note that, under the robust CS formulation, the following inequalities hold:

$$ \|n\|_*^2 \le \epsilon_r^2, \qquad \|n + z\|_*^2 \le \epsilon_r^2. \tag{26} $$

Next, we define the following index sets:

$$ I_1 = \{i : |n_i + z_i| \le k\sigma^2\}, \qquad I_2 = \{i : |n_i + z_i| > k\sigma^2\}. \tag{27} $$

We propose to construct the elements of $w$ as follows:

1) For $i \in I_1$, we note that, because $|n_i + z_i| \le k\sigma^2$, then $2\sigma^2\rho(n_i + z_i) = (n_i + z_i)^2$. Hence, we set $w_i = n_i$, which implies that $(w_i + z_i)^2 = 2\sigma^2\rho(n_i + z_i)$.

2) For $i \in I_2$, without loss of generality, we assume that $n_i + z_i > 0$ (case $n_i + z_i < 0$ is proven similarly). The motivation of the construction of $w_i$ in this case is illustrated in Fig. 2. Here, we aim to find $w_i$ such that $(w_i + z_i)^2 = 2\sigma^2\rho(n_i + z_i)$. It can be seen easily that, with $w_i = \sqrt{2\sigma^2\rho(n_i + z_i)} - z_i$, we have $(w_i + z_i)^2 = 2\sigma^2\rho(n_i + z_i)$ and $w_i \le n_i$.

Fig. 2. Illustration for the construction proof when $n_i + z_i > 0$.

From this construction, it follows that, by taking the sums over indices $I_1$ and $I_2$:

$$ \|w\|_2^2 = \sum_{i \in I_1} w_i^2 + \sum_{i \in I_2} w_i^2 \tag{28} $$

$$ \begin{aligned} \|w + z\|_2^2 &= \sum_{i \in I_1} (w_i + z_i)^2 + \sum_{i \in I_2} (w_i + z_i)^2 \\ &= \sum_{i \in I_1} (n_i + z_i)^2 + \sum_{i \in I_2} 2\sigma^2\rho(n_i + z_i) \\ &= \sum_{i \in I_1} 2\sigma^2\rho(n_i + z_i) + \sum_{i \in I_2} 2\sigma^2\rho(n_i + z_i) \\ &= \|n + z\|_*^2. \end{aligned} \tag{29} $$

Then, we have

$$ \|w\|_2 \le \tilde{\epsilon}, \qquad \|w + z\|_2 \le \epsilon_r \tag{30} $$
where $\eta$ is such that $\|n\|_2 \le \eta$. The existence of such $w$ indicates that the error bound is improved. Indeed, applying the triangle inequality, we have

$$\|z\|_2 \le \|w\|_2 + \|w + z\|_2. \tag{31}$$

Then, it follows from [6] that the $\ell_1$-norm minimization solution $\hat{x}_1$ satisfies

$$\|\hat{x}_1 - x\|_2 \le C \|\Phi(\hat{x}_1 - x)\|_2 \tag{32}$$

where $C = 2\sqrt{1 + \delta_{2S}} / (1 - (1 + \sqrt{2})\delta_{2S})$ is a universal constant, $\delta_{2S}$ is a $2S$-RIP constant of $\Phi$, and $S$ is the number of nonzero entries in $x$. Together with definition $z = \Phi(\hat{x} - x)$, it follows that the error bound of robust CS is $C(\eta + \zeta)$. Comparing with the error bound of CS as $2C\eta$ and noting that $\zeta \le \eta$, the proof follows.

C. Discussion

While the above result indicates that doing a robust CS recovery is likely to yield smaller error, it does not indicate what the gain would depend on. Here, we aim to answer this question. To do so, we need to characterize the noise more explicitly. There are many statistical models for impulsive noise. For simplicity, we consider a two-term Gaussian mixture model whose probability distribution function is

$$f = (1 - \epsilon)\,\mathcal{N}(0, \sigma^2) + \epsilon\,\mathcal{N}(0, \kappa\sigma^2). \tag{33}$$

The nominal background noise is represented by the first term, and the effect of impulsiveness is captured in the second term. This model has been used regularly in modeling practical environmental noise [32] as it serves as an approximation to Middleton's Class A noise model. While $\epsilon$ denotes the portion of outliers in the noise, $\kappa$ indicates their strength. The total noise variance is $\sigma_{\text{total}}^2 = (1 - \epsilon)\sigma^2 + \epsilon\kappa\sigma^2$.

First, we use a conservative inequality to lower bound the improvement as follows:

$$2C\eta - C(\eta + \zeta) \ge C(\eta - \zeta). \tag{34}$$

As $C$ is a constant, the error bound improvement of $O(\eta - \zeta)$ depends only on the gap between $\eta$ and $\zeta$, which in turn depends on the impulsiveness of the noise. Under the assumption of the Gaussian mixture model for impulsive noise, it can be shown easily by computing $E[n_i^2]$ and the expectation of the robust cost of $n_i$ that

$$\eta^2 = O\big((1 - \epsilon)\sigma^2 + \epsilon\kappa\sigma^2\big) \tag{35}$$

$$\zeta^2 = O\big((1 - \epsilon)\sigma^2 + \epsilon k (2\sqrt{\kappa\sigma^2} \cdot 0.8 - k)\big) \le O\big(\sigma^2 (1 - \epsilon + 1.6\,\epsilon k \sqrt{\kappa}/\sigma)\big) \tag{36}$$

where we have made use of the fact that, for a zero-mean Gaussian random variable $Z \sim \mathcal{N}(0, \sigma^2)$, $E[|Z|] = \sigma\sqrt{2/\pi} \approx 0.8\sigma$, and that $k < \sqrt{\kappa}\,\sigma$ under an impulsive noise assumption. Thus, applying the following inequality for $a, b, c \ge 0$ and $b \ge c$:

$$\sqrt{a + b} - \sqrt{a + c} \ge \frac{b - c}{2\sqrt{a + b}} \tag{37}$$

one obtains

$$\eta - \zeta \ge O\!\left(\frac{\epsilon\kappa\sigma\,\big(1 - 1.6k/(\sigma\sqrt{\kappa})\big)}{2\sqrt{1 - \epsilon + \epsilon\kappa}}\right). \tag{38}$$

Then, under the assumption that $\kappa$ is large and $\epsilon$ is small, we can approximate the term on the right by $O(\epsilon\kappa\sigma)$. This implies that the gain is related directly to the percentage of the outliers in the noise $\epsilon$ as well as the strength of the outliers $\kappa$.

IV. EXPERIMENTS

We present a number of numerical studies to demonstrate the robustness of the proposed formulation in CS applications where there are outliers in the noise. We use model (33) for modeling impulsive noise. The SNR is defined as $10 \log(1/\sigma_{\text{total}}^2)$.

For the CS settings, we consider the measurement matrix $\Phi$ to be a partial DCT matrix, which is obtained by randomly selecting $M$ out of $N$ rows of the full DCT matrix. The actual realization of the partial DCT matrix is known to be not important in CS applications due to the concentration property of the random ensemble constructed from the DCT matrix. The choice of $M$ follows the common practice of CS [7], i.e., $M$ is at least 3–5 times the intrinsic dimension, which is the minimum number of (significant) nonzero coefficients necessary to represent an image under some basis functions. In all experiments, we choose $M = 0.5N$ to ensure that this CS requirement is always met. We also select the l1_ls algorithm [23] as the solver for the standard CS formulation as this algorithm produces very accurate results due to the nature of its interior-point method. For the l1_ls algorithm, we set the relative dual gap of 1% as the convergence criterion. In addition, we do not consider the postprocessing of the results as it may further introduce errors [14].

We note importantly that, due to the projection mechanism of CS sensing, i.e., mapping from the high-dimensional space $\mathbb{R}^N$ to the low-dimensional space $\mathbb{R}^M$, the CS measurement $y$ tends to have noise-like appearance and does not carry directly the semantic meaning similar to what the original image does. Thus, it may be difficult, if not impossible, to localize the strongly corrupted samples simply by examining $y$. Even if the corrupted samples can be localized, one may have no other options (e.g., interpolation or imputation) but to discard the bad samples, which is not efficient for recovery.

A. Recovery of the Random-Bar Image

We consider a synthetic image of random bars, as shown in Fig. 3, which has been used in the previous CS literature [22]. The Haar wavelets are used as the basis functions because the image is sparse in this transform domain. In particular, with a three-level Haar wavelet decomposition, the number of nonzero wavelet coefficients is 5% of the dimension of the image. We obtain the CS measurements on this image, recover the Haar wavelet coefficients, and compare the recovered image with the original one in a manner similar to that of previous work [22], [31].

First, we repeat the recovery over a number of runs where only the Gaussian noise varies and measure the average output peak SNR (PSNR) of the recovered images using the standard CS and the proposed robust CS formulations. Here, the regularization parameter is computed automatically, as discussed above. In the presence of Gaussian noise, the dependence of the output PSNR on the amount of noise is depicted in Fig. 4. Clearly, the performance of the proposed CS formulation is almost equivalent to that of the standard CS formulation. When the noise power is smaller, both formulations approach perfect CS recovery, which is exhibited by high output PSNRs.

Next, we compute similarly the average output PSNR when impulsive noise is present instead. The fraction of outliers in impulsive noise is $\epsilon = 0.1$, and its strength is $\kappa = 100$. The PSNR performance is shown in Fig. 4. Clearly, the proposed robust CS formulation outperforms consistently the standard CS formulation by several decibels over the simulated range for the noise. At high SNRs, the gap between the proposed CS formulation and the standard CS formulation appears to
become smaller, possibly due to the robustness of $\ell_1$-norm regularization to small noise. Typical recovered images for both formulations are shown in Fig. 3. The proposed CS formulation yields images that are typically less distorted than the others.

To examine the main result in the theorem on the gain of the proposed formulation versus impulsiveness, we fix the noise at SNR = 10 dB, vary parameter $\kappa$ from 100 to 1000 (extremely impulsive), and measure the output PSNRs for both formulations. The results, which are depicted in Fig. 5, clearly show that, as expected, the standard CS formulation is dependent only on the noise power and not the characteristics of the noise (i.e., the approximately constant output PSNR over the impulsiveness range) and that the improvement in the PSNR of the proposed CS formulation is larger as the level of impulsiveness is increased.

Fig. 3. Random-bar example of CS recovery for the compared formulations in the presence of impulsive noise with SNR = 10 dB.

Fig. 4. Comparison of output PSNR (random bars).

Fig. 5. Influence of impulsiveness.

B. Recovery of an MRI Image

We consider an MRI image, as shown in Fig. 6. As shown in [25], MRI is very relevant to CS as the sensing cost can be reduced considerably without compromising the quality of the image. In contrast with the previous random-bar image where the Haar wavelet coefficients are truly sparse (5% nonzeros), the wavelet coefficients of this MRI image are not sparse but approximately follow an exponential decay. In the CS literature, this is often referred to as compressible, which reflects more accurately the characteristics of images in real life. The purpose of this paper is to examine the quality of the recovered images from CS samples under the influence of impulsive noise. To do so, we also use the partial DCT matrix to obtain the CS samples. The noise power is fixed at SNR = 20 dB. The impulsive noise is the same as the previous experiment: $\epsilon = 0.1$ and $\kappa = 100$. The recovered MRI images for both standard and robust CS formulations are shown in Fig. 6. Quantitatively, it is observed that the PSNR of the recovered image by the proposed CS formulation has a slightly higher value than that obtained by the standard CS formulation at an improvement of 1 dB. We also examine the residual image from both formulations, as shown in Fig. 7, and find that there are more hot spots in the residual images obtained from the standard CS formulation than for those obtained from the proposed formulation.

V. CONCLUSION

We have presented a new approach to improving CS recovery in the presence of impulsive noise. By using a robust cost function on the residuals, we are able to suppress large outliers in the measurement noise. This results in an improved recovery because the regularization parameter is not influenced significantly by these outliers, meaning that the recovered signal is not being driven further toward 0. We also show that an iterative algorithm can be developed readily under the MM framework to utilize the power and computational efficiency of the existing CS solvers to obtain the solution of the new formulation. Most importantly, we have established a theoretical guarantee on the improvement of the upper bound of the recovery error. The numerical studies on both synthetic and real images show that the proposed formulation achieves equivalent performance when the noise is indeed Gaussian, but an improvement is found when the noise is heavily impulsive. The proposed method can be used to improve further CS recovery upon an inspection of the residuals for impulsiveness.
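The experimental setup of Section IV (impulsive noise drawn from model (33) and sensing with a partial DCT matrix using M = 0.5N) can be illustrated with a minimal sketch. It is not the authors' code. The function names `dct_matrix`, `partial_dct`, `impulsive_noise`, and `huber` are hypothetical, the logarithm in the SNR definition is assumed to be base 10, the Huber form and the threshold `k = 1.345 * sigma` are assumptions that do not come from the text, and the sparse vector `x` is a toy stand-in for wavelet coefficients.

```python
import numpy as np

def dct_matrix(n):
    """Return the n x n orthonormal DCT-II matrix."""
    k = np.arange(n)[:, None]          # frequency index (rows)
    i = np.arange(n)[None, :]          # sample index (columns)
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] /= np.sqrt(2.0)            # rescale the DC row for orthonormality
    return d

def partial_dct(n, m, rng):
    """Randomly select m of the n rows of the full DCT matrix."""
    rows = np.sort(rng.choice(n, size=m, replace=False))
    return dct_matrix(n)[rows, :]

def impulsive_noise(m, snr_db, eps, kappa, rng):
    """Draw m samples from the two-term Gaussian mixture (33).

    The total variance is set from SNR = 10 log10(1 / total_var), and the
    nominal variance sigma2 follows from
    total_var = (1 - eps) * sigma2 + eps * kappa * sigma2.
    """
    total_var = 10.0 ** (-snr_db / 10.0)
    sigma2 = total_var / ((1.0 - eps) + eps * kappa)
    is_outlier = rng.random(m) < eps
    std = np.where(is_outlier, np.sqrt(kappa * sigma2), np.sqrt(sigma2))
    return std * rng.standard_normal(m), sigma2

def huber(t, k):
    """Standard Huber function (assumed form): quadratic near 0, linear in the tails."""
    a = np.abs(t)
    return np.where(a <= k, 0.5 * t ** 2, k * a - 0.5 * k ** 2)

rng = np.random.default_rng(0)
N = 256
M = N // 2                              # M = 0.5 N, as in the experiments
Phi = partial_dct(N, M, rng)

x = np.zeros(N)                         # toy sparse signal (assumption)
x[rng.choice(N, size=10, replace=False)] = 1.0

n, sigma2 = impulsive_noise(M, snr_db=10.0, eps=0.1, kappa=100.0, rng=rng)
y = Phi @ x + n                         # noisy CS measurements

k = 1.345 * np.sqrt(sigma2)             # assumed Huber threshold
ls_cost = np.sum(n ** 2)                # squared-error cost of the noise
huber_cost = np.sum(2.0 * huber(n, k))  # robust cost of the same noise
print(f"squared cost: {ls_cost:.3f}, robust cost: {huber_cost:.3f}")
```

Because the robust cost grows only linearly for large residuals, it never exceeds the squared cost on the same noise, and the difference widens as the outliers become stronger. This is the intuition behind the gap between the two error bounds discussed in Section III.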
Fig. 6. Recovery of an MRI image using standard and robust CS formulations.

Fig. 7. MRI residual images.

APPENDIX I

First, due to the convexity of $g(x)$, it holds that

$$g(x) \ge g(\hat{x}^{(k)}) + \nabla g(\hat{x}^{(k)})^T (x - \hat{x}^{(k)}). \tag{39}$$

Thus, according to the definition of $\hat{x}^{(k+1)}$

$$f(\hat{x}^{(k+1)}) \le \min_x \; f(x) + (\mu/2)\,\|x - \hat{x}^{(k)}\|_2^2. \tag{40}$$

It can be seen easily that the generated sequence $\{\hat{x}^{(k)}\}$ satisfies $f(\hat{x}^{(k)}) \le f(\hat{x}^{(0)})$ so that $\|\hat{x}^{(k)} - x^*\|_2 \le R$. Using the upper bound (40) over segment $[\hat{x}^{(k)}, x^*]$, we have

$$\begin{aligned}
f(\hat{x}^{(k+1)}) &\le \min_{\alpha \in [0,1]} \; f\big(\alpha x^* + (1 - \alpha)\hat{x}^{(k)}\big) + (\mu/2)\,\big\|\alpha x^* + (1 - \alpha)\hat{x}^{(k)} - \hat{x}^{(k)}\big\|_2^2 \\
&\le \min_{\alpha \in [0,1]} \; (\mu\alpha^2/2)\,\big\|\hat{x}^{(k)} - x^*\big\|_2^2 - \alpha\big(f(\hat{x}^{(k)}) - f(x^*)\big) + f(\hat{x}^{(k)}) \\
&\le \min_{\alpha \in [0,1]} \; \mu\alpha^2 R^2/2 - \alpha\big(f(\hat{x}^{(k)}) - f(x^*)\big) + f(\hat{x}^{(k)}).
\end{aligned}$$

Without the constraint, the right-hand side attains a minimum at $\alpha = (f(\hat{x}^{(k)}) - f(x^*))/(\mu R^2)$. However, as $\alpha \in [0, 1]$, it can be seen easily that, if $f(\hat{x}^{(0)}) - f(x^*) \ge \mu R^2$, then the optimal $\alpha = 1$; thus, $f(\hat{x}^{(1)}) - f(x^*) \le \mu R^2/2$. Otherwise, and for $k \ge 1$, we always have $f(\hat{x}^{(k)}) - f(x^*) \le \mu R^2$; thus

$$f(\hat{x}^{(k+1)}) - f(x^*) \le f(\hat{x}^{(k)}) - f(x^*) - \frac{\big(f(\hat{x}^{(k)}) - f(x^*)\big)^2}{2\mu R^2}.$$

From this and using the fact that $f(\hat{x}^{(k)}) - f(x^*) \ge f(\hat{x}^{(k+1)}) - f(x^*)$, one can derive easily the result on a linear convergence rate, as stated in the proposition.

ACKNOWLEDGMENT

The authors would like to thank Dr. B. Adams for proofreading the manuscript, and the anonymous reviewers for providing constructive feedback that helped improve technical clarity.

REFERENCES

[1] A. Auslender and M. Teboulle, “Interior gradient and proximal methods for convex and conic optimization,” SIAM J. Optim., vol. 16, no. 3, pp. 697–725, 2006.
[2] L. Bar, A. Brook, N. Sochen, and N. Kiryati, “Deblurring of color images corrupted by impulsive noise,” IEEE Trans. Image Process., vol. 16, no. 4, pp. 1101–1111, Apr. 2007.
[3] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imag. Sci., vol. 2, no. 1, pp. 183–202, Jan. 2009.
[4] J. Bobin, J.-L. Starck, and R. Ottensamer, “Compressed sensing in astronomy,” IEEE J. Sel. Topics Signal Process., vol. 2, no. 5, pp. 718–726, Oct. 2008.
[5] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[6] E. Candes, “The restricted isometry property and its implications for compressed sensing,” Compte Rendus de l’Academie des Sciences, Paris, Serie I, vol. 346, no. 9/10, pp. 589–592, May 2008.
[7] E. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.
[8] E. Candes and T. Tao, “Near-optimal signal recovery from random projections: Universal encoding strategies,” IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006.
[9] R. Chan, C.-W. Ho, and M. Nikolova, “Salt-and-pepper noise removal by median-type noise detectors and detail-preserving regularization,” IEEE Trans. Image Process., vol. 14, no. 10, pp. 1479–1485, Oct. 2005.
[10] P. Civicioglu, “Using uncorrupted neighborhoods of the pixels for impulsive noise suppression with ANFIS,” IEEE Trans. Image Process., vol. 16, no. 3, pp. 759–773, Mar. 2007.
[11] W. Dai and O. Milenkovic, “Subspace pursuit for compressive sensing signal reconstruction,” IEEE Trans. Inform. Theory, vol. 55, no. 5, pp. 2230–2249, May 2009.
[12] D. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[13] D. Donoho, Y. Tsaig, I. Drori, and J. Starck, Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit, 2006, Preprint.
[14] M. Figueiredo, R. Nowak, and S. Wright, “Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems,” IEEE J. Sel. Topics Signal Process., vol. 1, no. 4, pp. 586–597, Dec. 2007.
[15] R. Garg and R. Khandekar, “Gradient descent with sparsification: An iterative algorithm for sparse recovery with restricted isometry property,” in Proc. ICML, 2009, pp. 337–344.
[16] E. Hale, W. Yin, and Y. Zhang, A fixed-point continuation method for $\ell_1$-regularized minimization with applications to compressed sensing, Rice University (CAAM), Houston, TX, Tech. Rep. TR07-07, 2007.
[17] T. Hashimoto, “Bounds on a probability for the heavy tailed distribution and the probability of deficient decoding in sequential decoding,” IEEE Trans. Inf. Theory, vol. 51, no. 3, pp. 990–1002, Mar. 2005.
[18] J. Haupt and R. Nowak, “Compressive sampling vs. conventional imaging,” in Proc. ICIP, 2006, pp. 1269–1272.
[19] J. Haupt and R. Nowak, “Signal reconstruction from noisy random projections,” IEEE Trans. Inf. Theory, vol. 52, no. 9, pp. 4036–4048, 2006.
[20] P. J. Huber, Robust Statistics. New York: Wiley, 1981.
[21] D. Hunter and K. Lange, “A tutorial on MM algorithms,” Amer. Stat., vol. 58, no. 1, pp. 30–37, 2004.
[22] S. Ji, Y. Xue, and L. Carin, “Bayesian compressive sensing,” IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2346–2356, Jun. 2008.
[23] S.-J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, “A method for large-scale $\ell_1$-regularized least squares,” IEEE J. Sel. Topics Signal Process., vol. 4, no. 1, pp. 606–617, 2007.
[24] K. Koh, S.-J. Kim, and S. Boyd, “An interior-point method for large-scale $\ell_1$-regularized logistic regression,” J. Mach. Learn. Res., vol. 8, pp. 1519–1555, Dec. 2007.
[25] M. Lustig, D. Donoho, and J. M. Pauly, “Sparse MRI: The application of compressed sensing for rapid MR imaging,” Magn. Reson. Med., vol. 58, no. 6, pp. 1182–1195, Dec. 2007.
[26] S. G. Mallat and Z. Zhang, “Matching pursuit with time-frequency dictionaries,” IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3397–3415, Dec. 1993.
[27] D. Middleton, “Non-Gaussian noise models in signal processing for telecommunications: New methods and results for class A and class B noise models,” IEEE Trans. Inf. Theory, vol. 45, no. 4, pp. 1129–1149, May 1999.
[28] D. Needell and J. Tropp, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,” Appl. Comput. Harmon. Anal., vol. 26, no. 3, pp. 301–321, 2009.
[29] Y. Nesterov, Gradient methods for minimizing composite objective function, Catholic Univ. Louvain, Leuven, Belgium, Tech. Rep., 2007.
[30] C. L. Nikias and M. Shao, Signal Processing With Alpha-Stable Distributions and Applications. Hoboken, NJ: Wiley, 1995.
[31] Y. Tsaig and D. Donoho, “Extensions of compressed sensing,” Signal Process., vol. 86, no. 3, pp. 549–571, 2006.
[32] X. Wang and H. V. Poor, “Robust multiuser detection in non-Gaussian channels,” IEEE Trans. Signal Process., vol. 47, no. 2, pp. 289–305, Feb. 1999.
[33] P. Windyga, “Fast impulsive noise removal,” IEEE Trans. Image Process., vol. 10, no. 1, pp. 173–179, Jan. 2001.

Capacity Analysis For Orthogonal Halftone Orientation Modulation Channels

Orhan Bulan, Student Member, IEEE, Vishal Monga, Member, IEEE, and Gaurav Sharma, Senior Member, IEEE

Abstract—Halftone dot orientation modulation has recently been proposed as a method for data hiding in printed images. Extraction of data embedded with halftone orientation modulation is accomplished by computing, from the scanned hardcopy image, detection statistics that uniquely identify the embedded orientation. From a communications perspective, this data hiding setup forms an interesting class of channels with dot orientation as input and a vector of statistics as the output. This paper derives capacity expressions for these channels that allow for numerical evaluation of the capacity. Results provide significant insight for orientation modulation based print-scan resilient data hiding: the capacity varies significantly as a function of the image graylevel and experimentally observed error free data rates closely mirror the variation in capacity.

Index Terms—Capacity, halftone, hardcopy data embedding, orientation modulation channel.

I. INTRODUCTION

Hardcopy data embedding, i.e., data embedding in images that are intended to survive the print-scan process, continues to be an area of significant interest. Applications lie in document authentication, tamper prevention and detection, tracking/inventory control, and meta-data embedding. While hardcopy data embedding shares several generic concerns with robust watermarking, the major distinguishing factor is the presence of the print-scan distortion channel.

Continuous grayscale images are typically binarized or halftoned before printing. A large number of binary representations provide a perceptually acceptable representation of a given continuous image. The flexibility available in choosing among these binary patterns provides an avenue for data embedding. A significant class of methods [2]–[5] utilize oriented binary patterns for the purpose of embedding, including our recent proposal [5] upon which we base our subsequent discussion. Our method adapts classical clustered dot halftoning [6], which is used widely in laser printers, for hiding data in hardcopy images via halftone dot orientation modulation. Our modifications of the halftoning process

Manuscript received October 20, 2010; revised March 07, 2011 and April 25, 2011; accepted May 02, 2011. Date of publication May 19, 2011; date of current version December 16, 2011. This work was supported in part by a grant from Xerox Corporation and by a grant from New York State Office of Science, Technology Academic Research (NYSTAR) through the Center for Emerging and Innovative Sciences (CEIS). An earlier version of this paper was presented at the IEEE ICASSP, Las Vegas, NV, April 2008. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Stefan Winkler.

O. Bulan is with the Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY 14627-0126 USA (e-mail: bulan@ece.

V. Monga is with the Department of Electrical Engineering, Pennsylvania State University, University Park, PA 16802 USA. Part of this work was conducted while he was with Xerox Research Center, Webster, NY 14580 USA (e-mail:

G. Sharma is with the Department of Electrical and Computer Engineering, the Department of Biostatistics and Computational Biology, and the Department of Oncology, University of Rochester, Rochester, NY 14627-0126 USA (e-mail:

Color versions of one or more of the figures in this paper are available online at

Digital Object Identifier 10.1109/TIP.2011.2155078

1057-7149/$26.00 © 2011 IEEE