Blind Separation of Image Sources via Adaptive Dictionary Learning
Vahid Abolghasemi, Member, IEEE, Saideh Ferdowsi, Student Member, IEEE, and Saeid Sanei, Senior Member, IEEE

IEEE Transactions on Image Processing, vol. 21, no. 6, June 2012, pp. 2921-2930. Digital Object Identifier 10.1109/TIP.2012.2187530. Manuscript received July 16, 2011; revised November 4, 2011, and January 8, 2012; accepted February 4, 2012. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Bulent Sankur. V. Abolghasemi is with the School of Engineering and Design, Brunel University, Uxbridge UB8 3PH, U.K., and also with the Faculty of Engineering and Physical Sciences, University of Surrey, Guildford GU2 7XH, U.K. (e-mail: vabolghasemi@ieee.org). S. Ferdowsi and S. Sanei are with the Department of Computing, Faculty of Engineering and Physical Sciences, University of Surrey, Guildford GU2 7XH, U.K.

Abstract—Sparsity has been shown to be very useful in source separation of multichannel observations. However, in most cases, the sources of interest are not sparse in their current domain, and one needs to sparsify them using a known transform or dictionary. If such a priori knowledge about the underlying sparse domain of the sources is not available, then the current algorithms will fail to successfully recover the sources. In this paper, we address this problem and attempt to give a solution via fusing the dictionary learning into the source separation. We first define a cost function based on this idea and propose an extension of the denoising method in the work of Elad and Aharon to minimize it. Due to the impracticality of such a direct extension, we then propose a feasible approach. In the proposed hierarchical method, a local dictionary is adaptively learned for each source along with separation. This process improves the quality of source separation even in noisy situations. In another part of this paper, we explore the possibility of adding global priors to the proposed method. The results of our experiments are promising and confirm the strength of the proposed approach.

Index Terms—Blind source separation (BSS), dictionary learning, image denoising, morphological component analysis (MCA), sparsity.

I. INTRODUCTION

In signal and image processing, there are many instances where a set of observations is available and we wish to recover the sources generating these observations. This problem, which is known as blind source separation (BSS), can be presented by the following linear mixture model:

$$\mathbf{X} = \mathbf{A}\mathbf{S} + \mathbf{V} \qquad (1)$$

where $\mathbf{X} \in \mathbb{R}^{m \times t}$ is the observation matrix, $\mathbf{S} \in \mathbb{R}^{n \times t}$ is the source matrix, and $\mathbf{A} \in \mathbb{R}^{m \times n}$ is the mixing matrix (throughout the paper, we assume that the mixing matrix is column normalized). The additive $\mathbf{V}$ of size $m \times t$ represents the instrumental noise or the imperfection of the model. In BSS, the aim is to estimate both $\mathbf{A}$ and $\mathbf{S}$ from the observations. This problem does not have a unique solution in general. However, one can find a solution for (1) by imposing some constraints on the separation process and making the sources distinguishable. Independent component analysis (ICA) is one of the well-established methods; it exploits the statistical independence of the sources [2]. It also assumes that the sources are non-Gaussian and attempts to separate them by minimizing the mutual information.

Nonnegativity is another constraint that has been shown to be useful for source separation. In nonnegative matrix factorization, a nonnegative matrix is decomposed into a product of two nonnegative matrices. The nonnegativity constraint allows only additive combinations; hence, it can produce a part-based representation of data [3].

In particular, in image separation applications, several Bayesian approaches have been proposed [4]-[8]. In [4], Kayabol et al. adopted a Markov random field (MRF) model to preserve the spatial dependence of neighboring pixels (e.g., sharpness of edges) in a 2-D model. They provided a full Bayesian solution to the image separation using Gibbs sampling. Another MRF-based method has been proposed by Tonazzini et al. in [7]. They used a maximum a posteriori (MAP) estimation method for recovering both the mixing matrix and the sources, based on alternating maximization in a simulated annealing scheme. Their experimental results have shown that this scheme is able to increase the robustness against noise. Later, Tonazzini et al. [8] proposed to apply a mean field approximation to the MRF and used the expectation-maximization (EM) algorithm to estimate the parameters of both the mixing matrix and the sources. This method has been successfully used in document processing applications. In addition, in [9], the images have been modeled by hidden Markov fields with unknown parameters, and a Bayesian formulation has been considered. A fast Markov chain Monte Carlo (MCMC) algorithm has been proposed in that paper for joint separation and segmentation of mixed images. Kuruoglu et al. [10] have studied the detection of the cosmic microwave background (CMB) using image separation techniques. They exploited the spatial information in the images by modeling the correlation and interactions between pixels by MRFs. They proposed a MAP algorithm for estimation of the sources and the mixing matrix parameters.

Different from the MRF-based approaches for CMB detection in astrophysical images is the spectral matching ICA (SMICA) method proposed by Cardoso et al. [5]. In SMICA, the spectral diversity of the sources is exploited. This enables SMICA
to separate the sources even if they are Gaussian and noisy, provided that they have different power spectra. It is shown in [5] that this idea leads to an efficient EM algorithm, much faster than previous algorithms with a non-Gaussianity assumption. An extension of this method to the wavelet domain, called wSMICA, has been proposed by Moudden et al. [6].

Wavelet domain analysis for the image separation problem has also been shown to be advantageous in some other works. In [11], Ichir and Mohammad-Djafari assumed three different models of the wavelet coefficients, i.e., the independent Gaussian mixture model, the hidden Markov tree model, and the contextual hidden Markov field model. Then, they proposed an MCMC algorithm for joint blind separation of the sources and estimation of the mixing matrix and the hyperparameters. The work in [12] also uses the wavelet domain for source separation. In another work, Bronstein et al. [13] studied the problem of recovering a scene recorded through a semi-reflecting medium and proposed a technique called sparse ICA (SPICA). They proposed to apply the wavelet packet transform (WPT) to the mixtures prior to applying Infomax ICA. Their algorithm is suitable for the cases where all the sources can be sparsely represented in the wavelet domain.

Sparsity-based approaches to the BSS problem have received much attention recently. The term sparse refers to signals or images with a small number of nonzeros with respect to some representation bases. In sparse component analysis (SCA), the assumption is that the sources can be sparsely represented using a known common basis or dictionary. For instance, the aforementioned SPICA method [13] considers the wavelet domain for this purpose. In addition, the proposed methods in [14] and [15] use sparse dictionaries for the separation of speech mixtures. Nevertheless, there are many cases where each source is sparse in a different domain, which makes it difficult to directly apply SCA methods to their mixtures. Multichannel morphological component analysis (MMCA) [16] and generalized morphological component analysis (GMCA) [17] have been recently proposed to address this problem. The main assumption in MMCA is that each source can be sparsely represented in a specific known transform domain. In GMCA, each source is modeled as the linear combination of a number of morphological components, where each component is sparse in a specific basis. MMCA and GMCA are extensions of a previously proposed method, morphological component analysis (MCA) [18]-[21], to the multichannel case. In MCA, the given signal/image is decomposed into different morphological components subject to sparsity of each component in a known basis (or dictionary).

MMCA performs well where prior knowledge about the sparse domain of each individual source is available; however, what if such a priori knowledge is not available, or what if the sources are not sparsely representable using the existing transforms? One may answer these questions by referring to the dictionary learning framework, in which the aim is to find an overcomplete dictionary that can sparsely represent a given set of images or signals. Utilizing learned dictionaries in MCA has been shown to yield promising results [22], [23]. However, taking advantage of learned dictionaries in MMCA is still an open issue. One possible approach to using dictionary learning for MMCA is to learn a specific dictionary for each source from a set of exemplar images. This, however, is rarely possible, since in most BSS problems such training samples are not available.

In this paper, we adapt MMCA to those cases in which the sparsifying dictionaries/transforms are not available. The proposed algorithm is designed to adaptively learn the dictionaries from the mixed images within the source separation process. This method is motivated by the idea of image denoising using a dictionary learned from the corrupted image itself in [1]. Other extensions of the work in [1] have also been reported in [22], [24], and [25] for the purpose of image separation from a single mixture. In this paper, the multichannel case with more observations than sources is considered. We start by theoretically extending the denoising problem to BSS. Then, a practical algorithm is proposed for BSS without any prior knowledge about the sparse domains of the sources. The results indicate that adaptive dictionary learning, one dictionary for each source, enhances the separability of the sources.

A. Notation

In this paper, all parameters have real values although it is not explicitly mentioned. We use small and capital boldface characters to represent vectors and matrices, respectively. All vectors are represented as column vectors by default. For instance, the $i$th column and $j$th row of matrix $\mathbf{X}$ are represented by the column vectors $\mathbf{x}_i$ and $\mathbf{x}^j$, respectively. The vectorized version of matrix $\mathbf{X}$ is shown by vec$(\mathbf{X})$. The matrix Frobenius norm is indicated by $\|\cdot\|_F$. The transpose operation is denoted by $(\cdot)^T$. The $\ell_0$-norm, which counts the number of nonzeros, and the $\ell_1$-norm, which sums over the absolute values of all elements, are shown by $\|\cdot\|_0$ and $\|\cdot\|_1$, respectively.

B. Organization of the Paper

In the next section, the motivation behind the proposed method is stated. We start by describing the denoising problem and its relation to image separation. Then, the extension of single mixture separation to the multichannel observation scenario is argued. In Section III, a practical approach for this purpose is proposed. Section IV is devoted to discussing the possibility of adding global sparsity priors to the proposed method. Some practical issues are discussed in Section V. The numerical results are given in Section VI. Finally, this paper is concluded in Section VII.

II. FROM IMAGE DENOISING TO SOURCE SEPARATION

A. Image Denoising

Consider a noisy image corrupted by additive noise. Elad and Aharon [1] showed that if knowledge about the noise power is available, it is possible to denoise the image by learning a local dictionary from the noisy image itself. In order to deal with large images, they used small (overlapped) patches to learn such a dictionary. The obtained dictionary is considered local since it describes the features extracted from small patches.
Let us represent the noisy image by the vector $\mathbf{y}$ of length $t$. The unknown denoised image is also vectorized and represented as $\mathbf{x}$. The $k$th patch from $\mathbf{x}$ is shown by the vector $\mathbf{p}_k$ of $b$ pixels. For notational simplicity, the $k$th patch is expressed as the explicit multiplication of an operator $\mathbf{R}_k$ (a binary $b \times t$ matrix) by $\mathbf{x}$, i.e., $\mathbf{p}_k = \mathbf{R}_k\mathbf{x}$. (Practically, we apply a nonlinear operation to extract the patches from the image by sliding a mask of appropriate size over the entire image, similar to [1] and [22].) The overall denoising problem is expressed as

$$\{\hat{\mathbf{x}}, \hat{\mathbf{D}}, \hat{\boldsymbol{\alpha}}_k\} = \arg\min_{\mathbf{x},\mathbf{D},\boldsymbol{\alpha}_k} \; \lambda\|\mathbf{x}-\mathbf{y}\|_2^2 + \sum_k \mu_k\|\boldsymbol{\alpha}_k\|_0 + \sum_k \|\mathbf{D}\boldsymbol{\alpha}_k - \mathbf{R}_k\mathbf{x}\|_2^2 \qquad (2)$$

where the scalars $\lambda$ and $\mu_k$ control the noise power and sparsity degree, respectively. In addition, $\mathbf{D} \in \mathbb{R}^{b \times d}$ is the sparsifying dictionary that contains normalized columns (also called atoms), and the $\boldsymbol{\alpha}_k$ are sparse coefficient vectors of length $d$.

In the algorithm proposed by Elad and Aharon [1], $\mathbf{x}$ and $\mathbf{D}$ are respectively initialized with $\mathbf{y}$ and an overcomplete discrete cosine transform (DCT) dictionary. The minimization of (2) starts with extracting and rearranging all the patches of $\mathbf{x}$. The patches are then processed by K-SVD [26], which updates $\mathbf{D}$ and estimates the sparse coefficients $\boldsymbol{\alpha}_k$. Afterward, $\mathbf{D}$ and $\boldsymbol{\alpha}_k$ are assumed fixed, and $\mathbf{x}$ is estimated by computing

$$\hat{\mathbf{x}} = \Big(\lambda\mathbf{I} + \sum_k \mathbf{R}_k^T\mathbf{R}_k\Big)^{-1}\Big(\lambda\mathbf{y} + \sum_k \mathbf{R}_k^T\mathbf{D}\hat{\boldsymbol{\alpha}}_k\Big) \qquad (3)$$

where $\mathbf{I}$ is the identity matrix and $\hat{\mathbf{x}}$ is the refined version of $\mathbf{x}$. Again, $\mathbf{D}$ and $\boldsymbol{\alpha}_k$ are updated by K-SVD, but this time using the patches from $\hat{\mathbf{x}}$, which are less noisy. Such conjoined denoising and dictionary adaptation is repeated to minimize (2). In practice, (3) is computationally easy to obtain since $\lambda\mathbf{I} + \sum_k \mathbf{R}_k^T\mathbf{R}_k$ is diagonal, and the above expression can be calculated in a pixelwise fashion. It is shown that (3) is a kind of averaging using both noisy and denoised patches, which, if repeated along with the updating of the other parameters, will denoise the entire image [1]. However, in the sequel, we will try to find out whether this strategy is extendable to the cases where the noise is added to the mixtures of more than one image.

B. Image Separation

Image separation is a more complicated case of image denoising where more than one image is to be recovered from a single observation. Consider a single linear mixture of two textures with additive noise, $\mathbf{y} = \mathbf{x}_1 + \mathbf{x}_2 + \mathbf{v}$ (or $\mathbf{y} = a_1\mathbf{x}_1 + a_2\mathbf{x}_2 + \mathbf{v}$). The authors of [25] attempt to recover $\mathbf{x}_1$ and $\mathbf{x}_2$ using prior knowledge about two sparsifying dictionaries $\mathbf{D}_1$ and $\mathbf{D}_2$. They use a minimum mean-squared-error (MSE) estimator for this purpose. In contrast, the recent work in [24] does not assume any prior knowledge about the dictionaries. It rather attempts to learn a single dictionary from the mixture and then applies a decision criterion to the dictionary atoms to separate the images. In another recent work, Peyre et al. [22] presented an adaptive MCA scheme by learning the morphologies of the image layers. They proposed to use both adaptive local dictionaries and fixed global transforms (e.g., wavelet and curvelet) for image separation from a single mixture. Their simulation results show the effects of adaptive dictionaries on the separation of complex texture patterns from natural images.

All the related studies have demonstrated the advantages that adaptive dictionary learning can have on the separation task. However, there is still one missing piece, and that is considering such adaptivity for multichannel mixtures. Applying the idea of learning local dictionaries within the source separation of multichannel observations obviously has many benefits in different applications. Next, we extend the denoising method in [1] for this purpose.

C. Multichannel Source Separation

In this section, the aim is to extend the denoising problem (2) to multichannel source separation. Consider the BSS model introduced in Section I, and assume further that the sources of interest are 2-D grayscale images. The BSS model for 2-D sources can be represented by vectorizing all images and then stacking them to form $\mathbf{S}$. The BSS model (1) cannot be directly incorporated into (2) as it requires both the observations and the sources to be single vectors. Hence, we use the vectorized versions of these matrices in model (1), which can be obtained using the properties of the Kronecker product:

$$\mathbf{x} = (\mathbf{I} \otimes \mathbf{A})\mathbf{s} + \mathbf{v} \qquad (4)$$

(The matrix multiplication can be expressed as a linear transformation on matrices. In particular, vec$(\mathbf{ABC}) = (\mathbf{C}^T \otimes \mathbf{A})\,$vec$(\mathbf{B})$ for three matrices $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$. In our case, vec$(\mathbf{AS}) = (\mathbf{I} \otimes \mathbf{A})\,$vec$(\mathbf{S})$.)

In the above expression, $\mathbf{x}$ and $\mathbf{s}$ are column vectors of lengths $mt$ and $nt$, respectively. In addition, $\mathbf{v}$ is a column vector of length $mt$. $(\mathbf{I} \otimes \mathbf{A})$ is a block diagonal matrix of size $mt \times nt$, and $\otimes$ is the Kronecker product symbol. We consider the noiseless setting and modify (2) to

$$\{\hat{\mathbf{s}}, \hat{\mathbf{D}}, \hat{\boldsymbol{\alpha}}_k, \hat{\mathbf{A}}\} = \arg\min \; \lambda\|\mathbf{x} - (\mathbf{I} \otimes \mathbf{A})\mathbf{s}\|_2^2 + \sum_k \mu_k\|\boldsymbol{\alpha}_k\|_0 + \sum_k \|\mathbf{D}\boldsymbol{\alpha}_k - \mathbf{R}_k\mathbf{s}\|_2^2 \qquad (5)$$

It is clearly seen that the above expression is similar to (2) but with an extra mixing matrix to be estimated. In addition, the vectors $\mathbf{x}$ and $\mathbf{s}$ are much lengthier than those in (2), as they represent vectorized versions of multiple images and mixtures.
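The Kronecker identity behind (4) is easy to sanity-check numerically. The following numpy sketch (the dimensions and names are ours, chosen for illustration) verifies that vectorizing AS column by column agrees with multiplying vec(S) by the block diagonal matrix I ⊗ A:

```python
import numpy as np

# Check vec(AS) = (I kron A) vec(S), the identity used in (4).
# m mixtures, n sources, t pixels per vectorized image (illustrative sizes).
rng = np.random.default_rng(0)
m, n, t = 4, 2, 6
A = rng.standard_normal((m, n))
S = rng.standard_normal((n, t))

# Column-major (Fortran-order) vectorization matches the convention
# vec(ABC) = (C^T kron A) vec(B), here with B = S and C = I_t.
lhs = (A @ S).flatten(order="F")                     # vec(AS)
rhs = np.kron(np.eye(t), A) @ S.flatten(order="F")   # block diagonal in A

print(np.allclose(lhs, rhs))  # True
```

The block diagonal structure is also what makes the direct formulation costly: I ⊗ A has mt × nt entries, which is precisely why the practical scheme of Section III avoids ever forming it.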
The problem (5) can be minimized in an alternating scheme by keeping all but one unknown fixed at a time. The estimation of $\mathbf{D}$ and $\boldsymbol{\alpha}_k$ can be achieved using K-SVD, similar to [1]. In K-SVD, the dictionary is updated columnwise using singular value decomposition (SVD); the sparse coefficients are estimated using one of the common sparse coding techniques. However, the estimation of $\mathbf{s}$ is slightly different from that of $\mathbf{x}$ in (3) and needs more attention. It has a closed-form solution, which is obtained by taking the gradient of (5) and setting it to zero

$$-\lambda(\mathbf{I} \otimes \mathbf{A})^T\big(\mathbf{x} - (\mathbf{I} \otimes \mathbf{A})\mathbf{s}\big) + \sum_k \mathbf{R}_k^T(\mathbf{R}_k\mathbf{s} - \mathbf{D}\boldsymbol{\alpha}_k) = \mathbf{0} \qquad (6)$$

leading to

$$\hat{\mathbf{s}} = \Big(\lambda(\mathbf{I} \otimes \mathbf{A})^T(\mathbf{I} \otimes \mathbf{A}) + \sum_k \mathbf{R}_k^T\mathbf{R}_k\Big)^{-1}\Big(\lambda(\mathbf{I} \otimes \mathbf{A})^T\mathbf{x} + \sum_k \mathbf{R}_k^T\mathbf{D}\boldsymbol{\alpha}_k\Big) \qquad (7)$$

In order to estimate $\mathbf{A}$, we consider all unknowns except $\mathbf{A}$ fixed and simplify (5) by converting the first quadratic term into an ordinary matrix product, obtaining

$$\hat{\mathbf{A}} = \arg\min_{\mathbf{A}} \|\mathbf{X} - \mathbf{A}\mathbf{S}\|_F^2 \qquad (8)$$

The above minimization problem is easily solved using the pseudoinverse of $\mathbf{S}$ as

$$\hat{\mathbf{A}} = \mathbf{X}\mathbf{S}^T(\mathbf{S}\mathbf{S}^T)^{-1} = \mathbf{X}\mathbf{S}^{\dagger} \qquad (9)$$

The above steps (i.e., estimation of $\mathbf{D}$, $\boldsymbol{\alpha}_k$, $\mathbf{s}$, and $\mathbf{A}$) should be alternately repeated to minimize (5). However, the long expression (7) is not practically computable, particularly if the number of sources and observations is large. This is because of dealing with the huge matrix $(\mathbf{I} \otimes \mathbf{A})$. In addition, in contrast to the aforementioned denoising problem, the matrix to be inverted in (7) is not diagonal, and therefore the estimation of $\mathbf{s}$ cannot be calculated pixel by pixel, which makes the situation more difficult to handle. In the next section, a practical approach is proposed to solve this problem.

III. ALGORITHM

In order to find a practical solution for (5), we use a hierarchical scheme such as the one in MMCA [16]. To do this, the BSS model (1) is broken into rank-1 multiplications, $\mathbf{X} = \sum_{i=1}^n \mathbf{a}_i\mathbf{s}_i^T + \mathbf{V}$, and the following minimization problem is defined:

$$\min_{\mathbf{a}_i,\mathbf{s}_i,\mathbf{D}_i,\boldsymbol{\alpha}_{i,k}} \; \sum_{i=1}^n \Big( \lambda\|\mathbf{E}_i - \mathbf{a}_i\mathbf{s}_i^T\|_F^2 + \sum_k \mu_{i,k}\|\boldsymbol{\alpha}_{i,k}\|_0 + \sum_k \|\mathbf{D}_i\boldsymbol{\alpha}_{i,k} - \mathbf{R}_k\mathbf{s}_i\|_2^2 \Big) \qquad (10)$$

where $\mathbf{D}_i$ denotes the dictionary corresponding to the $i$th source $\mathbf{s}_i$, and $\mathbf{E}_i$ is the $i$th residual, expressed as

$$\mathbf{E}_i = \mathbf{X} - \sum_{j \neq i} \mathbf{a}_j\mathbf{s}_j^T \qquad (11)$$

It is important to note that, with this new formulation, we have to learn $n$ dictionaries, one for each source, different from (5). The advantage of this scheme is learning adaptive source-specific dictionaries, which improves source diversity, as also seen in MMCA using known transform domains [21]. In addition, the cost function in (10) does not require dealing with cumbersome matrices and vectors [as in (5)] and allows calculating the image sources in a pixelwise fashion.

The solution for (10) is achieved using an alternating scheme. The minimization process for the $i$th level of the hierarchy can be expressed as follows. The image patches are extracted from $\mathbf{s}_i$ and then processed by K-SVD for learning $\mathbf{D}_i$ and the sparse coefficients $\boldsymbol{\alpha}_{i,k}$, whereas the other parameters are kept fixed. Then, the gradient of (10) with respect to $\mathbf{s}_i$ is calculated and set to zero

$$\lambda\big(\mathbf{s}_i\|\mathbf{a}_i\|_2^2 - \mathbf{E}_i^T\mathbf{a}_i\big) + \sum_k \mathbf{R}_k^T(\mathbf{R}_k\mathbf{s}_i - \mathbf{D}_i\boldsymbol{\alpha}_{i,k}) = \mathbf{0} \qquad (12)$$

Finally, after some manipulations and simplifications in (12) (recall that $\mathbf{a}_i$ has unit norm), the estimate of the $i$th source is obtained as

$$\hat{\mathbf{s}}_i = \Big(\lambda\mathbf{I} + \sum_k \mathbf{R}_k^T\mathbf{R}_k\Big)^{-1}\Big(\lambda\mathbf{E}_i^T\mathbf{a}_i + \sum_k \mathbf{R}_k^T\mathbf{D}_i\boldsymbol{\alpha}_{i,k}\Big) \qquad (13)$$

It is interesting to notice that the inverted term in the above expression is the same as that in (3) for the denoising problem. Thus, as aforementioned, this calculation can be performed pixelwise. Next, in order to update $\mathbf{a}_i$, a simple least squares linear regression such as the one in [16] gives the following solution:

$$\hat{\mathbf{a}}_i = \frac{\mathbf{E}_i\hat{\mathbf{s}}_i}{\|\hat{\mathbf{s}}_i\|_2^2} \qquad (14)$$

However, normalization of $\hat{\mathbf{a}}_i$ is necessary after each update to preserve the column norm of the mixing matrix. The above steps for updating all variables are executed for all $i$ from 1 to $n$. Moreover, the entire procedure should be repeated to minimize (10). A pseudocode of the proposed algorithm is given in Algorithm 1.
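Before the pseudocode, the two closed-form updates can be sketched compactly. The numpy functions below are our own illustrative rendering, not the authors' code: they assume the images are vectorized, that patch_idx[k] holds the pixel indices selected by R_k, and that a_i has unit norm (which the algorithm maintains by normalization), so the inverse in (13) reduces to a pixelwise division.

```python
import numpy as np

def update_source(E_i, a_i, D, alpha, patch_idx, lam):
    """Closed-form source update in the spirit of (13): a pixelwise
    average of the back-projected mixture residual and the denoised
    patches.  Since R_k^T R_k only counts how many patches cover each
    pixel, lam*I + sum_k R_k^T R_k is diagonal and inverting it is a
    per-pixel division."""
    t = E_i.shape[1]
    num = lam * (E_i.T @ a_i)           # lambda * E_i^T a_i
    den = np.full(t, float(lam))        # lambda on the diagonal
    for k, idx in enumerate(patch_idx):
        num[idx] += D @ alpha[:, k]     # R_k^T D alpha_{i,k}, scattered back
        den[idx] += 1.0                 # patch-coverage count from R_k^T R_k
    return num / den

def update_mixing_column(E_i, s_i):
    """Least squares mixing column as in (14), followed by the
    normalization the paper applies after each update."""
    a_i = E_i @ s_i / (s_i @ s_i)
    return a_i / np.linalg.norm(a_i)
```

Because the denominator merely counts overlapping patches per pixel, the source update is an average of the back-projected residual and the overlapping denoised patches, mirroring the interpretation of (3).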
Algorithm 1 The proposed algorithm.

Input: Observation matrix X, patch size b, number of dictionary atoms d, number of sources n, noise standard deviation σ, noise gain C, regularization parameter λ, and the total number of iterations.
Output: Dictionaries D_i, sparse coefficients α_{i,k}, source matrix S, and mixing matrix A.

1  begin
2    Initialization:
3      Set each D_i to an overcomplete DCT;
4      Set A to a random column-normalized matrix;
5      ...;
6      Choose ... to be a multiple of ...;
7      ...;
8    repeat
9      ...;
10     for i = 1 to n do
11       Extract all the patches from s_i;
12       Solve the sparse recovery problem: min ||α_{i,k}||_0 s.t. ||D_i α_{i,k} − R_k s_i||_2 ≤ Cσ;
13       Update D_i using K-SVD;
14       Calculate the residual: E_i = X − Σ_{j≠i} a_j s_j^T;
15       Compute ŝ_i using (13);
16       Update â_i using (14);
17       Normalize: a_i ← â_i / ||â_i||_2;
18     end
19     ...;
20   until the stopping criterion is met;
21 end

As already implied, the motivation behind the proposed algorithm is to learn source-specific dictionaries offering sparse representations. Such dictionary learning, when embedded into the source separation process, improves both the separation and the dictionary learning quality. In other words, in the first few iterations of Algorithm 1, each source estimate includes portions of the other sources. However, the dictionaries gradually learn the dominant components and reject the weak portions caused by the other sources. Using these dictionaries, the estimated sources, which are used for dictionary learning in the next iteration, will contain fewer irrelevant components. This adaptive process is repeated until most of the irrelevant portions are rejected and the dictionaries become source specific. The entire above procedure is carried out to minimize the cost function in (10). It is also noteworthy that (10) is not jointly convex in all parameters. Therefore, the alternating estimation of these parameters, using the proposed method, does not necessarily lead to an optimal solution. This is the case for other previous alternating minimization problems too [16], [24]. However, the obtained parameters are local minima of (10) and, hence, are considered approximate solutions.

IV. GLOBAL DICTIONARIES VERSUS LOCAL DICTIONARIES

The proposed method takes advantage of fully local dictionaries that are learned within the separation task. We call these dictionaries local since they capture the structure of small image patches to generate the dictionary atoms. In contrast, global dictionaries are generally applied to the entire image or signal and capture the global features. Incorporating global fixed dictionaries into the proposed method can be advantageous where prior knowledge about the structure of the sources is available. Such combined local and global dictionaries have been used in [22] for single mixture separation. Here, we consider the multichannel case and extend the proposed method of Section III for this purpose.

Consider a known global unitary basis $\boldsymbol{\Phi}_i$ for each source. The minimization problem (10) can be modified as follows to capture both the global and local structures of the sources:

$$\min \; \sum_{i=1}^n \Big( \lambda\|\mathbf{E}_i - \mathbf{a}_i\mathbf{s}_i^T\|_F^2 + \sum_k \mu_{i,k}\|\boldsymbol{\alpha}_{i,k}\|_0 + \sum_k \|\mathbf{D}_i\boldsymbol{\alpha}_{i,k} - \mathbf{R}_k\mathbf{s}_i\|_2^2 + \eta\|\boldsymbol{\Phi}_i\mathbf{s}_i\|_1 \Big) \qquad (15)$$

Note that the term $\|\boldsymbol{\Phi}_i\mathbf{s}_i\|_1$ is exactly similar to what was used in the original MMCA [16]. All variables in the above expression can be estimated using Algorithm 1 as before, except the actual sources $\mathbf{s}_i$. In order to find $\mathbf{s}_i$, the gradient of (15) with respect to $\mathbf{s}_i$ is set to zero, leading to

$$\lambda\big(\mathbf{s}_i - \mathbf{E}_i^T\mathbf{a}_i\big) + \sum_k \mathbf{R}_k^T(\mathbf{R}_k\mathbf{s}_i - \mathbf{D}_i\boldsymbol{\alpha}_{i,k}) + \eta\,\boldsymbol{\Phi}_i^T\,\mathrm{sgn}(\boldsymbol{\Phi}_i\mathbf{s}_i) = \mathbf{0} \qquad (16)$$

where sgn$(\cdot)$ is a componentwise signum function. The above expression amounts to soft thresholding [16] due to the signum function, and hence the estimate of $\mathbf{s}_i$ can be obtained by the following steps:
• Soft thresholding of $\boldsymbol{\Phi}_i\hat{\mathbf{s}}_i$, where $\hat{\mathbf{s}}_i$ is first computed as in (13), with a threshold proportional to $\eta$, attaining the coefficients $\mathbf{z}_i$;
• Reconstructing by

$$\hat{\mathbf{s}}_i = \boldsymbol{\Phi}_i^T\mathbf{z}_i \qquad (17)$$

Note that, since $\boldsymbol{\Phi}_i$ is a unitary and known matrix, it is not explicitly stored but implicitly applied as a forward or inverse transform where applicable. Similar to the previous section, the above expression can be executed pixelwise and is not computationally expensive.
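The two steps following (16) amount to transform-domain shrinkage. Below is a minimal sketch, assuming the global basis is supplied as a forward/inverse transform pair and that the threshold is a tunable scalar tied to η (both assumptions of this sketch, not specifics from the paper):

```python
import numpy as np

def soft_threshold(z, thr):
    """Componentwise soft thresholding: sign(z) * max(|z| - thr, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)

def apply_global_prior(s_hat, Phi, Phi_T, thr):
    """One possible realization of the two steps following (16):
    soft-threshold the global transform coefficients of the current
    source estimate, then reconstruct with the inverse transform.
    Phi / Phi_T are callables (forward / inverse unitary transforms),
    so the basis is never stored explicitly, as the paper notes."""
    z = soft_threshold(Phi(s_hat), thr)   # shrink global coefficients
    return Phi_T(z)                       # reconstruction as in (17)
```

With an orthonormal DCT or wavelet pair supplied for Phi and Phi_T, this reproduces MMCA-style [16] soft thresholding applied on top of the locally denoised estimate from (13).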
V. PRACTICAL ISSUES

Implementation of the proposed methods needs some careful considerations, which are addressed here.

A. Noise

Similar to the denoising method in [1], the proposed method also requires knowledge about the noise power. This should be utilized in the sparse coding step of the K-SVD algorithm for solving the following problem:

$$\min_{\boldsymbol{\alpha}_{i,k}} \|\boldsymbol{\alpha}_{i,k}\|_0 \quad \text{s.t.} \quad \|\mathbf{D}_i\boldsymbol{\alpha}_{i,k} - \mathbf{R}_k\mathbf{s}_i\|_2 \le C\sigma \qquad (18)$$

where $C$ is a constant and $\sigma$ is the noise standard deviation. The above well-known sparse recovery problem can be solved using orthogonal matching pursuit (OMP) [27]. The FOCal Underdetermined System Solver (FOCUSS) [28] can also be used to solve (18) by replacing the $\ell_0$-norm with the $\ell_1$-norm.
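A minimal greedy solver for (18) can be written in a few lines. The sketch below is a textbook error-constrained OMP, not the batch implementation of [31]; it assumes D has unit-norm atoms and that p is one vectorized patch R_k s_i:

```python
import numpy as np

def omp(D, p, tol):
    """Error-constrained orthogonal matching pursuit for (18): grow the
    support greedily until the residual norm drops below tol = C*sigma
    (or the patch is fully explained)."""
    b, d = D.shape
    residual = p.astype(float).copy()
    support = []
    coef = np.zeros(d)
    sol = np.array([])
    while np.linalg.norm(residual) > tol and len(support) < b:
        j = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        if j in support:                            # no new atom helps
            break
        support.append(j)
        # least-squares refit of the coefficients on the current support
        sol, *_ = np.linalg.lstsq(D[:, support], p, rcond=None)
        residual = p - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef
```

The stopping rule is exactly where the noise knowledge enters: a larger Cσ stops the pursuit earlier, so noise (or, in the separation setting, leakage from the other sources) is never fitted by the dictionary.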
Furthermore, incorporating the noise power in solving (18) has an important advantage in the dictionary update stage. It ensures that the dictionary does not learn the noise existing in the patches. Consequently, the image estimated using this dictionary becomes cleaner, which in turn refines the dictionary atoms in the next iteration. This progressive denoising loop is repeated until a clean image is achieved.

In the proposed method, however, "noise" means the portions of the other sources that are mixed with $\mathbf{s}_i$. Hence, we initially consider a high value for $\sigma$, since the portions of the other sources might be large even though the noise is zero, and then gradually reduce it to zero as the iterations evolve. Nevertheless, if the observations themselves are noisy, then we should start from a higher bound and decrease it toward the actual noise power as the iterations evolve.

B. Dictionary and Patch Sizes

Unfortunately, not much can be said theoretically about choosing the optimum dictionary size (or redundancy factor). Highly overcomplete dictionaries allow higher sparsity of the coefficients, although they are computationally expensive. However, it may not be necessary to choose a very high redundancy; it depends on the data type and the number of training signals [24]. There have been some reported works allowing a variable dictionary size to find the optimum redundancy factor, such as [29] and [30]. Such analysis is out of the scope of this paper, and we choose a fixed redundancy factor in our simulations.

The patch size is normally chosen depending on the entire image size and also on the knowledge (if available) about the patterns of the actual sources. However, adopting very large patches should be avoided, as they lead to large dictionaries and also provide few training signals for the dictionary learning stage. We have chosen 8 x 8 patches for all the experiments, which seems to have become a standard in the literature.

Furthermore, we normally use overlapping patches, shown to give better results in our simulations and also in the corresponding literature [1], [25]. However, we have empirically observed that the percentage of overlap can be adjusted based on the noise level. In noiseless settings, a lower percentage of overlap (e.g., 50%) is sufficient, whereas in noisy situations, fully overlapped patches yield the desired performance. Fully overlapped patches are obtained by one-pixel shifting of the operator $\mathbf{R}_k$, in both the vertical and horizontal directions, over the entire image.
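The patch extraction itself is a simple sliding-window operation. A sketch follows, where the step parameter controls the overlap just discussed (step=4 gives the 50% overlap used in the noiseless experiments for 8 x 8 patches; step=1 gives fully overlapped patches); the function name and layout are ours:

```python
import numpy as np

def extract_patches(img, b=8, step=4):
    """Extract b-by-b patches as columns by sliding a mask over the
    image (the practical realization of the R_k operators).
    step = b//2 -> 50% overlap; step = 1 -> fully overlapped patches."""
    H, W = img.shape
    cols = []
    for r in range(0, H - b + 1, step):
        for c in range(0, W - b + 1, step):
            cols.append(img[r:r + b, c:c + b].reshape(-1, order="F"))
    return np.stack(cols, axis=1)   # shape: (b*b, number of patches)
```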
C. Complexity Analysis

The proposed algorithm is more computationally expensive than standard MMCA. It includes K-SVD, which imposes two extra steps of sparse coding and dictionary update. However, the complexity of computing $\mathbf{s}_i$ is almost the same as in MMCA, since (13) can be computed pixelwise. The detailed per-iteration analysis of the proposed method is as follows. Consider lines 12 and 13 in Algorithm 1, and assume further that we use approximate K-SVD [31]. Based on [31], these two lines, which operate on all the extracted patches, dominate the cost for each $i$. The complexity of computing the residual (line 14) and the updates in lines 15 and 16 grows with the image size and the number of extracted patches, and the normalization in line 17 is negligible in comparison. Since all these calculations are executed for each source (i.e., $n$ times), the total computation cost per iteration of the algorithm scales accordingly. It is seen that, due to learning one dictionary for each individual source, the proposed algorithm would be computationally demanding for large-scale problems.

A possible approach to speed up the algorithm is to learn one dictionary for all the sources and then choose relevant atoms for estimating each source. This strategy, however, has not been completed yet and is under development. In addition, considering faster dictionary learning algorithms rather than K-SVD can alleviate the problem.

VI. RESULTS

In the first experiment, we illustrate the results of applying Algorithm 1 to the mixtures of four sources. A severe case of image sources with different morphologies was chosen to examine the performance of the proposed method. A 6 x 4 full-rank, random, column-normalized $\mathbf{A}$ was used as the mixing matrix. Five hundred iterations were selected as the stopping criterion. The mixtures were noiseless. However, we selected a nonzero $\sigma$ (see below for the reasoning behind this choice) and used a decreasing $\sigma$ starting from 10 and reaching 0.01 at the end of the iterations. The patches had 50% overlap. The remaining parameters (patch size, dictionary size, $\lambda$, and $C$) were fixed as discussed in Section V. In addition, in order to show the advantages of learning adaptive local dictionaries over fixed local dictionaries, we applied Algorithm 1 to the same mixtures while ignoring the dictionary learning part. This way, source separation is performed using local fixed DCT dictionaries for all sources.
We also applied the following methods for comparison purposes: GMCA [21] based on the wavelet transform (implementation available at http://md.cosmostat.org/Home.html), SPICA [13] based on the WPT (available at http://visl.technion.ac.il/bron/spica/), and SMICA [5] based on the Fourier transform. Fig. 1 illustrates the results of applying all these methods together with the corresponding MSEs. The MSE is calculated as $\mathrm{MSE} = \frac{1}{t}\|\mathbf{s} - \hat{\mathbf{s}}\|_2^2$, where $\mathbf{s}$ and $\hat{\mathbf{s}}$ are the original and recovered images, respectively. Fig. 1 (see the bottom row) shows that the proposed method could successfully recover the image sources via adaptive learning of the sparsifying dictionaries. The achieved MSEs, given in Fig. 1, are the lowest for the proposed method for all image sources except the bricklike texture. In addition, the learned dictionary atoms, shown in Fig. 2, indicate good adaptation to the corresponding sources. The other methods do not perform as well as the proposed method. For instance, as shown in Fig. 1, SPICA has a problem in separating the noiselike texture and Barbara. GMCA has separated all image sources but with some interference from the other sources. SMICA has perfectly recovered the bricklike texture, with the lowest MSE among all the methods. However, it has some difficulties in recovering the Barbara and cartoon boy images.

As another measure of the performance of the proposed method, the reconstruction error as a function of the number of iterations is shown in Fig. 3. This figure shows that a monotonic decrease in the value of the separation error is achieved and that the asymptotic error is almost zero.

Fig. 1. Results of applying different methods to six noiseless mixtures.

Fig. 2. Obtained dictionaries as a result of applying the proposed method to the mixtures shown in Fig. 1.

Fig. 3. MSE computed as $\frac{1}{t}\|\mathbf{s} - \hat{\mathbf{s}}\|_2^2$ versus the number of iterations. Note that the elements of the image sources have amplitudes in the range [0, 255].
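For reference, this per-source error measure is trivial to compute; the per-pixel normalization below matches the MSE definition above and the [0, 255] amplitude note in the caption of Fig. 3:

```python
import numpy as np

def mse(s_orig, s_rec):
    """Per-pixel mean squared error between an original source and its
    recovery, with both images flattened; sources are compared in their
    [0, 255] amplitude range."""
    s_orig = np.asarray(s_orig, dtype=float).ravel()
    s_rec = np.asarray(s_rec, dtype=float).ravel()
    return np.mean((s_orig - s_rec) ** 2)
```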
In the next experiment, the performance of the proposed method in noisy situations is evaluated. For this purpose, we generated four mixtures from two image sources, namely, Lena and boat. Then, Gaussian noise with a standard deviation of 15 was added to the mixtures so that the peak signal-to-noise ratios (PSNRs) of the mixtures equal 20 dB. (Note that the image size is 128 x 128.) The noisy mixtures are shown in Fig. 4(a). We applied the proposed algorithm to these mixtures, starting with a large $\sigma$ that gradually decreased to the actual noise level. We used fully overlapped patches and 200 iterations, with the rest of the parameters similar to those in the first experiment. One of the considerable advantages of the proposed method is the ability to denoise the sources during the separation. This is because the core idea of the proposed method comes from a denoising task. In contrast, in most conventional BSS methods, denoising has to be carried out either before or after the separation, which is not ideal and may lead the algorithm to fail. Among the existing related methods, GMCA [21] uses a thresholding scheme to denoise the image sources during the separation. Hence, we applied this algorithm to the same mixtures to compare its performance with that of the proposed method. We applied SPICA and SMICA to the same mixtures too. The results of this experiment are demonstrated in Fig. 4. Fig. 4(f) shows that both the separation and denoising tasks have been successful in the proposed method. The corresponding dictionaries learned by the proposed method, given in Fig. 5(a), confirm these observations. In addition, the proposed method is superior to GMCA, SPICA, and SMICA: it denoises the sources with higher PSNRs, as seen by comparing Fig. 4(c)-(f). Although GMCA, SPICA, and SMICA have separated the image sources, they are not successful in denoising them.

Fig. 4. (a) Noisy mixtures. (b) Original sources. Separated sources using (c) GMCA, (d) SPICA, (e) SMICA, and (f) the proposed method.

In addition, the sparsity rate of the $\boldsymbol{\alpha}_{i,k}$'s, obtained by averaging over the supports of all $\boldsymbol{\alpha}_{i,k}$ (where $K$ is the number of patches), is given for both sources in Fig. 5(b). Note that a 4 x 2 mixing matrix was used in this experiment. This graph shows the percentage of atoms used by the sparse coefficients, sorted in descending order, for both dictionaries corresponding to Lena and boat. As found from Fig. 5(b), the most frequently used atom shows a small percentage of appearance (about 17%), and the other dictionary atoms have been used less frequently. This means that the learned dictionaries are so efficient that few linear combinations of their atoms are sufficient to generate the image sources of interest.

Fig. 5. (a) Obtained dictionaries as a result of applying the proposed method to the mixtures shown in Fig. 4(a). (b) Percentage of the average number of nonzeros of the sparse coefficients, equivalent to the occurrence rate of dictionary atoms sorted in descending order.

In another experiment, we evaluated and compared the influence of adding a global sparsity prior to the proposed algorithm. The experiment settings are as follows. Two images, namely, a texture and Lena, were mixed together using a 4 x 2 random $\mathbf{A}$. Algorithm 1, which only takes advantage of locally learned dictionaries, was applied to the mixtures. The proposed algorithm of Section IV (local and global) was also applied to the same mixtures. We considered the discrete wavelet transform and the DCT as global sparsifying bases for Lena and the texture, respectively. The methods FastICA [32] (available at http://research.ics.tkk.fi/ica/fastica/), JADE [2], and SMICA were also used in this experiment for comparison. We varied the noise standard deviation from 0 to 20 to investigate the performance of all these methods at different noise levels. Similar to [16] and [17], we calculated the mixing matrix criterion (also called the Amari criterion [33]) from the deviation of $\mathbf{P}\hat{\mathbf{A}}^{\dagger}\mathbf{A}$ from the identity, where $\mathbf{P}$ is the scaling and permutation matrix. (The proposed method only causes signed permutations, due to dealing with column-normalized mixing matrices.) The mixing matrix criterion curves are depicted in Fig. 6 as a function of the standard deviation. As shown in Fig. 6, in low/moderate-noise situations, the best performance is achieved when both global and local dictionaries are used. However, the performance of the proposed method, using either only local or both local and global dictionaries, decreases in high noise. Moreover, the recovery results of JADE and FastICA are less accurate than those of the other methods. The performance of SMICA lies somewhere between JADE's and that of the proposed method.

Fig. 6. Mixing matrix criterion as a function of the noise standard deviation for different algorithms. Note that the "Global" method is actually equivalent to the original MMCA in [16].
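The mixing matrix criterion can be sketched as follows. The exact norm used in the paper is not reproduced here: the ℓ1 deviation of P Â†A from the identity is our assumption, and the scaling/permutation P is recovered via an assignment on the entry magnitudes (this sketch requires scipy):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mixing_criterion(A_hat, A):
    """One plausible form of the Amari-style mixing matrix criterion:
    undo the best signed scaling/permutation P between the estimate and
    the truth, then measure how far P @ pinv(A_hat) @ A is from I."""
    G = np.linalg.pinv(A_hat) @ A                  # ideally a scaled permutation
    row, col = linear_sum_assignment(-np.abs(G))   # match sources to estimates
    P = np.zeros_like(G)
    P[col, row] = 1.0 / G[row, col]                # signed scaling + permutation
    return np.abs(np.eye(G.shape[1]) - P @ G).sum()
```

A perfect separation gives a criterion of zero; interference from other sources shows up as nonzero off-diagonal entries of P @ G.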
Next, in order to find an optimal $\lambda$, we investigated the effects of choosing different $\lambda$'s on the recovery quality. As expected, and as also shown in [1], the selection of $\lambda$ is dependent on the noise standard deviation. However, our case is slightly different due to the source separation task. (Indeed, we empirically observed better performance by starting with a higher value of $\sigma$ for solving (18) and decreasing it toward the true noise standard deviation as the iterations proceed.) In our simulations, Gaussian noises with different power values were added to the mixtures, and the mixing matrix criterion was calculated while varying $\lambda$. Fig. 7 presents the achieved results for this experiment. From this figure, it can be found that all the curves achieve nearly the same recovery quality once $\lambda$ exceeds a moderate value, so any $\lambda$ in that range is a reasonable choice. Interestingly, these results and the range of appropriate choices for $\lambda$ are very similar to those achieved in [1].

Fig. 7. Mixing matrix criterion as a function of $\lambda$.

As already mentioned, it is expected that the computation time of the proposed method increases for a large number of sources. A set of simulations was conducted to numerically demonstrate this effect and also to evaluate the effects of changing other dimensions, such as the patch size and the dictionary size, on the simulation time. In Table I, we give these results per one iteration of Algorithm 1, where the image patches had 50% overlap. A desktop computer with a 3-GHz Core 2 Duo CPU and 2 GB of RAM was used for this experiment.

TABLE I. Detailed simulation time of Algorithm 1. "Total" means the total time (in seconds) elapsed per one iteration.

It is interesting to note that, as shown in Table I, much of the computational burden is incurred by the dictionary update stage (using K-SVD). In addition, the computation of $\mathbf{s}_i$ is far less demanding than both the "sparse coding" and "dictionary update" stages, due to the pixelwise operation. This implies that further effort is required to speed up the dictionary learning part of the proposed algorithm.
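A crude way to reproduce this kind of per-stage profiling is to wrap each stage of one pass of Algorithm 1 in a timer; the stage functions named in the trailing comment refer to the sketches above and are placeholders, not the authors' code:

```python
import time

def time_stage(label, fn, *args, **kwargs):
    """Per-stage timer in the spirit of Table I: run one stage of the
    algorithm (sparse coding, dictionary update, or source update) and
    report the seconds elapsed."""
    t0 = time.perf_counter()
    out = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - t0:.3f} s")
    return out

# Illustrative usage (all stage functions are the sketches above):
# alpha = time_stage("sparse coding", sparse_code_all, D, patches, tol)
# D     = time_stage("dictionary update", ksvd_update, D, patches, alpha)
# s_i   = time_stage("source update", update_source, E_i, a_i, D, alpha, idx, lam)
```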
VII. CONCLUSION

In this paper, the BSS problem has been addressed. The aim has been to take advantage of sparsifying dictionaries for this purpose. Unlike the existing sparsity-based methods, we assumed no prior knowledge about the underlying sparsity domain of the sources. Instead, we have proposed to fuse the learning of adaptive sparsifying dictionaries, one for each individual source, into the separation process. Motivated by the idea of image denoising via a dictionary learned from the patches of the corrupted image in [1], we proposed a hierarchical approach for this purpose. Our simulation results on both noisy and noiseless mixtures have been encouraging and confirm the effectiveness of the proposed approach. However, further work is required to speed up the algorithm and make it suitable for large-scale problems. In addition, the possibility of applying the proposed method to underdetermined mixtures (fewer observations than sources) is among our future plans.

REFERENCES

[1] M. Elad and M. Aharon, "Image denoising via sparse and redundant representations over learned dictionaries," IEEE Trans. Image Process., vol. 15, no. 12, pp. 3736-3745, Dec. 2006.
[2] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York: Wiley-Interscience, May 2001.
[3] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788-791, Oct. 1999. [Online]. Available: http://dx.doi.org/10.1038/44565
[4] K. Kayabol, E. E. Kuruoglu, and B. Sankur, "Bayesian separation of images modeled with MRFs using MCMC," IEEE Trans. Image Process., vol. 18, no. 5, pp. 982-994, May 2009.
[5] J.-F. Cardoso, H. Snoussi, J. Delabrouille, and G. Patanchon, "Blind separation of noisy Gaussian stationary sources. Application to cosmic microwave background imaging," in Proc. 11th EUSIPCO, Toulouse, France, Sep. 2002, pp. 561-564.
[6] Y. Moudden, J.-F. Cardoso, J.-L. Starck, and J. Delabrouille, "Blind component separation in wavelet space: Application to CMB analysis," EURASIP J. Appl. Signal Process., vol. 2005, pp. 2437-2454, 2005.
[7] A. Tonazzini, L. Bedini, E. E. Kuruoglu, and E. Salerno, "Blind separation of auto-correlated images from noisy mixtures using MRF models," in Proc. Int. Symp. ICA, Nara, Japan, Apr. 2003, pp. 675-680.
[8] A. Tonazzini, L. Bedini, and E. Salerno, "A Markov model for blind image separation by a mean-field EM algorithm," IEEE Trans. Image Process., vol. 15, no. 2, pp. 473-482, Feb. 2006.
[9] H. Snoussi and A. Mohammad-Djafari, "Fast joint separation and segmentation of mixed images," J. Electron. Imaging, vol. 13, no. 2, pp. 349-361, Apr. 2004.
[10] E. Kuruoglu, A. Tonazzini, and L. Bianchi, "Source separation in noisy astrophysical images modelled by Markov random fields," in Proc. ICIP, Oct. 2004, vol. 4, pp. 2701-2704.
[11] M. M. Ichir and A. Mohammad-Djafari, "Hidden Markov models for wavelet-based blind source separation," IEEE Trans. Image Process., vol. 15, no. 7, pp. 1887-1899, Jul. 2006.
[12] W. Addison and S. Roberts, "Blind source separation with non-stationary mixing using wavelets," in Proc. Int. Conf. ICA, Charleston, SC, 2006.
[13] A. M. Bronstein, M. M. Bronstein, M. Zibulevsky, and Y. Y. Zeevi, "Sparse ICA for blind separation of transmitted and reflected images," Int. J. Imaging Syst. Technol., vol. 15, no. 1, pp. 84-91, 2005.
[14] M. G. Jafari and M. D. Plumbley, "Separation of stereo speech signals based on a sparse dictionary algorithm," in Proc. 16th EUSIPCO, Lausanne, Switzerland, Aug. 2008, pp. 25-29.
[15] M. G. Jafari, M. D. Plumbley, and M. E. Davies, "Speech separation using an adaptive sparse dictionary algorithm," in Proc. Joint Workshop HSCMA, Trento, Italy, May 2008, pp. 25-28.
[16] J. Bobin, Y. Moudden, J.-L. Starck, and M. Elad, "Morphological diversity and source separation," IEEE Signal Process. Lett., vol. 13, no. 7, pp. 409-412, Jul. 2006.
[17] J. Bobin, J.-L. Starck, J. Fadili, and Y. Moudden, "Sparsity and morphological diversity in blind source separation," IEEE Trans. Image Process., vol. 16, no. 11, pp. 2662-2674, Nov. 2007.
[18] J.-L. Starck, M. Elad, and D. Donoho, "Redundant multiscale transforms and their application for morphological component separation," Adv. Imaging Electron Phys., vol. 132, pp. 287-348, 2004.
[19] J.-L. Starck, M. Elad, and D. Donoho, "Image decomposition via the combination of sparse representations and a variational approach," IEEE Trans. Image Process., vol. 14, no. 10, pp. 1570-1582, Oct. 2005.
[20] M. Elad, J.-L. Starck, P. Querre, and D. Donoho, "Simultaneous cartoon and texture image inpainting using morphological component analysis (MCA)," Appl. Comput. Harmonic Anal., vol. 19, no. 3, pp. 340-358, Nov. 2005.
[21] J. Bobin, J.-L. Starck, J. M. Fadili, Y. Moudden, and D. L. Donoho, "Morphological component analysis: An adaptive thresholding strategy," IEEE Trans. Image Process., vol. 16, no. 11, pp. 2675-2681, Nov. 2007.
[22] G. Peyre, J. Fadili, and J.-L. Starck, "Learning the morphological diversity," SIAM J. Imaging Sci., vol. 3, no. 3, pp. 646-669, Jul. 2010.
[23] G. Peyre, J. Fadili, and J.-L. Starck, "Learning adapted dictionaries for geometry and texture separation," in Proc. SPIE, 2007, p. 67011T.
[24] M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. New York: Springer-Verlag, 2010.
[25] N. Shoham and M. Elad, "Algorithms for signal separation exploiting sparse representations, with application to texture image separation," in Proc. IEEE 25th Conv. Elect. Electron. Eng. Israel, Dec. 2008, pp. 538-542.
[26] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4311-4322, Nov. 2006.
[27] J. A. Tropp and A. C. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655-4666, Dec. 2007.
[28] I. Gorodnitsky and B. Rao, "Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm," IEEE Trans. Signal Process., vol. 45, no. 3, pp. 600-616, Mar. 1997.
[29] M. Yaghoobi, T. Blumensath, and M. Davies, "Parsimonious dictionary learning," in Proc. IEEE ICASSP, Apr. 2009, pp. 2869-2872.
[30] S. Tjoa, M. Stamm, W. Lin, and K. Liu, "Harmonic variable-size dictionary learning for music source separation," in Proc. IEEE ICASSP, Mar. 2010, pp. 413-416.
[31] R. Rubinstein, M. Zibulevsky, and M. Elad, "Efficient implementation of the K-SVD algorithm using batch orthogonal matching pursuit," Technion, Haifa, Israel, Tech. Rep., 2008.
[32] A. Hyvarinen, "Fast and robust fixed-point algorithms for independent component analysis," IEEE Trans. Neural Netw., vol. 10, no. 3, pp. 626-634, May 1999.
[33] S. Amari, A. Cichocki, and H. H. Yang, "A new learning algorithm for blind signal separation," in Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 1996, pp. 757-763.

Vahid Abolghasemi (S'08-M'12) received the Ph.D. degree in signal and image processing from the University of Surrey, Guildford, U.K., in 2011. He is currently with the School of Engineering and Design, Brunel University, Uxbridge, U.K. His main research interests include sparse signal and image analysis, dictionary learning, compressed sensing, and blind source separation.

Saideh Ferdowsi (S'09) received the B.Sc. and M.Sc. degrees in electronic engineering from Shahrood University of Technology, Shahrood, Iran, in 2005 and 2007, respectively. She is currently working toward the Ph.D. degree in the Faculty of Engineering and Physical Sciences, University of Surrey, Guildford, U.K. Her main research interests include blind source separation and biomedical signal and image processing.

Saeid Sanei (SM'05) received the Ph.D. degree in biomedical signal and image processing from Imperial College London, London, U.K., in 1991. He is currently a Reader in Neurocomputing at the Faculty of Engineering and Physical Sciences, University of Surrey, Guildford, U.K. His main area of research is signal processing and its application in biomedical engineering. Dr. Sanei is an associate editor of IEEE Signal Processing Letters and EURASIP journals.