2922 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 6, JUNE 2012to separate the sources even if they are Gaussian and noisy, in MMCA is still an open issue. One possible approach to useprovided that they have different power spectra. It is shown in dictionary learning for MMCA is to learn a speciﬁc dictionary that this idea leads to an efﬁcient EM algorithm, much faster for each source from a set of exemplar images. This, however,than previous algorithms with non-Gaussianity assumption. is rarely the case, since in most BSS problems, such trainingAn extension of this method to the wavelet domain, called samples are not available.wSMICA, has been proposed by Moudden et al. . In this paper, we adapt MMCA to those cases that the spar- Wavelet domain analysis for image separation problem has sifying dictionaries/transforms are not available. The proposedbeen also shown to be advantageous in some other works. In algorithm is designed to adaptively learn the dictionaries from, Ichir and Mohammad-Djafari assumed three different the mixed images within the source separation process. Thismodels of the wavelet coefﬁcients, i.e., independent Gaussian method is motivated by the idea of image denoising using amixture model, hidden Markov tree model, and contextual learned dictionary from the corrupted image in . Other ex-hidden Markov ﬁeld model. Then, they proposed an MCMC tensions of the work in  have been also reported in , ,algorithm for joint blind separation of the sources and estima- and  for the purpose of image separation from a single mix-tion of the mixing matrix and the hyperparameters. The work ture. In this paper, the multichannel case with more observationsin  also uses the wavelet domain for source separation. than sources is considered. We start by theoreticallyIn another work, Bronstein et al.  studied the problem of extending the denoising problem to BSS. Then, a practical algo-recovering a scene recorded through a semi-reﬂecting medium rithm is proposed for BSS without any prior knowledge aboutand proposed a technique called sparse ICA (SPICA). They the sparse domains of the sources. The results indicate that adap-proposed to apply the wavelet packet transform (WPT) to the tive dictionary learning, one for each source, enhances the sep-mixtures prior to applying Infomax ICA. Their algorithm is arability of the sources.suitable for the cases where all the sources can be sparselyrepresented in the wavelet domain. A. Notation Sparsity-based approaches for the BSS problem have re- In this paper, all parameters have real values although it is notceived much attention recently. The term sparse refers to explicitly mentioned. We use small and capital boldface charac-signals or images with small number of nonzeros with respect ters to represent vectors and matrices, respectively. All vectorsto some representation bases. In sparse component analysis are represented as column vectors by default. For instance, the(SCA), the assumption is that the sources can be sparsely th column and th row of matrix are represented by columnrepresented using a known common basis or dictionary. For vectors and , respectively. The vectorized version of ma-instance, the aforementioned SPICA method  considers trix is shown by vec . The matrix Frobenius normthe wavelet domain for this purpose. In addition, the proposed is indicated by . The transpose operation is denoted bymethods in  and  use sparse dictionaries for separation . The -norm, which counts the number of nonzeros, andof speech mixtures. Nevertheless, there are many cases where the -norm, which sums over the absolute values of all ele-each source is sparse in a different domain, which makes it ments, are shown by and , respectively.difﬁcult to directly apply SCA methods to their mixtures. Mul-tichannel morphological component analysis (MMCA)  B. Organization of the Paperand generalized morphological component analysis (GMCA) In the next section, the motivation behind the proposed have been recently proposed to address this problem. method is stated. We start by describing the denoising problemThe main assumption in MMCA is that each source can be and its relation to image separation. Then, the extension ofsparsely represented in a speciﬁc known transform domain. In single mixture separation to the multichannel observationGMCA, each source is modeled as the linear combination of a scenario is argued. In Section III, a practical approach for thisnumber of morphological components where each component purpose is proposed. Section IV is devoted to discussing theis sparse in a speciﬁc basis. MMCA and GMCA are extensions possibility of adding global sparsity priors to the proposedof a previously proposed method of morphological component method. Some practical issues are discussed in Section V. Theanalysis (MCA) – to the multichannel case. In MCA, numerical results are given in Section VI. Finally, this paper isthe given signal/image is decomposed into different morpho- concluded in Section VII.logical components subject to sparsity of each component in aknown basis (or dictionary). II. FROM IMAGE DENOISING TO SOURCE SEPARATION MMCA performs well where prior knowledge about thesparse domain of each individual source is available; however, A. Image Denoisingwhat if such a priori is not available or what if the sources Consider a noisy image corrupted by additive noise. Elad andare not sparsely representable using the existing transforms? Aharon  showed that if the knowledge about the noise powerOne may answer these questions by referring to the dictionary is available, it is possible to denoise it by learning a local dic-learning framework. In dictionary learning framework, the aim tionary from the noisy image itself. In order to deal with largeis to ﬁnd an overcomplete dictionary that can sparsely represent images, they used small (overlapped) patches to learn such dic-a given set of images or signals. Utilizing learned dictionaries tionary. The obtained dictionary is considered local since it de-into MCA has been shown to yield promising results , scribes the features extracted from small patches. Let us repre-. However, taking the advantages of learned dictionaries sent the noisy image by vector of length . The
ABOLGHASEMI et al.: BLIND SEPARATION OF IMAGE SOURCES VIA ADAPTIVE DICTIONARY LEARNING 2923unknown denoised image is also vectorized and represented global transforms (e.g., wavelet and curvelet) for image sepa-as . The th patch from is shown by vector ration from a single mixture. Their simulation results show theof pixels. For notational simplicity, the th patch is ex- effects of adaptive dictionaries on separation of complex texturepressed as explicit multiplication of operator (a binary patterns from natural images.matrix) by .1 The overall denoising problem is expressed as All the related studies have demonstrated the advantages that adaptive dictionary learning can have on the separation task. However, there is still one missing piece and that is considering (2) such adaptivity for the multichannel mixtures. Performing the idea of learning local dictionaries within the source separationwhere scalars and control the noise power and sparsity de- of multichannel observations obviously has many beneﬁts ingree, respectively. In addition, is the sparsifying dic- different applications. Next, we extend the denoising methodtionary that contains normalized columns (also called atoms), in  for this purpose.and are sparse coefﬁcients of length . In the proposed algorithm by Elad and Aharon , and C. Multichannel Source Separationare respectively initialized with and overcompletediscrete cosine transform (DCT) dictionary. The minimization In this section, the aim is to extend the denoising problemof (2) starts with extracting and rearranging all the patches of . (2) to the multichannel source separation. Consider the BSSThe patches are then processed by K-SVD , which updates model introduced in Section I, and assume further that the and estimates sparse coefﬁcients . Afterward, and sources of interest are 2-D grayscale images. The BSS modelare assumed ﬁxed and is estimated by computing for 2-D sources can be represented by vectorizing all images to and then stacking them to form . The BSS model (1) cannot be (3) directly incorporated into (2) as it requires both and to be single vectors. Hence, we use the vectorized versions of these matrices in model (1), which can be obtained using thewhere is the identity matrix and is the reﬁned version of . properties of the Kronecker product2Again, and are updated by K-SVD but this time usingthe patches from that are less noisy. Such conjoined denoisingand dictionary adaptation is repeated to minimize (2). In prac-tice, (3) is obtained computationally easier since is (4)diagonal and the above expression can be calculated in a pixel-wise fashion. In the above expression, and are column vectors of lengths It is shown that (3) is a kind of averaging using both noisy and , respectively. In addition, is a column vector ofand denoised patches, which, if repeated along with updating of length . is a block diagonal matrix of sizeother parameters, will denoise the entire image . However, in , and is the Kronecker product symbol. We consider thethe sequel, we will try to ﬁnd out if this strategy is extendable noiseless setting and modify (2) tofor the cases where the noise is added to the mixtures of morethan one image.B. Image Separation (5) Image separation is a more complicated case of image de-noising where more than one image are to be recovered from asingle observation. Consider a single linear mixture of two tex- It is clearly seen that the above expression is similar to (2) buttures with additive noise: (or ). with an extra mixing matrix to be estimated. In addition, vec-The authors of  attempt to recover and using the prior tors and are much lengthier than and as they representknowledge about two sparsifying dictionaries and . They vectorized versions of multiple images and mixtures.use a minimum mean-squared-error (MSE) estimator for this The problem (5) can be minimized in an alternating schemepurpose. In contrast, the recent work in  does not assume by keeping all but one unknown ﬁxed at a time. The estima-any prior knowledge about the dictionaries. It rather attempts to tion of and can be achieved using K-SVD similar tolearn a single dictionary from the mixture and then applies a de- . In K-SVD, the dictionary is updated columnwise using sin-cision criterion to the dictionary atoms to separate the images. gular value decomposition (SVD); the sparse coefﬁcients areIn another recent work, Peyre et al.  presented an adaptive estimated using one of the common sparse coding techniques.MCA scheme by learning the morphologies of image layers. However, estimation of is slightly different from that ofThey proposed to use both adaptive local dictionaries and ﬁxed in (3) and needs more attention. It has a closed-form solution, 1Practically, we apply a nonlinear operation to extract the patches from image 2The matrix multiplication can be expressed as a linear transformation on by sliding a mask of appropriate size over the entire image similar to  and matrices. In particular, vec vec for three matrices ,. , and . In our case, vec vec .
2924 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 6, JUNE 2012which is obtained by taking the gradient of (5) and setting it to cost function in (10) does not require dealing with cumbersomezero matrices and vectors [as shown in (5)] and allows calculating the image sources in a pixelwise fashion. The solution for (10) is achieved using an alternating scheme. (6) The minimization process for the th level of hierarchy can be expressed as follows. The image patches are extracted fromleading to and then processed by K-SVD for learning and sparse coef- ﬁcients , whereas other parameters are kept ﬁxed. Then, the gradient of (10) with respect to is calculated and set to zero (12) (7) Finally, after some manipulations and simpliﬁcations in (12),In order to estimate , we consider all unknowns, except , the estimation of the th source is obtained asﬁxed and simplify (5) by converting the ﬁrst quadratic term intoan ordinary matrix product and obtain (8) (13) It is interesting to notice that the inverting term in the aboveThe above minimization problem is easily solved using pseu- expression is the same as that in (3) for the denoising problem.doinverse of as Thus, as aforementioned, this calculation can be obtained pixel- (9) wise. Next, in order to update , a simple least square linear re- gression such as the one in  will give the following solution: The above steps (i.e., estimation of , , , and ) should (14)be alternately repeated to minimize (5). However, the long ex-pression (7) is not practically computable, particularly if the However, normalization of is necessary after each updatenumber of sources and observations is large. This is because to preserve the column norm of the mixing matrix. The aboveof dealing with a huge matrix . In addition, steps for updating all variables are executed for all from 1 toin contrast to the aforementioned denoising problem, the matrix . Moreover, the entire procedure should be repeated to mini-to be inverted in (7) is not diagonal, and therefore, the estima- mize (10). A pseudocode of the proposed algorithm is given intion of cannot be calculated pixel by pixel, which makes the Algorithm 1.situation more difﬁcult to handle. In the next section, a practicalapproach is proposed to solve this problem. Algorithm 1 The proposed algorithm. III. ALGORITHM Input: Observation matrix , patch size , number of In order to ﬁnd a practical solution for (5), we use a hierar- dictionary atoms , number of sources , noise standardchical scheme such as the one in MMCA . To do this, the deviation , noise gain , regularization parameter , andBSS model (1) is broken into rank-1 multiplications and the total number of iterations .following minimization problem is deﬁned: Output: Dictionaries , sparse coefﬁcients , source matrix , and mixing matrix . 1 begin (10) 2 Initialization: 3 Set to overcomplete DCT;where denotes the dictionary corresponding to the thsource, i.e., , and is the th residual expressed as 4 Set to a random column-normalized matrix; 5 ; (11) 6 Choose to be multiple times of ; 7 ; It is important to note that, with this new formulation, we haveto learn dictionaries, one for each source, different from (5). 8 repeatThe advantage of this scheme is learning adaptive source-spe- 9 ;ciﬁc dictionaries, which improves source diversity, as also seenin MMCA using known transform domains . In addition, the 10 for to do
ABOLGHASEMI et al.: BLIND SEPARATION OF IMAGE SOURCES VIA ADAPTIVE DICTIONARY LEARNING 2925 11 Extract all the patches from ; Consider a known global unitary basis for each source. The minimization problem (10) can be modiﬁed as fol- 12 Solve the sparse recovery problem: lows to capture both global and local structures of the sources: s.t. ; 13 Update using K-SVD; (15) 14 Calculate the residual: ; Note that term is exactly similar to what was used in 15 Compute using (13); the original MMCA . All variables in the above expression can be similarly estimated using Algorithm 1, except the actual 16 ; sources . In order to ﬁnd , the gradient of (15) with 17 ; respect to is set to zero, leading to 18 end 19 ; 20 until stopping criterion is met; sgn (16)21 end As already implied, the motivation behind the proposed al-gorithm is to learn source-speciﬁc dictionaries offering sparse where sgn is a componentwise signum function. The aboverepresentations. Such dictionary learning, when embedded into expression amounts to soft thresholding  due to the signumthe source separation process, improves both separation and dic- function, and hence, the estimation of can be obtained bytionary learning quality. In other words, in the ﬁrst few iter- the following steps:ations of Algorithm 1, each source includes portions of other • Soft thresholding of with threshold andsources. However, the dictionaries gradually learn the domi- attaining ;nant components and reject the weak portions caused by other • Reconstructing bysources. Using these dictionaries, the estimated sources, whichare used for dictionary learning in the next iteration, will have (17)less amount of irrelevant components. This adaptive process isrepeated until most of the irrelevant portions are rejected anddictionaries become source speciﬁc. The entire above procedure Note that, since is a unitary and known matrix, it is notis carried out to minimize the cost function in (10). It is also explicitly stored but implicitly applied as a forward or inversenoteworthy to mention that (10) is not jointly convex in all pa- transform where applicable. Similar to the previous section, therameters. Therefore, the alternating estimation of these parame- above expression can be executed pixelwise and is not compu-ters, using the proposed method, does not necessarily lead to an tationally expensive.optimal solution. This is the case for other previous alternatingminimization problems too , . However, the obtainedparameters are local minima of (10) and, hence, are considered V. PRACTICAL ISSUESas approximate solutions. Implementation of the proposed methods needs some careful considerations, which are addressed here. IV. GLOBAL DICTIONARIES VERSUS LOCAL DICTIONARIES A. Noise The proposed method takes the advantages of fully local dic- Similar to the denoising methods in , the proposed methodtionaries that are learned within the separation task. We call also requires the knowledge about noise power. This should bethese dictionaries local since they capture the structure of small utilized in the sparse coding step of the K-SVD algorithm forimage patches to generate dictionary atoms. In contrast, the solving the following problem:global dictionaries are generally applied to the entire image orsignal and capture the global features. Incorporating the global s.t. (18)ﬁxed dictionaries into the proposed method can be advanta-geous where a prior knowledge about the structure of sources where is a constant and is the noise standard deviation. Theis available. Such combined local and global dictionaries have above well-known sparse recovery problem can be solved usingbeen used in  for single mixture separation. Here, we con- orthogonal matching pursuit (OMP) . The FOCal Underde-sider the multichannel case and extend our proposed method in termined System Solver (FOCUSS)  method is also used toSection III for this purpose. solve (18) by replacing the -norm with the -norm.
2926 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 6, JUNE 2012 Furthermore, incorporating the noise power in solving (18) and , where denotes the number of ex-has an important advantage in the dictionary update stage. It tracted patches. Finally, normalization in line 17 costs .ensures that the dictionary does not learn the existing noise in Since all these calculations are executed for each source (i.e.,the patches. Consequently, the estimated image using this dic- ), then the total computation cost per each iteration oftionary would become cleaner, which will later reﬁne the dic- the algorithm would betionary atoms in the next iteration. This progressive denoising . It is seen that, due to learning one dictio-loop is repeated until a clean image is achieved. nary for each individual source, the proposed algorithm would In the proposed method, however, noise means the portions be computationally demanding for large-scale problems.of other sources that are mixed with . Hence, we initially con- A possible approach to speed up the algorithm is to learn onesider a high value for since the portions of other sources dictionary for all the sources and then choose relevant atomsmight be high although the noise is zero, and then gradually for estimating each source. This strategy, however, has not beenreduce it to zero as the iterations evolve. Nevertheless, if the completed yet and is under development. In addition, consid-observations themselves are noisy, then we should start from a ering faster dictionary learning algorithms rather than K-SVDhigher bound and decrease it toward the actual noise power as can alleviate the problem.the iterations evolve. VI. RESULTSB. Dictionary and Patch Sizes In the ﬁrst experiment, we illustrate the results of applying Unfortunately, not much can be theoretically said about Algorithm 1 to the mixtures of four sources. A severe casechoosing the optimum dictionary size (or redundancy factor of image sources with different morphologies was chosen to ). Highly overcomplete dictionaries allow higher sparsity examine the performance of the proposed method. A 6 4of the coefﬁcients, although being computationally expensive. full-rank random column-normalized was used as the mixingHowever, it may not be necessary to choose very high redun- matrix. Five hundred iterations were selected as the stoppingdancy and it depends on the data type and number of training criterion. The mixtures were noiseless. However, we selectedsignals . There have been some reported works allowing a (see below for the reasoning of this choice) and usedvariable dictionary size to ﬁnd the optimum redundancy factor a decreasing starting from 10 and reaching 0.01 at the endsuch as in  and . Such analysis is out of the scope of of iterations. The patches had 50% overlap. Other parametersthis paper, and we choose a ﬁxed redundancy factor in our were chosen as follows: , , ,simulations. and . In addition, in order to show the advantages of Patch size is normally chosen depending on the entire image learning adaptive local dictionaries over the ﬁxed local dic-size and also the knowledge (if available) about the patterns of tionaries, we applied Algorithm 1 to the same mixtures whilethe actual sources. However, adopting very large patches should ignoring the dictionary learning part. This way, source sepa-be avoided as they lead to large dictionaries and also provide ration is performed using local ﬁxed DCT dictionaries for allfew training signals for the dictionary learning stage. We have sources. We also applied the following methods for comparisonchosen 8 8 patches for all the experiments, which seems to purposes: GMCA3  based on wavelet transform, SPICA4become a standard in the literature.  based on WPT, and SMICA  based on Fourier trans- Furthermore, we normally use overlapping patches as shown form. Fig. 1 illustrates the results of applying all these methodsto give better results in our simulations and also in the cor- together with the corresponding MSEs. MSE is calculated asresponding literature , . However, we have empirically MSE , where and are the originalobserved that the percentage of overlapping can be adjusted and recovered images, respectively. Fig. 1 (see the bottom row)based on the noise level. In noiseless settings, less percentage of shows that the proposed method could successfully recover theoverlap (e.g., 50%) would be sufﬁcient, whereas in noisy situa- image sources via adaptive learning of the sparsifying dictio-tions, full overlapped patches result in the desired performance. naries. The achieved MSEs, given in Fig. 1, are lowest for theFull overlapped patches are obtained by one-pixel shifting of proposed method for all image sources except for the bricklikethe operator , in both vertical and horizontal directions, over texture. In addition, the learned dictionary atoms, as shown inthe entire image. Fig. 2, indicate good adaptation to the corresponding sources.C. Complexity Analysis Other methods do not perform as well as the proposed method. For instance, as shown in Fig. 1, SPICA has a problem in sep- The proposed algorithm is more computationally expensive aration of noiselike texture and Barbara. GMCA has separatedthan standard MMCA. It includes K-SVD, which imposes two all image sources with some interference from other sources.extra steps of sparse coding and dictionary update. However, SMICA has perfectly recovered the bricklike texture with thethe complexity of computing is almost the same as in MMCA lowest MSE among all other methods. However, it has somesince (13) can be computed pixelwise. The detailed analysis of difﬁculties in recovering Barbara and cartoon boy images.the complexity of the proposed method per iteration is as fol- As another measure on the performance of the proposedlows. Consider lines 12 and 13 in Algorithm 1, and assume method, the reconstruction error as a function of the numberfurther that we use approximate K-SVD . Based on , of iterations is shown in Fig. 3. It is shown in this ﬁgure thatthese two lines cost , where , foreach . The complexity of computing the residual (line 14) is 3Available at: http://md.cosmostat.org/Home.html . Lines 15 and 16 respectively cost 4Available at: http://visl.technion.ac.il/bron/spica/
ABOLGHASEMI et al.: BLIND SEPARATION OF IMAGE SOURCES VIA ADAPTIVE DICTIONARY LEARNING 2927 Fig. 2. Obtained dictionaries as a result of applying the proposed method to the mixtures shown in Fig. 1. Fig. 3. MSE computed as . Note that the elements of image sources have amplitude in the range [0, 255]. Fig. 1. Results of applying different methods to six noiseless mixtures. lead the algorithm to fail. Among the existing related methods,a monotonic decrease in the value of the separation error is GMCA  uses a thresholding scheme to denoise the imageachieved and the asymptotic error is almost zero. sources during the separation. Hence, we applied this algorithm In the next experiment, the performance of the proposed to the same mixtures to compare its performance with that ofmethod in noisy situations is evaluated. For this purpose, we the proposed method. We applied SPICA and SMICA to thegenerated four mixtures from two image sources, namely, Lena same mixtures too. The results of this experiment are demon-and boat. Then, Gaussian noise with a standard deviation of strated in Fig. 4. It is shown in Fig. 4(f) that both separation and15 was added to the mixtures so that the peak signal-to-noise denoising tasks have been successful in the proposed method.ratios (PSNRs) of the mixtures are equal to 20 dB. (Note that The corresponding learned dictionaries using the proposedthe image size is 128 128.) The noisy mixtures are shown in method, given in Fig. 5(a), conﬁrm these observations. InFig. 4(a). We applied the proposed algorithm to these mixtures addition, the proposed method is superior to GMCA, SPICA,starting with and evolving while was gradually and SMICA, and it denoises the sources with higher PSNRs,decreasing to . We used fully overlapped patches and as seen by comparing Fig. 4(c)–(f). Although GMCA, SPICA,200 iterations, with the rest of parameters similar to those in and SMICA have separated the image sources, they are notthe ﬁrst experiment. One of the considerable advantages of the successful in denoising them.proposed method is the ability to denoise the sources during In addition, the sparsity rate of ’s, obtained by averagingthe separation. This is because the core idea of the proposed over the support of all (where is the number ofmethod comes from a denoising task. In contrast, in most patches), for both sources is given in Fig. 5(b). Note that a 4conventional BSS methods, denoising should be carried out 2 mixing matrix was used in this experiment. This grapheither before or after the separation, which is not ideal and may shows the percentage of used atoms by sparse coefﬁcients,
2928 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 6, JUNE 2012 Fig. 4. (a) Noisy mixtures. (b) Original sources. Separated sources using (c) GMCA, (d) SPICA, (e) SMICA, and (f) the proposed method. linear combinations of their atoms are sufﬁcient to generate the image sources of interest. In another experiment, we evaluated and compared the inﬂu- ences of adding global sparsity prior to the proposed algorithm. The experiment settings are as follows. Two images, namely, texture and Lena, were mixed together using a 4 2 random . Algorithm 1, which only takes advantage of locally learned dictionaries, was applied to the mixtures. The proposed algo- rithm in Section IV (local and global) was also applied to the same mixtures. We considered discrete wavelet transform and DCT as global sparsifying bases for Lena and texture, respec- tively. The methods of FastICA5 , JADE , and SMICA were also used in this experiment for comparison. We varied the noise standard deviation from 0 to 20 to investigate the perfor- mance of all these methods in dealing with different noise levels. Similar to  and , we calculated the mixing matrix crite- rion (also called Amari criterion ) as , where is the scaling and permutation matrix.6 The mixing ma- trix criterion curves are depicted in Fig. 6 as a function of stan- dard deviation. As it is shown in Fig. 6, in low/moderate noise situations, the best performance is achieved when both global and local dictionaries are used. However, the performance of the proposed method using only local and both local and globalFig. 5. (a) Obtained dictionaries as a result of applying the proposed method to dictionaries decreases in high noise. Moreover, the recovery re-the mixtures shown in Fig. 4(a). (b) Percentage of average number of nonzeros sults of JADE and FastICA are less accurate than those of theof sparse coefﬁcients, equivalent to the occurrence rate of dictionary atomssorted based on a descending order. other methods. The performance of SMICA is somewhere be- tween JADE’s and that of the proposed method. Next, in order to ﬁnd an optimal , we investigated the ef-sorted in descending order, for both dictionaries corresponding fects of choosing different ’s on the recovery quality. As ex-to Lena and boat. As found from Fig. 5(b), the most frequently pected and also shown in , selection of is dependent on theused atom shows a small percentage of appearance ( 17%), 5Available at: http://research.ics.tkk.ﬁ/ica/fastica/and other dictionary atoms have been used less frequently. This 6The proposed method only causes signed permutations due to dealing withmeans that the learned dictionaries are so efﬁcient that few column-normalized mixing matrices.
ABOLGHASEMI et al.: BLIND SEPARATION OF IMAGE SOURCES VIA ADAPTIVE DICTIONARY LEARNING 2929 TABLE I DETAILED SIMULATION TIME OF ALGORITHM 1. “TOTAL” MEANS THE TOTAL TIME (IN SECONDS) ELAPSED PER ONE ITERATION sources. A set of simulations was conducted to numerically demonstrate this effect and also to evaluate the effects of changing other dimensions such as patch size and dictionary size on the simulation time. In Table I, we have given these re- sults per one iteration of Algorithm 1, where the image patches had 50% overlap. A desktop computer with a Core 2 Duo CPU of 3 GHz and 2 GB of RAM was used for this experiment. It is interesting to note that, as shown in Table I, much of the computational burden is incurred by the dictionary update stage (using K-SVD). In addition, the computation of is far less complicated than both “sparse coding” and “dictionary update” due to pixelwise operation. This implies that furtherFig. 6. Mixing matrix criterion as a function of noise standard deviation fordifferent algorithms. Note that the “Global” method is actually equivalent to effort is required to speed up the dictionary learning part of thethe original MMCA in . proposed algorithm. VII. CONCLUSION In this paper, the BSS problem has been addressed. The aim has been to take advantage of sparsifying dictionaries for this purpose. Unlike the existing sparsity-based methods, we as- sumed no prior knowledge about the underlying sparsity domain of the sources. Instead, we have proposed to fuse the learning of adaptive sparsifying dictionaries for each individual source into the separation process. Motivated by the idea of image de- noising via a learned dictionary from the patches of the cor- rupted image in , we proposed a hierarchical approach for this purpose. Our simulation results on both noisy and noiseless mix- tures have been encouraging and conﬁrmed the effectiveness of the proposed approach. However, further work is requiredFig. 7. Mixing matrix criterion as a function of . to speed up the algorithm and make it suitable for large-scale problems. In addition, the possibility of applying the proposed method to the underdetermined mixtures (less observations thannoise standard deviation. However, our case is slightly different sources) is on our future plans.due to the source separation task.7 In our simulations, Gaussiannoises with different power values were added to the mixtures REFERENCESand the mixing matrix criterion was calculated while varying  M. Elad and M. Aharon, “Image denoising via sparse and redun- . Fig. 7 represents the achieved results for this experiment. In dant representations over learned dictionaries,” IEEE Trans. Imagethis ﬁgure, it can be found that all the curves achieve nearly the Process., vol. 15, no. 12, pp. 3736–3745, Dec. 2006.same recovery quality for . Therefore, is  A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Anal- ysis. New York: Wiley-Interscience, May 2001.a reasonable choice. Interestingly, these results and the range of  D. D. Lee and H. S. Seung, “Learning the parts of objects by non-appropriate choices for are very similar to what were achieved negative matrix factorization,” Nature vol. 401, no. 6755, pp. 788–791,in . Oct. 1999 [Online]. Available: http://dx.doi.org/10.1038/44565  K. Kayabol, E. E. Kuruoglu, and B. Sankur, “Bayesian separation As already mentioned, it is expected that the computation of images modeled with MRFs using MCMC,” IEEE Trans. Imagetime of the proposed method increases for a large number of Process., vol. 18, no. 5, pp. 982–994, May 2009.  J.-F. Cardoso, H. Snoussi, J. Delabrouille, and G. Patanchon, “Blind 7Indeed, we empirically observed better performance by starting with a higher separation of noisy Gaussian stationary sources. Application to cosmicvalue of for solving (18) and decreasing it toward the true noise standard microwave background imaging,” in Proc. 11th EUSIPCO, Toulouse,deviation as the iterations proceed. France, Sep. 2002, pp. 561–564.
2930 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 6, JUNE 2012  Y. Moudden, J.-F. Cardoso, J.-L. Starck, and J. Delabrouille, “Blind  M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for component separation in wavelet space: Application to CMB analysis,” designing overcomplete dictionaries for sparse representation,” IEEE EURASIP J. Appl. Signal Process., vol. 2005, pp. 2437–2454, 2005. Trans. Signal Process., vol. 54, no. 11, pp. 4311–4322, Nov. 2006.  A. Tonazzini, L. Bedini, E. E. Kuruoglu, and E. Salerno, “Blind  J. A. Tropp and A. C. Gilbert, “Signal recovery from random measure- separation of auto-correlated images from noisy mixtures using MRF ments via orthogonal matching pursuit,” IEEE Trans. Inf. Theory, vol. models,” in Proc. Int. Symp. ICA, Nara, Japan, Apr. 2003, pp. 675–680. 53, no. 12, pp. 4655–4666, Dec. 2007.  A. Tonazzini, L. Bedini, and E. Salerno, “A Markov model for blind  I. Gorodnitsky and B. Rao, “Sparse signal reconstruction from limited image separation by a mean-ﬁeld EM algorithm,” IEEE Trans. Image data using FOCUSS: A re-weighted minimum norm algorithm,” IEEE Process., vol. 15, no. 2, pp. 473–482, Feb. 2006. Trans. Signal Process., vol. 45, no. 3, pp. 600–616, Mar. 1997.  H. Snoussi and A. Mohammad-Djafari, “Fast joint separation and seg-  M. Yaghoobi, T. Blumensath, and M. Davies, “Parsimonious dictio- mentation of mixed images,” J. Electron. Imaging, vol. 13, no. 2, pp. nary learning,” in Proc. IEEE ICASSP, Apr. 2009, pp. 2869–2872. 349–361, Apr. 2004.  S. Tjoa, M. Stamm, W. Lin, and K. Liu, “Harmonic variable-size dic-  E. Kuruoglu, A. Tonazzini, and L. Bianchi, “Source separation in noisy tionary learning for music source separation,” in Proc. IEEE ICASSP, astrophysical images modelled by Markov random ﬁelds,” in Proc. Mar. 2010, pp. 413–416. ICIP, Oct. 2004, vol. 4, pp. 2701–2704.  R. Rubinstein, M. Zibulevsky, and M. Elad, Efﬁcient implementation  M. M. Ichir and A. Mohammad-Djafari, “Hidden Markov models for of the K-SVD algorithm using batch orthogonal matching pursuit Tech- wavelet-based blind source separation,” IEEE Trans. Image Process., nion, Haifa, Israel, Tech. Rep., 2008. vol. 15, no. 7, pp. 1887–1899, Jul. 2006.  A. Hyvarinen, “Fast and robust ﬁxed-point algorithms for independent  W. Addison and S. Roberts, “Blind source separation with non-sta- component analysis,” IEEE Trans. Neural Netw., vol. 10, no. 3, pp. tionary mixing using wavelets,” in Proc. Int. Conf. ICA, Charleston, 626–634, May 1999. SC, 2006.  S. Amari, A. Cichocki, and H. H. Yang, “A new learning algorithm for  A. M. Bronstein, M. M. Bronstein, M. Zibulevsky, and Y. Y. Zeevi, blind signal separation,” in Advances in Neural Information Processing “Sparse ICA for blind separation of transmitted and reﬂected images,” Systems. Cambridge, MA: MIT Press, 1996, pp. 757–763. Int. J. Imaging Syst. Technol., vol. 15, no. 1, pp. 84–91, 2005.  M. G. Jafari and M. D. Plumbley, “Separation of stereo speech signals based on a sparse dictionary algorithm,” in Proc. 16th EUSIPCO, Lau- sanne, Switzerland, Aug. 2008, pp. 25–29.  M. G. Jafari, M. D. Plumbley, and M. E. Davies, “Speech separation Vahid Abolghasemi (S’08–M’12) received the using an adaptive sparse dictionary algorithm,” in Proc. Joint Work- Ph.D. degree in signal and image processing from shop HSCMA, Trento, Italy, May 2008, pp. 25–28. the University of Surrey, Guildford, U.K., in 2011.  J. Bobin, Y. Moudden, J. Starck, and M. Elad, “Morphological diver- He is currently with the School of Engineering sity and source separation,” IEEE Signal Process. Lett., vol. 13, no. 7, and Design, Brunel University, Uxbridge, U.K. His pp. 409–412, Jul. 2006. main research interests include sparse signal and  J. Bobin, J. Starck, J. Fadili, and Y. Moudden, “Sparsity and mor- image analysis, dictionary learning, compressed phological diversity in blind source separation,” IEEE Trans. Image sensing, and blind source separation. Process., vol. 16, no. 11, pp. 2662–2674, Nov. 2007.  J. Starck, M. Elad, and D. Donoho, “Redundant multiscale transforms and their application for morphological component separation,” Adv. Imaging Electron Phys., vol. 132, pp. 287–348, 2004.  J.-L. Starck, M. Elad, and D. Donoho, “Image decomposition via the combination of sparse representations and a variational approach,” IEEE Trans. Image Process., vol. 14, no. 10, pp. 1570–1582, Oct. Saideh Ferdowsi (S’09) received the B.Sc. and M.Sc. degrees in electronic 2005. engineering from Shahrood University of Technology, Shahrood, Iran, in 2005  M. Elad, J. Starck, P. Querre, and D. Donoho, “Simultaneous cartoon and 2007, respectively. She is currently working toward the Ph.D. degree in the and texture image inpainting using morphological component analysis Faculty of Engineering and Physical Sciences, University of Surrey, Guildford, (MCA),” Appl. Comput. Harmonic Anal., vol. 19, no. 3, pp. 340–358, U.K. Nov. 2005. Her main research interests include blind source separation and biomedical  J. Bobin, J.-L. Starck, J. M. Fadili, Y. Moudden, and D. L. Donoho, signal and image processing. “Morphological component analysis: An adaptive thresholding strategy,” IEEE Trans. Image Process., vol. 16, no. 11, pp. 2675–2681, Nov. 2007.  G. Peyre, J. Fadili, and J. Starck, “Learning the morphological diver- sity,” SIAM J. Imaging Sci., vol. 3, no. 3, pp. 646–669, Jul. 2010. Saeid Sanei (SM’05) received the Ph.D. degree in  G. Peyre, J. Fadili, and J.-L. Starck, “Learning adapted dictionaries for biomedical signal and image processing from Impe- geometry and texture separation,” in Proc. SPIE, 2007, p. 670 11T. rial College London, London, U.K., in 1991.  M. Elad, Sparse and Redundant Representations: From Theory He is currently a Reader in Neurocomputing at the to Applications in Signal and Image Processing. New York: Faculty of Engineering and Physical Sciences, Uni- Springer-Verlag, 2010. versity of Surrey, Guildford, U.K. His main area of  N. Shoham and M. Elad, “Algorithms for signal separation exploiting research is signal processing and its application in sparse representations, with application to texture image separation,” biomedical engineering. in Proc. IEEEI 25th Conv. Elect. Electron. Eng. Israel, Dec. 2008, pp. Dr. Sanei is an associate editor of IEEE SIGNAL 538–542. PROCESSING LETTERS and EURASIP Journals.