Independent Component Analysis
1
 Significant recent advances in the field of statistical signal processing
should be brought to the attention of the biomedical engineering
community.
 Algorithms have been proposed to separate multiple signal sources based
solely on their statistical independence, instead of the usual spectral
differences.
 These algorithms promise to:
 lead to more accurate source modeling,
 enable more effective artifact rejection algorithms.
2
Motivation
 Problem: Decomposing a mixed signal into independent sources
Ex.
Given: a mixed signal
Objective: recover the individual sources
Source 1: News
Source 2: Song
 ICA (Independent Component Analysis) is a powerful technique for
separating independent sources
3
BSS and ICA
 Cocktail party or Blind Source Separation (BSS) problem
 An ill-posed problem, unless assumptions are made!
 The most common assumption is that the source signals are statistically
independent: knowing the value of one of them gives no information about
the others.
 Methods based on this assumption are called Independent Component
Analysis (ICA) methods: statistical techniques for decomposing a complex
data set into independent parts.
 It can be shown that under some reasonable conditions, if the
ICA assumption holds, then the source signals can be recovered
up to permutation and scaling.
4
 Statistical independence
 In probability theory, to say that two events are independent
means that the occurrence of one event makes it neither more
nor less probable that the other occurs.
 Examples of such independent random variables are the value of a thrown
die and of a tossed coin, or a speech signal and background noise
originating from a ventilation system at a given time instant.
 In the context of ICA: given two signals A and B, neither signal gives
any information about the other; this is the definition of statistical
independence as used by ICA.
Definitions
P(x|y) = P(x, y) / P(y)
Statistical independence: P(x, y) = P(x) P(y), and therefore P(x|y) = P(x).
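The die-and-coin example above can be checked empirically: for independent variables the joint probability factorizes into the product of the marginals (a small simulation sketch, not part of the original slides):

```python
import numpy as np

# Empirical check of statistical independence: a fair die roll X and a
# fair coin flip Y are independent by construction, so the joint
# probability should factorize: P(X=x, Y=y) = P(X=x) * P(Y=y).
rng = np.random.default_rng(0)
n = 200_000
die = rng.integers(1, 7, size=n)    # values 1..6
coin = rng.integers(0, 2, size=n)   # 0 = tails, 1 = heads

p_joint = np.mean((die == 3) & (coin == 1))     # estimate of P(X=3, Y=heads)
p_marg = np.mean(die == 3) * np.mean(coin == 1)  # product of the marginals

# Both should be close to 1/6 * 1/2 = 1/12 ~ 0.0833
print(p_joint, p_marg)
```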
5
Example
cocktail-party problem
The microphones give us two recorded time signals, denoted x = (x1(t), x2(t)),
where x1 and x2 are amplitudes and t is the time index. We denote the
independent source signals by s = (s1(t), s2(t)), and A is the 2x2 mixing matrix:
x1(t) = a11 s1(t) + a12 s2(t)
x2(t) = a21 s1(t) + a22 s2(t)
a11, a12, a21, and a22 are parameters that depend on the distances of the
microphones from the speakers. We would like to estimate the two original
speech signals s1(t) and s2(t) using only the recorded signals x1(t) and x2(t).
The aij are unknown, but it is enough to assume that s1(t) and s2(t) are
statistically independent at each time instant t. The main task is to
transform the data x into components s = Wx that are as independent as
possible, as measured by some function F(s1, s2).
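The two-microphone model above can be sketched numerically. This is a toy simulation with made-up source waveforms and mixing coefficients, not real recordings:

```python
import numpy as np

# Toy version of the two-microphone setup: two independent sources
# (a square wave and a sine, stand-ins for "news" and "song") mixed by
# a 2x2 matrix A whose entries play the role of a11..a22.
t = np.linspace(0, 1, 1000)
s1 = np.sign(np.sin(2 * np.pi * 5 * t))   # square-wave source
s2 = np.sin(2 * np.pi * 13 * t)           # sine source
S = np.vstack([s1, s2])                   # sources, shape (2, 1000)

A = np.array([[0.9, 0.4],                 # assumed mixing coefficients
              [0.3, 0.8]])               # (distances mics <-> speakers)
X = A @ S                                 # x1(t), x2(t): what the mics record

# Each row of X is one microphone: a different weighted blend of both sources.
print(X.shape)
```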
6
Recordings in real environments
Separation of Music & Speech
Experiment setup:
- office room (5 m x 4 m)
- two distant-talking mics
- 16 kHz sampling rate
- distances in the setup diagram: 40 cm and 60 cm
7
The ICA model
[Diagram: four sources s1..s4 feed four mixtures x1..x4 through the mixing
weights a11, a12, a13, a14, etc.]
xi(t) = ai1 s1(t) + ai2 s2(t) + ai3 s3(t) + ai4 s4(t)
Here, i=1:4.
In vector-matrix notation,
and dropping index t, this is
x = A * s
8
Restrictions on ICA
 To make sure that the basic ICA model just given can be estimated, we
have to make certain assumptions and restrictions.
 1. The independent components are assumed statistically independent.
 2. The independent components must have non-Gaussian distributions (at
most one component may be Gaussian).
 3. For simplicity, we assume that the unknown mixing matrix is square.
 In other words, the number of independent components is equal to the number
of observed mixtures.
9
Central Limit Theorem
 The central limit theorem states that a sum (linear combination) of
independent random variables is more Gaussian than the original variables
themselves.
 Example:
 Repeated throws of unbiased dice, with the result given by the sum of the faces.
 • For one die the distribution is uniform;
 • For two dice the sum is piecewise linearly distributed;
 • For a set of k dice, the distribution is a piecewise polynomial of
 order k-1;
 • As k increases, the distribution of the sum tends toward a Gaussian.
 See Matlab script !!
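The Matlab script is not included here; a Python sketch of the same experiment follows. Excess kurtosis is 0 for a Gaussian, so it should shrink toward 0 as the number of dice k grows:

```python
import numpy as np

# CLT illustration: the sum of k fair dice looks increasingly Gaussian.
# Excess kurtosis (0 for a Gaussian) shrinks roughly as 1/k.
rng = np.random.default_rng(2)

def excess_kurtosis(x):
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0

results = {}
for k in (1, 2, 5, 10):
    sums = rng.integers(1, 7, size=(100_000, k)).sum(axis=1).astype(float)
    results[k] = excess_kurtosis(sums)
    print(k, round(results[k], 3))
```

For a single die the excess kurtosis is about -1.27 (a flat, sub-Gaussian distribution); by ten dice it is already close to 0.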
10
One die
11
Two dice
12
Five dice
13
Nine dice
14
Ten dice
15
Why does ICA require signals to be
non-Gaussian ?
 The central limit theorem states that a linear combination of random
variables is more Gaussian than the original variables themselves.
 If the sources are already Gaussian, the mixtures are Gaussian too, and
their joint distribution is rotationally symmetric: no mixing direction is
preferred, so the mixing matrix cannot be identified and the original
sources cannot be recovered.
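A quick numerical illustration of why Gaussian sources fail: rotating two independent unit-variance Gaussians leaves their covariance (and in fact their whole joint distribution) unchanged, so no rotation can be singled out as "the" unmixing:

```python
import numpy as np

# Two independent unit-variance Gaussian sources have a rotationally
# symmetric joint density, so after mixing by ANY rotation R the data
# look statistically identical: nothing tells us which rotation undoes
# the mixing.
rng = np.random.default_rng(3)
S = rng.standard_normal((2, 100_000))   # independent Gaussian sources

theta = 0.7                             # an arbitrary rotation "mixing"
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = R @ S

# The covariance of the mixtures is still the identity: second-order
# statistics (all a Gaussian has) carry no trace of the rotation.
print(np.round(np.cov(X), 2))
```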
16
Mathematical framework
 ICA is a very general-purpose method of signal processing and data analysis.
 Definition of ICA: a statistical latent-variables model with n linear
mixtures of n independent components.
 X1, X2, ..., Xn and S1, S2, ..., Sn are considered random variables, not
proper time signals. The observed values are samples (instantiations) of
the random variables, not functions of time. The mean value is taken to be
zero, without loss of generality.
17
18
 The linear mixing equation (the ICA model): x = As
 Denoting by aj the j-th column of the matrix A, the model becomes
x = Σj aj sj
 The ICA model is a generative model, i.e., it describes how the observed data
are generated by mixing the components si .
 The independent components are latent variables, i.e., not directly observable.
 The mixing matrix A is also unknown.
 We observe only the random vector x, and we must estimate both A and s. This
must be done under as general assumptions as possible.
19
 ICA is a special case of blind source separation (BSS)
Blind means that we know very little, if anything, about the mixing matrix,
and make few assumptions about the source signals.
Basic ICA assumption: the source components are statistically independent,
and they have (unknown) distributions that are as non-Gaussian as possible;
this is enforced by optimizing a certain contrast function.
The problem is finding W, the unmixing matrix that gives the best estimate
of the independent source vector: s_hat = Wx
• If the unknown mixing matrix A is square and nonsingular, then W = A^(-1)
• Otherwise, the best unmixing matrix, separating the sources as
independently as possible, is given by the Moore-Penrose generalized
inverse: W = A^+ = (A^T A)^(-1) A^T
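These two formulas can be checked directly with a made-up mixing matrix. Note this assumes A is known, which in real ICA it is not; the point is only that the inverse and pseudoinverse recover the sources once A (or an estimate of it) is available:

```python
import numpy as np

# Unmixing when A is known: for square nonsingular A, W = A^-1 recovers
# the sources exactly; for a tall A (more mixtures than sources),
# W = pinv(A), the Moore-Penrose pseudoinverse, plays the same role.
rng = np.random.default_rng(4)
S = rng.uniform(-1, 1, size=(2, 5000))   # two independent sources

A_square = np.array([[0.9, 0.4],
                     [0.3, 0.8]])
X = A_square @ S
S_hat = np.linalg.inv(A_square) @ X      # W = A^-1, exact recovery

A_tall = rng.uniform(-1, 1, size=(3, 2))  # 3 mixtures of 2 sources
X3 = A_tall @ S
S_hat3 = np.linalg.pinv(A_tall) @ X3      # W = A^+, least-squares recovery

print(np.allclose(S_hat, S), np.allclose(S_hat3, S))
```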
20
Ambiguities of ICA
 Permutation Ambiguity:
 Assume that P is an n×n permutation matrix.
 Given only the mixtures x, we cannot distinguish between W and PW.
 The order (permutation) of the recovered sources is therefore ambiguous.
21
Ambiguities of ICA
 Scaling Ambiguity:
 We cannot recover the "correct" scaling of the sources.
 Partial remedy: data whitening (sphering), which fixes the variances to one.
 Scaling a speaker's speech signal by some positive factor affects only the
volume of that speaker's speech.
 Sign changes do not matter either: si and -si sound identical when played
on a speaker.
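A one-line numerical check of the scaling ambiguity (toy matrices, for illustration): rescaling a source while inversely rescaling the matching column of A leaves the observations untouched.

```python
import numpy as np

# Scaling ambiguity: multiplying a source by c while dividing the
# corresponding column of A by c leaves the mixtures x = A s unchanged,
# so the scale (and sign, for negative c) of each source is unrecoverable.
rng = np.random.default_rng(5)
S = rng.standard_normal((2, 1000))
A = np.array([[0.9, 0.4],
              [0.3, 0.8]])

c = -2.5                 # rescale (and flip) source 1
S2 = S.copy()
S2[0] *= c
A2 = A.copy()
A2[:, 0] /= c

print(np.allclose(A @ S, A2 @ S2))   # identical observations
```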
22
Whitening (sphering)
 Suppose X is a random column vector with mean zero and covariance matrix M.
 Whitening X means simply multiplying it by M^(-1/2).
 Steps:
 Find the mean of X and subtract it from X.
 Compute the covariance matrix Cov(X) = E[X X^T] of the centered data.
 Compute the inverse square root of Cov(X), i.e. Cov(X)^(-1/2).
 Multiply Cov(X)^(-1/2) by the centered X to get the whitened version of X.
 Whitening solves half of the ICA problem !!
 See matlab script !!
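The Matlab script is not included here; the steps above can be sketched in Python, using an eigendecomposition to form Cov(X)^(-1/2):

```python
import numpy as np

# Whitening sketch: center X, then multiply by Cov(X)^(-1/2) so the
# result has identity covariance.
rng = np.random.default_rng(6)
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
X = A @ rng.standard_normal((2, 50_000))       # correlated data, shape (2, n)

Xc = X - X.mean(axis=1, keepdims=True)         # 1) subtract the mean
C = Xc @ Xc.T / Xc.shape[1]                    # 2) covariance matrix
vals, vecs = np.linalg.eigh(C)                 # 3) C^(-1/2) via eigendecomposition
C_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
Z = C_inv_sqrt @ Xc                            # 4) whitened data

print(np.round(np.cov(Z), 2))                  # ~ identity matrix
```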
23
ICA Algorithm
 Finding the unmixing matrix W is a maximization problem.
 We want to find the unmixing matrix W such that the non-Gaussianity
(equivalently, the likelihood under a non-Gaussian source model) of the
recovered components is maximized.
 W is the parameter that we want to estimate.
 Given a training set x(i), i = 1, ..., m, and modeling each source
density through the sigmoid CDF g, the log likelihood is
l(W) = Σ_{i=1}^m ( Σ_{j=1}^n log g'(wj^T x(i)) + log |W| )
24
ICA Algorithm
 Maximize l(W) using gradient ascent.
 Taking the derivatives of l(W), and using ∇W log |W| = (W^T)^(-1), gives
the stochastic update
W := W + α ( (1 - 2 g(W x(i))) x(i)^T + (W^T)^(-1) )
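The update rule can be sketched as a full loop. This is a minimal batch version under the sigmoid source model; the made-up Laplacian sources stand in for real super-Gaussian signals such as speech:

```python
import numpy as np

# Minimal maximum-likelihood ICA under the sigmoid-CDF source model:
# batch gradient ascent on
#   W := W + alpha * ( mean_i[(1 - 2 g(W x(i))) x(i)^T] + (W^T)^(-1) )
rng = np.random.default_rng(7)

def g(z):                                     # sigmoid
    return 1.0 / (1.0 + np.exp(-z))

n = 5000
S = rng.laplace(size=(2, n))                  # independent non-Gaussian sources
A = np.array([[0.9, 0.4],
              [0.3, 0.8]])                    # "unknown" mixing (kept for checking)
X = A @ S

W = np.eye(2)
alpha = 0.1
for _ in range(1000):
    G = 1.0 - 2.0 * g(W @ X)                  # score of the logistic density
    grad = (G @ X.T) / n + np.linalg.inv(W.T)  # averaged log-likelihood gradient
    W = W + alpha * grad

# Up to the permutation and scaling ambiguities, W @ A should be close
# to a scaled permutation matrix, and W @ X should recover the sources.
print(np.round(W @ A, 2))
```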
25
ICA Algorithm
26
Uses of ICA
 Separation of artifacts in MEG (magnetoencephalography)
data
 Reducing Noise in Natural Images
 Telecommunications (CDMA [Code-Division Multiple Access]
mobile communications)
27
Application domains of ICA
 Blind source separation
 Image denoising
 Medical signal processing – fMRI, ECG, EEG
 Modelling of the hippocampus and visual cortex
 Feature extraction, face recognition
 Compression, redundancy reduction
 Watermarking
 Clustering
 Time series analysis (stock market, microarray
data)
28
Image denoising
[Figure: image denoising comparison showing the original image, the noisy
image, the Wiener-filtered result, and the ICA-filtered result]
29
Automatic Image Segmentation
30
Barcode Classification Results
Classifying 4 data sets: linear, postal, matrix, junk
31
Image De-noising
32
Filling in missing data
33
ICA applied to Brainwaves
An EEG recording consists of activity arising
from many brain and extra-brain processes
34
ICA applied to Brainwaves
35
References
 Aapo Hyvärinen, Juha Karhunen, Erkki Oja, Independent Component Analysis,
Wiley-Interscience (2001)
 http://www.cs.haifa.ac.il/~rita/uml_course/lectures/ICA.pdf
 https://www.youtube.com/watch?v=b2H9_VV1Qgg
 https://www.youtube.com/watch?v=smibJH-0YGc
 http://www.quora.com/Why-does-independent-component-analysis-require-non-gaussian-signals
36
37