
A Comparative Study between ICA (Independent Component Analysis) and PCA (Principal Component Analysis)


  1. A Comparative Study between ICA and PCA
     Md. Sahidul Islam, Roll No. 08054718
     Department of Statistics, University of Rajshahi
     ripon.ru.statistics@gmail.com
  2. Overview
     o Motivation of the study
     o Objective
     o Definition of ICA
     o FastICA algorithm
     o Results of the study
     o Latent structure
     o Cluster analysis
     o Outlier detection
     o Conclusions
     Department of Statistics, University of Rajshahi-6205
  3. Motivation of the study
     o In multivariate statistics, PCA is a long-established and promising technique for latent structure detection, cluster analysis, and outlier detection.
     o In many cases ICA performs better than PCA.
     o Our motivation in this thesis is to perform latent structure detection, cluster analysis, and outlier detection using ICA and to compare the results with those of PCA.
  4. Objectives
     o Study the algorithms of ICA.
     o Apply ICA to latent structure detection, cluster analysis, and outlier detection.
     o Compare its performance with that of PCA.
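  5. Independent Component Analysis
     The simple “Cocktail Party” problem: the sources $s_1, s_2$ pass through the mixing matrix $A$ to give the observations $x_1, x_2$.
     $x_1 = a_{11} s_1 + a_{12} s_2$
     $x_2 = a_{21} s_1 + a_{22} s_2$
     In matrix form, $x = As$ with $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$. ICA estimates the sources as $y = W^T x$.
     [Diagram: Sources, Mixing matrix A, Observations; panel labels ICA and PCA]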
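The mixing model on this slide can be sketched in a few lines of Python. This is only an illustration: the coefficient values in `A` and the source values in `s` are hypothetical, and the unmixing matrix is taken to be the exact inverse of `A`, which is not available in practice (ICA has to estimate it from the observations alone).

```python
def mix(A, s):
    """Return x = A s for a 2x2 matrix A and a length-2 vector s."""
    return [A[0][0] * s[0] + A[0][1] * s[1],
            A[1][0] * s[0] + A[1][1] * s[1]]

def inverse_2x2(A):
    """Return the inverse of a 2x2 matrix with non-zero determinant."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

# Hypothetical mixing coefficients a11, a12, a21, a22 (illustrative values only).
A = [[1.0, 0.5],
     [0.3, 1.0]]

s = [2.0, -1.0]        # source values s1, s2 at one time point (made up)
x = mix(A, s)          # observations x1, x2, i.e. x = A s

# Idealised unmixing: W^T = A^{-1}. ICA would have to estimate this from x.
W_T = inverse_2x2(A)
y = mix(W_T, x)        # y = W^T x recovers the sources

print("observations:", x)
print("recovered sources:", y)
```

With the exact inverse, `y` reproduces `s` up to floating-point error; an ICA estimate can recover the sources only up to order, sign, and scale.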
  6. Non-Gaussian is independent
     Central limit theorem: under mild conditions, the distribution of a sum of independent random variables tends toward a Gaussian distribution.
     Observed signal (closer to Gaussian) $= a_1 S_1 + a_2 S_2 + \dots + a_n S_n$, where each source $S_i$ is non-Gaussian.
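The central limit effect can be checked numerically. The sketch below is an illustration under assumptions that are not on the slide: the sources are uniform on [-1, 1], all weights $a_i$ equal 1, and excess kurtosis (zero for a Gaussian) is used as the measure of non-Gaussianity.

```python
import random

def excess_kurtosis(values):
    """Sample excess kurtosis: fourth standardized moment minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0

rng = random.Random(42)   # fixed seed so the run is repeatable
n_obs = 20000

# Excess kurtosis of a sum of 1, 2 and 8 independent uniform variables.
kurt_by_terms = {}
for n_terms in (1, 2, 8):
    sums = [sum(rng.uniform(-1.0, 1.0) for _ in range(n_terms))
            for _ in range(n_obs)]
    kurt_by_terms[n_terms] = excess_kurtosis(sums)
    print(n_terms, "term(s): excess kurtosis =", round(kurt_by_terms[n_terms], 3))
```

The excess kurtosis moves toward zero as more independent terms are added, which is the sense in which a mixture is closer to Gaussian than its sources.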
  7. Non-Gaussian is independent
     Non-Gaussianity is used to estimate the independent components:
     o Consider the estimate $y = w^T x = w^T A s = z^T s$, where $z = A^T w$.
     o $y$ is a linear combination of the $s_i$, so $z^T s$ is more Gaussian than any single $s_i$.
     o $z^T s$ becomes least Gaussian when it equals one of the $s_i$.
     o In that case $w^T x = z^T s$ equals an independent component.
     Maximizing the non-Gaussianity of $w^T x$ therefore gives one of the independent components.
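One way to make "least Gaussian" concrete is a short worked case with excess kurtosis as the measure of non-Gaussianity. The slide does not name a measure, so this choice is an assumption, as are the restrictions to two independent sources with unit variance and equal excess kurtosis $\kappa \neq 0$, and to a unit-variance $y$ (which forces $z_1^2 + z_2^2 = 1$).

```latex
\begin{aligned}
\operatorname{kurt}(y) &= \operatorname{kurt}(z_1 s_1 + z_2 s_2)
  = z_1^4 \operatorname{kurt}(s_1) + z_2^4 \operatorname{kurt}(s_2)
  = \kappa \left( z_1^4 + z_2^4 \right), \\
z_1^4 + z_2^4 &= \left( z_1^2 + z_2^2 \right)^2 - 2 z_1^2 z_2^2
  = 1 - 2 z_1^2 z_2^2 \le 1 .
\end{aligned}
```

So $|\operatorname{kurt}(y)| \le |\kappa|$, with equality exactly when $z_1 z_2 = 0$, that is, when $z = (\pm 1, 0)$ or $z = (0, \pm 1)$ and $y$ equals $\pm s_1$ or $\pm s_2$.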
  8. FastICA algorithm
     Iteration procedure for maximizing non-Gaussianity:
     Step 1: Choose an initial weight vector $w$.
     Step 2: Let $w^+ = E[x\,g(w^T x)] - E[g'(w^T x)]\,w$, where $g$ is the derivative of a non-quadratic function.
     Step 3: Let $w = w^+ / \|w^+\|$.
     Step 4: If not converged, go back to Step 2.
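The four steps above can be sketched for a two-dimensional case using only the Python standard library. Several things here are assumptions rather than part of the slide: the data are centered and whitened before iterating (the usual precondition for this update rule), $g = \tanh$ is chosen as the nonlinearity, the sources are uniform, the mixing matrix values are hypothetical, and convergence is declared when the direction of $w$ stops changing up to sign. It is a minimal sketch, not the implementation used in the study.

```python
import math
import random

def whiten_2d(xs):
    """Center 2-D data and rotate/scale it to identity covariance."""
    n = len(xs)
    m0 = sum(x[0] for x in xs) / n
    m1 = sum(x[1] for x in xs) / n
    c = [(x[0] - m0, x[1] - m1) for x in xs]
    c11 = sum(a * a for a, _ in c) / n
    c22 = sum(b * b for _, b in c) / n
    c12 = sum(a * b for a, b in c) / n
    # Rotation angle that diagonalizes the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * c12, c11 - c22)
    co, si = math.cos(theta), math.sin(theta)
    lam1 = c11 * co * co + 2.0 * c12 * co * si + c22 * si * si
    lam2 = c11 * si * si - 2.0 * c12 * co * si + c22 * co * co
    return [((co * a + si * b) / math.sqrt(lam1),
             (-si * a + co * b) / math.sqrt(lam2)) for a, b in c]

def fastica_one_unit(zs, w, max_iter=200, tol=1e-8):
    """One-unit fixed-point iteration with g = tanh on whitened data zs."""
    n = len(zs)
    for _ in range(max_iter):
        us = [w[0] * z[0] + w[1] * z[1] for z in zs]             # w^T z
        mean_gp = sum(1.0 - math.tanh(u) ** 2 for u in us) / n   # E[g'(w^T z)]
        # Step 2: w+ = E[z g(w^T z)] - E[g'(w^T z)] w
        w_plus = [sum(z[k] * math.tanh(u) for z, u in zip(zs, us)) / n
                  - mean_gp * w[k] for k in range(2)]
        # Step 3: normalize to unit length.
        norm = math.hypot(w_plus[0], w_plus[1])
        w_new = [w_plus[0] / norm, w_plus[1] / norm]
        # Step 4: converged when old and new directions agree up to sign.
        converged = abs(w_new[0] * w[0] + w_new[1] * w[1]) > 1.0 - tol
        w = w_new
        if converged:
            break
    return w

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(0)
n_obs = 2000
half = math.sqrt(3.0)   # uniform on [-sqrt(3), sqrt(3)] has unit variance
s1 = [rng.uniform(-half, half) for _ in range(n_obs)]
s2 = [rng.uniform(-half, half) for _ in range(n_obs)]
A = [[1.0, 0.5], [0.3, 1.0]]   # hypothetical mixing matrix
xs = [(A[0][0] * p + A[0][1] * q, A[1][0] * p + A[1][1] * q)
      for p, q in zip(s1, s2)]

zs = whiten_2d(xs)
w = fastica_one_unit(zs, [1.0, 0.0])          # Step 1: initial weight vector
y = [w[0] * z[0] + w[1] * z[1] for z in zs]   # estimated component
best = max(abs(corr(y, s1)), abs(corr(y, s2)))
print("largest |correlation| with a true source:", round(best, 3))
```

The absolute correlation is used because ICA recovers a component only up to sign; which of the two sources is found depends on the initial weight vector.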
  9. Results and Discussions: Latent structure detection
  10. Simulated dataset-1
      Figure: Matrix plot of the 10 original source variables, each drawn from a uniform distribution.
  11. Simulated dataset-1
      Figure: (a) Matrix plot of the 10 principal components. (b) Matrix plot of the source variables.
  12. Simulated dataset-1
      Figure: (a) Matrix plot of the 10 independent components. (b) Matrix plot of the source variables.
  13. Simulated dataset-2
      Simulated dataset-2 consists of 5 variables drawn from the Laplace (super-Gaussian), uniform (sub-Gaussian), binomial, multinomial, and normal distributions, each with 10000 observations.
      Figure: Matrix plot of the 5 original source variables, each drawn from a different distribution.
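A dataset of this shape could be generated along the following lines. The slide gives only the distribution families and the number of observations, so every parameter below (Laplace scale, uniform range, binomial trials and probability, category probabilities) is a hypothetical choice, and the multinomial variable is represented by a single categorical draw per observation. The excess kurtosis printed at the end illustrates the super-Gaussian and sub-Gaussian labels.

```python
import random

rng = random.Random(1)   # fixed seed so the run is repeatable
N_OBS = 10000            # number of observations stated on the slide

def laplace():
    # The difference of two independent Exp(1) draws follows Laplace(0, 1).
    return rng.expovariate(1.0) - rng.expovariate(1.0)

def binomial(trials=10, p=0.5):
    # Hypothetical parameters: 10 trials with success probability 0.5.
    return sum(1 for _ in range(trials) if rng.random() < p)

def categorical(probs=(0.2, 0.3, 0.5)):
    # Assumption: one multinomial trial, returned as a category index.
    return rng.choices(range(len(probs)), weights=probs)[0]

sources = {
    "laplace": [laplace() for _ in range(N_OBS)],
    "uniform": [rng.uniform(-1.0, 1.0) for _ in range(N_OBS)],
    "binomial": [binomial() for _ in range(N_OBS)],
    "multinomial": [categorical() for _ in range(N_OBS)],
    "normal": [rng.gauss(0.0, 1.0) for _ in range(N_OBS)],
}

def excess_kurtosis(values):
    """Sample excess kurtosis: fourth standardized moment minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0

kurt = {name: excess_kurtosis(vals) for name, vals in sources.items()}
for name, value in kurt.items():
    print(name, round(value, 3))
```

A positive excess kurtosis corresponds to super-Gaussian, a negative one to sub-Gaussian, and a value near zero is expected for the normal variable.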
  14. Simulated dataset-2
      Figure: (Left) Matrix plot of the principal components. (Right) The 5 original source variables, each drawn from a different distribution.
  15. Simulated dataset-2
      Figure: (Left) Matrix plot of the independent components. (Right) The 5 original source variables, each drawn from a different distribution.
  16. Cluster Analysis
  17. Australian Crabs dataset
      The first real dataset used for clustering is the Australian crabs dataset, which has 200 rows and 8 columns, including 5 morphological measurements (frontal lobe size, rear width, carapace length, carapace width, body depth). The data cover two species of the genus Leptograpsus, each with both sexes (male, female). There are 50 specimens of each sex of each species, collected on site at Fremantle, Western Australia (N. A. Campbell et al., 1974).
  18. Fisher Iris dataset
      The second real dataset is the world-famous Fisher's Iris dataset, which reports four characteristics (sepal width, sepal length, petal width, and petal length) of three species (setosa, versicolor, virginica) of Iris flower.
  19. Outlier detection
  20. Scottish hill racing dataset
      The data give the record winning times for 35 hill races in Scotland (Atkinson, 1986). The purpose of that study was to investigate the relationship of the record times for the 35 hill races.
  21. Epilepsy dataset
      Thall and Vail reported data from a clinical trial of 59 patients with epilepsy, 31 of whom were randomized to receive the anti-epilepsy drug Progabide and 28 to receive a placebo.
  22. Stackloss data
      These data cover 21 days of operation of a plant for the oxidation of ammonia as a stage in the production of nitric acid. The response, called stack loss, is the percentage of unconverted ammonia that escapes from the plant. The dataset has three explanatory variables and one response variable.
  23. Education expenditure dataset
      These data are used by Chatterjee, Hadi, and Price as an example of heteroscedasticity. The data give the education expenditures of U.S. states as projected in 1975.
  24. Conclusions
      If the subject domain supports the assumption of independent non-Gaussian source variables, we recommend using ICA in place of PCA for latent structure detection, clustering, and outlier detection.
  25. Future Research
      We want to study the following areas:
      o Using kernel ICA techniques for shape study, clustering, and outlier detection.
      o Separation of nonlinear mixtures.
      o Data mining (sometimes called data or knowledge discovery) is the most recent technique in multivariate analysis for extracting information from a dataset and transforming it into an understandable structure for further use. Text data mining or medical data mining using ICA would be future research.
  26. Thank you
