FEATURE SELECTION
Presented by Priagung Khusumanegara
Prof. Soo-Hyung Kim
OUTLINE
• CLASS SEPARABILITY MEASURES
 Divergence
 Bhattacharyya Distance and Chernoff Bound
 Measures Based on Scatter Matrices
• FEATURE SUBSET SELECTION
 Scalar Feature Selection
 Feature Vector Selection
CLASS SEPARABILITY MEASURES
This section focuses on combinations of features (i.e., feature
vectors) and describes measures that quantify class separability
in the respective feature space.
Three class-separability measures are considered:
1. Divergence
2. Bhattacharyya Distance and Chernoff Bound
3. Scatter matrices
Divergence
The divergence is based on a measure of the difference between pairs of classes.
For two normally distributed classes it is defined as
d1,2 = (1/2) trace{S1^(-1) S2 + S2^(-1) S1 - 2I} + (1/2) (m1 - m2)^T (S1^(-1) + S2^(-1)) (m1 - m2)
where Si is the covariance matrix and mi, i = 1, 2, is the mean vector of the
i-th class, and I is the l × l identity matrix.
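As a rough sketch of what the toolbox divergence function used below presumably computes (an assumption), the formula can be evaluated directly from sample estimates using only base MATLAB; the class matrices are arranged with features in rows, as produced by the mvnrnd calls in the example that follows:
% Sketch only: estimate means/covariances and plug them into the divergence formula
m1e = mean(class1,2);  m2e = mean(class2,2);   % sample mean vectors (l x 1)
S1e = cov(class1');    S2e = cov(class2');     % sample covariance matrices (l x l)
l   = size(class1,1);                          % dimension of the feature space
dm  = m1e - m2e;                               % difference of the mean vectors
d12 = 0.5*trace(S1e\S2e + S2e\S1e - 2*eye(l)) + 0.5*dm'*(inv(S1e) + inv(S2e))*dm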
Example 4.7.1
We consider two classes and assume the features to be independent
and normally distributed in each one. Specifically, the first (second)
class is modelled by the Gaussian distribution.
m1 = [3,3]T , S1 = 0.2I
m2 = [2.3,2.3]T , S2 = 1.9I
Generate 100 data points from each class. Then compute the
divergence between the two classes and plot the data.
Matlab Code
% 1. Generate data for the two classes
m1=[3 3]';S1=0.2*eye(2);
m2=[2.3 2.3]';S2=1.9*eye(2);
randn('seed',0);
class1=mvnrnd(m1,S1,100)';
randn('seed',100);
class2=mvnrnd(m2,S2,100)';
% 2. Compute the divergence. Employ the divergence function
Distance=divergence(class1,class2)
% Use function plotData to plot the data
featureNames={'1','2'};
plotData(class1,class2,1:2,featureNames);
Result
Divergence Distance = 5.7233
[Scatter plot of the two classes; axes: Feature 1 (horizontal) vs. Feature 2 (vertical)]
Although the mean values of the two classes are very close, the
variances of the features differ significantly, so the divergence
measure results in a large value.
Bhattacharyya Distance and Chernoff Bound
• The Bhattacharyya distance bears a close relationship to the error of
the Bayesian classifier. The Bhattacharyya distance B1,2 between two
classes is defined as
B1,2 = (1/8) (m1 - m2)^T ((S1 + S2)/2)^(-1) (m1 - m2) + (1/2) ln( |(S1 + S2)/2| / sqrt(|S1| |S2|) )
where |.| denotes the determinant of the respective matrix.
• The Chernoff bound is an upper bound on the optimal Bayesian error:
Pe <= sqrt(P(ω1) P(ω2)) exp(-B1,2)
where P(ω1), P(ω2) are the a priori class probabilities.
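A minimal base-MATLAB sketch of these two quantities, again estimating the Gaussian parameters from the data (an assumption about what the toolbox divergenceBhata call below computes):
% Sketch only: Bhattacharyya distance and Chernoff bound from sample estimates
m1e = mean(class1,2);  m2e = mean(class2,2);    % sample mean vectors
S1e = cov(class1');    S2e = cov(class2');      % sample covariance matrices
S   = (S1e + S2e)/2;   dm = m1e - m2e;          % averaged covariance and mean difference
B12 = (1/8)*dm'*(S\dm) + 0.5*log(det(S)/sqrt(det(S1e)*det(S2e)))
P1 = 0.5; P2 = 0.5;                             % equiprobable classes, as in Example 4.7.2
ChernoffBound = sqrt(P1*P2)*exp(-B12)           % equals 0.5*exp(-B12) in this case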
Example 4.7.2
Consider a 2-class 2-dimensional classification problem with two
equiprobable classes ω1 and ω2 modeled by the Gaussian distributions
N (m1,S1) and N (m2,S2), respectively, where
m1 = [3,3]T , S1 = 0.2I
m2 = [2.3,2.3]T , S2 = 1.9I
This is the case of small differences between mean values and large
differences between variances. Compute the Bhattacharyya distance
and the Chernoff bound between the two classes.
Matlab Code
% 1. Generate data for the two classes
m1=[3 3]';S1=0.2*eye(2);
m2=[2.2 2.2]';S2=1.9*eye(2);
randn('seed',0);
class1=mvnrnd(m1,S1,100)';
randn('seed',100);
class2=mvnrnd(m2,S2,100)';
% 2. Compute the Bhattacharyya distance and the Chernoff bound
BhattaDistance=divergenceBhata (class1,class2)
ChernoffBound=0.5*exp(-BhattaDistance)
% Use function plotData to plot the data
featureNames={'1','2'};
plotData(class1,class2,1:2,featureNames);
Result
• Bhattacharyya Distance = 0.3730
• Chernoff Bound = 0.3443
Another Case
m1 = [1,1]T, m2 = [4,4]T ,S1 = 0.2I, S2 = 0.2I
• Bhattacharyya Distance = 5.9082
• Chernoff Bound = 0.0014
[Scatter plots of the two classes for both cases; axes: Feature 1 vs. Feature 2]
Measures Based on Scatter Matrices
Scatter matrices: measures for quantifying the way feature vectors
“scatter” in the feature space. Three such measures are the following:
J1 = trace{Sm} / trace{Sw}
J2 = |Sm| / |Sw|
J3 = trace{Sw^(-1) Sm}
where Sw is the within-class scatter matrix, Sb is the between-class
scatter matrix, and Sm is the mixture scatter matrix.
1. Within-class scatter matrix (Sw):
the average feature variance per class, Sw = Σi Pi Si, where Si is the covariance matrix and Pi the a priori probability of class i
2. Between-class scatter matrix (Sb):
the average variance of the class means around the global mean m0, Sb = Σi Pi (mi - m0)(mi - m0)^T
3. Mixture scatter matrix (Sm):
the feature covariance with respect to the global mean, Sm = Sw + Sb
Large values of J1, J2, and J3 indicate that data points in the respective
feature space have small within-class variance and large between-class
distance.
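A minimal base-MATLAB sketch of these quantities for two equiprobable classes; the toolbox function ScatterMatrices used in the next example is assumed to return a J3 value computed along these lines:
% Sketch only: scatter matrices and J1, J2, J3 for two equiprobable classes
P1 = 0.5; P2 = 0.5;                                       % a priori class probabilities
m1e = mean(class1,2);  m2e = mean(class2,2);
m0  = P1*m1e + P2*m2e;                                    % global mean vector
Sw  = P1*cov(class1',1) + P2*cov(class2',1);              % within-class scatter matrix
Sb  = P1*(m1e-m0)*(m1e-m0)' + P2*(m2e-m0)*(m2e-m0)';      % between-class scatter matrix
Sm  = Sw + Sb;                                            % mixture scatter matrix
J1  = trace(Sm)/trace(Sw)
J2  = det(Sm)/det(Sw)
J3  = trace(Sw\Sm)                                        % large value => well-separated classes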
Example 4.7.3
In this example, the previously defined J3 measure will be used to
choose the best l features out of m > l originally generated features. To
be more realistic, we will consider Example 4.6.2, where four features
(mean, standard deviation, skewness, and kurtosis) were used. From
Table 4.3 we have ten values for each feature and for each class. The
goal is to select three out of the four features that will result in the
best J3 value.
1. Normalize the values of each feature to have zero mean and unit
variance.
2. Select three out of the four features that result in the best J3 value.
3. Plot the data points in the 3-dimensional space for the best feature
combination.
Matlab Code
% 1. Load files “cirrhoticLiver.dat” and “fattyLiver.dat”
class1=load('cirrhoticLiver.dat')';
class2=load('fattyLiver.dat')';
% Normalize features
[class1,class2]=normalizeStd(class1,class2);
% 2. Evaluate J3 for the three-feature combination [1, 2, 3], where 1 stands for the mean, 2 for std, and so on
[J]=ScatterMatrices(class1([1 2 3],:),class2([1 2 3],:))
% 3. Plot the results of the feature combination [1, 2, 3]. Use function plotData
featureNames = {'mean','standard dev','skewness','kurtosis'};
plotData(class1,class2,[1 2 3],featureNames);
Result
J [1 2 3] = 3.9742
Another Result
J [1 2 4] = 3.9741
J [1 3 4] = 3.6195
J [2 3 4] = 1.3972
The largest value is obtained for the combination [1 2 3], indicating that in this
feature space the data points have small within-class variance and large between-class distance.
Feature Subset Selection
• Problem
Select a subset of l features out of m originally available, with the
goal of maximizing class separation.
• Approaches:
1. Scalar Feature Selection: treats features individually (ignores
feature correlations)
2. Feature Vector Selection: considers feature sets and feature
correlations
Scalar Feature Selection
Procedure:
1. Select a class separability criterion C and compute its
values for all the available features xk, k = 1, 2, . . . , m. Rank
them in descending order and choose the one with the
best C value. Let us say that this is xi1.
2. To select the second feature, compute the cross-correlation
coefficient between the chosen xi1 and each of the
remaining m - 1 features.
3. Choose as xi2 the feature that maximizes
a1 C(j) - a2 |ρ(i1, j)|, over all j ≠ i1
where a1, a2 are weighting factors that determine the relative
importance we give to the two terms and ρ(i1, j) denotes the
cross-correlation coefficient between features xi1 and xj.
4. Select xik, k = 3, . . . , l, as the feature that maximizes
a1 C(j) - (a2 / (k - 1)) Σr |ρ(ir, j)|, with the sum over r = 1, . . . , k - 1
and j running over the not-yet-selected features.
That is, the average correlation with all previously selected features is
taken into account.
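A minimal sketch of this procedure with illustrative variable names (the toolbox function compositeFeaturesRanking used in Example 4.8.1 presumably implements the same idea). Here C is assumed to be an m x 1 vector of criterion values (e.g., the FDR of each feature) and R an m x m matrix of absolute cross-correlation coefficients:
% Sketch only: rank features by criterion value penalized by correlation
a1 = 0.2; a2 = 0.8; l = 4;                  % weights and number of features to rank
C  = C(:);  m = length(C);
[~, i1]   = max(C);                         % step 1: feature with the best C value
selected  = i1;
remaining = setdiff(1:m, i1);
while numel(selected) < l
    penalty = mean(R(remaining, selected), 2);     % average |correlation| with selected ones
    score   = a1*C(remaining) - a2*penalty;        % steps 3-4: composite score
    [~, idx]  = max(score);
    selected  = [selected, remaining(idx)];
    remaining(idx) = [];
end
selected                                    % feature indices in ranking order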
Example 4.8.1
This example demonstrates the ranking of a given number of features
according to their class-discriminatory power. We will use the same data as
in Example 4.6.2. Recall that there are two classes and four features (mean,
standard deviation, skewness, and kurtosis); their respective values for the
two classes are summarized in Table 4.3.
1. Normalize the features so as to have zero mean and unit variance. Then
use the FDR criterion to rank the features by considering each one
independently (see Example 4.6.2).
2. Use the scalar feature selection technique described before to rank the
features. Use the FDR in place of the criterion C. Set a1 = 0.2 and a2 =
0.8.
3. Comment on the results.
Matlab Code
Result
Scalar Feature Ranking
1. (mean)
2. (st. dev.)
3. (skewness)
4. (kurtosis)
Scalar Feature Ranking with correlation
1. (mean)
2. (kurtosis)
3. (st. dev.)
4. (skewness)
The rank order provided by the FDR class-separability criterion differs
from the ranking that results when both the FDR and the cross-correlations
are considered. It must be emphasized that different values for the
coefficients a1 and a2 may change the ranking.
Feature Vector Selection
Feature vector selection: consider feature sets and feature correlation.
The goal now is to find the “best” combination of features.
Techniques:
• Exhaustive search
• Sequential forward and backward selection
• Forward floating search selection
Exhaustive search
According to this technique, all possible combinations are “exhaustively”
formed and the class separability of each combination is computed.
Different class separability measures will be used.
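A minimal sketch of exhaustive search for the best 3-feature combination, reusing the toolbox ScatterMatrices criterion from Example 4.7.3 (the toolbox's own exhaustiveSearch function, used in Example 4.8.2 below, presumably proceeds along these lines):
% Sketch only: evaluate every 3-out-of-m feature combination and keep the best
m     = size(class1, 1);                    % total number of available features
combs = nchoosek(1:m, 3);                   % all candidate 3-feature combinations
bestJ = -inf;  cLbest = [];
for k = 1:size(combs, 1)
    c1 = class1(combs(k,:), :);             % candidate subset, class 1
    c2 = class2(combs(k,:), :);             % candidate subset, class 2
    J  = ScatterMatrices(c1, c2);           % J3 class-separability criterion
    if J > bestJ
        bestJ = J;  cLbest = combs(k,:);    % remember the best combination so far
    end
end
cLbest, bestJ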
Example 4.8.2
We will demonstrate the procedure for selecting the “best” feature
combination in the context of computer-aided diagnosis in mammography.
In digital X-ray breast imaging, microcalcifications are concentrations of
small white spots representing calcium deposits in breast tissue (see Figure
4.6). Often they are assessed as a noncancerous sign, but depending on their
shape and concentration patterns, they may indicate cancer. Their
significance in breast cancer diagnosis tasks makes it important to have a
pattern recognition system that detects the presence of microcalcifications
in a mammogram. To this end, a procedure similar to the one used in
Example 4.6.2 will be adopted here. Four features are derived for each ROI:
mean, standard deviation, skewness, and kurtosis. Figure 4.6 shows the
selected ROIs for the two classes (10 for normal tissue; 11 for abnormal
tissue with microcalcifications).
1. Employ the exhaustive search method to select the best
combination of three features out of the four previously
mentioned, according to the divergence, the Bhattacharyya
distance, and the J3 measure associated with the scatter matrices.
2. Plot the data in the subspace spanned by the features selected in
step 1.
Matlab Code
% 1. Load files breastMicrocalcifications.dat and breastNormalTissue.dat
class1=load('breastMicrocalcifications.dat')';
class2=load('breastNormalTissue.dat')';
% Normalize features so that all of them have zero mean and unit variance
[class1,class2]=normalizeStd(class1,class2);
% Select the best combination of features using the J3 criterion
costFunction='ScatterMatrices';
[cLbest,Jmax]=exhaustiveSearch(class1,class2, costFunction,[3])
% 2. Form the classes c1 and c2 using the best feature combination
c1 = class1(cLbest,:);
c2 = class2(cLbest,:);
% Plot the data using the plotData function
featureNames = {'mean','st. dev.','skewness','kurtosis'};
plotData(c1,c2,cLbest,featureNames);
Result
cLbest =
1 2 3
Jmax =
1.9625
[3-D scatter plot of the two classes in the selected subspace; axes: Feature: mean, Feature: st. dev., Feature: skewness]
Suboptimal Searching Techniques
Backward Selection
1. Start with all m features in a vector
2. Iteratively eliminate one feature at a time and compute the class
separability criterion for each resulting combination
3. Keep the combination with the highest criterion value
4. Repeat with the chosen combination until a vector of size l is reached
Number of Combinations Generated
1 + ½ ((m + 1)m - l(l + 1))
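For example, to reduce m = 4 original features to l = 3, backward selection evaluates the full 4-feature set and then its four 3-feature subsets, i.e., 1 + ½((4 + 1)·4 - 3·(3 + 1)) = 5 combinations in total.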
Sequential Forward Selection
1. Compute the criterion value for each individual feature
2. Select the feature with the best value
3. Form all possible pairings of the best vector with one of the remaining unused features
4. Evaluate each using the criterion and select the best vector
5. Repeat from step 3 until a vector of size l is reached
Number of Combinations Generated
lm - l(l - 1)/2
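For m = 4 and l = 3 this formula gives 3·4 - 3·2/2 = 9 combinations (4 single features, then 3 pairs, then 2 triples). A minimal sketch of the procedure, assuming the toolbox ScatterMatrices function can score a candidate subset of any size (the toolbox's own sequential forward selection routine may differ in detail):
% Sketch only: sequential forward selection with the J3 criterion
m = size(class1, 1);  l = 3;
selected = [];  remaining = 1:m;
while numel(selected) < l
    bestJ = -inf;  bestIdx = 0;
    for k = 1:numel(remaining)
        trial = [selected, remaining(k)];                 % grow the current vector by one feature
        J = ScatterMatrices(class1(trial,:), class2(trial,:));
        if J > bestJ
            bestJ = J;  bestIdx = k;
        end
    end
    selected = [selected, remaining(bestIdx)];            % keep the best extension
    remaining(bestIdx) = [];
end
selected                                                  % suboptimal best l-feature combination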
Forward Floating Search Selection
The idea of forward floating search selection: add a feature (forward step),
then check the smaller feature sets to see whether we do better when this
feature replaces a previously selected one (backward step). Terminate when
the desired number l of features has been selected.
Example 4.8.4
The goal of this example is to demonstrate the (possible) differences
in accuracy between the suboptimal feature selection techniques SFS,
SBS, and SFFS and the optimal exhaustive search technique, which
will be considered as the “gold standard.”
We assume that the reduced feature space is to be 2-dimensional, and
we evaluate the performance of the four feature selection methods in
determining the best 2-feature combination, employing the toolbox function
exhaustiveSearch together with the corresponding routines for sequential
forward selection, sequential backward selection, and sequential forward
floating selection.
Matlab Code
Result
Exhaustive Search -> Best of two: (1) (6)
Sequential Forward Selection -> Best of two: (1) (5)
Sequential Backward Selection -> Best of two: (2) (9)
Floating Search Method -> Best of two: (1) (6)
Example 4.8.4. Designing a Classification
System
The goal of this example is to demonstrate the various stages involved
in the design of a classification system. We will adhere to the 2-class
texture classification task of Figure 4.8. The following five stages will be
explicitly considered:
• Data collection for training and testing
• Feature generation
• Feature selection
• Classifier design
• Performance evaluation
Data collection for training and testing
• A number of different images must be chosen for both the training
and the test set. To form the training data set, 25 ROIs are selected
from each image representing the two classes.
Feature generation
• From each of the selected ROIs, a number of features is generated.
We have decided on 20 different texture-related features.
• Feature selection
In this case, we use scalar feature selection (function
compositeFeaturesRanking), which employs the FDR criterion as well as a
cross-correlation measure between pairs of features in order to rank them.
• Classifier design
In this stage different classifiers are employed in the selected feature
space; the one that results in the best performance is chosen. In this
example we use the k-nearest neighbour (k-NN) classifier.
• Performance evaluation
The performance of the classifier, in terms of its error rate, is
measured against the test data set. However, in order to cover the
case where the same data must be used for both training and testing,
the leave-one-out (LOO) method will be employed. LOO is particularly
useful in cases where only a limited data set is available.
Matlab Code
% 1. Read the training data and normalize the feature values.
c1_train=load('trainingClass1.dat')';
c2_train=load('trainingClass2.dat')';
% Normalize dataset
superClass=[c1_train c2_train];
for i=1:size(superClass,1)
m(i)=mean(superClass(i,:)); % mean value of feature
s(i)=std (superClass(i,:)); % std of feature
superClass(i,:)=(superClass(i,:)-m(i))/s(i);
end
c1_train=superClass(:,1:size(c1_train,2));
c2_train=superClass(:,size(c1_train,2)+1:size(superClass,2));
% 2. Rank the features using the normalized training dataset. We have adopted
% the scalar feature ranking technique which employs Fisher's Discrimination Ratio
% in conjunction with a feature correlation measure. The ranking results are returned
% in variable p.
[T]=ScalarFeatureSelectionRanking(c1_train,c2_train,'Fisher');
[p]= compositeFeaturesRanking (c1_train,c2_train,0.2,0.8,T);
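% 3. (This step is missing from the slide.) Presumably the seven best-ranked features
% are kept here, which would also define the variable "inds" used further down,
% e.g. something along the lines of (assumption, not the original code):
% inds = p(1:7); c1_train = c1_train(inds,:); c2_train = c2_train(inds,:);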
% 4. Choose the best feature combination consisting of three features (out of the
% previously selected seven), using the exhaustive search method.
[cLbest,Jmax]= exhaustiveSearch(c1_train,c2_train,'ScatterMatrices',[3]);
% 5. Form the resulting training dataset (using the best feature combination), along
% with the corresponding class labels.
trainSet=[c1_train c2_train];
trainSet=trainSet(cLbest,:);
trainLabels=[ones(1,size(c1_train,2)) 2*ones(1,size(c2_train,2))];
% 6. Load the test dataset and normalize using the mean and std (computed over the
% training dataset). Form the vector of the corresponding test labels.
c1_test=load('testClass1.dat')';
c2_test=load('testClass2.dat')';
for i=1:size(c1_test,1)
c1_test(i,:)=(c1_test(i,:)-m(i))/s(i);
c2_test(i,:)=(c2_test(i,:)-m(i))/s(i);
end
c1_test=c1_test(inds,:);
c2_test=c2_test(inds,:);
testSet=[c1_test c2_test];
testSet=testSet(cLbest,:);
testLabels=[ones(1,size(c1_test,2)) 2*ones(1,size(c2_test,2))];
% 7. Plot the test dataset by means of function plotData. Provide names for the features.
featureNames={'mean','stand dev','skewness','kurtosis','Contrast 0','Contrast 90',...
    'Contrast 45','Contrast 135','Correlation 0','Correlation 90','Correlation 45',...
    'Correlation 135','Energy 0','Energy 90','Energy 45','Energy 135','Homogeneity 0',...
    'Homogeneity 90','Homogeneity 45','Homogeneity 135'};
fNames=featureNames(inds);
fNames=fNames(cLbest);
plotData(c1_test(cLbest,:),c2_test(cLbest,:),1:3,fNames);
% 8. Classify the feature vectors of the test data using the k-NN classifier
[classified]=k_nn_classifier(trainSet,trainLabels,3,testSet);
[classif_error]=compute_error(testLabels,classified)
% Leave-one-out (LOO) method (a) Load all training data, normalize them and create class labels.
c1_train=load('trainingClass1.dat')';
c1=c1_train;
Labels1=ones(1,size(c1,2));
c2_train=load('trainingClass2.dat')';
c2=c2_train;
Labels2=2*ones(1,size(c2,2));
[c1,c2]=normalizeStd(c1,c2);
AllDataset=[c1 c2];
Labels=[Labels1 Labels2];
% (b) Keep features of the best feature combination (determined previously) and
% discard all the rest.
AllDataset=AllDataset(inds,:);
AllDataset=AllDataset(cLbest,:);
% (c) Apply the Leave One Out method on the k-NN classifier (k = 3) and compute the error.
[M,N]=size(AllDataset);
for i=1:N
dec(i)=k_nn_classifier([AllDataset(:,1:i-1) AllDataset(:,i+1:N)],...
    [Labels(1,1:i-1) Labels(1,i+1:N)],3,AllDataset(:,i));
end
LOO_error=sum((dec~=Labels))/N
Result
Classification Error: 0.0200
The leave-one-out (LOO) error is 1%.
Conclusion
Feature Selection: select the most effective features according to a
given optimality criterion (i.e., rank the features according to their
contribution to the recognition results).
Reference
• Sergios Theodoridis, Konstantinos Koutroumbas: An Introduction to Pattern Recognition: A MATLAB Approach