K. S. Phani Krishna
11MM91R05
14-Apr-12
• Dimensionality Reduction (DR)
  › Reduction of data with D dimensions to d dimensions (d < D)

• Based on the resulting features, DR is categorized into:
  › DR by feature extraction
  › DR by feature selection

• Feature extraction
  › Linear/non-linear transformation of the current features to generate new features

• Feature selection
  › No transformation; the best features are selected from the original set
  › Reduces computation and eases discussion with domain experts
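The distinction between the two DR families can be sketched in NumPy (a minimal illustration; PCA stands in for a generic linear extraction, and the selected column indices are arbitrary placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # 100 samples, D = 5 original features

# Feature extraction: a linear transformation (PCA via SVD here) builds
# d = 2 *new* features that no longer map to the original measurements.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_extracted = Xc @ Vt[:2].T     # shape (100, 2)

# Feature selection: keep a subset of the *original* columns, so each
# retained feature keeps its physical meaning for domain experts.
keep = [0, 3]                   # indices chosen by some selection criterion
X_selected = X[:, keep]         # shape (100, 2)
```

Note that `X_selected` is literally a slice of the original data, which is what makes the result discussable with domain experts.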
• These days, for accurate analysis, data from different modalities are acquired and fused
• This rapidly increases the dimensionality of the feature space
• Dimensionality reduction is required without changing the topology of the features, so that future discussions with domain experts (doctors) remain meaningful
• Feature selection algorithms designed with different evaluation criteria broadly fall into:
  › Filter (distance, information)
  › Wrapper (classification criteria)
  › Hybrid (filter + wrapper)
• The mode of search and the type of data being dealt with also open new dimensions
Algorithms from various categories are considered, for understandability across various domains:

• Unsupervised feature selection using feature similarity [P. Mitra 2002] (covers filter: dependency; search: sequential; clustering)

• Feature selection based on mutual information [H. Peng 2005] (covers wrapper: {filter: information + Bayesian classifier}; search: sequential; classification)

• A branch and bound algorithm for feature subset selection [P. M. Narendra 1977] (covers filter: distance; search: complete; classification)

• Feature Usability Index [D. Sheet 2010]
• Sequential search using a dependency criterion on unlabelled data
• Removal of redundant features
  › A redundant feature in this context is a "feature which carries little or no additional information beyond that subsumed by the remaining features"
• Introduced a similarity measure that offers:
  › Minimization of information loss in the process of feature elimination
  › Zero value when features are linearly dependent
  › Symmetry
  › Sensitivity to scaling
  › Invariance to rotation
  › Computation in O(D^2) time
Maximal Information Compression Index (MICI)

• MICI is the eigenvalue for the direction normal to the principal component direction of the feature pair (X, Y)
• Maximum information is retained if the multivariate data is projected along the principal component direction
MICI(X, Y) = min(eig(cov(X, Y)))
           = 0.5*(var(X) + var(Y) - sqrt((var(X) + var(Y))^2 - 4*var(X)*var(Y)*(1 - corr(X, Y)^2)))
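As a quick check, the closed form agrees with the smaller eigenvalue of the 2x2 covariance matrix of the feature pair; a minimal NumPy sketch (function names are illustrative):

```python
import numpy as np

def mici(x, y):
    """MICI as the smaller eigenvalue of the 2x2 covariance matrix of (x, y)."""
    return float(np.linalg.eigvalsh(np.cov(x, y))[0])  # eigvalsh sorts ascending

def mici_closed_form(x, y):
    """Equivalent closed form in terms of the variances and correlation."""
    vx, vy = np.var(x, ddof=1), np.var(y, ddof=1)
    rho = np.corrcoef(x, y)[0, 1]
    s = vx + vy
    return float(0.5 * (s - np.sqrt(s * s - 4.0 * vx * vy * (1.0 - rho ** 2))))
```

For linearly dependent features (e.g. y = 2x), the covariance matrix is singular, so MICI is zero, as the properties above require.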

• Selection method:
  › Partition the original feature set into homogeneous clusters based on the k-NN principle, using MICI as the distance
  › From each cluster the most compact feature is selected, and its remaining k candidate neighbours are discarded
  › Set threshold = min(MICI in the first iteration)
  › In successive iterations, if MICI > threshold, set k = k - 1
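The clustering step above can be sketched as follows (a simplified sketch: it keeps, at each round, the feature whose k-th nearest neighbour by MICI is closest, and omits the threshold-based shrinking of k):

```python
import numpy as np

def mici(x, y):
    # Smaller eigenvalue of the 2x2 covariance matrix of the feature pair.
    return float(np.linalg.eigvalsh(np.cov(x, y))[0])

def select_features(X, k):
    """Sketch of k-NN feature clustering with MICI as the distance.
    Repeatedly keep the most 'compact' feature (smallest distance to its
    k-th nearest neighbour) and discard those k neighbours."""
    D = X.shape[1]
    dist = np.array([[mici(X[:, i], X[:, j]) for j in range(D)] for i in range(D)])
    remaining = set(range(D))
    selected = []
    while remaining:
        k = min(k, len(remaining) - 1)
        if k < 1:                        # too few features left to cluster
            selected.extend(remaining)
            break
        best, best_r, best_nn = None, np.inf, None
        for i in remaining:
            others = sorted(remaining - {i}, key=lambda j: dist[i, j])
            r = dist[i, others[k - 1]]   # distance to the k-th nearest neighbour
            if r < best_r:
                best, best_r, best_nn = i, r, others[:k]
        selected.append(best)
        remaining -= {best, *best_nn}    # drop the representative's k neighbours
    return selected
```

A pair of linearly dependent features has MICI zero between them, so at most one of the pair survives selection.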

• Sequential search using a dependency criterion on labelled data
• Generally known as Max-Relevance, Min-Redundancy (mRMR)
  › Sprouted out from maximal dependency
• For first-order incremental search, mRMR is equivalent to max-dependency
• Selection of optimal features
  › An optimal feature in this context is a "feature which has the most information regarding the target class (relevance) and is least correlated with the other features (redundancy)"
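The first-order incremental search can be sketched as a greedy loop (a minimal sketch; the crude histogram-based mutual-information estimator is an assumption, not the estimator used in the original paper):

```python
import numpy as np

def mutual_info(a, b, bins=8):
    """Histogram-based mutual information estimate (in nats) of two 1-D arrays."""
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def mrmr_select(X, y, n):
    """Greedy first-order mRMR: each step adds the feature maximizing
    relevance to the class minus mean redundancy with the selected set."""
    relevance = [mutual_info(X[:, j], y) for j in range(X.shape[1])]
    remaining = list(range(X.shape[1]))
    selected = []
    while remaining and len(selected) < n:
        def score(j):
            red = (np.mean([mutual_info(X[:, j], X[:, s]) for s in selected])
                   if selected else 0.0)
            return relevance[j] - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

The very first pick reduces to pure max-relevance, since the redundancy term is empty.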

Using the mRMR criterion, select n sequential features from the input X.
• Complete search using a distance criterion on labelled data
• Exhaustive search:
  › Given D features, need d features
  › No. of subsets to evaluate = D! / (d! * (D - d)!)
• The evaluation criterion J should satisfy monotonicity
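Both points can be made concrete with a toy sketch: the combinatorial count, and a branch-and-bound search whose pruning is safe only because J is monotone (the sum-of-scores criterion is an illustrative assumption, not the distance criterion of the paper):

```python
from math import comb

def subsets_to_evaluate(D, d):
    # Exhaustive evaluation must score D! / (d! * (D - d)!) subsets.
    return comb(D, d)

def branch_and_bound(scores, d):
    """Toy branch and bound for a monotone criterion J: here J(subset) is the
    sum of non-negative per-feature scores, so removing a feature never
    increases J. Start from the full set, remove features one at a time, and
    prune any branch whose J is already no better than the best d-subset found."""
    best = [-float("inf"), None]

    def recurse(subset, start):
        j = sum(scores[i] for i in subset)
        if j <= best[0]:                 # monotonicity makes this pruning safe
            return
        if len(subset) == d:
            best[0], best[1] = j, tuple(subset)
            return
        for pos in range(start, len(subset)):
            recurse(subset[:pos] + subset[pos + 1:], pos)

    recurse(list(range(len(scores))), 0)
    return best[1]
```

Without monotonicity the pruning test could discard a branch that still contains the optimum, which is why the slide insists on the property.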




• Ranking of individual features based on:
  › Homogeneity
  › Class specificity
  › Error in decision making
• Homogeneity:
  › one-outlier scatter ratio
• UCI Machine Learning Repository
• 3 data sets on breast cancer:
  › Data 1: 194 samples, 33 features
  › Data 2: 683 samples, 9 features
  › Data 3: 569 samples, 30 features
• The acquired data is processed with k-fold cross-validation
• Classifiers used are linear SVM, k-means, and Bayesian, but linear SVM gave the most presentable results
• Accuracy is plotted on the y-axis against the number of features selected on the x-axis
• In the plots: red = mRMR, green = data compression, blue = branch and bound, and the blob is PCA
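The evaluation loop can be sketched as follows (a hedged simplification: a nearest-centroid classifier stands in for the linear SVM of the slides, so the numbers would differ):

```python
import numpy as np

def kfold_accuracy(X, y, k=5, seed=0):
    """Mean classification accuracy over k cross-validation folds,
    using a nearest-centroid classifier as a simple stand-in."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        classes = np.unique(y[train])
        centroids = np.stack([X[train][y[train] == c].mean(axis=0) for c in classes])
        # Assign each test sample to the class of its nearest centroid.
        d = np.linalg.norm(X[test][:, None, :] - centroids[None, :, :], axis=2)
        pred = classes[np.argmin(d, axis=1)]
        accs.append(float((pred == y[test]).mean()))
    return float(np.mean(accs))
```

Running this for each feature-count d produced by a selection algorithm yields one accuracy-vs-d curve of the kind plotted on the following slides.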
[Plots: accuracy vs. number of features considered, one per data set. Red = mRMR, green = data compression, blue = branch and bound, blob = PCA.]
Narendra, P. M. and Fukunaga, K. (1977). A branch and bound algorithm for feature subset selection. IEEE Transactions on Computers C-26(9): 917–922.

Somol, P. (2010). Efficient feature subset selection and subset size optimization.

Mitra, P. (2002). Unsupervised feature selection using feature similarity.

Liu, H. (2005). Toward integrating feature selection algorithms for classification and clustering.

Sheet, D. (2010). Feature usability index and optimal feature subset selection. International Journal of Computer Applications.
