High Dimensional Data (FAST Clustering Algorithm) PPT

Presented by
DEEPAN SHAKARAVARTHY V
M.Tech.
 Uses the FAST algorithm to identify a feature subset.
 Based on a fast clustering-based feature selection algorithm
(FAST), which is proposed and experimentally evaluated.
 For efficiency and effectiveness, FAST adopts the efficient minimum spanning tree (MST)
clustering method.
 Subset selection is an effective way of reducing
dimensionality.
 Removing irrelevant data.
 Increasing learning accuracy.
 Improving the comprehensibility of results.
 The accuracy of the learning algorithms is not
guaranteed.
 The number of selected features is limited, and the computational
complexity is large.
 Many irrelevant and redundant features may
remain.
 The number of selected features is limited.
 The computational complexity is large.
 The accuracy of the learning algorithms is not
guaranteed.
 Forms clusters using graph-theoretic
clustering methods.
 The selection algorithm effectively eliminates
irrelevant features.
 Achieves a significant reduction in dimensionality.
 It provides good feature subset selection.
 It efficiently deals with both irrelevant and redundant
features.
 It fully identifies duplicate (redundant) data in the data set.
 It takes less time to produce results.
 Distributed clustering
 Subset Selection Algorithm
 Time complexity
 Microarray data
 Data source
 Irrelevant feature
 Clusters words into groups.
 Uses a cluster evaluation measure based on distance.
 Even compared with other feature selection
methods, the obtained accuracy is lower.
 Irrelevant features, along with redundant
features, severely affect the accuracy of the learning machines.
 The algorithm identifies and removes as much of the irrelevant data as possible.
 Good feature subsets contain features highly
correlated with the class, yet uncorrelated with each other (see the sketch below).
 Time complexity is calculated in terms of the number of instances in a
given data set.
 In the first part, irrelevant features are removed and the relevant ones are retained.
 In the second part, a complete graph is constructed from the relevant features.
 The MST is then partitioned, and a representative
feature is chosen from each resulting cluster, as sketched below.
 Used to identify the length of the data.
 It manages a searchable index.
 The subset selection feature has been improved.
 FAST again ranks first in the proportion of
selected features.
 For the purpose of evaluating the performance and
effectiveness of the proposed FAST algorithm.
 The data sets have more than 10,000 features.
 A hospitality data set is used.
 The right relevance measure is selected, followed by:
1. Minimum spanning tree construction
2. Partitioning of the MST into clusters
3. Choosing representative features from the clusters
 The conclusion of the project is a subset of good
features with respect to the target concepts.
 Feature selection is used to cluster related data
in databases.
 Feature subset selection is an effective way of
reducing dimensionality, removing irrelevant data,
and increasing learning accuracy.
 [1] H. Almuallim and T.G. Dietterich, “Learning Boolean Concepts
in the Presence of Many Irrelevant Features,” Artificial Intelligence,
vol. 69, nos. 1/2, pp. 279-305, 1994.
 [2] L.D. Baker and A.K. McCallum, “Distributional Clustering of
Words for Text Classification,” Proc. 21st Ann. Int’l ACM SIGIR
Conf. Research and Development in Information Retrieval,
pp. 96-103, 1998.
 [3] A. Arauzo-Azofra, J.M. Benitez, and J.L. Castro, “A Feature
Set Measure Based on Relief,” Proc. Fifth Int’l Conf. Recent
Advances in Soft Computing, 2004.