SlideShare a Scribd company logo
1 of 5
Download to read offline
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 03 Issue: 07 | July-2016 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 14
Survey on Feature Selection and Dimensionality Reduction
Techniques
Govinda.K1, Kevin Thomas 2
12School of Computing Science Engineering, VIT University, Vellore, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Data mining methods are used to extract useful
information from huge amount of data. Mining of data is a
time consuming process because the data thathastobemined
may be comprised of large number of dimensions. So we can
say that number of dimensions exponentially vary with
computation time. In order to speed up the decision making
process, dimensionality reduction techniques came into
existence. In this paper we have discussed about various
techniques of dimensionality reduction that exists currently.
Key Words: PCA, SVD, SVM, LCM, CCA
1. INTRODUCTION
Advancement in data collection and data storage during the
past decade has led to an information overload. Researchers
from the field of engineering, biology, astronomy, remote
sensing, consumer transaction and economics face larger
observations and simulation everyday.[1]Data miningaims
at processing, classifying and selecting useful data from any
given dataset. The dataset on which data miningisdone may
consist of information of varied form and enormous size.
This information may be inappropriateormayshowmissing
values that may mislead the user due to the irregularities.
These irregularities are detected, analyzed and modified
during data pre- processing by applying certain
modifications and behavioral predictions.
Dimensionality reduction is among the predominant
techniques of data pre-processing which helps to
incorporate a systematized structure into the dataset, prior
to the mining process. It is also an effective way to downsize
data which encourages effective storage and retrieval of
data. Some of its important applications are found in the
areas like text mining,image processing,patternrecognition,
image retrieval etc. Dimensionality reduction is
implemented using various algorithms such as SVD, PCA,
ICA, SVM etc. [2] This paper describes and analysis OF nine
such algorithms to find each ones strength and weaknesses.
1.1 Why Dimensionality Reduction
 It is easy and convenient way to collect data
 Data that is collected, is not just from data mining
 Data is accumulated in an unprecedented speed
 For effective machine learning and data mining data
processing is an important part
 Helps in downsizing data
 Visualization
 Data compression
 Noise removal
1.2 Applications of Dimensionality Reduction
 Text mining
 Image retrieval
 Customer management
 Face recognition
 Intrusion detection
 Microarray data analysis
 Handwritten digit recognition
 Protein classification
2. MAJOR TECHNIQUES OF DIMENSIONALITY
REDUCTION
2.1 Feature Selection
A process that chooses an optimal subset of features
according to an objective function [3]. Feature selection is
done to remove noise, reducedimensionality and toimprove
mining performance such as speed of learning, predictive
accuracy and to make the mined results simple to
understand.
2.2 Feature Extraction
It refers to mapping of high-dimensional data to a lower-
dimensional space. [3]. Criterion for thismaydiffer based on
different problem settings such as:
3. ALGORITHMS
3.1 Singular Value Decomposition (SVD)
SVD is a gene selection procedure that is performed to
reduce dimensionality in data. It is a matrix factorization
method which comes under linear vector algebra. The main
objective of SVD in data analysis of gene expression is to
identify and extract the structural constraints within the
 Unsupervised Setting – minimum loss of information
 Supervised Setting – maximum class discrimination
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 03 Issue: 07 | July-2016 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 15
data and also to relate significantassociations.Themainidea
behind SVD is to calculate eigenvalues and eigenvectors of
covariance matrix of sample-gene matrix [4]. The
eigenvalues would help to deduce the respective (or
corresponding) eigenvectors; for higher eigenvalues the
corresponding eigenvectors would contain a higher
variability. Usuallytheeigenvectorsthatappearinitiallywith
higher unpredictability are considered as prime Principle
Components (PCs). These PCs are used to reduce the data
into smaller dimensions [5].
The tool used to analyze the data is known as clustering.
Group of genes the show similarity (i.e. similar expression
profile) are discovered by Clustering. The two methods in
clustering are Model-Free andModel-based methods[6].
Model-freemethods do not have any probabilistic tree
whereas model-based clustering method clusters are build
based on assumptions such that data follows mixture
distribution.
SVD overcomes two main difficulties in gene expressiondata
that is parameter estimation and normality assumptions on
gene expression. After applying SVD on data a technique
called probit transformation is done to improve the working
of model-freemethod. SVD can be applied to dataset that has
both scattered and un-scattered genes.
3.2 Partial Least Squares Regression (PLSR)
Research in science and engineering often involves using
controllable and/or easy-to-measure variables (factors) to
explain, regulate, or predict the behavior (responses) of
other variables. When the factors are few in number, they
are not significantly redundant (collinear), and have a well-
understood relationshiptotheresponses,thena goodwayto
turn data into information could be multiple linear
regression (MLR). When any one of these three conditions
does not satisfy, MLR proves to be inefficient or
inappropriate. Hence, researchers face problems withmany
variables and ill-understood relationships, while
constructing a good predictive model. Example:
Spectrographs.
When the factors are many and highly collinear we use
Partial Least Squares (PLS) method for constructing
predictive models. This emphasizes on predicting the
response rather than trying to understand the relationship
between the variables. For example, PLS is usually not
appropriate for screening out factors that have a negligible
effect on response. However, when thegoal ispredictionand
there is no practical need to limit the number of factors, PLS
may be a very useful tool.
MLR can be used with many factors. However, if the number
of factors gets too large, you are likely to get a model thatfits
the sampled data perfectly but that will fail to predict new
data well. This phenomenon is called over-fitting. In such
cases, although there are many manifest factors, there may
be only a few underlying factors or latent factors that
account for most of the variation in response. The main
intention of PLS is to extract these latent factors, accounting
for as much of the manifest factor variationaspossiblewhile
modelling the responses well[7].
3.3 Linear Discriminant Analysis (LDA)
Linear discriminant analysis (LDA) is a method which finds
its application in statistics, machine learning and pattern
recognition to find a linear combination of features which
characterizes or separates two or more classes of objects.
This resulting combination is used as a linear classifier and
also for dimensionality reduction before later classification.
LDA is closely related to ANOVA (analysis of variance) and
regression analysis, attempts to express one dependent
variable as a linear combination of other features. However,
ANOVA uses categorical independent variables and a
continuous dependent variable. In discriminant analysis
continuous independent variables and a categorical
dependent variable i.e. the class label are used. Logistic
regression is more similar to LDA, as it explainsa categorical
variable by the values of continuous independent variables.
We prefer these methods in applications where it is not
reasonable to assume that the independent variables are
normally distributed. This is the fundamental assumption of
the LDA method.
When the measurements madeon independent variablesfor
each observation are continuous quantities LDA is used. For
categorical independent variables, a similar technique is
discriminant correspondence analysis [8].
3.4 Locally Linear Embedding (LLE)
High-dimensional data can often be converted to low-
dimensional data with little or no fundamental loss of
information. A simple and widely used method for
dimensionality reduction is principal component analysis
(PCA). In this method data points by their respective
orthogonal projections on a subspace of low dimension
spanned by the directions (also called components,features,
factors or sources) of greatest variance in the data set is
represented. The two recently proposednon-
linear generalizations of PCA are locally linear embedding
(LLE) and ISOMAP algorithms. It was developed for
visualization purposes, these two methods projecthigh-
dimensional data into a two orlow-dimensional subspaceby
extracting meaningful components in a non-linear fashion.
The locally linear embedding algorithmassumesthata high-
dimensional data set lies on, or near to, a smooth low-
dimensional manifold. Small patches of the manifold, eachof
which contains a fraction of the data set, can be equipped
with individual localco-ordinates. High-dimensional co-
ordinates of each patch can be mapped into corresponding
local co-ordinates by means of an essentially linear
transformation. This method attempts to find a global
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 03 Issue: 07 | July-2016 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 16
transformation of the high-dimensional co-ordinates into
low- dimensional ones by exploiting adjacency information
about closely located data points, this information being a
form of summarization of the local transformationsbetween
the high- and low-dimensional co-ordinates[9].
3.5 Generic Algorithm
Computational studies of Darwinian evolution and natural
selection have led to numerous models for resolving
optimization. Generic Algorithms comprise a subsetofthese
evolution based optimization problems techniques focusing
on the application of selection, mutation, andrecombination
to a population of competing problem solutions. Generic
Algorithms are parallel, iterative optimizers, and have been
successfully applied to a broad spectrum of optimization
problems, including many pattern recognition and
classification tasks [10].
3.6 Principal Component Analysis (PCA)
It is a Feature Extraction technique that is used to analyze
statistical data by transforming the starting set of variables
into various set of linear combinations which are known as
the principal components (PC), and these components have
some specific properties with regardtovariances.Thismake
the dimensionality of the system more concentrated and at
the same time, variable connections information is also
maintained [11]. Calculations are made on the data set by
analysing eigenvalue and its eigenvectors,covariancematrix
arranged systematically in descendingorder. Thistechnique
produces the maximum feasibility arbitrary solutions in
the high-dimensional space. We can view it as data
visualization method because here, the high dimensional
data sets can be reduced to two dimensional or three
dimensional data sets that can be easily plottedusing graphs
or charts.
Simple Principal Components Analysis techniquetakea data
oriented approach and it is used in case of dimensionality
reduction of two highly dimensional picture databases. In
such cases it has showed quick convergence rate and
stability and has produced estimated solutions without
considering thevariance-covariance matrix [12]. So this
technique is more beneficial in use as compared to other
existing methods.
Matrix method and data method are the key methods for
PCA. In the matrix method as the name suggest, matrix is
formed for the datasets which comprises of the calculated
variance and co variance of the data.Furtherdiagonalization
technique is also applied on the matrix. Data methods
directly deal with the data. PCA is generally applied before
clustering of the datasets happens so as to get an improved
drawing out of the cluster organization.
This technique also has a disadvantage of not considering
the class separability because of the absence of class label of
the feature vector [13].PCA mainly focuses ontherotationof
the coordinate axes, that is transformed in the direction
which as the maximum variance.
The traditional PCA is implied only on linear transformation
and is not effective fornon-linear data. So, to overcome this
drawback, non-linear transformation is applied to high
dimensional space [14].
3.7 Support Vector Machines (SVM)
SVM is a technique which can be applied to several domains
consecutively. SVM mainly focus on the microarray cancer
data that contains huge number of gene expressions. The
objective of SVM is to get the mostoptimal separating hyper-
plane that has the maximum margin (w).this techniques is
more suitable in analyzing gene expressions because it
specifically concentrateondatasetswhichhaslowernumber
of samples as compared to the genes and it is also useful in
gene selection.
SVM is useful because the resultsproduced bythistechnique
are optimal and distinctive. It is s used in combination with
Recursive Feature Replacement (RFR) that produces a
unique assessment of generalization of feature (genes).it
starts with the gene set which is complete; it removes the
least significant gene from the classifier in subsequent
iterations. There is another method called SVM-T-
RFE method which works on smaller gene subsets and thus
produces the highest accuracy.
There is one more method similar to SVM, it is called as SVD.
The major difference between the two methods is that SVM
is a supervised method while SVD is an unsupervised
method. They work on the same application areas like
feature selection and modelling of dynamics of gene
expression and clustering. But SVM is consideredtobemore
effective over other existing procedures [15].
3.8 Independent Component Analysis (ICA)
The computational technique that is used to obtain reduced
subcomponents from the assorted signal aftersplittingthem
is called Independent Component Analysis technique.to
understand this methodwecanconsidera simpleexampleof
ICA called “cocktail party ”. Here a sample data is taken
which consists of individuals interacting with eachotherina
room and there fundamental speech signals are divided.In a
generalized way we can say for N sources at least N
estimations are needed for primal signals mining.
This can be also achieved using Regularized Whitening
method in which dimension of independent sets is reduced
to limited set. This algorithm hasthepotential toincorporate
statistics of higher order that comprises of complementary
data.it finds its importancewhensendingofdata differsfrom
its prevalent parameters. Whitened data is needed for this
method, which is obtained from the identity covariance
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 03 Issue: 07 | July-2016 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 17
matrix. It can be used as an augmented version of the PCA
method. Initial projection vectors are used in ICA, usingthee
parameters, three more algorithms are developed for ICA-
DR. This procedure is further classified into three
subprocedures. ICA-DR1, is the first procedure which uses
virtual parameter together with prioritization and selection
components. The next one is the ICA-DR2, here ICA is
executed as an unauthorized algorithm and its randomness
is characterized by randomly choosing initial projection
vectors. The third algorithm is called the ICA-DR3, this
method uses initialization algorithm together with virtual
dimension to develop more appropriate initial projection
vectors group.so that they can be used to substitute these
vectors required by ICA to develop each of its ICs.
ICA is closely related to Projection Pursuit (PP), which is
another statically data analysis tool, whose aim is to reduce
the dimensions of the multivariate data sets by identifying
the“interested projections “from the projection directory.
3.9 Canonical Correlation Analysis (CCA)
This technique was first introduced by Harold Hoteling; he
used cross covariance matrices for dimensionality
reduction.in this technique, two different setofvariables are
taken along with their correlations,thiscorrelationshelpsto
establish linear combinations of the sets with the set of
highest correlation. CCA is used in the creation of a model
equation that is used to relate two set of variables, for
example one setcomprisesofperformancemeasuresand the
other one consists of descriptive variables.
CCA is also used for class prediction taken from standard
classes that has primal set of samples. This method along
with its regularized version( RCCA) are used for blending
two modalities.it is mainly used for establishing linear
relationship between the text associated toanimageandthe
pixel values of the same image. RCCA is used to study gene
expressions in liver cells and finally it linksthisdata withthe
concentrated hepatic acid in mice. CCA has disadvantage of
not fitting into modalities which have large quantities of
dimensions.RCCA overcome this drawback but is a bit
expensive method. It also has a disadvantage of
regularization and thus produces defectivesinverses.Thisis
also overcome by RCCA but failed to create equivalent level
of refinement between patient classes.
CCA is used to design linear affiliation that comprises of
multidimensional variables.it chooses two bases ideal with
correlations and these bases are assigned to each variable.
Then it selects two bases from the correlation matrix in
which diagonal elements are variables and whose
correlation is maximum. This is the only difference between
ordinary correlation analysis and the CCA.
4. CONCLUSIONS
The objective behind the survey is to provide a complete
understanding of the various algorithms used in
dimensionality reduction and to analyze the developing
interest in this field during the past few years.
Dimensionality reduction technique based on dictionaries
and projections are growing rapidly. PCA is the most
preferred technique due to its simplicity. In middle 2000’s
ICA was at its boom but later it started seeing its decline.But
it will continue to in demand for the applications related to
single processing. The rest of the techniques have placed
themselves well in the market. Overall wemayconcludethat
dimensionality reduction techniques have been and will
continue to be applied in many sectors ranging from
biomedical research to pattern recognition. In this survey
paper we have covered different methodologies; each
requiring different criteria but all having the same goal of
reducing the complexity at the same time to deliver a more
appropriate (understandable) form of the information.
REFERENCES
[1] A survey of dimension reduction techniques Imola K.
Fodor Center for AppliedScienti¯cComputing,Lawrence
Livermore National Laboratory P.O. Box 808, L-
560, Livermore, CA 94551.
[2] A SURVEY OF DIMENSIONALITY REDUCTION AND
CLASSIFICATION METHODS, International Journal of
Computer Science & Engineering Survey (IJCSES) Vol.3,
No.3, June 2012
[3] Lei Yu, , Jieping Ye, Huan Liu , “DimensionalityReduction
for Data Mining Techniques, Applications and
Trends; Binghamton University and Arizona State
University.
[4] Abdi, H. Lewis‐Beck, M.; Bryman, A. & Futing, T. ,
Encyclopedia for research methods for the social
sciences Factor rotations in factor analyses Sage, 2003,
792‐795.
[5] Abdi, H. Salkind, N. (ed.) Encyclopedia of measurements
and statistics Singular value decomposition (SVD) and
Generalized Singular Value Decomposition (GSVD)Sage
Publications, 2007, 907‐912.
[6] Aharon, M.; Elad, M. & Bruckstein, A. K‐SVD: An
Algorithm for Designing Overcomplete Dictionaries for
Sparse Representation IEEE Trans. Signal Processing,
2006, 54, 4311‐4322.
[7] CSCE 666 Pattern Analysis | RicardoGutierrez-Osuna |
CSE@TAMU .
[8] Modern Methods For Business Research edited by
George A.Marcoulides . An Introduction to Genetic
Algorithms Mitchell Melanie A Bradford Book The MIT
Press
[9] Cambridge, Massachusetts • London, England Fifth
printing, 1999 First MIT Press paperback edition, 1998
Copyright © 1996 Massachusetts Institute of
Technology.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 03 Issue: 07 | July-2016 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 18
[10] S. Bicciato, A. Luchini, C. Di Bello, “Disjoint PCA Models
for Marker Identification And Classification Of Cancer
Types Using Gene Expression Data,”2002, IEEE 0-7803-
7557-2.
[11] Matthew Partridge, Rafael Calvo, “Fast Dimensionality
Reduction and Simple PCA,”2006.
[12] Vinay Nadimpally , Mohammed J. Zaki,“A Novel
Approach to Determine Normal Variation in Gene
Expression Data,” SIGKDD Explorations, Vol. 5 Issue
2,2002.
[13] Isabelle Guyon, Jason Weston, Stephen Barnhill, M.D.,
“Gene Selection for Cancer Classification using Support
Vector Machines,” Barnhill Bioinformatics 2001.
[14] Jing Wang, Student Member, IEEE, andChein-I Chang,
Senior Member, “Independent Component Analysis
Based Dimensionality Reduction with Applications in
Hyper spectral Image Analysis,” IEEE Transactions On
Geoscience And Remote Sensing Vol. 44, 2006, IEEE
0196-2892.
[15] Abhishek Golugula, George Lee, Stephen R. Master,
Michael D. Feldman, John E. Tomaszewski, and Anant
Madabhushi, “Supervised Regularized Canonical
Correlation Analysis: Integrating Histologic and
Proteomic Data for Predicting Biochemical Failures,”
EMBS 2011, IEEE 978-1-4244-4122-8.
BIOGRAPHIES
Dr.K.Govinda received the degree
in computer science and
engineering from Nagarjuna
University in 1998. He is
Associate Professor in School of
Computing Science and
Engineering, VIT University. His
interests are database, data
warehousingandcloudcomputing.

More Related Content

What's hot

Effective data mining for proper
Effective data mining for properEffective data mining for proper
Effective data mining for properIJDKP
 
IRJET- Evidence Chain for Missing Data Imputation: Survey
IRJET- Evidence Chain for Missing Data Imputation: SurveyIRJET- Evidence Chain for Missing Data Imputation: Survey
IRJET- Evidence Chain for Missing Data Imputation: SurveyIRJET Journal
 
Recommendation system using bloom filter in mapreduce
Recommendation system using bloom filter in mapreduceRecommendation system using bloom filter in mapreduce
Recommendation system using bloom filter in mapreduceIJDKP
 
Anomaly detection via eliminating data redundancy and rectifying data error i...
Anomaly detection via eliminating data redundancy and rectifying data error i...Anomaly detection via eliminating data redundancy and rectifying data error i...
Anomaly detection via eliminating data redundancy and rectifying data error i...nalini manogaran
 
A statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environmentA statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environmentIJDKP
 
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...IJDKP
 
Data mining techniques
Data mining techniquesData mining techniques
Data mining techniqueseSAT Journals
 
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...IRJET Journal
 
IRJET - An User Friendly Interface for Data Preprocessing and Visualizati...
IRJET -  	  An User Friendly Interface for Data Preprocessing and Visualizati...IRJET -  	  An User Friendly Interface for Data Preprocessing and Visualizati...
IRJET - An User Friendly Interface for Data Preprocessing and Visualizati...IRJET Journal
 
Data mining techniques a survey paper
Data mining techniques a survey paperData mining techniques a survey paper
Data mining techniques a survey papereSAT Publishing House
 
Using particle swarm optimization to solve test functions problems
Using particle swarm optimization to solve test functions problemsUsing particle swarm optimization to solve test functions problems
Using particle swarm optimization to solve test functions problemsriyaniaes
 
COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...
COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...
COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...ijiert bestjournal
 
IRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET- Error Reduction in Data Prediction using Least Square Regression MethodIRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET- Error Reduction in Data Prediction using Least Square Regression MethodIRJET Journal
 
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace DataMPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace DataIRJET Journal
 
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...csandit
 
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Sagar Deogirkar
 
Analysis on Data Mining Techniques for Heart Disease Dataset
Analysis on Data Mining Techniques for Heart Disease DatasetAnalysis on Data Mining Techniques for Heart Disease Dataset
Analysis on Data Mining Techniques for Heart Disease DatasetIRJET Journal
 

What's hot (18)

Effective data mining for proper
Effective data mining for properEffective data mining for proper
Effective data mining for proper
 
IRJET- Evidence Chain for Missing Data Imputation: Survey
IRJET- Evidence Chain for Missing Data Imputation: SurveyIRJET- Evidence Chain for Missing Data Imputation: Survey
IRJET- Evidence Chain for Missing Data Imputation: Survey
 
Recommendation system using bloom filter in mapreduce
Recommendation system using bloom filter in mapreduceRecommendation system using bloom filter in mapreduce
Recommendation system using bloom filter in mapreduce
 
Anomaly detection via eliminating data redundancy and rectifying data error i...
Anomaly detection via eliminating data redundancy and rectifying data error i...Anomaly detection via eliminating data redundancy and rectifying data error i...
Anomaly detection via eliminating data redundancy and rectifying data error i...
 
A statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environmentA statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environment
 
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...
 
Data mining techniques
Data mining techniquesData mining techniques
Data mining techniques
 
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
 
IRJET - An User Friendly Interface for Data Preprocessing and Visualizati...
IRJET -  	  An User Friendly Interface for Data Preprocessing and Visualizati...IRJET -  	  An User Friendly Interface for Data Preprocessing and Visualizati...
IRJET - An User Friendly Interface for Data Preprocessing and Visualizati...
 
Data mining techniques a survey paper
Data mining techniques a survey paperData mining techniques a survey paper
Data mining techniques a survey paper
 
Using particle swarm optimization to solve test functions problems
Using particle swarm optimization to solve test functions problemsUsing particle swarm optimization to solve test functions problems
Using particle swarm optimization to solve test functions problems
 
COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...
COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...
COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...
 
IRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET- Error Reduction in Data Prediction using Least Square Regression MethodIRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET- Error Reduction in Data Prediction using Least Square Regression Method
 
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace DataMPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
 
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...
 
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
 
Analysis on Data Mining Techniques for Heart Disease Dataset
Analysis on Data Mining Techniques for Heart Disease DatasetAnalysis on Data Mining Techniques for Heart Disease Dataset
Analysis on Data Mining Techniques for Heart Disease Dataset
 
B0930610
B0930610B0930610
B0930610
 

Similar to Survey on Feature Selection and Dimensionality Reduction Techniques

IRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET - An Overview of Machine Learning Algorithms for Data ScienceIRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET - An Overview of Machine Learning Algorithms for Data ScienceIRJET Journal
 
High dimensionality reduction on graphical data
High dimensionality reduction on graphical dataHigh dimensionality reduction on graphical data
High dimensionality reduction on graphical dataeSAT Journals
 
Comparative study of various supervisedclassification methodsforanalysing def...
Comparative study of various supervisedclassification methodsforanalysing def...Comparative study of various supervisedclassification methodsforanalysing def...
Comparative study of various supervisedclassification methodsforanalysing def...eSAT Publishing House
 
E-Healthcare monitoring System for diagnosis of Heart Disease using Machine L...
E-Healthcare monitoring System for diagnosis of Heart Disease using Machine L...E-Healthcare monitoring System for diagnosis of Heart Disease using Machine L...
E-Healthcare monitoring System for diagnosis of Heart Disease using Machine L...IRJET Journal
 
TERM DEPOSIT SUBSCRIPTION PREDICTION
TERM DEPOSIT SUBSCRIPTION PREDICTIONTERM DEPOSIT SUBSCRIPTION PREDICTION
TERM DEPOSIT SUBSCRIPTION PREDICTIONIRJET Journal
 
Fault detection of imbalanced data using incremental clustering
Fault detection of imbalanced data using incremental clusteringFault detection of imbalanced data using incremental clustering
Fault detection of imbalanced data using incremental clusteringIRJET Journal
 
Feature selection a novel
Feature selection a novelFeature selection a novel
Feature selection a novelcsandit
 
Survey paper on Big Data Imputation and Privacy Algorithms
Survey paper on Big Data Imputation and Privacy AlgorithmsSurvey paper on Big Data Imputation and Privacy Algorithms
Survey paper on Big Data Imputation and Privacy AlgorithmsIRJET Journal
 
Data Mining Module 2 Business Analytics.
Data Mining Module 2 Business Analytics.Data Mining Module 2 Business Analytics.
Data Mining Module 2 Business Analytics.Jayanti Pande
 
Data reduction techniques to analyze nsl kdd dataset
Data reduction techniques to analyze nsl kdd datasetData reduction techniques to analyze nsl kdd dataset
Data reduction techniques to analyze nsl kdd datasetIAEME Publication
 
Review of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & PredictionReview of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & PredictionIRJET Journal
 
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...IRJET Journal
 
Influence over the Dimensionality Reduction and Clustering for Air Quality Me...
Influence over the Dimensionality Reduction and Clustering for Air Quality Me...Influence over the Dimensionality Reduction and Clustering for Air Quality Me...
Influence over the Dimensionality Reduction and Clustering for Air Quality Me...IJAEMSJORNAL
 
Analysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOTAnalysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOTIJERA Editor
 
IRJET - A Novel Approach for Software Defect Prediction based on Dimensio...
IRJET -  	  A Novel Approach for Software Defect Prediction based on Dimensio...IRJET -  	  A Novel Approach for Software Defect Prediction based on Dimensio...
IRJET - A Novel Approach for Software Defect Prediction based on Dimensio...IRJET Journal
 
Anomaly Detection using multidimensional reduction Principal Component Analysis
Anomaly Detection using multidimensional reduction Principal Component AnalysisAnomaly Detection using multidimensional reduction Principal Component Analysis
Anomaly Detection using multidimensional reduction Principal Component AnalysisIOSR Journals
 
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGESCASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGESIRJET Journal
 
Machine Learning Approaches and its Challenges
Machine Learning Approaches and its ChallengesMachine Learning Approaches and its Challenges
Machine Learning Approaches and its Challengesijcnes
 
A Comparative Study for Anomaly Detection in Data Mining
A Comparative Study for Anomaly Detection in Data MiningA Comparative Study for Anomaly Detection in Data Mining
A Comparative Study for Anomaly Detection in Data MiningIRJET Journal
 
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEYCLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEYEditor IJMTER
 

Similar to Survey on Feature Selection and Dimensionality Reduction Techniques (20)

IRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET - An Overview of Machine Learning Algorithms for Data ScienceIRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET - An Overview of Machine Learning Algorithms for Data Science
 
High dimensionality reduction on graphical data
High dimensionality reduction on graphical dataHigh dimensionality reduction on graphical data
High dimensionality reduction on graphical data
 
Comparative study of various supervisedclassification methodsforanalysing def...
Comparative study of various supervisedclassification methodsforanalysing def...Comparative study of various supervisedclassification methodsforanalysing def...
Comparative study of various supervisedclassification methodsforanalysing def...
 
E-Healthcare monitoring System for diagnosis of Heart Disease using Machine L...
E-Healthcare monitoring System for diagnosis of Heart Disease using Machine L...E-Healthcare monitoring System for diagnosis of Heart Disease using Machine L...
E-Healthcare monitoring System for diagnosis of Heart Disease using Machine L...
 
TERM DEPOSIT SUBSCRIPTION PREDICTION
TERM DEPOSIT SUBSCRIPTION PREDICTIONTERM DEPOSIT SUBSCRIPTION PREDICTION
TERM DEPOSIT SUBSCRIPTION PREDICTION
 
Fault detection of imbalanced data using incremental clustering
Fault detection of imbalanced data using incremental clusteringFault detection of imbalanced data using incremental clustering
Fault detection of imbalanced data using incremental clustering
 
Feature selection a novel
Feature selection a novelFeature selection a novel
Feature selection a novel
 
Survey paper on Big Data Imputation and Privacy Algorithms
Survey paper on Big Data Imputation and Privacy AlgorithmsSurvey paper on Big Data Imputation and Privacy Algorithms
Survey paper on Big Data Imputation and Privacy Algorithms
 
Data Mining Module 2 Business Analytics.
Data Mining Module 2 Business Analytics.Data Mining Module 2 Business Analytics.
Data Mining Module 2 Business Analytics.
 
Data reduction techniques to analyze nsl kdd dataset
Data reduction techniques to analyze nsl kdd datasetData reduction techniques to analyze nsl kdd dataset
Data reduction techniques to analyze nsl kdd dataset
 
Review of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & PredictionReview of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & Prediction
 
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
 
Influence over the Dimensionality Reduction and Clustering for Air Quality Me...
Influence over the Dimensionality Reduction and Clustering for Air Quality Me...Influence over the Dimensionality Reduction and Clustering for Air Quality Me...
Influence over the Dimensionality Reduction and Clustering for Air Quality Me...
 
Analysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOTAnalysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOT
 
IRJET - A Novel Approach for Software Defect Prediction based on Dimensio...
IRJET -  	  A Novel Approach for Software Defect Prediction based on Dimensio...IRJET -  	  A Novel Approach for Software Defect Prediction based on Dimensio...
IRJET - A Novel Approach for Software Defect Prediction based on Dimensio...
 
Anomaly Detection using multidimensional reduction Principal Component Analysis
Anomaly Detection using multidimensional reduction Principal Component AnalysisAnomaly Detection using multidimensional reduction Principal Component Analysis
Anomaly Detection using multidimensional reduction Principal Component Analysis
 
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGESCASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
 
Machine Learning Approaches and its Challenges
Machine Learning Approaches and its ChallengesMachine Learning Approaches and its Challenges
Machine Learning Approaches and its Challenges
 
A Comparative Study for Anomaly Detection in Data Mining
A Comparative Study for Anomaly Detection in Data MiningA Comparative Study for Anomaly Detection in Data Mining
A Comparative Study for Anomaly Detection in Data Mining
 
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEYCLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
 

More from IRJET Journal

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...IRJET Journal
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTUREIRJET Journal
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...IRJET Journal
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsIRJET Journal
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...IRJET Journal
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...IRJET Journal
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...IRJET Journal
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...IRJET Journal
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASIRJET Journal
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...IRJET Journal
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProIRJET Journal
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...IRJET Journal
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemIRJET Journal
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesIRJET Journal
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web applicationIRJET Journal
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...IRJET Journal
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.IRJET Journal
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...IRJET Journal
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignIRJET Journal
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...IRJET Journal
 

More from IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Recently uploaded

INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .Satyam Kumar
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 

Recently uploaded (20)

INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 

Survey on Feature Selection and Dimensionality Reduction Techniques

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 03 Issue: 07 | July-2016 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 14 Survey on Feature Selection and Dimensionality Reduction Techniques Govinda.K1, Kevin Thomas 2 12School of Computing Science Engineering, VIT University, Vellore, India ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - Data mining methods are used to extract useful information from huge amount of data. Mining of data is a time consuming process because the data thathastobemined may be comprised of large number of dimensions. So we can say that number of dimensions exponentially vary with computation time. In order to speed up the decision making process, dimensionality reduction techniques came into existence. In this paper we have discussed about various techniques of dimensionality reduction that exists currently. Key Words: PCA, SVD, SVM, LCM, CCA 1. INTRODUCTION Advancement in data collection and data storage during the past decade has led to an information overload. Researchers from the field of engineering, biology, astronomy, remote sensing, consumer transaction and economics face larger observations and simulation everyday.[1]Data miningaims at processing, classifying and selecting useful data from any given dataset. The dataset on which data miningisdone may consist of information of varied form and enormous size. This information may be inappropriateormayshowmissing values that may mislead the user due to the irregularities. These irregularities are detected, analyzed and modified during data pre- processing by applying certain modifications and behavioral predictions. Dimensionality reduction is among the predominant techniques of data pre-processing which helps to incorporate a systematized structure into the dataset, prior to the mining process. It is also an effective way to downsize data which encourages effective storage and retrieval of data. Some of its important applications are found in the areas like text mining,image processing,patternrecognition, image retrieval etc. Dimensionality reduction is implemented using various algorithms such as SVD, PCA, ICA, SVM etc. [2] This paper describes and analysis OF nine such algorithms to find each ones strength and weaknesses. 1.1 Why Dimensionality Reduction  It is easy and convenient way to collect data  Data that is collected, is not just from data mining  Data is accumulated in an unprecedented speed  For effective machine learning and data mining data processing is an important part  Helps in downsizing data  Visualization  Data compression  Noise removal 1.2 Applications of Dimensionality Reduction  Text mining  Image retrieval  Customer management  Face recognition  Intrusion detection  Microarray data analysis  Handwritten digit recognition  Protein classification 2. MAJOR TECHNIQUES OF DIMENSIONALITY REDUCTION 2.1 Feature Selection A process that chooses an optimal subset of features according to an objective function [3]. Feature selection is done to remove noise, reducedimensionality and toimprove mining performance such as speed of learning, predictive accuracy and to make the mined results simple to understand. 2.2 Feature Extraction It refers to mapping of high-dimensional data to a lower- dimensional space. [3]. Criterion for thismaydiffer based on different problem settings such as: 3. ALGORITHMS 3.1 Singular Value Decomposition (SVD) SVD is a gene selection procedure that is performed to reduce dimensionality in data. It is a matrix factorization method which comes under linear vector algebra. The main objective of SVD in data analysis of gene expression is to identify and extract the structural constraints within the  Unsupervised Setting – minimum loss of information  Supervised Setting – maximum class discrimination
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 03 Issue: 07 | July-2016 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 15 data and also to relate significantassociations.Themainidea behind SVD is to calculate eigenvalues and eigenvectors of covariance matrix of sample-gene matrix [4]. The eigenvalues would help to deduce the respective (or corresponding) eigenvectors; for higher eigenvalues the corresponding eigenvectors would contain a higher variability. Usuallytheeigenvectorsthatappearinitiallywith higher unpredictability are considered as prime Principle Components (PCs). These PCs are used to reduce the data into smaller dimensions [5]. The tool used to analyze the data is known as clustering. Group of genes the show similarity (i.e. similar expression profile) are discovered by Clustering. The two methods in clustering are Model-Free andModel-based methods[6]. Model-freemethods do not have any probabilistic tree whereas model-based clustering method clusters are build based on assumptions such that data follows mixture distribution. SVD overcomes two main difficulties in gene expressiondata that is parameter estimation and normality assumptions on gene expression. After applying SVD on data a technique called probit transformation is done to improve the working of model-freemethod. SVD can be applied to dataset that has both scattered and un-scattered genes. 3.2 Partial Least Squares Regression (PLSR) Research in science and engineering often involves using controllable and/or easy-to-measure variables (factors) to explain, regulate, or predict the behavior (responses) of other variables. When the factors are few in number, they are not significantly redundant (collinear), and have a well- understood relationshiptotheresponses,thena goodwayto turn data into information could be multiple linear regression (MLR). When any one of these three conditions does not satisfy, MLR proves to be inefficient or inappropriate. Hence, researchers face problems withmany variables and ill-understood relationships, while constructing a good predictive model. Example: Spectrographs. When the factors are many and highly collinear we use Partial Least Squares (PLS) method for constructing predictive models. This emphasizes on predicting the response rather than trying to understand the relationship between the variables. For example, PLS is usually not appropriate for screening out factors that have a negligible effect on response. However, when thegoal ispredictionand there is no practical need to limit the number of factors, PLS may be a very useful tool. MLR can be used with many factors. However, if the number of factors gets too large, you are likely to get a model thatfits the sampled data perfectly but that will fail to predict new data well. This phenomenon is called over-fitting. In such cases, although there are many manifest factors, there may be only a few underlying factors or latent factors that account for most of the variation in response. The main intention of PLS is to extract these latent factors, accounting for as much of the manifest factor variationaspossiblewhile modelling the responses well[7]. 3.3 Linear Discriminant Analysis (LDA) Linear discriminant analysis (LDA) is a method which finds its application in statistics, machine learning and pattern recognition to find a linear combination of features which characterizes or separates two or more classes of objects. This resulting combination is used as a linear classifier and also for dimensionality reduction before later classification. LDA is closely related to ANOVA (analysis of variance) and regression analysis, attempts to express one dependent variable as a linear combination of other features. However, ANOVA uses categorical independent variables and a continuous dependent variable. In discriminant analysis continuous independent variables and a categorical dependent variable i.e. the class label are used. Logistic regression is more similar to LDA, as it explainsa categorical variable by the values of continuous independent variables. We prefer these methods in applications where it is not reasonable to assume that the independent variables are normally distributed. This is the fundamental assumption of the LDA method. When the measurements madeon independent variablesfor each observation are continuous quantities LDA is used. For categorical independent variables, a similar technique is discriminant correspondence analysis [8]. 3.4 Locally Linear Embedding (LLE) High-dimensional data can often be converted to low- dimensional data with little or no fundamental loss of information. A simple and widely used method for dimensionality reduction is principal component analysis (PCA). In this method data points by their respective orthogonal projections on a subspace of low dimension spanned by the directions (also called components,features, factors or sources) of greatest variance in the data set is represented. The two recently proposednon- linear generalizations of PCA are locally linear embedding (LLE) and ISOMAP algorithms. It was developed for visualization purposes, these two methods projecthigh- dimensional data into a two orlow-dimensional subspaceby extracting meaningful components in a non-linear fashion. The locally linear embedding algorithmassumesthata high- dimensional data set lies on, or near to, a smooth low- dimensional manifold. Small patches of the manifold, eachof which contains a fraction of the data set, can be equipped with individual localco-ordinates. High-dimensional co- ordinates of each patch can be mapped into corresponding local co-ordinates by means of an essentially linear transformation. This method attempts to find a global
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 03 Issue: 07 | July-2016 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 16 transformation of the high-dimensional co-ordinates into low- dimensional ones by exploiting adjacency information about closely located data points, this information being a form of summarization of the local transformationsbetween the high- and low-dimensional co-ordinates[9]. 3.5 Generic Algorithm Computational studies of Darwinian evolution and natural selection have led to numerous models for resolving optimization. Generic Algorithms comprise a subsetofthese evolution based optimization problems techniques focusing on the application of selection, mutation, andrecombination to a population of competing problem solutions. Generic Algorithms are parallel, iterative optimizers, and have been successfully applied to a broad spectrum of optimization problems, including many pattern recognition and classification tasks [10]. 3.6 Principal Component Analysis (PCA) It is a Feature Extraction technique that is used to analyze statistical data by transforming the starting set of variables into various set of linear combinations which are known as the principal components (PC), and these components have some specific properties with regardtovariances.Thismake the dimensionality of the system more concentrated and at the same time, variable connections information is also maintained [11]. Calculations are made on the data set by analysing eigenvalue and its eigenvectors,covariancematrix arranged systematically in descendingorder. Thistechnique produces the maximum feasibility arbitrary solutions in the high-dimensional space. We can view it as data visualization method because here, the high dimensional data sets can be reduced to two dimensional or three dimensional data sets that can be easily plottedusing graphs or charts. Simple Principal Components Analysis techniquetakea data oriented approach and it is used in case of dimensionality reduction of two highly dimensional picture databases. In such cases it has showed quick convergence rate and stability and has produced estimated solutions without considering thevariance-covariance matrix [12]. So this technique is more beneficial in use as compared to other existing methods. Matrix method and data method are the key methods for PCA. In the matrix method as the name suggest, matrix is formed for the datasets which comprises of the calculated variance and co variance of the data.Furtherdiagonalization technique is also applied on the matrix. Data methods directly deal with the data. PCA is generally applied before clustering of the datasets happens so as to get an improved drawing out of the cluster organization. This technique also has a disadvantage of not considering the class separability because of the absence of class label of the feature vector [13].PCA mainly focuses ontherotationof the coordinate axes, that is transformed in the direction which as the maximum variance. The traditional PCA is implied only on linear transformation and is not effective fornon-linear data. So, to overcome this drawback, non-linear transformation is applied to high dimensional space [14]. 3.7 Support Vector Machines (SVM) SVM is a technique which can be applied to several domains consecutively. SVM mainly focus on the microarray cancer data that contains huge number of gene expressions. The objective of SVM is to get the mostoptimal separating hyper- plane that has the maximum margin (w).this techniques is more suitable in analyzing gene expressions because it specifically concentrateondatasetswhichhaslowernumber of samples as compared to the genes and it is also useful in gene selection. SVM is useful because the resultsproduced bythistechnique are optimal and distinctive. It is s used in combination with Recursive Feature Replacement (RFR) that produces a unique assessment of generalization of feature (genes).it starts with the gene set which is complete; it removes the least significant gene from the classifier in subsequent iterations. There is another method called SVM-T- RFE method which works on smaller gene subsets and thus produces the highest accuracy. There is one more method similar to SVM, it is called as SVD. The major difference between the two methods is that SVM is a supervised method while SVD is an unsupervised method. They work on the same application areas like feature selection and modelling of dynamics of gene expression and clustering. But SVM is consideredtobemore effective over other existing procedures [15]. 3.8 Independent Component Analysis (ICA) The computational technique that is used to obtain reduced subcomponents from the assorted signal aftersplittingthem is called Independent Component Analysis technique.to understand this methodwecanconsidera simpleexampleof ICA called “cocktail party ”. Here a sample data is taken which consists of individuals interacting with eachotherina room and there fundamental speech signals are divided.In a generalized way we can say for N sources at least N estimations are needed for primal signals mining. This can be also achieved using Regularized Whitening method in which dimension of independent sets is reduced to limited set. This algorithm hasthepotential toincorporate statistics of higher order that comprises of complementary data.it finds its importancewhensendingofdata differsfrom its prevalent parameters. Whitened data is needed for this method, which is obtained from the identity covariance
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 03 Issue: 07 | July-2016 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 17 matrix. It can be used as an augmented version of the PCA method. Initial projection vectors are used in ICA, usingthee parameters, three more algorithms are developed for ICA- DR. This procedure is further classified into three subprocedures. ICA-DR1, is the first procedure which uses virtual parameter together with prioritization and selection components. The next one is the ICA-DR2, here ICA is executed as an unauthorized algorithm and its randomness is characterized by randomly choosing initial projection vectors. The third algorithm is called the ICA-DR3, this method uses initialization algorithm together with virtual dimension to develop more appropriate initial projection vectors group.so that they can be used to substitute these vectors required by ICA to develop each of its ICs. ICA is closely related to Projection Pursuit (PP), which is another statically data analysis tool, whose aim is to reduce the dimensions of the multivariate data sets by identifying the“interested projections “from the projection directory. 3.9 Canonical Correlation Analysis (CCA) This technique was first introduced by Harold Hoteling; he used cross covariance matrices for dimensionality reduction.in this technique, two different setofvariables are taken along with their correlations,thiscorrelationshelpsto establish linear combinations of the sets with the set of highest correlation. CCA is used in the creation of a model equation that is used to relate two set of variables, for example one setcomprisesofperformancemeasuresand the other one consists of descriptive variables. CCA is also used for class prediction taken from standard classes that has primal set of samples. This method along with its regularized version( RCCA) are used for blending two modalities.it is mainly used for establishing linear relationship between the text associated toanimageandthe pixel values of the same image. RCCA is used to study gene expressions in liver cells and finally it linksthisdata withthe concentrated hepatic acid in mice. CCA has disadvantage of not fitting into modalities which have large quantities of dimensions.RCCA overcome this drawback but is a bit expensive method. It also has a disadvantage of regularization and thus produces defectivesinverses.Thisis also overcome by RCCA but failed to create equivalent level of refinement between patient classes. CCA is used to design linear affiliation that comprises of multidimensional variables.it chooses two bases ideal with correlations and these bases are assigned to each variable. Then it selects two bases from the correlation matrix in which diagonal elements are variables and whose correlation is maximum. This is the only difference between ordinary correlation analysis and the CCA. 4. CONCLUSIONS The objective behind the survey is to provide a complete understanding of the various algorithms used in dimensionality reduction and to analyze the developing interest in this field during the past few years. Dimensionality reduction technique based on dictionaries and projections are growing rapidly. PCA is the most preferred technique due to its simplicity. In middle 2000’s ICA was at its boom but later it started seeing its decline.But it will continue to in demand for the applications related to single processing. The rest of the techniques have placed themselves well in the market. Overall wemayconcludethat dimensionality reduction techniques have been and will continue to be applied in many sectors ranging from biomedical research to pattern recognition. In this survey paper we have covered different methodologies; each requiring different criteria but all having the same goal of reducing the complexity at the same time to deliver a more appropriate (understandable) form of the information. REFERENCES [1] A survey of dimension reduction techniques Imola K. Fodor Center for AppliedScienti¯cComputing,Lawrence Livermore National Laboratory P.O. Box 808, L- 560, Livermore, CA 94551. [2] A SURVEY OF DIMENSIONALITY REDUCTION AND CLASSIFICATION METHODS, International Journal of Computer Science & Engineering Survey (IJCSES) Vol.3, No.3, June 2012 [3] Lei Yu, , Jieping Ye, Huan Liu , “DimensionalityReduction for Data Mining Techniques, Applications and Trends; Binghamton University and Arizona State University. [4] Abdi, H. Lewis‐Beck, M.; Bryman, A. & Futing, T. , Encyclopedia for research methods for the social sciences Factor rotations in factor analyses Sage, 2003, 792‐795. [5] Abdi, H. Salkind, N. (ed.) Encyclopedia of measurements and statistics Singular value decomposition (SVD) and Generalized Singular Value Decomposition (GSVD)Sage Publications, 2007, 907‐912. [6] Aharon, M.; Elad, M. & Bruckstein, A. K‐SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation IEEE Trans. Signal Processing, 2006, 54, 4311‐4322. [7] CSCE 666 Pattern Analysis | RicardoGutierrez-Osuna | CSE@TAMU . [8] Modern Methods For Business Research edited by George A.Marcoulides . An Introduction to Genetic Algorithms Mitchell Melanie A Bradford Book The MIT Press [9] Cambridge, Massachusetts • London, England Fifth printing, 1999 First MIT Press paperback edition, 1998 Copyright © 1996 Massachusetts Institute of Technology.
  • 5. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 03 Issue: 07 | July-2016 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 18 [10] S. Bicciato, A. Luchini, C. Di Bello, “Disjoint PCA Models for Marker Identification And Classification Of Cancer Types Using Gene Expression Data,”2002, IEEE 0-7803- 7557-2. [11] Matthew Partridge, Rafael Calvo, “Fast Dimensionality Reduction and Simple PCA,”2006. [12] Vinay Nadimpally , Mohammed J. Zaki,“A Novel Approach to Determine Normal Variation in Gene Expression Data,” SIGKDD Explorations, Vol. 5 Issue 2,2002. [13] Isabelle Guyon, Jason Weston, Stephen Barnhill, M.D., “Gene Selection for Cancer Classification using Support Vector Machines,” Barnhill Bioinformatics 2001. [14] Jing Wang, Student Member, IEEE, andChein-I Chang, Senior Member, “Independent Component Analysis Based Dimensionality Reduction with Applications in Hyper spectral Image Analysis,” IEEE Transactions On Geoscience And Remote Sensing Vol. 44, 2006, IEEE 0196-2892. [15] Abhishek Golugula, George Lee, Stephen R. Master, Michael D. Feldman, John E. Tomaszewski, and Anant Madabhushi, “Supervised Regularized Canonical Correlation Analysis: Integrating Histologic and Proteomic Data for Predicting Biochemical Failures,” EMBS 2011, IEEE 978-1-4244-4122-8. BIOGRAPHIES Dr.K.Govinda received the degree in computer science and engineering from Nagarjuna University in 1998. He is Associate Professor in School of Computing Science and Engineering, VIT University. His interests are database, data warehousingandcloudcomputing.