This document summarizes a research paper titled "A Novel Algorithm for Design Tree Classification with PCA". It discusses dimensionality reduction techniques like principal component analysis (PCA) that can improve the efficiency of classification algorithms on high-dimensional data. PCA transforms data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate, called the first principal component. The paper proposes applying PCA and linear transformation on an original dataset before using a decision tree classification algorithm, in order to get better classification results.
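As a rough illustration of the pipeline the summary describes, the sketch below applies PCA before fitting a decision tree. It is a minimal sketch only: scikit-learn and the Iris dataset are assumptions for illustration, not details from the paper.

```python
# Hedged sketch: PCA for dimensionality reduction, then a decision tree
# classifier on the projected data. Dataset and library are illustrative.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# PCA keeps the directions of greatest variance; here 2 of the 4 components.
model = make_pipeline(PCA(n_components=2),
                      DecisionTreeClassifier(random_state=0))
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"test accuracy after PCA: {accuracy:.2f}")
```

The number of retained components is a tuning choice; the paper's point is that projecting onto the leading principal components can improve the tree's results on high-dimensional data.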
Data Mining: Concepts and Techniques — Chapter 2 — Salah Amean
The presentation contains the following:
-Data Objects and Attribute Types.
-Basic Statistical Descriptions of Data.
-Data Visualization.
-Measuring Data Similarity and Dissimilarity.
-Summary.
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS — ijdkp
Subspace clustering discovers the clusters embedded in multiple, overlapping subspaces of high-dimensional data. Many significant subspace clustering algorithms exist, each with different characteristics arising from the techniques, assumptions and heuristics used. A comprehensive classification scheme is essential that considers all such characteristics to divide subspace clustering approaches into various families. The algorithms belonging to the same family satisfy common characteristics. Such a categorization will help future developers better understand the quality criteria to be used, and the similar algorithms against which to compare the results of their proposed clustering algorithms. In this paper, we first propose the concept of SCAF (Subspace Clustering Algorithms’ Family). The characteristics of a SCAF are based on classes such as cluster orientation and overlap of dimensions. As an illustration, we further provide a comprehensive, systematic description and comparison of a few significant algorithms belonging to the “axis-parallel, overlapping, density-based” SCAF.
A COMPARATIVE STUDY ON DISTANCE MEASURING APPROACHES FOR CLUSTERING — IJORCS
Clustering plays a vital role in various areas of research such as Data Mining, Image Retrieval, Bio-computing and many others. The distance measure plays an important role in clustering data points, and choosing the right distance measure for a given dataset is a big challenge. In this paper, we study various distance measures and their effect on different clusterings. The paper surveys existing distance measures for clustering and presents a comparison between them based on application domain, efficiency, benefits and drawbacks. This comparison helps researchers decide quickly which distance measure to use for clustering. We conclude this work by identifying trends and challenges of research and development towards clustering.
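The effect of the distance choice can be seen in a small sketch (illustrative only; the vectors and the SciPy calls are not from the paper): the nearest neighbour of a point under Euclidean distance can differ from its nearest neighbour under cosine distance.

```python
# Toy example: Euclidean distance is sensitive to magnitude, cosine
# distance only to direction, so they can disagree on "closeness".
import numpy as np
from scipy.spatial import distance

a = np.array([1.0, 0.0, 0.0])
b = np.array([10.0, 0.0, 0.0])   # same direction as a, far in magnitude
c = np.array([0.0, 1.0, 0.0])    # close to a in magnitude, orthogonal direction

euclid_ab = distance.euclidean(a, b)   # large: magnitudes differ
euclid_ac = distance.euclidean(a, c)   # small: magnitudes are similar
cosine_ab = distance.cosine(a, b)      # 0: directions are identical
cosine_ac = distance.cosine(a, c)      # 1: directions are orthogonal

# Euclidean calls c the closer neighbour of a; cosine calls b closer.
print(euclid_ab, euclid_ac, cosine_ab, cosine_ac)
```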
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R... — ijscmc
Face recognition is one of the most unobtrusive biometric techniques and can be used for access control as well as surveillance purposes. Various methods for implementing face recognition have been proposed, with varying degrees of performance in different scenarios. The most common issue with effective facial biometric systems is high susceptibility to variations in the face owing to factors such as changes in pose, varying illumination, different expressions, the presence of outliers, noise etc. This paper explores a novel technique for face recognition by classifying the face images with an unsupervised learning approach through K-Medoids clustering. The Partitioning Around Medoids (PAM) algorithm has been used for performing K-Medoids clustering of the data. The results suggest increased robustness to noise and outliers in comparison to other clustering methods. The technique can therefore also be used to increase the overall robustness of a face recognition system, thereby increasing its invariance and making it a reliably usable biometric modality.
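For illustration, here is a compact Voronoi-style k-medoids iteration in the spirit of PAM. This is a simplified sketch on synthetic points, not the authors' implementation; the classic PAM swap search is more exhaustive than the per-cluster update used here.

```python
# Simplified k-medoids sketch: each cluster is represented by an actual
# data point (its medoid), which is what gives the method its robustness
# to outliers relative to centroid-based k-means.
import numpy as np

def pam(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    # full pairwise Euclidean distance matrix
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue
            # pick the member with minimal total distance to its cluster
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(costs)]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids, np.argmin(D[:, medoids], axis=1)

# usage on two well-separated synthetic blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (10, 2)), rng.normal(5.0, 0.1, (10, 2))])
medoids, labels = pam(X, 2)
print("medoid indices:", medoids)
```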
Reduct generation for the incremental data using rough set theory — csandit
In today’s changing world a huge amount of data is generated and transferred frequently. Although the data is sometimes static, most commonly it is dynamic and transactional: new data that is being generated is constantly added to the old/existing data. To discover knowledge from this incremental data, one approach is to run the algorithm repeatedly on the modified data sets, which is time consuming. The paper proposes a dimension reduction algorithm that can be applied in a dynamic environment for generation of a reduced attribute set as a dynamic reduct. The method analyzes the new dataset when it becomes available, and modifies the reduct accordingly to fit the entire dataset. The concepts of discernibility relation, attribute dependency and attribute significance from Rough Set Theory are integrated for the generation of the dynamic reduct set, which not only reduces the complexity but also helps to achieve higher accuracy of the decision system. The proposed method has been applied on a few benchmark datasets collected from the UCI repository and a dynamic reduct is computed. Experimental results show the efficiency of the proposed method.
A brief description of clustering, two relevant clustering algorithms (K-means and Fuzzy C-means), clustering validation, and two internal validity indices (Dunn and Davies–Bouldin).
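A toy sketch of the validation idea (an assumed example, not from the slides): run K-means for several values of k and score each partition with the Davies–Bouldin index, where lower values indicate more compact, better-separated clusters.

```python
# Cluster three well-separated synthetic blobs with K-means for several k,
# and let the Davies-Bouldin index pick the best partition (lower is better).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in (0.0, 5.0, 10.0)])

scores = {}
for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = davies_bouldin_score(X, labels)

best_k = min(scores, key=scores.get)
print(scores, "best k:", best_k)
```

On this synthetic data the index bottoms out at the true number of blobs; on real data the indices named in the slides (Dunn, Davies–Bouldin) are used the same way, as internal checks that need no ground-truth labels.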
QUANTUM CLUSTERING-BASED FEATURE SUBSET SELECTION FOR MAMMOGRAPHIC I... — ijcsit
In this paper, we present an algorithm for feature selection. This algorithm, labeled QC-FS (Quantum Clustering for Feature Selection), performs the selection in two steps. First, the original feature space is partitioned in order to group similar features, using the Quantum Clustering algorithm. Then a representative for each cluster is selected, using similarity measures such as the correlation coefficient (CC) and mutual information (MI). The feature which maximizes this information is chosen by the algorithm.
Web image annotation by diffusion maps manifold learning algorithm — ijfcstjournal
Automatic image annotation is one of the most challenging problems in machine vision. The goal of this task is to automatically predict a number of keywords for images captured in real data. Many methods are based on visual features in order to calculate similarities between image samples, but the computation cost of these approaches is very high and they require many training samples to be stored in memory. To lessen this burden, a number of techniques have been developed to reduce the number of features in a dataset. Manifold learning is a popular approach to nonlinear dimensionality reduction. In this paper, we investigate the diffusion maps manifold learning method for the web image auto-annotation task. The diffusion maps method is used to reduce the dimension of several visual features. Extensive experiments and analysis on the NUS-WIDE-LITE web image dataset with different visual features show how this manifold learning dimensionality reduction method can be applied effectively to image annotation.
A Review on Non Linear Dimensionality Reduction Techniques for Face Recognition — rahulmonikasharma
Principal Component Analysis (PCA) has gained much attention among researchers addressing the problem of high-dimensional data sets. During the last decade a non-linear variant of PCA has been used to reduce the dimensions on a non-linear hyperplane. This paper reviews the various non-linear techniques, applied to real and artificial data. It is observed that non-linear PCA outperforms its linear counterpart in most cases; however, exceptions are noted.
Data mining is a process to extract information from a huge amount of data and transform it into an understandable structure. Data mining provides a number of tasks to extract data from large databases, such as classification, clustering, regression and association rule mining. This paper covers the concept of classification. Classification is an important data mining technique based on machine learning which is used to classify each item on the basis of its features with respect to a predefined set of classes or groups. This paper summarises various techniques that are implemented for classification, such as k-NN, Decision Tree, Naïve Bayes, SVM, ANN and RF. The techniques are analyzed and compared on the basis of their advantages and disadvantages.
Improved probabilistic distance based locality preserving projections method ... — IJECEIAES
In this paper, dimensionality reduction is achieved in large datasets using the proposed distance based Non-integer Matrix Factorization (NMF) technique, which is intended to solve the data dimensionality problem. Here, NMF and distance measurement aim to resolve the non-orthogonality problem due to increased dataset dimensionality. The method initially partitions the datasets, organizes them into a defined geometric structure and avoids capturing the dataset structure through a distance based similarity measurement. The proposed method is designed to fit dynamic datasets and includes the intrinsic structure using data geometry. The complexity of the data is further avoided using an Improved Distance based Locality Preserving Projection. The proposed method is evaluated against existing methods in terms of accuracy, average accuracy, mutual information and average mutual information.
Among the many data clustering approaches available today, mixed data sets of numeric and categorical data pose a significant challenge due to the difficulty of an appropriate choice and employment of distance/similarity functions for clustering and its verification. Unsupervised learning models for artificial neural networks offer an alternative means for data clustering and analysis. The objective of this study is to highlight an approach, and its associated considerations, for mixed data set clustering with the Adaptive Resonance Theory 2 (ART-2) artificial neural network model and subsequent validation of the clusters with dimensionality reduction using an Autoencoder neural network model.
A Combined Approach for Feature Subset Selection and Size Reduction for High ... — IJERA Editor
Selection of relevant features from a given set of features is one of the important issues in the fields of data mining and classification. In general a dataset may contain a number of features, but it is not necessary that the whole feature set is important for a particular analysis or decision, because the features may share common information and can also be completely irrelevant to the processing at hand. This generally happens because of improper selection of features during dataset formation or because of incomplete information about the observed system. In both cases the data will contain features that merely increase the processing burden, which may ultimately cause improper outcomes when used for analysis. For these reasons, methods are required to detect and remove such features; hence in this paper we present an efficient approach that not only removes the unimportant features but also reduces the size of the complete dataset. The proposed algorithm uses information theory to compute the information gain of each feature and a minimum spanning tree to group similar features; in addition, fuzzy c-means clustering is used to remove similar entries from the dataset. Finally, the algorithm is tested with an SVM classifier using 35 publicly available real-world high-dimensional datasets, and the results show that the presented algorithm not only reduces the feature set and data length but also improves the performance of the classifier.
A Novel Approach to Mathematical Concepts in Data Mining — ijdmtaiir
This paper describes three fundamental mathematical programming approaches that are relevant to data mining: feature selection, clustering and robust representation. The paper comprises two clustering algorithms, the k-means algorithm and the k-median algorithm. Clustering is illustrated by the unsupervised learning of patterns and clusters that may exist in a given database, and is a useful tool for Knowledge Discovery in Databases (KDD). The results of the k-median algorithm are used to identify blood cancer patients in a medical database. K-means clustering is a data mining/machine learning algorithm used to cluster observations into groups of related observations without any prior knowledge of those relationships. The k-means algorithm is one of the simplest clustering techniques and is commonly used in medical imaging, biometrics and related fields.
Finding Relationships between the Our-NIR Cluster Results — CSCJournals
The problem of evaluating node importance in clustering has been an active research area in recent years, and many methods have been developed. Most clustering algorithms deal with general similarity measures; however, in real situations data often changes over time. Clustering this type of data not only decreases the quality of the clusters but also disregards the expectations of users, who usually require recent clustering results. In this regard we previously proposed the Our-NIR method, which improves on the method proposed by Ming-Syan Chen, as demonstrated by its node importance results; calculating node importance is very useful in clustering categorical data. Still, the method has deficiencies in data labeling and outlier detection. In this paper we modify the Our-NIR method for evaluating node importance by introducing a probability distribution, which, as the compared results show, performs better.
Validation Study of Dimensionality Reduction Impact on Breast Cancer Classifi... — ijcsit
A fundamental problem in machine learning is identifying the most representative subset of features from which we can construct a predictive model for a classification task. This paper presents a validation study of the effect of dimensionality reduction on the classification accuracy of mammographic images. The studied dimensionality reduction methods were: locality-preserving projection (LPP), locally linear embedding (LLE), Isometric Mapping (ISOMAP) and spectral regression (SR). We achieved high classification rates: in some combinations the classification rate was 100%, and in most cases it is about 95%. It was also found that the classification rate increases with the size of the reduced space, and the optimal value of the space dimension is 60. We proceeded to validate the obtained results by measuring validation indices such as the Xie–Beni index, the Dunn index and the Alternative Dunn index. The measurement of these indices confirms that the optimal value of the reduced space dimension is d=60.
Dimensionality Reduction and feature extraction.pptx — Sivam Chinna
Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension.
Semi-Supervised Discriminant Analysis Based On Data Structure — iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
ESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACH — cscpconf
Due to the intangible nature of "software", accurate and reliable software effort estimation is a challenge in the software industry. Very accurate estimates of software development effort are unlikely because of the inherent uncertainty in software development projects and the complex and dynamic interaction of factors that impact software development. Heterogeneity exists in software engineering datasets because the data is made available from diverse sources. It can be reduced by defining relationships between the data values, classifying them into different clusters. This study focuses on how the combination of clustering and regression techniques can reduce the problems in predictive effectiveness caused by heterogeneity of the data. Using a clustered approach creates subsets of the data with a degree of homogeneity that enhances prediction accuracy. It was also observed in this study that ridge regression performs better than the other regression techniques used in the analysis.
Similar to A Novel Algorithm for Design Tree Classification with PCA
Performance of Wideband Mobile Channel with Perfect Synchronism BPSK vs QPSK ... — Editor Jacotech
Direct-sequence code-division multiple access (DS-CDMA) is currently the subject of much research, as it offers a promising multiple-access capability for third- and fourth-generation mobile communication systems. The synchronous DS-CDMA system is well known for eliminating the effects of multiple access interference (MAI), which limits the capacity and degrades the BER performance of the system. In this paper, we investigate the bit error rate (BER) performance of a synchronous DS-CDMA system over a wideband mobile radio channel. The BER performance is affected by the difference in path length ΔL and the number of arriving signals N. Furthermore, the effect of these parameters on the synchronous DS-CDMA system is examined for different numbers of users as well as different processing gains Gp. In this environment and under the above conditions, the performance of BPSK (Binary Phase Shift Keying) and QPSK (Quadrature Phase Shift Keying) modulations is compared. The promising simulation results showed the possibility of applying this system to the wideband mobile radio channel.
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS — Editor Jacotech
With microblogging platforms such as Twitter generating huge amounts of textual data every day, the possibilities of knowledge discovery through Twitter data become increasingly relevant. Similar to the public voting mechanism on websites such as the Internet Movie Database (IMDb) that aggregates movie ratings, Twitter content contains reflections of public opinion about movies. This study aims to explore the use of Twitter content as textual data for predicting movie ratings. In this study, we extract a number of tweets and compile them to predict the rating scores of newly released movies. Predictions were made with algorithms exploring tweet polarity. In addition, this study explores several different kinds of tweet classification algorithms and movie rating algorithms. Results show that the movie ratings produced by our application are comparable to those of IMDb and Rotten Tomatoes.
Non integer order controller based robust performance analysis of a conical t... — Editor Jacotech
The design of a robust controller for any non-linear process is a challenging task because of the presence of various types of uncertainty. In this paper, various design methods for a robust PID controller for the level control of a conical tank are discussed. Among the different types of uncertainty, a structured uncertainty of 30% is introduced into the nominal plant to analyse robustness. As a first step, the level is controlled using a conventional integer-order controller for both the nominal and the uncertain system. Then, control is performed by means of a Fractional Order Proportional Integral Derivative (FOPID) controller to achieve robustness. With the help of time series parameters, a comparison is made between the conventional PID and the FOPID with respect to the output simulated using MATLAB, and the robustness is also analyzed.
FACTORS CAUSING STRESS AMONG FEMALE DOCTORS (A COMPARATIVE STUDY BETWEEN SELE... — Editor Jacotech
Working women must handle two important roles, at home and at work; balancing the two is very challenging and causes stress at different levels. Different dimensions of a working woman's life are involved in the evolution of this stress, which creates imbalance in handling family responsibilities. In the current scenario, doctors face many stressors that are peculiar to the medical profession, and doctors are required to have more competencies than before in the diagnosis and ongoing management of medical conditions. This means increased responsibilities, which may contribute to stress. Stress experienced at work can have adverse outcomes for the well-being of individual employees and the organization as a whole. This study focuses on identifying the factors causing stress among female doctors working in public and private hospitals and the association of their stress levels with the sector. A sample of 300 female doctors from an urban area participated in this study; of these, 150 each are from public and private hospitals. A self-made standardized tool based on a five-point scale was administered. Results indicate that the values were found to be 0.000 in all cases except psychosomatic problems (0.004), which is less than the p-value of 0.05, resulting in rejection of the null hypotheses and consequently revealing an association between the sector of female doctors and stress due to workload, working conditions, physical exertion, emotional exhaustion, job security, organizational support, work–family conflict, family adjustment, task demands, psychosomatic problems, patients' expectations and working hours.
ANALYSIS AND DESIGN OF MULTIPLE WATERMARKING IN A VIDEO FOR AUTHENTICATION AN... — Editor Jacotech
Watermarking techniques are employed time and again for validation and protection of digital data (images, video and audio files, digital repositories and libraries, web publishing). They help with copyright protection and against illegal copying of digital data such as video frames, and make digital data more robust and imperceptible. With the advent of the internet, the creation and delivery of digital data has grown many fold. In this scenario, a technique is needed for transferring digital data securely without changing its originality and robustness. This paper proposes a new watermarking method which involves inserting two or more digital data items or pictures into a single video frame for the purpose of protection, and replicating the same procedure over N video frames for authentication of the entire digital video. After that, the digital video is encrypted and decrypted using a motion vector bit-XOR encryption and decryption technique.
The Impact of Line Resistance on the Performance of Controllable Series Compe... — Editor Jacotech
In recent years controllable FACTS devices have been increasingly integrated into the transmission system. FACTS devices that provide series control, such as the Controllable Series Compensator (CSC), have a significant effect on the voltage stability of the electric power system. In this work the impact of line resistance on the performance of a CSC in a single-load infinite-bus (SLIB) model is investigated. The proposed framework is applied to the SLIB model, and the obtained results demonstrate that line resistance has a considerable effect on the voltage stability limits and on the performance of the CSC.
Security Strength Evaluation of Some Chaos Based Substitution-Boxes — Editor Jacotech
Recently, a number of S-boxes have been suggested, constructed using various methods such as affine transformations, gray coding, optimization and chaotic systems. It is prudent to use cryptographically strong S-boxes in the design of powerful ciphers. In this paper, we sample some widely used, recently synthesized 8×8 S-boxes and carry out a security analysis and evaluation to uncover the best candidate(s). The performance analysis is exercised against crucial measures such as nonlinearity, linear approximation probability, algebraic immunity, algebraic complexity and differential uniformity. These parameters were selected because their scores determine the security strength against cryptographic attacks such as linear cryptanalysis, algebraic attacks and differential cryptanalysis. The analysis in this work helps cryptographers, designers and researchers to choose a suitable candidate, judged over many parameters, that can be engaged in modern block encryption systems relying on an 8×8 S-box. Moreover, the analysis assists in articulating efficient S-boxes and in evaluating the attack resistance of S-boxes.
Traffic Detection System is an Android application that aims at determining the behavior of traffic in a particular location. It calculates the speed of the vehicle, and the level of congestion, or amount of traffic, is determined on the basis of sensor values. If such an obstruction is found, the driver is given the option to send messages about the heavy traffic to his/her friends. After a distinct number of repeated low speeds and brakings, the location of the vehicle (latitude and longitude) is sent to a pre-specified contact (selected in case of traffic congestion) through an SMS. The application uses the features of the Global Positioning System: the latitude and longitude of the location where the traffic jam has formed are sent to the user's friends, along with the Google map of the location, using the SMS Manager functionality of Android. The friends receiving the messages can thereby avoid taking the congested route, so the level of traffic on the congested road will decrease and the friends will reach their destinations in comparatively less time.
Performance analysis of aodv with the constraints of varying terrain area and... — Editor Jacotech
Mobile Ad Hoc Networks (MANETs) are wireless networks in which no infrastructure support is required to transfer data packets between mobile nodes. These nodes communicate in a multi-hop mode; each mobile node acts both as a host and as a router. The main job of Quality of Service (QoS) [1][2] routing in MANETs is to search for and establish routes among different mobile nodes that satisfy QoS requirements of wireless sensor networks such as PDR, average end-to-end delay and average throughput. QoS routing protocols efficient for commercial, real-time and multimedia applications are in demand for day-to-day activities [2].
Modeling of solar array and analyze the current transient response of shunt s...Editor Jacotech
Spacecraft bus voltage is regulated by power
conditioning unit using switching shunt voltage regulator having
solar array cells as the primary source of power. This source
switches between the bus loads and the shunt switch for fine
control of spacecraft bus voltage. The effect of solar array cell
capacitance [5][6] along with inductance and resistance of the
interface wires between solar cells and power conditioning
unit[1], generates damped sinusoidal currents superimposed on
the short circuit current of solar cell when shunted through
switch. The peak current stress on the shunt switch is to be
considered in the selection of shunt switch in power conditioning
unit. The analysis of current transients of shunt switch in PCU
considering actual spacecraft interface wire length by
illumination of solar panel (combination of series and parallel
solar cells) is difficult with hardware simulation. Software
simulation by modeling solar cell is carried out for a single string
(one parallel) in Pspice [6]. Since in spacecrafts number of
parallels and interface cable length are variable parameters the
analysis of current transients of shunt switch is carried out by
modeling solar array with the help of solar cell model[6] for the
actual spacecraft condition.
License plate recognition an insight to the proposed approach for plate local...Editor Jacotech
License Plate Recognition (LPR) for vehicles is an innovative and very challenging research area, due to the innumerable plate formats and the non-uniform outdoor illumination conditions under which images are acquired. Thus, most approaches developed work under certain restrictions, such as fixed illumination, a stationary background, and limited vehicle speed. Algorithms developed for LPR systems are generally composed of three significant stages: (1) localization of the license plate in an entire scene image; (2) segmentation of the characters on the plate; (3) recognition of each of the segmented characters. This paper describes a simple approach for preprocessing of the images and for the localization and extraction phase. Numerous procedures developed for LPR systems are also assessed, taking into consideration issues like processing time, computational power, and recognition rate, wherever available.
Design of airfoil using backpropagation training with mixed approachEditor Jacotech
Levenberg-Marquardt back-propagation training method has some limitations associated with over fitting and local optimum problems. Here, we proposed a new algorithm to increase the convergence speed of Backpropagation learning to design the airfoil. The aerodynamic force coefficients corresponding to series of airfoil are stored in a database along with the airfoil coordinates. A feedforward neural network is created with aerodynamic coefficient as input to produce the airfoil coordinates as output. In the proposed algorithm, for output layer, we used the cost function having linear & nonlinear error terms then for the hidden layer, we used steepest descent cost function. Results indicate that this mixed approach greatly enhances the training of artificial neural network and may accurately predict airfoil profile.
Ant colony optimization based routing algorithm in various wireless sensor ne...Editor Jacotech
Wireless Sensor Networks (WSNs) face several issues and challenges due to limited battery backup, limited computation capability, and limited communication capability. These issues and challenges must be taken care of while designing algorithms to increase the network lifetime of a WSN. Routing, the act of moving information from a source to a destination, is one of the vital issues in a WSN. The Ant Colony Optimization (ACO) algorithm is a probabilistic technique for solving computational problems that can be used to find optimal paths through graphs. A short route is progressively reinforced and therefore becomes more attractive. The foraging behavior and optimal route-finding capability of ants are the inspiration for ACO-based algorithms in WSNs. Ants wander randomly in search of food from their nest. While moving, ants lay down a pheromone trail on the ground. This chemical pheromone evaporates with time, and ants have the ability to smell it. When selecting their path, ants tend to choose, with high probability, the paths with strong pheromone concentrations. As soon as an ant finds a food source, it carries some of the food back to the nest. While returning, the quantity of pheromone that the ant lays down may depend on the quantity and quality of the food. The pheromone trails lead other ants toward the food source, and the path with the strongest pheromone concentration, which the ants follow, is the shortest path between their nest and the food source. This paper surveys ACO-based routing in networking domains such as Wireless Sensor Networks and Mobile Ad Hoc Networks.
An efficient ant optimized multipath routing in wireless sensor networkEditor Jacotech
Today, the Wireless Sensor Network is increasingly gaining popularity and importance, and is an interesting and stimulating area of research. WSNs are now applied in object tracking and environmental monitoring applications. This paper presents a self-optimized multipath routing algorithm for WSNs which considers definite parameters like delay, throughput, and loss, and generates outcomes that maximize the data throughput rate while minimizing delay and loss. The algorithm is based on the ant optimization technique, which finds an optimal and organized route for the WSN; to avoid congestion in the WSN, the algorithm also incorporates multipath capability.
A mobile monitoring and alert sms system with remote configuration – a case s...Editor Jacotech
One of parents' main concerns nowadays is to know their children's whereabouts. Some applications exist to address this issue, and most of them rely on an internet connection, which makes the solution expensive. In this paper we present a low-cost solution, based on SMS, with the ability to remotely configure the child-monitoring process. We also present the architecture and the full flowchart of the child application's behavior whenever an SMS is received. This case study uses Android and the more recent location API, the Fused Location Provider. For obvious reasons, security has been a concern, which resulted in a configuration module in the child application to specify authorized senders.
Leader Election Approach: A Comparison and SurveyEditor Jacotech
In distributed system, the coordinator is needed to manage the use of the resources in the shared environment. Many algorithms have been proposed for the same. They have various positive and negative parts. Here we will discuss those issues which ensure the efficiency of the algorithm for election leader. Here a comparison will be provided to show the advantages and disadvantages of different election algorithms. The comparison would be based on the number of messages passing and the order of time complexity.
Acetabularia Information For Class 9 .docxvaibhavrinwa19
Acetabularia acetabulum is a single-celled green alga that in its vegetative state is morphologically differentiated into a basal rhizoid and an axially elongated stalk, which bears whorls of branching hairs. The single diploid nucleus resides in the rhizoid.
Safalta Digital Marketing Institute in Noida provides complete programs that encompass a wide range of digital marketing components, including search engine optimization, digital communication marketing, pay-per-click marketing, content marketing, web analytics, and more. These courses are designed for students who want a comprehensive understanding of digital marketing strategies. Safalta Digital Marketing Institute in Noida is a first choice for young individuals or students who are looking to start their careers in the field of digital advertising. The institute offers specialized courses and certification for beginners, providing thorough training in areas such as SEO, digital communication marketing, and PPC training in Noida. After finishing the program, students receive certifications recognised by top universities, setting a strong foundation for a successful career in digital marketing.
Read| The latest issue of The Challenger is here! We are thrilled to announce that our school paper has qualified for the NATIONAL SCHOOLS PRESS CONFERENCE (NSPC) 2024. Thank you for your unwavering support and trust. Dive into the stories that made us stand out!
This slide is special for master students (MIBS & MIFB) in UUM. Also useful for readers who are interested in the topic of contemporary Islamic banking.
Thinking of getting a dog? Be aware that breeds like Pit Bulls, Rottweilers, and German Shepherds can be loyal and dangerous. Proper training and socialization are crucial to preventing aggressive behaviors. Ensure safety by understanding their needs and always supervising interactions. Stay safe, and enjoy your furry friends!
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...Levi Shapiro
Letter from the Congress of the United States regarding Anti-Semitism sent June 3rd to MIT President Sally Kornbluth, MIT Corp Chair, Mark Gorenberg
Dear Dr. Kornbluth and Mr. Gorenberg,
The US House of Representatives is deeply concerned by ongoing and pervasive acts of antisemitic
harassment and intimidation at the Massachusetts Institute of Technology (MIT). Failing to act decisively to ensure a safe learning environment for all students would be a grave dereliction of your responsibilities as President of MIT and Chair of the MIT Corporation.
This Congress will not stand idly by and allow an environment hostile to Jewish students to persist. The House believes that your institution is in violation of Title VI of the Civil Rights Act, and the inability or
unwillingness to rectify this violation through action requires accountability.
Postsecondary education is a unique opportunity for students to learn and have their ideas and beliefs challenged. However, universities receiving hundreds of millions of federal funds annually have denied
students that opportunity and have been hijacked to become venues for the promotion of terrorism, antisemitic harassment and intimidation, unlawful encampments, and in some cases, assaults and riots.
The House of Representatives will not countenance the use of federal funds to indoctrinate students into hateful, antisemitic, anti-American supporters of terrorism. Investigations into campus antisemitism by the Committee on Education and the Workforce and the Committee on Ways and Means have been expanded into a Congress-wide probe across all relevant jurisdictions to address this national crisis. The undersigned Committees will conduct oversight into the use of federal funds at MIT and its learning environment under authorities granted to each Committee.
• The Committee on Education and the Workforce has been investigating your institution since December 7, 2023. The Committee has broad jurisdiction over postsecondary education, including its compliance with Title VI of the Civil Rights Act, campus safety concerns over disruptions to the learning environment, and the awarding of federal student aid under the Higher Education Act.
• The Committee on Oversight and Accountability is investigating the sources of funding and other support flowing to groups espousing pro-Hamas propaganda and engaged in antisemitic harassment and intimidation of students. The Committee on Oversight and Accountability is the principal oversight committee of the US House of Representatives and has broad authority to investigate “any matter” at “any time” under House Rule X.
• The Committee on Ways and Means has been investigating several universities since November 15, 2023, when the Committee held a hearing entitled From Ivory Towers to Dark Corners: Investigating the Nexus Between Antisemitism, Tax-Exempt Universities, and Terror Financing. The Committee followed the hearing with letters to those institutions on January 10, 202
How to Build a Module in Odoo 17 Using the Scaffold MethodCeline George
Odoo provides an option for creating a module by using a single line command. By using this command the user can make a whole structure of a module. It is very easy for a beginner to make a module. There is no need to make each file manually. This slide will show how to create a module using the scaffold method.
A workshop hosted by the South African Journal of Science aimed at postgraduate students and early career researchers with little or no experience in writing and publishing journal articles.
This presentation includes basic of PCOS their pathology and treatment and also Ayurveda correlation of PCOS and Ayurvedic line of treatment mentioned in classics.
A review of the growth of the Israel Genealogy Research Association Database Collection for the last 12 months. Our collection is now passed the 3 million mark and still growing. See which archives have contributed the most. See the different types of records we have, and which years have had records added. You can also see what we have for the future.
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Dr. Vinod Kumar Kanvaria
Exploiting Artificial Intelligence for Empowering Researchers and Faculty,
International FDP on Fundamentals of Research in Social Sciences
at Integral University, Lucknow, 06.06.2024
By Dr. Vinod Kumar Kanvaria
A Novel Algorithm for Design Tree Classification with PCA
Journal of Advanced Computing and Communication Technologies (ISSN: 2347-2804)
Volume No. 1, Issue No. 1, August 2013
A Novel Algorithm for Design Tree Classification with PCA
By
Ravindra Gupta, Gajendra Singh, Gaurav Kumar Saxena
SSSIST Sehore, India
sravindra_p84@rediffmail.com, gajendrasingh86@rediffmail.com, gaurav.saxena18@rediffmail.com
ABSTRACT
Classification techniques are used to categorize data from the current dataset into classes, on the basis of a training set of data containing observations (or instances) whose category membership is known. The decision tree classification algorithm by itself does not work well for high-dimensional data. To improve efficiency, in our work we apply principal component analysis together with a linear transformation to the original data set for dimensionality reduction; the classification algorithm is then applied for a better solution.
Keywords
Classification, Dimensionality Reduction, Principal Component Analysis, Decision Tree.

1. INTRODUCTION
Classification techniques are used to assign some combination of input variables, which are measured or preset, into predefined classes. Over the last decade, technologies such as image processing, gene microarray studies, and textual data analysis have emerged, producing huge amounts of data and growth in data dimensionality, which can hurt the efficiency of classification.

Principal component analysis is appropriate when you have obtained measures on a number of observed variables and wish to develop a smaller number of artificial variables (called principal components) that will account for most of the variance in the observed variables. The principal components may then be used as predictor or criterion variables in subsequent analyses.

Classification
Classification is the process of generalizing the data according to different instances; in other words, classification is the task of assigning objects to one of several predefined categories. It is a pervasive problem that encompasses many diverse applications, such as detecting spam email messages based upon the message header and content, categorizing cells as malignant or benign based upon the results of MRI scans, and classifying galaxies based upon their shapes.

A classification model is useful to distinguish between objects of different classes and also to predict the class label of unknown records. Various methods of classification are decision tree classifiers, rule-based classifiers, neural networks, support vector machines, and naïve Bayes classifiers.

Dimensionality Reduction
In the current age, where many applications are available to us, we have a large number of datasets in multidimensional space. Dimensionality reduction deals with the transformation of a high-dimensional dataset into a low-dimensional space, while retaining most of the useful structure in the original data. Dimension reduction plays an important role in improving the efficiency of a classifier.
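The overall idea of reducing dimensionality first and then classifying can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes NumPy is available, uses a synthetic two-class data set, and substitutes a single-split decision stump for a full decision tree.

```python
import numpy as np

rng = np.random.default_rng(1)
# toy two-class data in 10 dimensions; the classes differ by a mean shift
n = 100
X0 = rng.normal(size=(n, 10))
X1 = rng.normal(size=(n, 10)) + 2.0
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

# step 1: PCA via SVD of the zero-mean data; keep the first component
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
v = Vt[0]
if (Xc[y == 1] @ v).mean() < (Xc[y == 0] @ v).mean():
    v = -v                       # orient so class 1 projects higher
z = Xc @ v                       # 1-D representation of every sample

# step 2: a one-split "decision stump" stands in for a full decision tree
best_err, best_t = min(
    (np.mean((z > t).astype(int) != y), t) for t in np.sort(z)
)
pred = (z > best_t).astype(int)
print("training error:", best_err)
```

Because the mean shift dominates the variance, the first principal component aligns with the class separation, and even a one-dimensional classifier does well after the projection.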
Dimensionality reduction can be done by linear and nonlinear methods, which are described as follows.

Linear techniques perform dimensionality reduction by embedding the data into a subspace of lower dimensionality, typically in an unsupervised way. Although various techniques exist to do so, PCA is by far the most popular linear technique [1].

In mathematical terms, PCA attempts to find a linear mapping M that maximizes M^T cov(X) M, where cov(X) is the covariance matrix of the data X. It can be shown that this linear mapping is formed by the d principal eigenvectors (i.e., principal components) of the covariance matrix of the zero-mean data. Hence, PCA solves the eigenproblem

cov(X) M = λM

for the d principal eigenvalues λ. The low-dimensional representations y_i of the data points x_i are computed by mapping them onto the linear basis M, i.e., Y = (X − X̄)M.
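The eigenproblem above can be sketched directly (assuming NumPy; the data and the choice d = 2 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# 200 samples in 5 dimensions, with most variance in the first two axes
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 0.5, 0.3, 0.1])

Xc = X - X.mean(axis=0)                # zero-mean data
C = np.cov(Xc, rowvar=False)           # covariance matrix cov(X)
eigvals, eigvecs = np.linalg.eigh(C)   # solves cov(X) M = lambda M

# keep the d = 2 principal eigenvectors (largest eigenvalues)
order = np.argsort(eigvals)[::-1]
M = eigvecs[:, order[:2]]

Y = Xc @ M                             # low-dimensional representation
print(Y.shape)                         # (200, 2)
```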
PCA has been successfully applied in a large number of
domains such as face recognition [2], coin classification [3],
and seismic series analysis [4].
The main drawback of PCA is that the size of the covariance matrix is proportional to the dimensionality of the data points. As a result, the computation of the eigenvectors might be infeasible for very high-dimensional data.

Nonlinear techniques for dimensionality reduction, on the other hand, must preserve both the global and local properties of the original data in the low-dimensional representation, and must perform a global alignment of a mixture of linear models.

Principal Component Analysis
Principal Component Analysis (PCA), also known as the Karhunen-Loeve Transform, is a classical statistical method. It identifies, for a set of data vectors, the axes along which the correlation between components of the data vectors is most clearly exhibited [43].

Suppose there is a data set M = { Xi | i = 1, …, N }, where each Xi is an n-dimensional column vector X = (x1, …, xn)^T. The mean of the data vectors is µ = <X>, where < > stands for the average over the data set. The data set can be represented by a matrix D = (X1, X2, …, XN). The covariance matrix C of D has elements Cij calculated as

Cij = <(xi − µi)(xj − µj)>          (1.1)

By solving the characteristic equation of the covariance matrix C, we obtain the eigenvectors, which specify the axes with the properties described above, and the corresponding eigenvalues, which indicate the variance of the data set along those axes. Therefore, just by inspecting the eigenvalues, we can easily find out along which axes the data set has little or no spread. Hence, the principal axes and the eigenvalues give a good reflection of the linear interdependence between the components of the data vectors. After choosing the eigenvectors with the largest eigenvalues, we form a subspace A in which the data set has the most significant amount of variance. Thus, the dimensionality of the data can be reduced by means of this property of PCA.

Suppose B has all the eigenvectors of the covariance matrix C as its row vectors; then a data vector X can be transformed as

Y = B(X − µ)          (1.2)

By applying this projection to the original data set D, we obtain an uncorrelated vector set {Y}. Since B is an orthogonal matrix, the inverse of B is equal to the transpose of B (B^T), so X can be recovered from Y:

X = B^T Y + µ          (1.3)

In the subspace A consisting of the eigenvectors having the largest eigenvalues, we can apply a transformation similar to (1.2) to obtain the low-dimensional code vector Y' of X:

Y' = A(X − µ)          (1.4)

And we can reconstruct X in a way similar to (1.3):

X' = A^T Y' + µ          (1.5)

By (1.4) and (1.5), we project the original data vector into the low-dimensional space spanned by A, and then use the low-dimensional code to reconstruct the original data. This projection minimizes the mean-square error between the data and the reconstruction of the data.

In practice, dimensionality reduction using PCA can be done efficiently through singular value decomposition (SVD) [43].
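The projection (1.4), the reconstruction (1.5), and the SVD route can be sketched together (NumPy assumed; the data and the choice of two components are illustrative; rows hold samples, so the column-vector formulas appear transposed):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4)) * np.array([4.0, 2.0, 0.2, 0.1])
mu = X.mean(axis=0)

# SVD of the zero-mean data: rows of Vt are the eigenvectors of C
U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
A = Vt[:2]                  # the 2 leading eigenvectors as rows

Y = (X - mu) @ A.T          # (1.4): Y' = A (X - mu), per sample
X_rec = Y @ A + mu          # (1.5): X' = A^T Y' + mu, per sample

mse = np.mean((X - X_rec) ** 2)
print(round(float(mse), 4))  # small: the top-2 axes carry most variance
```

With all four eigenvectors retained, the reconstruction is exact, which is the content of (1.2)-(1.3).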
2. RELATED WORK
Dimensionality reduction techniques can be broadly divided
into several categories: (i) feature selection and feature
weighting, (ii) feature extraction, and (iii) feature grouping.
Dimension Reduction Techniques Based on Feature Selection and Feature Weighting
Feature selection, or subset selection, deals with the selection of a subset of features that is most appropriate for the task at hand; feature weighting [5] then assigns a weight (usually between zero and one) to each feature to indicate the salience of the individual features. Most of the literature on feature selection/weighting pertains to supervised learning.
Feature selection and weighting algorithms can be categorized as filters, wrappers, and embedded methods [6].
The filter approaches evaluate the relevance of each feature subset using the data set alone, independent of any learning task. RELIEF and its enhancements are representatives of this class; the basic idea behind them is to assign feature weights based on the consistency of the feature value in the k nearest neighbors of every data point. In wrapper algorithms, a learning algorithm evaluates the quality of each feature subset. Specifically, a learning algorithm (e.g., a nearest neighbor classifier, a decision tree, a naive Bayes method) is run using a feature subset, and the feature subset is assessed by some estimate related to the classification accuracy. Often the learning algorithm is regarded as a "black box", in the sense that the wrapper algorithm operates independently of the internal mechanism of the classifier. An example is [9], which used genetic search to adjust the feature weights for the best performance of the k nearest neighbor classifier. In the third approach (called embedded in [10]), the learning algorithm is modified to have the ability to perform feature selection. There is no longer an explicit feature selection step; the algorithm automatically builds a classifier with a small number of features. LASSO (least absolute shrinkage and selection operator) [11] is a good example in this category.
LASSO modifies ordinary least squares by including a constraint on the L1 norm of the weight coefficients. This has the effect of preferring sparse regression coefficients (a formal statement of this is proved in [12,13]), effectively performing feature selection. Another example is MARS (multivariate adaptive regression splines) [13], where choosing the variables used in the polynomial splines effectively performs variable selection. Automatic relevance determination in neural networks [14] is another example, which uses a Bayesian approach to estimate the weights in the neural network as well as the relevancy parameters, which can be interpreted as feature weights.
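The L1-shrinkage effect behind LASSO can be sketched with coordinate descent and soft-thresholding; this is an illustration of the idea, not the algorithm of [11], and assumes NumPy plus a synthetic regression problem where only two features matter.

```python
import numpy as np

def soft_threshold(x, lam):
    # the closed-form coordinate update induced by the L1 penalty
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for (1/2n)||y - Xw||^2 + lam * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            r = y - X @ w + X[:, j] * w[j]   # residual excluding feature j
            rho = X[:, j] @ r / n
            w[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j] / n)
    return w

rng = np.random.default_rng(3)
n, p = 100, 8
X = rng.normal(size=(n, p))
w_true = np.array([2.0, -1.5, 0, 0, 0, 0, 0, 0])  # only 2 relevant features
y = X @ w_true + 0.1 * rng.normal(size=n)

w = lasso_cd(X, y, lam=0.2)
print(np.nonzero(np.abs(w) > 1e-8)[0])  # indices of surviving coefficients
```

The penalty drives the coefficients of the irrelevant features to exactly zero, which is the sparsity property described in the text.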
Filter approaches are generally faster because they are
classifier independent and require computation of simple
quantities. They scale well with the number of features, and
many of them can comfortably handle thousands of features.
Wrapper approaches, on the other hand, can be superior in
accuracy when compared with filters, which ignore the
properties of the learning task at hand [15]. They are,
however, computationally more demanding, and do not scale
very well with the number of features. This is because training
and evaluating a classifier with many features can be slow,
and the performance of a traditional classifier with a large
number of features may not be reliable enough to estimate the
utilities of individual features. To get the best results from
filters and wrappers, the user can apply a filter-type technique
as preprocessing to cut down the feature set to a moderate
size, and then use a wrapper algorithm to determine a small
yet discriminative feature subset. Some state-of-the-art feature
selection algorithms indeed adopt this approach, as observed
in. “Embedded" algorithms are highly specialized and it is
difficult to compare them in general with filter and wrapper
approaches.
Quality of a Feature Subset
Feature selection/weighting algorithms can also be classified according to the definition of "relevance", or how the quality of a feature subset is assessed. Five definitions of relevance are given in. Information-theoretic methods are often used to evaluate features, because the mutual information between a relevant feature and the class labels should be high [15]. Non-parametric methods can be used to estimate the probability density function of a continuous feature, which is used to compute the mutual information. Correlation is also used frequently to evaluate features. A feature can be declared irrelevant if it is conditionally independent of the class labels given other features. The concept of a Markov blanket is used to formalize this notion of irrelevancy in. RELIEF uses the consistency of the feature value in the k nearest neighbors of every data point to quantify the usefulness of a feature.
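The RELIEF idea of weighting features by nearest-neighbor consistency can be sketched roughly as follows; this is a simplified binary variant with one nearest hit and one nearest miss per sample (NumPy assumed, data synthetic).

```python
import numpy as np

def relief_weights(X, y):
    """Simplified RELIEF: reward features that agree with the nearest
    same-class sample (hit) and differ from the nearest other-class
    sample (miss)."""
    n, p = X.shape
    w = np.zeros(p)
    for i in range(n):
        d = np.abs(X - X[i]).sum(axis=1)    # L1 distances to sample i
        d[i] = np.inf                       # exclude the sample itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, d, np.inf))
        miss = np.argmin(np.where(~same, d, np.inf))
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n

rng = np.random.default_rng(4)
n = 200
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 3))
X[:, 0] += 2.0 * y        # feature 0 is class-relevant; 1 and 2 are noise
w = relief_weights(X, y)
print(w)                  # feature 0 receives the largest weight
```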
Optimization Strategy
Given a definition of feature relevancy, a feature selection algorithm can search for the most relevant feature subset. Because of the lack of monotonicity (with respect to the features) of many feature relevance criteria, a combinatorial search through the space of all possible feature subsets is required. Usually, heuristic (non-exhaustive) methods have to be adopted, because the size of this space is exponential in the number of features. In this case, one generally loses any guarantee of optimality of the selected feature subset. In the last few years, different types of heuristics, such as sequential forward or backward searches, floating search, beam search, bi-directional search, and genetic search, have been proposed [16]. A comparison of some of these search heuristics can be found in [17]. In the context of regression, sequential forward search is commonly referred to as stepwise regression. Forward stagewise regression is a generalization of stepwise regression, where a feature is only "partially" selected by increasing the corresponding regression coefficient by a fixed amount. It is closely related to LASSO [18], and this relationship was established via least angle regression (LARS), another interesting algorithm in its own right, in [20].

Wrapper algorithms are typically coupled with such heuristic search. Feature weighting algorithms do not involve a heuristic search, because the weights for all features are computed simultaneously. Embedded approaches also do not require any heuristic search. The best parameters are usually estimated by optimizing an explicit objective function. Depending on the form of the objective function, different optimization methods are used. In the case of LASSO, for example, a general quadratic programming solver, the homotopy method [12], a modified version of LARS, or the EM algorithm can be used to estimate the parameters.
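A sequential forward search of the kind described above is a plain greedy loop; in this sketch the scoring function stands in for whatever wrapper criterion (e.g., cross-validated accuracy) is actually used, and the least-squares criterion and data are illustrative (NumPy assumed).

```python
import numpy as np

def forward_select(score, n_features, k):
    """Greedily add the feature that most improves score(subset)
    until k features are selected."""
    selected = []
    remaining = list(range(n_features))
    while len(selected) < k:
        best = max(remaining, key=lambda j: score(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# toy criterion: negative MSE of a least-squares fit on the chosen columns
rng = np.random.default_rng(5)
X = rng.normal(size=(100, 6))
y = 3 * X[:, 2] + 1.5 * X[:, 4] + 0.1 * rng.normal(size=100)

def score(cols):
    Z = X[:, cols]
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return -np.mean((y - Z @ w) ** 2)   # higher (less negative) is better

print(forward_select(score, 6, 2))      # picks the truly relevant columns
```

As the text notes, this greedy strategy carries no optimality guarantee in general; here it succeeds only because the relevant features dominate the criterion individually.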
Feature Extraction
In feature extraction, a small set of new features is constructed
by a general mapping from the high dimensional data. The
mapping often involves all the available features. Many
techniques for feature extraction have been proposed. In this
section, we describe some of the linear feature extraction
methods, i.e., the extracted features can be written as linear
combinations of the original features. Nonlinear feature
extraction techniques are more sophisticated. The readers may
also find two recent surveys [55,56] useful in this regard.
Unsupervised Techniques
"Unsupervised" here refers to the fact that these feature extraction techniques are based only on the data (pattern matrix), without pattern label information.
Principal component analysis (PCA), also known as
Karhunen-Loeve Transform or simply KL transform, is
arguably the most popular feature extraction method. PCA
finds a hyperplane such that, upon projection onto it, the
data variance is best preserved. The optimal hyperplane is
spanned by the principal components, which are the
leading eigenvectors of the sample covariance matrix.
Features extracted by PCA consist of the projection of the
data points to different principal components. When the
features extracted by PCA are used for linear regression, it is
sometimes called “principal component regression". Recently,
sparse variants of PCA have also been proposed [20], where
each principal component has only a small number of nonzero coefficients.
Journal of Advanced Computing and Communication Technologies (ISSN: 2347-2804), Volume 1, Issue 1, August 2013
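The PCA feature extraction just described can be sketched in a few lines; the toy data and the number of retained components are illustrative:

```python
import numpy as np

def pca(X, k):
    """Project X onto its k leading principal components."""
    Xc = X - X.mean(axis=0)                 # center the data
    C = np.cov(Xc, rowvar=False)            # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)    # eigh: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # sort by decreasing variance
    T = eigvecs[:, order[:k]]               # transformation matrix
    return Xc @ T, eigvals[order]

rng = np.random.default_rng(3)
# toy data whose variance is dominated by the first axis
X = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.2])
Y, eigvals = pca(X, 2)
```

The variance of each extracted feature equals the corresponding eigenvalue, which is the sense in which projection onto the leading eigenvectors "best preserves" variance.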
Factor analysis (FA) can also be used for feature extraction.
FA assumes that the observed high dimensional data points
are the results of a linear function (expressed by the factor
loading matrix) on a few unobserved random variables,
together with uncorrelated zero-mean noise. After estimating
the factor loading matrix and the variance of the noise, the
factor scores for different patterns can be estimated and serve
as a low-dimensional representation of the data.
Supervised Techniques
Labels in classification and response variables in regression
can be used together with the data to extract more relevant
features. Linear discriminant analysis (LDA) finds the
projection direction such that the ratio of between-class
variance to within-class variance is the largest. When there
are more than two classes, multiple discriminant analysis
(MDA) finds a sequence of projection directions that
maximizes a similar criterion. Features are extracted by
projecting the data points to these directions.
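For the two-class case described above, the LDA direction has a closed form, w proportional to Sw^-1 (m1 - m0), where Sw is the within-class scatter and m0, m1 the class means; a minimal sketch (the toy Gaussian classes are illustrative):

```python
import numpy as np

def lda_direction(X0, X1):
    """Fisher discriminant direction for two classes."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # within-class scatter (sum of per-class covariances)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(4)
X0 = rng.normal(size=(300, 2))
X1 = rng.normal(size=(300, 2)) + np.array([3.0, 0.0])  # classes differ along axis 0
w = lda_direction(X0, X1)
```

Projecting both classes onto w yields the one-dimensional feature with the largest between-class to within-class variance ratio.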
Partial least squares (PLS) can be viewed as the regression
counterpart of LDA. Instead of extracting features by
retaining maximum data variance as in principal component
regression, PLS finds projection directions that can best
explain the response variable. Canonical correlation analysis
(CCA) is a closely related technique that finds projection
directions that maximize the correlation between the response
variables and the features extracted by projection.
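The contrast between PCA and PLS directions can be seen in a toy example with one high-variance irrelevant feature and one low-variance predictive feature; only the first PLS weight vector (proportional to X^T y for centered data) is computed here, and the data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
# feature 0: large variance, unrelated to y; feature 1: small variance, drives y
X = rng.normal(size=(1000, 2)) * np.array([10.0, 1.0])
X -= X.mean(axis=0)
y = X[:, 1] + 0.1 * rng.normal(size=1000)
y -= y.mean()

# first PCA direction: leading eigenvector of the covariance matrix
_, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
w_pca = vecs[:, -1]

# first PLS direction: maximizes covariance with the response
w_pls = X.T @ y
w_pls /= np.linalg.norm(w_pls)
```

PCA picks the high-variance but useless feature, while PLS picks the direction that actually explains the response, which is exactly the distinction drawn in the text.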
Feature Grouping
In feature grouping, new features are constructed by
combining several existing features. Feature grouping can be
useful in scenarios where it can be more meaningful to
combine features due to the characteristics of the domain. For
example, in a text categorization task different words can have
similar meanings and combining them into a single word class
is more appropriate. Another example is the use of the power
spectrum in signal classification, where each feature corresponds
to the energy in a certain frequency range. The preset boundaries
of the frequency ranges can be sub-optimal, and the sum of
features from adjacent frequency ranges can lead to a more
meaningful feature by capturing the energy in a wider
frequency range. For gene expression data, genes that are
similar may share a common biological pathway and the
grouping of predictive genes can be of interest to biologists
[21].
The most direct way to perform feature grouping is to cluster
the features (instead of the objects) of a data set. Feature
clustering is not new; the SAS/STAT procedure “varclus" for
variable clustering was written before 1990 [19]. It is
performed by applying the hierarchical clustering method on a
similarity matrix of different features, which is derived by,
say, the Pearson's correlation coefficient. This scheme was
probably first proposed in [14], which also suggested
summarizing one group of features by a single feature in order
to achieve dimensionality reduction. Recently, feature
clustering has been applied to boost the performance in text
categorization. Techniques based on distribution clustering
[15], mutual information and information bottleneck have also
been proposed.
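A minimal sketch of this kind of feature clustering, using a simple transitive merge on absolute Pearson correlation in place of full hierarchical clustering; the threshold and the toy data are illustrative assumptions:

```python
import numpy as np

def group_features(X, threshold=0.8):
    """Group features whose |Pearson correlation| exceeds the threshold
    (transitively, i.e. single-linkage on the similarity graph)."""
    d = X.shape[1]
    R = np.abs(np.corrcoef(X, rowvar=False))
    group = list(range(d))                 # each feature starts in its own group
    for i in range(d):
        for j in range(i + 1, d):
            if R[i, j] >= threshold:       # merge the two groups
                gi, gj = group[i], group[j]
                group = [gi if g == gj else g for g in group]
    # relabel group ids consecutively
    labels = {g: k for k, g in enumerate(dict.fromkeys(group))}
    return [labels[g] for g in group]

rng = np.random.default_rng(5)
base = rng.normal(size=(400, 2))
# features 0,1 are noisy copies of one signal; features 2,3 of another
X = np.column_stack([base[:, 0], base[:, 0] + 0.1 * rng.normal(size=400),
                     base[:, 1], base[:, 1] + 0.1 * rng.normal(size=400)])
groups = group_features(X)
```

Each group could then be summarized by a single representative feature, as suggested in the text, to achieve dimensionality reduction.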
Features can also be clustered together with the objects. As
mentioned in [18], this idea has been known under different
names in the literature, including "bi-clustering", "co-clustering",
"double-clustering", "coupled clustering", and
"simultaneous clustering". A bipartite graph can be used to
represent the relationship between objects and features, and
the partitioning of the graph can be used to cluster the objects
and the features simultaneously. Information bottleneck can
also be used for this task.
In the context of regression, feature grouping can be achieved
indirectly by favoring similar features to have similar
coefficients. This can be done by combining ridge regression
with LASSO, leading to the elastic net regression algorithm
[19].
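The ridge-plus-LASSO combination just described can be sketched with naive coordinate descent; this is an illustrative implementation under the assumption of standardized feature columns, not the reference algorithm of [19]:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * max(abs(v) - t, 0.0)

def elastic_net_cd(X, y, lam, alpha, n_sweeps=200):
    """Coordinate descent for
    (1/2n)||y - X b||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2),
    assuming zero-mean, unit-variance columns of X."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_sweeps):
        for j in range(d):
            # partial residual with feature j excluded
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r / n
            # L1 part gives the soft threshold, L2 part shrinks the result
            beta[j] = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
    return beta

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize columns
y = 3.0 * X[:, 0]
beta = elastic_net_cd(X, y, lam=0.1, alpha=0.5)
```

With alpha=1 this reduces to LASSO and with alpha=0 to ridge regression; intermediate values keep LASSO's sparsity while letting correlated features share similar coefficients, which is what enables the grouping effect.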
3. PROPOSED WORK
In the proposed work, the dimensionality of the data set is
first reduced by applying Principal Component Analysis (PCA),
and then a decision tree is built.
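The entropy and information-gain quantities that the tree-building steps of this algorithm rely on can be sketched minimally; the toy attribute and class arrays are illustrative:

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a class-label sequence."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(attribute, labels):
    """Entropy reduction from splitting the samples on a discrete attribute."""
    total = entropy(labels)
    n = len(labels)
    cond = 0.0
    for v in set(attribute):
        subset = [l for a, l in zip(attribute, labels) if a == v]
        cond += len(subset) / n * entropy(subset)   # weighted subset entropy
    return total - cond

# toy example: the attribute perfectly predicts the class
attr = ['a', 'a', 'b', 'b']
cls = ['+', '+', '-', '-']
gain = information_gain(attr, cls)
```

An attribute that perfectly separates the classes has gain equal to the full class entropy, while an uninformative attribute has gain zero; the gain ratio divides the gain by the split information to penalize many-valued attributes.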
Combining principal component analysis and the decision tree
according to their characteristics, we first filter the sample
data set, then extract the main attributes, and finally
construct a new decision tree using the following algorithm.
The details are as follows:
Step 1: Convert the data source into a matrix and identify the main attributes by principal component analysis:
1) Compute the mean vector of all the feature vectors (xmean).
2) Compute xi - xmean for each feature vector xi.
3) Compute the covariance matrix (square and symmetric).
4) Compute the eigenvalues and eigenvectors.
5) Normalize the eigenvectors.
6) Form the transformation matrix T, containing the eigenvectors sorted so that those corresponding to the largest eigenvalues come first.
7) Apply the transformation: y = transpose(T) * (xi - xmean).
8) Reduce y:
- Calculate the loss incurred when removing some features of y.
- Sort the eigenvalues in descending order and remove features starting from those with the smallest eigenvalues (from the bottom).
- Remove the features according to the required loss (set them to zero) to obtain the reduced y.
In this way, the number of features is reduced.
Step 2: Perform data cleaning on the data source and generate the training sets for the decision tree by converting continuous data into discrete variables.
Step 3: Compute the information (entropy) of the training sample sets, the information (entropy) of each attribute, the split information, the information gain, and the information gain ratio, where S stands for the training sample sets and A denotes the attributes.
Step 4: Each possible value of the root may correspond to a subset. Apply Step 3 recursively to generate a decision tree for each sample subset until the observed data of each divided subset are the same in the classification attributes.
Step 5: Extract the classification rules from the constructed decision tree and classify new data sets.
4. CONCLUSION
Classification algorithms are used in the various applications discussed above, but are not limited to them; they are also useful for natural-disaster applications such as cloudbursts and earthquakes. Principal component analysis (PCA) is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables, reducing dimensionality to enhance the classification rate.
5. REFERENCES
[1] D.J.C. MacKay, "Bayesian Non-Linear Modelling for the Prediction Competition". In ASHRAE Transactions, V.100, Pt.2, pages 1053-1062, Atlanta, Georgia, 1994.
[2] R. Kohavi and G. John, "Wrappers for Feature Subset Selection". Artificial Intelligence, 97(1-2):273-324, 1997.
[3] I. Guyon, S. Gunn, A. Ben-Hur, and G. Dror, "Result Analysis of the NIPS 2003 Feature Selection Challenge". In Advances in Neural Information Processing Systems, pages 545-552. MIT Press, 2005.
[4] A.L. Blum and P. Langley, "Selection of Relevant Features and Examples in Machine Learning". Artificial Intelligence, 97(1-2):245-271, 1997.
[5] R. Kohavi and G. John, "Wrappers for Feature Subset Selection". Artificial Intelligence, 97(1-2):273-324, 1997.
[6] M.L. Raymer, W.F. Punch, E.D. Goodman, L.A. Kuhn, and A.K. Jain, "Dimensionality Reduction Using Genetic Algorithms". IEEE Transactions on Evolutionary Computation, 4(2):164-171, July 2000.
[7] I. Guyon and A. Elisseeff, "An Introduction to Variable and Feature Selection". Journal of Machine Learning Research, 3:1157-1182, March 2003.
[8] R. Tibshirani, "Regression Shrinkage and Selection via the Lasso". Journal of the Royal Statistical Society, Series B (Methodological), 58(1):267-288, 1996.
[9] D. Donoho, "For Most Large Underdetermined Systems of Linear Equations, the Minimal L1-Norm Solution Is Also the Sparsest Solution". Technical report, Department of Statistics, Stanford University, 2004.
[10] J.H. Friedman, "Multivariate Adaptive Regression Splines". Annals of Statistics, 19(1):1-67, March 1991.
[11] M.S. Venkatarajan and W. Braun, "New Quantitative Descriptors of Amino Acids Based on Multidimensional Scaling of a Large Number of Physical-Chemical Properties". Journal of Molecular Modeling, 7(12):445-453, 2004.
[12] D. Wettschereck, D.W. Aha, and T. Mohri, "A Review and Empirical Evaluation of Feature Weighting Methods for a Class of Lazy Learning Algorithms". Artificial Intelligence Review, 11(1-5):273-314, 1997.
[13] R. Battiti, "Using Mutual Information for Selecting Features in Supervised Neural Net Learning". IEEE Transactions on Neural Networks, 5(4):537-550, July 1994.
[14] N.P. Hughes and L. Tarassenko, "Novel Signal Shape Descriptors Through Wavelet Transforms and Dimensionality Reduction". In Wavelet Applications in Signal and Image Processing, pages 763-773, 2003.
[15] N. Kwak and C.-H. Choi, "Input Feature Selection by Mutual Information Based on Parzen Window". IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(12):1667-1671, December 2002.
[16] K. Torkkola, "Feature Extraction by Non-Parametric Mutual Information Maximization". Journal of Machine Learning Research, 3:1415-1438, March 2003.
[17] O.C. Jenkins and M.J. Mataric, "Deriving Action and Behavior Primitives from Human Motion Data". In Proc. International Conference on Intelligent Robots and Systems, volume 3, pages 2551-2556, 2002.
[18] L. Yu and H. Liu, "Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution". In Proc. 20th International Conference on Machine Learning, pages 856-863. AAAI Press, 2003.
[19] M.A. Hall, "Correlation-Based Feature Selection for Discrete and Numeric Class Machine Learning". In Proc. 17th International Conference on Machine Learning, pages 359-366. Morgan Kaufmann, 2000.
[20] D. Koller and M. Sahami, "Toward Optimal Feature Selection". In Proc. 13th International Conference on Machine Learning, pages 284-292. Morgan Kaufmann, 1996.
[21] K. Kira and L. Rendell, "The Feature Selection Problem: Traditional Methods and a New Algorithm". In Proc. 10th National Conference on Artificial Intelligence, pages 129-134, Menlo Park, CA, USA, 1992. AAAI Press.
[22] I. Kononenko, "Estimating Attributes: Analysis and Extensions of RELIEF". In Proc. 7th European Conference on Machine Learning, pages 171-182, 1994.
[23] R. Caruana and D. Freitag, "Greedy Attribute Selection". In Proc. 11th International Conference on Machine Learning, pages 28-36. Morgan Kaufmann, 1994.
[24] R. Kohavi and G. John, "Wrappers for Feature Subset Selection". Artificial Intelligence, 97(1-2):273-324, 1997.