This document summarizes a study on optimal feature selection for VMware ESXi 5.1. Data was collected from a VMware server running multiple virtual machines under different loads. Feature selection algorithms such as CFS, RELIEF, chi-square, wrapper, and rough-set methods were used to extract the optimal set of parameters. The K-means clustering algorithm grouped the VMs, and cluster quality was assessed with the Davies-Bouldin and Dunn indices. The features of the best cluster were taken as the optimal parameter set for resource usage analysis and for VM selection during migration.
Optimal Feature Selection from VMware ESXi 5.1 Feature Set (ijccmsjournal)
A study of the VMware ESXi 5.1 server was carried out to find the optimal set of parameters that indicate usage of the server's different resources. Feature selection algorithms were used to extract the optimum parameter set from data obtained from the server via the esxtop command. Multiple virtual machines (VMs) run on the server. The K-means algorithm clusters the VMs, and the goodness of each cluster is measured by the Davies-Bouldin and Dunn indices. The best cluster is identified by these indices, and its features are taken as the set of optimal parameters.
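The clustering-and-validation step described above can be sketched in a few lines. The following is an illustrative reconstruction using scikit-learn, with synthetic numbers standing in for the esxtop resource metrics; it is not the study's actual data or code.

```python
# Cluster "VMs" by resource usage and pick the cluster count with the
# best (lowest) Davies-Bouldin index. Feature values are synthetic
# stand-ins for esxtop metrics (CPU, memory, disk, network usage).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(0)
# 30 "VMs" x 4 resource-usage features, drawn around two load profiles
vms = np.vstack([rng.normal(0.2, 0.05, (15, 4)),
                 rng.normal(0.8, 0.05, (15, 4))])

best_k, best_db = None, float("inf")
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vms)
    db = davies_bouldin_score(vms, labels)  # lower is better
    if db < best_db:
        best_k, best_db = k, db

print(best_k, round(best_db, 3))
```

With two clearly separated load profiles, the index should favor two clusters; on real esxtop data the right k would come from the same scan.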
A Modified KS-test for Feature Selection (IOSR Journals)
This document proposes a feature selection algorithm based on a modified Kolmogorov-Smirnov (KS) test. It begins with an overview of feature selection and its benefits, then discusses the two common approaches: filter and wrapper models. The core proposal is a fast redundancy-removal filter built on a modified KS statistic that uses class-label information to compare feature pairs. The algorithm is compared with other methods such as Correlation Feature Selection (CFS) and the KS-Correlation Based Filter (KS-CBF), and the efficiency and effectiveness of the methods are tested on standard classifiers. In most cases, the proposed approach achieved equal or better classification accuracy than using all features or than the other algorithms.
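The paper's modified statistic is its own contribution and is not reproduced here, but the intuition the filter builds on can be shown with the standard two-sample KS statistic from SciPy: a near-duplicate feature pair yields a small KS distance, a genuinely different pair a large one.

```python
# Flag a redundant feature pair via the (unmodified) two-sample KS
# statistic. The paper's version additionally conditions on class
# labels; this sketch uses the plain statistic for illustration.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
f1 = rng.normal(0, 1, 500)
f2 = f1 + rng.normal(0, 0.01, 500)   # near-duplicate of f1 -> redundant
f3 = rng.normal(3, 1, 500)           # distinct distribution -> keep both

d_redundant = ks_2samp(f1, f2).statistic   # small D: similar distributions
d_distinct = ks_2samp(f1, f3).statistic    # large D: dissimilar
print(round(d_redundant, 3), round(d_distinct, 3))
```

A filter would drop one feature of any pair whose KS distance falls below a chosen threshold.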
A Combined Approach for Feature Subset Selection and Size Reduction for High ... (IJERA Editor)
Selection of relevant features from a given feature set is one of the important issues in data mining and classification. A dataset may contain many features, but not all of them are necessarily important for a particular analysis or decision-making task: features may share common information or be completely irrelevant to the processing at hand. This generally happens because of improper feature selection during dataset formation or because of incomplete information about the observed system. In both cases the data contain features that only increase the processing burden and may ultimately distort the outcome of the analysis. Methods are therefore required to detect and remove such features. This paper presents an efficient approach that not only removes the unimportant features but also reduces the overall size of the dataset. The proposed algorithm uses information theory to measure the information gain of each feature and a minimum spanning tree to group similar features; fuzzy c-means clustering is then used to remove similar entries from the dataset. The algorithm is tested with an SVM classifier on 35 publicly available real-world high-dimensional datasets, and the results show that it not only reduces the feature set and data length but also improves classifier performance.
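The information-gain relevance measure that this line of work relies on can be shown concretely. The sketch below hand-rolls IG(Y; X) = H(Y) - H(Y | X) on a tiny invented categorical example; the feature values and labels are illustrative only.

```python
# Information gain of a categorical feature with respect to the labels:
# the reduction in label entropy once the feature's value is known.
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature, labels):
    n = len(labels)
    h_cond = 0.0
    for v in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == v]
        h_cond += len(subset) / n * entropy(subset)
    return entropy(labels) - h_cond

y       = [1, 1, 0, 0]
useful  = ["a", "a", "b", "b"]   # perfectly predicts y
useless = ["a", "b", "a", "b"]   # independent of y
print(info_gain(useful, y), info_gain(useless, y))
```

The perfectly predictive feature scores one full bit of gain, the independent one zero, which is exactly the ordering a relevance filter needs.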
Evaluation of a hybrid method for constructing multiple SVM kernels (infopapers)
Dana Simian, Florin Stoica, Evaluation of a hybrid method for constructing multiple SVM kernels, Recent Advances in Computers, Proceedings of the 13th WSEAS International Conference on Computers, Recent Advances in Computer Engineering Series, WSEAS Press, Rodos, Greece, July 23-25, 2009, ISSN: 1790-5109, ISBN: 978-960-474-099-4, pp. 619-623
Network Based Intrusion Detection System using Filter Based Feature Selection... (IRJET Journal)
This document proposes a mutual information-based feature selection algorithm to select optimal features for network intrusion detection classification. The algorithm aims to handle dependent data features better than previous methods. It evaluates the effectiveness of the algorithm on network intrusion detection cases. Most previous methods suffer from low detection rates and high false alarm rates. The proposed approach uses feature selection, filtering, clustering, and clustering ensemble techniques in a hybrid data mining method to achieve high accuracy for intrusion detection systems.
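A minimal filter in the spirit described, ranking features by mutual information with the class label and keeping the top-k, can be sketched with scikit-learn. The "traffic" data here is synthetic, not a real intrusion detection dataset, and the specific dependency handling of the paper's algorithm is not reproduced.

```python
# Rank features by mutual information with a binary label
# (0 = normal, 1 = attack) and keep the single best one.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(2)
y = rng.integers(0, 2, 400)                 # class labels
informative = y + rng.normal(0, 0.3, 400)   # correlates with the label
noise = rng.normal(0, 1, (400, 3))          # irrelevant features
X = np.column_stack([informative, noise])

mi = mutual_info_classif(X, y, random_state=0)
top = np.argsort(mi)[::-1][:1]              # indices of top-1 feature(s)
print(top)
```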
ON FEATURE SELECTION ALGORITHMS AND FEATURE SELECTION STABILITY MEASURES: A C... (ijcsit)
Data mining is indispensable for business organizations: it extracts, from huge volumes of stored data, useful information that supports managerial decision making in a competitive environment. Due to the day-to-day advancements in information and communication technology, the data collected from e-commerce and e-governance are mostly high dimensional, and data mining prefers small datasets to high-dimensional ones. Feature selection is an important dimensionality reduction technique. The subsets selected in subsequent iterations of feature selection should be the same or similar even under small perturbations of the dataset; this property is called selection stability, and it has recently become an important topic in the research community. Selection stability has been quantified with various measures. This paper analyses the choice of a suitable search method and stability measure for feature selection algorithms, as well as the influence of the dataset's characteristics, since the best approach is highly problem dependent.
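One common stability measure in this line of work is the Jaccard index between the subsets chosen on perturbed copies of the data: identical subsets score 1.0, disjoint ones 0.0. The feature names below are invented for illustration.

```python
# Jaccard similarity between two selected feature subsets:
# |intersection| / |union|, a simple selection-stability score.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

run1 = {"cpu", "mem", "net"}    # features picked on the full data
run2 = {"cpu", "mem", "disk"}   # features picked after a perturbation
print(jaccard(run1, run2))      # 2 shared features out of 4 total
```

Averaging this score over many perturbed runs gives a single stability figure for a selection algorithm.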
The document describes a proposed fast clustering-based feature subset selection (FAST) algorithm for high-dimensional data. The FAST algorithm works in two steps: 1) clustering features using minimum spanning tree methods, and 2) selecting the most representative feature from each cluster. This identifies useful and independent features efficiently. Experimental results on 35 real-world datasets demonstrate that FAST produces smaller feature subsets and improves classifier performance compared to other feature selection algorithms.
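The two FAST steps can be compressed into a sketch. Note the assumptions: correlation distance is used as the edge weight and an arbitrary 0.5 cut threshold forms the clusters, whereas the actual FAST algorithm uses symmetric uncertainty and its own cut rule; the data is synthetic.

```python
# Step 1: build an MST over the features (edge weight = 1 - |corr|)
# and cut weak links to form feature clusters.
# Step 2: keep one representative feature per cluster.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

rng = np.random.default_rng(3)
base1, base2 = rng.normal(size=200), rng.normal(size=200)
X = np.column_stack([base1, base1 + rng.normal(0, 0.05, 200),   # cluster A
                     base2, base2 + rng.normal(0, 0.05, 200)])  # cluster B

dist = 1 - np.abs(np.corrcoef(X, rowvar=False))   # correlation distance
mst = minimum_spanning_tree(dist).toarray()
mst[mst > 0.5] = 0                                # step 1: cut weak links
n, labels = connected_components(mst + mst.T, directed=False)
reps = [int(np.flatnonzero(labels == c)[0]) for c in range(n)]  # step 2
print(n, reps)
```

The two nearly duplicated feature pairs collapse into two clusters, and one feature from each survives as the selected subset.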
An integrated mechanism for feature selection (sai kumar)
This document discusses an integrated mechanism for feature selection and fuzzy rule extraction for classification problems. It aims to select a useful set of features that can solve the classification problem while designing an interpretable fuzzy rule-based system. The mechanism is an embedded feature selection method, meaning feature selection is integrated into the rule base formation process. This allows it to account for possible nonlinear interactions between features and between features and the modeling tool. The authors demonstrate the effectiveness of the proposed method on several datasets.
Feature selection is an essential issue in machine learning: it discards unnecessary or redundant features from the dataset. This paper introduces a new kernel-function-based feature selection method, evaluated on 16 real-world datasets from the UCI repository, with k-means clustering used as the classifier under radial basis function (RBF) and polynomial kernels. After sorting the features with the new method, the top 75 percent were examined and evaluated with 10-fold cross-validation, comparing accuracy, F1-score, and running time. The experiments show that the performance of the RBF-based selection varies with the kernel parameter, unlike the polynomial-kernel variant. Moreover, the RBF-based selection has a faster running time than the polynomial-kernel version. The proposed method also achieves higher accuracy and F1-score, by up to a 40 percent margin on several datasets, than commonly used techniques such as the Fisher score, the chi-square test, and the Laplacian score. The method can therefore be considered for feature selection.
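The paper's scoring rule is its own, but the two kernels involved, and the parameter sensitivity the abstract mentions, are easy to show directly: the RBF kernel's notion of similarity changes sharply with its bandwidth parameter gamma, while the polynomial kernel has no such parameter.

```python
# The two kernel functions compared in the paper, evaluated on one
# fixed pair of points to show the gamma sensitivity of the RBF kernel.
import numpy as np

def rbf(x, z, gamma):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def poly(x, z, degree=2, c=1.0):
    return (np.dot(x, z) + c) ** degree

x, z = np.array([0.0, 0.0]), np.array([1.0, 1.0])
print(rbf(x, z, gamma=0.1), rbf(x, z, gamma=10.0))  # same pair, very different
print(poly(x, z))
```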
EFFECTIVENESS PREDICTION OF MEMORY BASED CLASSIFIERS FOR THE CLASSIFICATION O... (cscpconf)
Classification is a step-by-step practice for allocating a given piece of input to one of a set of categories, and it is an essential machine learning technique. Classification problems occur in many application areas and need to be solved; different types of classification algorithms, such as memory-based, tree-based, and rule-based, are widely used. This work studies the performance of different memory-based classifiers on multivariate datasets from the UCI machine learning repository using an open-source machine learning tool. A comparison of the memory-based classifiers is given, together with a practical guideline for selecting the algorithm best suited to a classification task. In addition, some empirical criteria for describing and evaluating the best classifiers are discussed.
A Threshold Fuzzy Entropy Based Feature Selection: Comparative Study (IJMER)
Feature selection is one of the most common and critical tasks in database classification. It reduces computational cost by removing insignificant and unwanted features, which in turn makes the diagnosis process accurate and comprehensible. This paper presents a measurement of feature relevance based on fuzzy entropy, tested with a radial basis function (RBF) network classifier, bagging (bootstrap aggregating), boosting, and stacking on datasets from various fields. Twenty benchmark datasets from the UCI Machine Learning Repository and KDD were used for this work. The accuracies obtained from these classification processes show that the proposed method is capable of producing good, accurate results with fewer features than the original datasets.
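The relevance idea can be sketched with De Luca and Termini's fuzzy entropy over a feature's class-membership values: memberships near 0 or 1 mean the feature separates the classes crisply (low entropy, keep it), while memberships near 0.5 mean ambiguity (high entropy, drop it). The membership values below are invented for illustration; the paper's thresholding rule is not reproduced.

```python
# Normalized De Luca-Termini fuzzy entropy of a list of membership
# values in [0, 1]; 0 = perfectly crisp, 1 = maximally ambiguous.
import math

def fuzzy_entropy(memberships):
    h = 0.0
    for mu in memberships:
        if 0.0 < mu < 1.0:
            h -= mu * math.log2(mu) + (1 - mu) * math.log2(1 - mu)
    return h / len(memberships)

crisp = [0.05, 0.95, 0.02, 0.98]   # discriminative feature
vague = [0.45, 0.55, 0.50, 0.52]   # uninformative feature
print(round(fuzzy_entropy(crisp), 3), round(fuzzy_entropy(vague), 3))
```

A threshold on this score then separates the features worth keeping from those to discard.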
A fast clustering based feature subset selection algorithm for high-dimension... (JPINFOTECH JAYAPRAKASH)
The document proposes a fast clustering-based feature selection algorithm (FAST) to efficiently and effectively select useful feature subsets from high-dimensional data. FAST works in two steps: (1) it clusters features using minimum spanning trees, partitioning clusters so each represents a subset of independent features; (2) it selects the most representative feature from each cluster to form the output subset. Experiments on 35 real-world datasets show FAST not only selects smaller feature subsets but also improves performance of four common classifiers compared to other feature selection methods.
Performance Comparison of Machine Learning Algorithms (Dinusha Dilanka)
This paper compares the performance of two classification algorithms. It is useful to differentiate algorithms by computational performance rather than classification accuracy alone: although the algorithms' classification accuracy is similar, their computational performance can differ significantly, and this can affect the final results. The objective of this paper is therefore a comparative analysis of two machine learning algorithms, K-nearest neighbor classification and logistic regression. A large dataset of 7981 data points and 112 features is considered, and the performance of the two algorithms is examined: the processing time and accuracy of the techniques are estimated on the collected dataset with a 60 percent training and 40 percent testing split. The paper is organized as follows. Section I contains the introduction and background analysis of the research; Section II the problem statement. Section III briefly describes the application, the data analysis process, the testing environment, and the methodology of the analysis. Section IV comprises the results of the two algorithms. Finally, the paper concludes with a discussion of future research directions that address the problems in the current methodology.
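A minimal version of the described comparison can be sketched with scikit-learn: fit k-NN and logistic regression on the same 60/40 split and record both accuracy and wall time. The paper's 7981-point, 112-feature dataset is not public here, so a generated dataset of the same width stands in for it.

```python
# Compare k-NN and logistic regression on accuracy AND wall time,
# using a synthetic stand-in for the paper's 112-feature dataset.
import time
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, n_features=112,
                           n_informative=20, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, train_size=0.6, random_state=0)

results = {}
for name, model in [("knn", KNeighborsClassifier()),
                    ("logreg", LogisticRegression(max_iter=1000))]:
    start = time.perf_counter()
    model.fit(Xtr, ytr)
    results[name] = (model.score(Xte, yte), time.perf_counter() - start)

for name, (acc, secs) in results.items():
    print(f"{name}: accuracy={acc:.3f}, time={secs:.3f}s")
```

The timing side is the point: even when the two accuracies land close together, training and prediction cost can differ by a large factor, which is the paper's argument for comparing on both axes.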
A general frame for building optimal multiple SVM kernels (infopapers)
Dana Simian, Florin Stoica, A General Frame for Building Optimal Multiple SVM Kernels, Large-Scale Scientific Computing, Lecture Notes in Computer Science, 2012, Volume 7116/2012, 256-263, DOI: 10.1007/978-3-642-29843-1_29
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER (IJCSEA Journal)
A comparison study of algorithms is very much required before implementing them for the needs of any organization. Such comparisons depend on various parameters, such as data frequency, data types, and the relationships among the attributes in a given dataset. A number of learning and classification algorithms are available to analyse data, learn patterns, and categorize data, but the problem is to find the best algorithm for the problem and the desired output. The desired result has always been higher accuracy in predicting future values or events from the given dataset. The algorithms taken for the comparison study are neural net, SVM, naive Bayes, BFT, and decision stump. These are among the most influential data mining algorithms in the research community and are widely used in the field of knowledge discovery and data mining.
An enhanced feature selection for ... (Iaetsd)
The document discusses feature selection techniques for machine learning applications. It proposes an Enhanced Fast Clustering-based Feature Selection (EFAST) algorithm. The EFAST algorithm works in two steps: 1) features are clustered using graph-theoretic clustering methods, and 2) the most relevant representative feature strongly correlated with the target categories is selected from each cluster to form the optimal feature subset. Features from different clusters are relatively independent, so EFAST has a high chance of selecting a set of useful and independent features. The algorithm was tested on real-world data and showed improved performance over other feature selection methods by reducing features while also improving classifier performance.
This document compares four clustering algorithms (K-means, hierarchical, EM, and density-based) using the WEKA tool. It applies the algorithms to a dataset of software classes and evaluates them based on number of clusters, time to build models, squared errors, and log likelihood. The results show that K-means performs best in terms of time to build models, while density-based clustering performs best in terms of log likelihood. Overall, the document concludes that K-means is the best algorithm for this dataset because it balances low runtime and good clustering accuracy.
A report on designing a model for improving CPU Scheduling by using Machine L... (MuskanRath1)
Disclaimer: Please let me know in case some of the portions of the article match your research. I would include the link to your research in the description section of my article.
Description:
The main concern of our paper is a proposed model for improving CPU scheduling on a uniprocessor system. The model is implemented in a low-level (assembly) language, and Linux is used for the implementation because it is an open-source environment with an editable kernel.
There are several methods to predict the length of CPU bursts, such as exponential averaging; however, these methods may not give accurate or reliable predicted values. In this paper we propose a machine learning (ML) approach to estimate the length of CPU bursts for processes. We use Bayesian theory in our model as a classifier tool that decides which process in the ready queue executes first. The proposed approach selects the most significant attributes of a process using feature selection techniques and then predicts the CPU burst for the process in the grid. Furthermore, applying attribute selection techniques improves performance in terms of space, time, and estimation.
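The classical predictor that the abstract contrasts its ML model with, exponential averaging, is short enough to show directly: tau(n+1) = alpha * t(n) + (1 - alpha) * tau(n), where t(n) is the observed burst length and tau(n) the previous estimate.

```python
# Exponential averaging of CPU burst lengths: each new estimate blends
# the latest observed burst with the running estimate via alpha.
def predict_bursts(observed, tau0=10.0, alpha=0.5):
    tau = tau0
    estimates = [tau]          # estimates[i] predicts burst i
    for t in observed:
        tau = alpha * t + (1 - alpha) * tau
        estimates.append(tau)
    return estimates

print(predict_bursts([6, 4, 6, 4]))
```

With alpha = 0.5 each estimate is the midpoint of the last observation and the previous estimate, so the predictor tracks the observed bursts with a lag; this fixed memory is the limitation the paper's Bayesian classifier aims to improve on.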
learning tool. A comparison of different memory based classifiers used and a practical
guideline for selecting the most suited algorithm for a classification is presented. Apart fromthat some empirical criteria for describing and evaluating the best classifiers are discussed
A Threshold Fuzzy Entropy Based Feature Selection: Comparative StudyIJMER
Feature selection is one of the most common and critical tasks in database classification. It
reduces the computational cost by removing insignificant and unwanted features. Consequently, this
makes the diagnosis process accurate and comprehensible. This paper presents the measurement of
feature relevance based on fuzzy entropy, tested with Radial Basis Classifier (RBF) network,
Bagging(Bootstrap Aggregating), Boosting and stacking for various fields of datasets. Twenty
benchmarked datasets which are available in UCI Machine Learning Repository and KDD have been
used for this work. The accuracy obtained from these classification process shows that the proposed
method is capable of producing good and accurate results with fewer features than the original
datasets.
A fast clustering based feature subset selection algorithm for high-dimension...JPINFOTECH JAYAPRAKASH
The document proposes a fast clustering-based feature selection algorithm (FAST) to efficiently and effectively select useful feature subsets from high-dimensional data. FAST works in two steps: (1) it clusters features using minimum spanning trees, partitioning clusters so each represents a subset of independent features; (2) it selects the most representative feature from each cluster to form the output subset. Experiments on 35 real-world datasets show FAST not only selects smaller feature subsets but also improves performance of four common classifiers compared to other feature selection methods.
Performance Comparision of Machine Learning AlgorithmsDinusha Dilanka
In this paper Compare the performance of two
classification algorithm. I t is useful to differentiate
algorithms based on computational performance rather
than classification accuracy alone. As although
classification accuracy between the algorithms is similar,
computational performance can differ significantly and it
can affect to the final results. So the objective of this paper
is to perform a comparative analysis of two machine
learning algorithms namely, K Nearest neighbor,
classification and Logistic Regression. In this paper it
was considered a large dataset of 7981 data points and 112
features. Then the performance of the above mentioned
machine learning algorithms are examined. In this paper
the processing time and accuracy of the different machine
learning techniques are being estimated by considering the
collected data set, over a 60% for train and remaining
40% for testing. The paper is organized as follows. In
Section I, introduction and background analysis of the
research is included and in section II, problem statement.
In Section III, our application and data analyze Process,
the testing environment, and the Methodology of our
analysis are being described briefly. Section IV comprises
the results of two algorithms. Finally, the paper concludes
with a discussion of future directions for research by
eliminating the problems existing with the current
research methodology.
JAVA 2013 IEEE DATAMINING PROJECT A fast clustering based feature subset sele...IEEEGLOBALSOFTTECHNOLOGIES
To Get any Project for CSE, IT ECE, EEE Contact Me @ 09849539085, 09966235788 or mail us - ieeefinalsemprojects@gmail.com-Visit Our Website: www.finalyearprojects.org
A fast clustering based feature subset selection algorithm for high-dimension...IEEEFINALYEARPROJECTS
To Get any Project for CSE, IT ECE, EEE Contact Me @ 09849539085, 09966235788 or mail us - ieeefinalsemprojects@gmail.co¬m-Visit Our Website: www.finalyearprojects.org
A general frame for building optimal multiple SVM kernelsinfopapers
Dana Simian, Florin Stoica, A General Frame for Building Optimal Multiple SVM Kernels, Large-Scale Scientific Computing, Lecture Notes in Computer Science, 2012, Volume 7116/2012, 256-263, DOI: 10.1007/978-3-642-29843-1_29
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINERIJCSEA Journal
Comparison study of algorithms is very much required before implementing them for the needs of any
organization. The comparisons of algorithms are depending on the various parameters such as data
frequency, types of data and relationship among the attributes in a given data set. There are number of
learning and classifications algorithms are used to analyse, learn patterns and categorize data are
available. But the problem is the one to find the best algorithm according to the problem and desired
output. The desired result has always been higher accuracy in predicting future values or events from the
given dataset. Algorithms taken for the comparisons study are Neural net, SVM, Naïve Bayes, BFT and
Decision stump. These top algorithms are most influential data mining algorithms in the research
community. These algorithms have been considered and mostly used in the field of knowledge discovery
and data mining.
Iaetsd an enhanced feature selection forIaetsd Iaetsd
The document discusses feature selection techniques for machine learning applications. It proposes an Enhanced Fast Clustering-based Feature Selection (EFAST) algorithm. The EFAST algorithm works in two steps: 1) features are clustered using graph-theoretic clustering methods, and 2) the most relevant representative feature strongly correlated with the target categories is selected from each cluster to form the optimal feature subset. Features from different clusters are relatively independent, so EFAST has a high chance of selecting a set of useful and independent features. The algorithm was tested on real-world data and showed improved performance over other feature selection methods by reducing features while also improving classifier performance.
DOTNET 2013 IEEE CLOUDCOMPUTING PROJECT A fast clustering based feature subse...IEEEGLOBALSOFTTECHNOLOGIES
To Get any Project for CSE, IT ECE, EEE Contact Me @ 09849539085, 09966235788 or mail us - ieeefinalsemprojects@gmail.com-Visit Our Website: www.finalyearprojects.org
This document compares four clustering algorithms (K-means, hierarchical, EM, and density-based) using the WEKA tool. It applies the algorithms to a dataset of software classes and evaluates them based on number of clusters, time to build models, squared errors, and log likelihood. The results show that K-means performs best in terms of time to build models, while density-based clustering performs best in terms of log likelihood. Overall, the document concludes that K-means is the best algorithm for this dataset because it balances low runtime and good clustering accuracy.
A report on designing a model for improving CPU Scheduling by using Machine L...MuskanRath1
Disclaimer: Please let me know in case some of the portions of the article match your research. I would include the link to your research in the description section of my article.
Description:
The main concern of our paper describes that we are proposing a model for a uniprocessor system for improving CPU scheduling. Our model is implemented at low-level language or assembly language and LINUX is used for the implementation of the model as it is an open-source environment and its kernel is editable.
There are several methods to predict the length of the CPU bursts, such as the exponential averaging method, however, these methods may not give accurate or reliable predicted values. In this paper, we will propose a Machine Learning (ML) based on the best approach to estimate the length of the CPU bursts for processes. We will make use of Bayesian Theory for our model as a classifier tool that will decide which process will execute first in the ready queue. The proposed approach aims to select the most significant attributes of the process using feature selection techniques and then predicts the CPU-burst for the process in the grid. Furthermore, applying attribute selection techniques improves the performance in terms of space, time, and estimation.
Similar to Optimal Feature Selection from VMware ESXi 5.1 Feature Set (20)
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
6th International Conference on Machine Learning & Applications (CMLA 2024)ClaraZara1
6th International Conference on Machine Learning & Applications (CMLA 2024) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of on Machine Learning & Applications.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELgerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...University of Maribor
Slides from talk presenting:
Aleš Zamuda: Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking.
Presentation at IcETRAN 2024 session:
"Inter-Society Networking Panel GRSS/MTT-S/CIS
Panel Session: Promoting Connection and Cooperation"
IEEE Slovenia GRSS
IEEE Serbia and Montenegro MTT-S
IEEE Slovenia CIS
11TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTING ENGINEERING
3-6 June 2024, Niš, Serbia
Understanding Inductive Bias in Machine LearningSUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSIJNSA Journal
The smart irrigation system represents an innovative approach to optimize water usage in agricultural and landscaping practices. The integration of cutting-edge technologies, including sensors, actuators, and data analysis, empowers this system to provide accurate monitoring and control of irrigation processes by leveraging real-time environmental conditions. The main objective of a smart irrigation system is to optimize water efficiency, minimize expenses, and foster the adoption of sustainable water management methods. This paper conducts a systematic risk assessment by exploring the key components/assets and their functionalities in the smart irrigation system. The crucial role of sensors in gathering data on soil moisture, weather patterns, and plant well-being is emphasized in this system. These sensors enable intelligent decision-making in irrigation scheduling and water distribution, leading to enhanced water efficiency and sustainable water management practices. Actuators enable automated control of irrigation devices, ensuring precise and targeted water delivery to plants. Additionally, the paper addresses the potential threat and vulnerabilities associated with smart irrigation systems. It discusses limitations of the system, such as power constraints and computational capabilities, and calculates the potential security risks. The paper suggests possible risk treatment methods for effective secure system operation. In conclusion, the paper emphasizes the significant benefits of implementing smart irrigation systems, including improved water conservation, increased crop yield, and reduced environmental impact. Additionally, based on the security analysis conducted, the paper recommends the implementation of countermeasures and security approaches to address vulnerabilities and ensure the integrity and reliability of the system. 
By incorporating these measures, smart irrigation technology can revolutionize water management practices in agriculture, promoting sustainability, resource efficiency, and safeguarding against potential security threats.
Low power architecture of logic gates using adiabatic techniquesnooriasukmaningtyas
The growing significance of portable systems to limit power consumption in ultra-large-scale-integration chips of very high density, has recently led to rapid and inventive progresses in low-power design. The most effective technique is adiabatic logic circuit design in energy-efficient hardware. This paper presents two adiabatic approaches for the design of low power circuits, modified positive feedback adiabatic logic (modified PFAL) and the other is direct current diode based positive feedback adiabatic logic (DC-DB PFAL). Logic gates are the preliminary components in any digital circuit design. By improving the performance of basic gates, one can improvise the whole system performance. In this paper proposed circuit design of the low power architecture of OR/NOR, AND/NAND, and XOR/XNOR gates are presented using the said approaches and their results are analyzed for powerdissipation, delay, power-delay-product and rise time and compared with the other adiabatic techniques along with the conventional complementary metal oxide semiconductor (CMOS) designs reported in the literature. It has been found that the designs with DC-DB PFAL technique outperform with the percentage improvement of 65% for NOR gate and 7% for NAND gate and 34% for XNOR gate over the modified PFAL techniques at 10 MHz respectively.
We have compiled the most important slides from each speaker's presentation. This year’s compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS) Vol.3, No.3, September 2014
DOI:10.5121/ijccms.2014.3301
Optimal Feature Selection from VMware ESXi 5.1
Feature Set
Amartya Hatua
Bangalore, India
Abstract
A study of the VMware ESXi 5.1 server has been carried out to find the optimal set of parameters which
suggests the usage of different resources of the server. Feature selection algorithms have been used to
extract the optimum set of parameters from the data obtained from the VMware ESXi 5.1 server using the
esxtop command. Multiple virtual machines (VMs) are running on this server. The K-means algorithm is
used for clustering the VMs. The goodness of each cluster is determined by the Davies Bouldin index and
the Dunn index. The best cluster is then identified using these indices, and the features of the best cluster
are considered as the set of optimal parameters.
Index Terms
Clustering, Feature selection, K-means clustering algorithm, VMware ESXi 5.1
I.INTRODUCTION
Feature selection, or dimensionality reduction, for supervised or unsupervised inductive learning
generates a set of candidate features. This set of features can be defined in three different ways:
a) the number or size of features is specified and the features optimize an evaluation measure;
b) the subset of features must satisfy some predefined restriction on the evaluation measure;
c) the subset has the best combination of size and value as measured by the evaluation measure.
Improvement of inductive learning is the prime objective of feature selection. The improvement
can be measured in terms of learning speed, ease of representation or ability of generalization. To
improve the results of an inducer, the associated key factors are diminishing the volume of storage,
reduction of the noise generated by irrelevant or redundant features, and elimination of useless
knowledge.
The motivation behind this work comes from the selection of VMs during the VM migration
process, or load balancing, of a VMware ESXi 5.1 server. If a server is overloaded with VMs, then
some of the VMs should be migrated to another server. The selection of VMs depends on the load
of the different resources. For every resource, many parameters or features are present in the
VMware ESXi 5.1 server. The set of all parameters and their values can be observed using the
esxtop command. The selection of the optimal set of parameters has been carried out in this study.
A high value of a set of parameters for a particular VM indicates high usage or load of the
corresponding resource. Depending on the types of load of the different VMs, the VMs which are
eligible for migration can be selected.
In the following sections, the feature selection methods, clustering method, cluster analysis
methods, experiment methodology, results and conclusion are discussed in detail.
II.FEATURE SELECTION METHODS
A feature selection algorithm (FSA) is a computational solution that is defined by a certain degree
of relevance. However, the relevance of a feature always depends on the objective of the inductive
learning algorithm which uses the reduced set of features. For induction, an irrelevant feature is not
useful; at the same time, not all relevant features are always required [1]. Some of the feature
selection methods are: the wrapper method, filter methods, rough set methods, the schemata search
method, feature selection using similarity, the Principal Component Analysis method, and the
filter-wrapper hybrid method. Different types of filtering methods are also available for feature
selection, such as the FOCUS algorithm, the RELIEF algorithm, the Las Vegas algorithm, Markov
blanket filtering, the decision tree method and genetic algorithms [2], [3], [4].
A.Correlation-based Feature Selection method
Correlation-based Feature Selection (CFS) [5] comes under the category of filter algorithms. In this
type of algorithm the subset of features is chosen based on ranks. The rank is determined by a
correlation-based heuristic evaluation function. The evaluation function selects features which are
highly correlated with the class and at the same time uncorrelated with each other. A feature is
selected mainly based on the extent to which it predicts classes in regions of the instance space not
predicted by other features. CFS's feature subset evaluation function is repeated here (with slightly
modified notation) for ease of reference:

Merit_S = k · r_cf / sqrt(k + k(k − 1) · r_ff)

Merit_S represents the heuristic merit of a feature subset S containing k features, r_cf denotes the
mean feature-class correlation (f ∈ S), and r_ff stands for the average feature-feature
inter-correlation. The numerator of the equation gives an indication of how predictive of the class
the subset is; the denominator represents the redundancy among the features. CFS uses a best-first
heuristic search strategy. If five consecutive fully expanded subsets do not give a better result, the
process stops; in other words, that is the stopping condition of the process.
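As an illustration, the merit computation above can be sketched in a few lines of Python (not from the paper; it assumes the pairwise correlations have already been estimated and are passed in as plain lists):

```python
import math

def cfs_merit(feature_class_corrs, feature_feature_corrs):
    """CFS heuristic merit of a feature subset.

    feature_class_corrs: |correlation| between each feature in the subset and the class.
    feature_feature_corrs: |correlation| for each pair of features in the subset.
    """
    k = len(feature_class_corrs)
    r_cf = sum(feature_class_corrs) / k              # mean feature-class correlation
    r_ff = (sum(feature_feature_corrs) / len(feature_feature_corrs)
            if feature_feature_corrs else 0.0)       # mean feature-feature inter-correlation
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

# Three features, highly class-correlated and mutually uncorrelated -> high merit.
print(round(cfs_merit([0.8, 0.7, 0.9], [0.1, 0.2, 0.15]), 3))
```

A subset whose features correlate strongly with the class but weakly with each other maximizes this ratio, which is exactly the trade-off the heuristic encodes.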
B.Wrapper method
The wrapper methodology, popularized by Kohavi and John (1997), provides an efficient way to
determine the required subset of features, regardless of the machine learning algorithm. Here the
machine learning algorithm is considered as a perfect black box. In practice, one needs to define:
(i) the procedure to search for the best subset among all possible variable subsets; (ii) the
procedure to evaluate the prediction performance of a learning machine, which helps to guide the
search and reach the stopping condition; and (iii) the selection of the predictor. The wrapper
method performs a search in the space of all possible subsets.

The use of machine learning algorithms makes the wrapper method remarkably universal and
simple. On the other hand, when the number of features is very large, this method requires an
exhaustive search over O(2^n) subsets, where n is the number of features. The complexity of the
algorithm is therefore O(2^n), so the algorithm is not efficient when n is very large.
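The exhaustive wrapper search can be sketched as follows; this is a toy illustration, where the `evaluate` callback and the scoring lambda are hypothetical stand-ins for the black-box learner (e.g. cross-validated accuracy):

```python
from itertools import combinations

def wrapper_select(features, evaluate):
    """Exhaustive wrapper search: score every non-empty subset of features with
    the black-box evaluator and keep the best one (O(2^n) subsets for n features)."""
    best_subset, best_score = None, float("-inf")
    for r in range(1, len(features) + 1):
        for subset in combinations(features, r):
            score = evaluate(subset)        # the learner is treated as a black box
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score

# Hypothetical evaluator: rewards the pair ("cpu", "mem"), penalizes subset size.
toy_score = lambda s: len(set(s) & {"cpu", "mem"}) - 0.1 * len(s)
print(wrapper_select(["cpu", "mem", "net", "disk"], toy_score))
```

The double loop makes the O(2^n) cost visible: four features already mean fifteen evaluator calls, and each additional feature doubles the count.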
C.Filter Algorithm
Filter algorithms determine the subset of optimal features by measuring the intrinsic properties of
the data. The algorithm calculates a relevance score for each feature. Features having low
relevance score are removed. Afterwards, the subset of relevant features is used as input in
different machine learning algorithms.
RELIEF Algorithm: RELIEF assigns a weight to every feature depending on the relevance of that
feature in the given context. The RELIEF algorithm follows the filter method to determine the
minimal set of features. It samples instances from the training data randomly and computes each
feature's relevance value based on the near-hit (the nearest neighbour of the same class) and the
near-miss (the nearest neighbour of a different class) [6].
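A minimal sketch of the RELIEF weight update (not the paper's implementation), assuming numeric features and a single near-hit and near-miss per sampled instance:

```python
import random

def relief(data, labels, n_samples=100, seed=0):
    """RELIEF sketch: for each sampled instance, find its near-hit (closest
    same-class instance) and near-miss (closest other-class instance), then
    update each feature weight by  w += |x - miss| - |x - hit|."""
    rng = random.Random(seed)
    n_features = len(data[0])
    weights = [0.0] * n_features
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    for _ in range(n_samples):
        i = rng.randrange(len(data))
        x, y = data[i], labels[i]
        hit = min((j for j in range(len(data)) if j != i and labels[j] == y),
                  key=lambda j: dist(x, data[j]))
        miss = min((j for j in range(len(data)) if labels[j] != y),
                   key=lambda j: dist(x, data[j]))
        for f in range(n_features):
            weights[f] += abs(x[f] - data[miss][f]) - abs(x[f] - data[hit][f])
    return weights

# Feature 0 separates the classes, feature 1 is noise -> weight 0 should dominate.
data = [(0.0, 0.5), (0.1, 0.9), (1.0, 0.4), (0.9, 0.8)]
labels = [0, 0, 1, 1]
w = relief(data, labels, n_samples=50)
print(w[0] > w[1])
```

Features that differ across classes (large near-miss distance) but agree within a class (small near-hit distance) accumulate positive weight, which is how RELIEF ranks relevance.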
D.Rough Set Method
In Velayutham [7] Rough Set Theory (RST) is used as a tool to discover data dependencies and to
find the subset of relevant features.
The rough set is an approximation of a set by two concepts known as the lower and upper
approximations. The lower approximation contains the domain objects which certainly belong to
the subset of interest, while the upper approximation describes the objects which possibly belong
to the subset. Approximations are made with respect to a particular subset of features or
parameters.

Consider an information system I = (U, A ∪ {d}). In the information system U is the universe, a
non-empty finite set of objects; A represents a non-empty finite set of condition attributes; and d
represents the decision attribute. Each attribute a ∈ A has a corresponding function f_a : U → V_a,
where V_a represents the set of values of a. If P ⊆ A, then there is an associated equivalence
relation:

IND(P) = {(x, y) ∈ U × U | ∀ a ∈ P, f_a(x) = f_a(y)}

The partition of U generated by IND(P) is denoted U/P. If (x, y) ∈ IND(P) holds good, then x and
y cannot be distinguished by the attributes from P. The equivalence classes of the P-indiscernibility
relation are denoted [x]_P. Considering X ⊆ U, the P-lower approximation P̲X of set X and the
P-upper approximation P̄X of set X can be defined as:

P̲X = {x ∈ U | [x]_P ⊆ X}
P̄X = {x ∈ U | [x]_P ∩ X ≠ ∅}

Let P, Q ⊆ A be equivalence relations over U; then the positive region is defined as:

POS_P(Q) = ∪_{X ∈ U/Q} P̲X

The positive region of the partition U/Q with respect to P is the set of all objects of U that can be
certainly classified into blocks of the partition U/Q by means of P. Q depends on P in a degree k,
where

k = γ_P(Q) = |POS_P(Q)| / |U|

If k = 1, Q depends totally on P; if 0 < k < 1, Q depends partially on P; and if k = 0, Q does not
depend on P.
The Unsupervised Quick Reduct (USQR) algorithm attempts to calculate a reduct without
exhaustively generating all possible subsets. In Velayutham [8] the algorithm starts with an empty
set, and in each iteration one attribute is added to the set. Attributes are selected based on the
rough set dependency measure. This procedure continues until the dependency value reaches its
maximum for the dataset. The mean dependency of each attribute is calculated and the best
candidate is chosen. The data set is considered consistent if the process ends at a dependency
value of 1; otherwise it is inconsistent.
E.Chi-square feature selection

The chi-square method measures the lack of independence between a feature t and a category c.
The chi-square distribution with one degree of freedom is used to judge extremeness [9]. It is
defined as:

χ²(t, c) = N (AD − CB)² / [(A + C)(B + D)(A + B)(C + D)]

Here the feature selection metrics (four known measures and two proposed variants) are functions
of the following four dependency tuples, whose counts are denoted A, B, C and D respectively
(N = A + B + C + D is the total number of instances):

(t, c): presence of t and membership in c.
(t, c̄): presence of t and non-membership in c.
(t̄, c): absence of t and membership in c.
(t̄, c̄): absence of t and non-membership in c.
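A small sketch of the statistic computed from the four dependency counts (the example counts are made up):

```python
def chi_square(A, B, C, D):
    """Chi-square statistic for feature t and category c from the four counts:
    A = (t, c), B = (t, not c), C = (not t, c), D = (not t, not c)."""
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - C * B) ** 2 / denom if denom else 0.0

# Feature present mostly inside the category -> large statistic (strong dependence).
print(chi_square(A=40, B=10, C=10, D=40))
```

A value near zero means t and c are close to independent; large values flag features whose presence is strongly tied to the category and are therefore worth keeping.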
III.CLUSTERING METHOD
K-means clustering [9] is a method of cluster creation whose main objective is to partition n
observations into K groups/clusters such that each instance belongs to the cluster with the nearest
mean. This results in a partitioning of the data space into Voronoi cells. Mathematically, a Voronoi
diagram divides a space into a number of regions: a set of points, known as seeds, sites, or
generators, is chosen according to some predefined conditions, and around each site the points
which are closest to that site form a region. These regions are called Voronoi cells. The most
popular algorithm uses an iterative refinement technique to reach the stopping point. It is
commonly known as the K-means algorithm; in the computer science community it is also known
as Lloyd's algorithm.
IV.CLUSTER ANALYSIS METHOD
The cluster analysis methods provide measures by which the goodness of clusters can be
determined. The cluster analysis methods used here are the Davies Bouldin index and the
Dunn index (J. C. Dunn, 1974).
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS) Vol.3, No.3, September 2014
A.Davies Bouldin index
The Davies Bouldin index can be calculated by the following formula:

DB = (1/n) * sum over x of [ max over y != x of (sigma_x + sigma_y) / d(C_x, C_y) ]

In the above equation n represents the number of clusters, C_x stands for the centroid of cluster
x, sigma_x represents the average distance between each point in cluster x and the centroid, and
d(C_x, C_y) denotes the distance between the centroids of two clusters. The clustering algorithm
that produces a collection of clusters having the smallest Davies Bouldin index is considered to be
the best: a small Davies Bouldin index suggests low intra-cluster distances (high intra-cluster
similarity) and well-separated centroids.
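A direct NumPy rendering of the index as described above, on a tiny made-up example:

```python
import numpy as np

def davies_bouldin(X, labels, centroids):
    """DB = (1/n) * sum_x max_{y != x} (sigma_x + sigma_y) / d(C_x, C_y)."""
    n = len(centroids)
    # sigma_x: mean distance of cluster x's points to its centroid
    sigma = [np.linalg.norm(X[labels == x] - centroids[x], axis=1).mean()
             for x in range(n)]
    total = 0.0
    for x in range(n):
        total += max((sigma[x] + sigma[y]) /
                     np.linalg.norm(centroids[x] - centroids[y])
                     for y in range(n) if y != x)
    return total / n

# Two tight, well-separated clusters give a small index.
X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], float)
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 0.5], [10.0, 10.5]])
score = davies_bouldin(X, labels, centroids)  # about 0.0707
```

Moving the two clusters closer together, or spreading their points out, increases the score.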
B.Dunn index
The main objective of the Dunn index is to find dense and well-separated clusters. It is defined as
the ratio between the minimal inter-cluster distance and the maximal intra-cluster distance. The
following formula is used to calculate the Dunn index for a cluster partition:

D = [ min over i != j of d(i, j) ] / [ max over k of d'(k) ]

If i and j are two clusters, the inter-cluster distance d(i, j) can be taken as the distance
between the centroids of the two clusters, and the intra-cluster distance d'(k) of a cluster k can
be measured as the distance between any two elements of cluster k. As clusters with high
intra-cluster similarity and low inter-cluster similarity are desirable, partitions with a high
Dunn index are preferable.
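A pointwise Dunn index, taking the smallest distance between members of different clusters over the largest within-cluster distance, can be sketched as follows (made-up example data; as noted above, a centroid-based inter-cluster distance is an equally valid variant):

```python
import numpy as np

def dunn_index(X, labels):
    """Dunn = (min inter-cluster distance) / (max intra-cluster distance),
    here computed from pairwise point distances."""
    clusters = [X[labels == i] for i in np.unique(labels)]
    # largest distance between any two members of the same cluster
    diam = max(np.linalg.norm(c[:, None] - c[None], axis=2).max()
               for c in clusters)
    # smallest distance between members of two different clusters
    sep = min(np.linalg.norm(a[:, None] - b[None], axis=2).min()
              for i, a in enumerate(clusters) for b in clusters[i + 1:])
    return sep / diam

X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], float)
labels = np.array([0, 0, 1, 1])
score = dunn_index(X, labels)  # about 13.45: dense, well-separated clusters
```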
V.EXPERIMENT METHODOLOGY
The steps of the experiment are as follows:
Environment setup: First, VMware ESXi 5.1 has been installed on a physical server, and twenty-six
virtual machines (VMs) have been installed on it.
Load selection and execution: Four different types of load (CPU intensive, Network intensive,
Memory intensive and Disk intensive) have been run. Each of the CPU-, Memory- and Disk-intensive
loads has been run in seven different VMs, while the Network-intensive load has been run in the
remaining five VMs.
Collection of result set: In the third step, data has been collected from the server using esxtop
replay mode. Performance counters for the different resources, namely CPU, Memory, Disk, Network
and Power, have been collected.
Feature selection: The result set contains many parameters; in this phase the relevant parameters
are selected using feature selection algorithms.
Use of K-means algorithm: Clusters have been created with the K-means clustering algorithm, using
the subsets of features produced by the feature selection algorithms.
Comparison: In this step the clusters have been compared based on cluster comparison methods.
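The overall shape of the steps above can be sketched end to end. The data, the feature subsets and the method names below are synthetic stand-ins, not the study's esxtop measurements or its actual selected features:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in result set: 26 VMs x 6 performance counters.
data = rng.random((26, 6))

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd iteration; returns labels and centroids."""
    r = np.random.default_rng(seed)
    cents = X[r.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - cents[None], axis=2).argmin(axis=1)
        cents = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                          else cents[j] for j in range(k)])
    return labels, cents

def davies_bouldin(X, labels, cents):
    """Davies Bouldin index of a clustering (smaller is better)."""
    n = len(cents)
    sig = [np.linalg.norm(X[labels == j] - cents[j], axis=1).mean()
           if (labels == j).any() else 0.0 for j in range(n)]
    return np.mean([max((sig[x] + sig[y]) / np.linalg.norm(cents[x] - cents[y])
                        for y in range(n) if y != x) for x in range(n)])

# Stand-ins for the column subsets returned by two selection methods.
subsets = {"method_A": [0, 2, 4], "method_B": [1, 3]}
scores = {}
for name, cols in subsets.items():
    X = data[:, cols]
    labels, cents = kmeans(X, k=4)
    scores[name] = davies_bouldin(X, labels, cents)

best = min(scores, key=scores.get)  # smaller Davies Bouldin index = better
```

The same comparison loop can be repeated with the Dunn index, preferring the largest value instead of the smallest.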
TABLE I. LIST OF FEATURES FOR DIFFERENT RESOURCES

CPU: %USED, %RUN, %SYS, %WAIT, %VMWAIT, %RDY, %IDLE, %OVRLP, %CSTP, %MLMTD, %SWPWT, SWTCH/s, MIG/s
Memory: SWCUR, SWTGT, SWR/s, SWW/s, LLSWR/s, LLSWW/s, CPTRD, CPTTGT, ZERO, SHRD, SHRDSVD, COWH,
NHN, NMIG, NRMEM, NLMEM, N_L, GST_ND0, OVD_ND0, GST_ND1, OVD_ND1, OVHDUM, OVHD, OVHDMAX, CMTTGT,
CMTCHRG, CMTPPS, CACHESZ, CACHUSD, ZIP/s, UNZIP/s, MEMSZ, GRANT, SZTGT, TCHD, TCHD_W, ACTV, ACTVS,
ACTVF, ACTVN, MCTLSZ, MCTLTGT, MCTLMAX
Disk: CMDS/s, READS/s, WRITES/s, MBREAD/s, MBWRTN/s, LAT/rd, LAT/wr
Network: PKTTX/s, MbTX/s, PKTRX/s, MbRX/s, %DRPTX, %DRPRX, ACTN/s, PKTTXMUL/s, PKTRXMUL/s,
PKTTXBRD/s, PKTRXBRD/s
Power: %USED, %UTIL, %C0, %C1, %C2, %C3, %T0, %T1, %T2, %T3, %T4, %T5, %T6, %T7
VI.RESULTS
A.Features selected by using different feature selection algorithms
Five different feature selection algorithms have been used: the CFS method, RELIEF method,
Chi-square method, Wrapper method and Rough set method. The feature selection methods have been
applied to the data generated from the experiment; each method selects a subset of relevant
features according to the measuring metric it defines.
TABLE II. LIST OF SELECTED FEATURES FOR DIFFERENT RESOURCES

CPU
  CFS: %USED, %IDLE, MIG/s
  RELIEF: %USED, %RUN, %SYS, %VMWAIT, %RDY, %IDLE, %OVRLP, %SWPWT, MIG/s
  Chi Square: %USED, %RUN, %SYS, %WAIT, %VMWAIT, %RDY, %IDLE, %OVRLP, %CSTP, %MLMTD, %SWPWT,
  SWTCH/s, MIG/s
  Wrapper: %USED, %RUN, %SYS, %WAIT, %VMWAIT, %RDY, %IDLE, %OVRLP, %CSTP, %MLMTD, %SWPWT,
  SWTCH/s, MIG/s
  Rough Set: %USED, %SYS, MIG/s

Disk
  CFS: CMDS/s, LAT/rd
  RELIEF: CMDS/s, READS/s, WRITES/s, MBREAD/s, LAT/rd, LAT/wr
  Chi Square: CMDS/s, READS/s, WRITES/s, MBREAD/s, MBWRTN/s, LAT/rd, LAT/wr
  Wrapper: CMDS/s, MBREAD/s, LAT/rd
  Rough Set: CMDS/s, LAT/rd, LAT/wr

Network
  CFS: PKTRX/s, MbRX/s
  RELIEF: PKTTX/s, MbTX/s, PKTRX/s, MbRX/s, ACTN/s, PKTRXMUL/s
  Chi Square: PKTTX/s, MbTX/s, PKTRX/s, MbRX/s
  Wrapper: PKTTX/s, MbTX/s, PKTRX/s, ACTN/s, PKTRXMUL/s
  Rough Set: PKTRX/s, ACTN/s, PKTRXMUL/s, PKTRXBRD/s

Memory
  CFS: SWCUR, COWH, OVHD, GRANT, SZTGT
  RELIEF: SZTGT, GRANT, MCTLMAX, TCHD_W, MEMSZ, TCHD, SHRDSVD, ACTVS, MCTLSZ, SHRD, ZERO, SWCUR,
  MCTLTGT, ACTVF, ACTV, CMTTGT, ACTVN, OVHDUM, OVHDMAX, OVHD, COWH, CACHUSD, NLMEM, NHN, NRMEM,
  N_L, CMTPPS, CACHESZ
  Chi Square: SWCUR, ZERO, SHRD, SHRDSVD, COWH, NHN, NRMEM, NLMEM, N_L, OVHDUM, OVHD, OVHDMAX,
  CMTTGT, CACHESZ, CACHUSD, MEMSZ, GRANT, SZTGT, TCHD, TCHD_W, ACTV, ACTVS, ACTVF, ACTVN, MCTLMAX
  Wrapper: GRANT, SZTGT, SWCUR, SHRDSVD, SHRD, ZERO, ACTVS, TCHD_W, ACTVF, TCHD, ACTV, CMTTGT,
  ACTVN, OVHD, MEMSZ, MCTLMAX, MCTLTGT, MCTLSZ, OVHDMAX, OVHDUM, COWH, CACHUSD
  Rough Set: SHRDSVD, GRANT

Power
  CFS: %USED
  RELIEF: %USED, %C1
  Chi Square: %USED, %C0, %C1
  Wrapper: %USED, %C0, %C1
  Rough Set: %USED, %UTIL, %C1
B.Clustering using K-means algorithm
After obtaining the selected features, the K-means clustering algorithm has been applied to each
feature set, where each set consists of all the features selected across all resources by a
particular feature selection method. The number of clusters is four. To judge the goodness of the
clusters, the analysis has been done with the help of the Davies Bouldin index and the Dunn index.
Using these two indices the goodness of each clustering has been determined. The clustering with
the minimum Davies Bouldin index and the maximum Dunn index is considered the best, and the feature
selection algorithm associated with that clustering is considered the best algorithm among these
algorithms on this data set. From this result it can be concluded that the selected parameters are
the optimum parameters, from which a clear idea can be formed about the types of load running in
the VMware ESXi 5.1 server. The table below gives the Davies Bouldin index and Dunn index for the
clusters found using the parameters selected by the respective feature selection algorithms.
TABLE III. DAVIES BOULDIN INDEX AND DUNN INDEX FOR CLUSTERS

Feature selection technique   Davies Bouldin index   Dunn index
CFS                           9.9122e+004            1.4196e-006
RELIEF                        2.2679e+00             1.0874e-009
Chi Square                    2.9018e+007            2.442e-009
Wrapper                       2.1199e+007            1.0862e-009
Rough set                     3.9079e+005            4.9040e-006
From the result it can be observed that, in both cases, the best result is given by the clusters
formed from the features selected by the CFS algorithm.
VII.CONCLUSION
From the results it can be concluded that the set of parameters selected by CFS is the best set of
parameters for VMware ESXi 5.1, and from that set of parameters the type of load running in VMware
ESXi 5.1 can be decided. At the same time, VMs having high values of those parameters suggest
heavier usage of the corresponding resources, which can be helpful for load balancing of virtual
machines.
REFERENCES
[1] R. A. Caruana and D. Freitag, "How Useful is Relevance?", Technical Report, AAAI Fall
Symposium on Relevance, New Orleans, 1994.
[2] A. L. Blum and P. Langley, "Selection of Relevant Features and Examples in Machine Learning,"
Artificial Intelligence, vol. 97, pp. 245-271, 1997.
[3] J. Doak, "An Evaluation of Feature Selection Methods and their Application to Computer
Security," Technical Report CSE-92-18, Department of Computer Science, University of California,
Davis, 1992.
[4] H. Liu and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining, Kluwer
Academic Publishers, London, 1998.
[5] M. A. Hall, Feature Selection for Discrete and Numeric Class Machine Learning.
[6] Y. Sun and D. Wu, "A RELIEF Based Feature Extraction Algorithm," Technical Report, 2011.
[7] K. Thangavel and C. Velayutham, "A Novel Entropy Based Unsupervised Feature Selection
Algorithm Using Rough Set Theory," IEEE International Conference on Advances in Engineering,
Science and Management (ICAESM-2012), March 30-31, 2012.
[8] M. A. Hall, "Feature Selection for Text Categorization on Imbalanced Data," SIGKDD
Explorations, vol. 6, issue 1, p. 80, 2003.
[9] Z. Huang, "Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical
Values," Data Mining and Knowledge Discovery, vol. 2, pp. 283-304, Kluwer Academic Publishers,
1998.