A study of a VMware ESXi 5.1 server has been carried out to find the optimal set of parameters that indicate the usage of the server's different resources. Feature selection algorithms are applied to data collected from the server with the esxtop command to extract this optimal parameter set. Multiple virtual machines (VMs) run on the server; the K-means algorithm clusters them, and the goodness of each cluster is measured by the Davies-Bouldin index and the Dunn index. The best cluster is identified from these indices, and its features are taken as the set of optimal parameters.
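As a concrete illustration of the pipeline just described, the sketch below clusters VM metric vectors with K-means and scores the clustering with the Davies-Bouldin index (scikit-learn) and a hand-rolled Dunn index. The data, feature count, and cluster count are placeholders, not the paper's actual esxtop measurements.

```python
# Sketch: cluster VM resource-usage vectors and score the clustering.
# `vm_metrics` is a hypothetical (n_vms x n_features) array standing in
# for esxtop counters such as CPU, memory, disk and network usage.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

def dunn_index(X, labels):
    """Min inter-cluster distance over max intra-cluster diameter (higher is better)."""
    clusters = [X[labels == k] for k in np.unique(labels)]
    inter = min(cdist(a, b).min()
                for i, a in enumerate(clusters) for b in clusters[i + 1:])
    intra = max(cdist(c, c).max() for c in clusters)
    return inter / intra

rng = np.random.default_rng(0)
vm_metrics = rng.random((30, 8))                      # placeholder samples

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vm_metrics)
print("Davies-Bouldin:", davies_bouldin_score(vm_metrics, labels))  # lower is better
print("Dunn:", dunn_index(vm_metrics, labels))
```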
This document discusses feature selection algorithms, specifically the branch and bound and beam search algorithms. It gives an overview of feature selection and its objectives, then explains in detail how branch and bound works, including pseudocode, a flowchart, and a worked example. It also covers beam search and compares branch and bound with other algorithms for performing feature selection on datasets.
This document discusses feature selection concepts and methods. It defines features as attributes that determine which class an instance belongs to. Feature selection aims to select a relevant subset of features by removing irrelevant, redundant and unnecessary data. This improves learning accuracy, model performance and interpretability. The document categorizes feature selection algorithms as filter, wrapper or embedded methods based on how they evaluate feature subsets. It also discusses concepts like feature relevance, search strategies, successor generation and evaluation measures used in feature selection algorithms.
Network Based Intrusion Detection System using Filter Based Feature Selection... (IRJET Journal)
This document proposes a mutual information-based feature selection algorithm to select optimal features for network intrusion detection classification. The algorithm aims to handle dependent data features better than previous methods. It evaluates the effectiveness of the algorithm on network intrusion detection cases. Most previous methods suffer from low detection rates and high false alarm rates. The proposed approach uses feature selection, filtering, clustering, and clustering ensemble techniques in a hybrid data mining method to achieve high accuracy for intrusion detection systems.
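For readers unfamiliar with the underlying criterion, here is a minimal sketch of mutual-information feature ranking with scikit-learn. It shows the generic filter step only, on synthetic data, not the paper's dependence-aware variant.

```python
# Sketch: rank features by mutual information with the class label,
# the filter-style criterion that mutual-information selectors build on.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)
mi = mutual_info_classif(X, y, random_state=0)
top = np.argsort(mi)[::-1][:5]              # the 5 most informative features
print("top features:", top, "MI scores:", mi[top].round(3))
```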
This document summarizes a machine learning workshop on feature selection. It discusses typical single-feature evaluation metrics such as mutual information and the Gini index, and subset selection techniques such as sequential forward selection and sequential backward selection. Examples show how feature selection improves logistic regression performance on large datasets with more features than samples. The document also outlines the workshop agenda and explains when and why feature selection matters for machine learning models.
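A short sketch of the sequential forward selection technique mentioned above, assuming scikit-learn's SequentialFeatureSelector (available from version 0.24); setting direction="backward" gives sequential backward selection instead.

```python
# Sketch: greedy sequential forward selection wrapped around a scaled
# logistic regression, keeping the 5 features that most improve CV score.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
estimator = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
sfs = SequentialFeatureSelector(estimator, n_features_to_select=5,
                                direction="forward").fit(X, y)
print("selected feature indices:", sfs.get_support(indices=True))
```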
EFFECTIVENESS PREDICTION OF MEMORY BASED CLASSIFIERS FOR THE CLASSIFICATION O... (cscpconf)
Classification is a step-by-step practice for assigning a given piece of input to one of a set of categories, and it is an essential machine learning technique. Classification problems arise in many application areas and need to be solved, and different types of classification algorithms, such as memory-based, tree-based, and rule-based methods, are widely used. This work studies the performance of different memory-based classifiers on multivariate datasets from the UCI machine learning repository using an open-source machine learning tool. It compares the memory-based classifiers and presents a practical guideline for selecting the algorithm best suited to a given classification task. Apart from that, some empirical criteria for describing and evaluating the best classifiers are discussed.
A Modified KS-test for Feature Selection (IOSR Journals)
This document proposes a modified Kolmogorov-Smirnov (KS) test-based feature selection algorithm. It begins with an overview of feature selection and its benefits. It then discusses two common feature selection approaches: filter and wrapper models. The document proposes a fast redundancy removal filter based on a modified KS statistic that utilizes class label information to compare feature pairs. It compares the proposed algorithm to other methods like Correlation Feature Selection (CFS) and KS-Correlation Based Filter (KS-CBF). The efficiency and effectiveness of the various methods are tested on standard classifiers. In most cases, the proposed approach achieved equal or better classification accuracy compared to using all features or the other algorithms.
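The paper's modified KS statistic is not reproduced here; the sketch below shows the baseline it builds on, the plain two-sample KS test used as a per-feature relevance score on synthetic two-class data.

```python
# Sketch: score each feature by the two-sample KS statistic between its
# class-conditional distributions; larger values mean better separation.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=400, n_features=10, random_state=1)
scores = np.array([ks_2samp(X[y == 0, j], X[y == 1, j]).statistic
                   for j in range(X.shape[1])])
print("feature ranking:", np.argsort(scores)[::-1])
```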
Feature selection is the process of selecting a subset of relevant features for model construction. It reduces complexity and can improve or maintain model accuracy. The curse of dimensionality means that as the number of features increases, the amount of data needed to maintain accuracy also increases exponentially. Feature selection methods include filter methods (statistical tests for correlation), wrapper methods (using the model to select features), and embedded methods (combining filter and wrapper approaches). Common filter methods include linear discriminant analysis, analysis of variance, chi-square tests, and Pearson correlation. Wrapper methods use techniques like forward selection, backward elimination, and recursive feature elimination. Embedded methods dynamically select features based on inferences from previous models.
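As a worked example of the filter family listed above, here is a minimal chi-square selection using scikit-learn's SelectKBest; note that chi2 requires non-negative feature values, which pixel intensities satisfy.

```python
# Sketch: keep the 10 features with the highest chi-square score
# against the class label (a pure filter method, no model involved).
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_digits(return_X_y=True)          # pixel values, all non-negative
selector = SelectKBest(chi2, k=10).fit(X, y)
print("kept features:", selector.get_support(indices=True))
```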
A report on designing a model for improving CPU Scheduling by using Machine L... (MuskanRath1)
Disclaimer: Please let me know in case some of the portions of the article match your research. I would include the link to your research in the description section of my article.
Description:
Our paper proposes a model for improving CPU scheduling on a uniprocessor system. The model is implemented in a low-level (assembly) language on Linux, chosen because it is open source and its kernel is editable.
Several methods exist to predict the length of CPU bursts, such as exponential averaging, but they may not give accurate or reliable predictions. This paper proposes a Machine Learning (ML) based approach to estimate the length of CPU bursts for processes. We use Bayesian theory as the classifier that decides which process in the ready queue executes first. The proposed approach selects the most significant attributes of a process using feature selection techniques and then predicts the CPU burst for the process. Furthermore, applying attribute selection improves performance in terms of space, time, and estimation accuracy.
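A minimal sketch of that idea, assuming Gaussian Naive Bayes as the Bayesian classifier and hypothetical process attributes (previous burst length, priority, memory footprint); neither the feature names nor the data come from the paper.

```python
# Sketch: classify processes into short/long CPU-burst classes so the
# scheduler can run predicted short bursts first (toy data throughout).
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# columns: [previous burst (ms), priority, memory footprint (MB)] - hypothetical
X = rng.random((200, 3)) * np.array([50.0, 10.0, 500.0])
y = (X[:, 0] < 25.0).astype(int)             # 1 = "short burst", toy labelling

model = GaussianNB().fit(X, y)
new_process = [[12.0, 3.0, 120.0]]
print("predicted class:", model.predict(new_process))
```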
An introduction to variable and feature selection (Marco Meoni)
Presentation of a great paper from Isabelle Guyon (Clopinet) and André Elisseeff (Max Planck Institute) back in 2003, which outlines the main techniques for feature selection and model validation in machine learning systems
Fuzzy inference systems use fuzzy logic to map inputs to outputs. There are two main types:
Mamdani systems use fuzzy outputs and are well-suited for problems involving human expert knowledge. Sugeno systems have faster computation using linear or constant outputs.
The fuzzy inference process involves fuzzifying inputs, applying fuzzy logic operators, and using if-then rules. Outputs are determined through implication, aggregation, and defuzzification. Mamdani systems find the centroid of fuzzy outputs while Sugeno uses weighted averages, making it more efficient.
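To make the Sugeno computation concrete, here is a toy zero-order Sugeno system in plain NumPy; the membership functions and constant rule outputs are invented for illustration.

```python
# Sketch: two-rule zero-order Sugeno inference with weighted-average
# defuzzification (no implication/aggregation/centroid step as in Mamdani).
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function on [a, c] peaking at b."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def sugeno_heater(temp):
    w_cold = tri(temp, 0, 10, 20)      # rule 1: IF temp is cold THEN power = 80
    w_hot = tri(temp, 15, 30, 40)      # rule 2: IF temp is hot  THEN power = 10
    weights = np.array([w_cold, w_hot])
    outputs = np.array([80.0, 10.0])   # constant (zero-order) consequents
    if weights.sum() == 0.0:
        return 0.0
    return float(weights @ outputs / weights.sum())

print(sugeno_heater(18.0))             # 45.0: weighted average of the two rules
```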
IRJET - Movie Genre Prediction from Plot Summaries by Comparing Various C... (IRJET Journal)
This document discusses predicting movie genres from plot summaries using various classification algorithms. It analyzes Multinomial Naive Bayes, Logistic Regression, Random Forest, and Stochastic Gradient Descent algorithms on a dataset of over 40,000 movie plots labeled with 12 genres. It finds that Multinomial Naive Bayes performs best, achieving the highest AUC scores for predicting each genre individually. It then builds separate classifiers for each genre using the best performing algorithm to predict the genre of new movie plots.
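A sketch of the one-classifier-per-genre setup on toy data, assuming TF-IDF plot features and scikit-learn's one-vs-rest wrapper around Multinomial Naive Bayes; the real study used over 40,000 plots and 12 genres.

```python
# Sketch: per-genre binary Naive Bayes classifiers over TF-IDF features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MultiLabelBinarizer

plots = ["a detective hunts a killer", "lovers meet in paris",
         "a ship sinks in a storm", "a spy chases a killer in paris"]
genres = [["crime"], ["romance"], ["drama"], ["crime", "romance"]]

X = TfidfVectorizer().fit_transform(plots)
Y = MultiLabelBinarizer().fit_transform(genres)   # one 0/1 column per genre
clf = OneVsRestClassifier(MultinomialNB()).fit(X, Y)
print(clf.predict(X[:1]))                          # genre flags for the first plot
```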
ON FEATURE SELECTION ALGORITHMS AND FEATURE SELECTION STABILITY MEASURES: A C... (ijcsit)
Data mining is indispensable for business organizations: it extracts useful information from the huge volume of stored data that can be used in managerial decision making to survive in the competition. Due to day-to-day advances in information and communication technology, the data collected from e-commerce and e-governance are mostly high dimensional, and data mining handles small datasets better than high-dimensional ones. Feature selection is an important dimensionality reduction technique. The subsets selected in subsequent iterations of feature selection should be the same or similar even under small perturbations of the dataset; this property is called selection stability and has recently become an important topic in the research community. Selection stability has been quantified by various measures. This paper analyses the choice of a suitable search method and stability measure for feature selection algorithms, as well as the influence of dataset characteristics, since the best approach is highly problem dependent.
Classification using L1-Penalized Logistic Regression (Setia Pramana)
L1-penalized logistic regression is commonly used for classification in high-dimensional data such as microarrays. This slide deck presents a brief overview of the algorithm.
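A minimal example: with an L1 penalty most coefficients shrink to exactly zero, so the surviving non-zero coefficients act as the selected features (e.g. genes in a microarray setting). The solver and regularization strength below are illustrative choices.

```python
# Sketch: L1-penalized logistic regression on a wide synthetic dataset;
# smaller C means a stronger penalty and a sparser model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=500,
                           n_informative=10, random_state=0)
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("non-zero coefficients:", int(np.sum(clf.coef_ != 0)), "of", X.shape[1])
```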
PARTICLE SWARM OPTIMIZATION FOR MULTIDIMENSIONAL CLUSTERING OF NATURAL LANGUA... (IAEME Publication)
Consider a non-linear dimensionality reduction method that takes into account the discriminating power of the solution found for given values of the categorical variable associated with each observation. A stochastic optimization method, particle swarm optimization, is proposed to find characteristics that ensure the best separation of observations in terms of a given quality functional. The quality of a solution is evaluated by the purity of the clusters obtained with the k-means method or with self-organizing Kohonen feature maps.
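The purity functional mentioned above takes only a few lines to compute: each cluster is credited with its majority class, and purity is the fraction of points that match their cluster's majority.

```python
# Sketch: cluster purity as a clustering-quality measure.
import numpy as np

def purity(labels_true, labels_pred):
    matched = 0
    for k in np.unique(labels_pred):
        members = labels_true[labels_pred == k]
        matched += np.bincount(members).max()   # size of the majority class
    return matched / len(labels_true)

y_true = np.array([0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1])
print(purity(y_true, y_pred))                   # 4/6, about 0.67
```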
This document discusses using feature selection techniques to address the curse of dimensionality in microarray data analysis. It presents the problem of having many more features than samples in bioinformatics tasks like cancer classification and network inference. It describes filter, wrapper and embedded feature selection approaches and proposes a blocking strategy that uses multiple learning algorithms to evaluate feature subsets in order to improve selection robustness when samples are limited. Finally, it lists several microarray gene expression datasets that are commonly used to evaluate feature selection methods.
International Journal of Engineering Research and Applications (IJERA) is a team of researchers, not a publication service or private publisher running journals for monetary benefit; we are an association of scientists and academics focused solely on supporting authors who want to publish their work. The articles published in our journal can be accessed online, and all articles are archived for real-time access.
Our journal primarily aims to bring out the research talent and work of scientists, academics, engineers, practitioners, scholars, and postgraduate students of engineering and science. It covers scientific research in a broad sense rather than a niche area, enabling researchers from various verticals to publish their papers, and it provides a platform for publishing in a shorter time so that authors can continue their work. All published articles are freely available to researchers in government agencies, to educators, and to the general public. We are making serious efforts to promote our journal across the globe, and we are confident it will act as a scientific platform for all researchers to publish their work online.
This document discusses various classical and advanced optimization techniques. It begins with an overview of classical techniques like single/multivariable optimization and methods using Lagrange multipliers or Kuhn-Tucker conditions. Numerical methods are then introduced, including linear programming, integer programming, and nonlinear programming. Advanced techniques like hill climbing, simulated annealing, genetic algorithms, and ant colony optimization are also summarized. These optimization methods are inspired by natural processes and use techniques such as local search, positive feedback, and path pheromones to find approximate solutions.
Applications and Analysis of Bio-Inspired Eagle Strategy for Engineering Opti... (Xin-She Yang)
This document discusses applying a nature-inspired eagle strategy to engineering optimization problems. The eagle strategy uses a two-stage approach combining global exploration with local exploitation. Global exploration uses Lévy flights as random walks to diversify solutions; promising solutions are then refined with an efficient local search algorithm such as particle swarm optimization. The document analyzes random-walk models such as Lévy flights and how they maintain diversity in swarm intelligence algorithms. Applying the eagle strategy to four engineering design problems shows that Lévy flights can effectively reduce computational effort.
This document proposes and evaluates a new metaheuristic optimization algorithm called Current Search (CS) and applies it to optimize PID controller parameters for DC motor speed control. The CS is inspired by electric current flow and aims to balance exploration and exploitation. It outperforms genetic algorithm, particle swarm optimization, and adaptive tabu search on benchmark optimization problems, finding better solutions faster. When applied to optimize a PID controller for DC motor speed control, the CS successfully controlled motor speed.
Feature selection is an essential issue in machine learning: it discards unnecessary or redundant features in a dataset. This paper introduces a new kernel-function-based feature selection method, evaluated on 16 real-world datasets from the UCI data repository, with k-means clustering used as the classifier under radial basis function (RBF) and polynomial kernels. After the features are sorted by the new criterion, the top 75 percent are evaluated with 10-fold cross-validation, and accuracy, F1-score, and running time are compared. The experiments show that the performance of the RBF-based selection varies with the kernel parameter, unlike the polynomial kernel, and that the RBF variant runs faster than the polynomial one. Moreover, the proposed method achieves up to 40 percent higher accuracy and F1-score on several datasets than commonly used feature selection techniques such as the Fisher score, the chi-square test, and the Laplacian score. The method can therefore be considered for feature selection.
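The paper's exact kernel criterion is not given here, so the sketch below shows one plausible reading: scoring each feature by the alignment between its RBF kernel matrix and the ideal label kernel. Treat it as an assumption-laden illustration, not the authors' method.

```python
# Sketch: per-feature kernel-target alignment with an RBF kernel
# (a generic kernel-based relevance score, not the paper's formulation).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
Ky = (y[:, None] == y[None, :]).astype(float)   # ideal label kernel

def alignment(feature, gamma=1.0):
    K = rbf_kernel(feature.reshape(-1, 1), gamma=gamma)
    return (K * Ky).sum() / (np.linalg.norm(K) * np.linalg.norm(Ky))

scores = [alignment(X[:, j]) for j in range(X.shape[1])]
print("feature ranking:", np.argsort(scores)[::-1])
```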
Implicit and explicit sequence control with exception handling (VIKASH MAINANWAL)
This document presents information on exception handling in programming languages. It discusses implicit and explicit sequence control, defines exceptions as erroneous situations during program execution, and describes different types of errors like syntax errors and runtime errors. It also explains exception handling keywords like try, catch, and finally, and shows the exception handling control flow. Code examples are provided to illustrate syntax errors, runtime errors, logical errors, and exception handling.
Feature selection using modified particle swarm optimisation for face recogni... (eSAT Journals)
Abstract
One of the major factors influencing classification accuracy is the selection of the right features. Not all features play a vital role in classification; many features in a dataset may be redundant or irrelevant, which increases computational cost and may reduce the classification rate. In this paper we use DCT (discrete cosine transform) coefficients as features for face recognition. The coefficients are selected by a modified PSO algorithm that incorporates the average of the mean normalized standard deviations of the various classes and gives more weight to the lower-indexed DCT coefficients. The algorithm is tested on the ORL database, where it achieves a recognition rate of 97%. On average about 40 percent of the features are selected for a 10 × 10 input, and the modified PSO converges in about 50 iterations. These performance figures are better than some of the results reported in the literature.
Keywords: Particle swarm optimization, Discrete cosine transform, feature extraction, feature selection, face recognition, classification rate.
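A sketch of the feature-extraction step only, assuming SciPy's dctn for the 2-D transform on a 10 × 10 block and a simple low-frequency crop standing in for the PSO-driven coefficient selection, which is not reproduced.

```python
# Sketch: 2-D DCT of a 10 x 10 block, keeping the lower-indexed
# (low-frequency) coefficients that the modified PSO favours.
import numpy as np
from scipy.fft import dctn

rng = np.random.default_rng(0)
block = rng.random((10, 10))               # stand-in for a face-image patch

coeffs = dctn(block, norm="ortho")         # 100 DCT coefficients
low_freq = coeffs[:4, :4].ravel()          # 16 lowest-indexed coefficients
print(low_freq.shape)
```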
AN ENVIRONMENT FOR NON-DUPLICATE TEST GENERATION FOR WEB BASED APPLICATION (ecij)
Quality, portability, consistency, interoperability, and dependability are key parameters of web applications, because faulty software can block a business completely. Internet-based applications are complex and heterogeneous, built from modules that may each be implemented in a different language, so testing is essential. This paper presents an object-oriented testing environment. Web applications consist of web documents, classes and objects, and different kinds of content such as HTML, DHTML, and JavaScript. The paper presents three layers of testing, unit testing, integration testing, and system testing, and discusses the related techniques. It also builds on reverse engineering techniques that can be used to analyze software connected with a model, describes a method for generating test cases, and presents the software used to obtain the results.
OPTIMIZATION OF ARTIFICIAL NEURAL NETWORK MODEL FOR IMPROVEMENT OF ARTIFICIAL... (ijccmsjournal)
Considerable development has been done by some of this paper's authors toward manufacturing process units energized by a Human Powered Flywheel Motor (HPFM) as an energy source. The machine system comprises three subsystems: (i) the HPFM, (ii) a Torsionally Flexible Clutch (TFC), and (iii) a process unit. The process units tried so far are mostly rural, such as brick-making machines (both rectangular and keyed cross-sectioned), low-head water lifting, wood turning, and wood-strip cutting. The HPFM comprises a pedalling system similar to a bicycle, a speed-raising gear pair, and a flywheel large enough that a young person of 21-25 years, 165 cm tall and of slim build, can pump in around 30,000 N-m of energy in a few minutes. Once this energy is stored, pedalling stops and a special TFC is engaged, which very efficiently transfers momentum and energy from the flywheel to the process unit. The process unit runs for 5 to 15 seconds per clutch engagement, depending on the application; in other words, process units needing power on the order of 3 hp to 10 hp can be driven by this machine concept. An experimental database model is formulated for the Human Powered Flywheel Motor Energized Brick Making Machine (HPFMEBMM).
The focus of the present paper is the development of an optimal Artificial Neural Network (ANN) model that predicts the experimental evidence accurately and precisely. Optimisation is achieved by varying parameters of the ANN topology, such as the training algorithm, learning algorithm, hidden layer size, and number of hidden layers, while training the network and accepting the best value of each parameter. The paper also discusses how varying these parameters affects the network's predictions.
This document summarizes a study on the impact of black hole attacks on the performance of mobile ad hoc networks (MANETs). The study used the Network Simulator 2 (NS-2) to simulate black hole attacks in MANETs using the Ad hoc On-Demand Distance Vector (AODV) routing protocol. It was found that the packet delivery ratio decreased significantly when black hole attacks were introduced. Additionally, the packet delivery ratio decreased dramatically as the number of black hole nodes increased.
Synchronization and Inverse Synchronization of Some Different Dimensional Dis... (ijccmsjournal)
This document summarizes research on synchronization and inverse synchronization of discrete-time chaotic dynamical systems using scaling matrices. It introduces definitions of synchronization and inverse synchronization for discrete systems via scaling matrices. It then applies these concepts to synchronize (1) a 2D Lorenz discrete system and 3D Wang system, and (2) a 3D generalized Henon map and 2D Fold map. Numerical simulations verify the effectiveness of the proposed synchronization schemes.
This webinar series from Jennifer Schaus & Associates provides information on government contracting topics. The first webinar on August 6th will cover government grants and the SBIR process, presented by Leona Charles from SPC Consulting. Future webinars will discuss simplified acquisitions, the mentor-protégé program, HUBZone certification, access to capital, and winning proposals. All webinars will be recorded and available on the company's website.
Synchronization and inverse synchronization of some different dimensional dis... (ijccmsjournal)
In this paper, new types of synchronization and inverse synchronization are proposed for some different dimensional chaotic dynamical systems in discrete time using scaling matrices. Based on Lyapunov stability theory and nonlinear controllers, new synchronization results are derived. Numerical simulations are used to verify the effectiveness of the proposed schemes.
Optimal Feature Selection from VMware ESXi 5.1 Feature Set (ijccmsjournal)
The document summarizes a study on optimal feature selection from VMware ESXi 5.1. Data was collected from a VMware server running multiple virtual machines under different loads. Feature selection algorithms like CFS, RELIEF, chi-square, wrapper and rough set methods were used to extract the optimal set of parameters. K-means clustering algorithm grouped the VMs and cluster quality was determined using Davies Bouldin and Dunn indices. The best cluster's features were considered the optimal parameter set for resource usage analysis and VM selection during migration.
A Combined Approach for Feature Subset Selection and Size Reduction for High ... (IJERA Editor)
Selection of relevant features from a given feature set is one of the important issues in data mining as well as classification. A dataset may contain many features, but not all of them are important for a particular analysis or decision, because features may share common information or be completely irrelevant to the processing at hand. This generally happens because of improper selection of features during dataset formation or incomplete information about the observed system. In both cases the data contain features that only increase the processing burden and may ultimately distort the outcome of the analysis. Methods are therefore required to detect and remove such features, and this paper presents an efficient approach that reduces not only the unimportant features but also the overall dataset size. The proposed algorithm uses information theory to measure the information gain of each feature and a minimum spanning tree to group similar features; fuzzy c-means clustering then removes similar entries from the dataset. The algorithm is tested with an SVM classifier on 35 publicly available real-world high-dimensional datasets, and the results show that it reduces both the feature set and the data length while improving classifier performance.
The document describes a proposed fast clustering-based feature subset selection (FAST) algorithm for high-dimensional data. The FAST algorithm works in two steps: 1) clustering features using minimum spanning tree methods, and 2) selecting the most representative feature from each cluster. This identifies useful and independent features efficiently. Experimental results on 35 real-world datasets demonstrate that FAST produces smaller feature subsets and improves classifier performance compared to other feature selection algorithms.
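A simplified sketch of the MST-based clustering step, assuming 1 - |Pearson correlation| as the feature distance and an arbitrary cut threshold; FAST's actual information-theoretic distance and its representative-feature selection are not reproduced here.

```python
# Sketch: cluster features via a minimum spanning tree. Build a complete
# graph over features, take its MST, cut the heaviest (most dissimilar)
# edges, and read the connected components as feature clusters.
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree

rng = np.random.default_rng(0)
X = rng.random((100, 6))
X[:, 3] = X[:, 0] + 0.01 * rng.random(100)       # feature 3 nearly duplicates 0

D = 1.0 - np.abs(np.corrcoef(X, rowvar=False))   # distance = 1 - |correlation|
mst = minimum_spanning_tree(D).toarray()

mst[mst > 0.5] = 0.0                             # cut weak (dissimilar) edges
n, comp = connected_components(mst, directed=False)
print(n, "feature clusters:", comp)              # features 0 and 3 share a cluster
```

In FAST itself, the second step would then pick one representative feature per component rather than keeping the whole cluster.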
A Threshold Fuzzy Entropy Based Feature Selection: Comparative Study (IJMER)
Feature selection is one of the most common and critical tasks in database classification. It reduces computational cost by removing insignificant and unwanted features, which in turn makes the diagnosis process accurate and comprehensible. This paper presents a measurement of feature relevance based on fuzzy entropy, tested with a radial basis function (RBF) network, bagging (bootstrap aggregating), boosting, and stacking on datasets from various fields. Twenty benchmark datasets from the UCI Machine Learning Repository and KDD are used. The accuracy obtained from these classification processes shows that the proposed method produces good, accurate results with fewer features than the original datasets.
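The paper's exact fuzzy entropy definition is not reproduced here; the sketch below uses a generic formulation in which feature values are mapped to [0, 1] memberships and scored by Shannon-style fuzzy entropy, where lower means crisper and typically more informative.

```python
# Sketch: a generic fuzzy entropy score for a single feature
# (De Luca-Termini style, not necessarily the paper's variant).
import numpy as np

def fuzzy_entropy(values):
    mu = (values - values.min()) / (np.ptp(values) + 1e-12)  # memberships in [0, 1]
    mu = np.clip(mu, 1e-12, 1 - 1e-12)
    return float(-np.mean(mu * np.log2(mu) + (1 - mu) * np.log2(1 - mu)))

rng = np.random.default_rng(0)
print(fuzzy_entropy(rng.random(100)))                     # higher: fuzzier feature
print(fuzzy_entropy(np.array([0.0] * 50 + [1.0] * 50)))   # near 0: crisp feature
```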
A fast clustering based feature subset selection algorithm for high-dimension... (JPINFOTECH JAYAPRAKASH)
The document proposes a fast clustering-based feature selection algorithm (FAST) to efficiently and effectively select useful feature subsets from high-dimensional data. FAST works in two steps: (1) it clusters features using minimum spanning trees, partitioning clusters so each represents a subset of independent features; (2) it selects the most representative feature from each cluster to form the output subset. Experiments on 35 real-world datasets show FAST not only selects smaller feature subsets but also improves performance of four common classifiers compared to other feature selection methods.
This document discusses various machine learning concepts related to data processing, feature selection, dimensionality reduction, feature encoding, feature engineering, dataset construction, and model tuning. It covers techniques like principal component analysis, singular value decomposition, correlation, covariance, label encoding, one-hot encoding, normalization, discretization, imputation, and more. It also discusses different machine learning algorithm types, categories, representations, libraries and frameworks for model tuning.
Iaetsd an enhanced feature selection for... (Iaetsd Iaetsd)
The document discusses feature selection techniques for machine learning applications. It proposes an Enhanced Fast Clustering-based Feature Selection (EFAST) algorithm. The EFAST algorithm works in two steps: 1) features are clustered using graph-theoretic clustering methods, and 2) the most relevant representative feature strongly correlated with the target categories is selected from each cluster to form the optimal feature subset. Features from different clusters are relatively independent, so EFAST has a high chance of selecting a set of useful and independent features. The algorithm was tested on real-world data and showed improved performance over other feature selection methods by reducing features while also improving classifier performance.
Performance Comparison of Machine Learning Algorithms (Dinusha Dilanka)
In this paper we compare the performance of two classification algorithms. It is useful to differentiate algorithms based on computational performance rather than classification accuracy alone: although classification accuracy between the algorithms is similar, computational performance can differ significantly and can affect the final results. The objective of this paper is therefore a comparative analysis of two machine learning algorithms, K-nearest neighbor classification and logistic regression. We consider a large dataset of 7981 data points and 112 features and examine the performance of the two algorithms, estimating the processing time and accuracy of each technique on the collected dataset with a 60% training and 40% testing split. The paper is organized as follows. Section I contains the introduction and background of the research, and Section II the problem statement. Section III briefly describes our application, the data analysis process, the testing environment, and the methodology of our analysis. Section IV comprises the results of the two algorithms. Finally, the paper concludes with a discussion of future research directions that address the problems in the current methodology.
DOTNET 2013 IEEE CLOUDCOMPUTING PROJECT A fast clustering based feature subse...IEEEGLOBALSOFTTECHNOLOGIES
To Get any Project for CSE, IT ECE, EEE Contact Me @ 09849539085, 09966235788 or mail us - ieeefinalsemprojects@gmail.com-Visit Our Website: www.finalyearprojects.org
This document provides an overview of machine learning concepts including feature selection, dimensionality reduction techniques like principal component analysis and singular value decomposition, feature encoding, normalization and scaling, dataset construction, feature engineering, data exploration, machine learning types and categories, model selection criteria, popular Python libraries, tuning techniques like cross-validation and hyperparameters, and performance analysis metrics like confusion matrix, accuracy, F1 score, ROC curve, and bias-variance tradeoff.
Evaluation of a hybrid method for constructing multiple SVM kernelsinfopapers
Dana Simian, Florin Stoica, Evaluation of a hybrid method for constructing multiple SVM kernels, Recent Advances in Computers, Proceedings of the 13th WSEAS International Conference on Computers, Recent Advances in Computer Engineering Series, WSEAS Press, Rodos, Greece, July 23-25, 2009, ISSN: 1790-5109, ISBN: 978-960-474-099-4, pp. 619-623
Similar to Optimal feature selection from v mware esxi 5.1 feature set (20)
HEAP SORT ILLUSTRATED WITH HEAPIFY, BUILD HEAP FOR DYNAMIC ARRAYS.
Heap sort is a comparison-based sorting technique based on Binary Heap data structure. It is similar to the selection sort where we first find the minimum element and place the minimum element at the beginning. Repeat the same process for the remaining elements.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
ACEP Magazine edition 4th launched on 05.06.2024Rahul
This document provides information about the third edition of the magazine "Sthapatya" published by the Association of Civil Engineers (Practicing) Aurangabad. It includes messages from current and past presidents of ACEP, memories and photos from past ACEP events, information on life time achievement awards given by ACEP, and a technical article on concrete maintenance, repairs and strengthening. The document highlights activities of ACEP and provides a technical educational article for members.
Low power architecture of logic gates using adiabatic techniquesnooriasukmaningtyas
The growing significance of portable systems to limit power consumption in ultra-large-scale-integration chips of very high density, has recently led to rapid and inventive progresses in low-power design. The most effective technique is adiabatic logic circuit design in energy-efficient hardware. This paper presents two adiabatic approaches for the design of low power circuits, modified positive feedback adiabatic logic (modified PFAL) and the other is direct current diode based positive feedback adiabatic logic (DC-DB PFAL). Logic gates are the preliminary components in any digital circuit design. By improving the performance of basic gates, one can improvise the whole system performance. In this paper proposed circuit design of the low power architecture of OR/NOR, AND/NAND, and XOR/XNOR gates are presented using the said approaches and their results are analyzed for powerdissipation, delay, power-delay-product and rise time and compared with the other adiabatic techniques along with the conventional complementary metal oxide semiconductor (CMOS) designs reported in the literature. It has been found that the designs with DC-DB PFAL technique outperform with the percentage improvement of 65% for NOR gate and 7% for NAND gate and 34% for XNOR gate over the modified PFAL techniques at 10 MHz respectively.
We have compiled the most important slides from each speaker's presentation. This year’s compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
Optimal feature selection from v mware esxi 5.1 feature set
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS) Vol.3, No.3, September 2014
Optimal Feature Selection from VMware ESXi 5.1
Feature Set
Amartya Hatua
Bangalore, India
Abstract
A study of VMware ESXi 5.1 server has been carried out to find the optimal set of parameters which
suggest usage of different resources of the server. Feature selection algorithms have been used to extract
the optimum set of parameters of the data obtained from VMware ESXi 5.1 server using esxtop command.
Multiple virtual machines (VMs) are running in the mentioned server. K-means algorithm is used for
clustering the VMs. The goodness of each cluster is determined by Davies Bouldin index and Dunn index
respectively. The best cluster is further identified by the determined indices. The features of the best cluster
are considered into a set of optimal parameters.
Index Terms
Clustering, Feature selection, K-means clustering algorithm, VMware ESXi 5.1
I.INTRODUCTION
Feature selection, or dimensionality reduction, for supervised or unsupervised inductive learning
basically generates a set of candidate features. This set of features can be defined in three
different ways: a) the number or size of features is specified and the subset optimizes an
evaluation measure; b) the subset of features must satisfy some predefined restriction on the
evaluation measure; c) the subset offers the best compromise between its size and the value of its
evaluation measure.
Improvement of inductive learning is the prime objective of feature selection. The improvement
can be measured in terms of learning speed, ease of representation or ability to generalize. The
key factors through which feature selection improves the results of an inducer are a diminished
volume of storage, the reduction of noise generated by irrelevant or redundant features and the
elimination of useless knowledge.
The motivation behind this work comes from the procedure for selecting VMs during VM
migration or load balancing on a VMware ESXi 5.1 server. If a server is overloaded with VMs,
some of the VMs should be migrated to another server. The selection of VMs depends on the
load on different resources. For every resource, many parameters or features are present in
VMware ESXi 5.1; the set of all parameters and their values can be observed using the ESXTOP
command. The selection of the optimal set of parameters has been carried out in this study. A
high value of a set of parameters for a particular VM indicates high usage of, or load on, the
corresponding resource. Depending on the types of load on the different VMs, the VMs eligible
for migration can be selected.
DOI: 10.5121/ijccms.2014.3301
In the following sections the feature selection methods, clustering method, cluster analysis
methods, experiment methodology, results and conclusion are discussed in detail.
II.FEATURE SELECTION METHODS
A feature selection algorithm (FSA) is a computational solution driven by a certain definition of
relevance. However, the relevance of a feature always depends on the objective of the inductive
learning algorithm that uses the reduced set of features. An irrelevant feature is not useful for
induction, but at the same time not all relevant features are always required [1]. Some of the
feature selection methods are: the wrapper method, filter methods, rough set methods, the
schemata search method, feature selection using similarity, the Principal Component Analysis
method and the filter-wrapper hybrid method. Different types of filtering methods are also
available for feature selection, such as the FOCUS algorithm, the RELIEF algorithm, the Las
Vegas algorithm, Markov blanket filtering, decision tree methods and genetic algorithms [2], [3], [4].
A.Correlation-based Feature Selection method
Correlation-based Feature Selection (CFS) [5] comes under the category of filter algorithms. In
this type of algorithm the subsets of features are chosen based on ranks, where the rank is
determined by a correlation-based heuristic evaluation function. The evaluation function selects
features which are highly correlated with the class and at the same time uncorrelated with each
other. A feature is selected mainly based on the extent to which it predicts classes in regions of
the instance space not predicted by other features. CFS's feature subset evaluation function is
repeated here (with slightly modified notation) for ease of reference:

$$M_S = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}$$

Here $M_S$ represents the heuristic merit of a feature subset $S$ containing $k$ features,
$\overline{r_{cf}}$ denotes the mean feature-class correlation ($f \in S$) and $\overline{r_{ff}}$
stands for the average feature-feature intercorrelation. The numerator of the equation gives an
idea of how predictive of the class the subset is; the denominator represents the redundancy
among the features. CFS uses a best-first heuristic search strategy. If five consecutive fully
expanded subsets do not give a better result, the process stops; in other words, that is the
stopping condition of the process.
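To make the merit computation concrete, here is a minimal Python sketch of the evaluation
function above; it assumes the feature-class correlations and the feature-feature correlation
matrix have already been computed, and the function name is illustrative rather than part of CFS:

```python
import numpy as np

def cfs_merit(feature_class_corr, feature_feature_corr):
    """Heuristic merit M_S of a subset S with k features.

    feature_class_corr: length-k vector of feature-class correlations.
    feature_feature_corr: k x k matrix of feature-feature correlations.
    """
    k = len(feature_class_corr)
    r_cf = np.mean(feature_class_corr)  # mean feature-class correlation
    # mean pairwise feature-feature correlation (off-diagonal entries only)
    off_diag = feature_feature_corr[~np.eye(k, dtype=bool)]
    r_ff = np.mean(off_diag) if k > 1 else 0.0
    return (k * r_cf) / np.sqrt(k + k * (k - 1) * r_ff)
```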
B.Wrapper method
The wrapper methodology, popularized by Kohavi and John (1997), provides a general way to
determine a good subset of features, regardless of the machine learning algorithm; the machine
learning algorithm is treated as a perfect black box. In practice, one needs to define: (i) the
procedure to search the space of all possible variable subsets; (ii) the procedure to evaluate the
prediction performance of a learning machine, which guides the search and determines the
stopping condition; and (iii) the choice of predictor. The wrapper method performs a search in
the space of all possible subsets.
The use of machine learning algorithms as black boxes makes the wrapper method remarkably
universal and simple. On the other hand, when the number of features $n$ is very large, an
exhaustive search over the $O(2^n)$ possible subsets is required, so the complexity of the
algorithm is $O(2^n)$ and it is not efficient when $n$ is very large.
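A greedy forward search is a common way to keep this tractable. The following sketch, assuming
scikit-learn is available, adds one feature at a time as long as the black-box estimator's
cross-validated score improves; it is a simplification of the full wrapper search, not Kohavi and
John's exact procedure:

```python
import numpy as np
from sklearn.model_selection import cross_val_score

def forward_wrapper(estimator, X, y, cv=5):
    """Greedy forward wrapper: grow the subset while the CV score improves."""
    selected = []
    remaining = list(range(X.shape[1]))
    best_score = -np.inf
    improved = True
    while improved and remaining:
        improved = False
        for f in remaining:
            # evaluate the black-box estimator on the candidate subset
            score = cross_val_score(estimator, X[:, selected + [f]], y, cv=cv).mean()
            if score > best_score:
                best_score, best_f, improved = score, f, True
        if improved:
            selected.append(best_f)
            remaining.remove(best_f)
    return selected, best_score
```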
C.Filter Algorithm
Filter algorithms determine the subset of optimal features by measuring intrinsic properties of
the data. The algorithm calculates a relevance score for each feature, and features having a low
relevance score are removed. Afterwards, the subset of relevant features is used as input to
different machine learning algorithms.
RELIEF Algorithm: RELIEF assigns a weight to every feature depending on the relevance of
that feature in the given context, and follows the filter method to determine a minimal set of
features. It randomly samples instances from the training data and computes each feature's
relevance value based on the nearest hit (the nearest neighbour of the same class) and the nearest
miss (the nearest neighbour of a different class) [6].
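A minimal sketch of this weight update is given below, assuming numeric features scaled to
[0, 1], binary classes and Manhattan distance; it illustrates the idea rather than reproducing the
algorithm in [6]:

```python
import numpy as np

def relief(X, y, m=100, seed=0):
    """Estimate feature weights from m randomly sampled instances."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(m):
        i = rng.integers(n)
        dists = np.abs(X - X[i]).sum(axis=1)   # Manhattan distance to all points
        dists[i] = np.inf                      # exclude the sampled instance itself
        near_hit = np.argmin(np.where(y == y[i], dists, np.inf))
        near_miss = np.argmin(np.where(y != y[i], dists, np.inf))
        # a feature gains weight when it separates the near-miss
        # and loses weight when it differs even for the near-hit
        w += np.abs(X[i] - X[near_miss]) - np.abs(X[i] - X[near_hit])
    return w / m
```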
D.Rough Set Method
In Velayutham [7], Rough Set Theory (RST) is used as a tool to discover data dependencies and
to find the subset of relevant features.
A rough set approximates a set by two concepts known as the lower and upper approximations.
The lower approximation contains the domain objects that certainly belong to the subset of
interest, while the upper approximation describes the objects that possibly belong to the subset.
Approximations are made with respect to a particular subset of features or parameters.
Consider an information system $I = (U, A \cup \{d\})$, where $U$ is the universe, a non-empty
finite set of objects; $A$ is a non-empty finite set of condition attributes; and $d$ is the decision
attribute. For every $a \in A$ there is a corresponding function $f_a : U \rightarrow V_a$, where
$V_a$ is the set of values of $a$. If $P \subseteq A$, there is an associated equivalence relation:

$$IND(P) = \{(x, y) \in U \times U \mid \forall a \in P,\; f_a(x) = f_a(y)\}$$

The partition of $U$ generated by $IND(P)$ is denoted $U/P$. If $(x, y) \in IND(P)$, then $x$
and $y$ cannot be distinguished by the attributes of $P$. The equivalence classes of the
$P$-indiscernibility relation are denoted $[x]_P$. Considering $X \subseteq U$, the $P$-lower
approximation $\underline{P}X$ and the $P$-upper approximation $\overline{P}X$ of the set $X$
can be defined as:

$$\underline{P}X = \{x \in U \mid [x]_P \subseteq X\}$$
$$\overline{P}X = \{x \in U \mid [x]_P \cap X \neq \emptyset\}$$

Let $P, Q \subseteq A$ induce equivalence relations over $U$; then the positive region is defined
as:

$$POS_P(Q) = \bigcup_{X \in U/Q} \underline{P}X$$

The positive region of the partition $U/Q$ with respect to $P$ is the set of all objects of $U$ that
can be certainly classified into blocks of the partition $U/Q$ by means of $P$. $Q$ depends on
$P$ in a degree $k$ ($0 \leq k \leq 1$), denoted $P \Rightarrow_k Q$, where

$$k = \gamma_P(Q) = \frac{|POS_P(Q)|}{|U|}$$

If $k = 1$, $Q$ depends totally on $P$; if $0 < k < 1$, $Q$ depends partially on $P$; and if
$k = 0$, $Q$ does not depend on $P$.
The Unsupervised Quick Reduct (USQR) algorithm attempts to compute a reduct without
exhaustively generating all possible subsets. In Velayutham [8] the algorithm starts with an
empty set and, in each iteration, adds one attribute selected according to the rough set
dependency metric: the mean dependency over all attributes is calculated and the best candidate
is chosen. This procedure continues until the dependency value reaches its maximum for that
dataset. The data set is considered consistent if the process ends at dependency value 1;
otherwise it is inconsistent.
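The dependency degree at the heart of this procedure is easy to compute for discrete data. Here is
an illustrative Python sketch, where attribute sets are given as column indices; it is a plain
dependency calculation, not the full USQR algorithm of [8]:

```python
from collections import defaultdict

def dependency(rows, P, Q):
    """gamma_P(Q) = |POS_P(Q)| / |U| for attribute index sets P and Q."""
    # group objects into equivalence classes (blocks) of IND(P)
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in P)].append(i)
    pos = 0
    for members in blocks.values():
        # a P-block belongs to the positive region when all of its members
        # agree on Q, i.e. the block fits inside a single Q-block
        q_values = {tuple(rows[i][a] for a in Q) for i in members}
        if len(q_values) == 1:
            pos += len(members)
    return pos / len(rows)
```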
E.Chi-square feature selection
The chi-square method measures the lack of independence between a feature and a category.
The computed statistic is compared against the chi-square distribution with one degree of
freedom to judge extremeness [9]. For a feature t and a category c, it is a function of the
following four dependency counts:
A: number of instances where t is present and which belong to c.
B: number of instances where t is present and which do not belong to c.
C: number of instances where t is absent and which belong to c.
D: number of instances where t is absent and which do not belong to c.
With $N = A + B + C + D$ the total number of instances, the statistic is defined as:

$$\chi^2(t, c) = \frac{N\,(AD - CB)^2}{(A + C)(B + D)(A + B)(C + D)}$$
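Translating the statistic directly into code is straightforward; in this sketch A, B, C and D are the
four counts defined above:

```python
def chi_square(A, B, C, D):
    """Chi-square score of a feature/category pair from its 2x2 counts."""
    N = A + B + C + D
    denominator = (A + C) * (B + D) * (A + B) * (C + D)
    # a zero denominator means the feature or the category is constant
    return N * (A * D - C * B) ** 2 / denominator if denominator else 0.0
```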
III.CLUSTERING METHOD
K-means clustering [9] is a method of cluster creation whose main objective is to partition n
observations into K groups/clusters such that each instance belongs to the cluster with the nearest
mean. This results in a partitioning of the data space into Voronoi cells. Mathematically, a
Voronoi diagram divides a space into a number of regions: depending on some predefined
conditions, a set of points is taken into consideration, known as seeds, sites, or generators, and
around each site the points closest to that site form a region called a Voronoi cell. The most
popular algorithm uses an iterative technique to reach the stopping point; it is commonly known
as the K-means algorithm, and in the computer science community it is also known as Lloyd's
algorithm.
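A compact sketch of Lloyd's iteration is shown below, assuming numeric feature vectors in a
NumPy array; in practice a library routine such as scikit-learn's KMeans would be used, and this
simplified version does not handle clusters that become empty:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # seed from the data
    for _ in range(iters):
        # assignment step: each point joins the cluster with the nearest mean
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids, axis=2), axis=1)
        # update step: recompute each centroid as the mean of its members
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    return labels, centroids
```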
IV.CLUSTER ANALYSIS METHOD
The cluster analysis methods provide measures with which the goodness of clusters can be
determined. The methods used here are the Davies Bouldin index and the Dunn index
(J. C. Dunn, 1974).
A.Davies Bouldin index
The Davies Bouldin index can be calculated by the following formula:

$$DB = \frac{1}{n} \sum_{i=1}^{n} \max_{j \neq i} \frac{\sigma_i + \sigma_j}{d(C_i, C_j)}$$

In the above equation n represents the number of clusters, $C_x$ stands for the centroid of
cluster x, $\sigma_x$ represents the average distance between each point in cluster x and that
centroid, and $d(C_i, C_j)$ denotes the distance between the centroids of clusters i and j.
The clustering that produces the collection of clusters with the smallest Davies Bouldin index is
considered the best: a small index indicates low intra-cluster distances (high intra-cluster
similarity) together with well-separated centroids.
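The definition maps directly onto code. The following sketch computes the index from a labelled
clustering; scikit-learn also provides an equivalent davies_bouldin_score:

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies Bouldin index of a clustering given by integer labels."""
    ids = np.unique(labels)
    centroids = np.array([X[labels == i].mean(axis=0) for i in ids])
    # sigma: average distance of each cluster's points to its centroid
    sigma = np.array([np.linalg.norm(X[labels == i] - centroids[j], axis=1).mean()
                      for j, i in enumerate(ids)])
    n = len(ids)
    worst = [max((sigma[i] + sigma[j]) / np.linalg.norm(centroids[i] - centroids[j])
                 for j in range(n) if j != i)
             for i in range(n)]
    return float(np.mean(worst))
```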
B.Dunn index
The main objective of the Dunn index is to identify dense and well-separated clusters. It is
defined as the ratio between the minimal inter-cluster distance and the maximal intra-cluster
distance. The following formula is used to calculate the Dunn index for a cluster partition:

$$D = \frac{\min_{1 \leq i < j \leq n} d(i, j)}{\max_{1 \leq k \leq n} d'(k)}$$

If i and j are two clusters, the inter-cluster distance d(i, j) between them can be taken as the
distance between their centroids, while the intra-cluster distance d'(k) of a cluster k can be
measured as the distance between any two elements of cluster k (its diameter). As clusters with
high intra-cluster similarity and low inter-cluster similarity are desirable, partitions with a high
Dunn index are preferable.
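A sketch using the simplifications mentioned above, with the centroid distance for the
inter-cluster term and the cluster diameter for the intra-cluster term; SciPy's pdist is assumed for
pairwise distances:

```python
import numpy as np
from scipy.spatial.distance import pdist

def dunn(X, labels):
    """Dunn index: minimal inter-cluster over maximal intra-cluster distance."""
    ids = np.unique(labels)
    centroids = np.array([X[labels == i].mean(axis=0) for i in ids])
    n = len(ids)
    min_separation = min(np.linalg.norm(centroids[i] - centroids[j])
                         for i in range(n) for j in range(i + 1, n))
    # diameter of each cluster; singleton clusters contribute zero
    max_diameter = max(pdist(X[labels == i]).max() if (labels == i).sum() > 1 else 0.0
                       for i in ids)
    return min_separation / max_diameter
```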
V.EXPERIMENT METHODOLOGY
The steps of the experiment are the following:
Environment setup: First, VMware ESXi 5.1 has been installed on a physical server, and twenty
six virtual machines (VMs) have been created on it.
Load selection and execution: Four different types of load (CPU intensive, Network intensive,
Memory intensive and Disk intensive) have been run. Each of the CPU, Memory and Disk
intensive loads has been run on seven different VMs, whereas the Network intensive load has
been run on the remaining five VMs.
Collection of result set: In the third step, data has been collected from the server using ESXTOP
replay mode. Performance data for the different resources has been collected: CPU, Memory,
Disk, Network and Power.
Feature selection: The result set contains many parameters; in this phase the relevant parameters
are selected using the feature selection algorithms.
Use K-means algorithm: Clusters have been created with the K-means clustering algorithm,
using each subset of features resulting from the feature selection algorithms.
Comparison: In this step the clusters have been compared based on the cluster analysis methods;
a sketch of these last two steps is given below.
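As an illustration of how the clustering and comparison steps fit together, here is a minimal
Python sketch assuming scikit-learn is available; names such as esxtop_data (a VM-by-parameter
matrix) and subsets (the column indices chosen by each feature selection method) are
hypothetical placeholders, not artifacts of the experiment:

```python
# Hypothetical pipeline sketch: `esxtop_data` and `subsets` are placeholder
# names, not data structures from the experiment itself.
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

def compare_subsets(esxtop_data, subsets, k=4):
    """Cluster the VMs on each selected feature subset and score the partitions."""
    scores = {}
    for method, cols in subsets.items():
        km = KMeans(n_clusters=k, n_init=10, random_state=0)
        labels = km.fit_predict(esxtop_data[:, cols])
        # smaller Davies Bouldin index = more compact, better separated clusters
        scores[method] = davies_bouldin_score(esxtop_data[:, cols], labels)
    return scores
```

The Dunn index has no scikit-learn counterpart; a function such as the sketch in Section IV
could be used alongside it.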
TABLE I. LIST OF FEATURES FOR DIFFERENT RESOURCES

CPU: %USED, %RUN, %SYS, %WAIT, %VMWAIT, %RDY, %IDLE, %OVRLP, %CSTP,
%MLMTD, %SWPWT, SWTCH/s, MIG/s

Memory: SWCUR, SWTGT, SWR/s, SWW/s, LLSWR/s, LLSWW/s, CPTDR, CPTTGT, ZERO,
SHRD, SHRDSVD, COWH, NHN, NMIG, NRMEM, NLMEM, N_L, GST_ND0, OVD_ND0,
GST_ND1, OVD_ND1, OVHDUM, OVHD, OVHDMAX, CMTTGT, CMTCHRG, CMTPPS,
CACHESZ, CACHUSD, ZIP/s, UNZIP/s, MEMSZ, GRANT, SZTGT, TCHD, TCHD_W, ACTV,
ACTVS, ACTVF, ACTVN, MCTLSZ, MCTLTGT, MCTLMAX

Disk: CMDS/s, READS/s, WRITES/s, MBREAD/s, MBWRTN/s, LAT/rd, LAT/wr

Network: PKTTX/s, MbTX/s, PKTRX/s, MbRX/s, %DRPTX, %DRPRX, ACTN/s,
PKTTXMUL/s, PKTRXMUL/s, PKTTXBRD/s, PKTRXBRD/s

Power: %USED, %UTIL, %C0, %C1, %C2, %C3, %T0, %T1, %T2, %T3, %T4, %T5, %T6, %T7
VI.RESULTS
A.Features selected by using different feature selection algorithms
Five different feature selection algorithms have been used: the CFS method, RELIEF method,
Chi-square method, Wrapper method and Rough set method. The feature selection methods have
been applied to the data generated from the experiment; each method selects a subset of relevant
features depending on the measuring metric that the method defines.
TABLE II. LIST OF SELECTED FEATURES FOR DIFFERENT RESOURCES

CPU
CFS: %USED, %IDLE, MIG/s
RELIEF: %USED, %RUN, %SYS, %VMWAIT, %RDY, %IDLE, %OVRLP, %SWPWT, MIG/s
Chi Square: %USED, %RUN, %SYS, %WAIT, %VMWAIT, %RDY, %IDLE, %OVRLP,
%CSTP, %MLMTD, %SWPWT, SWTCH/s, MIG/s
Wrapper: %USED, %RUN, %SYS, %WAIT, %VMWAIT, %RDY, %IDLE, %OVRLP, %CSTP,
%MLMTD, %SWPWT, SWTCH/s, MIG/s
Rough Set: %USED, %SYS, MIG/s

Disk
CFS: CMDS/s, LAT/rd
RELIEF: CMDS/s, READS/s, WRITES/s, MBREAD/s, LAT/rd, LAT/wr
Chi Square: CMDS/s, READS/s, WRITES/s, MBREAD/s, MBWRTN/s, LAT/rd, LAT/wr
Wrapper: CMDS/s, MBREAD/s, LAT/rd
Rough Set: CMDS/s, LAT/rd, LAT/wr

Network
CFS: PKTRX/s, MbRX/s
RELIEF: PKTTX/s, MbTX/s, PKTRX/s, MbRX/s, ACTN/s, PKTRXMUL/s
Chi Square: PKTTX/s, MbTX/s, PKTRX/s, MbRX/s
Wrapper: PKTTX/s, MbTX/s, PKTRX/s, ACTN/s, PKTRXMUL/s
Rough Set: PKTRX/s, ACTN/s, PKTRXMUL/s, PKTRXBRD/s

Memory
CFS: SWCUR, COWH, OVHD, GRANT, SZTGT
RELIEF: SZTGT, GRANT, MCTLMAX, TCHD_W, MEMSZ, TCHD, SHRDSVD, ACTVS,
MCTLSZ, SHRD, ZERO, SWCUR, MCTLTGT, ACTVF, ACTV, CMTTGT, ACTVN,
OVHDUM, OVHDMAX, OVHD, COWH, CACHUSD, NLMEM, NHN, NRMEM, N_L,
CMTPPS, CACHESZ
Chi Square: SWCUR, ZERO, SHRD, SHRDSVD, COWH, NHN, NRMEM, NLMEM, N_L,
OVHDUM, OVHD, OVHDMAX, CMTTGT, CACHESZ, CACHUSD, MEMSZ, GRANT,
SZTGT, TCHD, TCHD_W, ACTV, ACTVS, ACTVF, ACTVN, MCTLMAX
Wrapper: GRANT, SZTGT, SWCUR, SHRDSVD, SHRD, ZERO, ACTVS, TCHD_W, ACTVF,
TCHD, ACTV, CMTTGT, ACTVN, OVHD, MEMSZ, MCTLMAX, MCTLTGT, MCTLSZ,
OVHDMAX, OVHDUM, COWH, CACHUSD
Rough Set: SHRDSVD, GRANT

Power
CFS: %USED
RELIEF: %USED, %C1
Chi Square: %USED, %C0, %C1
Wrapper: %USED, %C0, %C1
Rough Set: %USED, %UTIL, %C1
B.Clustering using K-means algorithm
After obtaining the selected features, the K-means clustering algorithm has been applied on each
feature set, where each set consists of all the features selected across all resources by a particular
feature selection method. The number of clusters is four. To assess the goodness of the clusters,
the analysis has been done with the help of the Davies Bouldin index and the Dunn index.
Using these two indices the goodness of each clustering has been determined. The cluster
partition having the minimum Davies Bouldin index and the maximum Dunn index is considered
to be the best, and the feature selection algorithm related to that partition is considered to be the
best algorithm among these algorithms on this data set. From this result it can be concluded that
the parameters selected are the optimum parameters, from which a clear idea can be formed
about the types of load running on the VMware ESXi 5.1 server. The table below gives the
Davies Bouldin index and the Dunn index for the clusters found using the parameters selected by
the respective feature selection algorithms.
TABLE III. DAVIES BOULDIN INDEX AND DUNN INDEX FOR CLUSTERS

Feature selection technique   Davies Bouldin index   Dunn index
CFS                           9.9122e+004            1.4196e-006
RELIEF                        2.2679e+00             1.0874e-009
Chi Square Method             2.9018e+007            2.442e-009
Wrapper                       2.1199e+007            1.0862e-009
Rough set                     3.9079e+005            4.9040e-006
From the results it can be observed that in both cases the best result is given by the clusters
formed from the features selected by the CFS algorithm.
VII.CONCLUSION
From the results it can be concluded that the set of parameters selected by CFS is the best set of
parameters for VMware ESXi 5.1, so from that set of parameters the type of load on a VMware
ESXi 5.1 server can be determined. At the same time, VMs having high values for those
parameters exhibit a higher usage of the corresponding resources, which can be helpful for load
balancing of virtual machines.
REFERENCES
[1] R. A. Caruana and D. Freitag, "How Useful is Relevance?", Technical Report, Fall '94 AAAI
Symposium on Relevance, New Orleans, AAAI, 1994.
[2] A. L. Blum and P. Langley, "Selection of Relevant Features and Examples in Machine Learning",
Artificial Intelligence, vol. 97, pp. 245-271, 1997.
[3] J. Doak, "An Evaluation of Feature Selection Methods and their Application to Computer Security",
Technical Report CSE-92-18, Davis, CA: University of California, Department of Computer Science,
1992.
[4] H. Liu and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining, Kluwer
Academic Publishers, London, GB, 1998.
[5] M. A. Hall, "Correlation-based Feature Selection for Discrete and Numeric Class Machine
Learning".
[6] Y. Sun and D. Wu, "A RELIEF Based Feature Extraction Algorithm", Tech. Rep., 2011.
[7] K. Thangavel and C. Velayutham, "A Novel Entropy Based Unsupervised Feature Selection
Algorithm Using Rough Set Theory", IEEE International Conference on Advances in Engineering,
Science and Management (ICAESM-2012), March 30-31, 2012.
[8] Z. Zheng, X. Wu and R. Srihari, "Feature Selection for Text Categorization on Imbalanced Data",
SIGKDD Explorations, vol. 6, issue 1, p. 80, 2004.
[9] Z. Huang, "Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical
Values", Data Mining and Knowledge Discovery, vol. 2, pp. 283-304, Kluwer Academic Publishers,
1998.