Data analysis plays a prominent role in interpreting various phenomena. Data mining is the process to hypothesize useful knowledge from the extensive data. Based upon the classical statistical prototypes the data can be exploited beyond the storage and management of the data. Cluster analysis a primary investigation with little or no prior knowledge, consists of research and development across a wide variety of communities. Cluster ensembles are melange of individual solutions obtained from different clusterings to produce final quality clustering which is required in wider applications. The method arises in the perspective of increasing robustness, scalability and accuracy. This paper gives a brief overview of the generation methods and consensus functions included in cluster ensemble. The survey is to analyze the various techniques and cluster ensemble methods.
Building a Classifier Employing Prism Algorithm with Fuzzy LogicIJDKP
Classification in data mining is receiving immense interest in recent times. As the knowledge is based on
historical data, classifications of data are essential for discovering the knowledge. To decrease the
classification complexity, the quantitative attributes of data need splitting. But the splitting using the
classical logic is less accurate. This can be overcome by the use of fuzzy logic. This paper illustrates how to
build up the classification rules using the fuzzy logic. The fuzzy classifier is built on by using the prism
decision tree algorithm. This classifier produces more realistic results than the classical one. The
effectiveness of this method is justified over a sample dataset.
The improved k means with particle swarm optimizationAlexander Decker
This document summarizes a research paper that proposes an improved K-means clustering algorithm using particle swarm optimization. It begins with an introduction to data clustering and types of clustering algorithms. It then discusses K-means clustering and some of its drawbacks. Particle swarm optimization is introduced as an optimization technique inspired by swarm behavior in nature. The proposed algorithm uses particle swarm optimization to select better initial cluster centroids for K-means clustering in order to overcome some limitations of standard K-means. The algorithm works in two phases - the first uses particle swarm optimization and the second performs K-means clustering using the outputs from the first phase.
Hybrid Algorithm for Clustering Mixed Data SetsIOSR Journals
This document summarizes a hybrid algorithm for clustering mixed data sets that was proposed in reference [1]. The algorithm uses a genetic k-means approach to cluster both numeric and categorical data, overcoming limitations of other algorithms that can only handle one data type. It aims to minimize the total within-cluster variation to group similar objects. The selection operator uses proportional selection to determine the population for the next generation based on each solution's probability and fitness. The algorithm was reviewed, implemented in a prototype application, and found to improve performance compared to other related clustering algorithms like GKMODE and IGKA that also handle mixed data types.
Particle Swarm Optimization based K-Prototype Clustering Algorithm iosrjce
This document summarizes a research paper that proposes a new Particle Swarm Optimization (PSO) based K-Prototype clustering algorithm to cluster mixed numeric and categorical data. It begins with background information on clustering algorithms like K-Means, K-Modes, and K-Prototype. It then describes the K-Prototype algorithm, PSO, and discrete binary PSO. Related work integrating PSO with other clustering algorithms is also reviewed. The proposed approach uses binary PSO to select improved initial prototypes for K-Prototype clustering in order to obtain better clustering results than traditional K-Prototype and avoid local optima.
An Heterogeneous Population-Based Genetic Algorithm for Data Clusteringijeei-iaes
As a primary data mining method for knowledge discovery, clustering is a technique of classifying a dataset into groups of similar objects. The most popular method for data clustering K-means suffers from the drawbacks of requiring the number of clusters and their initial centers, which should be provided by the user. In the literature, several methods have proposed in a form of k-means variants, genetic algorithms, or combinations between them for calculating the number of clusters and finding proper clusters centers. However, none of these solutions has provided satisfactory results and determining the number of clusters and the initial centers are still the main challenge in clustering processes. In this paper we present an approach to automatically generate such parameters to achieve optimal clusters using a modified genetic algorithm operating on varied individual structures and using a new crossover operator. Experimental results show that our modified genetic algorithm is a better efficient alternative to the existing approaches.
Cancer data partitioning with data structure and difficulty independent clust...IRJET Journal
This document discusses cancer data partitioning using clustering techniques. It begins with an introduction to clustering concepts and different clustering methods like k-means, hierarchical agglomerative clustering, and partitioning methods. It then reviews literature on clustering algorithms and ensemble methods applied to problems like speaker diarization and tumor clustering from gene expression data. The document analyzes issues with existing clustering methodology and proposes a new dynamic ensemble membership selection scheme to support data structure and complexity independent clustering for cancer data partitioning. The method combines partition around medoids clustering with an incremental semi-supervised cluster ensemble framework to improve healthcare data partitioning accuracy.
A Survey on Constellation Based Attribute Selection Method for High Dimension...IJERA Editor
Attribute Selection is an important topic in Data Mining, because it is the effective way for reducing dimensionality, removing irrelevant data, removing redundant data, & increasing accuracy of the data. It is the process of identifying a subset of the most useful attributes that produces compatible results as the original entire set of attribute. Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group called a cluster are more similar in some sense or another to each other than to those in other groups (Clusters). There are various approaches & techniques for attribute subset selection namely Wrapper approach, Filter Approach, Relief Algorithm, Distributional clustering etc. But each of one having some disadvantages like unable to handle large volumes of data, computational complexity, accuracy is not guaranteed, difficult to evaluate and redundancy detection etc. To get the upper hand on some of these issues in attribute selection this paper proposes a technique that aims to design an effective clustering based attribute selection method for high dimensional data. Initially, attributes are divided into clusters by using graph-based clustering method like minimum spanning tree (MST). In the second step, the most representative attribute that is strongly related to target classes is selected from each cluster to form a subset of attributes. The purpose is to increase the level of accuracy, reduce dimensionality; shorter training time and improves generalization by reducing over fitting.
This document provides an overview of different techniques for clustering categorical data. It discusses various clustering algorithms that have been used for categorical data, including K-modes, ROCK, COBWEB, and EM algorithms. It also reviews more recently developed algorithms for categorical data clustering, such as algorithms based on particle swarm optimization, rough set theory, and feature weighting schemes. The document concludes that clustering categorical data remains an important area of research, with opportunities to develop techniques that initialize cluster centers better.
Building a Classifier Employing Prism Algorithm with Fuzzy LogicIJDKP
Classification in data mining is receiving immense interest in recent times. As the knowledge is based on
historical data, classifications of data are essential for discovering the knowledge. To decrease the
classification complexity, the quantitative attributes of data need splitting. But the splitting using the
classical logic is less accurate. This can be overcome by the use of fuzzy logic. This paper illustrates how to
build up the classification rules using the fuzzy logic. The fuzzy classifier is built on by using the prism
decision tree algorithm. This classifier produces more realistic results than the classical one. The
effectiveness of this method is justified over a sample dataset.
The improved k means with particle swarm optimizationAlexander Decker
This document summarizes a research paper that proposes an improved K-means clustering algorithm using particle swarm optimization. It begins with an introduction to data clustering and types of clustering algorithms. It then discusses K-means clustering and some of its drawbacks. Particle swarm optimization is introduced as an optimization technique inspired by swarm behavior in nature. The proposed algorithm uses particle swarm optimization to select better initial cluster centroids for K-means clustering in order to overcome some limitations of standard K-means. The algorithm works in two phases - the first uses particle swarm optimization and the second performs K-means clustering using the outputs from the first phase.
Hybrid Algorithm for Clustering Mixed Data SetsIOSR Journals
This document summarizes a hybrid algorithm for clustering mixed data sets that was proposed in reference [1]. The algorithm uses a genetic k-means approach to cluster both numeric and categorical data, overcoming limitations of other algorithms that can only handle one data type. It aims to minimize the total within-cluster variation to group similar objects. The selection operator uses proportional selection to determine the population for the next generation based on each solution's probability and fitness. The algorithm was reviewed, implemented in a prototype application, and found to improve performance compared to other related clustering algorithms like GKMODE and IGKA that also handle mixed data types.
Particle Swarm Optimization based K-Prototype Clustering Algorithm iosrjce
This document summarizes a research paper that proposes a new Particle Swarm Optimization (PSO) based K-Prototype clustering algorithm to cluster mixed numeric and categorical data. It begins with background information on clustering algorithms like K-Means, K-Modes, and K-Prototype. It then describes the K-Prototype algorithm, PSO, and discrete binary PSO. Related work integrating PSO with other clustering algorithms is also reviewed. The proposed approach uses binary PSO to select improved initial prototypes for K-Prototype clustering in order to obtain better clustering results than traditional K-Prototype and avoid local optima.
An Heterogeneous Population-Based Genetic Algorithm for Data Clusteringijeei-iaes
As a primary data mining method for knowledge discovery, clustering is a technique of classifying a dataset into groups of similar objects. The most popular method for data clustering K-means suffers from the drawbacks of requiring the number of clusters and their initial centers, which should be provided by the user. In the literature, several methods have proposed in a form of k-means variants, genetic algorithms, or combinations between them for calculating the number of clusters and finding proper clusters centers. However, none of these solutions has provided satisfactory results and determining the number of clusters and the initial centers are still the main challenge in clustering processes. In this paper we present an approach to automatically generate such parameters to achieve optimal clusters using a modified genetic algorithm operating on varied individual structures and using a new crossover operator. Experimental results show that our modified genetic algorithm is a better efficient alternative to the existing approaches.
Cancer data partitioning with data structure and difficulty independent clust...IRJET Journal
This document discusses cancer data partitioning using clustering techniques. It begins with an introduction to clustering concepts and different clustering methods like k-means, hierarchical agglomerative clustering, and partitioning methods. It then reviews literature on clustering algorithms and ensemble methods applied to problems like speaker diarization and tumor clustering from gene expression data. The document analyzes issues with existing clustering methodology and proposes a new dynamic ensemble membership selection scheme to support data structure and complexity independent clustering for cancer data partitioning. The method combines partition around medoids clustering with an incremental semi-supervised cluster ensemble framework to improve healthcare data partitioning accuracy.
A Survey on Constellation Based Attribute Selection Method for High Dimension...IJERA Editor
Attribute Selection is an important topic in Data Mining, because it is the effective way for reducing dimensionality, removing irrelevant data, removing redundant data, & increasing accuracy of the data. It is the process of identifying a subset of the most useful attributes that produces compatible results as the original entire set of attribute. Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group called a cluster are more similar in some sense or another to each other than to those in other groups (Clusters). There are various approaches & techniques for attribute subset selection namely Wrapper approach, Filter Approach, Relief Algorithm, Distributional clustering etc. But each of one having some disadvantages like unable to handle large volumes of data, computational complexity, accuracy is not guaranteed, difficult to evaluate and redundancy detection etc. To get the upper hand on some of these issues in attribute selection this paper proposes a technique that aims to design an effective clustering based attribute selection method for high dimensional data. Initially, attributes are divided into clusters by using graph-based clustering method like minimum spanning tree (MST). In the second step, the most representative attribute that is strongly related to target classes is selected from each cluster to form a subset of attributes. The purpose is to increase the level of accuracy, reduce dimensionality; shorter training time and improves generalization by reducing over fitting.
This document provides an overview of different techniques for clustering categorical data. It discusses various clustering algorithms that have been used for categorical data, including K-modes, ROCK, COBWEB, and EM algorithms. It also reviews more recently developed algorithms for categorical data clustering, such as algorithms based on particle swarm optimization, rough set theory, and feature weighting schemes. The document concludes that clustering categorical data remains an important area of research, with opportunities to develop techniques that initialize cluster centers better.
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithmaciijournal
In this paper we attempt to solve an automatic clustering problem by optimizing multiple objectives such as automatic k-determination and a set of cluster validity indices concurrently. The proposed automatic clustering technique uses the most recent optimization algorithm Jaya as an underlying optimization stratagem. This evolutionary technique always aims to attain global best solution rather than a local best solution in larger datasets. The explorations and exploitations imposed on the proposed work results to detect the number of automatic clusters, appropriate partitioning present in data sets and mere optimal values towards CVIs frontiers. Twelve datasets of different intricacy are used to endorse the performance of aimed algorithm. The experiments lay bare that the conjectural advantages of multi objective clustering optimized with evolutionary approaches decipher into realistic and scalable performance paybacks.
Automatic Feature Subset Selection using Genetic Algorithm for Clusteringidescitation
Feature subset selection is a process of selecting a
subset of minimal, relevant features and is a pre processing
technique for a wide variety of applications. High dimensional
data clustering is a challenging task in data mining. Reduced
set of features helps to make the patterns easier to understand.
Reduced set of features are more significant if they are
application specific. Almost all existing feature subset
selection algorithms are not automatic and are not application
specific. This paper made an attempt to find the feature subset
for optimal clusters while clustering. The proposed Automatic
Feature Subset Selection using Genetic Algorithm (AFSGA)
identifies the required features automatically and reduces
the computational cost in determining good clusters. The
performance of AFSGA is tested using public and synthetic
datasets with varying dimensionality. Experimental results
have shown the improved efficacy of the algorithm with optimal
clusters and computational cost.
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithmaciijournal
The document describes an automatic unsupervised data classification method using the Jaya evolutionary algorithm. It proposes using Jaya to optimize multiple cluster validity indices (CVIs) simultaneously to determine the optimal number of clusters and cluster assignments. Twelve real-world datasets from different domains are used to evaluate the method. The results show that the proposed AutoJAYA algorithm is able to accurately detect the number of clusters in each dataset and achieve good performance according to various CVIs, demonstrating its effectiveness at automatic unsupervised data classification.
Engineering Research Publication
Best International Journals, High Impact Journals,
International Journal of Engineering & Technical Research
ISSN : 2321-0869 (O) 2454-4698 (P)
www.erpublication.org
In the present day huge amount of data is generated in every minute and transferred frequently. Although
the data is sometimes static but most commonly it is dynamic and transactional. New data that is being
generated is getting constantly added to the old/existing data. To discover the knowledge from this
incremental data, one approach is to run the algorithm repeatedly for the modified data sets which is time
consuming. Again to analyze the datasets properly, construction of efficient classifier model is necessary.
The objective of developing such a classifier is to classify unlabeled dataset into appropriate classes. The
paper proposes a dimension reduction algorithm that can be applied in dynamic environment for
generation of reduced attribute set as dynamic reduct, and an optimization algorithm which uses the
reduct and build up the corresponding classification system. The method analyzes the new dataset, when it
becomes available, and modifies the reduct accordingly to fit the entire dataset and from the entire data
set, interesting optimal classification rule sets are generated. The concepts of discernibility relation,
attribute dependency and attribute significance of Rough Set Theory are integrated for the generation of
dynamic reduct set, and optimal classification rules are selected using PSO method, which not only
reduces the complexity but also helps to achieve higher accuracy of the decision system. The proposed
method has been applied on some benchmark dataset collected from the UCI repository and dynamic
reduct is computed, and from the reduct optimal classification rules are also generated. Experimental
result shows the efficiency of the proposed method.
A Novel Clustering Method for Similarity Measuring in Text DocumentsIJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSEditor IJCATR
This paper presents a hybrid data mining approach based on supervised learning and unsupervised learning to identify the closest data patterns in the data base. This technique enables to achieve the maximum accuracy rate with minimal complexity. The proposed algorithm is compared with traditional clustering and classification algorithm and it is also implemented with multidimensional datasets. The implementation results show better prediction accuracy and reliability.
Column store decision tree classification of unseen attribute setijma
A decision tree can be used for clustering of frequently used attributes to improve tuple reconstruction time
in column-stores databases. Due to ad-hoc nature of queries, strongly correlative attributes are grouped
together using a decision tree to share a common minimum support probability distribution. At the same
time in order to predict the cluster for unseen attribute set, the decision tree may work as a classifier. In
this paper we propose classification and clustering of unseen attribute set using decision tree to improve
tuple reconstruction time.
Bat-Cluster: A Bat Algorithm-based Automated Graph Clustering Approach IJECEIAES
The document presents a new approach called Bat-Cluster (BC) for automated graph clustering. BC combines the Fast Fourier Domain Positioning (FFDP) algorithm and the Bat Algorithm. FFDP positions graph nodes, then Bat Algorithm optimizes clustering by finding configurations that minimize the Davies-Bouldin Index. BC is tested on four benchmark graphs and outperforms Particle Swarm Optimization, Ant Colony Optimization, and Differential Evolution in providing higher clustering precision.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMSijdkp
Subspace clustering discovers the clusters embedded in multiple, overlapping subspaces of high
dimensional data. Many significant subspace clustering algorithms exist, each having different
characteristics caused by the use of different techniques, assumptions, heuristics used etc. A comprehensive
classification scheme is essential which will consider all such characteristics to divide subspace clustering
approaches in various families. The algorithms belonging to same family will satisfy common
characteristics. Such a categorization will help future developers to better understand the quality criteria to
be used and similar algorithms to be used to compare results with their proposed clustering algorithms. In
this paper, we first proposed the concept of SCAF (Subspace Clustering Algorithms’ Family).
Characteristics of SCAF will be based on the classes such as cluster orientation, overlap of dimensions etc.
As an illustration, we further provided a comprehensive, systematic description and comparison of few
significant algorithms belonging to “Axis parallel, overlapping, density based” SCAF.
TWO PARTY HIERARICHAL CLUSTERING OVER HORIZONTALLY PARTITIONED DATA SETIJDKP
This document summarizes a research paper that proposes a two-party hierarchical clustering approach for horizontally partitioned data to enable privacy-preserving data mining. The key points are:
1) The paper presents an approach for applying hierarchical clustering across two parties that hold horizontally partitioned data, with the goal of preserving privacy.
2) Each party independently computes k-cluster centers on their own data and encrypts the distance matrices before sharing. Hierarchical clustering is then applied to merge the clusters.
3) An algorithm is provided for identifying the closest cluster for each data point based on the merged distance matrices.
4) The approach is analyzed and compared to other clustering techniques, demonstrating it has lower computational complexity
This document discusses using fuzzy set theory and decision trees to predict student performance. It proposes using fuzzy sets to represent numeric student data like test scores and attendance to allow for imprecise values. A decision tree is generated on this fuzzy data set to classify students as passing or failing. The fuzzy decision tree achieves an accuracy of 81.5% compared to 76% for a non-fuzzy decision tree, indicating fuzzy sets improve predictive performance. Location, attendance, and prior academic performance were identified as important factors impacting student results.
This document provides an overview and summary of Pankaj Jajoo's 2008 master's thesis on improving document clustering algorithms. The thesis explores two approaches: 1) preprocessing the graph representation of documents to remove noise before applying standard graph partitioning algorithms, and 2) clustering words first before clustering documents to reduce noise. Experimental results on three datasets show these approaches improve clustering quality over standard K-Means clustering. The thesis provides background on clustering, reviews existing document clustering methods, and describes the two new algorithms and evaluation of their performance.
ROLE OF CERTAINTY FACTOR IN GENERATING ROUGH-FUZZY RULEIJCSEA Journal
The generation of effective feature-based rules is essential to the development of any intelligent system. This paper presents an approach that integrates a powerful fuzzy rule generation algorithm with a rough set-assisted feature reduction method to generate diagnostic rule with a certainty factor. Certainty factor of each rule is calculated by considering both the membership value of each linguistic term introduced at time of fuzzyfication of data as well as possibility values, due to inconsistent data, generated by rough set theory at time of rule generation. In time of knowledge inferencing in an intelligent system, certainty factor of each rule will play an important role to find out the appropriate rule to be selected. Experimental results demonstrate the superiority of our approach.
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...ijistjournal
Classification is widely used technique in the data mining domain, where scalability and efficiency are the immediate problems in classification algorithms for large databases. We suggest improvements to the existing C4.5 decision tree algorithm. In this paper attribute oriented induction (AOI) and relevance analysis are incorporated with concept hierarchy’s knowledge and HeightBalancePriority algorithm for construction of decision tree along with Multi level mining. The assignment of priorities to attributes is done by evaluating information entropy, at different levels of abstraction for building decision tree using HeightBalancePriority algorithm. Modified DMQL queries are used to understand and explore the shortcomings of the decision trees generated by C4.5 classifier for education dataset and the results are compared with the proposed approach.
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...ijistjournal
Classification is widely used technique in the data mining domain, where scalability and efficiency are the immediate problems in classification algorithms for large databases. We suggest improvements to the existing C4.5 decision tree algorithm. In this paper attribute oriented induction (AOI) and relevance analysis are incorporated with concept hierarchy’s knowledge and HeightBalancePriority algorithm for construction of decision tree along with Multi level mining. The assignment of priorities to attributes is done by evaluating information entropy, at different levels of abstraction for building decision tree using HeightBalancePriority algorithm. Modified DMQL queries are used to understand and explore the shortcomings of the decision trees generated by C4.5 classifier for education dataset and the results are compared with the proposed approach.
Introduction to Multi-Objective Clustering EnsembleIJSRD
Association rule mining is a popular and well researched method for discovering interesting relations between variables in large databases. In this paper we introduce the concept of Data mining, Association rule and Multilevel association rule with different algorithm, its advantage and concept of Fuzzy logic and Genetic Algorithm. Multilevel association rules can be mined efficiently using concept hierarchies under a support-confidence framework.
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGijcsa
This document provides a survey of optimization approaches that have been applied to text document clustering. It discusses several clustering algorithms and categorizes them as partitioning methods, hierarchical methods, density-based methods, grid-based methods, model-based methods, frequent pattern-based clustering, and constraint-based clustering. It then describes several soft computing techniques that have been used as optimization approaches for text document clustering, including genetic algorithms, bees algorithms, particle swarm optimization, and ant colony optimization. These optimization techniques perform a global search to improve the quality and efficiency of document clustering algorithms.
Assessment of Cluster Tree Analysis based on Data Linkagesjournal ijrtem
Abstract: Details linkage is a procedure which almost adjoins two or more places of data (surveyed or proprietary) from different companies to generate a value chest of information which can be used for further analysis. This allows for the real application of the details. One-to-Many data linkage affiliates an enterprise from the first data set with a number of related companies from the other data places. Before performs concentrate on accomplishing one-to-one data linkages. So formerly a two level clustering shrub known as One-Class Clustering Tree (OCCT) with designed in Jaccard Likeness evaluate was suggested in which each flyer contains team instead of only one categorized sequence. OCCT's strategy to use Jaccard's similarity co-efficient increases time complexness significantly. So we recommend to substitute jaccard's similarity coefficient with Jaro wrinket similarity evaluate to acquire the team similarity related because it requires purchase into consideration using positional indices to calculate relevance compared with Jaccard's. An assessment of our suggested idea suffices as approval of an enhanced one-to-many data linkage system.
Index Terms: Maximum-Weighted Bipartite Matching, Ant Colony Optimization, Graph Partitioning Technique
Enhanced Clustering Algorithm for Processing Online DataIOSR Journals
This document proposes an enhanced incremental clustering algorithm for processing online data. It discusses existing clustering algorithms like leader clustering and hierarchical clustering, which have limitations in handling dynamic data. The proposed algorithm aims to dynamically create initial clusters and rearrange clusters based on data characteristics, allowing for more accurate clustering of online data over time. It also introduces a new frequency-based method for searching and retrieving specific data from fixed clusters.
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithmaciijournal
In this paper we attempt to solve an automatic clustering problem by optimizing multiple objectives such as automatic k-determination and a set of cluster validity indices concurrently. The proposed automatic clustering technique uses the most recent optimization algorithm Jaya as an underlying optimization stratagem. This evolutionary technique always aims to attain global best solution rather than a local best solution in larger datasets. The explorations and exploitations imposed on the proposed work results to detect the number of automatic clusters, appropriate partitioning present in data sets and mere optimal values towards CVIs frontiers. Twelve datasets of different intricacy are used to endorse the performance of aimed algorithm. The experiments lay bare that the conjectural advantages of multi objective clustering optimized with evolutionary approaches decipher into realistic and scalable performance paybacks.
Automatic Feature Subset Selection using Genetic Algorithm for Clusteringidescitation
Feature subset selection is a process of selecting a
subset of minimal, relevant features and is a pre processing
technique for a wide variety of applications. High dimensional
data clustering is a challenging task in data mining. Reduced
set of features helps to make the patterns easier to understand.
Reduced set of features are more significant if they are
application specific. Almost all existing feature subset
selection algorithms are not automatic and are not application
specific. This paper made an attempt to find the feature subset
for optimal clusters while clustering. The proposed Automatic
Feature Subset Selection using Genetic Algorithm (AFSGA)
identifies the required features automatically and reduces
the computational cost in determining good clusters. The
performance of AFSGA is tested using public and synthetic
datasets with varying dimensionality. Experimental results
have shown the improved efficacy of the algorithm with optimal
clusters and computational cost.
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithmaciijournal
The document describes an automatic unsupervised data classification method using the Jaya evolutionary algorithm. It proposes using Jaya to optimize multiple cluster validity indices (CVIs) simultaneously to determine the optimal number of clusters and cluster assignments. Twelve real-world datasets from different domains are used to evaluate the method. The results show that the proposed AutoJAYA algorithm is able to accurately detect the number of clusters in each dataset and achieve good performance according to various CVIs, demonstrating its effectiveness at automatic unsupervised data classification.
Engineering Research Publication
Best International Journals, High Impact Journals,
International Journal of Engineering & Technical Research
ISSN : 2321-0869 (O) 2454-4698 (P)
www.erpublication.org
In the present day huge amount of data is generated in every minute and transferred frequently. Although
the data is sometimes static but most commonly it is dynamic and transactional. New data that is being
generated is getting constantly added to the old/existing data. To discover the knowledge from this
incremental data, one approach is to run the algorithm repeatedly for the modified data sets which is time
consuming. Again to analyze the datasets properly, construction of efficient classifier model is necessary.
The objective of developing such a classifier is to classify unlabeled dataset into appropriate classes. The
paper proposes a dimension reduction algorithm that can be applied in dynamic environment for
generation of reduced attribute set as dynamic reduct, and an optimization algorithm which uses the
reduct and build up the corresponding classification system. The method analyzes the new dataset, when it
becomes available, and modifies the reduct accordingly to fit the entire dataset and from the entire data
set, interesting optimal classification rule sets are generated. The concepts of discernibility relation,
attribute dependency and attribute significance of Rough Set Theory are integrated for the generation of
dynamic reduct set, and optimal classification rules are selected using PSO method, which not only
reduces the complexity but also helps to achieve higher accuracy of the decision system. The proposed
method has been applied on some benchmark dataset collected from the UCI repository and dynamic
reduct is computed, and from the reduct optimal classification rules are also generated. Experimental
result shows the efficiency of the proposed method.
A Novel Clustering Method for Similarity Measuring in Text DocumentsIJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSEditor IJCATR
This paper presents a hybrid data mining approach based on supervised learning and unsupervised learning to identify the closest data patterns in the data base. This technique enables to achieve the maximum accuracy rate with minimal complexity. The proposed algorithm is compared with traditional clustering and classification algorithm and it is also implemented with multidimensional datasets. The implementation results show better prediction accuracy and reliability.
Column store decision tree classification of unseen attribute setijma
A decision tree can be used for clustering of frequently used attributes to improve tuple reconstruction time
in column-stores databases. Due to ad-hoc nature of queries, strongly correlative attributes are grouped
together using a decision tree to share a common minimum support probability distribution. At the same
time in order to predict the cluster for unseen attribute set, the decision tree may work as a classifier. In
this paper we propose classification and clustering of unseen attribute set using decision tree to improve
tuple reconstruction time.
Bat-Cluster: A Bat Algorithm-based Automated Graph Clustering Approach IJECEIAES
The document presents a new approach called Bat-Cluster (BC) for automated graph clustering. BC combines the Fast Fourier Domain Positioning (FFDP) algorithm and the Bat Algorithm. FFDP positions graph nodes, then Bat Algorithm optimizes clustering by finding configurations that minimize the Davies-Bouldin Index. BC is tested on four benchmark graphs and outperforms Particle Swarm Optimization, Ant Colony Optimization, and Differential Evolution in providing higher clustering precision.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMSijdkp
Subspace clustering discovers the clusters embedded in multiple, overlapping subspaces of high
dimensional data. Many significant subspace clustering algorithms exist, each having different
characteristics caused by the use of different techniques, assumptions, heuristics used etc. A comprehensive
classification scheme is essential which will consider all such characteristics to divide subspace clustering
approaches in various families. The algorithms belonging to same family will satisfy common
characteristics. Such a categorization will help future developers to better understand the quality criteria to
be used and similar algorithms to be used to compare results with their proposed clustering algorithms. In
this paper, we first proposed the concept of SCAF (Subspace Clustering Algorithms’ Family).
Characteristics of SCAF will be based on the classes such as cluster orientation, overlap of dimensions etc.
As an illustration, we further provided a comprehensive, systematic description and comparison of few
significant algorithms belonging to “Axis parallel, overlapping, density based” SCAF.
TWO PARTY HIERARICHAL CLUSTERING OVER HORIZONTALLY PARTITIONED DATA SETIJDKP
This document summarizes a research paper that proposes a two-party hierarchical clustering approach for horizontally partitioned data to enable privacy-preserving data mining. The key points are:
1) The paper presents an approach for applying hierarchical clustering across two parties that hold horizontally partitioned data, with the goal of preserving privacy.
2) Each party independently computes k-cluster centers on their own data and encrypts the distance matrices before sharing. Hierarchical clustering is then applied to merge the clusters.
3) An algorithm is provided for identifying the closest cluster for each data point based on the merged distance matrices.
4) The approach is analyzed and compared to other clustering techniques, demonstrating it has lower computational complexity
This document discusses using fuzzy set theory and decision trees to predict student performance. It proposes using fuzzy sets to represent numeric student data like test scores and attendance to allow for imprecise values. A decision tree is generated on this fuzzy data set to classify students as passing or failing. The fuzzy decision tree achieves an accuracy of 81.5% compared to 76% for a non-fuzzy decision tree, indicating fuzzy sets improve predictive performance. Location, attendance, and prior academic performance were identified as important factors impacting student results.
This document provides an overview and summary of Pankaj Jajoo's 2008 master's thesis on improving document clustering algorithms. The thesis explores two approaches: 1) preprocessing the graph representation of documents to remove noise before applying standard graph partitioning algorithms, and 2) clustering words first before clustering documents to reduce noise. Experimental results on three datasets show these approaches improve clustering quality over standard K-Means clustering. The thesis provides background on clustering, reviews existing document clustering methods, and describes the two new algorithms and evaluation of their performance.
ROLE OF CERTAINTY FACTOR IN GENERATING ROUGH-FUZZY RULEIJCSEA Journal
The generation of effective feature-based rules is essential to the development of any intelligent system. This paper presents an approach that integrates a powerful fuzzy rule generation algorithm with a rough set-assisted feature reduction method to generate diagnostic rule with a certainty factor. Certainty factor of each rule is calculated by considering both the membership value of each linguistic term introduced at time of fuzzyfication of data as well as possibility values, due to inconsistent data, generated by rough set theory at time of rule generation. In time of knowledge inferencing in an intelligent system, certainty factor of each rule will play an important role to find out the appropriate rule to be selected. Experimental results demonstrate the superiority of our approach.
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...ijistjournal
Classification is widely used technique in the data mining domain, where scalability and efficiency are the immediate problems in classification algorithms for large databases. We suggest improvements to the existing C4.5 decision tree algorithm. In this paper attribute oriented induction (AOI) and relevance analysis are incorporated with concept hierarchy’s knowledge and HeightBalancePriority algorithm for construction of decision tree along with Multi level mining. The assignment of priorities to attributes is done by evaluating information entropy, at different levels of abstraction for building decision tree using HeightBalancePriority algorithm. Modified DMQL queries are used to understand and explore the shortcomings of the decision trees generated by C4.5 classifier for education dataset and the results are compared with the proposed approach.
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...ijistjournal
Classification is widely used technique in the data mining domain, where scalability and efficiency are the immediate problems in classification algorithms for large databases. We suggest improvements to the existing C4.5 decision tree algorithm. In this paper attribute oriented induction (AOI) and relevance analysis are incorporated with concept hierarchy’s knowledge and HeightBalancePriority algorithm for construction of decision tree along with Multi level mining. The assignment of priorities to attributes is done by evaluating information entropy, at different levels of abstraction for building decision tree using HeightBalancePriority algorithm. Modified DMQL queries are used to understand and explore the shortcomings of the decision trees generated by C4.5 classifier for education dataset and the results are compared with the proposed approach.
Introduction to Multi-Objective Clustering EnsembleIJSRD
Association rule mining is a popular and well researched method for discovering interesting relations between variables in large databases. In this paper we introduce the concept of Data mining, Association rule and Multilevel association rule with different algorithm, its advantage and concept of Fuzzy logic and Genetic Algorithm. Multilevel association rules can be mined efficiently using concept hierarchies under a support-confidence framework.
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGijcsa
This document provides a survey of optimization approaches that have been applied to text document clustering. It discusses several clustering algorithms and categorizes them as partitioning methods, hierarchical methods, density-based methods, grid-based methods, model-based methods, frequent pattern-based clustering, and constraint-based clustering. It then describes several soft computing techniques that have been used as optimization approaches for text document clustering, including genetic algorithms, bees algorithms, particle swarm optimization, and ant colony optimization. These optimization techniques perform a global search to improve the quality and efficiency of document clustering algorithms.
Assessment of Cluster Tree Analysis based on Data Linkagesjournal ijrtem
Abstract: Details linkage is a procedure which almost adjoins two or more places of data (surveyed or proprietary) from different companies to generate a value chest of information which can be used for further analysis. This allows for the real application of the details. One-to-Many data linkage affiliates an enterprise from the first data set with a number of related companies from the other data places. Before performs concentrate on accomplishing one-to-one data linkages. So formerly a two level clustering shrub known as One-Class Clustering Tree (OCCT) with designed in Jaccard Likeness evaluate was suggested in which each flyer contains team instead of only one categorized sequence. OCCT's strategy to use Jaccard's similarity co-efficient increases time complexness significantly. So we recommend to substitute jaccard's similarity coefficient with Jaro wrinket similarity evaluate to acquire the team similarity related because it requires purchase into consideration using positional indices to calculate relevance compared with Jaccard's. An assessment of our suggested idea suffices as approval of an enhanced one-to-many data linkage system.
Index Terms: Maximum-Weighted Bipartite Matching, Ant Colony Optimization, Graph Partitioning Technique
Enhanced Clustering Algorithm for Processing Online DataIOSR Journals
This document proposes an enhanced incremental clustering algorithm for processing online data. It discusses existing clustering algorithms like leader clustering and hierarchical clustering, which have limitations in handling dynamic data. The proposed algorithm aims to dynamically create initial clusters and rearrange clusters based on data characteristics, allowing for more accurate clustering of online data over time. It also introduces a new frequency-based method for searching and retrieving specific data from fixed clusters.
This document compares hierarchical and non-hierarchical clustering algorithms. It summarizes four clustering algorithms: K-Means, K-Medoids, Farthest First Clustering (hierarchical algorithms), and DBSCAN (non-hierarchical algorithm). It describes the methodology of each algorithm and provides pseudocode. It also describes the datasets used to evaluate the performance of the algorithms and the evaluation metrics. The goal is to compare the performance of the clustering methods on different datasets.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
This document discusses using particle swarm optimization to improve the k-prototype clustering algorithm. The k-prototype algorithm clusters data with both numeric and categorical attributes but can get stuck in local optima. The proposed method uses particle swarm optimization, a global optimization technique, to guide the k-prototype algorithm towards better clusterings. Particle swarm optimization models potential solutions as particles that explore the search space. It is integrated with k-prototype clustering to avoid locally optimal solutions and produce better clusterings. The method is tested on standard benchmark datasets and shown to outperform traditional k-modes and k-prototype clustering algorithms.
Applications Of Clustering Techniques In Data Mining A Comparative StudyFiona Phillips
This document discusses and compares various clustering techniques used in data mining. It begins with an introduction to data mining and clustering. It then discusses different types of clustering (hard vs soft), popular clustering methodologies like K-means, hierarchical, density-based etc. It provides examples of clustering applications. The document also discusses challenges in clustering large datasets and proposes approaches like MapReduce. It evaluates pros and cons of different clustering algorithms and their real-world applications.
An Analysis On Clustering Algorithms In Data MiningGina Rizzo
This document summarizes and analyzes various clustering algorithms used in data mining. It begins with an introduction to clustering and discusses hierarchical and partitional clustering approaches. Specific algorithms discussed include k-means, single linkage, and fuzzy clustering. The document compares techniques on aspects like complexity and sensitivity. It concludes that choosing the best clustering algorithm depends on factors like the data type, size, and desired cluster structure. The ideal is to have criteria to automatically select the appropriate algorithm for a given data set.
Multilevel techniques for the clustering problemcsandit
Data Mining is concerned with the discovery of interesting patterns and knowledge in data
repositories. Cluster Analysis which belongs to the core methods of data mining is the process
of discovering homogeneous groups called clusters. Given a data-set and some measure of
similarity between data objects, the goal in most clustering algorithms is maximizing both the
homogeneity within each cluster and the heterogeneity between different clusters. In this work,
two multilevel algorithms for the clustering problem are introduced. The multilevel
paradigm suggests looking at the clustering problem as a hierarchical optimization process
going through different levels evolving from a coarse grain to fine grain strategy. The clustering
problem is solved by first reducing the problem level by level to a coarser problem where an
initial clustering is computed. The clustering of the coarser problem is mapped back level-bylevel
to obtain a better clustering of the original problem by refining the intermediate different
clustering obtained at various levels. A benchmark using a number of data sets collected from a
variety of domains is used to compare the effectiveness of the hierarchical approach against its
single-level counterpart.
This document summarizes a research paper on clustering algorithms in data mining. It begins by defining clustering as an unsupervised learning technique that organizes unlabeled data into groups of similar objects. The document then reviews different types of clustering algorithms and methods for evaluating clustering results. Key steps in clustering include feature selection, algorithm selection, and cluster validation to assess how well the derived groups represent the underlying data structure. A variety of clustering algorithms exist and must be chosen based on the problem characteristics.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
11.software modules clustering an effective approach for reusabilityAlexander Decker
This document summarizes previous work on using clustering techniques for software module classification and reusability. It discusses hierarchical clustering and non-hierarchical clustering methods. Previous studies have used these techniques for software component classification, identifying reusable software modules, course clustering based on industry needs, mobile phone clustering based on attributes, and customer clustering based on electricity load. The document provides background on clustering analysis and its uses in various domains including software testing, pattern recognition, and software restructuring.
A Comparative Study Of Various Clustering Algorithms In Data MiningNatasha Grant
This document provides an overview and comparison of various clustering algorithms used in data mining. It discusses the key types of clustering algorithms: partition-based (such as k-means and k-medoids), hierarchical-based, density-based, and grid-based. For partition-based algorithms, it describes k-means and k-medoids in more detail. It also discusses hierarchical clustering approaches like agglomerative nesting. The document aims to provide insights into different clustering techniques for segmenting and grouping data in an unsupervised manner.
An Iterative Improved k-means ClusteringIDES Editor
This document presents a new iterative improved k-means clustering algorithm.
The k-means clustering algorithm is widely used but depends on random initial starting points, which can impact the results. The new algorithm aims to provide better initial starting points to improve k-means clustering results.
The algorithm divides the data into K initial groups, calculates new cluster centers iteratively using a distance-based formula, assigns data points to clusters, and repeats until cluster centers no longer change. Experimental results on several datasets show the new algorithm converges in fewer iterations than standard k-means, demonstrating it finds better cluster solutions.
(2016)application of parallel glowworm swarm optimization algorithm for data ...Akram Pasha
The document describes a research article that proposes applying a parallel glowworm swarm optimization algorithm for clustering large unstructured data sets. The algorithm uses optimized glowworm swarms to evaluate the clustering problem and find multiple cluster centroids. It employs the MapReduce framework for parallelization, which balances the load, localizes the data, and provides fault tolerance. Experiments showed the algorithm scales well with increasing data set sizes and achieves near-linear speed while maintaining high-quality clustering, demonstrating it is more efficient than traditional algorithms for clustering large unstructured data.
This document presents a feature clustering algorithm to reduce the dimensionality of feature vectors for text classification. The algorithm groups words in documents into clusters based on similarity, with each cluster characterized by a membership function. Words not similar to existing clusters form new clusters. This avoids specifying features in advance and the need for trial and error. Experimental results showed the method can classify text faster and with better extracted features than other methods.
New Feature Selection Model Based Ensemble Rule Classifiers Method for Datase...ijaia
Feature selection and classification task are an essential process in dealing with large data sets that
comprise numerous number of input attributes. There are many search methods and classifiers that have
been used to find the optimal number of attributes. The aim of this paper is to find the optimal set of
attributes and improve the classification accuracy by adopting ensemble rule classifiers method. Research
process involves 2 phases; finding the optimal set of attributes and ensemble classifiers method for
classification task. Results are in terms of percentage of accuracy and number of selected attributes and
rules generated. 6 datasets were used for the experiment. The final output is an optimal set of attributes
with ensemble rule classifiers method. The experimental results conducted on public real dataset
demonstrate that the ensemble rule classifiers methods consistently show improve classification accuracy
on the selected dataset. Significant improvement in accuracy and optimal set of attribute selected is
achieved by adopting ensemble rule classifiers method.
Data mining is utilized to manage huge measure of information which are put in the data ware houses and databases, to discover required information and data. Numerous data mining systems have been proposed, for example, association rules, decision trees, neural systems, clustering, and so on. It has turned into the purpose of consideration from numerous years. A re-known amongst the available data mining strategies is clustering of the dataset. It is the most effective data mining method. It groups the dataset in number of clusters based on certain guidelines that are predefined. It is dependable to discover the connection between the distinctive characteristics of data.
In k-mean clustering algorithm, the function is being selected on the basis of the relevancy of the function for predicting the data and also the Euclidian distance between the centroid of any cluster and the data objects outside the cluster is being computed for the clustering the data points. In this work, author enhanced the Euclidian distance formula to increase the cluster quality.
The problem of accuracy and redundancy of the dissimilar points in the clusters remains in the improved k-means for which new enhanced approach is been proposed which uses the similarity function for checking the similarity level of the point before including it to the cluster.
Research Inventy : International Journal of Engineering and Science is published by the group of young academic and industrial researchers with 12 Issues per year. It is an online as well as print version open access journal that provides rapid publication (monthly) of articles in all areas of the subject such as: civil, mechanical, chemical, electronic and computer engineering as well as production and information technology. The Journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence. Papers will be published by rapid process within 20 days after acceptance and peer review process takes only 7 days. All articles published in Research Inventy will be peer-reviewed.
Similar to Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensemble: A Survey (20)
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Neural network optimizer of proportional-integral-differential controller par...IJECEIAES
Wide application of proportional-integral-differential (PID)-regulator in industry requires constant improvement of methods of its parameters adjustment. The paper deals with the issues of optimization of PID-regulator parameters with the use of neural network technology methods. A methodology for choosing the architecture (structure) of neural network optimizer is proposed, which consists in determining the number of layers, the number of neurons in each layer, as well as the form and type of activation function. Algorithms of neural network training based on the application of the method of minimizing the mismatch between the regulated value and the target value are developed. The method of back propagation of gradients is proposed to select the optimal training rate of neurons of the neural network. The neural network optimizer, which is a superstructure of the linear PID controller, allows increasing the regulation accuracy from 0.23 to 0.09, thus reducing the power consumption from 65% to 53%. The results of the conducted experiments allow us to conclude that the created neural superstructure may well become a prototype of an automatic voltage regulator (AVR)-type industrial controller for tuning the parameters of the PID controller.
An improved modulation technique suitable for a three level flying capacitor ...IJECEIAES
This research paper introduces an innovative modulation technique for controlling a 3-level flying capacitor multilevel inverter (FCMLI), aiming to streamline the modulation process in contrast to conventional methods. The proposed
simplified modulation technique paves the way for more straightforward and
efficient control of multilevel inverters, enabling their widespread adoption and
integration into modern power electronic systems. Through the amalgamation of
sinusoidal pulse width modulation (SPWM) with a high-frequency square wave
pulse, this controlling technique attains energy equilibrium across the coupling
capacitor. The modulation scheme incorporates a simplified switching pattern
and a decreased count of voltage references, thereby simplifying the control
algorithm.
A review on features and methods of potential fishing zoneIJECEIAES
This review focuses on the importance of identifying potential fishing zones in seawater for sustainable fishing practices. It explores features like sea surface temperature (SST) and sea surface height (SSH), along with classification methods such as classifiers. The features like SST, SSH, and different classifiers used to classify the data, have been figured out in this review study. This study underscores the importance of examining potential fishing zones using advanced analytical techniques. It thoroughly explores the methodologies employed by researchers, covering both past and current approaches. The examination centers on data characteristics and the application of classification algorithms for classification of potential fishing zones. Furthermore, the prediction of potential fishing zones relies significantly on the effectiveness of classification algorithms. Previous research has assessed the performance of models like support vector machines, naïve Bayes, and artificial neural networks (ANN). In the previous result, the results of support vector machine (SVM) were 97.6% more accurate than naive Bayes's 94.2% to classify test data for fisheries classification. By considering the recent works in this area, several recommendations for future works are presented to further improve the performance of the potential fishing zone models, which is important to the fisheries community.
Electrical signal interference minimization using appropriate core material f...IJECEIAES
As demand for smaller, quicker, and more powerful devices rises, Moore's law is strictly followed. The industry has worked hard to make little devices that boost productivity. The goal is to optimize device density. Scientists are reducing connection delays to improve circuit performance. This helped them understand three-dimensional integrated circuit (3D IC) concepts, which stack active devices and create vertical connections to diminish latency and lower interconnects. Electrical involvement is a big worry with 3D integrates circuits. Researchers have developed and tested through silicon via (TSV) and substrates to decrease electrical wave involvement. This study illustrates a novel noise coupling reduction method using several electrical involvement models. A 22% drop in electrical involvement from wave-carrying to victim TSVs introduces this new paradigm and improves system performance even at higher THz frequencies.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Bibliometric analysis highlighting the role of women in addressing climate ch...IJECEIAES
Fossil fuel consumption increased quickly, contributing to climate change
that is evident in unusual flooding and draughts, and global warming. Over
the past ten years, women's involvement in society has grown dramatically,
and they succeeded in playing a noticeable role in reducing climate change.
A bibliometric analysis of data from the last ten years has been carried out to
examine the role of women in addressing the climate change. The analysis's
findings discussed the relevant to the sustainable development goals (SDGs),
particularly SDG 7 and SDG 13. The results considered contributions made
by women in the various sectors while taking geographic dispersion into
account. The bibliometric analysis delves into topics including women's
leadership in environmental groups, their involvement in policymaking, their
contributions to sustainable development projects, and the influence of
gender diversity on attempts to mitigate climate change. This study's results
highlight how women have influenced policies and actions related to climate
change, point out areas of research deficiency and recommendations on how
to increase role of the women in addressing the climate change and
achieving sustainability. To achieve more successful results, this initiative
aims to highlight the significance of gender equality and encourage
inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter...IJECEIAES
The active and reactive load changes have a significant impact on voltage
and frequency. In this paper, in order to stabilize the microgrid (MG) against
load variations in islanding mode, the active and reactive power of all
distributed generators (DGs), including energy storage (battery), diesel
generator, and micro-turbine, are controlled. The micro-turbine generator is
connected to MG through a three-phase to three-phase matrix converter, and
the droop control method is applied for controlling the voltage and
frequency of MG. In addition, a method is introduced for voltage and
frequency control of micro-turbines in the transition state from gridconnected mode to islanding mode. A novel switching strategy of the matrix
converter is used for converting the high-frequency output voltage of the
micro-turbine to the grid-side frequency of the utility system. Moreover,
using the switching strategy, the low-order harmonics in the output current
and voltage are not produced, and consequently, the size of the output filter
would be reduced. In fact, the suggested control strategy is load-independent
and has no frequency conversion restrictions. The proposed approach for
voltage and frequency regulation demonstrates exceptional performance and
favorable response across various load alteration scenarios. The suggested
strategy is examined in several scenarios in the MG test systems, and the
simulation results are addressed.
Enhancing battery system identification: nonlinear autoregressive modeling fo...IJECEIAES
Precisely characterizing Li-ion batteries is essential for optimizing their
performance, enhancing safety, and prolonging their lifespan across various
applications, such as electric vehicles and renewable energy systems. This
article introduces an innovative nonlinear methodology for system
identification of a Li-ion battery, employing a nonlinear autoregressive with
exogenous inputs (NARX) model. The proposed approach integrates the
benefits of nonlinear modeling with the adaptability of the NARX structure,
facilitating a more comprehensive representation of the intricate
electrochemical processes within the battery. Experimental data collected
from a Li-ion battery operating under diverse scenarios are employed to
validate the effectiveness of the proposed methodology. The identified
NARX model exhibits superior accuracy in predicting the battery's behavior
compared to traditional linear models. This study underscores the
importance of accounting for nonlinearities in battery modeling, providing
insights into the intricate relationships between state-of-charge, voltage, and
current under dynamic conditions.
Smart grid deployment: from a bibliometric analysis to a surveyIJECEIAES
Smart grids are one of the last decades' innovations in electrical energy.
They bring relevant advantages compared to the traditional grid and
significant interest from the research community. Assessing the field's
evolution is essential to propose guidelines for facing new and future smart
grid challenges. In addition, knowing the main technologies involved in the
deployment of smart grids (SGs) is important to highlight possible
shortcomings that can be mitigated by developing new tools. This paper
contributes to the research trends mentioned above by focusing on two
objectives. First, a bibliometric analysis is presented to give an overview of
the current research level about smart grid deployment. Second, a survey of
the main technological approaches used for smart grid implementation and
their contributions are highlighted. To that effect, we searched the Web of
Science (WoS), and the Scopus databases. We obtained 5,663 documents
from WoS and 7,215 from Scopus on smart grid implementation or
deployment. With the extraction limitation in the Scopus database, 5,872 of
the 7,215 documents were extracted using a multi-step process. These two
datasets have been analyzed using a bibliometric tool called bibliometrix.
The main outputs are presented with some recommendations for future
research.
Use of analytical hierarchy process for selecting and prioritizing islanding ...IJECEIAES
One of the problems that are associated to power systems is islanding
condition, which must be rapidly and properly detected to prevent any
negative consequences on the system's protection, stability, and security.
This paper offers a thorough overview of several islanding detection
strategies, which are divided into two categories: classic approaches,
including local and remote approaches, and modern techniques, including
techniques based on signal processing and computational intelligence.
Additionally, each approach is compared and assessed based on several
factors, including implementation costs, non-detected zones, declining
power quality, and response times using the analytical hierarchy process
(AHP). The multi-criteria decision-making analysis shows that the overall
weight of passive methods (24.7%), active methods (7.8%), hybrid methods
(5.6%), remote methods (14.5%), signal processing-based methods (26.6%),
and computational intelligent-based methods (20.8%) based on the
comparison of all criteria together. Thus, it can be seen from the total weight
that hybrid approaches are the least suitable to be chosen, while signal
processing-based methods are the most appropriate islanding detection
method to be selected and implemented in power system with respect to the
aforementioned factors. Using Expert Choice software, the proposed
hierarchy model is studied and examined.
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by
environmental factors. This variability hampers the control and utilization of
solar cells' peak output. In this study, a single-stage grid-connected PV
system is designed to enhance power quality. Our approach employs fuzzy
logic in the direct power control (DPC) of a three-phase voltage source
inverter (VSI), enabling seamless integration of the PV connected to the
grid. Additionally, a fuzzy logic-based maximum power point tracking
(MPPT) controller is adopted, which outperforms traditional methods like
incremental conductance (INC) in enhancing solar cell efficiency and
minimizing the response time. Moreover, the inverter's real-time active and
reactive power is directly managed to achieve a unity power factor (UPF).
The system's performance is assessed through MATLAB/Simulink
implementation, showing marked improvement over conventional methods,
particularly in steady-state and varying weather conditions. For solar
irradiances of 500 and 1,000 W/m2
, the results show that the proposed
method reduces the total harmonic distortion (THD) of the injected current
to the grid by approximately 46% and 38% compared to conventional
methods, respectively. Furthermore, we compare the simulation results with
IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that
caters to the future needs of society, owing to their renewable, inexhaustible,
and cost-free nature. The power output of these systems relies on solar cell
radiation and temperature. In order to mitigate the dependence on
atmospheric conditions and enhance power tracking, a conventional
approach has been improved by integrating various methods. To optimize
the generation of electricity from solar systems, the maximum power point
tracking (MPPT) technique is employed. To overcome limitations such as
steady-state voltage oscillations and improve transient response, two
traditional MPPT methods, namely fuzzy logic controller (FLC) and perturb
and observe (P&O), have been modified. This research paper aims to
simulate and validate the step size of the proposed modified P&O and FLC
techniques within the MPPT algorithm using MATLAB/Simulink for
efficient power tracking in photovoltaic systems.
Adaptive synchronous sliding control for a robot manipulator based on neural ...IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for
robot hands is always an attractive topic in the research community. This is a
challenging problem because robot manipulators are complex nonlinear systems
and are often subject to fluctuations in loads and external disturbances. This
article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller
ensures that the positions of the joints track the desired trajectory, synchronize
the errors, and significantly reduces chattering. First, the synchronous tracking
errors and synchronous sliding surfaces are presented. Second, the synchronous
tracking error dynamics are determined. Third, a robust adaptive control law is
designed,the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy
logic. The built algorithm ensures that the tracking and approximation errors
are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results.
Simulation and experimental results show that the proposed controller is effective with small synchronous tracking errors, and the chattering phenomenon is
significantly reduced.
Remote field-programmable gate array laboratory for signal acquisition and de...IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students’ learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embeddedsystem design approach targeting FPGA technologies that are fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that allows users to seamlessly incorporate into their FPGA design. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during the experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m...IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba...IJECEIAES
Rapidly and remotely monitoring and receiving the solar cell systems status parameters, solar irradiance, temperature, and humidity, are critical issues in enhancement their efficiency. Hence, in the present article an improved smart prototype of internet of things (IoT) technique based on embedded system through NodeMCU ESP8266 (ESP-12E) was carried out experimentally. Three different regions at Egypt; Luxor, Cairo, and El-Beheira cities were chosen to study their solar irradiance profile, temperature, and humidity by the proposed IoT system. The monitoring data of solar irradiance, temperature, and humidity were live visualized directly by Ubidots through hypertext transfer protocol (HTTP) protocol. The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m 2 respectively during the solar day. The accuracy and rapidity of obtaining monitoring results using the proposed IoT system made it a strong candidate for application in monitoring solar cell systems. On the other hand, the obtained solar power radiation results of the three considered regions strongly candidate Luxor and Cairo as suitable places to build up a solar cells system station rather than El-Beheira.
An efficient security framework for intrusion detection and prevention in int...IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions pose several security loopholes, leading to performance degradation and threat to data security in IoT operations. Thereby, IoT security systems must keep an eye on and restrict unwanted events from occurring in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived towards identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to miss-classification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention consider supervised learning, which requires a large amount of labeled data to be trained. Consequently, such complex datasets are impossible to source in a large network like IoT. To address this problem, this proposed study introduces an efficient learning mechanism to strengthen the IoT security aspects. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with the related works, the experimental outcome shows that the model performs well in a benchmark dataset. It accomplishes an improved detection accuracy of approximately 99.21%.
Design and optimization of ion propulsion dronebjmsejournal
Electric propulsion technology is widely used in many kinds of vehicles in recent years, and aircrafts are no exception. Technically, UAVs are electrically propelled but tend to produce a significant amount of noise and vibrations. Ion propulsion technology for drones is a potential solution to this problem. Ion propulsion technology is proven to be feasible in the earth’s atmosphere. The study presented in this article shows the design of EHD thrusters and power supply for ion propulsion drones along with performance optimization of high-voltage power supply for endurance in earth’s atmosphere.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELijaia
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...PriyankaKilaniya
Energy efficiency has been important since the latter part of the last century. The main object of this survey is to determine the energy efficiency knowledge among consumers. Two separate districts in Bangladesh are selected to conduct the survey on households and showrooms about the energy and seller also. The survey uses the data to find some regression equations from which it is easy to predict energy efficiency knowledge. The data is analyzed and calculated based on five important criteria. The initial target was to find some factors that help predict a person's energy efficiency knowledge. From the survey, it is found that the energy efficiency awareness among the people of our country is very low. Relationships between household energy use behaviors are estimated using a unique dataset of about 40 households and 20 showrooms in Bangladesh's Chapainawabganj and Bagerhat districts. Knowledge of energy consumption and energy efficiency technology options is found to be associated with household use of energy conservation practices. Household characteristics also influence household energy use behavior. Younger household cohorts are more likely to adopt energy-efficient technologies and energy conservation practices and place primary importance on energy saving for environmental reasons. Education also influences attitudes toward energy conservation in Bangladesh. Low-education households indicate they primarily save electricity for the environment while high-education households indicate they are motivated by environmental concerns.
Digital Twins Computer Networking Paper Presentation.pptxaryanpankaj78
A Digital Twin in computer networking is a virtual representation of a physical network, used to simulate, analyze, and optimize network performance and reliability. It leverages real-time data to enhance network management, predict issues, and improve decision-making processes.
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...PIMR BHOPAL
Variable frequency drive .A Variable Frequency Drive (VFD) is an electronic device used to control the speed and torque of an electric motor by varying the frequency and voltage of its power supply. VFDs are widely used in industrial applications for motor control, providing significant energy savings and precise motor operation.
Build the Next Generation of Apps with the Einstein 1 Platform.
Rejoignez Philippe Ozil pour une session de workshops qui vous guidera à travers les détails de la plateforme Einstein 1, l'importance des données pour la création d'applications d'intelligence artificielle et les différents outils et technologies que Salesforce propose pour vous apporter tous les bénéfices de l'IA.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
Applications of artificial Intelligence in Mechanical Engineering.pdfAtif Razi
Historically, mechanical engineering has relied heavily on human expertise and empirical methods to solve complex problems. With the introduction of computer-aided design (CAD) and finite element analysis (FEA), the field took its first steps towards digitization. These tools allowed engineers to simulate and analyze mechanical systems with greater accuracy and efficiency. However, the sheer volume of data generated by modern engineering systems and the increasing complexity of these systems have necessitated more advanced analytical tools, paving the way for AI.
AI offers the capability to process vast amounts of data, identify patterns, and make predictions with a level of speed and accuracy unattainable by traditional methods. This has profound implications for mechanical engineering, enabling more efficient design processes, predictive maintenance strategies, and optimized manufacturing operations. AI-driven tools can learn from historical data, adapt to new information, and continuously improve their performance, making them invaluable in tackling the multifaceted challenges of modern mechanical engineering.
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...shadow0702a
This document serves as a comprehensive step-by-step guide on how to effectively use PyCharm for remote debugging of the Windows Subsystem for Linux (WSL) on a local Windows machine. It meticulously outlines several critical steps in the process, starting with the crucial task of enabling permissions, followed by the installation and configuration of WSL.
The guide then proceeds to explain how to set up the SSH service within the WSL environment, an integral part of the process. Alongside this, it also provides detailed instructions on how to modify the inbound rules of the Windows firewall to facilitate the process, ensuring that there are no connectivity issues that could potentially hinder the debugging process.
The document further emphasizes on the importance of checking the connection between the Windows and WSL environments, providing instructions on how to ensure that the connection is optimal and ready for remote debugging.
It also offers an in-depth guide on how to configure the WSL interpreter and files within the PyCharm environment. This is essential for ensuring that the debugging process is set up correctly and that the program can be run effectively within the WSL terminal.
Additionally, the document provides guidance on how to set up breakpoints for debugging, a fundamental aspect of the debugging process which allows the developer to stop the execution of their code at certain points and inspect their program at those stages.
Finally, the document concludes by providing a link to a reference blog. This blog offers additional information and guidance on configuring the remote Python interpreter in PyCharm, providing the reader with a well-rounded understanding of the process.
Software Engineering and Project Management - Software Testing + Agile Method...Prakhyath Rai
Software Testing: A Strategic Approach to Software Testing, Strategic Issues, Test Strategies for Conventional Software, Test Strategies for Object -Oriented Software, Validation Testing, System Testing, The Art of Debugging.
Agile Methodology: Before Agile – Waterfall, Agile Development.
Software Engineering and Project Management - Introduction, Modeling Concepts...Prakhyath Rai
Introduction, Modeling Concepts and Class Modeling: What is Object orientation? What is OO development? OO Themes; Evidence for usefulness of OO development; OO modeling history. Modeling
as Design technique: Modeling, abstraction, The Three models. Class Modeling: Object and Class Concept, Link and associations concepts, Generalization and Inheritance, A sample class model, Navigation of class models, and UML diagrams
Building the Analysis Models: Requirement Analysis, Analysis Model Approaches, Data modeling Concepts, Object Oriented Analysis, Scenario-Based Modeling, Flow-Oriented Modeling, class Based Modeling, Creating a Behavioral Model.
Discover the latest insights on Data Driven Maintenance with our comprehensive webinar presentation. Learn about traditional maintenance challenges, the right approach to utilizing data, and the benefits of adopting a Data Driven Maintenance strategy. Explore real-world examples, industry best practices, and innovative solutions like FMECA and the D3M model. This presentation, led by expert Jules Oudmans, is essential for asset owners looking to optimize their maintenance processes and leverage digital technologies for improved efficiency and performance. Download now to stay ahead in the evolving maintenance landscape.
2. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 8, No. 4, August 2018 : 2351 – 2357
2352
Some of the data sets contains any type of data such as numeric or categorical or both and differs
from their attributes. Conventional clustering algorithms cannot perform well when the data sets are mixed
type. In order to group the dissimilar data types there is a clustering ensemble approach in which a combined
solution is obtained from a group of individual solutions to produce a quality clustering. Clustering ensemble
improves the strength and stability of clustering solution due to its consolidating and dividing nature. Cluster
Ensemble is an approach that consolidates various findings of dissimilar clusterings to bring out the final
quality clustering of original data set.
Clustering ensembles are more advantageous than a single clustering algorithm in so many strands
Robustness: The clustering ensemble improves the average performance on different streams and datasets
more than a single clustering.
Novelty: The combined solution gives unusual results which cannot be produced by one clustering algorithm.
Stability: Clustering ensemble works efficiently and can handle noise and outliers.
Parallelization and Scalability: Parallelization of clustering can be acquired by successive synthesis of
results. It has the ability to amalgamate results from multiple heterogeneous sources of data.
Clustering ensembles are used in many areas. Such as bioinformatics, machine learning and
information retrival. The ensembles are formulated with different types of optimization algorithms such as
genetic algorithms, evolutionary algorithms, k-particle swarm optimization algorithm, k-muscles wandering
optimization algorithms in different aspects in different areas are specified in the following sections clearly in
accordance with some journal papers.
The difficulty with clustering ensemble is to perceive a consensus function. There are different
consensuses functions are available but in order to increase the stability and robustness genetic algorithm
with co-association matrix is used as a consensus function. The genetic algorithm composed of four phases
includes fitness function, selection method, crossover method and finally mutation method. The co-
association matrix values are used to obtain the intra and extra cluster fitness by evaluating average similarity
between all clusters in first phase. In the second phase tournament selection is used in which two individuals
are adopted arbitrarily and the individual with preferable fitness is elected for next population. In third phase
the two off springs are generated by the individuals are exchanged with a random crossover point.
Intelligence mutation is used in fourth phase [5].
The utilization of locally adaptive clustering algorithm provides an implementation to identify a
partition that finds solutions to the clusters. It imparts set of clusters with some weights then assign a
specified probability to each cluster. Using Jaccard coefficient find inter cluster similarity based on feature
and object. This two-objective clustering ensemble complicate in setting parameter and in interpretation of
results. So single objective clustering is composed both feature based and object based as a whole which
increases accuracy [6].
For stream mining clustering ensemble is imparted. This integrates both clusters and classifiers
together and employ genetic algorithm and has high propensity to handle optimization [7].
The clustering ensemble is designed as an optimization problem on multiple objects by adopting
evolutionary algorithm on multiple objects. The first criteria in multi objective clustering ensemble are to
maximize the similarity measure of final clustering from all input clusterings. The similarity measure is
calculated using adjusted random index. The second criterion in multi objective clustering ensemble is to
reduce the similarity measure [8].
The clustering ensemble is designed using three different algorithms k-means, k-particle swarm
optimization and k-muscles wandering optimization. The combination of k-means with muscles wandering
optimization overcomes the shortcomings of k-means algorithm. It implements similarity based clustering
algorithm using weights on input data. Samples the dataset first and then apply clustering algorithms
specified on subsamples which give clustering results. From that similarity matrices are generated. Based on
various metrics of clustering best clustering can be derived. Reduce the weights of the samples. Repeat the
process until best resultant clustering found [9].
The clustering ensemble is introduced based on particle swarm clustering. The particle swarm
clustering is act as a base clusterer and as well as consensus function is a challenging element. The consensus
function allows the base partitions with different number of clusters and permits both disjoint and
overlapping partitions. Proposed ensemble produce statistically better partitions [10].
The next part of the paper is organized as follows. Section 2 gives concept of clustering ensemble,
Section 3 explains taxonomy of generation methods, Section 4 specifies taxonomy of consensus methods and
Section 5 presents conclusion and future work on clustering ensemble.
3. Int J Elec & Comp Eng ISSN: 2088-8708
Extensive Analysis on Generation and Consensus Mechanisms of Clustering … (Y. Leela Sandhya Rani)
2353
2. CLUSTERING ENSEMBLE
Clustering ensemble is an approach that concatenates the subset clustering solutions in to quality
clustering for original data set. From Figure 1 the data set can be divided into N number of samples and
applied clustering algorithm on each sample set, clustering ensemble generates N clustering solutions and
finally combined the solutions to get final quality clustering using a consensus function. The clustering
ensemble has been processed in two steps. One is generation step which generates the number of clustering
solutions. Second one is consensus step which combines the solution into a final clustering.
2.1. Generation step
The way of combining all the individual clustering solutions of subsets generated from original data
set is called ensembling. The first step is generation of all individual clustering solutions. The clustering
ensemble is the combination of clustering results. Given a data sets of m objects P ={P1,P2,P3….Pm}, the
clustering ensemble generates n number of clusterings represented as β={β1, β2, β3….. βn}[1]. Each clustering
solution βi is one part of the original data set P into Ki dissimilar groups of objects, denoted as βi = ai1,…aik.
Figure 1. General mechanism of clustering ensemble
2.2. Consensus step
The consensus step is used to combine the solutions of clusterings and is the important step in any
algorithm of clustering ensemble. It is the function which improves the results of single clustering algorithm.
There are two ways to apply consensus function one is correlation between objects and optimization for
partition. The first one is to analyze the number of one instance belonging to one cluster and number of two
instances belonging to the same cluster. It is done through voting approach and Co-Association Matrix based
methods.
In the second approach of consensus function, the feature partition is acquired in association with
optimization problem [11]. The partition can be find by using some similarity between the features is the
main problem with respect to the cluster ensemble. Formally, the feature partition is defined as:
∑
Here ɼ is a similarity measure between partitions. The feature partition is the maximization problem
which is given as the subgroup that increases the similarity with all subgroups in the cluster ensemble. The
following are the examples use feature partition are kernel based methods and non-negative matrix
factorization.
3. TAXONAMY OF CLUSTERING ENSEMBLE GENERATION METHODS
The clustering ensemble generates the set of clustering solutions by applying some clustering
algorithm on set of samples and combines the clustering solutions to get final quality clustering. The main
concept is to handle different types of features. This can be solved by randomly selecting the features on
basis of cluster analysis. The clustering ensemble produces accurate results as it finds one final clustering
4. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 8, No. 4, August 2018 : 2351 – 2357
2354
from set of clusterings based on different samples with different algorithms for generation process. We
provide some of the generation methods which are reviewed from previous papers.
3.1. Similar ensembling
The ensemble generation process uses similar algorithms for all sample subsets. The k-means
algorithm fails to perform class discovery effectively on data sets because it assumes that the data is in
Gaussian distribution. As spectral clustering handles this problem, Spectral clustering (SC) algorithm is used
for the ensemble generation in Knowledge based Cluster Ensemble (KCE).
The knowledge based cluster ensemble randomly generates dx
dimensions from d dimensions this
process continues until A subspaces generated {D1,D2,….Da}. A spectral clustering algorithm is applied to
each subspace and generates A clustering solutions. Spectral clustering partition the data points into K
classes. First SC constructs an affinity matrix, and obtains a normalized matrix X, and then applies k-means
for each row of X to get these points into K clusters. In this way SC is applied repeatedly to all samples to get
solution for each sample subspace.
Next a confidence factor was calculated for all the clustering solutions by constructing adjacency
matrix on pair wise constraints. If the most of the pair wise constraints are satisfied by the clustering solution
it will have high confidence measure otherwise less confidence measure [12].
3.2. Random initialization of input parameters
The “Projective Clustering Ensemble” (PCE) is based on a set of heterogeneous gene-to-cluster
assignments and sample-to-cluster assignments. Input to the PCE is taken from gene expression data G. Each
entry of G represents a gene expression level of a particular gene. If we group samples into clusters use
sample-to-cluster assignment. The probability of a sample that belongs to a cluster is nothing but sample-to-
cluster assignment. If we group genes into clusters use gene-to-cluster assignment. The probability of gene
belonging to a cluster is nothing but gene-to-cluster assignment.
These assignments are produced by applying continuous projective clustering N times with different
random initializations for input parameters to produce N clustering solutions, which are used as main
clustering for consensus clustering [13].
3.3. Feature selection for sampling
Set of sample subset can be generated based on random sampling techniques to generate set of
clustering solutions. Now a days large dimensional data sets are used for data analysis. So feature selection
plays an important role in generation of sample subsets. In “Double Selection Semi Supervised Clustering
Ensemble” (DSSSCE) they used feature selection methods to remove noise and outliers.
The DSSSCE use input from gene expression data. It first applies a set of feature selection methods
such as Mutual Information Maximization (MIM), Mutual Information Feature Selection (MIFS), Joint
Mutual Information (JMI), Conditional Infomax Feature Extraction (CIFE), Conditional Redundancy
(CONDRED), Interaction Capping (ICAP), Double Input Symmetrical Relevance (DISR), Max-Relevance
Min-Redundancy (MRMR) to select set of sub samples.
Later DSSSCE applies PC-Kmeans to identify the labels of the cancer dataset. This algorithm
considers the number of must-link and cannot-link constraints between pairs of cancer samples which leads
to clustering solution. Using feature selection methods as a selection strategy it selects set of clustering
solutions and aggregate all the solutions by building matrix in the first phase. Next, DSSSCE divides the
aggregated solution into set of clustering solutions and calculate the confidence factors for the clustering
solutions based on prior knowledge of the data set which is specified by pair wise constraints [14].
3.4. Incremental ensembling
In “Incremental Semi Supervised Clustering Ensemble” (ISSCE) first one original ensemble is
generated. Then the final new ensemble is produced with the help of set of selection members. It generates
two ensembles using random subspace generation method as a subspace generator, Constraint Propagation
approach as a clustering algorithm.
The Double Selection Semi Supervised Clustering Ensemble feature selection methods are used as a
subset selection and clustering applied in two phases. The ISSCE also used two ensembles in the design. To
handle high dimensional data space use random subspace methodology to generate set of subspaces. Apply
constraint propagation methodology on set of subspaces to produce set of clustering solutions. The ISSCE
incorporated incremental member selection process based on local and global cost function and produced
new ensemble with same algorithm used in first ensemble [15].
5. Int J Elec & Comp Eng ISSN: 2088-8708
Extensive Analysis on Generation and Consensus Mechanisms of Clustering … (Y. Leela Sandhya Rani)
2355
3.5. Dissimilar ensembling
The generation step in dissimilar ensembling involves different clustering algorithms and finally
uses a base clustering. The number of clusters is generated by using different clustering algorithm with its
different parameters that is randomization of the sample data. Different clustering algorithms may give
different clustering results due to properties of data.
If we apply different number of clustering algorithms on data set then set of different clustering
solutions may occur. Among them a compromised clustering solution should be identified. All the clusters
can be identified by using any one of the candidate clustering algorithm. So it must follow that any data point
must assigned to only one cluster and every data point in the set must assigned to any cluster. It is necessary
to interpret all the partitions whether they follow above mentioned criteria or not. Use goodness function to
evaluate the quality of the cluster. Select certain clustering solutions using goodness function [16].
4. TAXONAMY OF CLUSTERING ENSEMBLE CONSENSUS METHODS
There are various types of consensus functions they are Hyper graph Partitioning, Co-association
based functions, Mutual Information Algorithm, Finite Mixture model and Voting Approach. We provide
some of the consensus functions which are reviewed from previous papers.
4.1. Spectral graph partitioning
Spectral Clustering chooses a spectral graph partitioning algorithm, which used to optimize the cut
scale. First KCE constructs a matrix by considering all the generated matrices of the clustering solutions
and respective confidence factors simply concatenation of all matrices. Finally based on spectral clustering
algorithm it partitions the new features into K classes. For the majority of the cancer datasets KCE
outperforms the other clustering ensembles.
KCE constructs a matrix by specifying all the membership matrices of the clustering solutions and
the respective confidence factors as follows:
Where is is the representation of all membership matrices of the clustering solutions, and
set of confidence vectors of clustering solutions. is used to concatenate these two. Using spectral
clustering partition the new features of concatenated result in to K classes [12].
4.2. Optimization algorithm
It is necessary for a clustering ensemble to find a consensus function that minimizes the distance
from all clusters so the following function is optimized.
{ }
Here is used as a distance function for the clusterings. PCE optimize the for two requirements
gene-to-cluster and sample-to-cluster assignment. So Expectation Maximization of Projective Clustering
Ensemble (EM-PCE) is used as a consensus function. The main aim of EM-PCE is to minimize the error that
corresponds to both sample to cluster and gene to cluster assignment [13].
4.3. Graph partitioning
In “double selection semi supervised clustering ensemble” they designed consensus function by
combining all the membership matrices of the clustering solutions and corresponding confidence factors to
one matrix A. Based on the sample set Y a graph is constructed on Y and A. Using the normalized cut
approach on the constructed graph, the final clustering of the original data set is obtained [14], [15].
4.4. Hill climbing
Based upon the goodness function the number of clustering solutions can be obtained. To generate
clustering solutions there are two conflicts, one is absence conflict and other is coverage conflict. So the
consideration of conflicts becomes NP hard. Based on hill climbing approach the optimization problem can
be solved and finally gets one clustering for the given data set [16].
6. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 8, No. 4, August 2018 : 2351 – 2357
2356
5. PROPOSED OBJECTIVE
Clustering ensemble is a framework that combines the solutions from individual clusterings to
produce a qualified clustering. Our objective is to preprocess the data by using hybrid fuzzy logic feature
selection method which is our next future work. For the resultant samples we apply different clustering
algorithms and finally we get qualified clustering. From Figure 2, c-1 c-2 c-3 c-4 specifies different
clustering algorithms.
Figure 2. Extended clustering ensemble
6. CONCLUSION AND FUTURE WORK
Clustering ensemble is a framework that provides set of clustering solutions and merges solutions to
get a qualified clustering output for the given data set. Definitely it produces more accurate results as with the
single clustering and also improves the robustness, scalability and quality of the clustering. In this paper we
reviewed some papers which use different generation methods and different consensus functions to get final
clustering. The main aspect is generation mechanism. In some papers they used similar algorithms and in
some they used dissimilar algorithms for generation process. For sampling of subspace some used feature
selection and some used Random Sampling. Current trends handle large dimensional data sets so we use
feature selection methods for reducing the dimensionality and increasing the performance. Later we apply
different clustering algorithms for each subset generated from the application of feature selection methods.
This new generation step of our new ensemble increases the performance of the final clustering solution as
we applying hybrid fuzzy logic feature selection method and different clustering algorithms. If we remove
the noise and redundant data from the data set it will increases the performance of data analysis. It is done
with hybrid fuzzy logic feature selection method. If we apply different clustering algorithms different
clustering solutions will be generated from them which are having the highest similarity those will be
considered as best clustering solutions and also uncovered clusters from different solutions are amalgamated
to get final clustering solution. This is the future scope of our work.
REFERENCES
[1] Naveen Kumar, S. Siva Sathya, “Clustering Assisted Co-location Pattern Mining for Spatial Data”, International
Journal of Applied Engineering Research, ISSN 0973-4562, vol. 11, no. 2, pp. 1386-1393, 2016.
[2] M. John Basha, K.P. Kaliyamurthie, “An Improved Similarity Matching based Clustering Framework for Short and
Sentence Level Text”, International Journal of Electrical and Computer Engineering (IJECE), vol. 7, no. 1,
pp. 551-558, February 2017.
[3] Charu Virmani, Anuradha Pillai, Dimple Juneja, “Clustering in Aggregated User Profiles across Multiple Social
Networks”, International Journal of Electrical and Computer Engineering (IJECE), vol. 7, no. 6, pp. 3692-3699
December 2017.
[4] K.R. Nirmal, K.V.V. Satyanarayana, “Issues of K Means Clustering While Migrating to Map Reduce Paradigm
with Big Data: A Survey”, International Journal of Electrical and Computer Engineering (IJECE), vol. 6, no. 6, ,
pp. 3047-3051, December 2016.
[5] Javad Azimi, Mehdi Mohammadi, Ali movaghar, Morteza Analoui, “Clustering Ensembles Using Genetic
Algorithm”, The International Workshop on Computer Architecture for Machine Perception and Sensing,
September 2006.
[6] Purushothaman B, “Clustering Ensembles Using Evolutionary Algorithm”, International Journal for Research in
Applied Science & Engineering Technology, ISSN: 2321-9653, vol. 3, no. 2, February 2015.
7. Int J Elec & Comp Eng ISSN: 2088-8708
Extensive Analysis on Generation and Consensus Mechanisms of Clustering … (Y. Leela Sandhya Rani)
2357
[7] Anutosh Pratap Singh, Jitendra Agrawal, Varsha Sharma, “An Efficient Approach to Enhance Classifier and
Cluster Ensembles Using Genetic algorithms for Mining Drifting Data Streams Multi objecctive clustering”,
International Journal of Computer Applications ISSN 0975 – 8887, vol. 44, no. 21, April 2012.
[8] Sujoy Chatterjee, Anirban Mukhopadhyay, “Clustering Ensemble: A Multiobjective Genetic Algorithm based
Approach”, International Conference on Computational Intelligence: Modeling, Techniques and Application, 2013.
[9] Qi Kang, ShiYao Liu, Meng Chu Zhou, SiSi Li, “A weight-incorporated similarity-based clustering ensemble
method based on swarm intelligence”, knowledge based systems, Elsevier.
[10] Jos´e Valente de Oliveira, “Particle Swarm Clustering in clustering ensembles: exploiting pruning and alignment
free consensus”, Applied Soft Computing, Feb 13, 2016.
[11] Sandro Vega-Pons and Jose Ruiz-Shulcloper, “A Survey of Clustering Ensemble Algorithms”, 2011.
[12] Zhiwen Yu, Hau-San Wongb, Jane You, Qinmin Yang and Hongying Liao, “Knowledge Based Cluster Ensemble
for Cancer Discovery from Biomolecular Data”, IEEE Transactions on Nanobioscience vol. 10, no 2, June 2011.
[13] Xlanxue Yu Guoxian Yu and JunWang, “Clustering cancer gene expression data by projective clustering
ensemble”, Research article, Plos One, 2017.
[14] Zhiwen Yu, Hongsheng Chen Jane Yu, Hau SanWong , Jiming Liu Fellow Le Li Guoqiang Han, “Double Selection
Based Semi Supervised Clustering Ensemble for Tumor Clustering from Gene Expression Profiles”, IEEE/ACM
Transactions on Computational Biology and Bioinformatics, 2013.
[15] Zhiwen Yu, Jane You, Hau SanWong, Jun Zhang, “Incremental semi supervised Clustering Ensemble for High
Dimensional data Clustering”, article, IEEE Transactions on Knowledge and Data Engineering, 2015.
[16] Martin H.C. Law, Alexander P. Topchy, Anil K. Jain “Multiobjective Data Clustering”, IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, 204.
BIOGRAPHIES OF AUTHORS
Y. L. Sandhya Rani is a Scholar in K L University and working as an assistant professor in Sir C R R
College of Engineering, Eluru. She has interested in the areas like Clustering in data mining, Data
Structures etc. Currently she was working with Clustering Ensemble Performance as a research work.
She has published many papers in national and international journals.
Dr. V. Sucharita is a professor in Narayana Engineering College, Gudur. She has published many
papers in national and international journals. Also she is reviewer and editorial board member for
reputed journals
Dr. K. V. V. Satyanarayana is professor in K L university. He has published many papers in national
and international journals. He has interested in the area like Bio informatics and cloud computing.