A hard partitioning algorithm assigns each data point, including points equidistant from several cluster centres, to exactly one cluster. Fuzzy cluster analysis instead assigns membership coefficients to data points that are equidistant between two clusters, so that such points can belong to more than one cluster at the same time. For a subset of the CiteScore dataset, the fuzzy clustering (fanny) and fuzzy c-means (fcm) algorithms were implemented to study data points that lie equally distant from each other. Before the analysis, the clusterability of the dataset was evaluated with the Hopkins statistic, which yielded 0.4371; a value below 0.5 indicates that the data is highly clusterable. The optimal number of clusters was determined using the NbClust package, where 9 different indices proposed a 3-cluster solution as the best. Further, an appropriate value of the fuzziness parameter m was determined by examining the distribution of membership values as m varied from 1 to 2. The coefficient of variation (CV), also known as relative variability, was computed to study the spread of the data. Finally, the time complexity of the fanny and fuzzy c-means algorithms was evaluated by keeping the number of data points constant and varying the number of clusters.
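As a rough illustration of the clusterability check described above, here is a minimal NumPy sketch of the Hopkins statistic, using the convention (as in the R clustertend package) where values well below 0.5 suggest clusterable data. The sampling fraction and distance details are assumptions, not the paper's exact setup:

```python
import numpy as np

def hopkins(X, m=None, seed=0):
    """Hopkins statistic. Convention: values well below 0.5 suggest
    clusterable data (some texts use the complementary form, where
    values near 1 indicate clustering)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = m or max(1, n // 10)                 # assumed ~10% sample size
    lo, hi = X.min(axis=0), X.max(axis=0)
    # u: distance from uniform probes (in the data's bounding box)
    # to their nearest real data point
    U = rng.uniform(lo, hi, size=(m, d))
    u = np.linalg.norm(U[:, None] - X[None], axis=2).min(axis=1)
    # w: distance from a random sample of real points to their
    # nearest *other* data point
    idx = rng.choice(n, m, replace=False)
    Dw = np.linalg.norm(X[idx][:, None] - X[None], axis=2)
    Dw[np.arange(m), idx] = np.inf           # exclude each point itself
    w = Dw.min(axis=1)
    return w.sum() / (u.sum() + w.sum())
```

On tightly clustered data the nearest-neighbour distances w are tiny relative to the probe distances u, pushing the statistic towards 0; on uniform data it hovers around 0.5.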
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R... (ijscmc)
Face recognition is one of the most unobtrusive biometric techniques and can be used for access control as well as surveillance purposes. Various methods for implementing face recognition have been proposed, with varying degrees of performance in different scenarios. The most common issue with effective facial biometric systems is high susceptibility to variations in the face owing to factors such as changes in pose, varying illumination, differing expression, and the presence of outliers and noise. This paper explores a novel technique for face recognition by classifying the face images through an unsupervised learning approach, K-Medoids clustering. The Partitioning Around Medoids (PAM) algorithm has been used to perform the K-Medoids clustering of the data. The results suggest increased robustness to noise and outliers in comparison to other clustering methods. The technique can therefore be used to increase the overall robustness of a face recognition system, increasing its invariance and making it a reliably usable biometric modality.
K-Means clustering uses an iterative procedure that is highly sensitive to, and dependent upon, the initial centroids. The initial centroids in k-means clustering are chosen randomly, and hence the clustering changes with the initial centroids. This paper tries to overcome this problem of random centroid selection, and the resulting change of clusters, with a premeditated selection of initial centroids. We have used the iris, abalone and wine datasets to demonstrate that the proposed method of finding the initial centroids and using them in the k-means algorithm improves clustering performance. The clustering also remains the same in every run, since the initial centroids are selected not randomly but through a premeditated method.
A survey on Efficient Enhanced K-Means Clustering Algorithm (ijsrd.com)
Data mining is the process of using technology to identify patterns and prospects in large amounts of information. Within data mining, clustering is an important research topic with a wide range of unsupervised classification applications. Clustering is a technique that divides data into meaningful groups. K-means clustering is a method of cluster analysis that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. In this paper, we present a comparison of different K-means clustering algorithms.
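The nearest-mean partitioning that this abstract describes can be sketched as a plain Lloyd's-algorithm loop in NumPy. This is a minimal illustration of standard k-means, not any particular surveyed variant:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: alternate between assigning each point
    to its nearest centroid and moving each centroid to the mean of
    its assigned points, until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # assignment step: nearest centroid by Euclidean distance
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        # update step: each centroid becomes the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

On well-separated data this converges in a handful of iterations; the random initial choice of centroids is exactly the sensitivity that several of the surveyed papers try to remove.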
A Study of Efficiency Improvements Technique for K-Means Algorithm (IRJET Journal)
This document discusses techniques to improve the efficiency of the K-means clustering algorithm. It begins with an introduction to K-means clustering and discusses some of its limitations, such as high computational time. It then proposes using a ranking method to help assign data points to clusters more efficiently. The key steps of standard K-means clustering and the proposed ranking-based approach are described. Experimental results on sample datasets show that the ranking method leads to faster clustering compared to standard K-means, with comparable accuracy. Therefore, the ranking method can help enhance the performance of K-means clustering for large datasets.
Experimental study of Data clustering using k-Means and modified algorithms (IJDKP)
The k-Means clustering algorithm is an old algorithm that has been intensely researched owing to its ease and simplicity of implementation. Clustering algorithms have broad appeal and usefulness in exploratory data analysis. This paper presents results of an experimental study of different approaches to k-Means clustering, comparing results on different datasets using the original k-Means and other modified algorithms implemented in MATLAB R2009b. The results are evaluated on performance measures such as number of iterations, number of points misclassified, accuracy, Silhouette validity index and execution time.
Machine Learning Algorithms for Image Classification of Hand Digits and Face ... (IRJET Journal)
This document discusses machine learning algorithms for image classification using five different classification schemes. It summarizes the mathematical models behind each classification algorithm, including the Nearest Class Centroid classifier, Nearest Sub-Class Centroid classifier, k-Nearest Neighbor classifier, Perceptron trained using Backpropagation, and Perceptron trained using Mean Squared Error. It also describes the two datasets used in the experiments: the MNIST dataset of handwritten digits and the ORL face recognition dataset. The performance of the five classification schemes is compared on these datasets.
This document describes a new distance-based clustering algorithm (DBCA) that aims to improve upon K-means clustering. DBCA selects initial cluster centroids based on the total distance of each data point to all other points, rather than random selection. It calculates distances between all points, identifies points with maximum total distances, and sets initial centroids as the averages of groups of these maximally distant points. The algorithm is compared to K-means, hierarchical clustering, and hierarchical partitioning clustering on synthetic and real data. Experimental results show DBCA produces better quality clusters than these other algorithms.
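The centroid-seeding idea summarized above can be sketched in a few lines of NumPy. Note that the grouping-and-averaging details here (a pool of three extreme points averaged per centroid) are a simplified assumption for illustration, not the paper's exact procedure:

```python
import numpy as np

def total_distance_init(X, k):
    """Deterministic centroid seeding in the spirit of DBCA: rank
    points by their total distance to all other points, then average
    small groups of the maximally distant points to form initial
    centroids. Requires at least 3*k points."""
    D = np.linalg.norm(X[:, None] - X[None], axis=2)  # pairwise distances
    total = D.sum(axis=1)                  # total distance of each point
    order = np.argsort(total)[::-1]        # most "spread out" points first
    top = X[order[: k * 3]]                # pool of extreme points
    # average consecutive groups of 3 extreme points per centroid
    return top.reshape(k, 3, -1).mean(axis=1)
```

Because nothing is sampled, the seeding (and hence the downstream k-means run) is identical on every call, in contrast to random initialization.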
This document compares the k-means and grid density clustering algorithms. K-means partitions data into k clusters based on minimizing distances between points and cluster centroids. It works well with numerical data but can be affected by outliers. Grid density determines dense grids based on neighbor densities and can handle different shaped and multi-density clusters without knowing the number of clusters beforehand. It has advantages over k-means in that it can handle categorical data, noise and arbitrary shaped clusters.
This document discusses various clustering analysis methods including k-means, k-medoids (PAM), and CLARA. It explains that clustering involves grouping similar objects together without predefined classes. Partitioning methods like k-means and k-medoids (PAM) assign objects to clusters to optimize a criterion function. K-means uses cluster centroids while k-medoids uses actual data points as cluster representatives. PAM is more robust to outliers than k-means but does not scale well to large datasets, so CLARA applies PAM to samples of the data. Examples of clustering applications include market segmentation, land use analysis, and earthquake studies.
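The medoid-based partitioning described above can be sketched compactly. This uses a simplified Voronoi-iteration update rather than PAM's full swap search, so it is a lighter sketch of the same idea, not the PAM algorithm itself:

```python
import numpy as np

def k_medoids(X, k, n_iter=50, seed=0):
    """Simplified k-medoids: medoids are actual data points, updated
    to the cluster member that minimizes the summed distance to all
    other members. Full PAM additionally performs a swap search over
    (medoid, non-medoid) pairs."""
    rng = np.random.default_rng(seed)
    D = np.linalg.norm(X[:, None] - X[None], axis=2)   # pairwise distances
    medoids = rng.choice(len(X), k, replace=False)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)          # nearest medoid
        new = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if len(members):
                # member with the smallest total distance to its cluster
                within = D[np.ix_(members, members)].sum(axis=1)
                new[j] = members[within.argmin()]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return labels, medoids
```

Because medoids are real data points and the objective sums distances rather than squared distances, a single far-away outlier shifts the result much less than it shifts a k-means centroid. The O(n²) distance matrix is also why PAM does not scale, motivating CLARA's sampling.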
IRJET- A Survey of Text Document Clustering by using Clustering Techniques (IRJET Journal)
This document discusses text document clustering techniques. It provides an overview of clustering algorithms such as k-means, k-means++, and k-medoids and compares their performance on text document clustering. It also discusses common clustering validation metrics such as purity and F-measure. The paper proposes an improved method for text document clustering that involves preprocessing the dataset to reduce noise, applying clustering algorithms while evaluating with different indexes, and using distance measures such as Euclidean distance to improve cluster quality. The proposed method aims to improve the accuracy and efficiency of k-medoids clustering for large datasets.
The document discusses implementing an integrated approach of the K-means clustering algorithm for prediction analysis. It begins with motivating the need to improve the accuracy and dependability of existing overlapping K-means clustering by removing its dependency on random initialization parameters. The proposed methodology determines the optimal number of clusters K based on the dataset, calculates initial centroid positions using a harmonic means method, and applies overlapping K-means clustering. The implementation and results on two large datasets show the integrated approach outperforms original overlapping K-means in terms of accuracy, F-measure, Rand index, and number of iterations.
The improved k means with particle swarm optimization (Alexander Decker)
This document summarizes a research paper that proposes an improved K-means clustering algorithm using particle swarm optimization. It begins with an introduction to data clustering and types of clustering algorithms. It then discusses K-means clustering and some of its drawbacks. Particle swarm optimization is introduced as an optimization technique inspired by swarm behavior in nature. The proposed algorithm uses particle swarm optimization to select better initial cluster centroids for K-means clustering in order to overcome some limitations of standard K-means. The algorithm works in two phases - the first uses particle swarm optimization and the second performs K-means clustering using the outputs from the first phase.
New Approach for K-mean and K-medoids Algorithm (Editor IJCATR)
K-means and K-medoids clustering algorithms are widely used for many practical applications. The original k-means and k-medoids algorithms select initial centroids and medoids randomly, which affects the quality of the resulting clusters and sometimes generates unstable and empty clusters that are meaningless. The standard algorithm is also computationally expensive, requiring time proportional to the product of the number of data items, the number of clusters and the number of iterations. The new approach for the k-means algorithm eliminates this deficiency of the existing k-means: it first calculates the initial centroids according to the requirements of users, and then yields better, more effective and stable clusters. It also takes less execution time because it eliminates unnecessary distance computations by reusing results from the previous iteration. The new approach for k-medoids selects initial medoids systematically based on initial centroids. It generates stable clusters to improve accuracy.
The document analyzes crop yield data from spatial locations in Guntur District, Andhra Pradesh, India using hybrid data mining techniques. It first applies k-means clustering to the dataset, producing 5 clusters. It then applies the J48 classification algorithm to the clustered data, resulting in a decision tree that predicts cluster membership based on attributes like crop type, irrigated area, and latitude. Analysis found irrigated areas of cotton and chilies increased from 2007-2008 to 2011-2012. Association rule mining on the clustered data also found relationships between productivity and location attributes. The hybrid approach of clustering followed by classification effectively analyzed the spatial agricultural data.
This document discusses cluster analysis and various clustering algorithms. It begins with an overview of supervised and unsupervised learning, as well as generative models. It then discusses 5 common clustering techniques: partitioning, hierarchical, density-based, grid-based, and model-based clustering. The document also covers challenges with cluster analysis such as centroid initialization, outlier handling, categorical data, the curse of dimensionality, and computational complexity. Specific clustering algorithms discussed in more detail include K-means, K-medoids, K-modes, mini-batch K-means, and scalable K-means++.
This document is a seminar report on the K-Means clustering algorithm submitted by Gaurav Handa. It includes an introduction that discusses the importance of data mining and describes K-Means clustering. It also includes chapters that analyze and plan the implementation of K-Means, describe the algorithm and its flowchart, discuss limitations, and provide examples of implementing K-Means using graphs and Java code. The report was submitted in partial fulfillment of seminar requirements and includes acknowledgements and certificates.
Feature Subset Selection for High Dimensional Data using Clustering Techniques (IRJET Journal)
The document discusses feature subset selection for high dimensional data using clustering techniques. It proposes a FAST algorithm that has three steps: (1) removing irrelevant features, (2) dividing features into clusters, (3) selecting the most representative feature from each cluster. The FAST algorithm uses DBSCAN, a density-based clustering algorithm, to cluster the features. DBSCAN can identify clusters of arbitrary shape and detect noise, making it suitable for high dimensional data. The goal of feature subset selection is to find a small number of discriminative features that best represent the data.
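The density-based clustering that FAST relies on can be sketched from scratch in a few lines: core points have at least min_pts neighbours within eps, clusters grow by expansion from core points, and everything unreachable is noise. The eps and min_pts values below are illustrative assumptions:

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Minimal DBSCAN. Labels >= 0 are cluster ids; -1 marks noise.
    A point is 'core' if it has at least min_pts neighbours (itself
    included) within radius eps."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    neighbors = [np.flatnonzero(D[i] <= eps) for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue                      # already assigned, or not a seed
        # expand a new cluster outward from core point i
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster       # border or core point claimed
                if core[j]:
                    queue.extend(neighbors[j])   # keep expanding from cores
        cluster += 1
    return labels
```

No cluster count is supplied up front, and arbitrary cluster shapes fall out of the expansion step, which is exactly why the abstract argues DBSCAN suits feature clustering in high dimensions.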
This document compares four clustering algorithms (K-means, hierarchical, EM, and density-based) using the WEKA tool. It applies the algorithms to a dataset of software classes and evaluates them based on number of clusters, time to build models, squared errors, and log likelihood. The results show that K-means performs best in terms of time to build models, while density-based clustering performs best in terms of log likelihood. Overall, the document concludes that K-means is the best algorithm for this dataset because it balances low runtime and good clustering accuracy.
Cluster analysis is a technique used to classify objects into groups called clusters based on their similarities. It has many applications in areas like market research, biology, and image processing. There are different types of clustering methods like partitioning, hierarchical, density-based, and grid-based. The k-means algorithm is a commonly used partitioning method where objects are grouped into k clusters based on their distances from centroid points, which are recalculated in each iteration until cluster memberships stabilize. Cluster analysis helps discover patterns and insights from large datasets.
A brief description of clustering, two relevant clustering algorithms (K-means and Fuzzy C-means), clustering validation, and two internal validity indices (Dunn and Davies-Bouldin).
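Of the two internal validity indices mentioned, the Davies-Bouldin index follows directly from its definition: average, over clusters, the worst-case ratio of within-cluster scatter to between-centroid separation. A minimal NumPy sketch:

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index: mean over clusters i of
    max_{j != i} (s_i + s_j) / d_ij, where s_i is the average distance
    of cluster i's points to its centroid and d_ij is the distance
    between centroids i and j. Lower values indicate more compact,
    better-separated clusters."""
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    s = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                  for k, c in zip(ks, cents)])
    db = 0.0
    for i in range(len(ks)):
        ratios = [(s[i] + s[j]) / np.linalg.norm(cents[i] - cents[j])
                  for j in range(len(ks)) if j != i]
        db += max(ratios)
    return db / len(ks)
```

Tight, well-separated clusters drive both numerators down and denominators up, so comparing the index across candidate clusterings (as in validation studies like the one above) favours the lower score.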
This document presents a feature clustering algorithm to reduce the dimensionality of feature vectors for text classification. The algorithm groups words in documents into clusters based on similarity, with each cluster characterized by a membership function. Words not similar to existing clusters form new clusters. This avoids specifying features in advance and the need for trial and error. Experimental results showed the method can classify text faster and with better extracted features than other methods.
This document provides a short review of clustering techniques for students. It defines clustering and different types of grouping methods such as hard vs soft clustering. It discusses popular clustering algorithms like hierarchical clustering, k-means clustering, and density-based clustering. It also covers cluster validity, usability, preprocessing techniques, meta methods, and visual clustering. Open problems in clustering mentioned include how to identify outlier objects and accelerate classification.
Big data Clustering Algorithms And Strategies (Farzad Nozarian)
The document discusses various algorithms for big data clustering. It begins by covering preprocessing techniques such as data reduction. It then covers hierarchical, prototype-based, density-based, grid-based, and scalability clustering algorithms. Specific algorithms discussed include K-means, K-medoids, PAM, CLARA/CLARANS, DBSCAN, OPTICS, MR-DBSCAN, DBCURE, and hierarchical algorithms like PINK and l-SL. The document emphasizes techniques for scaling these algorithms to large datasets, including partitioning, sampling, approximation strategies, and MapReduce implementations.
The International Journal of Engineering and Science (The IJES)
This document summarizes a research paper that proposes a novel approach to improving the k-means clustering algorithm. The standard k-means algorithm is computationally expensive and produces results that depend heavily on the initial centroid selection. The proposed approach determines initial centroids systematically and uses a heuristic to efficiently assign data points to clusters. It improves both the accuracy and efficiency of k-means clustering by ensuring the entire process takes O(n²) time without sacrificing cluster quality.
The document discusses different clustering techniques used for grouping large amounts of data. It covers partitioning methods like k-means and k-medoids that organize data into exclusive groups. It also describes hierarchical methods like agglomerative and divisive clustering that arrange data into nested groups or trees. Additionally, it mentions density-based and grid-based clustering and provides algorithms for different clustering approaches.
Clustering is the step-by-step process of grouping objects whose attribute values are nearly similar. A cluster is thus a collection of objects with nearly the same attribute values: an object in a cluster is similar to the other objects in the same cluster but different from objects in other clusters. Clustering is used in a wide range of applications such as pattern recognition, image processing, data analysis and machine learning. Nowadays, more attention is being paid to categorical data than to numerical data, where ranges of numerical attributes are organized into classes such as small, medium and high. A wide range of algorithms exists for clustering categorical data. Our approach enhances the well-known k-modes clustering algorithm to improve its accuracy. We propose a new approach named "High Accuracy Clustering Algorithm for Categorical datasets".
This document summarizes Chapter 10 of the book "Data Mining: Concepts and Techniques (3rd ed.)" which covers cluster analysis. The chapter introduces different types of clustering methods including partitioning methods like k-means and k-medoids, hierarchical methods, density-based methods, and grid-based methods. It discusses how to evaluate the quality of clustering results and highlights considerations for cluster analysis such as similarity measures, clustering space, and challenges like scalability and high dimensionality.
A Comparative Study Of Various Clustering Algorithms In Data Mining (Natasha Grant)
This document provides an overview and comparison of various clustering algorithms used in data mining. It discusses the key types of clustering algorithms: partition-based (such as k-means and k-medoids), hierarchical-based, density-based, and grid-based. For partition-based algorithms, it describes k-means and k-medoids in more detail. It also discusses hierarchical clustering approaches like agglomerative nesting. The document aims to provide insights into different clustering techniques for segmenting and grouping data in an unsupervised manner.
This document discusses various clustering analysis methods including k-means, k-medoids (PAM), and CLARA. It explains that clustering involves grouping similar objects together without predefined classes. Partitioning methods like k-means and k-medoids (PAM) assign objects to clusters to optimize a criterion function. K-means uses cluster centroids while k-medoids uses actual data points as cluster representatives. PAM is more robust to outliers than k-means but does not scale well to large datasets, so CLARA applies PAM to samples of the data. Examples of clustering applications include market segmentation, land use analysis, and earthquake studies.
IRJET- A Survey of Text Document Clustering by using Clustering TechniquesIRJET Journal
This document discusses text document clustering techniques. It provides an overview of clustering algorithms like k-means, k-means++, and k-medoids and compares their performance on text document clustering. It also discusses common clustering validation metrics like purity, F-measure. The paper proposes an improved method for text document clustering that involves preprocessing the dataset to reduce noise, applying clustering algorithms while evaluating with different indexes, and using distance measures like Euclidean distance to improve cluster quality. The proposed method aims to improve the accuracy and efficiency of k-medoids clustering for large datasets.
The document discusses implementing an integrated approach of the K-means clustering algorithm for prediction analysis. It begins with motivating the need to improve the accuracy and dependability of existing overlapping K-means clustering by removing its dependency on random initialization parameters. The proposed methodology determines the optimal number of clusters K based on the dataset, calculates initial centroid positions using a harmonic means method, and applies overlapping K-means clustering. The implementation and results on two large datasets show the integrated approach outperforms original overlapping K-means in terms of accuracy, F-measure, Rand index, and number of iterations.
The improved k means with particle swarm optimizationAlexander Decker
This document summarizes a research paper that proposes an improved K-means clustering algorithm using particle swarm optimization. It begins with an introduction to data clustering and types of clustering algorithms. It then discusses K-means clustering and some of its drawbacks. Particle swarm optimization is introduced as an optimization technique inspired by swarm behavior in nature. The proposed algorithm uses particle swarm optimization to select better initial cluster centroids for K-means clustering in order to overcome some limitations of standard K-means. The algorithm works in two phases - the first uses particle swarm optimization and the second performs K-means clustering using the outputs from the first phase.
New Approach for K-mean and K-medoids AlgorithmEditor IJCATR
K-means and K-medoids clustering algorithms are widely used for many practical applications. Original k
medoids algorithms select initial centroids and medoids randomly that affect the quality of the resulting clusters and sometimes it
generates unstable and empty clusters which are meaningless.
expensive and requires time proportional to the product of the number of data items, number of clusters and the number of iterations.
The new approach for the k mean algorithm eliminates the deficiency of exiting k mean. It first calculates the initial centro
requirements of users and then gives better, effective and stable cluster. It also takes less execution time because it eliminates
unnecessary distance computation by using previous iteration. The new approach for k
systematically based on initial centroids. It generates stable clusters to improve accuracy.
The document analyzes crop yield data from spatial locations in Guntur District, Andhra Pradesh, India using hybrid data mining techniques. It first applies k-means clustering to the dataset, producing 5 clusters. It then applies the J48 classification algorithm to the clustered data, resulting in a decision tree that predicts cluster membership based on attributes like crop type, irrigated area, and latitude. Analysis found irrigated areas of cotton and chilies increased from 2007-2008 to 2011-2012. Association rule mining on the clustered data also found relationships between productivity and location attributes. The hybrid approach of clustering followed by classification effectively analyzed the spatial agricultural data.
This document discusses cluster analysis and various clustering algorithms. It begins with an overview of supervised and unsupervised learning, as well as generative models. It then discusses 5 common clustering techniques: partitioning, hierarchical, density-based, grid-based, and model-based clustering. The document also covers challenges with cluster analysis such as centroid initialization, outlier handling, categorical data, the curse of dimensionality, and computational complexity. Specific clustering algorithms discussed in more detail include K-means, K-medoids, K-modes, mini-batch K-means, and scalable K-means++.
This document is a seminar report on the K-Means clustering algorithm submitted by Gaurav Handa. It includes an introduction that discusses the importance of data mining and describes K-Means clustering. It also includes chapters that analyze and plan the implementation of K-Means, describe the algorithm and its flowchart, discuss limitations, and provide examples of implementing K-Means using graphs and Java code. The report was submitted in partial fulfillment of seminar requirements and includes acknowledgements and certificates.
Feature Subset Selection for High Dimensional Data using Clustering Techniques - IRJET Journal
The document discusses feature subset selection for high dimensional data using clustering techniques. It proposes a FAST algorithm that has three steps: (1) removing irrelevant features, (2) dividing features into clusters, (3) selecting the most representative feature from each cluster. The FAST algorithm uses DBSCAN, a density-based clustering algorithm, to cluster the features. DBSCAN can identify clusters of arbitrary shape and detect noise, making it suitable for high dimensional data. The goal of feature subset selection is to find a small number of discriminative features that best represent the data.
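For illustration, the density-based step that FAST relies on can be sketched as a compact DBSCAN over Euclidean distance. This is a toy sketch, not the FAST implementation: the `eps`, `min_pts` values and the points are made up, and clustering features (rather than data points) would use a feature-similarity distance instead.

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns a cluster id per point, or -1 for noise."""
    def neighbours(i):
        return [j for j in range(len(points))
                if sum((a - b) ** 2
                       for a, b in zip(points[i], points[j])) <= eps ** 2]
    labels = [None] * len(points)
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1                    # provisionally noise
            continue
        labels[i] = cid                       # i is a core point: grow a cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid               # border point reached from a core
            if labels[j] is not None:
                continue
            labels[j] = cid
            jn = neighbours(j)
            if len(jn) >= min_pts:            # j is also core: keep expanding
                queue.extend(jn)
        cid += 1
    return labels

pts = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (0.1, 0.1),
       (5.0, 5.0), (5.0, 5.1), (5.1, 5.0), (5.1, 5.1),
       (10.0, 10.0)]
labels = dbscan(pts, eps=0.3, min_pts=3)
```

Note how the isolated point receives the label -1 without any cluster count being specified in advance, which is exactly the property that makes DBSCAN attractive for the arbitrary-shape, noisy settings described above.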
This document compares four clustering algorithms (K-means, hierarchical, EM, and density-based) using the WEKA tool. It applies the algorithms to a dataset of software classes and evaluates them based on number of clusters, time to build models, squared errors, and log likelihood. The results show that K-means performs best in terms of time to build models, while density-based clustering performs best in terms of log likelihood. Overall, the document concludes that K-means is the best algorithm for this dataset because it balances low runtime and good clustering accuracy.
Cluster analysis is a technique used to classify objects into groups called clusters based on their similarities. It has many applications in areas like market research, biology, and image processing. There are different types of clustering methods like partitioning, hierarchical, density-based, and grid-based. The k-means algorithm is a commonly used partitioning method where objects are grouped into k clusters based on their distances from centroid points, which are recalculated in each iteration until cluster memberships stabilize. Cluster analysis helps discover patterns and insights from large datasets.
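The centroid-update loop just described can be sketched in a few lines of Python. This is a toy illustration with made-up data, not the implementation of any document summarized here:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Minimal k-means: assign each point to its nearest centroid, then
    recompute each centroid as the mean of its cluster, until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        new = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:        # memberships have stabilized
            break
        centroids = new
    return centroids, clusters

# two well-separated groups yield two distinct centroids
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
cents, groups = kmeans(pts, 2)
```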
A brief description of clustering, two relevant clustering algorithms (K-means and Fuzzy C-means), clustering validation, and two internal validity indices (Dunn and Davies-Bouldin).
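The Davies-Bouldin index mentioned here can be sketched directly from its definition: the average, over clusters, of the worst ratio of summed within-cluster scatter to between-centroid distance. A minimal sketch with made-up toy clusters:

```python
import math

def davies_bouldin(clusters):
    """Davies-Bouldin index over a list of clusters (lists of points).
    Lower values indicate compact, well-separated clusters."""
    cents = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) for cl in clusters]
    scatter = [sum(math.dist(p, c) for p in cl) / len(cl)
               for cl, c in zip(clusters, cents)]
    k = len(clusters)
    return sum(max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                   for j in range(k) if j != i)
               for i in range(k)) / k

tight = davies_bouldin([[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]])
loose = davies_bouldin([[(0.0, 0.0), (0.0, 2.0)], [(1.0, 0.0), (1.0, 2.0)]])
# tight, distant clusters score lower (better) than overlapping ones
```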
This document presents a feature clustering algorithm to reduce the dimensionality of feature vectors for text classification. The algorithm groups words in documents into clusters based on similarity, with each cluster characterized by a membership function. Words not similar to existing clusters form new clusters. This avoids specifying features in advance and the need for trial and error. Experimental results showed the method can classify text faster and with better extracted features than other methods.
This document provides a short review of clustering techniques for students. It defines clustering and different types of grouping methods such as hard vs soft clustering. It discusses popular clustering algorithms like hierarchical clustering, k-means clustering, and density-based clustering. It also covers cluster validity, usability, preprocessing techniques, meta methods, and visual clustering. Open problems in clustering mentioned include how to identify outlier objects and accelerate classification.
Big data Clustering Algorithms And Strategies - Farzad Nozarian
The document discusses various algorithms for big data clustering. It begins by covering preprocessing techniques such as data reduction. It then covers hierarchical, prototype-based, density-based, grid-based, and scalability clustering algorithms. Specific algorithms discussed include K-means, K-medoids, PAM, CLARA/CLARANS, DBSCAN, OPTICS, MR-DBSCAN, DBCURE, and hierarchical algorithms like PINK and l-SL. The document emphasizes techniques for scaling these algorithms to large datasets, including partitioning, sampling, approximation strategies, and MapReduce implementations.
The International Journal of Engineering and Science (The IJES) - theijes
This document summarizes a research paper that proposes a novel approach to improving the k-means clustering algorithm. The standard k-means algorithm is computationally expensive and produces results that depend heavily on the initial centroid selection. The proposed approach determines initial centroids systematically and uses a heuristic to efficiently assign data points to clusters. It improves both the accuracy and efficiency of k-means clustering by ensuring the entire process takes O(n²) time without sacrificing cluster quality.
The document discusses different clustering techniques used for grouping large amounts of data. It covers partitioning methods like k-means and k-medoids that organize data into exclusive groups. It also describes hierarchical methods like agglomerative and divisive clustering that arrange data into nested groups or trees. Additionally, it mentions density-based and grid-based clustering and provides algorithms for different clustering approaches.
Clustering is the step-by-step process of grouping objects whose attribute values are nearly similar; a cluster is thus a collection of objects with closely matching attributes. An object in a cluster is similar to the other objects in the same cluster but different from objects in other clusters. Clustering is used in a wide range of applications such as pattern recognition, image processing, data analysis, and machine learning. Nowadays, more attention is being paid to categorical data than to numerical data, where the ranges of numerical attributes are organised into classes such as small, medium, and high. A wide range of algorithms exists for clustering categorical data. Our approach enhances the well-known k-modes clustering algorithm to improve its accuracy. We propose a new approach named "High Accuracy Clustering Algorithm for Categorical Datasets".
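A minimal sketch of the k-modes algorithm the paragraph refers to, using matching dissimilarity (count of differing attributes) and per-attribute modes in place of Euclidean distance and means. The records and the spread-out initialisation are made up for illustration; this is not the proposed algorithm itself:

```python
from collections import Counter

def mismatches(a, b):
    """k-modes dissimilarity: number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))

def kmodes(records, k, iters=20):
    """Minimal k-modes: like k-means, but with matching dissimilarity and
    per-attribute modes instead of means."""
    n = len(records)
    # made-up initialisation: modes spread through the data
    modes = [records[i * (n - 1) // max(1, k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for r in records:
            clusters[min(range(k), key=lambda c: mismatches(r, modes[c]))].append(r)
        new = [tuple(Counter(col).most_common(1)[0][0] for col in zip(*cl))
               if cl else modes[i] for i, cl in enumerate(clusters)]
        if new == modes:
            break
        modes = new
    return modes, clusters

records = [("red", "small", "plastic"), ("red", "small", "wood"),
           ("red", "medium", "plastic"), ("blue", "large", "metal"),
           ("blue", "large", "wood"), ("blue", "huge", "metal")]
modes, groups = kmodes(records, 2)
```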
This document summarizes Chapter 10 of the book "Data Mining: Concepts and Techniques (3rd ed.)" which covers cluster analysis. The chapter introduces different types of clustering methods including partitioning methods like k-means and k-medoids, hierarchical methods, density-based methods, and grid-based methods. It discusses how to evaluate the quality of clustering results and highlights considerations for cluster analysis such as similarity measures, clustering space, and challenges like scalability and high dimensionality.
A Comparative Study Of Various Clustering Algorithms In Data Mining - Natasha Grant
This document provides an overview and comparison of various clustering algorithms used in data mining. It discusses the key types of clustering algorithms: partition-based (such as k-means and k-medoids), hierarchical-based, density-based, and grid-based. For partition-based algorithms, it describes k-means and k-medoids in more detail. It also discusses hierarchical clustering approaches like agglomerative nesting. The document aims to provide insights into different clustering techniques for segmenting and grouping data in an unsupervised manner.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
This document provides an overview of different techniques for clustering categorical data. It discusses various clustering algorithms that have been used for categorical data, including K-modes, ROCK, COBWEB, and EM algorithms. It also reviews more recently developed algorithms for categorical data clustering, such as algorithms based on particle swarm optimization, rough set theory, and feature weighting schemes. The document concludes that clustering categorical data remains an important area of research, with opportunities to develop techniques that initialize cluster centers better.
This document summarizes a research paper that proposes a dynamic approach to improving the k-means clustering algorithm. The proposed approach aims to address two weaknesses of the standard k-means algorithm: its requirement of prior knowledge of the number of clusters k, and its sensitivity to initialization. The approach determines initial cluster centroids by segmenting the data space and selecting high-frequency segments. It then uses the silhouette validity index to dynamically determine the optimal number of clusters k, rather than requiring the user to specify k. The approach is compared to the standard k-means algorithm and other modified approaches, and is shown to improve initial center selection and reduce computation time.
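The silhouette validity index used in that approach can be sketched as follows. This is a toy illustration with made-up points and candidate labelings, not the paper's implementation:

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette width: a = mean distance to own cluster,
    b = lowest mean distance to any other cluster, s = (b - a) / max(a, b)."""
    ids = sorted(set(labels))
    groups = {c: [p for p, l in zip(points, labels) if l == c] for c in ids}
    def mean_dist(p, grp):
        ds = [math.dist(p, q) for q in grp if q != p]
        return sum(ds) / len(ds) if ds else 0.0
    total = 0.0
    for p, l in zip(points, labels):
        a = mean_dist(p, groups[l])
        b = min(mean_dist(p, groups[c]) for c in ids if c != l)
        total += (b - a) / max(a, b)
    return total / len(points)

pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
two = mean_silhouette(pts, [0, 0, 0, 1, 1, 1])      # natural split
three = mean_silhouette(pts, [0, 0, 1, 2, 2, 2])    # over-split
# the candidate k with the higher mean silhouette is preferred
```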
Mine Blood Donors Information through Improved K-Means Clustering - ijcsity
The number of accidents and health diseases, which are increasing at an alarming rate, is resulting in a huge increase in the demand for blood, so there is a need for organized analysis of blood donor databases and blood bank repositories. Clustering analysis is one of the data mining applications, and the K-means clustering algorithm is fundamental to modern clustering techniques. K-means is a traditional, iterative algorithm: at every iteration it computes the distance from the centroid of each cluster to each and every data point. This paper improves the original k-means algorithm by choosing the initial centroids according to the distribution of the data. Results and discussion show that the improved K-means algorithm produces accurate clusters in less computation time when mining donor information.
Finding Relationships between the Our-NIR Cluster Results - CSCJournals
The problem of evaluating node importance in clustering has been an active research area, and many methods have been developed. Most clustering algorithms rely on general similarity measures, but in real situations data often changes over time. Naively clustering such data not only decreases the quality of the clusters but also disregards users' expectations, since users usually require recent clustering results. We previously proposed the Our-NIR method, which improves on the method of Ming-Syan Chen, as demonstrated by its node-importance results; node importance is very useful in clustering categorical data, but the method still has deficiencies in data labeling and outlier detection. In this paper we modify the Our-NIR evaluation of node importance by introducing a probability distribution, which the comparison of results shows to be an improvement.
This document compares the k-means and grid density clustering algorithms. It summarizes that grid density clustering determines dense grids based on the densities of neighboring grids, and is able to handle different shaped clusters in multi-density environments. The grid density algorithm does not require distance computation and is not dependent on the number of clusters being known in advance like k-means. The document concludes that grid density clustering is better than k-means clustering as it can handle noise and outliers, find arbitrary shaped clusters, and has lower time complexity.
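A sketch of the grid-density idea described above: points are binned into cells, dense cells are kept, and side-adjacent dense cells are merged, so no point-to-point distances are computed and sparse cells fall out as noise. The cell size, density threshold, and data below are made-up illustration values, not any paper's parameters:

```python
from collections import defaultdict, deque

def grid_clusters(points, cell=1.0, min_count=2):
    """Grid-density sketch: bin points into square cells, keep cells holding at
    least min_count points, and join side-adjacent dense cells into clusters."""
    cells = defaultdict(list)
    for p in points:
        cells[(int(p[0] // cell), int(p[1] // cell))].append(p)
    dense = {c for c, members in cells.items() if len(members) >= min_count}
    clusters, seen = [], set()
    for c in dense:
        if c in seen:
            continue
        comp, queue = [], deque([c])
        seen.add(c)
        while queue:                      # flood-fill over adjacent dense cells
            x, y = queue.popleft()
            comp.extend(cells[(x, y)])
            for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if n in dense and n not in seen:
                    seen.add(n)
                    queue.append(n)
        clusters.append(comp)
    return clusters

pts = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.3, 0.4),
       (5.1, 5.1), (5.2, 5.3), (9.9, 9.9)]
found = grid_clusters(pts)   # the lone point at (9.9, 9.9) is dropped as noise
```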
An improvement in k mean clustering algorithm using better time and accuracy - ijpla
This document summarizes a research paper that proposes an improved K-means clustering algorithm to enhance accuracy and reduce computation time. The standard K-means algorithm randomly selects initial cluster centroids, affecting results. The proposed algorithm systematically determines initial centroids based on data point distances. It assigns data to the closest initial centroid to generate initial clusters. Iteratively, it calculates new centroids and reassigns data only if distances decrease, reducing unnecessary computations. Experiments on various datasets show the proposed algorithm achieves higher accuracy faster than standard K-means.
The document discusses employing a rough set based approach for clustering categorical time-evolving data. It proposes using node importance and a rough membership function to label unlabeled data points and detect concept drift between data clusters over time. Specifically, it defines key terms like nodes, introduces the problem of clustering categorical time-series data and concept drift detection. It then describes using a rough membership function to calculate similarity between unlabeled data and existing clusters in order to label the data and detect changes in cluster characteristics over time.
The document proposes improvements to the traditional K-means clustering algorithm to increase accuracy and efficiency. It discusses selecting initial centroids and assigning data points to clusters. The improved algorithm determines initial centroids by calculating distances from data points to the mean and partitioning into K clusters. It then assigns points to centroids based on minimum distance, only recalculating distances if they increase. Experimental results on standard datasets show the improved algorithm takes less time with higher accuracy compared to traditional K-means.
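The distance-reuse heuristic described above can be sketched as follows; this is an illustrative reading of the summary with made-up data, not the paper's exact algorithm. Each point caches its distance to its assigned centroid, and the full k-way comparison runs again only when that distance has grown after the centroids move:

```python
import math

def improved_kmeans(points, centroids, iters=50):
    """k-means with a distance-reuse shortcut: a point is compared against all
    centroids only when its distance to its own (moved) centroid increased."""
    k = len(centroids)
    assign = [None] * len(points)
    best = [math.inf] * len(points)
    for _ in range(iters):
        moved = False
        for idx, p in enumerate(points):
            if assign[idx] is not None:
                d_own = math.dist(p, centroids[assign[idx]])
                if d_own <= best[idx]:       # still close enough: skip k-1 checks
                    best[idx] = d_own
                    continue
            c = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            if c != assign[idx]:
                moved = True
            assign[idx], best[idx] = c, math.dist(p, centroids[c])
        for j in range(k):                   # recompute centroids as means
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = tuple(sum(xs) / len(xs) for xs in zip(*members))
        if not moved:
            break
    return assign, centroids

pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
assign, cents = improved_kmeans(pts, [(0.0, 0.0), (5.0, 5.0)])
```

The shortcut is an approximation: a point kept via the cached distance is never checked against other centroids that pass, which is the trade-off the experiments in these papers measure against plain k-means.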
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data - IRJET Journal
The document describes a new algorithm called MPSKM that clusters uneven dimensional time series subspace data. The algorithm aims to select attribute ranks based on their involvement in the data set and identify global and local patterns. It automates determining the number of clusters and cluster centers. The algorithm calculates a rank matrix based on the sum of squared errors between attribute pairs to rank attributes. It then uses the ranks to transform the data dimensions before clustering. The algorithm is tested on weather data and shown to reduce iteration counts and error compared to traditional methods.
IRJET- Optimal Number of Cluster Identification using Robust K-Means for ... - IRJET Journal
This document discusses using a modified k-means algorithm to identify the optimal number of clusters in categorical sequence data. The traditional k-means algorithm requires the number of clusters to be predefined, which can impact performance. The proposed Robust K-means for Sequences algorithm aims to predict the optimal number of clusters by removing noise clusters. It evaluates cluster validation to assess clustering quality for categorical sequence data, where defining similarity is challenging. The algorithm combines a partition-based clustering method and a cluster validity index within a model selection process to determine the best number of clusters for categorical sequence sets.
Data mining is used to manage the huge amounts of information stored in data warehouses and databases and to discover the required knowledge. Numerous data mining techniques have been proposed, for example association rules, decision trees, neural networks, and clustering, and the field has been a focus of attention for many years. Among the available data mining strategies, clustering of the dataset is one of the most effective: it groups the dataset into a number of clusters based on predefined criteria, and it can reliably discover relationships between the different attributes of the data.
In the k-means clustering algorithm, the distance function is selected on the basis of its relevance for the data, and the Euclidean distance between the centroid of a cluster and the data objects outside the cluster is computed when clustering the data points. In this work, the authors enhance the Euclidean distance formula to increase cluster quality.
The problem of accuracy, and of redundant dissimilar points within clusters, remains in the improved k-means; a new enhanced approach is therefore proposed that uses a similarity function to check the similarity level of each point before including it in a cluster.
Research Inventy: International Journal of Engineering and Science is published by a group of young academic and industrial researchers with 12 issues per year. It is an online as well as print-version open access journal that provides rapid (monthly) publication of articles in all areas of the subject, such as civil, mechanical, chemical, electronic and computer engineering, as well as production and information technology. The Journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence. Papers will be published by a rapid process within 20 days after acceptance, and the peer review process takes only 7 days. All articles published in Research Inventy will be peer-reviewed.
A Study in Employing Rough Set Based Approach for Clustering on Categorical ... - IOSR Journals
This document discusses employing a rough set based approach for clustering categorical time-evolving data. It proposes a method using node importance to label unlabeled data points and find the next clustering result based on the previous one. It first reviews related literature on clustering categorical data and cluster representatives. It then defines basic notations for the problem, including defining a node. It discusses data labeling for unlabeled data points through a rough membership function based similarity between clusters and unlabeled points. This considers node frequency within clusters and distribution across clusters to measure similarity.
Comparison Between Clustering Algorithms for Microarray Data Analysis - IOSR Journals
Currently, two techniques are used for large-scale gene-expression profiling: microarray and RNA-Sequencing (RNA-Seq). This paper studies and compares different clustering algorithms used in microarray data analysis. A microarray is an array of DNA molecules that allows multiple hybridization experiments to be carried out simultaneously and traces the expression levels of thousands of genes. It is a high-throughput technology for gene expression analysis and has become an effective tool for biomedical research. Microarray analysis aims to interpret the data produced from experiments on DNA, RNA, and protein microarrays, which enable researchers to investigate the expression state of a large number of genes. Data clustering is the first and main process in microarray data analysis. The k-means, fuzzy c-means, self-organizing map, and hierarchical clustering algorithms are investigated in this paper and compared on the basis of their clustering models.
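Of the algorithms compared, fuzzy c-means differs from k-means in producing graded memberships rather than hard assignments. A minimal sketch, in which the deterministic initialisation and the toy data are made-up choices:

```python
import math

def fcm(points, k, m=2.0, iters=60):
    """Minimal fuzzy c-means: each point receives a membership degree in
    [0, 1] to every cluster (each row of u sums to 1)."""
    n, dims = len(points), len(points[0])
    # made-up deterministic init: centers spread through the data
    centers = [points[i * (n - 1) // max(1, k - 1)] for i in range(k)]
    for _ in range(iters):
        # membership update: u_ij = 1 / sum_c (d_ij / d_cj)^(2/(m-1))
        u = []
        for p in points:
            d = [max(math.dist(p, c), 1e-12) for c in centers]
            u.append([1.0 / sum((d[i] / d[c]) ** (2.0 / (m - 1.0))
                                for c in range(k))
                      for i in range(k)])
        # center update: mean of all points weighted by u^m
        centers = [tuple(sum(row[i] ** m * p[dim] for row, p in zip(u, points))
                         / sum(row[i] ** m for row in u) for dim in range(dims))
                   for i in range(k)]
    return centers, u

pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centers, u = fcm(pts, 2)
```

The fuzziness parameter m controls how graded the memberships are: values near 1 approach hard k-means, larger values blur the assignments, which is exactly the behaviour studied in the CiteScore analysis above.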
This document compares hierarchical and non-hierarchical clustering algorithms. It summarizes four clustering algorithms: K-Means, K-Medoids, Farthest First Clustering (hierarchical algorithms), and DBSCAN (non-hierarchical algorithm). It describes the methodology of each algorithm and provides pseudocode. It also describes the datasets used to evaluate the performance of the algorithms and the evaluation metrics. The goal is to compare the performance of the clustering methods on different datasets.
CCC-Bicluster Analysis for Time Series Gene Expression Data - IRJET Journal
The document presents a CCC-Biclustering (Contiguous Column Coherence) algorithm for identifying biclusters in time series gene expression data. The algorithm finds maximal biclusters with adjacent/contiguous columns in linear time using Ukkonen's suffix tree construction algorithm and discretized gene expression matrices. The algorithm was applied to a Saccharomyces cerevisiae gene expression time series in response to heat stress. It identifies coherent expression patterns shared among genes over contiguous time points, potentially revealing relevant regulatory modules.
Density Based Clustering Approach for Solving the Software Component Restruct... - IRJET Journal
This document presents research on using the DBSCAN clustering algorithm to solve the problem of software component restructuring. It begins with an abstract that introduces DBSCAN and describes how it can group related software components. It then provides background on software component clustering and describes DBSCAN in more detail. The methodology section outlines the 4 phases of the proposed approach: data collection and processing, clustering with DBSCAN, visualization and analysis, and final restructuring. Experimental results show that DBSCAN produces more evenly distributed clusters compared to fuzzy clustering. The conclusion is that DBSCAN is a better technique for software restructuring as it can identify clusters of varying shapes and sizes without specifying the number of clusters in advance.
IRJET- Diverse Approaches for Document Clustering in Product Development Anal... - IRJET Journal
This document discusses several approaches for clustering textual documents, including:
1. TF-IDF, word embedding, and K-means clustering are proposed to automatically classify and organize documents.
2. Previous work on document clustering is reviewed, including partition-based techniques like K-means and K-medoids, hierarchical clustering, and approaches using semantic features, PSO optimization, and multi-view clustering.
3. Challenges of clustering large document collections at scale are discussed, along with potential solutions using frameworks like Hadoop.
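The TF-IDF vectorisation step mentioned in point 1 can be sketched in plain Python (the toy documents are made up for illustration); cosine similarity over the resulting vectors is what a K-means stage would then operate on:

```python
import math
from collections import Counter

def tfidf(docs):
    """Bag-of-words TF-IDF: term frequency scaled by inverse document
    frequency, so words shared by every document carry no weight."""
    tokenised = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(t for doc in tokenised for t in set(doc))
    vocab = sorted(df)
    return [[doc.count(t) / len(doc) * math.log(n / df[t]) for t in vocab]
            for doc in tokenised]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

docs = ["k means clustering of data",
        "fuzzy clustering of data points",
        "stock market price index",
        "market price of stock shares"]
vecs = tfidf(docs)
# documents about the same topic end up with higher pairwise cosine similarity
```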
Similar to Fuzzy clustering and fuzzy c-means partition cluster analysis and validation studies on a subset of CiteScore dataset (20)
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw... - IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to precisely delineate tumor boundaries from magnetic resonance imaging (MRI) scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The model is rigorously trained and evaluated, exhibiting remarkable performance metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of our proposed model. These findings underscore the model's competence in precise brain tumor localization, underscoring its potential to revolutionize medical image analysis and enhance healthcare outcomes. This research paves the way for future exploration and optimization of advanced CNN models in medical imaging, emphasizing addressing false positives and resource efficiency.
Embedded machine learning-based road conditions and driving behavior monitoring - IJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Advanced control scheme of doubly fed induction generator for wind turbine us... - IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Neural network optimizer of proportional-integral-differential controller par... - IJECEIAES
Wide application of proportional-integral-differential (PID)-regulator in industry requires constant improvement of methods of its parameters adjustment. The paper deals with the issues of optimization of PID-regulator parameters with the use of neural network technology methods. A methodology for choosing the architecture (structure) of neural network optimizer is proposed, which consists in determining the number of layers, the number of neurons in each layer, as well as the form and type of activation function. Algorithms of neural network training based on the application of the method of minimizing the mismatch between the regulated value and the target value are developed. The method of back propagation of gradients is proposed to select the optimal training rate of neurons of the neural network. The neural network optimizer, which is a superstructure of the linear PID controller, allows increasing the regulation accuracy from 0.23 to 0.09, thus reducing the power consumption from 65% to 53%. The results of the conducted experiments allow us to conclude that the created neural superstructure may well become a prototype of an automatic voltage regulator (AVR)-type industrial controller for tuning the parameters of the PID controller.
An improved modulation technique suitable for a three level flying capacitor ... - IJECEIAES
This research paper introduces an innovative modulation technique for controlling a 3-level flying capacitor multilevel inverter (FCMLI), aiming to streamline the modulation process in contrast to conventional methods. The proposed simplified modulation technique paves the way for more straightforward and efficient control of multilevel inverters, enabling their widespread adoption and integration into modern power electronic systems. Through the amalgamation of sinusoidal pulse width modulation (SPWM) with a high-frequency square wave pulse, this controlling technique attains energy equilibrium across the coupling capacitor. The modulation scheme incorporates a simplified switching pattern and a decreased count of voltage references, thereby simplifying the control algorithm.
A review on features and methods of potential fishing zone - IJECEIAES
This review focuses on the importance of identifying potential fishing zones in seawater for sustainable fishing practices. It explores features like sea surface temperature (SST) and sea surface height (SSH), along with classification methods such as classifiers. The features like SST, SSH, and different classifiers used to classify the data, have been figured out in this review study. This study underscores the importance of examining potential fishing zones using advanced analytical techniques. It thoroughly explores the methodologies employed by researchers, covering both past and current approaches. The examination centers on data characteristics and the application of classification algorithms for classification of potential fishing zones. Furthermore, the prediction of potential fishing zones relies significantly on the effectiveness of classification algorithms. Previous research has assessed the performance of models like support vector machines, naïve Bayes, and artificial neural networks (ANN). In the previous result, the results of support vector machine (SVM) were 97.6% more accurate than naive Bayes's 94.2% to classify test data for fisheries classification. By considering the recent works in this area, several recommendations for future works are presented to further improve the performance of the potential fishing zone models, which is important to the fisheries community.
Electrical signal interference minimization using appropriate core material f... - IJECEIAES
As demand for smaller, quicker, and more powerful devices rises, Moore's law is strictly followed. The industry has worked hard to make little devices that boost productivity. The goal is to optimize device density. Scientists are reducing connection delays to improve circuit performance. This helped them understand three-dimensional integrated circuit (3D IC) concepts, which stack active devices and create vertical connections to diminish latency and lower interconnects. Electrical involvement is a big worry with 3D integrates circuits. Researchers have developed and tested through silicon via (TSV) and substrates to decrease electrical wave involvement. This study illustrates a novel noise coupling reduction method using several electrical involvement models. A 22% drop in electrical involvement from wave-carrying to victim TSVs introduces this new paradigm and improves system performance even at higher THz frequencies.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p... - IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Bibliometric analysis highlighting the role of women in addressing climate ch... - IJECEIAES
Fossil fuel consumption increased quickly, contributing to climate change that is evident in unusual flooding, droughts, and global warming. Over the past ten years, women's involvement in society has grown dramatically, and they have succeeded in playing a noticeable role in reducing climate change. A bibliometric analysis of data from the last ten years has been carried out to examine the role of women in addressing climate change. The analysis's findings are discussed in relation to the sustainable development goals (SDGs), particularly SDG 7 and SDG 13. The results consider contributions made by women in various sectors while taking geographic dispersion into account. The bibliometric analysis delves into topics including women's leadership in environmental groups, their involvement in policymaking, their contributions to sustainable development projects, and the influence of gender diversity on attempts to mitigate climate change. This study's results highlight how women have influenced policies and actions related to climate change, point out areas of research deficiency, and offer recommendations on how to increase the role of women in addressing climate change and achieving sustainability. To achieve more successful results, this initiative aims to highlight the significance of gender equality and encourage inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter...IJECEIAES
The active and reactive load changes have a significant impact on voltage
and frequency. In this paper, in order to stabilize the microgrid (MG) against
load variations in islanding mode, the active and reactive power of all
distributed generators (DGs), including energy storage (battery), diesel
generator, and micro-turbine, are controlled. The micro-turbine generator is
connected to MG through a three-phase to three-phase matrix converter, and
the droop control method is applied for controlling the voltage and
frequency of MG. In addition, a method is introduced for voltage and
frequency control of micro-turbines in the transition state from gridconnected mode to islanding mode. A novel switching strategy of the matrix
converter is used for converting the high-frequency output voltage of the
micro-turbine to the grid-side frequency of the utility system. Moreover,
using the switching strategy, the low-order harmonics in the output current
and voltage are not produced, and consequently, the size of the output filter
would be reduced. In fact, the suggested control strategy is load-independent
and has no frequency conversion restrictions. The proposed approach for
voltage and frequency regulation demonstrates exceptional performance and
favorable response across various load alteration scenarios. The suggested
strategy is examined in several scenarios in the MG test systems, and the
simulation results are addressed.
Enhancing battery system identification: nonlinear autoregressive modeling fo...IJECEIAES
Precisely characterizing Li-ion batteries is essential for optimizing their
performance, enhancing safety, and prolonging their lifespan across various
applications, such as electric vehicles and renewable energy systems. This
article introduces an innovative nonlinear methodology for system
identification of a Li-ion battery, employing a nonlinear autoregressive with
exogenous inputs (NARX) model. The proposed approach integrates the
benefits of nonlinear modeling with the adaptability of the NARX structure,
facilitating a more comprehensive representation of the intricate
electrochemical processes within the battery. Experimental data collected
from a Li-ion battery operating under diverse scenarios are employed to
validate the effectiveness of the proposed methodology. The identified
NARX model exhibits superior accuracy in predicting the battery's behavior
compared to traditional linear models. This study underscores the
importance of accounting for nonlinearities in battery modeling, providing
insights into the intricate relationships between state-of-charge, voltage, and
current under dynamic conditions.
Smart grid deployment: from a bibliometric analysis to a surveyIJECEIAES
Smart grids are one of the last decades' innovations in electrical energy.
They bring relevant advantages compared to the traditional grid and
significant interest from the research community. Assessing the field's
evolution is essential to propose guidelines for facing new and future smart
grid challenges. In addition, knowing the main technologies involved in the
deployment of smart grids (SGs) is important to highlight possible
shortcomings that can be mitigated by developing new tools. This paper
contributes to the research trends mentioned above by focusing on two
objectives. First, a bibliometric analysis is presented to give an overview of
the current research level about smart grid deployment. Second, a survey of
the main technological approaches used for smart grid implementation and
their contributions are highlighted. To that effect, we searched the Web of
Science (WoS), and the Scopus databases. We obtained 5,663 documents
from WoS and 7,215 from Scopus on smart grid implementation or
deployment. With the extraction limitation in the Scopus database, 5,872 of
the 7,215 documents were extracted using a multi-step process. These two
datasets have been analyzed using a bibliometric tool called bibliometrix.
The main outputs are presented with some recommendations for future
research.
Use of analytical hierarchy process for selecting and prioritizing islanding ...IJECEIAES
One of the problems that are associated to power systems is islanding
condition, which must be rapidly and properly detected to prevent any
negative consequences on the system's protection, stability, and security.
This paper offers a thorough overview of several islanding detection
strategies, which are divided into two categories: classic approaches,
including local and remote approaches, and modern techniques, including
techniques based on signal processing and computational intelligence.
Additionally, each approach is compared and assessed based on several
factors, including implementation costs, non-detected zones, declining
power quality, and response times using the analytical hierarchy process
(AHP). The multi-criteria decision-making analysis shows that the overall
weight of passive methods (24.7%), active methods (7.8%), hybrid methods
(5.6%), remote methods (14.5%), signal processing-based methods (26.6%),
and computational intelligent-based methods (20.8%) based on the
comparison of all criteria together. Thus, it can be seen from the total weight
that hybrid approaches are the least suitable to be chosen, while signal
processing-based methods are the most appropriate islanding detection
method to be selected and implemented in power system with respect to the
aforementioned factors. Using Expert Choice software, the proposed
hierarchy model is studied and examined.
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by
environmental factors. This variability hampers the control and utilization of
solar cells' peak output. In this study, a single-stage grid-connected PV
system is designed to enhance power quality. Our approach employs fuzzy
logic in the direct power control (DPC) of a three-phase voltage source
inverter (VSI), enabling seamless integration of the PV connected to the
grid. Additionally, a fuzzy logic-based maximum power point tracking
(MPPT) controller is adopted, which outperforms traditional methods like
incremental conductance (INC) in enhancing solar cell efficiency and
minimizing the response time. Moreover, the inverter's real-time active and
reactive power is directly managed to achieve a unity power factor (UPF).
The system's performance is assessed through MATLAB/Simulink
implementation, showing marked improvement over conventional methods,
particularly in steady-state and varying weather conditions. For solar
irradiances of 500 and 1,000 W/m2
, the results show that the proposed
method reduces the total harmonic distortion (THD) of the injected current
to the grid by approximately 46% and 38% compared to conventional
methods, respectively. Furthermore, we compare the simulation results with
IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that
caters to the future needs of society, owing to their renewable, inexhaustible,
and cost-free nature. The power output of these systems relies on solar cell
radiation and temperature. In order to mitigate the dependence on
atmospheric conditions and enhance power tracking, a conventional
approach has been improved by integrating various methods. To optimize
the generation of electricity from solar systems, the maximum power point
tracking (MPPT) technique is employed. To overcome limitations such as
steady-state voltage oscillations and improve transient response, two
traditional MPPT methods, namely fuzzy logic controller (FLC) and perturb
and observe (P&O), have been modified. This research paper aims to
simulate and validate the step size of the proposed modified P&O and FLC
techniques within the MPPT algorithm using MATLAB/Simulink for
efficient power tracking in photovoltaic systems.
Adaptive synchronous sliding control for a robot manipulator based on neural ...IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for
robot hands is always an attractive topic in the research community. This is a
challenging problem because robot manipulators are complex nonlinear systems
and are often subject to fluctuations in loads and external disturbances. This
article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller
ensures that the positions of the joints track the desired trajectory, synchronize
the errors, and significantly reduces chattering. First, the synchronous tracking
errors and synchronous sliding surfaces are presented. Second, the synchronous
tracking error dynamics are determined. Third, a robust adaptive control law is
designed,the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy
logic. The built algorithm ensures that the tracking and approximation errors
are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results.
Simulation and experimental results show that the proposed controller is effective with small synchronous tracking errors, and the chattering phenomenon is
significantly reduced.
Remote field-programmable gate array laboratory for signal acquisition and de...IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students’ learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embeddedsystem design approach targeting FPGA technologies that are fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that allows users to seamlessly incorporate into their FPGA design. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during the experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m...IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba...IJECEIAES
Rapidly and remotely monitoring and receiving the solar cell systems status parameters, solar irradiance, temperature, and humidity, are critical issues in enhancement their efficiency. Hence, in the present article an improved smart prototype of internet of things (IoT) technique based on embedded system through NodeMCU ESP8266 (ESP-12E) was carried out experimentally. Three different regions at Egypt; Luxor, Cairo, and El-Beheira cities were chosen to study their solar irradiance profile, temperature, and humidity by the proposed IoT system. The monitoring data of solar irradiance, temperature, and humidity were live visualized directly by Ubidots through hypertext transfer protocol (HTTP) protocol. The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m 2 respectively during the solar day. The accuracy and rapidity of obtaining monitoring results using the proposed IoT system made it a strong candidate for application in monitoring solar cell systems. On the other hand, the obtained solar power radiation results of the three considered regions strongly candidate Luxor and Cairo as suitable places to build up a solar cells system station rather than El-Beheira.
An efficient security framework for intrusion detection and prevention in int...IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions pose several security loopholes, leading to performance degradation and threat to data security in IoT operations. Thereby, IoT security systems must keep an eye on and restrict unwanted events from occurring in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived towards identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to miss-classification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention consider supervised learning, which requires a large amount of labeled data to be trained. Consequently, such complex datasets are impossible to source in a large network like IoT. To address this problem, this proposed study introduces an efficient learning mechanism to strengthen the IoT security aspects. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with the related works, the experimental outcome shows that the model performs well in a benchmark dataset. It accomplishes an improved detection accuracy of approximately 99.21%.
Home security is of paramount importance in today's world, where we rely more on technology, home
security is crucial. Using technology to make homes safer and easier to control from anywhere is
important. Home security is important for the occupant’s safety. In this paper, we came up with a low cost,
AI based model home security system. The system has a user-friendly interface, allowing users to start
model training and face detection with simple keyboard commands. Our goal is to introduce an innovative
home security system using facial recognition technology. Unlike traditional systems, this system trains
and saves images of friends and family members. The system scans this folder to recognize familiar faces
and provides real-time monitoring. If an unfamiliar face is detected, it promptly sends an email alert,
ensuring a proactive response to potential security threats.
Build the Next Generation of Apps with the Einstein 1 Platform.
Rejoignez Philippe Ozil pour une session de workshops qui vous guidera à travers les détails de la plateforme Einstein 1, l'importance des données pour la création d'applications d'intelligence artificielle et les différents outils et technologies que Salesforce propose pour vous apporter tous les bénéfices de l'IA.
Discover the latest insights on Data Driven Maintenance with our comprehensive webinar presentation. Learn about traditional maintenance challenges, the right approach to utilizing data, and the benefits of adopting a Data Driven Maintenance strategy. Explore real-world examples, industry best practices, and innovative solutions like FMECA and the D3M model. This presentation, led by expert Jules Oudmans, is essential for asset owners looking to optimize their maintenance processes and leverage digital technologies for improved efficiency and performance. Download now to stay ahead in the evolving maintenance landscape.
Accident detection system project report.pdfKamal Acharya
The Rapid growth of technology and infrastructure has made our lives easier. The
advent of technology has also increased the traffic hazards and the road accidents take place
frequently which causes huge loss of life and property because of the poor emergency facilities.
Many lives could have been saved if emergency service could get accident information and
reach in time. Our project will provide an optimum solution to this draw back. A piezo electric
sensor can be used as a crash or rollover detector of the vehicle during and after a crash. With
signals from a piezo electric sensor, a severe accident can be recognized. According to this
project when a vehicle meets with an accident immediately piezo electric sensor will detect the
signal or if a car rolls over. Then with the help of GSM module and GPS module, the location
will be sent to the emergency contact. Then after conforming the location necessary action will
be taken. If the person meets with a small accident or if there is no serious threat to anyone’s
life, then the alert message can be terminated by the driver by a switch provided in order to
avoid wasting the valuable time of the medical rescue team.
Open Channel Flow: fluid flow with a free surfaceIndrajeet sahu
Open Channel Flow: This topic focuses on fluid flow with a free surface, such as in rivers, canals, and drainage ditches. Key concepts include the classification of flow types (steady vs. unsteady, uniform vs. non-uniform), hydraulic radius, flow resistance, Manning's equation, critical flow conditions, and energy and momentum principles. It also covers flow measurement techniques, gradually varied flow analysis, and the design of open channels. Understanding these principles is vital for effective water resource management and engineering applications.
Applications of artificial Intelligence in Mechanical Engineering.pdfAtif Razi
Historically, mechanical engineering has relied heavily on human expertise and empirical methods to solve complex problems. With the introduction of computer-aided design (CAD) and finite element analysis (FEA), the field took its first steps towards digitization. These tools allowed engineers to simulate and analyze mechanical systems with greater accuracy and efficiency. However, the sheer volume of data generated by modern engineering systems and the increasing complexity of these systems have necessitated more advanced analytical tools, paving the way for AI.
AI offers the capability to process vast amounts of data, identify patterns, and make predictions with a level of speed and accuracy unattainable by traditional methods. This has profound implications for mechanical engineering, enabling more efficient design processes, predictive maintenance strategies, and optimized manufacturing operations. AI-driven tools can learn from historical data, adapt to new information, and continuously improve their performance, making them invaluable in tackling the multifaceted challenges of modern mechanical engineering.
Generative AI Use cases applications solutions and implementation.pdfmahaffeycheryld
Generative AI solutions encompass a range of capabilities from content creation to complex problem-solving across industries. Implementing generative AI involves identifying specific business needs, developing tailored AI models using techniques like GANs and VAEs, and integrating these models into existing workflows. Data quality and continuous model refinement are crucial for effective implementation. Businesses must also consider ethical implications and ensure transparency in AI decision-making. Generative AI's implementation aims to enhance efficiency, creativity, and innovation by leveraging autonomous generation and sophisticated learning algorithms to meet diverse business challenges.
https://www.leewayhertz.com/generative-ai-use-cases-and-applications/
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...shadow0702a
This document serves as a comprehensive step-by-step guide on how to effectively use PyCharm for remote debugging of the Windows Subsystem for Linux (WSL) on a local Windows machine. It meticulously outlines several critical steps in the process, starting with the crucial task of enabling permissions, followed by the installation and configuration of WSL.
The guide then proceeds to explain how to set up the SSH service within the WSL environment, an integral part of the process. Alongside this, it also provides detailed instructions on how to modify the inbound rules of the Windows firewall to facilitate the process, ensuring that there are no connectivity issues that could potentially hinder the debugging process.
The document further emphasizes on the importance of checking the connection between the Windows and WSL environments, providing instructions on how to ensure that the connection is optimal and ready for remote debugging.
It also offers an in-depth guide on how to configure the WSL interpreter and files within the PyCharm environment. This is essential for ensuring that the debugging process is set up correctly and that the program can be run effectively within the WSL terminal.
Additionally, the document provides guidance on how to set up breakpoints for debugging, a fundamental aspect of the debugging process which allows the developer to stop the execution of their code at certain points and inspect their program at those stages.
Finally, the document concludes by providing a link to a reference blog. This blog offers additional information and guidance on configuring the remote Python interpreter in PyCharm, providing the reader with a well-rounded understanding of the process.
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...PriyankaKilaniya
Energy efficiency has been important since the latter part of the last century. The main object of this survey is to determine the energy efficiency knowledge among consumers. Two separate districts in Bangladesh are selected to conduct the survey on households and showrooms about the energy and seller also. The survey uses the data to find some regression equations from which it is easy to predict energy efficiency knowledge. The data is analyzed and calculated based on five important criteria. The initial target was to find some factors that help predict a person's energy efficiency knowledge. From the survey, it is found that the energy efficiency awareness among the people of our country is very low. Relationships between household energy use behaviors are estimated using a unique dataset of about 40 households and 20 showrooms in Bangladesh's Chapainawabganj and Bagerhat districts. Knowledge of energy consumption and energy efficiency technology options is found to be associated with household use of energy conservation practices. Household characteristics also influence household energy use behavior. Younger household cohorts are more likely to adopt energy-efficient technologies and energy conservation practices and place primary importance on energy saving for environmental reasons. Education also influences attitudes toward energy conservation in Bangladesh. Low-education households indicate they primarily save electricity for the environment while high-education households indicate they are motivated by environmental concerns.
Software Engineering and Project Management - Introduction, Modeling Concepts...Prakhyath Rai
Introduction, Modeling Concepts and Class Modeling: What is Object orientation? What is OO development? OO Themes; Evidence for usefulness of OO development; OO modeling history. Modeling
as Design technique: Modeling, abstraction, The Three models. Class Modeling: Object and Class Concept, Link and associations concepts, Generalization and Inheritance, A sample class model, Navigation of class models, and UML diagrams
Building the Analysis Models: Requirement Analysis, Analysis Model Approaches, Data modeling Concepts, Object Oriented Analysis, Scenario-Based Modeling, Flow-Oriented Modeling, class Based Modeling, Creating a Behavioral Model.
Int J Elec & Comp Eng, Vol. 9, No. 4, August 2019, ISSN: 2088-8708
Fuzzy clustering and Fuzzy C-Means partition cluster analysis and validation… (K. Varada Rajkumar)
Fuzzy methods find a better fuzzy partition and cluster prototypes through an objective function, so the clustering task can be reformulated as a function optimization problem. The objective function is defined in terms of the cluster prototypes and the memberships of the data points in the clusters. Many fuzzy clustering variants have been proposed by modifying the objective function, with the aim of improving the result with respect to noise, outliers, etc. [6]. Fuzzy C-Means (FCM) clustering is an unsupervised technique that classifies data by grouping similar points in the feature space into clusters [7]; the FCM algorithm generates optimal and equalized clusters [8]. In this work, a fuzzy clustering algorithm followed by the fuzzy c-means (FCM) algorithm is applied to a subset of data derived from the CiteScore database. A note specifying the number of clusters to search for prior to the analysis is also presented.
2. RESEARCH METHOD
a. Dataset
A dataset of 670 journals was selected for this study. The data included in the analysis comprise CiteScore, percentile, citation count, percent cited, SNIP, and SJR, along with publisher information. Before the dataset was finalized, a filtered search was performed over the 22,618 journal titles for which CiteScore is evaluated. A preliminary broad search using the term "Computer Science", restricted to journals with at least 250 documents and including open access titles, resulted in 670 journals.
b. CiteScore
CiteScore measures the citation impact of serial titles such as journals. Serial titles publish at uniform intervals (i.e., one or more volumes per year). CiteScore is the average number of citations received in a year by the documents the journal published in the preceding three years, and it is calculated from Scopus data. The serial types available are trade journals, journals, book series, and conference proceedings; however, only journal titles were selected for this study.
c. Assessing the clusterability - Hopkins statistic
The function get_clust_tendency() in the factoextra package is used to assess whether the dataset can be clustered; it does so by computing the Hopkins statistic, which measures the clustering tendency of a dataset. If the value of the Hopkins statistic is close to zero, the null hypothesis of uniformly distributed data can be rejected, and we can conclude that the dataset is significantly clusterable.
d. Determining optimal clusters
The cluster validity package NbClust was used to estimate the number of clusters in the dataset. NbClust gathers about 30 validity indices in one place. Different clustering algorithms produce different partitions of the data; even with the same algorithm, the final partitions may vary greatly with the presentation order of the data or the parameter settings. The number of clusters was allowed to range from 2 to 8. The distance metric is set to "euclidean"; other available metrics are "minkowski", "maximum", "canberra", "manhattan", and "binary". The agglomeration method for hierarchical clustering is set to "ward.D2"; other methods such as "ward.D", "complete", "single", "mcquitty", "average", "median", or "centroid" can also be selected.
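NbClust itself is an R package; purely to illustrate how one validity index can vote for a cluster count, the Python sketch below scans k = 2..8 and picks the k that maximizes the average silhouette width over a plain k-means partition. All names here (kmeans, avg_silhouette, best_k) are hypothetical, and silhouette is only one of the roughly 30 indices NbClust aggregates.

```python
import math

def kmeans(X, k, iters=50):
    # plain Lloyd's k-means with a deterministic farthest-first initialization
    centers = [list(X[0])]
    while len(centers) < k:
        far = max(X, key=lambda x: min(math.dist(x, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(X)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in X]
        for j in range(k):
            members = [p for p, l in zip(X, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def avg_silhouette(X, labels):
    # average silhouette width: one of the validity indices NbClust votes over
    k = max(labels) + 1
    clusters = [[p for p, l in zip(X, labels) if l == j] for j in range(k)]
    total = 0.0
    for p, l in zip(X, labels):
        a = (sum(math.dist(p, q) for q in clusters[l] if q is not p)
             / max(1, len(clusters[l]) - 1))
        b = min(sum(math.dist(p, q) for q in c) / len(c)
                for j, c in enumerate(clusters) if j != l and c)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(X)

def best_k(X, k_range=range(2, 9)):
    # pick the cluster count with the highest average silhouette width
    return max(k_range, key=lambda k: avg_silhouette(X, kmeans(X, k)))
```

NbClust's actual recommendation is the majority vote across all its indices; the idea above is the single-index building block of that vote.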
e. Soft clustering
Some clustering algorithms, such as EM [9] and fuzzy c-means [10], naturally produce soft partitions of the data as output. A soft partition assigns each instance a degree of association with each output cluster. Pedrycz's work on "collaborative" fuzzy clustering was a milestone [11]: he considered a vertical partitioning scenario and represented the association between multiple partitionings via pairwise interaction coefficients, which extends the cost function to include the effect of association in the optimization process.
f. Fanny
FANNY derives its name from Fuzzy ANalYsis. Fanny aims to minimize the objective function

SUM_[v=1..k] ( SUM_(i,j) u(i,v)^r u(j,v)^r d(i,j) ) / ( 2 SUM_[j=1..n] u(j,v)^r )

where n is the number of observations, k is the number of clusters, r is the membership exponent (memb.exp), and d(i,j) is the dissimilarity between observations i and j. As r -> 1 the clusterings become increasingly crisp, whereas r -> ∞ leads to complete fuzziness.
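To make the objective concrete, the hypothetical helper below evaluates it for a given dissimilarity matrix D (n x n) and membership matrix U (n x k); the paper itself relies on R's fanny() rather than any such function.

```python
def fanny_objective(D, U, r=2.0):
    # Evaluate the FANNY objective
    #   SUM_v [ SUM_{i,j} u(i,v)^r u(j,v)^r d(i,j) ] / ( 2 SUM_j u(j,v)^r )
    # for dissimilarity matrix D and membership matrix U (rows sum to 1).
    n = len(D)
    k = len(U[0])
    total = 0.0
    for v in range(k):
        num = sum(U[i][v] ** r * U[j][v] ** r * D[i][j]
                  for i in range(n) for j in range(n))
        den = 2.0 * sum(U[j][v] ** r for j in range(n))
        total += num / den
    return total
```

For a crisp partition of two well-separated pairs the objective is small, and it grows as the memberships are made more uniform; fanny's minimization exploits exactly this.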
Compared to other fuzzy clustering methods, fanny has the following features:
(a) It also accepts a dissimilarity matrix;
(b) It is more robust to the spherical cluster assumption;
(c) It provides a novel graphical display, the silhouette plot.
g. Validation
To assess whether fanny resulted in better clusters, validation of the clusters should be carried out. Hence, the clValid package was used, which offers internal and stability validation procedures. Internal validation takes only the clustering partition and the dataset as input, and uses intrinsic information in the data to assess the quality of the clustering. The stability measures evaluate the consistency of a clustering result by comparing it with the clusters obtained after each column is removed, one at a time (a leave-one-out method).
h. Internal validation
Internal validation measures the connectedness, compactness, and separation of the cluster
partitions. Connectedness refers to the extent to which observations are placed in the same cluster as
their nearest neighbors in the data space [12]. Compactness, measured by the Dunn index [13], assesses
the homogeneity of a cluster, usually by looking at the intra-cluster variance. Separation, measured by
the silhouette width [14], quantifies the degree of separation between clusters, usually by measuring the
distance between the centroids of the clusters.
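To make these notions concrete, here is a minimal sketch of the Dunn index (higher values indicate compact, well-separated clusters); the helper name and the toy data are ours, not part of the clValid package:

```python
import numpy as np

def dunn_index(x, labels):
    """Dunn index: smallest between-cluster distance divided by the
    largest within-cluster diameter (higher is better)."""
    clusters = [x[labels == k] for k in np.unique(labels)]
    min_inter = min(
        np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2).min()
        for i, a in enumerate(clusters)
        for j, b in enumerate(clusters) if i < j
    )
    max_diam = max(
        np.linalg.norm(cl[:, None, :] - cl[None, :, :], axis=2).max()
        for cl in clusters
    )
    return min_inter / max_diam

# Two tight pairs, far apart: gap of 9 between clusters, diameter 1 within.
x = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
labels = np.array([0, 0, 1, 1])
```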
i. Stability validation
Let the total number of observations (rows) in a dataset be N and the total number of columns be M.
The stability measures then compare the clustering obtained on the full data with the clusterings
obtained after leaving out one column at a time [15]. The stability measures are the average proportion
of non-overlap (APN), the average distance (AD), the average distance between means (ADM), and the
figure of merit (FOM).
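As an illustration, the APN can be sketched in plain Python under our reading of the measure (clValid computes it internally in R; the label vectors below are hypothetical):

```python
def apn(full, dropped_list):
    """Average proportion of non-overlap: 1 minus the mean fraction of an
    observation's full-data cluster that it keeps after a column deletion."""
    n = len(full)
    total, count = 0.0, 0
    for dropped in dropped_list:
        for i in range(n):
            same_full = {j for j in range(n) if full[j] == full[i]}
            same_drop = {j for j in range(n) if dropped[j] == dropped[i]}
            total += len(same_full & same_drop) / len(same_full)
            count += 1
    return 1.0 - total / count

# Identical partitions give APN 0 (perfect stability); a scrambled
# re-clustering drives the value up toward 1.
```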
j. Fuzzy C-Means (FCM)
Fuzzy c-means is a clustering method which allows one point to belong to more than one cluster.
Dunn developed this method in 1973 and Bezdek improved it in 1981. It is often used in pattern
recognition. The FCM algorithm partitions a finite collection of points into a collection of C fuzzy
clusters on the basis of given criteria. Thus, points lying on the edge of a cluster may belong to it
to a lesser degree than points lying in the centre of the cluster. The FCM algorithm is derived by
minimizing the following objective function:
J_m = SUM_[i=1..n] SUM_[j=1..c] u(i,j)^m ||x_i - c_j||^2, 1 <= m < ∞
where u(i,j) is the degree of membership of observation x_i in cluster j, c_j is the centre of
cluster j, and m is the fuzziness exponent.
Fuzzy c-means (FCM) is the best-known fuzzy clustering method [16]. Fuzzy logic is used so that
each instance is associated not only with a single cluster but also has a certain degree of membership
with each of the existing centroids. As m goes to infinity (∞), the solution becomes completely fuzzy,
whereas as m approaches 1 the solution becomes increasingly similar to crisp (binary) k-means
clustering [17]. Setting m=2.0 is a good choice [18].
The FCM clustering algorithm is based on Ruspini's fuzzy clustering theory. The algorithm carries
out its analysis on the basis of the distances between input data points: clusters are formed according
to the distances between data points and the cluster centres. The FCM technique groups a dataset into
n clusters, with every data point in the dataset related to every cluster [19]. A data point has a low
degree of association with a cluster whose centre lies far away from it, and a high degree of
association with a cluster whose centre is closer to it.
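The description above corresponds to the standard alternating update scheme for FCM (centroids from memberships, then memberships from distances); the following numpy sketch is our own minimal implementation, not the R program used in the paper:

```python
import numpy as np

def fcm(x, k, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means: alternate centroid and membership updates
    until the membership matrix stops changing."""
    rng = np.random.default_rng(seed)
    u = rng.random((x.shape[0], k))
    u /= u.sum(axis=1, keepdims=True)          # each row sums to 1
    for _ in range(max_iter):
        um = u ** m
        c = (um.T @ x) / um.sum(axis=0)[:, None]          # fuzzy centroids
        d = np.linalg.norm(x[:, None, :] - c[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                  # guard against zero distance
        inv = d ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(u_new - u).max() < tol:
            return c, u_new
        u = u_new
    return c, u

# Two obvious groups: each point ends up dominated by its own blob's cluster.
x = np.array([[0.0, 0.0], [0.2, 0.0], [10.0, 10.0], [10.2, 10.0]])
centres, u = fcm(x, k=2)
```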
k. Validation: Fuzzy C-Means (FCM)
Cluster validity indices are objective functions used to evaluate and compare clustering
solutions [20]. Validation measures such as "partition.coefficient", "gath.geva", "fukuyama.sugeno",
"xie.beni", "partition.entropy", and "separation.index" were obtained by running the e1071 package.
Gath and Geva [21] adopted three measures: the average partition density (APD), the fuzzy
hypervolume (FHV) and the partition density (PD). The Xie-Beni index (XB) [22] defined inter-cluster
separation as the square of the minimum distance between cluster centres, and intra-cluster
compactness as the square of the mean distance between each data object and its cluster centre.
The optimal cluster number is reached when the minimum of XB is found.
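That definition translates directly into code; a short sketch with illustrative inputs (the function name is ours):

```python
import numpy as np

def xie_beni(x, c, u, m=2.0):
    """Xie-Beni index: total fuzzy intra-cluster dispersion divided by
    n times the squared minimum distance between cluster centres
    (smaller is better)."""
    n = x.shape[0]
    d2 = np.linalg.norm(x[:, None, :] - c[None, :, :], axis=2) ** 2
    compact = ((u ** m) * d2).sum()
    sep = min(
        np.sum((c[a] - c[b]) ** 2)
        for a in range(len(c)) for b in range(len(c)) if a != b
    )
    return compact / (n * sep)

# Two points sitting on their own centres with 90/10 memberships.
x = np.array([[0.0, 0.0], [10.0, 0.0]])
c = np.array([[0.0, 0.0], [10.0, 0.0]])
u = np.array([[0.9, 0.1], [0.1, 0.9]])
```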
Fuzzy clustering and Fuzzy C-Means partition cluster analysis and validation… (K. Varada Rajkumar)
The Fukuyama-Sugeno index (FS) computes two properties of a clustering: cohesion and
separation. Minimizing this objective function favours partitions whose clusters are compact and well
separated from one another; a good partition is indicated by the minimum values of this index.
The partition coefficient index measures the fuzziness of the partition without taking the dataset
itself into consideration; since it has no connection to any property of the data, it is a kind of
trial-and-error method. Maximum values imply a good partition in the sense of a least fuzzy clustering.
The partition entropy index provides information about the membership matrix, again without considering
the data itself; minimum values imply a good partition in the sense of a crisper partition.
The separation index (CS index) identifies unique cluster structures with well-defined features that
depend on the data and a measure of separation.
3. RESULTS AND ANALYSIS
If a fuzzy clustering technique is to be employed in an application-oriented task to meaningfully
cluster groups of objects, the first aspect to address is assessing clustering tendency. To achieve
better results in cluster analysis, the data was therefore investigated for clusterability using the
Hopkins statistic, a method put forward by Banerjee et al. [23]. It measures the probability that the
data follow a uniform distribution, i.e., the 'spatial randomness' of the data. A value of the Hopkins
statistic close to zero leads to rejection of the null hypothesis and indicates significant
clusterability. The citescore dataset considered in the analysis resulted in a Hopkins
statistic of 0.4371, a value < 0.5, indicating that the data is clusterable.
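Under the convention used here, where values toward zero indicate clusterable data, the Hopkins statistic can be sketched as follows; the function and the toy blob data are illustrative, not the R routine used in the paper:

```python
import numpy as np

def hopkins(x, m=None, seed=0):
    """Hopkins statistic: H = sum(w) / (sum(u) + sum(w)), where w are
    nearest-neighbour distances of sampled real points and u those of
    uniform random points in the bounding box. Clustered data drives
    H toward 0; spatially random data gives values near 0.5."""
    rng = np.random.default_rng(seed)
    n, p = x.shape
    m = m or max(1, n // 10)
    uniform = rng.uniform(x.min(axis=0), x.max(axis=0), size=(m, p))
    idx = rng.choice(n, size=m, replace=False)

    u = np.array([np.linalg.norm(x - q, axis=1).min() for q in uniform])
    w = []
    for i in idx:
        d = np.linalg.norm(x - x[i], axis=1)
        d[i] = np.inf                      # exclude the point itself
        w.append(d.min())
    return sum(w) / (u.sum() + sum(w))

# Two tight, well-separated blobs should give a value well below 0.5.
blob = np.vstack([np.zeros((25, 2)), np.full((25, 2), 10.0)])
blob += np.random.default_rng(1).normal(scale=0.05, size=blob.shape)
```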
a. NBClust
From the NbClust analysis, the result given below signifies that 9 validity indices proposed
3 clusters as the best solution, 4 indices proposed 2 clusters, and 8 indices suggested 10 clusters.
According to the majority rule, the best number of clusters is 3.
**************************************************************************************
* Among all indices:
* 4 proposed 2 as the best number of clusters
* 9 proposed 3 as the best number of clusters
* 1 proposed 4 as the best number of clusters
* 1 proposed 5 as the best number of clusters
* 2 proposed 7 as the best number of clusters
* 1 proposed 8 as the best number of clusters
* 1 proposed 9 as the best number of clusters
* 8 proposed 10 as the best number of clusters
* According to the majority rule, the best number of clusters is 3
**************************************************************************************
b. Hubert index
The Hubert index is a graphical method of determining the number of clusters. In the plot of the
Hubert index, a significant knee was observed at the 3rd cluster, corresponding to a significant
increase in the value of the measure, i.e., the significant peak in the Hubert index second differences
plot, as shown in Figure 1.
Figure 1. Hubert index
c. D-index
The D index is a graphical method of determining the number of clusters. In the plot of the
D index, a significant knee (the significant peak in the D index second differences plot) was observed
at the 3rd cluster, corresponding to a significant increase in the value of the measure, as shown in
Figure 2. From Figure 3 and the NbClust output in Table 1, it can be concluded that the optimal number
of clusters for the citescore bibliometric dataset is 3. Therefore, an initial value of k=3 was used
to perform the fuzzy clustering analysis.
Figure 2. D-index
Figure 3. Consensus on optimal number of clusters obtained from NbClust package
Table 1. NbClust output
d. Fanny
A data subset comprising 26 objects was subjected to the fuzzy clustering (fanny) algorithm;
the data was scaled and converted into a data frame, with k=3 as the desired number of clusters.
The run produced cluster and silhouette plots. As per Ruspini [5], data points in the centre of a
cluster can have a membership degree equal to 1, while boundary points carry membership degrees to the
nearby clusters. Certain points near the cluster boundary were identified between clusters 1 and 2.
From Figure 4, point 16 of cluster 1 and point 4 of cluster 2 have a chance to appear in either
cluster. From the membership coefficient data, only point 16 lies between clusters 1 and 2 (with
coefficients 43 and 39), sharing roughly 50% of its characteristics, whereas point 4 has more
cluster-2 significance than cluster-1, its coefficients being 73 for cluster 2 and 20 for cluster 1.
Membership coefficients (in %, rounded) are shown in Table 2.
Figure 4. Fanny output showing 3 clusters
Table 2. Membership coefficients
[,1] [,2] [,3] [,1] [,2] [,3]
[1,] 73 19 8 [14,] 60 19 21
[2,] 85 10 5 [15,] 11 69 20
[3,] 18 71 10 [16,] 43 39 19
[4,] 20 73 7 [17,] 10 68 22
[5,] 12 79 9 [18,] 6 8 86
[6,] 68 20 12 [19,] 2 4 93
[7,] 11 82 8 [20,] 7 19 75
[8,] 9 83 8 [21,] 5 15 80
[9,] 76 15 9 [22,] 2 3 95
[10,] 7 89 5 [23,] 1 3 96
[11,] 6 89 5 [24,] 2 4 94
[12,] 26 57 17 [25,] 2 5 93
[13,] 10 56 33 [26,] 4 6 90
e. Fanny output
Tables 3-6 present the fanny output.
Table 3. Fuzzy clustering objects of class ‘fanny’
Fuzzy Clustering Objects of Class ‘fanny’
m.ship.expon 2
Objective 10.15785
Tolerance 1e-15
Iterations 299
Converged 1
Maxit 500
n 26
Table 4. Fuzziness coefficients
dunn_coeff Normalized
0.4324369 0.1486554
Table 5. attr(,"scaled:center")
CiteScore Count SNIP SJR
18.631538 10066.576923 6.313346 10.740308
Table 6. attr(,"scaled:scale")
CiteScore Count SNIP SJR
9.687746 15622.344869 3.314119 7.473616
f. Fuzzy C-Means
The fuzzy c-means program written in R was run using default parameters, with the maximum
number of iterations fixed at 100, the Euclidean distance measure, k=3 initial centroids, and a data
frame as input. However, the fuzziness parameter m depends on the dataset and the variation of objects
within it. In the literature, many works fix m=2, which allows computation of µij. Therefore, we
evaluated the appropriate value of m for the citescore subset dataset, starting from m=1, at which FCM
failed to extract any clustering solution. A gradual increase from m=1.1 to 1.5 resulted in relatively
low membership values, suggesting that FCM failed to associate any citescore value confidently with a
cluster for k=3.
Appropriate value of m
It was reported by Bezdek [17] that as m goes to infinity, µij approaches 1/K, which suggests
that for a given dataset there is a value of m at and above which the membership values obtained from
fuzzy c-means equal 1/K. It should be noted that the membership values µij depend on the distances
between citescore objects and cluster centroids. For a complex dataset, the cluster centroids are
expected to lie close to the objects of the dataset. This raises the probability that when m varies,
the FCM membership values and the coefficient of variation (CV) of the set of distances between
objects also vary [24].
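This limiting behaviour is easy to check numerically with the FCM membership update for fixed centres; the helper and the values below are illustrative:

```python
import numpy as np

def memberships(x, c, m):
    """FCM membership update u(i,j) for fixed cluster centres c."""
    d = np.fmax(np.linalg.norm(x[:, None, :] - c[None, :, :], axis=2), 1e-12)
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

# One point twice as close to the first of two centres.
x = np.array([[2.0, 0.0]])
c = np.array([[0.0, 0.0], [10.0, 0.0]])
# m near 1 gives a nearly crisp assignment; very large m pushes every
# membership toward 1/K regardless of the distances.
```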
Therefore, the FCM algorithm was run to determine the distribution of membership values as m
varies from 1 to 2. The program iteratively updates the cluster centres and the membership values for
each data point, which moves the cluster centres to an appropriate location within the data set.
The iteration minimizes the objective function, which represents the distance from any given data
point to the cluster centre. The obtained data are given in Table 7 and the membership plots in
Figure 5. The coefficient of variation (CV), also known as relative variability, equals the standard
deviation divided by the mean. It is a measure of spread that describes the amount of variability
relative to the mean. In general, the lower the ratio of standard deviation to mean, the better the
data: a small CV indicates that the data points tend to be very close to each other, whereas a high
CV indicates that they are spread out from one another. This observation is evidenced in the
membership plots in Figure 5.
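The CV figures in Table 7 follow directly from this definition; a minimal sketch (the sample values are illustrative, not the paper's cluster data):

```python
import numpy as np

def coefficient_of_variation(values):
    """Relative variability: sample standard deviation as a % of the mean."""
    v = np.asarray(values, dtype=float)
    return v.std(ddof=1) / v.mean() * 100.0

# [1, 3]: mean 2, sample sd sqrt(2), so CV is about 70.7%.
```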
Table 7. Coefficient of variation versus changes in 'm' values
m      Standard deviation of cluster dataset      Mean of cluster dataset      Coefficient of variation
1.0 0.777 1.731 44.92
1.1 0.777 1.731 44.92
1.2 0.777 1.731 44.92
1.3 0.732 2.269 42.96
1.4 0.732 2.153 33.97
1.5 0.732 2.153 33.97
1.6 0.732 2.153 33.97
2.0 0.732 1.846 39.63
Figure 5. Distribution of membership values upon varying m
The distribution of the membership values varies markedly from one m value to another. From the
above data, it can be inferred that when m is near 1.0, the FCM algorithm yields less fuzzy membership
values; hence, to obtain high and reasonable membership values for objects within a cluster, a
continual process of evaluating m values resulted in a CV of 33.97 for m values from 1.4 to 1.6.
With higher m values, a greater spread in the µij distribution was observed. With lower m values,
all objects tend to be clustered into groups, a feature also evidenced when using hard clustering
procedures such as k-means. Hence, membership values were plotted for the clusters obtained from the
FCM algorithm, and observation of the plots in Figure 5 suggests a finer distribution of the data when
m=1.6 and m=2.0. Thus, the selected value m=2.0 showed a wider distribution of the data. Around five
runs of the FCM algorithm were performed for each m, and the membership values and centroids
corresponding to the lowest objective value were reported, which allowed the entire data space to be
scanned for the best outcome.
g. FCM output
Fuzzy c-means clustering with 3 clusters is shown in Tables 8 and 9. From the output of the
membership coefficients and Figure 6, point 13 is associated with a nearly equal membership vector
(0.42 and 0.50) between cluster 1 and cluster 3. Comparison of the output clusters from fanny and
fuzzy c-means, reported in Figures 4 and 6, suggests that FCM outperforms fanny. It should be noted
that object 16 belongs to cluster 1 in fanny, whereas in FCM it appeared under cluster 3 (Table 9).
Table 8. Cluster centres
CiteScore Count SNIP SJR
1 0.4351629 -0.4594609 0.6589130 0.2602495
2 1.1223955 1.7159599 0.1394195 0.9156109
3 -0.9677277 -0.3844464 -0.8681554 -0.8074940
Table 9. Memberships
1 2 3 1 2 3
[1,] 0.08975023 0.680169994 0.23007978 [14,] 0.25828359 0.534040262 0.20767615
[2,] 0.04052149 0.879051443 0.08042707 [15,] 0.24099783 0.087734834 0.67126734
[3,] 0.10766629 0.135105057 0.75722865 [16,] 0.21537091 0.350244051 0.43438504
[4,] 0.06713259 0.131599971 0.80126744 [17,] 0.26534923 0.077211872 0.65743890
[5,] 0.10105010 0.089192260 0.80975764 [18,] 0.89729461 0.037845149 0.06486024
[6,] 0.13783684 0.654451299 0.20771186 [19,] 0.97268473 0.007618149 0.01969712
[7,] 0.09094683 0.084667884 0.82438529 [20,] 0.82357384 0.040067803 0.13635836
[8,] 0.10525986 0.073075931 0.82166421 [21,] 0.85187495 0.033754744 0.11437030
[9,] 0.10639560 0.732322982 0.16128142 [22,] 0.96482567 0.011124273 0.02405005
[10,] 0.02687498 0.023996248 0.94912877 [23,] 0.96780975 0.009855446 0.02233481
[11,] 0.02078841 0.016760549 0.96245104 [24,] 0.94393379 0.019411553 0.03665465
[12,] 0.18145613 0.205332635 0.61321123 [25,] 0.94339514 0.017418952 0.03918591
[13,] 0.42270327 0.081091918 0.49620481 [26,] 0.91062372 0.033043802 0.05633248
Figure 6. Image showing FCM output
h. Time complexity
The time complexity [25] of the fuzzy clustering (fanny) and fuzzy c-means algorithms was
evaluated by keeping the number of data points constant and varying the number of clusters.
The details are presented in Table 10.
Table 10. Time complexity analysis of fanny and FCM algorithms
Algorithm   k=1     k=2     k=3     k=4
Fanny       0.74    0.79    0.81    0.82
FCM         0.72    0.76    0.76    0.79
(Execution time in seconds)
i. Assessing quality of clusters
Validity of fuzzy clustering was proposed by Gath and Geva (1989) [21] in applications to
mixtures of normal distributions, and distributed hierarchical cluster validity was proposed by Xie
and Beni [26]. In general, clusters are not crisp in nature; for example, an object may not belong
absolutely to a particular cluster and may carry properties of other clusters. Hence, introducing the
fuzziness concept generates a soft clustering of objects rather than employing hard clustering methods
such as k-means or PAM. Here, we employed two fuzzy clustering models, fanny [27] and fuzzy c-means
(FCM) [28], to cluster the data and examine the variability between these methods.
j. Validation
1. Fanny – internal validation
Internal validation measures include the compactness, connectedness and separation of the
cluster partitions, otherwise referred to as connectivity, silhouette width and the Dunn index.
The program produced the following output, where all three measures are within the specified limits,
as shown in Table 11.
Table 11. Internal validation data and
their values output from the program CL valid
Internal Validation Indexes Value
Connectivity 12.6222
Dunn 0.1843
Silhouette 0.4210
2. Fanny – stability validation
The stability measures calculated are APN, AD, ADM, and FOM; all should attain minimum values.
These methods take computation time because the program re-clusters the dataset after omitting each
single column in turn. The output of the program is given in Table 12. It can be observed from
Tables 11 and 12 that all internal and stability measures are within the limits.
Table 12. Stability validation methods and
respective scores of each method signifying acceptable limits
Validation method Score
APN 0.1182984
AD 1.5048444
ADM 0.3834056
FOM 0.8069575
3. Validation: fuzzy c-means (FCM)
The values of the indexes can be used independently to evaluate and compare clustering
partitions, or even to determine the number of clusters in a data set, as shown in Tables 13 and 14.
All validation parameters are found to be within the limits.
Table 13. Estimated Gath and Geva cluster validity measures of
the citescore subset data set with K=2,3,4 and M=2
Cluster size FHV APD PD
2 0.7455 15.744 11.165
3 1.0943 15.805 6.819
4 1.326 14.003 6.578
Table 14. Estimated cluster validity indexes of
citescore subset data set with K=2,3,4 and M=2
Cluster size XB FS PC PE SI
2 0.011 -23.894 0.719 0.437 0.193
3 0.006 -44.194 0.687 0.570 0.221
4 0.033 -40.116 0.568 0.814 0.221
4. CONCLUSION
Cluster analysis performed on a subset of the citescore dataset using the fuzzy clustering
(fanny) and fuzzy c-means algorithms revealed equidistant points at the boundaries of clusters, as
evidenced by the membership degrees of the cluster data points. Time complexity analysis revealed
that FCM runs faster than fanny. Further, all internal and stability measures of fuzzy clustering
and all validity indexes of FCM were found to be within the limits.
REFERENCES
[1] Babu, G.P., Murty, M.N, “Clustering with evolution strategies,” Pattern Recognition, Vol. 27, pp.321–329, 1994.
[2] Bezdek, J.C., Keller, J., Krishnapuram, R., Pal, N.R, “Fuzzy Models and Algorithms for Pattern Recognition and
Image Processing," The Handbooks of Fuzzy Sets, Vol. 4, Kluwer, Boston and London, 1999.
[3] Zadeh. L.A, “Fuzzy sets,” Information and Control, Vol. 8, Iss. 3, pp.338–353, 1965.
[4] Döring, C., Borgelt, C., Kruse R, “Fuzzy clustering of quantitative and qualitative data,” Fuzzy Information,
Processing NAFIPS '04, IEEE, pp. 84-89, 2004.
[5] Ruspini, E.H, “A new approach to clustering,” Information and Control, Vol. 15 Issue 1, pp.22–32, 1969.
[6] Davé, R, “Characterization and detection of noise in clustering,” Pattern Recognition Letters, Vol. 12, pp.657–664,
1991.
[7] Joonsung Park, Doo Heon Song, Hosung Nho, Hyun-Min Choi, Kyung-Ae Kim, Hyun Jun Park, Kwang Baek Kim,
“Automatic Segmentation of Brachial Artery based on Fuzzy C-Means Pixel Clustering from Ultrasound Images,”
International Journal of Electrical and Computer Engineering (IJECE) Vol. 8, No. 2, pp. 638-643, 2018.
[8] Amit Kr. Kaushik, “A Hybrid Approach of Fuzzy C-Means Clustering and Neural Network to Make Energy
Efficient Heterogeneous Wireless Sensor Network,” International Journal of Electrical and Computer Engineering
(IJECE) Vol. 6, No. 2, pp. 674-681, 2016.
[9] Dempster A, Laird N and Rubin D, “Maximum likelihood from incomplete data via the em algorithm,” Journal of
the Royal Statistical Society, Vol. 39 Series B, pp.1–38, 1977.
[10] Dunn JC, “A fuzzy relative of the iso-data process and its use in detecting compact well-separated clusters,” Journal
of Cybernetics, Vol- 3, pp.32–57, 1973.
[11] Pedrycz W, “Collaborative fuzzy clustering,” Pattern Recognition Letters, Vol. 23, Issue 14, pp.1675–86, 2002.
[12] Handl J, Knowles J, Kell DB, “Computational Cluster Validation in Post-Genomic Data Analysis,” Bioinformatics,
Vol. 21 Issue 15, pp.3201–12, 2005.
[13] Dunn JC, “Well Separated Clusters and Fuzzy Partitions,” Journal on Cybernetics, Vol. 4, pp.95–104, 1974.
[14] Rousseeuw PJ, “Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis,” Journal of
Computational and Applied Mathematics, 20, pp.53–65, 1987.
[15] Yeung KY, Haynor DR, Ruzzo WL, “Validating Clustering for Gene Expression Data,” Bioinformatics, Vol. 17
Issue 4, pp.309–18, 2001.
[16] Zadeh.L.A, “Fuzzy sets and systems,” International Journal of General Systems, Vol. 17, pp.129-138, 1965.
[17] Bezdek.J.C, “Pattern Recognition with Fuzzy Objective Function Algorithms,” Plenum Press, New York, 1981.
[18] Nikhil R. Pal, James C. Bezdek, and Richard J. Hathaway, “Sequential Competitive Learning and the Fuzzy
c-Means Clustering Algorithms,” Neural Networks, Vol. 9, No. 5, pp. 787-796, 1996.
[19] Chen.S, Zhang.D, “Robust image segmentation using FCM with spatial constraints based on new kernel-induced
distance measure,” IEEE Transactions on Systems, Man and Cybernetics, Vol. 34, pp.1907-1916, 1998.
[20] Sengupta.S, S. De, Konar.A and Janarthanan.R, “An improved fuzzy clustering method using modified Fukuyama-
Sugeno cluster validity index,” International Conference on Recent Trends in Information Systems, Kolkata, pp.
269-274, 2011.
[21] Gath, I., Geva, A.B, “Unsupervised optimal fuzzy clustering,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, Vol. 11, pp.773–781, 1989.
[22] Xie. X.L., Beni.G, “A validity measure for fuzzy clustering,” IEEE PAMI, Vol. 13, No. 8, pp. 841–847, 1991.
[23] Banerjee A, Dave RN, “Validating clusters using the Hopkins Statistic,” Fuzzy Systems, Proceedings 2004 IEEE
international conference, Vol. 1, pp.149-153, 2004.
[24] Doulaye Dembélé, Philippe Kastner, “Fuzzy C-means method for clustering microarray data,” Bioinformatics, 19(8),
pp.973-980, 2003.
[25] Lim, A., Tjhi, W, “R High Performance Programming,” Packt Publishing Ltd, 2015.
[26] Xie. X.L., Beni.G, “Distributed hierarchical decision fusion with cluster validity,” in Proceedings IEEE
International Conference, Syst. Man Cybern, pp. 689-694, 1990.
[27] Kaufman, L., Rousseeuw, P.J, “Finding Groups in Data: An Introduction to Cluster Analysis,” Wiley, New York,
1990.
[28] Bezdek, James C., Robert Ehrlich, and William Full, “FCM: The fuzzy c-means clustering algorithm,” Computers &
Geosciences, Vol. 10, No. 2-3, pp.191-203, 1984.
BIOGRAPHIES OF AUTHORS
K. Varada Rajkumar is presently working as Assistant Professor of CSE at Sir C R Reddy
College of Engineering, Eluru, Andhra Pradesh. He received his Master of Technology in Computer
Science & Engineering during 2008-10 from JNT University Kakinada. He is presently pursuing a
Ph.D. in the field of Data Mining (cluster analysis) at K L University, Vijayawada, Andhra
Pradesh.
Adimulam Yesu Babu is presently working as Professor & Head of the Department of Computer
Science & Engineering, Sir C R Reddy College of Engineering, Eluru, Andhra Pradesh, India.
He has around 29 years of academic and administrative experience. He is well versed in technical
writing and editing, and reviews research papers for international journals, including the Journal
of Computational Biology and Bioinformatics Research (JCBBR), the International Journal of
Biometrics and Bioinformatics (IJBB), and the CiiT International Journal of Data Mining Knowledge
Engineering.
K. Subrahmanyam is presently working as Professor in the Department of Computer Science &
Engineering, K L University, Vijayawada, Andhra Pradesh, India. He received his Ph.D. from Andhra
University. He has several papers in national and international journals. His research interests
include Software Engineering, Bioinformatics and Data Mining.