Because of the many life-threatening diseases in the world, clustering medical
data is a challenging and critical task: sound decisions must be drawn
effectively from complex multidimensional data. Among the traditional
clustering approaches, K-means is the most familiar fast algorithm. However,
K-means is highly sensitive to the initialization of the cluster centroids and
can easily become trapped in a local optimum. Thus, there is a need for
faster clustering with effective, optimal cluster centroids. Accordingly,
this paper proposes an optimization-based clustering method that hybridizes
fuzzy C-means (FCM) clustering with the rainfall flow optimization technique
(RFFO), which is modeled on the natural flow and behavior of rainfall as it
moves from one position to another. The FCM algorithm is used to cluster the
given medical data, and RFFO is used to produce the optimal cluster
centroids. Finally, the clustering performance of the proposed FCM clustering
on RFFO technique is measured in terms of accuracy, random coefficient, and
Jaccard coefficient on a medical data set, and the risk factor of a heart
attack is identified.
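The abstract does not give the RFFO update rules, so the following is only a minimal sketch of the standard FCM step it builds on, not the authors' implementation. The function names, the fuzzifier m = 2, and the toy two-dimensional points are assumptions for illustration; the hand-picked initial centroids stand in for the centroids that an optimizer such as RFFO would supply.

```python
import math


def memberships(points, centroids, m=2.0):
    """Fuzzy membership u[i][j] of point i in cluster j (each row sums to 1)."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        if any(d == 0.0 for d in dists):
            # The point sits exactly on a centroid: share membership among those.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            u.append([h / sum(hits) for h in hits])
            continue
        inv = [d ** -power for d in dists]
        total = sum(inv)
        u.append([v / total for v in inv])
    return u


def update_centroids(points, u, old_centroids, m=2.0):
    """Move each centroid to the mean of all points weighted by u[i][j] ** m."""
    dim = len(points[0])
    new = []
    for j, old in enumerate(old_centroids):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        if total == 0.0:
            new.append(old)  # no point has any membership here; keep the centroid
            continue
        new.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return new


def fuzzy_c_means(points, init_centroids, m=2.0, max_iter=100, tol=1e-6):
    """Alternate membership and centroid updates until the centroids settle."""
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(max_iter):
        u = memberships(points, centroids, m)
        new = update_centroids(points, u, centroids, m)
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < tol:
            break
    return centroids, memberships(points, centroids, m)


# Toy data (made up): two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, u = fuzzy_c_means(points, init_centroids=[(0.0, 0.0), (6.0, 6.0)])
# Harden the fuzzy memberships into one label per point.
labels = [max(range(len(row)), key=row.__getitem__) for row in u]
```

Because FCM, like K-means, depends on where the centroids start, the `init_centroids` argument is the natural place to plug in centroids produced by a separate optimization stage.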
RAINFALL PREDICTION USING DATA MINING TECHNIQUES - A SURVEY csandit
Rainfall is considered one of the major components of the hydrological process; it plays a
significant part in evaluating drought and flooding events. Therefore, it is important to have an
accurate model for rainfall prediction. Recently, several data-driven modeling approaches, such as
multilayer perceptron neural networks (MLP-NN), have been investigated for such forecasting
tasks. Rainfall time series modeling (SARIMA), in turn, involves important temporal
dimensions. To evaluate the outcomes of both models, statistical parameters were used to
compare them. These parameters include the Root Mean Square
Error (RMSE), Mean Absolute Error (MAE), Coefficient of Correlation (CC), and BIAS. Two-thirds
of the data was used for training the model and one-third for testing.
AN ENTROPIC OPTIMIZATION TECHNIQUE IN HETEROGENEOUS GRID COMPUTING USING BION... ijcsit
The wide usage of the Internet and the availability of powerful computers and high-speed networks as low-cost
commodity components have a deep impact on the way we use computers today:
these technologies have facilitated the use of multi-owner and geographically distributed resources to address
large-scale problems in many areas such as science, engineering, and commerce. The new paradigm of
Grid computing has evolved from research on these topics. Performance and utilization of the grid
depend on a complex and highly dynamic procedure of optimally balancing the load among the
available nodes. In this paper, we suggest a novel two-dimensional figure of merit that depicts the network
effects on load balance and fault tolerance estimation to improve the performance of network
utilization. Fault tolerance is enhanced by adaptively decreasing replication time and
message cost. Load balance, on the other hand, is improved by adaptively decreasing mean job response time.
Finally, Genetic Algorithm, Ant Colony Optimization, and Particle Swarm Optimization are
analyzed with regard to their solutions, issues, and improvements concerning load balancing in
computational grids. Consequently, a significant improvement in system utilization was attained. Experimental
results demonstrate that the proposed method's performance surpasses that of other methods.
Illustration of Medical Image Segmentation based on Clustering Algorithms rahulmonikasharma
Image segmentation is the most basic and crucial process for facilitating the characterization and representation of the structure of interest in medical or basic images. Despite intensive research, segmentation remains a challenging issue because of differing image content, cluttered objects, occlusion, non-uniform object surfaces, and other factors. Numerous algorithms and techniques are available for image segmentation, yet there is still a need to develop an efficient, fast technique for medical image segmentation. This paper focuses on the K-means and Fuzzy C-means clustering algorithms to segment malaria blood samples more accurately.
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER... cscpconf
In the present study, the abilities of three data mining classification methods, namely artificial
neural networks with the feed-forward back-propagation algorithm, the J48 decision tree method, and
logistic regression analysis, are compared on a real medical dataset. The prediction of
malignancy in suspected thyroid tumour patients is the objective of the study. The accuracy of
the predictions (the minimum error rate), the amount of time consumed in the
modelling process, and the interpretability and simplicity of the results for clinical experts are
the factors considered in choosing the best method.
An Improved Differential Evolution Algorithm for Data Stream Clustering IJECEIAES
A few algorithms have been implemented by researchers for clustering data streams. Most of these algorithms require the number of clusters (K) to be fixed by the user based on the input data and kept fixed throughout the clustering process. Stream clustering has faced some difficulties in choosing K. In this paper, we propose an efficient approach for data stream clustering that adopts an Improved Differential Evolution (IDE) algorithm. The IDE algorithm is a fast, powerful, and efficient global optimization approach for automatic clustering. In our proposed approach, we additionally apply an entropy-based method for detecting concept drift in the data stream and thereby updating the clustering procedure online. We compared our proposed method with a Genetic Algorithm and found it to be an efficient optimization algorithm. In the performance assessment, the proposed technique achieves an accuracy of 92.29%, a precision of 86.96%, a recall of 90.30%, and an F-measure of 88.60%.
A SURVEY OF CLUSTERING ALGORITHMS IN ASSOCIATION RULES MINING ijcsit
The main goal of cluster analysis is to classify elements into groups based on their similarity. Clustering
has many applications in fields such as astronomy, bioinformatics, bibliography, and pattern recognition. This
paper presents a survey of clustering methods and techniques and identifies their advantages and
disadvantages, giving a solid background for choosing the best method to extract strong
association rules.
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORI ALGORITHM FOR HANDLING VOLUMIN... acijjournal
Apriori is one of the key algorithms for generating frequent itemsets. Analysing frequent itemsets is a crucial
step in analysing structured data and in finding association relationships between items. This stands as an
elementary foundation for supervised learning, which encompasses classifier and feature extraction
methods. Applying this algorithm is crucial to understanding the behaviour of structured data. Most of the
structured data in scientific domains are voluminous. Processing such data requires state-of-the-art
computing machines, and setting up such an infrastructure is expensive. Hence a distributed environment
such as a clustered setup is employed for tackling such scenarios. The Apache Hadoop distribution is one of
the cluster frameworks for distributed environments; it helps by distributing voluminous data across a
number of nodes in the framework. This paper focuses on the map/reduce design and implementation of
the Apriori algorithm for structured data analysis.
APPLICATION OF DYNAMIC CLUSTERING ALGORITHM IN MEDICAL SURVEILLANCE IJCSEA Journal
Traditional medical analysis is based on static data: the medical data are analysed only after
the collection of the data sets is completed, which is far from satisfying actual demand. Large
amounts of medical data are generated in real time, so real-time analysis can yield more value. This
paper introduces the design of Sentinel, a real-time analysis system based on a
clustering algorithm. Sentinel performs clustering analysis of real-time data
and issues early alerts.
Comparative study of optimization algorithms on convolutional network for aut... IJECEIAES
The last 10 years have been the decade of autonomous vehicles. Advances in intelligent sensors and control schemes have shown the possibility of real applications.
Deep learning, and in particular convolutional networks, have become a fundamental
tool in the solution of problems related to environment identification, path planning,
vehicle behavior, and motion control. In this paper, we perform a comparative study of
the most used optimization strategies on the convolutional architecture residual neural network (ResNet) for an autonomous driving problem as a previous step to the
development of an intelligent sensor. This sensor, part of our research in reactive
systems for autonomous vehicles, aims to become a system for direct mapping of sensory information to control actions from real-time images of the environment. The
optimization techniques analyzed include stochastic gradient descent (SGD), adaptive gradient (Adagrad), adaptive learning rate (Adadelta), root mean square propagation (RMSProp), Adamax, adaptive moment estimation (Adam), Nesterov-accelerated
adaptive moment estimation (Nadam), and follow the regularized leader (Ftrl). The
training of the deep model is evaluated in terms of convergence, accuracy, recall, and
F1-score metrics. Preliminary results show a better performance of the deep network
when using the SGD function as an optimizer, while the Ftrl function presents the
poorest performances.
Energy efficiency in virtual machines allocation for cloud data centers with ... IJECEIAES
Energy usage of data centers is a challenging and complex issue because computing applications and data are growing so quickly that increasingly larger servers and disks are needed to process them within the required time period. In the past few years, many approaches to virtual machine placement have been proposed. This study proposes a new approach for allocating virtual machines to physical hosts that both minimizes the number of physical hosts and avoids SLA violations. The proposed method achieves better results than the other algorithms.
Evolving Efficient Clustering and Classification Patterns in Lymphography Dat... ijsc
Data mining refers to the process of retrieving knowledge by discovering novel and relevant patterns from large datasets. Clustering and classification are two distinct phases in data mining that work to provide an established, proven structure from a voluminous collection of facts. A dominant area of modern-day research in the field of medical investigations includes disease prediction and malady categorization. In this paper, our focus is to analyze clusters of patient records obtained via unsupervised clustering techniques and to compare the performance of classification algorithms on the clinical data. Feature selection is a supervised method that attempts to select a subset of the predictor features based on the information gain. The Lymphography dataset comprises 18 predictor attributes and 148 instances, with the class label having four distinct values. This paper highlights the accuracy of eight clustering algorithms in detecting clusters of patient records and predictor attributes, and the performance of sixteen classification algorithms on the Lymphography dataset in multi-class categorization of medical data. Our work finds that the Random Tree algorithm and Quinlan’s C4.5 algorithm give 100 percent classification accuracy both with all the predictor features and with the feature subset selected by the Fisher Filtering feature selection algorithm. It also finds that the Density Based Spatial Clustering of Applications with Noise (DBSCAN) clustering algorithm offers increased clustering accuracy in less computation time.
Hybrid features selection method using random forest and meerkat clan algorithm TELKOMNIKA JOURNAL
In the majority of gene expression investigations, selecting relevant genes for sample classification is a frequent challenge, with researchers attempting to discover the minimum feasible number of genes while still achieving excellent predictive performance. Various gene selection methods employ univariate (gene-by-gene) gene relevance rankings and arbitrary thresholds for selecting the number of genes, are only applicable to 2-class problems, and use gene selection ranking criteria unrelated to the classification algorithm. A modified random forest (MRF) algorithm based on the meerkat clan algorithm (MCA) is presented in this work.
MCA is one of the swarm intelligence algorithms, and random forest (RF) is one of the most significant decision-tree-based machine learning approaches. MCA is used to choose features for the RF algorithm. In information systems, databases, and other applications, feature selection imputation is critical. The proposed algorithm was applied to three different databases, where the experimental results for accuracy and time showed the superiority of the proposed algorithm over the original algorithm.
SCCAI- A Student Career Counselling Artificial Intelligence vivatechijri
As education grows day by day, competition has prompted a need for students to
understand more about the educational field. A counselor is not always available, and
sometimes lacks proper knowledge about a particular educational field. This leads to
misconceptions about that field and makes it hard for the student to decide on a proper educational trajectory;
guidance is not always useful. The proposed paper addresses these problems using machine learning
algorithms. Various algorithms were considered, and the most suitable for our project are used
here. Three major problems arise, and they are solved using Random Forest, Linear
Regression, and a searching algorithm using the Google API. First, the searching algorithm solves the problem of
location by segregating colleges location-wise; then Random Forest provides the list of colleges based on
stream and percentage range; and finally Linear Regression predicts the current cutoff using previous years’
data. In addition, the proposed system provides information on all fields of education, helping
students to understand and know their field of interest better. The idea is entirely new, with
no existing projects of a similar kind. This project will help guide students throughout.
New hybrid ensemble method for anomaly detection in data science IJECEIAES
Anomaly detection is a significant research area in data science. It is used to find unusual points or uncommon events in data streams, and it is gaining popularity not only in the business world but also in a variety of other fields, such as cyber security, fraud detection for financial systems, and healthcare. Detecting anomalies can be useful for finding new knowledge in the data. This study aims to build an effective model to protect the data from these anomalies. We propose a new hybrid ensemble machine learning method that combines the predictions of two methodologies, isolation forest-k-means and random forest, using majority voting. Several available datasets, including KDD Cup-99, Credit Card, Wisconsin Prognosis Breast Cancer (WPBC), Forest Cover, and Pima, were used to evaluate the proposed method. The experimental results show that the proposed model gives the best results in terms of receiver operating characteristic performance, accuracy, precision, and recall. Our approach is more efficient in detecting anomalies than other approaches. The highest accuracy rate achieved is 99.9%, compared with 97% without the voting method.
Convolutional neural network with binary moth flame optimization for emotion ... IAESIJAI
Electroencephalograph (EEG) signals can reflect brain activity in real time, and using EEG signals to analyze human emotional states is a common line of study. The EEG signals of emotions are not distinctive and differ from one person to another, since each person has a different emotional response to the same stimuli. This is why EEG signals are subject dependent and have proven effective for subject-dependent emotion detection. To achieve enhanced accuracy and a high true positive rate, the suggested system uses a binary moth flame optimization (BMFO) algorithm for feature selection and convolutional neural networks (CNNs) for classification. In this proposal, the optimum features are chosen using accuracy as the objective function. The optimally chosen features are then classified with a CNN to discriminate between different emotional states.
A novel ensemble model for detecting fake news IAESIJAI
Due to the growing proliferation of fake news over the past couple of years, our objective in this paper is to propose an ensemble model for the automatic classification of news articles as either real or fake. For this purpose, we opt for a blending technique that combines three models, namely bidirectional long short-term memory (Bi-LSTM), a stochastic gradient descent classifier, and a ridge classifier. The implementation of the proposed model (i.e. BI-LSR) on real-world datasets has shown outstanding results: it achieved an accuracy score of 99.16%. Accordingly, this ensemble learning has proven to perform better than individual conventional machine learning and deep learning models as well as many ensemble learning approaches cited in the literature.
More Related Content
Similar to Fuzzy C-means clustering on rainfall flow optimization technique for medical data
A SURVEY OF CLUSTERING ALGORITHMS IN ASSOCIATION RULES MININGijcsit
The main goal of cluster analysis is to classify elements into groupsbased on their similarity. Clustering
has many applications such as astronomy, bioinformatics, bibliography, and pattern recognition. In this
paper, a survey of clustering methods and techniques and identification of advantages and disadvantages
of these methods are presented to give a solid background to choose the best method to extract strong
association rules.
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORI ALGORITHM FOR HANDLING VOLUMIN...acijjournal
Apriori is one of the key algorithms for generating frequent itemsets. Analysing frequent itemsets is a crucial
step in analysing structured data and in finding association relationships between items. This stands as an
elementary foundation for supervised learning, which encompasses classifier and feature extraction
methods. Applying this algorithm is crucial to understanding the behaviour of structured data. Most of the
structured data in scientific domains is voluminous. Processing such data requires state-of-the-art
computing machines, and setting up such an infrastructure is expensive. Hence a distributed environment
such as a clustered setup is employed for tackling such scenarios. The Apache Hadoop distribution is one of
the cluster frameworks for distributed environments; it helps by distributing voluminous data across a
number of nodes in the framework. This paper focuses on the map/reduce design and implementation of the
Apriori algorithm for structured data analysis.
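As a rough illustration of the map/reduce formulation described above, the following single-process Python sketch counts candidate itemsets in a map step and filters them by support in a reduce step. It is not the paper's Hadoop implementation: the function names, the in-memory transaction list, and the `min_support` parameter are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def map_phase(transactions, k, prev_frequent):
    """Map: emit (candidate, 1) for each k-itemset found in a transaction."""
    for transaction in transactions:
        items = sorted(set(transaction))
        for candidate in combinations(items, k):
            # Apriori pruning: every (k-1)-subset must already be frequent.
            if k == 1 or all(
                subset in prev_frequent
                for subset in combinations(candidate, k - 1)
            ):
                yield candidate, 1


def reduce_phase(pairs, min_support):
    """Reduce: sum the counts per itemset and keep those meeting min_support."""
    counts = Counter()
    for itemset, count in pairs:
        counts[itemset] += count
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


def apriori(transactions, min_support):
    """Run one map/reduce pass per itemset size until no itemset is frequent."""
    frequent, previous, k = {}, {}, 1
    while True:
        level = reduce_phase(map_phase(transactions, k, previous), min_support)
        if not level:
            return frequent
        frequent.update(level)
        previous, k = level, k + 1
```

In a real cluster each mapper would process only its own shard of the transactions and the framework would group the emitted pairs by key before the reduce step; here both steps run in one process purely to show the data flow.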
APPLICATION OF DYNAMIC CLUSTERING ALGORITHM IN MEDICAL SURVEILLANCEIJCSEA Journal
Traditional medical analysis is based on static data: medical data is analysed only after the
collection of the data sets is completed, which is far from satisfying the actual demand. Large
amounts of medical data are generated in real time, so real-time analysis can yield more value. This
paper introduces the design of Sentinel, a real-time analysis system based on a
clustering algorithm. Sentinel performs clustering analysis of real-time data and can issue
an early alert.
Comparative study of optimization algorithms on convolutional network for aut...IJECEIAES
The last 10 years have been the decade of autonomous vehicles. Advances in intelligent sensors and control schemes have shown the possibility of real applications.
Deep learning, and in particular convolutional networks have become a fundamental
tool in the solution of problems related to environment identification, path planning,
vehicle behavior, and motion control. In this paper, we perform a comparative study of
the most used optimization strategies on the convolutional architecture residual neural network (ResNet) for an autonomous driving problem as a previous step to the
development of an intelligent sensor. This sensor, part of our research in reactive
systems for autonomous vehicles, aims to become a system for direct mapping of sensory information to control actions from real-time images of the environment. The
optimization techniques analyzed include stochastic gradient descent (SGD), adaptive gradient (Adagrad), adaptive learning rate (Adadelta), root mean square propagation (RMSProp), Adamax, adaptive moment estimation (Adam), Nesterov-accelerated
adaptive moment estimation (Nadam), and follow the regularized leader (Ftrl). The
training of the deep model is evaluated in terms of convergence, accuracy, recall, and
F1-score metrics. Preliminary results show a better performance of the deep network
when using the SGD function as an optimizer, while the Ftrl function presents the
poorest performances.
Energy efficiency in virtual machines allocation for cloud data centers with ...IJECEIAES
Energy usage of data centers is a challenging and complex issue because computing applications and data are growing so quickly that increasingly larger servers and disks are needed to process them within the required time period. In the past few years, many approaches to virtual machine placement have been proposed. This study proposes a new approach for allocating virtual machines to physical hosts that minimizes the number of physical hosts while avoiding service-level agreement (SLA) violations. Compared with the other algorithms, the proposed method achieves better results.
Evolving Efficient Clustering and Classification Patterns in Lymphography Dat...ijsc
Data mining refers to the process of retrieving knowledge by discovering novel and relative patterns from large datasets. Clustering and Classification are two distinct phases in data mining that work to provide an established, proven structure from a voluminous collection of facts. A dominant area of modern-day research in the field of medical investigations includes disease prediction and malady categorization. In this paper, our focus is to analyze clusters of patient records obtained via unsupervised clustering techniques and compare the performance of classification algorithms on the clinical data. Feature selection is a supervised method that attempts to select a subset of the predictor features based on the information gain. The Lymphography dataset comprises 18 predictor attributes and 148 instances, with the class label having four distinct values. This paper highlights the accuracy of eight clustering algorithms in detecting clusters of patient records and predictor attributes, and highlights the performance of sixteen classification algorithms on the Lymphography dataset that enables the classifier to accurately perform multi-class categorization of medical data. Our work asserts that the Random Tree algorithm and Quinlan's C4.5 algorithm give 100 percent classification accuracy with all the predictor features and also with the feature subset selected by the Fisher Filtering feature selection algorithm. It is also stated here that the Density Based Spatial Clustering of Applications with Noise (DBSCAN) clustering algorithm offers increased clustering accuracy in less computation time.
Hybrid features selection method using random forest and meerkat clan algorithmTELKOMNIKA JOURNAL
In the majority of gene expression investigations, selecting relevant genes for sample classification is considered a frequent challenge, with researchers attempting to discover the minimum feasible number of genes while yet achieving excellent predictive performance. Various gene selection methods employ univariate (gene-by-gene) gene relevance rankings as well as arbitrary thresholds for selecting the number of genes, are only applicable to 2-class problems and use gene selection ranking criteria unrelated to the algorithm of classification. A modified random forest (MRF) algorithm depending on the meerkat clan algorithm (MCA) is provided in this work.
MCA is one of the swarm intelligence algorithms, and random forest (RF) is one of the most significant decision-tree-based machine learning approaches. MCA is used to choose features for the RF algorithm. In information systems, databases, and other applications, feature selection is critical. The proposed algorithm was applied to three different databases, where the experimental results for accuracy and time showed the superiority of the proposed algorithm over the original algorithm.
SCCAI- A Student Career Counselling Artificial Intelligencevivatechijri
As education grows day by day, competition has created a need for students to
understand more about the educational field. Often a counselor is not available, or lacks
proper knowledge about some educational fields, which leads to
misconceptions about those fields. This makes it hard for students to decide on a suitable educational trajectory, and
guidance is not always useful. The proposed paper aims to overcome these problems using machine learning
algorithms. Various algorithms were considered, and the most suitable for our project are used
here. Three major problems arise, and they are solved using Random Forest, Linear
Regression, and a searching algorithm using the Google API. First, the searching algorithm solves the problem of
location by segregating colleges location-wise; then Random Forest provides the list of colleges using
stream and percentage range; and finally Linear Regression predicts the current cutoff using previous years'
data. Beyond this, the proposed system also provides information on all fields of education, helping
students to understand and learn more about their field of interest. The idea is entirely new, with
no existing projects of a similar kind. This project will help guide students throughout.
New hybrid ensemble method for anomaly detection in data science IJECEIAES
Anomaly detection is a significant research area in data science. It is used to find unusual points or uncommon events in data streams, and it is gaining popularity not only in the business world but also in a variety of other fields, such as cyber security, fraud detection for financial systems, and healthcare. Detecting anomalies can be useful for finding new knowledge in the data. This study aims to build an effective model to protect the data from these anomalies. We propose a new hyper ensemble machine learning method that combines the predictions of two methodologies, isolation forest-k-means and random forest, using majority voting. Several available datasets, including KDD Cup-99, Credit Card, Wisconsin Prognosis Breast Cancer (WPBC), Forest Cover, and Pima, were used to evaluate the proposed method. The experimental results show that our proposed model achieves the best results in terms of receiver operating characteristic performance, accuracy, precision, and recall. Our approach is more efficient in detecting anomalies than other approaches. The highest accuracy rate achieved is 99.9%, compared with 97% without the voting method.
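The voting step mentioned in this abstract can be pictured with a small Python sketch. It assumes each detector outputs one 0/1 label per sample (1 meaning anomaly) and that ties are resolved toward the anomaly label; the function name, the label encoding, and the tie rule are illustrative assumptions, not details taken from the paper.

```python
def majority_vote(predictions, tie_label=1):
    """Combine per-detector 0/1 labels (1 = anomaly) sample by sample.

    predictions: list of equally long label lists, one list per detector.
    tie_label:   label used when the votes are split evenly (an assumption).
    """
    combined = []
    for votes in zip(*predictions):
        ones = sum(votes)
        zeros = len(votes) - ones
        if ones > zeros:
            combined.append(1)
        elif zeros > ones:
            combined.append(0)
        else:
            combined.append(tie_label)
    return combined
```

With exactly two detectors every disagreement is a tie, so the choice of `tie_label` decides whether the ensemble leans toward sensitivity or toward fewer false alarms.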
K-centroid convergence clustering identification in one-label per type for di...IAESIJAI
Disease prediction is a high-demand field that requires significant support from machine learning (ML) to improve the efficiency of results. This research applies K-means clustering to supervised classification in disease prediction where each class has only one labeled data point. The K-centroid convergence clustering identification (KC3I) system is based on semi-K-means clustering but requires only a single labeled data point per class; the training dataset is then used to update the centroids. The KC3I model also includes a dictionary box that indexes all the input centroids before and after the updating process. Each centroid matches a corresponding label inside this box. After training, each time input features arrive, they are assigned to the cluster of the nearest trained centroid by Euclidean distance and then converted into the specific class name that corresponds to that centroid index. Two validation stages were carried out and met expectations in terms of precision, recall, F1-score, and absolute accuracy. The last part demonstrates the possibility of feature reduction by selecting the most crucial features with the extra tree classifier method. When the data are fed into the KC3I system with only the most important features, the accuracy remains the same.
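The flow described in this abstract (one labeled seed per class, centroid updates from the training data, a dictionary mapping centroid index to label, and nearest-centroid assignment by Euclidean distance) can be sketched in plain Python as follows. This is only a reading of the abstract, not the authors' code: the function names, the fixed iteration count, and the toy labels are assumptions.

```python
import math


def nearest(centroids, point):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(centroids, key=lambda idx: math.dist(centroids[idx], point))


def train(seeds, data, iterations=10):
    """seeds: {label: single labeled point}; data: unlabeled training points.

    Returns (centroids, dictionary), where dictionary maps a centroid index
    to its class label, standing in for the 'dictionary box'.
    """
    dictionary = dict(enumerate(seeds))  # centroid index -> label
    centroids = {i: list(seeds[label]) for i, label in dictionary.items()}
    for _ in range(iterations):
        clusters = {i: [] for i in centroids}
        for point in data:
            clusters[nearest(centroids, point)].append(point)
        for i, members in clusters.items():
            if members:  # keep the old centroid if its cluster is empty
                centroids[i] = [sum(axis) / len(members) for axis in zip(*members)]
    return centroids, dictionary


def predict(centroids, dictionary, point):
    """Assign point to its nearest centroid and translate the index to a label."""
    return dictionary[nearest(centroids, point)]
```

For example, seeding with `{"healthy": (0.0, 0.0), "sick": (10.0, 10.0)}` and training on a handful of unlabeled points moves each centroid to the mean of its cluster, after which `predict` returns the class name stored under the winning centroid index.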
Plant leaf detection through machine learning based image classification appr...IAESIJAI
Since maize is a staple diet for people, especially vegetarians and vegans, maize leaf disease has a significant influence on the food industry, including maize crop productivity. Maize quality must therefore be optimal, and to achieve this, maize must be safeguarded from several diseases. As a result, there is a great demand for an automated system that can identify the condition early on and take the appropriate action. Early disease identification is crucial, but it also poses a major obstacle. In this research project, we adopt the fundamental k-nearest neighbor (KNN) model and concentrate on building and developing the enhanced k-nearest neighbor (EKNN) model. EKNN aids in identifying several classes of disease. To gather discriminative, boundary, pattern, and structurally linked information, additional high-quality fine and coarse features are generated. This information is then used in the classification process. The classification algorithm offers high-quality gradient-based features. Additionally, the proposed model is assessed using the Plant-Village dataset, and a comparison with many standard classification models using various metrics is also done.
Backbone search for object detection for applications in intrusion warning sy...IAESIJAI
In this work, we propose a novel backbone search method for object detection for applications in intrusion warning systems. The goal is to find a compact model for use in embedded thermal imaging cameras widely used in intrusion warning systems. The proposed method is based on faster region-based convolutional neural network (Faster R-CNN) because it can detect small objects. Inspired by EfficientNet, the sought-after backbone architecture is obtained by finding the most suitable width scale for the base backbone (ResNet50). The evaluation metrics are mean average precision (mAP), number of parameters, and number of multiply–accumulate operations (MACs). The experimental results showed that the proposed method is effective in building a lightweight neural network for the task of object detection. The obtained model can keep the predefined mAP while minimizing the number of parameters and computational resources. All experiments are executed elaborately on the person detection in intrusion warning systems (PDIWS) dataset.
Deep learning method for lung cancer identification and classificationIAESIJAI
Lung cancer (LC) is claiming many lives and is becoming a serious cause of concern. Detecting LC at an early stage improves the chances of recovery. The accuracy of early-stage LC detection can be improved with the help of a convolutional neural network (CNN) based deep learning approach. In this paper, we present two methodologies for lung cancer detection (LCD) applied to the Lung image database consortium (LIDC) and image database resource initiative (IDRI) data sets. Classification of these LC images is carried out using a support vector machine (SVM) and a deep CNN. The CNN is trained with i) multiple batches and ii) a single batch for classifying LC images as non-cancer or cancer images. All these methods are implemented in MATLAB. The classification accuracy obtained by SVM is 65%, whereas the deep CNN produced detection accuracy of 80% and 100% for multiple and single batch training, respectively. The novelty of our experimentation is the near 100% classification accuracy obtained by our deep CNN model when tested on 25 lung computed tomography (CT) test images, each of size 512×512 pixels, in less than 20 iterations, as compared to the research work carried out by other researchers using cropped LC nodule images.
Optically processed Kannada script realization with Siamese neural network modelIAESIJAI
Optical character recognition (OCR) is a technology that allows computers to recognize and extract text from images or scanned documents. It is commonly used to convert printed or handwritten text into machine-readable format. This study presents an OCR system for Kannada characters based on a Siamese neural network (SNN). Here the SNN, a deep neural network comprising two identical convolutional neural networks (CNNs), compares scripts and ranks them by dissimilarity. When a low dissimilarity score is found, the pair is predicted to be a character match. In this work the authors use 5 classes of Kannada characters, which are first preprocessed by grey scaling and converted to pgm format. These are fed directly into the deep convolutional network, which learns from matching and non-matching image pairs between the CNNs using a contrastive loss function in the Siamese architecture. The proposed OCR system takes much less time and gives more accurate results than a regular CNN. The model can become a powerful tool for identification, particularly in situations where there is a high degree of variation in writing styles or limited training data is available.
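To make the contrastive loss mentioned above more concrete, here is a minimal Python sketch of one widely used form of it, applied to a pair of embedding vectors. Whether the paper uses exactly this form, this margin, or Euclidean distance as the dissimilarity score is an assumption; the function and parameter names are illustrative.

```python
import math


def contrastive_loss(embedding_a, embedding_b, is_match, margin=1.0):
    """Contrastive loss for one pair of embeddings.

    Matching pairs are penalised by their squared distance, so training
    pulls them together. Non-matching pairs are penalised only while they
    are closer than `margin`, so training pushes them apart.
    """
    distance = math.dist(embedding_a, embedding_b)  # dissimilarity score
    if is_match:
        return 0.5 * distance ** 2
    return 0.5 * max(0.0, margin - distance) ** 2
```

At inference time the same distance serves as the dissimilarity score: a pair whose distance falls below a chosen threshold would be predicted as a character match.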
Embedded artificial intelligence system using deep learning and raspberrypi f...IAESIJAI
Melanoma is a kind of skin cancer that originates in melanocytes, the cells responsible for producing melanin. It can be a severe and potentially deadly form of cancer because it can metastasize to other regions of the body if not detected and treated early. To facilitate early detection, various low-cost, reliable, and accurate computer-assisted diagnostic systems have recently been proposed based on artificial intelligence (AI) algorithms, particularly deep learning techniques. This work proposes an innovative and intelligent system that combines the internet of things (IoT) with a Raspberry Pi connected to a camera and a deep learning model based on the deep convolutional neural network (CNN) algorithm for real-time detection and classification of melanoma cancer lesions. The key stages of our model before serialization to the Raspberry Pi are as follows. First, the preprocessing part comprises data cleaning, data transformation (normalization), and data augmentation to reduce overfitting during training. Then, the deep CNN algorithm is used for the feature extraction part. Finally, the classification part applies a sigmoid activation function. The experimental results indicate the efficiency of our proposed classification system, as we achieved an accuracy rate of 92%, a precision of 91%, a sensitivity of 91%, and an area under the curve-receiver operating characteristics (AUC-ROC) of 0.9133.
Deep learning based biometric authentication using electrocardiogram and irisIAESIJAI
Authentication systems play an important role in a wide range of applications. Traditional token, certificate, and password-based authentication systems are now being replaced by biometric authentication systems. Generally, these authentication systems are based on data obtained from the face, iris, electrocardiogram (ECG), fingerprint, and palm print. However, such models are unimodal and suffer from accuracy and reliability issues. In this regard, multimodal biometric authentication systems have gained huge attention as a way to develop robust authentication systems. Moreover, recent developments in deep learning have made it possible to build more robust architectures that overcome the issues of traditional machine learning based authentication systems. In this work, we adopt ECG and iris data and train on the obtained features with the help of a hybrid convolutional neural network-long short-term memory (CNN-LSTM) model. For ECG, R peak detection is considered an important aspect of feature extraction, and morphological features are extracted. Similarly, Gabor wavelet, gray level co-occurrence matrix (GLCM), gray level difference matrix (GLDM), and principal component analysis (PCA) based feature extraction methods are applied to the iris data. The final feature vector is obtained from the MIT-BIH and IIT Delhi Iris datasets and is trained and tested using CNN-LSTM. The experimental analysis shows that the proposed approach achieves average accuracy, precision, and F1-score of 0.985, 0.962, and 0.975, respectively.
Hybrid channel and spatial attention-UNet for skin lesion segmentationIAESIJAI
Melanoma is a type of skin cancer that has affected many lives globally. Research by the American Cancer Society suggests that it is a serious type of skin cancer that can lead to mortality, but it is almost 100% curable if detected and treated in its early stages. Automated computer vision-based schemes are now widely adopted, but these systems suffer from poor segmentation accuracy. To overcome this issue, deep learning (DL) has become a promising solution, performing extensive training for pattern learning and providing better classification accuracy. However, skin lesion segmentation is affected by skin hair, unclear boundaries, pigmentation, and moles. To address this, we adopt a UNet-based deep learning scheme and incorporate an attention mechanism that considers low-level and high-level statistics combined with a feedback and skip connection module. This helps to obtain robust features without neglecting the channel information. Further, we use channel attention and spatial attention modulation to achieve the final segmentation. The proposed DL-based scheme is applied to a publicly available dataset, and experimental investigation shows that the proposed Hybrid Attention UNet approach achieves average performance of 0.9715, 0.9962, and 0.9710.
Photoplethysmogram signal reconstruction through integrated compression sensi...IAESIJAI
Real-time transmission of photoplethysmogram (PPG) signals is extremely challenging, yet it facilitates the use of an internet of things (IoT) environment for healthcare monitoring. This paper proposes an approach for PPG signal reconstruction through integrated compression sensing and basis function aware shallow learning (CSBSL). The Integrated-CSBSL approach performs combined compression of PPG signals via multiple channels, thereby improving the reconstruction accuracy for the PPG signals essential in healthcare monitoring. An optimal basis function aware shallow learning procedure is employed on PPG signals with prior initialization; this is further fine-tuned by utilizing the knowledge of various other channels, which exploits the further sparsity of the PPG signals. The proposed learning method, combined with PPG signals, retains the knowledge of spatial and temporal correlation. The proposed Integrated-CSBSL approach consists of two steps; in the first step, shallow learning based on the basis function is carried out by training on the PPG signals. The proposed method is evaluated using multichannel PPG signal reconstruction, which potentially benefits clinical applications through PPG monitoring and diagnosis.
Speaker identification under noisy conditions using hybrid convolutional neur...IAESIJAI
Speaker identification is a biometric technique that classifies or identifies a person from other speakers based on speech characteristics. Recently, deep learning models have outperformed conventional machine learning models in speaker identification. Spectrograms of the speech have been used as input in deep learning-based speaker identification using clean speech. However, the performance of speaker identification systems degrades under noisy conditions. Cochleograms have shown better results than spectrograms in deep learning-based speaker recognition under noisy and mismatched conditions. Moreover, hybrid convolutional neural network (CNN) and recurrent neural network (RNN) variants have shown better performance than CNN or RNN variants alone in recent studies. However, no attempt has been made to use a hybrid CNN and enhanced RNN variants in speaker identification with cochleogram input to enhance the performance under noisy and mismatched conditions. In this study, a speaker identification model using a hybrid CNN and gated recurrent unit (GRU) is proposed for noisy conditions using cochleogram input. The VoxCeleb1 audio dataset with real-world noises, with white Gaussian noise (WGN), and without additive noise was employed for experiments. The experimental results and the comparison with existing works show that the proposed model performs better than the other models in this study and than existing works.
Multi-channel microseismic signals classification with convolutional neural n...IAESIJAI
Identifying and classifying microseismic signals is essential to warn of dangers in mines. Deep learning has replaced traditional methods, but labor-intensive manual identification and varying deep learning outcomes pose challenges. This paper proposes a transfer learning-based convolutional neural network (CNN) method called microseismic signals-convolutional neural network (MS-CNN) to automatically recognize and classify microseismic events and blasts. The model was trained on a limited sample of data to obtain an optimal weight model for microseismic waveform recognition and classification. A comparative analysis was performed with an existing CNN model and classical image classification models such as AlexNet, GoogLeNet, and ResNet50. The outcomes demonstrate that the MS-CNN model achieved the best recognition and classification effect (99.6% accuracy) in the shortest time (0.31 s to identify 277 images in the test set). Thus, the MS-CNN model can efficiently recognize and classify microseismic events and blasts in practical engineering applications, improving the recognition timeliness of microseismic signals and further enhancing the accuracy of event classification.
Sophisticated face mask dataset: a novel dataset for effective coronavirus di...IAESIJAI
Efficient and accurate coronavirus disease (COVID-19) surveillance necessitates robust identification of individuals wearing face masks. This research introduces the sophisticated face mask dataset (SFMD), a comprehensive compilation of high-quality face mask images enriched with detailed annotations on mask types, fits, and usage patterns. Leveraging cutting-edge deep learning models—EfficientNet-B2, ResNet50, and MobileNet-V2—, we compare SFMD against two established benchmarks: the real-world masked face dataset (RMFD) and the masked face recognition dataset (MFRD). Across all models, SFMD consistently outperforms RMFD and MFRD in key metrics, including accuracy, precision, recall, and F1 score. Additionally, our study demonstrates the dataset's capability to cultivate robust models resilient to intricate scenarios like low-light conditions and facial occlusions due to accessories or facial hair.
Transfer learning for epilepsy detection using spectrogram imagesIAESIJAI
Epilepsy stands out as one of the common neurological diseases. The neural activity of the brain is observed using electroencephalography (EEG). Manual inspection of EEG brain signals is a slow and arduous process, which puts heavy load on neurologists and affects their performance. The aim of this study is to find the best result of classification using the transfer learning model that automatically identify the epileptic and the normal activity, to classify EEG signals by using images of spectrogram which represents the percentage of energy for each coefficient of the continuous wavelet. Dataset includes the EEG signals recorded at monitoring unit of epilepsy used in this study to presents an application of transfer learning by comparing three models Alexnet, visual geometry group (VGG19) and residual neural network ResNet using different combinations with seven different classifiers. This study tested the models and reached a different value of accuracy and other metrics used to judge their performances, and as a result the best combination has been achieved with ResNet combined with support vector machine (SVM) classifier that classified EEG signals with a high success rate using multiple performance metrics such as 97.22% accuracy and 2.78% the value of the error rate.
Deep neural network for lateral control of self-driving cars in urban environ...IAESIJAI
The exponential growth of the automotive industry clearly indicates that self-driving cars are the future of transportation. However, their biggest challenge lies in lateral control, particularly in urban bottlenecking environments, where disturbances and obstacles are abundant. In these situations, the ego vehicle has to follow its own trajectory while rapidly correcting deviation errors without colliding with other nearby vehicles. Various research efforts have focused on developing lateral control approaches, but these methods remain limited in terms of response speed and control accuracy. This paper presents a control strategy using a deep neural network (DNN) controller to effectively keep the car on the centerline of its trajectory and adapt to disturbances arising from deviations or trajectory curvature. The controller focuses on minimizing deviation errors. The Matlab/Simulink software is used for designing and training the DNN. Finally, simulation results confirm that the suggested controller has several advantages in terms of precision, with lateral deviation remaining below 0.65 meters, and rapidity, with a response time of 0.7 seconds, compared to traditional controllers in solving lateral control.
Attention mechanism-based model for cardiomegaly recognition in chest X-Ray i...IAESIJAI
Recently, cardiovascular diseases (CVDs) have become a rapidly growing problem in the world, especially in developing countries. The latter are facing a lifestyle change that introduces new risk factors for heart disease, that requires a particular and urgent interest. Besides, cardiomegaly is a sign of cardiovascular diseases that refers to various conditions; it is associated with the heart enlargement that can be either transient or permanent depending on certain conditions. Furthermore, cardiomegaly is visible on any imaging test including Chest X-Radiation (X-Ray) images; which are one of the most common tools used by Cardiologists to detect and diagnose many diseases. In this paper, we propose an innovative deep learning (DL) model based on an attention module and MobileNet architecture to recognize Cardiomegaly patients using the popular Chest X-Ray8 dataset. Actually, the attention module captures the spatial relationship between the relevant regions in Chest X-Ray images. The experimental results show that the proposed model achieved interesting results with an accuracy rate of 81% which makes it suitable for detecting cardiomegaly disease.
Efficient commodity price forecasting using long short-term memory modelIAESIJAI
Predicting commodity prices, particularly food prices, is a significant concern for various stakeholders, especially in regions that are highly sensitive to commodity price volatility. Historically, many machine learning models like autoregressive integrated moving average (ARIMA) and support vector machine (SVM) have been suggested to overcome the forecasting task. These models struggle to capture the multifaceted and dynamic factors influencing these prices. Recently, deep learning approaches have demonstrated considerable promise in handling complex forecasting tasks. This paper presents a novel long short-term memory (LSTM) network-based model for commodity price forecasting. The model uses five essential commodities namely bread, meat, milk, oil, and petrol. The proposed model focuses on advanced feature engineering which involves moving averages, price volatility, and past prices. The results reveal that our model outperforms traditional methods as it achieves 0.14, 3.04%, and 98.2% for root mean square error (RMSE), mean absolute percentage error (MAPE), and R-squared (R2 ), respectively. In addition to the simplicity of the model, which consists of an LSTM single-cell architecture that reduced the training time to a few minutes instead of hours. This paper contributes to the economic literature on price prediction using advanced deep learning techniques as well as provides practical implications for managing commodity price instability globally.
1-dimensional convolutional neural networks for predicting sudden cardiacIAESIJAI
Sudden cardiac arrest (SCA) is a serious heart problem that occurs without symptoms or warning. SCA causes high mortality. Therefore, it is important to estimate the incidence of SCA. Current methods for predicting ventricular fibrillation (VF) episodes require monitoring patients over time, resulting in no complications. New technologies, especially machine learning, are gaining popularity due to the benefits they provide. However, most existing systems rely on manual processes, which can lead to inefficiencies in disseminating patient information. On the other hand, existing deep learning methods rely on large data sets that are not publicly available. In this study, we propose a deep learning method based on one-dimensional convolutional neural networks to learn to use discrete fourier transform (DFT) features in raw electrocardiogram (ECG) signals. The results showed that our method was able to accurately predict the onset of SCA with an accuracy of 96% approximately 90 minutes before it occurred. Predictions can save many lives. That is, optimized deep learning models can outperform manual models in analyzing long-term signals.
A deep learning-based approach for early detection of disease in sugarcane pl...IAESIJAI
In many regions of the nation, agriculture serves as the primary industry. The farming environment now faces a number of challenges to farmers. One of the major concerns, and the focus of this research, is disease prediction. A methodology is suggested to automate a process for identifying disease in plant growth and warning farmers in advance so they can take appropriate action. Disease in crop plants has an impact on agricultural production. In this work, a novel DenseNet-support vector machine: explainable artificial intelligence (DNet-SVM: XAI) interpretation that combines a DenseNet with support vector machine (SVM) and local interpretable model-agnostic explanation (LIME) interpretation has been proposed. DNet-SVM: XAI was created by a series of modifications to DenseNet201, including the addition of a support vector machine (SVM) classifier. Prior to using SVM to identify if an image is healthy or un-healthy, images are first feature extracted using a convolution network called DenseNet. In addition to offering a likely explanation for the prediction, the reasoning is carried out utilizing the visual cue produced by the LIME. In light of this, the proposed approach, when paired with its determined interpretability and precision, may successfully assist farmers in the detection of infected plants and recommendation of pesticide for the identified disease.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofsAlex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Fuzzy C-means clustering on rainfall flow optimization technique for medical data
IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 12, No. 1, March 2023, pp. 180~188
ISSN: 2252-8938, DOI: 10.11591/ijai.v12.i1.pp180-188
Journal homepage: http://ijai.iaescore.com
Antony Jaya Mabel Rani¹, C. Srivenkateswaran¹, M. Rajasekar², M. Arun³
¹Department of Computer Science and Engineering, Kings Engineering College, Chennai, India
²Institute of Computer Science and Engineering, Saveetha School of Engineering, Chennai, India
³Department of Electronics and Communication Engineering, Panimalar Institute of Technology, Chennai, India
Article Info
Article history:
Received Jul 28, 2021
Revised Jul 27, 2022
Accepted Aug 25, 2022
Keywords:
Accuracy
Clustering
Flood optimization
Fuzzy C-means
Rainfall flow
ABSTRACT
Because of the many life-threatening diseases in the world, clustering medical data is a
challenging and critical task: proper decisions must be drawn effectively from complex
multidimensional data. K-means is the most familiar and the fastest of the traditional
clustering approaches, but it is highly sensitive to the initialization of the cluster
centroids and can easily become trapped in a local optimum. Thus, there is a need for
faster clustering with effective, optimum cluster centroids. This paper therefore proposes
an optimization-based clustering method that hybridizes fuzzy C-means (FCM) clustering
with the rainfall flow optimization (RFFO) technique, which follows the natural flow and
behavior of rainfall moving from one position to another. The FCM clustering algorithm is
used to cluster the given medical data, and RFFO is used to produce the optimum clustering
centroids. Finally, the clustering performance of the proposed FCM clustering on RFFO
technique is measured with accuracy, the Rand coefficient, and the Jaccard coefficient on
a medical data set, and the risk factor of a heart attack is determined.
This is an open access article under the CC BY-SA license.
Corresponding Author:
Antony Jaya Mabel Rani
Department of Computer Science and Engineering, Kings Engineering College
Chennai, Tamil Nadu, India
Email: jayamabelrani@kingsedu.ac.in
1. INTRODUCTION
Optimization-based clustering algorithms have recently been used to cluster complex problems in
various environments and situations. The clustering algorithm in this paper clusters the data by
nearest neighbors in flow depth, following the natural behavior and characteristic flow of
rainwater. This hybrid optimization-based clustering algorithm also uses mathematical
calculations to update the next nearest-neighbor location and the velocity from one position to
another. Here the land space is used to locate all the drops of rainfall and is treated as land
space data (LSD). All drops taken together are considered a flood, that is, the total data in the
located land space (the dataset). A drop can move from its current position to another position
depending on the condition of the location, such as a river, drain, pond, lake, or other storage
location.
Optimization-based algorithms such as sunflower optimization (SFO) [1], the rider optimization
algorithm (ROA) [2], gray wolf optimization (GWO) [3], particle swarm optimization (PSO) [4], and
the genetic algorithm [5] are very powerful algorithms in machine learning under data mining, a
sub-branch of artificial intelligence [6], [7]. There are three different models of machine
learning: unsupervised learning, supervised learning, and reinforcement learning. Depending on the
input and other features, learning is defined as with a supervisor, without a supervisor, or
driven by outcomes such as failure or success [8]. Another form of machine learning is
semi-supervised learning, which combines learning with a supervisor (classification) and without
a supervisor (clustering) [9]–[11].
The main motivation for designing the proposed data clustering algorithm is to produce better
optimum clustering solutions with faster convergence. To produce an optimum clustering solution,
this paper proposes a fuzzy C-means (FCM) clustering algorithm based on rainfall flow optimization
(RFFO) for medical data. The problem involves three steps: i) data preprocessing, ii) feature
selection, and iii) data clustering by the proposed data clustering algorithm.
The proposed solution is designed around two important algorithmic concepts. First, FCM
clustering is used to produce an optimum clustering solution. Then RFFO is used to find the
optimum clustering centroid. The FCM clustering algorithm [12], [13] is the most popular
clustering algorithm based on a mathematical logical model. The RFFO [14] approach works on the
basis of natural rainfall flow behavior. The open land source (OLS) is suitable for locating
raindrops and is referred to as open land source data (OLSD) [15], [16]. The raindrops are
collected together and considered as a torrent, that is, the total fed data [17]. In some
situations and places, the water drops will not move to any other place, which is called data
stagnation. When it rains, the collected raindrops (water) flow from one position to another and
gather as a flood, which is called clustering of data. The storage space attracts raindrops
according to its least distance and its slope, through their acceleration. The storage of water
depends on the minimum distance of the storage location, its maximum depth, its maximum storage
size, and the condition of the storage location, such as soil, climate, nature of wetlands, and
artificial lakes. Today's real-world applications are of various types, with heterogeneous data
sets and dissimilar features [18]. To solve all these complex problems, this paper presents a
novel optimization-based clustering algorithm, the FCM-based RFFO algorithm. The main challenge of
medical data clustering is handling data preprocessing to deal with missing data, noisy data,
data inconsistency, and redundant data in data mining [19]. The visual representation of the
proposed FCM clustering algorithm based on the RFFO technique for medical data is shown in
Figure 1. Today medical data clustering is very vast and intricate, due to the large size of the
received data, hidden information, massive volume, and high frequency.
Figure 1. Visual representation for FCM on RFFO algorithm
2. LITERATURE REVIEW
This literature review covers five recent medical-data clustering methods and their drawbacks.
In 2018, Yelipe et al. designed an imputation method based on class-based clustering, called
IM-CBC, to identify and evaluate the similarity between two medical records. Their paper used the
Euclidean distance together with fuzzy similarity functions to find the similarity between
clusters. Classification was then performed with methods such as SVM, C4.5, or k-nearest neighbors
(KNN), and high accuracy was reported. However, the method did not examine fuzzy measures for data
classification or for predicting results from the given medical data [20]. Also in 2018, Das et
al. proposed a modified bee colony optimization (MBCO) technique for clustering medical data,
combining the K-means clustering algorithm with chaotic theory for faster convergence. Compared
with other clustering methods, MBCO converged faster. However, this hybrid method does not adapt
to multi-objective functions and was not applied to high-frequency data streams [21]. In 2019,
Chauhan et al. presented a two-step clustering technique that analyzes patients' disorders using
different variables and determines the early stage of liver disease from hidden knowledge [22].
In 2020, Yu et al. [23] designed medical data clustering and feature extraction using an immune
evolutionary algorithm under cloud computing for big data. Their final results showed better
accuracy of data classification and improved performance for medical data. Here the final results
produced a better clustering solution through the optimum clustering centroid. This algorithm is
also compared with the existing algorithm [14].
3. THE PROPOSED CLUSTERING METHOD
To combine the advantages of traditional clustering and of a global optimization-based technique
for finding the optimum centroid, a hybrid of FCM and RFFO is proposed here for medical data.
RFFO produces the optimum clustering centroid. Starting from this optimized clustering centroid,
the FCM clustering algorithm can produce a better clustering solution.
3.1. Introduction about clustering
In clustering, a collection of data items is grouped into a set of disjoint classes; clustering is a
sub-branch of unsupervised learning in machine learning [24], [25]. There are different forms of clustering
algorithms, from traditional clustering algorithms to global optimization-based clustering techniques. This
paper uses optimization-based clustering through the hybrid FCM based RFFO clustering algorithm.
3.2. Fuzzy C-means clustering
FCM is a simple clustering algorithm under fuzzy logic, in which a membership can take a value between 0
and 1. It is a partitioning clustering method based on a mathematical logical model [26]. The core objective
of the FCM algorithm is to minimize the cost function OFCM, which uses the Euclidean distance, given by (1).
$$O_{FCM} = O(W, V) = \sum_{i=1}^{X} \sum_{j=1}^{C} (W_{ij})^{d} \, \lVert B_i - C_j \rVert^{2} \qquad (1)$$
Here d is the fuzzification degree and Wij, with i=1, . . ., X and j=1, . . ., C, is the membership matrix.
Bi is the ith item of the given data, and Cj is the jth cluster center. The cluster center is
updated by (2);
$$C_j = \frac{\sum_{i=1}^{X} W_{ij}^{d} \cdot B_i}{\sum_{i=1}^{X} W_{ij}^{d}} \qquad (2)$$
Here the updated fuzzy membership matrix is calculated by (3);
$$W_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{\lVert B_i - C_j \rVert}{\lVert B_i - C_k \rVert} \right)^{2/(d-1)}} \qquad (3)$$
3.3. Pseudocode for FCM clustering
Consider B = {B1, B2, B3, . . ., Bn} as the set of data points and C = {Ce1, Ce2, . . ., Cej} as the set of
cluster centers. The FCM procedure is as follows;
− Step 1: Select the centers of each cluster randomly.
− Step 2: Compute the fuzzy membership values Wij by (3).
− Step 3: Calculate the fuzzy cluster centers Cj by (2).
− Step 4: Repeat steps 2 and 3 until the defined number of iterations is reached, the change in membership is
less than the given threshold value as in (4), or there is no further improvement.
$$\lVert (W_{ij})^{k+1} - (W_{ij})^{k} \rVert < \varepsilon \qquad (4)$$
Here k is the iteration step and ε is the termination threshold. The FCM iteration stops when the change in
the partition matrix is less than ε, which is defined as 0.0001. FCM is somewhat more beneficial than
K-means, but it also has shortcomings compared with global optimization-based clustering: it is sensitive to
the initialization of the cluster centroids and prone to premature convergence.
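The FCM loop described by (1) to (4) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `fcm`, its parameters (`init`, `max_iter`, `seed`), and the small distance guard are assumptions introduced here.

```python
import numpy as np

def fcm(B, n_clusters, d=2.0, eps=1e-4, max_iter=100, init=None, seed=0):
    """Illustrative fuzzy C-means following (1)-(4); names are hypothetical."""
    B = np.asarray(B, dtype=float)
    rng = np.random.default_rng(seed)
    n_points = B.shape[0]
    # Step 1: random initial centers drawn from the data, unless given.
    if init is None:
        C = B[rng.choice(n_points, size=n_clusters, replace=False)].copy()
    else:
        C = np.array(init, dtype=float)
    W = np.zeros((n_points, n_clusters))
    for _ in range(max_iter):
        # Step 2: membership update, equation (3).
        dist = np.linalg.norm(B[:, None, :] - C[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # assumption: guard against zero distance
        ratio = dist[:, :, None] / dist[:, None, :]  # ratio[i, j, k] = d_ij / d_ik
        W_new = 1.0 / np.sum(ratio ** (2.0 / (d - 1.0)), axis=2)
        # Step 3: center update, equation (2).
        Wd = W_new ** d
        C = (Wd.T @ B) / Wd.sum(axis=0)[:, None]
        # Step 4: stopping rule, equation (4).
        converged = np.linalg.norm(W_new - W) < eps
        W = W_new
        if converged:
            break
    # Cost function (1) evaluated at the final memberships and centers.
    dist = np.linalg.norm(B[:, None, :] - C[None, :, :], axis=2)
    objective = float(np.sum((W ** d) * dist ** 2))
    return C, W, objective
```

Each row of the returned membership matrix sums to 1, and a hard label for each data item can be taken as the column with the largest membership. The result depends on the initial centers, which is the sensitivity noted above.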
3.4. Procedure for proposed hybrid FCM based RFFO
FCM depends on the values of the initial membership matrix. The candidate data are selected based on a
probability distribution, which is performed by random initialization. The algorithmic steps for the hybrid
FCM based RFFO are as follows;
− Step 1: Initialization. Initialize Fmax, the maximum number of iterations (Itmax), the acceleration
coefficients AC1 and AC2, Flood best (Fb = ), and Depth best (Db = ). Then the initial cluster centroids
are selected randomly,
4. Int J Artif Intell ISSN: 2252-8938
Fuzzy C-means clustering on rainfall flow optimization technique for … (Antony Jaya Mabel Rani)
183
$$C = \{Ce_1, Ce_2, \ldots, Ce_j\} \qquad (5)$$
− Step 2: Evaluate fitness function. In each iteration, calculate the fitness function Ffit by using (6).
$$F_{fit} = O_{FCM} / K \qquad (6)$$
OFCM, calculated by (1), is the objective function, and K is the total number of clusters for the FCM
algorithm; the cost is minimized with the Euclidean distance as the distance measure. The centroid of each
cluster is again calculated by (2). To find the optimum centroid of the cluster, RFFO uses (6).
− Step 3: Velocity and position update. The position and velocity of RFFO are updated by (7) and (8). The
updated position is calculated by,
$$P_i(t+1) = P_i(t) + Y_i(t+1) \qquad (7)$$
Then the updated velocity is calculated by;
$$Y_i(t+1) = Y_i(t) + C_1 \lambda_1 (D_b - P_i(t)) + C_2 \lambda_2 (F_b - P_i(t)) \qquad (8)$$
Here Yi(t) is computed by using (9);
$$Y_i(t) = (Kc_i \cdot \Delta x) / WP \qquad (9)$$
The gradient Δx, that is M, is computed by (10);
$$M = (a_2 - a_1)/(b_2 - b_1) \qquad (10)$$
Here (a1, b1) and (a2, b2) are the two slope coordinates, Pi(t) is the present position of the particle
at t, and Pi(t+1) is its updated position at (t+1). WP is the water absorbency, that is, the
overall absorbed data. Db denotes the personal best location, and Fb is the globally best solution. The
hydraulic conductivity (Kci) takes a value from 0.8 to 0.95, the capillary constants C1 and C2 (the
coefficients AC1 and AC2) are set to 2.0, and the random variables λ1 and λ2 take values from 0 to 1.
− Step 4: Defining the optimum centroid by RFFO. The RFFO method is used to define the optimum clustering
centroid, which produces a better clustering solution.
− Step 5: Termination condition. Iterate steps 2 to 3 until the maximum or predetermined iteration count
is reached.
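Steps 1 to 5 can be sketched as a small search over candidate centroid sets. This is a minimal sketch under stated assumptions, not the authors' code: the names `fcm_objective` and `rffo_centroids`, the number of candidates `n_drops`, and the scalar values used for the gradient (`slope`) and the water absorbency (`wp`) are hypothetical, and a lower value of the fitness (6) is treated as better because (1) is a cost to be minimized.

```python
import numpy as np

def fcm_objective(B, C, d=2.0):
    """Cost (1) for centers C, using the memberships given by (3)."""
    dist = np.linalg.norm(B[:, None, :] - C[None, :, :], axis=2)
    dist = np.fmax(dist, 1e-12)  # assumption: guard against zero distance
    ratio = dist[:, :, None] / dist[:, None, :]
    W = 1.0 / np.sum(ratio ** (2.0 / (d - 1.0)), axis=2)
    return float(np.sum((W ** d) * dist ** 2))

def rffo_centroids(B, K, n_drops=10, it_max=50, ac1=2.0, ac2=2.0,
                   slope=1.0, wp=100.0, seed=0):
    """Illustrative RFFO-style search for K centers (hypothetical names)."""
    B = np.asarray(B, dtype=float)
    rng = np.random.default_rng(seed)
    n_points = B.shape[0]
    # Step 1: every candidate position holds K centers drawn from the data.
    P = np.stack([B[rng.choice(n_points, size=K, replace=False)]
                  for _ in range(n_drops)])
    kc = rng.uniform(0.8, 0.95, size=(n_drops, 1, 1))  # hydraulic conductivity
    Y = (kc * slope / wp) * np.ones_like(P)            # initial velocity, (9)
    # Step 2: fitness (6) of every candidate.
    fit = np.array([fcm_objective(B, p) / K for p in P])
    Db, Db_fit = P.copy(), fit.copy()                  # personal best (Db)
    g = int(fit.argmin())
    Fb, Fb_fit = P[g].copy(), float(fit[g])            # global best (Fb)
    history = [Fb_fit]
    for _ in range(it_max):
        # Step 3: velocity update (8), then position update (7).
        r1, r2 = rng.random(P.shape), rng.random(P.shape)
        Y = Y + ac1 * r1 * (Db - P) + ac2 * r2 * (Fb - P)
        P = P + Y
        fit = np.array([fcm_objective(B, p) / K for p in P])
        improved = fit < Db_fit
        Db[improved] = P[improved]
        Db_fit[improved] = fit[improved]
        g = int(Db_fit.argmin())
        if Db_fit[g] < Fb_fit:
            Fb, Fb_fit = Db[g].copy(), float(Db_fit[g])
        history.append(Fb_fit)
    # Step 4: the best centers found; Step 5: the loop ends at it_max.
    return Fb, Fb_fit, history
```

The returned centers can serve as the initial centroids of an FCM run in place of a random start. The sketch applies no bound to the velocity; swarm-style updates of this form are commonly clamped in practice, and whether Fmax plays that role in the source is not stated.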
4. RESEARCH METHOD
The research method is designed around two important algorithmic concepts. First, FCM clustering is used to
produce an optimum clustering solution. Then, RFFO is used to find the optimum clustering centroid. Today's
real-world applications are of various types with heterogeneous data sets. To solve these complex problems,
this paper presents a novel optimization-based clustering algorithm, the FCM based RFFO algorithm. The main
problems of medical data clustering are handling data preprocessing to find missing data, noisy data, data
inconsistency, and redundant data in data mining. The fuzzy clustering is implemented in Python 3.8.6 on the
Windows 10 operating system with an Intel Core i5 processor. For the experiments, real medical checkup data
of 300 persons were taken to predict the symptoms of heart disease. The experimental results are also
compared with existing methods, and the performance is measured by accuracy, Jaccard coefficient, and Rand
coefficient.
5. EXPERIMENTAL RESULTS AND DISCUSSION
For the medical data clustering, heart disease data were taken for experimentation to analyze and
forecast the risk factors of heart disease. The heart disease data were collected from Johnson Jims, a staff
nurse from Kuwait, based on the reference. From this, we can provide various suggestions for each type of
cluster. For low symptoms of heart disease, we can suggest eating healthy food and doing exercise. For
average risk factors, we can provide suggestions such as a food diet, a walking distance, exercises to do,
and any medicine to take. For a high risk of heart disease, we can suggest concentrating more on a food
diet, walking, regular medical checkups, and exercise. Figure 2 gives a 2-dimensional (2D) view of the
clustering results for cholesterol vs age, body mass index vs age, and glucose level vs age. Figures 2(a) to
2(c) show the 2D simulation results of FCM based RFFO for different features of the medical data, in the
sense of the early phase and ending phase of the clusters.
Figure 2(a) shows the 2D simulation result for age vs cholesterol. Here the green color shows low
symptoms of the heart disease risk factor, the blue color denotes the average risk factor of heart disease,
and the red color shows the high-risk factor of heart disease. Similarly, Figure 2(b) shows the 2D
simulation result for age vs body mass index, and Figure 2(c) shows the result for age vs glucose level.
Figure 2. 2D clustering result for (a) age vs cholesterol, (b) age vs body mass index, and (c) age vs glucose
level using FCM based RFFO
5.1. Comparative study analysis
Figures 3(a) to 3(c) show the comparative study analysis of the above performance measures with respect to
input size. The input size in the comparative study varies from 50 to 300. The analysis is based on the
accuracy measure, the Jaccard coefficient, and the Rand coefficient. When the input size is 50, the
corresponding accuracy values are computed for the existing K-means, K-harmonic means (KHM), FCM, and
K-means+RFFO and for the proposed RFFO-based FCM. Likewise, the accuracy, Jaccard coefficient, and Rand
coefficient are also calculated for the input size 300.
Figure 3. The comparative study analysis for the performance measures (a) accuracy, (b) Jaccard coefficient,
and (c) Rand coefficient based on input size
Figure 3(a) shows the comparative study analysis using the accuracy measure. At the input size 50, the
accuracy values of the existing K-means, KHM, FCM, and RFFO+K-means and of the proposed FCM based RFFO are
50.263%, 57.327%, 63.23%, 66.2387%, and 69.748%, respectively. At the input size 300, the corresponding
accuracy values are 79.545%, 80.321%, 81.532%, 90.234%, and 91.234%, respectively. Figure 3(b) shows the
comparative study analysis using the Jaccard coefficient. At the input size 50, the values for K-means, KHM,
FCM, RFFO+K-means, and the proposed FCM based RFFO are 32.123%, 40.437%, 46.438%, 70.297%, and 72.748%,
respectively; at the input size 300, they are 63.09%, 73.555%, 77.3825%, 90.626%, and 91.614%, respectively.
Figure 3(c) shows the comparative study analysis using the Rand coefficient. At the input size 50, the
values for K-means, KHM, FCM, RFFO+K-means, and the proposed FCM based RFFO are 47.231%, 48.487%, 53.208%,
65.767%, and 68.101%, respectively; at the input size 300, they are 71.653%, 71.985%, 76.326%, 90.534%, and
91.767%, respectively.
5.2. Performance metric
The performance of the hybrid FCM based RFFO clustering algorithm is measured by accuracy, Rand
coefficient, and Jaccard coefficient, which are defined in this section. Accuracy measures data quality and
is calculated from the true positives ACTp, true negatives ACTn, false positives ACFp, and false negatives
ACFn;
$$\mathrm{Accuracy} = \frac{AC_{Tp} + AC_{Tn}}{AC_{Tp} + AC_{Tn} + AC_{Fp} + AC_{Fn}} \qquad (11)$$
Here ACTp, ACTn, ACFp, and ACFn are the parameters.
The Jaccard coefficient is used to measure similarity by comparing two data clusters;
$$\mathrm{Jacc}(U, V) = \frac{|U \cap V|}{|U \cup V|} \qquad (12)$$
Here U and V are two different clusters.
The Rand coefficient is the third performance measure, which gives the ratio of correct decisions. It
estimates the correctly clustered pairs, and the equation to compute the Rand coefficient is,
$$\text{Rand coefficient} = \frac{\text{Correct similar pairs} + \text{Correct dissimilar pairs}}{\text{Total possible pairs}} \qquad (13)$$
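The three measures (11) to (13) are short enough to write out directly. The following is a minimal sketch; the function names and the toy inputs in the usage lines are illustrative assumptions, and the sets passed to `jaccard` are assumed not to be both empty.

```python
from itertools import combinations

def accuracy(tp, tn, fp, fn):
    """Equation (11): share of correct decisions among all decisions."""
    return (tp + tn) / (tp + tn + fp + fn)

def jaccard(u, v):
    """Equation (12): size of the intersection over size of the union."""
    u, v = set(u), set(v)
    return len(u & v) / len(u | v)

def rand_coefficient(true_labels, pred_labels):
    """Equation (13): pairs on which the two labelings agree, over all pairs."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Toy usage with made-up inputs (not data from the paper).
print(accuracy(40, 45, 5, 10))                          # 0.85
print(jaccard({1, 2, 3}, {2, 3, 4}))                    # 0.5
print(rand_coefficient([0, 0, 1, 1], [0, 1, 1, 1]))     # 0.5
```

A pair counts as a correct similar pair when both labelings put its two items in the same cluster, and as a correct dissimilar pair when both labelings separate them, so relabeling the clusters does not change the Rand coefficient.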
5.3. Comparative analysis table based on performance measure
Table 1 analyzes the above three performance measures. The maximal accuracy, Jaccard coefficient, and Rand
coefficient for the proposed FCM based RFFO are 91.234%, 89.614%, and 92.767%, respectively. The maximal
accuracy of 91.234% is obtained by the proposed RFFO-based FCM, whereas the accuracy values of the existing
K-means, KHM, FCM, and K-means based RFFO are 79.545%, 80.321%, 81.534%, and 90.166%, respectively.
Likewise, the maximal Jaccard coefficient and Rand coefficient for the input size are also given in
Table 1.
Table 1. Comparative analysis
Input        Comparative metrics        K-means   KHM      FCM      KM+RFFO   FCM+RFFO
Input size   Accuracy (%)               79.545    80.321   81.534   90.166    91.234
             Jaccard coefficient (%)    63.09     73.555   77.382   87.626    89.614
             Rand coefficient (%)       71.653    71.985   76.326   90.534    92.767
6. CONCLUSION
Thus, this paper proposed an optimization-based clustering algorithm, the hybrid RFFO based FCM
clustering algorithm, for medical data. Heart disease-based medical data were taken, and the model was
presented with optimal data clustering. The final data clustering was done by the FCM based RFFO algorithm.
The proposed FCM based RFFO algorithm achieved a maximal accuracy of 91.234%, a Jaccard coefficient of
89.614%, and a Rand coefficient of 92.767%. The main advantage of this hybrid optimization-based clustering
algorithm is that it combines the strengths of both algorithms: the fast convergence of the traditional FCM
clustering algorithm and the better centroid produced by the optimization-based RFFO method. Therefore, this
hybrid algorithm can avoid premature convergence and can also produce the optimum centroid. In the future,
this model can be extended with multi-objective functions for a more effective and better clustering
centroid. This will help doctors to take proper decisions when faced with immense needs and huge data sizes.
REFERENCES
[1] G. F. Gomes, S. S. da Cunha, and A. C. Ancelotti, “A sunflower optimization (SFO) algorithm applied to damage identification on
laminated composite plates,” Engineering with Computers, vol. 35, no. 2, pp. 619–626, May 2019, doi: 10.1007/s00366-018-0620-8.
[2] D. Binu and B. S. Kariyappa, “RideNN: a new rider optimization algorithm-based neural network for fault diagnosis in analog
circuits,” IEEE Transactions on Instrumentation and Measurement, vol. 68, no. 1, pp. 2–26, Jan. 2019, doi:
10.1109/TIM.2018.2836058.
[3] A. N. Jadhav and N. Gomathi, “DIGWO: hybridization of dragonfly algorithm with improved grey wolf optimization algorithm
for data clustering,” Multimedia Research, vol. 2, no. 3, pp. 1–11, Jul. 2019, doi: 10.46253/j.mr.v2i3.a1.
[4] N. Zemmal et al., “Particle swarm optimization-based swarm intelligence for active learning improvement: application on medical
data classification,” Cognitive Computation, vol. 12, no. 5, pp. 991–1010, Aug. 2020, doi: 10.1007/s12559-020-09739-z.
[5] B. Gao, X. Li, W. L. Woo, and G. Y. Tian, “Physics-based image segmentation using first order statistical properties and genetic
algorithm for inductive thermography imaging,” IEEE Transactions on Image Processing, vol. 27, no. 5, pp. 2160–2175, May
2018, doi: 10.1109/TIP.2017.2783627.
[6] S. N. Ghazavi and T. W. Liao, “Medical data mining by fuzzy modeling with selected features,” Artificial Intelligence in
Medicine, vol. 43, no. 3, pp. 195–206, Jul. 2008, doi: 10.1016/j.artmed.2008.04.004.
[7] A. Jaya Mabel Rani and A. Pravin, “Multi-objective hybrid fuzzified PSO and fuzzy C-means algorithm for clustering CDR
data,” in Proceedings of the 2019 IEEE International Conference on Communication and Signal Processing, ICCSP 2019, Apr.
2019, pp. 94–98, doi: 10.1109/ICCSP.2019.8698080.
[8] J. Han, M. Kamber, and J. Pei, Data mining: concepts and techniques, 3rd Ed. Elsevier Inc., 2012.
[9] X. Li and N. Ye, “A supervised clustering and classification algorithm for mining data with mixed variables,” IEEE Transactions
on Systems, Man, and Cybernetics Part A: Systems and Humans, vol. 36, no. 2, pp. 396–406, Mar. 2006, doi:
10.1109/TSMCA.2005.853501.
[10] B. S. Chandana, K. Srinivas, and R. K. Kumar, “Clustering algorithm combined with hill climbing for classification of remote
sensing image,” International Journal of Electrical and Computer Engineering (IJECE), vol. 4, no. 6, pp. 923–930, Dec. 2014,
doi: 10.11591/ijece.v4i6.6608.
[11] Y. T. Quek, W. L. Woo, and L. Thillainathan, “IoT load classification and anomaly warning in ELV DC picogrids using
hierarchical extended k-nearest neighbors,” IEEE Internet of Things Journal, vol. 7, no. 2, pp. 863–873, Feb. 2020, doi:
10.1109/JIOT.2019.2945425.
[12] G. Parthasarathy and D. C. Tomar, “A novel approach for mining frequent itemsets in medical image databases,” International
Journal of Pharmacy and Technology, vol. 8, no. 3, pp. 18126–18135, Sep. 2016.
[13] Y. Liu, K. Xiao, A. Liang, and H. Guan, “Fuzzy C-means clustering with bilateral filtering for medical image segmentation,” in
Proceeding of International Conference on Hybrid Artificial Intelligence Systems (HAIS 2012), vol. 7208 LNAI, no. PART 1,
Springer Berlin Heidelberg, 2012, pp. 221–230.
[14] A. Jaya Mabel Rani and A. Pravin, “Rainfall flow optimization based K-means clustering for medical data,” Concurrency and
Computation: Practice and Experience, vol. 33, no. 17, Sep. 2021, doi: 10.1002/cpe.6308.
[15] Y. Qin, “Urban flooding mitigation techniques: a systematic review and future studies,” Water (Switzerland), vol. 12, no. 12, p.
3579, Dec. 2020, doi: 10.3390/w12123579.
[16] A. K. Saini, S. S. Chauhan, and A. Tiwari, “Creeping flow of Jeffrey fluid through a swarm of porous cylindrical particles:
Brinkman–Forchheimer model,” International Journal of Multiphase Flow, vol. 145, 2021, doi:
10.1016/j.ijmultiphaseflow.2021.103803.
[17] H. Ma and D. Simon, “Analysis of migration models of biogeography-based optimization using Markov theory,” Engineering
Applications of Artificial Intelligence, vol. 24, no. 6, pp. 1052–1060, Sep. 2011, doi: 10.1016/j.engappai.2011.04.012.
[18] J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proceedings of ICNN’95 - International Conference on Neural
Networks, 1995, pp. 1942–1948, doi: 10.1109/icnn.1995.488968.
[19] Y. Zhong, S. Zhang, and L. Zhang, “Automatic fuzzy clustering based on adaptive multi-objective differential evolution for
remote sensing imagery,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6, no. 5, pp.
2290–2301, Oct. 2013, doi: 10.1109/JSTARS.2013.2240655.
[20] U. R. Yelipe, S. Porika, and M. Golla, “An efficient approach for imputation and classification of medical data values using class-
based clustering of medical records,” Computers and Electrical Engineering, vol. 66, pp. 487–504, Feb. 2018, doi:
10.1016/j.compeleceng.2017.11.030.
[21] P. Das, D. K. Das, and S. Dey, “A modified bee colony optimization (MBCO) and its hybridization with K-means for an
application to data clustering,” Applied Soft Computing Journal, vol. 70, pp. 590–603, Sep. 2018, doi:
10.1016/j.asoc.2018.05.045.
[22] R. Chauhan, N. Kumar, and R. Rekapally, “Predictive data analytics technique for optimization of medical databases,” in
Advances in Intelligent Systems and Computing, vol. 742, Springer Singapore, 2019, pp. 433–441.
[23] J. Yu, H. Li, and D. Liu, “Modified immune evolutionary algorithm for medical data clustering and feature extraction under cloud
computing environment,” Journal of Healthcare Engineering, vol. 2020, pp. 1–11, Jan. 2020,
doi: 10.1155/2020/1051394.
[24] A. Al-Shammari, R. Zhou, M. Naseriparsa, and C. Liu, “An effective density-based clustering and dynamic maintenance
framework for evolving medical data streams,” International Journal of Medical Informatics, vol. 126, pp. 176–186, Jun. 2019,
doi: 10.1016/j.ijmedinf.2019.03.016.
[25] V. Kumar, “Implementation of data mining techniques for information retrieval,” National University of Science and Technology
“NSIT-MISIS,” Jaipur, India, 2018.
[26] Y. Zhang, Y. Du, X. Li, S. Fang, and F. Ling, “Unsupervised subpixel mapping of remotely sensed imagery based on fuzzy C-
means clustering approach,” IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 5, pp. 1024–1028, May 2014, doi:
10.1109/LGRS.2013.2285404.
BIOGRAPHIES OF AUTHORS
Antony Jaya Mabel Rani received the M.E. degree in Computer Science and
Engineering from Rajarajeshwari Engineering College, Anna University, Chennai, India in 2007.
She is pursuing her Ph.D. in Computer Science and Engineering at Sathyabama University,
Chennai, India. She has 18 years of teaching experience. She has presented many research
papers at international and national conferences and has published papers in international
and national journals. Her areas of interest include artificial intelligence, data
mining, machine learning, deep learning, and big data. She can be contacted at email:
ajayamabelrani@gmail.com.
Dr. C. Srivenkateswaran received the B.E. degree in Electronics & Communication
Engineering from University of Madras, Chennai, India in 1996, the M.E. degree in Computer and
Communication Engineering from Anna University, Chennai, India in 2009, and the Ph.D. degree in
Information and Communication Engineering from Anna University, Chennai, India in 2019. He
currently works as an Associate Professor in the Department of Computer Science and
Engineering at Kings Engineering College, Chennai, and he has 23 years of teaching and industry
experience. He has presented many research papers at international and national conferences
and has published papers in international and national journals. His areas
of interest include cyber security, ethical hacking, and big data. He can be contacted at email:
srivenkateswaran@kingsedu.ac.in.
Dr. M. Rajasekar received the B.E. degree in Computer Science and Engineering
from Sardhar Raja College Engineering, Anna University, Chennai, India in 2007, the M.E. degree in
Computer Science and Engineering from Veltech Engineering College, Anna University, Chennai,
India in 2010, and the Ph.D. degree in Information and Communication Engineering from Anna
University, Chennai, India in 2021. He currently works as an Associate Professor in the
Department of Computer Science and Engineering at Institute of Computer Science and
Engineering, Saveetha School of Engineering, SIMATS, Chennai, and he has 13 years of
teaching experience. He has presented many research papers at international
and national conferences and has published many papers in international and national journals.
His areas of interest include image processing, artificial intelligence, machine learning, and
networks. He can be contacted at email: mrajasekarcse@gmail.com.
Mr. M. Arun received his M.E. degree in Applied Electronics from College of
Engineering Guindy, Anna University, Chennai. He is pursuing his Ph.D. in the antenna domain at
Sathyabama Institute of Science and Technology, India. He is working as an Assistant Professor of
ECE at Panimalar Institute of Technology. He is also serving as Executive Committee member of
IETE Chennai Centre, as Vice Chairman of IEEE Madras YP and IEEE TEMS
society, as Secretary of the IEEE EMC society, and as Treasurer of the IEEE COMSOC Madras Chapter.
He has 13 years of experience in teaching. He
received the IEEE Outstanding Student Branch Counselor and Branch Chapter Advisor Award from
IEEE MGA for the year 2019. He also received the best student branch counselor award for the year
2015-20 from IEEE Madras Section. He can be contacted at email: arunmemba@ieee.org.