In this paper, we propose a new approach to document clustering using the K-Means algorithm. K-Means is sensitive to the random selection of the k cluster centroids in the initialization phase. To improve the quality of K-Means clustering, we propose to model the text document clustering problem as the maximum stable set problem (MSSP) and to use a continuous Hopfield network to solve the MSSP and obtain the initial centroids. The idea is inspired by the fact that MSSP and clustering share the same principle: MSSP consists of finding the largest set of completely disconnected nodes in a graph, and in clustering, all objects are divided into disjoint clusters. Simulation results demonstrate that the proposed K-Means improved by MSSP (KM_MSSP) is efficient on large data sets, is much better optimized in terms of time, and provides better clustering quality than other methods.
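The initialization sensitivity described in the abstract can be seen in a minimal k-means sketch (plain NumPy; an illustration of standard Lloyd iteration, not the paper's KM_MSSP implementation): the final partition is entirely a function of the initial centroids passed in.

```python
import numpy as np

def kmeans(X, init_centroids, max_iter=100):
    """Plain Lloyd iteration; the result depends entirely on init_centroids."""
    C = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(max_iter):
        # assign each point to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute centroids; keep the old one if a cluster empties
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(len(C))])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels, C

# Two well-separated blobs; seeding one centroid per blob recovers them.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.2],
              [5.0, 5.0], [5.2, 5.1], [5.1, 4.9]])
labels, C = kmeans(X, init_centroids=[[0.0, 0.0], [5.0, 5.0]])
```

In general, different seeds can converge to different local optima; a deliberate seeding such as the MSSP-derived centroids aims to avoid that degradation and also makes every run reproducible.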
A Novel Multi-Viewpoint based Similarity Measure for Document Clustering (IJMER)
International Journal of Modern Engineering Research (IJMER) is a peer-reviewed, online journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, and Assessment, among many others.
A survey on Efficient Enhanced K-Means Clustering Algorithm (ijsrd.com)
Data mining is the process of using technology to identify patterns and prospects from large amounts of information. In data mining, clustering is an important research topic with a wide range of unsupervised classification applications. Clustering is a technique which divides data into meaningful groups. K-means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. In this paper, we present a comparison of different K-means clustering algorithms.
Textual Data Partitioning with Relationship and Discriminative Analysis (IJMTER)
Data partitioning methods are used to partition data values by similarity, and similarity measures are used to estimate transaction relationships. Hierarchical clustering models produce tree-structured results, while partitional clustering produces results in grid format. Text documents are unstructured data values with high-dimensional attributes. Document clustering groups unlabeled text documents into meaningful clusters. Traditional clustering methods require the cluster count (K) for the document grouping process, and clustering accuracy degrades drastically with an unsuitable cluster count.
Textual data elements are divided into two types: discriminative words and nondiscriminative words. Only discriminative words are useful for grouping documents; the involvement of nondiscriminative words confuses the clustering process and leads to a poor clustering solution. A variational inference algorithm is used to infer the document collection structure and the partition of document words at the same time. The Dirichlet Process Mixture (DPM) model is used to partition documents; it uses both the data likelihood and the clustering property of the Dirichlet Process (DP). The Dirichlet Process Mixture Model for Feature Partition (DPMFP) is used to discover the latent cluster structure based on the DPM model. DPMFP clustering is performed without requiring the number of clusters as input.
Document labels are used to estimate the discriminative word identification process. Concept relationships are analyzed with ontology support, and a semantic weight model is used for the document similarity analysis. The system improves scalability with the support of labels and concept relations for the dimensionality reduction process.
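The DP clustering property that DPMFP relies on, namely that the number of clusters is not fixed in advance, can be illustrated with a toy Chinese-restaurant-process draw (a generic sketch of the Dirichlet process prior, not the paper's variational inference):

```python
import numpy as np

def crp_partition(n, alpha, seed=0):
    """Chinese-restaurant-process draw: the clustering prior underlying the
    Dirichlet process. Each new item joins an existing cluster with probability
    proportional to its size, or opens a new cluster with weight alpha."""
    rng = np.random.default_rng(seed)
    labels = [0]
    for i in range(1, n):
        counts = np.bincount(labels)
        probs = np.append(counts, alpha) / (i + alpha)
        labels.append(int(rng.choice(len(probs), p=probs)))
    return labels

labels = crp_partition(50, alpha=1.0)
```

Larger `alpha` favors more clusters a priori; DPMFP inherits this ability to let the data, rather than an input K, determine the number of clusters.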
Particle Swarm Optimization based K-Prototype Clustering Algorithm (IOSR-JCE)
IOSR Journal of Computer Engineering (IOSR-JCE) is a double-blind peer-reviewed international journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publication of high-quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high-quality technical notes are invited for publication.
TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL... (kevig)
The tremendous increase in the amount of available research documents impels researchers to propose topic models to extract the latent semantic themes of a document collection. However, how to extract the hidden topics of a document collection has become a crucial task for many topic model applications. Moreover, conventional topic modeling approaches suffer from a scalability problem as the size of the document collection increases. In this paper, the Correlated Topic Model (CTM) with a variational Expectation-Maximization algorithm is implemented in the MapReduce framework to solve the scalability problem. The proposed approach utilizes a dataset crawled from a public digital library. In addition, the full texts of the crawled documents are analysed to enhance the accuracy of MapReduce CTM. Experiments are conducted to demonstrate the performance of the proposed algorithm. From the evaluation, the proposed approach has comparable performance, in terms of topic coherence, with LDA implemented in the MapReduce framework.
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING (ijcsa)
Text document clustering is one of the fastest growing research areas because of the availability of a huge amount of information in electronic form. A number of techniques have been launched for clustering documents in such a way that documents within a cluster have high intra-similarity and low inter-similarity to other clusters. Many document clustering algorithms provide localized search in effectively navigating, summarizing, and organizing information. A globally optimal solution can be obtained by applying high-speed, high-quality optimization algorithms, which perform a globalized search in the entire solution space. In this paper, a brief survey of optimization approaches to text document clustering is carried out.
Mine Blood Donors Information through Improved K-Means Clustering (ijcsity)
The number of accidents and health diseases, which are increasing at an alarming rate, is resulting in a huge increase in the demand for blood. There is a necessity for organized analysis of blood donor databases and blood bank repositories. Clustering analysis is one of the data mining applications, and the K-means clustering algorithm is fundamental to modern clustering techniques. K-means is a traditional, iterative algorithm: at every iteration, it computes the distance from the centroid of each cluster to each and every data point. This paper improves the original k-means algorithm by choosing the initial centroids according to the distribution of the data. Results and discussions show that the improved K-means algorithm produces accurate clusters in less computation time when mining donor information.
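One common way to derive initial centroids from the distribution of the data (a sketch of a published heuristic of this family; the paper's exact procedure may differ) is to sort the points by their distance from the origin, split the sorted list into k equal parts, and seed each cluster with a part's mean:

```python
import numpy as np

def distribution_based_centroids(X, k):
    """Deterministic seeding heuristic (an assumption here, not necessarily
    the paper's exact method): sort points by distance from the origin,
    split the sorted data into k equal parts, and use each part's mean."""
    order = np.argsort(np.linalg.norm(X, axis=1))
    parts = np.array_split(X[order], k)
    return np.array([p.mean(axis=0) for p in parts])

# Two clumps of 1-D donor-attribute values; each seed lands in one clump.
X = np.array([[1.0], [1.1], [0.9], [8.0], [8.2], [7.9]])
centroids = distribution_based_centroids(X, k=2)
```

Because the seeds follow the spread of the data rather than a random draw, k-means started from them typically needs fewer iterations, which matches the speed-up the abstract reports.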
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis... (ijsrd.com)
A cluster is a group of objects which are similar to each other within the cluster and dissimilar to the objects of other clusters. The similarity is typically calculated on the basis of the distance between two objects or clusters; two or more objects belong to the same cluster only if they are close to each other by that distance. The major objective of clustering is to discover collections of comparable objects based on a similarity metric. Fuzzy Possibilistic C-Means (FPCM) is an effective clustering algorithm for unlabeled data that produces both membership and typicality values during the clustering process. In this approach, the efficiency of Fuzzy Possibilistic C-Means clustering is enhanced by using penalized and compensated constraints based FPCM (PCFPCM). The proposed PCFPCM approach differs from conventional clustering techniques by imposing a possibilistic reasoning strategy on fuzzy clustering, with penalized and compensated constraints for updating the grades of membership and typicality. The performance of the proposed approach is evaluated on University of California, Irvine (UCI) machine learning repository datasets such as Iris, Wine, Lung Cancer and Lymphography. The parameters used for the evaluation are clustering accuracy, Mean Squared Error (MSE), execution time and convergence behavior.
Scaling Down Dimensions and Feature Extraction in Document Repository Classif... (ijdmtaiir)
In this study a comprehensive evaluation of two supervised feature selection methods for dimensionality reduction is performed: Latent Semantic Indexing (LSI) and Principal Component Analysis (PCA). This is gauged against unsupervised techniques such as fuzzy feature clustering using hard fuzzy C-means (FCM). The main objective of the study is to estimate the relative efficiency of the two supervised techniques against the unsupervised fuzzy technique while reducing the feature space. It is found that clustering using FCM leads to better accuracy in classifying documents than algorithms like LSI and PCA. Results show that the clustering of features improves the accuracy of document classification.
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION (cscpconf)
Feature clustering is a powerful method to reduce the dimensionality of feature vectors for text classification. In this paper, fast fuzzy feature clustering for text classification is proposed. It is based on the framework proposed by Jung-Yi Jiang, Ren-Jia Liou and Shie-Jue Lee in 2011. Each word in the feature vector of a document is grouped into a cluster in fewer iterations: the number of iterations required to obtain cluster centers is reduced by transforming the cluster center dimension from n dimensions to 2 dimensions, using Principal Component Analysis with a slight change for the dimension reduction. Experimental results show that this method improves performance by significantly reducing the number of iterations required to obtain the cluster centers. The same is verified with three benchmark datasets.
Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model (Waqas Tariq)
A novel clustering algorithm, CSHARP, is presented for the purpose of finding clusters of arbitrary shapes and arbitrary densities in high-dimensional feature spaces. It can be considered a variation of the Shared Nearest Neighbor algorithm (SNN), in which each sample data point votes for the points in its k-nearest neighborhood. Sets of points sharing a common mutual nearest neighbor are considered dense regions/blocks. These blocks are the seeds from which clusters may grow; therefore, CSHARP is not a point-to-point clustering algorithm but a block-to-block clustering technique. Much of its advantage comes from these facts: noise points and outliers correspond to blocks of small sizes, and homogeneous blocks highly overlap. The technique is not prone to merging clusters of different densities or different homogeneity. The algorithm has been applied to a variety of low- and high-dimensional data sets with superior results over existing techniques such as DBScan, K-means, Chameleon, Mitosis and Spectral Clustering. The quality of its results, as well as its time complexity, rank it at the front of these techniques.
Comparison of methods for combination of multiple classifiers that predict b... (IJERA)
Predictive analytics includes techniques from data mining that analyze current and historical data and make predictions about the future. It is used in actuarial science, financial services, retail, travel, healthcare, insurance, pharmaceuticals, marketing, telecommunications and other fields. Predicting patterns can be considered a classification problem, and combining different classifiers gives better results. We study and compare three methods used to combine multiple classifiers. Naïve Bayesian networks perform classification based on conditional probability; they are simple and easy to interpret, as they assume that the predictors are independent. Tree augmented naïve Bayes (TAN) constructs a maximum weighted spanning tree that maximizes the likelihood of the training data to perform classification; this tree structure eliminates the independence assumption of naïve Bayesian networks. The behavior-knowledge space method works in two phases and can provide very good performance if large and representative data sets are available.
K-Means clustering uses an iterative procedure which is very sensitive to, and dependent upon, the initial centroids. The initial centroids in k-means clustering are chosen randomly, and hence the clustering also changes with respect to the initial centroids. This paper tries to overcome this problem of random selection of centroids, and the consequent change of clusters, with a premeditated selection of initial centroids. We have used the iris, abalone and wine data sets to demonstrate that the proposed method of finding the initial centroids and using them in the k-means algorithm improves clustering performance. The clustering also remains the same in every run, as the initial centroids are not selected randomly but through the premeditated method.
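As a sketch of what a premeditated (deterministic) seeding can look like (an illustrative choice, not necessarily this paper's method), a farthest-first traversal always returns the same centroids for the same data, so the resulting clustering is identical in every run:

```python
import numpy as np

def farthest_first_centroids(X, k):
    """Deterministic seeding sketch: start from the point closest to the data
    mean, then repeatedly add the point farthest from all centroids chosen so
    far. No randomness, so repeated runs give identical seeds."""
    mean = X.mean(axis=0)
    idx = [int(np.argmin(np.linalg.norm(X - mean, axis=1)))]
    while len(idx) < k:
        d = np.min(np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2), axis=1)
        idx.append(int(np.argmax(d)))
    return X[idx]

X = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.1, 0.0]])
seeds = farthest_first_centroids(X, k=2)
```

Feeding such seeds to k-means removes the run-to-run variation the abstract describes, since the only remaining source of change is the data itself.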
Experimental study of Data clustering using k-Means and modified algorithms (IJDKP)
The k-Means clustering algorithm is an old algorithm that has been intensely researched owing to its ease and simplicity of implementation. Clustering algorithms have broad attraction and usefulness in exploratory data analysis. This paper presents results of an experimental study of different approaches to k-Means clustering, comparing results on different datasets using the original k-Means and other modified algorithms implemented using MATLAB R2009b. The results are calculated on performance measures such as the number of iterations, number of points misclassified, accuracy, Silhouette validity index and execution time.
A PSO-Based Subtractive Data Clustering Algorithm (IJORCS)
There is a tremendous proliferation in the amount of information available on the largest shared information source, the World Wide Web. Fast, high-quality clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize this information. Recent studies have shown that partitional clustering algorithms such as the k-means algorithm are the most popular algorithms for clustering large datasets. The major problem with partitional clustering algorithms is that they are sensitive to the selection of the initial partitions and are prone to premature convergence to local optima. Subtractive clustering is a fast, one-pass algorithm for estimating the number of clusters and the cluster centers for any given set of data; the cluster estimates can be used to initialize iterative optimization-based clustering methods and model identification methods. In this paper, we present a hybrid Subtractive + PSO (Particle Swarm Optimization) clustering algorithm that performs fast clustering. For comparison purposes, we applied the Subtractive + PSO, PSO, and Subtractive clustering algorithms on three different datasets. The results illustrate that the Subtractive + PSO clustering algorithm generates the most compact clustering results compared to the other algorithms.
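Subtractive clustering itself is compact enough to sketch (following Chiu's standard formulation; the radii `ra` and `rb` are the usual neighbourhood parameters, chosen here only for illustration): each point's potential is a sum of Gaussians over its neighbours, the highest-potential point becomes a center, and its influence is subtracted before picking the next.

```python
import numpy as np

def subtractive_centers(X, n_centers, ra=1.0):
    """Chiu-style subtractive clustering: one pass over pairwise distances,
    then greedy selection of high-potential points as cluster centers."""
    rb = 1.5 * ra                                   # revised (wider) radius
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    P = np.exp(-4.0 * d2 / ra**2).sum(axis=1)       # potential of each point
    centers = []
    for _ in range(n_centers):
        i = int(np.argmax(P))
        centers.append(X[i])
        # subtract the chosen center's influence from every potential
        P = P - P[i] * np.exp(-4.0 * np.sum((X - X[i]) ** 2, axis=1) / rb**2)
    return np.array(centers)

X = np.array([[0.0], [0.1], [0.05], [5.0], [5.1]])
centers = subtractive_centers(X, n_centers=2)
```

The centers found this way can then seed PSO or k-means, which is exactly the role subtractive clustering plays in the hybrid algorithm above.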
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING: FINDING ALL THE POTENTIAL MI... (IJDKP)
Quantum clustering (QC) is a data clustering algorithm based on quantum mechanics, accomplished by substituting each point in a given dataset with a Gaussian. The width of the Gaussian is a σ value, a hyper-parameter which can be manually defined and manipulated to suit the application. Numerical methods are used to find all the minima of the quantum potential, as they correspond to cluster centers. Herein, we investigate the mathematical task of expressing and finding all the roots of the exponential polynomial corresponding to the minima of a two-dimensional quantum potential. This is an outstanding task because normally such expressions are impossible to solve analytically. However, we prove that if the points are all included in a square region of size σ, there is only one minimum. This bound is not only useful for bounding the number of solutions to look for by numerical means; it also allows us to propose a new numerical approach "per block". This technique decreases the number of particles by approximating some groups of particles with weighted particles. These findings are useful not only for the quantum clustering problem but also for the exponential polynomials encountered in quantum chemistry, solid-state physics and other applications.
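The objects involved can be sketched numerically: the quantum potential below follows Horn and Gottlieb's standard formulation (a 1-D illustration, not the paper's 2-D analysis), and the toy data illustrate the bound that points confined to a σ-sized region yield a single minimum, while well-separated groups yield one minimum per group.

```python
import numpy as np

def quantum_potential(x_grid, data, sigma):
    """Quantum potential, up to an additive constant:
    V(x) ∝ (1 / (2σ²ψ(x))) · Σ_i ||x − x_i||² · exp(−||x − x_i||² / (2σ²)),
    where ψ is the Parzen-window sum of Gaussians. Its minima are the
    cluster centers."""
    d2 = (x_grid[:, None] - data[None, :]) ** 2
    g = np.exp(-d2 / (2 * sigma**2))
    psi = g.sum(axis=1)
    return (d2 * g).sum(axis=1) / (2 * sigma**2 * psi)

def count_local_minima(v):
    """Strict interior local minima on a uniform grid."""
    return int(np.sum((v[1:-1] < v[:-2]) & (v[1:-1] < v[2:])))

grid = np.linspace(-2.0, 7.0, 901)
far = np.array([0.0, 0.1, 5.0, 5.1])    # two groups, 10σ apart
near = np.array([0.0, 0.1, 0.2, 0.3])   # all inside a σ-sized region
two = count_local_minima(quantum_potential(grid, far, sigma=0.5))
one = count_local_minima(quantum_potential(grid, near, sigma=0.5))
```

Grid search is only feasible in low dimension; the paper's "per block" approach replaces groups of nearby particles with weighted particles to keep the minima search tractable.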
Novel text categorization by amalgamation of augmented k nearest neighbourhoo... (ijcsity)
Machine learning for text classification is the underpinning of document cataloging, news filtering, document steering and exemplification. In the text mining realm, effective feature selection is significant to make the learning task more accurate and competent. One of the traditional lazy text classifiers, k-Nearest Neighborhood (kNN), has a major pitfall in calculating the similarity between all the objects in the training and testing sets, which leads to exaggeration of both the computational complexity of the algorithm and massive consumption of main memory. To diminish these shortcomings from the viewpoint of a data-mining practitioner, an amalgamative technique is proposed in this paper using a novel restructured version of kNN called Augmented kNN (AkNN) and k-Medoids (kMdd) clustering. The proposed work preprocesses the initial training set by imposing attribute feature selection for reduction of high dimensionality; it also detects and excludes the high-flier samples in the initial training set and restructures a constricted training set. The kMdd clustering algorithm generates the cluster centers (as interior objects) for each category and restructures the constricted training set with centroids. This technique is amalgamated with the AkNN classifier, which was prearranged with text mining similarity measures. Eventually, significant weights and ranks were assigned to each object in the new training set based upon their proximity to the objects in the testing set. Experiments conducted on Reuters-21578, a UCI benchmark text mining data set, and comparisons with the traditional kNN classifier designate that the referred method yields preeminent results in both clustering and classification.
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
Soft Computing Techniques Based Image Classification using Support Vector Mac... (ijtsrd)
In this paper we compare different kernels that have been developed for support vector machine based time series classification. Despite the good performance of the Support Vector Machine (SVM) on many concrete classification problems, the algorithm is not directly applicable to multi-dimensional trajectories having different measurements. Training SVMs with indefinite kernels has recently attracted consideration in the machine learning community, partly due to the fact that many similarity functions that arise in practice are not symmetric positive semidefinite. In this paper, the Gaussian RBF kernel is extended to a Gaussian elastic metric kernel, in two variants: time warp distance and its real penalty. Experimental results on 17 time series datasets show that, in terms of classification accuracy, SVM with the Gaussian elastic metric kernel is superior to other kernels and to state-of-the-art similarity measure methods. We use the indefinite similarity function or distance directly, without any conversion, and hence both training and test examples are always treated consistently. Finally, the Gaussian elastic metric kernel achieves the highest accuracy among all methods that train SVM with kernels, positive semidefinite (PSD) and non-PSD, with statistically significant evidence, while also retaining sparsity of the support vector set. Tarun Jaiswal | Dr. S. Jaiswal | Dr. Ragini Shukla, "Soft Computing Techniques Based Image Classification using Support Vector Machine Performance", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-3, April 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23437.pdf
Paper URL: https://www.ijtsrd.com/computer-science/artificial-intelligence/23437/soft-computing-techniques-based-image-classification-using-support-vector-machine-performance/tarun-jaiswal
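One common way to build such an elastic kernel (a sketch of the general construction; the paper's exact kernel definitions may differ) is to substitute the dynamic time warping distance for the Euclidean distance inside a Gaussian RBF. The resulting kernel is in general not positive semidefinite, which is exactly the indefinite-kernel setting discussed above.

```python
import numpy as np

def dtw(a, b):
    """Classic dynamic-time-warping distance with squared local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def gaussian_dtw_kernel(a, b, sigma=1.0):
    """Gaussian elastic kernel sketch: RBF with DTW in place of Euclidean."""
    return np.exp(-dtw(a, b) / (2 * sigma**2))

x = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
y = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0])  # same shape, shifted in time
k_val = gaussian_dtw_kernel(x, y)  # → 1.0: DTW absorbs the time shift
```

A plain Euclidean RBF cannot even be evaluated on these two series (they have different lengths), which is why elastic distances are attractive for time series classification.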
In the recent machine learning community, there is a trend of constructing a nonlinear version of a linear algorithm through the 'kernel method', for example kernel principal component analysis, kernel Fisher discriminant analysis, Support Vector Machines (SVMs), and the current kernel clustering algorithms. Typically, in unsupervised clustering algorithms utilizing the kernel method, a nonlinear mapping is first applied to map the data into a much higher-dimensional feature space, and then clustering is executed there. A hitch of these kernel clustering algorithms is that the clustering prototypes reside in the high-dimensional feature space and therefore lack intuitive, clear descriptions unless an added approximate projection from the feature space back to the data space is executed, as in the literature presented. This paper utilizes the 'kernel method' differently: a novel clustering algorithm, founded on the conventional fuzzy c-means algorithm (FCM), is proposed, known as the kernel fuzzy c-means algorithm (KFCM). This method embraces a novel kernel-induced metric in the data space to replace the original Euclidean norm, and the cluster prototypes still reside in the data space, so that the clustering results can be interpreted in the original space. This property is used for clustering incomplete data. Experiments illustrate that KFCM has improved clustering performance and is more robust than other variants of FCM for clustering incomplete data.
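A minimal KFCM sketch with a Gaussian kernel (an illustrative reconstruction from the standard KFCM formulation, not the paper's code) shows both properties at once: the distance is kernel-induced, d²(x, v) ∝ 1 − K(x, v), yet the prototypes remain ordinary points in the data space.

```python
import numpy as np

def kfcm(X, c, m=2.0, sigma=1.0, n_iter=50):
    """Kernel fuzzy c-means with a Gaussian kernel. The kernel-induced
    distance replaces the Euclidean norm, but the prototype update keeps
    the centers v_i in the original data space."""
    V = X[[0, -1]].astype(float).copy()          # deterministic toy init
    for _ in range(n_iter):
        K = np.exp(-((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
                   / sigma**2)                   # K[k, i] = K(x_k, v_i)
        d2 = np.maximum(1.0 - K, 1e-12)          # ∝ kernel-induced distance²
        U = (1.0 / d2) ** (1.0 / (m - 1))
        U /= U.sum(axis=1, keepdims=True)        # fuzzy memberships
        W = (U ** m) * K
        V = (W.T @ X) / W.sum(axis=0)[:, None]   # prototypes stay in data space
    return U, V

X = np.array([[0.0], [0.1], [0.2], [4.0], [4.1], [4.2]])
U, V = kfcm(X, c=2)
```

Because `V` lives in the same space as `X`, the resulting prototypes can be read directly as representative data points, which is the interpretability property the abstract emphasizes.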
Development of depth map from stereo images using sum of absolute differences... (nooriasukmaningtyas)
This article proposes a framework for depth map reconstruction using stereo images. Fundamentally, this map provides important information which is commonly used in essential applications such as autonomous vehicle navigation, drone navigation and 3D surface reconstruction. To develop an accurate depth map, the framework must be robust against the challenging regions of low texture, plain color and repetitive pattern in the input stereo image. The development of this map requires several stages, starting with matching cost calculation, followed by cost aggregation, optimization and refinement. Hence, this work develops a framework with the sum of absolute differences (SAD) and a combination of two edge-preserving filters to increase robustness against the challenging regions. The SAD convolves using a block matching technique to increase the efficiency of the matching process in the low-texture and plain-color regions, while the two edge-preserving filters increase accuracy in the repetitive-pattern regions. The results, obtained on the Middlebury standard dataset, show that the proposed method is accurate and capable of working with the challenging regions. The framework is also efficient, can be applied to 3D surface reconstruction, and is highly competitive with previously available methods.
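The matching-cost stage can be sketched as follows (a minimal winner-take-all SAD matcher; the paper's full pipeline adds cost aggregation and the two edge-preserving filters, which are omitted here):

```python
import numpy as np

def sad_disparity(left, right, block=3, max_disp=8):
    """For each left-image pixel, slide a block along the same row of the
    right image and keep the disparity with the smallest sum of absolute
    differences (winner-take-all over the matching cost)."""
    h, w = left.shape
    r = block // 2
    disp = np.zeros((h, w), dtype=int)
    for y in range(r, h - r):
        for x in range(r, w - r):
            patch = left[y - r:y + r + 1, x - r:x + r + 1].astype(float)
            best, best_d = np.inf, 0
            for d in range(min(max_disp, x - r) + 1):
                cand = right[y - r:y + r + 1, x - d - r:x - d + r + 1].astype(float)
                sad = np.abs(patch - cand).sum()
                if sad < best:
                    best, best_d = sad, d
            disp[y, x] = best_d
    return disp

# Synthetic ramp image with a uniform true disparity of 2:
# right[y, x - 2] == left[y, x] for all non-wrapped columns.
left = np.arange(192, dtype=float).reshape(12, 16)
right = np.roll(left, -2, axis=1)
disp = sad_disparity(left, right)
```

On real stereo pairs this raw cost is noisy precisely in the low-texture and repetitive regions the abstract mentions, which is what the aggregation and edge-preserving filtering stages are there to fix.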
Model predictive controller for a retrofitted heat exchanger temperature cont... (nooriasukmaningtyas)
This paper aims to demonstrate the practical aspects of process control theory for undergraduate students at the Department of Chemical Engineering at the University of Bahrain. Both the ubiquitous proportional integral derivative (PID) controller and model predictive control (MPC), with their auxiliaries, were designed and implemented in a real-time framework. The latter was realized by retrofitting an existing plate-and-frame heat exchanger unit that had been operated using an analog PID temperature controller. The upgraded control system consists of a personal computer (PC), a low-cost signal conditioning circuit, a National Instruments USB 6008 data acquisition card, and LabVIEW software. LabVIEW control design and simulation modules were used to design and implement the PID and MPC controllers. The performance of the designed controllers was evaluated while controlling the outlet temperature of the retrofitted plate-and-frame heat exchanger. The distinguished feature of the MPC controller in handling input and output constraints was perceived in real time. From a pedagogical point of view, realizing the theory of process control through practical implementation was substantial in enhancing the students' learning and the instructor's teaching experience.
Similar to Max stable set problem to found the initial centroids in clustering problem
TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL... (kevig)
The tremendous increase in the amount of available research documents impels researchers to propose topic models to extract the latent semantic themes of a documents collection. However, how to extract the hidden topics of the documents collection has become a crucial task for many topic model applications. Moreover, conventional topic modeling approaches suffer from the scalability problem when the size of the documents collection increases. In this paper, the Correlated Topic Model with the variational Expectation-Maximization algorithm is implemented in the MapReduce framework to solve the scalability problem. The proposed approach utilizes a dataset crawled from a public digital library. In addition, the full texts of the crawled documents are analysed to enhance the accuracy of MapReduce CTM. Experiments are conducted to demonstrate the performance of the proposed algorithm. From the evaluation, the proposed approach has a comparable performance in terms of topic coherence with LDA implemented in the MapReduce framework.
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING (ijcsa)
Text document clustering is one of the fastest growing research areas because of the availability of a huge amount of information in electronic form. Several techniques have been proposed for clustering documents in such a way that documents within a cluster have high intra-similarity and low inter-similarity to other clusters. Many document clustering algorithms provide localized search for effectively navigating, summarizing, and organizing information. A globally optimal solution can be obtained by applying high-speed and high-quality optimization algorithms, which perform a globalized search in the entire solution space. In this paper, a brief survey of optimization approaches to text document clustering is presented.
Mine Blood Donors Information through Improved K-Means Clustering (ijcsity)
The number of accidents and health diseases, which are increasing at an alarming rate, is resulting in a huge increase in the demand for blood. There is a necessity for organized analysis of blood donor databases and blood bank repositories. Clustering analysis is one of the data mining applications, and the K-means clustering algorithm is the fundamental algorithm for modern clustering techniques. The K-means clustering algorithm is a traditional, iterative approach: at every iteration, it computes the distance from the centroid of each cluster to each and every data point. This paper improves the original k-means algorithm by deriving the initial centroids from the distribution of the data. Results and discussions show that the improved K-means algorithm produces accurate clusters in less computation time to find the donors' information.
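Distribution-based initialization of this kind fits in a few lines. The slicing scheme below (sort points by distance from the origin, then average k equal slices) is one common variant and an illustrative assumption, not necessarily this paper's exact rule:

```python
import math

def distribution_centroids(points, k):
    """Pick initial centroids from the data distribution: sort points by
    distance from the origin and average each of k equal slices.
    (Illustrative variant; the exact partitioning rule differs by paper.)"""
    pts = sorted(points, key=lambda p: math.dist(p, [0.0] * len(p)))
    size = len(pts) // k
    centroids = []
    for i in range(k):
        # last slice absorbs any remainder
        chunk = pts[i * size:(i + 1) * size] if i < k - 1 else pts[i * size:]
        dim = len(chunk[0])
        centroids.append(tuple(sum(p[d] for p in chunk) / len(chunk)
                               for d in range(dim)))
    return centroids
```

Because the selection is deterministic, repeated runs of k-means seeded this way start from the same centroids, removing the run-to-run variation of random initialization.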
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis... (ijsrd.com)
A cluster is a group of objects which are similar to each other within the cluster and dissimilar to the objects of other clusters. The similarity is typically calculated on the basis of the distance between two objects or clusters; two or more objects belong to the same cluster only if they are close to each other by that distance. The major objective of clustering is to discover collections of comparable objects based on a similarity metric. Fuzzy Possibilistic C-Means (FPCM) is an effective clustering algorithm for unlabeled data that produces both membership and typicality values during the clustering process. In this approach, the efficiency of the Fuzzy Possibilistic C-Means clustering approach is enhanced by using penalized and compensated constraints based FPCM (PCFPCM). The proposed PCFPCM approach differs from conventional clustering techniques by imposing a possibilistic reasoning strategy on fuzzy clustering with penalized and compensated constraints for updating the grades of membership and typicality. The performance of the proposed approach is evaluated on the University of California, Irvine (UCI) machine repository datasets Iris, Wine, Lung Cancer and Lymphograma. The parameters used for the evaluation are clustering accuracy, Mean Squared Error (MSE), execution time and convergence behavior.
Scaling Down Dimensions and Feature Extraction in Document Repository Classif... (ijdmtaiir)
In this study a comprehensive evaluation of two supervised feature selection methods for dimensionality reduction is performed: Latent Semantic Indexing (LSI) and Principal Component Analysis (PCA). This is gauged against unsupervised techniques such as fuzzy feature clustering using hard fuzzy C-means (FCM). The main objective of the study is to estimate the relative efficiency of the two supervised techniques against unsupervised fuzzy techniques while reducing the feature space. It is found that clustering using FCM leads to better accuracy in classifying documents than techniques like LSI and PCA. Results show that the clustering of features improves the accuracy of document classification.
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION (cscpconf)
Feature clustering is a powerful method to reduce the dimensionality of feature vectors for text classification. In this paper, Fast Fuzzy Feature clustering for text classification is proposed. It is based on the framework proposed by Jung-Yi Jiang, Ren-Jia Liou and Shie-Jue Lee in 2011. The words in the feature vector of a document are grouped into clusters in fewer iterations: the number of iterations required to obtain cluster centers is reduced by transforming the cluster-center dimension from n dimensions to 2 dimensions, using Principal Component Analysis with a slight change for the dimension reduction. Experimental results show that this method improves performance by significantly reducing the number of iterations required to obtain the cluster centers. The same is verified with three benchmark datasets.
Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model (Waqas Tariq)
A novel clustering algorithm, CSHARP, is presented for the purpose of finding clusters of arbitrary shapes and arbitrary densities in high dimensional feature spaces. It can be considered a variation of the Shared Nearest Neighbor algorithm (SNN), in which each sample data point votes for the points in its k-nearest neighborhood. Sets of points sharing a common mutual nearest neighbor are considered dense regions/blocks. These blocks are the seeds from which clusters may grow; therefore, CSHARP is not a point-to-point clustering algorithm but a block-to-block clustering technique. Many of its advantages come from these facts: noise points and outliers correspond to blocks of small sizes, and homogeneous blocks highly overlap. The technique is not prone to merging clusters of different densities or different homogeneity. The algorithm has been applied to a variety of low and high dimensional data sets with superior results over existing techniques such as DBScan, K-means, Chameleon, Mitosis and Spectral Clustering. The quality of its results, as well as its time complexity, rank it at the front of these techniques.
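The mutual-neighborhood seeding that CSHARP builds on can be illustrated with a minimal sketch: block seeds reduce to pairs of points that appear in each other's k-nearest neighborhoods. The function below is a deliberately reduced illustration, not the published implementation, which grows clusters block-to-block from these seeds:

```python
import math

def mutual_knn_pairs(points, k):
    """Pairs (i, j) where each point lies in the other's k-nearest
    neighborhood -- the 'dense block' seeding idea, reduced to pairs."""
    def knn(i):
        order = sorted(range(len(points)),
                       key=lambda j: math.dist(points[i], points[j]))
        return set(order[1:k + 1])  # skip the point itself (distance 0)

    neigh = [knn(i) for i in range(len(points))]
    return {(i, j) for i in range(len(points)) for j in neigh[i]
            if i < j and i in neigh[j]}
```

Points that end up in no mutual pair (like the isolated point in the test data) play the role of the small blocks CSHARP treats as noise or outliers.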
Comparison of methods for combination of multiple classifiers that predict b... (IJERA Editor)
Predictive analysis includes techniques from data mining that analyze current and historical data and make predictions about the future. Predictive analytics is used in actuarial science, financial services, retail, travel, healthcare, insurance, pharmaceuticals, marketing, telecommunications and other fields. Predicting patterns can be considered a classification problem, and combining different classifiers gives better results. We study and compare three methods used to combine multiple classifiers. Bayesian networks perform classification based on conditional probability; the naïve variant is effective and easy to interpret, as it assumes that the predictors are independent. Tree augmented naïve Bayes (TAN) constructs a maximum weighted spanning tree that maximizes the likelihood of the training data to perform classification; this tree structure eliminates the independent-attribute assumption of naïve Bayesian networks. The behavior-knowledge space method works in two phases and can provide very good performance if large and representative data sets are available.
K-Means clustering uses an iterative procedure that is highly sensitive to, and dependent upon, the initial centroids. The initial centroids in k-means clustering are chosen randomly, and hence the clustering also changes with the initial centroids. This paper seeks to overcome this problem of random selection of centroids, and the consequent change of clusters, with a premeditated selection of initial centroids. We use the iris, abalone and wine data sets to demonstrate that the proposed method of finding the initial centroids and using them in the k-means algorithm improves clustering performance. The clustering also remains the same in every run, as the initial centroids are not randomly selected but chosen by the premeditated method.
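The effect described above is easy to demonstrate: Lloyd's iterations are fully deterministic once the initial centroids are fixed, so a premeditated start yields identical clusters on every run. A minimal sketch in plain Python, with illustrative data:

```python
import math

def kmeans(points, init_centroids, iters=20):
    """Plain Lloyd's iterations from *given* initial centroids: with a
    premeditated (non-random) start, every run produces the same result."""
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(iters):
        # assign each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)),
                    key=lambda ci: math.dist(p, centroids[ci]))
            clusters[i].append(p)
        # move each centroid to the mean of its cluster (keep it if empty)
        centroids = [tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else c
                     for cl, c in zip(clusters, centroids)]
    return centroids
```

Calling this twice with the same initial centroids returns bitwise-identical centroids, which is exactly the run-to-run stability the paper claims for its premeditated initialization.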
Experimental study of Data clustering using k-Means and modified algorithms (IJDKP)
The k-Means clustering algorithm is an old algorithm that has been intensely researched owing to its ease and simplicity of implementation. Clustering algorithms have broad appeal and usefulness in exploratory data analysis. This paper presents results of an experimental study of different approaches to k-Means clustering, comparing results on different datasets using the original k-Means and other modified algorithms implemented using MATLAB R2009b. The results are calculated on performance measures such as the number of iterations, number of points misclassified, accuracy, Silhouette validity index and execution time.
A PSO-Based Subtractive Data Clustering Algorithm (IJORCS)
There is a tremendous proliferation in the amount of information available on the largest shared information source, the World Wide Web. Fast and high-quality clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize the information. Recent studies have shown that partitional clustering algorithms such as the k-means algorithm are the most popular algorithms for clustering large datasets. The major problem with partitional clustering algorithms is that they are sensitive to the selection of the initial partitions and are prone to premature convergence to local optima. Subtractive clustering is a fast, one-pass algorithm for estimating the number of clusters and the cluster centers for any given set of data; the cluster estimates can be used to initialize iterative optimization-based clustering methods and model identification methods. In this paper, we present a hybrid Subtractive + Particle Swarm Optimization (PSO) clustering algorithm that performs fast clustering. For comparison, we applied the Subtractive + PSO clustering algorithm, PSO, and the Subtractive clustering algorithm on three different datasets. The results illustrate that the Subtractive + PSO clustering algorithm generates the most compact clustering results compared to the other algorithms.
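Subtractive clustering itself is compact enough to sketch: each point's potential is a sum of Gaussians over the whole dataset, the highest-potential point becomes a center, and potential near that center is then subtracted. The radii `ra` and `rb` below follow the usual formulation; the data and defaults are illustrative:

```python
import math

def subtractive_centers(points, ra=1.0, rb=1.5, n_centers=2):
    """One-pass subtractive clustering sketch: potential of each point is
    a Gaussian-weighted sum over all points; the peak becomes a center and
    nearby potential is subtracted before picking the next center."""
    a, b = 4.0 / ra ** 2, 4.0 / rb ** 2
    pot = [sum(math.exp(-a * math.dist(p, q) ** 2) for q in points)
           for p in points]
    centers = []
    for _ in range(n_centers):
        i = max(range(len(points)), key=lambda j: pot[j])
        centers.append(points[i])
        peak = pot[i]
        # subtract the chosen peak's influence from every point's potential
        pot = [pot[j] - peak * math.exp(-b * math.dist(points[j], points[i]) ** 2)
               for j in range(len(points))]
    return centers
```

In the hybrid scheme the paper describes, centers found this way would seed the PSO search instead of random initial partitions.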
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING: FINDING ALL THE POTENTIAL MI... (IJDKP)
Quantum clustering (QC) is a data clustering algorithm based on quantum mechanics, accomplished by substituting each point in a given dataset with a Gaussian. The width of the Gaussian is a σ value, a hyper-parameter which can be manually defined and manipulated to suit the application. Numerical methods are used to find all the minima of the quantum potential, as they correspond to cluster centers. Herein, we investigate the mathematical task of expressing and finding all the roots of the exponential polynomial corresponding to the minima of a two-dimensional quantum potential. This is an outstanding task because normally such expressions are impossible to solve analytically. However, we prove that if the points are all included in a square region of size σ, there is only one minimum. This bound is useful not only for limiting the number of solutions to search for by numerical means; it also allows us to propose a new numerical approach "per block". This technique decreases the number of particles by approximating some groups of particles with weighted particles. These findings are useful not only for the quantum clustering problem but also for the exponential polynomials encountered in quantum chemistry, solid-state physics and other applications.
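The quantum potential at the heart of QC can be written down directly from the Parzen wavefunction. The sketch below follows the standard Horn and Gottlieb formulation, evaluating V(x) up to the additive constant E, which does not move the minima:

```python
import math

def quantum_potential(x, data, sigma=1.0):
    """V(x) - E for quantum clustering: psi(x) is a sum of Gaussians of
    width sigma centred on the data points, and
    V(x) - E = -d/2 + (1 / (2 sigma^2 psi)) * sum ||x - p||^2 * w_p,
    whose minima mark cluster centers."""
    s2 = 2.0 * sigma ** 2
    weights = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / s2)
               for p in data]
    psi = sum(weights)  # Parzen wavefunction
    d = len(x)
    num = sum(w * sum((a - b) ** 2 for a, b in zip(x, p))
              for w, p in zip(weights, data))
    return -d / 2.0 + num / (s2 * psi)
```

Evaluated on a single data point, the potential is lowest at the point itself and rises away from it, which is the minimum-seeking behavior the paper's root-finding analysis is about.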
Novel text categorization by amalgamation of augmented k nearest neighbourhoo... (ijcsity)
Machine learning for text classification is the underpinning of document cataloging, news filtering, document steering and exemplification. In the text mining realm, effective feature selection is significant to make the learning task more accurate and competent. One traditional lazy text classifier, k-Nearest Neighborhood (kNN), has a major pitfall in calculating the similarity between all the objects in the training and testing sets, thereby leading to exaggeration of both the computational complexity of the algorithm and massive consumption of main memory. To diminish these shortcomings from the viewpoint of a data-mining practitioner, an amalgamative technique is proposed in this paper using a novel restructured version of kNN called Augmented kNN (AkNN) and k-Medoids (kMdd) clustering. The proposed work preprocesses the initial training set by imposing attribute feature selection for reduction of high dimensionality; it also detects and excludes the high-flier samples in the initial training set and restructures a constricted training set. The kMdd clustering algorithm generates the cluster centers (as interior objects) for each category and restructures the constricted training set with centroids. This technique is amalgamated with an AkNN classifier that was prearranged with text mining similarity measures. Eventually, significant weights and ranks were assigned to each object in the new training set based upon their accessory towards the objects in the testing set. Experiments conducted on Reuters-21578, a UCI benchmark text mining data set, and comparisons with the traditional kNN classifier designate that the referred method yields preeminent recital in both clustering and classification.
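The core space-reduction idea, comparing a test object against per-category cluster centers (interior objects) rather than the whole training set, can be sketched as follows. The real method adds feature selection, outlier removal, and weighted ranking, so this is a deliberately reduced illustration with hypothetical data:

```python
import math

def medoid(points):
    """Interior object of a group: the member minimizing total distance
    to all other members (the kMdd cluster-center notion)."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

def medoid_knn_predict(train, x):
    """Classify x against one medoid per class instead of every training
    point, shrinking the kNN search from O(n) to O(#classes) comparisons.
    train: dict mapping label -> list of points."""
    meds = {label: medoid(pts) for label, pts in train.items()}
    return min(meds, key=lambda label: math.dist(x, meds[label]))
```

Because a medoid is an actual training object, the constricted training set stays interpretable in the original feature space, which is the motivation for k-medoids over k-means here.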
The International Journal of Engineering and Science (The IJES) (theijes)
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
Soft Computing Techniques Based Image Classification using Support Vector Mac... (ijtsrd)
In this paper we compare different kernels developed for support vector machine based time series classification. Despite the good performance of the Support Vector Machine (SVM) on many concrete classification problems, the algorithm is not directly applicable to multi-dimensional trajectories of different lengths. Training SVMs with indefinite kernels has recently attracted attention in the machine learning community, partly because many similarity functions that arise in practice are not symmetric positive semidefinite. In this paper, the Gaussian RBF kernel is extended to the Gaussian elastic metric kernel, in two variants: time-warp distance and edit distance with real penalty. Experimental results on 17 time series datasets show that, in terms of classification accuracy, SVM with the Gaussian elastic metric kernel is much superior to other kernels and to state-of-the-art similarity measure methods. We use the indefinite similarity function or distance directly without any conversion, and hence it always treats both training and test examples consistently. Finally, it achieves the highest accuracy of all methods that train SVM with kernels, both positive semidefinite (PSD) and non-PSD, with statistically significant evidence, while also retaining sparsity of the support vector set. Tarun Jaiswal | Dr. S. Jaiswal | Dr. Ragini Shukla "Soft Computing Techniques Based Image Classification using Support Vector Machine Performance" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-3, April 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23437.pdf
Paper URL: https://www.ijtsrd.com/computer-science/artificial-intelligence/23437/soft-computing-techniques-based-image-classification-using-support-vector-machine-performance/tarun-jaiswal
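The kernel substitution underlying this line of work, replacing the Euclidean distance inside a Gaussian kernel with an elastic (time-warp) distance, can be sketched as follows. The exact normalization and the squared-versus-plain distance choice vary between papers, so treat both as assumptions here:

```python
import math

def dtw(a, b):
    """Classic dynamic time warping distance between two 1-D series."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping paths
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def gaussian_elastic_kernel(a, b, sigma=1.0):
    """Gaussian kernel with the elastic distance in place of the
    Euclidean one; not PSD in general, hence the indefinite-kernel
    training question the paper discusses."""
    return math.exp(-dtw(a, b) ** 2 / (2.0 * sigma ** 2))
```

Because DTW aligns series of different lengths, the kernel accepts trajectories a plain RBF kernel cannot compare directly.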
In the recent machine learning community, there is a trend of constructing a nonlinear version of a linear algorithm through the 'kernel method', for example kernel principal component analysis, kernel Fisher discriminant analysis, support vector machines (SVMs), and the recent kernel clustering algorithms. Typically, in unsupervised clustering algorithms utilizing the kernel method, a nonlinear mapping is first applied to map the data into a much higher-dimensional feature space, and then clustering is executed. A drawback of these kernel clustering algorithms is that the clustering prototypes reside in the high-dimensional feature space and therefore lack intuitive and clear descriptions, unless an added approximate projection from the feature space back to the data space is used, as done in the literature. This paper utilizes the 'kernel method' to propose a novel clustering algorithm, founded on the conventional fuzzy c-means algorithm (FCM) and known as the kernel fuzzy c-means algorithm (KFCM). This method adopts a novel kernel-induced metric in the data space to replace the original Euclidean norm metric; the cluster prototypes still reside in the data space, so the clustering results can be interpreted and reformulated in the original space. This property is used for clustering incomplete data. Experiments illustrate that KFCM has improved clustering performance and is more robust than other variants of FCM for clustering incomplete data.
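For a Gaussian kernel, the kernel-induced metric that distinguishes KFCM from plain FCM reduces to d^2(x, v) = 2 * (1 - K(x, v)). A sketch of the resulting membership update follows; the fuzzifier m and width sigma are illustrative defaults:

```python
import math

def gauss_k(x, v, sigma=1.0):
    """Gaussian kernel K(x, v), equal to 1 when x == v."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, v)) / sigma ** 2)

def kfcm_memberships(points, centers, m=2.0, sigma=1.0):
    """Membership update with the kernel-induced metric
    d^2(x, v) = 2 * (1 - K(x, v)) replacing the Euclidean norm."""
    u = []
    for x in points:
        d2 = [2.0 * (1.0 - gauss_k(x, v, sigma)) for v in centers]
        row = [(1.0 / d) ** (1.0 / (m - 1)) if d > 0 else None for d in d2]
        if None in row:  # point coincides with a center: full membership
            row = [1.0 if r is None else 0.0 for r in row]
        else:
            s = sum(row)
            row = [r / s for r in row]
        u.append(row)
    return u
```

Because the centers stay in the data space, the memberships above remain interpretable against the original coordinates, which is the property the abstract highlights for incomplete data.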
Development of depth map from stereo images using sum of absolute differences... (nooriasukmaningtyas)
This article proposes a framework for depth map reconstruction using stereo images. Fundamentally, this map provides important information that is commonly used in essential applications such as autonomous vehicle navigation, drone navigation and 3D surface reconstruction. To develop an accurate depth map, the framework must be robust against the challenging regions of low texture, plain color and repetitive pattern in the input stereo image. The development of this map requires several stages, starting with matching-cost calculation, followed by cost aggregation, optimization and refinement. Hence, this work develops a framework with the sum of absolute differences (SAD) and a combination of two edge-preserving filters to increase robustness against the challenging regions. The SAD cost is computed with a block-matching technique to increase the efficiency of the matching process in the low-texture and plain-color regions, while the two edge-preserving filters increase accuracy in the repetitive-pattern region. The results, obtained on the Middlebury standard dataset, show that the proposed method is accurate and capable of working with the challenging regions. The framework is also efficient and can be applied to 3D surface reconstruction. Moreover, this work is highly competitive with previously available methods.
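The SAD matching-cost step is straightforward to sketch: for each candidate disparity, absolute differences over a square block are summed and the disparity with minimal cost wins. Cost aggregation and the two edge-preserving filters are omitted; the images and window size below are illustrative:

```python
def sad(left, right, row, col_l, col_r, half=1):
    """Sum of absolute differences between two square blocks
    (side 2*half + 1) centred on the given pixels."""
    return sum(abs(left[row + i][col_l + j] - right[row + i][col_r + j])
               for i in range(-half, half + 1)
               for j in range(-half, half + 1))

def best_disparity(left, right, row, col, max_d=3, half=1):
    """Winner-take-all disparity search along a scanline: evaluate the
    SAD cost at each candidate shift and keep the cheapest one."""
    costs = {d: sad(left, right, row, col, col - d, half)
             for d in range(0, max_d + 1) if col - d >= half}
    return min(costs, key=costs.get)
```

On flat (plain-color) regions all candidate costs tie, which is exactly why the full framework adds aggregation and edge-preserving refinement on top of this raw cost.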
Model predictive controller for a retrofitted heat exchanger temperature cont... (nooriasukmaningtyas)
This paper aims to demonstrate the practical aspects of process control theory for undergraduate students at the Department of Chemical Engineering at the University of Bahrain. Both the ubiquitous proportional-integral-derivative (PID) controller and model predictive control (MPC), together with their auxiliaries, were designed and implemented in a real-time framework. The latter was realized by retrofitting an existing plate-and-frame heat exchanger unit that had been operated using an analog PID temperature controller. The upgraded control system consists of a personal computer (PC), a low-cost signal conditioning circuit, a National Instruments USB 6008 data acquisition card, and LabVIEW software. LabVIEW control design and simulation modules were used to design and implement the PID and MPC controllers. The performance of the designed controllers was evaluated while controlling the outlet temperature of the retrofitted plate-and-frame heat exchanger. The distinguished feature of the MPC controller in handling input and output constraints was perceived in real time. From a pedagogical point of view, realizing the theory of process control through practical implementation was substantial in enhancing the students' learning and the instructor's teaching experience.
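The PID half of the comparison follows the textbook discrete law u = Kp*e + Ki*sum(e)*dt + Kd*de/dt. A minimal sketch; the gains and sample time below are illustrative, not the paper's tuning:

```python
def make_pid(kp, ki, kd, dt):
    """Discrete PID controller as a closure: proportional on the current
    error, integral as a running sum of error * dt, derivative as the
    backward difference of the error."""
    state = {"integral": 0.0, "prev": None}

    def step(setpoint, measured):
        e = setpoint - measured
        state["integral"] += e * dt
        deriv = 0.0 if state["prev"] is None else (e - state["prev"]) / dt
        state["prev"] = e
        return kp * e + ki * state["integral"] + kd * deriv

    return step
```

Unlike MPC, this law has no notion of input or output constraints, which is the distinguishing MPC feature the paper demonstrates on the heat exchanger.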
Control of a servo-hydraulic system utilizing an extended wavelet functional ... (nooriasukmaningtyas)
Servo-hydraulic systems have been extensively employed in various industrial applications. However, these systems are characterized by highly complex and nonlinear dynamics, which complicates the control design stage. In this paper, an extended wavelet functional link neural network (EWFLNN) is proposed to control the displacement response of the servo-hydraulic system. To optimize the controller's parameters, a recently developed optimization technique called the modified sine cosine algorithm (M-SCA) is exploited as the training method. The proposed controller has achieved remarkable results in tracking two different displacement signals and handling external disturbances. In a comparative study, the proposed EWFLNN controller attained the best control precision compared with other controllers, namely a proportional-integral-derivative (PID) controller, an artificial neural network (ANN) controller, a wavelet neural network (WNN) controller, and the original wavelet functional link neural network (WFLNN) controller. Moreover, compared to the genetic algorithm (GA) and the original sine cosine algorithm (SCA), the M-SCA has shown better optimization results in finding the optimal values of the controller's parameters.
Decentralised optimal deployment of mobile underwater sensors for covering la... (nooriasukmaningtyas)
This paper addresses the problem of sensing coverage of layers of the ocean in three-dimensional underwater environments. We propose distributed control laws to drive mobile underwater sensors to optimally cover a given confined layer of the ocean. Under this algorithm, the mobile underwater sensors first adjust their depth to the specified depth, then form a triangular grid across a given area, and finally move randomly to spread across the grid. These control laws rely only on local information; they are easily implemented and computationally effective, as they use simple consensus rules. Exchanging information only among neighbouring mobile sensors keeps the information exchange in the whole network to a minimum and makes this algorithm a practicable option for undersea deployment. The efficiency of the presented control laws is confirmed via mathematical proof and numerical simulations.
Evaluation quality of service for internet of things based on fuzzy logic: a ... (nooriasukmaningtyas)
The development of internet of things (IoT) technology has made the sustainability of quality of service (SQoS) a major concern in terms of efficiency, measurement, and evaluation of services, as in our smart home case study. Based on several ambiguous linguistic and standard criteria, this article deals with quality of service (QoS). We used fuzzy logic to select the most appropriate and efficient services, and for this reason we have introduced a new paradigmatic approach to assess QoS. In this regard, to measure SQoS, linguistic terms were collected for the identification of ambiguous criteria. This paper collects the results of other work to compare the traditional assessment methods and techniques in IoT; it has been shown that traditional evaluation methods and techniques cannot effectively deal with these metrics. Therefore, fuzzy logic is a worthy method to provide a good measure of QoS under ambiguous linguistic criteria. The proposed model, which is constantly being improved, addresses all the main axes of QoS for a smart home. The results obtained also indicate that the model, with its fuzzy performance importance index (FPII), efficiently evaluates the multiple services of SQoS.
Low power architecture of logic gates using adiabatic techniques (nooriasukmaningtyas)
The growing significance of portable systems, and the need to limit power consumption in very-high-density ultra-large-scale-integration chips, has recently led to rapid and inventive progress in low-power design. The most effective technique for energy-efficient hardware is adiabatic logic circuit design. This paper presents two adiabatic approaches for the design of low-power circuits: modified positive feedback adiabatic logic (modified PFAL) and direct current diode based positive feedback adiabatic logic (DC-DB PFAL). Logic gates are the preliminary components in any digital circuit design; by improving the performance of the basic gates, one can improve the performance of the whole system. In this paper, low-power designs of OR/NOR, AND/NAND, and XOR/XNOR gates are presented using the said approaches, and their results are analyzed for power dissipation, delay, power-delay product and rise time, and compared with other adiabatic techniques along with the conventional complementary metal oxide semiconductor (CMOS) designs reported in the literature. It has been found that the designs with the DC-DB PFAL technique outperform the modified PFAL techniques at 10 MHz, with percentage improvements of 65% for the NOR gate, 7% for the NAND gate and 34% for the XNOR gate, respectively.
A review on techniques and modelling methodologies used for checking electrom... (nooriasukmaningtyas)
The proper functioning of the integrated circuit (IC) in a hostile electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from discrete devices to today's integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry, and smart vehicles in particular, confront design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI, and sensors give misleading values, which can prove fatal in the case of automotives. In this paper, the authors present a non-exhaustive review of research work concerned with the investigation of EMI in ICs and the prediction of this EMI using various modelling methodologies and measurement setups.
Smart monitoring system using NodeMCU for maintenance of production machines (nooriasukmaningtyas)
Maintenance is an activity that helps to reduce risk, increase productivity, improve quality, and minimize production costs. Timely maintenance actions increase efficiency and enhance the safety and quality of products and processes. To achieve this, it is necessary to implement a monitoring system that observes machine conditions from time to time, especially the machine parts that often experience problems. This paper presents a low-cost intelligent monitoring system using NodeMCU to continuously monitor machine conditions and provide warnings in the case of machine failure. Beyond alerts, the monitoring system also logs historical data on machine conditions to the Google Cloud (Google Sheets), including which machines were down, downtime, issues that occurred, repairs made, and technician handling. As a result, machine operators no longer lose a relatively long time calling the technician, and the technicians are assisted in carrying out machine maintenance activities and online reports, so that errors that often occur due to human error do not happen again. The system succeeded in reducing the technician-calling time and maintenance work-reporting time by up to 50%. The availability of online and real-time maintenance historical data will support further maintenance strategy.
Design and simulation of a software defined networking-enabled smart switch, f... (nooriasukmaningtyas)
Using sustainable energy is the future of our planet: it has become not only economically efficient but also a necessity for the preservation of life on earth. Because of this necessity, smart grids have become a very important research topic. Much of the literature has discussed this topic, and with the development of the internet of things (IoT) and smart sensors, smart grids are being developed even further. Software defined networking (SDN), on the other hand, is a technology that separates the control plane from the data plane of the network. It centralizes the management and orchestration of network tasks by using a network controller. The network controller is the heart of the SDN-enabled network, and it can control other networking devices using SDN protocols such as OpenFlow. A smart switching mechanism called SDN-smgrid-sw for the smart grid is modeled and controlled using SDN. We modeled the environment that interacts with the sensors for the sun and wind elements. The algorithm is modeled and programmed for smart, efficient power sharing that is managed centrally and monitored using the SDN controller. Also, all of the smart grid elements (power sources) are connected to the IP network using IoT protocols.
Efficient wireless power transmission to remote the sensor in restenosis coro... (nooriasukmaningtyas)
In this study, the researchers propose an alternative technique for designing an asymmetric 4-coil resonance coupling module based on a series-to-parallel topology at the 27 MHz industrial scientific medical (ISM) band to avoid tissue damage during constant monitoring of in-stent restenosis of the coronary artery. The design consists of two components: an external part that includes 3 planar coils placed outside the body, and an internal helical coil (stent) implanted into the coronary artery in the human tissue. The technique considers the output power and the transfer efficiency of the overall system, the coil geometry such as the number of turns per coil, and the coil size. The results indicate that this design shows an 82% efficiency in air when the transmission distance is maintained at 20 mm, which allows the wireless power supply system to monitor the pressure within the coronary artery when the implanted load resistance is 400 Ω.
Grid reactive voltage regulation and cost optimization for electric vehicle p...nooriasukmaningtyas
Expecting large electric vehicle (EV) usage in the future due to environmental issues, state subsidies, and incentives, the impact of EV charging on the power grid is required to be closely analyzed and studied for power quality, stability, and planning of infrastructure. When a large number of energy storage batteries are connected to the grid as a capacitive load the power factor of the power grid is inevitably reduced, causing power losses and voltage instability. In this work large-scale 18K EV charging model is implemented on IEEE 33 network. Optimization methods are described to search for the location of nodes that are affected most due to EV charging in terms of power losses and voltage instability of the network. Followed by optimized reactive power injection magnitude and time duration of reactive power at the identified nodes. It is shown that power losses are reduced and voltage stability is improved in the grid, which also complements the reduction in EV charging cost. The result will be useful for EV charging stations infrastructure planning, grid stabilization, and reducing EV charging costs.
Topology network effects for double synchronized switch harvesting circuit on...nooriasukmaningtyas
Energy extraction takes place using several different technologies, depending on the type of energy and how it is used. The objective of this paper is to study topology influence for a smart network based on piezoelectric materials using the double synchronized switch harvesting (DSSH). In this work, has been presented network topology for circuit DSSH (DSSH Standard, Independent DSSH, DSSH in parallel, mono DSSH, and DSSH in series). Using simulation-based on a structure with embedded piezoelectric system harvesters, then compare different topology of circuit DSSH for knowledge is how to connect the circuit DSSH together and how to implement accurately this circuit strategy for maximizing the total output power. The network topology DSSH extracted power a technique allows again up to in terms of maximal power output compared with network topology standard extracted at the resonant frequency. The simulation results show that by using the same input parameters the maximum efficiency for topology DSSH in parallel produces 120% more energy than topology DSSH-series. In addition, the energy harvesting by mono-DSSH is more than DSSH-series by 650% and it has exceeded DSSHind by 240%.
Improving the design of super-lift Luo converter using hybrid switching capac...nooriasukmaningtyas
In this article, an improvement to the positive output super-lift Luo converter (POSLC) has been proposed to get high gain at a low duty cycle. Also, reduce the stress on the switch and diodes, reduce the current through the inductors to reduce loss, and increase efficiency. Using a hybrid switch unit composed of four inductors and two capacitors it is replaced by the main inductor in the elementary circuit. It’s charged in parallel with the same input voltage and discharged in series. The output voltage is increased according to the number of components. The gain equation is modeled. The boundary condition between continuous conduction mode (CCM) and discontinuous conduction mode (DCM) has been derived. Passive components are designed to get high output voltage (8 times at D=0.5) and low ripple about (0.004). The circuit is simulated and analyzed using MATLAB/Simulink. Maximum power point tracker (MPPT) controls the converter to provide the most interest from solar energy.
Third harmonic current minimization using third harmonic blocking transformernooriasukmaningtyas
Zero sequence blocking transformers (ZSBTs) are used to suppress third harmonic currents in 3-phase systems. Three-phase systems where singlephase loading is present, there is every chance that the load is not balanced. If there is zero-sequence current due to unequal load current, then the ZSBT will impose high impedance and the supply voltage at the load end will be varied which is not desired. This paper presents Third harmonic blocking transformer (THBT) which suppresses only higher harmonic zero sequences. The constructional features using all windings in single-core and construction using three single-phase transformers explained. The paper discusses the constructional features, full details of circuit usage, design considerations, and simulation results for different supply and load conditions. A comparison of THBT with ZSBT is made with simulation results by considering four different cases
Power quality improvement of distribution systems asymmetry caused by power d...nooriasukmaningtyas
With an increase of non-linear load in today’s electrical power systems, the rate of power quality drops and the voltage source and frequency deteriorate if not properly compensated with an appropriate device. Filters are most common techniques that employed to overcome this problem and improving power quality. In this paper an improved optimization technique of filter applies to the power system is based on a particle swarm optimization with using artificial neural network technique applied to the unified power flow quality conditioner (PSO-ANN UPQC). Design particle swarm optimization and artificial neural network together result in a very high performance of flexible AC transmission lines (FACTs) controller and it implements to the system to compensate all types of power quality disturbances. This technique is very powerful for minimization of total harmonic distortion of source voltages and currents as a limit permitted by IEEE-519. The work creates a power system model in MATLAB/Simulink program to investigate our proposed optimization technique for improving control circuit of filters. The work also has measured all power quality disturbances of the electrical arc furnace of steel factory and suggests this technique of filter to improve the power quality.
Studies enhancement of transient stability by single machine infinite bus sys...nooriasukmaningtyas
Maintaining network synchronization is important to customer service. Low fluctuations cause voltage instability, non-synchronization in the power system or the problems in the electrical system disturbances, harmonics current and voltages inflation and contraction voltage. Proper tunning of the parameters of stabilizer is prime for validation of stabilizer. To overcome instability issues and get reinforcement found a lot of the techniques are developed to overcome instability problems and improve performance of power system. Genetic algorithm was applied to optimize parameters and suppress oscillation. The simulation of the robust composite capacitance system of an infinite single-machine bus was studied using MATLAB was used for optimization purpose. The critical time is an indication of the maximum possible time during which the error can pass in the system to obtain stability through the simulation. The effectiveness improvement has been shown in the system
Renewable energy based dynamic tariff system for domestic load managementnooriasukmaningtyas
To deal with the present power-scenario, this paper proposes a model of an advanced energy management system, which tries to achieve peak clipping, peak to average ratio reduction and cost reduction based on effective utilization of distributed generations. This helps to manage conventional loads based on flexible tariff system. The main contribution of this work is the development of three-part dynamic tariff system on the basis of time of utilizing power, available renewable energy sources (RES) and consumers’ load profile. This incorporates consumers’ choice to suitably select for either consuming power from conventional energy sources and/or renewable energy sources during peak or off-peak hours. To validate the efficiency of the proposed model we have comparatively evaluated the model performance with existing optimization techniques using genetic algorithm and particle swarm optimization. A new optimization technique, hybrid greedy particle swarm optimization has been proposed which is based on the two aforementioned techniques. It is found that the proposed model is superior with the improved tariff scheme when subjected to load management and consumers’ financial benefit. This work leads to maintain a healthy relationship between the utility sectors and the consumers, thereby making the existing grid more reliable, robust, flexible yet cost effective.
Energy harvesting maximization by integration of distributed generation based...nooriasukmaningtyas
The purpose of distributed generation systems (DGS) is to enhance the distribution system (DS) performance to be better known with its benefits in the power sector as installing distributed generation (DG) units into the DS can introduce economic, environmental and technical benefits. Those benefits can be obtained if the DG units' site and size is properly determined. The aim of this paper is studying and reviewing the effect of connecting DG units in the DS on transmission efficiency, reactive power loss and voltage deviation in addition to the economical point of view and considering the interest and inflation rate. Whale optimization algorithm (WOA) is introduced to find the best solution to the distributed generation penetration problem in the DS. The result of WOA is compared with the genetic algorithm (GA), particle swarm optimization (PSO), and grey wolf optimizer (GWO). The proposed solutions methodologies have been tested using MATLAB software on IEEE 33 standard bus system
Intelligent fault diagnosis for power distribution systemcomparative studiesnooriasukmaningtyas
Short circuit is one of the most popular types of permanent fault in power distribution system. Thus, fast and accuracy diagnosis of short circuit failure is very important so that the power system works more effectively. In this paper, a newly enhanced support vector machine (SVM) classifier has been investigated to identify ten short-circuit fault types, including single line-toground faults (XG, YG, ZG), line-to-line faults (XY, XZ, YZ), double lineto-ground faults (XYG, XZG, YZG) and three-line faults (XYZ). The performance of this enhanced SVM model has been improved by using three different versions of particle swarm optimization (PSO), namely: classical PSO (C-PSO), time varying acceleration coefficients PSO (T-PSO) and constriction factor PSO (K-PSO). Further, utilizing pseudo-random binary sequence (PRBS)-based time domain reflectometry (TDR) method allows to obtain a reliable dataset for SVM classifier. The experimental results performed on a two-branch distribution line show the most optimal variant of PSO for short fault diagnosis.
A deep learning approach based on stochastic gradient descent and least absol...nooriasukmaningtyas
More than eighty-five to ninety percentage of the diabetic patients are affected with diabetic retinopathy (DR) which is an eye disorder that leads to blindness. The computational techniques can support to detect the DR by using the retinal images. However, it is hard to measure the DR with the raw retinal image. This paper proposes an effective method for identification of DR from the retinal images. In this research work, initially the Weiner filter is used for preprocessing the raw retinal image. Then the preprocessed image is segmented using fuzzy c-mean technique. Then from the segmented image, the features are extracted using grey level co-occurrence matrix (GLCM). After extracting the fundus image, the feature selection is performed stochastic gradient descent, and least absolute shrinkage and selection operator (LASSO) for accurate identification during the classification process. Then the inception v3-convolutional neural network (IV3-CNN) model is used in the classification process to classify the image as DR image or non-DR image. By applying the proposed method, the classification performance of IV3-CNN model in identifying DR is studied. Using the proposed method, the DR is identified with the accuracy of about 95%, and the processed retinal image is identified as mild DR.
Synthetic Fiber Construction in lab .pptxPavel ( NSTU)
Synthetic fiber production is a fascinating and complex field that blends chemistry, engineering, and environmental science. By understanding these aspects, students can gain a comprehensive view of synthetic fiber production, its impact on society and the environment, and the potential for future innovations. Synthetic fibers play a crucial role in modern society, impacting various aspects of daily life, industry, and the environment. ynthetic fibers are integral to modern life, offering a range of benefits from cost-effectiveness and versatility to innovative applications and performance characteristics. While they pose environmental challenges, ongoing research and development aim to create more sustainable and eco-friendly alternatives. Understanding the importance of synthetic fibers helps in appreciating their role in the economy, industry, and daily life, while also emphasizing the need for sustainable practices and innovation.
Instructions for Submissions thorugh G- Classroom.pptxJheel Barad
This presentation provides a briefing on how to upload submissions and documents in Google Classroom. It was prepared as part of an orientation for new Sainik School in-service teacher trainees. As a training officer, my goal is to ensure that you are comfortable and proficient with this essential tool for managing assignments and fostering student engagement.
Honest Reviews of Tim Han LMA Course Program.pptxtimhan337
Personal development courses are widely available today, with each one promising life-changing outcomes. Tim Han’s Life Mastery Achievers (LMA) Course has drawn a lot of interest. In addition to offering my frank assessment of Success Insider’s LMA Course, this piece examines the course’s effects via a variety of Tim Han LMA course reviews and Success Insider comments.
Read| The latest issue of The Challenger is here! We are thrilled to announce that our school paper has qualified for the NATIONAL SCHOOLS PRESS CONFERENCE (NSPC) 2024. Thank you for your unwavering support and trust. Dive into the stories that made us stand out!
Francesca Gottschalk - How can education support child empowerment.pptxEduSkills OECD
Francesca Gottschalk from the OECD’s Centre for Educational Research and Innovation presents at the Ask an Expert Webinar: How can education support child empowerment?
Model Attribute Check Company Auto PropertyCeline George
In Odoo, the multi-company feature allows you to manage multiple companies within a single Odoo database instance. Each company can have its own configurations while still sharing common resources such as products, customers, and suppliers.
Operation “Blue Star” is the only event in the history of Independent India where the state went into war with its own people. Even after about 40 years it is not clear if it was culmination of states anger over people of the region, a political game of power or start of dictatorial chapter in the democratic setup.
The people of Punjab felt alienated from main stream due to denial of their just demands during a long democratic struggle since independence. As it happen all over the word, it led to militant struggle with great loss of lives of military, police and civilian personnel. Killing of Indira Gandhi and massacre of innocent Sikhs in Delhi and other India cities was also associated with this movement.
Acetabularia Information For Class 9 .docxvaibhavrinwa19
Acetabularia acetabulum is a single-celled green alga that in its vegetative state is morphologically differentiated into a basal rhizoid and an axially elongated stalk, which bears whorls of branching hairs. The single diploid nucleus resides in the rhizoid.
Palestine last event orientationfvgnh .pptxRaedMohamed3
An EFL lesson about the current events in Palestine. It is intended to be for intermediate students who wish to increase their listening skills through a short lesson in power point.
Home assignment II on Spectroscopy 2024 Answers.pdf
Indonesian Journal of Electrical Engineering and Computer Science
Vol. 25, No. 1, January 2022, pp. 569~579
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v25.i1.pp569-579
Journal homepage: http://ijeecs.iaescore.com
Max stable set problem to found the initial centroids in
clustering problem
Awatif Karim1, Chakir Loqman1, Youssef Hami2, Jaouad Boumhidi1
1 LISAC Laboratory, Faculty of Science Dhar El Mehraz, University Sidi Mohamed Ben Abdellah, Fez, Morocco
2 RTMCSA Research Team, National School of Applied Sciences, University Abd El Malek Essaadi, Tangier, Morocco
Article Info
Article history: Received Feb 9, 2021; Revised Nov 5, 2021; Accepted Nov 26, 2021
Keywords: Continuous Hopfield network; Document clustering; Initial centroids; Maximum stable set problem

Abstract
In this paper, we propose a new approach to document clustering using the K-Means algorithm. The latter is sensitive to the random selection of the k cluster centroids in the initialization phase. To improve the quality of K-Means clustering, we propose to model the text document clustering problem as the maximum stable set problem (MSSP) and to use a continuous Hopfield network to solve the MSSP and obtain the initial centroids. The idea is inspired by the fact that MSSP and clustering share the same principle: MSSP consists of finding the largest set of completely disconnected nodes in a graph, while in clustering all objects are divided into disjoint clusters. Simulation results demonstrate that the proposed K-Means improved by MSSP (KM_MSSP) is efficient on large data sets, is much better optimized in terms of time, and provides better clustering quality than other methods.

This is an open access article under the CC BY-SA license.
Corresponding Author:
Awatif Karim
LISAC Laboratory, Faculty of Science Dhar El Mehraz, University Sidi Mohamed Ben Abdellah
Box 30003, Fez, Morocco
Email: awatif.karim@usmba.ac.ma
1. INTRODUCTION
Today, clustering is an important tool in data mining; it is an automatic learning process in which objects are clustered or grouped based on the principle of maximizing intraclass similarity and minimizing interclass similarity, without knowing the labels of the data. Many clustering strategies are available in the literature [1], [2]; among them are two main categories known as hierarchical and partitioning clustering. In hierarchical clustering, a tree structure is created that shows how objects are grouped (in an agglomerative method) or partitioned (in a divisive method). Dumond [3] has argued that the agglomerative hierarchical clustering algorithm starts with each object forming a separate group.
In partitioning methods, a data set is decomposed into k partitions. A cluster is a set of elements represented by the centroid (or prototype) of the cluster. The centroid is formed in such a way that it is closely related (in terms of a similarity function) to all objects in that cluster. The best-known partitioning methods include K-Means and its variants [4], fuzzy c-means, possibilistic c-means, hard c-means (HCM), mean shift [5], K-medoids [6], partitioning around medoids (PAM), CLARA [7] and CLARANS [8]. Most of these algorithms randomly select the initial centers in advance, which considerably affects the quality of the clustering results. In the same context, K-Means is an unsupervised learning method [9] that is widely used in text clustering. K-Means is an algorithm whose cluster number must be given at the start. But the difficulty is that the clustering results depend on this fixed number of classes and on the random choice of initial clusters, especially when the data set is large and we have no assumptions about the data [10]. Moreover, it can get stuck in a local minimum and produce an unstable result (if we reinitialize the algorithm with other values, it may converge to another local solution) [11]-[14].
To overcome the above deficiencies, most research in clustering analysis has focused on "automatic clustering algorithms". The elbow method is one of the most popular methods to determine the optimal value of k. It consists of running K-Means clustering on the data set for a range of values of k, calculating the sum of squared errors (SSE) for each k, and plotting them in a line chart. If the chart looks like an arm, the best value of k will be at the "elbow" [15].
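The elbow procedure can be sketched numerically. The tiny K-Means below, its deterministic farthest-first initialization, and the synthetic three-blob data are illustrative assumptions for this sketch, not the paper's setup:

```python
import numpy as np

def farthest_first_centers(X, k):
    """Deterministic, spread-out initialization (chosen only so the sketch is reproducible)."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(X[:, None] - np.array(centers)[None], axis=2), axis=1)
        centers.append(X[d.argmax()])
    return np.array(centers)

def kmeans_sse(X, k, n_iter=50):
    """Run basic K-Means (Lloyd iterations) and return the sum of squared errors for this k."""
    centers = farthest_first_centers(X, k)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    d = np.linalg.norm(X[:, None] - centers[None], axis=2)
    return float((d.min(axis=1) ** 2).sum())

# three well-separated blobs: SSE falls sharply up to k = 3, then flattens,
# so the "elbow" of the SSE-vs-k chart sits at k = 3
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.2, size=(30, 2)) for c in ([0, 0], [5, 5], [0, 5])])
sse = {k: kmeans_sse(X, k) for k in range(1, 6)}
```

Plotting `sse` against k gives the line chart described above; the large drop from k = 2 to k = 3 followed by a flat tail marks the elbow.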
Metaheuristic algorithms have also been investigated as a new approach to cluster analysis [16]. One of the major methods for this problem combines swarm intelligence algorithms with cluster validity indices as an objective function [17], [18]. There are several criteria, derived from different approaches, for determining the optimal number of classes. We cite the criterion of Xie and Beni [19], which is based on a measure of the separability and compactness of classes; these two notions define criteria for evaluating a classification. Xie and Beni propose to choose the optimal k that minimizes the ratio between compactness and separability.
In [20], researchers presented a fuzzy silhouette index method on dynamic data to find the optimal number of clusters. The optimal number of clusters k is the one that maximizes the average silhouette over a range of possible values of k; however, the clustering result is unstable. In [21], a greedy algorithm was applied to obtain the number of centers, giving the final clustering result a higher accuracy rate and stability. Shen-Yi et al. [22] initialized K-Means with the initial centers produced by a hierarchical clustering algorithm; the clustering results in terms of FM are not very convincing, and the dataset comes from a Chinese text corpus. According to [23], fuzzy c-means is used to produce initial centers, and particle swarm optimization (PSO) is used to build optimal clusters.
Song et al. [24] improved Huang Min's algorithm [25] to select initial clustering centers, focusing on the distance between samples that have the same maximum density parameter and comparing it with the average distance of the dataset. Sherkat et al. [26] produced initial centers with a method called deterministic seeding K-Means (DSKM). The key idea of the proposed method is to select k data points that are distant from each other and, at the same time, have a high L1 norm. These data points are used to initialize the K-Means algorithm. The method produced good results because it takes the real number of classes as its parameter k, which means that it is sensitive to the number-of-clusters parameter.
In this paper, we are interested in evaluating the approach described in our previous work [27], which achieved good results in determining the number of clusters by using the MSSP and the continuous Hopfield network (CHN), but overlooked the selection of the initial clustering centers. In order to evaluate this approach, we consider the nodes found by MSSP as the initial cluster centroids. Thus, we propose a method for the automatic detection of initial cluster centroids, which are input parameters in several partitioning clustering methods. The proposed algorithm is executed before clustering, which means that it is independent of any clustering method that starts with k centers; it is efficient on large data sets and well optimized in terms of time. It consists of modeling the text document clustering problem as the MSSP and using the CHN to solve the MSSP in order to obtain document seeds. We have chosen K-Means as the clustering method for our experiments. This paper is structured as follows: Section 2 describes the concept of text clustering. Section 3 discusses MSSP and CHN, which are the main components of the method, and presents its steps. In Section 4, we present the implementation and the experimental results of KM_MSSP. Section 5 concludes this work.
2. FUNDAMENTAL CONCEPTS OF DOCUMENTS CLUSTERING
Document clustering is a difficult task in text mining; it consists of grouping similar documents when the topics are not known in advance. Its goal is to organize a collection of documents according to their topics and help users access information. Its process consists of the following steps [28]:
2.1. Text pre-processing
Each text is represented in the vector space model (VSM), an algebraic model for representing text documents as vectors of terms. Then, stop words and punctuation marks are removed. Stemming, the process of reducing words to their stem or root form, is performed, and a pre-processing filter is applied to eliminate terms that have a low frequency (count < 3), leading to a significant dimensionality reduction without loss of clustering performance.
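A minimal sketch of this pipeline follows. The stop-word list and the suffix-stripping stemmer are deliberately crude stand-ins for the real resources a production system would use (a full stop-word list, the Porter stemmer):

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "on"}  # illustrative subset

def tokenize(text):
    """Lowercase, strip punctuation, drop stop words, apply a very crude suffix stemmer."""
    stems = []
    for w in re.findall(r"[a-z]+", text.lower()):
        if w in STOP_WORDS:
            continue
        for suffix in ("ing", "ies", "es", "s"):
            if w.endswith(suffix) and len(w) > len(suffix) + 2:
                w = w[: -len(suffix)]
                break
        stems.append(w)
    return stems

def preprocess(corpus, min_count=3):
    """Tokenize each document, then drop terms whose corpus-wide count is below min_count."""
    docs = [tokenize(t) for t in corpus]
    counts = Counter(term for doc in docs for term in doc)
    vocab = sorted(t for t, c in counts.items() if c >= min_count)
    vocab_set = set(vocab)
    return [[t for t in doc if t in vocab_set] for doc in docs], vocab

corpus = ["Clustering groups the documents", "document clusters",
          "a cluster of documents", "markets rise"]
docs, vocab = preprocess(corpus)
```

Here "clustering", "clusters" and "cluster" all collapse to the stem "cluster", and rare terms such as "market" are filtered out by the count < 3 rule.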
2.2. Term weighting
To quantitatively represent a term in a document, we use the term frequency-inverse document frequency (TF-IDF) weighting. It takes into account both the relative frequency of a stem in a document and the frequency of the stem within the corpus. The composite weight w_ij for each term t_i in each document d_j is given by:
w_ij = TF_ij × IDF_i
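As a sketch of this weighting, using the common tf × log(N/df) variant of IDF (the paper does not state which IDF formulation it uses, so this particular form is an assumption):

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists. Weight of term t in doc d: w = tf(t, d) * log(N / df(t))."""
    N = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)               # raw term frequency within the document
        weights.append({t: tf[t] * math.log(N / df[t]) for t in tf})
    return weights

w = tfidf([["apple", "apple", "pear"], ["pear", "plum"], ["plum", "plum"]])
```

A term concentrated in one document ("apple") gets a high weight there, while a term occurring in every document gets an IDF of log(1) = 0 and is effectively ignored.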
2.3. Similarity measure
Several measures of similarity between documents exist in the literature, notably the Euclidean, Manhattan and cosine distances. The cosine distance D, which will be used in our experiments, is computed as:

D(x_i, x_j) = 1 − cos(x_i, x_j)

where cos(x_i, x_j) is the cosine similarity of the two text vectors. The cosine value is 1 when the documents are similar and 0 when they are dissimilar.
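In code, for two term-weight vectors (a minimal sketch):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def cosine_distance(u, v):
    """D = 1 - cos: 0 for documents pointing in the same direction,
    1 for documents sharing no terms (orthogonal vectors)."""
    return 1.0 - cosine_similarity(u, v)
```

Because cosine similarity depends only on direction, two documents with proportional term weights (e.g., one being a doubled copy of the other) have distance 0.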
2.4. Validity index
A clustering evaluation demands an independent and reliable measure for the assessment and comparison of clustering experiments and results. In this research, we focus on six validation measures: F-measure, purity, entropy, normalized mutual information, the Xie-Beni index and the Fukuyama-Sugeno index [29]. The common basis of these indexes is that their computations are all based on a contingency table, which defines the association between two classifications of the same set of individuals E.
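Purity, for instance, can be computed directly from the contingency between predicted clusters and true labels (a sketch; the other indexes follow the same contingency-table pattern):

```python
from collections import Counter

def purity(labels_pred, labels_true):
    """Each cluster is credited with the size of its majority true class;
    purity is the total credited count divided by the number of objects."""
    clusters = {}
    for p, t in zip(labels_pred, labels_true):
        clusters.setdefault(p, []).append(t)
    majority = sum(Counter(members).most_common(1)[0][1]
                   for members in clusters.values())
    return majority / len(labels_pred)
```

A perfect clustering has purity 1.0; a clustering where each cluster is one-third contaminated scores 2/3.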
3. THE PROPOSED METHOD
In the following, we describe the maximum stable set problem and introduce the continuous Hopfield network, which are the main components of our approach. The CHN and the MSSP are then combined to find the initial centroids, or document seeds. We have chosen K-Means as the clustering method to test our approach.
3.1. Maximum stable set problem
We denote by G = (V, E) an undirected graph, where V is the set of nodes and E is the set of edges. A stable set of a graph G is a subset S of V with the property that no pair of nodes in S is connected by an edge of G [30]. The MSSP consists of finding a stable set of maximum cardinality in a graph.
The goal of this paper is to make data mining applications, specifically the clustering of text documents, the motivating application of the maximum stable set. The MSSP is a well-known NP-hard problem in combinatorial optimization, which can be formulated as a quadratic 0-1 program. To solve this latter problem, we use the CHN.
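On a toy graph, stability can be checked directly and the maximum stable set found by exhaustive search. This brute force is feasible only for tiny graphs, which is precisely why the NP-hard general case is delegated to the CHN:

```python
from itertools import combinations

def is_stable(nodes, edges):
    """A node set is stable (independent) iff no edge of the graph joins two of its members."""
    return all((u, v) not in edges and (v, u) not in edges
               for u, v in combinations(nodes, 2))

def max_stable_set(n, edges):
    """Exhaustive search over subsets of {0, ..., n-1}, largest first."""
    for size in range(n, 0, -1):
        for cand in combinations(range(n), size):
            if is_stable(cand, edges):
                return set(cand)
    return set()

# 5-cycle: any three nodes include two neighbors, so the maximum stable set has size 2
c5 = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)}
s = max_stable_set(5, c5)
```

The search space is 2^n subsets, so even n in the hundreds is out of reach for this approach.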
3.2. Continuous hopfield network
At the beginning of the 1980s, Hopfield and Tank [31] introduced the Hopfield neural network. It has been extensively studied and is used efficiently to solve difficult problems [32] such as pattern recognition, model identification, and optimization. The major advantage of the continuous Hopfield network is its structure, which can be built on an energy-function approach by adding the objective and penalizing the constraints in order to solve classification and optimization problems. Talaván and Yáñez [33] showed that when the network reaches a steady state of the differential equation system associated with the CHN, an approximate solution of several optimization problems is obtained. Their results encouraged a number of researchers to apply this network to different problems.
The CHN of size n is a fully connected neural network with n continuous-valued units (or neurons). Following the notation used in [33], let T_ij be the strength of the connection from neuron i to neuron j. The model assumes symmetric weights (T_ij = T_ji) and, in most cases, zero self-coupling terms (T_ii = 0), and each neuron i has an offset bias i_i^b. Let u and x be the current state and the output of neuron i, with i ∈ {1, ..., n}. The system of the CHN is described by the differential equation

du/dt = −u/τ + T x + i^b
where τ is the time constant of the amplifiers and, without loss of generality, can be set to unity. The output function of each neuron is a hyperbolic tangent of the neuron state. The energy function associated with the continuous Hopfield network is defined by

E(x) = −(1/2) x^t T x − (i^b)^t x

Typically, in the CHN, the energy function is initialized to the objective function of the optimization problem, and a penalty term is added for each constraint.
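The dynamics and the energy descent can be illustrated numerically. The two-neuron weights below are an arbitrary toy example chosen for this sketch, not the MSSP mapping used later:

```python
import numpy as np

def energy(x, T, i_b):
    """E(x) = -1/2 x^T T x - (i^b)^T x."""
    return -0.5 * x @ T @ x - i_b @ x

def chn_simulate(T, i_b, u0, dt=0.01, steps=2000, tau=1.0):
    """Euler integration of du/dt = -u/tau + T x + i^b, with output x = tanh(u)."""
    u = np.array(u0, dtype=float)
    for _ in range(steps):
        x = np.tanh(u)
        u += dt * (-u / tau + T @ x + i_b)
    return np.tanh(u)

T = np.array([[0.0, 1.0], [1.0, 0.0]])   # symmetric weights, zero self-coupling
i_b = np.array([0.1, 0.1])               # bias terms
u0 = np.array([0.05, -0.02])             # near-zero initial state

x0 = np.tanh(u0)
x_final = chn_simulate(T, i_b, u0)
# the trajectory settles in a state of strictly lower energy than where it started
```

Starting near the origin, the mutually excitatory pair drives both outputs toward the same sign, and the energy of the final state is below that of the initial state, in line with the steady-state argument of [33].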
3.3. MSSP and CHN to find initial centroids
Suppose that we have a dataset of n documents. We want to divide the dataset into groups such that the members of each group are as similar as possible to one another. The overall flow of the algorithm is shown in Figure 1.
Figure 1. MSSP and CHN to find initial centroids
The specific process of the algorithm is:
- After the preprocessing step, we compute the term-document matrix W.
- Based on the cosine distance between documents, we create a similarity matrix B. The nodes V and edges E of the graph G(V, E) are thus built.
- Then we represent the text document clustering problem as the max stable set problem (MSSP).
- The documents are considered as nodes. To build the edges, we calculate the similarity (cosine distance) between the different nodes v_i: if the similarity b_ij between two documents exceeds ε, we add an edge between them.
- After that, we use a quadratic integer programming formulation (QP) to determine the maximal stable set S associated with this graph.
(QP)  Min F(x) = −∑_{i=1}^{n} x_i
      subject to: x^t C x = 0, x ∈ {0,1}^n

such that, for all i ∈ {1, ..., n}: x_i = 1 if v_i ∈ S, and x_i = 0 otherwise.
C is an n × n symmetric matrix defined by: c_ij = 1 if (v_i, v_j) ∈ E, and c_ij = 0 otherwise.
- The continuous Hopfield network is implemented to solve the QP problem.
- The solution is a binary vector: if the value at position j is 1, the j-th document is selected as a centroid; otherwise it is not selected.
- Finally, we obtain k fully disconnected documents (document seeds).
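The pipeline above can be sketched end to end. For brevity, the sketch replaces the CHN solver with a greedy stable-set heuristic (an assumption made purely for illustration; the paper solves the exact QP with the CHN), and the 0.5 threshold ε is likewise illustrative:

```python
import numpy as np

def cosine_sim_matrix(W):
    """W: documents-by-terms weight matrix; returns the pairwise cosine similarity matrix B."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    U = W / np.where(norms == 0, 1, norms)
    return U @ U.T

def seed_documents(W, eps=0.5):
    """Documents i and j are adjacent when B[i, j] > eps; a stable set of that
    graph is a group of pairwise-dissimilar documents, used as cluster seeds."""
    B = cosine_sim_matrix(W)
    seeds = []
    for i in range(len(W)):
        if all(B[i, j] <= eps for j in seeds):   # greedy: keep i if disconnected from all seeds
            seeds.append(i)
    return seeds

# four documents over four terms: two near-duplicate pairs -> two seeds
W = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 0.9, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 0.9]])
seeds = seed_documents(W)
```

Documents 1 and 3 are near-duplicates of documents 0 and 2, so they are adjacent to the chosen seeds and excluded; one representative per topic remains.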
3.4. K-Means algorithm initialized by MSSP (KM_MSSP)
K-Means is a commonly used clustering method in text clustering; it uses centroids to represent clusters and minimizes the squared errors. The algorithm begins with a predefined set of centroids (which can be produced either randomly or by any other criterion; in this study, by MSSP). It then repeatedly assigns the remaining documents according to their similarity (using the cosine distance) with the centroids, so that each document is assigned to the cluster with the most similar centroid. The algorithm iteratively adjusts the center positions until no pattern is reassigned to a new cluster centroid, the decrease in the squared error becomes minimal, or the number of iterations surpasses a threshold. This produces a separation of the objects into groups over which the metric to be minimized can be calculated. The last step consists of using the solution found by MSSP and CHN as an input parameter of K-Means, as shown in Figure 2, and evaluating the clustering results based on some validity criteria.
Figure 2. The proposed KM_MSSP algorithm
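The seeded K-Means loop described above can be sketched as follows. This is a minimal sketch under the assumption of L2-normalised TF-IDF rows and cosine similarity; the data and names are illustrative, not the paper's implementation.

```python
import numpy as np

def km_seeded(X, seed_idx, max_iter=100):
    """K-Means seeded with given centroid documents (rows seed_idx of X),
    using cosine similarity: assign each document to its most similar
    centroid, recompute centroids, stop when assignments stop changing."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    centroids = X[list(seed_idx)].copy()
    labels = None
    for _ in range(max_iter):
        sims = X @ centroids.T                  # cosine similarity to each centroid
        new_labels = sims.argmax(axis=1)
        if labels is not None and np.array_equal(labels, new_labels):
            break                               # no reassignment: converged
        labels = new_labels
        for k in range(len(centroids)):
            members = X[labels == k]
            if len(members):                    # re-normalise the mean direction
                c = members.mean(axis=0)
                centroids[k] = c / np.linalg.norm(c)
    return labels, centroids

# Four toy documents; seeds 0 and 2 stand in for the MSSP document seeds.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels, _ = km_seeded(X, seed_idx=[0, 2])
print(labels)   # [0 0 1 1]
```

KM_MSSP differs from classic KM only in `seed_idx`: KM draws it at random, KM_MSSP takes the stable-set documents.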
4. RESULTS AND DISCUSSION
In order to demonstrate the efficiency of our approach, we have carried out a series of experiments on instances from the British Broadcasting Corporation (BBC) dataset and the 20NewsGroup dataset. Most of these instances are created by varying the number of classes and the number of documents. In this section we present the dataset description, the results, and the comparison of KM, which uses random centroids, with KM_MSSP, which starts from the 𝑘 centroids found by MSSP.
4.1. Data set description
The BBC dataset consists of 2225 documents from the BBC news website corresponding to stories in five topical areas from 2004-2005. The dataset is classified into five natural classes: business, entertainment, politics, sport, and technology. It is available in [34]. We use three subsets of the documents from the BBC dataset: the first contains 5 topics and 500 documents, the second contains 3 topics and 1328 documents, and the third contains 4 topics and 1715 documents; each document has a single topic label, as shown in Table 1.
The BBC Sport dataset consists of 737 documents from the BBC Sport website corresponding to sports news articles in five topical areas from 2004-2005. The dataset is classified into five natural classes: athletics, cricket, football, rugby, and tennis. It is available in [34]. We use one subset of the documents from the BBC Sport dataset, which contains 3 topics and 372 documents.
20NewsGroup is a collection of approximately 20,000 newsgroup documents, partitioned into 20 groups. We selected a subset of this dataset containing 400 documents from four categories. The 20 newsgroups collection is a popular dataset for experiments in text classification and text clustering.
Indonesian J Elec Eng & Comp Sci, Vol. 25, No. 1, January 2022: 569-579
4.2. Experimental results
Simulation experiments were run to show the advantages of the improved KM_MSSP and to draw scientific remarks on its adaptability. The similarity parameter 𝜀 was determined through several tests and is fixed at 𝜀 = 1. As shown in Table 1, dataset BBC_2225 contains 2225 documents organized into 5 different classes; in the fourth column (number of clusters obtained by MSSP), our method gives 6 clusters for BBC_2225 and 5 clusters for BBC_1715, which are very close to the value of the third column (real number of classes). For BBC_500, BBC_1328, BBC_Sport1, and BBC_Sport2, the solution is equal to the real number of classes. Hence, we conclude that for each dataset our method gives a result that is very close or equal to the real number of classes. In column 5, we observe that the initial centroids proposed by MSSP for each instance of the dataset are very dissimilar and belong to different groups. The execution time is very limited; it varies with the size of the dataset, as shown in column 6.
To examine the quality of our approach, a statistical study was carried out based on the performance ratio:

    Ratio = (number of classes obtained by MSSP) / (real number of classes existing in the literature)

In this context, the minimum ratio is always greater than or equal to 1, so KM_MSSP gives an upper bound on the real number of classes. Moreover, if the minimum ratio equals 1, then KM_MSSP has found the real number of classes existing in the literature.
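A trivial sketch checking the ratio for individual rows (the values below are taken from Table 1):

```python
# Ratio = clusters found by MSSP / real number of classes.
def ratio(clusters_by_mssp, real_classes):
    return clusters_by_mssp / real_classes

print(ratio(6, 5))   # BBC_2225: 1.2, an upper bound on the real number of classes
print(ratio(5, 4))   # BBC_1715: 1.25
print(ratio(3, 3))   # BBC_1328: 1.0, the real number of classes is recovered
```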
Table 1. Initial centroids obtained by KM_MSSP out of 20 runs

Dataset     | Documents | Real classes | Clusters obtained | Ratio (mode / mean / min) | Initial centroids (cluster: document) | CPU time (s)
BBC_News
BBC_2225    | 2225 | 5 | 6 | 1.2 / 1.19 / 1.2    | sport: 199.txt; politics: 397.txt; sport: 324.txt; entertainment: 243.txt; tech: 334.txt; business: 280.txt | 4.2
BBC_1328    | 1328 | 3 | 3 | 1 / 1.1 / 1         | business: 394.txt; politics: 397.txt; tech: 284.txt | 1.2
BBC_500     | 500  | 5 | 5 | 1 / 1.1 / 1         | sport: 018.txt; business: 080.txt; entertainment: 092.txt; politics: 074.txt; tech: 039.txt | 0.18
BBC_1715    | 1715 | 4 | 5 | 1.25 / 1.25 / 1.25  | sport: 191.txt; sport: 324.txt; politics: 397.txt; entertainment: 154.txt; tech: 284.txt | 0.06
BBC_Sport
BBC_Sport1  | 737  | 5 | 5 | 1 / 1 / 1           | football: 263.txt; athletics: 026.txt; football: 147.txt; cricket: 077.txt; rugby: 067.txt | 0.01
BBC_Sport2  | 372  | 3 | 3 | 1 / 1 / 1           | rugby: 041.txt; cricket: 104.txt; athletics: 019.txt | 0.002
20NewsGroup
20NG_400    | 400  | 4 | 4 | 1 / 1 / 1           | misc.forsale: 76027.txt; comp.graphics: 38464.txt; rec.autos: 103209.txt; rec.motorcycles: 104297.txt | 0.16
4.3. Comparison between classic K-Means (KM) and K-Means improved (KM_MSSP)
In order to demonstrate the efficiency of our approach, we compare KM, which uses random centroids, with KM_MSSP, which starts from the 𝑘 centroids found by MSSP, using six validity measures: F-measure, purity, entropy, Xie-Beni index, Fukuyama-Sugeno index, and normalized mutual information (NMI). The comparison between KM and KM_MSSP is carried out on the same number of clusters found by MSSP and under the same environment, measuring the cosine distance between each element representation and the centroid of its cluster. The only difference is that KM relies on a random choice of centers, whereas KM_MSSP starts with the centroids found by MSSP.
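Two of these measures, purity and entropy, can be computed from the predicted and true labels as follows. This is a sketch using the standard definitions (the paper does not state its exact formulas), and the toy labels are hypothetical.

```python
import numpy as np
from collections import Counter

def purity(true, pred):
    """Fraction of documents that fall in the majority true class of their cluster."""
    total = 0
    for k in set(pred):
        members = [t for t, p in zip(true, pred) if p == k]
        total += Counter(members).most_common(1)[0][1]
    return total / len(true)

def entropy(true, pred):
    """Weighted average over clusters of the entropy (base 2) of the
    class distribution inside each cluster; lower is better."""
    n, e = len(true), 0.0
    for k in set(pred):
        members = [t for t, p in zip(true, pred) if p == k]
        probs = np.array(list(Counter(members).values())) / len(members)
        e += len(members) / n * -(probs * np.log2(probs)).sum()
    return e

true = ['sport', 'sport', 'politics', 'politics', 'tech', 'tech']
pred = [0, 0, 1, 1, 1, 2]    # cluster 1 mixes politics and tech
print(round(purity(true, pred), 3))    # 0.833
print(round(entropy(true, pred), 3))   # 0.459
```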
4.3.1. F-measure comparison of KM and the proposed KM_MSSP algorithm
As can be seen from Figure 3, the average F-measure of the KM_MSSP clustering algorithm is 94% on dataset BBC_1328, 92% on dataset BBC_2225, 86% on BBC_500, 81% on BBC_1715, 68% on BBC_Sport1, and 96% on BBC_Sport2. This is higher than the average F-measure of KM clustering: 81% on BBC_1328, 75% on dataset BBC_2225, and 73% on BBC_500. We therefore observe an increase of the F-measure values of KM_MSSP compared with those of KM on all available datasets.
Figure 3. F-measure comparison of KM and the proposed KM_MSSP algorithm
4.3.2. Purity comparison of KM and the proposed KM_MSSP algorithm
A comparison of purity between KM and KM_MSSP is depicted in Figure 4, which shows that KM_MSSP outperforms KM in terms of purity on all available datasets. The purity of KM_MSSP is 94% on dataset BBC_1328, 92% on dataset BBC_2225, and 86% on BBC_500, which is higher than the purity of KM: 81% on BBC_1328, 72% on dataset BBC_2225, and 66% on BBC_500.
Figure 4. Purity comparison of KM and the proposed KM_MSSP algorithm
4.3.3. Entropy comparison of KM and the proposed KM_MSSP algorithm
We also observe a decline of the entropy values of KM_MSSP compared with those of KM, as shown in Figure 5. The entropy of KM_MSSP is 32%, which is lower than the entropy of KM clustering, 61%, on the BBC_2225 dataset. Thus KM_MSSP outperforms KM in terms of entropy on all available datasets.
4.3.4. NMI comparison of KM and the proposed KM_MSSP algorithm
A comparison of NMI between KM and KM_MSSP is depicted in Figure 6, which shows that KM_MSSP outperforms KM in terms of NMI on all available datasets. The NMI of KM_MSSP is 78% on datasets BBC_1328 and BBC_2225, 63% on BBC_500, and 85% on BBC_Sport2, which is higher than the NMI of K-Means: 51% on BBC_500 and dataset BBC_1328, 77% on BBC_2225, and 64% on BBC_Sport2.
Figure 5. Entropy comparison of KM and the proposed KM_MSSP algorithm
Figure 6. NMI comparison of KM and the proposed KM_MSSP algorithm
4.3.5. Time comparison of KM and the proposed KM_MSSP algorithm
The comparisons of CPU time on all datasets are shown in Figure 7. According to the graph, KM_MSSP spends much less time than KM on all datasets. Thus, our approach improves KM in terms of CPU time.
Figure 7. Time comparison of KM and the proposed KM_MSSP algorithm
4.3.6. Comparing Xie-Beni Index (XB) of KM and the proposed KM_MSSP algorithm
A comparison of the Xie-Beni index (XB) between KM and KM_MSSP is depicted in Table 2, which shows that KM_MSSP outperforms KM in terms of the XB index on all available datasets. The XB of KM_MSSP is 3% on dataset BBC_1328, 7% on dataset BBC_2225, and 8% on BBC_500, which is lower than the XB of KM: 45% on BBC_1328, 47% on dataset BBC_2225, and 52% on BBC_500.
Table 2. Comparing Xie-Beni index (XB) of KM and the proposed KM_MSSP algorithm
Dataset KM KM_MSSP
BBC_500 0.52 0.08
BBC_1328 0.47 0.03
BBC_2225 0.45 0.07
BBC_1715 0.50 0.07
BBC_Sport1 0.53 0.09
BBC_Sport2 0.52 0.06
20NG_400 1.08 0.17
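For a hard partition, the Xie-Beni index [19] reduces to the form below. This is a sketch under the assumption of crisp (0/1) memberships; the toy data are hypothetical.

```python
import numpy as np

def xie_beni(X, labels, centroids):
    """Xie-Beni index for a hard partition: total squared distance of points
    to their own centroid, divided by n times the minimum squared distance
    between any two centroids. Lower = compact and well-separated clusters."""
    compact = sum(np.sum((X[labels == k] - c) ** 2)
                  for k, c in enumerate(centroids))
    sep = min(np.sum((ci - cj) ** 2)
              for i, ci in enumerate(centroids)
              for j, cj in enumerate(centroids) if i < j)
    return compact / (len(X) * sep)

# Two tight, well-separated toy clusters -> a small XB value.
X = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.2, 1.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.1, 0.0], [1.1, 1.0]])
print(round(xie_beni(X, labels, centroids), 3))   # 0.005
```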
4.3.7. Comparing Fukuyama-Sugeno index of KM and the proposed KM_MSSP algorithm
From Table 3 we can see that there is a decline of the Fukuyama-Sugeno index (FS) values of KM_MSSP compared with those of KM. The FS index of KM_MSSP is 11%, which is lower than the FS of KM clustering, 17%, on the dataset_24. Thus KM_MSSP outperforms KM in terms of the FS index on all available datasets.
Table 3. Comparing Fukuyama-Sugeno index of KM and the proposed KM_MSSP algorithm
Dataset KM KM_MSSP
BBC_500 0.87 0.42
BBC_1328 0.55 0.13
BBC_2225 0.75 0.4
BBC_1715 0.38 0.21
BBC_Sport1 0.20 0.16
BBC_Sport2 0.37 0.24
20NG_400 0.53 0.48
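For a hard partition, the Fukuyama-Sugeno index [29] reduces to a compactness term minus a separation term, as sketched below under the same crisp-membership assumption; the toy data are hypothetical.

```python
import numpy as np

def fukuyama_sugeno(X, labels, centroids):
    """Fukuyama-Sugeno index for a hard partition: within-cluster scatter
    minus the (size-weighted) scatter of centroids around the grand mean.
    Smaller values indicate better clustering."""
    xbar = X.mean(axis=0)
    fs = 0.0
    for k, c in enumerate(centroids):
        members = X[labels == k]
        fs += np.sum((members - c) ** 2)              # compactness term
        fs -= len(members) * np.sum((c - xbar) ** 2)  # separation term
    return fs

# Same toy clustering as for the Xie-Beni sketch.
X = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.2, 1.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.1, 0.0], [1.1, 1.0]])
print(round(fukuyama_sugeno(X, labels, centroids), 2))   # -1.96
```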
It is clearly observed that the results of KM_MSSP are far better on all datasets in terms of entropy, purity, F-measure, NMI, and CPU time. KM_MSSP begins with a number of clusters (the number obtained by MSSP) that is very close or equal to the real number of classes, and starts with initial cluster centroids selected by MSSP (which guarantees an independent set of documents). In contrast, KM obtains the initial centers by a random method, which gives unstable clustering results and too many iterations; this affects the quality of the clustering and costs a great amount of time during the clustering process.
4.4. Comparison between KM_MSSP and another deterministic method
In order to demonstrate the effectiveness of our approach, we compare it with DSKM [26] in terms of normalized mutual information (NMI) on dataset BBC_2225. As shown in Table 4, the comparison between KM_MSSP and DSKM was done on the same number of clusters found by MSSP (6 for dataset BBC_2225, see Table 1). The NMI of KM_MSSP is greater than that of the DSKM algorithm.
Table 4. Comparing clustering NMI score between KM_MSSP and DSKM on dataset BBC_2225 and
number of clusters equal to number founded by MSSP (k=6)
Methods NMI
DSKM 0.681
KM_MSSP 0.78
5. CONCLUSION
This research aims to develop a method for automatically determining both the number of clusters and the initial cluster centroids, which are required as prior knowledge by classic K-Means and other clustering methods. To achieve this goal and obtain the highest clustering quality, the MSSP is generalized and applied to the clustering of text documents using the CHN. The method is independent of the clustering algorithm and can seed any method that starts with k centroids. Experimental results show that our method can effectively find the optimal number of clusters, find a correct set of centroids, obtain better clustering results in a short time, and easily handle a large number of documents. To demonstrate the efficiency of our approach, we compare classic K-Means, which uses random centroids, with KM_MSSP, which starts from the K centroids found by MSSP, using six validity measures: F-measure, purity, entropy, XB, FS, and NMI.
Thus, KM_MSSP outperforms KM on all available datasets; for example, on dataset BBC_1328 the purity and F-measure are 94%, NMI is 78%, entropy is 22%, and the time is 1.63 s, against K-Means, for which the purity and F-measure are 81%, NMI is 51%, entropy is 53%, and the time is 5.01 s. The field of MSSP is still open to many challenges that provide future scope for improvement in the document-clustering problem. Future work includes: (i) applying our approach to other areas, (ii) improving the CHN by combining it with a metaheuristic to obtain a global minimum, and (iii) using deep learning concepts to improve the pre-processing step.
REFERENCES
[1] J. Han, J. Pei, and M. Kamber, “Data Mining Concepts and Techniques,” Morgan Kauffman Publishers, 3rd ed, USA, 2012.
[2] D. Larose, “Discovering knowledge in data: an introduction to data mining,” John Wiley & Sons, Inc., Hoboken, New Jersey,
2014.
[3] M. Dumont, P.-A. Reninger, A. Pryet, G. Martelet, B. Aunay, and J.-L. Join, “Agglomerative hierarchical clustering of airborne
electromagnetic data for multi-scale geological studies,” Journal of Applied Geophysics, vol. 157, pp. 1-9, Oct. 2018,
doi: 10.1016/j.jappgeo.2018.06.020.
[4] S. Bandyapadhyay and K. R. Varadarajan, “On variants of k-means clustering,” arXiv preprint arXiv:1512.02985, 2015.
[5] Md. A. B. Siddique, R. B. Arif, M. M. R. Khan, and Z. Ashrafi, “Implementation of Fuzzy C-Means and Possibilistic C-Means
Clustering Algorithms, Cluster Tendency Analysis and Cluster Validation,” Preprints, 2018, doi: 10.20944/preprints201811.0581.v1.
[6] H. S. Park and C. H. Jun, “A simple and fast algorithm for K-medoids clustering,” Expert SystAppl, vol. 36, no. 2, pp. 3336-3341,
2009, doi: 10.1016/j.eswa.2008.01.039.
[7] L. Kaufman and P. J. Rousseeuw, “Finding groups in data: An introduction to cluster analysis,” Wiley, Hoboken, vol. 344, 2008,
doi: 10.1002/9780470316801.
[8] R. T. Ng and J. Han, “Clarans: a method for clustering objects for spatial data mining,” IEEE Transactions on Knowledge and
Data Engineering, vol. 14, no. 5, pp. 1003-1016, Sep. 2002, doi: 10.1109/TKDE.2002.1033770.
[9] J. Wu, “Advances in K-Means Clustering: A Data Mining Thinking,” Springer Theses, Springer Berlin Heidelberg, 2012.
[10] W. Ashour and C. Fyfe, “Improving Bregman K-Means,” International Journal of Data Mining, Modelling and Management,
vol. 6, no. 1, pp. 65-82, Jan. 2014, doi: 10.1504/IJDMMM.2014.059981.
[11] L. I. Kuncheva and J. C. Bezdek, “Selection of cluster prototypes from data by a genetic algorithm,” in: Proc. 5th European
Congress on Intelligent Techniques and Soft Computing (EUFIT), Aachen, Germany, vol. 18, 1997, pp. 1683-1688.
[12] J. C. Bezdek and R. J. Hathaway, “Optimization of fuzzy clustering criteria using genetic algorithms,” in Proc. First IEEE Conf.
Evolutionary Computation, Piscataway, NJ: IEEE Press, 1994, pp. 589-594, doi: 10.1109/ICEC.1994.349993.
[13] B. Kövesi, J.-M. Boucher, and S. Saoudi, “Stochastic K-means algorithm for vector quantization,” Pattern Recognit. Lett., vol. 22,
no. 6-7, pp. 603-610, 2001, doi: 10.1016/s0167-8655(01)00021-6.
[14] M. Sarkar, B. Yegnanarayana, and D. Khemanial, “A clustering algorithm using an evolutionary-based approach,” Pattern
Recognition Letters, vol. 18, no. 10, pp. 975-986, 1997, doi: 10.1016/S0167-8655(97)00122-0.
[15] D. J. Ketchen and C. l. Shook, “The application of cluster analysis in strategic management research: an analysis and critique,”
Strategic Management Journal, vol. 17, no. 6, pp. 441-458, Jun 1996, doi: 10.1002/(sici)1097-0266(199606)17:6<441::aid-
smj819>3.0.co;2-g.
[16] A. J.-Garcia and W. G.-Flores, “Automatic clustering using nature-inspired metaheuristics: A survey,” Applied Soft Computing,
vol. 41, pp. 192-213, Apr. 2016, doi: 10.1016/j.asoc.2015.12.001.
[17] S. Das, A. Abraham, and A. Konar, “Spatial information-based image segmentation using a modified particle swarm optimization
algorithm,” ISDA '06: Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications, Oct.
vol. 2, 2006, pp. 438-444, doi: 10.1109/ISDA.2006.253877.
[18] R. Kuo, Y. Syu, Z.-Y. Chen, and F.-C. Tien, “Integration of particle swarm optimization and genetic algorithm for dynamic
clustering,” Information Sciences, vol. 195, pp. 124-140, Jul. 2012, doi: 10.1016/j.ins.2012.01.021.
[19] X. L. Xie and G. Beni, “A validity measure for fuzzy clustering,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 13, no. 8,
pp. 841-847, Aug. 1991, doi: 10.1109/34.85677.
[20] C. Subbalakshmi, G. Krishna, K. Rao, and P. Rao, “A method to find optimum number of clusters based on fuzzy silhouette on
dynamic data set,” Procedia Computer Science, 2015, vol. 46, pp. 346-353, doi: 10.1016/j.procs.2015.02.030.
[21] X. Tong, F. Meng, and Z. Wang, “Optimization to K-Means initial cluster centers,” Computer Engineering and Design, vol. 32,
no. 8, pp. 2721-2723, 2011.
[22] Q. S.-Yi, L. H.-Hui, and L. D.-Yi, “Research and Application of Improved K-Means Algorithm in Text Clustering,” DEStech
Transactions on Computer Science and Engineering (pcmm), Jun. 2018, doi: 10.12783/dtcse/pcmm2018/23653.
[23] S. Karol and V. Mangat, “Evaluation of text document clustering approach based on particle swarm optimization,” Proceedings of
Central European Journal of Computer Science, vol. 3, no. 2, Jun. 2013, pp. 69-90, doi: 10.2478/s13537-013-0104-2.
[24] J. Song, X. Li, and Y. Liu, “An Optimized K-Means Algorithm for Selecting Initial Clustering Centers,” International Journal of
Security and Its Applications, vol. 9, no. 10, pp. 177-186, 2015, doi: 10.14257/ijsia.2015.9.10.16.
[25] M. Huang, Z. S. He, X. L. Xing, and Y. Chen, “New K-Means clustering center select algorithm,” Computer Engineering and
Applications, vol. 47, no. 35, pp. 132-134, 2011.
[26] E. Sherkat, J. Velcin, and E. E. Milios, “Fast and Simple Deterministic Seeding of K-Means for Text Document Clustering,” 9th
Conference and Labs of the Evaluation Forum (CLEF), Proceedings, Sep. 2018, pp. 76-88, doi: 10.1007/978-3-319-98932-7_7.
[27] A. Karim, C. Loqman, and J. Boumhidi, “Determining the Number of Clusters using Neural Network and Max Stable Set
Problem,” The 1st. Int. Conf. On Intelligent Computing in Data Sciences. Procedia Computer Science, 2018, vol. 127, pp. 16-25,
doi: 10.1016/j.procs.2018.01.093.
[28] M. R. Aliguliyev, “Clustering of document collection, A weighting approach,” Expert Systems with Applications, vol. 36, no. 4,
pp. 7904-7916, 2009, doi: 10.1016/j.eswa.2008.11.017.
[29] Y. Fukuyama and M. Sugeno, “A new method of choosing the number of clusters for the fuzzy c-means method,” in: Proc. 5th
Fuzzy Syst. Symp., Jan. 1989, pp. 247-250.
[30] Y. Chung and M. Demange, “The 0-1 inverse maximum stable set problem,” Discrete Applied Mathematics, vol. 156, no. 13,
pp. 2501-2516, 2008, doi: 10.1016/j.dam.2008.03.015.
[31] J. Hopfield and D. Tank, “Neural computation of decisions in optimization problems,” Biological Cybernetics, vol. 52, no. 3,
pp. 1-25, 1985, doi: 10.1007/BF00339943.
[32] Y. Hami and C. Loqman, “Quadratic Convex Reformulation for Solving Task Assignment Problem with Continuous Hopfield
Network,” IJCIA, vol. 20, no. 5, Dec. 2021, doi: 10.1142/S1469026821500243.
[33] P. Talaván and J. Yáñez, “The generalized quadratic knapsack problem. A neuronal network approach,” Neural Networks, vol. 19,
no. 14, pp. 416-428, 2006, doi: 10.1016/j.neunet.2005.10.008.
[34] D. Greene and P. Cunningham, “Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering,”
Proc. ICML 2006, pp. 377-384, doi: 10.1145/1143844.1143892.
BIOGRAPHIES OF AUTHORS
Awatif Karim received the diploma of engineer in networks and computer systems from Cadi Ayyad University, Morocco, in 2010. She has been a Ph.D. student at Sidi Mohamed Ben Abdellah University, Morocco, since 2014. Her research interests include artificial intelligence and data mining. She can be contacted at email: awatif.karim@usmba.ac.ma.
Chakir Loqman is a Professor of Computer Science at the Faculty of Sciences, Fez, Morocco. He received the PhD in Computer Science from the University of Sidi Mohamed Ben Abdellah. His research interests include image processing, artificial intelligence, and optimization. He can be contacted at email: loqman.chakir@usmba.ac.ma.
Youssef Hami is a Professor at the National School of Applied Sciences, Tangier, Morocco. He received his Ph.D. in operational research and computer science from the University of Sidi Mohamed Ben Abdellah in 2014. His research interests include operational research, computer science, and optimization. He can be contacted at email: yhami@uae.ac.ma.
Jaouad Boumhidi is a Professor of Computer Science at the Faculty of Sciences, Fez, Morocco. He received the PhD in Computer Science from the University of Sidi Mohamed Ben Abdellah in 2005. His research interests are in machine learning, deep learning, and intelligent transportation systems. He can be contacted at email: jaouad.boumhidi@usmba.ac.ma.