Clustering, as an unsupervised learning method, is the task of dividing data objects into clusters that share common characteristics. In this paper, we introduce an enhanced version of the existing EPCA data-transformation method. By incorporating a kernel function into EPCA, the input space can be mapped implicitly into a high-dimensional feature space. Shannon's entropy, estimated via the inertia contributed by every mapped data object, is then the key measure for determining the optimal extracted feature space. Our proposed method works very well with the clustering algorithm based on fast search of cluster centers through local density computation. Experimental results show that the approach is feasible and efficient with respect to query performance.
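To make the entropy criterion concrete, the following is a minimal sketch, assuming a Gaussian kernel and treating the normalized eigenvalue spectrum of the centered kernel matrix as the inertia contributions over which Shannon's entropy is computed; the exponential-of-entropy cutoff is one illustrative heuristic, not necessarily the authors' exact rule.

    import numpy as np

    def kernel_entropy_dims(X, gamma=1.0):
        # Gaussian kernel matrix over all pairs of input objects.
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-gamma * sq)
        # Double-center K so its eigenvalues measure feature-space inertia.
        n = len(X)
        J = np.eye(n) - np.ones((n, n)) / n
        lam = np.linalg.eigvalsh(J @ K @ J)[::-1]
        p = np.clip(lam, 0, None)
        p = p / p.sum()
        # Shannon entropy of the inertia shares; exp(H) acts as an effective
        # number of components to keep for the extracted feature space.
        H = -(p[p > 0] * np.log(p[p > 0])).sum()
        return int(np.ceil(np.exp(H)))

    X = np.random.rand(100, 5)
    print(kernel_entropy_dims(X, gamma=0.5))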
In the machine learning community there is a trend of constructing nonlinear versions of linear algorithms through the 'kernel method', for example kernel principal component analysis, kernel Fisher discriminant analysis, support vector machines (SVMs), and recent kernel clustering algorithms. Typically, in unsupervised kernel clustering algorithms, a nonlinear mapping is first applied to map the data into a much higher-dimensional feature space, and clustering is then executed there. A drawback of these kernel clustering algorithms is that the cluster prototypes reside in the high-dimensional feature space and therefore lack intuitive, clear descriptions unless an additional approximate projection from the feature space back to the data space is used, as done in the presented literature. This paper utilizes the 'kernel method' differently: a novel clustering algorithm founded on the conventional fuzzy c-means algorithm (FCM) is proposed, known as the kernel fuzzy c-means algorithm (KFCM). This method adopts a novel kernel-induced metric in the data space to replace the original Euclidean norm, so the cluster prototypes still reside in the data space and the clustering results can be interpreted and reformulated in the original space. This property is also used for clustering incomplete data. Experiments on synthetic data illustrate that KFCM achieves better clustering performance and is more robust than other variants of FCM for clustering incomplete data.
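With a Gaussian kernel, K(x, x) = 1 and the kernel-induced distance reduces to d^2(x, v) = 2(1 - K(x, v)); a minimal sketch of KFCM updates of this form follows, with prototypes kept in the data space as the abstract emphasizes (parameter choices and the convergence test are illustrative):

    import numpy as np

    def kfcm(X, c, m=2.0, sigma=1.0, iters=100, tol=1e-5, seed=0):
        # Kernel fuzzy c-means with a Gaussian kernel (sketch). Prototypes V
        # stay in the input space; only the distances are kernelized.
        rng = np.random.default_rng(seed)
        V = X[rng.choice(len(X), c, replace=False)]
        for _ in range(iters):
            sq = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)
            K = np.exp(-sq / sigma**2)                 # n x c kernel values
            d = np.maximum(1.0 - K, 1e-12)             # proportional to kernel distance
            U = d ** (-1.0 / (m - 1))
            U /= U.sum(axis=1, keepdims=True)          # fuzzy memberships, rows sum to 1
            W = (U ** m) * K                           # kernel-weighted memberships
            V_new = (W.T @ X) / W.sum(axis=0)[:, None] # prototypes remain in data space
            if np.abs(V_new - V).max() < tol:
                V = V_new
                break
            V = V_new
        return U, V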
System for Prediction of Non Stationary Time Series based on the Wavelet Radi... (IJECEIAES)
This document proposes and examines the performance of a hybrid time series forecasting model called the wavelet radial basis function neural network (WRBFNN). The WRBFNN model uses wavelet transforms to extract coefficients from time series data as inputs to a radial basis function neural network for forecasting. Its performance is compared to a wavelet feed-forward neural network (WFFNN) model on four types of non-stationary time series data. Results show the WRBFNN model achieves more accurate forecasts than the WFFNN model for certain types of non-stationary data and trains significantly faster, with fewer computational resources required.
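A rough sketch of the hybrid idea, assuming PyWavelets for the decomposition and a least-squares-fitted RBF layer as a stand-in for full network training; the window size, wavelet, width, and center selection are illustrative choices, not the paper's configuration.

    import numpy as np
    import pywt

    def wrbfnn_forecast(series, window=16, n_centers=10, width=5.0, seed=0):
        # Wavelet coefficients of each sliding window are the network inputs;
        # the value right after the window is the training target.
        feat = lambda w: np.concatenate(pywt.wavedec(w, "db2"))
        F = np.array([feat(series[i:i + window]) for i in range(len(series) - window)])
        y = series[window:]
        # RBF hidden layer: random training windows act as centers; the output
        # weights are fit by least squares (a stand-in for full training).
        rng = np.random.default_rng(seed)
        C = F[rng.choice(len(F), n_centers, replace=False)]
        act = lambda A: np.exp(-((A[:, None, :] - C[None]) ** 2).sum(-1) / width**2)
        w = np.linalg.lstsq(act(F), y, rcond=None)[0]
        # One-step-ahead forecast from the most recent window.
        return (act(feat(series[-window:])[None]) @ w).item()

    t = np.linspace(0, 8 * np.pi, 400)
    series = np.sin(t) + 0.1 * np.random.randn(400)
    print(wrbfnn_forecast(series))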
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne... (Scientific Review)
The Radial Basis Probabilistic Neural Network (RBPNN) has broad generalization capability and has been successfully applied in multiple fields. In this paper, the Euclidean distance of each data point in the RBPNN is extended by calculating its kernel-induced distance instead of the conventional sum-of-squares distance. The kernel function generalizes the distance metric by measuring the distance between two data points as if they were mapped into a high-dimensional space. Comparing the four constructed classification models (Kernel RBPNN, radial basis function networks, RBPNN, and back-propagation networks), the results show that classification of the Iris data with Kernel RBPNN displays outstanding performance.
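Concretely, the kernel-induced distance replaces the sum-of-squares distance using only kernel evaluations; a tiny sketch with a Gaussian kernel:

    import numpy as np

    def kernel_induced_dist2(x, y, k):
        # Squared distance between the images of x and y in feature space,
        # expressed purely through kernel evaluations:
        # ||phi(x) - phi(y)||^2 = k(x, x) + k(y, y) - 2 * k(x, y).
        return k(x, x) + k(y, y) - 2.0 * k(x, y)

    gauss = lambda a, b, s=1.0: np.exp(-np.sum((a - b) ** 2) / (2 * s**2))
    x, y = np.array([0.0, 1.0]), np.array([1.0, 0.5])
    # This quantity stands in for the Euclidean distance inside the RBPNN.
    print(kernel_induced_dist2(x, y, gauss))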
Density Based Clustering Approach for Solving the Software Component Restruct... (IRJET Journal)
This document presents research on using the DBSCAN clustering algorithm to solve the problem of software component restructuring. It begins with an abstract that introduces DBSCAN and describes how it can group related software components. It then provides background on software component clustering and describes DBSCAN in more detail. The methodology section outlines the 4 phases of the proposed approach: data collection and processing, clustering with DBSCAN, visualization and analysis, and final restructuring. Experimental results show that DBSCAN produces more evenly distributed clusters compared to fuzzy clustering. The conclusion is that DBSCAN is a better technique for software restructuring as it can identify clusters of varying shapes and sizes without specifying the number of clusters in advance.
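As a minimal illustration of the clustering phase, assuming components have already been encoded as numeric dependency-style feature vectors (the features here are placeholders, not the paper's):

    import numpy as np
    from sklearn.cluster import DBSCAN

    # Placeholder feature matrix: one row per software component, where
    # columns could encode call/dependency counts between modules.
    components = np.random.rand(60, 8)

    # eps and min_samples set the neighborhood radius and density threshold;
    # the number of clusters is not specified in advance.
    labels = DBSCAN(eps=0.5, min_samples=4).fit_predict(components)
    print(set(labels))   # -1 marks noise points; other labels are cluster ids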
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING: FINDING ALL THE POTENTIAL MI... (IJDKP)
Quantum clustering (QC) is a data clustering algorithm based on quantum mechanics, accomplished by substituting each point in a given dataset with a Gaussian. The width of the Gaussian is a σ value, a hyper-parameter which can be manually defined and manipulated to suit the application. Numerical methods are used to find all the minima of the quantum potential, as they correspond to cluster centers. Herein, we investigate the mathematical task of expressing and finding all the roots of the exponential polynomial corresponding to the minima of a two-dimensional quantum potential. This is an outstanding problem because such expressions are normally impossible to solve analytically. However, we prove that if the points are all included in a square region of size σ, there is only one minimum. This bound is not only useful for limiting the number of solutions to search for by numerical means; it also allows us to propose a new numerical approach "per block". This technique decreases the number of particles by approximating some groups of particles with weighted particles. These findings are useful not only for the quantum clustering problem but also for the exponential polynomials encountered in quantum chemistry, solid-state physics, and other applications.
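For reference, the potential whose minima are sought has a closed form in the standard QC construction; a sketch that evaluates it directly (up to the additive constant E):

    import numpy as np

    def quantum_potential(x, data, sigma):
        # psi is a Parzen-window sum of Gaussians of width sigma; V follows
        # from the Schrodinger equation (-sigma^2/2 * Laplacian + V) psi = E psi:
        #   V(x) = E - d/2 + sum_i ||x - x_i||^2 exp(-||x - x_i||^2 / 2 sigma^2)
        #                    / (2 sigma^2 * psi(x))
        # Cluster centers correspond to the minima of V.
        d = data.shape[1]
        sq = ((x - data) ** 2).sum(axis=1)
        w = np.exp(-sq / (2 * sigma**2))
        return -d / 2 + (sq * w).sum() / (2 * sigma**2 * w.sum())

    data = np.vstack([np.random.randn(30, 2), np.random.randn(30, 2) + 5])
    vals = [quantum_potential(np.array(p), data, sigma=1.0)
            for p in [(0.0, 0.0), (5.0, 5.0), (2.5, 2.5)]]
    print(vals)   # lower near the two cluster centers than between them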
Experimental study of Data clustering using k-Means and modified algorithms (IJDKP)
The k-Means clustering algorithm is an old algorithm that has been intensely researched owing to its simplicity of implementation. Clustering algorithms have broad appeal and usefulness in exploratory data analysis. This paper presents results of an experimental study of different approaches to k-Means clustering, comparing results on different datasets using the original k-Means and other modified algorithms implemented in MATLAB R2009b. The results are reported on several performance measures, such as number of iterations, number of points misclassified, accuracy, silhouette validity index, and execution time.
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST... (acijjournal)
Cluster analysis of graph-related problems is an important issue nowadays. Different types of graph clustering techniques have appeared in the field, but most of them are vulnerable in terms of effectiveness and fragmentation of output in real-world applications across diverse systems. In this paper, we provide a comparative behavioural analysis of the RNSC (Restricted Neighbourhood Search Clustering) and MCL (Markov Clustering) algorithms on power-law distribution graphs. RNSC is a graph clustering technique using stochastic local search; it tries to achieve an optimal-cost clustering by assigning cost functions to the set of clusterings of a graph. This algorithm was implemented by A. D. King only for undirected and unweighted random graphs. The other popular graph clustering algorithm, MCL, is based on a stochastic flow simulation model for weighted graphs. There are plentiful applications of power-law or scale-free graphs in nature and society. Scale-free topology is stochastic, i.e. nodes are connected in a random manner. Complex network topologies such as the World Wide Web, the web of human sexual contacts, or the chemical network of a cell essentially follow a power-law distribution when representing different real-life systems. This paper uses real large-scale power-law distribution graphs to analyze the performance of RNSC compared with the Markov clustering (MCL) algorithm. Extensive experimental results on several synthetic and real power-law distribution datasets reveal the effectiveness of our approach to comparing these algorithms on the basis of cost of clustering, cluster size, modularity index of clustering results, and normalized mutual information (NMI).
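MCL itself is compact enough to sketch: it alternates expansion (matrix squaring, spreading flow along paths) with inflation (elementwise powering and column renormalization, strengthening strong flows) on a column-stochastic matrix:

    import numpy as np

    def mcl(adj, inflation=2.0, iters=50, tol=1e-6):
        # Markov Clustering (sketch): iterate expansion and inflation on the
        # column-stochastic transition matrix until it stops changing.
        M = adj + np.eye(len(adj))            # self-loops stabilize the iteration
        M = M / M.sum(axis=0, keepdims=True)  # column-stochastic flow matrix
        for _ in range(iters):
            prev = M.copy()
            M = M @ M                         # expansion: flow spreads along paths
            M = M ** inflation                # inflation: favor strong flows
            M = M / M.sum(axis=0, keepdims=True)
            if np.abs(M - prev).max() < tol:
                break
        # Attractor rows with remaining mass define the clusters.
        return [np.flatnonzero(row > 1e-8) for row in M if row.sum() > 1e-8]

    adj = np.array([[0, 1, 1, 0, 0], [1, 0, 1, 0, 0], [1, 1, 0, 1, 0],
                    [0, 0, 1, 0, 1], [0, 0, 0, 1, 0]], dtype=float)
    print(mcl(adj))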
Parallel KNN for Big Data using Adaptive Indexing (IRJET Journal)
This document presents an evaluation of different algorithms for performing parallel k-nearest neighbor (kNN) queries on big data using the MapReduce framework. It first discusses how kNN algorithms do not scale well to large datasets. It then reviews existing MapReduce-based kNN algorithms such as H-BNLJ, H-zkNNJ, and RankReduce that improve performance by partitioning data and distributing computation. The document also proposes using an adaptive indexing technique with the RankReduce algorithm. An implementation of this approach on an airline on-time statistics dataset shows it achieves better precision and speed than the other algorithms.
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO... (cscpconf)
For performing distributed data mining, two approaches are possible: first, data from several sources are copied to a data warehouse and mining algorithms are applied there; second, mining can be performed at the local sites and the results aggregated. When the number of features is high, a lot of bandwidth is consumed in transferring datasets to a centralized location. To address this, dimensionality reduction can be done at the local sites. In dimensionality reduction, a certain encoding is applied to the data to obtain its compressed form. The reduced features obtained at the local sites are aggregated, and data mining algorithms are applied to them. There are several methods for performing dimensionality reduction; two of the most important are Discrete Wavelet Transforms (DWT) and Principal Component Analysis (PCA). Here a detailed study is done on how PCA can be useful in reducing data flow across a distributed network.
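A minimal sketch of the local-site compression being advocated: each site projects its data onto its top principal directions and ships only the reduced scores (plus the small basis and mean) to the aggregation site; names and sizes are illustrative.

    import numpy as np

    def compress_site(X, k):
        # Local-site PCA: keep the top-k principal directions and scores.
        # Only (scores, basis, mean) travel over the network instead of X.
        mu = X.mean(axis=0)
        Xc = X - mu
        # SVD of the centered data; rows of Vt are the principal directions.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        basis = Vt[:k]
        scores = Xc @ basis.T                 # n x k instead of n x d
        return scores, basis, mu

    X_site = np.random.rand(500, 40)
    scores, basis, mu = compress_site(X_site, k=5)
    X_approx = scores @ basis + mu            # reconstruction at the central site
    print(scores.shape, X_site.shape)         # (500, 5) shipped instead of (500, 40)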
A Novel Algorithm for Design Tree Classification with PCA (Editor Jacotech)
This document summarizes a research paper titled "A Novel Algorithm for Design Tree Classification with PCA". It discusses dimensionality reduction techniques like principal component analysis (PCA) that can improve the efficiency of classification algorithms on high-dimensional data. PCA transforms data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate, called the first principal component. The paper proposes applying PCA and linear transformation on an original dataset before using a decision tree classification algorithm, in order to get better classification results.
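The proposed preprocessing maps naturally onto a pipeline; a brief sketch using scikit-learn (the dataset and component count are placeholders, not the paper's setup):

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # PCA rotates the data so the first coordinates carry the most variance;
    # the decision tree is then trained on the transformed features.
    model = make_pipeline(PCA(n_components=2), DecisionTreeClassifier(random_state=0))
    print(cross_val_score(model, X, y, cv=5).mean())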
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R... (ijscmc)
Face recognition is one of the most unobtrusive biometric techniques and can be used for access control as well as surveillance purposes. Various methods for implementing face recognition have been proposed, with varying degrees of performance in different scenarios. The most common issue with effective facial biometric systems is high susceptibility to variations in the face owing to factors like changes in pose, varying illumination, different expressions, and the presence of outliers and noise. This paper explores a technique for face recognition that classifies face images using an unsupervised learning approach through K-Medoids clustering, with the Partitioning Around Medoids (PAM) algorithm used to perform the clustering. The results suggest increased robustness to noise and outliers in comparison to other clustering methods. The technique can therefore be used to increase the overall robustness of a face recognition system, increasing its invariance and making it a reliably usable biometric modality.
Parametric Comparison of K-means and Adaptive K-means Clustering Performance ... (IJECEIAES)
This document compares the performance of K-means and adaptive K-means clustering algorithms on different images. It finds that adaptive K-means clustering more accurately detects tumor regions in MRI brain images and the area of a lake in a satellite image, compared to K-means clustering. This is evaluated by comparing the time taken, peak signal-to-noise ratio, and root mean square error between the original and segmented images. Adaptive K-means clustering does not require pre-specifying the number of clusters, which allows it to better segment images without user input.
The document discusses employing a rough set based approach for clustering categorical time-evolving data. It proposes using node importance and a rough membership function to label unlabeled data points and detect concept drift between data clusters over time. Specifically, it defines key terms like nodes, introduces the problem of clustering categorical time-series data and concept drift detection. It then describes using a rough membership function to calculate similarity between unlabeled data and existing clusters in order to label the data and detect changes in cluster characteristics over time.
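The rough membership function itself is simple to state: for an attribute subset B, mu(x, X) = |[x]_B ∩ X| / |[x]_B|, the fraction of records indistinguishable from x on B that fall in cluster X. A tiny sketch (the data and attribute choices are illustrative):

    import numpy as np

    def rough_membership(row, B, data, cluster_idx):
        # Rough membership of a record w.r.t. a cluster: |[x]_B ∩ X| / |[x]_B|,
        # where [x]_B is the set of records matching the record on attributes B
        # and X is the cluster (given as row indices).
        eq = np.flatnonzero((data[:, B] == row[B]).all(axis=1))  # equivalence class
        return len(np.intersect1d(eq, cluster_idx)) / len(eq)

    data = np.array([[0, 1], [0, 1], [0, 0], [1, 1]])            # categorical records
    print(rough_membership(data[0], B=[0], data=data, cluster_idx=np.array([0, 2])))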
A fuzzy clustering algorithm for high dimensional streaming data (Alexander Decker)
This document summarizes a research paper that proposes a new dimension-reduced weighted fuzzy clustering algorithm (sWFCM-HD) for high-dimensional streaming data. The algorithm can cluster datasets that have both high dimensionality and a streaming (continuously arriving) nature. It combines previous work on clustering algorithms for streaming data and high-dimensional data. The paper introduces the algorithm and compares it experimentally to show improvements in memory usage and runtime over other approaches for these types of datasets.
K Means Clustering and Meanshift Analysis for Grouping the Data of Coal Term ... (TELKOMNIKA JOURNAL)
Indonesian government agencies under the Ministry of Energy and Mineral Resources have problems in classifying the data dictionary of coal. This research groups the coal dictionary using the K-Means and MeanShift algorithms. The K-Means algorithm is used to obtain cluster values on character and word criteria. The Euclidean-distance data from the last K-Means iteration are then combined with the MeanShift algorithm, which calculates centroids under different bandwidth selections. The grouping results from K-Means and MeanShift show different centroids, from which the optimum bandwidth value is found. The data dictionary in this research is sorted alphabetically.
Clustering, also known as data segmentation, aims to partition a data set into groups (clusters) according to similarity. Cluster analysis has been extensively studied, and there are many algorithms for different types of clustering, but these classical algorithms can't be applied to big data due to its distinct features; applying traditional techniques to large unstructured data is a challenge. This study proposes a hybrid model to cluster big data using the famous traditional K-means clustering algorithm. The proposed model consists of three phases, namely the Mapper phase, the Clustering phase, and the Reduce phase. The first phase uses a map-reduce algorithm to split big data into small datasets; the second phase runs the traditional K-means clustering algorithm on each of the split small datasets; and the last phase is responsible for producing the general clustering output for the complete data set. Two functions, Mode and Fuzzy Gaussian, have been implemented and compared in the last phase to determine the most suitable one. The experimental study used four benchmark big data sets: Covtype, Covtype-2, Poker, and Poker-2. The results proved the efficiency of the proposed model in clustering big data using the traditional K-means algorithm, and the experiments show that the Fuzzy Gaussian function produces more accurate results than the traditional Mode function.
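A single-machine sketch of the three phases, with the Reduce phase merging per-chunk centroids by one more K-means pass; that merge is a simple stand-in for the paper's Mode and Fuzzy Gaussian functions.

    import numpy as np
    from sklearn.cluster import KMeans

    def hybrid_kmeans(X, k, n_chunks=4, seed=0):
        # Mapper phase: split the data into small chunks.
        chunks = np.array_split(X, n_chunks)
        # Clustering phase: traditional K-means on each chunk.
        local = [KMeans(k, n_init=10, random_state=seed).fit(c).cluster_centers_
                 for c in chunks]
        # Reduce phase: merge the per-chunk centroids into k global clusters.
        merged = KMeans(k, n_init=10, random_state=seed).fit(np.vstack(local))
        return merged.cluster_centers_

    X = np.vstack([np.random.randn(1000, 3) + off for off in (0, 6, 12)])
    print(hybrid_kmeans(X, k=3).round(1))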
Advanced SOM & K Mean Method for Load Curve Clustering (IJECEIAES)
From the load curve classification of one customer, the main features influencing electricity consumption, such as seasonal factors and weekday factors, may be extracted. In this way, utilities can make tariff decisions by season or by day of the week. The popular clustering techniques are SOM & K-means or fuzzy K-means. SOM & K-means is a prominent two-level approach to clustering: first, the data set is clustered using the SOM, and in the second level the SOM is clustered by K-means. In the first level, two training algorithms were examined: sequential and batch training. For the second level, K-means produces results that depend strongly on the initial values of the centers. To overcome this, this paper uses the subtractive clustering approach proposed by Chiu in 1994 to determine the centers. Because the effective radius in Chiu's method influences the number of centers, the paper applies the PSO technique to find the optimum radius. To validate the proposed approach, tests on well-known data samples are carried out, and applications to the daily load curves of one Southern utility are presented.
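Chiu's subtractive clustering, used here for center initialization, can be sketched briefly: each point's "potential" is its density of neighbors within the effective radius ra, and potentials near each selected center are subtracted away. The stopping rule below is a simplification of Chiu's accept/reject criteria.

    import numpy as np

    def subtractive_centers(X, ra=0.5, eps=0.15):
        # Chiu-style subtractive clustering (sketch): every point is a candidate
        # center scored by its potential; after picking a center, nearby
        # potentials are suppressed and the process repeats.
        rb = 1.5 * ra
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        P = np.exp(-4.0 * sq / ra**2).sum(axis=1)      # initial potentials
        centers, p_first = [], P.max()
        while P.max() > eps * p_first:
            c = P.argmax()
            centers.append(X[c])
            P -= P[c] * np.exp(-4.0 * sq[c] / rb**2)   # revise nearby potentials
        return np.array(centers)

    X = np.vstack([np.random.randn(100, 2) * 0.1 + off for off in ((0, 0), (1, 1))])
    print(subtractive_centers(X, ra=0.5))  # ra controls how many centers emerge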
Further Analysis Of A Framework To Analyze Network Performance Based On Infor... (CSCJournals)
Abstract: In [1], Geng and Li presented a framework to analyze network performance based on information quality. In that paper, the authors based their framework on the flow of information from a Base Station (BS) to clients. The theory they established can, and needs to, be extended to accommodate the flow of information from the clients to the BS. In this work, we use that framework and study the case of client-to-BS data transmission. Our work closely parallels the work of Geng and Li; we use the same notation and liberally reference their work. Keywords: information theory, information quality, network protocols, network performance
A PSO-Based Subtractive Data Clustering Algorithm (IJORCS)
There is a tremendous proliferation in the amount of information available on the largest shared information source, the World Wide Web. Fast, high-quality clustering algorithms play an important role in helping users effectively navigate, summarize, and organize this information. Recent studies have shown that partitional clustering algorithms such as k-means are the most popular algorithms for clustering large datasets. The major problem with partitional clustering algorithms is that they are sensitive to the selection of the initial partitions and are prone to premature convergence to local optima. Subtractive clustering is a fast, one-pass algorithm for estimating the number of clusters and the cluster centers for any given set of data; these estimates can be used to initialize iterative optimization-based clustering methods and model identification methods. In this paper, we present a hybrid Subtractive + PSO (Particle Swarm Optimization) clustering algorithm that performs fast clustering. For comparison purposes, we applied the Subtractive + PSO clustering algorithm, PSO, and subtractive clustering to three different datasets. The results illustrate that the Subtractive + PSO clustering algorithm generates the most compact clustering results compared to the other algorithms.
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING (cscpconf)
The document proposes a hybrid clustering algorithm that combines K-means and K-harmonic mean algorithms. It performs clustering by alternating between using harmonic mean and arithmetic mean to recalculate cluster centers after each iteration. Experimental results on five datasets show the hybrid algorithm produces clusters with lower mean values, indicating tighter grouping, compared to traditional K-means and K-harmonic mean algorithms. The hybrid approach overcomes issues with initialization sensitivity and helps improve computation time and clustering accuracy.
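A sketch of the alternation the summary describes, assuming the standard K-harmonic-means center update (with the usual p > 2 exponent) interleaved with an ordinary K-means arithmetic-mean step; the exact alternation schedule is illustrative.

    import numpy as np

    def khm_step(X, C, p=3.5):
        # One K-harmonic-means update (sketch): each point influences every
        # center with weight 1 / (d^(p+2) * (sum_l d^-p)^2), which softens
        # k-means' sensitivity to initialization.
        d = np.maximum(np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2), 1e-9)
        denom = (d ** -p).sum(axis=1, keepdims=True) ** 2
        w = d ** -(p + 2) / denom                    # n x k membership weights
        return (w.T @ X) / w.sum(axis=0)[:, None]

    def km_step(X, C):
        # One K-means (arithmetic mean) update.
        lab = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2).argmin(axis=1)
        return np.array([X[lab == j].mean(axis=0) if (lab == j).any() else C[j]
                         for j in range(len(C))])

    # Hybrid iteration: alternate harmonic-mean and arithmetic-mean recalculation.
    X = np.vstack([np.random.randn(200, 2) + off for off in ((0, 0), (5, 5))])
    C = X[np.random.default_rng(0).choice(len(X), 2, replace=False)]
    for _ in range(20):
        C = km_step(X, khm_step(X, C))
    print(C.round(2))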
SVD BASED LATENT SEMANTIC INDEXING WITH USE OF THE GPU COMPUTATIONS (ijscmcj)
The purpose of this article is to determine the usefulness of Graphics Processing Unit (GPU) calculations for implementing the Latent Semantic Indexing (LSI) reduction of the term-by-document matrix. The considered reduction of the matrix is based on the SVD (Singular Value Decomposition). The high computational complexity of the SVD, O(n³), makes the reduction of a large indexing structure a difficult task. This article compares the time complexity and accuracy of the algorithms implemented in two different environments: the first associated with the CPU and MATLAB R2011a, the second with graphics processors and the CULA library. The calculations were carried out on generally available benchmark matrices, which were combined to obtain a resulting matrix of large size. For both environments, computations were performed for double- and single-precision data.
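On the CPU side, the reduction amounts to a truncated SVD; a small NumPy sketch (standing in for the MATLAB and CULA implementations the article benchmarks):

    import numpy as np

    def lsi_reduce(A, k):
        # LSI reduction of a term-by-document matrix via truncated SVD:
        # A ~ U_k @ diag(s_k) @ Vt_k. Documents are represented by the k rows
        # of diag(s_k) @ Vt_k, drastically shrinking the index.
        U, s, Vt = np.linalg.svd(A, full_matrices=False)   # the O(n^3)-class step
        return U[:, :k], s[:k], Vt[:k]

    A = np.random.rand(1000, 300).astype(np.float32)       # terms x documents
    Uk, sk, Vtk = lsi_reduce(A, k=50)
    docs_k = sk[:, None] * Vtk                             # 50 x 300 reduced index
    query = np.random.rand(1000).astype(np.float32)
    q_k = Uk.T @ query          # project a query into the reduced space for matching
    print(docs_k.shape, q_k.shape)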
This document proposes a new approximate approach for computing the optimal optimistic decision within possibilistic networks. The approach avoids transforming the initial graph into a junction tree, which is computationally expensive. Instead, it performs the computation by calculating the degree of normalization in the moral graph resulting from merging the possibilistic network representing the agent's beliefs and the one representing its preferences. This allows the approach to have polynomial complexity compared to the exact approach based on junction trees, which is NP-hard.
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS (csandit)
The ability to automatically mine and extract useful information from large datasets has been a common concern for organizations over the last few decades. Data on the internet is increasing rapidly, and consequently the capacity to collect and store very large data is significantly increasing. Existing clustering algorithms are not always efficient and accurate for clustering problems on large datasets, and the development of accurate and fast data classification algorithms for very large-scale datasets remains a challenge. In this paper, various algorithms and techniques, especially an approach using a non-smooth optimization formulation of the clustering problem, are proposed for solving minimum sum-of-squares clustering problems in very large datasets. This research also develops an accurate and real-time L2-DC algorithm based on the incremental approach to solve the minimum sum-of-squares clustering problem.
Segmentation of Images by using Fuzzy k-means clustering with ACO (IJTET Journal)
Abstract: Superpixels are becoming increasingly popular for use in computer vision applications. Image segmentation is the process of partitioning a digital image into multiple segments (known as superpixels). In this paper, we develop fuzzy k-means clustering with Ant Colony Optimization (ACO). In the proposed algorithm, the initial assumptions made in the calculation of the mean value depend on the colors of neighboring pixels in the image. The fuzzy mean is calculated for the whole image, and a set of rules is applied iteratively to cluster the whole image. Once a neighborhood is chosen, the fitness function is calculated in the optimization process, and the image is segmented based on the optimized clusters. Using fuzzy k-means clustering with ACO, image segmentation obtains high accuracy, and segmentation time is reduced compared to the previous technique, the lazy random walk (LRW) methodology, which is itself an optimization of the random walk technique.
Fault diagnosis using genetic algorithms and principal curves (eSAT Journals)
Abstract: Several applications of nonlinear principal component analysis (NPCA) have appeared recently in process monitoring and fault diagnosis. In this paper a new approach is proposed for fault detection based on principal curves and genetic algorithms. The principal curve is a generalization of linear principal component analysis (PCA), introduced by Hastie as a parametric curve that passes satisfactorily through the middle of the data. Existing principal curve algorithms employ the first component of the data as an initial estimate of the principal curve; however, this dependence on the initial line leads to a lack of flexibility, and the final curve is satisfactory only for specific problems. In this paper we extend this work in two ways. First, we propose a new method based on genetic algorithms to find the principal curve, in which lines are fitted and connected to form polygonal lines (PL). Second, a potential application of principal curves is discussed. An example is used to illustrate fault diagnosis of a nonlinear process using the proposed approach. Index Terms: Principal curve, Genetic Algorithm, Nonlinear principal component analysis, Fault detection.
An Efficient Frame Embedding Using Haar Wavelet Coefficients And Orthogonal C... (IJERA Editor)
Digital media applications, copyright protection, and multimedia security have become a vital aspect of our daily life. Digital watermarking is a technology used for the copyright protection of digital applications. In this work we deal with a process able to mark digital pictures with visible and semi-visible hidden information, called a watermark; this process may form the basis of a complete copyright protection system. Watermarking is implemented here using Haar wavelet coefficients and principal component analysis, with the watermark embedded in the lower frequency band of the wavelet-transformed cover image. Experimental results show high imperceptibility: there is no noticeable difference between the watermarked video frames and the original frames for the invisible watermarking, and vice versa for the semi-visible implementation. The combination of the two transform algorithms has been found to improve the performance of the watermarking algorithm. The robustness of the watermarking scheme is analyzed by means of two distinct performance measures: Peak Signal to Noise Ratio (PSNR) and Normalized Coefficient (NC).
Multimode system condition monitoring using sparsity reconstruction for quali... (IJECEIAES)
In this paper, we introduce an improved multivariate statistical monitoring method based on the stacked sparse autoencoder (SSAE). Our contribution focuses on the choice of the SSAE model based on neural networks to solve diagnostic problems of complex systems. In order to monitor the process performance, the squared prediction error (SPE) chart is linked with nonparametric adaptive confidence bounds which arise from the kernel density estimation to minimize erroneous alerts. Then, faults are localized using two methods: contribution plots and sensor validity index (SVI). The results are obtained from experiments and real data from a drinkable water processing plant, demonstrating how the applied technique is performed. The simulation results of the SSAE model show a better ability to detect and identify sensor failures.
Stochastic Computing Correlation Utilization in Convolutional Neural Network ... (TELKOMNIKA JOURNAL)
In recent years, many applications have been implemented in embedded systems and mobile Internet of Things (IoT) devices that typically have constrained resources, smaller power budget, and exhibit "smartness" or intelligence. To implement computation-intensive and resource-hungry Convolutional Neural Network (CNN) in this class of devices, many research groups have developed specialized parallel accelerators using Graphical Processing Units (GPU), Field-Programmable Gate Arrays (FPGA), or Application-Specific Integrated Circuits (ASIC). An alternative computing paradigm called Stochastic Computing (SC) can implement CNN with low hardware footprint and power consumption. To enable building more efficient SC CNN, this work incorporates the CNN basic functions in SC that exploit correlation, share Random Number Generators (RNG), and is more robust to rounding error. Experimental results show our proposed solution provides significant savings in hardware footprint and increased accuracy for the SC CNN basic functions circuits compared to previous work.
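The core SC trick, and the role of correlation, can be shown in a few lines: with independent bitstreams an AND gate multiplies unipolar values, while sharing one RNG makes the streams correlated and turns the same gate into a min function. A sketch:

    import numpy as np

    rng = np.random.default_rng(0)

    def to_stream(v, r):
        # Unipolar SC encoding: value v in [0, 1] becomes a bitstream whose
        # fraction of 1s approximates v; r supplies the random numbers.
        return (r < v).astype(np.uint8)

    n = 4096
    r1, r2 = rng.random(n), rng.random(n)
    a, b = to_stream(0.75, r1), to_stream(0.5, r2)

    # Independent (uncorrelated) streams: AND implements multiplication.
    print((a & b).mean())            # ~0.375 = 0.75 * 0.5

    # Shared RNG (maximally correlated) streams: the same AND gate now
    # implements min, which is how correlation is exploited by SC circuits.
    a_c, b_c = to_stream(0.75, r1), to_stream(0.5, r1)
    print((a_c & b_c).mean())        # ~0.5 = min(0.75, 0.5)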
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural N... (Scientific Review SR)
This document summarizes a study that evaluated the performance of a kernel radial basis probabilistic neural network (Kernel RBPNN) model for classifying iris data, compared to backpropagation, radial basis function, and radial basis probabilistic neural network models. The Kernel RBPNN model achieved the highest classification accuracy of 89.12% on test data from the iris dataset, performing better than the other models. It also had the fastest training time, being over 80 times faster than the radial basis function model. Analysis of the receiver operating characteristic curves showed that the Kernel RBPNN model had the largest area under the curve, indicating it had the best classification prediction capability out of the four models evaluated.
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...cscpconf
For performing distributed data mining two approaches are possible: First, data from several sources are copied to a data warehouse and mining algorithms are applied in it. Secondly,
mining can performed at the local sites and the results can be aggregated. When the number of
features is high, a lot of bandwidth is consumed in transferring datasets to a centralized location. For this dimensionality reduction can be done at the local sites. In dimensionality reduction a certain encoding is applied on data so as to obtain its compressed form. The
reduced features thus obtained at the local sites are aggregated and data mining algorithms are applied on them. There are several methods of performing dimensionality reduction. Two most important ones are Discrete Wavelet Transforms (DWT) and Principal Component Analysis (PCA). Here a detailed study is done on how PCA could be useful in reducing data flow across a distributed network.
A Novel Algorithm for Design Tree Classification with PCAEditor Jacotech
This document summarizes a research paper titled "A Novel Algorithm for Design Tree Classification with PCA". It discusses dimensionality reduction techniques like principal component analysis (PCA) that can improve the efficiency of classification algorithms on high-dimensional data. PCA transforms data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate, called the first principal component. The paper proposes applying PCA and linear transformation on an original dataset before using a decision tree classification algorithm, in order to get better classification results.
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...ijscmc
Face recognition is one of the most unobtrusive biometric techniques that can be used for access control as well as surveillance purposes. Various methods for implementing face recognition have been proposed with varying degrees of performance in different scenarios. The most common issue with effective facial biometric systems is high susceptibility of variations in the face owing to different factors like changes in pose, varying illumination, different expression, presence of outliers, noise etc. This paper explores a novel technique for face recognition by performing classification of the face images using unsupervised learning approach through K-Medoids clustering. Partitioning Around Medoids algorithm (PAM) has been used for performing K-Medoids clustering of the data. The results are suggestive of increased robustness to noise and outliers in comparison to other clustering methods. Therefore the technique can also be used to increase the overall robustness of a face recognition system and thereby increase its invariance and make it a reliably usable biometric modality
Parametric Comparison of K-means and Adaptive K-means Clustering Performance ...IJECEIAES
This document compares the performance of K-means and adaptive K-means clustering algorithms on different images. It finds that adaptive K-means clustering more accurately detects tumor regions in MRI brain images and the area of a lake in a satellite image, compared to K-means clustering. This is evaluated by comparing the time taken, peak signal-to-noise ratio, and root mean square error between the original and segmented images. Adaptive K-means clustering does not require pre-specifying the number of clusters, which allows it to better segment images without user input.
The document discusses employing a rough set based approach for clustering categorical time-evolving data. It proposes using node importance and a rough membership function to label unlabeled data points and detect concept drift between data clusters over time. Specifically, it defines key terms like nodes, introduces the problem of clustering categorical time-series data and concept drift detection. It then describes using a rough membership function to calculate similarity between unlabeled data and existing clusters in order to label the data and detect changes in cluster characteristics over time.
A fuzzy clustering algorithm for high dimensional streaming dataAlexander Decker
This document summarizes a research paper that proposes a new dimension-reduced weighted fuzzy clustering algorithm (sWFCM-HD) for high-dimensional streaming data. The algorithm can cluster datasets that have both high dimensionality and a streaming (continuously arriving) nature. It combines previous work on clustering algorithms for streaming data and high-dimensional data. The paper introduces the algorithm and compares it experimentally to show improvements in memory usage and runtime over other approaches for these types of datasets.
K Means Clustering and Meanshift Analysis for Grouping the Data of Coal Term ...TELKOMNIKA JOURNAL
Indonesian government agencies under the Ministry of Energy and Mineral Resources have
problems in classifying data dictionary of coal. This research conduct grouping coal dictionary using KMeans
and MeanShift algorithm. K-means algorithm is used to get cluster value on character and word
criteria. The last iteration of Euclidian distance calculation data on k-means combine with Meanshift
algorithm. The meanshift calculates centroid by selecting different bandwidths. The result of grouping
using k-means and meanshift algorithm shows different centroid to find optimum bandwidth value. The
data dictionary of this research has sorted in alphabetically.
Clustering is also known as data segmentation aims to partitions data set into groups, clusters, according to their similarity. Cluster analysis has been extensively studied in many researches. There are many algorithms for different types of clustering. These classical algorithms can't be applied on big data due to its distinct features. It is a challenge to apply the traditional techniques on large unstructured data. This study proposes a hybrid model to cluster big data using the famous traditional K-means clustering algorithm. The proposed model consists of three phases namely; Mapper phase, Clustering Phase and Reduce phase. The first phase uses map-reduce algorithm to split big data into small datasets. Whereas, the second phase implements the traditional clustering K-means algorithm on each of the spitted small data sets. The last phase is responsible of producing the general clusters output of the complete data set. Two functions, Mode and Fuzzy Gaussian, have been implemented and compared at the last phase to determine the most suitable one. The experimental study used four benchmark big data sets; Covtype, Covtype-2, Poker, and Poker-2. The results proved the efficiency of the proposed model in clustering big data using the traditional K-means algorithm. Also, the experiments show that the Fuzzy Gaussian function produces more accurate results than the traditional Mode function.
Advanced SOM & K Mean Method for Load Curve Clustering IJECEIAES
From the load curve classification for one customer, the main features such as the seasonal factors, the weekday factors influencing on the electricity consumption may be extracted. By this way some utilities can make decision on the tariff by seasons or by day in week. The popular clustering techniques are the SOM & K-mean or Fuzzy K-mean. SOM &Kmean is a prominent approach for clustering with a two-level approach: first, the data set will be clustered using the SOM and in the second level, the SOM will be clustered by K-mean. In the first level, two training algorithms were examined: sequential and batch training. For the second level, the K-mean has the results that are strongly depended on the initial values of the centers. To overcome this, this paper used the subtractive clustering approach proposed by Chiu in 1994 to determine the centers. Because the effective radius in Chiu’s method has some influence on the number of centers, the paper applied the PSO technique to find the optimum radius. To valid the proposed approach, the test on well-known data samples is carried out. The applications for daily load curves of one Southern utility are presented.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
Further Analysis Of A Framework To Analyze Network Performance Based On Infor...CSCJournals
Abstract In [1], Geng and Li presented a framework to analyze network performance based on information quality. In that paper, the authors based their framework on the flow of information from a Base Station (BS) to clients. The theory they established can, and needs, to be extended to accommodate for the flow of information from the clients to the BS. In this work, we use that framework and study the case of client to BS data transmission. Our work closely parallels the work of Geng and Li, we use the same notation and liberally reference their work. Keywords: information theory, information quality, network protocols, network performance
A PSO-Based Subtractive Data Clustering AlgorithmIJORCS
There is a tremendous proliferation in the amount of information available on the largest shared information source, the World Wide Web. Fast and high-quality clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize the information. Recent studies have shown that partitional clustering algorithms such as the k-means algorithm are the most popular algorithms for clustering large datasets. The major problem with partitional clustering algorithms is that they are sensitive to the selection of the initial partitions and are prone to premature converge to local optima. Subtractive clustering is a fast, one-pass algorithm for estimating the number of clusters and cluster centers for any given set of data. The cluster estimates can be used to initialize iterative optimization-based clustering methods and model identification methods. In this paper, we present a hybrid Particle Swarm Optimization, Subtractive + (PSO) clustering algorithm that performs fast clustering. For comparison purpose, we applied the Subtractive + (PSO) clustering algorithm, PSO, and the Subtractive clustering algorithms on three different datasets. The results illustrate that the Subtractive + (PSO) clustering algorithm can generate the most compact clustering results as compared to other algorithms.
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGcscpconf
The document proposes a hybrid clustering algorithm that combines K-means and K-harmonic mean algorithms. It performs clustering by alternating between using harmonic mean and arithmetic mean to recalculate cluster centers after each iteration. Experimental results on five datasets show the hybrid algorithm produces clusters with lower mean values, indicating tighter grouping, compared to traditional K-means and K-harmonic mean algorithms. The hybrid approach overcomes issues with initialization sensitivity and helps improve computation time and clustering accuracy.
SVD BASED LATENT SEMANTIC INDEXING WITH USE OF THE GPU COMPUTATIONSijscmcj
The purpose of this article is to determine the usefulness of the Graphics Processing Unit (GPU) calculations used to implement the Latent Semantic Indexing (LSI) reduction of the TERM-BY DOCUMENT matrix. Considered reduction of the matrix is based on the use of the SVD (Singular Value Decomposition) decomposition. A high computational complexity of the SVD decomposition - O(n3), causes that a reduction of a large indexing structure is a difficult task. In this article there is a comparison of the time complexity and accuracy of the algorithms implemented for two different environments. The first environment is associated with the CPU and MATLAB R2011a. The second environment is related to graphics processors and the CULA library. The calculations were carried out on generally available benchmark matrices, which were combined to achieve the resulting matrix of high size. For both considered environments computations were performed for double and single precision data.
This document proposes a new approximate approach for computing the optimal optimistic decision within possibilistic networks. The approach avoids transforming the initial graph into a junction tree, which is computationally expensive. Instead, it performs the computation by calculating the degree of normalization in the moral graph resulting from merging the possibilistic network representing the agent's beliefs and the one representing its preferences. This allows the approach to have polynomial complexity compared to the exact approach based on junction trees, which is NP-hard.
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETScsandit
The ability to mine and extract useful information automatically, from large datasets, is a
common concern for organizations (having large datasets), over the last few decades. Over the
internet, data is vastly increasing gradually and consequently the capacity to collect and store
very large data is significantly increasing.
Existing clustering algorithms are not always efficient and accurate in solving clustering
problems for large datasets.
However, the development of accurate and fast data classification algorithms for very large
scale datasets is still a challenge. In this paper, various algorithms and techniques especially,
approach using non-smooth optimization formulation of the clustering problem, are proposed
for solving the minimum sum-of-squares clustering problems in very large datasets. This
research also develops accurate and real time L2-DC algorithm based with the incremental
approach to solve the minimum
Segmentation of Images by using Fuzzy k-means clustering with ACOIJTET Journal
Abstract— Super pixels are becoming increasingly popular for use in computer vision applications. Image segmentation is the process of partitioning a digital image into multiple segments (known as super pixels). In this paper, we developed fuzzy k-means clustering with Ant Colony Optimization (ACO). In this propose algorithm the initial assumptions are made in the calculation of the mean value, which are depends on the colors of neighbored pixel in the image. Fuzzy mean is calculated for the whole image, this process having set of rules that rules are applied iteratively which is used to cluster the whole image. Once choosing a neighbor around that the fitness function is calculated in the optimization process. Based on the optimized clusters the image is segmented. By using fuzzy k-means clustering with ACO technique the image segmentation obtain high accuracy and the segmentation time is reduced compared to previous technique that is Lazy random walk (LRW) methodology. This LRW is optimized from Random walk technique.
Fault diagnosis using genetic algorithms and principal curveseSAT Journals
Abstract Several applications of nonlinear principal component analysis (NPCA) have appeared recently in process monitoring and fault diagnosis. In this paper a new approach is proposed for fault detection based on principal curves and genetic algorithms. The principal curve is a generation of linear principal component (PCA) introduced by Hastie as a parametric curve passes satisfactorily through the middle of data. The existing principal curves algorithms employ the first component of the data as an initial estimation of principal curve. However the dependence on initial line leads to a lack of flexibility and the final curve is only satisfactory for specific problems. In this paper we extend this work in two ways. First, we propose a new method based on genetic algorithms to find the principal curve. Here, lines are fitted and connected to form polygonal lines (PL). Second, potential application of principal curves is discussed. An example is used to illustrate fault diagnosis of nonlinear process using the proposed approach. Index Terms: Principal curve, Genetic Algorithm, Nonlinear principal component analysis, Fault detection.
An Efficient Frame Embedding Using Haar Wavelet Coefficients And Orthogonal C...IJERA Editor
Digital media, applications, copyright defense, and multimedia security have become a vital aspect of our daily life. Digital watermarking is a technology used for the copyright security of digital applications. In this work we have dealt with a process able to mark digital pictures with a visible and semi invisible hided information, called watermark. This process may be the basis of a complete copyright protection system. Watermarking is implemented here using Haar Wavelet Coefficients and Principal Component analysis. Experimental results show high imperceptibility where there is no noticeable difference between the watermarked video frames and the original frame in case of invisible watermarking, vice-versa for semi visible implementation. The watermark is embedded in lower frequency band of Wavelet Transformed cover image. The combination of the two transform algorithm has been found to improve performance of the watermark algorithm. The robustness of the watermarking scheme is analyzed by means of two distinct performance measures viz. Peak Signal to Noise Ratio (PSNR) and Normalized Coefficient (NC).
Multimode system condition monitoring using sparsity reconstruction for quali...IJECEIAES
In this paper, we introduce an improved multivariate statistical monitoring method based on the stacked sparse autoencoder (SSAE). Our contribution focuses on the choice of the SSAE model based on neural networks to solve diagnostic problems of complex systems. In order to monitor the process performance, the squared prediction error (SPE) chart is linked with nonparametric adaptive confidence bounds which arise from the kernel density estimation to minimize erroneous alerts. Then, faults are localized using two methods: contribution plots and sensor validity index (SVI). The results are obtained from experiments and real data from a drinkable water processing plant, demonstrating how the applied technique is performed. The simulation results of the SSAE model show a better ability to detect and identify sensor failures.
Stochastic Computing Correlation Utilization in Convolutional Neural Network ...TELKOMNIKA JOURNAL
In recent years, many applications have been implemented in embedded systems and mobile Internet of Things (IoT) devices that typically have constrained resources, a smaller power budget, and exhibit "smartness" or intelligence. To implement the computation-intensive and resource-hungry Convolutional Neural Network (CNN) in this class of devices, many research groups have developed specialized parallel accelerators using Graphical Processing Units (GPU), Field-Programmable Gate Arrays (FPGA), or Application-Specific Integrated Circuits (ASIC). An alternative computing paradigm called Stochastic Computing (SC) can implement CNN with a low hardware footprint and power consumption. To enable building more efficient SC CNNs, this work implements the CNN basic functions in SC in a way that exploits correlation, shares Random Number Generators (RNG), and is more robust to rounding error. Experimental results show our proposed solution provides significant savings in hardware footprint and increased accuracy for the SC CNN basic function circuits compared to previous work.
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural N...Scientific Review SR
This document summarizes a study that evaluated the performance of a kernel radial basis probabilistic neural network (Kernel RBPNN) model for classifying iris data, compared to backpropagation, radial basis function, and radial basis probabilistic neural network models. The Kernel RBPNN model achieved the highest classification accuracy of 89.12% on test data from the iris dataset, performing better than the other models. It also had the fastest training time, being over 80 times faster than the radial basis function model. Analysis of the receiver operating characteristic curves showed that the Kernel RBPNN model had the largest area under the curve, indicating it had the best classification prediction capability out of the four models evaluated.
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval IJECEIAES
Data mining is an essential process for identifying patterns in large datasets through machine learning techniques and database systems. Clustering of high dimensional data is becoming a very challenging process due to the curse of dimensionality; in addition, existing methods did not improve space complexity or data retrieval performance. To overcome these limitations, a Spectral Clustering Based VP Tree Indexing Technique is introduced. The technique clusters and indexes densely populated high dimensional data points for effective data retrieval based on user queries. A Normalized Spectral Clustering Algorithm is used to group similar high dimensional data points. After that, a Vantage Point Tree is constructed for indexing the clustered data points with minimum space complexity. Finally, indexed data is retrieved in response to user queries using the Vantage Point Tree based Data Retrieval Algorithm. This in turn helps to improve the true positive rate with minimum retrieval time. The performance is measured in terms of space complexity, true positive rate, and data retrieval time with the El Nino weather datasets from the UCI Machine Learning Repository. Experimental results show that the proposed technique reduces space complexity by 33% and data retrieval time by 24% compared to the state of the art.
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...IAEME Publication
This paper presents an approach based on applying an aggregated predictor, formed by multiple versions of a multilayer neural network with a back-propagation optimization algorithm, for helping the engineer obtain a list of the most appropriate well-test interpretation models for a given set of pressure/production data. The proposed method consists of three stages: (1) data decorrelation through principal component analysis to reduce the covariance between the variables and the dimension of the input layer in the artificial neural network, (2) bootstrap replicates of the learning set, where the data is repeatedly sampled with a random split into training sets and these are used as new learning sets, and (3) automatic reservoir model identification through the aggregated predictor formed by a plurality vote when predicting a new class. The method is described in detail to ensure successful replication of results. The required training and test datasets were generated using analytical solution models. In our case, 600 samples were used: 300 for training, 100 for cross-validation, and 200 for testing. Different network structures were tested during this study to arrive at an optimum network design. We notice that the single-net methodology always brings about confusion in selecting the correct model, even though the training results for the constructed networks are close to 1. We also notice that principal component analysis is an effective strategy for reducing the number of input features, simplifying the network structure, and lowering the training time of the ANN. The results obtained show that the proposed model provides better performance when predicting new data, with a coefficient of correlation of approximately 95% compared to 80% for a previous approach. The combination of PCA and ANN is more stable and determines more accurate results with lower computational complexity than was feasible previously. Clearly, the aggregated predictor is more stable and shows fewer misclassified classes compared to the previous approach.
Black-box modeling of nonlinear system using evolutionary neural NARX modelIJECEIAES
Nonlinear systems with uncertainty and disturbance are very difficult to model using mathematical approaches. Therefore, a black-box modeling approach without any prior knowledge is necessary. Several modeling approaches have been used to develop black-box models, such as fuzzy logic, neural networks, and evolutionary algorithms. In this paper, an evolutionary neural network, combining a neural network and a modified differential evolution algorithm, is applied to model a nonlinear system. The feasibility and effectiveness of the proposed modeling are tested on a piezoelectric actuator SISO system and an experimental quadruple-tank MIMO system.
RunPool: A Dynamic Pooling Layer for Convolution Neural NetworkPutra Wanda
Deep learning (DL) has achieved significant performance in computer vision problems, mainly in automatic feature extraction and representation. However, it is not easy to determine the best pooling method for a given case study. For instance, experts can implement the best type of pooling for image processing cases, which might not be optimal for other tasks. Thus, it is required to keep in line with the philosophy of DL: in a dynamic neural network architecture, it is not practically possible to hand-pick a proper pooling technique for the layers. This is the primary reason why a fixed pooling cannot be applied to dynamic and multidimensional datasets. To deal with these limitations, an optimal pooling method is needed as a better option than max pooling and average pooling. Therefore, we introduce a dynamic pooling layer called RunPool to train the convolutional neural network (CNN) architecture. RunPool pooling is proposed to regularize the neural network by replacing the deterministic pooling functions. In the final section, we test the proposed pooling layer on classification problems with an online social network (OSN) dataset.
Kernal based speaker specific feature extraction and its applications in iTau...TELKOMNIKA JOURNAL
This document summarizes kernel-based speaker recognition techniques for an automatic speaker recognition system (ASR) in iTaukei cross-language speech recognition. It discusses kernel principal component analysis (KPCA), kernel independent component analysis (KICA), and kernel linear discriminant analysis (KLDA) for nonlinear speaker-specific feature extraction to improve ASR classification rates. Evaluation of the ASR system using these techniques on a Japanese language corpus and self-recorded iTaukei corpus showed that KLDA achieved the best performance, with an equal error rate improvement of up to 8.51% compared to KPCA and KICA.
Improving K-NN Internet Traffic Classification Using Clustering and Principle...journalBEEI
K-Nearest Neighbour (K-NN) is one of the most popular classification algorithms; in this research K-NN is used to classify internet traffic. K-NN is appropriate for huge amounts of data and gives more accurate classification, but it has a disadvantage in its computation process because the algorithm calculates the distance to all existing data in the dataset. Clustering is one solution to overcome this K-NN weakness: the clustering process is done before the K-NN classification and does not need high computing time to group the data with the same characteristics. Fuzzy C-Means is the clustering algorithm used in this research. The Fuzzy C-Means algorithm does not need the number of clusters to be determined first; the clusters form naturally based on the datasets entered. However, Fuzzy C-Means has a weakness: the clustering results obtained are frequently not the same even though the input dataset is the same, because the initial dataset given to Fuzzy C-Means is less than optimal. To optimize the initial dataset, a feature selection algorithm is needed. Feature selection is a method to produce an optimal initial dataset for Fuzzy C-Means. The feature selection algorithm in this research is Principal Component Analysis (PCA). PCA can remove non-significant attributes or features to create an optimal dataset and can improve the performance of clustering and classification algorithms. The result of this research is that the combination of classification, clustering, and feature selection on an internet traffic dataset successfully models an internet traffic classification method with higher accuracy and faster performance.
On the High Dimentional Information Processing in Quaternionic Domain and its...IJAAS Team
There are various high dimensional engineering and scientific applications in communication, control, robotics, computer vision, biometrics, etc., where researchers face the problem of designing an intelligent and robust neural system that can process higher dimensional information efficiently. Conventional real-valued neural networks have been tried on problems with high dimensional parameters, but the required network structure possesses high complexity, is very time consuming, and is weak to noise. These networks are also not able to learn magnitude and phase values simultaneously in space. The quaternion is a number that possesses magnitude in all four directions, with phase information embedded within it. This paper presents a well-generalized learning machine based on a quaternionic-domain neural network that can finely process magnitude and phase information of high dimensional data without any hassle. The learning and generalization capability of the proposed machine is presented through a wide spectrum of simulations which demonstrate the significance of the work.
Time Series Forecasting Using Novel Feature Extraction Algorithm and Multilay...Editor IJCATR
Time series forecasting is important because it can often provide the foundation for decision making in a large variety of fields. A tree-ensemble method, referred to as time series forest (TSF), is proposed for time series classification. The approach is based on the concept of data series envelopes and essential attributes generated by a multilayer neural network... These claims are further investigated by applying statistical tests. With the results presented in this article, together with results from related investigations, we want to support practitioners and scholars in answering the following question: which measure should be looked at first if accuracy is the most important criterion, if an application is time-critical, or if a compromise is needed? This paper demonstrates that feature extraction by the novel method can improve the time series forecasting process.
The document proposes improvements to the traditional K-means clustering algorithm to increase accuracy and efficiency. It discusses selecting initial centroids and assigning data points to clusters. The improved algorithm determines initial centroids by calculating distances from data points to the mean and partitioning into K clusters. It then assigns points to centroids based on minimum distance, only recalculating distances if they increase. Experimental results on standard datasets show the improved algorithm takes less time with higher accuracy compared to traditional K-means.
A h k clustering algorithm for high dimensional data using ensemble learningijitcs
The document summarizes a proposed clustering algorithm for high dimensional data that combines hierarchical (H-K) clustering, subspace clustering, and ensemble clustering. It begins with background on challenges of clustering high dimensional data and related work applying dimension reduction, subspace clustering, ensemble clustering, and H-K clustering individually. The proposed model first applies subspace clustering to identify clusters within subsets of features. It then performs H-K clustering on each subspace cluster. Finally, it applies ensemble clustering techniques to integrate the results into a single clustering. The goal is to leverage each technique's strengths to improve clustering performance for high dimensional data compared to using a single approach.
Half Gaussian-based wavelet transform for pooling layer for convolution neura...TELKOMNIKA JOURNAL
Pooling methods are used to select the most significant features to be aggregated over a small region. In this paper, a new pooling method based on a probability function is proposed. Relying on the fact that most information is concentrated between the mean of the signal and its maximum values, the upper half of a Gaussian function is used to determine the weights of the basic signal statistics, which in turn determine a transform of the original signal into a more concise form that represents the signal features; this method is named the half Gaussian transform (HGT). Based on the strategy of transform computation, three methods are proposed: the first (HGT1) uses the basic statistics, after normalization, as weights to be multiplied by the original signal; the second (HGT2) uses the computed statistics as features of the original signal and multiplies them by constant weights based on the half Gaussian; while the third (HGT3) works similarly to HGT1 except that it depends on the entire signal. The proposed methods are applied to three databases: MNIST, CIFAR10, and the MIT-BIH ECG database. The experimental results show that our methods achieve a good improvement, outperforming standard pooling methods such as max pooling and average pooling.
This document discusses various algorithms used for clustering data streams. It begins by introducing the problem of clustering streaming data and the common approach of using micro-clusters to summarize streaming data. It then reviews several prominent clustering algorithms like DBSCAN, DENCLUE, SNN, and CHAMELEON. The document focuses on the DBSTREAM algorithm, which explicitly captures density between micro-clusters using a shared density graph to improve reclustering. Experimental results show DBSTREAM's reclustering using shared density outperforms other reclustering strategies while using fewer micro-clusters.
SOLVING OPTIMAL COMPONENTS ASSIGNMENT PROBLEM FOR A MULTISTATE NETWORK USING ...ijmnct
The document summarizes research on solving the optimal components assignment problem for a multistate network using fuzzy optimization. It discusses how the problem can be formulated as a fuzzy linear program by defining fuzzy membership functions for the objectives of maximizing reliability, minimizing total lead time, and minimizing total cost. The paper then proposes using a genetic algorithm combined with fuzzy linear programming to find component assignments that maximize the fuzzy objective membership degree.
The optimal components assignment problem subject to system reliability, total lead-time, and total cost constraints is studied in this paper. The problem is formulated as a fuzzy linear problem using fuzzy membership functions, and an approach based on a genetic algorithm with fuzzy optimization is proposed to solve it. The optimal solution found by the proposed approach is characterized by maximum reliability, minimum total cost, and minimum total lead-time. The approach is tested on different examples taken from the literature to illustrate its efficiency in comparison with other previous methods.
CONTRAST OF RESNET AND DENSENET BASED ON THE RECOGNITION OF SIMPLE FRUIT DATA...rinzindorjej
In this paper, a fruit image dataset is used to compare the efficiency and accuracy of two widely used convolutional neural networks, namely ResNet and DenseNet, for the recognition of 50 different kinds of fruits. In the experiment, the structures of ResNet-34 and DenseNet_BC-121 (with bottleneck layer) are used. The mathematical principles, experimental details, and experimental results are explained through comparison.
Similar to Clustering using kernel entropy principal component analysis and variable kernel estimator (20)
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to precisely delineate tumor boundaries from magnetic resonance imaging (MRI) scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The model is rigorously trained and evaluated, exhibiting remarkable performance metrics, including an impressive global accuracy of 99.286%, a high class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of our proposed model. These findings underscore the model's competence in precise brain tumor localization and its potential to revolutionize medical image analysis and enhance healthcare outcomes. This research paves the way for future exploration and optimization of advanced CNN models in medical imaging, with emphasis on addressing false positives and resource efficiency.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Neural network optimizer of proportional-integral-differential controller par...IJECEIAES
Wide application of proportional-integral-differential (PID)-regulator in industry requires constant improvement of methods of its parameters adjustment. The paper deals with the issues of optimization of PID-regulator parameters with the use of neural network technology methods. A methodology for choosing the architecture (structure) of neural network optimizer is proposed, which consists in determining the number of layers, the number of neurons in each layer, as well as the form and type of activation function. Algorithms of neural network training based on the application of the method of minimizing the mismatch between the regulated value and the target value are developed. The method of back propagation of gradients is proposed to select the optimal training rate of neurons of the neural network. The neural network optimizer, which is a superstructure of the linear PID controller, allows increasing the regulation accuracy from 0.23 to 0.09, thus reducing the power consumption from 65% to 53%. The results of the conducted experiments allow us to conclude that the created neural superstructure may well become a prototype of an automatic voltage regulator (AVR)-type industrial controller for tuning the parameters of the PID controller.
An improved modulation technique suitable for a three level flying capacitor ...IJECEIAES
This research paper introduces an innovative modulation technique for controlling a 3-level flying capacitor multilevel inverter (FCMLI), aiming to streamline the modulation process in contrast to conventional methods. The proposed simplified modulation technique paves the way for more straightforward and efficient control of multilevel inverters, enabling their widespread adoption and integration into modern power electronic systems. Through the amalgamation of sinusoidal pulse width modulation (SPWM) with a high-frequency square wave pulse, this controlling technique attains energy equilibrium across the coupling capacitor. The modulation scheme incorporates a simplified switching pattern and a decreased count of voltage references, thereby simplifying the control algorithm.
A review on features and methods of potential fishing zoneIJECEIAES
This review focuses on the importance of identifying potential fishing zones in seawater for sustainable fishing practices. It explores features such as sea surface temperature (SST) and sea surface height (SSH), along with the classification methods used to classify the data. This study underscores the importance of examining potential fishing zones using advanced analytical techniques and thoroughly explores the methodologies employed by researchers, covering both past and current approaches. The examination centers on data characteristics and the application of classification algorithms to potential fishing zones. Furthermore, the prediction of potential fishing zones relies significantly on the effectiveness of classification algorithms. Previous research has assessed the performance of models like support vector machines, naïve Bayes, and artificial neural networks (ANN); in earlier results, a support vector machine (SVM) classified fisheries test data with 97.6% accuracy, more accurate than naïve Bayes at 94.2%. Considering the recent works in this area, several recommendations for future work are presented to further improve the performance of potential fishing zone models, which is important to the fisheries community.
Electrical signal interference minimization using appropriate core material f...IJECEIAES
As demand for smaller, quicker, and more powerful devices rises, Moore's law is strictly followed. The industry has worked hard to make small devices that boost productivity, the goal being to optimize device density. Scientists are reducing interconnection delays to improve circuit performance. This led to three-dimensional integrated circuit (3D IC) concepts, which stack active devices and create vertical connections to diminish latency and shorten interconnects. Electrical coupling is a big worry with 3D integrated circuits, and researchers have developed and tested through-silicon vias (TSV) and substrates to decrease electrical wave coupling. This study illustrates a novel noise-coupling reduction method using several electrical coupling models. A 22% drop in coupling from wave-carrying to victim TSVs introduces this new paradigm and improves system performance even at higher THz frequencies.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet has forced the United Nations and governments to promote green energies and electric transportation. The deployment of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types; the advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces a hybrid system between PV and EV to support industrial and commercial plants. The paper covers the theoretical framework of the proposed hybrid system, including the equations required to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram, which sets the priorities and requirements of the system, is presented. The proposed approach allows a setup to improve its power stability, especially during power outages. The presented information supports researchers and plant owners in completing the necessary analysis while promoting the deployment of clean energy. The results of a case study representing a dairy milk farm support the theoretical work and highlight the advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novel approach to a sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line, which enhances the safety of the electrical network.
Bibliometric analysis highlighting the role of women in addressing climate ch...IJECEIAES
Fossil fuel consumption increased quickly, contributing to climate change that is evident in unusual flooding, droughts, and global warming. Over the past ten years, women's involvement in society has grown dramatically, and they have succeeded in playing a noticeable role in reducing climate change. A bibliometric analysis of data from the last ten years has been carried out to examine the role of women in addressing climate change. The analysis's findings are discussed in relation to the sustainable development goals (SDGs), particularly SDG 7 and SDG 13. The results consider contributions made by women in various sectors while taking geographic dispersion into account. The bibliometric analysis delves into topics including women's leadership in environmental groups, their involvement in policymaking, their contributions to sustainable development projects, and the influence of gender diversity on attempts to mitigate climate change. This study's results highlight how women have influenced policies and actions related to climate change, point out areas of research deficiency, and offer recommendations on how to increase the role of women in addressing climate change and achieving sustainability. To achieve more successful results, this initiative aims to highlight the significance of gender equality and encourage inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter...IJECEIAES
The active and reactive load changes have a significant impact on voltage and frequency. In this paper, in order to stabilize the microgrid (MG) against load variations in islanding mode, the active and reactive power of all distributed generators (DGs), including energy storage (battery), diesel generator, and micro-turbine, are controlled. The micro-turbine generator is connected to the MG through a three-phase to three-phase matrix converter, and the droop control method is applied for controlling the voltage and frequency of the MG. In addition, a method is introduced for voltage and frequency control of micro-turbines in the transition from grid-connected mode to islanding mode. A novel switching strategy of the matrix converter is used for converting the high-frequency output voltage of the micro-turbine to the grid-side frequency of the utility system. Moreover, using this switching strategy, low-order harmonics in the output current and voltage are not produced, and consequently the size of the output filter is reduced. In fact, the suggested control strategy is load-independent and has no frequency conversion restrictions. The proposed approach for voltage and frequency regulation demonstrates exceptional performance and favorable response across various load alteration scenarios. The suggested strategy is examined in several scenarios in the MG test systems, and the simulation results are discussed.
Enhancing battery system identification: nonlinear autoregressive modeling fo...IJECEIAES
Precisely characterizing Li-ion batteries is essential for optimizing their performance, enhancing safety, and prolonging their lifespan across various applications, such as electric vehicles and renewable energy systems. This article introduces an innovative nonlinear methodology for system identification of a Li-ion battery, employing a nonlinear autoregressive with exogenous inputs (NARX) model. The proposed approach integrates the benefits of nonlinear modeling with the adaptability of the NARX structure, facilitating a more comprehensive representation of the intricate electrochemical processes within the battery. Experimental data collected from a Li-ion battery operating under diverse scenarios are employed to validate the effectiveness of the proposed methodology. The identified NARX model exhibits superior accuracy in predicting the battery's behavior compared to traditional linear models. This study underscores the importance of accounting for nonlinearities in battery modeling, providing insights into the intricate relationships between state-of-charge, voltage, and current under dynamic conditions.
Smart grid deployment: from a bibliometric analysis to a surveyIJECEIAES
Smart grids are one of the last decades' innovations in electrical energy. They bring relevant advantages compared to the traditional grid and significant interest from the research community. Assessing the field's evolution is essential to propose guidelines for facing new and future smart grid challenges. In addition, knowing the main technologies involved in the deployment of smart grids (SGs) is important to highlight possible shortcomings that can be mitigated by developing new tools. This paper contributes to the research trends mentioned above by focusing on two objectives. First, a bibliometric analysis is presented to give an overview of the current research level on smart grid deployment. Second, a survey of the main technological approaches used for smart grid implementation and their contributions is presented. To that effect, we searched the Web of Science (WoS) and Scopus databases. We obtained 5,663 documents from WoS and 7,215 from Scopus on smart grid implementation or deployment. With the extraction limitation in the Scopus database, 5,872 of the 7,215 documents were extracted using a multi-step process. These two datasets have been analyzed using a bibliometric tool called bibliometrix. The main outputs are presented with some recommendations for future research.
Use of analytical hierarchy process for selecting and prioritizing islanding ...IJECEIAES
One of the problems associated with power systems is the islanding condition, which must be rapidly and properly detected to prevent any negative consequences for the system's protection, stability, and security. This paper offers a thorough overview of several islanding detection strategies, which are divided into two categories: classic approaches, including local and remote approaches, and modern techniques, including techniques based on signal processing and computational intelligence. Additionally, each approach is compared and assessed based on several factors, including implementation costs, non-detected zones, declining power quality, and response times, using the analytical hierarchy process (AHP). The multi-criteria decision-making analysis yields overall weights of 24.7% for passive methods, 7.8% for active methods, 5.6% for hybrid methods, 14.5% for remote methods, 26.6% for signal processing-based methods, and 20.8% for computational intelligence-based methods when all criteria are compared together. Thus, it can be seen from the total weights that hybrid approaches are the least suitable choice, while signal processing-based methods are the most appropriate islanding detection method to select and implement in a power system with respect to the aforementioned factors. Using Expert Choice software, the proposed hierarchy model is studied and examined.
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by environmental factors. This variability hampers the control and utilization of solar cells' peak output. In this study, a single-stage grid-connected PV system is designed to enhance power quality. Our approach employs fuzzy logic in the direct power control (DPC) of a three-phase voltage source inverter (VSI), enabling seamless integration of the PV connected to the grid. Additionally, a fuzzy logic-based maximum power point tracking (MPPT) controller is adopted, which outperforms traditional methods like incremental conductance (INC) in enhancing solar cell efficiency and minimizing the response time. Moreover, the inverter's real-time active and reactive power is directly managed to achieve a unity power factor (UPF). The system's performance is assessed through MATLAB/Simulink implementation, showing marked improvement over conventional methods, particularly in steady-state and varying weather conditions. For solar irradiances of 500 and 1,000 W/m², the results show that the proposed method reduces the total harmonic distortion (THD) of the current injected into the grid by approximately 46% and 38%, respectively, compared to conventional methods. Furthermore, we compare the simulation results with IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that caters to the future needs of society, owing to their renewable, inexhaustible, and cost-free nature. The power output of these systems relies on solar cell radiation and temperature. In order to mitigate the dependence on atmospheric conditions and enhance power tracking, a conventional approach has been improved by integrating various methods. To optimize the generation of electricity from solar systems, the maximum power point tracking (MPPT) technique is employed. To overcome limitations such as steady-state voltage oscillations and to improve transient response, two traditional MPPT methods, namely the fuzzy logic controller (FLC) and perturb and observe (P&O), have been modified. This research paper aims to simulate and validate the step size of the proposed modified P&O and FLC techniques within the MPPT algorithm using MATLAB/Simulink for efficient power tracking in photovoltaic systems.
Adaptive synchronous sliding control for a robot manipulator based on neural ...IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for robot hands is always an attractive topic in the research community. This is a challenging problem because robot manipulators are complex nonlinear systems and are often subject to fluctuations in loads and external disturbances. This article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller ensures that the positions of the joints track the desired trajectory, synchronizes the errors, and significantly reduces chattering. First, the synchronous tracking errors and synchronous sliding surfaces are presented. Second, the synchronous tracking error dynamics are determined. Third, a robust adaptive control law is designed, the unknown components of the model are estimated online by a neural network, and the parameters of the switching elements are selected by fuzzy logic. The built algorithm ensures that the tracking and approximation errors are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results, which show that the proposed controller is effective, with small synchronous tracking errors and significantly reduced chattering.
Remote field-programmable gate array laboratory for signal acquisition and de...IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students' learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embedded-system design approach targeting FPGA technologies that is fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that users can seamlessly incorporate into their FPGA designs. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m...IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba...IJECEIAES
Rapidly and remotely monitoring and receiving the status parameters of solar cell systems, solar irradiance, temperature, and humidity, are critical issues in enhancing their efficiency. Hence, in the present article an improved smart prototype of an internet of things (IoT) technique based on an embedded system using the NodeMCU ESP8266 (ESP-12E) was carried out experimentally. Three different regions in Egypt, the cities of Luxor, Cairo, and El-Beheira, were chosen to study their solar irradiance profile, temperature, and humidity with the proposed IoT system. The monitored data of solar irradiance, temperature, and humidity were visualized live directly via Ubidots through the hypertext transfer protocol (HTTP). The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m² respectively during the solar day. The accuracy and rapidity of obtaining monitoring results using the proposed IoT system make it a strong candidate for application in monitoring solar cell systems. On the other hand, the obtained solar power radiation results of the three considered regions strongly suggest Luxor and Cairo, rather than El-Beheira, as suitable places to build a solar cell power station.
An efficient security framework for intrusion detection and prevention in int...IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions pose several security loopholes, leading to performance degradation and threats to data security in IoT operations. Thereby, IoT security systems must monitor and restrict unwanted events from occurring in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived for identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to misclassification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention consider supervised learning, which requires a large amount of labeled data for training; such complex datasets are impractical to source in a large network like IoT. To address this problem, this study introduces an efficient learning mechanism to strengthen IoT security. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with related works, the experimental outcome shows that the model performs well on a benchmark dataset, accomplishing an improved detection accuracy of approximately 99.21%.
The CBC machine is a common diagnostic tool used by doctors to measure a patient's red blood cell count, white blood cell count and platelet count. The machine uses a small sample of the patient's blood, which is then placed into special tubes and analyzed. The results of the analysis are then displayed on a screen for the doctor to review. The CBC machine is an important tool for diagnosing various conditions, such as anemia, infection and leukemia. It can also help to monitor a patient's response to treatment.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
Batteries: Introduction; types of batteries; discharging and charging of a battery; characteristics of a battery; battery rating; various tests on a battery. Primary battery: silver button cell. Secondary battery: Ni-Cd battery. Modern battery: lithium-ion battery. Maintenance of batteries; choice of batteries for electric vehicle applications.
Fuel Cells: Introduction; importance and classification of fuel cells; description, principle, components, and applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell, and direct methanol fuel cells.
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...shadow0702a
This document serves as a comprehensive step-by-step guide on how to effectively use PyCharm for remote debugging of the Windows Subsystem for Linux (WSL) on a local Windows machine. It meticulously outlines several critical steps in the process, starting with the crucial task of enabling permissions, followed by the installation and configuration of WSL.
The guide then proceeds to explain how to set up the SSH service within the WSL environment, an integral part of the process. Alongside this, it also provides detailed instructions on how to modify the inbound rules of the Windows firewall to facilitate the process, ensuring that there are no connectivity issues that could potentially hinder the debugging process.
The document further emphasizes on the importance of checking the connection between the Windows and WSL environments, providing instructions on how to ensure that the connection is optimal and ready for remote debugging.
It also offers an in-depth guide on how to configure the WSL interpreter and files within the PyCharm environment. This is essential for ensuring that the debugging process is set up correctly and that the program can be run effectively within the WSL terminal.
Additionally, the document provides guidance on how to set up breakpoints for debugging, a fundamental aspect of the debugging process which allows the developer to stop the execution of their code at certain points and inspect their program at those stages.
Finally, the document concludes by providing a link to a reference blog. This blog offers additional information and guidance on configuring the remote Python interpreter in PyCharm, providing the reader with a well-rounded understanding of the process.
Clustering using kernel entropy principal component analysis and variable kernel estimator
International Journal of Electrical and Computer Engineering (IJECE)
Vol. 11, No. 3, June 2021, pp. 2109~2119
ISSN: 2088-8708, DOI: 10.11591/ijece.v11i3.pp2109-2119
Journal homepage: http://ijece.iaescore.com
Clustering using kernel entropy principal component analysis and variable kernel estimator
Loubna El Fattahi1, El Hassan Sbai2
1 Department of Physics, Moulay Ismail University of Meknes, Morocco
2 High School of Technology, Moulay Ismail University of Meknes, Morocco
Article Info
Article history: Received Dec 9, 2019; Revised Sep 26, 2020; Accepted Oct 6, 2020

ABSTRACT
Clustering as an unsupervised learning method is the mission of dividing data objects into clusters with common characteristics. In the present paper, we introduce an enhanced technique of the existing EPCA data transformation method. Incorporating the kernel function into the EPCA, the input space can be mapped implicitly into a high-dimensional feature space. Then, the Shannon entropy, estimated via the inertia provided by the contribution of every mapped object in the data, is the key measure to determine the optimal extracted feature space. Our proposed method performs very well with the clustering algorithm of the fast search of clusters' centers based on local density computing. Experimental results disclose that the approach is feasible and efficient on the performance query.
Keywords: Clustering; Kernel entropy principal component analysis; Maximum entropy principle; Density peak; Variable kernel estimator

This is an open access article under the CC BY-SA license.
Corresponding Author:
Loubna El Fattahi
Department of Physics
Moulay Ismail University of Meknes
Km 5, Rue d'Agouray, N6, Meknes 50040, Morocco
Email: estm@est-umi.ac.ma
1. INTRODUCTION
Clustering data is the task of discovering natural groupings in data, named clusters, according to similarity. Objects are similar inside the same cluster and dissimilar compared to objects descending from other clusters. Clustering, as a class of unsupervised classification methods, has been widely applied in different domains: machine learning, image segmentation, pattern recognition, text mining, and many others [1-3]. A great number of clustering algorithms exist in the literature; the famous K-means clustering [4], hierarchical clustering [5], k-medoids [6], and mean shift [7] have been considered in various problems.
Despite the extensive studies on clustering in the past, a critical issue remains largely unsolved: how to automatically determine the number of clusters. Most methods assume that the number of clusters either has been manually set or is known in advance [4, 8]. Recently, great attention has been given to tackling this issue. Clustering by the density peaks selection criterion [9-11] is a technique that tends to find the number of clusters intuitively, independently of their shape and of the dimension of the space containing them, so as to avoid the inherent shortcoming of providing the number of clusters as an input parameter, as in K-means. The approach proposed by Rodriguez and Laio [10] relies as a first step on the fast search of the density peaks, referring to the cluster centers, characterized by a higher local density compared to their neighborhood and a relatively large distance with regard to points having higher densities. As a result, the cluster centers are located in general at the upper right corner of the decision graph, which is the plot of the density as a function of the distance for each point. Therefore, once centers are determined, all remaining data points are assigned to the same cluster as their nearest neighbor of higher density, so as to form the final clusters.
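To make the decision-graph construction concrete, the following minimal Python sketch (an illustration, not the authors' code; the cutoff distance dc and the Gaussian form of the density are assumptions, in the spirit of the smoothed variant discussed below) computes, for every point, the local density rho and the distance delta to the nearest point of higher density:

import numpy as np

def decision_graph(X, dc):
    # X: (n, d) data matrix; dc: user-chosen cutoff distance.
    n = X.shape[0]
    # Pairwise Euclidean distances.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Smoothed (Gaussian) local density; subtract 1 to exclude the point itself.
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        # Distance to the nearest point of higher density; the global
        # density peak gets the maximum distance instead.
        delta[i] = D[i, higher].min() if higher.any() else D[i].max()
    return rho, delta

Points with both large rho and large delta sit in the upper right corner of the (rho, delta) plot and are the candidate cluster centers; every other point is then assigned to the cluster of its nearest neighbor of higher density.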
As a further work, Rodriguez and Laio's method was improved by Wang and Xu [11] by substituting the step function with the Parzen estimator, so that the truncated density becomes smoother. Here, we develop a method for data transformation and reduction based on the variable kernel estimator [12]. Regarding the problems of clustering, many researchers are convinced that dimensionality reduction is an important stage that must be adopted in data analysis before considering any classification step. In fact, many clustering algorithms do not work well in high dimension, so, to improve efficiency, a data reduction is needed [1]. In this sense, we propose a hybrid method which combines the entropy principal component analysis (EPCA) [13, 14] and data mapping. The core element is to perform a data mapping using the kernel function before implementing the EPCA. Data mapping consists of transforming the data into a high-dimensional feature space, where patterns become linear and the nonlinearity disappears [15]. Then, using EPCA, we restrict the high-dimensional space to a subspace of the extracted features.
In many cases, the quality of clustering is judged by a low inter-cluster similarity and a high intra-cluster similarity. To this end, we integrate an automatic method for cluster centroid selection based on a validity index algorithm using the concept of entropy introduced by Edwin Jaynes [16] and computing the inter-cluster and intra-cluster similarities [17].
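As a rough illustration of the intra/inter-cluster trade-off (a hedged sketch only; the validity index actually used in this paper is entropy-based and may differ), one simple score compares the average within-cluster spread to the smallest distance between cluster centers:

import numpy as np

def compactness_separation(X, labels):
    # Smaller values mean compact, well-separated clusters.
    ks = np.unique(labels)
    centers = np.array([X[labels == k].mean(axis=0) for k in ks])
    # Mean distance of points to their own cluster center (intra-cluster).
    intra = np.mean([np.linalg.norm(X[labels == k] - centers[i], axis=1).mean()
                     for i, k in enumerate(ks)])
    # Smallest distance between two cluster centers (inter-cluster).
    pairwise = np.linalg.norm(centers[:, None] - centers[None, :], axis=2)
    inter = pairwise[np.triu_indices(len(ks), k=1)].min()
    return intra / inter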
The present paper is organized as follows. Section 2 presents a brief review of PCA as a linear data transformation, then considers the kernel entropy principal component analysis (KEPCA) for dimensionality reduction. In section 3, we present clustering by the fast search of centers relying on the estimation of the probability density function (PDF), as well as the automatic criterion for the selection of cluster centers using the validity index algorithm. Section 4 is dedicated to the validation of the proposed method on both synthetic and real datasets, including a comparison with other clustering algorithms.
2. RESEARCH METHOD
2.1. Dimensionality reduction
Clustering algorithms face problems of dimensionality, especially when the dimension increases significantly. Therefore, in some cases, they lose their efficiency, and likewise their productivity, especially when the data present sparseness. As a solution, we consider the reduction of dimensionality as an efficient preprocessing step. To this effect, much research has been done to get rid of this complication [18-21]. Thus, we establish a nonlinear method named KEPCA, which improves the existing linear EPCA method [13]. The purpose is to discard the redundant and irrelevant information and keep only the valuable part. Using the maximum entropy principle [16], we can easily determine the reduced dimension of the data in kernel space.
2.1.1. Principal component analysis
Principal component analysis (PCA) is a very well-known technique of multivariate statistics. Because
it analyzes data in terms of feature extraction and dimensionality reduction, PCA is adopted by
almost all disciplines. The ultimate objective of PCA is to select, from the input data table, the variables that
carry the most statistical information, and then to compress this information into a set of new orthogonal
variables called principal components, based on mathematical notions: eigenvalues, eigenvectors, mean and
standard deviation [14, 20].
2.1.2. Shannon entropy
Claude Shannon established the entropy concept in information theory. He introduced the term
$-\log p(X_i)$ as a measurement of the information carried by the realization $X_i$, knowing the probability
distribution $p$ of the discrete variable $X$ [22]. For a discrete variable, the entropy measures the
uncertainty. The Shannon entropy related to $X$ is obtained by the following formula (1):
$S(p) = -\sum_{i=1}^{n} p(X_i) \log p(X_i)$  (1)
where $p = \{p_1, p_2, \dots, p_n\}$. Combining the Shannon entropy and the PCA gives the EPCA, whose
core element is the maximum entropy principle (MEP) [17], used to determine the optimal dimension of the
principal subspace keeping the maximum information.
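To make the computation in (1) concrete, here is a minimal sketch in Python with NumPy; it is only an illustration of the entropy measure, not part of the original implementation (the study itself was carried out in MATLAB).

    import numpy as np

    def shannon_entropy(p):
        # S(p) = -sum_i p(X_i) log p(X_i); zero-probability terms contribute 0
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    # The uniform distribution carries the maximum entropy, log(n)
    print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 1.386... = log(4)
    print(shannon_entropy([0.97, 0.01, 0.01, 0.01]))  # 0.168..., far less uncertain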
2.1.3. Kernel entropy principal component analysis
Our proposed approach, KEPCA, is a nonlinear version of EPCA [23]. The basic aspect of the kernel
EPCA method is to map the input data $x_t$, $t = 1, \dots, N$, into a feature space $\mathcal{F}$ through a
mapping $\Phi: \mathbb{R}^d \to \mathcal{F}$, the matrix of mapped data being $\Phi = [\phi(x_1), \dots, \phi(x_N)]$.
As soon as the data mapping is done, EPCA is implemented in $\mathcal{F}$. The mapping is provided implicitly
by a positive semi-definite kernel function $k_\sigma: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, which
produces an inner product in the Hilbert space $\mathcal{F}$:

$k_\sigma(x_t, x_{t'}) = \langle \phi(x_t), \phi(x_{t'}) \rangle$  (2)

where every element $(t, t')$ of the $(N \times N)$ kernel matrix $K$ equals $k_\sigma(x_t, x_{t'})$; thus
$K = \Phi^T \Phi$. To elucidate how we proceed in implementing the EPCA, let $X_1, \dots, X_n$ be the mapped
data contained in $K$, described by $n$ features $f_1, \dots, f_n$, and let $E_q$ denote the subspace spanned by
the first $q$ features, $q = 1, \dots, n$.
The average information supplied by the contribution of each mapped element to the construction of
the projection subspace is the explained inertia, whereas the average information given by the contribution
of each mapped individual to the loss of inertia is the residual inertia. The contributions to the explained
inertia (EIC) and to the residual inertia (RIC) of each individual $X_i$ over the feature subspace $E_q$
are given as probability distributions in (3) and (4) [23]:

$p_q^1(X_i) = EIC(X_i, E_q)$  (3)

$p_q^2(X_i) = RIC(X_i, E_q)$  (4)

with $\sum_{i=1}^{n} EIC(X_i, E_q) = 1$ and $\sum_{i=1}^{n} RIC(X_i, E_q) = 1$. Thus, the Shannon entropies
provided by these distributions are given respectively in (5) and (6):
$S_1(p_q^1) = -\sum_{i=1}^{n} p_q^1(X_i) \log p_q^1(X_i)$  (5)

$S_2(p_q^2) = -\sum_{i=1}^{n} p_q^2(X_i) \log p_q^2(X_i)$  (6)
The quantities in (5) and (6) vary in opposite directions. According to the maximum entropy
principle, the sum of the two entropies is maximized at the minimal dimension of the feature subspace
that keeps the maximum information:
$S(q^*) = \max_q \left( S_1(p_q^1) + S_2(p_q^2) \right)$  (7)

where $q^*$ is the optimal dimension of the feature subspace.
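The following sketch (Python with NumPy; the function names are ours) applies (3)-(7) under one plausible reading: $EIC(X_i, E_q)$ is taken as the normalized share of $X_i$ in the inertia carried by the first $q$ kernel principal axes, and $RIC(X_i, E_q)$ as its normalized share of the residual inertia. The exact definitions of EIC and RIC are not spelled out in closed form here, so this is an assumption for illustration.

    import numpy as np

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    def select_dimension(K):
        # Center the kernel matrix and diagonalize it, as in kernel PCA
        n = K.shape[0]
        Kc = (np.eye(n) - 1.0 / n) @ K @ (np.eye(n) - 1.0 / n)
        w, V = np.linalg.eigh(Kc)
        w, V = np.clip(w[::-1], 0, None), V[:, ::-1]   # descending eigenvalues
        sq = (V * np.sqrt(w)) ** 2    # squared feature-space coordinates of each X_i
        best_q, best_S = 1, -np.inf
        for q in range(1, n):
            eic = sq[:, :q].sum(axis=1)   # contribution to the explained inertia
            ric = sq[:, q:].sum(axis=1)   # contribution to the residual inertia
            if ric.sum() <= 0:
                break
            S = entropy(eic / eic.sum()) + entropy(ric / ric.sum())  # (5) + (6)
            if S > best_S:                # maximum entropy principle, eq. (7)
                best_q, best_S = q, S
        return best_q

For a given kernel matrix, select_dimension scans every candidate $q$ and returns the one maximizing $S_1 + S_2$, in line with (7).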
2.2. Clustering by density peak selection
Recent research has given great interest to clustering by the fast search of cluster centroids based on
the nonparametric estimation of the PDF. The extension of Rodriguez and Laio's method [10] given by
Wang and Xu [11] is summarized here; it uses the Parzen estimator rather than the step function. The main
aspect of the method is the measurement of the pair of (density, distance) values characterizing each data
point.
2.2.1. Density estimation and distance
Relying on Rodriguez and Laio's clustering method [10], we summarize here the computation of the
density and distance values. The main goal is to identify the cluster centroids based on two assumptions:
the first one states that each cluster center is enclosed by elements whose local densities are
lower than the center's density; the second one assumes that each cluster center is far enough from all other
points of higher density. For a given data point $X_i$, the local density and the distance are defined
respectively as (8) and (9):
$\rho_i = \sum_{j=1}^{n} I\left(d(X_i, X_j) < d_c\right)$  (8)

$\hat{\delta}(x_i) = \min_{j:\, \hat{f}(x_i) < \hat{f}(x_j)} d(x_i, x_j)$  (9)
where $I(A)$ is the step function of the set $A$, $d(X_i, X_j)$ is the Euclidean distance between two data
points, and $d_c$ is a cut-off distance defined in advance. From definition (8), $\rho_i$
is simply the count of points closer than $d_c$ to the $i$th data point, whereas in (9) the measurement $\delta$ is
the minimum distance between the $i$th data point and all other
points of higher density. Finally, for the point with the highest density, $\delta_i$ is defined to
be $\max_j d(x_i, x_j)$.
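As an illustration, the pair (8)-(9) can be computed in a few lines of Python with NumPy; this sketch uses the step-function density of (8) and the conventions stated above, with the cut-off dc supplied by the caller.

    import numpy as np

    def density_distance(X, dc):
        n = len(X)
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
        rho = (D < dc).sum(axis=1) - 1        # eq. (8), the point itself excluded
        delta = np.empty(n)
        for i in range(n):
            higher = np.where(rho > rho[i])[0]   # points of strictly higher density
            if higher.size:
                delta[i] = D[i, higher].min()    # eq. (9)
            else:
                delta[i] = D[i].max()            # highest-density point
        return rho, delta

Cluster centers then stand out in the decision graph as the few points combining a large rho with a large delta.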
Nonetheless, the choice of $d_c$ is delicate because the result of the algorithm fundamentally
depends on it. The reason is that $d_c$ controls the average number of neighbors, typically chosen between 1% and 2% of the
whole number of data points. Consequently, the choice of $d_c$ is unsystematic and unstable when the size of the
sample changes. To get rid of the bad effect of $d_c$, we opt for the Parzen estimator and the variable
kernel estimator instead of the step function, which has a good effect on performance [10]. The
bandwidth parameter is efficiently computed using Silverman's rule [24].
2.2.2. Variable kernel estimator
The variable kernel estimator (VKE) combines the Parzen estimator, in which the scale of the
bumps placed on the data points is allowed to vary from one data point to another [25, 26], with the kNN
estimator.
The Parzen-Rosenblatt estimator is defined as in (10):

$\hat{f}(x) = \frac{1}{n h^d} \sum_{i=1}^{n} K_d\!\left(\frac{D(x, X_i)}{h}\right)$  (10)

where $K_d$ is the kernel, $h$ is the smoothing parameter and $D(x, X_i)$ is the Euclidean distance between $x$ and $X_i$.
The k-nearest neighbors (kNN) estimator is defined as in (11) [24]:

$\hat{f}_{knn}(x) = \frac{k/n}{V_k(x)} = \frac{k/n}{c_d\, r_k^d(x)}$  (11)
where $k$ is a positive integer, $r_k(x)$ is the distance from $x$ to the $k$th nearest point, $V_k(x)$ is the
volume of a sphere of radius $r_k(x)$, and $c_d$ is the volume of the unit sphere in $d$ dimensions. The
smoothness degree of this estimator is governed by the parameter $k$, taken much smaller than the sample
size. The VKE is constructed similarly to the classical kernel estimator; it is defined by (12) [24]:
$\hat{f}(x) = \frac{1}{n h^d} \sum_{i=1}^{n} \frac{1}{r_{i,k}^d}\, K_d\!\left(\frac{D(x, X_i)}{h\, r_{i,k}}\right)$  (12)
where $r_{i,k}$ is the Euclidean distance between the data point $X_i$ and the $k$th nearest of the other $n - 1$ data
points.
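The two estimators (10) and (12) can be sketched as follows in Python with NumPy, using a Gaussian kernel and a bandwidth given by one common form of Silverman's rule of thumb [24]; the concrete kernel and constants are our illustrative choices, not necessarily those of the original experiments.

    import numpy as np

    def gaussian_kernel(u, d):
        # d-dimensional Gaussian kernel evaluated at the scalar ratio u
        return np.exp(-0.5 * u ** 2) / (2 * np.pi) ** (d / 2)

    def silverman_bandwidth(X):
        # One common form of Silverman's rule of thumb for Gaussian kernels
        n, d = X.shape
        sigma = X.std(axis=0, ddof=1).mean()
        return sigma * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))

    def parzen_estimate(x, X, h):
        # Eq. (10): fixed-bandwidth Parzen-Rosenblatt estimator
        n, d = X.shape
        D = np.linalg.norm(X - x, axis=1)
        return gaussian_kernel(D / h, d).sum() / (n * h ** d)

    def vke_estimate(x, X, h, k):
        # Eq. (12): the bump placed on X_i is rescaled by r_{i,k}, the distance
        # from X_i to its k-th nearest neighbour among the other n-1 points
        n, d = X.shape
        Dij = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        r = np.sort(Dij, axis=1)[:, k]    # column 0 is the zero self-distance
        D = np.linalg.norm(X - x, axis=1)
        return np.sum(gaussian_kernel(D / (h * r), d) / r ** d) / (n * h ** d)

With $k$ much smaller than $n$, the local radii $r_{i,k}$ shrink in dense regions and grow in sparse ones, which is precisely the adaptive smoothing described above.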
2.2.3. Cluster validity index $V_{MEP}$
For every clustering process, a group of clusters $c_1, \dots, c_j, \dots, c_k$ is obtained from a given dataset.
The measurement $P_{ij}$ quantifies the relation between each point $i$ and the cluster $c_j$, for $j = 1, \dots, k$. For all
predefined clusters $c_j$, we set $P_{ij} = 0$ when $i \notin c_j$ and $P_{ij} > 0$ when $i \in c_j$; we have [12, 17, 23]:

$\sum_{i \in c_j} P_{ij} = 1, \quad j = 1, \dots, k$  (13)
Each class provides information which is measured using the entropy formula (14):
$S_j = -\sum_{i \in c_j} P_{ij} \log P_{ij}$  (14)
Finally, the $V_{MEP}$ index is defined as the entropy (15):

$V_{MEP} = S = \frac{1}{k} \sum_{j=1}^{k} S_j + \log k$  (15)

where $S_j$ is the entropy of cluster $j$ and $k^*$ is the optimal number of classes, the one for which the
entropy $S$ is maximal.
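A short Python/NumPy sketch of (13)-(15) follows; it accepts any nonnegative membership matrix P of size (n, k), a layout we assume here for illustration, and normalizes each column so that (13) holds.

    import numpy as np

    def v_mep(P):
        # P[i, j] > 0 relates point i to cluster c_j; each column is normalized
        # so that eq. (13) holds
        P = P / P.sum(axis=0, keepdims=True)
        k = P.shape[1]
        S = 0.0
        for j in range(k):
            p = P[:, j][P[:, j] > 0]
            S -= np.sum(p * np.log(p))   # eq. (14): entropy of cluster c_j
        return S / k + np.log(k)         # eq. (15)

Evaluated for each candidate number of clusters $C = 1, \dots, C_{max}$, the value of $C$ that maximizes v_mep is retained as $k^*$.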
2.2.4. Algorithm of the kernel entropy principal component analysis
According to the kernel method, the input space can be implicitly mapped into a high-dimensional
feature space in which the nonlinearity can be removed or reduced. The Gaussian radial basis (GRB) function is the principal
choice of kernel function because it is typically used in a reproducing kernel Hilbert space (RKHS) with
the objective of maximizing the feature-space variance of the output variables. Subsequently, the mapped data
are reduced using the EPCA as a linear method for data reduction, with the ultimate objective of maintaining
the features expected to preserve as much of the valuable information as possible. Eventually, a simple clustering algorithm
is able to achieve significant results based on either the Parzen estimator or the variable kernel estimator.
Our proposed clustering algorithm, built on the proposed KEPCA method, is summarized in the
following steps (a compact sketch in code is given after the steps):
Normalize the input data;
Map the input data;
Perform the EPCA;
Reduce the transformed data;
Compute the bandwidth;
Compute the density-distance pair of the reduced data;
For $C = 1, \dots, C_{max}$:
Select the cluster peaks and assign each element to its cluster;
Compute the cluster validity index $V_{MEP}$;
The correct number of clusters and the best grouping of the input data correspond to the value of $C$ with the
maximum value of $V_{MEP}$.
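Below is a compact, illustrative sketch of the whole pipeline in Python with NumPy; it is not the authors' MATLAB implementation. The kernel width sigma, the cut-off dc, the reduced dimension q and the number of centers are plain parameters here, where the full method would obtain q from the maximum entropy principle (section 2.1.3), the density from the VKE (section 2.2.2) and the number of centers from the $V_{MEP}$ index.

    import numpy as np

    def kepca_clustering(X, sigma=1.0, q=3, n_centers=3, dc=0.5):
        # 1) Normalize the input data
        X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
        n = len(X)
        # 2) Map the data implicitly with a Gaussian (GRB) kernel, eq. (2)
        D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
        K = np.exp(-D2 / (2 * sigma ** 2))
        # 3-4) EPCA in feature space: diagonalize the centered kernel matrix and
        #      keep q components (the full method selects q* by the MEP)
        J = np.eye(n) - 1.0 / n
        w, V = np.linalg.eigh(J @ K @ J)
        Y = V[:, ::-1][:, :q] * np.sqrt(np.clip(w[::-1][:q], 0, None))
        # 5-6) Density-distance pair on the reduced data; the step function of
        #      eq. (8) stands in here for the variable kernel estimator
        D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
        rho = (D < dc).sum(axis=1).astype(float) - 1
        rho += np.arange(n) * 1e-9        # break density ties deterministically
        delta, nneigh = np.zeros(n), np.zeros(n, dtype=int)
        for i in range(n):
            higher = np.where(rho > rho[i])[0]
            if higher.size:
                nneigh[i] = higher[np.argmin(D[i, higher])]
                delta[i] = D[i, nneigh[i]]
            else:
                nneigh[i], delta[i] = i, D[i].max()
        # 7) Centers have the largest density-distance product; every other point
        #    inherits the label of its nearest neighbour of higher density
        #    (the highest-density point is expected to be among the centers)
        labels = np.full(n, -1)
        for c, i in enumerate(np.argsort(rho * delta)[::-1][:n_centers]):
            labels[i] = c
        for i in np.argsort(rho)[::-1]:
            if labels[i] == -1:
                labels[i] = labels[nneigh[i]]
        return labels

For data like the simulated study below, a call such as kepca_clustering(X, sigma=1.0, q=3, n_centers=3) would return one label per row of X; in the full method, the loop over $C$ with $V_{MEP}$ replaces the fixed n_centers.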
3. RESULTS AND DISCUSSION
This section demonstrates the performance of our approach in reducing the data and
extracting only the valuable information, based on computing the density-distance pair
through the fast search of cluster centers. The use of either the Parzen estimator [27] or the variable kernel
estimator together with Silverman's rule has good outcomes on our clustering results. A comparison
between the results of our clustering algorithm based on the KEPCA data transformation and other
clustering algorithms is given. An artificial dataset, the well-known real datasets (Iris, Seeds, Flame and Heart) [28]
and a vehicle trajectory dataset were used. Our analysis is carried out in the MATLAB environment.
3.1. Simulated study
As a first application, we consider simulated data consisting of three random clusters laid in a 600-
by-2 matrix. All clusters were generated from normal distributions with different random covariance
matrices and centers; each cluster contains 200 data points. Figure 1 shows the plot of the initial data over
the two variables of the input space before applying our clustering algorithm.
As shown in Figure 2(a), the KEPCA data transformation yields three dimensions as the reduced
dimension after data mapping. Thus, for the synthetic data, a three-dimensional array is enough to represent
the input data in feature space. Figure 2(b) discloses that three centers are identified; they are located at the
upper-right corner of the density-distance plot, marked in red with circled shape. Besides, Figure 2(c) shows
that the sorted quantity (product of density and distance) takes high values for the first three data points.
This magnitude is by definition large for cluster centers and becomes very tight between the other samples.
Consequently, both techniques show the same number of centers, which confirms the existence of three
clusters. The obtained results can be explained by the fact that each
center is recognized by a higher local density and a relatively large distance away from the other data points of
higher density. Hence, our algorithm is capable of differentiating between centers and other data points. Next,
to examine the validity of our clustering result, we investigate both the $V_{MEP}$ criterion and the Elbow method [29].
The results are displayed in the same Figure 2. In Figure 2(d), we can observe the progress of the
$V_{MEP}$ index, through which the optimal number of clusters is identified as three by applying the
maximum entropy principle. Likewise, the same result is given in Figure 2(e) by the Elbow
method, which relies on the minimization of the sum of the squared errors within each cluster: three clusters
were picked. In the last panel, Figure 2(f), we can see the assignment of samples to their respective clusters
following the center selection; every single point is assigned to the nearest center based on the
Euclidean distance. Eventually, the clustering result of the proposed algorithm is given on a
three-dimensional feature-space plot for the synthetic dataset, since the dimension was reduced to three
features after performing KEPCA (Figure 2(a)). The three clusters are properly distinguished from one another
by different shapes and colors as the best grouping of samples, which confirms our clustering algorithm's
assumptions.
Figure 1. Two-dimensional plot of the initial artificial dataset
Figure 2. Results obtained on the artificial dataset: (a) result of the optimal number of components using
KEPCA, (b) decision graph for center selection in the density-distance plot, (c) the product of density
and distance plotted in decreasing order, (d) validation of the number of clusters given by the $V_{MEP}$ index, (e) the Elbow
method for the artificial dataset, (f) the assignment of samples to clusters
The classification rate of our proposed algorithm with the KEPCA data transformation and the VKE is
about 97.17% for the simulated dataset, whereas the K-means algorithm achieved 96.83% and our
clustering algorithm with the VKE and EPCA gave 82.33%. Therefore, the present algorithm with the
kernel data transformation gives a relatively higher classification rate than the other clustering algorithms.
3.2. The Iris dataset
The Iris benchmark is one of the classical machine-learning datasets; Fisher first used it in [30]. It consists of
150 measurements of three distinct types of the Iris plant (Iris setosa, Iris virginica and Iris versicolor) over
four variables: width and length of sepal and petal. It is worth mentioning that one of the classes is linearly
separable, whereas the two others are not [30]. Considering the combination of kernel data transformation
(mapping) and the EPCA as a preprocessing of the input data, Figure 3 presents the results obtained on the Iris
dataset. Figure 3(a) illustrates the reduced features in the high-dimensional space (150 features); we restrict the
plot to the first 25 features for the sake of clarity, as the entropy evolves in decreasing order. The seventeen
retained nonlinear features are therefore the most relevant ones for our clustering algorithm in terms of
information content. Figures 3(b) and 3(c) are both based on the measurement of the density-distance pair:
Figure 3(b) presents the density-distance plot, whereas Figure 3(c) presents the sorted quantity of the
density-distance product. Thus, an identical result is given by both techniques: three centroids are
determined from the data. The obtained results are interpreted through the same assumption: every center is
distinguished from the other data points by its higher local density and relatively large distance. Hence, our
algorithm is capable of differentiating between centers and the other data points. To demonstrate the
performance of our clustering algorithm, we use the cluster validity index founded on the maximum entropy
principle in Figure 3(d) and the Elbow method in Figure 3(e). Both criteria give the same result, which
confirms the existence of three clusters.
Figure 3. Results obtained on the Iris dataset: (a) result of the optimal number of components using KEPCA, (b)
decision graph for center selection in the density-distance plot, (c) the product of density and distance
plotted in decreasing order, (d) validation of the number of clusters given by the $V_{MEP}$ index and (e) the Elbow method
for the Iris dataset
Therefore, our clustering algorithm is able to identify the center of each cluster automatically. The
classification rate for the Iris data using our clustering algorithm with the KEPCA data transformation as
a preprocessing step is 89.33% with the variable kernel estimator and 88% with the
Parzen estimator. The EPCA data transformation with the VKE is unable to detect all three clusters
and finds only two, as does K-medoids, while K-means reaches only 82%. Therefore,
our automatic algorithm integrating the KEPCA data transformation and the variable kernel estimator is capable
of classifying patterns with a higher classification rate than the other algorithms.
3.3. Seeds database
The Seeds dataset is composed of 210 samples referring to three wheat varieties, 70 elements each,
described by 7 geometric features [29]. Considering the combination of the kernel data transformation
(mapping) and the EPCA as an efficient preprocessing of the input data, Figure 4(a) illustrates the reduced
number of features in the high-dimensional space (210 features). For the sake of clarity, we limit the plot
to 25 features because of the decreasing evolution of the entropy. The seven retained nonlinear
features are therefore the most relevant ones in terms of information content for our clustering algorithm.
Figures 4(b) and 4(c) display results based on the same magnitudes (the density and the
distance): Figure 4(b) is a density-distance plot, whereas Figure 4(c) reveals the
sorted product of density and distance. This quantity is by definition large for cluster centroids and
becomes tight between the rest of the points. Thus, the same result is
given by both techniques: three centroids are clearly determined from the data, which indicates
the existence of three clusters. The obtained results are confirmed by the assumption that each
center is distinguished by its high local density and relatively large distance away from the other data points
of higher density. Hence, our algorithm is apt to extract the centers from the data points as a first step. To prove its
efficiency, we investigate the cluster validity index in Figure 4(d) and the Elbow method in
Figure 4(e).
Figure 4. Results obtained on the Seeds dataset: (a) result of the optimal number of components using KEPCA,
(b) decision graph for center selection in the density-distance plot, (c) the product of density and
distance plotted in decreasing order, (d) validation of the number of clusters given by the $V_{MEP}$ index and (e) the Elbow
method for the Seeds dataset
The classification rate given by our clustering algorithm with the KEPCA data
transformation is equal to 87.62% using the VKE, whereas using the Parzen estimator we get 89.52%;
with the EPCA data reduction and the VKE we obtain 87.62%, and the k-medoids and k-means results
are equal to 84%. As a comparison, we conclude that our algorithm performs better than its counterpart
clustering algorithms.
3.4. Trajectories database
The LCPC (Laboratoire Central des Ponts et Chaussées) provides a database of
experimental trajectories obtained by measuring the trajectory parameters in a bend during the vehicle passage at
discrete time intervals. To this end, they used a data acquisition system incorporated into a test
vehicle that records positions, speeds and accelerations. The survey was conducted with volunteer
drivers, each driver going through different trajectories; hence, the database of physical trajectories
was built [31]. In the present study, we investigate this trajectories database,
which consists of 232 trajectories with 6 variables (longitudinal position, lateral position, longitudinal speed,
lateral speed, longitudinal acceleration and lateral acceleration). Thus, we consider our approach for data
transformation combining the kernel function and the EPCA. Figure 5 presents the results obtained on the
trajectories dataset. Figure 5(a) demonstrates the reduced features in the high-dimensional space (232 features);
we limit the plot to 25 features for the sake of clarity, since the entropy evolves in decreasing
order. Hence, the three retained nonlinear features are selected for having the highest entropy values, which
means they are the most relevant ones for our clustering algorithm in terms of information content.
Afterwards, Figures 5(b) and 5(c) are both based on the density-distance pair measurement:
Figure 5(b) presents the density-distance plot, whereas Figure 5(c) illustrates the sorted
quantity of the density-distance product. An identical result is given by both techniques: four centroids
are determined. The obtained results are justified by the assumption that every center is
distinguished from the other data points by its higher local density and relatively large distance. Thus, our
algorithm is capable of differentiating between centers and the other points. To demonstrate the performance
of our clustering algorithm, we investigate the cluster validity index in Figure 5(d) and the Elbow method in
Figure 5(e). Both criteria give the same result, which confirms the existence of four clusters of driver
behavior. In Figure 5(f), we display the assignment of the samples (trajectories) to their appropriate clusters
based on the centroid selection result, where the remaining points are assigned to the closest centroid based on the
Euclidean distance. The result is displayed on a three-dimensional plot, since the
reduced dimension after performing KEPCA was deduced to be three features (Figure 5(a)). The four clusters
are discovered, clearly distinguished from one another, and represented by different colors and shapes.
Each cluster represents a driver behavior: the first class C1 corresponds to the family of the slowest
trajectories of calm driving; the second class C2 corresponds to the family of the slowest trajectories of
sporty driving; the third class C3 represents the family of the fastest trajectories of calm driving; and the
fourth class C4 corresponds to the family of the fastest trajectories of sporty driving.
Figure 5. Results obtained on the trajectories dataset: (a) result of the optimal number of components using
KEPCA, (b) decision graph for center selection in the density-distance plot, (c) the product of density
and distance plotted in decreasing order, (d) validation of the number of clusters given by the $V_{MEP}$ index, (e) the
Elbow method, (f) three-dimensional plot of the mapped trajectories data in the reduced space of
three features, where samples are colored and shaped according to the cluster to which they are assigned
Compared to other works on the same trajectories dataset, such as [31], the same number of
clusters and of driver behaviors was detected using our unsupervised algorithm based on the kernel data
transformation and the variable kernel estimator for density estimation, which is critical for clustering. The KEPCA
has shown its potential over different datasets from artificial and real-world databases. Above all, the
proposed KEPCA presents the advantage of extracting only the valuable information and mapping the input data into
a high-dimensional space, where the nonlinearity can easily be removed. The reduced number of features in the high-dimensional space
improves the results: firstly, in the PDF estimation, using the reduced dimension in which most of the entropy
information is compacted makes the selection of the kernel parameter more robust; secondly, in the
clustering task, as shown in Table 1, the experimental results reveal the improvement in the
performance of the algorithm using KEPCA over its counterpart EPCA. Furthermore, the use of the VKE often
makes KEPCA more efficient than the Parzen estimator, which can be explained by the fact that the
variable kernel estimator adapts the amount of smoothing to the local data density thanks to an adaptive scale
that can vary from one data point to another.
Table 1. Comparison of the different clustering algorithms over artificial and real-world data
(classification rate, %)

Dataset      Without data transformation   EPCA-VKE   KEPCA-Parzen   KEPCA-VKE    k-means
Artificial   96.83                         82.33      96.67          97.17        96.83
Iris         -                             -          88             89.33        89.33
Flame        84.17                         80         85             85           85
Heart        62.59                         65.19      79.26          82.59        59.26
Trajectory   -                             -          -              4 clusters   4 clusters
Seeds        91.43                         89.52      89.52          87.62        89.52
4. CONCLUSION
In the present paper, we propose an efficient data transformation that extends the existing EPCA method to
extract the optimal features. Whereas the EPCA gives the optimal entropic components by maximizing the
average information (the inertia) provided by the elements, the KEPCA performs a mapping of the
data before applying the EPCA. The core element of the kernel used in KEPCA is thus to map the
input data implicitly into a high-dimensional feature space, where the nonlinear patterns become linear and the
separation of the elements becomes easier. We have shown the ability of KEPCA to retain more information
in the high-dimensional space, in both PDF estimation and clustering, over synthetic and real-world dataset examples.
The results show the performance gain of KEPCA over its counterpart EPCA data transformation. Besides,
the use of the VKE estimator has proven its efficiency in density estimation, which has a critical effect on the
clustering algorithm.
REFERENCES
[1] D. Napoleon and S. Pavalakodi, “A New Method for Dimensionality Reduction Using KMeans Clustering
Algorithm for High Dimensional Data Set,” International Journal of Computer Applications, vol. 13, no. 7, pp. 41–
46, Jan. 2011.
[2] R. H and A. T, “Feature Extraction of Chest X-ray Images and Analysis using PCA and kPCA,” International
Journal of Electrical and Computer Engineering (IJECE), vol. 8, no. 5, pp. 3392–3398, Oct. 2018, doi:
10.11591/ijece.v8i5.pp3392-3398.
[3] A. K. Nikhath and K. Subrahmanyam, “Feature selection, optimization and clustering strategies of text documents,”
International Journal of Electrical & Computer Engineering (IJECE), vol. 9, no. 2, pp. 1313–1320, Apr. 2019, doi:
10.11591/ijece.v9i2.pp1313-1320.
[4] J. MacQueen, “Some methods for classification and analysis of multivariate observations,” presented at the
Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1967.
[5] A. K. Jain, “Data clustering: 50 years beyond K-means,” Pattern Recognit. Lett., vol. 31, no. 8, pp. 651–666, Jun.
2010, doi: 10.1016/j.patrec.2009.09.011.
[6] L. Kaufman and P. J. Rousseeuw, “Finding groups in data: an introduction to cluster analysis,” Hoboken, N.J:
Wiley, 2005.
[7] U. Ozertem, et al., “Mean shift spectral clustering,” Pattern Recognit., vol. 41, no. 6, pp. 1924–1938, Jun. 2008,
doi: 10.1016/j.patcog.2007.09.009.
[8] Y. Weiss, “Segmentation using eigenvectors: a unifying view,” Proceedings of the Seventh IEEE International
Conference on Computer Vision, Kerkyra, Greece, vol. 2, 1999, pp. 975–982, doi: 10.1109/ICCV.1999.790354.
[9] E. M. Cherrat, R. Alaoui, and H. Bouzahir, “Improving of Fingerprint Segmentation Images Based on K-
MEANS and DBSCAN Clustering,” International Journal of Electrical and Computer Engineering (IJECE), vol.
9, no. 4, pp. 2425–2432, Aug. 2019, doi: 10.11591/ijece.v9i4.pp2425-2432.
[10] A. Rodriguez and A. Laio, “Clustering by fast search and find of density peaks,” Science, vol. 344, no. 6191, pp.
1492–1496, Jun. 2014, doi: 10.1126/science.1242072.
[11] X.-F. Wang and Y. Xu, “Fast clustering using adaptive density peak detection,” Stat. Methods Med. Res., vol. 26,
no. 6, pp. 2800–2811, Dec. 2017, doi: 10.1177/0962280215609948.
[12] L. E. Fattahi, et al., “Clustering based on density estimation using variable kernel and maximum entropy principle,”
Intelligent Systems and Computer Vision (ISCV), 2017, pp. 1–7.
[13] K. Slaoui, “Optimisation des méthodes d’analyses factorielles, discriminante, et de classification des données
statistiques et dynamiques par le principe de maximum d’entropie,” Doctoral thesis, University of Sidi Mohamed
Ben Abdellah, Fez, 2000.
[14] I. T. Jolliffe and J. Cadima, “Principal component analysis: a review and recent developments,” Philos. Trans. R.
Soc. Math. Phys. Eng. Sci., vol. 374, no. 2065, 2016, doi: https://doi.org/10.1098/rsta.2015.0202.
[15] S. Damavandinejadmonfared and V. Varadharajan, “A New Extension of Kernel Principal Component Analysis for
Finger Vein Authentication,” Comput. Sci., vol. 159, p. 5, 2015.
[16] E. T. Jaynes, “On the rationale of maximum-entropy methods,” Proc. IEEE, vol. 70, no. 9, pp. 939–952, Sep. 1982,
doi: 10.1109/PROC.1982.12425.
[17] O. Ammor, et al., “Optimal Fuzzy Clustering in Overlapping Clusters,” The International Arab Journal of
Information Technology, vol. 5, no. 4, p. 7, 2008.
[18] T. F. Abidin, et al., “Singular Value Decomposition for dimensionality reduction in unsupervised text learning
problems,” 2nd International Conference on Education Technology and Computer, Shanghai, China, 2010, pp. V4-
422-V4-426, doi: 10.1109/ICETC.2010.5529649.
[19] Z. Rustam and S. Hartini, “New feature selection based on kernel,” Bull. Electr. Eng. Inform., vol. 9, no. 4,
pp. 1569–1577, Aug. 2020, doi: 10.11591/eei.v9i4.1959.
[20] O. A. Adegbola, et al., “A principal component analysis-based feature dimensionality reduction scheme for
content-based image retrieval system,” TELKOMNIKA Telecommun. Comput. Electron. Control, vol. 18, no. 4, pp.
1892–1896, Aug. 2020, doi: 10.12928/TELKOMNIKA.v18i4.11176.
[21] K. M. Rao, et al., “Dimensionality reduction and hierarchical clustering in framework for hyperspectral image
segmentation,” Bull. Electr. Eng. Inform., vol. 8, no. 3, pp. 1081–1087, Sep. 2019, doi:
10.11591/eei.v8i3.1451.
[22] C. E. Shannon, “A Mathematical Theory of Communication,” ACM SIGMOBILE mobile computing and
communications review, vol. 5, no. 1, pp. 3-55, 2001.
[23] L. E. Fattahi and E. H. Sbai, “Kernel entropy principal component analysis using Parzen estimator,” Int. Conf.
Intell. Syst. Comput. Vis. ISCV, pp. 1–8, 2018.
[24] B. W. Silverman, “Density Estimation for Statistics and Data Analysis,” Boca Raton: Chapman and Hall/CRC,
1986.
[25] E. Parzen, “On Estimation of a Probability Density Function and Mode,” Ann. Math. Stat., vol. 33, no. 3, pp. 1065–
1076, Sep. 1962.
[26] M. Rosenblatt, “Remarks on Some Nonparametric Estimates of a Density Function,” Ann. Math. Stat., vol. 27, no.
3, pp. 832–837, Sep. 1956.
[27] “UCI Machine Learning Repository.” [Online]. Available: https://archive.ics.uci.edu/ml/index.php.
[28] D. J. Ketchen and C. L. Shook, “The Application of Cluster Analysis in Strategic Management Research: An
Analysis and Critique,” Strateg. Manag. J., vol. 17, no. 6, pp. 441–458, 1996.
[29] M. Charytanowicz, et al., “A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images,”
Information Technologies in Biomedicine, Springer, Berlin, Heidelberg, pp. 15–24, 2010.
[30] A. Koita, D. Daucher, and M. Fogli, “Multidimensional risk assessment for vehicle trajectories by using
copulas,” ICOSSAR, France, 2013.
[31] A. Boubezoul, et al., “Vehicle trajectories classification using Support Vectors Machines for failure trajectory
prediction,” 2009 International Conference on Advances in Computational Tools for Engineering Applications.,
pp. 486–491, 2009, doi: 10.1109/ACTEA.2009.5227873.
BIOGRAPHIES OF AUTHORS
L. El Fattahi received a master's degree in signals, systems and informatics from the Faculty of
Sciences, Dhar El Mehraz, University Sidi Mohamed Ben Abdallah, Fez, Morocco, in 2015. She
has been working on her PhD thesis at the University of Moulay Ismaïl, Meknes, Morocco, within
her research team "Control Systems and Information Processing (CSIP)" at the High School of
Technology of Meknes. Her scientific interests include data analysis, artificial intelligence,
cluster analysis, image processing and machine learning.
S. El Hassan received the Doctorate degree (Doctorat de 3ème cycle) from the University of
Sidi Mohammed Ben Abdellah, Fez, Morocco, in 1992, and the Doctorate of Sciences (Doctorat
d'Etat) from the University of Moulay Ismaïl, Meknes, Morocco, in 2001. Since 1993, he has been a
Professor of automatics and informatics at the High School of Technology of Meknes. His main
research interests are cluster analysis, pattern recognition, machine learning and computer vision.