The document analyzes crop yield data from spatial locations in Guntur District, Andhra Pradesh, India using hybrid data mining techniques. It first applies k-means clustering to the dataset, producing 5 clusters. It then applies the J48 classification algorithm to the clustered data, resulting in a decision tree that predicts cluster membership based on attributes like crop type, irrigated area, and latitude. Analysis found irrigated areas of cotton and chilies increased from 2007-2008 to 2011-2012. Association rule mining on the clustered data also found relationships between productivity and location attributes. The hybrid approach of clustering followed by classification effectively analyzed the spatial agricultural data.
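The cluster-then-classify pipeline described above can be illustrated with a small sketch: run k-means, then fit a one-node decision stump (standing in for J48) that predicts cluster membership from the irrigated-area attribute. The data values below are hypothetical stand-ins, not the Guntur records, and the stump is a deliberate simplification of a full decision tree.

```python
import numpy as np

# Hypothetical stand-ins for the crop records: [irrigated_area, latitude].
# (The real Guntur dataset is not reproduced in the text.)
X = np.array([[120.0, 16.1], [130.0, 16.2], [125.0, 16.15],
              [480.0, 15.7], [500.0, 15.8], [490.0, 15.75]])

def kmeans(X, k, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)                 # nearest-center assignment
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

labels, centers = kmeans(X, k=2)

# Stage 2: a one-node decision stump standing in for J48 -- predict cluster
# membership from the irrigated-area attribute alone.
threshold = centers[:, 0].mean()               # midpoint between the cluster means
pred = (X[:, 0] > threshold).astype(int)
agreement = (pred == labels).mean()
agreement = max(agreement, 1.0 - agreement)    # cluster ids may be flipped
```

On well-separated data like this, the stump recovers the cluster structure exactly, which is the sense in which a classifier "predicts cluster membership" in the hybrid approach.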
Extended PSO Algorithm for Improving the K-Means Clustering Algorithm (IJMIT Journal)
Clustering is an unsupervised process and one of the most common data mining techniques. Its purpose is to group similar data together, so that instances within a cluster are as similar to each other as possible and as different as possible from instances in other clusters. In this paper we focus on partitional k-means clustering, which, owing to its ease of implementation and high speed on large data sets, remains very popular among clustering algorithms thirty years after its introduction. To address the tendency of k-means to become trapped in local optima, we propose an extended PSO algorithm, named ECPSO. The new algorithm is able to escape local optima and produces the problem's optimal answer with high probability. Experimental results show that the proposed algorithm outperforms other clustering algorithms, particularly on two indices: clustering accuracy and clustering quality.
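The abstract does not reproduce ECPSO itself, but the PSO-for-clustering idea it builds on can be sketched: each particle encodes a full set of k candidate centroids, fitness is the total distance of points to their nearest centroid, and the swarm's personal and global memories let candidates move past a poor local optimum. The parameter values here (inertia 0.7, acceleration coefficients 1.5) are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Two well-separated synthetic clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(4, 0.3, (20, 2))])
k, n_particles = 2, 8

def fitness(centers):
    # Sum of distances from each point to its nearest candidate centroid.
    d = np.linalg.norm(X[:, None] - centers[None], axis=2)
    return d.min(axis=1).sum()

pos = rng.uniform(X.min(), X.max(), (n_particles, k, 2))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_f = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_f.argmin()].copy()
init_best = pbest_f.min()

for _ in range(60):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel
    f = np.array([fitness(p) for p in pos])
    improved = f < pbest_f                     # update each particle's memory
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[pbest_f.argmin()].copy()

best_sse = pbest_f.min()                       # never worse than the initial best
```

Because the personal bests are updated greedily, the swarm's best fitness is monotonically non-increasing, which is the escape mechanism a plain k-means restart lacks.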
Experimental Study of Data Clustering Using k-Means and Modified Algorithms (IJDKP)
The k-means clustering algorithm is an old algorithm that has been intensively researched owing to its ease and simplicity of implementation. Clustering algorithms have broad appeal and usefulness in exploratory data analysis. This paper presents the results of an experimental study of different approaches to k-means clustering, comparing results on different datasets using the original k-means and other modified algorithms implemented in MATLAB R2009b. The results are evaluated on several performance measures, such as number of iterations, number of points misclassified, accuracy, Silhouette validity index, and execution time.
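Of the performance measures listed, the Silhouette validity index is the least self-explanatory: for each point, s(i) = (b - a) / max(a, b), where a is the mean distance to points in its own cluster and b the mean distance to the nearest other cluster. A minimal sketch of its definition:

```python
import numpy as np

# Two tight, well-separated clusters -> silhouette score close to 1.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [8.0, 8.0], [8.0, 9.0], [9.0, 8.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

def silhouette(X, labels):
    D = np.linalg.norm(X[:, None] - X[None], axis=2)   # pairwise distances
    s = []
    for i in range(len(X)):
        same = (labels == labels[i])
        same[i] = False                                # exclude the point itself
        a = D[i, same].mean()                          # mean intra-cluster distance
        b = min(D[i, labels == c].mean()               # nearest other cluster
                for c in set(labels.tolist()) if c != labels[i])
        s.append((b - a) / max(a, b))
    return float(np.mean(s))

score = silhouette(X, labels)
```

Scores near 1 indicate compact, well-separated clusters; scores near 0 indicate overlapping ones.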
Particle Swarm Optimization Based K-Prototype Clustering Algorithm (iosrjce)
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm (aciijournal)
In this paper we attempt to solve an automatic clustering problem by optimizing multiple objectives concurrently: automatic determination of k and a set of cluster validity indices (CVIs). The proposed automatic clustering technique uses the recent Jaya optimization algorithm as its underlying optimization strategy. This evolutionary technique aims to attain the global best solution rather than a local best solution, even on larger datasets. The exploration and exploitation built into the proposed work detect the number of clusters automatically, find appropriate partitionings of the data sets, and reach near-optimal values on the CVI frontiers. Twelve datasets of different complexity are used to validate the performance of the proposed algorithm. The experiments show that the theoretical advantages of multi-objective clustering optimized with evolutionary approaches translate into realistic and scalable performance benefits.
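The abstract does not reproduce the Jaya update rule, but its standard form is parameter-free: each candidate moves toward the current best and away from the current worst, x' = x + r1·(best − |x|) − r2·(worst − |x|), followed by greedy selection. A sketch on a toy sphere objective (a stand-in, not one of the paper's twelve datasets):

```python
import numpy as np

# Minimise the sphere function f(x) = sum(x^2) with the Jaya update rule.
rng = np.random.default_rng(0)
f = lambda P: (P ** 2).sum(axis=1)

P = rng.uniform(-5, 5, (10, 3))                # population of 10 candidates
init_best = f(P).min()
for _ in range(200):
    fit = f(P)
    best, worst = P[fit.argmin()], P[fit.argmax()]
    r1, r2 = rng.random(P.shape), rng.random(P.shape)
    # Move toward the best solution and away from the worst one.
    trial = P + r1 * (best - np.abs(P)) - r2 * (worst - np.abs(P))
    keep = f(trial) < fit                      # greedy selection: accept improvements
    P[keep] = trial[keep]

best_val = f(P).min()
```

Greedy selection makes the population's best value monotonically non-increasing, which is why Jaya needs no algorithm-specific tuning parameters beyond population size and iteration count.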
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Optimising Data Using the K-Means Clustering Algorithm (IJERA)
K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set into a certain number of clusters (say k clusters) fixed a priori. The main idea is to define k centroids, one for each cluster. These centroids should be placed carefully, because different locations lead to different results; the better choice is therefore to place them as far away from each other as possible.
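The "as far away from each other as possible" heuristic can be made concrete with a deterministic farthest-point initialisation (the same idea that k-means++ refines probabilistically). The data here is a made-up example with three natural corners:

```python
import numpy as np

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0],
              [5.1, 4.9], [0.0, 5.0], [0.2, 5.1]])

def farthest_point_init(X, k):
    centers = [X[0]]                            # arbitrary first choice
    for _ in range(k - 1):
        # Distance of every point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])           # pick the farthest point
    return np.array(centers)

centers = farthest_point_init(X, k=3)
```

Each chosen centroid ends up in a different corner of the data, so no two initial centroids start inside the same cluster.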
Fuzzy Clustering and Fuzzy C-Means Partition Cluster Analysis and Validation ... (IJECEIAES)
A hard partition clustering algorithm assigns an equally distant point to exactly one cluster, even though such a datum could plausibly belong to several clusters simultaneously. Fuzzy cluster analysis instead assigns membership coefficients to data points that are equidistant between two clusters, so such points belong to more than one cluster at the same time. For a subset of the CiteScore dataset, the fuzzy clustering (fanny) and fuzzy c-means (fcm) algorithms were implemented to study data points that lie equally distant from each other. Before analysis, the clusterability of the dataset was evaluated with the Hopkins statistic, which yielded 0.4371, a value < 0.5, indicating that the data is highly clusterable. The optimal number of clusters was determined using the NbClust package, where nine different indices proposed a three-cluster solution as the best. Further, an appropriate value of the fuzziness parameter m was evaluated to determine the distribution of membership values, varying m from 1 to 2. The coefficient of variation (CV), also known as relative variability, was evaluated to study the spread of the data. Finally, the time complexity of the fanny and fuzzy c-means algorithms was evaluated by keeping the number of data points constant and varying the number of clusters.
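The membership behaviour the abstract describes follows directly from the fuzzy c-means membership formula: with fuzzifier m, u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)), so memberships sum to 1 and a point exactly between two centres gets 0.5 in each. A one-dimensional sketch:

```python
import numpy as np

m = 2.0                                        # fuzzifier
centers = np.array([[0.0], [4.0]])
X = np.array([[0.0], [2.0], [4.0]])            # 2.0 is exactly equidistant

# Distances of each point to each centre, guarded against division by zero.
d = np.abs(X - centers.T) + 1e-12

# u[i, j] = 1 / sum_k (d[i, j] / d[i, k]) ** (2 / (m - 1))
u = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1))).sum(axis=2)
```

The point at 2.0 receives membership 0.5 in both clusters, while the points sitting on a centre receive membership close to 1 in that cluster; this is exactly the "belongs to more than one cluster at the same time" behaviour that hard partitions cannot express.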
A PSO-Based Subtractive Data Clustering Algorithm (IJORCS)
There is a tremendous proliferation in the amount of information available on the largest shared information source, the World Wide Web. Fast and high-quality clustering algorithms play an important role in helping users effectively navigate, summarize, and organize this information. Recent studies have shown that partitional clustering algorithms such as k-means are the most popular algorithms for clustering large datasets. The major problem with partitional clustering algorithms is that they are sensitive to the selection of the initial partitions and are prone to premature convergence to local optima. Subtractive clustering is a fast, one-pass algorithm for estimating the number of clusters and the cluster centers for any given set of data; the cluster estimates can be used to initialize iterative optimization-based clustering methods and model identification methods. In this paper, we present a hybrid Subtractive + PSO (Particle Swarm Optimization) clustering algorithm that performs fast clustering. For comparison purposes, we applied the Subtractive + PSO, PSO, and Subtractive clustering algorithms to three different datasets. The results illustrate that the Subtractive + PSO clustering algorithm generates the most compact clustering results compared to the other algorithms.
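The one-pass subtractive step works on a "potential" function: each point's potential is a sum of Gaussian contributions from all other points, the highest-potential point becomes a centre, and potential near that centre is then subtracted before picking the next one. A minimal sketch with illustrative radii:

```python
import numpy as np

X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.2],
              [5.0, 5.0], [5.2, 5.1], [5.1, 4.9]])
ra, rb = 1.0, 1.5                              # neighbourhood radii (assumed values)

D2 = ((X[:, None] - X[None]) ** 2).sum(axis=2)
P = np.exp(-4 * D2 / ra**2).sum(axis=1)        # initial potential of every point

centers = []
for _ in range(2):
    i = P.argmax()                             # highest-potential point is a centre
    centers.append(X[i])
    P = P - P[i] * np.exp(-4 * D2[i] / rb**2)  # suppress potential near the centre
centers = np.array(centers)
```

Because the subtraction flattens the potential around the first centre, the second pick is forced into the other dense region; this is what makes the estimates good initial partitions for PSO or k-means.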
A Survey on Efficient Enhanced K-Means Clustering Algorithms (ijsrd.com)
Data mining is the process of using technology to identify patterns and prospects from large amounts of information. In data mining, clustering is an important research topic with a wide range of unsupervised classification applications. Clustering is a technique that divides data into meaningful groups. K-means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. In this paper, we present a comparison of different K-means clustering algorithms.
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R... (ijscmc)
Face recognition is one of the most unobtrusive biometric techniques and can be used for access control as well as surveillance purposes. Various methods for implementing face recognition have been proposed, with varying degrees of performance in different scenarios. The most common issue with facial biometric systems is their high susceptibility to variations in the face owing to factors like changes in pose, varying illumination, different expressions, and the presence of outliers and noise. This paper explores a novel technique for face recognition by classifying face images with an unsupervised learning approach, K-medoids clustering. The Partitioning Around Medoids (PAM) algorithm has been used to perform the K-medoids clustering of the data. The results suggest increased robustness to noise and outliers in comparison to other clustering methods. The technique can therefore be used to increase the overall robustness of a face recognition system, increase its invariance, and make it a reliably usable biometric modality.
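PAM's robustness to outliers comes from using actual data points as cluster representatives and accepting only swaps that lower the total assignment cost. A compact sketch of the swap loop on toy 2-D points (standing in for face-image feature vectors):

```python
import numpy as np

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
D = np.linalg.norm(X[:, None] - X[None], axis=2)   # pairwise distance matrix

def total_cost(medoids):
    # Sum of each point's distance to its nearest medoid.
    return D[:, medoids].min(axis=1).sum()

medoids = [0, 1]                               # deliberately bad start: same cluster
improved = True
while improved:
    improved = False
    for m in list(medoids):
        for h in range(len(X)):
            if h in medoids:
                continue
            trial = [h if x == m else x for x in medoids]  # swap m for h
            if total_cost(trial) < total_cost(medoids):
                medoids, improved = trial, True

cost = total_cost(medoids)
```

Starting with both medoids in one cluster, the swap phase moves one medoid into the other cluster and settles on one representative per group; since medoids are real points rather than means, a single extreme outlier cannot drag a representative away from its cluster.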
An H-K Clustering Algorithm for High-Dimensional Data Using Ensemble Learning (ijitcs)
Advances to traditional clustering algorithms address problems such as the curse of dimensionality and the sparsity of data with many attributes. The traditional H-K clustering algorithm removes the randomness and a-priori choice of the initial centers in the K-means clustering algorithm, but when applied to high-dimensional data it suffers from the dimensional-disaster problem due to high computational complexity. Advanced clustering algorithms such as subspace and ensemble clustering improve performance on high-dimensional datasets from different aspects and to different extents, yet each improves performance from only a single perspective. The objective of the proposed model is to improve on traditional H-K clustering and overcome its limitations, namely high computational complexity and poor accuracy on high-dimensional data, by combining three different approaches: subspace clustering and ensemble clustering together with H-K clustering.
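The paper's combined model is not specified here, but the baseline H-K idea it extends, using a hierarchical pass to seed k-means so the initial centers are no longer random, can be sketched as follows (a simple centroid-linkage merge is an assumption for brevity):

```python
import numpy as np

X = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0],
              [5.2, 5.0], [0.0, 5.0], [0.2, 5.0]])
k = 3

# Hierarchical stage: agglomeratively merge the two closest groups
# (by centroid distance) until exactly k groups remain.
clusters = [[i] for i in range(len(X))]
while len(clusters) > k:
    means = np.array([X[c].mean(axis=0) for c in clusters])
    D = np.linalg.norm(means[:, None] - means[None], axis=2)
    np.fill_diagonal(D, np.inf)
    i, j = np.unravel_index(D.argmin(), D.shape)
    clusters[i] += clusters[j]
    del clusters[j]

centers = np.array([X[c].mean(axis=0) for c in clusters])

# K-means stage: refine the deterministic seeds.
for _ in range(5):
    labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
```

The result is the same on every run, which is exactly the "randomness and a-priority of the initial centers" problem that H-K removes; the quadratic distance matrix in the merge loop is also where the high-dimensional cost the abstract complains about comes from.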
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING (cscpconf)
Data clustering is a process of arranging similar data into groups. A clustering algorithm partitions a data set into several groups such that the similarity within a group is greater than that among groups. In this paper a hybrid clustering algorithm based on K-means and K-harmonic means (KHM) is described. The proposed algorithm is tested on five different datasets. The research is focused on fast and accurate clustering. Its performance is compared with the traditional K-means and KHM algorithms, and the results obtained from the proposed hybrid algorithm are much better than those of the traditional K-means and KHM algorithms.
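What distinguishes KHM from k-means is its objective: instead of each point counting only its nearest centre, KHM scores each point by the harmonic mean of its distances to all centres, KHM(X, C) = Σ_i k / Σ_j (1 / d_ij^p). A sketch of evaluating that objective (p = 2 is an illustrative choice):

```python
import numpy as np

p = 2                                          # distance exponent (assumed)
X = np.array([[0.0], [1.0], [9.0], [10.0]])    # two pairs of 1-D points
centers = np.array([[0.5], [9.5]])             # one centre per pair

# d[i, j] = distance from point i to centre j, guarded against zero.
d = np.abs(X - centers.T) + 1e-12

# Harmonic-mean score per point, summed over the dataset.
khm = (len(centers) / (1.0 / d**p).sum(axis=1)).sum()
```

Because every centre contributes to every point's score, a badly placed centre still receives gradient-like pressure from all points, which is why KHM is far less sensitive to initialisation than plain k-means and why hybridising the two is attractive.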
Feature selection using modified particle swarm optimisation for face recogni... (eSAT Journals)
Abstract
One of the major factors influencing classification accuracy is the selection of the right features. Not all features play a vital role in classification; many features in a dataset may be redundant or irrelevant, which increases the computational cost and may reduce the classification rate. In this paper, we use DCT (Discrete Cosine Transform) coefficients as features for a face recognition application. The coefficients are optimally selected by a modified PSO algorithm, in which the choice of coefficients incorporates the average of the mean normalized standard deviations of the various classes and gives more weight to the lower-indexed DCT coefficients. The algorithm is tested on the ORL database. A recognition rate of 97% is obtained, the average number of features selected is about 40 percent for a 10 × 10 input, and the modified PSO took about 50 iterations to converge. These performance figures are found to be better than some of the work reported in the literature.
Keywords: Particle swarm optimization, Discrete cosine transform, feature extraction, feature selection, face recognition, classification rate.
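The "weight the lower-indexed DCT coefficients" intuition can be illustrated without the PSO step: for a smooth image, the 2-D DCT packs almost all energy into the low-index corner, so a fixed low-frequency mask (a crude stand-in for the paper's PSO-selected subset) already retains most of the signal. The triangular mask and toy image below are assumptions for illustration only.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis: C[k, m] = s_k * cos(pi * (m + 0.5) * k / n).
    M = np.cos(np.pi / n * (np.arange(n)[:, None] + 0.5) * np.arange(n)[None])
    M = M.T * np.sqrt(2.0 / n)
    M[0] /= np.sqrt(2.0)
    return M

n = 10
img = np.outer(np.linspace(0, 1, n), np.linspace(0, 1, n))  # smooth toy "image"
C = dct_matrix(n)
coeffs = C @ img @ C.T                         # separable 2-D DCT-II

i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
mask = (i + j) < 6                             # keep only low-index coefficients
kept = coeffs * mask

# Fraction of the image's energy captured by the low-index coefficients.
energy_ratio = (kept ** 2).sum() / (coeffs ** 2).sum()
```

Most of the energy survives the cut, which is why selecting roughly 40 percent of the coefficients, biased toward low indices, can still support a high recognition rate.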
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
A Novel Approach for Clustering Big Data Based on MapReduce (IJECEIAES)
Clustering is one of the most important applications of data mining and has attracted the attention of researchers in statistics and machine learning. It is used in many applications such as information retrieval, image processing, and social network analytics. It helps the user understand the similarity and dissimilarity between objects, and cluster analysis makes complex and large data sets easier to understand. Various researchers have analyzed different types of clustering algorithms. K-means is the most popular partitioning-based algorithm, as it gives accurate results on numerical data, but it gives good results for numerical data only. Big data is a combination of numerical and categorical data, and the k-prototype algorithm deals with both by combining the distances calculated from the numeric and categorical attributes. With the growth of data due to social networking websites, business transactions, scientific computation, etc., there are vast collections of structured, semi-structured, and unstructured data, so k-prototype needs to be optimized so that these varieties of data can be analyzed efficiently. In this work, the k-prototype algorithm is implemented on MapReduce. Experiments have proved that k-prototype on MapReduce gives better performance gains on multiple nodes than on a single node; CPU execution time and speedup are used as evaluation metrics for comparison. An intelligent splitter is also proposed, which splits mixed big data into its numerical and categorical parts. Comparison with traditional algorithms proves that the proposed algorithm works better at large scales of data.
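The mixed-type distance at the heart of k-prototypes is simple: squared Euclidean distance on the numeric attributes plus a weight gamma times the number of mismatched categorical attributes. A sketch of that measure (gamma and the record values are hypothetical):

```python
import numpy as np

gamma = 1.5                                    # numeric/categorical trade-off weight

def kproto_dist(x_num, x_cat, c_num, c_cat):
    num = ((x_num - c_num) ** 2).sum()         # squared Euclidean part
    cat = (x_cat != c_cat).sum()               # simple matching dissimilarity
    return num + gamma * cat

x_num, x_cat = np.array([1.0, 2.0]), np.array(["cotton", "irrigated"])
c_num, c_cat = np.array([1.0, 4.0]), np.array(["chilli", "irrigated"])

d = kproto_dist(x_num, x_cat, c_num, c_cat)    # 4.0 numeric + 1 mismatch * 1.5
```

Because the distance decomposes per record, the map phase of a MapReduce implementation can compute each record's nearest prototype independently, which is what makes the algorithm parallelise well across nodes.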
Mine Blood Donors Information through Improved K-Means Clusteringijcsity
The number of accidents and health diseases which are increasing at an alarming rate are resulting in a huge increase in the demand for blood. There is a necessity for the organized analysis of the blood donor database or blood banks repositories. Clustering analysis is one of the data mining applications and K-means clustering algorithm is the fundamental algorithm for modern clustering techniques. K-means clustering algorithm is traditional approach and iterative algorithm. At every iteration, it attempts to find the distance from the centroid of each cluster to each and every data point. This paper gives the improvement to the original k-means algorithm by improving the initial centroids with distribution of data. Results and discussions show that improved K-means algorithm produces accurate clusters in less computation time to find the donors information
In recent machine learning community, there is a trend of constructing a linear logarithm version of
nonlinear version through the ‘kernel method’ for example kernel principal component analysis, kernel
fisher discriminant analysis, support Vector Machines (SVMs), and the current kernel clustering
algorithms. Typically, in unsupervised methods of clustering algorithms utilizing kernel method, a
nonlinear mapping is operated initially in order to map the data into a much higher space feature, and then
clustering is executed. A hitch of these kernel clustering algorithms is that the clustering prototype resides
in increased features specs of dimensions and therefore lack intuitive and clear descriptions without
utilizing added approximation of projection from the specs to the data as executed in the literature
presented. This paper aims to utilize the ‘kernel method’, a novel clustering algorithm, founded on the
conventional fuzzy clustering algorithm (FCM) is anticipated and known as kernel fuzzy c-means algorithm
(KFCM). This method embraces a novel kernel-induced metric in the space of data in order to interchange
the novel Euclidean matric norm in cluster prototype and fuzzy clustering algorithm still reside in the space
of data so that the results of clustering could be interpreted and reformulated in the spaces which are
original. This property is used for clustering incomplete data. Execution on supposed data illustrate that
KFCM has improved performance of clustering and stout as compare to other transformations of FCM for
clustering incomplete data.
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSEditor IJCATR
This paper presents a hybrid data mining approach based on supervised learning and unsupervised learning to identify the closest data patterns in the data base. This technique enables to achieve the maximum accuracy rate with minimal complexity. The proposed algorithm is compared with traditional clustering and classification algorithm and it is also implemented with multidimensional datasets. The implementation results show better prediction accuracy and reliability.
Ensemble based Distributed K-Modes ClusteringIJERD Editor
Clustering has been recognized as the unsupervised classification of data items into groups. Due to the explosion in the number of autonomous data sources, there is an emergent need for effective approaches in distributed clustering. The distributed clustering algorithm is used to cluster the distributed datasets without gathering all the data in a single site. The K-Means is a popular clustering method owing to its simplicity and speed in clustering large datasets. But it fails to handle directly the datasets with categorical attributes which are generally occurred in real life datasets. Huang proposed the K-Modes clustering algorithm by introducing a new dissimilarity measure to cluster categorical data. This algorithm replaces means of clusters with a frequency based method which updates modes in the clustering process to minimize the cost function. Most of the distributed clustering algorithms found in the literature seek to cluster numerical data. In this paper, a novel Ensemble based Distributed K-Modes clustering algorithm is proposed, which is well suited to handle categorical data sets as well as to perform distributed clustering process in an asynchronous manner. The performance of the proposed algorithm is compared with the existing distributed K-Means clustering algorithms, and K-Modes based Centralized Clustering algorithm. The experiments are carried out for various datasets of UCI machine learning data repository.
A Threshold Fuzzy Entropy Based Feature Selection: Comparative StudyIJMER
Feature selection is one of the most common and critical tasks in database classification. It
reduces the computational cost by removing insignificant and unwanted features. Consequently, this
makes the diagnosis process accurate and comprehensible. This paper presents the measurement of
feature relevance based on fuzzy entropy, tested with Radial Basis Classifier (RBF) network,
Bagging(Bootstrap Aggregating), Boosting and stacking for various fields of datasets. Twenty
benchmarked datasets which are available in UCI Machine Learning Repository and KDD have been
used for this work. The accuracy obtained from these classification process shows that the proposed
method is capable of producing good and accurate results with fewer features than the original
datasets.
A Novel Approach to Mathematical Concepts in Data Miningijdmtaiir
-This paper describes three different fundamental
mathematical programming approaches that are relevant to
data mining. They are: Feature Selection, Clustering and
Robust Representation. This paper comprises of two clustering
algorithms such as K-mean algorithm and K-median
algorithms. Clustering is illustrated by the unsupervised
learning of patterns and clusters that may exist in a given
databases and useful tool for Knowledge Discovery in
Database (KDD). The results of k-median algorithm are used
to collecting the blood cancer patient from a medical database.
K-mean clustering is a data mining/machine learning algorithm
used to cluster observations into groups of related observations
without any prior knowledge of those relationships. The kmean algorithm is one of the simplest clustering techniques
and it is commonly used in medical imaging, biometrics and
related fields.
Data mining is a process to extract information from a huge amount of data and transform it into an
understandable structure. Data mining provides the number of tasks to extract data from large databases such
as Classification, Clustering, Regression, Association rule mining. This paper provides the concept of
Classification. Classification is an important data mining technique based on machine learning which is used to
classify the each item on the bases of features of the item with respect to the predefined set of classes or groups.
This paper summarises various techniques that are implemented for the classification such as k-NN, Decision
Tree, Naïve Bayes, SVM, ANN and RF. The techniques are analyzed and compared on the basis of their
advantages and disadvantages
Data imputing uses to posit missing data values, as missing data have a negative effect on the computation validity of models. This study develops a genetic algorithm (GA) to optimize imputing for missing cost data of fans used in road tunnels by the Swedish Transport Administration (Trafikverket). GA uses to impute the missing cost data using an optimized valid data period. The results show highly correlated data (R- squared 0.99) after imputing the missing data. Therefore, GA provides a wide search space to optimize imputing and create complete data. The complete data can be used for forecasting and life cycle cost analysis. Ritesh Kumar Pandey | Dr Asha Ambhaikar"Data Imputation by Soft Computing" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-4 , June 2018, URL: http://www.ijtsrd.com/papers/ijtsrd14112.pdf http://www.ijtsrd.com/computer-science/real-time-computing/14112/data-imputation-by-soft-computing/ritesh-kumar-pandey
Ch. Mallikarjuna Rao et al, Int. Journal of Engineering Research and Applications, www.ijera.com
ISSN: 2248-9622, Vol. 4, Issue 6 (Version 1), June 2014, pp. 01-07
Crop Yield Analysis of the Irrigated Areas of All Spatial Locations in Guntur District of AP
Ch. Mallikarjuna Rao1, Dr. A. Ananda Rao2
1 (Department of Computer Science and Engineering, Associate Professor, GRIET)
2 (Department of Computer Science and Engineering, Professor, JNTUA)
ABSTRACT
Spatial data mining is the process of discovering interesting and potentially useful spatial patterns embedded in spatial databases, which are voluminous in size. Efficient techniques for extracting information from geo-spatial data sets are important to sectors such as research, defense and private organizations that generate and manage large geo-spatial data sets. The current approach to spatial data mining problems is to use classical data mining techniques. Effective analysis was done with a hybrid data mining technique that combines clustering and classification. In this paper, crop yields of spatial locations in Guntur district were taken and studied using the hybrid technique.
Keywords: geo-spatial data sets, hybrid data mining technique, clustering, classification, spatial locations
I. Introduction
Indian agriculture is known for large fluctuations in crop yield, crop output and crop intensity. Despite growth in technology and irrigation, production and income remain highly unstable [11]. Guntur district is located in the state of Andhra Pradesh, along the east coast of the Bay of Bengal, with a coastline of approximately 100 kilometers. Guntur is the largest city and the administrative center of the district. The district has 57 mandals, from Guntur, Pedakakani and Prathipadu through to Nizampatnam. The major crops are cotton and chillies. The agricultural data sets need to be analyzed with classical data mining techniques in addition to statistical techniques. Data mining techniques [12] such as classification, clustering and association must be applied to the realistic data sets for analysis and conclusions on the agricultural crop yields [8, 9, 10] of seasons such as Kharif and Rabi. The following existing techniques are discussed, along with the proposed hybrid approach.
II. Literature Survey
K-Means clustering algorithm: The k-means algorithm [3] [Hartigan & Wong 1979] is a well-known clustering technique used in scientific and industrial applications. Its name comes from the centroids, the means c of the K clusters C. The technique is suited to numerical attributes rather than categorical attributes. K-means [1] uses a squared-error criterion and is an extensively used algorithm. The data is partitioned into K clusters (C1, C2, ..., CK), each represented by its center, or mean. The center of each cluster is the mean of all instances belonging to that cluster.
The pseudo-code of the K-means algorithm is given in Fig.1. The algorithm begins with a randomly selected initial set of cluster centers. In every iteration, each instance is assigned to its closest cluster center based on the Euclidean distance between the two; the cluster centers are then re-calculated. The number of instances belonging to cluster k is denoted by Nk, and the mean of cluster k is represented by µk.
Several convergence conditions are possible. If relocating the centers does not reduce the partitioning error, the search stops, since the current partition is locally optimal. Another stopping criterion is exceeding a pre-defined number of iterations.
Input: S (instance set), K (no. of clusters)
Output: clusters
1: Initialize K cluster centers.
2: while termination condition is not satisfied do
3: Assign instances to the nearest cluster center.
4: Update cluster centers based on the assignment.
5: end while
Figure1. K-means Algorithm
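As a concrete illustration, the loop of Fig.1 can be sketched in Python. This is a minimal version assuming 2-D points, Euclidean distance, and termination when no center moves; the names and toy data are ours, not the paper's.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal K-means following Fig.1: initialize, assign, update, repeat."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)                 # 1: initialize K centers
    for _ in range(max_iter):                       # 2: until termination
        clusters = [[] for _ in range(k)]
        for p in points:                            # 3: assign to nearest center
            j = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[j].append(p)
        new_centers = [                             # 4: center = mean of cluster
            tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:                  # stop: no center moved
            return centers, clusters
        centers = new_centers
    return centers, clusters

# Two well-separated groups of 3 points each
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(pts, k=2)
```

On this toy input the two recovered centers are the group means, illustrating why the result depends on the initial centers only up to the local optimum reached.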
This algorithm may be viewed as a gradient-descent procedure: it begins with a random initial set of K cluster centers and iteratively updates them so that the error function decreases. A rigorous proof of the finite convergence
of the K-means type algorithms is given in [4]. The complexity of T iterations of the K-means algorithm, performed on a sample of m instances each characterized by N attributes, is O(T * K * m * N). This linear complexity is one of the reasons for the popularity of K-means: even when the number of instances is substantially large (which often is the case nowadays), the algorithm remains computationally attractive. Thus, the K-means algorithm has an advantage over other clustering methods with non-linear complexity (e.g. hierarchical clustering methods). Other reasons for the algorithm's popularity are its ease of interpretation, simplicity of implementation, speed of convergence and adaptability to sparse data [5]. The Achilles' heel of the K-means algorithm is the selection of the initial partition: the algorithm is very sensitive to this selection, which may make the difference between a global and a local minimum. Being a typical partitioning algorithm, K-means works well only on data sets with isotropic clusters and is not as versatile as, for instance, single-link algorithms.
In addition, the algorithm is sensitive to noisy data and outliers (a single outlier can increase the squared error dramatically); it is applicable only when the mean is defined (namely, for numeric attributes); and it requires the number of clusters in advance, which is not trivial when no prior knowledge is available. The use of the K-means algorithm is therefore often limited to numeric attributes. The similarity measures for numeric and categorical attributes differ: in the first case the squared Euclidean distance is taken, whereas in the second case the number of mismatches between objects and the cluster prototypes is taken.
Another partitioning algorithm that attempts to minimize the SSE is K-medoids, or PAM (partitioning around medoids [2]). It is similar to the K-means algorithm but differs mainly in how the clusters are represented: each cluster is represented by its most central object, rather than by an implicit mean that may not belong to the cluster. This makes the method more robust than K-means in the presence of noise and outliers, since the influence of outliers is smaller; the processing cost of K-means, however, is lower. Both methods require the number of clusters K to be declared in advance. It is not mandatory to use the SSE: Estivill-Castro (2000) analyzed the total absolute error criterion and argued that summing the absolute error is better than summing the squared error. That criterion is superior in terms of robustness, but it requires more computational effort. The objective function is defined as the sum of discrepancies between each point and its centroid, expressed through an appropriate distance. The norm-based objective function is the total intra-cluster variance, defined as the sum of squared errors between the points and the corresponding centroids,
E(C) = Σ_{j=1}^{K} Σ_{x_i ∈ C_j} ‖x_i − c_j‖²
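Given a partition and its centers, this criterion can be computed directly; a small sketch (the helper name `sse` is ours):

```python
import math

def sse(clusters, centers):
    """E(C): sum over clusters j and points x in C_j of ||x - c_j||^2."""
    return sum(
        math.dist(x, centers[j]) ** 2
        for j, cluster in enumerate(clusters)
        for x in cluster
    )

# One cluster of two points around center (0, 1): error = 1 + 1 = 2
total = sse([[(0, 0), (0, 2)]], [(0, 1)])
```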
This criterion can be rationalized as the log-likelihood of a normally distributed mixture model and is extensively used in statistics; it can therefore be derived from the general framework of probability. Its iterative optimization has two versions. The first is similar to the EM algorithm, a data mining technique: each major iteration has two steps, which first reassign all points to their nearest centroids and then recompute the centroids of the newly formed groups. Iterations continue until a stopping criterion is met (for example, no reassignments happen). This version is known as Forgy's algorithm [6] and has many advantages: it works with any Lp-norm, it allows straightforward parallelization [5], and it is insensitive to data ordering. The second version reassigns points one at a time: if a move has a positive effect, the point is relocated and the two affected centroids are recomputed.
J48 algorithm: J48 [7] is an optimized, improved implementation of the C4.5 algorithm. Its output is a decision tree, a tree structure with a root node, intermediate nodes and leaf nodes. Every node other than the leaf nodes holds a decision that leads toward the result. The tree divides the space of the data set into mutually exclusive areas, and each area is given a label, a value or an action describing its data points. A splitting criterion is used to find which attribute makes the best split on the portion of the training data set that reaches a particular node. The decision tree is formed by repeatedly splitting on the chosen attributes of the data set.
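The splitting criterion can be made concrete. C4.5-style trees pick, at each node, the attribute with the best entropy-based score; plain information gain is shown here for brevity (C4.5 actually uses gain ratio), and the toy rows and helper names are ours:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Drop in label entropy after splitting the rows on attribute `attr`."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Toy node data: `crop` separates the clusters perfectly, `season` does not
rows = [{"crop": "Cotton", "season": "Kharif"},
        {"crop": "Cotton", "season": "Rabi"},
        {"crop": "Chillies", "season": "Kharif"},
        {"crop": "Chillies", "season": "Rabi"}]
labels = ["cluster3", "cluster3", "cluster1", "cluster1"]

best = max(["crop", "season"], key=lambda a: info_gain(rows, labels, a))
```

Here `crop` wins the split, which mirrors why the pruned tree reported later places crop at the root.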
III. Proposed Approach
The raw data set is converted to the required format, and the k-means clustering algorithm is applied to it, producing a new, clustered data set. The J48 classification technique is then applied to that clustered data set, which results in the hybrid model.
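The two stages can be sketched end-to-end. The paper uses Weka's SimpleKMeans and J48; purely for illustration, the sketch below clusters with a small K-means loop and then "classifies" an unseen point with a 1-nearest-neighbour stand-in for the decision tree (all names and toy data are ours):

```python
import math
import random

def kmeans_labels(points, k, iters=50, seed=1):
    """Stage 1 (unsupervised): tag every point with its K-means cluster id."""
    rng = random.Random(seed)
    centers = list(rng.sample(points, k))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def predict_1nn(train_x, train_y, query):
    """Stage 2 (supervised): predict the cluster id of an unseen point."""
    i = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[i]

# Cluster the "raw" data, then classify a new instance using the learned labels
pts = [(1.0, 1.0), (1.2, 0.9), (8.0, 8.0), (8.1, 7.9)]
cluster_ids = kmeans_labels(pts, k=2)
pred = predict_1nn(pts, cluster_ids, (7.5, 8.2))
```

The point of the hybrid is visible here: the cluster ids produced without supervision become the class labels that a supervised model is then trained to predict.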
IV. Implementation of the proposed approach
Irrigation data sets for the years 2007-08 and 2011-12 of Guntur district were considered for analysis with the k-means (simple k-means) clustering data mining technique [8, 9]. The data has 15 attributes, namely mandal, latitude, latitude1, longitude, longitude1, crop, area_irrigated, area_unirrigated, etc. Clustering is unsupervised; here missing values were replaced with the mean and mode, and discretization with 10 bins was applied. The full training set was used for the clustering model. The clusters are presented in Table 1 and the cluster proximities in Table 2. The cluster graph is shown below in Fig.2: the 5 clusters are shown in different colors, with instances on the x-axis and mandals on the y-axis. After applying K-means clustering, the hybrid model was developed by applying the J48 classification technique to the obtained clustered data set. The results are summarized as follows. The run used a minimum confidence of 0.25 and a binary tree classifier, on the same number of instances, i.e. 114, now with 19 attributes: Instance_number, mandal, latitude, latitude1, longitude, longitude1, crop, area_irrigated, area_unirrigated, area_total, etc. The J48 pruned tree is given below.
crop = Cotton
|   latitude <= 16.37184
|   |   area_unirrigated <= 3947: cluster4 (34.0/2.0)
|   |   area_unirrigated > 3947: cluster3 (6.0)
|   latitude > 16.37184: cluster3 (17.0/1.0)
crop = Chillies
|   productivity_unirrigated <= 3.239
|   |   latitude <= 16.340198
|   |   |   production_irrigated <= 11367.5: cluster2 (35.0/1.0)
|   |   |   production_irrigated > 11367.5: cluster1 (2.0)
|   |   latitude > 16.340198: cluster1 (15.0)
|   productivity_unirrigated > 3.239: cluster0 (5.0)
The pruned tree, with crop as the root node and clusters as the leaf nodes, is shown in Fig.3. Correctly classified instances number 103 (90.351%) and incorrectly classified instances 11 (9.649%). The Kappa statistic is 0.8716. Kappa is a chance-corrected measure of agreement between the classifications and the true classes: it is calculated by subtracting the agreement expected by chance from the observed agreement and dividing by the maximum possible agreement. A value greater than 0 means the classifier is doing better than chance; since the Kappa statistic here is near 1, the agreement is near perfect. Confusion matrix: in the confusion matrix, each column represents instances of the predicted class and each row represents instances of the actual class. These mining techniques were applied on the various data sets.
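The Kappa computation described above can be written out directly. The 2-class matrix below is a made-up example, not the paper's actual confusion matrix:

```python
def kappa(confusion):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance).

    Rows are actual classes, columns are predicted classes.
    """
    n = sum(sum(row) for row in confusion)
    size = len(confusion)
    p_observed = sum(confusion[i][i] for i in range(size)) / n
    p_chance = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(size)
    ) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)

# Illustrative matrix: 90 of 100 instances on the diagonal
m = [[45, 5],
     [5, 45]]
k = kappa(m)
```

With 90% observed agreement and 50% chance agreement, kappa is 0.8, i.e. well above chance but short of perfect agreement.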
V. Results & Analysis:
Initially k-means clustering technique was
applied on Guntur Irrigation data set for analysis.
Cluster 0: It was found that for the
PedanandipaduMandal centroids of the latitude is
16.4542 N, Longitude is 80.0421 E with crop Chillies
and area irrigated is 1531 area un-irrigated is 710
with area total is 1946 with highest productivity
_irrigated 5.5 and productivity_unirrigated is 5. It
also has highest production_irrigated and lowest
Production_unirrigated with Productivity _total and
Production_total as 0.
Clusters 1 and 2 also have Chillies as their crop,
and production_irrigated is highest in cluster 1; their
mandals are Mangalagiri and Chebrolu. Clusters 3
and 4 cover the mandals Guntur and Nagaram, with
Cotton as their crop. Cluster 3 has the highest
area_irrigated and area_unirrigated; it also has the
highest area_total and production_unirrigated among
the clusters. The confusion matrix shows that cluster
0 was classified most accurately, followed by cluster
2. In cluster 1, out of 18 instances 14 were classified
correctly, 3 were classified under cluster 2 and 1
under cluster 3. Clusters 3 and 4 were not classified
entirely correctly. The TP rate and F-measure for
cluster 0 are both 1, indicating it is classified
perfectly. Clusters 2 and 4 also have TP rate and
F-measure close to 1, so they are classified almost
accurately.
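The per-class TP rate (recall) and F-measure follow directly from the reported confusion matrix; a short sketch of the computation (only the matrix values come from the paper):

```python
def per_class_stats(matrix):
    """Per-class TP rate (recall) and F-measure from a confusion matrix
    whose rows are actual classes and columns are predicted classes."""
    stats = []
    for c in range(len(matrix)):
        tp = matrix[c][c]
        recall = tp / sum(matrix[c])                    # row total = actual count
        precision = tp / sum(row[c] for row in matrix)  # column total = predicted count
        f_measure = 2 * precision * recall / (precision + recall)
        stats.append((recall, f_measure))
    return stats

# Confusion matrix reported for the J48 run (clusters 0-4)
matrix = [[5, 0, 0, 0, 0],
          [0, 14, 3, 1, 0],
          [0, 0, 34, 0, 1],
          [0, 2, 0, 20, 1],
          [0, 0, 1, 2, 30]]
for c, (r, f) in enumerate(per_class_stats(matrix)):
    print(f"cluster{c}: TP rate={r:.3f}  F-measure={f:.3f}")
```

Cluster 0 indeed scores 1.0 on both measures, and clusters 2 and 4 come out close to 1.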
Fig.4 and Fig.5 show that the irrigated areas of
the major crops cotton and chillies across mandals
increased from the year 2007-08 to the year
2011-12. In these graphs the number of mandals is
taken along the x-axis and irrigated area along the
y-axis. Fig.6 shows that the cotton area in hectares is
higher than the chillies area in hectares; in this graph
the names of the mandals are taken along the x-axis
and area in hectares along the y-axis. Interesting
measures were found when the Apriori association
rule algorithm was applied on the clustered spatial
data set, and the resulting rules are shown in Fig.7
and Fig.8. As a hybrid technique, discretization was
first performed on the clustered data set and the
Apriori algorithm was then applied. It was found
that, when increasing minimum support from 0.1 to
0.4, then 0.6 and then 0.8 while keeping minimum
confidence constant at 0.9, the attributes latitude1 =
N, longitude1 = E and Productivity_total are
associated in 114 instances, and together with the
attribute productivity_irrigated they form association
rules with 99 instances. When minimum support and
minimum confidence both attain 0.9, or either takes
the value 0.9 and the other 1.0, or both take the
value 1.0, only the attributes longitude1 (E),
latitude1 (N) and Productivity_total are associated
together.
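The support/confidence behaviour described above can be illustrated with a toy Apriori pass over discretized transactions. This is a brute-force sketch (no candidate pruning, fine for tiny examples); the item names and transactions below are invented for illustration:

```python
from itertools import combinations

def apriori_rules(transactions, min_sup, min_conf):
    """Enumerate frequent itemsets by support, then emit rules A -> B
    whose confidence support(A u B) / support(A) meets min_conf."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = support(set(combo))
            if s >= min_sup:
                frequent[frozenset(combo)] = s

    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                # antecedents of a frequent itemset are themselves frequent
                conf = sup / frequent[frozenset(ante)]
                if conf >= min_conf:
                    rules.append((set(ante), itemset - frozenset(ante),
                                  round(conf, 2)))
    return rules

# Each transaction is the set of discretized attribute values of one record
tx = [{"latitude1=N", "longitude1=E", "prod=low"},
      {"latitude1=N", "longitude1=E", "prod=low"},
      {"latitude1=N", "longitude1=E", "prod=high"},
      {"latitude1=N", "longitude1=E", "prod=low"}]
for ante, cons, conf in apriori_rules(tx, min_sup=0.7, min_conf=0.9):
    print(ante, "->", set(cons), f"conf={conf}")
```

Raising `min_sup` thins out the frequent itemsets exactly as observed above: at high support and confidence thresholds, only the universally present items (here latitude1=N and longitude1=E) remain associated.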
4. Ch. Mallikarjuna Rao et al Int. Journal of Engineering Research and Applications www.ijera.com
ISSN : 2248-9622, Vol. 4, Issue 6( Version 1), June 2014, pp.01-07
Cluster centroids:
Attribute                 Full Data (114)  Cluster 0 (5)  Cluster 1 (18)  Cluster 2 (35)  Cluster 3 (23)  Cluster 4 (33)
mandal                    Guntur           Pedanandipadu  Mangalagiri     Chebrolu        Guntur          Nagaram
latitude                  16.2535          16.4542        16.4478         16.1354         16.4331         16.1173
latitude1                 N                N              N               N               N               N
longitude                 80.2683          80.0421        80.0966         80.3918         80.0998         80.3826
longitude1                E                E              E               E               E               E
crop                      Cotton           Chillies       Chillies        Chillies        Cotton          Cotton
area_irrigated            1476.8222        1531.7644      1737.3037       1108.6413       1901.1014       1421.202
area_unirrigated          1995.4521        710.4          970.7976        1391.1284       4536.2805       1619.1283
area_total                2566.5663        1946.8         2179.3889       1485.2265       5499.0681       1974.6685
productivity_irrigated    5.186            5.531          5.4436          5.4042          4.7033          5.0984
productivity_unirrigated  2.7223           5.0448         2.7092          2.7223          2.2873          2.6809
productivity_total        0                0              0               0               0               0
production_irrigated      11367.5          11772.9        13594.4167      10537.5143      11100.7391      11157.6061
production_unirrigated    9012.8438        3560           8729.3524       9012.8438       11117.8492      8526.5388
production_total          0                0              0               0               0               0
Table 1: Cluster table on Irrigation data set
Fig.2: Cluster graph
Fig.3: Pruned tree
Cluster-I       Cluster-II      Cluster-III     Cluster-IV      Cluster-V
5  1.682367     1  1.682367     1  1.682367     1  1.682367     1  1.682367
4  1.682367     3  2.236069     2  2.236069     3  2.236069     3  2.236069
2  1.682367     4  2.236069     4  2.236069     2  2.236069     4  2.236069
3  1.682367     5  2.236069     5  2.236069     5  2.236069     2  2.236069
Table 2: Clusters along with their proximities
The confusion matrix is given below:
 a  b  c  d  e   <-- classified as
 5  0  0  0  0 | a = cluster0
 0 14  3  1  0 | b = cluster1
 0  0 34  0  1 | c = cluster2
 0  2  0 20  1 | d = cluster3
 0  0  1  2 30 | e = cluster4
Fig.4: Cotton irrigated area across years and mandals
Fig.5: Chillies irrigated area across years and mandals
(Both charts plot the number of mandals, 1-57, on the x-axis against irrigated area on the y-axis, with separate series for Irrigated 2007-08 and Irrigated 2011-12.)
VI. Conclusion
There is a correlation of 0.753871 between the
cotton irrigated areas in the Irrigation 2007-08 and
Irrigation 2011-12 data, and a higher correlation of
0.869333 between the chillies irrigated areas in the
same data. These analyses are detailed in the Results
& Analysis section. As future scope, this hybrid
approach can be extended to other agricultural
spatial locations and to other agricultural yields for
effective analysis.
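The correlation values quoted above are plain Pearson coefficients between the two years' area series; a minimal sketch of the computation (the series here are illustrative, not the actual irrigation figures):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative irrigated-area series for two years (made-up values)
area_2007_08 = [1200, 1500, 900, 2100, 1750]
area_2011_12 = [1400, 1650, 1100, 2500, 1800]
print(round(pearson(area_2007_08, area_2011_12), 6))
```

Applying the same function to the per-mandal irrigated areas of the two crops would reproduce the 0.753871 and 0.869333 figures.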
References
[1] Pavel Berkhin, Survey of Clustering Data
Mining Techniques, Accrue Software, Inc.
[2] Kaufman, L. and Rousseeuw, P.J., Clustering
by Means of Medoids, in Y. Dodge (ed.),
Statistical Data Analysis Based on the L1
Norm, pp. 405-416, Elsevier/North Holland,
Amsterdam, 1987.
[3] Hartigan, J.A., Clustering Algorithms, John
Wiley and Sons, 1975.
[4] Selim, S.Z. and Ismail, M.A., K-means-type
algorithms: a generalized convergence
theorem and characterization of local
optimality, IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol.
PAMI-6, no. 1, January 1984.
[5] Dhillon, I. and Modha, D., Concept
Decomposition for Large Sparse Text Data
Using Clustering, Machine Learning, 42,
pp. 143-175, 2001.
[6] Forgy, E., Cluster analysis of multivariate
data: efficiency versus interpretability of
classification, Biometrics, 21, pp. 768-780,
1965.
[7] Yugal Kumar and G. Sahoo, Analysis of
Bayes, Neural Network and Tree Classifier
of Classification Technique in Data Mining
using WEKA.
[8] T.V. Rajini Kanth and Ananthoju Vijay
Kumar, Estimation of the Influence of
Fertilizer Nutrients Consumption on the
Wheat Crop Yield in India - a Data Mining
Approach, 30 Dec 2013, Volume 3, Issue 2,
pp. 316-320, ISSN: 2249-8958 (Online).
[9] T.V. Rajini Kanth and Ananthoju Vijay
Kumar, A Data Mining Approach for the
Estimation of Climate Change on the Jowar
Crop Yield in India, 25 Dec 2013, Volume 2,
Issue 2, pp. 16-20, ISSN: 2319-6378
(Online).
[10] A. Vijay Kumar, "Estimation of the
Influential Factors of Rice Yield in India",
2nd International Conference on Advanced
Computing Methodologies (ICACM-2013),
02-03 Aug 2013, Elsevier Publications,
pp. 459-465, ISBN: 978-93-35107-14-95.
[11] Ramesh Chand and S.S. Raju, Instability in
Andhra Pradesh Agriculture - A Disaggregate
Analysis, Agricultural Economics Research
Review, Vol. 21, July-December 2008,
pp. 283-288.
[12] D. Hand et al., Principles of Data Mining,
MIT Press, Massachusetts, 2001.
AUTHORS
Ch. Mallikarjuna Rao
received his B.Tech degree in Computer
Science and Engineering from Dr. Babasaheb
Ambedkar Marathwada University, Aurangabad,
Maharashtra, in 1998, and his M.Tech degree in
Computer Science and Engineering from J.N.T.U.
Anantapur, Andhra Pradesh, in 2007. He is currently
pursuing his Ph.D. degree from JNTU Anantapur
University, Andhra Pradesh. He is working as
Associate Professor in the Department of Computer
Science and Engineering at Gokaraju Rangaraju
Institute of Engineering and Technology, Hyderabad,
India. His research interests include databases and
data mining.
Dr. Ananda Rao Akepogu received his B.Tech
degree in Computer Science & Engineering
from University of Hyderabad, Andhra
Pradesh, India, and his M.Tech degree in A.I. &
Robotics from University of Hyderabad,
Andhra Pradesh, India. He received his Ph.D. degree
from Indian Institute of Technology Madras, Chennai,
India. He is Professor in the Computer Science &
Engineering Department and currently working as
Principal of JNTUA College of Engineering, Anantapur,
Jawaharlal Nehru Technological University, Andhra
Pradesh, India. Dr. Rao has published more than 100
papers in various national and international journals
and conferences. He received the Best Research Paper
award for the paper titled "An Approach to Test Case
Design for Cost Effective Software Testing" at an
International Conference on Software Engineering
held at Hong Kong, 18-20 March 2009. He also
received the Best Educationist Award for outstanding
achievements in the field of education from the
International Institute of Education & Management,
New Delhi, on 21st Jan. 2012. He bagged the Bharat
Vidya Shiromani Award from the Indian Solidarity
Council and the Rashtriya Vidya Gaurav Gold Medal
Award from the International Institute of Education &
Management, New Delhi, on 19th March 2012. Dr.
Rao received the Best Computer Science and
Engineering Faculty award from ISTE for the year
2013. His main research interests include software
engineering and data mining.