SlideShare a Scribd company logo
The International Journal Of Engineering And Science (IJES)
|| Volume || 5 || Issue || 10 || Pages || PP 35-39 || 2016 ||
ISSN (e): 2319 – 1813 ISSN (p): 2319 – 1805
www.theijes.com The IJES Page 35
Applying K-Means Clustering Algorithm to Discover Knowledge
from Insurance Dataset Using WEKA Tool
Dr. Abdelrahman Elsharif Karrar1
, Marwa Abdelhameed Abdalrahman2
,
Moez Mutasim Ali3
1
College of Computer Science and Engineering, Taibah University, Saudi Arabia
2
College of Computer Science and Information Technology, University of Science and Technology, Sudan
3
College of Computer Science and Information Technology, University of Science and Technology, Sudan
--------------------------------------------------------ABSTRACT-------------------------------------------------------------
Data mining works to extract information known in advance from the enormous quantities of data which can
lead to knowledge. It provides information that helps to make good decisions. The effectiveness of data mining
in access to knowledge to achieve the goal of which is the discovery of the hidden facts contained in databases
and through the use of multiple technologies. Clustering is organizing data into clusters or groups such that
they have high intra-cluster similarity and low inter cluster similarity. This paper deals with K-means clustering
algorithm which collect a number of data based on the characteristics and attributes of this data, and process
the Clustering by reducing the distances between the data center. This algorithm is applied using open source
tool called WEKA, with the Insurance dataset as its input.
Keywords: Clustering, Centroid, Data Mining, knowledge discovery, K-means, WEKA.
---------------------------------------------------------------------------------------------------------------------------------------
Date of Submission: 17 May 2016 Date of Accepted: 05 November 2016
--------------------------------------------------------------------------------------------------------------------------------------
I. INTRODUCTION
The present age is characterized by the use of advanced data technology to save and retrieve data and enormous
quantities and which is described data warehousing. These data provide open the door to a range of specialized
in the management of those data subjects was the most prominent topic of data mining, which is one of the
important methods to get useful information from the data. Scientists have known the term data mining as "part
of the process of knowledge discovery in databases, which are made using multiple methods its goal configure
models of data". [1]
Data mining techniques are designed to extract hidden information in the database, this modern technology has
imposed itself firmly in the information age and in the light of the great technological development and
widespread use of databases, they offer institutions in all areas the ability to explore and focus on the most
important information in the databases. Data mining techniques focus on predictions future, explore behavior
and trends allowing to take decisions correct and taken at the right time, also as the data mining techniques to
answer Many questions in record time, especially those questions that are difficult to answer them by using
methods statistical traditional. [2]
II. DATA MINING
Data mining technology is simply extract the important information from huge amount of information to follow
certain mechanisms analysis this information, or it’s a technique used in the process of extracting data from data
warehouses. Data Mining passes a number of stages starting from data cleansing, and standardization of data,
and the relevant test data, then transfer them, classify then evaluated and extract data. [3]
There are many definitions of the concept of data mining:
 Is a computerized search for the knowledge of data without prior assumptions about what can be this
knowledge?
 Is the process used by companies to convert the raw data into useful information?
 Analysis of large sets of data and summarize the data in the new forms be understandable and useful to its
users and is done in banks telecommunications companies commercial transactions and scientific data
(Biology - Astronomy).
 Process data analysis by linking them with artificial intelligence techniques and statistical process, is simply a
process of exploration and search for specific and useful information in a huge amount of data.
Applying K-means Clustering Algorithm to Discover Knowledge from Insurance Dataset Using…
www.theijes.com The IJES Page 36
 Search the relevant information together collected by common characteristics and linked unit subject or
specialization is then to find this information between a very large amounts of information that has no
relationship was presented to the decision maker. [3]
III. TECHNIQUES OR METHODS OF DATA MINING
Data mining process uses several techniques through which you can discover the hidden trends and models in
large amounts of data and can be used one or more of these techniques are as follows:
 Prediction: Use of available data and the application of certain techniques to give them values successful
future.
 Description: Process description available data to see their ratings by the presence of the relationships
between them.
 Classification: It is a set of data analysis to create a set of database assembled that can be used to classify
any future data to find information that relates to the common characteristics and classification many tools
such as decision tree, nearest neighbor and regression.
 Association: Is the database that includes fixed coupling relationships among a group of objects in the
database any association between the occurrence of an event and another event occurs, which is often
called the market basket analysis.
 Sequential Analysis: Which is similar to association and placed in the name the link, but analysis linked in
time in the Search for models occurs in any deal with the succession of data that occur in separate cases.
 Clustering: The idea of collecting data is a simple idea in nature and very close to the human way of
thinking where we are whenever we deal with a large amount of data tends to summarize the vast amount
of data into a small number of groups or categories in order to facilitate the process of analysis. Algorithms
assembly used widely not only for the organization and classification of data but also the data compression
and build a model arrangement.The process of clustering is the process of collecting objects or items that
possess the qualities and attributes are similar in groups called clusters. The process of clustering one of the
main roads in the process of data mining and can be used as a standalone tool to gain insight into how the
distribution of data, control characteristics of each group and focus on a specific set of groups so for further
analysis and can be as a preliminary step to the work of an elementary or other techniques such as
characterization and classification. [4]
IV. K-MEANS ALGORITHM
Is one of the clustering algorithms, it collect a number of data based on the characteristics and attributes of this
data and the process of the Clustering by reducing the distances between the data center. The steps of this
algorithm are:
 Determine the number of clusters K which is a step Initialize Preliminary.
 Determine the coordinates of the centers of clusters (Centroid) randomly for the first time and calculate the
average of the points that belong to the center for the rest of the times.
 Calculate the distance between each example and among all centers and is used Euclidean dimension. Given
the Euclidean distance ) ij
d
) between the two examples( i,, j ) the following relationship:
 Data collection (examples) with its nearest center.
 Repeat steps 2 through 4 until you get stability (and the absence of moving objects within the clusters), or
even repeating a certain number of times. [5]
Before you start in the presentation of the results of the application must describe the data which were used in
the search: this paper apply the algorithm (k-means) to the insurance company data to clarify the best payment
method that can be available to the customers, as the company suffered from the challenges and difficulties
limit their ability to deal with payment methods. This was done using a program WEKA.
V. WEKA
WEKA is a data mining system developed by the University of Waikato in New Zealand that implements data
mining algorithms using the JAVA language. WEKA is a state-ofthe-art facility for developing machine
learning (ML) techniques and their application to real-world data mining problems. It is a collection of machine
learning algorithms for data mining tasks. The algorithms are applied directly to a dataset. [6]



n
k
jkikij
xxd
1
2
)(
Applying K-means Clustering Algorithm to Discover Knowledge from Insurance Dataset Using…
www.theijes.com The IJES Page 37
WEKA implements algorithms for data preprocessing, classification, regression, clustering and association
rules. It also includes visualization tools. [6]
To perform cluster analysis in weka the dataset is needed to be loaded to weka and it should be in the format of
CSV or ARFF file format. If the dataset is not in arff format we need to be converting it.
Figure 1: WEKA GUI
Figure 2: Customer Data in the Insurance Company
Applying K-means Clustering Algorithm to Discover Knowledge from Insurance Dataset Using…
www.theijes.com The IJES Page 38
Figure 3: Choose the Data File
Figure 4: Choosing the Attributes
Figure 5: Cluster mode
Applying K-means Clustering Algorithm to Discover Knowledge from Insurance Dataset Using…
www.theijes.com The IJES Page 39
Figure 6: Clustered output
VI. RESULT DISCUSSION
When we study the output report we can see four clusters (cluster 0, cluster1, cluster2, cluster3). In the report
appear as:
Number of clusters selected by cross validation: 4
The first column gives you the overall population centroid. The second and third and fourth and fifth columns
give you the centroids for cluster 0, 1, 2 and 3 respectively. Each row gives the centroid coordinate for the
specific dimension.
Here we must notice that finding the centroids is an essential part of the algorithm. The centroids are a result of
a specific run of the algorithm and are not unique, a different run may generate a different centroid set.
For each cluster there is “clusters prior” probability. The estimators consist of a number for each possible
attribute value, and the attribute values are treated in order.
 Cluster0 has total 4 objects, out of which majority of objects (6) data sets.
 Cluster1 has total of 22 objects, out of which majority of objects (32) data sets.
 Cluster2 has total 20 objects, out of which majority of objects (29) data sets.
 Cluster3 has total 22 objects, out of which majority of objects (32) data sets.
VII. CONCLUSION
Clustering is organizing data into clusters or groups such that they have high intra-cluster similarity and low
inter cluster similarity. K-means algorithm is one of the clustering algorithms, it collects a number of data based
on their characteristics and attributes, and run the process of Clustering by reducing the distances between the
data center. WEKA an open source tool is used to apply K-means algorithm on insurance dataset.
REFERENCE
[1]. Daniel T. Larose, “Data Mining Methods and Models”, 2011.
[2]. G. K. Gupta, “Introduction to Data Mining with Case Studies”, 2006.
[3]. Gregory Piatetsky, “From Data Mining to Knowledge Discovery:An Introduction”, 2012.
[4]. Jiawei Han, Micheline Kamber, “Data Mining: Concepts and Techniques (Second Edition)”, 2014.
[5]. Subhash Sharma, Ajith Kumar, “Cluster Analysis and Factor Analysis”, 2014.
[6]. Yizhou Sun, “an Introduction to WEKA”, 2008.

More Related Content

What's hot

Different Classification Technique for Data mining in Insurance Industry usin...
Different Classification Technique for Data mining in Insurance Industry usin...Different Classification Technique for Data mining in Insurance Industry usin...
Different Classification Technique for Data mining in Insurance Industry usin...
IOSRjournaljce
 
Certain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic DataminingCertain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic Datamining
ijdmtaiir
 
Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)
Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)
Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)
Universitas Pembangunan Panca Budi
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
IJERA Editor
 
Ijariie1184
Ijariie1184Ijariie1184
Ijariie1184
IJARIIE JOURNAL
 
Data Mining: Classification and analysis
Data Mining: Classification and analysisData Mining: Classification and analysis
Data Mining: Classification and analysis
DataminingTools Inc
 
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
IJCSIS Research Publications
 
Distributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebDistributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic Web
Editor IJCATR
 
V2 i9 ijertv2is90699-1
V2 i9 ijertv2is90699-1V2 i9 ijertv2is90699-1
V2 i9 ijertv2is90699-1
warishali570
 
Enhancement techniques for data warehouse staging area
Enhancement techniques for data warehouse staging areaEnhancement techniques for data warehouse staging area
Enhancement techniques for data warehouse staging area
IJDKP
 
Seminar Presentation
Seminar PresentationSeminar Presentation
Seminar Presentation
Vaibhav Dhattarwal
 
01 Introduction to Data Mining
01 Introduction to Data Mining01 Introduction to Data Mining
01 Introduction to Data Mining
Valerii Klymchuk
 
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...
Editor IJMTER
 
Variance rover system web analytics tool using data
Variance rover system web analytics tool using dataVariance rover system web analytics tool using data
Variance rover system web analytics tool using data
eSAT Publishing House
 
Variance rover system
Variance rover systemVariance rover system
Variance rover system
eSAT Journals
 
Introduction to feature subset selection method
Introduction to feature subset selection methodIntroduction to feature subset selection method
Introduction to feature subset selection method
IJSRD
 
Hypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsHypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining Algorithms
IJERA Editor
 
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEYCLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
Editor IJMTER
 
IRJET- Improving the Performance of Smart Heterogeneous Big Data
IRJET- Improving the Performance of Smart Heterogeneous Big DataIRJET- Improving the Performance of Smart Heterogeneous Big Data
IRJET- Improving the Performance of Smart Heterogeneous Big Data
IRJET Journal
 

What's hot (19)

Different Classification Technique for Data mining in Insurance Industry usin...
Different Classification Technique for Data mining in Insurance Industry usin...Different Classification Technique for Data mining in Insurance Industry usin...
Different Classification Technique for Data mining in Insurance Industry usin...
 
Certain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic DataminingCertain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic Datamining
 
Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)
Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)
Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
 
Ijariie1184
Ijariie1184Ijariie1184
Ijariie1184
 
Data Mining: Classification and analysis
Data Mining: Classification and analysisData Mining: Classification and analysis
Data Mining: Classification and analysis
 
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
 
Distributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebDistributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic Web
 
V2 i9 ijertv2is90699-1
V2 i9 ijertv2is90699-1V2 i9 ijertv2is90699-1
V2 i9 ijertv2is90699-1
 
Enhancement techniques for data warehouse staging area
Enhancement techniques for data warehouse staging areaEnhancement techniques for data warehouse staging area
Enhancement techniques for data warehouse staging area
 
Seminar Presentation
Seminar PresentationSeminar Presentation
Seminar Presentation
 
01 Introduction to Data Mining
01 Introduction to Data Mining01 Introduction to Data Mining
01 Introduction to Data Mining
 
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...
 
Variance rover system web analytics tool using data
Variance rover system web analytics tool using dataVariance rover system web analytics tool using data
Variance rover system web analytics tool using data
 
Variance rover system
Variance rover systemVariance rover system
Variance rover system
 
Introduction to feature subset selection method
Introduction to feature subset selection methodIntroduction to feature subset selection method
Introduction to feature subset selection method
 
Hypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsHypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining Algorithms
 
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEYCLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
 
IRJET- Improving the Performance of Smart Heterogeneous Big Data
IRJET- Improving the Performance of Smart Heterogeneous Big DataIRJET- Improving the Performance of Smart Heterogeneous Big Data
IRJET- Improving the Performance of Smart Heterogeneous Big Data
 

Viewers also liked

Receitas AlimentaçãO Escolar Lanche Gostoso 12
Receitas AlimentaçãO Escolar Lanche Gostoso 12Receitas AlimentaçãO Escolar Lanche Gostoso 12
Receitas AlimentaçãO Escolar Lanche Gostoso 12
tsunamidaiquiri
 
Artes rudolf 01
Artes rudolf 01Artes rudolf 01
Selection of Plastics by Design of Experiments
Selection of Plastics by Design of ExperimentsSelection of Plastics by Design of Experiments
Selection of Plastics by Design of Experiments
theijes
 
Primera guía de observación
Primera guía de observaciónPrimera guía de observación
Primera guía de observación
NormalistaV
 
Practical Implementation for Stator Faults Protection and Diagnosisin 3-Ph IM...
Practical Implementation for Stator Faults Protection and Diagnosisin 3-Ph IM...Practical Implementation for Stator Faults Protection and Diagnosisin 3-Ph IM...
Practical Implementation for Stator Faults Protection and Diagnosisin 3-Ph IM...
theijes
 
Aula1
Aula1Aula1
Domingo Cristo Rey ciclo c
Domingo Cristo Rey ciclo cDomingo Cristo Rey ciclo c
Domingo Cristo Rey ciclo c
Diócesis de Mayagüez
 
Arte vs Diseño
Arte vs DiseñoArte vs Diseño
Arte vs Diseño
Frida López
 

Viewers also liked (8)

Receitas AlimentaçãO Escolar Lanche Gostoso 12
Receitas AlimentaçãO Escolar Lanche Gostoso 12Receitas AlimentaçãO Escolar Lanche Gostoso 12
Receitas AlimentaçãO Escolar Lanche Gostoso 12
 
Artes rudolf 01
Artes rudolf 01Artes rudolf 01
Artes rudolf 01
 
Selection of Plastics by Design of Experiments
Selection of Plastics by Design of ExperimentsSelection of Plastics by Design of Experiments
Selection of Plastics by Design of Experiments
 
Primera guía de observación
Primera guía de observaciónPrimera guía de observación
Primera guía de observación
 
Practical Implementation for Stator Faults Protection and Diagnosisin 3-Ph IM...
Practical Implementation for Stator Faults Protection and Diagnosisin 3-Ph IM...Practical Implementation for Stator Faults Protection and Diagnosisin 3-Ph IM...
Practical Implementation for Stator Faults Protection and Diagnosisin 3-Ph IM...
 
Aula1
Aula1Aula1
Aula1
 
Domingo Cristo Rey ciclo c
Domingo Cristo Rey ciclo cDomingo Cristo Rey ciclo c
Domingo Cristo Rey ciclo c
 
Arte vs Diseño
Arte vs DiseñoArte vs Diseño
Arte vs Diseño
 

Similar to Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Dataset Using WEKA Tool

Applications Of Clustering Techniques In Data Mining A Comparative Study
Applications Of Clustering Techniques In Data Mining  A Comparative StudyApplications Of Clustering Techniques In Data Mining  A Comparative Study
Applications Of Clustering Techniques In Data Mining A Comparative Study
Fiona Phillips
 
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
Editor IJCATR
 
Survey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning AlgorithmsSurvey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning Algorithms
IRJET Journal
 
A SURVEY ON DATA MINING IN STEEL INDUSTRIES
A SURVEY ON DATA MINING IN STEEL INDUSTRIESA SURVEY ON DATA MINING IN STEEL INDUSTRIES
A SURVEY ON DATA MINING IN STEEL INDUSTRIES
IJCSES Journal
 
Association rule visualization technique
Association rule visualization techniqueAssociation rule visualization technique
Association rule visualization technique
mustafasmart
 
Data Mining System and Applications: A Review
Data Mining System and Applications: A ReviewData Mining System and Applications: A Review
Data Mining System and Applications: A Review
ijdpsjournal
 
Analysis on Student Admission Enquiry System
Analysis on Student Admission Enquiry SystemAnalysis on Student Admission Enquiry System
Analysis on Student Admission Enquiry System
IJSRD
 
Analysis on Student Admission Enquiry System
Analysis on Student Admission Enquiry SystemAnalysis on Student Admission Enquiry System
Analysis on Student Admission Enquiry System
IJSRD
 
Data Mining Framework for Network Intrusion Detection using Efficient Techniques
Data Mining Framework for Network Intrusion Detection using Efficient TechniquesData Mining Framework for Network Intrusion Detection using Efficient Techniques
Data Mining Framework for Network Intrusion Detection using Efficient Techniques
IJAEMSJORNAL
 
How Partitioning Clustering Technique For Implementing...
How Partitioning Clustering Technique For Implementing...How Partitioning Clustering Technique For Implementing...
How Partitioning Clustering Technique For Implementing...
Nicolle Dammann
 
Data mining introduction
Data mining introductionData mining introduction
Data mining introduction
Basma Gamal
 
Frequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social MediaFrequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social Media
IJERA Editor
 
Seminar Report Vaibhav
Seminar Report VaibhavSeminar Report Vaibhav
Seminar Report Vaibhav
Vaibhav Dhattarwal
 
Applying Classification Technique using DID3 Algorithm to improve Decision Su...
Applying Classification Technique using DID3 Algorithm to improve Decision Su...Applying Classification Technique using DID3 Algorithm to improve Decision Su...
Applying Classification Technique using DID3 Algorithm to improve Decision Su...
IJMER
 
Feature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering TechniquesFeature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering Techniques
IRJET Journal
 
Software Bug Detection Algorithm using Data mining Techniques
Software Bug Detection Algorithm using Data mining TechniquesSoftware Bug Detection Algorithm using Data mining Techniques
Software Bug Detection Algorithm using Data mining Techniques
AM Publications
 
A LITERATURE REVIEW ON DATAMINING
A LITERATURE REVIEW ON DATAMININGA LITERATURE REVIEW ON DATAMINING
A LITERATURE REVIEW ON DATAMINING
Carrie Romero
 
31 34
31 3431 34

Similar to Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Dataset Using WEKA Tool (18)

Applications Of Clustering Techniques In Data Mining A Comparative Study
Applications Of Clustering Techniques In Data Mining  A Comparative StudyApplications Of Clustering Techniques In Data Mining  A Comparative Study
Applications Of Clustering Techniques In Data Mining A Comparative Study
 
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
 
Survey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning AlgorithmsSurvey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning Algorithms
 
A SURVEY ON DATA MINING IN STEEL INDUSTRIES
A SURVEY ON DATA MINING IN STEEL INDUSTRIESA SURVEY ON DATA MINING IN STEEL INDUSTRIES
A SURVEY ON DATA MINING IN STEEL INDUSTRIES
 
Association rule visualization technique
Association rule visualization techniqueAssociation rule visualization technique
Association rule visualization technique
 
Data Mining System and Applications: A Review
Data Mining System and Applications: A ReviewData Mining System and Applications: A Review
Data Mining System and Applications: A Review
 
Analysis on Student Admission Enquiry System
Analysis on Student Admission Enquiry SystemAnalysis on Student Admission Enquiry System
Analysis on Student Admission Enquiry System
 
Analysis on Student Admission Enquiry System
Analysis on Student Admission Enquiry SystemAnalysis on Student Admission Enquiry System
Analysis on Student Admission Enquiry System
 
Data Mining Framework for Network Intrusion Detection using Efficient Techniques
Data Mining Framework for Network Intrusion Detection using Efficient TechniquesData Mining Framework for Network Intrusion Detection using Efficient Techniques
Data Mining Framework for Network Intrusion Detection using Efficient Techniques
 
How Partitioning Clustering Technique For Implementing...
How Partitioning Clustering Technique For Implementing...How Partitioning Clustering Technique For Implementing...
How Partitioning Clustering Technique For Implementing...
 
Data mining introduction
Data mining introductionData mining introduction
Data mining introduction
 
Frequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social MediaFrequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social Media
 
Seminar Report Vaibhav
Seminar Report VaibhavSeminar Report Vaibhav
Seminar Report Vaibhav
 
Applying Classification Technique using DID3 Algorithm to improve Decision Su...
Applying Classification Technique using DID3 Algorithm to improve Decision Su...Applying Classification Technique using DID3 Algorithm to improve Decision Su...
Applying Classification Technique using DID3 Algorithm to improve Decision Su...
 
Feature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering TechniquesFeature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering Techniques
 
Software Bug Detection Algorithm using Data mining Techniques
Software Bug Detection Algorithm using Data mining TechniquesSoftware Bug Detection Algorithm using Data mining Techniques
Software Bug Detection Algorithm using Data mining Techniques
 
A LITERATURE REVIEW ON DATAMINING
A LITERATURE REVIEW ON DATAMININGA LITERATURE REVIEW ON DATAMINING
A LITERATURE REVIEW ON DATAMINING
 
31 34
31 3431 34
31 34
 

Recently uploaded

Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
IJECEIAES
 
ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024
Rahul
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
Victor Morales
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
IJECEIAES
 
International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...
gerogepatton
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
KrishnaveniKrishnara1
 
2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt
PuktoonEngr
 
Swimming pool mechanical components design.pptx
Swimming pool  mechanical components design.pptxSwimming pool  mechanical components design.pptx
Swimming pool mechanical components design.pptx
yokeleetan1
 
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
Mukeshwaran Balu
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
IJECEIAES
 
Modelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdfModelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdf
camseq
 
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdfBPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
MIGUELANGEL966976
 
Heat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation pptHeat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation ppt
mamunhossenbd75
 
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptxML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
JamalHussainArman
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
Aditya Rajan Patra
 
132/33KV substation case study Presentation
132/33KV substation case study Presentation132/33KV substation case study Presentation
132/33KV substation case study Presentation
kandramariana6
 
Low power architecture of logic gates using adiabatic techniques
Low power architecture of logic gates using adiabatic techniquesLow power architecture of logic gates using adiabatic techniques
Low power architecture of logic gates using adiabatic techniques
nooriasukmaningtyas
 
Exception Handling notes in java exception
Exception Handling notes in java exceptionException Handling notes in java exception
Exception Handling notes in java exception
Ratnakar Mikkili
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
Madan Karki
 
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
University of Maribor
 

Recently uploaded (20)

Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
 
ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
 
International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
 
2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt
 
Swimming pool mechanical components design.pptx
Swimming pool  mechanical components design.pptxSwimming pool  mechanical components design.pptx
Swimming pool mechanical components design.pptx
 
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 
Modelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdfModelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdf
 
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdfBPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
 
Heat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation pptHeat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation ppt
 
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptxML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
 
132/33KV substation case study Presentation
132/33KV substation case study Presentation132/33KV substation case study Presentation
132/33KV substation case study Presentation
 
Low power architecture of logic gates using adiabatic techniques
Low power architecture of logic gates using adiabatic techniquesLow power architecture of logic gates using adiabatic techniques
Low power architecture of logic gates using adiabatic techniques
 
Exception Handling notes in java exception
Exception Handling notes in java exceptionException Handling notes in java exception
Exception Handling notes in java exception
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
 
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
 

Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Dataset Using WEKA Tool

  • 1. The International Journal Of Engineering And Science (IJES) || Volume || 5 || Issue || 10 || Pages || PP 35-39 || 2016 || ISSN (e): 2319 – 1813 ISSN (p): 2319 – 1805 www.theijes.com The IJES Page 35 Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Dataset Using WEKA Tool Dr. Abdelrahman Elsharif Karrar1 , Marwa Abdelhameed Abdalrahman2 , Moez Mutasim Ali3 1 College of Computer Science and Engineering, Taibah University, Saudi Arabia 2 College of Computer Science and Information Technology, University of Science and Technology, Sudan 3 College of Computer Science and Information Technology, University of Science and Technology, Sudan --------------------------------------------------------ABSTRACT------------------------------------------------------------- Data mining works to extract information known in advance from the enormous quantities of data which can lead to knowledge. It provides information that helps to make good decisions. The effectiveness of data mining in access to knowledge to achieve the goal of which is the discovery of the hidden facts contained in databases and through the use of multiple technologies. Clustering is organizing data into clusters or groups such that they have high intra-cluster similarity and low inter cluster similarity. This paper deals with K-means clustering algorithm which collect a number of data based on the characteristics and attributes of this data, and process the Clustering by reducing the distances between the data center. This algorithm is applied using open source tool called WEKA, with the Insurance dataset as its input. Keywords: Clustering, Centroid, Data Mining, knowledge discovery, K-means, WEKA. --------------------------------------------------------------------------------------------------------------------------------------- Date of Submission: 17 May 2016 Date of Accepted: 05 November 2016 -------------------------------------------------------------------------------------------------------------------------------------- I. INTRODUCTION The present age is characterized by the use of advanced data technology to save and retrieve data and enormous quantities and which is described data warehousing. These data provide open the door to a range of specialized in the management of those data subjects was the most prominent topic of data mining, which is one of the important methods to get useful information from the data. Scientists have known the term data mining as "part of the process of knowledge discovery in databases, which are made using multiple methods its goal configure models of data". [1] Data mining techniques are designed to extract hidden information in the database, this modern technology has imposed itself firmly in the information age and in the light of the great technological development and widespread use of databases, they offer institutions in all areas the ability to explore and focus on the most important information in the databases. Data mining techniques focus on predictions future, explore behavior and trends allowing to take decisions correct and taken at the right time, also as the data mining techniques to answer Many questions in record time, especially those questions that are difficult to answer them by using methods statistical traditional. [2] II. DATA MINING Data mining technology is simply extract the important information from huge amount of information to follow certain mechanisms analysis this information, or it’s a technique used in the process of extracting data from data warehouses. Data Mining passes a number of stages starting from data cleansing, and standardization of data, and the relevant test data, then transfer them, classify then evaluated and extract data. [3] There are many definitions of the concept of data mining:  Is a computerized search for the knowledge of data without prior assumptions about what can be this knowledge?  Is the process used by companies to convert the raw data into useful information?  Analysis of large sets of data and summarize the data in the new forms be understandable and useful to its users and is done in banks telecommunications companies commercial transactions and scientific data (Biology - Astronomy).  Process data analysis by linking them with artificial intelligence techniques and statistical process, is simply a process of exploration and search for specific and useful information in a huge amount of data.
  • 2. Applying K-means Clustering Algorithm to Discover Knowledge from Insurance Dataset Using… www.theijes.com The IJES Page 36  Search the relevant information together collected by common characteristics and linked unit subject or specialization is then to find this information between a very large amounts of information that has no relationship was presented to the decision maker. [3] III. TECHNIQUES OR METHODS OF DATA MINING Data mining process uses several techniques through which you can discover the hidden trends and models in large amounts of data and can be used one or more of these techniques are as follows:  Prediction: Use of available data and the application of certain techniques to give them values successful future.  Description: Process description available data to see their ratings by the presence of the relationships between them.  Classification: It is a set of data analysis to create a set of database assembled that can be used to classify any future data to find information that relates to the common characteristics and classification many tools such as decision tree, nearest neighbor and regression.  Association: Is the database that includes fixed coupling relationships among a group of objects in the database any association between the occurrence of an event and another event occurs, which is often called the market basket analysis.  Sequential Analysis: Which is similar to association and placed in the name the link, but analysis linked in time in the Search for models occurs in any deal with the succession of data that occur in separate cases.  Clustering: The idea of collecting data is a simple idea in nature and very close to the human way of thinking where we are whenever we deal with a large amount of data tends to summarize the vast amount of data into a small number of groups or categories in order to facilitate the process of analysis. Algorithms assembly used widely not only for the organization and classification of data but also the data compression and build a model arrangement.The process of clustering is the process of collecting objects or items that possess the qualities and attributes are similar in groups called clusters. The process of clustering one of the main roads in the process of data mining and can be used as a standalone tool to gain insight into how the distribution of data, control characteristics of each group and focus on a specific set of groups so for further analysis and can be as a preliminary step to the work of an elementary or other techniques such as characterization and classification. [4] IV. K-MEANS ALGORITHM Is one of the clustering algorithms, it collect a number of data based on the characteristics and attributes of this data and the process of the Clustering by reducing the distances between the data center. The steps of this algorithm are:  Determine the number of clusters K which is a step Initialize Preliminary.  Determine the coordinates of the centers of clusters (Centroid) randomly for the first time and calculate the average of the points that belong to the center for the rest of the times.  Calculate the distance between each example and among all centers and is used Euclidean dimension. Given the Euclidean distance ) ij d ) between the two examples( i,, j ) the following relationship:  Data collection (examples) with its nearest center.  Repeat steps 2 through 4 until you get stability (and the absence of moving objects within the clusters), or even repeating a certain number of times. [5] Before you start in the presentation of the results of the application must describe the data which were used in the search: this paper apply the algorithm (k-means) to the insurance company data to clarify the best payment method that can be available to the customers, as the company suffered from the challenges and difficulties limit their ability to deal with payment methods. This was done using a program WEKA. V. WEKA WEKA is a data mining system developed by the University of Waikato in New Zealand that implements data mining algorithms using the JAVA language. WEKA is a state-ofthe-art facility for developing machine learning (ML) techniques and their application to real-world data mining problems. It is a collection of machine learning algorithms for data mining tasks. The algorithms are applied directly to a dataset. [6]    n k jkikij xxd 1 2 )(
  • 3. Applying K-means Clustering Algorithm to Discover Knowledge from Insurance Dataset Using… www.theijes.com The IJES Page 37 WEKA implements algorithms for data preprocessing, classification, regression, clustering and association rules. It also includes visualization tools. [6] To perform cluster analysis in weka the dataset is needed to be loaded to weka and it should be in the format of CSV or ARFF file format. If the dataset is not in arff format we need to be converting it. Figure 1: WEKA GUI Figure 2: Customer Data in the Insurance Company
  • 4. Applying K-means Clustering Algorithm to Discover Knowledge from Insurance Dataset Using… www.theijes.com The IJES Page 38 Figure 3: Choose the Data File Figure 4: Choosing the Attributes Figure 5: Cluster mode
  • 5. Applying K-means Clustering Algorithm to Discover Knowledge from Insurance Dataset Using… www.theijes.com The IJES Page 39 Figure 6: Clustered output VI. RESULT DISCUSSION When we study the output report we can see four clusters (cluster 0, cluster1, cluster2, cluster3). In the report appear as: Number of clusters selected by cross validation: 4 The first column gives you the overall population centroid. The second and third and fourth and fifth columns give you the centroids for cluster 0, 1, 2 and 3 respectively. Each row gives the centroid coordinate for the specific dimension. Here we must notice that finding the centroids is an essential part of the algorithm. The centroids are a result of a specific run of the algorithm and are not unique, a different run may generate a different centroid set. For each cluster there is “clusters prior” probability. The estimators consist of a number for each possible attribute value, and the attribute values are treated in order.  Cluster0 has total 4 objects, out of which majority of objects (6) data sets.  Cluster1 has total of 22 objects, out of which majority of objects (32) data sets.  Cluster2 has total 20 objects, out of which majority of objects (29) data sets.  Cluster3 has total 22 objects, out of which majority of objects (32) data sets. VII. CONCLUSION Clustering is organizing data into clusters or groups such that they have high intra-cluster similarity and low inter cluster similarity. K-means algorithm is one of the clustering algorithms, it collects a number of data based on their characteristics and attributes, and run the process of Clustering by reducing the distances between the data center. WEKA an open source tool is used to apply K-means algorithm on insurance dataset. REFERENCE [1]. Daniel T. Larose, “Data Mining Methods and Models”, 2011. [2]. G. K. Gupta, “Introduction to Data Mining with Case Studies”, 2006. [3]. Gregory Piatetsky, “From Data Mining to Knowledge Discovery:An Introduction”, 2012. [4]. Jiawei Han, Micheline Kamber, “Data Mining: Concepts and Techniques (Second Edition)”, 2014. [5]. Subhash Sharma, Ajith Kumar, “Cluster Analysis and Factor Analysis”, 2014. [6]. Yizhou Sun, “an Introduction to WEKA”, 2008.