2nd International Conference on Advance
Information Scientific Development
5-6 August 2022
Combination of K-Means Method with Davies
Bouldin Index and Decision Tree Method with
Parameter Optimization for Best Performance
Elly Muningsih (1), Chandra Kesuma (2), Aprih Widayanto (3), Sunanto (4), Suripah (5)
(1, 2, 4, 5) Universitas Bina Sarana Informatika; (3) Universitas Nusa Mandiri
1. Introduction
2. Relevant Studies
3. Research Method
• Clustering is a common data mining technique for exploring hidden structures in a data set (Rohini & Suseendran, 2016).
• The purpose of clustering is to divide a data set into groups whose members share similar characteristics (Muningsih et al., 2020).
• Popular clustering methods include K-Means, Fuzzy C-Means, DBSCAN, and K-Medoids.
• Classification, by contrast, is the process of assigning new observations to predetermined classes; it is a form of supervised learning (Gupta & Chandra, 2020).
• Classification techniques work on labeled data sets and are used to predict categorical class labels (Diwathe & Dongare, 2017).
• Popular classification methods include Decision Tree, Naïve Bayes, Neural Network, kNN, and Support Vector Machine.
Figure 1. Clustering and Classification
• This research develops a combination of the K-Means clustering method and the Decision Tree classification method.
• The K-Means method is used to partition the data set into clusters.
• K-Means does not determine the number of clusters by itself. To address this shortcoming, the Davies-Bouldin Index (DBI) is used: the number of clusters with the smallest DBI value is taken as optimal.
• The clustering result is then used as the class label for a Decision Tree classifier with parameter optimization, to obtain the highest performance (accuracy, precision, and recall).
Similar research was conducted by Rohini & Suseendran (2016) to analyze spirometry data, which is widely used in medical applications.
The methods used were the aggregated K-Means and Decision Tree methods. The investigation found that, for spirometry data, the proposed aggregated K-Means and Decision Tree algorithm performed better than other algorithms such as genetic algorithms, classification training algorithms, and neural network-based classification algorithms.
Other research by Khan & Mohamudally (2011) integrated the K-Means clustering algorithm with the Decision Tree (ID3) algorithm.
Decision Tree (ID3) was chosen to interpret the K-Means clusters because, compared with other data mining algorithms, it is more user friendly, faster to generate, and produces decision rules that are simpler to explain and understand.
That research resulted in an efficient data mining algorithm using intelligent agents, called Learning Intelligent Agents (LIAgent), capable of performing classification, clustering, and interpretation tasks on data sets.
This study uses the clustering and classification functions of data mining.
The K-Means method is used for clustering and the Decision Tree method for classification.
Data processing for this study was conducted using the RapidMiner tool.
K-Means is a simple and fast method, commonly used because it is easy to implement and needs a relatively small number of iterations (Lin & Ji, 2020).
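The K-Means procedure can be sketched in a few lines. The block below is a minimal pure-Python version of the standard Lloyd iteration (assign each point to its nearest centroid, then recompute the centroids), not the RapidMiner operator used in the study. The function names, the random initialization, and the toy points are assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, max_iterations=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct rows as starting centroids
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                # New centroid is the column-wise mean of the cluster members.
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:  # converged: centroids no longer move
            break
        centroids = updated
    return assign(points, centroids), centroids


# Toy data (illustrative only): two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
```

The number of clusters `k` has to be supplied by the caller, which is the shortcoming the Davies-Bouldin Index is meant to address.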
The Davies-Bouldin Index (DBI) is one method for evaluating cluster validity. Its principle is to maximize the distance between clusters while minimizing the distance between points within a cluster (Jumadi Dehotman Sitompul, Salim Sitompul, & Sihombing, 2019).
The smallest DBI value indicates the best clustering among the candidates.
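As an illustration of this principle, the sketch below computes the index in its usual textbook form: for each cluster, take the worst ratio of (within-cluster scatter of the two clusters) to (distance between their centroids), then average over clusters. It is an assumed pure-Python formulation, not RapidMiner's implementation, and the toy points are hypothetical.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin Index for a labeled set of points (lower is better)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    # Centroid of each cluster: column-wise mean of its members.
    centroids = {
        label: tuple(sum(col) / len(members) for col in zip(*members))
        for label, members in clusters.items()
    }
    # Scatter: mean distance from the members to their own centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }

    total = 0.0
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster.
        # Assumes distinct centroids; identical centroids would divide by zero.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy data (illustrative only): the same four points with a good and a poor labeling.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = davies_bouldin(toy_points, [0, 0, 1, 1])  # compact, far-apart clusters
poor = davies_bouldin(toy_points, [0, 1, 0, 1])  # stretched, overlapping clusters
```

Compact, well-separated clusters give a small value and stretched, close clusters give a large one, which is why the smallest DBI is preferred.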
Decision Tree is one of the most commonly used predictive models for classification. It is a flowchart-like structure in which each node represents a test on an attribute value, each branch represents an outcome of the test, and each leaf represents a class or class distribution (Jain & Srivastava, 2013).
A Decision Tree consists of a root node, branches, and leaf nodes.
The topmost node in the tree is referred to as the root node.
The main goal is to build a model that predicts the value of a target variable from many input variables, using the rules learned by the decision tree classification model (Rohini & Suseendran, 2016).
Dataset
The data set is a weekly sales transaction data set with 811 records and 104 attributes. The attributes are Product_Code, sales figures for 52 weeks, the minimum and maximum sales figures, and normalized weekly values.
Data Pre-Processing
From the existing attributes, Product_Code (P1, P2, P3, P4, ..., P819) and the sales figures for the 52 weeks (W0, W1, W2, ..., W51, integer values) were selected for the first processing stage, K-Means clustering, with Product_Code as the 'id'. In the next stage, the clustering result is processed with the Decision Tree method, with the added cluster attribute as the 'label', taking the values cluster_0, cluster_1, and cluster_2.
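The attribute selection described above can be sketched as follows. The study performed this step in RapidMiner; the block only illustrates the idea of keeping Product_Code as the identifier and the weekly columns as integer features. The sample rows, the shortened three-week layout, and the extra column names `MIN`, `MAX`, and `Normalized 0` are assumptions, not data from the source.

```python
import csv
import io

# Hypothetical miniature file: only 3 weekly columns instead of 52.
SAMPLE = (
    "Product_Code,W0,W1,W2,MIN,MAX,Normalized 0\n"
    "P1,11,12,10,3,21,0.44\n"
    "P2,7,6,3,0,10,0.70\n"
)


def select_weekly_sales(rows, n_weeks=52):
    """Split rows into ids (Product_Code) and integer weekly sales W0..W{n-1}."""
    week_columns = [f"W{i}" for i in range(n_weeks)]
    ids, features = [], []
    for row in rows:
        ids.append(row["Product_Code"])
        # Every other column (min, max, normalized values) is dropped here.
        features.append([int(row[column]) for column in week_columns])
    return ids, features


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
ids, features = select_weekly_sales(rows, n_weeks=3)
```

The cluster assignment produced later would be attached to each id as the label column for the classification stage.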
Modeling and Evaluation
This research consists of two parts; data processing was conducted using the RapidMiner tool.
The first part models the K-Means method, as shown in Figure 2.
The input data is connected to the Replace Missing Values operator to handle missing data, and then to the Clustering operator, in this case K-Means. The Performance operator reports the Davies-Bouldin Index, where the smallest value indicates the optimal number of clusters. Cluster counts from 3 to 10 were tested.
Figure 2. Modelling the K-Means Method
The second part classifies the data with the Decision Tree method. The data is divided into training and testing sets with a ratio of 80:20. The Optimize Parameters (Grid) operator is used to find the parameter settings that give the best performance. The Decision Tree model with parameter optimization is shown in Figure 3 through Figure 5.
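The 80:20 partition can be illustrated with a short sketch. How RapidMiner samples the two partitions is not stated in the source, so the shuffled split, the seed, and the function name below are assumptions.

```python
import random


def train_test_split(rows, train_ratio=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Record indices stand in for the 811 labeled records of the data set.
train, test = train_test_split(list(range(811)))
```

With 811 records this sketch yields 648 training and 163 testing records; the exact partition sizes used in the study are not reported.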
Figure 3. Modelling Decision Tree with Parameter Optimization
Figure 4. Decision Tree Modelling Testing
Figure 5. Detail Modelling Decision Tree
In general, accuracy, precision, and recall are used to evaluate model performance. Whether a sample is predicted as positive or negative is known from the classification result. Combined with the sample's actual category, this gives the four basic indicators shown in Table 1.
         Positive   Negative
True     TP         TN
False    FP         FN
Table 1. Confusion Matrix
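The three metrics follow directly from the four counts in the confusion matrix. The sketch below uses the standard binary definitions; the function name and the example counts are hypothetical. For the three-cluster labels in this study the values would typically be computed per class and averaged, which is an assumption about the tool rather than something the source states.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators when a class is never predicted or never occurs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts, for illustration only.
accuracy, precision, recall = classification_metrics(tp=8, tn=9, fp=2, fn=1)
```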
In the first stage, K-Means was evaluated for cluster counts from 3 to 10. The smallest Davies-Bouldin Index value is 0.626, so the optimal number of clusters is 3, as shown in Table 2:
Cluster   DBI Value
3         0.626
4         0.864
5         0.777
6         1.988
7         1.939
8         2.204
9         2.342
10        2.178
Table 2. Cluster and DBI Value
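The selection rule is simply "take the cluster count with the smallest DBI". A minimal sketch using the values reported in the table above (the function name is hypothetical):

```python
# DBI values as reported in the table above, keyed by number of clusters.
dbi_by_k = {
    3: 0.626, 4: 0.864, 5: 0.777, 6: 1.988,
    7: 1.939, 8: 2.204, 9: 2.342, 10: 2.178,
}


def pick_best_k(scores):
    """Return the cluster count whose DBI is smallest."""
    return min(scores, key=scores.get)


best_k = pick_best_k(dbi_by_k)
```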
The clustering result was then processed with the Decision Tree method with parameter optimization. The parameter values that gave the best performance are:
a. Cross Validation.number_of_folds: 8
b. Decision Tree.criterion: information_gain
c. Decision Tree.maximal_depth: 50
d. Decision Tree.apply_pruning: true
e. Decision Tree.apply_prepruning: true
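A grid-style parameter optimization tries every combination of candidate values and keeps the best-scoring one. The sketch below shows that idea only. The candidate values and the scoring function are placeholders, since the source reports just the winning settings and not the grid that was searched. In practice the score would be the cross-validated performance of a decision tree trained with those parameters.

```python
import itertools


def grid_search(param_grid, score):
    """Evaluate every parameter combination and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:  # ties keep the first combination found
            best_params, best_score = params, current
    return best_params, best_score


# Hypothetical candidate values, for illustration only.
param_grid = {
    "number_of_folds": [5, 8, 10],
    "criterion": ["gini_index", "information_gain"],
    "maximal_depth": [10, 50, 100],
}


def placeholder_score(params):
    """Stand-in for a real cross-validated accuracy; not a measurement."""
    return params["number_of_folds"] - params["maximal_depth"] / 100


best_params, best_score = grid_search(param_grid, placeholder_score)
```

The cost grows with the product of the candidate list lengths (18 combinations here), which is why grids are usually kept small.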
The resulting performance is shown in Table 3:
            Training Data (%)   Testing Data (%)
Accuracy    98.06               97.12
Precision   97.64               97.10
Recall      97.49               96.05
Because the accuracy on the test data is greater than 90%, the classification in this study falls in the excellent classification category. The data processing also produced the Decision Tree shown in Figure 6, where W8 is the root node, the attribute that most strongly determines the classification.
Table 3. Accuracy, Precision and Recall Values
Figure 6. Decision Tree
The Decision Tree above is described as follows:
W8 > 5.500
| W15 > 26.500: cluster_1 {cluster_2=0, cluster_0=0, cluster_1=86}
| W15 ≤ 26.500
| | W0 > 4.500
| | | W19 > 3.500: cluster_2 {cluster_2=134, cluster_0=0, cluster_1=1}
| | | W19 ≤ 3.500: cluster_0 {cluster_2=0, cluster_0=2, cluster_1=0}
| | W0 ≤ 4.500
| | | W5 > 9: cluster_2 {cluster_2=2, cluster_0=0, cluster_1=0}
| | | W5 ≤ 9: cluster_0 {cluster_2=0, cluster_0=17, cluster_1=0}
W8 ≤ 5.500
| W34 > 10.500: cluster_2 {cluster_2=2, cluster_0=0, cluster_1=0}
| W34 ≤ 10.500: cluster_0 {cluster_2=0, cluster_0=324, cluster_1=0}
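Read as code, the tree description above is a set of nested threshold tests on the weekly sales values. The sketch below transcribes those rules into a function. The function name and the dictionary input format are assumptions, while the thresholds and cluster names are taken from the listing.

```python
def predict_cluster(sales):
    """Apply the printed decision rules; `sales` maps week names (e.g. 'W8') to values."""
    if sales["W8"] > 5.5:
        if sales["W15"] > 26.5:
            return "cluster_1"
        if sales["W0"] > 4.5:
            return "cluster_2" if sales["W19"] > 3.5 else "cluster_0"
        return "cluster_2" if sales["W5"] > 9 else "cluster_0"
    # W8 <= 5.5: only W34 is tested.
    return "cluster_2" if sales["W34"] > 10.5 else "cluster_0"


# Hypothetical product with high sales in weeks 8 and 15.
example = {"W0": 20, "W5": 25, "W8": 30, "W15": 40, "W19": 35, "W34": 28}
example_cluster = predict_cluster(example)
```

Only six of the 52 weekly attributes (W0, W5, W8, W15, W19, W34) are needed to reproduce the tree's decisions.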
Conclusion
1. From the data processing conducted, the K-Means method with the Davies-Bouldin Index produced an optimal cluster count of 3.
2. The Decision Tree method produces the best performance when its parameters are tuned, namely Iteration, Cross Validation, Criterion, Maximal Depth, Pruning, and Pre-Pruning.
3. The combination of clustering and classification methods in this study produced a classification in the excellent category, because the accuracy on the test data is above 90.
4. In future work, the second part of the study can be compared with other classification methods such as Naïve Bayes, Neural Network, kNN, and Linear Regression.
icaisd presentation 2021.pptx

More Related Content

Similar to icaisd presentation 2021.pptx

classification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdfclassification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdf321106410027
 
Perfomance Comparison of Decsion Tree Algorithms to Findout the Reason for St...
Perfomance Comparison of Decsion Tree Algorithms to Findout the Reason for St...Perfomance Comparison of Decsion Tree Algorithms to Findout the Reason for St...
Perfomance Comparison of Decsion Tree Algorithms to Findout the Reason for St...ijcnes
 
Hypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsHypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsIJERA Editor
 
Classification and decision tree classifier machine learning
Classification and decision tree classifier machine learningClassification and decision tree classifier machine learning
Classification and decision tree classifier machine learningFrancisco E. Figueroa-Nigaglioni
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerIJERA Editor
 
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...Editor IJCATR
 
Review of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & PredictionReview of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & PredictionIRJET Journal
 
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINEA NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINEaciijournal
 
Advanced Computational Intelligence: An International Journal (ACII)
Advanced Computational Intelligence: An International Journal (ACII)Advanced Computational Intelligence: An International Journal (ACII)
Advanced Computational Intelligence: An International Journal (ACII)aciijournal
 
A Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of DiseasesA Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of Diseasesijsrd.com
 
IJCSI-10-6-1-288-292
IJCSI-10-6-1-288-292IJCSI-10-6-1-288-292
IJCSI-10-6-1-288-292HARDIK SINGH
 
IRJET- A Detailed Study on Classification Techniques for Data Mining
IRJET- A Detailed Study on Classification Techniques for Data MiningIRJET- A Detailed Study on Classification Techniques for Data Mining
IRJET- A Detailed Study on Classification Techniques for Data MiningIRJET Journal
 
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...csandit
 
Distributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebDistributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebEditor IJCATR
 
Analysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through ApplicationAnalysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through Applicationaciijournal
 
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATIONANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATIONaciijournal
 
IRJET- Medical Data Mining
IRJET- Medical Data MiningIRJET- Medical Data Mining
IRJET- Medical Data MiningIRJET Journal
 
Analysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through ApplicationAnalysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through Applicationaciijournal
 

Similar to icaisd presentation 2021.pptx (20)

classification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdfclassification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdf
 
Perfomance Comparison of Decsion Tree Algorithms to Findout the Reason for St...
Perfomance Comparison of Decsion Tree Algorithms to Findout the Reason for St...Perfomance Comparison of Decsion Tree Algorithms to Findout the Reason for St...
Perfomance Comparison of Decsion Tree Algorithms to Findout the Reason for St...
 
Hypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsHypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining Algorithms
 
G046024851
G046024851G046024851
G046024851
 
Classification and decision tree classifier machine learning
Classification and decision tree classifier machine learningClassification and decision tree classifier machine learning
Classification and decision tree classifier machine learning
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
 
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...
 
Review of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & PredictionReview of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & Prediction
 
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINEA NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
 
Advanced Computational Intelligence: An International Journal (ACII)
Advanced Computational Intelligence: An International Journal (ACII)Advanced Computational Intelligence: An International Journal (ACII)
Advanced Computational Intelligence: An International Journal (ACII)
 
A Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of DiseasesA Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of Diseases
 
IJCSI-10-6-1-288-292
IJCSI-10-6-1-288-292IJCSI-10-6-1-288-292
IJCSI-10-6-1-288-292
 
IRJET- A Detailed Study on Classification Techniques for Data Mining
IRJET- A Detailed Study on Classification Techniques for Data MiningIRJET- A Detailed Study on Classification Techniques for Data Mining
IRJET- A Detailed Study on Classification Techniques for Data Mining
 
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...
 
Distributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebDistributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic Web
 
[IJET-V2I3P21] Authors: Amit Kumar Dewangan, Akhilesh Kumar Shrivas, Prem Kumar
[IJET-V2I3P21] Authors: Amit Kumar Dewangan, Akhilesh Kumar Shrivas, Prem Kumar[IJET-V2I3P21] Authors: Amit Kumar Dewangan, Akhilesh Kumar Shrivas, Prem Kumar
[IJET-V2I3P21] Authors: Amit Kumar Dewangan, Akhilesh Kumar Shrivas, Prem Kumar
 
Analysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through ApplicationAnalysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through Application
 
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATIONANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
 
IRJET- Medical Data Mining
IRJET- Medical Data MiningIRJET- Medical Data Mining
IRJET- Medical Data Mining
 
Analysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through ApplicationAnalysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through Application
 

Recently uploaded

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 

Recently uploaded (20)

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

icaisd presentation 2021.pptx

  • 1. 2nd International Conference on Advance Information Scientific Development 5-6 August 2022
  • 2. Combination of K-Means Method with Davies Bouldin Index and Decision Tree Method with Parameter Optimization for Best Performance Elly Muningsih1 Chandra Kesuma2 Aprih Widayanto3 Sunanto4 Suripah5 2nd International Conference on Advance Information Scientific Development 1,2,4,5 Universitas Bina Sarana Informatika, 3Universitas Nusa Mandiri
  • 4.
  • 5.  Clustering is one of the common ways of mining used to explore closed structures in a data set approach (Rohini & Suseendran, 2016).  The purpose of Clustering is to divide the dataset into groups that share the same similarities or characteristics (Muningsih et al., 2020).  Some popular Clustering methods include K-Means, Fuzzy C-Means, DBSCAN, K-Medoids.
  • 6.  While Classification is the process to classify new observations based on a predetermined class, namely supervised learning (Gupta & Chandra, 2020).  Classification techniques work on labeled data sets and classifications are helpful for predicting class labels that are classified or categorized (Diwathe & Dongare, 2017).  Some popular Clasification methods are Decision Tree, Naïve Bayes, Neural Network, kNN, Support Vector Machine.
  • 7. Figure 1. Clustering and Classification
  • 8.  This research will develop a combination of Clustering K-Means method and Decision Tree Clasification method.  The K-Means method is used to group datasets into groups.  To overcome one of the shortcomings of the K-Means method in determining the number of clusters, Davies Bouldin Index (IDB) is used which is known from the smallest value.  The result of Clustering is then used as a label to be classified using Decision Tree Method with Parameter Optimization to get the highest performance (accuracy, precision and recall).
  • 9.
  • 10. Similar research has been conducted by (Rohini & Suseendran, 2016) to analyze spirometry data that is widely used for medical i.e. application- related. The methods used are the K-Means And Decision Tree Aggregate methods. From the results of the investigation, it is known that the proposed K-Means Aggregate algorithm and Decision Tree algorithm for spirometry data are better compared to other algorithms such as genetic algorithms, classification training algorithms, and neural network-based classification algorithms.
  • 11. Other research conducted by (Khan & Mohamudally, 2011) which integrates the K-means clustering algorithm with the Decision Tree (ID3) algorithm. Decision Tree (ID3) is the best choice, used for the interpretation of K-Means algorithm clusters because it is more user friendly, faster to generate and simpler to explain "understandable" decision rules, compared to other data mining algorithms. This research resulted in an efficient Data Mining algorithm using intelligent agents, called Learning Intelligent Agents (LIAgent), capable of performing classification, grouping, and interpretation tasks on data sets.
  • 12.
  • 13. The method used in this study is data mining with clustering and classification functions: the K-Means method for clustering and the Decision Tree method for classification. Data processing for this study was conducted using RapidMiner.
  • 14. K-Means is a simple and fast method, commonly used because it is easy to implement and requires a relatively small number of iterations (Lin & Ji, 2020). The Davies-Bouldin Index (DBI) is a method for evaluating cluster validity; its principle is to maximize the distance between clusters while minimizing the distance between points within a cluster (Jumadi Dehotman Sitompul, Salim Sitompul, & Sihombing, 2019). Among candidate clusterings, the one with the smallest DBI value is the best.
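The DBI principle described above can be sketched in a few lines of plain Python. This is a minimal illustration of the standard Davies-Bouldin definition with Euclidean distance, not the RapidMiner implementation used in the study; the function names and the toy points are made up for the example.

```python
import math

def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def davies_bouldin_index(clusters):
    """Davies-Bouldin index for a list of clusters (each a non-empty list of points).

    Lower is better: tight clusters (small scatter) that lie far apart
    (large centroid distance) give a small value.
    Assumes no two clusters share the same centroid.
    """
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    centroids = [centroid(c) for c in clusters]
    # Scatter S_i: average distance from each point to its own centroid.
    scatter = [
        sum(math.dist(p, centroids[i]) for p in c) / len(c)
        for i, c in enumerate(clusters)
    ]
    total = 0.0
    for i in range(len(clusters)):
        # Worst-case similarity ratio between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(len(clusters))
            if j != i
        )
    return total / len(clusters)

# Toy data, invented for illustration (not the sales dataset).
tight = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]   # compact, well separated
loose = [[(0, 0), (0, 4)], [(3, 0), (3, 4)]]     # spread out, close together
print(davies_bouldin_index(tight))
print(davies_bouldin_index(loose))
```

The well-separated toy clustering scores lower than the overlapping one, which is why the smallest DBI value is preferred.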
  • 15. Decision Tree is the most commonly used predictive model for classification. It is a flowchart-like structure in which each node represents a test on an attribute value, each branch represents a test outcome, and each leaf represents a class or class distribution (Jain & Srivastava, 2013). A decision tree consists of a root node, branches, and leaf nodes; the topmost node is the root node. The main goal is to build a model that predicts the value of a target variable from many input variables, which the decision tree classification model expresses as prediction rules (Rohini & Suseendran, 2016).
  • 16. Dataset: The dataset used is a weekly sales transaction dataset with 811 initial records and 104 attributes. The attributes are Product_Code, sales data for 52 weeks, minimum sales, maximum sales, and normalized weekly values. Data Pre-Processing: From the existing attributes, Product_Code (P1, P2, P3, P4, ..., P819), used as the 'id', and the 52 weekly sales attributes (W0, W1, W2, ..., W51), which contain integer data, are selected for the first processing step with the K-Means clustering method. In the next stage, the clustering results are processed again with the Decision Tree method, with the added cluster attribute as the 'label', taking the values cluster_0, cluster_1, and cluster_2.
  • 17. Modeling and Evaluation: This research consists of two parts, with data processing conducted using RapidMiner. The first part is modelling the K-Means method, as shown in Figure 2. The data is first passed through the Replace Missing Values operator to handle missing data, then connected to the Clustering operator, in this case K-Means. The Performance operator is used to obtain the Davies-Bouldin Index value, where the smallest value indicates the optimal number of clusters. The number of clusters is tested from 3 to 10.
  • 18. Figure 2. Modelling the K-Means Method
  • 19. The second part is classification with the Decision Tree method, with the data divided into training and testing data at a ratio of 80:20. The Optimize Parameters (Grid) operator is used to find the parameter settings that give the best performance. Modelling for Decision Tree with parameter optimization is shown in Figure 3 through Figure 5.
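As a rough sketch of what an 80:20 split and a parameter grid involve, the following plain-Python example shuffles 811 stand-in record ids and enumerates every combination in a small search space. The helper names, the random seed, and the candidate values are assumptions for illustration; the slides report only the best settings found in RapidMiner, not the grid that was searched.

```python
import itertools
import random

def train_test_split(records, train_ratio=0.8, seed=42):
    """Shuffle the records and cut them into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def parameter_grid(space):
    """Yield one dict per combination of candidate parameter values."""
    names = list(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))

# Stand-in ids for the 811 records mentioned in the slides.
train, test = train_test_split(range(811))

# Hypothetical candidate values; only the parameter names come from the slides.
space = {
    "number_of_folds": [5, 8, 10],
    "criterion": ["information_gain", "gini_index"],
    "maximal_depth": [10, 50],
    "apply_pruning": [True, False],
    "apply_prepruning": [True, False],
}
grid = list(parameter_grid(space))
print(len(train), len(test), len(grid))
```

A grid search would train and evaluate one model per entry in `grid` and keep the combination with the best score; that evaluation step is left out here because it depends on the modelling tool.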
  • 20. Figure 3. Modelling Decision Tree with Parameter Optimization
  • 21. Figure 4. Decision Tree Modelling Testing Figure 5. Detail Modelling Decision Tree
  • 22.
  • 23. In general, accuracy, precision, and recall are used to evaluate model performance. Whether each sample is predicted as positive or negative can be read from the classification prediction report. Combined with the actual category of each sample, this gives the four basic indicators shown in Table 1.
      Table 1. Confusion Matrix
               Positive   Negative
      True     TP         TN
      False    FP         FN
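The three measures follow directly from the four counts in the confusion matrix. The sketch below uses the standard binary definitions with invented counts; for the three-cluster labels in this study the reported precision and recall are presumably averaged over classes, which is an assumption rather than something the slides state.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Share of actual positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Invented counts for illustration only.
tp, tn, fp, fn = 40, 45, 5, 10
print(accuracy(tp, tn, fp, fn), precision(tp, fp), recall(tp, fn))
```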
  • 24. In the first data processing step with the K-Means method, performance tests for 3 to 10 clusters show that the smallest Davies-Bouldin Index value is 0.626, so the optimal number of clusters is 3, as shown in Table 2:
      Table 2. Cluster and DBI Value
      Cluster   DBI Value
      3         0.626
      4         0.864
      5         0.777
      6         1.988
      7         1.939
      8         2.204
      9         2.342
      10        2.178
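Choosing the cluster count from these results amounts to taking the minimum over the reported DBI values. A minimal sketch, using the values listed for 3 to 10 clusters (the variable names are illustrative):

```python
# DBI values reported in the slides for 3 to 10 clusters.
dbi_by_k = {
    3: 0.626, 4: 0.864, 5: 0.777, 6: 1.988,
    7: 1.939, 8: 2.204, 9: 2.342, 10: 2.178,
}

# The optimal number of clusters is the one with the smallest DBI value.
best_k = min(dbi_by_k, key=dbi_by_k.get)
print(best_k, dbi_by_k[best_k])  # prints: 3 0.626
```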
  • 25. The clustering result is then processed with the Decision Tree method with parameter optimization. The parameters and values that give the best performance are:
      a. Cross Validation.number_of_folds : 8
      b. Decision Tree.criterion : information_gain
      c. Decision Tree.maximal_depth : 50
      d. Decision Tree.apply_pruning : true
      e. Decision Tree.apply_prepruning : true
      The resulting performance is shown in Table 3:
  • 26. Table 3. Accuracy, Precision and Recall Values
                  Training Data   Testing Data
      Accuracy    98.06           97.12
      Precision   97.64           97.10
      Recall      97.49           96.05
      Because the accuracy on the testing data is greater than 90%, the classification in this study falls in the excellent classification category. The data processing also produced the Decision Tree shown in Figure 6, in which W8 is the root node, the attribute that most determines and affects the tree.
  • 28. The Decision Tree above is described as follows:
      W8 > 5.500
      |   W15 > 26.500: cluster_1 {cluster_2=0, cluster_0=0, cluster_1=86}
      |   W15 ≤ 26.500
      |   |   W0 > 4.500
      |   |   |   W19 > 3.500: cluster_2 {cluster_2=134, cluster_0=0, cluster_1=1}
      |   |   |   W19 ≤ 3.500: cluster_0 {cluster_2=0, cluster_0=2, cluster_1=0}
      |   |   W0 ≤ 4.500
      |   |   |   W5 > 9: cluster_2 {cluster_2=2, cluster_0=0, cluster_1=0}
      |   |   |   W5 ≤ 9: cluster_0 {cluster_2=0, cluster_0=17, cluster_1=0}
      W8 ≤ 5.500
      |   W34 > 10.500: cluster_2 {cluster_2=2, cluster_0=0, cluster_1=0}
      |   W34 ≤ 10.500: cluster_0 {cluster_2=0, cluster_0=324, cluster_1=0}
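The rules in the tree description can be read as nested conditions. The sketch below transcribes them into a plain-Python function, assuming that thresholds such as 5.500 denote 5.5 and that a product's weekly sales are passed as a dict keyed by W0 to W51; the function name and the sample values are illustrative.

```python
def classify(weekly_sales):
    """Assign a cluster label by following the decision tree rules."""
    w = weekly_sales
    if w["W8"] > 5.5:
        if w["W15"] > 26.5:
            return "cluster_1"
        if w["W0"] > 4.5:
            return "cluster_2" if w["W19"] > 3.5 else "cluster_0"
        return "cluster_2" if w["W5"] > 9 else "cluster_0"
    return "cluster_2" if w["W34"] > 10.5 else "cluster_0"

# Invented sample: a product with high sales in weeks 8 and 15.
sample = {f"W{i}": 0 for i in range(52)}
sample.update({"W8": 30, "W15": 40})
print(classify(sample))  # prints: cluster_1
```

Only six of the 52 weekly attributes appear in the rules, so the other weeks do not influence the predicted label.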
  • 29.
  • 30. Conclusion
      1. From the data processing conducted, the K-Means method with the Davies-Bouldin Index produced an optimal cluster count of 3.
      2. The Decision Tree method produces the best performance with parameter settings for Iteration, Cross Validation, Criterion, Maximal Depth, Pruning, and Pre-Pruning.
      3. The combination of clustering and classification methods in this study produced an excellent classification, because the accuracy in testing is greater than 90.
      4. In future work, the second part of the study can be compared with other classification methods such as Naïve Bayes, Neural Network, kNN, and Linear Regression.