SlideShare a Scribd company logo
1 of 5
Download to read offline
Accepted by editor: 25-01-2021 | Final revision: 15-09-2022 | Online publication: 25-09-2022
47
Comparative Analysis of Naïve Bayes and Decision Tree Algorithms in
Data Mining Classification to Predict Weckerle Machine Productivity
Fried Sinlae1*
, Anugrah Sandy Yudhasti2
, Arief Wibowo3
123
Universitas Budi Luhur, Jakarta, Indonesia
*
sinlaefried@gmail.com
Abstract
The level of data accuracy in everyday life is necessary because it is reflected in the ever-advancing development of
information technology. Analysis of data processing in information that can provide knowledge with the help of data mining
systems. Algorithms commonly used for prediction are Naive Bayes and Decision Trees. The purpose of this study is to
compare the Nave-Bayes algorithm and the decision tree algorithm in terms of the accuracy of predicting the productivity of
the Weckerle machine at PT XYZ. The method used is a literature study from various related sources and understanding of
the data in the source related to the subject of the classification method of the Naive Bayes algorithm and the decision tree
into the data mining system. The results of this study are a classification using the Nave-Bayes algorithm with a higher level
of confidence than the decision tree algorithm.
Keywords: Naïve Bayes; Decision Tree; Algorithm; Data Mining
1. Introduction
The level of data accuracy is needed in everyday life as it keeps driving the development of information
technology. According to [1], when determining any decision under certain conditions, the information must be
taken into account, therefore the availability of information has become a medium to analyze and summarize
knowledge from data, which is useful for decision-making. However, knowledge from data about information is
not enough to make decisions. Data analysis is also required to generate a consideration from the information
provided. The analysis of data management in information that can provide knowledge is carried out using a data
mining system [2].
One problem can be predicted from the trend in terms of rules or estimates into the future using data mining.
There are various real-life applications of steps and techniques in data mining, one of which is the classification
technique. According to [3], this classification is a basic form of data analysis, while according to [4] the
classification is a technique for group membership according to existing data. Previously, there have been many
studies on predicting closure using classification methods, one of which is the Nave Bayes algorithm and the
decision tree algorithm. According to [3], the Nave-Bayes algorithm itself is currently popular because the
accuracy rate of these two algorithms is high. Therefore, this study aims to compare the accuracy of the Nave-
Bayes algorithm and the decision tree algorithm in terms of the accuracy of predicting the productivity of the
Weckerle machine at PT XYZ.
The research the author will be conducting at this point is a comparison of Nave Bayes and decision tree
methods in data mining classification to predict the productivity of the Weckerle machine at PT XYZ.
2. Method
In this study, there are guidelines to help the research achieve maximum results and avoid falling short of the
research goals. The research steps are as follows:
a. Problem Identification. This stage carries out an explanation of the research problem by explaining the
important points of the problem and then the author gets a foundation for describing the research problem.
b. Literature Review. The research was conducted in literature from various related references and
understanding the data in the references related to the topic of the Naïve Bayes and Decision Tree algorithm
classification methods into a data mining system. There was also an assessment of theories related to
research topics that have been carried out in existing research as well as the development of various current
theories and references.
c. Analysis. At this stage, the process of studying, describing and solving a problem and research objectives is
carried out.
d. Paper Implementation. At this stage, the application of the analysis results into a paper or journal is carried
out.
Fried Sinlae, Anugrah Sandy Yudhasti, Arief Wibowo
DOI 10.29207/joseit.v1i2.3439
Journal of Systems Engineering and Information Technology (JOSEIT) Volume.1No. 2 (2022) 47-51
48
3. Result and Discussion
In this research, we analyze by comparing two methods namely decision tree algorithm and Nave Bayes
algorithm method in a data mining system. Here we compare the two methods using data from two sources
addressing the issue of predicting the productivity of the Weckerle machine at PT XYZ. The first source of
Weckerle machine productivity data below uses the Decision Tree algorithm computation. The process of
processing this data using the decision tree method works to build a decision tree.
There are several processes, namely:
a. Calculating the results of data summation, this data summation is based on the number of result
attributes by fulfilling the predetermined conditions.
b. Determine the attribute and use it for Node. Node is one of the attributes with the highest gain value
from other attributes.
c. Create branching for each member of the Node.
d. Check if any member of the Node has a zero value, but if the result has a zero value, then determine
which one is appropriate to become a leaf of the decision tree. Continue until the entire entropy value of
the members of the Node has a value of zero so that the process stops.
e. If there is more than zero entropy value coming from one member of the Node, then repeat the previous
process from the beginning until all Nodes have zero value.
Table 1. Data Sample
Finish Goods Down Time Reject Output Productive
High Available Many Many Yes
Medium Available Many Many Yes
Medium No Available Many Many Yes
Low Available Many Many Not Productive
Low Available Many Little Not Productive
High Available Many Many Yes
High No Available Little Little Not Productive
High No Available Many Many Yes
Medium Available Little Many Yes
Low Available Many Many Not Productive
Low Available Many Many Not Productive
Low No Available Many Little Not Productive
Medium No Available Little Little Not Productive
Medium No Available Little Many Productive
High No Available Many Many Productive
High Available Many Little Not Productive
Low No Available Many Little Not Productive
High Available Many Many Productive
Medium Available Many Many Productive
High Available Many Many Productive
High Available Many Many Productive
Low No Available Many Little Not Productive
Medium Available Many Many Productive
High Available Many Little Not Productive
Medium No Available Little Little Not Productive
Low No Available Little Many Not Productive
High Available Many Many Productive
High Available Little Many Productive
High Available Many Many Productive
Medium Available Little Many Productive
Medium Available Little Many Productive
Medium No Available Many Many Productive
High Available Many Many Productive
Low Available Little Little Not Productive
High Available Many Many Productive
Fried Sinlae, Anugrah Sandy Yudhasti, Arief Wibowo
DOI 10.29207/joseit.v1i2.3439
Journal of Systems Engineering and Information Technology (JOSEIT) Volume.1No. 2 (2022) 47-51
49
After getting sample data, go through the process of calculating data amount, entropy, and gain. The results are
in the following table:
Table 2. Calculation Of the Amount of Data, Entropy, and Gain
Node Sum Productive No Productive Entropy Gain
1 Total 35 21 14 0,97095
Finish
Goods
0,446569331
High 15 12 3 0,72193
Medium 11 9 2 0,68404
Low 9 0 9 0,00000
Down
Time
0,052411577
Available 23 16 7 0,88654
No
Available
12 5 7 0,97987
Reject
Product
0,011891174
Many 25 16 9 0,94268
Little 10 5 5 1,00000
Output
Products
0,517872341
Many 25 21 4 0,63431
Little 10 0 10 0,00000
After running through several calculation processes using the decision tree method from the data, one obtains
the results of the highest gain value, namely the production output. Then the production output is put here in the
root of the decision tree. And the finished goods production becomes a factor that determines the productivity of
the weckerle machine. Seen on the following picture:
Figure 1. Decision Tree
Figure 1 explains that there are two types of members, namely many members (productive) and few members
(unproductive). There are 3 members in Finish Goods, namely low (unproductive), medium (productive), high
(productive). And why this decision tree only reaches Finish Goods, that's because the value between productive
and unproductive members has a value of 0, so the decision can be obtained directly. This also shows that rejects
and downtimes do not affect the productivity of the Weckerle PT XYZ machine.
The next test concerns sample data using tools in the Excel application, namely Rapidminer tools, starting
with the connection process between the sample database and the operator and subsequent validation as shown in
Figure 2 below:
Fried Sinlae, Anugrah Sandy Yudhasti, Arief Wibowo
DOI 10.29207/joseit.v1i2.3439
Journal of Systems Engineering and Information Technology (JOSEIT) Volume.1No. 2 (2022) 47-51
50
Figure 2. Decision Tree Connection in RapidMiner Tools
From the RapidMiner connection process above, the results obtained are the same as from the manual
calculation process in Figure 2 to obtain the decision tree results as below. This chapter explains the process
carried out in this study. There are several steps used in this research.
Figure 3. Decision Tree in RapidMiner Tools
The following is a screenshot of the measurement results of the test data against the Apply Decision Tree
algorithm model in predicting the productivity of the Weckerle machine. It can be seen that the trust is
productive 1 and unproductive 0 as it follows the decision tree in Figure 4. Downtime and scrap products are
ignored.
Figure 4. Prediction of Decision Tree Test Data on RapidMiner Tools
In this second case study, the Nave Bayes method is applied, with the data processing still based on the same
topic. With this method, there is a set application testing process with RapidMiner. With this RapidMiner
application, it is grouped by the selected attributes, which are the same attributes and data sets as the decision
tree method.
Figure 5. Naïve Bayes Connection in RapidMiner Tools
Fried Sinlae, Anugrah Sandy Yudhasti, Arief Wibowo
DOI 10.29207/joseit.v1i2.3439
Journal of Systems Engineering and Information Technology (JOSEIT) Volume.1No. 2 (2022) 47-51
51
The following is a screenshot of the measurement results of the test data against the Apply Nave Bayes
algorithm model in predicting the productivity of the Weckerle machine. It can be seen that the confidence is
productive 0.816 and unproductive 0.184 because it follows the probability formula in Nave Bayes that it can be
implemented in numbers unlike the decision tree algorithm which just follows the decision tree and ignores
attributes that do not go into the calculation.
Figure 6. Prediction of Naïve Bayes Test Data in RapidMiner Tools
4. Conclusion
From the results of the explanation in the above description it can be concluded that the use of the Nave-Bayes
algorithm method evaluates the accuracy of the data to show the confidence of the processed test data applied to
display decimal numbers. Meanwhile, the decision tree algorithm gets the results of measuring the confidence of
test data in predicting the productivity of the Weckerle machine, which only shows the number 1 for productive
and 0 for unproductive. So, this shows that the Nave Bayes algorithm for predicting the deal has a higher
confidence level of accuracy than the decision tree algorithm.
References
[1] N. M. Huda, APLIKASI DATA MINING UNTUK MENAMPILKAN INFORMASI TINGKAT KELULUSAN
MAHASISWA (Skripsi), Semarang: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Diponegoro, 2010.
[2] A. Fikri, Penerapan Data Mining Untuk Mengetahui Tingkat Kekuatan Beton Yang Dihasilkan Dengan Metode Estimasi
Menggunakan Linear Regression (Skripsi), Semarang: Fakultas Ilmu Komputer Universitas Dian Nuswantoro, 2013.
[3] Y. I. Kurniawan, "PERBANDINGAN ALGORITMA NAIVE BAYES DAN C.45 DALAM KLASIFIKASI DATA
MINING," Jurnal Teknologi Informasi dan Ilmu Komputer (JTIIK), pp. 455-463, 2018.
[4] M. S. S. G. Arpit Bansal, "Improved K-mean Clustering Algorithm for Prediction Analysis using Classification
Technique in Data Mining," International Journal of Computer Applications, pp. 35-40, 2017.
[5] P. B. Davies, Database Systems Third Edition, New York: Plgrave Macmillan, 2004.
[6] R. J. Roiger, Data Mining: A Tutorial-Based Primer, CRC Press, 2017.
[7] A. Saleh, "KLASIFIKASI METODE NAIVE BAYES DALAM DATA MINING UNTUK MENENTUKAN
KONSENTRASI SISWA ( STUDI KASUS DI MAS PAB 2 MEDAN )," Konferensi Nasional Pengembangan
Teknologi Informasi dan Komunikasi, pp. 200-208, 2015.
[8] Bustami, "PENERAPAN ALGORITMA NAIVE BAYES UNTUK MENGKLASIFIKASI DATA NASABAH
ASURANSI," JURNAL INFORMATIKA, pp. 884-898, 2018.
[9] A. K. N. I. A. Ayung Candra Padmasari, "Penerapan Model Decision Tree untuk Rancangan Game Multiplayer Berbasis
Jaringan (Uka-Uka Tresure Hunter)," Jurnal Edsence Vol. 1 No. 1, pp. 19-24, 2019.
[10] B. Unhelkar, Software Engineering with UML, Boca Raton: Auerbach Publications/CRC Press, 2018.
[11] Z. Syahputra, "Penerapan Permodelan UML Sistem Informasi Perpustakaan pada Universitas Islam Indragiri Berbasis
Client Server," Jurnal SISTEMASI, pp. 57-64, 2015.
[12] J. Martin, Rapid Application Development, New York: Macmillan Publishing, 1991.
[13] Nik, M. N., Nor, A. A., & Hazlifah, M. R., "Implementing Rapid Application Development," in Implementing Rapid
Application Development (RAD) Methodology in Developing Practical Training Application System, Kuala Lumpur,
Malaysia, IEEE, 2010, pp. 1664-1667.
[14] T. Lucidchart, "4 Phases of Rapid Application Development Methodology," 23 May 2018. [Online]. Available:
https://www.lucidchart.com/blog/rapid-application-development-methodology. [Accessed 7 11 2020].
[15] D. D. A. R. Ricardo M. Bastos, Extending UML Activity Diagram for Workflow Modeling in Production Systems,
Hawaii: IEEE, 2002.
[16] Y. Mardi, "Data Mining : Klasifikasi Menggunakan Algoritma C4.5," Jurnal Edik Informatika, pp. 213-219, 2016.
[17] E. Elisa, "Analisa dan Penerapan Algoritma C4.5 Dalam Data Mining Untuk Mengidentifikasi Faktor-Faktor Penyebab
Kecelakaan Kerja Kontruksi PT.Arupadhatu Adisesanti," Jurnal Online Informatika, pp. 36-41, 2017.

More Related Content

Similar to Comparative Analysis of Naive Bayes and Decision Tree Algorithms in Data Mining Classification to Predict Weckerle Machine Productivity

IRJET- Medical Data Mining
IRJET- Medical Data MiningIRJET- Medical Data Mining
IRJET- Medical Data MiningIRJET Journal
 
A study and survey on various progressive duplicate detection mechanisms
A study and survey on various progressive duplicate detection mechanismsA study and survey on various progressive duplicate detection mechanisms
A study and survey on various progressive duplicate detection mechanismseSAT Journals
 
C055011012
C055011012C055011012
C055011012inventy
 
Selecting the correct Data Mining Method: Classification & InDaMiTe-R
Selecting the correct Data Mining Method: Classification & InDaMiTe-RSelecting the correct Data Mining Method: Classification & InDaMiTe-R
Selecting the correct Data Mining Method: Classification & InDaMiTe-RIOSR Journals
 
Novel holistic architecture for analytical operation on sensory data relayed...
Novel holistic architecture for analytical operation  on sensory data relayed...Novel holistic architecture for analytical operation  on sensory data relayed...
Novel holistic architecture for analytical operation on sensory data relayed...IJECEIAES
 
A REVIEW PAPER ON BIG DATA ANALYTICS
A REVIEW PAPER ON BIG DATA ANALYTICSA REVIEW PAPER ON BIG DATA ANALYTICS
A REVIEW PAPER ON BIG DATA ANALYTICSSarah Adams
 
Selection of Articles Using Data Analytics for Behavioral Dissertation Resear...
Selection of Articles Using Data Analytics for Behavioral Dissertation Resear...Selection of Articles Using Data Analytics for Behavioral Dissertation Resear...
Selection of Articles Using Data Analytics for Behavioral Dissertation Resear...PhD Assistance
 
IP final project
IP final project IP final project
IP final project SantySS
 
Cross Domain Recommender System using Machine Learning and Transferable Knowl...
Cross Domain Recommender System using Machine Learning and Transferable Knowl...Cross Domain Recommender System using Machine Learning and Transferable Knowl...
Cross Domain Recommender System using Machine Learning and Transferable Knowl...IRJET Journal
 
Internship project report,Predictive Modelling
Internship project report,Predictive ModellingInternship project report,Predictive Modelling
Internship project report,Predictive ModellingAmit Kumar
 
Cross Domain Data Fusion
Cross Domain Data FusionCross Domain Data Fusion
Cross Domain Data FusionIRJET Journal
 
Detection Of Fraudlent Behavior In Water Consumption Using A Data Mining Base...
Detection Of Fraudlent Behavior In Water Consumption Using A Data Mining Base...Detection Of Fraudlent Behavior In Water Consumption Using A Data Mining Base...
Detection Of Fraudlent Behavior In Water Consumption Using A Data Mining Base...pandavaTirumala
 
IRJET- Analysis of Brand Value Prediction based on Social Media Data
IRJET-  	  Analysis of Brand Value Prediction based on Social Media DataIRJET-  	  Analysis of Brand Value Prediction based on Social Media Data
IRJET- Analysis of Brand Value Prediction based on Social Media DataIRJET Journal
 
Mining Social Media Data for Understanding Drugs Usage
Mining Social Media Data for Understanding Drugs  UsageMining Social Media Data for Understanding Drugs  Usage
Mining Social Media Data for Understanding Drugs UsageIRJET Journal
 
Fuzzy Analytical Hierarchy Process Method to Determine the Project Performanc...
Fuzzy Analytical Hierarchy Process Method to Determine the Project Performanc...Fuzzy Analytical Hierarchy Process Method to Determine the Project Performanc...
Fuzzy Analytical Hierarchy Process Method to Determine the Project Performanc...IRJET Journal
 
A Firefly based improved clustering algorithm
A Firefly based improved clustering algorithmA Firefly based improved clustering algorithm
A Firefly based improved clustering algorithmIRJET Journal
 
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASETSURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASETEditor IJMTER
 

Similar to Comparative Analysis of Naive Bayes and Decision Tree Algorithms in Data Mining Classification to Predict Weckerle Machine Productivity (20)

IRJET- Medical Data Mining
IRJET- Medical Data MiningIRJET- Medical Data Mining
IRJET- Medical Data Mining
 
A study and survey on various progressive duplicate detection mechanisms
A study and survey on various progressive duplicate detection mechanismsA study and survey on various progressive duplicate detection mechanisms
A study and survey on various progressive duplicate detection mechanisms
 
C055011012
C055011012C055011012
C055011012
 
Selecting the correct Data Mining Method: Classification & InDaMiTe-R
Selecting the correct Data Mining Method: Classification & InDaMiTe-RSelecting the correct Data Mining Method: Classification & InDaMiTe-R
Selecting the correct Data Mining Method: Classification & InDaMiTe-R
 
Novel holistic architecture for analytical operation on sensory data relayed...
Novel holistic architecture for analytical operation  on sensory data relayed...Novel holistic architecture for analytical operation  on sensory data relayed...
Novel holistic architecture for analytical operation on sensory data relayed...
 
A REVIEW PAPER ON BIG DATA ANALYTICS
A REVIEW PAPER ON BIG DATA ANALYTICSA REVIEW PAPER ON BIG DATA ANALYTICS
A REVIEW PAPER ON BIG DATA ANALYTICS
 
Selection of Articles Using Data Analytics for Behavioral Dissertation Resear...
Selection of Articles Using Data Analytics for Behavioral Dissertation Resear...Selection of Articles Using Data Analytics for Behavioral Dissertation Resear...
Selection of Articles Using Data Analytics for Behavioral Dissertation Resear...
 
My experiment
My experimentMy experiment
My experiment
 
IP final project
IP final project IP final project
IP final project
 
Cross Domain Recommender System using Machine Learning and Transferable Knowl...
Cross Domain Recommender System using Machine Learning and Transferable Knowl...Cross Domain Recommender System using Machine Learning and Transferable Knowl...
Cross Domain Recommender System using Machine Learning and Transferable Knowl...
 
Internship project report,Predictive Modelling
Internship project report,Predictive ModellingInternship project report,Predictive Modelling
Internship project report,Predictive Modelling
 
Cross Domain Data Fusion
Cross Domain Data FusionCross Domain Data Fusion
Cross Domain Data Fusion
 
Detection Of Fraudlent Behavior In Water Consumption Using A Data Mining Base...
Detection Of Fraudlent Behavior In Water Consumption Using A Data Mining Base...Detection Of Fraudlent Behavior In Water Consumption Using A Data Mining Base...
Detection Of Fraudlent Behavior In Water Consumption Using A Data Mining Base...
 
pradeep ppt final.pptx
pradeep ppt final.pptxpradeep ppt final.pptx
pradeep ppt final.pptx
 
IRJET- Analysis of Brand Value Prediction based on Social Media Data
IRJET-  	  Analysis of Brand Value Prediction based on Social Media DataIRJET-  	  Analysis of Brand Value Prediction based on Social Media Data
IRJET- Analysis of Brand Value Prediction based on Social Media Data
 
Mining Social Media Data for Understanding Drugs Usage
Mining Social Media Data for Understanding Drugs  UsageMining Social Media Data for Understanding Drugs  Usage
Mining Social Media Data for Understanding Drugs Usage
 
Fuzzy Analytical Hierarchy Process Method to Determine the Project Performanc...
Fuzzy Analytical Hierarchy Process Method to Determine the Project Performanc...Fuzzy Analytical Hierarchy Process Method to Determine the Project Performanc...
Fuzzy Analytical Hierarchy Process Method to Determine the Project Performanc...
 
H04564550
H04564550H04564550
H04564550
 
A Firefly based improved clustering algorithm
A Firefly based improved clustering algorithmA Firefly based improved clustering algorithm
A Firefly based improved clustering algorithm
 
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASETSURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET
 

Recently uploaded

Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 

Recently uploaded (20)

VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 

Comparative Analysis of Naive Bayes and Decision Tree Algorithms in Data Mining Classification to Predict Weckerle Machine Productivity

  • 1. Accepted by editor: 25-01-2021 | Final revision: 15-09-2022 | Online publication: 25-09-2022 47 Comparative Analysis of Naïve Bayes and Decision Tree Algorithms in Data Mining Classification to Predict Weckerle Machine Productivity Fried Sinlae1* , Anugrah Sandy Yudhasti2 , Arief Wibowo3 123 Universitas Budi Luhur, Jakarta, Indonesia * sinlaefried@gmail.com Abstract The level of data accuracy in everyday life is necessary because it is reflected in the ever-advancing development of information technology. Analysis of data processing in information that can provide knowledge with the help of data mining systems. Algorithms commonly used for prediction are Naive Bayes and Decision Trees. The purpose of this study is to compare the Nave-Bayes algorithm and the decision tree algorithm in terms of the accuracy of predicting the productivity of the Weckerle machine at PT XYZ. The method used is a literature study from various related sources and understanding of the data in the source related to the subject of the classification method of the Naive Bayes algorithm and the decision tree into the data mining system. The results of this study are a classification using the Nave-Bayes algorithm with a higher level of confidence than the decision tree algorithm. Keywords: Naïve Bayes; Decision Tree; Algorithm; Data Mining 1. Introduction The level of data accuracy is needed in everyday life as it keeps driving the development of information technology. According to [1], when determining any decision under certain conditions, the information must be taken into account, therefore the availability of information has become a medium to analyze and summarize knowledge from data, which is useful for decision-making. However, knowledge from data about information is not enough to make decisions. Data analysis is also required to generate a consideration from the information provided. The analysis of data management in information that can provide knowledge is carried out using a data mining system [2]. One problem can be predicted from the trend in terms of rules or estimates into the future using data mining. There are various real-life applications of steps and techniques in data mining, one of which is the classification technique. According to [3], this classification is a basic form of data analysis, while according to [4] the classification is a technique for group membership according to existing data. Previously, there have been many studies on predicting closure using classification methods, one of which is the Nave Bayes algorithm and the decision tree algorithm. According to [3], the Nave-Bayes algorithm itself is currently popular because the accuracy rate of these two algorithms is high. Therefore, this study aims to compare the accuracy of the Nave- Bayes algorithm and the decision tree algorithm in terms of the accuracy of predicting the productivity of the Weckerle machine at PT XYZ. The research the author will be conducting at this point is a comparison of Nave Bayes and decision tree methods in data mining classification to predict the productivity of the Weckerle machine at PT XYZ. 2. Method In this study, there are guidelines to help the research achieve maximum results and avoid falling short of the research goals. The research steps are as follows: a. Problem Identification. This stage carries out an explanation of the research problem by explaining the important points of the problem and then the author gets a foundation for describing the research problem. b. Literature Review. The research was conducted in literature from various related references and understanding the data in the references related to the topic of the Naïve Bayes and Decision Tree algorithm classification methods into a data mining system. There was also an assessment of theories related to research topics that have been carried out in existing research as well as the development of various current theories and references. c. Analysis. At this stage, the process of studying, describing and solving a problem and research objectives is carried out. d. Paper Implementation. At this stage, the application of the analysis results into a paper or journal is carried out.
  • 2. Fried Sinlae, Anugrah Sandy Yudhasti, Arief Wibowo DOI 10.29207/joseit.v1i2.3439 Journal of Systems Engineering and Information Technology (JOSEIT) Volume.1No. 2 (2022) 47-51 48 3. Result and Discussion In this research, we analyze by comparing two methods namely decision tree algorithm and Nave Bayes algorithm method in a data mining system. Here we compare the two methods using data from two sources addressing the issue of predicting the productivity of the Weckerle machine at PT XYZ. The first source of Weckerle machine productivity data below uses the Decision Tree algorithm computation. The process of processing this data using the decision tree method works to build a decision tree. There are several processes, namely: a. Calculating the results of data summation, this data summation is based on the number of result attributes by fulfilling the predetermined conditions. b. Determine the attribute and use it for Node. Node is one of the attributes with the highest gain value from other attributes. c. Create branching for each member of the Node. d. Check if any member of the Node has a zero value, but if the result has a zero value, then determine which one is appropriate to become a leaf of the decision tree. Continue until the entire entropy value of the members of the Node has a value of zero so that the process stops. e. If there is more than zero entropy value coming from one member of the Node, then repeat the previous process from the beginning until all Nodes have zero value. Table 1. Data Sample Finish Goods Down Time Reject Output Productive High Available Many Many Yes Medium Available Many Many Yes Medium No Available Many Many Yes Low Available Many Many Not Productive Low Available Many Little Not Productive High Available Many Many Yes High No Available Little Little Not Productive High No Available Many Many Yes Medium Available Little Many Yes Low Available Many Many Not Productive Low Available Many Many Not Productive Low No Available Many Little Not Productive Medium No Available Little Little Not Productive Medium No Available Little Many Productive High No Available Many Many Productive High Available Many Little Not Productive Low No Available Many Little Not Productive High Available Many Many Productive Medium Available Many Many Productive High Available Many Many Productive High Available Many Many Productive Low No Available Many Little Not Productive Medium Available Many Many Productive High Available Many Little Not Productive Medium No Available Little Little Not Productive Low No Available Little Many Not Productive High Available Many Many Productive High Available Little Many Productive High Available Many Many Productive Medium Available Little Many Productive Medium Available Little Many Productive Medium No Available Many Many Productive High Available Many Many Productive Low Available Little Little Not Productive High Available Many Many Productive
  • 3. Fried Sinlae, Anugrah Sandy Yudhasti, Arief Wibowo DOI 10.29207/joseit.v1i2.3439 Journal of Systems Engineering and Information Technology (JOSEIT) Volume.1No. 2 (2022) 47-51 49 After getting sample data, go through the process of calculating data amount, entropy, and gain. The results are in the following table: Table 2. Calculation Of the Amount of Data, Entropy, and Gain Node Sum Productive No Productive Entropy Gain 1 Total 35 21 14 0,97095 Finish Goods 0,446569331 High 15 12 3 0,72193 Medium 11 9 2 0,68404 Low 9 0 9 0,00000 Down Time 0,052411577 Available 23 16 7 0,88654 No Available 12 5 7 0,97987 Reject Product 0,011891174 Many 25 16 9 0,94268 Little 10 5 5 1,00000 Output Products 0,517872341 Many 25 21 4 0,63431 Little 10 0 10 0,00000 After running through several calculation processes using the decision tree method from the data, one obtains the results of the highest gain value, namely the production output. Then the production output is put here in the root of the decision tree. And the finished goods production becomes a factor that determines the productivity of the weckerle machine. Seen on the following picture: Figure 1. Decision Tree Figure 1 explains that there are two types of members, namely many members (productive) and few members (unproductive). There are 3 members in Finish Goods, namely low (unproductive), medium (productive), high (productive). And why this decision tree only reaches Finish Goods, that's because the value between productive and unproductive members has a value of 0, so the decision can be obtained directly. This also shows that rejects and downtimes do not affect the productivity of the Weckerle PT XYZ machine. The next test concerns sample data using tools in the Excel application, namely Rapidminer tools, starting with the connection process between the sample database and the operator and subsequent validation as shown in Figure 2 below:
  • 4. Fried Sinlae, Anugrah Sandy Yudhasti, Arief Wibowo DOI 10.29207/joseit.v1i2.3439 Journal of Systems Engineering and Information Technology (JOSEIT) Volume.1No. 2 (2022) 47-51 50 Figure 2. Decision Tree Connection in RapidMiner Tools From the RapidMiner connection process above, the results obtained are the same as from the manual calculation process in Figure 2 to obtain the decision tree results as below. This chapter explains the process carried out in this study. There are several steps used in this research. Figure 3. Decision Tree in RapidMiner Tools The following is a screenshot of the measurement results of the test data against the Apply Decision Tree algorithm model in predicting the productivity of the Weckerle machine. It can be seen that the trust is productive 1 and unproductive 0 as it follows the decision tree in Figure 4. Downtime and scrap products are ignored. Figure 4. Prediction of Decision Tree Test Data on RapidMiner Tools In this second case study, the Nave Bayes method is applied, with the data processing still based on the same topic. With this method, there is a set application testing process with RapidMiner. With this RapidMiner application, it is grouped by the selected attributes, which are the same attributes and data sets as the decision tree method. Figure 5. Naïve Bayes Connection in RapidMiner Tools
  • 5. Fried Sinlae, Anugrah Sandy Yudhasti, Arief Wibowo DOI 10.29207/joseit.v1i2.3439 Journal of Systems Engineering and Information Technology (JOSEIT) Volume.1No. 2 (2022) 47-51 51 The following is a screenshot of the measurement results of the test data against the Apply Nave Bayes algorithm model in predicting the productivity of the Weckerle machine. It can be seen that the confidence is productive 0.816 and unproductive 0.184 because it follows the probability formula in Nave Bayes that it can be implemented in numbers unlike the decision tree algorithm which just follows the decision tree and ignores attributes that do not go into the calculation. Figure 6. Prediction of Naïve Bayes Test Data in RapidMiner Tools 4. Conclusion From the results of the explanation in the above description it can be concluded that the use of the Nave-Bayes algorithm method evaluates the accuracy of the data to show the confidence of the processed test data applied to display decimal numbers. Meanwhile, the decision tree algorithm gets the results of measuring the confidence of test data in predicting the productivity of the Weckerle machine, which only shows the number 1 for productive and 0 for unproductive. So, this shows that the Nave Bayes algorithm for predicting the deal has a higher confidence level of accuracy than the decision tree algorithm. References [1] N. M. Huda, APLIKASI DATA MINING UNTUK MENAMPILKAN INFORMASI TINGKAT KELULUSAN MAHASISWA (Skripsi), Semarang: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Diponegoro, 2010. [2] A. Fikri, Penerapan Data Mining Untuk Mengetahui Tingkat Kekuatan Beton Yang Dihasilkan Dengan Metode Estimasi Menggunakan Linear Regression (Skripsi), Semarang: Fakultas Ilmu Komputer Universitas Dian Nuswantoro, 2013. [3] Y. I. Kurniawan, "PERBANDINGAN ALGORITMA NAIVE BAYES DAN C.45 DALAM KLASIFIKASI DATA MINING," Jurnal Teknologi Informasi dan Ilmu Komputer (JTIIK), pp. 455-463, 2018. [4] M. S. S. G. Arpit Bansal, "Improved K-mean Clustering Algorithm for Prediction Analysis using Classification Technique in Data Mining," International Journal of Computer Applications, pp. 35-40, 2017. [5] P. B. Davies, Database Systems Third Edition, New York: Plgrave Macmillan, 2004. [6] R. J. Roiger, Data Mining: A Tutorial-Based Primer, CRC Press, 2017. [7] A. Saleh, "KLASIFIKASI METODE NAIVE BAYES DALAM DATA MINING UNTUK MENENTUKAN KONSENTRASI SISWA ( STUDI KASUS DI MAS PAB 2 MEDAN )," Konferensi Nasional Pengembangan Teknologi Informasi dan Komunikasi, pp. 200-208, 2015. [8] Bustami, "PENERAPAN ALGORITMA NAIVE BAYES UNTUK MENGKLASIFIKASI DATA NASABAH ASURANSI," JURNAL INFORMATIKA, pp. 884-898, 2018. [9] A. K. N. I. A. Ayung Candra Padmasari, "Penerapan Model Decision Tree untuk Rancangan Game Multiplayer Berbasis Jaringan (Uka-Uka Tresure Hunter)," Jurnal Edsence Vol. 1 No. 1, pp. 19-24, 2019. [10] B. Unhelkar, Software Engineering with UML, Boca Raton: Auerbach Publications/CRC Press, 2018. [11] Z. Syahputra, "Penerapan Permodelan UML Sistem Informasi Perpustakaan pada Universitas Islam Indragiri Berbasis Client Server," Jurnal SISTEMASI, pp. 57-64, 2015. [12] J. Martin, Rapid Application Development, New York: Macmillan Publishing, 1991. [13] Nik, M. N., Nor, A. A., & Hazlifah, M. R., "Implementing Rapid Application Development," in Implementing Rapid Application Development (RAD) Methodology in Developing Practical Training Application System, Kuala Lumpur, Malaysia, IEEE, 2010, pp. 1664-1667. [14] T. Lucidchart, "4 Phases of Rapid Application Development Methodology," 23 May 2018. [Online]. Available: https://www.lucidchart.com/blog/rapid-application-development-methodology. [Accessed 7 11 2020]. [15] D. D. A. R. Ricardo M. Bastos, Extending UML Activity Diagram for Workflow Modeling in Production Systems, Hawaii: IEEE, 2002. [16] Y. Mardi, "Data Mining : Klasifikasi Menggunakan Algoritma C4.5," Jurnal Edik Informatika, pp. 213-219, 2016. [17] E. Elisa, "Analisa dan Penerapan Algoritma C4.5 Dalam Data Mining Untuk Mengidentifikasi Faktor-Faktor Penyebab Kecelakaan Kerja Kontruksi PT.Arupadhatu Adisesanti," Jurnal Online Informatika, pp. 36-41, 2017.