CLASSIFICATION OF WORK ABSENTEEISM DATA USING THE
K-NEAREST NEIGHBOUR, NAÏVE BAYES, DECISION TREE
AND RANDOM FOREST METHODS
BY: DIAN VITIANA NINGRUM
(06211540000020)
INTRODUCTION
A company's success depends on the punctuality and integrity of its employees. Work attendance needs to be monitored so that workers comply with company rules.
An employee attendance management system focuses on improving work productivity and employee self-development more effectively and efficiently, so that the company can compete well with its competitors.
DATA SOURCE
SECONDARY DATA
The data are secondary data, taken from the UCI Machine Learning Repository.
ABSENTEEISM AT WORK
The dataset used is Absenteeism at Work.
RESEARCH VARIABLES
• Y : Employee absenteeism
• X1 : Reason for absence
• X2 : Transportation expense
• X3 : Distance from residence to work
• X4 : Service time
• X5 : Age
• X6 : Average workload per day
• X7 : Hit target
• X8 : Disciplinary failure
• X9 : Body mass index
CLASSIFICATION METHODS
• K-NEAREST NEIGHBOR
• NAÏVE BAYES
• DECISION TREE
• RANDOM FOREST
The method with the highest score among these four is then selected.
Import Packages
## import packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
Loading Data
data = pd.read_excel(r'E:data_absen.xlsx', sheet_name='Sheet1')
FEATURE ENGINEERING
The response Y, Absenteeism time in hours, is divided into 2 categories, less than 6 hours and more than 6 hours, so the data are recoded as follows:
hours = [0, 1, 2, 3, 4, 5, 7, 8, 16, 24, 32, 40, 48, 56, 64, 80, 104, 112, 120]
# 'Kurang dari 6 jam' = less than 6 hours; 'Lebih dari 6 jam' = more than 6 hours
labels = ['Kurang dari 6 jam' if h < 6 else 'Lebih dari 6 jam' for h in hours]
# assign the result back instead of calling replace(..., inplace=True) on a column selection
data['Absenteeism time in hours'] = data['Absenteeism time in hours'].replace(hours, labels)
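The recoding above amounts to a threshold at 6 hours. A minimal sketch of the same rule written as a comparison, assuming a small hypothetical sample of hour values (not taken from the dataset) and reusing the two Indonesian labels from the slide:

```python
import pandas as pd

# Hypothetical sample of raw absence durations in hours (illustration only).
hours = pd.Series([0, 2, 5, 7, 8, 40, 120], name="Absenteeism time in hours")

# The two category labels used in the slides:
# "Kurang dari 6 jam" = less than 6 hours, "Lebih dari 6 jam" = more than 6 hours.
LESS, MORE = "Kurang dari 6 jam", "Lebih dari 6 jam"

# Values below 6 go to the first category, everything else to the second.
category = hours.lt(6).map({True: LESS, False: MORE})

print(category.value_counts())
```

A rule like this also covers hour values that are not listed explicitly, which an explicit value-to-label list would leave unchanged.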
## drop data
data = data.drop(columns=["ID", "Education", "Son", "Seasons", "Social drinker",
                          "Social smoker", "Pet", "Height", "Month of absence",
                          "Day of the week", "Weight"])
Columns that are not considered necessary for the analysis are dropped.
data.head()
data.tail()
## Missing-value detection
for col in data.columns.values:
    if data[col].isnull().values.any():
        print("Missing values in " + col)
data.isnull().any()
Missing-value detection checks whether any data are missing or incomplete. The result for this data is False, which means that no data are missing.
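To see how the check behaves when values really are missing, here is a small sketch on a hypothetical frame (the column names mirror the dataset, the values are made up). It uses `isnull().sum()` to count the gaps per column rather than only flagging them:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one deliberately missing value (illustration only).
frame = pd.DataFrame({
    "Age": [33, 50, np.nan, 39],
    "Service time": [13, 18, 18, 14],
})

# any() answers "is anything missing?", sum() answers "how many?".
has_missing = frame.isnull().any()
missing_per_column = frame.isnull().sum()

print(has_missing)
print(missing_per_column)
```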
## Descriptive statistics
data.describe()
## Correlation
# correlation is not defined on the slide; computing it over the numeric columns is an assumption
correlation = data.corr(numeric_only=True)
f, ax = plt.subplots(figsize=(10, 10))
sns.heatmap(correlation, annot=True, linewidths=.5, fmt='.1f', ax=ax)
plt.title("Correlation between Columns of dataFrame", y=1.08)
plt.show()
Age and service time have the largest correlation, close to 0.7. The second-largest correlation, 0.5, is between age and body mass index.
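Each cell of the heatmap is a Pearson correlation coefficient, which is the default method of pandas `corr`. A minimal sketch of that calculation in plain Python, using made-up age and service-time values rather than the actual data:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variances up to the same factor).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values, for illustration only.
age = [28, 33, 38, 41, 50]
service_time = [6, 13, 11, 18, 24]

r = pearson(age, service_time)
print(round(r, 2))
```

Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.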
## Scatter Plot
g = sns.PairGrid(data, hue="Absenteeism time in hours")
g.map_diag(plt.hist)
g.map_offdiag(plt.scatter)
g.add_legend()
plt.show()
### PAIR PLOT
sns.pairplot(data=data[['Reason for absence', 'Transportation expense',
                        'Distance from Residence to Work', 'Service time',
                        'Age', 'Work load Average/day ', 'Hit target',
                        'Body mass index', 'Absenteeism time in hours']],
             hue='Absenteeism time in hours')
plt.show()
## Composition of variable Y
f, ax = plt.subplots(1, 2, figsize=(18, 8))
data['Absenteeism time in hours'].value_counts().plot.pie(
    explode=[0, 0.1], autopct='%1.1f%%', ax=ax[0], shadow=True)
ax[0].set_title('Absenteeism time in hours')
ax[0].set_ylabel('')
# pass the column as x=; recent seaborn releases do not accept it positionally
sns.countplot(x='Absenteeism time in hours', data=data, ax=ax[1])
ax[1].set_title('Absenteeism time in hours')
plt.show()
## Bar chart of variable Hit target
sns.catplot(x="Hit target", hue="Absenteeism time in hours", kind="count", data=data)
## Bar chart of variable Service time
sns.catplot(x="Service time", hue="Absenteeism time in hours", kind="count", data=data)
### Bar chart of variable Disciplinary failure
sns.catplot(x="Disciplinary failure", hue="Absenteeism time in hours", kind="count", data=data)
## Bar chart of variable Body mass index
sns.catplot(x="Body mass index", hue="Absenteeism time in hours", kind="count", data=data)
### Bar chart of variable Reason for absence
sns.catplot(x="Reason for absence", hue="Absenteeism time in hours", kind="count", data=data)
### Bar chart of variable Distance
sns.catplot(y="Distance from Residence to Work", hue="Absenteeism time in hours", kind="count", data=data)
## Bar chart of variable Age
sns.catplot(y="Age", hue="Absenteeism time in hours", kind="count", data=data)
Splitting the Data into Training and Testing Sets
from sklearn.model_selection import train_test_split
y = data['Absenteeism time in hours']
X = data.drop(['Absenteeism time in hours'], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=123)
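The split above is purely random. When the two classes are unbalanced, one possible variation (an assumption, not something the slides do) is to pass `stratify=` so that both subsets keep roughly the same class shares. A sketch on synthetic data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 3 features, a 70/30 class mix (illustration only).
rng = np.random.default_rng(123)
X_demo = rng.normal(size=(100, 3))
y_demo = np.array(["Kurang dari 6 jam"] * 70 + ["Lebih dari 6 jam"] * 30)

# stratify=y_demo asks for the class proportions to be preserved in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=123, stratify=y_demo
)

share_test = (y_te == "Lebih dari 6 jam").mean()
print(len(y_tr), len(y_te), share_test)
```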
kNN Method
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import (confusion_matrix, accuracy_score, precision_score,
                             recall_score, auc, roc_curve)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
knn.score(X_train, y_train)
knn.score(X_test, y_test)
y_predict = knn.predict(X_test)
y_proba = knn.predict_proba(X_test)
conf_matrix = confusion_matrix(y_test, y_predict)
print(conf_matrix)
# confusion_matrix orders rows and columns by sorted label, which matches knn.classes_
Class = knn.classes_
conf_matrix_df = pd.DataFrame(data=conf_matrix, columns=Class, index=Class)
conf_matrix_df
sns.heatmap(conf_matrix_df, annot=True, cmap="YlGnBu")
plt.show()
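What `KNeighborsClassifier(n_neighbors=5)` does for each test row can be sketched in a few lines of plain Python: find the k closest training points and take a majority vote over their labels. The points below are hypothetical (age, service time) pairs, and `knn_predict` is an illustrative helper, not part of scikit-learn:

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=5):
    """Return the majority label among the k training points nearest to query."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Majority vote over the k nearest neighbours.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (age, service time) pairs, for illustration only.
points = [(28, 6), (30, 8), (33, 10), (45, 18), (48, 20), (50, 24)]
labels = ["Kurang dari 6 jam"] * 3 + ["Lebih dari 6 jam"] * 3

young = knn_predict(points, labels, query=(31, 9), k=3)
senior = knn_predict(points, labels, query=(47, 19), k=3)
print(young, "|", senior)
```

Because the vote depends on raw distances, features measured on larger scales carry more weight in the result.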
Goodness of fit kNN
akurasi = accuracy_score(y_test, y_predict)
# average=None returns one precision/recall value per class
presisi = precision_score(y_test, y_predict, average=None)
recalls = recall_score(y_test, y_predict, average=None)
print(akurasi)
print(presisi)
print(recalls)
# macro averages over the two classes
print(presisi.mean())
print(recalls.mean())
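The three measures can be read directly off a confusion matrix. The following sketch computes them by hand for a hypothetical 2x2 matrix (rows are true classes, columns are predicted classes); the counts are made up and are not the results of this study:

```python
# Hypothetical confusion matrix: rows = true class, columns = predicted class.
conf = [[80, 12],
        [20, 36]]

n_classes = len(conf)
total = sum(sum(row) for row in conf)

# Accuracy: correctly classified cases (the diagonal) over all cases.
accuracy = sum(conf[i][i] for i in range(n_classes)) / total

# Precision per class: diagonal entry over its column total (all predicted as that class).
precision = [conf[i][i] / sum(conf[r][i] for r in range(n_classes)) for i in range(n_classes)]

# Recall per class: diagonal entry over its row total (all truly in that class).
recall = [conf[i][i] / sum(conf[i]) for i in range(n_classes)]

# Macro averages, the counterpart of presisi.mean() and recalls.mean().
macro_precision = sum(precision) / n_classes
macro_recall = sum(recall) / n_classes

print(accuracy, macro_precision, macro_recall)
```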
kNN Method
KNN ScoreTraining = 0.729729729729
KNN ScoreTesting = 0.729729729
Accuracy = 0.6756756756756757
presisi.mean() = 0.6315567726985465
recalls.mean() = 0.6287363430220573
Naive Bayes Method
from sklearn.naive_bayes import GaussianNB
nb = GaussianNB()
nb.fit(X_train, y_train)
nb.score(X_train, y_train)
nb.score(X_test, y_test)
ynb_predict = nb.predict(X_test)
ynb_proba = nb.predict_proba(X_test)
conf_matrixnb = confusion_matrix(y_test, ynb_predict)
print(conf_matrixnb)
# confusion_matrix orders rows and columns by sorted label, which matches nb.classes_
Class = nb.classes_
conf_matrix_df = pd.DataFrame(data=conf_matrixnb, columns=Class, index=Class)
conf_matrix_df
sns.heatmap(conf_matrix_df, annot=True, cmap="YlGnBu")
plt.show()
akurasinb = accuracy_score(y_test, ynb_predict)
# average=None returns one precision/recall value per class
presisinb = precision_score(y_test, ynb_predict, average=None)
recallsnb = recall_score(y_test, ynb_predict, average=None)
print(akurasinb)
print(presisinb.mean())
print(recallsnb.mean())
Naive Bayes Method
Naïve Bayes ScoreTraining = 0.6807432432432
Naïve Bayes ScoreTesting = 0.628378378378
Goodness of Fit
Accuracy = 0.6283783783783784
Precision = 0.5680592991913747
Recall = 0.5624613481756339
Decision Tree Classification Method
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, BaggingClassifier,
                              AdaBoostClassifier, GradientBoostingClassifier)
plt.rcParams['figure.figsize'] = (10, 7)
y = data['Absenteeism time in hours']
X = data.drop(['Absenteeism time in hours'], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=123)
DecisionTree = DecisionTreeClassifier(random_state=123)
DecisionTree.fit(X_train, y_train)
DecisionTree.score(X_train, y_train)
DecisionTree.score(X_test, y_test)
# confusion matrix of the decision tree's own test-set predictions
ydt_predict = DecisionTree.predict(X_test)
conf_matrixDecisionTree = confusion_matrix(y_test, ydt_predict)
Class = DecisionTree.classes_
conf_matrix_df = pd.DataFrame(data=conf_matrixDecisionTree, columns=Class, index=Class)
conf_matrix_df
sns.heatmap(conf_matrix_df, annot=True, cmap="YlGnBu")
plt.show()
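`DecisionTreeClassifier` uses the Gini criterion by default, so each split is chosen to reduce Gini impurity. A minimal sketch of that quantity on a hypothetical node with short placeholder labels ("K" and "L" stand for the two absence categories); `gini` and `weighted_gini` are illustrative helpers, not scikit-learn functions:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class shares."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity after a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical node with 6 cases of class "K" and 4 of class "L".
parent = ["K"] * 6 + ["L"] * 4

# One candidate split of that node into two children.
left = ["K"] * 5 + ["L"] * 1
right = ["K"] * 1 + ["L"] * 3

print(gini(parent), weighted_gini(left, right))
```

A split is worthwhile when the weighted impurity of the children is lower than the impurity of the parent.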
Decision Tree Classification Method
DecisionTree ScoreTraining = 0.972972972973
DecisionTree ScoreTesting = 0.8175756756756
Random Forest Classification Method
RF = RandomForestClassifier(random_state=123)
RF.fit(X_train, y_train)
RF.score(X_train, y_train)
RF.score(X_test, y_test)
# confusion matrix of the random forest's own test-set predictions
yrf_predict = RF.predict(X_test)
conf_matrixRF = confusion_matrix(y_test, yrf_predict)
print(conf_matrixRF)
Class = RF.classes_
conf_matrix_df = pd.DataFrame(data=conf_matrixRF, columns=Class, index=Class)
sns.heatmap(conf_matrix_df, annot=True, cmap="YlGnBu")
plt.show()
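A fitted random forest also exposes which inputs its trees relied on most, through the `feature_importances_` attribute. The sketch below is an optional extra step that the slides do not perform, shown on synthetic data from `make_classification` rather than the absenteeism data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in with 9 features, mirroring the number of predictors X1..X9.
X_demo, y_demo = make_classification(n_samples=200, n_features=9, random_state=123)

forest = RandomForestClassifier(random_state=123).fit(X_demo, y_demo)

# Importances are normalised to sum to 1; rank features from most to least important.
ranked = sorted(enumerate(forest.feature_importances_), key=lambda pair: pair[1], reverse=True)
for index, importance in ranked:
    print(f"feature {index}: {importance:.3f}")
```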
Random Forest Classification Method
RF ScoreTraining = 0.966216216216216
RF ScoreTesting = 0.837837837837837
Gradient Boosting
GB = GradientBoostingClassifier(random_state=123)
GB.fit(X_train, y_train)
GB.score(X_train, y_train)
GB.score(X_test, y_test)
GB ScoreTraining = 0.910472972972973
GB ScoreTesting = 0.8040540540540541
Adaptive Boosting
AB = AdaBoostClassifier(random_state=123)
AB.fit(X_train, y_train)
AB.score(X_train, y_train)
AB.score(X_test, y_test)
AB ScoreTraining = 0.856418918918919
AB ScoreTesting = 0.8040540540540541
Model Tuning Decision Tree
def GridSearch(x, y, model, parameters):
    # 5-fold cross-validated grid search scored on accuracy
    clf = GridSearchCV(model, parameters, scoring='accuracy',
                       n_jobs=-1, cv=5, verbose=1)
    clf.fit(x, y)
    print("Best Score: " + str(clf.best_score_))
    print("Best Params: " + str(clf.best_params_))
    return clf
ListParams = {
    'criterion': ['gini', 'entropy'],
    'splitter': ['best', 'random'],
    # 'auto' is omitted: recent scikit-learn releases reject it, and for classification trees it meant 'sqrt'
    'max_features': ['sqrt', 'log2', None],
    'max_depth': [3, 6, 9],
    'class_weight': ['balanced', None]
}
BestDecisionTree = GridSearch(X_train, y_train,
                              DecisionTreeClassifier(random_state=123), ListParams)
BestDecisionTree.score(X_train, y_train)
BestDecisionTree.score(X_test, y_test)
# refit a tree with the selected parameters
BestDecisionTree = DecisionTreeClassifier(class_weight=None, criterion='gini',
                                          max_depth=3, max_features=None,
                                          random_state=123, splitter='random')
BestDecisionTree.fit(X_train, y_train)
Model Tuning Decision Tree
Best Score = 0.8293918918918919
Best ScoreTraining = 0.8327702702702703
Best ScoreTesting = 0.7905405054406
Random Forest
ListParams = {
    'n_estimators': [50, 75, 100, 200],
    'max_depth': [1, 5, 10, 15, 20, 25, 30],
    'min_samples_leaf': [1, 2, 4, 6, 8, 10],
    'max_features': [0.1, 'sqrt', 'log2', None]
}
BestRF = GridSearch(X_train, y_train,
                    RandomForestClassifier(random_state=123), ListParams)
Gradient Boosting
ListParams = {
    # 'log_loss' is the current scikit-learn name of the loss formerly called 'deviance'
    'loss': ['log_loss', 'exponential'],
    'learning_rate': [0.001, 0.01, 0.1],
    'n_estimators': [50, 75, 100, 200],
    'max_depth': [3, 5, 7],
    'subsample': [0.5, 0.75, 1.0],
    'max_features': [0.1, 'sqrt', 'log2', None]
}
BestGB = GridSearch(X_train, y_train,
                    GradientBoostingClassifier(random_state=123), ListParams)
BestGB.score(X_train, y_train)
BestGB.score(X_test, y_test)
Adaptive Boosting
# BestAB is the tuned AdaBoost model; its parameter grid does not appear on the slide
BestAB.score(X_train, y_train)
BestAB.score(X_test, y_test)
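Grid search trains one model per parameter combination and per cross-validation fold, so the cost grows with the product of the list lengths. A short sketch that counts the work implied by the random forest grid shown above, with `cv=5` taken from the `GridSearch` helper:

```python
from itertools import product

# The random forest grid from the slide.
rf_grid = {
    'n_estimators': [50, 75, 100, 200],
    'max_depth': [1, 5, 10, 15, 20, 25, 30],
    'min_samples_leaf': [1, 2, 4, 6, 8, 10],
    'max_features': [0.1, 'sqrt', 'log2', None],
}

# Every combination of one value per parameter is one candidate model.
candidates = list(product(*rf_grid.values()))
n_candidates = len(candidates)

# Each candidate is fitted once per fold.
n_folds = 5
n_fits = n_candidates * n_folds

print(n_candidates, n_fits)
```

With scikit-learn's default `refit=True`, `GridSearchCV` performs one further fit of the best candidate on the full training set.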
GRADIENT BOOSTING
Best Score = 0.8412162162162162
Best GB ScoreTraining = 0.893581081081081
Best GB ScoreTesting = 0.8040540540540541
ADAPTIVE BOOSTING
Best Score = 0.831081081081081
Best AB ScoreTraining = 0.8445945945945946
Best AB ScoreTesting = 0.8108108108108109
RANDOM FOREST
Best Score = 0.8344594594594594
Best RF ScoreTraining = 0.8665540540540541
Best RF ScoreTesting = 0.8040540540540541
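The stated goal is to pick the method with the highest score. A small sketch that collects the test scores reported on the slides (copied as printed) and selects the maximum; the dictionary keys are labels chosen here for illustration:

```python
# Test-set scores as reported on the slides.
test_scores = {
    "kNN": 0.729729729,
    "Naive Bayes": 0.628378378378,
    "Decision Tree": 0.8175756756756,
    "Random Forest": 0.837837837837837,
    "Adaptive Boosting": 0.8040540540540541,
    "Gradient Boosting": 0.8040540540540541,
    "Decision Tree (tuned)": 0.7905405054406,
    "Random Forest (tuned)": 0.8040540540540541,
    "Adaptive Boosting (tuned)": 0.8108108108108109,
    "Gradient Boosting (tuned)": 0.8040540540540541,
}

# The method with the highest test score.
best_method = max(test_scores, key=test_scores.get)
print(best_method, test_scores[best_method])
```

On these reported numbers, the untuned random forest has the highest test score.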
39

More Related Content

Similar to Dian Vitiana Ningrum ()6211540000020)

Learning Predictive Modeling with TSA and Kaggle
Learning Predictive Modeling with TSA and KaggleLearning Predictive Modeling with TSA and Kaggle
Learning Predictive Modeling with TSA and KaggleYvonne K. Matos
 
Machine Learning: Classification Concepts (Part 1)
Machine Learning: Classification Concepts (Part 1)Machine Learning: Classification Concepts (Part 1)
Machine Learning: Classification Concepts (Part 1)Daniel Chan
 
Scikit learn cheat_sheet_python
Scikit learn cheat_sheet_pythonScikit learn cheat_sheet_python
Scikit learn cheat_sheet_pythonZahid Hasan
 
Scikit-learn Cheatsheet-Python
Scikit-learn Cheatsheet-PythonScikit-learn Cheatsheet-Python
Scikit-learn Cheatsheet-PythonDr. Volkan OBAN
 
Cheat Sheet for Machine Learning in Python: Scikit-learn
Cheat Sheet for Machine Learning in Python: Scikit-learnCheat Sheet for Machine Learning in Python: Scikit-learn
Cheat Sheet for Machine Learning in Python: Scikit-learnKarlijn Willems
 
Bigger Data v Better Math
Bigger Data v Better MathBigger Data v Better Math
Bigger Data v Better MathBrent Schneeman
 
wk5ppt1_Titanic
wk5ppt1_Titanicwk5ppt1_Titanic
wk5ppt1_TitanicAliciaWei1
 
ML with python.pdf
ML with python.pdfML with python.pdf
ML with python.pdfn58648017
 
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...Dataconomy Media
 
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Yao Yao
 
Machine Learning Algorithms
Machine Learning AlgorithmsMachine Learning Algorithms
Machine Learning AlgorithmsHichem Felouat
 
Feature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsFeature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsGabriel Moreira
 
Construire un modèle prédictif avec Tensorflow
Construire un modèle prédictif avec TensorflowConstruire un modèle prédictif avec Tensorflow
Construire un modèle prédictif avec TensorflowEric Bustarret
 
Big Data LDN 2017: From Zero to AI in 30 Minutes
Big Data LDN 2017: From Zero to AI in 30 MinutesBig Data LDN 2017: From Zero to AI in 30 Minutes
Big Data LDN 2017: From Zero to AI in 30 MinutesMatt Stubbs
 
Visualizing the Model Selection Process
Visualizing the Model Selection ProcessVisualizing the Model Selection Process
Visualizing the Model Selection ProcessBenjamin Bengfort
 

Similar to Dian Vitiana Ningrum ()6211540000020) (20)

Learning Predictive Modeling with TSA and Kaggle
Learning Predictive Modeling with TSA and KaggleLearning Predictive Modeling with TSA and Kaggle
Learning Predictive Modeling with TSA and Kaggle
 
Machine Learning: Classification Concepts (Part 1)
Machine Learning: Classification Concepts (Part 1)Machine Learning: Classification Concepts (Part 1)
Machine Learning: Classification Concepts (Part 1)
 
Scikit learn cheat_sheet_python
Scikit learn cheat_sheet_pythonScikit learn cheat_sheet_python
Scikit learn cheat_sheet_python
 
Scikit-learn Cheatsheet-Python
Scikit-learn Cheatsheet-PythonScikit-learn Cheatsheet-Python
Scikit-learn Cheatsheet-Python
 
Cheat Sheet for Machine Learning in Python: Scikit-learn
Cheat Sheet for Machine Learning in Python: Scikit-learnCheat Sheet for Machine Learning in Python: Scikit-learn
Cheat Sheet for Machine Learning in Python: Scikit-learn
 
wk5ppt2_Iris
wk5ppt2_Iriswk5ppt2_Iris
wk5ppt2_Iris
 
Bigger Data v Better Math
Bigger Data v Better MathBigger Data v Better Math
Bigger Data v Better Math
 
wk5ppt1_Titanic
wk5ppt1_Titanicwk5ppt1_Titanic
wk5ppt1_Titanic
 
ML with python.pdf
ML with python.pdfML with python.pdf
ML with python.pdf
 
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
 
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
 
Machine Learning Algorithms
Machine Learning AlgorithmsMachine Learning Algorithms
Machine Learning Algorithms
 
Naïve Bayes.pptx
Naïve Bayes.pptxNaïve Bayes.pptx
Naïve Bayes.pptx
 
MyStataLab Assignment Help
MyStataLab Assignment HelpMyStataLab Assignment Help
MyStataLab Assignment Help
 
Lower back pain Regression models
Lower back pain Regression modelsLower back pain Regression models
Lower back pain Regression models
 
Feature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsFeature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive models
 
Construire un modèle prédictif avec Tensorflow
Construire un modèle prédictif avec TensorflowConstruire un modèle prédictif avec Tensorflow
Construire un modèle prédictif avec Tensorflow
 
Big Data LDN 2017: From Zero to AI in 30 Minutes
Big Data LDN 2017: From Zero to AI in 30 MinutesBig Data LDN 2017: From Zero to AI in 30 Minutes
Big Data LDN 2017: From Zero to AI in 30 Minutes
 
Visualizing the Model Selection Process
Visualizing the Model Selection ProcessVisualizing the Model Selection Process
Visualizing the Model Selection Process
 
Building ML Pipelines
Building ML PipelinesBuilding ML Pipelines
Building ML Pipelines
 

Recently uploaded

microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 

Recently uploaded (20)

microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 

Dian Vitiana Ningrum ()6211540000020)

  • 1. KLASIFIKASI DATA ABSENSI KERJA DENGAN MENGGUNAKAN METODE K-NEAREST NEIGHBOUR, NAÏVE BAYES, DECISION TREE DAN RANDOM FOREST OLEH : DIAN VITIANA NINGRUM (06211540000020)
  • 2. Kesuksesan suatu perusahaan terletak pada ketepatan waktu dan integritas karyawannya. Absensi kerja perlu dimonitor agar pekerja dapat mematuhi peraturan yang ada di perusahaan. Sistem manajemen kehadiran karyawan yang berfokus meningkatkan produktivitas kerja dan pengembangan diri karyawan dengan lebih efektif dan efisien sehingga perusahaan dapat berkompetisis dengan kompetitor dengan baik 2 PENDAHULUAN
  • 3. SUMBER DATA Sumber data yang digunakan adalah data sekunder, karena menggunakan data pada UCI Machine Learning 3 DATA SEKUNDER Data yang digunakan adalah Absenteeism atWork ABSENTEEISM AT WORK
  • 4. VARIABEL PENELITIAN • Y : Absensi karyawan • X1 : Alasan sakit • X2 : Biaya Trasnsportasi • X3 : Jarak hunian ke tempat kerja • X4 : Waktu pelayanan • X5 : Usia • X6 : Beban kerja rata-ratahari • X7 : Hit target • X8 : Kedisplinan • X9 : Indeks massa tubuh 4
  • 5. METODE KLASIFIKASI 5 K-NEAREST NEIGHBOR NAÏVE BAYES DECISIONTREE RANDOM FOREST Kemudian, dari keempat metode tersebut dicari metode dengan Score paling tinggi
  • 6. 6 Import Packages ##import packages import pandas as pd import numpy as np import matplotlib.pyplot as plt import seaborn as sns from sklearn.preprocessing import LabelEncoder data = pd.read_excel (r'E:data_absen.xlsx', sheet_name='Sheet1') Loading Data
  • 7. 7 data['Absenteeism time in hours'].replace([0,1,2, 3, 4, 5, 7, 8,16, 24,32, 40, 48, 56, 64, 80, 104, 112, 120 ],['Kurang dari 6 jam','Kurang dari 6 jam','Kurang dari 6 jam','Kurang dari 6 jam','Kurang dari 6 jam','Kurang dari 6 jam', 'Lebih dari 6 jam','Lebih dari 6 jam','Lebih dari 6 jam','Lebih dari 6 jam','Lebih dari 6 jam','Lebih dari 6 jam','Lebih dari 6 jam','Lebih dari 6 jam','Lebih dari 6 jam','Lebih dari 6 jam','Lebih dari 6 jam','Lebih dari 6 jam','Lebih dari 6 jam'],inplace=True) FUTURE ENGINEERING Variabel Y, yaitu data Absenteeism time in hours, data dibagi menjadi 2 kategori, yaitu kurang dari 6 jam dan lebih dari 6 jam, sehingga, data dikode sebagai berikut
  • 8. 8 ## drop data data = data.drop("ID", axis=1) data = data.drop("Education", axis=1) data = data.drop("Son", axis=1) data = data.drop("Seasons", axis=1) data = data.drop("Social drinker", axis=1) data = data.drop("Social smoker", axis=1) data = data.drop("Pet", axis=1) data = data.drop("Height", axis=1) data = data.drop("Month of absence", axis=1) data = data.drop("Day of the week", axis=1) data = data.drop("Weight", axis=1) Drop data dilakukan untuk menghilangkan data yang sekiranya tidak diperlukan dalam analisis
  • 10. 10 ##Deteksi MissingValue for col in data.columns.values: if data[col].isnull().values.any(): print("Missing values in "+col) data.isnull().any() deteksi missing value digunakan untuk mengetahui apakah terdapat data yang missing atau tidak lengkap. hasil deteksi missing value pada data di samping adalah False, artinya tidak ada data yang missing.
  • 12. 12 ##Correlation f,ax = plt.subplots(figsize=(10, 10)) sns.heatmap(correlation, annot=True, linewidths=.5, fmt= '.1f',ax=ax) plt.title("Correlation between Columns of dataFrame",y=1.08) plt.show() variabel umur dan variabel waktu pelayanan memiliki koreasi yang paling besar, yaitu mendekati 0,7. sedangkan korelasi terbesar kedua setelah variabel waktu pelayanan dan umur adalah varaiabel umur dengan variabel indeks massa tubuh.Yaitu sebesar 0.5
  • 13. 13 ## Scatter Plot g = sns.PairGrid(data, hue="Absenteeism time in hours") g.map_diag(plt.hist) g.map_offdiag(plt.scatter) g.add_legend() plt.show()
  • 14. 14 ### PAIR PLOT sns.pairplot(data=data[['Reason for absence', 'Transportation expense', 'Distance from Residence toWork', 'Service time', 'Age', 'Work load Average/day ', 'Hit target', 'Body mass index', 'Absenteeism time in hours']], hue='Absenteeism time in hours') plt.show()
  • 15. 15 ## Komposisi Variabel Y f,ax=plt.subplots(1,2,figsize=(18,8)) data['Absenteeism time in hours'].value_counts().plot.pie(explode=[0 ,0.1],autopct='%1.1f%%',ax=ax[0],shadow =True) ax[0].set_title('Absenteeism time in hours') ax[0].set_ylabel('') sns.countplot('Absenteeism time in hours',data=data,ax=ax[1]) ax[1].set_title('Absenteeism time in hours') plt.show()
  • 16. 16 ## Diagram batangVariabel HitTarget sns.catplot(x="Hit target",hue="Absenteeism time in hours",kind="count", data=data)
  • 17. 17 ## Diagram BatangVariabel ServiceTime sns.catplot(x="Service time",hue="Absenteeism time in hours",kind="count", data=data)
  • 18. 18 ### Diagram BatangVariabel Disciplinary Failure sns.catplot(x="Disciplinary failure",hue="Absenteeism time in hours",kind="count", data=data)
  • 19. 19 ## Diagram BatangVariabel Body Mass Index sns.catplot(x="Body mass index",hue="Absenteeism time in hours",kind="count", data=data)
  • 20. 20 ### Diagram Batang Variabel Reasons sns.catplot(x="Reason for absence",hue="Absenteeism time in hours",kind="count", data=data)
  • 21. 21 ### Diagram Batang Variabel Distance sns.catplot(y="Distance from Residence to Work",hue="Absenteeism time in hours",kind="count", data=data)
  • 23. Splitting the data into training and testing sets
      from sklearn.model_selection import train_test_split
      y = data['Absenteeism time in hours']
      X = data.drop(['Absenteeism time in hours'], axis=1)
      # 80% training, 20% testing, with a fixed seed for reproducibility
      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=123)
  • 24. kNN method
      from sklearn.neighbors import KNeighborsClassifier
      knn = KNeighborsClassifier(n_neighbors=5)
      knn.fit(X_train, y_train)
      knn.score(X_train, y_train)
      knn.score(X_test, y_test)
      y_predict = knn.predict(X_test)
      y_proba = knn.predict_proba(X_test)

      from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, auc, roc_curve
      conf_matrix = confusion_matrix(y_test, y_predict)
      print(conf_matrix)
      # confusion_matrix orders classes by sorted label, so sort the names to match
      Class = sorted(data['Absenteeism time in hours'].unique())
      conf_matrix_df = pd.DataFrame(data=conf_matrix, columns=Class, index=Class)
      conf_matrix_df
      sns.heatmap(conf_matrix_df, annot=True, cmap="YlGnBu")
      plt.show()
  • 25. Goodness of fit kNN
      akurasi = accuracy_score(y_test, y_predict)
      # average=None returns one score per class; pos_label only applies to average='binary'
      presisi = precision_score(y_test, y_predict, average=None)
      recalls = recall_score(y_test, y_predict, average=None)
      print(akurasi)
      print(presisi)
      print(recalls)
      print(presisi.mean())
      print(recalls.mean())
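To make the link between the confusion matrix and these scores explicit, the sketch below derives accuracy, per-class precision, and per-class recall directly from a 2x2 matrix. The counts are hypothetical and are not taken from the slides; the row/column layout follows scikit-learn's convention (rows are true classes, columns are predicted classes).

```python
import numpy as np

# Hypothetical confusion matrix (assumption): rows = true class, columns = predicted class.
conf = np.array([[40, 10],
                 [5, 45]])

correct = np.diag(conf)                   # correctly classified samples per class
accuracy = correct.sum() / conf.sum()     # share of all samples on the diagonal
precision = correct / conf.sum(axis=0)    # per class: correct / everything predicted as that class
recall = correct / conf.sum(axis=1)       # per class: correct / everything truly in that class

print(accuracy)
print(precision, precision.mean())        # the mean is the macro-averaged precision
print(recall, recall.mean())              # the mean is the macro-averaged recall
```

The per-class arrays correspond to what `precision_score` and `recall_score` return with `average=None`, and their means correspond to `presisi.mean()` and `recalls.mean()` on the slide.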
  • 26. kNN method
      KNN ScoreTraining = 0.729729729729
      KNN ScoreTesting = 0.729729729
      Accuracy = 0.6756756756756757
      presisi.mean() = 0.6315567726985465
      recalls.mean() = 0.6287363430220573
  • 27. Naive Bayes method
      from sklearn.naive_bayes import GaussianNB
      nb = GaussianNB()
      nb.fit(X_train, y_train)
      nb.score(X_train, y_train)
      nb.score(X_test, y_test)
      ynb_predict = nb.predict(X_test)
      ynb_proba = nb.predict_proba(X_test)

      conf_matrixnb = confusion_matrix(y_test, ynb_predict)
      print(conf_matrixnb)
      # confusion_matrix orders classes by sorted label, so sort the names to match
      Class = sorted(data['Absenteeism time in hours'].unique())
      conf_matrix_df = pd.DataFrame(data=conf_matrixnb, columns=Class, index=Class)
      conf_matrix_df
      sns.heatmap(conf_matrix_df, annot=True, cmap="YlGnBu")
      plt.show()

      akurasinb = accuracy_score(y_test, ynb_predict)
      # average=None returns one score per class; pos_label only applies to average='binary'
      presisinb = precision_score(y_test, ynb_predict, average=None)
      recallsnb = recall_score(y_test, ynb_predict, average=None)
      print(akurasinb)
      print(presisinb.mean())
      print(recallsnb.mean())
  • 28. Naive Bayes method
      Naïve Bayes ScoreTraining = 0.6807432432432
      Naïve Bayes ScoreTesting = 0.628378378378

      Goodness of Fit
      Accuracy = 0.6283783783783784
      Precision = 0.5680592991913747
      Recall = 0.5624613481756339
  • 29. Decision Tree classification method
      %pylab inline
      import pandas as pd
      import numpy as np
      import matplotlib.pyplot as plt
      import seaborn as sns
      from sklearn.preprocessing import LabelEncoder
      from sklearn.model_selection import GridSearchCV, train_test_split
      from sklearn.tree import DecisionTreeClassifier
      from sklearn.ensemble import RandomForestClassifier, BaggingClassifier, AdaBoostClassifier, GradientBoostingClassifier
      pylab.rcParams['figure.figsize'] = (10, 7)

      y = data['Absenteeism time in hours']
      X = data.drop(['Absenteeism time in hours'], axis=1)
      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=123)

      DecisionTree = DecisionTreeClassifier(random_state=123)
      DecisionTree.fit(X_train, y_train)
      DecisionTree.score(X_train, y_train)
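  • 30. Decision Tree classification method
      DecisionTree.score(X_test, y_test)
      # use the decision tree's own predictions (the slide reused the Naive Bayes predictions, ynb_predict)
      ydt_predict = DecisionTree.predict(X_test)
      conf_matrixDecisionTree = confusion_matrix(y_test, ydt_predict)
      # confusion_matrix orders classes by sorted label, so sort the names to match
      Class = sorted(data['Absenteeism time in hours'].unique())
      conf_matrix_df = pd.DataFrame(data=conf_matrixDecisionTree, columns=Class, index=Class)
      conf_matrix_df
      sns.heatmap(conf_matrix_df, annot=True, cmap="YlGnBu")
      plt.show()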
  • 31. Decision Tree classification method
      DecisionTree ScoreTraining = 0.972972972973
      DecisionTree ScoreTesting = 0.8175756756756
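  • 32. Random Forest classification method
      RF = RandomForestClassifier(random_state=123)
      RF.fit(X_train, y_train)
      RF.score(X_train, y_train)
      RF.score(X_test, y_test)
      # use the random forest's own predictions (the slide reused the Naive Bayes predictions, ynb_predict)
      yrf_predict = RF.predict(X_test)
      conf_matrixRF = confusion_matrix(y_test, yrf_predict)
      print(conf_matrixRF)
      # confusion_matrix orders classes by sorted label, so sort the names to match
      Class = sorted(data['Absenteeism time in hours'].unique())
      conf_matrix_df = pd.DataFrame(data=conf_matrixRF, columns=Class, index=Class)
      sns.heatmap(conf_matrix_df, annot=True, cmap="YlGnBu")
      plt.show()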
  • 33. Random Forest classification method
      RF ScoreTraining = 0.966216216216216
      RF ScoreTesting = 0.837837837837837
  • 34. Gradient Boosting
      # X_train / X_test are the splits created earlier (the slide wrote them in lowercase)
      GB = GradientBoostingClassifier(random_state=123)
      GB.fit(X_train, y_train)
      GB.score(X_train, y_train)
      GB.score(X_test, y_test)

      Adaptive Boosting
      AB = AdaBoostClassifier(random_state=123)
      AB.fit(X_train, y_train)
      AB.score(X_train, y_train)
      AB.score(X_test, y_test)

      AB ScoreTraining = 0.856418918918919
      AB ScoreTesting = 0.8040540540540541
      GB ScoreTraining = 0.910472972972973
      GB ScoreTesting = 0.8040540540540541
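The fit-and-score calls repeated on the preceding slides can be collected into a single loop. This is a minimal sketch under stated assumptions: the data is a synthetic stand-in generated with `make_classification` (740 rows and 9 features are chosen only to resemble the table's shape), and the dictionary names are illustrative, so the printed scores will not match the slides.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the absenteeism table (assumption, not the real data).
X, y = make_classification(n_samples=740, n_features=9, n_informative=5,
                           random_state=123)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=123)

# Same estimators and seeds as on the slides, all with default hyperparameters.
models = {
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=123),
    "Random Forest": RandomForestClassifier(random_state=123),
    "AdaBoost": AdaBoostClassifier(random_state=123),
    "Gradient Boosting": GradientBoostingClassifier(random_state=123),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # .score() returns mean accuracy for classifiers
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))

# Print the models from highest to lowest test accuracy.
for name, (train_acc, test_acc) in sorted(
        scores.items(), key=lambda kv: kv[1][1], reverse=True):
    print(f"{name:18s} train={train_acc:.3f} test={test_acc:.3f}")
```

Keeping training and testing accuracy side by side makes overfitting visible: a large gap between the two, as reported for the untuned Decision Tree and Random Forest, suggests the model is memorising the training set.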
  • 35. Model tuning: Decision Tree
      def GridSearch(x, y, model, parameters):
          # exhaustive search over the grid with 5-fold cross-validated accuracy
          clf = GridSearchCV(model, parameters, scoring='accuracy', n_jobs=-1, cv=5, verbose=1)
          clf.fit(x, y)
          print("Best Score: " + str(clf.best_score_))
          print("Best Params: " + str(clf.best_params_))
          return clf

      ListParams = {
          'criterion': ['gini', 'entropy'],
          'splitter': ['best', 'random'],
          # for classification trees 'auto' means the same as 'sqrt'; scikit-learn 1.3 and later no longer accept it
          'max_features': ['auto', 'sqrt', 'log2', None],
          'max_depth': [3, 6, 9],
          'class_weight': ['balanced', None]
      }
      # X_train / X_test are the splits created earlier (the slide wrote them in lowercase)
      BestDecisionTree = GridSearch(X_train, y_train, DecisionTreeClassifier(random_state=123), ListParams)
      BestDecisionTree.score(X_train, y_train)
      BestDecisionTree.score(X_test, y_test)

      BestDecisionTree = DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=3,
                                                max_features=None, random_state=123, splitter='random')
      BestDecisionTree.fit(X_train, y_train)
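With its default `refit=True`, `GridSearchCV` refits the best parameter combination on the whole training set, so the tuned model can be taken from `best_estimator_` instead of being retyped by hand. A minimal sketch on synthetic data follows; the dataset, the reduced grid, and the names `search` and `best_tree` are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption, not the absenteeism table).
X, y = make_classification(n_samples=400, n_features=9, n_informative=5,
                           random_state=123)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=123)

# A reduced grid to keep the example fast.
param_grid = {"criterion": ["gini", "entropy"], "max_depth": [3, 6, 9]}

search = GridSearchCV(DecisionTreeClassifier(random_state=123), param_grid,
                      scoring="accuracy", cv=5)
search.fit(X_train, y_train)

# The winning configuration, already refit on all of X_train.
best_tree = search.best_estimator_

print(search.best_params_)
print(search.best_score_)                  # mean cross-validated accuracy of the winner
print(best_tree.score(X_test, y_test))     # accuracy on the held-out test set
```

Note that `best_score_` is a cross-validation average computed on the training folds, so it is expected to differ from the held-out test accuracy, which is why the slides report "Best Score" separately from the training and testing scores.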
  • 36. DecisionTree ModelTuning
      Best Score = 0.8293918918918919
      Best ScoreTraining = 0.8327702702702703
      Best AB ScoreTesting = 0.7905405054406
  • 37. Random Forest
      # X_train / X_test are the splits created earlier (the slide wrote them in lowercase)
      ListParams = {
          'n_estimators': [50, 75, 100, 200],
          'max_depth': [1, 5, 10, 15, 20, 25, 30],
          'min_samples_leaf': [1, 2, 4, 6, 8, 10],
          'max_features': [0.1, 'sqrt', 'log2', None]
      }
      BestRF = GridSearch(X_train, y_train, RandomForestClassifier(random_state=123), ListParams)

      Gradient Boosting
      ListParams = {
          # 'deviance' was renamed 'log_loss' in scikit-learn 1.1
          'loss': ['deviance', 'exponential'],
          'learning_rate': [0.001, 0.01, 0.1],
          'n_estimators': [50, 75, 100, 200],
          'max_depth': [3, 5, 7],
          'subsample': [0.5, 0.75, 1],
          'max_features': [0.1, 'sqrt', 'log2', None]
      }
      BestGB = GridSearch(X_train, y_train, GradientBoostingClassifier(random_state=123), ListParams)
      BestGB.score(X_train, y_train)
      BestGB.score(X_test, y_test)

      Adaptive Boosting
      # BestAB comes from a GridSearch call whose parameter grid does not appear on this slide
      BestAB.score(X_train, y_train)
      BestAB.score(X_test, y_test)
  • 38. GRADIENT BOOSTING
      Best Score = 0.8412162162162162
      Best GB ScoreTraining = 0.893581081081081
      Best GB ScoreTesting = 0.8040540540540541

      ADAPTIVE BOOSTING
      Best Score = 0.831081081081081
      Best AB ScoreTraining = 0.8445945945945946
      Best AB ScoreTesting = 0.8108108108108109

      RANDOM FOREST
      Best Score = 0.8344594594594594
      Best RF ScoreTraining = 0.8665540540540541
      Best RF ScoreTesting = 0.8040540540540541
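Since the stated goal is to pick the method with the highest score, the testing scores reported on the slides can be ranked directly. The sketch below copies those values as reported; the labels "(default)" and "(tuned)" are added here for illustration, and attributing the 0.7905 value on the Decision Tree tuning slide to the tuned Decision Tree is an assumption, because the slide labels it "Best AB ScoreTesting".

```python
# Testing scores as reported on the slides (values copied verbatim).
test_scores = {
    "kNN (default)": 0.729729729,
    "Naive Bayes (default)": 0.628378378378,
    "Decision Tree (default)": 0.8175756756756,
    "Random Forest (default)": 0.837837837837837,
    "AdaBoost (default)": 0.8040540540540541,
    "Gradient Boosting (default)": 0.8040540540540541,
    # Assumption: this value belongs to the tuned Decision Tree (see the note above).
    "Decision Tree (tuned)": 0.7905405054406,
    "Random Forest (tuned)": 0.8040540540540541,
    "AdaBoost (tuned)": 0.8108108108108109,
    "Gradient Boosting (tuned)": 0.8040540540540541,
}

# Sort from highest to lowest testing score.
ranking = sorted(test_scores.items(), key=lambda kv: kv[1], reverse=True)
best_name, best_score = ranking[0]

for name, score in ranking:
    print(f"{name:28s} {score:.4f}")
print("Highest testing score:", best_name)
```

On these reported numbers the untuned Random Forest has the highest testing score, followed by the untuned Decision Tree and the tuned AdaBoost model.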