Classification of United States Adult Income Data
Shindi Shella May Wara | 06211540000101
Data Mining A
Dra. Kartika Fithriasari, M.Si | Novri Suhermi, S.Si., M.Sc.
Department of Statistics
Institut Teknologi Sepuluh Nopember Surabaya
OUTLINE
01 Data Description: explanation of the data used
02 Data Preprocessing: the data preprocessing steps
03 Feature: feature selection and feature engineering
04 Data Exploration: presenting the data as charts so that they are easy to understand
05 Analysis and Discussion: hold-out method > hyperparameter tuning > cross validation
06 Conclusion: comparing the performance of the classification models
The Power of PowerPoint | thepopp.com 2
DATA DESCRIPTION
Adult Income Data
The data come from the 1994 United States census.
The data were extracted by Barry Becker.
The data are used to determine whether a person earns more than 50K dollars a year, based on the observed variables.
Adult Income Data
Variable | Description | Categories
Y | Income class | <=50K, >50K
Age | Age | continuous
Workclass | Working class | Private, Federal-gov, Local-gov, Self-emp-not-inc, Self-emp-inc, State-gov
Fnlwgt | Final weight | continuous
Education | Highest education level | Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool
Education-Num | Length of education | continuous
Marital-Status | Marital status | Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse
Occupation | Occupation | Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces
Relationship | Status within the family | Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried
Capital-Loss | Capital loss | continuous
Hours-per-Week | Working hours per week | continuous
Native-Country | Nationality | United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic
DATA
PREPROCESSING
Handling Missing Values
A missing value is a value that is absent from a variable for some reason.
Converting Data to Dummy Variables
A dummy is a numeric symbol that represents a categorical variable.
Checking for Outliers
An outlier is an observation in a variable that takes an extreme value.
Handling Missing Values
Workclass (categorical data): missing values replaced with "Never Worked"
Occupation (categorical data): missing values replaced with "None"
Native-Country (categorical data): missing values replaced with "None"
Variable | Number of missing values
Age | 0
Workclass | 1836
Fnlwgt | 0
Education | 0
Education-Num | 0
Marital-Status | 0
Occupation | 1843
Relationship | 0
Race | 0
Sex | 0
Capital-Gain | 0
Capital-Loss | 0
Hours-per-Week | 0
Native-Country | 583
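The replacement step above could be sketched with pandas as follows. This is only an illustration under assumptions: the three-row DataFrame and its column names are hypothetical stand-ins for the real data, and only the replacement labels ("Never Worked", "None") come from the slide.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-sample; the column names are assumptions for illustration.
df = pd.DataFrame({
    "workclass": ["Private", np.nan, "Local-gov"],
    "occupation": ["Sales", np.nan, np.nan],
    "native_country": ["United-States", "Cuba", np.nan],
})

# Count missing values per variable, as in the table above.
missing_counts = df.isna().sum()
print(missing_counts)

# Replacement labels as listed on the slide.
fill_values = {
    "workclass": "Never Worked",
    "occupation": "None",
    "native_country": "None",
}
df_filled = df.fillna(value=fill_values)
print(df_filled)
```

Filling with an explicit label keeps every row in the data, at the cost of treating "missing" as a category of its own.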
Converting Data to Dummy Variables
Continuous variables
• Age
• Final Weight
• Capital-Gain
• Capital-Loss
• Hours-per-Week
Categorical variables
• Workclass (6 categories)
• Education (16 categories)
• Education-Num (16 categories)
• Marital-Status (7 categories)
• Occupation (15 categories)
• Relationship (6 categories)
• Race (5 categories)
• Sex (2 categories)
• Native-Country (41 categories)
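A minimal sketch of dummy encoding, assuming pandas is used. The small DataFrame and its column names are hypothetical, and `pd.get_dummies` is one common way to do this, not necessarily the one used for the slides.

```python
import pandas as pd

# Hypothetical mini-sample with one continuous and two categorical columns.
df = pd.DataFrame({
    "age": [39, 50, 38],
    "sex": ["Male", "Female", "Male"],
    "race": ["White", "Black", "White"],
})

# Each categorical column becomes one indicator column per category;
# continuous columns are passed through unchanged.
categorical_cols = ["sex", "race"]
df_dummy = pd.get_dummies(df, columns=categorical_cols)
print(df_dummy.columns.tolist())
```

A variable with k categories yields k indicator columns here, so the 41 Native-Country categories alone would add 41 columns.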
[Figure panels: BEFORE and AFTER]
Checking for Outliers
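The slides do not state which rule was used to flag extreme values. As one possible approach, the sketch below applies the common 1.5 × IQR rule to a continuous variable. The function name `iqr_outlier_mask` and the sample values are hypothetical.

```python
import pandas as pd

def iqr_outlier_mask(series, k=1.5):
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# Hypothetical hours-per-week values with one extreme observation.
hours = pd.Series([40, 38, 45, 40, 42, 99, 40, 35])
mask = iqr_outlier_mask(hours)
print(hours[mask].tolist())
```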
FEATURE
FEATURE SELECTION
• Feature selection is a method for choosing the variables that most influence the classification.
• The method used is the Decision Tree Classifier.
• Seven variables were selected as the most influential.
• These variables are: Relationship, Final Weight, Age, Education-Num, Capital-Gain, Hours-per-Week, Occupation.
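One way to rank variables with a decision tree is through its impurity-based feature importances. The sketch below assumes scikit-learn and uses synthetic data with hypothetical feature names. The cutoff of two features is arbitrary and does not reproduce the seven variables reported above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: only f0 and f2 actually drive the label (assumption for the demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)
feature_names = ["f0", "f1", "f2", "f3", "f4"]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Sort features from most to least important.
ranking = sorted(
    zip(feature_names, tree.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
top_k = [name for name, _ in ranking[:2]]
print(ranking)
print(top_k)
```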
FEATURE ENGINEERING
• Feature engineering means digging deeper into the data in order to derive important new variables.
• After close inspection, the adult income data already has well-defined variables, so feature engineering could not be applied.
DATA EXPLORATION
[Figure: Variable Y]
[Figure: Categorical variables]
[Figure: Continuous variables]
[Figure: Correlation]
ANALYSIS AND DISCUSSION
Analysis Steps
1. HOLD OUT METHODS: split the data into training data and testing data.
2. HYPERPARAMETER TUNING: find the best parameters to optimize the classification results.
3. CROSS VALIDATION: evaluate each classification method by splitting the training data into several folds (parts).
HOLD OUT METHODS
RANDOMIZED HOLD OUT METHOD
Training data: 80% | Testing data: 20%
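A randomized 80/20 hold-out split can be sketched with scikit-learn's `train_test_split`. The arrays below are hypothetical placeholders for the preprocessed predictors and labels, and the `random_state` value is an arbitrary choice for reproducibility.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical placeholder data: 100 rows, 2 predictors, binary labels.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 75 + [1] * 25)

# Shuffle the rows, then keep 80% for training and 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42
)
print(len(X_train), len(X_test))
```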
Classification Methods Used
Decision Tree: classifies data using a branching tree structure.
Random Forest: classifies data using a forest structure made up of several branching trees.
Naïve Bayes: classifies data by computing the probability of each class assignment.
Logistic Regression: classifies data based on the relationship between the predictors and the response variable.
K-Nearest Neighbour: classifies data according to the classes of its nearest neighbours.
Hyperparameter Tuning
Naïve Bayes: None
Decision Tree: class_weight = None, max_depth = None, max_leaf_nodes = 50, min_samples_leaf = 21, min_samples_split = 2
Random Forest: max_depth = 10, max_features = None, min_samples_leaf = 30, n_estimators = 10
Logistic Regression: C = 0.8, class_weight = None, penalty = l1
K-Nearest Neighbour: n_neighbors = 50, weights = distance
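The tuned settings above could be expressed with scikit-learn estimators roughly as follows. Several points are assumptions: the assignment of each parameter group to a model is read off the parameter names, `GaussianNB` is assumed for Naïve Bayes, `solver="liblinear"` is added because an L1 penalty needs a compatible solver, and the grid search at the end runs on synthetic data with a hypothetical grid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Estimators configured with the values listed on the slide.
models = {
    "naive_bayes": GaussianNB(),  # no tuned hyperparameters reported
    "decision_tree": DecisionTreeClassifier(
        class_weight=None, max_depth=None, max_leaf_nodes=50,
        min_samples_leaf=21, min_samples_split=2,
    ),
    "random_forest": RandomForestClassifier(
        max_depth=10, max_features=None, min_samples_leaf=30, n_estimators=10,
    ),
    "logistic_regression": LogisticRegression(
        C=0.8, class_weight=None, penalty="l1", solver="liblinear",
    ),
    "knn": KNeighborsClassifier(n_neighbors=50, weights="distance"),
}

# Illustrative grid search for K-Nearest Neighbour on synthetic data.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
param_grid = {"n_neighbors": [5, 25, 50], "weights": ["uniform", "distance"]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
best_params = search.best_params_
print(best_params)
```

The grid search tries every combination in `param_grid` and keeps the one with the best mean cross-validated accuracy.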
Cross Validation
Methods: Naïve Bayes, Decision Tree, Random Forest, Logistic Regression, K-Nearest Neighbour
Accuracy is the proportion of data that the system classifies correctly.
Precision is the number of correctly classified positive data (TP) divided by the total number of data classified as positive (TP + FP).
Recall is the percentage of positive-category data that the system classifies correctly, TP / (TP + FN).
Accuracy : 0.796 | Precision : 0.798 | Recall : 0.980
Accuracy : 0.794 | Precision : 0.796 | Recall : 0.978
Accuracy : 0.832 | Precision : 0.877 | Recall : 0.937
Accuracy : 0.86 | Precision : 0.871 | Recall : 0.956
Accuracy : 0.853 | Precision : 0.878 | Recall : 0.936
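A cross-validated evaluation of one classifier on the training data might look like the sketch below. It assumes scikit-learn, uses synthetic data in place of the training set, and picks 5 folds arbitrarily, since the number of folds is not stated in the slides.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the training data.
X_train, y_train = make_classification(n_samples=400, n_features=8, random_state=0)

clf = DecisionTreeClassifier(max_leaf_nodes=50, min_samples_leaf=21, random_state=0)

# Each fold is held out once while the model is fitted on the remaining folds.
scores = cross_validate(
    clf, X_train, y_train, cv=5,
    scoring=["accuracy", "precision", "recall"],
)
mean_scores = {
    metric: scores[f"test_{metric}"].mean()
    for metric in ("accuracy", "precision", "recall")
}
print(mean_scores)
```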
CONCLUSION
Testing Data
Methods: Naïve Bayes, Decision Tree, Random Forest, Logistic Regression, K-Nearest Neighbour
Accuracy : 1 | Precision : 1 | Recall : 1
Accuracy : 0.794 | Precision : 0.796 | Recall : 0.978
Accuracy : 0.85 | Precision : 0.870 | Recall : 0.957
Accuracy : 0.859 | Precision : 0.876 | Recall : 0.946
Accuracy : 0.794 | Precision : 0.793 | Recall : 0.983
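The three metrics reported above follow directly from confusion-matrix counts. The sketch below shows the arithmetic. The function name and the counts are hypothetical and are not taken from the results in this deck.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all data classified correctly
    precision = tp / (tp + fp)                  # share of predicted positives that are correct
    recall = tp / (tp + fn)                     # share of actual positives that are found
    return accuracy, precision, recall

# Hypothetical counts for illustration only.
accuracy, precision, recall = classification_metrics(tp=90, fp=20, fn=10, tn=80)
print(round(accuracy, 3), round(precision, 3), round(recall, 3))
```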
More Information?
https://nbviewer.jupyter.org/gist/shindishella/1960a7297c55232094fa8a9e81bd7eb8
Classification of United States Adult Income Data
The Power of PowerPoint – thepopp.com
Font: Ubuntu font family
Icons: Elegant Icon Font
Shindi Shella May Wara | 06211540000101
Thank You!
