WEKA TUTORIAL
Presenter
Sajib Sen
Outline:
• Data Preprocessing
• Data Dimensionality
• Classification
• Model Evaluation
Data Preprocessing
1. Creating a Dataset
2. Creating Training, Validation and Test Sets
3. Generating Non-stratified Folds
4. Generating Stratified Folds
5. Discretization
6. Numeric Transform
7. Outliers and Extreme Values
4/30/18 3
Decision Tree Example
Dataset Example
• How to create an ARFF file
• How to open a file
• How to see the number of instances and attributes
• How to see the features and their information
Weather_data.arff
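ARFF is a plain-text format. It has an `@relation` line, then one `@attribute` line per feature (nominal values in braces, or `numeric`), then `@data` followed by one comma-separated instance per line. The sketch below is plain Python with no WEKA dependency. It writes a small ARFF text and then counts its attributes and instances, which are the numbers the Explorer's Preprocess panel shows once a file is opened. The helper names `to_arff` and `count_arff` are hypothetical. The rows are illustrative values in the style of WEKA's nominal weather data and are not necessarily the contents of Weather_data.arff.

```python
def to_arff(relation, attributes, rows):
    """Build ARFF text. `attributes` is a list of (name, type) pairs where
    type is either a list of nominal values or the string "numeric"."""
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, list):
            kind = "{" + ",".join(kind) + "}"   # nominal: {v1,v2,...}
        lines.append(f"@attribute {name} {kind}")
    lines += ["", "@data"]
    lines += [",".join(str(v) for v in row) for row in rows]
    return "\n".join(lines) + "\n"


def count_arff(text):
    """Return (number of attributes, number of instances) in ARFF text."""
    n_attributes = n_instances = 0
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):    # skip blank lines and comments
            continue
        if in_data:
            n_instances += 1                     # every data line is one instance
            continue
        keyword = line.split()[0].lower()        # ARFF keywords are case-insensitive
        if keyword == "@attribute":
            n_attributes += 1
        elif keyword == "@data":
            in_data = True
    return n_attributes, n_instances


attributes = [
    ("outlook", ["sunny", "overcast", "rainy"]),
    ("temperature", ["hot", "mild", "cool"]),
    ("humidity", ["high", "normal"]),
    ("windy", ["TRUE", "FALSE"]),
    ("play", ["yes", "no"]),
]
rows = [
    ["sunny", "hot", "high", "FALSE", "no"],
    ["sunny", "hot", "high", "TRUE", "no"],
    ["overcast", "hot", "high", "FALSE", "yes"],
    ["rainy", "mild", "high", "FALSE", "yes"],
]
text = to_arff("weather", attributes, rows)
print(count_arff(text))  # → (5, 4)
```

Saving `text` to a file with the `.arff` extension produces a file that can be loaded with the Explorer's "Open file..." button.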
Creating Training, Validation and Test Sets
Training set: 60%
Validation set: 20%
Test set: 20%
weather.arff
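A common way to do this in WEKA is to shuffle the data with the unsupervised `Randomize` instance filter and then cut it with the `RemovePercentage` instance filter. That filter is applied once normally and once with its invert-selection option to get the complementary part. The plain-Python sketch below shows the underlying idea: a seeded shuffle followed by a 60/20/20 cut. The function name `split_60_20_20` is hypothetical. The rounding rule used here is an assumption and may not match WEKA's exactly.

```python
import random


def split_60_20_20(instances, seed=1):
    """Shuffle a copy of `instances` and cut it into 60% training,
    20% validation and 20% test; the test set takes the remainder."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)       # fixed seed -> repeatable split
    n = len(shuffled)
    n_train = round(n * 0.6)
    n_val = round(n * 0.2)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]           # whatever is left after rounding
    return train, val, test


train, val, test = split_60_20_20(range(14))   # 14 instances, like weather.arff
print(len(train), len(val), len(test))          # → 8 3 3
```

With only 14 instances, the rounding visibly shifts the proportions away from an exact 60/20/20.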
Generating Non-stratified Folds
When using k-fold cross-validation, can I get each of the folds
from WEKA?
Non-stratified folds split the instances into k parts without regard to the
class, so a fold may miss a class or have a different class ratio.
supermarket.arff
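WEKA's unsupervised instance filter `RemoveFolds` outputs one chosen fold, or all other folds when its selection is inverted, without looking at the class. The plain-Python sketch below shows that kind of split. It shuffles the instance indices with a seed and cuts them into k folds whose sizes differ by at most one. The function name `non_stratified_folds` is hypothetical, and WEKA's own ordering of instances will differ.

```python
import random


def non_stratified_folds(n, k, seed=1):
    """Split instance indices 0..n-1 into k folds, ignoring the class.
    Fold sizes differ by at most one; the first n % k folds get the extra item."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(idx[start:start + size])
        start += size
    return folds


for f, fold in enumerate(non_stratified_folds(10, 3)):
    print(f, len(fold))   # fold sizes 4, 3, 3
```

In k-fold cross-validation, each fold is used once as the test set, and the other k-1 folds together form the training set.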
Generating Stratified Folds
When using k-fold cross-validation, can I get each of the folds
from WEKA?
Stratified folds mean that every fold contains every class of the dataset
while maintaining the class ratio.
supermarket.arff
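In WEKA, the supervised instance filter `StratifiedRemoveFolds` extracts a single stratified fold. The plain-Python sketch below shows one simple way to stratify. It groups the instances class by class and then deals them out round-robin across the k folds, so each fold roughly keeps the overall class ratio. The function name `stratified_folds` is hypothetical, and WEKA's internal procedure may differ in detail.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=1):
    """Assign instance indices to k folds so that each fold keeps roughly
    the class proportions of the whole dataset."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    ordered = []
    for label in sorted(by_class):            # line instances up class by class
        members = by_class[label]
        rng.shuffle(members)                  # random order within each class
        ordered.extend(members)
    folds = [[] for _ in range(k)]
    for position, i in enumerate(ordered):    # deal them out round-robin
        folds[position % k].append(i)
    return folds


labels = ["yes"] * 6 + ["no"] * 3             # class ratio 2:1
for fold in stratified_folds(labels, 3):
    print(sorted(labels[i] for i in fold))    # each fold: ['no', 'yes', 'yes']
```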
Discretization
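Naïve Bayes often works better when numeric values are discretized: it can then estimate a probability for each interval instead of assuming a normal distribution for each numeric attribute.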
Diabetes.arff
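By default, WEKA's unsupervised attribute filter `Discretize` splits each numeric attribute into equal-width intervals (10 bins unless configured otherwise). The supervised `Discretize` filter instead uses the class to choose cut points. The plain-Python sketch below shows equal-width binning for a single attribute. The function name `equal_width_bins` is hypothetical.

```python
def equal_width_bins(values, bins=10):
    """Map each numeric value to an interval index 0..bins-1 using
    equal-width intervals between the minimum and the maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)              # constant attribute: one interval
    width = (hi - lo) / bins
    # The maximum would fall into interval `bins`, so clamp it into the last one.
    return [min(int((v - lo) / width), bins - 1) for v in values]


print(equal_width_bins(list(range(11)), bins=5))
# → [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 4]
```

Equal-width bins are easy to compute, but a few extreme values can stretch the range and leave most values in a few bins. This is one reason to look for outliers before discretizing.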
Numeric Transform
Use this when your algorithm works well with integer values but not with real numbers.
Diabetes.arff
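WEKA's unsupervised attribute filter `NumericTransform` transforms the selected numeric attributes by calling a static method of a Java class, such as one from `java.lang.Math`. The class and method are chosen through the filter's options. The plain-Python sketch below shows the same idea with `math.floor` as the assumed transformation. The function name `numeric_transform` is hypothetical, and the rows are illustrative values.

```python
import math


def numeric_transform(rows, columns, fn=math.floor):
    """Return new rows with `fn` applied to the given numeric columns,
    leaving every other column unchanged."""
    return [
        [fn(v) if c in columns else v for c, v in enumerate(row)]
        for row in rows
    ]


rows = [[33.6, 0.627, "tested_positive"], [26.6, 0.351, "tested_negative"]]
print(numeric_transform(rows, columns={0}))
# → [[33, 0.627, 'tested_positive'], [26, 0.351, 'tested_negative']]
```

Passing `fn=round` instead of `math.floor` rounds to the nearest integer rather than truncating downward.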
Outliers and Extreme Values
Is it possible to find outliers and extreme values that are hidden in a
dataset?
InterquartileRange filter:
Find outliers and extreme values.
Remove them.
NSL-KDD dataset (ARFF)
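WEKA's unsupervised attribute filter `InterquartileRange` adds two nominal attributes, `Outlier` and `ExtremeValue`, that flag each instance. The flagged instances can then be dropped with the `RemoveWithValues` instance filter. The filter's defaults are an outlier factor of 3 and an extreme-value factor of twice that (6). The plain-Python sketch below applies the same fences to one attribute. Values beyond Q3 + 6·IQR or below Q1 − 6·IQR count as extreme. Values beyond the 3·IQR fences that are not extreme count as outliers. The function names are hypothetical. Quartiles here come from `statistics.quantiles` with the "inclusive" method, which may not match WEKA's quartile computation, so borderline values could be classified differently.

```python
import statistics


def iqr_flags(values, outlier_factor=3.0, extreme_factor=6.0):
    """Label each value 'extreme', 'outlier' or 'normal' using
    interquartile-range (IQR) fences around Q1 and Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    flags = []
    for v in values:
        if v > q3 + extreme_factor * iqr or v < q1 - extreme_factor * iqr:
            flags.append("extreme")
        elif v > q3 + outlier_factor * iqr or v < q1 - outlier_factor * iqr:
            flags.append("outlier")
        else:
            flags.append("normal")
    return flags


def remove_flagged(values, flags):
    """Keep only the values flagged 'normal'."""
    return [v for v, f in zip(values, flags) if f == "normal"]


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30, 100]   # Q1 = 3.5, Q3 = 8.5, IQR = 5
flags = iqr_flags(values)
print(flags[-2:])                      # → ['outlier', 'extreme']
print(remove_flagged(values, flags))   # → [1, 2, 3, 4, 5, 6, 7, 8, 9]
```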