WEKA TUTORIAL
Presenter
Sajib Sen
Outline:
• Data Preprocessing
• Data Dimensionality
• Classification
• Model Evaluation
Data Preprocessing
1. Creating dataset
2. Creating Training, Validation and Test Sets
3. Generating Non-stratified Folds
4. Generating Stratified Folds
5. Discretization
6. Numeric Transform
7. Outliers and Extreme Values
4/30/18 3
Decision Tree Example
4/30/18 4
Dataset Example
• How to create an ARFF file
• How to open a file
• How to see the number of instances and attributes
• How to see the features and their information
Weather_data.arff
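
As a rough illustration of what an ARFF file contains, the sketch below builds one as text and counts its attributes and instances. It is plain Python and does not call WEKA. The helper name `to_arff`, the relation name, and the four sample rows are assumptions made for this example.

```python
def to_arff(relation, attributes, rows):
    """Build ARFF text from a relation name, attribute specs, and data rows.

    attributes: list of (name, kind) pairs, where kind is the string
    "numeric" or a list of allowed nominal values.
    """
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, list):
            # Nominal attribute: allowed values listed inside braces.
            lines.append(f"@attribute {name} {{{', '.join(kind)}}}")
        else:
            lines.append(f"@attribute {name} {kind}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Illustrative weather-style data (assumed for this sketch).
attributes = [
    ("outlook", ["sunny", "overcast", "rainy"]),
    ("temperature", "numeric"),
    ("humidity", "numeric"),
    ("windy", ["TRUE", "FALSE"]),
    ("play", ["yes", "no"]),
]
rows = [
    ("sunny", 85, 85, "FALSE", "no"),
    ("sunny", 80, 90, "TRUE", "no"),
    ("overcast", 83, 86, "FALSE", "yes"),
    ("rainy", 70, 96, "FALSE", "yes"),
]

arff_text = to_arff("weather", attributes, rows)

# Count attributes and instances directly from the generated text.
num_attributes = sum(
    1 for line in arff_text.splitlines() if line.startswith("@attribute")
)
num_instances = len(arff_text.split("@data\n", 1)[1].splitlines())
print(arff_text)
print(num_attributes, "attributes,", num_instances, "instances")
```

Saving `arff_text` to a file with the `.arff` extension should give something the WEKA Explorer can open, where the same counts appear in the Preprocess panel.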
Creating Training, Validation and Test Sets
Training set: 60%
Validation set: 20%
Test set: 20%
weather.arff
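
The 60/20/20 split can be sketched in a few lines. This is a plain-Python illustration of the idea, not WEKA's own procedure. The function name `train_val_test_split`, the seed, and the 100 placeholder instances are assumptions.

```python
import random


def train_val_test_split(instances, train=0.6, val=0.2, seed=42):
    """Shuffle the instances and cut them into train/validation/test parts.

    Whatever is left after the train and validation parts becomes the test set.
    """
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    train_part = shuffled[:n_train]
    val_part = shuffled[n_train:n_train + n_val]
    test_part = shuffled[n_train + n_val:]
    return train_part, val_part, test_part


# 100 placeholder instances (integers standing in for rows of a dataset).
data = list(range(100))
train_set, val_set, test_set = train_val_test_split(data)
print(len(train_set), len(val_set), len(test_set))
```

Shuffling before cutting matters when the file is sorted by class; otherwise one part may end up with only one class.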
Generating Non-stratified Folds
When I am using k-fold cross validation, can I get each of the folds
from WEKA?
Stratified folds mean that every fold contains every class of the dataset
while maintaining the class ratio; non-stratified folds do not guarantee this.
supermarket.arff
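
To make the idea of "getting each fold" concrete, here is a minimal plain-Python sketch of non-stratified k-fold splitting. It does not use WEKA. The name `make_folds`, the seed, and the 23 placeholder instances are assumptions.

```python
import random


def make_folds(instances, k, seed=1):
    """Shuffle the instances and deal them into k folds, ignoring class labels."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    # Fold i takes every k-th instance starting at position i.
    return [shuffled[i::k] for i in range(k)]


data = list(range(23))  # placeholder instances
folds = make_folds(data, k=5)
fold_sizes = [len(fold) for fold in folds]

# For round 0 of cross-validation, fold 0 is the test set and the rest train.
test_fold = folds[0]
train_folds = [inst for fold in folds[1:] for inst in fold]
print(fold_sizes, len(train_folds), len(test_fold))
```

Because class labels are ignored, a small or rare class can be missing from some folds entirely.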
Generating Stratified Folds
When I am using k-fold cross validation, can I get each of the folds
from WEKA?
Stratified folds mean that every fold contains every class of the dataset
while maintaining the class ratio.
supermarket.arff
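
A minimal sketch of stratification, assuming each instance carries its class label: group the instances by class, then deal each class across the folds in turn. This is plain Python rather than WEKA. The name `stratified_folds` and the 30/15 "yes"/"no" toy data are assumptions.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(instances, label_of, k, seed=0):
    """Split instances into k folds that keep roughly the overall class ratio."""
    by_class = defaultdict(list)
    for inst in instances:
        by_class[label_of(inst)].append(inst)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    next_fold = 0  # keeps dealing where the previous class stopped
    for members in by_class.values():
        rng.shuffle(members)
        for inst in members:
            folds[next_fold % k].append(inst)
            next_fold += 1
    return folds


# Toy dataset: (id, class) pairs with a 2:1 class ratio.
data = [(i, "yes") for i in range(30)] + [(i, "no") for i in range(30, 45)]
folds = stratified_folds(data, label_of=lambda inst: inst[1], k=3)
class_counts = [Counter(label for _, label in fold) for fold in folds]
print(class_counts)
```

Each of the three folds ends up with 10 "yes" and 5 "no" instances, the same 2:1 ratio as the full dataset.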
Discretization
Naïve Bayes often works well when numeric values are discretized
Diabetes.arff
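
One simple form of discretization is equal-width binning, sketched below in plain Python. Whether this matches the exact settings used in the demo is an assumption, and the function name `equal_width_bins` and the sample values are made up for illustration.

```python
def equal_width_bins(values, bins):
    """Map each numeric value to a bin index 0..bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    result = []
    for v in values:
        if width == 0:
            # All values identical: everything falls in the first bin.
            result.append(0)
        else:
            # The maximum value would land in bin `bins`, so clamp it.
            result.append(min(int((v - lo) / width), bins - 1))
    return result


values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # made-up numeric attribute
bin_indices = equal_width_bins(values, bins=5)
print(bin_indices)
```

After this step the attribute has 5 nominal-like levels instead of a continuous range, which is the form a discretized input takes.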
Numeric Transform
Use this when your algorithm works well with integers but not with real numbers.
Diabetes.arff
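
The idea of a numeric transform is to apply one function to every numeric attribute value. A minimal plain-Python sketch, assuming `floor` as the function that turns reals into integers; the name `numeric_transform`, the column indices, and the sample rows are hypothetical.

```python
import math


def numeric_transform(rows, numeric_columns, func):
    """Apply func to the values in the given column indices; leave others as-is."""
    return [
        [func(v) if i in numeric_columns else v for i, v in enumerate(row)]
        for row in rows
    ]


# Made-up rows: two real-valued attributes followed by a class label.
rows = [
    [6.7, 148.2, "pos"],
    [1.2, 85.9, "neg"],
]
floored = numeric_transform(rows, numeric_columns={0, 1}, func=math.floor)
print(floored)
```

Any one-argument function (for example `round` or `abs`) can be passed in place of `math.floor`.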
Outliers and Extreme Values
Is it possible to find outliers and extreme values that are hidden in a
dataset?
InterquartileRange
Find outliers and extreme values.
Remove them.
NSL-KDD dataset (ARFF)
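
The find-then-remove steps can be sketched with the interquartile range (IQR) in plain Python. This does not call WEKA. The function name `iqr_flags`, the factors 3.0 and 6.0, the quartile method, and the sample values are all assumptions chosen for illustration.

```python
import statistics


def iqr_flags(values, outlier_factor=3.0, extreme_factor=6.0):
    """Label each value as 'normal', 'outlier', or 'extreme' using IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    flags = []
    for v in values:
        if v > q3 + extreme_factor * iqr or v < q1 - extreme_factor * iqr:
            flags.append("extreme")
        elif v > q3 + outlier_factor * iqr or v < q1 - outlier_factor * iqr:
            flags.append("outlier")
        else:
            flags.append("normal")
    return flags


values = [10, 11, 12, 13, 14, 15, 16, 17, 18, 40, 60]  # made-up attribute
flags = iqr_flags(values)

# Step 2: remove everything that was flagged.
kept = [v for v, flag in zip(values, flags) if flag == "normal"]
print(flags)
print(kept)
```

Here Q1 is 12.5 and Q3 is 17.5, so the IQR is 5: values above 32.5 are flagged as outliers and values above 47.5 as extreme, which marks 40 and 60 respectively.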
