WEKA
BY
SONY REDDY
SHREYA SINGH
MAHIMA VERMA
PARTHIBAN
INTRODUCTION TO WEKA
• WEKA – Waikato Environment for Knowledge Analysis
• Collection of machine learning algorithms for data mining tasks.
• Fully implemented in the Java programming language.
• Features
- 49 Data preprocessing tools
- 76 Classification algorithms
- 8 Clustering algorithms
- 15 Attribute evaluators
INTERFACES
Main GUI
- The Explorer (exploratory data analysis)
- The Experimenter (experimental environment)
- The KnowledgeFlow (new process model interface)
Simple CLI
- Recommended for in-depth usage.
- Offers functionality that is not available in the GUI.
THE EXPLORER
- Data preprocessing
- Classification
- Clustering
- Association rules
- Attribute selection
- Data visualization
ARFF FILE FORMAT
External (text) representation of WEKA's Instances class, i.e. of a dataset.
Should include
- Dataset name (preceded by @relation)
- Attributes (each preceded by @attribute)
- Data values (after @data, one instance per line, values separated by commas)
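As a rough illustration of this layout, the sketch below holds a tiny ARFF document in a string and splits it into relation name, attribute names, and data rows. The dataset contents and the `parse_arff` helper are hypothetical, and the reader only covers this simple case (no quoted values, sparse data, or other ARFF features).

```python
# Hypothetical toy dataset in ARFF layout: header lines first, then the @data rows.
ARFF_TEXT = """\
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""

def parse_arff(text):
    """Minimal ARFF reader: returns (relation, attribute_names, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):      # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            attributes.append(line.split(None, 2)[1])
        elif lower.startswith("@data"):
            in_data = True                         # everything after this is data
    return relation, attributes, rows

relation, attributes, rows = parse_arff(ARFF_TEXT)
```

Each row comes back as a list of strings; converting numeric attributes is left out to keep the sketch short.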
DATA CLASSIFICATION
• Bayes
P(H|E) = P(E|H) * P(H) / P(E)
P(H|E) is the posterior probability of hypothesis H once evidence E is known.
• Bayesian network
Used where the evidences are dependent on each other.
P(H|E1, E2, E3, …, EN) = P(E1, E2, E3, …, EN|H) * P(H) / P(E1, E2, E3, …, EN)
• Naïve Bayes classifier
Many evidences support the occurrence of an event, and the evidences are assumed to be independent of each other given H.
P(H|E1, E2, E3, …, EN) = P(E1|H) * P(E2|H) * … * P(EN|H) * P(H) / P(E1, E2, E3, …, EN)
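The Naïve Bayes formula can be tried out on a few numbers. The sketch below is only an illustration: the hypotheses, priors, and likelihoods are made-up values, and the denominator P(E1, …, EN) is obtained by summing the numerators over all hypotheses.

```python
from math import prod

def naive_bayes_posterior(priors, likelihoods):
    """priors: {H: P(H)}; likelihoods: {H: [P(E1|H), ..., P(EN|H)]}.
    Returns {H: P(H|E1..EN)}, assuming the evidences are independent given H."""
    # Numerator of Bayes' rule for each hypothesis: P(E1|H) * ... * P(EN|H) * P(H)
    scores = {h: prod(likelihoods[h]) * priors[h] for h in priors}
    # P(E1..EN) is the same for every H, so it is the sum of the numerators.
    evidence = sum(scores.values())
    return {h: score / evidence for h, score in scores.items()}

# Hypothetical numbers: two hypotheses, two pieces of evidence.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {"spam": [0.8, 0.5], "ham": [0.1, 0.5]}
posterior = naive_bayes_posterior(priors, likelihoods)
```

With these values the numerators are 0.16 for "spam" and 0.03 for "ham", so "spam" receives the larger posterior.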
DATA CLASSIFICATION
• Functions
- Multilayer Perceptron (MLP)
a) Feed-forward connections between pairs of adjacent layers.
b) Continuous and differentiable activation functions.
c) Realizes a multi-dimensional function y = φ(x) between input x ∈ R^(d_i) and output y ∈ R^(d_o).
d) Trained by backpropagation of the error signal
e_j(n) = d_j(n) - y_j(n),
where d_j(n) is the desired output and y_j(n) the actual output of neuron j for training example n.
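A minimal sketch of the forward pass and the error signal follows. The 2-2-1 network, its weights, and the sigmoid activation are assumptions chosen for illustration; the weight update step of backpropagation is not shown.

```python
import math

def sigmoid(v):
    # A continuous, differentiable activation function.
    return 1.0 / (1.0 + math.exp(-v))

def layer(inputs, weights, biases):
    """One fully connected layer: every output neuron sees every input."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, params):
    """Feed x through the layers in order; only adjacent layers are connected."""
    for weights, biases in params:
        x = layer(x, weights, biases)
    return x

# Hypothetical 2-2-1 network: (weight rows, biases) per layer.
params = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.0, 0.1]),   # hidden layer
    ([[1.0, -1.0]], [0.0]),                     # output layer
]
x = [1.0, 0.0]
d = [1.0]                                   # desired output d_j(n)
y = forward(x, params)                      # actual output y_j(n)
e = [dj - yj for dj, yj in zip(d, y)]       # error signal e_j(n) = d_j(n) - y_j(n)
```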
DATA CLASSIFICATION
• Lazy:
K-star (nearest-neighbor algorithm)
a) k ← number of nearest neighbors
b) for each object X in the test set do
      calculate the distance D(X, Y) to every object Y in the training set
      neighborhood ← the k neighbors in the training set closest to X
      X.class ← SelectClass(neighborhood)
c) end for
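The loop above can be sketched in a few lines. This is the generic k-nearest-neighbor procedure from the pseudocode, not WEKA's own implementation; Euclidean distance and majority voting for SelectClass are assumptions, and the toy data is made up.

```python
from collections import Counter
import math

def knn_classify(train, x, k):
    """train: list of (features, label) pairs; x: feature vector to classify."""
    # Distance D(X, Y) from x to every training object Y (Euclidean by assumption).
    by_distance = sorted(train, key=lambda item: math.dist(item[0], x))
    neighborhood = by_distance[:k]                 # the k closest training objects
    votes = Counter(label for _, label in neighborhood)
    return votes.most_common(1)[0][0]              # SelectClass: majority label

# Hypothetical training set with two well-separated classes.
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((5.0, 5.0), "b"), ((5.5, 4.5), "b")]
test_set = [(1.1, 1.0), (5.2, 4.9)]
predictions = [knn_classify(train, x, k=3) for x in test_set]
```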
• Meta
Bagging:
• Bootstrap:
Create a random subset of the data by sampling.
Draw N' of the N samples with replacement.
• Bagging:
Repeat K times:
   Create a bootstrap training set of size N' < N.
   Train a classifier on this random training set.
To test, run each trained classifier.
For classification: each classifier votes on the output, and the majority wins.
For regression: each regressor predicts, and the average is taken.
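A compact sketch of bootstrap sampling and the voting and averaging steps is given below. The base learner `train_1nn` (a 1-nearest-neighbor rule on a single numeric feature), the toy data, and the fixed seed are assumptions made for illustration; any classifier could be plugged in instead.

```python
import random
from collections import Counter

def bootstrap(data, n_prime, rng):
    """Draw n_prime of the samples in data with replacement."""
    return [rng.choice(data) for _ in range(n_prime)]

def train_1nn(sample):
    """Toy base learner (an assumption): 1-nearest-neighbor on one numeric feature."""
    return lambda x: min(sample, key=lambda item: abs(item[0] - x))[1]

def bagging(data, k, n_prime, train=train_1nn, seed=0):
    """Repeat k times: bootstrap a training set, then train a classifier on it."""
    rng = random.Random(seed)
    return [train(bootstrap(data, n_prime, rng)) for _ in range(k)]

def vote(classifiers, x):
    """Classification: every classifier votes, the majority label wins."""
    return Counter(clf(x) for clf in classifiers).most_common(1)[0][0]

def average(regressors, x):
    """Regression: every regressor predicts, the mean is returned."""
    return sum(r(x) for r in regressors) / len(regressors)

# Hypothetical one-feature dataset of (value, label) pairs.
data = [(0.0, "a"), (0.1, "a"), (0.2, "a"), (1.0, "b"), (1.1, "b"), (1.2, "b")]
ensemble = bagging(data, k=11, n_prime=4)
prediction = vote(ensemble, 0.05)
```

An odd K avoids ties in two-class voting, which is why 11 classifiers are used here.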
DATA CLASSIFICATION
• Rules:
Prism
Generates only rules that are 100% correct on the training set, for each class in turn.
Accuracy = p / t, where p is the number of positive instances covered by an attribute-value pair and t is the total number of instances it covers.
Input: D - training data, C - the set of classes
Step 1: Compute the p/t values of the attribute-value pairs for a class in C.
Step 2: Find one or more pairs with p/t = 100%.
Step 3: Select one pair as a rule.
Step 4: Repeat steps 1 to 3 until D is empty.
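Steps 1 and 2 can be sketched as a counting pass over the training data. The code below only scores single attribute-value pairs for one class; it does not refine rules with extra conditions or remove covered instances, and the helper names and toy data are hypothetical.

```python
def pair_accuracies(data, target_class):
    """data: list of (attributes_dict, class_label) pairs.
    Returns {(attribute, value): (p, t)}: t counts the instances having that
    pair, p counts those among them that belong to target_class."""
    counts = {}
    for attrs, label in data:
        for pair in attrs.items():
            p, t = counts.get(pair, (0, 0))
            counts[pair] = (p + (label == target_class), t + 1)
    return counts

def perfect_pairs(data, target_class):
    """Attribute-value pairs whose p/t is exactly 100% for target_class."""
    scored = pair_accuracies(data, target_class)
    return sorted(pair for pair, (p, t) in scored.items() if p == t)

# Hypothetical training data D.
data = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
]
candidates = perfect_pairs(data, "play")
```

Here both `outlook = sunny` and `windy = no` reach p/t = 2/2, so either could be selected as a rule for the class "play" in step 3.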
DATA CLASSIFICATION
• Trees
J48: implementation of the C4.5 algorithm
Input: training data
Output: decision tree
Information (entropy) I = -∑ p_i * log2(p_i), where p_i is the fraction of instances belonging to class i
a) Evaluates the normalized information gain (gain ratio) of every attribute on the training set.
b) The attribute with the highest normalized information gain is used for splitting the data.
c) Splitting stops if all instances in a subset belong to the same class.
d) The attribute with the highest value becomes the first split criterion (root node); later splits form the nodes below it, down to the leaf nodes.
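The split criterion can be worked through on a small table. The sketch below computes entropy, information gain, and gain ratio for nominal attributes only; it is an illustration of the measure, not J48 itself (no numeric attributes, pruning, or tree building), and the data is made up.

```python
from collections import Counter
from math import log2

def entropy(values):
    """I = -sum(p_i * log2(p_i)) over the distinct values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the weighted entropy after splitting on attribute."""
    n = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

def gain_ratio(rows, labels, attribute):
    """Information gain normalized by the entropy of the split itself."""
    split_info = entropy([row[attribute] for row in rows])
    gain = information_gain(rows, labels, attribute)
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical training data: "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["yes", "yes", "no", "no"]
root = max(["outlook", "windy"], key=lambda a: gain_ratio(rows, labels, a))
```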
CLUSTERING
K-Means algorithm
• Widely used partition-based clustering method
• Efficient in terms of execution time
Input:
• K: number of clusters
• D: dataset of n objects
Output: set of K clusters
Method:
a) arbitrarily choose K objects from D as the initial cluster centers
b) repeat
c)    (re)assign each object to the cluster whose mean is nearest to it
d)    update the cluster means
e) until no change.
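The method maps almost line by line onto code. In this sketch the "arbitrary" initial centers are simply the first K objects and the distance is Euclidean; both are assumptions, as is the `max_iter` safety limit.

```python
import math

def kmeans(points, k, max_iter=100):
    """points: list of equal-length tuples. Returns (centers, assignment)."""
    centers = [points[i] for i in range(k)]        # a) initial centers: first k objects
    assignment = None
    for _ in range(max_iter):                      # b) repeat
        # c) (re)assign each object to the cluster with the nearest mean
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_assignment == assignment:           # e) until no change
            break
        assignment = new_assignment
        # d) update the cluster means
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                            # an empty cluster keeps its old center
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, assignment

# Hypothetical dataset D with two obvious groups.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
centers, assignment = kmeans(points, k=2)
```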
ASSOCIATION RULES
• Apriori algorithm
F1 = {frequent itemsets of cardinality 1};
for (k = 1; Fk ≠ Ø; k++) do begin
    Ck+1 = apriori-gen(Fk);
    for all transactions t ∈ Database do begin
        C't = subset(Ck+1, t);
        for all candidates c ∈ C't do
            c.count++;
    end
    Fk+1 = {c ∈ Ck+1 | c.count ≥ minimum support}
end
Answer = ∪k Fk
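The pseudocode can be turned into a short runnable sketch. Here the minimum support is an absolute count, candidate generation joins pairs of frequent k-itemsets and prunes those with an infrequent k-subset, and counting scans all transactions per candidate; these are simplifying assumptions, and the transactions are made up.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """transactions: list of sets; min_support: absolute count.
    Returns {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count each candidate over the database and keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    frequent_k = count({frozenset([i]) for t in transactions for i in t})   # F1
    answer = dict(frequent_k)
    k = 1
    while frequent_k:                                                       # Fk != empty
        # apriori-gen: join Fk with itself, prune sets with an infrequent k-subset
        candidates = set()
        for a, b in combinations(frequent_k, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(s) in frequent_k for s in combinations(union, k)
            ):
                candidates.add(union)
        frequent_k = count(candidates)                                      # Fk+1
        answer.update(frequent_k)                                           # union of all Fk
        k += 1
    return answer

# Hypothetical transaction database.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent = apriori(transactions, min_support=3)
```

With a minimum support of 3, every single item and every pair is frequent, while {a, b, c} appears in only two transactions and is dropped.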