SlideShare a Scribd company logo
1 of 7
WEKA
INTRODUCTION TO WEKA:
WEKA (Waikato Environment for Knowledge Analysis) is a popular suite of machine
learning software written in Java, developed at the University of Waikato, New Zealand.
WEKA is an open source application that is freely available under the GNU general
public license agreement. Originally written in C, the WEKA application has been completely
rewritten in Java and is compatible with almost every computing platform.
It is user friendly with a graphical interface that allows for quick set up and
operation. WEKA operates on the predication that the user data is available as a flat file or
relation. This means that each data object is described by a fixed number of attributes that
usually are of a specific type, normal alpha-numeric or numeric values.
The WEKA application allows novice users a tool to identify hidden information from
database and file systems with simple to use options and visual interfaces.
The WEKA workbench contains a collection of visualization tools and algorithms for
data analysis and predictive modeling, together with graphical user interfaces for easy
access to this functionality.
This original version was primarily designed as a tool for analyzing data from
agricultural domains, but the more recent fully Java-based version (WEKA 3), for
which development started in 1997, is now used in many different application areas, in
particular for educational purposes and research.
ADVANTAGES OF WEKA
The obvious advantage of a package like WEKA is that a whole range of data
preparation, feature selection and data mining algorithms are integrated. This means that
only one data format is needed, and trying out and comparing different approaches
becomes really easy. The package also comes with a GUI, which should make it easier to
use.
1. Portability, since it is fully implemented in the Java programming language and thus
runs on almost any modern computing platform
2. A comprehensive collection of data preprocessing and modeling techniques
3. Ease of use due to its graphical user interfaces
WEKA
4. WEKA supports several standard data mining tasks, more specifically, data
preprocessing, clustering, classification, regression, visualization, and feature
selection
5. Al of WEKA's techniques are predicated on the assumption that the data is available
as a single flat file or relation, where each data point is described by a fixed number
of attributes (normally, numeric or nominal attributes, but some other at tribute
types are also supported).
6. WEKA provides access to SQL databases using Java Database Connectivity and can
process the result returned by a database query.
7. It is not capable of multi-relational data mining, but there is separate software for
converting a collection of linked database tables into a single table that is suitable for
processing using WEKA. Another important area is sequence modeling.
8. Attribute Relationship File Format (ARFF) is the text format file used by WEKA to store data
in a database.
9. The ARFF file contains two sections: the header and the data section. The first line of the
header tells us the relation name.
10. Then there is the list of the attributes (@attribute...). Each attribute is associated
with a unique name and a type.
11. The latter describes the kind of data contained in the variable and what values it can have.
The variables types are: numeric, nominal, string and date.
12. The class attribute is by default the last one of the list. In the header section there can
also be some comment lines, identified with a '%' at the beginning, which can
describe the database content or give the reader information about the author.
13. After that there is the data itself (@data), each line stores the attribute of a single
entry separated by a comma.
14. WEKA's main user interface is the Explorer, but essentially the same functionality can
be accessed through the component-based Knowledge Flow interface and from the
command line. There is also the Experimenter, which allows the systematic
comparison of the predictive performance of WEKA's machine learning algorithms on a
collection of datasets.
WEKA
LAUNCHING WEKA:
The WEKA GUI Chooser window is used to launch WEKA’s graphical environments. At
the bottom of the window are four buttons:
 Simple CLI: It provides a simple command-line interface that allows direct
execution of WEKA commands for operating systems that do not provide their own
command line Interface.
 Explorer: It provides an environment for exploring data with WEKA.
 Experimenter: It provides an environment for performing experiments and
conducting.
 Knowledge Flow: This environment supports essentially the same functions as the
incremental learning.
The WEKA Explorer:
Section Tabs:
At the very top of the window, just below the title bar, is a row of tabs.
When the Explorer is first started only the first tab is active; the others are greyed out. This
is because it is necessary to open (and potentially pre-process) a data set before starting to
explore the data. The tabs are as follows:
1. Preprocess. Choose and modify the data being acted on.
2. Classify. Train and test learning schemes that classify or perform regression.
3. Cluster. Learn clusters for the data.
4. Associate. Learn association rules for the data.
5. Select attributes. Select the most relevant attributes in the data.
6. Visualize. View an interactive 2D plot of the data.
Once the tabs are active, clicking on them flicks between different screens, on
which the respective actions can be performed. The bottom area of the window
(including the status box, the log button, and the WEKA bird) stays visible regardless
of which section you are in.
WEKA
CLASSIFICATION:
Selecting a Classifier:
At the top of the classify section is the Classifier box. This box has a text field that
gives the name of the currently selected classifier, and its options. Clicking on the text box
brings up a GenericObjectEditor dialog box, just the same as for filters that you can use to
configure the options of the current classifier. The Choose button allows you to choose
one of the classifiers that are available in WEKA.
Test Options
The result of applying the chosen classifier will be tested according to the options
that are set by clicking in the Test options box. There are four test modes:
1. Use training set. The classifier is evaluated on how well it predicts the class of the instances
it was trained on.
2. Supplied test set. The classifier is evaluated on how well it predicts the class of a
set of instances loaded from a file. Clicking the Set... button brings up a dialog allowing you
to choose the file to test on.
3. Cross-validation. The classifier is evaluated by cross-validation, using the number of folds
that are entered in the Folds text field.
4. Percentage split. The classifier is evaluated on how well it predicts a certain
percentage of the data which is held out for testing. The amount of data held out
depends on the value entered in the % field.
Note: No matter which evaluation method is used, the model that is output is always the
one build from all the training data. Further testing options can be set by clicking on the
More options... button:
1. Output model. The classification model on the full training set is output so that it
can be viewed, visualized, etc. This option is selected by default.
2. Output per-class stats. The precision/recall and true/false statistics for each class
are output. This option is also selected by default.
3. Output entropy evaluation measures. Entropy evaluation measures are included
in the output. This option is not selected by default.
4. Output confusion matrix. The confusion matrix of the classifier’s predictions is
included in the output. This option is selected by default.
5. Store predictions for visualization. The classifier’s predictions are remembered so
that they can be visualized. This option is selected by default.
WEKA
6. Output predictions. The predictions on the evaluation data are output. Note that in
the case of a cross-validation the instance numbers do not correspond to the location
in the data!
7. Cost-sensitive evaluation. The error is evaluated with respect to a cost matrix.
The Set... button allows you to specify the cost matrix used.
8. Random seed for xval / % Split. This specifies the random seed used when
randomizing the data before it is divided up for evaluation purposes.
The Class Attribute:
The classifiers in WEKA are designed to be trained to predict a single ‘class’
attribute, which is the target for prediction. Some classifiers can only learn nominal classes;
others can only learn numeric classes (regression problems); still others can learn both. By
default, the class is taken to be the last attribute in the data. If you want to train a classifier
to predict a different attribute, click on the box below the Test options box to bring up a
drop-down list of attributes to choose from.
Training a Classifier:
Once the classifier, test options and class have all been set, the learning process is
started by clicking on the Start button. While the classifier is busy being trained, the little
bird moves around. You can stop the training process at any time by clicking on the Stop
button. When training is complete, several things happen. The Classifier output area to the
right of the display is filled with text describing the results of training and testing. A
new entry appears in the Result list box. We look at the result list below; but first we
investigate the text that has been output.
The Classifier Output Text:
The text in the Classifier output area has scroll bars allowing you to browse the results.
Of course, you can also resize the Explorer window to get a larger display area. The output is split
into several sections:
1. Run information. A list of information giving the learning scheme options,
relation name, instances, attributes and test mode that were involved in the
process.
2. Classifier model (full training set). A textual representation of classification
model that was produced on the full training data
WEKA
3. The results of the chosen test mode are broken down thus:
4. Summary. A list of statistics summarizing how accurately the classifier was able to
predict the true class of instances under the chosen test mode
5. Detailed Accuracy By Class. A more detailed per-class break down of the classifier’s
prediction accuracy.
6. Confusion Matrix. Shows how many instances have been assigned to each class.
Elements show the number of test examples whose actual class is the row and whose
predicted class is the column.
The Result List:
After training several classifiers, the result list will contain several entries. Left -
clicking the entries flicks back and forth between the various results that have been
generated. Right-clicking an entry invokes a menu containing these items:
1. View in main window. Shows the output in the main window (just like left-clicking
the entry).
2. View in separate window: Opens a new independent window for viewing the
results.
3. Save result buffer: Brings up a dialog allowing you to save a text file containing the
textual output.
4. Load model: Loads a pre-trained model object from a binary file.
5. Save model: Saves a model object to a binary file. Objects are saved in Java
‘serialized object’ form.
6. Re-evaluate model on current test set: Takes the model that has been built and
tests its performance on the data set that has been specified with the Set..button
under the Supplied test set option.
7. Visualize classifier errors: Brings up a visualization window that plots the results
of classification. Correctly classified instances are represented by crosses, whereas
incorrectly classified ones show up as squares.
8. Visualize tree or Visualize graph: Brings up a graphical representation of the
structure of the classifier model, if possible (i.e. For decision trees or Bayesian
WEKA
networks). The graph visualization option only appears if a Bayesian network
classifier has been built. In the tree visualizer, you can bring up a menu by right -
clicking a blank area, pan around by dragging the mouse, and see the training
instances at each node by clicking on it.
a. CTRL-clicking zooms the view out, while SHIFT-dragging a box zooms the
view in. The graph visualize should be self-explanatory.
9. Visualize margin curve: Generates a plot illustrating the prediction margin. The
margin is defined as the difference between the probability predicted for the actual
class and the highest probability predicted for the other classes. For example,
boosting algorithms may achieve better performance on test data by increasing the
margins on the training data.
10. Visualize threshold curve: Generates a plot illustrating the tradeoffs in prediction
that are obtained by varying the threshold value between classes. For example, with
the default threshold value of 0.5, the predicted probability of ‘positive’ must be
greater than 0.5 for the instance to be predicted as ‘positive’. The plot can be used
to visualize the precision/recall tradeoff, for ROC curve analysis (true positive rate
vs. false positive rate), and for other types of curves.
11. Visualize cost curve. Generates a plot that gives an explicit representation of the
expected cost, as described by Drummond and Holte (2000), Options are greyed out
if they do not apply to the specific set of results.

More Related Content

What's hot

Ooad lab manual
Ooad  lab manualOoad  lab manual
Ooad lab manualPraseela R
 
SE18_Lec 10_ UML Behaviour and Interaction Diagrams
SE18_Lec 10_ UML Behaviour and Interaction DiagramsSE18_Lec 10_ UML Behaviour and Interaction Diagrams
SE18_Lec 10_ UML Behaviour and Interaction DiagramsAmr E. Mohamed
 
OBIEE publisher with Report creation - Tutorial
OBIEE publisher with Report creation - TutorialOBIEE publisher with Report creation - Tutorial
OBIEE publisher with Report creation - Tutorialonlinetrainingplacements
 
SAP ABAP using OOPS - JH Softech
SAP ABAP using OOPS - JH SoftechSAP ABAP using OOPS - JH Softech
SAP ABAP using OOPS - JH SoftechVikram P Madduri
 
EclipseCon 2005: Everything You Always Wanted to do with EMF (But were Afraid...
EclipseCon 2005: Everything You Always Wanted to do with EMF (But were Afraid...EclipseCon 2005: Everything You Always Wanted to do with EMF (But were Afraid...
EclipseCon 2005: Everything You Always Wanted to do with EMF (But were Afraid...Dave Steinberg
 
Using SPMetal for faster SharePoint development
Using SPMetal for faster SharePoint developmentUsing SPMetal for faster SharePoint development
Using SPMetal for faster SharePoint developmentPranav Sharma
 
IMPORT AND EXPORT UTILITIES IN MS-ACCESS
IMPORT AND EXPORT UTILITIES IN MS-ACCESSIMPORT AND EXPORT UTILITIES IN MS-ACCESS
IMPORT AND EXPORT UTILITIES IN MS-ACCESS23HARSHU
 
WEKA Tutorial
WEKA TutorialWEKA Tutorial
WEKA Tutorialbutest
 
SE_Lec 09_ UML Behaviour Diagrams
SE_Lec 09_ UML Behaviour DiagramsSE_Lec 09_ UML Behaviour Diagrams
SE_Lec 09_ UML Behaviour DiagramsAmr E. Mohamed
 
SE18_Lec 06_Object Oriented Analysis and Design
SE18_Lec 06_Object Oriented Analysis and DesignSE18_Lec 06_Object Oriented Analysis and Design
SE18_Lec 06_Object Oriented Analysis and DesignAmr E. Mohamed
 
WEKA: Data Mining Input Concepts Instances And Attributes
WEKA: Data Mining Input Concepts Instances And AttributesWEKA: Data Mining Input Concepts Instances And Attributes
WEKA: Data Mining Input Concepts Instances And AttributesDataminingTools Inc
 
Oracle advanced queuing
Oracle advanced queuingOracle advanced queuing
Oracle advanced queuingGurpreet singh
 
Abstract
AbstractAbstract
Abstractbutest
 

What's hot (20)

Oracle SQL Part 3
Oracle SQL Part 3Oracle SQL Part 3
Oracle SQL Part 3
 
Ppt chapter08
Ppt chapter08Ppt chapter08
Ppt chapter08
 
Reports 6i
Reports 6iReports 6i
Reports 6i
 
Ooad lab manual
Ooad  lab manualOoad  lab manual
Ooad lab manual
 
SE18_Lec 10_ UML Behaviour and Interaction Diagrams
SE18_Lec 10_ UML Behaviour and Interaction DiagramsSE18_Lec 10_ UML Behaviour and Interaction Diagrams
SE18_Lec 10_ UML Behaviour and Interaction Diagrams
 
OBIEE publisher with Report creation - Tutorial
OBIEE publisher with Report creation - TutorialOBIEE publisher with Report creation - Tutorial
OBIEE publisher with Report creation - Tutorial
 
Ch08
Ch08Ch08
Ch08
 
SAP ABAP using OOPS - JH Softech
SAP ABAP using OOPS - JH SoftechSAP ABAP using OOPS - JH Softech
SAP ABAP using OOPS - JH Softech
 
D2 k word_format
D2 k word_formatD2 k word_format
D2 k word_format
 
EclipseCon 2005: Everything You Always Wanted to do with EMF (But were Afraid...
EclipseCon 2005: Everything You Always Wanted to do with EMF (But were Afraid...EclipseCon 2005: Everything You Always Wanted to do with EMF (But were Afraid...
EclipseCon 2005: Everything You Always Wanted to do with EMF (But were Afraid...
 
Using SPMetal for faster SharePoint development
Using SPMetal for faster SharePoint developmentUsing SPMetal for faster SharePoint development
Using SPMetal for faster SharePoint development
 
IMPORT AND EXPORT UTILITIES IN MS-ACCESS
IMPORT AND EXPORT UTILITIES IN MS-ACCESSIMPORT AND EXPORT UTILITIES IN MS-ACCESS
IMPORT AND EXPORT UTILITIES IN MS-ACCESS
 
WEKA Tutorial
WEKA TutorialWEKA Tutorial
WEKA Tutorial
 
SE_Lec 09_ UML Behaviour Diagrams
SE_Lec 09_ UML Behaviour DiagramsSE_Lec 09_ UML Behaviour Diagrams
SE_Lec 09_ UML Behaviour Diagrams
 
SE18_Lec 06_Object Oriented Analysis and Design
SE18_Lec 06_Object Oriented Analysis and DesignSE18_Lec 06_Object Oriented Analysis and Design
SE18_Lec 06_Object Oriented Analysis and Design
 
WEKA: Data Mining Input Concepts Instances And Attributes
WEKA: Data Mining Input Concepts Instances And AttributesWEKA: Data Mining Input Concepts Instances And Attributes
WEKA: Data Mining Input Concepts Instances And Attributes
 
synopsis
synopsissynopsis
synopsis
 
Oracle advanced queuing
Oracle advanced queuingOracle advanced queuing
Oracle advanced queuing
 
Abstract
AbstractAbstract
Abstract
 
Ppt chapter01
Ppt chapter01Ppt chapter01
Ppt chapter01
 

Similar to Introduction to weka

Weka toolkit introduction
Weka toolkit introductionWeka toolkit introduction
Weka toolkit introductionbutest
 
Weka toolkit introduction
Weka toolkit introductionWeka toolkit introduction
Weka toolkit introductionbutest
 
TAO Fayan_ Introduction to WEKA
TAO Fayan_ Introduction to WEKATAO Fayan_ Introduction to WEKA
TAO Fayan_ Introduction to WEKAFayan TAO
 
Installation Guidelines_Weka.pptx
Installation Guidelines_Weka.pptxInstallation Guidelines_Weka.pptx
Installation Guidelines_Weka.pptxJayrajSingh9
 
Weka term paper(siddharth 10 bm60086)
Weka term paper(siddharth 10 bm60086)Weka term paper(siddharth 10 bm60086)
Weka term paper(siddharth 10 bm60086)Siddharth Verma
 
An Introduction To Weka
An Introduction To WekaAn Introduction To Weka
An Introduction To Wekaweka Content
 
Weka_Manual_Sagar
Weka_Manual_SagarWeka_Manual_Sagar
Weka_Manual_SagarSagar Kumar
 
WEKA:Introduction To Weka
WEKA:Introduction To WekaWEKA:Introduction To Weka
WEKA:Introduction To Wekaweka Content
 
Weka : A machine learning algorithms for data mining
Weka : A machine learning algorithms for data miningWeka : A machine learning algorithms for data mining
Weka : A machine learning algorithms for data miningKeshab Kumar Gaurav
 
Getting started with test complete 7
Getting started with test complete 7Getting started with test complete 7
Getting started with test complete 7Hoamuoigio Hoa
 
Data Mining Techniques using WEKA_Saurabh Singh_10BM60082
Data Mining Techniques using WEKA_Saurabh Singh_10BM60082Data Mining Techniques using WEKA_Saurabh Singh_10BM60082
Data Mining Techniques using WEKA_Saurabh Singh_10BM60082Saurabh Singh
 
MC0078 SMU 2013 Fall session
MC0078 SMU 2013 Fall sessionMC0078 SMU 2013 Fall session
MC0078 SMU 2013 Fall sessionNarinder Kumar
 
A two stage feature selection method for text categorization
A two stage feature selection method for text categorizationA two stage feature selection method for text categorization
A two stage feature selection method for text categorizationParag Tamhane
 
1) workbench basics
1) workbench basics1) workbench basics
1) workbench basicstechbed
 
lab #6
lab #6lab #6
lab #6butest
 
Data Mining with WEKA WEKA
Data Mining with WEKA WEKAData Mining with WEKA WEKA
Data Mining with WEKA WEKAbutest
 

Similar to Introduction to weka (20)

Weka toolkit introduction
Weka toolkit introductionWeka toolkit introduction
Weka toolkit introduction
 
Weka toolkit introduction
Weka toolkit introductionWeka toolkit introduction
Weka toolkit introduction
 
TAO Fayan_ Introduction to WEKA
TAO Fayan_ Introduction to WEKATAO Fayan_ Introduction to WEKA
TAO Fayan_ Introduction to WEKA
 
Weka
Weka Weka
Weka
 
Installation Guidelines_Weka.pptx
Installation Guidelines_Weka.pptxInstallation Guidelines_Weka.pptx
Installation Guidelines_Weka.pptx
 
Weka term paper(siddharth 10 bm60086)
Weka term paper(siddharth 10 bm60086)Weka term paper(siddharth 10 bm60086)
Weka term paper(siddharth 10 bm60086)
 
An Introduction To Weka
An Introduction To WekaAn Introduction To Weka
An Introduction To Weka
 
An Introduction To Weka
An Introduction To WekaAn Introduction To Weka
An Introduction To Weka
 
Weka_Manual_Sagar
Weka_Manual_SagarWeka_Manual_Sagar
Weka_Manual_Sagar
 
WEKA: Introduction To Weka
WEKA: Introduction To WekaWEKA: Introduction To Weka
WEKA: Introduction To Weka
 
WEKA:Introduction To Weka
WEKA:Introduction To WekaWEKA:Introduction To Weka
WEKA:Introduction To Weka
 
Weka : A machine learning algorithms for data mining
Weka : A machine learning algorithms for data miningWeka : A machine learning algorithms for data mining
Weka : A machine learning algorithms for data mining
 
Wek1
Wek1Wek1
Wek1
 
Getting started with test complete 7
Getting started with test complete 7Getting started with test complete 7
Getting started with test complete 7
 
Data Mining Techniques using WEKA_Saurabh Singh_10BM60082
Data Mining Techniques using WEKA_Saurabh Singh_10BM60082Data Mining Techniques using WEKA_Saurabh Singh_10BM60082
Data Mining Techniques using WEKA_Saurabh Singh_10BM60082
 
MC0078 SMU 2013 Fall session
MC0078 SMU 2013 Fall sessionMC0078 SMU 2013 Fall session
MC0078 SMU 2013 Fall session
 
A two stage feature selection method for text categorization
A two stage feature selection method for text categorizationA two stage feature selection method for text categorization
A two stage feature selection method for text categorization
 
1) workbench basics
1) workbench basics1) workbench basics
1) workbench basics
 
lab #6
lab #6lab #6
lab #6
 
Data Mining with WEKA WEKA
Data Mining with WEKA WEKAData Mining with WEKA WEKA
Data Mining with WEKA WEKA
 

More from JK Knowledge

UNIX INTERNALS UNIT-I
UNIX INTERNALS UNIT-IUNIX INTERNALS UNIT-I
UNIX INTERNALS UNIT-IJK Knowledge
 
Symbol table format
Symbol table formatSymbol table format
Symbol table formatJK Knowledge
 
Register allocation
Register allocationRegister allocation
Register allocationJK Knowledge
 
Yacc topic beyond syllabus
Yacc   topic beyond syllabusYacc   topic beyond syllabus
Yacc topic beyond syllabusJK Knowledge
 
Make your analysis or thoughts to be positive
Make your analysis or thoughts to be positiveMake your analysis or thoughts to be positive
Make your analysis or thoughts to be positiveJK Knowledge
 

More from JK Knowledge (6)

UNIX INTERNALS UNIT-I
UNIX INTERNALS UNIT-IUNIX INTERNALS UNIT-I
UNIX INTERNALS UNIT-I
 
Symbol table format
Symbol table formatSymbol table format
Symbol table format
 
Object code
Object codeObject code
Object code
 
Register allocation
Register allocationRegister allocation
Register allocation
 
Yacc topic beyond syllabus
Yacc   topic beyond syllabusYacc   topic beyond syllabus
Yacc topic beyond syllabus
 
Make your analysis or thoughts to be positive
Make your analysis or thoughts to be positiveMake your analysis or thoughts to be positive
Make your analysis or thoughts to be positive
 

Recently uploaded

COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptxJIT KUMAR GUPTA
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.Kamal Acharya
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdfKamal Acharya
 
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using PipesLinux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using PipesRashidFaridChishti
 
Augmented Reality (AR) with Augin Software.pptx
Augmented Reality (AR) with Augin Software.pptxAugmented Reality (AR) with Augin Software.pptx
Augmented Reality (AR) with Augin Software.pptxMustafa Ahmed
 
Computer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesComputer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesChandrakantDivate1
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayEpec Engineered Technologies
 
Introduction to Artificial Intelligence ( AI)
Introduction to Artificial Intelligence ( AI)Introduction to Artificial Intelligence ( AI)
Introduction to Artificial Intelligence ( AI)ChandrakantDivate1
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxSCMS School of Architecture
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...Amil baba
 
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...HenryBriggs2
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfsumitt6_25730773
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network DevicesChandrakantDivate1
 
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...ssuserdfc773
 
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...ronahami
 
8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessorAshwiniTodkar4
 

Recently uploaded (20)

COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using PipesLinux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
 
Augmented Reality (AR) with Augin Software.pptx
Augmented Reality (AR) with Augin Software.pptxAugmented Reality (AR) with Augin Software.pptx
Augmented Reality (AR) with Augin Software.pptx
 
Computer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesComputer Graphics Introduction To Curves
Computer Graphics Introduction To Curves
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
Introduction to Artificial Intelligence ( AI)
Introduction to Artificial Intelligence ( AI)Introduction to Artificial Intelligence ( AI)
Introduction to Artificial Intelligence ( AI)
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdf
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Signal Processing and Linear System Analysis
Signal Processing and Linear System AnalysisSignal Processing and Linear System Analysis
Signal Processing and Linear System Analysis
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
 
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
 
8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor
 

Introduction to weka

  • 1. WEKA INTRODUCTION TO WEKA: WEKA (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. WEKA is an open source application that is freely available under the GNU general public license agreement. Originally written in C, the WEKA application has been completely rewritten in Java and is compatible with almost every computing platform. It is user friendly with a graphical interface that allows for quick set up and operation. WEKA operates on the predication that the user data is available as a flat file or relation. This means that each data object is described by a fixed number of attributes that usually are of a specific type, normal alpha-numeric or numeric values. The WEKA application allows novice users a tool to identify hidden information from database and file systems with simple to use options and visual interfaces. The WEKA workbench contains a collection of visualization tools and algorithms for data analysis and predictive modeling, together with graphical user interfaces for easy access to this functionality. This original version was primarily designed as a tool for analyzing data from agricultural domains, but the more recent fully Java-based version (WEKA 3), for which development started in 1997, is now used in many different application areas, in particular for educational purposes and research. ADVANTAGES OF WEKA The obvious advantage of a package like WEKA is that a whole range of data preparation, feature selection and data mining algorithms are integrated. This means that only one data format is needed, and trying out and comparing different approaches becomes really easy. The package also comes with a GUI, which should make it easier to use. 1. Portability, since it is fully implemented in the Java programming language and thus runs on almost any modern computing platform 2. A comprehensive collection of data preprocessing and modeling techniques 3. Ease of use due to its graphical user interfaces
  • 2. WEKA 4. WEKA supports several standard data mining tasks, more specifically, data preprocessing, clustering, classification, regression, visualization, and feature selection 5. Al of WEKA's techniques are predicated on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes (normally, numeric or nominal attributes, but some other at tribute types are also supported). 6. WEKA provides access to SQL databases using Java Database Connectivity and can process the result returned by a database query. 7. It is not capable of multi-relational data mining, but there is separate software for converting a collection of linked database tables into a single table that is suitable for processing using WEKA. Another important area is sequence modeling. 8. Attribute Relationship File Format (ARFF) is the text format file used by WEKA to store data in a database. 9. The ARFF file contains two sections: the header and the data section. The first line of the header tells us the relation name. 10. Then there is the list of the attributes (@attribute...). Each attribute is associated with a unique name and a type. 11. The latter describes the kind of data contained in the variable and what values it can have. The variables types are: numeric, nominal, string and date. 12. The class attribute is by default the last one of the list. In the header section there can also be some comment lines, identified with a '%' at the beginning, which can describe the database content or give the reader information about the author. 13. After that there is the data itself (@data), each line stores the attribute of a single entry separated by a comma. 14. WEKA's main user interface is the Explorer, but essentially the same functionality can be accessed through the component-based Knowledge Flow interface and from the command line. There is also the Experimenter, which allows the systematic comparison of the predictive performance of WEKA's machine learning algorithms on a collection of datasets.
  • 3. WEKA LAUNCHING WEKA: The WEKA GUI Chooser window is used to launch WEKA’s graphical environments. At the bottom of the window are four buttons:  Simple CLI: It provides a simple command-line interface that allows direct execution of WEKA commands for operating systems that do not provide their own command line Interface.  Explorer: It provides an environment for exploring data with WEKA.  Experimenter: It provides an environment for performing experiments and conducting.  Knowledge Flow: This environment supports essentially the same functions as the incremental learning. The WEKA Explorer: Section Tabs: At the very top of the window, just below the title bar, is a row of tabs. When the Explorer is first started only the first tab is active; the others are greyed out. This is because it is necessary to open (and potentially pre-process) a data set before starting to explore the data. The tabs are as follows: 1. Preprocess. Choose and modify the data being acted on. 2. Classify. Train and test learning schemes that classify or perform regression. 3. Cluster. Learn clusters for the data. 4. Associate. Learn association rules for the data. 5. Select attributes. Select the most relevant attributes in the data. 6. Visualize. View an interactive 2D plot of the data. Once the tabs are active, clicking on them flicks between different screens, on which the respective actions can be performed. The bottom area of the window (including the status box, the log button, and the WEKA bird) stays visible regardless of which section you are in.
  • 4. WEKA CLASSIFICATION: Selecting a Classifier: At the top of the classify section is the Classifier box. This box has a text field that gives the name of the currently selected classifier, and its options. Clicking on the text box brings up a GenericObjectEditor dialog box, just the same as for filters that you can use to configure the options of the current classifier. The Choose button allows you to choose one of the classifiers that are available in WEKA. Test Options The result of applying the chosen classifier will be tested according to the options that are set by clicking in the Test options box. There are four test modes: 1. Use training set. The classifier is evaluated on how well it predicts the class of the instances it was trained on. 2. Supplied test set. The classifier is evaluated on how well it predicts the class of a set of instances loaded from a file. Clicking the Set... button brings up a dialog allowing you to choose the file to test on. 3. Cross-validation. The classifier is evaluated by cross-validation, using the number of folds that are entered in the Folds text field. 4. Percentage split. The classifier is evaluated on how well it predicts a certain percentage of the data which is held out for testing. The amount of data held out depends on the value entered in the % field. Note: No matter which evaluation method is used, the model that is output is always the one build from all the training data. Further testing options can be set by clicking on the More options... button: 1. Output model. The classification model on the full training set is output so that it can be viewed, visualized, etc. This option is selected by default. 2. Output per-class stats. The precision/recall and true/false statistics for each class are output. This option is also selected by default. 3. Output entropy evaluation measures. Entropy evaluation measures are included in the output. This option is not selected by default. 4. Output confusion matrix. The confusion matrix of the classifier’s predictions is included in the output. This option is selected by default. 5. Store predictions for visualization. The classifier’s predictions are remembered so that they can be visualized. This option is selected by default.
  • 5. WEKA 6. Output predictions. The predictions on the evaluation data are output. Note that in the case of a cross-validation the instance numbers do not correspond to the location in the data! 7. Cost-sensitive evaluation. The error is evaluated with respect to a cost matrix. The Set... button allows you to specify the cost matrix used. 8. Random seed for xval / % Split. This specifies the random seed used when randomizing the data before it is divided up for evaluation purposes. The Class Attribute: The classifiers in WEKA are designed to be trained to predict a single ‘class’ attribute, which is the target for prediction. Some classifiers can only learn nominal classes; others can only learn numeric classes (regression problems); still others can learn both. By default, the class is taken to be the last attribute in the data. If you want to train a classifier to predict a different attribute, click on the box below the Test options box to bring up a drop-down list of attributes to choose from. Training a Classifier: Once the classifier, test options and class have all been set, the learning process is started by clicking on the Start button. While the classifier is busy being trained, the little bird moves around. You can stop the training process at any time by clicking on the Stop button. When training is complete, several things happen. The Classifier output area to the right of the display is filled with text describing the results of training and testing. A new entry appears in the Result list box. We look at the result list below; but first we investigate the text that has been output. The Classifier Output Text: The text in the Classifier output area has scroll bars allowing you to browse the results. Of course, you can also resize the Explorer window to get a larger display area. The output is split into several sections: 1. Run information. A list of information giving the learning scheme options, relation name, instances, attributes and test mode that were involved in the process. 2. Classifier model (full training set). A textual representation of classification model that was produced on the full training data
  • 6. WEKA 3. The results of the chosen test mode are broken down thus: 4. Summary. A list of statistics summarizing how accurately the classifier was able to predict the true class of instances under the chosen test mode 5. Detailed Accuracy By Class. A more detailed per-class break down of the classifier’s prediction accuracy. 6. Confusion Matrix. Shows how many instances have been assigned to each class. Elements show the number of test examples whose actual class is the row and whose predicted class is the column. The Result List: After training several classifiers, the result list will contain several entries. Left - clicking the entries flicks back and forth between the various results that have been generated. Right-clicking an entry invokes a menu containing these items: 1. View in main window. Shows the output in the main window (just like left-clicking the entry). 2. View in separate window: Opens a new independent window for viewing the results. 3. Save result buffer: Brings up a dialog allowing you to save a text file containing the textual output. 4. Load model: Loads a pre-trained model object from a binary file. 5. Save model: Saves a model object to a binary file. Objects are saved in Java ‘serialized object’ form. 6. Re-evaluate model on current test set: Takes the model that has been built and tests its performance on the data set that has been specified with the Set..button under the Supplied test set option. 7. Visualize classifier errors: Brings up a visualization window that plots the results of classification. Correctly classified instances are represented by crosses, whereas incorrectly classified ones show up as squares. 8. Visualize tree or Visualize graph: Brings up a graphical representation of the structure of the classifier model, if possible (i.e. For decision trees or Bayesian
  • 7. WEKA networks). The graph visualization option only appears if a Bayesian network classifier has been built. In the tree visualizer, you can bring up a menu by right - clicking a blank area, pan around by dragging the mouse, and see the training instances at each node by clicking on it. a. CTRL-clicking zooms the view out, while SHIFT-dragging a box zooms the view in. The graph visualize should be self-explanatory. 9. Visualize margin curve: Generates a plot illustrating the prediction margin. The margin is defined as the difference between the probability predicted for the actual class and the highest probability predicted for the other classes. For example, boosting algorithms may achieve better performance on test data by increasing the margins on the training data. 10. Visualize threshold curve: Generates a plot illustrating the tradeoffs in prediction that are obtained by varying the threshold value between classes. For example, with the default threshold value of 0.5, the predicted probability of ‘positive’ must be greater than 0.5 for the instance to be predicted as ‘positive’. The plot can be used to visualize the precision/recall tradeoff, for ROC curve analysis (true positive rate vs. false positive rate), and for other types of curves. 11. Visualize cost curve. Generates a plot that gives an explicit representation of the expected cost, as described by Drummond and Holte (2000), Options are greyed out if they do not apply to the specific set of results.