Weka presentation

Published in: Technology, Education


  1. WEKA
     By: Gaurav Singh (Central University of Bihar)
  2. INTRODUCTION TO WEKA
     An open-source collection of many data mining and machine learning algorithms, including:
     > Pre-processing of data
     > Classification
     > Clustering
     > Association rule extraction
     > 3D visualization
     Developed by researchers at the University of Waikato in New Zealand.
     Pure Java based (also open source).
  3. Weka Main Features
     • 71 data pre-processing tools
     • 52 classification/regression algorithms
     • 7 clustering algorithms
     • 9 attribute/subset evaluators + 3 search algorithms for feature selection
     • 3 algorithms for finding association rules
     • 3 graphical user interfaces: “The Explorer”, “The Experimenter”, “The Knowledge Flow”
  4. Weka: Download and Installation
     • Download Weka (the stable version) from http://www.cs.waikato.ac.nz/ml/weka/
       – Choose a self-extracting executable (including Java VM)
     • After the download is complete, run the self-extracting file to install Weka, and use the default settings.
  5. GOAL
     The program aims to build a state-of-the-art facility for developing techniques for machine learning and investigating their application in key areas of machine learning. Specifically, we will create a workbench for machine learning, determine the factors that contribute towards its successful application in agriculture, industry, and scientific research, and develop new methods for machine learning and ways of assessing their effectiveness.
  6. Start Weka
     From the Windows desktop:
     – Click “Start”, choose “All Programs”
     – Choose “Weka 3.7.9” to start Weka
     Then the first interface window appears: Weka GUI Chooser
  7. WEKA APPLICATION INTERFACES
  8. WEKA Application Interface
     • Explorer – An environment for exploring data with WEKA. It gives access to all the facilities through menu selection and form filling.
     • Experimenter – Helps answer the question: which methods and parameter values work best for the given problem?
     • Knowledge Flow – Same function as the Explorer, and it also supports incremental learning. It allows designing configurations for streamed data processing. Incremental algorithms can be used to process very large datasets.
  9. WEKA Application Interface
     • Simple CLI – Provides a simple command-line interface for directly executing WEKA commands.
  10. WEKA FUNCTIONS AND TOOLS
  11. • Preprocessing filters
      • Attribute selection
      • Classification/Regression
      • Clustering
      • Association discovery
      • Visualization
  12. LOAD DATA FILE AND PREPROCESSING
  13. • Load data files in formats: ARFF, CSV, C4.5, binary
      • Import from a URL or an SQL database (using JDBC)
      • Preprocessing filters
        o Adding/removing attributes
        o Attribute value substitution
        o Discretization
        o Time series filters (delta, shift)
        o Sampling, randomization
        o Missing value management
        o Normalization and other numeric transformations
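Three of the filters listed above (missing value management, normalization, and discretization) can be illustrated with a minimal sketch. This is plain Python written for illustration, not Weka's own filter code; the function names and the sample temperature values are assumptions made up for the example.

```python
def fill_missing_with_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Discretize numeric values into n_bins equal-width intervals (0-based bin index)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last interval, so clamp it into the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical numeric attribute with one missing value
temperatures = [64, 70, None, 85, 72]
filled = fill_missing_with_mean(temperatures)
print(filled)
print(min_max_normalize(filled))
print(equal_width_bins(filled, 3))
```

Each function takes one attribute (column) at a time, which mirrors how such filters are usually applied per attribute before a learning algorithm is run.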
  14. WEKA DATA FORMATS
  15. FOUR FORMATS
      – ARFF (Attribute Relation File Format) has two sections
        • The Header section defines the relation name and the attribute names and types.
        • The Data section lists the data records.
      – CSV: Comma Separated Values (text file)
      – C4.5: A format used by the decision tree induction algorithm C4.5; requires two separate files
        • Names file: defines the names of the attributes
        • Data file: lists the records (samples)
      – Binary
      – Data can also be read from a URL or from an SQL database (using JDBC).
  16. ATTRIBUTE RELATION FILE FORMAT (ARFF)
      An ARFF file consists of two distinct sections
      • The Header section defines the relation name and the attribute names and types. Each declaration starts with a keyword:
        @Relation <data-name>
        @attribute <attribute-name> <type> or {range}
      • The Data section lists the data records. It starts with @Data, followed by the list of data instances.
  17. Example
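The example shown on the original slide is not preserved in this transcript. As a stand-in, the sketch below uses a small made-up ARFF file (the weather relation, its attributes, and its rows are assumptions for illustration) and a minimal Python parser that separates the header from the data section. It handles only unquoted names and values and is not a complete ARFF reader.

```python
# Hypothetical ARFF content; the slide's original example is not preserved.
ARFF_TEXT = """\
% Toy weather data (made up for illustration)
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}

@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""


def parse_arff(text):
    """Parse a simple ARFF string into (relation, attributes, rows).

    Handles only unquoted names and values, comment lines starting
    with '%', and nominal or numeric attributes.
    """
    relation = None
    attributes = []  # list of (name, type) pairs
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
            continue
        keyword = line.split(None, 1)[0].lower()  # keywords are matched case-insensitively
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, attr_type = line.split(None, 2)
            if attr_type.startswith("{"):
                # Nominal attribute: store the list of allowed values
                attr_type = [v.strip() for v in attr_type.strip("{}").split(",")]
            attributes.append((name, attr_type))
        elif keyword == "@data":
            in_data = True  # everything after this line is a data record
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation)
print(attributes)
print(rows)
```

The header lines map directly onto the @Relation and @attribute declarations described above, and every line after @Data becomes one instance.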
  18. WEKA SYSTEM HIERARCHY
  19. Role of WEKA
      [Diagram: INPUT (raw data) -> Data mining by WEKA (pre-processing, classification, regression, clustering, association rules, visualization) -> OUTPUT (result)]
  20. KDD Process of WEKA
      [Diagram: Data -> Selection -> Preprocessing -> Transformation -> Data Mining -> Interpretation/Evaluation -> Knowledge]
  21. CLASSIFICATION
  22. • Predicted target must be categorical
      • Implemented methods
        o Decision trees (J48) and rules
        o Naive Bayes
        o Neural networks
        o Instance-based classifiers
      • Evaluation methods
        o Test data set
        o Cross validation
      (Example)
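To make the cross-validation idea concrete, here is a minimal Python sketch of k-fold evaluation. It uses a majority-class baseline instead of a real classifier such as J48, and it neither shuffles nor stratifies the folds; the function names and the label list are assumptions for illustration, not Weka's evaluation code.

```python
from collections import Counter


def k_fold_indices(n_instances, k):
    """Split indices 0..n_instances-1 into k folds of near-equal size."""
    return [list(range(i, n_instances, k)) for i in range(k)]


def majority_class(labels):
    """Baseline 'classifier': always predict the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


def cross_validate(labels, k):
    """Return the mean accuracy of the majority-class baseline over k folds."""
    folds = k_fold_indices(len(labels), k)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        # Train on every instance that is not in the held-out fold
        train = [labels[i] for i in range(len(labels)) if i not in held_out]
        prediction = majority_class(train)
        correct = sum(1 for i in test_idx if labels[i] == prediction)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Hypothetical class labels for ten instances
labels = ["yes", "yes", "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]
print(k_fold_indices(len(labels), 5))
print(cross_validate(labels, 5))
```

Every instance is used for testing exactly once and for training in the other folds, which is what distinguishes cross validation from evaluation on a single separate test data set.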
  23. CLUSTERING
  24. • Clustering allows a user to group data in order to determine patterns from the data.
      • Clustering has its advantages when the data set is defined and a general pattern needs to be determined from the data.
      • We can create a specific number of groups, depending on the business needs.
  25. • One defining benefit of clustering over classification is that every attribute in the data set is used to analyze the data (whereas in the classification method, only a subset of the attributes is used in the model).
  26. Clustering SimpleKMeans
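The slide names SimpleKMeans but its content is not preserved here. The following is a minimal Python sketch of the general k-means procedure (assign each point to its nearest centroid, then move each centroid to the mean of its points). It is not Weka's SimpleKMeans implementation; the points and the choice of fixed starting centroids are assumptions for illustration.

```python
import math


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of the points currently assigned to it."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            new_centroids.append(old)  # keep an empty cluster's centroid in place
            continue
        dims = len(members[0])
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(dims))
        )
    return new_centroids


def k_means(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the assignment stops changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
        centroids = update(points, labels, centroids)
    return labels, centroids


# Hypothetical 2-D data with two visible groups; starting centroids are fixed, not random
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = k_means(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

The number of starting centroids fixes the number of groups in advance, and every coordinate of every point contributes to the distance computation.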
  27. ASSOCIATION
  28. There are a few association rule algorithms implemented in WEKA. They try to find associations between different attributes instead of trying to predict the value of the class attribute.
  29. Association Rules (A=>B)
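A rule A=>B is commonly judged by its support and confidence. The sketch below computes both in plain Python over a made-up list of transactions; the item names and helper functions are assumptions for illustration and do not reproduce any specific Weka algorithm.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(B | A) for the rule A=>B as support(A and B) / support(A).

    Assumes the antecedent occurs in at least one transaction.
    """
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)


# Hypothetical market-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
a, b = {"bread"}, {"milk"}
print(support(transactions, a | b))    # share of transactions containing both A and B
print(confidence(transactions, a, b))  # share of A-transactions that also contain B
```

Rule-mining algorithms typically keep only rules whose support and confidence exceed user-chosen thresholds.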
  30. 3D Visualising
  31. Conclusion
      The overall goal of Weka is to build a state-of-the-art facility for developing machine learning (ML) techniques and to allow people to apply them to real-world data mining problems.
  32. Thank You!
