Weka presentation

Published in: Technology, Education

Weka presentation

  1. 1. WEKA BY: GAURAV SINGH (CENTRAL UNIVERSITY OF BIHAR)
  2. 2. INTRODUCTION TO WEKA
     An open-source collection of data mining and machine learning algorithms, including:
     - Data pre-processing
     - Classification
     - Clustering
     - Association rule extraction
     - 3D visualization
     Developed by researchers at the University of Waikato in New Zealand. Written entirely in Java (also open source).
  3. 3. Weka Main Features
     - 71 data pre-processing tools
     - 52 classification/regression algorithms
     - 7 clustering algorithms
     - 9 attribute/subset evaluators + 3 search algorithms for feature selection
     - 3 algorithms for finding association rules
     - 3 graphical user interfaces: “The Explorer”, “The Experimenter”, “The Knowledge Flow”
  4. 4. Weka: Download and Installation
     - Download Weka (the stable version) from http://www.cs.waikato.ac.nz/ml/weka/
       - Choose a self-extracting executable (including Java VM)
     - After the download completes, run the self-extracting file to install Weka, and use the default settings.
  5. 5. GOAL
     The program aims to build a state-of-the-art facility for developing machine learning techniques and investigating their application in key areas. Specifically, we will create a workbench for machine learning, determine the factors that contribute to its successful application in agriculture, industry, and scientific research, and develop new machine learning methods and ways of assessing their effectiveness.
  6. 6. Start Weka
     From the Windows desktop:
     - Click “Start”, choose “All Programs”
     - Choose “Weka 3.7.9” to start Weka
     Then the first interface window appears: Weka GUI Chooser
  7. 7. WEKA APPLICATION INTERFACES
  8. 8. Explorer: An environment for exploring data with WEKA. It gives access to all the facilities using menu selection and form filling.
     Experimenter: Helps answer the question: which methods and parameter values work best for the given problem?
     Knowledge Flow: Offers the same functions as the Explorer and supports incremental learning. It allows designing configurations for streamed data processing. Incremental algorithms can be used to process very large datasets.
  9. 9. WEKA Application Interface
     Simple CLI: It provides a simple command-line interface for directly executing WEKA commands.
  10. 10. WEKA FUNCTIONS AND TOOLS
  11. 11. Preprocessing filters; attribute selection; classification/regression; clustering; association discovery; visualization
  12. 12. LOAD DATA FILE AND PREPROCESSING
  13. 13. Load a data file in one of these formats: ARFF, CSV, C4.5, binary. Import from a URL or an SQL database (using JDBC).
      Preprocessing filters:
      - Adding/removing attributes
      - Attribute value substitution
      - Discretization
      - Time series filters (delta, shift)
      - Sampling, randomization
      - Missing value management
      - Normalization and other numeric transformations
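
Two of the preprocessing filters listed above, normalization and discretization, can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not Weka's filter implementation: the function names `min_max_normalize` and `equal_width_bins` and the sample temperatures are hypothetical, and equal-width binning is only one of several possible discretization schemes.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Discretize numeric values into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical sample attribute values.
temperatures = [64, 70, 75, 85]
normalized = min_max_normalize(temperatures)
bins = equal_width_bins(temperatures, 3)
print(normalized)
print(bins)
```

Normalization keeps the attribute numeric, while discretization turns it into a small set of ordered categories, which some learning schemes require.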
  14. 14. WEKA DATA FORMATS
  15. 15. FOUR FORMATS
      - ARFF (Attribute-Relation File Format) has two sections:
        - The Header section defines the relation name and the attribute names and types.
        - The Data section lists the data records.
      - CSV: Comma-Separated Values (text file)
      - C4.5: A format used by the C4.5 decision tree induction algorithm; requires two separate files:
        - Names file: defines the names of the attributes
        - Data file: lists the records (samples)
      - Binary
      Data can also be read from a URL or from an SQL database (using JDBC).
  16. 16. ATTRIBUTE RELATION FILE FORMAT (ARFF)
      An ARFF file consists of two distinct sections:
      - The Header section defines the relation name and the attribute names and types. Each declaration starts with a keyword:
          @Relation <data-name>
          @attribute <attribute-name> <type> or {range}
      - The Data section lists the data records. It starts with the keyword @Data, followed by the list of data instances.
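
The two-section layout described above can be illustrated with a small plain-Python reader. This is a rough sketch, not Weka's ARFF loader: `parse_arff` is a hypothetical helper, the sample data is an invented toy dataset, and the sketch ignores quoting, sparse instances, and missing values. Keywords are matched case-insensitively.

```python
def parse_arff(text):
    """Parse a minimal ARFF document into (relation, attributes, rows)."""
    relation = None
    attributes = []  # list of (name, type) pairs from the Header section
    rows = []        # list of value lists from the Data section
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, attr_type = line.split(None, 2)
            attributes.append((name, attr_type))
        elif keyword == "@data":
            in_data = True  # everything after this line is a data record
    return relation, attributes, rows


# Hypothetical toy dataset, for illustration only.
SAMPLE = """\
% Toy weather data
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation)
print(attributes)
print(rows)
```

Nominal attributes list their allowed values in braces, while numeric attributes use a type name; every record in the Data section supplies one value per declared attribute, in order.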
  17. 17. Example
  18. 18. WEKA SYSTEM HIERARCHY
  19. 19. Role of WEKA: INPUT (raw data) → Data mining by WEKA (pre-processing, classification, regression, clustering, association rules, visualization) → OUTPUT (result)
  20. 20. KDD Process of WEKA: Data → Selection → Preprocessing → Transformation → Data Mining → Interpretation/Evaluation → Knowledge
  21. 21. CLASSIFICATION
  22. 22. The predicted target must be categorical.
      Implemented methods:
      - Decision trees (J48) and rules
      - Naive Bayes
      - Neural networks
      - Instance-based classifiers
      Evaluation methods:
      - Test data set
      - Cross-validation
      (Example)
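
Cross-validation, one of the evaluation methods listed above, can be sketched as follows. This is a simplified illustration, not Weka's evaluation code: the helper names and labels are hypothetical, the folds are assigned round-robin without shuffling or stratification, and the "classifier" is a trivial baseline that always predicts the most frequent class in the training folds.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Yield (train, test) index lists; fold f tests on indices f, f+k, f+2k, ..."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


def majority_class(labels):
    """Baseline 'model': the most frequent label in the training data."""
    return Counter(labels).most_common(1)[0][0]


def cross_validate(labels, k):
    """Return the accuracy of the majority-class baseline under k-fold cross-validation."""
    correct = 0
    for train, test in k_fold_indices(len(labels), k):
        prediction = majority_class([labels[i] for i in train])
        correct += sum(1 for i in test if labels[i] == prediction)
    return correct / len(labels)


# Hypothetical class labels: 7 "yes" and 3 "no".
labels = ["yes", "yes", "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]
accuracy = cross_validate(labels, 5)
print(accuracy)
```

Every instance is used for testing exactly once and for training in the remaining folds, so the accuracy estimate does not depend on a single train/test split.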
  23. 23. CLUSTERING
  24. 24. Clustering allows a user to make groups of data to determine patterns from the data. Clustering has its advantages when the data set is defined and a general pattern needs to be determined from the data. You can create a specific number of groups, depending on your business needs.
  25. 25. One defining benefit of clustering over classification is that every attribute in the data set will be used to analyze the data (whereas in the classification method, only a subset of the attributes is used in the model).
  26. 26. Clustering SimpleKMeans
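
The k-means idea behind a clusterer such as SimpleKMeans can be sketched in plain Python. This is an illustrative sketch only, not Weka's SimpleKMeans: the function `k_means`, its naive initialization (the first k points become the initial centroids), and the sample points are all assumptions made for the example.

```python
def k_means(points, k, iterations=10):
    """Cluster tuples of numbers into k groups with basic k-means."""
    centroids = [points[i] for i in range(k)]  # naive init: the first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0), (2.0, 1.0), (8.0, 9.0)]
centroids, clusters = k_means(points, 2)
print(centroids)
print(clusters)
```

The number of groups k is chosen up front, which matches the point made earlier that a specific number of groups can be requested; the result can depend on how the centroids are initialized.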
  27. 27. ASSOCIATION
  28. 28. A few association rule algorithms are implemented in WEKA. They try to find associations between different attributes instead of trying to predict the value of the class attribute.
  29. 29. Association Rules (A=>B)
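
A rule A=>B is commonly scored by its support (how often A and B occur together) and its confidence (how often B occurs when A does). The sketch below computes both in plain Python; it is not one of Weka's association rule algorithms, and the function names and shopping-basket transactions are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent => consequent.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(rule_support)
print(rule_confidence)
```

Rule-mining algorithms typically keep only the rules whose support and confidence exceed user-chosen thresholds.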
  30. 30. 3D Visualising
  31. 31. Conclusion: The overall goal of Weka is to build a state-of-the-art facility for developing machine learning (ML) techniques and allow people to apply them to real-world data mining problems.
  32. 32. Thank You !!!