WEKA: Output Knowledge Representation
    Presentation Transcript

    • Output: Knowledge Representation
    • Topics Covered
      We will see how knowledge can be represented:
      Decision tables
      Decision trees
      Classification and Association rules
      Dealing with complex rules involving exceptions and relations
      Trees for numeric prediction
      Instance-based representation
      Clustering
    • Decision Tables
      The simplest way to represent the output is to use the same form in which the input was represented
      The selection of attributes is crucial
      Only attributes that contribute to the result should be part of the table; a minimal sketch of the idea follows below
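      A decision table is essentially a lookup from attribute-value combinations to a class. The Java sketch below illustrates this; the attributes outlook and humidity and their values are illustrative (borrowed from the familiar weather data), not part of WEKA's API.

import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a decision table: each row maps a combination of
// attribute values to a class. Attribute names are illustrative.
public class DecisionTableDemo {
    private final Map<String, String> rows = new HashMap<>();

    // Store a row keyed by the attribute values it tests on.
    void addRow(String outlook, String humidity, String playClass) {
        rows.put(outlook + "|" + humidity, playClass);
    }

    // Classify a new instance by looking up its attribute values.
    String classify(String outlook, String humidity) {
        return rows.getOrDefault(outlook + "|" + humidity, "unknown");
    }

    public static void main(String[] args) {
        DecisionTableDemo dt = new DecisionTableDemo();
        dt.addRow("sunny", "high", "no");
        dt.addRow("sunny", "normal", "yes");
        dt.addRow("overcast", "high", "yes");
        System.out.println(dt.classify("sunny", "normal")); // prints: yes
    }
}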
    • Decision Trees
      A divide-and-conquer approach produces results in the form of a decision tree
    • Nodes in a decision tree test a particular attribute
      Leaf nodes give a classification that applies to all instances that reach the leaf
      The number of children emerging from a node depends on the type of attribute being tested at that node
      For a nominal attribute, the number of splits is generally the number of distinct values the attribute can take
      For example, outlook gets three splits because it has three possible values
      For a numeric attribute, there is generally a two-way split that compares the attribute's value against a constant (less than or greater than)
      For example, the attribute humidity in the weather data; both node types are sketched below
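      A minimal Java sketch of these node types, assuming the weather attributes outlook (nominal) and humidity (numeric); this is an illustration, not WEKA's internal tree representation (requires Java 16+ for records).

import java.util.Map;

public class TreeDemo {
    interface Node { String classify(Map<String, Object> instance); }

    // Leaf: assigns one class to every instance that reaches it.
    record Leaf(String label) implements Node {
        public String classify(Map<String, Object> inst) { return label; }
    }

    // Nominal test: one child per distinct attribute value.
    record NominalNode(String attr, Map<String, Node> children) implements Node {
        public String classify(Map<String, Object> inst) {
            return children.get((String) inst.get(attr)).classify(inst);
        }
    }

    // Numeric test: two-way split against a constant threshold.
    record NumericNode(String attr, double threshold, Node below, Node above) implements Node {
        public String classify(Map<String, Object> inst) {
            double v = (Double) inst.get(attr);
            return (v <= threshold ? below : above).classify(inst);
        }
    }

    public static void main(String[] args) {
        Node humidityTest = new NumericNode("humidity", 75.0, new Leaf("yes"), new Leaf("no"));
        Node root = new NominalNode("outlook", Map.of(
                "sunny", humidityTest,
                "overcast", new Leaf("yes"),
                "rainy", new Leaf("yes")));
        System.out.println(root.classify(Map.of("outlook", "sunny", "humidity", 80.0))); // prints: no
    }
}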
    • Classification Rules
      A popular alternative to decision trees
      The antecedent, or precondition, of a rule is a series of tests (like the tests at the nodes of a decision tree)
      The consequent, or conclusion, gives the class or classes that apply to instances covered by the rule; a small sketch follows below
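      A minimal sketch of a rule as an antecedent (a conjunction of tests) plus a consequent (the class). The rule "if outlook = sunny and humidity > 75 then play = no" is illustrative, in the spirit of the weather data.

import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class RuleDemo {
    private final List<Predicate<Map<String, Object>>> antecedent; // the tests
    private final String consequent;                               // the class

    RuleDemo(List<Predicate<Map<String, Object>>> antecedent, String consequent) {
        this.antecedent = antecedent;
        this.consequent = consequent;
    }

    // Returns the class if every test in the antecedent succeeds, else null.
    String apply(Map<String, Object> instance) {
        for (Predicate<Map<String, Object>> test : antecedent)
            if (!test.test(instance)) return null;
        return consequent;
    }

    public static void main(String[] args) {
        // if outlook = sunny and humidity > 75 then play = no
        RuleDemo r = new RuleDemo(List.of(
                inst -> "sunny".equals(inst.get("outlook")),
                inst -> (Double) inst.get("humidity") > 75.0),
                "no");
        System.out.println(r.apply(Map.of("outlook", "sunny", "humidity", 80.0))); // prints: no
    }
}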
    • Rules vs. Trees
      The replicated subtree problem
      Sometimes the transformation of rules into a tree is impractical
      Consider the following classification rules and the corresponding decision tree: whichever attribute the tree tests at the root, the subtree testing the other pair of attributes must be replicated beneath it
      If a and b then x
      If c and d then x
    • Advantages of rules over trees
      Rules are usually more compact than trees, as we observed in the replicated subtree problem
      New rules can be added to an existing rule set without disturbing the ones already there, whereas a tree may require complete reshaping
      Advantages of trees over rules
      Because of the redundancy present in a tree, ambiguities are avoided
      An instance may be encountered that the rules fail to classify; this is usually not the case with trees
    • Disjunctive Normal Form
      A rule set in disjunctive normal form follows the closed-world assumption
      The closed-world assumption avoids ambiguities
      Such a rule set is written as a logical expression, that is, a:
      Disjunction (OR) of
      Conjunctions (AND) of conditions, as sketched below
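      A minimal sketch of a rule set in disjunctive normal form, reusing the two rules from the replicated-subtree example. Under the closed-world assumption, an instance matching no conjunct gets the default (negative) class. The boolean attributes a, b, c, d are illustrative.

import java.util.Map;

public class DnfDemo {
    static String classify(Map<String, Boolean> inst) {
        boolean x = (inst.get("a") && inst.get("b"))   // if a and b then x
                 || (inst.get("c") && inst.get("d"));  // if c and d then x
        return x ? "x" : "not-x"; // closed world: everything else is not-x
    }

    public static void main(String[] args) {
        System.out.println(classify(Map.of("a", true, "b", true, "c", false, "d", false))); // prints: x
        System.out.println(classify(Map.of("a", true, "b", false, "c", false, "d", true))); // prints: not-x
    }
}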
    • Association Rules
      Association rules can predict any attribute, not just the class
      They can also predict combinations of attribute values
      To select association rules that apply to a large number of instances and have high accuracy, we use the following parameters (computed in the sketch below):
      Coverage (support): the number of instances the rule predicts correctly
      Accuracy (confidence): the number of instances the rule predicts correctly, as a proportion of all instances to which it applies
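      A minimal sketch of computing coverage (support) and accuracy (confidence) for one association rule over a tiny data set. The rule "if windy = false then play = yes" and the four instances are illustrative.

import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class SupportConfidence {
    public static void main(String[] args) {
        List<Map<String, String>> data = List.of(
                Map.of("windy", "false", "play", "yes"),
                Map.of("windy", "false", "play", "yes"),
                Map.of("windy", "false", "play", "no"),
                Map.of("windy", "true", "play", "no"));

        Predicate<Map<String, String>> antecedent = i -> i.get("windy").equals("false");
        Predicate<Map<String, String>> consequent = i -> i.get("play").equals("yes");

        // Instances to which the rule applies (antecedent holds).
        long applies = data.stream().filter(antecedent).count();
        // Instances the rule predicts correctly (antecedent and consequent hold).
        long correct = data.stream().filter(antecedent.and(consequent)).count();

        System.out.println("coverage/support    = " + correct);                    // prints: 2
        System.out.println("accuracy/confidence = " + (double) correct / applies); // prints 2/3 = 0.666...
    }
}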
    • Rules with Exceptions
      For classification rules
      Exceptions can be expressed using the 'except' keyword
      We can have exceptions to exceptions, and so on
      Exceptions allow rule sets to scale up well; nested exceptions read naturally as nested conditionals, as sketched below
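      A minimal sketch of a rule with nested exceptions, written as nested conditionals. The rule "if outlook = sunny then play = yes, except if humidity > 75 then play = no, except if windy = false then play = yes" is illustrative.

public class ExceptionRules {
    static String classify(String outlook, double humidity, boolean windy) {
        if (outlook.equals("sunny")) {
            if (humidity > 75.0) {        // exception to the main rule
                if (!windy) return "yes"; // exception to the exception
                return "no";
            }
            return "yes";                 // default conclusion of the rule
        }
        return "unknown";                 // the rule does not fire
    }

    public static void main(String[] args) {
        System.out.println(classify("sunny", 80.0, true));  // prints: no
        System.out.println(classify("sunny", 80.0, false)); // prints: yes
    }
}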
    • Rules with Relations
      We generally use propositional rules, where an attribute is compared with a constant, for example a test such as humidity > 75
      Relational rules are those that express a relationship between attributes, for example a test such as width > height; both kinds are sketched below
    • Standard Relations:
      Equality (=) and inequality (!=) for nominal attributes
      Comparison operators such as < and > for numeric attributes
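      A minimal sketch contrasting a propositional test (attribute versus constant) with a relational test (attribute versus attribute). The attributes width and height and the threshold are illustrative, in the spirit of the classic standing-versus-lying shapes example.

import java.util.Map;

public class RelationalRules {
    // Propositional: compares an attribute with a constant.
    static boolean propositional(Map<String, Double> inst) {
        return inst.get("height") > 3.5;
    }

    // Relational: expresses a relationship between two attributes.
    static boolean relational(Map<String, Double> inst) {
        return inst.get("width") > inst.get("height");
    }

    public static void main(String[] args) {
        Map<String, Double> shape = Map.of("width", 7.0, "height", 3.0);
        System.out.println("height > 3.5   : " + propositional(shape)); // prints: false
        System.out.println("width > height : " + relational(shape));    // prints: true
    }
}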
    • Trees for Numerical Prediction
      The same kind of decision tree can be used for numeric prediction
      The right-hand side of a rule, or a leaf of the tree, then contains a numeric value that is the average of all the training-set values to which the rule or leaf applies
      Prediction of numeric quantities is called regression
      Therefore trees for numeric prediction are called regression trees; such a leaf is sketched below
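      A minimal sketch of a regression-tree leaf: its prediction is simply the average of the training-set target values routed to it. Illustrative, not WEKA's internal representation.

import java.util.List;

public class RegressionLeaf {
    private final double prediction;

    RegressionLeaf(List<Double> targetValuesAtLeaf) {
        // The leaf predicts the mean of the values that reached it in training.
        this.prediction = targetValuesAtLeaf.stream()
                .mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    double predict() { return prediction; }

    public static void main(String[] args) {
        RegressionLeaf leaf = new RegressionLeaf(List.of(12.0, 15.0, 18.0));
        System.out.println(leaf.predict()); // prints: 15.0
    }
}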
    • Instance-based learning
      In instance-based learning we do not create rules; the stored instances themselves are used directly
      All the real work is done when a new instance is classified; there is no pre-processing of the training set
      A new instance is compared with the existing ones using a distance metric
      The closest existing instance, according to that metric, is used to assign a class to the new one
    • Sometimes more than one nearest neighbor is used: the majority class of the k closest neighbors is assigned to the new instance
      This technique is called the k-nearest-neighbor method
      The distance metric should suit the data set; the most popular is Euclidean distance
      For nominal attributes the distance metric has to be defined manually, for example:
      If two attribute values are equal the distance is 0, otherwise it is 1; a minimal sketch of the method follows below
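      A minimal sketch of k-nearest-neighbor classification with Euclidean distance on numeric attributes. The training data and the choice of k are illustrative; WEKA provides this method as the IBk classifier, which is not shown here.

import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

public class KnnDemo {
    record Instance(double[] features, String label) {}

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);
    }

    static String classify(Instance[] training, double[] query, int k) {
        // No model is built: sort the stored instances by distance to the query.
        Instance[] sorted = training.clone();
        Arrays.sort(sorted, Comparator.comparingDouble(
                (Instance t) -> euclidean(t.features(), query)));

        // Majority vote among the k closest neighbors.
        Map<String, Integer> votes = new HashMap<>();
        for (int i = 0; i < k && i < sorted.length; i++)
            votes.merge(sorted[i].label(), 1, Integer::sum);
        return votes.entrySet().stream()
                .max(Map.Entry.comparingByValue()).get().getKey();
    }

    public static void main(String[] args) {
        Instance[] train = {
                new Instance(new double[]{1.0, 1.0}, "yes"),
                new Instance(new double[]{1.2, 0.8}, "yes"),
                new Instance(new double[]{5.0, 5.0}, "no")
        };
        System.out.println(classify(train, new double[]{1.1, 1.0}, 3)); // prints: yes
    }
}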
    • Clusters
      When clusters rather than a classifier are learned, the output takes the form of a diagram showing how the instances fall into clusters
      The output can take four forms:
      A clear demarcation of the instances into different clusters
      An instance can be part of more than one cluster, represented by a Venn diagram
      A probability of the instance falling in each cluster, for all the clusters (sketched below)
      A hierarchical, tree-like structure that divides the instances into clusters and sub-clusters, and so on
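      A minimal sketch of the third output form: probabilistic cluster membership, where each instance is assigned a probability for every cluster. The instance names and probabilities are illustrative.

import java.util.Map;

public class ClusterOutput {
    public static void main(String[] args) {
        // Each row sums to 1.0: the instance's probability of falling in
        // clusters 0, 1, and 2 respectively.
        Map<String, double[]> membership = Map.of(
                "instance-a", new double[]{0.9, 0.1, 0.0},
                "instance-b", new double[]{0.2, 0.5, 0.3});

        for (Map.Entry<String, double[]> e : membership.entrySet()) {
            double[] p = e.getValue();
            System.out.printf("%s -> P(c0)=%.1f P(c1)=%.1f P(c2)=%.1f%n",
                    e.getKey(), p[0], p[1], p[2]);
        }
    }
}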
    • Different output types (illustrated by diagrams in the original slides)
    • Visit more self-help tutorials
      Pick a tutorial of your choice and browse through it at your own pace.
      The tutorials section is free and self-guiding, and does not involve any additional support.
      Visit us at www.dataminingtools.net