Rule-Based Classification
 Model – a set of IF-THEN rules
 Example: IF age = youth AND student = yes THEN buys_computer = yes
 Rule antecedent/precondition vs. rule consequent
 Assessment of a rule: coverage and accuracy
 n_covers = # of tuples covered by R
 n_correct = # of tuples correctly classified by R
 coverage(R) = n_covers / |D| /* D: training data set */
 accuracy(R) = n_correct / n_covers
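A minimal Python sketch of these two measures, assuming each training tuple is a dict of attribute → value and a rule is an (antecedent, consequent) pair; the function names are illustrative, not from the slides:

```python
# Sketch: rule coverage and accuracy. A tuple is a dict of attribute -> value;
# a rule is (antecedent dict, (class_attribute, class_value)).

def covers(antecedent, row):
    """True if the row satisfies every condition in the rule antecedent."""
    return all(row.get(attr) == val for attr, val in antecedent.items())

def coverage(rule, data):
    """coverage(R) = n_covers / |D|"""
    antecedent, _ = rule
    return sum(covers(antecedent, row) for row in data) / len(data)

def accuracy(rule, data):
    """accuracy(R) = n_correct / n_covers"""
    antecedent, (class_attr, class_val) = rule
    covered = [row for row in data if covers(antecedent, row)]
    if not covered:
        return 0.0  # rule covers nothing: accuracy is undefined, report 0
    return sum(row[class_attr] == class_val for row in covered) / len(covered)
```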
Rule Accuracy and Coverage
age      income  student  credit_rating  buys_computer
<=30     high    no       fair           no
<=30     high    no       excellent      no
31…40    high    no       fair           yes
>40      medium  no       fair           yes
>40      low     yes      fair           yes
>40      low     yes      excellent      no
31…40    low     yes      excellent      yes
<=30     medium  no       fair           no
<=30     low     yes      fair           yes
>40      medium  yes      fair           yes
<=30     medium  yes      excellent      yes
31…40    medium  no       excellent      yes
31…40    high    yes      fair           yes
>40      medium  no       excellent      no
IF age = youth AND student = yes THEN buys_computer = yes
Coverage = 2/14 ≈ 14.29%
Accuracy = 2/2 = 100%
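As a quick check of the figures above, the sketch below encodes the 14 tuples as (age, student, buys_computer) triples (income and credit_rating are dropped because the rule does not test them) and counts coverage and accuracy directly; the encoding is an assumption for illustration:

```python
# The 14 training tuples from the slide, as (age, student, buys_computer).
data = [
    ("<=30", "no", "no"), ("<=30", "no", "no"), ("31…40", "no", "yes"),
    (">40", "no", "yes"), (">40", "yes", "yes"), (">40", "yes", "no"),
    ("31…40", "yes", "yes"), ("<=30", "no", "no"), ("<=30", "yes", "yes"),
    (">40", "yes", "yes"), ("<=30", "yes", "yes"), ("31…40", "no", "yes"),
    ("31…40", "yes", "yes"), (">40", "no", "no"),
]

# Rule: IF age = youth (<=30) AND student = yes THEN buys_computer = yes
covered = [t for t in data if t[0] == "<=30" and t[1] == "yes"]
correct = [t for t in covered if t[2] == "yes"]

print(f"coverage = {len(covered)}/{len(data)} = {len(covered)/len(data):.2%}")   # 2/14 ≈ 14.29%
print(f"accuracy = {len(correct)}/{len(covered)} = {len(correct)/len(covered):.0%}")  # 2/2 = 100%
```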
If-Then Rules
 Rule Triggering
 A rule is triggered when input X satisfies its antecedent
 Several rules are triggered – Conflict Resolution
 Size Ordering
 Highest priority to the toughest rule, i.e., the one with the largest antecedent (most attribute tests)
 Rule Ordering
 Rules are prioritized beforehand
 Class based ordering
 Rules for the most prevalent class come first, or classes are ordered by misclassification cost
 Rule-based ordering
 Rule Quality based measures
 Ordered list – Decision list – Must be processed strictly in order
 No rule is triggered – Default rule
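A compact sketch of rule-based ordering as a decision list, assuming rules are stored as (antecedent dict, class) pairs in priority order and a default class handles inputs that trigger no rule; the rule list shown is illustrative:

```python
# Sketch: a decision list. Rules are tried strictly in priority order; the first
# rule whose antecedent the input satisfies fires, and a default class covers
# inputs that trigger no rule at all.

def classify(decision_list, default_class, x):
    for antecedent, predicted_class in decision_list:      # rules prioritized beforehand
        if all(x.get(attr) == val for attr, val in antecedent.items()):
            return predicted_class                         # first triggered rule wins
    return default_class                                   # no rule triggered

rules = [                                                  # order = priority
    ({"age": "youth", "student": "yes"}, "yes"),
    ({"age": "youth", "student": "no"}, "no"),
    ({"age": "middle_aged"}, "yes"),
]
print(classify(rules, "yes", {"age": "youth", "student": "yes"}))   # -> yes
```

Size ordering would instead sort the list by antecedent length (largest first) before classifying.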
Rule Extraction from a Decision Tree
 Rules are easier to understand than large trees
 One rule is created for each path from the root to a leaf
 Each attribute-value pair along a path forms a conjunction; the leaf holds the class prediction
 Rules are mutually exclusive and exhaustive (so unordered)
 Example: rule extraction from the buys_computer decision tree
[Decision tree figure: root tests age; <=30 → student? (no → no, yes → yes); 31..40 → yes; >40 → credit rating? (excellent → no, fair → yes)]
IF age = young AND student = no THEN buys_computer = no
IF age = young AND student = yes THEN buys_computer = yes
IF age = mid-age THEN buys_computer = yes
IF age = old AND credit_rating = excellent THEN buys_computer = no
IF age = old AND credit_rating = fair THEN buys_computer = yes
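A small sketch of the path-to-rule traversal, assuming the tree is encoded as a nested dict of {attribute: {branch value: subtree or class leaf}}; this encoding, and the function name extract_rules, are illustrative choices, not the slides' notation:

```python
# Sketch: one IF-THEN rule per root-to-leaf path; the attribute = value tests
# along a path are ANDed, and the leaf supplies the class prediction.

def extract_rules(tree, class_name="buys_computer", path=()):
    if not isinstance(tree, dict):                         # leaf: emit one rule
        conds = " AND ".join(f"{a} = {v}" for a, v in path) or "TRUE"
        return [f"IF {conds} THEN {class_name} = {tree}"]
    (attribute, branches), = tree.items()                  # internal node: one attribute test
    rules = []
    for value, subtree in branches.items():
        rules += extract_rules(subtree, class_name, path + ((attribute, value),))
    return rules

# buys_computer tree from the slide (leaves are class labels):
tree = {"age": {"young": {"student": {"no": "no", "yes": "yes"}},
                "mid-age": "yes",
                "old": {"credit_rating": {"excellent": "no", "fair": "yes"}}}}
for rule in extract_rules(tree):
    print(rule)
```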
Rule Extraction from a Decision Tree
 The number of extracted rules can be very large
 Pruning may be required
 Rule generalization – for a given rule antecedent, any condition that does not improve the estimated accuracy can be dropped (see the sketch below)
 Side effect of pruning: the rules may no longer be mutually exclusive and/or exhaustive
 C4.5 – Class Ordering for Conflict resolution
 All rules for a single class are grouped together
 Class rule sets are ranked – to minimize false-positive errors
 Default class – the class that contains the most training tuples not covered by any rule
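A sketch of the rule-generalization step mentioned above, under the assumption that "estimated accuracy" is measured on a held-out data set passed in as data and that tuples carry a buys_computer class attribute; conditions whose removal does not lower that estimate are dropped greedily:

```python
# Sketch: greedy rule generalization - repeatedly drop the antecedent condition
# whose removal does not decrease the estimated accuracy of the rule.

def estimated_accuracy(antecedent, class_val, data, class_attr="buys_computer"):
    covered = [r for r in data
               if all(r.get(a) == v for a, v in antecedent.items())]
    if not covered:
        return 0.0
    return sum(r[class_attr] == class_val for r in covered) / len(covered)

def generalize(antecedent, class_val, data):
    antecedent = dict(antecedent)
    improved = True
    while improved and len(antecedent) > 1:
        improved = False
        base = estimated_accuracy(antecedent, class_val, data)
        for attr in list(antecedent):
            trial = {a: v for a, v in antecedent.items() if a != attr}
            if estimated_accuracy(trial, class_val, data) >= base:
                antecedent = trial                         # condition adds nothing: drop it
                improved = True
                break
    return antecedent
```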
Rule Extraction from the Training Data
 Sequential covering algorithm: Extracts rules directly from training data
 Associative Classification Algorithms – may also be used
 Typical sequential covering algorithms: FOIL (First Order Inductive Learner), AQ,
CN2, RIPPER
 Rules are learned sequentially; each rule for a given class Ci will cover many tuples of Ci but none (or few) of the tuples of other classes
 Steps:
 Rules are learned one at a time
 Each time a rule is learned, the tuples covered by the rule are removed
 The process repeats on the remaining tuples until a termination condition holds, e.g., no more training examples remain, or the quality of the last rule returned is below a user-specified threshold
Rule Extraction from the Training Data
 Algorithm: Sequential Covering
 Input: D, Att_vals
 Output: IF-THEN rules
 Method:
    Rule_set = {}
    For each class c do
        Repeat
            Rule = Learn_One_Rule(D, Att_vals, c) // finds the best rule for the given class
            Remove tuples covered by Rule from D
            Rule_set = Rule_set + Rule // add the new rule to the rule set
        Until terminating condition
    End for
    Return Rule_set
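The same loop as a runnable Python sketch; it assumes tuples are dicts with a "class" key, that rules are (antecedent dict, class) pairs, and that a learn_one_rule function is supplied (a greedy version is sketched after the next slide). The min_accuracy threshold stands in for the unspecified terminating condition:

```python
# Sketch of the sequential-covering loop: for each class, learn one rule at a
# time, remove the tuples it covers, and stop when no acceptable rule is found.

def sequential_covering(data, att_vals, classes, learn_one_rule, min_accuracy=0.9):
    def accuracy(rule, rows):
        antecedent, c = rule
        covered = [r for r in rows if all(r.get(a) == v for a, v in antecedent.items())]
        return sum(r["class"] == c for r in covered) / len(covered) if covered else 0.0

    rule_set = []
    for c in classes:                                      # For each class c do
        remaining = list(data)
        while remaining:                                   # Repeat ...
            rule = learn_one_rule(remaining, att_vals, c)  # best rule for class c
            if rule is None or accuracy(rule, remaining) < min_accuracy:
                break                                      # ... Until terminating condition
            rule_set.append(rule)                          # Rule_set = Rule_set + Rule
            antecedent, _ = rule
            remaining = [r for r in remaining              # remove tuples covered by Rule
                         if not all(r.get(a) == v for a, v in antecedent.items())]
    return rule_set
```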
Rule Extraction from the Training Data
 Learn_One_Rule starts with the most general rule possible: condition = empty
 New attribute tests are added greedily (depth-first strategy)
 Pick the test that most improves rule quality
 Example:
 Start with IF _ THEN loan_decision = accept
 Consider IF loan_term = short THEN ... / IF loan_term = long THEN ... / IF income = high THEN ... / IF income = medium THEN ... / ...
 If the best one is IF income = high THEN loan_decision = accept, expand it further
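A greedy, general-to-specific Learn_One_Rule sketch matching the description above; plain accuracy is used as the quality measure for brevity (FOIL-gain, covered on the next slides, is the usual choice), and the "class" key and max_conds cap are assumptions for illustration:

```python
# Sketch: Learn_One_Rule - start with the empty antecedent (most general rule)
# and greedily add the attribute test that most improves rule quality.

def learn_one_rule(data, att_vals, target_class, max_conds=3):
    def quality(antecedent):
        covered = [r for r in data if all(r.get(a) == v for a, v in antecedent.items())]
        return (sum(r["class"] == target_class for r in covered) / len(covered)
                if covered else 0.0)

    antecedent = {}                                        # IF _ THEN class = target_class
    while len(antecedent) < max_conds:
        best_gain, best_test = 0.0, None
        for attr, values in att_vals.items():              # candidate attribute tests
            if attr in antecedent:
                continue
            for val in values:
                candidate = {**antecedent, attr: val}
                gain = quality(candidate) - quality(antecedent)
                if gain > best_gain:
                    best_gain, best_test = gain, (attr, val)
        if best_test is None:                              # no test improves the rule
            break
        antecedent[best_test[0]] = best_test[1]            # keep the best test, expand further
    return (antecedent, target_class) if antecedent else None
```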
Rule Extraction from the Training Data
 Rule quality measures
 Coverage or accuracy alone is not sufficient
 Rule-quality measures consider both coverage and accuracy
 FOIL_Gain (in FOIL & RIPPER): assesses the information gained by extending the rule's condition (see the sketch below)
 FOIL_Gain = pos' × ( log2( pos' / (pos' + neg') ) - log2( pos / (pos + neg) ) )
 R – existing rule; R' – extended rule; pos/neg (pos'/neg') – # of positive/negative tuples covered by R (R')
 It favors rules that have high accuracy and cover many positive tuples
 Likelihood Ratio Statistic
 Likelihood_Ratio = 2 ∑_{i=1..m} f_i log(f_i / e_i)
 f_i – observed frequency of class i among the tuples the rule covers; e_i – frequency expected if the rule made random predictions; m – number of classes
 The greater this value, the higher the significance of the rule
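A direct transcription of FOIL_Gain; the counts passed in (positive/negative tuples covered by R and by the extended rule R') and the example numbers are hypothetical:

```python
from math import log2

def foil_gain(pos, neg, pos_new, neg_new):
    """FOIL_Gain = pos' * (log2(pos'/(pos'+neg')) - log2(pos/(pos+neg)))."""
    if pos == 0 or pos_new == 0:
        return 0.0                       # no gain when no positive tuples are covered
    return pos_new * (log2(pos_new / (pos_new + neg_new)) - log2(pos / (pos + neg)))

# Example: extending R (covers 50 pos / 50 neg) to R' (covers 40 pos / 10 neg):
print(foil_gain(50, 50, 40, 10))         # ≈ 40 * (log2(0.8) - log2(0.5)) ≈ 27.1
```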
Rule Extraction from the Training Data
 Rule pruning is assessed on an independent set of test tuples (a pruning set)
 FOIL_Prune(R) = (pos - neg) / (pos + neg)
 pos/neg are the # of positive/negative tuples covered by R on that set
 If FOIL_Prune is higher for the pruned version of R, prune R
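And a direct transcription of FOIL_Prune with the pruning decision; the counts are hypothetical pruning-set counts:

```python
def foil_prune(pos, neg):
    """FOIL_Prune(R) = (pos - neg) / (pos + neg), evaluated on the pruning set."""
    return (pos - neg) / (pos + neg) if (pos + neg) else 0.0

# Prune R (e.g., drop its last condition) if the pruned version scores higher:
score_full   = foil_prune(pos=30, neg=10)    # hypothetical counts for R
score_pruned = foil_prune(pos=48, neg=12)    # hypothetical counts for the pruned R
if score_pruned > score_full:
    print("prune R")                         # 0.6 > 0.5, so R is pruned here
```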