# Covering (Rules-based) Algorithm

## Published on Nov 11, 2007

• 17,815 views


What is the Covering (Rule-based) algorithm?
Classification rules: the straightforward approach
1. If-then rules
2. Generating rules from a decision tree
Rule-based algorithms
1. The 1R Algorithm / Learn One Rule
2. The PRISM Algorithm
3. Other algorithms
Applications of the covering algorithm
Discussion of e/m-learning applications


## Covering (Rules-based) Algorithm: Presentation Transcript

• Chapter 8 Covering (Rules-based) Algorithm Data Mining Technology
• Chapter 8 Covering (Rules-based) Algorithm. Written by Shakhina Pulatova; presented by Zhao Xinyou, [email_address], 2007.11.13. Data Mining Technology. Some materials (examples) are taken from websites.
• Contents
• What is the Covering (Rule-based) algorithm?
• Classification rules: the straightforward approach
• 1. If-then rules
• 2. Generating rules from a decision tree
• Rule-based algorithms
• 1. The 1R Algorithm / Learn One Rule
• 2. The PRISM Algorithm
• 3. Other algorithms
• Applications of the covering algorithm
• Discussion of e/m-learning applications
• Introduction, App 1 (PP87-88): training data (records with attributes) are turned into rules.
• Rules given by people
• Rules generated by computer
• Example rule setting: (0, 1.75) -> short; [1.75, 1.95) -> medium; [1.95, ∞) -> tall
• Introduction, App 2 (PP87-88): how to find all tall people in a new data set B, based on training data A.
• What is Rule-based Algorithm?
• Definition :
• Each classification method uses an algorithm to generate rules from the sample data. These rules are then applied to new data.
• Rule-based algorithms provide mechanisms that generate rules by
• 1. concentrating on a specific class at a time
• 2. maximizing the probability of the desired classification.
PP87-88. Rules should be compact, easy to interpret, and accurate.
• Classification rules: the straightforward approach
• If-then rules
• Generating rules from a decision tree
PP88-89
• Formal specification of rule-based algorithms
• A classification rule, r = <a, c>, consists of:
• a (antecedent/precondition): a series of tests that are evaluated as true or false;
• c (consequent/conclusion): the class or classes that apply to instances covered by rule r.
PP88 [Figure: a truth table over attributes a and b and the equivalent decision tree with classes X and Y.]
• Remarks of Straightforward classification
• The antecedent contains a predicate that can be evaluated as true or false against each tuple in the database.
• These rules relate directly to a corresponding decision tree (DT) that could be created.
• A DT can always be used to generate rules, but the two are not equivalent.
• Differences:
• - the tree has an implied order in which the splitting is performed; rules have no order.
• - a tree is created by looking at all classes; a rule is generated by examining only one class at a time.
PP88-89
• If-Then rule
• A straightforward way to perform classification is to generate if-then rules that cover all cases.
PP88
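The if-then idea can be sketched directly in code. A minimal example, assuming the height thresholds (1.75 m and 1.95 m) from the slide's introductory setting; everything else is illustrative:

```python
def classify_height(height_m):
    # Hand-written if-then rules for the height example:
    # (0, 1.75) -> short, [1.75, 1.95) -> medium, [1.95, inf) -> tall
    if height_m < 1.75:
        return "short"
    elif height_m < 1.95:
        return "medium"
    else:
        return "tall"

print(classify_height(1.6))   # short
print(classify_height(1.8))   # medium
print(classify_height(2.0))   # tall
```

Such a rule set covers all cases: every input falls into exactly one branch.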
• Generating rules from a decision tree (1)
• Generating rules from a decision tree (2) [Figure: a decision tree over attributes a, b, c, d with outcomes x, y, n.]
• Generating rules from a decision tree (3)
• Remarks
• Rules generated from a DT may be more complex and harder to comprehend.
• Adding a new test or rule requires reshaping the whole tree.
• Rules obtained without decision trees are more compact and accurate.
• For this reason, many other covering algorithms have been proposed.
PP89-90 [Figure: the tree contains duplicate (c, d) subtrees; a rule such as "if a=1 and c=0 then Y" captures the same case more compactly.]
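The path-to-rule correspondence described above can be sketched as follows. The nested-dict tree encoding and the example tree are assumptions for illustration, not the slide's exact figure:

```python
def tree_to_rules(tree, path=()):
    # Every root-to-leaf path of a decision tree becomes one if-then rule:
    # the antecedent is the conjunction of (attribute, value) tests on the
    # path, and the consequent is the class label at the leaf.
    if not isinstance(tree, dict):            # leaf: emit one rule
        return [(path, tree)]
    (attr, branches), = tree.items()          # internal node: one attribute test
    rules = []
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree, path + ((attr, value),))
    return rules

# Assumed example tree: a=1 -> Y; a=0 -> test b (b=0 -> X, b=1 -> Y).
tree = {"a": {0: {"b": {0: "X", 1: "Y"}}, 1: "Y"}}
for antecedent, label in tree_to_rules(tree):
    conds = " and ".join(f"{a}={v}" for a, v in antecedent)
    print(f"if {conds} then class={label}")
```

Note how the rule "if a=1 then class=Y" needs no test on b, which is the compactness difference the remarks describe.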
• Rule-based Classification
• Generate rules
• The 1R Algorithm / Learn One Rule
• The PRISM Algorithm
• Other Algorithms
PP90
• Generating rules without Decision Trees-1-con’
• Goal: find rules that identify the instances of a specific class
• Generate the “best” rule possible by optimizing the desired classification probability
• Usually, the “best” attribute-pair is chosen
• Remark
• - these techniques are also called covering algorithms because they attempt to generate rules that exactly cover a specific class.
• Generate Rules-Example-2-Con'
• Example 3
• Question: We want to generate a rule to classify persons as tall. Basic format of the rule:
• if ? then class = tall
• Goal: replace “?” with predicates that can be used to obtain the “best” probability of being tall
PP90
• Generate Rules-Algorithms-3-Con'
• 1.Generate rule R on training data S;
• 2.Remove the training data covered by rule R;
• 3. Repeat the process.
PP90
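Steps 1-3 above form the generic sequential-covering loop. A minimal sketch, where `learn_one_rule` is a hypothetical callback; the toy learner below simply covers the most common value of an assumed attribute "a":

```python
from collections import Counter

def sequential_covering(examples, learn_one_rule):
    # 1. generate rule R on the data; 2. remove the data R covers; 3. repeat.
    rules = []
    remaining = list(examples)
    while remaining:
        rule = learn_one_rule(remaining)
        if rule is None or not any(rule(x) for x in remaining):
            break                                  # no further progress possible
        rules.append(rule)
        remaining = [x for x in remaining if not rule(x)]   # step 2
    return rules

def toy_learn_one_rule(examples):
    # Illustrative learner: cover every example sharing the most common
    # value of attribute "a".
    value, _ = Counter(x["a"] for x in examples).most_common(1)[0]
    return lambda x, v=value: x["a"] == v

data = [{"a": 0}, {"a": 0}, {"a": 1}]
print(len(sequential_covering(data, toy_learn_one_rule)))   # 2
```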
• Generate Rules-Example-4-Con'
• Sequential Covering
[Figure: sequential covering. (i) Original data, r = NULL; (ii) Step 1: learn R1, r = R1; (iii) Step 2: learn R2, r = R1 ∪ R2; (iv) Step 3: learn R3, r = R1 ∪ R2 ∪ R3; some covered points may belong to the wrong class.]
• 1R Algorithm/ Learn One Rule-Con’
• A simple and cheap method:
• it only generates a one-level decision tree.
• Classifies an object on the basis of a single attribute.
• Idea:
• Rules are constructed to test a single attribute, with one branch for every value of that attribute. Each branch is assigned the class that occurs most frequently for that value in the training data.
PP91
• 1R Algorithm/ Learn One Rule-Con’
• Idea:
• 1. Rules are constructed to test a single attribute, with one branch for every value of that attribute.
• Steps
• 2. For each branch, count how often each class occurs in the training data.
• 3. Assign each branch its most frequent class; these assignments form the rule set.
• 4. Evaluate the error rate of each attribute's rule set.
• 5. Choose the attribute with the minimum error rate.
PP91 [Example table: Gender=F: short 2, medium 5, tall 1; Gender=M: short 1, medium 4, tall 10. Rules F -> medium (error 3) and M -> tall (error 5) give total error 8; the totals for attributes A2 ... An are compared and the minimum is chosen.]
• 1R Algorithm
• Input:
• D // training data
• T // attributes to consider for rules
• C // classes
• Output:
• R // rules
• Algorithm:
• R = ∅;
• for all A in T do
• R_A = ∅;
• for all possible values, v, of A do
• for all C_j ∈ C do
• find count(C_j)
• end for
• let C_m be the class with the largest count;
• R_A = R_A ∪ ((A = v) -> (class = C_m));
• end for
• ERR_A = number of tuples incorrectly classified by R_A;
• end for
• R = R_A where ERR_A is minimum
[Example: T = {Gender, Height}, with domains {F, M} and (0, ∞). Training data: Gender=F: short 3, medium 6, tall 0; Gender=M: short 1, medium 2, tall 3. Resulting rules: R1 = F -> medium, R2 = M -> tall.]
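The pseudocode above can be turned into a short runnable sketch. The data reproduce the slide's gender counts (F: 3 short / 6 medium; M: 1 short / 2 medium / 3 tall); the function names are illustrative:

```python
from collections import Counter, defaultdict

def one_r(rows, attributes, target):
    # For each attribute, build one rule per value (value -> majority class),
    # count the misclassifications, and keep the attribute with the fewest.
    best = None
    for attr in attributes:
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[attr]][row[target]] += 1
        rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in by_value.values())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best   # (attribute, {value: class}, total errors)

rows = ([{"gender": "F", "class": "short"}] * 3 +
        [{"gender": "F", "class": "medium"}] * 6 +
        [{"gender": "M", "class": "short"}] * 1 +
        [{"gender": "M", "class": "medium"}] * 2 +
        [{"gender": "M", "class": "tall"}] * 3)
attr, rules, errors = one_r(rows, ["gender"], "class")
print(attr, rules, errors)   # gender {'F': 'medium', 'M': 'tall'} 6
```

This reproduces the slide's result: R1 = F -> medium and R2 = M -> tall, with total error 6/15.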
• Example 5 – 1R (3): rules based on gender and on height

| Option | Attribute | Rules | Error | Total Error |
|---|---|---|---|---|
| 1 | Gender | F -> medium; M -> tall | 3/9; 3/6 | 6/15 |
| 2 | Height (step = 0.1) | (0, 1.6] -> short; (1.6, 1.7] -> short; (1.7, 1.8] -> medium; (1.8, 1.9] -> medium; (1.9, 2.0] -> medium; (2.0, ∞) -> tall | 0/2; 0/2; 0/3; 0/4; 1/2; 0/2 | 1/15 |
• Example 6 – 1R (PP92-93)

| Attribute | Rules | Error | Total Error |
|---|---|---|---|
| 1. outlook | Sunny -> no; Overcast -> yes; Rainy -> yes | 2/5; 0/4; 2/5 | 4/14 |
| 2. temperature | Hot -> no; Mild -> yes; Cool -> yes | 2/4; 2/6; 1/4 | 5/14 |
| 3. humidity | High -> no; Normal -> yes | 3/7; 1/7 | 4/14 |
| 4. windy | False -> yes; True -> no | 2/8; 3/6 | 5/14 |

The minimum total error is 4/14, so the chosen rules are based on outlook (Sunny -> no; Overcast -> yes; Rainy -> yes) or on humidity (High -> no; Normal -> yes).
• PRISM Algorithm-Con’
• PRISM generates rules for each class by looking at the training data and adding rules that completely describe all tuples in that class.
• It generates only correct or "perfect" rules: the accuracy of the rules PRISM constructs is 100% on the training data.
• It measures the success of a rule by p/t, where
• - p is the number of positive instances,
• - t is the total number of instances covered by the rule.
[Example: Gender=Male: p = 10, t = 10, so p/t = 100%; Gender=Female: p = 1, t = 8. Chosen rule: R = (Gender = Male).]
• PRISM Algorithm
• Input: D // training data; C // classes
• Output: R // rules
• Steps:
• 1. Compute p/t for every (attribute -> value) pair.
• 2. Find one or more (attribute -> value) pairs with p/t = 100%.
• 3. Select those (attribute -> value) pairs as rules.
• 4. Remove the covered data and repeat steps 1-3 until no data remain in D.
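The p/t selection in steps 1-2 can be sketched as follows. The data mirror the slide's gender figure (Male: p = 10 of t = 10; Female: p = 1 of t = 8); all names are illustrative:

```python
def best_condition(rows, target_class, conditions):
    # PRISM's rule-quality measure: for each candidate (attribute -> value)
    # condition, t = instances it covers and p = covered instances in the
    # target class; the condition with the highest p/t wins.
    best_name, best_ratio = None, -1.0
    for name, cond in conditions:
        covered = [r for r in rows if cond(r)]
        if not covered:
            continue
        p = sum(1 for r in covered if r["class"] == target_class)
        ratio = p / len(covered)
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    return best_name, best_ratio

rows = ([{"gender": "M", "class": "tall"}] * 10 +
        [{"gender": "F", "class": "tall"}] * 1 +
        [{"gender": "F", "class": "medium"}] * 7)
conditions = [("gender=M", lambda r: r["gender"] == "M"),
              ("gender=F", lambda r: r["gender"] == "F")]
print(best_condition(rows, "tall", conditions))   # ('gender=M', 1.0)
```

Because gender=M reaches p/t = 100%, PRISM would adopt R = (Gender = Male), remove the covered rows, and repeat on the remainder.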
• Example 8: which class may be tall? Compute p/t for each (attribute, value) pair and look for 100%. PP94-95

| Num | (Attribute, value) | p/t |
|---|---|---|
| 1 | Gender = F | 0/9 |
| 2 | Gender = M | 3/6 |
| 3 | Height ≤ 1.6 | 0/2 |
| 4 | 1.6 < Height ≤ 1.7 | 0/2 |
| 5 | 1.7 < Height ≤ 1.8 | 0/3 |
| 6 | 1.8 < Height ≤ 1.9 | 0/4 |
| 7 | 1.9 < Height ≤ 2.0 | 1/2 |
| 8 | 2.0 < Height | 2/2 |

R1 = (2.0 < Height)
• Refining on the remaining data gives R2 = (1.95 < Height ≤ 2.0), so R = R1 ∪ R2. PP94-96

| Num | (Attribute, value) | p/t |
|---|---|---|
| … | 1.9 < Height ≤ 1.95 | 0/1 |
| … | 1.95 < Height ≤ 2.0 | 1/1 |
• Example 9: on which days may we play? Compute the value p/t: the predicate outlook=overcast correctly implies play=yes on all four rows, so R1 = if outlook=overcast, then play=yes.
• Example 8-Con’ R2= if humidity=normal and windy=false, then play=yes
• Example 8-Con’ R3 =….. R = R1 U R2 U R3 U…
• Application of Covering Algorithm
• Deriving classification rules for diagnosing illness, business planning, banking, and government.
• Machine learning
• Text classification (though it is difficult to apply to photos…)
• And so on.
• Application on E-learning/M-learning
• Adaptive and personalized learning materials
• Virtual Group Classification
[Flow: collect the initial learner's information -> classify learning styles (similarity, Bayesian, or rule-based algorithms; Chapters 2-3) -> provide adaptive and personalized materials -> collect learning-style feedback.]
• Discussion