Swapna.C
INTRODUCTION
 One way to learn sets of rules is to first learn a decision tree, then
translate the tree into an equivalent set of rules, one rule for each
leaf node in the tree.
 A second method is to use a genetic algorithm that encodes
each rule set as a bit string and uses genetic search operators to
explore this hypothesis space.
 Here we consider a variety of algorithms that directly learn rule sets
and that differ from the above two methods in two key respects:
 1. They are designed to learn sets of first-order rules that contain
variables. This is significant because first-order rules are much
more expressive than propositional rules.
 2. The algorithms discussed here use sequential covering
algorithms that learn one rule at a time to incrementally grow
the final set of rules.
 Ex: To illustrate first-order rule sets, consider the following two
rules that jointly describe the target concept Ancestor.
Here we use the predicate Parent(x, y) to indicate that
y is the mother or father of x, and the predicate
Ancestor(x, y) to indicate that y is an ancestor of x
related by an arbitrary number of family generations.
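 In the IF-THEN notation used throughout these slides, these two rules are:
    IF Parent(x, y) THEN Ancestor(x, y)
    IF Parent(x, z) ∧ Ancestor(z, y) THEN Ancestor(x, y)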
 One way to see the representational power of first-
order rules is to consider the general purpose
programming language PROLOG.
 In PROLOG, programs are sets of first-order rules such as the two
shown above (rules of this form are also called
Horn clauses). In fact, when stated in a slightly different
syntax the above rules form a valid PROLOG program for
computing the Ancestor relation.
 In this light, a general purpose algorithm capable of
learning such rule sets may be viewed as an algorithm for
automatically inferring PROLOG programs from examples.
 Learning systems based on first-order representations have
been successfully applied to problems such as learning
which chemical bonds fragment in a mass spectrometer
(Buchanan 1976; Lindsay 1980), learning which chemical
substructures produce mutagenic activity (a property
related to carcinogenicity) (Srinivasan et al. 1994), and
learning to design finite element meshes to analyze stresses
in physical structures (Dolsak and Muggleton 1992).
 Here we consider a family of algorithms for learning rule sets based on the
strategy of learning one rule, removing the data it covers,
then iterating this process. Such algorithms are called
sequential covering algorithms.
 These algorithms assume a subroutine LEARN-ONE-RULE that accepts a set of
positive and negative training examples as input, then
outputs a single rule that covers many of the positive
examples and few of the negative examples.
 This output rule should have high accuracy, but not necessarily
high coverage. By high accuracy, we mean the predictions it
makes should be correct. By accepting low coverage, we
mean it need not make predictions for every training
example.
2 SEQUENTIAL COVERING ALGORITHMS
 One obvious approach to learning a set of rules is to
invoke LEARN-ONE-RULE on all the available training
examples, remove any positive examples covered by
the rule it learns, then invoke it again to learn a second
rule based on the remaining training examples.
 This procedure can be iterated as many times as
desired to learn a disjunctive set of rules that together
cover any desired fraction of the positive examples.
This is called a sequential covering algorithm
because it sequentially learns a set of rules that
together cover the full set of positive examples. The
final set of rules can then be sorted so that more
accurate rules will be considered first when a new
instance must be classified.
 This sequential covering algorithm is one of the most
widespread approaches to learning disjunctive sets of
rules. It reduces the problem of learning a disjunctive
set of rules to a sequence of simpler problems, each
requiring that a single conjunctive rule be learned.
 Because it performs a greedy search, formulating a
sequence of rules without backtracking, it is not
guaranteed to find the smallest or best set of rules that
cover the training examples.
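 The following minimal Python sketch captures this loop. The learn_one_rule, covers, and performance arguments and the stopping threshold are placeholders assumed by this sketch (they are not given in the slides), since their details depend on the rule representation used.

    # Minimal sketch of the SEQUENTIAL-COVERING loop described above.
    def sequential_covering(target_attribute, attributes, examples,
                            learn_one_rule, covers, performance, threshold=0.0):
        all_examples = list(examples)
        learned_rules = []
        rule = learn_one_rule(target_attribute, attributes, examples)
        while rule is not None and performance(rule, examples) > threshold:
            learned_rules.append(rule)
            # Remove the (positive) examples covered by the rule just learned.
            examples = [ex for ex in examples if not covers(rule, ex)]
            rule = learn_one_rule(target_attribute, attributes, examples)
        # Sort rules so more accurate rules are considered first when classifying.
        learned_rules.sort(key=lambda r: performance(r, all_examples), reverse=True)
        return learned_rules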
2.1 General to Specific Beam Search
 One effective approach to implementing LEARN-
ONE-RULE is to organize the hypothesis space search
in the same general fashion as the ID3 algorithm, but
to follow only the most promising branch in the tree at
each step.
 The search begins with the most general rule precondition
possible (one that matches every instance), then greedily adds the attribute test that
most improves rule performance measured over the
training examples.
 Once this test has been added, the process is repeated
by greedily adding a second attribute test.
 This approach to implementing LEARN-ONE-RULE
performs a general-to-specific search through the
space of possible rules in search of a rule with high
accuracy, though perhaps incomplete coverage of the
data. As in decision tree learning, there are many ways
to define a measure to select the "best" descendant.
One common measure is the (lowest) entropy of the examples covered by the rule.
 The general-to-specific search suggested above for the
LEARN-ONE-RULE algorithm is a greedy depth-first
search with no backtracking.
 To reduce the risk of a suboptimal choice at any step, the
algorithm can be extended to a beam search, in which the
algorithm maintains a list of the k best candidates at each
step rather than a single best candidate.
 On each search step, descendants (specializations)
are generated for each of these k best candidates, and the
resulting set is again reduced to the k most promising
members. Beam search keeps track of the most promising
alternatives to the current top-rated hypothesis, so that all
of their successors can be considered at each search step.
This general-to-specific beam search algorithm is used by
the CN2 program described by Clark and Niblett.
 First, note that each hypothesis considered in the main
loop of the algorithm is a conjunction of attribute-value
constraints. Each of these conjunctive hypotheses
corresponds to a candidate set of preconditions for the rule
to be learned and is evaluated by the entropy of the
examples it covers.
 The search considers increasingly specific candidate
hypotheses until it reaches a maximally specific hypothesis
that contains all available attributes. The rule that is output
by the algorithm is the rule encountered during the search
whose PERFORMANCE is greatest, not necessarily the final
hypothesis generated in the search. The postcondition for
the output rule is chosen only in the final step of the
algorithm, after its precondition has been determined.
 The algorithm constructs the rule postcondition to predict
the value of the target attribute that is most common
among the examples covered by the rule precondition.
 Finally, although the use of beam search reduces the risk, the
greedy search may still produce suboptimal rules. The
SEQUENTIAL-COVERING algorithm can nevertheless learn a
collection of rules that together cover the training
examples, because it repeatedly calls LEARN-ONE-RULE
on the remaining uncovered examples.
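 The single-rule beam search itself can be sketched as follows. This is a hedged illustration rather than the textbook's exact procedure: specialize(h) is assumed to return the hypotheses formed by adding one more attribute test to h, and performance(h, examples) scores a hypothesis, for instance by the (negated) entropy of the examples it covers.

    # Sketch of the general-to-specific beam search for LEARN-ONE-RULE.
    def learn_one_rule_beam(examples, specialize, performance, k=5):
        best = frozenset()            # most general precondition: no attribute tests
        frontier = [best]
        while frontier:
            # Generate all specializations (one added test) of the current candidates.
            # specialize() is assumed to return hashable hypotheses (e.g., frozensets).
            candidates = {h for hypothesis in frontier for h in specialize(hypothesis)}
            if not candidates:
                break
            # Keep only the k most promising members as the new beam.
            frontier = sorted(candidates,
                              key=lambda h: performance(h, examples),
                              reverse=True)[:k]
            if performance(frontier[0], examples) > performance(best, examples):
                best = frontier[0]
        # The postcondition would predict the most common target value among the
        # examples covered by 'best'; that final step is omitted in this sketch.
        return best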
2.2 Variations
 The SEQUENTIAL-COVERING algorithm, together
with the LEARN-ONE-RULE -algorithm, learns a set of
if-then rules that covers the training examples. Many
variations on this approach have been explored.
 Ex. In some cases it might be desirable to have the
program learn only rules that cover positive examples,
and to include a "default" that assigns a negative
classification to instances not covered by any rule. To
learn such rules that predict just a single target value,
the LEARN-ONE-RULE algorithm can be modified to
accept an additional input argument specifying the
target value of interest.
 Another variation is provided by a family of
algorithms called AQ, which predates the CN2 algorithm.
 AQ learns a disjunctive set of rules that together cover the
target function.
 First, the covering algorithm of AQ differs from the
SEQUENTIAL-COVERING algorithm because it explicitly
seeks rules that cover a particular target value, learning a
disjunctive set of rules for each target value in turn.
 Second, AQ's algorithm for learning a single rule differs
from LEARN-ONE-RULE. While it conducts a general-to-
specific beam search for each rule, it uses a single positive
example to focus this search.
 In particular, it considers only those attributes satisfied by
the positive example as it searches for progressively more
specific hypotheses. Each time it learns a new rule it selects
a new positive example from those that are not yet covered,
to act as a seed to guide the search for this new disjunct.
4. LEARNING FIRST-ORDER RULES
 Inductive learning of first-order rules or theories is
often referred to as inductive logic programming (or
ILP for short), because this process can be viewed as
automatically inferring PROLOG programs from
examples.
 PROLOG is a general purpose, Turing-equivalent
programming language in which programs are
expressed as collections of Horn clauses.
4.1 First-Order Horn Clauses
 To see the advantages of first-order representations over
propositional (variable-free) representations, consider
the task of learning the simple target concept
Daughter(x, y), defined over pairs of people x and
y.
 The value of Daughter(x, y) is True when x is the
daughter of y, and False otherwise. Suppose each
person in the data is described by the attributes
Name, Mother, Father, Male, Female.
 For example, the following is a positive example in
which Sharon is the daughter of Bob:
 (Name1 = Sharon, Mother1 = Louise, Father1 = Bob,
 Male1 = False, Female1 = True,
 Name2 = Bob, Mother2 = Nora, Father2 = Victor,
 Male2 = True, Female2 = False, Daughter1,2 = True)
 If we were to collect a number of such training
examples for the target concept Daughter1,2 and
provide them to a propositional rule learner such
as CN2 or C4.5, the result would be a collection of very
specific rules such as
 IF (Father1 = Bob) ∧ (Name2 = Bob) ∧ (Female1 = True)
 THEN Daughter1,2 = True
 The problem is that propositional representations
offer no general way to describe the essential
relations among the values of the attributes. In
contrast, a program using first-order representations
could learn the following general rule:
 IF Father(y, x) ∧ Female(y), THEN Daughter(x, y)
 where x and y are variables that can be bound to
any person.
 First-order Horn clauses may refer to variables in the
preconditions that do not occur in the postconditions.
 For example, one rule for GrandDaughter might be:
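    IF Father(y, z) ∧ Father(z, x) ∧ Female(y) THEN GrandDaughter(x, y)
 (This rule is reconstructed here to match the description below, reading Father(y, z) as "the father of y is z.")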
 Note that the variable z in this rule, which refers to the
father of y, is not present in the rule postconditions.
 Whenever such a variable occurs only in the
preconditions, it is assumed to be existentially
quantified; that is, the rule preconditions are satisfied
as long as there exists at least one binding of the
variable that satisfies the corresponding literal.
 It is also possible to use the same predicates in the rule
postconditions and preconditions, enabling the description of
recursive rules.
 Examples include the Ancestor relation shown earlier, as well as
PROLOG programs for sorting the elements of a list, removing a
specific element from a list, and appending two lists.
4.2 Terminology
 All expressions are composed of constants (e.g., Bob,
Louise), variables (e.g., x, y), predicate symbols (e.g.,
Married, Greater-Than), and function symbols (e.g., age).
The difference between predicates and functions is that
predicates take on values of True or False, whereas
functions may take on any constant as their value.
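 For example, Married(Bob, Louise) can only evaluate to True or False, whereas a function term such as age(Bob) evaluates to some constant (say, 32; the particular value here is only illustrative).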
5. LEARNING SETS OF FIRST-ORDER RULES: FOIL
 A variety of algorithms have been proposed for learning
first-order rules, or Horn clauses.
 In fact, the FOIL program is the natural extension of
these earlier algorithms to first-order representations.
 Formally, the hypotheses learned by FOIL are sets of
first-order rules, where each rule is similar to a Horn
clause with two exceptions. First, the rules learned by
FOIL are more restricted than general Horn clauses,
because the literals are not permitted to contain
function symbols.
 Second, FOIL rules are more expressive than Horn
clauses, because the literals appearing in the body of
the rule may be negated. FOIL has been applied to a
variety of problem domains.
 In the FOIL algorithm, the outer loop corresponds
to a variant of the SEQUENTIAL-COVERING
algorithm.
 It learns new rules one at a time, removing the positive
examples covered by the latest rule before attempting
to learn the next rule. The inner loop corresponds to a
variant of our earlier LEARN-ONE-RULE algorithm,
extended to accommodate first-order rules.
 FOIL seeks only rules that predict when the target
literal is True, whereas our earlier algorithm would
seek both rules that predict when it is True and rules
that predict when it is False. Also, FOIL performs a
simple hillclimbing search rather than a beam search.
 The hypothesis space search performed by FOIL is
best understood by viewing it hierarchically. Each
iteration through FOIL's outer loop adds a new rule to
its disjunctive hypothesis, Learned_rules.
 The two most substantial differences between FOIL and
our earlier SEQUENTIAL-COVERING and LEARN-ONE-RULE
algorithms follow from the requirement that it
accommodate first-order rules. These differences are:
 1. In its general-to-specific search to learn each new rule,
FOIL employs different detailed steps to generate candidate
specializations of the rule. This difference follows from the
need to accommodate variables in the rule preconditions.
 2. FOIL employs a PERFORMANCE measure, FoilGain,
that differs from the entropy measure shown for LEARN-
ONE-RULE in Table 10.2. This difference follows from the
need to distinguish between different bindings of the rule
variables and from the fact that FOIL seeks only rules that
cover positive examples.
5.1 Generating Candidate
Specializations in FOIL
 To generate candidate specializations of the current
rule, FOIL generates a variety of new literals, each of
which may be individually added to the rule
preconditions.
 More precisely, suppose the current rule being
considered is
 P(x1, x2, ..., xk) ← L1 ∧ ... ∧ Ln
 where L1, ..., Ln are literals forming the current rule
preconditions and where P(x1, x2, ..., xk) is the literal
that forms the rule head, or postconditions.
 FOIL generates candidate specializations of this rule
by considering new literals Ln+1 that fit one of the
following forms:
 Q(v1, ..., vr), where Q is any predicate name occurring
in Predicates and where the vi are either new variables
or variables already present in the rule.
 At least one of the vi in the created literal must already
exist as a variable in the rule.
 Equal(xj, xk), where xj and xk are variables already
present in the rule.
 The negation of either of the above forms of literals.
 The general-to-specific search in FOIL begins with the most
general rule
 GrandDaughter(x, y) --
 which asserts that GrandDaughter(x, y) is true of any x and y.
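 (Reconstructing the candidate set from the rules for generating literals given above, FOIL would consider literals such as Equal(x, y), Female(x), Female(y), Father(x, y), Father(y, x), Father(x, z), Father(z, x), Father(y, z), Father(z, y), and the negations of each of these, where z is a new variable.)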
 Now suppose that among the above literals FOIL greedily selects
Father(y, z) as the most promising, leading to the more specific
rule
 GrandDaughter(x, y) ← Father(y, z)
 In generating candidate literals to further specialize this rule,
FOIL will now consider all of the literals mentioned in the
previous step, plus the additional literals Female(z), Equal(z,
x), Equal(z, y), Father(z, w), Father(w, z), and their
negations. These new literals are considered at this point
because the variable z was added to the rule in the previous step.
Because of this, FOIL now considers an additional new variable
w.
 If FOIL at this point were to select the literal Father(z,
x) and on the next iteration select the literal
Female(y), this would lead to the following rule,
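 GrandDaughter(x, y) ← Father(y, z) ∧ Father(z, x) ∧ Female(y)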
 which covers only positive examples and hence
terminates the search for further specializations of the
rule.
5.2 Guiding the Search in FOIL
 To select the most promising literal from the candidates
generated at each step, FOIL considers the performance of
the rule over the training data.
 Assume the training data includes the following simple set
of assertions, where we use the convention
that P(x, y) can be read as "The P of x is y."
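 For instance (reconstructing the assertion set from the discussion that follows), the training data might consist of:
    GrandDaughter(Victor, Sharon)   Father(Sharon, Bob)   Female(Sharon)
    Father(Tom, Bob)                Father(Bob, Victor)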
 Here let us also make the closed world assumption that any
literal involving the predicate GrandDaughter, Father, or
Female and the constants Victor, Sharon, Bob, and Tom
that is not listed above can be assumed to be false.
 To select the best specialization of the current rule,
FOIL considers each distinct way in which the rule
variables can bind to constants in the training
examples.
GrandDaughter(x,y)-
 The rule variables x and y are not constrained by any
preconditions and may therefore bind in any
combination to the four constants Victor, Sharon, Bob,
and Tom.
 We will use the notation {x/Bob, y/Sharon} to
denote a particular variable binding; that is, a
substitution mapping each variable to a constant.
 Given the four possible constants, there are 16 possible
variable bindings for this initial rule. The binding
{x/Victor, y/Sharon} corresponds to a positive
example binding, because the training data includes
the assertion GrandDaughter(Victor, Sharon).
 The other 15 bindings allowed by the rule (e.g., the
binding {x/Bob, y/Tom}) constitute negative evidence
for the rule in the current example, because no
corresponding assertion can be found in the training
data.
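 To make the counting concrete, the following small Python sketch (an illustration, not part of the slides) enumerates the 16 bindings of the initial rule and separates the single positive binding from the 15 negative ones, using the GrandDaughter assertion from the training data above.

    from itertools import product

    constants = ["Victor", "Sharon", "Bob", "Tom"]
    grand_daughter_facts = {("Victor", "Sharon")}    # assertion from the training data

    # All 16 bindings {x/c1, y/c2} of the unconstrained rule GrandDaughter(x, y) <-
    bindings = list(product(constants, repeat=2))
    positive = [b for b in bindings if b in grand_daughter_facts]
    negative = [b for b in bindings if b not in grand_daughter_facts]
    print(len(positive), len(negative))              # -> 1 15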
 At each stage, the rule is evaluated based on these sets
of positive and negative variable bindings, with
preference given to rules that possess more positive
bindings and fewer negative bindings.
 As new literals are added to the rule, the sets of
bindings will change. Note that if a literal is added that
introduces a new variable, then the bindings for the
rule will grow in length (e.g., if Father(y, z) is added
to the above rule, then each original binding is extended
with a binding for z and becomes correspondingly longer).
 The evaluation function used by FOIL to estimate the
utility of adding a new literal is based on the numbers
of positive and negative bindings covered before and
after adding the new literal.
 More precisely, consider some rule R, and a candidate
literal L that might be added to the body of R. Let
R' be the rule created by adding literal L to rule R.
The value Foil-Gain(L, R) of adding L to R is
defined as
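    FoilGain(L, R) = t ( log2( p1 / (p1 + n1) ) - log2( p0 / (p0 + n0) ) )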
 where p0 is the number of positive bindings of rule
R, n0 is the number of negative bindings of R, p1 is
the number of positive bindings of rule R', and n1
is the number of negative bindings of R'.
 Finally, t is the number of positive bindings of rule
R that are still covered after adding literal L to R.
 When a new variable is introduced into R by adding
L, then any original binding is considered to be
covered so long as some binding extending it is
present in the bindings of R'.
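 Transcribed directly into Python, the measure looks as follows. This is a sketch: counting the bindings p0, n0, p1, n1 and t from the rule and the data is assumed to happen elsewhere, and the guard for empty positive binding sets is an added convention, not from the text.

    import math

    def foil_gain(p0, n0, p1, n1, t):
        # Gain of adding literal L to rule R, given the binding counts defined above.
        if p0 == 0 or p1 == 0:
            return float("-inf")   # guard against log(0); this convention is an assumption
        return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))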
5.3 Learning Recursive Rule Sets
 In the discussion above, we ignored the possibility that new literals added to
the rule body could refer to the target predicate itself.
 However, if we include the target predicate in the
input list of Predicates, then FOIL will consider it
as well when generating candidate literals.
 This will allow it to form recursive rules, that is, rules that use
the same predicate in both the body and the head of the
rule.
 For instance, recall the two rules given in the introduction that provide a
recursive definition of the Ancestor relation.
 Given an appropriate set of training examples, these two
rules can be learned following a trace similar to the one
above for GrandDaughter. Note the second rule is among
the rules that are potentially within reach of FOIL's search,
provided Ancestor is included in the list Predicates
that determines which predicates may be considered
when generating new literals. Of course whether this
particular rule would be learned or not depends on
whether these particular literals outscore competing
candidates during FOIL's greedy search for increasingly
specific rules.
6. INDUCTION AS INVERTED DEDUCTION
 A second, quite different approach to inductive logic
programming is based on the simple observation that
induction is just the inverse of deduction.
 In general, machine learning involves building
theories that explain the observed data.
 Let xi denote the ith training instance and f(xi) denote
its target value. Then learning is the problem of
discovering a hypothesis h such that the classification
f(xi) of each training instance xi follows deductively
from the hypothesis h, the description of xi, and any
other background knowledge B known to the system.
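 Stated as a formal constraint over the training data D, this requirement is:
    (∀ (xi, f(xi)) ∈ D)  (B ∧ h ∧ xi) ⊢ f(xi)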
 The expression X ⊢ Y is read "Y follows deductively
from X," or alternatively "X entails Y."
 The process of augmenting the set of predicates, based
on background knowledge, is often referred to as
constructive induction.
 Induction is, in fact, the inverse operation of
deduction, and cannot be conceived to exist without
the corresponding operation, so that the question of
relative importance cannot arise.
 An inverse entailment operator, O(B, D), takes the
training data D = {(xi, f(xi))} and background
knowledge B as input and produces as output a
hypothesis h satisfying the entailment constraint above.
o This formulation subsumes the common definition of learning as finding
some general concept that matches a given set of training examples (which
corresponds to the special case where no background knowledge B is
available).
o By incorporating the notion of background information B, this formulation
allows a richer definition of when a hypothesis may be said to "fit"
the data. Up until now, we have always determined whether a hypothesis
(e.g., neural network) fits the data based solely on the description of the
hypothesis and data, independent of the task domain under study. In
contrast, this formulation allows the domain-specific background
information B to become part of the definition of "fit." In particular, h fits the
training example (xi, f(xi)) as long as f(xi) follows deductively from B ∧ h ∧
xi.
o By incorporating background information B, this formulation invites
learning methods that use this background information to guide the search
for h, rather than merely searching the space of syntactically legal
hypotheses. The inverse resolution procedure described in the following
section uses background knowledge in this fashion.

More Related Content

What's hot

Inductive analytical approaches to learning
Inductive analytical approaches to learningInductive analytical approaches to learning
Inductive analytical approaches to learningswapnac12
 
Counter propagation Network
Counter propagation NetworkCounter propagation Network
Counter propagation NetworkAkshay Dhole
 
Unit I & II in Principles of Soft computing
Unit I & II in Principles of Soft computing Unit I & II in Principles of Soft computing
Unit I & II in Principles of Soft computing Sivagowry Shathesh
 
Production System in AI
Production System in AIProduction System in AI
Production System in AIBharat Bhushan
 
Address in the target code in Compiler Construction
Address in the target code in Compiler ConstructionAddress in the target code in Compiler Construction
Address in the target code in Compiler ConstructionMuhammad Haroon
 
Back propagation
Back propagationBack propagation
Back propagationNagarajan
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision treesKnoldus Inc.
 
Unification and Lifting
Unification and LiftingUnification and Lifting
Unification and LiftingMegha Sharma
 
Inductive bias
Inductive biasInductive bias
Inductive biasswapnac12
 
Learning sets of rules, Sequential Learning Algorithm,FOIL
Learning sets of rules, Sequential Learning Algorithm,FOILLearning sets of rules, Sequential Learning Algorithm,FOIL
Learning sets of rules, Sequential Learning Algorithm,FOILPavithra Thippanaik
 
Multilayer perceptron
Multilayer perceptronMultilayer perceptron
Multilayer perceptronomaraldabash
 
Knowledge Representation in Artificial intelligence
Knowledge Representation in Artificial intelligence Knowledge Representation in Artificial intelligence
Knowledge Representation in Artificial intelligence Yasir Khan
 
Unsupervised learning
Unsupervised learningUnsupervised learning
Unsupervised learningamalalhait
 
backpropagation in neural networks
backpropagation in neural networksbackpropagation in neural networks
backpropagation in neural networksAkash Goel
 

What's hot (20)

Concept learning
Concept learningConcept learning
Concept learning
 
Inductive analytical approaches to learning
Inductive analytical approaches to learningInductive analytical approaches to learning
Inductive analytical approaches to learning
 
Counter propagation Network
Counter propagation NetworkCounter propagation Network
Counter propagation Network
 
AI: Logic in AI
AI: Logic in AIAI: Logic in AI
AI: Logic in AI
 
Rule based system
Rule based systemRule based system
Rule based system
 
Unit I & II in Principles of Soft computing
Unit I & II in Principles of Soft computing Unit I & II in Principles of Soft computing
Unit I & II in Principles of Soft computing
 
Production System in AI
Production System in AIProduction System in AI
Production System in AI
 
Address in the target code in Compiler Construction
Address in the target code in Compiler ConstructionAddress in the target code in Compiler Construction
Address in the target code in Compiler Construction
 
Hidden markov model ppt
Hidden markov model pptHidden markov model ppt
Hidden markov model ppt
 
Back propagation
Back propagationBack propagation
Back propagation
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
 
Unification and Lifting
Unification and LiftingUnification and Lifting
Unification and Lifting
 
Inductive bias
Inductive biasInductive bias
Inductive bias
 
Learning sets of rules, Sequential Learning Algorithm,FOIL
Learning sets of rules, Sequential Learning Algorithm,FOILLearning sets of rules, Sequential Learning Algorithm,FOIL
Learning sets of rules, Sequential Learning Algorithm,FOIL
 
Ontology engineering
Ontology engineering Ontology engineering
Ontology engineering
 
Multilayer perceptron
Multilayer perceptronMultilayer perceptron
Multilayer perceptron
 
Knowledge Representation in Artificial intelligence
Knowledge Representation in Artificial intelligence Knowledge Representation in Artificial intelligence
Knowledge Representation in Artificial intelligence
 
Unsupervised learning
Unsupervised learningUnsupervised learning
Unsupervised learning
 
backpropagation in neural networks
backpropagation in neural networksbackpropagation in neural networks
backpropagation in neural networks
 
supervised learning
supervised learningsupervised learning
supervised learning
 

Similar to Learning set of rules

Learning sets of Rules by Dr.C.R.Dhivyaa Kongu Engineering College
Learning sets of Rules by Dr.C.R.Dhivyaa Kongu Engineering CollegeLearning sets of Rules by Dr.C.R.Dhivyaa Kongu Engineering College
Learning sets of Rules by Dr.C.R.Dhivyaa Kongu Engineering CollegeDhivyaa C.R
 
New_ML_Lecture_9.ppt
New_ML_Lecture_9.pptNew_ML_Lecture_9.ppt
New_ML_Lecture_9.pptEmmaMaria6
 
Clustering and Classification Algorithms Ankita Dubey
Clustering and Classification Algorithms Ankita DubeyClustering and Classification Algorithms Ankita Dubey
Clustering and Classification Algorithms Ankita DubeyAnkita Dubey
 
rule refinement in inductive knowledge based systems
rule refinement in inductive knowledge based systemsrule refinement in inductive knowledge based systems
rule refinement in inductive knowledge based systemsarteimi
 
Structure based drug design- kiranmayi
Structure based drug design- kiranmayiStructure based drug design- kiranmayi
Structure based drug design- kiranmayiKiranmayiKnv
 
A New Extraction Optimization Approach to Frequent 2 Item sets
A New Extraction Optimization Approach to Frequent 2 Item setsA New Extraction Optimization Approach to Frequent 2 Item sets
A New Extraction Optimization Approach to Frequent 2 Item setsijcsa
 
A NEW EXTRACTION OPTIMIZATION APPROACH TO FREQUENT 2 ITEMSETS
A NEW EXTRACTION OPTIMIZATION APPROACH TO FREQUENT 2 ITEMSETSA NEW EXTRACTION OPTIMIZATION APPROACH TO FREQUENT 2 ITEMSETS
A NEW EXTRACTION OPTIMIZATION APPROACH TO FREQUENT 2 ITEMSETSijcsa
 
A NEW EXTRACTION OPTIMIZATION APPROACH TO FREQUENT 2 ITEMSETS
A NEW EXTRACTION OPTIMIZATION APPROACH TO FREQUENT 2 ITEMSETSA NEW EXTRACTION OPTIMIZATION APPROACH TO FREQUENT 2 ITEMSETS
A NEW EXTRACTION OPTIMIZATION APPROACH TO FREQUENT 2 ITEMSETSijcsa
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsDinusha Dilanka
 
Adapted Branch-and-Bound Algorithm Using SVM With Model Selection
Adapted Branch-and-Bound Algorithm Using SVM With Model SelectionAdapted Branch-and-Bound Algorithm Using SVM With Model Selection
Adapted Branch-and-Bound Algorithm Using SVM With Model SelectionIJECEIAES
 
Multi label text classification
Multi label text classificationMulti label text classification
Multi label text classificationraghavr186
 
An improved teaching learning
An improved teaching learningAn improved teaching learning
An improved teaching learningcsandit
 
17 2 expert systems
17 2 expert systems17 2 expert systems
17 2 expert systemsTianlu Wang
 
Introduction Of Artificial neural network
Introduction Of Artificial neural networkIntroduction Of Artificial neural network
Introduction Of Artificial neural networkNagarajan
 

Similar to Learning set of rules (20)

Learning sets of Rules by Dr.C.R.Dhivyaa Kongu Engineering College
Learning sets of Rules by Dr.C.R.Dhivyaa Kongu Engineering CollegeLearning sets of Rules by Dr.C.R.Dhivyaa Kongu Engineering College
Learning sets of Rules by Dr.C.R.Dhivyaa Kongu Engineering College
 
UNIT V (5).pptx
UNIT V (5).pptxUNIT V (5).pptx
UNIT V (5).pptx
 
New_ML_Lecture_9.ppt
New_ML_Lecture_9.pptNew_ML_Lecture_9.ppt
New_ML_Lecture_9.ppt
 
syllabus-CBR.pdf
syllabus-CBR.pdfsyllabus-CBR.pdf
syllabus-CBR.pdf
 
Clustering and Classification Algorithms Ankita Dubey
Clustering and Classification Algorithms Ankita DubeyClustering and Classification Algorithms Ankita Dubey
Clustering and Classification Algorithms Ankita Dubey
 
G
GG
G
 
rule refinement in inductive knowledge based systems
rule refinement in inductive knowledge based systemsrule refinement in inductive knowledge based systems
rule refinement in inductive knowledge based systems
 
Machine learning
Machine learningMachine learning
Machine learning
 
Structure based drug design- kiranmayi
Structure based drug design- kiranmayiStructure based drug design- kiranmayi
Structure based drug design- kiranmayi
 
A New Extraction Optimization Approach to Frequent 2 Item sets
A New Extraction Optimization Approach to Frequent 2 Item setsA New Extraction Optimization Approach to Frequent 2 Item sets
A New Extraction Optimization Approach to Frequent 2 Item sets
 
A NEW EXTRACTION OPTIMIZATION APPROACH TO FREQUENT 2 ITEMSETS
A NEW EXTRACTION OPTIMIZATION APPROACH TO FREQUENT 2 ITEMSETSA NEW EXTRACTION OPTIMIZATION APPROACH TO FREQUENT 2 ITEMSETS
A NEW EXTRACTION OPTIMIZATION APPROACH TO FREQUENT 2 ITEMSETS
 
A NEW EXTRACTION OPTIMIZATION APPROACH TO FREQUENT 2 ITEMSETS
A NEW EXTRACTION OPTIMIZATION APPROACH TO FREQUENT 2 ITEMSETSA NEW EXTRACTION OPTIMIZATION APPROACH TO FREQUENT 2 ITEMSETS
A NEW EXTRACTION OPTIMIZATION APPROACH TO FREQUENT 2 ITEMSETS
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning Algorithms
 
Adapted Branch-and-Bound Algorithm Using SVM With Model Selection
Adapted Branch-and-Bound Algorithm Using SVM With Model SelectionAdapted Branch-and-Bound Algorithm Using SVM With Model Selection
Adapted Branch-and-Bound Algorithm Using SVM With Model Selection
 
Multi label text classification
Multi label text classificationMulti label text classification
Multi label text classification
 
An improved teaching learning
An improved teaching learningAn improved teaching learning
An improved teaching learning
 
17 2 expert systems
17 2 expert systems17 2 expert systems
17 2 expert systems
 
.doc
.doc.doc
.doc
 
.doc
.doc.doc
.doc
 
Introduction Of Artificial neural network
Introduction Of Artificial neural networkIntroduction Of Artificial neural network
Introduction Of Artificial neural network
 

More from swapnac12

Awt, Swing, Layout managers
Awt, Swing, Layout managersAwt, Swing, Layout managers
Awt, Swing, Layout managersswapnac12
 
Event handling
Event handlingEvent handling
Event handlingswapnac12
 
Asymptotic notations(Big O, Omega, Theta )
Asymptotic notations(Big O, Omega, Theta )Asymptotic notations(Big O, Omega, Theta )
Asymptotic notations(Big O, Omega, Theta )swapnac12
 
Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)swapnac12
 
Introduction ,characteristics, properties,pseudo code conventions
Introduction ,characteristics, properties,pseudo code conventionsIntroduction ,characteristics, properties,pseudo code conventions
Introduction ,characteristics, properties,pseudo code conventionsswapnac12
 
Genetic algorithms
Genetic algorithmsGenetic algorithms
Genetic algorithmsswapnac12
 
Instance based learning
Instance based learningInstance based learning
Instance based learningswapnac12
 
Computational learning theory
Computational learning theoryComputational learning theory
Computational learning theoryswapnac12
 
Artificial Neural Networks 1
Artificial Neural Networks 1Artificial Neural Networks 1
Artificial Neural Networks 1swapnac12
 
Evaluating hypothesis
Evaluating  hypothesisEvaluating  hypothesis
Evaluating hypothesisswapnac12
 
Advanced topics in artificial neural networks
Advanced topics in artificial neural networksAdvanced topics in artificial neural networks
Advanced topics in artificial neural networksswapnac12
 
Introdution and designing a learning system
Introdution and designing a learning systemIntrodution and designing a learning system
Introdution and designing a learning systemswapnac12
 
Concept learning and candidate elimination algorithm
Concept learning and candidate elimination algorithmConcept learning and candidate elimination algorithm
Concept learning and candidate elimination algorithmswapnac12
 

More from swapnac12 (14)

Awt, Swing, Layout managers
Awt, Swing, Layout managersAwt, Swing, Layout managers
Awt, Swing, Layout managers
 
Applet
 Applet Applet
Applet
 
Event handling
Event handlingEvent handling
Event handling
 
Asymptotic notations(Big O, Omega, Theta )
Asymptotic notations(Big O, Omega, Theta )Asymptotic notations(Big O, Omega, Theta )
Asymptotic notations(Big O, Omega, Theta )
 
Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)
 
Introduction ,characteristics, properties,pseudo code conventions
Introduction ,characteristics, properties,pseudo code conventionsIntroduction ,characteristics, properties,pseudo code conventions
Introduction ,characteristics, properties,pseudo code conventions
 
Genetic algorithms
Genetic algorithmsGenetic algorithms
Genetic algorithms
 
Instance based learning
Instance based learningInstance based learning
Instance based learning
 
Computational learning theory
Computational learning theoryComputational learning theory
Computational learning theory
 
Artificial Neural Networks 1
Artificial Neural Networks 1Artificial Neural Networks 1
Artificial Neural Networks 1
 
Evaluating hypothesis
Evaluating  hypothesisEvaluating  hypothesis
Evaluating hypothesis
 
Advanced topics in artificial neural networks
Advanced topics in artificial neural networksAdvanced topics in artificial neural networks
Advanced topics in artificial neural networks
 
Introdution and designing a learning system
Introdution and designing a learning systemIntrodution and designing a learning system
Introdution and designing a learning system
 
Concept learning and candidate elimination algorithm
Concept learning and candidate elimination algorithmConcept learning and candidate elimination algorithm
Concept learning and candidate elimination algorithm
 

Recently uploaded

18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfadityarao40181
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxAnaBeatriceAblay2
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxsocialsciencegdgrohi
 

Recently uploaded (20)

18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdf
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
 

Learning set of rules

  • 2. INTRODUCTION  One way to learn sets of rules is to first learn a decision tree, then translate the tree into an equivalent set of rules-one rule for each leaf node in the tree.  A Second method, is to use a genetic algorithm that encodes each rule set as a bit string and uses genetic search operators to explore this hypothesis space.  Variety of algorithms that directly learn rule sets and that differ from these algorithms in 2 key respects:  1. They are designed to learn sets of first-order rules that contain variables. This is significant because first-order rules are much more expressive than propositional rule.  2.The algorithms discussed here use sequential covering algorithms that learn one rule at a time to incrementally grow the final set of rules.
  • 3.  Ex: first-order rule sets, consider the following two rules that jointly describe the target concept Ancestor. Here we use the predicate Parent(x, y) to indicate that y is the mother or father of x, and the predicate Ancestor(x, y) to indicate that y is an ancestor of x related by an arbitrary number of family generations.  One way to see the representational power of first- order rules is to consider the general purpose programming language PROLOG.
  • 4.  In programs are sets of first-order rules such as the two shown above (rules of this form are also called Horn clauses). In fact, when stated in a slightly different syntax the above rules form a valid PROLOG program for computing the Ancestor relation.  In this light, a general purpose algorithm capable of learning such rule sets may be viewed as an algorithm for automatically inferring PROLOG programs from examples.  Learning systems based on first-order representations have been successfully applied to problems such as learning which chemical bonds fragment in a mass spectrometer (Buchanan 1976; Lindsay 1980), learning which chemical substructures produce mutagenic activity (a property related to arcinogenicity) (Srinivasan et al. 1994), and learning to design finite element meshes to analyze stresses in physical structures (Dolsak and Muggleton 1992)
  • 5.  A family of algorithms for learning rule sets based on the strategy of learning one rule, removing the data it covers, then iterating this process. Such algorithms are called sequential covering algorithms.  A subroutine LEARN-ONE-RULE that accepts a set of positive and negative training examples as input, then outputs a single rule that covers many of the positive examples and few of the negative examples.  This output rule have high accuracy, but not necessarily high coverage. By high accuracy, we mean the predictions it makes should be correct. By accepting low coverage, we mean it need not make predictions for every training example. 2 SEQUENTIAL COVERING ALGORITHMS
  • 6.  one obvious approach to learning a set of rules is to invoke LEARN-ONE-RULE on all the available training examples, remove any positive examples covered by the rule it learns, then invoke it again to learn a second rule based on the remaining training examples.  This procedure can be iterated as many times as desired to learn a disjunctive set of rules that together cover any desired fraction of the positive examples. This is called a sequential covering algorithm because it sequentially learns a set of rules that together cover the full set of positive examples. The final set of rules can then be sorted so that more accurate rules will be considered first when a new instance must be classified.
  • 7.
  • 8.  This sequential covering algorithm is one of the most widespread approaches to learning disjunctive sets of rules. It reduces the problem of learning a disjunctive set of rules to a sequence of simpler problems, each requiring that a single conjunctive rule be learned.  Because it performs a greedy search, formulating a sequence of rules without backtracking, it is not guaranteed to find the smallest or best set of rules that cover the training examples.
  • 9. 2.1 General to Specific Beam Search  One effective approach to implementing LEARN- ONE-RULE is to organize the hypothesis space search in the same general fashion as the ID3 algorithm, but to follow only the most promising branch in the tree at each step.  The search begins with general rule precondition possible, then greedily adding the attribute test that most improves rule performance measured over the training examples.  Once this test has been added, the process is repeated by greedily adding a second attribute test.
  • 10.  This approach to implementing LEARN-ONE-RULE performs a general-to specific search through the space of possible rules in search of a rule with high accuracy, though perhaps incomplete coverage of the data. As in decision tree learning, there are many ways to define a measure to select the "best" descendant. Measure is lowest entropy.  The general-to-specific search suggested above for the LEARN-ONE-RULE algorithm is a greedy depth-first search with no backtracking.  The general-to-specific search suggested above for the LEARN-ONE-RULE algorithm is a greedy depth-first search with no backtracking.
  • 11.
  • 12.
  • 13.  On each search step, descendants (specializations) are generated for each of these k best candidates, and the resulting set is again reduced to the k most promising members. Beam search keeps track of the most promising alternatives to the current top-rated hypothesis, so that all of their successors can be considered at each search step. This general to specific beam search algorithm is used by the CN2 program described by Clark and Niblett.  First, note that each hypothesis considered in the main loop of the algorithm is a conjunction of attribute-value constraints. Each of these conjunctive hypotheses corresponds to a candidate set of preconditions for the rule to be learned and is evaluated by the entropy of the examples it covers.
  • 14.  The search considers increasingly specific candidate hypotheses until it reaches a maximally specific hypothesis that contains all available attributes. The rule that is output by the algorithm is the rule encountered during the search whose PERFORMANCE is greatest-not necessarily the final hypothesis generated in the search. The post condition for the output rule is chosen only in the final step of the algorithm, after its precondition has been determined.  The algorithm constructs the rule post condition to predict the value of the target attribute that is most common among the examples covered by the rule precondition.  Finally, the use of beam search to reduce the risk, the greedy search may still produce suboptimal rules. The SEQUENTIALCOVERING algorithm can still learn a collection of rules that together cover the training examples, because it repeatedly calls LEARN-ONE-RULE on the remaining uncovered examples.
  • 15. 2.2 Variations  The SEQUENTIAL-COVERING algorithm, together with the LEARN-ONE-RULE -algorithm, learns a set of if-then rules that covers the training examples. Many variations on this approach have been explored.  Ex. some cases it might be desirable to have the program learn only rules that cover positive examples and to include a "default" that assigns a negative classification to learn such rules that predict just a single target value, the LEARN-ONE-RULE algorithm can be modified to accept an additional input argument specifying the target value of interest n to instances not covered by any rule.
  • 16.  Another variation is provided by a family of algorithms called AQ, that predate the CN2 algorithm.  AQ learns a disjunctive set of rules that together cover the target function.  First, the covering algorithm of AQ differs from the  SEQUENTIAL-COVERING algorithm because it explicitly seeks rules that cover a particular target value, learning a disjunctive set of rules for each target value in turn.  Second, AQ's algorithm for learning a single rule differs from LEARN-ONE RULE. While it conducts a general-to- specific beam search for each rule, it uses a single positive example to focus this search.  In particular, it considers only those attributes satisfied by the positive example as it searches for progressively more specific hypotheses. Each time it learns a new rule it selects a new positive example from those that are not yet covered, to act as a seed to guide the search for this new disjunct.
  • 17. 4. LEARNING FIRST-ORDER RULES  Inductive learning of first-order rules or theories is often referred to as inductive logic programming (or LP for short), because this process can be viewed as automatically inferring PROLOG programs from examples.  PROLOG is a general purpose, Turing-equivalent programming language in which programs are expressed as collections of Horn clauses.
  • 18. 4.1 First-Order Horn Clauses  The advantages of first-order representations over propositional (variablefree) representations, consider the task of learning the simple target concept Daughter (x, y), defined over pairs of people x and y.  The value of Daughter(x, y) is True when x is the daughter of y, and False otherwise. Suppose each person in the data is described by the attributes Name, Mother, Father, Male, Female.
  • 19.  For example, the following is a positive example in which Sharon is the daughter of Bob:  (Namel = Sharon, Motherl = Louise, Fatherl = Bob,  Malel = False, Female1 = True,  Name2 = Bob, Mother2 = Nora, Father2 = Victor,  Male2 = True, Female2 = False, Daughterl.2 = True)  If we were to collect a number of such training examples for the target concept Daughterlv2 and provide them to a propositional rule learner such as CN2 or C4.5, the result would be a collection of very specific rules such as
  • 20.  IF (Father1 = Bob) A (Name2 = Bob) A (Femalel = True)  THEN daughter1.2 = True  The problem is that propositional representations offer no general way to describe the essential relations among the values of the attributes. In contrast, a program using first-order representations could learn the following general rule:  IF Father(y, x) / Female(y), THEN Daughter(x, y)  where x and y are variables that can be bound to any person.
  • 21.  First-order Horn clauses may refer to variables in the preconditions that do not occur in the post conditions.  For ex. One rule for Granddaughter might be  Note the variable z in this rule, which refers to the father of y, is not present in the rule postconditions.  Whenever such a variable occurs only in the preconditions, it is assumed to be existentially quantified; that is, the rule preconditions are satisfied as long as there exists at least one binding of the variable that satisfies the corresponding literal.
  • 22.  It is also possible to use the same predicates in the rule post conditions and preconditions, enabling the description of recursive rules.  Ancestor function, and functions for sorting the elements of a list, removing a specific element from a list, and appending two lists.  4.2 Terminology:  All expressions are composed of constants (e.g., Bob, Louise), variables (e.g., x, y), predicate symbols (e.g., Married, Greater-Than), and function symbols (e.g., age). The difference between predicates and functions is that predicates take on values of True or False, whereas functions may take on any constant as their value.
  • 23.
  • 24. 5.LEARNING SETS OF FIRST – ORDER RULES:FOIL  A variety of algorithms has been proposed for learning first-order rules, or Horn clauses.  In fact, the FOIL program is the natural extension of these earlier algorithms to first-order representations.  Formally, the hypotheses learned by FOIL are sets of first-order rules, where each rule is similar to a Horn clause with two exceptions. First, the rules learned by FOIL are more restricted than general Horn clauses, because the literals are not permitted to contain function symbols.
  • 25.  Second, FOIL rules are more expressive than Horn clauses, because the literals appearing in the body of the rule may be negated. FOIL has been applied to a variety of problem domains.  The FOIL algorithm shows the outer loop corresponds to a variant of the SEQUENTIAL-COVERING algorithm.  It learns new rules one at a time, removing the positive examples covered by the latest rule before attempting to learn the next rule. The inner loop corresponds to a variant of our earlier LEARN-ONE-RULE algorithm, extended to accommodate first-order rules.
• 26.  FOIL seeks only rules that predict when the target literal is True, whereas our earlier algorithm would seek both rules that predict when it is True and rules that predict when it is False. Also, FOIL performs a simple hill-climbing search rather than a beam search.  The hypothesis space search performed by FOIL is best understood by viewing it hierarchically. Each iteration through FOIL's outer loop adds a new rule to its disjunctive hypothesis, Learned_rules.
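The control structure just described can be summarized in a short, schematic sketch. This is only a skeleton under assumed helper functions (generate_candidate_literals, foil_gain, and covers are placeholders for the steps detailed on the following slides), not a full FOIL implementation; it is shown to make the outer sequential-covering loop and the inner hill-climbing loop explicit.

```python
def foil(target, predicates, pos_examples, neg_examples,
         generate_candidate_literals, foil_gain, covers):
    """Schematic FOIL skeleton (the three helpers are assumed, passed-in functions).

    Outer loop: learn one rule at a time, removing the positive examples
    covered by the new rule (a sequential-covering variant).
    Inner loop: greedily specialize the current rule by adding the candidate
    literal with the highest Foil_Gain until no negatives remain covered
    (a general-to-specific hill-climbing search).
    """
    learned_rules = []
    pos = list(pos_examples)
    while pos:                                   # outer loop
        body = []                                # most general rule: empty body
        neg_covered = [e for e in neg_examples if covers(target, body, e)]
        while neg_covered:                       # inner loop: specialize
            candidates = generate_candidate_literals(target, body, predicates)
            best = max(candidates,
                       key=lambda lit: foil_gain(lit, target, body, pos, neg_covered))
            body.append(best)
            neg_covered = [e for e in neg_covered if covers(target, body, e)]
        learned_rules.append((target, list(body)))
        pos = [e for e in pos if not covers(target, body, e)]   # remove covered positives
    return learned_rules
```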
• 28.  The two most substantial differences between FOIL and our earlier SEQUENTIAL-COVERING and LEARN-ONE-RULE algorithms follow from the requirement that it accommodate first-order rules. These differences are:  1. In its general-to-specific search to learn each new rule, FOIL employs different detailed steps to generate candidate specializations of the rule. This difference follows from the need to accommodate variables in the rule preconditions.  2. FOIL employs a performance measure, Foil_Gain, that differs from the entropy measure shown for LEARN-ONE-RULE in Table 10.2. This difference follows from the need to distinguish between different bindings of the rule variables and from the fact that FOIL seeks only rules that cover positive examples.
• 29. 5.1 Generating Candidate Specializations in FOIL  To generate candidate specializations of the current rule, FOIL generates a variety of new literals, each of which may be individually added to the rule preconditions.  More precisely, suppose the current rule being considered is  P(x1, x2, ..., xk) ← L1 ∧ ... ∧ Ln  where L1, ..., Ln are literals forming the current rule preconditions and where P(x1, x2, ..., xk) is the literal that forms the rule head, or postconditions.
• 30.  FOIL generates candidate specializations of this rule by considering new literals Ln+1 that fit one of the following forms:  Q(v1, ..., vr), where Q is any predicate name occurring in Predicates and where the vi are either new variables or variables already present in the rule. At least one of the vi in the created literal must already exist as a variable in the rule.  Equal(xj, xk), where xj and xk are variables already present in the rule.  The negation of either of the above forms of literals.
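A sketch of this generation step, under the simplifying assumptions that literals are represented as (predicate, argument-tuple) pairs and that at most one new variable is introduced at a time; the function and variable names are illustrative, not FOIL's actual code.

```python
from itertools import product

def candidate_literals(predicate_arities, rule_vars, new_var="v_new"):
    """Generate FOIL-style candidate literals for the current rule.

    predicate_arities: dict mapping predicate name -> number of arguments.
    rule_vars: variables already appearing in the rule.
    Each returned literal is (negated?, predicate, args); the negation of
    every positive literal is included as well.
    """
    positives = []
    # Q(v1, ..., vr): arguments drawn from old variables plus one new variable,
    # with at least one argument being a variable already in the rule.
    for pred, arity in predicate_arities.items():
        for args in product(list(rule_vars) + [new_var], repeat=arity):
            if any(a in rule_vars for a in args):
                positives.append((pred, args))
    # Equal(xj, xk) over variables already present in the rule.
    for xj, xk in product(rule_vars, repeat=2):
        if xj != xk:
            positives.append(("Equal", (xj, xk)))
    return [(False, p, a) for p, a in positives] + [(True, p, a) for p, a in positives]

# Example: for a rule whose variables are x, y, z with predicates Father/2 and
# Female/1 (as in the GrandDaughter example on the next slides), candidates
# include Female(z), Equal(z, x), Father(z, v_new), and their negations.
print(len(candidate_literals({"Father": 2, "Female": 1}, ["x", "y", "z"])))
```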
• 31.  The general-to-specific search in FOIL begins with the most general rule  GrandDaughter(x, y) ←  which asserts that GrandDaughter(x, y) is true of any x and y.  Now suppose that among the candidate literals generated for this rule FOIL greedily selects Father(y, z) as the most promising, leading to the more specific rule  GrandDaughter(x, y) ← Father(y, z)  In generating candidate literals to further specialize this rule, FOIL will now consider all of the literals mentioned in the previous step, plus the additional literals Female(z), Equal(z, x), Equal(z, y), Father(z, w), Father(w, z), and their negations. These new literals are considered at this point because the variable z was added to the rule in the previous step. Because of this, FOIL now considers an additional new variable w.
• 32.  If FOIL at this point were to select the literal Father(z, x) and on the next iteration select the literal Female(y), this would lead to the following rule, which covers only positive examples and hence terminates the search for further specializations of the rule:  GrandDaughter(x, y) ← Father(y, z) ∧ Father(z, x) ∧ Female(y)
• 33. 5.2 Guiding the Search in FOIL  To select the most promising literal from the candidates generated at each step, FOIL considers the performance of the rule over the training data.  Assume the training data includes a simple set of assertions over the predicates GrandDaughter, Father, and Female, where we use the convention that P(x, y) can be read as "the P of x is y."  Here let us also make the closed world assumption that any literal involving the predicate GrandDaughter, Father, or Female and the constants Victor, Sharon, Bob, and Tom that is not listed in the training data can be assumed to be false.
• 34.  To select the best specialization of the current rule, FOIL considers each distinct way in which the rule variables can bind to constants in the training examples. For the initial rule  GrandDaughter(x, y) ←  the rule variables x and y are not constrained by any preconditions and may therefore bind in any combination to the four constants Victor, Sharon, Bob, and Tom.  We will use the notation {x/Bob, y/Sharon} to denote a particular variable binding; that is, a substitution mapping each variable to a constant.
• 35.  Given the four possible constants, there are 16 possible variable bindings for this initial rule. The binding {x/Victor, y/Sharon} corresponds to a positive example binding, because the training data includes the assertion GrandDaughter(Victor, Sharon).  The other 15 bindings allowed by the rule (e.g., the binding {x/Bob, y/Tom}) constitute negative evidence for the rule, because no corresponding assertion can be found in the training data.
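The binding-counting procedure can be sketched directly. The ground assertions below are assumed for illustration; they are chosen to be consistent with the bindings discussed on this slide (in particular, GrandDaughter(Victor, Sharon) being the single positive binding of the initial rule).

```python
from itertools import product

# Assumed training assertions (illustrative, consistent with the slide's
# discussion). Closed world: anything not listed is false.
facts = {
    ("GrandDaughter", "Victor", "Sharon"),
    ("Father", "Sharon", "Bob"),
    ("Father", "Tom", "Bob"),
    ("Father", "Bob", "Victor"),
    ("Female", "Sharon"),
}
constants = ["Victor", "Sharon", "Bob", "Tom"]

def bindings(rule_vars, body):
    """All substitutions of constants for rule_vars that satisfy the body.

    body is a list of (predicate, argument-names); each argument that is a
    rule variable is replaced by its bound constant before the fact lookup."""
    for values in product(constants, repeat=len(rule_vars)):
        theta = dict(zip(rule_vars, values))
        if all((pred, *[theta.get(a, a) for a in args]) in facts
               for pred, args in body):
            yield theta

# Initial rule GrandDaughter(x, y) <-  (empty body): every binding satisfies it.
all_bindings = list(bindings(["x", "y"], []))
pos = [t for t in all_bindings if ("GrandDaughter", t["x"], t["y"]) in facts]
print(len(all_bindings), len(pos))   # 16 bindings, 1 positive ({x/Victor, y/Sharon})
```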
• 36.  At each stage, the rule is evaluated based on these sets of positive and negative variable bindings, with preference given to rules that possess more positive bindings and fewer negative bindings.  As new literals are added to the rule, the sets of bindings will change. Note that if a literal is added that introduces a new variable, then the bindings for the rule will grow in length (e.g., if Father(y, z) is added to the above rule, then each original binding such as {x/Victor, y/Sharon} is extended to a lengthier binding that also specifies a value for z).
• 37.  The evaluation function used by FOIL to estimate the utility of adding a new literal is based on the numbers of positive and negative bindings covered before and after adding the new literal.  More precisely, consider some rule R, and a candidate literal L that might be added to the body of R. Let R' be the rule created by adding literal L to rule R. The value Foil_Gain(L, R) of adding L to R is defined as  Foil_Gain(L, R) = t ( log2 (p1 / (p1 + n1)) − log2 (p0 / (p0 + n0)) )
• 38.  where p0 is the number of positive bindings of rule R, n0 is the number of negative bindings of R, p1 is the number of positive bindings of rule R', and n1 is the number of negative bindings of R'.  Finally, t is the number of positive bindings of rule R that are still covered after adding literal L to R.  When a new variable is introduced into R by adding L, then any original binding is considered to be covered so long as some binding extending it is present in the bindings of R'.
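Putting the definition into code: a minimal sketch of Foil_Gain, together with assumed, illustrative counts for the step that adds Father(y, z) to the initial GrandDaughter(x, y) rule under the fact set sketched above.

```python
from math import log2

def foil_gain(p0, n0, p1, n1, t):
    """Foil_Gain(L, R) = t * ( log2(p1/(p1+n1)) - log2(p0/(p0+n0)) ).

    p0, n0: positive/negative bindings of rule R before adding literal L.
    p1, n1: positive/negative bindings of the specialized rule R'.
    t: positive bindings of R still covered (via some extension) in R'.
    """
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

# Assumed counts for adding Father(y, z) to GrandDaughter(x, y) <- ,
# under the illustrative fact set used earlier:
#   R : 16 bindings, 1 positive, 15 negative   -> p0 = 1, n0 = 15
#   R': 12 bindings, 1 positive, 11 negative   -> p1 = 1, n1 = 11
#   the single positive binding of R survives  -> t  = 1
print(round(foil_gain(1, 15, 1, 11, 1), 3))    # ~0.415
```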
• 40. Learning Recursive Rule Sets  So far we have ignored the possibility that new literals added to the rule body could refer to the target predicate itself.  However, if we include the target predicate in the input list of Predicates, then FOIL will consider it as well when generating candidate literals.  This will allow it to form recursive rules, that is, rules that use the same predicate in both the body and the head of the rule.
• 41.  For instance, recall the following rule set that provides a recursive definition of the Ancestor relation:  IF Parent(x, y) THEN Ancestor(x, y)  IF Parent(x, z) ∧ Ancestor(z, y) THEN Ancestor(x, y)  Given an appropriate set of training examples, these two rules can be learned following a trace similar to the one above for GrandDaughter. Note the second rule is among the rules that are potentially within reach of FOIL's search, provided Ancestor is included in the list Predicates that determines which predicates may be considered when generating new literals. Of course whether this particular rule would be learned or not depends on whether these particular literals outscore competing candidates during FOIL's greedy search for increasingly specific rules.
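To see what these two recursive rules compute, here is a small sketch that applies them repeatedly to a set of ground Parent facts until nothing new can be derived. The family facts are assumed purely for illustration; Parent(x, y) follows the earlier convention that y is a parent of x.

```python
# Assumed ground facts, purely for illustration.
# Parent(x, y) is read "y is a parent of x".
parent = {
    ("Sharon", "Bob"),     # Bob is a parent of Sharon
    ("Bob", "Victor"),     # Victor is a parent of Bob
}

def ancestors(parent_facts):
    """Apply the two recursive rules to a fixpoint:
         Ancestor(x, y) <- Parent(x, y)
         Ancestor(x, y) <- Parent(x, z) AND Ancestor(z, y)
    """
    ancestor = set(parent_facts)              # base (non-recursive) rule
    changed = True
    while changed:                            # recursive rule, iterated to fixpoint
        changed = False
        for (x, z) in parent_facts:
            for (z2, y) in list(ancestor):
                if z == z2 and (x, y) not in ancestor:
                    ancestor.add((x, y))
                    changed = True
    return ancestor

print(sorted(ancestors(parent)))
# [('Bob', 'Victor'), ('Sharon', 'Bob'), ('Sharon', 'Victor')]
```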
• 42. 6. INDUCTION AS INVERTED DEDUCTION  A second, quite different approach to inductive logic programming is based on the simple observation that induction is just the inverse of deduction.  In general, machine learning involves building theories that explain the observed data.  Let xi denote the ith training instance and f(xi) denote its target value. Then learning is the problem of discovering a hypothesis h such that the classification f(xi) of each training instance xi follows deductively from the hypothesis h, the description of xi, and any other background knowledge B known to the system:  (∀ ⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi)
• 43.  The expression X ⊢ Y is read "Y follows deductively from X," or alternatively "X entails Y."
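The constraint can be illustrated with a tiny, assumed example: background knowledge B, an instance description xi, and a candidate hypothesis h (a single Horn clause) are combined, and a naive check over all variable substitutions tests whether the target classification f(xi) follows deductively. The facts and the hypothesis reuse the GrandDaughter example and are assumptions for this sketch, not data from the original slides.

```python
from itertools import product

# Illustrative check of (B AND h AND x_i) |- f(x_i) for one training instance.
B  = {("Father", "Bob", "Victor"), ("Female", "Sharon")}   # background knowledge
xi = {("Father", "Sharon", "Bob")}                         # description of instance x_i
fx = ("GrandDaughter", "Victor", "Sharon")                 # target classification f(x_i)

# Candidate hypothesis h (one Horn clause over variables x, y, z):
#   GrandDaughter(x, y) <- Father(y, z) AND Father(z, x) AND Female(y)
h_head = ("GrandDaughter", ("x", "y"))
h_body = [("Father", ("y", "z")), ("Father", ("z", "x")), ("Female", ("y",))]

def entails(facts, head, body, target, constants):
    """Naive deductive check: try every substitution of constants for the
    clause variables; succeed if some grounding of the body is contained in
    the facts and the corresponding grounding of the head equals the target."""
    variables = sorted({v for _, args in [head] + body for v in args})
    for values in product(constants, repeat=len(variables)):
        theta = dict(zip(variables, values))
        ground_body = [(p, *(theta[a] for a in args)) for p, args in body]
        ground_head = (head[0], *(theta[a] for a in head[1]))
        if all(lit in facts for lit in ground_body) and ground_head == target:
            return True
    return False

# f(x_i) follows deductively from B AND h AND x_i:
print(entails(B | xi, h_head, h_body, fx, ["Victor", "Sharon", "Bob"]))  # True
```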
• 44.  The process of augmenting the set of predicates, based on background knowledge, is often referred to as constructive induction.  "Induction is, in fact, the inverse operation of deduction, and cannot be conceived to exist without the corresponding operation, so that the question of relative importance cannot arise." (Jevons 1874)  An inverse entailment operator O(B, D) takes the training data D = {⟨xi, f(xi)⟩} and background knowledge B as input and produces as output a hypothesis h satisfying the constraint above.
• 45.  This formulation subsumes the common definition of learning as finding some general concept that matches a given set of training examples (which corresponds to the special case where no background knowledge B is available).  By incorporating the notion of background information B, this formulation allows a richer definition of when a hypothesis may be said to "fit" the data. Up until now, we have always determined whether a hypothesis (e.g., a neural network) fits the data based solely on the description of the hypothesis and the data, independent of the task domain under study. In contrast, this formulation allows the domain-specific background information B to become part of the definition of "fit." In particular, h fits the training example ⟨xi, f(xi)⟩ as long as f(xi) follows deductively from B ∧ h ∧ xi.  By incorporating background information B, this formulation invites learning methods that use this background information to guide the search for h, rather than merely searching the space of syntactically legal hypotheses. The inverse resolution procedure described in the following section uses background knowledge in this fashion.