Machine Learning:
         Inductive Logic Programming
                         Dr Valentina Plekhanova
                        University of Sunderland, UK




                        Formalisms in Inductive Learning


    Learning the attribute descriptions, e.g. Decision Trees
    Learning the first-order relational descriptions, e.g. ILP








                                              ILP: a Framework


•   Theoretical Setting: Inductive Logic Programming
•   Task: Concept Learning
•   Methods: Inductive Learning
•   Algorithms: FOIL








Inductive Logic Programming


  Inductive Logic Programming: ILP = I ∩ LP, where I stands
  for induction in machine learning and LP stands for logic
  programming.








                          Inductive Concept Learning:
                                            Definition
  Given a set E of positive and negative examples of a concept C,
  find a hypothesis H, expressed in a given concept description
  language L, such that
     every positive example ε ∈ Ε⁺ is covered by H,
     no negative example ε ∈ Ε⁻ is covered by H
     (H is "complete and consistent").
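
  In symbols (a standard formalization; covers(H, ε), read "example ε is covered
  by H", is assumed notation that does not appear on the slide):

     \forall \varepsilon \in E^{+}:\ covers(H, \varepsilon)  \qquad \text{(completeness)}

     \forall \varepsilon \in E^{-}:\ \neg\, covers(H, \varepsilon)  \qquad \text{(consistency)}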








                         Inductive Logic Programming:
                                             a Method
    Reminder: Induction means reasoning from the specific
   to the general.
   In the case of inductive learning from examples, the
   learner is given some examples from which general
   rules, or a theory underlying the examples, are
   derived.
   Inductive inference can involve the use of
   background knowledge to construct a hypothesis
   that agrees with a given set of examples.





Definitions
 Clause - a component of a (complex) sentence, with its own
 subject and predicate.
       Predicate - the part of a statement which says something about
       the subject, e.g. "is short" in "Life is short".
       Subject - (contrasted with predicate) the word(s) in a sentence
       about which something is predicated (contrasted with
       object), e.g. "Life".
 Inference - the process of inferring - reaching a conclusion or
 forming an opinion from facts or reasoning.






                                                                         ILP
In an ILP problem the task is to define, from given examples, an
unknown relation (i.e. the target predicate) in terms of (itself
and) known relations from background knowledge.
In ILP, the training examples, the background knowledge and
the induced hypotheses are all expressed in a logic program
form, with additional restrictions imposed on each of the three
languages.
For example, training examples are typically represented as
ground facts of the target predicate, and most often background
knowledge is restricted to be of the same form.






                              First-Order Predicate Logic
- a formal framework for describing and reasoning about objects, their parts,
and relations among the objects and/or the parts. An important subset of first-
order logic is Horn clauses: grandparent (X,Y) ← parent (X,Z), parent (Z,Y)
 where
 grandparent (X,Y) - the head of the clause, or postcondition,
 parent (X,Z), parent (Z,Y) - the body of the clause, or precondition,
 grandparent, parent - predicates; a literal is any predicate or its negation,
 (X,Y), (X,Z), (Z,Y) - arguments,
 X, Y, Z - variables,
the comma between predicates means "conjunction"; ← means
                       IF (body of clause) THEN (head of clause)
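
As an illustration (a minimal sketch, not from the slides; the fact set and the
helper function below are assumed example data), the grandparent rule can be
evaluated over ground parent facts by finding a binding for the shared variable Z:

    # Ground facts for parent(X, Z), stored as (parent, child) pairs (assumed data).
    parent_facts = {("ann", "mary"), ("ann", "tom"), ("tom", "eve"), ("tom", "ian")}

    def grandparent(x, y):
        # grandparent(X,Y) <- parent(X,Z), parent(Z,Y)
        # The comma is a conjunction: both body literals must hold for some Z.
        return any((x, z) in parent_facts and (z, y) in parent_facts
                   for (_, z) in parent_facts)

    print(grandparent("ann", "eve"))   # True: ann -> tom -> eve
    print(grandparent("tom", "mary"))  # False: no such chain in the facts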





Reasons for ILP
 Any set of first-order Horn clauses can be interpreted as a
 program in the logic programming language PROLOG, so
 learning them (Horn clauses) is often called inductive logic
 programming (ILP).
 ILP is a convenient learning framework for two reasons:
   ILP is based on sets of IF-THEN rules (logic) - one of the
 most expressive and human-readable representations for learned
 hypotheses.
   ILP can be viewed as automatically inferring PROLOG
 programs from examples. PROLOG is a programming language
 in which programs are expressed as collections of Horn clauses.




                                  ILP Problem: an Example

   • Illustration of the ILP task on a simple problem of learning
     family relations.
   • Example – an ILP Problem
   • The task is to define the target relation daughter (x,y), which
     states that person x is a daughter of person y, in terms of the
     background knowledge relations female and parent.
   • These relations are given in the following table.
   • There are two positive ⊕ and two negative Θ examples
     of the target relation.







                                              A Simple ILP Problem:
                                      Learning the daughter Relation
  Training Examples              Background Knowledge       Background Knowledge

  daughter (mary, ann)   ⊕       parent (ann, mary)         female (ann)
  daughter (eve, tom)    ⊕       parent (ann, tom)          female (mary)
  daughter (tom, ann)    Θ       parent (tom, eve)          female (eve)
  daughter (eve, ann)    Θ       parent (tom, ian)

In the hypothesis language of Horn clauses it is possible to formulate the
following definition of the target relation:
                         daughter (x,y) ← female (x), parent (y,x)
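
As a quick check (a minimal sketch, not part of the slides; the Python encoding of
the table is an assumption), the induced clause can be tested against the training
examples to confirm it is complete and consistent:

    # Background knowledge from the table above, encoded as ground facts.
    parent = {("ann", "mary"), ("ann", "tom"), ("tom", "eve"), ("tom", "ian")}
    female = {"ann", "mary", "eve"}

    def covers(x, y):
        # Body of the induced clause: daughter(x,y) <- female(x), parent(y,x).
        return x in female and (y, x) in parent

    positives = [("mary", "ann"), ("eve", "tom")]
    negatives = [("tom", "ann"), ("eve", "ann")]

    print(all(covers(x, y) for x, y in positives))      # True  -> complete
    print(not any(covers(x, y) for x, y in negatives))  # True  -> consistent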




                                                              FOIL Algorithm
INPUT: B, E+, E-, H = ∅
    R = p(X1,…,Xn) ←
    WHILE E+ ≠ ∅ DO
          WHILE there are examples in E-, i.e. ∃ e ∈ E-, that are
          still covered by H ∪ {R}  DO
                find the best literal L (via FOIL_Gain) and add it to R
                R = R ← L   (i.e. append L to the body of R)
                E- = E- \ {e ∈ E- | e does not satisfy R}
          END
    H = H ∪ {R}
    E+ = E+ \ {e ∈ E+ | e is covered by H}
     (i.e. the examples in E+ that are covered by B together with H are removed)
    END
OUTPUT: H
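
The covering structure of the algorithm, written as a sketch (the helper functions
candidate_literals, covers and foil_gain are assumptions supplied by the caller;
this is a simplification, not Quinlan's full FOIL with variable bindings):

    def foil(pos, neg, candidate_literals, covers, foil_gain):
        hypothesis = []                               # H = empty set of rules
        pos = set(pos)
        while pos:                                    # WHILE E+ is not empty
            rule = []                                 # R = p(X1,...,Xn) <-  (empty body)
            covered_neg = set(neg)
            while covered_neg:                        # WHILE negatives are still covered
                # pick the candidate literal with the highest FOIL gain
                best = max(candidate_literals(rule),
                           key=lambda lit: foil_gain(lit, rule, pos, covered_neg))
                rule.append(best)                     # R = R <- L
                covered_neg = {e for e in covered_neg if covers(rule, e)}
            hypothesis.append(rule)                   # H = H u {R}
            pos = {e for e in pos if not covers(rule, e)}   # drop positives now covered
        return hypothesis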






                                                              FOIL Algorithm
   Consider some rule R, and a candidate literal L that might be
   added to the body of R. Let R′ be the new rule created by adding
   literal L to R.
   The value Foil_Gain (L, R) of adding L to R is defined as
      Foil_Gain (L, R) = t { log2 [p1 / (p1+n1)] − log2 [p0 / (p0+n0)] }
   where p0 is the number of positive bindings of rule R, n0 is the
   number of negative bindings of R, and p1 and n1 are the corresponding
   counts for rule R′.
   t is the number of positive bindings of rule R that are still
   covered after adding literal L to R. The Foil_Gain value has an
   interpretation in terms of information theory: −log2 [p0 / (p0+n0)] is
   the number of bits needed to encode the classification of a positive
   binding of R.
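
A direct transcription of this formula (a sketch; the binding counts are assumed to
have been computed by the caller, and the example call uses the counts p0=2, n0=2,
p1=2, n1=1 with an assumed t=2):

    from math import log2

    def foil_gain(p0, n0, p1, n1, t):
        # Information content of a positive binding before and after adding literal L.
        info_before = -log2(p0 / (p0 + n0))
        info_after = -log2(p1 / (p1 + n1))
        # t positive bindings remain covered; the gain is the information saved on them.
        return t * (info_before - info_after)

    print(round(foil_gain(p0=2, n0=2, p1=2, n1=1, t=2), 2))  # 0.83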




                                   FOIL Algorithm: an Example
  Rule1: daughter (X,Y) ←
  T1: (Mary, Ann)  +            p1 = 2              L1 = female (X)
      (Eve, Tom)   +            n1 = 2              Foil_Gain_L1 = …
      (Tom, Ann)   -            t1 = …              L2 = parent (Y,X)
      (Eve, Ann)   -                                Foil_Gain_L2 = …
  Rule2: daughter (X,Y) ← female (X)
  T2: (Mary, Ann)  +            p2 = 2              L2 = parent (Y,X)
      (Eve, Tom)   +            n2 = 1              Foil_Gain_L2 = …
      (Eve, Ann)   -            t2 = …
  Rule3: daughter (X,Y) ← female (X), parent (Y,X)
  T3: (Mary, Ann)  +            p3 = 2              Final rule - …?
      (Eve, Tom)   +            n3 = 0
                                t3 = …
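
  An illustrative computation from the counts above (not on the slide, which leaves
  the values as an exercise; t = 2 is assumed at each step, i.e. both positive
  bindings remain covered):

     \mathrm{Foil\_Gain}\big(female(X),\, Rule1\big) = 2\left(\log_2 \tfrac{2}{3} - \log_2 \tfrac{2}{4}\right) \approx 0.83

     \mathrm{Foil\_Gain}\big(parent(Y,X),\, Rule2\big) = 2\left(\log_2 \tfrac{2}{2} - \log_2 \tfrac{2}{3}\right) \approx 1.17

  After the second step Rule3 covers both positive examples and no negative example
  (n3 = 0), so daughter (X,Y) ← female (X), parent (Y,X) can serve as the final rule.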





Complete & Consistent

 Prior satisfiability: Before any hypothesis is taken into account,
 no negative example may be derivable from the background knowledge
 alone, i.e. the negative examples must not conflict with the
 background knowledge.
 Posterior satisfiability: No negative example can be derived from
 the Hypothesis together with the Background Knowledge.
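
 Written formally (a standard formulation of these conditions; B denotes the
 background knowledge and H the hypothesis as on the earlier slides, the rest of
 the notation is assumed):

    \forall e \in E^{-}:\; B \not\models e  \qquad \text{(prior satisfiability)}

    \forall e \in E^{-}:\; B \wedge H \not\models e  \qquad \text{(posterior satisfiability / consistency)}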








                             Complete & Consistent


    Prior Necessity: Some positive examples may already follow from
    the background knowledge alone, but not all of them may do so.
    Posterior Sufficiency: All positive examples must be covered by
    the background knowledge together with the hypothesis. If the
    hypothesis satisfies this condition we call it complete.
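
    Formally (the same standard formulation as above; the notation is assumed):

       \exists e \in E^{+}:\; B \not\models e  \qquad \text{(prior necessity)}

       \forall e \in E^{+}:\; B \wedge H \models e  \qquad \text{(posterior sufficiency / completeness)}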








                                                   ID3 vs FOIL
ID3:
 The learner learns attribute descriptions.
 There are limitations, e.g. a limited representational formalism.
 There is limited capability for taking the available background
 knowledge into account.
FOIL:
 Objects can be described structurally, i.e. in terms of their
 components and the relations among the components.
 The learner learns first-order relational descriptions.
 The given relations constitute the background knowledge.





