1. Bayesian Logic Programs
Kristian Kersting, Luc De Raedt
Albert-Ludwigs University Freiburg, Germany
Summer School on Relational Data Mining, 17 and 18 August 2002, Helsinki, Finland
2. Context
Real-world applications pose two challenges:
- uncertainty -> probability theory (discrete and continuous distributions), Bayesian networks
- complex, structured domains (objects, relations, functors) -> logic, logic programming (Prolog)
Bayesian networks + logic programming = Bayesian logic programs
3. Outline
- Bayesian Logic Programs
  - Examples and Language
  - Semantics and Support Networks
- Learning Bayesian Logic Programs
  - Data Cases
  - Parameter Estimation
  - Structural Learning
4. Bayesian Logic Programs
- Probabilistic models structured using logic
- Extend Bayesian networks with notions of objects and relations
- Probability density over (countably) infinitely many random variables
- Flexible discrete-time stochastic processes
- Generalize pure Prolog, Bayesian networks, dynamic Bayesian networks, dynamic Bayesian multinets, hidden Markov models, ...
5. Bayesian Networks
- One of the successes of AI
- State of the art for modelling uncertainty, in particular degrees of belief
- Advantage [Russell, Norvig 96]: "strict separation of qualitative and quantitative aspects of the world"
- Disadvantage [Breese, Ngo, Haddawy, Koller, ...]: propositional character, no notion of objects and relations among them
6. Stud farm (Jensen ´96)
- The colt John has been born recently on a stud farm.
- John suffers from a life-threatening hereditary disease carried by a recessive gene. The disease is so serious that John is put down immediately, and since the stud farm wants the gene out of production, his parents are taken out of breeding.
- What are the probabilities for the remaining horses to be carriers of the unwanted gene?
7. Bayesian networks [Pearl ´88]
[Figure: Bayesian network over the blood-type variables bt_ann, bt_brian, bt_cecily, bt_dorothy, bt_eric, bt_gwenn, bt_unknown1, bt_unknown2, bt_fred, bt_henry, bt_irene, bt_john; based on the stud farm example [Jensen ´96].]
8. Bayesian networks [Pearl ´88]
[Figure: the same network, annotated with (conditional) probabilities, e.g.:]
- P(bt_john=AA) = 0.9909
- P(bt_john=AA | bt_ann=aA) = 0.6906
- P(bt_cecily=aA | bt_john=aA) = 0.1499
CPT for bt_john (excerpt; distributions over the values (AA, aA, aa)):

bt_henry  bt_irene  P(bt_john)
AA        AA        (1.0, 0.0, 0.0)
AA        aA        (0.5, 0.5, 0.0)
AA        aa        (0.0, 1.0, 0.0)
...       ...       (0.33, 0.33, 0.33)
9. Bayesian networks (contd.)
- acyclic directed graphs
- probability distribution over a finite set {X_1, ..., X_n} of random variables:
  P(X_1, ..., X_n) = P(X_1 | Pa(X_1)) * ... * P(X_n | Pa(X_n)) = prod_i P(X_i | Pa(X_i))
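As a concrete illustration of this factorization, here is a small Python sketch for a three-node fragment of the stud farm network; all CPT numbers are made-up assumptions for illustration, not Jensen's values.

```python
# Sketch: evaluating the Bayesian-network factorization
#   P(x1,...,xn) = prod_i P(xi | Pa(xi))
# on a three-node fragment. All numbers are illustrative assumptions.

p_henry = {"AA": 0.7, "aA": 0.3}   # P(bt_henry)
p_irene = {"AA": 0.6, "aA": 0.4}   # P(bt_irene)
p_john = {                         # P(bt_john | bt_henry, bt_irene)
    ("AA", "AA"): {"AA": 1.00, "aA": 0.00, "aa": 0.00},
    ("AA", "aA"): {"AA": 0.50, "aA": 0.50, "aa": 0.00},
    ("aA", "AA"): {"AA": 0.50, "aA": 0.50, "aa": 0.00},
    ("aA", "aA"): {"AA": 0.25, "aA": 0.50, "aa": 0.25},
}

def joint(h, i, j):
    """P(bt_henry=h, bt_irene=i, bt_john=j) via the chain-rule factorization."""
    return p_henry[h] * p_irene[i] * p_john[(h, i)][j]

print(joint("AA", "aA", "aA"))  # 0.7 * 0.4 * 0.5 = 0.14
```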
10. From Bayesian Networks to Bayesian Logic Programs
[Figure: the stud farm network over bt_ann, bt_brian, bt_cecily, bt_dorothy, bt_eric, bt_gwenn, bt_unknown1, bt_unknown2, bt_fred, bt_henry, bt_irene, bt_john.]
11. From Bayesian Networks to Bayesian Logic Programs
% apriori nodes become facts
bt_ann. bt_brian. bt_cecily. bt_unknown1. bt_unknown2.
% not yet encoded: bt_dorothy, bt_eric, bt_gwenn, bt_fred, bt_henry, bt_irene, bt_john
12. From Bayesian Networks to Bayesian Logic Programs
bt_ann. bt_brian. bt_cecily. bt_unknown1. bt_unknown2.
bt_dorothy | bt_ann, bt_brian.
bt_eric | bt_brian, bt_cecily.
bt_gwenn | bt_ann, bt_unknown2.
bt_fred | bt_unknown1, bt_ann.
% not yet encoded: bt_henry, bt_irene, bt_john
13. From Bayesian Networks to Bayesian Logic Programs
bt_ann. bt_brian. bt_cecily. bt_unknown1. bt_unknown2.
bt_dorothy | bt_ann, bt_brian.
bt_eric | bt_brian, bt_cecily.
bt_gwenn | bt_ann, bt_unknown2.
bt_fred | bt_unknown1, bt_ann.
bt_henry | bt_fred, bt_dorothy.
bt_irene | bt_eric, bt_gwenn.
% not yet encoded: bt_john
14. From Bayesian Networks to Bayesian Logic Programs
bt_ann. bt_brian. bt_cecily. bt_unknown1. bt_unknown2.
bt_dorothy | bt_ann, bt_brian.
bt_eric | bt_brian, bt_cecily.
bt_gwenn | bt_ann, bt_unknown2.
bt_fred | bt_unknown1, bt_ann.
bt_henry | bt_fred, bt_dorothy.
bt_irene | bt_eric, bt_gwenn.
bt_john | bt_henry, bt_irene.
Each conditioned statement carries a CPT, e.g. the CPT for bt_john from slide 8.
15. From Bayesian Networks to Bayesian Logic Programs
% apriori nodes
bt_ann. bt_brian. bt_cecily. bt_unknown1. bt_unknown2.
% aposteriori nodes
bt_henry | bt_fred, bt_dorothy.
bt_irene | bt_eric, bt_gwenn.
bt_fred | bt_unknown1, bt_ann.
bt_dorothy | bt_brian, bt_ann.
bt_eric | bt_brian, bt_cecily.
bt_gwenn | bt_unknown2, bt_ann.
bt_john | bt_henry, bt_irene.
Each node has a domain (e.g. finite, discrete, continuous) and an associated (conditional) probability distribution, such as the CPT for bt_john above.
16. From Bayesian Networks to Bayesian Logic Programs
% apriori nodes
bt(ann). bt(brian). bt(cecily). bt(unknown1). bt(unknown2).
% aposteriori nodes
bt(henry) | bt(fred), bt(dorothy).
bt(irene) | bt(eric), bt(gwenn).
bt(fred) | bt(unknown1), bt(ann).
bt(dorothy) | bt(brian), bt(ann).
bt(eric) | bt(brian), bt(cecily).
bt(gwenn) | bt(unknown2), bt(ann).
bt(john) | bt(henry), bt(irene).
The (conditional) probability distributions stay the same, e.g. the CPT for bt(john) given bt(henry) and bt(irene).
17. From Bayesian Networks to Bayesian Logic Programs
% ground facts / apriori
bt(ann). bt(brian). bt(cecily). bt(unknown1). bt(unknown2).
father(unknown1,fred). mother(ann,fred).
father(brian,dorothy). mother(ann,dorothy).
father(brian,eric). mother(cecily,eric).
father(unknown2,gwenn). mother(ann,gwenn).
father(fred,henry). mother(dorothy,henry).
father(eric,irene). mother(gwenn,irene).
father(henry,john). mother(irene,john).
% rules / aposteriori
bt(X) | father(F,X), bt(F), mother(M,X), bt(M).
Associated (conditional) probability distribution (excerpt):

father(F,X)  bt(F)  mother(M,X)  bt(M)  P(bt(X))
true         AA     true         AA     (1.0, 0.0, 0.0)
true         Aa     true         aa     ...
false        ...    false        ...    (0.33, 0.33, 0.33)
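To make the grounding concrete: a small illustrative Python sketch (not the authors' system) that enumerates the ground instances of the single rule above from the father/2 and mother/2 facts; every instance contributes one node plus its parents to the dependency graph.

```python
# Sketch: grounding bt(X) | father(F,X), bt(F), mother(M,X), bt(M).
# Each child X with father F and mother M yields one ground clause.

father = {"fred": "unknown1", "dorothy": "brian", "eric": "brian",
          "gwenn": "unknown2", "henry": "fred", "irene": "eric",
          "john": "henry"}
mother = {"fred": "ann", "dorothy": "ann", "eric": "cecily",
          "gwenn": "ann", "henry": "dorothy", "irene": "gwenn",
          "john": "irene"}

for x, f in father.items():
    m = mother[x]
    print(f"bt({x}) | father({f},{x}), bt({f}), mother({m},{x}), bt({m}).")
```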
18. Dependency graph = Bayesian network
[Figure: the induced Bayesian network over the ground atoms bt(ann), bt(brian), bt(cecily), bt(unknown1), bt(unknown2), bt(dorothy), bt(eric), bt(gwenn), bt(fred), bt(henry), bt(irene), bt(john), together with the father/2 and mother/2 atoms such as mother(ann,dorothy) and father(brian,dorothy).]
19. Dependency graph = Bayesian network
[Figure: the same network in propositional form, over bt_ann, ..., bt_john.]
20. Bayesian Logic Programs - a first definition
- A BLP consists of a finite set of Bayesian clauses; to each clause a conditional probability distribution (CPD) is associated.
- Proper random variables ~ LH(B), the least Herbrand model of B
- Graphical structure ~ dependency graph
- Quantitative information ~ CPDs
21. Bayesian Logic Programs - Examples
MC (pure Prolog):
% apriori nodes
nat(0).
% aposteriori nodes
nat(s(X)) | nat(X).
-> random variables nat(0), nat(s(0)), nat(s(s(0))), ...
HMM:
% apriori nodes
state(0).
% aposteriori nodes
state(s(Time)) | state(Time).
output(Time) | state(Time).
-> state(0), output(0), state(s(0)), output(s(0)), ...
DBN:
% apriori nodes
n1(0).
% aposteriori nodes
n1(s(TimeSlice)) | n2(TimeSlice).
n2(TimeSlice) | n1(TimeSlice).
n3(TimeSlice) | n1(TimeSlice), n2(TimeSlice).
-> n1(0), n2(0), n3(0), n1(s(0)), n2(s(0)), n3(s(0)), ...
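To see how finitely many clauses induce infinitely many random variables, a small illustrative Python sketch that unrolls the HMM program for the first few time points:

```python
# Sketch: unrolling the HMM Bayesian logic program. Each application of
# state(s(Time)) | state(Time). adds a new time slice, and
# output(Time) | state(Time). attaches an observation to every slice.

def unroll(horizon):
    atoms, t = [], "0"
    for _ in range(horizon):
        atoms.append(f"state({t})")
        atoms.append(f"output({t})")
        t = f"s({t})"          # next time point: s(0), s(s(0)), ...
    return atoms

print(unroll(3))
# ['state(0)', 'output(0)', 'state(s(0))', 'output(s(0))',
#  'state(s(s(0)))', 'output(s(s(0)))']
```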
22. Associated CPDs
- The CPD attached to a Bayesian clause generically represents the CPD of each ground instance of the clause (e.g. the bt(X) table from slide 17).
- But what about multiple ground instances of clauses having the same head atom?
23. Combining Rules
- Multiple ground instances of clauses having the same head atom?
% ground facts as before
% rules
bt(X) | father(F,X), bt(F).
bt(X) | mother(M,X), bt(M).
These rules yield
cpd(bt(john) | father(henry,john), bt(henry)) and cpd(bt(john) | mother(irene,john), bt(irene)),
but we need
cpd(bt(john) | father(henry,john), bt(henry), mother(irene,john), bt(irene))!
24. Combining Rules (contd.)
CR: P(A|B) and P(A|C) => P(A|B,C)
- A combining rule is any algorithm which
  - combines a set of CPDs over a common head variable
  - into a single (combined) CPD conditioned on the union of the parent sets,
  - and which has an empty output if and only if the input is empty.
- E.g. noisy-or, regression, ...
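For instance, a hedged Python sketch of noisy-or for boolean variables (illustrative code; the definition above only constrains what a combining rule must do, it does not prescribe this one):

```python
# Sketch: noisy-or combining rule for boolean variables. Ground instance i
# contributes an activation probability p_i = P(a=true | b_i=true); the
# combined CPD is P(a=true | b_1..b_n) = 1 - prod over true parents of (1 - p_i).

def noisy_or(activation_probs, parent_states):
    p_all_inhibited = 1.0
    for p, b in zip(activation_probs, parent_states):
        if b:                        # only parents that are true can activate a
            p_all_inhibited *= (1.0 - p)
    return 1.0 - p_all_inhibited     # P(a=true | parent_states)

# two ground instances with the same head (e.g. the father and mother clauses):
print(noisy_or([0.9, 0.8], [True, True]))   # 0.98
```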
25. Bayesian Logic Programs - a definition
- A BLP consists of a finite set of Bayesian clauses.
  - To each clause a conditional probability distribution is associated.
  - To each Bayesian predicate p a combining rule is associated, to combine CPDs of multiple ground instances of clauses having the same head.
- Proper random variables ~ LH(B)
- Graphical structure ~ dependency graph
- Quantitative information ~ CPDs and CRs
26. Outline
- Bayesian Logic Programs
  - Examples and Language
  - Semantics and Support Networks
- Learning Bayesian Logic Programs
  - Data Cases
  - Parameter Estimation
  - Structural Learning
27. Discrete-Time Stochastic Process
- A family of random variables over a domain X, indexed by discrete time.
- For each linearization of the partial order induced by the dependency graph, a Bayesian logic program specifies a discrete-time stochastic process.
28. Theorem of Kolmogorov
Existence and uniqueness of the probability measure:
- X: a Polish space
- E(J): the set of all non-empty, finite subsets of the index set J
- P_I, I in E(J): probability measures over X^I
- If the projective family (P_I), I in E(J), exists, then there exists a unique probability measure over X^J having the P_I as marginals.
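Spelled out (the slide's formula image is not preserved), the consistency condition defining a projective family is the standard one:

```latex
% For all finite index sets K \subseteq I \in E(J), the smaller marginal is
% the image of the larger one under the coordinate projection
% proj^I_K : X^I \to X^K:
P_K = P_I \circ \left(\mathrm{proj}^I_K\right)^{-1}
```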
29. Consistency Conditions
- Each probability measure P_I, I in E(J), is represented by a finite Bayesian network which is a subnetwork of the dependency graph over LH(B): the support network.
- (Elimination order) All stochastic processes represented by a Bayesian logic program B specify the same probability measure over LH(B).
30.-32. Support network
[Figures: three slides showing the stud farm dependency graph over the bt/1, father/2, and mother/2 ground atoms; step by step, the subnetwork relevant to a query (the query node and its ancestors) is highlighted and the rest is pruned away.]
33. Support network
- The support network of a random variable x in LH(B) is the subnetwork induced by x and its ancestors in the dependency graph.
- The support network of a finite set {x_1, ..., x_k} of variables in LH(B) is defined as the union of the support networks of the x_i.
- Computation utilizes And/Or trees.
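A minimal Python sketch (an assumed graph representation, not the authors' And/Or-tree implementation) that collects the support network of a query variable as its ancestor set:

```python
# Sketch: the support network of a variable consists of the variable and
# all of its ancestors in the dependency graph; backward chaining from
# the query collects exactly these nodes.

def support_network(query, parents):
    """parents: dict mapping each ground atom to the list of its parents."""
    nodes, stack = set(), [query]
    while stack:
        atom = stack.pop()
        if atom not in nodes:
            nodes.add(atom)
            stack.extend(parents.get(atom, []))  # apriori atoms have no parents
    return nodes

parents = {"bt(eric)": ["father(brian,eric)", "bt(brian)",
                        "mother(cecily,eric)", "bt(cecily)"]}
print(sorted(support_network("bt(eric)", parents)))
```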
34. Queries using And/Or trees
- A probabilistic query ?- Q1, ..., Qn | E1=e1, ..., Em=em. asks for the distribution P(Q1, ..., Qn | E1=e1, ..., Em=em).
- Example: ?- bt(eric).
- An Or node is proven if at least one of its successors is provable; an And node is proven if all of its successors are provable.
[Figure: the And/Or tree for bt(eric), whose proof uses father(brian,eric), bt(brian), mother(cecily,eric), bt(cecily); the clause CPDs are combined into the CPD of bt(eric).]
35. Consistency Condition (contd.)
The projective family exists, i.e. the Bayesian logic program is well-defined, if
- the dependency graph is acyclic, and
- every random variable is influenced by a finite set of random variables only.
36. Relational Character
The same rule covers different families:
% ground facts (stud farm)
bt(ann). bt(brian). bt(cecily). bt(unknown1). bt(unknown2).
father(unknown1,fred). mother(ann,fred). father(brian,dorothy). mother(ann,dorothy).
father(brian,eric). mother(cecily,eric). father(unknown2,gwenn). mother(ann,gwenn).
father(fred,henry). mother(dorothy,henry). father(eric,irene). mother(gwenn,irene).
father(henry,john). mother(irene,john).
% ground facts (a second family)
bt(susanne). bt(ralf). bt(peter). bt(uta).
father(ralf,luca). mother(susanne,luca). ...
% ground facts (a third family)
bt(petra). bt(silvester). bt(anne). bt(wilhelm). bt(beate).
father(silvester,claudien). mother(beate,claudien).
father(wilhelm,marta). mother(anne,marta). ...
% rules
bt(X) | father(F,X), bt(F), mother(M,X), bt(M).
(with the generic CPD from slide 17)
37. Bayesian Logic Programs - Summary
- First-order logic extension of Bayesian networks: constants, relations, functors
- Discrete and continuous random variables
- Ground atoms = random variables
- CPDs associated to clauses
- Dependency graph = (possibly) infinite Bayesian network
- Generalize dynamic Bayesian networks and definite clause logic (range-restricted)
38. Applications
- Probabilistic, logical
  - Description and prediction
  - Regression
  - Classification
  - Clustering
- Computational Biology (APrIL IST-2001-33053)
- Web Mining
- Query approximation
- Planning, ...
39. Other frameworks
- Probabilistic Horn Abduction [Poole 93]
- Distributional Semantics (PRISM) [Sato 95]
- Stochastic Logic Programs [Muggleton 96; Cussens 99]
- Relational Bayesian Nets [Jaeger 97]
- Probabilistic Logic Programs [Ngo, Haddawy 97]
- Object-Oriented Bayesian Nets [Koller, Pfeffer 97]
- Probabilistic Frame-Based Systems [Koller, Pfeffer 98]
- Probabilistic Relational Models [Koller 99]
40. Outline
- Bayesian Logic Programs
  - Examples and Language
  - Semantics and Support Networks
- Learning Bayesian Logic Programs
  - Data Cases
  - Parameter Estimation
  - Structural Learning
41. Learning Bayesian Logic Programs
[Figure: data cases plus background knowledge are fed into the learning algorithm, which outputs a Bayesian logic program, e.g. the clause A | E, B. together with its CPT over the states of E and B.]
42. Why Learning Bayesian Logic Programs?
Of interest to different communities: learning within Bayesian logic programs lies between Inductive Logic Programming and learning within Bayesian networks, and can draw on scoring functions, pruning techniques, theoretical insights, ... from both.
43. What is the data about?
A data case is a partially observed joint state of a finite, nonempty subset of the random variables LH(B).
44. Learning Task
- Given:
  - a set D = {D_1, ..., D_N} of data cases
  - a Bayesian logic program B
- Goal: for each Bayesian clause c of B, the parameters cpd(c) that best fit the given data.
45. Parameter Estimation (contd.)
- "best fit" ~ ML estimation: find parameters λ* = argmax_λ P(D | B, λ),
- where the hypothesis space is spanned by the product space over the possible values of the parameters of B.
46. Parameter Estimation (contd.)
- Assumption: D_1, ..., D_N are independently sampled from identical distributions (e.g. totally separated families), so the likelihood factorizes:
  P(D | B, λ) = P(D_1 | B, λ) * ... * P(D_N | B, λ)
47. Parameter Estimation (contd.)
The support network of the variables occurring in a data case D_l is an ordinary Bayesian network, and P(D_l | B, λ) is a marginal of the distribution it defines.
48. Parameter Estimation (contd.)
- Reduced to a problem within Bayesian networks: given structure, partially observed random variables.
- EM [Dempster, Laird, Rubin ´77], [Lauritzen ´91]
- Gradient Ascent [Binder, Koller, Russell, Kanazawa ´97], [Jensen ´99]
49. Decomposable CRs
- We estimate the parameters of the clauses, not of the support network.
- For a single ground instance of a Bayesian clause, the node's CPD is the clause's CPD; for multiple ground instances of the same clause, the node's CPD is obtained from the shared clause CPD via the combining rule.
50. Gradient Ascent
Goal: computation of the gradient of the log-likelihood with respect to the clause parameters.
51. Gradient Ascent
The derivative with respect to a parameter of a Bayesian clause decomposes into a sum of derivatives with respect to the corresponding parameters of its Bayesian ground clauses, because ground instances of the same clause share parameters.
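The formula images are missing here; as a hedged reconstruction, combining the standard Bayesian-network gradient [Binder et al. ´97] with the parameter sharing just described gives, for a CPT entry cpd(c)(h | b) of Bayesian clause c:

```latex
\frac{\partial \ln P(D \mid \lambda)}
     {\partial\, \mathrm{cpd}(c)(h \mid \mathbf{b})}
= \sum_{l=1}^{N}\;
  \sum_{\substack{\text{ground instances}\\ c\theta \text{ of } c}}
  \frac{P\bigl(\mathrm{head}(c\theta)=h,\ \mathrm{body}(c\theta)=\mathbf{b} \mid D_l\bigr)}
       {\mathrm{cpd}(c)(h \mid \mathbf{b})}
```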
52. Gradient Ascent
The posterior probabilities required by the gradient are obtained from any Bayesian network inference engine run on the support network.
53. Algorithm
[Figure: pseudocode of the gradient-ascent parameter estimation algorithm.]
54. Expectation-Maximization
1. Initialize the parameters.
2. E-step and M-step: compute the expected counts for each clause and treat the expected counts as counts.
3. If not converged, iterate to 2.
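A minimal Python sketch of one EM iteration under a decomposable combining rule; the data representation and the posterior(...) inference call (a Bayesian-network query on the support network) are illustrative assumptions, not the authors' implementation.

```python
# Sketch of one EM iteration for a single Bayesian clause c:
# E-step: expected counts of (head, body) joint states, pooled over all
# ground instances of c (they share parameters); M-step: normalize.
from collections import defaultdict

def em_step(ground_instances, data_cases, posterior):
    """posterior(inst, d) yields (head_state, body_state, prob) triples
    computed by Bayesian-network inference on the support network."""
    counts = defaultdict(float)
    for d in data_cases:
        for inst in ground_instances:
            for head, body, p in posterior(inst, d):
                counts[(head, body)] += p          # expected counts
    body_totals = defaultdict(float)
    for (h, b), n in counts.items():
        body_totals[b] += n
    # M-step: cpd(c)(h | b) = expected N(h, b) / expected N(b)
    return {(h, b): n / body_totals[b] for (h, b), n in counts.items()}
```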
55. Experimental Evidence
- [Koller, Pfeffer ´97]: the support network is a good approximation
- [Binder et al. ´97]: equality constraints speed up learning
- Setup: 100 data cases, constant step size
- Estimation of the means: 13 iterations
- Estimation of the weights: sum = 1.0
The model used:

m(M,X)  f(F,X)  pdf(h(X) | h(M), h(F))
false   false   N(165, s)
true    false   N(165, s)
false   true    N(165, s)
true    true    N(0.5*h(M) + 0.5*h(F), s)
56. Outline
- Bayesian Logic Programs
  - Examples and Language
  - Semantics and Support Networks
- Learning Bayesian Logic Programs
  - Data Cases
  - Parameter Estimation
  - Structural Learning
57. Structural Learning
- Combination of Inductive Logic Programming and Bayesian network learning
- Datalog fragment of Bayesian logic programs (no functors)
- Intensional Bayesian clauses
58. Idea - CLAUDIEN
Learning from interpretations:
- all data cases are Herbrand interpretations
- a hypothesis should reflect what is in the data
Probabilistic extension:
- all data cases are partially observed joint states of Herbrand interpretations
- all hypotheses have to be (logically) true in all data cases
59. What is the data about?
[Figure: example data cases.]
60. CLAUDIEN - Learning From Interpretations
- D: the set of data cases
- L: the set of all clauses that can be part of hypotheses
- A hypothesis H (a subset of L) is (logically) valid iff H is true in all data cases.
- H is a logical solution iff H is a logically maximally general valid hypothesis.
- H is a probabilistic solution iff H is (logically) valid and the Bayesian network induced by H on each data case is acyclic.
61. Learning Task
- Given:
  - a set D of data cases
  - a set of Bayesian logic programs (the hypothesis space)
  - a scoring function score_D
- Goal: the probabilistic solution that matches the data best according to score_D.
62. Algorithm
[Figure: pseudocode of the structural learning algorithm.]
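A minimal sketch of the kind of refinement operator the example on the following slides steps through: specialize a Bayesian clause by adding one body literal from a language bias (the literal pool below is an illustrative assumption).

```python
# Sketch: one-literal refinements of a Bayesian clause, represented as
# (head, body) pairs of strings. Each refinement is then scored (e.g. by
# log-likelihood after parameter estimation) and checked for acyclicity.

def refinements(clause, candidate_literals):
    head, body = clause
    for lit in candidate_literals:
        if lit not in body:
            yield (head, body + (lit,))

initial = ("mc(X)", ("m(M,X)",))
for c in refinements(initial, ["mc(M)", "pc(M)", "mc(X)", "pc(X)"]):
    print(c)   # ('mc(X)', ('m(M,X)', 'mc(M)')), ...
```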
63. Example
Original Bayesian logic program:
mc(X) | m(M,X), mc(M), pc(M).
pc(X) | f(F,X), mc(F), pc(F).
bt(X) | mc(X), pc(X).
Data cases, e.g.:
{m(ann,john)=true, pc(ann)=a, mc(ann)=?, f(eric,john)=true, pc(eric)=b, mc(eric)=a, mc(john)=ab, pc(john)=a, bt(john)=?}
...
[Figure: induced network over mc(ann), pc(ann), mc(eric), pc(eric), m(ann,john), f(eric,john), mc(john), pc(john), bt(john).]

64. Example
Initial hypothesis:
mc(X) | m(M,X).
pc(X) | f(F,X).
bt(X) | mc(X).

65. Example
[Figure: the network induced by the initial hypothesis on the data case.]

66. Example
Refinement:
mc(X) | m(M,X).
pc(X) | f(F,X).
bt(X) | mc(X), pc(X).

67. Example
Refinement:
mc(X) | m(M,X), mc(X).
pc(X) | f(F,X).
bt(X) | mc(X), pc(X).

68. Example
Refinement:
mc(X) | m(M,X), pc(X).
pc(X) | f(F,X).
bt(X) | mc(X), pc(X).

69. Example
... (and so on through the refinement lattice)
70. Properties
- All relevant random variables are known.
- First-order equivalent of the Bayesian network setting.
- Hypotheses postulate true regularities in the data.
- Logical solutions serve as initial hypotheses.
- Highlights background knowledge.
71. Example Experiments
Original Bayesian logic program:
mc(X) | m(M,X), mc(M), pc(M).
pc(X) | f(F,X), mc(F), pc(F).
bt(X) | mc(X), pc(X).
- Data: sampled from 2 families, 1000 samples each
- Score: log-likelihood
- Goal: learn the definition of bt
CLAUDIEN finds, among others:
mc(X) | m(M,X).
pc(X) | f(F,X).
The original Bloodtype program obtains the highest score:
mc(X) | m(M,X), mc(M), pc(M).
pc(X) | f(F,X), mc(F), pc(F).
bt(X) | mc(X), pc(X).
72. Conclusion
- EM-based and gradient-based methods for ML parameter estimation
- Link between ILP and learning Bayesian networks
- CLAUDIEN setting used to define and traverse the search space
- Bayesian network scores used to evaluate hypotheses
73. Thanks!