BLPs Presentation Transcript

  • 1. Bayesian Logic Programs Kristian Kersting, Luc De Raedt Albert-Ludwigs University Freiburg, Germany Summer School on Relational Data Mining 17 and 18 August 2002, Helsinki, Finland
  • 2. Context: Real-world applications involve uncertainty and complex, structured domains. Logic provides objects, relations, and functors; probability theory handles discrete and continuous uncertainty. Bayesian networks + Logic Programming (Prolog) = Bayesian logic programs.
  • 3. Outline
    • Bayesian Logic Programs
      • Examples and Language
      • Semantics and Support Networks
    • Learning Bayesian Logic Programs
      • Data Cases
      • Parameter Estimation
      • Structural Learning
  • 4. Bayesian Logic Programs
    • Probabilistic models structured using logic
    • Extend Bayesian networks with notions of objects and relations
    • Probability density over (countably) infinitely many random variables
    • Flexible discrete-time stochastic processes
    • Generalize pure Prolog, Bayesian networks, dynamic Bayesian networks, dynamic Bayesian multinets, hidden Markov models,...
  • 5. Bayesian Networks
    • One of the successes of AI
    • State-of-the-art approach to modelling uncertainty, in particular degrees of belief
    • Advantage [Russell, Norvig 96]:
      • "strict separation of qualitative and quantitative aspects of the world"
    • Disadvantage [Breese, Ngo, Haddawy, Koller, ...]:
      • Propositional character, no notion of objects and relations among them
  • 6. Stud farm (Jensen '96)
    • The colt John has been born recently on a stud farm.
    • John suffers from a life-threatening hereditary disease carried by a recessive gene. The disease is so serious that John is removed instantly, and since the stud farm wants the gene out of production, his parents are taken out of breeding.
    • What are the probabilities for the remaining horses to be carriers of the unwanted gene?
  • 7. Bayesian networks [Pearl '88] [Figure: pedigree Bayesian network with nodes bt_ann, bt_brian, bt_cecily, bt_dorothy, bt_eric, bt_gwenn, bt_unknown1, bt_unknown2, bt_fred, bt_henry, bt_irene, bt_john; based on the stud farm example [Jensen '96]]
  • 8. Bayesian networks [Pearl '88] [Figure: the stud farm network as on slide 7, annotated with its (conditional) probability distributions.] Example inferences: P(bt_cecily=aA | bt_john=aA) = 0.1499, P(bt_john=AA | bt_ann=aA) = 0.6906, P(bt_john=AA) = 0.9909. CPD for bt_john:

    bt_irene | bt_henry | P(bt_john)
    AA       | AA       | (0.33, 0.33, 0.33)
    ...      | ...      | ...
    AA       | aa       | (0.0, 1.0, 0.0)
    aA       | aa       | (0.5, 0.5, 0.0)
    aa       | aa       | (1.0, 0.0, 0.0)
  • 9. Bayesian networks (contd.)
    • acyclic directed graphs
    • probability distribution over a finite set of random variables: P(X1, ..., Xn) = ∏_i P(Xi | Pa(Xi))
  • 10. From Bayesian Networks to Bayesian Logic Programs [Figure: the stud farm network with nodes bt_ann, bt_brian, bt_cecily, bt_dorothy, bt_eric, bt_gwenn, bt_unknown1, bt_unknown2, bt_fred, bt_henry, bt_irene, bt_john]
  • 11. From Bayesian Networks to Bayesian Logic Programs bt_ann. bt_brian. bt_cecily. bt_dorothy bt_eric bt_gwenn bt_unknown2. bt_unknown1. bt_fred bt_henry bt_irene bt_john
  • 12. From Bayesian Networks to Bayesian Logic Programs bt_ann. bt_brian. bt_cecily. bt_dorothy | bt_ann, bt_brian. bt_eric | bt_brian, bt_cecily. bt_gwenn | bt_ann, bt_unknown2. bt_unknown2. bt_unknown1. bt_fred | bt_unknown1, bt_ann. bt_henry bt_irene bt_john
  • 13. From Bayesian Networks to Bayesian Logic Programs bt_ann. bt_brian. bt_cecily. bt_dorothy | bt_ann, bt_brian. bt_eric | bt_brian, bt_cecily. bt_gwenn | bt_ann, bt_unknown2. bt_unknown2. bt_unknown1. bt_fred | bt_unknown1, bt_ann. bt_henry | bt_fred, bt_dorothy. bt_irene | bt_eric, bt_gwenn. bt_john
  • 14. From Bayesian Networks to Bayesian Logic Programs bt_ann. bt_brian. bt_cecily. bt_dorothy | bt_ann, bt_brian. bt_eric | bt_brian, bt_cecily. bt_gwenn | bt_ann, bt_unknown2. bt_unknown2. bt_unknown1. bt_fred | bt_unknown1, bt_ann. bt_henry | bt_fred, bt_dorothy. bt_irene | bt_eric, bt_gwenn. bt_john | bt_henry, bt_irene. CPD for bt_john:

    bt_irene | bt_henry | P(bt_john)
    AA       | AA       | (0.33, 0.33, 0.33)
    ...      | ...      | ...
    AA       | aa       | (0.0, 1.0, 0.0)
    aA       | aa       | (0.5, 0.5, 0.0)
    aa       | aa       | (1.0, 0.0, 0.0)
  • 15. From Bayesian Networks to Bayesian Logic Programs % apriori nodes bt_ann. bt_brian. bt_cecily. bt_unknown1. bt_unknown2. % aposteriori nodes bt_henry | bt_fred, bt_dorothy. bt_irene | bt_eric, bt_gwenn. bt_fred | bt_unknown1, bt_ann. bt_dorothy | bt_brian, bt_ann. bt_eric | bt_brian, bt_cecily. bt_gwenn | bt_unknown2, bt_ann. bt_john | bt_henry, bt_irene. Domain: e.g. finite, discrete, continuous. (Conditional) probability distribution:

    bt_irene | bt_henry | P(bt_john)
    AA       | AA       | (0.33, 0.33, 0.33)
    ...      | ...      | ...
    AA       | aa       | (0.0, 1.0, 0.0)
    aA       | aa       | (0.5, 0.5, 0.0)
    aa       | aa       | (1.0, 0.0, 0.0)
  • 16. From Bayesian Networks to Bayesian Logic Programs % apriori nodes bt(ann). bt(brian). bt(cecily). bt(unknown1). bt(unknown2). % aposteriori nodes bt(henry) | bt(fred), bt(dorothy). bt(irene) | bt(eric), bt(gwenn). bt(fred) | bt(unknown1), bt(ann). bt(dorothy) | bt(brian), bt(ann). bt(eric) | bt(brian), bt(cecily). bt(gwenn) | bt(unknown2), bt(ann). bt(john) | bt(henry), bt(irene). (Conditional) probability distribution:

    bt(irene) | bt(henry) | P(bt(john))
    AA        | AA        | (0.33, 0.33, 0.33)
    ...       | ...       | ...
    AA        | aa        | (0.0, 1.0, 0.0)
    aA        | aa        | (0.5, 0.5, 0.0)
    aa        | aa        | (1.0, 0.0, 0.0)
  • 17. From Bayesian Networks to Bayesian Logic Programs % ground facts / apriori bt(ann). bt(brian). bt(cecily). bt(unknown1). bt(unknown2). father(unknown1,fred). mother(ann,fred). father(brian,dorothy). mother(ann,dorothy). father(brian,eric). mother(cecily,eric). father(unknown2,gwenn). mother(ann,gwenn). father(fred,henry). mother(dorothy,henry). father(eric,irene). mother(gwenn,irene). father(henry,john). mother(irene,john). % rules / aposteriori bt(X) | father(F,X), bt(F), mother(M,X), bt(M). (Conditional) probability distribution, e.g.:

    father(F,X) | bt(F) | mother(M,X) | bt(M) | P(bt(X))
    false       | AA    | false       | AA    | (0.33, 0.33, 0.33)
    ...         | ...   | ...         | ...   | ...
    true        | Aa    | true        | aa    | (1.0, 0.0, 0.0)

    (A grounding sketch follows below.)
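To make the grounding step concrete, here is a minimal Python sketch (ours, not from the slides) of how the single bt/1 rule and the father/mother facts determine the parents of every random variable in the dependency graph:

    # Minimal sketch (ours, not the authors' code): ground the rule
    #   bt(X) | father(F,X), bt(F), mother(M,X), bt(M).
    # against the pedigree facts to recover the parents of each bt/1 atom.
    father = {"fred": "unknown1", "dorothy": "brian", "eric": "brian",
              "gwenn": "unknown2", "henry": "fred", "irene": "eric",
              "john": "henry"}
    mother = {"fred": "ann", "dorothy": "ann", "eric": "cecily",
              "gwenn": "ann", "henry": "dorothy", "irene": "gwenn",
              "john": "irene"}

    def parents(x):
        """Parents of the random variable bt(x) in the dependency graph."""
        if x not in father:          # apriori node: a bare ground fact bt(x).
            return []
        f, m = father[x], mother[x]  # one ground instance of the rule
        return [f"father({f},{x})", f"bt({f})", f"mother({m},{x})", f"bt({m})"]

    for x in ["ann", "fred", "john"]:
        print(f"bt({x}) <-", parents(x) or "apriori")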
  • 18. Dependency graph = Bayesian network [Figure: ground dependency graph over the bt/1, mother/2, and father/2 atoms of the stud farm, e.g. bt(john) with parents father(henry,john), bt(henry), mother(irene,john), bt(irene)]
  • 19. Dependency graph = Bayesian network [Figure: the same graph restricted to the bt_ nodes, i.e. the original Bayesian network of slide 7]
  • 20. Bayesian Logic Programs - a first definition
    • A BLP consists of
      • a finite set of Bayesian clauses.
      • To each clause c in B a conditional probability distribution cpd(c) is associated.
    • Proper random variables ~ LH(B), the least Herbrand model
    • graphical structure ~ dependency graph
    • Quantitative information ~ CPDs
  • 21. Bayesian Logic Programs - Examples
    • pure Prolog: % apriori nodes nat(0). % aposteriori nodes nat(s(X)) | nat(X). ...
    • MC / HMM: % apriori nodes state(0). % aposteriori nodes state(s(Time)) | state(Time). output(Time) | state(Time).
    • DBN: % apriori nodes n1(0). % aposteriori nodes n1(s(TimeSlice)) | n2(TimeSlice). n2(TimeSlice) | n1(TimeSlice). n3(TimeSlice) | n1(TimeSlice), n2(TimeSlice).
    [Figures: the induced ground networks nat(0) → nat(s(0)) → nat(s(s(0))) → ...; state(0) → state(s(0)) → ... with output(t) attached to each state(t); and the unrolled n1/n2/n3 network per time slice. An unrolling sketch follows below.]
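To make the unrolling explicit, a small Python sketch (ours; the slides contain no code) that generates the ground dependency edges of the HMM program for the first T time points:

    # Sketch (ours): unroll  state(0).  state(s(Time)) | state(Time).
    #                        output(Time) | state(Time).
    # into its ground dependency network for T time points.
    def s(term):                 # the successor functor s/1, as a string
        return f"s({term})"

    def unroll_hmm(T):
        edges, t = [], "0"
        for _ in range(T):
            edges.append((f"state({t})", f"output({t})"))    # output(t) | state(t)
            edges.append((f"state({t})", f"state({s(t)})"))  # state(s(t)) | state(t)
            t = s(t)
        return edges

    for parent, child in unroll_hmm(2):
        print(f"{child} | {parent}.")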
  • 22. Associated CPDs
    • The CPD attached to a Bayesian clause generically represents the CPD for each ground instance of the corresponding clause. [Table: the generic CPD for bt(X) from slide 17.]
    • But what about multiple ground instances of clauses having the same head atom?
  • 23. Combining Rules
    • Multiple ground instances of clauses having the same head atom?
    • % ground facts as before % rules bt(X) | father(F,X), bt(F). bt(X) | mother(M,X), bt(M).
    • The two rules yield cpd(bt(john) | father(henry,john), bt(henry)) and cpd(bt(john) | mother(irene,john), bt(irene)), but what we need is cpd(bt(john) | father(henry,john), bt(henry), mother(irene,john), bt(irene))!
  • 24. Combining Rules (contd.)
    • A combining rule CR maps, e.g., P(A|B) and P(A|C) onto the combined P(A|B,C).
    • Any algorithm which
        • combines a set of CPDs
        • into the (combined) CPD over the head and all parents, and
        • has an empty output if and only if the input is empty
    • E.g. noisy-or, regression, ... (see the sketch after this list)
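Noisy-or is named on the slide; the following Python sketch (implementation ours) combines the per-cause CPDs P(A=true | B_i=true) into one CPD over all parents and, as required above, returns an empty output exactly for empty input:

    # Sketch (ours): noisy-or combining rule for a boolean head A with causes B_i.
    from itertools import product

    def noisy_or(p_causes):
        """p_causes: [P(A=true | B_i=true)] for each ground clause instance.
        Returns the combined CPD {joint parent state: P(A=true | state)}."""
        if not p_causes:
            return {}                      # empty input -> empty output
        cpd = {}
        for bits in product([False, True], repeat=len(p_causes)):
            p_fail = 1.0
            for active, p in zip(bits, p_causes):
                if active:
                    p_fail *= 1.0 - p      # cause i independently fails
            cpd[bits] = 1.0 - p_fail       # P(A=true | this parent state)
        return cpd

    # e.g. combining cpd(bt(john) | father(...)) and cpd(bt(john) | mother(...)):
    print(noisy_or([0.9, 0.7]))            # (True, True) maps to 0.97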
  • 25. Bayesian Logic Programs - a definition
    • A BLP consists of
      • a finite set of Bayesian clauses.
      • To each clause c a conditional probability distribution cpd(c) is associated.
      • To each Bayesian predicate p a combining rule cr(p) is associated, to combine CPDs of multiple ground instances of clauses having the same head.
    • Proper random variables ~ LH(B)
    • graphical structure ~ dependency graph
    • Quantitative information ~ CPDs and CRs
  • 26. Outline
    • Bayesian Logic Programs
      • Examples and Language
      • Semantics and Support Networks
    • Learning Bayesian Logic Programs
      • Data Cases
      • Parameter Estimation
      • Structural Learning
  • 27. Discrete-Time Stochastic Process
    • Family of random variables {X_t | t = 0, 1, 2, ...} over a domain X
    • For each linearization of the partial order induced by the dependency graph, a Bayesian logic program specifies a discrete-time stochastic process
  • 28. Theorem of Kolmogorov ⇒ existence and uniqueness of the probability measure
      • X: a Polish space
      • E(J): the set of all non-empty, finite subsets of the index set J
      • P_E: the probability measure over X^E, for E in E(J)
      • If the projective family (P_E) exists, then there exists a unique probability measure P over X^J with marginals P_E (a standard statement follows below)
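The slide's own formulas were lost in extraction; for reference, a standard statement of the extension theorem in our notation (LaTeX):

    % Kolmogorov extension theorem (standard statement; notation ours).
    % X a Polish space, J an index set,
    % \mathcal{E}(J) the set of all non-empty finite subsets of J.
    \begin{align*}
    &\text{Given probability measures } P_E \text{ on } X^{E},\; E \in \mathcal{E}(J),\\
    &\text{satisfying the projectivity condition }
      P_{E'} = P_E \circ \pi_{E E'}^{-1} \text{ for all } E' \subseteq E,\\
    &\text{where } \pi_{E E'} \colon X^{E} \to X^{E'} \text{ is the coordinate projection,}\\
    &\text{there exists a unique probability measure } P \text{ on } X^{J}\\
    &\text{with } P \circ \pi_{E}^{-1} = P_E \text{ for all } E \in \mathcal{E}(J).
    \end{align*}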
  • 29. Consistency Conditions
    • Each finite-dimensional probability measure P_E, E in E(J),
    • is represented by a finite Bayesian network, a subnetwork of the dependency graph over LH(B): the Support Network
    • Theorem (Elimination Order): All stochastic processes represented by a Bayesian logic program B specify the same probability measure over LH(B).
  • 30.-32. Support network [Figures: three views of the stud farm dependency graph, progressively highlighting the support network of a query node]
  • 33. Support network
    • The support network of a random variable x in LH(B) is the induced subnetwork over x and its ancestors
    • The support network of a finite set {x1, ..., xk} ⊆ LH(B) is defined as the union of the support networks of its elements
    • Computation utilizes And/Or trees
  • 34. Queries using And/Or trees
    • A probabilistic query
    • ?- Q1, ..., Qn | E1=e1, ..., Em=em.
    • asks for the distribution
    • P(Q1, ..., Qn | E1=e1, ..., Em=em).
    • ?- bt(eric).
    • An Or node is proven if at least one of its successors is provable.
    • An And node is proven if all of its successors are provable.
    [Figure: And/Or tree for ?- bt(eric), expanding father(brian,eric), bt(brian), mother(cecily,eric), bt(cecily), and the corresponding (combined) cpd; see the sketch below]
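A minimal Python sketch (ours) of the support-network computation these queries rely on: backward chaining from the query collects all of its ancestors in the dependency graph, exactly the nodes expanded in the And/Or tree for ?- bt(eric):

    # Sketch (ours): the support network of a set of query atoms consists of
    # those atoms plus all their ancestors in the dependency graph.
    def support_network(queries, parents):
        seen, stack = set(), list(queries)
        while stack:
            atom = stack.pop()
            if atom in seen:
                continue
            seen.add(atom)
            stack.extend(parents(atom))   # expand one backward-chaining step
        return seen

    def parents(atom):
        # toy dependency graph for the query ?- bt(eric) on slide 34
        graph = {"bt(eric)": ["father(brian,eric)", "bt(brian)",
                              "mother(cecily,eric)", "bt(cecily)"]}
        return graph.get(atom, [])        # apriori atoms have no parents

    print(sorted(support_network(["bt(eric)"], parents)))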
  • 35. Consistency Condition (contd.)
    • The projective family exists if
      • the dependency graph is acyclic, and
      • every random variable is influenced by a finite set of random variables only
    • ⇒ well-defined Bayesian logic program
  • 36. Relational Character % ground facts bt(ann). bt(brian). bt(cecily). bt(unknown1). bt(unknown2). father(unknown1,fred). mother(ann,fred). father(brian,dorothy). mother(ann,dorothy). father(brian,eric). mother(cecily,eric). father(unknown2,gwenn). mother(ann,gwenn). father(fred,henry). mother(dorothy,henry). father(eric,irene). mother(gwenn,irene). father(henry,john). mother(irene,john). % rules bt(X) | father(F,X), bt(F), mother(M,X), bt(M). [Table: the same generic CPD for bt(X) as on slide 17.] The same rule applies unchanged to other families: % ground facts bt(susanne). bt(ralf). bt(peter). bt(uta). father(ralf,luca). mother(susanne,luca). ... % ground facts bt(petra). bt(silvester). bt(anne). bt(wilhelm). bt(beate). father(silvester,claudien). mother(beate,claudien). father(wilhelm,marta). mother(anne,marta). ...
  • 37. Bayesian Logic Programs - Summary
    • First order logic extension of Bayesian networks
    • constants, relations, functors
    • discrete and continuous random variables
    • ground atoms = random variables
    • CPDs associated to clauses
    • Dependency graph = (possibly) infinite Bayesian network
    • Generalize dynamic Bayesian networks and definite clause logic (range-restricted)
  • 38. Applications
    • Probabilistic, logical
      • Description and prediction
      • Regression
      • Classification
      • Clustering
    • Computational Biology
      • APrIL IST-2001-33053
    • Web Mining
    • Query approximation
    • Planning, ...
  • 39. Other frameworks
    • Probabilistic Horn Abduction [Poole 93]
    • Distributional Semantics (PRISM) [Sato 95]
    • Stochastic Logic Programs [Muggleton 96; Cussens 99]
    • Relational Bayesian Nets [Jaeger 97]
    • Probabilistic Logic Programs [Ngo, Haddawy 97]
    • Object-Oriented Bayesian Nets [Koller, Pfeffer 97]
    • Probabilistic Frame-Based Systems [Koller, Pfeffer 98]
    • Probabilistic Relational Models [Koller 99]
  • 40. Outline
    • Bayesian Logic Programs
      • Examples and Language
      • Semantics and Support Networks
    • Learning Bayesian Logic Programs
      • Data Cases
      • Parameter Estimation
      • Structural Learning
  • 41. Learning Bayesian Logic Programs: Data + Background Knowledge → learning algorithm → Bayesian logic program [Figure: a learned clause A | E, B. together with its CPD table; the entries (.9, .1), (.7, .3), (.01, .99), (.8, .2) cover the four joint states of E and B]
  • 42. Why Learning Bayesian Logic Programs?
    • Of interest to different communities:
      • scoring functions, pruning techniques, theoretical insights, ...
    • [Diagram: learning Bayesian logic programs sits between Inductive Logic Programming and learning within Bayesian networks]
  • 43. What is the data about? A data case is a partially observed joint state of a finite, nonempty subset of LH(B).
  • 44. Learning Task
    • Given:
      • a set of data cases D = {D1, ..., DN}
      • a Bayesian logic program B
    • Goal: for each Bayesian clause c of B, the parameters cpd(c)
    • that best fit the given data
  • 45. Parameter Estimation (contd.)
    • "best fit" ~ ML estimation: θ* = argmax_θ P(D | B, θ),
    • where the hypothesis space is spanned by the product space over the possible values of the parameters θ
  • 46. Parameter Estimation (contd.)
    • Assumption:
    • D1, ..., DN are independently sampled from identical distributions (e.g. totally separated families), so that P(D | B, θ) = ∏_i P(Di | B, θ)
  • 47. Parameter Estimation (contd.) The support network of each data case Di is an ordinary Bayesian network; P(Di | B, θ) is a marginal of the distribution it defines.
  • 48. Parameter Estimation (contd.)
    • Reduced to a problem within Bayesian networks:
    • given structure,
    • partially observed random variables
    • EM
    • [Dempster, Laird, Rubin '77], [Lauritzen '91]
    • Gradient Ascent
    • [Binder, Koller, Russell, Kanazawa '97], [Jensen '99]
  • 49. Decomposable CRs
    • The parameters are those of the clauses, not of the support network.
    • [Figure: with a decomposable combining rule, multiple ground instances of the same Bayesian clause share the clause's CPD, and their contributions are combined by the combining rule inside the support network]
  • 50. Gradient Ascent. Goal: computation of the gradient ∂ log P(D | B, θ) / ∂θ
  • 51. Gradient Ascent. The partial derivative w.r.t. a clause parameter decomposes into a sum over the ground instances of that Bayesian clause.
  • 52. Gradient Ascent. The required posterior probabilities are computed with a Bayesian network inference engine on the support networks. (A sketch follows below.)
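A sketch (ours; the slides give no code) of the decomposition behind slides 50-52: all ground instances of one Bayesian clause share its CPD entries, so the derivative of the log-likelihood w.r.t. one entry is a sum over those instances, with posteriors supplied by a Bayesian network inference engine:

    # Sketch (ours). For a CPT entry theta = P(X=x | U=u) of one Bayesian clause:
    #   d log P(D | theta) / d theta  =  sum_i P(X_i=x, U_i=u | D) / theta,
    # summing over all ground instances i of that clause (weight sharing).
    # The sum-to-one constraint per parent state u must be handled separately,
    # e.g. by projection or reparameterization.

    def gradient_entry(theta_xu, posteriors_xu):
        """posteriors_xu: P(X_i=x, U_i=u | D) for each ground instance i,
        assumed to come from an external BN inference engine."""
        return sum(p / theta_xu for p in posteriors_xu)

    # e.g. three ground instances of the same clause, entry theta = 0.4:
    print(gradient_entry(0.4, [0.25, 0.10, 0.60]))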
  • 53. Algorithm [Figure: pseudocode of the gradient-based parameter-estimation algorithm]
  • 54. Expectation-Maximization
    • Initialize parameters
    • E-Step and M-Step, i.e.
    • compute expected counts for each clause and treat the expected counts as counts (see the sketch below)
    • If not converged, iterate from step 2
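A compact Python sketch (ours; posteriors assumed to come from a Bayesian network inference engine run on the support networks) of one E/M iteration for a single Bayesian clause with a discrete head:

    # Sketch (ours): one EM iteration for one Bayesian clause.
    from collections import defaultdict

    def em_step(instances):
        """instances: (u, {x: P(X=x, U=u | D)}) per ground clause instance,
        with posteriors from an external inference engine (assumed)."""
        ec = defaultdict(float)                    # E-step: expected counts
        for u, posterior in instances:
            for x, p in posterior.items():
                ec[(u, x)] += p
        theta = {}                                 # M-step: normalise counts
        for (u, x), n in ec.items():
            total = sum(v for (u2, _), v in ec.items() if u2 == u)
            theta[(u, x)] = n / total
        return theta

    # e.g. two ground instances of a clause sharing the parent state u:
    print(em_step([(("father=true", "bt(F)=AA"), {"aa": 0.3, "aA": 0.7}),
                   (("father=true", "bt(F)=AA"), {"aa": 0.5, "aA": 0.5})]))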
  • 55. Experimental Evidence
    • [Koller, Pfeffer '97]:
    • the support network is a good approximation
    • [Binder et al. '97]:
    • equality constraints speed up learning
    • 100 data cases
    • constant step-size
    • Estimation of means
      • 13 iterations
    • Estimation of the weights
      • sum = 1.0
    Conditional density for height h(X):

    m(M,X) | f(F,X) | pdf(h(X) | h(M), h(F))
    false  | false  | N(165, s)
    false  | true   | N(165, s)
    true   | false  | N(165, s)
    true   | true   | N(0.5*h(M) + 0.5*h(F), s)
  • 56. Outline
    • Bayesian Logic Programs
      • Examples and Language
      • Semantics and Support Networks
    • Learning Bayesian Logic Programs
      • Data Cases
      • Parameter Estimation
      • Structural Learning
  • 57. Structural Learning
    • Combination of Inductive Logic Programming and Bayesian network learning
    • Datalog fragment of Bayesian logic programs (no functors)
    • intensional Bayesian clauses
  • 58. Idea - CLAUDIEN
    • learning from interpretations
    • all data cases are Herbrand interpretations
    • a hypothesis should reflect what is in the data
    • probabilistic extension: all data cases are partially observed joint states of Herbrand interpretations; all hypotheses have to be (logically) true in all data cases
  • 59. What is the data about? [Figure: example data cases as partially observed joint states of Herbrand interpretations]
  • 60. Claudien - Learning From Interpretations
    • D: the set of data cases
    • L: the set of all clauses that can be part of hypotheses
    • A hypothesis H ⊆ L is (logically) valid iff H is true in all data cases
    • H is a logical solution iff H is a logically maximally general valid hypothesis
    • H is a probabilistic solution iff H is (logically) valid and the Bayesian network induced by H on each data case is acyclic
  • 61. Learning Task
    • Given:
      • a set of data cases
      • a set of candidate Bayesian logic programs
      • a scoring function score(·)
    • Goal: a probabilistic solution that
      • matches the data best according to score(·)
  • 62. Algorithm [Figure: pseudocode of the structural learning algorithm]
  • 63. Example [Figure: ground network over m(ann,john), f(eric,john), mc(ann), pc(ann), mc(eric), pc(eric), mc(john), pc(john), bt(john)] Original Bayesian logic program: mc(X) | m(M,X), mc(M), pc(M). pc(X) | f(F,X), mc(F), pc(F). bt(X) | mc(X), pc(X). Data cases: {m(ann,john)=true, pc(ann)=a, mc(ann)=?, f(eric,john)=true, pc(eric)=b, mc(eric)=a, mc(john)=ab, pc(john)=a, bt(john)=?} ...
  • 64. Example [Figure: as on slide 63] Original Bayesian logic program as above. Initial hypothesis: mc(X) | m(M,X). pc(X) | f(F,X). bt(X) | mc(X).
  • 65. Example [Figure: ground network induced by the initial hypothesis] Initial hypothesis: mc(X) | m(M,X). pc(X) | f(F,X). bt(X) | mc(X).
  • 66. Example [Figure: as before] Refinement: mc(X) | m(M,X). pc(X) | f(F,X). bt(X) | mc(X), pc(X).
  • 67. Example [Figure: as before] Further refinement: mc(X) | m(M,X), mc(X). pc(X) | f(F,X). bt(X) | mc(X), pc(X). Previous refinement: mc(X) | m(M,X). pc(X) | f(F,X). bt(X) | mc(X), pc(X).
  • 68. Example [Figure: as before] Further refinement: mc(X) | m(M,X), pc(X). pc(X) | f(F,X). bt(X) | mc(X), pc(X).
  • 69. Example [Figure: as before] Further refinements ... (a sketch of the refinement step follows below)
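A minimal Python sketch (ours, not CLAUDIEN itself) of the refinement step behind slides 64-69: candidate clauses are generated by adding one literal from a fixed language to a clause body; each resulting hypothesis would then be checked for (logical) validity and acyclicity, and scored:

    # Sketch (ours): a simple downward refinement operator that specialises a
    # Bayesian clause by adding one body literal from a given language.
    def refine(clause, language):
        """clause: (head, body) with atoms represented as plain strings."""
        head, body = clause
        for atom in language:
            if atom != head and atom not in body:
                yield (head, body + [atom])

    initial = ("bt(X)", ["mc(X)"])
    language = ["pc(X)", "mc(X)", "m(M,X)", "f(F,X)"]
    for head, body in refine(initial, language):
        print(f"{head} | {', '.join(body)}.")   # e.g. bt(X) | mc(X), pc(X).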
  • 70. Properties
    • All relevant random variables are known
    • First order equivalent of Bayesian network setting
    • Hypothesis postulates true regularities in the data
    • Logical solutions as initial hypotheses
    • Highlights Background Knowledge
  • 71. Example Experiments. Data: samples from 2 families, 1000 samples each. Score: log-likelihood. Goal: learn the definition of bt. CLAUDIEN's logical solution: mc(X) | m(M,X). pc(X) | f(F,X). Highest-scoring hypothesis (= the original Bloodtype program): mc(X) | m(M,X), mc(M), pc(M). pc(X) | f(F,X), mc(F), pc(F). bt(X) | mc(X), pc(X).
  • 72. Conclusion
    • EM-based and gradient-based methods for ML parameter estimation
    • Link between ILP and learning Bayesian networks
    • CLAUDIEN setting used to define and to traverse the search space
    • Bayesian network scores used to evaluate hypotheses
  • 73. Thanks!