Transcript

  • 1. Bayesian Logic Programs Kristian Kersting, Luc De Raedt Albert-Ludwigs University Freiburg, Germany Summer School on Relational Data Mining 17 and 18 August 2002, Helsinki, Finland
  • 2. Context: Real-world applications involve uncertainty in complex, structured domains. Logic contributes objects, relations, and functors; probability theory contributes discrete and continuous uncertainty. Bayesian networks + Logic Programming (Prolog) = Bayesian logic programs
  • 3. Outline
    • Bayesian Logic Programs
      • Examples and Language
      • Semantics and Support Networks
    • Learning Bayesian Logic Programs
      • Data Cases
      • Parameter Estimation
      • Structural Learning
  • 4. Bayesian Logic Programs
    • Probabilistic models structured using logic
    • Extend Bayesian networks with notions of objects and relations
    • Probability density over (countably) infinitely many random variables
    • Flexible discrete-time stochastic processes
    • Generalize pure Prolog, Bayesian networks, dynamic Bayesian networks, dynamic Bayesian multinets, hidden Markov models,...
  • 5. Bayesian Networks
    • One of the successes of AI
    • State-of-the-art formalism for modeling uncertainty, in particular degrees of belief
    • Advantage [Russell, Norvig 96]:
      • „strict separation of qualitative and quantitative aspects of the world“
    • Disadvantage [Breese, Ngo, Haddawy, Koller, ...]:
      • Propositional character, no notion of objects and relations among them
  • 6. Stud farm (Jensen ´96)
    • The colt John has been born recently on a stud farm.
    • John suffers from a life-threatening hereditary disease carried by a recessive gene. The disease is so serious that John is put down instantly, and because the stud farm wants the gene out of production, his parents are taken out of breeding.
    • What are the probabilities for the remaining horses to be carriers of the unwanted gene?
  • 7. Bayesian networks [Pearl ´88]  [network diagram over the nodes bt_ann, bt_brian, bt_cecily, bt_dorothy, bt_eric, bt_gwenn, bt_unknown1, bt_unknown2, bt_fred, bt_henry, bt_irene, bt_john; based on the stud farm example [Jensen ´96]]
  • 8. Bayesian networks [Pearl ´88]  [the same diagram annotated with (conditional) probability distributions, e.g. P(bt_cecily=aA|bt_john=aA)=0.1499, P(bt_john=AA|bt_ann=aA)=0.6906, P(bt_john=AA)=0.9909, and the CPT P(bt_john | bt_henry, bt_irene) with entries such as (bt_henry=AA, bt_irene=AA) -> (1.0,0.0,0.0); based on the stud farm example [Jensen ´96]]
  • 9. Bayesian networks (contd.)
    • acyclic directed graphs
    • probability distribution over a finite set of random variables: P(X1,...,Xn) = P(X1 | Pa(X1)) * ... * P(Xn | Pa(Xn)), where Pa(Xi) are the parents of Xi in the graph
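As a concrete illustration of this factorization, a toy two-parent fragment of the stud farm network can be evaluated by the chain rule. This is a sketch, not the slides' code; the prior values below are invented for the example.

```python
# Sketch: chain rule P(X1,...,Xn) = prod_i P(Xi | Parents(Xi)) on the
# hypothetical fragment bt_henry -> bt_john <- bt_irene.

prior = {"AA": 0.25, "aA": 0.5, "aa": 0.25}   # assumed priors, not from the slides
cpd_john = {                                   # P(bt_john | bt_henry, bt_irene)
    ("AA", "AA", "AA"): 1.0,                   # two AA parents -> AA child
    ("aA", "AA", "aa"): 1.0,                   # AA x aa -> aA child
    # ... remaining rows omitted
}

def joint(henry, irene, john):
    """P(henry, irene, john) by the chain rule over the DAG."""
    return prior[henry] * prior[irene] * cpd_john.get((john, henry, irene), 0.0)

print(joint("AA", "aa", "aA"))  # 0.25 * 0.25 * 1.0 = 0.0625
```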
  • 10. From Bayesian Networks to Bayesian Logic Programs bt_ann bt_brian bt_cecily bt_dorothy bt_eric bt_gwenn bt_unknown2 bt_unknown1 bt_fred bt_henry bt_irene bt_john 
  • 11. From Bayesian Networks to Bayesian Logic Programs bt_ann. bt_brian. bt_cecily. bt_dorothy bt_eric bt_gwenn bt_unknown2. bt_unknown1. bt_fred bt_henry bt_irene bt_john 
  • 12. From Bayesian Networks to Bayesian Logic Programs bt_ann. bt_brian. bt_cecily. bt_dorothy | bt_ann, bt_brian. bt_eric| bt_brian, bt_cecily. bt_gwenn | bt_ann, bt_unknown2. bt_unknown2. bt_unknown1. bt_fred | bt_unknown1, bt_ann. bt_henry bt_irene bt_john 
  • 13. From Bayesian Networks to Bayesian Logic Programs bt_ann. bt_brian. bt_cecily. bt_dorothy | bt_ann, bt_brian. bt_eric| bt_brian, bt_cecily. bt_gwenn | bt_ann, bt_unknown2. bt_unknown2. bt_unknown1. bt_fred | bt_unknown1, bt_ann. bt_henry | bt_fred, bt_dorothy. bt_irene | bt_eric, bt_gwenn. bt_john 
  • 14. From Bayesian Networks to Bayesian Logic Programs bt_ann. bt_brian. bt_cecily. bt_dorothy | bt_ann, bt_brian. bt_eric| bt_brian, bt_cecily. bt_gwenn | bt_ann, bt_unknown2. bt_unknown2. bt_unknown1. bt_fred | bt_unknown1, bt_ann. bt_henry | bt_fred, bt_dorothy. bt_irene | bt_eric, bt_gwenn. bt_john | bt_henry ,bt_irene.  (0.33,0.33,0.33) ... (0.0,1.0,0.0) (0.5,0.5,0.0) bt_irene bt_henry P (bt_john) AA AA AA aa aA aa aa aa (1.0,0.0,0.0)
  • 15. From Bayesian Networks to Bayesian Logic Programs % apriori nodes bt_ann. bt_brian. bt_cecily. bt_unknown1. bt_unknown2. % aposteriori nodes bt_henry | bt_fred, bt_dorothy. bt_irene | bt_eric, bt_gwenn. bt_fred | bt_unknown1, bt_ann. bt_dorothy | bt_brian, bt_ann. bt_eric | bt_brian, bt_cecily. bt_gwenn | bt_unknown2, bt_ann. bt_john | bt_henry, bt_irene.  Each node has a domain (e.g. finite, discrete, continuous) and a (conditional) probability distribution, e.g. the CPT P(bt_john | bt_henry, bt_irene)
  • 16. From Bayesian Networks to Bayesian Logic Programs % apriori nodes bt(ann). bt(brian). bt(cecily). bt(unknown1). bt(unknown2). % aposteriori nodes bt(henry) | bt(fred), bt(dorothy). bt(irene) | bt(eric), bt(gwenn). bt(fred) | bt(unknown1), bt(ann). bt(dorothy) | bt(brian), bt(ann). bt(eric) | bt(brian), bt(cecily). bt(gwenn) | bt(unknown2), bt(ann). bt(john) | bt(henry), bt(irene).  Each clause has a (conditional) probability distribution, e.g. the CPT P(bt(john) | bt(henry), bt(irene))
  • 17. From Bayesian Networks to Bayesian Logic Programs  % ground facts / apriori bt(ann). bt(brian). bt(cecily). bt(unknown1). bt(unknown2). father(unknown1,fred). mother(ann,fred). father(brian,dorothy). mother(ann,dorothy). father(brian,eric). mother(cecily,eric). father(unknown2,gwenn). mother(ann,gwenn). father(fred,henry). mother(dorothy,henry). father(eric,irene). mother(gwenn,irene). father(henry,john). mother(irene,john). % rules / aposteriori bt(X) | father(F,X), bt(F), mother(M,X), bt(M).  [generic (conditional) probability distribution P(bt(X) | father(F,X), bt(F), mother(M,X), bt(M))]
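The single rule above replaces all the propositional clauses: grounding it over the facts yields one dependency per child. A minimal sketch of that grounding step (hypothetical encoding, Python dictionaries standing in for the Prolog facts):

```python
# Sketch: grounding bt(X) | father(F,X), bt(F), mother(M,X), bt(M).
# Child -> parent maps encode the father/2 and mother/2 facts (subset of the
# stud-farm pedigree from the slide).
father = {"fred": "unknown1", "dorothy": "brian", "henry": "fred", "john": "henry"}
mother = {"fred": "ann", "dorothy": "ann", "henry": "dorothy", "john": "irene"}

def ground_clause(father, mother):
    """For each binding of X, the clause yields the parents of bt(X)."""
    deps = {}
    for x in father:
        if x in mother:
            deps[f"bt({x})"] = [f"bt({father[x]})", f"bt({mother[x]})"]
    return deps

deps = ground_clause(father, mother)
print(deps["bt(john)"])  # ['bt(henry)', 'bt(irene)']
```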
  • 18. Dependency graph = Bayesian network  [the ground dependency graph of the Bayesian logic program, over the bt/1, father/2 and mother/2 atoms of the stud farm]
  • 19. Dependency graph = Bayesian network  [the original propositional network over bt_ann, ..., bt_john, for comparison]
  • 20. Bayesian Logic Programs - a first definition
    • A BLP consists of
      • a finite set of Bayesian clauses.
      • To each clause a conditional probability distribution is associated.
    • Proper random variables ~ LH(B)
    • graphical structure ~ dependency graph
    • Quantitative information ~ CPDs
  • 21. Bayesian Logic Programs - Examples  pure Prolog: % apriori nodes nat(0). % aposteriori nodes nat(s(X)) | nat(X). ...  [chain nat(0) -> nat(s(0)) -> nat(s(s(0))) -> ...]  MC/HMM: % apriori nodes state(0). % aposteriori nodes state(s(Time)) | state(Time). output(Time) | state(Time).  [chain state(0) -> state(s(0)) -> ... with output(Time) attached to each state]  DBN: % apriori nodes n1(0). % aposteriori nodes n1(s(TimeSlice)) | n2(TimeSlice). n2(TimeSlice) | n1(TimeSlice). n3(TimeSlice) | n1(TimeSlice), n2(TimeSlice).  [unrolled network over n1, n2, n3 per time slice]
  • 22. Associated CPDs
    • represent generically the CPD for each ground instance of the corresponding Bayesian clause, e.g. the generic CPT P(bt(X) | father(F,X), bt(F), mother(M,X), bt(M)).
    But what about multiple ground instances of clauses having the same head atom?
  • 23. Combining Rules
    • Multiple ground instances of clauses having the same head atom?
    % ground facts as before % rules bt(X) | father(F,X), bt(F). bt(X) | mother(M,X), bt(M).  These yield cpd(bt(john) | father(henry,john), bt(henry)) and cpd(bt(john) | mother(irene,john), bt(irene)), but we need cpd(bt(john) | father(henry,john), bt(henry), mother(irene,john), bt(irene)) !!!
  • 24. Combining Rules (contd.)  A combining rule CR maps, e.g., P(A|B) and P(A|C) to P(A|B,C)
    • Any algorithm which
        • combines a set of PDFs for the same head variable
        • into a single (combined) PDF conditioned on the union of their parent sets
        • and which
        • has an empty output if and only if the input is empty
    • E.g. noisy-or, regression, ...
  • 25. Bayesian Logic Programs - a definition
    • A BLP consists of
      • a finite set of Bayesian clauses.
      • To each clause a conditional probability distribution is associated.
      • To each Bayesian predicate p a combining rule is associated, which combines the CPDs of multiple ground instances of clauses having the same head
    • Proper random variables ~ LH(B)
    • graphical structure ~ dependency graph
    • Quantitative information ~ CPDs and CRs
  • 26. Outline
    • Bayesian Logic Programs
      • Examples and Language
      • Semantics and Support Networks
    • Learning Bayesian Logic Programs
      • Data Cases
      • Parameter Estimation
      • Structural Learning
  • 27. Discrete-Time Stochastic Process
    • Family of random variables over a domain X
    • For each linearization of the partial order induced by the dependency graph, a Bayesian logic program specifies a discrete-time stochastic process
  • 28. Theorem of Kolmogorov  Existence and uniqueness of probability measure
      • : a Polish space
      • : set of all non-empty, finite subsets of J
      • : the probability measure over
      • If the projective family exists then there exists a unique probability measure
  • 29. Consistency Conditions
    • The probability measure
    • is represented by a finite Bayesian network which is a subnetwork of the dependency graph over LH(B): the support network
    • (Elimination Order): All stochastic processes represented by a Bayesian logic program B specify the same probability measure over LH(B).
  • 30. Support network  [the ground dependency graph of the stud farm program over the bt/1, father/2 and mother/2 atoms]
  • 31. Support network  [the same graph repeated]
  • 32. Support network  [the same graph repeated]
  • 33. Support network
    • Support network of is the induced subnetwork of
    • Support network of is defined as
    • Computation utilizes And/Or trees
  • 34. Queries using And/Or trees
    • A probabilistic query
    • ?- Q1, ..., Qn | E1=e1, ..., Em=em.
    • asks for the distribution
    • P(Q1, ..., Qn | E1=e1, ..., Em=em).
    • ?- bt(eric).
    • An Or node is proven if at least one of its successors is provable.
    • An And node is proven if all of its successors are provable.
    [And/Or tree for bt(eric): the Or node bt(eric) has the And node (father(brian,eric), bt(brian), mother(cecily,eric), bt(cecily)) as successor; the cpd and the combined cpd label the corresponding network fragment]
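The Or/And proof procedure above effectively collects the support network of the query. A minimal sketch, under an assumed encoding where each ground head maps to the bodies of its ground clause instances:

```python
# Sketch: support network of a query atom by backward chaining.
# `rules` is a hypothetical ground-clause encoding of the bt(eric) fragment.
rules = {
    "bt(eric)": [["father(brian,eric)", "bt(brian)",
                  "mother(cecily,eric)", "bt(cecily)"]],
    "bt(brian)": [[]],               # apriori node: one clause, empty body
    "bt(cecily)": [[]],
    "father(brian,eric)": [[]],
    "mother(cecily,eric)": [[]],
}

def support(query, rules, seen=None):
    """All ground atoms the query depends on (including itself)."""
    seen = set() if seen is None else seen
    if query in seen or query not in rules:
        return seen
    seen.add(query)
    for body in rules[query]:        # Or: any clause can prove the head
        for atom in body:            # And: all body atoms must be proven
            support(atom, rules, seen)
    return seen

print(sorted(support("bt(eric)", rules)))
```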
  • 35. Consistency Condition (contd.)
    A Bayesian logic program is well-defined (its projective family exists) if
      • the dependency graph is acyclic, and
      • every random variable is influenced by a finite set of random variables only
  • 36. Relational Character  % ground facts bt(ann). bt(brian). bt(cecily). bt(unknown1). bt(unknown2). father(unknown1,fred). mother(ann,fred). father(brian,dorothy). mother(ann,dorothy). father(brian,eric). mother(cecily,eric). father(unknown2,gwenn). mother(ann,gwenn). father(fred,henry). mother(dorothy,henry). father(eric,irene). mother(gwenn,irene). father(henry,john). mother(irene,john). % rules bt(X) | father(F,X), bt(F), mother(M,X), bt(M).  The same rule, with its generic CPT P(bt(X) | father(F,X), bt(F), mother(M,X), bt(M)), applies to other families: % ground facts bt(susanne). bt(ralf). bt(peter). bt(uta). father(ralf,luca). mother(susanne,luca). ... % ground facts bt(petra). bt(silvester). bt(anne). bt(wilhelm). bt(beate). father(silvester,claudien). mother(beate,claudien). father(wilhelm,marta). mother(anne,marta). ...
  • 37. Bayesian Logic Programs - Summary
    • First order logic extension of Bayesian networks
    • constants, relations, functors
    • discrete and continuous random variables
    • ground atoms = random variables
    • CPDs associated to clauses
    • Dependency graph = (possibly) infinite Bayesian network
    • Generalize dynamic Bayesian networks and definite clause logic (range-restricted)
  • 38. Applications
    • Probabilistic, logical
      • Description and prediction
      • Regression
      • Classification
      • Clustering
    • Computational Biology
      • APrIL IST-2001-33053
    • Web Mining
    • Query approximation
    • Planning, ...
  • 39. Other frameworks
    • Probabilistic Horn Abduction [Poole 93]
    • Distributional Semantics (PRISM) [Sato 95]
    • Stochastic Logic Programs [Muggleton 96; Cussens 99]
    • Relational Bayesian Nets [Jaeger 97]
    • Probabilistic Logic Programs [Ngo, Haddawy 97]
    • Object-Oriented Bayesian Nets [Koller, Pfeffer 97]
    • Probabilistic Frame-Based Systems [Koller, Pfeffer 98]
    • Probabilistic Relational Models [Koller 99]
  • 40. Outline
    • Bayesian Logic Programs
      • Examples and Language
      • Semantics and Support Networks
    • Learning Bayesian Logic Programs
      • Data Cases
      • Parameter Estimation
      • Structural Learning
  • 41. Learning Bayesian Logic Programs  Data + Background Knowledge -> learning algorithm -> Bayesian logic program, e.g. the clause A | E, B. with its CPT P(A | E, B) (entries 0.9/0.1, 0.7/0.3, 0.8/0.2, 0.01/0.99)
  • 42. Why Learning Bayesian Logic Programs?
    • Of interest to different communities:
      • scoring functions, pruning techniques, theoretical insights, ...
    Learning within Bayesian logic programs sits between Inductive Logic Programming and learning within Bayesian networks
  • 43. What is the data about?  A data case is a partially observed joint state of a finite, nonempty subset of the random variables in LH(B)
  • 44. Learning Task
    • Given:
      • set of data cases
      • a Bayesian logic program B
    • Goal: for each Bayesian clause, the parameters of its associated CPD that best fit the given data
  • 45. Parameter Estimation (contd.)
    • „best fit“ ~ ML estimation
    • where the hypothesis space is spanned by the product space over the possible values of the parameters
  • 46. Parameter Estimation (contd.)
    • Assumption:
    • D1,...,DN are independently sampled from indentical distributions (e.g. totally separated families), ),
  • 47. Parameter Estimation (contd.)  [the support network of the data cases is an ordinary Bayesian network; the quantities of interest are marginals of its distribution]
  • 48. Parameter Estimation (contd.)
    • Reduced to a problem within Bayesian networks:
    • given structure,
    • partially observed random variables
    • EM
    • [Dempster, Laird, Rubin, ´77], [Lauritzen, ´91]
    • Gradient Ascent
    • [Binder, Koller, Russell, Kanazawa, ´97], [Jensen, ´99]
  • 49. Decomposable CRs
    • Parameters belong to the clauses and not to the support network.
    [diagram: a single ground instance of a Bayesian clause uses the clause's CPD directly; multiple ground instances of the same Bayesian clause share its CPD, combined by the combining rule]
  • 50. Gradient Ascent  Goal : Computation of
  • 51. Gradient Ascent  Bayesian ground clause Bayesian clause
  • 52. Gradient Ascent  Bayesian Network Inference Engine
  • 53. Algorithm 
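The gradient formulas on these slides live in the figures; as a toy stand-in, gradient ascent with a constant step size on the log-likelihood of a single Bernoulli parameter (invented data, not the slides' algorithm, which sums gradients over ground clause instances) can be sketched as:

```python
# Sketch: gradient ascent on the log-likelihood of theta = P(X=true)
# for fully observed Bernoulli data, constant step size.

def log_lik_grad(theta, n_true, n_false):
    """d/dtheta of sum_i log P(x_i | theta)."""
    return n_true / theta - n_false / (1.0 - theta)

theta = 0.5
for _ in range(200):
    theta += 0.001 * log_lik_grad(theta, n_true=70, n_false=30)
    theta = min(max(theta, 1e-6), 1 - 1e-6)   # stay inside (0,1)

print(round(theta, 2))  # converges toward the ML estimate 0.7
```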
  • 54. Expectation-Maximization
    • Initialize parameters
    • E-Step and M-Step, i.e.
    • compute expected counts for each clause and treat the expected counts as observed counts
    • If not converged, iterate to 2
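A minimal sketch of this E/M loop for one boolean random variable with partially observed data (invented example, not the slides' algorithm):

```python
# Sketch: EM for theta = P(X=true) with some unobserved cases (None).
data = [True, True, False, None, None, True]   # hypothetical data cases

def em(data, theta=0.5, iters=50):
    for _ in range(iters):
        # E-step: expected count of "true", filling unobserved cases with theta
        exp_true = sum(theta if x is None else float(x) for x in data)
        # M-step: treat expected counts as observed counts
        theta = exp_true / len(data)
    return theta

print(round(em(data), 3))  # fixed point: theta = (3 + 2*theta)/6 = 0.75
```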
  • 55. Experimental Evidence
    • [Koller, Pfeffer ´97]
    • support network is a good approximation
    • [Binder et al. ´97]
    • equality constraints speed up learning
    • 100 data cases
    • constant step-size
    • Estimation of means
      • 13 iterations
    • Estimation of the weights
      • sum = 1.0
    [CPT for the continuous case, pdf(c)(h(X) | h(M), h(F)), columns f(F,X), m(M,X) -> pdf: (false,false) -> N(165,s); (true,false) -> N(165,s); (false,true) -> N(165,s); (true,true) -> N(0.5*h(M)+0.5*h(F),s)]
  • 56. Outline
    • Bayesian Logic Programs
      • Examples and Language
      • Semantics and Support Networks
    • Learning Bayesian Logic Programs
      • Data Cases
      • Parameter Estimation
      • Structural Learning
  • 57. Structural Learning
    • Combination of Inductive Logic Programming and Bayesian network learning
    • Datalog fragment of Bayesian logic programs (no functors)
    • intensional Bayesian clauses
  • 58. Idea - CLAUDIEN
    • learning from interpretations
    • all data cases are Herbrand interpretations
    • a hypothesis should reflect what is in the data
    Probabilistic extension: all data cases are partially observed joint states of Herbrand interpretations; all hypotheses have to be (logically) true in all data cases
  • 59. What is the data about ? ... 
  • 60. Claudien - Learning From Interpretations
    • a set of data cases
    • the set of all clauses that can be part of hypotheses
    • a hypothesis is (logically) valid iff it is true in all data cases
    • a hypothesis is a logical solution iff it is a logically maximally general valid hypothesis
    • a hypothesis is a probabilistic solution iff it is (logically) valid and the Bayesian network induced by B on the data cases is acyclic
  • 61. Learning Task
    • Given:
      • set of data cases
      • a set of Bayesian logic programs
      • a scoring function
    • Goal: the probabilistic solution that
      • matches the data best according to the scoring function
  • 62. Algorithm 
  • 63. Example  [network over mc(john), pc(john), bt(john), m(ann,john), f(eric,john), pc(ann), mc(ann), mc(eric), pc(eric)]  Original Bayesian logic program: mc(X) | m(M,X), mc(M), pc(M). pc(X) | f(F,X), mc(F), pc(F). bt(X) | mc(X), pc(X).  Data cases: {m(ann,john)=true, pc(ann)=a, mc(ann)=?, f(eric,john)=true, pc(eric)=b, mc(eric)=a, mc(john)=ab, pc(john)=a, bt(john)=?} ...
  • 64. Example  Initial hypothesis: mc(X) | m(M,X). pc(X) | f(F,X). bt(X) | mc(X).
  • 65. Example  [the network induced by the initial hypothesis]
  • 66. Example  Refinement: mc(X) | m(M,X). pc(X) | f(F,X). bt(X) | mc(X), pc(X).
  • 67. Example  Further refinement: mc(X) | m(M,X), mc(X). pc(X) | f(F,X). bt(X) | mc(X), pc(X).
  • 68. Example  Further refinement: mc(X) | m(M,X), pc(X). pc(X) | f(F,X). bt(X) | mc(X), pc(X).
  • 69. Example  ... further refinements toward the original program
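The refinement steps in the example can be mimicked by a simple operator that specializes a clause by adding one literal to its body. This is an illustrative sketch; the candidate-literal set below is assumed, not taken from the slides:

```python
# Sketch: CLAUDIEN-style refinement of a Bayesian clause `head | body`
# by adding a single literal from a fixed candidate set.

def refinements(head, body, candidates):
    """All single-literal specializations of the clause head | body."""
    return [(head, body + [lit]) for lit in candidates if lit not in body]

clause = ("mc(X)", ["m(M,X)"])
cands = ["mc(M)", "pc(M)", "mc(X)", "pc(X)"]   # hypothetical candidate literals
for head, body in refinements(*clause, cands):
    print(f"{head} | {', '.join(body)}.")
```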
  • 70. Properties
    • All relevant random variables are known
    • First order equivalent of Bayesian network setting
    • Hypothesis postulates true regularities in the data
    • Logical solutions as initial hypotheses
    • Highlights Background Knowledge
  • 71. Example Experiments  Data: sampling from 2 families, 1000 samples each. Score: log-likelihood. Goal: learn the definition of bt. CLAUDIEN's starting point: mc(X) | m(M,X). pc(X) | f(F,X).  Highest-scoring hypothesis (Bloodtype): mc(X) | m(M,X), mc(M), pc(M). pc(X) | f(F,X), mc(F), pc(F). bt(X) | mc(X), pc(X).
  • 72. Conclusion
    • EM-based and Gradient-based method to do ML parameter estimation
    • Link between ILP and learning Bayesian networks
    • CLAUDIEN setting used to define and to traverse the search space
    • Bayesian network scores used to evaluate hypotheses
  • 73.  Thanks !