
- 1. Bayesian Network Structure Estimation Based on the Bayesian/MDL Criteria When Both Discrete and Continuous Variables are Present. Joe Suzuki, Osaka University, April 11, 2012.
- 2. Road Map: (1) Problem; (2) Density Estimation; (3) Density Estimation in a General Sense; (4) Structure Estimation in a General Sense; (5) Summary.
- 3. Problem: Bayesian Network Structure. $X, Y, Z$: random variables ordered as $X < Y < Z$.
  [Figure: the eight candidate network structures over $X$, $Y$, $Z$ compatible with this ordering.]
- 6. Problem: Structure Estimation. $X, Y, Z$: random variables taking values in sets $A, B, C$; $\{(x_i, y_i, z_i)\}_{i=1}^n \in (A \times B \times C)^n$: $n$ examples emitted independently by $P(X, Y, Z)$. Structure estimation: choose one of the eight structures based on $\{(x_i, y_i, z_i)\}_{i=1}^n$. (The three-variable case $X, Y, Z$ extends to the $d$-variable case $\{X_j\}_{j=1}^d$ in a straightforward manner.)
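With the ordering fixed, each variable may take any subset of its predecessors as its parent set, which is where the count of eight comes from ($1 \cdot 2 \cdot 4$ choices for $X$, $Y$, $Z$). A small Python sketch enumerating these candidates follows; the function name `candidate_structures` is hypothetical. For $d$ ordered variables it yields $2^{d(d-1)/2}$ structures.

```python
from itertools import combinations, product

def candidate_structures(order):
    """Enumerate DAGs whose edges point from earlier to later variables in `order`.

    Each structure is a dict mapping a variable to its (frozen) parent set.
    """
    choices_per_node = []
    for i, node in enumerate(order):
        earlier = order[:i]
        # Every subset of the predecessors is an admissible parent set.
        parent_sets = [
            frozenset(subset)
            for r in range(len(earlier) + 1)
            for subset in combinations(earlier, r)
        ]
        choices_per_node.append([(node, parents) for parents in parent_sets])
    # One parent set per node, chosen independently of the others.
    return [dict(choice) for choice in product(*choices_per_node)]

structures = candidate_structures(("X", "Y", "Z"))
print(len(structures))  # 8
```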
- 7. Problem: Previous Works. Previous approaches assume either that all $X_j$ are finite or that all $X_j$ are Gaussian. In reality, any database has some discrete fields and some continuous fields.
- 8. Problem: If $A, B, C$ Are Finite. Given $x^n \in A^n$, $y^n \in B^n$, $z^n \in C^n$, we compute $Q^n(x^n)$, $Q^n(y^n)$, $Q^n(z^n)$, $Q^n(x^n, y^n)$, $Q^n(x^n, z^n)$, $Q^n(y^n, z^n)$, $Q^n(x^n, y^n, z^n)$. For some prior probabilities $p_0, p_1, p_{00}, p_{01}, p_{10}, p_{11}$, what $Y$ depends on is decided by which is larger of
  $$p_0 Q^n(y^n), \quad p_1 \frac{Q^n(x^n, y^n)}{Q^n(x^n)},$$
  and what $Z$ depends on is decided by which is largest among
  $$p_{00} Q^n(z^n), \quad p_{01} \frac{Q^n(y^n, z^n)}{Q^n(y^n)}, \quad p_{10} \frac{Q^n(x^n, z^n)}{Q^n(x^n)}, \quad p_{11} \frac{Q^n(x^n, y^n, z^n)}{Q^n(x^n, y^n)}.$$
  ($p_0$: $Y$ has no parent; $p_1$: $X$ is a parent of $Y$; the two subscripts of $p_{ab}$ indicate whether $X$ and $Y$, respectively, are parents of $Z$.)
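The finite-alphabet decision rule can be sketched in Python, working in log space to avoid underflow. Assumptions (the slides leave these open): $Q^n$ is the Krichevsky-Trofimov (KT) mixture, one standard universal measure; the priors are uniform; function names and the synthetic data are hypothetical. The parent set of a child maximizes $\log p_S + \log Q^n(S, \text{child}) - \log Q^n(S)$, with $Q^n(\emptyset) = 1$.

```python
import math
import random
from collections import Counter
from itertools import combinations

def kt_log_prob(symbols, alphabet_size):
    """Natural log of the KT (Dirichlet(1/2)) mixture probability of a sequence."""
    n = len(symbols)
    log_q = math.lgamma(alphabet_size / 2) - math.lgamma(n + alphabet_size / 2)
    for count in Counter(symbols).values():
        log_q += math.lgamma(count + 0.5) - math.lgamma(0.5)
    return log_q

def log_q_joint(names, data, sizes):
    """log Q^n of the joint sequence of the named variables (0 for the empty set)."""
    n = len(next(iter(data.values())))
    seq = list(zip(*(data[v] for v in names))) if names else [()] * n
    return kt_log_prob(seq, math.prod(sizes[v] for v in names))

def best_parents(child, candidates, data, sizes, priors):
    """Maximize log p_S + log Q^n(S, child) - log Q^n(S) over parent sets S."""
    best_score, best_set = -math.inf, None
    for r in range(len(candidates) + 1):
        for parents in combinations(candidates, r):
            score = (math.log(priors[frozenset(parents)])
                     + log_q_joint(parents + (child,), data, sizes)
                     - log_q_joint(parents, data, sizes))
            if score > best_score:
                best_score, best_set = score, parents
    return best_set

# Hypothetical binary data: Y copies X with 10% flips, Z copies Y with 20% flips.
rng = random.Random(0)
n = 2000
xs = [rng.randrange(2) for _ in range(n)]
ys = [x ^ (rng.random() < 0.1) for x in xs]
zs = [y ^ (rng.random() < 0.2) for y in ys]
data = {"X": xs, "Y": ys, "Z": zs}
sizes = {"X": 2, "Y": 2, "Z": 2}

priors_y = {frozenset(): 0.5, frozenset({"X"}): 0.5}  # p_0, p_1
priors_z = {frozenset(s): 0.25 for s in [(), ("X",), ("Y",), ("X", "Y")]}  # p_00 .. p_11
print(best_parents("Y", ("X",), data, sizes, priors_y))
print(best_parents("Z", ("X", "Y"), data, sizes, priors_z))
```

Because $Q^n(x^n, y^n, z^n)/Q^n(x^n, y^n)$ pays for more free parameters than $Q^n(y^n, z^n)/Q^n(y^n)$, a spurious extra parent is penalized roughly like an MDL term, which is the point of using a universal measure here.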
- 9. Problem: Universal Coding. $A := \{0, 1, \ldots, m-1\}$ with $m \ge 2$; $x^n = (x_1, \ldots, x_n) \in A^n$ is emitted independently by an unknown $P^n(x^n) := \prod_{i=1}^n P(x_i)$. $\varphi$: a uniquely decodable code $A^n \to \{0, 1\}^*$; if $\varphi(x^n) \in \{0, 1\}^\ell$, then $L_\varphi(x^n) := \ell$. $\varphi$ is universal if $\frac{L_\varphi(x^n)}{n} \to H := \sum_{x \in A} -P(x) \log P(x)$ for any $P$; examples include LZ and CTW.
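As a worked instance of $H$ (logarithms taken base 2, so lengths are in bits; the distribution is an illustrative choice), take $m = 2$:

```latex
P(0) = 0.9,\quad P(1) = 0.1:\qquad
H = -0.9\log_2 0.9 - 0.1\log_2 0.1
  \approx 0.9 \times 0.1520 + 0.1 \times 3.3219
  \approx 0.469 \text{ bits per symbol}
```

So a universal code needs about $0.469n$ bits for long sequences from this source, against $n$ bits for the fixed-length binary encoding.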
- 10. Problem: Why Can $P^n$ Be Replaced by $Q^n$? $Q^n$ is a universal coding measure w.r.t. $A$ if $-\frac{1}{n}\log Q^n(x^n) \to H$ for any $P$ and $\sum_{x^n \in A^n} Q^n(x^n) \le 1$; for example, $Q^n(x^n) := 2^{-L_\varphi(x^n)}$ if $\varphi$ is universal (the sum is at most 1 for any uniquely decodable $\varphi$ by the Kraft-McMillan inequality). Shannon-McMillan-Breiman: for any $P$, $-\frac{1}{n}\log P^n(x^n) = \frac{1}{n}\sum_{i=1}^n \{-\log P(x_i)\} \to E[-\log P(X)] = H$. Universality: $\frac{1}{n}\log\frac{P^n(x^n)}{Q^n(x^n)} \to 0$.
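A minimal Python sketch of the universality claim, assuming the Krichevsky-Trofimov (KT) mixture as $Q^n$ (one standard universal measure; the slides name LZ and CTW instead). It checks that $Q^n$ sums to 1 over $A^n$ for a small $n$, and prints $\frac{1}{n}\log_2\frac{P^n(x^n)}{Q^n(x^n)}$ for growing $n$ on a Bernoulli(0.1) source; the source and seed are illustrative.

```python
import math
import random
from collections import Counter
from itertools import product

def kt_log2_prob(seq, m):
    """log2 of the KT mixture probability Q^n(seq) over an m-ary alphabet."""
    n = len(seq)
    ln_q = math.lgamma(m / 2) - math.lgamma(n + m / 2)
    for count in Counter(seq).values():
        ln_q += math.lgamma(count + 0.5) - math.lgamma(0.5)
    return ln_q / math.log(2)

def iid_log2_prob(seq, p):
    """log2 P^n(seq) for an i.i.d. source with symbol probabilities p."""
    return sum(math.log2(p[a]) for a in seq)

# Q^n is a probability on A^n: summing over all 2^4 binary sequences of length 4.
total = sum(2 ** kt_log2_prob(s, 2) for s in product(range(2), repeat=4))

rng = random.Random(1)
p = {0: 0.9, 1: 0.1}
xs = [0 if rng.random() < 0.9 else 1 for _ in range(10000)]
gaps = {}
for n in (100, 1000, 10000):
    # (1/n) log2 P^n(x^n) / Q^n(x^n) should shrink toward 0 as n grows.
    gaps[n] = (iid_log2_prob(xs[:n], p) - kt_log2_prob(xs[:n], 2)) / n
    print(n, round(gaps[n], 5))
```

For the KT mixture the redundancy against the best i.i.d. model is about $\frac{m-1}{2}\log_2 n$ bits, so the per-symbol gap decays like $\frac{\log n}{n}$.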
- 11. Problem: Today’s Problem: What If $A, B, C$ Are Not Finite? $X \in A = [0, 1)$: continuous; $Y \in B = \{1, 2, \ldots\}$: discrete and infinite; $Z \in C = [0, 1) \cup \{1, 2, \ldots\}$: neither continuous nor discrete. Without assuming that $A, B, C$ are either discrete or continuous, what form does universality, $\frac{1}{n}\log\frac{P^n(x^n)}{Q^n(x^n)} \to 0$, take, and what plays the role of the universal measure $Q^n$?
- 12. Density Estimation: If a Density Function $f$ Exists for $X$. $A_0 := \{A\}$, and $A_{k+1}$ is a refinement of $A_k$. Example 1: $A = [0, 1)$:
  $A_0 = \{[0, 1)\}$, $A_1 = \{[0, 1/2), [1/2, 1)\}$, $A_2 = \{[0, 1/4), [1/4, 1/2), [1/2, 3/4), [3/4, 1)\}$, ..., $A_k = \{[0, 2^{-k}), [2^{-k}, 2 \cdot 2^{-k}), \ldots, [(2^k - 1)2^{-k}, 1)\}$, ...
  $s_k: A \to A_k$ (quantizer over $A$); $s_k^n: A^n \to A_k^n$ (quantizer over $A^n$, applied componentwise).
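Example 1's partitions and the quantizer $s_k$ can be sketched in Python with exact rational endpoints; the names `partition`, `s`, and `s_n` are hypothetical. Cell $i$ of $A_{k+1}$ lies inside cell $\lfloor i/2 \rfloor$ of $A_k$, which is the refinement property.

```python
from fractions import Fraction

def partition(k):
    """Cells of A_k: the 2^k dyadic intervals [i 2^-k, (i+1) 2^-k) covering [0, 1)."""
    width = Fraction(1, 2 ** k)
    return [(i * width, (i + 1) * width) for i in range(2 ** k)]

def s(k, x):
    """Quantizer s_k: A -> A_k, returning the cell of A_k that contains x in [0, 1)."""
    i = min(int(x * 2 ** k), 2 ** k - 1)  # clamp guards against rounding at the right end
    width = Fraction(1, 2 ** k)
    return (i * width, (i + 1) * width)

def s_n(k, xs):
    """Quantizer s_k^n: A^n -> A_k^n, applied componentwise."""
    return [s(k, x) for x in xs]

print(s_n(2, [0.1, 0.3, 0.9]))
```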
- 13. Density Estimation. $Q_k^n$: a universal coding measure w.r.t. $A_k$; $\lambda$: Lebesgue measure (the width of an interval), with $\lambda^n(s_k^n(x^n)) = \prod_{i=1}^n \lambda(s_k(x_i))$.
  $g_k^n(x^n) := \frac{Q_k^n(s_k^n(x^n))}{\lambda^n(s_k^n(x^n))}$; $\{\omega_k\}_{k=1}^\infty$ with $\sum_k \omega_k = 1$, $\omega_k > 0$; $g^n(x^n) := \sum_k \omega_k g_k^n(x^n)$.
  $f_k^n(x^n) := \frac{P_k^n(s_k^n(x^n))}{\lambda^n(s_k^n(x^n))} = \prod_{i=1}^n \frac{P_k(s_k(x_i))}{\lambda(s_k(x_i))}$, where $P_k$ is the distribution of $s_k(X)$ over $A_k$.
  If $\{A_k\}$ is such that $h(f_k) \to h(f)$ as $k \to \infty$ ($h$: differential entropy), then for any $f$, $\frac{1}{n}\log\frac{f^n(x^n)}{g^n(x^n)} \to 0$ (B. Ryabko, 2009).
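A minimal Python sketch of $g^n$ for Example 1, computed in log space. Assumptions not taken from the slides: each $Q_k^n$ is the Krichevsky-Trofimov mixture over the $2^k$ cells, $\omega_k = 1/(k(k+1))$, and the sum over $k$ is truncated at a finite level (which only drops nonnegative terms). Data are drawn from the density $f(x) = 2x$ via $X = \sqrt{U}$, so $\frac{1}{n}\log\frac{f^n(x^n)}{g^n(x^n)}$ can be evaluated directly.

```python
import math
import random
from collections import Counter

def kt_log_prob(symbols, alphabet_size):
    """Natural log of the KT mixture probability of a symbol sequence."""
    n = len(symbols)
    log_q = math.lgamma(alphabet_size / 2) - math.lgamma(n + alphabet_size / 2)
    for count in Counter(symbols).values():
        log_q += math.lgamma(count + 0.5) - math.lgamma(0.5)
    return log_q

def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def s_k(x, k):
    """Index of the dyadic cell of A_k that contains x in [0, 1)."""
    return min(int(x * 2 ** k), 2 ** k - 1)

def log_g(xs, max_level=10):
    """log g^n(x^n) = log sum_k w_k Q_k^n(s_k^n(x^n)) / lambda^n(s_k^n(x^n)), truncated."""
    n = len(xs)
    terms = []
    for k in range(1, max_level + 1):
        log_w = -math.log(k * (k + 1))        # hypothetical weights w_k = 1/(k(k+1))
        log_q = kt_log_prob([s_k(x, k) for x in xs], 2 ** k)
        neg_log_lambda = n * k * math.log(2)  # every cell of A_k has width 2^-k
        terms.append(log_w + log_q + neg_log_lambda)
    return log_sum_exp(terms)

rng = random.Random(0)
xs = [math.sqrt(rng.random()) for _ in range(4000)]  # density f(x) = 2x on [0, 1)
log_f = sum(math.log(2 * x) for x in xs)
gap = (log_f - log_g(xs)) / len(xs)
print(round(gap, 4))
```

The mixture lets the data choose the resolution: coarse levels pay little for parameters, fine levels approximate $f$ better, and $\log g^n$ is at least the best single term plus $\log \omega_k$.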
- 14. Density Estimation in a General Sense. Exactly when does a density function $f$ exist for $X$? $\mathcal{B}$: the Borel sets of $\mathbb{R}$; $\mu(D)$: the probability of $D \in \mathcal{B}$; $\lambda(D)$: the Lebesgue measure of $D \in \mathcal{B}$.
  $\mu$ absolutely continuous w.r.t. $\lambda$. The following conditions are equivalent (Radon-Nikodym): (i) $\mu \ll \lambda$: for each $D \in \mathcal{B}$, $\lambda(D) = 0 \Rightarrow \mu(D) = 0$; (ii) there exists $f := \frac{d\mu}{d\lambda}$ such that $\mu(D) = \int_{t \in D} f(t)\, d\lambda(t)$.
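A worked illustration of the two conditions, with measures chosen for illustration only (they are not from the slides). The second measure has an atom, which is why a reference measure other than $\lambda$ is needed once $X$ can take isolated values with positive probability:

```latex
\begin{aligned}
&f(t) = 2t \text{ on } [0,1):\quad
  \mu([a,b)) = \int_{[a,b)} 2t \, d\lambda(t) = b^2 - a^2, \qquad 0 \le a \le b \le 1,\\
&\lambda(\{1/2\}) = 0 \;\Rightarrow\; \mu(\{1/2\}) = \int_{\{1/2\}} 2t \, d\lambda(t) = 0 \quad (\mu \ll \lambda);\\
&\nu := \tfrac12\,\delta_{1/2} + \tfrac12\,\lambda|_{[0,1)}:\quad
  \lambda(\{1/2\}) = 0 \text{ but } \nu(\{1/2\}) = \tfrac12, \text{ so } \nu \not\ll \lambda.
\end{aligned}
```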
- 15. Density Estimation in a General Sense (Suzuki 2011). $\mu$ absolutely continuous w.r.t. $\eta$. The following conditions are equivalent (Radon-Nikodym): (i) $\mu \ll \eta$: for each $D \in \mathcal{B}$, $\eta(D) = 0 \Rightarrow \mu(D) = 0$; (ii) there exists $f := \frac{d\mu}{d\eta}$ such that $\mu(D) = \int_{t \in D} f(t)\, d\eta(t)$.
  Example 2: $\mu(\{j\}) > 0$, $\eta(\{j\}) := \frac{1}{j(j+1)}$, $j \in B = \{1, 2, \ldots\}$. Then $\mu \ll \eta$, which is equivalent to the existence of $f$ such that $\mu(D) = \sum_{j \in D} f(j)\,\eta(\{j\})$ for $D \subseteq B$. In fact, $f(j) = \frac{\mu(\{j\})}{\eta(\{j\})}$ satisfies the condition. (The Lebesgue integral $\int$ does not distinguish the discrete $\sum$ from the continuous $\int$.)
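Example 2 can be checked with exact arithmetic. The choice $\mu(\{j\}) = 2^{-j}$ is an illustrative assumption. The sketch confirms that $\eta$ is a probability on $B$ (since $\frac{1}{j(j+1)} = \frac{1}{j} - \frac{1}{j+1}$, the partial sums telescope to $1 - \frac{1}{N+1}$) and that $f(j) = \mu(\{j\})/\eta(\{j\})$ reproduces $\mu(D)$.

```python
from fractions import Fraction

def eta(j):
    """eta({j}) = 1 / (j (j + 1)) on B = {1, 2, ...} (Example 2)."""
    return Fraction(1, j * (j + 1))

def mu(j):
    """An illustrative probability on B: mu({j}) = 2^-j."""
    return Fraction(1, 2 ** j)

def f(j):
    """Generalized density d(mu)/d(eta) at j: mu({j}) / eta({j})."""
    return mu(j) / eta(j)

# eta is a probability on B: the partial sums telescope to 1 - 1/(N + 1).
N = 50
eta_partial = sum(eta(j) for j in range(1, N + 1))

# mu(D) is recovered as sum_{j in D} f(j) eta({j}) for a finite D.
D = {1, 2, 3}
mu_D = sum(f(j) * eta(j) for j in D)
print(eta_partial, mu_D, f(3))
```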
- 16. Density Estimation in a General Sense. $B_0 := \{B\}$ with $B = \{1, 2, \ldots\}$, $B_1 := \{\{1\}, \{2, 3, \ldots\}\}$, $B_2 := \{\{1\}, \{2\}, \{3, 4, \ldots\}\}$, ..., $B_k := \{\{1\}, \{2\}, \ldots, \{k\}, \{k+1, k+2, \ldots\}\}$, ...
  $s_k: B \to B_k$, $s_k^n: B^n \to B_k^n$; $g_k^n(y^n) := \frac{Q_k^n(s_k^n(y^n))}{\eta^n(s_k^n(y^n))}$, $g^n(y^n) := \sum_{k=1}^\infty \omega_k g_k^n(y^n)$.
  If $\{B_k\}$ is such that $h(f_k) \to h(f)$ as $k \to \infty$, then for any $f$, $\frac{1}{n}\log\frac{f^n(y^n)}{g^n(y^n)} \to 0$. ($g^n(y^n)\prod_{i=1}^n \eta(\{y_i\})$ estimates $P(y^n) = f^n(y^n)\prod_{i=1}^n \eta(\{y_i\})$.)
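The same construction for $B = \{1, 2, \ldots\}$, sketched in Python. Assumptions not from the slides: the Krichevsky-Trofimov mixture over the $k+1$ cells of $B_k$ as $Q_k^n$, weights $\omega_k = 1/(k(k+1))$, truncation at level 20, and data from $\mu(\{j\}) = 2^{-j}$. The tail cell $\{k+1, k+2, \ldots\}$ has $\eta$ mass $\sum_{j > k} \frac{1}{j(j+1)} = \frac{1}{k+1}$.

```python
import math
import random
from collections import Counter

def kt_log_prob(symbols, alphabet_size):
    """Natural log of the KT mixture probability of a symbol sequence."""
    n = len(symbols)
    log_q = math.lgamma(alphabet_size / 2) - math.lgamma(n + alphabet_size / 2)
    for count in Counter(symbols).values():
        log_q += math.lgamma(count + 0.5) - math.lgamma(0.5)
    return log_q

def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def s_k(y, k):
    """Quantizer s_k: B -> B_k; {1}, ..., {k} keep their value, the tail maps to k + 1."""
    return y if y <= k else k + 1

def log_eta(cell, k):
    """log eta of a B_k cell: eta({j}) = 1/(j(j+1)), eta({k+1, k+2, ...}) = 1/(k+1)."""
    return -math.log(cell * (cell + 1)) if cell <= k else -math.log(k + 1)

def log_g(ys, max_level=20):
    """log g^n(y^n) = log sum_k w_k Q_k^n(s_k^n(y^n)) / eta^n(s_k^n(y^n)), truncated."""
    terms = []
    for k in range(1, max_level + 1):
        cells = [s_k(y, k) for y in ys]
        log_base = sum(log_eta(c, k) for c in cells)
        log_w = -math.log(k * (k + 1))  # hypothetical weights w_k = 1/(k(k+1))
        terms.append(log_w + kt_log_prob(cells, k + 1) - log_base)
    return log_sum_exp(terms)

rng = random.Random(0)
ys = []
for _ in range(4000):
    y = 1
    while rng.random() < 0.5:  # P(Y = j) = 2^-j
        y += 1
    ys.append(y)

# True generalized density f(j) = mu({j}) / eta({j}) = 2^-j * j * (j + 1).
log_f = sum(-y * math.log(2) + math.log(y * (y + 1)) for y in ys)
gap = (log_f - log_g(ys)) / len(ys)
print(round(gap, 4))
```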
- 17. Density Estimation in a General Sense: Estimation of Simultaneous Density Functions. Example 3: $A \times B$ (based on Examples 1 and 2 for $A$ and $B$), with $\mu \ll \lambda$ and $\mu \ll \eta$:
  $A_0 \times B_0 = \{A\} \times \{B\}$, where $A \times B = [0, 1) \times \{1, 2, \ldots\}$; then $A_1 \times B_1$, $A_2 \times B_2$, ..., $A_k \times B_k$, ...; $s_k: A \times B \to A_k \times B_k$.
  If $\{A_k \times B_k\}$ is such that $h(f_k) \to h(f)$ as $k \to \infty$, then for any $f$, $g^n$ can be constructed so that
  $$\frac{1}{n}\log\frac{f^n(x^n, y^n)}{g^n(x^n, y^n)} \to 0. \tag{1}$$
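The product partition of Example 3 can be sketched as follows; the helper names are hypothetical. Each cell of $A_k \times B_k$ is measured by the product of its Lebesgue width and the $\eta$ mass from Example 2, so, by analogy with the one-dimensional constructions, $g_k^n(x^n, y^n)$ would divide $Q_k^n$ of the joint cell sequence by the product of these cell measures. The check confirms that the cell measures of $A_3 \times B_3$ sum to 1.

```python
from fractions import Fraction

def cell_a(x, k):
    """Index i of the cell [i 2^-k, (i+1) 2^-k) of A_k containing x in [0, 1)."""
    return min(int(x * 2 ** k), 2 ** k - 1)

def cell_b(y, k):
    """Index of the cell of B_k containing y; index k + 1 is the tail {k+1, k+2, ...}."""
    return y if y <= k else k + 1

def s_joint(x, y, k):
    """s_k: A x B -> A_k x B_k, as a pair of cell indices."""
    return cell_a(x, k), cell_b(y, k)

def base_measure(i, j, k):
    """(lambda x eta) of the product cell (i, j): width 2^-k times the eta mass of cell j."""
    eta = Fraction(1, j * (j + 1)) if j <= k else Fraction(1, k + 1)
    return Fraction(1, 2 ** k) * eta

k = 3
total = sum(base_measure(i, j, k) for i in range(2 ** k) for j in range(1, k + 2))
print(s_joint(0.3, 7, k), base_measure(*s_joint(0.3, 7, k), k))
```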
- 18. Structure Estimation in a General Sense. Estimate the generalized density functions
  $f_X^n(x^n)$, $f_Y^n(y^n)$, $f_Z^n(z^n)$, $f_{XY}^n(x^n, y^n)$, $f_{XZ}^n(x^n, z^n)$, $f_{YZ}^n(y^n, z^n)$, $f_{XYZ}^n(x^n, y^n, z^n)$
  by
  $g_X^n(x^n)$, $g_Y^n(y^n)$, $g_Z^n(z^n)$, $g_{XY}^n(x^n, y^n)$, $g_{XZ}^n(x^n, z^n)$, $g_{YZ}^n(y^n, z^n)$, $g_{XYZ}^n(x^n, y^n, z^n)$,
  so that we can compare
  $$p_0 g_Y^n(y^n), \quad p_1 \frac{g_{XY}^n(x^n, y^n)}{g_X^n(x^n)}$$
  and
  $$p_{00} g_Z^n(z^n), \quad p_{01} \frac{g_{YZ}^n(y^n, z^n)}{g_Y^n(y^n)}, \quad p_{10} \frac{g_{XZ}^n(x^n, z^n)}{g_X^n(x^n)}, \quad p_{11} \frac{g_{XYZ}^n(x^n, y^n, z^n)}{g_{XY}^n(x^n, y^n)}.$$
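A minimal end-to-end sketch of the first comparison, with $X \in [0, 1)$ continuous and $Y \in \{1, 2, \ldots\}$ discrete and infinite (Examples 1 to 3). Assumptions not from the slides: Krichevsky-Trofimov mixtures for each $Q_k^n$, $\omega_k = 1/(k(k+1))$, truncation at level 8, one shared level $k$ for the product partition, equal priors $p_0 = p_1 = 1/2$, and hypothetical synthetic data. The $Z$ comparisons follow the same pattern with more columns.

```python
import math
import random
from collections import Counter

def kt_log_prob(symbols, alphabet_size):
    """Natural log of the KT mixture probability of a symbol sequence."""
    n = len(symbols)
    log_q = math.lgamma(alphabet_size / 2) - math.lgamma(n + alphabet_size / 2)
    for count in Counter(symbols).values():
        log_q += math.lgamma(count + 0.5) - math.lgamma(0.5)
    return log_q

def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def cell_a(x, k):
    """Dyadic cell of A_k containing x in [0, 1), with the log of its Lebesgue measure."""
    return min(int(x * 2 ** k), 2 ** k - 1), -k * math.log(2)

def cell_b(y, k):
    """Cell of B_k containing y in {1, 2, ...}, with the log of its eta measure."""
    if y <= k:
        return y, -math.log(y * (y + 1))
    return k + 1, -math.log(k + 1)  # tail cell {k+1, k+2, ...}

def size_a(k):
    return 2 ** k

def size_b(k):
    return k + 1

def log_g(columns, cells, sizes, max_level=8):
    """log g^n on the product partition at levels k = 1..max_level (truncated mixture)."""
    n = len(columns[0])
    terms = []
    for k in range(1, max_level + 1):
        symbols, log_base = [], 0.0
        for i in range(n):
            parts = [cell(col[i], k) for cell, col in zip(cells, columns)]
            symbols.append(tuple(index for index, _ in parts))
            log_base += sum(log_m for _, log_m in parts)
        alphabet = math.prod(size(k) for size in sizes)
        terms.append(-math.log(k * (k + 1)) + kt_log_prob(symbols, alphabet) - log_base)
    return log_sum_exp(terms)

def y_depends_on_x(xs, ys, p0=0.5, p1=0.5):
    """Compare p0 g_Y^n(y^n) with p1 g_XY^n(x^n, y^n) / g_X^n(x^n), in log space."""
    lg_x = log_g([xs], [cell_a], [size_a])
    lg_y = log_g([ys], [cell_b], [size_b])
    lg_xy = log_g([xs, ys], [cell_a, cell_b], [size_a, size_b])
    return math.log(p1) + lg_xy - lg_x > math.log(p0) + lg_y

def geometric(rng):
    """Sample from P(j) = 2^-j on {1, 2, ...}."""
    j = 1
    while rng.random() < 0.5:
        j += 1
    return j

rng = random.Random(0)
xs = [rng.random() for _ in range(2000)]
ys_dep = [1 if x < 0.5 else 1 + geometric(rng) for x in xs]  # Y reveals whether X < 1/2
ys_ind = [geometric(rng) for _ in xs]                        # Y independent of X
print(y_depends_on_x(xs, ys_dep), y_depends_on_x(xs, ys_ind))
```

Sharing one level $k$ across $A$ and $B$ keeps the sketch short but makes the joint alphabet grow as $2^k(k+1)$; separate levels per coordinate would be a natural variation, in line with the open question of choosing $\{A_k\}$ and $\{\omega_k\}$ realistically.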
- 19. Summary. Universal measure without assuming either discrete or continuous: $\frac{1}{n}\log\frac{f^n(x^n)}{g^n(x^n)} \to 0$, where $f^n(x^n) = \frac{d\mu^n}{d\eta^n}(x^n)$ and $g^n(x^n) = \frac{d\nu^n}{d\eta^n}(x^n)$ are extended density functions. Many applications are based on the same approach: estimation of Markov orders (discrete time, continuous values); estimation of mutual information and its application to Chow-Liu. Future work: realistic settings of $\{A_k\}$ and $\{\omega_k\}$ based on prior information; development of structure estimation modules.