Bayesian Network Structure Estimation
Based on the Bayesian/MDL Criteria
When Both Discrete and Continuous Variables are Present
Joe Suzuki
Osaka University
April 11, 2012
Joe Suzuki (Osaka University) Bayesian Network Structure Estimation Based on the Bayesian/MDL Criteria When Both DApril 11, 2012 1 / 17
Road Map
1 Problem
2 Density Estimation
3 Density Estimation in a General Sense
4 Structure Estimation in a General Sense
5 Summary
Problem
Bayesian Network Structure
X, Y , Z: random variables ordered as X < Y < Z
 
[Figure: the eight candidate structures over X, Y, Z under the order X < Y < Z]
Problem
Structure Estimation
X, Y , Z: random variables over sets A, B, C
$\{(x_i, y_i, z_i)\}_{i=1}^{n} \in (A \times B \times C)^n$:
n examples independently emitted by P(X, Y, Z)

Structure Estimation: choose one among the eight structures based on $\{(x_i, y_i, z_i)\}_{i=1}^{n}$

(The three-variable case X, Y, Z can be extended to the d-variable case $\{X_j\}_{j=1}^{d}$ in a straightforward manner.)
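As a rough illustration of where the count of eight comes from, the sketch below enumerates the structures compatible with a fixed variable order by letting each variable take any subset of its predecessors as parents. The function name `candidate_structures` and the dictionary representation are assumptions made for this example, not something the slides define.

```python
from itertools import combinations

def candidate_structures(order):
    """Enumerate structures consistent with a fixed variable order.

    Each variable may take any subset of its predecessors as parents.
    Returns a list of dicts mapping variable -> tuple of parents.
    """
    structures = [{}]
    for idx, var in enumerate(order):
        preds = order[:idx]
        # All subsets of the predecessors are admissible parent sets.
        parent_sets = [ps for r in range(len(preds) + 1)
                       for ps in combinations(preds, r)]
        structures = [dict(s, **{var: ps})
                      for s in structures for ps in parent_sets]
    return structures

structures = candidate_structures(("X", "Y", "Z"))
for s in structures:
    print(s)
```

With three variables, X has one possible parent set, Y has two, and Z has four, which gives 1 x 2 x 4 = 8 structures; the same function applies to the d-variable case.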
Problem
Previous Works
Previous approaches assume either that
all of the $X_j$ are finite, or
all of the $X_j$ are Gaussian.

In reality, a database typically has some discrete fields and some continuous fields.
Problem
If A, B, C are finite
Given $x^n \in A^n$, $y^n \in B^n$, $z^n \in C^n$, we compute
$$Q^n(x^n),\ Q^n(y^n),\ Q^n(z^n),\ Q^n(x^n, y^n),\ Q^n(x^n, z^n),\ Q^n(y^n, z^n),\ Q^n(x^n, y^n, z^n)$$
For some prior probabilities $p_0, p_1, p_{00}, p_{01}, p_{10}, p_{11}$,
what Y depends on is decided by which is larger between
$$p_0\, Q^n(y^n), \quad p_1\, \frac{Q^n(x^n, y^n)}{Q^n(x^n)}$$
and what Z depends on is decided by which is the largest among
$$p_{00}\, Q^n(z^n), \quad p_{01}\, \frac{Q^n(y^n, z^n)}{Q^n(y^n)}, \quad p_{10}\, \frac{Q^n(x^n, z^n)}{Q^n(x^n)}, \quad p_{11}\, \frac{Q^n(x^n, y^n, z^n)}{Q^n(x^n, y^n)}$$
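The comparison for Y can be sketched in a few lines. The slides do not fix a particular $Q^n$, so this example assumes the Krichevsky-Trofimov (Dirichlet-1/2) mixture as one possible universal measure over a finite alphabet; the function names, the equal priors, and the alphabet-size arguments are also assumptions made for illustration. Scores are compared in the log domain to avoid underflow.

```python
import math
from collections import Counter

def log2_kt(seq, alphabet_size):
    """log2 of the Krichevsky-Trofimov mixture probability of seq.

    Assumed stand-in for Q^n; any universal measure could be used instead.
    """
    counts = Counter()
    log_q = 0.0
    for i, s in enumerate(seq):
        # Sequential KT predictive probability: (count + 1/2) / (i + m/2).
        log_q += math.log2((counts[s] + 0.5) / (i + alphabet_size / 2))
        counts[s] += 1
    return log_q

def choose_parent_of_y(xs, ys, m_x, m_y, p0=0.5, p1=0.5):
    """Compare p0 * Q(y^n) with p1 * Q(x^n, y^n) / Q(x^n) in the log domain."""
    pairs = list(zip(xs, ys))
    score_none = math.log2(p0) + log2_kt(ys, m_y)
    score_x = (math.log2(p1) + log2_kt(pairs, m_x * m_y)
               - log2_kt(xs, m_x))
    return "X" if score_x > score_none else "none"

# Y copies X: the conditional score should win.
print(choose_parent_of_y([0, 1] * 50, [0, 1] * 50, 2, 2))
# X and Y empirically independent: the marginal score should win.
print(choose_parent_of_y([0, 0, 1, 1] * 25, [0, 1, 0, 1] * 25, 2, 2))
```

The four-way comparison for Z follows the same pattern with the corresponding joint and marginal sequences.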
Problem
Universal Coding
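$A := \{0, 1, \dots, m - 1\}$ with $m \ge 2$

$x^n = (x_1, \dots, x_n) \in A^n$: independently emitted by unknown
$$P^n(x^n) := \prod_{i=1}^{n} P(x_i)$$
$\varphi$: uniquely decodable coding $A^n \to \{0, 1\}^*$
$$\varphi(x^n) \in \{0, 1\}^{\ell} \implies L_\varphi(x^n) := \ell$$
(the code length is written $\ell$ here to avoid a clash with the alphabet size m)

$\varphi$ is universal if, for any P,
$$\frac{L_\varphi(x^n)}{n} \to H := \sum_{x \in A} -P(x) \log P(x)$$
Examples: LZ, CTW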
Problem
Why can $P^n$ be replaced by $Q^n$?

$Q^n$: a universal coding measure w.r.t. A
$$-\frac{1}{n} \log Q^n(x^n) \to H \ \text{ for any } P, \qquad \sum_{x^n \in A^n} Q^n(x^n) \le 1$$
such as $Q^n(x^n) := 2^{-L_\varphi(x^n)}$ if $\varphi$ is universal

Shannon-McMillan-Breiman: for any P,
$$-\frac{1}{n} \log P^n(x^n) = \frac{1}{n} \sum_{i=1}^{n} \{-\log P(x_i)\} \to E[-\log P(X)] = H$$

Universality
$$\frac{1}{n} \log \frac{P^n(x^n)}{Q^n(x^n)} \to 0$$
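The universality statement can be checked numerically under assumptions. The sketch below again takes the Krichevsky-Trofimov mixture as the universal measure $Q^n$ (the slides name LZ and CTW as examples and do not prescribe this choice), draws an i.i.d. sample from a hypothetical distribution `probs`, and reports $\frac{1}{n} \log_2 \frac{P^n(x^n)}{Q^n(x^n)}$ for growing n.

```python
import math
import random
from collections import Counter

def log2_kt(seq, alphabet_size):
    """log2 of the Krichevsky-Trofimov mixture probability of seq (assumed Q^n)."""
    counts = Counter()
    log_q = 0.0
    for i, s in enumerate(seq):
        log_q += math.log2((counts[s] + 0.5) / (i + alphabet_size / 2))
        counts[s] += 1
    return log_q

def per_symbol_regret(probs, n, seed=0):
    """(1/n) * log2(P^n(x^n) / Q^n(x^n)) for an i.i.d. sample drawn from probs."""
    rng = random.Random(seed)
    symbols = list(range(len(probs)))
    xs = rng.choices(symbols, weights=probs, k=n)
    log_p = sum(math.log2(probs[x]) for x in xs)
    return (log_p - log2_kt(xs, len(probs))) / n

# Hypothetical source distribution over a three-letter alphabet.
for n in (100, 1000, 10000):
    print(n, per_symbol_regret([0.7, 0.2, 0.1], n))
```

For this estimator the total gap grows only logarithmically in n, so the per-symbol value shrinks toward zero as the sample grows.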
Problem
Today’s Problem: What if A, B, C are not finite?
X ∈ A = [0, 1) Continuous
Y ∈ B = {1, 2, · · · } Discrete and Infinite
Z ∈ C = [0, 1) ∪ {1, 2, · · · } neither Continuous nor Discrete
 
Without assuming that A, B, C are either discrete or continuous:
What does universality, $\frac{1}{n} \log \frac{P^n(x^n)}{Q^n(x^n)} \to 0$, look like?
What does a universal measure like $Q^n$ look like?
Density Estimation
If a density function f exists for X
$A_0 := \{A\}$
$A_{k+1}$ is a refinement of $A_k$
Example 1: A = [0, 1)
$A_0 = \{[0, 1)\}$
$A_1 = \{[0, 1/2), [1/2, 1)\}$
$A_2 = \{[0, 1/4), [1/4, 1/2), [1/2, 3/4), [3/4, 1)\}$
. . .
$A_k = \{[0, 2^{-k}), [2^{-k}, 2 \cdot 2^{-k}), \dots, [(2^{k} - 1) 2^{-k}, 1)\}$
. . .
$s_k : A \to A_k$ (quantizer over A)
$s_k^n : A^n \to A_k^n$ (quantizer over $A^n$)
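A minimal sketch of the quantizer for Example 1, assuming level k splits [0, 1) into $2^k$ equal-width cells as in $A_1$ and $A_2$. The function name and the (index, width) return convention are choices made for this illustration.

```python
def quantize_unit_interval(x, k):
    """Return (cell_index, cell_width) of x in [0, 1) at level k.

    Level k is assumed to consist of 2**k equal-width cells, so the
    cell containing x is [index * width, (index + 1) * width).
    """
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    width = 2.0 ** -k
    index = min(int(x * 2 ** k), 2 ** k - 1)  # guard against rounding
    return index, width

def quantize_sequence(xs, k):
    """Apply the level-k quantizer to every element of a sequence."""
    return [quantize_unit_interval(x, k) for x in xs]

print(quantize_unit_interval(0.3, 2))
print(quantize_sequence([0.1, 0.6, 0.9], 1))
```

The cell width returned here is the Lebesgue measure of the cell, which is the quantity that a histogram-style density estimate divides by.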
Density Estimation
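$Q_k^n$: a universal coding measure w.r.t. $A_k$
$\lambda^n$: Lebesgue measure (width of an interval),
$$\lambda^n(s_k^n(x^n)) = \prod_{i=1}^{n} \lambda(s_k(x_i))$$
$$g_k^n(x^n) := \frac{Q_k^n(s_k^n(x^n))}{\lambda^n(s_k^n(x^n))}$$
$\{\omega_k\}_{k=1}^{\infty}$: $\sum_k \omega_k = 1$, $\omega_k > 0$,
$$g^n(x^n) := \sum_k \omega_k\, g_k^n(x^n)$$
$$f_k^n(x^n) := \frac{P_k^n(s_k^n(x^n))}{\lambda^n(s_k^n(x^n))} = \prod_{i=1}^{n} \frac{P_k(s_k(x_i))}{\lambda(s_k(x_i))}$$
If $\{A_k\}$ is s.t. $h(f_k) \to h(f)$ $(k \to \infty)$, for any $f^n$,
$$\frac{1}{n} \log \frac{f^n(x^n)}{g^n(x^n)} \to 0$$
(B. Ryabko, 2009)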
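The mixture $g^n$ can be approximated in code under several assumptions that are not in the slides: the infinite sum is truncated at a hypothetical `max_level`, the weights are taken as $\omega_k \propto 1/(k(k+1))$ and renormalized over the truncated range, each $Q_k^n$ is the Krichevsky-Trofimov mixture over the $2^k$ cells of level k, and everything is computed in the log domain.

```python
import math
from collections import Counter

def log2_kt(seq, alphabet_size):
    """log2 of the Krichevsky-Trofimov mixture probability of seq (assumed Q_k^n)."""
    counts = Counter()
    log_q = 0.0
    for i, s in enumerate(seq):
        log_q += math.log2((counts[s] + 0.5) / (i + alphabet_size / 2))
        counts[s] += 1
    return log_q

def log2_g(xs, max_level=8):
    """log2 of a truncated version of g^n(x^n) = sum_k w_k * g_k^n(x^n) on [0, 1)."""
    n = len(xs)
    # Assumed weights, renormalized because the sum is truncated.
    norm = sum(1.0 / (k * (k + 1)) for k in range(1, max_level + 1))
    terms = []
    for k in range(1, max_level + 1):
        cells = [min(int(x * 2 ** k), 2 ** k - 1) for x in xs]
        log_q = log2_kt(cells, 2 ** k)
        log_lambda = -k * n  # every cell at level k has width 2**-k
        log_w = math.log2(1.0 / (k * (k + 1)) / norm)
        terms.append(log_w + log_q - log_lambda)
    top = max(terms)  # log-sum-exp in base 2 for numerical stability
    return top + math.log2(sum(2.0 ** (t - top) for t in terms))

evenly_spread = [(i + 0.5) / 64 for i in range(64)]
concentrated = [x / 8 for x in evenly_spread]  # all mass in [0, 1/8)
print(log2_g(evenly_spread) / 64, log2_g(concentrated) / 64)
```

For the evenly spread sample the per-symbol log density stays close to 0, the value for the uniform density, while the concentrated sample receives a much higher estimated density.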
Density Estimation in a General Sense
Exactly when does a density function f exist for X?
$\mathcal{B}$: the Borel sets of $\mathbb{R}$
$\mu(D)$: the probability of $D \in \mathcal{B}$
$\lambda(D)$: the Lebesgue measure of $D \in \mathcal{B}$

$\mu$ is absolutely continuous w.r.t. $\lambda$
Equivalent conditions (Radon-Nikodym):
$\mu \ll \lambda$: for each $D \in \mathcal{B}$, $\lambda(D) = 0 \implies \mu(D) = 0$.
There exists $\frac{d\mu}{d\lambda} := f$ s.t. $\mu(D) = \int_{t \in D} f(t)\, d\lambda(t)$.
Density Estimation in a General Sense
Density Estimation in a General Sense (Suzuki 2011)
$\mu$ is absolutely continuous w.r.t. $\eta$
Equivalent conditions (Radon-Nikodym):
$\mu \ll \eta$: for each $D \in \mathcal{B}$, $\eta(D) = 0 \implies \mu(D) = 0$
There exists $\frac{d\mu}{d\eta} := f$ s.t. $\mu(D) = \int_{t \in D} f(t)\, d\eta(t)$

Example 2: $\mu(\{j\}) > 0$, $\eta(\{j\}) := \frac{1}{j(j + 1)}$, $j \in B = \{1, 2, \dots\}$
$\implies \mu \ll \eta \iff$ there exists f s.t. $\mu(D) = \sum_{j \in D} f(j)\, \eta(\{j\})$, $D \subseteq B$
In fact, $f(j) = \frac{\mu(\{j\})}{\eta(\{j\})}$ satisfies the condition.
(The Lebesgue integral does not distinguish the discrete $\sum$ from the continuous $\int$.)
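Example 2 can be checked with exact arithmetic. In the sketch below, `eta` follows the definition on the slide, while `mu` is a hypothetical geometric distribution chosen only for illustration; `density` is the ratio $f(j) = \mu(\{j\}) / \eta(\{j\})$.

```python
from fractions import Fraction

def eta(j):
    """Reference measure on {1, 2, ...}: eta({j}) = 1 / (j (j + 1))."""
    return Fraction(1, j * (j + 1))

def mu(j):
    """Hypothetical probability measure: mu({j}) = 2**-j (an assumption)."""
    return Fraction(1, 2 ** j)

def density(j):
    """Generalized density f = d mu / d eta evaluated at j."""
    return mu(j) / eta(j)

# eta telescopes: the sum over j = 1..N equals 1 - 1/(N + 1).
print(sum(eta(j) for j in range(1, 101)))

# mu(D) is recovered as the sum of f(j) * eta({j}) over j in D.
D = [2, 3, 5]
print(sum(density(j) * eta(j) for j in D), sum(mu(j) for j in D))
```

Because $\eta(\{j\}) > 0$ for every j, the ratio is always well defined, which is the discrete counterpart of dividing by an interval width in the continuous case.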
Density Estimation in a General Sense
$B_0 := \{B\}$ with $B = \{1, 2, \dots\}$
$B_1 := \{\{1\}, \{2, 3, \dots\}\}$
$B_2 := \{\{1\}, \{2\}, \{3, 4, \dots\}\}$
. . .
$B_k := \{\{1\}, \{2\}, \dots, \{k\}, \{k + 1, k + 2, \dots\}\}$
. . .
$s_k : B \to B_k$, $s_k^n : B^n \to B_k^n$
$$g_k^n(y^n) := \frac{Q_k^n(s_k^n(y^n))}{\eta^n(s_k^n(y^n))}, \qquad g^n(y^n) := \sum_{k=1}^{\infty} \omega_k\, g_k^n(y^n)$$
If $\{B_k\}$ is s.t. $h(f_k) \to h(f)$ $(k \to \infty)$, for any $f^n$,
$$\frac{1}{n} \log \frac{f^n(y^n)}{g^n(y^n)} \to 0$$
($g^n(y^n) \prod_{i=1}^{n} \eta(\{y_i\})$ estimates $P^n(y^n) = f^n(y^n) \prod_{i=1}^{n} \eta(\{y_i\})$.)
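The quantizer over $B_k$ and the $\eta$-mass of each cell can be sketched as follows. The function name and the convention of labelling the tail cell $\{k + 1, k + 2, \dots\}$ with `k + 1` are assumptions made for this example; the tail mass uses the telescoping identity $\sum_{j \ge k+1} \frac{1}{j(j+1)} = \frac{1}{k+1}$.

```python
from fractions import Fraction

def quantize_positive_integer(y, k):
    """Map y in {1, 2, ...} to its cell in B_k and return (label, eta_mass).

    Cells {1}, ..., {k} are labelled by their element; the tail cell
    {k+1, k+2, ...} is labelled k + 1 (a convention for this sketch).
    """
    if y < 1:
        raise ValueError("y must be a positive integer")
    if y <= k:
        return y, Fraction(1, y * (y + 1))
    # Tail cell: the eta-masses of k+1, k+2, ... telescope to 1/(k+1).
    return k + 1, Fraction(1, k + 1)

print(quantize_positive_integer(2, 3))
print(quantize_positive_integer(7, 3))
```

Dividing a universal measure of the quantized sequence by the product of these cell masses gives the level-k estimate $g_k^n(y^n)$, in the same way the interval widths are used in Example 1.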
Density Estimation in a General Sense
Estimation of Simultaneous Density Functions
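Example 3: $A \times B$ (based on Examples 1, 2 for A, B)
$\mu \ll \lambda$ and $\mu \ll \eta$
$A_0 \times B_0 = \{A\} \times \{B\} = [0, 1) \times \{1, 2, \dots\}$
$A_1 \times B_1$
$A_2 \times B_2$
. . .
$A_k \times B_k$
. . .
$s_k : A \times B \to A_k \times B_k$

If $\{A_k \times B_k\}$ is s.t. $h(f_k) \to h(f)$ $(k \to \infty)$, for any $f^n$, $g^n$ can be constructed so that
$$\frac{1}{n} \log \frac{f^n(x^n, y^n)}{g^n(x^n, y^n)} \to 0 \qquad (1)$$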
