MDL/Bayesian Criteria based on Universal Coding/Measure

Joe Suzuki
Solomonoff 85 conference, November 2011

Published in: Science

  1. MDL/Bayesian Criteria based on Universal Coding/Measure. Joe Suzuki (Osaka University), November 30, 2011.
  2. Road Map: (1) Problem, (2) Density Functions, (3) Generalized Density Functions, (4) The Bayesian Solution, (5) Summary.
  3. Problem: Warming-Up
     Identify whether $X$ and $Y$ are independent from $n$ examples $(x_1, y_1), \dots, (x_n, y_n)$ emitted independently by $(X, Y)$, where $X \in A := \{0, 1\}$ and $Y \in B := \{0, 1\}$.
     $p$: the prior probability that $X, Y$ are independent. $W_A, W_B, W_{AB}$: weights.
     $$Q^n(x^n) := \int P(x^n \mid \theta)\, dW_A(\theta), \qquad Q^n(y^n) := \int P(y^n \mid \theta)\, dW_B(\theta),$$
     $$Q^n(x^n, y^n) := \int P(x^n, y^n \mid \theta)\, dW_{AB}(\theta).$$
     The Bayesian answer: $p\, Q^n(x^n)\, Q^n(y^n) \ge (1 - p)\, Q^n(x^n, y^n) \iff X, Y \text{ are independent}$.
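The warming-up decision rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weights $W_A, W_B, W_{AB}$ are taken to be symmetric Dirichlet weights with parameter 1/2 (the Krichevsky-Trofimov choice the slides introduce later), the prior defaults to $p = 1/2$, and the function names `log_q` and `is_independent` are hypothetical.

```python
from math import lgamma, log


def log_q(symbols, m, a=0.5):
    """Log of the Dirichlet(a, ..., a) mixture probability of a sequence over m symbols.

    Assumption: symmetric Dirichlet weights stand in for W_A, W_B, W_AB.
    """
    counts = {}
    for s in symbols:
        counts[s] = counts.get(s, 0) + 1
    n = len(symbols)
    out = lgamma(m * a) - lgamma(n + m * a)
    for c in counts.values():
        out += lgamma(c + a) - lgamma(a)  # unseen symbols contribute zero
    return out


def is_independent(xs, ys, p=0.5):
    """Return True when p * Q(x^n) * Q(y^n) >= (1 - p) * Q(x^n, y^n), compared in logs."""
    lhs = log(p) + log_q(xs, 2) + log_q(ys, 2)
    rhs = log(1 - p) + log_q(list(zip(xs, ys)), 4)
    return lhs >= rhs


# Deterministic toy data: every (x, y) combination equally often vs. y copying x.
xs = [0, 0, 1, 1] * 50
ys_free = [0, 1, 0, 1] * 50
print(is_independent(xs, ys_free), is_independent(xs, xs))
```

Working in the log domain avoids underflow, since each $Q^n$ decays exponentially in $n$.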
  4. Problem: Today's Exercise
     Identify whether $X$ and $Y$ are independent from $n$ examples $(x_1, y_1), \dots, (x_n, y_n)$ emitted independently by $(X, Y)$, where $X \in A := [0, 1)$ is continuous and $Y \in B := \{1, 2, \dots\}$ is discrete and infinite.
     Problem: construct analogues of $Q^n(x^n)$, $Q^n(y^n)$, $Q^n(x^n, y^n)$, and extend those quantities to general $X, Y$ without assuming that either is discrete or continuous.
  5. Problem: Why can $Q^n(x^n)$, $Q^n(y^n)$, $Q^n(x^n, y^n)$ serve as probabilities?
     $W^*_A, W^*_B, W^*_{AB}$: the true priors.
     $$P^n(x^n) := \int P(x^n \mid \theta)\, dW^*_A(\theta), \qquad P^n(y^n) := \int P(y^n \mid \theta)\, dW^*_B(\theta),$$
     $$P^n(x^n, y^n) := \int P(x^n, y^n \mid \theta)\, dW^*_{AB}(\theta).$$
     If the true priors are known: use $W^*_A, W^*_B, W^*_{AB}$ to compare $p\, P^n(x^n) P^n(y^n)$ and $(1 - p)\, P^n(x^n, y^n)$.
     If they are unknown: use $W_A, W_B, W_{AB}$ to compare $p\, Q^n(x^n) Q^n(y^n)$ and $(1 - p)\, Q^n(x^n, y^n)$.
     The main issue: which $Q^n$ qualifies as an alternative to $P^n$?
  6. Problem: What is the exact $Q^n$ for finite $A$?
     Let $P(X = 1) = \theta$ and $P(X = 0) = 1 - \theta$. If we weight
     $$w(\theta) = \frac{1}{K}\, \theta^{a-1} (1 - \theta)^{a-1}, \qquad K := \int_0^1 \theta^{a-1} (1 - \theta)^{a-1}\, d\theta = \frac{\Gamma(a)^2}{\Gamma(2a)},$$
     with $a > 0$, then for each $x^n = (x_1, \dots, x_n) \in A^n$
     $$Q^n(x^n) := \int w(\theta)\, P(x^n \mid \theta)\, d\theta = \frac{\Gamma(2a) \prod_{x \in A} \Gamma(c_n[x] + a)}{\Gamma(a)^2\, \Gamma(n + 2a)},$$
     where $c_i[x]$ is the number of occurrences of $x \in A$ in $x^i = (x_1, \dots, x_i) \in A^i$ and $\Gamma$ is the Gamma function.
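As a sanity check on the closed form for $Q^n(x^n)$, the sketch below compares it with the product of sequential predictive probabilities $(c_{i-1}[x_i] + a)/(i - 1 + 2a)$, which is the standard chain-rule form of the same mixture. The function names are hypothetical and the data are an arbitrary binary sequence.

```python
from math import lgamma, log, exp, isclose


def log_q_closed(xs, a=0.5):
    """Closed form: Gamma(2a) * prod_x Gamma(c_n[x] + a) / (Gamma(a)^2 * Gamma(n + 2a))."""
    n = len(xs)
    c1 = sum(xs)
    c0 = n - c1
    return (lgamma(2 * a) + lgamma(c0 + a) + lgamma(c1 + a)
            - 2 * lgamma(a) - lgamma(n + 2 * a))


def log_q_sequential(xs, a=0.5):
    """Chain rule: multiply the predictive probability of each symbol given the past."""
    counts = [0, 0]
    total = 0.0
    for i, x in enumerate(xs):
        total += log((counts[x] + a) / (i + 2 * a))
        counts[x] += 1
    return total


xs = [0, 1, 1, 0, 1, 1, 1, 0]
print(log_q_closed(xs), log_q_sequential(xs))
```

The sequential form is convenient for online use, while the closed form only needs the final counts.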
  7. Problem: Universal Coding/Measures
     If we choose $a = 1/2$ (Krichevsky-Trofimov) and $x^n$ is emitted i.i.d. by $P^n(x^n) = \prod_{i=1}^n P(x_i)$, then for any $P$, almost surely,
     $$-\frac{1}{n} \log Q^n(x^n) \to H := \sum_{x \in A} -P(x) \log P(x).$$
     From the law of large numbers (Shannon-McMillan-Breiman): for any $P$, almost surely,
     $$-\frac{1}{n} \log P^n(x^n) = \frac{1}{n} \sum_{i=1}^n -\log P(x_i) \to E[-\log P(x_i)] = H.$$
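The convergence of $-\frac{1}{n} \log Q^n(x^n)$ can be illustrated numerically. The sketch below, with hypothetical function names, compares the per-symbol Krichevsky-Trofimov code length with the empirical entropy of a binary sequence that has a fixed 30% fraction of ones (an arbitrary choice). It relies on the well-known bound that the total gap is at most $\frac{1}{2} \ln n + \ln 2$ nats for binary sequences.

```python
from math import lgamma, log


def kt_code_length_per_symbol(c0, c1, a=0.5):
    """-(1/n) log Q^n for a binary sequence with c0 zeros and c1 ones (natural log)."""
    n = c0 + c1
    log_q = (lgamma(2 * a) + lgamma(c0 + a) + lgamma(c1 + a)
             - 2 * lgamma(a) - lgamma(n + 2 * a))
    return -log_q / n


def empirical_entropy(c0, c1):
    """Entropy (nats) of the empirical distribution; assumes both counts are positive."""
    n = c0 + c1
    return -sum(c / n * log(c / n) for c in (c0, c1))


def gap(n, frac_ones=0.3):
    """Per-symbol excess of the KT code length over the empirical entropy."""
    c1 = round(n * frac_ones)
    return kt_code_length_per_symbol(n - c1, c1) - empirical_entropy(n - c1, c1)


for n in (100, 1000, 10000):
    print(n, gap(n))
```

The gap is always positive, because a mixture cannot beat the maximum likelihood, and it shrinks roughly like $(\log n)/(2n)$.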
  8. Problem: The Essential Problem
     For any $P$, almost surely,
     $$\frac{1}{n} \log \frac{P^n(x^n)}{Q^n(x^n)} \to 0 \qquad (1)$$
     (the basis on which $P^n$ can be replaced by $Q^n$).
     When $X$ is neither discrete nor continuous: into what can $Q^n$ and (1) be generalized?
  9. Density Functions: If $X$ has a density function
     $A$: the range of $X$; $A_0 := \{A\}$; $A_{k+1}$ is a refinement of $A_k$.
     Example 1: if $A_0 = \{[0, 1)\}$, the histogram sequence can be
     $$A_1 = \{[0, 1/2), [1/2, 1)\}, \qquad A_2 = \{[0, 1/4), [1/4, 1/2), [1/2, 3/4), [3/4, 1)\}, \quad \dots$$
     $$A_k = \{[0, 2^{-k}), [2^{-k}, 2 \cdot 2^{-k}), \dots, [(2^k - 1) 2^{-k}, 1)\}, \quad \dots$$
     $s_k : A \to A_k$ maps a point to its cell, and $s_k^n : A^n \to A_k^n$ applies $s_k$ coordinate-wise.
     $\lambda$: Lebesgue measure, with $\lambda^n(s_k^n(x^n)) = \prod_{i=1}^n \lambda(s_k(x_i))$.
  10. Density Functions
     $\{\omega_k\}_{k=1}^\infty$: $\sum_k \omega_k = 1$, $\omega_k > 0$.
     $$g_k^n(x^n) := \frac{Q_k^n(s_k^n(x^n))}{\lambda^n(s_k^n(x^n))}, \qquad g^n(x^n) := \sum_{k=1}^\infty \omega_k\, g_k^n(x^n)$$
     $$f_k(x^n) := \frac{P_k^n(s_k^n(x^n))}{\lambda^n(s_k^n(x^n))} = \prod_{i=1}^n \frac{P_k(s_k(x_i))}{\lambda(s_k(x_i))}$$
     If we choose $\{A_k\}$ such that $f_k \to f$, then for any $f^n$, almost surely,
     $$\frac{1}{n} \log \frac{f^n(x^n)}{g^n(x^n)} \to 0 \qquad (2)$$
     B. Ryabko. IEEE Trans. on Inform. Theory, 55, 9, 2009.
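The mixture $g^n(x^n) = \sum_k \omega_k g_k^n(x^n)$ can be sketched for $A = [0, 1)$ with dyadic histograms. The following is an illustration under assumptions not fixed by the slides: level $k$ uses $2^k$ equal cells, each $Q_k^n$ is a symmetric Dirichlet(1/2) mixture over those cells, the weights are $\omega_k \propto 1/(k(k+1))$, and the infinite sum is truncated at `K` levels and renormalised. All function names are hypothetical.

```python
from math import lgamma, log, exp, floor


def log_kt(cells, m, a=0.5):
    """Log Dirichlet(a, ..., a) mixture probability of a cell sequence over m cells."""
    counts = {}
    for c in cells:
        counts[c] = counts.get(c, 0) + 1
    out = lgamma(m * a) - lgamma(len(cells) + m * a)
    for c in counts.values():
        out += lgamma(c + a) - lgamma(a)
    return out


def log_g(xs, K=8):
    """Log of the truncated mixture density estimate g^n(x^n) on [0, 1)."""
    n = len(xs)
    norm = 1.0 - 1.0 / (K + 1)  # sum of 1/(k(k+1)) for k = 1..K
    terms = []
    for k in range(1, K + 1):
        cells = [floor(x * 2**k) for x in xs]       # s_k(x_i)
        log_lambda_n = -n * k * log(2)              # each cell has length 2^-k
        log_gk = log_kt(cells, 2**k) - log_lambda_n
        terms.append(log_gk - log(k * (k + 1) * norm))
    top = max(terms)                                # log-sum-exp for stability
    return top + log(sum(exp(t - top) for t in terms))


concentrated = [0.01 * i for i in range(1, 11)] * 5   # all points below 0.125
uniform = [(i + 0.5) / 50 for i in range(50)]         # evenly spread grid
print(log_g(concentrated), log_g(uniform))
```

Data piled into a small interval receive a much higher estimated density than evenly spread data, which is what makes the ratio in (2) informative.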
  11. Generalized Density Functions: Exactly when does a density function exist?
     $\mathcal{B}$: the Borel $\sigma$-field of $\mathbb{R}$; $\mu(D)$: the probability of a Borel set $D$.
     When a density function exists, the following are equivalent:
     (a) for each $D \in \mathcal{B}$, $\lambda(D) = 0 \implies \mu(D) = 0$ (written $\mu \ll \lambda$);
     (b) there exists $\frac{d\mu}{d\lambda} := f$ such that $\mu(D) = \int_{t \in D} f(t)\, d\lambda(t)$.
  12. Generalized Density Functions: Density Functions in a General Sense
     Radon-Nikodym theorem. The following are equivalent:
     (a) for each $D \in \mathcal{B}$, $\eta(D) = 0 \implies \mu(D) = 0$ (written $\mu \ll \eta$);
     (b) there exists $\frac{d\mu}{d\eta} := f$ such that $\mu(D) = \int_{t \in D} f(t)\, d\eta(t)$.
     Example 2: let $\mu(\{j\}) > 0$ and $\eta(\{j\}) := \frac{1}{j(j+1)}$ for $j \in B := \{1, 2, \dots\}$. Then $\mu \ll \eta$,
     $$\mu(D) = \sum_{j \in D \cap B} f(j)\, \eta(\{j\}), \qquad \frac{d\mu}{d\eta}(j) = f(j) = \frac{\mu(\{j\})}{\eta(\{j\})}.$$
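Example 2 can be checked with exact rational arithmetic. In the sketch below, $\eta(\{j\}) = 1/(j(j+1))$ is taken from the slide, while the measure $\mu(\{j\}) = 2^{-j}$ is a hypothetical choice made only for illustration, as are the function names.

```python
from fractions import Fraction


def eta(j):
    """Reference measure from Example 2: eta({j}) = 1 / (j (j + 1))."""
    return Fraction(1, j * (j + 1))


def mu(j):
    """Hypothetical probability measure on {1, 2, ...}: mu({j}) = 2^-j."""
    return Fraction(1, 2**j)


def density(j):
    """Radon-Nikodym derivative d(mu)/d(eta) at j."""
    return mu(j) / eta(j)


# eta telescopes: the first J masses sum to 1 - 1/(J + 1), so eta is a probability measure.
print(sum(eta(j) for j in range(1, 101)))

# mu(D) is recovered by integrating the density against eta over D.
D = {2, 5, 7}
print(sum(density(j) * eta(j) for j in D), sum(mu(j) for j in D))
```

Because $\eta(\{j\}) > 0$ for every $j$, any probability measure on $B$ is absolutely continuous with respect to $\eta$.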
  13. Generalized Density Functions: In this work, ...
     $$B_1 := \{\{1\}, \{2, 3, \dots\}\}, \qquad B_2 := \{\{1\}, \{2\}, \{3, 4, \dots\}\}, \quad \dots$$
     $$B_k := \{\{1\}, \{2\}, \dots, \{k\}, \{k+1, k+2, \dots\}\}, \quad \dots$$
     $s_k : B \to B_k$, $s_k^n : B^n \to B_k^n$.
     $$g_k^n(y^n) := \frac{Q_k^n(s_k^n(y^n))}{\eta^n(s_k^n(y^n))}, \qquad g^n(y^n) := \sum_{k=1}^\infty \omega_k\, g_k^n(y^n)$$
     If we choose $\{B_k\}$ such that $f_k \to f$, then for any $f^n$, almost surely,
     $$\frac{1}{n} \log \frac{f^n(y^n)}{g^n(y^n)} \to 0 \qquad (3)$$
     ($g^n(y^n) \prod_{i=1}^n \eta(\{y_i\})$ is an estimate of $P(y^n) = f^n(y^n) \prod_{i=1}^n \eta(\{y_i\})$.)
  14. Generalized Density Functions: Joint Density Functions
     Example 3: $A \times B$ (based on Examples 1 and 2), with $\mu \ll \lambda\eta$.
     $$A_0 \times B_0 = \{A\} \times \{B\} = \{[0, 1)\} \times \{\{1, 2, \dots\}\}, \quad A_1 \times B_1, \quad A_2 \times B_2, \quad \dots, \quad A_k \times B_k, \quad \dots$$
     $s_k : A \times B \to A_k \times B_k$.
     If $\{A_k \times B_k\}$ satisfies $f_k \to f$, then we can construct $g^n$ such that, for any $f^n$, almost surely,
     $$\frac{1}{n} \log \frac{f^n(x^n, y^n)}{g^n(x^n, y^n)} \to 0 \qquad (4)$$
  15. The Bayesian Solution: If we come back to "Today's Exercise", ...
     Estimate $f_X^n(x^n)$, $f_Y^n(y^n)$, $f_{XY}^n(x^n, y^n)$ by $g_X^n(x^n)$, $g_Y^n(y^n)$, $g_{XY}^n(x^n, y^n)$.
     The Bayesian answer, with $p_0$ the prior probability of independence and $p_1 = 1 - p_0$:
     $$p_0\, g_X^n(x^n)\, g_Y^n(y^n) \ge p_1\, g_{XY}^n(x^n, y^n) \iff X, Y \text{ are independent}.$$
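Putting the pieces together, the following is a minimal sketch of the independence test for $X \in [0, 1)$ and $Y \in \{1, 2, \dots\}$. It rests on several assumptions that the slides leave open: dyadic cells for $X$, the partitions $B_k$ with $\eta(\{j\}) = 1/(j(j+1))$ for $Y$ (so the tail cell $\{k+1, k+2, \dots\}$ has mass $1/(k+1)$), Dirichlet(1/2) mixtures for each $Q_k^n$, weights $\omega_k \propto 1/(k(k+1))$ truncated at `K` levels, and a prior of 1/2 on independence. Function names are hypothetical.

```python
from math import lgamma, log, exp, floor


def log_kt(cells, m, a=0.5):
    """Log Dirichlet(a, ..., a) mixture probability of a cell sequence over m cells."""
    counts = {}
    for c in cells:
        counts[c] = counts.get(c, 0) + 1
    out = lgamma(m * a) - lgamma(len(cells) + m * a)
    for c in counts.values():
        out += lgamma(c + a) - lgamma(a)
    return out


def log_eta_cell(y, k):
    """log eta(s_k(y)): singleton mass for y <= k, tail mass 1/(k+1) otherwise."""
    return -log(y * (y + 1)) if y <= k else -log(k + 1)


def log_mix(levels):
    """log sum_k omega_k g_k with omega_k proportional to 1/(k(k+1)), renormalised."""
    norm = 1.0 - 1.0 / (len(levels) + 1)
    terms = [lg - log(k * (k + 1) * norm) for k, lg in enumerate(levels, start=1)]
    top = max(terms)
    return top + log(sum(exp(t - top) for t in terms))


def log_g_x(xs, K=8):
    n = len(xs)
    return log_mix([log_kt([floor(x * 2**k) for x in xs], 2**k) + n * k * log(2)
                    for k in range(1, K + 1)])


def log_g_y(ys, K=8):
    return log_mix([log_kt([min(y, k + 1) for y in ys], k + 1)
                    - sum(log_eta_cell(y, k) for y in ys)
                    for k in range(1, K + 1)])


def log_g_xy(xs, ys, K=8):
    n = len(xs)
    levels = []
    for k in range(1, K + 1):
        cells = [(floor(x * 2**k), min(y, k + 1)) for x, y in zip(xs, ys)]
        log_measure = -n * k * log(2) + sum(log_eta_cell(y, k) for y in ys)
        levels.append(log_kt(cells, 2**k * (k + 1)) - log_measure)
    return log_mix(levels)


def is_independent(xs, ys, p0=0.5, K=8):
    """True when p0 * g_X * g_Y >= (1 - p0) * g_XY, compared in the log domain."""
    lhs = log(p0) + log_g_x(xs, K) + log_g_y(ys, K)
    rhs = log(1 - p0) + log_g_xy(xs, ys, K)
    return lhs >= rhs


xs = [(i + 0.5) / 200 for i in range(200)]
ys_dependent = [1 if x < 0.5 else 2 for x in xs]   # y is a function of x
ys_unrelated = [1 + (i % 2) for i in range(200)]   # y alternates regardless of x
print(is_independent(xs, ys_dependent), is_independent(xs, ys_unrelated))
```

The joint estimate uses the same level index $k$ for both coordinates, mirroring the product partitions $A_k \times B_k$ of Example 3.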
  16. The Bayesian Solution: In General, ...
     Given $n$ examples $z^n$ and a prior $\{p_m\}$ over models $m = 1, 2, \dots$, estimate
     $$f^n(z^n \mid m) = \frac{d\mu^n}{d\eta^n}(z^n \mid m) \quad \text{w.r.t. model } m \text{ by} \quad g^n(z^n \mid m) = \frac{d\nu^n}{d\eta^n}(z^n \mid m)$$
     such that
     $$\frac{1}{n} \log \frac{d\mu^n}{d\nu^n}(z^n \mid m) \to 0,$$
     where $\mu \ll \eta$, $\nu \ll \eta$, and
     $$\frac{d\mu^n}{d\nu^n}(z^n \mid m) = \frac{d\mu^n}{d\eta^n}(z^n \mid m) \Big/ \frac{d\nu^n}{d\eta^n}(z^n \mid m) = \frac{f^n(z^n \mid m)}{g^n(z^n \mid m)},$$
     in order to find the model $m$ maximizing $p_m \cdot \frac{d\nu^n}{d\eta^n}(z^n \mid m)$.
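The general rule, choosing the model $m$ that maximizes $p_m \cdot g^n(z^n \mid m)$, can be illustrated with Markov order estimation for a binary sequence. This sketch makes assumptions beyond the slides: each context gets its own Krichevsky-Trofimov predictor, symbols near the start are predicted from the shorter context that is available, and the prior is $p_m \propto 2^{-(m+1)}$. The function names are hypothetical.

```python
from math import log


def log_q_markov(xs, order, a=0.5):
    """Log mixture probability of a binary sequence under an order-`order` Markov model."""
    counts = {}
    total = 0.0
    for i, x in enumerate(xs):
        ctx = tuple(xs[max(0, i - order):i])   # shorter context at the start
        c = counts.setdefault(ctx, [0, 0])
        total += log((c[x] + a) / (c[0] + c[1] + 2 * a))
        c[x] += 1
    return total


def select_order(xs, max_order=3):
    """Return the order m maximizing log p_m + log g^n(z^n | m)."""
    scores = {m: -(m + 1) * log(2) + log_q_markov(xs, m)
              for m in range(max_order + 1)}
    return max(scores, key=scores.get)


alternating = [0, 1] * 20   # each symbol is determined by the previous one
constant = [0] * 40         # no memory needed
print(select_order(alternating), select_order(constant))
```

Scores are compared as log-probabilities, so the prior enters as an additive penalty on the model index.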
  17. Summary and Discussion
     Bayesian measure: generalization without assuming discrete or continuous variables; universality for Bayes as well as MDL.
     Many applications: Markov order estimation even when $\{X_i\}$ is continuous; Bayesian network structure estimation.
