
# 2013 IEEE International Symposium on Information Theory

July 7-12, 2013, Istanbul, Turkey.

### Transcript

• 1. Universal Bayesian Measures
  Joe Suzuki, Osaka University
  IEEE International Symposium on Information Theory, Istanbul, Turkey, July 8, 2013
  Outline: Problem; Density Functions; Generalized Density Functions; The Bayesian Solution; Summary
• 2. Given $n$ examples, identify whether $X, Y$ are independent or not:
  $$(x_1, y_1), \cdots, (x_n, y_n) \sim (X, Y) \in \{0, 1\} \times \{0, 1\}$$
  $p$: a prior probability that $X, Y$ are independent.
  The Bayesian answer: consider a weight $W$ over $\theta$ to compute
  $$Q^n(x^n) := \int P(x^n|\theta)\,dW(\theta), \qquad Q^n(y^n) := \int P(y^n|\theta)\,dW(\theta),$$
  $$Q^n(x^n, y^n) := \int P(x^n, y^n|\theta)\,dW(\theta)$$
  $$p\,Q^n(x^n)\,Q^n(y^n) \ge (1 - p)\,Q^n(x^n, y^n) \iff X, Y \text{ are independent}$$
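The decision rule on this slide can be sketched in a few lines. This is a minimal illustration, not the author's implementation: it assumes the weight $W$ is a symmetric Dirichlet with parameter `alpha` (the slide leaves $W$ open), and the names `log_mixture` and `decide_independent` are hypothetical.

```python
from collections import Counter
from math import lgamma, log

def log_mixture(counts, alpha=0.5):
    """log Q^n for a sequence with these symbol counts, assuming a
    symmetric Dirichlet(alpha) weight W (an assumption, not from the slide)."""
    k, n = len(counts), sum(counts)
    total = lgamma(k * alpha) - lgamma(n + k * alpha)
    for c in counts:
        total += lgamma(c + alpha) - lgamma(alpha)
    return total

def decide_independent(pairs, p=0.5, alpha=0.5):
    """Return True when p Q(x^n) Q(y^n) >= (1 - p) Q(x^n, y^n), compared in logs."""
    pairs = list(pairs)
    x_counts = Counter(x for x, _ in pairs)
    y_counts = Counter(y for _, y in pairs)
    xy_counts = Counter(pairs)
    log_qx = log_mixture([x_counts[0], x_counts[1]], alpha)
    log_qy = log_mixture([y_counts[0], y_counts[1]], alpha)
    log_qxy = log_mixture(
        [xy_counts[(a, b)] for a in (0, 1) for b in (0, 1)], alpha
    )
    return log(p) + log_qx + log_qy >= log(1 - p) + log_qxy

balanced = [(0, 0), (0, 1), (1, 0), (1, 1)] * 10   # all four pairs equally often
copied = [(0, 0), (1, 1)] * 20                     # y always equals x
print(decide_independent(balanced), decide_independent(copied))
```

Working in log space avoids underflow, since $Q^n$ shrinks exponentially in $n$.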
• 3. Problem: what if $X, Y$ are arbitrary random variables?
  $(\Omega, \mathcal{F}, P)$: probability space. $\mathcal{B}$: the Borel sets of $\mathbb{R}$.
  $X$ is a random variable: $X : \Omega \to \mathbb{R}$ is $\mathcal{F}$-measurable, that is,
  $$D \in \mathcal{B} \implies \{\omega \in \Omega \mid X(\omega) \in D\} \in \mathcal{F}$$
  $X, Y$ may be discrete, continuous, or neither.
• 4. What $Q^n$ is qualified to be an alternative to $P^n$?
  The true $\theta = \theta^*$ is not available, so the following cannot be computed:
  $$P^n(x^n) = P(x^n|\theta^*), \qquad P^n(y^n) = P(y^n|\theta^*), \qquad P^n(x^n, y^n) = P(x^n, y^n|\theta^*)$$
  Instead we use
  $$Q^n(x^n) := \int P(x^n|\theta)\,dW(\theta), \qquad Q^n(y^n) := \int P(y^n|\theta)\,dW(\theta),$$
  $$Q^n(x^n, y^n) := \int P(x^n, y^n|\theta)\,dW(\theta)$$
• 5. Example: Bayes Codes
  $c$: the number of ones in $x^n$. $\theta$: the probability of a one.
  $$P(x^n|\theta) = \theta^c (1 - \theta)^{n-c}$$
  For $a, b > 0$, take the weight
  $$w(\theta) \propto \theta^{a-1} (1 - \theta)^{b-1}$$
  Then for each $x^n = (x_1, \cdots, x_n) \in \{0, 1\}^n$,
  $$Q^n(x^n) := \int w(\theta) P(x^n|\theta)\,d\theta = \frac{\prod_{j=0}^{c-1} (j + a) \cdot \prod_{k=0}^{n-c-1} (k + b)}{\prod_{i=0}^{n-1} (i + a + b)}$$
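The closed-form product for $Q^n(x^n)$ can be evaluated directly. The sketch below is only an illustration of that formula; the function name `bayes_code_prob` is hypothetical.

```python
def bayes_code_prob(xn, a=0.5, b=0.5):
    """Q^n(x^n) from the product formula: c ones out of n binary symbols."""
    n, c = len(xn), sum(xn)
    numerator = 1.0
    for j in range(c):          # prod_{j=0}^{c-1} (j + a)
        numerator *= j + a
    for k in range(n - c):      # prod_{k=0}^{n-c-1} (k + b)
        numerator *= k + b
    denominator = 1.0
    for i in range(n):          # prod_{i=0}^{n-1} (i + a + b)
        denominator *= i + a + b
    return numerator / denominator

# Q^n is a probability measure on {0,1}^n, so the values for n = 2 sum to one.
total = sum(bayes_code_prob(list(x)) for x in [(0, 0), (0, 1), (1, 0), (1, 1)])
print(total)
```

With `a = b = 1` the weight is uniform on $[0, 1]$, and `bayes_code_prob([1, 1], 1, 1)` reduces to $\int_0^1 \theta^2\,d\theta = 1/3$.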
• 6. Universal Coding/Measures
  If we choose $a = b = 1/2$ (Krichevsky-Trofimov) and $x^n$ is emitted i.i.d. by
  $$P^n(x^n|\theta) = \prod_{i=1}^{n} P(x_i), \qquad P(x_i) = \theta \text{ if } x_i = 1, \quad 1 - \theta \text{ if } x_i = 0,$$
  then, for any $P$, almost surely,
  $$-\frac{1}{n} \log Q^n(x^n) \to H := \sum_{x \in A} -P(x) \log P(x)$$
  From the Shannon-McMillan-Breiman theorem, for any $P$,
  $$-\frac{1}{n} \log P^n(x^n|\theta) = \frac{1}{n} \sum_{i=1}^{n} -\log P(x_i) \to E[-\log P(x_i)] = H$$
• 7. Why can $P^n$ be replaced by $Q^n$ if $n$ is large?
  For any $P$, almost surely,
  $$\frac{1}{n} \log \frac{P^n(x^n)}{Q^n(x^n)} \to 0 \qquad (1)$$
  $Q^n$: a universal Bayesian measure for $A$.
  What are $Q^n$ and (1) in the general setting?
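Property (1) can be checked numerically for the Krichevsky-Trofimov measure ($a = b = 1/2$). The sketch below is deterministic rather than a simulation: as a simplifying assumption it uses a sequence whose fraction of ones equals $\theta$ exactly. The names `log_kt` and `redundancy_per_symbol` are hypothetical.

```python
from math import lgamma, log

def log_kt(n, c):
    """log Q^n(x^n) for a = b = 1/2; it depends on x^n only through n and c."""
    return (lgamma(c + 0.5) + lgamma(n - c + 0.5)
            - 2.0 * lgamma(0.5) - lgamma(n + 1.0))

def redundancy_per_symbol(n, theta):
    """(1/n) log(P^n / Q^n) for a sequence with round(n * theta) ones."""
    c = round(n * theta)
    log_p = c * log(theta) + (n - c) * log(1.0 - theta)
    return (log_p - log_kt(n, c)) / n

for n in (100, 1000, 10000):
    print(n, redundancy_per_symbol(n, 0.25))
```

The printed values shrink roughly like $(\log n)/(2n)$, which is the standard behaviour of the Krichevsky-Trofimov mixture and the reason the ratio in (1) vanishes.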
• 8. Suppose a density function exists for $X$
  $A$: the range of $X$. $A_0 := \{A\}$, and $A_{j+1}$ is a refinement of $A_j$.
  Example 1: if $A = [0, 1)$, the sequence can be
  $$A_0 = \{[0, 1)\}$$
  $$A_1 = \{[0, 1/2), [1/2, 1)\}$$
  $$A_2 = \{[0, 1/4), [1/4, 1/2), [1/2, 3/4), [3/4, 1)\}$$
  $$\cdots$$
  $$A_j = \{[0, 2^{-j}), [2^{-j}, 2 \cdot 2^{-j}), \cdots, [(2^{j} - 1) 2^{-j}, 1)\}$$
  $$\cdots$$
  $s_j : A \to A_j$ (quantization: $x \in a \in A_j \implies s_j(x) = a$)
  $\lambda : \mathcal{B} \to \mathbb{R}$ (Lebesgue measure: $a = [b, c) \implies \lambda(a) = c - b$)
  $Q_j^n$: a universal Bayesian measure for $A_j$
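The quantizer $s_j$ and the measure $\lambda$ of Example 1 can be sketched as follows. The code follows the explicitly listed partitions $A_0, A_1, A_2$, so level $j$ has $2^j$ cells of width $2^{-j}$; the names `quantize` and `lebesgue` are hypothetical.

```python
from math import floor

def quantize(x, j):
    """s_j(x): the cell [lo, hi) of the level-j dyadic partition of [0, 1) containing x."""
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    width = 2.0 ** (-j)
    index = floor(x / width)
    return (index * width, (index + 1) * width)

def lebesgue(cell):
    """lambda([b, c)) = c - b."""
    lo, hi = cell
    return hi - lo

for j in range(4):
    cell = quantize(0.3, j)
    print(j, cell, lebesgue(cell))
```

Each level halves the cell containing `x`, which is what makes $A_{j+1}$ a refinement of $A_j$.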
• 9. If $(s_j(x_1), \cdots, s_j(x_n)) = (a_1, \cdots, a_n)$,
  $$g_j^n(x^n) := \frac{Q_j^n(a_1, \cdots, a_n)}{\lambda(a_1) \cdots \lambda(a_n)}$$
  $$f_j^n(x^n) := f_j(x_1) \cdots f_j(x_n) = \frac{P_j(a_1) \cdots P_j(a_n)}{\lambda(a_1) \cdots \lambda(a_n)}$$
  For $\{\omega_j\}_{j=1}^{\infty}$ with $\sum_j \omega_j = 1$, $\omega_j > 0$,
  $$g^n(x^n) := \sum_{j=1}^{\infty} \omega_j\, g_j^n(x^n)$$
  For any $f$ and $\{A_j\}$ such that $h(f_j) \to h(f)$ as $j \to \infty$, almost surely
  $$\frac{1}{n} \log \frac{f^n(x^n)}{g^n(x^n)} \to 0 \qquad (2)$$
  B. Ryabko, IEEE Trans. on Inform. Theory, 55(9), 2009.
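A rough sketch of the mixture $g^n$ on $[0, 1)$ with dyadic cells follows. Several choices here are assumptions and not from the slides: $Q_j^n$ is taken to be a symmetric Dirichlet(1/2) mixture over the $2^j$ cells, the weights are $\omega_j \propto 2^{-j}$, and the infinite sum is truncated at `max_level` and renormalized. The function names are hypothetical.

```python
from math import exp, lgamma, log

def log_mixture(counts, alpha=0.5):
    """log Q^n under a symmetric Dirichlet(alpha) weight (assumed form of Q_j^n)."""
    k, n = len(counts), sum(counts)
    total = lgamma(k * alpha) - lgamma(n + k * alpha)
    for c in counts:
        total += lgamma(c + alpha) - lgamma(alpha)
    return total

def log_g_level(xs, j):
    """log g_j^n(x^n) = log Q_j^n(a_1..a_n) - sum_i log lambda(a_i), lambda(a_i) = 2^-j."""
    cells = 2 ** j
    counts = [0] * cells
    for x in xs:
        counts[int(x * cells)] += 1   # index of the dyadic cell containing x
    return log_mixture(counts) + len(xs) * j * log(2.0)

def log_g(xs, max_level=8):
    """log of the truncated mixture sum_j w_j g_j^n, computed with log-sum-exp."""
    levels = range(1, max_level + 1)
    norm = sum(2.0 ** (-j) for j in levels)
    terms = [log(2.0 ** (-j) / norm) + log_g_level(xs, j) for j in levels]
    top = max(terms)
    return top + log(sum(exp(t - top) for t in terms))

spread = [(i + 0.5) / 64 for i in range(64)]            # evenly spread over [0, 1)
packed = [0.01 * (i + 0.5) / 64 for i in range(64)]     # packed into [0, 0.01)
print(log_g(spread), log_g(packed))
```

The packed sample receives a much larger density value than the spread one, because fine levels assign most of their mass to the single small cell that contains it.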
• 10. Our Goal: what are they generalized into?
  1. If the random variable takes finitely many values:
  $$\frac{1}{n} \log \frac{P^n(x^n)}{Q^n(x^n)} \to 0 \qquad (1)$$
  for any $P^n$.
  2. If a density function exists:
  $$\frac{1}{n} \log \frac{f^n(x^n)}{g^n(x^n)} \to 0 \qquad (2)$$
  for any $f^n$ and $\{A_j\}$ satisfying $h(f_j) \to h(f)$ as $j \to \infty$.
• 11. Exactly when does a density function exist?
  $\mathcal{B}$: the Borel sets of $\mathbb{R}$. $\mu(D)$: the probability of $D \in \mathcal{B}$.
  When a density function exists, the following are equivalent ($\mu \ll \lambda$):
  for each $D \in \mathcal{B}$, $\lambda(D) = 0 \implies \mu(D) = 0$;
  there exists a $\mathcal{B}$-measurable $\frac{d\mu}{d\lambda} := f$ such that $\mu(D) = \int_D f(t)\,d\lambda(t)$.
  $f$ is the density function (w.r.t. $\lambda$).
• 12. Density Functions in a General Sense
  Radon-Nikodym theorem. The following are equivalent ($\mu \ll \eta$):
  for each $D \in \mathcal{B}$, $\eta(D) = 0 \implies \mu(D) = 0$;
  there exists a $\mathcal{B}$-measurable $\frac{d\mu}{d\eta} := f_\eta$ such that $\mu(D) = \int_D f_\eta(t)\,d\eta(t)$.
  $f_\eta$ is the density function w.r.t. $\eta$.
  Example 2: $\mu(\{h\}) > 0$, $\eta(\{h\}) := \frac{1}{h(h + 1)}$, $h \in B := \{1, 2, \cdots\}$. Then $\mu \ll \eta$ and
  $$\mu(D) = \sum_{h \in D \cap B} f_\eta(h)\,\eta(\{h\})$$
  $$\frac{d\mu}{d\eta}(h) = f_\eta(h) = \frac{\mu(\{h\})}{\eta(\{h\})} = h(h + 1)\,\mu(\{h\})$$
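Example 2 can be verified with exact arithmetic. The measure $\mu(\{h\}) = 2^{-h}$ used below is a hypothetical choice made only for illustration; the slides merely require $\mu(\{h\}) > 0$.

```python
from fractions import Fraction

def eta(h):
    """eta({h}) = 1 / (h (h + 1)); the partial sums telescope to 1 - 1/(N + 1)."""
    return Fraction(1, h * (h + 1))

def mu(h):
    """Hypothetical probability mass mu({h}) = 2^-h (an assumption for this sketch)."""
    return Fraction(1, 2 ** h)

def f_eta(h):
    """Radon-Nikodym derivative d mu / d eta at h, equal to h (h + 1) mu({h})."""
    return mu(h) / eta(h)

D = [1, 3, 4]
lhs = sum(mu(h) for h in D)                  # mu(D)
rhs = sum(f_eta(h) * eta(h) for h in D)      # sum over D of f_eta(h) eta({h})
print(lhs, rhs, f_eta(3))
```

Here $\eta$ plays the role that the Lebesgue measure plays for continuous variables: it is a fixed reference measure, and $f_\eta$ is the density relative to it.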
• 13. Partitions of $B = \{1, 2, \cdots\}$:
  $$B_1 := \{\{1\}, \{2, 3, \cdots\}\}$$
  $$B_2 := \{\{1\}, \{2\}, \{3, 4, \cdots\}\}$$
  $$\cdots$$
  $$B_k := \{\{1\}, \{2\}, \cdots, \{k\}, \{k + 1, k + 2, \cdots\}\}$$
  $$\cdots$$
  $t_k : B \to B_k$ (quantization: $y \in b \in B_k \implies t_k(y) = b$)
  If $(t_k(y_1), \cdots, t_k(y_n)) = (b_1, \cdots, b_n)$,
  $$g_{\eta,k}^n(y^n) := \frac{Q_k^n(b_1, \cdots, b_n)}{\eta(b_1) \cdots \eta(b_n)}, \qquad g_\eta^n(y^n) := \sum_{k=1}^{\infty} \omega_k\, g_{\eta,k}^n(y^n)$$
  For any $f_\eta$ and $\{B_k\}$ such that $h(f_{\eta,k}) \to h(f_\eta)$, almost surely
  $$\frac{1}{n} \log \frac{f_\eta^n(y^n)}{g_\eta^n(y^n)} \to 0 \qquad (3)$$
  $g_\eta^n(y^n) \prod_{i=1}^{n} \eta(\{y_i\})$ estimates $P(y^n) = f_\eta^n(y^n) \prod_{i=1}^{n} \eta(\{y_i\})$.
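One level $g_{\eta,k}^n$ of this construction can be sketched for positive-integer data. As assumptions not stated on the slide, $Q_k^n$ is taken to be a symmetric Dirichlet(1/2) mixture over the $k + 1$ cells of $B_k$, and $\eta(\{h\}) = 1/(h(h+1))$ as in Example 2, so the tail cell has $\eta(\{k+1, k+2, \cdots\}) = 1/(k+1)$. The function names are hypothetical.

```python
from math import lgamma, log

def log_mixture(counts, alpha=0.5):
    """log Q^n under a symmetric Dirichlet(alpha) weight (assumed form of Q_k^n)."""
    k, n = len(counts), sum(counts)
    total = lgamma(k * alpha) - lgamma(n + k * alpha)
    for c in counts:
        total += lgamma(c + alpha) - lgamma(alpha)
    return total

def eta_cell(y, k):
    """eta of the cell of B_k containing y: eta({y}) if y <= k, else the tail mass 1/(k+1)."""
    return 1.0 / (y * (y + 1)) if y <= k else 1.0 / (k + 1)

def log_g_eta_level(ys, k):
    """log g_{eta,k}^n(y^n) = log Q_k^n(b_1..b_n) - sum_i log eta(b_i)."""
    counts = [0] * (k + 1)
    for y in ys:
        counts[min(y, k + 1) - 1] += 1   # values above k share the tail cell
    return log_mixture(counts) - sum(log(eta_cell(y, k)) for y in ys)

ys = [1, 2, 1, 1, 3, 1, 2, 7, 1, 2]
for k in (1, 2, 4, 8):
    print(k, log_g_eta_level(ys, k))
```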
• 14. In the general case
  $$\mu^n(D^n) := \int_{D^n} f_\eta^n(y^n)\,d\eta^n(y^n), \qquad \nu^n(D^n) := \int_{D^n} g_\eta^n(y^n)\,d\eta^n(y^n)$$
  $$\frac{f_\eta^n(y^n)}{g_\eta^n(y^n)} = \frac{d\mu^n}{d\eta^n}(y^n) \Big/ \frac{d\nu^n}{d\eta^n}(y^n) = \frac{d\mu^n}{d\nu^n}(y^n)$$
  $$D(\mu\|\nu) := \int d\mu \log \frac{d\mu}{d\nu}$$
  $$h(f_\eta) := \int -f_\eta^n(y^n) \log f_\eta^n(y^n)\,d\eta(y^n) = -\int \frac{d\mu}{d\eta}(y^n) \log \frac{d\mu}{d\eta}(y^n) \cdot d\eta(y^n) = -D(\mu\|\eta)$$
• 15. Main Theorem
  Theorem. With probability one, as $n \to \infty$,
  $$\frac{1}{n} \log \frac{d\mu^n}{d\nu^n}(y^n) \to 0$$
  for any stationary ergodic $\mu^n$ and $\{B_k\}$ such that $D(\mu_k\|\eta) \to D(\mu\|\eta)$ as $k \to \infty$.
• 16. Joint Density Functions
  Example 3: $A \times B$ (based on Examples 1 and 2), $\mu \ll \lambda\eta$.
  $$A_0 \times B_0 = \{A\} \times \{B\} = \{[0, 1)\} \times \{\{1, 2, \cdots\}\}$$
  $$A_1 \times B_1,\quad A_2 \times B_2,\quad \cdots,\quad A_j \times B_k,\quad \cdots$$
  $(s_j, t_k) : A \times B \to A_j \times B_k$
  If $\{A_j \times B_k\}$ satisfies $f_{\lambda\eta,jk} \to f_{\lambda\eta}$, then for any $f_{\lambda\eta}$ we can construct $g_{\lambda\eta}^n$ such that, almost surely,
  $$\frac{1}{n} \log \frac{f_{\lambda\eta}^n(x^n, y^n)}{g_{\lambda\eta}^n(x^n, y^n)} \to 0 \qquad (4)$$
• 17. The Answer to the Problem
  Estimate $f_X^n(x^n)$, $f_Y^n(y^n)$, $f_{XY}^n(x^n, y^n)$ by $g_X^n(x^n)$, $g_Y^n(y^n)$, $g_{XY}^n(x^n, y^n)$.
  The Bayesian answer:
  $$p\,g_X^n(x^n)\,g_Y^n(y^n) \ge (1 - p)\,g_{XY}^n(x^n, y^n) \iff X, Y \text{ are independent}$$
• 18. The General Bayesian Solution
  Given $n$ examples $z^n$ and a prior $\{p_m\}$ over models $m = 1, 2, \cdots$:
  compute $g^n(z^n|m)$ for each $m = 1, 2, \cdots$;
  find the model $m$ maximizing $p_m\, g^n(z^n|m)$.
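The selection step can be sketched as an argmax over log scores. The dictionary inputs and the name `select_model` are hypothetical; the values of $\log g^n(z^n|m)$ are assumed to have been computed elsewhere, and the numbers below are made up for illustration.

```python
from math import log

def select_model(log_evidence, prior):
    """Return the model m maximizing p_m * g^n(z^n | m), compared in log space."""
    scores = {m: log(prior[m]) + log_evidence[m] for m in log_evidence}
    return max(scores, key=scores.get)

# Made-up log g^n(z^n | m) values for two candidate models.
log_evidence = {1: -10.0, 2: -9.5}
print(select_model(log_evidence, {1: 0.5, 2: 0.5}))   # equal priors
print(select_model(log_evidence, {1: 0.7, 2: 0.3}))   # prior favouring model 1
```

The independence test of the earlier slides is the special case with two models (independent and dependent) and priors $p$ and $1 - p$.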
• 19. Summary and Discussion
  Bayesian measure: generalization without assuming discrete or continuous; universality of Bayes/MDL in the generalized sense.
  Many applications: Bayesian network structure estimation (DCC 2012); the Bayesian Chow-Liu algorithm (PGM 2012); Markov order estimation even when $\{X_i\}$ is continuous.