1.
A Conjecture on Strongly Consistent Learning
Joe Suzuki
Osaka University
July 7, 2009
2.
Outline
Outline
1 Stochastic Learning with Model Selection
2 Learning CP
3 Learning ARMA
4 Conjecture
5 Summary
3.
Stochastic Learning with Model Selection
Probability Space (Ω, F, µ)
Ω: the sample space (the entire set of outcomes)
F: the set of events over Ω
µ : F → [0, 1]: a probability measure (µ(Ω) = 1)
4.
Stochastic Learning with Model Selection
Stochastic Learning
X : Ω → R: random variable
µX: the probability measure induced by X
x1, · · · , xn ∈ X(Ω): n samples
Induction/Inference:
µX −→ x1, · · · , xn (Random Number Generation)
x1, · · · , xn −→ µX (Stochastic Learning)
5.
Stochastic Learning with Model Selection
Two Problems with Model Selection
Conditional Probability (CP) for Y given X: µY|X
Identify the equivalence relation on X(Ω) from samples:
x ∼ x′ ⇐⇒ µ(Y = y | X = x) = µ(Y = y | X = x′) for all y ∈ Y(Ω)
ARMA for $\{X_n\}_{n=-\infty}^{\infty}$:
$X_n + \sum_{j=1}^{k} \lambda_j X_{n-j} \sim N(0, \sigma^2)$ with $\{\lambda_j\}_{j=1}^{k}$ $(0 \le k < \infty)$
Identify the true order k from samples.
This paper compares CP and ARMA to show how close they are.
6.
Learning CP
CP
A ∈ F, G ⊆ F: a sub-σ-field
CP of A given G (Radon-Nikodym):
∃ G-measurable g : Ω → R s.t. $\mu(A \cap G) = \int_G g(\omega)\,\mu(d\omega)$ for all G ∈ G.
CP µY|X:
A := (Y = y), y ∈ Y(Ω)
G: the σ-field generated by subsets of X(Ω)
Assumption: |Y(Ω)| < ∞
7.
Learning CP
Applications for CP
Stochastic decision trees, stochastic Horn clauses for Y | X, etc.
Find G from samples.
G = {G1, G2, G3, G4, G5}
"!
#
"!
#
"!
#
"!
#
"!
#
"!
#
"!
#
"!
#
¡
¡¡
e
ee
¡
¡¡
e
ee
¨¨¨¨¨¨
rrrrrr
Weather
Temp
G3
Wind
G1 G2 G4 G5
Clear
Cloud
Rain
≤ 75 > 75 Yes No
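As a concrete illustration (my own sketch, not from the slides; the attribute names and thresholds follow the tree above), the partition G can be coded as a map from attribute vectors to cells:

def cell(weather, temp, windy):
    # Map an attribute vector to its cell of the partition G = {G1,...,G5},
    # following the decision tree above.
    if weather == "Clear":
        return "G1" if temp <= 75 else "G2"
    if weather == "Cloud":
        return "G3"
    return "G4" if windy else "G5"  # weather == "Rain"

print(cell("Clear", 70, False))  # G1
print(cell("Rain", 80, True))    # G4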
9.
Learning CP
Learning CP
zi := (xi, yi) ∈ Z(Ω) := X(Ω) × Y(Ω)
From zn := (z1, · · · , zn) ∈ Zn(Ω), find a minimal G.
G∗: true
Ĝn: estimated
Assumption: |G∗| < ∞
Two Types of Errors
G∗ ⊂ Ĝn (Overestimation)
G∗ ̸⊆ Ĝn (Underestimation)
10.
Learning CP
Example: Quinlan's C4.5
[Four decision trees over Weather / Temp / Wind:
(a) G∗ (true): Clear → Temp (≤ 75 → 1, > 75 → 2); Cloud → 3; Rain → Wind (Yes → 4, No → 5)
(b) Overestimation: the Cloud leaf is further split by Wind, giving leaves 1-6 (G∗ ⊂ Ĝn)
(c) Underestimation: the Wind node under Rain is collapsed into a single leaf
(d) Underestimation: the Temp node under Clear is collapsed into a single leaf, while Cloud gains a Wind split (so G∗ ̸⊆ Ĝn)]
11.
Learning CP
Information Criteria for CP
Given zn ∈ Zn(Ω), find the G minimizing
$I(G, z^n) := H(G, z^n) + \frac{k(G)}{2} d_n$
H(G, zn): empirical entropy (fitness of zn to G)
k(G): the number of parameters (simplicity of G)
dn ≥ 0 with dn/n → 0:
dn = log n: BIC/MDL
dn = 2: AIC
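As a minimal sketch of how this criterion can be evaluated (my own code, not from the slides; `partition` is a hypothetical function mapping x to its cell of G), H(G, zn) is the negative maximized conditional log-likelihood and k(G) = |G| (|Y(Ω)| − 1):

from collections import Counter
from math import log

def criterion(zn, partition, d_n):
    # I(G, z^n) = H(G, z^n) + k(G)/2 * d_n, where zn is a list of (x, y)
    # pairs and partition(x) returns the cell of G containing x.
    cell_counts = Counter(partition(x) for x, _ in zn)
    joint_counts = Counter((partition(x), y) for x, y in zn)
    n_labels = len({y for _, y in zn})
    # Empirical entropy: minus the maximized conditional log-likelihood.
    H = -sum(c * log(c / cell_counts[g]) for (g, _), c in joint_counts.items())
    k = len(cell_counts) * (n_labels - 1)  # number of free parameters k(G)
    return H + 0.5 * k * d_n

With d_n = log(n) this is BIC/MDL, with d_n = 2 it is AIC, and with d_n = 2 * log(log(n)) it is the minimal strongly consistent penalty discussed next.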
12.
Learning CP
Consistency
Consistency (Ĝn −→ G∗ as n → ∞)
Weak consistency: convergence in probability (O(1) ≤ dn ≤ o(n))
Strong consistency: almost sure convergence (MDL/BIC, etc.)
AIC (dn = 2) is not consistent because {dn} is too small!
Problem: What is the minimum {dn} for strong consistency?
Answer (Suzuki, 2006): dn = 2 log log n
(via the Law of the Iterated Logarithm)
13.
Learning CP
Error Probability for CP
G∗ ⊂ G (Overestimation):
$\mu\{\omega \in \Omega \mid I(G, Z^n(\omega)) < I(G^*, Z^n(\omega))\} = \mu\{\omega \in \Omega \mid \chi^2_{k(G)-k(G^*)}(\omega) > (k(G) - k(G^*))\, d_n\}$
where $\chi^2_{k(G)-k(G^*)}$ is χ²-distributed with k(G) − k(G∗) degrees of freedom.
G∗ ̸⊆ G (Underestimation):
$\mu\{\omega \in \Omega \mid I(G, Z^n(\omega)) < I(G^*, Z^n(\omega))\}$
diminishes exponentially with n
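To see numerically how the penalty controls the overestimation probability (my own illustration, assuming the χ² characterization above), one can evaluate the tail µ{χ²m > m dn} with scipy, here for m = k(G) − k(G∗) = 1:

from math import log
from scipy.stats import chi2

m = 1  # extra parameters: k(G) - k(G*)
for n in (100, 10_000, 1_000_000):
    for name, d_n in (("AIC", 2.0), ("HQ", 2 * log(log(n))), ("BIC/MDL", log(n))):
        print(n, name, chi2.sf(m * d_n, df=m))

AIC's tail stays at about 0.157 for every n, while the Hannan-Quinn and BIC/MDL tails vanish as n grows, which is why a constant dn cannot give consistency.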
14.
Learning CP
Applications for ARMA
Time series Xi | Xi−k, · · · , Xi−1:
$X_i + \sum_{j=1}^{k} \lambda_j X_{i-j} \sim N(0, \sigma^2)$
Find k from samples.
Gaussian Bayesian networks Xi | Xj, j ∈ π(i):
$X_i + \sum_{j \in \pi(i)} \lambda_{j,i} X_j \sim N(0, \sigma_i^2)$
Find π(i) ⊆ {1, · · · , i − 1} from samples.
15.
Learning ARMA
Learning ARMA
k ≥ 0
$\{\lambda_j\}_{j=1}^{k}$: λj ∈ R
σ² > 0
$\{X_i\}_{i=-\infty}^{\infty}$: $X_i + \sum_{j=1}^{k} \lambda_j X_{i-j} = \epsilon_i \sim N(0, \sigma^2)$
Given xn := (x1, · · · , xn) ∈ X1(Ω) × · · · × Xn(Ω), estimate:
k known: $\{\lambda_j\}_{j=1}^{k}$ and σ²
k unknown: k as well as $\{\lambda_j\}_{j=1}^{k}$ and σ²
16.
Learning ARMA
Yule-Walker
If k is known, solve for $\{\hat\lambda_{j,k}\}_{j=1}^{k}$ and $\hat\sigma_k^2$ using
$\bar{x} := \frac{1}{n}\sum_{i=1}^{n} x_i$
$c_j := \frac{1}{n}\sum_{i=1}^{n-j}(x_i - \bar{x})(x_{i+j} - \bar{x})$, j = 0, · · · , k:
$$\begin{pmatrix} -1 & c_1 & c_2 & \cdots & c_k \\ 0 & c_0 & c_1 & \cdots & c_{k-1} \\ 0 & c_1 & c_0 & \cdots & c_{k-2} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & c_{k-1} & c_{k-2} & \cdots & c_0 \end{pmatrix} \begin{pmatrix} \hat\sigma_k^2 \\ \hat\lambda_{1,k} \\ \hat\lambda_{2,k} \\ \vdots \\ \hat\lambda_{k,k} \end{pmatrix} = \begin{pmatrix} -c_0 \\ -c_1 \\ -c_2 \\ \vdots \\ -c_k \end{pmatrix}$$
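A minimal numpy sketch of this system (my own implementation, solving the (k+1) × (k+1) equation above exactly as written):

import numpy as np

def yule_walker(x, k):
    # Solve the (k+1)x(k+1) Yule-Walker system above for
    # (sigma2_k, lambda_{1,k}, ..., lambda_{k,k}).
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    c = np.array([((x[:n - j] - xbar) * (x[j:] - xbar)).sum() / n
                  for j in range(k + 1)])
    M = np.zeros((k + 1, k + 1))
    M[0, 0] = -1.0
    M[0, 1:] = c[1:]
    for i in range(1, k + 1):
        M[i, 1:] = c[np.abs(np.arange(1, k + 1) - i)]  # Toeplitz rows c_{|j-i|}
    sol = np.linalg.solve(M, -c)
    return sol[0], sol[1:]  # (hat sigma^2_k, hat lambda_{j,k} for j = 1..k)

For example, yule_walker(x, 2) on n samples returns the estimated innovation variance and the two AR coefficients.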
17.
Learning ARMA
Information Criteria for ARMA
If k is unknown, given xn ∈ Xn(Ω), find the k minimizing
$I(k, x^n) := \frac{1}{2}\log\hat\sigma_k^2 + \frac{k}{2} d_n$
$\hat\sigma_k^2$: obtained via Yule-Walker
dn ≥ 0 with dn/n → 0:
dn = log n: BIC/MDL
dn = 2: AIC
dn = 2 log log n: Hannan-Quinn (1979)
=⇒ the minimum {dn} satisfying strong consistency
Suzuki (2006) was inspired by Hannan-Quinn (1979)!
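A sketch of the resulting order-selection loop (my own code, reusing the yule_walker sketch above; I write the criterion in the form (n/2) log σ̂²k + (k/2) dn, my assumption for putting the fit term and the penalty on the same scale, consistent with dn/n → 0):

from math import log

def select_order(x, k_max, d_n):
    # Return the k in 0..k_max minimizing (n/2) log sigma2_k + (k/2) d_n,
    # with sigma2_k estimated by the yule_walker sketch above.
    n = len(x)
    best_k, best_score = 0, float("inf")
    for k in range(k_max + 1):
        sigma2, _ = yule_walker(x, k)
        score = 0.5 * n * log(sigma2) + 0.5 * k * d_n
        if score < best_score:
            best_k, best_score = k, score
    return best_k

# Hannan-Quinn penalty: select_order(x, 10, 2 * log(log(len(x))))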
18.
Learning ARMA
Error Probability for ARMA
k∗: the true order
k∗ > k (Underestimation):
$\mu\{\omega \in \Omega \mid I(k, X^n(\omega)) < I(k^*, X^n(\omega))\}$
diminishes exponentially with n
19.
Conjecture
Error Probability for ARMA (cont'd)
Conjecture: k∗ < k (Overestimation)
$\mu\{\omega \in \Omega \mid I(k, X^n(\omega)) < I(k^*, X^n(\omega))\} = \mu\{\omega \in \Omega \mid \chi^2_{k-k^*}(\omega) > (k - k^*)\, d_n\}$
where $\chi^2_{k-k^*}$ is χ²-distributed with k − k∗ degrees of freedom.
20.
Conjecture
In fact, ...
For k > k∗ and large n (using $\hat\sigma_j^2 = (1 - \hat\lambda_{j,j}^2)\,\hat\sigma_{j-1}^2$),
$$\frac{1}{2}\log\hat\sigma_k^2 + \frac{k}{2} d_n < \frac{1}{2}\log\hat\sigma_{k^*}^2 + \frac{k^*}{2} d_n$$
$$\iff \sum_{j=k^*+1}^{k} \log\frac{\hat\sigma_j^2}{\hat\sigma_{j-1}^2} < -(k - k^*)\, d_n$$
$$\iff \sum_{j=k^*+1}^{k} \log(1 - \hat\lambda_{j,j}^2) < -(k - k^*)\, d_n$$
$$\iff \sum_{j=k^*+1}^{k} \hat\lambda_{j,j}^2 > (k - k^*)\, d_n \quad (\text{since } \log(1-x) \approx -x)$$
It is very likely that $\sum_{j=k^*+1}^{k} \hat\lambda_{j,j}^2 \sim \chi^2_{k-k^*}$, although $\hat\lambda_{j,j} \not\sim N(0, 1)$.
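The conjecture can be probed by simulation (my own hypothetical check, reusing the yule_walker sketch above): for white noise (k∗ = 0), compare the empirical quantiles of n Σ λ̂²j,j with those of χ²k; the factor n makes the scaling explicit, my assumption being that the slides absorb it into dn.

import numpy as np
from scipy.stats import chi2

def pacf_sq_sum(x, k):
    # n * sum_{j=1}^{k} lambda_hat_{j,j}^2, with lambda_hat_{j,j} the last
    # coefficient returned by the yule_walker sketch above at order j.
    return len(x) * sum(yule_walker(x, j)[1][-1] ** 2 for j in range(1, k + 1))

rng = np.random.default_rng(0)
k, n, trials = 3, 500, 2000  # true order k* = 0
stats = [pacf_sq_sum(rng.normal(size=n), k) for _ in range(trials)]
for q in (0.5, 0.9, 0.95):
    print(q, np.quantile(stats, q), chi2.ppf(q, df=k))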
21.
Summary
Summary
CP and ARMA are similar:
1 Results for ARMA can be applied to CP.
2 Results for Gaussian BNs can be applied to finite BNs.
Is the conjecture plausible?
Answer: Perhaps. There are several pieces of supporting evidence, e.g.:
$\{Z_i\}_{i=1}^{n}$ independent, $Z_i \sim N(0, 1)$:
$\mathrm{rank}\,A = n - k,\ A^2 = A \implies {}^t(Z_1, \cdots, Z_n)\, A\, (Z_1, \cdots, Z_n) \sim \chi^2_{n-k}$
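This classical fact is easy to verify numerically (my own sketch): take A = I − P with P the projection onto an arbitrary k-dimensional column space, so A² = A and rank A = n − k, and compare the quadratic form with χ²n−k:

import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
n, k = 10, 3
B = rng.normal(size=(n, k))           # arbitrary full-rank n x k matrix
P = B @ np.linalg.inv(B.T @ B) @ B.T  # projection onto col(B)
A = np.eye(n) - P                     # idempotent, rank n - k
stats = [z @ A @ z for z in rng.normal(size=(5000, n))]
for q in (0.5, 0.9, 0.95):
    print(q, np.quantile(stats, q), chi2.ppf(q, df=n - k))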