The document discusses multiple kernel learning (MKL) approaches. It introduces single kernel learning, describes the extension to learning over multiple reproducing kernel Hilbert spaces, and summarizes several regularization-based MKL formulations, including L1-MKL, L2-MKL, Elasticnet-MKL, and Mixed-Norm-Elasticnet-MKL, comparing their convergence rates against the mini-max rate.
Outline: Introduction · Mixed-Norm-Elasticnet-MKL · Mini-max Lp-MKL · Conclusion · References
MKL

Multiple Kernel Learning

Single Kernel Learning:

$$\hat{f} \leftarrow \min_{f \in \mathcal{H}_k}\; \frac{1}{n} \sum_{i=1}^{n} \ell(y_i, f(x_i)) + C\,\|f\|_{\mathcal{H}_k}$$
Multiple Kernel Learning (Lanckriet et al., 2004; Bach et al., 2004)

$$\hat{f} = \sum_{m=1}^{M} \hat{f}_m \leftarrow \min_{f_m \in \mathcal{H}_m}\; \frac{1}{n} \sum_{i=1}^{n} \ell\Big(y_i, \sum_{m=1}^{M} f_m(x_i)\Big) + C \sum_{m=1}^{M} \|f_m\|_{\mathcal{H}_m}$$

($\mathcal{H}_m$: the RKHS associated with kernel $k_m$)

The penalty $\sum_m \|f_m\|_{\mathcal{H}_m}$ is a Group Lasso type regularization (Sonnenburg et al., 2006; Rakotomamonjy et al., 2008; Suzuki & Tomioka, 2009).
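By the representer theorem each block can be written $f_m = \sum_i \alpha_{m,i} k_m(\cdot, x_i)$, so $\|f_m\|_{\mathcal{H}_m} = \sqrt{\alpha_m^\top K_m \alpha_m}$ and the MKL objective can be evaluated from Gram matrices. A sketch that evaluates (not solves) the objective for the squared loss; function and variable names are mine:

```python
import numpy as np

def mkl_objective(alphas, Ks, y, C=1.0):
    # (1/n) sum_i (y_i - sum_m f_m(x_i))^2 + C sum_m ||f_m||_{H_m},
    # with f_m(x_i) = (K_m @ alpha_m)[i] by the representer theorem.
    preds = sum(K @ a for K, a in zip(Ks, alphas))
    loss = np.mean((y - preds) ** 2)
    penalty = sum(np.sqrt(max(a @ K @ a, 0.0)) for K, a in zip(Ks, alphas))
    return loss + C * penalty

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 2))
y = rng.standard_normal(10)
K_lin = X @ X.T                                                  # linear kernel
K_rbf = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # Gaussian kernel
alphas = [np.zeros(10), np.zeros(10)]
val = mkl_objective(alphas, [K_lin, K_rbf], y)
# with all blocks zero, the objective reduces to the loss (1/n) sum_i y_i^2
```

The Group Lasso structure is visible in the penalty: the block norms are not squared, so minimization drives entire blocks $\alpha_m$ exactly to zero, i.e. it selects kernels.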
MKL

L1-MKL (Lanckriet et al., 2004; Bach et al., 2004)

$$\min_{f_m \in \mathcal{H}_m}\; L\Big(\sum_{m=1}^{M} f_m\Big) + C \sum_{m=1}^{M} \|f_m\|_{\mathcal{H}_m}$$

L2-MKL

$$\min_{f_m \in \mathcal{H}_m}\; L\Big(\sum_{m=1}^{M} f_m\Big) + C \sum_{m=1}^{M} \|f_m\|_{\mathcal{H}_m}^2$$

Elasticnet-MKL (Tomioka & Suzuki, 2009)

$$\min_{f_m \in \mathcal{H}_m}\; L\Big(\sum_{m=1}^{M} f_m\Big) + C_1 \sum_{m=1}^{M} \|f_m\|_{\mathcal{H}_m} + C_2 \sum_{m=1}^{M} \|f_m\|_{\mathcal{H}_m}^2$$

Mixed-Norm-Elasticnet-MKL (Meier et al., 2009)

$$\min_{f_m \in \mathcal{H}_m}\; L\Big(\sum_{m=1}^{M} f_m\Big) + C_1 \sum_{m=1}^{M} \sqrt{\|f_m\|_n^2 + C_2 \|f_m\|_{\mathcal{H}_m}^2} + C_3 \sum_{m=1}^{M} \|f_m\|_{\mathcal{H}_m}^2$$

where $\|f\|_n^2 := \frac{1}{n} \sum_{i=1}^{n} f(x_i)^2$.
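The four regularizers differ only in how they aggregate the per-block norms. A small sketch makes the comparison concrete (it assumes the per-block norms are already computed; the function names and the default constants are placeholders of mine):

```python
import numpy as np

# h[m] = ||f_m||_{H_m} (RKHS norms), fn[m] = ||f_m||_n (empirical norms)
def l1_pen(h, C=1.0):                    # L1-MKL penalty
    return C * np.sum(h)

def l2_pen(h, C=1.0):                    # L2-MKL penalty
    return C * np.sum(h ** 2)

def elasticnet_pen(h, C1=1.0, C2=1.0):   # Elasticnet-MKL penalty
    return C1 * np.sum(h) + C2 * np.sum(h ** 2)

def mixed_norm_elasticnet_pen(fn, h, C1=1.0, C2=1.0, C3=1.0):
    # Meier et al. (2009): the sqrt mixes empirical and RKHS norms per block
    return C1 * np.sum(np.sqrt(fn ** 2 + C2 * h ** 2)) + C3 * np.sum(h ** 2)

h = np.array([0.0, 1.0, 2.0])
print(l1_pen(h), l2_pen(h), elasticnet_pen(h))   # 3.0 5.0 8.0
```

The non-smooth pieces (the L1 sum and the square root) are what force entire blocks to zero at the optimum; the squared terms only shrink coefficients and add smoothness.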
Mixed-Norm-Elasticnet-MKL

Regression setting:

$$L(f) = \frac{1}{n} \sum_{i=1}^{n} (f(x_i) - y_i)^2$$

$$f^*(x) = \sum_{m=1}^{M} f_m^*(x) \quad (= E[Y \mid x])$$
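The empirical quantities in this regression setting are straightforward to compute; a minimal sketch (helper names are mine):

```python
import numpy as np

def empirical_sq_loss(f, X, y):
    # L(f) = (1/n) sum_i (f(x_i) - y_i)^2
    return np.mean((f(X) - y) ** 2)

def empirical_norm_sq(f, X):
    # ||f||_n^2 = (1/n) sum_i f(x_i)^2
    return np.mean(f(X) ** 2)

X = np.linspace(-1.0, 1.0, 5)
y = np.sin(X)               # noiseless targets from f*(x) = sin(x)
f = np.sin                  # candidate equal to the regression function
print(empirical_sq_loss(f, X, y))   # 0.0: the candidate matches the target
```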
Convergence rates of $\|\hat{f} - f^*\|_{L_2}^2$, where $d = |\{m \mid \|f_m^*\|_{\mathcal{H}_m} \neq 0\}|$:

L1-MKL (Koltchinskii & Yuan, 2008):
$$O_p\Big( d^{\frac{1-s}{1+s}}\, n^{-\frac{1}{1+s}} + \frac{d \log(M)}{n} \Big)$$

Mixed-Norm-Elasticnet-MKL (Meier et al., 2009): mini-max
$$O_p\Big( d \Big(\frac{\log(M)}{n}\Big)^{\frac{1}{1+s}} \Big)$$

Mixed-Norm-L1-MKL (Koltchinskii & Yuan, 2010), with penalty $\sum_m (C_1 \|f_m\|_n + C_2 \|f_m\|_{\mathcal{H}_m})$: mini-max
$$O_p\Big( d\, n^{-\frac{1}{1+s}} + \frac{d \log(M)}{n} \Big)$$

Mini-max rate (Raskutti et al., 2009):
$$O_p\Big( d\, n^{-\frac{1}{1+s}} + \frac{d \log(M/d)}{n} \Big)$$
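To see how the bounds compare, one can plug sample values into the rate expressions (constants dropped; the values of n, M, d, s below are arbitrary choices of mine, not from the slides):

```python
import numpy as np

def rate_meier(n, M, d, s):
    # Meier et al. (2009): d * (log(M)/n)^(1/(1+s))
    return d * (np.log(M) / n) ** (1.0 / (1.0 + s))

def rate_ky2010(n, M, d, s):
    # Koltchinskii & Yuan (2010): d * n^(-1/(1+s)) + d log(M) / n
    return d * n ** (-1.0 / (1.0 + s)) + d * np.log(M) / n

def rate_minimax(n, M, d, s):
    # Raskutti et al. (2009): d * n^(-1/(1+s)) + d log(M/d) / n
    return d * n ** (-1.0 / (1.0 + s)) + d * np.log(M / d) / n

n, M, d, s = 10_000, 1_000, 10, 0.5
for f in (rate_meier, rate_ky2010, rate_minimax):
    print(f.__name__, f(n, M, d, s))
```

Relative to the mini-max rate, the Meier et al. bound carries an extra $(\log M)^{1/(1+s)}$ factor on the $n^{-1/(1+s)}$ term, while the Koltchinskii & Yuan (2010) bound matches the $n$-dependence and differs only in $\log(M)$ versus $\log(M/d)$.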
Bach, F., Lanckriet, G., & Jordan, M. (2004). Multiple kernel learning, conic duality, and the SMO algorithm. Proceedings of the 21st International Conference on Machine Learning (pp. 41–48).
Caponnetto, A., & de Vito, E. (2007). Optimal rates for regularized least-squares algorithm. Foundations of Computational Mathematics, 7, 331–368.
Kloft, M., Brefeld, U., Sonnenburg, S., Laskov, P., Müller, K.-R., & Zien, A. (2009). Efficient and accurate ℓp-norm multiple kernel learning. Advances in Neural Information Processing Systems 22 (pp. 997–1005). Cambridge, MA: MIT Press.
Koltchinskii, V., & Yuan, M. (2008). Sparse recovery in large ensembles of kernel machines. Proceedings of the Annual Conference on Learning Theory (pp. 229–238).
Koltchinskii, V., & Yuan, M. (2010). Sparsity in multiple kernel learning. The Annals of Statistics, 38, 3660–3695.
Lanckriet, G., Cristianini, N., Ghaoui, L. E., Bartlett, P., & Jordan, M. (2004). Learning the kernel matrix with semi-definite programming. Journal of Machine Learning Research, 5, 27–72.
Meier, L., van de Geer, S., & Bühlmann, P. (2009). High-dimensional additive modeling. The Annals of Statistics, 37, 3779–3821.
Mendelson, S., & Neeman, J. (2010). Regularization in kernel learning. The Annals of Statistics, 38, 526–565.
Rakotomamonjy, A., Bach, F., Canu, S., & Grandvalet, Y. (2008). SimpleMKL. Journal of Machine Learning Research, 9, 2491–2521.
Raskutti, G., Wainwright, M., & Yu, B. (2009). Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness. Advances in Neural Information Processing Systems 22 (pp. 1563–1570). Cambridge, MA: MIT Press.
Sonnenburg, S., Rätsch, G., Schäfer, C., & Schölkopf, B. (2006). Large scale multiple kernel learning. Journal of Machine Learning Research, 7, 1531–1565.
Steinwart, I., Hush, D., & Scovel, C. (2009). Optimal rates for regularized least squares regression. Proceedings of the Annual Conference on Learning Theory (pp. 79–93).
Suzuki, T., & Tomioka, R. (2009). SpicyMKL. arXiv:0909.5026.
Suzuki, T., Tomioka, R., & Sugiyama, M. (2011). Fast convergence rate of multiple kernel learning with elastic-net regularization. arXiv:1103.0431.