The document discusses multiple kernel learning (MKL) approaches. It introduces single kernel learning, describes the extension to learning over multiple reproducing kernel Hilbert spaces, and summarizes several regularization-based MKL formulations, including L1-MKL, L2-MKL, Elasticnet-MKL, and Mixed-Norm-Elasticnet-MKL, comparing their convergence rates against the mini-max rate.
Outline: Introduction · Mixed-Norm-Elasticnet-MKL · Mini-max Lp-MKL · Conclusion · References
MKL

Multiple Kernel Learning

Single Kernel Learning:

$$\hat{f} \leftarrow \min_{f \in \mathcal{H}_k}\; \frac{1}{n} \sum_{i=1}^{n} \ell(y_i, f(x_i)) + C\,\|f\|_{\mathcal{H}_k}$$
Multiple Kernel Learning (Lanckriet et al., 2004; Bach et al., 2004)

$$\hat{f} = \sum_{m=1}^{M} \hat{f}_m \leftarrow \min_{f_m \in \mathcal{H}_m}\; \frac{1}{n} \sum_{i=1}^{n} \ell\Big(y_i, \sum_{m=1}^{M} f_m(x_i)\Big) + C \sum_{m=1}^{M} \|f_m\|_{\mathcal{H}_m}$$

($\mathcal{H}_m$: the RKHS associated with kernel $k_m$)

The penalty $\sum_m \|f_m\|_{\mathcal{H}_m}$ is a Group Lasso type regularization (Sonnenburg et al., 2006; Rakotomamonjy et al., 2008; Suzuki & Tomioka, 2009).
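By the representer theorem each block can be written $f_m = \sum_i \alpha_{m,i} k_m(\cdot, x_i)$, so $\|f_m\|_{\mathcal{H}_m} = \sqrt{\alpha_m^\top K_m \alpha_m}$ and the MKL objective can be evaluated from Gram matrices. A sketch that evaluates (not solves) the objective for the squared loss; function and variable names are mine:

```python
import numpy as np

def mkl_objective(alphas, Ks, y, C=1.0):
    # (1/n) sum_i (y_i - sum_m f_m(x_i))^2 + C sum_m ||f_m||_{H_m},
    # with f_m(x_i) = (K_m @ alpha_m)[i] by the representer theorem.
    preds = sum(K @ a for K, a in zip(Ks, alphas))
    loss = np.mean((y - preds) ** 2)
    penalty = sum(np.sqrt(max(a @ K @ a, 0.0)) for K, a in zip(Ks, alphas))
    return loss + C * penalty

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 2))
y = rng.standard_normal(10)
K_lin = X @ X.T                                                  # linear kernel
K_rbf = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # Gaussian kernel
alphas = [np.zeros(10), np.zeros(10)]
val = mkl_objective(alphas, [K_lin, K_rbf], y)
# with all blocks zero, the objective reduces to the loss (1/n) sum_i y_i^2
```

The Group Lasso structure is visible in the penalty: the block norms are not squared, so minimization drives entire blocks $\alpha_m$ exactly to zero, i.e. it selects kernels.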
MKL

L1-MKL (Lanckriet et al., 2004; Bach et al., 2004)

$$\min_{f_m \in \mathcal{H}_m}\; L\Big(\sum_{m=1}^{M} f_m\Big) + C \sum_{m=1}^{M} \|f_m\|_{\mathcal{H}_m}$$

L2-MKL

$$\min_{f_m \in \mathcal{H}_m}\; L\Big(\sum_{m=1}^{M} f_m\Big) + C \sum_{m=1}^{M} \|f_m\|_{\mathcal{H}_m}^2$$

Elasticnet-MKL (Tomioka & Suzuki, 2009)

$$\min_{f_m \in \mathcal{H}_m}\; L\Big(\sum_{m=1}^{M} f_m\Big) + C_1 \sum_{m=1}^{M} \|f_m\|_{\mathcal{H}_m} + C_2 \sum_{m=1}^{M} \|f_m\|_{\mathcal{H}_m}^2$$

Mixed-Norm-Elasticnet-MKL (Meier et al., 2009)

$$\min_{f_m \in \mathcal{H}_m}\; L\Big(\sum_{m=1}^{M} f_m\Big) + C_1 \sum_{m=1}^{M} \sqrt{\|f_m\|_n^2 + C_2 \|f_m\|_{\mathcal{H}_m}^2} + C_3 \sum_{m=1}^{M} \|f_m\|_{\mathcal{H}_m}^2$$

where $\|f\|_n^2 := \frac{1}{n} \sum_{i=1}^{n} f(x_i)^2$.
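The four regularizers differ only in how they aggregate the per-block norms. A small sketch makes the comparison concrete (it assumes the per-block norms are already computed; the function names and the default constants are placeholders of mine):

```python
import numpy as np

# h[m] = ||f_m||_{H_m} (RKHS norms), fn[m] = ||f_m||_n (empirical norms)
def l1_pen(h, C=1.0):                    # L1-MKL penalty
    return C * np.sum(h)

def l2_pen(h, C=1.0):                    # L2-MKL penalty
    return C * np.sum(h ** 2)

def elasticnet_pen(h, C1=1.0, C2=1.0):   # Elasticnet-MKL penalty
    return C1 * np.sum(h) + C2 * np.sum(h ** 2)

def mixed_norm_elasticnet_pen(fn, h, C1=1.0, C2=1.0, C3=1.0):
    # Meier et al. (2009): the sqrt mixes empirical and RKHS norms per block
    return C1 * np.sum(np.sqrt(fn ** 2 + C2 * h ** 2)) + C3 * np.sum(h ** 2)

h = np.array([0.0, 1.0, 2.0])
print(l1_pen(h), l2_pen(h), elasticnet_pen(h))   # 3.0 5.0 8.0
```

The non-smooth pieces (the L1 sum and the square root) are what force entire blocks to zero at the optimum; the squared terms only shrink coefficients and add smoothness.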
Mixed-Norm-Elasticnet-MKL

Regression setting:

$$L(f) = \frac{1}{n} \sum_{i=1}^{n} (f(x_i) - y_i)^2$$

$$f^*(x) = \sum_{m=1}^{M} f_m^*(x) \quad (= E[Y \mid x])$$
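The empirical quantities in this regression setting are straightforward to compute; a minimal sketch (helper names are mine):

```python
import numpy as np

def empirical_sq_loss(f, X, y):
    # L(f) = (1/n) sum_i (f(x_i) - y_i)^2
    return np.mean((f(X) - y) ** 2)

def empirical_norm_sq(f, X):
    # ||f||_n^2 = (1/n) sum_i f(x_i)^2
    return np.mean(f(X) ** 2)

X = np.linspace(-1.0, 1.0, 5)
y = np.sin(X)               # noiseless targets from f*(x) = sin(x)
f = np.sin                  # candidate equal to the regression function
print(empirical_sq_loss(f, X, y))   # 0.0: the candidate matches the target
```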
Convergence rates of $\|\hat{f} - f^*\|_{L_2}^2$, where $d = |\{m \mid \|f_m^*\|_{\mathcal{H}_m} \neq 0\}|$:

L1-MKL (Koltchinskii & Yuan, 2008):
$$O_p\Big( d^{\frac{1-s}{1+s}}\, n^{-\frac{1}{1+s}} + \frac{d \log(M)}{n} \Big)$$

Mixed-Norm-Elasticnet-MKL (Meier et al., 2009): mini-max
$$O_p\Big( d \Big(\frac{\log(M)}{n}\Big)^{\frac{1}{1+s}} \Big)$$

Mixed-Norm-L1-MKL (Koltchinskii & Yuan, 2010), with penalty $\sum_m (C_1 \|f_m\|_n + C_2 \|f_m\|_{\mathcal{H}_m})$: mini-max
$$O_p\Big( d\, n^{-\frac{1}{1+s}} + \frac{d \log(M)}{n} \Big)$$

Mini-max rate (Raskutti et al., 2009):
$$O_p\Big( d\, n^{-\frac{1}{1+s}} + \frac{d \log(M/d)}{n} \Big)$$
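To see how the bounds compare, one can plug sample values into the rate expressions (constants dropped; the values of n, M, d, s below are arbitrary choices of mine, not from the slides):

```python
import numpy as np

def rate_meier(n, M, d, s):
    # Meier et al. (2009): d * (log(M)/n)^(1/(1+s))
    return d * (np.log(M) / n) ** (1.0 / (1.0 + s))

def rate_ky2010(n, M, d, s):
    # Koltchinskii & Yuan (2010): d * n^(-1/(1+s)) + d log(M) / n
    return d * n ** (-1.0 / (1.0 + s)) + d * np.log(M) / n

def rate_minimax(n, M, d, s):
    # Raskutti et al. (2009): d * n^(-1/(1+s)) + d log(M/d) / n
    return d * n ** (-1.0 / (1.0 + s)) + d * np.log(M / d) / n

n, M, d, s = 10_000, 1_000, 10, 0.5
for f in (rate_meier, rate_ky2010, rate_minimax):
    print(f.__name__, f(n, M, d, s))
```

Relative to the mini-max rate, the Meier et al. bound carries an extra $(\log M)^{1/(1+s)}$ factor on the $n^{-1/(1+s)}$ term, while the Koltchinskii & Yuan (2010) bound matches the $n$-dependence and differs only in $\log(M)$ versus $\log(M/d)$.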
Bach, F., Lanckriet, G., & Jordan, M. (2004). Multiple kernel learning, conic duality, and the SMO algorithm. Proceedings of the 21st International Conference on Machine Learning (pp. 41–48).
Caponnetto, A., & de Vito, E. (2007). Optimal rates for regularized least-squares algorithm. Foundations of Computational Mathematics, 7, 331–368.
Kloft, M., Brefeld, U., Sonnenburg, S., Laskov, P., Müller, K.-R., & Zien, A. (2009). Efficient and accurate ℓp-norm multiple kernel learning. Advances in Neural Information Processing Systems 22 (pp. 997–1005). Cambridge, MA: MIT Press.
Koltchinskii, V., & Yuan, M. (2008). Sparse recovery in large ensembles of kernel machines. Proceedings of the Annual Conference on Learning Theory (pp. 229–238).
Koltchinskii, V., & Yuan, M. (2010). Sparsity in multiple kernel learning. The Annals of Statistics, 38, 3660–3695.
Lanckriet, G., Cristianini, N., Ghaoui, L. E., Bartlett, P., & Jordan, M. (2004). Learning the kernel matrix with semi-definite programming. Journal of Machine Learning Research, 5, 27–72.
Meier, L., van de Geer, S., & Bühlmann, P. (2009). High-dimensional additive modeling. The Annals of Statistics, 37, 3779–3821.
Mendelson, S., & Neeman, J. (2010). Regularization in kernel learning. The Annals of Statistics, 38, 526–565.
Rakotomamonjy, A., Bach, F., Canu, S., & Grandvalet, Y. (2008). SimpleMKL. Journal of Machine Learning Research, 9, 2491–2521.
Raskutti, G., Wainwright, M., & Yu, B. (2009). Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness. Advances in Neural Information Processing Systems 22 (pp. 1563–1570). Cambridge, MA: MIT Press.
Sonnenburg, S., Rätsch, G., Schäfer, C., & Schölkopf, B. (2006). Large scale multiple kernel learning. Journal of Machine Learning Research, 7, 1531–1565.
Steinwart, I., Hush, D., & Scovel, C. (2009). Optimal rates for regularized least squares regression. Proceedings of the Annual Conference on Learning Theory (pp. 79–93).
Suzuki, T., & Tomioka, R. (2009). SpicyMKL. arXiv:0909.5026.
Suzuki, T., Tomioka, R., & Sugiyama, M. (2011). Fast convergence rate of multiple kernel learning with elastic-net regularization. arXiv:1103.0431.