
- 1. 4th International Summer School "Achievements and Applications of Contemporary Informatics, Mathematics and Physics", National University of Technology of the Ukraine, Kiev, Ukraine, August 5-16, 2009
  Classification Theory: Modelling of Kernel Machine by Infinite and Semi-Infinite Programming
  Süreyya Özöğür-Akyüz, Gerhard-Wilhelm Weber*
  Institute of Applied Mathematics, METU, Ankara, Turkey
  * Faculty of Economics, Management Science and Law, University of Siegen, Germany; Center for Research on Optimization and Control, University of Aveiro, Portugal
- 2. Motivation
  Prediction of cleavage sites.
  [Figure: protein sequence divided into a signal part and a mature part]
- 3. Logistic Regression
  $\log \frac{P(Y=1 \mid X=x_l)}{P(Y=0 \mid X=x_l)} = \beta_0 + \beta_1 x_{l1} + \beta_2 x_{l2} + \cdots + \beta_p x_{lp} \quad (l = 1, 2, \ldots, N)$
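As a concrete illustration of the model above, here is a minimal sketch in Python; the data are hypothetical stand-ins, and scikit-learn is assumed only as one convenient way to fit the coefficients $\beta$ (the slides do not prescribe a tool).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: N = 100 samples x_l with p = 4 features, labels in {0, 1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.3 > 0).astype(int)

model = LogisticRegression().fit(X, y)

# The fitted model realizes the slide's equation:
# log P(Y=1|x_l) / P(Y=0|x_l) = beta_0 + beta_1 x_l1 + ... + beta_p x_lp
beta_0, beta = model.intercept_[0], model.coef_[0]
log_odds = beta_0 + X @ beta
```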
- 4. Linear Classifiers
  Maximum margin classifier: $\gamma_i := y_i (\langle w, x_i \rangle + b)$.
  Note: $\gamma_i > 0$ implies correct classification.
  [Figure: margin $\gamma$ between the hyperplanes $y_k (\langle w, x_k \rangle + b) = 1$ and $y_j (\langle w, x_j \rangle + b) = 1$]
- 5. Linear Classifiers
  The geometric margin: $\gamma = \frac{2}{\|w\|_2}$.
  Maximizing $\frac{2}{\|w\|_2}$ is equivalent to the convex problem
  $\min_{w, b} \frac{\|w\|_2^2}{2}$ subject to $y_i (\langle w, x_i \rangle + b) \ge 1 \quad (i = 1, 2, \ldots, l)$.
- 6. Linear Classifiers
  Dual problem:
  $\max_\alpha \sum_{i=1}^l \alpha_i - \frac{1}{2} \sum_{i,j=1}^l y_i y_j \alpha_i \alpha_j \langle x_i, x_j \rangle$
  subject to $\sum_{i=1}^l y_i \alpha_i = 0, \quad \alpha_i \ge 0 \quad (i = 1, 2, \ldots, l)$.
- 7. Linear Classifiers
  Dual problem with a kernel function $\kappa$:
  $\max_\alpha \sum_{i=1}^l \alpha_i - \frac{1}{2} \sum_{i,j=1}^l y_i y_j \alpha_i \alpha_j \kappa(x_i, x_j)$
  subject to $\sum_{i=1}^l y_i \alpha_i = 0, \quad \alpha_i \ge 0 \quad (i = 1, 2, \ldots, l)$.
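A minimal sketch of solving this kernelized dual numerically; cvxpy, the Gaussian Gram matrix, and the toy data are my assumptions for illustration, not part of the slides.

```python
import numpy as np
import cvxpy as cp

# Hypothetical separable toy data with labels y_i in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)

# Precomputed Gaussian Gram matrix kappa(x_i, x_j).
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dists)

# Factor Q = (y y^T) * K as L L^T so the objective is concave in DCP form:
# sum(alpha) - 0.5 * alpha^T Q alpha = sum(alpha) - 0.5 * ||L^T alpha||^2.
Q = np.outer(y, y) * K
w_eig, V = np.linalg.eigh(Q)
L = V @ np.diag(np.sqrt(np.clip(w_eig, 0.0, None)))

alpha = cp.Variable(len(y))
problem = cp.Problem(
    cp.Maximize(cp.sum(alpha) - 0.5 * cp.sum_squares(L.T @ alpha)),
    [alpha >= 0, y @ alpha == 0],  # hard-margin dual constraints from the slide
)
problem.solve()
```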
- 8. Linear Classifiers
  Soft margin classifier: introduce slack variables $\xi_i$ to allow the margin constraints to be violated:
  $\min_{\xi, w, b} \frac{\|w\|_2^2}{2} + C \sum_{i=1}^l \xi_i^2$
  subject to $y_i (\langle w, x_i \rangle + b) \ge 1 - \xi_i, \quad \xi_i \ge 0 \quad (i = 1, 2, \ldots, l)$.
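The soft-margin primal with squared slacks translates almost line by line into a convex program; a self-contained sketch, again assuming cvxpy and hypothetical toy data.

```python
import numpy as np
import cvxpy as cp

# Hypothetical toy data with labels in {-1, +1}.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] - X[:, 1] > 0, 1.0, -1.0)

C = 1.0
w, b = cp.Variable(2), cp.Variable()
xi = cp.Variable(40)

# min ||w||^2 / 2 + C * sum_i xi_i^2
objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum_squares(xi))
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi,  # y_i(<w, x_i> + b) >= 1 - xi_i
               xi >= 0]
cp.Problem(objective, constraints).solve()
```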
- 9. Linear Classifiers
  Projection of the data into a higher-dimensional feature space: map the input space $X$ into a new space $F$:
  $x = (x_1, \ldots, x_n) \mapsto \phi(x) = (\phi_1(x), \ldots, \phi_N(x))$.
  [Figure: data points of the two classes before and after the mapping $\phi$]
- 10. Nonlinear Classifiers
  Set of hypotheses: $f(x) = \sum_{i=1}^N w_i \phi_i(x) + b$.
  Dual representation: $f(x) = \sum_{i=1}^l \alpha_i y_i \langle \phi(x_i), \phi(x) \rangle + b$, where $\langle \phi(x_i), \phi(x) \rangle$ is the kernel function.
  Examples:
  polynomial kernel $\kappa(x, z) = (1 + x^T z)^k$,
  sigmoid kernel $\kappa(x, z) = \tanh(a x^T z + b)$,
  Gaussian (RBF) kernel $\kappa(x, z) = \exp(-\|x - z\|_2^2 / \sigma^2)$.
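The three example kernels can be written down directly; a small sketch with default parameter values chosen arbitrarily for illustration.

```python
import numpy as np

def polynomial_kernel(x, z, k=3):
    """kappa(x, z) = (1 + x^T z)^k"""
    return (1.0 + x @ z) ** k

def sigmoid_kernel(x, z, a=1.0, b=0.0):
    """kappa(x, z) = tanh(a x^T z + b); not positive semi-definite for all (a, b)."""
    return np.tanh(a * (x @ z) + b)

def gaussian_kernel(x, z, sigma=1.0):
    """kappa(x, z) = exp(-||x - z||^2 / sigma^2)"""
    return np.exp(-np.sum((x - z) ** 2) / sigma ** 2)
```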
- 11. (In-)Finite Kernel Learning
  Based on the motivation of multiple kernel learning (MKL): combine kernel functions $\kappa_k(\cdot, \cdot)$ by
  $\kappa(x_i, x_j) = \sum_{k=1}^K \beta_k \kappa_k(x_i, x_j)$ with $\beta_k \ge 0 \; (k = 1, \ldots, K)$ and $\sum_{k=1}^K \beta_k = 1$.
  Semi-infinite LP formulation:
  (SILP-MKL) $\max_{\theta, \beta} \theta \quad (\theta \in \mathbb{R}, \; \beta \in \mathbb{R}^K)$
  such that $0 \le \beta$, $\sum_{k=1}^K \beta_k = 1$, and $\sum_{k=1}^K \beta_k S_k(\alpha) \ge \theta$ for all $\alpha \in \mathbb{R}^l$ with $0 \le \alpha \le C \mathbf{1}$ and $\sum_{i=1}^l \alpha_i y_i = 0$, where
  $S_k(\alpha) := \frac{1}{2} \sum_{i,j=1}^l \alpha_i \alpha_j y_i y_j \kappa_k(x_i, x_j) - \sum_{i=1}^l \alpha_i$.
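In matrix form, the MKL combination and the functions $S_k$ look as follows; a sketch assuming precomputed Gram matrices, with function names of my own choosing.

```python
import numpy as np

def combined_gram(grams, beta):
    """Convex combination sum_k beta_k K_k of precomputed Gram matrices K_k;
    beta must satisfy beta_k >= 0 and sum_k beta_k = 1."""
    beta = np.asarray(beta, dtype=float)
    assert np.all(beta >= 0) and np.isclose(beta.sum(), 1.0)
    return sum(b * K for b, K in zip(beta, grams))

def S_k(alpha, y, K):
    """S_k(alpha) = 0.5 * sum_ij alpha_i alpha_j y_i y_j kappa_k(x_i, x_j) - sum_i alpha_i,
    using alpha^T (yy^T * K) alpha = (alpha*y)^T K (alpha*y)."""
    v = alpha * y
    return 0.5 * v @ K @ v - alpha.sum()
```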
- 12. Infinite Kernel Learning: Infinite Programming
  Example of a kernel family parametrized over $\omega$:
  $\kappa(x_i, x_j, \omega) := \omega \exp\!\left(\frac{-\omega^* \|x_i - x_j\|^2}{2}\right) + (1 - \omega)(1 + x_i^T x_j)^d$.
  This defines a homotopy $H(\omega) := \kappa(x_i, x_j, \omega)$ between
  $H(0) = (1 + x_i^T x_j)^d$ and $H(1) = \exp\!\left(\frac{-\omega^* \|x_i - x_j\|^2}{2}\right)$.
  $\kappa_\beta(x_i, x_j) := \int_\Omega \kappa(x_i, x_j, \omega) \, d\beta(\omega)$ — infinite programming.
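A sketch of this kernel family and of $\kappa_\beta$; the midpoint quadrature rule, the assumption that $\beta$ has a density, and the parameter values are mine for illustration.

```python
import numpy as np

def homotopy_kernel(xi, xj, omega, omega_star=1.0, d=2):
    """kappa(x_i, x_j, omega): interpolates between the polynomial kernel at
    omega = 0 and the Gaussian kernel (fixed width omega*) at omega = 1."""
    rbf = np.exp(-omega_star * np.sum((xi - xj) ** 2) / 2.0)
    poly = (1.0 + xi @ xj) ** d
    return omega * rbf + (1.0 - omega) * poly

def kernel_beta(xi, xj, density, n=200):
    """kappa_beta(x_i, x_j) = int_[0,1] kappa(x_i, x_j, omega) d beta(omega),
    approximated by the midpoint rule, assuming d beta(omega) = density(omega) d omega."""
    omegas = (np.arange(n) + 0.5) / n
    vals = np.array([homotopy_kernel(xi, xj, w) for w in omegas])
    return float(np.mean(vals * density(omegas)))
```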
- 13. Infinite Kernel Learning: Infinite Programming
  Introducing Riemann-Stieltjes integrals into the problem (SILP-MKL), we get the following general problem formulation:
  $\kappa_\beta(x_i, x_j) = \int_\Omega \kappa(x_i, x_j, \omega) \, d\beta(\omega), \quad \Omega = [0, 1]$.
- 14. Infinite Kernel Learning: Infinite Programming
  (IP) $\max_{\theta, \beta} \theta \quad (\theta \in \mathbb{R}, \; \beta : [0,1] \to \mathbb{R}$ monotonically increasing$)$
  subject to $\int_0^1 d\beta(\omega) = 1$ and
  $\int_\Omega \left( S(\omega, \alpha) - \sum_{i=1}^l \alpha_i \right) d\beta(\omega) \ge \theta$ for all $\alpha \in \mathbb{R}^l$ with $0 \le \alpha \le C \mathbf{1}$, $\sum_{i=1}^l \alpha_i y_i = 0$,
  where $S(\omega, \alpha) := \frac{1}{2} \sum_{i,j=1}^l \alpha_i \alpha_j y_i y_j \kappa(x_i, x_j, \omega)$,
  $T(\omega, \alpha) := S(\omega, \alpha) - \sum_{i=1}^l \alpha_i$, and
  $A := \left\{ \alpha \in \mathbb{R}^l \mid 0 \le \alpha \le C \mathbf{1} \text{ and } \sum_{i=1}^l \alpha_i y_i = 0 \right\}$.
- 15. Infinite Kernel Learning: Infinite Programming
  (IP) $\max_{\theta, \beta} \theta \quad (\theta \in \mathbb{R}, \; \beta$ a positive measure on $\Omega)$
  such that $\theta - \int_\Omega T(\omega, \alpha) \, d\beta(\omega) \le 0$ for all $\alpha \in A$, and $\int_\Omega d\beta(\omega) = 1$.
  Infinite programming dual of (IP):
  (DIP) $\min_{\sigma, \rho} \sigma \quad (\sigma \in \mathbb{R}, \; \rho$ a positive measure on $A)$
  such that $\sigma - \int_A T(\omega, \alpha) \, d\rho(\alpha) \ge 0$ for all $\omega \in \Omega$, and $\int_A d\rho(\alpha) = 1$.
  Duality conditions: Let $(\theta, \beta)$ and $(\sigma, \rho)$ be feasible for their respective problems and complementary slack, so that $\beta$ has measure only where $\sigma = \int_A T(\omega, \alpha) \, d\rho$ and $\rho$ has measure only where $\theta = \int_\Omega T(\omega, \alpha) \, d\beta$. Then both solutions are optimal for their respective problems.
- 16. Infinite Kernel Learning: Infinite Programming
  The interesting theoretical problem here is to find conditions which ensure that solutions are point masses (i.e., the original monotonic $\beta$ is a step function).
  Because of this, and in view of the compactness of the feasible (index) sets $A$ and $\Omega$ at the lower levels, we are interested in the nondegeneracy of the local minima of the lower level problem, to get finitely many local minimizers of
  $g((\sigma, \rho), \omega) := \sigma - \int_A T(\omega, \alpha) \, d\rho(\alpha)$.
  Lower level problem: for a given parameter $(\sigma, \rho)$, we consider
  (LLP) $\min_\omega g((\sigma, \rho), \omega)$ subject to $\omega \in \Omega$.
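To see what the lower level problem looks like computationally, here is a discretized sketch: $\rho$ is assumed to be a discrete measure with point masses on finitely many $\alpha$'s, and $\Omega = [0,1]$ is replaced by a grid. Everything here is an illustrative assumption, not the authors' algorithm.

```python
import numpy as np

def lower_level_minimizers(T_vals, rho_weights, sigma, omegas):
    """Grid search for local minimizers of
    g((sigma, rho), omega) = sigma - int_A T(omega, alpha) d rho(alpha),
    where rho is discrete: T_vals[i, j] = T(omegas[i], alpha_j) is precomputed
    and rho_weights[j] = rho({alpha_j})."""
    g = sigma - T_vals @ rho_weights  # g evaluated on the omega-grid
    # keep interior grid points that are local minima of g
    return [(omegas[i], g[i]) for i in range(1, len(omegas) - 1)
            if g[i] <= g[i - 1] and g[i] <= g[i + 1]]
```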
- 17. Infinite Kernel Learning: Infinite Programming
  • "reduction ansatz" and Implicit Function Theorem
  • parametrical measures
  • "finite optimization"
- 18. Infinite Kernel Learning: Infinite Programming
  • "reduction ansatz" and Implicit Function Theorem
  • parametrical measures, e.g., with densities $f$ on $\Omega$ (see the sketch after this list):
    Gaussian: $f(\omega; (\mu, \sigma)) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left(\frac{-(\omega - \mu)^2}{2\sigma^2}\right)$
    exponential: $f(\omega; \lambda) = \lambda \exp(-\lambda \omega)$ for $\omega \ge 0$, and $0$ for $\omega < 0$
    uniform: $f(\omega; (a, b)) = \frac{H(\omega - a) - H(\omega - b)}{b - a}$ ($H$: Heaviside step function)
    Beta: $f(\omega; (\alpha, \beta)) = \frac{\omega^{\alpha-1} (1 - \omega)^{\beta-1}}{\int_0^1 u^{\alpha-1} (1 - u)^{\beta-1} \, du}$
  • "finite optimization"
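The four candidate densities above, written out in Python; SciPy's Beta-function normalizer replaces the integral in the last formula (an equivalent closed form).

```python
import numpy as np
from scipy.special import beta as beta_fn  # B(a, b) = int_0^1 u^(a-1) (1-u)^(b-1) du

def gaussian_density(w, mu, sigma):
    return np.exp(-(w - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def exponential_density(w, lam):
    return np.where(w >= 0, lam * np.exp(-lam * w), 0.0)

def uniform_density(w, a, b):
    # (H(w - a) - H(w - b)) / (b - a), with H the Heaviside step function
    return np.where((w >= a) & (w < b), 1.0 / (b - a), 0.0)

def beta_density(w, alpha, beta):
    return w ** (alpha - 1) * (1 - w) ** (beta - 1) / beta_fn(alpha, beta)
```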
- 19. Infinite Kernel Learning: Reduction Ansatz
  The semi-infinite constraint $g(x, y) \ge 0 \; \forall y \in I$ is equivalent to $\min_{y \in I} g(x, y) \ge 0$.
  Under the reduction ansatz, the Implicit Function Theorem yields locally defined implicit functions $x \mapsto y_j(x)$ for the minimizers $y_j$ of $g(x, \cdot)$.
  [Figure: graph of $g(x, \cdot)$ over $\Omega$ with local minimizers $y_j, \tilde{y}_j, \ldots, \tilde{y}_p$]
- 20. Infinite Kernel Learning: Reduction Ansatz
  Based on the reduction ansatz, the semi-infinite problem locally becomes a finite one:
  $\min f(x)$ subject to $g_j(x) := g(x, y_j(x)) \ge 0 \quad (j \in J := \{1, 2, \ldots, p\})$.
  [Figure: graphs of $g((\sigma, \rho), \cdot)$ for nearby parameters $(\sigma, \rho)$; the minimizer varies as $\omega = \tilde{\omega}(\sigma, \rho)$ in the topology on the parameter space]
- 21. Infinite Kernel Learning: Regularization
  Regularized problem:
  $\min_{\theta, \beta} \; -\theta + \sup_{t \in [0,1]} \mu \left( \left| \frac{d}{dt} \int_0^t d\beta(\omega) \right| + \left| \frac{d^2}{dt^2} \int_0^t d\beta(\omega) \right| \right)$ subject to the constraints.
  Discretization on a grid $0 = t_0 < t_1 < \cdots < t_\iota = 1$:
  $\frac{d}{dt} \int_0^{t_{\nu+1}} d\beta(\omega) \approx \frac{\int_0^{t_{\nu+1}} d\beta(\omega) - \int_0^{t_\nu} d\beta(\omega)}{t_{\nu+1} - t_\nu} = \frac{1}{t_{\nu+1} - t_\nu} \int_{t_\nu}^{t_{\nu+1}} d\beta(\omega)$,
  $\frac{d^2}{dt^2} \int_0^{t_{\nu+1}} d\beta(\omega) \approx \frac{\frac{1}{t_{\nu+2} - t_{\nu+1}} \int_{t_{\nu+1}}^{t_{\nu+2}} d\beta(\omega) - \frac{1}{t_{\nu+1} - t_\nu} \int_{t_\nu}^{t_{\nu+1}} d\beta(\omega)}{t_{\nu+1} - t_\nu}$.
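A sketch of the two difference quotients on such a grid; the grid `t` and the per-interval increments of $\beta$ are hypothetical inputs.

```python
import numpy as np

def difference_quotients(t, increments):
    """First- and second-order difference quotients of t -> int_0^t d beta(omega)
    on a grid 0 = t_0 < ... < t_iota = 1, as in the discretization above.

    increments[nu] = int_{t_nu}^{t_{nu+1}} d beta(omega)."""
    d1 = increments / (t[1:] - t[:-1])            # one first-order quotient per interval
    d2 = (d1[1:] - d1[:-1]) / (t[1:-1] - t[:-2])  # second-order quotients
    return d1, d2
```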
- 22. Infinite Kernel Learning: Topology
  $(E, d)$: metric space. Radon measure: a measure on the $\sigma$-algebra of Borel sets of $E$ that is locally finite and inner regular (inner regularity: approximation by $\mu(K_\nu)$ on compact sets $K_\nu \subset E$).
  $\mathcal{H}(E)$: set of Radon measures on $E$.
  Neighbourhood of a measure $\rho$, for $f$ in the dual space $(\mathcal{H}(E))'$ of continuous bounded functions:
  $B_\rho(\varepsilon) := \left\{ \mu \in \mathcal{H}(E) \,\middle|\, \left| \int f \, d\mu - \int f \, d\rho \right| < \varepsilon \right\}$.
- 23. Infinite Kernel Learning: Topology
  Def.: Basis of neighbourhoods of a measure $\rho$ $(f_1, \ldots, f_n \in (\mathcal{H}(E))'; \; \varepsilon > 0)$:
  $\left\{ \mu \in \mathcal{H}(E) \,\middle|\, \left| \int_E f_i \, d\rho - \int_E f_i \, d\mu \right| < \varepsilon \; (i = 1, 2, \ldots, n) \right\}$.
  Def.: Prokhorov metric:
  $d_0(\mu, \rho) := \inf \left\{ \varepsilon \ge 0 \mid \mu(A) \le \rho(A^\varepsilon) + \varepsilon \text{ and } \rho(A) \le \mu(A^\varepsilon) + \varepsilon \; (A \text{ closed}) \right\}$,
  where $A^\varepsilon := \{ x \in E \mid d(x, A) < \varepsilon \}$.
  Open $\delta$-neighbourhood of a measure $\rho$: $B_\delta(\rho) := \{ \mu \in \mathcal{H}(E) \mid d_0(\rho, \mu) < \delta \}$.
- 24. Infinite Kernel Learning: Numerical Results
  [Results figures not reproduced in this export]
- 25. References
  Özöğür, S., Shawe-Taylor, J., Weber, G.-W., and Ögel, Z.B., Pattern analysis for the prediction of eukaryotic pro-peptide cleavage sites, in the special issue "Networks in Computational Biology" of Discrete Applied Mathematics 157, 10 (May 2009) 2388-2394.
  Özöğür-Akyüz, S., and Weber, G.-W., Infinite kernel learning by infinite and semi-infinite programming, Proceedings of the Second Global Conference on Power Control and Optimization, AIP Conference Proceedings 1159, Bali, Indonesia, June 1-3, 2009; subseries Mathematical and Statistical Physics; ISBN 978-0-7354-0696-4 (August 2009) 306-313; Hakim, A.H., Vasant, P., and Barsoum, N., guest eds.
  Özöğür-Akyüz, S., and Weber, G.-W., Infinite kernel learning via infinite and semi-infinite programming, to appear in the special issue of Optimization Methods and Software (OMS) on the occasion of the International Conference on Engineering Optimization (EngOpt 2008; Rio de Janeiro, Brazil, June 1-5, 2008), Schittkowski, K., guest ed.
  Özöğür-Akyüz, S., and Weber, G.-W., On numerical optimization theory of infinite kernel learning, preprint at IAM, METU; submitted to the Journal of Global Optimization (JOGO).
