Classification Theory

AACIMP 2009 Summer School lecture by Gerhard Wilhelm Weber, from the course "Modern Operational Research and Its Mathematical Methods".
  1. Classification Theory: Modelling of Kernel Machine by Infinite and Semi-Infinite Programming

     4th International Summer School "Achievements and Applications of Contemporary Informatics, Mathematics and Physics",
     National University of Technology of the Ukraine, Kiev, Ukraine, August 5-16, 2009.

     Süreyya Özöğür-Akyüz, Gerhard-Wilhelm Weber
     * Institute of Applied Mathematics, METU, Ankara, Turkey
     * Faculty of Economics, Management Science and Law, University of Siegen, Germany
     * Center for Research on Optimization and Control, University of Aveiro, Portugal

     August 7, 2009
  2. Motivation: Prediction of Cleavage Sites

     [Figure: a protein sequence split at the cleavage site into the signal part and the mature part; the task is to predict where the cut occurs.]
  3. Logistic Regression

        \log \frac{P(Y=1 \mid X=x_l)}{P(Y=0 \mid X=x_l)} = \beta_0 + \beta_1 x_{l1} + \beta_2 x_{l2} + \cdots + \beta_p x_{lp} \quad (l = 1, 2, \ldots, N)
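     A minimal sketch of fitting this log-odds model; scikit-learn and the synthetic features are illustrative stand-ins for the cleavage-site data of the lecture:

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 5))                     # N = 200 samples, p = 5 features
        true_beta = np.array([1.0, -2.0, 0.5, 0.0, 1.5])
        y = (X @ true_beta > 0).astype(int)               # binary labels Y in {0, 1}

        clf = LogisticRegression().fit(X, y)
        print("beta_0      :", clf.intercept_)
        print("beta_1..p   :", clf.coef_)
        print("P(Y=1 | x_1):", clf.predict_proba(X[:1])[0, 1])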
  4. Linear Classifiers

     Maximum margin classifier: the (functional) margin of example i is

        \gamma_i := y_i (\langle w, x_i \rangle + b).

     Note: \gamma_i > 0 implies correct classification.

     [Figure: a separating hyperplane with margin \gamma; the closest points on either side satisfy y_k (\langle w, x_k \rangle + b) = 1 and y_j (\langle w, x_j \rangle + b) = 1.]
  5. Linear Classifiers

     The geometric margin:

        \gamma = \frac{2}{\|w\|_2}.

     Maximizing the margin is therefore the convex problem

        \min_{w,b} \ \frac{\|w\|_2^2}{2} \quad \text{subject to} \quad y_i (\langle w, x_i \rangle + b) \ge 1 \quad (i = 1, 2, \ldots, l).
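     A direct transcription of this primal in Python; cvxpy, its default solver, and the four toy points are illustrative assumptions, not part of the lecture:

        # Hard-margin SVM primal: min ||w||^2/2  s.t.  y_i(<w, x_i> + b) >= 1.
        import cvxpy as cp
        import numpy as np

        X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
        y = np.array([1.0, 1.0, -1.0, -1.0])      # linearly separable toy labels

        w = cp.Variable(2)
        b = cp.Variable()
        constraints = [cp.multiply(y, X @ w + b) >= 1]
        cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints).solve()

        print("w =", w.value, " b =", b.value)
        print("geometric margin 2/||w|| =", 2 / np.linalg.norm(w.value))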
  6. Linear Classifiers

     Dual problem:

        \max_\alpha \ \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j \langle x_i, x_j \rangle

        \text{subject to} \ \sum_{i=1}^{l} y_i \alpha_i = 0, \quad \alpha_i \ge 0 \quad (i = 1, 2, \ldots, l).
  7. Linear Classifiers

     Dual problem with a kernel function \kappa:

        \max_\alpha \ \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j \kappa(x_i, x_j)

        \text{subject to} \ \sum_{i=1}^{l} y_i \alpha_i = 0, \quad \alpha_i \ge 0 \quad (i = 1, 2, \ldots, l).
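     Because the dual touches the data only through \kappa(x_i, x_j), any dual solver can be fed a precomputed Gram matrix. A sketch with scikit-learn's SVC; the library, the polynomial kernel, and the synthetic labels are illustrative assumptions:

        import numpy as np
        from sklearn.svm import SVC

        rng = np.random.default_rng(1)
        X = rng.normal(size=(60, 2))
        y = np.sign(X[:, 0] * X[:, 1])        # XOR-like, not linearly separable

        K = (1.0 + X @ X.T) ** 2              # Gram matrix of a polynomial kernel
        svm = SVC(kernel="precomputed", C=1.0).fit(K, y)
        print("number of support vectors (alpha_i > 0):", svm.support_.size)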
  8. Linear Classifiers

     Soft margin classifier: introduce slack variables \xi_i to allow the margin constraints to be violated:

        \min_{\xi, w, b} \ \frac{\|w\|_2^2}{2} + C \sum_{i=1}^{l} \xi_i^2

        \text{subject to} \ y_i (\langle w, x_i \rangle + b) \ge 1 - \xi_i, \quad \xi_i \ge 0 \quad (i = 1, 2, \ldots, l).
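     A sketch of the role of C. Note that scikit-learn's SVC penalizes the linear slack term C \sum \xi_i rather than the squared one on the slide; the data and the grid of C values are made up for the demo:

        import numpy as np
        from sklearn.svm import SVC

        rng = np.random.default_rng(2)
        X = rng.normal(size=(100, 2))
        noise = 0.3 * rng.normal(size=100)
        y = np.where(X[:, 0] + X[:, 1] + noise > 0, 1, -1)   # noisy labels

        for C in (0.01, 1.0, 100.0):
            svm = SVC(kernel="linear", C=C).fit(X, y)
            margin = 2 / np.linalg.norm(svm.coef_)
            print(f"C={C:<6}  margin={margin:.3f}  support vectors={svm.support_.size}")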
  9. Linear Classifiers

     Projection of the data into a higher dimensional feature space: map the input space X into a new space F,

        x = (x_1, \ldots, x_n) \mapsto \phi(x) = (\phi_1(x), \ldots, \phi_N(x)).

     [Figure: the mapped points \phi(x), \phi(0) of the two classes become linearly separable in F.]
 10. Nonlinear Classifiers

     Set of hypotheses:      f(x) = \sum_{i=1}^{N} w_i \phi_i(x) + b.

     Dual representation:    f(x) = \sum_{i=1}^{l} \alpha_i y_i \langle \phi(x_i), \phi(x) \rangle + b,

     where the inner product is evaluated by a kernel function. Examples:

        polynomial kernel:       \kappa(x, z) = (1 + x^T z)^k
        sigmoid kernel:          \kappa(x, z) = \tanh(a x^T z + b)
        Gaussian (RBF) kernel:   \kappa(x, z) = \exp(-\|x - z\|_2^2 / \sigma^2)
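     The three example kernels, written out directly; the parameter values are arbitrary demo choices:

        import numpy as np

        def polynomial_kernel(x, z, k=3):
            return (1.0 + x @ z) ** k                           # (1 + x^T z)^k

        def sigmoid_kernel(x, z, a=0.5, b=-1.0):
            return np.tanh(a * (x @ z) + b)                     # tanh(a x^T z + b)

        def gaussian_kernel(x, z, sigma=1.0):
            return np.exp(-np.sum((x - z) ** 2) / sigma ** 2)   # exp(-||x-z||^2 / sigma^2)

        x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
        for kern in (polynomial_kernel, sigmoid_kernel, gaussian_kernel):
            print(kern.__name__, "=", kern(x, z))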
 11. (In-) Finite Kernel Learning

     Based on the motivation of multiple kernel learning (MKL):

        \kappa(x_i, x_j) = \sum_{k=1}^{K} \beta_k \kappa_k(x_i, x_j),

     with kernel functions \kappa_k(\cdot, \cdot) and weights \beta_k \ge 0 (k = 1, \ldots, K), \sum_{k=1}^{K} \beta_k = 1.

     Semi-infinite LP formulation:

        (SILP-MKL)  \max_{\theta, \beta} \theta \quad (\theta \in \mathbb{R},\ \beta \in \mathbb{R}^K)

        such that 0 \le \beta, \ \sum_{k=1}^{K} \beta_k = 1, \ \sum_{k=1}^{K} \beta_k S_k(\alpha) \ge \theta \ \forall \alpha \in \mathbb{R}^l with 0 \le \alpha \le C\mathbf{1} and \sum_{i=1}^{l} \alpha_i y_i = 0, where

        S_k(\alpha) := \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j \kappa_k(x_i, x_j) - \sum_{i=1}^{l} \alpha_i.
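     A sketch of the MKL kernel as a convex combination of base Gram matrices. In MKL proper the weights \beta_k are optimized (e.g. by the SILP above); here they are simply given, to show the construction:

        import numpy as np

        rng = np.random.default_rng(3)
        X = rng.normal(size=(30, 4))

        sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        kernels = [np.exp(-sq_dists),                # Gaussian
                   X @ X.T,                          # linear
                   (1.0 + X @ X.T) ** 2]             # polynomial
        beta = np.array([0.5, 0.2, 0.3])             # beta_k >= 0, sum(beta) = 1

        K_mkl = sum(b * K for b, K in zip(beta, kernels))
        print(K_mkl.shape, "symmetric:", np.allclose(K_mkl, K_mkl.T))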
 12. Infinite Kernel Learning: Infinite Programming

     Example (a homotopy between a polynomial and a Gaussian kernel):

        H(\omega) := \kappa(x_i, x_j, \omega) := \omega \exp(-\omega^* \|x_i - x_j\|_2^2) + (1 - \omega)(1 + x_i^T x_j)^d,

     so that H(0) = (1 + x_i^T x_j)^d and H(1) = \exp(-\omega^* \|x_i - x_j\|_2^2).

     Integrating over the kernel parameter,

        \kappa_\beta(x_i, x_j) := \int_\Omega \kappa(x_i, x_j, \omega) \, d\beta(\omega),

     leads to infinite programming.
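     A sketch of the homotopy kernel and of \kappa_\beta; the width \omega^*, the degree d, and the choice of \beta as the Lebesgue measure on [0,1] are assumptions for the demo:

        import numpy as np
        from scipy.integrate import quad

        def homotopy_kernel(xi, xj, omega, omega_star=1.0, d=2):
            gauss = np.exp(-omega_star * np.sum((xi - xj) ** 2))
            poly = (1.0 + xi @ xj) ** d
            return omega * gauss + (1.0 - omega) * poly   # H(0) = poly, H(1) = gauss

        def kappa_beta(xi, xj):
            # with d(beta) = d(omega), the Stieltjes integral is a plain integral
            value, _ = quad(lambda w: homotopy_kernel(xi, xj, w), 0.0, 1.0)
            return value

        xi, xj = np.array([1.0, 0.0]), np.array([0.0, 1.0])
        print("kappa_beta(xi, xj) =", kappa_beta(xi, xj))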
 13. Infinite Kernel Learning: Infinite Programming

     Introducing Riemann-Stieltjes integrals into the problem (SILP-MKL), we get the following general problem formulation:

        \kappa_\beta(x_i, x_j) = \int_\Omega \kappa(x_i, x_j, \omega) \, d\beta(\omega), \quad \Omega = [0, 1].
 14. Infinite Kernel Learning: Infinite Programming

        (IP)  \max_{\theta, \beta} \theta \quad (\theta \in \mathbb{R},\ \beta : [0, 1] \to \mathbb{R} \text{ monotonically increasing})

        subject to \int_0^1 d\beta(\omega) = 1, \quad \int_\Omega T(\omega, \alpha) \, d\beta(\omega) \ge \theta \ \forall \alpha \in A,

     where

        S(\omega, \alpha) := \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j \kappa(x_i, x_j, \omega),
        T(\omega, \alpha) := S(\omega, \alpha) - \sum_{i=1}^{l} \alpha_i,
        A := \Big\{ \alpha \in \mathbb{R}^l \ \Big|\ 0 \le \alpha \le C\mathbf{1} \text{ and } \sum_{i=1}^{l} \alpha_i y_i = 0 \Big\}.
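     A sketch of the lower-level function T(\omega, \alpha) evaluated on a grid over \Omega = [0, 1]; discretizing \Omega in this way turns (IP) back into a finite SILP. The data, \alpha, and the grid size are made up:

        import numpy as np

        def homotopy_gram(X, omega, omega_star=1.0, d=2):
            sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
            return omega * np.exp(-omega_star * sq) + (1 - omega) * (1 + X @ X.T) ** d

        def T(omega, alpha, X, y):
            K = homotopy_gram(X, omega)
            S = 0.5 * (alpha * y) @ K @ (alpha * y)   # 1/2 sum_ij a_i a_j y_i y_j k_ij
            return S - alpha.sum()

        rng = np.random.default_rng(4)
        X = rng.normal(size=(20, 3))
        y = np.sign(rng.normal(size=20))
        alpha = rng.uniform(0.0, 1.0, size=20)        # some alpha with 0 <= alpha <= C1

        for omega in np.linspace(0.0, 1.0, 5):        # grid on Omega
            print(f"omega={omega:.2f}  T={T(omega, alpha, X, y):.3f}")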
 15. Infinite Kernel Learning: Infinite Programming

        (IP)   \max_{\theta, \beta} \theta \quad (\theta \in \mathbb{R},\ \beta \text{ a positive measure on } \Omega)
               such that \theta - \int_\Omega T(\omega, \alpha) \, d\beta(\omega) \le 0 \ \forall \alpha \in A, \quad \int_\Omega d\beta(\omega) = 1.

     Infinite programming dual of (IP):

        (DIP)  \min_{\sigma, \rho} \sigma \quad (\sigma \in \mathbb{R},\ \rho \text{ a positive measure on } A)
               such that \sigma - \int_A T(\omega, \alpha) \, d\rho(\alpha) \ge 0 \ \forall \omega \in \Omega, \quad \int_A d\rho(\alpha) = 1.

     Duality conditions: Let (\theta, \beta) and (\sigma, \rho) be feasible for their respective problems and complementary slack, so that \beta has measure only where \sigma = \int_A T(\omega, \alpha) \, d\rho(\alpha), and \rho has measure only where \theta = \int_\Omega T(\omega, \alpha) \, d\beta(\omega). Then both solutions are optimal for their respective problems.
 16. Infinite Kernel Learning: Infinite Programming

     The interesting theoretical problem here is to find conditions which ensure that solutions are point masses (i.e., the original monotonic \beta is a step function).

     Because of this, and in view of the compactness of the feasible (index) sets A and \Omega at the lower levels, we are interested in the nondegeneracy of the local minima of the lower level problem, to get finitely many local minimizers of

        g((\sigma, \rho), \omega) := \sigma - \int_A T(\omega, \alpha) \, d\rho(\alpha).

     Lower level problem: for a given parameter (\sigma, \rho), we consider

        (LLP)  \min_\omega \ g((\sigma, \rho), \omega) \quad \text{subject to} \quad \omega \in \Omega.
 17. Infinite Kernel Learning: Infinite Programming

     • "reduction ansatz" and
     • Implicit Function Theorem
     • parametrical measures
     • "finite optimization"
 18. Infinite Kernel Learning: Infinite Programming

     • "reduction ansatz" and
     • Implicit Function Theorem
     • parametrical measures, e.g. (evaluated in the sketch below):

          Gaussian:     f(\omega; (\mu, \sigma)) = \frac{1}{\sigma\sqrt{2\pi}} \exp\Big(\frac{-(\omega - \mu)^2}{2\sigma^2}\Big)
          exponential:  f(\omega; \lambda) = \lambda \exp(-\lambda\omega) \text{ for } \omega \ge 0, \text{ and } 0 \text{ for } \omega < 0
          uniform:      f(\omega; (a, b)) = \frac{H(\omega - a) - H(\omega - b)}{b - a}
          Beta:         f(\omega; (\alpha, \beta)) = \frac{\omega^{\alpha-1}(1 - \omega)^{\beta-1}}{\int_0^1 u^{\alpha-1}(1 - u)^{\beta-1} \, du}

     • "finite optimization"
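     The four candidate densities, evaluated with scipy.stats; all parameter values are arbitrary demo choices:

        import numpy as np
        from scipy.stats import norm, expon, uniform, beta as beta_dist

        omega = np.linspace(0.01, 0.99, 5)

        print("Gaussian   :", norm.pdf(omega, loc=0.5, scale=0.2))
        print("exponential:", expon.pdf(omega, scale=1 / 3.0))         # lambda = 3
        print("uniform    :", uniform.pdf(omega, loc=0.2, scale=0.6))  # (a, b) = (0.2, 0.8)
        print("Beta       :", beta_dist.pdf(omega, 2.0, 5.0))          # (alpha, beta) = (2, 5)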
 19. Infinite Kernel Learning: Reduction Ansatz

     The semi-infinite constraint can be replaced by a lower-level minimization:

        g(x, y) \ge 0 \ \forall y \in I \quad \Longleftrightarrow \quad \min_{y \in I} g(x, y) \ge 0,

     so only the local minimizers y_1, \ldots, y_p of g(x, \cdot) matter; by the Implicit Function Theorem, each of them depends on the parameter,

        x \mapsto y_j(x) \quad (\text{implicit function}).

     [Figure: the graph of g(x, \cdot) over \Omega with its local minimizers y_j.]
 20. Infinite Kernel Learning: Reduction Ansatz

     Based on the reduction ansatz, the problem becomes finite:

        \min f(x) \quad \text{subject to} \quad g_j(x) := g(x, y_j(x)) \ge 0 \quad (j \in J := \{1, 2, \ldots, p\}).

     [Figure: the graph of g((\sigma, \rho), \cdot) near a minimizer; the minimizer moves with the parameter, \omega = \tilde{\omega}(\sigma, \rho), which calls for a topology on the space of measures.]
 21. Infinite Kernel Learning: Regularization

     Regularization penalizes the first and second derivative of the cumulative measure:

        \min_{\theta, \beta} \ -\theta + \mu \sup_{t \in [0,1]} \Big( \frac{d}{dt} \int_0^t d\beta(\omega) + \frac{d^2}{dt^2} \int_0^t d\beta(\omega) \Big)

     subject to the constraints, on a grid 0 = t_0 < t_1 < \cdots < t_\iota = 1, with the difference quotients

        \frac{d}{dt} \int_0^t d\beta(\omega) \Big|_{t=t_\nu} \approx \frac{\int_0^{t_{\nu+1}} d\beta(\omega) - \int_0^{t_\nu} d\beta(\omega)}{t_{\nu+1} - t_\nu} = \frac{1}{t_{\nu+1} - t_\nu} \int_{t_\nu}^{t_{\nu+1}} d\beta(\omega),

        \frac{d^2}{dt^2} \int_0^t d\beta(\omega) \Big|_{t=t_\nu} \approx \frac{\frac{1}{t_{\nu+2} - t_{\nu+1}} \int_{t_{\nu+1}}^{t_{\nu+2}} d\beta(\omega) - \frac{1}{t_{\nu+1} - t_\nu} \int_{t_\nu}^{t_{\nu+1}} d\beta(\omega)}{t_{\nu+1} - t_\nu}.
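     A sketch of these difference quotients for a measure that puts weight \beta_\nu on each grid cell [t_\nu, t_{\nu+1}]; the grid and weights are made up:

        import numpy as np

        t = np.array([0.0, 0.2, 0.5, 0.7, 1.0])    # 0 = t_0 < ... < t_iota = 1
        beta_w = np.array([0.1, 0.4, 0.3, 0.2])    # mass of beta on each cell

        dt = np.diff(t)
        first = beta_w / dt                        # ~ d/dt of t -> int_0^t d(beta)
        second = np.diff(first) / dt[:-1]          # ~ d^2/dt^2 (one-sided quotient)
        print("first :", first)
        print("second:", second)
        print("sup-type penalty:", np.max(np.abs(first)) + np.max(np.abs(second)))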
 22. Infinite Kernel Learning: Topology

     Radon measure: a measure on the \sigma-algebra of Borel sets of E that is locally finite (\mu(K_\nu) < \infty for every compact set K_\nu \subset E) and inner regular; here (E, d) is a metric space.

     H(E): the set of Radon measures on E.

     Neighbourhood of a measure \rho:

        B_\rho(\varepsilon) := \Big\{ \mu \in H(E) : \Big| \int f \, d\mu - \int f \, d\rho \Big| < \varepsilon \Big\},

     where f ranges over the dual space (H(E))' of continuous bounded functions.
 23. Infinite Kernel Learning: Topology

     Def. (basis of neighbourhoods of a measure \rho; f_1, \ldots, f_n \in (H(E))', \varepsilon > 0):

        \Big\{ \mu \in H(E) : \Big| \int_E f_i \, d\rho - \int_E f_i \, d\mu \Big| < \varepsilon \ (i = 1, 2, \ldots, n) \Big\}.

     Def. (Prokhorov metric):

        d_0(\mu, \rho) := \inf \{ \varepsilon \ge 0 \mid \mu(A) \le \rho(A^\varepsilon) + \varepsilon \text{ and } \rho(A) \le \mu(A^\varepsilon) + \varepsilon \ (A \text{ closed}) \},

     where A^\varepsilon := \{ x \in E \mid d(x, A) < \varepsilon \}.

     Open \delta-neighbourhood of a measure \rho:

        B_\delta(\rho) := \{ \mu \in H(E) \mid d_0(\rho, \mu) < \delta \}.
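     A sketch of the neighbourhood-basis test for two discrete measures: \mu lies in the basic neighbourhood of \rho iff |\int f_i \, d\mu - \int f_i \, d\rho| < \varepsilon for every test function f_i. Point masses reduce the integrals to finite sums; the measures and test functions are made up:

        import numpy as np

        def integrate(points, weights, f):
            return np.sum(weights * f(points))     # int f d(mu) for point masses

        rho_pts, rho_w = np.array([0.2, 0.8]), np.array([0.5, 0.5])
        mu_pts, mu_w = np.array([0.25, 0.75]), np.array([0.5, 0.5])

        tests = [np.sin, np.cos, lambda w: w ** 2]   # f_1, ..., f_n
        eps = 0.05
        in_nbhd = all(
            abs(integrate(mu_pts, mu_w, f) - integrate(rho_pts, rho_w, f)) < eps
            for f in tests
        )
        print("mu in the basic neighbourhood of rho:", in_nbhd)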
 24. Infinite Kernel Learning: Numerical Results

     [Figure: plots of the numerical results.]
 25. References

     Özöğür, S., Shawe-Taylor, J., Weber, G.-W., and Ögel, Z.B., Pattern analysis for the prediction of eukaryotic pro-peptide cleavage sites, in the special issue Networks in Computational Biology of Discrete Applied Mathematics 157, 10 (May 2009), 2388-2394.

     Özöğür-Akyüz, S., and Weber, G.-W., Infinite kernel learning by infinite and semi-infinite programming, in: Proceedings of the Second Global Conference on Power Control and Optimization (Bali, Indonesia, June 1-3, 2009), AIP Conference Proceedings 1159, Subseries: Mathematical and Statistical Physics, Hakim, A.H., Vasant, P., and Barsoum, N. (guest eds.), ISBN 978-0-7354-0696-4 (August 2009), 306-313.

     Özöğür-Akyüz, S., and Weber, G.-W., Infinite kernel learning via infinite and semi-infinite programming, to appear in the special issue of Optimization Methods and Software (OMS) on the occasion of the International Conference on Engineering Optimization (EngOpt 2008; Rio de Janeiro, Brazil, June 1-5, 2008), Schittkowski, K. (guest ed.).

     Özöğür-Akyüz, S., and Weber, G.-W., On numerical optimization theory of infinite kernel learning, preprint at IAM, METU; submitted to the Journal of Global Optimization (JOGO).
