Introduction to Machine Learning
Lecture 12: Support Vector Machines

Albert Orriols i Puig
aorriols@salle.url.edu

Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
Recap of Lecture 11
        1st generation NN: Perceptrons and others
        Also multi-layer perceptrons
                                                    Slide 2
Artificial Intelligence          Machine Learning
Recap of Lecture 11
        2nd generation NN
                Some people figured out how to adapt the weights of internal
                layers
                Seemed to be very powerful and able to solve almost anything
                The reality showed that this was not exactly true
Today’s Agenda


        Moving to SVM
        Linear SVM
                The separable case
                The non-separable case
        Non-Linear SVM



Introduction
        SVM (Vapnik, 1995)
                Clever type of perceptron
                Instead of hand-coding the layer of non-adaptive features, each
                training example is used to create a new feature using a fixed
                recipe
                A clever optimization technique is used to select the best
                subset of features
        Many NN researchers switched to SVM in the 1990s
        because they work better
        Here, we’ll take a slow path into SVM concepts




Shattering Points with Oriented Hyperplanes
     Remember the idea
             I want to build hyperplanes that separate points of two classes
              In a two-dimensional space → lines
     E.g.: Linear Classifier



                                                          Which is the best separating line?

                                                          Remember, a hyperplane is
                                                          represented by the equation


                                                              WX + b = 0
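The classification rule that goes with this hyperplane is not shown on the slide; in the standard formulation it is simply the sign of the affine function:

```latex
f(\mathbf{x}) = \operatorname{sign}(\mathbf{w} \cdot \mathbf{x} + b)
```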

Linear SVM
        I want the line that maximizes the margin between
        examples of both classes!


                                                 Support Vectors




Linear SVM
      In more detail
          Let’s assume two classes
                   yi ∈ {-1, +1}
          Each example is described by
          a set of features x (x is a
          vector; for clarity, we will
          mark vectors in bold in the
          remainder of the slides)
      The problem can be formulated as follows
          All training examples must satisfy
          (in the separable case)


         This can be combined
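The constraint formulas were images on the original slide and did not survive extraction; in the standard separable-case formulation they read:

```latex
\mathbf{w} \cdot \mathbf{x}_i + b \geq +1 \quad \text{for } y_i = +1
\mathbf{w} \cdot \mathbf{x}_i + b \leq -1 \quad \text{for } y_i = -1
```

which combine into the single condition

```latex
y_i (\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \geq 0 \quad \forall i
```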

Linear SVM
    What are the support vectors?
          Let’s find the points that lie on the hyperplane H1
          Their perpendicular distance to the origin is
          Let’s find the points that lie on the hyperplane H2
          Their perpendicular distance to the origin is




                                                                The margin is:
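The distance formulas were lost with the slide images; in the standard derivation, points on H1 satisfy w · x + b = +1 and points on H2 satisfy w · x + b = −1, giving:

```latex
H_1:\ \mathbf{w} \cdot \mathbf{x} + b = +1, \qquad d_1 = \frac{|1 - b|}{\|\mathbf{w}\|}
H_2:\ \mathbf{w} \cdot \mathbf{x} + b = -1, \qquad d_2 = \frac{|-1 - b|}{\|\mathbf{w}\|}
\text{margin} = d_1 + d_2 \;(\text{measured between } H_1 \text{ and } H_2) = \frac{2}{\|\mathbf{w}\|}
```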




Linear SVM
        Therefore, the problem is
                Find the hyperplane that minimizes


                Subject to
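The minimized quantity and its constraint were images on the slide; the standard primal problem (consistent with the 2/‖w‖ margin above) is:

```latex
\min_{\mathbf{w},\, b} \ \frac{1}{2} \|\mathbf{w}\|^2
\text{subject to} \quad y_i (\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 \quad \forall i
```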


        But let us change to the Lagrange formulation because
                The constraints will be placed on the Lagrange multipliers
                themselves (easier to handle)
                Training data will appear only in the form of dot products
                between vectors




Linear SVM
        The Lagrangian formulation comes to be




                Where αi are the Lagrange multipliers
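The formula itself was lost from the slide; the standard primal Lagrangian for the problem above is:

```latex
L_P = \frac{1}{2} \|\mathbf{w}\|^2 - \sum_i \alpha_i \left[ y_i (\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right]
```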
        So now we need to
                Minimize Lp w.r.t. w, b
                Simultaneously require that the derivatives of Lp w.r.t. α
                vanish
                All subject to the constraints αi ≥ 0



Linear SVM
        Transformation to the dual problem
                This is a convex problem
                We can equivalently solve the dual problem


        That is, maximize LD




                W.r.t αi
                Subject to constraints
                And with αi ≥ 0
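The dual objective was an image on the slide; substituting the stationarity conditions for w and b into Lp gives the standard dual, which depends on the data only through dot products:

```latex
L_D = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j \, y_i y_j \, (\mathbf{x}_i \cdot \mathbf{x}_j)
\text{subject to} \quad \sum_i \alpha_i y_i = 0, \qquad \alpha_i \geq 0
```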


Linear SVM


        This is a quadratic programming problem. You can solve
        it with many methods such as gradient descent
                We’ll not see these methods in class
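The class does not cover these methods, but a minimal sketch may help. The following toy example (hand-made two-point problem, not from the lecture) maximizes the dual LD by projected gradient ascent: take a gradient step, project back onto Σi αi yi = 0, and clip negative multipliers.

```python
# Toy sketch: solve the SVM dual
#   max_a  sum_i a_i - 1/2 sum_ij a_i a_j y_i y_j (x_i . x_j)
#   s.t.   sum_i a_i y_i = 0,  a_i >= 0
# by projected gradient ascent on a hypothetical 2-point problem.
X = [(0.0, 0.0), (2.0, 0.0)]   # hypothetical training points
y = [-1.0, 1.0]
n = len(X)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# G[i][j] = y_i y_j (x_i . x_j)
G = [[y[i] * y[j] * dot(X[i], X[j]) for j in range(n)] for i in range(n)]

alpha = [0.0] * n
eta = 0.1                      # step size, tuned by hand for this toy case
for _ in range(200):
    # Gradient of the dual objective L_D with respect to each alpha_i
    grad = [1.0 - sum(G[i][j] * alpha[j] for j in range(n)) for i in range(n)]
    alpha = [a + eta * g for a, g in zip(alpha, grad)]
    # Project onto sum_i alpha_i y_i = 0 (note ||y||^2 = n since y_i = +-1),
    # then clip at zero. Clipping after projecting is a heuristic; it is
    # exact here because the optimum has strictly positive multipliers.
    c = sum(a * yi for a, yi in zip(alpha, y)) / n
    alpha = [max(0.0, a - c * yi) for a, yi in zip(alpha, y)]

# Recover the hyperplane: w = sum_i alpha_i y_i x_i, b from a support vector
w = [sum(alpha[i] * y[i] * X[i][d] for i in range(n)) for d in range(2)]
sv = max(range(n), key=lambda i: alpha[i])
b = y[sv] - dot(w, X[sv])
print(w, b)   # separating hyperplane x1 = 1, i.e. w ≈ [1, 0], b ≈ -1
```

Real solvers use more robust quadratic-programming machinery, but the fixed-point structure is the same: the nonzero multipliers at convergence identify the support vectors.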




The Non-Separable case
        What if I cannot separate the two classes?




                We will not be able to solve the Lagrangian formulation
                proposed
                Any idea?

The Non-Separable Case
        Just relax the constraints by permitting some errors
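The relaxed constraints were an image on the slide; in the standard soft-margin formulation, a slack variable ξi is introduced per example:

```latex
y_i (\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 - \xi_i, \qquad \xi_i \geq 0 \quad \forall i
```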




The Non-Separable Case
    That means that the Lagrangian is rewritten
              We change the objective
              function to be minimized to
              Therefore, we are maximizing the margin and minimizing the error
              C is a constant to be chosen by the user
        The dual problem becomes




             Subject to                      and
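The formulas here were lost with the slide images; in the standard soft-margin formulation, the new objective and the resulting dual are:

```latex
\min_{\mathbf{w},\, b,\, \xi} \ \frac{1}{2} \|\mathbf{w}\|^2 + C \sum_i \xi_i
L_D = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j \, y_i y_j \, (\mathbf{x}_i \cdot \mathbf{x}_j)
\text{subject to} \quad 0 \leq \alpha_i \leq C, \qquad \sum_i \alpha_i y_i = 0
```

Note that the only change from the separable case is the upper bound C on each multiplier.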




Non-Linear SVM
        What happens if the decision function is not a linear function of
        the data?




        In our equations, data appears in the form of dot products xi · xj
        Wouldn’t you like to have polynomial, logarithmic, …
        functions to fit the data?




Non-Linear SVM
        The kernel trick
                Map the data into a higher-dimensional space
                Mercer theorem: any continuous, symmetric, positive
                semi-definite kernel function K(x, y) can be expressed as a
                dot product in a high-dimensional space
        Now, we have a kernel function
        An example
        All we have talked about still holds when using the
        kernel function
        The only difference is that now my function will be
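The function was an image on the slide; with a kernel, the standard decision function replaces each dot product xi · x with K(xi, x):

```latex
f(\mathbf{x}) = \operatorname{sign}\!\left( \sum_i \alpha_i y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \right)
```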



Non-Linear SVM
          Some typical kernels
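The kernel list was an image on the slide; the kernels most commonly cited in this context are:

```latex
\text{Polynomial: } K(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y} + 1)^p
\text{Gaussian (RBF): } K(\mathbf{x}, \mathbf{y}) = e^{-\|\mathbf{x} - \mathbf{y}\|^2 / 2\sigma^2}
\text{Sigmoid: } K(\mathbf{x}, \mathbf{y}) = \tanh(\kappa \, \mathbf{x} \cdot \mathbf{y} - \delta)
```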




           A visual example of a polynomial kernel with p = 3




Some Further Issues
        We have to classify data
                Described by nominal attributes and continuous attributes
                Probably with missing values
                That may have more than two classes
        How do SVMs deal with them?
                SVM is defined over continuous attributes. No problem!
                Nominal attributes → map into a continuous space
                Multiple classes → build SVMs that discriminate each pair of
                classes
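The nominal-to-continuous mapping can be sketched with one-hot encoding (a hypothetical example, not from the lecture): each category gets its own 0/1 dimension, so dot products — and hence SVMs — are defined over the attribute.

```python
# Minimal sketch: map a nominal attribute into a continuous space
# via one-hot encoding (hypothetical attribute values).
def one_hot(value, domain):
    """Map a nominal value to a 0/1 vector with one dimension per category."""
    return [1.0 if value == v else 0.0 for v in domain]

colors = ["red", "green", "blue"]
print(one_hot("green", colors))  # → [0.0, 1.0, 0.0]
```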




Some Further Issues
        I’ve seen lots of formulas… But I want to program an SVM
        builder. How do I get my SVM?
                We have already mentioned that there are many methods to
                solve the quadratic programming problem
                Many algorithms designed for SVM
                One of the most significant: Sequential Minimal Optimization
                Currently, there are many new algorithms




Next Class



        Association Rules




