Introduction to Machine Learning
                  Lecture 11
              Neural Networks

                Albert Orriols i Puig
               aorriols@salle.url.edu

      Artificial Intelligence – Machine Learning
          Enginyeria i Arquitectura La Salle
                 Universitat Ramon Llull
Recap of Lectures 5-10
        Data classification
                Decision trees (C4.5)




                Instance-based learners (kNN and CBR)




Recap of Lectures 5-10
        Data classification
                Probabilistic-based learners


                                $P(h \mid D) = \frac{P(D \mid h) \, P(h)}{P(D)}$


                Linear/polynomial classifier




Today’s Agenda


        Why Neural Networks?
        Looking into a Brain
        Neural Networks
        Starting from the Beginning:
                Perceptrons
                Multi-layer perceptrons


Why Neural Networks?
        Brain vs. machines
                Machines are tremendously faster than brains in well-defined
                problems:
                          Invert matrices, solve differential equations, etc.
                Brains are tremendously faster and more accurate than
                machines in ill-defined problems that require a lot of
                processing:
                          Recognize characters or objects on TV


        Let’s simulate our brains with artificial neural networks!
                Massive parallelism
                Neurons interchanging signals


Looking into a Brain
                10^11 neurons of more than 20 different types
                0.001 seconds of neuron switching time
                10^4 to 10^5 connections per neuron
                0.1 seconds of scene recognition time




Artificial Neural Networks
        Borrow some ideas from nervous systems of animals

                          $a_i = g(in_i) = g\left( \sum_j W_{j,i} \, a_j \right)$




                                THE PERCEPTRON
                                (McCulloch & Pitts)
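To make the unit above concrete, here is a minimal sketch (not part of the original slides) of a threshold neuron computing $a_i = g(\sum_j W_{j,i} a_j)$; the weights, bias, and the AND task are hypothetical choices used only for illustration.

```python
import numpy as np

def step(x):
    """Hard-limiting activation g: returns 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0

def neuron_output(weights, bias, inputs):
    """a_i = g(in_i) = g(sum_j W_ji * a_j + bias)."""
    return step(np.dot(weights, inputs) + bias)

# Hypothetical weights implementing a logical AND of two binary inputs
w, b = np.array([1.0, 1.0]), -1.5
for a in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(a, "->", neuron_output(w, b, np.array(a)))
```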
Adaline
Adaptive Linear Element
Adaptive linear combiner
cascaded with a hard-limiting
quantizer
Linear output transformed to
binary by means of a threshold
device
Training = adjusting the weights


Activation functions




Adaline
        Note that Adaline implements a function
                            $f(\vec{x}, \vec{w}) = w_0 + \sum_{i=1}^{n} x_i w_i$


        This defines a threshold when the output is zero

                          $f(\vec{x}, \vec{w}) = w_0 + \sum_{i=1}^{n} x_i w_i = 0$




Adaline
         Let’s assume that we have two variables
                          $f(\vec{x}, \vec{w}) = w_0 + x_1 w_1 + x_2 w_2 = 0$
         Therefore
                          $x_2 = -\frac{w_1}{w_2} x_1 - \frac{w_0}{w_2}$


          So, Adaline draws a linear discriminant that divides the space
          into two regions: it is a linear classifier




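A minimal sketch of the Adaline decision rule just described, assuming a hard-limiting quantizer on the linear output; the weight values below are illustrative, not learned.

```python
import numpy as np

def adaline_linear_output(w0, w, x):
    """f(x, w) = w0 + sum_i x_i * w_i  (the adaptive linear combiner)."""
    return w0 + np.dot(w, x)

def adaline_classify(w0, w, x):
    """Hard-limiting quantizer: class +1 if f(x, w) >= 0, else -1."""
    return 1 if adaline_linear_output(w0, w, x) >= 0 else -1

# Illustrative weights: the boundary is x2 = -(w1/w2) x1 - (w0/w2) = -x1 + 1
w0, w = -1.0, np.array([1.0, 1.0])
print(adaline_classify(w0, w, np.array([0.2, 0.2])))   # below the line -> -1
print(adaline_classify(w0, w, np.array([0.8, 0.8])))   # above the line -> +1
```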
Adaline
        So, we got a cool way to create linear classifiers
        But are linear classifiers enough to tackle our problems?




                Can you draw a line that separates examples of class white
                and black for the last example?




Moving to more Flexible NN
        So, we want to classify problems such as XOR. Any idea?
                Polynomial discriminant functions




                In this system:
          $f(\vec{x}, \vec{w}) = w_0 + x_1 w_1 + x_1^2 w_{11} + x_1 x_2 w_{12} + x_2^2 w_{22} + x_2 w_2 = 0$

Moving to more Flexible NN




        With appropriate values of w, I can fit data that is not
        linearly separable


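As an illustration of the previous two slides, the sketch below evaluates the polynomial discriminant on the XOR truth table with hand-picked (not learned) weights; the specific values are an assumption for the example.

```python
def poly_discriminant(x1, x2, w):
    """f(x, w) = w0 + x1*w1 + x1^2*w11 + x1*x2*w12 + x2^2*w22 + x2*w2."""
    w0, w1, w11, w12, w22, w2 = w
    return w0 + x1 * w1 + x1**2 * w11 + x1 * x2 * w12 + x2**2 * w22 + x2 * w2

# Hand-picked weights: positive for (0,1) and (1,0), negative for (0,0) and (1,1)
w = (-0.5, 1.0, 0.0, -2.0, 0.0, 1.0)
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    label = 1 if poly_discriminant(x1, x2, w) >= 0 else 0
    print((x1, x2), "->", label)     # reproduces the XOR truth table
```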
Even more Flexible: Multi-layer NN
    So, we want to classify problems such as XOR. Any other idea?




                Madaline: Multiple Adalines connected
                This also enables the network to solve non-separable problems


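One possible Madaline-style composition (illustrative weights, not taken from the slides): two hidden Adalines feeding a third one solve XOR, which a single Adaline cannot.

```python
import numpy as np

def adaline(w0, w, x):
    """Linear combiner followed by a hard-limiting quantizer (output 0 or 1)."""
    return 1 if w0 + np.dot(w, x) >= 0 else 0

def madaline_xor(x):
    """Two hidden Adalines (OR and AND) feeding one output Adaline."""
    h1 = adaline(-0.5, np.array([1.0, 1.0]), x)   # fires on x1 OR x2
    h2 = adaline(-1.5, np.array([1.0, 1.0]), x)   # fires on x1 AND x2
    return adaline(-0.5, np.array([1.0, -1.0]), np.array([h1, h2]))  # OR and not AND

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", madaline_xor(np.array(x)))     # reproduces XOR
```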
But Step Down… How Do I Learn w?
        We have seen that different structures enable me to
        define different functions
        But the key is to get a proper estimation of w
        There are many algorithms
                Perceptron rule
                α-LMS
                α-perceptron
                May’s algorithm
                Backpropagation
        We are going to see two examples: α-LMS and backprop.



Weight Learning in Adaline
    Recall that we want to adjust w




Weight Learning in Adaline
    Weight learning with α-LMS algorithm
            Incrementally update the weights as

                          $W_{k+1} = W_k + \alpha \frac{\varepsilon_k X_k}{\|X_k\|^2}$

            The error is the difference between the desired and the actual output

                          $\varepsilon_k = d_k - W_k^T X_k$

            A change in the weights affects the error

                          $\Delta \varepsilon_k = \Delta (d_k - W_k^T X_k) = -X_k^T \Delta W_k$

            And the weight change is

                          $\Delta W_k = W_{k+1} - W_k = \alpha \frac{\varepsilon_k X_k}{\|X_k\|^2}$

            Therefore

                          $\Delta \varepsilon_k = -\alpha \frac{\varepsilon_k X_k^T X_k}{\|X_k\|^2} = -\alpha \varepsilon_k$
Weight Learning in Adaline




                          $\Delta \varepsilon_k = -X_k^T \Delta W_k$                 $\Delta W_k = \alpha \frac{\varepsilon_k}{\|X_k\|^2} X_k$



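A minimal sketch of the α-LMS rule derived above, assuming the bias $w_0$ is folded into the weight vector through a constant input of 1; the training data is synthetic and used only for illustration.

```python
import numpy as np

def alpha_lms_train(X, d, alpha=0.1, epochs=20):
    """Train an Adaline on its linear output with the alpha-LMS rule.

    X: (n_samples, n_features) inputs; d: desired outputs (+1/-1).
    A constant 1 is appended to each input so that w[0] acts as the bias w0.
    """
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])        # fold bias into weights
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x_k, d_k in zip(Xb, d):
            eps_k = d_k - np.dot(w, x_k)                  # error on the linear output
            w = w + alpha * eps_k * x_k / np.dot(x_k, x_k)  # W_{k+1} = W_k + a*eps*X/||X||^2
    return w

# Synthetic, linearly separable data: class +1 if x1 + x2 > 1, else -1
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
d = np.where(X.sum(axis=1) > 1, 1.0, -1.0)
w = alpha_lms_train(X, d)
pred = np.where(np.dot(np.hstack([np.ones((200, 1)), X]), w) >= 0, 1.0, -1.0)
print("training accuracy:", (pred == d).mean())
```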
Backpropagation
        α-LMS works for networks with a single layer. But what
        happens in networks with multiple layers?
        Backpropagation (Rumelhart, 1986)
                The most influential development of NN in the 1980s
                Here, we present the method conceptually (the math details are
                in the papers)
        Let’s assume a network with
                Three neurons in the input layer
                Two neurons in the output layer




Backpropagation
        Strategy
                Compute the gradient of the error

                                     $\hat{\nabla}_k = \frac{\partial \varepsilon_k^2}{\partial W_k}$

                Adjust the weights in the direction opposite to the
                instantaneous error gradient
                Now, Wk is a vector that contains all the components of the net




Backpropagation
        Algorithm
        1.  Insert a new example X_k into the network and sweep it forward
            until getting the output y
        2.  Compute the square error of this example

                          $\varepsilon_k^2 = \sum_{i=1}^{N_y} \varepsilon_{ik}^2 = \sum_{i=1}^{N_y} (d_{ik} - y_{ik})^2$

            For example, for two outputs (disregarding k)

                          $\varepsilon^2 = (d_1 - y_1)^2 + (d_2 - y_2)^2$

        3.  Propagate the error to the previous layer (back-propagation). How?
                Steepest descent
                Compute the derivative of the square error δ for each Adaline
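Steps 1 and 2 can be sketched as follows for a hypothetical two-layer network with sigmoidal units (sgm assumed to be the logistic function); the layer sizes, weights, and example values are illustrative only.

```python
import numpy as np

def sgm(s):
    """Sigmoid activation assumed for the units in the slides."""
    return 1.0 / (1.0 + np.exp(-s))

# Hypothetical two-layer net: 3 inputs -> 2 hidden Adalines -> 2 outputs
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 3)), np.zeros(2)   # hidden-layer weights
W2, b2 = rng.normal(size=(2, 2)), np.zeros(2)   # output-layer weights

x_k = np.array([0.5, -1.0, 2.0])                # step 1: sweep the example forward
y = sgm(W2 @ sgm(W1 @ x_k + b1) + b2)

d_k = np.array([1.0, 0.0])                      # desired output
err2 = np.sum((d_k - y) ** 2)                   # step 2: squared error of the example
print("outputs:", y, " squared error:", err2)
```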
Backpropagation Example
        Example borrowed from: http://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html




Backpropagation Example
1. Sweep the example forward through the network




Backpropagation Example
2. Backpropagate the error




Backpropagation Example
3. Modify the weights of each neuron




Backpropagation Example
3.bis. Do the same for each neuron




Backpropagation Example
3.bis2. Until reaching the output




Backpropagation for a Two-Layer Net.

        That is, the algorithm is
        1.  Find the instantaneous square error derivative

                          $\delta_j^{(l)} = -\frac{1}{2} \frac{\partial \varepsilon^2}{\partial s_j^{(l)}}$

            This tells us how sensitive the square output error of the
            network is to changes in the linear output s of the associated
            Adaline
        2.  Expanding the error term we get

                          $\delta_1^{(2)} = -\frac{1}{2} \frac{\partial \left[ (d_1 - y_1)^2 + (d_2 - y_2)^2 \right]}{\partial s_1^{(2)}} = -\frac{1}{2} \frac{\partial \left[ d_1 - \mathrm{sgm}(s_1^{(2)}) \right]^2}{\partial s_1^{(2)}}$

        3.  And recognizing that $d_1$ is independent of $s_1$

                          $\delta_1^{(2)} = \left[ d_1 - \mathrm{sgm}(s_1^{(2)}) \right] \mathrm{sgm}'(s_1^{(2)}) = \varepsilon_1^{(2)} \, \mathrm{sgm}'(s_1^{(2)})$
Backpropagation for a Two-Layer Net.

        That is, the algorithm is
        4.  Similarly, for the hidden layer we have

                          $\delta_1^{(1)} = -\frac{1}{2} \frac{\partial \varepsilon^2}{\partial s_1^{(1)}} = -\frac{1}{2} \left( \frac{\partial \varepsilon^2}{\partial s_1^{(2)}} \frac{\partial s_1^{(2)}}{\partial s_1^{(1)}} + \frac{\partial \varepsilon^2}{\partial s_2^{(2)}} \frac{\partial s_2^{(2)}}{\partial s_1^{(1)}} \right)$

        5.  That is

                          $\delta_1^{(1)} = \delta_1^{(2)} \frac{\partial s_1^{(2)}}{\partial s_1^{(1)}} + \delta_2^{(2)} \frac{\partial s_2^{(2)}}{\partial s_1^{(1)}}$

        6.  Which yields

                          $\delta_1^{(1)} = \delta_1^{(2)} \frac{\partial \left[ w_{10}^{(2)} + \sum_{i=1}^{3} w_{1i}^{(2)} \mathrm{sgm}(s_i^{(1)}) \right]}{\partial s_1^{(1)}} + \delta_2^{(2)} \frac{\partial \left[ w_{20}^{(2)} + \sum_{i=1}^{3} w_{2i}^{(2)} \mathrm{sgm}(s_i^{(1)}) \right]}{\partial s_1^{(1)}}$

                          $= \delta_1^{(2)} w_{11}^{(2)} \mathrm{sgm}'(s_1^{(1)}) + \delta_2^{(2)} w_{21}^{(2)} \mathrm{sgm}'(s_1^{(1)})$

                          $= \left[ \delta_1^{(2)} w_{11}^{(2)} + \delta_2^{(2)} w_{21}^{(2)} \right] \mathrm{sgm}'(s_1^{(1)})$
Backpropagation for a Two-Layer Net.
        Defining

                          $\varepsilon_1^{(1)} \triangleq \delta_1^{(2)} w_{11}^{(2)} + \delta_2^{(2)} w_{21}^{(2)}$

        we obtain

                          $\delta_1^{(1)} = \varepsilon_1^{(1)} \, \mathrm{sgm}'(s_1^{(1)})$




      Implementation details of each Adaline

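Putting the derivation together, here is a minimal sketch (not the authors' code) of one backpropagation update for a two-layer network of sigmoidal Adalines, using $\delta^{(2)} = \varepsilon^{(2)} \mathrm{sgm}'(s^{(2)})$ and $\delta^{(1)} = \varepsilon^{(1)} \mathrm{sgm}'(s^{(1)})$ with $\varepsilon^{(1)}$ backpropagated through the output weights. The XOR task, layer sizes, learning rate, and number of epochs are illustrative assumptions, and convergence depends on the random initialization.

```python
import numpy as np

def sgm(s):
    return 1.0 / (1.0 + np.exp(-s))

def sgm_prime(s):
    y = sgm(s)
    return y * (1.0 - y)

def backprop_step(x, d, W1, W2, eta=0.5):
    """One backpropagation update for a two-layer net of sigmoidal units.

    W1: hidden weights (bias in column 0), W2: output weights (bias in column 0).
    """
    xb = np.append(1.0, x)                   # bias input for the hidden layer
    s1 = W1 @ xb                             # linear outputs of the hidden layer
    a1 = np.append(1.0, sgm(s1))             # hidden activations (+ bias)
    s2 = W2 @ a1                             # linear outputs of the output layer
    y = sgm(s2)

    eps2 = d - y                             # output error
    delta2 = eps2 * sgm_prime(s2)            # delta^(2) = eps^(2) * sgm'(s^(2))
    eps1 = W2[:, 1:].T @ delta2              # eps_i^(1) = sum_j delta_j^(2) * w_ji^(2)
    delta1 = eps1 * sgm_prime(s1)            # delta^(1) = eps^(1) * sgm'(s^(1))

    W2 += eta * np.outer(delta2, a1)         # gradient-descent weight updates
    W1 += eta * np.outer(delta1, xb)
    return np.sum(eps2 ** 2)

# Illustrative run: learn XOR with 2 inputs, 3 hidden units, 1 output
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(3, 3))
W2 = rng.normal(scale=0.5, size=(1, 4))
data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
for epoch in range(5000):
    for x, d in data:
        backprop_step(np.array(x, float), np.array(d, float), W1, W2)
for x, d in data:
    y = sgm(W2 @ np.append(1.0, sgm(W1 @ np.append(1.0, np.array(x, float)))))
    print(x, "->", np.round(y, 2))
```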
Next Class



        Support Vector Machines




Introduction to Machine Learning
                  Lecture 11
              Neural Networks

                Albert Orriols i Puig
               aorriols@salle.url.edu

      Artificial Intelligence – Machine Learning
          Enginyeria i Arquitectura La Salle
                 Universitat Ramon Llull
