Artificial Intelligence
Perceptrons

Review: Smallest decision tree

Outlook
├─ sunny → {D1,D2,D8,D9,D11} → test Humidity
│    ├─ high → {D1,D2,D8} → NO
│    └─ normal → {D9,D11} → YES
├─ overcast → {D3,D7,D12,D13} → YES
└─ rain → {D4,D5,D6,D10,D14} → test Wind
     ├─ strong → {D6,D14} → NO
     └─ weak → {D4,D5,D10} → YES
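To make the review concrete, here is a minimal sketch of the same tree as nested Python dicts with a classify helper; the tree structure comes from the slide, while the dict encoding and the helper itself are illustrative choices.

```python
# The slide's decision tree, encoded as nested dicts:
# {attribute: {value: subtree-or-label}}. Leaves are "YES"/"NO" labels.
tree = {
    "Outlook": {
        "sunny":    {"Humidity": {"high": "NO", "normal": "YES"}},
        "overcast": "YES",
        "rain":     {"Wind": {"strong": "NO", "weak": "YES"}},
    }
}

def classify(node, example):
    """Walk from the root to a leaf, testing one attribute per level."""
    while isinstance(node, dict):
        attribute = next(iter(node))               # attribute tested at this node
        node = node[attribute][example[attribute]]
    return node

# A rain day with weak wind follows the {D4,D5,D10} branch:
print(classify(tree, {"Outlook": "rain", "Wind": "weak"}))  # -> YES
```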




Review: The sigmoid function

y = 1 / (1 + e^(−x))

[Figure: sigmoid curve rising from 0 toward 1, passing through y = 0.5 at x = 0]
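As a quick sanity check, a one-function sketch of the sigmoid in plain Python:

```python
import math

def sigmoid(x):
    """Logistic function y = 1 / (1 + e^(-x)); squashes any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0.0))  # 0.5 -- the midpoint shown on the slide's curve
```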
Perceptron “Neuron”

[Figure: inputs I1, I2, I3, …, In with weights w1, w2, w3, …, wn, plus a fixed
bias input I0 = −1 with weight w0, all feeding a weighted sum Σ followed by a
threshold]

Output = 1 if ∑(i=0..n) wi·Ii > 0, −1 otherwise
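A minimal sketch of this threshold unit in Python; putting the bias weight w0 first in the weight list (paired with the fixed input I0 = −1) is an assumption of the sketch, not something the slide fixes.

```python
def perceptron_output(weights, inputs):
    """Threshold unit: weights[0] is w0, paired with the fixed bias input I0 = -1."""
    total = sum(w * x for w, x in zip(weights, [-1.0] + list(inputs)))
    return 1 if total > 0 else -1

# Worked call: with w0 = 0.5 and w1 = w2 = 1, the unit fires exactly when
# I1 + I2 > 0.5, i.e. it computes I1 OR I2 on 0/1 inputs.
print(perceptron_output([0.5, 1.0, 1.0], [0, 0]))  # -1
print(perceptron_output([0.5, 1.0, 1.0], [1, 0]))  # 1
```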




Perceptron Network

A single-layer feedforward network.

[Figure: four inputs I1–I4, each connected by a weighted link to each of three
output units O1–O3]

Each output unit is independent of the others, and each weight affects only
one of the outputs. Thus, we only need to consider how to train a single unit.

Perceptron Learning Algorithm

1 Set initial weights randomly, usually in the range [−0.5, 0.5]
2 For each example in the training set
  • Apply the input to the perceptron to calculate output O
  • Compare predicted output O to correct output T
  • Error ← T − O
  • Use the error to revise the weights, as follows
        wj ← wj + α·Ij·Error
3 Repeat step 2 until convergence (no errors on the training set); a sketch
  of this loop in code follows
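A minimal sketch of the algorithm in Python, assuming bipolar (−1/+1) targets; the threshold unit is repeated from the earlier sketch so the block runs on its own, and names like train_perceptron and the default α value are illustrative.

```python
import random

def perceptron_output(weights, inputs):
    # Threshold unit from the earlier sketch: weights[0] is the bias weight w0,
    # paired with the fixed bias input I0 = -1.
    total = sum(w * x for w, x in zip(weights, [-1.0] + list(inputs)))
    return 1 if total > 0 else -1

def train_perceptron(examples, n_inputs, alpha=0.1, max_epochs=100):
    """examples: list of (inputs, target) pairs with target in {-1, +1}."""
    # Step 1: random initial weights in [-0.5, 0.5] (bias weight included).
    weights = [random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
    for _ in range(max_epochs):
        mistakes = 0
        # Step 2: one pass (epoch) over the training set.
        for inputs, target in examples:
            error = target - perceptron_output(weights, inputs)  # Error <- T - O
            if error != 0:
                mistakes += 1
                # w_j <- w_j + alpha * I_j * Error, with I_0 = -1 for the bias.
                for j, i_j in enumerate([-1.0] + list(inputs)):
                    weights[j] += alpha * i_j * error
        # Step 3: stop at convergence (no errors on the training set).
        if mistakes == 0:
            break
    return weights
```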
Convergence

• The perceptron learning algorithm will converge to a set of weights that
  correctly classifies the examples, as long as the examples represent a
  linearly separable function.
• Perceptron learning is a form of hill-climbing through weight space.
  Convergence is guaranteed because there are no local minima.
• A perceptron can learn any function it can represent! The problem is that
  it can’t represent most functions.

Limits on representation

• A perceptron can only represent functions that are linearly separable
  (whereas a multi-layer feedforward neural net can represent any function).
• It can only represent functions for which the positive and negative
  examples can be separated by a line (in two dimensions) or, more generally,
  by a hyperplane (in n dimensions).




Linear separability: Examples

I1 and I2:                         I1 xor I2:
  I2 = 1:  (0,1) −   (1,1) +         I2 = 1:  (0,1) +   (1,1) −
  I2 = 0:  (0,0) −   (1,0) −         I2 = 0:  (0,0) −   (1,0) +

AND is linearly separable: a single line splits the lone positive example from
the three negatives. XOR is not: no line can separate its two positives from
its two negatives.

Are they linearly separable?

[Figure: two scatter plots of + and − examples in the I1–I2 plane, posed as
exercises in judging linear separability]
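The AND/XOR contrast can be checked directly with the learning-algorithm sketch above; this usage example builds on those earlier definitions, and the bipolar encodings of AND and XOR are the sketch's own.

```python
# Usage example, building on train_perceptron / perceptron_output from the
# learning-algorithm sketch above. Targets are bipolar (-1/+1).
AND_examples = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
XOR_examples = [([0, 0], -1), ([0, 1], 1), ([1, 0], 1), ([1, 1], -1)]

w_and = train_perceptron(AND_examples, n_inputs=2)
print([perceptron_output(w_and, x) for x, _ in AND_examples])  # matches all targets

w_xor = train_perceptron(XOR_examples, n_inputs=2)
print([perceptron_output(w_xor, x) for x, _ in XOR_examples])  # some outputs stay wrong:
# XOR is not linearly separable, so the loop exhausts max_epochs without converging.
```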




Training Neural Networks

• Supervised learning: labeled examples, given as attribute-value pairs.
• Epochs: repeatedly train on the same examples until convergence (this is
  different from decision tree learning algorithms).
• Learning rate: the constant α that scales each weight update in
  wj ← wj + α·Ij·Error (varied in the example below).
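A usage example varying α with the train_perceptron sketch and the AND_examples encoding from above; the α values tried are arbitrary illustrations.

```python
# Smaller alpha -> smaller steps through weight space (possibly more epochs);
# larger alpha -> bigger steps. The values tried here are arbitrary.
for alpha in (0.01, 0.1, 0.5):
    w = train_perceptron(AND_examples, n_inputs=2, alpha=alpha)
    print(alpha, [perceptron_output(w, x) for x, _ in AND_examples])
```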




