1. Introduction to Machine Learning
Lecture 11: Neural Networks
Albert Orriols i Puig
aorriols@salle.url.edu
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
2. Recap of Lectures 5-10
Data classification
Decision trees (C4.5)
Instance-based learners (kNN and CBR)
3. Recap of Lectures 5-10
Data classification
Probabilistic-based learners
$$P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}$$
Linear/polynomial classifier
4. Today’s Agenda
Why Neural Networks?
Looking into a Brain
Neural Networks
Starting from the Beginning: Perceptrons
Multi-layer perceptrons
5. Why Neural Networks?
Brains vs. machines
Machines are tremendously faster than brains at well-defined problems:
Inverting matrices, solving differential equations, etc.
Brains are tremendously faster and more accurate than machines at ill-defined problems or problems that require a lot of processing:
Recognizing characters or objects on TV
Let’s simulate our brains with artificial neural networks!
Massive parallelism
Neurons exchanging signals
6. Looking into a Brain
$10^{11}$ neurons of more than 20 different types
Neuron switching time: ~0.001 seconds
$10^4$–$10^5$ connections per neuron
Scene recognition time: ~0.1 seconds
7. Artificial Neural Networks
Artificial neural networks borrow some ideas from the nervous systems of animals
$$a_i = g(in_i) = g\Big(\sum_j W_{j,i}\, a_j\Big)$$
THE PERCEPTRON (McCulloch & Pitts)
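A minimal sketch of this unit in Python (the hard-threshold activation $g$ and the example activations and weights are assumptions for illustration, not values from the slides):

```python
import numpy as np

def g(s):
    """Hard-threshold activation: the unit fires when its weighted input is positive."""
    return 1 if s > 0 else 0

def unit_activation(a, W_col):
    """a_i = g(in_i) = g(sum_j W_ji * a_j): threshold the weighted sum of inputs."""
    return g(np.dot(W_col, a))

# Hypothetical incoming activations a_j and weights W_ji for one unit i
a = np.array([1.0, 0.0, 1.0])
W_col = np.array([0.5, -0.3, 0.2])
print(unit_activation(a, W_col))  # -> 1, since 0.5 + 0.2 > 0
```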
8. Adaline
Adaptive Linear Element
An adaptive linear combiner cascaded with a hard-limiting quantizer
The linear output is transformed to binary by means of a threshold device
Training = adjusting the weights
Activation functions
9. Adaline
Note that Adaline implements a function
$$f(\vec{x}, \vec{w}) = w_0 + \sum_{i=1}^{n} x_i w_i$$
This defines a threshold when the output is zero:
$$f(\vec{x}, \vec{w}) = w_0 + \sum_{i=1}^{n} x_i w_i = 0$$
10. Adaline
Let's assume that we have two variables:
$$f(\vec{x}, \vec{w}) = w_0 + x_1 w_1 + x_2 w_2 = 0$$
Therefore
$$x_2 = -\frac{w_1}{w_2}\, x_1 - \frac{w_0}{w_2}$$
So, Adaline is drawing a linear discriminant that divides the space into two regions
Linear classifier
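A quick sketch of this discriminant in Python (the weight values are made-up assumptions for illustration):

```python
import numpy as np

# Hypothetical Adaline weights: f(x, w) = w0 + x1*w1 + x2*w2
w0, w1, w2 = -1.0, 2.0, 1.0

def adaline_class(x1, x2):
    """Classify a point by the sign of the linear discriminant."""
    return 1 if w0 + x1 * w1 + x2 * w2 > 0 else 0

def boundary_x2(x1):
    """The line f(x, w) = 0 solved for x2: x2 = -(w1/w2)*x1 - w0/w2."""
    return -(w1 / w2) * x1 - w0 / w2

print(boundary_x2(0.0))         # -> 1.0: where the boundary crosses the x2-axis
print(adaline_class(1.0, 1.0))  # -> 1: above the line
print(adaline_class(0.0, 0.0))  # -> 0: below the line
```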
11. Adaline
So, we've got a cool way to create linear classifiers
But are linear classifiers enough to tackle our problems?
Can you draw a line that separates the examples of the white class from those of the black class in the last example?
12. Moving to more Flexible NN
So, we want to classify problems such as XOR. Any idea?
Polynomial discriminant functions
In this system:
$$f(\vec{x}, \vec{w}) = w_0 + x_1 w_1 + x_1^2 w_{11} + x_1 x_2 w_{12} + x_2^2 w_{22} + x_2 w_2 = 0$$
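As a hedged sketch (the weight values are hand-picked assumptions, not from the slides), this polynomial discriminant can already separate XOR:

```python
# Hand-picked weights (an assumption for illustration): with them the
# polynomial discriminant is positive exactly on the XOR-true points.
w0, w1, w11, w12, w22, w2 = -0.5, 1.0, 0.0, -2.0, 0.0, 1.0

def poly_discriminant(x1, x2):
    """f(x, w) = w0 + x1*w1 + x1^2*w11 + x1*x2*w12 + x2^2*w22 + x2*w2"""
    return w0 + x1*w1 + x1**2*w11 + x1*x2*w12 + x2**2*w22 + x2*w2

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), int(poly_discriminant(x1, x2) > 0))
# -> (0,0):0  (0,1):1  (1,0):1  (1,1):0  — the XOR labels
```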
13. Moving to more Flexible NN
With appropriate values of w, I can fit data that is not
linearly separable
14. Even more Flexible: Multi-layer NN
So, we want to classify problems such as XOR. Any other idea?
Madaline: multiple Adalines connected together
This also enables the network to solve non-linearly-separable problems (a sketch follows below)
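A minimal sketch (the fixed weights are hand-picked assumptions, not trained values): two Adalines computing OR and NAND, whose outputs a third Adaline combines with AND, implement XOR:

```python
import numpy as np

def adaline(x, w, w0):
    """A hard-limiting Adaline: threshold the linear combination to {0, 1}."""
    return 1 if w0 + np.dot(w, x) > 0 else 0

def madaline_xor(x1, x2):
    """XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2)) built from three Adalines."""
    x = np.array([x1, x2])
    h_or   = adaline(x, np.array([1.0, 1.0]),  -0.5)  # fires if x1 or x2 fires
    h_nand = adaline(x, np.array([-1.0, -1.0]), 1.5)  # fires unless both fire
    return adaline(np.array([h_or, h_nand]), np.array([1.0, 1.0]), -1.5)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), madaline_xor(x1, x2))  # -> 0, 1, 1, 0
```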
15. But Step Down… How Do I Learn w?
We have seen that different structures enable us to define different functions
But the key is to get a proper estimation of w
There are many algorithms:
Perceptron rule
α-LMS
α-perceptron
May's algorithm
Backpropagation
We are going to see two examples: α-LMS and backpropagation
16. Weight Learning in Adaline
Recall that we want to adjust w
17. Weight Learning in Adaline
Weight learning with the α-LMS algorithm
Incrementally update the weights as
$$W_{k+1} = W_k + \alpha\, \frac{\varepsilon_k X_k}{\|X_k\|^2}$$
The error is the difference between the desired and the actual output:
$$\varepsilon_k = d_k - W_k^T X_k$$
A change in the weights affects the error:
$$\Delta\varepsilon_k = \Delta(d_k - W_k^T X_k) = -X_k^T\, \Delta W_k$$
And the weight change is
$$\Delta W_k = W_{k+1} - W_k = \alpha\, \frac{\varepsilon_k X_k}{\|X_k\|^2}$$
Therefore
$$\Delta\varepsilon_k = -\alpha\, \frac{\varepsilon_k X_k^T X_k}{\|X_k\|^2} = -\alpha\, \varepsilon_k$$
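A minimal sketch of this rule in Python (the AND training data, learning rate, and epoch count are assumptions for illustration; the bias enters as a constant input x0 = 1):

```python
import numpy as np

def alpha_lms(X, d, alpha=0.1, epochs=100):
    """Train Adaline weights with the alpha-LMS rule:
    W_{k+1} = W_k + alpha * eps_k * X_k / ||X_k||^2, with eps_k = d_k - W_k^T X_k."""
    W = np.zeros(X.shape[1])
    for _ in range(epochs):
        for X_k, d_k in zip(X, d):
            eps_k = d_k - W @ X_k                   # error before the update
            W += alpha * eps_k * X_k / (X_k @ X_k)  # shrinks eps_k by a factor (1 - alpha)
    return W

# Hypothetical data: the AND function with +-1 targets;
# the first column is the constant input x0 = 1 that carries the bias weight w0.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
d = np.array([-1.0, -1.0, -1.0, 1.0])
W = alpha_lms(X, d)
print(np.sign(X @ W))  # -> [-1. -1. -1.  1.]
```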
18. Weight Learning in Adaline
$$\Delta\varepsilon_k = -X_k^T\, \Delta W_k \qquad\quad \Delta W_k = \alpha\, \frac{\varepsilon_k X_k}{\|X_k\|^2}$$
19. Backpropagation
α-LMS works for networks with a single layer. But what
happens in networks with multiple layers?
Backpropagation (Rumelhart, 1986)
The most influential development of NN in the 1980s
Here, we present the method conceptually (the math details are
in the papers)
Let’s assume a network with
Three neurons in the input layer
Two neurons in the output layer
20. Backpropagation
Strategy
Compute the gradient of the squared error:
$$\hat{\nabla}_k = \frac{\partial \varepsilon_k^2}{\partial W_k}$$
Adjust the weights in the direction opposite to the instantaneous error gradient
Now, $W_k$ is a vector that contains all the weights of the net
21. Backpropagation
Algorithm
1. Insert a new example $X_k$ into the network and sweep it forward until getting the output $y$
2. Compute the squared error of this example:
$$\varepsilon_k^2 = \sum_{i=1}^{N_y} \varepsilon_{ik}^2 = \sum_{i=1}^{N_y} (d_{ik} - y_{ik})^2$$
For example, for two outputs (disregarding $k$):
$$\varepsilon^2 = (d_1 - y_1)^2 + (d_2 - y_2)^2$$
3. Propagate the error to the previous layer (back-propagation). How? Steepest descent: compute the derivative of the squared error, $\delta$, for each Adaline (a full sketch follows below)
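A minimal end-to-end sketch of these three steps in Python (the 3-2-2 layer sizes follow the slide-19 setup; the sigmoid activation, learning rate, initial weights, and training example are assumptions; biases are omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

def sgm(s):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-s))

def sgm_prime(s):
    """Its derivative: sgm'(s) = sgm(s) * (1 - sgm(s))."""
    y = sgm(s)
    return y * (1.0 - y)

# Hypothetical 3-input, 2-hidden, 2-output network
W1 = rng.normal(scale=0.5, size=(2, 3))  # hidden-layer weights
W2 = rng.normal(scale=0.5, size=(2, 2))  # output-layer weights

def backprop_step(x, d, alpha=0.5):
    global W1, W2
    # Step 1: sweep the example forward to the output y
    s1 = W1 @ x;  a1 = sgm(s1)
    s2 = W2 @ a1; y = sgm(s2)
    # Step 2: squared error eps^2 = sum_i (d_i - y_i)^2
    eps2 = float(np.sum((d - y) ** 2))
    # Step 3: back-propagate; delta_j = eps_j * sgm'(s_j) at the output, and
    # the hidden errors are the output deltas weighted by the outgoing weights
    delta2 = (d - y) * sgm_prime(s2)
    delta1 = (W2.T @ delta2) * sgm_prime(s1)
    W2 += alpha * np.outer(delta2, a1)  # Delta w = alpha * delta * input
    W1 += alpha * np.outer(delta1, x)
    return eps2

x, d = np.array([1.0, 0.0, 1.0]), np.array([1.0, 0.0])
for _ in range(200):
    eps2 = backprop_step(x, d)
print(round(eps2, 4))  # the squared error shrinks toward 0
```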
22. Backpropagation Example
Example borrowed from: http://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html
28. Backpropagation for a Two-Layer Net.
That is, the algorithm is:
1. Find the instantaneous squared error derivative
$$\delta_j^{(l)} = -\frac{1}{2}\, \frac{\partial \varepsilon^2}{\partial s_j^{(l)}}$$
This tells us how sensitive the squared output error of the network is to changes in the linear output $s$ of the associated Adaline
2. Expanding the error term we get
$$\delta_1^{(2)} = -\frac{1}{2}\, \frac{\partial \big[(d_1 - y_1)^2 + (d_2 - y_2)^2\big]}{\partial s_1^{(2)}} = -\frac{1}{2}\, \frac{\partial \big[d_1 - \mathrm{sgm}(s_1^{(2)})\big]^2}{\partial s_1^{(2)}}$$
3. And recognizing that $d_1$ is independent of $s_1^{(2)}$:
$$\delta_1^{(2)} = \big[d_1 - \mathrm{sgm}(s_1^{(2)})\big]\, \mathrm{sgm}'(s_1^{(2)}) = \varepsilon_1^{(2)}\, \mathrm{sgm}'(s_1^{(2)})$$
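A one-line step the slides use implicitly: for the logistic sigmoid, the derivative $\mathrm{sgm}'$ can be computed from the forward-pass output itself:

```latex
\mathrm{sgm}(s) = \frac{1}{1 + e^{-s}}
\quad\Longrightarrow\quad
\mathrm{sgm}'(s) = \frac{e^{-s}}{(1 + e^{-s})^2} = \mathrm{sgm}(s)\,\big(1 - \mathrm{sgm}(s)\big)
```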
30. Backpropagation for a Two-Layer Net.
The hidden Adaline's error is built from the output-layer deltas, weighted by its outgoing connections:
$$\varepsilon_1^{(1)} \triangleq \delta_1^{(2)}\, w_{11}^{(2)} + \delta_2^{(2)}\, w_{21}^{(2)}$$
Defining
$$\delta_1^{(1)} = \varepsilon_1^{(1)}\, \mathrm{sgm}'(s_1^{(1)})$$
we obtain the implementation details of each Adaline (summarized below)
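Collecting the pieces, a hedged summary of the full per-Adaline recursion (the constant factors folded into the learning rate $\alpha$ vary across presentations; $x_i^{(l)}$ denotes the $i$-th input of the Adaline):

```latex
\delta_j^{(l)} = \varepsilon_j^{(l)}\,\mathrm{sgm}'\big(s_j^{(l)}\big),
\qquad
\varepsilon_j^{(l)} = \sum_i \delta_i^{(l+1)}\, w_{ji}^{(l+1)},
\qquad
\Delta w_{ij}^{(l)} = \alpha\, \delta_j^{(l)}\, x_i^{(l)}
```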
31. Next Class
Support Vector Machines