Lecture 11 – Neural Networks


  1. Introduction to Machine Learning – Lecture 11: Neural Networks. Albert Orriols i Puig, aorriols@salle.url.edu. Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull
  2. Recap of Lectures 5–10: data classification. Decision trees (C4.5). Instance-based learners (kNN and CBR).
  3. Recap of Lectures 5–10: data classification. Probabilistic learners: $P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$. Linear/polynomial classifiers.
  4. Today's Agenda: Why neural networks? Looking into a brain. Neural networks: starting from the beginning with perceptrons; multi-layer perceptrons.
  5. Why Neural Networks? Brain vs. machines: machines are tremendously faster than brains at well-defined problems (inverting matrices, solving differential equations, etc.), but brains are tremendously faster and more accurate than machines at ill-defined problems that require a lot of processing, such as recognizing objects on TV. Let's simulate our brains with artificial neural networks: massive parallelism, neurons interchanging signals.
  6. Looking into a Brain: $10^{11}$ neurons of more than 20 different types; neuron switching time of about 0.001 seconds; $10^{4}$–$10^{5}$ connections per neuron; scene recognition time of about 0.1 seconds.
  7. Artificial Neural Networks: borrow some ideas from the nervous systems of animals. Each unit computes $a_i = g(in_i) = g\left(\sum_j W_{j,i}\, a_j\right)$: THE PERCEPTRON (McCulloch & Pitts). A code sketch of this unit follows.
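A minimal sketch of the unit above, assuming a hard-threshold activation g; the weights and inputs are illustrative, not from the slides.

```python
import numpy as np

def perceptron_unit(weights, inputs, g=lambda s: 1.0 if s >= 0 else 0.0):
    """Compute a_i = g(sum_j W_ji * a_j) for a single unit."""
    return g(np.dot(weights, inputs))

# Illustrative weights implementing logical AND of two binary inputs:
w = np.array([-1.5, 1.0, 1.0])   # bias weight w0, then the two input weights
x = np.array([1.0, 1.0, 1.0])    # constant 1 for the bias, then x1 = x2 = 1
print(perceptron_unit(w, x))     # -> 1.0
```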
  8. Adaline (Adaptive Linear Element): an adaptive linear combiner cascaded with a hard-limiting quantizer; the linear output is transformed to binary by means of a threshold device. Training = adjusting the weights. (Figure: activation functions.)
  9. Adaline: note that Adaline implements the function $f(\vec{x}, \vec{w}) = w_0 + \sum_{i=1}^{n} x_i w_i$. This defines a threshold where the output is zero: $f(\vec{x}, \vec{w}) = w_0 + \sum_{i=1}^{n} x_i w_i = 0$.
  10. Adaline: let's assume that we have two variables, $f(\vec{x}, \vec{w}) = w_0 + x_1 w_1 + x_2 w_2 = 0$. Therefore $x_2 = -\frac{w_0}{w_2} - \frac{w_1}{w_2} x_1$. So Adaline is drawing a linear discriminant that divides the space into two regions: a linear classifier (see the sketch below).
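A minimal sketch of the linear discriminant above; the weights and test points are illustrative.

```python
import numpy as np

def adaline_output(w, x):
    """Linear combiner f(x, w) = w0 + sum_i x_i * w_i."""
    return w[0] + np.dot(w[1:], x)

def classify(w, x):
    """Hard-limiting quantizer: +1 on one side of the boundary, -1 on the other."""
    return 1 if adaline_output(w, x) >= 0 else -1

# With w = (-1, 1, 1), the boundary is x2 = -(w0/w2) - (w1/w2) x1 = 1 - x1.
w = np.array([-1.0, 1.0, 1.0])
print(classify(w, np.array([0.5, 0.2])))   # below the line -> -1
print(classify(w, np.array([0.9, 0.8])))   # above the line -> +1
```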
  11. Adaline: so we have a handy way to create linear classifiers. But are linear classifiers enough to tackle our problems? Can you draw a line that separates the examples of the white class from the examples of the black class in the last example?
  12. Moving to More Flexible NNs: so we want to classify problems such as XOR. Any idea? Polynomial discriminant functions. In this system: $f(\vec{x}, \vec{w}) = w_0 + x_1 w_1 + x_1^2 w_{11} + x_1 x_2 w_{12} + x_2^2 w_{22} + x_2 w_2 = 0$ (a sketch follows).
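A minimal sketch of the polynomial discriminant solving XOR; the weights are hand-picked for illustration, not from the slides (the cross-term $x_1 x_2$ does the work).

```python
def poly_discriminant(w, x1, x2):
    """f(x, w) = w0 + x1*w1 + x1^2*w11 + x1*x2*w12 + x2^2*w22 + x2*w2."""
    w0, w1, w11, w12, w22, w2 = w
    return w0 + x1*w1 + x1**2 * w11 + x1*x2*w12 + x2**2 * w22 + x2*w2

# Hand-picked weights: the cross-term x1*x2 makes XOR separable.
w = (-0.5, 1.0, 0.0, -2.0, 0.0, 1.0)
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), 1 if poly_discriminant(w, x1, x2) >= 0 else 0)
# -> (0, 0) 0, (0, 1) 1, (1, 0) 1, (1, 1) 0
```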
  13. Moving to More Flexible NNs: with appropriate values of w, we can fit data that is not linearly separable.
  14. Even More Flexible: Multi-layer NNs. So we want to classify problems such as XOR. Any other idea? Madaline: multiple Adalines connected. This also enables the network to solve non-separable problems.
  15. But Step Down… How Do I Learn w? We have seen that different structures enable us to define different functions, but the key is to get a proper estimation of w. There are many algorithms: perceptron rule, α-LMS, α-perceptron, May's algorithm, backpropagation. We are going to see two examples: α-LMS and backpropagation.
  16. Weight Learning in Adaline: recall that we want to adjust w.
  17. Weight Learning in Adaline: weight learning with the α-LMS algorithm. Incrementally update the weights as $W_{k+1} = W_k + \alpha \frac{\varepsilon_k X_k}{\|X_k\|^2}$. The error is the difference between the expected and the actual output: $\varepsilon_k = d_k - W_k^T X_k$. A change in the weights affects the error: $\Delta\varepsilon_k = \Delta(d_k - W_k^T X_k) = -X_k^T \Delta W_k$. And the weight change is $\Delta W_k = W_{k+1} - W_k = \alpha \frac{\varepsilon_k X_k}{\|X_k\|^2}$. Therefore $\Delta\varepsilon_k = -\alpha \frac{\varepsilon_k X_k^T X_k}{\|X_k\|^2} = -\alpha \varepsilon_k$.
  18. Weight Learning in Adaline: $\Delta\varepsilon_k = -X_k^T \Delta W_k$ and $\Delta W_k = \alpha \frac{\varepsilon_k X_k}{\|X_k\|^2}$ (a training-loop sketch follows).
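A minimal sketch of a training loop for the α-LMS rule above, assuming a small illustrative dataset (logical AND with ±1 targets); α, the epoch count, and the data are not from the slides.

```python
import numpy as np

def alpha_lms(X, d, alpha=0.1, epochs=100):
    """Train Adaline weights with W_{k+1} = W_k + alpha * eps_k * X_k / ||X_k||^2,
    where eps_k = d_k - W_k^T X_k."""
    X = np.hstack([np.ones((len(X), 1)), X])   # prepend a constant 1 for w0
    W = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xk, dk in zip(X, d):
            eps = dk - W @ xk                  # error before the update
            W += alpha * eps * xk / (xk @ xk)  # normalized LMS step
    return W

# Logical AND with targets -1/+1 (linearly separable).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
d = np.array([-1.0, -1.0, -1.0, 1.0])
W = alpha_lms(X, d)
print(np.sign(W[0] + X @ W[1:]))   # expected: [-1. -1. -1.  1.]
```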
  19. Backpropagation: α-LMS works for networks with a single layer, but what happens in networks with multiple layers? Backpropagation (Rumelhart, 1986): the most influential development in NNs in the 1980s. Here we present the method conceptually (the math details are in the papers). Let's assume a network with three neurons in the input layer and two neurons in the output layer.
  20. Backpropagation Strategy: compute the gradient of the error, $\hat{\nabla}_k = \frac{\partial \varepsilon_k^2}{\partial W_k}$, and adjust the weights in the direction opposite to the instantaneous error gradient. Now $W_k$ is a vector that contains all the weights of the net.
  21. Backpropagation Algorithm: (1) Insert a new example $X_k$ into the network and sweep it forward until getting the output y. (2) Compute the square error of this example: $\varepsilon_k^2 = \sum_{i=1}^{N_y} \varepsilon_{ik}^2 = \sum_{i=1}^{N_y} (d_{ik} - y_{ik})^2$; for example, for two outputs (disregarding k), $\varepsilon^2 = (d_1 - y_1)^2 + (d_2 - y_2)^2$. (3) Propagate the error to the previous layer (back-propagation). How? Steepest descent: compute the derivative δ of the square error for each Adaline. A sketch of the forward sweep follows.
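A minimal sketch of steps 1 and 2 (forward sweep and square error), assuming sigmoid Adalines; the shapes, weights, and inputs are illustrative, not from the slides.

```python
import numpy as np

def sgm(s):
    """Sigmoid activation, written 'sgm' as in the slides."""
    return 1.0 / (1.0 + np.exp(-s))

def forward(x, W1, W2):
    """Sweep an example forward: linear outputs s per layer, then the output y."""
    s1 = W1 @ np.append(1.0, x)         # layer-1 linear outputs (bias prepended)
    s2 = W2 @ np.append(1.0, sgm(s1))   # layer-2 linear outputs
    return s1, s2, sgm(s2)

def square_error(d, y):
    """eps^2 = sum_i (d_i - y_i)^2, step 2 of the algorithm."""
    return np.sum((d - y) ** 2)

# Illustrative net: 2 inputs, 3 Adalines in layer 1, 2 in the output layer.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 3)), rng.normal(size=(2, 4))
s1, s2, y = forward(np.array([0.5, -0.2]), W1, W2)
print(square_error(np.array([1.0, 0.0]), y))
```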
  22. Backpropagation Example: example borrowed from http://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html
  23. Backpropagation Example: 1. Sweep the example forward.
  24. Backpropagation Example: 2. Backpropagate the error.
  25. Backpropagation Example: 3. Modify the weights of each neuron.
  26. Backpropagation Example: 3.bis. Do the same for each neuron.
  27. Backpropagation Example: 3.bis2. Continue until reaching the output.
  28. Backpropagation for a Two-Layer Net. That is, the algorithm is: (1) Find the instantaneous square-error derivative $\delta_j^{(l)} = -\frac{1}{2} \frac{\partial \varepsilon^2}{\partial s_j^{(l)}}$. This tells us how sensitive the square output error of the network is to changes in the linear output s of the associated Adaline. (2) Expanding the error term we get $\delta_1^{(2)} = -\frac{1}{2} \frac{\partial \left[(d_1 - y_1)^2 + (d_2 - y_2)^2\right]}{\partial s_1^{(2)}} = -\frac{1}{2} \frac{\partial \left[d_1 - \mathrm{sgm}(s_1^{(2)})\right]^2}{\partial s_1^{(2)}}$. (3) And recognizing that $d_1$ is independent of $s_1^{(2)}$: $\delta_1^{(2)} = \left[d_1 - \mathrm{sgm}(s_1^{(2)})\right]\mathrm{sgm}'(s_1^{(2)}) = \varepsilon_1^{(2)}\,\mathrm{sgm}'(s_1^{(2)})$.
  29. Backpropagation for a Two-Layer Net. (4) Similarly, for the hidden layer we have $\delta_1^{(1)} = -\frac{1}{2} \frac{\partial \varepsilon^2}{\partial s_1^{(1)}} = -\frac{1}{2} \left( \frac{\partial \varepsilon^2}{\partial s_1^{(2)}} \frac{\partial s_1^{(2)}}{\partial s_1^{(1)}} + \frac{\partial \varepsilon^2}{\partial s_2^{(2)}} \frac{\partial s_2^{(2)}}{\partial s_1^{(1)}} \right)$. (5) That is, $\delta_1^{(1)} = \delta_1^{(2)} \frac{\partial s_1^{(2)}}{\partial s_1^{(1)}} + \delta_2^{(2)} \frac{\partial s_2^{(2)}}{\partial s_1^{(1)}}$. (6) Which yields $\delta_1^{(1)} = \delta_1^{(2)} \frac{\partial \left[ w_{10}^{(2)} + \sum_{i=1}^{3} w_{1i}^{(2)} \mathrm{sgm}(s_i^{(1)}) \right]}{\partial s_1^{(1)}} + \delta_2^{(2)} \frac{\partial \left[ w_{20}^{(2)} + \sum_{i=1}^{3} w_{2i}^{(2)} \mathrm{sgm}(s_i^{(1)}) \right]}{\partial s_1^{(1)}} = \delta_1^{(2)} w_{11}^{(2)} \mathrm{sgm}'(s_1^{(1)}) + \delta_2^{(2)} w_{21}^{(2)} \mathrm{sgm}'(s_1^{(1)}) = \left[ \delta_1^{(2)} w_{11}^{(2)} + \delta_2^{(2)} w_{21}^{(2)} \right] \mathrm{sgm}'(s_1^{(1)})$.
  30. Backpropagation for a Two-Layer Net: defining $\varepsilon_1^{(1)} \triangleq \delta_1^{(2)} w_{11}^{(2)} + \delta_2^{(2)} w_{21}^{(2)}$, we obtain $\delta_1^{(1)} = \varepsilon_1^{(1)}\,\mathrm{sgm}'(s_1^{(1)})$. Implementation details of each Adaline. A sketch of the full update follows.
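A minimal sketch of one full steepest-descent step for the two-layer net of the derivation, assuming sigmoid Adalines; the learning rate, shapes, initialization, and data are illustrative, not from the slides.

```python
import numpy as np

def sgm(s):
    return 1.0 / (1.0 + np.exp(-s))

def sgm_prime(s):
    y = sgm(s)
    return y * (1.0 - y)

def backprop_step(x, d, W1, W2, alpha=0.5):
    """One step of the derivation above:
    delta^(2) = eps^(2) * sgm'(s^(2));  delta^(1) = (W^(2)^T delta^(2)) * sgm'(s^(1))."""
    a0 = np.append(1.0, x)                   # input with bias
    s1 = W1 @ a0                             # layer-1 linear outputs
    a1 = np.append(1.0, sgm(s1))             # layer-1 activations with bias
    s2 = W2 @ a1                             # layer-2 linear outputs
    y = sgm(s2)
    delta2 = (d - y) * sgm_prime(s2)                  # output-layer deltas
    delta1 = (W2[:, 1:].T @ delta2) * sgm_prime(s1)   # backpropagated deltas
    W2 += alpha * np.outer(delta2, a1)       # updates opposite the error gradient
    W1 += alpha * np.outer(delta1, a0)
    return np.sum((d - y) ** 2)

# Illustrative run: 2 inputs, 3 Adalines in layer 1, 2 outputs, one example.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(scale=0.5, size=(3, 3)), rng.normal(scale=0.5, size=(2, 4))
x, d = np.array([0.4, -0.7]), np.array([1.0, 0.0])
for _ in range(5):
    print(backprop_step(x, d, W1, W2))       # squared error should decrease
```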
  31. Next Class: Support Vector Machines.
  32. Introduction to Machine Learning – Lecture 11: Neural Networks. Albert Orriols i Puig, aorriols@salle.url.edu. Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull
