Lecture 11 - Neural Networks
    Presentation Transcript

    • Introduction to Machine Learning, Lecture 11: Neural Networks
      Albert Orriols i Puig, aorriols@salle.url.edu
      Artificial Intelligence – Machine Learning
      Enginyeria i Arquitectura La Salle, Universitat Ramon Llull
    • Recap of Lectures 5-10: Data classification
      Decision trees (C4.5)
      Instance-based learners (kNN and CBR)
    • Recap of Lectures 5-10: Data classification
      Probabilistic-based learners: $P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$
      Linear/polynomial classifiers
    • Today's Agenda
      Why Neural Networks?
      Looking into a Brain
      Neural Networks: Starting from the Beginning: Perceptrons
      Multi-layer perceptrons
    • Why Neural Networks? Brain vs. machines
      Machines are tremendously faster than brains at well-defined problems: inverting matrices, solving differential equations, etc.
      Brains are tremendously faster and more accurate than machines at ill-defined problems that require a lot of processing, such as recognizing objects on TV.
      Let's simulate our brains with artificial neural networks!
      Massive parallelism: neurons interchanging signals
    • Looking into a Brain
      $10^{11}$ neurons of more than 20 different types
      0.001 seconds of neuron switching time
      $10^4$-$10^5$ connections per neuron
      0.1 seconds of scene recognition time
    • Artificial Neural Networks: Borrow some ideas from the nervous systems of animals
      $a_i = g(in_i) = g\!\left(\sum_j W_{j,i}\, a_j\right)$
      THE PERCEPTRON (McCulloch & Pitts)
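      As a concrete (non-slide) illustration, here is a minimal Python sketch of such a unit; the 0/1 hard threshold and the AND-computing weights are illustrative assumptions, not from the lecture:

        import numpy as np

        def g(s):
            # Hard-threshold activation, in the spirit of McCulloch & Pitts
            return 1 if s >= 0 else 0

        def unit_output(a, W_i):
            # a_i = g(in_i) = g(sum_j W_ji * a_j), with the bias folded into a
            return g(np.dot(W_i, a))

        # Illustrative weights that make the unit compute logical AND
        a = np.array([1.0, 1.0, 1.0])      # [bias input, x1, x2]
        W_i = np.array([-1.5, 1.0, 1.0])
        print(unit_output(a, W_i))          # -> 1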
    • Adaline (Adaptive Linear Element)
      Adaptive linear combiner cascaded with a hard-limiting quantizer
      Linear output transformed to binary by means of a threshold device
      Training = adjusting the weights
      Activation functions
    • Adaline Note that Adaline implements a function rr n f ( x , w) =w0 + ∑ xi wi i =1 This defines a threshold when the output is zero rr n f ( x , w) =w0 + ∑ xi wi =0 i =1 Slide 9 Artificial Intelligence Machine Learning
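      A sketch of the two pieces just described, the linear combiner and the hard-limiting quantizer, in a few lines of Python (the weight values are an arbitrary assumption for illustration):

        import numpy as np

        def f(x, w):
            # Linear combiner: f(x, w) = w0 + sum_i x_i * w_i
            return w[0] + np.dot(w[1:], x)

        def adaline_output(x, w):
            # Hard-limiting quantizer: +1 on one side of f(x, w) = 0, -1 on the other
            return 1 if f(x, w) >= 0 else -1

        w = np.array([-1.0, 2.0, 1.0])                  # [w0, w1, w2], chosen arbitrarily
        print(adaline_output(np.array([1.0, 0.5]), w))  # f = 1.5  -> +1
        print(adaline_output(np.array([0.0, 0.0]), w))  # f = -1.0 -> -1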
    • Adaline: Let's assume that we have two variables:
      $f(\vec{x}, \vec{w}) = w_0 + x_1 w_1 + x_2 w_2 = 0$
      Therefore
      $x_2 = -\frac{w_0}{w_2} - \frac{w_1}{w_2}\, x_1$
      So, Adaline is drawing a linear discriminant that divides the space into two regions: a linear classifier.
    • Adaline: So, we got a cool way to create linear classifiers. But are linear classifiers enough to tackle our problems? Can you draw a line that separates the examples of the white class from those of the black class in the last example?
    • Moving to More Flexible NNs: So, we want to classify problems such as XOR. Any idea? Polynomial discriminant functions. In this system:
      $f(\vec{x}, \vec{w}) = w_0 + x_1 w_1 + x_1^2 w_{11} + x_1 x_2 w_{12} + x_2^2 w_{22} + x_2 w_2 = 0$
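      A quick Python check that such a polynomial discriminant can indeed separate XOR; the weight values here are a hand-picked assumption (many other choices work):

        import numpy as np

        def expand(x1, x2):
            # Polynomial features of the slide: [1, x1, x1^2, x1*x2, x2^2, x2]
            return np.array([1.0, x1, x1**2, x1*x2, x2**2, x2])

        # Hand-picked weights: f = -0.5 + x1 - 2*x1*x2 + x2
        w = np.array([-0.5, 1.0, 0.0, -2.0, 0.0, 1.0])

        for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
            fx = np.dot(w, expand(x1, x2))
            print((x1, x2), "->", 1 if fx >= 0 else 0)   # prints the XOR truth table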
    • Moving to More Flexible NNs: With appropriate values of $\vec{w}$, I can fit data that is not linearly separable.
    • Even More Flexible: Multi-layer NN. So, we want to classify problems such as XOR. Any other idea? Madaline: multiple Adalines connected. This also enables the network to solve non-separable problems.
    • But Step Down... How Do I Learn w?
      We have seen that different structures enable me to define different functions. But the key is to get a proper estimation of w. There are many algorithms:
      Perceptron rule
      α-LMS
      α-perceptron
      May's algorithm
      Backpropagation
      We are going to see two examples: α-LMS and backpropagation.
    • Weight Learning in Adaline: Recall that we want to adjust w.
    • Weight Learning in Adaline: weight learning with the α-LMS algorithm
      Incrementally update the weights as $W_{k+1} = W_k + \alpha \frac{\varepsilon_k X_k}{\|X_k\|^2}$
      The error is the difference between the expected and the actual output: $\varepsilon_k = d_k - W_k^T X_k$
      A change in the weights affects the error: $\Delta\varepsilon_k = \Delta(d_k - W_k^T X_k) = -X_k^T \Delta W_k$
      And the weight change is $\Delta W_k = W_{k+1} - W_k = \alpha \frac{\varepsilon_k X_k}{\|X_k\|^2}$
      Therefore $\Delta\varepsilon_k = -\alpha \frac{\varepsilon_k X_k^T X_k}{\|X_k\|^2} = -\alpha\,\varepsilon_k$
    • Weight Learning in Adaline (cont.)
      $\Delta\varepsilon_k = -X_k^T \Delta W_k$, with $\Delta W_k = \alpha \frac{\varepsilon_k X_k}{\|X_k\|^2}$
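      Putting the update rule into code, a minimal α-LMS training loop might look like this; the AND-style toy data, step size, and epoch count are assumptions for illustration:

        import numpy as np

        def alpha_lms(X, d, alpha=0.5, epochs=100):
            # W_{k+1} = W_k + alpha * eps_k * X_k / ||X_k||^2, eps_k = d_k - W_k^T X_k
            W = np.zeros(X.shape[1])
            for _ in range(epochs):
                for X_k, d_k in zip(X, d):
                    eps_k = d_k - W @ X_k
                    W += alpha * eps_k * X_k / (X_k @ X_k)
            return W

        # Toy linearly separable problem (AND); the first column is the bias input
        X = np.array([[1., 0., 0.], [1., 0., 1.], [1., 1., 0.], [1., 1., 1.]])
        d = np.array([-1., -1., -1., 1.])
        W = alpha_lms(X, d)
        print(np.sign(X @ W))   # the signs should match d after training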
    • Backpropagation: α-LMS works for networks with a single layer. But what happens in networks with multiple layers?
      Backpropagation (Rumelhart, 1986): the most influential development of NNs in the 1980s.
      Here, we present the method conceptually (the mathematical details are in the papers).
      Let's assume a network with three neurons in the input layer and two neurons in the output layer.
    • Backpropagation Strategy
      Compute the gradient of the error: $\hat{\nabla}_k = \frac{\partial \varepsilon_k^2}{\partial W_k}$
      Adjust the weights in the direction opposite to the instantaneous error gradient. Now, $W_k$ is a vector that contains all the components of the net.
    • Backpropagation Algorithm
      1. Insert a new example $X_k$ into the network and sweep it forward until getting the output $y$.
      2. Compute the square error of this example: $\varepsilon_k^2 = \sum_{i=1}^{N_y} \varepsilon_{ik}^2 = \sum_{i=1}^{N_y} (d_{ik} - y_{ik})^2$. For example, for two outputs (disregarding $k$): $\varepsilon^2 = (d_1 - y_1)^2 + (d_2 - y_2)^2$.
      3. Propagate the error to the previous layer (back-propagation). How? Steepest descent: compute the derivative of the square error, $\delta$, for each Adaline.
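      These three steps can be sketched in Python for a small net with three Adalines in the first layer feeding two output Adalines; the sigmoid activation, learning rate, random initialization, and XOR-style data are all assumptions for illustration:

        import numpy as np

        def sgm(s):                       # sigmoid activation
            return 1.0 / (1.0 + np.exp(-s))

        rng = np.random.default_rng(0)
        W1 = rng.normal(size=(3, 3))      # first layer: 3 Adalines, 2 inputs + bias
        W2 = rng.normal(size=(2, 4))      # output layer: 2 Adalines, 3 inputs + bias
        alpha = 0.5

        def train_step(x, d):
            global W1, W2
            # 1. Sweep the example forward until getting the output y
            xb = np.append(1.0, x)
            s1 = W1 @ xb; a1b = np.append(1.0, sgm(s1))
            s2 = W2 @ a1b; y = sgm(s2)
            # 2. Square error of this example: sum_i (d_i - y_i)^2
            err = np.sum((d - y) ** 2)
            # 3. Back-propagate: deltas for the output layer, then the first layer
            d2 = (d - y) * sgm(s2) * (1 - sgm(s2))
            d1 = (W2[:, 1:].T @ d2) * sgm(s1) * (1 - sgm(s1))
            W2 += alpha * np.outer(d2, a1b)
            W1 += alpha * np.outer(d1, xb)
            return err

        # XOR with two outputs; the summed error should shrink over the epochs
        X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
        D = np.array([[1, 0], [0, 1], [0, 1], [1, 0]], float)
        for _ in range(5000):
            err = sum(train_step(x, d) for x, d in zip(X, D))
        print(round(err, 4))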
    • Backpropagation Example: borrowed from http://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html
    • Backpropagation Example: 1. Sweep the weights forward
    • Backpropagation Example: 2. Backpropagate the error
    • Backpropagation Example: 3. Modify the weights of each neuron
    • Backpropagation Example: 3 (bis). Do the same for each neuron
    • Backpropagation Example: 3 (bis2). Until reaching the output
    • Backpropagation for a Two-Layer Net. That is, the algorithm is:
      1. Find the instantaneous square error derivative:
      $\delta_j^{(l)} = -\frac{1}{2} \frac{\partial \varepsilon^2}{\partial s_j^{(l)}}$
      This tells us how sensitive the square output error of the network is to changes in the linear output $s$ of the associated Adaline.
      2. Expanding the error term we get:
      $\delta_1^{(2)} = -\frac{1}{2} \frac{\partial \left[(d_1 - y_1)^2 + (d_2 - y_2)^2\right]}{\partial s_1^{(2)}} = -\frac{1}{2} \frac{\partial \left[d_1 - \mathrm{sgm}(s_1^{(2)})\right]^2}{\partial s_1^{(2)}}$
      3. And recognizing that $d_1$ is independent of $s_1^{(2)}$:
      $\delta_1^{(2)} = \left[d_1 - \mathrm{sgm}(s_1^{(2)})\right] \mathrm{sgm}'(s_1^{(2)}) = \varepsilon_1^{(2)}\, \mathrm{sgm}'(s_1^{(2)})$
    • Backpropagation for a Two-Layer Net. (cont.)
      4. Similarly, for the hidden layer we have:
      $\delta_1^{(1)} = -\frac{1}{2} \frac{\partial \varepsilon^2}{\partial s_1^{(1)}} = -\frac{1}{2} \left( \frac{\partial \varepsilon^2}{\partial s_1^{(2)}} \frac{\partial s_1^{(2)}}{\partial s_1^{(1)}} + \frac{\partial \varepsilon^2}{\partial s_2^{(2)}} \frac{\partial s_2^{(2)}}{\partial s_1^{(1)}} \right)$
      5. That is:
      $\delta_1^{(1)} = \delta_1^{(2)} \frac{\partial s_1^{(2)}}{\partial s_1^{(1)}} + \delta_2^{(2)} \frac{\partial s_2^{(2)}}{\partial s_1^{(1)}}$
      6. Which yields:
      $\delta_1^{(1)} = \delta_1^{(2)} \frac{\partial}{\partial s_1^{(1)}} \left[ w_{10}^{(2)} + \sum_{i=1}^{3} w_{1i}^{(2)} \mathrm{sgm}(s_i^{(1)}) \right] + \delta_2^{(2)} \frac{\partial}{\partial s_1^{(1)}} \left[ w_{20}^{(2)} + \sum_{i=1}^{3} w_{2i}^{(2)} \mathrm{sgm}(s_i^{(1)}) \right]$
      $= \delta_1^{(2)} w_{11}^{(2)} \mathrm{sgm}'(s_1^{(1)}) + \delta_2^{(2)} w_{21}^{(2)} \mathrm{sgm}'(s_1^{(1)}) = \left[ \delta_1^{(2)} w_{11}^{(2)} + \delta_2^{(2)} w_{21}^{(2)} \right] \mathrm{sgm}'(s_1^{(1)})$
    • Backpropagation for a Two-Layer Net. (cont.)
      Defining $\varepsilon_1^{(1)} = \delta_1^{(2)} w_{11}^{(2)} + \delta_2^{(2)} w_{21}^{(2)}$, we obtain
      $\delta_1^{(1)} = \varepsilon_1^{(1)}\, \mathrm{sgm}'(s_1^{(1)})$
      Implementation details of each Adaline.
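      In code, the hidden-layer delta just derived is one dot product and one derivative evaluation; a tiny sketch, where every numeric value is hypothetical:

        import numpy as np

        def sgm(s):
            return 1.0 / (1.0 + np.exp(-s))

        delta_out = np.array([0.10, -0.04])   # hypothetical delta_1^(2), delta_2^(2)
        w_to_out = np.array([0.7, -1.2])      # hypothetical w_11^(2), w_21^(2)
        s1 = 0.3                              # hypothetical linear output s_1^(1)

        eps1 = delta_out @ w_to_out                   # eps_1^(1) = delta_1 w_11 + delta_2 w_21
        delta1 = eps1 * sgm(s1) * (1.0 - sgm(s1))     # delta_1^(1) = eps_1^(1) * sgm'(s_1^(1))
        print(delta1)                                 # approx. 0.0288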
    • Next Class: Support Vector Machines