1. Neural Networks. Prof. Ruchi Sharma, Bharatividyapeeth College of Engineering, New Delhi.
2. Plan: Requirements; Background (Brains and Computers; Computational Models); Learnability vs. Programming; Representability vs. Training Rules; Abstractions of Neurons; Abstractions of Networks; Completeness of 3-Level McCulloch-Pitts Neurons; Learnability in Perceptrons.
3. Brains and Computers. What is computation? The basic computational model (as in C, Pascal, C++, etc.; see the course on Models of Computation): analysis of the problem and reduction to an algorithm, then implementation of the algorithm by a programmer.
4. Computation. The brain also computes, but its programming seems different: it is flexible and fault tolerant (neurons die every day), it programs itself automatically, it learns from examples, and it generalizes.
5. Brain vs. Computer. The brain works with slow components (about 10^-3 sec), computers with fast components (about 10^-9 sec). The brain is more energy efficient (a few joules per operation, a factor of roughly 10^10) and uses a massively parallel mechanism. Can we copy its secrets?
6. Brain vs. Computer. Areas where the brain is better: sense recognition and integration, working with incomplete information, generalizing, learning from examples, fault tolerance (regular programming is notoriously fragile).
7. AI vs. NNs. AI relates to cognitive psychology: chess, theorem proving, expert systems, intelligent agents (e.g. on the Internet). NNs relate to neurophysiology: recognition tasks, associative memory, learning.
8. How can NNs work? Look at the brain: about 10^10 neurons (roughly 10 gigabytes), with 10^20 possible connections weighted by different numbers of dendrites (real values). In practice there are about 6 x 10^13 connections (i.e. some 60,000 hard discs for one snapshot of the contents of one brain!).
9. Brain: complex, non-linear (more later!), parallel processing, fault tolerant, adaptive, learns, generalizes, self-programs.
10. Abstracting. Note: size may not be crucial (Aplysia or the crab does many things). Look at simple structures first.
11. Real and Artificial Neurons.
12. One Neuron: the McCulloch-Pitts (integrate-and-fire) neuron. The real neuron is very complicated, but abstracting away the details we have: inputs x1, ..., xn with weights w1, ..., wn are summed (integrate), and the sum is compared to a threshold to produce the output.
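The abstraction in this figure is only a few lines of code. A minimal sketch of such a threshold unit (Python; the function name and the example weights are illustrative, not from the slides):

    # A minimal McCulloch-Pitts (threshold) neuron: output 1 if the weighted
    # sum of the inputs exceeds the threshold, else 0.
    def mp_neuron(inputs, weights, threshold):
        total = sum(w * x for w, x in zip(weights, inputs))  # integrate
        return 1 if total > threshold else 0                 # threshold

    # Example: two inputs with equal weights and threshold 1.5
    # (this choice anticipates the AND unit on the next slides).
    print(mp_neuron([1, 1], [1.0, 1.0], 1.5))  # -> 1
    print(mp_neuron([1, 0], [1.0, 1.0], 1.5))  # -> 0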
13. Representability. What functions can be represented by a network of McCulloch-Pitts neurons? Theorem: every logic function of an arbitrary number of variables can be represented by a three-level network of such neurons.
14. Proof idea. Show the simple functions: AND, OR, NOT, IMPLIES. Then recall that every logic function can be written in DNF (disjunctive normal form).
15. AND. Two inputs x1, x2 with weights 1.0 and 1.0, threshold 1.5: the unit fires only when both inputs are 1.
16. OR. Two inputs with weights 1.0 and 1.0, threshold 0.9: the unit fires when at least one input is 1.
18. NOT. A single input with weight -1.0, threshold -0.5: the unit fires exactly when the input is 0.
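These three units can be checked directly. A small self-contained sketch (Python; the helper name threshold_unit is mine) using exactly the weights and thresholds from slides 15-18:

    def threshold_unit(inputs, weights, threshold):
        return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0

    def AND(x1, x2): return threshold_unit([x1, x2], [1.0, 1.0], 1.5)
    def OR(x1, x2):  return threshold_unit([x1, x2], [1.0, 1.0], 0.9)
    def NOT(x):      return threshold_unit([x], [-1.0], -0.5)

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, "AND:", AND(x1, x2), "OR:", OR(x1, x2))
    print("NOT 0:", NOT(0), "NOT 1:", NOT(1))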
19. DNF and All Functions. Theorem: any logic (Boolean) function of any number of variables can be represented by a network of McCulloch-Pitts neurons; in fact, depth three suffices. Proof: write the function in DNF and implement it with the AND, OR, and NOT units above.
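As a concrete instance of the DNF construction, here is a sketch (Python; the helper names are mine) of a three-level network for XOR, a function that a single unit cannot represent (see slide 65): XOR(x1, x2) = (x1 AND NOT x2) OR (NOT x1 AND x2). Each DNF term becomes one second-level unit (a negated literal simply gets weight -1.0), and a final OR unit combines the terms.

    def threshold_unit(inputs, weights, threshold):
        return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0

    def xor_net(x1, x2):
        # Level 2: one unit per DNF term of XOR = (x1 & ~x2) | (~x1 & x2).
        t1 = threshold_unit([x1, x2], [1.0, -1.0], 0.5)   # x1 AND NOT x2
        t2 = threshold_unit([x1, x2], [-1.0, 1.0], 0.5)   # NOT x1 AND x2
        # Level 3: OR of the terms.
        return threshold_unit([t1, t2], [1.0, 1.0], 0.9)

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", xor_net(a, b))   # prints the XOR truth table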
20. Other Questions. What if we allow real numbers as inputs/outputs? What real functions can be represented? What if we replace the threshold with some other function, so the output is not in {0,1}? What functions can be represented then?
21. Representability and Generalizability.
22. Learnability and Generalizability. The previous theorem tells us that neural networks are potentially powerful, but it doesn't tell us how to use them. We desire simple networks with uniform training rules.
23. One Neuron (Perceptron). What can be represented by one neuron? Is there an automatic way to learn a function from examples?
24. Perceptron Training Rule. Loop: take an example and apply it to the network; if the answer is correct, return to Loop; if incorrect, go to Fix. Fix: adjust the network weights by the input example; go to Loop.
25. Example of Perceptron Learning. One-dimensional data: x1 = 1 (+), x2 = -0.5 (-), x3 = 3 (+), x4 = -2 (-). Expanded (augmented) vectors: y1 = (1, 1) (+), y2 = (-0.5, 1) (-), y3 = (3, 1) (+), y4 = (-2, 1) (-). Random initial weight: w1 = (-2.5, 1.75).
26. Graph of Learning. (Figure.)
27. Trace of Perceptron.
    w1·y1 = (-2.5, 1.75)·(1, 1) < 0: wrong, so w2 = w1 + y1 = (-1.5, 2.75).
    w2·y2 = (-1.5, 2.75)·(-0.5, 1) > 0: wrong, so w3 = w2 - y2 = (-1, 1.75).
    w3·y3 = (-1, 1.75)·(3, 1) < 0: wrong, so w4 = w3 + y3 = (2, 2.75).
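The arithmetic in this trace is easy to replay. A short self-contained sketch (Python; the variable names are mine) that reproduces exactly these three updates:

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    w = (-2.5, 1.75)                                            # initial weight w1
    examples = [((1, 1), +1), ((-0.5, 1), -1), ((3, 1), +1)]    # y1, y2, y3 with signs

    for y, sign in examples:
        s = dot(w, y)
        # A positive example should give s > 0, a negative one s <= 0.
        if (sign > 0 and s <= 0) or (sign < 0 and s > 0):
            w = tuple(wi + sign * yi for wi, yi in zip(w, y))   # add or subtract y
        print("w =", w)
    # Prints (-1.5, 2.75), (-1.0, 1.75), (2.0, 2.75), matching the trace above.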
28. Perceptron Convergence Theorem. If the concept is representable by a perceptron, then the perceptron learning rule will converge in a finite amount of time. (To be made precise and proved below.)
29. What is a Neural Network? What is an abstract neuron? What is a neural network? How are they computed? What are the advantages? Where can they be used? Agenda; what to expect.
30. Perceptron Algorithm.
    Start: choose an arbitrary value for the weight vector W.
    Test: choose an arbitrary example X. If X is positive and WX > 0, or X is negative and WX <= 0, go to Test.
    Fix: if X is positive, W := W + X; if X is negative, W := W - X. Go to Test.
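Start/Test/Fix translates directly into code. A minimal sketch (Python; the function name, the random choice of examples, and the cap on the number of fixes are my additions, so that the loop always terminates even on bad data):

    import random

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def perceptron(pos, neg, dim, max_fixes=10_000):
        """pos, neg: lists of augmented example vectors (tuples). Returns (W, fixes)."""
        w = [0.0] * dim                       # Start: an initial weight vector
        fixes = 0
        while fixes < max_fixes:
            # Test: done when every example is classified correctly.
            if all(dot(w, x) > 0 for x in pos) and all(dot(w, x) <= 0 for x in neg):
                return w, fixes
            x = random.choice(pos + neg)      # choose an arbitrary example
            if x in pos and dot(w, x) <= 0:   # Fix for a misclassified positive example
                w = [wi + xi for wi, xi in zip(w, x)]
                fixes += 1
            elif x in neg and dot(w, x) > 0:  # Fix for a misclassified negative example
                w = [wi - xi for wi, xi in zip(w, x)]
                fixes += 1
        return w, fixes
    # For a worked run on the AND data of slides 32-34, see the sketch after slide 34.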
31. Perceptron Convergence Theorem (statement). Let F be a set of unit-length vectors. If there is a vector V* and a value δ > 0 such that V*·X > δ for all X in F, then the perceptron program goes to Fix only a finite number of times.
32. Applying the Algorithm to AND. W0 = (0, 0, 1) (or random). Augmented examples (x1, x2, bias): X1 = (0, 0, 1), result 0; X2 = (0, 1, 1), result 0; X3 = (1, 0, 1), result 0; X4 = (1, 1, 1), result 1.
33. AND, continued (indices occasionally jump because tests that pass are not all shown):
    W0·X1 = 1 > 0: wrong, so W1 = W0 - X1 = (0, 0, 0).
    W1·X2 = 0: OK (boundary). W1·X3 = 0: OK. W1·X4 = 0: wrong, so W2 = W1 + X4 = (1, 1, 1).
    W3·X1 = 1: wrong, so W4 = W3 - X1 = (1, 1, 0).
    W4·X2 = 1: wrong, so W5 = W4 - X2 = (1, 0, -1).
    W5·X3 = 0: OK. W5·X4 = 0: wrong, so W6 = W5 + X4 = (2, 1, 0).
    W6·X1 = 0: OK. W6·X2 = 1: wrong, so W7 = W6 - X2 = (2, 0, -1).
34. AND, page 3:
    W8·X3 = 1: wrong, so W9 = W8 - X3 = (1, 0, 0).
    W9·X4 = 1: OK. W9·X1 = 0: OK. W9·X2 = 0: OK.
    W9·X3 = 1: wrong, so W10 = W9 - X3 = (0, 0, -1).
    W10·X4 = -1: wrong, so W11 = W10 + X4 = (1, 1, 0).
    W11·X1 = 0: OK. W11·X2 = 1: wrong, so W12 = W11 - X2 = (1, 0, -1).
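For comparison with the trace above, a small self-contained run that cycles through X1..X4 in order (the cyclic presentation order is an assumption; the slides do not say how examples are chosen) and prints the weights after every Fix, continuing until all four examples are classified correctly:

    examples = [((0, 0, 1), 0), ((0, 1, 1), 0), ((1, 0, 1), 0), ((1, 1, 1), 1)]  # (X, AND result)
    w = [0, 0, 1]                                  # W0 from slide 32

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    done = False
    while not done:
        done = True
        for x, target in examples:
            s = dot(w, x)
            if target == 1 and s <= 0:             # misclassified positive example
                w = [wi + xi for wi, xi in zip(w, x)]
                done = False
                print("fix +", w)
            elif target == 0 and s > 0:            # misclassified negative example
                w = [wi - xi for wi, xi in zip(w, x)]
                done = False
                print("fix -", w)
    print("final weights:", w)                     # converges after a handful of passes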
35. Proof of the Convergence Theorem. Preliminaries:
    1. By hypothesis, there is a δ > 0 such that V*·X > δ for all X ∈ F.
    2. We can eliminate the threshold by adding an extra dimension to the input: W·(x, y, z) > θ if and only if (W, -θ)·(x, y, z, 1) > 0.
    3. We can assume all examples are positive (replace each negative example by its negation): W·(x, y, z) <= 0 if and only if W·(-x, -y, -z) >= 0.
36. Proof (cont.). Consider the quotient V*·W / |W| (this is the multidimensional cosine of the angle between V* and W; recall V* is a unit vector), so the quotient is <= 1.
37. Proof (cont.). Each time Fix is visited, W changes by adding the current example X:
    V*·W(n+1) = V*·(W(n) + X) = V*·W(n) + V*·X >= V*·W(n) + δ.
    Hence, after n fixes, V*·W(n) >= n·δ.   (*)
38. Proof (cont.). Now consider the denominator:
    |W(n+1)|^2 = (W(n) + X)·(W(n) + X) = |W(n)|^2 + 2·W(n)·X + |X|^2 <= |W(n)|^2 + 1
    (recall |X| = 1, and W(n)·X <= 0 since X is a positive example and we are in Fix).
    So after n fixes, |W(n)|^2 <= n (taking the initial W to be 0).   (**)
39. Proof (cont.). Putting (*) and (**) together:
    quotient = V*·W(n) / |W(n)| >= n·δ / sqrt(n) = sqrt(n)·δ.
    Since the quotient is <= 1, this means n <= (1/δ)^2, so we enter Fix only a bounded number of times. Q.E.D.
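The bound on the number of fixes can also be checked empirically. A sketch (Python with NumPy; the dimension, margin, sample size, and the way examples are generated are all my choices, not from the slides) that builds unit-length positive examples with margin δ against a known unit vector V*, runs the Fix loop, and compares the number of fixes with 1/δ^2:

    import numpy as np

    rng = np.random.default_rng(0)
    d, delta = 5, 0.2
    v_star = np.zeros(d); v_star[0] = 1.0          # a unit "target" vector

    # Sample unit vectors and keep those with margin v*.x > delta (all treated as positive).
    F = []
    while len(F) < 200:
        x = rng.normal(size=d)
        x /= np.linalg.norm(x)
        if v_star @ x > delta:
            F.append(x)

    w = np.zeros(d)
    fixes = 0
    while True:
        wrong = [x for x in F if w @ x <= 0]       # examples that would send us to Fix
        if not wrong:
            break
        w = w + wrong[0]                           # Fix: add a misclassified example
        fixes += 1

    print("fixes:", fixes, "bound 1/delta^2:", 1 / delta**2)  # fixes never exceed the bound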
40. Geometric Proof. See hand slides.
41. Perceptron Diagram 1. (Figure: the region of solution weight vectors, marked OK, bordered by regions containing no examples.)
42. Perceptron Diagram 2. (Same figure, repeated.)
44. Additional Facts. Note: if the X's are presented in a systematic way, then a solution W is always found. Note: it is not necessarily the same as V*. Note: if F is not finite, we may not obtain a solution in finite time. The algorithm can be modified in minor ways and remains valid (e.g. bounded rather than unit-length examples, with corresponding changes in the bounds on W(n)).
45. Perceptron Convergence Theorem (recap). If the concept is representable by a perceptron, then the perceptron learning rule will converge in a finite amount of time.
46. Important Points. The theorem only guarantees a result IF the concept is representable! Usually we are not interested merely in representing something but in its generalizability: how will it work on examples we haven't seen?
47. Percentage of Boolean Functions Representable by a Perceptron.
    Inputs   Representable by a perceptron   All Boolean functions
      1                      4                          4
      2                     14                         16
      3                    104                        256
      4                  1,882                     65,536
      5                 94,572                     ~10^9
      6             15,028,134                     ~10^19
      7          8,378,070,864                     ~10^38
      8     17,561,539,552,946                     ~10^77
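The first rows of this table are small enough to verify by brute force. A sketch (Python; the finite weight and threshold grids are my assumption, and a grid search can only undercount what is representable, never overcount) that enumerates all 16 Boolean functions of 2 inputs and checks which are realizable by a single threshold unit:

    from itertools import product

    inputs = list(product((0, 1), repeat=2))            # (0,0), (0,1), (1,0), (1,1)
    weight_grid = [w / 2 for w in range(-4, 5)]         # -2.0, -1.5, ..., 2.0
    threshold_grid = [t / 2 for t in range(-5, 6)]      # -2.5, ..., 2.5

    def realizable(truth_table):
        """Is there a single threshold unit computing this 2-input function?"""
        for w1, w2, theta in product(weight_grid, weight_grid, threshold_grid):
            if all((w1 * x1 + w2 * x2 > theta) == bool(out)
                   for (x1, x2), out in zip(inputs, truth_table)):
                return True
        return False

    count = sum(realizable(tt) for tt in product((0, 1), repeat=4))
    print(count, "of 16 two-input Boolean functions are representable")   # expect 14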
48. Generalizability. Typically we train a network on a sample set of examples and then use it on the general class. Training can be slow, but execution is fast.
49. Perceptron. A single trained unit with weights w_i: sum_i w_i x_i > θ  ⇔  "the letter A is in the receptive field". This is pattern identification. (Note: the neuron is trained.)
50. What won't work? Example: connectedness with a bounded-diameter perceptron. Compare with convexity (which can be done using sensors of order three).
51. Biases and Thresholds. We can replace the threshold with a bias:
    sum_{i=1..n} w_i x_i > θ  ⇔  sum_{i=1..n} w_i x_i - θ > 0  ⇔  sum_{i=0..n} w_i x_i > 0,  where w_0 = -θ and x_0 = 1.
    A bias acts exactly like a weight on a connection from a unit whose activation is always 1.
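The equivalence can be checked numerically; a tiny sketch (Python; the particular weights, inputs, and threshold are arbitrary illustrative values):

    w = [0.3, -1.2, 0.7]          # arbitrary weights
    x = [1, 0, 1]                 # arbitrary inputs
    theta = 0.25                  # arbitrary threshold

    lhs = sum(wi * xi for wi, xi in zip(w, x)) > theta          # thresholded unit
    w_aug = [-theta] + w          # w0 = -theta
    x_aug = [1] + x               # x0 = 1 (always-on unit)
    rhs = sum(wi * xi for wi, xi in zip(w_aug, x_aug)) > 0      # bias form

    print(lhs, rhs)               # always equal, whatever w, x, theta are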
52. Perceptron (training loop). Loop: take an example and apply it to the network; if the answer is correct, return to Loop; if incorrect, go to Fix. Fix: adjust the network weights by the input example; go to Loop.
53. Perceptron Algorithm (formal version).
    Start: let v be arbitrary.
    Choose: choose x ∈ F.
    Test: if x ∈ F+ and v·x > 0, go to Choose;
          if x ∈ F+ and v·x <= 0, go to Fix-plus;
          if x ∈ F- and v·x <= 0, go to Choose;
          if x ∈ F- and v·x > 0, go to Fix-minus.
    Fix-plus: v := v + x; go to Choose.
    Fix-minus: v := v - x; go to Choose.
54. Perceptron Algorithm: conditions for convergence.
    Condition 1: there exists v* such that v*·x > 0 for every x ∈ F+ and v*·x <= 0 for every x ∈ F-.
    Condition 2: F is a set of unit vectors.
55. Geometric viewpoint. (Figure: the update w_{n+1} = w_n + x, shown relative to v*.)
56. Perceptron Algorithm. Under these conditions, Fix is entered only a finite number of times. Setup for the proof: the example world is F = F+ ∪ F-, where F+ = {x : A*·x > 0} are the positive examples and F- = {y : A*·y <= 0} are the negative examples.
57. Proof. We replace the threshold with a bias, and we assume F is a set of unit vectors: for every φ ∈ F, |φ| = 1.
58. Proof. We simplify what must be proved by eliminating the negative examples and placing their negations among the positive ones (A*·y <= 0  ⇔  A*·(-y) >= 0, i.e. y → -y), so F = F+. Then A*·φ > 0 for every φ ∈ F, and (for finite F) there is a δ > 0 with A*·φ > δ for every φ ∈ F.
59. Proof. Consider G(A) = (A*·A) / (|A*| |A|) = (A*·A) / |A| <= 1, since A* is a unit vector.
    The numerator: A*·A_{t+1} = A*·(A_t + φ) = A*·A_t + A*·φ >= A*·A_t + δ, because A*·φ > δ.
    After n changes, A*·A_n >= n·δ.
60. Proof. The denominator: |A_{t+1}|^2 = (A_t + φ)·(A_t + φ) = |A_t|^2 + 2·A_t·φ + |φ|^2 <= |A_t|^2 + 1,
    since A_t·φ <= 0 when we fix and |φ|^2 = 1. After n changes, |A_n|^2 <= n.
61. Proof. From the denominator, |A_n|^2 <= n; from the numerator, A*·A_n >= n·δ. So
    G(A_n) = (A*·A_n) / |A_n| >= n·δ / sqrt(n) = sqrt(n)·δ, and since G(A_n) <= 1 we get n <= 1/δ^2,
    so n is finite.
62. Example: AND (with a bias input).
    Truth table (x1, x2, bias -> AND): (0,0,1) -> 0; (0,1,1) -> 0; (1,0,1) -> 0; (1,1,1) -> 1.
    w0 = (0, 0, 0).
    w0·x0 = w0·(0,0,1) = 0: OK. w0·x1 = w0·(0,1,1) = 0: OK. w0·x2 = w0·(1,0,1) = 0: OK.
    w0·x3 = w0·(1,1,1) = 0: wrong, so Fix: w1 = w0 + x3 = (1, 1, 1).
    w1·x0 = w1·(0,0,1) = 1: wrong, so Fix: w2 = w1 - x0 = (1, 1, 0).
    w2·x1 = w2·(0,1,1) = 1: wrong, so Fix: w3 = w2 - x1 = (1, 0, -1).
    ... etc.
63. AND, bipolar (-1/+1) version: a worked table of the same updates on bipolar inputs and targets, again ending in success (a separating weight vector is found).
64. Problem (why some predicates are not representable). Suppose a single unit had to satisfy, for three patterns built from a left part (LHS), a middle part (MHS), and a right part (RHS):
    LHS1 + MHS1 + RHS1 > 0,   LHS2 + MHS2 + RHS2 < 0,   LHS3 + MHS3 + RHS3 > 0,
    with MHS1 = MHS2 = MHS3, LHS1 = LHS2 and RHS2 = RHS3. Dropping the common middle contribution, we need
    LHS1 + RHS1 > 0, LHS2 + RHS2 < 0, LHS3 + RHS3 > 0.
    LHS2 should be small enough that LHS2 + RHS2 < 0, and RHS2 should also be small enough that LHS2 + RHS2 < 0, while the first and third constraints force both to be large. Contradiction!
65. Linear Separation. Every perceptron determines a classification of its input vectors given by a hyperplane. Two-dimensional examples (add the algebra): OR and AND are possible, XOR is not.
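To make the "XOR not possible" claim concrete, here is a small self-contained sketch (Python; the update cap and the cyclic presentation order are my choices) that runs the same Fix rule on the XOR truth table with a bias input. Unlike the AND run above, it keeps making corrections no matter how long it runs:

    xor_examples = [((0, 0, 1), 0), ((0, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 1), 0)]
    w = [0, 0, 0]
    fixes = 0

    for _ in range(10_000):                      # cap the run; XOR never converges
        changed = False
        for x, target in xor_examples:
            s = sum(wi * xi for wi, xi in zip(w, x))
            if target == 1 and s <= 0:
                w = [wi + xi for wi, xi in zip(w, x)]; fixes += 1; changed = True
            elif target == 0 and s > 0:
                w = [wi - xi for wi, xi in zip(w, x)]; fixes += 1; changed = True
        if not changed:
            break

    print("fixes:", fixes, "final w:", w, "consistent:", not changed)
    # For XOR this prints consistent: False; the AND data of slide 32 converges quickly.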
66. Linear Separation in Higher Dimensions. In higher dimensions the boundary is still a linear separation (a hyperplane), but it is hard to visualize. Example: connectedness vs. convexity; which can be handled by a perceptron with local sensors, and which cannot? (Note: define local sensors.)
67. What won't work? Try XOR.
68. Limitations of the Perceptron. Representability: only concepts that are linearly separable. Compare convex versus connected; example: XOR vs. OR.