Neural Networks

Transcript

  • 1. Neural Networks. Prof. Ruchi Sharma, Bharati Vidyapeeth College of Engineering, New Delhi.
  • 2. Plan: Requirements; Background (Brains and Computers, Computational Models, Learnability vs. Programming, Representability vs. Training Rules); Abstractions of Neurons; Abstractions of Networks (Completeness of 3-Level McCulloch-Pitts Neurons); Learnability in Perceptrons.
  • 3. Brains and Computers: What is computation? The basic computational model (like C, Pascal, C++, etc.; see the course on Models of Computation): analysis of the problem and reduction to an algorithm, then implementation of the algorithm by a programmer.
  • 4. Computation: The brain also computes, but its programming seems different. It is flexible and fault tolerant (neurons die every day), it programs itself automatically, it learns from examples, and it generalizes.
  • 5. Brain vs. Computer: The brain works with slow components (10**-3 sec); computers use fast components (10**-9 sec). The brain is more efficient (a few joules per operation, a factor of about 10**10), and it uses a massively parallel mechanism. Can we copy its secrets?
  • 6. Brain vs. Computer: Areas where the brain is better: sense recognition and integration, working with incomplete information, generalizing, learning from examples, fault tolerance (regular programming is notoriously fragile).
  • 7. AI vs. NNs: AI relates to cognitive psychology (chess, theorem proving, expert systems, intelligent agents, e.g. on the Internet). NNs relate to neurophysiology (recognition tasks, associative memory, learning).
  • 8. How can NNs work? Look at the brain: 10**10 neurons (10 gigabytes) and 10**20 possible connections, with different numbers of dendrites (real-valued). Actually about 6 x 10**13 connections (i.e. 60,000 hard discs for one snapshot of the contents of one brain!).
  • 9. Brain: complex, non-linear (more later!), parallel processing, fault tolerant, adaptive; it learns, generalizes, and programs itself.
  • 10. Abstracting: Note that size may not be crucial (an Aplysia or a crab does many things). Look at simple structures first.
  • 11. Real and Artificial Neurons
  • 12. One Neuron (McCulloch-Pitts): The real neuron is very complicated, but abstracting away the details we have inputs x1 … xn with weights w1 … wn feeding a summation Σ followed by a threshold: the integrate-and-fire neuron.
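    To make the abstraction on slide 12 concrete, here is a minimal Python sketch (mine, not the lecture's; the function name and example values are illustrative): the unit integrates a weighted sum of its inputs and fires, i.e. outputs 1, only when that sum exceeds the threshold.

        def mcculloch_pitts(inputs, weights, threshold):
            """Integrate-and-fire unit: 1 if the weighted sum exceeds the threshold, else 0."""
            total = sum(w * x for w, x in zip(weights, inputs))   # integrate
            return 1 if total > threshold else 0                  # threshold / fire

        # Illustration with two inputs, unit weights, and threshold 1.5 (the AND setting of slide 15).
        print(mcculloch_pitts([1, 1], [1.0, 1.0], 1.5))   # prints 1
        print(mcculloch_pitts([1, 0], [1.0, 1.0], 1.5))   # prints 0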
  • 13. Representability: What functions can be represented by a network of McCulloch-Pitts neurons? Theorem: Every logic function of an arbitrary number of variables can be represented by a three-level network of such neurons.
  • 14. Proof: Exhibit the simple functions AND, OR, NOT, IMPLIES, and recall the representability of logic functions in DNF form.
  • 15. AND, OR, NOT: the AND unit has inputs x1, x2 with weights w1 = w2 = 1.0 feeding the integrate-and-threshold unit, threshold 1.5.
  • 16. AND, OR, NOT: the OR unit has inputs x1, x2 with weights w1 = w2 = 1.0, threshold 0.9.
  • 17. (Diagram only.)
  • 18. AND, OR, NOT: the NOT unit has a single input x1 with weight w1 = -1.0, threshold -0.5.
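    The weight and threshold settings sketched on slides 15 to 18 can be checked by enumeration. A small sketch, assuming a helper fires() of my own naming: weights (1.0, 1.0) with threshold 1.5 give AND, the same weights with threshold 0.9 give OR, and a single weight -1.0 with threshold -0.5 gives NOT.

        def fires(xs, ws, theta):
            """McCulloch-Pitts unit: 1 if the weighted sum of xs under ws exceeds theta."""
            return 1 if sum(w * x for w, x in zip(ws, xs)) > theta else 0

        for x1 in (0, 1):
            for x2 in (0, 1):
                a = fires((x1, x2), (1.0, 1.0), 1.5)   # AND: only 1 + 1 = 2 exceeds 1.5
                o = fires((x1, x2), (1.0, 1.0), 0.9)   # OR: any single 1 exceeds 0.9
                print(x1, x2, "AND:", a, "OR:", o)
        for x1 in (0, 1):
            print("NOT", x1, "->", fires((x1,), (-1.0,), -0.5))   # 0 -> 1, 1 -> 0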
  • 19. DNF and All Functions. Theorem: Any logic (boolean) function of any number of variables can be represented by a network of McCulloch-Pitts neurons; in fact a depth of three suffices. Proof: Write the function in DNF and implement it with the AND, OR, and NOT units above.
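    A small worked instance of the DNF construction (not spelled out on the slide): XOR written in disjunctive normal form, which then maps directly onto a three-level network.

        \[
        x_1 \oplus x_2 \;=\; (x_1 \wedge \neg x_2) \vee (\neg x_1 \wedge x_2)
        \]

    One level of NOT units (slide 18) supplies the negated literals, one AND unit (slide 15) per disjunct detects its conjunction, and a single OR unit (slide 16) combines the disjuncts; in variants where the negations are folded into the AND units' weights, the same construction needs only two layers of units after the inputs.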
  • 20. Other Questions: What if we allow REAL numbers as inputs/outputs? What real functions can be represented? What if we replace the threshold by some other function, so the output is not in {0,1}? What functions can be represented then?
  • 21. Representability and Generalizability
  • 22. Learnability and Generalizability: The previous theorem tells us that neural networks are potentially powerful, but it doesn't tell us how to use them. We want simple networks with uniform training rules.
  • 23. One Neuron (Perceptron): What can be represented by one neuron? Is there an automatic way to learn a function from examples?
  • 24. Perceptron Training Rule. Loop: Take an example and apply it to the network. If the answer is correct, return to Loop; if incorrect, go to Fix. Fix: Adjust the network weights by the input example, then go to Loop.
  • 25. Example of Perceptron Learning: x1 = 1 (+), x2 = -.5 (-), x3 = 3 (+), x4 = -2 (-). Expanded vectors: y1 = (1,1) (+), y2 = (-.5,1) (-), y3 = (3,1) (+), y4 = (-2,1) (-). Random initial weight w1 = (-2.5, 1.75).
  • 26. Graph of Learning. (Diagram.)
  • 27. Trace of the Perceptron: w1·y1 = (-2.5,1.75)·(1,1) < 0, wrong, so w2 = w1 + y1 = (-1.5, 2.75); w2·y2 = (-1.5,2.75)·(-.5,1) > 0, wrong, so w3 = w2 - y2 = (-1, 1.75); w3·y3 = (-1,1.75)·(3,1) < 0, wrong, so w4 = w3 + y3 = (2, 2.75).
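    A minimal sketch (assumed code, not from the lecture) that replays this trace: it cycles through the expanded examples y1 to y4 with the initial weight (-2.5, 1.75) and applies the training rule until a full pass needs no fix. The first three fixes reproduce w2 = (-1.5, 2.75), w3 = (-1, 1.75), and w4 = (2, 2.75) from slide 27.

        # Expanded examples (input, bias) with labels: +1 = positive, -1 = negative.
        examples = [((1.0, 1.0), +1), ((-0.5, 1.0), -1), ((3.0, 1.0), +1), ((-2.0, 1.0), -1)]
        w = [-2.5, 1.75]                               # random initial weight from slide 25

        def dot(a, b):
            return sum(ai * bi for ai, bi in zip(a, b))

        changed = True
        while changed:                                 # repeat until a full pass makes no fixes
            changed = False
            for y, label in examples:
                s = dot(w, y)
                if label > 0 and s <= 0:               # positive example misclassified: add it
                    w = [wi + yi for wi, yi in zip(w, y)]
                    changed = True
                    print("fix ->", w)
                elif label < 0 and s > 0:              # negative example misclassified: subtract it
                    w = [wi - yi for wi, yi in zip(w, y)]
                    changed = True
                    print("fix ->", w)
        print("final weights:", w)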
  • 28. Perceptron Convergence Theorem: If the concept is representable by a perceptron, then the perceptron learning rule converges in a finite amount of time. (To be made precise and proved.)
  • 29. What is a Neural Network? What is an abstract neuron? What is a neural network? How are they computed? What are the advantages? Where can they be used? (Agenda; what to expect.)
  • 30. Perceptron Algorithm. Start: Choose arbitrary values for the weights W. Test: Choose an arbitrary example X; if X is positive and WX > 0, or X is negative and WX <= 0, go to Test; otherwise go to Fix. Fix: If X is positive, W := W + X; if X is negative, W := W - X; go to Test.
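    The Start/Test/Fix procedure of slide 30 as a hedged Python sketch; the function name, the ±1 labelling of examples, and the cycling order are my own choices, not the lecture's.

        def perceptron(examples, w):
            """examples: list of (vector, label) pairs with label +1 or -1; w: starting weights.
            Returns weights with w.x > 0 on positives and w.x <= 0 on negatives, provided the
            data are separable (otherwise the loop never ends)."""
            def dot(a, b):
                return sum(ai * bi for ai, bi in zip(a, b))
            while True:
                fixed = False
                for x, label in examples:                     # Test: take an example
                    s = dot(w, x)
                    if label > 0 and s <= 0:                  # Fix: positive example, W := W + X
                        w = [wi + xi for wi, xi in zip(w, x)]
                        fixed = True
                    elif label < 0 and s > 0:                 # Fix: negative example, W := W - X
                        w = [wi - xi for wi, xi in zip(w, x)]
                        fixed = True
                if not fixed:                                 # a whole pass without a fix: done
                    return w

        # Example use: perceptron([((1.0, 1.0), +1), ((-0.5, 1.0), -1)], [0.0, 0.0])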
  • 31. Perceptron Convergence Theorem: Let F be a set of unit-length vectors. If there is a vector V* and a value e > 0 such that V*·X > e for all X in F, then the perceptron program goes to Fix only a finite number of times.
  • 32. Applying the Algorithm to AND. W0 = (0,0,1) (or random). Examples (x1, x2, bias): X1 = (0,0,1), result 0; X2 = (0,1,1), result 0; X3 = (1,0,1), result 0; X4 = (1,1,1), result 1.
  • 33. “And” continued: W0·X1 = 1 > 0, wrong (X1 is a negative example), so W1 = W0 - X1 = (0,0,0). W1·X2 = 0, OK (boundary); W1·X3 = 0, OK; W1·X4 = 0, wrong (X4 is positive), so W2 = W1 + X4 = (1,1,1). W2·X1 = 1, wrong, so W3 = W2 - X1 = (1,1,0). W3·X2 = 1, wrong, so W4 = W3 - X2 = (1,0,-1). W4·X3 = 0, OK; W4·X4 = 0, wrong, so W5 = W4 + X4 = (2,1,0). W5·X1 = 0, OK; W5·X2 = 1, wrong, so W6 = W5 - X2 = (2,0,-1).
  • 34. “And”, page 3: W6·X3 = 1, wrong, so W7 = W6 - X3 = (1,0,-2). W7·X4 = -1, wrong, so W8 = W7 + X4 = (2,1,-1). W8·X1 = -1, OK; W8·X2 = 0, OK; W8·X3 = 1, wrong, so W9 = W8 - X3 = (1,1,-2). W9·X4 = 0, wrong, so W10 = W9 + X4 = (2,2,-1). W10·X1 = -1, OK; W10·X2 = 1, wrong, so W11 = W10 - X2 = (2,1,-2). W11 classifies all four examples correctly, so training stops.
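    A self-contained sketch (assumed, not the lecture's code) that runs the same rule on the AND data of slide 32, with the bias folded in as a third input that is always 1. With the cyclic visiting order used above, the loop stops at w = (2, 1, -2), which fires only on (1, 1, 1); a different visiting order can end at a different separating vector.

        # AND with a bias component: (x1, x2, 1) -> label +1 for output 1, -1 for output 0.
        data = [((0, 0, 1), -1), ((0, 1, 1), -1), ((1, 0, 1), -1), ((1, 1, 1), +1)]
        w = [0, 0, 1]                                         # W0 from slide 32

        def dot(a, b):
            return sum(ai * bi for ai, bi in zip(a, b))

        done = False
        while not done:
            done = True
            for x, label in data:
                s = dot(w, x)
                if (label > 0 and s <= 0) or (label < 0 and s > 0):
                    step = 1 if label > 0 else -1             # add positives, subtract negatives
                    w = [wi + step * xi for wi, xi in zip(w, x)]
                    done = False
                    print("fix ->", w)
        print("final weights:", w)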
  • 35. Proof of the Convergence Theorem. Notes: (1) By hypothesis there is a δ > 0 such that V*·X > δ for all X in F. (2) We can eliminate the threshold by adding an extra dimension to the input: W·(x,y,z) > threshold if and only if W'·(x,y,z,1) > 0, where W' appends the weight -threshold to W. (3) We can assume all examples are positive (replace each negative example by its negation): W·(x,y,z) < 0 if and only if W·(-x,-y,-z) > 0.
  • 36. Proof (cont.): Consider the quotient V*·W/|W| (note: this is the multidimensional cosine between V* and W). Recall that V* is a unit vector, so the quotient is ≤ 1.
  • 37. Proof (cont.): Each time Fix is visited, W changes by an addition: V*·W(n+1) = V*·(W(n) + X) = V*·W(n) + V*·X ≥ V*·W(n) + δ. Hence V*·W(n) ≥ nδ.  (*)
  • 38. Proof (cont.): Now consider the denominator: |W(n+1)|² = (W(n) + X)·(W(n) + X) = |W(n)|² + 2 W(n)·X + 1 (recall |X| = 1, and W(n)·X ≤ 0 since X is a positive example and we are in Fix), so |W(n+1)|² ≤ |W(n)|² + 1. After n fixes, |W(n)|² ≤ n.  (**)
  • 39. Proof (cont.): Putting (*) and (**) together: the quotient V*·W(n)/|W(n)| ≥ nδ/√n = √n·δ. Since the quotient is ≤ 1, this means n ≤ (1/δ)². So we enter Fix only a bounded number of times. Q.E.D.
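    The two bounds and the conclusion, restated compactly in LaTeX (reconstructed from slides 37 to 39, taking W(0) = 0 so that the denominator bound holds):

        \[
        V^{*}\cdot W(n) \ \ge\ n\delta \quad (*)
        \qquad\qquad
        \lVert W(n)\rVert^{2} \ \le\ n \quad (**)
        \]
        \[
        1 \ \ge\ \frac{V^{*}\cdot W(n)}{\lVert W(n)\rVert}
          \ \ge\ \frac{n\delta}{\sqrt{n}} \ =\ \sqrt{n}\,\delta
        \qquad\Longrightarrow\qquad
        n \ \le\ \frac{1}{\delta^{2}}.
        \]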
  • 40. Geometric Proof: see the hand slides.
  • 41. Perceptron Diagram 1: a region marked “Solution OK” bounded by regions marked “No examples”. (Diagram.)
  • 42–43. (Further diagram slides.)
  • 44. Additional Facts: If the X's are presented in a systematic way, a solution W is always found (not necessarily the same as V*). If F is not finite, we may not obtain a solution in finite time. The algorithm can be modified in minor ways and remains valid (e.g. examples that are not unit length but bounded; changes in W(n)).
  • 45. Perceptron Convergence Theorem: If the concept is representable by a perceptron, then the perceptron learning rule converges in a finite amount of time. (To be made precise and proved.)
  • 46. Important Points: The theorem only guarantees a result IF the concept is representable! Usually we are not interested merely in representing something but in its generalizability: how will it work on examples we haven't seen?
  • 47. Percentage of Boolean Functions Representable by a Perceptron (counts: representable out of total, by number of inputs): 1: 4 of 4; 2: 14 of 16; 3: 104 of 256; 4: 1,882 of 65,536; 5: 94,572 of about 10**9; 6: 15,028,134 of about 10**19; 7: 8,378,070,864 of about 10**38; 8: 17,561,539,552,946 of about 10**77.
  • 48. Generalizability: Typically we train a network on a sample set of examples and then use it on the general class. Training can be slow, but execution is fast.
  • 49. Perceptron for pattern identification: weights chosen so that ∑ w_i x_i > θ ⇔ the letter A is in the receptive field. (Note: the neuron is trained.)
  • 50. What won't work? Example: connectedness with a bounded-diameter perceptron. Compare with convexity, which can be handled using sensors of order three.
  • 51. Biases and Thresholds: We can replace the threshold with a bias. A bias acts exactly like a weight on a connection from a unit whose activation is always 1: ∑_{i=1..n} w_i x_i > θ  ⇔  ∑_{i=1..n} w_i x_i - θ > 0  ⇔  ∑_{i=0..n} w_i x_i > 0, where w_0 = -θ and x_0 = 1.
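    A tiny sketch of the equivalence on slide 51 (helper names are mine): a unit with threshold θ behaves exactly like a unit with threshold 0 that has an extra, always-on input x0 = 1 carrying weight w0 = -θ.

        def with_threshold(x, w, theta):
            return 1 if sum(wi * xi for wi, xi in zip(w, x)) > theta else 0

        def with_bias(x, w, theta):
            xb = [1] + list(x)                 # x0 = 1, always on
            wb = [-theta] + list(w)            # w0 = -theta plays the role of the bias
            return 1 if sum(wi * xi for wi, xi in zip(wb, xb)) > 0 else 0

        # The two formulations agree on every binary input (shown for the AND weights of slide 15).
        for x1 in (0, 1):
            for x2 in (0, 1):
                assert with_threshold((x1, x2), (1.0, 1.0), 1.5) == with_bias((x1, x2), (1.0, 1.0), 1.5)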
  • 52. Perceptron. Loop: Take an example and apply it to the network; if the answer is correct, return to Loop; if incorrect, go to Fix. Fix: Adjust the network weights by the input example; go to Loop.
  • 53. Perceptron Algorithm. Let v be arbitrary. Choose: choose x ∈ F. Test: if x ∈ F+ and v·x > 0, go to Choose; if x ∈ F+ and v·x ≤ 0, go to Fix-plus; if x ∈ F- and v·x ≤ 0, go to Choose; if x ∈ F- and v·x > 0, go to Fix-minus. Fix-plus: v := v + x, go to Choose. Fix-minus: v := v - x, go to Choose.
  • 54. Perceptron Algorithm. Conditions for the algorithm: Condition 1: there exists v* such that v*·x > 0 if x ∈ F+ and v*·x ≤ 0 if x ∈ F-. Condition 2: we choose F to be a set of unit vectors.
  • 55. Geometric viewpoint: a diagram of w_n, the example x, v*, and the updated weight w_{n+1}. (Diagram.)
  • 56. Perceptron Algorithm: Based on these conditions, the number of times we go to Fix is finite. Examples world: F = F+ ∪ F-, with positive examples F+ = {x : A*·x > 0} and negative examples F- = {y : A*·y ≤ 0}.
  • 57. Perceptron Algorithm (Proof): We replace the threshold with a bias, and we assume F is a set of unit vectors: ∀φ ∈ F, |φ| = 1.
  • 58. Perceptron Algorithm (Proof): We reduce what has to be proved by eliminating all the negative examples and placing their negations among the positive examples: ∑ a*_i y_i ≤ 0 ⇒ ∑ a*_i (-y_i) ≥ 0, i.e. y_i ⇒ -y_i, so F = F+. Then ∀φ ∈ F, A*·φ > 0, and hence (for a finite set of examples) there exists δ > 0 with A*·φ > δ for all φ ∈ F.
  • 59. Perceptron Algorithm (Proof): Consider G(A) = (A*·A)/(|A*||A|) = (A*·A)/|A| ≤ 1. The numerator: A*·A_{l+1} = A*·(A_l + φ) = A*·A_l + A*·φ ≥ A*·A_l + δ, since A*·φ > δ. After n changes, A*·A_n ≥ nδ.
  • 60. Perceptron Algorithm (Proof): The denominator: |A_{t+1}|² = (A_t + φ)·(A_t + φ) = |A_t|² + 2 A_t·φ + |φ|² ≤ |A_t|² + 1, since A_t·φ ≤ 0 at a fix and |φ| = 1. After n changes, |A_n|² ≤ n.
  • 61. Perceptron Algorithm (Proof): From the denominator |A_n|² ≤ n and the numerator A*·A_n ≥ nδ: G(A) = (A*·A_n)/|A_n| ≥ nδ/√n = √n·δ. Since G(A) ≤ 1, the final n satisfies n ≤ 1/δ².
  • 62. Example: AND (with bias). Examples (x1, x2, bias) and targets: (0,0,1)→0, (0,1,1)→0, (1,0,1)→0, (1,1,1)→1. Start with w0 = (0,0,0). w0·(0,0,1) = 0, w0·(0,1,1) = 0, w0·(1,0,1) = 0 (all OK for negative examples); w0·(1,1,1) = 0, wrong, so Fix: w1 = w0 + (1,1,1) = (1,1,1). w1·(0,0,1) = 1, wrong, so Fix: w2 = w1 - (0,0,1) = (1,1,0). w2·(0,1,1) = 1, wrong, so Fix: w3 = w2 - (0,1,1) = (1,0,-1) … etc.
  • 63. AND with bipolar inputs (±1): the same training procedure run on bipolar-coded examples, shown as a table of weight updates that ends in success. (Table.)
  • 64. Problem: Split each input figure into left, middle, and right parts (LHS, MHS, RHS), with MHS1 = MHS2 = MHS3, LHS1 = LHS2, and RHS2 = RHS3. The perceptron would need LHS1 + MHS1 + RHS1 > 0, LHS2 + MHS2 + RHS2 < 0, and LHS3 + MHS3 + RHS3 > 0, and likewise LHS1 + RHS1 > 0, LHS2 + RHS2 < 0, LHS3 + RHS3 > 0. RHS2 would have to be small enough that LHS2 + RHS2 < 0, and LHS2 would also have to be small enough that LHS2 + RHS2 < 0: contradiction!
  • 65. Linear Separation: Every perceptron determines a classification of vector inputs that is given by a hyperplane. Two-dimensional examples (add algebra): OR and AND are linearly separable; XOR is not possible.
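    Why XOR is impossible for a single perceptron: the standard argument (a reconstruction, not written out on the slide) shows that no weights w1, w2 and threshold θ can satisfy all four input/output constraints at once:

        \[
        \begin{aligned}
        (0,0)\mapsto 0:&\quad 0 \le \theta\\
        (1,1)\mapsto 0:&\quad w_1 + w_2 \le \theta\\
        (1,0)\mapsto 1:&\quad w_1 > \theta\\
        (0,1)\mapsto 1:&\quad w_2 > \theta
        \end{aligned}
        \]

    Adding the last two gives w1 + w2 > 2θ, and θ ≥ 0 from the first line gives 2θ ≥ θ, contradicting w1 + w2 ≤ θ. So no line w1·x1 + w2·x2 = θ separates the XOR examples.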
  • 66. Linear Separation in Higher Dimensions: In higher dimensions we still have linear separation, but it is hard to tell which concepts are separable. Example: connected vs. convex; which can be handled by a perceptron with local sensors, and which cannot? (Note: define local sensors.)
  • 67. What won't work? Try XOR.
  • 68. Limitations of the Perceptron. Representability: only concepts that are linearly separable. Compare: convex versus connected. Examples: XOR vs. OR.