Neural Network Fundamentals explained using the most basic mathematics. The slides slowly dive into solid (3-D) geometry to explain a few fundamental concepts on which gradient descent and the steepest-descent method rest.
4. What is a Neural Net?
Generalization of mathematical models of human cognition or neural biology.
Assumptions (sketched in code after this list):
1. Information processing occurs at many simple elements called neurons,
2. Signals are passed between neurons over connection links,
3. Each connection link has an associated weight, which, in a typical neural net,
multiplies the signal transmitted.
4. Each neuron applies an activation function (usually non-linear) to its net input
(sum of weighted input signals) to determine its output signal.
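A minimal sketch of assumptions 1-4 in code, assuming a logistic sigmoid as the activation function and made-up inputs and weights (neither is specified on the slide):

import math

def neuron_output(inputs, weights, bias):
    # Signals arrive over connection links; each link's weight multiplies
    # the transmitted signal (assumptions 2 and 3).
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    # A (usually non-linear) activation function applied to the net input
    # determines the output signal (assumption 4); here, a logistic sigmoid.
    return 1.0 / (1.0 + math.exp(-net))

print(neuron_output(inputs=[0.5, -1.0, 2.0], weights=[0.1, 0.4, -0.2], bias=0.3))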
5. Characteristics
A neural network is characterized by
1. The pattern of connections between neurons, called its architecture,
2. The method of determining the weights on the connections, called the training or
learning algorithm,
3. The activation function.
6. Problems Neural Nets Can Solve
1. Storing and recalling data or patterns,
2. Classifying patterns,
3. Grouping similar patterns,
4. Constrained optimization problems
14. Classic Keywords
1. McCulloch-Pitts neurons (1943),
2. Hebb learning,
3. Perceptrons,
4. ADALINE & MADALINE (Adaptive Linear Neuron),
5. Backpropagation,
6. Hopfield nets,
7. Neocognitron,
8. Boltzmann machine
Note: these will be covered in detail in later chapters, under “Learning Rules”.
15. Linear Model | Data/Curve Fitting using ANN
y = mx + c [m and c are free choices]
y = W1·x + W0 [W1 -> synaptic weight, W0 -> bias]
[Figure: a two-node network: input node 1 (x) and a bias input +1 feed output node 2 (y2); weight W21 plays the role of the slope m, and the bias weight W20 the intercept c.]
Question: what about y = m1x1 + m2x2 + ... + c1 + c2 … + cn? | ADALINE (see the code sketch below)
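A minimal sketch of this slide's linear model as a one-input neuron with a bias unit, plus the multi-input version asked about in the question; all numeric values are illustrative:

def linear_neuron(x, w1, w0):
    # w1 plays the role of the slope m, the bias weight w0 the intercept c.
    return w1 * x + w0

def adaline_like_output(xs, ws, b):
    # Multi-input case: the separate constants c1 + c2 + ... + cn
    # collapse into a single bias term b.
    return sum(w * x for w, x in zip(ws, xs)) + b

print(linear_neuron(3.0, w1=2.0, w0=1.0))                      # y = 2*3 + 1 = 7
print(adaline_like_output([1.0, 2.0], [0.5, -0.25], b=1.0))    # 0.5 - 0.5 + 1 = 1.0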
18. Revision
1. Generalization of mathematical models of human cognition or neural biology.
2. The pattern of connections between neurons, called the architecture,
3. The method of determining the weights on the connections, called the training or
learning algorithm,
4. The activation function,
5. Simple architecture and bias,
6. Linear model and curve fitting using a simple architecture,
7. Remember: y_in = b1 + w1x1 + w2x2 + w3x3
19. Linear Model | Data/Curve Fitting using ANN
(Repeat of slide 15: the linear model y = W1·x + W0, shown again after the revision.)
20. Combined Error Measurement
[Figure: error surface E plotted against weights W0 and W1, with points at iteration n and iteration m (for W0 and W1) descending toward the lowest point of the paraboloid, the best fit.]
E = Σp (tp − yp)²
tp -> target output
yp -> neural network response
Seems familiar? (It is the least-squares error.)
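A minimal sketch of the combined error E summed over training patterns; the target and response values are made up for illustration:

def combined_error(targets, responses):
    # E = sum over patterns p of (t_p - y_p)^2
    return sum((t - y) ** 2 for t, y in zip(targets, responses))

targets = [1.0, 2.0, 3.0]      # t_p: desired outputs
responses = [0.9, 2.2, 2.7]    # y_p: neural network responses
print(combined_error(targets, responses))   # ~0.14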
21. Minimum Error ⇔ Best Fit | 2D Explanation
[Figure: error E plotted against synaptic weight values W, a bowl-shaped curve with tangent slopes marked.]
The partial derivative of E w.r.t. the weight under observation, ∂E/∂W, gives this profile of slopes.
Why not just differentiate, set ∂E/∂W = 0, and jump straight to the minimum? (For the linear case this works, as sketched below; with non-linear units a closed form is generally unavailable.)
https://www.youtube.com/watch?v=_ON9fuVR9oA https://www.youtube.com/watch?v=AXqhWeUEtQU
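For the purely linear model the question has a direct answer: setting ∂E/∂W1 = ∂E/∂W0 = 0 yields the closed-form least-squares line. A sketch of that solution follows (the data points are made up); once non-linear activation units enter, no such closed form is generally available, which motivates iterative descent:

def least_squares_line(xs, ys):
    # Solving dE/dW1 = 0 and dE/dW0 = 0 for E = sum (y - W1*x - W0)^2
    # gives the usual normal-equation formulas for slope and intercept.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    w1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    w0 = my - w1 * mx
    return w1, w0

print(least_squares_line([0, 1, 2, 3], [1, 3, 5, 7]))   # (2.0, 1.0): y = 2x + 1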
22. Gradient Descent Algorithm
Non-Linear Activation Units
[Figure: with non-linear units the error surface over W0 and W1 can have several valleys; descents started at iteration n and at iteration m (for W0 and W1) may settle in different ones. Best fit?]
Global minima vs. local minima?
Gradient of descent:
G = ∂E/∂Wij = ∂(Σp Ep)/∂Wij = Σp (∂Ep/∂Wij)
Chain rule of differentiation, applied with Ep = (tp − yp)²
E = Σp Ep = Σp (tp − yp)²
tp -> target output
yp -> neural network response
https://www.youtube.com/watch?v=KshIEHQn5ZM&list=PL53BE265CE4A6C056&index=3
23. Gradient Descent Algorithm
Non-Linear Activation Units
Global minima vs. local minima?
Gradient of descent:
G = ∂E/∂Wij = Σp (∂Ep/∂Wij), where by the chain rule ∂Ep/∂Wij = (dEp/dy)·(dy/dw)
Chain rule of differentiation, applied with Ep = ½(tp − yp)²
E = Σp Ep = ½ Σp (tp − yp)²
tp -> target output
yp -> neural network response
∂Ep/∂wi = −(tp − yp)·xi
The derivative (gradient) of Ep w.r.t. weight wi.
Correction to move toward the minimum, in the negative (descent) direction:
Δwi = (tp − yp)·xi
wi <- wi + Δwi
AND, to control the speed of the correction:
Δwi = η·(tp − yp)·xi
η is the learning rate! (A code sketch follows below.)
https://www.youtube.com/watch?v=0T0QrHO56qg
And, y_in = b1 + w1x1 + w2x2 + w3x3
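A minimal sketch of the correction rule above, Δwi = η·(tp − yp)·xi, applied pattern by pattern to a linear unit y_in = b + Σ wi·xi; the training data, learning rate, and epoch count are illustrative choices:

def train_delta_rule(patterns, targets, eta=0.1, epochs=1000):
    n = len(patterns[0])
    w = [0.0] * n   # synaptic weights
    b = 0.0         # bias weight (its input is the constant +1)
    for _ in range(epochs):
        for xs, t in zip(patterns, targets):
            y = b + sum(wi * xi for wi, xi in zip(w, xs))   # y_in
            err = t - y                                     # (t_p - y_p)
            # Step against the gradient, scaled by the learning rate eta:
            w = [wi + eta * err * xi for wi, xi in zip(w, xs)]
            b += eta * err
    return w, b

# Learn y = 2*x1 - x2 + 1 from four noiseless patterns.
X = [[0, 0], [1, 0], [0, 1], [1, 1]]
T = [1, 3, 0, 2]
print(train_delta_rule(X, T))   # weights near [2, -1], bias near 1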
24. Gradient Descent Algorithm
Non-Linear Activation Units
Learning rate, η
[Figure: error E plotted against synaptic weight values W, with descent steps of different sizes marked.]
The learning rate controls the speed of descent.
● Simple view: a higher learning rate reaches the minimum faster, and a lower
learning rate is slower?
● Is that true?
● Crossing the minimum, a possibility? (See the sketch below.)
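A sketch probing the bullet's question on the simplest error bowl, E(w) = w² with its minimum at w = 0: a small η crawls, a moderate η converges, and too large an η keeps crossing the minimum and diverges (all values are illustrative):

def descend(eta, w=5.0, steps=20):
    for _ in range(steps):
        grad = 2 * w        # dE/dw for E(w) = w^2
        w = w - eta * grad  # gradient-descent update
    return w

for eta in (0.01, 0.1, 1.1):
    print(eta, descend(eta))
# eta = 0.01 -> still far from 0 after 20 steps (slow)
# eta = 0.1  -> close to 0 (converged)
# eta = 1.1  -> |w| grows every step (overshoots the minimum and diverges)

So no, higher is not always faster: past a threshold each step overshoots the minimum and the error grows instead of shrinking.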
29. Learning Mechanisms in NN
To update the synaptic weights and biases,
the following five basic rules can help:
1. Error-correction learning
2. Memory-based learning
3. Hebbian learning
4. Competitive learning
5. Boltzmann learning
[Figure: the learning loop: the network is stimulated by its environment, its free parameters change as a result, and it then responds to the environment in a different way.]
31. Memory-Based Learning
● Memorize the associations between input and output vectors:
● pairs (Xi, di), with Xi the input and di the output, for i = 1…N,
● For an unknown vector Xz, how do we find a match?
● We find the closest match using a distance measure such as the Euclidean
distance: the nearest neighbour of Xz is the stored Xi that minimizes dist(Xi, Xz),
● Sounds familiar?
● What's the catch? Outliers?
● Solution: pick neighbours, not a single neighbour: the k-nearest-neighbour
rule (sketched in code below).
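A minimal sketch of memory-based learning as described above: memorize all (Xi, di) pairs, then classify an unknown Xz by voting among its k nearest neighbours under Euclidean distance (the toy data and k are illustrative):

import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(memory, xz, k=3):
    # memory: the stored (X_i, d_i) associations.
    # Sort stored patterns by distance to X_z and vote among the k
    # nearest; k > 1 blunts the effect of a single outlier.
    nearest = sorted(memory, key=lambda pair: euclidean(pair[0], xz))[:k]
    return Counter(d for _, d in nearest).most_common(1)[0][0]

memory = [([0, 0], "A"), ([0, 1], "A"), ([1, 0], "A"),
          ([5, 5], "B"), ([5, 6], "B"), ([6, 5], "B")]
print(knn_classify(memory, [0.4, 0.7]))   # "A"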
32. Hebbian Learning
● Closest to biological neuron learning; proposed by the neurophysiologist Donald Hebb (1949 book),
● If cell A consistently takes part in firing cell B, metabolic changes happen
so that the efficiency of A signalling B increases: the synaptic weight
between them strengthens, and it weakens when it does not,
● Two neurons: the presynaptic neuron and the postsynaptic neuron,
● Hebbian synapses are
○ Time-dependent,
○ Local in nature (spatiotemporal contiguity),
○ Strongly interactive (back-and-forth interaction).
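A minimal sketch of the Hebbian idea in its simplest activity-product form, Δw = η·x·y, where x is the presynaptic activity and y the postsynaptic activity; the rule and all values here are an illustrative simplification of the slide's description:

def hebbian_update(w, x, y, eta=0.1):
    # Correlated firing (x*y > 0) strengthens the synapse;
    # anti-correlated firing (x*y < 0) weakens it.
    return w + eta * x * y

w = 0.5
for x, y in [(1, 1), (1, 1), (1, -1)]:   # fire together twice, then oppose
    w = hebbian_update(w, x, y)
print(w)   # 0.5 + 0.1 + 0.1 - 0.1 ≈ 0.6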