1. Introduction to Artificial
Neural Networks (ANN)
Dr. Satya Prakash Sahu
Assistant Professor
Department of Information Technology
NIT Raipur, Chhattisgarh
spsahu.it@nitrr.ac.in
3. 3
- Introduction to ANN
- Simple Neuron Model
- Historical Development of Neural Networks
- Biological Neural Network (BNN)
- Comparison and analogy with BNN
- Building blocks of ANN
- Architecture of ANN
- Learning Rule
- McCulloch-Pitts Model
- Summary
Contents:
4. 4
Introduction to ANN
• The Artificial Neural Network is inspired by our biological nervous system, and so it
resembles a simplified version of the human neuron model. According to Dr. Robert
Hecht-Nielsen, inventor of the first neurocomputer, an ANN can be defined as −
"...a computing system made up of a number of simple, highly interconnected
processing elements, which process information by their dynamic state response
to external inputs."
• ANNs as processing devices: non-linear information (signal) processing
devices, built from interconnected elementary
processing units called neurons.
• ANN as a mathematical model: inspired by the way biological nervous
systems (BNN - a massively parallel
interconnection of a large number of neurons), such as the
brain, process information. (e.g. learning in a
BNN involves adjustments to the synaptic
connections that exist between the neurons).
5. 5
• A neural network consists of an interconnected group of artificial
neurons, and it processes information using a connectionist
approach to computation. In most cases a neural network is an
adaptive system that changes its structure during a learning phase.
Neural networks are used to model complex relationships between
inputs and outputs or to find patterns in data.
• ANNs, like humans, learn patterns through training with some
known facts (to be supplied by the programmer) or known examples of
problems. Once the network has been trained and configured through
sufficient knowledge acquisition, it is ready to solve similar but
unknown instances of problems.
• ANNs are capable of handling imprecise and incomplete
data; they have shown generalization capabilities and can extract
useful data to be processed further by other computing techniques.
6. 6
• ANN is the type of AI that attempts to imitate the way a human brain works. Rather than using a
digital model (0's and 1's), an ANN works by creating connections between processing elements
(the equivalent of neurons in a BNN); the organization and the weights of the connections determine the
output.
It resembles the brain in two respects:
1. Knowledge is acquired by the network through a learning process, and
2. Inter-neuron connection strengths, known as synaptic weights, are used to store the knowledge.
• Thus an ANN is an information processing system whose elements, called neurons, process the
information. Signals are transmitted by means of connection links. The links possess an
associated weight, which is multiplied with the incoming signal (input) in any typical
neural net. The output signal is obtained by applying an activation function to the net input.
An ANN is typically defined (characterized) by three types of parameters:
• Architecture: The interconnection pattern between different layers of neurons
• Training or Learning: The learning process for updating the weights of the
interconnections
• Activation function: That converts a neuron's weighted input to its output activation.
9. 9
• A long course of evolution has given the human brain many desirable characteristics not
present in modern parallel computers (Von Neumann architecture), such as:
• Massive parallelism
• Distributed representation and computation
• Learning ability
• Generalization ability
• Adaptivity
• Inherent contextual info processing
• Fault tolerance
• Low energy consumption
Von Neumann Computer Vs BNN
10. 10
• Other advantages of ANN:
• Adaptive learning
• Self organization
• Real time operation
• Fault Tolerance via redundant info coding
Historical Development of Neural Networks:
• 1943 – McCulloch & Pitts: start of the modern era of NN. A simple neuron model with the logic
that when the net input to a particular neuron is greater than a specified threshold, the neuron
fires. Only binary inputs.
• 1949 – Hebb's book "The Organization of Behaviour": a learning rule for synaptic
modification.
• 1958 - Rosenblatt introduces the Perceptron, followed by Minsky & Papert [1988]; the
weights on the connection paths can be adjusted.
• 1960 – Widrow & Hoff introduce ADALINE, using a learning rule called LMS or the Delta rule.
• 1982 - John Hopfield's networks: a type of model to store information in dynamically
stable networks.
• 1972 – Kohonen's Self-Organizing Maps (SOM), capable of reproducing important
aspects of the structure of biological neural nets.
• 1985 - Parker; LeCun (1986) worked on BPN and paved its way into NN, with the generalized delta rule
for propagation of error. However, credit for publishing this net goes to Rumelhart, Hinton
& Williams (1986).
11. 11
• 1988 - Grossberg, learning rule similar to that of Kohonen’s, used for CPN, the outstar
learning.
• 1987, 1990 - Carpenter & Grossberg, ART, ART1 for binary ART2 for continuous valued
i/p.
• 1988 - Broomhead & Lowe developed RBF, multilayer similar to BPN.
• 1990 – Vapnik developed the SVM.
• The term Deep Learning was introduced to the machine learning community by Rina
Dechter in 1986, and to artificial neural networks by Igor Aizenberg and colleagues in
2000, in the context of Boolean threshold neurons.
• The impact of deep learning in industry began in the early 2000s, when CNNs already
processed an estimated 10% to 20% of all the checks written in the US, according to
Yann LeCun. Industrial applications of deep learning to large-scale speech recognition
started around 2010.
13. 13
• A biological neuron consists of:
➢ Cell body or soma where cell nucleus is located
(size: 10-18µm)
➢ Dendrites, tree like nerve fibres associated with
soma, which receive signals from other neuron
➢ Axon, extending from cell body, a long fibre which
eventually branches into strands & substrands
connecting to many other neurons
➢ Synapses, a junction b/w strands of axon of one
neuron & dendrites or cell body themselves of
other neuron (gap = 200nm)
➢ Number of neurons ≈ 10^11, interconnections per neuron ≈ 10^4, total synaptic
density ≈ 10^15, length 0.01 mm to 1 m (for a limb)
➢ Presynaptic terminal
– excitatory
– inhibitory
Transmission of signal -
• All nerve cells signal in the same way through a
combination of electrical and chemical
processes:
• Input component produces graded local signals
• Trigger component initiates action potential
• Conductile component propagates action potential
• Output component releases neurotransmitter
• All signaling is unidirectional
• Transmission at a synapse is a complex chemical process:
specific transmitter substances are released from the
sending side of the junction; the effect is to raise or lower
the electric potential inside the body of the receiving cell.
14. 14
• If this potential reaches or exceeds the threshold, an electric
activity (a short pulse) is generated. When
this happens, the cell is said to have fired.
• Process:
- In the state of inactivity the interior of the
neuron, the protoplasm, is negatively charged
against the surrounding neural liquid,
which contains positive sodium (Na+) ions.
- The resting potential of about -70 mV is
maintained by the cell membrane, which is
impenetrable to Na+ ions. This
causes a deficiency of Na+ ions in the
protoplasm.
- Signals arriving from synaptic
connections may result in a temporary
depolarization of the resting potential. When
the potential is raised above -60 mV, the
membrane suddenly loses its
impermeability to Na+ ions, which
enter the protoplasm and reduce the
potential difference. This sudden change in
membrane potential causes the neuron to
discharge; the neuron is said to have
fired.
15. 15
- The membrane then gradually recovers its original properties and regenerates the resting
potential over a period of several milliseconds.
- The speed of propagation of the discharge signal is 0.5-2 m/s; the signal travelling along the axon
stops at the synapse.
- Transmission across the synaptic cleft is effected by chemical activity: when a signal arrives at the
presynaptic nerve terminal, special substances called neurotransmitters are produced. These
molecules travel to the postsynaptic membrane within 0.5 ms and modify its conductance,
causing polarization or depolarization of the postsynaptic potential. If the polarization
potential is positive, the synapse is termed excitatory (it tends to activate the postsynaptic
neuron); otherwise it is inhibitory (it counteracts excitation of the neuron).
• Neurotransmitter:
• Here are a few examples of important
neurotransmitters actions:
1. Glutamate : fast excitatory synapses in brain
& spinal cord, used at most synapses that
are "modifiable", i.e. capable of increasing or
decreasing in strength.
2. GABA is used at the great majority of fast
inhibitory synapses in virtually every part of
the brain
3. Acetylcholine is at the neuromuscular
junction connecting motor nerves to muscles.
4. Dopamine: regulation of motor behavior,
pleasures related to motivation and also
emotional arousal, a critical role in
the reward system; people with Parkinson's
disease (low levels of dopamine )
16. 16
5. Serotonin is a monoamine neurotransmitter, found in the intestine (approx. 90%) and in CNS
neurons (10%); it regulates appetite, sleep, memory and learning, temperature, mood,
behaviour, muscle contraction, and the function of the cardiovascular system.
6. Substance P is an undecapeptide responsible for transmission of pain from certain
sensory neurons to the CNS.
7. Opioid peptides are neurotransmitters that act within pain pathways and the emotional
centers of the brain;
Major Elements of BNN
1. Dendrites : receive signals from other neurons (accept input)
2. Soma (Cell body): sums all the incoming signals (process the input)
3. Axon: cell fires, after sufficient amount of input; Axon transmits signal to other
cells (turn the processed inputs into outputs)
4. Synapses: The electrochemical contact between neurons where
neurotransmitter have the major role
17. Analogy in ANN & BNN
17
[Figure: a model neuron with inputs X1, X2, ..., Xi, ..., Xn, weights W1, W2, ..., Wi, ..., Wn,
a summation block, and an activation function producing the output.
Analogy with BNN: input signals and input-layer nodes ↔ dendrites and connected neurons;
weights ↔ synapses; summation block ↔ cell body; activation function and output ↔ axon
and output to other neurons.]
18. 18
BRAIN COMPUTATION
The human brain contains about 10 billion nerve cells, or
neurons. On average, each neuron is connected to other
neurons through approximately 10,000 synapses.
21. 21
ANN Vs BNN
Speed:
- ANN: Faster in processing information; cycle time (one step of a program in the CPU) is in the range of nanoseconds.
- BNN: Slow; cycle time corresponds to a neural event prompted by an external stimulus, in milliseconds.
Processing:
- ANN: Sequential.
- BNN: Massively parallel operation, having only a few steps.
Size & Complexity:
- ANN: Limited number of computational neurons; difficult to perform complex pattern recognition.
- BNN: Large number of neurons; the size and complexity give the brain the power of performing complex pattern recognition tasks that cannot be realized on a computer.
Storage:
- ANN: Information is stored in memory, addressed by its location; any new information in the same location destroys the old information; strictly replaceable.
- BNN: Stores information in the strengths of interconnections; information is adaptable; new information is added by adjusting the interconnection strengths without destroying the old information.
Fault Tolerance:
- ANN: Inherently not fault tolerant, since information corrupted in memory cannot be retrieved.
- BNN: Fault tolerant; information is distributed throughout the network; even if a few connections fail, the information is still preserved due to the distributed nature of the encoded information.
Control mechanism:
- ANN: A control unit is present, which monitors all computing activities.
- BNN: No central control; neurons act based on locally available information and transmit their outputs to the connected neurons.
22. 22
AI Vs ANN
• Artificial Intelligence
▪ Intelligence comes by
designing.
▪ Response time is consistent.
▪ Knowledge is represented in
explicit and abstract form.
▪ Symbolic representation.
▪ Explanation regarding any
response or output i.e. it is
derived from the given facts or
rules.
• Artificial Neural Network
▪ Intelligence comes by Training.
▪ Response time is inconsistent.
▪ Knowledge is represented in
terms of weight that has no
relationship with explicit and
abstract form of knowledge.
▪ Numeric representation.
▪ No explanation for results or
output received.
23. 23
AI Vs ANN
• Artificial Intelligence
▪ Errors can be explicitly
corrected by the modification of
design, etc.
▪ Sequential Processing
▪ It is not a fault tolerant system.
▪ Processing speed is slow.
• Artificial Neural Network
▪ Errors can’t be explicitly
corrected. Network itself
modifies the weights to reduce
the errors and to produce the
correct output.
▪ Distributed Processing.
▪ Partially fault tolerant system.
▪ Due to dedicated hardware the
processing speed is fast.
24. 24
ARTIFICIAL NEURAL NET
The figure shows a simple artificial neural net with two input neurons
(X1, X2) and one output neuron (Y). The interconnection weights are
given by W1 and W2.
25. 25
The neuron is the basic information processing unit of a NN. It consists
of:
1. A set of links, describing the neuron inputs, with weights W1, W2,
…, Wm.
2. An adder function (linear combiner) for computing the weighted
sum of the inputs (real numbers):
3. Activation function for limiting the amplitude of the neuron output.
u = Σ (j = 1 … m) Wj Xj   (weighted sum of the inputs)
y = f(u + b)              (output after the activation function, with bias b)
PROCESSING OF AN ARTIFICIAL NET
26. 26
BIAS OF AN ARTIFICIAL NEURON
The bias value is added to the weighted sum
∑wixi so that the decision boundary can be shifted away from the origin.
Yin = ∑wixi + b, where b is the bias
[Figure: decision lines x1 - x2 = -1, x1 - x2 = 0, and x1 - x2 = 1 in the
(x1, x2) plane, illustrating how the bias shifts the boundary.]
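As a minimal sketch of the formula Yin = ∑wixi + b (the function name is illustrative, not from the slides):

```python
def net_input(x, w, b):
    """Net input of a single neuron: Yin = sum(w_i * x_i) + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

print(net_input([1, 1], [1, -1], 0.5))  # 1*1 + (-1)*1 + 0.5 = 0.5
```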
27. 27
MULTI LAYER ARTIFICIAL NEURAL NET
INPUT: records without class attribute with normalized attributes
values.
INPUT VECTOR: X = { x1, x2, …, xn} where n is the number of
(non-class) attributes.
INPUT LAYER: there are as many nodes as non-class attributes, i.e.
as the length of the input vector.
HIDDEN LAYER: the number of nodes in the hidden layer and the
number of hidden layers depends on implementation.
28. 28
OPERATION OF A NEURAL NET
[Figure: inputs x0, x1, ..., xn with weights w0j, w1j, ..., wnj (x0 serving as the
bias input) feed a weighted-sum unit, whose result passes through the activation
function f to produce the output y.]
29. 29
WEIGHT AND BIAS UPDATION
Per Sample Updating
•updating weights and biases after the presentation of each sample.
Per Training Set Updating (Epoch or Iteration)
•weight and bias increments could be accumulated in variables and
the weights and biases updated after all the samples of the training
set have been presented.
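The two update schedules can be sketched as follows (function names are illustrative; the gradient shown is the delta-rule direction (t - y)·x for a linear unit, used only as a stand-in):

```python
def grad(w, x, t):
    # per-sample weight increments: delta-rule direction for a linear unit y = w.x
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [(t - y) * xi for xi in x]

def per_sample_update(w, samples, lr):
    # weights updated immediately after the presentation of each sample
    for x, t in samples:
        w = [wi + lr * g for wi, g in zip(w, grad(w, x, t))]
    return w

def per_epoch_update(w, samples, lr):
    # increments accumulated over the epoch and applied once at the end
    acc = [0.0] * len(w)
    for x, t in samples:
        acc = [a + g for a, g in zip(acc, grad(w, x, t))]
    return [wi + lr * a for wi, a in zip(w, acc)]
```

Note the difference: per-sample updating uses the already-updated weights for each subsequent sample within the epoch, while per-epoch (batch) updating evaluates all increments at the old weights.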
30. 30
STOPPING CONDITION
➢ All changes in weights (wij) in the previous epoch are below some
threshold, or
➢ The percentage of samples misclassified in the previous epoch is
below some threshold, or
➢ A pre-specified number of epochs has expired.
➢ In practice, several hundreds of thousands of epochs may be
required before the weights will converge.
31. 31
BUILDING BLOCKS OF ARTIFICIAL NEURAL NET
➢ Network Architecture (Connection between Neurons)
➢ Setting the Weights (Training)
➢ Activation Function
33. 33
LAYER PROPERTIES
➢ Input Layer: Each input unit may be designated by an attribute
value possessed by the instance.
➢ Hidden Layer: Not directly observable, provides nonlinearities for
the network.
➢ Output Layer: Encodes possible values.
34. 34
TRAINING PROCESS
➢ Supervised Training - Providing the network with a series of sample
inputs and comparing the output with the expected responses. The
training continues until the network is able to provide the desired response.
The weights may then be adjusted according to a learning algorithm.
e.g. Associative N/w, BPN, CPN, etc.
➢ Unsupervised Training - The target output is not known. The net may modify
the weights so that the most similar input vectors are assigned to the same
output unit. The net is found to form an exemplar or code-book vector for
each cluster formed. The training process extracts the statistical properties
of the training set and groups similar vectors into classes. e.g. Self-
organizing networks, ART, etc.
➢ Reinforcement Training - Right answer is not provided but indication of
whether ‘right’ or ‘wrong’ is provided.
35. 35
ACTIVATION FUNCTION
➢ ACTIVATION LEVEL – DISCRETE OR CONTINUOUS
➢ HARD LIMIT FUNCTION (DISCRETE)
• Binary Activation function
• Bipolar activation function
• Identity function
➢ SIGMOIDAL ACTIVATION FUNCTION (CONTINUOUS)
• Binary Sigmoidal activation function
• Bipolar Sigmoidal activation function
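The activation functions listed above can be sketched as follows (λ is the steepness parameter; names are illustrative):

```python
import math

def binary_step(x, theta=0.0):
    # hard limit, discrete: outputs 0 or 1
    return 1 if x > theta else 0

def bipolar_step(x, theta=0.0):
    # hard limit, discrete: outputs -1 or 1
    return 1 if x > theta else -1

def identity(x):
    # f(x) = x
    return x

def binary_sigmoid(x, lam=1.0):
    # continuous, range (0, 1)
    return 1.0 / (1.0 + math.exp(-lam * x))

def bipolar_sigmoid(x, lam=1.0):
    # continuous, range (-1, 1)
    return 2.0 / (1.0 + math.exp(-lam * x)) - 1.0
```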
39. 39
CONSTRUCTING ANN
➢ Determine the network properties:
• Network topology
• Types of connectivity
• Order of connections
• Weight range
➢ Determine the node properties:
• Activation range
➢ Determine the system dynamics
• Weight initialization scheme
• Activation – calculating formula
• Learning rule
40. 40
McCULLOCH–PITTS NEURON (TLU)
➢ First formal definition of a synthetic neuron model, based on a
simplified biological model, formulated by Warren McCulloch and
Walter Pitts in 1943.
➢ Neurons are sparsely and randomly connected
➢ Model is binary activated, i.e. allows binary 0 or 1.
➢ Connected by weighted paths; a path can be excitatory (+w) or
inhibitory (-p).
➢ Associated with threshold value (θ), neuron fires if the net input to
the neuron is greater than θ value.
➢ Uses a one-time-step function to pass the signal over the connection
link.
➢ Firing state is binary (1 = firing, 0 = not firing)
➢ Excitatory tend to increase voltage of other cells
➢ When inhibitory neuron connects to all other neurons
• It functions to regulate network activity (prevent too many
firings)
41. 41
➢ x1.....xn are excitatory inputs, weighted by w
➢ xn+1.....xn+m are inhibitory inputs, weighted by -p
➢ net input yin = Σxiwi + Σxj(-pj)
f(yin) = 1, if yin > θ
         0, if yin ≤ θ
➢ The threshold θ should satisfy the relation:
θ > nw - p
➢ A McCulloch-Pitts neuron will fire if it receives k or more excitatory inputs and no
inhibitory inputs, where
kw > θ > (k-1)w
➢ Problems with Logic and their realization
McCULLOCH–PITTS NEURON Cont..
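A minimal sketch of such a unit with absolute inhibition, realizing the AND logic function (the function name and parameter defaults are illustrative):

```python
def mp_neuron(exc, inh, w=1, theta=1.5):
    """McCulloch-Pitts unit: any active inhibitory input prevents firing;
    otherwise the unit fires (1) iff the net input exceeds the threshold."""
    if any(inh):
        return 0
    net = w * sum(exc)          # all excitatory paths share the same weight w
    return 1 if net > theta else 0

# AND of two excitatory inputs: with w = 1 and k = 2, the relation
# kw > theta > (k-1)w gives 2 > theta > 1, so theta = 1.5 works.
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, mp_neuron(x, ()))
```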
42. PROBLEM SOLVING
➢ Select a suitable NN model based on the nature of the problem.
➢ Construct a NN according to the characteristics of the application
domain.
➢ Train the neural network with the learning procedure of the
selected model.
➢ Use the trained network for making inference or solving problems.
42
43. NEURAL NETWORKS
➢ Neural Network learns by adjusting the weights so as to be able
to correctly classify the training data and hence, after testing
phase, to classify unknown data.
➢ Neural Network needs long time for training.
➢ Neural Network has a high tolerance to noisy and incomplete
data.
43
44. Neural Networks - Model Selection
• In order to select the appropriate neural network model, it is necessary to find out
whether you are dealing with a
• supervised task, or
• unsupervised task.
• If you want to train the system to respond with a certain predefined output, the task
is supervised. If the network should self-organize, the target is not predetermined
and the task is called an unsupervised task.
• Furthermore, you have to know whether your task is a
• classification task, or
• function approximation task.
• A neural network performing a classification task puts each input into one of a
given number of classes. For function approximation tasks, each input is
associated with a certain (analog) value.
44
45. Taxonomy of ANNs
• During the last fifty years, many different models of artificial neural networks have been developed. A
classification of the various models might be rather artificial. However it could be of some benefit to
look at the type of data which can be processed by a particular network, and at the type of the training
method. Basically we can distinguish between networks processing only binary data, and
networks for analog data. We could further discriminate between supervised training methods
and unsupervised methods.
• Supervised training methods use the output values of a training in order to set up a relationship
between input and output of the ANN model. Unsupervised methods try to find the structure in the
data on their own. Supervised methods are therefore mostly used for function approximation
and classification, while unsupervised methods are most suitable for clustering tasks.
• ANNs classified by type of training data and training method:
- Binary input, supervised: Hopfield network, Hamming network, BAM, ABAM, etc.
- Binary input, unsupervised: Carpenter/Grossberg classifier.
- Continuous-valued input, supervised: Rosenblatt perceptron network,
multilayer perceptron network, radial basis function network,
counterpropagation network.
- Continuous-valued input, unsupervised: Kohonen's self-organizing feature map network.
45
46. Taxonomy of ANNs Cont..
• Artificial neural networks (ANN) are adaptive models that can establish almost any
relationship between data. They can be regarded as black boxes to build mappings
between a set of input and output vectors. ANNs are quite promising in solving problems
where traditional models fail, especially for modeling complex phenomena which show a
non-linear relationship.
• Neural networks can be roughly divided into three categories:
– Signal transfer networks. In signal transfer networks, the input signal is transformed into an
output signal. Note that the dimensionality of the signals may change during this process. The
signal is propagated through the network and is thus changed by the internal mechanism of the
network. Most network models are based on some kind of predefined basis functions (e.g.
Gaussian peaks, as in the case of radial basis function networks (RBF networks), or
sigmoid functions, in the case of multi-layer perceptrons).
– State transition networks. Examples: Hopfield networks, and Boltzmann machines.
– Competitive learning networks. In competitive networks (sometimes also called self-organizing
maps, or SOMs) all the neurons of the network compete for the input signal. The neuron which
"wins" gets the chance to move towards the input signal in n-dimensional space. Example:
Kohonen feature map.
• What these types of networks have in common is that they "learn" by adapting their
network parameters. In general, the learning algorithms try to minimize the error of the
model. This is often a type of gradient descent approach - with all its pitfalls.
46
47. Learning Rules
• NN learns about its environment through an interactive process of
adjustment applied to its synaptic weights and bias levels.
• Learning is the process by which the free parameters of a NN get adapted
through a process of stimulation by the environment in which the network
is embedded. The type of learning is determined by the way the
parameter changes take place.
• Set of well-defined rules for the solution of a learning problem is called
learning algorithm.
• Each learning algorithm differs from the other in the way in which the
adjustment to a synaptic weight of a neuron is formulated and the
manner in which the NN is made up of set of inter-connected neurons
relating to its environment.
• The various learning rules are:
47
49. Learning Rules...
• Hebbian Learning Rule:
– "When an axon of cell A is near enough to excite cell B and
repeatedly or persistently takes part in firing it, some growth
process or metabolic change takes place in one or both cells
such that A's efficiency, as one of the cells firing B, is increased".
It is also called correlational learning. The above statement
can be split into two parts:
• If the two neurons on either side of a synapse are activated
simultaneously, then the strength of that synapse is selectively
increased.
• If the two neurons on either side of a synapse are activated
asynchronously, then that synapse is selectively weakened or
eliminated.
• This type of synapse is called a Hebbian synapse.
49
50. • Four Key mechanisms that characterize this synapse are time dependent, local
mechanism, interactive mechanism, and correlational mechanism.
• Simplest form of this synapse is
• This rule represents a purely feed forward, unsupervised learning. It states that
if the cross product of output and input is positive, this results in increase in
weight, otherwise the weight decreases.
• Limitations: it needs to be modified to counteract the unconstrained growth of weight
values (due to consistent excitation or firing), or the saturation of weights at a
certain preset level.
Hebbian Learning Rule...
Δwi = xi · y
51. • Perceptron Learning rule:
– Supervised learning; the learning signal is the difference between the desired and actual
neuron's response.
– To interpret the degree of difficulty of training a perceptron for different types of input, the
weight vector is assumed to be perpendicular to the plane separating the input patterns
during the learning process.
– x(n) where n=1......N input training vector.
– t(n).............................associated target value either +1 or -1 value and activation
function:
– y=f(yin) where
y = 1 if yin>θ
0 if -θ< yin < θ
-1 if yin < - θ
– Weight Updation (ΔW = αtx) :
• if y=t, no weight updation
• if y≠t , Wnew = Wold + αtx
– Perceptron learning convergence theorem states “If there is weight vector w* such that
f(x(p)w*) = t(p) for all p, then for any starting vector w1 the perceptron learning rule will
converge to a weight vector that gives the correct response for all training patterns ,
and this will be done in finite number of steps”.
51
52. Delta Learning Rule
• Only valid for continuous activation function
• Used in supervised training mode
• Learning signal for this rule is called delta
• The aim of the delta rule is to minimize the error over all training
patterns
• The adjustment made to a synaptic weight of a neuron is
proportional to the product of the error signal and the input signal of
the synapse in question.
52
53. Delta Learning Rule Contd.
Learning rule is derived from the condition of least squared error.
Calculating the gradient vector with respect to wi
Minimization of error requires the weight changes to be in the negative
gradient direction
53
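The derivation sketched above can be reconstructed as follows (assuming a single neuron with output y = f(net) and learning rate η):

```latex
E = \tfrac{1}{2}\,(d - y)^2, \qquad y = f(\mathrm{net}), \qquad \mathrm{net} = \sum_i w_i x_i
\frac{\partial E}{\partial w_i} = -(d - y)\, f'(\mathrm{net})\, x_i
\Delta w_i = -\eta\, \frac{\partial E}{\partial w_i} = \eta\,(d - y)\, f'(\mathrm{net})\, x_i
```

The last line is the delta rule: the weight change is proportional to the error signal (d - y) and the input xi, moving in the negative gradient direction of the squared error.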
54. Widrow-Hoff learning Rule
• Also called the least mean square (LMS) learning rule
• Introduced by Widrow (1962), used in supervised learning
• Independent of the activation function
• Special case of the delta learning rule wherein the activation function is an
identity function, i.e. f(net) = net
• Minimizes the squared error between the desired output value di
and neti
54
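A one-step sketch of the LMS update with identity activation, so y = net = w·x + b (names and the learning-rate default are illustrative):

```python
def lms_step(w, b, x, d, lr=0.1):
    """One Widrow-Hoff (LMS) step: minimize (d - net)^2 with y = net."""
    net = sum(wi * xi for wi, xi in zip(w, x)) + b
    err = d - net                                   # error against desired output d
    w = [wi + lr * err * xi for wi, xi in zip(w, x)]
    b = b + lr * err
    return w, b
```

Repeated application on a sample shrinks the error geometrically for a small enough learning rate.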
55. Memory Based Learning
• Memory-Based Learning: all of the past
experiences are explicitly stored in a large
memory of correctly classified input-output
examples {(xi, di)}, i = 1, ..., N
• Criterion used for defining the local neighbourhood of
the test vector xtest.
• Learning rule applied to the training examples in the
local neighbourhood of xtest.
• Nearest neighbor rule: the vector x'N ∈ {x1, x2, ..., xN} is the
nearest neighbor of xtest if min over i of d(xi, xtest) = d(x'N, xtest)
55
• If the classified examples (xi, di) are
independently and identically distributed
according to the joint probability
distribution of the example (x, d), and
• if the sample size N is infinitely large,
• then the classification error incurred by the
nearest neighbor rule is bounded above by
twice the Bayes probability of error.
56
57. • k-nearest neighbor
classifier:
• Identify the k classified
patterns that lie nearest
to the test vector xtest for
some integer k.
• Assign xtest to the class
that is most frequently
represented in the k
nearest neighbors to xtest
.
57
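The k-nearest-neighbor classifier described above can be sketched as (names are illustrative; squared Euclidean distance is assumed):

```python
from collections import Counter

def knn_classify(train, x_test, k=3):
    """train: list of (vector, label) pairs. Assign x_test the label most
    frequently represented among its k nearest stored examples."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(train, key=lambda s: dist(s[0], x_test))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

With k = 1 this reduces to the nearest neighbor rule of the previous slide.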
58. Competitive Learning:
• The output neurons of a
neural network compete
among themselves to
become active.
• - a set of neurons that
are all the same (except
for their synaptic weights)
• - a limit imposed on the
strength of each neuron
• - a mechanism that
permits the neurons to
compete: a winner-
takes-all mechanism
58
59. Competitive Learning:
• The standard competitive learning rule:
Δwkj = η (xj - wkj)  if neuron k wins the competition
     = 0             if neuron k loses the competition
• Note: all the neurons' weight vectors are
constrained to have the same length.
59
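One winner-takes-all step can be sketched as follows (names and the learning-rate default are illustrative):

```python
def competitive_step(weights, x, lr=0.5):
    """Winner-takes-all: only the neuron whose weight vector is closest to x
    moves toward x; all other weight vectors are left unchanged."""
    dist = lambda w: sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    winner = min(range(len(weights)), key=lambda k: dist(weights[k]))
    weights[winner] = [wi + lr * (xi - wi)
                       for wi, xi in zip(weights[winner], x)]
    return winner, weights
```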
60. Boltzmann Learning:
• The neurons constitute a recurrent structure
and they operate in a binary manner. The
machine is characterized by an energy
function E.
• E = -½ Σj Σk wkj xk xj ,  j ≠ k
• The machine operates by choosing a neuron k at
random and flipping its state from xk to -xk at some
temperature T with probability
• P(xk → -xk) = 1 / (1 + exp(-ΔEk / T)),
where ΔEk is the energy change produced by the flip.
60
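The flip step can be sketched as follows (the function name is illustrative; states are ±1 and w is assumed symmetric with zero diagonal):

```python
import math, random

def boltzmann_flip(x, k, w, T):
    """Compute the energy change of flipping unit k in state vector x,
    and accept the flip with probability P = 1/(1 + exp(-dE/T))."""
    # Unit k contributes -x_k * sum_{j != k} w[k][j] x_j to E,
    # so flipping x_k -> -x_k changes the energy by:
    dE = 2 * x[k] * sum(w[k][j] * x[j] for j in range(len(x)) if j != k)
    p = 1.0 / (1.0 + math.exp(-dE / T))
    if random.random() < p:
        x[k] = -x[k]
    return x, p
```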
61. Clamped condition:
the visible neurons
are all clamped onto
specific states
determined by the
environment
Free-running
condition: all the
neurons (=visible
and hidden) are
allowed to operate
freely
• The Boltzmann
learning rule:
Δwkj = η (ρ+kj - ρ-kj),  j ≠ k,
where ρ+kj and ρ-kj are the
correlations of the states of
neurons k and j in the clamped
and free-running conditions
respectively; note that both
range in value from -1 to +1.
61
63. Hebb Network
• The Hebb learning rule is the simplest one
• The learning in the brain is performed by the
change in the synaptic gap
• When an axon of cell A is near enough to excite
cell B and repeatedly keep firing it, some growth
process takes place in one or both cells
• According to Hebb rule, weight vector is found to
increase proportionately to the product of the
input and learning signal.
wi(new) = wi(old) + xi · y
64. Flow chart of Hebb training algorithm
[Flowchart:]
1. Start: initialize weights (and bias) to zero.
2. For each training pair s:t:
   - Activate the input units: xi = si
   - Activate the output unit: y = t
   - Update the weights: wi(new) = wi(old) + xi · y
   - Update the bias: b(new) = b(old) + y
3. When all training pairs have been processed, stop.
64
65. • Hebb rule can be used for pattern
association, pattern categorization, pattern
classification and over a range of other
areas
• Problem to be solved:
Design a Hebb net to implement OR
function
65
66. How to solve
Use bipolar data in the place of binary data
Initially the weights and bias are set to zero
w1=w2=b=0
X1   X2   b    y
 1    1   1    1
 1   -1   1    1
-1    1   1    1
-1   -1   1   -1
66
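The Hebb training of the OR net over the four bipolar samples can be sketched directly:

```python
# Hebb rule for OR with bipolar data:
#   w(new) = w(old) + x*y,  b(new) = b(old) + y, starting from zeros
samples = [((1, 1), 1), ((1, -1), 1), ((-1, 1), 1), ((-1, -1), -1)]
w, b = [0, 0], 0
for (x1, x2), y in samples:
    w[0] += x1 * y
    w[1] += x2 * y
    b += y
print(w, b)  # -> [2, 2] 2
```

The resulting net, sign(2·x1 + 2·x2 + 2), reproduces the OR function on all four bipolar inputs.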
68. LINEAR SEPARABILITY
➢ Linear separability is the concept wherein the separation of the
input space into regions is based on whether the network response
is positive or negative.
➢ Consider a network having
positive response in the first
quadrant and negative response
in all other quadrants (AND
function) with either binary or
bipolar data, then the decision
line is drawn separating the
positive response region from
the negative response region.
68
69. Perceptron
• In the late 1950s, Frank Rosenblatt introduced a network composed of units that
were enhanced versions of the McCulloch-Pitts Threshold Logic Unit (TLU) model.
• Rosenblatt's model of neuron, a perceptron, was the result of merger between
two concepts from the 1940s, McCulloch-Pitts model of an artificial neuron and
Hebbian learning rule of adjusting weights. In addition to the variable weight
values, the perceptron model added an extra input that represents bias. Thus,
the modified equation is now as follows:
• The only efficient learning element at that time was for single-layered networks.
• Today, used as a synonym for a single-layered feed-forward network.
69
70. Perceptron
• Linear threshold unit (LTU)
[Figure: inputs x1, x2, ..., xn with weights w1, w2, ..., wn, plus a bias
input (= 1) with weight b, feed the unit producing output o.]
yin = Σ (i = 1 … n) wi xi + b
o(x) =  1 if yin > 0
       -1 otherwise
70
71. Perceptron Learning Rule
wi = wi + Δwi
Δwi = η (t - o) xi
where:
t = c(x) is the target value,
o is the perceptron output,
η is a small constant (e.g. 0.1) called the learning rate.
• If the output is correct (t = o), the weights wi are not changed
• If the output is incorrect (t ≠ o), the weights wi are changed
such that the output of the perceptron for the new weights
is closer to t.
• The algorithm converges to the correct classification
• if the training data is linearly separable
• and η is sufficiently small
71
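The perceptron rule can be sketched as a training loop (names and defaults are illustrative; bipolar targets and the threshold-based activation from the earlier slide are assumed):

```python
def train_perceptron(samples, lr=1.0, theta=0.0, epochs=10):
    """Perceptron rule sketch: update w and b only when output != target."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        changed = False
        for x, t in samples:
            yin = sum(wi * xi for wi, xi in zip(w, x)) + b
            y = 1 if yin > theta else (-1 if yin < -theta else 0)
            if y != t:                                  # update on error only
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
                changed = True
        if not changed:                                 # converged: full epoch, no change
            break
    return w, b
```

On a linearly separable set such as bipolar AND, the loop reaches a separating weight vector in a finite number of steps, as the convergence theorem guarantees.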
72. Flow chart of Perceptron training algorithm
[Flowchart:]
1. Start: initialize weights.
2. For each training pair s:t:
   - Activate the input units: xi = si
   - Compute the output response: Yin = b + ∑XiWi
   - Apply the activation function:
       y =  1 if yin > θ
            0 if -θ ≤ yin ≤ θ
           -1 if yin < -θ
   - Weight updation (ΔW = αtx):
       if y = t, no weight updation
       if y ≠ t, Wnew = Wold + αtx
   - Bias update: bnew = bold + αt
3. When all training pairs have been processed, stop.
72
73. LEARNING ALGORITHM
➢ Epoch : Presentation of the entire training set to the neural
network.
➢ In the case of the AND function, an epoch consists of four sets of
inputs being presented to the network (i.e. [0,0], [0,1], [1,0],
[1,1]).
➢ Error: The error value is the amount by which the value output by
the network differs from the target value. For example, if we
required the network to output 0 and it outputs 1, then Error = -1.
73
74. ➢ Target Value, T : When we are training a network we not only
present it with the input but also with a value that we require the
network to produce. For example, if we present the network with
[1,1] for the AND function, the training value will be 1.
➢ Output , O : The output value from the neuron.
➢ Ij : Inputs being presented to the neuron.
➢ Wj : Weight from input neuron (Ij) to the output neuron.
➢ LR : The learning rate. This dictates how quickly the network
converges. It is set by a matter of experimentation. It is typically
0.1.
74
75. TRAINING ALGORITHM
➢ Adjust neural network weights to map inputs to outputs.
➢ Use a set of sample patterns where the desired output (given the
inputs presented) is known.
➢ The purpose is to learn to
• Recognize features which are common to good and bad
exemplars
75
77. 77
Summary
- Fundamental and overview of ANN
- Comparison and analogy with BNN
- History of ANN
- Building blocks of ANN
- Architecture of ANN
- Learning Rule
- McCulloch-Pitts Model and realization with logic problems
- Hebb Network & Perceptron Network and learning problems