Artificial Neural Network
Compiled by
Dr. Vaishali Wangikar
What is Artificial Neural Network?
• The term "Artificial Neural Network" is derived from Biological neural
networks that develop the structure of a human brain. Similar to the
human brain that has neurons interconnected to one another,
artificial neural networks also have neurons that are interconnected
to one another in various layers of the networks. These neurons are
known as nodes.
https://www.javatpoint.com/artificial-neural-network
Neural Network
• NN are constructed and implemented to model the human brain.
• Performs various tasks such as pattern-matching, classification,
optimization function, approximation, vector quantization and data
clustering.
• These tasks are difficult for traditional computers
ANN
• ANNs possess a large number of processing elements called
nodes/neurons which operate in parallel.
• Neurons are connected with others by connection link.
• Each link is associated with weights which contain information about
the input signal.
• Each neuron has an internal state of its own which is a function of the
inputs that neuron receives- Activation level
• In short , An artificial neural network consists of a pool of simple
processing units which communicate by sending signals to each other
over a large number of weighted connections.
Artificial Neural Network
 A set of major aspects of a parallel distributed model include:
 a set of processing units (cells).
 a state of activation for every unit, which is equivalent to the output of the
unit.
 connections between the units. Generally each connection is defined by a
weight.
 a propagation rule, which determines the effective input of a unit from its
external inputs.
 an activation function, which determines the new level of activation based
on the effective input and the current activation.
 an external input for each unit.
 a method for information gathering (the learning rule).
 an environment within which the system must operate, providing input
signals and, if necessary, error signals.
Artificial Neural Networks
• The “building blocks” of neural networks are the
neurons.
• In technical systems, we also refer to them as units or nodes.
• Basically, each neuron
 receives input from many other neurons.
 changes its internal state (activation) based on the current
input.
 sends one output signal to many other neurons, possibly
including its input neurons (recurrent network).
Artificial Neural Networks
• Information is transmitted as a series of electric
impulses, so-called spikes.
• The frequency and phase of these spikes encodes the
information.
• In biological systems, one neuron can be connected to as
many as 10,000 other neurons.
• Usually, a neuron receives its information from other
neurons in a confined area, its so-called receptive field.
How do ANNs work?
 An artificial neural network (ANN) is either a hardware
implementation or a computer program which strives to
simulate the information processing capabilities of its biological
exemplar. ANNs are typically composed of a great number of
interconnected artificial neurons. The artificial neurons are
simplified models of their biological counterparts.
 ANN is a technique for solving problems by constructing software
that works like our brains.
How do our brains work?
 The Brain is A massively parallel information processing system.
 Our brains are a huge network of processing elements. A typical
brain contains a network of 10 billion neurons.
How do our brains work?
 A processing element
Dendrites: Input
Cell body: Processor
Synaptic: Link
Axon: Output
From Biological to Artificial Neurons
The Neuron - A Biological Information Processor
• dendrites - the receivers
• soma - neuron cell body (sums input signals)
• axon - the transmitter
• synapse - point of transmission
• neuron activates after a certain threshold is met
Learning occurs via electro-chemical changes in
effectiveness of synaptic junction.
From Biological to Artificial Neurons
An Artificial Neuron - The Perceptron
• simulated on hardware or by software
• input connections - the receivers
• node, unit, or PE simulates neuron body
• output connection - the transmitter
• activation function employs a threshold or bias
• connection weights act as synaptic junctions
Learning occurs via changes in value of the connection
weights.
How do our brains work?
 A processing element
A neuron is connected to other neurons through about 10,000
synapses
How do our brains work?
 A processing element
A neuron receives input from other neurons. Inputs are combined.
How do our brains work?
 A processing element
Once input exceeds a critical level, the neuron discharges a spike ‐
an electrical pulse that travels from the body, down the axon, to
the next neuron(s)
How do our brains work?
 A processing element
The axon endings almost touch the dendrites or cell body of the
next neuron.
How do our brains work?
 A processing element
Transmission of an electrical signal from one neuron to the next is
effected by neurotransmitters.
How do our brains work?
 A processing element
Neurotransmitters are chemicals which are released from
the first neuron and which bind to the second neuron.
How do our brains work?
 A processing element
This link is called a synapse. The strength of the signal that
reaches the next neuron depends on factors such as the amount of
neurotransmitter available.
How do ANNs work?
An artificial neuron is an imitation of a human neuron
How do ANNs work?
• Now, let us have a look at the model of an artificial neuron.
How do ANNs work?
[Diagram: inputs x1, x2, …, xm feed a summation unit ∑ (the processing step), which produces the output y.]
∑ = x1 + x2 + … + xm = y
How do ANNs work?
Not all inputs are equal
[Diagram: inputs x1, x2, …, xm are multiplied by weights w1, w2, …, wm before the summation unit ∑ produces the output y.]
∑ = x1·w1 + x2·w2 + … + xm·wm = y
How do ANNs work?
The signal is not passed down to the next neuron verbatim; it first passes through a transfer function (activation function).
[Diagram: inputs x1, x2, …, xm with weights w1, w2, …, wm feed the summation unit ∑; the sum vk is passed through f(vk) to produce the output y.]
The output is a function of the input, affected by the weights and the transfer function.
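To make the model concrete, here is a minimal Python sketch of such a neuron (the function and variable names are illustrative, not from the slides): the inputs are multiplied by their weights, summed, and passed through a transfer function.

import math

def sigmoid(v):
    # logistic transfer (activation) function
    return 1.0 / (1.0 + math.exp(-v))

def neuron_output(inputs, weights, transfer=sigmoid):
    # weighted sum: v = x1*w1 + x2*w2 + ... + xm*wm
    v = sum(x * w for x, w in zip(inputs, weights))
    # the output is the transfer function applied to the weighted sum
    return transfer(v)

# example: three inputs with unequal weights
print(neuron_output([1.0, 0.5, -1.0], [0.2, 0.8, 0.1]))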
Going step by step: The Artificial Neuron (Perceptron)
[Diagram: inputs a0, a1, a2, …, an plus a constant +1 input, with weights wj0, wj1, wj2, …, wjn, feed the sum Sj; the activation f(Sj) gives the output Xj.]
A Simple Model of a Neuron (Perceptron)
• Each neuron has a threshold value
• Each neuron has weighted inputs from other neurons
• The input signals form a weighted sum
• If the activation level exceeds the threshold, the neuron “fires”
[Diagram: inputs y1, y2, y3, …, yi with weights w1j, w2j, w3j, …, wij feed a summation unit Σ that produces the output O.]
An Artificial Neuron
• Each hidden or output neuron has weighted input connections from each of the units in the preceding layer.
• The unit performs a weighted sum of its inputs, and subtracts its threshold value, to give its activation level.
• The activation level is passed through a sigmoid activation function to determine the output.
[Diagram: inputs y1, y2, y3, …, yi with weights w1j, w2j, w3j, …, wij feed a summation unit Σ followed by f(x), producing the output O.]
Supervised Learning
• Training and test data sets
• Training set: input & target
Perceptron Training
• A linear threshold unit is used.
• W - weight value
• t - threshold value
Output = 1 if Σi wi·xi > t, and 0 otherwise.
Simple network
Output = 1 if Σi wi·xi > t, and 0 otherwise.
[Diagram: AND with a biased input. Inputs X and Y each have weight 1 (W2 = 1, W3 = 1), and a bias input of -1 carries weight W1 = 1.5; threshold t = 0.0.]
Learning algorithm
While epoch produces an error
Present network with next inputs from epoch
Error = T – O
If Error <> 0 then
Wj = Wj + LR * Ij * Error
End If
End While
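The same loop can be written as a short Python sketch. The bias input I1 = -1, threshold t = 0.0, learning rate 0.1 and the initial weights follow the surrounding slides; all other names are illustrative.

# Perceptron training for AND with a biased input, as in the slides
examples = [((-1, 0, 0), 0), ((-1, 0, 1), 0), ((-1, 1, 0), 0), ((-1, 1, 1), 1)]
weights = [0.3, 0.5, -0.4]   # W1 (bias weight), W2, W3 from the worked example
LR = 0.1                     # learning rate

def output(inputs, weights, t=0.0):
    s = sum(i * w for i, w in zip(inputs, weights))
    return 1 if s > t else 0

error_in_epoch = True
while error_in_epoch:                              # While epoch produces an error
    error_in_epoch = False
    for inputs, target in examples:                # Present network with next inputs
        error = target - output(inputs, weights)   # Error = T - O
        if error != 0:                             # If Error <> 0 then
            weights = [w + LR * i * error for w, i in zip(weights, inputs)]
            error_in_epoch = True

print(weights)   # weights that realize the AND function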
Learning algorithm
Epoch : Presentation of the entire training set to the neural
network.
In the case of the AND function an epoch consists
of four sets of inputs being presented to the
network (i.e. [0,0], [0,1], [1,0], [1,1])
Error: The error value is the amount by which the value
output by the network differs from the target
value. For example, if we required the network to
output 0 and it output a 1, then Error = -1
Learning algorithm
Target Value, T : When we are training a network we not
only present it with the input but also with a value
that we require the network to produce. For
example, if we present the network with [1,1] for
the AND function the target value will be 1
Output , O : The output value from the neuron
Ij : Inputs being presented to the neuron
Wj : Weight from input neuron (Ij) to the output neuron
LR : The learning rate. This dictates how quickly the
network converges. It is set by a matter of
experimentation. It is typically 0.1
Training Perceptrons
[Diagram: inputs x and y plus a bias input of -1, with weights W1, W2, W3 still unknown; threshold t = 0.0.]
For AND:
A B | Output
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1
• What are the weight values?
• Initialize with random weight values
Training Perceptrons
[Diagram: bias input -1 with W1 = 0.3, inputs with W2 = 0.5 and W3 = -0.4; threshold t = 0.0.]
I1 I2 I3 | Summation                             | Output
-1  0  0 | (-1*0.3) + (0*0.5) + (0*-0.4) = -0.3  | 0
-1  0  1 | (-1*0.3) + (0*0.5) + (1*-0.4) = -0.7  | 0
-1  1  0 | (-1*0.3) + (1*0.5) + (0*-0.4) = 0.2   | 1
-1  1  1 | (-1*0.3) + (1*0.5) + (1*-0.4) = -0.2  | 0
For AND:
A B | Output
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1
Learning in Neural Networks
• Learn values of weights from I/O pairs
• Start with random weights
• Load training example’s input
• Observe computed output
• Modify weights to reduce difference
• Iterate over all training examples
• Terminate when weights stop changing OR when error is very small
Artificial Neural Networks
 An ANN can:
1. compute any computable function, by the appropriate
selection of the network topology and weight values.
2. learn from experience!
 Specifically, by trial‐and‐error
Learning by trial‐and‐error
Continuous process of:
Trial:
Processing an input to produce an output (In terms of
ANN: Compute the output function of a given input)
Evaluate:
Evaluating this output by comparing the actual
output with the expected output.
Adjust:
Adjust the weights.
How does it work?
 Set initial values of the weights randomly.
 Input: truth table of the XOR
 Do
 Read input (e.g. 0, and 0)
 Compute an output (e.g. 0.60543)
 Compare it to the expected output. (Diff= 0.60543)
 Modify the weights accordingly.
 Loop until a condition is met
 Condition: certain number of iterations
 Condition: error threshold
Design Issues
 Initial weights (small random values ∈[‐1,1])
 Transfer function (How the inputs and the weights are
combined to produce output?)
 Error estimation
 Weights adjusting
 Number of neurons
 Data representation
 Size of training set
Transfer Functions
 Linear: The output is proportional to the total
weighted input.
 Threshold: The output is set at one of two values,
depending on whether the total weighted input is
greater than or less than some threshold value.
 Non‐linear: The output varies continuously but not
linearly as the input changes.
Error Estimation
 The root mean square error (RMSE) is a frequently-
used measure of the differences between values
predicted by a model or an estimator and the values
actually observed from the thing being modeled or
estimated
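As a small illustration (the helper name is made up, not from the slides), the RMSE of a set of predictions can be computed as:

import math

def rmse(predicted, observed):
    # root mean square error between predicted and observed values
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

print(rmse([0.9, 0.1, 0.8], [1.0, 0.0, 1.0]))   # ~0.14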
Weights Adjusting
 After each iteration, weights should be adjusted to
minimize the error.
– All possible weights
– Back propagation
Topologies of Neural Networks
• completely connected
• feedforward (directed, acyclic)
• recurrent (feedback connections)
Basic models of ANN
The basic models of ANN are characterized by three factors: interconnections, learning rules, and the activation function.
Classification based on interconnections
Interconnections:
• Feedforward: single layer, multilayer
• Feedback / Recurrent: single layer, multilayer
Single layer Feedforward Network
Feedforward Network
• Its input and output vectors are x = (x1, …, xm) and o = (o1, …, on) respectively
• Weight wij connects the i’th neuron with the j’th input. The activation rule of the i’th neuron is oi = f(Σj wij·xj), where f is the activation function.
Multilayer feed forward network
Can be used to solve complicated problems
Feedback network
When outputs are directed back as
inputs to same or preceding layer
nodes it results in the formation of
feedback networks
Lateral feedback
If the feedback of the output of the processing elements is directed back
as input to the processing elements in the same layer then it is called
lateral feedback
Recurrent networks
• Single node with own feedback
• Competitive nets
• Single-layer recurrent networks
• Multilayer recurrent networks
Feedback networks with closed loop are called Recurrent Networks. The
response at the k+1’th instant depends on the entire history of the network
starting at k=0.
Automaton: A system with discrete time inputs and a discrete data
representation is called an automaton
Basic models of ANN
The basic models of ANN are characterized by three factors: interconnections, learning rules, and the activation function.
Learning
• It’s a process by which a NN adapts itself to a stimulus by making
proper parameter adjustments, resulting in the production of desired
response
• Two kinds of learning
• Parameter learning:- connection weights are updated
• Structure Learning:- change in network structure
Training
• The process of modifying the weights in the connections between
network layers with the objective of achieving the expected output is
called training a network.
• This is achieved through
• Supervised learning
• Unsupervised learning
• Reinforcement learning
Classification of learning
• Supervised learning
• Unsupervised learning
• Reinforcement learning
Supervised Learning
• Child learns from a teacher
• Each input vector requires a corresponding target vector.
• Training pair=[input vector, target vector]
[Diagram: input X feeds the neural network with weights W, producing the actual output Y; an error signal generator compares Y with the desired output D and sends error signals (D - Y) back to adjust the weights.]
Supervised learning contd.
Supervised learning minimizes the error between the actual and desired outputs.
Unsupervised Learning
• How a fish or tadpole learns
• All similar input patterns are grouped together as
clusters.
• If a matching input pattern is not found a new cluster
is formed
Unsupervised learning
Self-organizing
• In unsupervised learning there is no feedback
• The network must discover patterns, regularities and features of the input data on its own
• While doing so the network adjusts its parameters
• This process is called self-organizing
Reinforcement Learning
[Diagram: input X feeds the neural network with weights W, producing the actual output Y; an error signal generator receives the reinforcement signal R and sends error signals back to the network.]
When Reinforcement learning is used?
• If less information is available about the target output values (critic
information)
• Learning based on this critic information is called reinforcement
learning and the feedback sent is called reinforcement signal
• Feedback in this case is only evaluative and not instructive
Basic models of ANN
The basic models of ANN are characterized by three factors: interconnections, learning rules, and the activation function.
Activation Function
1. Identity function: f(x) = x for all x
2. Binary step function: f(x) = 1 if x ≥ θ, and 0 if x < θ (θ is the threshold)
3. Bipolar step function: f(x) = 1 if x ≥ θ, and -1 if x < θ
4. Sigmoidal functions: continuous functions
5. Ramp function: f(x) = 1 if x > 1, f(x) = x if 0 ≤ x ≤ 1, and f(x) = 0 if x < 0
Activation functions
• Transforms neuron’s input into output.
• Features of activation functions:
• A squashing effect is required
• Prevents accelerating growth of activation
levels through the network.
• Simple and easy to calculate
Standard activation functions
• The hard-limiting threshold function
– Corresponds to the biological paradigm
• either fires or not
• Sigmoid functions ('S'-shaped curves)
– The logistic function
– The hyperbolic tangent (symmetrical)
– Both functions have a simple differential
– Only the shape is important
f(x) = 1 / (1 + e^(-ax))
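Both S-shaped functions have simple derivatives, as this sketch shows (a is the slope parameter from the formula above; the function names are illustrative):

import math

def logistic(x, a=1.0):
    return 1.0 / (1.0 + math.exp(-a * x))

def logistic_deriv(x, a=1.0):
    # f'(x) = a * f(x) * (1 - f(x))
    fx = logistic(x, a)
    return a * fx * (1.0 - fx)

def tanh_deriv(x):
    # d/dx tanh(x) = 1 - tanh(x)**2
    return 1.0 - math.tanh(x) ** 2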
Some learning algorithms we will learn are
• Supervised:
• Adaline, Madaline
• Perceptron
• Back Propagation
• multilayer perceptrons
• Radial Basis Function Networks
• Unsupervised
• Competitive Learning
• Kohonen self-organizing map
• Learning vector quantization
• Hebbian learning
Neural processing
• Recall:- processing phase for a NN and its objective is to retrieve the
information. The process of computing o for a given x
• Basic forms of neural information processing
• Auto association
• Hetero association
• Classification
Neural processing-Autoassociation
• Set of patterns can be stored in
the network
• If a pattern similar to a member of
the stored set is presented, an
association with the closest stored
pattern is made
Neural Processing- Heteroassociation
• Associations between pairs of
patterns are stored
• Distorted input pattern may cause
correct heteroassociation at the
output
Neural processing-Classification
• Set of input patterns is divided
into a number of classes or
categories
• In response to an input pattern
from the set, the classifier is
supposed to recall the
information regarding class
membership of the input pattern.
Important terminologies of ANNs
• Weights
• Bias
• Threshold
• Learning rate
• Momentum factor
• Vigilance parameter
• Notations used in ANN
Weights
• Each neuron is connected to every other neuron by means of directed
links
• Links are associated with weights
• Weights contain information about the input signal and are
represented as a matrix
• Weight matrix also called connection matrix
Weight matrix
W = [w1ᵀ, w2ᵀ, w3ᵀ, …, wnᵀ]ᵀ =
| w11 w12 w13 … w1m |
| w21 w22 w23 … w2m |
| …                 |
| wn1 wn2 wn3 … wnm |
Weights contd…
• wij –is the weight from processing element ”i” (source node) to
processing element “j” (destination node)
[Diagram: inputs X1, …, Xi, …, Xn with weights w1j, …, wij, …, wnj and a bias bj (from a unit input x0 = 1) feed the neuron Yj.]
The net input to neuron Yj is
y_in,j = Σ (i = 0 to n) xi·wij = x0·w0j + x1·w1j + x2·w2j + … + xn·wnj
       = bj + Σ (i = 1 to n) xi·wij   (since x0 = 1 and w0j = bj)
Activation Functions
• Used to calculate the output response of a neuron.
• The sum of the weighted input signals is applied with an activation function to obtain the response.
• Activation functions can be linear or non-linear
• Already dealt with:
• Identity function
• Single/binary step function
• Discrete/continuous sigmoidal function.
Bias
• Bias is like another weight. It is included by adding a component x0=1
to the input vector X.
• X=(1,X1,X2…Xi,…Xn)
• Bias is of two types
• Positive bias: increase the net input
• Negative bias: decrease the net input
Why Bias is required?
• The relationship between input and output is given by
the equation of a straight line, y = mx + c
[Diagram: input X maps to output Y through y = mx + C, where C is the bias (the intercept).]
Threshold
• Set value based upon which the final output of the
network may be calculated
• Used in activation function
• The activation function using threshold can be
defined as
f(net) = 1 if net ≥ θ, and -1 if net < θ   (θ is the threshold)
Learning rate
• Denoted by α.
• Used to control the amount of weight adjustment at each step of
training
• Learning rate ranging from 0 to 1 determines the rate of learning in
each time step
Learning rate
• The learning rate defines the size of the corrective steps that the model
takes to adjust for errors in each observation.
• A high learning rate shortens the training time, but with lower ultimate
accuracy, while a lower learning rate takes longer, but with the potential for
greater accuracy.
• Optimizations such as Quickprop are primarily aimed at speeding up error
minimization, while other improvements mainly try to increase reliability.
• In order to avoid oscillation inside the network such as alternating
connection weights, and to improve the rate of convergence, refinements
use an adaptive learning rate that increases or decreases as appropriate.
• (From Wikipedia)
Learning Rate
• Neural networks are often trained by gradient descent on the
weights. This means at each iteration we use backpropagation to
calculate the derivative of the loss function with respect to each
weight and subtract it from that weight.
However, if you actually try that, the weights will change far too much
each iteration, which will make them “overcorrect” and the loss will
actually increase/diverge. So in practice, people usually multiply each
derivative by a small value called the “learning rate” before they
subtract it from its corresponding weight.
• w1_new = w1 - (learning rate) * (derivative of cost function wrt w1)
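As a tiny illustrative sketch of one such update step (names assumed, not from the slides):

def gradient_descent_step(weight, gradient, learning_rate=0.01):
    # scale the derivative by the small learning rate before subtracting it
    return weight - learning_rate * gradient

# a weight of 0.8 with dLoss/dw = 2.5 moves only slightly:
print(gradient_descent_step(0.8, 2.5))   # 0.775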
Learning Rate
• Stochastic gradient descent is an optimization algorithm that estimates the
error gradient for the current state of the model using examples from the
training dataset, then updates the weights of the model using the back-
propagation of errors algorithm, referred to as simply backpropagation.
• The amount that the weights are updated during training is referred to as
the step size or the “learning rate.”
• Specifically, the learning rate is a configurable hyperparameter used in the
training of neural networks that has a small positive value, often in the
range between 0.0 and 1.0.
• The learning rate is one of the most important hyper-parameters to tune
for training deep neural networks.
Learning Rate
• If the learning rate is low, then training is more reliable, but
optimization will take a lot of time because steps towards the
minimum of the loss function are tiny.
• If the learning rate is high, then training may not converge or even
diverge. Weight changes can be so big that the optimizer overshoots
the minimum and makes the loss worse.
Learning Rate –Variable learning rate
• The learning rate need not be constant across all the layers of a neural
network; it may differ from layer to layer. This helps avoid the vanishing
gradient problem: as weight changes are backpropagated toward the first layer,
many derivatives (each typically with magnitude less than 1) are multiplied
together, so the products become ever smaller, the early weights may stop
changing, and learning saturates prematurely. Assigning a variable learning
rate to each layer mitigates this.
A systematic approach towards finding the optimal
learning rate
1. Start with a high learning rate and steadily decrease it. Changes in
the weight vector must be small in order to reduce oscillations or any
divergence.
2. A simple suggestion is to increase the learning rate when performance
improves and to decrease it when performance worsens.
3. Another method is to double the learning rate until the error value
worsens.
A systematic approach towards finding the optimal
learning rate
• Ultimately, we'd like a learning rate which results in a steep decrease
in the network's loss.
• We can observe this by performing a simple experiment where we
gradually increase the learning rate after each mini batch, recording
the loss at each increment.
• This gradual increase can be on either a linear or exponential scale.
• For learning rates which are too low, the loss may decrease, but at a
very shallow rate.
• When entering the optimal learning rate zone, you'll observe a quick
drop in the loss function. Increasing the learning rate further will
cause an increase in the loss as the parameter updates cause the loss
to "bounce around" and even diverge from the minima.
• Remember, the best learning rate is associated with the steepest
drop in loss, so we're mainly interested in analyzing the slope of the
plot.
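A hedged sketch of that experiment; train_one_minibatch is an assumed helper that trains on one mini-batch at the given learning rate and returns the loss, and the exponential schedule and bounds are illustrative:

def learning_rate_range_test(train_one_minibatch, lr_min=1e-6, lr_max=1.0, steps=100):
    # gradually increase the learning rate on an exponential scale,
    # recording the loss at each increment
    factor = (lr_max / lr_min) ** (1.0 / (steps - 1))
    lr, history = lr_min, []
    for _ in range(steps):
        loss = train_one_minibatch(lr)   # assumed helper: one mini-batch, returns loss
        history.append((lr, loss))
        lr *= factor
    # the best learning rate is where the loss drops most steeply
    return history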
Two types of learning
• 1. Sequential or per-pattern method
• 2. Batch or per-epoch method
• In sequential learning a given input pattern is propagated forward, the
error is determined and backpropagated, and the weights are updated immediately.
• In batch learning the weights are updated only after the entire set of
training patterns has been presented to the network. Thus the weight
update is only performed after every epoch.
• If P is the number of patterns in one epoch, then
  ∆w = (1/P) Σ (p = 1 to P) ∆wp
• This method has a smoothing effect (see the sketch below).
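A minimal sketch of the two modes, assuming a helper weight_change(w, pattern) that returns the per-pattern update ∆wp:

def sequential_update(w, patterns, weight_change):
    # per-pattern (sequential) learning: apply each update immediately
    for p in patterns:
        w = w + weight_change(w, p)
    return w

def batch_update(w, patterns, weight_change):
    # batch (per-epoch) learning: average the per-pattern updates, apply once per epoch
    delta = sum(weight_change(w, p) for p in patterns) / len(patterns)
    return w + delta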
When to stop backpropagation?
• Continue as long as the error on the validation set decreases.
• Whenever the error begins to increase, the net is starting to memorise
the training patterns and the training is terminated.
How to choose hidden neurons
• There are many rule-of-thumb methods for determining an
acceptable number of neurons to use in the hidden layers, such as the
following:
1. The number of hidden neurons should be between the size of the
input layer and the size of the output layer.
2. The number of hidden neurons should be 2/3 the size of the input
layer, plus the size of the output layer.
3. The number of hidden neurons should be less than twice the size of
the input layer.
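For illustration only, the three rules of thumb above can be written as a tiny helper (names and output format are made up):

def hidden_neuron_suggestions(n_inputs, n_outputs):
    return {
        "between input and output size": (min(n_inputs, n_outputs), max(n_inputs, n_outputs)),
        "2/3 of input size plus output size": round(2 * n_inputs / 3 + n_outputs),
        "less than twice the input size": 2 * n_inputs - 1,
    }

print(hidden_neuron_suggestions(n_inputs=10, n_outputs=3))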
How to choose the number of hidden layers
Number of hidden layers | Result
none | Only capable of representing linearly separable functions or decisions.
1    | Can approximate any function that contains a continuous mapping from one finite space to another.
2    | Can represent an arbitrary decision boundary to arbitrary accuracy with rational activation functions and can approximate any smooth mapping to any accuracy.
>2   | Additional layers can learn complex representations (a sort of automatic feature engineering) for later layers.
Other terminologies
• Momentum factor:
• used for convergence when momentum factor is added to weight updation
process.
• Vigilance parameter:
• Denoted by ρ
• Used to control the degree of similarity required for patterns to be assigned
to the same cluster
Neural Network Learning rules
c – learning constant
Hebbian Learning Rule
• The learning signal is equal to the neuron’s output
FEED FORWARD UNSUPERVISED LEARNING
Features of Hebbian Learning
• Feedforward unsupervised learning
• “When an axon of a cell A is near enough to excite a cell B and
repeatedly and persistently takes part in firing it, some growth
process or change takes place in one or both cells, increasing the
efficiency”
• If oi·xj is positive the result is an increase in the weight; otherwise the weight decreases (see the sketch below)
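A one-line sketch of the resulting weight update (c is the learning constant mentioned with the learning rules; names are illustrative):

def hebbian_update(w_ij, c, o_i, x_j):
    # delta w = c * (neuron output o_i) * (input x_j);
    # a positive o_i * x_j increases the weight, a negative one decreases it
    return w_ij + c * o_i * x_j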
Perceptron Learning rule
• Learning signal is the difference between the desired
and actual neuron’s response
• Learning is supervised
Delta Learning Rule
• Only valid for continuous activation function
• Used in supervised training mode
• Learning signal for this rule is called delta
• The aim of the delta rule is to minimize the error over all training patterns
Delta Learning Rule Contd.
Learning rule is derived from the condition of least squared error.
Calculating the gradient vector with respect to wi
Minimization of error requires the weight changes to be in the negative
gradient direction
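A sketch of one delta-rule step for a single neuron with a sigmoid activation, moving the weights in the negative gradient direction of the squared error (variable names and the sigmoid choice are illustrative):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def delta_rule_step(weights, x, d, c=0.1):
    # net input and continuous output of the neuron
    net = sum(w * xi for w, xi in zip(weights, x))
    o = sigmoid(net)
    # learning signal ("delta"): (d - o) * f'(net), with f'(net) = o * (1 - o)
    delta = (d - o) * o * (1.0 - o)
    # weight change proportional to delta and the input, scaled by the learning constant c
    return [w + c * delta * xi for w, xi in zip(weights, x)]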
Widrow-Hoff learning Rule
• Also called the least mean square (LMS) learning rule
• Introduced by Widrow(1962), used in supervised learning
• Independent of the activation function
• Special case of delta learning rule wherein activation function is an
identity function ie f(net)=net
• Minimizes the squared error between the desired output value di and neti
Winner-Take-All learning rules
Winner-Take-All Learning rule Contd…
• Can be explained for a layer of neurons
• Example of competitive learning and used for
unsupervised network training
• Learning is based on the premise that one of the
neurons in the layer has the maximum response to
the input x
• This neuron is declared the winner, and only its weights are updated (moved toward x), as sketched below
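A sketch of one winner-take-all step; the learning constant alpha and the update form (moving the winner toward x) are the standard competitive-learning choices, stated here as assumptions:

def winner_take_all_step(W, x, alpha=0.1):
    # W is a list of weight vectors, one per neuron in the layer
    responses = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in W]
    winner = responses.index(max(responses))   # neuron with the maximum response to x
    # only the winning neuron's weights are updated, moved toward the input x
    W[winner] = [w_i + alpha * (x_i - w_i) for w_i, x_i in zip(W[winner], x)]
    return winner, W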
Summary of learning rules
Neural Network: Weaknesses and Strengths
• Weakness
• Long training time
• Require a number of parameters typically best determined empirically, e.g., the
network topology or “structure.”
• Poor interpretability: Difficult to interpret the symbolic meaning behind the learned
weights and of “hidden units” in the network
• Strength
• High tolerance to noisy data
• Ability to classify untrained patterns
• Well-suited for continuous-valued inputs and outputs
• Successful on an array of real-world data, e.g., hand-written letters
• Algorithms are inherently parallel
• Techniques have recently been developed for the extraction of rules from trained
neural networks
Summary: A Multi-Layer Feed-Forward Neural Network
[Diagram: the input vector X enters the input layer; weighted connections wij lead to a hidden layer and then to the output layer, which produces the output vector.]
Weight update rule: wj(k+1) = wj(k) + λ (yi - ŷi(k)) xij
Summary: How a Multi-Layer Neural Network Works
• The inputs to the network correspond to the attributes measured for each
training tuple
• Inputs are fed simultaneously into the units making up the input layer
• They are then weighted and fed simultaneously to a hidden layer
• The number of hidden layers is arbitrary, although usually only one
• The weighted outputs of the last hidden layer are input to units making up
the output layer, which emits the network's prediction
• The network is feed-forward: None of the weights cycles back to an input
unit or to an output unit of a previous layer
• From a statistical point of view, networks perform nonlinear regression:
Given enough hidden units and enough training samples, they can closely
approximate any function
Summary: Defining a Network Topology
• Decide the network topology: Specify # of units in the input
layer, # of hidden layers (if > 1), # of units in each hidden layer,
and # of units in the output layer
• Normalize the input values for each attribute measured in the
training tuples to [0.0—1.0]
• One input unit per domain value, each initialized to 0
• Output: for classification with more than two classes, one
output unit per class is used
• If a trained network's accuracy is unacceptable, repeat the
training process with a different network topology or a
different set of initial weights
Summary: Backpropagation
• Iteratively process a set of training tuples & compare the network's prediction
with the actual known target value
• For each training tuple, the weights are modified to minimize the mean
squared error between the network's prediction and the actual target value
• Modifications are made in the “backwards” direction: from the output layer,
through each hidden layer down to the first hidden layer, hence
“backpropagation”
• Steps
• Initialize weights to small random numbers, associated with biases
• Propagate the inputs forward (by applying activation function)
• Backpropagate the error (by updating weights and biases)
• Terminating condition (when error is very small, etc.)
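A compact, purely illustrative Python sketch of those steps for a one-hidden-layer network with sigmoid units, trained one tuple at a time (not the lecture's notation):

import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(x, target, W_h, b_h, W_o, b_o, lr=0.5):
    # propagate the inputs forward through the hidden and output layers
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W_h, b_h)]
    o = [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b) for row, b in zip(W_o, b_o)]
    # backpropagate the error: output layer first, then the hidden layer
    err_o = [(t - ok) * ok * (1 - ok) for t, ok in zip(target, o)]
    err_h = [hj * (1 - hj) * sum(err_o[k] * W_o[k][j] for k in range(len(o)))
             for j, hj in enumerate(h)]
    # update weights and biases
    for k in range(len(o)):
        W_o[k] = [w + lr * err_o[k] * hj for w, hj in zip(W_o[k], h)]
        b_o[k] += lr * err_o[k]
    for j in range(len(h)):
        W_h[j] = [w + lr * err_h[j] * xi for w, xi in zip(W_h[j], x)]
        b_h[j] += lr * err_h[j]
    return o

# example: a 2-2-1 network, weights initialized to small random numbers,
# terminating after a fixed number of passes over one training tuple
random.seed(0)
W_h = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)]
b_h = [0.0, 0.0]
W_o = [[random.uniform(-0.5, 0.5) for _ in range(2)]]
b_o = [0.0]
for _ in range(1000):
    train_step([1.0, 0.0], [1.0], W_h, b_h, W_o, b_o)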
Summary: Neuron: A Hidden/Output Layer Unit
• An n-dimensional input vector x is mapped into variable y by means of the
scalar product and a nonlinear function mapping
• The inputs to the unit are outputs from the previous layer. They are multiplied by
their corresponding weights to form a weighted sum, which is added to the bias
associated with the unit. Then a nonlinear activation function is applied to it.
[Diagram: input vector x = (x0, x1, …, xn) with weight vector w = (w0, w1, …, wn) and bias μk feeds a weighted-sum node followed by the activation function f, producing the output y.]
For example: y = sign( Σ (i = 0 to n) wi·xi + μk )
Summary: Efficiency and Interpretability
• Efficiency of backpropagation: Each epoch (one iteration through the training
set) takes O(|D| * w), with |D| tuples and w weights, but # of epochs can be
exponential to n, the number of inputs, in worst case
• For easier comprehension: Rule extraction by network pruning
• Simplify the network structure by removing weighted links that have the
least effect on the trained network
• Then perform link, unit, or activation value clustering
• The set of input and activation values are studied to derive rules
describing the relationship between the input and hidden unit layers
• Sensitivity analysis: assess the impact that a given input variable has on a
network output. The knowledge gained from this analysis can be represented
in rules
References
 Craig Heller and David Sadava, Life: The Science of Biology, fifth edition, Sinauer Associates, Inc., USA, 1998.
 Nicolas Galoppo von Borries, Introduction to Artificial Neural Networks.
 Tom M. Mitchell, Machine Learning, WCB McGraw-Hill, Boston, 1997.
 C. M. Bishop, Neural Networks for Pattern Recognition, 1996.
 Jiawei Han, Micheline Kamber, and Jian Pei, University of Illinois at Urbana-Champaign & Simon Fraser University, ©2011 Han, Kamber & Pei.
More Related Content

Similar to Artificial Neural Network_VCW (1).pptx

Neural networks
Neural networksNeural networks
Neural networksBasil John
 
what is neural network....???
what is neural network....???what is neural network....???
what is neural network....???Adii Shah
 
Neural networks of artificial intelligence
Neural networks of artificial  intelligenceNeural networks of artificial  intelligence
Neural networks of artificial intelligencealldesign
 
Soft Computing-173101
Soft Computing-173101Soft Computing-173101
Soft Computing-173101AMIT KUMAR
 
artificial-neural-networks-rev.ppt
artificial-neural-networks-rev.pptartificial-neural-networks-rev.ppt
artificial-neural-networks-rev.pptRINUSATHYAN
 
artificial-neural-networks-rev.ppt
artificial-neural-networks-rev.pptartificial-neural-networks-rev.ppt
artificial-neural-networks-rev.pptSanaMateen7
 
neuralnetwork.pptx
neuralnetwork.pptxneuralnetwork.pptx
neuralnetwork.pptxSherinRappai
 
Artificial Neural Network in Medical Diagnosis
Artificial Neural Network in Medical DiagnosisArtificial Neural Network in Medical Diagnosis
Artificial Neural Network in Medical DiagnosisAdityendra Kumar Singh
 
Artificial neural networks
Artificial neural networksArtificial neural networks
Artificial neural networksAkashRanjandas1
 
Artificial Neural Networks ppt.pptx for final sem cse
Artificial Neural Networks  ppt.pptx for final sem cseArtificial Neural Networks  ppt.pptx for final sem cse
Artificial Neural Networks ppt.pptx for final sem cseNaveenBhajantri1
 
Acem neuralnetworks
Acem neuralnetworksAcem neuralnetworks
Acem neuralnetworksAastha Kohli
 
20200428135045cfbc718e2c.pdf
20200428135045cfbc718e2c.pdf20200428135045cfbc718e2c.pdf
20200428135045cfbc718e2c.pdfTitleTube
 
Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural NetworkRenas Rekany
 
Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural Networkssuserab4f3e
 

Similar to Artificial Neural Network_VCW (1).pptx (20)

Neural networks
Neural networksNeural networks
Neural networks
 
what is neural network....???
what is neural network....???what is neural network....???
what is neural network....???
 
Neural networks of artificial intelligence
Neural networks of artificial  intelligenceNeural networks of artificial  intelligence
Neural networks of artificial intelligence
 
Soft Computing-173101
Soft Computing-173101Soft Computing-173101
Soft Computing-173101
 
19_Learning.ppt
19_Learning.ppt19_Learning.ppt
19_Learning.ppt
 
artificial-neural-networks-rev.ppt
artificial-neural-networks-rev.pptartificial-neural-networks-rev.ppt
artificial-neural-networks-rev.ppt
 
artificial-neural-networks-rev.ppt
artificial-neural-networks-rev.pptartificial-neural-networks-rev.ppt
artificial-neural-networks-rev.ppt
 
Neural network
Neural networkNeural network
Neural network
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 
neuralnetwork.pptx
neuralnetwork.pptxneuralnetwork.pptx
neuralnetwork.pptx
 
neuralnetwork.pptx
neuralnetwork.pptxneuralnetwork.pptx
neuralnetwork.pptx
 
Artificial Neural Network in Medical Diagnosis
Artificial Neural Network in Medical DiagnosisArtificial Neural Network in Medical Diagnosis
Artificial Neural Network in Medical Diagnosis
 
Artificial neural networks
Artificial neural networksArtificial neural networks
Artificial neural networks
 
Artificial Neural Networks ppt.pptx for final sem cse
Artificial Neural Networks  ppt.pptx for final sem cseArtificial Neural Networks  ppt.pptx for final sem cse
Artificial Neural Networks ppt.pptx for final sem cse
 
10-Perceptron.pdf
10-Perceptron.pdf10-Perceptron.pdf
10-Perceptron.pdf
 
Acem neuralnetworks
Acem neuralnetworksAcem neuralnetworks
Acem neuralnetworks
 
UNIT-3 .PPTX
UNIT-3 .PPTXUNIT-3 .PPTX
UNIT-3 .PPTX
 
20200428135045cfbc718e2c.pdf
20200428135045cfbc718e2c.pdf20200428135045cfbc718e2c.pdf
20200428135045cfbc718e2c.pdf
 
Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural Network
 
Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural Network
 

Recently uploaded

GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .Satyam Kumar
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 

Recently uploaded (20)

GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 

Artificial Neural Network_VCW (1).pptx

  • 1. Artificial Neural Network Complied by Dr. Vaishali Wangikar
  • 2. What is Artificial Neural Network? • The term "Artificial Neural Network" is derived from Biological neural networks that develop the structure of a human brain. Similar to the human brain that has neurons interconnected to one another, artificial neural networks also have neurons that are interconnected to one another in various layers of the networks. These neurons are known as nodes. https://www.javatpoint.com/artificial-neural-network
  • 3. Neural Network • NN are constructed and implemented to model the human brain. • Performs various tasks such as pattern-matching, classification, optimization function, approximation, vector quantization and data clustering. • These tasks are difficult for traditional computers
  • 4. ANN • ANN posess a large number of processing elements called nodes/neurons which operate in parallel. • Neurons are connected with others by connection link. • Each link is associated with weights which contain information about the input signal. • Each neuron has an internal state of its own which is a function of the inputs that neuron receives- Activation level • In short , An artificial neural network consists of a pool of simple processing units which communicate by sending signals to each other over a large number of weighted connections.
  • 5. Artificial Neural Network  A set of major aspects of a parallel distributed model include:  a set of processing units (cells).  a state of activation for every unit, which equivalent to the output of the unit.  connections between the units. Generally each connection is defined by a weight.  a propagation rule, which determines the effective input of a unit from its external inputs.  an activation function, which determines the new level of activation based on the effective input and the current activation.  an external input for each unit.  a method for information gathering (the learning rule).  an environment within which the system must operate, providing input signals and _ if necessary _ error signals.
  • 6. Artificial Neural Networks • The “building blocks” of neural networks are the neurons. • In technical systems, we also refer to them as units or nodes. • Basically, each neuron  receives input from many other neurons.  changes its internal state (activation) based on the current input.  sends one output signal to many other neurons, possibly including its input neurons (recurrent network).
  • 7. Artificial Neural Networks • Information is transmitted as a series of electric impulses, so-called spikes. • The frequency and phase of these spikes encodes the information. • In biological systems, one neuron can be connected to as many as 10,000 other neurons. • Usually, a neuron receives its information from other neurons in a confined area, its so-called receptive field.
  • 8. How do ANNs work?  An artificial neural network (ANN) is either a hardware implementation or a computer program which strives to simulate the information processing capabilities of its biological exemplar. ANNs are typically composed of a great number of interconnected artificial neurons. The artificial neurons are simplified models of their biological counterparts.  ANN is a technique for solving problems by constructing software that works like our brains.
  • 9. How do our brains work?  The Brain is A massively parallel information processing system.  Our brains are a huge network of processing elements. A typical brain contains a network of 10 billion neurons.
  • 10. How do our brains work?  A processing element Dendrites: Input Cell body: Processor Synaptic: Link Axon: Output
  • 11. From Biological to Artificial Neurons The Neuron - A Biological Information Processor • dentrites - the receivers • soma - neuron cell body (sums input signals) • axon - the transmitter • synapse - point of transmission • neuron activates after a certain threshold is met Learning occurs via electro-chemical changes in effectiveness of synaptic junction.
  • 12. From Biological to Artificial Neurons An Artificial Neuron - The Perceptron • simulated on hardware or by software • input connections - the receivers • node, unit, or PE simulates neuron body • output connection - the transmitter • activation function employs a threshold or bias • connection weights act as synaptic junctions Learning occurs via changes in value of the connection weights.
  • 13. From Biological to Artificial Neurons An Artificial Neuron - The Perceptron • simulated on hardware or by software • input connections - the receivers • node, unit, or PE simulates neuron body • output connection - the transmitter • activation function employs a threshold or bias • connection weights act as synaptic junctions Learning occurs via changes in value of the connection weights.
  • 14. How do our brains work?  A processing element A neuron is connected to other neurons through about 10,000 synapses
  • 15. How do our brains work?  A processing element A neuron receives input from other neurons. Inputs are combined.
  • 16. How do our brains work?  A processing element Once input exceeds a critical level, the neuron discharges a spike ‐ an electrical pulse that travels from the body, down the axon, to the next neuron(s)
  • 17. How do our brains work?  A processing element The axon endings almost touch the dendrites or cell body of the next neuron.
  • 18. How do our brains work?  A processing element Transmission of an electrical signal from one neuron to the next is effected by neurotransmitters.
  • 19. How do our brains work?  A processing element Neurotransmitters are chemicals which are released from the first neuron and which bind to the Second.
  • 20. How do our brains work?  A processing element This link is called a synapse. The strength of the signal that reaches the next neuron depends on factors such as the amount of neurotransmitter available.
  • 21. How do ANNs work? An artificial neuron is an imitation of a human neuron
  • 22. How do ANNs work? • Now, let us have a look at the model of an artificial neuron.
  • 23. How do ANNs work? Output x1 x2 xm ∑ y Processing Input ∑= X1+X2 + ….+Xm =y . . . . . . . . . . . .
  • 24. How do ANNs work? Not all inputs are equal Output x1 x2 xm ∑ y Processing Input ∑= X1w1+X2w2 + ….+Xmwm =y w1 w2 w m weights . . . . . . . . . . . . . . . . .
  • 25. How do ANNs work? The signal is not passed down to the next neuron verbatim Transfer Function (Activation Function) Output x1 x2 xm ∑ y Processing Input w1 w2 wm weights . . . . . . . . . . . . f(vk) . . . . .
  • 26. The output is a function of the input, that is affected by the weights, and the transfer functions
  • 27. 27 Sj f(Sj) Xj ao a1 a2 an +1 wj0 wj1 wj2 wjn Going step by step : The Artificial Neuron (Perceptron)
  • 28. 28 A Simple Model of a Neuron (Perceptron) • Each neuron has a threshold value • Each neuron has weighted inputs from other neurons • The input signals form a weighted sum • If the activation level exceeds the threshold, the neuron “fires” w1j w2j w3j wij y1 y2 y3 yi  O
  • 29. 29 An Artificial Neuron • Each hidden or output neuron has weighted input connections from each of the units in the preceding layer. • The unit performs a weighted sum of its inputs, and subtracts its threshold value, to give its activation level. • Activation level is passed through a sigmoid activation function to determine output. w1j w2j w3j wij y1 y2 y3 yi  f(x) O
  • 30. 30 Supervised Learning • Training and test data sets • Training set; input & target
  • 31. 31 Perceptron Training • Linear threshold is used. • W - weight value • t - threshold value 1 if  wi xi >t Output= 0 otherwise { i=0
  • 32. 32 Simple network 1 if  wixi >t output= 0 otherwise { i=0 t = 0.0 Y X W1 = 1.5 W3 = 1 -1 AND with a Biased input W2 = 1
  • 33. 33 Learning algorithm While epoch produces an error Present network with next inputs from epoch Error = T – O If Error <> 0 then Wj = Wj + LR * Ij * Error End If End While
  • 34. 34 Learning algorithm Epoch : Presentation of the entire training set to the neural network. In the case of the AND function an epoch consists of four sets of inputs being presented to the network (i.e. [0,0], [0,1], [1,0], [1,1]) Error: The error value is the amount by which the value output by the network differs from the target value. For example, if we required the network to output 0 and it output a 1, then Error = -1
  • 35. 35 Learning algorithm Target Value, T : When we are training a network we not only present it with the input but also with a value that we require the network to produce. For example, if we present the network with [1,1] for the AND function the target value will be 1 Output , O : The output value from the neuron Ij : Inputs being presented to the neuron Wj : Weight from input neuron (Ij) to the output neuron LR : The learning rate. This dictates how quickly the network converges. It is set by a matter of experimentation. It is typically 0.1
  • 36. 36 Training Perceptrons t = 0.0 y x -1 W1 = ? W3 = ? W2 = ? For AND A B Output 0 0 0 0 1 0 1 0 0 1 1 1 •What are the weight values? •Initialize with random weight values
  • 37. 37 Training Perceptrons t = 0.0 y x -1 W1 = 0.3 W3 =-0.4 W2 = 0.5 I1 I2 I3 Summation Output -1 0 0 (-1*0.3) + (0*0.5) + (0*-0.4) = -0.3 0 -1 0 1 (-1*0.3) + (0*0.5) + (1*-0.4) = -0.7 0 -1 1 0 (-1*0.3) + (1*0.5) + (0*-0.4) = 0.2 1 -1 1 1 (-1*0.3) + (1*0.5) + (1*-0.4) = -0.2 0 For AND A B Output 0 0 0 0 1 0 1 0 0 1 1 1
  • 38. 38 Learning in Neural Networks • Learn values of weights from I/O pairs • Start with random weights • Load training example’s input • Observe computed input • Modify weights to reduce difference • Iterate over all training examples • Terminate when weights stop changing OR when error is very small
  • 39. Artificial Neural Networks  An ANN can: 1. compute any computable function, by the appropriate selection of the network topology and weights values. 2. learn from experience!  Specifically, by trial‐and‐error
  • 40. Learning by trial‐and‐error Continuous process of: Trial: Processing an input to produce an output (In terms of ANN: Compute the output function of a given input) Evaluate: Evaluating this output by comparing the actual output with the expected output. Adjust: Adjust the weights.
  • 41. How it works?  Set initial values of the weights randomly.  Input: truth table of the XOR  Do  Read input (e.g. 0, and 0)  Compute an output (e.g. 0.60543)  Compare it to the expected output. (Diff= 0.60543)  Modify the weights accordingly.  Loop until a condition is met  Condition: certain number of iterations  Condition: error threshold
  • 42. Design Issues  Initial weights (small random values ∈[‐1,1])  Transfer function (How the inputs and the weights are combined to produce output?)  Error estimation  Weights adjusting  Number of neurons  Data representation  Size of training set
  • 43. Transfer Functions  Linear: The output is proportional to the total weighted input.  Threshold: The output is set at one of two values, depending on whether the total weighted input is greater than or less than some threshold value.  Non‐linear: The output varies continuously but not linearly as the input changes.
  • 44. Error Estimation  The root mean square error (RMSE) is a frequently- used measure of the differences between values predicted by a model or an estimator and the values actually observed from the thing being modeled or estimated
  • 45. Weights Adjusting  After each iteration, weights should be adjusted to minimize the error. – All possible weights – Back propagation
  • 46. Topologies of Neural Networks completely connected feedforward (directed, a-cyclic) recurrent (feedback connections)
  • 47. Basic models of ANN Basic Models of ANN Interconnections Learning rules Activation function
  • 48. Classification based on interconnections Interconnections Feed forward Single layer Multilayer Feed Back Recurrent Single layer Multilayer
  • 50. Feedforward Network • Its input and output vectors are x = (x1, x2, …, xm) and o = (o1, o2, …, on) respectively • Weight wij connects the i’th neuron with the j’th input. The activation rule of the i’th neuron is oi = f(Σj wij xj), where f is the activation function
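A short sketch of that activation rule for one layer, assuming a logistic activation function f and a weight matrix stored as one row per neuron; the weight values are illustrative only.

```python
import math

def feedforward_layer(W, x, f=lambda net: 1.0 / (1.0 + math.exp(-net))):
    """Compute o_i = f(sum_j w_ij * x_j) for every neuron i in one layer.
    W is a list of weight rows (one row per neuron), x the input vector."""
    return [f(sum(w_ij * x_j for w_ij, x_j in zip(row, x))) for row in W]

# Example: a layer of 2 neurons receiving 3 inputs
W = [[0.2, -0.5, 0.1],
     [0.7,  0.3, -0.2]]
print(feedforward_layer(W, [1.0, 0.5, -1.0]))
```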
  • 51. Multilayer feed forward network Can be used to solve complicated problems
  • 52. Feedback network When outputs are directed back as inputs to same or preceding layer nodes it results in the formation of feedback networks
  • 53. Lateral feedback If the feedback of the output of the processing elements is directed back as input to the processing elements in the same layer then it is called lateral feedback
  • 54. Recurrent n/ws • Single node with own feedback • Competitive nets • Single-layer recurrent nets • Multilayer recurrent networks Feedback networks with closed loops are called Recurrent Networks. The response at the (k+1)’th instant depends on the entire history of the network starting at k=0. Automaton: A system with discrete time inputs and a discrete data representation is called an automaton
  • 55. Basic models of ANN Basic Models of ANN Interconnections Learning rules Activation function
  • 56. Learning • It’s a process by which a NN adapts itself to a stimulus by making proper parameter adjustments, resulting in the production of desired response • Two kinds of learning • Parameter learning:- connection weights are updated • Structure Learning:- change in network structure
  • 57. Training • The process of modifying the weights in the connections between network layers with the objective of achieving the expected output is called training a network. • This is achieved through • Supervised learning • Unsupervised learning • Reinforcement learning
  • 58. Classification of learning • Supervised learning • Unsupervised learning • Reinforcement learning
  • 59. Supervised Learning • Child learns from a teacher • Each input vector requires a corresponding target vector. • Training pair=[input vector, target vector] Neural Network W Error Signal Generator X (Input) Y (Actual output) (Desired Output) Error (D-Y) signals
  • 60. Supervised learning contd. Supervised learning minimizes the error between the actual and the desired output
  • 61. Unsupervised Learning • How a fish or tadpole learns • All similar input patterns are grouped together as clusters. • If a matching input pattern is not found a new cluster is formed
  • 63. Self-organizing • In unsupervised learning there is no feedback • The network must discover patterns, regularities and features in the input data on its own • While doing so the network adjusts its own parameters • This process is called self-organizing
  • 65. When Reinforcement learning is used? • If less information is available about the target output values (critic information) • Learning based on this critic information is called reinforcement learning and the feedback sent is called reinforcement signal • Feedback in this case is only evaluative and not instructive
  • 66. Basic models of ANN Basic Models of ANN Interconnections Learning rules Activation function
  • 67. Activation Function 1. Identity Function: f(x) = x for all x 2. Binary Step function: f(x) = 1 if x ≥ θ, 0 if x < θ 3. Bipolar Step function: f(x) = 1 if x ≥ θ, −1 if x < θ 4. Sigmoidal Functions:- Continuous functions 5. Ramp function: f(x) = 1 if x > 1, x if 0 ≤ x ≤ 1, 0 if x < 0
  • 68. 71 Activation functions • Transforms neuron’s input into output. • Features of activation functions: • A squashing effect is required • Prevents accelerating growth of activation levels through the network. • Simple and easy to calculate
  • 69. 72 Standard activation functions • The hard-limiting threshold function – Corresponds to the biological paradigm • either fires or not • Sigmoid functions ('S'-shaped curves) – The logistic function f(x) = 1 / (1 + e^(−ax)) – The hyperbolic tangent (symmetrical) – Both functions have a simple differential – Only the shape is important
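A small illustration of the logistic function and the "simple differentials" mentioned above; a is the slope parameter from the formula, and the functions are written out only as a sketch.

```python
import math

def logistic(x, a=1.0):
    """Logistic sigmoid f(x) = 1 / (1 + e^(-a*x))."""
    return 1.0 / (1.0 + math.exp(-a * x))

def logistic_deriv(x, a=1.0):
    """Its simple differential: f'(x) = a * f(x) * (1 - f(x))."""
    fx = logistic(x, a)
    return a * fx * (1.0 - fx)

def tanh_deriv(x):
    """Derivative of the (symmetrical) hyperbolic tangent: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2
```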
  • 70. Some learning algorithms we will learn are • Supervised: • Adaline, Madaline • Perceptron • Back Propagation • multilayer perceptrons • Radial Basis Function Networks • Unsupervised • Competitive Learning • Kohonen self organizing map • Learning vector quantization • Hebbian learning
  • 71. Neural processing • Recall:- the processing phase of a NN; its objective is to retrieve information, i.e. the process of computing an output o for a given input x • Basic forms of neural information processing • Auto association • Hetero association • Classification
  • 72. Neural processing-Autoassociation • Set of patterns can be stored in the network • If a pattern similar to a member of the stored set is presented, the network associates the input with the closest stored pattern
  • 73. Neural Processing- Heteroassociation • Associations between pairs of patterns are stored • Distorted input pattern may cause correct heteroassociation at the output
  • 74. Neural processing-Classification • Set of input patterns is divided into a number of classes or categories • In response to an input pattern from the set, the classifier is supposed to recall the information regarding class membership of the input pattern.
  • 75. Important terminologies of ANNs • Weights • Bias • Threshold • Learning rate • Momentum factor • Vigilance parameter • Notations used in ANN
  • 76. Weights • Each neuron is connected to every other neuron by means of directed links • Links are associated with weights • Weights contain information about the input signal and are represented as a matrix • The weight matrix is also called the connection matrix
  • 77. Weight matrix W = [w1^T; w2^T; w3^T; … ; wn^T] = [[w11 w12 w13 … w1m]; [w21 w22 w23 … w2m]; … ; [wn1 wn2 wn3 … wnm]], where wi^T is the weight vector of the i’th processing element
  • 78. Weights contd… • wij – is the weight from processing element ”i” (source node) to processing element “j” (destination node). For a neuron Yj receiving inputs X1 … Xn together with a bias input X0 = 1 weighted by bj, the net input is y_inj = Σ(i=0..n) xi wij = x0 w0j + x1 w1j + x2 w2j + … + xn wnj = bj + Σ(i=1..n) xi wij
  • 79. Activation Functions • Used to calculate the output response of a neuron. • The sum of the weighted input signals is passed through an activation function to obtain the response. • Activation functions can be linear or non linear • Already dealt • Identity function • Single/binary step function • Discrete/continuous sigmoidal function.
  • 80. Bias • Bias is like another weight. It is included by adding a component x0=1 to the input vector X. • X=(1,X1,X2…Xi,…Xn) • Bias is of two types • Positive bias: increases the net input • Negative bias: decreases the net input
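A purely illustrative sketch of folding the bias into the weight vector by prepending x0 = 1 to the input; weights[0] plays the role of the bias b, and the numbers are made up.

```python
def net_input(weights, x):
    """Treat the bias as weight w[0] on an extra component x0 = 1:
    net = b + sum_i w_i * x_i  with  b = weights[0]."""
    x_aug = [1.0] + list(x)                     # X = (1, X1, ..., Xn)
    return sum(w * xi for w, xi in zip(weights, x_aug))

# weights[0] is the bias (a positive bias raises the net input,
# a negative bias lowers it); the rest are the usual connection weights.
print(net_input([0.5, 0.2, -0.3], [1.0, 2.0]))   # 0.5 + 0.2*1.0 - 0.3*2.0 = 0.1
```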
  • 81. Why Bias is required? • The relationship between input and output is given by the equation of a straight line, y = mx + c. The bias plays the role of the intercept c: it shifts the line away from the origin, so the neuron can produce a non-zero output even when the input is zero. (Figure: the line y = mx + C, with the input on the X axis, the output on the Y axis, and C marked as the bias.)
  • 82. Threshold • Set value based upon which the final output of the network may be calculated • Used in activation function • The activation function using a threshold θ can be defined as f(net) = 1 if net ≥ θ, −1 if net < θ
  • 83. Learning rate • Denoted by α. • Used to control the amount of weight adjustment at each step of training • Learning rate ranging from 0 to 1 determines the rate of learning in each time step
  • 84. Learning rate • The learning rate defines the size of the corrective steps that the model takes to adjust for errors in each observation. • A high learning rate shortens the training time, but with lower ultimate accuracy, while a lower learning rate takes longer, but with the potential for greater accuracy. • Optimizations such as Quickprop are primarily aimed at speeding up error minimization, while other improvements mainly try to increase reliability. • In order to avoid oscillation inside the network such as alternating connection weights, and to improve the rate of convergence, refinements use an adaptive learning rate that increases or decreases as appropriate. • (From Wikipedia)
  • 85. Learning Rate • Neural networks are often trained by gradient descent on the weights. This means at each iteration we use backpropagation to calculate the derivative of the loss function with respect to each weight and subtract it from that weight. However, if you actually try that, the weights will change far too much each iteration, which will make them “overcorrect” and the loss will actually increase/diverge. So in practice, people usually multiply each derivative by a small value called the “learning rate” before they subtract it from its corresponding weight. • w1_new = w1 − (learning rate) * (derivative of the cost function w.r.t. w1)
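The update written as a tiny Python sketch; the gradients are assumed to come from backpropagation, and the values shown are made up for illustration.

```python
def gradient_step(weights, gradients, learning_rate=0.01):
    """One gradient-descent update: scale each derivative by the learning
    rate and subtract it from the corresponding weight."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]

# With learning_rate = 1.0 the step would overshoot; a small value keeps the
# correction gentle.
print(gradient_step([0.5, -0.2], [2.0, -1.5], learning_rate=0.1))  # [0.3, -0.05]
```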
  • 86. Learning Rate • Stochastic gradient descent is an optimization algorithm that estimates the error gradient for the current state of the model using examples from the training dataset, then updates the weights of the model using the back-propagation of errors algorithm, referred to as simply backpropagation. • The amount that the weights are updated during training is referred to as the step size or the “learning rate.” • Specifically, the learning rate is a configurable hyperparameter used in the training of neural networks that has a small positive value, often in the range between 0.0 and 1.0. • The learning rate is one of the most important hyper-parameters to tune for training deep neural networks.
  • 87. Learning Rate • If the learning rate is low, then training is more reliable, but optimization will take a lot of time because steps towards the minimum of the loss function are tiny. • If the learning rate is high, then training may not converge or even diverge. Weight changes can be so big that the optimizer overshoots the minimum and makes the loss worse.
  • 88. Learning Rate – Variable learning rate • The learning rate need not be a single constant for all layers of a neural network; it may differ from layer to layer. This helps avoid the vanishing-gradient problem, where weights stop changing as the weight change is backpropagated towards the first layer: backpropagation multiplies many derivatives together, each of which can be less than 1, so their product becomes ever smaller, learning stalls, and the network saturates prematurely. Assigning a variable learning rate to each layer counteracts this.
  • 89. A systematic approach towards finding the optimal learning rate 1. Start with a high learning rate and steadily decrease it. Changes in the weight vector must be small in order to reduce oscillations or divergence. 2. A simple suggestion is to increase the learning rate when performance improves and decrease it when performance worsens. 3. Another method is to double the learning rate until the error value worsens.
  • 90. A systematic approach towards finding the optimal learning rate • Ultimately, we'd like a learning rate which results in a steep decrease in the network's loss. • We can observe this by performing a simple experiment where we gradually increase the learning rate after each mini batch, recording the loss at each increment. • This gradual increase can be on either a linear or exponential scale.
  • 91. • For learning rates which are too low, the loss may decrease, but at a very shallow rate. • When entering the optimal learning rate zone, you'll observe a quick drop in the loss function. Increasing the learning rate further will cause an increase in the loss as the parameter updates cause the loss to "bounce around" and even diverge from the minima. • Remember, the best learning rate is associated with the steepest drop in loss, so we're mainly interested in analyzing the slope of the plot.
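A rough sketch of that learning-rate range test; train_one_batch is a hypothetical callback that performs one mini-batch update at the given rate and returns the resulting loss, so this is an outline of the experiment rather than any specific library's API.

```python
def lr_range_test(train_one_batch, lr_start=1e-5, lr_end=1.0, steps=100):
    """Increase the learning rate exponentially after each mini-batch and
    record (lr, loss) pairs; the best lr sits where the loss drops fastest."""
    factor = (lr_end / lr_start) ** (1.0 / (steps - 1))
    lr, history = lr_start, []
    for _ in range(steps):
        loss = train_one_batch(lr)        # hypothetical: one update at this lr
        history.append((lr, loss))
        lr *= factor                      # exponential schedule
    return history
```

Plotting the recorded pairs and looking for the steepest downward slope gives the zone described above.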
  • 93. Two types of learning • 1. Sequential or per-pattern method • 2. Batch or per-epoch method. • In sequential learning a given input pattern is propagated forward, the error is determined and backpropagated, and the weights are updated. • In batch learning the weights are updated only after the entire set of training patterns has been presented to the network. Thus the weight update is only performed after every epoch. • If P = number of patterns in one epoch then ∆w = (1/P) Σ(p=1..P) ∆wp • This method has a smoothing effect.
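A minimal sketch of the batch (per-epoch) update, averaging the per-pattern weight changes before applying them; the data structures are illustrative only.

```python
def batch_update(weights, per_pattern_deltas):
    """Batch (per-epoch) learning: average the weight changes computed for every
    pattern in the epoch, then apply the single averaged update
    (delta_w = (1/P) * sum_p delta_w_p), giving the smoothing effect noted above."""
    P = len(per_pattern_deltas)
    avg = [sum(d[i] for d in per_pattern_deltas) / P for i in range(len(weights))]
    return [w + dw for w, dw in zip(weights, avg)]

# Sequential (per-pattern) learning would instead apply each delta_w_p
# immediately after the corresponding pattern is propagated and backpropagated.
```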
  • 94. When to stop backpropagation? • Continue as long as the error for the validation set decreases. • Whenever the error begins to increase, the net is starting to memorise the training patterns and the training is terminated.
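A simple sketch of that stopping criterion; train_epoch and validation_error are hypothetical callbacks, and real implementations usually add a patience window rather than stopping at the very first increase.

```python
def train_with_early_stopping(train_epoch, validation_error, max_epochs=1000):
    """Stop as soon as the validation error starts to increase, i.e. when the
    net begins to memorise the training patterns."""
    best = float("inf")
    for epoch in range(max_epochs):
        train_epoch()                 # hypothetical: one epoch of weight updates
        err = validation_error()      # hypothetical: error on the validation set
        if err > best:                # validation error no longer decreasing
            break
        best = err
    return best
```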
  • 95. How to choose hidden neurons • There are many rule-of-thumb methods for determining an acceptable number of neurons to use in the hidden layers, such as the following: 1. The number of hidden neurons should be between the size of the input layer and the size of the output layer. 2. The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer. 3. The number of hidden neurons should be less than twice the size of the input layer.
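The three rules of thumb expressed as a small, purely illustrative helper.

```python
def hidden_neuron_heuristics(n_inputs, n_outputs):
    """Rule-of-thumb suggestions for the number of hidden neurons."""
    return {
        "between input and output size": sorted((n_outputs, n_inputs)),
        "2/3 of input size plus output size": round(2 * n_inputs / 3 + n_outputs),
        "less than twice the input size": 2 * n_inputs - 1,
    }

print(hidden_neuron_heuristics(n_inputs=10, n_outputs=3))
```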
  • 96. How to choose number of hidden layers
  Number of Hidden Layers – Result
  none – Only capable of representing linearly separable functions or decisions.
  1 – Can approximate any function that contains a continuous mapping from one finite space to another.
  2 – Can represent an arbitrary decision boundary to arbitrary accuracy with rational activation functions and can approximate any smooth mapping to any accuracy.
  >2 – Additional layers can learn complex representations (a sort of automatic feature engineering) for later layers.
  • 97. Other terminologies • Momentum factor: • added to the weight-update process to speed up convergence • Vigilance parameter: • Denoted by ρ • Used to control the degree of similarity required for patterns to be assigned to the same cluster
  • 98. Neural Network Learning rules • The general weight-update rule is ∆wi = c · r · x, where r is the learning signal (a function of the weights, the input x and, for supervised rules, the desired response d) and c – learning constant
  • 99. Hebbian Learning Rule • The learning signal is equal to the neuron’s output (feed-forward, unsupervised learning)
  • 100. Features of Hebbian Learning • Feedforward unsupervised learning • “When an axon of a cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency is increased” • If oixj is positive the result is an increase in the weight; otherwise the weight decreases
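A minimal sketch of the Hebbian update ∆wj = c · o · xj, assuming a bipolar step output and a learning constant c; the weight and input values are illustrative.

```python
def hebbian_update(w, x, c=0.1, f=lambda net: 1 if net >= 0 else -1):
    """Hebbian rule (unsupervised): the learning signal is the neuron's own
    output, so delta_w_j = c * o * x_j.  If o * x_j is positive the weight
    grows, otherwise it shrinks."""
    o = f(sum(wi * xi for wi, xi in zip(w, x)))        # neuron output
    return [wi + c * o * xi for wi, xi in zip(w, x)]

print(hebbian_update([0.1, -0.2, 0.4], [1.0, 0.5, -1.0]))
```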
  • 101. Perceptron Learning rule • Learning signal is the difference between the desired and actual neuron’s response • Learning is supervised
  • 102. Delta Learning Rule • Only valid for continuous activation function • Used in supervised training mode • Learning signal for this rule is called delta • The aim of the delta rule is to minimize the error over all training patterns
  • 103. Delta Learning Rule Contd. The learning rule is derived from the condition of least squared error. Calculating the gradient vector of the error with respect to wi, minimization of the error requires the weight changes to be in the negative gradient direction
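A small sketch of one delta-rule update for a single logistic neuron, using the learning signal (d − o) · f′(net); the learning constant c and the example values are illustrative.

```python
import math

def delta_rule_update(w, x, d, c=0.1):
    """Delta rule (supervised, continuous activation): with a logistic unit
    o = f(net), the learning signal is (d - o) * f'(net), so the weights move
    in the negative gradient direction of the squared error."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    o = 1.0 / (1.0 + math.exp(-net))                  # logistic activation
    f_prime = o * (1.0 - o)                           # its simple derivative
    delta = (d - o) * f_prime                         # learning signal
    return [wi + c * delta * xi for wi, xi in zip(w, x)]

print(delta_rule_update([0.2, -0.1], [1.0, 0.5], d=1.0))
```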
  • 104. Widrow-Hoff learning Rule • Also called as least mean square learning rule • Introduced by Widrow(1962), used in supervised learning • Independent of the activation function • Special case of delta learning rule wherein activation function is an identity function ie f(net)=net • Minimizes the squared error between the desired output value di and neti
  • 106. Winner-Take-All Learning rule Contd… • Can be explained for a layer of neurons • Example of competitive learning and used for unsupervised network training • Learning is based on the premise that one of the neurons in the layer has a maximum response due to the input x • This neuron is declared the winner, and only the winner's weights are adjusted (moved toward the input x)
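A minimal sketch of one winner-take-all step, assuming the common competitive update ∆w = c · (x − w) applied only to the winning neuron's weight row; the weight values are illustrative.

```python
def winner_take_all_update(W, x, c=0.1):
    """Competitive (unsupervised) learning: the neuron whose weight row gives
    the maximum response to x is the winner, and only its weights are moved
    toward the input: delta_w = c * (x - w_winner)."""
    responses = [sum(wi * xi for wi, xi in zip(row, x)) for row in W]
    m = responses.index(max(responses))               # winning neuron
    W[m] = [wi + c * (xi - wi) for wi, xi in zip(W[m], x)]
    return m, W

W = [[0.2, 0.8], [0.9, 0.1]]
print(winner_take_all_update(W, [1.0, 0.0]))          # neuron 1 wins and moves toward x
```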
  • 109. 112 Neural Network –Weakness and Strengths • Weakness • Long training time • Require a number of parameters typically best determined empirically, e.g., the network topology or “structure.” • Poor interpretability: Difficult to interpret the symbolic meaning behind the learned weights and of “hidden units” in the network • Strength • High tolerance to noisy data • Ability to classify untrained patterns • Well-suited for continuous-valued inputs and outputs • Successful on an array of real-world data, e.g., hand-written letters • Algorithms are inherently parallel • Techniques have recently been developed for the extraction of rules from trained neural networks
  • 110. 113 Summary : A Multi-Layer Feed-Forward Neural Network (Figure: input layer → hidden layer → output layer, mapping the input vector X to an output vector through connection weights wij.) Weight update: wj^(k+1) = wj^(k) + λ (yi − ŷi^(k)) xij, where λ is the learning rate
  • 111. 114 Summary :How A Multi-Layer Neural Network Works • The inputs to the network correspond to the attributes measured for each training tuple • Inputs are fed simultaneously into the units making up the input layer • They are then weighted and fed simultaneously to a hidden layer • The number of hidden layers is arbitrary, although usually only one • The weighted outputs of the last hidden layer are input to units making up the output layer, which emits the network's prediction • The network is feed-forward: None of the weights cycles back to an input unit or to an output unit of a previous layer • From a statistical point of view, networks perform nonlinear regression: Given enough hidden units and enough training samples, they can closely approximate any function
  • 112. 115 Summary: Defining a Network Topology • Decide the network topology: Specify # of units in the input layer, # of hidden layers (if > 1), # of units in each hidden layer, and # of units in the output layer • Normalize the input values for each attribute measured in the training tuples to [0.0, 1.0] • One input unit per domain value, each initialized to 0 • For classification with more than two classes, one output unit per class is used • If a trained network's accuracy is unacceptable, repeat the training process with a different network topology or a different set of initial weights
  • 113. 116 Summary: Backpropagation • Iteratively process a set of training tuples & compare the network's prediction with the actual known target value • For each training tuple, the weights are modified to minimize the mean squared error between the network's prediction and the actual target value • Modifications are made in the “backwards” direction: from the output layer, through each hidden layer down to the first hidden layer, hence “backpropagation” • Steps • Initialize weights to small random numbers, associated with biases • Propagate the inputs forward (by applying activation function) • Backpropagate the error (by updating weights and biases) • Terminating condition (when error is very small, etc.)
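To make those steps concrete, here is a compact sketch of backpropagation for a network with one hidden layer and a single sigmoid output unit, trained on XOR; the architecture (three hidden units), learning rate and stopping threshold are illustrative choices, not prescribed by the slides.

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(x, target, W_h, W_o, lr=0.5):
    """One backpropagation step: forward pass, error backpropagation, update.
    W_h holds one weight row per hidden unit (bias first); W_o is the output
    unit's weights (bias first, then one weight per hidden unit)."""
    # 1. propagate the inputs forward
    x_aug = [1.0] + x
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x_aug))) for row in W_h]
    h_aug = [1.0] + h
    o = sigmoid(sum(w * hi for w, hi in zip(W_o, h_aug)))
    # 2. backpropagate the error, from the output layer towards the input
    err_o = (target - o) * o * (1 - o)
    err_h = [hj * (1 - hj) * err_o * W_o[j + 1] for j, hj in enumerate(h)]
    # 3. update weights (the biases are simply the first weights)
    W_o = [w + lr * err_o * hi for w, hi in zip(W_o, h_aug)]
    W_h = [[w + lr * err_h[j] * xi for w, xi in zip(row, x_aug)]
           for j, row in enumerate(W_h)]
    return W_h, W_o, (target - o) ** 2

# Usage: learn XOR, terminating when the squared error over an epoch is very small.
random.seed(0)
n_hidden = 3
W_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(n_hidden)]
W_o = [random.uniform(-1, 1) for _ in range(n_hidden + 1)]
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
for epoch in range(20000):
    sse = 0.0
    for x, t in data:
        W_h, W_o, e = train_step(x, t, W_h, W_o)
        sse += e
    if sse < 0.01:              # terminating condition
        break
print(epoch, sse)                # re-run with another seed if it stalls in a local minimum
```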
  • 114. 117 Summary :Neuron: A Hidden/Output Layer Unit • An n-dimensional input vector x is mapped into variable y by means of the scalar product and a nonlinear function mapping • The inputs to the unit are outputs from the previous layer. They are multiplied by their corresponding weights to form a weighted sum, which is added to the bias μk associated with the unit. Then a nonlinear activation function is applied to it, for example y = sign(Σ(i=0..n) wi xi + μk). (Figure: input vector x with components x0 … xn, weight vector w with components w0 … wn, a weighted-sum node with bias μk, an activation function, and the output y.)
  • 115. 118 Summary :Efficiency and Interpretability • Efficiency of backpropagation: Each epoch (one iteration through the training set) takes O(|D| * w), with |D| tuples and w weights, but # of epochs can be exponential to n, the number of inputs, in worst case • For easier comprehension: Rule extraction by network pruning • Simplify the network structure by removing weighted links that have the least effect on the trained network • Then perform link, unit, or activation value clustering • The set of input and activation values are studied to derive rules describing the relationship between the input and hidden unit layers • Sensitivity analysis: assess the impact that a given input variable has on a network output. The knowledge gained from this analysis can be represented in rules
  • 116. References  Craig Heller and David Sadava, Life: The Science of Biology, fifth edition, Sinauer Associates, Inc., USA, 1998.  Introduction to Artificial Neural Networks, Nicolas Galoppo von Borries.  Tom M. Mitchell, Machine Learning, WCB McGraw-Hill, Boston, 1997.  Bishop, C.M., Neural Networks for Pattern Recognition, 1996.  Jiawei Han, Micheline Kamber, and Jian Pei, University of Illinois at Urbana-Champaign & Simon Fraser University, ©2011 Han, Kamber & Pei.