SSK_Artificial Neural Networks Basic to Models.pdf
1. AN INTRODUCTION TO ARTIFICIAL NEURAL NETWORKS
Dr. S. Sasikala
Department of Electronics and Communication Engineering
Kumaraguru College of Technology, Coimbatore
August 27, 2022
IEEE EAB & IEEE MAS Sponsored TryEngineering Workshop on Artificial Intelligence for All
3. What is Learning?
"Change is the end result of all true learning." — Leo Buscaglia
4. What is Learning?
• Learning happens when you observe a phenomenon and recognize a pattern.
• You try to understand this pattern by finding out whether there is any relationship between the entities involved in that phenomenon.
5. What is Learning?
• Take the example of a simple phenomenon that we observe daily — the occurrence of day and night. How do you make sense of it?
• Is there a pattern? Yes.
— Daytime: for a fixed period, we are exposed to the light and heat of the sun.
— Night-time: for another fixed period, we are deprived of the light and heat of the sun.
— This pattern repeats over and over and over.
6. What is Learning?
• How does this pattern occur?
• There are 2 entities involved in this observation — the Sun and the Earth.
• Is there a relationship between the amount of light (and heat) originating from the sun and the surface of the earth receiving it?
• The pattern suggests that the surface of the earth receives the light alternately:
— it gets the light during the daytime
— it does not get the light during the night-time.
• How is this possible?
— There are many possibilities.
7. What is Learning?
• There are 3 conclusions, called "models", that can be derived to explain the observed phenomenon.
• Model 1: Day/Night is a function of a magical ON/OFF switch of the sun
• Model 2: Day/Night is a function of the revolution of the sun around the earth
• Model 3: Day/Night is a function of the rotation of the earth on its axis
8. What is Learning?
• The question now arises:
— Which model (or function) is more accurate?
As per the observations and findings of philosophers and scientists across the ages, Model 3 is the most accurate model for explaining the phenomenon of day and night.
— We can say that this model "fits" best for the observations around this phenomenon.
9. What is Learning?
• Once a model has been built, it can be used to predict future outcomes for that phenomenon.
• In our example, our model can safely predict that the occurrence of day/night will continue until, for some reason, the earth stops rotating or the sun runs out of energy.
➢ Will the earth stop rotating?
➢ When will the sun have spent all of its energy?
10. This is How Humans Learn
11. Human Learning
• Observing something, identifying a pattern, building a theory (model) to explain this pattern, and testing this theory to check whether it fits most or all observations.
12. How Human Learn?
Parents Parents
Siblings
Teachers
Parents
Siblings
Teachers
Friends
Parents
Siblings
Teachers
Friends
Society
Experience
Parents
Siblings
Wife
Friends
Society
Colleagues
Parents
Siblings
Wife
Children
Friends
Society
Colleagues
Parents
Siblings
Wife
Children
Grand
Children
Friends
Society
Colleagues
BOOKS BOOKS
13. Is it possible for a machine to mimic
the process of human learning?
14. Human vs Machine
15. Machines Can Mimic the Human Learning Process
• The basic idea remains the same
• As with humans, machines are fed with observations (data)
• The learning algorithm tries to find a pattern in the data that best fits the observations
16. Human learning vs Machine Learning
17. Machine Learning
A very powerful extension of
Human Brainpower
18. Tasks of Machine Learning
• Pattern Recognition
• Decision Making
• Optimization
19. Pattern Recognition
A pattern
• is an object, process or event that can be given a name
• can either be seen physically or be observed
• e.g. eye colour, fingerprints, handwriting
Recognition
• the process of identifying the patterns
Pattern recognition
• is identifying patterns in data
• the process of converting raw data into a form that is amenable for a machine to use
• involves classification and clustering of patterns
22. Pattern Recognition
• Humans
Can perceive patterns naturally
But need more computational time
• Machines
Computational speed is very high compared to humans
23. Humans - Very Good at PR (Pattern Recognition)
Humans have
• the ability to learn from experience
• a brain with a lot of information-processing cells
• about 10^11 neurons, interconnected to form a vast and complex network-like structure
24. BIOLOGICAL AND ARTIFICIAL
NEURAL NETWORKS
25. Biological Neuron
Cell body (Soma)
• Contains the organelles of the neuron
Dendrites (Rx)
• Tree-like structures originating from the cell body that receive signals from surrounding neurons
Axon (Tx)
• Long connection extending from the cell body that carries the signal
• There is only one axon per neuron; it may divide into many branches at its end, connecting to other cells to transmit the signal from one neuron to others
Synapse
• Small bulb-like organ at the end of the axon which introduces the signal to the nearby dendrites of the other neuron through chemical diffusion
Neuron
• Sums up all the inputs, processes the sum with a threshold function and produces an output signal
• A neuron fires an electrical impulse only if a certain condition is met
26. Biological Neural Network
27. How Do You Model an Artificial Neuron?
By simulating the functioning of a biological neuron
❑Function 1 – Accumulation of information
Summation, or net input calculation
❑Function 2 – Passing of information
Threshold or activation, i.e. producing the output
Simulation involves
❑Identifying the equivalent mathematical operator for each function
❑Designing a mathematical model that processes information
An artificial neuron resembles the human brain in two respects:
❑Knowledge acquisition through learning
❑Storage of knowledge in the synaptic weights
28. Biological Neuron and Artificial Neuron
29. Biological & Artificial Neuron
Resemblance
30. ANN vs BNN
BNN | ANN
Soma | Node
Dendrites | Input
Synapse | Weights or interconnections
Axon | Output
Massively parallel, slow, but superior to the ANN | Massively parallel, fast, but inferior to the BNN
10^11 neurons and 10^15 interconnections | 10^2 to 10^4 nodes, depending mainly on the type of application and the network designer
Can tolerate ambiguity | Very precise, structured and formatted data is required
Performance degrades with even partial damage | Capable of robust performance, hence has the potential to be fault tolerant
Stores the information in the synapses | Stores the information in continuous memory locations
31. ANN - Function
[Figure: an artificial neuron. The input vector x = (x0, x1, ..., xn) is multiplied element-wise by the weight vector w = (w0j, w1j, ..., wnj), the products are accumulated into a weighted sum, and the sum is passed through the activation function f to produce the output y.]
32. What is an ANN?
Artificial Neuron
➢A digital construct that seeks to simulate the behavior of a biological neuron in the brain.
➢Artificial neurons may be physical devices, or purely mathematical constructs.
Artificial Neural Networks (ANN)
➢Networks of artificial neurons
➢A parallel computational system consisting of a huge number of simple processing elements, massively connected together in a specific manner in order to perform a particular task
33. History of ANN
34. Model of Artificial Neural Network
35. Model of Artificial Neural Network
• In the general model of an ANN, the net input is calculated as
y_in = b + Σ_i x_i w_i
• The output is calculated by applying the activation function over the net input:
y = f(y_in)
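A minimal sketch of this general model in Python (the helper names net_input and binary_step are illustrative, not from the slides; the numbers match the hand-worked example on a later slide):

```python
def net_input(x, w, b=0.0):
    """Net input y_in = b + sum_i x_i * w_i."""
    return b + sum(xi * wi for xi, wi in zip(x, w))

def binary_step(y_in, theta=0.0):
    """Binary step activation: 1 if y_in >= theta, else 0."""
    return 1 if y_in >= theta else 0

y_in = net_input([0.1, 0.6, 0.3], [0.3, 0.2, -0.4])
print(round(y_in, 4))     # 0.03
print(binary_step(y_in))  # 1
```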
36. ANN - Building Blocks
37. CLASSIFICATIONS OF ANN
• Based on the architecture
➢Feed Forward Neural Network (FFNN)
➢Feed Back Neural Network (FBNN)
➢Recurrent Neural Network (RNN)
➢Competitive Neural Network (CNN)
• Based on the learning algorithm
➢Supervised Learning
➢Unsupervised Learning
➢Reinforcement Learning
38. Activation Functions
➢Activation functions are mathematical equations, i.e. non-linear transformations attached to each neuron in the network, which determine whether the neuron should be activated ("fired") or not by calculating the weighted sum and further adding a bias to it.
➢The purpose of the activation function is to introduce non-linearity into the output of a neuron.
➢Activation functions also help normalize the output of each neuron to a range between 0 and 1 or between -1 and 1.
➢The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks.
39. Activation Functions
➢Linear activation (identity) function: f(x) = x
➢Sigmoid activation functions
• Binary sigmoidal function: f(x) = 1 / (1 + e^(-λx)), output in (0, 1)
• Bipolar sigmoidal function: f(x) = (1 - e^(-λx)) / (1 + e^(-λx)), output in (-1, 1)
➢Binary step activation function: f(x) = 1 if x ≥ θ, else 0
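As a quick sketch, the four functions above can be written directly in Python (λ is the steepness parameter, taken as 1 here; these are the standard textbook forms, stated as an assumption since the slide gives only the names):

```python
import math

def identity(x):                  # linear / identity: f(x) = x
    return x

def binary_sigmoid(x, lam=1.0):   # f(x) = 1 / (1 + e^(-lam*x)), range (0, 1)
    return 1.0 / (1.0 + math.exp(-lam * x))

def bipolar_sigmoid(x, lam=1.0):  # f(x) = (1 - e^(-lam*x)) / (1 + e^(-lam*x)), range (-1, 1)
    e = math.exp(-lam * x)
    return (1.0 - e) / (1.0 + e)

def binary_step(x, theta=0.0):    # f(x) = 1 if x >= theta, else 0
    return 1 if x >= theta else 0

print(round(binary_sigmoid(0.53), 2))  # 0.63, as used in a later worked example
```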
40. ANN MODELS
41. ANN Models
Models
➢McCulloch and Pitts Neuron
➢Hebb Network
➢Perceptron Network
➢Linear Separability
Insight
➢Architecture
➢Net Input Calculation
➢Output Calculation
➢Weight Updation - Learning
42. McCulloch and Pitts Neuron
➢ Usually called the M-P neuron or Threshold Logic Unit (gate)
➢ Activation is a binary step function:
f(y_in) = 1 if y_in ≥ θ, else 0
➢ Widely used in designing logic functions
➢ Simply classifies the set of inputs into two different classes
➢ Net input without bias: y_in = Σ_i x_i w_i
➢ Net input with bias: y_in = b + Σ_i x_i w_i
➢ The bias b is used to adjust the output along with the weighted sum of the inputs to the neuron
➢ b is a constant that helps the model fit best for the given data
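A small sketch of an M-P neuron in Python (the function name and structure are illustrative):

```python
def mp_neuron(x, w, theta, b=0.0):
    """McCulloch-Pitts neuron: fires (1) iff b + sum_i x_i*w_i >= theta."""
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    return 1 if y_in >= theta else 0

print(mp_neuron([1, 0], [1, 1], theta=2))  # 0: net input 1 is below the threshold
print(mp_neuron([1, 1], [1, 1], theta=2))  # 1: net input 2 reaches the threshold
```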
43. Hand-Worked Example - M-P Neuron
Calculation of the net input without bias:
inputs [x1, x2, x3] = [0.1, 0.6, 0.3]
weights [w1, w2, w3] = [0.3, 0.2, -0.4]
y_in = x·wᵀ = x1w1 + x2w2 + x3w3
= 0.1×0.3 + 0.6×0.2 + 0.3×(-0.4)
= 0.03 + 0.12 - 0.12
= 0.03
Calculation of the output using the binary step activation function:
y = F(y_in) = F(0.03) = 1
44. Hand-Worked Example - M-P Neuron
Calculation of the output using the binary sigmoidal function, with bias b = 0.5:
X = [x1, x2, x3] = [0.1, 0.6, 0.3]
W = [w1, w2, w3] = [0.3, 0.2, -0.4]
y_in = b + x·wᵀ
Assuming x0 = 1 and w0 = b:
X = [x0, x1, x2, x3], W = [w0, w1, w2, w3]
y_in = x·wᵀ = x0w0 + x1w1 + x2w2 + x3w3
= 1×0.5 + 0.1×0.3 + 0.6×0.2 + 0.3×(-0.4)
= 0.5 + 0.03 + 0.12 - 0.12
= 0.53
y = f(y_in) = 1 using the binary step function
≈ 0.63 using the binary sigmoid function
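The two hand-worked calculations can be checked with a few lines of Python (rounding only to keep the printout tidy):

```python
import math

x = [0.1, 0.6, 0.3]
w = [0.3, 0.2, -0.4]
b = 0.5

y_in = sum(xi * wi for xi, wi in zip(x, w))   # without bias
print(round(y_in, 4))                         # 0.03  (slide 43)
y_in_b = b + y_in                             # with bias
print(round(y_in_b, 4))                       # 0.53  (slide 44)
print(1 if y_in_b >= 0 else 0)                # binary step   -> 1
print(round(1 / (1 + math.exp(-y_in_b)), 2))  # binary sigmoid -> 0.63
```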
45. Implementation of the AND Function
Truth table:
x1 x2 | y
 1  1 | 1
 1  0 | 0
 0  1 | 0
 0  0 | 0
Assume initial weights w1 = w2 = 1.
For inputs:
➢ (1,1) → y_in = x1w1 + x2w2 = 2
➢ (1,0) → 1
➢ (0,1) → 1
➢ (0,0) → 0
➢ Assume threshold value θ = 2:
y = f(y_in) = 1 if y_in ≥ 2, else 0
46. Implementation of the OR Function
Truth table:
x1 x2 | y
 1  1 | 1
 1  0 | 1
 0  1 | 1
 0  0 | 0
Assume initial weights w1 = w2 = 1 and bias b = 0.5.
For inputs:
➢ (1,1) → y_in = x1w1 + x2w2 + b = 2.5
➢ (1,0) → 1.5
➢ (0,1) → 1.5
➢ (0,0) → 0.5
➢ Assume threshold value θ = 1.5:
y = f(y_in) = 1 if y_in ≥ 1.5, else 0
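Both gates drop straight out of the M-P neuron sketched earlier; this self-contained snippet (the helper name mp_fire is an assumption) reproduces the two tables above:

```python
def mp_fire(x, w, theta, b=0.0):
    """M-P neuron: output 1 iff b + sum_i x_i*w_i >= theta."""
    return 1 if b + sum(xi * wi for xi, wi in zip(x, w)) >= theta else 0

print("x1 x2 | AND OR")
for x in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    and_y = mp_fire(x, (1, 1), theta=2)           # w1 = w2 = 1, theta = 2
    or_y  = mp_fire(x, (1, 1), theta=1.5, b=0.5)  # w1 = w2 = 1, b = 0.5, theta = 1.5
    print(x[0], x[1], "|", and_y, " ", or_y)
```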
47. Hebb Network
➢ Hebb observed that learning in the human brain takes place through changes in the synaptic gap.
➢ The weight vector is found to increase proportionately to the product of the input and the output.
➢ Weight and bias adjustment:
w_i(new) = w_i(old) + x_i·y
b(new) = b(old) + y
➢ Change in weight: Δw_i = x_i·y
➢ The activation function is the identity function: f(y_in) = y_in
➢ More suited to bipolar data
➢ Used for pattern association, classification and clustering
48. Training Steps
1. Initially, the weights are set to zero, i.e. w_i = 0 for all inputs i = 1 to n, where n is the total number of input neurons.
2. The activation function for the inputs is generally set as the identity function.
3. The activation function for the output is also set so that y = t.
4. The weights and bias are adjusted as:
w_i(new) = w_i(old) + x_i·y
b(new) = b(old) + y
5. Steps 2 to 4 are repeated for each input vector and output.
49. Implementation of the AND Function
Training data → truth table of the AND function, converted to bipolar form (1 → 1 and 0 → -1):
x1 x2 | y
 1  1 | 1
 1  0 | 0
 0  1 | 0
 0  0 | 0
➢ Initially the weights are set to zero: w1 = w2 = b = 0
➢ Present the first set of inputs and apply the Hebb rule
[x1 x2 x0] = [1 1 1] and y = [1]
w_i(new) = w_i(old) + x_i·y
• w1(new) = w1(old) + x1·y → 0 + 1×1 = 1
• w2(new) = w2(old) + x2·y → 0 + 1×1 = 1
• b(new) = b(old) + y → 0 + 1 = 1
➢ Change in weight
• Δw_i = x_i·y
• Δw1 = x1·y → 1×1 = 1
• Δw2 = x2·y → 1×1 = 1
• Δb = y = 1
50. Implementation of the AND Function
➢ Present the second set of inputs and apply the Hebb rule
– [x1 x2 x0] = [1 -1 1] and y = [-1]
– w_i(new) = w_i(old) + x_i·y
• w1(new) = w1(old) + x1·y → 1 + 1×(-1) = 0
• w2(new) = w2(old) + x2·y → 1 + (-1)×(-1) = 2
• b(new) = b(old) + y → 1 + (-1) = 0
➢ Change in weight
– Δw_i = x_i·y
• Δw1 = x1·y → 1×(-1) = -1
• Δw2 = x2·y → (-1)×(-1) = 1
• Δb = y = -1
Full training pass (weights start at 0):
x1  x2  x0 |  y | Δw1 Δw2 Δb | w1 w2  b
 1   1   1 |  1 |  1   1   1 |  1  1   1
 1  -1   1 | -1 | -1   1  -1 |  0  2   0
-1   1   1 | -1 |  1  -1  -1 |  1  1  -1
-1  -1   1 | -1 |  1   1  -1 |  2  2  -2
Hebb net for the AND function: final weights w1 = 2, w2 = 2, b = -2
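A few lines of Python reproduce the table row by row (a sketch; variable names are illustrative):

```python
# Hebb rule on bipolar AND data: w_i(new) = w_i(old) + x_i*y, b(new) = b(old) + y
samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]

w1 = w2 = b = 0.0
for (x1, x2), y in samples:
    w1 += x1 * y   # delta_w1 = x1 * y
    w2 += x2 * y   # delta_w2 = x2 * y
    b  += y        # delta_b  = y
    print(x1, x2, y, "->", w1, w2, b)
# Last line printed matches the table: final weights 2.0 2.0 -2.0
```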
51. Perceptron Network
➢Perceptron networks are single-layer feed-forward networks, introduced by Rosenblatt.
➢The perceptron consists of an input layer, a hidden (association) layer, and an output layer.
➢The input layer is connected to the hidden layer through weights which may be inhibitory, excitatory or zero (-1, +1 or 0).
➢The activation function used for the input layer and the hidden layer is a binary step function.
➢The output is y = f(y_in), with activation function
F(y) = 1 if y > θ
     = 0 if -θ ≤ y ≤ θ
     = -1 if y < -θ
where θ is the threshold.
52. Perceptron Learning Rule
➢ Weight updation takes place between the hidden layer and the output layer to match the target output.
➢ The error is calculated from the actual output and the desired output.
➢ If the output matches the target, no weight updation takes place.
➢ The weights in the network can initially be set to any values.
➢ Perceptron learning will converge to a weight vector that gives the correct output for all input training patterns (provided such a vector exists, i.e. the patterns are linearly separable), and this learning happens in a finite number of steps.
➢ The perceptron rule can be used for both binary and bipolar inputs.
53. Training Steps
➢ Let there be n training input vectors x(n), with t(n) the associated target values.
➢ Initialize the weights and bias to zero for easy calculation, and let the learning rate be 1.
➢ The input layer has the identity activation function, so x(i) = y(i).
➢ To calculate the output of the network:
•Calculate the net input to the output neuron
•Apply the activation function over the net input
➢ Now, based on the output y, compare the desired target value t and the actual output.
➢ Update the weights and bias if y ≠ t.
➢ Continue the iteration until there is no weight change; stop once this condition is achieved.
Weight updation:
if output y ≠ target t, then
w(new) = w(old) + t·x
b(new) = b(old) + t
else
w(new) = w(old)
b(new) = b(old)
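A compact sketch of this training loop in Python, assuming bipolar inputs and targets and the three-valued activation from the previous slide with θ = 0.2 (the slides leave these choices open):

```python
def activation(y_in, theta=0.2):
    """Perceptron activation: 1 above theta, -1 below -theta, else 0."""
    return 1 if y_in > theta else (-1 if y_in < -theta else 0)

def train_perceptron(samples, theta=0.2, alpha=1.0):
    w1 = w2 = b = 0.0
    while True:                          # one pass over all patterns = one epoch
        changed = False
        for (x1, x2), t in samples:
            y = activation(b + x1 * w1 + x2 * w2, theta)
            if y != t:                   # update only when output misses target
                w1 += alpha * t * x1
                w2 += alpha * t * x2
                b  += alpha * t
                changed = True
        if not changed:                  # convergence is guaranteed only for
            return w1, w2, b             # linearly separable patterns

# Bipolar AND: converges in a couple of epochs, e.g. to (1, 1, -1)
and_data = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
print(train_perceptron(and_data))
```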
54. Implementation of the AND Function
[Worked table with columns: Inputs (x1, x2) | Bias | Target t | Net input y_in | Output y | Weight changes (Δw1, Δw2, Δb) | New weights (w1, w2, b)]
An epoch is one full cycle of the input patterns fed to the system; epochs are repeated until no weight change is required and the iteration stops.
55. Linear Separability
• Linear separability is the concept of separating the input data into classes by means of a straight line, called the decision line (or decision boundary).
• An ANN with input and output nodes alone can achieve this only when the given problem is linear; otherwise it is not possible.
• But most real-world problems are non-linear in nature.
• Non-linear problems can be solved by introducing one or more hidden layers between the input and output layers.
[Figure: data points in the (x1, x2) plane separated by a decision line after training by the neural network.]
56. Linear Separability: Illustrative Example
'OR' gate:
x1 x2 | y
 0  0 | 0
 0  1 | 1
 1  0 | 1
 1  1 | 1
'AND' gate:
x1 x2 | y
 0  0 | 0
 0  1 | 0
 1  0 | 0
 1  1 | 1
The 'OR' gate and 'AND' gate are LINEARLY SEPARABLE: in the (x1, x2) plane, a single straight line separates the logic-1 outputs from the logic-0 outputs.
'XOR' gate:
x1 x2 | y
 0  0 | 0
 0  1 | 1
 1  0 | 1
 1  1 | 0
The 'XOR' gate is NON-LINEAR: no single straight line separates its logic-1 outputs from its logic-0 outputs.
[Figure: each gate plotted in the (x1, x2) plane, with logic-1 and logic-0 outputs marked.]
NOTE: Most data from real-world problems is non-linear.
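The separability claim can be tested empirically: the perceptron rule from the earlier slides settles on AND but never stops updating on XOR. A sketch (the bipolar encoding, θ = 0.2 and the 100-epoch cap are assumptions):

```python
def step(y_in, theta=0.2):
    return 1 if y_in > theta else (-1 if y_in < -theta else 0)

def converges(samples, max_epochs=100):
    """Run the perceptron rule; report whether some epoch makes no updates."""
    w1 = w2 = b = 0.0
    for _ in range(max_epochs):
        changed = False
        for (x1, x2), t in samples:
            if step(b + x1 * w1 + x2 * w2) != t:
                w1 += t * x1; w2 += t * x2; b += t
                changed = True
        if not changed:
            return True
    return False

and_data = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
xor_data = [((1, 1), -1), ((1, -1), 1), ((-1, 1), 1), ((-1, -1), -1)]
print(converges(and_data))  # True:  linearly separable
print(converges(xor_data))  # False: no single line separates XOR
```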
58. Shallow Neural Networks
• A neural network with one hidden layer is considered
a shallow neural network whereas a network with
many hidden layers and a large number of neurons in
each layer is considered a deep neural network.
• A “shallow” neural network has only three layers of
neurons:
➢An input layer that accepts the independent
variables or inputs of the model
➢One hidden layer
➢An output layer that generates predictions
59. Shallow Machine Learning
• Feature extraction in shallow machine learning is a manual process that requires domain knowledge of the data that we are learning from.
• In other words, "shallow learning" is a type of machine learning where we learn from data described by pre-defined features.
60. Shallow Neural Networks
• Multilayer Perceptron Network (MLPN)
• Radial Basis Function Network (RBFN)
62. We will introduce the MLP and the backpropagation algorithm which is used to train it.
"MLP" is used to describe any general feedforward network (one with no recurrent connections).
However, we will concentrate on nets with units arranged in layers.
[Figure: a layered feedforward network with inputs x1 ... xn.]
63. Different books refer to the above as either a 4-layer network (counting layers of neurons) or a 3-layer network (counting layers of adaptive weights). We will follow the latter convention.
1st question: what do the extra layers gain you? Start by looking at what a single layer can't do.
64. Perceptron Learning Theorem
• Recap: A perceptron (threshold unit) can
learn anything that it can represent (i.e.
anything separable with a hyperplane)
65. The Exclusive OR problem
A Perceptron cannot represent Exclusive OR
since it is not linearly separable.
67. Minsky & Papert (1969) offered a solution to the XOR problem by combining perceptron unit responses using a second layer of units: piecewise linear classification using an MLP with threshold (perceptron) units.
[Figure: two first-layer perceptrons (1 and 2) feeding, each with weight +1, a second-layer perceptron (3).]
72. Properties of architecture
• No connections within a layer
• No direct connections between input and output layers
• Fully connected between layers
• Often more than 3 layers
• Number of output units need not equal number of input units
• Number of hidden units per layer can be more or less than input or output units
Each unit is a perceptron:
y_i = f( Σ_{j=1..m} w_ij · x_j + b_i )
Often the bias is included as an extra weight.
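The per-unit equation above translates directly into a layer-wise computation; a minimal sketch (the sigmoid is chosen here as f, and all names and weights are illustrative):

```python
import math

def layer_forward(x, W, b, f=lambda s: 1.0 / (1.0 + math.exp(-s))):
    """y_i = f( sum_j w_ij * x_j + b_i ) for every unit i in a layer."""
    return [f(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

# Two inputs feeding three hidden units
W = [[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]]
b = [0.0, 0.1, -0.2]
print(layer_forward([1.0, 0.5], W, b))
```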
73. What does each of the layers do?
1st layer draws
linear boundaries
2nd layer combines
the boundaries
3rd layer can generate
arbitrarily complex
boundaries
74. Backpropagation Learning Algorithm 'BP'
A solution to the credit assignment problem in the MLP: Rumelhart, Hinton and Williams (1986) (though actually invented earlier, in a PhD thesis relating to economics).
BP has two phases:
Forward pass phase: computes the 'functional signal', the feedforward propagation of input pattern signals through the network.
Backward pass phase: computes the 'error signal', propagating the error backwards through the network starting at the output units (where the error is the difference between actual and desired output values).
75. Conceptually: Forward Activity -
Backward Error
77. MLP – with Single Hidden Layer
https://www.cse.unsw.edu.au/~cs9417ml/MLP2/BackPropagation.html
https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
78. Forward Propagation of Activity
• Step 1: Initialize the weights at random; choose a learning rate η
• Until the network is trained, for each training example, i.e. input pattern and target output(s):
• Step 2: Do a forward pass through the net (with fixed weights) to produce the output(s)
– i.e., in the forward direction, layer by layer:
• Inputs applied
• Multiplied by weights
• Summed
• Squashed by the sigmoid activation function
• Output passed to each neuron in the next layer
– Repeat the above until the network output(s) are produced
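Step 2 as code: chaining the layer computation gives the whole forward pass. This sketch keeps every layer's output, since the backward pass (next slide) needs them; all names and weights are illustrative:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(x, layers):
    """layers is a list of (W, b) pairs; propagate layer by layer."""
    activations = [x]
    for W, b in layers:
        # multiply by weights, sum, add bias, squash with the sigmoid
        x = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bi)
             for row, bi in zip(W, b)]
        activations.append(x)
    return activations   # every layer's output, for use in backprop

# 2 inputs -> 2 hidden -> 1 output
net = [([[0.5, -0.5], [0.3, 0.8]], [0.1, -0.1]),
       ([[1.0, -1.0]], [0.0])]
print(forward([1.0, 0.0], net))
```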
79. Step 3: Backpropagation of Error
• Compute the error (delta, or local gradient) δ_k for each output unit
• Layer by layer, compute the error (delta, or local gradient) δ_j for each hidden unit by backpropagating the errors (as shown previously)
Step 4: Next, update all the weights by Δw_ij using gradient descent, and go back to Step 2.
– The overall MLP learning algorithm, involving the forward pass and backpropagation of error (until the network training completes), is known as the Generalised Delta Rule (GDR) or, more commonly, the Back Propagation (BP) algorithm.
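Putting Steps 2-4 together, here is a sketch of one BP training step for a single-hidden-layer net with sigmoid units and squared-error loss. The delta formulas are the standard GDR ones; the layer sizes, η, random seed and the XOR task are assumptions for illustration:

```python
import math, random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_step(x, t, W1, b1, W2, b2, eta=0.5):
    # forward pass: input -> hidden -> output
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W1, b1)]
    y = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + bk) for row, bk in zip(W2, b2)]
    # backward pass: output deltas, then hidden deltas (using the old weights)
    d_out = [(tk - yk) * yk * (1 - yk) for tk, yk in zip(t, y)]
    d_hid = [sum(dk * W2[k][j] for k, dk in enumerate(d_out)) * h[j] * (1 - h[j])
             for j in range(len(h))]
    # gradient-descent updates: w += eta * delta * (input to that weight)
    for k, dk in enumerate(d_out):
        for j in range(len(h)):
            W2[k][j] += eta * dk * h[j]
        b2[k] += eta * dk
    for j, dj in enumerate(d_hid):
        for i in range(len(x)):
            W1[j][i] += eta * dj * x[i]
        b1[j] += eta * dj

# Illustration: a 2-4-1 net on XOR; the error usually falls steadily,
# though convergence depends on the random initialization.
random.seed(1)
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(4)]
b1 = [0.0] * 4
W2 = [[random.uniform(-1, 1) for _ in range(4)]]
b2 = [0.0]
data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
for _ in range(5000):
    for x, t in data:
        train_step(x, t, W1, b1, W2, b2)
for x, t in data:
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W1, b1)]
    y = sigmoid(sum(w * hj for w, hj in zip(W2[0], h)) + b2[0])
    print(x, t, round(y, 2))
```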
80. Back Propagation Algorithm Summary
81. MLP/BP: A worked example
82. Worked example: Forward Pass
83. Worked example: Forward Pass
84. Worked example: Backward Pass
85. Worked example: Update Weights
Using Generalized Delta Rule (BP)
86. Similarly for all the weights w_ij:
87. Verification that it works
88. Training
• This was a single iteration of backprop
• Training requires many iterations over many training examples, i.e. epochs (one epoch is one entire presentation of the complete training set)
• It can be slow!
• Note that computation in the MLP is local (with respect to each neuron)
• Parallel implementation of the computation is also possible
89. Training and testing data
• How many examples?
– The more the merrier!
• Disjoint training and testing data sets
– learn from the training data, but evaluate performance (generalization ability) on unseen test data
• Aim: minimize the error on the test data