Artificial Neural
Networks
14.2.2017
Prof. (Dr.) Neeta Awasthy
Director, School of Engineering and Technology
NOIDA INTERNATIONAL UNIVERSITY,
Greater Noida
Course Objective
 To understand, successfully apply
and evaluate Neural Network
structures and paradigms for
problems in Science, Engineering and
Business.
PreRequisites
 It is expected that the audience has a flair
for understanding algorithms and a basic
knowledge of Mathematics, Logic gates
and Programming
Outline
 Introduction
 How the human brain learns
 Neuron Models
 Different types of Neural Networks
 Network Layers and Structure
 Training a Neural Network
 Application of ANN
Introduction:
 Soft Computing techniques such as Neural
networks, genetic algorithms and fuzzy logic are
among the most powerful tools available for
detecting and describing subtle relationships in
massive amounts of seemingly unrelated data.
 Neural networks can learn and are actually
taught instead of being programmed.
 Teaching mode can be supervised or
unsupervised
 Neural Networks learn in the presence of noise
Biological Neural Network
Neuron and a sample of pulse train
How does the brain work
• Each neuron receives inputs from other neurons
– Use spikes to communicate
• The effect of each input line on the neuron is controlled
by a synaptic weight
– Positive or negative
• Synaptic weight adapts so that the whole network learns
to perform useful computations
– Recognizing objects, understanding languages,
making plans, controlling the body
• There are about 10^11 neurons, each with about 10^4 weights.
How the Human Brain learns
 In the human brain, a typical neuron collects signals from others through a host of
fine structures called dendrites.
 The neuron sends out spikes of electrical activity through a long, thin strand known
as an axon, which splits into thousands of branches.
 At the end of each branch, a structure called a synapse converts the activity from
the axon into electrical effects that inhibit or excite activity in the connected
neurons.
Modularity and brain
• Different bits of the cortex do different things
• Local damage to the brain has specific effects
• Early brain damage makes function relocate
• Cortex gives rapid parallel computation plus
flexibility
• Conventional computers require very fast
central processors for long sequential
computations
Information flow in nervous system
Fundamental concept
• NNs are constructed and implemented to
model the human brain.
• They perform various tasks such as pattern
matching, classification, optimization,
function approximation, vector
quantization and data clustering.
• These tasks are difficult for traditional
computers
ANN
• ANNs possess a large number of processing
elements called nodes/neurons which operate in
parallel.
• Neurons are connected to one another by
connection links.
• Each link is associated with a weight which
contains information about the input signal.
• Each neuron has an internal state of its own,
a function of the inputs that the neuron
receives, called its activation level.
Comparison between brain and ANN

Criterion         | Brain                              | ANN
Speed             | A few milliseconds                 | A few nanoseconds; massive
                  |                                    | parallel processing
Size & complexity | 10^11 neurons and 10^15            | Depends on the designer
                  | interconnections                   |
Storage capacity  | Stores information in its          | Stores in contiguous memory
                  | interconnections (synapses);       | locations; loss of memory
                  | no loss of memory                  | may sometimes happen
Tolerance         | Has fault tolerance                | No fault tolerance; information
                  |                                    | gets disrupted when
                  |                                    | interconnections are disconnected
Control mechanism | Complicated; involves chemicals    | Simpler
                  | in the biological neuron           |
Types of Problems ANN can handle
 Mathematical Modeling (Function Approximation)
 Classification
 Clustering
 Forecasting
 Vector Quantization
 Pattern Association
 Control
 Optimization
A Neuron Model
 When a neuron receives excitatory input that is sufficiently large
compared with its inhibitory input, it sends a spike of electrical activity
down its axon. Learning occurs by changing the effectiveness of the
synapses so that the influence of one neuron on another changes.
 We construct these neural networks by first trying to deduce the essential
features of neurons and their interconnections.
 We then typically program a computer to simulate these features.
A Simple Neuron
 An artificial neuron is a device with many inputs and one output.
 The neuron has two modes of operation:
 the training mode and
 the using mode.
Important terminologies of ANNs
• Weights
• Bias
• Threshold
• Learning rate
• Momentum factor
• Vigilance parameter
• Notations used in ANN
Weights
• Each neuron is connected to every other
neuron by means of directed links
• Links are associated with weights
• Weights contain information about the
input signal and are represented as a matrix
• The weight matrix is also called the
connection matrix
Weight matrix
W = [wij], an n × m array whose entry wij is the weight from
source node i to destination node j:
    | w11 w12 ... w1m |
    | w21 w22 ... w2m |
    | ...             |
    | wn1 wn2 ... wnm |
Weights contd…
• wij is the weight from processing element "i" (source node)
to processing element "j" (destination node)
[Diagram: inputs X1, …, Xi, …, Xn feed neuron Yj through weights w1j, wij, wnj, plus bias bj]
Activation Functions
• Used to calculate the output response of a
neuron.
• The sum of the weighted input signals is passed
through an activation function to obtain the response.
• Activation functions can be linear or non-linear
• Already dealt with:
– Identity function
– Single/binary step function
– Discrete/continuous sigmoidal function
Bias
• Bias acts like another weight. It is included by
adding a component x0=1 to the input
vector X.
• X=(1,X1,X2,…,Xi,…,Xn)
• Bias is of two types
– Positive bias: increase the net input
– Negative bias: decrease the net input
Why Bias is required?
• The relationship between input and output is
given by the equation of a straight line,
y = mx + c
[Diagram: input X mapped to output Y along the line y = mx + C, with the bias playing the role of the intercept C]
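Read alongside the straight-line picture above, a minimal sketch of how the bias enters the net input (the function name and values are illustrative, not from the slides):

```python
# Net input of a single neuron: the bias b plays the role of the
# intercept C in y = mx + C, shifting the threshold of the neuron
# without changing the slope contributed by the weights.

def net_input(x, w, b):
    """Weighted sum of inputs plus bias: net = sum(w_i * x_i) + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

x = [0.5, -0.5]
w = [2.0, -1.0]
print(net_input(x, w, b=0.5))                   # 2.0

# Equivalently, fold the bias in as an extra weight with fixed
# input x0 = 1, as in X = (1, X1, ..., Xn) on the slide above.
print(net_input([1.0] + x, [0.5] + w, b=0.0))   # same 2.0
```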
Threshold
• Set value based upon which the final output of
the network may be calculated
• Used in activation function
• The activation function using a threshold θ can be
defined as f(net) = 1 if net ≥ θ, and 0 otherwise
Learning rate
• Denoted by α.
• Used to control the amount of weight
adjustment at each step of training
• Learning rate ranging from 0 to 1
determines the rate of learning in each
time step
Other terminologies
• Momentum factor:
– added to the weight-update process to
speed up convergence
• Vigilance parameter:
– Denoted by ρ
– Used to control the degree of similarity
required for patterns to be assigned to the
same cluster
The McCulloch-Pitts model
Neurons work by processing information. They receive and provide
information in the form of spikes.
[Diagram: inputs x1, x2, x3, …, xn with weights w1, w2, w3, …, wn feeding a single output y]
McCulloch-Pitts NOT model
McCulloch-Pitts AND and OR models
[Diagram: inputs x1, x2 with weights w1, w2 feeding output neuron Y]
Features of McCulloch-Pitts model
• Allows binary 0,1 states only
• Operates under a discrete-time assumption
• Weights and the neurons’ thresholds are
fixed in the model, and there is no interaction
among network neurons
• Just a primitive model
Properties for Mc Culloch and Pitts Model
Input is 0 or 1
Weights are -1, 0 or +1
Threshold is an integer
Output is 0 or 1
Output is 1 if the weighted sum of the inputs is at least the threshold;
otherwise the output is 0
Represent the NOT gate with the help of this model, using a signal flow
graph and a flowchart.
NOT gate: input x, weight w = -1, threshold L = 0;
output y = 1 if w·x >= L, else 0
Truth table:
x | y
0 | 1
1 | 0
McCulloch and Pitts Model: OR Gate and AND Gate
 OR gate: inputs x, y with weights wx = wy = 1 and threshold L = 1;
output z = 1 if wx·x + wy·y >= 1
(flowchart: Start -> Input x, y -> apply wx, wy -> test L >= 1 -> output z -> Stop)
Truth table (OR):
x y | z
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 1
(The AND gate uses the same structure with threshold L = 2.)
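A small sketch of the McCulloch-Pitts neuron described by the properties above (binary inputs, weights in {-1, 0, +1}, integer threshold L). The NOT and OR settings follow these slides; the AND threshold of 2 is the assumption noted above:

```python
def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts unit: fire (1) iff the weighted sum reaches L."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def NOT(x):    return mp_neuron([x], [-1], 0)        # w = -1, L = 0
def OR(x, y):  return mp_neuron([x, y], [1, 1], 1)   # wx = wy = 1, L = 1
def AND(x, y): return mp_neuron([x, y], [1, 1], 2)   # same weights, L = 2

print([NOT(x) for x in (0, 1)])          # [1, 0]
for x in (0, 1):
    for y in (0, 1):
        print(x, y, OR(x, y), AND(x, y))  # reproduces the truth tables
```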
OR gate vs. XOR gate
Linear Separability
Advantages and Disadvantages of
McCulloch Pitt model
• Advantages
• Simplistic
• Substantial computing
power
• Disadvantages
– Weights and thresholds
are fixed
– Not very flexible
Quiz
• Which of the following tasks are neural
networks good at?
– Recognizing fragments of words in a pre-
processed sound wave.
– Recognizing badly written characters.
– Storing lists of names and birth dates.
– logical reasoning
Neural networks are good at finding statistical regularities that allow
them to recognize patterns. They are not good at flawlessly
applying symbolic rules or storing exact numbers.
The Perceptron Model
Perceptron Learning rule
• The learning signal is the difference between the
desired and the actual response of the neuron
• Learning is supervised
General symbol of neuron consisting of
processing node and synaptic connections
Neuron Modeling for ANN
f is referred to as the activation function; its domain is the
set of activation values net.
net is the scalar product of the weight and input vectors: net = w · x.
The neuron, as a processing node, performs a summation of
its weighted inputs.
Sigmoid neurons
• These give a real-valued
output that is a smooth and
bounded function of their
total input.
– Typically they use the
logistic function
y = 1 / (1 + e^(-z))
– They have nice
derivatives, which makes
learning easy
[Plot: logistic curve rising smoothly from 0 to 1, passing through 0.5 at z = 0]
Activation function
• Bipolar binary and unipolar binary are
called hard-limiting activation functions and
are used in discrete neuron models
• Unipolar continuous and bipolar continuous
are called soft-limiting activation functions,
also known as sigmoidal characteristics
Activation functions
Bipolar continuous: f(net) = 2 / (1 + exp(-λ·net)) - 1
Bipolar binary: f(net) = sgn(net) = +1 if net > 0, -1 otherwise
Unipolar continuous: f(net) = 1 / (1 + exp(-λ·net))
Unipolar binary: f(net) = 1 if net > 0, 0 otherwise
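A sketch of these four functions in Python, assuming the common textbook forms above with steepness parameter λ (`lam`):

```python
import math

def unipolar_binary(net):
    """Hard-limiting, outputs in {0, 1}."""
    return 1 if net > 0 else 0

def bipolar_binary(net):
    """Hard-limiting, outputs in {-1, +1} (the sgn function)."""
    return 1 if net > 0 else -1

def unipolar_continuous(net, lam=1.0):
    """Soft-limiting logistic sigmoid, outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-lam * net))

def bipolar_continuous(net, lam=1.0):
    """Soft-limiting scaled sigmoid, outputs in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-lam * net)) - 1.0
```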
Common models of neurons
Binary perceptrons
Continuous perceptrons
The Sigmoidal Function
Example
Quiz
• Suppose we have a 2D input x=(0.5,-0.5) connected to a
neuron with weights w=(2,-1) and bias b=0.5. Furthermore,
the target for x is t=0. In this case we use a binary
threshold neuron for the output, so that
y = 1 if x^T w + b >= 0, and 0 otherwise
What will be the weights and bias after 1 iteration of
perceptron learning algorithm?
 w= (1.5,-0.5) b=-1.5
 w=(1.5,-0.5) b=-0.5
 w=(2.5,-1.5) b=0.5
 w=(-1.5,0.5) b=1.5
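A quick way to check the quiz, assuming the standard perceptron update w ← w + (t - y)·x, b ← b + (t - y) with learning rate 1:

```python
def perceptron_step(x, w, b, t):
    """One iteration of perceptron learning, binary threshold output."""
    y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0
    err = t - y                                  # learning signal
    w = [wi + err * xi for wi, xi in zip(w, x)]  # w <- w + (t - y) x
    b = b + err                                  # b <- b + (t - y)
    return w, b

print(perceptron_step(x=(0.5, -0.5), w=[2, -1], b=0.5, t=0))
# ([1.5, -0.5], -0.5): net = 2 fires y = 1, so err = -1
```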
Basic Models of ANN
Interconnections Learning rules Activation function
Classification based on
interconnections
Feed Forward Network
Summary of the simple networks
 Single-layer nets have limited representational
power (the linear separability problem)
 Error-driven learning seems a good way to train a net
 Multi-layer nets (or nets with non-linear hidden
units) may overcome the linear inseparability
problem; learning methods for such nets are
needed
 Threshold/step output functions hinder the
effort to develop learning methods for multi-
layered nets
Training/ Learning
 Learning can be of one of the following forms:
 Supervised Learning
 Unsupervised Learning
 Reinforced Learning
 The patterns given to the classifier may be based on:
 Parametric Estimation
 Non-Parametric Estimation
Machine Learning in ANNs
 Supervised Learning − It involves a
teacher that is more knowledgeable than the
ANN itself. For example, the teacher feeds some
example data about which the teacher
already knows the answers.
Machine Learning in ANNs
 Unsupervised Learning − It is required
when there is no example data set with
known answers, for example when searching
for a hidden pattern. In this case, clustering,
i.e. dividing a set of elements into groups
according to some unknown pattern, is
carried out on the available data sets.
Machine Learning in ANNs
 Reinforcement Learning − This strategy
is built on observation. The ANN makes a
decision by observing its environment. If
the observation is negative, the network
adjusts its weights to be able to make a
different required decision the next time.
Unsupervised Learning: why?
 Collecting and labeling a large set of sample patterns can
be costly.
 Train with large amounts of unlabeled data, and only then
use supervision to label the groupings found.
 In dynamic systems, the samples can change slowly.
 To find features that will then be useful for categorization.
To provide a form of data dependent smart processing or
smart feature extraction.
 To perform exploratory data analysis, find structure in the
data, and form proper classes for supervised analysis.
Measure of Dissimilarity:
 Define a metric or distance function d on the vector space λ as
a real-valued function on the Cartesian product λ × λ such that:
 Positive definiteness:
0 ≤ d(x,y) < ∞ for x,y ∈ λ, and d(x,y)=0 if and only if x=y
 Symmetry:
d(x,y) = d(y,x) for x,y ∈ λ
 Triangle inequality:
d(x,y) ≤ d(x,z) + d(z,y) for x,y,z ∈ λ
 Invariance of the distance function: d(x+z,y+z) = d(x,y)
Error Computation
 Minkowski metric or Lk norm
 Manhattan distance or L1 norm
 Euclidean distance or L2 norm
 Ln norm
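A small sketch of these distances; the Minkowski form reduces to Manhattan at k = 1 and Euclidean at k = 2:

```python
def minkowski(x, y, k):
    """L_k norm of the difference vector."""
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1.0 / k)

manhattan = lambda x, y: minkowski(x, y, 1)   # L1 norm
euclidean = lambda x, y: minkowski(x, y, 2)   # L2 norm

print(manhattan([0, 0], [3, 4]))   # 7.0
print(euclidean([0, 0], [3, 4]))   # 5.0
```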
Neural Network Applications
Neural networks have performed
successfully where other methods have
not: predicting system behavior, and
recognizing and matching complicated,
vague, or incomplete data patterns.
Apply ANNs to pattern recognition,
interpretation, prediction, diagnosis,
planning, monitoring, debugging, repair,
instruction, control
 Biomedical Signal Processing
 Biometric Identification
 Pattern Recognition
 System Reliability
 Business
 Target Tracking
Pattern Recognition System
Input → Sensing → Segmentation → Feature Extraction →
Classification (missing features & context) →
Post-processing (costs/errors) → Output (decision)
Feed-forward neural networks
• These are the commonest type of neural
network in practical applications.
– The first layer is the input and the last layer
is the output.
– If there is more than one hidden layer, we
call them “deep” neural networks.
• They compute a series of transformations that
change the similarities between cases.
– The activities of the neurons in each layer
are a non-linear function of the activities in
the layer below.
[Diagram: input units at the bottom, hidden units in the middle, output units at the top]
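A minimal sketch of the forward pass just described, with illustrative weights (not from the slides); each layer's activities are a non-linear function of the activities in the layer below:

```python
import math

def sigmoid(z):
    """Logistic activation: smooth, bounded output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(below, weights, biases):
    """Activities of one layer as a non-linear function of the layer below."""
    return [sigmoid(sum(w * a for w, a in zip(row, below)) + b)
            for row, b in zip(weights, biases)]

x = [0.5, -0.5]                                       # input units
h = layer(x, [[2.0, -1.0], [1.0, 1.0]], [0.5, 0.0])   # hidden units
y = layer(h, [[1.0, -1.0]], [0.0])                    # output unit
print(y)
```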
Feedforward Network
• Its input and output vectors are
x = (x1, …, xn) and o = (o1, …, om) respectively
• Weight wij connects the i’th neuron with the
j’th input. The activation rule of the i’th neuron is
oi = f(Σj wij xj), where f is the activation function
EXAMPLE
Multilayer feed forward network
Can be used to solve complicated problems
Feedback network
When outputs are directed back as
inputs to nodes in the same or a preceding
layer, the result is a
feedback network
Lateral feedback
If the feedback of the output of the processing elements is directed back
as input to the processing elements in the same layer then it is called
lateral feedback
Recurrent networks
• These have directed cycles in their connection
graph.
– That means you can sometimes get back to
where you started by following the arrows.
• They can have complicated dynamics and this
can make them very difficult to train.
– There is a lot of interest at present in finding
efficient ways of training recurrent nets.
• They are more biologically realistic.
Recurrent nets with
multiple hidden layers
are just a special case
that has some of the
hidden-to-hidden
connections missing.
Recurrent neural networks for modeling sequences
• Recurrent neural networks are a very natural
way to model sequential data:
– They are equivalent to very deep nets with
one hidden layer per time slice.
– Except that they use the same weights at
every time slice and they get input at every
time slice.
• They have the ability to remember information
in their hidden state for a long time.
– But it’s very hard to train them to use this
potential.
[Diagram: an RNN unrolled through time, one input/hidden/output column per time slice, with hidden-to-hidden connections carrying state forward in time]
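A sketch of the weight sharing just described, with scalar weights for brevity (all values illustrative): the same parameters are reused at every time slice, and the hidden state carries information forward:

```python
import math

def rnn_step(x_t, h_prev, w_xh=1.0, w_hh=0.5, w_hy=2.0):
    """One time slice: the same three weights are reused at every step."""
    h_t = math.tanh(w_xh * x_t + w_hh * h_prev)  # hidden state (memory)
    y_t = w_hy * h_t                             # output at this slice
    return h_t, y_t

h = 0.0
for x_t in [1.0, 0.0, -1.0]:                     # a short input sequence
    h, y = rnn_step(x_t, h)
    print(round(h, 3), round(y, 3))
```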
An example of what recurrent neural nets can now do
(to whet your interest!)
• Ilya Sutskever (2011) trained a special type of recurrent neural net to
predict the next character in a sequence.
• After training for a long time on a string of half a billion characters
from English Wikipedia, he got it to generate new text.
– It generates by predicting the probability distribution for the next
character and then sampling a character from this distribution.
Symmetrically connected networks
• These are like recurrent networks, but the connections between units
are symmetrical (they have the same weight in both directions).
– John Hopfield (and others) realized that symmetric networks are
much easier to analyze than recurrent networks.
– They are also more restricted in what they can do, because they
obey an energy function.
• For example, they cannot model cycles.
• Symmetrically connected nets without hidden units are called
“Hopfield nets”.
Symmetrically connected networks
with hidden units
• These are called “Boltzmann machines”.
– They are much more powerful models than Hopfield nets.
– They are less powerful than recurrent neural networks.
– They have a beautifully simple learning algorithm.
Basic Models of ANN
Interconnections Learning rules Activation function
Learning
• It’s a process by which a NN adapts itself
to a stimulus by making proper parameter
adjustments, resulting in the production of
the desired response
• Two kinds of learning
– Parameter learning:- connection weights are
updated
– Structure Learning:- change in network
structure
Training
• The process of modifying the weights in
the connections between network layers
with the objective of achieving the
expected output is called training a
network.
• This is achieved through
– Supervised learning
– Unsupervised learning
– Reinforcement learning
Classification of learning
• Supervised learning:-
– Learn to predict an output when given an input
vector.
• Unsupervised learning
– Discover a good internal representation of the
input.
• Reinforcement learning
– Learn to select an action to maximize payoff.
Supervised Learning
• Child learns from a teacher
• Each input vector requires a corresponding
target vector.
• Training pair=[input vector, target vector]
[Diagram: input X enters the neural network (weights W), producing actual output Y; an error-signal generator compares Y with the desired output D and feeds the error signal (D − Y) back to adjust W]
Two types of supervised learning
• Each training case consists of an input vector x and a
target output t.
• Regression: The target output is a real number or a whole
vector of real numbers.
– The price of a stock in 6 months’ time.
– The temperature at noon tomorrow.
• Classification: The target output is a class label.
– The simplest case is a choice between 1 and 0.
– We can also have multiple alternative labels.
Unsupervised Learning
• How a fish or tadpole learns
• All similar input patterns are grouped together as clusters.
• If a matching input pattern is not found a new cluster is formed
• One major aim is to create an internal representation of the input
that is useful for subsequent supervised or reinforcement learning.
• It provides a compact, low-dimensional representation of the input.
Self-organizing
• In unsupervised learning there is no
feedback
• The network must discover patterns,
regularities, and features of the input data
on its own
• While doing so, the network may change
its parameters
• This process is called self-organizing
Reinforcement Learning
[Diagram: input X enters the neural network (weights W), producing actual output Y; an error-signal generator receives the reinforcement signal R and feeds error signals back to adjust W]
When is Reinforcement Learning used?
• If less information is available about the
target output values (critic information)
• Learning based on this critic information is
called reinforcement learning and the
feedback sent is called reinforcement
signal
• Feedback in this case is only evaluative
and not instructive
Basic Models of ANN
Interconnections Learning rules Activation function
Activation Function
1. Identity function: f(x) = x for all x
2. Binary step function
3. Bipolar step function
4. Sigmoidal functions: continuous functions
5. Ramp functions
Some learning algorithms we will learn
are
• Supervised:
• Adaline, Madaline
• Perceptron
• Back Propagation
• multilayer perceptrons
• Radial Basis Function Networks
• Unsupervised
• Competitive Learning
• Kohonen self-organizing map
• Learning vector quantization
• Hebbian learning
Neural processing
• Recall: the processing phase of a NN; its
objective is to retrieve information. It is the
process of computing o for a given x
• Basic forms of neural information
processing
– Auto association
– Hetero association
– Classification
Neural processing-Autoassociation
• Set of patterns can be
stored in the network
• If a pattern similar to a
member of the stored
set is presented, an
association with the
input of closest stored
pattern is made
Neural Processing- Heteroassociation
• Associations between
pairs of patterns are
stored
• Distorted input pattern
may cause correct
heteroassociation at
the output
Neural processing-Classification
• Set of input patterns is
divided into a number
of classes or
categories
• In response to an
input pattern from the
set, the classifier is
supposed to recall the
information regarding
class membership of
the input pattern.
Neural Network Learning rules
c – learning constant in the general weight update
Δw = c · r(w, x, d) · x, where r is the learning signal
Hebbian Learning Rule
• The learning signal is equal to the neuron’s
output
FEED FORWARD UNSUPERVISED LEARNING
Features of Hebbian Learning
• Feedforward unsupervised learning
• “When an axon of cell A is near enough
to excite cell B and repeatedly or
persistently takes part in firing it, some
growth process or metabolic change takes place in
one or both cells such that A’s efficiency,
as one of the cells firing B, is increased” (Hebb, 1949)
• If oixj is positive, the weight increases;
otherwise it decreases
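A sketch of this update, assuming the learning signal r = o = f(w·x), so that Δwi = c·o·xi:

```python
def hebbian_update(w, x, c, f):
    """Delta-w_i = c * o * x_i with o = f(w . x): a weight grows when
    the pre- and post-synaptic activities agree in sign."""
    o = f(sum(wi * xi for wi, xi in zip(w, x)))
    return [wi + c * o * xi for wi, xi in zip(w, x)]

sign = lambda net: 1 if net > 0 else -1        # bipolar binary activation
print(hebbian_update([1.0, -1.0], [1, -1], c=0.1, f=sign))
# [1.1, -1.1]: o = +1, so each weight moves toward its input
```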
Delta Learning Rule
• Only valid for continuous activation function
• Used in supervised training mode
• The learning signal for this rule is called delta: r = (di − oi)·f′(neti)
• The aim of the delta rule is to minimize the error over all training
patterns
Delta Learning Rule Contd.
Learning rule is derived from the condition of least squared error.
Calculating the gradient vector with respect to wi
Minimization of error requires the weight changes to be in the negative
gradient direction
Widrow-Hoff learning Rule
• Also called the least mean square (LMS) learning rule
• Introduced by Widrow (1962); used in supervised learning
• Independent of the activation function
• A special case of the delta learning rule in which the activation
function is the identity function, i.e. f(net) = net
• Minimizes the squared error between the desired output value di
and neti
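A sketch of the two updates side by side, using the logistic activation for the delta rule as an illustrative choice; Widrow-Hoff is the special case f(net) = net, so f′(net) = 1:

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def delta_update(w, x, d, c):
    """Delta rule: Delta-w_i = c * (d - o) * f'(net) * x_i."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    o = sigmoid(net)
    fprime = o * (1 - o)                  # derivative of the logistic
    return [wi + c * (d - o) * fprime * xi for wi, xi in zip(w, x)]

def widrow_hoff_update(w, x, d, c):
    """LMS rule: identity activation, Delta-w_i = c * (d - net) * x_i."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + c * (d - net) * xi for wi, xi in zip(w, x)]
```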
Winner-Take-All learning rules
Winner-Take-All Learning rule Contd…
• Can be explained for a layer of neurons
• Example of competitive learning and used for
unsupervised network training
• Learning is based on the premise that one of the
neurons in the layer has the maximum response
to the input x
• This neuron is declared the winner, and only its
weights are updated (typically toward the input)
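A sketch of one competitive step, assuming the usual update Δwm = α(x − wm) applied to the winning neuron m only:

```python
def winner_take_all(weights, x, alpha):
    """Pick the neuron with maximum response w_m . x, then move only
    its weight vector toward the input: w_m <- w_m + alpha * (x - w_m)."""
    responses = [sum(wi * xi for wi, xi in zip(w, x)) for w in weights]
    m = responses.index(max(responses))               # winning neuron
    weights[m] = [wi + alpha * (xi - wi) for wi, xi in zip(weights[m], x)]
    return m, weights

w = [[1.0, 0.0], [0.0, 1.0]]
print(winner_take_all(w, x=[0.9, 0.2], alpha=0.5))
# neuron 0 wins; its weights move halfway toward x
```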
Summary of learning rules
Linear Separability
• Separation of the input space into regions
is based on whether the network response
is positive or negative
• The line of separation is called the linearly
separable line
• Examples:
– The AND and OR functions are linearly
separable
– The XOR function is linearly inseparable
(see the sketch below)
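A small experiment matching these examples (the epoch cap and learning rate are illustrative choices): the perceptron update finds a separating line for AND but never settles on XOR:

```python
def train_perceptron(samples, epochs=20, alpha=1.0):
    """Single-layer perceptron; returns weights, bias and convergence flag."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        errors = 0
        for x, t in samples:
            y = 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0
            if y != t:
                errors += 1
                w = [wi + alpha * (t - y) * xi for wi, xi in zip(w, x)]
                b += alpha * (t - y)
        if errors == 0:
            return w, b, True     # a separating line was found
    return w, b, False            # linearly inseparable within the cap

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print(train_perceptron(AND))      # converges
print(train_perceptron(XOR))      # does not converge
```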
Hebb Network
• The Hebb learning rule is the simplest one
• Learning in the brain is performed by
changes in the synaptic gap
• When an axon of cell A is near enough to excite
cell B and repeatedly keeps firing it, some growth
process takes place in one or both cells
• According to the Hebb rule, the weight vector
increases proportionately to the product of the
input and the learning signal
Flow chart of Hebb training algorithm
1. Start: initialize weights (and bias) to zero
2. For each training pair s:t
– Activate the input units: xi = si
– Activate the output unit: y = t
– Weight update: wi(new) = wi(old) + xi·y
– Bias update: b(new) = b(old) + y
3. Stop when all pairs have been presented
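A sketch of this flowchart as code; the bipolar AND training set is an assumed example, not from the slides:

```python
def hebb_train(samples):
    """Hebb training: for each pair s:t, do w_i += x_i * y and b += y."""
    w, b = [0.0, 0.0], 0.0
    for s, t in samples:
        x, y = s, t                                  # activate inputs/output
        w = [wi + xi * y for wi, xi in zip(w, x)]    # weight update
        b += y                                       # bias update
    return w, b

# Bipolar AND: inputs and targets in {-1, +1}
AND = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
print(hebb_train(AND))   # ([2.0, 2.0], -2.0)
```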

More Related Content

What's hot

Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural Network
Prakash K
 
Neural Networks: Rosenblatt's Perceptron
Neural Networks: Rosenblatt's PerceptronNeural Networks: Rosenblatt's Perceptron
Neural Networks: Rosenblatt's Perceptron
Mostafa G. M. Mostafa
 
Perceptron (neural network)
Perceptron (neural network)Perceptron (neural network)
Perceptron (neural network)
EdutechLearners
 
Introduction to Neural Networks
Introduction to Neural NetworksIntroduction to Neural Networks
Introduction to Neural Networks
Databricks
 
Neural network
Neural networkNeural network
Neural network
Ramesh Giri
 
Neural Networks: Multilayer Perceptron
Neural Networks: Multilayer PerceptronNeural Networks: Multilayer Perceptron
Neural Networks: Multilayer Perceptron
Mostafa G. M. Mostafa
 
Artificial nueral network slideshare
Artificial nueral network slideshareArtificial nueral network slideshare
Artificial nueral network slideshare
Red Innovators
 
Intro to Neural Networks
Intro to Neural NetworksIntro to Neural Networks
Intro to Neural Networks
Dean Wyatte
 
Artificial Neural Network(Artificial intelligence)
Artificial Neural Network(Artificial intelligence)Artificial Neural Network(Artificial intelligence)
Artificial Neural Network(Artificial intelligence)
spartacus131211
 
Pattern recognition and Machine Learning.
Pattern recognition and Machine Learning.Pattern recognition and Machine Learning.
Pattern recognition and Machine Learning.
Rohit Kumar
 
Perceptron
PerceptronPerceptron
Perceptron
Nagarajan
 
neural networks
neural networksneural networks
neural networks
Ruchi Sharma
 
Neural Networks
Neural NetworksNeural Networks
Neural Networks
Ismail El Gayar
 
Recurrent neural networks rnn
Recurrent neural networks   rnnRecurrent neural networks   rnn
Recurrent neural networks rnn
Kuppusamy P
 
Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural Network
Muhammad Ishaq
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNN
Shuai Zhang
 
Artificial Neural Networks Lect5: Multi-Layer Perceptron & Backpropagation
Artificial Neural Networks Lect5: Multi-Layer Perceptron & BackpropagationArtificial Neural Networks Lect5: Multi-Layer Perceptron & Backpropagation
Artificial Neural Networks Lect5: Multi-Layer Perceptron & Backpropagation
Mohammed Bennamoun
 
Digit recognition using mnist database
Digit recognition using mnist databaseDigit recognition using mnist database
Digit recognition using mnist database
btandale
 
Neural networks
Neural networksNeural networks
Neural networks
Rizwan Rizzu
 

What's hot (20)

Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural Network
 
Neural Networks: Rosenblatt's Perceptron
Neural Networks: Rosenblatt's PerceptronNeural Networks: Rosenblatt's Perceptron
Neural Networks: Rosenblatt's Perceptron
 
Perceptron (neural network)
Perceptron (neural network)Perceptron (neural network)
Perceptron (neural network)
 
Introduction to Neural Networks
Introduction to Neural NetworksIntroduction to Neural Networks
Introduction to Neural Networks
 
Neural network
Neural networkNeural network
Neural network
 
Neural Networks: Multilayer Perceptron
Neural Networks: Multilayer PerceptronNeural Networks: Multilayer Perceptron
Neural Networks: Multilayer Perceptron
 
Artificial nueral network slideshare
Artificial nueral network slideshareArtificial nueral network slideshare
Artificial nueral network slideshare
 
Intro to Neural Networks
Intro to Neural NetworksIntro to Neural Networks
Intro to Neural Networks
 
Artificial Neural Network(Artificial intelligence)
Artificial Neural Network(Artificial intelligence)Artificial Neural Network(Artificial intelligence)
Artificial Neural Network(Artificial intelligence)
 
Pattern recognition and Machine Learning.
Pattern recognition and Machine Learning.Pattern recognition and Machine Learning.
Pattern recognition and Machine Learning.
 
Perceptron
PerceptronPerceptron
Perceptron
 
neural networks
neural networksneural networks
neural networks
 
Neural Networks
Neural NetworksNeural Networks
Neural Networks
 
Frames
FramesFrames
Frames
 
Recurrent neural networks rnn
Recurrent neural networks   rnnRecurrent neural networks   rnn
Recurrent neural networks rnn
 
Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural Network
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNN
 
Artificial Neural Networks Lect5: Multi-Layer Perceptron & Backpropagation
Artificial Neural Networks Lect5: Multi-Layer Perceptron & BackpropagationArtificial Neural Networks Lect5: Multi-Layer Perceptron & Backpropagation
Artificial Neural Networks Lect5: Multi-Layer Perceptron & Backpropagation
 
Digit recognition using mnist database
Digit recognition using mnist databaseDigit recognition using mnist database
Digit recognition using mnist database
 
Neural networks
Neural networksNeural networks
Neural networks
 

Similar to Artificial Neural Networks for NIU session 2016 17

artificial-neural-networks-rev.ppt
artificial-neural-networks-rev.pptartificial-neural-networks-rev.ppt
artificial-neural-networks-rev.ppt
SanaMateen7
 
artificial-neural-networks-rev.ppt
artificial-neural-networks-rev.pptartificial-neural-networks-rev.ppt
artificial-neural-networks-rev.ppt
RINUSATHYAN
 
Artificial Neural Network in Medical Diagnosis
Artificial Neural Network in Medical DiagnosisArtificial Neural Network in Medical Diagnosis
Artificial Neural Network in Medical Diagnosis
Adityendra Kumar Singh
 
Neural networks of artificial intelligence
Neural networks of artificial  intelligenceNeural networks of artificial  intelligence
Neural networks of artificial intelligence
alldesign
 
neuralnetwork.pptx
neuralnetwork.pptxneuralnetwork.pptx
neuralnetwork.pptx
SherinRappai1
 
neuralnetwork.pptx
neuralnetwork.pptxneuralnetwork.pptx
neuralnetwork.pptx
SherinRappai
 
Artificial Neural Network_VCW (1).pptx
Artificial Neural Network_VCW (1).pptxArtificial Neural Network_VCW (1).pptx
Artificial Neural Network_VCW (1).pptx
pratik610182
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
Mohd Arafat Shaikh
 
02 Fundamental Concepts of ANN
02 Fundamental Concepts of ANN02 Fundamental Concepts of ANN
02 Fundamental Concepts of ANN
Tamer Ahmed Farrag, PhD
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
nainabhatt2
 
Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural Network
NainaBhatt1
 
Artificial Neural Networks Artificial Neural Networks
Artificial Neural Networks Artificial Neural NetworksArtificial Neural Networks Artificial Neural Networks
Artificial Neural Networks Artificial Neural Networks
MajdDassan
 
Neural-Networks.ppt
Neural-Networks.pptNeural-Networks.ppt
Neural-Networks.ppt
RINUSATHYAN
 
UNIT 5-ANN.ppt
UNIT 5-ANN.pptUNIT 5-ANN.ppt
UNIT 5-ANN.ppt
Sivam Chinna
 
Artificial neural networks
Artificial neural networksArtificial neural networks
Artificial neural networks
madhu sudhakar
 
Neuralnetwork 101222074552-phpapp02
Neuralnetwork 101222074552-phpapp02Neuralnetwork 101222074552-phpapp02
Neuralnetwork 101222074552-phpapp02Deepu Gupta
 
Presentationnnnn
PresentationnnnnPresentationnnnn
Presentationnnnn
Sulman Ahmed
 
33.-Multi-Layer-Perceptron.pdf
33.-Multi-Layer-Perceptron.pdf33.-Multi-Layer-Perceptron.pdf
33.-Multi-Layer-Perceptron.pdf
gnans Kgnanshek
 
2011 0480.neural-networks
2011 0480.neural-networks2011 0480.neural-networks
2011 0480.neural-networks
Parneet Kaur
 
Neural network
Neural networkNeural network
Neural network
Saddam Hussain
 

Similar to Artificial Neural Networks for NIU session 2016 17 (20)

artificial-neural-networks-rev.ppt
artificial-neural-networks-rev.pptartificial-neural-networks-rev.ppt
artificial-neural-networks-rev.ppt
 
artificial-neural-networks-rev.ppt
artificial-neural-networks-rev.pptartificial-neural-networks-rev.ppt
artificial-neural-networks-rev.ppt
 
Artificial Neural Network in Medical Diagnosis
Artificial Neural Network in Medical DiagnosisArtificial Neural Network in Medical Diagnosis
Artificial Neural Network in Medical Diagnosis
 
Neural networks of artificial intelligence
Neural networks of artificial  intelligenceNeural networks of artificial  intelligence
Neural networks of artificial intelligence
 
neuralnetwork.pptx
neuralnetwork.pptxneuralnetwork.pptx
neuralnetwork.pptx
 
neuralnetwork.pptx
neuralnetwork.pptxneuralnetwork.pptx
neuralnetwork.pptx
 
Artificial Neural Network_VCW (1).pptx
Artificial Neural Network_VCW (1).pptxArtificial Neural Network_VCW (1).pptx
Artificial Neural Network_VCW (1).pptx
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 
02 Fundamental Concepts of ANN
02 Fundamental Concepts of ANN02 Fundamental Concepts of ANN
02 Fundamental Concepts of ANN
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 
Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural Network
 
Artificial Neural Networks Artificial Neural Networks
Artificial Neural Networks Artificial Neural NetworksArtificial Neural Networks Artificial Neural Networks
Artificial Neural Networks Artificial Neural Networks
 
Neural-Networks.ppt
Neural-Networks.pptNeural-Networks.ppt
Neural-Networks.ppt
 
UNIT 5-ANN.ppt
UNIT 5-ANN.pptUNIT 5-ANN.ppt
UNIT 5-ANN.ppt
 
Artificial neural networks
Artificial neural networksArtificial neural networks
Artificial neural networks
 
Neuralnetwork 101222074552-phpapp02
Neuralnetwork 101222074552-phpapp02Neuralnetwork 101222074552-phpapp02
Neuralnetwork 101222074552-phpapp02
 
Presentationnnnn
PresentationnnnnPresentationnnnn
Presentationnnnn
 
33.-Multi-Layer-Perceptron.pdf
33.-Multi-Layer-Perceptron.pdf33.-Multi-Layer-Perceptron.pdf
33.-Multi-Layer-Perceptron.pdf
 
2011 0480.neural-networks
2011 0480.neural-networks2011 0480.neural-networks
2011 0480.neural-networks
 
Neural network
Neural networkNeural network
Neural network
 

More from Prof. Neeta Awasthy

NEP 2020 .pptx
NEP 2020 .pptxNEP 2020 .pptx
NEP 2020 .pptx
Prof. Neeta Awasthy
 
Subhash Chandra Bose, His travels to Freedom
Subhash Chandra Bose, His travels to FreedomSubhash Chandra Bose, His travels to Freedom
Subhash Chandra Bose, His travels to Freedom
Prof. Neeta Awasthy
 
# 21 tips for a great presentation
# 21 tips for a great presentation# 21 tips for a great presentation
# 21 tips for a great presentation
Prof. Neeta Awasthy
 
Comparative Design thinking
Comparative Design thinking Comparative Design thinking
Comparative Design thinking
Prof. Neeta Awasthy
 
National Education Policy 2020
National Education Policy 2020 National Education Policy 2020
National Education Policy 2020
Prof. Neeta Awasthy
 
Personalised education (2)
Personalised education (2)Personalised education (2)
Personalised education (2)
Prof. Neeta Awasthy
 
Case study of digitization in india
Case study of digitization in indiaCase study of digitization in india
Case study of digitization in india
Prof. Neeta Awasthy
 
Student dashboard for Engineering Undergraduates
Student dashboard for Engineering UndergraduatesStudent dashboard for Engineering Undergraduates
Student dashboard for Engineering Undergraduates
Prof. Neeta Awasthy
 
Handling Capstone projects in Engineering Colllege
Handling Capstone projects in Engineering ColllegeHandling Capstone projects in Engineering Colllege
Handling Capstone projects in Engineering Colllege
Prof. Neeta Awasthy
 
Engineering Applications of Machine Learning
Engineering Applications of Machine LearningEngineering Applications of Machine Learning
Engineering Applications of Machine Learning
Prof. Neeta Awasthy
 
Design thinking in Engineering
Design thinking in EngineeringDesign thinking in Engineering
Design thinking in Engineering
Prof. Neeta Awasthy
 
Data Science & Artificial Intelligence for ALL
Data Science & Artificial Intelligence for ALLData Science & Artificial Intelligence for ALL
Data Science & Artificial Intelligence for ALL
Prof. Neeta Awasthy
 
Big data and Artificial Intelligence
Big data and Artificial IntelligenceBig data and Artificial Intelligence
Big data and Artificial Intelligence
Prof. Neeta Awasthy
 
Academic industry collaboration at kec dated 3.6.17 v 3
Academic industry collaboration at kec dated 3.6.17 v 3Academic industry collaboration at kec dated 3.6.17 v 3
Academic industry collaboration at kec dated 3.6.17 v 3
Prof. Neeta Awasthy
 
AI in Talent Acquisition
AI in Talent AcquisitionAI in Talent Acquisition
AI in Talent Acquisition
Prof. Neeta Awasthy
 
Big data in defence and national security malayasia
Big data in defence and national security   malayasiaBig data in defence and national security   malayasia
Big data in defence and national security malayasia
Prof. Neeta Awasthy
 
Cyber crimes in india Dr. Neeta Awasthy
Cyber crimes in india Dr. Neeta AwasthyCyber crimes in india Dr. Neeta Awasthy
Cyber crimes in india Dr. Neeta Awasthy
Prof. Neeta Awasthy
 
Ann a Algorithms notes
Ann a Algorithms notesAnn a Algorithms notes
Ann a Algorithms notes
Prof. Neeta Awasthy
 
Steepest descent method
Steepest descent methodSteepest descent method
Steepest descent method
Prof. Neeta Awasthy
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
Prof. Neeta Awasthy
 

More from Prof. Neeta Awasthy (20)

NEP 2020 .pptx
NEP 2020 .pptxNEP 2020 .pptx
NEP 2020 .pptx
 
Subhash Chandra Bose, His travels to Freedom
Subhash Chandra Bose, His travels to FreedomSubhash Chandra Bose, His travels to Freedom
Subhash Chandra Bose, His travels to Freedom
 
# 21 tips for a great presentation
# 21 tips for a great presentation# 21 tips for a great presentation
# 21 tips for a great presentation
 
Comparative Design thinking
Comparative Design thinking Comparative Design thinking
Comparative Design thinking
 
National Education Policy 2020
National Education Policy 2020 National Education Policy 2020
National Education Policy 2020
 
Personalised education (2)
Personalised education (2)Personalised education (2)
Personalised education (2)
 
Case study of digitization in india
Case study of digitization in indiaCase study of digitization in india
Case study of digitization in india
 
Student dashboard for Engineering Undergraduates
Student dashboard for Engineering UndergraduatesStudent dashboard for Engineering Undergraduates
Student dashboard for Engineering Undergraduates
 
Handling Capstone projects in Engineering Colllege
Handling Capstone projects in Engineering ColllegeHandling Capstone projects in Engineering Colllege
Handling Capstone projects in Engineering Colllege
 
Engineering Applications of Machine Learning
Engineering Applications of Machine LearningEngineering Applications of Machine Learning
Engineering Applications of Machine Learning
 
Design thinking in Engineering
Design thinking in EngineeringDesign thinking in Engineering
Design thinking in Engineering
 
Data Science & Artificial Intelligence for ALL
Data Science & Artificial Intelligence for ALLData Science & Artificial Intelligence for ALL
Data Science & Artificial Intelligence for ALL
 
Big data and Artificial Intelligence
Big data and Artificial IntelligenceBig data and Artificial Intelligence
Big data and Artificial Intelligence
 
Academic industry collaboration at kec dated 3.6.17 v 3
Academic industry collaboration at kec dated 3.6.17 v 3Academic industry collaboration at kec dated 3.6.17 v 3
Academic industry collaboration at kec dated 3.6.17 v 3
 
AI in Talent Acquisition
AI in Talent AcquisitionAI in Talent Acquisition
AI in Talent Acquisition
 
Big data in defence and national security malayasia
Big data in defence and national security   malayasiaBig data in defence and national security   malayasia
Big data in defence and national security malayasia
 
Cyber crimes in india Dr. Neeta Awasthy
Cyber crimes in india Dr. Neeta AwasthyCyber crimes in india Dr. Neeta Awasthy
Cyber crimes in india Dr. Neeta Awasthy
 
Ann a Algorithms notes
Ann a Algorithms notesAnn a Algorithms notes
Ann a Algorithms notes
 
Steepest descent method
Steepest descent methodSteepest descent method
Steepest descent method
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
 

Recently uploaded

ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
Jayaprasanna4
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
ViniHema
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
FluxPrime1
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
Pratik Pawar
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
Kamal Acharya
 
addressing modes in computer architecture
addressing modes  in computer architectureaddressing modes  in computer architecture
addressing modes in computer architecture
ShahidSultan24
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
obonagu
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
gerogepatton
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Democratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek AryaDemocratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek Arya
abh.arya
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
Divya Somashekar
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
Intella Parts
 
Courier management system project report.pdf
Courier management system project report.pdfCourier management system project report.pdf
Courier management system project report.pdf
Kamal Acharya
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
ankuprajapati0525
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
Kamal Acharya
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
R&R Consult
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Sreedhar Chowdam
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 

Recently uploaded (20)

ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
 
addressing modes in computer architecture
addressing modes  in computer architectureaddressing modes  in computer architecture
addressing modes in computer architecture
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
Democratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek AryaDemocratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek Arya
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
 
Courier management system project report.pdf
Courier management system project report.pdfCourier management system project report.pdf
Courier management system project report.pdf
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 

Artificial Neural Networks for NIU session 2016 17

  • 1. Artificial Neural Networks 14.2.2017 Prof. (Dr.) Neeta Awasthy Director, School of Engineering and Technology NOIDA INTERNATIONAL UNIVERSITY, Greater Noida
  • 2. Course Objective  To understand, successfully apply and evaluate Neural Network structures and paradigms for problems in Science, Engineering and Business.
  • 3. PreRequisites  It is expected that, the audience has a flair to understand algorithms and basic knowledge of Mathematics, Logic gates and Programming
  • 4. Outline  Introduction  How the human brain learns  Neuron Models  Different types of Neural Networks  Network Layers and Structure  Training a Neural Network  Application of ANN
  • 5. Introduction:  Soft Computing techniques such as Neural networks, genetic algorithms and fuzzy logic are among the most powerful tools available for detecting and describing subtle relationships in massive amounts of seemingly unrelated data.  Neural networks can learn and are actually taught instead of being programmed.  Teaching mode can be supervised or unsupervised  Neural Networks learn in the presence of noise
  • 7. Neuron and a sample of pulse train
  • 8. How does the brain work • Each neuron receives inputs from other neurons – Use spikes to communicate • The effect of each input line on the neuron is controlled by a synaptic weight – Positive or negative • Synaptic weight adapts so that the whole network learns to perform useful computations – Recognizing objects, understanding languages, making plans, controlling the body • There are 1011 neurons with 104 weights.
  • 9. How the Human Brain learns  In the human brain, a typical neuron collects signals from others through a host of fine structures called dendrites.  The neuron sends out spikes of electrical activity through a long, thin stand known as an axon, which splits into thousands of branches.  At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurons.
  • 10. Modularity and brain • Different bits of the cortex do different things • Local damage to the brain has specific effects • Early brain damage makes function relocate • Cortex gives rapid parallel computation plus flexibility • Conventional computers requires very fast central processors for long sequential computations
  • 11. Information flow in nervous system
  • 12. Fundamental concept • NN are constructed and implemented to model the human brain. • Performs various tasks such as pattern- matching, classification, optimization function, approximation, vector quantization and data clustering. • These tasks are difficult for traditional computers
  • 13. ANN • ANN posess a large number of processing elements called nodes/neurons which operate in parallel. • Neurons are connected with others by connection link. • Each link is associated with weights which contain information about the input signal. • Each neuron has an internal state of its own which is a function of the inputs that neuron receives- Activation level
  • 14. Comparison between brain verses computer Brain ANN Speed Few ms. Few nano sec. massive ||el processing Size and complexity 1011 neurons & 1015 interconnections Depends on designer Storage capacity Stores information in its interconnection or in synapse. No Loss of memory Contiguous memory locations loss of memory may happen sometimes. Tolerance Has fault tolerance No fault tolerance Inf gets disrupted when interconnections are disconnected Control mechanism Complicated involves chemicals in biological neuron Simpler in ANN
  • 15. Types of Problems ANN can handle  Mathematical Modeling (Function Approximation)  Classification  Clustering  Forecasting  Vector Quantization  Pattern Association  Control  Optimization
  • 16. A Neuron Model  When a neuron receives excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon. Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron on another changes.  We conduct these neural networks by first trying to deduce the essential features of neurons and their interconnections.  We then typically program a computer to simulate these features.
  • 17. A Simple Neuron  An artificial neuron is a device with many inputs and one output.  The neuron has two modes of operation;  the training mode and  the using mode.
  • 18. Important terminologies of ANNs • Weights • Bias • Threshold • Learning rate • Momentum factor • Vigilance parameter • Notations used in ANN
  • 19. Weights • Each neuron is connected to every other neuron by means of directed links • Links are associated with weights • Weights contain information about the input signal and is represented as a matrix • Weight matrix also called connection matrix
  • 21. Weights contd… • wij –is the weight from processing element ”i” (source node) to processing element “j” (destination node) X1 1 Xi Yj Xn w1j wij wnj bj
  • 22. Activation Functions • Used to calculate the output response of a neuron. • Sum of the weighted input signal is applied with an activation to obtain the response. • Activation functions can be linear or non linear • Already dealt – Identity function – Single/binary step function – Discrete/continuous sigmoidal function.
  • 23. Bias • Bias is like another weight. Its included by adding a component x0=1 to the input vector X. • X=(1,X1,X2…Xi,…Xn) • Bias is of two types – Positive bias: increase the net input – Negative bias: decrease the net input
  • 24. Why Bias is required? • The relationship between input and output given by the equation of straight line y=mx+c X YInput C(bias) y=mx+C
  • 25. Threshold • Set value based upon which the final output of the network may be calculated • Used in activation function • The activation function using threshold can be defined as
  • 26. Learning rate • Denoted by α. • Used to control the amount of weight adjustment at each step of training • Learning rate ranging from 0 to 1 determines the rate of learning in each time step
  • 27. Other terminologies • Momentum factor: – used for convergence when momentum factor is added to weight updation process. • Vigilance parameter: – Denoted by ρ – Used to control the degree of similarity required for patterns to be assigned to the same cluster
  • 28. The McCulloch-Pitts model Neurons work by processing information. They receive and provide information in form of spikes. Inputs Output w2 w1 w3 wn . . . x1 x2 x3 … xn-1 xn y
  • 29. McCulloch Pitts for NOT Model
  • 30. McCulloch Pits for And and or model
  • 32. Features of McCulloch-Pitts model • Allows binary 0,1 states only • Operates under a discrete-time assumption • Weights and the neurons’ thresholds are fixed in the model and no interaction among network neurons • Just a primitive model
  • 33. Properties for Mc Culloch and Pitts Model Input is 0 or 1 Weights are -1, 0 or +1 Threshold is an integer Output is 0 or 1 Output is 1 if multiplication of weight and input is more than the threshold else Outputs 0 Represent the NOT gate with the help of this model, using signal flow graph and flow Truth Table L=0 -1 x y x y 0 1 1 0 Input x w= -1 Start L=0
  • 34. Mc Culloch and Pitts Model……… OR Gate and AND Gate 35  OR Gatex y z 0 0 0 0 1 1 1 0 1 1 1 1 z y z x y L>=1 Start Stop wx,wy Input x,y
  • 35. OR gate v/s XOR gate
  • 37. Advantages and Disadvantages of McCulloch Pitt model • Advantages • Simplistic • Substantial computing power • Disadvantages – Weights and thresholds are fixed – Not very flexible
  • 38. Quiz • Which of the following tasks are neural networks good at? – Recognizing fragments of words in a pre- processed sound wave. – Recognizing badly written characters. – Storing lists of names and birth dates. – logical reasoning Neural networks are good at finding statistical regularities that allow them to recognize patterns. They are not good at flawlessly applying symbolic rules or storing exact numbers.
  • 40. Perceptron Learning rule • Learning signal is the difference between the desired and actual neuron’s response • Learning is supervised
  • 41. General symbol of neuron consisting of processing node and synaptic connections
  • 42. Neuron Modeling for ANN Is referred to activation function. Domain is set of activation values net. Scalar product of weight and input vector Neuron as a processing node performs the operation of summation of its weighted input.
  • 43. Sigmoid neurons • These give a real-valued output that is a smooth and bounded function of their total input. – Typically they use the logistic function – They have nice derivatives which make learning easy 0.5 0 0 1
  • 44. Activation function • Bipolar binary and unipolar binary are called as hard limiting activation functions used in discrete neuron model • Unipolar continuous and bipolar continuous are called soft limiting activation functions are called sigmoidal characteristics.
  • 47. Common models of neurons Binary perceptrons Continuous perceptrons
  • 50. Quiz • Suppose we have 3D input x=(0.5,-0.5) connected to a neuron with weights w=(2,-1) and bias b=0.5. furthermore the target for x is t=0. in this case we use a binary threshold neuron for the output so that y=1 if xTw+b>=0 and 0 otherwise What will be the weights and bias after 1 iteration of perceptron learning algorithm?  w= (1.5,-0.5) b=-1.5  w=(1.5,-0.5) b=-0.5  w=(2.5,-1.5) b=0.5  w=(-1.5,0.5) b=1.5
  • 51. Basic models of ANN • Interconnections • Learning rules • Activation function
  • 54. Summary of the simple networks  Single-layer nets have limited representational power (the linear-separability problem)  Error-driven learning seems a good way to train a net  Multi-layer nets (or nets with non-linear hidden units) may overcome the linear-inseparability problem; learning methods for such nets are needed  Threshold/step output functions hinder the effort to develop learning methods for multi-layered nets
  • 55. Training / Learning  Learning can take one of the following forms:  Supervised learning  Unsupervised learning  Reinforced learning  The classification of the patterns given to the classifier may be based on:  Parametric estimation  Non-parametric estimation
  • 56. Machine Learning in ANNs  Supervised Learning − It involves a teacher that is more knowledgeable than the ANN itself. For example, the teacher feeds in example data for which the teacher already knows the answers.
  • 57. Machine Learning in ANNs  Unsupervised Learning − It is required when there is no example data set with known answers, for example when searching for a hidden pattern. In this case, clustering, i.e. dividing a set of elements into groups according to some unknown pattern, is carried out on the existing data sets.
  • 58. Machine Learning in ANNs  Reinforcement Learning − This strategy is built on observation. The ANN makes a decision by observing its environment. If the observation is negative, the network adjusts its weights so it can make a different, required decision the next time.
  • 59. Unsupervised learning: why?  Collecting and labeling a large set of sample patterns can be costly.  Train with large amounts of unlabeled data, and only then use supervision to label the groupings found.  In dynamic systems, the samples can change slowly.  To find features that will then be useful for categorization, i.e. a form of data-dependent smart processing or smart feature extraction.  To perform exploratory data analysis: to find the structure of the data and to form proper classes for supervised analysis.
  • 60. Measure of dissimilarity:  Define a metric or distance function d on the vector space λ as a real-valued function on the Cartesian product λ × λ such that:  Positive definiteness: 0 ≤ d(x,y) < ∞ for x, y ∈ λ, and d(x,y) = 0 if and only if x = y  Symmetry: d(x,y) = d(y,x) for x, y ∈ λ  Triangle inequality: d(x,y) ≤ d(x,z) + d(z,y) for x, y, z ∈ λ  Translation invariance: d(x+z, y+z) = d(x,y)
  • 61. Error computation  Minkowski metric or Lk norm: d(x,y) = (Σi |xi − yi|^k)^(1/k)  Manhattan distance or L1 norm: d(x,y) = Σi |xi − yi|  Euclidean distance or L2 norm: d(x,y) = (Σi (xi − yi)²)^(1/2)  L∞ norm: d(x,y) = maxi |xi − yi|
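A minimal sketch of these distances (the L∞ value is the limit of the Lk norm as k grows; the vectors are illustrative):

    # Distance between two vectors under the Lk (Minkowski) family.
    def minkowski(x, y, k):
        return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1 / k)

    x, y = [1.0, 2.0, 3.0], [4.0, 0.0, 3.0]
    print(minkowski(x, y, 1))                      # Manhattan (L1): 5.0
    print(minkowski(x, y, 2))                      # Euclidean (L2): ~3.606
    print(max(abs(a - b) for a, b in zip(x, y)))   # L-infinity: 3.0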
  • 62. Neural Network Applications • Neural networks have performed successfully where other methods have not: predicting system behavior and recognizing and matching complicated, vague, or incomplete data patterns. • ANNs are applied to pattern recognition, interpretation, prediction, diagnosis, planning, monitoring, debugging, repair, instruction, and control:  Biomedical signal processing  Biometric identification  Pattern recognition  System reliability  Business  Target tracking
  • 63. Pattern recognition system [Block diagram: Input → Sensing → Segmentation → Feature extraction → Classification (missing features & context) → Post-processing (costs/errors) → Output (decision)]
  • 65. Feed-forward neural networks • These are the commonest type of neural network in practical applications. – The first layer is the input and the last layer is the output. – If there is more than one hidden layer, we call them “deep” neural networks. • They compute a series of transformations that change the similarities between cases. – The activities of the neurons in each layer are a non-linear function of the activities in the layer below. [Diagram: input units → hidden units → output units]
  • 66. Feedforward network • Its output and input vectors are o = [o1, o2, …, om]ᵀ and x = [x1, x2, …, xn]ᵀ respectively • Weight wij connects the i-th neuron with the j-th input • The activation rule of the i-th neuron is oi = f(neti), where neti = Σj wij xj
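A minimal sketch of this activation rule for one layer (the bipolar hard limiter stands in for f; the weight values are illustrative):

    # One feedforward layer: o_i = f(sum_j w_ij * x_j).
    def layer(W, x, f):
        return [f(sum(wij * xj for wij, xj in zip(row, x))) for row in W]

    step = lambda net: 1 if net >= 0 else -1  # bipolar hard-limiting f
    W = [[1.0, -2.0],   # weights of neuron 1
         [0.5,  0.5]]   # weights of neuron 2
    print(layer(W, [1.0, 1.0], step))  # [-1, 1]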
  • 67. Multilayer feed forward network Can be used to solve complicated problems
  • 68. Feedback network • When outputs are directed back as inputs to nodes of the same or a preceding layer, the result is a feedback network
  • 69. Lateral feedback • If the output of a processing element is directed back as input to processing elements in the same layer, it is called lateral feedback
  • 70. Recurrent networks • These have directed cycles in their connection graph. – That means you can sometimes get back to where you started by following the arrows. • They can have complicated dynamics and this can make them very difficult to train. – There is a lot of interest at present in finding efficient ways of training recurrent nets. • They are more biologically realistic. • Recurrent nets with multiple hidden layers are just a special case that has some of the hidden→hidden connections missing.
  • 71. Recurrent neural networks for modeling sequences • Recurrent neural networks are a very natural way to model sequential data: – They are equivalent to very deep nets with one hidden layer per time slice. – Except that they use the same weights at every time slice and they get input at every time slice. • They have the ability to remember information in their hidden state for a long time. – But it’s very hard to train them to use this potential. [Diagram: the net unrolled in time — one hidden-state column per time slice, with an input and an output at every slice]
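A minimal sketch of the “same weights at every time slice” idea, using a single recurrent unit (w_hh and w_xh are illustrative values, not from the slides):

    import math

    # One unrolled time slice: the same weights (w_hh, w_xh) are reused.
    def rnn_step(h, x, w_hh=0.5, w_xh=1.0):
        return math.tanh(w_hh * h + w_xh * x)  # new hidden state

    h = 0.0
    for x in [1.0, 0.0, 1.0]:  # a short input sequence
        h = rnn_step(h, x)
    print(h)  # the final hidden state depends on the whole sequence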
  • 72. An example of what recurrent neural nets can now do (to whet your interest!) • Ilya Sutskever (2011) trained a special type of recurrent neural net to predict the next character in a sequence. • After training for a long time on a string of half a billion characters from English Wikipedia, he got it to generate new text. – It generates by predicting the probability distribution for the next character and then sampling a character from this distribution.
  • 73. Symmetrically connected networks • These are like recurrent networks, but the connections between units are symmetrical (they have the same weight in both directions). – John Hopfield (and others) realized that symmetric networks are much easier to analyze than recurrent networks. – They are also more restricted in what they can do, because they obey an energy function. • For example, they cannot model cycles. • Symmetrically connected nets without hidden units are called “Hopfield nets”.
  • 74. Symmetrically connected networks with hidden units • These are called “Boltzmann machines”. – They are much more powerful models than Hopfield nets. – They are less powerful than recurrent neural networks. – They have a beautifully simple learning algorithm.
  • 75. Basic models of ANN • Interconnections • Learning rules • Activation function
  • 76. Learning • A process by which a NN adapts itself to a stimulus by making proper parameter adjustments, resulting in the production of the desired response • Two kinds of learning: – Parameter learning: connection weights are updated – Structure learning: the network structure changes
  • 77. Training • The process of modifying the weights in the connections between network layers with the objective of achieving the expected output is called training a network. • This is achieved through – Supervised learning – Unsupervised learning – Reinforcement learning
  • 78. Classification of learning • Supervised learning:- – Learn to predict an output when given an input vector. • Unsupervised learning – Discover a good internal representation of the input. • Reinforcement learning – Learn to select an action to maximize payoff.
  • 79. Supervised learning • A child learns from a teacher • Each input vector requires a corresponding target vector • Training pair = [input vector, target vector] [Block diagram: input X → Neural Network (weights W) → actual output Y; an error-signal generator compares Y with the desired output D and feeds error (D − Y) signals back to adjust W]
  • 80. Two types of supervised learning • Each training case consists of an input vector x and a target output t. • Regression: the target output is a real number or a whole vector of real numbers. – The price of a stock in 6 months’ time. – The temperature at noon tomorrow. • Classification: the target output is a class label. – The simplest case is a choice between 1 and 0. – We can also have multiple alternative labels.
  • 81. Unsupervised Learning • How a fish or tadpole learns • All similar input patterns are grouped together as clusters. • If a matching input pattern is not found a new cluster is formed • One major aim is to create an internal representation of the input that is useful for subsequent supervised or reinforcement learning. • It provides a compact, low-dimensional representation of the input.
  • 82. Self-organizing • In unsupervised learning there is no feedback • The network must discover the patterns, regularities, and features of the input data on its own • In doing so, the network changes its parameters • This process is called self-organization
  • 84. When is reinforcement learning used? • When only limited information about the target output values (critic information) is available • Learning based on this critic information is called reinforcement learning, and the feedback sent is called the reinforcement signal • Feedback in this case is only evaluative, not instructive
  • 85. Basic models of ANN • Interconnections • Learning rules • Activation function
  • 86. Activation function 1. Identity function: f(x) = x for all x 2. Binary step function 3. Bipolar step function 4. Sigmoidal functions: continuous functions 5. Ramp functions
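A minimal sketch of these five functions (λ is a steepness parameter for the sigmoids, and the unit ramp bounds are illustrative assumptions):

    import math

    identity = lambda x: x
    binary_step = lambda x, theta=0.0: 1 if x >= theta else 0
    bipolar_step = lambda x, theta=0.0: 1 if x >= theta else -1
    sigmoid = lambda x, lam=1.0: 1 / (1 + math.exp(-lam * x))           # unipolar continuous
    bipolar_sigmoid = lambda x, lam=1.0: 2 / (1 + math.exp(-lam * x)) - 1
    ramp = lambda x: max(0.0, min(1.0, x))  # linear on [0, 1], clipped outside

    print(sigmoid(0.0), bipolar_sigmoid(0.0), ramp(2.5))  # 0.5 0.0 1.0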
  • 87. Some learning algorithms we will learn are • Supervised: – Adaline, Madaline – Perceptron – Back-propagation – Multilayer perceptrons – Radial basis function networks • Unsupervised: – Competitive learning – Kohonen self-organizing map – Learning vector quantization – Hebbian learning
  • 88. Neural processing • Recall: the processing phase of a NN; its objective is to retrieve information, i.e. the process of computing o for a given x • Basic forms of neural information processing: – Autoassociation – Heteroassociation – Classification
  • 89. Neural processing: autoassociation • A set of patterns can be stored in the network • If a pattern similar to a member of the stored set is presented, an association with the closest stored pattern is made
  • 90. Neural Processing- Heteroassociation • Associations between pairs of patterns are stored • Distorted input pattern may cause correct heteroassociation at the output
  • 91. Neural processing-Classification • Set of input patterns is divided into a number of classes or categories • In response to an input pattern from the set, the classifier is supposed to recall the information regarding class membership of the input pattern.
  • 92. Neural network learning rules (c – the learning constant)
  • 93. Hebbian learning rule • The learning signal is equal to the neuron’s output • Feedforward, unsupervised learning
  • 94. Features of Hebbian learning • Feedforward unsupervised learning • “When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.” • If oi·xj is positive, the result is an increase in the weight; otherwise the weight decreases
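A minimal sketch of one Hebbian step, Δwj = c · oi · xj, with the bipolar sign function standing in for the neuron’s activation (values are illustrative):

    # Feedforward unsupervised Hebbian update: dw_j = c * o * x_j.
    def hebb_update(w, x, f, c=0.1):
        o = f(sum(wj * xj for wj, xj in zip(w, x)))  # learning signal = output
        return [wj + c * o * xj for wj, xj in zip(w, x)]

    sign = lambda net: 1 if net >= 0 else -1
    print(hebb_update([1.0, -1.0], [1.0, 0.5], sign))  # [1.1, -0.95]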
  • 96. Delta learning rule • Valid only for continuous activation functions • Used in the supervised training mode • The learning signal for this rule is called delta • The aim of the delta rule is to minimize the error over all training patterns
  • 97. Delta learning rule contd. • The learning rule is derived from the condition of least squared error, E = ½ (d − o)² • Calculating the gradient vector of E with respect to wi, minimization of the error requires the weight changes to be in the negative gradient direction: Δwi = c (d − o) f′(net) xi
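A minimal sketch of that update for a single logistic neuron, using the nice derivative f′(net) = o(1 − o) (learning constant and data are illustrative):

    import math

    # Delta rule: dw_i = c * (d - o) * f'(net) * x_i, logistic f.
    def delta_update(w, x, d, c=0.5):
        net = sum(wi * xi for wi, xi in zip(w, x))
        o = 1 / (1 + math.exp(-net))
        grad = (d - o) * o * (1 - o)   # -dE/dnet for E = 0.5 * (d - o)**2
        return [wi + c * grad * xi for wi, xi in zip(w, x)]

    print(delta_update([0.0, 0.0], [1.0, 1.0], d=1.0))  # [0.0625, 0.0625]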
  • 98. Widrow-Hoff learning rule • Also called the least-mean-square (LMS) learning rule • Introduced by Widrow (1962), used in supervised learning • Independent of the activation function • A special case of the delta learning rule in which the activation function is the identity, i.e. f(net) = net • Minimizes the squared error between the desired output value di and neti
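Since f(net) = net here (so f′ = 1), the delta-rule sketch above collapses to the LMS update Δwj = c (d − net) xj:

    # Widrow-Hoff (LMS) update: dw_j = c * (d - net) * x_j.
    def lms_update(w, x, d, c=0.1):
        net = sum(wj * xj for wj, xj in zip(w, x))
        return [wj + c * (d - net) * xj for wj, xj in zip(w, x)]

    print(lms_update([0.5, 0.5], [1.0, -1.0], d=1.0))  # [0.6, 0.4]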
  • 100. Winner-take-all learning rule contd. • Explained for a layer of neurons • An example of competitive learning, used for unsupervised network training • Learning is based on the premise that one of the neurons in the layer has the maximum response to the input x • This neuron is declared the winner, and only its weight vector is updated
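A minimal sketch of one winner-take-all step, assuming the common competitive update that moves only the winner’s weight vector toward the input:

    # Winner-take-all: the neuron with the maximal response w_m . x wins,
    # and only the winner's weights move toward the input.
    def wta_update(W, x, c=0.2):
        responses = [sum(wj * xj for wj, xj in zip(row, x)) for row in W]
        m = responses.index(max(responses))                 # winning neuron
        W[m] = [wj + c * (xj - wj) for wj, xj in zip(W[m], x)]
        return m, W

    winner, W = wta_update([[1.0, 0.0], [0.0, 1.0]], [0.9, 0.1])
    print(winner, W)  # 0 [[0.98, 0.02], [0.0, 1.0]]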
  • 103. Linear separability • Separation of the input space into regions is based on whether the network response is positive or negative • The line of separation is called the linearly separable line • Examples: – The AND and OR functions are linearly separable – The XOR function is linearly inseparable (see the sketch below)
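A small brute-force check of the inseparability claim (only illustrative: it scans a grid of integer weights, but XOR in fact has no real-valued solution at all):

    import itertools

    # Try every (w1, w2, b) on a small integer grid against XOR's truth table.
    found = False
    for w1, w2, b in itertools.product([-2, -1, 0, 1, 2], repeat=3):
        out = [1 if w1 * x + w2 * y + b >= 0 else 0
               for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]]
        found |= (out == [0, 1, 1, 0])
    print(found)  # False: no single threshold unit computes XOR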
  • 104. Hebb network • The Hebb learning rule is the simplest one • Learning in the brain is performed by changes in the synaptic gap • When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process takes place in one or both cells • According to the Hebb rule, the weight vector increases in proportion to the product of the input and the learning signal
  • 105. Flowchart of the Hebb training algorithm: Start → Initialize weights → For each training pair s:t → Activate input xi = si → Activate output y = t → Weight update wi(new) = wi(old) + xi·y → Bias update b(new) = b(old) + y → repeat for the next pair, else Stop
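A minimal sketch of that flowchart as code, shown learning the AND function with bipolar inputs and targets (a standard worked example, assumed here for illustration):

    # Hebb training: for each pair (s, t), set x = s, y = t, then
    # w_i(new) = w_i(old) + x_i * y and b(new) = b(old) + y.
    def hebb_train(pairs, n):
        w, b = [0.0] * n, 0.0
        for s, t in pairs:
            w = [wi + xi * t for wi, xi in zip(w, s)]
            b = b + t
        return w, b

    pairs = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
    print(hebb_train(pairs, 2))  # ([2.0, 2.0], -2.0)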