INTRODUCTION TO
ARTIFICIAL NEURAL NETWORKS
(ANN)
Outline
• Definition: why and how neural networks are used to solve problems
• The human biological neuron
• The artificial neuron
• Applications of ANN
The idea of ANNs..?
NNs learn the relationship between cause and effect, or organize large
volumes of data into orderly and informative patterns.
(Figure: an image-classification example with candidate labels frog, lion,
and bird. "What is that?" "It's a frog.")
Neural networks to the rescue…
• Neural network: an information-processing paradigm inspired by
biological nervous systems, such as our brain
• Structure: a large number of highly interconnected processing elements
(neurons) working together
• Like people, they learn from experience (by example)
Definition of ANN
“Data processing system consisting of a large number
of simple, highly interconnected processing elements
(artificial neurons) in an architecture inspired by the
structure of the cerebral cortex of the brain”
(Tsoukalas & Uhrig, 1997).
Inspiration from Neurobiology
Human Biological Neuron
The human nervous system contains roughly 86 billion neurons, connected by
approximately 10¹⁴ to 10¹⁵ synapses.
Biological Neural Networks
A biological neuron has three main components: dendrites, the soma (or
cell body), and the axon.
Dendrites receive signals from other neurons.
The soma sums the incoming signals.
When sufficient input is received, the cell fires; that is, it transmits a
signal over its axon to other cells.
Artificial Neurons
An ANN is an information-processing system that has certain performance
characteristics in common with biological neural nets.
Several key features of the processing elements of an ANN are suggested by
the properties of biological neurons:
1. The processing element receives many signals.
2. Signals may be modified by a weight at the receiving synapse.
3. The processing element sums the weighted inputs.
4. Under appropriate circumstances (sufficient input), the neuron
transmits a single output.
5. The output from a particular neuron may go to many other neurons.
Artificial Neurons
(Figure: a physical neuron alongside an artificial neuron.)
• From experience: examples / training data
• The strength of the connection between two neurons is stored as a weight
value for that specific connection.
• Learning the solution to a problem = changing the connection weights
Model Of A Neuron
(Figure: input units X1, X2, X3 (dendrites) feed through connection
weights Wa, Wb, Wc (synapses) into a summing function Σ (soma); the sum
passes through the computation f() to produce the output Y (axon).)
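As a minimal sketch of the figure above (not from the slides; the variable
names mirror the diagram), the neuron's computation fits in a few lines of
Python:

```python
import numpy as np

def neuron(x, w, f):
    """One artificial neuron: weighted sum of the inputs, then activation f."""
    return f(np.dot(w, x))

# Mirroring the figure: inputs X1..X3, weights Wa..Wc, and a step activation
# that "fires" when the summed input is sufficient.
x = np.array([0.5, -1.0, 2.0])       # input signals (dendrites)
w = np.array([0.4, 0.6, 0.9])        # connection weights (synapses)
step = lambda s: 1 if s >= 0 else 0  # threshold computation f()
print(neuron(x, w, step))            # summed input 1.4 -> output Y = 1
```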
• A neural net consists of a large number of simple processing elements
called neurons, units, cells or nodes.
• Each neuron is connected to other neurons by means of directed
communication links, each with an associated weight.
• The weights represent the information being used by the net to solve a
problem.
• Each neuron has an internal state, called its activation or activity
level, which is a function of the inputs it has received. Typically, a
neuron sends its activation as a signal to several other neurons.
Model Of A Neuron
• It is important to note that a neuron can send only one signal at a
time, although that signal is broadcast to several other neurons.
• Neural networks are configured for a specific application, such as
pattern recognition or data classification, through a learning process.
• In a biological system, learning involves adjustments to the synaptic
connections between neurons → the same holds for artificial neural
networks (ANNs)
Perceptron
• A perceptron is a single-layer neural network; a multi-layer perceptron
is called a neural network.
• The perceptron is a linear (binary) classifier.
• It is used in supervised learning.
• It helps to classify the given input data.
The perceptron
The perceptron consists of 4 parts:
1. Input values (one input layer)
2. Weights and bias
3. Net sum
4. Activation function
• Input nodes (input layer): no computation is done within this layer; the
nodes just pass the information on to the next layer (a hidden layer most
of the time). A block of nodes is also called a layer.
• Hidden nodes (hidden layer): hidden layers are where intermediate
processing or computation is done; they perform computations and then
transfer the weights (signals or information) from the input layer to the
following layer (another hidden layer or the output layer).
Neural Network Architecture
• Activation function: the activation function of a node defines the
output of that node given an input or set of inputs. A standard computer
chip circuit can be seen as a digital network of activation functions that
can be “ON” (1) or “OFF” (0), depending on the input.
• Learning rule: the learning rule is a rule or an algorithm which
modifies the parameters of the neural network, in order for a given input
to the network to produce a desired output.
• This learning process typically amounts to modifying the weights and
thresholds, as in the sketch below.
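One classic instance of such a learning rule is the perceptron learning
rule. A minimal sketch, assuming a unit-step activation and 0/1 targets;
the toy AND dataset is illustrative only:

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=10):
    """Perceptron learning rule: shift the weights and threshold (bias)
    toward whatever makes the prediction match the target."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if np.dot(w, xi) + b >= 0 else 0
            w += lr * (target - pred) * xi  # modify the weights
            b += lr * (target - pred)       # modify the threshold
    return w, b

# Toy example: learn the logical AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print(w, b)  # weights that separate AND's positive case from the rest
```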
How does it work?
• The perceptron works in these simple steps:
a. All the inputs x are multiplied by their weights w. Call each product k.
b. Add all the multiplied values; the result is called the weighted sum.
c. Apply an activation function to the weighted sum, for example the unit
step activation function. A sketch of the three steps follows.
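The three steps map directly onto code. A small sketch, assuming three
inputs, an added bias, and the unit step function:

```python
import numpy as np

x = np.array([1.0, 0.5, -0.5])  # inputs
w = np.array([0.8, -0.2, 0.4])  # weights
b = 0.1                         # bias

k = w * x                               # step a: multiply inputs by weights
weighted_sum = k.sum() + b              # step b: add everything up
output = 1 if weighted_sum >= 0 else 0  # step c: unit step activation
print(weighted_sum, output)             # 0.6 -> the neuron fires: 1
```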
Why do we need Weights and Bias?
• Weights show the strength of a particular node.
• A bias value allows you to shift the activation function curve up or
down.
• Biases are added to the hidden units to help influence the final output;
they are not added to the input units.
• Like weights, biases are also adjusted, by propagating errors backwards
through the network, to produce the most accurate end result.
• When a bias is added, even if the previous unit outputs zero, the bias
will activate a signal and push the data forward, as the sketch below
shows.
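A tiny sketch of that last point, using a sigmoid activation: with a zero
input, the unbiased neuron is stuck at the same output no matter what its
weight is, while a bias shifts the curve:

```python
import numpy as np

sigmoid = lambda s: 1 / (1 + np.exp(-s))
x, w = 0.0, 2.0                # the previous unit outputs zero

print(sigmoid(w * x))          # no bias: stuck at 0.5 whenever x = 0
print(sigmoid(w * x + 3.0))    # bias b = +3 shifts the curve: output ~0.95
print(sigmoid(w * x - 3.0))    # bias b = -3 shifts it the other way: ~0.05
```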
Activation Functions
• Activation functions can be divided into two basic types:
• Linear activation functions
• Non-linear activation functions
Linear Activation Function
• Equation: f(x) = x
• Range: (-infinity, infinity)
• It does not help with the complexity or various parameters of the usual
data that is fed to neural networks.
• The output of the function is not confined to any range.
Non-linear Activation Function
• Nonlinear activation functions are the most used activation functions.
Nonlinearity gives the function a curved graph rather than a straight
line.
Non-linear Activation Function
• It makes it easy for the model to generalize or adapt to a variety of
data and to differentiate between outputs.
• The main terminology needed to understand nonlinear functions:
• Derivative or differential: the change in y with respect to the change
in x; also known as the slope.
• Monotonic function: a function which is either entirely non-increasing
or entirely non-decreasing.
1. Sigmoid or Logistic Activation Function
• The sigmoid function curve looks like an S-shape.
• The sigmoid function is popular because its output lies between 0 and 1.
• Therefore, it is especially used for models where we have to predict a
probability as the output.
• Since the probability of anything exists only in the range 0 to 1,
sigmoid is the right choice.
• The function is differentiable; that means we can find the slope of the
sigmoid curve at any point.
• The function is monotonic, but its derivative is not.
The softmax function is a more general logistic activation function which
is used for multiclass classification, as sketched below.
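A short sketch of both functions in NumPy (subtracting the max inside
softmax is a standard numerical-stability trick, not from the slides):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Generalization for multiclass: the outputs sum to 1."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

print(sigmoid(np.array([-4.0, 0.0, 4.0])))  # ~[0.018, 0.5, 0.982]
print(softmax(np.array([2.0, 1.0, 0.1])))   # class probabilities, sum = 1
```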
Tanh or Hyperbolic Tangent Activation Function
• tanh is similar to the logistic sigmoid, but often works better. The
range of the tanh function is (-1, 1), and tanh is also sigmoidal
(S-shaped).
• The advantage is that negative inputs are mapped strongly negative and
zero inputs are mapped near zero on the tanh graph.
• The function is differentiable.
• The function is monotonic, while its derivative is not monotonic.
• The tanh function is mainly used for classification between two classes.
Both tanh and logistic sigmoid activation functions are used in
feed-forward nets; the sketch below compares them at a few points.
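A quick comparison of the two (a sketch, using NumPy's built-in tanh):

```python
import numpy as np

z = np.array([-2.0, -0.1, 0.0, 0.1, 2.0])
sigmoid = lambda s: 1 / (1 + np.exp(-s))

print(np.tanh(z))   # ~[-0.96, -0.10, 0.00, 0.10, 0.96]: zero maps to zero,
                    # negative inputs map strongly negative
print(sigmoid(z))   # ~[ 0.12,  0.48, 0.50, 0.52, 0.88]: zero maps to 0.5
```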
ReLU (Rectified Linear Unit) Activation Function
• ReLU is the most used activation function in the world right now, since
it is used in almost all convolutional neural networks and deep learning
models.
• ReLU is half-rectified (from the bottom): f(z) is zero when z is less
than zero, and f(z) equals z when z is greater than or equal to zero,
i.e. f(z) = max(0, z).
• Range: [0, infinity)
• The function and its derivative are both monotonic.
ReLU (Rectified Linear Unit) Activation Function
• The issue with ReLU is that all negative values become zero immediately,
which decreases the ability of the model to fit or train on the data
properly.
• Any negative input given to the ReLU activation function turns into zero
immediately, which in turn affects the resulting mapping by not
representing the negative values appropriately.
• Leaky ReLU: an attempt to solve the dying-ReLU problem. For negative
inputs it returns a·z instead of zero, where the value of a is small
(around 0.01).
• The range of Leaky ReLU is (-infinity, infinity). A sketch of both
variants follows.
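A sketch of the two variants side by side (the slope a = 0.01 follows the
slide):

```python
import numpy as np

def relu(z):
    """f(z) = max(0, z): zero below zero, identity above."""
    return np.maximum(0.0, z)

def leaky_relu(z, a=0.01):
    """Leaky variant: a small slope a for negative inputs, so negative
    values are no longer flattened to exactly zero."""
    return np.where(z > 0, z, a * z)

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(z))        # [ 0.     0.     0.    0.5   3.  ]
print(leaky_relu(z))  # [-0.03  -0.005  0.    0.5   3.  ]
```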
Characterization
• Architecture
  • a pattern of connections between neurons
  • Single-layer feedforward
  • Multilayer feedforward
  • Recurrent
• Strategy / Learning Algorithm
  • a method of determining the connection weights
  • Supervised
  • Unsupervised
  • Reinforcement
• Activation Function
  • a function to compute the output signal from the input signal
Feedforward Neural Network
• A feedforward neural network is an artificial neural network where
connections between the units do not form a cycle.
• In this network, information moves in only one direction, forward: from
the input nodes, through the hidden nodes (if any), to the output nodes.
• There are no cycles or loops in the network.
• Two types of feedforward neural networks:
  • Single-layer perceptron / feedforward
  • Multi-layer perceptron (MLP) / feedforward
Single Layer Feedforward NN
• This is the simplest feedforward neural network; it does not contain any
hidden layer, which means it consists only of a single layer of output
nodes.
• It is called single-layer because, when we count the layers, we do not
include the input layer. The reason is that no computation is done at the
input layer: the inputs are fed directly to the outputs via a series of
weights.
Multilayer Neural Network
• This class of networks consists of multiple layers of computational
units, usually interconnected in a feed-forward way.
• Each neuron in one layer has directed connections to the neurons of the
subsequent layer.
• In many applications the units of these networks apply a sigmoid
function as an activation function.
• MLPs are much more useful than single-layer networks; one good reason is
that they are able to learn non-linear representations (in most cases the
data presented to us is not linearly separable). A forward-pass sketch
follows.
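A minimal forward-pass sketch of such a network (the 3-4-2 layer sizes and
random weights are illustrative assumptions):

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Feed the input through each layer in turn: affine map, then sigmoid."""
    sigmoid = lambda z: 1 / (1 + np.exp(-z))
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)  # this layer's activations feed the next
    return a

rng = np.random.default_rng(0)
# A 3-4-2 network: 3 inputs, a hidden layer of 4 units, 2 outputs.
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases = [np.zeros(4), np.zeros(2)]
print(mlp_forward(np.array([0.5, -1.0, 2.0]), weights, biases))
```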
Convolutional Neural Network (CNN)
• Convolutional neural networks are very similar to ordinary neural
networks: they are made up of neurons that have learnable weights and
biases.
• In a convolutional neural network (CNN or ConvNet, also called a
shift-invariant or space-invariant network), the unit connectivity
pattern is inspired by the organization of the visual cortex: units
respond to stimuli in a restricted region of space known as the receptive
field.
• Receptive fields partially overlap, together covering the entire visual
field.
Convolutional Neural Network (CNN)
• A unit's response can be approximated mathematically by a convolution
operation, as sketched below.
• CNNs are variations of multilayer perceptrons that use minimal
preprocessing.
• They are widely applied in image and video recognition, recommender
systems and natural language processing.
• CNNs require large amounts of data to train on.
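A naive sketch of that operation (strictly speaking this computes
cross-correlation, as most deep-learning libraries do; the 5x5 image and
the [1, -1] kernel are toy assumptions):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution: each output unit sees only a small
    receptive field of the input, and the same kernel weights are
    reused everywhere (shift invariance)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image"
kernel = np.array([[1.0, -1.0]])  # responds to horizontal changes
print(conv2d(image, kernel))      # a 5x4 map of local responses
```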
Recurrent Neural Networks
• In a recurrent neural network (RNN), connections between units form a
directed cycle: they propagate data forward, but also backwards, from
later processing stages to earlier stages.
• This allows the network to exhibit dynamic temporal behavior.
• Unlike feedforward neural networks, RNNs can use their internal memory
to process arbitrary sequences of inputs, as in the sketch below.
• This makes them applicable to tasks such as unsegmented connected
handwriting recognition, speech recognition and other sequence-processing
tasks.
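A sketch of one recurrent step and its use over a sequence (the sizes and
random weights are illustrative assumptions):

```python
import numpy as np

def rnn_step(x_t, h_prev, Wx, Wh, b):
    """The new hidden state depends on the current input AND the previous
    state: this carried-over state is the network's internal memory."""
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

rng = np.random.default_rng(0)
Wx = rng.normal(size=(4, 3))  # input-to-hidden weights
Wh = rng.normal(size=(4, 4))  # hidden-to-hidden (recurrent) weights
b = np.zeros(4)

h = np.zeros(4)                      # initial hidden state
for x_t in rng.normal(size=(5, 3)):  # an arbitrary-length input sequence
    h = rnn_step(x_t, h, Wx, Wh, b)  # state propagates forward in time
print(h)                             # summary of the whole sequence
```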