Artificial Intelligence
Neural Networks
OUTLINE
• Biological neuron structure and function
• The structure of an Artificial Neuron (perceptron)
• What is an Artificial Neural Network?
• How do artificial neural networks work?
• Model of ANN
• Application of Artificial Neural Networks
Biological neuron structure and function
• The human ability to perceive our surroundings – to see, hear, and smell what’s around us – depends on the nervous system.
• The capacity to know where we are depends on the nervous system.
• The ability to act on that information also depends on the nervous system.
• All of these processes depend on the interconnected cells (neurons) that make up
your nervous system.
• The basic functions of a neuron:
• Receive signals (or information).
• Integrate incoming signals (to determine whether or not the information should be passed
along).
• Communicate signals to target cells (other neurons or muscles or glands).
Biological neuron structure and function
Neuron Anatomy
• Neurons vary in size, shape, and
structure depending on their role
and location.
• The Soma is a neuron cell body
which converts input activation
into output activation.
• Axons are transmission lines that
send activation to other neurons.
• Dendrites are the receive zones that
receive activation from other
neurons.
• A synapse allows signal transmission between an axon and a dendrite.
Biological neuron structure and function
Neuron Anatomy
• All neurons have three essential
parts:
• Dendrites: Input
• Soma (Cell body): Processor
• Axon: Output
• All neurons are connected to one another by synapses: Links.
Biological neuron structure and function
Neuron Anatomy
• A neuron is connected to other neurons through about 10,000 synapses (links).
• A neuron receives input from other neurons. Inputs are combined.
• Once the combined input exceeds a critical level, the neuron discharges a spike – an electrical pulse that travels from the cell body, down the axon (output), to the next neuron(s).
• The axon endings almost touch the dendrites (input) or cell body (processor) of the next neuron.
• Transmission of an electrical signal from one neuron to the next is mediated by neurotransmitters.
• Neurotransmitters are chemicals released from the first neuron that bind to the second.
• The strength of the signal that reaches the next neuron depends on factors such as the amount of
neurotransmitter available.
The structure of an Artificial Neuron (perceptron)
• Inspired by the biological neuron, the artificial neuron was created.
• The structure of an artificial neuron (perceptron) is as shown:
[Figure: structure of a perceptron – inputs X1, X2, X3 (dendrites) with connection weights Wa, Wb, Wc (synapses) and a bias b feed a summing function Σ (soma); its result passes through an activation function f(·) to produce the output Y (axon).]
The structure of an Artificial Neuron (perceptron)
• Similar to the biological neuron, the function of the perceptron is the same:
1. Receives inputs from other sources.
2. Combine them in some way.
3. Performs a generally nonlinear operation on the result.
4. Outputs the final result.
• Hence, the perceptron is a mathematical function modelled on the working of biological neurons.
• Perceptron is a linear classifier (binary).
• One or more inputs are separately weighted.
• Inputs are summed and passed through a nonlinear function to produce output.
• Every perceptron holds an internal state called an activation signal.
• Every perceptron is connected to another neuron via a connection link.
• Each connection link carries information about the input signal.
• It is important to note that a perceptron can send only one signal at a time.
The structure of a perceptron
• The inputs of perceptron reflect the real world data.
𝑥 = (𝑥1, … , 𝑥𝑛)
• The synaptic weights 𝑤 = (𝑤1, … , 𝑤𝑛) form a vector whose size equals the number of inputs.
• The bias 𝑏 is a constant real value or vector.
• The summing (combination) function, 𝜇(𝑥):
• takes the input vector 𝑥 to produce a combined value.
• The combination is computed as the bias plus a linear combination of the synaptic weights and the inputs in the perceptron:
$\mu = b + \sum_{i=1}^{n} w_i x_i$
• The bias increases or reduces the net input to the activation function, depending on whether it is positive or negative.
• The activation function 𝑓(⋅) defines the output from the perceptron in terms of its combination.
• The output 𝑦 is represented in terms of the composition of the combination and the activation
functions.
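A minimal Python sketch of this forward pass (the function name, sample values, and the step activation are illustrative; NumPy is assumed):

```python
import numpy as np

def perceptron_output(x, w, b):
    """Forward pass: combination mu = b + sum_i w_i * x_i, then an activation."""
    mu = b + np.dot(w, x)        # summing (combination) function
    return 1 if mu >= 0 else 0   # activation function (binary step, for illustration)

x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.4, 0.1, -0.2])   # synaptic weights
print(perceptron_output(x, w, b=0.1))  # -> 0
```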
The structure of a perceptron Cont.
• Why do we need Weights?
• To answer this question:
• Let us consider that there are no weights, as shown in the figure:
• The summation function 𝜇 sums up all the inputs and adds bias to it.
• Let us say, the role of the activation function is to allocate the data points to one of the classes.
• Compare the model expression with the line equation, as shown in Eq. 1 and Eq. 2:
$y = mx + c$ (1)
$x_2 = -x_1 + b$ (2)
• The slope (𝑚) of the equation $x_2 = -x_1 + b$ is fixed at −1.
• The combination and output of the unweighted model are:
$\mu = b + \sum_{i=1}^{2} x_i = x_1 + x_2 + b$
$y = \begin{cases} 1 & \text{if } \mu(x) \ge 0 \\ 0 & \text{otherwise} \end{cases}$
The structure of a perceptron Cont.
• Why do we need Weights?
• Example: Consider the following dataset for illustration:
• It contains two independent features [𝑥₁ 𝑎𝑛𝑑 𝑥₂] and one dependent
feature y.
• Our task is to classify a given data point to one of the classes that
belong to feature 𝑦.
• From the data, if we try to fit the line equation ($x_2 = -x_1 + b$) for different values of 𝑏, we get the following plot:
The structure of a perceptron Cont.
• Why do we need Weights?
• Orange line is with 𝑏 = 0.
• Blue line is with 𝑏 = 1.
• Green line is with 𝑏 = 2.
𝑥₂ = −𝑥₁ + 𝑏
• Changing the value of 𝑏, we end up with parallel lines.
• No change in the orientation or slope of the line.
• So, we require weights to change the orientation of
the line.
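A tiny numeric sketch of this point: whatever value 𝑏 takes, the boundary of the unweighted model keeps slope −1 (values are illustrative):

```python
# Decision boundary of the unweighted model: x2 = -x1 + b.
for b in (0, 1, 2):
    x2 = lambda x1, b=b: -x1 + b
    slope = x2(1.0) - x2(0.0)   # rise over a run of 1
    print(f"b={b}: intercept={x2(0.0)}, slope={slope}")  # slope is always -1.0
```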
The structure of a perceptron Cont.
• What do the weights in a Neuron convey to us?
• Importance of the feature: features whose weights are close to zero are said to have less importance in the prediction process than features whose weights are larger.
• Tells the relationship between input and output:
• Example: consider the inputs of a perceptron representing a car purchase: people often tend to buy a car within their budget, and the most popular one among many.
• Hence, if the weight is positive, there is a direct relationship between that feature and the target value, and an inverse relationship if the weight is negative.
The structure of a perceptron Cont.
• Why do we need Bias?
• A bias value allows us to shift the activation function curve up or down.
• It can increase classification model accuracy.
• It serves as another model parameter to increase
the model performance on training data.
The structure of a perceptron Cont.
• Activation Functions:
• The activation function, also known as the transfer function, is the nonlinear function applied to the inner product $X^T W$ in an artificial neural network.
• Properties of activation function
1. Nonlinear: There are two reasons why an activation function should be
nonlinear:
• The boundaries or patterns in real-world problems are mostly non-linear and a
non-linear function can easily approximate a linear boundary whereas a linear
function cannot approximate a non-linear boundary.
• If the activation function is linear, then a perceptron with multiple hidden
layers can be easily compressed to a single layer perceptron.
The structure of a perceptron Cont.
• Activation Functions:
• Properties of activation function
2. Differentiable:
• During backpropagation, the gradient of the loss function is calculated for the gradient descent method.
• The gradient of the loss function with respect to weight is calculated using chain rule.
• Hence, it is necessary that the activation function is differentiable with respect to its input.
3. Continuous: A function cannot be differentiable unless it is continuous.
4. Bounded:
• The input data is passed through a series of perceptrons, each of which contains an activation
function.
• If the function is not bounded in a range, the output value may explode.
The structure of a perceptron Cont.
• Activation Functions:
• Properties of activation function
5. Zero-centered:
• A function is said to be zero centered when its range contains both positive and negative values.
• If the activation function of the network is not zero-centered, $y = f(X^T W)$ is always positive or always negative.
• Thus, the output of a layer is always shifted toward either positive or negative values.
• As a result, the weight vector needs more updates to be trained properly.
• So, the number of epochs needed for the network to get trained increases if the activation function is not zero-centered.
6. Computational cost:
• It is defined as the time required to generate the output of the activation function when input is fed to it.
• The computational cost of the gradient is also important as the gradient is calculated during weight update in
backpropagation.
• Gradient descent optimization itself is a very time consuming process and many iterations are needed to perform
this. Therefore, the computational cost is an issue.
• When the activation function and the gradient of the activation function have high computational cost, it requires
more time to get trained.
The structure of a perceptron Cont.
• Why do we need Activation Function?
• It is biologically inspired by activity inside our brain, wherein different neurons are activated by different stimuli.
• It decides whether a neuron should be activated or not.
• It will decide whether the neuron’s input to the network is important or not.
• It is also used to introduce non-linearity into the output of a neuron.
• It performs the non-linear transformation of the input, making the network capable of learning and performing more complex tasks.
• Studying the derivatives and applications of activation functions is essential for selecting the proper type of activation function that gives accuracy in a particular neural network model.
The structure of a perceptron Cont.
• Problems faced by activation functions:
There are two major problems:
1. Vanishing gradient problem:
• When an activation function compresses a large range of input into a small output range, a large change to the input of the activation function results in a very small change to the output.
• This leads to a small (typically close to zero) gradient value.
• During the weight update, when backpropagation calculates the gradients of the network, the gradients of each layer are multiplied together from the final layer down to that layer because of the chain rule.
• When a value close to zero is multiplied by other values close to zero several times, the product gets closer and closer to zero (see the numeric sketch below).
• So, the weights get saturated and are not updated properly.
• Neurons whose weights are not updated properly are called saturated neurons.
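A small numeric illustration of the chain-rule effect, assuming sigmoid activations (the values in the comments are approximate):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# For a saturated unit (large |input|) the sigmoid gradient is tiny.
g = sigmoid(5.0) * (1.0 - sigmoid(5.0))
print(g)         # ~0.0066

# Chain rule across 10 saturated layers: the product collapses toward zero,
# so the early layers receive almost no gradient.
print(g ** 10)   # ~1.6e-22
```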
The structure of a perceptron Cont.
• Problems faced by activation functions:
There are two major problems:
2. Dead neuron problem:
• When an activation function forces a large part of the input to zero or almost zero, those corresponding
neurons are inactive/dead in contributing to the final output.
• During weight update, there is a possibility that the weights will be updated in such a way that the
weighted sum of a large part of the network will be forced to zero.
• A network will hardly recover from such a situation and a large portion of the input fails to contribute
to the network.
• This is a problem because a large part of the input may remain completely deactivated while the network operates.
• These forcefully deactivated neurons are called 'dead neurons', and this problem is termed the dead neuron problem.
The structure of a perceptron Cont.
• Types of Neural Networks Activation Functions:
1. Binary Step Function:
• It depends on a threshold value that decides whether a neuron should be activated or not.
• Mathematically it can be represented as:
$f(x) = \begin{cases} 0 & \text{for } x < \theta \\ 1 & \text{for } x \ge \theta \end{cases}$
• The limitations of binary step function:
• It cannot provide multi-value outputs—for example, it cannot be used for multi-class
classification problems.
• The gradient (slope) of the step function is zero, which causes a hindrance in the backpropagation process (see the sketch below).
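A minimal sketch of the binary step function (θ defaults to 0 here purely for illustration):

```python
def binary_step(x, theta=0.0):
    """Return 1 if x >= theta, else 0."""
    return 1 if x >= theta else 0

print(binary_step(-0.3), binary_step(0.7))  # 0 1
# The derivative is 0 everywhere (undefined at theta), which is
# why backpropagation cannot make use of this function.
```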
The structure of a perceptron Cont.
• Types of Neural Networks Activation Functions:
2. Linear Activation Function:
• The activation is proportional to the input.
• The function does nothing to the weighted sum of the input; it simply returns the value it was given.
• Mathematically it can be represented as:
𝑓 𝑥 = 𝑥
• The limitations of the Linear activation function:
• It’s not possible to use backpropagation as the derivative
of the function is a constant and has no relation to the input x.
• All layers of the neural network will collapse into one if a linear activation function
is used. So, essentially, a linear activation function turns the neural network into just
one layer.
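A short sketch of this collapse argument: two stacked linear layers are exactly equivalent to one linear layer whose weight matrix is the product of the two (all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # layer 1 weights
W2 = rng.normal(size=(2, 4))   # layer 2 weights
x = rng.normal(size=3)

y_two_layers = W2 @ (W1 @ x)    # two layers with the linear activation f(x) = x
y_one_layer = (W2 @ W1) @ x     # one equivalent linear layer
print(np.allclose(y_two_layers, y_one_layer))  # True
```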
The structure of a perceptron Cont.
• Types of Neural Networks Activation Functions:
3. Non-Linear Activation Functions:
• A network using only the linear activation function is simply a linear regression model.
• Because of its limited power, this does not allow the model to create complex mappings between
the network’s inputs and outputs.
• Non-linear activation functions solve the limitations of linear activation functions.
• The common non-linear activation functions are:
• Sigmoid Function.
• Hyperbolic Tangent (Tanh) Function.
• Rectified Linear Unit (ReLU) Function.
• Leaky ReLU Function
• Softmax Function
The structure of a perceptron Cont.
• Types of Neural Networks Activation Functions:
3. Non-Linear Activation Functions:
• The common non-linear activation functions are:
• Sigmoid Function:
• Mathematically it can be represented as:
$f(x) = \dfrac{1}{1 + e^{-x}}$
• This outputs a value between 0 and 1, making it useful for binary classification problems as one
can set a threshold “probability” value.
• The limitations of the Sigmoid function:
• Computing exponents is power-intensive and can slow down large networks.
• Although the function is computationally expensive, its gradient is not: it can be calculated using the formula $f'(x) = f(x)(1 - f(x))$.
• The function is not zero-centred; as a result, the weight vector needs more updates to be trained properly.
• It also has large portions of the curve that are almost completely flat, giving very small gradients (the vanishing gradient problem), which makes the network hard to train.
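A minimal NumPy sketch of the sigmoid and its cheap gradient:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # f'(x) = f(x)(1 - f(x))

x = np.array([-5.0, 0.0, 5.0])
print(sigmoid(x))        # ~[0.0067, 0.5, 0.9933] -- output in (0, 1)
print(sigmoid_grad(x))   # ~[0.0066, 0.25, 0.0066] -- flat tails: vanishing gradient
```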
The structure of a perceptron Cont.
• Types of Neural Networks Activation Functions:
3. Non-Linear Activation Functions:
• The common non-linear activation functions are:
• Hyperbolic Tangent (Tanh) Function:
• Mathematically it can be represented as:
$f(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = \dfrac{1 - e^{-2x}}{1 + e^{-2x}}$
• This outputs a value between -1 and 1.
• It is zero centred which makes parameter optimisation easier in layers coming after it.
• Hence, it gives better training performance for multi-layer neural networks.
• The limitations of the Tanh function:
• Computing exponents is power-intensive and can slow down large networks.
• It also suffers from saturation: the weights get saturated and are not updated properly (the vanishing gradient problem).
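A minimal sketch of tanh and its gradient (NumPy's built-in np.tanh computes the same thing):

```python
import numpy as np

def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

x = np.array([-3.0, 0.0, 3.0])
print(tanh(x))             # ~[-0.995, 0.0, 0.995] -- zero-centred output
print(1.0 - tanh(x) ** 2)  # gradient f'(x) = 1 - f(x)^2, near zero in the tails
```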
The structure of a perceptron Cont.
• Types of Neural Networks Activation Functions:
3. Non-Linear Activation Functions:
• The common non-linear activation functions are:
• Rectified Linear Unit (ReLU) Function:
• Mathematically it can be represented as:
$f(x) = \begin{cases} 0 & x < 0 \\ x & x \ge 0 \end{cases}$
• This outputs a value between 0 and ∞.
• It has a gradient of 1 for positive values of x.
• It is extremely simple to compute, so it is the most popular choice of activation function for
hidden layers.
• The limitations of the ReLU function:
• It is not zero-centred, so its outputs aren't normalised.
• Inputs with negative weighted sums fail to contribute to the whole process (the dead neuron problem).
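A minimal sketch of ReLU and its gradient:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)   # 0 for x < 0, x for x >= 0

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))                 # [0.  0.  0.  1.5]
print((x > 0).astype(float))   # gradient: 1 for positive x, 0 otherwise (dead neurons)
```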
The structure of a perceptron Cont.
• Types of Neural Networks Activation Functions:
3. Non-Linear Activation Functions:
• The common non-linear activation functions are:
• Leaky ReLU Function:
• Mathematically it can be represented as:
$f(x) = \begin{cases} \alpha x & \text{for } x \le 0 \\ x & \text{otherwise} \end{cases}$
• A shallow slope is added for negative 𝑥 by using alpha (normally small, e.g. 0.01).
• This keeps nodes with negative values of 𝑥 active.
• It is easy to compute, so it is also a popular choice of activation function for hidden layers.
• The limitations of the Leaky ReLU function:
• On the negative part, the gradient is a constant 𝛼 (e.g. 0.01), which is close to zero and may still lead to the vanishing gradient problem.
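A minimal sketch of Leaky ReLU with the usual small slope α = 0.01:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)   # alpha*x instead of 0 for x <= 0

x = np.array([-2.0, -0.5, 1.5])
print(leaky_relu(x))               # [-0.02  -0.005  1.5]
print(np.where(x > 0, 1.0, 0.01))  # gradient: alpha (not 0) on the negative side
```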
The structure of a perceptron Cont.
• Types of Neural Networks Activation Functions:
3. Non-Linear Activation Functions:
• The common non-linear activation functions are:
• Softmax Function:
• Mathematically it can be represented as:
$\sigma(z)_i = \dfrac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$
• The function maps a vector of 𝐾 scores to a probability distribution: every output lies between 0 and 1, the outputs sum to 1, and the largest score receives the highest probability.
• It is most suitable for multi-class classification problems.
• The limitations of Softmax function:
• The calculations for the softmax layer are computationally expensive.
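A minimal sketch of softmax; subtracting the maximum first is a standard numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # shift by max(z) to avoid overflow
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p, p.sum())   # ~[0.659 0.242 0.099], sums to 1.0
```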
The structure of a perceptron Cont.
• How to choose the right Activation Function?
• As a rule, you can begin with the ReLU function and then move to other activation functions if ReLU doesn’t provide optimum results.
• A few other guidelines for choosing the most appropriate activation function:
• ReLU activation function should only be used in the hidden layers.
• Sigmoid and Tanh functions should not be used in hidden layers.
• A few rules for choosing the activation function for your output layer, based on the type of prediction problem that you are solving:
• Regression - Linear Activation Function.
• Binary Classification—Sigmoid Activation Function.
• Multiclass Classification—Softmax.
• Multilabel Classification—Sigmoid.
What is an Artificial Neural Network?
• A neural network is a system or hardware that is designed to operate like a
human brain.
• Artificial Neural network (ANN) is a data processing system consisting of a
large number of simple, highly interconnected processing elements (artificial
neurons).
• Neural networks can approximate any sort of function, no matter how complex, which helps people solve complex problems in real-life situations.
• ANNs are composed of node layers, containing an input layer, one or more hidden layers, and an output layer.
• If the output of any individual node is above the specified threshold value, that
node is activated, sending data to the next layer of the network.
How do artificial neural networks work?
• The artificial neural network receives the input signals, denoted 𝑥(𝑛), from an external source in the form of a pattern or an image represented as a vector.
• Afterward, each of the inputs is multiplied by its corresponding weight.
• All the weighted inputs are summed inside the computing unit.
• If the weighted sum is zero, a bias is added to make the output non-zero; the bias can be viewed as an extra input of 1 with its own weight.
• The total of weighted inputs is passed through the activation
function.
• The predicted outputs are compared with the actual outputs to find the error, and the weights are then updated based on the error estimate (sketched below).
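The whole loop above can be sketched with the classic perceptron learning rule; the dataset (logical AND), the learning rate, and all names here are illustrative:

```python
import numpy as np

def step(mu):
    return 1 if mu >= 0 else 0

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([0, 0, 0, 1])   # illustrative targets: logical AND
w, b, lr = np.zeros(2), 0.0, 0.1

for epoch in range(10):
    for x, target in zip(X, t):
        y = step(b + w @ x)   # forward pass through the activation
        error = target - y    # compare predicted output with actual output
        w += lr * error * x   # update weights from the error estimate
        b += lr * error

print([step(b + w @ x) for x in X])  # -> [0, 0, 0, 1]
```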
Model of ANN
• The model of ANN is specified by
1. Model of the neuron.
2. Model of the network interconnection.
3. Model of learning/training rule.
Model of ANN Cont.
• The model of ANN is specified by
1. Model of the neuron is specified by:
1. The net function.
• Linear function: the sum of the products of the inputs and the weights.
• Quadratic function: the sum of the products of the squared inputs and the weights (see the sketch below).
2. Activation function: There are several possible activation
functions as explained before.
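A short sketch contrasting the two net functions under illustrative values:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.3, 0.2])
b = 0.1

linear_net = b + w @ x            # sum of products of inputs and weights
quadratic_net = b + w @ (x ** 2)  # sum of products of squared inputs and weights
print(linear_net, quadratic_net)  # 0.6  1.2
```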
Model of ANN Cont.
• The model of ANN is specified by
2. Model of the network interconnection: There exist two basic
types of neuron connection architecture:
1. Feedforward Networks can be divided into two types:
1. Single-layer feed-forward network
2. Multilayer feed-forward network
2. Feedback Networks can be divided into three types:
1. Single node with its own feedback
2. Single-layer recurrent network
3. Multilayer recurrent network
Model of ANN Cont.
1. Feedforward Network: Information/signals flow only in
one direction, from the input layer to the output layer.
1. Single-layer feed-forward network
• It is a single-layer perceptron.
• The input layer is fully connected to the output layer.
Model of ANN Cont.
1. Feedforward Network: Information/signals flow only in one direction, from the input layer to the output layer.
2. Multilayer feed-forward network
• The concept is to have more than one weighted layer.
• Layers between the input and the output layer are called hidden layers.
• The hidden layers enable the network to be computationally stronger.
• They are used in Speech Recognition, Machine Translation, and
Complex Classification.
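A minimal forward pass through a two-weighted-layer feed-forward network (random weights and a ReLU hidden layer, purely for illustration):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input (3) -> hidden (4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # hidden (4) -> output (2)

x = np.array([0.5, -1.0, 2.0])
h = relu(W1 @ x + b1)   # hidden layer; signals flow forward only
y = W2 @ h + b2         # output layer
print(y)
```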
Model of ANN Cont.
2. Feedback Network : has feedback paths, hence the
signal can flow in both directions using loops.
1. Single node with its own feedback
• The output can be directed
back as inputs to the same layer.
• The figure shows a single recurrent
network having a single neuron with feedback to itself.
Model of ANN Cont.
2. Feedback Network: has feedback paths, hence the signal can flow
in both directions using loops.
2. Single-layer recurrent network
• A single-layer network with feedback connections, in which a processing element's output can be directed back to itself, to another processing element, or to both.
• This allows it to exhibit dynamic temporal behaviour for a time
sequence.
• RNNs can use their internal state (memory) to process sequences of
inputs.
Model of ANN Cont.
2. Feedback Network: has feedback paths, hence the signal can flow in both
directions using loops.
2. Multilayer recurrent network
• A processing element's output can be directed to processing elements in the same layer and in the preceding layer, forming a multilayer recurrent network.
• They perform the same task for every element of a sequence, with the output being
dependent on the previous computations.
• The main feature of a Recurrent Neural Network is its hidden state, which captures
some information about a sequence.
• They are used in text processing (auto-suggest, grammar checks, etc.), text-to-speech processing, and translation (see the sketch below).
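A minimal numeric sketch of the recurrence that gives a recurrent network its hidden-state memory (weights and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
Wx = 0.5 * rng.normal(size=(3, 2))   # input -> hidden
Wh = 0.5 * rng.normal(size=(3, 3))   # hidden -> hidden (the feedback loop)
h = np.zeros(3)                      # hidden state: the network's memory

sequence = [np.array([1., 0.]), np.array([0., 1.]), np.array([1., 1.])]
for x in sequence:
    h = np.tanh(Wx @ x + Wh @ h)     # each output depends on the previous state
    print(h)
```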
Model of ANN Cont.
• Model of ANN is specified by
3. Model of learning/training rule:
• Learning, in artificial neural networks, is the method of modifying
the weights of connections between the neurons of a specified
network.
• Learning in ANN can be classified into three categories:
1. Supervised Learning.
2. Unsupervised Learning.
3. Reinforcement Learning.
Model of ANN Cont.
1. Supervised Learning:
• Correct inputs and correct outputs are provided, and the weight
adjustment is performed based on the error of the computed output.
• It uses a training set to teach models to yield the desired output.
• The algorithm measures its accuracy through the loss function, adjusting
until the error has been sufficiently minimized.
• It can be used to solve two types of problems:
• Classification: assign test data into specific categories.
• Regression: used to understand the relationship between dependent and
independent variables.
Model of ANN Cont.
1. Supervised Learning:
• However, training supervised learning models can be very time-intensive.
• It also cannot cluster or classify data on its own.
Model of ANN Cont.
2. Unsupervised Learning:
• Unsupervised learning algorithms analyze and cluster unlabelled
datasets.
• It discovers hidden patterns or data groupings without the need for
human intervention.
• Its ability to discover similarities and differences in information
makes it the ideal solution for exploratory data analysis, cross-
selling strategies, customer segmentation, and image recognition.
Model of ANN Cont.
2. Unsupervised Learning :
• However, it has high computational complexity due to the large volume of training data.
• There is a higher risk of inaccurate results.
Model of ANN Cont.
3. Reinforcement Learning :
• It is a feedback-based learning technique in which an agent learns to
behave in an environment by performing the actions and seeing the
results of actions.
• For each good action, the agent gets positive feedback.
• For each bad action, the agent gets negative feedback or a penalty.
• The agent learns automatically using feedback without any labelled data.
• So the agent is bound to learn by its experience only.
• The primary goal is to improve performance by getting the maximum
positive rewards.
Model of ANN Cont.
3. Reinforcement Learning:
• The main challenge in reinforcement
learning lies in preparing the
simulation environment, which is
highly dependent on the task to be
performed.
• Another challenge is reaching a local optimum – that is, the agent performs the task as given, but not in the optimal or required way.
Application of Artificial Neural Networks
• The ability of a neural network to learn, to make adjustments to its structure
over time, is what makes it so useful in the field of artificial intelligence.
• Here are some standard uses of neural networks :
• Pattern Recognition: It’s the most common application. Examples are facial
recognition, optical character recognition, etc.
• Time Series Prediction —Neural networks can be used to make predictions. Will the
stock rise or fall tomorrow? Will it rain or be sunny?
• Signal Processing —hearing aids need to filter out unnecessary noise and amplify
the important sounds. Neural networks can be trained to process an audio signal and
filter it appropriately.
Application of Artificial Neural Networks
• Here are some standard uses of neural networks :
• Control —In self-driving cars, neural networks are often used to manage the steering
decisions of physical vehicles (or simulated ones).
• Soft Sensors —A soft sensor refers to the process of analysing a collection of many
measurements. Neural networks can be employed to process the input data from
many individual sensors and evaluate them as a whole.
• Anomaly Detection —Because neural networks are so good at recognizing patterns,
they can also be trained to generate an output when something occurs that doesn’t fit
the pattern. Think of a neural network monitoring your daily routine over a long
period of time. After learning the patterns of your behaviour, it could alert you when
something is amiss.
Summary
• The artificial neuron concept was introduced.
• The major points to recall are as follows:
• Biological neurons receive signals (or information), integrate incoming signals (to determine whether or not the information should be passed along), and communicate signals to target cells (other neurons, muscles, or glands).
• Inspired by the biological neuron, the artificial neuron receives inputs from a source, combines them linearly, performs a generally nonlinear operation on the combination, and outputs the final result.
• Weights are used to express the importance of specific inputs.
• Bias is used to increase classification model accuracy.
Summary cont.
• The major points to recall are as follows cont.:
• Three types of neural network activation functions are used to activate specific perceptrons.
• Non-linear activation functions are the most useful to allow the model to
create complex mappings between the network’s inputs and outputs.
• Artificial neural networks (ANNs) are able to approximate any sort of function, no matter how complex, and are composed of node layers, containing an input layer, one or more hidden layers, and an output layer.
Summary cont.
• The major points to recall are as follows cont.:
• The model of the neuron is specified by the net function and the activation function.
• There exist two basic types of neuron connection architecture
Feedforward Network and Feedback Network.
• Learning in ANN can be classified into three categories Supervised
Learning, Unsupervised Learning, and Reinforcement Learning.

More Related Content

Similar to Neural Networks Lec3.pptx

artificial-neural-networks-rev.ppt
artificial-neural-networks-rev.pptartificial-neural-networks-rev.ppt
artificial-neural-networks-rev.pptRINUSATHYAN
 
artificial-neural-networks-rev.ppt
artificial-neural-networks-rev.pptartificial-neural-networks-rev.ppt
artificial-neural-networks-rev.pptSanaMateen7
 
Machine learning Module-2, 6th Semester Elective
Machine learning Module-2, 6th Semester ElectiveMachine learning Module-2, 6th Semester Elective
Machine learning Module-2, 6th Semester ElectiveMayuraD1
 
Feedforward neural network
Feedforward neural networkFeedforward neural network
Feedforward neural networkSopheaktra YONG
 
Artificial Neural Network_VCW (1).pptx
Artificial Neural Network_VCW (1).pptxArtificial Neural Network_VCW (1).pptx
Artificial Neural Network_VCW (1).pptxpratik610182
 
Artificial neural networks
Artificial neural networksArtificial neural networks
Artificial neural networksarjitkantgupta
 
The Introduction to Neural Networks.ppt
The Introduction to Neural Networks.pptThe Introduction to Neural Networks.ppt
The Introduction to Neural Networks.pptmoh2020
 
20200428135045cfbc718e2c.pdf
20200428135045cfbc718e2c.pdf20200428135045cfbc718e2c.pdf
20200428135045cfbc718e2c.pdfTitleTube
 
Deep Learning Sample Class (Jon Lederman)
Deep Learning Sample Class (Jon Lederman)Deep Learning Sample Class (Jon Lederman)
Deep Learning Sample Class (Jon Lederman)Jon Lederman
 
33.-Multi-Layer-Perceptron.pdf
33.-Multi-Layer-Perceptron.pdf33.-Multi-Layer-Perceptron.pdf
33.-Multi-Layer-Perceptron.pdfgnans Kgnanshek
 
Introduction to Neural networks (under graduate course) Lecture 9 of 9
Introduction to Neural networks (under graduate course) Lecture 9 of 9Introduction to Neural networks (under graduate course) Lecture 9 of 9
Introduction to Neural networks (under graduate course) Lecture 9 of 9Randa Elanwar
 

Similar to Neural Networks Lec3.pptx (20)

artificial-neural-networks-rev.ppt
artificial-neural-networks-rev.pptartificial-neural-networks-rev.ppt
artificial-neural-networks-rev.ppt
 
artificial-neural-networks-rev.ppt
artificial-neural-networks-rev.pptartificial-neural-networks-rev.ppt
artificial-neural-networks-rev.ppt
 
Machine learning Module-2, 6th Semester Elective
Machine learning Module-2, 6th Semester ElectiveMachine learning Module-2, 6th Semester Elective
Machine learning Module-2, 6th Semester Elective
 
Feedforward neural network
Feedforward neural networkFeedforward neural network
Feedforward neural network
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 
Artificial Neural Network_VCW (1).pptx
Artificial Neural Network_VCW (1).pptxArtificial Neural Network_VCW (1).pptx
Artificial Neural Network_VCW (1).pptx
 
Artificial neural networks
Artificial neural networksArtificial neural networks
Artificial neural networks
 
The Introduction to Neural Networks.ppt
The Introduction to Neural Networks.pptThe Introduction to Neural Networks.ppt
The Introduction to Neural Networks.ppt
 
20200428135045cfbc718e2c.pdf
20200428135045cfbc718e2c.pdf20200428135045cfbc718e2c.pdf
20200428135045cfbc718e2c.pdf
 
Perceptron
Perceptron Perceptron
Perceptron
 
Deep Learning Sample Class (Jon Lederman)
Deep Learning Sample Class (Jon Lederman)Deep Learning Sample Class (Jon Lederman)
Deep Learning Sample Class (Jon Lederman)
 
Unit 6: Application of AI
Unit 6: Application of AIUnit 6: Application of AI
Unit 6: Application of AI
 
Neural Networks
Neural NetworksNeural Networks
Neural Networks
 
ANN.pptx
ANN.pptxANN.pptx
ANN.pptx
 
Neural network
Neural networkNeural network
Neural network
 
33.-Multi-Layer-Perceptron.pdf
33.-Multi-Layer-Perceptron.pdf33.-Multi-Layer-Perceptron.pdf
33.-Multi-Layer-Perceptron.pdf
 
Introduction to Neural networks (under graduate course) Lecture 9 of 9
Introduction to Neural networks (under graduate course) Lecture 9 of 9Introduction to Neural networks (under graduate course) Lecture 9 of 9
Introduction to Neural networks (under graduate course) Lecture 9 of 9
 
UNIT-3 .PPTX
UNIT-3 .PPTXUNIT-3 .PPTX
UNIT-3 .PPTX
 
Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural Network
 
neuralnetwork.pptx
neuralnetwork.pptxneuralnetwork.pptx
neuralnetwork.pptx
 

Recently uploaded

Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSSpellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSAnaAcapella
 
8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital Management8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital ManagementMBA Assignment Experts
 
The Story of Village Palampur Class 9 Free Study Material PDF
The Story of Village Palampur Class 9 Free Study Material PDFThe Story of Village Palampur Class 9 Free Study Material PDF
The Story of Village Palampur Class 9 Free Study Material PDFVivekanand Anglo Vedic Academy
 
male presentation...pdf.................
male presentation...pdf.................male presentation...pdf.................
male presentation...pdf.................MirzaAbrarBaig5
 
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽中 央社
 
會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文
會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文
會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文中 央社
 
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...Nguyen Thanh Tu Collection
 
Observing-Correct-Grammar-in-Making-Definitions.pptx
Observing-Correct-Grammar-in-Making-Definitions.pptxObserving-Correct-Grammar-in-Making-Definitions.pptx
Observing-Correct-Grammar-in-Making-Definitions.pptxAdelaideRefugio
 
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjj
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjjStl Algorithms in C++ jjjjjjjjjjjjjjjjjj
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjjMohammed Sikander
 
The Liver & Gallbladder (Anatomy & Physiology).pptx
The Liver &  Gallbladder (Anatomy & Physiology).pptxThe Liver &  Gallbladder (Anatomy & Physiology).pptx
The Liver & Gallbladder (Anatomy & Physiology).pptxVishal Singh
 
e-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi Rajagopale-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi RajagopalEADTU
 
How to Manage Website in Odoo 17 Studio App.pptx
How to Manage Website in Odoo 17 Studio App.pptxHow to Manage Website in Odoo 17 Studio App.pptx
How to Manage Website in Odoo 17 Studio App.pptxCeline George
 
Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...EduSkills OECD
 
When Quality Assurance Meets Innovation in Higher Education - Report launch w...
When Quality Assurance Meets Innovation in Higher Education - Report launch w...When Quality Assurance Meets Innovation in Higher Education - Report launch w...
When Quality Assurance Meets Innovation in Higher Education - Report launch w...Gary Wood
 
DEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUM
DEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUMDEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUM
DEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUMELOISARIVERA8
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsSandeep D Chaudhary
 
AIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.pptAIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.pptNishitharanjan Rout
 
Personalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes GuàrdiaPersonalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes GuàrdiaEADTU
 

Recently uploaded (20)

Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSSpellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
 
8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital Management8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital Management
 
The Story of Village Palampur Class 9 Free Study Material PDF
The Story of Village Palampur Class 9 Free Study Material PDFThe Story of Village Palampur Class 9 Free Study Material PDF
The Story of Village Palampur Class 9 Free Study Material PDF
 
male presentation...pdf.................
male presentation...pdf.................male presentation...pdf.................
male presentation...pdf.................
 
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
 
會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文
會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文
會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文
 
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
 
Observing-Correct-Grammar-in-Making-Definitions.pptx
Observing-Correct-Grammar-in-Making-Definitions.pptxObserving-Correct-Grammar-in-Making-Definitions.pptx
Observing-Correct-Grammar-in-Making-Definitions.pptx
 
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjj
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjjStl Algorithms in C++ jjjjjjjjjjjjjjjjjj
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjj
 
The Liver & Gallbladder (Anatomy & Physiology).pptx
The Liver &  Gallbladder (Anatomy & Physiology).pptxThe Liver &  Gallbladder (Anatomy & Physiology).pptx
The Liver & Gallbladder (Anatomy & Physiology).pptx
 
e-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi Rajagopale-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi Rajagopal
 
OS-operating systems- ch05 (CPU Scheduling) ...
OS-operating systems- ch05 (CPU Scheduling) ...OS-operating systems- ch05 (CPU Scheduling) ...
OS-operating systems- ch05 (CPU Scheduling) ...
 
How to Manage Website in Odoo 17 Studio App.pptx
How to Manage Website in Odoo 17 Studio App.pptxHow to Manage Website in Odoo 17 Studio App.pptx
How to Manage Website in Odoo 17 Studio App.pptx
 
Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...
 
When Quality Assurance Meets Innovation in Higher Education - Report launch w...
When Quality Assurance Meets Innovation in Higher Education - Report launch w...When Quality Assurance Meets Innovation in Higher Education - Report launch w...
When Quality Assurance Meets Innovation in Higher Education - Report launch w...
 
DEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUM
DEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUMDEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUM
DEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUM
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & Systems
 
Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"
 
AIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.pptAIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.ppt
 
Personalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes GuàrdiaPersonalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes Guàrdia
 

Neural Networks Lec3.pptx

  • 3. OUTLINE • Biological neuron structure and function • The structure of a Artificial Neurons (perceptron) • What is a Neural Network? • How do artificial neural networks work? • Model of ANN • Application of Artificial Neural Networks
  • 4. Biological neuron structure and function • The human ability to perceive their surroundings – to see, hear, and smell what’s around you – depends on your nervous system. • Human capacity to wonder how you know where you are depends on your nervous system. • Human ability to act on that information also depends on your nervous system. • All of these processes depend on the interconnected cells (neurons) that make up your nervous system. • The basic functions of a neuron: • Receive signals (or information). • Integrate incoming signals (to determine whether or not the information should be passed along). • Communicate signals to target cells (other neurons or muscles or glands).
  • 5. Biological neuron structure and function Neuron Anatomy • Neurons vary in size, shape, and structure depending on their role and location. • The Soma is a neuron cell body which converts input activation into output activation. • Axons are transmission lines that send activation to other neurons. • Dendrites are the receive zones that receive activation from other neurons. • Synapse allows the signal transmission between axons and dendrite
  • 6. Biological neuron structure and function Neuron Anatomy • All neurons have three essential parts: • Dendrites: Input • Soma (Cell body): Processor • Axon: Output • All neurons connected by Synaptic: Link.
  • 7. Biological neuron structure and function Neuron Anatomy • A neuron is connected to other neurons through about 10,000 synapses(link). • A neuron receives input from other neurons. Inputs are combined. • Once input exceeds a critical level, the neuron discharges a spike ‐ an electrical pulse that travels from the body, down the axon(output), to the next neuron(s). • The axon endings almost touch the dendrites (Input) or cell body (Processor) of the next neuron. • Transmission of an electrical signal from one neuron to the next is affected by neurotransmitters. • Neurotransmitters are chemicals which are released from the first neuron and which bind to the Second. • The strength of the signal that reaches the next neuron depends on factors such as the amount of neurotransmitter available.
  • 8. The structure of a Artificial Neurons (perceptron) • Inspiring by Biological neuron, artificial neuron has been made. • The structure of an artificial neuron (Perceptron) is as shown: Input units  f() Y Wa Wb Wc Connectio n weights Summing function Activation Function X1 X3 X2 Input(s) Output (dendrite) (synapse) (axon) (soma) b Bias
  • 9. The structure of a Artificial Neurons (perceptron) • Similarity to the biological network, the function of the perceptron is the same: 1. Receives inputs from other sources. 2. Combine them in some way. 3. Performs a generally nonlinear operation on the result. 4. Outputs the final result. • Hence, the perceptron is a mathematical function modelled on the working of biological neurons. • Perceptron is a linear classifier (binary). • One or more inputs are separately weighted. • Inputs are summed and passed through a nonlinear function to produce output. • Every perceptron holds an internal state called an activation signal. • Every perceptron is connected to another neuron via a connection link • Each connection link carries information about the input signal. • It is important to note that a perceptron can send only one signal at a time.
  • 10. The structure of a perceptron • The inputs of perceptron reflect the real world data. 𝑥 = (𝑥1, … , 𝑥𝑛) • The synaptic weights 𝑤 = 𝑤1, … , 𝑤𝑛 . is a vector of size the number of inputs. • The bias 𝑏 is a constant real value or vector. • The summing (combination) function, 𝜇(𝑤): • takes the input vector 𝑥 to produce a combined value. • The combination is computed as bias plus a linear combination of the synaptic weights and the inputs in the perceptron. 𝜇 = 𝑏 + 𝑖=1 𝑛 𝑤𝑖 . 𝑥𝑖 𝑤ℎ𝑒𝑟𝑒 𝑖 = 1, … , 𝑛 • The bias increases or reduces the net input to the activation function, depending on whether it is positive or negative. • The activation function 𝑓(⋅) defines the output from the perceptron in terms of its combination. • The output 𝑦 is represented in terms of the composition of the combination and the activation functions.
  • 11. The structure of a perceptron Cont. • Why do we need Weights? • To answer this question: • Let us consider there is no weights as shown in the Figure: • The summation function 𝜇 sums up all the inputs and adds bias to it. • Let us say, the role of the activation function is to allocate the data points to one of the classes. • Compare the model expression with line equation as shown in Eq. 1 and Eq. 2 : 𝑦 = 𝑚𝑥 + 𝑐 1 𝑥2 = −𝑥1 + 𝑏 2 • The slope(𝑚) of the equation 𝑥₂ = −𝑥₁ + 𝑏 is fixed that is -1. 𝜇 = 𝑏 + 𝑖=1 2 𝑥𝑖 = 𝑥1 + 𝑥2 + 𝑏 𝑦 = 1 𝑖𝑓 𝜇 𝑥 ≥ 0 = 0 𝑒𝑙𝑠𝑒
  • 12. The structure of a perceptron Cont. • Why do we need Weights? • Example: Consider the following dataset for illustration: • It contains two independent features [𝑥₁ 𝑎𝑛𝑑 𝑥₂] and one dependent feature y. • Our task is to classify a given data point to one of the classes that belong to feature 𝑦. • From the data, we infer that: • If we try to fit the line equation ( 𝑥₂ = −𝑥₁ + 𝑏) for the different values of 𝑏 we will get the following plot:
  • 13. The structure of a perceptron Cont. • Why do we need Weights? • Orange line is with 𝑏 = 0. • Blue line is with 𝑏 = 1. • Green line is with 𝑏 = 2. 𝑥₂ = −𝑥₁ + 𝑏 • Changing the value of 𝑏, end up with parallel lines. • No change in the orientation or slope of the line. • So, we require weights to change the orientation of the line.
  • 14. The structure of a perceptron Cont. • What do the weights in a Neuron convey to us? • Importance of the feature: Features with weights that are close to zero said to have lesser importance in the prediction process compared to the features with weights having a larger value. • Tells the relationship between input and output: • Example: consider the input of the perceptron represented as follow: • Suppose that people often tend to buy a car within their budget and the most popular one among many as follow: • Hence, if the weight is positive, there is a direct relationship between that feature and the target value, and inverse relationship if the weight is negative.
  • 15. The structure of a perceptron Cont. • Why do we need Bias? • A bias value allows to shift the function curve up or down. • It can increase classification model accuracy. • It serves as another model parameter to increase the model performance on training data.
  • 16. The structure of a perceptron Cont. • Activation Functions: • The activation function, also known as the transfer function, is the nonlinear function applied on the inner product 𝑋𝑇𝑊 in an artificial neural network. • Properties of activation function 1. Nonlinear: There are two reasons why an activation function should be nonlinear: • The boundaries or patterns in real-world problems are mostly non-linear and a non-linear function can easily approximate a linear boundary whereas a linear function cannot approximate a non-linear boundary. • If the activation function is linear, then a perceptron with multiple hidden layers can be easily compressed to a single layer perceptron.
  • 17. The structure of a perceptron Cont. • Activation Functions: • Properties of activation function 2. Differentiable: • During backpropagation, the gradient of the loss function is calculated in gradient descent method. • The gradient of the loss function with respect to weight is calculated using chain rule. • Hence, it is necessary that the activation function is differentiable with respect to its input. 3. Continuous: A function cannot be differentiable unless it is continuous. 4. Bounded: • The input data is passed through a series of perceptrons, each of which contains an activation function. • If the function is not bounded in a range, the output value may explode.
  • 18. The structure of a perceptron Cont. • Activation Functions: • Properties of activation function 5. Zero-centered: • A function is said to be zero centered when its range contains both positive and negative values. • If the activation function of the network is not zero centered, 𝑦 = 𝑓(𝑋𝑇 𝑊) is always positive or always negative. • Thus, the output of a layer is always being moved to either the positive values or the negative values. • As a result, the weight vector needs more update to be trained properly. • So, the number of epochs needed for the network to get trained increases if the activation function is not zero centered. 6. Computational cost: • It is defined as the time required to generate the output of the activation function when input is fed to it. • The computational cost of the gradient is also important as the gradient is calculated during weight update in backpropagation. • Gradient descent optimization itself is a very time consuming process and many iterations are needed to perform this. Therefore, the computational cost is an issue. • When the activation function and the gradient of the activation function have high computational cost, it requires more time to get trained.
  • 19. The structure of a perceptron Cont. • Why do we need Activation Function? • It is biologically inspired by activities inside our brain wherein different neurons get activated by different stimulus. • It decides whether a neuron should be activated or not. • It will decide whether the neuron’s input to the network is important or not. • It is also used to introduce non-linearity into the output of a neuron. • It does the non-linear transformation to the input making it capable to learn and perform more complex tasks. • Studying the derivatives and application of activation functions is essential for selecting the proper type of activation function that give accuracy in a particular Neural Network model.
  • 20. The structure of a perceptron Cont. • Problems faced by activation functions: There are two major problems: 1. Vanishing gradient problem: • When an activation function compresses a large range of input into a small output range, a large change to the input of the activation function results into a very small change to the output. • This leads to a small (typically close to zero) gradient value. • During weight update, when backpropagation calculates the gradients of the network, the gradients of each layer are multiplied down from the final layer to that layer because of the chain rule. • When a value close to zero is multiplied to other values close to zero several times, the value becomes closer and closer to zero. • So, the weights get saturated and they are not updated properly. • The neurons, of which the weights are not updated properly are called the saturated neurons.
  • 21. The structure of a perceptron Cont. • Problems faced by activation functions: There are two major problems: 1. Dead neuron problem: • When an activation function forces a large part of the input to zero or almost zero, those corresponding neurons are inactive/dead in contributing to the final output. • During weight update, there is a possibility that the weights will be updated in such a way that the weighted sum of a large part of the network will be forced to zero. • A network will hardly recover from such a situation and a large portion of the input fails to contribute to the network. • This leads to a problem because a large part of the input may remain completely deactivated during the network performs. • These forcefully deactivated neurons are called 'dead neurons' and this problem is termed as the dead neuron problem.
  • 22. The structure of a perceptron Cont. • Types of Neural Networks Activation Functions: 1. Binary Step Function: • It depends on a threshold value that decides whether a neuron should be activated or not. • Mathematically it can be represented as: 𝑓 𝑥 = 0 𝑓𝑜𝑟 𝑥 < 𝜃 1 𝑓𝑜𝑟 𝑥 ≥ 𝜃 • The limitations of binary step function: • It cannot provide multi-value outputs—for example, it cannot be used for multi-class classification problems. • The gradient (slop) of the step function is zero, which causes a hindrance in the backpropagation process.
  • 23. The structure of a perceptron Cont. • Types of Neural Networks Activation Functions: 2. Linear Activation Function: • The activation is proportional to the input. • The function doesn't do anything to the weighted sum of the input, it simply spits out the value it was given. • Mathematically it can be represented as: 𝑓 𝑥 = 𝑥 • The limitations of the Linear activation function: • It’s not possible to use backpropagation as the derivative of the function is a constant and has no relation to the input x. • All layers of the neural network will collapse into one if a linear activation function is used. So, essentially, a linear activation function turns the neural network into just one layer.
  • 24. The structure of a perceptron Cont. • Types of Neural Networks Activation Functions: 3. Non-Linear Activation Functions: • The linear activation function is simply a linear regression model. • Because of its limited power, this does not allow the model to create complex mappings between the network’s inputs and outputs. • Non-linear activation functions solve the limitations of linear activation functions. • The common non-linear activations functions are: • Sigmoid Function. • Hyperbolic Tangent (Tanh) Function. • Rectified Linear Unit (ReLU) Function. • Leaky ReLU Function • Softmax Function
  • 25. The structure of a perceptron Cont. • Types of Neural Networks Activation Functions: 3. Non-Linear Activation Functions: • The common non-linear activations functions are: • Sigmoid Function: • Mathematically it can be represented as: 𝑓 𝑥 = 1 1 + 𝑒−𝑥 • This outputs a value between 0 and 1, making it useful for binary classification problems as one can set a threshold “probability” value. • The limitations of Sigmoid function: • Computing exponents is power-intensive and can slow down large networks. • Although, the function is computationally expensive, its gradient is not. • Its gradient can be calculated using the formula 𝑓 𝑥 = 𝑓(𝑥)(1 − 𝑓 𝑥 ). • The function is not zero-centred, as a result, the weight vector needs more updates to be trained properly. • It also has large portions of the line which are almost completely flat, giving very small gradients (vanishing gradient problem) which makes the network hard to train.
  • 26. The structure of a perceptron Cont. • Types of Neural Networks Activation Functions: 3. Non-Linear Activation Functions: • The common non-linear activations functions are: • Hyperbolic Tangent (Tanh) Function: • Mathematically it can be represented as: 𝑓 𝑥 = 1 − 𝑒−𝑥 1 + 𝑒−𝑥 • This outputs a value between -1 and 1. • It is zero centred which makes parameter optimisation easier in layers coming after it. • Hence, it gives better training performance for multi-layer neural networks. • The limitations of Tanh function: • The computing exponents is power-intensive and so can slow down large networks. • It also suffers from saturation, the weights get saturated and they are not updated properly (vanishing gradient problem).
  • 27. The structure of a perceptron Cont. • Types of Neural Networks Activation Functions: 3. Non-Linear Activation Functions: • The common non-linear activations functions are: • Rectified Linear Unit (ReLU) Function: • Mathematically it can be represented as: 𝑓 𝑥 = 0 𝑥 < 0 𝑥 𝑥 ≥ 0 • This outputs a value between 0 and ∞. • It has a gradient of 1 for positive values of x. • It is extremely simple to compute, so it is the most popular choice of activation function for hidden layers. • The limitations of ReLU function: • It is not zero-centred, and so outputs aren’t normalised. • The input which have negative weighted sums fail to contribute to the whole process.
  • 28. The structure of a perceptron Cont. • Types of Neural Networks Activation Functions: 3. Non-Linear Activation Functions: • The common non-linear activations functions are: • Leaky ReLU Function: • Mathematically it can be represented as: 𝑓 𝑥 = 𝛼𝑥 𝑓𝑜𝑟 𝑥 ≤ 0 𝑥 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒 • A shallow slope is added for negative x by using alpha (normally small e.g. 0.01). • This keeps nodes with negative values of x active. • It is easy to compute, so it is the most popular choice of activation function for hidden layers. • The limitations of Leaky ReLU function: • The negative part, the gradient is always 0.01 which is close to zero which may lead to vanishing gradient problem.
  • 29. The structure of a perceptron Cont. • Types of Neural Network Activation Functions: 3. Non-Linear Activation Functions: • The common non-linear activation functions are: • Softmax Function: • Mathematically it can be represented as: $\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$ • The function converts a vector of K real values into a probability distribution whose components sum to 1, with the largest input mapped to the highest probability. • It is more suitable for multi-class classification problems. • The limitations of the Softmax function: • The calculations for the softmax layer are computationally expensive.
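A sketch of softmax; subtracting the maximum before exponentiating is a standard numerical-stability trick and partly addresses the cost concern:

```python
import numpy as np

def softmax(z):
    # Shift by max(z) to avoid overflow; the result is unchanged
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # a probability vector summing to 1
```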
  • 30. The structure of a perceptron Cont. • How to choose the right Activation Function? • As a rule, you can begin with the ReLU function and move to other activation functions if ReLU doesn't provide optimum results. • A few other guidelines for choosing the most appropriate activation function: • The ReLU activation function should only be used in the hidden layers. • Sigmoid and Tanh functions should not be used in hidden layers, because their saturation can cause vanishing gradients. • A few rules for choosing the activation function for your output layer, based on the type of prediction problem you are solving: • Regression - Linear Activation Function. • Binary Classification - Sigmoid Activation Function. • Multiclass Classification - Softmax. • Multilabel Classification - Sigmoid.
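These output-layer rules can be summarised as a simple lookup table; this mapping is just a restatement of the slide, with illustrative key names:

```python
# Problem type -> recommended output-layer activation (per the rules above)
OUTPUT_ACTIVATION = {
    "regression": "linear",
    "binary_classification": "sigmoid",
    "multiclass_classification": "softmax",
    "multilabel_classification": "sigmoid",
}
```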
  • 31. What is an Artificial Neural Network? • A neural network is a software or hardware system designed to operate in a way inspired by the human brain. • An Artificial Neural Network (ANN) is a data-processing system consisting of a large number of simple, highly interconnected processing elements (artificial neurons). • Neural networks can estimate any sort of function, no matter how complex, helping people solve complex real-life problems. • ANNs are composed of layers of nodes: an input layer, one or more hidden layers, and an output layer. • If the output of an individual node is above a specified threshold value, that node is activated, sending data to the next layer of the network.
  • 32. How do artificial neural networks work? • The Artificial Neural Network receives input signals, denoted $x(n)$, from an external source in the form of a pattern or image represented as a vector. • Each input is then multiplied by its corresponding weight. • All the weighted inputs are summed inside the computing unit. • A bias is added to this sum so the output is not forced to zero; the bias behaves like an extra input fixed at 1 with its own trainable weight. • The total of the weighted inputs is passed through the activation function. • The predicted outputs are compared with the actual outputs to find the error, and the weights are then updated based on this error estimate.
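Putting these steps together, a minimal sketch of a single sigmoid neuron with a simple error-driven update (the learning rate and update rule are illustrative assumptions, not prescribed by the slide):

```python
import numpy as np

def forward(x, w, b):
    z = np.dot(w, x) + b                 # weighted sum plus bias
    return 1.0 / (1.0 + np.exp(-z))      # activation function (sigmoid)

x, target = np.array([0.5, -1.0]), 1.0
w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(100):
    y = forward(x, w, b)
    error = target - y                   # compare predicted vs actual output
    w += lr * error * x                  # update weights from the error
    b += lr * error                      # the bias behaves as an input of 1
print(forward(x, w, b))                  # moves toward the target of 1.0
```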
  • 33. Model of ANN • The model of an ANN is specified by: 1. The model of the neuron. 2. The model of the network interconnections. 3. The model of the learning/training rule.
  • 34. Model of ANN Cont. • The model of an ANN is specified by: 1. The model of the neuron, which is specified by: 1. The net function: • Linear function: the weighted sum of the inputs. • Quadratic function: the weighted sum of the squared inputs. 2. The activation function: there are several possible activation functions, as explained before.
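A small sketch contrasting the two net functions (the weights and inputs are arbitrary illustrative values):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.2, -0.5, 0.1])

linear_net = np.dot(w, x)         # sum of w_i * x_i
quadratic_net = np.dot(w, x**2)   # sum of w_i * x_i^2
print(linear_net, quadratic_net)  # -0.5 and -0.9
```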
  • 35. Model of ANN Cont. • The model of an ANN is specified by: 2. The model of the network interconnections. There are two basic types of neuron connection architecture: 1. Feedforward Networks, which can be divided into two types: 1. Single-layer feed-forward network 2. Multilayer feed-forward network 2. Feedback Networks, which can be divided into three types: 1. Single node with its own feedback 2. Single-layer recurrent network 3. Multilayer recurrent network
  • 36. Model of ANN Cont. 1. Feedforward Network: information/signals flow only in one direction, from the input layer to the output layer. 1. Single-layer feed-forward network • It is a single-layer perceptron. • The input layer is fully connected to the output layer.
  • 37. Model of ANN Cont. 1. Feedforward Network: information/signals flow only in one direction, from the input layer to the output layer. 2. Multilayer feed-forward network • The concept is to have more than one weighted layer. • The layers between the input and output layers are called hidden layers. • The hidden layers enable the network to be computationally stronger. • They are used in speech recognition, machine translation, and complex classification.
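A minimal two-layer feed-forward pass, assuming a ReLU hidden layer and randomly initialised weights purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # input (3) -> hidden (4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)  # hidden (4) -> output (2)

def feedforward(x):
    h = np.maximum(0.0, W1 @ x + b1)  # hidden layer with ReLU
    return W2 @ h + b2                # output layer (linear here)

print(feedforward(np.array([1.0, 0.5, -0.5])))  # signals flow one way only
```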
  • 38. Model of ANN Cont. 2. Feedback Network: has feedback paths, so the signal can flow in both directions using loops. 1. Single node with its own feedback • The output can be directed back as input to the same layer. • The figure shows a recurrent network having a single neuron with feedback to itself.
  • 39. Model of ANN Cont. 2. Feedback Network: has feedback paths, so the signal can flow in both directions using loops. 2. Single-layer recurrent network • A single-layer network with feedback connections, in which the output is directed back to the same processing element, to another processing element, or both. • This allows it to exhibit dynamic temporal behaviour over a time sequence. • RNNs can use their internal state (memory) to process sequences of inputs.
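A single recurrent step can be sketched as below; the hidden state h is the feedback path that carries memory between time steps (the shapes and the tanh choice are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
Wx = rng.normal(size=(4, 3))   # input -> state weights
Wh = rng.normal(size=(4, 4))   # state -> state (feedback) weights

def rnn_step(x, h):
    # The new state mixes the current input with the previous state
    return np.tanh(Wx @ x + Wh @ h)

h = np.zeros(4)
for x in [np.ones(3), np.zeros(3), -np.ones(3)]:  # a toy input sequence
    h = rnn_step(x, h)
print(h)  # depends on the whole sequence, not just the last input
```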
  • 40. Model of ANN Cont. 2. Feedback Network: has feedback paths, so the signal can flow in both directions using loops. 3. Multilayer recurrent network • A processing element's output can be directed to processing elements in the same layer and in preceding layers, forming a multilayer recurrent network. • They perform the same task for every element of a sequence, with the output depending on the previous computations. • The main feature of a Recurrent Neural Network is its hidden state, which captures information about the sequence. • They are used in text processing (auto-suggest, grammar checks, etc.), text-to-speech processing, and translation.
  • 41. Model of ANN Cont. • The model of an ANN is specified by: 3. The model of the learning/training rule: • Learning, in artificial neural networks, is the method of modifying the weights of the connections between the neurons of a network. • Learning in ANNs can be classified into three categories: 1. Supervised Learning. 2. Unsupervised Learning. 3. Reinforcement Learning.
  • 42. Model of ANN Cont. 1. Supervised Learning: • Correct inputs and correct outputs are provided, and the weights are adjusted based on the error of the computed output. • It uses a training set to teach models to yield the desired output. • The algorithm measures its accuracy through a loss function, adjusting until the error has been sufficiently minimised. • It can be used to solve two types of problems: • Classification: assigning test data to specific categories. • Regression: understanding the relationship between dependent and independent variables.
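A hedged sketch of this loop on a toy regression problem, using mean squared error as the loss (one common choice; the data and learning rate are invented for illustration):

```python
import numpy as np

X = np.array([0.0, 1.0, 2.0, 3.0])   # inputs
y = np.array([1.0, 3.0, 5.0, 7.0])   # labelled targets (y = 2x + 1)
w, b, lr = 0.0, 0.0, 0.05

for epoch in range(2000):
    pred = w * X + b
    error = pred - y
    loss = np.mean(error ** 2)       # the loss function being minimised
    w -= lr * np.mean(error * X)     # gradient-style weight update
    b -= lr * np.mean(error)
print(w, b, loss)                    # approaches w = 2, b = 1, loss ~ 0
```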
  • 43. Model of ANN Cont. 1. Supervised Learning: • However, training supervised learning models can be very time-intensive. • It also cannot cluster or classify data on its own.
  • 44. Model of ANN Cont. 2. Unsupervised Learning: • Unsupervised learning algorithms analyze and cluster unlabelled datasets. • It discovers hidden patterns or data groupings without the need for human intervention. • Its ability to discover similarities and differences in information makes it the ideal solution for exploratory data analysis, cross-selling strategies, customer segmentation, and image recognition.
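To make the clustering idea concrete, here is a tiny k-means-style sketch (k-means is one standard clustering algorithm; it is not named on the slide, and the data is synthetic):

```python
import numpy as np

rng = np.random.default_rng(2)
data = np.concatenate([rng.normal(0, 0.5, (20, 2)),
                       rng.normal(5, 0.5, (20, 2))])  # two unlabelled groups

centres = np.array([data[0], data[-1]])  # one seed point from each end
for _ in range(10):
    # Assign each point to its nearest centre, then recompute the centres
    labels = np.argmin(np.linalg.norm(data[:, None] - centres, axis=2), axis=1)
    centres = np.array([data[labels == k].mean(axis=0) for k in range(2)])
print(centres)  # the two groupings are discovered without any labels
```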
  • 45. Model of ANN Cont. 2. Unsupervised Learning: • However, it has high computational complexity due to the high volume of training data. • There is also a higher risk of inaccurate results, since there are no labels to validate against.
  • 46. Model of ANN Cont. 3. Reinforcement Learning: • It is a feedback-based learning technique in which an agent learns to behave in an environment by performing actions and seeing their results. • For each good action, the agent gets positive feedback. • For each bad action, the agent gets negative feedback or a penalty. • The agent learns automatically from this feedback, without any labelled data. • So the agent is bound to learn from its experience only. • The primary goal is to improve performance by collecting the maximum positive reward.
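The agent-environment feedback loop can be sketched as a toy two-action problem (the environment, exploration rate, and update rule here are hypothetical placeholders, not slide material):

```python
import random

def environment(action):
    # Toy environment: action 1 is "good" (+1 reward), action 0 is "bad" (-1)
    return 1 if action == 1 else -1

values = [0.0, 0.0]  # the agent's running estimate of each action's worth
for step in range(200):
    explore = random.random() < 0.1
    action = random.choice([0, 1]) if explore else values.index(max(values))
    reward = environment(action)                       # positive/negative feedback
    values[action] += 0.1 * (reward - values[action])  # learn from experience
print(values)  # the agent learns to prefer the rewarded action
```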
  • 47. Model of ANN Cont. 3. Reinforcement Learning: • The main challenge in reinforcement learning lies in preparing the simulation environment, which is highly dependent on the task to be performed. • Another challenge is getting stuck in a local optimum: the agent performs the task, but not in the optimal or required way.
  • 48. Application of Artificial Neural Networks • The ability of a neural network to learn, and to adjust its structure over time, is what makes it so useful in the field of artificial intelligence. • Here are some standard uses of neural networks: • Pattern Recognition: the most common application. Examples are facial recognition, optical character recognition, etc. • Time Series Prediction: neural networks can be used to make predictions. Will the stock rise or fall tomorrow? Will it rain or be sunny? • Signal Processing: hearing aids need to filter out unnecessary noise and amplify the important sounds. Neural networks can be trained to process an audio signal and filter it appropriately.
  • 49. Application of Artificial Neural Networks • Here are some standard uses of neural networks: • Control: in self-driving cars, neural networks are often used to manage the steering decisions of physical (or simulated) vehicles. • Soft Sensors: a soft sensor refers to the process of analysing a collection of many measurements. Neural networks can be employed to process the input data from many individual sensors and evaluate them as a whole. • Anomaly Detection: because neural networks are so good at recognising patterns, they can also be trained to generate an output when something occurs that doesn't fit the pattern. Think of a neural network monitoring your daily routine over a long period of time; after learning the patterns of your behaviour, it could alert you when something is amiss.
  • 50. Summary • The artificial neuron concept has been introduced. • The major points to recall are as follows: • Biological neurons receive signals (or information), integrate incoming signals (to determine whether or not the information should be passed along), and communicate signals to target cells (other neurons, muscles, or glands). • Inspired by the biological neuron, the artificial neuron receives inputs from a source, combines them linearly, performs a generally nonlinear operation on the combination, and outputs the final result. • Weights are used to express the importance of a specific input. • Bias is used to shift the activation function, which can improve the classification model's accuracy.
  • 51. Summary cont. • The major points to recall are as follows (cont.): • Three types of neural network activation functions are used to activate a perceptron. • Non-linear activation functions are the most useful, since they allow the model to create complex mappings between the network's inputs and outputs. • Artificial Neural Networks (ANNs) are able to estimate any sort of function, no matter how complex, and are composed of layers of nodes: an input layer, one or more hidden layers, and an output layer.
  • 52. Summary cont. • The major points to recall are as follows (cont.): • The model of the neuron is specified by the net function and the activation function. • There are two basic types of neuron connection architecture: Feedforward Networks and Feedback Networks. • Learning in ANNs can be classified into three categories: Supervised Learning, Unsupervised Learning, and Reinforcement Learning.