The document discusses various activation functions used in neural networks, including Tanh, ReLU, Leaky ReLU, Sigmoid, and Softmax. It explains that activation functions introduce non-linearity and allow neural networks to learn complex patterns. Tanh squashes outputs between -1 and 1, while ReLU sets negative values to zero. Leaky ReLU allows a small negative slope, addressing the "dying ReLU" problem. Sigmoid and Softmax transform outputs into the 0-1 range for classification problems. Activation functions determine whether a neuron's output is important for prediction.
Introduction by Mohamed Essam and Nourhan Ahmed. Overview of simple neural networks, including layers and an example calculation of node outputs in a neural network.
Defines activation functions, explaining their necessity for introducing non-linearity. Discusses Tanh function, its graph, and example calculations.
Introduction to ReLU (Rectified Linear Unit) function, its computational efficiency, and how it only activates neurons when outputs are >0.
Explains role of activation functions in neuron firing, mapping outputs, and determining neuron importance for predictions.
Discusses the Dying ReLU problem, limitations of ReLU, and the advantages of Leaky ReLU, which prevents dead neurons during training.
Introduction to the Sigmoid function, why it's used for probabilities, and its role in adding non-linearity within neural networks.
Describes the Softmax function, its transformation of input values into probabilities, comparison with Sigmoid, and its application in neural networks.
Attribution credits for the presentation, thanking the audience and inviting questions.
Why do neural networks need it?
● The purpose of an activation function is to add non-linearity to the neural network.
● An activation function is an additional step at each layer, but its computation is worth it. Here is why:
● Assume we have a neural network working without activation functions. In that case, every neuron will only be performing a linear transformation on the inputs; although the neural network becomes simpler, learning any complex task is impossible, and our model would be just a linear regression model (see the sketch below this list).
● Activation functions:
○ Tanh()
○ ReLU
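As a rough illustration of that point, the sketch below (NumPy, random weights, and the layer shapes are all assumptions for the example, not taken from the slides) shows that two stacked layers with no activation collapse into a single equivalent linear map, while inserting a non-linearity such as tanh breaks that equivalence.

```python
# Minimal sketch: a "deep" network without activations is just one linear layer.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))                      # a small batch of 4 inputs, 3 features each
W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=2)

# Two "layers" stacked with no activation in between
h = x @ W1 + b1
y_no_activation = h @ W2 + b2

# The same result from a single equivalent linear layer
W_eq, b_eq = W1 @ W2, b1 @ W2 + b2
y_single_layer = x @ W_eq + b_eq

print(np.allclose(y_no_activation, y_single_layer))   # True: the extra layer adds nothing

# Inserting a non-linearity (tanh here) breaks that equivalence
y_with_activation = np.tanh(h) @ W2 + b2
```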
Tanh():
● The output of this function is in the range of -1 to 1.
● In Tanh, the larger the input (more positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the output will be to -1.0, as the sketch below the graph illustrates.
(The Tanh Activation Function Graph)
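A minimal sketch of that behaviour, assuming NumPy; the sample input values are illustrative only.

```python
# tanh squashes any real input into the open interval (-1, 1).
import numpy as np

def tanh(x):
    """tanh(x) = (e^x - e^-x) / (e^x + e^-x), output in (-1, 1)."""
    return np.tanh(x)

for value in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    print(f"tanh({value:+.1f}) = {tanh(value):+.4f}")
# Large positive inputs approach +1; large negative inputs approach -1.
```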
ReLU:
ReLU stands for Rectified Linear Unit.
The ReLU function does not activate all the neurons at the same time.
A neuron is deactivated only if the output of its linear transformation is less than 0.
Because only a certain number of neurons are activated, the ReLU function is far more computationally efficient compared to the tanh function.
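A small NumPy sketch of ReLU as described above; the pre-activation values are made-up examples.

```python
# ReLU zeroes negative pre-activations, so only neurons with a positive
# linear output stay "active".
import numpy as np

def relu(x):
    """ReLU(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])   # example pre-activations
print(relu(z))                               # [0.  0.  0.  0.5 2. ]
# A single element-wise max is cheaper than the exponentials needed by tanh.
```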
Cont… ReLU:
● The neural network is characterized by:
○ Pattern of connections between neurons (Architecture)
○ Activation function
○ Method of determining the weights of the connections (Training, Learning)
Mathematically, ReLU can be represented as: f(x) = max(0, x).
Activation function
An Activation Function decides whether a neuron should be activated or not. This means that it will decide whether the neuron's input to the network is important or not in the process of prediction, using simpler mathematical operations.
Mathematically, it can be represented as: output = f(Σᵢ wᵢ·xᵢ + b), where f is the activation function.
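As an illustration, the sketch below (NumPy assumed; the inputs, weights, and bias are made-up values) computes a single neuron's output by passing its summed weighted input through an activation.

```python
# One neuron: weighted sum of inputs, then an activation decides how much
# of that signal is passed on.
import numpy as np

def neuron_output(inputs, weights, bias, activation=np.tanh):
    """output = activation(sum_i w_i * x_i + b)"""
    z = np.dot(weights, inputs) + bias   # summed weighted input
    return activation(z)

x = np.array([0.5, -1.2, 3.0])           # example inputs (assumed values)
w = np.array([0.4, 0.1, -0.6])           # example weights (assumed values)
print(neuron_output(x, w, bias=0.2))
```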
Activation function
Activation (firing) of the neuron takes place when the neuron is stimulated by pressure, heat, light, or chemical information from other cells. (The type of stimulation necessary to produce firing depends on the type of neuron.)
Depending on the nature and intensity of these input signals, the brain processes them and decides whether the neuron should be activated ("fired") or not.
Activation function
❏ The primary role of the activation function is to transform the summed weighted input from the node into an output value to be fed to the next hidden layer or used as output.
❏ It is used to determine the output of the neural network, like yes or no. It maps the resulting values into the range 0 to 1 or -1 to 1, etc. (depending upon the function), as the sketch below shows.
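A quick NumPy sketch of the two ranges mentioned above; the sample pre-activation values are illustrative.

```python
# Sigmoid maps values into (0, 1); tanh maps them into (-1, 1).
import numpy as np

z = np.linspace(-6, 6, 5)                  # a few example pre-activation values
sigmoid = 1.0 / (1.0 + np.exp(-z))
tanh = np.tanh(z)

print(np.round(sigmoid, 3))                # every value falls in (0, 1)
print(np.round(tanh, 3))                   # every value falls in (-1, 1)
```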
Leaky ReLU - activation function
Limitation faced by ReLU:
The negative side of the graph makes the gradient value zero. For this reason, during the backpropagation process, the weights and biases for some neurons are not updated. This can create dead neurons which never get activated.
● All the negative input values become zero immediately, which decreases the model's ability to fit or train on the data properly.
● Leaky ReLU is an improved version of the ReLU function that solves the dying ReLU problem, as it has a small positive slope in the negative area.
Leaky ReLU - activation function
Advantage of Leaky ReLU:
The advantages of Leaky ReLU are the same as those of ReLU, with the addition that it enables backpropagation even for negative input values.
By making this minor modification for negative input values, the gradient on the left side of the graph comes out to be a non-zero value. Therefore, we would no longer encounter dead neurons in that region.
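A minimal NumPy sketch of Leaky ReLU; the negative-side slope of 0.01 is a commonly used default assumed here, not a value taken from the slides.

```python
# Leaky ReLU keeps a small non-zero gradient on the negative side,
# so neurons there can still be updated during backpropagation.
import numpy as np

def leaky_relu(x, alpha=0.01):
    """LeakyReLU(x) = x if x > 0, else alpha * x."""
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    """Gradient is 1 on the positive side and alpha (not 0) on the negative side."""
    return np.where(x > 0, 1.0, alpha)

z = np.array([-3.0, -0.5, 0.5, 3.0])
print(leaky_relu(z))        # [-0.03  -0.005  0.5    3.   ]
print(leaky_relu_grad(z))   # [ 0.01   0.01   1.     1.   ]
```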
Sigmoid - activation function
Sigmoid function:
The main reason we use the sigmoid function is that its output exists between 0 and 1. Therefore, it is especially used for models where we have to predict a probability as an output. Since the probability of anything exists only in the range of 0 to 1, sigmoid is the right choice.
Mathematically, it can be represented as: σ(x) = 1 / (1 + e^(-x)).
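A small NumPy sketch of the sigmoid described above; the sample inputs are illustrative.

```python
# sigma(x) = 1 / (1 + e^-x): the output always lies in (0, 1),
# so it can be read as a probability.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for value in [-4.0, 0.0, 4.0]:
    print(f"sigmoid({value:+.1f}) = {sigmoid(value):.4f}")
# -4 -> ~0.018, 0 -> 0.5, +4 -> ~0.982
```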
Sigmoid - activation function
Activation purpose:
Well, the purpose of an activation function is to add non-linearity to the neural network.
Softmax - activation function
The softmax function turns a vector of K real values into a vector of K real values that sum to 1. The input values can be positive, negative, zero, or greater than one, but the softmax transforms them into values between 0 and 1 so that they can be interpreted as probabilities. If one of the inputs is small or negative, the softmax turns it into a small probability, and if an input is large, it turns it into a large probability, but it will always remain between 0 and 1.
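A minimal NumPy sketch of softmax over a vector of K real values; subtracting the maximum before exponentiating is a common numerical-stability trick assumed here, and the example inputs are made up.

```python
# Softmax: K real values in, K probabilities out, summing to 1.
import numpy as np

def softmax(v):
    """Turn a vector of K reals into K values in (0, 1) that sum to 1."""
    shifted = v - np.max(v)          # improves numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

logits = np.array([2.0, -1.0, 0.5, 3.0])   # inputs may be negative or greater than 1
probs = softmax(logits)
print(np.round(probs, 3), probs.sum())     # probabilities that sum to 1
```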
Softmax - activation function
Softmax function vs Sigmoid function:
As mentioned above, the softmax function and the sigmoid function are similar. The softmax operates on a vector, while the sigmoid takes a scalar.
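A short NumPy sketch of that contrast: softmax normalises the whole vector jointly so its outputs sum to 1, while sigmoid squashes each value independently; the example scores are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores), softmax(scores).sum())   # joint normalisation: sums to 1
print(sigmoid(scores), sigmoid(scores).sum())   # each value in (0, 1), sum unconstrained
```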
CREDITS: This presentation template was created by Slidesgo, including icons by Flaticon, and infographics & images by Freepik.
icons by Flaticon, and infographics & images by Freepik
THANKS
Do you have any questions?
Please keep this slide for attribution