What are activation functions and why do we need them?
Activation functions are used in artificial neural networks to capture complex patterns in the data. Without activation functions, a neural network is just a linear regression model: stacking linear layers still produces a linear function of the input. The activation function applies a non-linear transformation to its input, which makes the network capable of learning and performing more complex tasks. We introduce non-linearity in each layer through activation functions.
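A minimal NumPy sketch of this point (the 3-4-2 layer sizes and the random weights are arbitrary, hypothetical choices, and biases are omitted): two layers without an activation collapse into a single matrix, while a ReLU in between breaks that linearity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights for a tiny 3 -> 4 -> 2 network (biases omitted for brevity).
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)


def linear_net(v):
    """Two stacked layers with no activation function."""
    return W2 @ (W1 @ v)


def relu_net(v):
    """The same two layers with a ReLU in between."""
    return W2 @ np.maximum(0.0, W1 @ v)


# Without an activation, the two layers collapse into the single matrix W2 @ W1.
collapses = np.allclose(linear_net(x), (W2 @ W1) @ x)

# Any linear map satisfies f(x) + f(-x) = 0; the ReLU network generally does not.
linear_is_odd = np.allclose(linear_net(x) + linear_net(-x), 0.0)
relu_is_odd = np.allclose(relu_net(x) + relu_net(-x), 0.0)

print(collapses, linear_is_odd, relu_is_odd)  # True True False
```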
Let us assume a network with one input layer, three hidden layers and one output layer, connected by four weight matrices:
W1 - weight matrix between the input layer and the first hidden layer
W2 - weight matrix between the first and second hidden layers
W3 - weight matrix between the second and third hidden layers
W4 - weight matrix between the third hidden layer and the output layer
The feedforward pass of this network is a chain of equations, one per layer. Stacking the layers, the output can be written as a single composed function:
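A sketch of these equations, assuming an input vector $x$, hidden-layer outputs $h_1, h_2, h_3$, a network output $\hat{y}$, one activation function $f$ shared by all layers, and no bias terms (these symbols are assumptions introduced for illustration):

```latex
\begin{aligned}
h_1 &= f(W_1 x) \\
h_2 &= f(W_2 h_1) \\
h_3 &= f(W_3 h_2) \\
\hat{y} &= f(W_4 h_3) = f\Big(W_4\, f\big(W_3\, f\big(W_2\, f(W_1 x)\big)\big)\Big)
\end{aligned}
```

If $f$ is the identity, this reduces to $\hat{y} = W_4 W_3 W_2 W_1 x = W x$ for a single matrix $W$, which is why a network without activation functions behaves like a linear model.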
Ideal qualities of an activation function:
1. Non-linear:
The activation function introduces non-linearity into the network so that it can capture complex relations between the input features and the output variable or class.
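2. Continuously differentiable:
Neural networks are generally trained with gradient descent or other gradient-based optimization methods, so the activation function needs to be differentiable. In practice, being differentiable almost everywhere is enough: ReLU, for example, is not differentiable at 0, and frameworks simply pick a value for the gradient at that point.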
3. Zero-centered:
Zero-centered activation functions keep the mean activation value close to 0. This matters because gradient descent usually converges faster on normalized data. I have explained many commonly used activation functions below; some are zero-centered and some are not. When an activation function is not zero-centered, we often add normalization layers such as batch normalization to mitigate this issue.
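As a quick illustration, assuming pre-activations spread symmetrically around 0 (here a hypothetical evenly spaced grid), sigmoid outputs average about 0.5 while tanh outputs average about 0:

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Hypothetical pre-activations, symmetric around 0.
x = np.linspace(-5.0, 5.0, 101)

sigmoid_mean = sigmoid(x).mean()  # sigmoid outputs are always positive
tanh_mean = np.tanh(x).mean()     # tanh is an odd function, so the values cancel out

print(f"{sigmoid_mean:.3f} {abs(tanh_mean):.3f}")  # 0.500 0.000
```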
4. Low computational cost:
Activation functions are applied in every layer of the network and evaluated a very large number of times, so they should be cheap to compute.
5. Should not kill gradients:
Activation functions such as the sigmoid suffer from saturation: their output barely changes for large negative or large positive inputs. In these regions the derivative of the sigmoid is very small. Because backpropagation multiplies these derivatives layer by layer, the gradients that reach the initial layers become tiny, their weights are barely updated, and the network does not learn effectively. Ideally, an activation function should not suffer from this issue.
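The following sketch evaluates the sigmoid derivative, $\sigma'(x) = \sigma(x)(1 - \sigma(x))$, at a few points and shows how a product of such factors shrinks with depth. It deliberately ignores the weight matrices, which also appear in the real backpropagated product.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)); its maximum is 0.25 at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)


# The derivative collapses toward 0 as |x| grows (saturation).
for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:5.1f}   sigmoid'(x) = {sigmoid_grad(x):.2e}")

# Ignoring the weights, backpropagation multiplies one such factor per layer.
# Even in the best case (0.25 per layer) the product shrinks geometrically.
for depth in [1, 5, 10]:
    print(f"depth {depth:2d}: 0.25 ** depth = {0.25 ** depth:.2e}")
```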
Most commonly used activation functions:
In this section we will go over different activation functions.
1. Sigmoid function -
The sigmoid function is defined as $\sigma(x) = \frac{1}{1 + e^{-x}}$.
It has a characteristic “S”-shaped curve, is defined for all real numbers and outputs values between 0 and 1. An undesirable property of the sigmoid is that the neuron's activation saturates near 0 or 1 when its input is large and negative or large and positive, which kills the gradient. It is also not zero-centered, which makes learning slower. In most cases, tanh is a better choice than sigmoid for hidden layers.
2. Tanh function -
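[Figure: tanh curve]
Tanh, defined as $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$, has one main advantage over the sigmoid: it is zero-centered, with output bounded between -1 and 1. Like the sigmoid, however, it still saturates for large positive or negative inputs.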
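A short sketch of the relationship between the two functions, using the standard identity $\tanh(x) = 2\sigma(2x) - 1$ and comparing their derivatives at 0:

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


x = np.linspace(-4.0, 4.0, 81)

# tanh is a rescaled and shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1,
# which moves the output range from (0, 1) to (-1, 1).
same_curve = np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0)

# Peak derivatives at 0: tanh'(x) = 1 - tanh(x)^2 and sigmoid'(x) = s(x)(1 - s(x)).
tanh_grad_at_0 = 1.0 - np.tanh(0.0) ** 2
sigmoid_grad_at_0 = sigmoid(0.0) * (1.0 - sigmoid(0.0))

print(same_curve, tanh_grad_at_0, sigmoid_grad_at_0)  # True 1.0 0.25
```

The larger peak derivative (1 versus 0.25) is another reason tanh often trains better than sigmoid in hidden layers, although both still saturate for large inputs.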
3. ReLU (Rectified Linear Unit) -
[Figure: ReLU plot]
ReLU is defined as $f(x) = \max(0, x)$. It is not zero-centered, but despite this disadvantage it is widely used because of its advantages: it is computationally very cheap, and it does not saturate for positive inputs, so it avoids the vanishing gradient problem there. ReLU has no upper bound, so activations can become very large (exploding activations). On the other hand, it outputs 0 for every negative input, so those units contribute nothing and pass back no gradient. As a result, it suffers from the “dying ReLU” problem.
Dying ReLU problem: during backpropagation, the weights and biases of some neurons are not updated, because both the activation and its gradient are zero for negative inputs. A neuron whose input stays negative for every training example becomes a dead neuron that never activates again.
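A minimal NumPy sketch of ReLU and its gradient. The dead neuron at the end is hypothetical: its bias of -100 is chosen only to keep every pre-activation negative.

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)


def relu_grad(x):
    # 1 for positive inputs, 0 otherwise (0 is also a common choice at exactly x = 0).
    return (x > 0).astype(float)


z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(z).tolist())       # [0.0, 0.0, 0.0, 0.5, 3.0]
print(relu_grad(z).tolist())  # [0.0, 0.0, 0.0, 1.0, 1.0]

# Hypothetical dead neuron: a large negative bias keeps its pre-activation below 0
# for every input in the batch, so every gradient flowing through it is 0.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(100, 3))
w, b = rng.normal(size=3), -100.0
grads = relu_grad(inputs @ w + b)
print(grads.sum())  # 0.0
```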
4. Leaky ReLU -
Leaky ReLU is a variant of ReLU that uses a small slope for negative inputs instead of zero:
$f(x) = x$ for $x > 0$, and $f(x) = \alpha x$ for $x \le 0$.
Here, alpha is typically set to 0.01. Because the gradient for negative inputs is alpha rather than 0, Leaky ReLU avoids the “dying ReLU” problem. Alpha is kept small and is not set close to 1, because at alpha = 1 the function becomes the identity, i.e. purely linear.
If alpha is instead learned during training as a parameter (for example, one per neuron or per channel), the function becomes PReLU, the parametric ReLU.
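A sketch of Leaky ReLU and its gradient, using the alpha = 0.01 value mentioned above as the default:

```python
import numpy as np


def leaky_relu(x, alpha=0.01):
    # Identity for positive inputs, small slope alpha for the rest.
    return np.where(x > 0, x, alpha * x)


def leaky_relu_grad(x, alpha=0.01):
    # The gradient is never exactly 0, so negative inputs still receive small updates.
    return np.where(x > 0, 1.0, alpha)


x = np.array([-2.0, -0.5, 0.5, 2.0])
print(leaky_relu(x).tolist())       # [-0.02, -0.005, 0.5, 2.0]
print(leaky_relu_grad(x).tolist())  # [0.01, 0.01, 1.0, 1.0]

# With alpha = 1 the function is just the identity, i.e. purely linear.
print(np.array_equal(leaky_relu(x, alpha=1.0), x))  # True
```

For PReLU, alpha would be a trainable parameter updated by backpropagation instead of a fixed argument; PyTorch, for example, provides both `torch.nn.LeakyReLU` and `torch.nn.PReLU`.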
5. ReLU6 -
ReLU6 is a ReLU whose output is also capped on the positive side: $f(x) = \min(\max(0, x), 6)$.
[Figure: ReLU6. Image credit: PyTorch]
Capping the output keeps activations bounded for large positive inputs, which prevents them (and the gradients computed from them) from growing without limit.
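A minimal sketch of ReLU6 and its gradient. The gradient convention at the boundaries 0 and 6 is an assumption made here for simplicity.

```python
import numpy as np


def relu6(x):
    # ReLU6(x) = min(max(0, x), 6): zero for negative inputs, capped at 6 above.
    return np.minimum(np.maximum(0.0, x), 6.0)


def relu6_grad(x):
    # Assumed convention: 1 strictly inside (0, 6), 0 everywhere else.
    return ((x > 0) & (x < 6)).astype(float)


x = np.array([-2.0, 3.0, 6.0, 100.0])
print(relu6(x).tolist())       # [0.0, 3.0, 6.0, 6.0]
print(relu6_grad(x).tolist())  # [0.0, 1.0, 0.0, 0.0]
```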
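6. Exponential Linear Unit (ELU) -
ELU is another variant of ReLU that changes the negative part of the function: $f(x) = x$ for $x > 0$, and $f(x) = \alpha(e^{x} - 1)$ for $x \le 0$, so negative outputs approach $-\alpha$ smoothly instead of being cut to 0.
Since its gradient is non-zero for negative inputs, ELU also avoids the dying ReLU problem. However, it places no upper bound on the activations for large positive inputs, so it can still suffer from exploding activations and gradients.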
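A sketch of ELU, assuming alpha = 1.0 (a common default; it is also the default of PyTorch's `torch.nn.ELU`). It shows the negative part saturating at -alpha, a gradient that stays positive for negative inputs, and no cap on large positive inputs:

```python
import numpy as np


def elu(x, alpha=1.0):
    # x for positive inputs, alpha * (exp(x) - 1) otherwise.
    # np.where evaluates both branches, so np.minimum(x, 0) keeps exp()
    # from overflowing on large positive inputs.
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))


def elu_grad(x, alpha=1.0):
    # 1 for positive inputs, alpha * exp(x) (always > 0) otherwise.
    return np.where(x > 0, 1.0, alpha * np.exp(np.minimum(x, 0.0)))


x = np.array([-10.0, -1.0, 0.0, 2.0, 1000.0])
print(elu(x))       # negative side approaches -alpha, positive side is unbounded
print(elu_grad(x))  # strictly positive everywhere, so no unit is permanently dead
```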
7. Softmax activation function -
Softmax is typically used as the activation of the last layer of a network. It converts the raw outputs $z$ into a probability distribution over the classes, $\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$: every output lies between 0 and 1 and the outputs sum to 1, so each value can be read as the probability that the input belongs to that class. It is widely used for multi-class classification problems.
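A numerically stable softmax sketch. Subtracting the maximum logit before exponentiating is a standard trick that leaves the result unchanged; the logits below are hypothetical scores for a 3-class problem.

```python
import numpy as np


def softmax(logits):
    # Subtracting the max cancels out in the ratio but prevents overflow in exp().
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)


# Hypothetical raw scores (logits) from the last layer for a 3-class problem.
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs.round(3))          # [0.659 0.242 0.099]
print(int(np.argmax(probs)))   # 0, the predicted class

# Large logits would overflow a naive exp(); the shifted version stays finite.
print(softmax(np.array([1000.0, 1000.0])))  # [0.5 0.5]
```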
I hope you enjoyed reading this. I have tried to cover the activation functions most commonly used in neural networks.
To learn more, visit our other pages:
Website: https://coffeebeans.io/
Blogs: https://coffeebeans.io/blogs
