The field of artificial neural networks is often just called neural networks or multi-layer perceptrons after perhaps the most useful type of neural network. A perceptron is a single neuron model that was a precursor to larger neural networks.
It is a field that investigates how simple models of biological brains can be used to solve difficult computational tasks like the predictive modeling tasks we see in machine learning. The goal is not to create realistic models of the brain but instead to develop robust algorithms and data structures that we can use to model difficult problems.
The power of neural networks comes from their ability to learn the representation in your training data and how best to relate it to the output variable you want to predict. In this sense, neural networks learn a mapping. Mathematically, they can approximate a very wide class of mapping functions, and feed-forward networks with enough hidden units have been proven to be universal approximators.
The predictive capability of neural networks comes from the hierarchical or multi-layered structure of the networks. The data structure can pick out (learn to represent) features at different scales or resolutions and combine them into higher-order features, for example, from lines to collections of lines to shapes.
3. “A single microscopic brain cell cannot think
and is not conscious, but if you bring in a few
more brain cells and a few more, and connect
them all, at a certain point, the group itself will
be able to think and experience emotions and
have opinions and a personality and know that
it exists.”
- Michael Stevens [1]
9/3/20XX Presentation Title 3
4. Biological Neuron
• Neurons talk to each other by sending electrical impulses from one cell to another. [1]
• When this electrical impulse reaches a certain threshold, the neuron fires. [1]
5. Artificial Neuron
• An artificial neuron mimics a biological neuron.
• Typically includes multiple inputs, each with an
associated weight. [2]
• It computes a weighted sum of inputs, applies an
activation function, and produces an output. [2]
• The output of an artificial neuron is a continuous
value. [2]
Fig: Artificial neuron [3]
6. Perceptron
• A perceptron is a linear binary classifier. [4]
• A perceptron is a type of artificial neuron.
• An artificial neuron generally produces a continuous value, whereas a perceptron produces only a binary output. [5]
• A perceptron's output therefore always sits at one of the extreme values, 0 or 1. [5]
• General neurons use softer activation functions such as the sigmoid function, the hyperbolic tangent (tanh), and the Rectified Linear Unit (ReLU). These produce continuous outputs: sigmoid in (0, 1), tanh in (-1, 1), and ReLU in [0, ∞). [6]
• A perceptron uses a step function, such as the Heaviside step function, as its activation function. The output is binary: either 0 or 1. [7]
• A perceptron still qualifies as a form of artificial neuron because it shares the same core characteristics: weighted inputs, a summation, and an activation function. [2]
7. “All perceptrons are neurons, but not all neurons are perceptrons”
8. Perceptron Structure
A perceptron has the following components.
• Input: Each input node takes a binary value of 1 or 0. [8]
• Weight: Represents the importance of each input. [8]
• Bias: Shifts the point at which the activation function switches between 0 and 1. [9]
• Summation function: Computes the weighted sum of the inputs.
• Activation function: Outputs 1 if the summation function returns a value greater than or equal to the threshold; otherwise outputs 0.
• Output: The output of the activation function.
Fig: Perceptron [9]
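The components listed above can be summarized in one expression. This is only a notational sketch: the symbols x_i (inputs), w_i (weights), b (bias), θ (threshold), and n (number of inputs) are chosen here for illustration and do not appear on the slide.

```latex
y =
\begin{cases}
1, & \text{if } \sum_{i=1}^{n} w_i x_i + b \ge \theta \\
0, & \text{otherwise}
\end{cases}
```

Many texts fold the threshold into the bias, so the test becomes a comparison against zero; the slide keeps the two separate, and the expression follows that choice.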
9. Perceptron in action
Here are the preferences of a customer and details of 2 restaurants. Threshold = 1
| Node | Criteria | Personal preference (W) | Restaurant A | Restaurant B |
|---|---|---|---|---|
| A | Good food | 0.8 | Yes (1) | Yes (1) |
| B | Friends will come | 0.6 | Yes (1) | No (0) |
| C | Cheap | 0.4 | No (0) | Yes (1) |
| D | Noisy environment | -0.5 | Yes (1) | No (0) |
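The restaurant decision can be worked through in a few lines of Python. This is a minimal sketch that uses the weights, inputs, and threshold from the slide; the dictionary keys and the function name `perceptron` are illustrative choices, and no bias term is used because the slide does not give one for this example.

```python
# Weights (personal preferences) and threshold taken from the slide.
WEIGHTS = {"good_food": 0.8, "friends_will_come": 0.6, "cheap": 0.4, "noisy": -0.5}
THRESHOLD = 1.0


def perceptron(inputs, weights=WEIGHTS, threshold=THRESHOLD):
    """Return (weighted_sum, output) using a step activation."""
    total = sum(weights[name] * inputs[name] for name in weights)
    return total, 1 if total >= threshold else 0


# Binary inputs for each restaurant, as listed in the table.
restaurant_a = {"good_food": 1, "friends_will_come": 1, "cheap": 0, "noisy": 1}
restaurant_b = {"good_food": 1, "friends_will_come": 0, "cheap": 1, "noisy": 0}

sum_a, out_a = perceptron(restaurant_a)
sum_b, out_b = perceptron(restaurant_b)
print(f"Restaurant A: sum={sum_a:.1f}, output={out_a}")  # sum=0.9, output=0
print(f"Restaurant B: sum={sum_b:.1f}, output={out_b}")  # sum=1.2, output=1
```

Restaurant A scores 0.8 + 0.6 - 0.5 = 0.9, which is below the threshold, so the perceptron does not fire. Restaurant B scores 0.8 + 0.4 = 1.2, which clears the threshold, so the perceptron outputs 1.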
10. Multilayer Perceptron
• A multilayer perceptron (MLP) is a supervised learning model. [10]
• An MLP is a fully connected, feed-forward artificial neural network. [10]
• An MLP is typically trained with the backpropagation algorithm, which is why the model is sometimes loosely referred to by that name. [11]
Fig: A taxonomy of neural network architectures [10]
11. Artificial Neural Network
• An artificial neural network (ANN) is a machine-learning model designed to mimic the function and structure of the human brain.
• ANNs are composed of multiple nodes, which imitate the biological neurons of the human brain. [13]
• The nodes take input data and perform simple operations on the data. [13]
• Each link between two nodes is associated with a weight.
• Nodes are arranged in layers: an input layer, one or more hidden layers, and an output layer.
• ANNs learn by altering their weight values. [13]
Fig: Neural Network [17]
12. How ANN Works [14]
• The input neurons multiply the input values by their weights and pass them to the neurons in the hidden layer.
• Each neuron in the hidden layer is associated with a numerical value called the bias.
• Hidden neurons add the weighted values and the bias, then pass the result to an activation function.
• The result of the activation function determines whether the neuron is activated.
• Activated neurons transmit their values to the neurons in the next layer.
• In this manner the data is propagated from the input layer to the output layer. This is called forward propagation.
• In the output layer, the neuron with the highest value fires and determines the output.
• The error at the output layer is propagated back through the network, and the weights and biases are modified to reduce it. This is called backpropagation.
13. The Stilwell Brain Experiment [1]
• Human Neural Network
• Experiment by Michael Stevens
• YouTube Channel: Vsauce
• Name: The Stilwell Brain
• Series: Mind Field
• Season: 3
• Episode: 3
• Each person acts like a neuron.
• Several people are arranged in layers.
• They fire by raising a flag.
• Each layer gradually identifies more
complex patterns.
14. The Stilwell Brain [1]
• Stevens writes a number on a sheet of paper and divides the paper into several pieces.
• He gives each person in the input layer a piece of the paper.
• The people forming the neural network don’t know the number.
• Each piece of paper represents a pixel.
15. The Stilwell Brain [1]
• People in the first hidden layer
identify very basic lines.
16. The Stilwell Brain [1]
• The second hidden layer identifies more complex patterns, such as an angle.
17. The Stilwell Brain [1]
• The third hidden layer identifies several angles, and a pattern starts to emerge.
18. The Stilwell Brain [1]
• The output layer identifies their
designated patterns and the
person associated with the
number raises the flag.
19. Feed Forward Neural Network [15]
• The connections between units do not form a cycle.
• Information only travels forward in the network.
• Used in classification problems where the data is not
sequential or time-dependent.
Fig: Feed Forward Neural Network [15]
20. Fully connected Neural Network
• Every neuron in one layer is connected to every neuron in the next layer. [16]
• Training a fully connected network takes a long time because a large number of parameters must be updated. [2]
• Storing and manipulating a large number of weights and activations consumes a significant amount of memory. [2]
Fig: Fully Connected Neural Network [16]
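To give a feel for how quickly the parameter count grows in a fully connected network, the sketch below counts weights and biases for a given list of layer sizes. The function name `count_parameters` and the layer sizes are hypothetical examples, not values from the slides.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected feed-forward network."""
    total = 0
    # Each pair of adjacent layers contributes n_in * n_out weights
    # plus one bias per neuron in the receiving layer.
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


# Hypothetical network: 784 inputs, one hidden layer of 128, 10 outputs.
print(count_parameters([784, 128, 10]))  # 101770
```

Even this small illustrative network has over one hundred thousand parameters, each of which is updated on every training step.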
23. Activation functions in a nutshell
• The activation function calculates the output of a neuron. [18]
| | Sigmoid Function | ReLU Function | Softmax Function |
|---|---|---|---|
| Equation | $A(x) = \frac{1}{1 + e^{-x}}$ | $A(x) = \max(0, x)$ | $A(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{k} e^{x_j}}$ |
| Nature | Non-linear | Non-linear | Non-linear |
| Output | 0 to 1 | 0 to ∞ | 0 to 1 |
| Uses | Usually used in the output layer for binary classification | Usually used in the hidden layers of a neural network | Usually used when trying to handle multiple classes |
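The three activation functions can be written in a few lines of plain Python. This is a minimal sketch using only the standard library; the softmax subtracts the largest input before exponentiating, which is a common numerical-stability step that is not shown on the slide and does not change the result.

```python
import math


def sigmoid(x):
    """Squash x into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Pass positive values through and clamp negatives to 0."""
    return max(0.0, x)


def softmax(xs):
    """Turn a list of scores into probabilities that sum to 1."""
    largest = max(xs)  # stability shift (an addition to the slide's formula)
    exps = [math.exp(x - largest) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


print(sigmoid(0.0))            # 0.5
print(relu(-2.0), relu(3.0))   # 0.0 3.0
probs = softmax([1.0, 2.0, 3.0])
print(probs, sum(probs))
```

The softmax output keeps the ordering of the inputs, so the largest score receives the largest probability.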
24. How Multilayer Perceptron Works?
• It works in two phases: forward propagation and backpropagation.
• Forward propagation is where prediction occurs.
• Backpropagation is where learning occurs.
• It uses non-linear activation functions such as sigmoid, tanh, and ReLU to fit non-linear data.
Fig: Neural Network [12]
25. Forward Propagation [19]
• The journey runs from the input layer to the output layer.
• Consider an MLP with the following inputs, initial weights, and biases.
• The initial weights are generated randomly.
• The activation function is the sigmoid.
Outputs of the hidden layer neurons:
$net_{h1} = w_1 \cdot i_1 + w_2 \cdot i_2 + b_1 = 0.15 \cdot 0.05 + 0.2 \cdot 0.1 + 0.35 = 0.3775$
$out_{h1} = \frac{1}{1 + e^{-net_{h1}}} = \frac{1}{1 + e^{-0.3775}} = 0.593$
$net_{h2} = w_3 \cdot i_1 + w_4 \cdot i_2 + b_1 = 0.25 \cdot 0.05 + 0.3 \cdot 0.1 + 0.35 = 0.3925$
$out_{h2} = \frac{1}{1 + e^{-net_{h2}}} = \frac{1}{1 + e^{-0.3925}} = 0.597$
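The hidden-layer arithmetic can be reproduced with a short script. This sketch uses the inputs, weights, and bias that appear in the equations on this slide; the variable names simply mirror the slide's symbols.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Inputs, weights, and the shared hidden-layer bias from the slide.
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
b1 = 0.35

# Weighted sum (net input) for each hidden neuron.
net_h1 = w1 * i1 + w2 * i2 + b1
net_h2 = w3 * i1 + w4 * i2 + b1

# Sigmoid activation gives each neuron's output.
out_h1 = sigmoid(net_h1)
out_h2 = sigmoid(net_h2)

print(round(net_h1, 4), round(out_h1, 3))  # 0.3775 0.593
print(round(net_h2, 4), round(out_h2, 3))  # 0.3925 0.597
```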
27. Error Function
• A measure that quantifies the difference between the predicted output and the actual output.
• Also known as the loss function or the cost function.
• Backpropagation tries to minimize the error function.
• Common Error Functions:
• Mean Squared Error (MSE): Commonly used for regression problems. It calculates the average of
the squared differences between predicted and actual values. [20]
$MSE(p, y) = \frac{1}{n} \sum_{i=1}^{n} (y_i - p_i)^2$
• Binary Cross-Entropy Loss: Frequently used for binary classification tasks, measuring the difference
between predicted probabilities and true binary labels. [21]
$BCE(p, y) = -[y \log(p) + (1 - y) \log(1 - p)]$
• Categorical Cross-Entropy Loss: Often used for multi-class classification problems. It calculates the
difference between predicted probabilities and true class labels. [22]
$CCE(p, y) = -\sum_{i} y_i \log(p_i)$
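The three error functions can be sketched in plain Python as follows. The function names and the sample values are illustrative assumptions. Here `p` holds predictions and `y` holds true values, matching the symbols in the formulas. The categorical version skips classes whose true label is 0, since those terms contribute nothing to the sum.

```python
import math


def mse(p, y):
    """Mean squared error over paired predictions p and targets y."""
    return sum((yi - pi) ** 2 for pi, yi in zip(p, y)) / len(y)


def bce(p, y):
    """Binary cross-entropy for one predicted probability p and label y (0 or 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def cce(p, y):
    """Categorical cross-entropy for probabilities p and one-hot labels y."""
    return -sum(yi * math.log(pi) for pi, yi in zip(p, y) if yi > 0)


print(mse([0.5, 1.0], [1.0, 1.0]))    # 0.125
print(bce(0.5, 1))                    # log(2), about 0.693
print(cce([0.25, 0.75], [0, 1]))      # -log(0.75), about 0.288
```

All three return 0 for a perfect prediction and grow as the prediction moves away from the target.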
29. The Chain Rule
• The chain rule is a fundamental calculus principle used extensively in backpropagation.
• It is extremely important for the multilayer perceptron.
• It determines the relation between a weight and the error.
• $\frac{\partial E_{total}}{\partial w_5}$ is the rate of change in error with respect to a change in the weight.
• $\frac{\partial E_{total}}{\partial out_{o1}}$ is the rate of change in error with respect to a change in the output value of a neuron.
• $\frac{\partial out_{o1}}{\partial net_{o1}}$ is the rate of change in the output value of a neuron with respect to a change in its net value.
• $\frac{\partial net_{o1}}{\partial w_5}$ is the rate of change in the net value with respect to a change in the weight.
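Written out, the chain rule ties the four quantities above together as a product. The expression below uses only the symbols already defined on this slide.

```latex
\frac{\partial E_{total}}{\partial w_5}
  = \frac{\partial E_{total}}{\partial out_{o1}}
    \cdot \frac{\partial out_{o1}}{\partial net_{o1}}
    \cdot \frac{\partial net_{o1}}{\partial w_5}
```

Each factor answers one local question (how the error reacts to the output, the output to the net input, and the net input to the weight), and multiplying them gives the effect of the weight on the error.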
30. Backpropagation
• Change in error with respect to the output:
$E_{total} = \frac{1}{2}[(target_{o1} - out_{o1})^2 + (target_{o2} - out_{o2})^2]$
$\frac{\partial E_{total}}{\partial out_{o1}} = 2 \cdot \frac{1}{2}(target_{o1} - out_{o1}) \cdot (-1) + 0 = (0.01 - 0.75) \cdot (-1) = 0.74$
• Change in output with respect to net input:
$out_{o1} = \frac{1}{1 + e^{-net_{o1}}}$
$\frac{\partial out_{o1}}{\partial net_{o1}} = out_{o1}(1 - out_{o1})$ [23]
$= 0.75(1 - 0.75) = 0.1875$
• Change in net input with respect to the weight:
$net_{o1} = w_5 \cdot out_{h1} + w_6 \cdot out_{h2} + b_2$
$\frac{\partial net_{o1}}{\partial w_5} = 1 \cdot out_{h1} + 0 + 0 = 0.593$
• Change in error with respect to the weight:
$\frac{\partial E_{total}}{\partial w_5} = 0.74 \cdot 0.1875 \cdot 0.593 = 0.082$
Fig: Network with neuron outputs $out_{h1} = 0.593$, $out_{h2} = 0.597$, $out_{o1} = 0.75$, $out_{o2} = 0.77$
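The gradient for w5 can be checked with a few lines of Python. This sketch takes the rounded values from the slide (target 0.01, output 0.75, hidden output 0.593) as given; the variable names are illustrative.

```python
# Rounded values taken from the slide.
target_o1 = 0.01
out_o1 = 0.75
out_h1 = 0.593

# The three chain-rule factors.
dE_dout = -(target_o1 - out_o1)       # 0.74
dout_dnet = out_o1 * (1.0 - out_o1)   # 0.1875 (sigmoid derivative)
dnet_dw5 = out_h1                     # 0.593

# Their product is the gradient of the error with respect to w5.
dE_dw5 = dE_dout * dout_dnet * dnet_dw5
print(round(dE_dw5, 3))  # 0.082
```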
31. What does it mean?
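• What is the meaning of $\frac{\partial E_{total}}{\partial w_5} = 0.082$?
$\Rightarrow \partial E_{total} = 0.082 \cdot \partial w_5$
• So, for small changes, $\partial E_{total}$ and $\partial w_5$ are proportionally related.
• The coefficient of proportionality is 0.082.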
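One way to see this proportionality is to nudge w5 by a small amount and measure how much the error moves. The sketch below does that under a stated assumption: the slides do not list w6 or b2 here, so their combined contribution is lumped into a hypothetical constant `rest`, chosen so that the output is exactly 0.75 when w5 = 0.4. Only the o1 term of the error is modeled, since that is the only term that depends on w5.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


target_o1, out_h1, w5 = 0.01, 0.593, 0.4

# Hypothetical stand-in for w6 * out_h2 + b2, picked so that out_o1 = 0.75 at w5 = 0.4.
rest = math.log(0.75 / 0.25) - w5 * out_h1


def error_o1(w):
    """Error contribution of output neuron o1 as a function of w5 alone."""
    out_o1 = sigmoid(w * out_h1 + rest)
    return 0.5 * (target_o1 - out_o1) ** 2


# A small change in the weight produces a proportional change in the error.
delta_w = 0.01
delta_e = error_o1(w5 + delta_w) - error_o1(w5)
print(round(delta_e / delta_w, 3))  # approximately 0.082
```

The ratio of the two changes comes out close to the analytic gradient, which is what the proportionality statement means in practice.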
32. Learning Rate
• The learning rate controls how much the model changes in response to the estimated error each time the model weights are updated. [24]
• A very large learning rate can overshoot the optimal values.
• A very small learning rate can cause the optimization process to converge extremely slowly.
• The learning rate is a hyperparameter of the gradient descent algorithm; it is chosen by tuning rather than learned by gradient descent itself.
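The effect of the learning rate can be illustrated on a toy problem. The sketch below runs gradient descent on f(w) = w², whose minimum is at w = 0. The function, the starting point, and the three rates are illustrative assumptions, not values from the slides.

```python
def descend(learning_rate, steps=20, start=1.0):
    """Run gradient descent on f(w) = w**2, whose gradient is 2 * w."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * 2 * w
    return w


# Too small: barely moves. Moderate: converges. Too large: overshoots and diverges.
for rate in (0.001, 0.3, 1.1):
    print(rate, descend(rate))
```

With the smallest rate the weight is still close to its starting value after 20 steps, with the moderate rate it is very close to 0, and with the largest rate each step overshoots the minimum by more than the last, so the weight grows instead of shrinking.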
33. Continuing Backpropagation
• To decrease the error, we then update the weight.
• Assume the learning rate, $\mu = 0.5$.
$w_5^{+} = w_5 - \mu \cdot \frac{\partial E_{total}}{\partial w_5} = 0.4 - 0.5 \cdot 0.082 = 0.359$
Similarly, we get
$w_6^{+} = 0.408$, $w_7^{+} = 0.511$, $w_8^{+} = 0.561$
$w_1^{+} = 0.149$, $w_2^{+} = 0.199$, $w_3^{+} = 0.249$, $w_4^{+} = 0.299$
• Repeat this process until the desired accuracy is reached.
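The update rule itself is a one-line function. This sketch applies it to w5 using the rounded gradient 0.082 and the learning rate 0.5 from the slide; the function name `update_weight` is an illustrative choice.

```python
def update_weight(weight, gradient, learning_rate):
    """Move the weight a small step against the gradient."""
    return weight - learning_rate * gradient


w5 = 0.4
w5_new = update_weight(w5, gradient=0.082, learning_rate=0.5)
print(round(w5_new, 3))  # 0.359
```

With the rounded gradient, 0.4 - 0.5 × 0.082 evaluates to 0.359. In a full training loop the same function would be applied to every weight after each backward pass, using that weight's own gradient.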
34. Importance of MLP
• Can fit highly non-linear datasets. [10]
• Can automatically learn sophisticated features from raw input data.
• Versatile: can be used for classification, regression, pattern recognition, time series analysis, and many other tasks.
• Scalable: can be applied to anything from small datasets to very large ones.
• MLPs serve as fundamental building blocks in deep learning. By stacking multiple hidden layers,
MLPs become deep neural networks (DNNs).
• MLPs have been at the forefront of neural network research and innovations.
• MLPs are widely used in practical applications such as computer vision, natural language
processing, speech recognition, finance, healthcare, recommendation systems, robotics, and more.
35. References
• [1] “The stilwell brain,” YouTube, https://www.youtube.com/watch?v=rA5qnZUXcqo (accessed Oct. 2, 2023).
• [2] “Chatgpt,” ChatGPT, https://openai.com/chatgpt (accessed Oct. 2, 2023).
• [3] RainerGewalt and Name, “Perceptrons - these artificial neurons are the fundamentals of Neural Networks,” Fly spaceships with your mind, https://starship-knowledge.com/neural-networks-perceptrons (accessed Oct. 3, 2023).
• [4] “What is a feedforward fully connected neural network and how to implement it in Matlab?,” Saturn Cloud Blog, https://saturncloud.io/blog/what-is-a-feedforward-fully-connected-neural-network-and-how-to-implement-it-in-matlab/ (accessed Oct. 4, 2023).
• [5] P. King, “What is the difference between the ‘neurons’ in an artificial neural network (ANN) and ‘perceptrons’?,” Quora, https://www.quora.com/What-is-the-difference-between-the-neurons-in-an-artificial-neural-network-ANN-and-perceptrons (accessed Oct. 2, 2023).
• [6] “Activation functions in neural networks,” GeeksforGeeks, https://www.geeksforgeeks.org/activation-functions-neural-networks/ (accessed Oct. 2, 2023).
• [7] “Perceptron in Machine Learning - Javatpoint,” www.javatpoint.com, https://www.javatpoint.com/perceptron-in-machine-learning (accessed Oct. 3, 2023).
• [8] Perceptrons, https://www.w3schools.com/ai/ai_perceptrons.asp (accessed Oct. 3, 2023).
• [9] A. Hange, “Flux prediction using single-layer perceptron and Multilayer Perceptron,” Medium, https://medium.com/nerd-for-tech/flux-prediction-using-single-layer-perceptron-and-multilayer-perceptron-cf82c1341c33 (accessed Oct. 3, 2023).
• [10] M. W. Gardner and S. R. Dorling, “Artificial Neural Networks (the multilayer perceptron)—a review of applications in the Atmospheric Sciences,” Atmospheric Environment, vol. 32, no. 14–15, pp. 2627–2636, 1998. doi:10.1016/s1352-2310(97)00447-0.
• [11] “Perceptron in Machine Learning - Javatpoint,” www.javatpoint.com, https://www.javatpoint.com/perceptron-in-machine-learning (accessed Oct. 3, 2023).
• [12] Minesanalytix, “Demystifying Data-Driven Neural Networks for multivariate production analysis,” Data Analytix Association @ Mines, https://orgs.mines.edu/daa/blog/2019/08/05/neural-networks-mva/ (accessed Oct. 4, 2023).
• [13] “Artificial Intelligence - Neural Networks,” Online Courses and eBooks Library, https://www.tutorialspoint.com/artificial_intelligence/artificial_intelligence_neural_networks.htm (accessed Oct. 4, 2023).
• [14] “Neural network in 5 minutes | what is a neural network? | how neural networks work | Simplilearn,” YouTube, https://www.youtube.com/watch?v=bfmFfD2RIcg (accessed Oct. 4, 2023).
• [15] “Feedforward Neural Networks,” Brilliant Math & Science Wiki, https://brilliant.org/wiki/feedforward-neural-networks/ (accessed Oct. 4, 2023).
• [16] P. Mahajan, “Fully connected vs Convolutional Neural Networks,” Medium, https://medium.com/swlh/fully-connected-vs-convolutional-neural-networks-813ca7bc6ee5 (accessed Oct. 4, 2023).
• [17] “A beginner’s guide to keras: Digit recognition in 30 minutes,” SitePoint, https://www.sitepoint.com/keras-digit-recognition-tutorial/ (accessed Oct. 4, 2023).
• [18] “Activation function,” Wikipedia, https://en.wikipedia.org/wiki/Activation_function (accessed Oct. 4, 2023).
• [19] Mazur, “A step by step backpropagation example,” Matt Mazur, https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/ (accessed Oct. 5, 2023).
• [20] “Mean squared error,” Wikipedia, https://en.wikipedia.org/wiki/Mean_squared_error (accessed Oct. 5, 2023).
• [21] “A practical guide to binary cross-entropy and Log Loss,” Aporia, https://www.aporia.com/learn/understanding-binary-cross-entropy-and-log-loss-for-effective-model-monitoring/ (accessed Oct. 5, 2023).
• [22] Neuralthreads, “Categorical cross-entropy loss-the most important loss function,” Medium, https://neuralthreads.medium.com/categorical-cross-entropy-loss-the-most-important-loss-function-d3792151d05b (accessed Oct. 5, 2023).
• [23] “Logistic function,” Wikipedia, https://en.wikipedia.org/wiki/Logistic_function#Derivative (accessed Oct. 5, 2023).
• [24] J. Brownlee, “Understand the impact of learning rate on neural network performance,” MachineLearningMastery.com, https://machinelearningmastery.com/understand-the-dynamics-of-learning-rate-on-deep-learning-neural-networks/ (accessed Oct. 5, 2023).
• [25] “Educative answers - trusted answers to developer questions,” Educative, https://www.educative.io/answers/what-is-forward-propagation-in-neural-networks (accessed Oct. 5, 2023).