MODULE 1:
FUNDAMENTALS OF
NEURAL NETWORK
OUTLINE
• Biological neuron
• Mc-Culloch Pitts Neuron (MP Neuron)
• Boolean Functions using MP Neuron
• Perceptron and Perceptron Learning
• Delta learning
• Multilayer Perceptron: Linearly separable, linearly non-separable classes
• Brief History, Three Classes of Deep Learning
BIOLOGICAL NEURON
SIMPLIFIED ILLUSTRATION OF HUMAN RESPONSE
TO A STIMULUS
BIOLOGICAL NEURON
• The most fundamental unit of deep
neural networks is called an artificial
neuron/perceptron.
• But the very first step towards the
perceptron we use today was taken in
1943 by McCulloch and Pitts, by
mimicking the functionality of a
biological neuron.
BIOLOGICAL NEURON
• A tiny piece of the brain, about the size of a grain of rice, contains over 10,000
neurons, each of which forms an average
of 6,000 connections with other
neurons.
• The neuron is optimized to receive
information from other neurons, process
this information in a unique way, and
send its result to other cells.
BIOLOGICAL NEURON
• Dendrite: Receives signals from other neurons
• Soma: Processes the information
• Axon: Transmits the output of this neuron
• Synapse: Point of connection to other neurons
BIOLOGICAL NEURON
• The neuron receives its inputs along antennae-like
structures called dendrites.
• Each of these incoming connections is dynamically
strengthened or weakened based on how often it is
used.
• It’s the strength of each connection that
determines the contribution of the input to the
neuron’s output.
• After being weighted by the strength of their
respective connections, the inputs are summed
together in the cell body.
• This sum is then transformed into a new signal
that’s propagated along the cell’s axon and sent off
to other neurons.
MC-CULLOCH PITTS NEURON
MC-CULLOCH PITTS NEURON
• The first computational model of a neuron was proposed by Warren McCulloch
(neuroscientist) and Walter Pitts (logician) in 1943.
• It may be divided into two parts: the first part, g, takes an input and performs an
aggregation; based on the aggregated value, the second part, f, makes a decision.
MC-CULLOCH PITTS NEURON
• Let's suppose that I want to predict my own
decision: whether or not to watch a random
football game on TV.
• The inputs are all boolean i.e., {0,1} and my
output variable is also boolean {0: Won’t watch it, 1:
Will watch it}.
• So, x1 could be isPremierLeagueOn (I like Premier
League more)
• x2 could be isItAFriendlyGame (I tend to care less
about the friendlies)
• x3 could be isNotHome (Can’t watch it when I’m
running errands. Can I?)
• x4 could be isManUnitedPlaying (I am a big Man
United fan. GGMU!) and so on.
MC-CULLOCH PITTS NEURON
• These inputs can either be excitatory or
inhibitory.
• Inhibitory inputs have maximum effect on the
decision making, irrespective of the other inputs:
if x_3 is 1 (not home), then my output will
always be 0, i.e., the neuron will never fire,
so x_3 is an inhibitory input.
• Excitatory inputs cannot make the neuron fire
on their own, but they might fire it when
combined together. Formally, this is what is going
on:
MC-CULLOCH PITTS NEURON
• We can see that g(x) is just doing a sum
of the inputs — a simple aggregation.
• And theta here is called the thresholding
parameter.
• For example, if I always watch the game
when the sum turns out to be 2 or more,
then theta is 2. This is called
Thresholding Logic.
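To make the thresholding logic concrete, here is a minimal Python sketch of an M-P neuron. The function name and the representation of inhibitory inputs as a tuple of indices are illustrative choices, not part of the original 1943 formulation:

    def mp_neuron(inputs, theta, inhibitory=()):
        """McCulloch-Pitts neuron for boolean inputs in {0, 1}.

        Any active inhibitory input forces the output to 0; otherwise the
        neuron fires when the aggregation g(x) = sum(inputs) reaches theta.
        """
        if any(inputs[i] == 1 for i in inhibitory):
            return 0                       # inhibition overrides everything
        g = sum(inputs)                    # first part: aggregation g(x)
        return 1 if g >= theta else 0      # second part: decision f(g(x))

    # Football example: x1=isPremierLeagueOn, x2=isItAFriendlyGame,
    # x3=isNotHome (inhibitory), x4=isManUnitedPlaying, theta = 2
    print(mp_neuron([1, 0, 0, 1], theta=2, inhibitory=(2,)))  # 1: watch
    print(mp_neuron([1, 0, 1, 1], theta=2, inhibitory=(2,)))  # 0: not home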
MC-CULLOCH PITTS NEURON
BOOLEAN FUNCTIONS USING M-P NEURON
M-P NEURON: A CONCISE REPRESENTATION
• This representation just denotes that, for the boolean
inputs x_1, x_2 and x_3, if g(x), i.e., the sum, is ≥ theta, the neuron will fire;
otherwise, it won't.
AND FUNCTION
• An AND function neuron would only fire when ALL the inputs are ON i.e., g(x)
≥ 3 here.
OR FUNCTION
• An OR function neuron would fire if ANY of the inputs is ON i.e., g(x) ≥ 1
here.
NOR FUNCTION
• For a NOR neuron to fire, we want ALL the inputs to be 0, so the thresholding
parameter should also be 0, i.e., sum of inputs = theta = 0
• The circle at the end indicates an inhibitory input: if any inhibitory input is 1, the
output will be 0
NOT FUNCTION
• For a NOT neuron, an input of 1 outputs 0 and an input of 0 outputs 1.
• Fires when sum of inputs = theta = 0
• The single input is inhibitory
NAND FUNCTION
• For a NAND neuron to fire, we want at least one of the inputs to be 0, so the
thresholding parameter should be 1
• Fires when sum of inputs ≤ theta = 1
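Under this convention, the gates above differ only in the threshold and in which inputs are marked inhibitory. A brief sketch reusing the mp_neuron function from the earlier slide (NAND is noted rather than implemented because, as the slide says, it uses the reversed test sum ≤ theta):

    from itertools import product

    def and3(x): return mp_neuron(x, theta=3)                       # all 3 inputs on
    def or3(x):  return mp_neuron(x, theta=1)                       # any input on
    def not1(x): return mp_neuron(x, theta=0, inhibitory=(0,))      # single inhibitory input
    def nor3(x): return mp_neuron(x, theta=0, inhibitory=(0, 1, 2)) # every input inhibitory
    # NAND: fire when sum(x) <= theta = 1 (reversed comparison)

    for x in product([0, 1], repeat=3):
        print(x, "AND:", and3(x), "OR:", or3(x), "NOR:", nor3(x))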
LIMITATIONS OF M-P NEURON
• What about non-boolean (say, real)
inputs?
• Boolean and non-boolean functions
which are not linearly separable
• Do we always need to hand code the
threshold?
• Are all inputs equal? What if we want to
assign more importance to some inputs?
OVERCOMING THE LIMITATIONS OF THE M-P NEURON,
FRANK ROSENBLATT, AN AMERICAN PSYCHOLOGIST,
PROPOSED THE CLASSICAL PERCEPTRON MODEL, THE
MIGHTY ARTIFICIAL NEURON, IN 1958.
FURTHER REFINED AND CAREFULLY ANALYZED BY
MINSKY AND PAPERT (1969) — THEIR MODEL IS
REFERRED TO AS THE PERCEPTRON MODEL.
PERCEPTRON
PERCEPTRON: THE ARTIFICIAL NEURON
• The most fundamental unit of a deep
neural network is called an artificial
neuron, which takes an input, processes it,
passes it through an activation function
like the sigmoid, and returns the activated
output.
• We are only going to talk about the
perceptron model proposed before the
‘activation’ part came into the picture.
PERCEPTRON: THE ARTIFICIAL NEURON
PERCEPTRON: THE ARTIFICIAL NEURON
• Introducing the concept of
numerical weights (a measure
of importance) for inputs
• Mechanism for learning these
weights.
• Inputs are no longer limited to
boolean values, as in the case of
an M-P neuron
• Supports real inputs as well,
which makes the model more useful
and generalized.
PERCEPTRON: THE ARTIFICIAL NEURON
INCORPORATING THRESHOLD IN PERCEPTRON
• We take a weighted sum of the inputs and set the output to 1 only when the sum exceeds
an arbitrary threshold (theta).
• Instead of hand-coding the thresholding parameter theta, we add it as one of the inputs,
with the weight -theta
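A minimal sketch of this reformulation (names illustrative): once a constant input x0 = 1 with weight w0 = -theta is added, the firing test "weighted sum ≥ theta" becomes "weighted sum plus bias ≥ 0":

    def perceptron(x, w, w0):
        """Perceptron with the threshold folded in as a bias weight w0 = -theta."""
        z = w0 + sum(wi * xi for wi, xi in zip(w, x))  # weighted sum incl. bias
        return 1 if z >= 0 else 0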
EXAMPLE 1
Consider the task of predicting whether I would watch a random game of
football on TV or not (the same example from my M-P neuron post) using the
behavioral data available.
EXAMPLE 1
• Here, w0 is called the bias because it represents the prior (prejudice).
• A football freak may have a very low threshold and may watch any football
game irrespective of the league, club or importance of the game [theta = 0].
• On the other hand, a selective viewer like me may only watch a football game
that is a Premier League game, features Man United, and is not a friendly
[theta = 2].
• The point is that the weights and the bias will depend on the data (viewing
history in this case).
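For instance, with purely hypothetical weights for the selective viewer (these numbers come from no dataset; they simply realize theta = 2, using the perceptron function sketched above):

    # inputs: [isPremierLeagueOn, isItAFriendlyGame, isManUnitedPlaying]
    w, w0 = [1, -1, 1], -2   # hypothetical weights; bias w0 = -theta = -2

    print(perceptron([1, 0, 1], w, w0))  # PL game, not a friendly, Man United -> 1 (watch)
    print(perceptron([1, 1, 0], w, w0))  # friendly without Man United -> 0 (skip)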
EXAMPLE 2
ANSWERS TO LIMITATIONS OF MC-CULLOCH PITTS NEURON
• What about non-boolean (say, real) inputs? Real valued inputs are allowed in
perceptron
• Do we always need to hand code the threshold? No, we can learn the
threshold
• Are all inputs equal? What if we want to assign more weight (importance) to
some inputs? A perceptron allows weights to be assigned to inputs
• What about functions which are not linearly separable? Not possible with a
single perceptron (requires a multilayer perceptron)
WHAT ABOUT FUNCTIONS WHICH ARE NOT LINEARLY
SEPARABLE ? NOT POSSIBLE WITH A SINGLE PERCEPTRON
• Most real-world data is not linearly
separable and will always contain some
outliers.
• In fact, sometimes there may not be any
outliers, but the data may still not be linearly
separable
• We need computational units (models) which
can deal with such data
• While a single perceptron cannot deal with
such data, we will show that a network of
perceptrons can indeed deal with such
data
EXAMPLE OF LINEARLY AND NON-LINEARLY SEPARABLE
• Consider a 2-input AND gate:
• Let's plot a graph where inputs for variable "A" are plotted on the x-axis and inputs
for variable "B" on the y-axis
EXAMPLE OF LINEARLY AND NON-LINEARLY SEPARABLE
• Let's plot a graph where inputs for variable "A" are plotted on the x-axis and inputs
for variable "B" on the y-axis
• Red dots represent a 0 output and blue dots represent a 1 output.
• The 0s and 1s can be separated by a line (the green line)
• Hence the AND function is linearly separable, like OR and NOT
EXAMPLE OF LINEARLY AND NON-LINEARLY SEPARABLE
• Consider a 2-input XOR gate:
• Let's plot a graph where inputs for variable "A" are plotted on the x-axis and inputs for
variable "B" on the y-axis
• You cannot draw a line to separate the blue dots from the red dots, hence XOR is a
non-linearly separable gate
MULTILAYER PERCEPTRON
MULTILAYER PERCEPTRON
• For this discussion, we will assume True = +1
and False = -1
• We consider 2 inputs and 4 perceptrons
• Each input is connected to all the 4 perceptrons
with specific weights
• The bias (w0) of each perceptron is -2 (i.e.,
each perceptron will fire only if the weighted
sum of its input is ≥ 2)
• Each of these perceptrons is connected to an
output perceptron by weights (which need to
be learned)
• The output of this perceptron (y) is the output of
this network
MULTILAYER PERCEPTRON
• Terminology:
• This network contains 3 layers
• The layer containing the inputs (x1, x2) is called
the input layer
• The middle layer containing the 4 perceptrons is
called the hidden layer
• The final layer containing one output neuron is
called the output layer
• The outputs of the 4 perceptrons in the hidden
layer are denoted by h1, h2, h3, h4
• The red and blue edges are called layer 1
weights
• w1, w2, w3, w4 are called layer 2 weights
MULTILAYER PERCEPTRON
This network can be used to implement any boolean function (linearly
separable or not)
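As a concrete instance of this claim, here is a hedged sketch of the XOR construction under the conventions above (True = +1, False = -1, hidden bias -2). Each hidden perceptron fires for exactly one of the four input combinations, and the layer-2 weights simply select the two combinations where XOR is true:

    def fires(x1, x2, v1, v2):
        """Hidden perceptron with layer-1 weights (v1, v2) and bias -2:
        fires iff the weighted sum is >= 2, i.e. only when (x1, x2) == (v1, v2)."""
        return 1 if v1 * x1 + v2 * x2 >= 2 else 0

    def xor_net(x1, x2):
        # One hidden unit per input combination: (-1,-1), (-1,+1), (+1,-1), (+1,+1)
        h = [fires(x1, x2, -1, -1), fires(x1, x2, -1, +1),
             fires(x1, x2, +1, -1), fires(x1, x2, +1, +1)]
        w = [0, 1, 1, 0]  # layer-2 weights: pick out the two "inputs differ" units
        return 1 if sum(wi * hi for wi, hi in zip(w, h)) >= 1 else 0

    for x1 in (-1, +1):
        for x2 in (-1, +1):
            print((x1, x2), "->", xor_net(x1, x2))
    # (-1,-1)->0, (-1,+1)->1, (+1,-1)->1, (+1,+1)->0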
REVISED PERCEPTRON
COMPONENTS
REVISED PERCEPTRON COMPONENTS
REVISED PERCEPTRON COMPONENTS
The basic components of a perceptron are:
• Input Layer: The input layer consists of one or
more input neurons, which receive input
signals from the external world or from other
layers of the neural network.
• Weights: Each input neuron is associated with
a weight, which represents the strength of the
connection between the input neuron and the
output neuron.
• Bias: A bias term is added to the input layer to
provide the perceptron with additional
flexibility in modeling complex patterns in the
input data.
REVISED PERCEPTRON COMPONENTS
The basic components of a perceptron are:
• Activation Function: The activation function
determines the output of the perceptron based
on the weighted sum of the inputs and the bias
term. Common activation functions used in
perceptrons include the step function, sigmoid
function, and ReLU function.
• Output: The output of the perceptron is a
single binary value, either 0 or 1, which
indicates the class or category to which the
input data belongs.
• Training Algorithm: The perceptron is typically
trained using a supervised learning algorithm
such as the perceptron learning algorithm.
During training, the weights and biases of the
perceptron are adjusted to minimize the error
between the predicted output and the true
output for a given set of training examples.
PERCEPTRON LEARNING
• The activation function here is also known
as the step function and is
represented by 'f'.
• This step function, or activation
function, is vital in ensuring that the
output is mapped between (0,1) or (-1,1)
PERCEPTRON LEARNING RULE
• y = 1 if f(x) ≥ 0; y = 0 if f(x) < 0
• Using z ≥ threshold implies a "hard" threshold: if the weighted sum z is
equal to or greater than the threshold, the output is 1; otherwise, it is 0. This
means that the decision boundary is sharp and clear, and it might be harder
for the model to learn the optimal weights and biases.
PERCEPTRON LEARNING RULE
PERCEPTRON LEARNING ALGORITHM (ANOTHER
REPRESENTATION)
DELTA RULE
• The delta rule is a learning rule used in a perceptron to update the weights based on the error between the
desired output and the actual output. The formula for updating the weights using the delta rule is:
• Δw = [α * (t - y) * x]
• New weight = previous weight + Δw
• New weight = previous weight + [α * (t - y) * x]
Where:
• Δw represents the change in weight.
• α is the learning rate, a small constant that determines how much the weights are updated in each
iteration.
• t is the target output.
• y is the predicted output.
• x is the input.
• This formula ensures that the weights are adjusted in such a way that the error is minimized.
• If the actual output (y) matches the target output (t), no change is made to the weights. If there is a mismatch,
the weights are adjusted in the direction that reduces the error.
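A small sketch of the delta rule in code, training a single perceptron on the OR gate. The learning rate, initialization range, and epoch count are illustrative choices:

    import random

    def train_delta(data, alpha=0.1, epochs=20, seed=0):
        random.seed(seed)
        w = [random.uniform(-0.5, 0.5) for _ in range(3)]   # [w0 (bias), w1, w2]
        for _ in range(epochs):
            for x, t in data:
                xb = [1] + list(x)                           # prepend bias input x0 = 1
                y = 1 if sum(wi * xi for wi, xi in zip(w, xb)) >= 0 else 0
                # delta rule: w <- w + alpha * (t - y) * x   (no change when t == y)
                w = [wi + alpha * (t - y) * xi for wi, xi in zip(w, xb)]
        return w

    or_gate = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    w = train_delta(or_gate)
    for x, t in or_gate:
        y = 1 if w[0] + w[1] * x[0] + w[2] * x[1] >= 0 else 0
        print(x, "target:", t, "predicted:", y)              # all four match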
HOW TO DECIDE INITIAL VALUES- THRESHOLD, LEARNING RATE, AND
WEIGHTS
1. Initial weights: small random values (at least normalized values)
2. Threshold: The threshold for binary classification problems is often set to
0.5. For imbalanced datasets or multi-class problems, you might want to
calculate the threshold based on the distribution of your classes.
3. Learning Rate: Typical values for a neural network with standardized inputs
are less than 1 and greater than 10^−6. Smaller learning rates will require
more training epochs, while larger learning rates will require fewer training
epochs.
** You can initialize using negative values, but that may make things
complex
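A brief illustration of these starting choices (all values illustrative):

    import random

    random.seed(42)                     # for reproducibility
    n_inputs = 4
    w = [random.uniform(0.0, 1.0) for _ in range(n_inputs)]  # small non-negative initial weights
    alpha = 0.1                         # learning rate, inside (10**-6, 1)
    threshold = 0.5                     # common default for binary classification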
ACTIVATION FUNCTION
• An Activation Function decides whether a neuron should be activated or not.
That is, it decides whether the neuron's input to the network is important
for the prediction, using simpler mathematical operations.
• The role of the Activation Function is to derive output from a set of input
values fed to a node (or a layer).
• The purpose of an activation function is to add non-linearity to the neural
network.
TYPES OF ACTIVATION FUNCTIONS
1. Binary Step Function
• Binary step function depends on a threshold value that decides whether a
neuron should be activated or not.
• It cannot provide multi-value outputs—for example, it cannot be used for
multi-class classification problems.
TYPES OF ACTIVATION FUNCTIONS
2. Linear Activation Function
• The linear activation function, also known as "no activation" or the "identity function"
(multiplied by 1.0), is where the activation is proportional to the input.
• The function doesn't do anything to the weighted sum of the input; it simply spits out
the value it was given.
• All layers of the neural network will collapse into one if a linear activation function is
used. No matter the number of layers in the neural network, the last layer will still be a
linear function of the first layer. So, essentially, a linear activation function turns the
neural network into just one layer.
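A quick numerical check of the collapse claim (matrix shapes arbitrary): two stacked linear layers compute exactly what a single layer with the product weight matrix computes.

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(5, 3))     # layer 1: 3 inputs -> 5 hidden units
    W2 = rng.normal(size=(2, 5))     # layer 2: 5 hidden -> 2 outputs
    x = rng.normal(size=3)

    two_layers = W2 @ (W1 @ x)       # linear ("identity") activation in between
    one_layer = (W2 @ W1) @ x        # the equivalent single layer
    print(np.allclose(two_layers, one_layer))  # True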
TYPES OF ACTIVATION FUNCTIONS
3. Non-Linear Activation Functions
• Non-linear activation functions solve the following limitations of linear
activation functions:
• They allow backpropagation because now the derivative function would be
related to the input, and it’s possible to go back and understand which
weights in the input neurons can provide a better prediction.
• They allow the stacking of multiple layers of neurons as the output would
now be a non-linear combination of input passed through multiple layers. Any
output can be represented as a functional computation in a neural network.
TYPES OF ACTIVATION FUNCTIONS
3.1 Sigmoid / Logistic Activation Function
• This function takes any real value as input and outputs values in the range
of 0 to 1.
• The larger the input (more positive), the closer the output value will be to 1.0,
whereas the smaller the input (more negative), the closer the output will be
to 0.0, as shown below.
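The activations discussed in this section, written out as simple Python functions (a sketch):

    import math

    def binary_step(z):      # hard threshold: fires at z >= 0
        return 1 if z >= 0 else 0

    def linear(z):           # identity / "no activation"
        return z

    def sigmoid(z):          # squashes any real z into (0, 1)
        return 1.0 / (1.0 + math.exp(-z))

    for z in (-4.0, 0.0, 4.0):
        print(z, binary_step(z), linear(z), round(sigmoid(z), 3))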
HISTORY
• In 1986, Geoffrey Hinton, together with David Rumelhart and
Ronald Williams, made a breakthrough by introducing the
backpropagation algorithm.
• Geoffrey Hinton is often called the father of deep learning.
TIMELINE
DEEP LEARNING DEFINITION
• Deep learning is a subset of machine learning that uses artificial neural
networks with representation learning.
• The term "deep" refers to the use of multiple layers in the network.
• Deep learning algorithms progressively extract higher-level features from raw
input data.
• For example, in image processing, lower layers may identify edges, while
higher layers may identify concepts relevant to a human, such as digits,
letters, or faces
THREE CLASSES OF DEEP LEARNING
1. Deep networks for unsupervised or generative learning
2. Deep networks for supervised learning
3. Hybrid deep networks
THREE CLASSES OF DEEP LEARNING
1. Deep networks for unsupervised or generative learning
• Generative modeling is an unsupervised learning task in machine learning that
involves automatically discovering and learning the regularities or patterns in input
data in such a way that the model can be used to generate or output new examples
that plausibly could have been drawn from the original dataset.
• GANs (Generative Adversarial Networks) are a way of training a generative model
by framing the problem as a supervised learning problem with two sub-models:
• the generator model that we train to generate new examples,
• and the discriminator model that tries to classify examples as either real (from the
domain) or fake (generated).
• The two models are trained together in an adversarial, zero-sum game until the
discriminator model is fooled about half the time, meaning the generator model
is generating plausible examples.
THREE CLASSES OF DEEP LEARNING
• Ability to generate realistic examples across a
range of problem domains, like image-to-
image translation tasks such as
• translating photos of summer to winter or day
to night,
• generating photorealistic photos of objects,
scenes, and people that even humans cannot
tell are fake.
THREE CLASSES OF DEEP LEARNING
2. Deep networks for supervised learning
• Supervised deep learning frameworks are trained using well-labelled data.
• It teaches the learning algorithm to generalise from the training data and to
apply what it has learned in unseen situations.
• After the training process is complete, the model is tested on a held-out test
set to predict the output. Thus, datasets containing inputs and correct
outputs are critical, as they help the model learn faster.
• Examples: spam filters, fraud detection systems, recommendation engines,
and image recognition systems
THREE CLASSES OF DEEP LEARNING
3. Hybrid deep networks
• The goal is discrimination, which is assisted, often in a significant way, by the
outcomes of generative or unsupervised deep networks. This can be
accomplished by better optimization and/or regularization of supervised
deep networks.
• The goal can also be accomplished when discriminative criteria for
supervised learning are used to estimate the parameters in any of the
unsupervised deep networks.
Editor's Notes
  1. Inhibitory input: when an inhibitory input is 1, the output is always 0.
  2. Excitatory inputs cannot fire the neuron on their own, but they can when combined.
  3. Zero-sum is a situation, often cited in game theory, in which one person’s gain is equivalent to another’s loss, so the net change in wealth or benefit is zero.