Machine Learning
By
Dr.G.MADHU
M.Tech., Ph.D., MIEEE., MCSI., MISTE., MISRS., MIRSS., MIAENG
Professor,
Department of Information Technology,
VNR Vignana Jyothi Institute of Engineering & Technology,
Bachupally, Nizampet (S.O.)
Hyderabad- 500 090,RangaReddy Dt. TELANGANA, INDIA.
Cell: +919849085728
E-mail: madhu_g@vnrvjiet.in
Subject Code: 22PC1IT302
Unit-5: Artificial Neural Networks
Machine Learning Course- Dr G Madhu 2
4/28/2025
• “Artificial Neural Networks (ANN) is an information-processing paradigm inspired by the way biological nervous systems, such as the brain, process information.
• It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve a specific problem.”
• The brain is a highly complex, nonlinear, and
parallel computer (information-processing
system).
Machine Learning Course- Dr G Madhu 3
Introduction to Artificial Neural Networks
4/28/2025
• The brain has the capability to organize its structural constituents, known as neurons, so as to perform certain computations (e.g., pattern recognition, perception, and motor control) many times faster than the fastest digital computer in existence today.
• Consider, for example, human vision, which is
an information-processing task.
• It is the function of the visual system to provide
a representation of the environment around us
and, more important, to supply the information
we need to interact with the environment.
Machine Learning Course- Dr G Madhu 4
4/28/2025
• To be specific, the brain routinely
accomplishes perceptual recognition tasks
(e.g., recognizing a familiar face embedded in
an unfamiliar scene) in approximately
100–200 ms, whereas tasks of much lesser
complexity take a great deal longer on a
powerful computer.
4/28/2025 Machine Learning Course- Dr G Madhu 5
4/28/2025 Machine Learning Course- Dr G Madhu 6
• Although artificial neurons and perceptrons were inspired by the biological processes scientists were able to observe in the brain back in the 1950s, they do differ from their biological counterparts in several ways.
• Birds inspired flight and horses inspired locomotives and cars, yet none of today’s transportation vehicles resembles the metal skeleton of a living, breathing, self-replicating animal.
• Def: A neural network is a massively parallel
distributed processor made up of simple
processing units that has a natural propensity
for storing experiential knowledge and making
it available for use.
• It resembles the brain in two respects:
1. Knowledge is acquired by the network from its
environment through a learning process.
2. Inter-neuron connection strengths, known as
synaptic weights, are used to store the acquired
knowledge.
4/28/2025 Machine Learning Course- Dr G Madhu 7
• The procedure used to perform the learning process is called a learning algorithm.
Biological Neural Networks
• A biological neural network is composed of a group of chemically connected or functionally associated neurons.
4/28/2025 Machine Learning Course- Dr G Madhu 8
4/28/2025 Machine Learning Course- Dr G Madhu 9
https://en.wikipedia.org/wiki/Artificial_neural_network#/media/File:Neuron3.png
• The human nervous system contains cells, which
are referred to as neurons.
• The neurons are connected to one another with
the use of axons and dendrites, and the
connecting regions between axons and dendrites
are referred to as synapses.
• Tree-like nerve fibres called dendrites are associated with the cell body.
• These dendrites receive signals from other
neurons.
4/28/2025 Machine Learning Course- Dr G Madhu 10
4/28/2025 Machine Learning Course- Dr G Madhu 13
Source: https://www.kaggle.com/androbomb/simple-nn-with-python-multi-layer-perceptron
• Extending from the cell body is a single long
fibre called the axon, which eventually
branches into strands and substrands
connecting to many other neurons at the
synaptic junctions, or synapses.
4/28/2025 Machine Learning Course- Dr G Madhu 14
Basic Notations
4/28/2025
1. Dendrite
– Dendrites are responsible for getting
incoming signals from outside
2. Soma
– Soma is the cell body responsible for
the processing of input signals and
deciding whether a neuron should
fire an output signal
3. Axon
– Axon is responsible for getting
processed signals from neuron to
relevant cells
4. Synapse
– Synapse is the connection between
an axon and other neuron dendrites
Machine Learning Course- Dr G Madhu 15
What is an Artificial Neuron?
• An artificial neuron is a mathematical function conceived as a model of a biological neuron.
• Artificial neurons are the elementary units of an artificial neural network.
4/28/2025 Machine Learning Course- Dr G Madhu 16
4/28/2025 Machine Learning Course- Dr G Madhu 17
Illustration of a single biological neuron, annotated to describe the function of a single artificial neuron.
4/28/2025 Machine Learning Course- Dr G Madhu 18
• A biological neuron receives input signals
from its dendrites from other neurons and
sends output signals along its axon, which
branches out and connects to other neurons.
• In the illustration above, the input signal is represented by x_0; as this signal ‘travels’, it is multiplied (w_0 · x_0) by a weight variable (w_0).
• The weight variables are learnable, and the weight’s strength and polarity (positive or negative) control the influence of the signal.
4/28/2025 Machine Learning Course- Dr G Madhu 19
• The influence is determined by summing the weighted inputs and the bias (∑ w_i·x_i + b), which is then passed through the activation function f; if the result is above a certain threshold, the neuron fires.
4/28/2025 Machine Learning Course- Dr G Madhu 20
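A minimal sketch of this computation in Python (the step activation and the example numbers here are illustrative assumptions, not values from the slides):

```python
# Minimal sketch of a single artificial neuron: weighted sum of inputs plus a
# bias, passed through an activation function. The step threshold at 0 and the
# demo values are illustrative choices only.
def neuron(inputs, weights, bias, activation):
    v = sum(w * x for w, x in zip(weights, inputs)) + bias  # sum(w_i * x_i) + b
    return activation(v)

def step(v):
    return 1 if v > 0 else 0   # "fires" only if the net input exceeds the threshold 0

print(neuron([0.5, -0.3], [0.9, 0.4], 0.1, step))  # net input 0.43 > 0, so prints 1
```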
Artificial Neurons
• Artificial neuron also known as perceptron is
the basic unit of the neural network.
• In simple terms, it is a mathematical function
based on a model of biological neurons.
or
A neuron is an information-processing unit that is
fundamental to the operation of a neural
network
4/28/2025 Machine Learning Course- Dr G Madhu 21
What is Artificial Neural Network (ANN)?
• The human brain is considered the most
complicated object in the universe.
• An Artificial Neural Network (ANN) is a computing system that is loosely modelled on the structure of the brain.
4/28/2025 Machine Learning Course- Dr G Madhu 22
The Block Diagram of Model of a Neuron
4/28/2025 Machine Learning Course- Dr G Madhu 23
Fig.1. Nonlinear model of a neuron, labelled k.
• In mathematical terms, we may describe the
neuron k depicted in above Fig.1 by writing
the pair of equations:
4/28/2025 Machine Learning Course- Dr G Madhu 24
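The pair of equations referred to above, in the standard notation for this neuron model, is:

u_k = w_k1·x_1 + w_k2·x_2 + … + w_km·x_m = ∑_j w_kj·x_j
y_k = φ(u_k + b_k)

where x_1, …, x_m are the input signals; w_k1, …, w_km are the synaptic weights of neuron k; u_k is the linear combiner output; b_k is the bias; φ(·) is the activation function; and y_k is the output signal of the neuron.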
• The use of the bias b_k has the effect of applying an affine transformation to the output u_k of the linear combiner in the model of Fig. 1, as shown by v_k = u_k + b_k.
• In particular, depending on whether the bias b_k is positive or negative, the relationship between the induced local field, or activation potential, v_k of neuron k and the linear combiner output u_k is modified in the manner illustrated in Fig. 2;
4/28/2025 Machine Learning Course- Dr G Madhu 25
• hereafter, these two terms are used interchangeably.
• Note that as a result of this affine transformation, the graph of v_k versus u_k no longer passes through the origin.
4/28/2025 Machine Learning Course- Dr G Madhu 26
Fig.2. Affine transformation produced by the presence of a bias; note that v_k = b_k at u_k = 0.
• The bias b_k is an external parameter of neuron k. We may account for its presence as in Eq. (2).
• Equivalently, we may formulate the combination of Eqs. (1) to (3) as follows:
4/28/2025 Machine Learning Course- Dr G Madhu 27
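In the standard formulation, the combined equations are:

v_k = ∑_{j=0}^{m} w_kj·x_j    and    y_k = φ(v_k)

where a new input x_0 = +1 with weight w_k0 = b_k has been added, so that the bias is treated as just another synaptic weight.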
• We may therefore reformulate the model of
neuron k as shown in Fig. 3.
4/28/2025 Machine Learning Course- Dr G Madhu 28
4/28/2025 Machine Learning Course- Dr G Madhu 29
• The values of the two inputs (x_1, x_2) are 0.8 and 1.2
• We have a set of weights (1.0, 0.75) corresponding to the two inputs
• Then we have a bias with value 0.5 which needs to be added to the sum
• The input to the activation function is then calculated using the formula:
4/28/2025 Machine Learning Course- Dr G Madhu 30
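Working this out with the numbers above, the net input to the activation function is v = (1.0)(0.8) + (0.75)(1.2) + 0.5 = 0.8 + 0.9 + 0.5 = 2.2.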
Biological Neuron vs. Artificial Neuron
4/28/2025 Machine Learning Course- Dr G Madhu 31
NEURAL NETWORK REPRESENTATIONS
4/28/2025 Machine Learning Course- Dr G Madhu 32
NEURAL NETWORK REPRESENTATIONS
4/28/2025 Machine Learning Course- Dr G Madhu 33
Appropriate problems for ANN Learning
• ANN learning is well-suited to problems in which
the training data corresponds to noisy, complex
sensor data, such as inputs from cameras and
microphones.
• It is also applicable to problems for which more
symbolic representations are often used, such
as the decision tree learning tasks discussed in
Chapter 2.
• In these cases ANN and decision tree learning
often produce results of comparable accuracy.
4/28/2025 Machine Learning Course- Dr G Madhu 34
Appropriate problems for ANN Learning
• The BACKPROPAGATION algorithm is the most
commonly used ANN learning technique. It is
appropriate for problems with the following
characteristics:
1. Instances are represented by many
attribute-value pairs: The target function to be
learned is defined over instances that can be
described by a vector of predefined features, such
as the pixel values in the ALVINN example. These
input attributes may be highly correlated or
independent of one another. Input values can be
any real values.
4/28/2025 Machine Learning Course- Dr G Madhu 35
Appropriate problems for ANN Learning
2. The target function output may be discrete-valued,
real-valued, or a vector of several real- or discrete-
valued attributes.
– For example, in the ALVINN system the output is a vector of
30 attributes, each corresponding to a recommendation
regarding the steering direction.
– The value of each output is some real number between 0
and 1, which in this case corresponds to the confidence in
predicting the corresponding steering direction.
– We can also train a single network to output both the
steering command and suggested acceleration, simply by
concatenating the vectors that encode these two output
predictions.
4/28/2025 Machine Learning Course- Dr G Madhu 36
Appropriate problems for ANN Learning
3. The training examples may contain errors. ANN
learning methods are quite robust to noise in the
training data.
4. Long training times are acceptable. Network
training algorithms typically require longer training
times than, say, decision tree learning algorithms.
• Training times can range from a few seconds to
many hours, depending on factors such as the
number of weights in the network, the number of
training examples considered, and the settings of
various learning algorithm parameters
4/28/2025 Machine Learning Course- Dr G Madhu 37
Appropriate problems for ANN Learning
5. Fast evaluation of the learned target function
may be required.
– Although ANN learning times are relatively long,
evaluating the learned network, in order to apply it
to a subsequent instance, is typically very fast.
– For example, ALVINN applies its neural network
several times per second to continually update its
steering command as the vehicle drives forward.
4/28/2025 Machine Learning Course- Dr G Madhu 38
6. The ability of humans to understand the learned
target function is not important.
– The weights learned by neural networks are often
difficult for humans to interpret. Learned neural
networks are less easily communicated to humans
than learned rules.
4/28/2025 Machine Learning Course- Dr G Madhu 39
Appropriate problems for ANN Learning
PERCEPTRONS
• Artificial neuron also known as perceptron is
the basic unit of the neural network.
• Any type of ANN system is based on a unit,
called a perceptron.
• A perceptron is a neural network unit (an
artificial neuron) that does certain
computations to detect features in the input
data.
4/28/2025 Machine Learning Course- Dr G Madhu 40
4/28/2025 Machine Learning Course- Dr G Madhu 41
Perceptron was introduced by Frank Rosenblatt in 1957. He proposed a Perceptron
learning rule based on the original MCP neuron.
4/28/2025 Machine Learning Course- Dr G Madhu 42
How does it work?
• A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs 1 if the result is greater than some threshold and -1 otherwise.
• More precisely, given inputs x_1 through x_n, the output o(x_1, . . . , x_n) computed by the perceptron is
4/28/2025 Machine Learning Course- Dr G Madhu 43
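Written out in the standard textbook form (see the Mitchell reference cited later in these slides), the perceptron output is:

o(x_1, …, x_n) = 1  if  w_0 + w_1·x_1 + w_2·x_2 + … + w_n·x_n > 0
o(x_1, …, x_n) = -1 otherwise

where each w_i is a real-valued weight that determines the contribution of input x_i, and −w_0 plays the role of the threshold.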
• We will sometimes write the perceptron function as o(x⃗) = sgn(w⃗ · x⃗), where sgn(y) is 1 if y > 0 and -1 otherwise.
• Learning a perceptron involves choosing values for the weights w_0, . . . , w_n.
• Therefore, the space H of candidate hypotheses
considered in perceptron learning is the set of all
possible real-valued weight vectors.
4/28/2025 Machine Learning Course- Dr G Madhu 44
How the Perceptron Algorithm Works
4/28/2025 Machine Learning Course- Dr G Madhu 45
• Step-1: Assign a weight to each feature.
– In this case, there are two features, so we have two
weights. Set the initial values of the weights to 0.
4/28/2025 Machine Learning Course- Dr G Madhu 46
• Step-2: For the first training example, take the
sum of each feature value multiplied by its
weight then add a bias term b which is also
initially set to 0.
4/28/2025 Machine Learning Course- Dr G Madhu 47
Note : This represents an equation of a line. Currently, the line has 0 slope because we
initialized the weights as 0. We will be updating the weights momentarily and this will
result in the slope of the line converging to a value that separates the data linearly.
• Step-3: Apply a step function and assign the
result as the output prediction.
4/28/2025 Machine Learning Course- Dr G Madhu 48
Note: Later, when learning about the multilayer perceptron, a different activation function will be used, such as the sigmoid, ReLU or tanh function.
• Step-4: Update the values of the weights and the
bias term.
• Step-5: Repeat steps 2,3 and 4 for each training
example.
• Step-6: Repeat until the weights stop changing over a specified number of iterations, or until the MSE (mean squared error) or MAE (mean absolute error) falls below a specified value.
• Step-7: Use the weights and bias to predict the
output value of new observed values of x.
4/28/2025 Machine Learning Course- Dr G Madhu 49
Illustrative Example
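The worked example on the original slides is given as figures; as a stand-in, the following is a minimal sketch of the Steps 1–7 loop in Python. The toy dataset (an AND-like problem), the learning rate, and the stopping rule are assumptions for illustration, not values from the slides.

```python
# Minimal perceptron training loop (Steps 1-7): two features, step activation,
# labels in {+1, -1}. Dataset and learning rate are illustrative only.
data = [((0.0, 0.0), -1), ((0.0, 1.0), -1), ((1.0, 0.0), -1), ((1.0, 1.0), +1)]
w = [0.0, 0.0]   # Step 1: one weight per feature, initialised to 0
b = 0.0          # bias term, also initialised to 0
eta = 0.1        # learning rate

for epoch in range(100):                      # Step 6: repeat for a bounded number of passes
    changed = False
    for (x1, x2), t in data:                  # Step 5: loop over the training examples
        v = w[0] * x1 + w[1] * x2 + b         # Step 2: weighted sum plus bias
        o = 1 if v > 0 else -1                # Step 3: step function -> prediction
        if o != t:                            # Step 4: update only on a misclassification
            w[0] += eta * (t - o) * x1
            w[1] += eta * (t - o) * x2
            b    += eta * (t - o)
            changed = True
    if not changed:                           # stop once an epoch makes no changes
        break

print(w, b)  # Step 7: use the learned weights and bias on new inputs
```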
Challenges with Artificial Neural Network (ANN)
• While solving an image classification problem
using ANN, the first step is to convert a
2-dimensional image into a 1-dimensional
vector prior to training the model.
• This has two drawbacks:
– The number of trainable parameters increases
drastically with an increase in the size of the image
– ANN loses the spatial features of an image. Spatial
features refer to the arrangement of the pixels in
an image.
4/28/2025 Machine Learning Course- Dr G Madhu 55
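To make the first drawback above concrete (the image size here is an assumed example): flattening a 224 × 224 RGB image gives 224 × 224 × 3 = 150,528 input values, and a single fully connected hidden layer of 1,000 units then already requires roughly 150,528 × 1,000 ≈ 1.5 × 10^8 trainable weights.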
4/28/2025 Machine Learning Course- Dr G Madhu 56
Comparing the Different Types of Neural Networks (MLP(ANN) vs. RNN vs. CNN)
4/28/2025 Machine Learning Course- Dr G Madhu 57
Types of Perceptrons
There are two types of Perceptrons:
– Single layer and
– Multilayer
1. A single-layer Perceptron can learn only linearly separable patterns.
2. Multilayer Perceptrons, or feedforward neural networks with two or more layers, have greater processing power.
3. The Perceptron algorithm learns the input signal weights to draw a linear decision boundary.
4. This lets you distinguish between the two linearly separable classes +1 and -1.
4/28/2025 Machine Learning Course- Dr G Madhu 58
Single layer Perceptron
• A single layer perceptron (SLP) is a
feed-forward network based on a threshold
transfer function.
• SLP is the simplest type of artificial neural
networks and can only classify linearly
separable cases with a binary target (1 , 0).
• The single layer perceptron does not have a
priori knowledge, so the initial weights are
assigned randomly.
4/28/2025 Machine Learning Course- Dr G Madhu 59
4/28/2025 Machine Learning Course- Dr G Madhu 60
• SLP sums all the weighted inputs and if the sum is above the
threshold (some predetermined value), SLP is said to be
activated (output=1).
Machine Learning Course- Dr G Madhu 4/28/2025 61
The input values are presented to the perceptron, and if the predicted output is
the same as the desired output, then the performance is considered satisfactory
and no changes to the weights are made. However, if the output does not match
the desired output, then the weights need to be changed to reduce the error.
Perceptron Weight Adjustment
• Below is the equation in Perceptron weight
adjustment:
4/28/2025 Machine Learning Course- Dr G Madhu 62
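The update referred to above, written in the common textbook notation, is:

w_i ← w_i + Δw_i,   where   Δw_i = η·(t − o)·x_i

Here t is the target output, o is the output produced by the perceptron, and η is the learning rate.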
• Since this network model performs linear classification, if the data is not linearly separable it will not produce proper results.
Representational Power of Perceptrons
4/28/2025 Machine Learning Course- Dr G Madhu 63
A single perceptron can be used to represent many boolean
functions.
For example, if we assume boolean values of 1(true) and -1(false),
then one way to use a two-input perceptron to implement the AND
function is to set the weights w0=-0.8, and w1=w2=0.5.
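Checking this choice on the four input pairs (with a constant input x_0 = 1 carrying the weight w_0): for (1, 1) the sum is -0.8 + 0.5 + 0.5 = 0.2 > 0, so the output is 1 (true); for (1, -1) and (-1, 1) the sum is -0.8, and for (-1, -1) it is -1.8, all of which are below 0 and give -1 (false) — exactly the AND truth table.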
• In fact, AND and OR can be viewed as special
cases of m-of-n functions: that is, functions
where at least m of the n inputs to the
perceptron must be true.
• However, some boolean functions cannot be
represented by a single perceptron, such as
the XOR function.
4/28/2025 Machine Learning Course- Dr G Madhu 64
4/28/2025 Machine Learning Course- Dr G Madhu 65
The decision surface represented by a two-input perceptron. x1 and
x2 are the perceptron inputs.
(a) A set of training examples and the decision surface of a
perceptron that classifies them correctly
(b) A set of training examples that is not linearly separable
• Because the SLP is a linear classifier, if the cases are not linearly separable the learning process will never reach a point where all the cases are classified properly.
• The most famous example of the inability of
perceptron to solve problems with linearly
non-separable cases is the XOR problem.
4/28/2025 Machine Learning Course- Dr G Madhu 66
• However, a multi-layer perceptron using the
backpropagation algorithm can successfully
classify the XOR data.
4/28/2025 Machine Learning Course- Dr G Madhu 67
The Perceptron Training Rule
• How does a single perceptron learn the weight?
– The precise learning problem is to determine a weight
vector that causes the perceptron to produce the correct
+1, -1 output for each of the given training examples.
• One way to learn an acceptable weight vector is
1. to begin with random weights
2. then iteratively apply the perceptron to each training
example
3. modifying the perceptron weights whenever it
misclassifies an example.
4. this process is repeated until the perceptron classifies all
training examples correctly.
4/28/2025 Machine Learning Course- Dr G Madhu 68
• Weights are modified at each step according to the perceptron training rule, which revises the weight w_i associated with input x_i.
4/28/2025 Machine Learning Course- Dr G Madhu 69
• The learning rate η is usually set to some small value (e.g., 0.1) and is sometimes made to decay as the number of weight-tuning iterations increases.
• In fact, the above learning procedure can be proven to converge within a finite number of applications of the perceptron training rule to a weight vector that correctly classifies all training examples, provided the training examples are linearly separable and provided a sufficiently small η is used (see Minsky and Papert 1969). If the data are not linearly separable, convergence is not assured.
4/28/2025 Machine Learning Course- Dr G Madhu 70
Multi-Layer Perceptron
• One input layer, one output layer, and one or
more hidden layers of processing units.
• No feedback connections (e.g. a Multi-Layer
Perceptron)
4/28/2025 Machine Learning Course- Dr G Madhu 71
4/28/2025 Machine Learning Course- Dr G Madhu 72
Questions
1. Make a perceptron that mimics logical AND, OR, NAND, NOT, NOR, etc.
2. Discuss the construction of a perceptron that outputs 1 if at least m of its n inputs are one.
3. Why can the perceptron model not learn XOR logic?
4. State the Perceptron Learning Algorithm and discuss its convergence.
5. Compare the Perceptron training rule and the gradient descent rule. Compare the incremental (stochastic) approximation to gradient descent with true gradient descent.
6. Discuss the representational power of a two-layer perceptron model versus a multilayer perceptron model.
4/28/2025 Machine Learning Course- Dr G Madhu 73
How a single perceptron can be used to represent the Boolean functions such as
AND, OR
4/28/2025 Machine Learning Course- Dr G Madhu 74
Example-1: Representation of AND functions
4/28/2025 Machine Learning Course- Dr G Madhu 75
4/28/2025 Machine Learning Course- Dr G Madhu 76
Example-2: Representation of AND functions
4/28/2025 Machine Learning Course- Dr G Madhu 77
Example-3: Representation of OR functions
4/28/2025 Machine Learning Course- Dr G Madhu 78
4/28/2025 Machine Learning Course- Dr G Madhu 79
4/28/2025 Machine Learning Course- Dr G Madhu 80
Ans: Suppose the perceptron has two inputs A, B and constant 1.
4/28/2025 Machine Learning Course- Dr G Madhu 81
4/28/2025 Machine Learning Course- Dr G Madhu 82
Q.2. Design a two-layer network of perceptrons that implements A XOR B.
4/28/2025 Machine Learning Course- Dr G Madhu 83
Why perceptron model cannot learn XOR logic?
Single Layer Perceptron Cannot Solve the "XOR" Problem
XOR logical Operator :
• XOR, or Exclusive OR, is a binary logical operator that takes in Boolean inputs and gives
out True if and only if the two inputs are different.
• This logical operator is especially useful when we want to check two conditions that can't
be simultaneously true. The following is the Truth table for the XOR function
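A XOR B is 1 exactly when the two inputs differ:
– A = 0, B = 0 → A XOR B = 0
– A = 0, B = 1 → A XOR B = 1
– A = 1, B = 0 → A XOR B = 1
– A = 1, B = 1 → A XOR B = 0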
The XOR Problem
• The XOR problem is that we need to build a Neural
Network (a perceptron in our case) to produce the
truth table related to the XOR logical operator.
• This is a binary classification problem, so supervised learning is an appropriate way to solve it. In this case, we will be using perceptrons.
• Single-layer perceptrons can only work with linearly separable data.
• But in the following diagram drawn in accordance
with the truth table of the XOR logical operator, we
can see that the data is NOT linearly separable.
4/28/2025 Machine Learning Course- Dr G Madhu 84
4/28/2025 Machine Learning Course- Dr G Madhu 85
The Solution
• To solve this problem, we add an extra layer to our vanilla perceptron, i.e., we create a Multi-Layer Perceptron (MLP).
• We call this extra layer the hidden layer.
• To build a perceptron, we first need to
understand that the XOR gate can be written
as a combination of AND gates, NOT gates and
OR gates in the following way:
• a XOR b = (a AND NOT b)OR(b AND NOT a)
• The following is a plan for the perceptron.
4/28/2025 Machine Learning Course- Dr G Madhu 86
4/28/2025 Machine Learning Course- Dr G Madhu 87
Here, we need to observe that our inputs are 0s and 1s. To make an XOR gate, we will make the h1 node perform the (x2 AND NOT x1) operation, the h2 node perform the (x1 AND NOT x2) operation, and the y node perform the (h1 OR h2) operation.
The NOT gate can be produced for an input a by writing (1 - a), the AND gate for inputs a and b by writing (a·b), and the OR gate for inputs a and b by writing (a + b). We will use the sigmoid function as our activation function σ, i.e., σ(x) = 1/(1 + e^(-x)), and the threshold for classification will be 0.5, i.e., any x with σ(x) > 0.5 will be classified as 1 and the others as 0.
4/28/2025 Machine Learning Course- Dr G Madhu 88
• Now, since we have all the information, we
can go on to define h1, h2 and y.
• Using the formulae for AND, NOT and OR
gates, we get:
– h1 = σ((1-x1) + x2) = σ((-1)x1 + x2 + 1)
– h2 = σ(x1 + (1-x2)) = σ(x1 + (-1)x2 + 1)
– y = σ(h1 + h2) = σ(h1 + h2 + 0)
4/28/2025 Machine Learning Course- Dr G Madhu 89
Hence, we have built a multi layered perceptron
with the following weights and it predicts the
output of a XOR logical operator.
4/28/2025 Machine Learning Course- Dr G Madhu 90
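As an independent check of the same idea (using step units and hand-picked weights chosen here for illustration, rather than the sigmoid-based weights above), a two-layer network that computes XOR can be verified directly in Python:

```python
# Hand-wired two-layer perceptron network for XOR, using step activations.
# h1 computes (x1 AND NOT x2), h2 computes (x2 AND NOT x1), y computes (h1 OR h2).
# The weights/thresholds below are illustrative choices, not taken from the slides.
def step(v):
    return 1 if v > 0 else 0

def xor(x1, x2):
    h1 = step(x1 - x2 - 0.5)      # fires only for (1, 0)
    h2 = step(x2 - x1 - 0.5)      # fires only for (0, 1)
    return step(h1 + h2 - 0.5)    # fires if either hidden unit fires

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))    # prints 0, 1, 1, 0 for the four input pairs
```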
Q.2. Design a two-layer network of perceptrons that implements A XOR B.
4/28/2025 Machine Learning Course- Dr G Madhu 91
4/28/2025 Machine Learning Course- Dr G Madhu 92
• Drawback of Perceptron :
– The perceptron rule finds a successful weight vector when the training examples are linearly separable, but it can fail to converge if the examples are not linearly separable.
• The Perceptron Training Rule
– The learning problem is to determine a weight
vector that causes the perceptron to produce the
correct + 1 or - 1 output for each of the given
training examples.
4/28/2025 Machine Learning Course- Dr G Madhu 93
To learn an acceptable weight vector
• Begin with random weights, then iteratively
apply the perceptron to each training example,
modifying the perceptron weights whenever it
misclassifies an example.
• This process is repeated, iterating through the
training examples as many times as needed until
the perceptron classifies all training examples
correctly.
4/28/2025 Machine Learning Course- Dr G Madhu 94
• Weights are modified at each step according to the perceptron training rule, which revises the weight w_i associated with input x_i according to the rule.
4/28/2025 Machine Learning Course- Dr G Madhu 95
• The role of the learning rate is to moderate the
degree to which weights are changed at each
step.
• It is usually set to some small value (e.g., 0.1)
and is sometimes made to decay as the
number of weight-tuning iterations increases
4/28/2025 Machine Learning Course- Dr G Madhu 96
Drawback:
• The perceptron rule finds a successful weight vector when the training examples are linearly separable, but it can fail to converge if the examples are not linearly separable.
State Perceptron Learning Algorithm and Discuss its Convergence
• The perceptron convergence theorem states that the perceptron learning algorithm converges in a finite number of steps, given a linearly separable dataset.
4/28/2025 Machine Learning Course- Dr G Madhu 97
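A standard form of this result (stated here for reference; it is not reproduced from the slides): if every training input satisfies ‖x‖ ≤ R and there exists a unit-length weight vector w* and a margin γ > 0 such that y·(w*·x) ≥ γ for every training example (x, y), then the perceptron learning algorithm makes at most (R/γ)² weight updates before converging.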
Gradient Descent and the Delta Rule
• The perceptron rule finds a successful weight
vector when the training examples are linearly
separable.
• It can fail to converge if the examples are not
linearly separable.
• A second training rule, called the delta rule, is
designed to overcome this difficulty.
• If the training examples are not linearly
separable, the delta rule converges toward a
best-fit approximation to the target concept.
4/28/2025 Machine Learning Course- Dr G Madhu 101
Gradient Descent and the Delta Rule
• The key idea behind the delta rule is to use
gradient descent to search the hypothesis
space of possible weight vectors to find the
weights that best fit the training examples.
• This rule is important because gradient descent
provides the basis for the BACKPROPAGATION
Algorithm, which can learn networks with many
interconnected units.
4/28/2025 Machine Learning Course- Dr G Madhu 102
• It is also important because gradient descent can
serve as the basis for learning algorithms that
must search through hypothesis spaces
containing many different types of continuously
parameterized hypotheses.
• Gradient Descent : It is an optimization
algorithm used to find the values of parameters
(coefficients) of a function (f) that minimizes a
cost function (cost).
4/28/2025 Machine Learning Course- Dr G Madhu 103
Gradient Descent and the Delta Rule
To understand the gradient descent algorithm, it is helpful to visualize the entire
hypothesis space of possible weight vectors and their associated E values, as
illustrated in Figure
4/28/2025 Machine Learning Course- Dr G Madhu 107
• Here the axes w_0 and w_1 represent possible values for the two weights of a simple linear unit.
• The (w_0, w_1) plane therefore represents the entire hypothesis space.
• The vertical axis indicates the error E relative to some fixed set of training examples.
• The error surface shown in the figure thus summarizes the desirability of every weight vector in the hypothesis space (we desire a hypothesis with minimum error).
Source: Machine Learning, Tom Mitchell, McGraw Hill, 1997.
4/28/2025 Machine Learning Course- Dr G Madhu 108
4/28/2025 Machine Learning Course- Dr G Madhu 109
• Gradient descent search determines a weight
vector that minimizes E by starting with an
arbitrary initial weight vector, then repeatedly
modifying it in small steps.
4/28/2025 Machine Learning Course- Dr G Madhu 110
DERIVATION OF THE GRADIENT DESCENT RULE
4/28/2025 Machine Learning Course- Dr G Madhu 111
• Since the gradient specifies the direction of
steepest increase of E, the training rule for
gradient descent is
4/28/2025 Machine Learning Course- Dr G Madhu 112
4/28/2025 Machine Learning Course- Dr G Madhu 113
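Following the source cited below, with the training error over a set D of training examples defined as E(w⃗) = ½ ∑_{d∈D} (t_d − o_d)², the rule referred to above and its component form are:

Δw⃗ = −η·∇E(w⃗),   i.e.   w_i ← w_i + Δw_i   with   Δw_i = −η·∂E/∂w_i = η ∑_{d∈D} (t_d − o_d)·x_id

Here t_d and o_d are the target and the (linear-unit) output for training example d, x_id is the i-th input component of example d, and η is the learning rate; the minus sign moves the weight vector in the direction that decreases E.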
Source: Machine Learning, Tom Mitchell, McGraw Hill, 1997.
4/28/2025 Machine Learning Course- Dr G Madhu 114
Feature of Gradient Descent Algorithm
4/28/2025 Machine Learning Course- Dr G Madhu 115
Stochastic Approximation to Gradient Descent
4/28/2025 Machine Learning Course- Dr G Madhu 116
4/28/2025 Machine Learning Course- Dr G Madhu 117
Differences Between Standard Gradient Descent and Stochastic Gradient Descent
4/28/2025 Machine Learning Course- Dr G Madhu 118
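In outline, following the Mitchell reference cited earlier in these slides:
• In standard (batch) gradient descent, the error is summed over all training examples before the weights are updated; in stochastic (incremental) gradient descent, the weights are updated after each individual example, using the per-example error E_d(w⃗) = ½·(t_d − o_d)².
• The resulting per-example update for a linear unit is Δw_i = η·(t − o)·x_i, known as the delta rule (or LMS rule).
• Standard gradient descent requires more computation per weight-update step but can use a larger step size; stochastic gradient descent can sometimes avoid settling into a local minimum of E because it follows the gradient of the varying E_d rather than of E itself.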
Remarks
• We have considered two similar algorithms for
iteratively learning perceptron weights.
• The key difference between these algorithms is
that the perceptron training rule updates
weights based on the error in the thresholded
perceptron output, whereas the delta rule
updates weights based on the error in the
un-thresholded linear combination of inputs.
4/28/2025 Machine Learning Course- Dr G Madhu 119
• The difference between these two training rules
is reflected in different convergence properties.
– The perceptron training rule converges after a finite
number of iterations to a hypothesis that perfectly
classifies the training data, provided the training
examples are linearly separable.
– The delta rule converges only asymptotically toward
the minimum error hypothesis, possibly requiring
unbounded time, but converges regardless of
whether the training data are linearly separable.
4/28/2025 Machine Learning Course- Dr G Madhu 120
Multilayer Networks and the Backpropagation Algorithm
• Single perceptrons can only express linear
decision surfaces.
• In contrast, the kind of multilayer networks learned by the BACKPROPAGATION algorithm are capable of expressing a rich variety of nonlinear decision surfaces.
• This section discusses how to learn such
multilayer networks using a gradient descent
algorithm.
4/28/2025 Machine Learning Course- Dr G Madhu 121
4/28/2025 Machine Learning Course- Dr G Madhu 122
• The network shown here was trained to recognize 1 of 10 vowel sounds occurring in
the context "h_d" (e.g., "had," "hid").
• The network input consists of two parameters, F1 and F2, obtained from a spectral
analysis of the sound.
• The 10 network outputs correspond to the 10 possible vowel sounds.
• The network prediction is the output whose value is highest. The plot on the right
illustrates the highly nonlinear decision surface represented by the learned network.
• Points shown on the plot are test examples distinct from the examples used to train
the network.
Source: Machine Learning, Tom Mitchell, McGraw Hill, 1997.
A Differentiable Threshold Unit
• What type of unit shall we use as the basis for
constructing multilayer networks?
• At first we might be tempted to choose the
linear units discussed in the previous section, for
which we have already derived a gradient
descent learning rule.
• However, multiple layers of cascaded linear
units still produce only linear functions, and we
prefer networks capable of representing highly
nonlinear functions.
4/28/2025 Machine Learning Course- Dr G Madhu 123
• The perceptron unit is another possible
choice, but its discontinuous threshold makes
it undifferentiable and hence unsuitable for
gradient descent.
• What we need is a unit whose output is a
nonlinear function of its inputs, but whose
output is also a differentiable function of its
inputs.
• One solution is the sigmoid unit:
– a unit very much like a perceptron, but based on a
smoothed, differentiable threshold function.
4/28/2025 Machine Learning Course- Dr G Madhu 124
The Sigmoid Threshold Unit
• The sigmoid unit is illustrated in following Figure.
• Like the perceptron, the sigmoid unit first computes a
linear combination of its inputs, then applies a
threshold to the result.
• In the case of the sigmoid unit, however, the threshold output is a continuous function of its input.
4/28/2025 Machine Learning Course- Dr G Madhu 125
4/28/2025 Machine Learning Course- Dr G Madhu 126
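In its standard form, the sigmoid unit computes:

o = σ(w⃗ · x⃗),   where   σ(y) = 1 / (1 + e^(−y))

The output of σ ranges over (0, 1), and its derivative is easily expressed in terms of its output, dσ(y)/dy = σ(y)·(1 − σ(y)), which is what makes the unit convenient for gradient descent.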
The BACKPROPAGATION Algorithm
• The BACKPROPAGATION Algorithm learns the
weights for a multilayer network, given a
network with a fixed set of units and
interconnections.
• It employs gradient descent to attempt to
minimize the squared error between the
network output values and the target values
for these outputs.
4/28/2025 Machine Learning Course- Dr G Madhu 127
Forward and Backward passes in Neural Networks
• To train a neural network, there are 2 passes
(phases):
– Forward
– Backward
• In the forward pass, we start by propagating
the data inputs to the input layer, go through
the hidden layer(s), measure the network’s
predictions from the output layer, and finally
calculate the network error based on the
predictions the network made.
4/28/2025 Machine Learning Course- Dr G Madhu 128
• This network error measures how far the network is from making the correct prediction.
• The forward and backward phases are repeated for a number of epochs. In each epoch, the following occurs:
1. The inputs are propagated from the input layer to the output layer.
2. The network error is calculated.
3. The error is propagated from the output layer to the input layer.
• In the backward pass, the flow is reversed: we start by propagating the error from the output layer back through the hidden layer(s) until reaching the input layer.
• The process of propagating the network error from the output layer to the input layer is called backward propagation, or simply backpropagation.
• The backpropagation algorithm is the set of steps used to update network weights to reduce the network error.
4/28/2025 Machine Learning Course- Dr G Madhu 131
• In the BACKPROPAGATION algorithm, we consider networks with multiple output units rather than single units as before, so we redefine E to sum the errors over all of the network output units.
4/28/2025 Machine Learning Course- Dr G Madhu 132
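In the same notation as before, the redefined error is:

E(w⃗) = ½ ∑_{d∈D} ∑_{k∈outputs} (t_kd − o_kd)²

where outputs is the set of output units of the network, and t_kd and o_kd are the target and actual output values of the k-th output unit for training example d.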
Algorithm
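The core update equations of the algorithm (the stochastic-gradient-descent version for a feedforward network of sigmoid units, following the Mitchell reference cited earlier) are, for each training example:
1. Propagate the input forward through the network and compute the output o_u of every unit u.
2. For each output unit k, compute its error term δ_k = o_k·(1 − o_k)·(t_k − o_k).
3. For each hidden unit h, compute δ_h = o_h·(1 − o_h)·∑_{k∈outputs} w_kh·δ_k.
4. Update each network weight: w_ji ← w_ji + Δw_ji, where Δw_ji = η·δ_j·x_ji and x_ji is the input from unit i into unit j.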
1.Convergence and Local Minima
• Backpropagation is only guaranteed to converge to a local, and not necessarily a global, minimum.
• However, since each weight in a network
essentially corresponds to a different dimension
in the error space, a local minimum with respect
to one weight may not be a local minimum with
respect to other weights.
• This can provide an “escape route” from
becoming trapped in local minima.
4/28/2025 Machine Learning Course- Dr G Madhu 142
• If the weights are initialized to values close to zero, the sigmoid threshold function is approximately linear, and so the network initially produces nearly linear outputs.
• As the weights grow, though, the network is able to represent more complex functions that are not linear in nature.
• The hope is that, by the time the weights are able to approximate the desired function, they will be close enough to the global minimum that even becoming stuck in a local minimum will be acceptable.
4/28/2025 Machine Learning Course- Dr G Madhu 143
Common Heuristic methods to reduce the problem of local minima
are:
• Add a momentum term to the weight-update rule (a common form is sketched after this list).
• Use stochastic gradient descent rather than true
gradient descent.
• Train multiple networks using the same training
data but initialize the networks with different
random weights.
• If the different networks lead to different local
minima, choose the network that performs best on
a validation set of data or all networks can be kept
and treated as a committee whose output is the
(possibly weighted) average of individual network
outputs.
4/28/2025 Machine Learning Course- Dr G Madhu 144
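One common form of the momentum modification mentioned in the first item above makes the weight update on iteration n depend partly on the update made on iteration n − 1:

Δw_ji(n) = η·δ_j·x_ji + α·Δw_ji(n − 1),   with momentum constant 0 ≤ α < 1

The momentum term tends to keep the search moving in the same direction from one iteration to the next, which can carry it through small local minima and speed progress along flat regions of the error surface.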
• A local minimum of a function
is a point where the function
value is smaller than at
nearby points, but possibly
greater than at a distant
point.
• A global minimum is a point
where the function value is
smaller than at all other
feasible points.
4/28/2025 Machine Learning Course- Dr G Madhu 145
Recurrent Neural Network
• A recurrent neural network (RNN) is a class of
artificial neural networks where connections
between nodes form a directed or undirected
graph along a temporal sequence.
• This allows it to exhibit temporal dynamic
behaviour.
• Derived from feedforward neural networks, RNNs can use their internal state (memory) to process variable-length sequences of inputs.
4/28/2025 Machine Learning Course- Dr G Madhu 152
4/28/2025 Machine Learning Course- Dr G Madhu 153
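One way to make the internal-state idea concrete (a generic formulation, not specific to these slides): a simple recurrent network keeps a hidden state h_t that is updated at every time step from the current input x_t and the previous state h_(t−1):

h_t = f(W_xh·x_t + W_hh·h_(t−1) + b_h),   y_t = g(W_hy·h_t + b_y)

Here f and g are activation functions (for example, tanh and softmax), and the same weight matrices W_xh, W_hh, W_hy are reused at every time step, which is what allows the network to process variable-length sequences.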
• Recurrent neural networks (RNNs) are state-of-the-art algorithms for sequential data and are used by Apple’s Siri and Google’s voice search.
• It is among the first kinds of algorithm that remember their input, due to an internal memory, which makes RNNs well suited for machine learning problems that involve sequential data.
4/28/2025 Machine Learning Course- Dr G Madhu 154
Comparison: Recurrent Neural Network (RNN) vs. Feed-forward Neural Network / Multilayer Perceptron (MLP)
Unit-5 madhu .pdf

  • 1.
    Machine Learning By Dr.G.MADHU M.Tech., Ph.D.,MIEEE., MCSI., MISTE., MISRS., MIRSS., MIAENG Professor, Department of Information Technology, VNR Vignana Jyothi Institute of Engineering & Technology, Bachupally, Nizampet (S.O.) Hyderabad- 500 090,RangaReddy Dt. TELANGANA, INDIA. Cell: +919849085728 E-mail: madhu_g@vnrvjiet.in Subject Code: 22PC1IT302
  • 2.
    Unit-5: Artificial NeuralNetworks Machine Learning Course- Dr G Madhu 2 4/28/2025
  • 3.
    • “Artificial NeuralNetworks or ANN is an information processing paradigm that is inspired by the way the biological nervous system such as brain process information. • It is composed of large number of highly interconnected processing elements (neurons) working in unison to solve a specific problem.” • The brain is a highly complex, nonlinear, and parallel computer (information-processing system). Machine Learning Course- Dr G Madhu 3 Introduction to Artificial Neural Networks 4/28/2025
  • 4.
    • Brain hasthe capability to organize its structural constituents, known as neurons, so as to perform certain computations (e.g., pattern recognition, perception, and motor control) many times faster than the fastest digital computer in existence today. • Consider, for example, human vision, which is an information-processing task. • It is the function of the visual system to provide a representation of the environment around us and, more important, to supply the information we need to interact with the environment. Machine Learning Course- Dr G Madhu 4 4/28/2025
  • 5.
    • To bespecific, the brain routinely accomplishes perceptual recognition tasks (e.g., recognizing a familiar face embedded in an unfamiliar scene) in approximately 100–200 ms, whereas tasks of much lesser complexity take a great deal longer on a powerful computer. 4/28/2025 Machine Learning Course- Dr G Madhu 5
  • 6.
    4/28/2025 Machine LearningCourse- Dr G Madhu 6 • Although artificial neurons and perceptrons were inspired by the biological processes scientists were able to observe in the brain back in the 50s, they do differ from their biological counterparts in several ways. • Birds have inspired flight and horses have inspired locomotives and cars, yet none of today’s transportation vehicles resemble metal skeletons of living-breathing-self replicating animals.
  • 7.
    • Def: Aneural network is a massively parallel distributed processor made up of simple processing units that has a natural propensity for storing experiential knowledge and making it available for use. • It resembles the brain in two respects: 1. Knowledge is acquired by the network from its environment through a learning process. 2. Inter-neuron connection strengths, known as synaptic weights, are used to store the acquired knowledge. 4/28/2025 Machine Learning Course- Dr G Madhu 7 • The procedure used to perform the learning process is called a learning algorithm
  • 8.
    Biological Neural Networks •A biological neural network is composed of a groups of chemically connected or functionally associated neurons. 4/28/2025 Machine Learning Course- Dr G Madhu 8
  • 9.
    4/28/2025 Machine LearningCourse- Dr G Madhu 9 https://en.wikipedia.org/wiki/Artificial_neural_network#/media/File:Neuron3.png
  • 10.
    • The humannervous system contains cells, which are referred to as neurons. • The neurons are connected to one another with the use of axons and dendrites, and the connecting regions between axons and dendrites are referred to as synapses. • Tree like nerve fibres called dendrites are associated with the cell body. • These dendrites receive signals from other neurons. 4/28/2025 Machine Learning Course- Dr G Madhu 10
  • 11.
    4/28/2025 Machine LearningCourse- Dr G Madhu 11
  • 12.
    4/28/2025 Machine LearningCourse- Dr G Madhu 12
  • 13.
    4/28/2025 Machine LearningCourse- Dr G Madhu 13 Source: https://www.kaggle.com/androbomb/simple-nn-with-python-multi-layer-perceptron
  • 14.
    • These dendritesreceive signals from other neurons. • Extending from the cell body is a single long fibre called the axon, which eventually branches into strands and substrands connecting to many other neurons at the synaptic junctions, or synapses. 4/28/2025 Machine Learning Course- Dr G Madhu 14
  • 15.
    Basic Notations 4/28/2025 1. Dendrite –Dendrites are responsible for getting incoming signals from outside 2. Soma – Soma is the cell body responsible for the processing of input signals and deciding whether a neuron should fire an output signal 3. Axon – Axon is responsible for getting processed signals from neuron to relevant cells 4. Synapse – Synapse is the connection between an axon and other neuron dendrites Machine Learning Course- Dr G Madhu 15
  • 16.
    What is Artificial Neuron? •An artificial neuron is a mathematical function conceived as a model of biological neurons, a neural network. • Artificial neurons are elementary units in an artificial neural network. 4/28/2025 Machine Learning Course- Dr G Madhu 16
  • 17.
    4/28/2025 Machine LearningCourse- Dr G Madhu 17 illustration of a single biological neuron annotated to describe a single artificial neurons function.
  • 18.
    4/28/2025 Machine LearningCourse- Dr G Madhu 18
  • 19.
    • A biologicalneuron receives input signals from its dendrites from other neurons and sends output signals along its axon, which branches out and connects to other neurons. • In the illustration above, the input signal is represented by x0 , as this signal ‘travels’ it is multiplied (w0 x0 ) based on the a weight variable (w0 ). • The weight variables are learnable and the weights strength and polarity (positive or negative) control the influence of the signal. 4/28/2025 Machine Learning Course- Dr G Madhu 19
  • 20.
    • The influenceis determined by summing the signal input and weight (∑wi xi + b) which is then calculated by the activation function f, if it is above a certain threshold the neuron fires. 4/28/2025 Machine Learning Course- Dr G Madhu 20
  • 21.
    Artificial Neurons • Artificialneuron also known as perceptron is the basic unit of the neural network. • In simple terms, it is a mathematical function based on a model of biological neurons. or A neuron is an information-processing unit that is fundamental to the operation of a neural network 4/28/2025 Machine Learning Course- Dr G Madhu 21
  • 22.
    What is ArtificialNeural Network (ANN)? • The human brain is considered the most complicated object in the universe. • Artificial Neural Network (ANN), which is a system of computing that is loosely modelled on the structure of the brain. 4/28/2025 Machine Learning Course- Dr G Madhu 22
  • 23.
    The Block Diagramof Model of a Neuron 4/28/2025 Machine Learning Course- Dr G Madhu 23 Fig.1. Nonlinear model of a neuron, labelled k.
  • 24.
    • In mathematicalterms, we may describe the neuron k depicted in above Fig.1 by writing the pair of equations: 4/28/2025 Machine Learning Course- Dr G Madhu 24
  • 25.
    • The useof bias bk has the effect of applying an affine transformation to the output uk of the linear combiner in the model of Fig.1, as shown by • In particular, depending on whether the bias bk is positive or negative, the relationship between the induced local field, or activation potential, vk of neuron k and the linear combiner output uk is modified in the manner illustrated in Fig. 2; 4/28/2025 Machine Learning Course- Dr G Madhu 25
  • 26.
    • hereafter, thesetwo terms are used interchangeably. • Note that as a result of this affine transformation, the graph of vk versus uk no longer passes through the origin. 4/28/2025 Machine Learning Course- Dr G Madhu 26 Fig.2. Affine transformation produced by the presence of a bias; note that vk=bk at uk=0
  • 27.
    • The biasbk is an external parameter of neuron k. We may account for its presence as in Eq. (2). Equivalently, we may formulate the combination of Eqs. (1) to (3) as follows: 4/28/2025 Machine Learning Course- Dr G Madhu 27
  • 28.
    • We maytherefore reformulate the model of neuron k as shown in Fig. 3. 4/28/2025 Machine Learning Course- Dr G Madhu 28
  • 29.
    4/28/2025 Machine LearningCourse- Dr G Madhu 29
  • 30.
    • The valuesof the two inputs(x1 ,x2 ) are 0.8 and 1.2 • We have a set of weights (1.0,0.75) corresponding to the two inputs • Then we have a bias with value 0.5 which needs to be added to the sum • The input to activation function is then calculated using the formula: 4/28/2025 Machine Learning Course- Dr G Madhu 30
  • 31.
    Biological Neuron vs.Artificial Neuron 4/28/2025 Machine Learning Course- Dr G Madhu 31
  • 32.
    NEURAL NETWORK REPRESENTATIONS 4/28/2025Machine Learning Course- Dr G Madhu 32
  • 33.
    NEURAL NETWORK REPRESENTATIONS 4/28/2025Machine Learning Course- Dr G Madhu 33
  • 34.
    Appropriate problems forANN Learning • ANN learning is well-suited to problems in which the training data corresponds to noisy, complex sensor data, such as inputs from cameras and microphones. • It is also applicable to problems for which more symbolic representations are often used, such as the decision tree learning tasks discussed in Chapter 2. • In these cases ANN and decision tree learning often produce results of comparable accuracy. 4/28/2025 Machine Learning Course- Dr G Madhu 34
  • 35.
    Appropriate problems forANN Learning • The BACKPROPAGATION algorithm is the most commonly used ANN learning technique. It is appropriate for problems with the following characteristics: 1. Instances are represented by many attribute-value pairs: The target function to be learned is defined over instances that can be described by a vector of predefined features, such as the pixel values in the ALVINN example. These input attributes may be highly correlated or independent of one another. Input values can be any real values. 4/28/2025 Machine Learning Course- Dr G Madhu 35
  • 36.
    Appropriate for problemsANN Learning 2. The target function output may be discrete-valued, real-valued, or a vector of several real- or discrete- valued attributes. – For example, in the ALVINN system the output is a vector of 30 attributes, each corresponding to a recommendation regarding the steering direction. – The value of each output is some real number between 0 and 1, which in this case corresponds to the confidence in predicting the corresponding steering direction. – We can also train a single network to output both the steering command and suggested acceleration, simply by concatenating the vectors that encode these two output predictions. 4/28/2025 Machine Learning Course- Dr G Madhu 36
  • 37.
    Appropriate problems forANN Learning 3. The training examples may contain errors. ANN learning methods are quite robust to noise in the training data. 4. Long training times are acceptable. Network training algorithms typically require longer training times than, say, decision tree learning algorithms. • Training times can range from a few seconds to many hours, depending on factors such as the number of weights in the network, the number of training examples considered, and the settings of various learning algorithm parameters 4/28/2025 Machine Learning Course- Dr G Madhu 37
  • 38.
    Appropriate problems forANN Learning 5. Fast evaluation of the learned target function may be required. – Although ANN learning times are relatively long, evaluating the learned network, in order to apply it to a subsequent instance, is typically very fast. – For example, ALVINN applies its neural network several times per second to continually update its steering command as the vehicle drives forward. 4/28/2025 Machine Learning Course- Dr G Madhu 38
  • 39.
    6. The abilityof humans to understand the learned target function is not important. – The weights learned by neural networks are often difficult for humans to interpret. Learned neural networks are less easily communicated to humans than learned rules. 4/28/2025 Machine Learning Course- Dr G Madhu 39 Appropriate problems for ANN Learning
  • 40.
    PERCEPTRONS • Artificial neuronalso known as perceptron is the basic unit of the neural network. • Any type of ANN system is based on a unit, called a perceptron. • A perceptron is a neural network unit (an artificial neuron) that does certain computations to detect features in the input data. 4/28/2025 Machine Learning Course- Dr G Madhu 40
  • 41.
    4/28/2025 Machine LearningCourse- Dr G Madhu 41 Perceptron was introduced by Frank Rosenblatt in 1957. He proposed a Perceptron learning rule based on the original MCP neuron.
  • 42.
    4/28/2025 Machine LearningCourse- Dr G Madhu 42
  • 43.
    How does itwork? • A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs is 1 if the result is greater than some threshold and -1 otherwise. • More precisely, given inputs x1 through xn ,the output o(x1 , . . . , xn ) computed by the perceptron is 4/28/2025 Machine Learning Course- Dr G Madhu 43
  • 44.
    • we willsometimes write the perceptron function as • Learning a perceptron involves choosing values for the weights wo , . . . , wn . • Therefore, the space H of candidate hypotheses considered in perceptron learning is the set of all possible real-valued weight vectors. 4/28/2025 Machine Learning Course- Dr G Madhu 44
  • 45.
    How the PerceptronAlgorithm Works 4/28/2025 Machine Learning Course- Dr G Madhu 45
  • 46.
    • Step-1: Assigna weight to each feature. – In this case, there are two features, so we have two weights. Set the initial values of the weights to 0. 4/28/2025 Machine Learning Course- Dr G Madhu 46
  • 47.
    • Step-2: Forthe first training example, take the sum of each feature value multiplied by its weight then add a bias term b which is also initially set to 0. 4/28/2025 Machine Learning Course- Dr G Madhu 47 Note : This represents an equation of a line. Currently, the line has 0 slope because we initialized the weights as 0. We will be updating the weights momentarily and this will result in the slope of the line converging to a value that separates the data linearly.
  • 48.
    • Step-3: Applya step function and assign the result as the output prediction. 4/28/2025 Machine Learning Course- Dr G Madhu 48 Note: Later, when learning about the multilayer perceptron, a different activation function will be used such as the sigmoid, RELU or Tanh function.
  • 49.
    • Step-4: Updatethe values of the weights and the bias term. • Step-5: Repeat steps 2,3 and 4 for each training example. • Step-6: Repeat until a specified number of iterations have not resulted in the weights changing or until the MSE (mean squared error) or MAE (mean absolute error) is lower than a specified value. • Step-7: Use the weights and bias to predict the output value of new observed values of x. 4/28/2025 Machine Learning Course- Dr G Madhu 49
  • 50.
    Illustrative Example 4/28/2025 MachineLearning Course- Dr G Madhu 50
  • 51.
    4/28/2025 Machine LearningCourse- Dr G Madhu 51
  • 52.
    4/28/2025 Machine LearningCourse- Dr G Madhu 52
  • 53.
    4/28/2025 Machine LearningCourse- Dr G Madhu 53
  • 54.
    4/28/2025 Machine LearningCourse- Dr G Madhu 54
  • 55.
    Challenges with ArtificialNeural Network (ANN) • While solving an image classification problem using ANN, the first step is to convert a 2-dimensional image into a 1-dimensional vector prior to training the model. • This has two drawbacks: – The number of trainable parameters increases drastically with an increase in the size of the image – ANN loses the spatial features of an image. Spatial features refer to the arrangement of the pixels in an image. 4/28/2025 Machine Learning Course- Dr G Madhu 55
  • 56.
    4/28/2025 Machine LearningCourse- Dr G Madhu 56 Comparing the Different Types of Neural Networks (MLP(ANN) vs. RNN vs. CNN)
  • 57.
    4/28/2025 Machine LearningCourse- Dr G Madhu 57
  • 58.
    Types of Perceptron's Thereare two types of Perceptrons: – Single layer and – Multilayer 2. Single-layer Perceptron can learn only linearly separable patterns. 3. Multilayer Perceptron or feedforward neural networks with two or more layers have greater processing power. 4. The Perceptron algorithm learns the input signal weights to draw a linear decision boundary. 5. This lets you distinguish between the two linearly separable classes +1 and -1. 4/28/2025 Machine Learning Course- Dr G Madhu 58
  • 59.
    Single layer Perceptron •A single layer perceptron (SLP) is a feed-forward network based on a threshold transfer function. • SLP is the simplest type of artificial neural networks and can only classify linearly separable cases with a binary target (1 , 0). • The single layer perceptron does not have a priori knowledge, so the initial weights are assigned randomly. 4/28/2025 Machine Learning Course- Dr G Madhu 59
  • 60.
    4/28/2025 Machine LearningCourse- Dr G Madhu 60 • SLP sums all the weighted inputs and if the sum is above the threshold (some predetermined value), SLP is said to be activated (output=1).
  • 61.
    Machine Learning Course-Dr G Madhu 4/28/2025 61 The input values are presented to the perceptron, and if the predicted output is the same as the desired output, then the performance is considered satisfactory and no changes to the weights are made. However, if the output does not match the desired output, then the weights need to be changed to reduce the error.
  • 62.
    Perceptron Weight Adjustment •Below is the equation in Perceptron weight adjustment: 4/28/2025 Machine Learning Course- Dr G Madhu 62 • Since this network model works with the linear classification and if the data is not linearly separable, then this model will not show the proper results.
  • 63.
    Representational Power ofPerceptrons 4/28/2025 Machine Learning Course- Dr G Madhu 63 A single perceptron can be used to represent many boolean functions. For example, if we assume boolean values of 1(true) and -1(false), then one way to use a two-input perceptron to implement the AND function is to set the weights w0=-0.8, and w1=w2=0.5.
• In fact, AND and OR can be viewed as special cases of m-of-n functions: that is, functions where at least m of the n inputs to the perceptron must be true.
• However, some boolean functions cannot be represented by a single perceptron, such as the XOR function.
4/28/2025 Machine Learning Course- Dr G Madhu 64
The decision surface represented by a two-input perceptron. x1 and x2 are the perceptron inputs. (a) A set of training examples and the decision surface of a perceptron that classifies them correctly. (b) A set of training examples that is not linearly separable.
4/28/2025 Machine Learning Course- Dr G Madhu 65
• Because the SLP is a linear classifier, if the cases are not linearly separable the learning process will never reach a point where all the cases are classified properly.
• The most famous example of the perceptron's inability to solve problems with linearly non-separable cases is the XOR problem.
4/28/2025 Machine Learning Course- Dr G Madhu 66
• However, a multi-layer perceptron using the backpropagation algorithm can successfully classify the XOR data.
4/28/2025 Machine Learning Course- Dr G Madhu 67
The Perceptron Training Rule
• How does a single perceptron learn the weights?
– The precise learning problem is to determine a weight vector that causes the perceptron to produce the correct +1 / -1 output for each of the given training examples.
• One way to learn an acceptable weight vector is:
1. to begin with random weights,
2. then iteratively apply the perceptron to each training example,
3. modifying the perceptron weights whenever it misclassifies an example;
4. this process is repeated until the perceptron classifies all training examples correctly.
4/28/2025 Machine Learning Course- Dr G Madhu 68
• Weights are modified at each step according to the perceptron training rule, which revises the weight wi associated with input xi.
4/28/2025 Machine Learning Course- Dr G Madhu 69
• It is usually set to some small value (e.g., 0.1) and is sometimes made to decay as the number of weight-tuning iterations increases.
• In fact, the above learning procedure can be proven to converge within a finite number of applications of the perceptron training rule to a weight vector that correctly classifies all training examples, provided the training examples are linearly separable and provided a sufficiently small learning rate η is used (see Minsky and Papert, 1969). If the data are not linearly separable, convergence is not assured. A sketch of this procedure is given below.
4/28/2025 Machine Learning Course- Dr G Madhu 70
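A compact Python sketch of this procedure (dataset handling and variable names are illustrative; the essential point is that the weights change only on misclassified examples):

    import numpy as np

    def train_perceptron(X, t, eta=0.1, max_epochs=100):
        # w[0] plays the role of the bias weight w0 (its input is fixed at 1)
        w = np.zeros(X.shape[1] + 1)
        for _ in range(max_epochs):
            errors = 0
            for x_i, t_i in zip(X, t):
                o_i = 1 if w[0] + np.dot(w[1:], x_i) > 0 else -1   # thresholded output
                if o_i != t_i:                                     # misclassified example
                    w[1:] += eta * (t_i - o_i) * x_i               # perceptron training rule
                    w[0] += eta * (t_i - o_i)
                    errors += 1
            if errors == 0:            # all training examples classified correctly
                break
        return w

For linearly separable data this loop stops after finitely many updates; otherwise it may cycle indefinitely, as noted above.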
Multi-Layer Perceptron
• One input layer, one output layer, and one or more hidden layers of processing units.
• No feedback connections (e.g. a Multi-Layer Perceptron).
4/28/2025 Machine Learning Course- Dr G Madhu 71
Questions
1. Make a perceptron that mimics logical AND, OR, NAND, NOT, NOR, etc.
2. Discuss the making of a perceptron that outputs 1 if at least m of n inputs are one.
3. Why can the perceptron model not learn XOR logic?
4. State the Perceptron Learning Algorithm and discuss its convergence.
5. Compare the Perceptron training rule and the gradient descent rule. Compare the incremental (stochastic) approximation to gradient descent with true gradient descent.
6. Discuss the representational power of a two-layer perceptron model versus a multilayer perceptron model.
4/28/2025 Machine Learning Course- Dr G Madhu 73
How a single perceptron can be used to represent Boolean functions such as AND, OR
4/28/2025 Machine Learning Course- Dr G Madhu 74
Example-1: Representation of the AND function
4/28/2025 Machine Learning Course- Dr G Madhu 75
Example-2: Representation of the AND function
4/28/2025 Machine Learning Course- Dr G Madhu 77
Example-3: Representation of the OR function
4/28/2025 Machine Learning Course- Dr G Madhu 78
Ans: Suppose the perceptron has two inputs A, B and a constant input 1.
4/28/2025 Machine Learning Course- Dr G Madhu 80
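One illustrative choice of weights for the constant input and for A and B (assuming 0/1 inputs and a threshold at 0; the actual numbers used in the slides' figures may differ):

    def gate(A, B, w0, wA, wB):
        # Perceptron with inputs A, B and a constant input 1 carrying weight w0
        return 1 if w0 * 1 + wA * A + wB * B > 0 else 0

    AND = lambda A, B: gate(A, B, w0=-1.5, wA=1.0, wB=1.0)   # fires only when A = B = 1
    OR  = lambda A, B: gate(A, B, w0=-0.5, wA=1.0, wB=1.0)   # fires when A = 1 or B = 1

    for A in (0, 1):
        for B in (0, 1):
            print(A, B, AND(A, B), OR(A, B))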
Q.2. Design a two-layer network of perceptrons that implements A XOR B.
Why can the perceptron model not learn XOR logic?
A Single Layer Perceptron Cannot Solve the "XOR" Problem
The XOR logical operator:
• XOR, or Exclusive OR, is a binary logical operator that takes in Boolean inputs and gives out True if and only if the two inputs are different.
• This logical operator is especially useful when we want to check two conditions that can't be simultaneously true. The following is the truth table for the XOR function.
4/28/2025 Machine Learning Course- Dr G Madhu 83
The XOR Problem
• The XOR problem is that we need to build a neural network (a perceptron in our case) to produce the truth table of the XOR logical operator.
• This is a binary classification problem, so supervised learning is a suitable way to solve it. In this case, we will be using perceptrons.
• Single-layer perceptrons can only work with linearly separable data.
• But in the following diagram, drawn in accordance with the truth table of the XOR logical operator, we can see that the data is NOT linearly separable.
4/28/2025 Machine Learning Course- Dr G Madhu 84
The Solution
• To solve this problem, we add an extra layer to our vanilla perceptron, i.e., we create a Multi-Layered Perceptron (MLP).
• We call this extra layer the hidden layer.
• To build such a perceptron, we first need to understand that the XOR gate can be written as a combination of AND gates, NOT gates and OR gates in the following way:
• a XOR b = (a AND NOT b) OR (b AND NOT a)
• The following is a plan for the perceptron.
4/28/2025 Machine Learning Course- Dr G Madhu 86
Here, we need to observe that our inputs are 0s and 1s. To make it a XOR gate, we will make the h1 node perform the (x2 AND NOT x1) operation, the h2 node perform the (x1 AND NOT x2) operation and the y node perform the (h1 OR h2) operation. The NOT gate can be produced for an input a by writing (1-a), the AND gate can be produced for inputs a and b by writing (a.b), and the OR gate can be produced for inputs a and b by writing (a+b). Also, we'll use the sigmoid function as our activation function σ, i.e., σ(x) = 1/(1+e^(-x)), and the threshold for classification will be 0.5, i.e., any x with σ(x) > 0.5 will be classified as 1 and others will be classified as 0.
4/28/2025 Machine Learning Course- Dr G Madhu 87
• Now, since we have all the information, we can go on to define h1, h2 and y.
• Using the constructions above for the AND, NOT and OR gates, we get:
– h1 = σ(x2·(1-x1))   (x2 AND NOT x1)
– h2 = σ(x1·(1-x2))   (x1 AND NOT x2)
– y = σ(h1 + h2)      (h1 OR h2), where h1 and h2 are first thresholded at 0.5 to 0/1 values
Hence, we have built a multi-layered perceptron with these weights, and it predicts the output of the XOR logical operator (see the sketch below).
4/28/2025 Machine Learning Course- Dr G Madhu 89
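A small sketch of this hand-built network (hidden outputs are thresholded at 0.5 before being fed to the output node, as described above):

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def classify(p):                            # 1 if sigma(.) > 0.5, else 0
        return 1 if p > 0.5 else 0

    def xor(x1, x2):
        h1 = classify(sigmoid(x2 * (1 - x1)))   # h1 = x2 AND NOT x1
        h2 = classify(sigmoid(x1 * (1 - x2)))   # h2 = x1 AND NOT x2
        return classify(sigmoid(h1 + h2))       # y  = h1 OR h2

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, xor(x1, x2))          # reproduces the XOR truth table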
Q.2. Design a two-layer network of perceptrons that implements A XOR B.
4/28/2025 Machine Learning Course- Dr G Madhu 91
• Drawback of the Perceptron:
– The perceptron rule finds a successful weight vector when the training examples are linearly separable, but it can fail to converge if the examples are not linearly separable.
• The Perceptron Training Rule
– The learning problem is to determine a weight vector that causes the perceptron to produce the correct +1 or -1 output for each of the given training examples.
4/28/2025 Machine Learning Course- Dr G Madhu 93
To learn an acceptable weight vector
• Begin with random weights, then iteratively apply the perceptron to each training example, modifying the perceptron weights whenever it misclassifies an example.
• This process is repeated, iterating through the training examples as many times as needed, until the perceptron classifies all training examples correctly.
4/28/2025 Machine Learning Course- Dr G Madhu 94
• Weights are modified at each step according to the perceptron training rule, which revises the weight wi associated with input xi according to the rule.
4/28/2025 Machine Learning Course- Dr G Madhu 95
• The role of the learning rate is to moderate the degree to which weights are changed at each step.
• It is usually set to some small value (e.g., 0.1) and is sometimes made to decay as the number of weight-tuning iterations increases.
Drawback:
• The perceptron rule finds a successful weight vector when the training examples are linearly separable, but it can fail to converge if the examples are not linearly separable.
4/28/2025 Machine Learning Course- Dr G Madhu 96
State the Perceptron Learning Algorithm and Discuss its Convergence
• The perceptron convergence theorem states that the perceptron learning algorithm converges in a finite number of steps, given a linearly separable dataset.
4/28/2025 Machine Learning Course- Dr G Madhu 97
Gradient Descent and the Delta Rule
• The perceptron rule finds a successful weight vector when the training examples are linearly separable.
• It can fail to converge if the examples are not linearly separable.
• A second training rule, called the delta rule, is designed to overcome this difficulty.
• If the training examples are not linearly separable, the delta rule converges toward a best-fit approximation to the target concept.
4/28/2025 Machine Learning Course- Dr G Madhu 101
Gradient Descent and the Delta Rule
• The key idea behind the delta rule is to use gradient descent to search the hypothesis space of possible weight vectors to find the weights that best fit the training examples.
• This rule is important because gradient descent provides the basis for the BACKPROPAGATION algorithm, which can learn networks with many interconnected units.
4/28/2025 Machine Learning Course- Dr G Madhu 102
Gradient Descent and the Delta Rule
• It is also important because gradient descent can serve as the basis for learning algorithms that must search through hypothesis spaces containing many different types of continuously parameterized hypotheses.
• Gradient Descent: an optimization algorithm used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function (cost).
4/28/2025 Machine Learning Course- Dr G Madhu 103
To understand the gradient descent algorithm, it is helpful to visualize the entire hypothesis space of possible weight vectors and their associated E values, as illustrated in the figure.
• Here the axes w0 and w1 represent possible values for the two weights of a simple linear unit.
• The w0, w1 plane therefore represents the entire hypothesis space.
• The vertical axis indicates the error E relative to some fixed set of training examples.
• The error surface shown in the figure thus summarizes the desirability of every weight vector in the hypothesis space (we desire a hypothesis with minimum error).
Source: Machine Learning, Tom Mitchell, McGraw Hill, 1997.
4/28/2025 Machine Learning Course- Dr G Madhu 107
• Gradient descent search determines a weight vector that minimizes E by starting with an arbitrary initial weight vector, then repeatedly modifying it in small steps.
4/28/2025 Machine Learning Course- Dr G Madhu 110
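A minimal batch gradient-descent (delta rule) sketch for a single linear unit o = w · x (the initialization, learning rate and epoch count are illustrative):

    import numpy as np

    def gradient_descent_linear_unit(X, t, eta=0.01, epochs=500):
        w = np.random.uniform(-0.05, 0.05, X.shape[1])   # small random initial weights
        for _ in range(epochs):
            o = X @ w                      # outputs for all training examples
            grad_E = -(t - o) @ X          # dE/dw for E = 1/2 * sum_d (t_d - o_d)^2
            w -= eta * grad_E              # small step opposite the gradient
        return w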
DERIVATION OF THE GRADIENT DESCENT RULE
4/28/2025 Machine Learning Course- Dr G Madhu 111
• Since the gradient specifies the direction of steepest increase of E, the training rule for gradient descent is:
4/28/2025 Machine Learning Course- Dr G Madhu 112
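The rule referred to here (in the form given by Mitchell, 1997, whom these slides cite) is:

    Δwi = −η ∂E/∂wi,   so   wi ← wi + Δwi

and, for the squared-error measure E(w) = ½ Σd (td − od)², the gradient component works out to

    ∂E/∂wi = Σd (td − od)(−xid),   giving   Δwi = η Σd (td − od) xid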
Source: Machine Learning, Tom Mitchell, McGraw Hill, 1997.
4/28/2025 Machine Learning Course- Dr G Madhu 113
Features of the Gradient Descent Algorithm
4/28/2025 Machine Learning Course- Dr G Madhu 115
Stochastic Approximation to Gradient Descent
4/28/2025 Machine Learning Course- Dr G Madhu 116
Differences Between Standard Gradient Descent and Stochastic Gradient Descent
4/28/2025 Machine Learning Course- Dr G Madhu 118
Remarks
• We have considered two similar algorithms for iteratively learning perceptron weights.
• The key difference between these algorithms is that the perceptron training rule updates weights based on the error in the thresholded perceptron output, whereas the delta rule updates weights based on the error in the un-thresholded linear combination of inputs.
4/28/2025 Machine Learning Course- Dr G Madhu 119
• The difference between these two training rules is reflected in their different convergence properties.
– The perceptron training rule converges after a finite number of iterations to a hypothesis that perfectly classifies the training data, provided the training examples are linearly separable.
– The delta rule converges only asymptotically toward the minimum-error hypothesis, possibly requiring unbounded time, but it converges regardless of whether the training data are linearly separable.
4/28/2025 Machine Learning Course- Dr G Madhu 120
Multilayer Networks and the Backpropagation Algorithm
• Single perceptrons can only express linear decision surfaces.
• In contrast, the kind of multilayer networks learned by the BACKPROPAGATION algorithm are capable of expressing a rich variety of nonlinear decision surfaces.
• This section discusses how to learn such multilayer networks using a gradient descent algorithm.
4/28/2025 Machine Learning Course- Dr G Madhu 121
• The network shown here was trained to recognize 1 of 10 vowel sounds occurring in the context "h_d" (e.g., "had," "hid").
• The network input consists of two parameters, F1 and F2, obtained from a spectral analysis of the sound.
• The 10 network outputs correspond to the 10 possible vowel sounds.
• The network prediction is the output whose value is highest. The plot on the right illustrates the highly nonlinear decision surface represented by the learned network.
• Points shown on the plot are test examples distinct from the examples used to train the network.
Source: Machine Learning, Tom Mitchell, McGraw Hill, 1997.
4/28/2025 Machine Learning Course- Dr G Madhu 122
A Differentiable Threshold Unit
• What type of unit shall we use as the basis for constructing multilayer networks?
• At first we might be tempted to choose the linear units discussed in the previous section, for which we have already derived a gradient descent learning rule.
• However, multiple layers of cascaded linear units still produce only linear functions, and we prefer networks capable of representing highly nonlinear functions.
4/28/2025 Machine Learning Course- Dr G Madhu 123
• The perceptron unit is another possible choice, but its discontinuous threshold makes it undifferentiable and hence unsuitable for gradient descent.
• What we need is a unit whose output is a nonlinear function of its inputs, but whose output is also a differentiable function of its inputs.
• One solution is the sigmoid unit:
– a unit very much like a perceptron, but based on a smoothed, differentiable threshold function.
4/28/2025 Machine Learning Course- Dr G Madhu 124
The Sigmoid Threshold Unit
• The sigmoid unit is illustrated in the following figure.
• Like the perceptron, the sigmoid unit first computes a linear combination of its inputs, then applies a threshold to the result.
• In the case of the sigmoid unit, however, the threshold output is a continuous function of its input.
4/28/2025 Machine Learning Course- Dr G Madhu 125
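A small Python sketch of the sigmoid unit (the derivative shown in the last function is the property gradient descent and backpropagation rely on):

    import numpy as np

    def sigmoid(net):
        # Squashing function: output = 1 / (1 + e^(-net))
        return 1.0 / (1.0 + np.exp(-net))

    def sigmoid_unit(w, x):
        # Linear combination of the inputs, passed through the sigmoid
        return sigmoid(np.dot(w, x))

    def sigmoid_derivative(net):
        # d(sigmoid)/d(net) = o * (1 - o), where o = sigmoid(net)
        o = sigmoid(net)
        return o * (1.0 - o)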
The BACKPROPAGATION Algorithm
• The BACKPROPAGATION algorithm learns the weights for a multilayer network, given a network with a fixed set of units and interconnections.
• It employs gradient descent to attempt to minimize the squared error between the network output values and the target values for these outputs.
4/28/2025 Machine Learning Course- Dr G Madhu 127
Forward and Backward Passes in Neural Networks
• To train a neural network, there are 2 passes (phases):
– Forward
– Backward
• In the forward pass, we start by propagating the data inputs to the input layer, go through the hidden layer(s), measure the network’s predictions from the output layer, and finally calculate the network error based on the predictions the network made.
4/28/2025 Machine Learning Course- Dr G Madhu 128
• This network error measures how far the network is from making the correct prediction. The forward and backward phases are repeated for a number of epochs. In each epoch, the following occurs:
1. The inputs are propagated from the input layer to the output layer.
2. The network error is calculated.
3. The error is propagated from the output layer back to the input layer.
4/28/2025 Machine Learning Course- Dr G Madhu 130
• In the backward pass, the flow is reversed: we start by propagating the error from the output layer back toward the input layer, passing through the hidden layer(s).
• The process of propagating the network error from the output layer to the input layer is called backward propagation, or simply backpropagation.
• The backpropagation algorithm is the set of steps used to update the network weights to reduce the network error; a condensed sketch is given below.
4/28/2025 Machine Learning Course- Dr G Madhu 131
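A condensed sketch of one forward pass and one backward pass for a network with a single hidden layer of sigmoid units, trained on squared error with per-example (stochastic) updates; bias terms are omitted, and the weight-matrix names and learning rate are illustrative assumptions:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def backprop_step(x, t, W_h, W_o, eta=0.05):
        # Forward pass: input layer -> hidden layer -> output layer
        h = sigmoid(W_h @ x)                       # hidden activations
        o = sigmoid(W_o @ h)                       # network outputs

        # Backward pass: propagate the error from the output layer toward the input
        delta_o = o * (1 - o) * (t - o)            # error terms of the output units
        delta_h = h * (1 - h) * (W_o.T @ delta_o)  # error terms of the hidden units

        # Update the weights to reduce the squared error
        W_o = W_o + eta * np.outer(delta_o, h)
        W_h = W_h + eta * np.outer(delta_h, x)
        return W_h, W_o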
• In the BACKPROPAGATION algorithm, we consider networks with multiple output units rather than single units as before, so we redefine E to sum the errors over all of the network output units.
4/28/2025 Machine Learning Course- Dr G Madhu 132
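With multiple output units, the redefined error measure (as in Mitchell, 1997) is:

    E(w) = ½ Σd∈D Σk∈outputs (tkd − okd)²

where D is the set of training examples, and tkd and okd are the target and actual values of the k-th output unit for training example d.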
1. Convergence and Local Minima
• Backpropagation is only guaranteed to converge to a local, and not necessarily a global, minimum.
• However, since each weight in a network essentially corresponds to a different dimension in the error space, a local minimum with respect to one weight may not be a local minimum with respect to other weights.
• This can provide an “escape route” from becoming trapped in local minima.
4/28/2025 Machine Learning Course- Dr G Madhu 142
• If the weights are initialized to values close to zero, the sigmoid threshold function is approximately linear, and so the network produces approximately linear outputs.
• As the weights grow, though, the network is able to represent more complex functions that are not linear in nature.
• The hope is that, by the time the weights are able to approximate the desired function, they will be close enough to the global minimum that even becoming stuck in a local minimum will be acceptable.
4/28/2025 Machine Learning Course- Dr G Madhu 143
Common heuristic methods to reduce the problem of local minima are:
• Add a momentum term to the weight-update rule (see the update rule sketched after this list).
• Use stochastic gradient descent rather than true gradient descent.
• Train multiple networks using the same training data but initialize the networks with different random weights.
• If the different networks lead to different local minima, choose the network that performs best on a validation set of data, or all networks can be kept and treated as a committee whose output is the (possibly weighted) average of the individual network outputs.
4/28/2025 Machine Learning Course- Dr G Madhu 144
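The momentum modification mentioned in the first bullet is commonly written (following Mitchell, 1997) as:

    Δwji(n) = η δj xji + α Δwji(n − 1),   with 0 ≤ α < 1

where the second term adds a fraction α of the previous iteration's update, helping the search roll through small local minima and flat regions of the error surface.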
• A local minimum of a function is a point where the function value is smaller than at nearby points, but possibly greater than at a distant point.
• A global minimum is a point where the function value is smaller than at all other feasible points.
4/28/2025 Machine Learning Course- Dr G Madhu 145
Recurrent Neural Network
• A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes form a directed or undirected graph along a temporal sequence.
• This allows it to exhibit temporal dynamic behaviour.
• Derived from feedforward neural networks, RNNs can use their internal state (memory) to process variable-length sequences of inputs.
4/28/2025 Machine Learning Course- Dr G Madhu 152
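A minimal sketch of this internal state (memory) for a vanilla RNN cell (the weight names, shapes, and tanh nonlinearity are illustrative assumptions):

    import numpy as np

    def rnn_forward(inputs, W_xh, W_hh, b_h):
        # Process a variable-length sequence, carrying a hidden state h between steps
        h = np.zeros(W_hh.shape[0])                   # internal state (memory)
        states = []
        for x_t in inputs:                            # one time step per sequence element
            h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)  # new state depends on input and old state
            states.append(h)
        return states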
• Recurrent neural networks (RNNs) are a state-of-the-art algorithm for sequential data and are used by Apple's Siri and Google's voice search.
• It is the first algorithm that remembers its input, due to an internal memory, which makes it well suited for machine learning problems that involve sequential data.
4/28/2025 Machine Learning Course- Dr G Madhu 154
Recurrent Neural Network (RNN) vs. Feed-forward Neural Network – Multilayer Perceptron (MLP)
4/28/2025 Machine Learning Course- Dr G Madhu 156