NEURAL NETWORKS AND DEEP LEARNING (PEC-CS702A)
CONTENTS:
1. What is an Artificial Neural Network?
2. What is the McCulloch-Pitts model?
3. What are weights and biases?
4. Various layers
5. Activation functions
6. Examples of activation functions
7. Feed forward neural network
8. Single layer perceptron and the problem of the single layer perceptron
9. Multilayer perceptron
10. Types of neural networks
11. Various paradigms of learning problems
12. Perspectives and issues in deep learning frameworks
13. Cardinality of feed forward neural networks
14. Properties of fuzzy relations
What is an Artificial Neural Network?
• Neural networks, also known as artificial neural networks
(ANNs) or simulated neural networks (SNNs), are a subset of
ML and are at the heart of deep learning algorithms. Their
name and structure are inspired by the human brain,
mimicking the way that biological neurons signal to one
another.
• Artificial neural networks (ANNs) are composed of node layers: an input layer, one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to others and has an associated weight and threshold. If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network. Otherwise, no data is passed along to the next layer of the network.
• Neural networks rely on training data to
learn and improve their accuracy over time.
However, once these learning algorithms are
fine-tuned for accuracy, they are powerful
tools in computer science and AI, allowing us
to classify and cluster data at a high velocity.
Tasks in speech recognition or image
recognition can take minutes versus hours
when compared to the manual identification
by human experts. One of the most well-
known neural networks is Google’s search
algorithm.
HOW DO NEURAL NETWORKS WORK?
A simple neural network includes an input layer, an output (or target) layer and, in between, a
hidden layer. The layers are connected via nodes, and these connections form a “network” – the
neural network – of interconnected nodes.
A node is patterned after a neuron in a human brain. Similar in behavior to neurons, nodes are
activated when there is sufficient stimuli or input. This activation spreads throughout the network,
creating a response to the stimuli (output). The connections between these artificial neurons act as
simple synapses, enabling signals to be transmitted from one to another. Signals travel across layers from the first (input) layer to the last (output) layer and get processed along the way.
When posed with a request or problem to solve, the neurons run mathematical calculations to
figure out if there’s enough information to pass on the information to the next neuron. Put more
simply, they read all the data and figure out where the strongest relationships exist. In the simplest
type of network, data inputs received are added up, and if the sum is more than a certain threshold
value, the neuron “fires” and activates the neurons it’s connected to.
As the number of hidden layers within a neural network increases, deep neural networks are
formed. DL architectures take simple neural networks to the next level. Using these layers, data
scientists can build their own deep learning networks that enable ML, which can train a computer
to accurately emulate human tasks, such as recognizing speech, identifying images or making
predictions. Equally important, the computer can learn on its own by recognizing patterns in many
layers of processing.
So let’s put this definition into action. Data is fed into a neural network through the input layer,
which communicates to hidden layers. Processing takes place in the hidden layers through a
system of weighted connections. Nodes in the hidden layer combine data from the input layer with a set of coefficients and assign appropriate weights to the inputs. These input-weight products
are then summed up. The sum is passed through a node’s activation function, which determines
the extent that a signal must progress further through the network to affect the final output. Finally,
the hidden layers link to the output layer – where the outputs are retrieved.
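As a minimal sketch of that computation for a single hidden node (assuming NumPy and a sigmoid activation; the input values and weights below are made-up illustrations):

import numpy as np

def sigmoid(z):
    # squashes the weighted sum into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.1, 0.9])    # values arriving from the input layer
w = np.array([0.4, -0.2, 0.7])   # weights (coefficients) on the incoming connections
b = 0.1                          # bias of the node

z = np.dot(w, x) + b             # sum of the input-weight products, plus bias
a = sigmoid(z)                   # activation decides how strongly the signal progresses
print(a)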
WHAT IS THE MCCULLOCH-PITTS MODEL?
It is very well known that the most fundamental unit of deep neural networks is called an artificial
neuron/perceptron. But the very first step towards the perceptron we use today was taken in 1943 by
McCulloch and Pitts, by mimicking the functionality of a biological neuron.
Basically, a neuron takes an input signal (dendrite), processes it like a CPU (soma), and passes the output through a cable-like structure to other connected neurons (axon to synapse to the next neuron’s dendrite). This is biologically simplified, as there is a lot more going on, but at a high level this is what a neuron in our brain does: it takes an input, processes it, and throws out an output.
Our sense organs interact with the outer world and send visual and sound information to the neurons. Let's say you are watching Friends. The information your brain receives is taken in by the “laugh or not” set of neurons that help you decide whether to laugh or not. Each neuron gets fired/activated only when its respective criteria (more on this later) are met, as shown below.
Of course, this is not entirely true. In reality, it is not just a couple of neurons which would do the
decision making. There is a massively parallel interconnected network of 10¹¹ neurons (100 billion) in
our brain and their connections are not as simple as I showed you above. It might look something like
this:
Now the sense organs pass the information to the first/lowest layer of neurons to process it. And
the output of the processes is passed on to the next layers in a hierarchical manner, some of the
neurons will fire and some won’t and this process goes on until it results in a final response — in
this case, laughter.
This massively parallel network also ensures that there is a division of work. Each neuron only
fires when its intended criteria is met i.e., a neuron may perform a certain role to a certain
stimulus, as shown below.
The McCulloch-Pitts neuron may be divided into 2 parts. The first part, g, takes an input (ahem, dendrite, ahem), performs an aggregation, and based on the aggregated value the second part, f, makes a decision.
Let's suppose that I want to predict my own decision, whether or not to watch a random football game on TV. The inputs are all boolean, i.e., {0,1}, and my output variable is also boolean: {1: Will watch it, 0: Won't watch it}.
• So, x_1 could be isPremierLeagueOn (I like Premier League more)
• x_2 could be isItAFriendlyGame (I tend to care less about the friendlies)
• x_3 could be isNotHome (Can’t watch it when I’m running errands. Can I?)
• x_4 could be isManUnitedPlaying (I am a big Man United fan. GGMU!) and so on.
• These inputs can either be excitatory or inhibitory. Inhibitory inputs are those that
have maximum effect on the decision making irrespective of other inputs i.e., if x_3 is 1
(not home) then my output will always be 0 i.e., the neuron will never fire, so x_3 is an
inhibitory input. Excitatory inputs are NOT the ones that will make the neuron fire on
their own but they might fire it when combined together. Formally, this is what is going
on:
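The original slide shows the formal McCulloch-Pitts equations as an image. Restated in text: the first part aggregates the boolean inputs, g(x_1, ..., x_n) = x_1 + x_2 + ... + x_n (inhibitory inputs excluded), and the second part fires, y = f(g(x)) = 1, only if g(x) >= θ for some fixed threshold θ, with y forced to 0 whenever any inhibitory input is 1. A minimal Python sketch of this unit using the football example above (the threshold value of 2 is an illustrative assumption, not taken from the slide):

def mp_neuron(inputs, inhibitory, theta):
    # McCulloch-Pitts unit: boolean inputs, unit weights, fixed threshold theta
    if any(inputs[i] for i in inhibitory):
        return 0                      # an active inhibitory input keeps the neuron off
    g = sum(x for i, x in enumerate(inputs) if i not in inhibitory)
    return 1 if g >= theta else 0     # f: fire only if the aggregate reaches the threshold

# inputs are [x_1, x_2, x_3, x_4]; x_3 (isNotHome) is the inhibitory one
print(mp_neuron([1, 0, 0, 1], inhibitory={2}, theta=2))  # 1: Premier League, Man United, at home
print(mp_neuron([1, 0, 1, 1], inhibitory={2}, theta=2))  # 0: not at home, the neuron never fires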
WEIGHTS IN NEURAL NETWORKS:
• Weight is the parameter
within a neural network that
transforms input data within
the network's hidden layers. A
neural network is a series of
nodes, or neurons.
BIAS IN NEURAL NETWORKS:
• Bias in Neural Networks can
be thought of as analogous to
the role of a constant in a
linear function, whereby the
line is effectively transposed
by the constant value. In a
scenario with no bias, the
input to the activation function
is 'x' multiplied by the
connection weight 'w0'.
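In symbols, with no bias the input to the activation function is z = w0 * x, while with a bias term b it becomes z = w0 * x + b; the bias therefore shifts the activation curve left or right, exactly like the constant term shifts a line in a linear function.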
VARIOUS LAYERS OF A NEURAL NETWORK:
• There are four common types of neural network layers:
• 1. Fully Connected Layer
• 2. Convolution Layer
• 3. Deconvolution Layer
• 4. Recurrent Layer
FULLY CONNECTED LAYER
• A fully connected layer is simply a feed-forward layer in which every input is connected to every node. Fully connected layers form the last few layers in the network. The input to the fully connected layer is the output from the final pooling or convolutional layer, which is flattened and then fed into the fully connected layer.
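A minimal sketch of that flatten-then-fully-connected step (assuming NumPy; the feature-map and layer sizes are illustrative assumptions):

import numpy as np

feature_maps = np.random.rand(8, 8, 32)   # output of the final pooling/convolutional layer
flattened = feature_maps.reshape(-1)      # 8*8*32 = 2048 values fed to the fully connected layer

W = np.random.rand(10, flattened.size)    # one row of weights per output node
b = np.zeros(10)
outputs = W @ flattened + b               # the fully connected computation: weights times inputs plus bias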
CONVOLUTION LAYER
• Convolutional layers are the
layers where filters are
applied to the original image,
or to other feature maps in a
deep CNN. This is where most
of the user-specified
parameters are in the
network. The most important
parameters are the number of
kernels and the size of the
kernels.
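As a hedged illustration (assuming the Keras API; the specific numbers are arbitrary), the two parameters highlighted above appear directly as arguments:

import tensorflow as tf

# 32 kernels, each 3x3: the number of kernels and the kernel size are the user-specified parameters
conv = tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), activation="relu")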
DECONVOLUTION LAYER:
• Deconvolutional networks
are convolutional neural networks
(CNN) that work in a reversed process.
Deconvolutional networks, also known
as deconvolutional neural networks, are
very similar in nature to CNNs run in
reverse but are a distinct application of
artificial intelligence (AI).
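A corresponding hedged sketch (again assuming the Keras API, with arbitrary numbers), where a transposed (de)convolution upsamples feature maps instead of shrinking them:

import tensorflow as tf

# roughly reverses a stride-2 convolution, doubling the spatial size of the feature maps
deconv = tf.keras.layers.Conv2DTranspose(filters=16, kernel_size=(3, 3), strides=2, padding="same")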
RECURRENT LAYER:
• Recurrent layers are used to construct recurrent networks. Recurrent layers can be used similarly to feed-forward layers, except that the input shape is expected to be (batch_size, sequence_length, num_inputs).
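A minimal NumPy sketch of what a recurrent layer does with a tensor of that shape; the sizes and random weights are illustrative assumptions:

import numpy as np

batch_size, sequence_length, num_inputs, num_units = 4, 10, 3, 8
x = np.random.rand(batch_size, sequence_length, num_inputs)

W_xh = np.random.rand(num_inputs, num_units)   # input-to-hidden weights
W_hh = np.random.rand(num_units, num_units)    # hidden-to-hidden (recurrent) weights
h = np.zeros((batch_size, num_units))          # initial hidden state

for t in range(sequence_length):
    # every time step reuses the same weights and the hidden state from the previous step
    h = np.tanh(x[:, t, :] @ W_xh + h @ W_hh)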
ACTIVATION FUNCTIONS:
• A neural network without an activation function is essentially just a linear regression model. The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks.
EXAMPLES OF ACTIVATION FUNCTIONS
• The internet provides access to a plethora of information today. Whatever we need is just a Google (search) away. However, when we have so much information, the challenge is to segregate relevant from irrelevant information.
• When our brain is fed with a lot of information simultaneously, it tries hard to understand and classify the information into “useful” and “not-so-useful” information. We need a similar mechanism for classifying incoming information as “useful” or “less-useful” in the case of neural networks.
• 1. Binary Step Function
• The first thing that comes to mind for an activation function is a threshold-based classifier, i.e., whether or not the neuron should be activated based on the value from the linear transformation.
• In other words, if the input to the activation function is greater than a threshold, then the neuron is activated; otherwise it is deactivated, i.e., its output is not considered for the next hidden layer.
2. Linear Function
We saw the problem with the step function: its gradient is zero, because there is no component of x in the binary step function. Instead of a binary function, we can use a linear function.
3. Sigmoid
The next activation function that we are going to look at is the Sigmoid function. It is one of the most widely used non-linear activation functions. Sigmoid transforms the values into the range between 0 and 1.
4. Tanh
The tanh function is very similar to the sigmoid function. The only difference is that it is symmetric around the
origin. The range of values in this case is from -1 to 1. Thus the inputs to the next layers will not always be of the
same sign.
5. ReLU
The ReLU function is another non-linear activation function that has gained popularity in the deep learning
domain. ReLU stands for Rectified Linear Unit. The main advantage of using the ReLU function over other
activation functions is that it does not activate all the neurons at the same time.
This means that the neurons will only be deactivated if the output of the linear transformation is less than 0.
6. Leaky ReLU
The Leaky ReLU function is nothing but an improved version of the ReLU function. As we saw, for the ReLU function the gradient is 0 for x<0, which deactivates the neurons in that region.
Leaky ReLU is defined to address this problem. Instead of defining the ReLU function as 0 for negative values of x, we define it as an extremely small linear component of x.
7. Parameterised ReLU
This is another variant of ReLU that aims to solve the problem of the gradient becoming zero for the left half of the axis. The parameterised ReLU, as the name suggests, introduces a new learnable parameter as the slope of the negative part of the function.
8. Exponential Linear Unit
Exponential Linear Unit, or ELU for short, is also a variant of the Rectified Linear Unit (ReLU) that modifies the slope of the negative part of the function. Unlike the Leaky ReLU and parametric ReLU functions, instead of a straight line, ELU uses an exponential curve for defining the negative values.
9. Swish
Swish is a lesser-known activation function discovered by researchers at Google. Swish is as computationally efficient as ReLU and shows better performance than ReLU on deeper models. The values for Swish range from negative infinity to infinity.
10. Softmax
Softmax function is often described as a combination of multiple sigmoids. We know that sigmoid returns
values between 0 and 1, which can be treated as probabilities of a data point belonging to a particular
class. Thus sigmoid is widely used for binary classification problems.
The softmax function can be used for multiclass classification problems. This function returns the probability of a data point belonging to each individual class.
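A compact NumPy sketch of the activation functions listed above (the slope for Leaky ReLU and the alpha for ELU are common illustrative defaults, not values taken from the slides):

import numpy as np

def binary_step(x): return np.where(x >= 0, 1, 0)
def linear(x): return x
def sigmoid(x): return 1 / (1 + np.exp(-x))
def tanh(x): return np.tanh(x)
def relu(x): return np.maximum(0, x)
def leaky_relu(x, slope=0.01): return np.where(x > 0, x, slope * x)
def prelu(x, a): return np.where(x > 0, x, a * x)            # a is a learned parameter
def elu(x, alpha=1.0): return np.where(x > 0, x, alpha * (np.exp(x) - 1))
def swish(x): return x * sigmoid(x)
def softmax(x):
    e = np.exp(x - np.max(x))                                 # shift for numerical stability
    return e / e.sum()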
FEED FORWARD NEURAL NETWORK
• Deep feedforward networks, also often
called feedforward neural networks or multilayer perceptrons (MLPs), are the quintessential deep learning
models. The goal of a feedforward network is to
approximate some function f*. For example, for a
classifier, y = f*(x) maps an input x to a category y. A
feedforward network defines a mapping y = f(x;θ) and
learns the value of the parameters θ that result in the
best function approximation.
• These models are called feedforward because
information flows through the function being evaluated
from x, through the intermediate computations used to
define f, and finally to the output y. There are no
feedback connections in which outputs of the model are
fed back into itself. When feedforward neural networks
are extended to include feedback connections, they are
called recurrent neural networks (we will see these in a later segment).
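A minimal sketch of the mapping y = f(x; θ) for a one-hidden-layer feedforward network, where θ collects the weight matrices and bias vectors (the layer sizes here are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def feedforward(x, theta):
    W1, b1, W2, b2 = theta            # theta: all learnable parameters
    h = sigmoid(W1 @ x + b1)          # information flows forward through the hidden layer
    y = W2 @ h + b2                   # and on to the output; there are no feedback connections
    return y

theta = (np.random.randn(4, 3), np.zeros(4),    # W1, b1
         np.random.randn(2, 4), np.zeros(2))    # W2, b2
print(feedforward(np.array([0.2, 0.5, 0.1]), theta))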
The simplest kind of neural network is a single-layer perceptron network, which consists of a single layer
of output nodes; the inputs are fed directly to the outputs via a series of weights. The sum of the
products of the weights and the inputs is calculated in each node, and if the value is above some
threshold (typically 0) the neuron fires and takes the activated value (typically 1); otherwise it takes the
deactivated value (typically -1). Neurons with this kind of activation function are also called artificial
neurons or linear threshold units. In the literature the term perceptron often refers to networks consisting
of just one of these units. A similar neuron was described by Warren McCulloch and Walter Pitts in the
1940s.
A perceptron can be created using any values for the activated and deactivated states as long as the
threshold value lies between the two.
Perceptrons can be trained by a simple learning algorithm that is usually called the delta rule. It
calculates the errors between calculated output and sample output data, and uses this to create an
adjustment to the weights, thus implementing a form of gradient descent.
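In symbols, the delta rule nudges each weight in proportion to that error: w_i ← w_i + η (t − y) x_i, where t is the sample (target) output, y the calculated output, x_i the corresponding input, and η a small learning rate.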
Single-layer perceptrons are only capable of learning linearly separable patterns; in 1969, in a famous monograph entitled Perceptrons, Marvin Minsky and Seymour Papert showed that it was impossible for a single-layer perceptron network to learn an XOR function (nonetheless, it was known that multi-layer perceptrons are capable of producing any possible boolean function).
Although a single threshold unit is quite limited in its computational power, it has been shown that
networks of parallel threshold units can approximate any continuous function from a compact interval of
the real numbers into the interval [-1,1]. This result can be found in Peter Auer, Harald
Burgsteiner and Wolfgang Maass "A learning rule for very simple universal approximators consisting of a
single layer of perceptrons".
SINGLE LAYER PERCEPTRON AND PROBLEM OF
SINGLE LAYER PERCEPTRON
• The perceptron is a single processing unit of any neural network. First proposed by Frank Rosenblatt in 1958, it is a simple neuron which is used to classify its input into one of two categories. The perceptron is a linear classifier and is used in supervised learning. It helps to organize the given input data.
• A perceptron is a neural network unit
that does a precise computation to detect
features in the input data. Perceptron is
mainly used to classify the data into two
parts. Therefore, it is also known as Linear
Binary Classifier.
The perceptron consists of 4 parts.
• Input value or one input layer: The input layer of the perceptron is made of artificial input neurons and takes the initial data into the system for further processing.
• Weights and bias:
Weight: It represents the strength of the connection between units. If the weight from node 1 to node 2 has a higher value, then neuron 1 has a more considerable influence on neuron 2.
Bias: It is the same as the intercept added in a linear equation. It is an additional parameter whose task is to modify the output along with the weighted sum of the inputs to the next neuron.
• Net sum: It calculates the total weighted sum.
• Activation function: Whether a neuron is activated or not is determined by an activation function. The activation function takes the weighted sum, adds the bias to it, and gives the result.
The single-layer perceptron was the first neural network model, proposed in 1958 by Frank Rosenblatt. It is one of the earliest models for learning. Our goal is to find a linear decision function measured by the weight vector w and the bias parameter b.
To understand the perceptron layer, it is necessary to comprehend artificial neural networks (ANNs).
The artificial neural network (ANN) is an information processing system, whose mechanism is inspired by the
functionality of biological neural circuits. An artificial neural network consists of several processing units that
are interconnected.
This was the first proposal when the neural model was built. The content of the neuron's local memory is a vector of weights. The output of a single-layer perceptron is calculated by summing the input vector, with each component multiplied by the corresponding element of the weight vector. The value obtained is then used as the input of an activation function.
Let us focus on the implementation of a single-layer perceptron for an image classification problem using TensorFlow. The best example for illustrating the single-layer perceptron is the representation of "logistic regression".
• The weights are initialized with random values at the start of each training run.
• For each element of the training set, the error is calculated as the difference between the desired output and the actual output. The calculated error is used to adjust the weights.
• The process is repeated until the error made on the entire training set is less than the specified limit, or until the maximum number of iterations has been reached (a sketch of this loop is given below).
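A hedged sketch of that training loop in plain NumPy (rather than the TensorFlow logistic-regression example the slide refers to); the AND-gate data, learning rate, and iteration limit are illustrative assumptions:

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # training inputs (AND gate)
t = np.array([0, 0, 0, 1])                        # desired outputs

w = np.random.randn(2)      # weights initialized with random values at the start of training
b = 0.0
eta = 0.1                   # learning rate

for epoch in range(100):                               # cap on the number of iterations
    errors = 0
    for x_i, t_i in zip(X, t):
        y_i = 1 if np.dot(w, x_i) + b >= 0 else 0      # threshold activation
        error = t_i - y_i                              # desired output minus actual output
        w += eta * error * x_i                         # adjust the weights by the calculated error
        b += eta * error
        errors += abs(error)
    if errors == 0:                                    # stop once the whole training set is correct
        break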
PROBLEM OF SINGLE LAYER PERCEPTRON
• A "single-layer" perceptron can't implement XOR. The reason is because
the classes in XOR are not linearly separable. You cannot draw a straight line
to separate the points (0,0),(1,1) from the points (0,1),(1,0). Led to invention
of multi-layer networks.
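A hand-wired two-layer network that does compute XOR illustrates why the extra layer removes the limitation (the weights and thresholds below are chosen purely for illustration):

def step(z):
    return 1 if z >= 0 else 0

def xor(x1, x2):
    h1 = step(x1 + x2 - 0.5)      # hidden unit acting as OR
    h2 = step(x1 + x2 - 1.5)      # hidden unit acting as AND
    return step(h1 - h2 - 0.5)    # output: OR and not AND, i.e. XOR

print([xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])   # [0, 1, 0, 1]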
MULTILAYER PERCEPTRON
• A multilayer perceptron (MLP) is a fully connected class of feedforward artificial neural network (ANN). The term MLP is used ambiguously: sometimes loosely to mean any feedforward ANN, and sometimes strictly to refer to networks composed of multiple layers of perceptrons (with threshold activation). Understanding this network helps us to obtain information about the underlying reasons behind the advanced models of deep learning. The multilayer perceptron is commonly used in simple regression problems. However, MLPs are not ideal for processing patterns with sequential and multidimensional data.
• A multi-layered perceptron (MLP) is one of the most
common neural network models used in the field of deep
learning. Often referred to as a “vanilla” neural network,
an MLP is simpler than the complex models of today’s
era. However, the techniques it introduced have paved
the way for further advanced neural networks.
The multilayer perceptron (MLP) is used for a variety of tasks, such as stock analysis,
image identification, spam detection, and election voting predictions.
In a typical multi-layer perceptron diagram, there are three inputs and thus three input nodes, and the hidden layer has three nodes. The output layer gives two outputs, therefore there are two output nodes. The nodes in the input layer take the input and forward it for further processing: each input node forwards its output to each of the three nodes in the hidden layer, and in the same way the hidden layer processes the information and passes it to the output layer.
Every node in the multi-layer perceptron uses a sigmoid activation function. The sigmoid activation function takes real values as input and converts them to numbers between 0 and 1 using the sigmoid formula.
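That formula is sigmoid(x) = 1 / (1 + e^(-x)): any real input is mapped to a value between 0 and 1.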
TYPES OF NEURAL NETWORKS
• Perceptron
• Feed Forward Neural Network
• Multilayer Perceptron
• Convolutional Neural Network
• Radial Basis Function Neural Network
• Recurrent Neural Network
• LSTM – Long Short-Term Memory
• Sequence to Sequence Models
• Modular Neural Network
A. Perceptron
The perceptron model, proposed by Frank Rosenblatt, is one of the simplest and oldest models of a neuron. It is the smallest unit of a neural network that does certain computations to detect features or business intelligence in the input data. It accepts weighted inputs and applies the activation function to obtain the output as the final result. The perceptron is also known as a TLU (threshold logic unit).
B. Feed Forward Neural Networks
The simplest form of neural networks, where input data travels in one direction only, passing through artificial neural nodes and exiting through output nodes. Hidden layers may or may not be present, but input and output layers are always present. Based on this, they can be further classified as single-layered or multi-layered feed-forward neural networks.
C. Multilayer Perceptron
An entry point towards complex neural nets, where input data travels through various layers of artificial neurons. Every single node is connected to all neurons in the next layer, which makes it a fully connected neural network. Input and output layers are present, with multiple hidden layers, i.e., at least three or more layers in total. It has bi-directional propagation, i.e., forward propagation and backward propagation. Inputs are multiplied with weights and fed to the activation function, and in backpropagation the weights are modified to reduce the loss. In simple words, weights are machine-learnt values from neural networks. They self-adjust depending on the difference between the predicted outputs and the expected outputs.
D. Convolutional Neural Network
Convolution neural network contains a three-dimensional arrangement of neurons, instead of
the standard two-dimensional array. The first layer is called a convolutional layer. Each
neuron in the convolutional layer only processes the information from a small part of the
visual field. Input features are taken in batch-wise, like a filter. The network understands the images in parts and can compute these operations multiple times to complete the full image processing. Processing involves conversion of the image from the RGB or HSI scale to grey-scale. Further changes in the pixel values then help to detect the edges, and images can be classified into different categories.
E. Radial Basis Function Neural Networks
Radial Basis Function Network consists of an input vector followed by a layer of RBF
neurons and an output layer with one node per category. Classification is performed by
measuring the input’s similarity to data points from the training set where each neuron
stores a prototype. This will be one of the examples from the training set.
When a new input vector [the n-dimensional vector that you are trying to classify] needs
to be classified, each neuron calculates the Euclidean distance between the input and its
prototype. For example, if we have two classes, class A and class B, and the new input to be classified is closer to the class A prototypes than to the class B prototypes, then it can be tagged or classified as class A.
Each RBF neuron compares the input vector to its prototype and outputs a value between 0 and 1 which is a measure of similarity. If the input equals the prototype, the output of that RBF neuron will be 1, and as the distance between the input and the prototype grows, the response falls off exponentially towards 0. The curve of a neuron's response tends towards a typical bell curve. The output layer consists of a set of neurons [one per category].
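A common form of that bell-shaped response is the Gaussian RBF (stated here as an assumption, since the slide does not give the formula): φ(x) = exp(−β · ||x − μ||²), where μ is the neuron's prototype. A one-function NumPy sketch:

import numpy as np

def rbf_response(x, prototype, beta=1.0):
    # equals 1 when x matches the prototype and falls off exponentially as the Euclidean distance grows
    return np.exp(-beta * np.sum((np.asarray(x) - np.asarray(prototype)) ** 2))

print(rbf_response([1.0, 2.0], [1.0, 2.0]))   # 1.0: input equals the prototype
print(rbf_response([3.0, 2.0], [1.0, 2.0]))   # smaller value: input is farther from the prototype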
F. Recurrent Neural Networks
A Recurrent Neural Network is designed to save the output of a layer and feed it back to the input, to help in predicting the outcome of the layer. The first layer is typically a feed forward neural network, followed by a recurrent neural network layer in which some information from the previous time-step is remembered by a memory function. Forward propagation is implemented in this case. The network stores information required for its future use. If the prediction is wrong, the learning rate is employed to make small changes, so that the network gradually moves towards making the right prediction during backpropagation.
G. Sequence to sequence models
A sequence to sequence model consists of two Recurrent Neural Networks.
Here, there exists an encoder that processes the input and a decoder that
processes the output. The encoder and decoder can work simultaneously, either using the same parameters or different ones. This model, in contrast to the plain RNN, is particularly applicable in cases where the length of the input data is not equal to the length of the output data. While they possess benefits and limitations similar to the RNN, these models are applied mainly in chatbots, machine translation, and question answering systems.
VARIOUS PARADIGMS OF LEARNING PROBLEMS
• There are three major learning paradigms.
• 1. supervised learning,
• 2. unsupervised learning
• 3. reinforcement learning.
• Usually they can be employed by any given type of artificial neural network
architecture. Each learning paradigm has many training algorithms.
PERSPECTIVES AND ISSUES IN DEEP LEARNING
FRAMEWORK
• There are three types of problems that are straightforward to diagnose
with regard to poor performance of a deep learning neural network model;
they are:
• Problems with Learning
• Problems with Generalization
• Problems with Predictions
CARDINALITY OF FEED FORWARD NEURAL NETWORK
• The cardinality refers to the number of
parallel paths that appear in a block. This
sounds similar to the inception block which
features 4 operations happening in parallel.
However, instead of using different types of operations in parallel, a cardinality of 4 will simply use the same operation 4 times.
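A minimal sketch of the idea (the function make_branch is a hypothetical stand-in for the repeated operation; in practice each parallel path has its own parameters):

def grouped_block(x, make_branch, cardinality=4):
    # cardinality parallel paths all apply the same type of operation, and their outputs
    # are aggregated, unlike an inception block whose parallel paths use different operations
    branches = [make_branch() for _ in range(cardinality)]
    return sum(branch(x) for branch in branches)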
PROPERTIES OF FUZZY RELATION
• The rule bases and the fuzzy relations may have algebraic properties such as the commutative property, inverse, and identity, but not the associative property, so no kind of algebraic structure may be developed. Fuzzy relations are nonlinear functions.
SujanKhamrui_28100119050.pptx

More Related Content

Similar to SujanKhamrui_28100119050.pptx

Understanding Deep Learning & Parameter Tuning with MXnet, H2o Package in R
Understanding Deep Learning & Parameter Tuning with MXnet, H2o Package in RUnderstanding Deep Learning & Parameter Tuning with MXnet, H2o Package in R
Understanding Deep Learning & Parameter Tuning with MXnet, H2o Package in RManish Saraswat
 
Neuralnetwork 101222074552-phpapp02
Neuralnetwork 101222074552-phpapp02Neuralnetwork 101222074552-phpapp02
Neuralnetwork 101222074552-phpapp02Deepu Gupta
 
Artificial Neural Network in Medical Diagnosis
Artificial Neural Network in Medical DiagnosisArtificial Neural Network in Medical Diagnosis
Artificial Neural Network in Medical DiagnosisAdityendra Kumar Singh
 
Swarm assignment 1
Swarm assignment 1Swarm assignment 1
Swarm assignment 1OmKushwaha7
 
Artificial Neural Network report
Artificial Neural Network reportArtificial Neural Network report
Artificial Neural Network reportAnjali Agrawal
 
Artificial neural networks
Artificial neural networksArtificial neural networks
Artificial neural networksjabedskakib
 
What are neural networks.pdf
What are neural networks.pdfWhat are neural networks.pdf
What are neural networks.pdfStephenAmell4
 
What are neural networks.pdf
What are neural networks.pdfWhat are neural networks.pdf
What are neural networks.pdfAnastasiaSteele10
 
What are neural networks.pdf
What are neural networks.pdfWhat are neural networks.pdf
What are neural networks.pdfStephenAmell4
 
Soft Computing-173101
Soft Computing-173101Soft Computing-173101
Soft Computing-173101AMIT KUMAR
 
Neural network final NWU 4.3 Graphics Course
Neural network final NWU 4.3 Graphics CourseNeural network final NWU 4.3 Graphics Course
Neural network final NWU 4.3 Graphics CourseMohaiminur Rahman
 
BACKPROPOGATION ALGO.pdfLECTURE NOTES WITH SOLVED EXAMPLE AND FEED FORWARD NE...
BACKPROPOGATION ALGO.pdfLECTURE NOTES WITH SOLVED EXAMPLE AND FEED FORWARD NE...BACKPROPOGATION ALGO.pdfLECTURE NOTES WITH SOLVED EXAMPLE AND FEED FORWARD NE...
BACKPROPOGATION ALGO.pdfLECTURE NOTES WITH SOLVED EXAMPLE AND FEED FORWARD NE...DurgadeviParamasivam
 

Similar to SujanKhamrui_28100119050.pptx (20)

Unit+i
Unit+iUnit+i
Unit+i
 
Understanding Deep Learning & Parameter Tuning with MXnet, H2o Package in R
Understanding Deep Learning & Parameter Tuning with MXnet, H2o Package in RUnderstanding Deep Learning & Parameter Tuning with MXnet, H2o Package in R
Understanding Deep Learning & Parameter Tuning with MXnet, H2o Package in R
 
Neuralnetwork 101222074552-phpapp02
Neuralnetwork 101222074552-phpapp02Neuralnetwork 101222074552-phpapp02
Neuralnetwork 101222074552-phpapp02
 
Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural Network
 
Artificial Neural Network in Medical Diagnosis
Artificial Neural Network in Medical DiagnosisArtificial Neural Network in Medical Diagnosis
Artificial Neural Network in Medical Diagnosis
 
Swarm assignment 1
Swarm assignment 1Swarm assignment 1
Swarm assignment 1
 
ANN.ppt
ANN.pptANN.ppt
ANN.ppt
 
ANN - UNIT 1.pptx
ANN - UNIT 1.pptxANN - UNIT 1.pptx
ANN - UNIT 1.pptx
 
7 nn1-intro.ppt
7 nn1-intro.ppt7 nn1-intro.ppt
7 nn1-intro.ppt
 
Artificial Neural Network report
Artificial Neural Network reportArtificial Neural Network report
Artificial Neural Network report
 
Artificial neural networks
Artificial neural networksArtificial neural networks
Artificial neural networks
 
Neural Network
Neural NetworkNeural Network
Neural Network
 
What are neural networks.pdf
What are neural networks.pdfWhat are neural networks.pdf
What are neural networks.pdf
 
What are neural networks.pdf
What are neural networks.pdfWhat are neural networks.pdf
What are neural networks.pdf
 
What are neural networks.pdf
What are neural networks.pdfWhat are neural networks.pdf
What are neural networks.pdf
 
Soft Computing-173101
Soft Computing-173101Soft Computing-173101
Soft Computing-173101
 
Deep Learning Survey
Deep Learning SurveyDeep Learning Survey
Deep Learning Survey
 
A Study On Deep Learning
A Study On Deep LearningA Study On Deep Learning
A Study On Deep Learning
 
Neural network final NWU 4.3 Graphics Course
Neural network final NWU 4.3 Graphics CourseNeural network final NWU 4.3 Graphics Course
Neural network final NWU 4.3 Graphics Course
 
BACKPROPOGATION ALGO.pdfLECTURE NOTES WITH SOLVED EXAMPLE AND FEED FORWARD NE...
BACKPROPOGATION ALGO.pdfLECTURE NOTES WITH SOLVED EXAMPLE AND FEED FORWARD NE...BACKPROPOGATION ALGO.pdfLECTURE NOTES WITH SOLVED EXAMPLE AND FEED FORWARD NE...
BACKPROPOGATION ALGO.pdfLECTURE NOTES WITH SOLVED EXAMPLE AND FEED FORWARD NE...
 

More from PrakasBhowmik

DOC-20230804-gghjddnmkjggfdxxnkkgfd004..ppt
DOC-20230804-gghjddnmkjggfdxxnkkgfd004..pptDOC-20230804-gghjddnmkjggfdxxnkkgfd004..ppt
DOC-20230804-gghjddnmkjggfdxxnkkgfd004..pptPrakasBhowmik
 
PPT Template_Odd Sem_2022.ppt
PPT Template_Odd Sem_2022.pptPPT Template_Odd Sem_2022.ppt
PPT Template_Odd Sem_2022.pptPrakasBhowmik
 
PPT Template_Odd Sem_2022.ppt
PPT Template_Odd Sem_2022.pptPPT Template_Odd Sem_2022.ppt
PPT Template_Odd Sem_2022.pptPrakasBhowmik
 
28101619005, PC-EE-801.ppt
28101619005, PC-EE-801.ppt28101619005, PC-EE-801.ppt
28101619005, PC-EE-801.pptPrakasBhowmik
 
RAHUL NASKAR IOT.ppt
RAHUL NASKAR IOT.pptRAHUL NASKAR IOT.ppt
RAHUL NASKAR IOT.pptPrakasBhowmik
 
PRAKAS CHANDRA BHOWMIK.ppt
PRAKAS CHANDRA BHOWMIK.pptPRAKAS CHANDRA BHOWMIK.ppt
PRAKAS CHANDRA BHOWMIK.pptPrakasBhowmik
 
Roll 28101619017,HM EE 702.pdf
Roll 28101619017,HM EE 702.pdfRoll 28101619017,HM EE 702.pdf
Roll 28101619017,HM EE 702.pdfPrakasBhowmik
 
functionsofmanagement-170119165542.pdf
functionsofmanagement-170119165542.pdffunctionsofmanagement-170119165542.pdf
functionsofmanagement-170119165542.pdfPrakasBhowmik
 
SkNoushadddoja_28100119039.pptx
SkNoushadddoja_28100119039.pptxSkNoushadddoja_28100119039.pptx
SkNoushadddoja_28100119039.pptxPrakasBhowmik
 

More from PrakasBhowmik (9)

DOC-20230804-gghjddnmkjggfdxxnkkgfd004..ppt
DOC-20230804-gghjddnmkjggfdxxnkkgfd004..pptDOC-20230804-gghjddnmkjggfdxxnkkgfd004..ppt
DOC-20230804-gghjddnmkjggfdxxnkkgfd004..ppt
 
PPT Template_Odd Sem_2022.ppt
PPT Template_Odd Sem_2022.pptPPT Template_Odd Sem_2022.ppt
PPT Template_Odd Sem_2022.ppt
 
PPT Template_Odd Sem_2022.ppt
PPT Template_Odd Sem_2022.pptPPT Template_Odd Sem_2022.ppt
PPT Template_Odd Sem_2022.ppt
 
28101619005, PC-EE-801.ppt
28101619005, PC-EE-801.ppt28101619005, PC-EE-801.ppt
28101619005, PC-EE-801.ppt
 
RAHUL NASKAR IOT.ppt
RAHUL NASKAR IOT.pptRAHUL NASKAR IOT.ppt
RAHUL NASKAR IOT.ppt
 
PRAKAS CHANDRA BHOWMIK.ppt
PRAKAS CHANDRA BHOWMIK.pptPRAKAS CHANDRA BHOWMIK.ppt
PRAKAS CHANDRA BHOWMIK.ppt
 
Roll 28101619017,HM EE 702.pdf
Roll 28101619017,HM EE 702.pdfRoll 28101619017,HM EE 702.pdf
Roll 28101619017,HM EE 702.pdf
 
functionsofmanagement-170119165542.pdf
functionsofmanagement-170119165542.pdffunctionsofmanagement-170119165542.pdf
functionsofmanagement-170119165542.pdf
 
SkNoushadddoja_28100119039.pptx
SkNoushadddoja_28100119039.pptxSkNoushadddoja_28100119039.pptx
SkNoushadddoja_28100119039.pptx
 

Recently uploaded

Dubai Calls Girl Tapes O525547819 Real Tapes Escort Services Dubai
Dubai Calls Girl Tapes O525547819 Real Tapes Escort Services DubaiDubai Calls Girl Tapes O525547819 Real Tapes Escort Services Dubai
Dubai Calls Girl Tapes O525547819 Real Tapes Escort Services Dubaikojalkojal131
 
昆士兰大学毕业证(UQ毕业证)#文凭成绩单#真实留信学历认证永久存档
昆士兰大学毕业证(UQ毕业证)#文凭成绩单#真实留信学历认证永久存档昆士兰大学毕业证(UQ毕业证)#文凭成绩单#真实留信学历认证永久存档
昆士兰大学毕业证(UQ毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
办理学位证加州州立大学洛杉矶分校毕业证成绩单原版一比一
办理学位证加州州立大学洛杉矶分校毕业证成绩单原版一比一办理学位证加州州立大学洛杉矶分校毕业证成绩单原版一比一
办理学位证加州州立大学洛杉矶分校毕业证成绩单原版一比一Fi L
 
Call Girls Meghani Nagar 7397865700 Independent Call Girls
Call Girls Meghani Nagar 7397865700  Independent Call GirlsCall Girls Meghani Nagar 7397865700  Independent Call Girls
Call Girls Meghani Nagar 7397865700 Independent Call Girlsssuser7cb4ff
 
办理(USYD毕业证书)澳洲悉尼大学毕业证成绩单原版一比一
办理(USYD毕业证书)澳洲悉尼大学毕业证成绩单原版一比一办理(USYD毕业证书)澳洲悉尼大学毕业证成绩单原版一比一
办理(USYD毕业证书)澳洲悉尼大学毕业证成绩单原版一比一diploma 1
 
Untitled presedddddddddddddddddntation (1).pptx
Untitled presedddddddddddddddddntation (1).pptxUntitled presedddddddddddddddddntation (1).pptx
Untitled presedddddddddddddddddntation (1).pptxmapanig881
 
韩国SKKU学位证,成均馆大学毕业证书1:1制作
韩国SKKU学位证,成均馆大学毕业证书1:1制作韩国SKKU学位证,成均馆大学毕业证书1:1制作
韩国SKKU学位证,成均馆大学毕业证书1:1制作7tz4rjpd
 
PORTFOLIO DE ARQUITECTURA CRISTOBAL HERAUD 2024
PORTFOLIO DE ARQUITECTURA CRISTOBAL HERAUD 2024PORTFOLIO DE ARQUITECTURA CRISTOBAL HERAUD 2024
PORTFOLIO DE ARQUITECTURA CRISTOBAL HERAUD 2024CristobalHeraud
 
FiveHypotheses_UIDMasterclass_18April2024.pdf
FiveHypotheses_UIDMasterclass_18April2024.pdfFiveHypotheses_UIDMasterclass_18April2024.pdf
FiveHypotheses_UIDMasterclass_18April2024.pdfShivakumar Viswanathan
 
(办理学位证)埃迪斯科文大学毕业证成绩单原版一比一
(办理学位证)埃迪斯科文大学毕业证成绩单原版一比一(办理学位证)埃迪斯科文大学毕业证成绩单原版一比一
(办理学位证)埃迪斯科文大学毕业证成绩单原版一比一Fi sss
 
定制(CQU文凭证书)中央昆士兰大学毕业证成绩单原版一比一
定制(CQU文凭证书)中央昆士兰大学毕业证成绩单原版一比一定制(CQU文凭证书)中央昆士兰大学毕业证成绩单原版一比一
定制(CQU文凭证书)中央昆士兰大学毕业证成绩单原版一比一Fi ss
 
办理(麻省罗威尔毕业证书)美国麻省大学罗威尔校区毕业证成绩单原版一比一
办理(麻省罗威尔毕业证书)美国麻省大学罗威尔校区毕业证成绩单原版一比一办理(麻省罗威尔毕业证书)美国麻省大学罗威尔校区毕业证成绩单原版一比一
办理(麻省罗威尔毕业证书)美国麻省大学罗威尔校区毕业证成绩单原版一比一diploma 1
 
Call Girls in Ashok Nagar Delhi ✡️9711147426✡️ Escorts Service
Call Girls in Ashok Nagar Delhi ✡️9711147426✡️ Escorts ServiceCall Girls in Ashok Nagar Delhi ✡️9711147426✡️ Escorts Service
Call Girls in Ashok Nagar Delhi ✡️9711147426✡️ Escorts Servicejennyeacort
 
办理(宾州州立毕业证书)美国宾夕法尼亚州立大学毕业证成绩单原版一比一
办理(宾州州立毕业证书)美国宾夕法尼亚州立大学毕业证成绩单原版一比一办理(宾州州立毕业证书)美国宾夕法尼亚州立大学毕业证成绩单原版一比一
办理(宾州州立毕业证书)美国宾夕法尼亚州立大学毕业证成绩单原版一比一F La
 
'CASE STUDY OF INDIRA PARYAVARAN BHAVAN DELHI ,
'CASE STUDY OF INDIRA PARYAVARAN BHAVAN DELHI ,'CASE STUDY OF INDIRA PARYAVARAN BHAVAN DELHI ,
'CASE STUDY OF INDIRA PARYAVARAN BHAVAN DELHI ,Aginakm1
 
group_15_empirya_p1projectIndustrial.pdf
group_15_empirya_p1projectIndustrial.pdfgroup_15_empirya_p1projectIndustrial.pdf
group_15_empirya_p1projectIndustrial.pdfneelspinoy
 
原版美国亚利桑那州立大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
原版美国亚利桑那州立大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree原版美国亚利桑那州立大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
原版美国亚利桑那州立大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Iconic Global Solution - web design, Digital Marketing services
Iconic Global Solution - web design, Digital Marketing servicesIconic Global Solution - web design, Digital Marketing services
Iconic Global Solution - web design, Digital Marketing servicesIconic global solution
 
CREATING A POSITIVE SCHOOL CULTURE CHAPTER 10
CREATING A POSITIVE SCHOOL CULTURE CHAPTER 10CREATING A POSITIVE SCHOOL CULTURE CHAPTER 10
CREATING A POSITIVE SCHOOL CULTURE CHAPTER 10uasjlagroup
 
Design principles on typography in design
Design principles on typography in designDesign principles on typography in design
Design principles on typography in designnooreen17
 

Recently uploaded (20)

Dubai Calls Girl Tapes O525547819 Real Tapes Escort Services Dubai
Dubai Calls Girl Tapes O525547819 Real Tapes Escort Services DubaiDubai Calls Girl Tapes O525547819 Real Tapes Escort Services Dubai
Dubai Calls Girl Tapes O525547819 Real Tapes Escort Services Dubai
 
昆士兰大学毕业证(UQ毕业证)#文凭成绩单#真实留信学历认证永久存档
昆士兰大学毕业证(UQ毕业证)#文凭成绩单#真实留信学历认证永久存档昆士兰大学毕业证(UQ毕业证)#文凭成绩单#真实留信学历认证永久存档
昆士兰大学毕业证(UQ毕业证)#文凭成绩单#真实留信学历认证永久存档
 
办理学位证加州州立大学洛杉矶分校毕业证成绩单原版一比一
办理学位证加州州立大学洛杉矶分校毕业证成绩单原版一比一办理学位证加州州立大学洛杉矶分校毕业证成绩单原版一比一
办理学位证加州州立大学洛杉矶分校毕业证成绩单原版一比一
 
Call Girls Meghani Nagar 7397865700 Independent Call Girls
Call Girls Meghani Nagar 7397865700  Independent Call GirlsCall Girls Meghani Nagar 7397865700  Independent Call Girls
Call Girls Meghani Nagar 7397865700 Independent Call Girls
 
办理(USYD毕业证书)澳洲悉尼大学毕业证成绩单原版一比一
办理(USYD毕业证书)澳洲悉尼大学毕业证成绩单原版一比一办理(USYD毕业证书)澳洲悉尼大学毕业证成绩单原版一比一
办理(USYD毕业证书)澳洲悉尼大学毕业证成绩单原版一比一
 
Untitled presedddddddddddddddddntation (1).pptx
Untitled presedddddddddddddddddntation (1).pptxUntitled presedddddddddddddddddntation (1).pptx
Untitled presedddddddddddddddddntation (1).pptx
 
韩国SKKU学位证,成均馆大学毕业证书1:1制作
韩国SKKU学位证,成均馆大学毕业证书1:1制作韩国SKKU学位证,成均馆大学毕业证书1:1制作
韩国SKKU学位证,成均馆大学毕业证书1:1制作
 
PORTFOLIO DE ARQUITECTURA CRISTOBAL HERAUD 2024
PORTFOLIO DE ARQUITECTURA CRISTOBAL HERAUD 2024PORTFOLIO DE ARQUITECTURA CRISTOBAL HERAUD 2024
PORTFOLIO DE ARQUITECTURA CRISTOBAL HERAUD 2024
 
FiveHypotheses_UIDMasterclass_18April2024.pdf
FiveHypotheses_UIDMasterclass_18April2024.pdfFiveHypotheses_UIDMasterclass_18April2024.pdf
FiveHypotheses_UIDMasterclass_18April2024.pdf
 
(办理学位证)埃迪斯科文大学毕业证成绩单原版一比一
(办理学位证)埃迪斯科文大学毕业证成绩单原版一比一(办理学位证)埃迪斯科文大学毕业证成绩单原版一比一
(办理学位证)埃迪斯科文大学毕业证成绩单原版一比一
 
定制(CQU文凭证书)中央昆士兰大学毕业证成绩单原版一比一
定制(CQU文凭证书)中央昆士兰大学毕业证成绩单原版一比一定制(CQU文凭证书)中央昆士兰大学毕业证成绩单原版一比一
定制(CQU文凭证书)中央昆士兰大学毕业证成绩单原版一比一
 
办理(麻省罗威尔毕业证书)美国麻省大学罗威尔校区毕业证成绩单原版一比一
办理(麻省罗威尔毕业证书)美国麻省大学罗威尔校区毕业证成绩单原版一比一办理(麻省罗威尔毕业证书)美国麻省大学罗威尔校区毕业证成绩单原版一比一
办理(麻省罗威尔毕业证书)美国麻省大学罗威尔校区毕业证成绩单原版一比一
 
Call Girls in Ashok Nagar Delhi ✡️9711147426✡️ Escorts Service
Call Girls in Ashok Nagar Delhi ✡️9711147426✡️ Escorts ServiceCall Girls in Ashok Nagar Delhi ✡️9711147426✡️ Escorts Service
Call Girls in Ashok Nagar Delhi ✡️9711147426✡️ Escorts Service
 
办理(宾州州立毕业证书)美国宾夕法尼亚州立大学毕业证成绩单原版一比一
办理(宾州州立毕业证书)美国宾夕法尼亚州立大学毕业证成绩单原版一比一办理(宾州州立毕业证书)美国宾夕法尼亚州立大学毕业证成绩单原版一比一
办理(宾州州立毕业证书)美国宾夕法尼亚州立大学毕业证成绩单原版一比一
 
'CASE STUDY OF INDIRA PARYAVARAN BHAVAN DELHI ,
'CASE STUDY OF INDIRA PARYAVARAN BHAVAN DELHI ,'CASE STUDY OF INDIRA PARYAVARAN BHAVAN DELHI ,
'CASE STUDY OF INDIRA PARYAVARAN BHAVAN DELHI ,
 
group_15_empirya_p1projectIndustrial.pdf
group_15_empirya_p1projectIndustrial.pdfgroup_15_empirya_p1projectIndustrial.pdf
group_15_empirya_p1projectIndustrial.pdf
 
原版美国亚利桑那州立大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
原版美国亚利桑那州立大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree原版美国亚利桑那州立大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
原版美国亚利桑那州立大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Iconic Global Solution - web design, Digital Marketing services
Iconic Global Solution - web design, Digital Marketing servicesIconic Global Solution - web design, Digital Marketing services
Iconic Global Solution - web design, Digital Marketing services
 
CREATING A POSITIVE SCHOOL CULTURE CHAPTER 10
CREATING A POSITIVE SCHOOL CULTURE CHAPTER 10CREATING A POSITIVE SCHOOL CULTURE CHAPTER 10
CREATING A POSITIVE SCHOOL CULTURE CHAPTER 10
 
Design principles on typography in design
Design principles on typography in designDesign principles on typography in design
Design principles on typography in design
 

SujanKhamrui_28100119050.pptx

  • 1.
  • 3. CONTENTS: 1. What is Artificial Neural Network? 2. What is Mucculloch Pitts model? 3. What is weight and bias 4. Various layer 5. Activation function 6. Examples of activation function 7. Feed forward neural network 8. Single layer perceptron and problem of single layer perceptron 9. Multilayer perceptron 10.Types of neural networks 11.Various paradigms of learning problems 12.Perspectives and Issues in deep learning framework 13.Cardinality of feed forward neural network 14.Properties of fuzzy relation
  • 4. What is Artificial Neural Network? • Neural networks, also known as artificial neural networks (ANNs) or simulated neural networks (SNNs), are a subset of ML and are at the heart of deep learning algorithms. Their name and structure are inspired by the human brain, mimicking the way that biological neurons signal to one another. • Artificial neural networks (ANNs) are comprised of a node layers, containing an input layer, one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to another and has an associated weight and threshold. If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network. Otherwise, no data is passed along to the next layer of the network. •
  • 5. • Neural networks rely on training data to learn and improve their accuracy over time. However, once these learning algorithms are fine-tuned for accuracy, they are powerful tools in computer science and AI, allowing us to classify and cluster data at a high velocity. Tasks in speech recognition or image recognition can take minutes versus hours when compared to the manual identification by human experts. One of the most well- known neural networks is Google’s search algorithm.
  • 6. HOW DO NEURAL NETWORK WORK? A simple neural network includes an input layer, an output (or target) layer and, in between, a hidden layer. The layers are connected via nodes, and these connections form a “network” – the neural network – of interconnected nodes. A node is patterned after a neuron in a human brain. Similar in behavior to neurons, nodes are activated when there is sufficient stimuli or input. This activation spreads throughout the network, creating a response to the stimuli (output). The connections between these artificial neurons act as simple synapses, enabling signals to be transmitted from one to another. Signals across layers as they travel from the first input to the last output layer – and get processed along the way. When posed with a request or problem to solve, the neurons run mathematical calculations to figure out if there’s enough information to pass on the information to the next neuron. Put more simply, they read all the data and figure out where the strongest relationships exist. In the simplest type of network, data inputs received are added up, and if the sum is more than a certain threshold value, the neuron “fires” and activates the neurons it’s connected to.
  • 7. As the number of hidden layers within a neural network increases, deep neural networks are formed. DL architectures take simple neural networks to the next level. Using these layers, data scientists can build their own deep learning networks that enable ML, which can train a computer to accurately emulate human tasks, such as recognizing speech, identifying images or making predictions. Equally important, the computer can learn on its own by recognizing patterns in many layers of processing. So let’s put this definition into action. Data is fed into a neural network through the input layer, which communicates to hidden layers. Processing takes place in the hidden layers through a system of weighted connections. Nodes in the hidden layer then combine data from the input layer with a set of coefficients and assigns appropriate weights to inputs. These input-weight products are then summed up. The sum is passed through a node’s activation function, which determines the extent that a signal must progress further through the network to affect the final output. Finally, the hidden layers link to the output layer – where the outputs are retrieved.
  • 8. WHAT IS MUCCULLOCH PITTS MODEL? It is very well known that the most fundamental unit of deep neural networks is called an artificial neuron/perceptron. But the very first step towards the perceptron we use today was taken in 1943 by McCulloch and Pitts, by mimicking the functionality of a biological neuron. Basically, a neuron takes an input signal (dendrite), processes it like the CPU (soma), passes the output through a cable like structure to other connected neurons (axon to synapse to other neuron’s dendrite). Now, this might be biologically inaccurate as there is a lot more going on out there but on a higher level, this is what is going on with a neuron in our brain — takes an input, processes it, throws out an output. Our sense organs interact with the outer world and send the visual and sound information to the neurons. Let's say you are watching Friends. Now the information your brain receives is taken in by the “laugh or not” set of neurons that will help you make a decision on whether to laugh or not. Each neuron gets fired/activated only when its respective criteria (more on this later) is met like shown below. Of course, this is not entirely true. In reality, it is not just a couple of neurons which would do the decision making. There is a massively parallel interconnected network of 10¹¹ neurons (100 billion) in our brain and their connections are not as simple as I showed you above. It might look something like this:
  • 9. Now the sense organs pass the information to the first/lowest layer of neurons to process it. And the output of the processes is passed on to the next layers in a hierarchical manner, some of the neurons will fire and some won’t and this process goes on until it results in a final response — in this case, laughter. This massively parallel network also ensures that there is a division of work. Each neuron only fires when its intended criteria is met i.e., a neuron may perform a certain role to a certain stimulus, as shown below. It may be divided into 2 parts. The first part, g takes an input (ahem dendrite ahem), performs an aggregation and based on the aggregated value the second part, f makes a decision. Lets suppose that I want to predict my own decision, whether to watch a random football game or not on TV. The inputs are all boolean i.e., {0,1} and my output variable is also boolean {0: Will watch it, 1: Won’t watch it}.
  • 10. • So, x_1 could be isPremierLeagueOn (I like Premier League more) • x_2 could be isItAFriendlyGame (I tend to care less about the friendlies) • x_3 could be isNotHome (Can’t watch it when I’m running errands. Can I?) • x_4 could be isManUnitedPlaying (I am a big Man United fan. GGMU!) and so on. • These inputs can either be excitatory or inhibitory. Inhibitory inputs are those that have maximum effect on the decision making irrespective of other inputs i.e., if x_3 is 1 (not home) then my output will always be 0 i.e., the neuron will never fire, so x_3 is an inhibitory input. Excitatory inputs are NOT the ones that will make the neuron fire on their own but they might fire it when combined together. Formally, this is what is going on:
  • 11. WEIGHTS IN NEURAL NETWORKS: • Weight is the parameter within a neural network that transforms input data within the network's hidden layers. A neural network is a series of nodes, or neurons.
  • 12. BIAS IN NEURAL NETWORKS: • Bias in Neural Networks can be thought of as analogous to the role of a constant in a linear function, whereby the line is effectively transposed by the constant value. In a scenario with no bias, the input to the activation function is 'x' multiplied by the connection weight 'w0'.
  • 13. VARIOUS LAYER OF NEURAL NETWORK: • There are Four common Types of Neural Network layers: • 1. Fully Connected Layer • 2. Convolution Layer • 3. Deconvolution Layer • 4. Recurrent Layer
  • 14. FULLY CONNECTED LAYER • Fully Connected Layer is simply, feed forward neural networks. Fully Connected Layers form the last few layers in the network. The input to the fully connected layer is the output from the final Pooling or Convolutional Layer, which is flattened and then fed into the fully connected layer.
  • 15. CONVOLUTION LAYER • Convolutional layers are the layers where filters are applied to the original image, or to other feature maps in a deep CNN. This is where most of the user-specified parameters are in the network. The most important parameters are the number of kernels and the size of the kernels.
  • 16. DECONVOLUTION LAYER: • Deconvolutional networks are convolutional neural networks (CNN) that work in a reversed process. Deconvolutional networks, also known as deconvolutional neural networks, are very similar in nature to CNNs run in reverse but are a distinct application of artificial intelligence (AI).
  • 17. RECURRENT LAYER: • Layers to construct recurrent networks. Recurrent layers can be used similarly to feed- forward layers except that the input shape is expected to be (batch_size, sequence_length, num_inputs).
  • 18. ACTIVATION FUNCTIONS: • A neural network without an activation function is essentially just a linear regression model. The activation function does the non- linear transformation to the input making it capable to learn and perform more complex tasks.
  • 19. EXAMPLES OF ACTIVATION FUNCTION • The internet provides access to plethora of information today. Whatever we need is just a Google (search) away. However, when we have so much information, the challenge is to segregate between relevant and irrelevant information. • When our brain is fed with a lot of information simultaneously, it tries hard to understand and classify the information into “useful” and “not-so- useful” information. We need a similar mechanism for classifying incoming information as “useful” or “less-useful” in case of Neural Networks. • 1. Binary Step Function • The first thing that comes to our mind when we have an activation function would be a threshold based classifier i.e. whether or not the neuron should be activated based on the value from the linear transformation. • In other words, if the input to the activation function is greater than a threshold, then the neuron is activated, else it is deactivated, i.e. its output is not considered for the next hidden layer.
  • 20. 2. Linear Function We saw the problem with the step function, the gradient of the function became zero. This is because there is no component of x in the binary step function. Instead of a binary function, we can use a linear function. 3. Sigmoid The next activation function that we are going to look at is the Sigmoid function. It is one of the most widely used non-linear activation function. Sigmoid transforms the values between the range 0 and 1. 4. Tanh The tanh function is very similar to the sigmoid function. The only difference is that it is symmetric around the origin. The range of values in this case is from -1 to 1. Thus the inputs to the next layers will not always be of the same sign. 5. ReLU The ReLU function is another non-linear activation function that has gained popularity in the deep learning domain. ReLU stands for Rectified Linear Unit. The main advantage of using the ReLU function over other activation functions is that it does not activate all the neurons at the same time.
  • 21. This means that the neurons will only be deactivated if the output of the linear transformation is less than 0. 6. Leaky ReLU Leaky ReLU function is nothing but an improved version of the ReLU function. As we saw that for the ReLU function, the gradient is 0 for x<0, which would deactivate the neurons in that region. Leaky ReLU is defined to address this problem. Instead of defining the Relu function as 0 for negative values of x, we define it as an extremely small linear component of x. 7. Parameterised ReLU This is another variant of ReLU that aims to solve the problem of gradient’s becoming zero for the left half of the axis. The parameterised ReLU, as the name suggests, introduces a new parameter as a slope of the negative part of the function. 8. Exponential Linear Unit Exponential Linear Unit or ELU for short is also a variant of Rectiufied Linear Unit (ReLU) that modifies the slope of the negative part of the function. Unlike the leaky relu and parametric ReLU functions, instead of a straight line, ELU uses a log curve for defning the negatice values.
  • 22. 9.Swish Swish is a lesser known activation function which was discovered by researchers at Google. Swish is as computationally efficient as ReLU and shows better performance than ReLU on deeper models. The values for swish ranges from negative infinity to infinity. 10. Softmax Softmax function is often described as a combination of multiple sigmoids. We know that sigmoid returns values between 0 and 1, which can be treated as probabilities of a data point belonging to a particular class. Thus sigmoid is widely used for binary classification problems. The softmax function can be used for multiclass classification problems. This function returns the probability for a datapoint belonging to each individual class
  • 23. FEED FORWARD NEURAL NETWORK • Deep feedforward networks, also often called feedforward neural networks, or multilayer perceptrons(MLPs), are the quintessential deep learning models. The goal of a feedforward network is to approximate some function f*. For example, for a classifier, y = f*(x) maps an input x to a category y. A feedforward network defines a mapping y = f(x;θ) and learns the value of the parameters θ that result in the best function approximation. • These models are called feedforward because information flows through the function being evaluated from x, through the intermediate computations used to define f, and finally to the output y. There are no feedback connections in which outputs of the model are fed back into itself. When feedforward neural networks are extended to include feedback connections, they are called recurrent neural networks(we will see in later segment).
  • 24. The simplest kind of neural network is a single-layer perceptron network, which consists of a single layer of output nodes; the inputs are fed directly to the outputs via a series of weights. The sum of the products of the weights and the inputs is calculated in each node, and if the value is above some threshold (typically 0) the neuron fires and takes the activated value (typically 1); otherwise it takes the deactivated value (typically -1). Neurons with this kind of activation function are also called artificial neurons or linear threshold units. In the literature the term perceptron often refers to networks consisting of just one of these units. A similar neuron was described by Warren McCulloch and Walter Pitts in the 1940s. A perceptron can be created using any values for the activated and deactivated states as long as the threshold value lies between the two. Perceptrons can be trained by a simple learning algorithm that is usually called the delta rule. It calculates the errors between the calculated output and the sample output data, and uses this to create an adjustment to the weights, thus implementing a form of gradient descent. Single-layer perceptrons are only capable of learning linearly separable patterns; in 1969, in a famous monograph entitled Perceptrons, Marvin Minsky and Seymour Papert showed that it was impossible for a single-layer perceptron network to learn an XOR function (nonetheless, it was known that multi-layer perceptrons are capable of producing any possible boolean function). Although a single threshold unit is quite limited in its computational power, it has been shown that networks of parallel threshold units can approximate any continuous function from a compact interval of the real numbers into the interval [-1,1]. This result can be found in Peter Auer, Harald Burgsteiner and Wolfgang Maass, "A learning rule for very simple universal approximators consisting of a single layer of perceptrons".
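A small sketch of a single threshold unit trained with the delta rule, using the +1/-1 activation convention described above; the AND-function toy data and the learning rate are assumptions for the example.

```python
import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1, -1, -1, 1])                 # AND function: linearly separable targets

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(20):                           # a few passes over the training set
    for xi, ti in zip(X, y):
        out = 1 if xi @ w + b > 0 else -1     # fire if the weighted sum exceeds the threshold 0
        w += lr * (ti - out) * xi             # delta rule: adjust weights by the error
        b += lr * (ti - out)

print([1 if xi @ w + b > 0 else -1 for xi in X])   # [-1, -1, -1, 1]
```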
  • 25. SINGLE LAYER PERCEPTRON AND PROBLEM OF SINGLE LAYER PERCEPTRON • The perceptron is a single processing unit of any neural network. First proposed by Frank Rosenblatt in 1958, it is a simple neuron which is used to classify its input into one of two categories. The perceptron is a linear classifier and is used in supervised learning. It helps to organize the given input data. • A perceptron is a neural network unit that does a precise computation to detect features in the input data. The perceptron is mainly used to classify the data into two parts; therefore, it is also known as a Linear Binary Classifier.
  • 26. The perceptron consists of 4 parts. •Input value or one input layer: the input layer of the perceptron is made of artificial input neurons and takes the initial data into the system for further processing. •Weights and bias: Weight: it represents the dimension or strength of the connection between units. If the weight from node 1 to node 2 has a higher magnitude, then neuron 1 has a more considerable influence on neuron 2. Bias: it is the same as the intercept added in a linear equation. It is an additional parameter whose task is to modify the output along with the weighted sum of the inputs to the other neuron. •Net sum: it calculates the total sum. •Activation function: whether a neuron is activated or not is determined by an activation function. The activation function takes the weighted sum, adds the bias to it, and gives the result.
  • 27. The single-layer perceptron was the first neural network model, proposed in 1958 by Frank Rosenblatt. It is one of the earliest models for learning. Our goal is to find a linear decision function determined by the weight vector w and the bias parameter b. To understand the perceptron layer, it is necessary to comprehend artificial neural networks (ANNs). The artificial neural network (ANN) is an information processing system whose mechanism is inspired by the functionality of biological neural circuits. An artificial neural network consists of several processing units that are interconnected. The perceptron was the first proposal when the neural model was built. The content of the neuron's local memory is a vector of weights. The output of a single perceptron is calculated by summing the elements of the input vector, each multiplied by its corresponding weight; this value is then passed as input to an activation function.
  • 28. Let us focus on the implementation of a single-layer perceptron for an image classification problem using TensorFlow. The best example of drawing a single-layer perceptron is through the representation of "logistic regression." •The weights are initialized with random values at the start of each training run. •For each element of the training set, the error is calculated as the difference between the desired output and the actual output. The calculated error is used to adjust the weights. •The process is repeated until the error made on the entire training set is less than the specified limit, or until the maximum number of iterations has been reached.
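A hedged TensorFlow/Keras sketch of the "logistic regression"-style single-layer perceptron for image classification described above; the MNIST dataset, the SGD optimizer, and the epoch count are assumptions chosen for illustration, not part of the slide.

```python
import tensorflow as tf

# Load and normalize an example image dataset (MNIST is an assumption here).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),      # input layer: flatten each image
    tf.keras.layers.Dense(10, activation="softmax"),    # single layer of output nodes
])
model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Weights start random and are adjusted from the error on each pass over the data.
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
```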
  • 29. PROBLEM OF SINGLE LAYER PERCEPTRON • A "single-layer" perceptron can't implement XOR. The reason is that the classes in XOR are not linearly separable: you cannot draw a straight line to separate the points (0,0),(1,1) from the points (0,1),(1,0). This limitation led to the invention of multi-layer networks.
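A quick sketch showing the problem in practice: running the same perceptron rule from the earlier example on XOR targets never converges, and at least one point stays misclassified no matter how long training runs.

```python
import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1, 1, 1, -1])                       # XOR with +1/-1 labels: not linearly separable

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(1000):                              # far more passes than the AND example needed
    for xi, ti in zip(X, y):
        out = 1 if xi @ w + b > 0 else -1
        w += lr * (ti - out) * xi
        b += lr * (ti - out)

preds = [1 if xi @ w + b > 0 else -1 for xi in X]
print(preds, "vs targets", list(y))                # some point is always wrong
```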
  • 30. MULTILAYER PERCEPTRON • A multilayer perceptron (MLP) is a fully connected class of feedforward artificial neural network (ANN). The term MLP is used ambiguously, sometimes loosely to mean any feedforward ANN, sometimes strictly to refer to networks composed of multiple layers of perceptrons (with threshold activation). Understanding this network helps us to obtain information about the underlying reasons behind the advanced models of Deep Learning. The Multilayer Perceptron is commonly used in simple regression problems. However, MLPs are not ideal for processing patterns with sequential and multidimensional data. • A multi-layered perceptron (MLP) is one of the most common neural network models used in the field of deep learning. Often referred to as a "vanilla" neural network, an MLP is simpler than the complex models of today's era. However, the techniques it introduced have paved the way for further advanced neural networks.
  • 31. The multilayer perceptron (MLP) is used for a variety of tasks, such as stock analysis, image identification, spam detection, and election voting predictions. In the multi-layer perceptron diagram above, we can see that there are three inputs and thus three input nodes, and the hidden layer has three nodes. The output layer gives two outputs, therefore there are two output nodes. The nodes in the input layer take input and forward it for further processing: in the diagram above the nodes in the input layer forward their output to each of the three nodes in the hidden layer, and in the same way, the hidden layer processes the information and passes it to the output layer. Every node in the multi-layer perceptron uses a sigmoid activation function. The sigmoid activation function takes real values as input and converts them to numbers between 0 and 1 using the sigmoid formula.
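A minimal sketch of the forward pass through the 3-3-2 network described above, with a sigmoid activation at every node; the random weights are placeholders, since the slide does not give trained values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W_hidden, b_hidden = rng.normal(size=(3, 3)), np.zeros(3)   # 3 inputs -> 3 hidden nodes
W_out, b_out = rng.normal(size=(2, 3)), np.zeros(2)         # 3 hidden nodes -> 2 outputs

x = np.array([0.2, 0.7, -0.4])                 # one example with three input features
hidden = sigmoid(W_hidden @ x + b_hidden)      # every hidden node sees every input (fully connected)
output = sigmoid(W_out @ hidden + b_out)       # values squashed into (0, 1)
print(output)
```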
  • 32. TYPES OF NEURAL NETWORKS 1. Perceptron 2. Feed Forward Neural Network 3. Multilayer Perceptron 4. Convolutional Neural Network 5. Radial Basis Function Neural Network 6. Recurrent Neural Network 7. LSTM – Long Short-Term Memory 8. Sequence to Sequence Models 9. Modular Neural Network
  • 33. A. Perceptron The perceptron model, proposed by Frank Rosenblatt and later analysed by Minsky and Papert, is one of the simplest and oldest models of a neuron. It is the smallest unit of a neural network that does certain computations to detect features or business intelligence in the input data. It accepts weighted inputs and applies the activation function to obtain the output as the final result. The perceptron is also known as a TLU (threshold logic unit). B. Feed Forward Neural Networks The simplest form of neural network, where input data travels in one direction only, passing through artificial neural nodes and exiting through output nodes. Hidden layers may or may not be present, but input and output layers are always present. Based on this, they can be further classified as single-layered or multi-layered feed-forward neural networks. C. Multilayer Perceptron An entry point towards complex neural nets where input data travels through various layers of artificial neurons. Every single node is connected to all neurons in the next layer, which makes it a fully connected neural network. Input and output layers are present, along with multiple hidden layers, i.e. at least three or more layers in total. It has bi-directional propagation, i.e. forward propagation and backward propagation. Inputs are multiplied with weights and fed to the activation function, and in backpropagation the weights are modified to reduce the loss. In simple words, weights are machine-learnt values from neural networks. They self-adjust depending on the error between the predicted and the actual output.
  • 34. D. Convolutional Neural Network A convolutional neural network contains a three-dimensional arrangement of neurons, instead of the standard two-dimensional array. The first layer is called a convolutional layer. Each neuron in the convolutional layer only processes the information from a small part of the visual field. Input features are taken in batch-wise, like a filter. The network understands the images in parts and can compute these operations multiple times to complete the full image processing. Processing involves conversion of the image from the RGB or HSI scale to grey-scale. Further changes in the pixel values help to detect the edges, and images can then be classified into different categories. E. Radial Basis Function Neural Networks A Radial Basis Function network consists of an input vector followed by a layer of RBF neurons and an output layer with one node per category. Classification is performed by measuring the input's similarity to data points from the training set, where each neuron stores a prototype; this is one of the examples from the training set. When a new input vector [the n-dimensional vector that you are trying to classify] needs to be classified, each neuron calculates the Euclidean distance between the input and its prototype. For example, if we have two classes, class A and class B, and the new input is closer to the class A prototypes than to the class B prototypes, it is tagged or classified as class A. Each RBF neuron compares the input vector to its prototype and outputs a value between 0 and 1 which is a measure of similarity. If the input equals the prototype, the output of that RBF neuron will be 1, and as the distance between the input and the prototype grows, the response falls off exponentially towards 0. The curve generated by the neuron's response is a typical bell curve. The output layer consists of a set of neurons [one per category].
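A hedged sketch of a single RBF neuron as described above: the activation is 1 when the input equals the stored prototype and decays exponentially with Euclidean distance; the prototype values and the beta width parameter are assumptions for the example.

```python
import numpy as np

def rbf_activation(x, prototype, beta=1.0):
    dist = np.linalg.norm(x - prototype)     # Euclidean distance between input and prototype
    return np.exp(-beta * dist ** 2)         # bell-shaped response in (0, 1]

prototype_a = np.array([1.0, 1.0])           # e.g. a stored training example for class A
print(rbf_activation(np.array([1.0, 1.0]), prototype_a))   # 1.0: exact match with the prototype
print(rbf_activation(np.array([3.0, 0.0]), prototype_a))   # much closer to 0 as distance grows
```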
  • 35. F. Recurrent Neural Networks In a Recurrent Neural Network, the output of a layer is saved and fed back to the input to help in predicting the outcome of the layer. The first layer is typically a feed forward neural network, followed by a recurrent layer where some information from the previous time-step is remembered by a memory function. Forward propagation is implemented in this case. The network stores information required for its future use. If the prediction is wrong, the learning rate is employed to make small changes; hence, the network gradually moves towards making the right prediction during backpropagation. G. Sequence to sequence models A sequence to sequence model consists of two Recurrent Neural Networks. Here, there exists an encoder that processes the input and a decoder that processes the output. The encoder and decoder work simultaneously, either using the same parameters or different ones. This model, in contrast to a plain RNN, is particularly applicable in cases where the length of the input data is not necessarily equal to the length of the output data. While they possess similar benefits and limitations to the RNN, these models are usually applied mainly in chatbots, machine translation, and question answering systems.
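A minimal sketch of the memory idea in a recurrent layer: the hidden state from the previous time-step is combined with the current input at every step. The layer sizes and the random sequence are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
W_x = rng.normal(size=(4, 3))       # weights applied to the current input
W_h = rng.normal(size=(4, 4))       # weights applied to the previous hidden state
b = np.zeros(4)

def rnn_step(x_t, h_prev):
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)   # new state remembers the old one

h = np.zeros(4)                                    # initial memory is empty
for x_t in rng.normal(size=(5, 3)):                # a sequence of 5 three-dimensional inputs
    h = rnn_step(x_t, h)                           # output of the layer is fed back in
print(h)
```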
  • 36. VARIOUS PARADIGMS OF LEARNING PROBLEMS • There are three major learning paradigms: • 1. Supervised learning • 2. Unsupervised learning • 3. Reinforcement learning • Usually they can be employed by any given type of artificial neural network architecture. Each learning paradigm has many training algorithms.
  • 37. PERSPECTIVES AND ISSUES IN DEEP LEARNING FRAMEWORK • There are three types of problems that are straightforward to diagnose with regard to poor performance of a deep learning neural network model; they are: • Problems with Learning • Problems with Generalization • Problems with Predictions
  • 38. CARDINALITY OF FEED FORWARD NEURAL NETWORK • The cardinality refers to the number of parallel paths that appear in a block. This sounds similar to the inception block, which features 4 operations happening in parallel. However, instead of using different types of operations in parallel, a cardinality of 4 will simply use the same operation 4 times.
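An illustrative sketch of the cardinality idea: the same transformation is applied along 4 parallel paths (each with its own weights) and the results are aggregated, rather than 4 different operations as in an inception block. The sizes and weights are placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)
cardinality = 4
paths = [rng.normal(size=(8, 8)) for _ in range(cardinality)]   # same op, separate weights per path

def block(x):
    # Each parallel path applies the same operation (linear layer + ReLU), then aggregate.
    return sum(np.maximum(0, W @ x) for W in paths)

print(block(rng.normal(size=8)).shape)   # (8,)
```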
  • 39. PROPERTIES OF FUZZY RELATION • The rule bases and the fuzzy relations may have algebraic properties such as the commutative property, inverse, and identity, but not the associative property, so no kind of algebraic structure may be developed. The fuzzy relations are nonlinear functions.