Chapter Four
Deep Learning
 An ANN is a computational model that simulates some properties of the human brain.
 Algorithms that try to mimic the brain.
 massively parallel, distributed system, made up of
simple processing units (neurons)
 The network acquires knowledge from its environment
through a learning process
 Neurons are connected to one another by connection links.
 Each link is associated with a weight, which carries information about the input signal.
Basics of Artificial Neural Networks (ANN)
Basics of ANN
Biological Neural Networks vs. Artificial Neural Networks (comparison figure)
 There are three layers in the network architecture:
 The input layer
 The hidden layer (can be more than one)
 The output layer.
Basics of ANN
 In an ANN, data flows from the input layer, through one or more
hidden layers, to the output layer.
 Each layer consists of neurons that receive input, process it, and
pass the output to the next layer.
 The layers work together to extract features, transform data, and
make predictions.
 The input layer is the first layer in an ANN and is responsible for
receiving the raw input data.
 It doesn’t perform any computations but passes the data to
the next layer.
Basics of ANN
 Hidden Layers are the intermediate layers between the
input and output layers.
 They perform most of the computations required by the
network.
 They can vary in number and size, depending on the complexity of the task.
 Each hidden layer applies a set of weights and biases to
the input data, followed by an activation function to
introduce non-linearity.
Basics of ANN
 The Output Layer is the final layer in an ANN.
 It produces the output predictions.
 The number of neurons in this layer corresponds to the
number of classes in a classification problem or the number
of outputs in a regression problem.
Neurons
 a neuron (or node) is a basic computational unit that
mimics the behavior of a biological neuron in the human
brain.
 In ANN they receive inputs, process them, and pass the
output to the next layer of neurons.
Basics of ANN
 The neuron calculates a weighted sum of the inputs.
 This is done by multiplying each input by its
corresponding weight and adding them up.
Weights
 Weights determine the strength of the connections
between neurons.
 Each connection between neurons is assigned a weight,
which is multiplied by the input value to the neuron to
determine its output.
Basics of ANN
Bias
 Bias is added to the weighted sum of inputs to a neuron in a
given layer.
 It is an additional input to the neuron that helps to adjust the
output of the activation function.
Activation Function
 An activation function is a mathematical function applied to
the output of a neuron.
 Its function is to introduce non-linearity into the model,
allowing the network to learn and represent complex
patterns in the data.
Basics of ANN
 The activation function decides whether a neuron should be activated or not by calculating the weighted sum of its inputs, adding the bias, and applying the function to the result (see the sketch below).
Basics of ANN
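As a concrete illustration, here is a minimal NumPy sketch of what a single neuron computes: a weighted sum of its inputs plus a bias, passed through an activation function. The sigmoid activation and the example numbers are illustrative assumptions, not part of the slides.

import numpy as np

def neuron_output(inputs, weights, bias):
    # Weighted sum: multiply each input by its weight, add them up, then add the bias.
    z = np.dot(weights, inputs) + bias
    # Activation function (sigmoid assumed here) introduces non-linearity.
    return 1.0 / (1.0 + np.exp(-z))

# Example: a neuron with three inputs.
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.1, -0.7])
b = 0.2
print(neuron_output(x, w, b))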
Variants of Activation Function
Sigmoid Function
 takes any real value as input and outputs values from 0 to 1.
Basics of ANN
 The larger the input (more positive), the closer the output
value will be to 1, whereas the smaller the input (more
negative), the closer the output will be to 0.
Basics of ANN
Tanh Function (Hyperbolic Tangent)
 The Tanh function is very similar to the sigmoid activation function.
 The output ranges from -1 to 1.
 In Tanh, the larger the input (more positive), the closer the
output value will be to 1, whereas the smaller the input
(more negative), the closer the output will be to -1.
Basics of ANN
ReLU Activation Function
 The Rectified Linear Unit (ReLU) activation function is one of the most commonly used activation functions in deep learning, particularly in convolutional neural networks (CNNs).
 It outputs the input directly if it is positive, and zero otherwise.
Basics of ANN
Softmax activation function
 It is commonly used in the output layer of neural networks
for multi-class classification tasks.
 It converts the raw output of a neural network into a
probability distribution over multiple output classes
Basics of ANN
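A minimal NumPy sketch of the four activation functions discussed above. The formulas are standard; subtracting the maximum inside softmax is a common numerical-stability convention rather than something specified in this chapter.

import numpy as np

def sigmoid(x):
    # Maps any real value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Maps any real value into the range (-1, 1).
    return np.tanh(x)

def relu(x):
    # Outputs the input if it is positive, otherwise 0.
    return np.maximum(0.0, x)

def softmax(x):
    # Converts raw scores into a probability distribution over classes.
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
print(sigmoid(scores), tanh(scores), relu(scores), softmax(scores))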
Loss Function
 A loss function measures how well a neural network model
performs a certain task, which in most cases is regression
or classification.
 To improve the neural network, we must minimize the value
of the loss function during the backpropagation step.
 The cross-entropy loss function is used only in classification tasks, where we want the neural network to predict probabilities.
Basics of ANN
 For regression tasks, where we want the network to predict continuous numbers, we typically use the mean squared error loss function.
 The mean absolute percentage error loss function is often used in demand forecasting to monitor the network's performance during training.
Types of Loss Functions
 In supervised learning:
 Regression Loss Functions:
• Mean Squared Error
• Mean Absolute Error
Basics of ANN
 Classification Loss Functions:
 Binary Cross-Entropy
 Categorical Cross-Entropy
Basics of ANN
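The loss functions listed above can be sketched in a few lines of NumPy. Clipping predictions away from 0 and 1 is an implementation convenience to avoid log(0), not part of the mathematical definitions.

import numpy as np

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mean_absolute_error(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true is 0 or 1, y_pred is a predicted probability.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true is one-hot encoded, y_pred is a probability distribution per sample.
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=-1))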
 A feedforward neural network is a type of artificial neural network in which connections between the nodes do not form cycles.
 This characteristic differentiates it from recurrent neural
networks (RNNs).
 The network consists of an input layer, one or more
hidden layers, and an output layer.
 Information flows in one direction—from input to output
Feedforward Neural Network
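A minimal sketch of a forward pass through such a network, assuming one hidden layer with ReLU and a softmax output layer. The layer sizes and random weights are illustrative only.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 4 inputs, 8 hidden neurons, 3 output classes.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

def forward(x):
    h = np.maximum(0.0, W1 @ x + b1)   # hidden layer with ReLU
    logits = W2 @ h + b2               # output layer
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()         # softmax over the classes

print(forward(rng.normal(size=4)))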
 An RNN is a deep learning model that is trained to process and convert a sequential data input into a specific sequential data output, such as:
 Words
 Sentences
 Time-series data
 The main and most important feature of an RNN is its hidden state, which remembers some information about the sequence.
Recurrent Neural Network (RNN)
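A minimal sketch of how the hidden state is updated at each time step, using a simple "vanilla" RNN cell with tanh. The weight shapes and random values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 5
W_x = rng.normal(size=(hidden_size, input_size))   # input-to-hidden weights
W_h = rng.normal(size=(hidden_size, hidden_size))  # hidden-to-hidden weights
b = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # The new hidden state mixes the current input with the previous hidden state,
    # so it "remembers" some information about the sequence seen so far.
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

h = np.zeros(hidden_size)
for x_t in rng.normal(size=(4, input_size)):  # a toy sequence of 4 time steps
    h = rnn_step(x_t, h)
print(h)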
 Issues in RNN model training:
 Vanishing Gradient
 occurs when the gradients of the loss function with respect to the
parameters (like weights) become extremely small as they are
propagated back through layers or time steps.
 Exploding Gradient
 occurs when the weights are large, causing the gradients to blow up during backpropagation. This makes training unstable: the model's weights may oscillate wildly or overflow, making learning impossible.
Recurrent Neural Network (RNN)
 An LSTM is a special type of RNN capable of handling the vanishing gradient problem faced by standard RNNs.
 LSTMs are explicitly designed to avoid long-term
dependency problems.
 It has three parts (gates)
 Forget gate
 Input gate
 Output gate
Long Short-term Memory (LSTM)
 The forget gate chooses whether the information coming from the previous timestamp should be remembered or discarded as irrelevant.
 In the input gate, the cell tries to learn new information from the current input.
 In the output gate, the cell passes the updated information from the current timestamp on to the next timestamp.
Long Short-term Memory (LSTM)
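A minimal NumPy sketch of a single LSTM step showing the three gates. The weight shapes and random initialization are illustrative assumptions, and variants such as peephole connections are ignored.

import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One weight matrix per gate plus one for the candidate cell state.
W_f, W_i, W_o, W_c = (rng.normal(size=(hidden_size, input_size + hidden_size)) for _ in range(4))
b_f = b_i = b_o = b_c = np.zeros(hidden_size)

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(W_f @ z + b_f)        # forget gate: keep or drop old information
    i = sigmoid(W_i @ z + b_i)        # input gate: how much new information to add
    o = sigmoid(W_o @ z + b_o)        # output gate: what to pass to the next timestamp
    c_tilde = np.tanh(W_c @ z + b_c)  # candidate new cell content
    c = f * c_prev + i * c_tilde      # updated cell state
    h = o * np.tanh(c)                # updated hidden state
    return h, c

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
h, c = lstm_step(rng.normal(size=input_size), h, c)
print(h)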
 Convolutional neural networks (ConvNets or CNNs) are most often used for classification and computer vision tasks.
 They provide a more scalable approach to image
classification and object recognition tasks, leveraging
principles from linear algebra, specifically matrix
multiplication, to identify patterns within an image.
 They can be computationally demanding, requiring graphical
processing units (GPUs) to train models.
Convolutional Neural Networks (CNNs)
 CNNs are distinguished from other neural networks by their
superior performance with image, speech, or audio signal
inputs.
 They have three main types of layers, which are:
 Convolutional layer
 Pooling layer
 Fully-connected (FC) layer
 With each layer, the CNN increases in complexity,
identifying greater portions of the image.
 Earlier layers focus on simple features, such as colors and
edges
Convolutional Neural Networks (CNNs)
 As the image data progresses through the layers of the CNN,
it starts to recognize larger elements or shapes of the
object until it finally identifies the intended object.
Convolutional layer
 It is the first layer of a convolutional network and the core
building block of a CNN.
 It is where the majority of computation occurs.
 It requires a few components:
 input data,
 a filter, and
 a feature map.
Convolutional Neural Networks (CNNs)
 Let’s assume that the input will be a color image, made up of
a matrix of pixels in 3D.
 This means that the input will have three dimensions: height, width, and depth, where depth corresponds to the RGB color channels of the image.
 We also have a feature detector, also known as a kernel or a
filter.
 The filter moves across the receptive fields of the image, checking whether the feature is present. This process is known as a convolution.
Convolutional Neural Networks (CNNs)
 The feature detector is a two-dimensional (2-D) array of
weights, which represents part of the image.
 The filter is then applied to an area of the image, and a dot
product is calculated between the input pixels and the
filter.
 This dot product is then fed into an output array.
 Afterward, the filter shifts by a stride, repeating the process
until the kernel has swept across the entire image.
 The final output from the series of dot products from the
input and the filter is known as a feature map, activation
map, or a convolved feature.
Convolutional Neural Networks (CNNs)
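A minimal sketch of this sliding-window convolution on a single-channel (grayscale) image with stride 1 and no padding. The example image values and the 3x3 filter are arbitrary and purely illustrative.

import numpy as np

def convolve2d(image, kernel, stride=1):
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Dot product between the filter and the current receptive field.
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])                  # illustrative 3x3 filter
print(convolve2d(image, kernel))                    # 4x4 feature map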
 The feature detector remains fixed as it moves across the image, which is also known as parameter sharing.
 Some parameters, like the weight values, adjust during
training through the process of backpropagation and
gradient descent.
 However, there are three hyperparameters which affect the
volume size of the output that need to be set before the
training of the neural network begins.
 These include:
Convolutional Neural Networks (CNNs)
 Number of filters: affects the depth of the output. For
example, three distinct filters would yield three different
feature maps, creating a depth of three.
 Stride: is the distance, or number of pixels, that the kernel
moves over the input matrix.
 Zero-padding: is usually used when the filters do not fit the
input image.
 This sets all elements that fall outside of the input matrix
to zero, producing a larger or equally sized output.
Convolutional Neural Networks (CNNs)
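Under standard assumptions (square input of width W, square filter of width F, zero-padding P, stride S), the spatial size of the output feature map follows the usual formula (W - F + 2P) / S + 1, sketched here; the 32x32 input and 5x5 filter are illustrative numbers.

def conv_output_size(input_size, filter_size, padding, stride):
    # Standard formula: (W - F + 2P) / S + 1
    return (input_size - filter_size + 2 * padding) // stride + 1

# Example: a 32x32 input, 5x5 filter, no padding, stride 1 -> 28x28 output.
print(conv_output_size(32, 5, padding=0, stride=1))   # 28
# With zero-padding of 2, the output stays the same size as the input.
print(conv_output_size(32, 5, padding=2, stride=1))   # 32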
 The convolutional layer converts the image into numerical
values, allowing the neural network to interpret and extract
relevant patterns.
Convolutional Neural Networks (CNNs)
Pooling layer
 Pooling layers, also known as downsampling, conduct dimensionality
reduction, reducing the number of parameters in the input.
 Similar to the convolutional layer, the pooling operation sweeps a filter
across the entire input, but the difference is that this filter has no
weights.
 Pooling is useful for extracting dominant features that are rotationally and positionally invariant, which helps keep training effective.
Convolutional Neural Networks (CNNs)
 There are two main types of pooling:
 Max pooling: returns the maximum value from the portion of the
image covered by the Kernel.
 Average pooling: returns the average of all the values from the
portion of the image covered by the Kernel.
Convolutional Neural Networks (CNNs)
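A minimal sketch of max pooling and average pooling with a 2x2 window and stride 2 (common but illustrative choices).

import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

feature_map = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(feature_map, mode="max"))       # 2x2 map of maxima
print(pool2d(feature_map, mode="average"))   # 2x2 map of averages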
Fully-connected layer
 In the fully connected layer, each node in the output layer connects directly to every node in the previous layer.
 This layer performs the task of classification based on the
features extracted through the previous layers and their
different filters.
 While convolutional and pooling layers tend to use ReLU functions, FC layers usually leverage a softmax activation function to classify inputs appropriately.
Convolutional Neural Networks (CNNs)
CNN-Example
Generative Adversarial Networks (GAN)
 A GAN is a deep learning model in which two sub-models compete to become more accurate in their predictions.
 They are typically trained unsupervised and use a competitive zero-sum game framework to learn.
 The two neural networks that make up a GAN are referred to as the generator and the discriminator.
 The goal of the generator is to artificially manufacture outputs that could easily be mistaken for real data.
Advanced Topics in Deep Learning
 The goal of the discriminator is to identify which of the
outputs it receives have been artificially created.
Advanced Topics in Deep Learning
Transfer Learning
 It uses pre-trained models from one machine learning task
or dataset to improve performance and generalizability on
a related task or dataset.
 It uses what has been learned in one setting to improve
generalization in another setting.
 Transfer learning algorithms take already-trained models
or networks as a starting point.
Advanced Topics in Deep Learning
 The knowledge from an already-trained ML model must be relevant to the new task for it to be transferable.
 For example, the knowledge gained from recognizing
an image of a dog in a supervised ML system could be
transferred to a new system to recognize images of
cats.
Advanced Topics in Deep Learning
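A minimal Keras-style sketch of this idea, assuming TensorFlow is installed: a model pre-trained on ImageNet is reused as a frozen feature extractor, and only a small new classification head is trained on the related task. The head sizes, the two-class (cat vs. dog) setup, and the commented-out training call are illustrative assumptions.

import tensorflow as tf

# Reuse a network already trained on ImageNet, without its original classifier head.
base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False  # freeze the transferred knowledge

# Add a small new head for the related task (e.g., 2 classes: cat vs. dog).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(new_task_images, new_task_labels, epochs=5)  # train only the new head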
Attention mechanisms
 Attention mechanisms enhance deep learning models by
selectively focusing on important input elements.
 Attention Mechanisms attempt to selectively concentrate
on a few relevant things while ignoring others in deep
neural networks.
Advanced Topics in Deep Learning
How do Attention mechanisms work?
1. First, the model breaks the input down into smaller pieces, such as individual words.
2. Then, it looks at these pieces and decides which ones are the most
important.
3. Each piece gets a score based on how well it matches the question.
4. After scoring each piece, it figures out how much attention to give to
each one.
5. Finally, it adds up all the pieces, giving more weight to the important ones (see the sketch after this list).
Advanced Topics in Deep Learning
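A minimal NumPy sketch of these steps in the style of scaled dot-product attention. The query/key/value terminology and the scaling by the square root of the dimension are standard conventions assumed here, not defined in the slides: each piece is scored against a query, the scores become weights via softmax, and the output is the weighted sum.

import numpy as np

def attention(query, keys, values):
    # Steps 1-3: score each piece (key) against the query.
    scores = keys @ query / np.sqrt(query.shape[-1])
    # Step 4: turn the scores into attention weights that sum to 1.
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()
    # Step 5: weighted sum of the values - important pieces contribute more.
    return weights @ values, weights

rng = np.random.default_rng(0)
keys = values = rng.normal(size=(4, 8))   # 4 input pieces, 8-dimensional each
query = rng.normal(size=8)                # what we are "asking" about
output, weights = attention(query, keys, values)
print(weights)   # how much attention each piece receives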
Transformer Model
 The Transformer model is a neural network architecture
that revolutionized NLP by removing the need for
sequence processing in order, using self-attention
mechanisms instead.
Key Components of the Transformer Model
 Self-Attention Mechanism
 Multi-Head Attention
 Positional Encoding
 Encoder and Decoder Structure
Advanced Topics in Deep Learning
Transformer Model
 Self-attention mechanism: allows the model to weigh the
importance of each word in a sequence relative to others,
capturing relationships even if words are not adjacent.
 Multi-Head Attention: Enables capturing diverse
contextual meanings using multiple attention heads.
 Positional Encoding: Adds positional information to input
embeddings, preserving the order of words.
 Encoder: processes the input and produces a context-rich
representation.
Advanced Topics in Deep Learning
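A minimal sketch of the sinusoidal positional encoding mentioned above. The 10000 base and the sine/cosine interleaving follow the original Transformer formulation; treat this as one common choice rather than the only option.

import numpy as np

def positional_encoding(seq_len, d_model):
    # Each position gets a unique pattern of sines and cosines,
    # so adding it to the word embeddings preserves the order of words.
    positions = np.arange(seq_len)[:, None]             # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                   # (1, d_model)
    angles = positions / np.power(10000, (2 * (dims // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                # even dimensions
    pe[:, 1::2] = np.cos(angles[:, 1::2])                # odd dimensions
    return pe

print(positional_encoding(seq_len=6, d_model=8).shape)   # (6, 8)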
Transformer Model
 Decoder: generates the target sequence using encoder
output and self-attention.
How the Transformer Works
1. Input Embedding and Positional Encoding
2. Processing Through Encoder Layers
3. Decoder Layers and Output Generation
Each stage uses attention to capture relationships and generate the target sequence.
Advanced Topics in Deep Learning
