Deep Learning Basics
• Artificial Neural Networks (ANN): Deep
learning is a subset of machine learning that
uses artificial neural networks to learn from
data.
• Layers: ANNs are composed of layers of
interconnected nodes or neurons. The input
layer receives the data, the output layer
produces the predictions, and the hidden
layers perform the computations in between.
• Activation functions: Activation functions are applied
to the outputs of each layer to introduce non-linearity
and increase the model's expressiveness. Common
activation functions include ReLU, sigmoid, and tanh.
• Backpropagation: This is a method for training neural
networks by iteratively adjusting the weights in each
layer to minimize the difference between the predicted
outputs and the actual outputs. It works by
propagating the error backwards from the output layer
to the input layer and adjusting the weights accordingly.
• Loss function: The loss function measures the
difference between the predicted outputs and the
actual outputs. The goal of training is to minimize the
loss function by adjusting the weights in each layer.
• Optimization algorithms: Optimization algorithms,
such as stochastic gradient descent (SGD) and Adam,
are used to adjust the weights in each layer during
training to minimize the loss function.
• Overfitting: Overfitting occurs when a model is too
complex and starts to memorize the training data
instead of learning the underlying patterns. This can be
prevented by using techniques such as regularization, dropout, or
early stopping. (A minimal end-to-end training sketch follows below.)
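The pieces above (layers, activations, a loss function, backpropagation, and an optimizer) fit together in a short end-to-end example. The following is a minimal, illustrative NumPy sketch, not a definitive implementation: a two-layer network learning XOR, with a tanh hidden layer, a sigmoid output, a cross-entropy loss, hand-written backpropagation, and plain gradient-descent updates. The layer sizes, learning rate, and iteration count are arbitrary choices for the demo.

    import numpy as np

    # Toy dataset: XOR — not linearly separable, so a hidden layer is needed.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    rng = np.random.default_rng(0)
    W1 = rng.normal(scale=0.5, size=(2, 8))   # input layer -> hidden layer
    b1 = np.zeros((1, 8))
    W2 = rng.normal(scale=0.5, size=(8, 1))   # hidden layer -> output layer
    b2 = np.zeros((1, 1))

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    lr = 0.5                                  # learning rate for gradient descent
    for epoch in range(5000):
        # Forward pass: tanh activation in the hidden layer, sigmoid at the output.
        h = np.tanh(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)

        # Loss function: binary cross-entropy between predictions and targets.
        loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

        # Backpropagation: push the error from the output layer toward the input.
        dz2 = (p - y) / len(X)                # gradient w.r.t. output pre-activation
        dW2 = h.T @ dz2
        db2 = dz2.sum(axis=0, keepdims=True)
        dz1 = (dz2 @ W2.T) * (1.0 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
        dW1 = X.T @ dz1
        db1 = dz1.sum(axis=0, keepdims=True)

        # Optimizer step: plain gradient descent on every weight and bias.
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    print("final loss:", round(float(loss), 4))
    print("predictions:", p.round(2).ravel())  # should approach [0, 1, 1, 0]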
Activation Functions in Artificial Neural
Networks (ANNs):
• Activation functions are mathematical functions
used in ANNs to introduce non-linearity into the
output of a neuron or a layer.
• They are typically applied to the weighted sum of a
neuron's inputs plus its bias, before the result is passed
on to the next layer of the network.
• Without activation functions, an ANN would collapse
into a single linear model (essentially linear regression),
which can only capture linear relationships between
input and output.
• Common activation functions include Sigmoid, Tanh, ReLU, and
Softmax.
• Sigmoid and Tanh functions are sigmoidal, meaning they produce
an S-shaped curve. They are used to squash the output of a neuron
to a range between 0 and 1 or -1 and 1, respectively.
• ReLU (Rectified Linear Unit) function is non-sigmoidal and is defined
as f(x) = max(0, x). It is one of the most commonly used activation
functions due to its simplicity and effectiveness.
• Softmax function is used in the output layer of a network to
produce a probability distribution over multiple classes.
• Choosing the right activation function can have a significant impact
on the performance of a neural network, and it is often an area of
active research.
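As a concrete reference, here is a small NumPy sketch of the four functions named above; the function names and test values are only illustrative.

    import numpy as np

    def sigmoid(x):
        # Squashes any real input into the range (0, 1).
        return 1.0 / (1.0 + np.exp(-x))

    def tanh(x):
        # Squashes any real input into the range (-1, 1), centered on 0.
        return np.tanh(x)

    def relu(x):
        # f(x) = max(0, x): negative inputs become 0, positive pass through.
        return np.maximum(0.0, x)

    def softmax(x):
        # Turns a vector of scores into a probability distribution.
        e = np.exp(x - np.max(x))   # subtract the max for numerical stability
        return e / e.sum()

    scores = np.array([-2.0, 0.0, 3.0])
    print(sigmoid(scores), tanh(scores), relu(scores))
    print(softmax(scores), softmax(scores).sum())   # the probabilities sum to 1.0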
Nonlinearity
• In the context of machine learning, nonlinearity is
an important property of neural networks. Neural
networks are composed of many interconnected
processing units (neurons), which apply a
nonlinear activation function to their inputs
before passing them to the next layer of the
network. This nonlinearity allows neural
networks to model complex relationships
between inputs and outputs, and to learn
representations that are not directly observable
in the input data.
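One quick way to see why the nonlinearity matters: if the activation is removed, two stacked layers are mathematically identical to a single layer with a combined weight matrix. A small NumPy check (the matrix sizes and random values are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    W1 = rng.normal(size=(3, 5))          # "layer 1" weights
    W2 = rng.normal(size=(5, 2))          # "layer 2" weights
    x = rng.normal(size=(4, 3))           # a batch of 4 inputs

    # Two stacked layers with no activation in between...
    two_layers = (x @ W1) @ W2
    # ...equal one layer whose weight matrix is the product W1 @ W2.
    one_layer = x @ (W1 @ W2)
    print(np.allclose(two_layers, one_layer))   # True

    # With a nonlinearity in between, the collapse no longer happens.
    with_relu = np.maximum(0.0, x @ W1) @ W2
    print(np.allclose(with_relu, one_layer))    # False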
Simple Linear Regression Model
• For example, a linear model can only learn a linear
decision boundary between two classes, which may
not be sufficient to accurately classify complex data. In
contrast, a nonlinear model such as a neural network
can learn more complex decision boundaries that can
better separate the classes.
• Some common nonlinear activation functions used in
neural networks include the Rectified Linear Unit
(ReLU), sigmoid, tanh, and others. These functions
introduce nonlinearity into the network, allowing it to
learn more complex representations of the input data.
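To make the decision-boundary point concrete, the sketch below compares a linear classifier with a small neural network on a dataset that is not linearly separable. It uses scikit-learn; the dataset, hidden-layer size, and other settings are illustrative choices, and the exact scores will vary.

    from sklearn.datasets import make_moons
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # Two interleaving half-moons: the classes cannot be split by a straight line.
    X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    linear_model = LogisticRegression().fit(X_tr, y_tr)
    neural_net = MLPClassifier(hidden_layer_sizes=(16,), activation="relu",
                               max_iter=2000, random_state=0).fit(X_tr, y_tr)

    print("linear decision boundary:   ", linear_model.score(X_te, y_te))
    print("nonlinear decision boundary:", neural_net.score(X_te, y_te))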
Rectified Linear Unit (ReLU) activation
function:
• ReLU is a non-linear activation function
commonly used in neural networks.
• It takes an input value x and returns the
maximum of 0 and x as the output value.
• The formula for ReLU is f(x) = max(0, x).
• ReLU is computationally efficient, since it
requires only simple thresholding of the input
value, compared to other activation functions
like sigmoid or tanh.
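A minimal NumPy sketch of ReLU and its gradient (the sample inputs are illustrative):

    import numpy as np

    def relu(x):
        # Simple thresholding: keep positive values, zero out the rest.
        return np.maximum(0.0, x)

    def relu_grad(x):
        # Gradient is 1 where the input is positive, 0 elsewhere.
        return (x > 0).astype(float)

    x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
    print(relu(x))        # [0.  0.  0.  0.5 3. ]
    print(relu_grad(x))   # [0. 0. 0. 1. 1.]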
Sigmoid
• Sigmoid is a non-linear activation function commonly used in neural networks.
• It takes an input value x and maps it to a range between 0 and 1 using the formula
f(x) = 1 / (1 + e^-x).
• The output of the sigmoid function can be interpreted as a probability or
likelihood, since it always produces a value between 0 and 1.
• Sigmoid was one of the earliest activation functions used in neural networks, and
it is still used in some applications, such as logistic regression.
• Sigmoid is smooth and differentiable, which makes it useful for backpropagation
and gradient descent optimization.
• One limitation of sigmoid is that it suffers from the "vanishing gradient" problem,
where the gradient becomes very small as the input value becomes very large or
very small, making it difficult for the network to learn.
• Another limitation of sigmoid is that it is not zero-centered, which can slow down
the convergence of gradient descent.
• Due to these limitations, sigmoid is not as commonly used as other activation
functions like ReLU or its variants.
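The following NumPy sketch illustrates both the squashing behaviour and the vanishing-gradient issue described above (the sample inputs are illustrative):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_grad(x):
        s = sigmoid(x)
        return s * (1.0 - s)            # never larger than 0.25 (at x = 0)

    for x in [-10.0, -2.0, 0.0, 2.0, 10.0]:
        print(f"x = {x:6.1f}   sigmoid = {sigmoid(x):.5f}   gradient = {sigmoid_grad(x):.5f}")
    # For large |x| the gradient is close to 0 — the vanishing-gradient effect —
    # and the outputs stay in (0, 1), i.e. they are not zero-centered.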
Tanh
• Tanh is a non-linear activation function commonly used in neural networks.
• It takes an input value x and maps it to a range between -1 and 1 using the formula f(x) = (e^x - e^-x) / (e^x + e^-x).
• Tanh is a shifted and rescaled version of the sigmoid function, with the output value zero-centered.
• Like sigmoid, tanh is smooth and differentiable, which makes it useful for backpropagation and
gradient descent optimization.
• Tanh is often used in the hidden layers of neural networks, especially in recurrent neural networks
(RNNs) and long short-term memory (LSTM) networks.
• One limitation of tanh is that it also suffers from the "vanishing gradient" problem, where the
gradient becomes very small as the input value becomes very large or very small, making it difficult
for the network to learn.
• Another limitation of tanh is that it is more computationally expensive than ReLU or its variants,
since it involves exponentials.
• Despite its limitations, tanh can be useful in certain situations, such as when the input data is
standardized and zero-centered, or when the network needs to model both positive and negative
values.
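A short NumPy sketch illustrating the points above: the zero-centered output range, the shrinking gradient for large |x|, and the relationship to sigmoid, tanh(x) = 2*sigmoid(2x) - 1. The sample inputs are illustrative:

    import numpy as np

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])

    print(np.tanh(x))                  # outputs lie in (-1, 1), centered on 0
    print(1.0 - np.tanh(x) ** 2)       # derivative also shrinks toward 0 for large |x|

    # Shifted and rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
    print(np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0))   # True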
Difference between Activation
Function and ML Algorithm
• An activation function is a mathematical function used in artificial neural networks
to introduce non-linearity into the output of a neuron.
• Activation functions are used to decide whether the neuron should be activated or
not based on the input it receives.
• Common activation functions include sigmoid, ReLU, tanh, and softmax.
• On the other hand, a machine learning algorithm is a method or set of methods
used to learn patterns and relationships in data in order to make predictions or
decisions. Machine learning algorithms can be supervised, unsupervised, or
semi-supervised, and can be used for a wide range of tasks, such as regression,
classification, clustering, and reinforcement learning.
• While activation functions are used in neural networks to introduce non-linearity
and make them more expressive, machine learning algorithms are used to learn
patterns and relationships in data and make predictions based on that learning.
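One way to make the distinction concrete: in scikit-learn, the learning algorithm (an MLP trained by gradient-based optimization) stays the same while the activation function is just a swappable component. A minimal illustrative sketch; the dataset and settings are arbitrary.

    from sklearn.datasets import make_moons
    from sklearn.neural_network import MLPClassifier

    X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

    # The learning algorithm is fixed; only the activation-function component changes.
    for act in ["relu", "tanh", "logistic"]:
        clf = MLPClassifier(hidden_layer_sizes=(16,), activation=act,
                            max_iter=5000, random_state=0).fit(X, y)
        print(act, clf.score(X, y))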
Why ML Accuracy Can Be Better than DL
• Complexity: ML models are often simpler and more
interpretable than ANNs, which can make them easier to
train and optimize. In some cases, a simpler model may be
sufficient to achieve good performance on a given task,
without the need for a complex neural network.
• Data size: ANNs require large amounts of data to train
effectively, and may not perform well on small datasets. In
contrast, some ML models, such as decision trees or logistic
regression, can perform well even on smaller datasets.
• Preprocessing: ANNs are sensitive to how their inputs are
scaled and encoded, so they often require careful
preprocessing of the input data, which can be
time-consuming and require domain expertise. In contrast,
some ML models, such as decision trees or Naive Bayes,
can perform well with minimal preprocessing or feature engineering.
• Model selection: Choosing the right ANN architecture and
hyperparameters can be a challenging task, and may require
extensive experimentation and tuning. In contrast, some ML
models, such as decision trees or Naive Bayes, have fewer
hyperparameters to tune and may be easier to select and optimize.
• Overfitting: ANNs are prone to overfitting, where the model
becomes too complex and performs well on the training data but
poorly on new, unseen data. ML models may be less prone to
overfitting, especially when regularized or constrained in some way.
• Overall, the choice of model depends on the specific task and
dataset at hand, and there is no one-size-fits-all solution. In some
cases, an ML model may be more suitable, while in other cases, an
ANN may be necessary to achieve the desired performance.
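The trade-offs above are easiest to judge empirically. The sketch below compares two classic ML models against a small neural network on a small tabular dataset using scikit-learn; the preprocessing and hyperparameters are deliberately minimal, so the numbers are illustrative only and will change with tuning.

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.tree import DecisionTreeClassifier

    # A small tabular dataset (569 samples, 30 features).
    X, y = load_breast_cancer(return_X_y=True)

    models = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "decision tree": DecisionTreeClassifier(random_state=0),
        "small ANN (MLP)": MLPClassifier(hidden_layer_sizes=(32,),
                                         max_iter=2000, random_state=0),
    }
    for name, model in models.items():
        # Standardize features so every model gets the same preprocessing.
        pipe = make_pipeline(StandardScaler(), model)
        scores = cross_val_score(pipe, X, y, cv=5)
        print(f"{name:20s} mean accuracy = {scores.mean():.3f}")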
