A Beginner’s Approach to Deep Learning
What is Deep Learning?
• Artificial intelligence (AI): algorithms that mimic human intelligence with some logic rules, which may or may not be trained on data
• Machine learning (ML): provides computers or computing systems the ability to automatically learn and improve from experience without being explicitly programmed
• Deep learning (DL): uses multiple layers to progressively extract higher-level features from the raw input, with the features and classification both learned from data
Courtesy: Semiconductor Engineering
Machine Learning vs Deep Learning
Deep Learning in Healthcare
• Disease diagnosis
• Medical imaging
• Smart health records
• Disease prediction
• Personalized medicine
Hierarchy of ML
Machine Learning
• Supervised Learning
  • Classification
  • Regression
• Unsupervised Learning
  • Clustering: centroid-based, density-based, distribution-based, hierarchical
  • Density Estimation: parametric, non-parametric
  • Dimensionality Reduction
• Reinforcement Learning
  • Positive
  • Negative
Neural Networks
• The input layer takes in numerical features
• Input layers are often connected to hidden layers and finally to the output layer
• These connections are called edges
• Edges typically have a weight that adjusts as learning proceeds
• Each circular unit is called a node
• The inputs at each node are multiplied by the corresponding weights, summed with a bias, and passed through an activation function to obtain the output
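A minimal sketch of that node computation in Python (NumPy assumed; the inputs, weights, and bias below are made up for illustration):

```python
import numpy as np

def node_output(inputs, weights, bias):
    # Multiply inputs by weights, add the bias, then apply an activation (sigmoid here)
    z = np.dot(inputs, weights) + bias
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # numerical features from the input layer
w = np.array([0.4, 0.1, -0.7])   # edge weights (adjusted as learning proceeds)
b = 0.2                          # bias
print(node_output(x, w, b))
```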
Perceptron – Simplest Neural Network
Courtesy: Towards Data Science
A perceptron is a single-layer neural network.
[Figure: inputs are multiplied by weights, summed with a bias, then passed through an activation]
The perceptron consists of 4 parts:
• Inputs
• Weights and Bias
• Net sum
• Activation Function
Weights show the strength of a particular edge.
A bias value allows you to shift the activation function curve up or down.
Activation functions are used to map the input to a required range such as (0, 1) or (-1, 1).
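A minimal perceptron sketch, assuming a step activation and hypothetical weights chosen to behave like an AND gate:

```python
import numpy as np

def perceptron(x, w, b):
    net = np.dot(x, w) + b      # net sum: inputs x weights + bias
    return 1 if net > 0 else 0  # step activation maps the net sum to 0 or 1

w = np.array([1.0, 1.0])  # hypothetical weights
b = -1.5                  # hypothetical bias
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", perceptron(np.array(x, dtype=float), w, b))
```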
Multi-Layer Perceptron (MLP)
Courtesy: Towards Data Science
An MLP has more than a single layer.
Layers between the input and output are hidden layers.
An MLP is trained with a supervised learning technique called backpropagation.
It can distinguish data that is not linearly separable.
As we increase the number of layers in an MLP, we enter deep learning.
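A minimal Keras sketch of an MLP with one hidden layer (the feature count and layer sizes are illustrative assumptions); backpropagation is configured via compile and run by fit:

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(4,)),              # 4 numerical input features (assumed)
    keras.layers.Dense(8, activation="relu"),    # hidden layer
    keras.layers.Dense(1, activation="sigmoid")  # output layer for binary classification
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=10)  # trains the weights via backpropagation
```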
Convolutional Neural Networks (CNN)
Courtesy: Towards Data Science
ConvNets have the ability to learn image filters automatically.
There are fewer parameters to train than in an MLP when the input size is large.
CNNs have four types of layers:
• Convolution layer
• Pooling layer
• Dense/Fully connected layer
• Activation layer
CNNs are well-suited for image classification tasks.
Popular CNN architectures include LeNet, AlexNet, VGGNet, etc.
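A minimal Keras sketch stacking all four layer types (filter counts and input shape are illustrative assumptions):

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(28, 28, 1)),         # e.g. small grayscale images
    keras.layers.Conv2D(16, (3, 3)),               # convolution layer
    keras.layers.Activation("relu"),               # activation layer
    keras.layers.MaxPooling2D((2, 2)),             # pooling layer
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),  # dense/fully connected layer
])
model.summary()
```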
Convolution Layer
Courtesy: IBM Research
The convolutional layer is the core building block of a CNN.
Filter values are the weights, which are learned during training.
A convolution kernel, or filter, moves across the receptive fields of the image, checking whether a feature is present.
A dot product is calculated between the input pixels and the filter.
The filter shifts by a stride, repeating the process until the kernel has swept across the entire image.
The final output from this series of dot products between the input and the filter is known as a feature map or activation map.
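A minimal NumPy sketch of this sliding dot product (no padding; the image and filter values are hypothetical):

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    # Slide the kernel across the image, taking a dot product at each position
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            feature_map[i, j] = np.sum(patch * kernel)  # dot product with the filter
    return feature_map

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=float)  # vertical-edge filter
print(conv2d(image, kernel))  # 3x3 feature map
```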
Pooling Layer
Courtesy: Towards Data Science
The pooling layer conducts dimensionality reduction, reducing the number of parameters in the input.
The pooling layer does not have any trainable weights.
There are two main types of pooling:
• Max pooling
• Average pooling
Pooling layers help to reduce complexity, improve efficiency, and limit the risk of overfitting.
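A minimal max-pooling sketch in NumPy (2×2 windows with stride 2 are assumed defaults; note there are no weights to learn):

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i*stride:i*stride+size, j*stride:j*stride+size]
            pooled[i, j] = window.max()  # swap in window.mean() for average pooling
    return pooled

print(max_pool(np.arange(16, dtype=float).reshape(4, 4)))  # 4x4 -> 2x2
```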
Fully Connected Layer
Courtesy: Towards Data Science
In the fully connected layer, each node in the output layer connects directly to every node in the previous layer.
This layer performs the task of classification based on the features extracted through the previous layers and their different filters.
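A minimal NumPy sketch of a fully connected layer as a matrix-vector product (the feature values and random weights are hypothetical):

```python
import numpy as np

def dense(x, W, b):
    # Every output node is connected to every input node: one weight per pair
    return W @ x + b

x = np.array([0.2, 0.8, -0.5, 1.0])  # flattened features from earlier layers
W = np.random.randn(3, 4)            # 3 output classes x 4 inputs
b = np.zeros(3)
scores = dense(x, W, b)
probs = np.exp(scores) / np.exp(scores).sum()  # softmax turns scores into class probabilities
print(probs)
```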
Activation Functions
Courtesy: Towards Data Science
An activation function maps a node's input values to the desired output of the node.
It is used to determine the output of a neural network, such as yes or no, by mapping the resulting values into a range such as 0 to 1 or -1 to 1.
Sigmoid or Logistic Activation Function
The main reason we use the sigmoid function is that its output lies between 0 and 1.
Therefore, it is especially used for models where we have to predict a probability as the output.
The function is differentiable, so we can find the slope of the sigmoid curve at any point.
However, the logistic sigmoid function can cause a neural network to get stuck during training, since its gradient vanishes for inputs far from zero.
Activation Functions
Courtesy: Towards Data Science
Softmax Activation Function
The softmax function is a more generalized logistic activation function, used for multiclass classification.
Activation Functions
Courtesy: Towards Data Science
Tanh or Hyperbolic Tangent Activation Function
The range of the tanh function is from -1 to 1.
The advantage is that negative inputs are mapped strongly negative and zero inputs are mapped near zero in the tanh graph.
The tanh function is mainly used for classification between two classes.
Activation Functions
Courtesy: Towards Data Science
ReLU (Rectified Linear Unit) Activation Function
The range of the ReLU function is from 0 to positive infinity.
The disadvantage is that any negative input given to the ReLU activation function turns into zero immediately in the graph.
Activation Functions
Courtesy: Towards Data Science
Leaky ReLU Activation Function
It is an attempt to solve the dying ReLU problem.
Usually, the value of the negative slope a is 0.01.
The range of the Leaky ReLU is -infinity to infinity.
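Minimal NumPy sketches of the activation functions above (a = 0.01 for Leaky ReLU, as stated):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # maps to (0, 1)

def softmax(x):
    e = np.exp(x - np.max(x))         # shift for numerical stability
    return e / e.sum()                # probabilities over classes

def relu(x):
    return np.maximum(0.0, x)         # negative inputs become zero

def leaky_relu(x, a=0.01):
    return np.where(x > 0, x, a * x)  # small slope instead of zero for negatives

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x), np.tanh(x), relu(x), leaky_relu(x), softmax(x), sep="\n")
```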
Training a CNN
Courtesy: Andrej Karpathy
Once the architecture is fixed, training is possible.
Training is the finding of optimal weights for the convolutional and fully connected layers.
Backpropagation of error is used for updating the weights.
A loss function is optimized with respect to the weights.
Training typically uses Stochastic Gradient Descent (SGD) optimization.
Parameters include learning rate, optimizer, batch size, validation split, metric, and loss.
Training maps a set of inputs to a set of outputs from training data.
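A minimal sketch of the stochastic gradient descent update on a toy one-weight regression problem (the data and learning rate are made up for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])  # target relation: y = 2x

w = 0.0
learning_rate = 0.05
rng = np.random.default_rng(0)
for step in range(300):
    i = rng.integers(len(x))             # one random sample: the "stochastic" part
    grad = 2 * (w * x[i] - y[i]) * x[i]  # gradient of squared error w.r.t. w
    w -= learning_rate * grad            # SGD weight update
print(w)  # approaches 2.0
```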
Losses
Courtesy: Machine Learning Mastery
A loss function is used to evaluate the performance of a prediction; it is a measure of the error in the prediction.
• Binary classification problems: Cross-Entropy
• Multi-class classification problems: Categorical Cross-Entropy, Sparse Categorical Cross-Entropy
• Regression problems: Mean Squared Error (MSE)
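These losses map directly onto Keras built-ins; a brief sketch with hypothetical labels and predictions:

```python
import numpy as np
from tensorflow import keras

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])
print(float(keras.losses.BinaryCrossentropy()(y_true, y_pred)))  # binary classification
print(float(keras.losses.MeanSquaredError()(y_true, y_pred)))    # regression

labels = np.array([2, 0])                             # integer class labels
probs = np.array([[0.1, 0.2, 0.7], [0.8, 0.1, 0.1]])  # predicted class probabilities
print(float(keras.losses.SparseCategoricalCrossentropy()(labels, probs)))
```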
Metrics
Courtesy: Towards Data Science
Supervised Learning
• Classification problems: Confusion Matrix, Accuracy, Precision, Recall, F1 score, ROC, AUC
• Regression problems: Mean Absolute Error (MAE), Mean Squared Error (MSE), Coefficient of Determination (R²)
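A brief sketch of the classification metrics computed with scikit-learn (labels and predictions are hypothetical):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))
print("accuracy: ", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))
```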
Getting Started
Courtesy: Towards Data Science
• Install Keras and TensorFlow: pip install keras and pip install tensorflow
• Prepare the training and testing data: make separate folders, without repeating data, and put each class in its own folder within the training data
• Build the CNN layers using the TensorFlow library: use a Sequential model and keep stacking layers
• Select the optimizer: the most common choice is Adam
• Train the network: use suitable parameters based on the problem
• Finally, test the model on the test data
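Putting the steps together, a minimal end-to-end sketch (assumes a recent TensorFlow; the folder paths, image size, and class count are placeholder assumptions):

```python
from tensorflow import keras

# Step 2: one subfolder per class inside data/train and data/test (hypothetical paths)
train_ds = keras.utils.image_dataset_from_directory(
    "data/train", image_size=(64, 64), batch_size=32)
test_ds = keras.utils.image_dataset_from_directory(
    "data/test", image_size=(64, 64), batch_size=32)

# Step 3: stack layers with a Sequential model
model = keras.Sequential([
    keras.layers.Rescaling(1.0 / 255, input_shape=(64, 64, 3)),
    keras.layers.Conv2D(16, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(2, activation="softmax"),  # assuming 2 classes
])

# Steps 4-5: Adam optimizer, then train
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)

# Step 6: evaluate on the held-out test data
model.evaluate(test_ds)
```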
Other Networks
• Text recognition: RNTN, RNN
• Image recognition: CNN, DBN
• Object recognition: CNN, RNTN
• Time series analysis: RNN
• Video analysis: RNN