Muhammad Usman Akhtar
Ph.D. Scholar, School of Computer Science
Wuhan University, Wuhan, China
DEEP LEARNING
UNDERSTANDING FUNDAMENTALS
Outline
1 Machine Learning (ML)
2 Ingredients for training ML
3 Types of ML algorithms
3.1 Supervised Learning
3.2 Unsupervised Learning
3.3 Reinforcement Learning
4 Deep Learning (DL)
4.1 Why is DL useful?
4.2 Applications
5 Architectures
6 Activation Function
7 Popular Neural Network Architecture
7.1 Feedforward Neural network
7.2 Recurrent Neural network
7.3 Convolutional neural network
MACHINE LEARNING
1 Machine Learning
◂ Machine learning is the scientific study of algorithms and statistical models
that computer systems use to perform a specific task without using explicit
instructions, relying on patterns and inference instead. It is seen as a subset
of artificial intelligence.
2 Ingredients for Training an ML Algorithm
 Data
 Model
 Objective function
 Optimization Algorithm
Data
◂ First, we must prepare a certain amount of data to train with. Usually
this is historical data that is readily available.
Model
◂ The simplest model we can train is a linear model.
◂ In a weather-forecasting problem, that would mean finding some coefficients, multiplying
each variable by them, and summing everything to get the output.
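As a rough sketch in NumPy, with made-up feature names and coefficients (training would normally find the weights for us):

```python
import numpy as np

# Hypothetical weather features: [temperature, humidity, wind speed]
x = np.array([22.0, 0.65, 12.0])

# Illustrative coefficients (weights) and intercept (bias);
# training would find these values for us.
w = np.array([0.8, -5.0, -0.1])
b = 3.0

# Linear model: multiply each variable by its coefficient and sum everything.
y_hat = np.dot(w, x) + b
print(y_hat)
```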
Objective Function
(Diagram: data is fed into the model to obtain the output.)
We want the output to be as close to reality as possible. That's where the objective function comes in: it estimates how correct the model's outputs are.
Here our goal is to minimize the objective function, i.e., the error.
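A minimal sketch of one common objective function, the mean squared error (the toy values are illustrative):

```python
import numpy as np

def mse(targets, predictions):
    # Average squared distance between model outputs and reality;
    # the closer the outputs are to the targets, the smaller the error.
    return np.mean((targets - predictions) ** 2)

targets = np.array([3.0, -0.5, 2.0])
predictions = np.array([2.5, 0.0, 2.0])
print(mse(targets, predictions))  # ~0.167
```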
Optimization Algorithm
◂ It consists of the mechanics through which we vary the parameters of the
model to optimize the objective function.
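The most common such mechanism is gradient descent. A minimal sketch, assuming the mean-squared-error objective above and a toy dataset generated from the rule y = 2x + 1:

```python
import numpy as np

# Toy dataset following the rule y = 2x + 1
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

w, b = 0.0, 0.0
lr = 0.05                          # learning rate (a hyperparameter)

for _ in range(2000):
    y_hat = w * X + b
    error = y_hat - y
    # Gradients of the mean-squared-error objective w.r.t. w and b
    grad_w = 2 * np.mean(error * X)
    grad_b = 2 * np.mean(error)
    # Vary the parameters a small step against the gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # approaches 2 and 1
```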
3 Types of ML Algorithms
◂ Supervised Learning
◂ Unsupervised Learning
◂ Reinforcement learning
3.1 Supervised Learning
◂ Learning with a labeled training set. Starting from the analysis of a known training
dataset, the learning algorithm produces an inferred function to make predictions
about the output values. The system is able to provide targets for any new input
after sufficient training.
◂ Example: email classification and tea making, trained on labeled data.
3.2 Unsupervised Learning
◂ Unsupervised learning studies how systems can infer a function to describe
a hidden structure from unlabeled data. The system is not told the right
output; instead it explores the data and can draw inferences from it.
Grouping similar examples this way is called clustering.
◂ Example: house prices and an animal classifier
3.3 Reinforcement learning
◂ The agent interacts with its environment by producing actions and discovers errors or
rewards. This method allows machines and software agents to
automatically maximize their performance.
◂ Example: learn to play Go, reward: win or lose
DEEP LEARNING
4 Deep Learning (DL)
◂ Deep learning is a machine learning technique that teaches computers to do what
comes naturally to humans: learn by example, from visual, text, and sound data.
◂ Deep learning algorithms attempt to learn (multiple levels of) representation by
using a hierarchy of multiple layers; learning can be supervised, semi-supervised, or
unsupervised.
◂ If you provide the system tons of information, it begins to understand it and
respond in useful ways.
4.1 Why is DL useful?
 Manually designed features are often over-specified, incomplete, and take a long time
to design and validate
 Learned features are easy to adapt and fast to learn
 DL can utilize large amounts of training data
In ~2010, DL started outperforming other ML techniques, first in speech and vision, then in NLP.
4.2 Applications
◂ DL is a key technology behind driverless cars. It is the key to voice control in
consumer devices like phones, tablets, TVs, and hands-free speakers.
◂ Medical Research
◂ Several big improvements in recent years in NLP
 Machine Translation
 Sentiment Analysis
 Dialogue Agents
 Question Answering
 Text Classification …
5 Architecture
◂ Inputs are combined linearly through a weight matrix. With 3 inputs and 4 outputs, the weight matrix holds 3 × 4 = 12 parameters:

W = | w11  w12  w13  w14 |
    | w21  w22  w23  w24 |
    | w31  w32  w33  w34 |

◂ Multiplying the input by the weights gives the output shape: (1 × 3)(3 × 4) = (1 × 4).
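The same shape arithmetic as a minimal NumPy sketch (the weight values are random placeholders):

```python
import numpy as np

x = np.random.rand(1, 3)   # one sample with 3 input features: shape (1, 3)
W = np.random.rand(3, 4)   # weight matrix with 3 * 4 = 12 parameters
b = np.zeros((1, 4))       # one bias per output unit

out = x @ W + b            # (1, 3) @ (3, 4) -> (1, 4)
print(out.shape)           # (1, 4)
```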
5.1 Layers
◂ Each layer combines its inputs linearly and then applies a non-linearity; the non-linearity changes only the linearity of the expression, not its shape.
5.2 Packages
◂ NumPy: a third-party package used for computations; allows us to work with
multi-dimensional arrays.
◂ Matplotlib: a 2D plotting package, especially designed for visualizing
Python and NumPy computations.
◂ TensorFlow: machine learning, especially deep learning.
◂ scikit-learn: features various algorithms like support vector machines, random
forests, and k-nearest neighbors, and also supports Python numerical and
scientific libraries like NumPy and SciPy.
5.3 Hyperparameters vs. Parameters
Hyperparameters (pre-set by us):
◂ Width
◂ Depth
◂ Learning Rate
Parameters (found by optimizing):
◂ Weights (w)
◂ Biases (b)
5.4 Vanishing Gradient
◂ Each of the neural networks weights receives an update proportion to partial
derivation of error function with respect to the current weight in each iteration
of training.
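A small illustration of why this matters: sigmoid derivatives are at most 0.25, so multiplying one such factor per layer, as backpropagation does, shrinks the gradient exponentially with depth:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)   # at most 0.25, reached at z = 0

grad = 1.0
for layer in range(10):
    # Backprop multiplies one activation derivative per layer;
    # weights are ignored here to isolate the activation's effect.
    grad *= sigmoid_grad(0.0)
    print(f"after layer {layer + 1}: {grad:.2e}")
# After 10 layers the gradient has shrunk by a factor of ~10^6.
```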
5.5 Underfitting and Overfitting
◂ An underfit model is too simple to capture the structure in the training data; an overfit model memorizes noise in the training data and fails to generalize to new data.
5.6 Training Loss and Validation Loss
◂ If the training loss keeps decreasing while the validation loss starts rising, the model has begun to overfit.
6 Activation Functions
◂ A neural network without an activation function would simply be a linear
regression model, which is limited in complexity and has less power to learn
complex functional mappings such as images, videos, audio, speech,
etc.
6.1 Sigmoid (Logistic Function)
◂ A sigmoid activation squishes values between 0 and 1. That is helpful for
updating or forgetting data: any number multiplied by 0 is 0, causing
values to disappear or be "forgotten," while any number multiplied by 1 keeps the
same value and is "kept." The network can thus
learn which data is unimportant and can be forgotten, and which data is
important to keep.
6.2 Activation: Tanh
◂ The tanh activation is used to help regulate the values flowing through the
network. The tanh function squishes values to always be between -1 and 1.
6.3 ReLU
◂ Takes a real-valued number and thresholds it at zero: f(x) = max(0, x).
◂ Used within hidden layers; for the output layer, softmax is used instead.
◂ Helps prevent the vanishing-gradient problem.
6.4 Softmax
◂ A function that takes as input a vector of K real numbers and normalizes
it into a probability distribution of K probabilities, each proportional to
the exponential of its input number: softmax(z)_i = exp(z_i) / Σ_j exp(z_j).
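Minimal NumPy sketches of the four activations above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))        # squishes values into (0, 1)

def tanh(z):
    return np.tanh(z)                      # squishes values into (-1, 1)

def relu(z):
    return np.maximum(0.0, z)              # thresholds at zero

def softmax(z):
    e = np.exp(z - np.max(z))              # subtract max for numerical stability
    return e / e.sum()                     # K probabilities that sum to 1

z = np.array([2.0, -1.0, 0.5])
print(sigmoid(z), tanh(z), relu(z), softmax(z), sep="\n")
```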
7 Popular Neural Networks
7.1 Feedforward Neural Network
◂ In a feedforward neural network, information flows in only the forward direction:
from the input nodes, through the hidden layers, to the output nodes.
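A minimal sketch of one forward pass through a two-layer feedforward network (the layer sizes and random weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((1, 3))          # input layer: 3 features

W1 = rng.random((3, 4)); b1 = np.zeros((1, 4))   # hidden layer: 4 units
W2 = rng.random((4, 2)); b2 = np.zeros((1, 2))   # output layer: 2 units

h = np.maximum(0.0, x @ W1 + b1)   # ReLU in the hidden layer
out = h @ W2 + b2                  # information flows only forward
print(out.shape)                   # (1, 2)
```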
7.2 Recurrent NN
◂ Recurrent Neural Networks, or RNNs, were designed to work with
sequence prediction problems rather than local features.
◂ Sequence prediction problems come in many forms and are best
described by the types of inputs and outputs supported.
◂ Unlike a feedforward network, which considers only its current input, an RNN
handles sequential data, memorizes time-series input, and considers all
previous inputs as well (a minimal recurrent cell is sketched after the lists below).
Use RNNs For:
◂ Text data
◂ Speech data
◂ Classification prediction problems
◂ Regression prediction problems
◂ Machine Translation
Don’t Use RNNs For:
◂ Tabular data
◂ Image data
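A minimal sketch of a vanilla recurrent cell, assuming tanh activation and illustrative shapes; it processes a sequence one time step at a time, reusing the same weights:

```python
import numpy as np

rng = np.random.default_rng(0)
seq = rng.random((5, 3))           # 5 time steps, 3 features each

Wx = rng.random((3, 4))            # input-to-hidden weights
Wh = rng.random((4, 4))            # hidden-to-hidden (recurrent) weights
b = np.zeros(4)

h = np.zeros(4)                    # hidden state carries memory across steps
for x_t in seq:
    # The same weights are reused at every time step;
    # h depends on the current input AND on all previous inputs.
    h = np.tanh(x_t @ Wx + h @ Wh + b)

print(h)                           # final state summarizes the whole sequence
```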
7.3 Convolutional NN
◂ CNNs were designed to map image data to an output variable. They have the ability
to develop an internal representation of a two-dimensional image.
◂ The CNN input is traditionally two-dimensional, a field or matrix, but it can also be
changed to one-dimensional, allowing the network to develop an internal
representation of a one-dimensional sequence. This lets CNNs be used
more generally on other types of data that have a spatial relationship
(a minimal convolution is sketched after the lists below).
◂ For example, there is an order relationship between words in a document of
text, and an ordered relationship between the time steps of a time series.
Use CNNs For:
◂ Image data
◂ Classification prediction problems
◂ Regression prediction problems
Try CNNs On:
◂ Text data
◂ Time series data
◂ Sequence input data
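A minimal sketch of the core CNN operation, a single 2D convolution with stride 1 and no padding (the kernel here is an illustrative edge detector):

```python
import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Slide the kernel over the image; each output value
            # summarizes one local patch of the input.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[1.0, -1.0],
                        [1.0, -1.0]])   # responds to left-right differences
print(conv2d(image, edge_kernel))       # shape (4, 4)
```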
Application Example:
IMDB Movie reviews sentiment classification
◂ https://uofi.box.com/v/cs510DL
◂ 50K reviews: 25K positive, 25K negative
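As a hedged sketch, Keras ships a preprocessed copy of this dataset (assuming TensorFlow is installed; the 10,000-word vocabulary cutoff is an arbitrary choice, and this copy may differ from the file behind the link above):

```python
from tensorflow.keras.datasets import imdb

# Keep only the 10,000 most frequent words (an illustrative cutoff).
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)

print(len(x_train), len(x_test))   # 25000 25000
print(y_train[:5])                 # 1 = positive review, 0 = negative
```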
Application Example:
Relation Extraction from text
Useful for:
• knowledge base completion
• social media analysis
• question answering
• …
Possible Questions
• When should supervised learning be used?
• Which algorithm is best for time-series-dependent problems?
• What is 10-fold cross-validation? (sketched below)
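On the last question: 10-fold cross-validation splits the data into ten folds, trains on nine and validates on the held-out one, rotating so each fold serves as the validation set exactly once. A minimal scikit-learn sketch (the iris data and logistic regression model are placeholders):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# cv=10 -> ten folds: each fold is used once for validation.
scores = cross_val_score(model, X, y, cv=10)
print(scores.mean(), scores.std())
```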
Thank you!
You can find me at
◂ ua@uetpeshawar.edu.pk
Q&A
Editor's Notes

  • #10 w1 and w2 are the parameters that will change. For each set of parameters we compute the objective function, and then we choose the model with the highest predictive power.
  • #13 In supervised learning, we provide the algorithm with inputs and their corresponding desired outputs; based on this information, it learns how to produce outputs closer to the ones we are looking for.
  • #14 Sometimes we don't have the time or resources to label the whole dataset. Unsupervised learning discovers patterns in unlabeled data: we don't tell the algorithm what our goal is, but instead ask it to find some sort of dependence or underlying logic in the data provided.
  • #15 The agent learns to act based on feedback/reward.
  • #20 Remember that we combine the inputs linearly and then add a non-linearity. How? The inputs are X, and to join them linearly we need weights; in this example the weights form a 3 × 4 matrix, so (1 × 3)(3 × 4) = (1 × 4) and we get a 1 × 4 vector. Non-linearities don't change the shape of the expression, just its linearity.
  • #22 TensorFlow is the leading library for neural networks (deep NNs, convolutional NNs, recurrent NNs), released by Google in 2015. For k-means and random forests, scikit-learn is the better choice.
  • #28 Sigmoid neurons saturate and kill gradients: the network barely learns when a neuron's activations are near 0 or 1 (saturation), since the gradient in those regions is almost zero and almost no signal flows to the weights. If the initial weights are too large, most neurons saturate. Sigmoid is especially used for models where we have to predict a probability as the output, since probabilities exist only in the range 0 to 1.
  • #29 Like sigmoid, tanh neurons saturate; unlike sigmoid, the output is zero-centered. Tanh is a scaled sigmoid: tanh(x) = 2·sigmoid(2x) − 1.
  • #30 Most deep networks use ReLU nowadays. Range: [0, ∞). It trains much faster, accelerating the convergence of SGD thanks to its linear, non-saturating form; it uses less expensive operations, implemented by simply thresholding a matrix at zero; and it is more expressive. The problem with ReLU is that some gradients can be fragile during training and can die: a weight update can make a neuron never activate on any data point again, producing dead neurons. Leaky ReLU was introduced to fix this; it adds a small slope for negative inputs to keep the updates alive.
  • #31 The softmax function outputs a vector that represents the probability distribution over a list of potential outcomes.