武汉理工大学
Wuhan University of Technology
School of Computer Science
By OLOULADE BABATOUNDE MOCTARD
“Artificial Intelligence is the science and engineering of making intelligent machines, especially intelligent computer programs.”
John McCarthy
• Gaming
• Natural Language Processing
• Expert Systems
• Vision Systems
• Handwriting Recognition
• Intelligent Robots
Applications of AI
INPUT → AI ALGORITHM → OUTPUT
Machine Learning explores the study and construction of algorithms that can learn from and make predictions on data.
Supervised Learning
• Mainly solves regression and classification problems
• Labelled data is used for training
• Examples: Linear Regression, Support Vector Machines (SVM), Neural Networks, Decision Trees, Naive Bayes, Nearest Neighbor
Unsupervised Learning
• Used for clustering problems (grouping) and anomaly detection (e.g., unusual transactions in banks)
• Unlabeled data is used
• Examples: k-means clustering, association rule learning
• Used in descriptive modelling
Semi-supervised Learning
• Sits in between supervised and unsupervised learning
Reinforcement Learning
• The machine learns from past experience
• Modelled as a Markov Decision Process
• Examples: Q-Learning, Deep Adversarial Networks
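As a rough illustration of the first two paradigms, here is a minimal scikit-learn sketch; the toy data, model choices, and parameters are invented for illustration:

```python
# Hypothetical illustration: supervised vs. unsupervised learning with scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

# Supervised learning: labelled data (X, y) is used for training.
X = np.array([[1.0], [2.0], [3.0], [4.0]])   # features
y = np.array([2.1, 3.9, 6.2, 8.1])           # labels
reg = LinearRegression().fit(X, y)
print(reg.predict([[5.0]]))                  # predict on unseen data

# Unsupervised learning: only unlabelled data is used (clustering).
X_unlabelled = np.array([[1, 1], [1, 2], [8, 8], [9, 8]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_unlabelled)
print(km.labels_)                            # cluster assignment per point
```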
Application areas: Healthcare, Finance, Retail, Travel, Media, Government, Transportation
Example applications: Virtual Personal Assistants, Video Surveillance, Social Media Services, Malware Filtering, Result Refining, Product Recommendations, Online Fraud Detection, Web Search Engines, Photo Tagging Applications, Spam Detection, Marketing and Sales
Deep learning is a type of machine learning in which a model learns to perform classification tasks directly from images, text, or sound.
• Deep Learning is inspired by the functionality of our brain cells, which gives rise to the artificial neural network (ANN)
• Deep Learning automatically finds the features which are important for classification
• Deep Learning is an emphasis on learning successive layers of increasingly meaningful representations
• These neural networks are structured as literal layers stacked on top of each other (deep neural networks, DNNs)
• A deep network is a multistage information-distillation operation
• Deep Learning utilizes large amounts of training data
Decision Trees
• Use text collections and structured knowledge bases
Gradient Boosting Machines
• Based on ensembling weak prediction models
Kernel Methods
• A group of classification algorithms
• Best known: the support vector machine (SVM)
Random Forests
• Involve building a large number of specialized decision trees and then ensembling their outputs
Neural Networks
• Used for question-answering problems based on different types of resources, including the Web, tables, images, diagrams, and videos
Probabilistic Modeling
• Among the earliest forms of machine learning
• Naive Bayes algorithm
• Logistic regression
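As a hedged sketch of these two probabilistic baselines with scikit-learn (the iris dataset and parameters are arbitrary choices, not from the slides):

```python
# Hypothetical sketch: Naive Bayes and logistic regression on a toy dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit both probabilistic baselines and report held-out accuracy.
for model in (GaussianNB(), LogisticRegression(max_iter=1000)):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))
```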
• Advancements in speech recognition in the last 3 years
• Advancements in Computer Vision
• Advancements in Natural Language Processing
Architecture of DNN
https://cdn-images-1.medium.com/max/800/1*5egrX--WuyrLA7gBEXdg5A.png
A DNN uses a cascade of many layers of non-linear processing units for feature extraction and transformation.
• Features are learned using multiple levels of representation
• The multiple layers of a DNN help the machine derive a hierarchical representation
• DL can be applied to supervised as well as unsupervised datasets to develop NLP applications
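A minimal Keras sketch of such a stack of layers (the 784-dimensional input, layer widths, and 10-class output are illustrative assumptions):

```python
# Hypothetical sketch: a small DNN as a stack of non-linear processing layers.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(784,)),              # e.g., flattened 28x28 images
    layers.Dense(256, activation="relu"),   # each layer learns a representation
    layers.Dense(64, activation="relu"),    # successively more distilled features
    layers.Dense(10, activation="softmax")  # class probabilities
])
model.compile(optimizer="rmsprop",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```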
Basic layered ANN configuration
https://cn.bing.com/images/search?view=detailV2&ccid=1%2FTR7%2Ft2&id=E83FB45C81BE2EFB2C6EA8FAF5F66A7328B7DA66&thid=OIP.1_TR7_t2HMjc4nxdR6KNuQHaD9&mediaurl=http%3A%2F%2Fwww.scielo.org.co%2Fimg%2Frevistas%2Fiei%2Fv34n2%2Fv34n2a03f2.jpg&exph=271&expw=507&q=ann+layers+weight&simid=608029997680099912&selectedindex=61&vt=0
• The loss function measures the quality of the network’s output.
• The loss score is used as a feedback signal to adjust the weights.
• Initially, the weights of the network are assigned random values
➤ Gradient descent
• Gradient descent is used to optimize the accuracy of a model (e.g., a linear regression) by minimizing the loss or error function over time.
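A minimal numpy sketch of this loop for one-variable linear regression; the data, learning rate, and step count are invented, the weights start random, and the MSE loss serves as the feedback signal:

```python
# Hypothetical sketch: gradient descent minimizing MSE for y ≈ w*x + b.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])          # underlying truth: y = 2x + 1
w, b = rng.normal(), rng.normal()           # weights initialized randomly
lr = 0.05                                   # learning rate

for step in range(500):
    y_pred = w * x + b
    loss = np.mean((y_pred - y) ** 2)       # loss score = feedback signal
    grad_w = np.mean(2 * (y_pred - y) * x)  # dLoss/dw
    grad_b = np.mean(2 * (y_pred - y))      # dLoss/db
    w -= lr * grad_w                        # step against the gradient
    b -= lr * grad_b

print(w, b, loss)                           # expect w ≈ 2, b ≈ 1
```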
Learning rate intuition
http://cs231n.github.io/assets/nn3/learningrates.jpeg
Activation functions
Activation functions map input nodes to output nodes through certain mathematical operations.
ANN Structure
• Architecture (arrangement of neurons and layers)
• Activities (activities of the neurons)
• Learning rule (update weights and optimize output)
• The transfer potential aggregates inputs and weights
• The activation function applies a non-linear mathematical transformation to the transfer potential
• The threshold function either activates the neuron or does not
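These three pieces can be sketched for a single neuron as follows (numpy; the inputs, weights, and 0.5 threshold are invented example values):

```python
# Hypothetical sketch of one artificial neuron.
import numpy as np

def neuron(inputs, weights, bias, threshold=0.5):
    potential = np.dot(inputs, weights) + bias     # transfer potential: aggregate inputs and weights
    activation = 1.0 / (1.0 + np.exp(-potential))  # activation function: non-linear transform (sigmoid)
    return 1 if activation >= threshold else 0     # threshold function: fire or not

print(neuron(np.array([0.5, 0.3]), np.array([0.8, -0.2]), bias=0.1))
```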
Activation functions
➤ Sigmoid
Sigmoid function equation
https://cdn-images-1.medium.com/max/800/1*QHPXkxGmIyxn7mH4BtRJXQ.png
• Takes a number and squashes it into the range [0, 1]
Problems
• Suffers from the vanishing gradient problem: the gradient of the network's output with respect to the parameters in the early layers becomes very small
• Has a slow convergence rate: due to the vanishing gradient problem, the sigmoid activation function converges very slowly
• Is not a zero-centric function: the sigmoid output range is [0, 1]
➤ Hyperbolic tangent function (TanH)
Tanh activation function equation
https://cdn-images-1.medium.com/max/800/1*HJhu8BO7KxkjqRRMSaz0Gw.png
• Squashes the input into the range [-1, 1]
• Its output is zero-centric
• TanH also suffers from the vanishing gradient problem
➤ Rectified Linear Unit (ReLU)
ReLU activation function equation
https://cdn-images-1.medium.com/max/800/1*JtJaS_wPTCshSvAFlCu_Wg.png
• ReLU is simple and doesn't involve any complex computation
• ReLU is less expensive than sigmoid and TanH
• ReLU doesn't have the vanishing gradient problem
• However, some units of the neural network can be fragile and die during training: the gradient flowing through them will always be zero from that point on
➤ Leaky ReLU
Leaky ReLU
http://wangxinliu.com/images/machine_learning/leakyrelu.png
➤ Maxout
• A generalized form of both ReLU and Leaky ReLU
• Doubles the number of parameters of each neuron
➤ Other activation functions
• Binary step function
• Identity function
• ArcTan
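The activation functions above can be written in a few lines of numpy (a sketch; the 0.01 leak coefficient for Leaky ReLU is a common but arbitrary choice):

```python
# Hypothetical numpy sketches of the activation functions discussed above.
import numpy as np

def sigmoid(x):                 # squashes input into (0, 1); not zero-centric
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):                    # squashes input into (-1, 1); zero-centric
    return np.tanh(x)

def relu(x):                    # cheap; no vanishing gradient for x > 0
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):  # small slope for x < 0 avoids "dead" units
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, f(x))
```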
➤ Loss functions (cost functions or error functions)
• Define the error function and get the generated output when starting to train the ANN
• Compare the generated output with the expected output given as part of the training data
• Calculate the gradient value of this error function
• Backpropagate the error gradient through the network to update the existing weights and bias values and optimize the generated output
Common loss functions:
• Quadratic cost function (mean squared error or sum squared error)
• Cross-entropy cost function (Bernoulli negative log-likelihood or binary cross-entropy)
• Kullback-Leibler divergence (information divergence, information gain, relative entropy, or KLIC)
• Exponential cost, Hellinger distance, generalized Kullback-Leibler divergence, and Itakura-Saito distance
➤ Popular loss functions
• Regression tasks: e.g., the quadratic cost (mean squared error)
• Categorical data and classification tasks: e.g., the cross-entropy cost
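A numpy sketch of the two most common cases (illustrative values; the small epsilon guarding the logarithm is a standard numerical precaution):

```python
# Hypothetical numpy sketches of common loss functions.
import numpy as np

def mse(y_true, y_pred):                    # quadratic cost, for regression
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):  # for binary classification
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) +
                    (1 - y_true) * np.log(1 - y_pred))

print(mse(np.array([1.0, 2.0]), np.array([1.1, 1.9])))
print(binary_cross_entropy(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.7])))
```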
➤ Unsupervised Pretrained Networks (UPNs)
➤ Convolutional Neural Networks (CNNs)
➤ Recurrent Neural Networks
➤ Recursive Neural Networks
Unsupervised Pretrained Networks
Autoencoders
Deep Belief Networks (DBNs)
Generative Adversarial Networks (GANs)
Autoencoders (compression autoencoders and denoising autoencoders)
Autoencoder network architecture
• Autoencoders are used to reduce a dataset's dimensionality.
• The output of the autoencoder network is a reconstruction of the input data in its most efficient form.
• The output layer in an autoencoder has the same number of units as the input layer.
• The autoencoder learns directly from unlabeled data.
• Autoencoders rely on backpropagation to update their weights.
• Autoencoders are good at powering anomaly detection systems.
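A minimal Keras sketch of a compression autoencoder (the 784→32→784 layer sizes are illustrative assumptions; note the output layer matches the input size, and the input serves as its own training target):

```python
# Hypothetical sketch: a compression autoencoder that reconstructs its input.
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,))                          # e.g., flattened 28x28 image
encoded = layers.Dense(32, activation="relu")(inputs)       # compressed representation
decoded = layers.Dense(784, activation="sigmoid")(encoded)  # same size as the input

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")  # trained to reproduce the input
# autoencoder.fit(x_train, x_train, ...)           # input is its own target
autoencoder.summary()
```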
Deep Belief Networks (DBNs)
• DBNs are composed of layers of Restricted Boltzmann Machines (RBMs)
• An RBM is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs
DBN architecture
Generative Adversarial Networks (GANs)
• GANs use unsupervised learning to train two models in parallel
➤ Convolutional Neural Networks (CNNs)
• A convolutional neural network (CNN, or ConvNet) is one of the most popular algorithms for deep learning with images and video
• A CNN is composed of an input layer, an output layer, and many hidden layers in between
Feature detection layers:
• Convolution: puts the input images through a set of convolutional filters
• Pooling: simplifies the output by performing nonlinear downsampling, reducing the number of parameters that the network needs to learn
• Rectified Linear Unit (ReLU): allows for faster and more effective training by mapping negative values to zero and maintaining positive values
Classification layers:
• Consist of one or more layers
• These layers produce class probabilities or scores
• Their output is typically two-dimensional
• The input layer accepts three-dimensional input, generally arranged spatially
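A minimal Keras sketch wiring these layers together (the 28×28×1 input, filter counts, and 10-class output are illustrative assumptions):

```python
# Hypothetical sketch: CNN with convolution, ReLU, pooling, and classification layers.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                       # three-dimensional (spatial) input
    layers.Conv2D(16, kernel_size=3, activation="relu"),  # convolution + ReLU
    layers.MaxPooling2D(pool_size=2),                     # pooling: nonlinear downsampling
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax")                # class probabilities or scores
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```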
 Recurrent Neural Networks
• Recurrent Neural Networks are in the family of feed-forward neural networks
• Recurrent Neural Networks can send information over time-steps
• Recurrent Neural Networks use the backpropagation algorithm, but with a little twist
Advantages
• Possibility of processing input of any length
• Model size not increasing with size of input
• Computation takes into account historical information
• Weights are shared across time
➤ Applications of Recurrent Neural Networks in NLP
• Language Modeling and Generating Text
• Machine Translation
• Speech Recognition
• Generating Image Descriptions
Drawbacks
• Computation being slow
• Difficulty of accessing information from a long time ago
• Cannot consider any future input for the current state
➤ RNN Extensions
• Bidirectional RNNs:
• the output at time t may not only depend on the previous elements in the sequence, but also on future elements
• Deep (Bidirectional) RNNs:
• multiple layers per time step
• LSTM networks:
• LSTMs don't have a fundamentally different architecture from RNNs
• they use a different function to compute the hidden state
• they combine the previous state, the current memory, and the input
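A minimal Keras sketch of a (bidirectional) LSTM over token sequences (the vocabulary size, sequence length, and layer sizes are invented for illustration):

```python
# Hypothetical sketch: a bidirectional LSTM, e.g., for text classification.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(100,), dtype="int32"),          # sequences of 100 token ids
    layers.Embedding(input_dim=10000, output_dim=32),  # vocabulary of 10,000 tokens
    layers.Bidirectional(layers.LSTM(32)),             # uses past and future context
    layers.Dense(1, activation="sigmoid")              # e.g., binary sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```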
➤ Recursive Neural Networks
• A recursive neural network is created by applying the same set of weights recursively over a structured input
• A recursive neural network has the ability to model the hierarchical structures in the training dataset
➤ Recursive vs. Recurrent Neural Networks
• A recurrent neural network basically unfolds over time
• A recursive neural network is more like a hierarchical network
• Applications: image scene decomposition, NLP, audio-to-text transcription
➤ When to use Deep Learning
• Simpler models (e.g., logistic regression) don't achieve the accuracy level your use case needs
• You have complex pattern matching in images, NLP, or audio to deal with
• You have high-dimensionality data
• You have the dimension of time in your vectors (sequences)
➤ When simpler models may be preferable
• You have high-quality, low-dimensional data; for example, columnar data from a database export
• You're not trying to find complex patterns in image data
• Either way, you'll achieve poor results from both methods when the data is incomplete and/or of poor quality
➤ References
• Sarita Arora, Python Natural Language Processing, SMECorner, Mumbai, India
• François Chollet, Deep Learning with Python, Manning, Shelter Island, 2018
• Richard Socher, Yoshua Bengio, and Chris Manning, Deep Learning for NLP (without Magic), ACL 2012
• Joshua F. Wiley, R Deep Learning Essentials, Packt Publishing, Birmingham/Mumbai, 2016
• Josh Patterson and Adam Gibson, Deep Learning: A Practitioner's Approach, O'Reilly Media, Inc., 2017
• Nikhil Buduma, Fundamentals of Deep Learning, O'Reilly Media, Inc., 2017
• Antonio Spadaro, AI, Machine Learning & Deep Learning: cosa cambia, PyCon Italia 2017
• Ir Dr F. Chan, Artificial Intelligence: Deep Learning and its Applications, Build4Asia Conference 2018
• https://www.tutorialspoint.com/artificial_intelligence/artificial_intelligence_overview.htm
• https://data-flair.training/blogs/python-django-tutorial/
• https://www.geeksforgeeks.org/ml-machine-learning/
• https://www.google.com/search?q=machine+learning+application&source