A simplified way of approaching machine learning and deep learning from the ground up. The case for deep learning and an attempt to develop intuition for how/why it works. Advantages, state-of-the-art, and trends.
Presented at NYU Center for Genomics for NY Deep Learning Meetup
Squeezing Deep Learning Into Mobile Phones (Anirudh Koul)
A practical talk by Anirudh Koul on how to run deep neural networks on memory- and energy-constrained devices like smartphones. Highlights some frameworks and best practices.
Mastering Computer Vision Problems with State-of-the-art Deep Learning (Miguel González-Fierro)
Deep learning has been especially successful in computer vision tasks such as image classification because convolutional neural nets (CNNs) can create hierarchical representations of an image. One of the most remarkable advances is ResNet, the CNN that surpassed human-level accuracy on ImageNet for the first time.
The ImageNet competition has become the de facto benchmark for image classification in the research community. The "small" ImageNet dataset contains more than 1.2 million images distributed across 1,000 classes.
Miguel González-Fierro explains how to train a state-of-the-art deep neural network, ResNet, using Microsoft R Server and MXNet with the ImageNet dataset. (While most deep learning libraries are programmed in C++ and Python, only MXNet offers an API for R programmers.) Miguel then demonstrates how to operationalize this training for real-world business problems related to image classification.
This talk was presented at Strata London 2017: https://conferences.oreilly.com/strata/strata-eu/public/schedule/detail/57428
Improving Hardware Efficiency for DNN Applications (Chester Chen)
Speaker: Dr. Hai (Helen) Li, the Clare Boothe Luce Associate Professor of Electrical and Computer Engineering and co-director of the Duke Center for Evolutionary Intelligence at Duke University.
In this talk, I will introduce a few recent research spotlights from the Duke Center for Evolutionary Intelligence. The talk starts with the structured sparsity learning (SSL) method, which attempts to learn a compact structure from a bigger DNN to reduce computation cost; it generates a regularized structure with high execution efficiency. Our experiments on CPU, GPU, and FPGA platforms show, on average, a 3-5x speedup of convolutional-layer computation in AlexNet. Next, the implementation and acceleration of DNN applications on mobile computing systems is introduced: MoDNN is a local distributed system which partitions DNN models onto several mobile devices to accelerate computation, and ApesNet is an efficient pixel-wise segmentation network which understands road scenes in real time and has achieved promising accuracy. Our prospects on the adoption of emerging technology are given at the end of the talk, offering the audience an alternative way of thinking about the future evolution and revolution of modern computing systems.
This presentation focuses on Deep Learning (DL) concepts, such as neural networks, backprop, activation functions, and Convolutional Neural Networks. You'll also learn how to incorporate Deep Learning in Android applications. Basic knowledge of matrices is helpful for this session, which is targeted primarily to beginners.
Presentation of a few recent papers on deep learning, in particular "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding" by Song Han, Huizi Mao, and William J. Dally, International Conference on Learning Representations (ICLR) 2016.
From Conventional Machine Learning to Deep Learning and Beyond.pptx (Chun-Hao Chang)
In this slide deck, deep learning is compared with conventional machine learning, and the strengths of DNN models are explained.
The target audience is people who know machine learning or data mining but are not familiar with deep learning.
Yinyin Liu presents at the SD Robotics Meetup on November 8th, 2016. Deep learning has achieved great success in image understanding, speech, text recognition, and natural language processing. Deep learning also has tremendous potential to tackle the challenges of robotic vision and sensorimotor learning in a robotic learning environment. In this talk, we will discuss how current and future deep learning technologies can be applied to robotic applications.
Deep learning on mobile - 2019 Practitioner's Guide (Anirudh Koul)
The 2019 guide to deep learning on mobile, from inference to training on iOS and Android smartphones. Featuring Core ML, TensorFlow Lite, ML Kit, Fritz, and AutoML approaches (hardware-aware neural architecture search) to make models more efficient, plus lots of videos. Presented by Anirudh Koul, Siddha Ganju, and Meher Anand Kasam. More details at PracticalDL.ai and in the upcoming O'Reilly book 'Practical Deep Learning on Cloud & Mobile'.
Talk given at PyCon Stockholm 2015.
Intro to deep learning + taking a pretrained ImageNet network, extracting features, and training an RBM on top = 97% accuracy after 1 hour (!) of training (top 10% of the Kaggle cats vs. dogs competition).
This is a two-hour overview of the state of deep learning as of Q1 2017, starting with some basic concepts and continuing to basic network topologies, tools, hardware/accelerators, and finally Intel's take on the different fronts.
Faster deep learning solutions from training to inference - Michele Tameni (Codemotion)
The Intel Deep Learning SDK enables the use of optimized open source deep-learning frameworks, including Caffe and TensorFlow, through a step-by-step wizard or IPython interactive notebooks. It includes easy and fast installation of all dependent libraries, plus advanced tools for data pre-processing and for model training, optimization, and deployment, providing an end-to-end solution. In addition, it supports scale-out across multiple computers for training, as well as compression methods for deploying models on various platforms under memory and speed constraints.
Deep learning goes beyond the traditional machine learning of big data and analytics. In this session, we will review the AWS offering, Amazon Machine Learning, and the AWS GPU-intensive family of servers that run native machine learning and deep-learning algorithms. We will also cover some basic deep-learning algorithms using open source software. Session sponsored by Day1 Solutions.
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/wavecomp/embedded-vision-training/videos/pages/may-2017-embedded-vision-summit-nicol
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Chris Nicol, CTO at Wave Computing, presents the "New Dataflow Architecture for Machine Learning" tutorial at the May 2017 Embedded Vision Summit.
Data scientists have made tremendous advances in the use of deep neural networks (DNNs) to enhance business models and service offerings. But training DNNs can take a week or more using traditional hardware solutions that rely on legacy architectures that are limited in performance and scalability. New innovations that can reduce training time for both image-centric and text-centric deep neural networks will lead to an explosion of new applications. Dr. Chris Nicol, Wave Computing’s Chief Technology Officer, examines the performance challenge faced by data scientists today. Nicol outlines the technical factors underlying this bottleneck for systems relying on CPUs, GPUs, FPGAs and ASICs, and introduces a new dataflow-centric approach to DNN training.
On-device machine learning: TensorFlow on Android (Yufeng Guo)
Machine learning has traditionally been performed solely on servers and high-performance machines. But there is great value in having on-device machine learning on mobile devices. Doing ML inference on mobile devices has huge potential and is still in its early stages. However, it's already more powerful than most realize.
In this demo-oriented talk, you will see some examples of deep learning models used for local prediction on mobile devices. Learn how to use TensorFlow to implement a machine learning model that is tailored to a custom dataset, and start making delightful experiences today!
In recent months, Deep Learning has become the hottest topic in the IT industry. However, its arcane jargon and its intimidating equations often discourage software developers, who wrongly think that they’re “not smart enough”. Through code-level demos based on Apache MXNet, we’ll demonstrate how to build, train and use models based on different types of networks: multi-layer perceptrons, convolutional neural networks and long short-term memory networks. Finally, we’ll share some optimization tips which will help improve the training speed and the performance of your models.
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problems (StampedeCon)
In this session, we’ll discuss approaches for applying convolutional neural networks to novel computer vision problems, even without having millions of images of your own. Pretrained models and generic image data sets from Google, Kaggle, universities, and other places can be leveraged and adapted to solve industry and business specific problems. We’ll discuss the approaches of transfer learning and fine tuning to help anyone get started on using deep learning to get cutting edge results on their computer vision problems.
Traditional machine learning used hand-written features and modality-specific algorithms to classify images and text or recognize voices. Deep learning / neural networks identify features and find patterns automatically. The time to build these complex systems has been drastically reduced, and accuracy has increased dramatically, because of advances in deep learning. Neural networks have been partly inspired by how the 86 billion neurons in a human brain work, and have become more of a mathematical and computational problem. We will see by the end how neural networks can be intuitively understood and implemented as a set of matrix multiplications, a cost function, and optimization algorithms.
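That last sentence can be made concrete with a minimal sketch, assuming illustrative sizes and toy data (plain NumPy): a two-layer network really is a pair of matrix multiplications with nonlinearities, a mean-squared-error cost, and a gradient-descent update.

```python
import numpy as np

# A tiny neural network as the text describes: matrix multiplications,
# a cost function, and one gradient-descent optimization step.
# All sizes, data, and the learning rate are illustrative assumptions.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))                           # 8 samples, 4 features
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)  # toy labels

W1 = rng.normal(scale=0.5, size=(4, 5))  # first layer weights
W2 = rng.normal(scale=0.5, size=(5, 1))  # output layer weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, W2):
    h = sigmoid(X @ W1)          # hidden layer: matrix multiply + nonlinearity
    return h, sigmoid(h @ W2)    # output layer: another multiply + nonlinearity

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

h, pred = forward(X, W1, W2)
loss_before = mse(pred, y)

# One gradient-descent step (backpropagation through both layers)
d_out = (pred - y) * pred * (1 - pred)
grad_W2 = h.T @ d_out
d_h = (d_out @ W2.T) * h * (1 - h)
grad_W1 = X.T @ d_h
lr = 0.1
W1 -= lr * grad_W1
W2 -= lr * grad_W2

_, pred2 = forward(X, W1, W2)
loss_after = mse(pred2, y)
```

Everything a deeper network adds is more of the same: more weight matrices, more multiplications, and the same cost-and-optimize loop.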
Deep Learning: concepts and use cases, October 2018 (Julien Simon)
An introduction to Deep Learning theory
Neurons & Neural Networks
The Training Process
Backpropagation
Optimizers
Common network architectures and use cases
Convolutional Neural Networks
Recurrent Neural Networks
Long Short Term Memory Networks
Generative Adversarial Networks
Getting started
The Frontier of Deep Learning in 2020 and Beyond (NUS-ISS)
This talk will be a summary of the recent advances in deep learning research, current trends in the industry, and the opportunities that lie ahead.
We will discuss topics in research such as:
Transformers, GPT-3, BERT
Neural Architecture Search, Evolutionary Search
Distillation, self-learning
NeRF
Self-Attention
Also shifting industry trends such as:
The move to free data
Rising importance of 3D vision
Using synthetic data (Sim2Real)
Mobile vision & Federated Learning
Artificial Intelligence, Machine Learning, Deep Learning
The 5 myths of AI
Deep Learning in action
Basics of Deep Learning
NVIDIA Volta V100 and AWS P3
Artificial Intelligence (AI) Interview Questions and Answers (Edureka!)
(** Machine Learning Engineer Masters Program: https://www.edureka.co/masters-progra... **)
This PPT on Artificial Intelligence Interview Questions covers all the important concepts involved in the field of AI. This PPT is ideal for both beginners as well as professionals who want to learn or brush up their knowledge on AI concepts. Below are the topics covered in this tutorial:
1. Artificial Intelligence Basic Level Interview Questions
2. Artificial Intelligence Intermediate Level Interview Questions
3. Artificial Intelligence Scenario-based Interview Questions
Check out the entire Machine Learning Playlist: https://bit.ly/2NG9tK4
Neural Networks, Spark MLlib, Deep Learning (Asim Jalis)
What are neural networks? How to use the neural networks algorithm in Apache Spark MLlib? What is Deep Learning? Presented at Data Science Meetup at Galvanize on 2/17/2016.
For code see IPython/Jupyter/Toree notebook at http://nbviewer.jupyter.org/gist/asimjalis/4f911882a1ab963859ce
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... (pchutichetpong)
M Capital Group ("MCG") expects demand and the evolution of supply to be shaped by institutional investment rotating out of offices and into work from home ("WFH"), while the need for data storage keeps expanding as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to expect strong annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Quantitative Data Analysis: Reliability Analysis (Cronbach Alpha), Common Method... (2023240532)
Quantitative data Analysis
Overview
Reliability Analysis (Cronbach Alpha)
Common Method Bias (Harman Single Factor Test)
Frequency Analysis (Demographic)
Descriptive Analysis
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup and is sponsored by Zilliz, maintainers of Milvus.
Adjusting OpenMP PageRank: SHORT REPORT / NOTES (Subhajit Sahu)
For massive graphs that fit in RAM but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments were conducted implementing PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads); the hybrid approach runs certain primitives (i.e., sumAt, multiply) in sequential mode.
Why is Deep Learning hot right now? And how can we apply it in our day-to-day jobs?
1. Why is Deep Learning hot right now, and how can we apply it in our day-to-day jobs?
ISSAM A. AL-ZINATI
OUTREACH & TECHNICAL ADVISOR
UCAS TECHNOLOGY INCUBATOR
ISSAM A. AL-ZINATI - UCASTI 1
2. What is Deep Learning?
It is a neural network.
3. What is Deep Learning?
It is a neural network. A neuron can run a small, specific mathematical task.
4. What is Deep Learning?
It is a neural network. A neuron can run a small, specific mathematical task; an edge connects neurons and holds the weights that adjust its inputs.
5. What is Deep Learning?
It is a neural network with more layers.
6. What is Deep Learning?
It is a neural network with more layers and more neurons.
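The neuron and edge described above can be sketched in a few lines, with illustrative numbers: a weighted sum over the incoming edges followed by an activation.

```python
import numpy as np

# One neuron as on the slides: edges carry weights that adjust the
# inputs, and the neuron runs one small mathematical task
# (a weighted sum followed by an activation). All values are illustrative.
inputs = np.array([0.5, -1.0, 2.0])   # signals arriving on incoming edges
weights = np.array([0.8, 0.2, 0.4])   # weights held by those edges
bias = 0.1

z = np.dot(inputs, weights) + bias    # weighted sum: 0.4 - 0.2 + 0.8 + 0.1
activation = max(0.0, z)              # ReLU activation: the neuron's output
```

A full layer is just many such neurons sharing the same inputs, which is why layers collapse into the matrix multiplications mentioned earlier.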
9. Why Now – Scale
Data: performance grows with data size (small, medium, large). The more data you feed the model, the better the results you get.
11. Why Now – Scale
Model size & GPUs: performance grows with model size (small, medium, large). A bigger model can achieve better results, and GPUs help train those models much faster (20x!).
12. Why Now – vs Others
What about other kinds of machine learning algorithms (e.g. SVM, decision trees, boosting)? Would they do better if they got more data and compute power?
13. Why Now – vs Others
Performance of neural networks vs. other algorithms, based on model size and data amount (small, medium, and large data; others vs. small, medium, and large NNs).
14. Why Now – End-To-End
The usual machine learning approach contains a pipeline of stages responsible for feature extraction. Each stage passes along a set of engineered features that help the model better understand the case it works on. This approach is complex and prone to errors.
15. Why Now – End-To-End
Speech recognition pipeline: data (audio) -> audio features -> phonemes -> language model -> transcript.
16. Why Now – End-To-End
Speech recognition with DL: data (audio) -> a single network covering audio features, phonemes, and the language model -> transcript.
17. Why Now – End-To-End
Speech recognition with DL: the single learned network that replaces the hand-built stages is "The Magic".
18. How it works – The Magic
19. How it works – No Magic
A deep neural network is not magic, but it is very good at finding patterns.
"The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones. If we draw a graph showing how these concepts are built on top of each other, the graph is deep, with many layers. For this reason, we call this approach to AI deep learning." (Ian Goodfellow)
Deep Learning is hierarchical feature learning.
20. Deep Learning Models
- General model: FC (fully connected)
- Sequence model: RNN, LSTM
- Image model: CNN
- Other models: unsupervised, RL
21. Deep Learning Models
- General model: FC
- Sequence model: RNN, LSTM
- Image model: CNN
- Other models: unsupervised, RL (a hot research topic)
22. Advanced Deep Learning Models – VGGNet / ResNet
VGGNet achieved 7.3% top-5 error on the ImageNet-2014 classification challenge (second place in classification, first in localization). It used 120 million parameters.
23. Advanced Deep Learning Models – Google Inception V3
Achieved 5.64% top-5 error on the ImageNet-2015 classification challenge, coming in second place.
24. Advanced Deep Learning Models – Google Inception V3
Based on the ConvNet concept with the addition of the inception module, using a network with a computational cost of 5 billion multiply-adds per inference and fewer than 25 million parameters.
25. Deep Learning Applications – Deep Voice
Baidu Research presents Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks. (The slide compares a ground-truth recording with the generated voice.)
26. Deep Learning Applications – Image Captioning
A multimodal recurrent neural architecture generates sentence descriptions from images, e.g. "man in black shirt is playing guitar." and "two young girls are playing with lego toy."
27. Deep Learning Applications – Generating Videos
This approach uses an adversarial network to 1) generate videos and 2) generate videos conditioned on static images.
30. Applying Deep Learning – Bias/Variance
The goal is to build a model that is close to human-level performance.
31. Applying Deep Learning – Bias/Variance
Split the data: training set 70%, validation set 15%, test set 15%.
32. Applying Deep Learning – Bias/Variance
You need to know the following values: 1) human-level error, 2) training error, 3) validation error.
33. Applying Deep Learning – Bias/Variance
Human-level error: 1%. Training error: 5%. Validation error: 6%.
34. Applying Deep Learning – Bias/Variance
Human-level 1%, training 5%, validation 6%: high bias / underfitting.
35. Applying Deep Learning – Bias/Variance
High bias / underfitting. Remedies: 1) bigger model, 2) train longer, 3) new model architecture.
36. Applying Deep Learning – Bias/Variance
Human-level error: 1%. Training error: 2%. Validation error: 6%.
37. Applying Deep Learning – Bias/Variance
Human-level 1%, training 2%, validation 6%: high variance / overfitting.
38. Applying Deep Learning – Bias/Variance
High variance / overfitting. Remedies: 1) more data, 2) early stopping, 3) regularization, 4) new model architecture.
39. Applying Deep Learning – Bias/Variance
Human-level error: 1%. Training error: 5%. Validation error: 10%.
40. Applying Deep Learning – Bias/Variance
Human-level 1%, training 5%, validation 10%: both high bias (underfitting) and high variance (overfitting).
41. Applying Deep Learning – Bias/Variance
High bias / underfitting remedies: 1) bigger model, 2) train longer, 3) new model architecture. High variance / overfitting remedies: 1) more data, 2) early stopping, 3) regularization, 4) new model architecture.
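The error-comparison logic on slides 30-41 can be written as a small diagnostic function. The 2% gap threshold is an illustrative assumption; the slides judge the gaps visually.

```python
# Diagnose bias/variance from three error rates, as on the slides:
# a large human-to-training gap means high bias (underfitting), and a
# large training-to-validation gap means high variance (overfitting).
def diagnose(human_err, train_err, val_err, gap=0.02):
    problems = []
    if train_err - human_err > gap:
        problems.append("high bias / underfitting: try a bigger model, "
                        "training longer, or a new architecture")
    if val_err - train_err > gap:
        problems.append("high variance / overfitting: try more data, early "
                        "stopping, regularization, or a new architecture")
    return problems or ["close to human-level performance"]

# The three scenarios from the slides:
print(diagnose(0.01, 0.05, 0.06))  # bias only
print(diagnose(0.01, 0.02, 0.06))  # variance only
print(diagnose(0.01, 0.05, 0.10))  # both
```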
42. Applying Deep Learning – Data Synthesis
Usually, to overcome a bias/variance problem, we create a new set of hand-engineered features and retrain the model to see if we get a more accurate one. In deep learning, having more data can be a great solution in many scenarios, but that data is not always ready to use. So playing with the data to create new, hand-engineered data sets can be the solution.
43. Applying Deep Learning – Data Synthesis
1) OCR
Getting more data for an OCR model is easy. We can follow these steps:
- Download background images from the internet.
- Generate text in MS Word with different fonts, sizes, colors, blur levels, …
- Combine the two, and you get millions of new images for training.
ISSAM A. AL-ZINATI - UCASTI 43
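A rough sketch of why this combination explodes into millions of examples. The rendering itself (fonts, blur, compositing) is omitted, and the parameter lists below are invented for illustration:

```python
import itertools

# Hypothetical variation axes for rendered text; real values would come
# from the word processor and the rendering pipeline.
fonts = ["Arial", "Times", "Courier", "Verdana"]
sizes = [10, 12, 14, 18, 24]
colors = ["black", "gray", "blue"]
blurs = [0.0, 0.5, 1.0]
backgrounds = list(range(1000))   # e.g. 1,000 images scraped from the web
words = list(range(100))          # e.g. 100 distinct words to render

variants = list(itertools.product(fonts, sizes, colors, blurs))
total = len(variants) * len(backgrounds) * len(words)
print(f"{len(variants)} render variants x {len(backgrounds)} backgrounds "
      f"x {len(words)} words = {total:,} synthetic training images")
```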
44. Applying Deep Learning – Data Synthesis
2) Speech Recognition
- Collect a set of clean audio files.
- Collect random background sounds.
- Mix the two, and you get millions of new audio files for training.
ISSAM A. AL-ZINATI - UCASTI 44
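The mixing step can be sketched with NumPy. Here both signals are random arrays standing in for decoded audio, and the noise gain is a made-up choice:

```python
import numpy as np

def mix(clean, noise, noise_gain=0.1):
    """Overlay a background-noise clip onto a clean recording.

    Tiles or truncates the noise to match the clean signal's length,
    then adds it at a reduced volume.
    """
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    return clean + noise_gain * noise

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)   # stand-in for 1 s of 16 kHz speech
noise = rng.standard_normal(4000)    # stand-in for a short background clip
mixed = mix(clean, noise)
print(mixed.shape)
```

Pairing each clean file with many different noise clips (and gains) is what multiplies the data set.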
45. Applying Deep Learning – Data Synthesis
3) NLP – Grammar Correction
- Collect a set of grammatically correct sentences.
- Randomly shuffle the words in each sentence.
- These shuffled sentences form the new data set we can feed to our model.
ISSAM A. AL-ZINATI - UCASTI 45
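The shuffling step is a one-liner. This sketch pairs each correct sentence with a scrambled copy, giving (corrupted, target) training pairs; the example sentences are invented:

```python
import random

def make_pairs(sentences, seed=42):
    """Build (shuffled, original) pairs for a grammar-correction model."""
    rng = random.Random(seed)
    pairs = []
    for sent in sentences:
        words = sent.split()
        rng.shuffle(words)
        pairs.append((" ".join(words), sent))
    return pairs

corpus = ["the cat sat on the mat", "deep learning needs lots of data"]
for corrupted, target in make_pairs(corpus):
    print(corrupted, "->", target)
```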
46. Applying Deep Learning – Data Synthesis
4) Image Recognition
- Start with a set of labeled images.
- Randomly apply new effects to those images, e.g. rotation, blur, flipping, luminosity changes, …
- These transformed images form the new data set we can feed to our model.
ISSAM A. AL-ZINATI - UCASTI 46
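The same idea in NumPy, operating on raw pixel arrays so no imaging library is needed. The specific transforms and parameter ranges are illustrative, not a recommendation:

```python
import numpy as np

def augment(image, rng):
    """Return a randomly transformed copy of a square H x W x C uint8 image."""
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1]                      # horizontal flip
    out = np.rot90(out, k=rng.integers(0, 4))   # random 90-degree rotation
    gain = rng.uniform(0.8, 1.2)                # luminosity jitter
    out = np.clip(out.astype(np.float32) * gain, 0, 255).astype(np.uint8)
    return out

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
batch = [augment(image, rng) for _ in range(8)]  # 8 new labeled examples
print(len(batch), batch[0].shape)
```

Because the label survives each transform, every augmented copy is a free labeled example.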
47. Applying Deep Learning – Data Synthesis
Data synthesis has its limits; it does not always work, but it is a good place to start.
ISSAM A. AL-ZINATI - UCASTI 47
48. Applying Deep Learning – Transfer Learning
Another approach to overcoming the bias/variance problem is to use:
1- A Larger Model
2- A New Model Architecture
But both approaches need a more powerful machine to do the training.
Also, sometimes you don't have enough resources to train from scratch, i.e. data and GPUs.
ISSAM A. AL-ZINATI - UCASTI 48
49. Applying Deep Learning – Transfer Learning
Imagine if we could run Google Inception V3 as our model for image classification. Wouldn't that be great!
Transfer learning allows us to use these popular models by replacing the last fully
connected layer (the 1,000-class classifier) with our own classifier. The remaining
layers then act as a feature extractor.
ISSAM A. AL-ZINATI - UCASTI 49
50. Applying Deep Learning – Transfer Learning
1) Fixed Feature Extractor
◦ Import one of the well-known models with its weights.
◦ Replace the last layer with a custom classifier. It could be another fully connected NN or
another ML model such as an SVM.
◦ Train only the new classifier, on the features that the frozen network already
extracts.
ISSAM A. AL-ZINATI - UCASTI 50
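A toy NumPy sketch of the idea: the "pretrained network" is stood in for by a frozen random projection, and only a small logistic-regression head is trained on top of its features. Nothing here is a real pretrained model; the point is which parameters stay frozen and which are learned:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained network: a frozen projection to feature space.
W_frozen = rng.standard_normal((20, 8))
def extract_features(x):
    return np.tanh(x @ W_frozen)        # never updated during training

# Toy binary classification data.
X = rng.standard_normal((200, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Only the new head (w, b) is trained, on the frozen features.
feats = extract_features(X)
w, b = np.zeros(8), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(feats @ w + b)))      # sigmoid
    grad = p - y                                 # dLoss/dlogits
    w -= 0.1 * feats.T @ grad / len(y)
    b -= 0.1 * grad.mean()

acc = (((1 / (1 + np.exp(-(feats @ w + b)))) > 0.5) == y).mean()
print(f"train accuracy of the new head: {acc:.2f}")
```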
51. Applying Deep Learning – Transfer Learning
2) Fine-Tuning
◦ Import one of the well-known models with its weights.
◦ Replace the last layer with a custom classifier. It could be another fully connected NN or
another ML model such as an SVM.
◦ Fine-tune the weights of the pretrained network by continuing backpropagation;
this trains your new classifier at the same time.
ISSAM A. AL-ZINATI - UCASTI 51
52. Applying Deep Learning – Transfer Learning
3) Retraining
◦ Import one of the well-known models with its weights.
◦ Replace the last layer with a custom classifier. It could be another fully connected NN or
another ML model such as an SVM.
◦ Retrain the whole model.
ISSAM A. AL-ZINATI - UCASTI 52
53. Applying Deep Learning – Transfer Learning
Best practices
1- New dataset is small and similar to the original dataset: use the first approach.
2- New dataset is large and similar to the original dataset: use the second
approach.
3- New dataset is small but very different from the original dataset: use the
first approach, but train the classifier on activations from earlier in the network.
4- New dataset is large and very different from the original dataset: use the
third approach.
ISSAM A. AL-ZINATI - UCASTI 53
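These four rules fit in a tiny lookup. The function name and return strings are invented for illustration:

```python
def pick_transfer_strategy(dataset_is_large, similar_to_original):
    """Map (dataset size, domain similarity) to a transfer-learning approach."""
    if similar_to_original:
        return ("2) fine-tuning" if dataset_is_large
                else "1) fixed feature extractor")
    if dataset_is_large:
        return "3) retraining the whole model"
    return "1) fixed feature extractor, on activations from earlier layers"

print(pick_transfer_strategy(dataset_is_large=False, similar_to_original=True))
```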
54. Applying Deep Learning – Use GPU/Cloud
The last point we should consider is using GPUs in the cloud.
There are two well-known providers that let you configure a machine with a GPU at a
good price:
1- Amazon AWS: using its P2 instances, you can train and run your model for under $1 per hour.
Another advantage of AWS is the ability to use preconfigured images that have everything
installed and set up for you.
2- Microsoft Azure: using its NC instances, you can train and run your model for about $1.05 per
hour.
ISSAM A. AL-ZINATI - UCASTI 54