Deep Learning: Application & Opportunity

DEEP LEARNING: APPLICATION AND OPPORTUNITY
Speaker: Tarun Sukhani
iTrain Malaysia www.itrain.com.my

AGENDA
• History & Introduction
• Available Algorithms and Tools
• Outlook
• Supervised Learning
• Convolutional Neural Network
• Sequence Modelling: RNN and its extensions
• Unsupervised Learning
• Autoencoder
• Stacked Denoising Autoencoder
• Reinforcement Learning
• Deep Reinforcement Learning
• Applications

HISTORY – IN THE BEGINNING, THERE
WAS…

HISTORICAL TRENDS: GROWING
DATASETS

HISTORY – THE BIG BANG – ONE NET TO
RULE THEM ALL

INTEREST – GOOGLE NGRAM AND TRENDS

HYPE OR REALITY? - SOME QUOTES

HYPE OR REALITY? – DEEP LEARNING AT
GOOGLE

WHAT IS ARTIFICIAL INTELLIGENCE?

MACHINE LEARNING PROBLEM TYPES

LEARNING MULTIPLE COMPONENTS
Figure 1.5

INTRODUCTION
• Traditional pattern recognition models use hand-crafted features and relatively simple trainable classifiers.
• This approach has the following limitations:
• It is very tedious and costly to develop hand-crafted features.
• These hand-crafted features are usually highly dependent on one application and cannot be transferred easily to other applications, i.e. they are not generalizable.
Hand-crafted Feature Extractor → “Simple” Trainable Classifier → Output

DEEP LEARNING – NO MORE FEATURE
ENGINEERING

DEEP LEARNING – ARCHITECTURE

DEEP LEARNING
• Deep learning (a.k.a. representation learning)
seeks to learn rich hierarchical representations (i.e.
features) automatically through multiple stages of
a feature learning process.
Low-level features → Mid-level features → High-level features → Trainable classifier → Output
Feature visualization of convolutional net trained on ImageNet
(Zeiler and Fergus, 2013)

LEARNING HIERARCHICAL
REPRESENTATIONS
• Hierarchy of representations with increasing levels
of abstraction. Each stage is a kind of trainable
nonlinear feature transform.
• Image recognition
• Pixel → edge → shape → part → object
• Text
• Character → word → word group → clause → sentence
→ story
• Speech
• Sample → spectral band → sound→ …→ phoneme →
Low-level features → Mid-level features → High-level features → Trainable classifier → Output
Increasing levels of abstraction

THE MAMMALIAN VISUAL CORTEX IS
HIERARCHICAL
• It is good to be inspired by
nature, but not too much.
• We need to understand
which details are
important, and which
details are merely the
result of evolution.
• Each module in Deep
Learning transforms its
input representation into a
higher-level one, in a way
similar to the mammalian
visual cortex.
(van Essen and Gallant, 1994)

ARTIFICIAL NEURAL NETWORKS (ANN’S)

LEARNING HIERARCHICAL
REPRESENTATIONS

LEARNING HOW TO LEARN
Deep Learning addresses the problem of learning hierarchical
representations with a single algorithm

NON-LINEAR ACTIVATION FUNCTION

IMPROVING THE NET – ERROR
CORRECTION

AVAILABLE DEEP LEARNING ALGORITHMS:
AUTOENCODERS

CONVOLUTIONAL NEURAL NETS (CNN‘S)

RECURRENT NEURAL NETS (RNN’S)

LONG SHORT-TERM MEMORY RNN (LSTM’S)

DEEP LEARNING APPLICATIONS:
USING RNN’S TO GENERATE TEXT

IMAGE CAPTIONING – COMBINING CNN’S
AND RNN’S

USAGE REQUIREMENTS

AVAILABLE TOOLS – ALL OPEN SOURCE!

COMPUTING RESOURCES

OUTLOOK

SUPERVISED LEARNING
• Convolutional Neural Network
• Sequence Modelling
• Why do we need RNN?
• What are RNNs?
• RNN Extensions
• What can RNNs do?

CONVOLUTIONAL NEURAL NETWORK
• Input can have very high dimension. Using a fully-connected neural network would need a large number of parameters.
• Inspired by the neurophysiological experiments conducted by [Hubel & Wiesel 1962], CNNs are a special type of neural network whose hidden units are only connected to a local receptive field. The number of parameters needed by CNNs is much smaller.
Example: 200x200 image
a) fully connected: 40,000 hidden units => 1.6 billion parameters
b) CNN: 5x5 kernel, 100 feature maps => 2,500 parameters
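The two parameter counts in the example can be checked with a few lines of arithmetic. This is a minimal sketch that assumes a single input channel and ignores bias terms; the variable names are illustrative only.

```python
# Parameter counts for the 200x200 example.
# Assumptions: one input channel, no bias terms.
image_h, image_w = 200, 200
hidden_units = 40_000

# Fully connected: every hidden unit has one weight per input pixel.
fully_connected_params = image_h * image_w * hidden_units

# CNN: each feature map shares one 5x5 kernel across the whole image.
kernel_h, kernel_w = 5, 5
feature_maps = 100
cnn_params = kernel_h * kernel_w * feature_maps

print(fully_connected_params)  # 1600000000
print(cnn_params)              # 2500
```

The gap comes from weight sharing: the kernel size, not the image size, determines the number of weights per feature map.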

THREE STAGES OF A
CONVOLUTIONAL LAYER
1. Convolution stage
2. Nonlinearity: a nonlinear transform such as rectified linear or tanh
3. Pooling: outputs a summary statistic of the local input, such as max pooling or average pooling

CONVOLUTION OPERATION IN CNN
• Input: an image (2-D array) x
• Convolution kernel/operator (2-D array of learnable parameters): w
• Feature map (2-D array of processed data): s
• Convolution operation in 2-D domains:
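The formula shown on the slide is not preserved in this text. One common definition in terms of x, w and s is s[i, j] = Σ_m Σ_n x[i + m, j + n] · w[m, n], which is the cross-correlation form (no kernel flipping) that many deep learning libraries implement under the name "convolution". The sketch below assumes that form, with no padding and a stride of 1; the function name `conv2d_valid` is hypothetical.

```python
def conv2d_valid(x, w):
    """Slide kernel w over image x (no padding, stride 1) and return feature map s."""
    kh, kw = len(w), len(w[0])
    out_h = len(x) - kh + 1
    out_w = len(x[0]) - kw + 1
    s = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the kh x kw patch whose top-left corner is (i, j).
            s[i][j] = sum(x[i + m][j + n] * w[m][n]
                          for m in range(kh) for n in range(kw))
    return s

x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
w = [[1, 0],
     [0, 1]]
s = conv2d_valid(x, w)
print(s)  # [[6, 8], [12, 14]]
```

Each output value depends only on a small patch of the input, which is the local receptive field described earlier.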

MULTIPLE CONVOLUTIONS
Usually there are multiple feature
maps, one for each convolution
operator.

NON-LINEARITY
Tanh: $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
ReLU: $f(x) = \max(0, x)$
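Both nonlinearities are one-liners in code. The sketch below writes tanh out from its exponential definition so it can be compared with the standard library's `math.tanh`; the function names are illustrative.

```python
import math

def tanh(x):
    """tanh written out from its definition in terms of exponentials."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def relu(x):
    """Rectified linear unit: pass positive values through, clamp negatives to 0."""
    return max(0.0, x)

print(relu(-2.0), relu(3.0))                     # 0.0 3.0
print(abs(tanh(0.5) - math.tanh(0.5)) < 1e-12)   # True
```

tanh squashes its input into (-1, 1), while ReLU is unbounded above and exactly zero for negative inputs.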

POOLING
• Common pooling operations:
• Max pooling: reports the maximum output within a
rectangular neighborhood.
• Average pooling: reports the average output of a
rectangular neighborhood (possibly weighted by
the distance from the central pixel).
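The two pooling operations can be sketched with one helper that applies a reduction to each window. This version assumes non-overlapping square windows and an unweighted average; `pool2d` and `mean` are hypothetical names.

```python
def pool2d(feature_map, size, reduce_fn):
    """Apply reduce_fn to each non-overlapping size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, rows - size + 1, size):
        row = []
        for j in range(0, cols - size + 1, size):
            window = [feature_map[i + m][j + n]
                      for m in range(size) for n in range(size)]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled

def mean(values):
    return sum(values) / len(values)

fm = [[1, 3, 2, 0],
      [4, 2, 1, 1],
      [0, 1, 5, 6],
      [2, 3, 7, 8]]
max_pooled = pool2d(fm, 2, max)    # [[4, 2], [3, 8]]
avg_pooled = pool2d(fm, 2, mean)   # [[2.5, 1.0], [1.5, 6.5]]
```

Either way, a 4x4 feature map is summarized by a 2x2 one, so later layers see a smaller input.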

DEEP CNN: WINNER
OF IMAGENET 2012
• Multiple feature maps per convolutional layer.
• Multiple convolutional layers for extracting features at different
levels.
• Higher-level layers take the feature maps in lower-level layers as
input.
(Krizhevsky et al., 2012)

SEQUENCE MODELLING
• Why do we need RNN?
• What are RNNs?
• RNN Extensions
• What can RNNs do?

WHY DO WE NEED RNNS?
Limitations of standard neural networks (e.g. CNNs):
• Rely on the assumption of independence among the (training and test) examples.
• After each data point is processed, the entire state of the network is lost.
• Rely on examples being vectors of fixed length.
We need to model data with temporal or sequential structure and varying lengths of inputs and outputs:
• Frames from video
• Snippets of audio
• Words pulled from sentences

WHAT ARE RNNS?
Recurrent neural networks (RNNs) are connectionist models with the ability to selectively pass information across sequence steps, while processing sequential data one element at a time.
The simplest form of fully recurrent neural network is an MLP with the previous set of hidden unit activations feeding back into the network along with the inputs:
$h(t) = f_H\big(W_{IH}\, x(t) + W_{HH}\, h(t-1)\big)$
$y(t) = f_O\big(W_{HO}\, h(t)\big)$
$f_H$ and $f_O$ are the activation functions for the hidden and output units; $W_{IH}$, $W_{HH}$, and $W_{HO}$ are connection weight matrices which are learnt by training.
This allows a ‘memory’ of previous inputs to persist in the network’s internal state, and thereby influence the network output.
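The recurrence for the simplest fully recurrent network can be sketched in plain Python. This is a minimal illustration that assumes tanh for the hidden activation and the identity for the output activation; the weight values and the helper names `matvec` and `rnn_forward` are made up for the example.

```python
import math

def matvec(W, v):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_forward(xs, W_IH, W_HH, W_HO, h0):
    """Process the sequence xs one element at a time, carrying the hidden state h."""
    h = h0
    ys = []
    for x in xs:
        pre = [a + b for a, b in zip(matvec(W_IH, x), matvec(W_HH, h))]
        h = [math.tanh(p) for p in pre]   # hidden activation (tanh assumed)
        ys.append(matvec(W_HO, h))        # output activation (identity assumed)
    return ys, h

W_IH = [[0.5, -0.3], [0.1, 0.8]]   # input -> hidden
W_HH = [[0.2, 0.0], [0.0, 0.2]]    # hidden -> hidden (the recurrent weights)
W_HO = [[1.0, -1.0]]               # hidden -> output
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys, h_final = rnn_forward(xs, W_IH, W_HH, W_HO, h0=[0.0, 0.0])

# The same input [1.0, 1.0] seen without any history gives a different output.
ys_alone, _ = rnn_forward([[1.0, 1.0]], W_IH, W_HH, W_HO, h0=[0.0, 0.0])
```

The same three weight matrices are reused at every step, and the output at the last step differs from `ys_alone` only because of the hidden state carried over from the earlier inputs.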

WHAT ARE RNNS?
• The recurrent network can be converted into a feed-forward network by unfolding over time.
Figure: An unfolded recurrent network. Each node represents a layer of network units at a single time step. The weighted connections from the input layer to hidden layer are labelled ‘w1’, those from the hidden layer to itself (i.e. the recurrent weights) are labelled ‘w2’ and the hidden to output weights are labelled ‘w3’. Note that the same weights are reused at every time step. Bias weights are omitted for clarity.

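RNN EXTENSIONS: BIDIRECTIONAL RECURRENT NEURAL NETWORKS
Traditional RNNs only model the dependence of the current state on the previous state. BRNNs (Schuster and Paliwal, 1997) extend this to model dependence on both past states and future states.
For example, to predict a missing word in a sequence you want to look at both the left and the right context.
$\overrightarrow{h}_t = f\big(W_{x\overrightarrow{h}}\, x_t + W_{\overrightarrow{h}\overrightarrow{h}}\, \overrightarrow{h}_{t-1} + b_{\overrightarrow{h}}\big)$
$\overleftarrow{h}_t = f\big(W_{x\overleftarrow{h}}\, x_t + W_{\overleftarrow{h}\overleftarrow{h}}\, \overleftarrow{h}_{t+1} + b_{\overleftarrow{h}}\big)$
$y_t = W_{\overrightarrow{h}y}\, \overrightarrow{h}_t + W_{\overleftarrow{h}y}\, \overleftarrow{h}_t + b_y$
Figure: An unfolded BRNN. The training sequence is presented forwards and backwards to two separate recurrent hidden layers; past and future context determines the output.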
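A bidirectional layer can be sketched as two independent recurrent passes, one over the sequence as given and one over the reversed sequence, whose hidden states are combined at each time step. The sketch assumes tanh as the hidden activation and zero initial states; the parameter values and the names `run_layer` and `brnn_forward` are hypothetical.

```python
import math

def matvec(W, v):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def run_layer(xs, W_xh, W_hh, b_h):
    """Hidden states of one recurrent layer, visiting xs in the order given."""
    h = [0.0] * len(b_h)
    states = []
    for x in xs:
        pre = [a + b + c for a, b, c in zip(matvec(W_xh, x), matvec(W_hh, h), b_h)]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states

def brnn_forward(xs, fwd_params, bwd_params, W_fy, W_by, b_y):
    h_fwd = run_layer(xs, *fwd_params)               # past context
    h_bwd = run_layer(xs[::-1], *bwd_params)[::-1]   # future context, re-aligned to time order
    return [
        [a + b + c for a, b, c in zip(matvec(W_fy, hf), matvec(W_by, hb), b_y)]
        for hf, hb in zip(h_fwd, h_bwd)
    ]

# Toy parameters (assumed): one input feature, one hidden unit per direction, one output.
params = ([[1.0]], [[0.5]], [0.0])
seq_a = [[1.0], [0.0], [0.0]]
seq_b = [[1.0], [0.0], [2.0]]   # differs from seq_a only at the last step
ys_a = brnn_forward(seq_a, params, params, [[1.0]], [[1.0]], [0.0])
ys_b = brnn_forward(seq_b, params, params, [[1.0]], [[1.0]], [0.0])
```

Here `ys_a[0]` and `ys_b[0]` differ even though the two sequences share the same first element, because the backward layer carries information from the last step back to the first. A forward-only layer would produce identical first outputs.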

RNN EXTENSIONS: LONG SHORT-TERM MEMORY
The vanishing gradient problem prevents standard RNNs from learning long-term dependencies. LSTMs (Hochreiter and Schmidhuber, 1997) were designed to combat vanishing gradients through a gating mechanism.
Figure: The gating mechanism of the LSTM, which generates the current hidden state from the past hidden state and the current input. It contains five modules: input gate, new memory cell, forget gate, final memory generation, and output gate.
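The five modules can be written out as one LSTM time step. The slides do not give the gate equations, so this sketch assumes the commonly used formulation: sigmoid gates, a tanh candidate memory, c_t = f_t · c_{t-1} + i_t · c̃_t, and h_t = o_t · tanh(c_t). The parameter layout and the names `affine` and `lstm_step` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, U, b, x, h):
    """Compute W x + U h + b for list-of-rows matrices."""
    return [sum(w * xi for w, xi in zip(W_row, x)) +
            sum(u * hi for u, hi in zip(U_row, h)) + bi
            for W_row, U_row, bi in zip(W, U, b)]

def lstm_step(x, h_prev, c_prev, p):
    i = [sigmoid(z) for z in affine(*p["input_gate"], x, h_prev)]        # input gate
    f = [sigmoid(z) for z in affine(*p["forget_gate"], x, h_prev)]       # forget gate
    o = [sigmoid(z) for z in affine(*p["output_gate"], x, h_prev)]       # output gate
    c_new = [math.tanh(z) for z in affine(*p["new_memory"], x, h_prev)]  # new memory cell
    # Final memory: keep part of the old cell, add part of the candidate.
    c = [fk * ck + ik * nk for fk, ck, ik, nk in zip(f, c_prev, i, c_new)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

# Toy parameters (assumed): forget gate saturated open, input gate saturated shut.
zero = [[0.0]]
p = {
    "input_gate":  (zero, zero, [-100.0]),
    "forget_gate": (zero, zero, [100.0]),
    "output_gate": (zero, zero, [0.0]),
    "new_memory":  (zero, zero, [0.0]),
}
h, c = lstm_step([1.0], [0.0], [2.0], p)
```

With the forget gate open and the input gate shut, the cell value 2.0 passes through the step unchanged. This additive path through the cell is what lets information, and the error signal, survive across many steps.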

RNN EXTENSIONS:
LONG SHORT-TERM MEMORY
Conclusions on LSTM
LSTMs contain information outside the normal flow of the recurrent network in a gated cell. Information can be stored in, written to, or read from a cell, much like data in a computer’s memory. The cells learn when to allow data to enter, leave or be deleted through the iterative process of making guesses, back-propagating error, and adjusting weights via gradient descent.

RNN EXTENSIONS:
LONG SHORT-TERM MEMORY
Why LSTMs can combat the vanishing gradient problem
LSTMs help preserve the error that can be back-propagated through time and layers. By maintaining a more constant error, they allow recurrent nets to continue to learn over many time steps (over 1000), thereby opening a channel to link causes and effects remotely.

What can RNNs do?
• Machine Translation
• Visual Question Answering

MACHINE TRANSLATION
In machine translation, the input is a sequence of words in the source language, and the output is a sequence of words in the target language.
Encoder-decoder architecture for machine translation:
• Encoder: an RNN that encodes the input sentence into a hidden state (feature).
• Decoder: an RNN that takes the hidden state of the source-language sentence as input and outputs the translated sentence.

VISUAL QUESTION ANSWERING (VQA)
VQA: Given an image and a natural language question about the image, the task is to provide an accurate natural language answer.
Picture from (Antol et al., 2015)

VISUAL QUESTION ANSWERING
The output is conditioned on both image and textual inputs. A CNN is used to encode the image and an RNN is used to encode the sentence.

UNSUPERVISED LEARNING
• Autoencoders
• Deep Autoencoders
• Denoising Autoencoders
• Stacked Denoising Autoencoders

AUTOENCODERS
An autoencoder is a feedforward neural network that learns to reproduce its input at the output:
$y^{(i)} = x^{(i)}$
• The input-to-hidden part corresponds to an encoder.
• The hidden-to-output part corresponds to a decoder.
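The encoder/decoder split can be sketched as two layers plus a reconstruction error that training would minimize. This is a forward pass only, with sigmoid units and a mean squared error chosen for illustration; the weights are arbitrary and the function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(W, b, v):
    """One fully connected sigmoid layer."""
    return [sigmoid(sum(w * vi for w, vi in zip(row, v)) + bi)
            for row, bi in zip(W, b)]

def autoencode(x, W_enc, b_enc, W_dec, b_dec):
    code = layer(W_enc, b_enc, x)     # encoder: input -> hidden
    y = layer(W_dec, b_dec, code)     # decoder: hidden -> output
    return code, y

def reconstruction_error(x, y):
    """Mean squared difference between the input and its reconstruction."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)

# Toy example (assumed sizes): 4 inputs compressed into a 2-unit code.
x = [1.0, 0.0, 1.0, 0.0]
W_enc = [[0.5, -0.5, 0.5, -0.5], [0.1, 0.2, 0.3, 0.4]]
b_enc = [0.0, 0.0]
W_dec = [[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
b_dec = [0.0, 0.0, 0.0, 0.0]
code, y = autoencode(x, W_enc, b_enc, W_dec, b_dec)
error = reconstruction_error(x, y)
```

Training would adjust the four parameter sets to drive `error` toward zero over a dataset, with the target for each example being the example itself.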

DEEP AUTOENCODERS
• A deep autoencoder is constructed by extending the encoder and decoder of an autoencoder with multiple hidden layers.
• Vanishing gradient problem: the gradient becomes too small as it passes back through many layers.
Diagram from (Hinton and Salakhutdinov, 2006)

TRAINING DEEP AUTOENCODERS

DENOISING AUTOENCODERS
Adding stochastic noise to the input, and training the autoencoder to reconstruct the clean input, forces it to learn more robust features.
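One common way to inject the noise is masking: each input value is set to zero with a probability equal to the noise ratio, and the reconstruction is still scored against the clean input. The slides do not say which noise type was used, so masking noise is an assumption here, and `corrupt` is a hypothetical name.

```python
import random

def corrupt(x, noise_ratio, rng):
    """Masking noise: zero out each input value with probability noise_ratio."""
    return [0.0 if rng.random() < noise_ratio else xi for xi in x]

rng = random.Random(0)          # seeded so the example is repeatable
clean = [1.0] * 1000
noisy = corrupt(clean, 0.3, rng)
zeroed_fraction = noisy.count(0.0) / len(noisy)   # close to 0.3

# In training, the network sees `noisy` but is scored against `clean`.
```

A noise ratio of 0 leaves the input untouched, which reduces the model to an ordinary autoencoder.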

TRAINING DENOISING
AUTOENCODER ON MNIST
The pictures on this slide show the difference between the filters learned by a denoising autoencoder trained on MNIST with different noise ratios: no noise (noise ratio = 0%) and noise ratio = 30%.

DEEP REINFORCEMENT LEARNING
• Reinforcement Learning
• Deep Reinforcement Learning
• Applications
• Playing Atari Games
• AlphaGo

REINFORCEMENT LEARNING
What’s Reinforcement Learning?
Environment → {Observation, Reward} → Agent → {Actions} → Environment
• An agent interacts with an environment and learns by maximizing a scalar reward signal.
• No labels or any other supervision signal.
• Previously suffered from hand-crafted states or representations.

POLICIES AND VALUE FUNCTIONS
• Policy $\pi$ is a behavior function selecting actions given states:
$a = \pi(s)$
• Value function $Q^{\pi}(s, a)$ is the expected total reward $r$ from state $s$ and action $a$ under policy $\pi$:
“How good is action $a$ in state $s$?”
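The link between the two definitions can be shown with a lookup table: given estimates of Q(s, a), a greedy policy picks the action with the highest value in each state. The states, actions, and numbers below are invented for illustration.

```python
# Hypothetical Q-values for a two-state, two-action problem.
Q = {
    ("s0", "left"): 0.1, ("s0", "right"): 0.7,
    ("s1", "left"): 0.4, ("s1", "right"): 0.2,
}
ACTIONS = ["left", "right"]

def greedy_policy(state):
    """a = pi(s): choose the action with the highest estimated value in this state."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])

print(greedy_policy("s0"))  # right
print(greedy_policy("s1"))  # left
```

Here the value function answers "how good is action a in state s?" and the policy turns those answers into a choice.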

APPROACHES TO
REINFORCEMENT LEARNING
• Policy-based RL
• Search directly for the optimal policy $\pi^{*}$
• Policy achieving maximum future reward
• Value-based RL
• Estimate the optimal value function $Q^{*}(s, a)$
• Maximum value achievable under any policy
• Model-based RL
• Build a transition model of the environment
• Plan (e.g. by look-ahead) using model
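As a concrete instance of the value-based approach, the sketch below uses tabular Q-learning, a standard algorithm that the slides do not name, on a made-up five-state corridor. The environment, hyperparameters, and function names are all assumptions chosen to keep the example small.

```python
import random

N_STATES = 5         # states 0..4; reaching state 4 gives reward 1 and ends the episode
ACTIONS = (-1, +1)   # move left or move right

def step(state, action):
    """Deterministic corridor: walls at both ends, goal at the right end."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the current estimate, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
            # Move Q(s, a) toward the one-step target r + gamma * max_a' Q(s', a').
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q

Q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
```

After training, the greedy policy read off the table moves right in every non-terminal state, and the value of stepping into the goal from state 3 approaches the reward of 1.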

ABOUT THE SPEAKER
• Tarun Sukhani has 16 years of both academic and industry experience as a data scientist over the
course of his career. Starting off as an EAI consultant in the USA, Tarun was involved in a number
of integration/ETL projects for a variety of Fortune 500 and Global 1000 clients, such as BP
Amoco, Praxair, and GE Medical Systems. While completing his Master's degree in Data
Warehousing, Data Mining, and Business Intelligence at Loyola University Chicago GSB in 2005,
Tarun also worked as a BI consultant for a number of Fortune 500 clients at Revere Consulting, a
Chicago-based boutique IT firm focusing on Data Warehousing/Mining projects. Within the
industry, he worked on ETL/Data Science/Machine Learning projects at Profitera, Experian, Atex,
E-Radar, ICarAsia, and Max Money. Some of the tools employed during his career included Oracle
BI, Informatica, Cognos, Business Objects, Pentaho, Apache Mahout, Spark MLlib, Apache UIMA
and Temis. Tarun continues to work within the BI space, most recently focusing his time on
Deep/Reinforcement Learning projects within the Fintech sector.
• Tarun has also worked on parametric statistical modeling within the Data Science and Big Data Science space, using tools such as SciPy in Python, R, and R/Hadoop for Big Data projects.
• Tarun is the principal Data Science trainer for iTrain Asia Singapore.
• Tarun has his own consulting firm, Abundent, a Digital Transformation and Big Data Analytics company specializing in modernizing IT infrastructure and development practices across a wide range of industries.

We Enable Industry 4.0
Abundent is a Digital Transformation and Big Data
Analytics company specializing in modernizing IT
infrastructure and development practices across a wide
range of industries.
info@abundent.com © Abundent Sdn Bhd © Abundent LLC

OUR COMPETENCIES
• Cloud/DevOps/IoT: Cloud Computing, Containerization, Serverless Computing, Reactive Microservices, NoSQL, Agile DevOps, and IoT can help organizations to rapidly scale their solutions and avoid errors in deployment and maintenance of software.
• Data Science / Big Data: Data Science, Big Data, Data Lakes, Batch and Real-Time Data Processing, Data Products, and Interactive Data Visualization can allow organizations to make the most of the data they have.
• AI/ML/DRL: Artificial Intelligence, Machine Learning, Deep Learning and Deep Reinforcement Learning can give organizations an edge over their competition as well as streamline operations and decision making.
• Fintech / Blockchain: Fintech promises to transform the way we do banking and payments, with numerous applications in non-financial domains as well. Blockchain and its derivatives will allow organizations to bypass central authorities and expedite transactions all while providing greater security.

DATA SCIENCE & BIG DATA
Data Science as a Service
We can develop, deploy, and maintain your data products, visualizations and pipelines (Data Science as a Service)…

MACHINE LEARNING & DEEP LEARNING
Machine Learning as a Service
We can develop, deploy, and maintain your machine learning, chatbot and deep learning models…

ASIA’S TOP DIGITAL TECH TRAINING POWERHOUSE.
TRAINING YOU TO MEET THE CHALLENGES OF TOMORROW HEAD ON
iTrain enables continuous brain gain through in-demand skills trainings that help
you stay ahead in the digital economy
iTrain (M) Sdn Bhd
C-L19-8, KL Trillion, 338 Jalan Tun Razak, 50400 Kuala Lumpur
Tel: +603 21661879 Email: info@itrain.com.my Web: www.itrain.com.my

UPCOMING TRAINING
INTRODUCTION TO DEEP LEARNING WITH NVIDIA GPUS
FEBRUARY 21ST - 23RD, 2018
MAGIC CYBERJAYA
9.00 AM - 5.00 PM
Supported by MaGIC with a 50% early bird discount. Limited to 20 seats only. First come, first served.
Registration Link:
https://events.bizzabo.com/deeplearningintro
