Artificial Intelligence is transforming the world. Deep Learning, an integral part of this new Artificial Intelligence paradigm, is becoming one of the most sought-after skills. Learn more about Deep Learning and its evolution.
AILABS Lecture Series - Is AI the New Electricity? Topic: Deep Learning - Evolution and Future Trends, by Dr. Chiranjit Acharya
Lecture Series: AI is the New Electricity
Dr. Chiranjit Acharya
AILABS Academy, J-3, GP Block, Sector V, Salt Lake City, Kolkata, West Bengal 700091
Deep Learning - SCOPING, EVOLUTION & FUTURE TRENDS
Presented at AILABS Academy, Kolkata, on April 18th, 2018
A Journey into Deep Learning
▪ Cutting-edge technology
▪ Has garnered traction in both industry and academia
▪ Achieves near-human-level performance in many pattern recognition tasks
▪ Excels in
  ▪ structured, relational data
  ▪ unstructured rich-media data such as image, video, audio and text
A Journey into Deep Learning
▪What is Deep Learning? Where is the “deepness”?
▪Where does Deep Learning come from?
▪What are the models and algorithms of Deep Learning?
▪What is the trajectory of evolution of Deep Learning?
▪What are the future trends of Deep Learning?
Artificial Intelligence
Holy Grail of AI Research
▪ Understanding the neuro-biological and neuro-physical basis of human intelligence
  ▪ the science of intelligence
▪ Building intelligent machines which can think and act like humans
  ▪ the engineering of intelligence
Artificial Intelligence
Facets of AI Research
▪ knowledge representation
▪ reasoning
▪ natural language understanding
▪ natural scene understanding
▪ natural speech understanding
▪ problem solving
▪ perception
▪ learning
▪ planning
Machine Learning
Basic Doctrine of Learning
▪learning from examples
Outcome of Learning
▪rules of inference for some predictive task
▪ embodiment of the rules = model
▪ model is an abstract computing device
  • kernel machine, decision tree, neural network
Machine Learning
Connotations of Learning
▪process of generalization
▪discovering nature/traits of data
▪unraveling patterns and anti-patterns in data
▪knowing distributional characteristics of data
▪identifying causal effects and propagation
▪ identifying non-causal covariations & correlations
Machine Learning
Design Aspects of Learning System
▪ Choose the training experience
▪ Choose exactly what is to be learned, i.e. the target function / machine
▪ Choose the objective function & optimality criteria
▪ Choose a learning algorithm to infer the target function from the experience.
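As a concrete illustration (not from the original slides), these four choices map onto a few lines of scikit-learn; the dataset, model and hyperparameters below are arbitrary placeholders:

from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Choice 1: the training experience -- a labelled dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Choices 2 and 3: the target function / machine (a linear classifier) and
# the objective function with its optimality criterion (hinge loss)
clf = SGDClassifier(loss="hinge", max_iter=1000, random_state=0)

# Choice 4: the learning algorithm (stochastic gradient descent) infers the
# target function from the experience
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))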
Learning Workflow
▪ Stage 1: Feature Extraction, Feature Subset Selection, Feature Vector Representation
▪ Stage 2: Training / Testing Set Creation and Augmentation
▪ Stage 3: Training the Inference Machine
▪ Stage 4: Running the Inference Machine on the Test Set
▪ Stage 5: Stratified Sampling and Validation
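A minimal scikit-learn sketch of Stages 1-5 (the components are illustrative stand-ins; set augmentation and human review are omitted):

from sklearn.datasets import load_wine
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
# Stage 1: feature subset selection on top of the given feature vectors
pipe = Pipeline([("select", SelectKBest(f_classif, k=5)),
                 ("tree", DecisionTreeClassifier(random_state=0))])
# Stage 2: training / testing set creation
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
# Stage 3: training the inference machine
pipe.fit(X_train, y_train)
# Stage 4: running the inference machine on the test set
print(pipe.score(X_test, y_test))
# Stage 5: stratified sampling and validation
print(cross_val_score(pipe, X, y, cv=StratifiedKFold(n_splits=5)))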
Feature Extraction / Selection
[Diagram: a domain expert and a knowledge engineer define cognitive elements (low-level parts, mid-level parts, high-level parts, additional descriptors); a sparse coder converts the corpus into a sparse representation.]
Training Set Augmentation
[Diagram: a random sampler draws samples from the sparse representation; a reviewer labels them, and they are merged with the existing training set to produce the augmented training set.]
Training and Prediction / Recognition
[Diagram: an adaptive learner is trained on the training set to produce a prediction / recognition model; the model is then run on the unlabelled residual corpus to yield the predicted / recognized corpus.]
Sampling, Validation & Convergence
[Diagram: a stratified sampler draws sub-samples from the predicted corpus; a reviewer produces human-reviewed stratified sub-samples; a precision & recall calculator checks for convergence. If converged, relevance scoring ends; if not, the flow returns to training set augmentation.]
Evolution of Connectionist Models
1943: Artificial neuron model (McCulloch & Pitts)
▪ "A logical calculus of the ideas immanent in nervous activity"
▪ simple artificial “neurons” could be made to perform basic logical operations such as AND, OR and NOT
▪ known as Linear Threshold Gate
▪ NO learning
The neuron computes a weighted sum of its inputs $x_1, \dots, x_n$ with weights $w_{1j}, \dots, w_{nj}$ and a bias $b_j$, then applies a transfer function $f$:
$$s_j = \sum_{i=1}^{n} w_{ij} x_i + b_j, \qquad y_j = f(s_j)$$
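A minimal Python sketch of a linear threshold gate; the weights and thresholds below are one of many choices that realize AND, OR and NOT:

def ltg(inputs, weights, threshold):
    # McCulloch-Pitts style unit: fire (output 1) iff the weighted sum reaches the threshold
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= threshold else 0

AND = lambda a, b: ltg([a, b], [1, 1], threshold=2)
OR  = lambda a, b: ltg([a, b], [1, 1], threshold=1)
NOT = lambda a:    ltg([a], [-1], threshold=0)

assert AND(1, 1) == 1 and AND(1, 0) == 0
assert OR(0, 1) == 1 and OR(0, 0) == 0
assert NOT(0) == 1 and NOT(1) == 0

As the slide notes, the weights are fixed by hand: there is no learning.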
Evolution of Connectionist Models
1957: Perceptron model (Rosenblatt)
▪ invention of learning rules inspired by ideas from neuroscience
  if $\sum_i \text{input}_i \cdot \text{weight}_i > \text{threshold}$, output $= +1$
  if $\sum_i \text{input}_i \cdot \text{weight}_i < \text{threshold}$, output $= -1$
▪ learns to classify input into two output classes
▪ Sigmoid transfer function: boundedness, graduality
  $y \to 0$ as $x \to -\infty$; $y \to 1$ as $x \to +\infty$
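A minimal NumPy sketch of the perceptron rule above; the AND-style toy data, epoch count and learning rate are illustrative:

import numpy as np

def perceptron_train(X, y, epochs=20, lr=1.0):
    # y in {-1, +1}; the threshold (bias) is folded in as a constant extra input
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            if yi * (xi @ w) <= 0:    # wrong side of (or on) the decision boundary
                w += lr * yi * xi     # nudge the boundary toward xi
    return w

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])         # linearly separable (AND-like) labels
w = perceptron_train(X, y)
print(np.sign(np.hstack([X, np.ones((4, 1))]) @ w))   # recovers y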
With a sigmoid transfer function, the same neuron model becomes:
$$y_j = f(s_j) = \frac{1}{1 + e^{-s_j}}, \qquad s_j = \sum_{i=1}^{n} w_{ij} x_i + b_j$$
Evolution of Connectionist Models
1960s: Delta Learning Rule (Widrow & Hoff)
▪ Define the error as the squared residuals summed over all training cases:
▪ Now differentiate to get the error derivatives for the weights
▪ The batch delta rule changes the weights in proportion to their error derivatives summed over all training cases
$$E = \frac{1}{2} \sum_{n} \left( y_n - \hat{y}_n \right)^2$$
$$\frac{\partial E}{\partial w_i} = \sum_{n} \frac{\partial \hat{y}_n}{\partial w_i} \, \frac{\partial E}{\partial \hat{y}_n} = -\sum_{n} x_{i,n} \left( y_n - \hat{y}_n \right)$$
$$\Delta w_i = -\varepsilon \frac{\partial E}{\partial w_i} = \varepsilon \sum_{n} x_{i,n} \left( y_n - \hat{y}_n \right)$$
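A minimal NumPy sketch of the batch delta rule for a linear unit; the toy targets, learning rate and epoch count are illustrative:

import numpy as np

def delta_rule(X, y, epochs=200, lr=0.1):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        y_hat = X @ w                 # predictions for all training cases
        grad = -X.T @ (y - y_hat)     # dE/dw, summed over all training cases
        w -= lr * grad                # change weights in proportion to -dE/dw
    return w

X = np.array([[1, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
y = X @ np.array([2.0, 3.0])          # targets generated by true weights (2, 3)
print(delta_rule(X, y))               # converges toward [2. 3.]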
Evolution of Connectionist Models
1969: Minsky's objection to Perceptrons
▪ Marvin Minsky & Seymour Papert: Perceptrons
▪ Unless input categories are linearly separable, a perceptron cannot learn to discriminate between them.
▪ Unfortunately, it appeared that many important categories were not linearly separable.
Perceptrons are good at linear classification but ...
[Figure: labelled points in the $(x_1, x_2)$ plane illustrating a classification problem that is not linearly separable.]
Universal Approximation Theorem
Existential Version (Kolmogorov)
▪ There exists a finite combination of superposition and addition of continuous functions of single variables which can approximate any continuous, multivariate function on compact subsets of $\mathbb{R}^d$.
Constructive Version (Cybenko)
▪ The standard multilayer feed-forward network with a single hidden layer, containing a finite number of hidden neurons, is a universal approximator among continuous functions on compact subsets of $\mathbb{R}^d$, under mild assumptions on the activation function.
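In symbols, Cybenko's construction (stated here for concreteness, with $\sigma$ a sigmoidal activation and $K \subset \mathbb{R}^d$ compact): for any continuous $f$ on $K$ and any $\varepsilon > 0$ there exist $N$, $\alpha_i$, $w_i$ and $b_i$ such that

$$G(x) = \sum_{i=1}^{N} \alpha_i \, \sigma\!\left( w_i^{\top} x + b_i \right), \qquad \sup_{x \in K} \left| G(x) - f(x) \right| < \varepsilon .$$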
Evolution of Connectionist Models
1986: Backpropagation for Multi-Layer Perceptrons (Rumelhart, Hinton & Williams)
▪ solution to Minsky's objection regarding the perceptron's limitation
▪ nonlinear classification is achieved by fully connected, multilayer, feedforward networks of perceptrons (MLP)
▪ an MLP can be trained by backpropagation
▪ Two-pass algorithm
  ▪ forward propagation of activation signals from input to output
  ▪ backward propagation of error derivatives from output to input
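A minimal NumPy sketch of the two-pass algorithm on the XOR problem, which defeats a single perceptron; the network size, random seed, learning rate and iteration count are all illustrative choices:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets: not linearly separable
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)     # one hidden layer of 4 sigmoid units
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 1.0
for _ in range(5000):
    # forward pass: activation signals propagate from input to output
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: error derivatives propagate from output to input
    d_out = (out - y) * out * (1 - out)           # squared-error gradient at the output
    d_h = (d_out @ W2.T) * h * (1 - h)            # chain rule through the hidden layer
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)
print(out.round(2))                                # should approach [[0], [1], [1], [0]]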
Handwriting Digit Recognition
A 16 × 16 image is flattened into 256 inputs $x_1, \dots, x_{256}$ (coloured pixel → 1, no colour → 0). The network produces ten outputs $y_1, \dots, y_{10}$, each representing the confidence that the image shows a particular digit. For example, with $y_1 = 0.1$ (confidence of “1”), $y_2 = 0.7$ (confidence of “2”) and $y_{10} = 0.2$ (confidence of “0”), the image is recognized as “2”.
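A minimal scikit-learn sketch of the same idea; note that scikit-learn's bundled digits are 8 × 8 rather than the slide's 16 × 16, so the input vector has 64 components, and the hidden layer size is an arbitrary choice:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)            # 8x8 digit images, flattened to 64 inputs
X_train, X_test, y_train, y_test = train_test_split(X / 16.0, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)                      # ten outputs, one confidence per digit
print(clf.score(X_test, y_test))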
Evolution of Connectionist Models
1989: Convolutional Neural Network (LeCun)
[Diagram: a deep feed-forward network with input layer $x_1, \dots, x_N$, hidden layers 1 through L, and output layer $y_1, \dots, y_M$. “Deep” means many hidden layers.]
Convolutional Neural Network
▪ Input can have a very high dimension.
▪ Using a fully-connected neural network would need a large number of parameters.
▪ CNNs are a special type of neural network whose hidden units are only connected to a local receptive field.
▪ The number of parameters needed by CNNs is much smaller.
Example: 200 × 200 image
a) fully connected: 40,000 hidden units ⇒ 1.6 billion parameters
b) CNN: 5 × 5 kernel (filter), 100 feature maps ⇒ 2,500 parameters
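The quoted counts follow from simple arithmetic, as a quick Python check confirms (weights only, biases ignored, single input channel assumed):

pixels = 200 * 200              # inputs from the 200x200 image
fc_params = pixels * 40_000     # fully connected: every hidden unit sees every pixel
cnn_params = 5 * 5 * 100        # CNN: one shared 5x5 kernel per feature map
print(f"{fc_params:,}")         # 1,600,000,000 -> 1.6 billion
print(f"{cnn_params:,}")        # 2,500

Weight sharing is the key: each feature map reuses the same 25 kernel weights at every image location.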
Evolution of Connectionist Models
2006: Deep Belief Networks (Hinton), Stacked Auto-Encoders (Bengio)
[Diagram: the same deep feed-forward architecture as above; “deep” means many hidden layers.]
Deep Learning
Traditional pattern recognition models use hand-crafted features and a relatively simple trainable classifier. This approach has the following limitations:
▪ It is very tedious and costly to develop hand-crafted features.
▪ Hand-crafted features are usually highly dependent on one application and cannot be transferred easily to other applications.
hand-crafted feature extractor → “simple” trainable classifier → output
Deep Learning
Deep learning = representation learning
It seeks to learn hierarchical representations (i.e. features) automatically through multiple stages of a feature learning process.
Low-level features → Mid-level features → High-level features → Trainable classifier → output
Feature visualization of convolutional net trained on ImageNet (Zeiler and Fergus, 2013)
Learning Hierarchical Representations
Hierarchy of representations with increasing level of abstraction; each stage is a kind of trainable nonlinear feature transformation.
Image recognition: pixel → edge → motif → part → object
Text: character → word → word group → clause → sentence → story
Low-level features → Mid-level features → High-level features → Trainable classifier → output (increasing level of abstraction)
Pooling
Common pooling operations:
▪ Max pooling: report the maximum output within a rectangular neighborhood.
▪ Average pooling: report the average output of a rectangular neighborhood (possibly weighted by the distance from the central pixel).
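A minimal NumPy sketch of both operations over non-overlapping windows; the array dimensions are assumed divisible by the window size, and the distance-weighted variant of average pooling is omitted:

import numpy as np

def pool2d(x, k, op):
    # split a 2-D array into k x k blocks, then reduce each block with `op`
    h, w = x.shape
    blocks = x.reshape(h // k, k, w // k, k)
    return op(blocks, axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(x, 2, np.max))     # max pooling: maximum of each 2x2 neighborhood
print(pool2d(x, 2, np.mean))    # average pooling: mean of each 2x2 neighborhood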
Future Trends
▪ A different and wider range of problems is being addressed
  ▪ natural language understanding
  ▪ natural scene understanding
  ▪ natural speech understanding
▪ Feature learning is being investigated at a deeper level
▪ Manifold learning
▪ Reinforcement learning
▪ Integration with other paradigms of machine learning