Artificial Intelligence is transforming the world. Deep Learning, an integral part of this new Artificial Intelligence paradigm, is becoming one of the most sought-after skills. Learn more about Deep Learning and its evolution.
AILABS Lecture Series - Is AI the New Electricity? Topic: Deep Learning - Evolution and Future Trends, by Dr. Chiranjit Acharya
Lecture Series: AI is the New Electricity
Dr. Chiranjit Acharya
AILABS Academy, J-3, GP Block, Sector V, Salt Lake City, Kolkata, West Bengal 700091
Deep Learning - SCOPING, EVOLUTION & FUTURE TRENDS
Presented at AILABS Academy, Kolkata, on April 18th, 2018
A Journey into Deep Learning
▪ Cutting-edge technology
▪ Has garnered traction in both industry and academia
▪ Achieves near-human-level performance in many pattern recognition tasks
▪ Excels in
  ▪ structured, relational data
  ▪ unstructured rich-media data such as image, video, audio and text
A Journey into Deep Learning
▪What is Deep Learning? Where is the “deepness”?
▪Where does Deep Learning come from?
▪What are the models and algorithms of Deep Learning?
▪What is the trajectory of evolution of Deep Learning?
▪What are the future trends of Deep Learning?
Artificial Intelligence
Holy Grail of AI Research
▪ Understanding the neuro-biological and neuro-physical basis of human intelligence
  ▪ the science of intelligence
▪ Building intelligent machines which can think and act like humans
  ▪ the engineering of intelligence
Artificial Intelligence
Facets of AI Research
▪ knowledge representation
▪ reasoning
▪ natural language understanding
▪ natural scene understanding
▪ natural speech understanding
▪ problem solving
▪ perception
▪ learning
▪ planning
Machine Learning
Basic Doctrine of Learning
▪learning from examples
Outcome of Learning
▪rules of inference for some predictive task
▪ embodiment of the rules = model
▪ model is an abstract computing device
  • kernel machine, decision tree, neural network
Machine Learning
Connotations of Learning
▪process of generalization
▪discovering nature/traits of data
▪unraveling patterns and anti-patterns in data
▪knowing distributional characteristics of data
▪identifying causal effects and propagation
▪ identifying non-causal covariations & correlations
Machine Learning
Design Aspects of Learning System
▪ Choose the training experience
▪ Choose exactly what is to be learned, i.e. the target function / machine
▪ Choose the objective function & optimality criteria
▪ Choose a learning algorithm to infer the target function from the experience.
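As a concrete illustration (not from the original slides), these four choices map onto a few lines of scikit-learn; the dataset, model and hyperparameters below are arbitrary placeholders:

from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Choice 1: the training experience -- a labelled dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Choices 2 and 3: the target function / machine (a linear classifier) and
# the objective function with its optimality criterion (hinge loss)
clf = SGDClassifier(loss="hinge", max_iter=1000, random_state=0)

# Choice 4: the learning algorithm (stochastic gradient descent) infers the
# target function from the experience
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))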
Learning Workflow
▪ Stage 1: Feature Extraction, Feature Subset Selection, Feature Vector Representation
▪ Stage 2: Training / Testing Set Creation and Augmentation
▪ Stage 3: Training the Inference Machine
▪ Stage 4: Running the Inference Machine on the Test Set
▪ Stage 5: Stratified Sampling and Validation
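A minimal scikit-learn sketch of Stages 1-5 (the components are illustrative stand-ins; set augmentation and human review are omitted):

from sklearn.datasets import load_wine
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
# Stage 1: feature subset selection on top of the given feature vectors
pipe = Pipeline([("select", SelectKBest(f_classif, k=5)),
                 ("tree", DecisionTreeClassifier(random_state=0))])
# Stage 2: training / testing set creation
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
# Stage 3: training the inference machine
pipe.fit(X_train, y_train)
# Stage 4: running the inference machine on the test set
print(pipe.score(X_test, y_test))
# Stage 5: stratified sampling and validation
print(cross_val_score(pipe, X, y, cv=StratifiedKFold(n_splits=5)))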
Feature Extraction / Selection
[Diagram: a domain expert and a knowledge engineer define cognitive elements (low-level parts, mid-level parts, high-level parts, additional descriptors); a sparse coder converts the corpus into a sparse representation.]
Training Set Augmentation
[Diagram: a random sampler draws samples from the sparse representation; a reviewer labels them, and they are merged with the existing training set to produce the augmented training set.]
Training and Prediction / Recognition
[Diagram: an adaptive learner is trained on the training set to produce a prediction / recognition model; the model is then run on the unlabelled residual corpus to yield the predicted / recognized corpus.]
Sampling, Validation & Convergence
[Diagram: a stratified sampler draws sub-samples from the predicted corpus; a reviewer produces human-reviewed stratified sub-samples; a precision & recall calculator checks for convergence. If converged, relevance scoring ends; if not, the flow returns to training set augmentation.]
Evolution of Connectionist Models
1943: Artificial neuron model (McCulloch & Pitts)
▪ "A logical calculus of the ideas immanent in nervous activity"
▪ simple artificial “neurons” could be made to perform basic logical operations such as AND, OR and NOT
▪ known as Linear Threshold Gate
▪ NO learning
The neuron computes a weighted sum of its inputs $x_1, \dots, x_n$ with weights $w_{1j}, \dots, w_{nj}$ and a bias $b_j$, then applies a transfer function $f$:
$$s_j = \sum_{i=1}^{n} w_{ij} x_i + b_j, \qquad y_j = f(s_j)$$
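A minimal Python sketch of a linear threshold gate; the weights and thresholds below are one of many choices that realize AND, OR and NOT:

def ltg(inputs, weights, threshold):
    # McCulloch-Pitts style unit: fire (output 1) iff the weighted sum reaches the threshold
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= threshold else 0

AND = lambda a, b: ltg([a, b], [1, 1], threshold=2)
OR  = lambda a, b: ltg([a, b], [1, 1], threshold=1)
NOT = lambda a:    ltg([a], [-1], threshold=0)

assert AND(1, 1) == 1 and AND(1, 0) == 0
assert OR(0, 1) == 1 and OR(0, 0) == 0
assert NOT(0) == 1 and NOT(1) == 0

As the slide notes, the weights are fixed by hand: there is no learning.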
Evolution of Connectionist Models
1957: Perceptron model (Rosenblatt)
▪ invention of learning rules inspired by ideas from neuroscience
  if $\sum_i \text{input}_i \cdot \text{weight}_i > \text{threshold}$, output $= +1$
  if $\sum_i \text{input}_i \cdot \text{weight}_i < \text{threshold}$, output $= -1$
▪ learns to classify input into two output classes
▪ Sigmoid transfer function: boundedness, graduality
  $y \to 0$ as $x \to -\infty$; $y \to 1$ as $x \to +\infty$
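A minimal NumPy sketch of the perceptron rule above; the AND-style toy data, epoch count and learning rate are illustrative:

import numpy as np

def perceptron_train(X, y, epochs=20, lr=1.0):
    # y in {-1, +1}; the threshold (bias) is folded in as a constant extra input
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            if yi * (xi @ w) <= 0:    # wrong side of (or on) the decision boundary
                w += lr * yi * xi     # nudge the boundary toward xi
    return w

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])         # linearly separable (AND-like) labels
w = perceptron_train(X, y)
print(np.sign(np.hstack([X, np.ones((4, 1))]) @ w))   # recovers y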
With a sigmoid transfer function, the same neuron model becomes:
$$y_j = f(s_j) = \frac{1}{1 + e^{-s_j}}, \qquad s_j = \sum_{i=1}^{n} w_{ij} x_i + b_j$$
Evolution of Connectionist Models
1960s: Delta Learning Rule (Widrow & Hoff)
▪ Define the error as the squared residuals summed over all training cases:
▪ Now differentiate to get the error derivatives for the weights
▪ The batch delta rule changes the weights in proportion to their error derivatives summed over all training cases
$$E = \frac{1}{2} \sum_{n} \left( y_n - \hat{y}_n \right)^2$$
$$\frac{\partial E}{\partial w_i} = \sum_{n} \frac{\partial \hat{y}_n}{\partial w_i} \, \frac{\partial E}{\partial \hat{y}_n} = -\sum_{n} x_{i,n} \left( y_n - \hat{y}_n \right)$$
$$\Delta w_i = -\varepsilon \frac{\partial E}{\partial w_i} = \varepsilon \sum_{n} x_{i,n} \left( y_n - \hat{y}_n \right)$$
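A minimal NumPy sketch of the batch delta rule for a linear unit; the toy targets, learning rate and epoch count are illustrative:

import numpy as np

def delta_rule(X, y, epochs=200, lr=0.1):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        y_hat = X @ w                 # predictions for all training cases
        grad = -X.T @ (y - y_hat)     # dE/dw, summed over all training cases
        w -= lr * grad                # change weights in proportion to -dE/dw
    return w

X = np.array([[1, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
y = X @ np.array([2.0, 3.0])          # targets generated by true weights (2, 3)
print(delta_rule(X, y))               # converges toward [2. 3.]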
Evolution of Connectionist Models
1969: Minsky's objection to Perceptrons
▪ Marvin Minsky & Seymour Papert: Perceptrons
▪ Unless input categories are linearly separable, a perceptron cannot learn to discriminate between them.
▪ Unfortunately, it appeared that many important categories were not linearly separable.
Perceptrons are good at linear classification but ...
[Figure: labelled points in the $(x_1, x_2)$ plane illustrating a classification problem that is not linearly separable.]
Universal Approximation Theorem
Existential Version (Kolmogorov)
▪ There exists a finite combination of superposition and addition of continuous functions of single variables which can approximate any continuous, multivariate function on compact subsets of $\mathbb{R}^d$.
Constructive Version (Cybenko)
▪ The standard multilayer feed-forward network with a single hidden layer, containing a finite number of hidden neurons, is a universal approximator among continuous functions on compact subsets of $\mathbb{R}^d$, under mild assumptions on the activation function.
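In symbols, Cybenko's construction (stated here for concreteness, with $\sigma$ a sigmoidal activation and $K \subset \mathbb{R}^d$ compact): for any continuous $f$ on $K$ and any $\varepsilon > 0$ there exist $N$, $\alpha_i$, $w_i$ and $b_i$ such that

$$G(x) = \sum_{i=1}^{N} \alpha_i \, \sigma\!\left( w_i^{\top} x + b_i \right), \qquad \sup_{x \in K} \left| G(x) - f(x) \right| < \varepsilon .$$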
Evolution of Connectionist Models
1986: Backpropagation for Multi-Layer Perceptrons (Rumelhart, Hinton & Williams)
▪ solution to Minsky's objection regarding the perceptron's limitation
▪ nonlinear classification is achieved by fully connected, multilayer, feedforward networks of perceptrons (MLP)
▪ an MLP can be trained by backpropagation
▪ Two-pass algorithm
  ▪ forward propagation of activation signals from input to output
  ▪ backward propagation of error derivatives from output to input
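A minimal NumPy sketch of the two-pass algorithm on the XOR problem, which defeats a single perceptron; the network size, random seed, learning rate and iteration count are all illustrative choices:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets: not linearly separable
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)     # one hidden layer of 4 sigmoid units
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 1.0
for _ in range(5000):
    # forward pass: activation signals propagate from input to output
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: error derivatives propagate from output to input
    d_out = (out - y) * out * (1 - out)           # squared-error gradient at the output
    d_h = (d_out @ W2.T) * h * (1 - h)            # chain rule through the hidden layer
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)
print(out.round(2))                                # should approach [[0], [1], [1], [0]]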
Handwriting Digit Recognition
A 16 × 16 image is flattened into 256 inputs $x_1, \dots, x_{256}$ (coloured pixel → 1, no colour → 0). The network produces ten outputs $y_1, \dots, y_{10}$, each representing the confidence that the image shows a particular digit. For example, with $y_1 = 0.1$ (confidence of “1”), $y_2 = 0.7$ (confidence of “2”) and $y_{10} = 0.2$ (confidence of “0”), the image is recognized as “2”.
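A minimal scikit-learn sketch of the same idea; note that scikit-learn's bundled digits are 8 × 8 rather than the slide's 16 × 16, so the input vector has 64 components, and the hidden layer size is an arbitrary choice:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)            # 8x8 digit images, flattened to 64 inputs
X_train, X_test, y_train, y_test = train_test_split(X / 16.0, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)                      # ten outputs, one confidence per digit
print(clf.score(X_test, y_test))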
Evolution of Connectionist Models
1989: Convolutional Neural Network (LeCun)
[Diagram: a deep feed-forward network with input layer $x_1, \dots, x_N$, hidden layers 1 through L, and output layer $y_1, \dots, y_M$. “Deep” means many hidden layers.]
Convolutional Neural Network
▪ Input can have a very high dimension.
▪ Using a fully-connected neural network would need a large number of parameters.
▪ CNNs are a special type of neural network whose hidden units are only connected to a local receptive field.
▪ The number of parameters needed by CNNs is much smaller.
Example: 200 × 200 image
a) fully connected: 40,000 hidden units ⇒ 1.6 billion parameters
b) CNN: 5 × 5 kernel (filter), 100 feature maps ⇒ 2,500 parameters
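The quoted counts follow from simple arithmetic, as a quick Python check confirms (weights only, biases ignored, single input channel assumed):

pixels = 200 * 200              # inputs from the 200x200 image
fc_params = pixels * 40_000     # fully connected: every hidden unit sees every pixel
cnn_params = 5 * 5 * 100        # CNN: one shared 5x5 kernel per feature map
print(f"{fc_params:,}")         # 1,600,000,000 -> 1.6 billion
print(f"{cnn_params:,}")        # 2,500

Weight sharing is the key: each feature map reuses the same 25 kernel weights at every image location.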
Evolution of Connectionist Models
2006: Deep Belief Networks (Hinton), Stacked Auto-Encoders (Bengio)
[Diagram: the same deep feed-forward architecture as above; “deep” means many hidden layers.]
Deep Learning
Traditional pattern recognition models use hand-crafted features and a relatively simple trainable classifier. This approach has the following limitations:
▪ It is very tedious and costly to develop hand-crafted features.
▪ Hand-crafted features are usually highly dependent on one application and cannot be transferred easily to other applications.
hand-crafted feature extractor → “simple” trainable classifier → output
Deep Learning
Deep learning = representation learning
It seeks to learn hierarchical representations (i.e. features) automatically through multiple stages of a feature learning process.
Low-level features → Mid-level features → High-level features → Trainable classifier → output
Feature visualization of convolutional net trained on ImageNet (Zeiler and Fergus, 2013)
Learning Hierarchical Representations
Hierarchy of representations with increasing level of abstraction; each stage is a kind of trainable nonlinear feature transformation.
Image recognition: pixel → edge → motif → part → object
Text: character → word → word group → clause → sentence → story
Low-level features → Mid-level features → High-level features → Trainable classifier → output (increasing level of abstraction)
Pooling
Common pooling operations:
▪ Max pooling: report the maximum output within a rectangular neighborhood.
▪ Average pooling: report the average output of a rectangular neighborhood (possibly weighted by the distance from the central pixel).
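A minimal NumPy sketch of both operations over non-overlapping windows; the array dimensions are assumed divisible by the window size, and the distance-weighted variant of average pooling is omitted:

import numpy as np

def pool2d(x, k, op):
    # split a 2-D array into k x k blocks, then reduce each block with `op`
    h, w = x.shape
    blocks = x.reshape(h // k, k, w // k, k)
    return op(blocks, axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(x, 2, np.max))     # max pooling: maximum of each 2x2 neighborhood
print(pool2d(x, 2, np.mean))    # average pooling: mean of each 2x2 neighborhood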
Future Trends
▪ A different and wider range of problems is being addressed
  ▪ natural language understanding
  ▪ natural scene understanding
  ▪ natural speech understanding
▪ Feature learning is being investigated at a deeper level
▪ Manifold learning
▪ Reinforcement learning
▪ Integration with other paradigms of machine learning