SlideShare a Scribd company logo
1 of 88
Download to read offline
CS7015 (Deep Learning) : Lecture 1
(Partial/Brief) History of Deep Learning
Mitesh M. Khapra
Department of Computer Science and Engineering
Indian Institute of Technology Madras
. . . . . . . . . . . . . . . . . . . .
1/1
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
2/1
Acknowledgements
Most of this material is based on the article “Deep Learning in Neural Networks:
An Overview” by J. Schmidhuber[? ]
The errors, if any, are due to me and I apologize for them
Feel free to contact me if you think certain portions need to be corrected (please
provide appropriate references)
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 1
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
3/1
Chapter 1: Biological Neurons
Module 1.1
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
4/1
Reticular Theory
Joseph von Gerlach proposed that the ner-
vous system is a single continuous network
as opposed to a network of many discrete
cells!
1871-1873
Reticular theory
Module 1.1
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
4/1
Staining Technique
Camillo Golgi discovered a chemical reaction
that allowed him to examine nervous tissue
in much greater detail than ever before
He was a proponent of Reticular theory.
1871-1873
Reticular theory
Module 1.1
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
4/1
Neuron Doctrine
Santiago Ramón y Cajal used Golgi’s tech-
nique to study the nervous system and pro-
posed that it is actually made up of discrete
individual cells formimg a network (as op-
posed to a single continuous network)
1871-1873
Reticular theory
1888-1891
Neuron Doctrine
Module 1.1
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
4/1
The Term Neuron
The term neuron was coined by Hein-
rich Wilhelm Gottfried von Waldeyer-Hartz
around 1891.
He further consolidated the Neuron Doc-
trine.
1871-1873
Reticular theory
1888-1891
Neuron Doctrine
Module 1.1
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
4/1
Nobel Prize
Both Golgi (reticular theory) and Cajal (neu-
ron doctrine) were jointly awarded the 1906
Nobel Prize for Physiology or Medicine, that
resulted in lasting conflicting ideas and con-
troversies between the two scientists.
1871-1873
Reticular theory
1888-1891
Neuron Doctrine
1906
Nobel Prize
Module 1.1
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
4/1
The Final Word
In 1950s electron microscopy finally con-
firmed the neuron doctrine by unam-
biguously demonstrating that nerve cells
were individual cells interconnected through
synapses (a network of many individual neu-
rons).
1871-1873
Reticular theory
1888-1891
Neuron Doctrine
1906
Nobel Prize
1950
Synapse
Module 1.1
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
5/1
Chapter 2: From Spring to Winter of AI
Module 2
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
6/1
McCulloch Pitts Neuron
McCulloch (neuroscientist) and Pitts (logi-
cian) proposed a highly simplified model of
the neuron (1943)[? ]
1943
MP Neuron
Module 2
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
6/1
Perceptron
“the perceptron may eventually be able to
learn, make decisions, and translate lan-
guages” -Frank Rosenblatt
1943
MP Neuron
1957-1958
Perceptron
Module 2
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
6/1
Perceptron
“the embryo of an electronic computer that
the Navy expects will be able to walk, talk,
see, write, reproduce itself and be conscious
of its existence.” -New York Times
1943
MP Neuron
1957-1958
Perceptron
Module 2
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
6/1
First generation Multilayer
Perceptrons
Ivakhnenko et. al.[? ]
1943
MP Neuron
1957-1958
Perceptron
1965-1968
MLP
Module 2
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
6/1
Perceptron Limitations
In their now famous book “Perceptrons”,
Minsky and Papert outlined the limits of
what perceptrons could do[? ]
1943
MP Neuron
1957-1958
Perceptron
1965-1968
MLP
1969
Limitations
Module 2
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
6/1
AI Winter of connectionism
Almost lead to the abandonment of connec-
tionist AI
1943
MP Neuron
1957-1958
Perceptron
1965-1968
MLP
1969
Limitations
1969-1986
AI Winter
Module 2
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
6/1
Backpropagation
Discovered and rediscovered several
times throughout 1960’s and 1970’s
Werbos(1982)[? ] first used it in the
context of artificial neural networks
Eventually popularized by the work of
Rumelhart et. al. in 1986[? ]
1943
MP Neuron
1957-1958
Perceptron
1965-1968
MLP
1969
Limitations
1969-1986
AI Winter
1986
Backpropagation
Module 2
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
6/1
Gradient Descent
Cauchy discovered Gradient Descent moti-
vated by the need to compute the orbit of
heavenly bodies
1847
Gradient Descent
1943
MP Neuron
1957-1958
Perceptron
1965-1968
MLP
1969
Limitations
1969-1986
AI Winter
1986
Backpropagation
Module 2
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
6/1
Universal Approximation The-
orem
A multilayered network of neurons with a
single hidden layer can be used to approxi-
mate any continuous function to any desired
precision[? ]
1847
Gradient Descent
1943
MP Neuron
1957-1958
Perceptron
1965-1968
MLP
1969
Limitations
1969-1986
AI Winter
1986
Backpropagation
1989
UAT
Module 2
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
7/1
Chapter 3: The Deep Revival
Module 3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
8/1
Unsupervised Pre-Training
Hinton and Salakhutdinov described an ef-
fective way of initializing the weights that
allows deep autoencoder networks to learn a
low-dimensional representation of data.[? ]
2006
Unsupervised Pre-Training
Module 3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
9/1
Unsupervised Pre-Training
The idea of unsupervised pre-training actu-
ally dates back to 1991-1993 (J. Schmidhu-
ber) when it was used to train a “Very Deep
Learner”
1991-1993
Very Deep Learner
2006
Unsupervised Pre-Training
Module 3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
9/1
More insights (2007-2009)
Further Investigations into the effectiveness
of Unsupervised Pre-training
1991-1993
Very Deep Learner
2006-2009
Unsupervised Pretraining
Module 3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
9/1
Success in Handwriting Recog-
nition
Graves et. al. outperformed all entries in an
international Arabic handwriting recognition
competition[? ]
1991-1993
Very Deep Learner
2006-2009
Unsupervised Pretraining
2009
Handwriting
Module 3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
9/1
Success in Speech Recognition
Dahl et. al. showed relative error reduction
of 16.0% and 23.2% over a state of the art
system[? ]
1991-1993
Very Deep Learner
2006-2009
Unsupervised Pretraining
2009
Handwriting
2010
Speech
Module 3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
9/1
New record on MNIST
Ciresan et. al. set a new record on the
MNIST dataset using good old backpropa-
gation on GPUs (GPUs enter the scene)[? ]
1991-1993
Very Deep Learner
2006-2009
Unsupervised Pretraining
2009
Handwriting
2010
Speech
Record on MNIST
Module 3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
9/1
First Superhuman Visual Pat-
tern Recognition
D. C. Ciresan et. al. achieved 0.56% error
rate in the IJCNN Traffic Sign Recognition
Competition[? ]
1991-1993
Very Deep Learner
2006-2009
Unsupervised Pretraining
2009
Handwriting
2010
Speech
Record on MNIST
2011
Visual Pattern
Recognition
Module 3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
9/1
Winning more visual recogni-
tion challenges
Network Error Layers
AlexNet[? ] 16.0% 8
1991-1993
Very Deep Learner
2006-2009
Unsupervised Pretraining
2009
Handwriting
2010
Speech
Record on MNIST
2011
Visual Pattern
Recognition
2012-2016
Success on ImageNet
Module 3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
9/1
Winning more visual recogni-
tion challenges
Network Error Layers
AlexNet[? ] 16.0% 8
ZFNet[? ] 11.2% 8
1991-1993
Very Deep Learner
2006-2009
Unsupervised Pretraining
2009
Handwriting
2010
Speech
Record on MNIST
2011
Visual Pattern
Recognition
2012-2016
Success on ImageNet
Module 3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
9/1
Winning more visual recogni-
tion challenges
Network Error Layers
AlexNet[? ] 16.0% 8
ZFNet[? ] 11.2% 8
VGGNet[? ] 7.3% 19
1991-1993
Very Deep Learner
2006-2009
Unsupervised Pretraining
2009
Handwriting
2010
Speech
Record on MNIST
2011
Visual Pattern
Recognition
2012-2016
Success on ImageNet
Module 3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
9/1
Winning more visual recogni-
tion challenges
Network Error Layers
AlexNet[? ] 16.0% 8
ZFNet[? ] 11.2% 8
VGGNet[? ] 7.3% 19
GoogLeNet[? ] 6.7% 22
1991-1993
Very Deep Learner
2006-2009
Unsupervised Pretraining
2009
Handwriting
2010
Speech
Record on MNIST
2011
Visual Pattern
Recognition
2012-2016
Success on ImageNet
Module 3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
9/1
Winning more visual recogni-
tion challenges
Network Error Layers
AlexNet[? ] 16.0% 8
ZFNet[? ] 11.2% 8
VGGNet[? ] 7.3% 19
GoogLeNet[? ] 6.7% 22
MS ResNet[? ] 3.6% 152!!
1991-1993
Very Deep Learner
2006-2009
Unsupervised Pretraining
2009
Handwriting
2010
Speech
Record on MNIST
2011
Visual Pattern
Recognition
2012-2016
Success on ImageNet
Module 3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
10/1
Chapter 4: From Cats to Convolutional Neural Networks
Module 4
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
11/1
Hubel and Wiesel Experiment
Experimentally showed that each neuron has
a fixed receptive field - i.e. a neuron will
fire only in response to a visual stimuli in a
specific region in the visual space[? ]
1959
H and W experiment
Module 4
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
11/1
Neocognitron
Used for Handwritten character recogni-
tion and pattern recognition (Fukushima et.
al.)[? ]
1959
H and W experiment
1980
Neocognitron
Module 4
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
11/1
Convolutional Neural Network
Handwriting digit recognition using back-
propagation over a Convolutional Neural
Network (LeCun et. al.)[? ]
1959
H and W experiment
1980
Neocognitron
1989
CNN
Module 4
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
11/1
LeNet-5
Introduced the (now famous) MNIST
dataset (LeCun et. al.)[? ]
1959
H and W experiment
1980
Neocognitron
1989
CNN
1998
LeNet-5
Module 4
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
12/1
An algorithm inspired by an experiment on cats is today
used to detect cats in videos :-)
Module 4
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
13/1
Chapter 5: Faster, higher, stronger
Module 5
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
14/1
Better Optimization Methods
Faster convergence, better accuracies
1983
Nesterov
Module 5
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
14/1
Better Optimization Methods
Faster convergence, better accuracies
1983
Nesterov
2011
Adagrad
Module 5
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
14/1
Better Optimization Methods
Faster convergence, better accuracies
1983
Nesterov
2011
Adagrad
2012
RMSProp
Module 5
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
14/1
Better Optimization Methods
Faster convergence, better accuracies
1983
Nesterov
2011
Adagrad
2012
RMSProp
2015
Adam
Module 5
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
14/1
Better Optimization Methods
Faster convergence, better accuracies
1983
Nesterov
2011
Adagrad
2012
RMSProp
2015
Adam
2016
Eve
Module 5
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
14/1
Better Optimization Methods
Faster convergence, better accuracies
1983
Nesterov
2011
Adagrad
2012
RMSProp
2015
Adam
2016
Eve
2018
Beyond Adam
Module 5
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
14/1
Better Optimization Methods
Faster convergence, better accuracies
1983
Nesterov
2011
Adagrad
2012
RMSProp
2015 2016
Eve
2018
Beyond Adam
Adam/BatchNorm
Module 5
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
15/1
Chapter 6: The Curious Case of Sequences
Module 6
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
16/1
Sequences
They are everywhere
Time series, speech, music, text, video
Each unit in the sequence interacts
with other units
Need models to capture this
interaction
Module 6
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
16/1
Hopfield Network
Content-addressable memory systems for
storing and retrieving patterns[? ]
1982
Hopfield
Module 6
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
16/1
Jordan Network
The output state of each time step is fed to
the next time step thereby allowing interac-
tions between time steps in the sequence
1982
Hopfield
1986
Jordan
Module 6
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
16/1
Elman Network
The hidden state of each time step is fed to
the next time step thereby allowing interac-
tions between time steps in the sequence
1982
Hopfield
1986
Jordan
1990
Elman
Module 6
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
16/1
Drawbacks of RNNs
Hochreiter et. al. and Bengio et. al.
showed the difficulty in training RNNs (the
problem of exploding and vanishing gradi-
ents)
1982
Hopfield
1986
Jordan
1990
Elman
1991-1994
RNN drawbacks
Module 6
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
16/1
Long Short Term Memory
Showed that LSTMs can solve complex long
time lag tasks that could never be solved
before
1982
Hopfield
1986
Jordan
1990
Elman
1991-1994
RNN drawbacks
1997
LSTMs
Module 6
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
16/1
Sequence To Sequence Learn-
ing
Initial success in using RNNs/LSTMs
for large scale Sequence To Sequence
Learning Problems
Introduction of Attention which
inspired a lot of research over the next
two years
1982
Hopfield
1986
Jordan
1990
Elman
1991-1994
RNN drawbacks
1997
LSTMs
2014
Seq2Seq-Attention
Module 6
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
16/1
RL for Attention
Schmidhuber & Huber proposed RNNs that
use reinforcement learning to decide where
to look
1982
Hopfield
1986
Jordan
1990
Elman
1991-1994
RNN drawbacks
1997
LSTMs
2014
Seq2Seq-Attention
1991
RL-Attention
Module 6
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
17/1
Beating humans at their own game (literally)
Module 7
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
18/1
Playing Atari Games
Human-level control through deep
reinforcement learning for playing
Atari Games[? ]
2015
DQNs
Module 7
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
18/1
Let’s GO
Alpha Go Zero - Best Go player ever,
surpassing human players[? ]
GO is more complex than chess
because of number of possible moves
No brute force backtracking unlike
previous chess agents
2015
2015
DQNs/AlphaGO
Module 7
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
18/1
Taking a shot at Poker
DeepStack defeated 11 professional poker
players with only one outside the margin of
statistical significance[? ]
2015
2015
DQNs/AlphaGO
2016
Poker
Module 7
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
18/1
Defense of the Ancients
Widely popular game, with complex
strategies, large visual space
Bot was undefeated against many top
professional players
2015
2015
DQNs/AlphaGO
2016
Poker
2017
Dota 2
Module 7
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
19/1
Chapter 8: The Madness (2013-)
Module 8
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
20/1
He sat on a chair. Language Modeling
Mikolov et al. (2010)[? ]
Kiros et al. (2015)[? ]
Kim et al. (2015)[? ]
Module 8
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
21/1
Speech Recognition
Hinton et al. (2012)[? ]
Graves et al. (2013)[? ]
Chorowski et al. (2015)[? ]
Sak et al. (2015)[? ]
Module 8
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
22/1
Machine Translation
Kalchbrenner et al. (2013)[? ]
Cho et al. (2014)[? ]
Bahdanau et al. (2015)[? ]
Jean et al. (2015)[? ]
Gulcehre et al. (2015)[? ]
Sutskever et al. (2014)[? ]
Luong et al. (2015)[? ]
Zheng et al. (2017)[? ]
Cheng et al. (2016)[? ]
Chen et al. (2017)[? ]
Firat et al. (2016)[? ]
Module 8
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
23/1
Conversation Modeling
Shang et al. (2015)[? ]
Vinyals et al. (2015)[? ]
Lowe et al. (2015)[? ]
Dodge et al. (2015)[? ]
Weston et al. (2016)[? ]
Serban et al. (2016)[? ]
Bordes et al. (2017)[? ]
Serban et al. (2017)[? ]
Module 8
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
24/1
Question Answering
Hermann et al. (2015)[? ]
Chen et al. (2016)[? ]
Xiong et al. (2016)[? ]
Seo et al. (2016)[? ]
Dhingra et al. (2017)[? ]
Wang et al. (2017)[? ]
Hu et al. (2017)[? ]
Module 8
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
25/1
Object Detection/Recognition
Semantic Segmentation (Long et al,
2015)[? ]
Recurrent CNNs (Liang et al.,
2015)[? ]
Faster RCNN (Ren et al., 2015)[? ]
Inside-Outside Net (Bell et al.,
2015)[? ]
YOLO9000 (Redmon et al., 2016)[? ]
R-FCN (Dai et al., 2016)[? ]
Mask R-CNN (He at al., 2017)[? ]
Video Object segmentation (Caelles et
al., 2017)[? ]
Module 8
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
26/1
Visual Tracking
Choi et al. (2017)[? ]
Yun et al. (2017)[? ]
Alahi et al. (2017)[? ]
Module 8
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
27/1
Image Captioning
Mao et al. (2014)[? ]
Mao at al. (2015)[? ]
Kiros et al. (2015)[? ]
Donahue et al. (2015)[? ]
Vinyals et al. (2015)[? ]
Karpathy et al. (2015)[? ]
Fang et al. (2015)[? ]
Chen et al. (2015)[? ]
Module 8
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
28/1
Video Captioning
Donahue et al. (2014)[? ]
Venugopalan at al. (2014)[? ]
Pan et al. (2015)[? ]
Yao et al. (2015)[? ]
Rohrbach et al. (2015)[? ]
Zhu et al. (2015)[? ]
Cho et al. (2015)[? ]
Module 8
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
29/1
Visual Question Answering
Santoro et al. (2017)[? ]
Hu at al. (2017)[? ]
Johnson et al. (2017)[? ]
Ben-younes et al. (2017)[? ]
Malinowski et al. (2017)[? ]
Kazemi et al. (2016)[? ]
Module 8
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
30/1
Video Question Answering
Tapaswi et. al. 2016[? ]
Zeng et. al. 2016[? ]
Maharaj et. al. 2017[? ]
Zhao et. al. 2017[? ]
Yu Youngjae et. al. 2017[? ]
Xue Hongyang et. al.
2017[? ]
Mazaheri et. al. 2017[? ]
Module 8
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
31/1
Video Summarization
Chheng 2007[? ]
Ajmal 2012[? ]
Zhang Ke 2016[? ]
Zhong Ji 2017[? ]
Panda 2017[? ]
Module 8
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
32/1
Generating Authentic Photos
Variational Autoencoders
(Kingma et. al., 2013)[? ]
Generative Adversarial
Networks (Goodfellow et. al.,
2014)[? ]
Plug & Play generative nets
(Nguyen et al., 2016)[? ]
Progressive Growing of GANs
(Karras et al., 2017)[? ]
Module 8
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
33/1
Generating Raw Audio
Wavenets (Oord et. al.,
2016)[? ]
Module 8
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
34/1
Pixel RNNs
(Oord et al., 2016)[? ]
(Oord et al., 2016)[? ]
(Salimans et al., 2017)[? ]
Module 8
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
35/1
Chapter 9: (Need for) Sanity
Module 9
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
36/1
The Paradox of Deep Learning
Why does deep learning work so well despite
∗
https://arxiv.org/pdf/1710.05468.pdf
Module 9
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
36/1
The Paradox of Deep Learning
Why does deep learning work so well despite
high capacity (susceptible to overfitting)
∗
https://arxiv.org/pdf/1710.05468.pdf
Module 9
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
36/1
The Paradox of Deep Learning
Why does deep learning work so well despite
high capacity (susceptible to overfitting)
numerical instability (vanishing/exploding
gradients)
∗
https://arxiv.org/pdf/1710.05468.pdf
Module 9
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
36/1
The Paradox of Deep Learning
Why does deep learning work so well despite
high capacity (susceptible to overfitting)
numerical instability (vanishing/exploding
gradients)
sharp minima (leading to overfitting)
∗
https://arxiv.org/pdf/1710.05468.pdf
Module 9
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
36/1
The Paradox of Deep Learning
Why does deep learning work so well despite
high capacity (susceptible to overfitting)
numerical instability (vanishing/exploding
gradients)
sharp minima (leading to overfitting)
non-robustness (see figure)
∗
https://arxiv.org/pdf/1710.05468.pdf
Module 9
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
36/1
The Paradox of Deep Learning
Why does deep learning work so well despite
high capacity (susceptible to overfitting)
numerical instability (vanishing/exploding
gradients)
sharp minima (leading to overfitting)
non-robustness (see figure)
No clear answers yet but ...
∗
https://arxiv.org/pdf/1710.05468.pdf
Module 9
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
36/1
The Paradox of Deep Learning
Why does deep learning work so well despite
high capacity (susceptible to overfitting)
numerical instability (vanishing/exploding
gradients)
sharp minima (leading to overfitting)
non-robustness (see figure)
No clear answers yet but ...
Slowly but steadily there is increasing emphasis on
explainability and theoretical justifications!∗
∗
https://arxiv.org/pdf/1710.05468.pdf
Module 9
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
36/1
The Paradox of Deep Learning
Why does deep learning work so well despite
high capacity (susceptible to overfitting)
numerical instability (vanishing/exploding
gradients)
sharp minima (leading to overfitting)
non-robustness (see figure)
No clear answers yet but ...
Slowly but steadily there is increasing emphasis on
explainability and theoretical justifications!∗
Hopefully this will bring sanity to the proceedings !
∗
https://arxiv.org/pdf/1710.05468.pdf
Module 9
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
37/1
https://github.com/kjw0612/awesome-rnn
Module 9
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
38/1
i
Source: https://www.cbinsights.com/blog/deep-learning-ai-startups-market-map-company-list/
Module 9
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
39/1
References I
Module 9

More Related Content

Similar to Lecture1 (1).pdf

martinthesis
martinthesismartinthesis
martinthesis
Martin L
 
Finding Ourselves in the Universe_ A Mathematical Approach to Cosmic Crystall...
Finding Ourselves in the Universe_ A Mathematical Approach to Cosmic Crystall...Finding Ourselves in the Universe_ A Mathematical Approach to Cosmic Crystall...
Finding Ourselves in the Universe_ A Mathematical Approach to Cosmic Crystall...
Joshua Menges
 
Multi-Band Rejection EMI Shielding
Multi-Band Rejection EMI ShieldingMulti-Band Rejection EMI Shielding
Multi-Band Rejection EMI Shielding
Sourav Rakshit
 
SeniorThesisFinal_Biswas
SeniorThesisFinal_BiswasSeniorThesisFinal_Biswas
SeniorThesisFinal_Biswas
Aditya Biswas
 
Real-Time Vowel Synthesis - A Magnetic Resonator Piano Based Project_by_Vasil...
Real-Time Vowel Synthesis - A Magnetic Resonator Piano Based Project_by_Vasil...Real-Time Vowel Synthesis - A Magnetic Resonator Piano Based Project_by_Vasil...
Real-Time Vowel Synthesis - A Magnetic Resonator Piano Based Project_by_Vasil...
Vassilis Valavanis
 
MSc_thesis_OlegZero
MSc_thesis_OlegZeroMSc_thesis_OlegZero
MSc_thesis_OlegZero
Oleg Żero
 
Introduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabusIntroduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabus
Andres Mendez-Vazquez
 

Similar to Lecture1 (1).pdf (20)

Abimbola_NMO-1
Abimbola_NMO-1Abimbola_NMO-1
Abimbola_NMO-1
 
mscthesis
mscthesismscthesis
mscthesis
 
martinthesis
martinthesismartinthesis
martinthesis
 
Finding Ourselves in the Universe_ A Mathematical Approach to Cosmic Crystall...
Finding Ourselves in the Universe_ A Mathematical Approach to Cosmic Crystall...Finding Ourselves in the Universe_ A Mathematical Approach to Cosmic Crystall...
Finding Ourselves in the Universe_ A Mathematical Approach to Cosmic Crystall...
 
Multi-Band Rejection EMI Shielding
Multi-Band Rejection EMI ShieldingMulti-Band Rejection EMI Shielding
Multi-Band Rejection EMI Shielding
 
SeniorThesisFinal_Biswas
SeniorThesisFinal_BiswasSeniorThesisFinal_Biswas
SeniorThesisFinal_Biswas
 
MyThesis
MyThesisMyThesis
MyThesis
 
Fundamentals of structural dynamics
Fundamentals of structural dynamicsFundamentals of structural dynamics
Fundamentals of structural dynamics
 
My User Experience Journals
My User Experience JournalsMy User Experience Journals
My User Experience Journals
 
Analysis and Simulation of Scienti c Networks
Analysis and Simulation of Scientic NetworksAnalysis and Simulation of Scientic Networks
Analysis and Simulation of Scienti c Networks
 
Real-Time Vowel Synthesis - A Magnetic Resonator Piano Based Project_by_Vasil...
Real-Time Vowel Synthesis - A Magnetic Resonator Piano Based Project_by_Vasil...Real-Time Vowel Synthesis - A Magnetic Resonator Piano Based Project_by_Vasil...
Real-Time Vowel Synthesis - A Magnetic Resonator Piano Based Project_by_Vasil...
 
MSc_thesis_OlegZero
MSc_thesis_OlegZeroMSc_thesis_OlegZero
MSc_thesis_OlegZero
 
Thesis
ThesisThesis
Thesis
 
Lecture notes on mobile communication
Lecture notes on mobile communicationLecture notes on mobile communication
Lecture notes on mobile communication
 
memoire_zarebski_2013
memoire_zarebski_2013memoire_zarebski_2013
memoire_zarebski_2013
 
Tr1546
Tr1546Tr1546
Tr1546
 
FundamentalsofComputerStudies.pdf
FundamentalsofComputerStudies.pdfFundamentalsofComputerStudies.pdf
FundamentalsofComputerStudies.pdf
 
FundamentalsofComputerStudies.pdf
FundamentalsofComputerStudies.pdfFundamentalsofComputerStudies.pdf
FundamentalsofComputerStudies.pdf
 
Introduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabusIntroduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabus
 
introduction to deep learning
introduction to deep learningintroduction to deep learning
introduction to deep learning
 

Recently uploaded

DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
MayuraD1
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
jaanualu31
 

Recently uploaded (20)

Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal load
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
Bridge Jacking Design Sample Calculation.pptx
Bridge Jacking Design Sample Calculation.pptxBridge Jacking Design Sample Calculation.pptx
Bridge Jacking Design Sample Calculation.pptx
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Rums floating Omkareshwar FSPV IM_16112021.pdf
Rums floating Omkareshwar FSPV IM_16112021.pdfRums floating Omkareshwar FSPV IM_16112021.pdf
Rums floating Omkareshwar FSPV IM_16112021.pdf
 
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 

Lecture1 (1).pdf