What Deep Learning Means
for Artificial Intelligence
Jonathan Mugan
Austin Data Geeks
March 11, 2015
@jmugan, www.jonathanmugan.com
AI through the lens of System 1 and System 2
Psychologist Daniel Kahneman in Thinking Fast and Slow describes
humans as having two modes of thought: System 1 and System 2.
System 1: Fast and Parallel
• Subconscious: e.g., face recognition or speech understanding.
• We underestimated how hard it would be to implement. E.g., we thought computer vision would be easy.
• AI systems in these domains have been lacking:
1. Serial computers too slow
2. Lack of training data
3. Didn't have the right algorithms

System 2: Slow and Serial
• Conscious: e.g., when listening to a conversation or making PowerPoint slides.
• We assumed it was the most difficult. E.g., we thought chess was hard.
• AI systems in these domains are useful but limited. Called GOFAI (Good, Old-Fashioned Artificial Intelligence):
1. Search and planning
2. Logic
3. Rule-based systems
This has changed:
1. We now have GPUs and distributed computing
2. We have Big Data
3. We have new algorithms [Bengio et al., 2003; Hinton et al., 2006; Ranzato et al., 2006]
Deep learning begins with a little function
$\mathrm{sum}(x) = \sum_{i=1}^{n} w_i x_i = w^T x$

[Figure: a node that adds up its weighted inputs: weight1 × input1 + weight2 × input2 + weight3 × input3 → sum.]
It all starts with a humble linear function called a perceptron.
Perceptron:
If sum > threshold: output 1
Else: output 0
In math, with $x$ being an input vector and $w$ being a weight vector.
Example: The inputs can be your data. Question: Should I buy this car?
0.2 × gas mileage
0.3 × horsepower
0.5 × num cup holders
→ sum
If sum > threshold: buy car
Else: walk
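Below is a minimal Python sketch of this perceptron decision. The weights come from the slide; the threshold and the feature values are made-up assumptions, purely for illustration.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum exceeds the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

weights = [0.2, 0.3, 0.5]   # gas mileage, horsepower, num cup holders
car = [30.0, 150.0, 4.0]    # hypothetical feature values for one car
threshold = 50.0            # hypothetical decision threshold

print("buy car" if perceptron(car, weights, threshold) else "walk")
```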
These little functions are chained together
Deep learning comes from chaining a bunch of these little functions together. Chained together, they are called neurons.

$\sigma(\mathrm{sum}(x) + b)$ where $\sigma(x) = \dfrac{1}{1 + e^{-x}}$
To create a neuron, we add a nonlinearity to the perceptron to get
extra representational power when we chain them together.
Our nonlinear perceptron is
sometimes called a sigmoid.
The value $b$ just offsets the sigmoid so the center is at 0.
Plot of a sigmoid
Single artificial neuron
[Figure: a single neuron takes weight1 × input1 + weight2 × input2 + weight3 × input3, applies the sigmoid, and produces an output, which can serve as the input to the next neuron.]
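A minimal sketch of such a neuron in Python, reusing the car features from the perceptron example above; the weights, bias, and inputs are illustrative assumptions.

```python
import math

def neuron(inputs, weights, bias):
    """Sigmoid neuron: squash the weighted sum plus bias into (0, 1)."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))

# Hypothetical values, just to show the shape of the computation.
print(neuron([30.0, 150.0, 4.0], [0.2, 0.3, 0.5], bias=-50.0))
```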
Three-layered neural network
A bunch of neurons chained together is called a neural network.
Layer 2: hidden layer. Called
this because it is neither input
nor output.
[16.2, 17.3, −52.3, 11.1]
Layer 3: output. E.g., cat
or not a cat; buy the car or
walk.
Layer 1: input data. Can
be pixel values or the number
of cup holders.
This network has three layers.
(Some edges lighter
to avoid clutter.)
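A minimal NumPy sketch of a forward pass through a three-layer network like the one described above; the layer sizes and the random weights are illustrative assumptions, not learned values.

```python
import numpy as np

np.random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 4 input features, 5 hidden neurons, 1 output neuron.
W1, b1 = np.random.randn(5, 4), np.zeros(5)   # layer 1 -> layer 2 (hidden)
W2, b2 = np.random.randn(1, 5), np.zeros(1)   # layer 2 -> layer 3 (output)

def forward(x):
    hidden = sigmoid(W1 @ x + b1)        # hidden-layer activations
    output = sigmoid(W2 @ hidden + b2)   # e.g., probability of "cat"
    return output

x = np.array([16.2, 17.3, -52.3, 11.1])  # example input vector from the figure
print(forward(x))
```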
Training with supervised learning
Supervised Learning: You show the network a bunch of things with labels saying what they are, and you want the network to learn to classify future things without labels.
Example: here are some
pictures of cats. Tell me
which of these other pictures
are of cats.
To train the network, we want to find the weights that correctly classify all of the training examples. You hope it will also work on the testing examples.
Done with an algorithm
called Backpropagation
[Rumelhart et al., 1986].
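A minimal sketch of gradient-descent training for a single sigmoid neuron, the one-neuron case of the backpropagation idea mentioned above; the toy data, loss, and learning rate are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy labeled data (hypothetical): two features, binary labels (an AND-like rule).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 0.0, 1.0])

w, b, lr = np.zeros(2), 0.0, 0.5

for _ in range(1000):
    pred = sigmoid(X @ w + b)          # forward pass
    error = pred - y                   # gradient of the cross-entropy loss w.r.t. the pre-activation
    w -= lr * (X.T @ error) / len(y)   # push the weights downhill
    b -= lr * error.mean()

print(np.round(sigmoid(X @ w + b), 2))  # predictions after training
```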
Deep learning is adding more layers
There is no exact definition of what constitutes "deep learning."
The number of
weights (parameters)
is generally large.
Some networks have
millions of parameters
that are learned.
(Some edges omitted
to avoid clutter.)
Talk Outline
• Introduction
• Deep learning and natural language processing
• Deep learning and computer vision
• Deep learning and robot actions
• What deep learning still can't do
• Practical ways you can get started
• Conclusion
• References
Recall our standard architecture
Layer 2: hidden layer. Called
this because it is neither input
nor output.
Layer 3: output. E.g., cat
or not a cat; buy the car or
walk.
Layer 1: input data. Can
be pixel values or the number
of cup holders.
[16.2, 17.3, −52.3, 11.1]
Is this a cat?
Neural nets with multiple outputs
Okay, but what kind of cat is it?

Introduce a new node called a softmax:

$P(o_i) = \dfrac{e^{\mathrm{sum}(x_i) + b_i}}{\sum_j e^{\mathrm{sum}(x_j) + b_j}}$

where $j$ varies over all the other nodes at that layer. Just normalize the output $o_i$ over the sum of the other outputs.

[Figure: the network now has four softmax outputs: the probability of a house cat, a lion, a panther, and a bobcat.]
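A minimal NumPy sketch of the softmax normalization described above, applied to hypothetical pre-softmax scores for the four cat classes.

```python
import numpy as np

def softmax(scores):
    """Normalize raw scores into probabilities that sum to 1."""
    exps = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return exps / exps.sum()

classes = ["house cat", "lion", "panther", "bobcat"]
scores = np.array([2.0, -1.0, 0.5, 1.0])  # hypothetical network outputs
for name, p in zip(classes, softmax(scores)):
    print(f"{name}: {p:.2f}")
```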
Learning word vectors
[Figure: word vectors for "the" = [13.2, 5.4, −3.1], "man" = [−12.1, 13.1, 0.1], and "ran" = [7.2, 3.2, −1.9], from the sentence "The man ran fast." Softmax outputs give the probability of "fast," "slow," "taco," and "bobcat" as the next word.]
Learns a vector for each word based on the "meaning" in the sentence by trying to predict the next word [Bengio et al., 2003].
Computationally
expensive because you
need a softmax
node for each word in
the vocabulary.
Recent work models the top
layer using a binary tree
[Mikolov et al., 2013].
These numbers are updated along with the weights and become the vector representations of the words.
Comparing vector and symbolic representations
Vector representation: taco = [17.32, 82.9, −4.6, 7.2]
• Vectors have a similarity score. A taco is not a burrito, but it is similar.
• Vectors have internal structure [Mikolov et al., 2013]: Italy − Rome = France − Paris; King − Queen = Man − Woman.
• Vectors are grounded in experience. Meaning is relative to predictions, and the ability to learn representations makes agents less brittle.

Symbolic representation: taco = taco
• Symbols can only be the same or not. A taco is just as different from a burrito as from a Toyota.
• Symbols have no internal structure.
• Symbols are arbitrarily assigned, so meaning is relative to other symbols.
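A minimal sketch of vector similarity and the analogy arithmetic mentioned above, using tiny made-up vectors rather than real learned embeddings.

```python
import numpy as np

def cosine(a, b):
    """Similarity score between two vectors (1.0 means same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up 3-dimensional "word vectors," purely for illustration.
vecs = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.5, 0.9, 0.0]),
    "woman": np.array([0.5, 0.2, 0.7]),
}

# King - Queen should point the same way as Man - Woman when the analogy holds.
print(cosine(vecs["king"] - vecs["queen"], vecs["man"] - vecs["woman"]))
```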
Learning vectors of longer text
[Figure: a paragraph vector [33.1, 7.5, 11.2] for <current paragraph> alongside the word vectors for "the," "man," and "ran," from the sentence "The man ran fast." Softmax outputs give the probability of "fast," "slow," "taco," and "bobcat" as the next word.]
The "meaning" of a paragraph is encoded in a vector that allows you to predict the next word in the paragraph using already learned word vectors [Le and Mikolov, 2014].
Learning to parse
The dog went to the store.
[Figure: a parse tree over the sentence with NP, VP, and PP constituents.]
Given the semantic representation of words, we can do structure learning.
The system can
learn to group
things that go
together.
Learning to parse
[Figure: word vectors such as [1.1, 1.3], [8.1, 2.1], [2.6, 8.4], [2.0, 7.1], [8.9, 2.4], and [7.1, 2.2] are recursively combined into phrase vectors, e.g., [7.3, 9.8], [3.2, 8.8], and [9.1, 0.2], up to a vector for the whole sentence.]
Given the semantic representation of words, we can do structure learning.
Socher et al. [2010]
learned word
vectors and then
used the Penn Tree
Bank to train a
parser that
encodes semantic
meaning.
The dog went to the store.
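A minimal sketch of the recursive idea behind such a parser: two child vectors are combined by a matrix and a nonlinearity to produce a parent phrase vector. The matrix here is random, purely for illustration; Socher et al. [2010] learn it from data.

```python
import numpy as np

np.random.seed(0)
DIM = 2
W = np.random.randn(DIM, 2 * DIM)   # learned in practice; random here

def compose(left, right):
    """Combine two child vectors into one parent phrase vector."""
    return np.tanh(W @ np.concatenate([left, right]))

the, dog = np.array([1.1, 1.3]), np.array([8.1, 2.1])
the_dog = compose(the, dog)          # vector for the phrase "the dog"
print(the_dog)
```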
Talk Outline
• Introduction
• Deep learning and natural language processing
• Deep learning and computer vision
• Deep learning and robot actions
• What deep learning still can't do
• Practical ways you can get started
• Conclusion
• References
Vision is hard
• Even harder for 3D objects.
• You move a bit, and everything changes.
$\begin{bmatrix} 72 & \cdots & 91 \\ \vdots & \ddots & \vdots \\ 16 & \cdots & 40 \end{bmatrix}$
How a computer sees an image.
Example from MNIST
handwritten digit dataset
[LeCun and Cortes, 1998].
Vision is hard because images are big matrices of numbers.
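A minimal sketch of the point being made: to the computer, an image is just a matrix of pixel intensities, usually flattened into one long input vector. The pixel values below are made up.

```python
import numpy as np

# A hypothetical 3x3 grayscale "image": just numbers to the computer.
image = np.array([[72, 10, 91],
                  [35, 200, 12],
                  [16, 88, 40]])

x = image.flatten()   # the input vector a network would actually see
print(x, x.shape)
```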
Breakthrough: Unsupervised Model
• Big breakthrough in 2006 by Hinton et al.
• Use a network with symmetric weights called a restricted Boltzmann machine.
• Stochastic binary neuron: probabilistically outputs 0 (turns off) or 1 (turns on) based on the weighted input from the units that are on.

$P(s_i = 1) = \dfrac{1}{1 + e^{-\mathrm{sum}(s_i)}}$

• Limit connections to be from one layer to the next.
• Fast because decisions are made locally.
• Trained in an unsupervised way to reproduce the data.

[Figure: two layers of stochastic binary units (each either 0 or 1) connected by symmetric weights.]
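A minimal sketch of one stochastic binary unit as described above; the incoming weights and the states of the other units are made-up values.

```python
import numpy as np

np.random.seed(0)

def stochastic_binary_unit(states, weights, bias=0.0):
    """Turn on (1) with probability sigmoid(weighted input), else stay off (0)."""
    total = states @ weights + bias
    p_on = 1.0 / (1.0 + np.exp(-total))
    return int(np.random.rand() < p_on)

states = np.array([1, 0, 1])           # which connected units are on (hypothetical)
weights = np.array([0.8, -0.4, 0.3])   # hypothetical symmetric weights into this unit
print(stochastic_binary_unit(states, weights))
```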
Stack up the layers to make a deep network
The output of each layer becomes the input to the next layer [Hinton et al., 2006]: the hidden layer of one restricted Boltzmann machine becomes the input data of the next.

See video starting at second 45: https://www.coursera.org/course/neuralnets

[Figure: stacked layers of binary units; the hidden layer of the first machine becomes the input layer of the second.]
Computer vision, scaling up
Unsupervised learning was scaled up
by Honglak Lee et al. [2009] to learn
high-level visual features.
[Lee et al., 2009]
Further scaled up by Quoc Le et al. [2012].
• Used 1,000 machines (16,000 cores) running for 3 days to train 1 billion weights by watching YouTube videos.
• The network learned to identify cats.
• The network wasn't told to look for cats; it naturally learned that cats were integral to online viewing.
• Video on the topic at the NYT: http://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-computers-evidence-of-machine-learning.html
Why is this significant?
To have a grounded understanding
of its environment, an agent must
be able to acquire representations
through experience [Pierce et al.,
1997; Mugan et al., 2012].
Without a grounded understanding,
the agent is limited to what was
programmed in.
We saw that unsupervised learning
could be used to learn the meanings of words, grounded in the
experience of reading.
Using these deep Boltzmann machines, machines can learn
to see the world through experience.
Limit connections and duplicate parameters
Convolutional neural networks build in a kind of feature invariance. They take advantage of the layout of the pixels.
With the layers and topology, our networks are starting to look a little like the visual cortex, although we still don't fully understand the visual cortex.
Different areas
of the image go
to different
parts of the
neural network,
and weights are
shared.
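A minimal NumPy sketch of the weight sharing idea: the same small filter is slid over different areas of the image, so different parts of the input feed different parts of the network while the weights are shared. The image and filter values are random, purely for illustration.

```python
import numpy as np

np.random.seed(0)
image = np.random.rand(6, 6)    # hypothetical grayscale image
filt = np.random.randn(3, 3)    # one shared 3x3 filter (the same weights everywhere)

def convolve(image, filt):
    """Slide the shared filter over the image, producing a feature map."""
    fh, fw = filt.shape
    h, w = image.shape[0] - fh + 1, image.shape[1] - fw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i+fh, j:j+fw] * filt)
    return out

print(convolve(image, filt).shape)   # a 4x4 feature map
```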
Recent deep vision networks
ImageNet http://www.image-net.org/ is a huge collection of images
corresponding to the nouns of the WordNet hierarchy. There are hundreds to
thousands of images per noun.
2012 – Deep Learning begins to dominate image recognition
Krizhevsky et al. [2012] got 16% error on recognizing objects, when
before the best error was 26%. They used a convolutional neural
network.
2015 – Deep Learning surpasses human-level performance
He et al. [2015] surpassed human level performance on recognizing
images of objects.* Computers seem to have an advantage when the
classes of objects are fine grained, such as multiple species of dogs.
Closing note on computer vision: Hinton points out that modern networks can work with just top-down (supervised) learning if the network is small enough relative to the amount of training data; but the goal of AI is broad-based understanding, and there will likely never be enough labeled training data for general intelligence.
*But deep learning can be easily fooled [Nguyen et al., 2014]. Enlightening video at https://www.youtube.com/watch?v=M2IebCN9Ht4.
Talk Outline
• Introduction
• Deep learning and natural language processing
• Deep learning and computer vision
• Deep learning and robot actions
• What deep learning still can't do
• Practical ways you can get started
• Conclusion
• References
A stamping in of behavior
When we think of doing things, we think of conscious planning
with System 2.
Imagine trying to get to Seattle.
• Get to the airport. How? Take a taxi. How? Call a taxi. How? Find my phone.
• Some behaviors arise more from a gradual stamping in [Thorndike, 1898].
• Became the study of Behaviorism [Skinner, 1953] (see the Skinner box on the right).
• Formulated into artificial intelligence as Reinforcement Learning [Sutton and Barto, 1998].

A "Skinner box." Image by Andreas1 (adapted from Image:Boite skinner.jpg) [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0/)], via Wikimedia Commons.
Beginning with random exploration
In reinforcement learning, the agent begins by randomly exploring until it reaches its goal.

Reaching the goal
• When it reaches the goal, credit is propagated back to its previous states.
• The agent learns the function $Q^\pi(s, a)$, which gives the cumulative expected discounted reward of being in state $s$, taking action $a$, and acting according to policy $\pi$ thereafter.

Learning the behavior
Eventually, the agent learns the value of being in each state and taking each action and can therefore always do the best thing in each state.
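A minimal sketch of tabular Q-learning on a tiny chain of states, just to make the update concrete; the toy environment, learning rate, and discount factor are illustrative assumptions.

```python
import numpy as np

np.random.seed(0)
N_STATES, ACTIONS = 5, [0, 1]           # tiny chain; action 0 = left, 1 = right
Q = np.zeros((N_STATES, len(ACTIONS)))  # state-action value table
alpha, gamma = 0.5, 0.9                 # learning rate and discount factor

for _ in range(500):
    s = 0
    while s != N_STATES - 1:            # episode ends at the rightmost (goal) state
        a = np.random.choice(ACTIONS)   # random exploration
        s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(np.round(Q, 2))   # moving right should look better in every state
```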
Playing Atari with Deep Learning
Input, last four frames, where each frame
is downsampled to 84 by 84 pixels.
[Mnih et al., 2013] represent the state-action value function $Q(s, a)$ as a convolutional neural network.

[Figure: the network outputs one value per action: the value of moving left, moving right, shooting, and reloading.]
In [Mnih et al., 2013],
this is actually three
hidden layers.
See some videos at
http://mashable.com/2015/02
/25/computer-wins-at-atari-
games/
Talk Outline
• Introduction
• Deep learning and natural language processing
• Deep learning and computer vision
• Deep learning and robot actions
• What deep learning still can't do
• Practical ways you can get started
• Conclusion
• References
We must go deeper for a robust System 1
Imagine a dude standing
on a table. How would a
computer know that if
you move the table you
also move the dude?
Likewise, how could a
computer know that it
only rains outside?
Or, as Marvin Minsky asks, how could a computer learn
that you can pull a box with a string but not push it?
You couldn't possibly explain all of these situations to a computer. There are just too many variations.
A robot can learn through
experience, but it must be
able to efficiently generalize
that experience.
Abstract thinking through image schemas
Humans efficiently generalize experience using abstractions called
image schemas [Johnson, 1987]. Image schemas map experience
to conceptual structure.
Developmental psychologist Jean Mandler argues that some image schemas are formed before children begin to talk, and that language is eventually built onto this base set of schemas [2004].
• Consider what it means for an object to contain another object, such as for a box to contain a ball.
• The container constrains the movement of the object inside. If the container is moved, the contained object moves.
• These constraints are represented by the container image schema.
• Other image schemas from Mark Johnson's book, The Body in the Mind: path, counterforce, restraint, removal, enablement, attraction, link, cycle, near-far, scale, part-whole, full-empty, matching, surface, object, and collection.
Abstract thinking through metaphors
Love is a journey. Love is war.
Lakoff and Johnson [1980] argue that we understand abstract
concepts through metaphors to physical experience.
Neural networks will likely require advances in both architecture
and size to reach this level of abstraction.
For example, a container can be more than a way of understanding physical constraints; it can be a metaphor used to understand the abstract concept of what an argument is. You could say that someone's argument doesn't hold water, that it is empty, or that it has holes in it.
Talk Outline
• Introduction
• Deep learning and natural language processing
• Deep learning and computer vision
• Deep learning and robot actions
• What deep learning still can't do
• Practical ways you can get started
• Conclusion
• References
Some code to get you started
Google word2vec: vectors trained on roughly 3 billion documents; each word gets a 300-dimensional vector.
https://code.google.com/p/word2vec/
Gensim has an implementation of the learning algorithm in
Python.
http://radimrehurek.com/gensim/models/word2vec.html
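A minimal sketch of training word vectors with gensim's Word2Vec; the toy corpus is made up, real use needs far more text, and parameter names vary across gensim versions, so check the current docs.

```python
from gensim.models import Word2Vec

# Toy corpus purely for illustration; real training needs much more text.
sentences = [["the", "man", "ran", "fast"],
             ["the", "dog", "ran", "to", "the", "store"],
             ["the", "man", "went", "to", "the", "store"]]

# min_count=1 keeps every word in this tiny corpus.
model = Word2Vec(sentences, min_count=1)

print(model.wv.most_similar("man", topn=2))
```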
Theano is a general-purpose deep learning implementation with
great documentation and tutorials.
http://deeplearning.net/software/theano/
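A minimal Theano sketch (a symbolic sigmoid neuron), just to show the flavor of defining and compiling an expression; the weights and inputs are illustrative assumptions, and the tutorials linked above cover real models.

```python
import numpy as np
import theano
import theano.tensor as T

x = T.dvector("x")                       # symbolic input vector
w = theano.shared(np.array([0.2, 0.3, 0.5]), name="w")
b = theano.shared(0.0, name="b")

out = T.nnet.sigmoid(T.dot(w, x) + b)    # symbolic sigmoid neuron
neuron = theano.function([x], out)       # compile the expression into a callable

print(neuron([30.0, 150.0, 4.0]))
```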
Best learning resources
Best place to start: Hinton's Coursera course. Get it right from the horse's mouth. He explains things well.
https://www.coursera.org/course/neuralnets
Online textbook in preparation for deep learning from Yoshua
Bengio and friends. Clear and understandable.
http://www.iro.umontreal.ca/~bengioy/dlbook/
Introduction to programming deep learning with Python and
Theano. It's clear, detailed, and entertaining. 1-hour talk.
http://www.rosebt.com/blog/introduction-to-deep-learning-
with-python
Talk Outline
• Introduction
• Deep learning and natural language processing
• Deep learning and computer vision
• Deep learning and robot actions
• What deep learning still can't do
• Practical ways you can get started
• Conclusion
• References
What deep learning means for artificial intelligence
Deep learning can allow a robot to autonomously
learn a representation through experience with the
world.
When its representation is grounded in experience,
a robot can be autonomous without having to rely
on the intentionality of the human designer.
If we can continue to scale up deep learning to
represent ever higher levels of abstraction, our
robots may view the world in an alien way, but
they will be independently intelligent.
References
1. Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. The Journal of Machine Learning Research, 3:1137–1155, 2003.
2. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. arXiv preprint arXiv:1502.01852, 2015.
3. Geoffrey Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
4. M. Johnson. The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Reason. University of Chicago Press, Chicago, Illinois, USA, 1987.
5. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
6. G. Lakoff and M. Johnson. Metaphors We Live By. University of Chicago Press, Chicago, 1980.
7. Quoc Le and Tomas Mikolov. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 1188–1196, 2014.
8. Q. V. Le, M. A. Ranzato, R. Monga, M. Devin, K. Chen, G. S. Corrado, J. Dean, and A. Ng. Building high-level features using large scale unsupervised learning. In International Conference on Machine Learning (ICML), 2012.
9. Yann LeCun and Corinna Cortes. The MNIST database of handwritten digits, 1998.
References (continued)
10. Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 609–616. ACM, 2009.
11. J. Mandler. The Foundations of Mind, Origins of Conceptual Thought. Oxford University Press, New York, New York, USA, 2004.
12. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, 2013.
13. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
14. J. Mugan and B. Kuipers. Autonomous learning of high-level states and actions in continuous environments. IEEE Transactions on Autonomous Mental Development, 4(1):70–86, 2012.
15. Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. arXiv preprint arXiv:1412.1897, 2014.
16. D. M. Pierce and B. J. Kuipers. Map learning with uninterpreted sensors and effectors. Artificial Intelligence, 92:169–227, 1997.
17. Marc'Aurelio Ranzato, Christopher Poultney, Sumit Chopra, and Yann LeCun. Efficient learning of sparse representations with an energy-based model. In Advances in Neural Information Processing Systems, pages 1137–1144, 2006.
References (continued)
18. David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 2, chapter Learning representations by error propagation, pages 318–362. 1986.
19. Burrhus Frederic Skinner. Science and Human Behavior. Simon and Schuster, 1953.
20. Richard Socher, Christopher D. Manning, and Andrew Y. Ng. Learning continuous phrase representations and syntactic parsing with recursive neural networks. In Proceedings of the NIPS-2010 Deep Learning and Unsupervised Feature Learning Workshop, pages 1–9, 2010.
21. R. S. Sutton and A. G. Barto. Reinforcement Learning. MIT Press, Cambridge, MA, 1998.
22. E. L. Thorndike. Animal Intelligence: An Experimental Study of the Associative Processes in Animals. 1898.
Thanks for listening
Jonathan Mugan
@jmugan
www.jonathanmugan.com
More Related Content

What's hot

[DSC 2016] ็ณปๅˆ—ๆดปๅ‹•๏ผšๆŽๅฎๆฏ… / ไธ€ๅคฉๆžๆ‡‚ๆทฑๅบฆๅญธ็ฟ’
[DSC 2016] ็ณปๅˆ—ๆดปๅ‹•๏ผšๆŽๅฎๆฏ… / ไธ€ๅคฉๆžๆ‡‚ๆทฑๅบฆๅญธ็ฟ’[DSC 2016] ็ณปๅˆ—ๆดปๅ‹•๏ผšๆŽๅฎๆฏ… / ไธ€ๅคฉๆžๆ‡‚ๆทฑๅบฆๅญธ็ฟ’
[DSC 2016] ็ณปๅˆ—ๆดปๅ‹•๏ผšๆŽๅฎๆฏ… / ไธ€ๅคฉๆžๆ‡‚ๆทฑๅบฆๅญธ็ฟ’
ๅฐ็ฃ่ณ‡ๆ–™็ง‘ๅญธๅนดๆœƒ
ย 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
Simplilearn
ย 
ๆŽๅฎๆฏ…/็•ถ่ชž้Ÿณ่™•็†้‡ไธŠๆทฑๅบฆๅญธ็ฟ’
ๆŽๅฎๆฏ…/็•ถ่ชž้Ÿณ่™•็†้‡ไธŠๆทฑๅบฆๅญธ็ฟ’ๆŽๅฎๆฏ…/็•ถ่ชž้Ÿณ่™•็†้‡ไธŠๆทฑๅบฆๅญธ็ฟ’
ๆŽๅฎๆฏ…/็•ถ่ชž้Ÿณ่™•็†้‡ไธŠๆทฑๅบฆๅญธ็ฟ’
ๅฐ็ฃ่ณ‡ๆ–™็ง‘ๅญธๅนดๆœƒ
ย 
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
Simplilearn
ย 

What's hot (20)

Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
ย 
Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)
ย 
Deep Learning
Deep LearningDeep Learning
Deep Learning
ย 
Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!
ย 
Deep Learning: a birds eye view
Deep Learning: a birds eye viewDeep Learning: a birds eye view
Deep Learning: a birds eye view
ย 
[DSC 2016] ็ณปๅˆ—ๆดปๅ‹•๏ผšๆŽๅฎๆฏ… / ไธ€ๅคฉๆžๆ‡‚ๆทฑๅบฆๅญธ็ฟ’
[DSC 2016] ็ณปๅˆ—ๆดปๅ‹•๏ผšๆŽๅฎๆฏ… / ไธ€ๅคฉๆžๆ‡‚ๆทฑๅบฆๅญธ็ฟ’[DSC 2016] ็ณปๅˆ—ๆดปๅ‹•๏ผšๆŽๅฎๆฏ… / ไธ€ๅคฉๆžๆ‡‚ๆทฑๅบฆๅญธ็ฟ’
[DSC 2016] ็ณปๅˆ—ๆดปๅ‹•๏ผšๆŽๅฎๆฏ… / ไธ€ๅคฉๆžๆ‡‚ๆทฑๅบฆๅญธ็ฟ’
ย 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
ย 
Deep learning in Computer Vision
Deep learning in Computer VisionDeep learning in Computer Vision
Deep learning in Computer Vision
ย 
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
ย 
ๆŽๅฎๆฏ…/็•ถ่ชž้Ÿณ่™•็†้‡ไธŠๆทฑๅบฆๅญธ็ฟ’
ๆŽๅฎๆฏ…/็•ถ่ชž้Ÿณ่™•็†้‡ไธŠๆทฑๅบฆๅญธ็ฟ’ๆŽๅฎๆฏ…/็•ถ่ชž้Ÿณ่™•็†้‡ไธŠๆทฑๅบฆๅญธ็ฟ’
ๆŽๅฎๆฏ…/็•ถ่ชž้Ÿณ่™•็†้‡ไธŠๆทฑๅบฆๅญธ็ฟ’
ย 
Deep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image ProcessingDeep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image Processing
ย 
Deep Learning with Python (PyData Seattle 2015)
Deep Learning with Python (PyData Seattle 2015)Deep Learning with Python (PyData Seattle 2015)
Deep Learning with Python (PyData Seattle 2015)
ย 
Deep Learning as a Cat/Dog Detector
Deep Learning as a Cat/Dog DetectorDeep Learning as a Cat/Dog Detector
Deep Learning as a Cat/Dog Detector
ย 
Deep learning: the future of recommendations
Deep learning: the future of recommendationsDeep learning: the future of recommendations
Deep learning: the future of recommendations
ย 
Deep learning
Deep learningDeep learning
Deep learning
ย 
Deep Learning - A Literature survey
Deep Learning - A Literature surveyDeep Learning - A Literature survey
Deep Learning - A Literature survey
ย 
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
ย 
Tutorial on Deep Learning
Tutorial on Deep LearningTutorial on Deep Learning
Tutorial on Deep Learning
ย 
Information Retrieval with Deep Learning
Information Retrieval with Deep LearningInformation Retrieval with Deep Learning
Information Retrieval with Deep Learning
ย 
Machine Learning and Deep Learning with R
Machine Learning and Deep Learning with RMachine Learning and Deep Learning with R
Machine Learning and Deep Learning with R
ย 

Viewers also liked

Best Deep Learning Post from LinkedIn Group
Best Deep Learning Post from LinkedIn Group Best Deep Learning Post from LinkedIn Group
Best Deep Learning Post from LinkedIn Group
Farshid Pirahansiah
ย 
H2O Deep Learning at Next.ML
H2O Deep Learning at Next.MLH2O Deep Learning at Next.ML
H2O Deep Learning at Next.ML
Sri Ambati
ย 
Philosophy of Deep Learning
Philosophy of Deep LearningPhilosophy of Deep Learning
Philosophy of Deep Learning
Melanie Swan
ย 
[Pycon 2015] ์˜ค๋Š˜ ๋‹น์žฅ ๋”ฅ๋Ÿฌ๋‹ ์‹คํ—˜ํ•˜๊ธฐ ์ œ์ถœ์šฉ
[Pycon 2015] ์˜ค๋Š˜ ๋‹น์žฅ ๋”ฅ๋Ÿฌ๋‹ ์‹คํ—˜ํ•˜๊ธฐ ์ œ์ถœ์šฉ[Pycon 2015] ์˜ค๋Š˜ ๋‹น์žฅ ๋”ฅ๋Ÿฌ๋‹ ์‹คํ—˜ํ•˜๊ธฐ ์ œ์ถœ์šฉ
[Pycon 2015] ์˜ค๋Š˜ ๋‹น์žฅ ๋”ฅ๋Ÿฌ๋‹ ์‹คํ—˜ํ•˜๊ธฐ ์ œ์ถœ์šฉ
ํ˜„ํ˜ธ ๊น€
ย 
Deview deep learning-แ„€แ…ตแ†ทแ„Œแ…ฅแ†ผแ„’แ…ด
Deview deep learning-แ„€แ…ตแ†ทแ„Œแ…ฅแ†ผแ„’แ…ดDeview deep learning-แ„€แ…ตแ†ทแ„Œแ…ฅแ†ผแ„’แ…ด
Deview deep learning-แ„€แ…ตแ†ทแ„Œแ…ฅแ†ผแ„’แ…ด
NAVER D2
ย 
Indoor Point Cloud Processing - Deep learning for semantic segmentation of in...
Indoor Point Cloud Processing - Deep learning for semantic segmentation of in...Indoor Point Cloud Processing - Deep learning for semantic segmentation of in...
Indoor Point Cloud Processing - Deep learning for semantic segmentation of in...
CubiCasa
ย 

Viewers also liked (20)

Donner - Deep Learning - Overview and practical aspects
Donner - Deep Learning - Overview and practical aspectsDonner - Deep Learning - Overview and practical aspects
Donner - Deep Learning - Overview and practical aspects
ย 
Best Deep Learning Post from LinkedIn Group
Best Deep Learning Post from LinkedIn Group Best Deep Learning Post from LinkedIn Group
Best Deep Learning Post from LinkedIn Group
ย 
Deep Learning - The Past, Present and Future of Artificial Intelligence
Deep Learning - The Past, Present and Future of Artificial IntelligenceDeep Learning - The Past, Present and Future of Artificial Intelligence
Deep Learning - The Past, Present and Future of Artificial Intelligence
ย 
Transform your Business with AI, Deep Learning and Machine Learning
Transform your Business with AI, Deep Learning and Machine LearningTransform your Business with AI, Deep Learning and Machine Learning
Transform your Business with AI, Deep Learning and Machine Learning
ย 
Deep Learning Computer Build
Deep Learning Computer BuildDeep Learning Computer Build
Deep Learning Computer Build
ย 
Passive stereo vision with deep learning
Passive stereo vision with deep learningPassive stereo vision with deep learning
Passive stereo vision with deep learning
ย 
Deep Learning through Examples
Deep Learning through ExamplesDeep Learning through Examples
Deep Learning through Examples
ย 
H2O Deep Learning at Next.ML
H2O Deep Learning at Next.MLH2O Deep Learning at Next.ML
H2O Deep Learning at Next.ML
ย 
Distributed Deep Learning At Scale On Apache Spark With BigDL
Distributed Deep Learning At Scale On Apache Spark With BigDLDistributed Deep Learning At Scale On Apache Spark With BigDL
Distributed Deep Learning At Scale On Apache Spark With BigDL
ย 
Philosophy of Deep Learning
Philosophy of Deep LearningPhilosophy of Deep Learning
Philosophy of Deep Learning
ย 
์ซ„์ง€๋ง์ž๋”ฅ๋Ÿฌ๋‹2 - CNN RNN ํฌํ•จ๋ฒ„์ „
์ซ„์ง€๋ง์ž๋”ฅ๋Ÿฌ๋‹2 - CNN RNN ํฌํ•จ๋ฒ„์ „์ซ„์ง€๋ง์ž๋”ฅ๋Ÿฌ๋‹2 - CNN RNN ํฌํ•จ๋ฒ„์ „
์ซ„์ง€๋ง์ž๋”ฅ๋Ÿฌ๋‹2 - CNN RNN ํฌํ•จ๋ฒ„์ „
ย 
๋จธ์‹  ๋Ÿฌ๋‹ ์ž…๋ฌธ #1-๋จธ์‹ ๋Ÿฌ๋‹ ์†Œ๊ฐœ์™€ kNN ์†Œ๊ฐœ
๋จธ์‹  ๋Ÿฌ๋‹ ์ž…๋ฌธ #1-๋จธ์‹ ๋Ÿฌ๋‹ ์†Œ๊ฐœ์™€ kNN ์†Œ๊ฐœ๋จธ์‹  ๋Ÿฌ๋‹ ์ž…๋ฌธ #1-๋จธ์‹ ๋Ÿฌ๋‹ ์†Œ๊ฐœ์™€ kNN ์†Œ๊ฐœ
๋จธ์‹  ๋Ÿฌ๋‹ ์ž…๋ฌธ #1-๋จธ์‹ ๋Ÿฌ๋‹ ์†Œ๊ฐœ์™€ kNN ์†Œ๊ฐœ
ย 
Deep Learning for Robotics
Deep Learning for RoboticsDeep Learning for Robotics
Deep Learning for Robotics
ย 
[Pycon 2015] ์˜ค๋Š˜ ๋‹น์žฅ ๋”ฅ๋Ÿฌ๋‹ ์‹คํ—˜ํ•˜๊ธฐ ์ œ์ถœ์šฉ
[Pycon 2015] ์˜ค๋Š˜ ๋‹น์žฅ ๋”ฅ๋Ÿฌ๋‹ ์‹คํ—˜ํ•˜๊ธฐ ์ œ์ถœ์šฉ[Pycon 2015] ์˜ค๋Š˜ ๋‹น์žฅ ๋”ฅ๋Ÿฌ๋‹ ์‹คํ—˜ํ•˜๊ธฐ ์ œ์ถœ์šฉ
[Pycon 2015] ์˜ค๋Š˜ ๋‹น์žฅ ๋”ฅ๋Ÿฌ๋‹ ์‹คํ—˜ํ•˜๊ธฐ ์ œ์ถœ์šฉ
ย 
๋”ฅ๋Ÿฌ๋‹๊ณผ ๊ฐ•ํ™” ํ•™์Šต์œผ๋กœ ๋‚˜๋ณด๋‹ค ์ž˜ํ•˜๋Š” ์ฟ ํ‚ค๋Ÿฐ AI ๊ตฌํ˜„ํ•˜๊ธฐ
๋”ฅ๋Ÿฌ๋‹๊ณผ ๊ฐ•ํ™” ํ•™์Šต์œผ๋กœ ๋‚˜๋ณด๋‹ค ์ž˜ํ•˜๋Š” ์ฟ ํ‚ค๋Ÿฐ AI ๊ตฌํ˜„ํ•˜๊ธฐ๋”ฅ๋Ÿฌ๋‹๊ณผ ๊ฐ•ํ™” ํ•™์Šต์œผ๋กœ ๋‚˜๋ณด๋‹ค ์ž˜ํ•˜๋Š” ์ฟ ํ‚ค๋Ÿฐ AI ๊ตฌํ˜„ํ•˜๊ธฐ
๋”ฅ๋Ÿฌ๋‹๊ณผ ๊ฐ•ํ™” ํ•™์Šต์œผ๋กœ ๋‚˜๋ณด๋‹ค ์ž˜ํ•˜๋Š” ์ฟ ํ‚ค๋Ÿฐ AI ๊ตฌํ˜„ํ•˜๊ธฐ
ย 
From Natural Language Processing to Artificial Intelligence
From Natural Language Processing to Artificial IntelligenceFrom Natural Language Processing to Artificial Intelligence
From Natural Language Processing to Artificial Intelligence
ย 
์ธ๊ณต์ง€๋Šฅ, ๊ธฐ๊ณ„ํ•™์Šต ๊ทธ๋ฆฌ๊ณ  ๋”ฅ๋Ÿฌ๋‹
์ธ๊ณต์ง€๋Šฅ, ๊ธฐ๊ณ„ํ•™์Šต ๊ทธ๋ฆฌ๊ณ  ๋”ฅ๋Ÿฌ๋‹์ธ๊ณต์ง€๋Šฅ, ๊ธฐ๊ณ„ํ•™์Šต ๊ทธ๋ฆฌ๊ณ  ๋”ฅ๋Ÿฌ๋‹
์ธ๊ณต์ง€๋Šฅ, ๊ธฐ๊ณ„ํ•™์Šต ๊ทธ๋ฆฌ๊ณ  ๋”ฅ๋Ÿฌ๋‹
ย 
Deview deep learning-แ„€แ…ตแ†ทแ„Œแ…ฅแ†ผแ„’แ…ด
Deview deep learning-แ„€แ…ตแ†ทแ„Œแ…ฅแ†ผแ„’แ…ดDeview deep learning-แ„€แ…ตแ†ทแ„Œแ…ฅแ†ผแ„’แ…ด
Deview deep learning-แ„€แ…ตแ†ทแ„Œแ…ฅแ†ผแ„’แ…ด
ย 
Indoor Point Cloud Processing - Deep learning for semantic segmentation of in...
Indoor Point Cloud Processing - Deep learning for semantic segmentation of in...Indoor Point Cloud Processing - Deep learning for semantic segmentation of in...
Indoor Point Cloud Processing - Deep learning for semantic segmentation of in...
ย 
๊ธฐ๊ณ„ํ•™์Šต / ๋”ฅ๋Ÿฌ๋‹์ด๋ž€ ๋ฌด์—‡์ธ๊ฐ€
๊ธฐ๊ณ„ํ•™์Šต / ๋”ฅ๋Ÿฌ๋‹์ด๋ž€ ๋ฌด์—‡์ธ๊ฐ€๊ธฐ๊ณ„ํ•™์Šต / ๋”ฅ๋Ÿฌ๋‹์ด๋ž€ ๋ฌด์—‡์ธ๊ฐ€
๊ธฐ๊ณ„ํ•™์Šต / ๋”ฅ๋Ÿฌ๋‹์ด๋ž€ ๋ฌด์—‡์ธ๊ฐ€
ย 

Similar to What Deep Learning Means for Artificial Intelligence

Deep Learning Survey
Deep Learning SurveyDeep Learning Survey
Deep Learning Survey
Anthony Parziale
ย 
NIPS2007: deep belief nets
NIPS2007: deep belief netsNIPS2007: deep belief nets
NIPS2007: deep belief nets
zukun
ย 
Y conf talk - Andrej Karpathy
Y conf talk - Andrej KarpathyY conf talk - Andrej Karpathy
Y conf talk - Andrej Karpathy
Sze Siong Teo
ย 
Cognitive Science Unit 4
Cognitive Science Unit 4Cognitive Science Unit 4
Cognitive Science Unit 4
CSITSansar
ย 

Similar to What Deep Learning Means for Artificial Intelligence (20)

Understanding Deep Learning & Parameter Tuning with MXnet, H2o Package in R
Understanding Deep Learning & Parameter Tuning with MXnet, H2o Package in RUnderstanding Deep Learning & Parameter Tuning with MXnet, H2o Package in R
Understanding Deep Learning & Parameter Tuning with MXnet, H2o Package in R
ย 
Economic Load Dispatch (ELD), Economic Emission Dispatch (EED), Combined Econ...
Economic Load Dispatch (ELD), Economic Emission Dispatch (EED), Combined Econ...Economic Load Dispatch (ELD), Economic Emission Dispatch (EED), Combined Econ...
Economic Load Dispatch (ELD), Economic Emission Dispatch (EED), Combined Econ...
ย 
introduction to deeplearning
introduction to deeplearningintroduction to deeplearning
introduction to deeplearning
ย 
A Study On Deep Learning
A Study On Deep LearningA Study On Deep Learning
A Study On Deep Learning
ย 
Deep Learning Survey
Deep Learning SurveyDeep Learning Survey
Deep Learning Survey
ย 
nlp dl 1.pdf
nlp dl 1.pdfnlp dl 1.pdf
nlp dl 1.pdf
ย 
Intro to deep learning_ matteo alberti
Intro to deep learning_ matteo albertiIntro to deep learning_ matteo alberti
Intro to deep learning_ matteo alberti
ย 
Spike timing dependent plasticity to make robot navigation more intelligent. ...
Spike timing dependent plasticity to make robot navigation more intelligent. ...Spike timing dependent plasticity to make robot navigation more intelligent. ...
Spike timing dependent plasticity to make robot navigation more intelligent. ...
ย 
Abductive commonsense reasoning
Abductive commonsense reasoningAbductive commonsense reasoning
Abductive commonsense reasoning
ย 
NIPS2007: deep belief nets
NIPS2007: deep belief netsNIPS2007: deep belief nets
NIPS2007: deep belief nets
ย 
Implementation Of Back-Propagation Neural Network For Isolated Bangla Speech ...
Implementation Of Back-Propagation Neural Network For Isolated Bangla Speech ...Implementation Of Back-Propagation Neural Network For Isolated Bangla Speech ...
Implementation Of Back-Propagation Neural Network For Isolated Bangla Speech ...
ย 
Implementation Of Back-Propagation Neural Network For Isolated Bangla Speech ...
Implementation Of Back-Propagation Neural Network For Isolated Bangla Speech ...Implementation Of Back-Propagation Neural Network For Isolated Bangla Speech ...
Implementation Of Back-Propagation Neural Network For Isolated Bangla Speech ...
ย 
Artificial intelligence cs607 handouts lecture 11 - 45
Artificial intelligence   cs607 handouts lecture 11 - 45Artificial intelligence   cs607 handouts lecture 11 - 45
Artificial intelligence cs607 handouts lecture 11 - 45
ย 
Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learning
ย 
Muhammad Usman Akhtar | Ph.D Scholar | Wuhan University | School of Co...
Muhammad Usman Akhtar  |  Ph.D Scholar  |  Wuhan  University  |  School of Co...Muhammad Usman Akhtar  |  Ph.D Scholar  |  Wuhan  University  |  School of Co...
Muhammad Usman Akhtar | Ph.D Scholar | Wuhan University | School of Co...
ย 
Y conf talk - Andrej Karpathy
Y conf talk - Andrej KarpathyY conf talk - Andrej Karpathy
Y conf talk - Andrej Karpathy
ย 
Lecture 4 neural networks
Lecture 4 neural networksLecture 4 neural networks
Lecture 4 neural networks
ย 
Cognitive Science Unit 4
Cognitive Science Unit 4Cognitive Science Unit 4
Cognitive Science Unit 4
ย 
Deep learning tutorial 9/2019
Deep learning tutorial 9/2019Deep learning tutorial 9/2019
Deep learning tutorial 9/2019
ย 
Deep Learning Tutorial
Deep Learning TutorialDeep Learning Tutorial
Deep Learning Tutorial
ย 

More from Jonathan Mugan

How to build someone we can talk to
How to build someone we can talk toHow to build someone we can talk to
How to build someone we can talk to
Jonathan Mugan
ย 
Moving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow ExtendedMoving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow Extended
Jonathan Mugan
ย 

More from Jonathan Mugan (6)

How to build someone we can talk to
How to build someone we can talk toHow to build someone we can talk to
How to build someone we can talk to
ย 
Moving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow ExtendedMoving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow Extended
ย 
Generating Natural-Language Text with Neural Networks
Generating Natural-Language Text with Neural NetworksGenerating Natural-Language Text with Neural Networks
Generating Natural-Language Text with Neural Networks
ย 
Data Day Seattle, From NLP to AI
Data Day Seattle, From NLP to AIData Day Seattle, From NLP to AI
Data Day Seattle, From NLP to AI
ย 
Data Day Seattle, Chatbots from First Principles
Data Day Seattle, Chatbots from First PrinciplesData Day Seattle, Chatbots from First Principles
Data Day Seattle, Chatbots from First Principles
ย 
Chatbots from first principles
Chatbots from first principlesChatbots from first principles
Chatbots from first principles
ย 

Recently uploaded

Recently uploaded (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
ย 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
ย 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
ย 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
ย 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
ย 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
ย 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
ย 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
ย 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
ย 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
ย 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ย 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
ย 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
ย 
๐Ÿฌ The future of MySQL is Postgres ๐Ÿ˜
๐Ÿฌ  The future of MySQL is Postgres   ๐Ÿ˜๐Ÿฌ  The future of MySQL is Postgres   ๐Ÿ˜
๐Ÿฌ The future of MySQL is Postgres ๐Ÿ˜
ย 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
ย 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
ย 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
ย 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
ย 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
ย 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
ย 

What Deep Learning Means for Artificial Intelligence

  • 1. What Deep Learning Means for Artificial Intelligence Jonathan Mugan Austin Data Geeks March 11, 2015 @jmugan, www.jonathanmugan.com
  • 2. AI through the lens of System 1 and System 2 Psychologist Daniel Kahneman in Thinking Fast and Slow describes humans as having two modes of thought: System 1 and System 2. System 1: Fast and Parallel System 2: Slow and Serial AI systems in these domains are useful but limited. Called GOFAI (Good, Old- Fashioned Artificial Intelligence). 1. Search and planning 2. Logic 3. Rule-based systems AI systems in these domains have been lacking. 1. Serial computers too slow 2. Lack of training data 3. Didnโ€™t have the right algorithms Subconscious: E.g., face recognition or speech understanding. Conscious: E.g., when listening to a conversation or making PowerPoint slides. We underestimated how hard it would be to implement. E.g., we thought computer vision would be easy. We assumed it was the most difficult. E.g., we thought chess was hard.
  • 3. AI through the lens of System 1 and System 2 Psychologist Daniel Kahneman in Thinking Fast and Slow describes humans as having two modes of thought: System 1 and System 2. System 1: Fast and Parallel System 2: Slow and Serial AI systems in these domains are useful but limited. Called GOFAI (Good, Old- Fashioned Artificial Intelligence). 1. Search and planning 2. Logic 3. Rule-based systems Subconscious: E.g., face recognition or speech understanding. Conscious: E.g., when listening to a conversation or making PowerPoint slides. We underestimated how hard it would be to implement. E.g., we thought computer vision would be easy. We assumed it was the most difficult. E.g., we thought chess was hard. This has changed 1. We now have GPUs and distributed computing 2. We have Big Data 3. We have new algorithms [Bengio et al., 2003; Hinton et al., 2006; Ranzato et al., 2006]
  • 4. Deep learning begins with a little function sum ๐‘ฅ = ๐‘–=1 ๐‘› ๐‘ค๐‘– ๐‘ฅ๐‘– = ๐‘ค ๐‘‡ ๐‘ฅ ๐‘ค๐‘’๐‘–๐‘”โ„Ž๐‘ก1 ร— ๐‘–๐‘›๐‘๐‘ข๐‘ก1 ๐‘ค๐‘’๐‘–๐‘”โ„Ž๐‘ก2 ร— ๐‘–๐‘›๐‘๐‘ข๐‘ก2 ๐‘ค๐‘’๐‘–๐‘”โ„Ž๐‘ก3 ร— ๐‘–๐‘›๐‘๐‘ข๐‘ก3+ ๐‘ ๐‘ข๐‘š It all starts with a humble linear function called a perceptron. Perceptron: If ๐‘ ๐‘ข๐‘š > ๐‘กโ„Ž๐‘Ÿ๐‘’๐‘ โ„Ž๐‘œ๐‘™๐‘‘: output 1 Else: output 0 In math, with ๐‘ฅ being an input vector and ๐‘ค being a weight vector. Example: The inputs can be your data. Question: Should I buy this car? 0.2 ร— ๐‘”๐‘Ž๐‘  ๐‘š๐‘–๐‘™๐‘Ž๐‘”๐‘’ 0.3 ร— โ„Ž๐‘œ๐‘Ÿ๐‘ ๐‘’ ๐‘๐‘œ๐‘ค๐‘’๐‘Ÿ 0.5 ร— ๐‘›๐‘ข๐‘š ๐‘๐‘ข๐‘ โ„Ž๐‘œ๐‘™๐‘‘๐‘’๐‘Ÿ๐‘ + ๐‘ ๐‘ข๐‘š If ๐‘ ๐‘ข๐‘š > ๐‘กโ„Ž๐‘Ÿ๐‘’๐‘ โ„Ž๐‘œ๐‘™๐‘‘: buy car Else: walk
  • 5. These little functions are chained together Deep learning comes from chaining a bunch of these little functions together. Chained together, they are called neurons. ๐œŽ ๐‘ ๐‘ข๐‘š ๐‘ฅ + ๐‘ where ๐œŽ ๐‘ฅ = 1 1+ 1 ๐‘’ To create a neuron, we add a nonlinearity to the perceptron to get extra representational power when we chain them together. Our nonlinear perceptron is sometimes called a sigmoid. The value ๐‘ just offsets the sigmoid so the center is at 0. Plot of a sigmoid
  • 6. Single artificial neuron ๐‘ค๐‘’๐‘–๐‘”โ„Ž๐‘ก1 ร— ๐‘–๐‘›๐‘๐‘ข๐‘ก1 ๐‘ค๐‘’๐‘–๐‘”โ„Ž๐‘ก2 ร— ๐‘–๐‘›๐‘๐‘ข๐‘ก2 ๐‘ค๐‘’๐‘–๐‘”โ„Ž๐‘ก3 ร— ๐‘–๐‘›๐‘๐‘ข๐‘ก3 Output, or input to next neuron
  • 7. Three-layered neural network A bunch of neurons chained together is called a neural network. Layer 2: hidden layer. Called this because it is neither input nor output. [16.2, 17.3, โˆ’52.3, 11.1] Layer 3: output. E.g., cat or not a cat; buy the car or walk. Layer 1: input data. Can be pixel values or the number of cup holders. This network has three layers. (Some edges lighter to avoid clutter.)
  • 8. Training with supervised learning Supervised Learning: You show the network a bunch of things with a labels saying what they are, and you want the network to learn to classify future things without labels. Example: here are some pictures of cats. Tell me which of these other pictures are of cats. To train the network, want to find the weights that correctly classify all of the training examples. You hope it will work on the testing examples. Done with an algorithm called Backpropagation [Rumelhart et al., 1986]. [16.2, 17.3, โˆ’52.3, 11.1]
  • 9. Deep learning is adding more layers There is no exact definition of what constitutes โ€œdeep learning.โ€ The number of weights (parameters) is generally large. Some networks have millions of parameters that are learned. [16.2, 17.3, โˆ’52.3, 11.1] (Some edges omitted to avoid clutter.)
  • 10. Talk Outline โ€ข Introduction โ€ข Deep learning and natural language processing โ€ข Deep learning and computer vision โ€ข Deep learning and robot actions โ€ข What deep learning still canโ€™t do โ€ข Practical ways you can get started โ€ข Conclusion โ€ข References
  • 11. Talk Outline โ€ข Introduction โ€ข Deep learning and natural language processing โ€ข Deep learning and computer vision โ€ข Deep learning and robot actions โ€ข What deep learning still canโ€™t do โ€ข Practical ways you can get started โ€ข Conclusion โ€ข References
  • 12. Recall our standard architecture Layer 2: hidden layer. Called this because it is neither input nor output. Layer 3: output. E.g., cat or not a cat; buy the car or walk. Layer 1: input data. Can be pixel values or the number of cup holders. [16.2, 17.3, โˆ’52.3, 11.1] Is this a cat?
  • 13. Neural nets with multiple outputs Okay, but what kind of cat is it? ๐‘ƒ(๐‘ฅ) [16.2, 17.3, โˆ’52.3, 11.1] ๐‘ƒ ๐‘œ๐‘– = ๐‘’ ๐‘ ๐‘ข๐‘š(๐‘ฅ ๐‘–)+๐‘ ๐‘– ๐‘— ๐‘’ ๐‘ ๐‘ข๐‘š ๐‘ฅ ๐‘— +๐‘ ๐‘— ๐‘ƒ(๐‘ฅ)๐‘ƒ(๐‘ฅ) ๐‘ƒ(๐‘ฅ) ๐‘ƒ(๐‘ฅ) Introduce a new node called a softmax. Probability a house cat Probability a lion Probability a panther Probability a bobcat Where ๐‘— varies over all the other nodes at that layer. Just normalize the output ๐‘œ๐‘– over the sum of the other outputs.
  • 14. Learning word vectors 13.2, 5.4, โˆ’3.1 [โˆ’12.1, 13.1, 0.1] [7.2, 3.2,-1.9] the man ran From the sentence, โ€œThe man ran fast.โ€ ๐‘ƒ(๐‘ฅ)๐‘ƒ(๐‘ฅ) ๐‘ƒ(๐‘ฅ) ๐‘ƒ(๐‘ฅ) Probability of โ€œfastโ€ Probability of โ€œslowโ€ Probability of โ€œtacoโ€ Probability of โ€œbobcatโ€ Learns a vector for each word based on the โ€œmeaningโ€ in the sentence by trying to predict the next word [Bengio et al., 2003]. Computationally expensive because you need a softmax node for each word in the vocabulary. Recent work models the top layer using a binary tree [Mikolov et al., 2013]. These numbers updated along with the weights and become the vector representations of the words.
  • 15. Comparing vector and symbolic representations Vector representation taco = [17.32, 82.9, โˆ’4.6, 7.2] Symbolic representation taco = ๐‘ก๐‘Ž๐‘๐‘œ โ€ข Vectors have a similarity score. โ€ข A taco is not a burrito but similar. โ€ข Symbols can be the same or not. โ€ข A taco is just as different from a burrito as a Toyota. โ€ข Vectors have internal structure [Mikolov et al., 2013]. โ€ข Italy โ€“ Rome = France โ€“ Paris โ€ข King โ€“ Queen = Man โ€“ Woman โ€ข Symbols have no structure. โ€ข Symbols are arbitrarily assigned. โ€ข Meaning relative to other symbols. โ€ข Vectors are grounded in experience. โ€ข Meaning relative to predictions. โ€ข Ability to learn representations makes agents less brittle.
  • 16. Learning vectors of longer text [33.1, 7.5, 11.2] 13.2, 5.4, โˆ’3.1 [โˆ’12.1, 13.1, 0.1] [7.2, 3.2,-1.9] <current paragraph> the man ran From the sentence, โ€œThe man ran fast.โ€ ๐‘ƒ(๐‘ฅ)๐‘ƒ(๐‘ฅ) ๐‘ƒ(๐‘ฅ) ๐‘ƒ(๐‘ฅ) Probability of โ€œfastโ€ Probability of โ€œslowโ€ Probability of โ€œtacoโ€ Probability of โ€œbobcatโ€ The โ€œmeaningโ€ of a paragraph is encoded in a vector that allows you to predict the next word in the paragraph using already learned word vectors [Le and Mikolov, 2014].
  • 17. Learning to parse The dog went to the store. NP NP PP VP PPGiven the semantic representation of words, we can do structure learning. The system can learn to group things that go together.
  • 18. Learning to parse [1.1,1.3] [8.1, 2.1] [2.6, 8.4] [2.0, 7.1] [8.9, 2.4] [7.1, 2.2] [1.1, 1.3] [1.1, 1.3] [7.3, 9.8] [3.2, 8.8] [9.1, 0.2]Given the semantic representation of words we can do structure learning. Socher et al. [2010] learned word vectors and then used the Penn Tree Bank to train a parser that encodes semantic meaning. The dog went to the store.
  • 19. Talk Outline โ€ข Introduction โ€ข Deep learning and natural language processing โ€ข Deep learning and computer vision โ€ข Deep learning and robot actions โ€ข What deep learning still canโ€™t do โ€ข Practical ways you can get started โ€ข Conclusion โ€ข References
  • 20. Talk Outline โ€ข Introduction โ€ข Deep learning and natural language processing โ€ข Deep learning and computer vision โ€ข Deep learning and robot actions โ€ข What deep learning still canโ€™t do โ€ข Practical ways you can get started โ€ข Conclusion โ€ข References
  • 21. Vision is hard โ€ข Even harder for 3D objects. โ€ข You move a bit, and everything changes. 72 โ‹ฏ 91 โ‹ฎ โ‹ฑ โ‹ฎ 16 โ‹ฏ 40 How a computer sees an image. Example from MNIST handwritten digit dataset [LeCun and Cortes, 1998]. Vision is hard because images are big matrices of numbers.
  • 22. Breakthrough: Unsupervised Model โ€ข Big breakthrough in 2006 by Hinton et al. โ€ข Use a network with symmetric weights called a restricted Boltzmann machine. โ€ข Stochastic binary neuron. โ€ข Probabilistically outputs 0 (turns off) or 1 (turns on) based on the weight of the inputs from on units. 0 1 ๐‘ƒ ๐‘ ๐‘– = 1 = 1 1 + 1 ๐‘’ ๐‘ ๐‘ข๐‘š(๐‘  ๐‘–) โ€ข Limit connections to be from one layer to the next. โ€ข Fast because decisions are made locally. โ€ข Trained in an unsupervised way to reproduce the data. 0 1 0 1 0 1 0 1 0 1 0 1
  • 23. Stack up the layers to make a deep network Input data Hidden The output of each layer becomes the input to the next layer [Hinton et al., 2006]. See video starting at second 45 https://www.coursera.org /course/neuralnets 0 1 0 1 0 1 0 1 0 1 0 1 Input data Hidden 0 1 0 1 0 1 0 1 0 1 0 1 Hidden layer becomes input data of next layer.
  • 24. Computer vision, scaling up Unsupervised learning was scaled up by Honglak Lee et al. [2009] to learn high-level visual features. [Lee et al., 2009] Further scaled up by Quoc Le et al. [2012]. โ€ข Used 1,000 machines (16,000 cores) running for 3 days to train 1 billion weights by watching YouTube videos. โ€ข The network learned to identify cats. โ€ข The network wasnโ€™t told to look for cats, it naturally learned that cats were integral to online viewing. โ€ข Video on the topic at NYT http://www.nytimes.com/2012/06/26/tec hnology/in-a-big-network-of-computers- evidence-of-machine-learning.html
  • 25. Why is this significant? To have a grounded understanding of its environment, an agent must be able to acquire representations through experience [Pierce et al., 1997; Mugan et al., 2012]. Without a grounded understanding, the agent is limited to what was programmed in. We saw that unsupervised learning could be used to learn the meanings of words, grounded in the experience of reading. Using these deep Boltzmann machines, machines can learn to see the world through experience.
  • 26. Limit connections and duplicate parameters Convolutional neural networks build in a kind of feature invariance. They take advantage the layout of the pixels. [16.2, 17.3, โˆ’52.3, 11.1] With the layers and topology, our networks are starting to look a little like the visual cortex. Although, we still donโ€™t fully understand the visual cortex. Different areas of the image go to different parts of the neural network, and weights are shared.
  • 27. Recent deep vision networks ImageNet (http://www.image-net.org/) is a huge collection of images corresponding to the nouns of the WordNet hierarchy, with hundreds to thousands of images per noun. 2012 – Deep learning begins to dominate image recognition: Krizhevsky et al. [2012] got 16% error on recognizing objects with a convolutional neural network, when the previous best error was 26%. 2015 – Deep learning surpasses human-level performance: He et al. [2015] surpassed human-level performance on recognizing images of objects.* Computers seem to have an advantage when the classes of objects are fine-grained, such as multiple species of dogs. Closing note on computer vision: Hinton points out that modern networks can work purely top-down (supervised learning) if the network is small enough relative to the amount of training data; but the goal of AI is broad-based understanding, and there will likely never be enough labeled training data for general intelligence. *But deep learning can be easily fooled [Nguyen et al., 2014]. Enlightening video at https://www.youtube.com/watch?v=M2IebCN9Ht4.
  • 28. Talk Outline • Introduction • Deep learning and natural language processing • Deep learning and computer vision • Deep learning and robot actions • What deep learning still can't do • Practical ways you can get started • Conclusion • References
  • 29. Talk Outline • Introduction • Deep learning and natural language processing • Deep learning and computer vision • Deep learning and robot actions • What deep learning still can't do • Practical ways you can get started • Conclusion • References
  • 30. A stamping in of behavior When we think of doing things, we think of conscious planning with System 2. Imagine trying to get to Seattle. • Get to the airport. How? Take a taxi. How? Call a taxi. How? Find my phone. • Some behaviors arise more from a gradual stamping in [Thorndike, 1898]. • This became the study of Behaviorism [Skinner, 1953] (see the "Skinner box" on the right). • It was formulated in artificial intelligence as Reinforcement Learning [Sutton and Barto, 1998]. Image: a "Skinner box," by Andreas1 (adapted from Image:Boite skinner.jpg) [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0/)], via Wikimedia Commons.
  • 31. Beginning with random exploration In reinforcement learning, the agent begins by randomly exploring until it reaches its goal.
  • 32. โ€ข When it reaches the goal, credit is propagated back to its previous states. โ€ข The agent learns the function ๐‘„ ๐œ‹(๐‘ , ๐‘Ž), which gives the cumulative expected discounted reward of being in state ๐‘  and taking action ๐‘Ž and acting according to policy ๐œ‹ thereafter. Reaching the goal
  • 33. Learning the behavior Eventually, the agent learns the value of being in each state and taking each action, and it can therefore always do the best thing in each state.
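The update behind this is easiest to see in the tabular case. Below is a minimal Q-learning sketch (my assumption; the slides do not name a specific algorithm) on a toy 5-state corridor whose goal is the right-most state. The environment, rewards, and hyperparameters are made up for illustration.

```python
import numpy as np

n_states, n_actions = 5, 2      # a toy corridor; actions: 0 = left, 1 = right
goal = n_states - 1
alpha, gamma, epsilon = 0.5, 0.9, 0.1
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != goal:
        # Random exploration at first (epsilon-greedy), greedy otherwise.
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next = max(0, s - 1) if a == 0 else min(goal, s + 1)
        r = 1.0 if s_next == goal else 0.0
        # Credit propagates back from the goal through the Q-values.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(Q)                      # value of each action in each state
print(np.argmax(Q, axis=1))   # greedy action per state: move right (1) before the goal
```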
  • 34. Playing Atari with Deep Learning [Mnih et al., 2013] represent the state-action value function $Q(s, a)$ as a convolutional neural network. Input: the last four frames, each downsampled to 84 by 84 pixels. Output: one value per action, e.g., the value of moving left, the value of moving right, the value of shooting, and the value of reloading. In [Mnih et al., 2013], this is actually three hidden layers. See some videos at http://mashable.com/2015/02/25/computer-wins-at-atari-games/
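As a rough sketch (my simplification, not the architecture of Mnih et al.), the key interface is a function from a stack of frames to one Q-value per action; here the convolutional layers are replaced by a single made-up linear layer just to show the shapes, and action selection is greedy over the outputs.

```python
import numpy as np

n_actions = 4                         # e.g., left, right, shoot, reload
frames = np.random.rand(4, 84, 84)    # last four downsampled frames (made up)

# Stand-in for the convolutional network: a single random linear layer
# mapping the flattened frames to one estimated value per action.
W = np.random.randn(n_actions, frames.size) * 0.001

def q_values(frames, W):
    return W @ frames.reshape(-1)     # one estimated value per action

q = q_values(frames, W)
best_action = int(np.argmax(q))       # act greedily with respect to Q
print(q, best_action)
```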
  • 35. Talk Outline • Introduction • Deep learning and natural language processing • Deep learning and computer vision • Deep learning and robot actions • What deep learning still can't do • Practical ways you can get started • Conclusion • References
  • 36. Talk Outline • Introduction • Deep learning and natural language processing • Deep learning and computer vision • Deep learning and robot actions • What deep learning still can't do • Practical ways you can get started • Conclusion • References
  • 37. We must go deeper for a robust System 1 Imagine a dude standing on a table. How would a computer know that if you move the table you also move the dude? Likewise, how could a computer know that it only rains outside? Or, as Marvin Minsky asks, how could a computer learn that you can pull a box with a string but not push it?
  • 38. We must go deeper for a robust System 1 Imagine a dude standing on a table. How would a computer know that if you move the table you also move the dude? Likewise, how could a computer know that it only rains outside? Or, as Marvin Minsky asks, how could a computer learn that you can pull a box with a string but not push it? You couldn't possibly explain all of these situations to a computer; there are just too many variations. A robot can learn through experience, but it must be able to efficiently generalize that experience.
  • 39. Abstract thinking through image schemas Humans efficiently generalize experience using abstractions called image schemas [Johnson, 1987]. Image schemas map experience to conceptual structure. Developmental psychologist Jean Mandler argues that some image schemas are formed before children begin to talk, and that language is eventually built onto this base set of schemas [2004]. • Consider what it means for an object to contain another object, such as for a box to contain a ball. • The container constrains the movement of the object inside. If the container is moved, the contained object moves. • These constraints are represented by the container image schema. • Other image schemas from Mark Johnson's book, The Body in the Mind: path, counterforce, restraint, removal, enablement, attraction, link, cycle, near-far, scale, part-whole, full-empty, matching, surface, object, and collection.
  • 40. Abstract thinking through metaphors Lakoff and Johnson [1980] argue that we understand abstract concepts through metaphors to physical experience: love is a journey; love is war. For example, a container can be more than a way of understanding physical constraints; it can be a metaphor used to understand the abstract concept of what an argument is. You could say that someone's argument doesn't hold water, that it is empty, or that it has holes in it. Neural networks will likely require advances in both architecture and size to reach this level of abstraction.
  • 41. Talk Outline • Introduction • Deep learning and natural language processing • Deep learning and computer vision • Deep learning and robot actions • What deep learning still can't do • Practical ways you can get started • Conclusion • References
  • 42. Talk Outline • Introduction • Deep learning and natural language processing • Deep learning and computer vision • Deep learning and robot actions • What deep learning still can't do • Practical ways you can get started • Conclusion • References
  • 43. Some code to get you started Google word2vec: they looked at roughly 3 billion documents and released word vectors of length 300. https://code.google.com/p/word2vec/ Gensim has an implementation of the learning algorithm in Python (a short usage sketch follows). http://radimrehurek.com/gensim/models/word2vec.html Theano is a general-purpose deep learning implementation with great documentation and tutorials. http://deeplearning.net/software/theano/
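A minimal sketch of training and querying word vectors with gensim's Word2Vec. The toy sentences are made up, and parameter names vary across gensim versions (older releases use size= where newer ones use vector_size=).

```python
from gensim.models import Word2Vec  # assumes gensim is installed

sentences = [["the", "dog", "went", "to", "the", "store"],
             ["the", "cat", "sat", "on", "the", "mat"]]

# Train a small word2vec model on the toy corpus.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1)

print(model.wv["dog"])               # the learned 50-dimensional vector for "dog"
print(model.wv.most_similar("dog"))  # nearest words in the learned vector space
```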
  • 44. Best learning resources Best place to start: Hinton's Coursera course. Get it right from the horse's mouth; he explains things well. https://www.coursera.org/course/neuralnets An online textbook on deep learning, in preparation, from Yoshua Bengio and friends. Clear and understandable. http://www.iro.umontreal.ca/~bengioy/dlbook/ An introduction to programming deep learning with Python and Theano; it's clear, detailed, and entertaining. A 1-hour talk. http://www.rosebt.com/blog/introduction-to-deep-learning-with-python
  • 45. Talk Outline • Introduction • Deep learning and natural language processing • Deep learning and computer vision • Deep learning and robot actions • What deep learning still can't do • Practical ways you can get started • Conclusion • References
  • 46. Talk Outline • Introduction • Deep learning and natural language processing • Deep learning and computer vision • Deep learning and robot actions • What deep learning still can't do • Practical ways you can get started • Conclusion • References
  • 47. What deep learning means for artificial intelligence Deep learning can allow a robot to autonomously learn a representation through experience with the world. When its representation is grounded in experience, a robot can be autonomous without having to rely on the intentionality of the human designer. If we can continue to scale up deep learning to represent ever higher levels of abstraction, our robots may view the world in an alien way, but they will be independently intelligent.
  • 48. References 1. Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. The Journal of Machine Learning Research, 3:1137–1155, 2003. 2. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. arXiv preprint arXiv:1502.01852, 2015. 3. Geoffrey Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006. 4. M. Johnson. The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Reason. University of Chicago Press, Chicago, Illinois, USA, 1987. 5. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012. 6. G. Lakoff and M. Johnson. Metaphors We Live By. University of Chicago Press, Chicago, 1980. 7. Quoc Le and Tomas Mikolov. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 1188–1196, 2014. 8. Q.V. Le, M.A. Ranzato, R. Monga, M. Devin, K. Chen, G.S. Corrado, J. Dean, and A. Ng. Building high-level features using large scale unsupervised learning. In International Conference on Machine Learning (ICML), 2012. 9. Yann LeCun and Corinna Cortes. The MNIST database of handwritten digits, 1998.
  • 49. References (continued) 10. Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 609–616. ACM, 2009. 11. J. Mandler. The Foundations of Mind, Origins of Conceptual Thought. Oxford University Press, New York, New York, USA, 2004. 12. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, 2013. 13. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013. 14. J. Mugan and B. Kuipers. Autonomous learning of high-level states and actions in continuous environments. IEEE Trans. Autonomous Mental Development, 4(1):70–86, 2012. 15. Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. arXiv preprint arXiv:1412.1897, 2014. 16. D. M. Pierce and B. J. Kuipers. Map learning with uninterpreted sensors and effectors. Artificial Intelligence, 92:169–227, 1997. 17. Marc'Aurelio Ranzato, Christopher Poultney, Sumit Chopra, and Yann LeCun. Efficient learning of sparse representations with an energy-based model. In Advances in Neural Information Processing Systems, pages 1137–1144, 2006.
  • 50. References (continued) 18. David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1, chapter Learning representations by error propagation, pages 318–362. 1986. 19. Burrhus Frederic Skinner. Science and Human Behavior. Simon and Schuster, 1953. 20. Richard Socher, Christopher D Manning, and Andrew Y Ng. Learning continuous phrase representations and syntactic parsing with recursive neural networks. In Proceedings of the NIPS-2010 Deep Learning and Unsupervised Feature Learning Workshop, pages 1–9, 2010. 21. R. S. Sutton and A. G. Barto. Reinforcement Learning. MIT Press, Cambridge MA, 1998. 22. E.L. Thorndike. Animal intelligence: an experimental study of the associative processes in animals. 1898.
  • 51. Thanks for listening Jonathan Mugan @jmugan www.jonathanmugan.com