What Deep Learning Means for
Artificial Intelligence
Jonathan Mugan, PhD
Encore Presentation to Austin Data Geeks
November 1, 2016
AI through the lens of System 1 and System 2
Psychologist Daniel Kahneman, in Thinking, Fast and Slow, describes
humans as having two modes of thought: System 1 and System 2.
System 2: Slow and Serial
Conscious: E.g., when listening to a conversation or making PowerPoint slides.
We assumed it was the most difficult. E.g., we thought chess was hard.
AI systems in these domains are useful but limited. Called GOFAI (Good, Old-Fashioned Artificial Intelligence).
1. Search and planning
2. Logic
3. Rule-based systems
System 1: Fast and Parallel
Subconscious: E.g., face recognition or speech understanding.
We underestimated how hard it would be to implement. E.g., we thought computer vision would be easy.
AI systems in these domains have been lacking.
1. Serial computers too slow
2. Lack of training data
3. Didn’t have the right algorithms
This has changed
1. We now have GPUs and
distributed computing
2. We have Big Data
3. We have new algorithms [Bengio et
al., 2003; Hinton et al., 2006; Ranzato et
al., 2006]
Deep learning begins with a little function
It all starts with a humble linear function called a perceptron.
weight1 ✖ input1
weight2 ✖ input2
weight3 ✖ input3
sum
✚
Perceptron:
If sum > threshold: output 1
Else: output 0
Example: The inputs can be your data. Question: Should I buy this car?
0.2 ✖ gas mileage
0.3 ✖ horsepower
0.5 ✖ num cup holders
sum
✚
Perceptron:
If sum > threshold: buy
Else: walk
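The car example above can be sketched in a few lines of Python. The weights come from the slide; the feature values (rescaled to the range 0 to 1) and the threshold of 0.6 are assumptions made up for illustration.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

# Weights from the slide: gas mileage, horsepower, number of cup holders.
weights = [0.2, 0.3, 0.5]
# Hypothetical car; each feature is assumed to be rescaled to the range 0..1.
car = [0.8, 0.4, 1.0]
# The threshold is an assumed value, not one given in the talk.
decision = "buy" if perceptron(car, weights, threshold=0.6) == 1 else "walk"
print(decision)
```

In practice the weights and the threshold are not picked by hand; they are what training adjusts.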
These little functions are chained together
Deep learning comes from chaining a bunch of these little
functions together. Chained together, they are called
neurons.
To create a neuron, we add a nonlinearity to the perceptron to get
extra representational power when we chain them together.
Our nonlinear perceptron is
sometimes called a sigmoid.
Plot of a sigmoid: output = 1 / (1 + e^(−t)),
where t = weight1 ✖ input1 + weight2 ✖ input2 + weight3 ✖ input3 + b.
The value b just offsets the sigmoid so the
center is at 0.
Single artificial neuron
Output, or input to
next neuron
weight1 ✖ input1
weight2 ✖ input2
weight3 ✖ input3
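A single artificial neuron can be sketched as a weighted sum followed by the standard logistic sigmoid. The function names and the example numbers below are illustrative assumptions, not taken from the talk.

```python
import math

def sigmoid(z):
    """Standard logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, b):
    """Weighted sum of the inputs plus an offset b, passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + b
    return sigmoid(z)

# Hypothetical inputs and offset, reusing the weights from the car example.
output = neuron([1.0, 1.0, 1.0], [0.2, 0.3, 0.5], b=-1.0)
print(output)
```

Unlike the perceptron's hard 0 or 1, the output changes smoothly with the weights, which is what makes gradient-based training possible.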
Three-layered neural network
A bunch of neurons chained together is called a neural network.
Layer 1: input data. Can be pixel values or the number of cup holders.
Layer 2: hidden layer. Called this because it is neither input nor output.
Layer 3: output. E.g., cat or not a cat; buy the car or walk.
This network has three layers.
(Some edges lighter
to avoid clutter.)
[16.2, 17.3, −52.3, 11.1]
Training with supervised learning
Supervised Learning: You show the network a bunch of things
with labels saying what they are, and you want the network to
learn to classify future things without labels.
Example: here are some
pictures of cats. Tell me
which of these other pictures
are of cats.
To train the network, you want to
find the weights that
correctly classify all of the
training examples. You hope
it will also work on the testing
examples.
Done with an algorithm
called Backpropagation
[Rumelhart et al., 1986].
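As a rough sketch of what training with backpropagation looks like, the code below fits a tiny network with one hidden layer to the logical OR function. The data set, network size, learning rate, squared-error loss, and the trick of appending a constant 1.0 to each input as a bias are all assumptions chosen to keep the example small.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W, w):
    """Return the hidden activations and the network output for input x."""
    hidden = [sigmoid(sum(wij * xj for wij, xj in zip(row, x))) for row in W]
    y = sigmoid(sum(wi * hi for wi, hi in zip(w, hidden)))
    return hidden, y

def mean_squared_error(data, W, w):
    return sum((forward(x, W, w)[1] - t) ** 2 for x, t in data) / len(data)

def train(data, W, w, lr=0.5, epochs=2000):
    """Update the weights in place, one training example at a time."""
    for _ in range(epochs):
        for x, t in data:
            hidden, y = forward(x, W, w)
            # Chain rule at the output: d(error)/d(output pre-activation).
            delta_out = 2.0 * (y - t) * y * (1.0 - y)
            # Propagate the error signal back to each hidden neuron.
            delta_hidden = [delta_out * w[i] * hidden[i] * (1.0 - hidden[i])
                            for i in range(len(w))]
            for i in range(len(w)):
                w[i] -= lr * delta_out * hidden[i]
                for j in range(len(x)):
                    W[i][j] -= lr * delta_hidden[i] * x[j]

# Toy labeled data: logical OR, with a constant 1.0 appended as a bias input.
data = [([0.0, 0.0, 1.0], 0.0), ([0.0, 1.0, 1.0], 1.0),
        ([1.0, 0.0, 1.0], 1.0), ([1.0, 1.0, 1.0], 1.0)]
rng = random.Random(0)
W = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]  # 3 hidden neurons
w = [rng.uniform(-1, 1) for _ in range(3)]
initial_loss = mean_squared_error(data, W, w)
train(data, W, w)
final_loss = mean_squared_error(data, W, w)
print(initial_loss, final_loss)
```

The error on the training examples should shrink as the weights are adjusted; whether the network also works on unseen examples is a separate question.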
[16.2, 17.3, −52.3, 11.1]
[Figure labels: input x = [16.2, 17.3, −52.3, 11.1], weights W and w, output y]
Why Google’s Deep Learning toolbox is called TensorFlow.
y: output
x: input
h: number of hidden neurons
n: length of vector x
Learning is learning parameter
values
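The symbols on this slide can be read as a small forward pass. The sketch below assumes W is an h-by-n matrix of hidden-layer weights and w is a length-h vector of output weights; the transcript does not state this, and the weight values are made up.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W, w):
    """Compute y = sigmoid(w . sigmoid(W x)) for a network with one hidden layer."""
    hidden = [sigmoid(sum(W[i][j] * x[j] for j in range(len(x))))
              for i in range(len(W))]
    return sigmoid(sum(w[i] * hidden[i] for i in range(len(w))))

x = [16.2, 17.3, -52.3, 11.1]          # input vector from the slide, n = 4
W = [[0.01, -0.02, 0.03, 0.04],        # assumed values, h = 2 hidden neurons
     [0.05, 0.01, -0.01, 0.02]]
w = [0.5, -0.5]                        # assumed output weights, length h
y = forward(x, W, w)
n_params = len(W) * len(W[0]) + len(w)  # h * n + h learnable values
print(y, n_params)
```

Learning means choosing the numbers in W and w; everything else in the computation is fixed.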
Deep learning is adding more layers
There is no exact
definition of
what constitutes
“deep learning.”
The number of
weights (parameters)
is generally large.
Some networks have
millions of parameters
that are learned.
(Some edges omitted
to avoid clutter.)
[16.2, 17.3, −52.3, 11.1]
Deep learning is adding more layers
There are many “paths” by which
updates to the weights
affect the error.
Backpropagation is
like dynamic
programming.
“Chain Rule + Dynamic
Programming = Neural
Networks”
Edward Z. Yang
http://blog.ezyang.com/2011/05/neural-networks
(Some edges omitted
to avoid clutter.)
[16.2, 17.3, −52.3, 11.1]
Talk Outline
• Introduction
• Deep learning and natural language processing
• Deep learning and computer vision
• Deep learning and robot actions
• What deep learning still can’t do
• Practical ways you can get started
• Conclusion
• About DeepGrammar (4 minutes, if time)
Deep learning enables sub-symbolic processing
Symbolic systems can be brittle.
I
bought
a
car
.
<i>
<bought>
<a>
<car>
<.>
You have to remember to
represent “purchased” and
“automobile.”
What about “truck”?
How do you encode the
meaning of the entire
sentence?
Recall our standard architecture
Layer 1: input data. Can be pixel values or the number of cup holders.
Layer 2: hidden layer. Called this because it is neither input nor output.
Layer 3: output. E.g., cat or not a cat; buy the car or walk.
Is this a cat?
[16.2, 17.3, −52.3, 11.1]
Neural nets with multiple outputs
Okay, but what kind of cat is it?
Introduce a new node
called a softmax.
Probability a
house cat
Probability a
lion
Probability a
panther
Probability a
bobcat
Just normalize each
output by the sum over
all the outputs
(using the exponential).
Gives a probability.
[16.2, 17.3, −52.3, 11.1]
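A softmax can be sketched as follows. The four raw scores are hypothetical; subtracting the maximum before exponentiating is a common numerical-stability step that does not change the result.

```python
import math

def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max so the exponentials cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["house cat", "lion", "panther", "bobcat"]
scores = [2.0, 0.5, 0.1, 1.0]  # hypothetical raw outputs of the last layer
probs = softmax(scores)
best = labels[probs.index(max(probs))]
print(dict(zip(labels, probs)), best)
```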
Learning word vectors
the = [13.2, 5.4, −3.1]
man = [−12.1, 13.1, 0.1]
ran = [7.2, 3.2, −1.9]
From the sentence, “The man ran fast.”
Probability
of “fast”
Probability
of “slow”
Probability
of “taco”
Probability
of “bobcat”
Learns a vector for each word based on the “meaning” in the sentence by
trying to predict the next word [Bengio et al., 2003].
These numbers are updated
along with the weights
and become the vector
representations of the
words.
Comparing vector and symbolic representations
Vector representation
taco = [17.32, 82.9, −4.6, 7.2]
• Vectors have a similarity score.
• A taco is not a burrito but similar.
• Vectors have internal structure [Mikolov et al., 2013].
• Italy – Rome = France – Paris
• King – Queen = Man – Woman
• Vectors are grounded in experience.
• Meaning relative to predictions.
• Ability to learn representations makes agents less brittle.
Symbolic representation
taco = 𝑡𝑎𝑐𝑜
• Symbols can be the same or not.
• A taco is just as different from a burrito as from a Toyota.
• Symbols have no structure.
• Symbols are arbitrarily assigned.
• Meaning relative to other symbols.
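The contrast between the two representations can be made concrete with toy numbers. All vectors below are invented for illustration (real word vectors are learned and have many more dimensions), and cosine similarity is one common choice of similarity score, assumed here.

```python
import math

def cosine(u, v):
    """Cosine similarity: close to 1 for similar directions, near 0 for unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical 3-dimensional word vectors.
vec = {"taco": [0.9, 0.8, 0.1], "burrito": [0.8, 0.9, 0.2], "toyota": [0.1, 0.0, 0.9]}
sim_food = cosine(vec["taco"], vec["burrito"])
sim_car = cosine(vec["taco"], vec["toyota"])

# Symbols can only be the same or not.
same_symbol = "taco" == "burrito"

# Internal structure: toy 2-d vectors with dimensions (royalty, maleness).
toy = {"king": [1, 1], "queen": [1, 0], "man": [0, 1], "woman": [0, 0]}
king_minus_queen = [a - b for a, b in zip(toy["king"], toy["queen"])]
man_minus_woman = [a - b for a, b in zip(toy["man"], toy["woman"])]
print(sim_food, sim_car, same_symbol, king_minus_queen == man_minus_woman)
```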
Yeah, that’s a word
But what about a sentence?
Algorithm for generating vectors for sentences
1. Make the sentence vector be the vector for the first
word.
2. For each subsequent word, combine its vector with the
sentence vector.
3. The resulting vector after the last word is the sentence
vector.
Can be implemented using a recurrent neural
network (RNN)
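The three steps above can be sketched directly. The combine step below is a simplified recurrent update, tanh(U·h + V·x), with small fixed matrices U and V and 2-dimensional word vectors that are all invented for illustration; in a real RNN these values are learned.

```python
import math

# Hypothetical 2-d word vectors and fixed combination weights.
WORD_VECTORS = {"the": [0.1, 0.2], "patient": [0.9, -0.4], "fell": [-0.5, 0.8]}
U = [[0.5, -0.3], [0.2, 0.4]]   # applied to the running sentence vector
V = [[0.7, 0.1], [-0.2, 0.6]]   # applied to the next word vector

def combine(h, x):
    """One recurrent step: mix the sentence vector h with the word vector x."""
    return [math.tanh(sum(U[i][j] * h[j] for j in range(len(h))) +
                      sum(V[i][j] * x[j] for j in range(len(x))))
            for i in range(len(U))]

def encode(words):
    h = WORD_VECTORS[words[0]]          # step 1: start with the first word
    for word in words[1:]:              # step 2: fold in each subsequent word
        h = combine(h, WORD_VECTORS[word])
    return h                            # step 3: the final vector is the sentence

forward_vec = encode(["the", "patient", "fell"])
reversed_vec = encode(["fell", "patient", "the"])
print(forward_vec, reversed_vec)
```

Because each step depends on everything seen so far, the same words in a different order give a different sentence vector.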
Encoding sentence meaning into a vector
Like a hidden Markov model, but doesn’t make the Markov
assumption and benefits from a vector representation.
h0
The
h1
patient
h2
fell
h3
.
“The patient fell.”
Cool, a sentence vector
But what can you do with it?
You can unwind in the other direction to do
machine translation.
Called a seq2seq model, or Neural Machine
Translation, or encoder-decoder model.
You can feed it to a classifier.
Decoding sentence meaning
Machine translation, or structure learning more generally.
El
h3
paciente
h4
cayó
h5
.
h6
[Cho et al., 2014]
It keeps generating until it generates a stop symbol.
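The generate-until-stop loop can be sketched as below. The `toy_step` function is a scripted stand-in for a trained decoder (it just walks through a fixed word list), and the `<stop>` token name and `max_len` cap are assumptions.

```python
def greedy_decode(step, state, stop_symbol="<stop>", max_len=20):
    """Emit one word per step until the decoder produces the stop symbol."""
    output = []
    for _ in range(max_len):  # cap the length in case no stop symbol appears
        word, state = step(state)
        if word == stop_symbol:
            break
        output.append(word)
    return output

# Scripted stand-in for a trained decoder: the state is a position in this list.
SCRIPT = ["El", "paciente", "cayó", ".", "<stop>"]

def toy_step(state):
    return SCRIPT[state], state + 1

translation = greedy_decode(toy_step, 0)
print(" ".join(translation))
```

In a real encoder-decoder model, the initial state would be the sentence vector and each step would pick the next word from a softmax over the vocabulary.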
Generating image captions
Convolutional
neural network
An
h0
angry
h1
sister
h2
.
h3
[Karpathy and Fei-Fei, 2015]
[Vinyals et al., 2015]
Image
Image caption examples
See: [Karpathy and Fei-Fei, 2015] http://cs.stanford.edu/people/karpathy/deepimagesent/
Attention [Bahdanau et al., 2014]
El
h3
paciente
h4
cayó
h5
.
h6
h0
The
h1
patient
h2
fell
h3
.
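The core of attention is a weighted average of the encoder states, with weights that depend on the current decoder state. The sketch below scores each encoder state with a dot product, which is a simplification swapped in for brevity (Bahdanau et al. use a small learned network for scoring), and all vectors are made up.

```python
import math

def attend(decoder_state, encoder_states):
    """Return attention weights and the weighted sum of the encoder states."""
    # Dot-product scoring (a simplification of the learned scoring function).
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    weights = [e / sum(exps) for e in exps]          # softmax over positions
    context = [sum(w * h[k] for w, h in zip(weights, encoder_states))
               for k in range(len(encoder_states[0]))]
    return weights, context

words = ["The", "patient", "fell", "."]
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]  # hypothetical h0..h3
decoder_state = [0.0, 2.0]                                          # hypothetical
weights, context = attend(decoder_state, encoder_states)
focus = words[weights.index(max(weights))]
print(weights, context, focus)
```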
RNNs and Structure Learning
• In addition to machine translation and generating
captions for images, RNNs can be used to learn just about
any kind of structure you’d want, as long as you
have lots of training data.
Deep learning and question answering
RNNs answer questions.
What is the translation of this
phrase to French?
What is the next word?
Attention is useful for question
answering.
This can be generalized to which facts
the learner should pay attention to
when answering questions.
Deep learning and question answering
Bob went home.
Tim went to the junkyard.
Bob picked up the jar.
Bob went to town.
Where is the jar? A: town
• Memory Networks [Weston et al.,
2014]
• Updates memory vectors based on
a question and finds the best one to
give the output.
The office is north of the yard.
The bath is north of the office.
The yard is west of the kitchen.
How do you go from the office to
the kitchen? A: south, east
• Neural Reasoner [Peng et al.,
2015]
• Encodes the question and facts in
many layers, and the final layer is
put through a function that gives
the answer.
Vision is hard
• Even harder for 3D
objects.
• You move a bit, and
everything changes.
How a computer sees an image
[22, 81, 44, 88, 17, 0, ..., 45]
Example from MNIST
handwritten digit dataset
[LeCun and Cortes, 1998].
Vision is hard because images are big matrices of numbers.
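To make the point concrete, here is a tiny made-up 3-by-3 grayscale image flattened into the kind of vector a network receives, along with what happens when the same shape moves one pixel to the right.

```python
# Hypothetical 3x3 grayscale image: a bright vertical bar in the middle column.
image = [[0, 255, 0],
         [0, 255, 0],
         [0, 255, 0]]

# What the network sees: one long list of numbers.
flat = [pixel for row in image for pixel in row]

# Shift every row one pixel to the right (wrapping around at the edge).
shifted = [[row[-1]] + row[:-1] for row in image]
flat_shifted = [pixel for row in shifted for pixel in row]

# Count how many input values changed even though the shape is the same.
changed = sum(1 for a, b in zip(flat, flat_shifted) if a != b)

mnist_inputs = 28 * 28  # an MNIST digit is 28 by 28 pixels
print(flat, changed, mnist_inputs)
```

A one-pixel move changes most of the input values here, which is one reason raw pixels are a hard representation to learn from.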
Breakthrough: Unsupervised Model
• Big breakthrough in 2006 by Hinton et al.
• Use a network with symmetric weights called a restricted Boltzmann machine.
• Stochastic binary neuron.
• Probabilistically outputs 0
(turns off) or 1 (turns on)
based on the weight of the
inputs from on units.
0 1
• Limit connections to be from one
layer to the next.
• Fast because decisions are made
locally.
• Trained in an unsupervised way
to reproduce the data.
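A stochastic binary neuron can be sketched as below. Using the logistic sigmoid to turn the summed weight into a probability is the usual choice for restricted Boltzmann machines; the weights, bias, and sample counts are illustrative assumptions.

```python
import math
import random

def stochastic_unit(inputs, weights, bias, rng):
    """Output 1 with a probability set by the total weight from units that are on."""
    total = bias + sum(w for w, s in zip(weights, inputs) if s == 1)
    p_on = 1.0 / (1.0 + math.exp(-total))
    return 1 if rng.random() < p_on else 0

rng = random.Random(0)
# Strong positive input: the unit turns on essentially every time.
always_on = [stochastic_unit([1, 1], [50.0, 50.0], 0.0, rng) for _ in range(100)]
# No net input: the unit is on about half the time.
coin_flips = [stochastic_unit([1, 1], [0.0, 0.0], 0.0, rng) for _ in range(1000)]
fraction_on = sum(coin_flips) / len(coin_flips)
print(sum(always_on), fraction_on)
```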
Stack up the layers to make a deep network
Input data
Hidden
The output of each layer
becomes the input to the
next layer [Hinton et al.,
2006].
See video starting at second 45:
https://www.coursera.org/course/neuralnets
Hidden layer becomes
input data of next layer.
Computer vision, scaling up
Unsupervised learning was scaled up
by Honglak Lee et al. [2009] to learn
high-level visual features.
[Lee et al., 2009]
Further scaled up by Quoc Le et al.
[2012].
• Used 1,000 machines (16,000
cores) running for 3 days to train 1
billion weights by watching
YouTube videos.
• The network learned to identify
cats.
• The network wasn’t told to look
for cats; it naturally learned that
cats were integral to online
viewing.
• Video on the topic at NYT
http://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-computers-evidence-of-machine-learning.html
Why is this significant?
To have a grounded understanding
of its environment, an agent must
be able to acquire representations
through experience [Pierce et al.,
1997; Mugan et al., 2012].
Without a grounded understanding,
the agent is limited to what was
programmed in.
We saw that unsupervised learning
could be used to learn the meanings of words, grounded in the
experience of reading.
Using these deep Boltzmann machines, machines can learn
to see the world through experience.
CNNs: Limit connections and duplicate parameters
Convolutional neural networks (CNNs)
build in a kind of feature invariance.
1. Convolution layers
• Bank of feature detectors
• Different feature detectors slide
over the image
2. Sub-sampling layers
• The next layer pools from a
region on the layer below
With the layers and topology, our networks are
starting to look a little like the visual cortex. Although,
we still don’t fully understand the visual cortex.
Modern image processing systems have many such layer pairs.
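One convolution layer followed by one sub-sampling layer can be sketched in plain Python. The 5-by-5 image, the hand-written vertical-edge detector, and the choice of 2-by-2 max pooling are assumptions for illustration; in a trained CNN the feature detectors are learned.

```python
def convolve2d(image, kernel):
    """Slide one feature detector over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(len(image[0]) - kw + 1)]
            for r in range(len(image) - kh + 1)]

def max_pool(fmap, size=2):
    """Sub-sample by keeping the largest value in each size-by-size region."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

edge_detector = [[-1, 1], [-1, 1]]                 # responds to dark-to-light edges
image = [[0, 0, 1, 1, 1] for _ in range(5)]        # edge between columns 1 and 2
moved = [[0, 1, 1, 1, 1] for _ in range(5)]        # same edge, one pixel to the left

pooled = max_pool(convolve2d(image, edge_detector))
pooled_moved = max_pool(convolve2d(moved, edge_detector))
print(pooled, pooled_moved)
```

The same detector is reused at every position (duplicated parameters), and pooling makes the output identical for the two slightly shifted images, which is the kind of invariance the slide refers to.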
More recent deep vision networks
ImageNet http://www.image-net.org/ is a huge collection of images
corresponding to the nouns of the WordNet hierarchy. There are hundreds to
thousands of images per noun.
2012 – Deep Learning begins to dominate image recognition
Krizhevsky et al. [2012] got 16% error on recognizing objects, when
before the best error was 26%. They used a convolutional neural
network (CNN).
2015 – Deep Learning surpasses human level performance
He et al. [2015] surpassed human level performance on recognizing
images of objects.* Computers seem to have an advantage when the
classes of objects are fine grained, such as multiple species of dogs.
*But deep learning can be easily fooled [Nguyen et al., 2014]. Enlightening video at
https://www.youtube.com/watch?v=M2IebCN9Ht4.
A stamping in of behavior
When we think of doing things, we think of conscious planning
with System 2.
Imagine trying to get to Seattle.
• Get to the airport. How? Take a taxi. How? Call a taxi. How?
Find my phone.
• Some behaviors arise more from a
gradual stamping in [Thorndike,
1898].
• Became the study of Behaviorism
[Skinner, 1953] (see Skinner box
on the right).
• Formulated into artificial
intelligence as Reinforcement
Learning [Sutton and Barto, 1998].
A “Skinner box”
By Andreas1 (Adapted from Image:Boite skinner.jpg) [GFDL
(http://www.gnu.org/copyleft/fdl.html) or CC-BY-SA-3.0
(http://creativecommons.org/licenses/by-sa/3.0/)], via
Wikimedia Commons
Beginning with random exploration
In reinforcement learning, the agent begins by
randomly exploring until it reaches its goal.
Reaching the goal
• When it reaches the goal, credit is propagated back to its previous states.
• The agent learns the function Q(s,a), which gives the cumulative expected
discounted reward of being in state s and taking action a and acting according
to the policy thereafter.
Learning the behavior
Eventually, the agent learns the value of being in each state and
taking each action and can therefore always do the best thing in
each state.
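Learning Q(s,a) from random exploration can be sketched with tabular Q-learning, one standard reinforcement learning algorithm. The five-state corridor, the reward of 1 at the goal, and the learning rate and discount values are all assumptions made up for this example.

```python
import random

N_STATES, GOAL = 5, 4          # hypothetical corridor: states 0..4, goal at 4
ACTIONS = ["left", "right"]

def step(state, action):
    """Move one cell; the left wall blocks movement below state 0."""
    nxt = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            a = rng.choice(ACTIONS)                      # random exploration
            nxt, r = step(s, a)
            best_next = max(Q[(nxt, b)] for b in ACTIONS)
            # Propagate credit back from the next state to this state-action pair.
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = nxt
    return Q

Q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(policy)
```

Once the values have settled, acting greedily with respect to Q means heading toward the goal from every state.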
Playing Atari with deep learning
Input, last four frames, where each frame
is downsampled to 84 by 84 pixels.
[Mnih et al., 2013]
represent the
state-action value
function as a
convolutional
neural network.
Value of
moving left
Value of
moving right
Value of
shooting
Value of
reloading
In [Mnih et al., 2013],
this is actually three
hidden layers.
See some videos at
http://mashable.com/2015/02/25/computer-wins-at-atari-games/
What deep learning still can’t do
System 1: Fast and Parallel
• With computer vision, we seem to be on the
right track
• Reinforcement learning is useful in increasingly
large worlds
System 2: Slow and Serial
• Still lacking in common sense
• Language processing needs a grounded
understanding
Limitations of deep learning
The encoded meaning is grounded with respect to other words.
There is no linkage to the physical world.
"ICubLugan01 Reaching". Licensed under CC BY-SA 3.0 via Wikipedia - https://en.wikipedia.org/wiki/File:ICubLugan01_Reaching.png#/media/File:ICubLugan01_Reaching.png
The iCub http://www.icub.org/
Limitations of deep learning
Bob went home.
Tim went to the junkyard.
Bob picked up the jar.
Bob went to town.
Where is the jar? A: town
Deep learning has no
understanding of what it means for
the jar to be in town.
For example, that it can’t also be at
the junkyard. Or that it may be in
Bob’s car, or still in his hands.
The encoded meaning is grounded with respect to other words.
There is no linkage to the physical world.
Limitations of deep learning
Imagine a dude standing
on a table. How would a
computer know that if
you move the table you
also move the dude?
Likewise, how could a
computer know that it
only rains outside?
Or, as Marvin Minsky asks, how could a computer learn
that you can pull a box with a string but not push it?
No one knows how to explain
all of these situations to a
computer. There are just too
many variations.
A robot can learn through
experience, but it must be
able to efficiently generalize
that experience.
Best learning resources
Hinton’s Coursera Course. Get it right from the horse’s mouth.
He explains things well.
https://www.coursera.org/course/neuralnets
Online textbook on deep learning from Yoshua Bengio and friends
(in preparation). Clear and understandable.
http://www.deeplearningbook.org/
TensorFlow tutorials.
https://www.tensorflow.org/versions/r0.11/tutorials/index.html
Deep Learning by Google
https://www.udacity.com/course/deep-learning--ud730
Conclusion
System 1: Fast and Parallel
With deep learning, this is now our strength
System 2: Slow and Serial
Before deep learning, this was the strength of AI
Time to get back to work on this
DeepGrammar
The importance of finding dumb mistakes
Thanks for listening
Jonathan Mugan
@jmugan
www.deepgrammar.com

More Related Content

What's hot

What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
Simplilearn
 
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
Simplilearn
 

What's hot (20)

Deep Learning and Reinforcement Learning
Deep Learning and Reinforcement LearningDeep Learning and Reinforcement Learning
Deep Learning and Reinforcement Learning
 
Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
 
Deep learning - Conceptual understanding and applications
Deep learning - Conceptual understanding and applicationsDeep learning - Conceptual understanding and applications
Deep learning - Conceptual understanding and applications
 
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
 
Deep learning in Computer Vision
Deep learning in Computer VisionDeep learning in Computer Vision
Deep learning in Computer Vision
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
 
Deep learning
Deep learningDeep learning
Deep learning
 
Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ers
 
Deep learning: the future of recommendations
Deep learning: the future of recommendationsDeep learning: the future of recommendations
Deep learning: the future of recommendations
 
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
 
Deep learning
Deep learningDeep learning
Deep learning
 
Information Retrieval with Deep Learning
Information Retrieval with Deep LearningInformation Retrieval with Deep Learning
Information Retrieval with Deep Learning
 
Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)
Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)
Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)
 
Geek Night 17.0 - Artificial Intelligence and Machine Learning
Geek Night 17.0 - Artificial Intelligence and Machine LearningGeek Night 17.0 - Artificial Intelligence and Machine Learning
Geek Night 17.0 - Artificial Intelligence and Machine Learning
 
Deep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsDeep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender Systems
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 
Neural Networks and Deep Learning
Neural Networks and Deep LearningNeural Networks and Deep Learning
Neural Networks and Deep Learning
 
Deep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word EmbeddingsDeep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word Embeddings
 

Viewers also liked

"Large-Scale Deep Learning for Building Intelligent Computer Systems," a Keyn...
"Large-Scale Deep Learning for Building Intelligent Computer Systems," a Keyn..."Large-Scale Deep Learning for Building Intelligent Computer Systems," a Keyn...
"Large-Scale Deep Learning for Building Intelligent Computer Systems," a Keyn...
Edge AI and Vision Alliance
 
"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM
"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM
"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM
Edge AI and Vision Alliance
 

Viewers also liked (20)

From Natural Language Processing to Artificial Intelligence
From Natural Language Processing to Artificial IntelligenceFrom Natural Language Processing to Artificial Intelligence
From Natural Language Processing to Artificial Intelligence
 
Deep Learning - The Past, Present and Future of Artificial Intelligence
Deep Learning - The Past, Present and Future of Artificial IntelligenceDeep Learning - The Past, Present and Future of Artificial Intelligence
Deep Learning - The Past, Present and Future of Artificial Intelligence
 
Farming Unicorns: Building Startup & Investor Ecosystems for Emerging Markets
Farming Unicorns: Building Startup & Investor Ecosystems for Emerging MarketsFarming Unicorns: Building Startup & Investor Ecosystems for Emerging Markets
Farming Unicorns: Building Startup & Investor Ecosystems for Emerging Markets
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Deep Learning through Examples
Deep Learning through ExamplesDeep Learning through Examples
Deep Learning through Examples
 
SXSW
SXSWSXSW
SXSW
 
Learning possibilistic networks from data: a survey
Learning possibilistic networks from data: a surveyLearning possibilistic networks from data: a survey
Learning possibilistic networks from data: a survey
 
Deep learning review
Deep learning reviewDeep learning review
Deep learning review
 
20151223application of deep learning in basic bio
20151223application of deep learning in basic bio 20151223application of deep learning in basic bio
20151223application of deep learning in basic bio
 
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
 
Dcnn for text
Dcnn for textDcnn for text
Dcnn for text
 
AI, behind the scenes
AI, behind the scenesAI, behind the scenes
AI, behind the scenes
 
Robot, Learning From Data
Robot, Learning From DataRobot, Learning From Data
Robot, Learning From Data
 
Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications
 
自然語言處理簡介
自然語言處理簡介自然語言處理簡介
自然語言處理簡介
 
"Large-Scale Deep Learning for Building Intelligent Computer Systems," a Keyn...
"Large-Scale Deep Learning for Building Intelligent Computer Systems," a Keyn..."Large-Scale Deep Learning for Building Intelligent Computer Systems," a Keyn...
"Large-Scale Deep Learning for Building Intelligent Computer Systems," a Keyn...
 
TEDX Talk: What's Happening With Artificial Intelligence?
TEDX Talk: What's Happening With Artificial Intelligence?TEDX Talk: What's Happening With Artificial Intelligence?
TEDX Talk: What's Happening With Artificial Intelligence?
 
Deep Learning Jeff-Shomaker_1-20-17_Final_
Deep Learning Jeff-Shomaker_1-20-17_Final_Deep Learning Jeff-Shomaker_1-20-17_Final_
Deep Learning Jeff-Shomaker_1-20-17_Final_
 
"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM
"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM
"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM
 
Deep Learning for Robotics
Deep Learning for RoboticsDeep Learning for Robotics
Deep Learning for Robotics
 

What Deep Learning Means for Artificial Intelligence

  • 1. What Deep Learning Means for Artificial Intelligence Jonathan Mugan, PhD Encore Presentation to Austin Data Geeks November 1, 2016
  • 2. AI through the lens of System 1 and System 2 Psychologist Daniel Kahneman in Thinking, Fast and Slow describes humans as having two modes of thought: System 1 and System 2. System 1: Fast and Parallel. Subconscious: E.g., face recognition or speech understanding. We underestimated how hard it would be to implement. E.g., we thought computer vision would be easy. AI systems in these domains have been lacking. 1. Serial computers too slow 2. Lack of training data 3. Didn’t have the right algorithms. System 2: Slow and Serial. Conscious: E.g., when listening to a conversation or making PowerPoint slides. We assumed it was the most difficult. E.g., we thought chess was hard. AI systems in these domains are useful but limited. Called GOFAI (Good Old-Fashioned Artificial Intelligence). 1. Search and planning 2. Logic 3. Rule-based systems
  • 3. AI through the lens of System 1 and System 2 Psychologist Daniel Kahneman in Thinking, Fast and Slow describes humans as having two modes of thought: System 1 and System 2. System 1: Fast and Parallel. Subconscious: E.g., face recognition or speech understanding. We underestimated how hard it would be to implement. E.g., we thought computer vision would be easy. This has changed 1. We now have GPUs and distributed computing 2. We have Big Data 3. We have new algorithms [Bengio et al., 2003; Hinton et al., 2006; Ranzato et al., 2006]. System 2: Slow and Serial. Conscious: E.g., when listening to a conversation or making PowerPoint slides. We assumed it was the most difficult. E.g., we thought chess was hard. AI systems in these domains are useful but limited. Called GOFAI (Good Old-Fashioned Artificial Intelligence). 1. Search and planning 2. Logic 3. Rule-based systems
  • 4. Deep learning begins with a little function It all starts with a humble linear function called a perceptron. Perceptron: sum = weight1 ✖ input1 ✚ weight2 ✖ input2 ✚ weight3 ✖ input3. If sum > threshold: output 1 Else: output 0. Example: The inputs can be your data. Question: Should I buy this car? sum = 0.2 ✖ gas mileage ✚ 0.3 ✖ horsepower ✚ 0.5 ✖ num cup holders. If sum > threshold: buy Else: walk
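The perceptron rule on this slide can be sketched in a few lines of Python. This is only an illustration: the weights 0.2, 0.3, and 0.5 come from the slide, while the feature values, their 0-to-1 scaling, and the threshold of 0.5 are assumptions made up for the example.

```python
def perceptron(inputs, weights, threshold):
    """Weighted sum of the inputs, then a hard threshold: output 1 or 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

# Weights from the slide: gas mileage, horsepower, number of cup holders.
weights = [0.2, 0.3, 0.5]

# Hypothetical car, with each feature already scaled to the range 0..1.
car = [0.9, 0.4, 0.8]

# The threshold of 0.5 is an assumed value, not one given in the talk.
decision = "buy" if perceptron(car, weights, threshold=0.5) else "walk"
print(decision)
```

In a trained perceptron the weights and threshold are learned from data rather than chosen by hand as they are here.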
  • 5. These little functions are chained together Deep learning comes from chaining a bunch of these little functions together. Chained together, they are called neurons. To create a neuron, we add a nonlinearity to the perceptron to get extra representational power when we chain them together. Our nonlinear perceptron is sometimes called a sigmoid. [Plot of a sigmoid.] The value b just offsets the sigmoid so the center is at 0.
  • 6. Single artificial neuron Inputs: weight1 ✖ input1, weight2 ✖ input2, weight3 ✖ input3. Output, or input to next neuron.
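A single artificial neuron replaces the perceptron's hard threshold with a smooth nonlinearity. The sketch below assumes the standard logistic sigmoid and assumes that the offset b is added to the weighted sum before squashing; the slide's own formula did not survive in this transcript, so the exact form is a guess.

```python
import math

def sigmoid(z):
    """Logistic function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, b):
    """Weighted sum of the inputs plus an offset b, passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + b
    return sigmoid(z)

# With all-zero inputs and b = 0, the neuron sits at the center of the sigmoid.
print(neuron([0.0, 0.0, 0.0], [0.2, 0.3, 0.5], b=0.0))
```

Unlike the perceptron's 0-or-1 output, this output changes smoothly as the weights change, which is what makes gradient-based training possible.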
  • 7. Three-layered neural network A bunch of neurons chained together is called a neural network. Layer 2: hidden layer. Called this because it is neither input nor output. Layer 3: output. E.g., cat or not a cat; buy the car or walk. Layer 1: input data. Can be pixel values or the number of cup holders. This network has three layers. (Some edges lighter to avoid clutter.) [16.2, 17.3, −52.3, 11.1]
  • 8. Training with supervised learning Supervised Learning: You show the network a bunch of things with labels saying what they are, and you want the network to learn to classify future things without labels. Example: here are some pictures of cats. Tell me which of these other pictures are of cats. To train the network, you want to find the weights that correctly classify all of the training examples. You hope it will work on the testing examples. Done with an algorithm called Backpropagation [Rumelhart et al., 1986]. [16.2, 17.3, −52.3, 11.1]
  • 9. Training with supervised learning Supervised Learning: You show the network a bunch of things with labels saying what they are, and you want the network to learn to classify future things without labels. Parameters shown on the network: w, W. y: output x: input h: number of hidden neurons n: length of vector x. [16.2, 17.3, −52.3, 11.1] Why Google’s Deep Learning toolbox is called TensorFlow. Learning is learning parameter values
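The symbols on this slide suggest a network with one hidden layer. As a rough sketch, assume W is an h-by-n matrix of input-to-hidden weights, w is a length-h vector of hidden-to-output weights, and the output is y = sigmoid(w · sigmoid(W x)). That formula and the weight values below are assumptions for illustration; only the input vector is taken from the slide.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W, w):
    """Forward pass: input x (length n) -> h hidden neurons -> one output y."""
    hidden = [sigmoid(sum(w_ij * x_j for w_ij, x_j in zip(row, x))) for row in W]
    return sigmoid(sum(w_i * h_i for w_i, h_i in zip(w, hidden)))

x = [16.2, 17.3, -52.3, 11.1]       # input vector shown on the slide, n = 4
W = [[0.01, -0.02, 0.00, 0.03],     # hypothetical weights, h = 3 rows of length n
     [0.02, 0.01, 0.01, -0.01],
     [-0.01, 0.00, 0.02, 0.02]]
w = [0.5, -0.4, 0.3]                # hypothetical hidden-to-output weights

y = forward(x, W, w)
print(y)
```

Learning then amounts to adjusting the numbers in W and w until y matches the labels on the training examples.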
  • 10. Deep learning is adding more layers There is no exact definition of what constitutes “deep learning.” The number of weights (parameters) is generally large. Some networks have millions of parameters that are learned. (Some edges omitted to avoid clutter.) [16.2, 17.3, −52.3, 11.1]
  • 11. Deep learning is adding more layers Many “paths” for updates in weights to affect error. Backpropagation is like dynamic programming. “Chain Rule + Dynamic Programming = Neural Networks” Edward Z. Yang http://blog.ezyang.com/2011/05/neural-networks (Some edges omitted to avoid clutter.) [16.2, 17.3, −52.3, 11.1]
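To make the "chain rule plus dynamic programming" idea concrete, here is a minimal sketch of backpropagation for a network with one hidden layer. It assumes sigmoid units and a squared-error loss, neither of which is specified in the talk, and all weights are made-up values. The output error term is computed once and reused for every hidden unit, which is the dynamic-programming part. A finite-difference estimate is included as a sanity check on the analytic gradient.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W, w):
    hidden = [sigmoid(sum(wij * xj for wij, xj in zip(row, x))) for row in W]
    y = sigmoid(sum(wi * hi for wi, hi in zip(w, hidden)))
    return hidden, y

def loss(x, W, w, target):
    _, y = forward(x, W, w)
    return 0.5 * (y - target) ** 2

def backprop(x, W, w, target):
    """Gradients of the loss with respect to W and w via the chain rule."""
    hidden, y = forward(x, W, w)
    # Error signal at the output, computed once and reused below.
    delta_out = (y - target) * y * (1.0 - y)
    grad_w = [delta_out * hi for hi in hidden]
    grad_W = []
    for wi, hi, _ in zip(w, hidden, W):
        # Push the output error back through one hidden neuron.
        delta_hidden = delta_out * wi * hi * (1.0 - hi)
        grad_W.append([delta_hidden * xj for xj in x])
    return grad_W, grad_w

def numeric_grad_W(x, W, w, target, i, j, eps=1e-5):
    """Central-difference estimate of d(loss)/d(W[i][j]), used only for checking."""
    up = [row[:] for row in W]
    down = [row[:] for row in W]
    up[i][j] += eps
    down[i][j] -= eps
    return (loss(x, up, w, target) - loss(x, down, w, target)) / (2 * eps)

# Hypothetical small network and training example.
x = [0.5, -1.0]
W = [[0.1, -0.2], [0.4, 0.3]]
w = [0.7, -0.5]
grad_W, grad_w = backprop(x, W, w, target=1.0)
print(grad_W, grad_w)
```

A training loop would subtract a small multiple of these gradients from the weights and repeat over the training examples.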
  • 12. Talk Outline • Introduction • Deep learning and natural language processing • Deep learning and computer vision • Deep learning and robot actions • What deep learning still can’t do • Practical ways you can get started • Conclusion • About DeepGrammar (4 minutes, if time)
  • 14. Deep learning enables sub-symbolic processing Symbolic systems can be brittle. I bought a car . <i> <bought> <a> <car> <.> You have to remember to represent “purchased” and “automobile.” What about “truck”? How do you encode the meaning of the entire sentence?
  • 15. Recall our standard architecture Layer 2: hidden layer. Called this because it is neither input nor output. Layer 3: output. E.g., cat or not a cat; buy the car or walk. Layer 1: input data. Can be pixel values or the number of cup holders. Is this a cat? [16.2, 17.3, −52.3, 11.1]
  • 16. Neural nets with multiple outputs Okay, but what kind of cat is it? Introduce a new node called a softmax. Outputs: Probability a house cat, Probability a lion, Probability a panther, Probability a bobcat. Just normalize each output over the sum of all the outputs (using the exponential). Gives a probability. [16.2, 17.3, −52.3, 11.1]
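The softmax normalization described on this slide can be sketched as follows. The raw scores are invented for illustration; subtracting the maximum score before exponentiating is a common numerical-stability step and does not change the result.

```python
import math

def softmax(scores):
    """Exponentiate each score and divide by the sum, giving probabilities."""
    m = max(scores)                       # subtract the max to avoid overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["house cat", "lion", "panther", "bobcat"]
scores = [2.0, 0.5, -1.0, 1.0]            # hypothetical raw network outputs
probs = softmax(scores)
for label, p in zip(labels, probs):
    print(label, round(p, 3))
```

The largest raw score always gets the largest probability, and the probabilities sum to one.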
  • 17. Learning word vectors the: [13.2, 5.4, −3.1] man: [−12.1, 13.1, 0.1] ran: [7.2, 3.2, −1.9] From the sentence, “The man ran fast.” Outputs: Probability of “fast”, Probability of “slow”, Probability of “taco”, Probability of “bobcat”. Learns a vector for each word based on the “meaning” in the sentence by trying to predict the next word [Bengio et al., 2003]. These numbers are updated along with the weights and become the vector representations of the words.
  • 18. Comparing vector and symbolic representations Vector representation taco = [17.32, 82.9, −4.6, 7.2] Symbolic representation taco = 𝑡𝑎𝑐𝑜 • Vectors have a similarity score. • A taco is not a burrito but similar. • Symbols can be the same or not. • A taco is just as different from a burrito as a Toyota. • Vectors have internal structure [Mikolov et al., 2013]. • Italy – Rome = France – Paris • King – Queen = Man – Woman • Symbols have no structure. • Symbols are arbitrarily assigned. • Meaning relative to other symbols. • Vectors are grounded in experience. • Meaning relative to predictions. • Ability to learn representations makes agents less brittle.
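The contrast between the two representations can be shown with a small sketch. The three-dimensional vectors below are invented for illustration (real word vectors are learned and have many more dimensions), and cosine similarity is one common choice of similarity score, assumed here rather than taken from the talk.

```python
import math

def cosine_similarity(a, b):
    """Similarity score between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical word vectors, not learned from any data.
vectors = {
    "taco":    [1.0, 0.9, 0.1],
    "burrito": [0.9, 1.0, 0.2],
    "toyota":  [0.1, 0.0, 1.0],
}

# Vector view: a taco is not a burrito, but it is much closer to one than to a Toyota.
print(cosine_similarity(vectors["taco"], vectors["burrito"]))
print(cosine_similarity(vectors["taco"], vectors["toyota"]))

# Symbolic view: symbols are either identical or not, with nothing in between.
print("taco" == "burrito", "taco" == "toyota")
```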
  • 19. Yeah, that’s a word But what about a sentence? Algorithm for generating vectors for sentences 1. Make the sentence vector be the vector for the first word. 2. For each subsequent word, combine its vector with the sentence vector. 3. The resulting vector after the last word is the sentence vector. Can be implemented using a recurrent neural network (RNN)
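The three-step algorithm above can be sketched as a bare-bones recurrent network. This sketch makes several assumptions not stated in the talk: a tanh nonlinearity, a starting sentence vector of all zeros (a common variation on step 1), and tiny hand-picked weight matrices and word vectors in place of learned ones.

```python
import math

def rnn_step(h, x, W_h, W_x):
    """Combine the running sentence vector h with the next word vector x."""
    return [
        math.tanh(
            sum(W_h[i][j] * h[j] for j in range(len(h)))
            + sum(W_x[i][k] * x[k] for k in range(len(x)))
        )
        for i in range(len(W_h))
    ]

def encode(word_vectors, W_h, W_x):
    """Fold the word vectors, in order, into a single sentence vector."""
    h = [0.0] * len(W_h)
    for x in word_vectors:
        h = rnn_step(h, x, W_h, W_x)
    return h

# Hypothetical weights and two-dimensional word vectors.
W_h = [[0.5, -0.3], [0.2, 0.1]]
W_x = [[1.0, 0.0], [0.0, 1.0]]
patient = [1.0, 0.0]
fell = [0.0, 1.0]

print(encode([patient, fell], W_h, W_x))
print(encode([fell, patient], W_h, W_x))   # word order changes the result
```

Because each step depends on everything that came before, the final vector reflects word order, which a simple average of word vectors would not.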
  • 20. Encoding sentence meaning into a vector h0 The “The patient fell.”
  • 21. Encoding sentence meaning into a vector h0 The h1 patient “The patient fell.”
  • 22. Encoding sentence meaning into a vector h0 The h1 patient h2 fell “The patient fell.”
  • 23. Encoding sentence meaning into a vector Like a hidden Markov model, but doesn’t make the Markov assumption and benefits from a vector representation. h0 The h1 patient h2 fell h3 . “The patient fell.”
  • 24. Cool, a sentence vector But what can you do with it? You can unwind in the other direction to do machine translation. Called a seq2seq model, or Neural Machine Translation, or encoder-decoder model. You can feed it to a classifier.
  • 25. Decoding sentence meaning Machine translation, or structure learning more generally. El h3
  • 26. Decoding sentence meaning Machine translation, or structure learning more generally. El h3 h4
  • 27. Decoding sentence meaning Machine translation, or structure learning more generally. El h3 paciente h4
  • 28. Decoding sentence meaning Machine translation, or structure learning more generally. El h3 paciente h4 cayó h5 . h6 [Cho et al., 2014] It keeps generating until it generates a stop symbol.
  • 29. Generating image captions Image Convolutional neural network An h0 angry h1 sister h2 . h3 [Karpathy and Fei-Fei, 2015] [Vinyals et al., 2015]
  • 30. Image caption examples [Karpathy and Fei-Fei, 2015] See: http://cs.stanford.edu/people/karpathy/deepimagesent/
  • 31. Attention [Bahdanau et al., 2014] El h3 paciente h4 cayó h5 . h6 h0 The h1 patient h2 fell h3 .
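The core of attention is that, at each decoding step, the decoder looks back over all encoder states and takes a weighted average of them. The sketch below is a simplification: it scores each encoder state with a plain dot product, whereas Bahdanau et al. score with a small learned network. The state vectors are invented for illustration.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(decoder_state, encoder_states):
    """Return attention weights over the encoder states and their weighted average."""
    # Dot-product scoring is an assumed stand-in for the learned scoring function.
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    context = [
        sum(wt * h[i] for wt, h in zip(weights, encoder_states))
        for i in range(len(decoder_state))
    ]
    return weights, context

# Hypothetical encoder states for "The", "patient", "fell".
encoder_states = [[0.1, 0.0], [1.0, 0.2], [0.0, 1.0]]
decoder_state = [2.0, 0.0]      # hypothetical state while producing "paciente"

weights, context = attention(decoder_state, encoder_states)
print(weights)
print(context)
```

Here the largest weight falls on the second encoder state, the one most aligned with the decoder state.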
  • 32. RNNs and Structure Learning • In addition to machine translation and generating captions for images, can be used to learn just about any kind of structure you’d want, as long as you have lots of training data.
  • 33. Deep learning and question answering RNNs answer questions. What is the translation of this phrase to French? What is the next word? Attention is useful for question answering. This can be generalized to which facts the learner should pay attention to when answering questions.
  • 34. Deep learning and question answering Bob went home. Tim went to the junkyard. Bob picked up the jar. Bob went to town. Where is the jar? A: town • Memory Networks [Weston et al., 2014] • Updates memory vectors based on a question and finds the best one to give the output. The office is north of the yard. The bath is north of the office. The yard is west of the kitchen. How do you go from the office to the kitchen? A: south, east • Neural Reasoner [Peng et al., 2015] • Encodes the question and facts in many layers, and the final layer is put through a function that gives the answer.
  • 35. Talk Outline • Introduction • Deep learning and natural language processing • Deep learning and computer vision • Deep learning and robot actions • What deep learning still can’t do • Practical ways you can get started • Conclusion • About DeepGrammar (4 minutes, if time)
  • 37. Vision is hard • Even harder for 3D objects. • You move a bit, and everything changes. How a computer sees an image [22, 81, 44, 88, 17, 0, ..., 45] Example from MNIST handwritten digit dataset [LeCun and Cortes, 1998]. Vision is hard because images are big matrices of numbers.
  • 38. Breakthrough: Unsupervised Model • Big breakthrough in 2006 by Hinton et al. • Use a network with symmetric weights called a restricted Boltzmann machine. • Stochastic binary neuron. • Probabilistically outputs 0 (turns off) or 1 (turns on) based on the weight of the inputs from on units. 0 1 • Limit connections to be from one layer to the next. • Fast because decisions are made locally. • Trained in an unsupervised way to reproduce the data. 0 1 0 1 0 1 0 1 0 1 0 1
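A stochastic binary neuron of the kind described on this slide can be sketched as follows. The sketch assumes the usual formulation in which the probability of turning on is a sigmoid of the weighted input; the weights, bias, and input states are made up. Units that are off (state 0) contribute nothing to the sum, so only the on units influence the decision.

```python
import math
import random

def p_on(inputs, weights, bias):
    """Probability that the unit turns on, given the binary states of its inputs."""
    z = bias + sum(w * s for w, s in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))

def sample_unit(inputs, weights, bias, rng):
    """Probabilistically output 1 (turn on) or 0 (turn off)."""
    return 1 if rng.random() < p_on(inputs, weights, bias) else 0

rng = random.Random(0)                 # seeded so the run is repeatable
visible = [1, 0, 1]                    # hypothetical binary input states
weights = [0.8, -1.5, 0.4]             # hypothetical weights into one hidden unit

samples = [sample_unit(visible, weights, 0.0, rng) for _ in range(10)]
print(p_on(visible, weights, 0.0), samples)
```

Each unit decides using only its own inputs, which is why the slide describes the computation as fast and local.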
  • 39. Stack up the layers to make a deep network Input data Hidden The output of each layer becomes the input to the next layer [Hinton et al., 2006]. See video starting at second 45 https://www.coursera.org/course/neuralnets 0 1 0 1 0 1 0 1 0 1 0 1 Input data Hidden 0 1 0 1 0 1 0 1 0 1 0 1 Hidden layer becomes input data of next layer.
  • 40. Computer vision, scaling up Unsupervised learning was scaled up by Honglak Lee et al. [2009] to learn high-level visual features. [Lee et al., 2009] Further scaled up by Quoc Le et al. [2012]. • Used 1,000 machines (16,000 cores) running for 3 days to train 1 billion weights by watching YouTube videos. • The network learned to identify cats. • The network wasn’t told to look for cats; it naturally learned that cats were integral to online viewing. • Video on the topic at NYT http://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-computers-evidence-of-machine-learning.html
  • 41. Why is this significant? To have a grounded understanding of its environment, an agent must be able to acquire representations through experience [Pierce et al., 1997; Mugan et al., 2012]. Without a grounded understanding, the agent is limited to what was programmed in. We saw that unsupervised learning could be used to learn the meanings of words, grounded in the experience of reading. Using these deep Boltzmann machines, machines can learn to see the world through experience.
  • 42. CNNs: Limit connections and duplicate parameters Convolutional neural networks (CNNs) build in a kind of feature invariance. 1. Convolution layers • Bank of feature detectors • Different feature detectors slide over the image 2. Sub-sampling layers • The next layer pools from a region on the layer below With the layers and topology, our networks are starting to look a little like the visual cortex. Although, we still don’t fully understand the visual cortex. Modern image processing systems have many such layer pairs.
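The two layer types on this slide can be sketched in plain Python. The image and the single edge-detecting filter below are invented for illustration; a real CNN learns a whole bank of filters. As in most deep learning libraries, the "convolution" here slides the filter without flipping it.

```python
def convolve2d(image, kernel):
    """Slide one feature detector over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool(fmap, size=2):
    """Sub-sample: keep the largest value in each size-by-size region."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]

# Hypothetical 4x5 image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(4)]
edge_detector = [[-1, 1]]               # responds to a dark-to-bright step

feature_map = convolve2d(image, edge_detector)
pooled = max_pool(feature_map)
print(feature_map)
print(pooled)
```

The same filter weights are reused at every image position, which is the "duplicate parameters" part, and pooling makes the response less sensitive to exactly where the edge falls.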
  • 43. More recent deep vision networks ImageNet http://www.image-net.org/ is a huge collection of images corresponding to the nouns of the WordNet hierarchy. There are hundreds to thousands of images per noun. 2012 – Deep Learning begins to dominate image recognition Krizhevsky et al. [2012] got 16% error on recognizing objects, when before the best error was 26%. They used a convolutional neural network (CNN). 2015 – Deep Learning surpasses human level performance He et al. [2015] surpassed human level performance on recognizing images of objects.* Computers seem to have an advantage when the classes of objects are fine grained, such as multiple species of dogs. *But deep learning can be easily fooled [Nguyen et al., 2014]. Enlightening video at https://www.youtube.com/watch?v=M2IebCN9Ht4.
  • 44. Talk Outline • Introduction • Deep learning and natural language processing • Deep learning and computer vision • Deep learning and robot actions • What deep learning still can’t do • Practical ways you can get started • Conclusion • About DeepGrammar (4 minutes, if time)
  • 46. A stamping in of behavior When we think of doing things, we think of conscious planning with System 2. Imagine trying to get to Seattle. • Get to the airport. How? Take a taxi. How? Call a taxi. How? Find my phone. • Some behaviors arise more from a gradual stamping in [Thorndike, 1898]. • Became the study of Behaviorism [Skinner, 1953] (see Skinner box on the right). • Formulated into artificial intelligence as Reinforcement Learning [Sutton and Barto, 1998]. By Andreas1 (Adapted from Image:Boite skinner.jpg) [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0/)], via Wikimedia Commons A “Skinner box”
  • 47. Beginning with random exploration In reinforcement learning, the agent begins by randomly exploring until it reaches its goal.
  • 48. Reaching the goal • When it reaches the goal, credit is propagated back to its previous states. • The agent learns the function Q(s,a), which gives the cumulative expected discounted reward of being in state s and taking action a and acting according to the policy thereafter.
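One common way to learn Q(s,a) is the tabular Q-learning update, sketched below. The talk does not name a specific algorithm, so the choice of Q-learning, the learning rate alpha, the discount gamma, and the tiny three-state corridor are all assumptions for illustration. Note how reward received at the goal leaks back to the state before it on a later update.

```python
def q_update(Q, s, a, reward, s_next, actions, alpha=0.5, gamma=0.9):
    """Move Q(s, a) toward the reward plus the discounted best value of the next state."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (reward + gamma * best_next - old)

actions = ["left", "right"]
Q = {}  # unseen (state, action) pairs are treated as having value 0

# Hypothetical corridor: states 0 -> 1 -> 2, with the goal (reward 1) at state 2.
q_update(Q, 1, "right", 1.0, 2, actions)   # the agent stumbles onto the goal
q_update(Q, 0, "right", 0.0, 1, actions)   # credit propagates back one state

print(Q)
```

Repeating such updates over many episodes spreads the goal's value back through every state that leads to it.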
  • 49. Eventually, the agent learns the value of being in each state and taking each action and can therefore always do the best thing in each state. Learning the behavior
  • 50. Playing Atari with deep learning Input, last four frames, where each frame is downsampled to 84 by 84 pixels. [Mnih et al., 2013] represent the state-action value function as a convolutional neural network. Outputs: Value of moving left, Value of moving right, Value of shooting, Value of reloading. In [Mnih et al., 2013], this is actually three hidden layers. See some videos at http://mashable.com/2015/02/25/computer-wins-at-atari-games/
  • 51. Talk Outline • Introduction • Deep learning and natural language processing • Deep learning and computer vision • Deep learning and robot actions • What deep learning still can’t do • Practical ways you can get started • Conclusion • About DeepGrammar (4 minutes, if time)
  • 53. What deep learning still can’t do System 1: Fast and Parallel • With computer vision, we seem to be on the right track • Reinforcement learning is useful in increasingly large worlds System 2: Slow and Serial • Still lacking in common sense • Language processing needs a grounded understanding
  • 54. Limitations of deep learning The encoded meaning is grounded with respect to other words. There is no linkage to the physical world. "ICubLugan01 Reaching". Licensed under CC BY-SA 3.0 via Wikipedia - https://en.wikipedia.org/wiki/File:ICubLugan01_Reaching.png#/media/File:ICubLugan01_Reaching.png The iCub http://www.icub.org/
  • 55. Limitations of deep learning Bob went home. Tim went to the junkyard. Bob picked up the jar. Bob went to town. Where is the jar? A: town Deep learning has no understanding of what it means for the jar to be in town. For example that it can’t also be at the junkyard. Or that it may be in Bob’s car, or still in his hands. The encoded meaning is grounded with respect to other words. There is no linkage to the physical world.
  • 56. Limitations of deep learning Imagine a dude standing on a table. How would a computer know that if you move the table you also move the dude? Likewise, how could a computer know that it only rains outside? Or, as Marvin Minsky asks, how could a computer learn that you can pull a box with a string but not push it?
  • 57. Limitations of deep learning Imagine a dude standing on a table. How would a computer know that if you move the table you also move the dude? Likewise, how could a computer know that it only rains outside? Or, as Marvin Minsky asks, how could a computer learn that you can pull a box with a string but not push it? No one knows how to explain all of these situations to a computer. There are just too many variations. A robot can learn through experience, but it must be able to efficiently generalize that experience.
  • 58. Talk Outline • Introduction • Deep learning and natural language processing • Deep learning and computer vision • Deep learning and robot actions • What deep learning still can’t do • Practical ways you can get started • Conclusion • About DeepGrammar (4 minutes, if time)
  • 60. Best learning resources Hinton’s Coursera Course. Get it right from the horse’s mouth. He explains things well. https://www.coursera.org/course/neuralnets Online textbook in preparation for deep learning from Yoshua Bengio and friends. Clear and understandable. http://www.deeplearningbook.org/ TensorFlow tutorials. https://www.tensorflow.org/versions/r0.11/tutorials/index.html Deep Learning by Google https://www.udacity.com/course/deep-learning--ud730
  • 61. Talk Outline • Introduction • Deep learning and natural language processing • Deep learning and computer vision • Deep learning and robot actions • What deep learning still can’t do • Practical ways you can get started • Conclusion • About DeepGrammar (4 minutes, if time)
  • 63. Conclusion System 1: Fast and Parallel. With deep learning, this is now our strength. System 2: Slow and Serial. Before deep learning, this was the strength of AI. Time to get back to work on this.
  • 64. Talk Outline • Introduction • Deep learning and natural language processing • Deep learning and computer vision • Deep learning and robot actions • What deep learning still can’t do • Practical ways you can get started • Conclusion • About DeepGrammar (4 minutes, if time)
  • 67. The importance of finding dumb mistakes
  • 68. The importance of finding dumb mistakes
  • 69. Thanks for listening Jonathan Mugan @jmugan www.deepgrammar.com