Deep Learning for AI
Yoshua Bengio
October 17th, 2018
Activate Conference Keynote, Montréal
Deep Learning: Underlying Assumption
• There are principles giving rise to intelligence (machine, human or animal) via learning, simple enough that they can be described compactly, similarly to the laws of physics, i.e., our intelligence is not just the result of a huge bag of tricks and pieces of knowledge, but of general mechanisms to acquire knowledge.
Intelligence Needs Knowledge
• Learning: a powerful way to transfer knowledge to intelligent agents
• Failure of classical AI: a lot of knowledge is intuitive
• Solution: get knowledge from data & experience
[Diagram: nested fields: Deep Learning is a subset of Machine Learning, which is a subset of Artificial Intelligence]
Deep Learning AI Breakthroughs
Computers have made huge strides in perception, manipulating language, playing games, reasoning, ...
2010-2012: breakthrough in speech recognition (Source: Microsoft)
Convolutional Neural Networks
• A special kind of deep learning tailored for images
• Exploits invariance to translations
• Exploits multi-scale hierarchy
[Figure: convolutional neural network for imaging data]
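To make these two properties concrete, here is a minimal sketch (my own PyTorch illustration, not code from the talk; the layer sizes are assumptions): the same small filters slide over every image location, giving translation invariance, while pooling between convolution stages builds the multi-scale hierarchy.

```python
# Minimal convolutional network (PyTorch sketch; sizes are illustrative).
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            # The same 3x3 filters apply at every location: translation invariance.
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),   # halve the resolution: a coarser scale
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),   # coarser still: the multi-scale hierarchy
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x)               # (B, 32, 8, 8) for 32x32 RGB inputs
        return self.classifier(h.flatten(1))

logits = SmallConvNet()(torch.randn(4, 3, 32, 32))  # batch of 4 fake images
```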
Learning Multiple Levels of Abstraction
• The big payoff of deep learning is that it allows learning higher levels of abstraction
• Higher-level abstractions disentangle the factors of variation, which allows much easier generalization and transfer
(Bengio & LeCun 2007)
Medical Image Classification: Detecting Cancer Cells by Deep Learning @ Montreal
Accuracy:
• Imagia: > 90%, real time
• GI Experts (Key Opinion Leaders)*: ~ 90%
• GI Doctors Trained by KOLs*: ~ 75%
*(D. Rex, 2015)
Creative AI: Generative Adversarial Networks
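For readers unfamiliar with GANs, a hedged sketch of the adversarial idea follows (my illustration, not from the slides; architectures and hyperparameters are arbitrary assumptions): a generator maps random latent noise to samples, a discriminator tries to tell real data from generated data, and the two are trained against each other.

```python
# Minimal GAN training step (PyTorch sketch; all sizes and learning
# rates are arbitrary assumptions for illustration).
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 784
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real: torch.Tensor) -> None:
    batch = real.size(0)
    fake = G(torch.randn(batch, latent_dim))    # generator: noise -> sample
    # Discriminator: push real toward 1, generated toward 0.
    d_loss = bce(D(real), torch.ones(batch, 1)) + \
             bce(D(fake.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator: try to make the discriminator output 1 on fakes.
    g_loss = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

train_step(torch.randn(32, data_dim))  # one adversarial update on fake "data"
```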
Coming Deep Learning Revolution in Robotics
(& Mobile Robotics)
Groups of Pieter Abbeel & Sergey Levine @ Berkeley
Big Success of Deep Learning in NLP:
Neural Machine Translation
• Incorporating the idea of attention, using gating units, has unlocked a breakthrough in machine translation: Neural Machine Translation
• Now in Google Translate
[Figure: attention as a softmax over lower-level locations, conditioned on context at the lower and higher levels; human-evaluation chart comparing human translation, n-gram translation, and current neural net translation (ICLR’2015)]
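A rough sketch of what the figure describes (my illustration; the original ICLR’2015 model used an additive scoring network and gated recurrent units, while this minimal version uses dot-product scores for brevity): a softmax over source positions, conditioned on the decoder's current state, softly selects where to look.

```python
# Content-based attention sketch (PyTorch): a softmax over lower-level
# (source) positions, conditioned on the higher-level (decoder) state.
# Dot-product scoring is an assumption made for brevity.
import torch
import torch.nn.functional as F

def attend(decoder_state: torch.Tensor, encoder_states: torch.Tensor):
    # decoder_state: (B, d); encoder_states: (B, T, d)
    scores = torch.bmm(encoder_states, decoder_state.unsqueeze(2)).squeeze(2)  # (B, T)
    weights = F.softmax(scores, dim=1)     # soft choice over source positions
    context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)       # (B, d)
    return context, weights

context, weights = attend(torch.randn(2, 64), torch.randn(2, 7, 64))
```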
Machine Learning, AI
& No Free Lunch
• Five key ingredients for ML towards AI
1. Lots & lots of data
2. Very flexible models
3. Enough computing power
4. Computationally efficient inference
5. Powerful priors that can defeat the curse of dimensionality
Why does Deep Learning Work?
Bypassing the curse of dimensionality
• We need to build compositionality into our ML models, just as human languages exploit compositionality to give representations and meanings to complex ideas
• Exploiting compositionality gives an exponential gain in representational power
• Distributed representations / embeddings: feature learning
• Deep architecture: multiple levels of feature learning
• Prior assumption: compositionality is useful to describe the world around us efficiently
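A toy illustration of that exponential gain (my example, not from the talk): k binary features in a distributed representation can distinguish 2^k configurations, whereas a non-distributed one-hot code needs one unit per configuration.

```python
# Toy comparison of distributed vs. one-hot (local) codes.
from itertools import product

k = 10
distributed_codes = list(product([0, 1], repeat=k))  # k units -> 2**k codes
print(len(distributed_codes))                        # 1024 configurations
# A one-hot code would need 2**k = 1024 units to name the same set:
assert len(distributed_codes) == 2 ** k
```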
Deep Representations: The Power of
Compositionality
• Learned function seen as a composition of simpler operations, e.g. inspired by neural computation
• Hierarchy of features and concepts, leading to more abstract factors enabling better generalization
• Again, theory shows this can be exponentially advantageous
Why multiple layers? The world is compositional
Latent Variables and Abstract
Representations to Disentangle Manifolds
• Encoder/decoder view: maps between low & high levels
• Encoder does inference: interprets the data at the abstract level
• Decoder can generate new configurations
• Encoder flattens and disentangles the data manifold
[Figure: encoder Q(h|x) maps the data space into an abstract representation space with prior P(h); decoder P(x|h) maps back to the data space]
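In code, the encoder/decoder view might look like the following minimal sketch (my PyTorch illustration; a deterministic autoencoder stands in for the probabilistic Q(h|x) and P(x|h), and all dimensions are assumptions):

```python
# Encoder/decoder sketch (PyTorch). A deterministic autoencoder stands
# in for Q(h|x) and P(x|h); dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

data_dim, latent_dim = 784, 32
encoder = nn.Sequential(nn.Linear(data_dim, 256), nn.ReLU(),
                        nn.Linear(256, latent_dim))   # inference: Q(h|x)
decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                        nn.Linear(256, data_dim))     # generation: P(x|h)

x = torch.randn(8, data_dim)   # a batch in data space
h = encoder(x)                 # abstract representation space
x_hat = decoder(h)             # a generated/reconstructed configuration
loss = F.mse_loss(x_hat, x)    # reconstruction objective
```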
Why Generative Models with a Latent Space?
“What I cannot create, I do not understand”
– Richard Feynman
[Figure: inference into the latent space, generation from it, and interpolation between latent codes; latent factors such as age and gender]
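Interpolation, one of the operations in the figure, can be sketched under the same assumptions (hypothetical encoder/decoder, redefined so the snippet runs on its own): decode points along the line between two latent codes to generate in-between samples.

```python
# Latent-space interpolation sketch (PyTorch). The encoder/decoder here
# are hypothetical stand-ins, redefined so the snippet is self-contained.
import torch
import torch.nn as nn

encoder = nn.Linear(784, 32)   # inference: x -> h
decoder = nn.Linear(32, 784)   # generation: h -> x

def interpolate(x_a: torch.Tensor, x_b: torch.Tensor, steps: int = 8) -> torch.Tensor:
    h_a, h_b = encoder(x_a), encoder(x_b)   # infer the two latent codes
    return torch.stack([decoder((1 - t) * h_a + t * h_b)  # walk the latent line
                        for t in torch.linspace(0.0, 1.0, steps)])

samples = interpolate(torch.randn(1, 784), torch.randn(1, 784))  # (8, 1, 784)
```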
Still Far from Human-Level AI
• Industrial successes are mostly based on supervised learning, which requires lots of human-labeled data implicitly defining the relevant high-level abstractions
• Networks learn relatively superficial clues and sometimes do not generalize well outside of their training contexts; it is easy to fool trained networks
Humans Outperform Machines at Unsupervised Learning
• Humans are very good at unsupervised learning, e.g. a 2-year-old knows intuitive physics
• Babies construct an approximate but sufficiently reliable model of physics. How do they manage that? Note that they interact with the world, not just observe it.
Short-Term Expectations
• Even without scientific breakthroughs in the short term, current ML science can deliver immense value thanks to:
  • larger datasets
  • better engineering
  • better hardware (energy consumption + speed):
    • for inference: to democratize applications
    • for training: for smarter models
• Examples of application areas: medicine, factory automation, transportation, agriculture, molecule design, personal assistants, better translation and speech recognition & synthesis, etc.
AI: The Upcoming Industrial Revolution
Upcoming industrial revolution:
• Machines extending humans’ cognitive power
• Large growth in the AI sector
• All sectors of the economy affected
• Increasing total GDP by 14% by 2030 = 16 trillion USD in value
Accenture 2018: AI is the Future of Growth, see https://goo.gl/v1GWVM.
Embracing AI in Industry
• Not just a question of competitiveness, but a question of survival
• Main demand from industry: deep learning talent
Data is the New Oil
• Because AI is based on ML, successful AI applications require DATA – lots of data
• The first questions to ask in any project: what data is available, what data is needed, do we need to collect more, do we need to label it?
Montreal’s AI Ecosystem
+ Mila: greatest concentration of academic researchers in deep learning
+ Rapidly growing research hub, major international recognition (topping recent citation counts in CS)
+ Local university + industry collaborations
+ Hundreds of M$ invested by governments
+ Rapidly growing AI startup ecosystem (200M$ invested in the last 2 years, expected to triple in the next 1-2 years)
Entrepreneurs & AI
• Start thinking about how ML could be exploited to improve current services & products, and create new products and services
• Need a data strategy
• Need talent (that is tough), but smart engineers can learn the skills if given time and resources
• Connect with academic researchers and take advantage of training events & precompetitive research at institutes like Mila
AI: Hopes & Dangers
• Hopes
– Economic growth
– Material progress for all
– Improving healthcare
– Improving education and other services (e.g. legal)
– Freedom from work as slavery
• Dangers
– Big Brother and killer robots
– Misery for jobless people, at least in transition
– Manipulation from advertising
– Reinforcement of social biases and discrimination
– Increased inequality and power concentration
