DEEP LEARNING PRIMER
Maurizio Caló Caligaris
Presented at NYU Center For Genomics
A First Principles Approach
CS 229 Machine Learning
ABOUT ME
ufldl.stanford.edu/?people
THE RISE OF DEEP LEARNING
GOOGLE TRENDS
THE RISE OF DEEP LEARNING
“We will move from a mobile-first world to an AI-first world”
- Sundar Pichai, Google letter to shareholders
THIS TALK
Demystify deep learning. Provide a simple way of approaching the subject at a high level, from the ground up.
GOAL
THIS TALK
Accessible to people with little or no background in machine learning
INTENDED AUDIENCE
(experts in the field can hopefully learn something too)
TALK OUTLINE
• Preliminaries
• Machine learning
• Neural networks
• Bias / variance tradeoff
• The case for deep learning
• Why now
• State-of-the-art + trends
• FAQS
The ability of computers to learn from experience and understand the world in terms of a hierarchy of concepts, with each concept defined in terms of its relation to simpler concepts.
DEEP LEARNING: BASIC DEFINITION
Without the need for human operators to formally
specify all the knowledge the computer needs
SUBSET OF MACHINE LEARNING
http://www.deeplearningbook.org/
MACHINE LEARNING is a type of ARTIFICIAL INTELLIGENCE that gives computers the ability to learn without being explicitly programmed.
MACHINE LEARNING: BASIC DEFINITION
http://www.deeplearningbook.org/
MACHINE LEARNING
“JUST X → Y”
Approximate a mapping f from input X to output Y, based on some sample data.
A SIMPLE WAY TO THINK ABOUT
SUPERVISED LEARNING
MACHINE LEARNING
“JUST X → Y”
A SIMPLE WAY TO THINK ABOUT
SUPERVISED LEARNING
Problem | X | Y
Housing price prediction | size (sq. ft), location | price (e.g., $35,000)
Spam detection | email | spam / not spam
Product recommendations | product and user features | P(purchase)
Loan approval | loan application | will they pay? (0 or 1)
Preventive maintenance | sensor readings from planes / hard disks | is it about to fail?
http://cs229.stanford.edu/notes/cs229-notes1.pdf
A SIMPLE X → Y: REGRESSION
Input (the features) and output are both real numbers.
http://cs229.stanford.edu/notes/cs229-notes1.pdf
A SIMPLE X → Y: LINEAR REGRESSION
Predict $\hat{y} = w^\top x$, where the weights $w$ are chosen so as to minimize the sum of squared errors, $\sum_i (w^\top x^{(i)} - y^{(i)})^2$ (plus a regularization term).
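To make the formula concrete, here is a minimal sketch of regularized least squares (ridge regression) in Python with numpy; the housing numbers and the regularization strength are made up purely for illustration.

```python
import numpy as np

# Toy data: predict housing price from size (sq. ft) and a location score.
# All numbers are made up for illustration.
X = np.array([[1400.0, 3.0],
              [2100.0, 4.5],
              [800.0, 2.0],
              [3000.0, 5.0]])
y = np.array([250_000.0, 420_000.0, 130_000.0, 560_000.0])

# Prepend a column of ones so the intercept is learned as just another weight.
Xb = np.hstack([np.ones((X.shape[0], 1)), X])

# Ridge: minimize sum of squared errors + lam * ||w||^2,
# which has the closed form w = (Xb^T Xb + lam * I)^{-1} Xb^T y.
lam = 1e-3
I = np.eye(Xb.shape[1])
I[0, 0] = 0.0  # conventionally, don't regularize the intercept
w = np.linalg.solve(Xb.T @ Xb + lam * I, Xb.T @ y)

y_hat = Xb @ w  # predict: y_hat = w^T x
print("weights:", w)
print("sum of squared errors:", np.sum((y_hat - y) ** 2))
```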
A SIMPLE X → Y: (BINARY) CLASSIFICATION
THE OUTPUT IS EITHER 0 OR 1
[Diagram: input features such as presentation of fetus, presence of uterine scar, placenta previa, and maternal disease feed into a single prediction: c-section (0) or natural birth (1).]
http://www.deeplearningbook.org/
http://cs229.stanford.edu/notes/cs229-notes1.pdf
A SIMPLE X → Y: LOGISTIC REGRESSION
Input: a feature vector $x$. Multiply each feature by some weight, then map the result smoothly to (0, 1):
$h_w(x) = \sigma(w^\top x) = \frac{1}{1 + e^{-w^\top x}}$
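A minimal sketch of logistic regression trained by gradient descent on made-up data, showing the two steps above: multiply each feature by a weight, then squash the sum smoothly into (0, 1).

```python
import numpy as np

def sigmoid(z):
    # Maps any real number smoothly into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary data: two made-up features -> label 0 (c-section) or 1 (natural birth).
X = np.array([[0.2, 1.0], [0.9, 0.1], [0.4, 0.8], [0.8, 0.3]])
y = np.array([1.0, 0.0, 1.0, 0.0])

w = np.zeros(X.shape[1])
b = 0.0
lr = 0.5
for _ in range(1000):
    p = sigmoid(X @ w + b)           # weighted sum of features, squashed to (0, 1)
    grad_w = X.T @ (p - y) / len(y)  # gradient of the average log loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

print("predicted P(y = 1):", sigmoid(X @ w + b).round(2))
```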
X → Y: NON-LINEAR RELATIONSHIPS
What if the sample data looks like this?
NEURAL NETWORKS
Can learn non-linear relationships in data (universality theorem).
Trained using back-propagation.
https://medium.com/@ageitgey/machine-learning-is-fun-part-2-a26a10b68df3
http://neuralnetworksanddeeplearning.com/chap4.html
NEURAL NETWORKS
Each unit computes $h_{W,b}(x) = f(W^\top x + b)$, or in vector form $a = f(Wx + b)$, where $f$ is a nonlinearity such as the sigmoid $f(z) = \frac{1}{1 + e^{-z}}$.
http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
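As a sketch of both claims (non-linear relationships, trained by back-propagation), here is a tiny one-hidden-layer network fit to XOR, a relationship no linear model can capture; the sizes, seed, and learning rate are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: the classic non-linear relationship.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

# One hidden layer: h = tanh(W1 x + b1), y_hat = sigmoid(W2 h + b2)
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(5000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # Backward pass: back-propagate the log-loss gradient
    dlogits = (p - y) / len(X)
    dW2 = h.T @ dlogits; db2 = dlogits.sum(0)
    dh = (dlogits @ W2.T) * (1.0 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dh; db1 = dh.sum(0)
    # Gradient descent step
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(p.round(2))  # should approach [[0], [1], [1], [0]]
```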
PERFECTLY FITTING SAMPLE DATA: GOOD OR BAD?
ASIDE: BIAS-VARIANCE TRADE-OFF
https://www.quora.com/What-is-the-best-way-to-explain-the-bias-variance-trade-off-in-laymens-terms
Want our models to generalize to data we haven’t seen.
(One of the most important ideas in machine learning.)
http://cs229.stanford.edu/notes/cs229-notes4.pdf
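A small demonstration of the trade-off: fitting polynomials of increasing degree to noisy synthetic data (all made up here) drives the training error toward zero while the error on held-out points grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples from a simple underlying trend.
x = np.sort(rng.uniform(0.0, 1.0, 20))
y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.2, size=20)
x_train, y_train = x[::2], y[::2]  # half the points for fitting
x_val, y_val = x[1::2], y[1::2]    # the rest held out

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    val_mse = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    # A high degree fits the training points almost perfectly but generalizes badly.
    print(f"degree {degree}: train MSE {train_mse:.4f}, validation MSE {val_mse:.4f}")
```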
ASIDE: MODEL GENERALIZABILITY: TRAINING AND VALIDATION SETS
Want our models to generalize to data we haven’t seen.
Leave out some of the data to evaluate performance on (called the “validation” set).
k-fold cross-validation: do this over multiple rounds, each time leaving out a different subset of the sample data.
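A minimal sketch of the k-fold splitting logic (indices only; fitting and scoring a model on each round is omitted):

```python
import numpy as np

def k_fold_indices(n, k, seed=0):
    # Shuffle the indices once, then split them into k roughly equal folds.
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

# Each round, one fold is held out as the validation set;
# the model is trained on the remaining k-1 folds.
n, k = 100, 5
for i, val_idx in enumerate(k_fold_indices(n, k)):
    train_idx = np.setdiff1d(np.arange(n), val_idx)
    print(f"round {i}: train on {len(train_idx)} examples, validate on {len(val_idx)}")
```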
MACHINE LEARNING: JUST CURVE FITTING?
Except we don’t call it that. We wouldn’t get any funding if we did. Machine learning sounds cooler and impresses people.
MACHINE LEARNING: JUST CURVE FITTING?
Finding good feature representations is often the biggest challenge! The performance of ML algorithms depends heavily on the presentation of the data they are given.
JOKING ASIDE
FEATURE REPRESENTATION matters. A LOT.
A SIMPLE CHALLENGE
Compute CXMCXI times II (in Roman numerals)
A SIMPLE CHALLENGE
Compute 111,111 times 2
(way easier)
ANOTHER CHALLENGE
Which of these contains a human face?
http://neuralnetworksanddeeplearning.com/chap1.html#toward_deep_learning
ANOTHER CHALLENGE
Does this contain a human face?
http://neuralnetworksanddeeplearning.com/chap1.html#toward_deep_learning
WHAT WE SEE VS. WHAT COMPUTERS SEE
https://medium.com/@ageitgey/machine-learning-is-fun-part-3-deep-learning-and-convolutional-neural-networks-f40359318721
CHOICE OF FEATURE REPRESENTATION MATTERS. A LOT.
[Scatter plot: raw pixel values, pixel 1 vs. pixel 2, with cesarean delivery (+) and natural birth (−) examples thoroughly intermingled.]
Individual pixels in an MRI are not correlated with the desired outcome.
CHOICE OF FEATURE REPRESENTATION MATTERS
[Scatter plot: the same examples in a derived feature space, feat 1 vs. feat 2, where the two classes separate cleanly.]
We want feature representations under which the classes become separable. With raw MRI input, a doctor must tell the system which features matter: presence of uterine scar, presentation of fetus, placenta previa, maternal disease, primiparity, twins, etc.
FEATURE ENGINEERING:
COMPUTER VISION
SIFT
HoG
TEXTONS
RIFT
GLOH
FEATURE ENGINEERING:
NATURAL LANGUAGE PROCESSING
PARSER FEATURES
STEMMING
ONTOLOGIES
PART OF SPEECH
ANAPHORA
FEATURE ENGINEERING
• Time-consuming
• Domain and task-specific
• Requires human experts
• Lots of trial and error
TRADITIONAL APPROACH
THE CASE FOR DEEP LEARNING: AUTOMATIC FEATURE REPRESENTATION
Automatically learns feature representations from data (in terms of a hierarchy of concepts, with each concept defined in terms of its relation to simpler concepts).
Without the need for human operators to formally
specify all the knowledge the computer needs
THE CASE FOR DEEP LEARNING: INTUITIVE EXPLANATION
Note: not a realistic approach (just to develop intuition)
http://neuralnetworksanddeeplearning.com/chap1.html#toward_deep_learning
THE CASE FOR DEEP LEARNING: INTUITIVE EXPLANATION
Split problems into subproblems (a hierarchy of concepts).
http://neuralnetworksanddeeplearning.com/chap1.html#toward_deep_learning
THE CASE FOR DEEP LEARNING: EXAMPLE OUTPUT OF A DEEP NETWORK
http://www.cs.toronto.edu/~rgrosse/icml09-cdbn.pdf
THE CASE FOR DEEP LEARNING: LEARNING FROM UNLABELED DATA
Labeled data is expensive, whereas vast amounts of unlabeled data are freely available on the web.
Train a network where output = input.
Learn to see the world the way human babies do (exploration, as opposed to learning from labeled examples).
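“Train a network where output = input” describes an autoencoder. Here is a minimal linear autoencoder sketch on synthetic data: squeezing the input through a narrow hidden layer forces the network to learn a compact feature representation without any labels. All sizes, rates, and data here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up unlabeled data with hidden low-dimensional structure:
# the last 4 columns are linear combinations of the first 4.
X = rng.normal(size=(200, 8))
X[:, 4:] = X[:, :4] @ rng.normal(size=(4, 4))

# Autoencoder: encode 8 inputs down to 4 hidden units, decode back to 8.
W_enc = rng.normal(scale=0.1, size=(8, 4))
W_dec = rng.normal(scale=0.1, size=(4, 8))

def recon_mse():
    return np.mean((X @ W_enc @ W_dec - X) ** 2)

print("reconstruction MSE before:", round(recon_mse(), 4))
lr = 0.01
for _ in range(2000):
    H = X @ W_enc        # encode: the learned feature representation
    err = H @ W_dec - X  # decode, then compare the output to the input itself
    # Gradient steps on the per-example squared reconstruction error
    g_dec = (H.T @ err) / len(X)
    g_enc = (X.T @ (err @ W_dec.T)) / len(X)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
print("reconstruction MSE after: ", round(recon_mse(), 4))
```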
GOOGLE BRAIN
THE CASE FOR DEEP LEARNING: LEARNING FROM UNLABELED DATA
1 billion connections
10 million 200x200 px YouTube images (unlabeled)
1,000 machines (16,000 cores)
https://static.googleusercontent.com/media/research.google.com/en//archive/unsupervised_icml2012.pdf
Trained to predict itself (input = output)
GOOGLE BRAIN: RESULTS
THE CASE FOR DEEP LEARNING: LEARNING FROM UNLABELED DATA
The computer learned the concepts of “cat” and “person” without being explicitly told.
THE CASE FOR DEEP LEARNING: STATE-OF-THE-ART RESULTS ON STANDARD BENCHMARKS
ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
THE CASE FOR DEEP LEARNING: STATE-OF-THE-ART RESULTS ON STANDARD BENCHMARKS
Speech recognition (TIMIT dataset)
Performance stagnated in the 2000s. The introduction of deep learning (2009) resulted in sudden improvements; some error rates were halved.
THE CASE FOR DEEP LEARNING: REMARKABLE PERFORMANCE
Andrew Ng, NIPS 2016
THE CASE FOR DEEP LEARNING: REMARKABLE PERFORMANCE (CAVEAT)
The gains come given enough data; with small datasets, deep networks do not necessarily outperform shallow algorithms.
Andrew Ng, NIPS 2016
WHY IS DEEP LEARNING TAKING OFF NOW?
WHY NOW
THE FUEL: scale of data
THE ENGINE: computing infrastructure
https://www.quora.com/What-does-Andrew-Ng-think-about-Deep-Learning
THE ENGINE
• GPUs
• Computation distributed across several machines
• Software frameworks: Theano, Torch, PyLearn2, Caffe, TensorFlow
POSITIVE FEEDBACK LOOP
Lots of computational power → greater incentive to acquire more data → greater incentive to build bigger/faster networks
https://www.quora.com/What-does-Andrew-Ng-think-about-Deep-Learning
POSITIVE FEEDBACK LOOP
Efficient computing infrastructure → faster experiments (e.g. 1 day instead of 1 week) → speeds up innovation
https://www.quora.com/What-does-Andrew-Ng-think-about-Deep-Learning
THE CASE FOR DEEP LEARNING: BIAS-VARIANCE NOT AS MUCH OF A TRADE-OFF
Traditionally, reducing bias increases variance and vice versa. With deep learning there is at least one possible action item in each case: if the model underfits (high bias), train a bigger model; if it overfits (high variance), get more data.
Andrew Ng, NIPS 2016
THE CASE FOR DEEP LEARNING: RICH, COMPLEX INPUTS & OUTPUTS
Andrew Ng, NIPS 2016
THE CASE FOR DEEP LEARNING: RICH, COMPLEX INPUTS & OUTPUTS
Image super-resolution: downsampled image → natural, detailed version
THE CASE FOR DEEP LEARNING: RICH, COMPLEX INPUTS & OUTPUTS
Image-to-image translation
https://phillipi.github.io/pix2pix/
THE CASE FOR DEEP LEARNING: GENERALIZING ACROSS TASKS
Joint Many-Task (JMT) model: state-of-the-art results on multiple tasks from a single model:
- Chunking
- Dependency parsing
- Semantic relatedness
- Textual entailment
https://metamind.io/research/multiple-different-natural-language-processing-tasks-in-a-single-deep-model/
THE CASE FOR DEEP LEARNING: GENERALIZING ACROSS TASKS
https://blog.openai.com/unsupervised-sentiment-neuron/
THE CASE FOR DEEP LEARNING: GENERALIZING ACROSS MODALITIES
Generates sentence descriptions from images
Multimodal Recurrent Neural Network (Karpathy, 2014)
THE CASE FOR DEEP LEARNING: SINGLE LEARNING ALGORITHM HYPOTHESIS
Evidence from neuroscience: ferrets can learn to “see” with the auditory cortex if their brains are rewired to send visual signals to that area, i.e. the mammalian brain may use a single algorithm for many different tasks.
http://www.deeplearningbook.org/contents/intro.html
THE CASE FOR DEEP LEARNING: SINGLE LEARNING ALGORITHM HYPOTHESIS
Machine learning research is becoming less fragmented: NLP, VISION, MOTION PLANNING, and SPEECH increasingly share the same methods.
http://www.deeplearningbook.org/contents/intro.html
THE CASE FOR DEEP LEARNING: TOWARDS END-TO-END LEARNING?
Andrew Ng, NIPS 2016
THE CASE FOR DEEP LEARNING: TOWARDS END-TO-END LEARNING?
Andrew Ng, NIPS 2016
SUMMARY
• Machine learning: “just X → Y”
• Choice of feature representation matters
• Hand-engineering features is hard!
• Deep learning
  • Intuitive explanation of how deep networks can learn hierarchical feature representations, and why it works
  • No need for humans to formally specify knowledge
  • Works remarkably well in practice, due to:
    • Scale of computation
    • Scale of data
  • Can be successfully applied to an increasingly wide variety of complex tasks
QUESTIONS
CONTACT
Maurizio Caló Caligaris
cs.stanford.edu/~maurizio
maurizio@cs.stanford.edu
