Promises of Deep Learning
David Khosid
Sept. 1, 2015
Agenda
• Neural Networks (NN) training
• Deep Learning = NN + … + …
• Deep Learning (DL) projects
• Topics: HW, IoT
Easy for humans, hard for machines
Q: What is the goal of this talk?
To provide you with an intuitive understanding of what DL is
and why it works.
The needs:
• Need to perceive and understand the world
• Basic speech and vision capabilities
• Language understanding
How can we do this?
• We cannot write a separate algorithm for every task we want to accomplish
• We need general algorithms that learn from observations
Why is this hard?
You see this:
But the camera sees this:
Example: Handwritten digit recognition
• The goal: software that recognizes the digit in each image (a classifier)
• Source: “MNIST database of handwritten digits”, 60,000 training examples
• Typical human error: 2.5%. Common confusions: {2, 7} and {4, 9}
‘1’ versus ‘5’ – feature engineering
• Features (properties): ‘intensity’ and ‘symmetry’
x1 -> ‘Intensity’: average pixel value of the image
x2 -> ‘Symmetry’: ‘1’ is more symmetric than ‘5’
[Scatter plot of the two features: x1 (intensity) vs. x2 (symmetry)]
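To make this concrete, here is a minimal Python sketch of how the two features might be computed (the slide does not fix exact formulas; the averaging and the horizontal-mirror definition of symmetry are assumptions):

```python
import numpy as np

def intensity(img):
    """x1: average pixel value of the image."""
    return img.mean()

def symmetry(img):
    """x2: negative mean difference between the image and its
    horizontal mirror; a symmetric digit like '1' scores near 0."""
    return -np.abs(img - np.fliplr(img)).mean()

img = np.random.rand(28, 28)  # stand-in for a real 28x28 MNIST digit in [0, 1]
x = np.array([intensity(img), symmetry(img)])
print(x)  # the 2-D feature vector handed to the classifier
```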
Digit recognition – dream solution
• We are looking for features …
• If it is possible at all (I don’t know for sure), it requires
exceptional domain expertise.
Ideas for additional features:
- number of separate, connected regions of white pixels: 1, 2, 3, 5, 7 tend
to have one contiguous region of white space, while the loops in 6, 8, 9
create more (see the sketch below)
- ask experts 
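As an illustration of the connected-regions idea, a hedged sketch using SciPy’s connected-component labeling (the threshold and the dark-ink-on-white convention are assumptions; MNIST’s raw encoding is white ink on black, so the mask would be inverted there):

```python
import numpy as np
from scipy import ndimage

def count_white_regions(img, threshold=0.5):
    """Number of connected regions of white (background) pixels.
    Assumes dark ink on a white background (invert the mask for
    MNIST's white-on-black encoding)."""
    white = img > threshold
    _, n_regions = ndimage.label(white)
    return n_regions

stroke = np.ones((28, 28))
stroke[4:24, 13:15] = 0.0           # a bare stroke, like a '1'
print(count_white_regions(stroke))  # -> 1: the background stays connected

loop = np.ones((28, 28))
loop[6:22, 6:22] = 0.0              # a blob of ink ...
loop[10:18, 10:18] = 1.0            # ... with a hole, like the loop of a '0'
print(count_white_regions(loop))    # -> 2: outside plus the enclosed hole
```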
Do you like the process?
Traditional approach (Feature Engineering)
“Hand-Crafted Feature Engineering” Limitations
• Generalization:
• How to recognize handwritten text?
• Printed text in different fonts?
• Time-consuming (for the data scientist)
• Not scalable
• Can’t achieve human performance
DNN – Deep Neural Networks
Let's be inspired by nature, but not too much
• “Fly like a bird”
• The dream
• Aerodynamics. We figured out that feathers and flapping wings aren’t crucial
• Flight envelope: speed, altitude, etc.
• Brain Inspiration
[Figure: biological function vs. biological structure]
Deep Learning (Neural Networks)
Neuroscience: how does the cortex learn perception?
• Does the cortex “run” a single, general learning algorithm? (or a small
number of them)
Deep Learning addresses the problem of learning hierarchical
representations with a single algorithm
• or perhaps with a few algorithms
[Diagram: deep learning builds a feature hierarchy, from concrete (pixels) to abstract (object)]
The Neuron
• Different weights compute different functions
[Diagram: a neuron applies a nonlinearity F(.) to a weighted sum of its inputs]
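A minimal sketch of one artificial neuron in Python (the logistic sigmoid is just one common choice for F(.), assumed here for illustration):

```python
import numpy as np

def neuron(x, w, b):
    """Weighted sum of the inputs, passed through a nonlinearity F(.)
    (here the logistic sigmoid). Different weights -> different functions."""
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([0.8, 0.2, -0.5])   # weights
print(neuron(x, w, b=0.1))       # output in (0, 1)
```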
Architecture of NN/DL
Neural Networks: Architectures
ImageNet Large Scale Visual Recognition Challenge
“World Cup” for CV and ML
1,000 object classes
1.2M training images
Resolution: 256x256 pixels
Our NN:
• Input layer: 256×256 = 65,536 units
• Output layer: 1,000 classes
NN: Back-propagation
Learning algorithm
• while not done:
• pick a random training case (Xi, Yi)
• run the NN on input Xi
• modify the connection weights to make the prediction closer to Yi
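A toy NumPy version of this loop (a sketch, not the slide’s actual network: one hidden layer of sigmoid units, and made-up data standing in for pixels and labels):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up training cases: 4-D inputs standing in for pixels, binary labels.
X = rng.normal(size=(100, 4))
Y = (X[:, 0] + X[:, 1] > 0).astype(float)

# One hidden layer of 8 sigmoid units feeding one sigmoid output.
W1 = rng.normal(scale=0.5, size=(4, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=8);      b2 = 0.0
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for step in range(5000):
    i = rng.integers(len(X))            # pick a random training case (Xi, Yi)
    h = sigmoid(X[i] @ W1 + b1)         # run the NN on input Xi
    p = sigmoid(h @ W2 + b2)
    d_out = p - Y[i]                    # output error (cross-entropy gradient)
    d_h = d_out * W2 * h * (1 - h)      # back-propagate through the hidden layer
    W2 -= lr * d_out * h;   b2 -= lr * d_out         # modify connection weights
    W1 -= lr * np.outer(X[i], d_h); b1 -= lr * d_h   # ... toward the target Yi

acc = ((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5) == Y).mean()
print(f"training accuracy: {acc:.2f}")
```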
Learning Features
Q: What do the individual neurons
look for in an image?
DL Leaders
• Geoffrey Hinton: U. Toronto; Google (2013)
• Yann LeCun: NYU; Facebook (2013; 80% Facebook, 20% NYU)
• Yoshua Bengio: U. Montreal
• Andrew Ng: Stanford/Coursera; Google (2011); Baidu (2014)
• Jeff Dean: Google
DL: Rocket analogy
Face recognition error (smaller is better)
DL Architectures: Autoencoders
• Output is the same size as the input
• Target = input: the network learns to reconstruct what it sees
• High-dimensional data can be represented by a compact code
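A minimal linear-autoencoder sketch in NumPy showing the “target = input” idea (real autoencoders add nonlinearities; the sizes and data here are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random(size=(200, 64))        # 200 made-up "images", 64 pixels each

# Bottleneck 64 -> 8 -> 64; the output layer is the same size as the input.
W_enc = rng.normal(scale=0.1, size=(64, 8))
W_dec = rng.normal(scale=0.1, size=(8, 64))
lr = 0.05

for step in range(5000):
    x = X[rng.integers(len(X))]
    code = x @ W_enc                  # compact 8-D representation
    x_hat = code @ W_dec              # reconstruction
    err = x_hat - x                   # target = input: minimize ||x_hat - x||^2
    d_code = err @ W_dec.T
    W_dec -= lr * np.outer(code, err)
    W_enc -= lr * np.outer(x, d_code)

print("reconstruction error:", np.mean((X @ W_enc @ W_dec - X) ** 2))
```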
Projects: mining for structure
• Datasets, private and public:
• ImageNet
• YouTube as a data source
• Architectures
• RNN, ConvNet
• AlexNet
“Google Brain” (2012)
• The goal: improve DL networks so they can find deeper and more
meaningful patterns in data using less processing power.
• Famous for recognizing cats in YouTube videos 
• Architecture:
• Autoencoder
• 1 billion connections
• Training procedure (2012):
• Train on 10 million unlabeled images (YouTube)
• 1000 machines (16,000 cores) cluster for 1 week
• Training procedure (2015):
• 32 GPUs (HW cost ~$32,000)
Cat neuron
Le et al., “Building High-Level Features Using Large Scale Unsupervised Learning,” ICML 2012.
Deep Learning @Google
• Google has invested decades of person-years in building state-of-the-art infrastructure
• Leverage thousands of CPUs and GPUs to learn from billions of data
samples in parallel
• Publish frequently, and often place first in academic challenges in
image recognition, speech recognition, etc.
• Extensive and accelerating experience in using DL in real products:
47 production launches in the last 2 years.
• e.g. Photo search, Android speech recognition, StreetView, Ads placement...
Example of modern DL architecture:
GoogLeNet
@Facebook: FAIR
• FAIR=Facebook AI Research
• Recommended:
https://www.facebook.com/yann.lecun
https://research.facebook.com/ai
• DeepFace
• M Project
Microsoft: Skype Translator
https://www.youtube.com/watch?v=eu9kMIeS0wQ
New Human-Machine Interfaces
• Beyond Verbal Communication
• Emotions Analytics
Self-driving cars
• Mobileye
• Google
• Tesla
• Apple
Autonomous Driving, clip by Mobileye
https://www.youtube.com/watch?v=yjRtGKtwOlc
Risks: unknown “Failure Modes”
• We will use DL/AI without anybody fully understanding how it works
• Reminder: Human brain and DL are different
Reference: Nguyen et al., “Deep Neural Networks Are Easily Fooled: High Confidence Predictions for Unrecognizable Images,” CVPR 2015.
Risks: fooling NN
• These images are classified with >99.6% confidence as the shown
class by a Convolutional Network.
Is AI research safe?
• Social impact
• Employment impact
• Military usage
Risk: less privacy
Facebook’s Moments as an illustrative example:
“Today we launched a new app called Moments that helps you sync
photos with your friends. Moments recognizes which of your friends are
in the photos you take, and lets you share those photos with those
people in one tap. If you use it, your friends will sync to you a lot of the
photos of you they have hidden in their camera rolls.
This is a simple example of AI at work. By building a system that
learned to recognize people and objects in images, we could enable this
new service.”
Mark Zuckerberg’s blog, June 15, 2015
Hardware
• Nodes with 4 to 8 GPUs. Google has 10,000+ GPUs
• Google is building custom hardware, based on
FPGAs, to run its NNs. Microsoft also. Facebook?
• Mobileye: ConvNet chip for automotive
• Orcam: low-power ConvNet chip
Open-Source Frameworks for DL
• Torch7 (Lua): used by Facebook, Google, Twitter and Intel
• Caffe: the community shares models in the “Model Zoo”
• NVIDIA cuDNN: a library of DL primitives
Money
Example: DeepMind – 75 employees, no product, acquired for £400 million
Google AI and robotics purchases timeline
• October 1, 2012 – Viewdle – facial recognition
• March 12, 2013 – DNNresearch Inc. – deep neural networks
• April 23, 2013 – Wavii – natural language processing
• October 2, 2013 – Flutter – gesture recognition technology
• December 2, 2013 – Schaft – robotics, humanoid robots
• December 3, 2013 – Industrial Perception – robotic arms, computer vision
• December 4, 2013 – Redwood Robotics – robotic arms
• December 5, 2013 – Meka Robotics – robots
• December 6, 2013 – Holomni – robotic wheels
• December 7, 2013 – Bot & Dolly – robotic cameras
• December 10, 2013 – Boston Dynamics – robotics
• January 26, 2014 – DeepMind Technologies – artificial intelligence
• August 17, 2014 – Jetpac – artificial intelligence, image recognition
• October 23, 2014 – Dark Blue Labs – artificial intelligence
• October 23, 2014 – Vision Factory – artificial intelligence
Transfer learning + fine tuning
• “Training time” vs. “execution time”: they differ by 5 to 8 orders of magnitude
• DL could therefore be embedded in cars, IoT devices and smartphones
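A toy NumPy sketch of transfer learning with fine-tuning (the “pretrained” extractor here is a hypothetical random stand-in; in practice it would be a network trained at great cost, which is exactly why execution is so much cheaper than training):

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Frozen "pretrained" features: a random stand-in for layers learned
# elsewhere. No gradients flow into W_pre during fine-tuning.
W_pre = rng.normal(scale=0.5, size=(64, 16))
features = lambda x: sigmoid(x @ W_pre)

# New task with only 50 labeled examples: train just the output head.
X = rng.normal(size=(50, 64))
Y = (X[:, 0] > 0).astype(float)
w_head = np.zeros(16); b = 0.0
lr = 0.5

for step in range(2000):
    i = rng.integers(len(X))
    f = features(X[i])               # cheap forward pass through frozen layers
    p = sigmoid(f @ w_head + b)
    w_head -= lr * (p - Y[i]) * f    # logistic-regression update, head only
    b -= lr * (p - Y[i])
```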
Summary, Q&A
