3. “cat”
Deep Learning
Modern Reincarnation of Artificial Neural Networks
Collection of simple trainable mathematical units, organized in layers, that work together to solve complicated tasks (see the sketch after this slide)
Key Benefit
Learns features from raw, heterogeneous, noisy data
No explicit feature engineering required
What’s New
new network architectures, new training math, *scale*
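To picture the “simple trainable mathematical units, organized in layers” concretely, here is a minimal numpy sketch; the layer sizes and names are illustrative assumptions, not any particular architecture.

import numpy as np

rng = np.random.default_rng(0)

def unit(inputs, weights, bias):
    # One simple trainable unit: weighted sum of its inputs, then a nonlinearity.
    return max(0.0, float(np.dot(weights, inputs) + bias))

def layer(inputs, W, b):
    # A layer is just a collection of units all looking at the same inputs.
    return np.array([unit(inputs, W[i], b[i]) for i in range(len(b))])

x = rng.standard_normal(4)                          # raw input features
W1, b1 = rng.standard_normal((8, 4)), np.zeros(8)   # first layer: 8 units
W2, b2 = rng.standard_normal((2, 8)), np.zeros(2)   # second layer: 2 units
output = layer(layer(x, W1, b1), W2, b2)            # layers working together

Training would adjust W1, b1, W2, and b2 so that the output matches the task.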
7. Functions a Deep Neural Network Can Learn
Input: Pixels → Output: “lion”
Input: Audio → Output: “How cold is it outside?”
Input: “Hello, how are you?” → Output: “Bonjour, comment allez-vous?”
8. Functions a Deep Neural Network Can Learn
Input: Pixels → Output: “lion”
Input: Audio → Output: “How cold is it outside?”
Input: “Hello, how are you?” → Output: “Bonjour, comment allez-vous?”
Input: Pixels → Output: “A blue and yellow train travelling down the tracks”
15. 2008: Grand Engineering Challenges for 21st Century
● Make solar energy affordable
● Provide energy from fusion
● Develop carbon sequestration methods
● Manage the nitrogen cycle
● Provide access to clean water
● Restore & improve urban infrastructure
● Advance health informatics
● Engineer better medicines
● Reverse-engineer the brain
● Prevent nuclear terror
● Secure cyberspace
● Enhance virtual reality
● Advance personalized learning
● Engineer the tools for scientific discovery
www.engineeringchallenges.org/challenges.aspx
19. Current:
Solution = ML expertise + data + computation
Can we turn this into:
Solution = data + 100X computation
???
20. Idea: model-generating model trained via reinforcement learning
(1) Generate ten models
(2) Train them for a few hours
(3) Use the loss of the generated models as the reinforcement learning signal (see the sketch after this slide)
Neural Architecture Search with Reinforcement Learning, Zoph & Le, ICLR 2017
arxiv.org/abs/1611.01578
Neural Architecture Search
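A minimal numpy sketch of the loop on this slide, under heavy assumptions: the real Zoph & Le controller is a recurrent network that emits whole architecture descriptions, whereas here the search space is just two hyperparameters, train_and_evaluate is a made-up stand-in for actually training each generated model, and the controller is a pair of softmax distributions updated with REINFORCE.

import numpy as np

rng = np.random.default_rng(0)

# Toy search space: two architectural decisions (illustrative only).
choices = {"layers": [2, 4, 8], "width": [32, 64, 128, 256]}
logits = {k: np.zeros(len(v)) for k, v in choices.items()}   # controller parameters

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sample_architecture():
    arch, idx = {}, {}
    for k, options in choices.items():
        i = rng.choice(len(options), p=softmax(logits[k]))
        arch[k], idx[k] = options[i], i
    return arch, idx

def train_and_evaluate(arch):
    # Stand-in for "train the generated model for a few hours"; returns a fake
    # reward that happens to prefer 4 layers of width 128.
    return -abs(arch["layers"] - 4) / 4.0 - abs(arch["width"] - 128) / 128.0

lr, baseline = 0.5, 0.0
for step in range(30):
    batch = [sample_architecture() for _ in range(10)]    # (1) generate ten models
    rewards = [train_and_evaluate(a) for a, _ in batch]   # (2) "train" and score them
    baseline = 0.9 * baseline + 0.1 * float(np.mean(rewards))
    for (arch, idx), r in zip(batch, rewards):            # (3) reward drives the update
        for k, i in idx.items():
            grad = -softmax(logits[k])
            grad[i] += 1.0                                # d log p(choice) / d logits
            logits[k] += lr * (r - baseline) * grad

# Report the controller's currently most probable choice for each decision.
print({k: v[int(np.argmax(logits[k]))] for k, v in choices.items()})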
31. Combining Vision with Robotics
“Deep Learning for Robots: Learning from Large-Scale Interaction”, Google Research Blog, March 2016
“Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection”, Sergey Levine, Peter Pastor, Alex Krizhevsky, & Deirdre Quillen, arXiv, arxiv.org/abs/1603.02199
Neural networks are not new. In fact, many of the algorithms we are using today were invented in the 1980s and 1990s. At that time, neural nets showed promising, interesting results on small toy problems. However, because the learning algorithms for neural nets are computationally expensive, it was difficult to make them work on large, realistic datasets.
Moore’s law, which postulated that computers would roughly double in performance every 18 months, has been a tremendous driver of increasing computational power for the last 40 years. We now have computers that are thousands of times as powerful as those of the 1980s.
The recent success of deep learning has largely come about because we finally have both large, interesting real-world datasets and enough computational power to train large, powerful models on them. Neural nets are now the best solution for a wide variety of problems, solving many problems in vision, speech recognition, and language understanding that we don’t know how to solve any other way.
It’s as if we’ve gone from this...
To this. In evolutionary biology, the time when animals first evolved eyes was likely a time of great change. We’re now at that same point in computer vision, and it’s having a dramatic effect on what machines can accomplish, which is very exciting. I’ll give you just a few examples of what great computer vision makes possible.
Deep learning is also reshaping how we think about designing computers. General-purpose CPUs have to be able to do many different things. In contrast, the computations in neural nets are quite specialized and have two extremely important properties. First, the computations are extremely tolerant of reduced precision. It’s perfectly fine in a neural net to do approximate computations, like “about 1.2 times about 0.6”, rather than “1.21042 times 0.61127”. General-purpose CPUs are designed to handle much higher-precision operations, which are much more costly in terms of transistor count and chip area. In contrast, you can pack quite a lot of low-precision multipliers and adders onto a chip and therefore perform many more operations every second.
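To make the idea concrete, here is a minimal sketch of reduced-precision arithmetic using 8-bit integer quantization; the 0.01 scale factor and the quantize helper are illustrative assumptions, not how any particular chip works.

import numpy as np

def quantize(x, scale):
    # Represent a real number as an 8-bit integer count of "scale"-sized steps.
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

a, b = 1.21042, 0.61127
scale = 0.01                                   # each int8 step stands for 0.01
qa, qb = quantize(a, scale), quantize(b, scale)
approx = int(qa) * int(qb) * scale * scale     # cheap integer multiply, then rescale
print(a * b, approx)                           # ~0.7399 vs ~0.7381: close enough for a neural net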
Second, the vast majority of the computations done by deep learning algorithms are matrix and vector operations. So we can build specialized hardware that is extremely good at very low-precision matrix and vector computations.
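As a rough illustration, the forward pass of one fully connected layer below is a single matrix-vector product plus a cheap elementwise nonlinearity; the layer sizes here are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((1000, 784)).astype(np.float32)  # 1000 units, 784 inputs each
b = np.zeros(1000, dtype=np.float32)
x = rng.standard_normal(784).astype(np.float32)          # one input vector

# 784,000 multiply-adds live inside this single matrix-vector product.
y = np.maximum(W @ x + b, 0.0)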