Before 2012, machine learning algorithms were laboriously hand-coded; the emergence of deep neural networks (DNNs) enabled a breakthrough in performance. DNNs are now simpler to develop and deploy, driving radical change across many fields. Deep learning powers applications in areas like computer vision, natural language processing, and robotics. Training typically occurs in datacenters on GPUs, while inference can run on devices, in datacenters, or at the edge on hardware ranging from CPUs and GPUs to FPGAs and ASICs. Continued innovation in algorithms, hardware, and frameworks is expected to keep driving improvements in deep learning.
Qualcomm Datacenter Technologies, Inc.
Before the emergence of DNNs
• Algorithms and rule-based systems were laboriously hand-coded
But by 2012, the ingredients for change were available:
• Sufficiently powerful GPUs
• Readily available large datasets on the internet
The Deep Neural Net Era
Everything is a DNN now.
The turning point: the ImageNet competition, 2012
"ImageNet Classification with Deep Convolutional Neural Networks", Neural Information Processing Systems Conference (NIPS 2012)
Deep neural nets enabled a performance breakthrough. Now, DNNs are simpler to develop and deploy, ushering in radical change in many fields and entire industries.
Machine & Deep Learning Applications
Vision: face recognition, object recognition, gesture control, virtual / augmented reality, self-driving cars, drones, smart robots, medical imaging & interpretation
Natural Language Processing: speech recognition, translation, chat bots, voice assistants (Microsoft Cortana, Amazon Alexa, Apple Siri, Google Now)
Other: recommendation engines, genomics / DNA sequencing, ad tech, smart cities / homes, IoT / sensor data processing
AI is Increasingly Everywhere
Server/Cloud: training; execution/inference
Devices: execution/inference; training (emerging)
The challenge of AI workloads in a constrained mobile environment
• Very compute intensive: large, complicated neural network models; complex concurrencies
• Always-on, real-time operation
• Must be thermally efficient for sleek, ultra-light designs
• Requires long battery life for all-day use
• Storage / memory bandwidth limitations
Power and thermal efficiency are essential for on-device AI.
Qualcomm® Artificial Intelligence Platform
The platform for efficient on-device machine learning
A high-performance platform designed to support myriad intelligent on-device capabilities that utilize:
• Qualcomm® Snapdragon™ mobile platform's heterogeneous compute capabilities within a highly integrated SoC
• Innovations in machine learning algorithms and enabling software
• Development frameworks to minimize the time and effort for integrating customer networks with our platform
Focus areas: audio intelligence, intuitive security, visual intelligence
Qualcomm Artificial Intelligence Platform and Qualcomm Snapdragon are products of Qualcomm Technologies, Inc.
Datacenter Deep Learning Applications
Prediction-only and combined prediction-and-training workloads include: self-driving cars, Nest, Maps / Street View, Translate, Photos, Gmail Smart Reply, satellite imagery, drug discovery, and news.
Deep Learning – Training & Inference
Training:
• "Off-line": one-time or once-in-a-while (e.g., rerun every two weeks)
• Huge datasets, e.g., 1M+ images
• Today done exclusively on GPUs in datacenters
Inference (deployment):
• Continuous, on-the-fly
• Servers with FPGAs, Xeon CPUs, GPUs, and TPUs
• Mobile / automotive devices
[Diagram: training feeds data forward through a deep neural network and back-propagates errors to update the model; the trained model is then deployed to run feed-forward inference on new inputs, e.g., labeling an image "car".]
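The feed-forward and back-propagation passes shown in the diagram can be sketched in a few lines of NumPy. This is a minimal illustrative example (a tiny made-up network and random data, not any production training loop):

```python
import numpy as np

# Minimal sketch of one training step: feed forward, back-propagate,
# update the model. Network size, data, and learning rate are illustrative.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))          # batch of 8 inputs, 4 features each
y = rng.standard_normal((8, 1))          # regression targets
W1 = rng.standard_normal((4, 16)) * 0.1  # layer-1 weights
W2 = rng.standard_normal((16, 1)) * 0.1  # layer-2 weights

# Feed forward: compute the prediction and the loss.
h = np.maximum(0.0, x @ W1)              # ReLU hidden layer
pred = h @ W2
loss = np.mean((pred - y) ** 2)

# Back propagation: push the loss gradient back through each layer.
d_pred = 2.0 * (pred - y) / len(x)
dW2 = h.T @ d_pred
d_h = (d_pred @ W2.T) * (h > 0)          # ReLU gradient mask
dW1 = x.T @ d_h

# Gradient-descent update of the model.
lr = 0.1
W1 -= lr * dW1
W2 -= lr * dW2
```

Inference is just the feed-forward half of this loop, which is why it is so much cheaper per input than training.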
Datacenter Usage Models
Training:
• Primarily off-line
• Periodic (nightly to weekly)
• Developer-focused
• Throughput driven
Inference (model deployment):
• On-line
• Increasingly an integral part of the end-user experience
• Response-time critical
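The throughput-versus-response-time tension above comes largely from batching: loading weights once and reusing them across a batch raises throughput but delays individual responses. A toy model (the timing constants are illustrative assumptions, not measured numbers) makes the tradeoff concrete:

```python
# Toy serving-cost model: batching amortizes the one-time weight load
# across many inputs, so throughput rises while per-request latency grows.
weight_load_ms = 8.0       # assumed time to stream weights from memory once
compute_ms_per_item = 0.5  # assumed math time per input

def serve(batch_size):
    latency_ms = weight_load_ms + compute_ms_per_item * batch_size
    throughput = batch_size / latency_ms * 1000.0  # items per second
    return latency_ms, throughput

for b in (1, 8, 64):
    latency_ms, throughput = serve(b)
    print(f"batch={b}: latency={latency_ms:.1f} ms, {throughput:.0f} items/s")
```

Training, being throughput driven, favors large batches; response-time-critical inference must keep batches small, which is one reason the two workloads end up on different hardware.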
Datacenter Deployment Options
Training:
• Multiple GPUs
• ASICs (TPU-2)
• Large DPUs (Wave)
Inference (model deployment):
• CPUs
• GPUs
• FPGAs (Microsoft)
• ASICs (TPU and startups)
Deploying DNNs at Datacenter Scale
Training tends toward concentrated, centralized computation: GPUs and large DPUs.
Inference tends toward wide distribution: CPUs and small DPUs.
Training vs Inference: Key NN Concepts for Architects, Key Tradeoffs for Designers
Batch size
• DNNs have millions of weights that take a long time to load from memory
• Large batch sizes and more on-chip memory can help
Numeric precision
• Training in floating point on GPUs popularized DNNs, but FP32 and FP64 may not be necessary; is FP16 good enough?
• Inferring in integers is faster, lower power, and smaller in chip area; 8 bits or smaller? FP8?
Sparsity
• Exploit sparsity for energy efficiency
Power budget
• kW box(es) in a rack for training?
• Less than 40 W for a PCIe card for inference?
• Even lower for smaller form factors?
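The integer-inference tradeoff above can be illustrated with a simple symmetric per-tensor quantization scheme; the scheme and names here are illustrative assumptions, not any specific product's method:

```python
import numpy as np

# Sketch of 8-bit integer inference: quantize FP32 tensors to int8 with a
# per-tensor scale, do the dot product in integers (int32 accumulation,
# as integer hardware typically does), then rescale once at the end.
rng = np.random.default_rng(1)
w = rng.standard_normal(1024).astype(np.float32)  # FP32 weights
x = rng.standard_normal(1024).astype(np.float32)  # FP32 activations

def quantize(t):
    scale = np.abs(t).max() / 127.0               # map values to [-127, 127]
    q = np.clip(np.round(t / scale), -127, 127).astype(np.int8)
    return q, scale

qw, sw = quantize(w)
qx, sx = quantize(x)

# Integer multiply-accumulate, then one floating-point rescale.
acc = np.dot(qw.astype(np.int32), qx.astype(np.int32))
approx = acc * sw * sx
exact = float(np.dot(w, x))
error = abs(approx - exact)                       # small for well-scaled data
```

Each int8 multiply-accumulate is far cheaper in power and area than its FP32 equivalent, which is why inference accelerators push toward 8 bits or below.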
Thoughts on Future Silicon for Deep Learning
CPUs will improve incrementally. GPUs may improve more, but still incrementally. ASIC architects have more freedom to exploit domain-specific features of deep learning:
• Massive compute parallelism
• Dot products that dominate computation
• Massive memory bandwidth needs
ASICs will improve dramatically in the new era of domain-specific architectures.
Advice for ASIC architects:
• Optimize computation for small data types with large amounts of data parallelism
• Build a memory hierarchy that exploits regular, predictable access patterns, with enough on- and off-chip bandwidth
• Understand sparsity; it impacts both computation and memory
• Learn to exploit new memory technologies
Hardware is enabled by open-source frameworks that aid development and deployment of NN models. The number of frameworks is growing, and it is already difficult for HW suppliers to support them all optimally. Standard IRs (such as ONNX) ease the HW vendor's task of delivering tuned, integrated HW & SW.
Expect dramatic hardware performance improvements for years to come.
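The sparsity advice can be made concrete with a small sketch: prune near-zero weights, store only the nonzeros in a compressed (CSR-like) layout, and skip them during the matrix-vector product. The layout and helper names are illustrative, not a specific accelerator's format:

```python
import numpy as np

# Prune small weights to zero, then compute a matvec touching only the
# surviving entries: fewer multiply-accumulates and fewer weight loads.
rng = np.random.default_rng(2)
W = rng.standard_normal((64, 64)).astype(np.float32)
W[np.abs(W) < 1.0] = 0.0                 # prune roughly two-thirds of weights

# CSR-like compressed storage: per row, column indices and nonzero values.
indptr, cols, vals = [0], [], []
for row in W:
    nz = np.flatnonzero(row)
    cols.extend(nz)
    vals.extend(row[nz])
    indptr.append(len(cols))
cols = np.array(cols)
vals = np.array(vals, dtype=np.float32)

def sparse_matvec(x):
    # Only len(vals) multiply-accumulates instead of 64 * 64.
    y = np.zeros(W.shape[0], dtype=np.float32)
    for i in range(W.shape[0]):
        s, e = indptr[i], indptr[i + 1]
        y[i] = np.dot(vals[s:e], x[cols[s:e]])
    return y

x = rng.standard_normal(64).astype(np.float32)
dense = W @ x                            # reference result for comparison
sparse = sparse_matvec(x)
```

In hardware the same idea cuts both computation and memory traffic, which is why sparsity shows up on both sides of the design tradeoff.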
Parting Thoughts
• CPUs are not powerful enough for training, but have free cycles available for inference: an opportunity for add-in accelerator cards. Instruction-set enhancements can improve performance.
• GPUs have too much "extra baggage" that adds cost and power for features not needed for AI: an opportunity for domain-specific accelerators.
• FPGAs offer more flexibility, but are difficult to program and expensive.
• ASICs are energy- and product-cost efficient, but less flexible.
Deep neural networks are making significant strides in many areas: speech, vision, language, search, robotics, medical imaging & treatment, drug discovery, and more.
We have an opportunity to dramatically reshape our computing devices to better serve this emerging and growing market. Expect lots of innovation and excitement in the years to come; participate as a solution provider, or use deep neural nets to solve your own problems.
We need to keep advancing AI
• Industries should foster research and development in this space.
• There is tremendous potential for good; development of ethics boards is needed.
• General and super intelligence is many decades away, requiring novel discoveries and methods; regulation may be appropriate when we get much further along.