Qualcomm Datacenter Technologies, Inc.
 Before the emergence of DNNs
 Algorithms and rule-based systems were laboriously hand-coded
 But by 2012, the ingredients for change were available
 Sufficiently powerful GPUs
 Readily available large datasets on the internet
The Deep Neural Net Era
Everything is a DNN now
 The turning point - ImageNet Competition 2012
 “ImageNet Classification with Deep Convolutional Neural Networks”, Neural Information
Processing Systems Conference (NIPS 2012)
 Deep neural nets enabled a performance breakthrough
 Now DNNs are simpler to develop and deploy, ushering in radical change in many fields and
entire industries
Deep Learning is Growing Exponentially
Source: Google
Devices, machines,
and things are becoming
more intelligent
Learn, infer
context, anticipate
Reasoning
Act intuitively, interact
naturally, protect privacy
Action
Hear, see,
monitor, observe
Perception
Offering new capabilities to enrich our lives
Smart
cities
Healthcare
Wearables
Smart
homes
Networking
Industrial
IoT
Extended
reality
Automotive
Superior scale
Rapid
replacement cycles
Integrated and
optimized technologies
Mobile scale changes everything
Bringing AI
to the masses
Smartphones
Mobile
computing
Source: Jeff Dean, Hot Chips 2017 Keynote
Machine & Deep Learning Applications
Vision
Natural Language Processing
Other
Face Recognition
Drones
Self driving cars
Object Recognition
Virtual / Augmented Reality
Smart Robots
Speech Recognition
Translation
Chat Bots
Gesture Control
Microsoft Cortana
Amazon Alexa
Apple Siri
Google Now
Recommendation
Engines
Genomics / DNA sequencing
AdTech
Smart Cities / Homes
IoT / Sensor data
processing
Medical Imaging &
Interpretation
Server/Cloud
Training
Execution/Inference
Devices
Execution/Inference
Training (emerging)
AI is Increasingly Everywhere
The challenge
of AI workloads
Constrained mobile
environment
Very compute intensive
Large, complicated
neural network models
Must be thermally efficient
for sleek, ultra-light designs
Complex concurrencies
Always-on
Real-time
Requires long battery
life for all-day use
Storage / Memory
bandwidth limitations
Power and thermal efficiency are
essential for on-device AI
Qualcomm® Artificial
Intelligence Platform
The platform for efficient on-device machine learning
A high-performance platform designed to support
myriad intelligent on-device capabilities that utilize:
• Qualcomm® Snapdragon™ mobile platform’s heterogeneous
compute capabilities within a highly integrated SoC
• Innovations in machine learning algorithms and enabling software
• Development frameworks to minimize the time and effort for
integrating customer networks with our platform
Audio
intelligence
Intuitive
security
Visual
intelligence
Qualcomm Artificial Intelligence Platform and Qualcomm Snapdragon are products of Qualcomm Technologies, Inc.
Datacenter Deep Learning Applications
Self Driving Car
Nest
Maps / Street View
Translate
Photos
Gmail / Smart Reply
Satellite Imagery
Drug
Discovery
News
Prediction
Prediction &
Training
Deep Learning – Training & Inference
Training
Huge dataset – e.g. 1M+ images
Deep Neural Network
Training:
• “Off-line” – one-time or once in a while
• Re-run periodically (e.g. every 2 weeks)
• Today done almost exclusively on GPUs in
datacenters
Inference
Deployment
Inference:
• Continuous, on-the-fly
• Servers with FPGA, Xeon,
GPU, TPU
• Mobile / automotive device
(Diagram: training feeds examples forward through the deep neural network and back-propagates errors to produce a trained model; deployed inference feeds new input forward through that model to yield an answer, e.g. “CAR!”)
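The feed-forward / back-propagation loop described above can be written out as a minimal NumPy sketch. The two-layer network, toy dataset, and hyperparameters below are illustrative assumptions, not details from the talk:

```python
import numpy as np

# Minimal two-layer network: feed forward, then back-propagate errors.
# Network sizes, data, and learning rate are illustrative choices.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 4))                       # a small training batch
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)   # toy binary labels

W1 = 0.1 * rng.standard_normal((4, 8))
W2 = 0.1 * rng.standard_normal((8, 1))
lr = 0.5

for _ in range(1000):
    # Feed forward
    h = np.tanh(X @ W1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2)))    # sigmoid output
    # Back propagation (cross-entropy loss gradient)
    d_out = (p - y) / len(X)
    dW2 = h.T @ d_out
    d_hid = (d_out @ W2.T) * (1.0 - h ** 2)
    dW1 = X.T @ d_hid
    W1 -= lr * dW1
    W2 -= lr * dW2

accuracy = float(((p > 0.5) == (y > 0.5)).mean())
```

At deployment time only the feed-forward half runs, which is why inference fits into much smaller power budgets than training.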
Datacenter Usage Models
Training
• Primarily off-line
• Periodic (nightly to weekly)
• Developer-focused
• Throughput driven
Model Deployment
Inference
• On-line
• Increasingly integral part of the
end-user experience
• Response-time critical
(Diagram: training runs feed forward plus back-propagation; deployed inference runs feed forward only)
Datacenter Deployment Options
Training
• Multiple GPUs
• ASICs (TPU-2)
• Large DPU (Wave)
Model Deployment
Inference
• CPU
• GPU
• FPGAs (MSFT)
• ASICs (TPU & startups)
(Diagram: training runs feed forward plus back-propagation; deployed inference runs feed forward only)
Source: Microsoft, Hot Chips 2017
Deploying DNNs at Datacenter Scale
Training tends toward concentrated, centralized computation
Inference tends toward wide distribution
(Training: GPUs, large DPUs. Inference: CPUs, small DPUs.)
 Training vs Inference
 Key NN Concepts for Architects
 Batch size
 DNNs have millions of weights that take a long time to load from memory
 Large batch size and more on chip memory can help
 Training in Floating Point on GPUs popularized DNNs
 FP32 and FP64 may not be necessary
 Is FP16 good enough?
 Inference in integers is faster, lower power, and needs smaller chip area
 8 bits or smaller? FP8?
 Exploit sparsity for energy efficiency
 Power Budget
 kW box(es) in a rack for training?
 Less than 40 W for a PCIe card for inference?
 Even lower for smaller form factors?
Key Tradeoffs for Designers
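The “is FP16 good enough / 8 bits or smaller” tradeoff comes down to how much accuracy quantization actually costs. Here is a hedged NumPy sketch of symmetric INT8 quantization; this scaling scheme is one common choice, not something prescribed in the talk:

```python
import numpy as np

def quantize_int8(v):
    """Symmetric quantization: map max |v| to 127, round to int8."""
    scale = np.abs(v).max() / 127.0
    q = np.clip(np.round(v / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)   # stand-in weights
x = rng.standard_normal(1000).astype(np.float32)   # stand-in activations

qw, sw = quantize_int8(w)
qx, sx = quantize_int8(x)

# Integer multiply-accumulate in int32, then rescale back to float --
# the pattern an INT8 inference engine implements in hardware.
int8_dot = int((qw.astype(np.int32) * qx.astype(np.int32)).sum()) * sw * sx
fp32_dot = float(w @ x)
rel_err = abs(int8_dot - fp32_dot) / (np.linalg.norm(w) * np.linalg.norm(x))
```

For well-behaved inputs like these the error is a small fraction of a percent, which is why integer inference can be faster, lower power, and cheaper in silicon at little accuracy cost.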
 CPUs will improve incrementally
 GPUs may improve more, but still incrementally
 ASIC architects have more freedom to exploit domain-specific features of deep learning:
 Massive compute parallelism
 Dot products that dominate computation
 Massive memory bandwidth needs
 ASICs will improve dramatically in the new era of Domain Specific Architectures
 Advice for ASIC architects:
 Computation should be optimized for small data types with large amounts of data parallelism
 Memory hierarchy should exploit regular, predictable access patterns with enough on & off-chip bandwidth
 Understand sparsity. It impacts both computation and memory.
 Learn to exploit new memory technologies
 Hardware is enabled by open-source frameworks that aid development and deployment of NN models
 The number of frameworks is growing, and it is already difficult for HW suppliers to support them all optimally
 Standard IRs (such as ONNX) ease the HW vendor’s task of delivering tuned, integrated HW & SW
Thoughts on future Silicon for Deep Learning
Expect dramatic hardware performance improvements for years to come.
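The sparsity advice above can be illustrated with a compressed-storage sketch: keep only the nonzero weights and their indices, so the hardware skips the multiplies a dense engine would waste. The 90% pruning level below is an arbitrary assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
w = rng.standard_normal(n)
w[rng.random(n) < 0.9] = 0.0        # prune ~90% of weights to zero

# Compressed representation: indices + values of the survivors only.
idx = np.flatnonzero(w)
vals = w[idx]

x = rng.standard_normal(n)
sparse_dot = float(vals @ x[idx])   # ~10% of the multiply-accumulates
dense_dot = float(w @ x)            # same result, roughly 10x the work

density = len(idx) / n
```

Both memory traffic and arithmetic shrink with density, which is why the slide flags sparsity as impacting “both computation and memory.”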
 CPUs are not powerful enough for training, but have free cycles available for
inference – opportunity for add-in accelerator cards
 Instruction Set enhancements can improve performance
 GPUs carry too much “extra baggage” that adds cost and power for features
not needed for AI – an opportunity for domain-specific accelerators
 FPGAs offer more flexibility, but are difficult to program and expensive
 ASICs are energy and product cost efficient, but less flexible
 Deep neural networks are making significant strides in many areas
 speech, vision, language, search, robotics, medical imaging & treatment, drug discovery …
 We have an opportunity to dramatically reshape our computing devices
to better serve this emerging and growing market
 Expect to see lots of innovation and excitement in the years to come
 Participate as a solution provider or use deep neural nets to solve your
problems
Parting Thoughts
We need to keep advancing AI
Industries should foster research and development in this space
General and super intelligence is
many decades away, requiring
novel discoveries and methods
Regulation may be appropriate when
we get much further along
Tremendous
potential for good
Development of ethics
boards needed
General and super intelligence
Tremendous potential as well
Distant future
Decades
Algorithmic
advancements
Improved optimization
strategies
Specialized
hardware
What’s next?
Follow us on:
For more information, visit us at:
www.qualcomm.com & www.qualcomm.com/blog
Nothing in these materials is an offer to sell any of the components or devices referenced herein.
©2017 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.
Qualcomm, Snapdragon, Hexagon, Adreno, and Kryo are trademarks of Qualcomm Incorporated, registered in the United States and other
countries. Other products and brand names may be trademarks or registered trademarks of their respective owners.
References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or
business units within the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes Qualcomm’s licensing business, QTL,
and the vast majority of its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates,
along with its subsidiaries, substantially all of Qualcomm’s engineering, research and development functions, and substantially all of its product
and services businesses, including its semiconductor business, QCT.
Thank you

China AI Summit talk 2017
