Before 2012, machine learning algorithms were laboriously hand-coded; the emergence of deep neural networks (DNNs) enabled a breakthrough in performance. DNNs are now simpler to develop and deploy, driving radical change across many fields. Deep learning powers applications in areas like computer vision, natural language processing, and robotics. Training typically occurs in datacenters on GPUs, while inference can run on devices, in datacenters, or at the edge on hardware ranging from CPUs and GPUs to FPGAs and ASICs. Continued innovation in algorithms, hardware, and frameworks is expected to keep driving improvements in deep learning.
Qualcomm Datacenter Technologies, Inc.
Before the emergence of DNNs
• Algorithms and rule-based systems were laboriously hand-coded
But by 2012, the ingredients for change were available:
• Sufficiently powerful GPUs
• Readily available large datasets on the internet
The Deep Neural Net Era
Everything is a DNN now.
The turning point: the ImageNet competition, 2012
"ImageNet Classification with Deep Convolutional Neural Networks", Neural Information Processing Systems Conference (NIPS 2012)
Deep neural nets enabled a performance breakthrough. Now, DNNs are simpler to develop and deploy, ushering in radical change in many fields and entire industries.
Machine & Deep Learning Applications
Vision: face recognition, object recognition, gesture control, virtual / augmented reality, self-driving cars, drones, smart robots, medical imaging & interpretation
Natural Language Processing: speech recognition, translation, chat bots, voice assistants (Microsoft Cortana, Amazon Alexa, Apple Siri, Google Now)
Other: recommendation engines, genomics / DNA sequencing, ad tech, smart cities / homes, IoT / sensor data processing
AI is Increasingly Everywhere
Server/Cloud: training; execution/inference
Devices: execution/inference; training (emerging)
The challenge of AI workloads in a constrained mobile environment
• Very compute intensive: large, complicated neural network models; complex concurrencies
• Always-on, real-time operation
• Must be thermally efficient for sleek, ultra-light designs
• Requires long battery life for all-day use
• Storage / memory bandwidth limitations
Power and thermal efficiency are essential for on-device AI.
Qualcomm® Artificial Intelligence Platform
The platform for efficient on-device machine learning
A high-performance platform designed to support myriad intelligent on-device capabilities that utilize:
• Qualcomm® Snapdragon™ mobile platform's heterogeneous compute capabilities within a highly integrated SoC
• Innovations in machine learning algorithms and enabling software
• Development frameworks to minimize the time and effort for integrating customer networks with our platform
Focus areas: audio intelligence, intuitive security, visual intelligence
Qualcomm Artificial Intelligence Platform and Qualcomm Snapdragon are products of Qualcomm Technologies, Inc.
Datacenter Deep Learning Applications
Prediction-only and combined prediction-and-training workloads include: self-driving cars, Nest, Maps / Street View, Translate, Photos, Gmail Smart Reply, satellite imagery, drug discovery, and news.
Deep Learning – Training & Inference
Training:
• "Off-line": one-time or once-in-a-while (e.g., rerun every two weeks)
• Huge datasets, e.g., 1M+ images
• Today done exclusively on GPUs in datacenters
Inference (deployment):
• Continuous, on-the-fly
• Servers with FPGAs, Xeon CPUs, GPUs, and TPUs
• Mobile / automotive devices
[Diagram: training feeds data forward through a deep neural network and back-propagates errors to update the model; the trained model is then deployed to run feed-forward inference on new inputs, e.g., labeling an image "car".]
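The feed-forward and back-propagation passes shown in the diagram can be sketched in a few lines of NumPy. This is a minimal illustrative example (a tiny made-up network and random data, not any production training loop):

```python
import numpy as np

# Minimal sketch of one training step: feed forward, back-propagate,
# update the model. Network size, data, and learning rate are illustrative.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))          # batch of 8 inputs, 4 features each
y = rng.standard_normal((8, 1))          # regression targets
W1 = rng.standard_normal((4, 16)) * 0.1  # layer-1 weights
W2 = rng.standard_normal((16, 1)) * 0.1  # layer-2 weights

# Feed forward: compute the prediction and the loss.
h = np.maximum(0.0, x @ W1)              # ReLU hidden layer
pred = h @ W2
loss = np.mean((pred - y) ** 2)

# Back propagation: push the loss gradient back through each layer.
d_pred = 2.0 * (pred - y) / len(x)
dW2 = h.T @ d_pred
d_h = (d_pred @ W2.T) * (h > 0)          # ReLU gradient mask
dW1 = x.T @ d_h

# Gradient-descent update of the model.
lr = 0.1
W1 -= lr * dW1
W2 -= lr * dW2
```

Inference is just the feed-forward half of this loop, which is why it is so much cheaper per input than training.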
Datacenter Usage Models
Training:
• Primarily off-line
• Periodic (nightly to weekly)
• Developer-focused
• Throughput driven
Inference (model deployment):
• On-line
• Increasingly an integral part of the end-user experience
• Response-time critical
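The throughput-versus-response-time tension above comes largely from batching: loading weights once and reusing them across a batch raises throughput but delays individual responses. A toy model (the timing constants are illustrative assumptions, not measured numbers) makes the tradeoff concrete:

```python
# Toy serving-cost model: batching amortizes the one-time weight load
# across many inputs, so throughput rises while per-request latency grows.
weight_load_ms = 8.0       # assumed time to stream weights from memory once
compute_ms_per_item = 0.5  # assumed math time per input

def serve(batch_size):
    latency_ms = weight_load_ms + compute_ms_per_item * batch_size
    throughput = batch_size / latency_ms * 1000.0  # items per second
    return latency_ms, throughput

for b in (1, 8, 64):
    latency_ms, throughput = serve(b)
    print(f"batch={b}: latency={latency_ms:.1f} ms, {throughput:.0f} items/s")
```

Training, being throughput driven, favors large batches; response-time-critical inference must keep batches small, which is one reason the two workloads end up on different hardware.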
Datacenter Deployment Options
Training:
• Multiple GPUs
• ASICs (TPU-2)
• Large DPUs (Wave)
Inference (model deployment):
• CPUs
• GPUs
• FPGAs (Microsoft)
• ASICs (TPU and startups)
Deploying DNNs at Datacenter Scale
Training tends toward concentrated, centralized computation: GPUs and large DPUs.
Inference tends toward wide distribution: CPUs and small DPUs.
Training vs Inference: Key NN Concepts for Architects, Key Tradeoffs for Designers
Batch size
• DNNs have millions of weights that take a long time to load from memory
• Large batch sizes and more on-chip memory can help
Numeric precision
• Training in floating point on GPUs popularized DNNs, but FP32 and FP64 may not be necessary; is FP16 good enough?
• Inferring in integers is faster, lower power, and smaller in chip area; 8 bits or smaller? FP8?
Sparsity
• Exploit sparsity for energy efficiency
Power budget
• kW box(es) in a rack for training?
• Less than 40 W for a PCIe card for inference?
• Even lower for smaller form factors?
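The integer-inference tradeoff above can be illustrated with a simple symmetric per-tensor quantization scheme; the scheme and names here are illustrative assumptions, not any specific product's method:

```python
import numpy as np

# Sketch of 8-bit integer inference: quantize FP32 tensors to int8 with a
# per-tensor scale, do the dot product in integers (int32 accumulation,
# as integer hardware typically does), then rescale once at the end.
rng = np.random.default_rng(1)
w = rng.standard_normal(1024).astype(np.float32)  # FP32 weights
x = rng.standard_normal(1024).astype(np.float32)  # FP32 activations

def quantize(t):
    scale = np.abs(t).max() / 127.0               # map values to [-127, 127]
    q = np.clip(np.round(t / scale), -127, 127).astype(np.int8)
    return q, scale

qw, sw = quantize(w)
qx, sx = quantize(x)

# Integer multiply-accumulate, then one floating-point rescale.
acc = np.dot(qw.astype(np.int32), qx.astype(np.int32))
approx = acc * sw * sx
exact = float(np.dot(w, x))
error = abs(approx - exact)                       # small for well-scaled data
```

Each int8 multiply-accumulate is far cheaper in power and area than its FP32 equivalent, which is why inference accelerators push toward 8 bits or below.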
Thoughts on Future Silicon for Deep Learning
CPUs will improve incrementally. GPUs may improve more, but still incrementally. ASIC architects have more freedom to exploit domain-specific features of deep learning:
• Massive compute parallelism
• Dot products that dominate computation
• Massive memory bandwidth needs
ASICs will improve dramatically in the new era of domain-specific architectures.
Advice for ASIC architects:
• Optimize computation for small data types with large amounts of data parallelism
• Build a memory hierarchy that exploits regular, predictable access patterns, with enough on- and off-chip bandwidth
• Understand sparsity; it impacts both computation and memory
• Learn to exploit new memory technologies
Hardware is enabled by open-source frameworks that aid development and deployment of NN models. The number of frameworks is growing, and it is already difficult for HW suppliers to support them all optimally. Standard IRs (such as ONNX) ease the HW vendor's task of delivering tuned, integrated HW & SW.
Expect dramatic hardware performance improvements for years to come.
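The sparsity advice can be made concrete with a small sketch: prune near-zero weights, store only the nonzeros in a compressed (CSR-like) layout, and skip them during the matrix-vector product. The layout and helper names are illustrative, not a specific accelerator's format:

```python
import numpy as np

# Prune small weights to zero, then compute a matvec touching only the
# surviving entries: fewer multiply-accumulates and fewer weight loads.
rng = np.random.default_rng(2)
W = rng.standard_normal((64, 64)).astype(np.float32)
W[np.abs(W) < 1.0] = 0.0                 # prune roughly two-thirds of weights

# CSR-like compressed storage: per row, column indices and nonzero values.
indptr, cols, vals = [0], [], []
for row in W:
    nz = np.flatnonzero(row)
    cols.extend(nz)
    vals.extend(row[nz])
    indptr.append(len(cols))
cols = np.array(cols)
vals = np.array(vals, dtype=np.float32)

def sparse_matvec(x):
    # Only len(vals) multiply-accumulates instead of 64 * 64.
    y = np.zeros(W.shape[0], dtype=np.float32)
    for i in range(W.shape[0]):
        s, e = indptr[i], indptr[i + 1]
        y[i] = np.dot(vals[s:e], x[cols[s:e]])
    return y

x = rng.standard_normal(64).astype(np.float32)
dense = W @ x                            # reference result for comparison
sparse = sparse_matvec(x)
```

In hardware the same idea cuts both computation and memory traffic, which is why sparsity shows up on both sides of the design tradeoff.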
Parting Thoughts
• CPUs are not powerful enough for training, but have free cycles available for inference: an opportunity for add-in accelerator cards. Instruction-set enhancements can improve performance.
• GPUs have too much "extra baggage" that adds cost and power for features not needed for AI: an opportunity for domain-specific accelerators.
• FPGAs offer more flexibility, but are difficult to program and expensive.
• ASICs are energy- and product-cost efficient, but less flexible.
Deep neural networks are making significant strides in many areas: speech, vision, language, search, robotics, medical imaging & treatment, drug discovery, and more.
We have an opportunity to dramatically reshape our computing devices to better serve this emerging and growing market. Expect lots of innovation and excitement in the years to come; participate as a solution provider, or use deep neural nets to solve your own problems.
We need to keep advancing AI
• Industries should foster research and development in this space.
• There is tremendous potential for good; development of ethics boards is needed.
• General and super intelligence is many decades away, requiring novel discoveries and methods; regulation may be appropriate when we get much further along.