Qualcomm Datacenter Technologies, Inc.
The Deep Neural Net Era
Everything is a DNN now
• Before the emergence of DNNs
  • Algorithms and rule-based systems were laboriously hand-coded
• But by 2012, the ingredients for change were available
  • Sufficiently powerful GPUs
  • Readily available large data sets on the internet
• The turning point: ImageNet Competition 2012
  • “ImageNet Classification with Deep Convolutional Neural Networks”, Neural Information Processing Systems Conference (NIPS 2012)
  • Deep neural nets enabled a performance breakthrough
• Now DNNs are simpler to develop and deploy, ushering in radical change in many fields and entire industries
Deep Learning is Growing Exponentially
Source: Google
Devices, machines, and things are becoming more intelligent
Offering new capabilities to enrich our lives
• Perception: hear, see, monitor, observe
• Reasoning: learn, infer context, anticipate
• Action: act intuitively, interact naturally, protect privacy
Mobile scale changes everything
Bringing AI to the masses
• Superior scale
• Rapid replacement cycles
• Integrated and optimized technologies
Smartphones, mobile computing, smart cities, healthcare, wearables, smart homes, networking, industrial IoT, extended reality, automotive
Source: Jeff Dean, Hot Chips 2017 Keynote
Machine & Deep Learning Applications
• Vision: Face Recognition, Drones, Self-driving cars, Object Recognition, Virtual / Augmented Reality, Smart Robots, Gesture Control
• Natural Language Processing: Speech Recognition, Translation, Chat Bots, Microsoft Cortana, Amazon Alexa, Apple Siri, Google Now
• Other: Recommendation Engines, Genomics / DNA sequencing, AdTech, Smart Cities / Homes, IoT / Sensor data processing, Medical Imaging & Interpretation
AI is Increasingly Everywhere
• Server/Cloud: Training, Execution/Inference
• Devices: Execution/Inference, Training (emerging)
The challenge of AI workloads
• Very compute intensive
• Large, complicated neural network models
• Complex concurrencies
• Always-on
• Real-time
Constrained mobile environment
• Must be thermally efficient for sleek, ultra-light designs
• Requires long battery life for all-day use
• Storage / memory bandwidth limitations
Power and thermal efficiency are essential for on-device AI
Qualcomm® Artificial Intelligence Platform
The platform for efficient on-device machine learning
A high-performance platform designed to support myriad intelligent on-device capabilities that utilize:
• Qualcomm® Snapdragon™ mobile platform’s heterogeneous compute capabilities within a highly integrated SoC
• Innovations in machine learning algorithms and enabling software
• Development frameworks to minimize the time and effort for integrating customer networks with our platform
Audio intelligence, intuitive security, visual intelligence
Qualcomm Artificial Intelligence Platform and Qualcomm Snapdragon are products of Qualcomm Technologies, Inc.
Datacenter Deep Learning Applications
Self Driving Car
NEST
MAPS / Street View
Translate
Photos
Gmail / Smart Reply
Satellite Imagery
Drug Discovery
News
Prediction
Prediction & Training
Deep Learning – Training & Inference
Training:
• “Off-line”, one-time or once-in-a-while
• Runs every 2 weeks
• Exclusively being done by GPUs in datacenters
Inference:
• Continuous, on-the-fly
• Servers with FPGA, Xeon, GPU, TPU
• Mobile / automotive device
[Diagram: training runs feed-forward and back-propagation passes over a huge dataset (e.g. 1M+ images) through the deep neural network to produce a model; after deployment, inference runs a feed-forward pass only, e.g. labeling an image “CAR!”]
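The split between training and inference can be illustrated with a minimal sketch. It assumes a toy one-neuron linear model; the data, learning rate, and the names `feed_forward` and `train` are illustrative assumptions and do not come from the slides. Training repeats a feed-forward pass followed by a back-propagation update, while the deployed model only needs the feed-forward pass.

```python
# Minimal sketch: training = feed forward + back propagation,
# inference = feed forward only. The one-neuron model, toy data,
# and hyperparameters below are illustrative assumptions.

def feed_forward(w, b, x):
    """Compute the model output for a single input."""
    return w * x + b

def train(samples, epochs=500, lr=0.05):
    """Fit w and b by stochastic gradient descent on squared error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            y = feed_forward(w, b, x)   # forward pass
            err = y - target            # dLoss/dy for loss = 0.5 * err**2
            w -= lr * err * x           # backward pass: dLoss/dw = err * x
            b -= lr * err               # backward pass: dLoss/db = err
    return w, b

# Toy dataset drawn from y = 2x + 1 (training happens once, off-line).
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(samples)

# Inference: the deployed model runs the forward pass only.
prediction = feed_forward(w, b, 4.0)
print(f"w={w:.3f} b={b:.3f} prediction={prediction:.3f}")
```

The training loop does roughly twice the arithmetic of inference per sample and must keep state for the update, which is one reason the two phases tend to be placed on different hardware.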
Datacenter Usage Models
Training (feed forward and backprop)
• Primarily off-line
• Periodic (nightly to weekly)
• Developer-focused
• Throughput driven
Model deployment
Inference (feed forward)
• On-line
• Increasingly integral part of end-user experience
• Response time critical
Datacenter Deployment Options
Training (feed forward and backprop)
• Multiple GPUs
• ASICs (TPU-2)
• Large DPU (Wave)
Model deployment
Inference (feed forward)
• CPU
• GPU
• FPGAs (MSFT)
• ASICs (TPU & startups)
Source: Microsoft, Hot Chips 2017
Deploying DNNs at Datacenter Scale
Training tends toward concentrated, centralized computation
Inference tends toward wide distribution
GPUs
Large DPU
CPUs
Small DPU
Key Tradeoffs for Designers
• Training vs Inference
• Key NN Concepts for Architects
• Batch size
  • DNNs have millions of weights that take a long time to load from memory
  • Large batch size and more on-chip memory can help
• Training in floating point on GPUs popularized DNNs
  • FP32 and FP64 may not be necessary
  • Is FP16 good enough?
• Inferring in integers is faster, lower power, and needs smaller chip area
  • 8 bits or smaller? FP8?
• Exploit sparsity for energy efficiency
• Power budget
  • kW box(es) in a rack for training?
  • Less than 40 W for a PCIe card for inference?
  • Even lower for smaller form factors?
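The batch-size point can be made concrete with some rough arithmetic. The sketch below assumes a single fully connected layer whose weights are fetched from off-chip memory once per batch and then reused on chip; the function name `fc_layer_intensity`, the layer sizes, and the 4-byte values are illustrative assumptions, not figures from the talk.

```python
# Rough sketch of why larger batches help: the weights are loaded once
# per batch, so the work done per byte moved grows with the batch size.
# Layer sizes and the 4-byte (FP32) value width are assumptions.

def fc_layer_intensity(n_in, n_out, batch, bytes_per_value=4):
    """Return multiply-accumulates per byte moved for one fully connected layer.

    Assumes the weight matrix is read from memory once per batch and
    every input and output activation is moved exactly once.
    """
    macs = batch * n_in * n_out
    weight_bytes = n_in * n_out * bytes_per_value
    activation_bytes = batch * (n_in + n_out) * bytes_per_value
    return macs / (weight_bytes + activation_bytes)

for batch in (1, 8, 64):
    intensity = fc_layer_intensity(1024, 1024, batch)
    print(f"batch={batch:3d}  MACs per byte = {intensity:.2f}")
```

Under these assumptions the work per byte rises almost linearly with batch size until activation traffic starts to matter. The catch for inference is that a larger batch means waiting to collect more requests, which works against the response-time goal.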
Thoughts on future Silicon for Deep Learning
• CPUs will improve incrementally
• GPUs may improve more, but still incrementally
• ASIC architects have more freedom to exploit domain-specific features of deep learning:
  • Massive compute parallelism
  • Dot products that dominate computation
  • Massive memory bandwidth needs
• ASICs will improve dramatically in the new era of Domain Specific Architectures
• Advice for ASIC architects:
  • Computation should be optimized for small data types with large amounts of data parallelism
  • Memory hierarchy should exploit regular, predictable access patterns with enough on-chip and off-chip bandwidth
  • Understand sparsity. It impacts both computation and memory.
  • Learn to exploit new memory technologies
• Hardware is enabled by open source frameworks that aid development and deployment of NN models
  • The number of frameworks is growing and is already difficult for HW suppliers to support optimally
  • Standard IRs (such as ONNX) ease the HW vendor's task of delivering tuned, integrated HW & SW
Expect dramatic hardware performance improvements for years to come.
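To illustrate what "small data types" means for the dot products that dominate DNN computation, here is a minimal sketch of a dot product computed with 8-bit integers. It assumes symmetric per-tensor linear quantization, which is only one of several possible schemes; the names `quantize` and `quantized_dot` and the sample vectors are illustrative assumptions.

```python
# Minimal sketch: a dot product computed in 8-bit integers.
# Symmetric per-tensor linear quantization is an assumed scheme;
# real accelerators and frameworks may use different ones.

def quantize(values, bits=8):
    """Map floats to signed integers; return (integers, scale)."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid dividing by zero
    return [round(v / scale) for v in values], scale

def quantized_dot(weights, activations, bits=8):
    """Approximate a float dot product with integer multiply-accumulates."""
    qw, w_scale = quantize(weights, bits)
    qa, a_scale = quantize(activations, bits)
    acc = sum(w * a for w, a in zip(qw, qa))        # integer-only inner loop
    return acc * w_scale * a_scale                  # rescale once at the end

weights = [0.5, -1.0, 0.25, 0.0]
activations = [1.0, 2.0, -4.0, 3.0]

exact = sum(w * a for w, a in zip(weights, activations))
approx = quantized_dot(weights, activations)
print(f"float result = {exact}, int8 result = {approx:.4f}")
```

The inner loop uses only small integer multiplies and an integer accumulator, which is the part that maps to cheaper, lower-power hardware. The price is a small rounding error relative to the floating-point result.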
Parting Thoughts
• CPUs are not powerful enough for training, but have free cycles available for inference – opportunity for add-in accelerator cards
  • Instruction set enhancements can improve performance
• GPUs have too much “extra baggage” that adds cost and power for features not needed for AI – opportunity for domain-specific accelerators
• FPGAs offer more flexibility, but are difficult to program and expensive
• ASICs are energy and product cost efficient, but less flexible
• Deep neural networks are making significant strides in many areas
  • speech, vision, language, search, robotics, medical imaging & treatment, drug discovery …
• We have an opportunity to dramatically reshape our computing devices to better serve this emerging and growing market
• Expect to see lots of innovation and excitement in the years to come
• Participate as a solution provider or use deep neural nets to solve your problems
We need to keep advancing AI
Industries should foster research and development in this space
• General and super intelligence is many decades away, requiring novel discoveries and methods
• Regulation may be appropriate when we get much further along
• Tremendous potential for good
• Development of ethics boards needed
General and super intelligence
Tremendous potential as well
Distant future
Decades
What’s next?
• Algorithmic advancements
• Improved optimization strategies
• Specialized hardware
Follow us on:
For more information, visit us at:
www.qualcomm.com & www.qualcomm.com/blog
Nothing in these materials is an offer to sell any of the components or devices referenced herein.
©2017 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.
Qualcomm, Snapdragon, Hexagon, Adreno, and Kryo are trademarks of Qualcomm Incorporated, registered in the United States and other
countries. Other products and brand names may be trademarks or registered trademarks of their respective owners.
References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or
business units within the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes Qualcomm’s licensing business, QTL,
and the vast majority of its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates,
along with its subsidiaries, substantially all of Qualcomm’s engineering, research and development functions, and substantially all of its product
and services businesses, including its semiconductor business, QCT.
Thank you

More Related Content

What's hot

Intel 14nm aug11
Intel 14nm aug11Intel 14nm aug11
Intel 14nm aug11
lopatto
 
Basics Of VLSI
Basics Of VLSIBasics Of VLSI
Basics Of VLSI
Avanish Agarwal
 
Parallelism Processor Design
Parallelism Processor DesignParallelism Processor Design
Parallelism Processor Design
Sri Prasanna
 
Network: Synchronization: IEEE1588's Future in Computing and the Data Center
Network: Synchronization: IEEE1588's Future in Computing and the Data CenterNetwork: Synchronization: IEEE1588's Future in Computing and the Data Center
Network: Synchronization: IEEE1588's Future in Computing and the Data Center
Michelle Holley
 
Qualcomm Snapdragon Processor
Qualcomm Snapdragon ProcessorQualcomm Snapdragon Processor
Qualcomm Snapdragon Processor
Krishna Gehlot
 
Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core
Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor CoreZen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core
Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core
AMD
 
Blue Line Supermicro Superblade
Blue Line Supermicro SuperbladeBlue Line Supermicro Superblade
Blue Line Supermicro Superblade
Blue Line
 
Snapdragon Processor
Snapdragon ProcessorSnapdragon Processor
Snapdragon Processor
Krishna Gehlot
 
Basics of vlsi
Basics of vlsiBasics of vlsi
Qualcomm Snapdragon 820 Product and Infographics
Qualcomm Snapdragon 820 Product and InfographicsQualcomm Snapdragon 820 Product and Infographics
Qualcomm Snapdragon 820 Product and Infographics
Mark Shedd
 
Derek Aberle Presentation for Qualcomm Snapdragon 820
Derek Aberle Presentation for Qualcomm Snapdragon 820Derek Aberle Presentation for Qualcomm Snapdragon 820
Derek Aberle Presentation for Qualcomm Snapdragon 820
Low Hong Chuan
 
Vlsi
VlsiVlsi
Vlsi
soumya968
 
Vlsi
VlsiVlsi
Supercomputers and Cloud Games
Supercomputers and Cloud GamesSupercomputers and Cloud Games
Supercomputers and Cloud Games
Shinra_Technologies
 
Altera’s Role In Accelerating the Internet of Things
Altera’s Role In Accelerating the Internet of ThingsAltera’s Role In Accelerating the Internet of Things
Altera’s Role In Accelerating the Internet of Things
Altera Corporation
 
CAST BA22 32-bit Processor Design Seminar, 2/1/12
CAST BA22 32-bit Processor Design Seminar, 2/1/12CAST BA22 32-bit Processor Design Seminar, 2/1/12
CAST BA22 32-bit Processor Design Seminar, 2/1/12
CAST, Inc.
 
Increasing Throughput per Node for Content Delivery Networks
Increasing Throughput per Node for Content Delivery NetworksIncreasing Throughput per Node for Content Delivery Networks
Increasing Throughput per Node for Content Delivery Networks
DESMOND YUEN
 
Cost-Effective System Continuation using Xilinx FPGAs and Legacy Processor IP
Cost-Effective System Continuation using Xilinx FPGAs and Legacy Processor IPCost-Effective System Continuation using Xilinx FPGAs and Legacy Processor IP
Cost-Effective System Continuation using Xilinx FPGAs and Legacy Processor IP
CAST, Inc.
 
Performance out of the box developers
Performance   out of the box developersPerformance   out of the box developers
Performance out of the box developers
Michelle Holley
 
Shinra's Vision for Gaming / Presented at GigHacks 2015
Shinra's Vision for Gaming / Presented at GigHacks 2015Shinra's Vision for Gaming / Presented at GigHacks 2015
Shinra's Vision for Gaming / Presented at GigHacks 2015
KC Digital Drive
 

What's hot (20)

Intel 14nm aug11
Intel 14nm aug11Intel 14nm aug11
Intel 14nm aug11
 
Basics Of VLSI
Basics Of VLSIBasics Of VLSI
Basics Of VLSI
 
Parallelism Processor Design
Parallelism Processor DesignParallelism Processor Design
Parallelism Processor Design
 
Network: Synchronization: IEEE1588's Future in Computing and the Data Center
Network: Synchronization: IEEE1588's Future in Computing and the Data CenterNetwork: Synchronization: IEEE1588's Future in Computing and the Data Center
Network: Synchronization: IEEE1588's Future in Computing and the Data Center
 
Qualcomm Snapdragon Processor
Qualcomm Snapdragon ProcessorQualcomm Snapdragon Processor
Qualcomm Snapdragon Processor
 
Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core
Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor CoreZen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core
Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core
 
Blue Line Supermicro Superblade
Blue Line Supermicro SuperbladeBlue Line Supermicro Superblade
Blue Line Supermicro Superblade
 
Snapdragon Processor
Snapdragon ProcessorSnapdragon Processor
Snapdragon Processor
 
Basics of vlsi
Basics of vlsiBasics of vlsi
Basics of vlsi
 
Qualcomm Snapdragon 820 Product and Infographics
Qualcomm Snapdragon 820 Product and InfographicsQualcomm Snapdragon 820 Product and Infographics
Qualcomm Snapdragon 820 Product and Infographics
 
Derek Aberle Presentation for Qualcomm Snapdragon 820
Derek Aberle Presentation for Qualcomm Snapdragon 820Derek Aberle Presentation for Qualcomm Snapdragon 820
Derek Aberle Presentation for Qualcomm Snapdragon 820
 
Vlsi
VlsiVlsi
Vlsi
 
Vlsi
VlsiVlsi
Vlsi
 
Supercomputers and Cloud Games
Supercomputers and Cloud GamesSupercomputers and Cloud Games
Supercomputers and Cloud Games
 
Altera’s Role In Accelerating the Internet of Things
Altera’s Role In Accelerating the Internet of ThingsAltera’s Role In Accelerating the Internet of Things
Altera’s Role In Accelerating the Internet of Things
 
CAST BA22 32-bit Processor Design Seminar, 2/1/12
CAST BA22 32-bit Processor Design Seminar, 2/1/12CAST BA22 32-bit Processor Design Seminar, 2/1/12
CAST BA22 32-bit Processor Design Seminar, 2/1/12
 
Increasing Throughput per Node for Content Delivery Networks
Increasing Throughput per Node for Content Delivery NetworksIncreasing Throughput per Node for Content Delivery Networks
Increasing Throughput per Node for Content Delivery Networks
 
Cost-Effective System Continuation using Xilinx FPGAs and Legacy Processor IP
Cost-Effective System Continuation using Xilinx FPGAs and Legacy Processor IPCost-Effective System Continuation using Xilinx FPGAs and Legacy Processor IP
Cost-Effective System Continuation using Xilinx FPGAs and Legacy Processor IP
 
Performance out of the box developers
Performance   out of the box developersPerformance   out of the box developers
Performance out of the box developers
 
Shinra's Vision for Gaming / Presented at GigHacks 2015
Shinra's Vision for Gaming / Presented at GigHacks 2015Shinra's Vision for Gaming / Presented at GigHacks 2015
Shinra's Vision for Gaming / Presented at GigHacks 2015
 

Similar to China AI Summit talk 2017

Hipeac 2018 keynote Talk
Hipeac 2018 keynote TalkHipeac 2018 keynote Talk
Hipeac 2018 keynote Talk
Dileep Bhandarkar
 
Vertex Perspectives | AI Optimized Chipsets | Part III
Vertex Perspectives | AI Optimized Chipsets | Part IIIVertex Perspectives | AI Optimized Chipsets | Part III
Vertex Perspectives | AI Optimized Chipsets | Part III
Vertex Holdings
 
Achieving AI @scale on Mobile Devices
Achieving AI @scale on Mobile DevicesAchieving AI @scale on Mobile Devices
Achieving AI @scale on Mobile Devices
Qualcomm Research
 
Deep learning at supercomputing scale by Rangan Sukumar from Cray
Deep learning at supercomputing scale  by Rangan Sukumar from CrayDeep learning at supercomputing scale  by Rangan Sukumar from Cray
Deep learning at supercomputing scale by Rangan Sukumar from Cray
Bill Liu
 
Ανδρέας Τσαγκάρης, 5th Digital Banking Forum
Ανδρέας Τσαγκάρης, 5th Digital Banking ForumΑνδρέας Τσαγκάρης, 5th Digital Banking Forum
Ανδρέας Τσαγκάρης, 5th Digital Banking Forum
Starttech Ventures
 
Vertex perspectives ai optimized chipsets (part i)
Vertex perspectives   ai optimized chipsets (part i)Vertex perspectives   ai optimized chipsets (part i)
Vertex perspectives ai optimized chipsets (part i)
Yanai Oron
 
Vertex Perspectives | AI-optimized Chipsets | Part I
Vertex Perspectives | AI-optimized Chipsets | Part IVertex Perspectives | AI-optimized Chipsets | Part I
Vertex Perspectives | AI-optimized Chipsets | Part I
Vertex Holdings
 
"Efficient Deployment of Quantized ML Models at the Edge Using Snapdragon SoC...
"Efficient Deployment of Quantized ML Models at the Edge Using Snapdragon SoC..."Efficient Deployment of Quantized ML Models at the Edge Using Snapdragon SoC...
"Efficient Deployment of Quantized ML Models at the Edge Using Snapdragon SoC...
Edge AI and Vision Alliance
 
The future of AI is hybrid
The future of AI is hybridThe future of AI is hybrid
The future of AI is hybrid
Qualcomm Research
 
Dell NVIDIA AI Roadshow - South Western Ontario
Dell NVIDIA AI Roadshow - South Western OntarioDell NVIDIA AI Roadshow - South Western Ontario
Dell NVIDIA AI Roadshow - South Western Ontario
Bill Wong
 
Re-Imagining the Data Center with Intel
Re-Imagining the Data Center with IntelRe-Imagining the Data Center with Intel
Re-Imagining the Data Center with Intel
Intel IT Center
 
Linaro connect 2018 keynote final updated
Linaro connect 2018 keynote final updatedLinaro connect 2018 keynote final updated
Linaro connect 2018 keynote final updated
Dileep Bhandarkar
 
Give Your Organization Better, Faster Insights & Answers with High Performanc...
Give Your Organization Better, Faster Insights & Answers with High Performanc...Give Your Organization Better, Faster Insights & Answers with High Performanc...
Give Your Organization Better, Faster Insights & Answers with High Performanc...
Dell World
 
The Intel Xeon Scalable Processor and IoT
The Intel Xeon Scalable Processor and IoTThe Intel Xeon Scalable Processor and IoT
The Intel Xeon Scalable Processor and IoT
Advantech Industrial Automation Group
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
Linaro
 
On-Device AI
On-Device AIOn-Device AI
On-Device AI
LGCNSairesearch
 
Computação de Alto Desempenho - Fator chave para a competitividade do País, d...
Computação de Alto Desempenho - Fator chave para a competitividade do País, d...Computação de Alto Desempenho - Fator chave para a competitividade do País, d...
Computação de Alto Desempenho - Fator chave para a competitividade do País, d...
Igor José F. Freitas
 
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...
Edge AI and Vision Alliance
 
nippon semiconductor
nippon semiconductornippon semiconductor
nippon semiconductor
vikas gupta
 
Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...
Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...
Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...
Tyrone Systems
 

Similar to China AI Summit talk 2017 (20)

Hipeac 2018 keynote Talk
Hipeac 2018 keynote TalkHipeac 2018 keynote Talk
Hipeac 2018 keynote Talk
 
Vertex Perspectives | AI Optimized Chipsets | Part III
Vertex Perspectives | AI Optimized Chipsets | Part IIIVertex Perspectives | AI Optimized Chipsets | Part III
Vertex Perspectives | AI Optimized Chipsets | Part III
 
Achieving AI @scale on Mobile Devices
Achieving AI @scale on Mobile DevicesAchieving AI @scale on Mobile Devices
Achieving AI @scale on Mobile Devices
 
Deep learning at supercomputing scale by Rangan Sukumar from Cray
Deep learning at supercomputing scale  by Rangan Sukumar from CrayDeep learning at supercomputing scale  by Rangan Sukumar from Cray
Deep learning at supercomputing scale by Rangan Sukumar from Cray
 
Ανδρέας Τσαγκάρης, 5th Digital Banking Forum
Ανδρέας Τσαγκάρης, 5th Digital Banking ForumΑνδρέας Τσαγκάρης, 5th Digital Banking Forum
Ανδρέας Τσαγκάρης, 5th Digital Banking Forum
 
Vertex perspectives ai optimized chipsets (part i)
Vertex perspectives   ai optimized chipsets (part i)Vertex perspectives   ai optimized chipsets (part i)
Vertex perspectives ai optimized chipsets (part i)
 
Vertex Perspectives | AI-optimized Chipsets | Part I
Vertex Perspectives | AI-optimized Chipsets | Part IVertex Perspectives | AI-optimized Chipsets | Part I
Vertex Perspectives | AI-optimized Chipsets | Part I
 
"Efficient Deployment of Quantized ML Models at the Edge Using Snapdragon SoC...
"Efficient Deployment of Quantized ML Models at the Edge Using Snapdragon SoC..."Efficient Deployment of Quantized ML Models at the Edge Using Snapdragon SoC...
"Efficient Deployment of Quantized ML Models at the Edge Using Snapdragon SoC...
 
The future of AI is hybrid
The future of AI is hybridThe future of AI is hybrid
The future of AI is hybrid
 
Dell NVIDIA AI Roadshow - South Western Ontario
Dell NVIDIA AI Roadshow - South Western OntarioDell NVIDIA AI Roadshow - South Western Ontario
Dell NVIDIA AI Roadshow - South Western Ontario
 
Re-Imagining the Data Center with Intel
Re-Imagining the Data Center with IntelRe-Imagining the Data Center with Intel
Re-Imagining the Data Center with Intel
 
Linaro connect 2018 keynote final updated
Linaro connect 2018 keynote final updatedLinaro connect 2018 keynote final updated
Linaro connect 2018 keynote final updated
 
Give Your Organization Better, Faster Insights & Answers with High Performanc...
Give Your Organization Better, Faster Insights & Answers with High Performanc...Give Your Organization Better, Faster Insights & Answers with High Performanc...
Give Your Organization Better, Faster Insights & Answers with High Performanc...
 
The Intel Xeon Scalable Processor and IoT
The Intel Xeon Scalable Processor and IoTThe Intel Xeon Scalable Processor and IoT
The Intel Xeon Scalable Processor and IoT
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
 
On-Device AI
On-Device AIOn-Device AI
On-Device AI
 
Computação de Alto Desempenho - Fator chave para a competitividade do País, d...
Computação de Alto Desempenho - Fator chave para a competitividade do País, d...Computação de Alto Desempenho - Fator chave para a competitividade do País, d...
Computação de Alto Desempenho - Fator chave para a competitividade do País, d...
 
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...
 
nippon semiconductor
nippon semiconductornippon semiconductor
nippon semiconductor
 
Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...
Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...
Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...
 

More from Dileep Bhandarkar

Open Compute Summit Keynote 17 June 2011
Open Compute Summit Keynote 17 June 2011Open Compute Summit Keynote 17 June 2011
Open Compute Summit Keynote 17 June 2011
Dileep Bhandarkar
 
Datacenter Dynamics Chicago 30 sept 2010
Datacenter Dynamics Chicago 30 sept 2010Datacenter Dynamics Chicago 30 sept 2010
Datacenter Dynamics Chicago 30 sept 2010
Dileep Bhandarkar
 
Energy Efficiency Considerations in Large Datacenters
Energy Efficiency Considerations in Large DatacentersEnergy Efficiency Considerations in Large Datacenters
Energy Efficiency Considerations in Large Datacenters
Dileep Bhandarkar
 
Samsung cio-forum-2012
Samsung cio-forum-2012 Samsung cio-forum-2012
Samsung cio-forum-2012
Dileep Bhandarkar
 
Data center-server-cooling-power-management-paper
Data center-server-cooling-power-management-paperData center-server-cooling-power-management-paper
Data center-server-cooling-power-management-paper
Dileep Bhandarkar
 
Moscow conference keynote in 2012
Moscow conference keynote in 2012Moscow conference keynote in 2012
Moscow conference keynote in 2012
Dileep Bhandarkar
 
New Delhi Cloud Summit 05 26-11
New Delhi Cloud Summit 05 26-11New Delhi Cloud Summit 05 26-11
New Delhi Cloud Summit 05 26-11
Dileep Bhandarkar
 
Performance Characterization of the Pentium Pro Processor
Performance Characterization of the Pentium Pro ProcessorPerformance Characterization of the Pentium Pro Processor
Performance Characterization of the Pentium Pro Processor
Dileep Bhandarkar
 
Innovation lecture for hong kong
Innovation lecture for hong kongInnovation lecture for hong kong
Innovation lecture for hong kong
Dileep Bhandarkar
 
Performance from Architecture: Comparing a RISC and a CISC with Similar Hardw...
Performance from Architecture: Comparing a RISC and a CISC with Similar Hardw...Performance from Architecture: Comparing a RISC and a CISC with Similar Hardw...
Performance from Architecture: Comparing a RISC and a CISC with Similar Hardw...
Dileep Bhandarkar
 
Qualcomm centriq 2400 hot chips final submission corrected
Qualcomm centriq 2400 hot chips final submission correctedQualcomm centriq 2400 hot chips final submission corrected
Qualcomm centriq 2400 hot chips final submission corrected
Dileep Bhandarkar
 
Innovation lecture for shanghai final
Innovation lecture for shanghai finalInnovation lecture for shanghai final
Innovation lecture for shanghai final
Dileep Bhandarkar
 
Semicon2018 dileepb
Semicon2018 dileepbSemicon2018 dileepb
Semicon2018 dileepb
Dileep Bhandarkar
 
Alpha memo july 1992
Alpha memo july 1992Alpha memo july 1992
Alpha memo july 1992
Dileep Bhandarkar
 
Risc vs cisc
Risc vs ciscRisc vs cisc
Risc vs cisc
Dileep Bhandarkar
 
Server design summit keynote handout
Server design summit keynote handoutServer design summit keynote handout
Server design summit keynote handout
Dileep Bhandarkar
 
DileepB EDPS talk 2015
DileepB  EDPS talk 2015DileepB  EDPS talk 2015
DileepB EDPS talk 2015
Dileep Bhandarkar
 
Intel microprocessors
Intel microprocessorsIntel microprocessors
Intel microprocessors
Dileep Bhandarkar
 
Future of server design
Future of server designFuture of server design
Future of server design
Dileep Bhandarkar
 
Dileep b in 2013
Dileep b  in 2013Dileep b  in 2013
Dileep b in 2013
Dileep Bhandarkar
 

More from Dileep Bhandarkar (20)

Open Compute Summit Keynote 17 June 2011
Open Compute Summit Keynote 17 June 2011Open Compute Summit Keynote 17 June 2011
Open Compute Summit Keynote 17 June 2011
 
Datacenter Dynamics Chicago 30 sept 2010
Datacenter Dynamics Chicago 30 sept 2010Datacenter Dynamics Chicago 30 sept 2010
Datacenter Dynamics Chicago 30 sept 2010
 
Energy Efficiency Considerations in Large Datacenters
Energy Efficiency Considerations in Large DatacentersEnergy Efficiency Considerations in Large Datacenters

China AI Summit talk 2017

  • 1. Qualcomm Datacenter Technologies, Inc.
  • 2. The Deep Neural Net Era: Everything is a DNN now
      • Before the emergence of DNNs, algorithms and rule-based systems were laboriously hand-coded
      • But by 2012, the ingredients for change were available:
          • Sufficiently powerful GPUs
          • Readily available large data sets on the internet
      • The turning point: ImageNet Competition 2012
          • “ImageNet Classification with Deep Convolutional Neural Networks”, Neural Information Processing Systems Conference (NIPS 2012)
          • Deep neural nets enabled a performance breakthrough
      • Now DNNs are simpler to develop and deploy, ushering in radical change in many fields and entire industries
  • 3. Deep Learning is Growing Exponentially (Source: Google)
  • 4. Devices, machines, and things are becoming more intelligent
  • 5. Offering new capabilities to enrich our lives
      • Perception: hear, see, monitor, observe
      • Reasoning: learn, infer context, anticipate
      • Action: act intuitively, interact naturally, protect privacy
  • 6. Mobile scale changes everything: Bringing AI to the masses
      • Superior scale
      • Rapid replacement cycles
      • Integrated and optimized technologies
      • Segments: Smartphones, Mobile computing, Smart cities, Healthcare, Wearables, Smart homes, Networking, Industrial IoT, Extended reality, Automotive
  • 7. Source: Jeff Dean, Hot Chips 2017 Keynote
  • 8. Machine & Deep Learning Applications
      • Categories: Vision, Natural Language Processing, Other
      • Examples: Face Recognition, Drones, Self-driving cars, Object Recognition, Virtual / Augmented Reality, Smart Robots, Speech Recognition, Translation, Chat Bots, Gesture Control, MSFT Cortana, Amazon Alexa, Apple Siri, Google Now, Recommendation Engines, Genomics / DNA sequencing, AdTech, Smart Cities / Homes, IoT / Sensor data processing, Medical Imaging & Interpretation
  • 9. AI is Increasingly Everywhere
      • Server/Cloud: Training, Execution/Inference
      • Devices: Execution/Inference, Training (emerging)
  • 10. The challenge of AI workloads
      • AI workloads:
          • Very compute intensive
          • Large, complicated neural network models
          • Complex concurrencies
          • Always-on
          • Real-time
      • Constrained mobile environment:
          • Must be thermally efficient for sleek, ultra-light designs
          • Requires long battery life for all-day use
          • Storage / memory bandwidth limitations
      • Power and thermal efficiency are essential for on-device AI
  • 11. Qualcomm® Artificial Intelligence Platform: The platform for efficient on-device machine learning
      • A high-performance platform designed to support myriad intelligent on-device capabilities that utilize:
          • Qualcomm® Snapdragon™ mobile platform’s heterogeneous compute capabilities within a highly integrated SoC
          • Innovations in machine learning algorithms and enabling software
          • Development frameworks to minimize the time and effort for integrating customer networks with our platform
      • Audio intelligence, Intuitive security, Visual intelligence
      • Qualcomm Artificial Intelligence Platform and Qualcomm Snapdragon are products of Qualcomm Technologies, Inc.
  • 12. Datacenter Deep Learning Applications
      • Self Driving Car, NEST, MAPS / Street View, Translate, Photos, Gmail / Smart Reply, Satellite Imagery, Drug Discovery, News
      • Prediction; Prediction & Training
  • 13. Deep Learning: Training & Inference
      • Training (huge dataset, e.g. 1M+ images, fed to a deep neural network; feed forward plus back propagation produce the training model):
          • “Off-line”, one-time or once-in-a-while
          • Runs every 2 weeks
          • Exclusively being done by GPUs in datacenters
      • Deployment of the trained model for inference
      • Inference (feed forward only through the deep neural network, e.g. output “CAR!”):
          • Continuous, on-the-fly
          • Servers with FPGA, Xeon, GPU, TPU
          • Mobile / automotive device
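
The split on this slide between training (feed forward plus back propagation) and inference (feed forward only) can be sketched with a toy example. This is only an illustration under stated assumptions: a single sigmoid neuron stands in for a deep network, the OR truth table stands in for the "huge dataset", and the names `feed_forward`, `train`, and `OR_DATA` are hypothetical, not from the talk.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def feed_forward(weights, bias, inputs):
    """Inference path: weighted sum of the inputs followed by a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train(dataset, epochs=2000, lr=0.5, seed=0):
    """Training path: feed forward, then propagate the error back into the weights."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in dataset[0][0]]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in dataset:
            pred = feed_forward(weights, bias, inputs)
            # For a sigmoid output with cross-entropy loss, the gradient with
            # respect to the pre-activation is simply (pred - target).
            grad = pred - target
            weights = [w - lr * grad * x for w, x in zip(weights, inputs)]
            bias -= lr * grad
    return weights, bias


# Toy stand-in for a labelled training set: the logical OR function.
OR_DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
           ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

# "Off-line" training happens once; the resulting model is then deployed.
weights, bias = train(OR_DATA)

# Inference reuses only the feed-forward path with the frozen weights.
predictions = [round(feed_forward(weights, bias, x)) for x, _ in OR_DATA]
```

Training touches every example many times and updates the weights, which is why it is throughput-bound; inference runs one cheap forward pass per request.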
  • 14. Datacenter Usage Models
      • Training (feed forward + back propagation):
          • Primarily off-line
          • Periodic (nightly to weekly)
          • Developer-focused
          • Throughput driven
      • Model deployment
      • Inference (feed forward):
          • On-line
          • Increasingly integral part of end-user experience
          • Response time critical
  • 15. Datacenter Deployment Options
      • Training (feed forward + back propagation):
          • Multiple GPUs
          • ASICs (TPU-2)
          • Large DPU (Wave)
      • Model deployment
      • Inference (feed forward):
          • CPU
          • GPU
          • FPGAs (MSFT)
          • ASICs (TPU & startups)
  • 16. Source: Microsoft, Hot Chips 2017
  • 17. Deploying DNNs at Datacenter Scale
      • Training tends toward concentrated, centralized computation: GPUs, Large DPU
      • Inference tends toward wide distribution: CPUs, Small DPU
  • 18. Key Tradeoffs for Designers
      • Training vs Inference
      • Key NN Concepts for Architects
          • Batch size
              • DNNs have millions of weights that take a long time to load from memory
              • Large batch size and more on-chip memory can help
          • Training in floating point on GPUs popularized DNNs
              • FP32 and FP64 may not be necessary
              • Is FP16 good enough?
          • Inferring in integers is faster, uses less power, and takes less chip area
              • 8 bits or smaller? FP8?
          • Exploit sparsity for energy efficiency
      • Power Budget
          • kW box(es) in a rack for training?
          • Less than 40 W for a PCIe card for inference?
          • Even lower for smaller form factors?
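
The batch-size point above can be made concrete with a little arithmetic. The sketch below assumes a simplified model that is not from the talk: the weights are fetched from memory once per batch, and each weight is used in exactly one multiply-accumulate (MAC) per input, as in a fully connected layer. The 50-million-weight model size and the function names are hypothetical.

```python
def weight_bytes_per_input(num_weights, bytes_per_weight, batch_size):
    """Weight traffic charged to each input when one load is shared by the batch."""
    return num_weights * bytes_per_weight / batch_size


def macs_per_weight_byte(bytes_per_weight, batch_size):
    """MACs performed per byte of weights fetched (one MAC per weight per input)."""
    return batch_size / bytes_per_weight


NUM_WEIGHTS = 50_000_000  # hypothetical model size, chosen only for illustration
FORMATS = {"FP32": 4, "FP16": 2, "INT8": 1}  # bytes per weight


def build_table(batch_sizes=(1, 8, 64)):
    """Return (format, batch, bytes per input, MACs per byte) for each combination."""
    rows = []
    for name, nbytes in FORMATS.items():
        for batch in batch_sizes:
            rows.append((name, batch,
                         weight_bytes_per_input(NUM_WEIGHTS, nbytes, batch),
                         macs_per_weight_byte(nbytes, batch)))
    return rows


if __name__ == "__main__":
    for name, batch, traffic, intensity in build_table():
        print(f"{name:5s} batch={batch:3d}  {traffic / 1e6:8.2f} MB/input  "
              f"{intensity:6.2f} MACs/byte")
```

Under these assumptions, both a larger batch and a narrower data type raise the work done per byte of weights fetched, which is the lever the slide describes. Larger batches also add latency, so the gain is easier to take in training than in response-time-critical inference.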
  • 19. Thoughts on future Silicon for Deep Learning
      • CPUs will improve incrementally
      • GPUs may improve more, but still incrementally
      • ASIC architects have more freedom to exploit domain-specific features of deep learning:
          • Massive compute parallelism
          • Dot products that dominate computation
          • Massive memory bandwidth needs
      • ASICs will improve dramatically in the new era of Domain Specific Architectures
      • Advice for ASIC architects:
          • Computation should be optimized for small data types with large amounts of data parallelism
          • Memory hierarchy should exploit regular, predictable access patterns with enough on- and off-chip bandwidth
          • Understand sparsity. It impacts both computation and memory.
          • Learn to exploit new memory technologies
      • Hardware is enabled by open source frameworks that aid development and deployment of NN models
          • The number of frameworks is growing, and it is already difficult for HW suppliers to support them all optimally
          • Standard IRs (such as ONNX) ease the HW vendor's task of delivering tuned, integrated HW & SW
      • Expect dramatic hardware performance improvements for years to come.
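
The advice to optimize for small data types and dot products can be illustrated with a quantized dot product. This is a minimal sketch, assuming symmetric per-vector 8-bit quantization with a single scale factor; that scheme is one common choice and is not stated in the talk. The function names are hypothetical.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def int_dot(qa, qb):
    """Integer-only multiply-accumulate, the part an accelerator would run."""
    return sum(a * b for a, b in zip(qa, qb))


def quantized_dot(a, b):
    """Approximate the float dot product of a and b using 8-bit integers."""
    qa, scale_a = quantize(a)
    qb, scale_b = quantize(b)
    # One floating-point rescale at the end recovers the original units.
    return int_dot(qa, qb) * scale_a * scale_b


weights = [0.5, -1.0, 0.25, 0.0]
activations = [1.0, 2.0, -4.0, 3.0]

exact = sum(w * x for w, x in zip(weights, activations))
approx = quantized_dot(weights, activations)
error = abs(approx - exact)
```

The inner loop uses only small integers, and the result differs from the floating-point answer by a small rounding error. How much error a given network tolerates is an empirical question that this sketch does not address.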
  • 20. Parting Thoughts
      • CPUs are not powerful enough for training, but have free cycles available for inference; an opportunity for add-in accelerator cards
          • Instruction set enhancements can improve performance
      • GPUs have too much “extra baggage” that adds cost and power for features not needed for AI; an opportunity for domain-specific accelerators
      • FPGAs offer more flexibility, but are difficult to program and expensive
      • ASICs are energy and product cost efficient, but less flexible
      • Deep neural networks are making significant strides in many areas
          • Speech, vision, language, search, robotics, medical imaging & treatment, drug discovery …
      • We have an opportunity to dramatically reshape our computing devices to better serve this emerging and growing market
      • Expect to see lots of innovation and excitement in the years to come
      • Participate as a solution provider or use deep neural nets to solve your problems
  • 21. We need to keep advancing AI
      • Industries should foster research and development in this space
      • General and super intelligence is many decades away, requiring novel discoveries and methods
      • Regulation may be appropriate when we get much further along
      • Tremendous potential for good
      • Development of ethics boards needed
      • General and super intelligence: tremendous potential as well; distant future (decades)
  • 23. Thank you
      • Follow us on: For more information, visit us at: www.qualcomm.com & www.qualcomm.com/blog
      • Nothing in these materials is an offer to sell any of the components or devices referenced herein. ©2017 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved. Qualcomm, Snapdragon, Hexagon, Adreno, and Kryo are trademarks of Qualcomm Incorporated, registered in the United States and other countries. Other products and brand names may be trademarks or registered trademarks of their respective owners.
      • References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes Qualcomm’s licensing business, QTL, and the vast majority of its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm’s engineering, research and development functions, and substantially all of its product and services businesses, including its semiconductor business, QCT.