SlideShare a Scribd company logo
1 of 28
Download to read offline
Manish Pandey
May 13, 2019
Keynote Talk
Artificial Intelligence: Driving the Next Generation of
Chips and Systems
2019
Tel Aviv, Israel
22019
Revolution in Artificial Intelligence
32019
Intelligence Infused from the Edge to the Cloud
42019
Fueled by advances in AI/ML
2000 2005 2010 2015
Growthinpapers(relativeto1996)
Increasing
Accuracy
ErrorRateinImageClassification
2010 2011 2012 2013 2014 2015 2017
52019
AI Revolution Enabled by HardwareAI driven by advances in hardware
Chiwawa
=
=
=
62019
Deep Learning Advances Gated by Hardware
[Bill Dally, SysML 2018]
72019
Deep Learning Advances Gated by Hardware
• Results improve with
o Larger Models
o Larger Datasets
=> More Computation
82019
Data Set and Model Size Scaling
[Hestess et al. Arxiv 17.12.00409]
• 256X data
• 64X model size
Þ Compute 32,000 X
92019
How Much Compute?
12 HD Cameras
Inferencing**
• 25 Million Weights
• 300 Gops for HD Image
• 9.4 Tops for 30fps
• 12 Cameras, 3 nets = 338 Tops
Training
• 30Tops x 108 (train set) x 10^2 (epochs) =
1023 Ops
**ResNet-50
[Bill Dally, SysML 2018]
102019
Challenge: End of Line for CMOS Scaling
• Device scaling down slowing
• Power Density stopped scaling in 2005
[Olukotun, Horowitz][IMEC]
112019
Dennard Scaling to Dark Silicon
Can we specialize designs for DNNs?
Taylor, "A landscape of the new dark silicon design regime." IEEE Micro 33.5 (2013): 8-19
122019
Energy Cost for Operations
Instruction Fetch/D 70
[Horowitz ISSCC 2014]
*45nm
132019
Energy Cost for DNN Ops
Y = AB + C
Instruction Data Type Energy Energy/Op
1 Op/Instr
(*,+)
Memory fp32 89 nJ 693 pJ
128 Ops/Instr
(AX+B)
Memory fp32 72 nJ 562 pJ
128 Ops/Instr
(AX+B)
Cache fp32 0.87 nJ 6.82 pJ
128 Ops/Instr
(AX+B)
Cache fp16 0.47 nJ 3.68 pJ
*45nm
142019
Build a processor tuned to application - specialize for DNNs
• More effective parallelism for AI Operations (not ILP)
– SIMD vs. MIMD
– VLIW vs. Speculative, out-of-order
• Eliminate unneeded accuracy
– IEEE replaced by lower precision FP
– 32-64 bit bit integers to 8-16 bit integers
• More effective use of memory bandwidth
– User controlled versus caches
How many TeraOps per Watt?
152019
2-D Grid of Processing Elements
Systolic Arrays
• Balance compute and I/O
• Local Communication
• Concurrency
• M1*M2 nxn mult in 2n cycles
162019
TPUv2
• 8GB + 8GB HBM
• 600 GB/s Mem BW
• 32b float
• MXU fp32 accumulation
• Reduced precision mult.
[Dean NIPS’17]
172019
Datatypes – Reduced Precision
16-bits accuracy matches 32-bits!
Stochastic Rounding: 2.2 rounded to 2 with probability 0.8
rounded to 3 with probability 0.2
182019
Integer Weight Representations
192019
Extreme Datatypes
• Q: What can we do with single bit or ternary (+a, -b, 0) representations
• A: Surprisingly, a lot. Just retrain/increase the number of activation layers
[Chenzhuo et al,, arxiv 1612.01064]
Network
202019
Pruning – How many values?
90% values can be thrown away
212019
Quantization – How many distinct values?
16 Values => 4 bits representation
Instead of 16 fp32 numbers, store16 4-bit indices
[Han et al, 2015]
222019
Memory Locality
20mm
32-bit op
4pJ
256-bit access
8KB Cache
50pJ
256-bit
Bus 256pJ
DRAM 32bits
640pJ
256-bit
Bus 25pJ
Compute Weights
Interconnect
Bring Compute Closer to Memory
232019
Memory and inter-chip communication advances
[Graphcore 2018]
pJ
64pJ/B
16GB
900GB/s @ 60W
10pJ/B
256MB
6TB/s @ 60W
1pJ/B
1000 x 256KB
60TB/s @ 60W
Memory power density
Is ~25% of logic power density
242019
In-Memory Compute
[Ando et al., JSSC 04/18]
In-memory compute with low-bits/value, low cost ops
252019
AI Application-System Co-design
262019
AI Application-System Co-design
[Reagen et al, Minerva, ISCA 2016]
• Co-design across the algorithm, architecture, and circuit levels
• Optimize DNN hardware accelerators across multiple datasets
272019
Where Next?
• Rise of non-Von Neumann Architectures to efficiently solve Domain-Specific
Problems
– A new Golden Age of Computer Architecture
• Advances in semi technology, computer architecture will continue advances in AI
• 10 Tops/W is the current State of the Art (~14nm)
• Several orders of magnitude gap with the human brain
Thank You

More Related Content

Similar to ChipEx 2019 keynote

“Processing Raw Images Efficiently on the MAX78000 Neural Network Accelerator...
“Processing Raw Images Efficiently on the MAX78000 Neural Network Accelerator...“Processing Raw Images Efficiently on the MAX78000 Neural Network Accelerator...
“Processing Raw Images Efficiently on the MAX78000 Neural Network Accelerator...
Edge AI and Vision Alliance
 

Similar to ChipEx 2019 keynote (20)

Nvidia and ibm presentation feb18
Nvidia and ibm presentation feb18Nvidia and ibm presentation feb18
Nvidia and ibm presentation feb18
 
Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early...
Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early...Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early...
Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early...
 
Accelerating algorithmic and hardware advancements for power efficient on-dev...
Accelerating algorithmic and hardware advancements for power efficient on-dev...Accelerating algorithmic and hardware advancements for power efficient on-dev...
Accelerating algorithmic and hardware advancements for power efficient on-dev...
 
Hermes: Enabling Energy-efficient IoT Networks with Generalized Deduplication
Hermes: Enabling Energy-efficient IoT Networks with Generalized DeduplicationHermes: Enabling Energy-efficient IoT Networks with Generalized Deduplication
Hermes: Enabling Energy-efficient IoT Networks with Generalized Deduplication
 
Irish Earth Observation Symposium 2014: EO Technologies - The Cause and Solut...
Irish Earth Observation Symposium 2014: EO Technologies - The Cause and Solut...Irish Earth Observation Symposium 2014: EO Technologies - The Cause and Solut...
Irish Earth Observation Symposium 2014: EO Technologies - The Cause and Solut...
 
Dynamics Day 2014: Doing More with Data
Dynamics Day 2014: Doing More with Data Dynamics Day 2014: Doing More with Data
Dynamics Day 2014: Doing More with Data
 
How to design ai functions to the cloud native infra
How to design ai functions to the cloud native infraHow to design ai functions to the cloud native infra
How to design ai functions to the cloud native infra
 
EnBIS 2016 opening
EnBIS 2016 openingEnBIS 2016 opening
EnBIS 2016 opening
 
Crowd Counting from UAVs (ECCV2020)
Crowd Counting from UAVs (ECCV2020)Crowd Counting from UAVs (ECCV2020)
Crowd Counting from UAVs (ECCV2020)
 
Roberto minerva 20181130
Roberto minerva 20181130  Roberto minerva 20181130
Roberto minerva 20181130
 
Through The Looking Glass 2012 2015(Afcom)
Through The Looking Glass 2012 2015(Afcom)Through The Looking Glass 2012 2015(Afcom)
Through The Looking Glass 2012 2015(Afcom)
 
CVPR 2018 Paper Reading MobileNet V2
CVPR 2018 Paper Reading MobileNet V2CVPR 2018 Paper Reading MobileNet V2
CVPR 2018 Paper Reading MobileNet V2
 
Wally Goulet
Wally GouletWally Goulet
Wally Goulet
 
The internet of things, do we need all that data?
The internet of things, do we need all that data?The internet of things, do we need all that data?
The internet of things, do we need all that data?
 
Edge optimized architecture for fabric defect detection in real-time
Edge optimized architecture for fabric defect detection in real-timeEdge optimized architecture for fabric defect detection in real-time
Edge optimized architecture for fabric defect detection in real-time
 
“Processing Raw Images Efficiently on the MAX78000 Neural Network Accelerator...
“Processing Raw Images Efficiently on the MAX78000 Neural Network Accelerator...“Processing Raw Images Efficiently on the MAX78000 Neural Network Accelerator...
“Processing Raw Images Efficiently on the MAX78000 Neural Network Accelerator...
 
Speeding Up Data Science: From a Data Management Perspective
Speeding Up Data Science: From a Data Management PerspectiveSpeeding Up Data Science: From a Data Management Perspective
Speeding Up Data Science: From a Data Management Perspective
 
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
 
OutSystems Next Step 2014 Presentation - BIG DATA Journal ADEC Group
OutSystems Next Step 2014 Presentation - BIG DATA Journal ADEC GroupOutSystems Next Step 2014 Presentation - BIG DATA Journal ADEC Group
OutSystems Next Step 2014 Presentation - BIG DATA Journal ADEC Group
 
Bigdata
Bigdata Bigdata
Bigdata
 

Recently uploaded

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Recently uploaded (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 

ChipEx 2019 keynote

  • 1. Manish Pandey May 13, 2019 Keynote Talk Artificial Intelligence: Driving the Next Generation of Chips and Systems 2019 Tel Aviv, Israel
  • 3. 32019 Intelligence Infused from the Edge to the Cloud
  • 4. 42019 Fueled by advances in AI/ML 2000 2005 2010 2015 Growthinpapers(relativeto1996) Increasing Accuracy ErrorRateinImageClassification 2010 2011 2012 2013 2014 2015 2017
  • 5. 52019 AI Revolution Enabled by HardwareAI driven by advances in hardware Chiwawa = = =
  • 6. 62019 Deep Learning Advances Gated by Hardware [Bill Dally, SysML 2018]
  • 7. 72019 Deep Learning Advances Gated by Hardware • Results improve with o Larger Models o Larger Datasets => More Computation
  • 8. 82019 Data Set and Model Size Scaling [Hestess et al. Arxiv 17.12.00409] • 256X data • 64X model size Þ Compute 32,000 X
  • 9. 92019 How Much Compute? 12 HD Cameras Inferencing** • 25 Million Weights • 300 Gops for HD Image • 9.4 Tops for 30fps • 12 Cameras, 3 nets = 338 Tops Training • 30Tops x 108 (train set) x 10^2 (epochs) = 1023 Ops **ResNet-50 [Bill Dally, SysML 2018]
  • 10. 102019 Challenge: End of Line for CMOS Scaling • Device scaling down slowing • Power Density stopped scaling in 2005 [Olukotun, Horowitz][IMEC]
  • 11. 112019 Dennard Scaling to Dark Silicon Can we specialize designs for DNNs? Taylor, "A landscape of the new dark silicon design regime." IEEE Micro 33.5 (2013): 8-19
  • 12. 122019 Energy Cost for Operations Instruction Fetch/D 70 [Horowitz ISSCC 2014] *45nm
  • 13. 132019 Energy Cost for DNN Ops Y = AB + C Instruction Data Type Energy Energy/Op 1 Op/Instr (*,+) Memory fp32 89 nJ 693 pJ 128 Ops/Instr (AX+B) Memory fp32 72 nJ 562 pJ 128 Ops/Instr (AX+B) Cache fp32 0.87 nJ 6.82 pJ 128 Ops/Instr (AX+B) Cache fp16 0.47 nJ 3.68 pJ *45nm
  • 14. 142019 Build a processor tuned to application - specialize for DNNs • More effective parallelism for AI Operations (not ILP) – SIMD vs. MIMD – VLIW vs. Speculative, out-of-order • Eliminate unneeded accuracy – IEEE replaced by lower precision FP – 32-64 bit bit integers to 8-16 bit integers • More effective use of memory bandwidth – User controlled versus caches How many TeraOps per Watt?
  • 15. 152019 2-D Grid of Processing Elements Systolic Arrays • Balance compute and I/O • Local Communication • Concurrency • M1*M2 nxn mult in 2n cycles
  • 16. 162019 TPUv2 • 8GB + 8GB HBM • 600 GB/s Mem BW • 32b float • MXU fp32 accumulation • Reduced precision mult. [Dean NIPS’17]
  • 17. 172019 Datatypes – Reduced Precision 16-bits accuracy matches 32-bits! Stochastic Rounding: 2.2 rounded to 2 with probability 0.8 rounded to 3 with probability 0.2
  • 19. 192019 Extreme Datatypes • Q: What can we do with single bit or ternary (+a, -b, 0) representations • A: Surprisingly, a lot. Just retrain/increase the number of activation layers [Chenzhuo et al,, arxiv 1612.01064] Network
  • 20. 202019 Pruning – How many values? 90% values can be thrown away
  • 21. 212019 Quantization – How many distinct values? 16 Values => 4 bits representation Instead of 16 fp32 numbers, store16 4-bit indices [Han et al, 2015]
  • 22. 222019 Memory Locality 20mm 32-bit op 4pJ 256-bit access 8KB Cache 50pJ 256-bit Bus 256pJ DRAM 32bits 640pJ 256-bit Bus 25pJ Compute Weights Interconnect Bring Compute Closer to Memory
  • 23. 232019 Memory and inter-chip communication advances [Graphcore 2018] pJ 64pJ/B 16GB 900GB/s @ 60W 10pJ/B 256MB 6TB/s @ 60W 1pJ/B 1000 x 256KB 60TB/s @ 60W Memory power density Is ~25% of logic power density
  • 24. 242019 In-Memory Compute [Ando et al., JSSC 04/18] In-memory compute with low-bits/value, low cost ops
  • 26. 262019 AI Application-System Co-design [Reagen et al, Minerva, ISCA 2016] • Co-design across the algorithm, architecture, and circuit levels • Optimize DNN hardware accelerators across multiple datasets
  • 27. 272019 Where Next? • Rise of non-Von Neumann Architectures to efficiently solve Domain-Specific Problems – A new Golden Age of Computer Architecture • Advances in semi technology, computer architecture will continue advances in AI • 10 Tops/W is the current State of the Art (~14nm) • Several orders of magnitude gap with the human brain