SlideShare a Scribd company logo
Motivation for TPU
• 2006: “Just run DNNs
on our CPU datacenter.
It’s basically free.”
• 2013: “3 minutes of
DNN-based voice
search == 2x more
datacenter compute.”
The Players
Norman P. Jouppi and
his two musketeers.
David Patterson
70+ other Google engineers
• 30-80x TOPS/watt vs.
2015 CPUs and GPUs.
• 8 GiB DRAM.
• 8-bit fixed point.
• 256x256 MAC unit.
• Support for data
reordering, matrix
multiply, activation,
pooling, and
normalization.
Tensor Processing Unit (TPU)
“The unexpected desire for TPUs by many Google services combined with
the preference for low response time changed the equation, with
application writers often opting for reduced latency over waiting for
bigger batches to accumulate.”
Application Testbed
TPU Block Diagram & Floor Plan
Experimental Testbed
8x K80 GPUs
The Roofline Model
Rooflines of TPU with DNN Apps
App breakdown by Performance Counters
Latency Results (99%ile)
Programming the TPU Programming FPGAs
TensorFlow

graph
TPU host 

instructions
TPU
bitstream
NVIDIA’s Rebuttal to the TPU
https://blogs.nvidia.com/blog/2017/04/10/ai-drives-rise-accelerated-computing-datacenter/
“Patterson” Discussion
1. Fallacy: NN inference applications in data centers value throughput as much as response time.
2. Fallacy: The K80 GPU architecture is a good match to NN inference.
3. Pitfall: Architects have neglected important NN tasks.
4. Pitfall: For NN hardware, Inferences Per Second (IPS) is an inaccurate summary performance
metric.
5. Fallacy: The K80 GPU results would be much better if Boost mode were enabled.
6. Fallacy: CPU and GPU results would be comparable to the TPU if we used them more
efficiently or compared to newer versions.
7. Pitfall: Performance counters added as an afterthought for NN hardware.
8. Fallacy: After two years of software tuning, the only path left to increase TPU performance is
hardware upgrades.
“CNNs constitute only about 5% of the representative NN workload for
Google. More attention should be paid to MLPs and LSTMs. Repeating
history, it’s similar to when many architects concentrated on floating-
point performance when most mainstream workloads turned out to
be dominated by integer operations.”
Interesting quote

More Related Content

What's hot

08 Supercomputer Fugaku
08 Supercomputer Fugaku08 Supercomputer Fugaku
08 Supercomputer Fugaku
RCCSRENKEI
 
High performance computing - building blocks, production & perspective
High performance computing - building blocks, production & perspectiveHigh performance computing - building blocks, production & perspective
High performance computing - building blocks, production & perspectiveJason Shih
 
Nvidia (History, GPU Architecture and New Pascal Architecture)
Nvidia (History, GPU Architecture and New Pascal Architecture)Nvidia (History, GPU Architecture and New Pascal Architecture)
Nvidia (History, GPU Architecture and New Pascal Architecture)
Saksham Tanwar
 
Google File System
Google File SystemGoogle File System
Google File Systemnadikari123
 
Linux memory consumption
Linux memory consumptionLinux memory consumption
Linux memory consumption
haish
 
Wireshark入門 (2014版)
Wireshark入門 (2014版)Wireshark入門 (2014版)
Wireshark入門 (2014版)
彰 村地
 
Google Spanner
Google SpannerGoogle Spanner
Google Spanner
Vaidas Brundza
 
Intro to Machine Learning for GPUs
Intro to Machine Learning for GPUsIntro to Machine Learning for GPUs
Intro to Machine Learning for GPUs
Sri Ambati
 
why we need ext4
why we need ext4why we need ext4
why we need ext4
Hao(Robin) Dong
 
Google file system
Google file systemGoogle file system
Google file system
Lalit Rastogi
 
Introduction to High Performance Computing
Introduction to High Performance ComputingIntroduction to High Performance Computing
Introduction to High Performance Computing
Umarudin Zaenuri
 
Multicore computers
Multicore computersMulticore computers
Multicore computers
Syed Zaid Irshad
 
Intel® hyper threading technology
Intel® hyper threading technologyIntel® hyper threading technology
Intel® hyper threading technology
Amirali Sharifian
 
EnrootとPyxisで快適コンテナ生活
EnrootとPyxisで快適コンテナ生活EnrootとPyxisで快適コンテナ生活
EnrootとPyxisで快適コンテナ生活
Kuninobu SaSaki
 
確実な再起動からはじめる クラウドネイティブオペレーション
確実な再起動からはじめる クラウドネイティブオペレーション確実な再起動からはじめる クラウドネイティブオペレーション
確実な再起動からはじめる クラウドネイティブオペレーション
Toru Makabe
 
GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)
Fatima Qayyum
 
Multi core processors
Multi core processorsMulti core processors
Multi core processors
Nipun Sharma
 
Graphics Processing Unit by Saurabh
Graphics Processing Unit by SaurabhGraphics Processing Unit by Saurabh
Graphics Processing Unit by Saurabh
Saurabh Kumar
 
Doom Technical Review
Doom Technical ReviewDoom Technical Review
Doom Technical Review
Ali Salehi
 

What's hot (20)

08 Supercomputer Fugaku
08 Supercomputer Fugaku08 Supercomputer Fugaku
08 Supercomputer Fugaku
 
High performance computing - building blocks, production & perspective
High performance computing - building blocks, production & perspectiveHigh performance computing - building blocks, production & perspective
High performance computing - building blocks, production & perspective
 
Nvidia (History, GPU Architecture and New Pascal Architecture)
Nvidia (History, GPU Architecture and New Pascal Architecture)Nvidia (History, GPU Architecture and New Pascal Architecture)
Nvidia (History, GPU Architecture and New Pascal Architecture)
 
Google File System
Google File SystemGoogle File System
Google File System
 
Linux memory consumption
Linux memory consumptionLinux memory consumption
Linux memory consumption
 
Wireshark入門 (2014版)
Wireshark入門 (2014版)Wireshark入門 (2014版)
Wireshark入門 (2014版)
 
Google Spanner
Google SpannerGoogle Spanner
Google Spanner
 
Current Trends in HPC
Current Trends in HPCCurrent Trends in HPC
Current Trends in HPC
 
Intro to Machine Learning for GPUs
Intro to Machine Learning for GPUsIntro to Machine Learning for GPUs
Intro to Machine Learning for GPUs
 
why we need ext4
why we need ext4why we need ext4
why we need ext4
 
Google file system
Google file systemGoogle file system
Google file system
 
Introduction to High Performance Computing
Introduction to High Performance ComputingIntroduction to High Performance Computing
Introduction to High Performance Computing
 
Multicore computers
Multicore computersMulticore computers
Multicore computers
 
Intel® hyper threading technology
Intel® hyper threading technologyIntel® hyper threading technology
Intel® hyper threading technology
 
EnrootとPyxisで快適コンテナ生活
EnrootとPyxisで快適コンテナ生活EnrootとPyxisで快適コンテナ生活
EnrootとPyxisで快適コンテナ生活
 
確実な再起動からはじめる クラウドネイティブオペレーション
確実な再起動からはじめる クラウドネイティブオペレーション確実な再起動からはじめる クラウドネイティブオペレーション
確実な再起動からはじめる クラウドネイティブオペレーション
 
GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)
 
Multi core processors
Multi core processorsMulti core processors
Multi core processors
 
Graphics Processing Unit by Saurabh
Graphics Processing Unit by SaurabhGraphics Processing Unit by Saurabh
Graphics Processing Unit by Saurabh
 
Doom Technical Review
Doom Technical ReviewDoom Technical Review
Doom Technical Review
 

Similar to Slides for In-Datacenter Performance Analysis of a Tensor Processing Unit

Infrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep LearningInfrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep Learning
Sergey Karayev
 
[Hadoop Meetup] Tensorflow on Apache Hadoop YARN - Sunil Govindan
[Hadoop Meetup] Tensorflow on Apache Hadoop YARN - Sunil Govindan[Hadoop Meetup] Tensorflow on Apache Hadoop YARN - Sunil Govindan
[Hadoop Meetup] Tensorflow on Apache Hadoop YARN - Sunil Govindan
Newton Alex
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2Junli Gu
 
Deep Learning with Spark and GPUs
Deep Learning with Spark and GPUsDeep Learning with Spark and GPUs
Deep Learning with Spark and GPUs
DataWorks Summit
 
GPU accelerated Large Scale Analytics
GPU accelerated Large Scale AnalyticsGPU accelerated Large Scale Analytics
GPU accelerated Large Scale Analytics
Suleiman Shehu
 
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using PrometheusMonitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
Databricks
 
Deep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce SpitlerDeep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce Spitler
Databricks
 
IRJET-A Study on Parallization of Genetic Algorithms on GPUS using CUDA
IRJET-A Study on Parallization of Genetic Algorithms on GPUS using CUDAIRJET-A Study on Parallization of Genetic Algorithms on GPUS using CUDA
IRJET-A Study on Parallization of Genetic Algorithms on GPUS using CUDA
IRJET Journal
 
Fugaku, the Successes and the Lessons Learned
Fugaku, the Successes and the Lessons LearnedFugaku, the Successes and the Lessons Learned
Fugaku, the Successes and the Lessons Learned
RCCSRENKEI
 
An Introduction to H2O4GPU
An Introduction to H2O4GPUAn Introduction to H2O4GPU
An Introduction to H2O4GPU
Sri Ambati
 
E3MV - Embedded Vision - Sundance
E3MV - Embedded Vision - SundanceE3MV - Embedded Vision - Sundance
E3MV - Embedded Vision - Sundance
Sundance Multiprocessor Technology Ltd.
 
Large Model support and Distribute deep learning
Large Model support and Distribute deep learningLarge Model support and Distribute deep learning
Large Model support and Distribute deep learning
Ganesan Narayanasamy
 
Gpu
GpuGpu
Gpu
GpuGpu
Intel python 2017
Intel python 2017Intel python 2017
Intel python 2017
DESMOND YUEN
 
Python* Scalability in Production Environments
Python* Scalability in Production EnvironmentsPython* Scalability in Production Environments
Python* Scalability in Production Environments
Intel® Software
 
FYP1 Progress Report (final)
FYP1 Progress Report (final)FYP1 Progress Report (final)
FYP1 Progress Report (final)waqas khan
 
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
NVIDIA Taiwan
 
Deep Learning on the SaturnV Cluster
Deep Learning on the SaturnV ClusterDeep Learning on the SaturnV Cluster
Deep Learning on the SaturnV Cluster
inside-BigData.com
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Alluxio, Inc.
 

Similar to Slides for In-Datacenter Performance Analysis of a Tensor Processing Unit (20)

Infrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep LearningInfrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep Learning
 
[Hadoop Meetup] Tensorflow on Apache Hadoop YARN - Sunil Govindan
[Hadoop Meetup] Tensorflow on Apache Hadoop YARN - Sunil Govindan[Hadoop Meetup] Tensorflow on Apache Hadoop YARN - Sunil Govindan
[Hadoop Meetup] Tensorflow on Apache Hadoop YARN - Sunil Govindan
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2
 
Deep Learning with Spark and GPUs
Deep Learning with Spark and GPUsDeep Learning with Spark and GPUs
Deep Learning with Spark and GPUs
 
GPU accelerated Large Scale Analytics
GPU accelerated Large Scale AnalyticsGPU accelerated Large Scale Analytics
GPU accelerated Large Scale Analytics
 
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using PrometheusMonitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
 
Deep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce SpitlerDeep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce Spitler
 
IRJET-A Study on Parallization of Genetic Algorithms on GPUS using CUDA
IRJET-A Study on Parallization of Genetic Algorithms on GPUS using CUDAIRJET-A Study on Parallization of Genetic Algorithms on GPUS using CUDA
IRJET-A Study on Parallization of Genetic Algorithms on GPUS using CUDA
 
Fugaku, the Successes and the Lessons Learned
Fugaku, the Successes and the Lessons LearnedFugaku, the Successes and the Lessons Learned
Fugaku, the Successes and the Lessons Learned
 
An Introduction to H2O4GPU
An Introduction to H2O4GPUAn Introduction to H2O4GPU
An Introduction to H2O4GPU
 
E3MV - Embedded Vision - Sundance
E3MV - Embedded Vision - SundanceE3MV - Embedded Vision - Sundance
E3MV - Embedded Vision - Sundance
 
Large Model support and Distribute deep learning
Large Model support and Distribute deep learningLarge Model support and Distribute deep learning
Large Model support and Distribute deep learning
 
Gpu
GpuGpu
Gpu
 
Gpu
GpuGpu
Gpu
 
Intel python 2017
Intel python 2017Intel python 2017
Intel python 2017
 
Python* Scalability in Production Environments
Python* Scalability in Production EnvironmentsPython* Scalability in Production Environments
Python* Scalability in Production Environments
 
FYP1 Progress Report (final)
FYP1 Progress Report (final)FYP1 Progress Report (final)
FYP1 Progress Report (final)
 
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
 
Deep Learning on the SaturnV Cluster
Deep Learning on the SaturnV ClusterDeep Learning on the SaturnV Cluster
Deep Learning on the SaturnV Cluster
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
 

Recently uploaded

Schematic Diagram MSI MS-7309 - REV 1.0 PDF .pdf
Schematic Diagram MSI MS-7309 - REV 1.0 PDF .pdfSchematic Diagram MSI MS-7309 - REV 1.0 PDF .pdf
Schematic Diagram MSI MS-7309 - REV 1.0 PDF .pdf
nikoloco007
 
欧洲杯冠军-欧洲杯冠军网站-欧洲杯冠军|【​网址​🎉ac123.net🎉​】领先全球的买球投注平台
欧洲杯冠军-欧洲杯冠军网站-欧洲杯冠军|【​网址​🎉ac123.net🎉​】领先全球的买球投注平台欧洲杯冠军-欧洲杯冠军网站-欧洲杯冠军|【​网址​🎉ac123.net🎉​】领先全球的买球投注平台
欧洲杯冠军-欧洲杯冠军网站-欧洲杯冠军|【​网址​🎉ac123.net🎉​】领先全球的买球投注平台
andreassenrolf537
 
LORRAINE ANDREI_LEQUIGAN_GOOGLE CALENDAR
LORRAINE ANDREI_LEQUIGAN_GOOGLE CALENDARLORRAINE ANDREI_LEQUIGAN_GOOGLE CALENDAR
LORRAINE ANDREI_LEQUIGAN_GOOGLE CALENDAR
lorraineandreiamcidl
 
一比一原版(UCSB毕业证)圣塔芭芭拉社区大学毕业证如何办理
一比一原版(UCSB毕业证)圣塔芭芭拉社区大学毕业证如何办理一比一原版(UCSB毕业证)圣塔芭芭拉社区大学毕业证如何办理
一比一原版(UCSB毕业证)圣塔芭芭拉社区大学毕业证如何办理
aozcue
 
加急办理美国南加州大学毕业证文凭毕业证原版一模一样
加急办理美国南加州大学毕业证文凭毕业证原版一模一样加急办理美国南加州大学毕业证文凭毕业证原版一模一样
加急办理美国南加州大学毕业证文凭毕业证原版一模一样
u0g33km
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证如何办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证如何办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证如何办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证如何办理
aozcue
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证如何办理
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证如何办理一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证如何办理
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证如何办理
peuce
 
Building a Raspberry Pi Robot with Dot NET 8, Blazor and SignalR - Slides Onl...
Building a Raspberry Pi Robot with Dot NET 8, Blazor and SignalR - Slides Onl...Building a Raspberry Pi Robot with Dot NET 8, Blazor and SignalR - Slides Onl...
Building a Raspberry Pi Robot with Dot NET 8, Blazor and SignalR - Slides Onl...
Peter Gallagher
 
Production.pptxd dddddddddddddddddddddddddddddddddd
Production.pptxd ddddddddddddddddddddddddddddddddddProduction.pptxd dddddddddddddddddddddddddddddddddd
Production.pptxd dddddddddddddddddddddddddddddddddd
DanielOliver74
 

Recently uploaded (9)

Schematic Diagram MSI MS-7309 - REV 1.0 PDF .pdf
Schematic Diagram MSI MS-7309 - REV 1.0 PDF .pdfSchematic Diagram MSI MS-7309 - REV 1.0 PDF .pdf
Schematic Diagram MSI MS-7309 - REV 1.0 PDF .pdf
 
欧洲杯冠军-欧洲杯冠军网站-欧洲杯冠军|【​网址​🎉ac123.net🎉​】领先全球的买球投注平台
欧洲杯冠军-欧洲杯冠军网站-欧洲杯冠军|【​网址​🎉ac123.net🎉​】领先全球的买球投注平台欧洲杯冠军-欧洲杯冠军网站-欧洲杯冠军|【​网址​🎉ac123.net🎉​】领先全球的买球投注平台
欧洲杯冠军-欧洲杯冠军网站-欧洲杯冠军|【​网址​🎉ac123.net🎉​】领先全球的买球投注平台
 
LORRAINE ANDREI_LEQUIGAN_GOOGLE CALENDAR
LORRAINE ANDREI_LEQUIGAN_GOOGLE CALENDARLORRAINE ANDREI_LEQUIGAN_GOOGLE CALENDAR
LORRAINE ANDREI_LEQUIGAN_GOOGLE CALENDAR
 
一比一原版(UCSB毕业证)圣塔芭芭拉社区大学毕业证如何办理
一比一原版(UCSB毕业证)圣塔芭芭拉社区大学毕业证如何办理一比一原版(UCSB毕业证)圣塔芭芭拉社区大学毕业证如何办理
一比一原版(UCSB毕业证)圣塔芭芭拉社区大学毕业证如何办理
 
加急办理美国南加州大学毕业证文凭毕业证原版一模一样
加急办理美国南加州大学毕业证文凭毕业证原版一模一样加急办理美国南加州大学毕业证文凭毕业证原版一模一样
加急办理美国南加州大学毕业证文凭毕业证原版一模一样
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证如何办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证如何办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证如何办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证如何办理
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证如何办理
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证如何办理一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证如何办理
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证如何办理
 
Building a Raspberry Pi Robot with Dot NET 8, Blazor and SignalR - Slides Onl...
Building a Raspberry Pi Robot with Dot NET 8, Blazor and SignalR - Slides Onl...Building a Raspberry Pi Robot with Dot NET 8, Blazor and SignalR - Slides Onl...
Building a Raspberry Pi Robot with Dot NET 8, Blazor and SignalR - Slides Onl...
 
Production.pptxd dddddddddddddddddddddddddddddddddd
Production.pptxd ddddddddddddddddddddddddddddddddddProduction.pptxd dddddddddddddddddddddddddddddddddd
Production.pptxd dddddddddddddddddddddddddddddddddd
 

Slides for In-Datacenter Performance Analysis of a Tensor Processing Unit

  • 1.
  • 2. Motivation for TPU • 2006: “Just run DNNs on our CPU datacenter. It’s basically free.” • 2013: “3 minutes of DNN-based voice search == 2x more datacenter compute.”
  • 3. The Players Norman P. Jouppi and his two musketeers. David Patterson 70+ other Google engineers
  • 4. • 30-80x TOPS/watt vs. 2015 CPUs and GPUs. • 8 GiB DRAM. • 8-bit fixed point. • 256x256 MAC unit. • Support for data reordering, matrix multiply, activation, pooling, and normalization. Tensor Processing Unit (TPU)
  • 5. “The unexpected desire for TPUs by many Google services combined with the preference for low response time changed the equation, with application writers often opting for reduced latency over waiting for bigger batches to accumulate.” Application Testbed
  • 6. TPU Block Diagram & Floor Plan
  • 9. Rooflines of TPU with DNN Apps
  • 10. App breakdown by Performance Counters
  • 12. Programming the TPU Programming FPGAs TensorFlow
 graph TPU host 
 instructions TPU bitstream
  • 13. NVIDIA’s Rebuttal to the TPU https://blogs.nvidia.com/blog/2017/04/10/ai-drives-rise-accelerated-computing-datacenter/
  • 14. “Patterson” Discussion 1. Fallacy: NN inference applications in data centers value throughput as much as response time. 2. Fallacy: The K80 GPU architecture is a good match to NN inference. 3. Pitfall: Architects have neglected important NN tasks. 4. Pitfall: For NN hardware, Inferences Per Second (IPS) is an inaccurate summary performance metric. 5. Fallacy: The K80 GPU results would be much better if Boost mode were enabled. 6. Fallacy: CPU and GPU results would be comparable to the TPU if we used them more efficiently or compared to newer versions. 7. Pitfall: Performance counters added as an afterthought for NN hardware. 8. Fallacy: After two years of software tuning, the only path left to increase TPU performance is hardware upgrades.
  • 15. “CNNs constitute only about 5% of the representative NN workload for Google. More attention should be paid to MLPs and LSTMs. Repeating history, it’s similar to when many architects concentrated on floating- point performance when most mainstream workloads turned out to be dominated by integer operations.” Interesting quote