SlideShare a Scribd company logo
1
Dr HAMADI CHAREF Brahim
Data Storage Institute (DSI)
Agency for Science, Technology and Research (A*STAR)
April 13, 2017
2
 ISCA 2017 Paper
 Motivations
 Photos
 Internals
 Performance
 Summary
 Related Patents
Agenda
3https://drive.google.com/file/d/0Bx4hafXDDq2EMzRNcy1vSUxtcEk/view
Paper on Google TPUTM
4
 Google Data Center demand / workload
 Neural Networks inference 95%
- Multi-Layer Perceptrons (MLPs)
- Convolutional Neural Networks (CNNs)
- Long Short-Term Memory Units (LSTMs)
 Performance Metrics
- Peak throughput (92 TeraOps/second)
- Power (Watts)
- Cost ($$$)
Google TPU - Motivations
5
 CPU vs GPU vs (FPGA vs) ASIC
CPU - server-class Intel Haswell
GPU - Nvidia K80
ASIC – Tensor Processing Unit
Google TPU - Motivations
6
Google TPU - Photos
7
Google TPU - Photos
8
Google TPU - Photos
9
Google TPU - Internals
10
Google TPU - Internals
11
Google TPU - Internals
12
Google TPU - Performance
13
Google TPU - Performance
14
Google TPU - Performance
15
Google TPU – Performance (roofline)
Roofline: an insightful visual performance model for multicore architectures
Samuel Williams, Andrew Waterman, David Patterson
Communications of the ACM - A Direct Path to Dependable Software: Volume 52 Issue 4, April 2009
16
 TPU is an ASIC for NNets
 BIG Matrix Unit
256x256 8b = 65,536 MACs (32b ACC)
 TPU on average 15X - 30X faster than GPU or CPU
 TOPS/Watt about 30X - 80X higher
 Future TPU could use GDDR5 memory (as GPU)
- triple achieved TOPS
- raise TOPS/Watt to nearly 70X the GPU
- raise TOPS/Watt to nearly 200X the CPU
Google TPU - Summary
17
 Vector Computation Unit in a Neural Network Processor
Gregory Michael Thorson, Christopher Aaron Clark, Dan Luu.
https://www.google.com/patents/US20160342889
 Batch Processing in a Neural Network Processor
Reginald Clifford Young
https://www.google.com/patents/US20160342890
 Neural Network Processor
Jonathan Ross, Norman Paul Jouppi, Andrew Everett Phelps, Reginald
Clifford Young, Thomas Norrie, Gregory Michael Thorson, Dan Luu.
https://www.google.com/patents/US20160342891
System and method for parallelizing convolutional neural networks
Alexander Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
https://www.google.com/patents/US20140180989
Google TPU - Patents
18
 Computing Convolutions Using a Neural Network Processor
Jonathan Ross, Andrew Everett Phelps.
https://www.google.com/patents/WO2016186811A1
 Prefetching Weights for a Neural Network Processor
Jonathan Ross.
https://www.google.com/patents/US20160342892
 Rotating Data for Neural Network Computations
Jonathan Ross, Gregory Michael Thorson.
http://google.com/patents/US20160342893
Google TPU - Patents

More Related Content

What's hot

CUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce ClusterCUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce Cluster
airbots
 
Expectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software researchExpectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software research
Ryousei Takano
 
USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)
Ryousei Takano
 
High performance computing tutorial, with checklist and tips to optimize clus...
High performance computing tutorial, with checklist and tips to optimize clus...High performance computing tutorial, with checklist and tips to optimize clus...
High performance computing tutorial, with checklist and tips to optimize clus...
Pradeep Redddy Raamana
 
Hadoop + GPU
Hadoop + GPUHadoop + GPU
Hadoop + GPU
Vladimir Starostenkov
 
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ..."NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...
Edge AI and Vision Alliance
 
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore EraFlow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
Ryousei Takano
 
Harnessing Big Data with Spark
Harnessing Big Data with SparkHarnessing Big Data with Spark
Harnessing Big Data with Spark
Alpine Data
 
GPU Computing
GPU ComputingGPU Computing
GPU Computing
Khan Mostafa
 
Big_Data_Heterogeneous_Programming IEEE_Big_Data 2015
Big_Data_Heterogeneous_Programming IEEE_Big_Data 2015Big_Data_Heterogeneous_Programming IEEE_Big_Data 2015
Big_Data_Heterogeneous_Programming IEEE_Big_Data 2015
Junli Gu
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2
Junli Gu
 
Hadoop mapreduce performance study on arm cluster
Hadoop mapreduce performance study on arm clusterHadoop mapreduce performance study on arm cluster
Hadoop mapreduce performance study on arm cluster
airbots
 
Google warehouse scale computer
Google warehouse scale computerGoogle warehouse scale computer
Google warehouse scale computer
Tejhaskar Ashok Kumar
 
OpenCL caffe IWOCL 2016 presentation final
OpenCL caffe IWOCL 2016 presentation finalOpenCL caffe IWOCL 2016 presentation final
OpenCL caffe IWOCL 2016 presentation final
Junli Gu
 
Parallel computing with Gpu
Parallel computing with GpuParallel computing with Gpu
Parallel computing with Gpu
Rohit Khatana
 
User-space Network Processing
User-space Network ProcessingUser-space Network Processing
User-space Network Processing
Ryousei Takano
 
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place SolutionKaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Preferred Networks
 
Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
 Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
Spark Summit
 

What's hot (18)

CUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce ClusterCUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce Cluster
 
Expectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software researchExpectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software research
 
USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)
 
High performance computing tutorial, with checklist and tips to optimize clus...
High performance computing tutorial, with checklist and tips to optimize clus...High performance computing tutorial, with checklist and tips to optimize clus...
High performance computing tutorial, with checklist and tips to optimize clus...
 
Hadoop + GPU
Hadoop + GPUHadoop + GPU
Hadoop + GPU
 
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ..."NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...
 
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore EraFlow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
 
Harnessing Big Data with Spark
Harnessing Big Data with SparkHarnessing Big Data with Spark
Harnessing Big Data with Spark
 
GPU Computing
GPU ComputingGPU Computing
GPU Computing
 
Big_Data_Heterogeneous_Programming IEEE_Big_Data 2015
Big_Data_Heterogeneous_Programming IEEE_Big_Data 2015Big_Data_Heterogeneous_Programming IEEE_Big_Data 2015
Big_Data_Heterogeneous_Programming IEEE_Big_Data 2015
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2
 
Hadoop mapreduce performance study on arm cluster
Hadoop mapreduce performance study on arm clusterHadoop mapreduce performance study on arm cluster
Hadoop mapreduce performance study on arm cluster
 
Google warehouse scale computer
Google warehouse scale computerGoogle warehouse scale computer
Google warehouse scale computer
 
OpenCL caffe IWOCL 2016 presentation final
OpenCL caffe IWOCL 2016 presentation finalOpenCL caffe IWOCL 2016 presentation final
OpenCL caffe IWOCL 2016 presentation final
 
Parallel computing with Gpu
Parallel computing with GpuParallel computing with Gpu
Parallel computing with Gpu
 
User-space Network Processing
User-space Network ProcessingUser-space Network Processing
User-space Network Processing
 
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place SolutionKaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
 
Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
 Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
 

Similar to 2017 04-13-google-tpu-04

組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム
Shinnosuke Furuya
 
Kindratenko hpc day 2011 Kiev
Kindratenko hpc day 2011 KievKindratenko hpc day 2011 Kiev
Kindratenko hpc day 2011 Kiev
Volodymyr Saviak
 
GPU Technology Conference 2014 Keynote
GPU Technology Conference 2014 KeynoteGPU Technology Conference 2014 Keynote
GPU Technology Conference 2014 Keynote
NVIDIA
 
Advances in GPU Computing
Advances in GPU ComputingAdvances in GPU Computing
Advances in GPU Computing
Frédéric Parienté
 
NWU and HPC
NWU and HPCNWU and HPC
NWU and HPC
Wilhelm van Belkum
 
Petascale Analytics - The World of Big Data Requires Big Analytics
Petascale Analytics - The World of Big Data Requires Big AnalyticsPetascale Analytics - The World of Big Data Requires Big Analytics
Petascale Analytics - The World of Big Data Requires Big Analytics
Heiko Joerg Schick
 
Distributed Deep Learning with Hadoop and TensorFlow
Distributed Deep Learning with Hadoop and TensorFlowDistributed Deep Learning with Hadoop and TensorFlow
Distributed Deep Learning with Hadoop and TensorFlow
Jan Wiegelmann
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unit
Jinwon Lee
 
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Databricks
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Kohei KaiGai
 
Hortonworks on IBM POWER Analytics / AI
Hortonworks on IBM POWER Analytics / AIHortonworks on IBM POWER Analytics / AI
Hortonworks on IBM POWER Analytics / AI
DataWorks Summit
 
Lrz kurs: big data analysis
Lrz kurs: big data analysisLrz kurs: big data analysis
Lrz kurs: big data analysis
Ferdinand Jamitzky
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server Solution
NVIDIA Taiwan
 
GIST AI-X Computing Cluster
GIST AI-X Computing ClusterGIST AI-X Computing Cluster
GIST AI-X Computing Cluster
Jax Jargalsaikhan
 
Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Panel: NRP Science Impacts​
Panel: NRP Science Impacts​
Larry Smarr
 
Deep Learning Update May 2016
Deep Learning Update May 2016Deep Learning Update May 2016
Deep Learning Update May 2016
Frédéric Parienté
 
Opportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIOpportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCI
Ryousei Takano
 
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
John Zedlewski
 
Cloud Computing y Big Data, próxima frontera de la innovación
Cloud Computing y Big Data, próxima frontera de la innovaciónCloud Computing y Big Data, próxima frontera de la innovación
Cloud Computing y Big Data, próxima frontera de la innovación
Fundación Ramón Areces
 
BioPig for scalable analysis of big sequencing data
BioPig for scalable analysis of big sequencing dataBioPig for scalable analysis of big sequencing data
BioPig for scalable analysis of big sequencing data
Zhong Wang
 

Similar to 2017 04-13-google-tpu-04 (20)

組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム
 
Kindratenko hpc day 2011 Kiev
Kindratenko hpc day 2011 KievKindratenko hpc day 2011 Kiev
Kindratenko hpc day 2011 Kiev
 
GPU Technology Conference 2014 Keynote
GPU Technology Conference 2014 KeynoteGPU Technology Conference 2014 Keynote
GPU Technology Conference 2014 Keynote
 
Advances in GPU Computing
Advances in GPU ComputingAdvances in GPU Computing
Advances in GPU Computing
 
NWU and HPC
NWU and HPCNWU and HPC
NWU and HPC
 
Petascale Analytics - The World of Big Data Requires Big Analytics
Petascale Analytics - The World of Big Data Requires Big AnalyticsPetascale Analytics - The World of Big Data Requires Big Analytics
Petascale Analytics - The World of Big Data Requires Big Analytics
 
Distributed Deep Learning with Hadoop and TensorFlow
Distributed Deep Learning with Hadoop and TensorFlowDistributed Deep Learning with Hadoop and TensorFlow
Distributed Deep Learning with Hadoop and TensorFlow
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unit
 
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
 
Hortonworks on IBM POWER Analytics / AI
Hortonworks on IBM POWER Analytics / AIHortonworks on IBM POWER Analytics / AI
Hortonworks on IBM POWER Analytics / AI
 
Lrz kurs: big data analysis
Lrz kurs: big data analysisLrz kurs: big data analysis
Lrz kurs: big data analysis
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server Solution
 
GIST AI-X Computing Cluster
GIST AI-X Computing ClusterGIST AI-X Computing Cluster
GIST AI-X Computing Cluster
 
Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Panel: NRP Science Impacts​
Panel: NRP Science Impacts​
 
Deep Learning Update May 2016
Deep Learning Update May 2016Deep Learning Update May 2016
Deep Learning Update May 2016
 
Opportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIOpportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCI
 
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
 
Cloud Computing y Big Data, próxima frontera de la innovación
Cloud Computing y Big Data, próxima frontera de la innovaciónCloud Computing y Big Data, próxima frontera de la innovación
Cloud Computing y Big Data, próxima frontera de la innovación
 
BioPig for scalable analysis of big sequencing data
BioPig for scalable analysis of big sequencing dataBioPig for scalable analysis of big sequencing data
BioPig for scalable analysis of big sequencing data
 

Recently uploaded

Introduction to Computer Networks & OSI MODEL.ppt
Introduction to Computer Networks & OSI MODEL.pptIntroduction to Computer Networks & OSI MODEL.ppt
Introduction to Computer Networks & OSI MODEL.ppt
Dwarkadas J Sanghvi College of Engineering
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
Divyanshu
 
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
PIMR BHOPAL
 
smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...
um7474492
 
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
MadhavJungKarki
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
ElakkiaU
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Sinan KOZAK
 
Introduction to verilog basic modeling .ppt
Introduction to verilog basic modeling   .pptIntroduction to verilog basic modeling   .ppt
Introduction to verilog basic modeling .ppt
AmitKumar730022
 
Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...
Prakhyath Rai
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
co23btech11018
 
5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf
AlvianRamadhani5
 
Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
21UME003TUSHARDEB
 
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENTNATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
Addu25809
 
Pressure Relief valve used in flow line to release the over pressure at our d...
Pressure Relief valve used in flow line to release the over pressure at our d...Pressure Relief valve used in flow line to release the over pressure at our d...
Pressure Relief valve used in flow line to release the over pressure at our d...
cannyengineerings
 
ITSM Integration with MuleSoft.pptx
ITSM  Integration with MuleSoft.pptxITSM  Integration with MuleSoft.pptx
ITSM Integration with MuleSoft.pptx
VANDANAMOHANGOUDA
 
TIME TABLE MANAGEMENT SYSTEM testing.pptx
TIME TABLE MANAGEMENT SYSTEM testing.pptxTIME TABLE MANAGEMENT SYSTEM testing.pptx
TIME TABLE MANAGEMENT SYSTEM testing.pptx
CVCSOfficial
 
Supermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdfSupermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdf
Kamal Acharya
 
Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
aryanpankaj78
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
Atif Razi
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
UReason
 

Recently uploaded (20)

Introduction to Computer Networks & OSI MODEL.ppt
Introduction to Computer Networks & OSI MODEL.pptIntroduction to Computer Networks & OSI MODEL.ppt
Introduction to Computer Networks & OSI MODEL.ppt
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
 
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
 
smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...
 
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
 
Introduction to verilog basic modeling .ppt
Introduction to verilog basic modeling   .pptIntroduction to verilog basic modeling   .ppt
Introduction to verilog basic modeling .ppt
 
Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
 
5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf
 
Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
 
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENTNATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
 
Pressure Relief valve used in flow line to release the over pressure at our d...
Pressure Relief valve used in flow line to release the over pressure at our d...Pressure Relief valve used in flow line to release the over pressure at our d...
Pressure Relief valve used in flow line to release the over pressure at our d...
 
ITSM Integration with MuleSoft.pptx
ITSM  Integration with MuleSoft.pptxITSM  Integration with MuleSoft.pptx
ITSM Integration with MuleSoft.pptx
 
TIME TABLE MANAGEMENT SYSTEM testing.pptx
TIME TABLE MANAGEMENT SYSTEM testing.pptxTIME TABLE MANAGEMENT SYSTEM testing.pptx
TIME TABLE MANAGEMENT SYSTEM testing.pptx
 
Supermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdfSupermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdf
 
Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
 

2017 04-13-google-tpu-04

  • 1. 1 Dr HAMADI CHAREF Brahim Data Storage Institute (DSI) Agency for Science, Technology and Research (A*STAR) April 13, 2017
  • 2. 2  ISCA 2017 Paper  Motivations  Photos  Internals  Performance  Summary  Related Patents Agenda
  • 4. 4  Google Data Center demand / workload  Neural Networks inference 95% - Multi-Layer Perceptrons (MLPs) - Convolutional Neural Networks (CNNs) - Long Short-Term Memory Units (LSTMs)  Performance Metrics - Peak throughput (92 TeraOps/second) - Power (Watts) - Cost ($$$) Google TPU - Motivations
  • 5. 5  CPU vs GPU vs (FPGA vs) ASIC CPU - server-class Intel Haswell GPU - Nvidia K80 ASIC – Tensor Processing Unit Google TPU - Motivations
  • 6. 6 Google TPU - Photos
  • 7. 7 Google TPU - Photos
  • 8. 8 Google TPU - Photos
  • 9. 9 Google TPU - Internals
  • 10. 10 Google TPU - Internals
  • 11. 11 Google TPU - Internals
  • 12. 12 Google TPU - Performance
  • 13. 13 Google TPU - Performance
  • 14. 14 Google TPU - Performance
  • 15. 15 Google TPU – Performance (roofline) Roofline: an insightful visual performance model for multicore architectures Samuel Williams, Andrew Waterman, David Patterson Communications of the ACM - A Direct Path to Dependable Software: Volume 52 Issue 4, April 2009
  • 16. 16  TPU is an ASIC for NNets  BIG Matrix Unit 256x256 8b = 65,536 MACs (32b ACC)  TPU on average 15X - 30X faster than GPU or CPU  TOPS/Watt about 30X - 80X higher  Future TPU could use GDDR5 memory (as GPU) - triple achieved TOPS - raise TOPS/Watt to nearly 70X the GPU - raise TOPS/Watt to nearly 200X the CPU Google TPU - Summary
  • 17. 17  Vector Computation Unit in a Neural Network Processor Gregory Michael Thorson, Christopher Aaron Clark, Dan Luu. https://www.google.com/patents/US20160342889  Batch Processing in a Neural Network Processor Reginald Clifford Young https://www.google.com/patents/US20160342890  Neural Network Processor Jonathan Ross, Norman Paul Jouppi, Andrew Everett Phelps, Reginald Clifford Young, Thomas Norrie, Gregory Michael Thorson, Dan Luu. https://www.google.com/patents/US20160342891 System and method for parallelizing convolutional neural networks Alexander Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton https://www.google.com/patents/US20140180989 Google TPU - Patents
  • 18. 18  Computing Convolutions Using a Neural Network Processor Jonathan Ross, Andrew Everett Phelps. https://www.google.com/patents/WO2016186811A1  Prefetching Weights for a Neural Network Processor Jonathan Ross. https://www.google.com/patents/US20160342892  Rotating Data for Neural Network Computations Jonathan Ross, Gregory Michael Thorson. http://google.com/patents/US20160342893 Google TPU - Patents