SlideShare a Scribd company logo
1 of 12
FPGAS VERSUS GPUS IN DATACENTERS
Babak Falsafi, Bill Dally, Desh Singh, Derek Chiou, Joshua J. Yi, Resit Sendag
IEEE Computer Society, 2017
Mehedi Hasan Raju
Datacenter
• A big datacenter consumes about 20 MW
• 17 times the size of a football field (regardless of what kind of football you
play)
• Costs a few billion dollars
- Datacenter at London already consume 6 percent of electricity, and
that’s growing at 20 percent per year.
• Moore’s law came with Dennard scaling
- P = f * C * V2 -> “silver bullet” in supply voltages (V )
- Voltages are scaled down -> exponentially power decreases
- It limits the power per area as we scale down the area
• According to ITRS projections from 2014 voltages have leveled off
• Shifting from a modern server CPU, to something like the EZ Chip/Cavium
efficient core design
- Specialized for server workloads.
- The good news with this type of approach is that it reduces the power.
What do you do next?
• Core complexity is going down.
- we might eventually get to the five-stage pipeline.
• Nikos Hardavellas showed that, even for server workloads with abundant
request-level parallelism, the server won’t be able to power up the entire
chip.
continue..
Figure
On the one hand, we have massive data, and on the other hand, we have
diminishing efficiency.
• Two possibilities
- Field-programmable gate arrays (FPGAs)
- GPUs
Focus of the panel
Two experts on the topic-
• Bill Dally, a chief scientist and senior vice president of research at
Nvidia
• Desh Singh, the director of virtualization labs for Altera.
Panel experts
• The problem facing datacenters is analogous.
• If you need lots of arithmetic performance, integer, floating point,
memory bandwidth, and/or tight integration with the CPU to read and
write
• you would use a GPU, because it does all these things better than
anything else.
• Nvidia’s P100 GPU is an example of how you get all these things.
• If you need to build something to process gene sequences
• Memory bandwidth needs are modest, and you do not need tight
integration with the CPU.
• Use an ASIC [application-specific integrated circuit]
• For encoding and decoding and do not need the high bandwidth and
arithmetic
Bill Dally: Use the Right Tool for the Job
• If you do not have the volume-
• use an FPGA, FPUs on an FPGA are great
• But you must feed those FPUs by building multiplexors, sequencers,
and other structures in the reconfigurable fabric.
• The overhead is enormous
• you never get the efficiency of a GPU on an FGPA.
Bill Dally: Use the Right Tool for the Job
• Summary from Bill Dally –
• for arithmetic and memory-intensive problems – use GPU.
• They are also much more programmable. You can program them
in CUDA, OpenACC, and OpenMP, and soon in extended Cþþ.
• for nonarithmetic logic with modest memory bandwidth demands - >
use an ASIC.
• If you have low volume -> use an FPGA,
• but realize you’re going to get this huge overhead and have a
really hard time programming it.
• You basically have to write Verilog or the equivalent.
Bill Dally: Use the Right Tool for the Job
• To run anything on a simple processor, the processor would need
something that fetches instructions from memory , a simple arithmetic
unit, and a register file.
• If you were to implement an instruction on this processor, it would
basically fetch an opcode that tells the processor how to configure the
data path for that opcode.
• Executing the assembly language instructions on the simple processor,
basically what you are doing is multiplexing that same piece of hardware
across time.
• This is essentially what a processor does: it takes the same piece of
hardware and multiplexes it to implement different kinds of instructions.
• This is essentially what you do when you try to transform the program to
run on an FPGA. You are essentially creating a custom piece of hardware
to implement your algorithm.
Desh Singh: Specialization for Efficiency
• They have built a Java compiler and things as complex as Spark and
Hadoop implementations that directly map to FPGA hardware.
• When you start using software to define hardware like this, it is not just
about defining the accelerator itself; you will find that there are other
fundamental things you can now do with the FPGA.
• Customize how you move data around.
• Data can bypass memory and directly enter the FPGA, which gives
you very efficient types of computation.
In short, FPGA enables the implementation of custom data and control paths
from standard software languages.
Desh Singh: Specialization for Efficiency
Questions?

More Related Content

Similar to FPGAs versus GPUs in Data centers

GPU databases - How to use them and what the future holds
GPU databases - How to use them and what the future holdsGPU databases - How to use them and what the future holds
GPU databases - How to use them and what the future holds
Arnon Shimoni
 
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUsChoose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
Databricks
 
GPU Computing: A brief overview
GPU Computing: A brief overviewGPU Computing: A brief overview
GPU Computing: A brief overview
Rajiv Kumar
 
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
chiportal
 
IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015
Daniela Zuppini
 

Similar to FPGAs versus GPUs in Data centers (20)

Key Note Session IDUG DB2 Seminar, 16th April London - Julian Stuhler .Trito...
Key Note Session  IDUG DB2 Seminar, 16th April London - Julian Stuhler .Trito...Key Note Session  IDUG DB2 Seminar, 16th April London - Julian Stuhler .Trito...
Key Note Session IDUG DB2 Seminar, 16th April London - Julian Stuhler .Trito...
 
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
 
UNIT-1.pptx
UNIT-1.pptxUNIT-1.pptx
UNIT-1.pptx
 
Big Data in a Public Cloud
Big Data in a Public CloudBig Data in a Public Cloud
Big Data in a Public Cloud
 
Deep learning with FPGA
Deep learning with FPGADeep learning with FPGA
Deep learning with FPGA
 
The Rise of Parallel Computing
The Rise of Parallel ComputingThe Rise of Parallel Computing
The Rise of Parallel Computing
 
GPU databases - How to use them and what the future holds
GPU databases - How to use them and what the future holdsGPU databases - How to use them and what the future holds
GPU databases - How to use them and what the future holds
 
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUsChoose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
 
GPU Computing: A brief overview
GPU Computing: A brief overviewGPU Computing: A brief overview
GPU Computing: A brief overview
 
SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER TutorialSCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
 
Hipeac 2018 keynote Talk
Hipeac 2018 keynote TalkHipeac 2018 keynote Talk
Hipeac 2018 keynote Talk
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
SQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
SQream DB - Bigger Data On GPUs: Approaches, Challenges, SuccessesSQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
SQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
 
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
 
Deploying Pretrained Model In Edge IoT Devices.pdf
Deploying Pretrained Model In Edge IoT Devices.pdfDeploying Pretrained Model In Edge IoT Devices.pdf
Deploying Pretrained Model In Edge IoT Devices.pdf
 
IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015
 
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
 
Infrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep LearningInfrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep Learning
 
Spark and Deep Learning frameworks with distributed workloads
Spark and Deep Learning frameworks with distributed workloadsSpark and Deep Learning frameworks with distributed workloads
Spark and Deep Learning frameworks with distributed workloads
 
Lecture2 cuda spring 2010
Lecture2 cuda spring 2010Lecture2 cuda spring 2010
Lecture2 cuda spring 2010
 

More from Mehedi Hasan Raju (8)

Evaluating and reducing cloud waste and cost—A data-driven case study from Az...
Evaluating and reducing cloud waste and cost—A data-driven case study from Az...Evaluating and reducing cloud waste and cost—A data-driven case study from Az...
Evaluating and reducing cloud waste and cost—A data-driven case study from Az...
 
2D arrays
2D arrays2D arrays
2D arrays
 
Result Management System
Result Management SystemResult Management System
Result Management System
 
Waveguide
WaveguideWaveguide
Waveguide
 
Representation of signals
Representation of signalsRepresentation of signals
Representation of signals
 
Bit error rate
Bit error rateBit error rate
Bit error rate
 
Vector space
Vector spaceVector space
Vector space
 
Inverse function
Inverse functionInverse function
Inverse function
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation Computing
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational Performance
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 

FPGAs versus GPUs in Data centers

  • 1. FPGAS VERSUS GPUS IN DATACENTERS Babak Falsafi, Bill Dally, Desh Singh, Derek Chiou, Joshua J. Yi, Resit Sendag IEEE Computer Society, 2017 Mehedi Hasan Raju
  • 2. Datacenter • A big datacenter consumes about 20 MW • 17 times the size of a football field (regardless of what kind of football you play) • Costs a few billion dollars - Datacenter at London already consume 6 percent of electricity, and that’s growing at 20 percent per year. • Moore’s law came with Dennard scaling - P = f * C * V2 -> “silver bullet” in supply voltages (V ) - Voltages are scaled down -> exponentially power decreases - It limits the power per area as we scale down the area • According to ITRS projections from 2014 voltages have leveled off
  • 3. • Shifting from a modern server CPU, to something like the EZ Chip/Cavium efficient core design - Specialized for server workloads. - The good news with this type of approach is that it reduces the power. What do you do next? • Core complexity is going down. - we might eventually get to the five-stage pipeline. • Nikos Hardavellas showed that, even for server workloads with abundant request-level parallelism, the server won’t be able to power up the entire chip. continue..
  • 5. On the one hand, we have massive data, and on the other hand, we have diminishing efficiency. • Two possibilities - Field-programmable gate arrays (FPGAs) - GPUs Focus of the panel
  • 6. Two experts on the topic- • Bill Dally, a chief scientist and senior vice president of research at Nvidia • Desh Singh, the director of virtualization labs for Altera. Panel experts
  • 7. • The problem facing datacenters is analogous. • If you need lots of arithmetic performance, integer, floating point, memory bandwidth, and/or tight integration with the CPU to read and write • you would use a GPU, because it does all these things better than anything else. • Nvidia’s P100 GPU is an example of how you get all these things. • If you need to build something to process gene sequences • Memory bandwidth needs are modest, and you do not need tight integration with the CPU. • Use an ASIC [application-specific integrated circuit] • For encoding and decoding and do not need the high bandwidth and arithmetic Bill Dally: Use the Right Tool for the Job
  • 8. • If you do not have the volume- • use an FPGA, FPUs on an FPGA are great • But you must feed those FPUs by building multiplexors, sequencers, and other structures in the reconfigurable fabric. • The overhead is enormous • you never get the efficiency of a GPU on an FGPA. Bill Dally: Use the Right Tool for the Job
  • 9. • Summary from Bill Dally – • for arithmetic and memory-intensive problems – use GPU. • They are also much more programmable. You can program them in CUDA, OpenACC, and OpenMP, and soon in extended Cþþ. • for nonarithmetic logic with modest memory bandwidth demands - > use an ASIC. • If you have low volume -> use an FPGA, • but realize you’re going to get this huge overhead and have a really hard time programming it. • You basically have to write Verilog or the equivalent. Bill Dally: Use the Right Tool for the Job
  • 10. • To run anything on a simple processor, the processor would need something that fetches instructions from memory , a simple arithmetic unit, and a register file. • If you were to implement an instruction on this processor, it would basically fetch an opcode that tells the processor how to configure the data path for that opcode. • Executing the assembly language instructions on the simple processor, basically what you are doing is multiplexing that same piece of hardware across time. • This is essentially what a processor does: it takes the same piece of hardware and multiplexes it to implement different kinds of instructions. • This is essentially what you do when you try to transform the program to run on an FPGA. You are essentially creating a custom piece of hardware to implement your algorithm. Desh Singh: Specialization for Efficiency
  • 11. • They have built a Java compiler and things as complex as Spark and Hadoop implementations that directly map to FPGA hardware. • When you start using software to define hardware like this, it is not just about defining the accelerator itself; you will find that there are other fundamental things you can now do with the FPGA. • Customize how you move data around. • Data can bypass memory and directly enter the FPGA, which gives you very efficient types of computation. In short, FPGA enables the implementation of custom data and control paths from standard software languages. Desh Singh: Specialization for Efficiency

Editor's Notes

  1. International Technology Roadmap for Semiconductors