An Experimental Study of Reduced-Voltage Operation in Modern FPGAsfor Neural Network Acceleration (full presentation)

LEGATO project
LEGATO projectLEGATO project
An Experimental Study of Reduced-Voltage
Operation in Modern FPGAs
for Neural Network Acceleration
50th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 30th June, 2020
Behzad Salami Baturay Onural Ismail Yuksel
Fahrettin Koc Oguz Ergin Adrian Cristal
Osman Unsal Hamid Sarbazi-Azad Onur Mutlu
2
Executive Summary
• Motivation: Power consumption of neural networks is a main concern
 Hardware acceleration: GPUs, FPGAs, and ASICs
• Problem: FPGAs are at least 10X less power-efficient than equivalent ASICs
• Goal: Bridge the power-efficiency gap between ASIC- and FPGA-based
neural networks by Undervolting below nominal level
• Evaluation Setup
 5 Image classification workloads
 3 Xilinx UltraScale+ ZCU102 platforms
 2 On-chip voltage rails
• Main Results
 Large voltage guardband (i.e., 33%)
 >3X power-efficiency gain
3
Outline
• Motivation and Background
• Our Goal
• Methodology
• Results
- Overall Voltage Behavior
- Power-Reliability Trade-off
- Frequency Underscaling
- Environmental Temperature
• Prior Works
• Summary, Conclusion, and Future Works
4
Outline
• Motivation and Background
• Our Goal
• Methodology
• Results
- Overall Voltage Behavior
- Power-Reliability Trade-off
- Frequency Underscaling
- Environmental Temperature
• Prior Works
• Summary, Conclusion, and Future Works
5
Motivation and Background
• Motivation
 Power consumption of neural networks is a main concern
 Hardware acceleration: GPUs, FPGAs, and ASICs
 FPGAs: Getting popular but less power-efficient than equivalent ASICs
 Large voltage guardbands (12-35%) for CPUs, GPUs, DRAMs
 Any potential of “Undervolting FPGAs” for power-efficiency of neural networks?
• Background
 Neural Networks: Widely deployed with an inherent resilience to errors
 FPGAs: Higher throughput than GPUs and better flexibility than ASICs
 Undervolting: Reduces power cons., may incur reliability or performance issues
6
Outline
• Motivation and Background
• Our Goal
• Methodology
• Results
- Overall Voltage Behavior
- Power-Reliability Trade-off
- Frequency Underscaling
- Environmental Temperature
• Prior Works
• Summary, Conclusion, and Future Works
7
Our Goal
• Primary Goal
 Bridge the power-efficiency gap between ASIC- and FPGA-based
neural networks by:
 Undervolting (i.e., underscaling voltage below nominal level)
• Secondary Goals
 Study the voltage behavior of real FPGAs (e.g., guardband)
 Study the power-efficiency gain of undervolting for neural networks
 Study the reliability overhead
 Study the frequency underscaling to prevent the accuracy loss
 Study the effect of environmental temperature
8
Outline
• Motivation and Background
• Our Goal
• Methodology
• Results
- Overall Voltage Behavior
- Power-Reliability Trade-off
- Frequency Underscaling
- Environmental Temperature
• Prior Works
• Summary, Conclusion, and Future Works
9
Overall Methodology
• 5 CNN image classification
workloads, i.e., VGGNet, GoogleNet,
AlexNet, ResNet50, Inception.
• Xilinx DNNDK to map CNN into FPGA
 By default optimized for INT8
• 3 identical samples of Xilinx ZCU102
 ZYNQ Ultrscale+ architecture
 Hard-core ARM for data orchestration
 FPGA for CNN acceleration
• 2 on-chip voltage rails, via PMBus
 𝑉𝐶𝐶𝐼𝑁𝑇: DSPs, LUTs, buffers, …
 𝑉𝐶𝐶𝐵𝑅𝐴𝑀: BRAMs
 𝑉𝑛𝑜𝑚= 850mV (set by manufacturer)
Vast majority (>99.9%) of the power is dissipated on 𝑉𝐶𝐶𝐼𝑁𝑇
10
Outline
• Motivation and Background
• Our Goal
• Methodology
• Results
- Overall Voltage Behavior
- Power-Reliability Trade-off
- Frequency Underscaling
- Environmental Temperature
• Prior Works
• Summary, Conclusion, and Future Works
11
Outline
• Motivation and Background
• Our Goal
• Methodology
• Results
- Overall Voltage Behavior
- Power-Reliability Trade-off
- Frequency Underscaling
- Environmental Temperature
• Prior Works
• Summary, Conclusion, and Future Works
12
Overall Voltage Behavior
Slight variation of voltage behavior across platforms and benchmarks
 FPGA stops operatingCrash
• Guardband: Large region below nominal level (𝑽 𝒏𝒐𝒎 = 𝟖𝟓𝟎𝒎𝑽)
• Critical: Narrower region below guardband (𝑽 𝒎𝒊𝒏 = 𝟓𝟕𝟎𝒎𝑽)
• Crash: FPGA crashes below critical region (𝑽 𝒄𝒓𝒂𝒔𝒉 = 𝟓𝟒𝟎𝒎𝑽)
 No performance or reliability loss
 Added by the vendor to ensure the
worst-case conditions
 Large guardband, average of 33%
Guard
band
 A narrow voltage region
 Neural network accuracy collapse
Critical
13
Outline
• Motivation and Background
• Our Goal
• Methodology
• Results
- Overall Voltage Behavior
- Power-Reliability Trade-off
- Frequency Underscaling
- Environmental Temperature
• Prior Works
• Summary, Conclusion, and Future Works
14
Power-Reliability Trade-off
Power-efficiency (GOPs/W) gain
• >3X power saving (2.6X by eliminating guardband and further 43% in critical region)
Reliability overhead (i.e., CNN accuracy loss)
VGGNet GoogleNet AlexNet ResNet Inception
• Slight variation across 3 platforms and 5 workloads
• No accuracy loss in the guardband, accuracy collapse in the critical region
• Slight variation across 3 platforms and 5 workloads
15
Outline
• Motivation and Background
• Our Goal
• Methodology
• Results
- Overall Voltage Behavior
- Power-Reliability Trade-off
- Frequency Underscaling
- Environmental Temperature
• Prior Works
• Summary, Conclusion, and Future Works
16
VCCINT
(mV)
Fmax
(Mhz)
GOPs
(Norm)
Power (W)
Norm)
GOPs/W
(Norm)
GOPs/J
(Norm)
570 333 1 1 1 1
565 300 0.94 0.97 0.97 0.87
560 250 0.83 0.84 0.99 0.75
555 250 0.83 0.78 1.06 0.8
550 250 0.83 0.75 1.1 0.83
545 250 0.83 0.74 1.12 0.84
540 200 0.7 0.56 1.25 0.75
Frequency Underscaling
• Simultaneous frequency underscaling to prevent CNN accuracy collapse in
the critical voltage region
• For each voltage level below 𝑽 𝒎𝒊𝒏, we found the 𝑭 𝒎𝒂𝒙, the maximum
operating frequency at which there is no accuracy loss
• Leads to performance and energy-efficiency loss
Best setting for High-performance and Energy-efficiency Best setting for Power-efficiency
(Voltage steps= 5mV, Frequency steps= 50Mhz)- shown for GoogleNet
17
Outline
• Motivation and Background
• Our Goal
• Methodology
• Results
- Overall Voltage Behavior
- Power-Reliability Trade-off
- Frequency Underscaling
- Environmental Temperature
• Prior Works
• Summary, Conclusion, and Future Works
18
Environmental Temperature
• Effects of environmental temperature on power-reliability
 Use fan speed to test temperature in [34 ℃, 50 ℃]
 On-board temperature monitored by PMBus
• Temperature effects on power consumption
 ↓ 𝑇𝑒𝑚𝑝 → ↓ 𝑃𝑜𝑤𝑒𝑟 (direct relation of power and temp)
 By undervolting, the impact of temperature on power consumption reduces.
• Temperature effects on reliability
 ↓ 𝑇𝑒𝑚𝑝 → ↑ 𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦 𝑙𝑜𝑠𝑠 (indirect relation of reliability and temp)
 In our temperature range, 𝑉 𝑚𝑖𝑛 and 𝑉𝑐𝑟𝑎𝑠ℎdo not change significantly.
GoogleNet
19
Outline
• Motivation and Background
• Our Goal
• Methodology
• Results
- Overall Voltage Behavior
- Power-Reliability Trade-off
- Frequency Underscaling
- Environmental Temperature
• Prior Works
• Summary, Conclusion, and Future Works
20
Prior Works
• Undervolting
 Studies for off-the-shelf real CPUs, GPUs, ASICs, DRAMs
 Large voltage guardband (from 12% to 35%) for many devices
 This work extends such studies for off-the-shelf FPGAs especially for
neural network acceleration and confirms large guardbands (i.e., 33%)
• Power-Efficient Neural Networks
 Studies on architectural-, hardware-, and software-level techniques
 Undervolting in neural network ASIC accelerator (e.g., GreenTPU-DAC’19)
 This work proposes a hardware-level undervolting for further
power-saving (>3X) in FPGAs.
• Reliability in Neural Networks
 Analytical and simulation-based studies (e.g., Thundervolt-DAC’18)
 Some studies on real hardware (e.g., EDEN-MICRO’19)
 This work studies the reliability of neural networks on real FPGAs
when operating at reduced voltage levels.
21
Outline
• Motivation and Background
• Our Goal
• Methodology
• Results
- Overall Voltage Behavior
- Power-Reliability Trade-off
- Frequency Underscaling
- Environmental Temperature
• Prior Works
• Summary, Conclusion, and Future Works
22
Summary, Conclusion, and Future Works
• Summary
 We improve the power-efficiency (>3X) of off-the-shelf
FPGAs via undervolting for neural network accelerators:
 2.6X by eliminating the guardband (i.e., 33%) without any cost
 43% by further undervolting below the guardband with the cost of
 either accuracy loss, when the frequency is not underscaled
 or performance loss, when the frequency is underscaled
• Conclusion
 Undervolting is an effective way to achieve significant
power-saving for FPGA-based neural network accelerators
• Future Works
 HW & SW extension of our undervolting for FPGA clusters
and other neural network models and tools
An Experimental Study of Reduced-Voltage
Operation in Modern FPGAs
for Neural Network Acceleration
50th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 30th June, 2020
Behzad Salami Baturay Onural Ismail Yuksel
Fahrettin Koc Oguz Ergin Adrian Cristal
Osman Unsal Hamid Sarbazi-Azad Onur Mutlu
1 of 23

Recommended

A Brief Survey of Current Power Limiting Strategies by
A Brief Survey of Current Power Limiting StrategiesA Brief Survey of Current Power Limiting Strategies
A Brief Survey of Current Power Limiting StrategiesIRJET Journal
34 views6 slides
Run-time power management in cloud and containerized environments by
Run-time power management in cloud and containerized environmentsRun-time power management in cloud and containerized environments
Run-time power management in cloud and containerized environmentsNECST Lab @ Politecnico di Milano
236 views35 slides
Altera trcak g by
Altera  trcak gAltera  trcak g
Altera trcak gAlona Gradman
938 views41 slides
Low-power Innovative techniques for Wearable Computing by
Low-power Innovative techniques for Wearable ComputingLow-power Innovative techniques for Wearable Computing
Low-power Innovative techniques for Wearable ComputingOmar Elshal
635 views30 slides
12 15022 an adaptive manuscript (word)(edit) by
12 15022 an adaptive manuscript (word)(edit)12 15022 an adaptive manuscript (word)(edit)
12 15022 an adaptive manuscript (word)(edit)nooriasukmaningtyas
22 views8 slides
A0710113 by
A0710113A0710113
A0710113IOSR Journals
566 views13 slides

More Related Content

Similar to An Experimental Study of Reduced-Voltage Operation in Modern FPGAsfor Neural Network Acceleration (full presentation)

LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project by
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGATO project
49 views35 slides
An Experimental Study of Reduced-Voltage Operation in Modern FPGAsfor Neural ... by
An Experimental Study of Reduced-Voltage Operation in Modern FPGAsfor Neural ...An Experimental Study of Reduced-Voltage Operation in Modern FPGAsfor Neural ...
An Experimental Study of Reduced-Voltage Operation in Modern FPGAsfor Neural ...LEGATO project
41 views3 slides
ISCA2021 Tutorial-Methods for Characterization and Analysis of Voltage Margin... by
ISCA2021 Tutorial-Methods for Characterization and Analysis of Voltage Margin...ISCA2021 Tutorial-Methods for Characterization and Analysis of Voltage Margin...
ISCA2021 Tutorial-Methods for Characterization and Analysis of Voltage Margin...Behzad Salami
34 views26 slides
Power Optimization with Efficient Test Logic Partitioning for Full Chip Design by
Power Optimization with Efficient Test Logic Partitioning for Full Chip DesignPower Optimization with Efficient Test Logic Partitioning for Full Chip Design
Power Optimization with Efficient Test Logic Partitioning for Full Chip DesignPankaj Singh
1.2K views19 slides
How to achieve 95%+ Accurate power measurement during architecture exploration? by
How to achieve 95%+ Accurate power measurement during architecture exploration? How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration? Deepak Shankar
27 views37 slides
RT15 Berkeley | Power HIL Simulator (SimP) A prototype to develop a high band... by
RT15 Berkeley | Power HIL Simulator (SimP) A prototype to develop a high band...RT15 Berkeley | Power HIL Simulator (SimP) A prototype to develop a high band...
RT15 Berkeley | Power HIL Simulator (SimP) A prototype to develop a high band...OPAL-RT TECHNOLOGIES
826 views19 slides

Similar to An Experimental Study of Reduced-Voltage Operation in Modern FPGAsfor Neural Network Acceleration (full presentation)(20)

LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project by LEGATO project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGATO project49 views
An Experimental Study of Reduced-Voltage Operation in Modern FPGAsfor Neural ... by LEGATO project
An Experimental Study of Reduced-Voltage Operation in Modern FPGAsfor Neural ...An Experimental Study of Reduced-Voltage Operation in Modern FPGAsfor Neural ...
An Experimental Study of Reduced-Voltage Operation in Modern FPGAsfor Neural ...
LEGATO project41 views
ISCA2021 Tutorial-Methods for Characterization and Analysis of Voltage Margin... by Behzad Salami
ISCA2021 Tutorial-Methods for Characterization and Analysis of Voltage Margin...ISCA2021 Tutorial-Methods for Characterization and Analysis of Voltage Margin...
ISCA2021 Tutorial-Methods for Characterization and Analysis of Voltage Margin...
Behzad Salami34 views
Power Optimization with Efficient Test Logic Partitioning for Full Chip Design by Pankaj Singh
Power Optimization with Efficient Test Logic Partitioning for Full Chip DesignPower Optimization with Efficient Test Logic Partitioning for Full Chip Design
Power Optimization with Efficient Test Logic Partitioning for Full Chip Design
Pankaj Singh1.2K views
How to achieve 95%+ Accurate power measurement during architecture exploration? by Deepak Shankar
How to achieve 95%+ Accurate power measurement during architecture exploration? How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration?
Deepak Shankar27 views
RT15 Berkeley | Power HIL Simulator (SimP) A prototype to develop a high band... by OPAL-RT TECHNOLOGIES
RT15 Berkeley | Power HIL Simulator (SimP) A prototype to develop a high band...RT15 Berkeley | Power HIL Simulator (SimP) A prototype to develop a high band...
RT15 Berkeley | Power HIL Simulator (SimP) A prototype to develop a high band...
Low power in vlsi with upf basics part 1 by SUNODH GARLAPATI
Low power in vlsi with upf basics part 1Low power in vlsi with upf basics part 1
Low power in vlsi with upf basics part 1
SUNODH GARLAPATI1.9K views
PAN Files from IEC 61853-1 Test Data: Why Using Datasheet I-V Values is a Bad... by Kyumin Lee
PAN Files from IEC 61853-1 Test Data: Why Using Datasheet I-V Values is a Bad...PAN Files from IEC 61853-1 Test Data: Why Using Datasheet I-V Values is a Bad...
PAN Files from IEC 61853-1 Test Data: Why Using Datasheet I-V Values is a Bad...
Kyumin Lee597 views
Low-Power Design and Verification by DVClub
Low-Power Design and VerificationLow-Power Design and Verification
Low-Power Design and Verification
DVClub6.3K views
RT15 Berkeley | Optimized Power Flow Control in Microgrids - Sandia Laboratory by OPAL-RT TECHNOLOGIES
RT15 Berkeley | Optimized Power Flow Control in Microgrids - Sandia LaboratoryRT15 Berkeley | Optimized Power Flow Control in Microgrids - Sandia Laboratory
RT15 Berkeley | Optimized Power Flow Control in Microgrids - Sandia Laboratory
참여기관_발표자료-국민대학교 201301 정기회의 by DzH QWuynh
참여기관_발표자료-국민대학교 201301 정기회의참여기관_발표자료-국민대학교 201301 정기회의
참여기관_발표자료-국민대학교 201301 정기회의
DzH QWuynh224 views
Energy Efficient Computing using Dynamic Tuning by inside-BigData.com
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
inside-BigData.com454 views
Per domain power analysis by Arun Joseph
Per domain power analysisPer domain power analysis
Per domain power analysis
Arun Joseph134 views
Understanding the Reliability and Power-Efficiency Trade-offs of Modern FPGAs... by Behzad Salami
Understanding the Reliability and Power-Efficiency Trade-offs of Modern FPGAs...Understanding the Reliability and Power-Efficiency Trade-offs of Modern FPGAs...
Understanding the Reliability and Power-Efficiency Trade-offs of Modern FPGAs...
Behzad Salami74 views

More from LEGATO project

Scrooge Attack: Undervolting ARM Processors for Profit by
Scrooge Attack: Undervolting ARM Processors for ProfitScrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for ProfitLEGATO project
75 views15 slides
A practical approach for updating an integrity-enforced operating system by
A practical approach for updating an integrity-enforced operating systemA practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating systemLEGATO project
90 views65 slides
TEEMon: A continuous performance monitoring framework for TEEs by
TEEMon: A continuous performance monitoring framework for TEEsTEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEsLEGATO project
57 views28 slides
secureTF: A Secure TensorFlow Framework by
secureTF: A Secure TensorFlow FrameworksecureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow FrameworkLEGATO project
94 views10 slides
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep... by
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...LEGATO project
37 views13 slides
LEGaTO: Machine Learning Use Case by
LEGaTO: Machine Learning Use CaseLEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use CaseLEGATO project
131 views18 slides

More from LEGATO project(20)

Scrooge Attack: Undervolting ARM Processors for Profit by LEGATO project
Scrooge Attack: Undervolting ARM Processors for ProfitScrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for Profit
LEGATO project75 views
A practical approach for updating an integrity-enforced operating system by LEGATO project
A practical approach for updating an integrity-enforced operating systemA practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating system
LEGATO project90 views
TEEMon: A continuous performance monitoring framework for TEEs by LEGATO project
TEEMon: A continuous performance monitoring framework for TEEsTEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEs
LEGATO project57 views
secureTF: A Secure TensorFlow Framework by LEGATO project
secureTF: A Secure TensorFlow FrameworksecureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow Framework
LEGATO project94 views
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep... by LEGATO project
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
LEGATO project37 views
LEGaTO: Machine Learning Use Case by LEGATO project
LEGaTO: Machine Learning Use CaseLEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use Case
LEGATO project131 views
LEGaTO: Software Stack Programming Models by LEGATO project
LEGaTO: Software Stack Programming ModelsLEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming Models
LEGATO project56 views
LEGaTO: Software Stack Runtimes by LEGATO project
LEGaTO: Software Stack RuntimesLEGaTO: Software Stack Runtimes
LEGaTO: Software Stack Runtimes
LEGATO project62 views
LEGaTO: Low-Energy Heterogeneous Computing Workshop by LEGATO project
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGATO project50 views
TZ4Fabric: Executing Smart Contracts with ARM TrustZone by LEGATO project
TZ4Fabric: Executing Smart Contracts with ARM TrustZoneTZ4Fabric: Executing Smart Contracts with ARM TrustZone
TZ4Fabric: Executing Smart Contracts with ARM TrustZone
LEGATO project49 views
Infection Research with Maxeler Dataflow Computing by LEGATO project
Infection Research with Maxeler Dataflow ComputingInfection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow Computing
LEGATO project149 views
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency by LEGATO project
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-ResiliencyFPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
LEGATO project57 views
Device Data Directory and Asynchronous execution: A path to heterogeneous com... by LEGATO project
Device Data Directory and Asynchronous execution: A path to heterogeneous com...Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
LEGATO project46 views
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments by LEGATO project
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
LEGATO project38 views
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing by LEGATO project
RECS – Cloud to Edge Microserver Platform for Energy-Efficient ComputingRECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
LEGATO project71 views

Recently uploaded

ALGAL PRODUCTS.pptx by
ALGAL PRODUCTS.pptxALGAL PRODUCTS.pptx
ALGAL PRODUCTS.pptxRASHMI M G
7 views17 slides
Paper Chromatography or Paper partition chromatography by
Paper Chromatography or Paper partition chromatographyPaper Chromatography or Paper partition chromatography
Paper Chromatography or Paper partition chromatographyPoonam Aher Patil
7 views34 slides
XUE: Molecular Inventory in the Inner Region of an Extremely Irradiated Proto... by
XUE: Molecular Inventory in the Inner Region of an Extremely Irradiated Proto...XUE: Molecular Inventory in the Inner Region of an Extremely Irradiated Proto...
XUE: Molecular Inventory in the Inner Region of an Extremely Irradiated Proto...Sérgio Sacani
1.1K views14 slides
Krishna VSC 692 Credit Seminar.pptx by
Krishna VSC 692 Credit Seminar.pptxKrishna VSC 692 Credit Seminar.pptx
Krishna VSC 692 Credit Seminar.pptxKrishnaSharma682993
13 views54 slides
NU-543 Class II Type A2 Biosafety Cabinet by
NU-543 Class II Type A2 Biosafety CabinetNU-543 Class II Type A2 Biosafety Cabinet
NU-543 Class II Type A2 Biosafety CabinetGaia Science Pte Ltd
5 views1 slide
Determination of color fastness to rubbing(wet and dry condition) by crockmeter. by
Determination of color fastness to rubbing(wet and dry condition) by crockmeter.Determination of color fastness to rubbing(wet and dry condition) by crockmeter.
Determination of color fastness to rubbing(wet and dry condition) by crockmeter.ShadmanSakib63
9 views6 slides

Recently uploaded(20)

Paper Chromatography or Paper partition chromatography by Poonam Aher Patil
Paper Chromatography or Paper partition chromatographyPaper Chromatography or Paper partition chromatography
Paper Chromatography or Paper partition chromatography
XUE: Molecular Inventory in the Inner Region of an Extremely Irradiated Proto... by Sérgio Sacani
XUE: Molecular Inventory in the Inner Region of an Extremely Irradiated Proto...XUE: Molecular Inventory in the Inner Region of an Extremely Irradiated Proto...
XUE: Molecular Inventory in the Inner Region of an Extremely Irradiated Proto...
Sérgio Sacani1.1K views
Determination of color fastness to rubbing(wet and dry condition) by crockmeter. by ShadmanSakib63
Determination of color fastness to rubbing(wet and dry condition) by crockmeter.Determination of color fastness to rubbing(wet and dry condition) by crockmeter.
Determination of color fastness to rubbing(wet and dry condition) by crockmeter.
ShadmanSakib639 views
Micelle Drug Delivery System (Nanotechnology).pptx by ANANYA KUMAR
Micelle Drug Delivery System (Nanotechnology).pptxMicelle Drug Delivery System (Nanotechnology).pptx
Micelle Drug Delivery System (Nanotechnology).pptx
ANANYA KUMAR5 views
Oral_Presentation_by_Fatma (2).pdf by fatmaalmrzqi
Oral_Presentation_by_Fatma (2).pdfOral_Presentation_by_Fatma (2).pdf
Oral_Presentation_by_Fatma (2).pdf
fatmaalmrzqi8 views
INTRODUCTION TO PLANT SYSTEMATICS.pptx by RASHMI M G
INTRODUCTION TO PLANT SYSTEMATICS.pptxINTRODUCTION TO PLANT SYSTEMATICS.pptx
INTRODUCTION TO PLANT SYSTEMATICS.pptx
RASHMI M G 5 views
Exploring_The_Unthinkable_ Franco Gollo.pdf by draconox80
Exploring_The_Unthinkable_ Franco Gollo.pdfExploring_The_Unthinkable_ Franco Gollo.pdf
Exploring_The_Unthinkable_ Franco Gollo.pdf
draconox806 views
Presentation on experimental laboratory animal- Hamster by Kanika13641
Presentation on experimental laboratory animal- HamsterPresentation on experimental laboratory animal- Hamster
Presentation on experimental laboratory animal- Hamster
Kanika136418 views
Note on the Riemann Hypothesis by vegafrank2
Note on the Riemann HypothesisNote on the Riemann Hypothesis
Note on the Riemann Hypothesis
vegafrank29 views
RADIATION PHYSICS.pptx by drpriyanka8
RADIATION PHYSICS.pptxRADIATION PHYSICS.pptx
RADIATION PHYSICS.pptx
drpriyanka815 views
Towards Error-Corrected Quantum Computing with Neutral Atoms by Yuval Boger
Towards Error-Corrected Quantum Computing with Neutral AtomsTowards Error-Corrected Quantum Computing with Neutral Atoms
Towards Error-Corrected Quantum Computing with Neutral Atoms
Yuval Boger5 views
Vegetable grafting: A new crop improvement approach.pptx by Himul Suthar
Vegetable grafting: A new crop improvement approach.pptxVegetable grafting: A new crop improvement approach.pptx
Vegetable grafting: A new crop improvement approach.pptx
Himul Suthar11 views
Generative AI to Accelerate Discovery of Materials by Deakin University
Generative AI to Accelerate Discovery of MaterialsGenerative AI to Accelerate Discovery of Materials
Generative AI to Accelerate Discovery of Materials
2. Natural Sciences and Technology Author Siyavula.pdf by ssuser821efa
2. Natural Sciences and Technology Author Siyavula.pdf2. Natural Sciences and Technology Author Siyavula.pdf
2. Natural Sciences and Technology Author Siyavula.pdf
ssuser821efa13 views

An Experimental Study of Reduced-Voltage Operation in Modern FPGAsfor Neural Network Acceleration (full presentation)

  • 1. An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration 50th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 30th June, 2020 Behzad Salami Baturay Onural Ismail Yuksel Fahrettin Koc Oguz Ergin Adrian Cristal Osman Unsal Hamid Sarbazi-Azad Onur Mutlu
  • 2. 2 Executive Summary • Motivation: Power consumption of neural networks is a main concern  Hardware acceleration: GPUs, FPGAs, and ASICs • Problem: FPGAs are at least 10X less power-efficient than equivalent ASICs • Goal: Bridge the power-efficiency gap between ASIC- and FPGA-based neural networks by Undervolting below nominal level • Evaluation Setup  5 Image classification workloads  3 Xilinx UltraScale+ ZCU102 platforms  2 On-chip voltage rails • Main Results  Large voltage guardband (i.e., 33%)  >3X power-efficiency gain
  • 3. 3 Outline • Motivation and Background • Our Goal • Methodology • Results - Overall Voltage Behavior - Power-Reliability Trade-off - Frequency Underscaling - Environmental Temperature • Prior Works • Summary, Conclusion, and Future Works
  • 4. 4 Outline • Motivation and Background • Our Goal • Methodology • Results - Overall Voltage Behavior - Power-Reliability Trade-off - Frequency Underscaling - Environmental Temperature • Prior Works • Summary, Conclusion, and Future Works
  • 5. 5 Motivation and Background • Motivation  Power consumption of neural networks is a main concern  Hardware acceleration: GPUs, FPGAs, and ASICs  FPGAs: Getting popular but less power-efficient than equivalent ASICs  Large voltage guardbands (12-35%) for CPUs, GPUs, DRAMs  Any potential of “Undervolting FPGAs” for power-efficiency of neural networks? • Background  Neural Networks: Widely deployed with an inherent resilience to errors  FPGAs: Higher throughput than GPUs and better flexibility than ASICs  Undervolting: Reduces power cons., may incur reliability or performance issues
  • 6. 6 Outline • Motivation and Background • Our Goal • Methodology • Results - Overall Voltage Behavior - Power-Reliability Trade-off - Frequency Underscaling - Environmental Temperature • Prior Works • Summary, Conclusion, and Future Works
  • 7. 7 Our Goal • Primary Goal  Bridge the power-efficiency gap between ASIC- and FPGA-based neural networks by:  Undervolting (i.e., underscaling voltage below nominal level) • Secondary Goals  Study the voltage behavior of real FPGAs (e.g., guardband)  Study the power-efficiency gain of undervolting for neural networks  Study the reliability overhead  Study the frequency underscaling to prevent the accuracy loss  Study the effect of environmental temperature
  • 8. 8 Outline • Motivation and Background • Our Goal • Methodology • Results - Overall Voltage Behavior - Power-Reliability Trade-off - Frequency Underscaling - Environmental Temperature • Prior Works • Summary, Conclusion, and Future Works
  • 9. 9 Overall Methodology • 5 CNN image classification workloads, i.e., VGGNet, GoogleNet, AlexNet, ResNet50, Inception. • Xilinx DNNDK to map CNN into FPGA  By default optimized for INT8 • 3 identical samples of Xilinx ZCU102  ZYNQ Ultrscale+ architecture  Hard-core ARM for data orchestration  FPGA for CNN acceleration • 2 on-chip voltage rails, via PMBus  𝑉𝐶𝐶𝐼𝑁𝑇: DSPs, LUTs, buffers, …  𝑉𝐶𝐶𝐵𝑅𝐴𝑀: BRAMs  𝑉𝑛𝑜𝑚= 850mV (set by manufacturer) Vast majority (>99.9%) of the power is dissipated on 𝑉𝐶𝐶𝐼𝑁𝑇
  • 10. 10 Outline • Motivation and Background • Our Goal • Methodology • Results - Overall Voltage Behavior - Power-Reliability Trade-off - Frequency Underscaling - Environmental Temperature • Prior Works • Summary, Conclusion, and Future Works
  • 11. 11 Outline • Motivation and Background • Our Goal • Methodology • Results - Overall Voltage Behavior - Power-Reliability Trade-off - Frequency Underscaling - Environmental Temperature • Prior Works • Summary, Conclusion, and Future Works
  • 12. 12 Overall Voltage Behavior Slight variation of voltage behavior across platforms and benchmarks  FPGA stops operatingCrash • Guardband: Large region below nominal level (𝑽 𝒏𝒐𝒎 = 𝟖𝟓𝟎𝒎𝑽) • Critical: Narrower region below guardband (𝑽 𝒎𝒊𝒏 = 𝟓𝟕𝟎𝒎𝑽) • Crash: FPGA crashes below critical region (𝑽 𝒄𝒓𝒂𝒔𝒉 = 𝟓𝟒𝟎𝒎𝑽)  No performance or reliability loss  Added by the vendor to ensure the worst-case conditions  Large guardband, average of 33% Guard band  A narrow voltage region  Neural network accuracy collapse Critical
  • 13. 13 Outline • Motivation and Background • Our Goal • Methodology • Results - Overall Voltage Behavior - Power-Reliability Trade-off - Frequency Underscaling - Environmental Temperature • Prior Works • Summary, Conclusion, and Future Works
  • 14. 14 Power-Reliability Trade-off Power-efficiency (GOPs/W) gain • >3X power saving (2.6X by eliminating guardband and further 43% in critical region) Reliability overhead (i.e., CNN accuracy loss) VGGNet GoogleNet AlexNet ResNet Inception • Slight variation across 3 platforms and 5 workloads • No accuracy loss in the guardband, accuracy collapse in the critical region • Slight variation across 3 platforms and 5 workloads
  • 15. 15 Outline • Motivation and Background • Our Goal • Methodology • Results - Overall Voltage Behavior - Power-Reliability Trade-off - Frequency Underscaling - Environmental Temperature • Prior Works • Summary, Conclusion, and Future Works
  • 16. 16 VCCINT (mV) Fmax (Mhz) GOPs (Norm) Power (W) Norm) GOPs/W (Norm) GOPs/J (Norm) 570 333 1 1 1 1 565 300 0.94 0.97 0.97 0.87 560 250 0.83 0.84 0.99 0.75 555 250 0.83 0.78 1.06 0.8 550 250 0.83 0.75 1.1 0.83 545 250 0.83 0.74 1.12 0.84 540 200 0.7 0.56 1.25 0.75 Frequency Underscaling • Simultaneous frequency underscaling to prevent CNN accuracy collapse in the critical voltage region • For each voltage level below 𝑽 𝒎𝒊𝒏, we found the 𝑭 𝒎𝒂𝒙, the maximum operating frequency at which there is no accuracy loss • Leads to performance and energy-efficiency loss Best setting for High-performance and Energy-efficiency Best setting for Power-efficiency (Voltage steps= 5mV, Frequency steps= 50Mhz)- shown for GoogleNet
  • 17. 17 Outline • Motivation and Background • Our Goal • Methodology • Results - Overall Voltage Behavior - Power-Reliability Trade-off - Frequency Underscaling - Environmental Temperature • Prior Works • Summary, Conclusion, and Future Works
  • 18. 18 Environmental Temperature • Effects of environmental temperature on power-reliability  Use fan speed to test temperature in [34 ℃, 50 ℃]  On-board temperature monitored by PMBus • Temperature effects on power consumption  ↓ 𝑇𝑒𝑚𝑝 → ↓ 𝑃𝑜𝑤𝑒𝑟 (direct relation of power and temp)  By undervolting, the impact of temperature on power consumption reduces. • Temperature effects on reliability  ↓ 𝑇𝑒𝑚𝑝 → ↑ 𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦 𝑙𝑜𝑠𝑠 (indirect relation of reliability and temp)  In our temperature range, 𝑉 𝑚𝑖𝑛 and 𝑉𝑐𝑟𝑎𝑠ℎdo not change significantly. GoogleNet
  • 19. 19 Outline • Motivation and Background • Our Goal • Methodology • Results - Overall Voltage Behavior - Power-Reliability Trade-off - Frequency Underscaling - Environmental Temperature • Prior Works • Summary, Conclusion, and Future Works
  • 20. 20 Prior Works • Undervolting  Studies for off-the-shelf real CPUs, GPUs, ASICs, DRAMs  Large voltage guardband (from 12% to 35%) for many devices  This work extends such studies for off-the-shelf FPGAs especially for neural network acceleration and confirms large guardbands (i.e., 33%) • Power-Efficient Neural Networks  Studies on architectural-, hardware-, and software-level techniques  Undervolting in neural network ASIC accelerator (e.g., GreenTPU-DAC’19)  This work proposes a hardware-level undervolting for further power-saving (>3X) in FPGAs. • Reliability in Neural Networks  Analytical and simulation-based studies (e.g., Thundervolt-DAC’18)  Some studies on real hardware (e.g., EDEN-MICRO’19)  This work studies the reliability of neural networks on real FPGAs when operating at reduced voltage levels.
  • 21. 21 Outline • Motivation and Background • Our Goal • Methodology • Results - Overall Voltage Behavior - Power-Reliability Trade-off - Frequency Underscaling - Environmental Temperature • Prior Works • Summary, Conclusion, and Future Works
  • 22. 22 Summary, Conclusion, and Future Works • Summary  We improve the power-efficiency (>3X) of off-the-shelf FPGAs via undervolting for neural network accelerators:  2.6X by eliminating the guardband (i.e., 33%) without any cost  43% by further undervolting below the guardband with the cost of  either accuracy loss, when the frequency is not underscaled  or performance loss, when the frequency is underscaled • Conclusion  Undervolting is an effective way to achieve significant power-saving for FPGA-based neural network accelerators • Future Works  HW & SW extension of our undervolting for FPGA clusters and other neural network models and tools
  • 23. An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration 50th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 30th June, 2020 Behzad Salami Baturay Onural Ismail Yuksel Fahrettin Koc Oguz Ergin Adrian Cristal Osman Unsal Hamid Sarbazi-Azad Onur Mutlu