Comprehensive evaluation of supply voltage underscaling in FPGA on chip memories

In this work, we evaluate aggressive undervolting, i.e., voltage scaling below the nominal level, to reduce the energy consumption of Field Programmable Gate Arrays (FPGAs). Chip vendors usually add voltage guardbands to ensure correct operation under worst-case process and environmental scenarios. Through experiments on several FPGA architectures, we measure this voltage guardband to be on average 39% of the nominal level, which in turn delivers more than an order of magnitude power savings. However, further undervolting below the voltage guardband may cause reliability issues as a result of the increased circuit delay, i.e., faults start to appear. We extensively characterize the behavior of these faults in terms of rate, location, type, and sensitivity to environmental temperature, with a focus on on-chip memories, or Block RAMs (BRAMs). Finally, we evaluate a typical FPGA-based Neural Network (NN) accelerator under low-voltage BRAM operation. There, the substantial NN energy savings come at the cost of NN accuracy loss. To attain power savings without NN accuracy loss, we propose a novel technique that relies on the deterministic behavior of undervolting faults and can limit the accuracy loss to 0.1% without any timing-slack overhead.

Published in: Science

  1. 1. Comprehensive Evaluation of Supply Voltage Underscaling in FPGA On-chip Memories
     Behzad Salami, Osman Unsal, and Adrian Cristal, Barcelona Supercomputing Center (BSC), www.bsc.es
     The 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Fukuoka City, Japan.
  2. 2. Aggressive Undervolting
     • Aggressive undervolting: underscaling the supply voltage below the nominal and safe level.
       • Power/energy efficiency: reduces dynamic and static power quadratically and linearly, respectively.
       • Reliability: increases the circuit delay and, in turn, causes timing faults.
     • Dual/Multi-Vdd, DVS, and DVFS: related but different mechanisms.
       • Similarity: they also underscale the supply voltage.
       • Difference: they scale the voltage only down to a certain safe level, usually constrained by vendors.
     [Figure: trade-off between reliability and power/energy efficiency.]
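
The scaling rule on this slide (dynamic power quadratic in voltage, static power linear) can be sketched as a first-order model. The function name and the `dynamic_share` parameter (the fraction of nominal power that is dynamic) are assumptions introduced for illustration; the model is not fitted to the measurements reported later in the deck.

```python
def relative_power(v, v_nom=1.0, dynamic_share=0.5):
    """Relative total power at supply voltage v, normalized to 1.0 at v_nom.

    First-order model only: dynamic power scales with V^2 and static power
    with V, as stated on the slide. dynamic_share is a hypothetical split of
    nominal power between dynamic and static, not a value from the deck.
    """
    if not 0.0 <= dynamic_share <= 1.0:
        raise ValueError("dynamic_share must be between 0 and 1")
    scale = v / v_nom
    return dynamic_share * scale ** 2 + (1.0 - dynamic_share) * scale


if __name__ == "__main__":
    for voltage in (1.0, 0.8, 0.61):
        print(f"V = {voltage:.2f} V -> relative power {relative_power(voltage):.3f}")
```

Because the static term falls only linearly, the split between the two components determines how much of the quadratic saving is actually visible in total power.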
  3. 3. State-of-the-art
     1. Real hardware: aggressive undervolting has proven effective at reducing energy consumption.
        • Devices:
          • CPUs: Itanium II (ISCA2014), X86 (IOLTS2017), ARM (HPCA2017)
          • GPUs: NVidia (Micro2015)
          • DRAMs: Multiple Brands (Sigmetrics2017)
          • FPGA: This work
        • Focus of the previous works:
          • Voltage guardband
          • Minimum safe voltage, i.e., Vmin prediction
          • Fault characterization and mitigation
          • Chip-to-chip, core-to-core, and workload-to-workload variation
          • ….
     2. Simulation-based studies: more straightforward and cover more parameters, but are less precise.
        • ASIC DNN: Minerva (Micro2016), Thundervolt (DAC2018)
        • CPU: Bravo (HPCA2017)
        • Network On-Chip (HPCA2014)
  4. 4. Undervolting on FPGAs: Motivation
     • The contribution of FPGAs in large data centers is growing; they are expected to be in 30% of datacenter servers by 2020 (Top500 news).
     • Compared to ASICs, the energy efficiency of FPGAs is a serious concern: they are 10X-100X less efficient.
     • The nominal voltage of FPGAs is naturally reduced from one generation to the next.
     [Figure: panels labeled Intel/Altera and Xilinx; annotation: Undervolting.]
  5. 5. Contributions
     1. Undervolting FPGAs
        • Voltage guardband
        • Overall power and reliability trade-off
     2. Fault characterization in FPGA on-chip memories
        • Fault type, location, and rate
        • Temperature, chip
     3. Low-voltage FPGA-based Neural Network (NN)
        • Power consumption and NN accuracy characterization
        • Fault mitigation techniques
          • Application-aware technique
          • Built-in ECC
  6. 6. Voltage Scaling Capability in Xilinx
     • Evaluated Xilinx platforms:
       • VC707: performance-efficient design
       • KC705 (A & B): power-efficient design
       • ZC702: ARM integrated with FPGA
     • Voltage regulator:
       • Power Management Bus (PMBus).
       • Hardwired to the host.
     [Figure: voltage distribution on Xilinx platforms (VC707), showing the VCCINT and VCCBRAM rails.]
  7. 7. Overall Voltage Behavior
     • SAFE: no observable fault; this is the voltage guardband below Vnom.
     • CRITICAL: faults manifest below Vmin, the minimum safe voltage.
     • CRASH: the FPGA stops operating below Vcrash, the minimum operating voltage.
     • Voltage guardband: added to ensure correct operation under worst-case environmental and process conditions.
     • Experimental conditions: ambient temperature and maximum operating frequency.
     • We performed more detailed studies on FPGA on-chip memories (BRAMs).
     [Figure: two bar charts, VCCBRAM (V) and VCCINT (V) per platform (VC707, ZC702, KC705-A, KC705-B), each bar divided into SAFE (guardband), CRITICAL, and CRASH regions delimited by Vnom, Vmin, and Vcrash; voltage axis from 0 to 1 V.]
  8. 8. Experimental Methodology
     • FPGA BRAMs:
       • A hierarchy of sets of bit-cells distributed over the chip.
       • Size of each BRAM: 16 kbits.
     • Experimental methodology:
       • HW: transfer the content of the BRAMs to the host.
       • SW: analyze the data and adjust the voltage of the BRAMs.
     [Figure: floorplan of VC707 (2060 BRAMs).]
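
The software side of this methodology (compare what was written to the BRAMs with what the host reads back) might look like the sketch below. The function names, the use of integer words, and the choice of 1 Mbit = 2**20 bits are assumptions made for illustration; the deck does not show the authors' analysis code.

```python
BITS_PER_MBIT = 2 ** 20  # assumption: the deck does not define "1 Mbit" precisely


def count_bit_faults(written, read_back):
    """Count bit positions that differ between written and read-back words."""
    if len(written) != len(read_back):
        raise ValueError("written and read_back must have the same length")
    # XOR leaves a 1 at every flipped bit; count those ones per word.
    return sum(bin(w ^ r).count("1") for w, r in zip(written, read_back))


def fault_rate_per_mbit(faulty_bits, total_bits):
    """Normalize a fault count to faults per 1 Mbit of memory."""
    if total_bits <= 0:
        raise ValueError("total_bits must be positive")
    return faulty_bits * BITS_PER_MBIT / total_bits


if __name__ == "__main__":
    # Hypothetical 16-bit words: two bits flipped during a low-voltage read.
    written = [0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF]
    read_back = [0xFFFF, 0xFFFE, 0xFFFF, 0x7FFF]
    faults = count_bit_faults(written, read_back)
    print(faults, fault_rate_per_mbit(faults, total_bits=16 * len(written)))
```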
  9. 9. Overall Trade-offs on BRAMs: Power & Reliability
     [Figure (VC707): BRAM power (Watts) and fault rate (per 1 Mbit) versus VCCBRAM (V), swept from 1 V down to 0.55 V, with an inset for 0.61 V to 0.54 V. Annotations: Vnom = 1 V, Vmin = 0.61 V, Vcrash = 0.54 V.]
  10. 10. Overall Trade-offs on BRAMs: Multiple Platforms
      [Figure: four panels, each plotting BRAM power and fault rate (per 1 Mbit) versus VCCBRAM (V) from 1 V down to 0.55 V, with an inset for the region between Vmin and Vcrash.]
      • VC707 (power in Watts): Vnom = 1 V, Vmin = 0.61 V, Vcrash = 0.54 V
      • ZC702 (power in mWatts): Vnom = 1 V, Vmin = 0.59 V, Vcrash = 0.53 V
      • KC705-A (power in Watts): Vnom = 1 V, Vmin = 0.61 V, Vcrash = 0.54 V
      • KC705-B (power in Watts): Vnom = 1 V, Vmin = 0.57 V, Vcrash = 0.54 V
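
The guardband on each platform follows from the Vnom and Vmin annotations on these plots as (Vnom - Vmin) / Vnom. The sketch below uses the VCCBRAM values as read from the plot annotations; the dictionary and function names are illustrative. Since it covers only the VCCBRAM rail, its mean need not equal the 39% average quoted in the abstract.

```python
V_NOM = 1.0  # nominal VCCBRAM in volts, from the plot annotations

# Vmin per platform for the VCCBRAM rail, as read from the plot annotations.
V_MIN_VCCBRAM = {
    "VC707": 0.61,
    "ZC702": 0.59,
    "KC705-A": 0.61,
    "KC705-B": 0.57,
}


def guardband_fraction(v_nom, v_min):
    """Voltage guardband as a fraction of the nominal voltage."""
    if v_min > v_nom:
        raise ValueError("v_min must not exceed v_nom")
    return (v_nom - v_min) / v_nom


if __name__ == "__main__":
    for platform, v_min in V_MIN_VCCBRAM.items():
        print(f"{platform}: {guardband_fraction(V_NOM, v_min):.0%}")
```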
  11. 11. Key Points of the First Contribution
      • Voltage regions: Safe, Critical, and Crash voltage regions exist for all platforms and differ slightly among the studied platforms.
      • Voltage guardbands: a large voltage guardband is confirmed for all platforms on the studied voltage rails, i.e., VCCBRAM and VCCINT.
      • Power reduction: aggressive undervolting yields a significant power reduction, studied in more detail for BRAMs.
      • Reliability costs: fault rates increase exponentially in the Critical voltage region.
  12. 12. Contributions
      1. Undervolting FPGAs
         • Voltage guardband
         • Overall power and reliability trade-off
      2. Fault characterization in FPGA on-chip memories
         • Fault type, location, and rate
         • Temperature, chip
      3. Low-voltage FPGA-based Neural Network (NN)
         • Power consumption and NN accuracy characterization
         • Fault mitigation techniques
           • Application-aware technique
           • Built-in ECC
  13. 13. Fault Characterization at CRITICAL Region
      • Fault variability among FPGA BRAMs: fully non-uniform fault distribution.
      • The majority of BRAMs do not experience many faults.
      • Setup: VC707 (2060 BRAMs), VCCBRAM at Vcrash = 0.54 V, ambient temperature.
      • K-means clustering of BRAMs by fault rate:

        Class              % of BRAMs   Average fault rate (%)
        High-vulnerable    1.8%         0.86%
        Mid-vulnerable     9.4%         0.24%
        Low-vulnerable     52.3%        0.03%
        Zero-vulnerable    36.3%        0.0%

      [Figure: distribution of BRAM fault rate (%), axis from 0.0% to 1.5%.]
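
The slide groups BRAMs into four vulnerability classes with K-means clustering. A minimal one-dimensional K-means over per-BRAM fault rates could be sketched as follows; the function name, the deterministic centroid initialization, and the sample fault rates are assumptions for illustration, not the authors' setup or data.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster scalar values into k groups; returns (centroids, labels)."""
    if k < 2 or k > len(values):
        raise ValueError("k must be between 2 and the number of values")
    ordered = sorted(values)
    # Deterministic initialization: spread centroids evenly over the sorted data.
    centroids = [ordered[(i * (len(ordered) - 1)) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Recompute each centroid as the mean of its members (keep it if empty).
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    labels = [min(range(k), key=lambda i: abs(v - centroids[i])) for v in values]
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical per-BRAM fault rates in percent.
    rates = [0.0, 0.0, 0.01, 0.03, 0.04, 0.2, 0.25, 0.3, 0.8, 0.9]
    centroids, labels = kmeans_1d(rates, k=4)
    print(centroids, labels)
```

With sorted-data initialization the centroids come out in ascending order, so label 0 corresponds to the least vulnerable class and label k-1 to the most vulnerable one.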
  14. 14. Fault Characterization at CRITICAL Region
      • Type of undervolting faults: permanent faults at a specific voltage.
        • There is no considerable change in the rate and location of faults over time.
        • Validated by repeating the experiments 100 times.
      • The physical location of BRAMs is extracted using Vivado.
      • Fault Variation Map (FVM): the fault rate mapped to the physical location of BRAMs.
      • The FVM can potentially be used in fault mitigation techniques.
      [Figure: FVM at VCCBRAM = Vcrash, T = ambient, chip = VC707; FPGA x-axis versus FPGA y-axis, colored by BRAM fault rate (%).]
      [Figure: fault rate (per 1 Mbit) over 100 runs, showing individual runs and the cumulative median.]
      • Key observations discussed:
        1. The fault rate increases exponentially with further undervolting.
        2. BRAMs differ widely in their reliability under undervolting.
        3. The fault rate and location are deterministic over time.
      • Three parameters independently have a significant impact on the rate and location of faults:
        1. Voltage
        2. Temperature
        3. Chip
  15. 15. Fault Characterization (Voltage Impacts)
      • Location of undervolting faults: Fault Inclusion Property (FIP).
        • FIP: a bit that is corrupted at a specific voltage stays faulty at lower voltages as well.
        • FIP can be used in mitigation techniques.
      [Figure: illustration of FIP.]
      [Figure: FIP shown as fault rate for VC707; fault rate (per 1 Mbit, log scale from 0.1 to 10000) versus VCCBRAM (V) from 0.61 V to 0.54 V.]
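
The Fault Inclusion Property can be checked directly on measured fault locations: the set of faulty bits at each voltage must be a subset of the set at every lower voltage. The sketch below assumes fault locations are stored as sets of hypothetical (BRAM index, bit index) pairs keyed by voltage; that representation is an illustration, not the authors' data format.

```python
def satisfies_fip(faults_by_voltage):
    """Return True if faulty bits at each voltage persist at all lower voltages.

    faults_by_voltage maps a supply voltage (V) to the set of faulty bit
    locations observed at that voltage.
    """
    voltages = sorted(faults_by_voltage, reverse=True)  # highest voltage first
    # Checking adjacent pairs is enough because the subset relation is transitive.
    return all(
        faults_by_voltage[higher] <= faults_by_voltage[lower]
        for higher, lower in zip(voltages, voltages[1:])
    )


if __name__ == "__main__":
    # Hypothetical fault locations as (BRAM index, bit index) pairs.
    observed = {
        0.60: {(3, 17)},
        0.58: {(3, 17), (9, 2)},
        0.56: {(3, 17), (9, 2), (40, 5)},
    }
    print(satisfies_fip(observed))
```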
  16. 16. Fault Characterization (Temperature Impacts)
      • Methodology: adjusting the environmental temperature and monitoring the on-board temperature via PMBus.
      • Experimental observation:
        • At higher temperatures, the fault rate is significantly reduced.
        • Inverse Temperature Dependency (ITD): for nano-scale technology nodes under ultra-low-voltage operation, the circuit delay decreases at higher temperatures because the supply voltage approaches the threshold voltage.
      [Figure: practical confirmation of ITD; x-axis: VCCBRAM (V), y-axis: fault rate (per 1 Mbit); curves labeled 50, 60, 70, and 80.]
      (1) Neshatpour, K., Burleson, W., Khajeh, A., & Homayoun, H. (2018). Enhancing Power, Performance, and Energy Efficiency in Chip Multiprocessors Exploiting Inverse Thermal Dependence. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, (4), 778-791.
  17. 17. Fault Characterization (Chip Impacts)
      • Methodology: repeating the experiments on two identical samples of KC705 (A & B).
      • Observations:
        • Fault rates vary significantly, by more than 4X.
        • Fault Variation Maps (FVMs) are entirely different.
      • Even identical samples of the same chip have totally different reliability behavior, due to process variation and aging effects.
      [Figure (KC705-A): fault locations at VCCBRAM = Vcrash; fault rate (per 1 Mbit, 0 to 300) versus VCCBRAM (V) from 0.61 V to 0.54 V.]
      [Figure (KC705-B): fault locations at VCCBRAM = Vcrash; fault rate (per 1 Mbit, 0 to 300) versus VCCBRAM (V) from 0.57 V to 0.54 V.]
  18. 18. Key Points of the Second Contribution
      • Fault rate: the fault rate increases exponentially with further undervolting.
      • Non-uniform fault distribution among BRAMs: BRAMs are not equally sensitive to undervolting.
      • Deterministic behavior of faults: the location of faults does not change over time for a given voltage, temperature, and chip.
      • Reliability behavior over different voltage levels: the Fault Inclusion Property (FIP) holds.
      • Environmental temperature: at higher temperatures, FPGA BRAMs show better reliability, i.e., a lower fault rate.
      • Reliability differences between chips: even identical chips show entirely different reliability behavior.
  19. 19. Contributions
      1. Undervolting FPGAs
         • Voltage guardband
         • Overall power and reliability trade-off
      2. Fault characterization in FPGA on-chip memories
         • Fault type, location, and rate
         • Temperature, chip
      3. Low-voltage FPGA-based Neural Network (NN)
         • Power consumption and NN accuracy characterization
         • Fault mitigation techniques
           • Application-aware technique
           • Built-in ECC
  20. 20. Experimental Methodology
      Neural Network (NN)
        Type: fully-connected classifier
        Total number of weights: ~1.5 million
        Activation function: logsig (logarithmic sigmoid)
      Major benchmark
        Name and type: MNIST, handwritten digit images
        Number of images: training: 60000, classification: 10000
        Number of pixels per image: 28*28 = 784
        Number of output classes: 10
      Additional benchmarks
        Names: Forest and Reuters
      Data representation model
        Type: 16-bit fixed-point
        Precision: minimum sign and digit per layer
      An example implementation on VC707
        Frequency: 100 MHz
        BRAM usage (total: 2060): 70.8%
  21. 21. NN Implementation on FPGA
      • Input data: off-chip DDR memory.
      • Weights: on-chip FPGA BRAM.
      • Computation: streaming data onto DSPs and LUTs.
      • We undervolt VCCBRAM:
        • The weights of the NN are potentially affected.
      [Figure: FPGA implementation.]
  22. 22. Low-Voltage FPGA-based NN
      • Power saving:
        • Significant power reduction down to the minimum safe voltage, Vmin, by eliminating the voltage guardband.
        • An additional 40% power reduction below the voltage guardband.
      • NN accuracy loss:
        • The NN classification error increases exponentially from 2.56% (the inherent classification error) to 6.74% when undervolting BRAMs below Vmin.
      • Fault mitigation techniques to prevent the accuracy loss:
        • Application-aware mechanism
        • Built-in ECC
      [Figure: on-chip power (Watts) at Vnom = 1 V, Vmin = 0.61 V, and Vcrash = 0.54 V. BRAM: 2.39, 0.25, 0.15; rest: 6.47, 6.47, 6.47.]
      [Figure: NN classification error (%) and fault rate (per 1 Mbit) versus VCCBRAM (V) from 0.61 V to 0.54 V; inherent error rate: 2.56%.]
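
The savings on this slide can be reproduced from the bar-chart values (BRAM power 2.39, 0.25, and 0.15 W at Vnom, Vmin, and Vcrash; 6.47 W for the rest of the chip). The sketch below is plain arithmetic on those numbers; the variable and function names are illustrative.

```python
# On-chip power in Watts, as read from the bar chart on this slide.
BRAM_POWER_W = {"Vnom": 2.39, "Vmin": 0.25, "Vcrash": 0.15}
REST_POWER_W = 6.47  # unchanged, since only VCCBRAM is undervolted


def reduction(before, after):
    """Fractional reduction when going from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 1.0 - after / before


def total_power(point):
    """Total on-chip power (BRAM plus rest) at a named voltage point."""
    return BRAM_POWER_W[point] + REST_POWER_W


if __name__ == "__main__":
    print(f"BRAM, Vnom -> Vmin:    {reduction(BRAM_POWER_W['Vnom'], BRAM_POWER_W['Vmin']):.1%}")
    print(f"BRAM, Vmin -> Vcrash:  {reduction(BRAM_POWER_W['Vmin'], BRAM_POWER_W['Vcrash']):.1%}")
    print(f"Total, Vnom -> Vmin:   {reduction(total_power('Vnom'), total_power('Vmin')):.1%}")
```

The second line corresponds to the additional 40% reduction quoted above; the third shows how the saving is diluted at chip level because the remaining 6.47 W is untouched.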
  23. 23. Intelligently-Constrained BRAM Placement (ICBP)
      • Below the voltage guardband, in the CRITICAL voltage region, we present ICBP to prevent NN classification accuracy loss.
      • Core idea: map the weights most sensitive to faults onto robust BRAMs.
      • Q: Which are the most sensitive NN weights? A: Those in deeper layers.
      • ICBP: additional constraints in the FPGA placement stage.
      [Figure: normalized vulnerability per NN layer. LAYER0: 1, LAYER1: 1.4, LAYER2: 2.1, LAYER3: 3, LAYER4: 5.7.]
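
The core idea of ICBP (most vulnerable layers onto the most robust BRAMs) can be illustrated with a greedy assignment. This is only a sketch of the mapping decision: the function name, the per-layer BRAM demand, and the BRAM fault rates are hypothetical, and the actual technique is applied as constraints in the FPGA placement stage, which is not modeled here. The layer vulnerabilities are the normalized values from the slide.

```python
def icbp_assign(layer_vulnerability, layer_bram_demand, bram_fault_rate):
    """Greedily give the most vulnerable layers the lowest-fault-rate BRAMs.

    layer_vulnerability: layer name -> normalized vulnerability
    layer_bram_demand:   layer name -> number of BRAMs the layer needs
    bram_fault_rate:     BRAM name  -> fault rate taken from a fault variation map
    Returns: layer name -> list of BRAM names assigned to that layer.
    """
    brams = sorted(bram_fault_rate, key=bram_fault_rate.get)  # most robust first
    layers = sorted(layer_vulnerability, key=layer_vulnerability.get, reverse=True)
    if sum(layer_bram_demand[layer] for layer in layers) > len(brams):
        raise ValueError("not enough BRAMs for the requested layers")
    placement, cursor = {}, 0
    for layer in layers:
        count = layer_bram_demand[layer]
        placement[layer] = brams[cursor:cursor + count]
        cursor += count
    return placement


if __name__ == "__main__":
    vulnerability = {"LAYER0": 1.0, "LAYER1": 1.4, "LAYER2": 2.1, "LAYER3": 3.0, "LAYER4": 5.7}
    demand = {layer: 1 for layer in vulnerability}           # hypothetical
    fault_rate = {"b0": 0.30, "b1": 0.05, "b2": 0.0,         # hypothetical FVM values
                  "b3": 0.86, "b4": 0.01, "b5": 0.24}
    print(icbp_assign(vulnerability, demand, fault_rate))
```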
  24. 24. ICBP Evaluation
      • Pros:
        • Significant accuracy loss prevention.
        • No power and performance overhead.
      • Cons:
        • Needs the FVM as a pre-processing step.
        • Built-in ECC is evaluated as an alternative that does not have this cost.
      [Figure: NN classification error (%) with default placement and with ICBP, plus BRAM power (Watts), versus VCCBRAM (V) from 0.61 V to 0.54 V; inherent NN error: 2.56%.]
  25. 25. Built-in ECC
      • Built-in ECC of FPGA BRAMs:
        • Hamming code.
        • Two (2) additional bits per row are reserved as parity bits.
        • SECDED (Single-Error Correction and Double-Error Detection).
      • Experimental methodology:
        • Activate the built-in ECC under low-voltage read operations.
      • Experimental observations:
        • >90% of faults are corrected.
        • >7% of faults are detected but not corrected.
      [Figure: fault rate (per 1 Mbit) without and with ECC versus VCCBRAM (V) from 0.61 V to 0.54 V; faults are classified as single-bit, double-bit, or multiple-bit.]
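
To make the SECDED behavior concrete, here is a toy extended Hamming (8,4) code: it corrects any single-bit error and detects (without correcting) any double-bit error. The code width, bit layout, and function names are chosen for illustration only; the word width and parity layout of the FPGA's built-in BRAM ECC are not modeled.

```python
def secded_encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1..7 follow the standard
    Hamming(7,4) layout with parity bits at positions 1, 2, and 4.
    """
    data = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2            # makes the overall parity even
    return code


def secded_decode(code):
    """Return (nibble, status) with status 'ok', 'corrected', or 'double_error'."""
    code = list(code)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position
    overall_odd = sum(code) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        # Odd number of flips: treat as a single error at `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity but non-zero syndrome: two bits flipped, not correctable.
        return None, "double_error"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


if __name__ == "__main__":
    word = secded_encode(0b1011)
    word[5] ^= 1  # inject a single-bit fault
    print(secded_decode(word))
```

Faults affecting three or more bits of one word fall outside what SECDED guarantees, which is in line with the slide's observation that not all faults are corrected.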
  26. 26. ECC for NN Accelerator
      ECC efficiency to prevent NN accuracy loss
      [Figure: NN classification error (%) without and with ECC versus VCCBRAM (V) from 0.61 V to 0.54 V; inherent NN error: 2.56%.]
      ECC area and power costs

        Area utilization (%)   BRAM   LUT   FF
        Without ECC            96%    3%    0.25%
        With ECC               100%   12%   0.25%

        BRAM power (W)   Vnom = 1 V   Vmin = 0.61 V   Vcrash = 0.54 V
        Without ECC      2.4          0.31            0.198
        With ECC         ----         ----            0.211

      • Pros:
        • Significant accuracy loss prevention.
        • Negligible power and performance overhead.
      • Cons:
        • Requires larger data rows/lines.
        • Not all FPGAs are equipped with this technique.
  27. 27. Key Points of the Third Contribution
      • Power reduction for FPGA-based accelerators: significant energy improvement can be achieved for FPGA-based accelerators (studied for a typical NN) through undervolting:
        • By eliminating the voltage guardband
        • By further undervolting in the critical voltage region
      • Cost of undervolting: accuracy loss is also significant but controllable in the critical voltage region.
      • Fault mitigation techniques: based on the fault characterization study, efficient mitigation techniques can be deployed to prevent the NN accuracy loss.
  28. 28. Wrap up
      • Summary
      • Conclusion
      • Potential Next Steps
      • ….
  29. 29. Summary
      • We experimentally showed how Xilinx FPGAs work under aggressive low-voltage operation.
      • There is a conservative voltage guardband below the nominal voltage level, i.e., Vnom.
      • BRAM power is significantly reduced through undervolting; however, reliability degrades below the minimum safe voltage, i.e., Vmin.
      • We characterized the behavior of undervolting faults in the critical region.
      • We evaluated FPGA undervolting for a typical NN accelerator.
  30. 30. Conclusion
      • There is significant potential in commercial FPGAs to improve energy efficiency through aggressive undervolting:
        • By eliminating the conservative voltage guardband
        • By further undervolting into the critical voltage region
      • Undervolting faults manifest deterministic behavior.
      • Efficient fault mitigation techniques can be deployed, which allow further energy savings.
      • State-of-the-art FPGA-based accelerators can adopt the undervolting approach.
  31. 31. Constraints of the Xilinx FPGAs for Undervolting
      • Many FPGA platforms, e.g., Zynq, are not equipped with voltage scaling capability.
      • There is no standard for the voltage distribution among platform components.
      • In Xilinx products, voltage regulators are hardwired to the host through the PMBus interface.
      • In many cases, several components on the FPGA platform share a single voltage rail.
      • Vendors set unnecessarily conservative voltage guardbands that increase energy consumption.
      • There is no publicly available circuit-level information about FPGAs.
  32. 32. Potential Next Steps
      • Different computing paradigms, e.g., Heterogeneous Computing, Approximate Computing, Stochastic Computing, among others.
      • Generalizing the observations by extending the experiments to other FPGA vendors like Intel/Altera.
      • Evaluating undervolting in noisy and harsh environments.
      • More advanced designs, where other components such as I/O and DSP are undervolted.
      • Application profiling to analyze workload-to-workload variation.
      • Dynamic Vmin prediction and scaling, adapted to frequency and temperature.
  33. 33. Thanks! Any questions or comments?
      Contact: behzad.salami@bsc.es, www.bsc.es
