FPGA Undervolting for Energy-Efficiency
30th International Conference on Field-Programmable Logic and Applications (FPL).
3rd September, 2020.
Behzad Salami
Barcelona Supercomputing Center (BSC)
2
Outline
• Motivation and Background
• Methodology and Results
- Undervolting FPGA On-Chip Memories
- Undervolting FPGA Internal Components
• More Information
3
Aggressive Undervolting
• Aggressive undervolting: Underscaling the supply voltage below the
nominal and safe level:
 Power/Energy Efficiency: Reduces dynamic and static power quadratically
and linearly, respectively.
 Reliability: Increases the circuit delay and, in turn, causes timing faults.
• Dual/Multi-Vdd, DVS, and DVFS: Mechanisms that are similar to, but different
from, aggressive undervolting:
 Similarity: They also underscale the supply voltage.
 Difference: They underscale only down to a certain safe level, usually
constrained by vendors.
[Figure: trade-off between reliability and power/energy efficiency.]
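The scaling claim on this slide can be illustrated with a first-order model. This is a minimal sketch, assuming dynamic power scales with the square of the supply voltage at a fixed frequency and static power scales linearly with it, as stated above. The function name `relative_power` and the 70/30 dynamic/static split are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
def relative_power(v, v_nom=1.0, dynamic_share=0.7):
    """Power at voltage v relative to power at v_nom (first-order model).

    dynamic_share is a hypothetical fraction of the nominal power that is
    dynamic; the remainder is treated as static.
    """
    if not 0.0 <= dynamic_share <= 1.0:
        raise ValueError("dynamic_share must be in [0, 1]")
    scale = v / v_nom
    dynamic = dynamic_share * scale ** 2      # quadratic in voltage
    static = (1.0 - dynamic_share) * scale    # linear in voltage
    return dynamic + static


if __name__ == "__main__":
    for v in (1.0, 0.8, 0.61):
        print(f"V = {v:.2f} V -> {relative_power(v):.3f} of nominal power")
```

The model ignores frequency changes and any other voltage dependence, so it is only a rough guide to the direction and shape of the trend.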
4
State-of-the-art
• Aggressive undervolting has been shown to reduce energy consumption
significantly.
• Real hardware:
 Devices:
 CPUs: Itanium II (ISCA 2014), x86 (IOLTS 2017), ARM (HPCA 2017)
 GPUs: NVIDIA (MICRO 2015)
 DRAMs: Multiple brands (SIGMETRICS 2017)
 FPGA: This work
 Focus of the previous works:
 Voltage guardband
 Minimum safe voltage, i.e., Vmin prediction
 Fault characterization and mitigation
 Chip-to-chip, core-to-core, and workload-to-workload variation
 ….
• Simulation-based studies: more straightforward and cover more parameters,
but are less precise.
 ASIC DNN: Minerva (MICRO 2016), Thundervolt (DAC 2018)
 CPU: Bravo (HPCA 2017)
 Network-on-Chip (HPCA 2014)
5
Undervolting on FPGAs: Motivation
Contribution of FPGAs in large data centers is growing: they are expected
to be in 30% of datacenter servers by 2020 (Top500 news).
• In comparison to ASICs, the energy efficiency of FPGAs is a serious
concern: FPGAs are 10X-100X less efficient.
• The nominal voltage of FPGAs is naturally reduced from one generation
to the next.
[Figure: Intel/Altera and Xilinx panels, annotated with "Undervolting".]
7
Undervolting FPGA On-Chip Memories
1. Undervolting FPGAs
 Voltage guardband
 Overall power and reliability trade-off
2. Fault characterization in FPGA on-chip memories
 Fault type, location, and rate
 Temperature, Chip
3. Low-voltage FPGA-based Neural Network (NN)
 Power consumption and NN accuracy characterization
 Fault mitigation techniques
 Application-aware technique
 Built-in ECC
8
Voltage Scaling Capability in Xilinx
Evaluated Xilinx platforms
• VC707: performance-efficient design
• KC705: power-efficient design (A & B)
• ZC702: ARM integrated with FPGA
Voltage distribution on Xilinx platforms
• Voltage regulator
 Power Management Bus (PMBus).
 Hardwired to the host.
[Figure: voltage distribution on VC707, showing the VCCINT and VCCBRAM rails.]
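The slides do not show how the host programs the regulator. As a hedged illustration only, the sketch below assumes the regulator uses the PMBus linear output-voltage format, in which the voltage is an unsigned 16-bit mantissa multiplied by 2 raised to a signed 5-bit exponent taken from VOUT_MODE. The function names and the example VOUT_MODE value 0x14 are hypothetical; no bus access is performed.

```python
def vout_mode_exponent(vout_mode):
    """Extract the 5-bit two's-complement exponent from a VOUT_MODE byte."""
    n = vout_mode & 0x1F
    return n - 32 if n & 0x10 else n


def encode_linear16(volts, exponent):
    """Convert a voltage to the 16-bit mantissa for the given exponent."""
    mantissa = round(volts / 2.0 ** exponent)
    if not 0 <= mantissa <= 0xFFFF:
        raise ValueError("voltage not representable with this exponent")
    return mantissa


def decode_linear16(mantissa, exponent):
    """Convert a 16-bit mantissa back to volts."""
    return mantissa * 2.0 ** exponent


if __name__ == "__main__":
    exp = vout_mode_exponent(0x14)          # example value only
    word = encode_linear16(0.61, exp)
    print(exp, hex(word), decode_linear16(word, exp))
```

The resolution of the setting is one mantissa step, which with the example exponent is well below the 10 mV steps visible on the voltage axes in this deck.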
9
Overall Voltage Behavior
• SAFE (voltage guardband): below Vnom and down to Vmin, the minimum safe voltage.
 No observable fault.
• CRITICAL: below Vmin.
 Faults manifest.
• CRASH: below Vcrash, the minimum operating voltage.
 FPGA stops operating.
• Voltage guardband: added to ensure correct operation under worst-case
environmental conditions and process variation.
• Experimental conditions: At ambient temperature and maximum operating
frequency.
[Figure: VCCBRAM (V) per platform (VC707, ZC702, KC705-A, KC705-B), divided into GUARDBAND, CRITICAL, and CRASH regions delimited by Vnom, Vmin, and Vcrash.]
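The three regions can be expressed as a small classifier. This is a minimal sketch using the VC707 values reported in this deck (Vnom = 1 V, Vmin = 0.61 V, Vcrash = 0.54 V); the function names are hypothetical, and treating Vmin as still safe and Vcrash as still operating is an assumption based on the wording "below Vmin" and "below Vcrash".

```python
def guardband_fraction(v_nom, v_min):
    """Size of the voltage guardband as a fraction of the nominal voltage."""
    return (v_nom - v_min) / v_nom


def voltage_region(v, v_min, v_crash):
    """Classify a supply voltage into SAFE, CRITICAL, or CRASH."""
    if v < v_crash:
        return "CRASH"       # FPGA stops operating
    if v < v_min:
        return "CRITICAL"    # faults manifest
    return "SAFE"            # inside the guardband, no observable fault


if __name__ == "__main__":
    v_nom, v_min, v_crash = 1.0, 0.61, 0.54   # VC707 values from the slides
    print(f"guardband: {guardband_fraction(v_nom, v_min):.0%}")
    for v in (0.80, 0.58, 0.50):
        print(v, voltage_region(v, v_min, v_crash))
```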
10
Experimental Methodology
• FPGA BRAMs:
 A hierarchy of sets of bit-cells, distributed over the chip (2060 BRAMs on VC707).
 Size of each BRAM: 16 kbits
• Experimental Methodology:
 HW: Transfer the content of the BRAMs to the host.
 SW: Analyze the data and adjust the voltage of the BRAMs.
[Figure: floorplan of VC707.]
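The software side of this methodology amounts to comparing what was read back from a BRAM against what was written. The sketch below is one way to compute a fault rate per 1 Mbit from such a comparison. The helper names, the all-ones test pattern, and the choice of 2^20 bits for "1 Mbit" are assumptions for illustration; the talk does not specify them.

```python
def count_bit_faults(expected, observed):
    """Number of differing bits between two equal-length byte strings."""
    if len(expected) != len(observed):
        raise ValueError("buffers must have the same length")
    return sum(bin(e ^ o).count("1") for e, o in zip(expected, observed))


def fault_rate_per_mbit(expected, observed, mbit=2 ** 20):
    """Faulty bits normalised to one Mbit (assumed here to be 2**20 bits)."""
    total_bits = 8 * len(expected)
    return count_bit_faults(expected, observed) * mbit / total_bits


if __name__ == "__main__":
    written = bytes([0xFF]) * 2048          # one 16-kbit BRAM, all ones
    read_back = bytearray(written)
    read_back[0] = 0xFE                     # simulate one flipped bit
    read_back[100] = 0x7F                   # simulate another flipped bit
    print(fault_rate_per_mbit(written, bytes(read_back)))
```

In a real sweep this comparison would be repeated for every BRAM at every voltage step; the simulated read-back above stands in for the hardware transfer.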
11
Overall Trade-offs on BRAMs: Power & Reliability
[Figure: BRAM power (Watts) and fault rate (per 1 Mbit) versus VCCBRAM (V) on VC707, from Vnom = 1 V down to 0.55 V, with Vmin = 0.61 V and Vcrash = 0.54 V marked; an inset zooms into 0.61 V to 0.54 V.]
12
Overall Trade-offs on BRAMs: Multiple Platforms
[Figure: four panels (VC707, ZC702, KC705-A, KC705-B), each plotting BRAM power and fault rate (per 1 Mbit) versus VCCBRAM (V) from Vnom = 1 V downward, with an inset for the lowest voltages. BRAM power is in Watts, except for ZC702, where it is in mWatts.]
Platform   Vnom   Vmin     Vcrash
VC707      1 V    0.61 V   0.54 V
ZC702      1 V    0.59 V   0.53 V
KC705-A    1 V    0.61 V   0.54 V
KC705-B    1 V    0.57 V   0.54 V
14
Fault Characterization at CRITICAL Region
Fault variability among FPGA BRAMs: fully non-uniform fault distribution
• The majority of BRAMs do not experience many faults.
• Setup: VC707 (2060 BRAMs), VCCBRAM = Vcrash = 0.54 V, ambient temperature.
[Figure: fault rate (%) of each BRAM, on a scale from 0.0% to 1.5%.]
K-means clustering of BRAMs by fault rate:
Cluster           % of BRAMs   Average fault rate (%)
High-vulnerable   1.8%         0.86%
Mid-vulnerable    9.4%         0.24%
Low-vulnerable    52.3%        0.03%
Zero-vulnerable   36.3%        0.0%
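Grouping BRAMs into vulnerability classes can be sketched with a one-dimensional k-means (Lloyd's algorithm) over per-BRAM fault rates. This is an illustrative sketch only: the input rates are made up, the deterministic initialisation is a choice made here for reproducibility, and nothing below is claimed to match the authors' clustering setup.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalars; returns (centroids, labels)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    ordered = sorted(values)
    # Deterministic start: pick k points spread evenly over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every value.
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                      for v in values]
        # Update step: mean of each cluster (keep old centroid if empty).
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical per-BRAM fault rates in percent.
    rates = [0.0, 0.0, 0.0, 0.02, 0.03, 0.04, 0.22, 0.26, 0.85, 0.9]
    centroids, labels = kmeans_1d(rates, k=4)
    print(centroids, labels)
```

With four clusters, the lowest centroid plays the role of the zero-vulnerable class and the highest that of the high-vulnerable class.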
15
Fault Characterization at CRITICAL Region
Type of undervolting faults:
Permanent faults at specific voltage
• There is no considerable change in the rate and location of faults over time.
• Validated by repeating the experiments 100 times.
• The physical location of BRAMs is extracted using Vivado.
• Fault Variation Map (FVM): Fault rate mapped to the physical location of
BRAMs.
FVM can be potentially used in fault mitigation techniques!
[Figure: FVM at VCCBRAM = Vcrash, T = ambient, chip = VC707: BRAM fault rate (%) over the FPGA x- and y-axes.]
[Figure: fault rate (per 1 Mbit) over 100 runs, showing individual runs and the cumulative median.]
Three orthogonal parameters have a significant impact on the
rate and location of faults:
1. Voltage
2. Temperature
3. Chip
16
Fault Characterization (Voltage Impacts)
Location of undervolting faults:
Fault Inclusion Property (FIP)
• FIP: A bit that is corrupted at a specific voltage stays faulty at lower
voltages as well.
• FIP can be used in mitigation techniques.
[Figure: illustration of FIP, and FIP shown as fault rate (per 1 Mbit, log scale) versus VCCBRAM (V) from 0.61 V to 0.54 V for VC707.]
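The Fault Inclusion Property can be checked mechanically: the set of faulty bit locations at each voltage must be contained in the set observed at the next lower voltage. The sketch below assumes fault locations are available as sets of bit addresses per voltage; the function name and the sample data are hypothetical.

```python
def satisfies_fip(faults_by_voltage):
    """True if faults at each voltage are a subset of those at every lower one.

    faults_by_voltage maps a voltage (float) to a set of faulty bit addresses.
    Checking adjacent voltage pairs is enough because set inclusion is transitive.
    """
    voltages = sorted(faults_by_voltage, reverse=True)   # highest voltage first
    return all(faults_by_voltage[high] <= faults_by_voltage[low]
               for high, low in zip(voltages, voltages[1:]))


if __name__ == "__main__":
    observed = {
        0.60: {17},
        0.58: {17, 402},
        0.56: {17, 402, 9001, 12},
    }
    print(satisfies_fip(observed))
```

A mitigation scheme could exploit this by profiling once at the lowest target voltage and reusing the resulting fault list as a conservative bound for higher voltages.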
17
Fault Characterization (Temperature Impacts)
• Methodology: Adjusting environmental temperature, monitoring on-
board temperature via PMBus.
• Experimental Observation:
 At higher temperatures, fault rate is significantly reduced.
• Inverse Temperature Dependency (ITD) [1]:
 For nano-scale technology nodes, under ultra-low-voltage operation, the
circuit delay reduces at higher temperatures since the supply voltage approaches
the threshold voltage.
[Figure: fault rate (per 1 Mbit, y-axis) versus VCCBRAM (V, x-axis) at T = 50 °C, 60 °C, 70 °C, and 80 °C.]
Practical confirmation of Inverse Temperature Dependency (ITD)
[1] Neshatpour, K., Burleson, W., Khajeh, A., & Homayoun, H. (2018). Enhancing Power, Performance, and Energy Efficiency in Chip
Multiprocessors Exploiting Inverse Thermal Dependence. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, (4), 778-791.
18
Fault Characterization (Chip Impacts)
• Methodology: Repeating experiments on identical samples of KC705 (A & B).
• Observations:
 Fault rates vary significantly, by more than 4X.
 Fault Variation Maps (FVMs) are entirely different.
[Figure: fault location maps for KC705-A and KC705-B at VCCBRAM = Vcrash, and fault rate (per 1 Mbit) versus VCCBRAM (V) for each board (0.61 V to 0.54 V for KC705-A, 0.57 V to 0.54 V for KC705-B).]
Even identical samples of the same chip show totally different reliability
behavior, due to process variation and aging effects.
20
Experimental Methodology
Neural Network (NN)
 Type: Fully-connected classifier
 Total number of weights: ~1.5 million
 Activation function: Logsig (logarithmic sigmoid)
Major benchmark
 Name-type: MNIST, handwritten digit images
 Number of images: Training: 60000, Classification: 10000
 Number of pixels per image: 28*28 = 784
 Number of output classes: 10
Additional benchmarks
 Names: Forest and Reuters
Data representation model
 Type: 16-bit fixed-point
 Precision: Minimum sign and digit per layer
An example implementation on VC707
 Frequency: 100 MHz
 BRAM usage (total: 2060): 70.8%
21
NN Implementation on FPGA
• Input data: off-chip DDR
memory.
• Weights: on-chip FPGA BRAM.
• Computation: Streaming data
onto DSPs and LUTs.
• We undervolt VCCBRAM:
 Weights of the NN are
potentially affected.
[Figure: FPGA implementation of the NN.]
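Because the weights live in undervolted BRAMs, a single corrupted bit changes a stored weight by an amount that depends on the bit position. The sketch below illustrates this for a signed 16-bit fixed-point word. The helper names and the split into 14 fractional bits are assumptions for illustration; the talk only states that the representation is 16-bit fixed-point with per-layer precision.

```python
def to_fixed(x, frac_bits):
    """Quantise x to a signed 16-bit fixed-point word (two's complement)."""
    raw = round(x * (1 << frac_bits))
    raw = max(-32768, min(32767, raw))      # saturate to the 16-bit range
    return raw & 0xFFFF


def from_fixed(word, frac_bits):
    """Interpret a 16-bit two's-complement word as a fixed-point value."""
    raw = word - 0x10000 if word & 0x8000 else word
    return raw / (1 << frac_bits)


def flip_bit(word, bit):
    """Return the word with one bit inverted, modelling a single bit fault."""
    return word ^ (1 << bit)


if __name__ == "__main__":
    frac_bits = 14                          # assumed format, for illustration
    stored = to_fixed(0.5, frac_bits)
    for bit in (0, 8, 15):
        faulty = from_fixed(flip_bit(stored, bit), frac_bits)
        print(f"bit {bit:2d} flipped: 0.5 -> {faulty}")
```

A fault in a low-order bit perturbs the weight only slightly, while a fault in the sign bit changes it drastically.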
22
Low-Voltage FPGA-based NN
Power saving
• Significant power reduction until the minimum safe voltage, i.e.,
Vmin (by eliminating the voltage guardband).
• Additional 40% power reduction below the voltage guardband.
NN accuracy loss
• The NN classification error exponentially increases from 2.56% (inherent
classification error) to 6.74% through undervolting BRAMs beyond Vmin.
• Fault mitigation techniques to prevent the accuracy loss:
 Application-aware mechanism
 Built-in ECC
[Figure: on-chip power (Watts), split into BRAM and rest, at three voltages.]
On-chip power (W)   Vnom = 1 V   Vmin = 0.61 V   Vcrash = 0.54 V
BRAM                2.39         0.25            0.15
Rest                6.47         6.47            6.47
23
Intelligently-Constrained BRAM placement (ICBP)
• In the CRITICAL voltage region, below the voltage guardband, we present
ICBP to prevent the loss of NN classification accuracy.
• Core Idea: Map the weights that are most sensitive to faults into robust BRAMs.
 Q: Which are the most-sensitive NN weights? A: Deeper layers.
• ICBP: additional constraints in the FPGA placement stage.
[Figure: normalized vulnerability per NN layer.]
NN layer   Normalized vulnerability
LAYER0     1
LAYER1     1.4
LAYER2     2.1
LAYER3     3
LAYER4     5.7
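The core idea of ICBP can be sketched as a greedy assignment: sort layers from most to least vulnerable, sort BRAMs from most to least robust, and hand out BRAMs in that order. This is a simplified illustration, not the authors' tool flow. The function name, the BRAM identifiers, and the fault rates are hypothetical; only the per-layer vulnerability values come from the slide. In practice the result would still have to be expressed as placement constraints for the FPGA tools.

```python
def icbp_assign(layer_vulnerability, bram_fault_rate, brams_per_layer):
    """Map each layer to BRAM ids, giving robust BRAMs to vulnerable layers."""
    if sum(brams_per_layer.values()) > len(bram_fault_rate):
        raise ValueError("not enough BRAMs for the requested layers")
    # Most robust BRAMs first; ties are broken by id for determinism.
    robust_first = sorted(bram_fault_rate, key=lambda b: (bram_fault_rate[b], b))
    # Most vulnerable layers first.
    sensitive_first = sorted(layer_vulnerability,
                             key=lambda name: -layer_vulnerability[name])
    placement, cursor = {}, 0
    for layer in sensitive_first:
        count = brams_per_layer[layer]
        placement[layer] = robust_first[cursor:cursor + count]
        cursor += count
    return placement


if __name__ == "__main__":
    vulnerability = {"LAYER0": 1.0, "LAYER1": 1.4, "LAYER2": 2.1,
                     "LAYER3": 3.0, "LAYER4": 5.7}      # values from the slide
    fault_rate = {"B0": 0.0, "B1": 0.9, "B2": 0.03,
                  "B3": 0.24, "B4": 0.0, "B5": 0.5}     # hypothetical FVM data
    demand = {name: 1 for name in vulnerability}        # one BRAM per layer
    print(icbp_assign(vulnerability, fault_rate, demand))
```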
24
ICBP Evaluation
• Pros:
 Significant accuracy loss prevention.
 No power and performance overhead.
• Cons:
 Needs the FVM as a pre-processing step. Built-in ECC, which does not
have this cost, is evaluated as an alternative.
[Figure: NN classification error (%) with default placement and with ICBP, and BRAM power (Watts), versus VCCBRAM (V) from 0.61 V to 0.54 V; inherent NN error: 2.56%.]
25
Built-in ECC
• Built-in ECC of FPGA BRAMs:
 Hamming-code.
 Two (2) additional bits per row are reserved as parities.
 SECDED (Single-Error Correction and Double-Error Detection).
• Experimental Methodology:
 Activate built-in ECC under low-voltage read operations.
• Experimental Observations:
 >90% fault correction
 >7% fault detection (not correction)
[Figure: fault rate (per 1 Mbit) versus VCCBRAM (V) from 0.61 V to 0.54 V, without and with ECC.]
[Figure: BRAM row with parity bits, and fault classes: single-bit, double-bit, and multiple-bit.]
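The SECDED behaviour described above can be illustrated with a toy extended Hamming code over 4 data bits. This is a teaching sketch only: it is much narrower than a BRAM row and is not the FPGA's built-in ECC implementation. The function names are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]   # d[0] is the least significant bit
    code = [0] * 8          # code[0]: overall parity, code[1..7]: Hamming(7,4)
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]       # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]       # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]       # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2                 # makes the total parity even
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected', or 'double-error'."""
    code = list(code)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position                # XOR of the set bit positions
    parity_odd = sum(code) % 2 == 1
    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        code[syndrome] ^= 1                     # syndrome 0 means bit 0 itself flipped
        status = "corrected"
    else:
        return None, "double-error"             # detected but not correctable
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return nibble, status


if __name__ == "__main__":
    word = encode(0b1011)
    word[6] ^= 1                                # inject a single-bit fault
    print(decode(word))
    word[2] ^= 1                                # inject a second fault
    print(decode(word))
```

Three or more flipped bits can be miscorrected by any SECDED code, which is consistent with the slide's observation that not all faults are corrected.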
26
ECC for NN Accelerator
ECC efficiency to prevent NN accuracy loss
[Figure: classification error (%) versus VCCBRAM (V), without and with ECC; inherent NN error: 2.56%.]
ECC area and power costs
Area utilization (%)   BRAM   LUT   FF
Without ECC            96%    3%    0.25%
With ECC               100%   12%   0.25%
BRAM power (W)   Vnom = 1 V   Vmin = 0.61 V   Vcrash = 0.54 V
Without ECC      2.4          0.31            0.198
With ECC         ----         ----            0.211
• Pros:
 Significant accuracy loss prevention.
 Negligible power and performance overhead.
• Cons:
 Requires larger data rows/lines.
 Not all FPGAs are equipped with this technique.
27
Undervolting FPGA Internal Components
28
Executive Summary
• Motivation: Power consumption of neural networks is a main concern
 Hardware acceleration: GPUs, FPGAs, and ASICs
• Problem: FPGAs are at least 10X less power-efficient than equivalent ASICs
• Goal: Bridge the power-efficiency gap between ASIC- and FPGA-based
neural networks by undervolting below the nominal level
• Evaluation Setup
 5 Image classification workloads
 3 Xilinx UltraScale+ ZCU102 platforms
 On-chip voltage rail for internal FPGA components
• Main Results
 Large voltage guardband (i.e., 33%)
 >3X power-efficiency gain
29
Outline
• Motivation and Background
• Our Goal
• Methodology
• Results
- Overall Voltage Behavior
- Power-Reliability Trade-off
- Frequency Underscaling
- Environmental Temperature
• Prior Works
• Summary, Conclusion, and Future Works
30
Motivation and Background
• Motivation
 Power consumption of neural networks is a main concern
 Hardware acceleration: GPUs, FPGAs, and ASICs
 FPGAs: Getting popular but less power-efficient than equivalent ASICs
 Large voltage guardbands (12-35%) for CPUs, GPUs, DRAMs
 Any potential of “Undervolting FPGAs” for power-efficiency of neural networks?
• Background
 Neural Networks: Widely deployed with an inherent resilience to errors
 FPGAs: Higher throughput than GPUs and better flexibility than ASICs
 Undervolting: Reduces power cons., may incur reliability or performance issues
32
Our Goal
• Primary Goal
 Bridge the power-efficiency gap between ASIC- and FPGA-based
neural networks by:
 Undervolting (i.e., underscaling voltage below nominal level)
• Secondary Goals
 Study the voltage behavior of real FPGAs (e.g., guardband)
 Study the power-efficiency gain of undervolting for neural networks
 Study the reliability overhead
 Study the frequency underscaling to prevent the accuracy loss
 Study the effect of environmental temperature
34
Overall Methodology
• 5 CNN image classification
workloads, i.e., VGGNet, GoogleNet,
AlexNet, ResNet50, Inception.
• Xilinx DNNDK to map CNN into FPGA
 By default optimized for INT8
• 3 identical samples of Xilinx ZCU102
 Zynq UltraScale+ architecture
 Hard-core ARM for data orchestration
 FPGA for CNN acceleration
• 1 on-chip voltage rail, via PMBus
 VCCINT: DSPs, LUTs, buffers, …
 Vnom = 850 mV (set by manufacturer)
Vast majority (>99.9%) of the power is dissipated on VCCINT
36
Overall Voltage Behavior
• Guardband: Large region below nominal level (Vnom = 850 mV)
 No performance or reliability loss
 Added by the vendor to ensure correct operation under worst-case conditions
 Large guardband, average of 33%
• Critical: Narrower region below guardband (Vmin = 570 mV)
 A narrow voltage region
 Neural network accuracy collapse
• Crash: FPGA crashes below critical region (Vcrash = 540 mV)
 FPGA stops operating
Slight variation of voltage behavior across platforms and benchmarks
38
Power-Reliability Trade-off
Power-efficiency (GOPs/W) gain
• >3X power saving (2.6X by eliminating guardband and further 43% in critical region)
• Slight variation across 3 platforms and 5 workloads
Reliability overhead (i.e., CNN accuracy loss)
• No accuracy loss in the guardband, accuracy collapse in the critical region
• Slight variation across 3 platforms and 5 workloads
[Figure: one panel per workload: VGGNet, GoogleNet, AlexNet, ResNet, Inception.]
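One way to read the two reported numbers together is as compounding gains, which is an assumption made here rather than something the slide states: a 2.6X gain from removing the guardband, followed by a further 43% improvement in the critical region. The function name is hypothetical.

```python
def combined_gain(guardband_gain, further_improvement):
    """Overall gain if a relative improvement compounds on an earlier gain."""
    return guardband_gain * (1.0 + further_improvement)


if __name__ == "__main__":
    total = combined_gain(2.6, 0.43)
    print(f"combined power-efficiency gain: {total:.2f}X")
```

Under this reading the combined figure is above 3X, which agrees with the ">3X" headline.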
40
Frequency Underscaling
• Simultaneous frequency underscaling to prevent CNN accuracy collapse in
the critical voltage region
• For each voltage level below Vmin, we found Fmax, the maximum
operating frequency at which there is no accuracy loss
• Leads to performance and energy-efficiency loss
VCCINT (mV)   Fmax (MHz)   GOPs (Norm)   Power (Norm)   GOPs/W (Norm)   GOPs/J (Norm)
570           333          1             1              1               1
565           300          0.94          0.97           0.97            0.87
560           250          0.83          0.84           0.99            0.75
555           250          0.83          0.78           1.06            0.8
550           250          0.83          0.75           1.1             0.83
545           250          0.83          0.74           1.12            0.84
540           200          0.7           0.56           1.25            0.75
Best setting for high performance and energy efficiency: 570 mV at 333 MHz.
Best setting for power efficiency: 540 mV at 200 MHz.
(Voltage steps = 5 mV, frequency steps = 50 MHz; shown for GoogleNet.)
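The "best setting" labels can be reproduced from the table itself. The sketch below stores the GoogleNet rows as tuples and picks the row that maximises each metric; it also recomputes GOPs/W as GOPs divided by power to confirm that the normalised columns are mutually consistent up to rounding. The variable and function names are hypothetical.

```python
# (VCCINT mV, Fmax MHz, GOPs, Power, GOPs/W, GOPs/J); the last four are normalised.
ROWS = [
    (570, 333, 1.00, 1.00, 1.00, 1.00),
    (565, 300, 0.94, 0.97, 0.97, 0.87),
    (560, 250, 0.83, 0.84, 0.99, 0.75),
    (555, 250, 0.83, 0.78, 1.06, 0.80),
    (550, 250, 0.83, 0.75, 1.10, 0.83),
    (545, 250, 0.83, 0.74, 1.12, 0.84),
    (540, 200, 0.70, 0.56, 1.25, 0.75),
]
GOPS, POWER, GOPS_PER_W, GOPS_PER_J = 2, 3, 4, 5


def best_by(rows, column):
    """Row with the largest value in the given column."""
    return max(rows, key=lambda row: row[column])


def recomputed_gops_per_watt(row):
    """GOPs/W derived from the GOPs and Power columns."""
    return row[GOPS] / row[POWER]


if __name__ == "__main__":
    print("best performance:      ", best_by(ROWS, GOPS)[:2])
    print("best energy efficiency:", best_by(ROWS, GOPS_PER_J)[:2])
    print("best power efficiency: ", best_by(ROWS, GOPS_PER_W)[:2])
    for row in ROWS:
        print(row[0], round(recomputed_gops_per_watt(row), 2), row[GOPS_PER_W])
```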
42
Environmental Temperature
• Effects of environmental temperature on power-reliability
 Vary the fan speed to test temperatures in [34 ℃, 50 ℃]
 On-board temperature monitored by PMBus
• Temperature effects on power consumption
 ↓ Temp → ↓ Power (direct relation of power and temperature)
 By undervolting, the impact of temperature on power consumption reduces.
• Temperature effects on reliability
 ↓ Temp → ↑ Accuracy loss (inverse relation of accuracy loss and temperature)
 In our temperature range, Vmin and Vcrash do not change significantly.
[Figure: results shown for GoogleNet.]
44
Prior Works
• Undervolting
 Studies for off-the-shelf real CPUs, GPUs, ASICs, DRAMs
 Large voltage guardband (from 12% to 35%) for many devices
 This work extends such studies for off-the-shelf FPGAs especially for
neural network acceleration and confirms large guardbands (i.e., 33%)
• Power-Efficient Neural Networks
 Studies on architectural-, hardware-, and software-level techniques
 Undervolting in neural network ASIC accelerator (e.g., GreenTPU-DAC’19)
 This work proposes a hardware-level undervolting for further
power-saving (>3X) in FPGAs.
• Reliability in Neural Networks
 Analytical and simulation-based studies (e.g., Thundervolt-DAC’18)
 Some studies on real hardware (e.g., EDEN-MICRO’19)
 This work studies the reliability of neural networks on real FPGAs
when operating at reduced voltage levels.
46
Summary, Conclusion, and Future Works
• Summary
 We improve the power-efficiency (>3X) of off-the-shelf
FPGAs via undervolting for neural network accelerators:
 2.6X by eliminating the guardband (i.e., 33%) without any cost
 43% by further undervolting below the guardband with the cost of
 either accuracy loss, when the frequency is not underscaled
 or performance loss, when the frequency is underscaled
• Conclusion
 Undervolting is an effective way to achieve significant
power-saving for FPGA-based neural network accelerators
• Future Works
 HW & SW extension of our undervolting for FPGA clusters
and other neural network models and tools
48
References
• B. Salami, et al., "An Experimental Study of Reduced-Voltage Operation in Modern FPGAs
for Neural Network Acceleration," in 50th IEEE/IFIP International Conference on
Dependable Systems and Networks (DSN), 2020.
• B. Salami, et al., "Comprehensive Evaluation of Supply Voltage Underscaling in FPGA
on-chip Memories," in 51st Annual IEEE/ACM International Symposium on Microarchitecture
(MICRO), 2018.
• B. Salami, et al., "Evaluating Built-in ECC of FPGA on-chip Memories for the Mitigation of
Undervolting Faults," in 27th Euromicro International Conference on Parallel, Distributed,
and Network-based Processing (PDP), 2019.
• B. Salami, et al., "Fault Characterization Through FPGAs Undervolting," in 28th
International Conference on Field Programmable Logic & Applications (FPL), 2018.
• B. Salami, et al., "On the Resilience of RTL NN Accelerators: Fault Characterization and
Mitigation," in 30th International Symposium on Computer Architecture and High
Performance Computing (SBAC-PAD), 2018.
49
Ongoing and Future Extensions
• Circuit-level simulation for validating the results
• Expansion to more FPGAs (cluster) and more workloads (DNN and non-DNN)
• Heterogeneous systems, including HW-SW systems and more voltage rails
• Design of voltage-optimized FPGA components
• Integration with error-handling systems such as checkpointing
50
Acknowledgment
• Adrian Cristal
• Osman Unsal
• Fahrettin Koc
• Baturay Onural
• Ismail Emir Yuksel
Contact: Behzad Salami, Barcelona Supercomputing Center (BSC), behzad.salami@bsc.es

 
Automatic power factor controller by microcontroller
Automatic power factor controller by microcontrollerAutomatic power factor controller by microcontroller
Automatic power factor controller by microcontroller
 
Power system-protection-presentation-dated-03-10-2013-integrated-protection-c...
Power system-protection-presentation-dated-03-10-2013-integrated-protection-c...Power system-protection-presentation-dated-03-10-2013-integrated-protection-c...
Power system-protection-presentation-dated-03-10-2013-integrated-protection-c...
 
under grund fault ppt (1).pptx
under grund fault ppt (1).pptxunder grund fault ppt (1).pptx
under grund fault ppt (1).pptx
 
Abb uno-6
Abb uno-6Abb uno-6
Abb uno-6
 
H010613642
H010613642H010613642
H010613642
 
Vlsi Design of Low Transition Low Power Test Pattern Generator Using Fault Co...
Vlsi Design of Low Transition Low Power Test Pattern Generator Using Fault Co...Vlsi Design of Low Transition Low Power Test Pattern Generator Using Fault Co...
Vlsi Design of Low Transition Low Power Test Pattern Generator Using Fault Co...
 
QuickSilver Controls QCI-DS032 QCI-N2-MX
QuickSilver Controls QCI-DS032 QCI-N2-MXQuickSilver Controls QCI-DS032 QCI-N2-MX
QuickSilver Controls QCI-DS032 QCI-N2-MX
 
Mini Power Station Product Brochure - Salevo Pty Ltd
Mini Power Station Product Brochure - Salevo Pty LtdMini Power Station Product Brochure - Salevo Pty Ltd
Mini Power Station Product Brochure - Salevo Pty Ltd
 
5 FINAL PROJECT REPORT
5 FINAL PROJECT REPORT5 FINAL PROJECT REPORT
5 FINAL PROJECT REPORT
 
Low Power Design Techniques for ASIC / SOC Design
Low Power Design Techniques for ASIC / SOC DesignLow Power Design Techniques for ASIC / SOC Design
Low Power Design Techniques for ASIC / SOC Design
 
Steper Motor Control Through Wireless
Steper Motor Control Through WirelessSteper Motor Control Through Wireless
Steper Motor Control Through Wireless
 

More from LEGATO project

Scrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for ProfitScrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for ProfitLEGATO project
 
A practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating systemA practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating systemLEGATO project
 
TEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEsTEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEsLEGATO project
 
secureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow FrameworksecureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow FrameworkLEGATO project
 
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...LEGATO project
 
LEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use CaseLEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use CaseLEGATO project
 
Smart Home AI at the edge
Smart Home AI at the edgeSmart Home AI at the edge
Smart Home AI at the edgeLEGATO project
 
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGATO project
 
LEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming ModelsLEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming ModelsLEGATO project
 
LEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGATO project
 
LEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGATO project
 
LEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGATO project
 
TZ4Fabric: Executing Smart Contracts with ARM TrustZone
TZ4Fabric: Executing Smart Contracts with ARM TrustZoneTZ4Fabric: Executing Smart Contracts with ARM TrustZone
TZ4Fabric: Executing Smart Contracts with ARM TrustZoneLEGATO project
 
Infection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow ComputingInfection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow ComputingLEGATO project
 
Smart Home - AI at the edge
Smart Home - AI at the edgeSmart Home - AI at the edge
Smart Home - AI at the edgeLEGATO project
 
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...LEGATO project
 
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsLEGATO project
 
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient ComputingRECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient ComputingLEGATO project
 

More from LEGATO project (20)

Scrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for ProfitScrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for Profit
 
A practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating systemA practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating system
 
TEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEsTEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEs
 
secureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow FrameworksecureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow Framework
 
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
 
LEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use CaseLEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use Case
 
Smart Home AI at the edge
Smart Home AI at the edgeSmart Home AI at the edge
Smart Home AI at the edge
 
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
 
LEGaTO Integration
LEGaTO IntegrationLEGaTO Integration
LEGaTO Integration
 
LEGaTO: Use cases
LEGaTO: Use casesLEGaTO: Use cases
LEGaTO: Use cases
 
LEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming ModelsLEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming Models
 
LEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGaTO: Software Stack Runtimes
LEGaTO: Software Stack Runtimes
 
LEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous Hardware
 
LEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing Workshop
 
TZ4Fabric: Executing Smart Contracts with ARM TrustZone
TZ4Fabric: Executing Smart Contracts with ARM TrustZoneTZ4Fabric: Executing Smart Contracts with ARM TrustZone
TZ4Fabric: Executing Smart Contracts with ARM TrustZone
 
Infection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow ComputingInfection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow Computing
 
Smart Home - AI at the edge
Smart Home - AI at the edgeSmart Home - AI at the edge
Smart Home - AI at the edge
 
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
 
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
 
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient ComputingRECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
 

Recently uploaded

Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxAleenaTreesaSaji
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfNAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfWadeK3
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 

Recently uploaded (20)

Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptx
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfNAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 

FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency

  • 1. FPGA Undervolting for Energy-Efficiency. 30th International Conference on Field-Programmable Logic and Applications (FPL), 3rd September, 2020. Behzad Salami, Barcelona Supercomputing Center (BSC)
  • 2. Outline
      • Motivation and Background
      • Methodology and Results
          - Undervolting FPGA On-Chip Memories
          - Undervolting FPGA Internal Components
      • More Information
  • 3. Aggressive Undervolting
      • Aggressive undervolting: underscaling the supply voltage below the nominal and safe level.
          - Power/energy efficiency: reduces dynamic and static power quadratically and linearly, respectively.
          - Reliability: increases the circuit delay, which in turn causes timing faults.
      • Dual/Multi-Vdd, DVS, and DVFS are similar to, but distinct from, aggressive undervolting.
          - Similarity: they also underscale the supply voltage.
          - Difference: they lower the voltage only down to a certain safe level, usually constrained by vendors.
      • [Figure: reliability versus power/energy efficiency]
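The quadratic and linear dependencies named on this slide match the usual first-order CMOS power model, sketched below. The symbols (activity factor \alpha, switched capacitance C, clock frequency f, leakage current I_leak) are not in the slides, and treating the leakage current as independent of voltage is a simplifying assumption.

```latex
P_{\mathrm{dyn}} = \alpha \, C \, V_{dd}^{2} \, f,
\qquad
P_{\mathrm{static}} = V_{dd} \, I_{\mathrm{leak}}
```

Under this model, lowering V_dd from 1 V to 0.61 V at a fixed frequency scales the dynamic term by 0.61^2, roughly 0.37, and the static term by 0.61.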
  • 4. State-of-the-art
      • Aggressive undervolting has proven effective at reducing energy consumption.
      • Real hardware:
          - Devices: CPUs: Itanium II (ISCA2014), x86 (IOLTS2017), ARM (HPCA2017); GPUs: NVIDIA (Micro2015); DRAMs: multiple brands (Sigmetrics2017); FPGA: this work
          - Focus of the previous works: voltage guardband; minimum safe voltage, i.e., Vmin prediction; fault characterization and mitigation; chip-to-chip, core-to-core, and workload-to-workload variation; ...
      • Simulation-based studies: more straightforward and more parameters, but less precise.
          - ASIC DNN: Minerva (Micro2016), Thundervolt (DAC2018)
          - CPU: Bravo (HPCA2017)
          - Network On-Chip (HPCA2014)
  • 5. Undervolting on FPGAs: Motivation
      • The contribution of FPGAs in large data centers is growing, expected to be in 30% of datacenter servers by 2020 (Top500 news).
      • In comparison to ASICs, the energy efficiency of FPGAs is a serious concern: they are 10X-100X less efficient.
      • The nominal voltage of FPGAs is naturally reduced from one generation to the next.
      • [Figure labels: Undervolting; Intel/Altera; Xilinx]
  • 6. Outline
      • Motivation and Background
      • Methodology and Results
          - Undervolting FPGA On-Chip Memories
          - Undervolting FPGA Internal Components
      • More Information
  • 7. Undervolting FPGA On-Chip Memories
      1. Undervolting FPGAs
          - Voltage guardband
          - Overall power and reliability trade-off
      2. Fault characterization in FPGA on-chip memories
          - Fault type, location, and rate
          - Temperature, chip
      3. Low-voltage FPGA-based Neural Network (NN)
          - Power consumption and NN accuracy characterization
          - Fault mitigation techniques: application-aware technique, built-in ECC
  • 8. Voltage Scaling Capability in Xilinx
      • Evaluated Xilinx platforms:
          - VC707: performance-efficient design
          - KC705: power-efficient design (A & B)
          - ZC702: ARM integrated with FPGA
      • Voltage regulator:
          - Power Management Bus (PMBus).
          - Hardwired to the host.
      • [Figure: voltage distribution on Xilinx platforms (VC707), showing VCCINT and VCCBRAM]
  • 9. Overall Voltage Behavior
      • SAFE: no observable fault; this is the voltage guardband below Vnom.
      • CRITICAL: faults manifest; below Vmin, the minimum safe voltage.
      • CRASH: the FPGA stops operating; below Vcrash, the minimum operating voltage.
      • Voltage guardband: a margin that covers worst-case environmental conditions and process variation.
      • Experimental conditions: ambient temperature and maximum operating frequency.
      • [Figure: VCCBRAM (V) from 0 to 1 for VC707, ZC702, KC705-A, and KC705-B, marked with Vnom, Vmin, and Vcrash and split into GUARDBAND, CRITICAL, and CRASH regions]
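The three voltage regions on this slide can be expressed as a small classifier. This is a minimal sketch: the function name and the treatment of the boundaries (Vmin counted as SAFE, Vcrash counted as CRITICAL) are assumptions, and the thresholds in the usage lines are the VC707 values quoted in the deck (Vmin = 0.61 V, Vcrash = 0.54 V).

```python
def classify_region(vccbram: float, vmin: float, vcrash: float) -> str:
    """Return the operating region for a given VCCBRAM setting.

    Assumed boundary handling: faults appear strictly below vmin, and the
    device stops operating strictly below vcrash.
    """
    if vcrash > vmin:
        raise ValueError("vcrash must not exceed vmin")
    if vccbram < vcrash:
        return "CRASH"
    if vccbram < vmin:
        return "CRITICAL"
    return "SAFE"


# VC707 thresholds as quoted in the deck.
VC707_VMIN = 0.61
VC707_VCRASH = 0.54

for v in (1.0, 0.61, 0.58, 0.54, 0.50):
    print(f"{v:.2f} V -> {classify_region(v, VC707_VMIN, VC707_VCRASH)}")
```

A sweep script could call such a function before each measurement to decide whether fault collection is meaningful at that voltage.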
  • 10. Experimental Methodology
      • FPGA BRAMs:
          - A hierarchy of sets of bit-cells, distributed over the chip.
          - Size of each BRAM: 16 kbits.
      • Experimental methodology:
          - HW: transfer the content of the BRAMs to the host.
          - SW: analyze the data and adjust the voltage of the BRAMs.
      • [Figure: floorplan of VC707 (2060 BRAMs)]
  • 11. Overall Trade-offs on BRAMs: Power & Reliability
      • VC707: Vnom = 1 V, Vmin = 0.61 V, Vcrash = 0.54 V.
      • [Figure: BRAM power (Watts) and fault rate (per 1 Mbit) versus VCCBRAM (V) from 1 V down to 0.55 V, with an inset covering 0.61 V to 0.54 V]
  • 12. Overall Trade-offs on BRAMs: Multiple Platforms
      • Voltage levels per platform:
          - VC707: Vnom = 1 V, Vmin = 0.61 V, Vcrash = 0.54 V
          - ZC702: Vnom = 1 V, Vmin = 0.59 V, Vcrash = 0.53 V
          - KC705-A: Vnom = 1 V, Vmin = 0.61 V, Vcrash = 0.54 V
          - KC705-B: Vnom = 1 V, Vmin = 0.57 V, Vcrash = 0.54 V
      • [Figure: one panel per platform showing BRAM power (Watts; mWatts for ZC702) and fault rate (per 1 Mbit) versus VCCBRAM (V), each with an inset covering Vmin down to Vcrash]
  • 13. Contributions
      1. Undervolting FPGAs
          - Voltage guardband
          - Overall power and reliability trade-off
      2. Fault characterization in FPGA on-chip memories
          - Fault type, location, and rate
          - Temperature, chip
      3. Low-voltage FPGA-based Neural Network (NN)
          - Power consumption and NN accuracy characterization
          - Fault mitigation techniques: application-aware technique, built-in ECC
  • 14. Fault Characterization at CRITICAL Region
      • Fault variability among FPGA BRAMs: fully non-uniform fault distribution.
      • The majority of BRAMs do not experience many faults.
      • Setup: VC707 (2060 BRAMs), VCCBRAM at Vcrash = 0.54 V, ambient temperature.
      • K-means clustering of BRAMs by fault rate:
          - High-vulnerable: 1.8% of BRAMs, average fault rate 0.86%
          - Mid-vulnerable: 9.4% of BRAMs, average fault rate 0.24%
          - Low-vulnerable: 52.3% of BRAMs, average fault rate 0.03%
          - Zero-vulnerable: 36.3% of BRAMs, average fault rate 0.0%
      • [Figure: BRAM fault rate (%), axis from 0.0% to 1.5%]
  • 15. Fault Characterization at CRITICAL Region
      • Type of undervolting faults: permanent faults at a specific voltage.
          - The rate and location of faults do not change considerably over time.
          - Validated by repeating the experiments 100 times.
      • Fault Variation Map (FVM): fault rate mapped to the physical location of BRAMs.
          - The physical location of BRAMs is extracted using Vivado.
          - The FVM can potentially be used in fault mitigation techniques.
      • Three parameters orthogonally have a significant impact on the rate and location of faults:
          1. Voltage
          2. Temperature
          3. Chip
      • [Figure: FVM at (VCCBRAM = Vcrash, T = ambient, chip = VC707); axes: FPGA x-axis, FPGA y-axis, BRAM fault rate (%)]
      • [Figure: fault rate (per 1 Mbit) versus run number over 100 runs; legend: Individual Run, Cumulative Median]
  • 16. Fault Characterization (Voltage Impacts)
      • Location of undervolting faults: Fault Inclusion Property (FIP).
          - FIP: a bit corrupted at a specific voltage stays faulty at lower voltages as well.
          - FIP can be used in mitigation techniques.
      • [Figure: illustration of FIP]
      • [Figure: FIP shown as fault rate (per 1 Mbit, log scale) versus VCCBRAM (V) from 0.61 V to 0.54 V, for VC707]
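The Fault Inclusion Property amounts to a subset relation between the sets of faulty bits observed at successively lower voltages. The sketch below checks that relation; the function name, the representation of faults as sets of bit indices, and the sample data are all made up for illustration and are not measurements from the deck.

```python
from typing import Dict, Set


def satisfies_fip(faults_by_voltage: Dict[float, Set[int]]) -> bool:
    """Check that faulty bits at each voltage remain faulty at every lower voltage."""
    voltages = sorted(faults_by_voltage, reverse=True)  # highest voltage first
    return all(
        faults_by_voltage[higher] <= faults_by_voltage[lower]  # subset test
        for higher, lower in zip(voltages, voltages[1:])
    )


# Hypothetical faulty-bit indices per VCCBRAM setting (illustrative only).
nested = {0.60: {7}, 0.58: {7, 42}, 0.56: {7, 42, 99}}
violating = {0.60: {7}, 0.58: {42}, 0.56: {7, 42, 99}}

print(satisfies_fip(nested))
print(satisfies_fip(violating))
```

If the property holds, a fault map collected at the lowest voltage of interest is a superset of the faults at any higher voltage, which is one way a mitigation scheme could reuse a single characterization run.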
  • 17. Fault Characterization (Temperature Impacts)
      • Methodology: adjust the environmental temperature and monitor the on-board temperature via PMBus.
      • Experimental observation: at higher temperatures, the fault rate is significantly reduced.
      • Inverse Temperature Dependency (ITD) (1): for nano-scale technology nodes under ultra-low-voltage operation, the circuit delay reduces at higher temperatures, since the supply voltage approaches the threshold voltage.
      • [Figure: practical confirmation of ITD; fault rate (per 1 Mbit) versus VCCBRAM (V) at T = 50 °C, 60 °C, 70 °C, and 80 °C]
      • (1) Neshatpour, K., Burleson, W., Khajeh, A., & Homayoun, H. (2018). Enhancing Power, Performance, and Energy Efficiency in Chip Multiprocessors Exploiting Inverse Thermal Dependence. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, (4), 778-791.
  • 18. Fault Characterization (Chip Impacts)
      • Methodology: repeat the experiments on identical samples of KC705 (A & B).
      • Observations:
          - Fault rates vary significantly, by more than 4X.
          - Fault Variation Maps (FVMs) are entirely different.
      • Even identical samples of the same chip have totally different reliability behavior, due to process variation and aging effects.
      • [Figures for KC705-A and KC705-B: fault location at VCCBRAM = Vcrash, and fault rate (per 1 Mbit) versus VCCBRAM (V), over 0.61 V to 0.54 V for KC705-A and 0.57 V to 0.54 V for KC705-B]
  • 19. Contributions
      1. Undervolting FPGAs
          - Voltage guardband
          - Overall power and reliability trade-off
      2. Fault characterization in FPGA on-chip memories
          - Fault type, location, and rate
          - Temperature, chip
      3. Low-voltage FPGA-based Neural Network (NN)
          - Power consumption and NN accuracy characterization
          - Fault mitigation techniques: application-aware technique, built-in ECC
  • 20. Experimental Methodology
      • Neural Network (NN):
          - Type: fully-connected classifier
          - Total number of weights: ~1.5 million
          - Activation function: logsig (logarithmic sigmoid)
      • Major benchmark:
          - Name and type: MNIST, handwritten digit images
          - Number of images: training: 60000, classification: 10000
          - Number of pixels per image: 28*28 = 784
          - Number of output classes: 10
      • Additional benchmarks: Forest and Reuters
      • Data representation model:
          - Type: 16-bit fixed-point
          - Precision: minimum sign and digit per layer
      • An example implementation on VC707:
          - Frequency: 100 MHz
          - BRAM usage (total: 2060): 70.8%
  • 21. NN Implementation on FPGA
      • Input data: off-chip DDR memory.
      • Weights: on-chip FPGA BRAM.
      • Computation: streaming data onto DSPs and LUTs.
      • We undervolt VCCBRAM, so the weights of the NN are potentially affected.
      • [Figure: FPGA implementation]
  • 22. Low-Voltage FPGA-based NN
      • Power saving:
          - Significant power reduction down to the minimum safe voltage, Vmin, by eliminating the voltage guardband.
          - An additional 40% power reduction below the voltage guardband.
      • NN accuracy loss:
          - The NN classification error increases exponentially from 2.56% (the inherent classification error) to 6.74% when BRAMs are undervolted beyond Vmin.
          - Fault mitigation techniques to prevent the accuracy loss: application-aware mechanism, built-in ECC.
      • [Figure: on-chip power (Watts), split into BRAM and rest. BRAM: 2.39 at Vnom = 1 V, 0.25 at Vmin = 0.61 V, 0.15 at Vcrash = 0.54 V. Rest: 6.47 at all three voltages.]
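The savings quoted on this slide can be reproduced from the chart values. The sketch below assumes the chart numbers pair up as BRAM power of 2.39 W, 0.25 W, and 0.15 W at Vnom, Vmin, and Vcrash, with the rest of the chip at 6.47 W throughout; that pairing is read off the chart labels, and the helper name is hypothetical.

```python
def relative_saving(before_w: float, after_w: float) -> float:
    """Fractional power reduction when going from before_w to after_w."""
    return (before_w - after_w) / before_w


# Chart values from the slide (pairing with voltages is an assumption).
BRAM_POWER_W = {"Vnom": 2.39, "Vmin": 0.25, "Vcrash": 0.15}
REST_POWER_W = 6.47  # unchanged, since only VCCBRAM is scaled

# Saving from removing the guardband (Vnom -> Vmin), BRAM only.
guardband_saving = relative_saving(BRAM_POWER_W["Vnom"], BRAM_POWER_W["Vmin"])

# Further saving inside the CRITICAL region (Vmin -> Vcrash), BRAM only.
critical_saving = relative_saving(BRAM_POWER_W["Vmin"], BRAM_POWER_W["Vcrash"])

# Saving on total on-chip power (Vnom -> Vcrash).
total_saving = relative_saving(
    REST_POWER_W + BRAM_POWER_W["Vnom"],
    REST_POWER_W + BRAM_POWER_W["Vcrash"],
)

print(f"BRAM, Vnom -> Vmin:      {guardband_saving:.1%}")
print(f"BRAM, Vmin -> Vcrash:    {critical_saving:.1%}")
print(f"On-chip, Vnom -> Vcrash: {total_saving:.1%}")
```

The Vmin to Vcrash step works out to 40% of BRAM power, which matches the "additional 40%" stated on the slide; the on-chip figure is smaller because the non-BRAM power is not scaled.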
  • 23. 23 Intelligently-Constrained BRAM Placement (ICBP) • In the CRITICAL voltage region, below the voltage guardband, we present ICBP to prevent loss of NN classification accuracy. • Core idea: Map the weights that are most sensitive to faults into robust BRAMs. - Q: Which are the most sensitive NN weights? A: Those in deeper layers. • ICBP: Additional constraints in the FPGA placement stage. [Figure: normalized vulnerability per NN layer: LAYER0 1, LAYER1 1.4, LAYER2 2.1, LAYER3 3, LAYER4 5.7]
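The core idea of ICBP can be sketched as a greedy assignment: sort BRAMs from most to least robust using the fault variation map, sort layers from most to least vulnerable, and hand out BRAMs in that order. This is a minimal illustration under assumptions, not the authors' placement flow: the per-BRAM fault counts and per-layer BRAM requirements below are hypothetical, and only the layer vulnerability values come from the slide.

```python
def icbp_assign(brams_needed, vulnerability, bram_fault_counts):
    """Greedy sketch: most vulnerable layers get the BRAMs with the fewest faults."""
    if sum(brams_needed.values()) > len(bram_fault_counts):
        raise ValueError("not enough BRAMs for all layers")
    # Most robust BRAMs first (fewest observed faults in the FVM).
    robust_first = sorted(bram_fault_counts, key=bram_fault_counts.get)
    # Most vulnerable layers first.
    layer_order = sorted(brams_needed, key=lambda name: vulnerability[name], reverse=True)
    placement = {}
    cursor = 0
    for layer in layer_order:
        count = brams_needed[layer]
        placement[layer] = robust_first[cursor:cursor + count]
        cursor += count
    return placement

# Normalized vulnerability per layer, from the slide.
vulnerability = {"LAYER0": 1.0, "LAYER1": 1.4, "LAYER2": 2.1, "LAYER3": 3.0, "LAYER4": 5.7}
# Hypothetical inputs: BRAMs required per layer and faults per BRAM at a low voltage.
brams_needed = {"LAYER0": 2, "LAYER1": 1, "LAYER2": 1, "LAYER3": 1, "LAYER4": 1}
bram_fault_counts = {"B0": 12, "B1": 0, "B2": 3, "B3": 7, "B4": 0, "B5": 1}

placement = icbp_assign(brams_needed, vulnerability, bram_fault_counts)
```

In a real flow the resulting mapping would be expressed as placement constraints for the FPGA tools; that step is outside this sketch.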
  • 24. 24 ICBP Evaluation • Pros: - Significant accuracy loss prevention. - No power and performance overhead. • Cons: - Needs the FVM as a pre-processing step. - Built-in ECC is evaluated without this cost. [Figure: NN classification error (%) with default placement and with ICBP, and BRAM power (W), vs. VCCBRAM from 0.61 V to 0.54 V; inherent NN error: 2.56%]
  • 25. 25 Built-in ECC • Built-in ECC of FPGA BRAMs: - Hamming code. - Two (2) additional bits per row are reserved as parity bits. - SECDED (Single-Error Correction and Double-Error Detection). • Experimental Methodology: - Activate built-in ECC under low-voltage read operations. • Experimental Observations: - >90% fault correction - >7% fault detection (not correction) [Figure: fault rate (per 1 Mbit) vs. VCCBRAM from 0.61 V to 0.54 V, without and with ECC; faults classified as single-bit, double-bit, and multiple-bit]
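To make the SECDED behavior concrete, here is a toy extended Hamming code over 4 data bits (7 Hamming bits plus one overall parity bit). It is only a sketch of the principle: the word width and bit layout are assumptions chosen for brevity and do not reflect the BRAM's actual ECC format.

```python
def encode(data4):
    """Encode 4 data bits into an 8-bit SECDED codeword (list of bits)."""
    d = [(data4 >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8  # code[0]: overall parity; code[1..7]: Hamming(7,4) positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2            # makes the total parity even
    return code

def decode(code):
    """Return (data, status); data is None when a double error is detected."""
    code = list(code)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position
    parity_odd = sum(code) % 2 == 1
    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        # Odd number of flips: assume a single flip and correct it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped.
        return None, "double-error detected"
    data = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return data, status

word = encode(0b1011)
word[5] ^= 1                      # inject a single-bit fault
single_result = decode(word)      # corrected back to 0b1011
word[2] ^= 1                      # inject a second fault in the same word
double_result = decode(word)      # detected, not corrected
```

Three or more flipped bits in one word can be silently miscorrected by any SECDED code, which is consistent with the slide listing multiple-bit faults as a separate class.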
  • 26. 26 ECC for NN Accelerator • Pros: - Significant accuracy loss prevention. - Negligible power and performance overhead. • Cons: - Requires larger data rows/lines. - Not all FPGAs are equipped with this technique. [Figure: ECC efficiency to prevent NN accuracy loss: classification error (%) vs. VCCBRAM (V), without and with ECC; inherent NN error: 2.56%]
      ECC area and power costs:
      Area utilization (%)   BRAM    LUT    FF
      Without ECC            96%     3%     0.25%
      With ECC               100%    12%    0.25%

      BRAM power (W)         Vnom = 1 V   Vmin = 0.61 V   Vcrash = 0.54 V
      Without ECC            2.4          0.31            0.198
      With ECC               ----         ----            0.211
  • 27. 27 Outline • Motivation and Background • Methodology and Results - Undervolting FPGA On-Chip Memories - Undervolting FPGA Internal Components • More Information
  • 28. 28 Executive Summary • Motivation: Power consumption of neural networks is a main concern - Hardware acceleration: GPUs, FPGAs, and ASICs • Problem: FPGAs are at least 10X less power-efficient than equivalent ASICs • Goal: Bridge the power-efficiency gap between ASIC- and FPGA-based neural networks by undervolting below the nominal level • Evaluation Setup - 5 image classification workloads - 3 Xilinx UltraScale+ ZCU102 platforms - On-chip voltage rail for internal FPGA components • Main Results - Large voltage guardband (i.e., 33%) - >3X power-efficiency gain
  • 29. 29 Outline • Motivation and Background • Our Goal • Methodology • Results - Overall Voltage Behavior - Power-Reliability Trade-off - Frequency Underscaling - Environmental Temperature • Prior Works • Summary, Conclusion, and Future Works
  • 30. 30 Motivation and Background • Motivation - Power consumption of neural networks is a main concern - Hardware acceleration: GPUs, FPGAs, and ASICs - FPGAs: Getting popular but less power-efficient than equivalent ASICs - Large voltage guardbands (12-35%) for CPUs, GPUs, DRAMs - Is there any potential in “Undervolting FPGAs” for the power-efficiency of neural networks? • Background - Neural Networks: Widely deployed, with an inherent resilience to errors - FPGAs: Higher throughput than GPUs and better flexibility than ASICs - Undervolting: Reduces power consumption, but may incur reliability or performance issues
  • 32. 32 Our Goal • Primary Goal - Bridge the power-efficiency gap between ASIC- and FPGA-based neural networks by undervolting (i.e., underscaling the voltage below the nominal level) • Secondary Goals - Study the voltage behavior of real FPGAs (e.g., guardband) - Study the power-efficiency gain of undervolting for neural networks - Study the reliability overhead - Study frequency underscaling to prevent the accuracy loss - Study the effect of environmental temperature
  • 34. 34 Overall Methodology • 5 CNN image classification workloads, i.e., VGGNet, GoogleNet, AlexNet, ResNet50, Inception. • Xilinx DNNDK to map the CNNs onto the FPGA - By default optimized for INT8 • 3 identical samples of Xilinx ZCU102 - Zynq UltraScale+ architecture - Hard-core ARM for data orchestration - FPGA for CNN acceleration • 1 on-chip voltage rail, accessed via PMBus - VCCINT: DSPs, LUTs, buffers, … - Vnom = 850 mV (set by manufacturer) • The vast majority (>99.9%) of the power is dissipated on VCCINT
  • 36. 36 Overall Voltage Behavior • Guardband: Large region below the nominal level (Vnom = 850 mV) - No performance or reliability loss - Added by the vendor to ensure correct operation under worst-case conditions - Large guardband, 33% on average • Critical: Narrower region below the guardband (Vmin = 570 mV) - A narrow voltage region - Neural network accuracy collapse • Crash: FPGA crashes below the critical region (Vcrash = 540 mV) - FPGA stops operating • Slight variation of voltage behavior across platforms and benchmarks
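The three voltage regions and the 33% guardband figure follow directly from the thresholds on the slide. The sketch below classifies a VCCINT setting and computes the guardband size; treating the thresholds themselves as inclusive lower bounds of their regions is an assumption, since the slide does not state which side the boundary values fall on.

```python
V_NOM_MV = 850    # nominal VCCINT, set by the manufacturer
V_MIN_MV = 570    # lowest voltage with no accuracy loss
V_CRASH_MV = 540  # lowest voltage at which the FPGA still operates

def voltage_region(vccint_mv):
    """Classify a VCCINT setting (in mV, at or below nominal) into a region."""
    if vccint_mv >= V_MIN_MV:
        return "guardband"
    if vccint_mv >= V_CRASH_MV:
        return "critical"
    return "crash"

# Guardband size as a fraction of the nominal voltage.
guardband_fraction = (V_NOM_MV - V_MIN_MV) / V_NOM_MV

print(f"Guardband: {guardband_fraction:.1%} of Vnom")
for v in (850, 600, 570, 555, 540, 535):
    print(v, "mV ->", voltage_region(v))
```

The guardband spans 280 mV out of 850 mV, about 32.9%, which rounds to the 33% quoted on the slide; the critical region is only 30 mV wide.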
  • 38. 38 Power-Reliability Trade-off • Power-efficiency (GOPs/W) gain - >3X power saving (2.6X by eliminating the guardband and a further 43% in the critical region) - Slight variation across 3 platforms and 5 workloads • Reliability overhead (i.e., CNN accuracy loss) - No accuracy loss in the guardband, accuracy collapse in the critical region - Slight variation across 3 platforms and 5 workloads [Figures: one curve per workload: VGGNet, GoogleNet, AlexNet, ResNet, Inception]
  • 40. 40 Frequency Underscaling • Simultaneous frequency underscaling to prevent CNN accuracy collapse in the critical voltage region • For each voltage level below Vmin, we found Fmax, the maximum operating frequency at which there is no accuracy loss • Leads to performance and energy-efficiency loss
      Results for GoogleNet (voltage steps = 5 mV, frequency steps = 50 MHz):
      VCCINT (mV)   Fmax (MHz)   GOPs (Norm)   Power (Norm)   GOPs/W (Norm)   GOPs/J (Norm)
      570           333          1             1              1               1
      565           300          0.94          0.97           0.97            0.87
      560           250          0.83          0.84           0.99            0.75
      555           250          0.83          0.78           1.06            0.8
      550           250          0.83          0.75           1.1             0.83
      545           250          0.83          0.74           1.12            0.84
      540           200          0.7           0.56           1.25            0.75
      Best setting for high performance and energy efficiency: 570 mV at 333 MHz. Best setting for power efficiency: 540 mV at 200 MHz.
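The Fmax search described on this slide can be sketched as a sweep: for each voltage, try frequencies from highest to lowest and keep the first one that shows no accuracy loss. In the sketch below the accuracy check is a hypothetical callback; on real hardware it would run the CNN on the board, whereas here a stand-in oracle built from the slide's table is used so the example runs on its own. The same data also confirms that the GOPs/W column is the GOPs column divided by the power column.

```python
# Fmax (MHz) per VCCINT (mV), copied from the GoogleNet table on the slide.
FMAX_TABLE = {570: 333, 565: 300, 560: 250, 555: 250, 550: 250, 545: 250, 540: 200}

# Normalized (GOPs, power, GOPs/W) per VCCINT (mV), also from the slide.
METRICS = {
    570: (1.00, 1.00, 1.00),
    565: (0.94, 0.97, 0.97),
    560: (0.83, 0.84, 0.99),
    555: (0.83, 0.78, 1.06),
    550: (0.83, 0.75, 1.10),
    545: (0.83, 0.74, 1.12),
    540: (0.70, 0.56, 1.25),
}

def find_fmax(voltages_mv, freqs_mhz, has_accuracy_loss):
    """For each voltage, return the highest frequency with no accuracy loss (or None)."""
    fmax = {}
    for v in voltages_mv:
        fmax[v] = None
        for f in sorted(freqs_mhz, reverse=True):
            if not has_accuracy_loss(v, f):
                fmax[v] = f
                break
    return fmax

def table_oracle(v_mv, f_mhz):
    """Stand-in for a real accuracy measurement (assumption): loss above the tabulated Fmax."""
    return f_mhz > FMAX_TABLE[v_mv]

sweep = find_fmax(FMAX_TABLE, [333, 300, 250, 200], table_oracle)

# Consistency check: GOPs/W should equal GOPs divided by power, up to rounding.
ratio_ok = all(abs(gops / power - gops_per_watt) < 0.01
               for gops, power, gops_per_watt in METRICS.values())
```

Replacing `table_oracle` with a function that programs the clock, runs inference, and compares accuracy against the baseline would turn this into the measurement loop the slide describes.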
  • 42. 42 Environmental Temperature • Effects of environmental temperature on the power-reliability trade-off - Vary the fan speed to test temperatures in [34 °C, 50 °C] - On-board temperature monitored via PMBus • Temperature effects on power consumption - ↓ Temp → ↓ Power (direct relation between power and temperature) - With undervolting, the impact of temperature on power consumption is reduced. • Temperature effects on reliability - ↓ Temp → ↑ Accuracy loss (inverse relation between accuracy loss and temperature) - In our temperature range, Vmin and Vcrash do not change significantly. [Figure: results shown for GoogleNet]
  • 44. 44 Prior Works • Undervolting - Studies for off-the-shelf real CPUs, GPUs, ASICs, DRAMs - Large voltage guardband (from 12% to 35%) for many devices - This work extends such studies to off-the-shelf FPGAs, especially for neural network acceleration, and confirms large guardbands (i.e., 33%). • Power-Efficient Neural Networks - Studies on architectural-, hardware-, and software-level techniques - Undervolting in neural network ASIC accelerators (e.g., GreenTPU-DAC’19) - This work proposes hardware-level undervolting for further power saving (>3X) in FPGAs. • Reliability in Neural Networks - Analytical and simulation-based studies (e.g., Thundervolt-DAC’18) - Some studies on real hardware (e.g., EDEN-MICRO’19) - This work studies the reliability of neural networks on real FPGAs when operating at reduced voltage levels.
  • 46. 46 Summary, Conclusion, and Future Works • Summary - We improve the power-efficiency (>3X) of off-the-shelf FPGAs via undervolting for neural network accelerators: 2.6X by eliminating the guardband (i.e., 33%) without any cost, and a further 43% by undervolting below the guardband, at the cost of either accuracy loss (when the frequency is not underscaled) or performance loss (when the frequency is underscaled). • Conclusion - Undervolting is an effective way to achieve significant power saving for FPGA-based neural network accelerators. • Future Works - HW & SW extension of our undervolting for FPGA clusters and other neural network models and tools.
  • 47. 47 Outline • Motivation and Background • Methodology and Results - Undervolting FPGA On-Chip Memories - Undervolting FPGA Internal Components • More Information
  • 48. 48 References • B. Salami, et al., "An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration," in 50th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2020. • B. Salami, et al., "Comprehensive Evaluation of Supply Voltage Underscaling in FPGA on-chip Memories," in 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2018. • B. Salami, et al., "Evaluating Built-in ECC of FPGA on-chip Memories for the Mitigation of Undervolting Faults," in 27th Euromicro International Conference on Parallel, Distributed, and Network-based Processing (PDP), 2019. • B. Salami, et al., "Fault Characterization Through FPGAs Undervolting," in 28th International Conference on Field Programmable Logic & Applications (FPL), 2018. • B. Salami, et al., "On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation," in 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2018.
  • 49. 49 Ongoing and Future Extensions • Circuit-level simulation for validating the results • Expansion to more FPGAs (clusters) and more workloads (DNN and non-DNN) • Heterogeneous systems, including hw-sw systems and more voltage rails • Design of voltage-optimized FPGA components • Integration with error-handling systems such as checkpointing
  • 50. 50 Acknowledgment • Adrian Cristal • Osman Unsal • Fahrettin Koc • Baturay Onural • Ismail Emir Yuksel
  • 51. FPGA Undervolting for Energy-Efficiency 30th International Conference on Field-Programmable Logic and Applications (FPL). 3rd September, 2020. Behzad Salami Barcelona Supercomputing Center (BSC) behzad.salami@bsc.es