Brad L. Hutchings, Electrical and Computer Engineering, BYU
Run-Time Reconfiguration:
Evolution of a New Strategy for Computing
Reconfigurable Logic at BYU
• Lab Resources:
• Splash-II, TeraMAC, CLAyFun, Xilinx, Altera.
• Applications developed for most major platforms:
• Run-Time Reconfigurable Neural Networks
• Application-Specific Processors
• Image Processing
• Genetic Algorithms
• Linear Algebra
• Processor/Configurable-Logic Integration.
http://splish.ee.byu.edu
• List of work in progress.
• All BYU papers are downloadable.
• Bibliography with text abstracts:
» Keyword search: device, tool, app, system.
• Tutorials for FPGA Platforms and CAD Tools.
What This Talk is About
Computing with FPGAs
Assumptions
• Things I won’t tell you:
+ how wonderful FPGAs are.
+ how cost-effective FPGAs are.
+ how FPGAs will replace all of your home appliances.
Everyone knows how to spell “FPGA”
Device Assumptions
• Assume an FPGA that
» reconfigures rapidly (<10 ms or so)
» reconfigures in-circuit
» reconfigures an infinite number of times.
FPGAs are Big
FETs ‘R US
FPGAs are Slow
[Figure labels: 2K, 2K, 2K]
Most FPGA Designs are Static
• Eternal prototypes.
• Low volume or low gate counts.
• Evolving standards or requirements.
Final Application is Still Static
After the application is finished...
What are those other 900,000 transistors doing right now?
FPGAs are Reusable
[Figure: Convolution, Skeletonization, Neural Networks, Edge Detection, ATR, FFT, Encryption, and Compression, all on the Same Device.]
Slightly Less-Static Examples
• Diagnostic-specific configurations.
– Test with internal circuitry.
• Mode-specific configurations.
– Pixel depths on monitors.
• I/O, data format-specific configurations.
– Bus-couplers, etc.
Unrelated configurations.
Occasional reconfiguration.
Run-Time Reconfiguration (RTR)
Implement application as a set of multiple, cooperative configurations.
At run-time, execute and reconfigure as necessary.
Evolutionary Progression of RTR
• RRANN-1 (Backpropagation)
» Global RTR
• RRANN-2 (Backpropagation)
» Fixed-Schedule, Local RTR
• DPCC (General Computing)
» Demand-Paged, Local RTR
Neural Network
[Figure: neural network with inputs Eye Color, Income, Debt, Need, and Make of Car, and output "Do I Make Loan".]
Training Mode
[Figure: the network in training mode. Training Input Data[n] feeds the inputs (Eye Color, Income, Debt, Need, Make of Car); Training Output Data[n] is applied at the output ("Do I Make Loan").]
Operational Mode
[Figure: the network in operational mode. A Loan Application feeds the inputs (Eye Color, Income, Debt, Need, Make of Car); the output ("Do I Make Loan") produces the Answer.]
Backpropagation
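$$\mathrm{net}_i = \sum_{j \in A} O_j W_{ji} \quad \forall i : i \in B$$
$$O_i = f(\mathrm{net}_i) = \frac{1}{1 + e^{-\mathrm{net}_i}}$$
$$\delta_i = f'(\mathrm{net}_i)\,(T_i - O_i) \quad \forall i : i \in C$$
$$\delta_i = f'(\mathrm{net}_i) \sum_{j \in E} \delta_j W_{ij} \quad \forall i : i \in D$$
$$\Delta W_{ij} = C_l\, O_i\, \delta_j \quad \forall i, j : i \in A,\ j \in B$$
$$W_{ij}^{t+1} = W_{ij}^{t} + \Delta W_{ij}$$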
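A minimal sketch of one training step following the equations above. It assumes a sigmoid f (so f'(net) = O(1 − O)), a single hidden layer, and that the sets A through E denote layers of neurons; the slide does not define them. All function and variable names are illustrative and are not taken from RRANN.

```python
import math


def sigmoid(net):
    """f(net) = 1 / (1 + e^-net)."""
    return 1.0 / (1.0 + math.exp(-net))


def layer_output(prev_out, weights):
    """O_i = f(sum_j O_j * W_ji); weights[j][i] links source j to neuron i."""
    n_dest = len(weights[0])
    return [
        sigmoid(sum(prev_out[j] * weights[j][i] for j in range(len(prev_out))))
        for i in range(n_dest)
    ]


def train_step(inputs, targets, w_hid, w_out, rate):
    """One feedforward / error-propagation / update pass; weights change in place."""
    # Feedforward
    hidden = layer_output(inputs, w_hid)
    output = layer_output(hidden, w_out)

    # Error propagation: output layer first, then the hidden layer.
    # For the sigmoid, f'(net) = O * (1 - O).
    d_out = [o * (1.0 - o) * (t - o) for o, t in zip(output, targets)]
    d_hid = [
        h * (1.0 - h) * sum(d_out[k] * w_out[i][k] for k in range(len(d_out)))
        for i, h in enumerate(hidden)
    ]

    # Update: delta_W_ij = rate * O_i * delta_j, then W += delta_W.
    for i, h in enumerate(hidden):
        for k, d in enumerate(d_out):
            w_out[i][k] += rate * h * d
    for j, x in enumerate(inputs):
        for i, d in enumerate(d_hid):
            w_hid[j][i] += rate * x * d
    return output
```

Calling train_step repeatedly on the same input and target pair should move the output toward the target. The hidden-layer deltas are computed before any weight is changed, matching the order of the equations.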
RRANN Temporal Phases
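Feedforward
$$\mathrm{net}_i = \sum_{j \in A} O_j W_{ji} \quad \forall i : i \in B$$
$$O_i = f(\mathrm{net}_i) = \frac{1}{1 + e^{-\mathrm{net}_i}}$$
Error Propagation
$$\delta_i = f'(\mathrm{net}_i)\,(T_i - O_i) \quad \forall i : i \in C$$
$$\delta_i = f'(\mathrm{net}_i) \sum_{j \in E} \delta_j W_{ij} \quad \forall i : i \in D$$
Update
$$\Delta W_{ij} = C_l\, O_i\, \delta_j \quad \forall i, j : i \in A,\ j \in B$$
$$W_{ij}^{t+1} = W_{ij}^{t} + \Delta W_{ij}$$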
RRANN-1
Implementation Details
• Global RTR
» All FPGA resources configured globally.
» All state stored external to FPGA.
• Xilinx XC3090 FPGAs
» Achieved 6 neurons per FPGA (RTR)
– Unified (statically configured) version achieved 1 neuron per FPGA
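Global RTR as listed above can be sketched as a loop that reconfigures the whole device for each temporal phase and keeps all state in external memory. This is a toy model, not the RRANN-1 implementation: SimulatedFPGA, run_training_pass, and the placeholder phase circuits are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field

PHASES = ("feedforward", "error_propagation", "update")


@dataclass
class SimulatedFPGA:
    """Hypothetical device model: a global reconfiguration wipes on-chip state."""
    config: str = ""
    on_chip: dict = field(default_factory=dict)
    reconfig_count: int = 0

    def configure(self, name):
        self.config = name
        self.on_chip = {}          # nothing survives a global reconfiguration
        self.reconfig_count += 1


def run_training_pass(fpga, external_memory, phase_circuits):
    """Run the three phases in order, one full-device configuration per phase."""
    for phase in PHASES:
        fpga.configure(phase)
        fpga.on_chip = dict(external_memory)   # reload state from external memory
        phase_circuits[phase](fpga.on_chip)    # the phase's circuit does its work
        external_memory.update(fpga.on_chip)   # save state before the next reconfiguration


def make_counter(phase):
    """Placeholder circuit: only counts how often its phase has run."""
    def circuit(state):
        state[phase] = state.get(phase, 0) + 1
    return circuit


fpga = SimulatedFPGA()
memory = {}
circuits = {phase: make_counter(phase) for phase in PHASES}
for _ in range(2):
    run_training_pass(fpga, memory, circuits)
print(memory, fpga.reconfig_count)
```

Because every phase gets the whole device, each phase's circuit can be larger than its share of a single static design. The price is one full reconfiguration per phase plus the traffic to and from external memory.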
RRANN-1
Training Performance
[Figure: Peak Training Performance. x-axis: Number of FPGAs (XC3090s), 0 to 60; y-axis: Performance Relative to HP735-125, 0 to 12; curves: Statically Configured (1 Neuron per FPGA) and Run-Time Reconfigured (6 Neurons per FPGA).]
RRANN-1
Operational Performance
[Figure: Peak Operational Performance. x-axis: Number of FPGAs (XC3090s), 0 to 60; y-axis: Performance Relative to HP735-125, 0 to 120; curves: Statically Configured (1 Neuron per FPGA) and Run-Time Reconfigured (6 Neurons per FPGA).]
RRANN-2
Implementation Details
• Local RTR
» Exploit commonality to reduce configuration.
» Partially configure FPGAs in each step.
• National Semiconductor CLAy-31 FPGA.
» Similar to CLI/Atmel.
» Achieved 9 neurons per FPGA.
– Unified version achieved 3 neurons per FPGA.
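The idea of exploiting commonality between configurations can be sketched as follows: compare the loaded configuration with the next one and rewrite only the cells that differ. This is an illustration under simplifying assumptions (a configuration is a flat list of per-cell settings, and every configuration covers the whole device). The cell names and function names are hypothetical and do not describe the CLAy-31 configuration interface.

```python
def changed_cells(current, target):
    """Indices of cells whose setting differs between two configurations."""
    if len(current) != len(target):
        raise ValueError("configurations must cover the same number of cells")
    return [i for i, (old, new) in enumerate(zip(current, target)) if old != new]


def partial_reconfigure(device, target):
    """Rewrite only the differing cells in place; return how many were written."""
    changed = changed_cells(device, target)
    for i in changed:
        device[i] = target[i]
    return len(changed)


# Two hypothetical phase configurations that share most of their circuitry.
feedforward = ["multiplier", "accumulator", "sigmoid_lut", "io"]
error_prop = ["multiplier", "accumulator", "derivative_lut", "io"]

device = list(feedforward)
writes = partial_reconfigure(device, error_prop)
print(f"partial: {writes} cell(s) written; complete: {len(error_prop)} cell(s)")
```

The more circuitry two consecutive phases share, the fewer cells a partial reconfiguration has to write compared with a complete one.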
RRANN-2
Training Performance
[Figure: Peak Training Performance. x-axis: Number of FPGAs, 5 to 40; y-axis: Performance Relative to HP735-125, 0 to 45; legend text: Unified, Complete, Partial, RRANN.]
RRANN-2
Operational Performance
[Figure: Peak Operational Performance. x-axis: Number of FPGAs, 5 to 40; y-axis: Performance Relative to HP735-125, 0 to 140; legend text: Unified, Partial & Complete, RRANN.]
RRANN-2 Demo
• 5 minutes
RTR Point of Interest
A system computing only 30% of the time can outperform a system computing 100% of the time.
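The arithmetic behind this claim can be sketched as follows, assuming each neuron runs at the same rate in both designs and that the RTR system spends the rest of its time reconfiguring. The density gain of 6 is borrowed from RRANN-1 (6 neurons per FPGA versus 1) purely for illustration; the function names are hypothetical.

```python
def relative_throughput(density_gain, compute_fraction):
    """Throughput of an RTR system relative to a static one on the same FPGAs.

    density_gain: how many times more circuitry RTR fits per FPGA.
    compute_fraction: share of wall-clock time spent computing (0.0 to 1.0).
    """
    return density_gain * compute_fraction


def break_even_fraction(density_gain):
    """Compute fraction at which the RTR and static systems perform equally."""
    return 1.0 / density_gain


ratio = relative_throughput(density_gain=6, compute_fraction=0.30)
print(f"RTR / static throughput: {ratio:.2f}")
print(f"break-even compute fraction: {break_even_fraction(6):.3f}")
```

Under these assumptions the RTR system wins whenever its compute fraction exceeds the reciprocal of its density gain.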
DPCC
• Experiment with RTR architecture.
» Study impact of RTR on system architecture.
• FPGA-based processor architecture.
» NSC CLAy-31
» Partial configuration.
» Application-specific instruction sets.
General DPCC Approach
• Library-based Approach
– Library contains application-specific circuit modules.
• Software Control
– Sequencing, complex control, I/O controlled by software.
• Linear Hardware Space.
– Hardware relocated at run-time to available space.
• Hardware Paging.
– Idle hardware replaced with active modules.
• Demand-Driven Execution
– Hardware modules loaded as required by application.
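The paging scheme listed above can be sketched as a small allocator over a one-dimensional hardware space. This is a toy model under stated assumptions: module size is a number of rows, placement is first fit, and the replacement policy is least recently used. The slides do not specify a placement or replacement policy, and the class, method, and module names are hypothetical.

```python
from collections import OrderedDict


class LinearHardwareSpace:
    """Toy demand pager: modules occupy contiguous rows of a linear space."""

    def __init__(self, rows, library):
        self.rows = rows
        self.library = library          # module name -> height in rows
        self.resident = OrderedDict()   # name -> base row, least recently used first
        self.loads = 0                  # number of module configurations performed

    def _find_gap(self, height):
        """First-fit search for `height` free contiguous rows; None if no gap fits."""
        base = 0
        for name, start in sorted(self.resident.items(), key=lambda kv: kv[1]):
            if start - base >= height:
                return base
            base = start + self.library[name]
        return base if self.rows - base >= height else None

    def execute(self, name):
        """Make the module resident (loading it on demand) and return its base row."""
        height = self.library[name]
        if height > self.rows:
            raise ValueError(f"module {name!r} does not fit in the hardware space")
        if name in self.resident:               # already loaded: no reconfiguration
            self.resident.move_to_end(name)
            return self.resident[name]
        base = self._find_gap(height)
        while base is None:                     # page out idle modules until it fits
            self.resident.popitem(last=False)
            base = self._find_gap(height)
        self.resident[name] = base              # relocated to wherever space was found
        self.loads += 1
        return base
```

For example, with 10 rows and modules of heights 3, 4, 3, and 3, the first three modules fill the space; requesting the fourth pages out the least recently used module and places the new one in the freed rows. A module that is still resident runs without any reconfiguration.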
DISC Internal Architecture
Static Control Circuitry
Dynamic Instruction Module A
Dynamic Instruction Module B
DPCC Conclusions
• Provides modular programming model.
• Isolates capacity/timing issues.
• Allows reuse at the design level.
• Automated reuse of silicon:
» demand paging,
» module relocation.
Summary
• Illustrated RTR as it evolved at BYU:
» Global RTR
» Local RTR
» DPCC
• RTR exploits FPGA overhead
» Static approaches waste FPGA resources.
The Future
• FPGAs are reusable, VHDL is not!
» Must be able to reuse optimized designs.
• CAD/Compilers are far behind.
» Very dependent on GPR.
• Devices are not well-suited.
» Better devices will expand application base.
Acknowledgement
Thanks to National Semiconductor for funding this work through an ARPA contract.
End of Presentation
Initial Motivation
• I was new faculty.
» I could do something completely new.
• I enjoyed custom hardware.
» FPGAs looked like the ultimate “tinkertoy.”
• I had to restructure CompEng.
» FPGAs could impact the undergraduates.
Configurable Computing
• Most applications are still static.
» One configuration per application.
• Compile-Time Reconfiguration
» Configure once per application, prior to execution.
• Splash, Perle, Teramac
» Large ASIC emulators.
Flexibility Eases Development
• Software-like implementation strategy:
» Iterative
» Incremental
» Debuggable
» Execution versus Simulation
» ASIC results without ASIC development