FPL'2014 - FlexTiles Workshop - 6 - FlexTiles Embedded FPGA Accelerators

Slides presented at the FlexTiles Workshop at FPL'2014.
Presentation #6: FlexTiles Embedded FPGA Accelerators
FlexTiles is a heterogeneous many-core platform reconfigurable at run-time developed within an FP7 project.

Published in: Engineering
  1. 1. www.flextiles.eu. FlexTiles: Runtime Mapping of Hardware Accelerators on the Embedded FPGA Layer. FPL’14, FlexTiles Workshop, September 1st 2014. Olivier SENTIEYS★, Christophe HURIAUX, Antoine COURTAY. University of Rennes 1, ★ Inria.
  2. 2. The information contained in this document and any attachments are the property of FlexTiles consortium. You are hereby notified that any review, dissemination, distribution, copying or otherwise use of this document must be done in accordance with the CA of the project (TRT/DJ/624412785.2011). Template version 1.0. University of Rennes 1 – FPL’14 FlexTiles Workshop. The Multicore Era is Hitting the Utilization Wall. The multicore era has been a reality since 2005-2008, but what’s next? Energy efficiency is not scaling along with integration capacity; transistor and power budgets are no longer balanced. Classical scaling: device count $S^2$, device frequency $S$, device power (cap) $1/S$, device power (Vdd) $1/S^2$, utilization $1$. Leakage-limited scaling: device count $S^2$, device frequency $S$, device power (cap) $1/S$, device power (Vdd) $\sim 1$, utilization $1/S^2$. Power of core $i$: $P_i = a_i f_i C_i V_{dd,i}^2$. [Venkatesh et al., ASPLOS’10]
  3. 3. The Utilization Wall. With each successive process generation, the percentage of a chip that can switch at full frequency drops exponentially due to power constraints. 8nm in 2018: best-case average 3.7x speedup, 14% per year (highly parallel codes and optimal per-benchmark). [Esmaeilzadeh et al., ISCA’11]
  4. 4. Multicore and Dark Silicon. [Figure: speedup versus technology node (45nm, 32nm, 22nm, 16nm, 11nm, 8nm) for historical scaling, ITRS scaling and realistic scaling, labelled 18x, 7.9x and 3.7x respectively; dark silicon percentage labels 47%, 36%, 71%, 51%, 62%, 40%, 17%, 1%; date labels 2014, >2016, >2018.] [Doug Burger, HiPEAC’13]
  5. 5. The Efficiency of Specialization. 100-1000X gap in efficiency … but specialization comes with penalties in programmability. [Figure labels: ASICs, FPGAs.] Source: Ning Zhang and Bob Brodersen, ISSCC data.
  6. 6. Heterogeneous Multicores. Different cores on a single chip: GPPs, HW accelerators, memory, network-on-chip. Reconfigurable HW accelerators keep flexibility while increasing area and energy efficiency. Self-adapting devices: dynamically adapt the hardware to the application and to changing environments. [Figure: array of cores with Proc., Reconf. HW, Mem. and HW Acc. tiles.]
  7. 7. Can 3D Stacking Help? 3D-stacked reconfigurable accelerators: improved bandwidth/latency between cores and accelerators; improved resource usage; improved performance and energy efficiency. [Figure: a reconfigurable layer and a multicore layer of cores.]
  8. 8. Outline. eFPGA Reconfigurable Fabric (general architecture overview; expected features; task migration in FPGA vs. task migration in eFPGA). Virtual Bit-Stream. Coping with Heterogeneous Blocks. Development Flow. Achievements & Conclusion.
  9. 9. FlexTiles Architecture Overview. [Figure labels: 3D interface to the NoC, DSP blocks, memory blocks.]
  10. 10. Expected Features of the Reconfigurable Layer. Main expected features: low reconfiguration time (and power) overhead; double-context configuration memory; low-complexity reconfiguration control; easy resource sharing/distribution and simplified task migration; no predefined configuration domains; bit-stream independent from task location; smaller bit-stream size in configuration memory. → Virtual Bit-Stream (VBS)
  11. 11. Task Allocation & Migration in an FPGA. Predefined reconfigurable regions. Bit-stream depends on task location. [Figure: FPGA surrounded by I/O blocks; HW Accelerator #1 shown in two regions, with BS #1 and BS #2.]
  12. 12. Task Migration in eFPGA. [Figure: fabric with 3D NI and RAM blocks; HW Accelerator #1 with BS #1 and HW Accelerator #2 with BS #2.]
  13. 13. Outline. eFPGA Reconfigurable Fabric. Virtual Bit-Stream (concept; abstraction of routing details; results). Coping with Heterogeneous Fabric. Development Flow. Achievements & Conclusion.
  14. 14. Concept of Virtual Bit-Stream. A task is synthesized, placed and routed into a Virtual Bit-Stream (VBS): hide some routing details which are architecture dependent; remove details coming from the task's physical location in the fabric; no predefined configuration domains. The final bit-stream is generated at run time: resource sharing/distribution becomes easier, task migration is simplified. [Figure label: Quartus II.]
  15. 15. Interconnection Architecture: hiding routing details. The full BS is 129 bits; it could be reduced by giving fewer details. [Figure: interconnection block with CLBIN[0] to CLBIN[3], CLBOUT and nodes numbered 0 to 20.]
  16. 16. Virtual Bit Stream: hiding routing details. List of I/O and connections: 20 → 8, 1 → 9, 5 → 18. [Figure: interconnection block with nodes numbered 0 to 20.]
  17. 17. Results. VBS is independent of task location, with a smaller size than BS. [Figure: BS size and VBS size in kilobits (axis 0 to 1600) and compression ratio (axis 0% to 100%) for the benchmarks tseng, tseng, diffeq, diffeq, apex4, des, ex5p, misex3; compression ratio labels 44.4%, 49.2%, 47.2%, 55.2%, 49.7%, 29.5%, 27.4%, 26.6%.] 3-4 times smaller for large bit-streams.
  18. 18. eFPGA Architecture using VBS. Reconfiguration controller: upon GPP requirements, can place, duplicate and migrate tasks; finalizes the VBS. [Figure: external memory holding VBS 1, VBS 2, VBS 3, ..., VBS N; buffer memory; reconfiguration controller; data and control paths; steps 1 and 2.]
  19. 19. Outline. eFPGA Reconfigurable Fabric. Virtual Bit-Stream. Coping with Heterogeneous Fabric (heterogeneous blocks; task placement in a homogeneous context; task placement in a heterogeneous context). Development Flow. Achievements & Conclusion.
  20. 20. Heterogeneous Blocks. Logic Elements: cluster of four 6-input LUTs; 3309 µm². Arithmetic Elements: 18x18 multiplier, 48-bit adder/subtractor; 4351 µm². [Figure: logic element with CLBIN, four LUTs and CLBOUT; arithmetic element with 18-bit inputs A and B, 36-bit product and 48-bit add/subtract.]
  21. 21. Heterogeneous Blocks. Memories: 1024 x 16-bit word SRAM; 6570 µm². 3D TSV and Accelerator Interface. Reconfiguration Controller. [Figure: 3D blocks, Reconfiguration RAM, 3DNI blocks.] NoC Link (400 I/O): pitch 40, X 20, Y 20, size X 800, size Y 800, area 0.64 mm². 26.95 mm². Work in progress.
  22. 22. eFPGA Floorplan (heterogeneous). [Figure legend: Logic Block, Arithmetic Accelerator, Memories, Accelerator Interface.]
  23. 23. Task Placement & Migration. Homogeneous case: no constraint on task placement; regular routing architecture; easy! (thanks to the Virtual Bit-Stream). Coping with heterogeneity (RAM, DSP, 3D I/Os): migration is limited vertically to the same column, or to the next column containing the same complex blocks. [Figure legend: Task, Configured LE, Logic Element (LE).]
  24. 24. eFPGA: Handling of Complex Blocks. Heterogeneous block routing is abstracted from logic routing. Long lines allow a trade-off between placement flexibility and routing complexity. A two-level routing is performed at runtime: logic routing (as in the homogeneous case) and heterogeneous block routing through long lines.
  25. 25. eFPGA: Handling of Complex Blocks. Delay depends on final placement: only the worst-case delay can be estimated offline. Flexibility is still limited in the vertical axis: multiple of block height. The length of long lines, and the connections between long lines and routing resources, should be limited. Area overhead, but slight delay penalty (see our paper at FPL’14 on Wednesday).
  26. 26. Outline. eFPGA Reconfigurable Fabric. Virtual Bit-Stream. Coping with Heterogeneous Fabric. Development Flow. Achievements & Conclusion.
  27. 27. Development Flow. Custom development flow from C to Virtual Bit-Stream. [Flow diagram. Stages: High-level Synthesis, HDL Synthesis, Technology mapping, Placer, Router, Bitstream generation. Data: high-level task description, RTL task description, HDL task description, flat logic netlist, mapped logic netlist, placement data, routing data, arch. netlist, arch. description, virtual bit-stream.] Integrated within the FlexTiles development flow. Generates the VBS from a C description or an HDL description.
  28. 28. Development Flow. Custom development flow from C to Virtual Bit-Stream. Relies on Catapult C from Calypto Design Systems: high-level synthesis from C to VHDL.
  29. 29. Development Flow. Custom development flow from C to Virtual Bit-Stream. Uses the Verilog To Routing (VTR) academic tool flow to generate netlist and routing data from Verilog. [Flow diagram. Stages: HDL Synthesis, Technology mapping, Placer, Router. Data: RTL task description, HDL task description, flat logic netlist, mapped logic netlist, placement data, routing data, arch. netlist, arch. description.]
  30. 30. Development Flow. Custom development flow from C to Virtual Bit-Stream. A custom back-end generates the VBS from the data generated by VTR. The VBS can be loaded on the FlexTiles platform.
  31. 31. Conclusions. Overall results and achievements: 3-D stacked embedded FPGA coupled to a processor layer; flexible resource allocation/sharing; seamless task migration; Virtual Bit-Stream (the VBS also reduces bitstream size). eFPGA chip “Proof of Concept”: 65nm CMOS; homogeneous fabric of LBs; I/O ring (not 3D…); external reconfiguration controller.
  32. 32. Results. Thank you for your attention.
  33. 33. Where do the energy savings come from? MIPS baseline, 91 pJ/instr.: D-cache 6%, Datapath 38%, Reg. File 14%, Fetch/Decode 19%, I-cache 23%. Specialized core, 8 pJ/instr.: D-cache 6%, Datapath 3%, Energy Saved 91%. [Goulding et al., Hot Chips’10]
  34. 34. Energy per operation: 45nm CMOS, 40nm V6 FPGA. HW operators (45nm): 32-bit addition 0.5pJ; 16-bit multiply 2.2pJ; 64-bit FPU 50pJ/op. 40nm V6 FPGA: 16/32-bit multiply and add 114pJ (DSP blocks), 170pJ (LUT); 32-bit I/O access 1.47nJ; 32-bit memory read 660 pJ; 32-bit register R/W 1.12 pJ. Embedded RISC Processor (45nm): 32-bit register R/W 0.33pJ; 32-bit cache R/W 3.5pJ; add instruction⋆⋆ 5.32 pJ. ⋆⋆ add instruction (best case) = fetch, decode, read 2 operands from RF, execute, write back (into local reg. first, then copy into RF). [Dally et al., Computer, 2010] [Bonamy et al., 2013]
  35. 35. The Energy Cost of Data Movement. Fetching operands costs more than computing. Energy cost of cache coherence is huge! [Figure, 28nm CMOS: 64-bit DP 20pJ; 256-bit buses 26 pJ, 256 pJ, 1 nJ; 256-bit access to 8 kB SRAM 50 pJ; efficient off-chip link 500 pJ; DRAM Rd/Wr 16 nJ.] [Dally, IPDPS’11]
  36. 36. Efficient Hardware Task Swapping. Hiding reconfiguration time with computing. [Figure: timelines of Cfg. 1, Task 1, Cfg. 2 and Task 2 for single-context memory and for double-context memory.] eFPGA will use double-context memory: gain in dynamic reconfiguration efficiency, at the cost of ~50% overhead. [Figure: configuration cell with FF, ConfClk, Latch, ConfEn and CB; CB: one configuration bit.]
  37. 37. eFPGA (V1) Architecture. [Figure: Logic Block with CLBIN, LUT, FF, mux, CB, CLBOUT, clk, rstb, ScanIn and ScanOut; Switch Block with CBs, NORTH(i), SOUTH(i), EAST(i), WEST(i), ScanIn and ScanOut.]
  38. 38. eFPGA Architecture: Interconnection Block. [Figure: CLBIN[0] to CLBIN[3], CLBOUT, and NORTH, SOUTH, WEST and EAST tracks numbered 0 to 3.]
  39. 39. eFPGA Architecture: eFPGA macro. [Figure: CLB (i,j), CLB (i+1,j), CLB (i,j+1); SB (i,j), SB (i-1,j), SB (i,j-1); CHANX (i,j), CHANX (i+1,j); CHANY (i,j), CHANY (i,j+1); CLBIN[0] to CLBIN[3], CLBOUT.]
  40. 40. eFPGA Floorplan. [Figure: eFPGA floorplan.]
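
Slides 14 to 18 describe the Virtual Bit-Stream as a location-independent list of connections that the reconfiguration controller finalizes at run time. The Python sketch below illustrates that idea only. The data layout (`VirtualBitStream`, relative tile coordinates, `finalize`, the fabric size) is hypothetical and is not the FlexTiles format; the three connection pairs are the ones listed on slide 16.

```python
from dataclasses import dataclass

# Hypothetical model, not the FlexTiles format: a VBS stores, for each tile
# *relative* to the task origin, the (source, destination) connections to make.
@dataclass(frozen=True)
class VirtualBitStream:
    width: int          # task footprint in tiles
    height: int
    connections: dict   # (dx, dy) -> list of (src_node, dst_node)

def finalize(vbs, origin, fabric_size):
    """Translate relative tile coordinates into absolute ones at run time."""
    ox, oy = origin
    fab_w, fab_h = fabric_size
    if ox < 0 or oy < 0 or ox + vbs.width > fab_w or oy + vbs.height > fab_h:
        raise ValueError("task does not fit at this origin")
    return {(ox + dx, oy + dy): list(conns)
            for (dx, dy), conns in vbs.connections.items()}

# One stored VBS, two different placements (for example before and after migration).
vbs = VirtualBitStream(2, 1, {(0, 0): [(20, 8), (1, 9)], (1, 0): [(5, 18)]})
placed_a = finalize(vbs, origin=(0, 0), fabric_size=(8, 8))
placed_b = finalize(vbs, origin=(4, 3), fabric_size=(8, 8))
print(placed_a)
print(placed_b)
```

Under these assumptions the same stored object serves both placements, which is the property slide 17 summarizes as "VBS is independent of task location". A real finalization step would also have to expand each connection into switch configuration bits, which this sketch leaves out.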

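
Slides 23 and 25 state that, in the heterogeneous fabric, a task can only move to columns containing the same complex blocks, and vertically only by multiples of the block height. A minimal sketch of that constraint follows, assuming a fabric described as a list of column types; the column layout, the task footprint and both function names are made up for illustration.

```python
def legal_column_origins(fabric_columns, task_columns):
    """Return every x origin where the fabric column types match the task's."""
    n, m = len(fabric_columns), len(task_columns)
    return [x for x in range(n - m + 1)
            if fabric_columns[x:x + m] == task_columns]

def legal_row_origins(fabric_rows, task_rows, block_height):
    """Vertical origins restricted to multiples of the complex-block height."""
    return list(range(0, fabric_rows - task_rows + 1, block_height))

# Hypothetical fabric: mostly logic elements (LE) with RAM and DSP columns.
fabric = ["LE", "LE", "RAM", "LE", "DSP", "LE", "LE", "RAM", "LE", "DSP"]
task = ["LE", "RAM", "LE"]  # column types the task was placed and routed on

columns = legal_column_origins(fabric, task)
rows = legal_row_origins(fabric_rows=16, task_rows=4, block_height=4)
print(columns, rows)
```

In a homogeneous fabric every column has the same type, so every origin that fits is legal; this corresponds to the "no constraint on task placement" case on slide 23.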
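
Slide 36 contrasts single-context and double-context configuration memory: with two contexts, the next configuration can be loaded while the current task computes. The sketch below is a simple timing model of that overlap, assuming one configuration port and one running task at a time; the configuration and run times are arbitrary illustrative values, not measurements from the project.

```python
def single_context_makespan(cfg, run):
    """Each task is configured, then run; nothing overlaps."""
    return sum(c + r for c, r in zip(cfg, run))

def double_context_makespan(cfg, run):
    """Load the next configuration into the idle context while a task runs."""
    cfg_end = []   # time at which configuration k is fully loaded
    task_end = []  # time at which task k finishes
    for k, (c, r) in enumerate(zip(cfg, run)):
        port_free = cfg_end[k - 1] if k >= 1 else 0.0      # one config port
        context_free = task_end[k - 2] if k >= 2 else 0.0  # reuse older context
        cfg_end.append(max(port_free, context_free) + c)
        prev_done = task_end[k - 1] if k >= 1 else 0.0     # one task at a time
        task_end.append(max(cfg_end[k], prev_done) + r)
    return task_end[-1] if task_end else 0.0

cfg_times = [2.0, 2.0, 2.0]  # hypothetical reconfiguration times
run_times = [5.0, 5.0, 5.0]  # hypothetical task execution times
single = single_context_makespan(cfg_times, run_times)
double = double_context_makespan(cfg_times, run_times)
print(single, double)
```

With these values the single-context schedule takes 21.0 time units and the double-context schedule 17.0: only the first configuration remains exposed. The model ignores the roughly 50% overhead that slide 36 attributes to the second context.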