RR Osorio  FPGA
RR Osorio FPGA Presentation Transcript

  • 1. Field-Programmable Gate Arrays as tracking devices. Roberto Rodríguez Osorio, Javier Díaz Bruguera. Group of Computer Architecture, Dept. of Electronics and Computer Science, University of Santiago de Compostela
  • 2. Outline: application-specific computing machines; ASIC vs FPGA; FPGA technology basics; hard cores in FPGAs; performance; design effort; choices; applications
  • 3. Application-specific computing machines
      [Figure: block diagrams of both machines; recoverable labels: code memory, data memory, PC, IR, register file, control logic, functional units, MAC, control section, datapath section]
      • Microprocessor. Performance: 10 cycles @ 3 GHz. Dissipated power: ~35 W
      • Application-Specific Integrated Circuit. Performance: 1 cycle @ 1 GHz. Dissipated power: ~mW
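The figures on slide 3 can be turned into time and energy per operation. This is a minimal sketch: the 1 mW value for the ASIC is an assumption (the slide only says "~mW"), and the function names are illustrative.

```python
# Figures quoted on the slide.
CPU_CYCLES, CPU_CLOCK_HZ, CPU_POWER_W = 10, 3e9, 35.0
ASIC_CYCLES, ASIC_CLOCK_HZ = 1, 1e9
# Assumption: the slide only says "~mW"; 1 mW is a placeholder value.
ASIC_POWER_W = 1e-3


def time_per_op(cycles, clock_hz):
    """Seconds needed to complete one operation."""
    return cycles / clock_hz


def energy_per_op(cycles, clock_hz, power_w):
    """Joules dissipated during one operation (power x time)."""
    return power_w * time_per_op(cycles, clock_hz)


cpu_time = time_per_op(CPU_CYCLES, CPU_CLOCK_HZ)
asic_time = time_per_op(ASIC_CYCLES, ASIC_CLOCK_HZ)
cpu_energy = energy_per_op(CPU_CYCLES, CPU_CLOCK_HZ, CPU_POWER_W)
asic_energy = energy_per_op(ASIC_CYCLES, ASIC_CLOCK_HZ, ASIC_POWER_W)

print(f"Microprocessor: {cpu_time * 1e9:.2f} ns and {cpu_energy * 1e9:.1f} nJ per operation")
print(f"ASIC:           {asic_time * 1e9:.2f} ns and {asic_energy * 1e12:.1f} pJ per operation")
print(f"ASIC is {cpu_time / asic_time:.1f}x faster despite the 3x slower clock")
```

Under these assumptions the lower clock frequency is more than offset by the lower cycle count, and the gap in energy per operation is several orders of magnitude.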
  • 4. ASIC vs FPGA [Figure: NRE cost, $1M to $4M, versus technology, 0.35 to 0.05 micrometers]
  • 5. ASIC vs FPGA [Figure: computational efficiency (Mops/W), 10^0 to 10^6, versus technology, 2 to 0.07 µm (1986 to 2006); curve labels: maximum efficiency (ASIC), FPGA, ASSP, MPPA, GPGPU, VLIW, ASIP, ManyCore, ...] Source: Theo A.C.M. Claasen, ISSCC 99
  • 6. FPGA technology basics – Computing
      Full adder (FA) truth table:
      a  b  carry input | s  carry output
      0  0  0           | 0  0
      0  0  1           | 1  0
      0  1  0           | 1  0
      0  1  1           | 0  1
      1  0  0           | 1  0
      1  0  1           | 0  1
      1  1  0           | 0  1
      1  1  1           | 1  1
      [Figure: FA block symbol and gate-level circuit computing s and c out from a, b and c in]
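The full adder on slide 6 can be checked in a few lines. This sketch uses one common gate-level form (XOR for the sum, majority for the carry). The slide's own circuit is not recoverable from the transcript, so that form is an assumption.

```python
from itertools import product


def full_adder(a, b, cin):
    """1-bit full adder built from logic operations.

    s    = a XOR b XOR cin
    cout = majority(a, b, cin)
    """
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


# Print the truth table in the same column order as the slide.
print("a b cin | s cout")
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    print(f"{a} {b} {cin}   | {s} {cout}")
```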
  • 7. FPGA technology basics – Do not compute. Logic blocks [Figure: inputs a, b and cin address two 8x1-bit SRAM memories, one producing s and the other cout]
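The "do not compute" idea on slide 7 is that each output is read from a small memory instead of being evaluated by gates. A minimal sketch, assuming the three inputs form the address with a as the most significant bit (the slide does not give the bit order). The helper names `build_lut` and `lut_read` are illustrative.

```python
from itertools import product


def build_lut(func, n_inputs=3):
    """Fill a 2**n_inputs x 1-bit memory with the truth table of func."""
    return [func(*bits) for bits in product((0, 1), repeat=n_inputs)]


def lut_read(lut, *inputs):
    """Read the memory: the input bits form the address (first input = MSB)."""
    addr = 0
    for bit in inputs:
        addr = (addr << 1) | bit
    return lut[addr]


# Two 8x1-bit memories, one per output of the full adder.
SUM_LUT = build_lut(lambda a, b, cin: a ^ b ^ cin)
COUT_LUT = build_lut(lambda a, b, cin: (a & b) | (a & cin) | (b & cin))

print("SUM_LUT  =", SUM_LUT)
print("COUT_LUT =", COUT_LUT)
print("1 + 1 + 0 ->", lut_read(SUM_LUT, 1, 1, 0), lut_read(COUT_LUT, 1, 1, 0))
```

Changing the function only changes the memory contents, not the hardware, which is what makes the logic block reconfigurable.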
  • 8. FPGA technology basics – Interconnect [Figure: array of blocks]
  • 9. FPGA technology basics – Interconnect
  • 10. FPGA technology basics – Interconnect
  • 11. FPGA technology basics – Interconnect + memory
      • FPGA fabric consists of a huge number of simple memory elements connected by a reconfigurable network
      • Design software must break every computing task into 1-bit operations with no more than 4, 5 or 6 variables
      • Operations are spatially distributed according to proximity criteria
      • Routing may be troublesome: long paths are slow, and routing through logic blocks increases area
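To illustrate how a wider computation breaks down into 1-bit operations of a few variables, the sketch below adds two integers using only 3-input, 1-bit table lookups chained through the carry. The table contents are the full-adder truth table. The function name and the 8-bit default width are assumptions made for the example.

```python
# Truth tables of a full adder, indexed by address (a << 2) | (b << 1) | cin.
SUM_LUT = [0, 1, 1, 0, 1, 0, 0, 1]
COUT_LUT = [0, 0, 0, 1, 0, 1, 1, 1]


def add_with_luts(x, y, width=8):
    """Add two unsigned integers using one pair of 1-bit lookups per bit."""
    result = 0
    carry = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        addr = (a << 2) | (b << 1) | carry
        result |= SUM_LUT[addr] << i   # 1-bit sum for position i
        carry = COUT_LUT[addr]         # carry feeds the next position
    return result, carry


print(add_with_luts(100, 55))    # sum fits in 8 bits
print(add_with_luts(200, 100))   # sum overflows 8 bits, carry out is set
```

Each bit position depends on the carry from the previous one, so the lookups form a chain across the fabric. Long chains of this kind are one place where the slide's remark about slow long paths applies.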
  • 12. Hard cores in FPGAs: memory blocks; multipliers; DSP blocks; microprocessors; floating point units?
  • 13. Memory blocks
      • Hundreds or thousands of small memory blocks
      • Dual-port blocks, 18 Kbit each for Xilinx
      • Flexible configurations: many short words or a few long words
      • Independent access
      • Huge aggregate bandwidth
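A rough estimate shows why the aggregate bandwidth is large. The 18 Kbit block size and the two ports come from the slide, and the total capacity is the largest block RAM figure in the Xilinx comparison on slide 20. The 18-bit port width and the 300 MHz clock are assumptions chosen only for illustration.

```python
BLOCK_KBITS = 18        # from the slide: 18 Kbit per Xilinx block
PORTS_PER_BLOCK = 2     # from the slide: dual-port blocks
TOTAL_KBITS = 32832     # largest block RAM capacity listed on slide 20
# Assumptions, not taken from the slides:
PORT_WIDTH_BITS = 18
CLOCK_HZ = 300e6


def aggregate_bandwidth(total_kbits, block_kbits, ports, width_bits, clock_hz):
    """Return (number of blocks, bits per second) if every port is used each cycle."""
    n_blocks = total_kbits // block_kbits
    return n_blocks, n_blocks * ports * width_bits * clock_hz


blocks, bandwidth = aggregate_bandwidth(
    TOTAL_KBITS, BLOCK_KBITS, PORTS_PER_BLOCK, PORT_WIDTH_BITS, CLOCK_HZ
)
print(f"{blocks} blocks, about {bandwidth / 1e12:.1f} Tbit/s aggregate")
```

This is an upper bound. A real design reaches it only if the computation can keep every block busy on every cycle.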
  • 14. Multipliers and DSP blocks
      • As FPGAs became larger, some people tried to implement DSP algorithms on them
      • However, multipliers take too much area; therefore, hardwired multipliers were introduced
      • DSP algorithms are often based on multiply & add and multiply & accumulate
      • DSP blocks in modern FPGAs implement in hardware: multiply, multiply & add, multiply & accumulate; optional addition before multiplying; three-input add; 1 large, 2 medium or 4 small operations on the same hardware; shifting, comparisons, bit-wise operations, …
      • Up to 2000 DSP blocks in current FPGAs for massive parallelism
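As an illustration of why multiply & accumulate is the core DSP operation, the sketch below writes a FIR filter as repeated calls to a single `mac` step. The filter and the names are illustrative and not taken from the slides. On an FPGA each tap could map to its own DSP block, so the inner loop would run in parallel instead of sequentially.

```python
def mac(acc, a, b):
    """One multiply & accumulate step: acc + a * b."""
    return acc + a * b


def fir(samples, coeffs):
    """FIR filter: each output is a chain of MACs over the most recent samples."""
    history = [0] * len(coeffs)          # delay line, newest sample first
    out = []
    for x in samples:
        history = [x] + history[:-1]     # shift the new sample in
        acc = 0
        for h, c in zip(history, coeffs):
            acc = mac(acc, h, c)         # one MAC per filter tap
        out.append(acc)
    return out


print(fir([1, 0, 0, 0], [1, 2, 3]))      # impulse response equals the coefficients
print(fir([1, 1, 1, 1], [1, 2, 3]))      # step response
```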
  • 15. Microprocessors
      • Xilinx: IBM PowerPC processors (Virtex-II Pro, Virtex-4 FX, Virtex-5 FX); MicroBlaze soft processors
      • Altera: ARM RISC processors; Nios soft processor
  • 16. Floating point units
      • Not implemented so far
      • Suggested as a way to accelerate scientific computing
      • For engineering, fixed point arithmetic is usually enough
      • Will it happen? ☺ It happened with multipliers, transceivers, DSP blocks, …
      • GPUs already have a strong position in this field
  • 17. Performance
      • Compared to an ASIC: 10 times slower, larger and more power hungry
      • Compared to a microprocessor: fast, depending on potential parallelism and required bandwidth; small and simple, even standalone; reduced power consumption (< 1 W), so they may run on batteries
  • 18. Design effort. Several scenarios:
      • Pure VHDL or Verilog coding: higher flexibility, efficiency and performance; long design time; costly debugging
      • Macros combined with VHDL or Verilog: libraries of IP blocks ease the design process; it is not guaranteed that the required functionality can be found
      • High level languages (DSP logic (Matlab), Impulse-C, Handel-C, …): efficient and simple implementation for simple algorithms; lack of expressiveness for complex algorithms
  • 19. Choices: Xilinx (Virtex, Spartan); Altera (Stratix, Cyclone); others (Actel, Lattice Semiconductor, …)
  • 20. Choices - Xilinx
                               Spartan 3            Spartan 6        Virtex 6
      Logic cells              1728 - 74880         3840 - 147443    74496 - 566784
      Block RAM (Kbits)        12 - 1872            216 - 4824       5616 - 32832
      Multipliers / DSP        4 - 104 / 84 - 126   8 - 180          288 - 2016
      Evaluation board cost    < $200               $300 - $1000     $2000 - $2500
  • 21. In the context of these applications
      • Device choice: logic bounded (standard logic, multipliers) or IO bounded
      • Parallel acquisition: switching memory blocks for acquisition and computation
      • High computing speed: via pipelining
      • Results storage: internal or external memory
      • Power consumption
      • Configuration
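Switching memory blocks between acquisition and computation is often called ping-pong or double buffering. The sketch below is a sequential software model of that idea. The function name, the block size and the `process` callback are assumptions for illustration; in hardware, filling one block and processing the other would happen concurrently.

```python
def ping_pong(samples, block_size, process):
    """Alternate two buffers: one is filled while the other is handed to process.

    Samples left over in a partially filled buffer at the end are not processed.
    """
    buffers = [[], []]
    fill = 0                     # index of the buffer currently being filled
    results = []
    for sample in samples:
        buffers[fill].append(sample)
        if len(buffers[fill]) == block_size:
            ready = fill         # full buffer goes to computation
            fill = 1 - fill      # acquisition switches to the other buffer
            buffers[fill] = []   # its previous contents were already processed
            results.append(process(buffers[ready]))
    return results


# Acquire 8 samples in blocks of 4 and sum each block.
print(ping_pong(range(8), 4, sum))
```

Because acquisition never waits for computation, the scheme sustains the input rate as long as processing one block takes no longer than filling the next.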