Transcript

  • 1. A Reconfigurable Signal Processing IC with embedded FPGA and Multi-Port Flash Memory
    • M. Borgatti, L. Calì, G. De Sandre, B. Forêt, D. Iezzi, F. Lertora, G. Muzzi, M. Pasotti, M. Poles, P.L. Rolandi
    • STMicroelectronics - Central R&D - Italy
  • 2. Outline of Presentation
    • Project motivation and background
    • System architecture
      • Reconfigurable core
      • Memory subsystem
    • System performance
      • Application example: embedded face recognition system
    • Energy efficiency, measurements
    • SoC integration and design flow
      • System-to-RTL and RTL-to-layout
    • Summary
  • 3. Project motivation and background
    • Conflicting industry trends
      • Economics of system integration
        • Ever more complex SoCs
        • More integration
        • Cost effectiveness and performance (per unit)
      • Increasing design complexity and risks
      • Increasing NREs
      • Shorter time-to-market and product life
    • Strong need for:
      • Faster project turnaround
      • Lower risk
    • Usage of re-configurable silicon fabrics
  • 4. Project motivation and background
    • Pragmatic approach proposed:
      • Reconfigurable architecture
      • Joins a statically extensible processor with e-FPGA
      • Tight connection to Flash memory subsystem
      • Open architecture with flexible programmable I/O
    • Programmable platform approach
      • Simple model for programmers
  • 5. Programmable Platform Approach
    • [Flow diagram; labels: System Applications Family, System Application, Silicon process + Enabling technologies, Platform Compilation, Config. Proc + e-FPGA, Application Compilation, Programmable platform]
  • 6. System Architecture
    • [Block diagram; labels: Extensible MPU with 8KB I$ and 8KB D$, Inst. Ext. I/F, e-FPGA, bus bridge, General Purpose I/O lines (GP I/O), I2C bus M/S, I2C Master, AHB I/F, INTs, DMA & FPGA Prog. I/F, Buffer I/F, 1kB Buffer, 64-bit AHB bus, 64-bit APB bus, AHB/APB Bridge, I/O registers, 48 kB SRAM, Flash Mem with FP, CP and DP ports]
  • 7. e-FPGA Purposes
    • Processor ISA extensions
      • Simplest programmer’s model
      • Specific interface to the MPU datapath
      • Impact on processor performance
      • Impact on processor energy efficiency
      • Efficiency limited by instruction stream decoding
    • Bus-mapped co-processor
      • Maximum benefits in speed/power
    • Flexible I/O
  • 8. e-FPGA – Microprocessor interface
    • [Block diagram; labels: Microprocessor clock, e-FPGA Clock, Clock Ctrl, Pipe Control, Decode, Register File, Instruction, Result, Instruction extension, Other FPGA Purposes]
  • 9. Flash Memory Architecture
    • [Block diagram; labels: four 2Mb modules (#0 to #3), each with a 128-bit connection to the Memory Sub-System Crossbar; Code Port (CP, 64-bit), Data Port (DP, 64-bit), FPGA Port (FP, 32-bit); 8-bit uP, uP I/F, PMA, DFT, Power Block]
  • 10. Flash Memory Subsystem
    • Modular approach
      • Customizable array of N independent 2Mb modules
    • 3 content-specific ports (CP, DP, FP)
    • HW support for filesystem implementation (DP)
      • Defrag
      • Compression
      • Virtual erase
    • 2Mb Module features:
      • 128b I/O
      • 40ns access time (400MB/s peak throughput)
      • Power management and arbitration
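The 400MB/s peak figure quoted for a 2Mb module is consistent with one 128-bit word delivered per 40ns access. A minimal sketch of that arithmetic, assuming 1 MB = 10^6 bytes and back-to-back accesses with no arbitration overhead (the function name is illustrative, not from the slides):

```python
# Peak read throughput of one flash module: bytes per access / access time.
WORD_BITS = 128       # module I/O width, from the slide
ACCESS_TIME_NS = 40   # access time, from the slide


def peak_throughput_mb_s(word_bits: int, access_time_ns: float) -> float:
    """Return peak throughput in MB/s, taking 1 MB as 10**6 bytes (assumption)."""
    bytes_per_access = word_bits / 8
    accesses_per_second = 1e9 / access_time_ns
    return bytes_per_access * accesses_per_second / 1e6


module_peak = peak_throughput_mb_s(WORD_BITS, ACCESS_TIME_NS)
print(f"{module_peak:.0f} MB/s")
```

Sustained throughput will be lower whenever accesses are not issued back to back.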
  • 11. System Memory Hierarchy
    • [Block diagram; labels: 32-bit uP, Register File, 64-bit AHB Bus, AHB Bridge, DMA, 64-bit CP I/F, 64-bit DP I/F, 32-bit FPGA PI/F, 512-B Buffer, 64-bit Port CP, 64-bit Port DP, 32-bit Port FP, 2 x 64- + 1 x 32-bit Memory Port I/Fs, 6x4 128-bit Crossbar, 4 x Flash Memory Controller Logic, 4 x 16384 x 128-bit Memory Module]
    • AHB Peak Throughput:
      • 800MB/s
    • e-FPGA
      • 400MB/s
      • (50MB/s sustained)
    • Total Aggregate Peak
      • 1.2GB/s
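One reading that reproduces these peak figures is bus width times clock rate, with one transfer per cycle. The sketch below assumes a 100 MHz clock (the operating point quoted elsewhere in the deck) and a 32-bit e-FPGA port; the slide does not state how the numbers were derived, so this is an illustration rather than the authors' calculation.

```python
CLOCK_MHZ = 100  # assumed operating frequency


def bus_peak_mb_s(width_bits: int, clock_mhz: int) -> int:
    """Peak MB/s for a bus moving one word per clock cycle (assumption)."""
    return (width_bits // 8) * clock_mhz


ahb_peak = bus_peak_mb_s(64, CLOCK_MHZ)    # 64-bit AHB bus
fpga_peak = bus_peak_mb_s(32, CLOCK_MHZ)   # 32-bit e-FPGA port
total_peak = ahb_peak + fpga_peak

print(ahb_peak, fpga_peak, total_peak)
```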
  • 12. Application Ex.: Face Recognition
    • Target application:
      • Recognize a face out of twenty
      • low-resolution images from CMOS cameras
    • Potential applications:
      • Low cost smart toys
      • Advanced human-machine interfaces
      • Color CMOS camera processors
    • Image preprocessing: Bayer filter
    • Face location: based on Hough transform
    • Face recognition: Line-Based
        • Recognition rates over 90 %
        • Scale-invariant
        • Tolerant to changes in illumination intensity
  • 13. Processor Extension (I)
    • [Datapath diagram; labels: Processor Load Unit, 64-bit register, 4-segm. (twice), subtract, multiply and add operators, constants '8' and '16', Result]
    • 8-issue, 8-bit L2 distance
    • Complexity:
      • 23 8-bit OPS
      • 6 64-bit OPS
    • 1GOPS peak throughput
      • Distance computation
    • 10k equiv. ASIC gates
    • Mapped to e-FPGA
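As an illustration of what an 8-issue, 8-bit distance kernel computes, the sketch below treats two 64-bit words as eight unsigned 8-bit lanes and returns the squared L2 distance between them. The lane layout, the function names and the choice to return the squared value are assumptions; the actual e-FPGA datapath is not recoverable from the slide. The operation count in this sketch (8 subtractions, 8 multiplications, 7 additions) does add up to the 23 8-bit operations quoted above, though that correspondence is an inference.

```python
LANES = 8       # eight elements processed together
LANE_BITS = 8   # each element is an unsigned 8-bit value
LANE_MASK = (1 << LANE_BITS) - 1


def unpack(word: int) -> list[int]:
    """Split a 64-bit word into eight 8-bit lanes, least significant first."""
    return [(word >> (LANE_BITS * i)) & LANE_MASK for i in range(LANES)]


def l2_squared_packed(a: int, b: int) -> int:
    """Squared L2 distance between two packed 8x8-bit vectors."""
    diffs = [x - y for x, y in zip(unpack(a), unpack(b))]  # 8 subtractions
    squares = [d * d for d in diffs]                       # 8 multiplications
    return sum(squares)                                    # 7 additions in an adder tree


print(l2_squared_packed(0x0807060504030201, 0x0101010101010101))
```

In hardware all eight lanes would be evaluated in parallel, which is where the gain over a scalar instruction stream comes from.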
  • 14. Processor Extension (II)
    • [Datapath diagram; labels: Number, Remaind., root, shifts >>1, <<1, <<2, >>2 and >>30, operators +, -, +1, +2 and >, Result]
    • Fixed-point square root kernel
    • Complexity:
      • 12 32-bit OPS
    • 2k equiv. ASIC gates
    • Mapped to e-FPGA
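The shift amounts on the slide (<<2, >>30, <<1, +1) suggest a restoring, two-bits-per-step square root. The sketch below is the textbook form of that algorithm for an unsigned 32-bit input, not a reconstruction of the authors' kernel; the function name and the 16-iteration loop are assumptions.

```python
def sqrt32(number: int) -> int:
    """Floor square root of an unsigned 32-bit integer (restoring method)."""
    if not 0 <= number < (1 << 32):
        raise ValueError("expected an unsigned 32-bit value")
    remainder = 0
    root = 0
    for _ in range(16):  # one result bit per pair of input bits
        # Bring the next two most significant input bits into the remainder.
        remainder = (remainder << 2) | (number >> 30)
        number = (number << 2) & 0xFFFFFFFF
        root <<= 1
        trial = (root << 1) + 1
        if remainder >= trial:  # does the next result bit fit?
            remainder -= trial
            root += 1
    return root


print(sqrt32(1_000_000))
```

Each iteration uses only shifts, one comparison, one subtraction and one increment, which is why the kernel maps to a small amount of logic.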
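  • 15. Performance: Processing Time @ 100 MHz
    • Bayer Filter
      • RISC w/ basic DSP: 58 msec
      • RISC w/ basic DSP + uP Ext.: 24.7 msec
      • Speed-Up: x 2.3
    • Edge Detection
      • RISC w/ basic DSP: 4.5 msec
      • RISC w/ basic DSP + uP Ext.: 2.5 msec
      • Speed-Up: x 1.8
    • Face Detection
      • RISC w/ basic DSP: 1.5 sec
      • RISC w/ basic DSP + uP Ext.: 382 msec
      • Speed-Up: x 4
    • Face Recognition (20-face database)
      • RISC w/ basic DSP: 9.15 sec
      • RISC w/ basic DSP + uP Ext.: 860 msec
      • Speed-Up: x 10.6
    • Totals
      • RISC w/ basic DSP: 10.7 sec
      • RISC w/ basic DSP + uP Ext.: 1.26 sec
      • Speed-Up: x 8.5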
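The speed-up figures on this slide can be cross-checked by dividing the baseline time by the time with processor extensions. The sketch assumes the column assignment under which the four stage times sum to the quoted totals (Face Detection at 1.5 sec and 382 msec, Face Recognition at 9.15 sec and 860 msec); variable names are illustrative.

```python
# (baseline, with uP extensions) in milliseconds, as read from the slide.
times_ms = {
    "Bayer filter": (58.0, 24.7),
    "Edge detection": (4.5, 2.5),
    "Face detection": (1500.0, 382.0),
    "Face recognition": (9150.0, 860.0),
}

# Speed-up is the ratio of baseline time to accelerated time.
speedup = {stage: base / ext for stage, (base, ext) in times_ms.items()}

total_base_s = sum(base for base, _ in times_ms.values()) / 1000
total_ext_s = sum(ext for _, ext in times_ms.values()) / 1000

for stage, value in speedup.items():
    print(f"{stage}: x{value:.1f}")
print(f"Totals: {total_base_s:.2f} s vs {total_ext_s:.2f} s")
```

The recomputed values agree with the slide to within rounding of the quoted times.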
  • 16. Energy Efficiency vs. Flexibility
    • [Chart; axes: Flexibility (Coverage) vs. Energy Efficiency (MOPS/mW), scale 0.1 to 1000; regions: Embedded Processors; ASIPs, DSPs; Dedicated HW; annotations: Energy-Flexibility Gap, FPGA-mapped CoProcessors, uP + FPGA Instructions; from: Zhang et al., ISSCC 2000]
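  • 17. Performance: Energy Efficiency
    • Bayer Filter
      • Speed-Up: x 2.3
      • Energy Gain: x 1.4
      • Energy x Delay Gain: x 3.2
    • Edge Detection
      • Speed-Up: x 1.8
      • Energy Gain: x 0.95
      • Energy x Delay Gain: x 1.7
    • Face Detection
      • Speed-Up: x 4
      • Energy Gain: x 2.9
      • Energy x Delay Gain: x 11.6
    • Face Recognition (20-face database)
      • Speed-Up: x 10.6
      • Energy Gain: x 9
      • Energy x Delay Gain: x 95.4
    • Totals
      • Speed-Up: x 8.5
      • Energy Gain: x 6.7
      • Energy x Delay Gain: x 57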
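The energy x delay gain on this slide is the product of the speed-up and the energy gain, since the energy-delay product shrinks by both factors at once. A minimal check using the slide's figures (the stage-to-value assignment is inferred from the flattened table, and the names are illustrative):

```python
# (speed-up, energy gain) per stage, as read from the slide.
gains = {
    "Bayer filter": (2.3, 1.4),
    "Edge detection": (1.8, 0.95),
    "Face detection": (4.0, 2.9),
    "Face recognition": (10.6, 9.0),
    "Totals": (8.5, 6.7),
}

# Energy x delay gain = speed-up * energy gain.
energy_delay_gain = {
    stage: speed * energy for stage, (speed, energy) in gains.items()
}

for stage, value in energy_delay_gain.items():
    print(f"{stage}: x{value:.1f}")
```

An energy gain below 1, as for edge detection, means the extension costs slightly more energy for that stage even though it runs faster.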
  • 18. [Design-flow diagram, slide title not recoverable; labels: SW Apps, C, Functional model (untimed), Partitioning / I/F Synthesis / Refinement, HW/SW Libraries, uP ISS, Cycle Accurate Simulation, Performance Analysis, Soft Hardware (eFPGA), VHDL (e-FPGA), eFPGA mapping, Inst. Ext., HW (RTL), Verilog, uP, AHB/APB Bus, Peripherals, eFPGA HARD MACRO, SoC Integration]
  • 19. [Design-flow diagram, slide title not recoverable; labels: Inst. Ext., Coproc., I/O I/F, Synthesis, Mapping (P&R), Bit-stream, FPGA Timing DB, eFPGA core, Con. Netlist + Timing Database, CPU core, IPs, Interface RTL code, Flash, RAM, Synthesis, Floorplanning / P & R, Static Timing Analysis, Dynamic Verification, Static Timing Analysis (SoC + eFPGA), Silicon fab]
  • 20. Chip Layout
    • [Layout figure; labeled blocks: 48 KB SRAM, Buffer, Embedded FPGA, Tags, 8+8 KB I$ + D$, 32b uP + AHB & APB + 250k gates, 1MB Flash Memory, DFT, Flash Ports, Buffers]
    • Process: 0.18um CMOS 2P/6M Embedded Flash
    • Flash Memory (x4): 256kB x 9 sectors, 128-bit word, 1MB/s write throughput, 400MB/s read throughput
    • SRAM Memory: Main: 48kB (64-bit), I$: 8kB (64-bit), D$: 8kB (64-bit), Buffers: 4x256B
    • Chip size: 8.4 x 8.4 mm2 (e-FPGA size: 8.2 mm2)
    • I/O: 24 inputs + 24 outputs (tristate) + 8 bidirs
    • Supply: 2.7-3.6V (external), 1.8V (core)
  • 21. Chip Performances and Power Consumption
    • Processor maximum speed: 125MHz (WCMIL)
    • Reconfiguration speed: 500us @ 100MHz clock
    • Chip average power consumption: 300mW @ 100MHz, 1.8V
  • 22. Summary
    • e-FPGAs allow architectural tradeoffs for reconfigurable embedded systems:
      • Processor ISA extensions
      • Bus-mapped co-processor
      • Flexible I/O
    • Modular, content-specific, multiport e-Flash
    • Performance figures:
      • Up to 10x speedup
      • Up to 9x energy reduction
      • Dynamic reconfiguration in 500 us
    • Specific design-flow for system and RTL
  • 23. Acknowledgements
    • The authors thank all the colleagues of the NVM-DP Dept., A. Maurelli, F. Piazza and L. Fumagalli.