A Reconfigurable Signal Processing IC with embedded FPGA and ...

Transcript

  • 1. A Reconfigurable Signal Processing IC with embedded FPGA and Multi-Port Flash Memory
      M. Borgatti, L. Calì, G. De Sandre, B. Forêt, D. Iezzi, F. Lertora, G. Muzzi, M. Pasotti, M. Poles, P.L. Rolandi
      STMicroelectronics - Central R&D - Italy
  • 2. Outline of Presentation
      • Project motivation and background
      • System architecture
          • Reconfigurable core
          • Memory subsystem
      • System performance
          • Application example: embedded face recognition system
      • Energy efficiency, measurements
      • SoC integration and design flow
          • System 2 RTL and RTL 2 Layout
      • Summary
  • 3. Project motivation and background
      • Conflicting industry trends
          • Economics of system integration
              • Even more complex SoC
              • More integration
              • Cost effectiveness and performance (per unit)
          • Increasing design complexity and risks
          • Increasing NREs
          • Shorter time-to-market and product life
      • Strong need for:
          • Faster project turnaround
          • Lower risk
      • Usage of re-configurable silicon fabrics
  • 4. Project motivation and background
      • Pragmatic approach proposed:
          • Reconfigurable architecture
          • Joins a statically extensible processor with e-FPGA
          • Tight connection to Flash memory subsystem
          • Open architecture with flexible programmable I/O
      • Programmable platform approach
          • Simple model for programmers
  • 5. Programmable Platform Approach
      [Figure: programmable platform flow. Labels: system applications family; system application; silicon process + enabling technologies; platform compilation; configurable processor + e-FPGA; application compilation; programmable platform.]
  • 6. System Architecture
      [Block diagram. Labels: extensible MPU with 8KB I$, 8KB D$ and instruction extension interface; e-FPGA; bus bridge; general purpose I/O lines (GP I/O); I2C bus (master/slave), I2C master; AHB I/F; interrupts; DMA and FPGA programming I/F; buffer I/F with 1kB buffer; 64-bit AHB bus; 64-bit APB bus; AHB/APB bridge; I/O registers; 48 kB SRAM; Flash memory with FP, CP and DP ports.]
  • 7. e-FPGA Purposes
      • Processor ISA extensions
          • Simplest programmer's model
          • Specific interface to the MPU datapath
          • Impact on processor performance
          • Impact on processor energy efficiency
          • Efficiency limited by instruction stream decoding
      • Bus-mapped co-processor
          • Maximum benefits in speed/power
      • Flexible I/O
  • 8. e-FPGA – Microprocessor interface
      [Block diagram. Labels: microprocessor pipeline (decode, register file, pipe control, instruction, result); instruction extension; other FPGA purposes; clock control; microprocessor clock; e-FPGA clock.]
  • 9. Flash Memory Architecture
      [Block diagram. Labels: four 2Mb modules (#0 to #3), each with a 128-bit interface; 128-bit memory sub-system crossbar; Code Port (CP, 64-bit), Data Port (DP, 64-bit), FPGA Port (FP, 32-bit); 8-bit µP and µP I/F; PMA; DFT; power block.]
  • 10. Flash Memory Subsystem
      • Modular approach
          • Customizable array of N independent 2Mb modules
      • 3 content-specific ports (CP, DP, FP)
      • HW support for filesystem implementation (DP)
          • Defrag
          • Compression
          • Virtual erase
      • 2Mb module features:
          • 128b I/O
          • 40ns access time (400MB/s peak throughput)
          • Power management and arbitration
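The 400MB/s peak figure quoted for one 2Mb module follows from its word width and access time. A minimal sketch of that arithmetic, assuming one full 128-bit word is delivered per 40ns access (the function name is illustrative and not from the slides):

```python
def module_peak_mb_per_s(word_bits, access_time_ns):
    """Peak read throughput of one flash module in MB/s.

    Assumes one full word is delivered per access.
    bytes per ns equals GB/s, so multiplying by 1000 gives MB/s.
    """
    bytes_per_access = word_bits // 8
    return bytes_per_access * 1000 / access_time_ns


# Slide figures: 128-bit word, 40 ns access time.
print(module_peak_mb_per_s(128, 40))
```

With the slide's numbers this evaluates to 16 bytes every 40ns, which is the quoted 400MB/s.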
  • 11. System Memory Hierarchy
      [Block diagram. Labels: 4 x 16384 x 128-bit memory modules; 4 x Flash memory controller logic; 6x4 128-bit crossbar; 2 x 64-bit + 1 x 32-bit memory port I/Fs; 64-bit port CP with 64-bit CP I/F; 64-bit port DP with 64-bit DP I/F; 32-bit port FP with 32-bit FPGA programming I/F; 512-B buffer; DMA; AHB bridge; 64-bit AHB bus; 32-bit uP register file.]
      • AHB peak throughput:
          • 800MB/s
      • e-FPGA
          • 400MB/s
          • (50MB/s sustained)
      • Total aggregate peak
          • 1.2GB/s
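The peak figures on this slide are consistent with one transfer per clock cycle on each interface. The sketch below reproduces them under two assumptions that the slide does not state explicitly: a 100 MHz clock (the frequency quoted elsewhere in the deck for timing and power) and one full-width transfer per cycle. The names are illustrative.

```python
CLOCK_MHZ = 100  # assumption: same 100 MHz clock quoted for the timing results


def bus_peak_mb_per_s(width_bits, clock_mhz=CLOCK_MHZ):
    """Peak throughput in MB/s, assuming one full-width transfer per cycle."""
    return (width_bits // 8) * clock_mhz


ahb_peak = bus_peak_mb_per_s(64)    # 64-bit AHB bus
fpga_peak = bus_peak_mb_per_s(32)   # 32-bit e-FPGA port
total_peak = ahb_peak + fpga_peak   # aggregate peak over both paths

print(ahb_peak, fpga_peak, total_peak)
```

Under these assumptions the two paths give 800MB/s and 400MB/s, and their sum matches the 1.2GB/s aggregate. The 50MB/s sustained figure for the e-FPGA port is a measured value from the slide and is not derived here.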
  • 12. Application Ex.: Face Recognition
      • Target application:
          • Recognize a face out of twenty
          • Low-resolution images from CMOS cameras
      • Potential applications:
          • Low cost smart toys
          • Advanced human-machine interfaces
          • Color CMOS camera processors
      • Image preprocessing: Bayer filter
      • Face location: based on Hough transform
      • Face recognition: line-based
          • Recognition rates over 90%
          • Scale-invariant
          • Tolerant to changes in illumination intensity
  • 13. Processor Extension (I)
      [Datapath diagram. Labels: processor load unit; 64-bit register; subtract, multiply and add stages; 8-bit and 16-bit widths; two 4-segment operand groups; result.]
      • 8-issue, 8-bit L2 distance
      • Complexity:
          • 23 8-bit OPS
          • 6 64-bit OPS
      • 1GOPS peak throughput
          • Distance computation
      • 10k equiv. ASIC gates
      • Mapped to e-FPGA
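As a software reference for what an 8-lane, 8-bit L2 distance instruction computes, the sketch below treats each 64-bit operand as eight packed unsigned bytes and returns the sum of squared differences. The packing order, the function names, and the choice to return the squared distance (leaving the root to a separate step) are assumptions made for illustration; the slide does not describe the mapped datapath at this level.

```python
def unpack8(word):
    """Split a 64-bit word into eight unsigned 8-bit lanes (lane 0 = lowest byte)."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


def l2_distance_squared(a_word, b_word):
    """Sum of squared lane differences between two packed 64-bit operands."""
    diffs = [x - y for x, y in zip(unpack8(a_word), unpack8(b_word))]  # 8 subtractions
    squares = [d * d for d in diffs]                                   # 8 multiplications
    return sum(squares)                                                # 7 additions


print(l2_distance_squared(0x0102030405060708, 0x0000000000000000))
```

Counted this way the kernel uses 8 subtractions, 8 multiplications and 7 additions, which is one possible reading of the "23 8-bit OPS" figure; the slide itself does not give the breakdown.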
  • 14. Processor Extension (II)
      [Datapath diagram. Labels: number, remainder and root registers; shifts (>>1, <<1, <<2, >>2, >>30); add, subtract, +1 and +2 stages; comparator; result.]
      • Fixed-point square root kernel
      • Complexity:
          • 12 32-bit OPS
      • 2k equiv. ASIC gates
      • Mapped to e-FPGA
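The number, remainder and root registers and the shift amounts in the diagram resemble the textbook digit-by-digit (restoring) square root, which consumes two input bits per step. The sketch below is that textbook algorithm for a 32-bit unsigned input, offered as a plausible software model only; it is not taken from the slides and may differ from the kernel actually mapped to the e-FPGA.

```python
def isqrt32(number):
    """Integer square root of a 32-bit unsigned value, two input bits per step."""
    number &= 0xFFFFFFFF
    remainder = 0
    root = 0
    for _ in range(16):
        # Bring the next two most significant bits of the input into the remainder.
        remainder = (remainder << 2) | (number >> 30)
        number = (number << 2) & 0xFFFFFFFF
        root <<= 1
        trial = (root << 1) + 1  # amount to subtract if the next root bit is 1
        if remainder >= trial:
            remainder -= trial
            root += 1
    return root


print(isqrt32(1000000))
```

Each of the 16 iterations uses only shifts, one comparison, one subtraction and an increment, which is why this style of algorithm maps well to a small amount of reconfigurable logic.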
  • 15. Performance: Processing Time @ 100 MHz
      Algorithm stage             | Bayer filter | Edge detection | Face detection | Face recognition (20-face database) | Totals
      RISC w/ basic DSP           | 58 ms        | 4.5 ms         | 1.5 s          | 9.15 s                              | 10.7 s
      RISC w/ basic DSP + uP ext. | 24.7 ms      | 2.5 ms         | 382 ms         | 860 ms                              | 1.26 s
      Speed-up                    | x 2.3        | x 1.8          | x 4            | x 10.6                              | x 8.5
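The speed-up values on this slide can be cross-checked against the reported processing times. A small sketch follows, with the times converted to milliseconds and one tuple per column of the timing table; the 5% tolerance is an arbitrary allowance for rounding on the slide.

```python
# (baseline_ms, with_extensions_ms, reported_speedup), one tuple per table column
TIMINGS = [
    (58.0, 24.7, 2.3),
    (4.5, 2.5, 1.8),
    (1500.0, 382.0, 4.0),
    (10700.0, 1260.0, 8.5),
    (9150.0, 860.0, 10.6),
]


def speedup(baseline_ms, extended_ms):
    """Ratio of processing time without and with the processor extensions."""
    return baseline_ms / extended_ms


def matches_reported(timings, rel_tol=0.05):
    """True if every computed speed-up is within rel_tol of the reported value."""
    return all(abs(speedup(b, e) - r) / r <= rel_tol for b, e, r in timings)


print(matches_reported(TIMINGS))
```

All five reported factors agree with the ratio of the two timing rows to within rounding.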
  • 16. Energy Efficiency vs. Flexibility
      [Figure, from Zhang et al., ISSCC 2000: energy efficiency (MOPS/mW, log scale from 0.1 to 1000) against flexibility (coverage). Regions shown: embedded processors; ASIPs and DSPs; dedicated HW; the energy-flexibility gap, with FPGA-mapped co-processors and uP + FPGA instructions placed in it.]
  • 17. Performance: Energy Efficiency
      Algorithm stage       | Bayer filter | Edge detection | Face detection | Face recognition (20-face database) | Totals
      Speed-up              | x 2.3        | x 1.8          | x 4            | x 10.6                              | x 8.5
      Energy gain           | x 1.4        | x 0.95         | x 2.9          | x 9                                 | x 6.7
      Energy x delay gain   | x 3.2        | x 1.7          | x 11.6         | x 95.4                              | x 57
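By definition, the energy x delay gain is the product of the energy gain and the speed-up, since (E_old x T_old) / (E_new x T_new) = (E_old / E_new) x (T_old / T_new). The sketch below checks the slide's figures against that identity, one tuple per column of the table; the 2% tolerance is an arbitrary allowance for rounding.

```python
# (speedup, energy_gain, reported_energy_delay_gain), one tuple per table column
GAINS = [
    (2.3, 1.4, 3.2),
    (1.8, 0.95, 1.7),
    (4.0, 2.9, 11.6),
    (8.5, 6.7, 57.0),
    (10.6, 9.0, 95.4),
]


def energy_delay_gain(speedup, energy_gain):
    """Energy x delay gain as the product of speed-up and energy gain."""
    return speedup * energy_gain


def consistent(gains, rel_tol=0.02):
    """True if every reported product is within rel_tol of speedup * energy_gain."""
    return all(abs(energy_delay_gain(s, e) - r) / r <= rel_tol for s, e, r in gains)


print(consistent(GAINS))
```

An energy gain below 1, as in the x 0.95 entry, means that stage used slightly more energy with the extension even though it ran faster.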
  • 18. [Design-flow diagram. Labels: functional model (untimed), C; HW/SW partitioning, I/F synthesis, refinement; libraries; SW apps; uP ISS; HW (RTL), Verilog: uP, AHB/APB bus, peripherals; soft hardware (eFPGA), VHDL: instruction extensions; eFPGA mapping; eFPGA hard macro; SoC integration; cycle-accurate simulation; performance analysis.]
  • 19. [Design-flow diagram. Labels: instruction extension, co-processor and I/O I/F RTL code; synthesis; mapping (P&R) on the eFPGA core; bit-stream; FPGA timing DB; CPU core, IPs, interface RTL code, Flash, RAM; synthesis; floorplanning / P&R; connectivity netlist + timing database; static timing analysis and dynamic verification; static timing analysis (SoC + eFPGA); silicon fab.]
  • 20. Chip Layout
      [Die photo. Labels: 1MB Flash memory with Flash ports and DFT; embedded FPGA; 48 KB SRAM; buffers; 8+8 KB I$ + D$ with tags; 32b uP + AHB & APB + 250k gates.]
      • Process: 0.18um CMOS 2P/6M embedded Flash
      • Flash memory (x4): 256kB x 9 sectors, 128-bit word, 1MB/s write throughput, 400MB/s read throughput
      • SRAM memory: main 48kB (64-bit), I$ 8kB (64-bit), D$ 8kB (64-bit), buffers 4x256B
      • Chip size: 8.4 x 8.4 mm2 (e-FPGA size: 8.2 mm2)
      • I/O: 24 inputs + 24 outputs (tristate) + 8 bidirs
      • Supply: 2.7-3.6V (external), 1.8V (core)
  • 21. Chip Performances and Power Consumption
      • Processor maximum speed: 125MHz (WCMIL)
      • Reconfiguration speed: 500 µs @ 100MHz clock
      • Chip average power consumption: 300mW @ 100MHz, 1.8V
  • 22. Summary
      • e-FPGAs allow architectural tradeoffs for reconfigurable embedded systems:
          • Processor ISA extensions
          • Bus-mapped co-processor
          • Flexible I/O
      • Modular, content-specific, multiport e-Flash
      • Performance figures:
          • Up to 10x speedup
          • Up to 9x energy reduction
          • Dynamic reconfiguration in 500 µs
      • Specific design flow for system and RTL
  • 23. Acknowledgements
      The authors thank all the colleagues of the NVM-DP Dept., A. Maurelli, F. Piazza and L. Fumagalli.
