Conference on Adaptive Hardware and Systems (AHS'14) - The DSP for FlexTiles

The FP7 FlexTiles project uses DSP accelerators, which are connected to each other and to the general-purpose processors (GPPs) through a Network-on-Chip (NoC). These slides describe the DSP accelerator in detail.

Published in: Engineering

  1. 1. FlexTiles Workshop at AHS’2014 conference: FlexTiles FP7 project (www.flextiles.eu)
      Low-Power DSP Accelerator Embedded in a Heterogeneous Many-Core Architecture
      Marc MORGAN, CSEM (Swiss Center for Electronics and Microtechnology)
  2. 2. CSEM overview on a single slide
      • private company, founded in the 1980s, not for profit
      • approx. 400 employees on 5 sites in Switzerland (HQ in Neuchatel) and a site in Brazil
      • 5 research programs:
        1. ultra-low power integrated systems (SoC, Vision, Wireless)
        2. systems engineering (med tech, instrumentation, automation)
        3. MEMS
        4. surface engineering (nano, bio, printable electronics)
        5. photovoltaic
      • approx. 70 MCHF annual budget
      • over 20 start-ups and spin-offs since 1995
      Slide footer: The information contained in this document and any attachments are the property of FlexTiles consortium. You are hereby notified that any review, dissemination, distribution, copying or otherwise use of this document must be done in accordance with the CA of the project (TRT/DJ/624412785.2011). Template version 1.0
  3. 3. Many-core architecture: GPPs + accelerators
      • An array of general purpose processors (GPP)
      • Connected via a Network-on-Chip (NoC)
      • Complemented with accelerators to optimize speed and power:
        • DSP processors or specialized logic implemented in embedded FPGA
      • Plus memory nodes and I/O
  4. 4. Many-core architecture: GPPs + accelerators (cont’d)
      • Several IPs are available for the building blocks
        • both in the consortium and on the market
        • architectural choices attempt to retain genericity of the platform
      • CSEM provides an ultra-low power DSP processor for the DSP accelerator
      • It plugs into a generic accelerator interface (AI)
  5. 5. Accelerator interface (AI)
      • Interfaces the NoC’s NI to the accelerator by providing services:
        • programming, control/status, data in, data out, debug
        • DMA access, word FIFOs, notification
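
As a rough behavioural sketch of the word FIFOs and the notification service listed for the accelerator interface, the following Python model shows a bounded FIFO that calls a hook whenever a word arrives. The class name `WordFifo`, the 32-bit word width, the depth and the `on_data` hook are assumptions made for illustration; the slides do not describe the actual AI signals or registers.

```python
from collections import deque
from typing import Callable, Optional


class WordFifo:
    """Bounded FIFO of 32-bit words (hypothetical model, not the real AI)."""

    def __init__(self, depth: int, on_data: Optional[Callable[[], None]] = None):
        self.depth = depth
        self.words = deque()
        self.on_data = on_data  # hypothetical notification hook

    def push(self, word: int) -> bool:
        """Store one word; return False (back-pressure) when the FIFO is full."""
        if len(self.words) >= self.depth:
            return False
        self.words.append(word & 0xFFFFFFFF)  # keep only the low 32 bits
        if self.on_data is not None:
            self.on_data()  # tell the consumer that data is available
        return True

    def pop(self) -> Optional[int]:
        """Return the oldest word, or None when the FIFO is empty."""
        return self.words.popleft() if self.words else None


notifications = []
fifo = WordFifo(depth=2, on_data=lambda: notifications.append("data"))
# The third push is refused because the FIFO only holds two words.
accepted = [fifo.push(w) for w in (0x11, 0x1_0000_0022, 0x33)]
```

Returning `False` on a full FIFO is one simple way to model back-pressure; a hardware FIFO would typically expose a full flag instead.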
  6. 6. DSP accelerator architecture
      • Choices for the DSP accelerator avoid DSP-specific features
        • the DSP will not run an OS or kernel
        • the DSP will not use (or at least not require) interrupts
        • Note: CSEM’s icyflex4 ULP DSP could support both of the above
      • Implement a FIFO manager to handle input and output tokens from/to the accelerator interface (AI)
      • Implement debug and tracing facilities
        • Debug: JTAG 1149.1 TAP
        • Tracing: programmable tracing unit
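
Since the DSP runs without an OS and does not rely on interrupts, token handling can be pictured as a simple polling loop. The sketch below is only an illustration of that idea: `run_polling_loop`, the `kernel` callback and the `max_polls` bound are hypothetical names, and the real FIFO manager is a hardware block whose behaviour the slides do not detail.

```python
from collections import deque


def run_polling_loop(fifo_in: deque, fifo_out: deque, kernel, max_polls: int) -> int:
    """Poll the input FIFO instead of waiting for an interrupt.

    Returns the number of tokens processed. `max_polls` bounds the loop so
    that the sketch terminates; a real accelerator would loop until stopped.
    """
    processed = 0
    for _ in range(max_polls):
        if not fifo_in:  # nothing to do in this polling round
            continue
        token = fifo_in.popleft()
        fifo_out.append(kernel(token))
        processed += 1
    return processed


# Hypothetical kernel: scale every sample in a token by 2.
tokens_in = deque([[1, 2, 3], [4, 5]])
tokens_out = deque()
count = run_polling_loop(tokens_in, tokens_out, lambda t: [2 * x for x in t], max_polls=5)
```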
  7. 7. DSP accelerator architecture (cont’d)
  8. 8. Management of the DSP accelerator
      • Each accelerator is managed by software running on GPPs
        • virtualization manager: allocation of the accelerator
        • resource manager: control of the accelerator
      • These managers are in charge of:
        • transferring the application (ELF) to the accelerator
        • signaling the accelerator when to start and when to stop
        • collecting usage statistics from the accelerator to optimize the execution of the application on the many-core platform
      • The tracing unit can be managed from the processor or from the JTAG interface
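
The manager duties on this slide (transfer the ELF image, start, stop, read back statistics) can be pictured as a small state machine. The following is a minimal sketch under stated assumptions: `AcceleratorModel`, its states and the `tokens_processed` counter are invented for illustration and do not reflect the FlexTiles manager API.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()     # no application loaded
    LOADED = auto()   # ELF image transferred
    RUNNING = auto()  # accelerator started


class AcceleratorModel:
    """Hypothetical stand-in for one DSP accelerator as seen from a GPP."""

    def __init__(self):
        self.state = State.IDLE
        self.image = b""
        self.tokens_processed = 0  # example usage statistic

    def load(self, elf_image: bytes) -> None:
        if self.state is State.RUNNING:
            raise RuntimeError("stop the accelerator before loading")
        self.image = elf_image
        self.state = State.LOADED

    def start(self) -> None:
        if self.state is not State.LOADED:
            raise RuntimeError("no application loaded")
        self.state = State.RUNNING

    def process(self, n_tokens: int) -> None:
        if self.state is not State.RUNNING:
            raise RuntimeError("accelerator is not running")
        self.tokens_processed += n_tokens

    def stop(self) -> dict:
        """Stop the accelerator and return the statistics gathered so far."""
        if self.state is not State.RUNNING:
            raise RuntimeError("accelerator is not running")
        self.state = State.LOADED
        return {"tokens_processed": self.tokens_processed}


acc = AcceleratorModel()
acc.load(b"\x7fELF...")  # placeholder bytes, not a valid ELF file
acc.start()
acc.process(3)
stats = acc.stop()
```

Rejecting out-of-order calls with an exception keeps the sketch short; it also makes the expected order of operations (load, start, stop) explicit.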
  9. 9. Ultra low-power (ULP) processors at CSEM
      • CSEM was founded in the 1980s to promote innovation
      • 1980s, initially for the Swiss watch industry
        • ULP 4-bit processors: PUNCH, µPUS, Combo, ...
      • 1990s, development of a general purpose ULP 8-bit processor:
        • CoolRISC: licensed to Swatch group, TI, Semtech, ...
      • 2000s, powerful new ULP processors with DSP features
        • 2006: icyflex1, a flexible processor for DSP/control apps
        • 2009: icyflex2, a smaller processor for control applications
        • 2009: icyflex4, a scalable processor for DSP/control apps
      • icyflex is a registered trademark of CSEM
  10. 10. icyflex family of ultra-low power processors
      [Figure: icyflex2, icyflex1 and icyflex4 placed by application (from control to DSP) against computing power]
      • Processor labels: icyflex2, icyflex1, icyflex4
      • Computing power labels: 1 MUL, 2 MAC, 4 MAC … 36 MAC (12 MAC marked)
      • Power labels: 6 µW/MHz, 25 µW/MHz, 10-150 µW/MHz
      • Power indicated for TSMC 65 nm CMOS
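
A µW/MHz figure turns into a power estimate once a clock frequency is chosen. The short sketch below does that arithmetic for the 10-150 µW/MHz range, assuming that this range refers to the scalable icyflex4 and using a hypothetical 50 MHz clock that does not come from the slides. It is a first-order estimate that ignores leakage and memory power.

```python
def dynamic_power_uw(uw_per_mhz: float, clock_mhz: float) -> float:
    """First-order estimate: power scales linearly with clock frequency."""
    return uw_per_mhz * clock_mhz


CLOCK_MHZ = 50.0  # hypothetical operating frequency, not from the slides
low = dynamic_power_uw(10.0, CLOCK_MHZ)    # lower end of the 10-150 µW/MHz range
high = dynamic_power_uw(150.0, CLOCK_MHZ)  # upper end of the range
print(f"{low / 1000:.1f} mW to {high / 1000:.1f} mW")
```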
  11. 11. icyflex software development kit
      • GNU C compiler (gcc) v 4.6.3
        • icyflex instruction parallelism supported by latest releases of gcc
        • libc and libm from RedHat’s NewLib
        • software implementation of IEEE floating-point standard
      • GNU assembler / linker (binutils), v 2.20
        • BFD / ELF32 object file format
        • Binary, SREC, IHEX memory image file formats
      • GNU debugger (gdb), v 6.7.1
        • Mode 1: instruction set simulator of the icyflex core
        • Mode 2: On-Chip Debug (OCD) through a JTAG interface
      • icyflex instruction set simulator (ISS), written in C++
        • Phase-accurate, pipelined
        • Wrappers to SystemC, VHDL (Modelsim), Matlab/Simulink
      • Eclipse integrated development environment, v Helios
        • CDT C/C++ IDE plug-in
        • icyflex plug-in
      [Tool-flow diagram labels: files .c, .o, .exe, .log; tools gcc, ld, gdb]
  12. 12. icyflex2 vs icyflex4
      Feature                                        icyflex2            icyflex4 (VPS=2)
      Optimized for                                  Control             DSP
                                                     P, X, Y memory buses, ISA, HW loops, saturation, …
      Instruction word [bits]                        32 (1 or 2 sub)     64 (1, 2 or 3 sub)
      Memory access [bits]                           8, 16 or 32         2x (8, 16, 32, 64, 128)
      Data processing [bits]                         16 or 32, trunc     2x (16 or 32 or 64), full
      Single Instr. Multiple Data (SIMD)             No                  Yes, up to 8 MAC
      Instruction set is reconfigurable on the fly   No                  Yes
      Software Development Kit (SDK)                 GNU-based tool suite (gcc, gdb) + cycle-accurate instruction set simulator (ISS)
      Hardware Devt Kit (HDK)                        FPGA-based, customizable
      VPS = Vector Processing Slices in the Vector Processing Unit of the DSP
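
Saturation, listed among the DSP features in the comparison, means that an accumulator clamps at its limits instead of wrapping around. The sketch below contrasts the two behaviours for a multiply-accumulate (MAC) chain. It is a generic illustration: the 32-bit accumulator width and the function names are assumptions, and the slides do not specify the icyflex accumulator format.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def saturate32(value: int) -> int:
    """Clamp to the signed 32-bit range instead of wrapping around."""
    return max(INT32_MIN, min(INT32_MAX, value))


def wrap32(value: int) -> int:
    """Two's-complement wrap-around, as plain 32-bit arithmetic would do."""
    return (value + 2**31) % 2**32 - 2**31


def mac_saturating(acc: int, xs, ys) -> int:
    """Multiply-accumulate with saturation after every step."""
    for x, y in zip(xs, ys):
        acc = saturate32(acc + x * y)
    return acc


# Three products of the largest positive 16-bit value overflow 32 bits.
xs = [32767, 32767, 32767]
ys = [32767, 32767, 32767]
sat = mac_saturating(0, xs, ys)
wrapped = wrap32(sum(x * y for x, y in zip(xs, ys)))
```

With these inputs the saturating result stays at the largest positive 32-bit value, while the wrapped result becomes negative, which is the kind of sign flip that saturation is meant to prevent in signal processing.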
  13. 13. icyflex: reconfigurable instructions and addressing modes
      [Diagram: the instruction set contains fixed instructions (ADD, MUL, SHR, MAC, JUMP) and blank, configurable instructions that are configured at run-time. Instruction decoding drives the datapath blocks (SHIFT, MUX, ALU, ACC).]
      • cycle N: config MOP
      • cycle N+1: use MOP
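
The idea of blank instructions that are configured in one cycle and used in the next can be pictured with a toy dispatch table. This is only a conceptual sketch: the class `ConfigurableIsa`, the slot names `CFG0` and `CFG1`, and the example operation are hypothetical, and the model says nothing about how the icyflex hardware encodes or stores a configuration.

```python
import operator


class ConfigurableIsa:
    """Toy model: fixed opcodes plus blank slots bound at run-time."""

    def __init__(self, blank_slots):
        # Some of the fixed instructions named on the slide (behaviour simplified).
        self.ops = {
            "ADD": operator.add,
            "MUL": operator.mul,
            "SHR": operator.rshift,
        }
        self.blank = set(blank_slots)  # hypothetical slot names

    def configure(self, slot, func):
        """Cycle N: bind a blank opcode slot to an operation."""
        if slot not in self.blank:
            raise KeyError(f"{slot} is not a configurable slot")
        self.ops[slot] = func

    def execute(self, opcode, a, b):
        """Cycle N+1 onwards: the configured slot decodes like any opcode."""
        return self.ops[opcode](a, b)


isa = ConfigurableIsa(blank_slots=["CFG0", "CFG1"])
# Hypothetical custom operation: add, then shift right by one (average).
isa.configure("CFG0", lambda a, b: (a + b) >> 1)
result_fixed = isa.execute("ADD", 6, 10)
result_custom = isa.execute("CFG0", 6, 10)
```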
  14. 14. DSP in FlexTiles emulators
      • Emulator 1 (software):
        • Using Open Virtual Platform (OVP)
        • Not cycle accurate
        • The icyflex4 DSP is emulated by a GPP running at a higher frequency
      • Emulator 2 (hardware):
        • Using an FPGA board with two Xilinx Virtex6 FPGAs
        • Uses a DFF version of the DSP accelerator
  15. 15. Exploitation of FlexTiles results at CSEM
      • CSEM specializes in low power solutions
      • A well-balanced multi-processor design can optimize energy consumption by reducing voltage and frequency
      • For multi-core: we offer CSEM solutions
      • For many-core: CSEM collaborates with 1 or more of our partners
        • including e.g. a follow up project to produce FlexTiles chips
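
The claim about voltage and frequency follows from the standard first-order model of dynamic power in CMOS logic, which is textbook material rather than something stated on the slides. With switching activity \alpha, switched capacitance C, supply voltage V and clock frequency f, and assuming a workload that splits evenly over two cores running at f/2 with a reduced supply voltage V_low:

```latex
\begin{align}
P_{\mathrm{1\,core}} &\approx \alpha\, C\, V^{2} f \\
P_{\mathrm{2\,cores}} &\approx 2\,\alpha\, C\, V_{\mathrm{low}}^{2}\,\frac{f}{2}
                      = \alpha\, C\, V_{\mathrm{low}}^{2} f \\
\frac{P_{\mathrm{2\,cores}}}{P_{\mathrm{1\,core}}} &\approx \left(\frac{V_{\mathrm{low}}}{V}\right)^{2}
\end{align}
```

For example, if halving the frequency allowed the supply to drop to 0.8 V from 1.0 V (illustrative values), dynamic power for the same throughput would fall to about 0.64 of the single-core figure. The estimate ignores leakage, communication over the NoC and imperfect parallelization.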
  16. 16. FlexTiles FP7 project
      • For more information regarding the FlexTiles project, visit: http://www.flextiles.eu
      • Please take 5 minutes to fill out the survey on the project web site under the Contact menu
      • The FlexTiles project is funded in part by FP7, the seventh framework programme of the European Commission.
  17. 17. Thank you for your attention!
      • For more information: http://www.csem.ch and www.flextiles.eu
      • Questions? mailto:marc.morgan@csem.ch
