High-Level Synthesis with GAUT

Dr. Philippe COUSSY from the Université de Bretagne-Sud gives an introduction to high-level synthesis principles and tools.

Published in: Technology, Education

  1. High-Level Synthesis with GAUT
     Université de Bretagne-Sud, Lab-STICC
     Philippe COUSSY, philippe.coussy@univ-ubs.fr
  2. Outline
     - Lab-STICC
     - High-Level Synthesis
     - GAUT
     - Experimental results
     - Conclusion
  3. Université de Bretagne Sud
     [Map: Europe, France, Brittany]
  4. Lab-STICC Laboratory
     “Laboratoire des Sciences et Techniques de l’Information, de la Communication et de la Connaissance”
     From sensor to knowledge: communicate and decide
     - Lab-STICC is composed of 4 research centers: Brest (2), Lorient (1), Vannes (1)
     - Lab-STICC represents:
       - A staff of 400 people
       - 182 researchers
       - 178 PhDs
       - More than 6.5 M€ of research grants for 2010
       - Scientific production during the last 4 years: 65 books or chapters, 497 journal publications, 1012 conference publications, 26 new patents, 69 PhD diplomas
  5. Lab-STICC organization
     Lab-STICC is organized into three research domains (3 departments):
     - Dept. MOM: Microwaves and Material. Head: Patrick Queffelec. 36 faculty members
     - Dept. CID: Knowledge, Information, Decision. Head: Gilles Coppin. 22 faculty members
     - Dept. CACS: Communications, Architectures, Circuits/Systems. Head: Emmanuel Boutillon. 59 faculty members
  7. Typical HW design flow
     Starting from a Register Transfer Level description, generate an IC layout.
     Flow: RTL -> Logic synthesis -> Gate-level netlist -> Layout -> GDSII
  8. Typical HW design flow
     Starting from a functional description, automatically generate an RTL architecture.
     Flow: Algorithm -> High-level synthesis -> RTL -> Logic synthesis -> Gate-level netlist -> Layout -> GDSII
     High-level synthesis also feeds SystemC simulation models (CABA/TLM) for virtual prototyping.
     Example algorithm:

         #define N 2
         typedef int matrix[N][N];

         /* Top-level function to synthesize (named main, as on the slide): C = A * B. */
         int main(const matrix A, matrix C)
         {
           const matrix B = {{1, 2}, {3, 4}};
           int tmp;
           int i, j, k;

           for (i = 0; i < N; i++)
             for (j = 0; j < N; j++) {
               tmp = A[i][0] * B[0][j];              /* first product */
               for (k = 1; k < N - 1; k++)
                 tmp = tmp + A[i][k] * B[k][j];      /* middle products */
               C[i][j] = tmp + A[i][N-1] * B[N-1][j]; /* last product */
             }
           return 0;
         }
  9. Electronic System Level Design (ESLD)
     [Figure: circuit complexity (transistors) and designer productivity versus year (95, 00, 05, 10). Design automation raises the abstraction level: RTL, co-design & HLS, IP- & platform-based design, system-level design language & virtual prototyping]
  10. High-level synthesis
      Starting from a functional description, automatically generate an RTL architecture.
      - Algorithmic description
        - No notion of timing in the source code
        - Mainly oriented toward data-dominated applications: compute-intensive algorithms such as filters…
        - The initial description can be "RTL oriented" or "function oriented"
      - Behavioral description
        - Notion of step / local timing constraints in the source code, for example by using the wait statements of SystemC
        - Can be used for both data-dominated and control-dominated applications: interface controllers, DMA…, filters…
  11. Synthesizable models
      C for synthesis:
      - No pointers that cannot be resolved statically; arrays are allowed
      - No standard library calls: printf, scanf, fopen, malloc…
      - Function calls are allowed and can be inlined or not
      - Finite precision: bit-accurate integers, fixed point, signed, unsigned…
      - Based on SystemC data types (sc_int, sc_fixed) or Mentor Graphics data types (ac_int, ac_fixed)
  12. Purely functional example #1: a simple C code

          #define N 16

          int main(int data_in, int *data_out)
          {
            static const int Coeffs[N] = {98, -39, -327, 439, 950, -2097, -1674, 9883,
                                          9883, -1674, -2097, 950, 439, -327, -39, 98};
            /* Delay line. Declared static here so that past samples persist
               between calls; the slide declares it as a plain local. */
            static int Values[N];
            int temp;
            int sample, i, j;

            sample = data_in;
            temp = sample * Coeffs[N-1];           /* newest sample */
            for (i = 1; i <= (N-1); i++) {
              temp += Values[i] * Coeffs[N-i-1];   /* past samples */
            }
            for (j = (N-1); j >= 2; j -= 1) {      /* shift the delay line */
              Values[j] = Values[j-1];
            }
            Values[1] = sample;
            *data_out = temp;
            return 0;
          }
  13. Purely functional example #2: bit accurate C++ code

          #include "ac_fixed.h" // From Mentor Graphics

          // ac_fixed<W, I, S, Q, O>: 16 bits in total, 12 integer bits (so 4 fractional
          // bits), signed, quantization = rounding, overflow = saturation
          #define PORT_SIZE ac_fixed<16, 12, true, AC_RND, AC_SAT>
          #define N 16

          int main(PORT_SIZE data_in, PORT_SIZE &data_out)
          {
            static const PORT_SIZE Coeffs[N] = {1.1, 1.5, 1.0, 1.0, 1.7, 1.8, 1.2, 1.0,
                                                1.6, 1.0, 1.5, 1.1, 1.9, 1.3, 1.4, 1.7};
            // Delay line. Declared static here so that past samples persist
            // between calls; the slide declares it as a plain local.
            static PORT_SIZE Values[N];
            PORT_SIZE temp;
            PORT_SIZE sample;

            sample = data_in;
            temp = sample * Coeffs[N-1];
            for (int i = 1; i <= (N-1); i++) {
              temp = Values[i] * Coeffs[N-i-1] + temp;
            }
            for (int j = (N-1); j >= 2; j -= 1) {
              Values[j] = Values[j-1];
            }
            Values[1] = sample;
            data_out = temp;
            return 0;
          }
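The rounding and saturation modes named in the bit-accurate example can be imitated in plain C, which may help when checking results without the Mentor Graphics headers. The following is a minimal sketch, assuming a 16-bit signed format with 4 fractional bits and round-to-nearest with ties toward plus infinity; the names `q12_4_mul`, `saturate16` and `floor_div_pow2` are hypothetical and are not part of `ac_fixed` or GAUT.

```c
#include <stdint.h>

#define FRAC_BITS 4  /* assumed format: 16 bits total, 4 of them fractional */

/* Clamp a wide value into the 16-bit signed range (saturation on overflow). */
static int16_t saturate16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Floor division by 2^FRAC_BITS without right-shifting negative values. */
static int32_t floor_div_pow2(int32_t v) {
    int32_t d = 1 << FRAC_BITS;
    int32_t q = v / d;                 /* C division truncates toward zero */
    if ((v % d) != 0 && v < 0) q -= 1; /* adjust to floor for negatives */
    return q;
}

/* Multiply two fixed-point values, round to nearest, then saturate. */
int16_t q12_4_mul(int16_t a, int16_t b) {
    int32_t wide = (int32_t)a * (int32_t)b;  /* 8 fractional bits */
    int32_t half = 1 << (FRAC_BITS - 1);     /* half an output LSB */
    return saturate16(floor_div_pow2(wide + half));
}
```

With this encoding, 1.5 is stored as 24 and 2.0 as 32, so `q12_4_mul(24, 32)` yields 48, the encoding of 3.0.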
  15. Behavioral description
      Notion of step / local timing constraints in the source code, for example by using the wait statements of SystemC:

          ...
          void addmul() {
            sc_signal<sc_uint<32> > tmp1;
            tmp1 = 0;                // reset state
            result = 0;
            wait();
            while (1) {
              tmp1 = b * c;          // first state
              wait();
              result = a + tmp1;     // second state
              wait();
            }
          }
          ...

      Result: a cycle-by-cycle FSMD with a reset state.
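To make the "cycle-by-cycle FSMD with reset state" idea concrete, the SystemC thread above can be modelled as an explicit state machine that advances once per clock. This is only an illustrative software model, assuming one state per `wait()`; the type and function names (`addmul_fsmd_t`, `addmul_reset`, `addmul_clock`) are hypothetical and do not come from GAUT or SystemC.

```c
#include <stdint.h>

/* One state per wait() in the behavioral description. */
typedef enum { ST_RESET, ST_MUL, ST_ADD } addmul_state_t;

typedef struct {
    addmul_state_t state;
    uint32_t tmp1;    /* register holding b * c */
    uint32_t result;  /* output register */
} addmul_fsmd_t;

/* Asynchronous-reset equivalent: force the machine into its reset state. */
void addmul_reset(addmul_fsmd_t *m) {
    m->state = ST_RESET;
    m->tmp1 = 0;
    m->result = 0;
}

/* Advance the machine by exactly one clock cycle. */
void addmul_clock(addmul_fsmd_t *m, uint32_t a, uint32_t b, uint32_t c) {
    switch (m->state) {
    case ST_RESET:               /* reset state: clear the registers */
        m->tmp1 = 0;
        m->result = 0;
        m->state = ST_MUL;
        break;
    case ST_MUL:                 /* first state: tmp1 = b * c */
        m->tmp1 = b * c;
        m->state = ST_ADD;
        break;
    case ST_ADD:                 /* second state: result = a + tmp1 */
        m->result = a + m->tmp1;
        m->state = ST_MUL;
        break;
    }
}
```

After a reset, three calls to `addmul_clock` with constant inputs leave `a + b * c` in `result`: one cycle in the reset state, one for the multiplication and one for the addition.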
  16. High-level transformations
      - Loops
        - Loop pipelining
        - Loop unrolling: none, partial, complete
        - Loop merging
        - Loop tiling
        - …
      - Arrays
        - Arrays can be mapped onto memory banks
        - Arrays can be synthesized as registers
        - Constant arrays can be synthesized as logic
        - …
      - Functions
        - Function calls can be inlined
        - A function can be synthesized as an operator: sequential, pipelined, functional unit…
        - Single function instantiation
        - …
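As an illustration of partial loop unrolling, the sketch below shows a dot product in its rolled form and unrolled by a factor of 4. The function names and the unroll factor are assumptions made for this example; an HLS tool would normally apply the transformation itself rather than require it in the source.

```c
#define N 16

/* Rolled form: one multiply-accumulate per iteration. */
int dot_rolled(const int x[N], const int h[N]) {
    int acc = 0;
    for (int i = 0; i < N; i++) {
        acc += x[i] * h[i];
    }
    return acc;
}

/* Partially unrolled by 4 (N must be a multiple of 4). The four products
   in the loop body do not depend on each other, so a scheduler is free to
   map them onto up to four multipliers working in parallel. */
int dot_unrolled4(const int x[N], const int h[N]) {
    int acc = 0;
    for (int i = 0; i < N; i += 4) {
        int p0 = x[i]     * h[i];
        int p1 = x[i + 1] * h[i + 1];
        int p2 = x[i + 2] * h[i + 2];
        int p3 = x[i + 3] * h[i + 3];
        acc += (p0 + p1) + (p2 + p3);  /* balanced adder tree */
    }
    return acc;
}
```

Both functions compute the same value; the unrolled form trades more operators (area) for fewer loop iterations (latency), which is the kind of trade-off the constraints and objectives of the synthesis are meant to steer.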
  17. High-level synthesis
      Starting from a functional description, automatically generate an RTL architecture.
      - Algorithmic description: no notion of timing in the source code
      - Behavioral description: notion of step / local timing constraints in the source code, for example by using the wait statements of SystemC
      - Constraints
        - Timing constraints: latency and/or throughput
        - Resource constraints: #Operators and/or #Registers and/or #Memory, #Slices...
      - Objectives
        - Minimization: area (i.e. resources), latency, power consumption…
        - Maximization: throughput
      - Library of characterized operators
  18. Synthesis steps
      - Compilation: generates a formal model of the specification
      - Selection: chooses the architecture of the operators
      - Allocation: defines the number of operators of each selected type
      - Scheduling: defines the execution date of each operation
      - Binding (or assignment): defines which operator will execute a given operation and which memory element will store each data item
      - Architecture generation: writes out the RTL source code in the target language, e.g. VHDL
  19. HLS steps: inputs
      - Inputs: constraints, operators library, specification
      - Specification: O = ((n01+n02)*n12)-(n21+n22)
      - Operators library: adders (CLA, RCA), multipliers (Booth, Wallace), subtractors (CLA, RCA)
      - Flow: Compilation -> Intermediate format -> Selection, Allocation, Scheduling, Binding -> Architecture generation -> RTL architecture
  20. HLS steps: Compilation
      Specification: O = ((n01+n02)*n12)-(n21+n22)
      Intermediate representation produced by the compilation (data-flow graph):
      - N0: n01 + n02
      - N2: n21 + n22
      - N1: n11 × n12, where n11 is the result of N0
      - N3: n31 - n32, where n31 and n32 are the results of N1 and N2; its result is O
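The data-flow graph of the running example can be written down as a small table of nodes. The sketch below is one possible in-memory form, not GAUT's actual intermediate format; the types `operand_t` and `dfg_node_t`, the macros and the function `dfg_eval` are hypothetical names chosen for this illustration.

```c
typedef enum { OP_ADD, OP_SUB, OP_MUL } op_kind_t;

/* An operand is either a primary input (node < 0) or another node's result. */
typedef struct { int node; int input; } operand_t;

typedef struct { op_kind_t op; operand_t lhs, rhs; } dfg_node_t;

enum { IN_N01, IN_N02, IN_N12, IN_N21, IN_N22, NUM_INPUTS };
enum { N0, N1, N2, N3, NUM_NODES };

#define FROM_INPUT(i) { -1, (i) }
#define FROM_NODE(n)  { (n), -1 }

/* O = ((n01 + n02) * n12) - (n21 + n22) */
static const dfg_node_t dfg[NUM_NODES] = {
    [N0] = { OP_ADD, FROM_INPUT(IN_N01), FROM_INPUT(IN_N02) },
    [N1] = { OP_MUL, FROM_NODE(N0),      FROM_INPUT(IN_N12) },
    [N2] = { OP_ADD, FROM_INPUT(IN_N21), FROM_INPUT(IN_N22) },
    [N3] = { OP_SUB, FROM_NODE(N1),      FROM_NODE(N2) },
};

static int operand_value(operand_t o, const int inputs[NUM_INPUTS],
                         const int value[NUM_NODES]) {
    return (o.node >= 0) ? value[o.node] : inputs[o.input];
}

/* Evaluate the graph. Nodes are stored so that every node only refers to
   nodes with a smaller index, hence index order is a valid execution order. */
int dfg_eval(const int inputs[NUM_INPUTS]) {
    int value[NUM_NODES];
    for (int n = 0; n < NUM_NODES; n++) {
        int a = operand_value(dfg[n].lhs, inputs, value);
        int b = operand_value(dfg[n].rhs, inputs, value);
        int r = 0;
        switch (dfg[n].op) {
        case OP_ADD: r = a + b; break;
        case OP_SUB: r = a - b; break;
        case OP_MUL: r = a * b; break;
        }
        value[n] = r;
    }
    return value[N3];  /* N3 produces the output O */
}
```

The graph keeps only data dependencies, which is what lets the later steps (selection, allocation, scheduling, binding) reorder and share operators freely.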
  22. HLS steps: Selection
      Specification: O = ((n01+n02)*n12)-(n21+n22)
      Operators selected from the library for the data-flow graph: RCA for the additions (N0, N2), Booth for the multiplication (N1), RCA for the subtraction (N3).
  24. HLS steps: allocation
      Specification: O = ((n01+n02)*n12)-(n21+n22)
      Number of operators allocated for each selected type: RCA adder × 1, Booth multiplier × 1, RCA subtractor × 1.
  26. HLS steps: scheduling
      Allocated operators: RCA × 1, Booth × 1, RCA × 1.
      Schedule shown on the slide, in execution order: N0 (+); then N1 (×) and N2 (+); then N3 (-).
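A resource-constrained schedule of this kind can be produced by list scheduling. The sketch below is a generic textbook-style version applied to the four operations of the example, not GAUT's scheduler. It assumes that every operation takes one control step and that ready operations are served in node-index order; `list_schedule` and the other names are hypothetical.

```c
#include <stdbool.h>

enum { RES_ADD, RES_MUL, RES_SUB, NUM_RES };
enum { N0, N1, N2, N3, NUM_OPS };

typedef struct {
    int res;      /* operator type required by the operation */
    int pred[2];  /* producing operations, -1 for a primary input */
} sched_op_t;

/* O = ((n01 + n02) * n12) - (n21 + n22) */
static const sched_op_t ops[NUM_OPS] = {
    [N0] = { RES_ADD, { -1, -1 } },
    [N1] = { RES_MUL, { N0, -1 } },
    [N2] = { RES_ADD, { -1, -1 } },
    [N3] = { RES_SUB, { N1, N2 } },
};

/* Fills start[] with the control step of each operation and returns the
   schedule length in steps, or -1 if an operator type is not available. */
int list_schedule(const int available[NUM_RES], int start[NUM_OPS]) {
    for (int r = 0; r < NUM_RES; r++) {
        if (available[r] < 1) return -1;
    }
    for (int i = 0; i < NUM_OPS; i++) start[i] = -1;

    int scheduled = 0;
    int step = 0;
    while (scheduled < NUM_OPS) {
        int used[NUM_RES] = { 0 };  /* operators busy in this step */
        for (int i = 0; i < NUM_OPS; i++) {
            if (start[i] >= 0) continue;  /* already scheduled */
            bool ready = true;
            for (int p = 0; p < 2; p++) {
                int pr = ops[i].pred[p];
                /* a predecessor must have run in an earlier step */
                if (pr >= 0 && (start[pr] < 0 || start[pr] >= step)) {
                    ready = false;
                }
            }
            if (ready && used[ops[i].res] < available[ops[i].res]) {
                start[i] = step;
                used[ops[i].res]++;
                scheduled++;
            }
        }
        step++;
    }
    return step;
}
```

With one operator of each type, N0 is placed in step 0, N1 and N2 in step 1 and N3 in step 2, for a schedule length of 3 steps. Allowing two adders moves N2 to step 0 without shortening the schedule, because the chain N0, N1, N3 remains the critical path.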
  28. HLS steps: binding
      Allocated operators: RCA × 1, Booth × 1, RCA × 1.
      - Operation binding: each operation (+, ×, -) is assigned to an allocated operator
      - Data binding:
        - n01: R1
        - n02: R2
        - n21, n11: R3
        - n22, n12: R4
        - n31: R5
        - n32: R6
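Data binding is commonly solved with the Left Edge algorithm, which the talk later lists among GAUT's data assignment options. The following is a minimal sketch of the classic algorithm on variable lifetimes given as half-open intervals `[birth, death)`. The lifetimes used in the usage note are made up for illustration and do not reproduce the register table of the slide; all names are hypothetical.

```c
#include <limits.h>
#include <stdlib.h>

typedef struct {
    const char *name;
    int birth;  /* first control step in which the value must be stored */
    int death;  /* first control step in which it is no longer needed */
    int reg;    /* assigned register index, filled in by left_edge() */
} lifetime_t;

/* Sort by birth (the "left edge"), then by death to make ties stable. */
static int by_left_edge(const void *a, const void *b) {
    const lifetime_t *x = a, *y = b;
    if (x->birth != y->birth) return (x->birth > y->birth) - (x->birth < y->birth);
    return (x->death > y->death) - (x->death < y->death);
}

/* Assigns a register to every variable and returns the register count. */
int left_edge(lifetime_t *v, int n) {
    qsort(v, (size_t)n, sizeof *v, by_left_edge);
    for (int i = 0; i < n; i++) v[i].reg = -1;

    int assigned = 0;
    int regs = 0;
    while (assigned < n) {
        int free_at = INT_MIN;  /* step at which the current register frees up */
        for (int i = 0; i < n; i++) {
            /* pack every still-unassigned, non-overlapping lifetime */
            if (v[i].reg < 0 && v[i].birth >= free_at) {
                v[i].reg = regs;
                free_at = v[i].death;
                assigned++;
            }
        }
        regs++;
    }
    return regs;
}
```

For example, the lifetimes a = [0, 1), b = [0, 2), c = [1, 3) and d = [2, 3) fit into two registers: a and c share one, b and d share the other. Two is also the minimum, since at every step two values are alive at the same time.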
  30. HLS steps: output
      - Inputs of the architecture generation: the operation binding (+, ×, -) and the data binding (n01: R1; n02: R2; n21, n11: R3; n22, n12: R4; n31: R5; n32: R6)
      - Controller: FSM controller or programmable controller
      - Datapath components: storage components, functional units, connection components
      [Figure: RTL architecture made of a controller and a datapath with registers R1 to R6, multiplexers (MUX) and the +, × and - operators]
  31. Academic tools
      - Streamroller (Univ. Mich.)
      - SPARK (UCSD)
      - xPilot (UCLA)
      - UGH (TIMA+LIP6)
      - MMALPHA (IRISA+CITI+…)
      - ROCCC (UC Riverside)
      - GAUT (UBS / Lab-STICC)
      - …
  32. Commercial tools
      - CatapultC (Mentor Graphics => Calypto)
      - PICO (Spin-off HP => Synfora => Synopsys)
      - Cynthesizer (Forte Design)
      - Cyber (NEC)
      - AutoPilot (AutoESL)
      - C to Silicon (Cadence)
      - Synphony (Synopsys)
      - …
  34. GAUT
      - An academic, free and open-source HLS tool
      - Dedicated to DSP applications
        - Data-dominated algorithms
        - 1D, 2D filters
        - Transforms (Fourier, Hadamard, DCT…)
        - Channel coding, source coding algorithms
      - Input: bit-accurate C/C++ algorithm
        - Bit-accurate integer and fixed-point types from Mentor Graphics
      - Output: RTL architecture
        - VHDL
        - SystemC: CABA (cycle accurate and bit accurate) and TLM (transaction-level model)
        - Compatible with both the SoCLib and MPARM virtual prototyping platforms
      - Automated test-bench generation
      - Automated operator characterization
  35. GAUT: Constraints
      - Input: algorithm in bit-accurate C/C++
      - Synthesis constraints
        - Initiation interval (average data throughput)
        - Clock frequency
        - FPGA/ASIC target technology
        - Memory architecture and mapping
        - I/O timing diagram (scheduling + ports)
        - GALS/LIS interface (FIFO protocol)
      [Figure: generated architecture with a controller (clock enable), a data path, a memory unit, internal buses, and a GALS/LIS bus interface using Req(i), Data(i) and Ack(i) over specific links & protocols]
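The initiation interval and the clock frequency together fix the achievable throughput, and they also bound how few operators can be allocated. The sketch below shows both relations under the assumption that the initiation interval is expressed in clock cycles; the lower bound is the usual textbook estimate, not a formula taken from GAUT, and both function names are hypothetical.

```c
/* Iterations (input samples) processed per second for a design that
   accepts a new iteration every ii_cycles clock cycles. */
double iterations_per_second(double clock_hz, unsigned ii_cycles) {
    if (ii_cycles == 0) return 0.0;  /* invalid constraint */
    return clock_hz / (double)ii_cycles;
}

/* Lower bound on the number of operators of one type: n_ops operations,
   each occupying an operator for op_cycles cycles, must fit into one
   initiation interval. Computes ceil(n_ops * op_cycles / ii_cycles). */
unsigned min_operators(unsigned n_ops, unsigned op_cycles, unsigned ii_cycles) {
    if (ii_cycles == 0) return 0;    /* invalid constraint */
    unsigned busy = n_ops * op_cycles;
    return (busy + ii_cycles - 1) / ii_cycles;
}
```

For the two additions of the running example, assuming one cycle per addition, an initiation interval of 2 cycles can be met with a single adder, while an initiation interval of 1 cycle needs at least two.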
  36. GAUT: Design flow
      - Input: algorithm in bit-accurate C/C++
      - Synthesis constraints
        - Iteration interval / iteration period (average data throughput)
        - Clock frequency
        - FPGA/ASIC target technology
        - Memory architecture and mapping
        - I/O timing diagram (scheduling + ports)
        - GALS/LIS interface (FIFO protocol)
      [Figure: generated architecture with a controller (clock enable), a data path, a memory unit, internal buses, and a GALS/LIS bus interface using Req(i), Data(i) and Ack(i) over specific links & protocols]
  37. GAUT: Compilation
  38. GAUT: DFG viewer
  39. GAUT: Operators characterization
      - Script and logic
      - Area: operator only (number of slices)
      - Propagation time: reg + tri + ope + reg
      - Database, interpolation…
      [Figure: operators (O) placed between registers (R) and multiplexers (Mux)]
  40. GAUT: Synthesis steps
      - Initiation interval (II)
      - Clock period
      - I/O timing & memory constraints
      - Data assignment (Left Edge, MWBM…)
      - HDL coding style: FSMD, FSM+reg, FSM_ROM+reg…
  41. GAUT: I/O and memory constraints
  42. GAUT: Gantt viewer
  43. GAUT: Interface synthesis
      - Interface performance depends on data locality (data fetch penalty, cache misses)
      - The interface can be:
        - A ping-pong buffer (scratch-pad on the Local Memory Bus)
        - A FIFO (e.g. FSL, the Fast Simplex Link from Xilinx)
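A ping-pong buffer lets a producer fill one half of a memory while the consumer reads the other half, and then the two halves swap roles. The sketch below is a plain software model of that idea, assuming a block of 64 words; it is not the interface generated by GAUT, and the names `pingpong_t`, `pp_init`, `pp_write_buf`, `pp_read_buf` and `pp_swap` are hypothetical.

```c
#include <string.h>

#define BLOCK 64  /* assumed block size in words, e.g. one 8x8 block */

typedef struct {
    int buf[2][BLOCK];
    int write_sel;  /* index of the half currently owned by the producer */
} pingpong_t;

void pp_init(pingpong_t *p) {
    memset(p->buf, 0, sizeof p->buf);
    p->write_sel = 0;
}

/* Half that the producer may fill. */
int *pp_write_buf(pingpong_t *p) {
    return p->buf[p->write_sel];
}

/* Half that the consumer may read: always the other one. */
const int *pp_read_buf(const pingpong_t *p) {
    return p->buf[1 - p->write_sel];
}

/* Exchange the roles once the producer has finished a block and the
   consumer has finished reading the previous one. */
void pp_swap(pingpong_t *p) {
    p->write_sel = 1 - p->write_sel;
}
```

In hardware the swap is a single select bit, so the producer and the consumer never wait on the same memory half. A FIFO instead decouples the two sides word by word rather than block by block.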
  44. GAUT: Test-bench generation
      - Test-bench generation
      - Modelsim script generation
      - Result file generation
  45. GAUT: more than 100 downloads each year
  46. Outline
      - Lab-STICC
      - High-Level Synthesis
      - GAUT
      - Experimental results
        - Design space exploration of HW accelerators
        - MPSoC design space exploration through virtual prototyping
        - SoC hardware prototyping
        - “System on board”
      - Conclusion
  47. Experimental results: MJPEG decoding
      [Figure: block diagram of the MJPEG baseline decoder. Blocks: Dc VLD, IDPCM, YuvDeMux, Huffman table, Dequant, Idct, Yuv2rgb, Ac VLD, RLD, Unzig Zag, Q table]
      [Figure: execution time ratio for software MJPEG decoding (by using gprof)]
  48. Synthesis results
      [Figures: IDCT, YUV2RGB]
  49. Synthesis results
      [Figures: IDCT, YUV2RGB. Labels: virtual prototyping, hardware prototyping]
  50. SoCLib: a virtual prototyping platform
      - French National Research Project (ANR)
      - Free and open-source virtual prototyping environment
      - Library of SystemC simulation models
        - Hardware components: CPUs, HW-ACCs, memories, buses
        - The VCI/OCP interface protocol is used
        - Two types of model are available for each HW component: CABA (Cycle Accurate / Bit Accurate) and TLM-DT (Transaction Level Modeling with Distributed Time)
        - Software components: OS, API…
      - Associated tools: simulation, configuration, debug, automatic generation of simulation models
      - www.soclib.fr
      - GAUT is used to generate CABA and TLM-DT simulation models of HW-ACCs
  51. SoCLib: Design flow
      [Figure: design flow. Labels: HLS, simulation models, architecture design, mapping, simulator, performance analysis, debugging]
  52. Experiments: architecture #1
      Pure software implementation on a mono-processor architecture.
      - Tasks: TG, Demux, VLD, IQ-ZZ, IDCT, Libu, RamDAC
      - Platform: file access, one MIPS R3000 (INST, DATA), frame buffer, MWMR, Generic Micro Network (VGMN), lock engine, RAM, TTY
  53. Experiments: architecture #2
      Software implementation on a mono-processor architecture + IDCT as HW accelerator.
      - Tasks: TG, Demux, VLD, IQ-ZZ, IDCT, Libu, RamDAC
      - Platform: file access, one MIPS R3000 (INST, DATA), frame buffer, MWMR, MWMR Copro, Generic Micro Network (VGMN), lock engine, RAM, TTY
  54. Experiments: architecture #3
      Parallelized software implementation on a multiprocessor architecture.
      - Tasks: TG, Split, two Demux / VLD / IQ-ZZ / IDCT chains, Libu, RamDAC
      - Platform: file access, four MIPS R3000 (INST, DATA), frame buffer, MWMR, Generic Micro Network (VGMN), lock engine, RAM0 to RAM3, TTY
  55. Experiments: architecture #4
      - Tasks: TG, Split, two Demux / VLD / IQ-ZZ / IDCT chains, Libu, RamDAC
      - Platform: file access, four MIPS R3000 (INST, DATA), frame buffer, MWMR, four MWMR Copro, Generic Micro Network (VGMN), lock engine, RAM0 to RAM3, TTY
  56. MJPEG Results
      - The IDCT generated by GAUT reduces the application latency by 14%
      - Parallelization of the application on 4 CPUs reduces the latency by 21%
      [Figure: execution time of the application (in cycles) to process 50 images of 48*48 pixels. Chart labels: 14%, 21%, 31%]
  57. MJPEG Results
      - The 4 HW IDCTs in the multiprocessor architecture further reduce the latency by 10%
      [Figure: execution time of the application (in cycles) to process 50 images of 48*48 pixels]
  58. MJPEG Results
      [Figure: execution time of the application (in cycles) to process 50 images of 48*48 pixels]
      [Figure: simulation time (in seconds)]
      Chart labels as extracted: 38%, 65%, 10%; annotation: simulation time increase.
  59. MJPEG: Hardware prototyping
      - Real-time decoding: 24 QCIF images/sec
      - IDCT: maximum I/O bandwidth (4 parallel input ports) and the lowest latency (33 cycles, frequency 138.9 MHz)
      - YUV2RGB: minimum latency (12 cycles, frequency 249.18 MHz)
      - Compared to a pure SW implementation: 10x speed-up for the IDCT function, 5x speed-up for the yuv2rgb function
      - SoC design on a Xilinx Virtex 5 LX110 FPGA (XUPV5 board)
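The latencies above are given in clock cycles at a stated frequency. Converting them to time is a one-line calculation, sketched here with a hypothetical helper `latency_ns`:

```c
/* Latency in nanoseconds of a block that needs `cycles` clock cycles
   at `freq_mhz` megahertz: cycles / (freq_mhz * 1e6) seconds. */
double latency_ns(unsigned cycles, double freq_mhz) {
    return (double)cycles * 1000.0 / freq_mhz;
}
```

With the figures from the slide, `latency_ns(33, 138.9)` is about 238 ns for the IDCT and `latency_ns(12, 249.18)` is about 48 ns for the YUV2RGB conversion.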
  60. Prototyping platform
      Sundance platform:
      - Mother board
      - Daughter boards: DSP C62, C67 (Texas Instruments); FPGA Virtex 1000E (Xilinx)
      - Interconnection matrix
      - Point-to-point links: Com Port (CP, up to 20 Mbytes/sec) and Sundance Digital Bus (SDB, up to 200 Mbytes/sec)
  61. DVB-DSNG receiver architecture mapping
      [Figure: mapping of the C functional architecture onto the design architecture]
      - Functional chain: received frame data, Synchro, Viterbi, de-interleaving, RS decoding, MPEG2
      - Synchro and de-interleaving: Sw compiler (Code Composer), Sw on the DSP
      - Viterbi and RS decoding: high-level synthesis (GAUT) + ISE, Hw on the FPGA
  62. DVB-DSNG receiver
      - Synchronization and interleaving: Sw, on the C62 DSP
      - Viterbi and Reed Solomon decoders: Hw, on the Virtex-1000E FPGA
      - 4 SDB links
      - 26 Mbps throughput (limited by the synchronization block… C64 for higher throughputs)
  63. Viterbi decoding
      • Functional/application parameters: state number, throughput

          State number                 8     16     32     64    128
          Throughput (Mbps)           44     39     35     26     22
          Synthesis time (s)           1      1      3      9     27
          Number of logic elements   223    434   1130   2712   7051

      • DVB-DSNG standard: throughput 1.5 to 72 Mbps, 64-state Viterbi decoder
      [Figure: number of logic elements versus throughput (Mbps); throughput axis ticks at 1, 10, 100]
  64. Reed Solomon decoding
      • Functional/application parameters: number of input symbols, data symbols, throughput
      [Figure: number of logic elements versus throughput (Mbps) for the following codes]
        - RS(207,187,10): ATSC
        - RS(255,239,8): IEEE 802.16
        - RS(255,223,16): CCSDS
        - RS(255,205,10): IESS308
        - RS(255,205,16): ADSL2
        - RS(204,188,8): DVB-T
        - RS(204,188,8): DVB-C, DVB-S
      • DVB-DSNG standard: 1.5 to 72 Mbps, RS (204/188) decoder
      [Figure: number of logic elements versus throughput (Mbps); throughput axis ticks at 1, 10, 100]
  65. Conclusion
      - HLS makes it possible to automatically generate several RTL architectures from an algorithmic/behavioral description and a set of constraints
      - HLS makes it possible to generate:
        - VHDL models for synthesis
        - SystemC simulation models for virtual prototyping
      - HLS makes it possible to explore the design space of:
        - Hardware accelerators
        - MPSoC architectures including HW accelerators
      - GAUT can be downloaded free of charge at http://lab-sticc.fr/www-gaut