
Archi Modelling


  1. 1. DSP CPU Architecture Modelling with Matlab <ul><li>WHY? </li></ul><ul><ul><li>DESIGN FLOW </li></ul></ul><ul><li>INTRODUCTION </li></ul><ul><ul><li>FXP Issues </li></ul></ul><ul><ul><li>MATLAB ISSUES </li></ul></ul><ul><ul><li>Modeling </li></ul></ul><ul><ul><li>Terms & Definitions </li></ul></ul><ul><li>TUTORIAL PROGRESS </li></ul><ul><li>DSPMA1 </li></ul><ul><li>VNEG </li></ul><ul><li>ADD2 MODULO </li></ul><ul><li>FUSED ADDERS </li></ul><ul><li>ACCUMARRAY </li></ul><ul><li>DIFF </li></ul><ul><li>VPU DESIGN </li></ul><ul><li>FILTER DESIGN </li></ul><ul><ul><li>(IIR1,PMAVF256, LMS,NLMS) </li></ul></ul><ul><li>EQUALIZER </li></ul>
  2. 2. PPT TEMPLATE <ul><li>Design hierarchy </li></ul><ul><ul><li>Functional model </li></ul></ul><ul><ul><ul><li>level 3 is 20 </li></ul></ul></ul><ul><ul><li>Bit accurate model </li></ul></ul><ul><ul><li>Time accurate model </li></ul></ul><ul><ul><li>Pipeline model </li></ul></ul>
  3. 3. TUTORIAL PROGRESS <ul><li>DSPMA1 </li></ul><ul><ul><li>overview of typical issues </li></ul></ul><ul><li>VNEG </li></ul><ul><ul><li>any type </li></ul></ul><ul><ul><ul><li>FXP uses INTxx </li></ul></ul></ul><ul><ul><li>any shape </li></ul></ul><ul><ul><ul><li>vector, introduction to VU </li></ul></ul></ul><ul><ul><li>exercise: NEG modulo </li></ul></ul><ul><li>ADD2 MODULO </li></ul><ul><ul><li>we cover how to do modulo </li></ul></ul><ul><li>FUSED ADDERS </li></ul><ul><ul><li>more into parallelism </li></ul></ul><ul><li>ACCUMARRAY </li></ul><ul><ul><li>some architecture issues </li></ul></ul><ul><ul><li>interesting problems </li></ul></ul><ul><li>DIFF </li></ul><ul><ul><li>to do </li></ul></ul><ul><li>VPU </li></ul><ul><ul><li>internal sequencer </li></ul></ul><ul><li>IIR1 </li></ul><ul><ul><li>introduction to all filters </li></ul></ul><ul><ul><li>states </li></ul></ul><ul><ul><li>PMAVF256: massive parallelism </li></ul></ul>
  4. 4. WHY? <ul><li>Typical DSP architecture issues </li></ul><ul><li>How much parallelism? </li></ul><ul><ul><li>multi-lane (2,4,8,16) study </li></ul></ul><ul><li>How wide? </li></ul><ul><ul><li>Bit exact hardware, near-bit exact hardware </li></ul></ul><ul><li>MAC operator </li></ul><ul><ul><li>multiple units </li></ul></ul><ul><ul><li>mixing rounding and saturation </li></ul></ul><ul><ul><ul><li>refer to ITU library (all single, all serial) </li></ul></ul></ul><ul><li>Compound operators </li></ul><ul><ul><li>how to deal with multiple overflow? </li></ul></ul><ul><ul><li>where to do saturation? </li></ul></ul><ul><li>Complex (as in A+iB) operators </li></ul><ul><li>DSP Functions </li></ul><ul><ul><li>where? Coprocessing issues </li></ul></ul><ul><li>MATLAB specific functions </li></ul><ul><ul><li>How? when? </li></ul></ul>
  5. 5. MATHWORKS design flow <ul><li>Design hierarchy </li></ul><ul><ul><li>Functional model </li></ul></ul><ul><ul><li>Bit accurate model </li></ul></ul><ul><ul><li>Time accurate model </li></ul></ul><ul><ul><li>Pipeline model </li></ul></ul><ul><li>Tool (LANGUAGE) name </li></ul><ul><ul><li>MATLAB </li></ul></ul><ul><ul><li>SIMULINK </li></ul></ul><ul><ul><li>VERILOG </li></ul></ul><ul><li>Alternative names </li></ul><ul><ul><li>Functional: behavioral </li></ul></ul><ul><ul><li>Bit accurate: bit exact </li></ul></ul><ul><ul><li>Time accurate: phase accurate </li></ul></ul><ul><ul><li>Pipeline: microarchitecture </li></ul></ul>
  6. 6. TWO DESIGN FLOWS <ul><li>Design flows </li></ul><ul><ul><li>Behavioral model </li></ul></ul><ul><ul><li>Bit accurate model </li></ul></ul><ul><ul><li>Phase accurate model </li></ul></ul><ul><ul><li>Pipeline model </li></ul></ul><ul><li>MATLAB </li></ul><ul><ul><li>Powerful environment </li></ul></ul><ul><ul><li>more functions than any other language </li></ul></ul><ul><ul><li>integrated from top to bottom </li></ul></ul><ul><ul><li>pipedream? </li></ul></ul><ul><li>SIMULINK </li></ul><ul><ul><ul><li>Implicit timing and concurrency </li></ul></ul></ul><ul><ul><ul><li>Access tools built directly into models </li></ul></ul></ul><ul><ul><ul><li>easily change parameters to model the FXP effects (rounding, overflow, scaling) </li></ul></ul></ul>
  7. 7. INTRODUCTION <ul><li>Arithmetic issues and FXP problem </li></ul><ul><ul><li>3 modal arithmetic types </li></ul></ul><ul><ul><li>propagation mode </li></ul></ul><ul><li>MATLAB issues </li></ul><ul><li>Modeling concepts </li></ul>
  8. 8. INTRO - Arithmetic issues and FXP problem <ul><li>This is the FXP perspective </li></ul><ul><li>3 types of modal arithmetic </li></ul><ul><ul><li>Modulo </li></ul></ul><ul><ul><ul><li>on overflow, wrap-around (max+1 becomes 0 for unsigned, min for signed) </li></ul></ul></ul><ul><ul><ul><li>32 + 32 -> 32 </li></ul></ul></ul><ul><ul><ul><li>Ex: C integer type, hardware </li></ul></ul></ul><ul><ul><li>Saturated </li></ul></ul><ul><ul><ul><li>on overflow, saturate (max+1 becomes max) </li></ul></ul></ul><ul><ul><ul><li>32 + 32 -> 32 </li></ul></ul></ul><ul><ul><ul><li>Ex: Matlab integer type (intXX) </li></ul></ul></ul><ul><ul><li>Promoted </li></ul></ul><ul><ul><ul><li>overflow is impossible (*); (*) within the limits of the precision </li></ul></ul></ul><ul><ul><ul><li>32 + 32 -> 33 (width increase) </li></ul></ul></ul><ul><ul><ul><li>Ex: Floating point (*), 8-bit data in 256-bit, free hardware </li></ul></ul></ul><ul><li>Propagation </li></ul><ul><ul><li>Matlab: integer supersedes FP </li></ul></ul><ul><ul><li>C: FP supersedes integer </li></ul></ul>
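The three arithmetic modes and the propagation rule can be tried directly at the MATLAB prompt. The following is a minimal sketch, not code from the tutorial; the variable names (`z_sat`, `z_pro`, `z_mod`, `p`) are chosen here for illustration, and the modulo result is emulated with `mod` on doubles because the intXX types have no wrap-around mode.

```matlab
% Three ways to handle the overflow of a 16-bit add (illustrative names).
x = int16(30000);
y = int16(10000);

% Saturated: MATLAB intXX arithmetic clips to the range of the type.
z_sat = x + y;                    % clipped to intmax('int16')

% Promoted: widen the operands first, so the 17-bit sum fits.
z_pro = int32(x) + int32(y);      % 40000, no overflow

% Modulo: wrap the promoted sum back into [-2^15, 2^15 - 1].
z_mod = int16(mod(double(z_pro) + 2^15, 2^16) - 2^15);

% Propagation: mixing an integer with a double yields the integer type.
p = int16(5) + 2.7;               % class(p) is 'int16'
```

Only the saturated and modulo results differ from the promoted one, and only because the sum does not fit in 16 bits.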
  9. 9. INTRO - MODULO vs SATURATED ARITHMETIC <ul><li>Matlab uses saturated arithmetic </li></ul><ul><ul><li>Embedded in the (U)INTXX type </li></ul></ul><ul><ul><li>Note that it is attached to the type, not the operator. </li></ul></ul><ul><li>Saturation is easily added to an operator working in modulo mode </li></ul><ul><ul><li>Example 32-bit add </li></ul></ul><ul><ul><ul><li>z = x+y </li></ul></ul></ul><ul><ul><ul><li>overflow = xor(z.32, z.31) </li></ul></ul></ul><ul><ul><ul><li>if (overflow) z = sat(z) </li></ul></ul></ul><ul><ul><li>as long as we have access to bit 32 (which means a larger width). </li></ul></ul><ul><li>Matlab offers the unusual problem of having to add modulo mode to an operator working in saturated mode. </li></ul><ul><ul><li>Solution: use promotion </li></ul></ul><ul><ul><li>example 16-bit add </li></ul></ul><ul><ul><ul><li>first promote z= int32(x) + int32 (y) </li></ul></ul></ul><ul><ul><ul><li>if overflow, correct the sum by subtracting power(2,16) if positive (resp. adding it if negative) </li></ul></ul></ul><ul><ul><ul><li>final operation, demote, z = int16(z) </li></ul></ul></ul><ul><ul><li>example 32-bit add </li></ul></ul><ul><ul><ul><li>first promote z= double(x) + double (y) </li></ul></ul></ul><ul><ul><ul><ul><li>note that for 32-bit ; z= int64(x)+ int64(y) does not work </li></ul></ul></ul></ul><ul><ul><ul><li>if overflow, correct the sum by subtracting power(2,32) if positive (resp. adding it if negative) </li></ul></ul></ul><ul><ul><ul><li>final operation, demote, z = int32(z) </li></ul></ul></ul><ul><ul><li>Issues: </li></ul></ul><ul><ul><ul><li>The operation must not need more than double the number of bits. </li></ul></ul></ul><ul><ul><ul><li>Bit operations (such as xor) only work on unsigned types. </li></ul></ul></ul>
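The promote / correct / demote recipe for a modulo add can be sketched as follows. This is an illustration under assumptions, not the tutorial's own file: the mask names `ovf_pos` and `ovf_neg` are invented here, and the correction is written so that a positive overflow subtracts 2^16 while a negative overflow adds it. The 32-bit case uses `double` as the wider container, as the slide suggests.

```matlab
% Modulo 16-bit add built on saturating types (illustrative sketch).
x = int16([ 30000  -30000   100]);
y = int16([ 10000  -10000   200]);

z = int32(x) + int32(y);                   % promote: the sum cannot overflow int32
ovf_pos = z > int32(intmax('int16'));      % positive overflow mask
ovf_neg = z < int32(intmin('int16'));      % negative overflow mask
z(ovf_pos) = z(ovf_pos) - 2^16;            % wrap down
z(ovf_neg) = z(ovf_neg) + 2^16;            % wrap up
z = int16(z);                              % demote: every value is now in range

% Same idea for 32 bits, promoting to double instead of a wider integer.
a = int32(2147483647);
b = int32(1);
s = double(a) + double(b);
s = s - 2^32 * (s > 2^31 - 1) + 2^32 * (s < -2^31);
s = int32(s);                              % wraps to intmin('int32')
```

Because the correction uses logical masks rather than an `if`, the same lines work for scalars and vectors.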
  10. 10. INTRO - MODULO vs SATURATED ARITHMETIC -TestBench <ul><li>Modulo and Saturated differ only in case of overflow </li></ul><ul><li>When can overflow happen? </li></ul><ul><ul><li>Direct demotion </li></ul></ul><ul><ul><ul><li>long to short , short to byte </li></ul></ul></ul><ul><ul><li>Hidden demotion: promotion followed by demotion </li></ul></ul><ul><ul><ul><li>ADD: 32+32 -> 33 -> 32 </li></ul></ul></ul><ul><ul><ul><li>MUL: 16 x 16 -> 32 -> 16 </li></ul></ul></ul><ul><ul><ul><li>SHIFT: (32 bit) << 8 -> (40 bit) -> 32-bit </li></ul></ul></ul><ul><ul><ul><ul><li>The saturation/modulo operation happens when demoted. </li></ul></ul></ul></ul><ul><ul><ul><ul><li>but the overflow detection can be done in many places. </li></ul></ul></ul></ul><ul><li>The test cases can all be simplified to a single case of demotion </li></ul><ul><ul><li>(no need to test add, shift, etc.) </li></ul></ul>
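Reducing the test cases to a single demotion might look like the sketch below. It is an assumption about how such a test could be written, with invented names (`wide`, `d_sat`, `d_mod`, `in_range`): the wide values stand for the result of any promoted add, multiply or shift, held here in doubles.

```matlab
% Demotion to 16 bits is the only place where the two modes can differ.
wide = [-70000 -32769 -32768 0 32767 32768 70000];   % promoted results

d_sat = int16(wide);                            % saturating demotion (MATLAB default)
d_mod = int16(mod(wide + 2^15, 2^16) - 2^15);   % modulo demotion

% Both modes must agree wherever the wide value already fits in 16 bits.
in_range = (wide >= -2^15) & (wide <= 2^15 - 1);
same_in_range = all(d_sat(in_range) == d_mod(in_range));
```

Test vectors that sit just inside and just outside the 16-bit range, as above, exercise both the agreeing and the differing cases.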
  11. 11. INTRO - MATLAB issues <ul><li>Index problem </li></ul><ul><ul><li>Index is not a problem for “for loops” </li></ul></ul><ul><ul><li>Index is a problem for vectors </li></ul></ul><ul><ul><ul><li>use C convention </li></ul></ul></ul><ul><ul><ul><ul><li>for n= 0:N-1 </li></ul></ul></ul></ul><ul><ul><ul><ul><li>z(n+MLABIDX) = -x(n+MLABIDX) </li></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>MLABIDX=1 defined locally in each unit </li></ul></ul></ul></ul></ul><ul><ul><li>For N-dim arrays </li></ul></ul><ul><ul><ul><li>I don’t know yet </li></ul></ul></ul><ul><li>Int data type is saturated </li></ul><ul><ul><li>How to have Integer using modulo arithmetic? </li></ul></ul><ul><li>Vector organization: Column or row? </li></ul><ul><ul><li>for instance accumarray requires a vector organised as Nx1, not 1xN </li></ul></ul><ul><li>Common src/dst or predicated functions: </li></ul><ul><ul><ul><li>if (sign) x=abs(x) ; else x=x </li></ul></ul></ul><ul><ul><li>Predicated functions will touch (or not) the destination </li></ul></ul><ul><ul><ul><ul><li>in Matlab dst is always modified; impossible to implement </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Matlab does not have pointers; arguments are passed by value </li></ul></ul></ul></ul>
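The index convention and the column-vector requirement can be illustrated with a short script. This is a sketch with made-up data; `MLABIDX` is the offset named on the slide, while `idx`, `acc` and the sample values are assumptions for the example.

```matlab
% C-style loop counter with a MATLAB index offset.
MLABIDX = 1;                    % defined locally in each unit
N = 4;
x = [10 20 30 40];
z = zeros(1, N);
for n = 0:N-1                   % n runs as it would in C
    z(n + MLABIDX) = -x(n + MLABIDX);
end

% accumarray expects subscripts as a column (Nx1); (:) forces that shape
% whatever the orientation of the incoming vectors.
idx = [1 2 1 3];
acc = accumarray(idx(:), x(:)); % sums the values that share an index
```

Using `(:)` at the unit boundary is one way to make a unit insensitive to row or column input.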
  12. 12. INTRO - STRUCTURE OF A DSP UNIT <ul><li>DSP UNIT </li></ul><ul><ul><li>Behavioral model </li></ul></ul><ul><ul><li>Bit accurate model </li></ul></ul><ul><ul><li>Phase accurate model </li></ul></ul><ul><ul><li>Debugging hooks </li></ul></ul><ul><li>MATLAB CODE </li></ul>
  13. 13. INTRO to MODELING: Differences between Bit and Phase accurate <ul><li>Bit accurate </li></ul><ul><li>Phase accurate </li></ul>[Figure: block diagrams comparing the two models. Recoverable labels: f1, f2, f3, f4; stage 1, stage 2, stage 3; decoder (switch/case); opcode.]
  14. 14. TERMS & DEFINITIONS <ul><li>Type </li></ul><ul><li>Shape </li></ul><ul><li>Flow processing </li></ul><ul><li>Building Block </li></ul><ul><li>Top level unit naming conventions </li></ul><ul><li>Arithmetic unit naming conventions </li></ul><ul><li>Variable naming conventions </li></ul><ul><ul><li>internal to units </li></ul></ul><ul><ul><li>global to testbench </li></ul></ul>
  15. 15. TERMS & DEFINITIONS - types and shapes <ul><li>Type </li></ul><ul><ul><li>FXP is the same as Q format </li></ul></ul><ul><ul><li>Integer </li></ul></ul><ul><ul><ul><li>all integer (no fractional part) </li></ul></ul></ul><ul><ul><ul><li>generally software types (short, long) limited to 8, 16, 32 bit </li></ul></ul></ul><ul><ul><ul><li>in q format all integers are signed (8q0, 16q0, 32q0) </li></ul></ul></ul><ul><ul><ul><li>q format could be extended to 25q0, 17q0, etc. </li></ul></ul></ul><ul><ul><li>fractional: number is all fractional (1q15, 1q31) </li></ul></ul><ul><li>Shape </li></ul><ul><ul><li>scalar </li></ul></ul><ul><ul><li>vector (1xN=row vector, Nx1=column vector) </li></ul></ul><ul><ul><ul><li>a 1x1 vector can be different from a scalar </li></ul></ul></ul><ul><ul><li>array (NxM) </li></ul></ul><ul><ul><li>Matrix == Array with special properties (linear algebra) </li></ul></ul>
  16. 16. TERMS & DEFINITIONS - Flow processing <ul><li>Flow processing alternative names </li></ul><ul><ul><li>Scheduling </li></ul></ul><ul><li>Types of flow processing </li></ul><ul><ul><li>sample (single sample) </li></ul></ul><ul><ul><ul><li>alternative name:sample by sample </li></ul></ul></ul><ul><ul><ul><li>sample in, sample out </li></ul></ul></ul><ul><ul><li>streaming (several samples) </li></ul></ul><ul><ul><ul><li>alternative name: block by block processing </li></ul></ul></ul><ul><ul><ul><li>data block: a block? a chunk? </li></ul></ul></ul><ul><ul><li>block processing (complete block) </li></ul></ul><ul><ul><ul><li>alternative name: full block processing, vector,array processing </li></ul></ul></ul>
  17. 17. TERMS & DEFINITIONS - Building Block <ul><li>Building block alternative name </li></ul><ul><ul><li>unit, function </li></ul></ul><ul><li>Complexity level </li></ul><ul><ul><li>Top level </li></ul></ul><ul><ul><li>msi structural unit </li></ul></ul><ul><li>Application level </li></ul><ul><ul><li>Structural unit </li></ul></ul><ul><ul><ul><li>arithmetic </li></ul></ul></ul><ul><ul><ul><li>vector </li></ul></ul></ul><ul><ul><ul><li>Top level,LSI </li></ul></ul></ul><ul><ul><ul><ul><li>alu,bmu </li></ul></ul></ul></ul><ul><ul><li>functional unit </li></ul></ul><ul><ul><ul><li>dsp function </li></ul></ul></ul><ul><ul><ul><li>matlab function </li></ul></ul></ul><ul><li>Processing unit </li></ul><ul><ul><li>alternative name: execution unit </li></ul></ul><ul><li>Vector Processing unit </li></ul><ul><ul><li>alternative name: VPU, Vector Unit </li></ul></ul>
  18. 18. TERMS & DEFINITIONS - Top level BB naming conventions <ul><li>unitnameNN where NN = complexity level </li></ul><ul><ul><li>alu0, alu1, alu2, alu3, alu4 </li></ul></ul><ul><ul><li>mac0, mac1, mac2, mac3 </li></ul></ul><ul><ul><li>bmu0, bmu1, bmu2 </li></ul></ul><ul><ul><li>ffter1,ffter2,ffter4 </li></ul></ul><ul><ul><li>cafir1, cafir2 </li></ul></ul><ul><li>unitnameNN where NN = brand name </li></ul><ul><ul><li>alu2901 </li></ul></ul><ul><ul><li>mac1616 </li></ul></ul><ul><ul><li>macSPARTAN-DSP48A </li></ul></ul>
  19. 19. TERMS & DEFINITIONS - arithmetic unit naming conventions <ul><li>Type agnostic </li></ul><ul><ul><li>Multi i/o </li></ul></ul><ul><ul><ul><li>functionnameNNOO </li></ul></ul></ul><ul><ul><ul><ul><li>NN = number of inputs (input can be any width or type) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>OO= number of outputs </li></ul></ul></ul></ul><ul><ul><ul><ul><li>example: 15-input adder </li></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>add151 </li></ul></ul></ul></ul></ul><ul><ul><ul><ul><li>note if signal is 32-bit wide, the total nbr of pins is big </li></ul></ul></ul></ul><ul><ul><li>Type relevant </li></ul></ul><ul><ul><li>1vector to 1vector </li></ul></ul><ul><ul><ul><li>functionnameLLTT </li></ul></ul></ul><ul><ul><ul><ul><li>LL = number of lanes </li></ul></ul></ul></ul><ul><ul><ul><ul><li>TT= type (F,L,S) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>example: accumarray 16-lane 32-bit data type </li></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>accumar16_L </li></ul></ul></ul></ul></ul>
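  20. 20. TERMS & DEFINITIONS – variable naming conventions [Figure: port names for units of increasing size. 2+1 UNIT: inputs x y, output z. 3+2 UNIT: inputs x y v, outputs z w; or inputs a b c, outputs d e. M+N UNIT: inputs x y v a b c d …, outputs z w u t s … 2+1 UNIT: inputs a b, output c; or inputs b c, output a, with internal vars d e f g h …]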
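  21. 21. TERMS & DEFINITIONS – Testbench Global variable naming conventions <ul><li>PRINT_MANAGER: x, y, z, w, zgolden, wgolden </li></ul><ul><li>CHECKER_MANAGER: gold, z, maxerr, differ, bounds </li></ul><ul><li>BUFFER_MANAGER </li></ul><ul><ul><li>input buffers: x, y, a, b </li></ul></ul><ul><ul><li>output buffers: z, w, u, t </li></ul></ul><ul><li>GENE_MANAGER (8 genes): geneA, B, C, D, E, F, G, H </li></ul>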
  22. 22. OVERVIEW – DSPMA1 <ul><li>What is a CPU? a DSP? </li></ul><ul><ul><li>4 major blocks </li></ul></ul><ul><ul><ul><li>DPU, LSU, PCU, COP + register files + MEM + I/O </li></ul></ul></ul><ul><li>not all blocks need modeling </li></ul><ul><ul><li>DSP is 90% datapath </li></ul></ul><ul><li>Eventually it is better to model more </li></ul><ul><ul><li>Memory hierarchy (at least level1) </li></ul></ul><ul><ul><li>Program control </li></ul></ul>
  23. 23. WORKSHOP1 - VNEG <ul><li>Why? </li></ul><ul><ul><li>General introduction to the power of Matlab </li></ul></ul><ul><ul><ul><li>typeless </li></ul></ul></ul><ul><ul><ul><li>shapeless </li></ul></ul></ul><ul><ul><li>Dealing with parallelism: </li></ul></ul><ul><ul><ul><li>multi-lane is straightforward </li></ul></ul></ul><ul><ul><ul><li>..but only on ‘pure parallel’ vector processing </li></ul></ul></ul><ul><ul><ul><ul><li>sequence of input data is independent (same with results) </li></ul></ul></ul></ul><ul><li>Workshop Target </li></ul><ul><ul><li>Introduction to the testbench style </li></ul></ul><ul><ul><li>Working with vectors </li></ul></ul><ul><ul><li>Introduction to vector/scalar issues </li></ul></ul>
  24. 24. WK1 - VNEG (description) <ul><li>What? </li></ul><ul><ul><li>Negate building block </li></ul></ul><ul><ul><ul><li>same code for all types and all shapes </li></ul></ul></ul><ul><li>Examples: </li></ul><ul><ul><li>1 </li></ul></ul><ul><ul><ul><li>z = neg(x) </li></ul></ul></ul><ul><ul><ul><ul><li>z depends on type and shape of x </li></ul></ul></ul></ul><ul><ul><ul><ul><li>default is a matrix of FP complex numbers </li></ul></ul></ul></ul><ul><ul><li>2 </li></ul></ul><ul><ul><ul><li>x=int16(bufA); </li></ul></ul></ul><ul><ul><ul><li>z = neg(x) </li></ul></ul></ul><ul><ul><ul><ul><li>z is a vector of shorts with the length of buffer A </li></ul></ul></ul></ul><ul><ul><li>3 </li></ul></ul><ul><ul><ul><li>bufA= gene(1:40) </li></ul></ul></ul><ul><ul><ul><li>x=int16(bufA); </li></ul></ul></ul><ul><ul><ul><li>z = neg(x) </li></ul></ul></ul><ul><ul><ul><ul><li>z is a 40 x short vector </li></ul></ul></ul></ul>
  25. 25. WK1 - VNEG (design & issues) <ul><li>Initially this is a straightforward design with only minor cosmetic issues: </li></ul><ul><ul><li>Test bench style might not be the best for everybody. </li></ul></ul><ul><li>Vector design is more involved </li></ul><ul><ul><li>Shape : input vector is cut in blocks </li></ul></ul><ul><ul><ul><li>need right equation for computing the length of each block </li></ul></ul></ul><ul><ul><li>Type: </li></ul></ul><ul><ul><ul><li>shall we keep it typeless? </li></ul></ul></ul><ul><ul><ul><li>modulo type might need more work than expected </li></ul></ul></ul><ul><ul><ul><ul><li>matlab intXX type is saturated. </li></ul></ul></ul></ul><ul><ul><ul><ul><li>ex: int16 ….. neg(-32768) -> 32767 </li></ul></ul></ul></ul><ul><ul><ul><ul><li>in C code (or modulo) int16 ….. neg(-32768) -> -32768 </li></ul></ul></ul></ul><ul><ul><ul><li>to create modulo type, careful to keep vectorized code (do not use “if construct” to detect overflow). </li></ul></ul></ul>
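A vectorized modulo negate can be sketched without any `if`. In MATLAB the only int16 value whose negation overflows is `intmin('int16')`: saturated arithmetic turns it into 32767, whereas two's-complement wrap-around leaves it at -32768. The names `z_sat` and `z_mod` below are illustrative, not taken from the tutorial files.

```matlab
% Negate in saturated and in modulo mode, keeping the code vectorized.
x = int16([-32768 -1 0 1 32767]);

z_sat = -x;                        % MATLAB default: -(-32768) clips to 32767

z_mod = -x;                        % start from the saturated result
z_mod(x == intmin('int16')) = intmin('int16');   % patch the single overflow case
```

The logical mask replaces the overflow test, so the same two lines handle a scalar, a vector or an array.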
  26. 26. WK1 - VNEG (list of files) <ul><li>Functions </li></ul><ul><ul><li>neg.m </li></ul></ul><ul><ul><li>neg40.m </li></ul></ul><ul><ul><li>neg40s.m </li></ul></ul><ul><ul><li>vneg40.m </li></ul></ul><ul><ul><li>vneg40s.m </li></ul></ul><ul><li>Testbench files </li></ul><ul><ul><li>run.m </li></ul></ul><ul><ul><li>mmain.m </li></ul></ul><ul><ul><li>test_manager.m </li></ul></ul><ul><ul><li>gene_manager.m </li></ul></ul>
  27. 27. WORKSHOP2 - ADD2 MODULO <ul><li>Why? </li></ul><ul><ul><li>Some processing elements require modulo arithmetic </li></ul></ul><ul><ul><ul><ul><li>checksum, coding </li></ul></ul></ul></ul><ul><ul><li>Some DSP algorithms have better accumulator behavior in modulo arithmetic than in saturated arithmetic. </li></ul></ul><ul><ul><li>Many DSP system designers prefer to use modulo instead of saturated arithmetic </li></ul></ul><ul><ul><ul><li>boys vs men? </li></ul></ul></ul><ul><li>Workshop Target </li></ul><ul><ul><li>Introduction to vector/scalar issues </li></ul></ul>
  28. 28. WK2 - ADD2 MODULO (description) <ul><li>What? </li></ul><ul><ul><li>32-bit 2-input adder (add2ml) </li></ul></ul><ul><ul><ul><li>l for long </li></ul></ul></ul><ul><ul><li>identically 16-bit, 8-bit </li></ul></ul><ul><ul><ul><li>s for short, b for byte </li></ul></ul></ul><ul><ul><li>32-bit 2-input adder (add2mul) </li></ul></ul><ul><ul><ul><li>ul for unsigned long </li></ul></ul></ul><ul><ul><li>identically 16-bit, 8-bit </li></ul></ul><ul><ul><ul><li>us for unsigned short, ub for unsigned byte </li></ul></ul></ul><ul><li>Example: </li></ul><ul><ul><li>z = add2ms(x,y) </li></ul></ul>
  29. 29. WK2 - ADD2 MODULO (design & issues) <ul><li>Input and output variables </li></ul><ul><ul><li>type: strongly typed : unsigned/signed (long,short,byte) </li></ul></ul><ul><ul><ul><ul><li>detection of overflow needs the width; </li></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>alternative is a multi-functional adder; this requires parameters, which is silly (or this becomes an alu) </li></ul></ul></ul></ul></ul><ul><ul><li>shape: scalar </li></ul></ul><ul><li>Control parameters? none; this is the simplest of adders </li></ul><ul><li>Issues with Matlab </li></ul><ul><ul><li>why not both scalar/vector? </li></ul></ul><ul><ul><ul><li>the if condition is easier to do in scalar mode. </li></ul></ul></ul><ul><ul><ul><li>in vector mode needs use of find. </li></ul></ul></ul>
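The scalar/vector difference can be seen on an unsigned byte adder. The sketch below is an assumption about what such code might look like (it is not the tutorial's add2mub.m): the scalar version detects overflow with `if`, the vector version with `find`.

```matlab
% Scalar mode: a plain if is enough.
x = uint8(200);
y = uint8(100);
z = uint16(x) + uint16(y);         % promote so the 9-bit sum fits
if z > 255
    z = z - 256;                   % wrap around
end
z = uint8(z);                      % demote

% Vector mode: find returns the positions that overflowed.
xv = uint8([200  10 255]);
yv = uint8([100  20   1]);
zv = uint16(xv) + uint16(yv);
k = find(zv > 255);                % indices of overflowing lanes
zv(k) = zv(k) - 256;
zv = uint8(zv);
```

Writing `if zv > 255` on a vector would only fire when every element overflows, which is why the vector form needs `find` or a logical mask.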
  30. 30. WK2 - ADD2 MODULO (list of files) <ul><li>Functions </li></ul><ul><ul><li>add2ml.m </li></ul></ul><ul><ul><li>add2ms.m </li></ul></ul><ul><ul><li>add2mb.m </li></ul></ul><ul><ul><li>add2mul.m </li></ul></ul><ul><ul><li>add2mus.m </li></ul></ul><ul><ul><li>add2mub.m </li></ul></ul><ul><li>Testbench files </li></ul><ul><ul><li>run.m </li></ul></ul><ul><ul><li>mmain.m </li></ul></ul><ul><ul><li>test_manager.m </li></ul></ul><ul><ul><li>gene_manager.m </li></ul></ul><ul><ul><li>print_manager.m </li></ul></ul>
  31. 31. WK2 - ADD2 MODULO (reviewing dfm work) <ul><li>Functions </li></ul><ul><ul><li>SHAPE: </li></ul></ul><ul><ul><ul><li>all units are vector based </li></ul></ul></ul><ul><ul><li>TYPE: </li></ul></ul><ul><ul><ul><li>all units expect and return a type </li></ul></ul></ul><ul><ul><li>add2ml.m: </li></ul></ul><ul><ul><ul><li>promote to double, should be int64 but Matlab does not allow it. </li></ul></ul></ul><ul><ul><ul><li>use of fix, to go from double to int </li></ul></ul></ul><ul><ul><li>add2mul.m: same </li></ul></ul><ul><ul><li>add2ms.m, add2mus.m </li></ul></ul><ul><ul><ul><li>nope </li></ul></ul></ul><ul><ul><li>add2mb.m: promote to int32, could be int16 </li></ul></ul><ul><ul><li>add2mub.m: same </li></ul></ul><ul><li>Testbench files </li></ul><ul><ul><li>run.m </li></ul></ul><ul><ul><li>mmain.m </li></ul></ul><ul><ul><li>test_manager.m </li></ul></ul><ul><ul><ul><li>devectorization -> find better -> buffer manager?? </li></ul></ul></ul><ul><ul><li>gene_manager.m </li></ul></ul><ul><ul><ul><li>sequences have larger width than width of operators; i.e. short sequences are used to test add2 bytes; normally only byte sequences should be used; it is enough to create overflow. </li></ul></ul></ul><ul><ul><li>dp_manager.m : none </li></ul></ul><ul><ul><li>print_manager: </li></ul></ul><ul><ul><ul><li>printf decimal is hardly readable for long; ulong uses hex but then hex is not exactly the answer for signed long. </li></ul></ul></ul>
  32. 32. WORKSHOP3 - FUSED ADDERS <ul><li>Why? </li></ul><ul><ul><li>To build multi-lane processing elements you need fused adders </li></ul></ul><ul><ul><ul><li>e.g. : any function with sum(..) </li></ul></ul></ul><ul><ul><ul><ul><li>accumulator= sum(x*k) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>sumabsdif = sum( abs(x-y)) </li></ul></ul></ul></ul><ul><ul><li>Example: </li></ul></ul><ul><ul><ul><li>16-lane sumabsdif </li></ul></ul></ul><ul><ul><li>Workshop target: </li></ul></ul><ul><ul><ul><li>improved test bench </li></ul></ul></ul><ul><ul><ul><li>fp vs fxp How to test? </li></ul></ul></ul><ul><ul><ul><li>use of checkers </li></ul></ul></ul><ul><ul><ul><ul><li>with max errors </li></ul></ul></ul></ul><ul><ul><ul><ul><li>quasi bit exact </li></ul></ul></ul></ul>fused adder (16 to 1) lane 14 lane 15 lane 1 lane 0 …
  33. 33. WK3 - FUSED ADDERS (description) <ul><li>What? </li></ul><ul><ul><li>16 to 1 (add161) </li></ul></ul><ul><ul><li>8 to 1 (add81) </li></ul></ul><ul><ul><li>4 to 1 (add41) </li></ul></ul><ul><ul><li>also 3-input adder (add31) </li></ul></ul><ul><li>Example: </li></ul><ul><ul><li>z = add161(x) </li></ul></ul>fused adder (16 to 1) Z X X(16) X(1) Indices are Matlab
  34. 34. WK3 - FUSED ADDERS (design & issues) <ul><li>Input variables </li></ul><ul><ul><li>a vector or separate variables? </li></ul></ul><ul><ul><ul><li>add161, add81 -> vector </li></ul></ul></ul><ul><ul><ul><li>add3 , add41 -> separate </li></ul></ul></ul><ul><li>Control parameters? </li></ul><ul><ul><li>not needed in basic units </li></ul></ul><ul><li>Issues with Matlab </li></ul><ul><ul><li>int16, uint16, int32 etc.. use saturated arithmetic </li></ul></ul><ul><ul><ul><li>-> less flexibility in design </li></ul></ul></ul><ul><ul><ul><li>z = (x0+x1) + (x2+x3) is different from z= (x0+x2) + (x1+x3) </li></ul></ul></ul>
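The order dependence of a saturating adder tree is easy to reproduce. The sketch below uses values picked for the purpose; `z1`, `z2` and `z_ref` are illustrative names, and MATLAB's 1-based `x(1)..x(4)` stand for the slide's x0..x3.

```matlab
% Two adder-tree pairings of the same four int16 inputs.
x = int16([30000 10000 -10000 -30000]);

z1 = (x(1) + x(2)) + (x(3) + x(4));   % both partial sums saturate first
z2 = (x(1) + x(3)) + (x(2) + x(4));   % no partial sum overflows

z_ref = sum(double(x));               % promoted reference, no overflow possible
```

With these inputs the first pairing clips its partial sums to 32767 and -32768 and ends at -1, while the second pairing and the promoted reference both give 0.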
  35. 35. WK3 - FUSED ADDERS (list of files) <ul><li>Functions </li></ul><ul><ul><li>add3.m </li></ul></ul><ul><ul><li>add161.m </li></ul></ul><ul><ul><li>add81.m </li></ul></ul><ul><ul><li>add41.m </li></ul></ul><ul><li>Testbench files </li></ul><ul><ul><li>run.m </li></ul></ul><ul><ul><li>mmain.m </li></ul></ul><ul><ul><li>test_manager.m </li></ul></ul><ul><ul><li>buffer_manager.m </li></ul></ul><ul><ul><li>gene_manager.m </li></ul></ul><ul><ul><li>dp_manager.m </li></ul></ul>
  36. 36. WORKSHOP 4 – ACCUMARRAY <ul><li>Why? </li></ul><ul><ul><li>Taken as example of a Matlab extensive function </li></ul></ul><ul><li>Example: 16-lane accumarray </li></ul><ul><li>Workshop target </li></ul><ul><ul><li>exploring several examples of fine-grain parallel implementations </li></ul></ul><ul><ul><li>2 to 16 lanes </li></ul></ul>fused adder (16 to 1) lane 14 lane 15 lane 1 lane 0 …
  37. 37. WK4 - ACCUMARRAY (description) <ul><li>What? </li></ul><ul><ul><li>simple: 1-lane structure(accumar1) </li></ul></ul><ul><ul><li>2-lane structure (accumar2) </li></ul></ul><ul><ul><li>4-lane structure (accumar4) </li></ul></ul><ul><ul><li>16-lane structure (accumar16) </li></ul></ul><ul><li>Example: </li></ul><ul><ul><li>z= accumarray(idx,x); MATLAB function </li></ul></ul><ul><ul><li>z= accumar16(idx,x,xlength,NACC) </li></ul></ul><ul><li>X and Z= vector of length L; </li></ul><ul><ul><li>L is a multiple of 16 </li></ul></ul><ul><li>Idx=index vector of length L </li></ul><ul><li>NACC =Number of ACCumulators </li></ul><ul><ul><li>typically 10 </li></ul></ul><ul><li>Xlength= length of x </li></ul>accumar16 Z X idx xlength NACC
  38. 38. WK4 - ACCUMARRAY (design & issues) <ul><li>Input variables </li></ul><ul><ul><li>always a vector; </li></ul></ul><ul><ul><ul><li>length must be a multiple of number of lanes </li></ul></ul></ul><ul><li>Control parameters? </li></ul><ul><ul><li>Not always necessary </li></ul></ul><ul><ul><ul><li>NACC: is a static parameter </li></ul></ul></ul><ul><ul><ul><li>xlength: simplifies caller job but it means that the unit must have a sequencer to do the for loop. </li></ul></ul></ul><ul><li>Issues with Matlab </li></ul><ul><ul><li>intXX.. adder use saturated arithmetic </li></ul></ul>
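One possible shape for a multi-lane accumarray is sketched below for two lanes. It is a hypothetical structure, not the tutorial's accumar2.m: each lane gets its own bank of `NACC` accumulators so that two lanes can hit the same index in the same step, and the banks are merged at the end. The built-in `accumarray` serves as the golden model.

```matlab
% Hypothetical 2-lane accumarray with one accumulator bank per lane.
idx  = [1 2 1 3 2 1];           % index vector, values in 1..NACC
x    = [5 6 7 8 9 10];          % data, length is a multiple of the lane count
NACC = 3;                       % static parameter: number of accumulators
NLANE = 2;

acc = zeros(NACC, NLANE);       % acc(:, lane) is the bank of one lane
for n = 1:NLANE:numel(x)        % internal sequencer: NLANE samples per step
    for lane = 1:NLANE
        k = n + lane - 1;
        acc(idx(k), lane) = acc(idx(k), lane) + x(k);
    end
end
z = sum(acc, 2);                % merge the lanes

zgolden = accumarray(idx(:), x(:), [NACC 1]);   % MATLAB reference
```

The loop over `n` is the sequencer that the `xlength` parameter implies; dropping it would leave the caller to feed `NLANE` samples per call.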
  39. 39. WK4 - ACCUMARRAY (list of files) <ul><li>Unit files </li></ul><ul><ul><li>accumar1.m </li></ul></ul><ul><ul><li>accumar2.m </li></ul></ul><ul><ul><ul><li>2 implementations </li></ul></ul></ul><ul><ul><li>accumar4.m </li></ul></ul><ul><ul><li>accumar16.m </li></ul></ul><ul><li>Testbench files </li></ul><ul><ul><li>run.m </li></ul></ul><ul><ul><li>mmain.m </li></ul></ul><ul><ul><li>test_manager </li></ul></ul><ul><ul><ul><li>evolution: test_model1 </li></ul></ul></ul><ul><ul><li>gene_manager.m </li></ul></ul><ul><ul><li>dp_manager.m </li></ul></ul><ul><ul><ul><li>evolution: dp_manager4  dp_manager3  dp_manager2 </li></ul></ul></ul>
  40. 40. WK4 - ACCUMARRAY (test bench evolution) <ul><li>Testbench files </li></ul><ul><ul><li>Model 0 </li></ul></ul><ul><ul><ul><li>run example given by Matlab on command line </li></ul></ul></ul><ul><ul><li>Model 1 = </li></ul></ul><ul><ul><ul><li>Model 0 </li></ul></ul></ul><ul><ul><ul><li>+ several examples </li></ul></ul></ul><ul><ul><ul><li>+ test_manager </li></ul></ul></ul><ul><ul><ul><li>+ some functional optimization </li></ul></ul></ul><ul><ul><ul><li>+ keep design simple </li></ul></ul></ul><ul><ul><ul><ul><li>2-lane, 4-lane, 16-lane (m-lane) </li></ul></ul></ul></ul><ul><ul><li>Model 2= </li></ul></ul><ul><ul><ul><li>Model 1 </li></ul></ul></ul><ul><ul><ul><li>+ structured test bench (based on script) -> several separate files </li></ul></ul></ul><ul><ul><ul><li>+ gene_manager, </li></ul></ul></ul><ul><ul><ul><li>+ golden model and check_manager </li></ul></ul></ul><ul><ul><ul><li>+ dp_manager </li></ul></ul></ul><ul><ul><ul><ul><li>some datapath optimization </li></ul></ul></ul></ul><ul><ul><li>Model 3= </li></ul></ul><ul><ul><ul><li>Model 2 </li></ul></ul></ul><ul><ul><ul><li>+ fxp data </li></ul></ul></ul><ul><ul><ul><li>+ check is done against a range around the golden value, not bit exact. </li></ul></ul></ul><ul><ul><li>Model X = </li></ul></ul><ul><ul><ul><li>model 2 +++ </li></ul></ul></ul><ul><ul><ul><li>mapping accumar to a Vector processing unit </li></ul></ul></ul>
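A range-based check of fixed-point data against a floating-point golden model could be sketched as follows. The names `zgolden`, `maxerr` and `differ` follow the testbench variable list; the 1q15 quantisation and the one-LSB tolerance are assumptions made for this example.

```matlab
% Compare a 1q15 result with its floating-point golden value.
zgolden = [0.3 -0.7 0.1];                   % floating-point reference

z_q15 = int16(round(zgolden * 2^15));       % quantise to 1q15 (16-bit fractional)
z     = double(z_q15) / 2^15;               % back to real values for the check

maxerr = 2^-15;                             % tolerance: one LSB (assumption)
differ = abs(z - zgolden);                  % per-sample error
pass   = all(differ <= maxerr);             % range check, not bit exact
```

Rounding to the nearest 1q15 value leaves at most half an LSB of error, so a one-LSB bound passes here; a bit-exact check would instead compare `z_q15` with a fixed-point golden vector.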
  41. 41. WK7 - VPU <ul><li>What? </li></ul><ul><ul><li>40-lane structure (vneg40) </li></ul></ul><ul><li>X and Z= vector of length L; </li></ul><ul><ul><li>L is a multiple of 40 </li></ul></ul><ul><li>Xlength= length of x </li></ul>vneg40 Z X xlength xstride bufX bufZ dst@ Matlab hidden src@
  42. 42. WK7 - VPU (design & issues) <ul><ul><li>Shape : input vector is cut in blocks </li></ul></ul><ul><ul><ul><li>need right equation for computing the length of each block </li></ul></ul></ul><ul><ul><li>how many parameters </li></ul></ul><ul><ul><li>shall we keep it typeless? </li></ul></ul>
  43. 43. WORKSHOP 10 - FILTER functions <ul><li>Why? </li></ul><ul><ul><li>Filters are the most common basic building blocks in DSP </li></ul></ul><ul><ul><li>Benchmarks are based on filters </li></ul></ul><ul><li>Workshop targets </li></ul><ul><ul><li>standard methodology for all filters </li></ul></ul><ul><ul><li>architecture issues </li></ul></ul><ul><ul><li>a working simple block </li></ul></ul><ul><ul><li>mega block: pmavf256 </li></ul></ul>
  44. 44. WK10 - FILTER functions (description) <ul><li>What? </li></ul><ul><ul><li>Filter building blocks </li></ul></ul><ul><li>Examples </li></ul><ul><ul><li>iir1 </li></ul></ul><ul><ul><li>pmavf </li></ul></ul><ul><ul><li>pmavf256 </li></ul></ul><ul><ul><li>lms </li></ul></ul><ul><ul><li>nlms </li></ul></ul>
  45. 45. WK10 - FILTER functions (design & issues) <ul><li>Defining and classifying filter bb </li></ul><ul><ul><li>fir, cplxfir, lms, iir, biquad, lattice, acf </li></ul></ul><ul><li>Defining and classifying filter characteristics </li></ul><ul><ul><li>block, stream, </li></ul></ul><ul><ul><li>number of states (or taps) </li></ul></ul><ul><ul><li>all parameters? flexibility </li></ul></ul><ul><li>Defining variable names </li></ul><ul><ul><li>input: bufin </li></ul></ul><ul><ul><li>states: x,w </li></ul></ul><ul><ul><li>output: z,bufout </li></ul></ul><ul><li>Is memory inside or outside datapath? </li></ul><ul><ul><li>outside is better idea </li></ul></ul>
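Keeping the memory outside the datapath can be sketched with a first-order IIR. The unit below, `iir1_step`, is a hypothetical stateless function invented for this example (it is not the tutorial's iir1.m); the caller owns the state `w` and the buffers `bufin` and `bufout`, and the coefficients `a` and `b` are arbitrary.

```matlab
% Stateless datapath: one sample in, one sample out, state passed in.
iir1_step = @(xin, w, a, b) b * xin + a * w;    % z(n) = b*x(n) + a*z(n-1)

bufin  = [1 0 0 0];             % impulse as test input
bufout = zeros(size(bufin));
a = 0.5;
b = 1;
w = 0;                          % filter state, held by the caller

for n = 1:numel(bufin)          % sample-by-sample flow processing
    z = iir1_step(bufin(n), w, a, b);
    w = z;                      % state update happens outside the unit
    bufout(n) = z;
end

bufgolden = filter(b, [1 -a], bufin);   % MATLAB reference for the same filter
```

Because the unit has no internal memory, the same function can serve a sample, streaming or block testbench; only the loop around it changes.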
  46. 46. WK10 - FILTER functions (list of files) <ul><li>IIR1 project </li></ul><ul><ul><li>Unit files </li></ul></ul><ul><ul><ul><li>iir1.m </li></ul></ul></ul><ul><ul><ul><li>iir1_q15.m </li></ul></ul></ul><ul><ul><li>Testbench files </li></ul></ul><ul><ul><ul><li>run.m, mmain.m </li></ul></ul></ul><ul><ul><ul><li>test_manager.m, gene_manager.m, dp_manager.m </li></ul></ul></ul><ul><li>PMAVF project </li></ul><ul><ul><li>Unit files </li></ul></ul><ul><ul><ul><li>pmavf.m </li></ul></ul></ul><ul><ul><ul><li>pmavfsec_q15.m </li></ul></ul></ul><ul><ul><ul><li>pmavf256_q15.m </li></ul></ul></ul><ul><ul><li>Testbench files </li></ul></ul><ul><ul><ul><li>run.m, mmain.m </li></ul></ul></ul><ul><ul><ul><li>test_manager.m, gene_manager.m, dp_manager.m </li></ul></ul></ul><ul><li>LMS project </li></ul><ul><ul><li>Unit files </li></ul></ul><ul><ul><ul><li>lms1.m </li></ul></ul></ul><ul><ul><ul><li>lms1_q13.m </li></ul></ul></ul><ul><ul><ul><li>lmsVVl.m </li></ul></ul></ul><ul><ul><li>Testbench files </li></ul></ul><ul><ul><ul><li>run.m, mmain.m </li></ul></ul></ul><ul><ul><ul><li>test_manager, gene_manager.m, dp_manager.m, print_manager </li></ul></ul></ul><ul><li>NLMS project </li></ul><ul><ul><li>Unit files </li></ul></ul><ul><ul><ul><li>nlms.m </li></ul></ul></ul><ul><ul><ul><li>lms units to be used as comparison </li></ul></ul></ul><ul><ul><li>Testbench files </li></ul></ul><ul><ul><ul><li>run.m, mmain.m </li></ul></ul></ul><ul><ul><ul><li>test_manager, gene_manager.m, dp_manager.m, print_manager </li></ul></ul></ul><ul><ul><ul><li>dp_manager1.m, dp_manager4.m ??? </li></ul></ul></ul>
  47. 47. WK10 - FILTER functions (further work) <ul><li>IIR1 project </li></ul><ul><ul><li>Unit files </li></ul></ul><ul><ul><ul><li>from 3 port to 2 port to 1 port units </li></ul></ul></ul><ul><ul><ul><ul><li>(see single page doc) </li></ul></ul></ul></ul><ul><ul><li>Testbench files; na </li></ul></ul><ul><li>PMAVF project </li></ul><ul><ul><li>Unit files </li></ul></ul><ul><ul><ul><li>optimised pmavf256_q15.m using spuds graph </li></ul></ul></ul><ul><ul><ul><ul><li>(see single page doc) </li></ul></ul></ul></ul><ul><ul><li>Testbench files: na </li></ul></ul><ul><li>LMS project </li></ul><ul><ul><li>Unit files: ?? </li></ul></ul><ul><ul><li>Testbench files: na </li></ul></ul><ul><li>NLMS project </li></ul><ul><ul><li>Unit files: get rid of lms units to be used as comparison </li></ul></ul><ul><ul><li>Testbench files </li></ul></ul><ul><ul><ul><li>dp_manager1.m, dp_manager4.m what the heck??? </li></ul></ul></ul>
  48. 48. WK11 - MATLAB functions <ul><li>What? </li></ul><ul><ul><li>Matlab functions </li></ul></ul><ul><li>Issues </li></ul><ul><ul><li>Black box </li></ul></ul><ul><ul><ul><li>Understanding usage of functions </li></ul></ul></ul><ul><ul><ul><li>Run examples </li></ul></ul></ul><ul><ul><ul><li>How much compatibility? </li></ul></ul></ul><ul><ul><ul><ul><li>types of i/o (matrix, complex) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>all parameters? flexibility </li></ul></ul></ul></ul><ul><ul><li>Internal box </li></ul></ul><ul><ul><ul><li>how to get M-code? </li></ul></ul></ul><ul><ul><ul><ul><li>write your own code </li></ul></ul></ul></ul><ul><ul><ul><ul><li>look for libraries (C, ...) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>octave, scilab </li></ul></ul></ul></ul>
  49. 49. WK11 - EQUALIZER <ul><li>Why? </li></ul><ul><ul><li>A good example of applying a DSP function </li></ul></ul><ul><ul><li>Seen in: </li></ul></ul><ul><ul><ul><li>NLMS tips and tricks sigMag proc. </li></ul></ul></ul><ul><ul><ul><li>Optical receiver </li></ul></ul></ul><ul><li>What? </li></ul><ul><ul><li>NLMS: from equations to matlab code </li></ul></ul><ul><ul><li>Matlab functions: architecting </li></ul></ul><ul><ul><ul><li>defining an equalizer: normlms,lineareq </li></ul></ul></ul><ul><ul><ul><li>running an equalizer </li></ul></ul></ul><ul><ul><li>my former work (equalTDMA, equal): from Matlab code to FXP code </li></ul></ul><ul><ul><li>others </li></ul></ul>equalizer Z X nbr of taps bufY bufX bufZ dst@ src@ Y adaptfir type
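Going from the NLMS equations to MATLAB code might start from a sketch like the one below. It uses the textbook update, e(n) = d(n) - w'u(n) followed by w = w + mu e(n) u(n) / (delta + u'(n) u(n)), in floating point only. Everything specific here is an assumption for illustration: the two-tap system `h`, the deterministic test input, the step size `mu` and the regularisation `delta`. It is not the tutorial's nlms.m.

```matlab
% Textbook NLMS identifying a known 2-tap system (illustrative values).
h  = [0.5; -0.25];              % unknown system to identify (assumption)
N  = 200;
n  = (1:N)';
x  = mod(7 * n, 5) - 2;         % deterministic input cycling 0, 2, -1, 1, -2
d  = filter(h, 1, x);           % desired signal: system output, no noise

w     = zeros(2, 1);            % adaptive taps
u     = zeros(2, 1);            % tap delay line, kept outside the update
mu    = 0.5;                    % step size, 0 < mu < 2
delta = 1e-6;                   % regularisation against division by zero

for k = 1:N
    u = [x(k); u(1)];                          % shift the new sample in
    e = d(k) - w' * u;                         % a priori error
    w = w + mu * e * u / (delta + u' * u);     % normalised update
end
```

With a noise-free desired signal the tap error cannot grow from one step to the next, so `w` should drift toward `h`; a fixed-point version would then replace each product and sum with its FXP counterpart.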
