3D-DRESD Lorenzo Pavesi



  1. Università Milano Bicocca. Study of parallel compilation techniques for reconfigurable architectures. Pavesi Lorenzo, 071042
  2. Agenda
       • XiRisc hybrid processor
       • PiCoGa and GriffyC
       • SUIF
       • Compiler for PiCoGa
       • Experimental results
  3. Hybrid Processors
       • A simple core is no longer sufficient
           - Higher performance
           - Lower cost (power, area)
       • Configurable cores
           - ISA specialization (Xtensa Tensilica, ARC)
           - Well suited to Digital Signal Processing and bit-level manipulation applications
       • Future developments: reconfigurable cores
           - GarpChip, ASH
  4. XiRisc+PiCoGa and GriffyC
       • 32-bit RISC microcontroller
       • 2-issue VLIW architecture
       • 5-stage pipeline
       • Configurable ISA
       • Reconfigurable component
           - PiCoGa 16x24 RLC
       • GriffyC, a superset of ANSI C
           - DFG style
  5. PGAop
       • DFG
       • Multi-context (4 configurations, 1 in execution)
     [Figure: data-flow graph with three + nodes, inputs A, B, C, D and output Y]
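To make the DFG idea on the PGAop slide concrete, here is a minimal sketch of a data-flow graph evaluated in software. Everything in it is illustrative: the `dfg_node` type, `dfg_eval`, and the assumed graph shape Y = (A + B) + (C + D) are not taken from the deck or from any PiCoGa tool. The slide only shows three + nodes, four inputs and one output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One node of a toy data-flow graph: an input or an addition. */
typedef enum { NODE_INPUT, NODE_ADD } node_kind;

typedef struct {
    node_kind kind;
    size_t lhs, rhs;   /* operand node indices (NODE_ADD only) */
    int32_t value;     /* input value, or the computed result */
} dfg_node;

/* Evaluate the nodes in index order and return the value of the last one.
 * The nodes must be topologically sorted: operands come before their users. */
int32_t dfg_eval(dfg_node *nodes, size_t n)
{
    assert(n > 0);
    for (size_t k = 0; k < n; k++) {
        if (nodes[k].kind == NODE_ADD) {
            assert(nodes[k].lhs < k && nodes[k].rhs < k);
            nodes[k].value = nodes[nodes[k].lhs].value
                           + nodes[nodes[k].rhs].value;
        }
    }
    return nodes[n - 1].value;
}

/* Assumed shape of the graph on the slide: Y = (A + B) + (C + D). */
int32_t example_y(int32_t a, int32_t b, int32_t c, int32_t d)
{
    dfg_node g[] = {
        { NODE_INPUT, 0, 0, a },   /* 0: A */
        { NODE_INPUT, 0, 0, b },   /* 1: B */
        { NODE_INPUT, 0, 0, c },   /* 2: C */
        { NODE_INPUT, 0, 0, d },   /* 3: D */
        { NODE_ADD,   0, 1, 0 },   /* 4: A + B */
        { NODE_ADD,   2, 3, 0 },   /* 5: C + D, independent of node 4 */
        { NODE_ADD,   4, 5, 0 },   /* 6: Y */
    };
    return dfg_eval(g, sizeof g / sizeof g[0]);
}
```

Nodes 4 and 5 have no data dependency on each other, which is the property a spatial fabric can exploit by computing them in the same step.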
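  6. GriffyC
     Loop in assembly:
       L1: sub a,a,2
           rol b,b,a
           add d,d,a
           add c,b,d
           add i,i,1
           bnz c,L1
     [Figure: data-flow graph of the loop body, with nodes sub a,a,2; add d,d,a; rol b,b,a; add c,b,d; add i,i,1 and values A, D, I, B]
     Second listing on the slide:
       L1: sub a,2
           rol b,a
           add d,a
           add c,b,d
           add i,i,1
           bnz c,L1
     Resulting instruction:
       PGAop a,b,d,i
     C source of the loop:
       [..]
       for(;c!=0;i++){
         a=a-2;
         b=b<<a;
         d=d+a;
         c=b+d;
       }
       [..]
     C source after mapping the loop body to a PGAop:
       [..]
       PD_0=pga_allocate(myPGAop);
       [..]
       for(;c!=0;i++){
         pgadirect1(PD_0, a,i,b,d);
       }
       [..]
       pga_deallocate(myPGAop);
       [..]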
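The loop on the GriffyC slide can be modelled in plain C to see which values one PGAop invocation consumes and produces. This is only a software stand-in under stated assumptions: `loop_state`, `kernel_step` and `run_kernel` are hypothetical names, and this is not the real `pgadirect1` interface. The shift amount is masked to 5 bits so that the C shift stays well defined, and an iteration bound is added so that the sketch always terminates.

```c
#include <assert.h>
#include <stdint.h>

/* Live values of the loop shown on the slide. */
typedef struct {
    uint32_t a, b, c, d, i;
} loop_state;

/* One iteration of: a=a-2; b=b<<a; d=d+a; c=b+d; i++
 * Hypothetical software model of the work done by one PGAop call. */
loop_state kernel_step(loop_state s)
{
    s.a = s.a - 2u;
    /* Assumption: only the low 5 bits of a are used as the shift amount. */
    s.b = s.b << (s.a & 31u);
    s.d = s.d + s.a;
    s.c = s.b + s.d;
    s.i = s.i + 1u;
    return s;
}

/* Model of for(;c!=0;i++), with an added iteration bound (an assumption)
 * so that the model cannot loop forever. */
loop_state run_kernel(loop_state s, uint32_t max_iters)
{
    assert(max_iters > 0u);
    while (s.c != 0u && max_iters > 0u) {
        s = kernel_step(s);
        max_iters--;
    }
    return s;
}
```

Each call to `kernel_step` replaces five scalar instructions, which is the saving the slide illustrates by collapsing the loop body into a single PGAop.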
  7. SUIF
       • Compiler infrastructure
           - ( http://suif.stanford.edu/ )
       • Oriented to research and development
       • Modular compilation passes
       • Extensible system
     [Figure: SUIF modules. Suifdriver; Passes (analyses, optimization); IR (suifnodes, basicnodes); Kernel (suifkernel, iokernel)]
  8. Machine SUIF
       • Allows the construction of back ends
       • Machine-level intermediate forms
       • Description of the target architecture
     [Figure: the OPI sits between the optimization and analysis algorithms, the target machines, and the compilation environment (SUIF)]
     [Figure: Suif (v.2.1); Machine SUIF-IR (defined here in the machine ir.hoof file); OPI. Library labels: cfa, bvd, suifvm, x86, alpha, cma / ssa, picovm, ksta, ex1, m2gc, Str.Anl, machine, cfg, ssa. Annotations: "Parametrized", "Target dependent", "Compilation Environment is defined"]
  9. Compilation flow for PiCoGA
     Stages shown on the slide (steps numbered 1 to 3):
       • C to SUIF
       • LIR
       • MACHINE-SUIF CFG
       • STRUCTURAL ANALYSIS
       • KERNEL IDENTIFICATION
           - Innermost while-regions
           - "PiCoGa basic block" marking
           - Selection of while-region sub-trees that contain only PiCoGa basic blocks
       • KERNEL EXTRACTION
           - Kernel ranking
           - Kernel encapsulation
       • PiCoGa kernel translation
           - SSA representation
           - Cti -> Cmove replacement
           - Independent of identification: manually selected kernels can also be translated
       • GRIFFY-C COMPILER
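The kernel identification step can be sketched on a toy control tree. This is one possible reading of the three bullets on the slide, not the actual Machine SUIF pass: the `region` type, the `pga_ok` mark and the rule "a while-region with no nested while-region and only marked blocks is a kernel" are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { REGION_BLOCK, REGION_SEQ, REGION_IF, REGION_WHILE } region_kind;

/* Node of a toy control tree, as produced by structural analysis. */
typedef struct region {
    region_kind kind;
    bool pga_ok;                 /* "PiCoGa basic block" mark (blocks only) */
    const struct region **kids;  /* child regions */
    size_t nkids;
} region;

/* True if every basic block under r carries the mark. */
static bool only_pga_blocks(const region *r)
{
    if (r->kind == REGION_BLOCK)
        return r->pga_ok;
    for (size_t k = 0; k < r->nkids; k++)
        if (!only_pga_blocks(r->kids[k]))
            return false;
    return true;
}

/* True if r is, or contains, a while-region. */
static bool contains_while(const region *r)
{
    if (r->kind == REGION_WHILE)
        return true;
    for (size_t k = 0; k < r->nkids; k++)
        if (contains_while(r->kids[k]))
            return true;
    return false;
}

/* Assumed rule: innermost while-region made only of marked blocks. */
static bool is_kernel(const region *r)
{
    if (r->kind != REGION_WHILE)
        return false;
    for (size_t k = 0; k < r->nkids; k++)
        if (contains_while(r->kids[k]))
            return false;
    return only_pga_blocks(r);
}

/* Count the candidate kernels in the subtree rooted at r. */
size_t count_kernels(const region *r)
{
    assert(r != NULL);
    size_t n = is_kernel(r) ? 1 : 0;
    for (size_t k = 0; k < r->nkids; k++)
        n += count_kernels(r->kids[k]);
    return n;
}

/* Toy tree: w1 is a kernel, w2 has an unmarked block,
 * w3 is not innermost, and w4 (inside w3) is a kernel. */
size_t example_tree_kernels(void)
{
    static const region blk_ok  = { REGION_BLOCK, true,  NULL, 0 };
    static const region blk_bad = { REGION_BLOCK, false, NULL, 0 };
    static const region *w1_kids[] = { &blk_ok };
    static const region w1 = { REGION_WHILE, false, w1_kids, 1 };
    static const region *s2_kids[] = { &blk_ok, &blk_bad };
    static const region s2 = { REGION_SEQ, false, s2_kids, 2 };
    static const region *w2_kids[] = { &s2 };
    static const region w2 = { REGION_WHILE, false, w2_kids, 1 };
    static const region *w4_kids[] = { &blk_ok };
    static const region w4 = { REGION_WHILE, false, w4_kids, 1 };
    static const region *w3_kids[] = { &w4 };
    static const region w3 = { REGION_WHILE, false, w3_kids, 1 };
    static const region *root_kids[] = { &w1, &w2, &w3 };
    static const region root = { REGION_SEQ, false, root_kids, 3 };
    return count_kernels(&root);
}
```

In a real flow the mark on each block would come from checking its instructions against what the reconfigurable array can implement. Here it is simply given as input.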
  10. GriffyC generation
      [Figure: C, SUIF, SUIF (LIR) with dismantling of the control structures, Machine SUIF CFG. A FileSetBlock contains FileBlocks, and each FileBlock contains procedures]
      Generated file shown on the slide:
        ...... ........ ......
        #ifndef PICOHEADER__provaTmp1
        #define PICOHEADER__provaTmp1
        #pragma fpga _provaTmp1 0x00 0 0
        {
          /* Virtual register declarations */
          void * _vr0;
          double _vr1;
          float _vr2;
          _vr4 = (float (*)[1])part_amplitude;
          _vr5 = (float *)_vr4;
          _vr6 = (float *)((char *)_vr5 + _vr3);
          _vr7 = *_vr6;
          _vr2 = (float)_vr7;
          _vr1 = (double)_vr2;
          printf(_vr0, i, _vr1);
        }
        #pragma end
        #endif /*PICOHEADER__provaTmp1*/
  11. GriffyC generation
      [Figure: annotated Machine SUIF (FileSetBlock, FileBlock, procedure, kernel) processed through picovm and the Control Tree to produce the PICOHEADER files. Steps: 1 Structural Analysis; 2 Selection; 3 Ranking & Extraction; then SSA and M2GC. Annotations: optimizations on the type of selection; optimizations on the kernel body. The PICOHEADER file shown is the same _provaTmp1 snippet as on slide 10]
  12. Tests and results
       • Video coding applications
           - iDCT, quantization
      [Figure: image coding and decoding block diagram. Block labels: original image, Block division, DCT, Quantize, Entropy Encoder, Storage, Entropy Decoder, Dequantize, IDCT, Reconstruct, image]
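For readers unfamiliar with the quantization step named among the test applications, the following is a minimal sketch of uniform scalar quantization and its inverse. It is a generic textbook form, not the benchmark code used in the deck. The function names and the single `step` parameter are assumptions, and coefficients are assumed to stay well inside the `int32_t` range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Quantize one coefficient: divide by step, rounding to nearest
 * (halves round away from zero). */
int32_t quantize_coef(int32_t x, int32_t step)
{
    assert(step > 0);
    if (x >= 0)
        return (x + step / 2) / step;
    return -((-x + step / 2) / step);
}

/* Reconstruct an approximation of the original coefficient. */
int32_t dequantize_coef(int32_t q, int32_t step)
{
    assert(step > 0);
    return q * step;
}

/* Quantize a block of n coefficients. This kind of small, regular loop
 * over independent elements is a typical candidate for acceleration. */
void quantize_block(const int32_t *coef, int32_t *out, size_t n, int32_t step)
{
    for (size_t k = 0; k < n; k++)
        out[k] = quantize_coef(coef[k], step);
}
```

The round trip loses at most half a step per coefficient: with `step` 8, the value 37 quantizes to 5 and is reconstructed as 40.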
  13. Tests and results
  14. Conclusions
       • A complete compilation flow was implemented
           - A good number of kernels identified
           - Kernels of small to medium size
       • Stable and sufficiently efficient prototype
  15. Future developments
       • More advanced selection strategies
       • Integration with the FastGriffy compiler
       • New optimization passes
       • Interprocedural analyses
           - Increase the average size of the kernels that can be accelerated
       • Updates to follow the evolution of PiCoGa
  16. Questions?
