0507036

my first seminar slide.

  1. REFERENCE: Published by the IEEE Computer Society, July 2008. Presented by: Md. Merazul Islam (0507036), Dept. of CSE, KUET
  2. WARP PROCESSING?
       • Dynamically optimizes software to improve execution time and energy consumption.
       • A new architecture implemented with both hardware and software.
       • Transforms binary kernels into FPGA circuits.
       • Fully dynamic: generates entire coprocessing circuits, going beyond individual functional units.
       • Can also work with multiple processors.
  3. FPGA CIRCUIT?
       • Field-Programmable Gate Array (FPGA): programmable hardware.
       • FPGAs do bit manipulation fast.
       • FPGAs aren't part of mainstream computing.
       • Supports any compiler, any language, multiple sources, etc.
     Figure: In the CAD-oriented FPGA, the configurable logic block inputs and outputs are directly connected to the switch matrices.
  4. WARP ARCHITECTURE
     Components: µP, I$, D$, FPGA, Profiler, Dynamic Partitioning Module (DPM).
       • Step 1 (µP, I$, D$): Initially execute the application in software only.
       • Step 2 (Profiler): Profile the application to determine critical regions.
       • Step 3 (DPM): Partition critical regions to hardware.
       • Step 4 (FPGA): Program the configurable logic and update the software binary.
       • Step 5: The partitioned application executes faster with lower energy consumption.
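The profiling step (step 2) can be sketched in a few lines. This is only an illustration under an assumption the slide does not state: that critical regions are loops, detected by counting taken backward branches. The function name `find_critical_regions` and the trace format are hypothetical.

```python
from collections import Counter

def find_critical_regions(trace, top_n=1):
    """Rank loops by how often their backward branch is taken.

    trace: iterable of (branch_pc, target_pc) pairs for taken branches.
    A branch whose target is at or before its own address is treated
    as a loop back edge (an assumption of this sketch).
    """
    counts = Counter()
    for pc, target in trace:
        if target <= pc:                  # backward branch: likely a loop
            counts[(target, pc)] += 1     # region = (loop start, loop end)
    return counts.most_common(top_n)

# Hypothetical trace: one hot loop, one cold loop, some forward branches.
trace = ([(0x120, 0x100)] * 1000
         + [(0x240, 0x200)] * 10
         + [(0x300, 0x400)] * 50)
hot = find_critical_regions(trace)
```

Here `hot` holds the single most frequently executed loop region, which is the candidate that would be handed to the partitioning step.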
  5. WARP PROCESSING STEPS
     Figure: warp processor (µP, I$, D$, FPGA, Profiler, DPM running the CAD tools). Stages shown: Decompilation, Partitioning, RT Synthesis, JIT FPGA Compilation (Logic Synthesis, Tech. Mapping/Packing, Placement, Routing), Binary Updater. Data shown: Binary, Std. HW Binary, HW Bitstream, Updated Binary.
  6. WARP PROCESSING STEPS
       • Dynamic Binary Translation
       • Decompilation:
           • Recovers high-level information lost during compilation.
           • Uses sophisticated decompilation methods.
           • Figure annotations: discover loops, if-else, etc.; reduce operation sizes, etc.; reroll loops, etc.
       • RT Synthesis:
           • Converts the decompiled control/data flow graph (CDFG) to Boolean expressions.
           • Detects memory reads and writes, memory access patterns, and read/write ordering.
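As a small illustration of the "reduce operation sizes" annotation: a binary stores values in full-width registers, but a hardware operator only needs as many bits as its value range requires. The sketch below assumes unsigned values with known maximums; the helper names are hypothetical and this is not the decompiler's actual analysis.

```python
def required_bits(max_value: int) -> int:
    """Unsigned bit width needed to hold every value in 0..max_value."""
    if max_value < 0:
        raise ValueError("this sketch handles unsigned ranges only")
    return max(1, max_value.bit_length())

def adder_width(a_max: int, b_max: int) -> int:
    """Width of an adder whose operands never exceed a_max and b_max."""
    return required_bits(a_max + b_max)

# Two operands held in 32-bit registers but known to stay within 0..255
# need only a 9-bit adder in hardware, not a 32-bit one.
width = adder_width(255, 255)
```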
  7. WARP PROCESSING STEPS
       • Logic Synthesis: Optimizes the hardware circuit created during RT synthesis.
       • Technology Mapping/Packing:
           • Decomposes the hardware circuit into basic logic gates.
           • Traverses the logic network, combining nodes to form single-output.
       • Placement: Identifies the critical path and places critical nodes in the center of the configurable logic fabric.
       • Routing:
           • Finds a path within the FPGA to connect the source and sinks of each net.
           • Represents routing nets between configurable logic blocks (CLBs) as routing between switch matrices (SMs).
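The routing step is, at its core, a path search over the switch-matrix grid. The sketch below is a generic breadth-first maze router, not the router used by the tools described in the slides; the grid model, the `blocked` set (standing in for already-used resources), and the name `route_net` are assumptions made for illustration.

```python
from collections import deque

def route_net(width, height, source, sink, blocked=frozenset()):
    """Return a shortest list of (x, y) cells from source to sink, or None.

    Each cell stands for one switch matrix; moves go to the four
    neighbouring cells. Cells in `blocked` are unavailable.
    """
    prev = {source: None}              # cell -> cell it was reached from
    queue = deque([source])
    while queue:
        cell = queue.popleft()
        if cell == sink:
            path = []
            while cell is not None:    # walk back to the source
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if (0 <= nx < width and 0 <= ny < height
                    and nxt not in blocked and nxt not in prev):
                prev[nxt] = cell
                queue.append(nxt)
    return None                        # net cannot be routed

# Route around two occupied cells on a 4x4 grid.
path = route_net(4, 4, (0, 0), (3, 0), blocked={(1, 0), (2, 0)})
```

A net with several sinks could be handled by calling the search once per sink; sharing wire segments between those paths is left out of this sketch.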
  8. RESULTS
       • Execution time and memory requirements of:
         (a) a commercial FPGA CAD tool running on a desktop workstation,
         (b) the Riverside Dynamic CAD (RDCAD) tools on the same workstation, and
         (c) the RDCAD tools on a lean 40-MHz ARM7 processor.

         Setup   Size     Time
         (a)     120 MB   3 min
         (b)     3.6 MB   0.108 s
         (c)     3.6 MB   1.11 s
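The ratios implied by these measurements can be worked out directly. The snippet below uses only the numbers in the table; the variable names are illustrative.

```python
# (size in MB, time in seconds) for each setup in the table
results = {
    "a": (120.0, 3 * 60),   # commercial CAD tool, workstation (3 min = 180 s)
    "b": (3.6, 0.108),      # RDCAD, same workstation
    "c": (3.6, 1.11),       # RDCAD, 40-MHz ARM7
}

memory_ratio = results["a"][0] / results["b"][0]        # about 33x less memory
time_ratio_desktop = results["a"][1] / results["b"][1]  # about 1667x faster
time_ratio_arm7 = results["a"][1] / results["c"][1]     # about 162x faster
arm7_slowdown = results["c"][1] / results["b"][1]       # about 10x slower than (b)

print(f"memory: {memory_ratio:.1f}x, desktop: {time_ratio_desktop:.0f}x, "
      f"ARM7: {time_ratio_arm7:.0f}x, ARM7 vs desktop RDCAD: {arm7_slowdown:.1f}x")
```

Even on the 40-MHz ARM7, the RDCAD run in (c) finishes roughly 160 times sooner than the commercial tool does on the workstation in (a).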
  9. SPEEDUP COMPARISON
       • [a] Comparison of software execution on a digital signal processor (DSP) and warped execution on a warp processor to a 200-MHz ARM9 on single-threaded applications.
       • [b] Comparison of multithreaded application speedups on various 400-MHz ARM11-based multiprocessors and warp processors.
  10. CONCLUSION
       • Warp processing demonstrates the technique while opening the door to new challenges.
       • Speedups of 2x to 100x, or even more.
       • 20x less memory usage.
       • 10% more routing resource usage.
       • 38% to 94% power reduction.
       • In the near future, we expect warp processors to achieve speedups much greater than an order of magnitude.