IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present these slides to all hungry engineers who are tired of CPUs, GPUs, FPGAs, tensor cores, and AI cores, who want a challenging architecture with no black box inside, and who want to improve it themselves.
PBL1-v1-014e.pptx
1. CPU GPU
Ultimate CGRA w/ high-speed compiler
CGRA for Energy-efficient Cryptography
Beyond-Neuromorphic Systems
Non-Deterministic Computing
20210401
IMAX3: Amazing Dataflow-Centric CGRA
and its difference from CPU/Vector ep.14
The Computing Architecture Laboratory at Nara Institute of Science and Technology is now targeting power-efficient computers to help curb global warming. I present this video to all hungry engineers who are tired of CPUs, GPUs, FPGAs, tensor cores, and AI cores, who want a challenging architecture with no black box inside, and who want to improve it themselves. This video follows episode 13 and focuses on how a CGRA differs from CPUs and vector processors.
CPUs and GPUs execute instructions in order. In a CGRA, by contrast, the ALUs and the data path are reconfigured, data flow through the network, and results are stored every cycle.
Let’s look at this from the program’s point of view. On a CPU, the machine instructions are executed in order, one loop iteration after another. Superscalar and VLIW processors execute multiple instructions simultaneously, but the scope of that parallelism is relatively small.
The vector model executes the same instruction simultaneously across multiple iterations. Long-vector processors can eliminate the loop structure itself.
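The contrast between the scalar and vector models can be sketched in a few lines of Python. This is an illustration only, not IMAX code: the function names, the loop body (a multiply followed by an add), and the vector length `vl` are all assumptions chosen for the example, and `issued` simply counts how many instructions are issued.

```python
def scalar_loop(a, b, k):
    """Scalar model: two instructions are issued per iteration, in order."""
    out, issued = [], 0
    for x, y in zip(a, b):
        t = x * y            # instruction 1: multiply
        out.append(t + k)    # instruction 2: add
        issued += 2
    return out, issued


def vector_loop(a, b, k, vl):
    """Vector model: each instruction covers up to `vl` iterations at once.

    `vl` is a hypothetical vector length, not a parameter of any real machine.
    """
    out, issued = [], 0
    for start in range(0, len(a), vl):
        xs, ys = a[start:start + vl], b[start:start + vl]
        t = [x * y for x, y in zip(xs, ys)]   # one vector multiply
        out.extend(v + k for v in t)          # one vector add
        issued += 2
    return out, issued
```

With eight elements, the scalar version issues 16 instructions. The vector version issues 4 when `vl` is 4, and only 2 when `vl` is 8, at which point the outer loop runs once and the loop structure has effectively disappeared.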
A CGRA, on the other hand, has many registers and ALUs, so the instructions in the loop body can be laid out on it as they are. A CGRA also executes instructions across multiple iterations, but it differs from a vector processor. On a CPU, data dependencies between instructions stall the pipelines of superscalar and VLIW processors. A CGRA does not stall its pipeline even when there are data dependencies. Similarly, vector processors require an alignment mechanism that connects the main-memory address of contiguous data with the start of the vector register when executing multiple iterations simultaneously. A CGRA pipelines multiple iterations and does not require complicated alignment mechanisms. A drawback of a CGRA is that if the number of instructions in the loop body exceeds the number that can be mapped at once, the loop must be split. However, by adopting high-density, CISC-like instructions and a ring structure that eliminates positional restrictions, this drawback can be overcome.
The motivation behind IMAX is to overcome the drawbacks of CPUs and vector processors. You can modify IMAX for your own applications yourself. The source code of the IMAX compiler and simulator will help your development. Thank you for your attention.
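The pipelining described above can be illustrated with a toy model in Python. It is a minimal sketch under stated assumptions, not the IMAX simulator: each dependent instruction of the loop body is assumed to occupy one stage with its own output register, every stage takes one cycle, a new iteration enters every cycle, and `None` marks an empty stage. The names `run_pipeline`, `run_split`, and `capacity` are hypothetical.

```python
def run_pipeline(stages, inputs):
    """Stream `inputs` through a chain of dependent operations.

    Each stage holds one output register; all stages advance every cycle.
    """
    regs = [None] * len(stages)       # output register of each stage
    results, cycles = [], 0
    feed = iter(inputs)
    remaining = len(inputs)
    while remaining > 0:
        # Update from the last stage back to the first, so that each stage
        # reads the value its predecessor produced in the previous cycle.
        for s in range(len(stages) - 1, -1, -1):
            src = regs[s - 1] if s > 0 else next(feed, None)
            regs[s] = stages[s](src) if src is not None else None
        cycles += 1
        if regs[-1] is not None:      # a finished iteration leaves the array
            results.append(regs[-1])
            remaining -= 1
    return results, cycles


def run_split(stages, inputs, capacity):
    """Split the loop body when it has more stages than `capacity` allows.

    Each pass maps at most `capacity` stages and feeds its results to the next.
    """
    data, total = list(inputs), 0
    for i in range(0, len(stages), capacity):
        data, c = run_pipeline(stages[i:i + capacity], data)
        total += c
    return data, total


# A loop body of three dependent operations: ((x * 2) + 1) squared.
body = [lambda x: x * 2, lambda x: x + 1, lambda x: x * x]
```

In this model, four iterations through the three-stage body finish in 4 + 3 - 1 = 6 cycles even though every operation depends on the one before it. A machine that completed each iteration before starting the next would need 4 × 3 = 12 steps under the same one-cycle-per-operation assumption. Calling `run_split(body, [1, 2, 3, 4], 2)` gives the same results in two passes totalling 9 cycles, which shows the cost of splitting a loop body that does not fit.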