IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
IMAX3: Amazing Dataflow-Centric CGRA and its Applications
I present this slide to all hungry engineers who are tired of CPU, GPU, FPGA, tensor core, AI core, who want some challenging one with no black box inside, and who want to improve by themselves.
1. CPU GPU
Ultimate CGRA w/ high-speed compiler
CGRA for Energy-efficient Cryptography
Beyond-Neuromorphic Systems
Non-Deterministic Computing
1
ナレータ VOICEVOX:もち子(cv 明日葉よもぎ)
はらぺこエンジニアに贈るCGRAの世界2022
(9. ハッシュ計算とFFTと文字列検索編)
スパコンからIoTまで 省エネ社会に
AI+BCだけじゃない超効率計算手法
2. 20220202
2
ハッシュ計算には、面倒な演算が必要
for (i=0; i<ctx->mbuflen; i+=BLKSIZE) { /* 1データ流内の並列実行は不可能. 多数データ流のパイプライン実行のみ */
for (th=0; th<thnum; th++) {
sregs[th*8+0] = state[th*8+0]; sregs[th*8+1] = state[th*8+1]; sregs[th*8+2] = state[th*8+2]; sregs[th*8+3] = state[th*8+3];
sregs[th*8+4] = state[th*8+4]; sregs[th*8+5] = state[th*8+5]; sregs[th*8+6] = state[th*8+6]; sregs[th*8+7] = state[th*8+7];
}
for (j=0; j<BLKSIZE; j+=BLKSIZE/DIV) {
for (th=0; th<thnum; th++) {
a = sregs[th*8+0]; b = sregs[th*8+1]; c = sregs[th*8+2]; d = sregs[th*8+3];
e = sregs[th*8+4]; f = sregs[th*8+5]; g = sregs[th*8+6]; h = sregs[th*8+7];
#if (DIV==4)
t1 = h+EP1(e)+CH(e,f,g)+k[j+ 0]+mbuf[i/BLKSIZE*MAX_THNUM*BLKSIZE+th*BLKSIZE+j+ 0]; t2 = EP0(a)+MAJ(a,b,c); h = g; g = f; f = e; e = d+t1; d = c; c = b; b = a; a = t1+t2;
t1 = h+EP1(e)+CH(e,f,g)+k[j+ 1]+mbuf[i/BLKSIZE*MAX_THNUM*BLKSIZE+th*BLKSIZE+j+ 1]; t2 = EP0(a)+MAJ(a,b,c); h = g; g = f; f = e; e = d+t1; d = c; c = b; b = a; a = t1+t2;
t1 = h+EP1(e)+CH(e,f,g)+k[j+ 2]+mbuf[i/BLKSIZE*MAX_THNUM*BLKSIZE+th*BLKSIZE+j+ 2]; t2 = EP0(a)+MAJ(a,b,c); h = g; g = f; f = e; e = d+t1; d = c; c = b; b = a; a = t1+t2;
t1 = h+EP1(e)+CH(e,f,g)+k[j+ 3]+mbuf[i/BLKSIZE*MAX_THNUM*BLKSIZE+th*BLKSIZE+j+ 3]; t2 = EP0(a)+MAJ(a,b,c); h = g; g = f; f = e; e = d+t1; d = c; c = b; b = a; a = t1+t2;
t1 = h+EP1(e)+CH(e,f,g)+k[j+ 4]+mbuf[i/BLKSIZE*MAX_THNUM*BLKSIZE+th*BLKSIZE+j+ 4]; t2 = EP0(a)+MAJ(a,b,c); h = g; g = f; f = e; e = d+t1; d = c; c = b; b = a; a = t1+t2;
t1 = h+EP1(e)+CH(e,f,g)+k[j+ 5]+mbuf[i/BLKSIZE*MAX_THNUM*BLKSIZE+th*BLKSIZE+j+ 5]; t2 = EP0(a)+MAJ(a,b,c); h = g; g = f; f = e; e = d+t1; d = c; c = b; b = a; a = t1+t2;
t1 = h+EP1(e)+CH(e,f,g)+k[j+ 6]+mbuf[i/BLKSIZE*MAX_THNUM*BLKSIZE+th*BLKSIZE+j+ 6]; t2 = EP0(a)+MAJ(a,b,c); h = g; g = f; f = e; e = d+t1; d = c; c = b; b = a; a = t1+t2;
t1 = h+EP1(e)+CH(e,f,g)+k[j+ 7]+mbuf[i/BLKSIZE*MAX_THNUM*BLKSIZE+th*BLKSIZE+j+ 7]; t2 = EP0(a)+MAJ(a,b,c); h = g; g = f; f = e; e = d+t1; d = c; c = b; b = a; a = t1+t2;
t1 = h+EP1(e)+CH(e,f,g)+k[j+ 8]+mbuf[i/BLKSIZE*MAX_THNUM*BLKSIZE+th*BLKSIZE+j+ 8]; t2 = EP0(a)+MAJ(a,b,c); h = g; g = f; f = e; e = d+t1; d = c; c = b; b = a; a = t1+t2;
t1 = h+EP1(e)+CH(e,f,g)+k[j+ 9]+mbuf[i/BLKSIZE*MAX_THNUM*BLKSIZE+th*BLKSIZE+j+ 9]; t2 = EP0(a)+MAJ(a,b,c); h = g; g = f; f = e; e = d+t1; d = c; c = b; b = a; a = t1+t2;
t1 = h+EP1(e)+CH(e,f,g)+k[j+10]+mbuf[i/BLKSIZE*MAX_THNUM*BLKSIZE+th*BLKSIZE+j+10]; t2 = EP0(a)+MAJ(a,b,c); h = g; g = f; f = e; e = d+t1; d = c; c = b; b = a; a = t1+t2;
t1 = h+EP1(e)+CH(e,f,g)+k[j+11]+mbuf[i/BLKSIZE*MAX_THNUM*BLKSIZE+th*BLKSIZE+j+11]; t2 = EP0(a)+MAJ(a,b,c); h = g; g = f; f = e; e = d+t1; d = c; c = b; b = a; a = t1+t2;
t1 = h+EP1(e)+CH(e,f,g)+k[j+12]+mbuf[i/BLKSIZE*MAX_THNUM*BLKSIZE+th*BLKSIZE+j+12]; t2 = EP0(a)+MAJ(a,b,c); h = g; g = f; f = e; e = d+t1; d = c; c = b; b = a; a = t1+t2;
t1 = h+EP1(e)+CH(e,f,g)+k[j+13]+mbuf[i/BLKSIZE*MAX_THNUM*BLKSIZE+th*BLKSIZE+j+13]; t2 = EP0(a)+MAJ(a,b,c); h = g; g = f; f = e; e = d+t1; d = c; c = b; b = a; a = t1+t2;
t1 = h+EP1(e)+CH(e,f,g)+k[j+14]+mbuf[i/BLKSIZE*MAX_THNUM*BLKSIZE+th*BLKSIZE+j+14]; t2 = EP0(a)+MAJ(a,b,c); h = g; g = f; f = e; e = d+t1; d = c; c = b; b = a; a = t1+t2;
t1 = h+EP1(e)+CH(e,f,g)+k[j+15]+mbuf[i/BLKSIZE*MAX_THNUM*BLKSIZE+th*BLKSIZE+j+15]; t2 = EP0(a)+MAJ(a,b,c); h = g; g = f; f = e; e = d+t1; d = c; c = b; b = a; a = t1+t2;
#endif
sregs[th*8+0] = a; sregs[th*8+1] = b; sregs[th*8+2] = c; sregs[th*8+3] = d;
sregs[th*8+4] = e; sregs[th*8+5] = f; sregs[th*8+6] = g; sregs[th*8+7] = h;
}
}
for (th=0; th<thnum; th++) {
state[th*8+0] += sregs[th*8+0]; state[th*8+1] += sregs[th*8+1]; state[th*8+2] += sregs[th*8+2]; state[th*8+3] += sregs[th*8+3];
state[th*8+4] += sregs[th*8+4]; state[th*8+5] += sregs[th*8+5]; state[th*8+6] += sregs[th*8+6]; state[th*8+7] += sregs[th*8+7];
}
}