From GPU Hardware Concepts to Coding CUDA
AJ
2014.6.17
Is a GPU only good for being a graphics card?
Can it be used for parallel computing?
Two major vendors
• NVIDIA
• AMD
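• Both vendors offer open source projects that enthusiasts can join
• Join what, exactly? I have not looked into it yet
• My lab only has NVIDIA cards, so I use NVIDIA ~”~
• Which programming model are NVIDIA cards programmed with?
• Single-instruction multiple-thread (SIMT) programming model
What design concepts does this model bring you?
Starting from NVIDIA GPU design concepts
In NVIDIA GPUs, SIMT can be viewed through three properties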
• Single instruction, multiple register sets
• Single instruction, multiple addresses
• Single instruction, multiple flow paths
http://www.yosefk.com/blog/simd-simt-smt-parallelism-in-nvidia-gpus.html
Single Instruction, Multiple Register Sets
for(i=0;i<n;++i) a[i]=b[i]+c[i];
__global__ void add(float *a, float *b, float *c) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
a[i]=b[i]+c[i]; //no loop!
}
Costs:
• Each thread maps to its own register set, so redundant (duplicated) values can occur across threads.
http://www.yosefk.com/blog/simd-simt-smt-parallelism-in-nvidia-gpus.html
Single Instruction, Multiple Addresses
__global__ void apply(short* a, short* b, short* lut) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
a[i] = lut[b[i]]; //indirect memory access
// a[i] = b[i]
}
Cost:
• For DRAM, random access is inefficient compared with sequential access.
• For shared memory, random access is slowed down by bank contention. (Shared memory is not discussed for now.)
http://www.yosefk.com/blog/simd-simt-smt-parallelism-in-nvidia-gpus.html
Single Instruction, Multiple Flow Paths
http://www.yosefk.com/blog/simd-simt-smt-parallelism-in-nvidia-gpus.html
__global__ void find(int* vec, int len,
int* ind, int* nfound,
int nthreads) {
int tid = blockIdx.x * blockDim.x + threadIdx.x;
int last = 0;
int* myind = ind + tid*len;
for(int i=tid; i<len; i+=nthreads) {
if( vec[i] ) { //flow divergence
myind[last] = i;
last++;
}
}
nfound[tid] = last;
}
[Figure: threads reading vec (length len) with coalesced reads; the paths split depending on whether if(vec[i]) holds or not; each thread keeps its state in registers.]
The above are the design properties of SIMT.
Next, look at the Kepler GK110 chip block diagram.
• 15 SMX (streaming multiprocessors) x 192 cores
• 4 warp schedulers per SMX
• 65536 registers per SMX
From NVIDIA Kepler GK110 architecture whitepaper
• What is the warp scheduler for?
• Resource allocation inside an SMX
From NVIDIA Kepler GK110 architecture whitepaper
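Warps run as SIMT
1. On NVIDIA hardware, a "warp" is a group of 32 threads that run at the same time, and each thread needs its own registers.
2. Within a warp, execution is SIMT: all 32 threads execute the same instruction. On flow divergence, the hardware runs each branch path of the warp one after another, with the threads that are not on that path masked off, so performance drops. (James Balfour, “CUDA Threads and Atomics”, p.13~p.18)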
http://www.yosefk.com/blog/simd-simt-smt-parallelism-in-nvidia-gpus.html
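Warps run as SIMT (cont.)
1. Several warps make up a "block". A block is mapped to one SMX, and the warp schedulers in that SMX switch among the block's warps to execute them. Each warp has its own register set.
2. As the figure on the slide shows, warp scheduling within a block has zero overhead (fast context switch), because all state is kept in the register sets. A warp is either active or suspended.
http://www.yosefk.com/blog/simd-simt-smt-parallelism-in-nvidia-gpus.html
3. You can specify how many threads a block has, but the maximum number of threads per block depends on the compute capability the hardware supports.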
How thread IDs map to warps
• Warp ID (warpid)
• How do you know which warp a thread in a block belongs to? threadIdx.x / 32
• Thread ID = 0 ~ 31 -> warp 0
• Thread ID = 32 ~ 63 -> warp 1
• …
http://on-demand.gputechconf.com/gtc/2013/presentations/S3174-Kepler-Shuffle-Tips-Tricks.pdf , p.2
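The mapping above can be checked with a small kernel. This is only a sketch, assuming a 1D block and a warp size of 32; the names record_warp_ids and warp_ids_for_block are made up for illustration and are not from the slides.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread records which warp of its block it belongs to.
// Assumes a 1D block and a warp size of 32.
__global__ void record_warp_ids(int *warp_id) {
    int t = threadIdx.x;
    warp_id[t] = t / 32;   // threads 0..31 -> warp 0, 32..63 -> warp 1, ...
                           // (t % 32 would be the thread's lane inside its warp)
}

// Hypothetical host helper: launches one block and returns the warp ID of
// every thread. Returns an empty vector if a CUDA call fails (e.g. no GPU).
std::vector<int> warp_ids_for_block(int threads_per_block) {
    int *d_warp = nullptr;
    size_t bytes = threads_per_block * sizeof(int);
    if (cudaMalloc(&d_warp, bytes) != cudaSuccess) return {};

    record_warp_ids<<<1, threads_per_block>>>(d_warp);

    std::vector<int> h_warp(threads_per_block);
    cudaError_t err = cudaMemcpy(h_warp.data(), d_warp, bytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_warp);
    if (err != cudaSuccess) return {};
    return h_warp;
}
```

Calling warp_ids_for_block(64) from main should give warp 0 for threads 0 to 31 and warp 1 for threads 32 to 63.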
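The GPU principles above (and of course there are more), plus integration with the CPU,
are what led to the CUDA coding environment.
Things to watch when using CUDA
• Which NVIDIA GPU architecture you are using.
• NVIDIA Tesla K20c
• https://developer.nvidia.com/cuda-gpus shows that the Tesla K20c has Compute Capability 3.5.
• Install the CUDA environment; see http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#axzz33nDhVV00 . The compiler is named nvcc.
• The latest CUDA version is 6.0, but I installed 5.0 XD (too lazy to upgrade, ha).
• After installing the CUDA environment, run the bundled deviceQuery executable to check that the installation is correct.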
/usr/local/cuda-5.0/samples/1_Utilities/deviceQuery
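If you want the same kind of information from your own program, the runtime API can be queried directly. The sketch below is not the deviceQuery sample itself; print_device_summary is a hypothetical helper that prints a few fields of cudaDeviceProp.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Prints a few properties of every CUDA device and returns the device count
// (0 if no device or driver is available).
int print_device_summary() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;

    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, d) != cudaSuccess) continue;
        std::printf("Device %d: %s\n", d, prop.name);
        std::printf("  compute capability : %d.%d\n", prop.major, prop.minor);
        std::printf("  multiprocessors    : %d\n", prop.multiProcessorCount);
        std::printf("  registers per block: %d\n", prop.regsPerBlock);
        std::printf("  max threads/block  : %d\n", prop.maxThreadsPerBlock);
        std::printf("  warp size          : %d\n", prop.warpSize);
    }
    return count;
}
```

On the Tesla K20c mentioned above, the compute capability line would be expected to read 3.5.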
You may wonder: how can the same CUDA program run on GPUs that have different numbers of SMXs?
Block Scalability
Program Compilation
CUDA 5: Separate Compilation & Linking
From Introducing CUDA 5.pdf
Makefile example
##########################################################
# compiler setting
##########################################################
CC = gcc
CXX = g++
NVCC = nvcc
CFLAGS = -g -Wall
CXXFLAGS = $(CFLAGS) -Weffc++ -pg
LIBS = -lm -lglut -lGLU -lGL
INCPATH = -I./
OBJS = main.o \
       c_a.o \
       c_b.o \
       cpp_a.o \
       cpp_b.o \
       cu_a.o \
       cu_b.o
EXEC = output
# link everything with nvcc so the CUDA runtime is pulled in
all: $(OBJS)
	$(NVCC) $(OBJS) -o $(EXEC) $(LIBS) -pg
# -G adds device debug info; sm_35 matches Compute Capability 3.5
%.o:%.cu
	$(NVCC) -c $< -o $@ -g -G -arch=sm_35
%.o:%.cpp
	$(CXX) -c $(CXXFLAGS) $(INCPATH) $< -o $@
%.o:%.c
	$(CC) -c $(CFLAGS) $(INCPATH) $< -o $@
#########################################################
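Suppose you are handed someone else's parallel program;
here is a decent method that may improve its performance.
The ILP method <= so the ILP we learned long ago can be used this way!!
• Merge several threads into one -> ILP increases -> a chance to coalesce global memory accesses -> fewer blocks -> more registers used per thread -> occupancy drops
(Vasily Volkov, “Better Performance at Lower Occupancy”)
First, what is occupancy?
• Occupancy = Number of warps running concurrently on a multiprocessor divided by maximum number of warps that can run concurrently. (That is, whether the threads you have running at any moment fill up the maximum number of threads the GPU can run at once.)
From Optimizing CUDA – Part II © NVIDIA Corporation 2009
• Suppose one SMX of some GPU can run at most 1536 threads at a time and has 32K registers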
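To see how register use limits occupancy with these numbers, here is a rough host-side calculation. It is a simplified model, assuming the thread limit and the register file are the only constraints (register allocation granularity, shared memory and the per-SMX block limit are ignored); SmLimits, resident_warps and occupancy are names invented for this sketch.

```cuda
#include <algorithm>

// Simplified limits of one multiprocessor (values taken from the example above).
struct SmLimits {
    int max_threads;   // e.g. 1536
    int registers;     // e.g. 32 * 1024
};

// Warps that can be resident at once for a given register count per thread.
// Assumption: 32 threads per warp, no allocation granularity.
int resident_warps(SmLimits sm, int regs_per_thread) {
    int max_warps = sm.max_threads / 32;
    int warps_by_registers = sm.registers / (regs_per_thread * 32);
    return std::min(max_warps, warps_by_registers);
}

// Occupancy = resident warps / maximum resident warps.
double occupancy(SmLimits sm, int regs_per_thread) {
    int max_warps = sm.max_threads / 32;
    return static_cast<double>(resident_warps(sm, regs_per_thread)) / max_warps;
}
```

Under this model, with 1536 threads and 32K registers, a kernel using 21 registers per thread can keep all 48 warps resident, 32 registers per thread allows 32 warps (about 67% occupancy), and 64 registers per thread allows 16 warps (about 33%).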
An NVIDIA engineer
(http://stackoverflow.com/users/749748/harrism)
said on Stack Overflow
• In general, as Jared mentions, using too many registers per thread is
not desirable because it reduces occupancy, and therefore reduces
latency hiding ability in the kernel. GPUs thrive on parallelism and do
so by covering memory latency with work from other threads.
• Therefore, you should probably not optimize arrays into registers.
Instead, ensure that your memory accesses to those arrays across
threads are as close to sequential as possible so you maximize
coalescing (i.e. minimize memory transactions).
http://stackoverflow.com/questions/12167926/forcing-cuda-to-use-register-for-a-variable
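In other words, whether occupancy is high or not, give memory accesses a chance to be coalesced.
More on how ILP affects NVIDIA GPUs
http://continuum.io/blog/cudapy_ilp_opt
Moving data
• Core
• Memory controller
What code corresponds to the effect above?
With ILP = 2, expressed here as pseudocode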
# read
i = thread.id
ai = a[i]
bi = b[i]
j = i+5
aj = a[j]
bj = b[j]
# compute
ci = core(ai, bi)
cj = core(aj, bj)
# write
c[i] = ci
c[j] = cj
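The pseudocode can be turned into a CUDA kernel along the following lines. This is a sketch under assumptions: core() is replaced by a plain addition, the second element sits one full grid of threads away instead of at i+5, and the names add_ilp2 and add_ilp2_host are invented here.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread handles two elements, i and j = i + stride. The two loads, the
// two adds and the two stores do not depend on each other, so the hardware
// can overlap them (ILP = 2).
__global__ void add_ilp2(const float *a, const float *b, float *c, int n) {
    int stride = gridDim.x * blockDim.x;   // total number of threads
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = i + stride;

    // read: issue all loads before using any of the values
    float ai = 0.f, bi = 0.f, aj = 0.f, bj = 0.f;
    if (i < n) { ai = a[i]; bi = b[i]; }
    if (j < n) { aj = a[j]; bj = b[j]; }

    // compute (addition stands in for core())
    float ci = ai + bi;
    float cj = aj + bj;

    // write
    if (i < n) c[i] = ci;
    if (j < n) c[j] = cj;
}

// Hypothetical host wrapper; returns an empty vector if a CUDA call fails.
std::vector<float> add_ilp2_host(const std::vector<float> &a,
                                 const std::vector<float> &b) {
    int n = static_cast<int>(a.size());
    if (n == 0 || b.size() != a.size()) return {};
    size_t bytes = n * sizeof(float);
    float *da = nullptr, *db = nullptr, *dc = nullptr;
    std::vector<float> c;
    if (cudaMalloc(&da, bytes) == cudaSuccess &&
        cudaMalloc(&db, bytes) == cudaSuccess &&
        cudaMalloc(&dc, bytes) == cudaSuccess) {
        cudaMemcpy(da, a.data(), bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(db, b.data(), bytes, cudaMemcpyHostToDevice);
        int threads = 128;
        int blocks = (n + 2 * threads - 1) / (2 * threads);  // half as many threads as elements
        add_ilp2<<<blocks, threads>>>(da, db, dc, n);
        c.resize(n);
        if (cudaMemcpy(c.data(), dc, bytes, cudaMemcpyDeviceToHost) != cudaSuccess)
            c.clear();
    }
    cudaFree(da);   // cudaFree on a null pointer does nothing
    cudaFree(db);
    cudaFree(dc);
    return c;
}
```

Because neighbouring threads still read neighbouring addresses for both i and j, the loads remain coalesced while each thread carries twice the work and needs more registers.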
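With ILP = 4, the practical effect => the GPU pipeline is used more effectively
http://continuum.io/blog/cudapy_ilp_opt
Summary of the main ideas above
• Hide latency = do other operations while waiting for latency
• Increase ILP
• Increase occupancy
As mentioned for the ILP method,
the number of registers a thread uses is an important consideration.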
Interpreting Output of --ptxas-options=-v
http://stackoverflow.com/questions/12388207/interpreting-output-of-ptxas-options-v
http://stackoverflow.com/questions/7241062/is-local-memory-slower-than-shared-memory-in-cuda
• Each CUDA thread is using 46 registers?
Yes, correct
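• There is no register spilling to local memory (per-thread memory that resides in off-chip device memory, not shared memory)?
Yes, correct
• Is 72 bytes the sum-total of the memory for the stack frames of the __global__ (the kernel, i.e. the parallel routine you write) and __device__ (routines called from __global__ functions) functions?
Yes, correct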
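How do I limit the number of registers a thread uses?
• control register usage with the nvcc flag: --maxrregcount
If the total registers allocated to the threads exceed the number of registers on the GPU, what does the compiler do?
An expert on Stack Overflow says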
• PTX level allows many more virtual registers than the hardware.
Those are mapped to hardware registers at load time. The register
limit you specify allows you to set an upper limit on the hardware
resources used by the generated binary. It serves as a heuristic for the
compiler to decide when to spill (see below) registers when compiling
to PTX already so certain concurrency needs can be met.
• For Fermi GPUs there are at most 64 hardware registers. The 64th is
used by the ABI as the stack pointer and thus for "register spilling" (it
means freeing up registers by temporarily storing their values on the
stack and happens when more registers are needed than available) so
it is untouchable.
http://stackoverflow.com/questions/12167926/forcing-cuda-to-use-register-for-a-variable
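We just said to spend more registers to buy memory-coalescing time, yet using too many registers increases memory access time.
What to do?
Ha! However much we talk, you only find out by coding~~~~~
Where can my program put the data it needs?
Mohamed Zahran,
“Lecture 6: CUDA Memories”
• Access speed
shared memory >
constant memory >
global memory
How does the way data is declared determine which memory is accessed?
The description on the slide is wrong; it depends on where the compiler places the data
An expert on Stack Overflow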
• Dynamically indexed arrays cannot be stored in registers, because the GPU
register file is not dynamically addressable.
• Scalar variables are automatically stored in registers by the compiler.
• Statically-indexed (i.e. where the index can be determined at compile
time), small arrays (say, less than 16 floats) may be stored in registers by the
compiler.
http://stackoverflow.com/questions/12167926/forcing-cuda-to-use-register-for-a-variable
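As an illustration of how declarations select a memory space, the sketch below touches global, shared and constant memory in one kernel, with scalars left to the compiler (normally registers). The kernel, the TILE size and the names c_scale, scaled_reverse and scaled_reverse_host are all made up for this example.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int TILE = 64;      // hypothetical block size for this sketch

__constant__ float c_scale;   // constant memory: read-only in kernels, set from the host

// Reverses each TILE-sized chunk of `in` and multiplies it by c_scale.
// Assumes the array length is a multiple of TILE and blockDim.x == TILE.
__global__ void scaled_reverse(const float *in, float *out) {  // in/out point to global memory
    __shared__ float tile[TILE];     // shared memory: one copy per block
    int t = threadIdx.x;             // scalars: normally kept in registers
    int base = blockIdx.x * TILE;

    tile[t] = in[base + t];          // global -> shared
    __syncthreads();                 // wait until the whole tile is loaded
    out[base + t] = c_scale * tile[TILE - 1 - t];   // shared and constant -> global
}

// Hypothetical host wrapper; returns an empty vector if a CUDA call fails.
std::vector<float> scaled_reverse_host(const std::vector<float> &in, float s) {
    int n = static_cast<int>(in.size());
    if (n == 0 || n % TILE != 0) return {};
    size_t bytes = n * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    std::vector<float> out;
    if (cudaMalloc(&d_in, bytes) == cudaSuccess &&
        cudaMalloc(&d_out, bytes) == cudaSuccess &&
        cudaMemcpyToSymbol(c_scale, &s, sizeof(float)) == cudaSuccess) {
        cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice);
        scaled_reverse<<<n / TILE, TILE>>>(d_in, d_out);
        out.resize(n);
        if (cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost) != cudaSuccess)
            out.clear();
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

Whether a given scalar or small array really ends up in a register is still the compiler's decision, which is the point made in the quote above; the --ptxas-options=-v output is the way to check.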
A simple example
Summing two vectors
Jason Sanders, Edward Kandrot, “CUDA by Example”
Where does the data come from? It is copied from CPU memory to global memory
Jason Sanders, Edward Kandrot, “CUDA by Example”
How do I call the parallel routine I wrote?
• The call has to specify how many threads each block has and how many blocks the grid has
• The launch on the slide means the grid has N blocks, and each block runs 1 thread
Jason Sanders, Edward Kandrot, “CUDA by Example”
Write the result from GPU global memory back to CPU memory for processing
Jason Sanders, Edward Kandrot, “CUDA by Example”
Summary of the flow above
http://en.wikipedia.org/wiki/CUDA
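The whole flow (copy in, launch, copy out) can be sketched as follows. It is written in the spirit of the vector-sum example but is not the book's listing; add_on_gpu is a hypothetical wrapper, and the launch uses n blocks of 1 thread each as described above.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread per block, so the block index is the element index.
__global__ void add(const int *a, const int *b, int *c, int n) {
    int tid = blockIdx.x;
    if (tid < n) c[tid] = a[tid] + b[tid];
}

// Hypothetical host wrapper; returns an empty vector if a CUDA call fails.
// Assumes n does not exceed the device's maximum grid size in x.
std::vector<int> add_on_gpu(const std::vector<int> &a, const std::vector<int> &b) {
    int n = static_cast<int>(a.size());
    if (n == 0 || b.size() != a.size()) return {};
    size_t bytes = n * sizeof(int);
    int *dev_a = nullptr, *dev_b = nullptr, *dev_c = nullptr;
    std::vector<int> c;

    // 1. allocate GPU global memory
    if (cudaMalloc(&dev_a, bytes) == cudaSuccess &&
        cudaMalloc(&dev_b, bytes) == cudaSuccess &&
        cudaMalloc(&dev_c, bytes) == cudaSuccess) {
        // 2. copy the inputs from CPU memory to global memory
        cudaMemcpy(dev_a, a.data(), bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dev_b, b.data(), bytes, cudaMemcpyHostToDevice);
        // 3. launch the kernel: n blocks in the grid, 1 thread per block
        add<<<n, 1>>>(dev_a, dev_b, dev_c, n);
        // 4. copy the result from global memory back to CPU memory
        c.resize(n);
        if (cudaMemcpy(c.data(), dev_c, bytes, cudaMemcpyDeviceToHost) != cudaSuccess)
            c.clear();
    }
    // 5. release the GPU memory
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return c;
}
```

One thread per block keeps the example simple but leaves 31 of the 32 lanes in every warp idle, so real code would normally use many threads per block.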
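Why can the number of threads and blocks be specified in 1D, 2D or 3D?
• 1 block 4
• A block is 9x9; with 100 threads there are two blocks
• 2 blocks
When the number of threads is not a multiple of 32, choosing among the 1D, 2D and 3D layouts comes down to comparing which one fills its warps more fully!!!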
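The comparison can be made concrete with a little host-side arithmetic. This is a sketch assuming a warp size of 32 and that each block is split into warps on its own; warps_per_block and lane_utilization are names invented here.

```cuda
#include <cuda_runtime.h>

// Number of warps needed for one block of the given shape. Threads of a block
// are packed into warps of 32, so only the total thread count matters.
int warps_per_block(dim3 block) {
    int threads = static_cast<int>(block.x * block.y * block.z);
    return (threads + 31) / 32;
}

// Fraction of all warp lanes that carry one of the `useful_threads`
// when `blocks` blocks of shape `block` are launched.
double lane_utilization(int useful_threads, dim3 block, int blocks) {
    int lanes = blocks * warps_per_block(block) * 32;
    return static_cast<double>(useful_threads) / lanes;
}
```

For 100 threads, one 1D block of 100 threads needs 4 warps (100 of 128 lanes used, about 78%), while two 9x9 blocks need 3 warps each (100 of 192 lanes used, about 52%), so the 1D layout fills its warps better in this case.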
How to measure GPU run time
Profiling Tool: nvprof
nvprof --events warps_launched,threads_launched ./executable executable-arguments > result
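Besides nvprof, a kernel can also be timed from inside the program with CUDA events. The slides do not cover this, so treat the following as an assumed alternative sketch; busy_kernel and time_kernel_ms are placeholder names.

```cuda
#include <cuda_runtime.h>

// Placeholder kernel that just does a little arithmetic per element.
__global__ void busy_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

// Returns the kernel's run time in milliseconds, or -1 if a CUDA call fails.
float time_kernel_ms(int n) {
    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return -1.0f;
    cudaMemset(d, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);                        // mark the start on the GPU timeline
    busy_kernel<<<(n + 255) / 256, 256>>>(d, n);
    cudaEventRecord(stop);                         // mark the end
    cudaEventSynchronize(stop);                    // wait until the kernel has finished

    float ms = -1.0f;
    if (cudaEventElapsedTime(&ms, start, stop) != cudaSuccess) ms = -1.0f;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return ms;
}
```

Events are recorded on the GPU's own timeline, so the measured interval covers the kernel itself rather than the CPU-side launch call.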
Q&A
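Q&A-1: Discussion of flow divergence
• The JIT approach
• Profile the program to learn which cases are true and which are false, then hand them separately, at the same time, to the JIT for execution
• Browsers use this approach to speed up processing
• This approach uses a lot of memory
Q&A-2: NVIDIA/AMD
• NVIDIA
• Laptops, servers
• AMD
• Mobile phones
Q&A-3: Discussion of Single Instruction, Multiple Addresses
• How the compiler handles random access
• Pointer (points-to) analysis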
Q&A-4:
• CUDA LLVM Compiler
• CUDA currently does not support OpenCL 2.0
• https://developer.nvidia.com/opencl
Q&A-5: Discussion of tracing code
• cuda-gdb
• http://docs.nvidia.com/cuda/cuda-gdb/#axzz34ufkPsqt
• EX:
• Note: For disassembly instruction to work
properly, cuobjdump must be installed and present in your $PATH.
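Q&A-6: Where is GPU machine code placed for execution?
• Not sure whether locality issues are discussed for GPUs?
Q&A-7 Is there a benefit to splitting a function apart for parallelization?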
• Function()
function1()
function2()
function3()
• ?
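Q&A-8 Parallelizing collision avoidance for a 5-axis machine
• Use the GPU to check for a collision every time the cutter takes a step
• Problem: the GPU draws power continuously
• If the 5-axis machine engraves all day, won't the GPU's power consumption be frightening?
• Trade-off: power consumption / speed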
CUDA Toolkit Documentation
• http://docs.nvidia.com/cuda/index.html#axzz33uurtJU9

popecap
 
Colorfcul Presentation - Public Relations
Colorfcul Presentation - Public RelationsColorfcul Presentation - Public Relations
Colorfcul Presentation - Public Relations
StephanieFeliciano8
 
Varanasi Girls Call Varanasi 0X0000000X Payment On Delevery Cash Hot Premium ...
Varanasi Girls Call Varanasi 0X0000000X Payment On Delevery Cash Hot Premium ...Varanasi Girls Call Varanasi 0X0000000X Payment On Delevery Cash Hot Premium ...
Varanasi Girls Call Varanasi 0X0000000X Payment On Delevery Cash Hot Premium ...
parichopra4
 
Integrated and localized Approach in Development Communication.pptx
Integrated and localized Approach in Development Communication.pptxIntegrated and localized Approach in Development Communication.pptx
Integrated and localized Approach in Development Communication.pptx
Sayan Bachaspati
 
NAAC REFORMS IN ACCREDITATION 2024.pptx
NAAC REFORMS IN ACCREDITATION  2024.pptxNAAC REFORMS IN ACCREDITATION  2024.pptx
NAAC REFORMS IN ACCREDITATION 2024.pptx
VeluSureshKumar
 

[若渴計畫] From GPU Hardware Concepts to Coding CUDA

Editor's Notes

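  1. splashtop
  2. There was a question raised at the time.
  3. On how to explain a programming model: at the last workshop I came across what I think is a good way to put it, namely to ask what design concepts using this programming model brings you.
  4. So the following slides focus on SIMT.
  5. This is where, in CUDA, you describe what each thread does. i identifies the thread. For a given thread i, element i of array b and element i of array c are added and the result is stored in element i of array a. The hardware resources each thread occupies are shown in the figure on the right.
  6. Each thread i reads element b[i] of the lut array. In other words, each thread can fetch and process its own memory address.
  7. Each "Warp" box represents one warp, and each warp consists of 32 threads. The 32 threads that would normally execute in lockstep within a single warp are, after the branch diverges, executed as if they were two warps running one after the other.
  8. Latency hiding: while one warp is waiting on a long-latency operation, the hardware goes and does other work.
  9. When every thread uses the maximum number of registers, 8 warps can run at the same time. So if the design had 9 dispatch units, one of them would be redundant in this situation.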
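
The figure of 8 warps in note 9 can be checked with a little arithmetic. The sketch below is plain C++ (it also compiles as host code in a .cu file). It takes the 65536 registers per SMX from the GK110 slide and the 32 threads per warp from note 7. The cap of 255 registers per thread and the count of 8 dispatch units per SMX (4 warp schedulers with 2 dispatch units each) are assumptions taken from the GK110 specification, not from these notes. The helper name `warps_limited_by_registers` is hypothetical, and the calculation ignores register allocation granularity and the other occupancy limits.

```cpp
#include <cassert>

// Figures from the slides.
constexpr int kRegistersPerSmx = 65536;  // registers per SMX (GK110 slide)
constexpr int kThreadsPerWarp  = 32;     // threads per warp (note 7)

// Assumptions, not stated in the notes themselves.
constexpr int kMaxRegistersPerThread = 255;  // assumed per-thread register cap on GK110
constexpr int kDispatchUnitsPerSmx   = 8;    // assumed: 4 warp schedulers x 2 dispatch units

// Hypothetical helper: how many whole warps fit in one SMX register file
// when every thread uses regs_per_thread registers.
constexpr int warps_limited_by_registers(int regs_per_smx,
                                         int regs_per_thread,
                                         int threads_per_warp) {
    return regs_per_smx / (regs_per_thread * threads_per_warp);
}

// 65536 / (255 * 32) = 65536 / 8160, which truncates to 8 warps.
static_assert(warps_limited_by_registers(kRegistersPerSmx,
                                         kMaxRegistersPerThread,
                                         kThreadsPerWarp) == kDispatchUnitsPerSmx,
              "at the register cap, resident warps match the dispatch units");
```

Under these assumptions the worst case leaves exactly 8 resident warps, which matches the 8 dispatch units, so a ninth dispatch unit would have nothing to issue from. With a lighter kernel, for example 32 registers per thread, the same helper gives 65536 / (32 * 32) = 64 warps.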