SlideShare a Scribd company logo
1 of 69
由GPU硬體概念到coding
CUDA
AJ
2014.6.17
GPU是否只能當顯示卡?
能不能拿來做平行運算?
兩個大廠
• NVIDIA
• AMD
• 這兩大廠都有提供open source project給玩家來join
• 能join什麼? 還沒涉略
• 因為我的實驗室只有NVIDA卡,所以就使用NVIDA ~”~
• NVIDA卡,它是使用何種programming model來programming?
• Single-instruction multiple thread (SIMT) programming model
使用此model帶來給你
怎樣的設計概念
從NVIDIA GPU設計概念說起
在NVIDIA GPU中,可用三個特性來看SIMT
• Single instruction, multiple register sets
• Single instruction, multiple addresses
• Single instruction, multiple flow paths
http://www.yosefk.com/blog/simd-simt-smt-parallelism-in-nvidia-gpus.html
Single Instruction, Multiple Register Sets
for(i=0;i<n;++i) a[i]=b[i]+c[i];
__global__ void add(float *a, float *b, float *c) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
a[i]=b[i]+c[i]; //no loop!
}
Costs:
• 每個thread都會對應自己的register set ,所以會有redundant情況發生。
http://www.yosefk.com/blog/simd-simt-smt-parallelism-in-nvidia-gpus.html
Single Instruction, Multiple Addresses
__global__ void apply(short* a, short* b, short* lut) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
a[i] = lut[b[i]]; //indirect memory access
// a[i] = b[i]
}
Cost:
• 對於DRAM memory來說, random access跟循序存取比起來是沒有效率
的。
• 對於shared memory來說, random access 會藉由bank contentions而變
慢速度。(先不討論shared memory)
http://www.yosefk.com/blog/simd-simt-smt-parallelism-in-nvidia-gpus.html
Single Instruction, Multiple Flow Paths
http://www.yosefk.com/blog/simd-simt-smt-parallelism-in-nvidia-gpus.html
__global__ void find(int* vec, int len,
int* ind, int* nfound,
int nthreads) {
int tid = blockIdx.x * blockDim.x + threadIdx.x;
int last = 0;
int* myind = ind + tid*len;
for(int i=tid; i<len; i+=nthreads) {
if( vec[i] ) { //flow divergence
myind[last] = i;
last++;
}
}
nfound[tid] = last;
}
….
len thread id = 0
nthreads =1vec
get coalescing to read
if(vec[i]) 成立 if(vec[i]) 不成成立
get coalescing to read
….
registers
以上為SIMT設計特性。
先來看kepler gk110 晶片方塊圖。
• 15 SMX(串流處理器) X 192 cores
• 4 warp scheduler per SMX
• 暫存器個數65536 per SMX
Form NVIDIA kepler gk110 architecture whitepaper
• warp scheduler 用來做啥?
• SMX內部的資源分配
Form NVIDIA kepler gk110 architecture whitepaper
warp1 warp2
Warp使用SIMT運作
1. 在NVIDIA中, a “warp”是由好幾個(32)threads組成
且同時跑。 而每個thread需要自己的registers 。
2.在Warp中,SIMT去執行,也就是說32 threads執行相
同指令。如果對於flow divergence ,則硬體會分多個warp處
理這問題,但效能會變差。(James Balfour, “CUDA
Threads and Atomics” ,p.13~p.18) 。
http://www.yosefk.com/blog/simd-simt-smt-parallelism-in-nvidia-gpus.html
Warp使用SIMT運作(cont.)
1.. 好幾個warps組成a “block” , 一個block被對應到一
個SMX ,而一個SMX裡面有warp scheduler去切換一個
block中的warps去執行。 而每個warp都有自己的
register sets。
2. 由圖可知一個block ,再做warp schedule時,是zero
overhead (fast context switch)。因為狀態接由
register set保存。而warp狀態可分actives/suspended 。
http://www.yosefk.com/blog/simd-simt-smt-parallelism-in-nvidia-gpus.html
3. 你可以指定一個block有多少thread。但一個block做多指
定多少thread ,要看硬體可支援的運算能力。
Thread ID 如何對應到 Warp
• Warp ID (warpid)
• 如何知道一個block中某thread屬於哪個warp? threadIdx.x / 32
• Thread ID = 0 ~ 31  warp
• Thread ID = 32~64  warp
• …
http://on-demand.gputechconf.com/gtc/2013/presentations/S3174-Kepler-Shuffle-Tips-Tricks.pdf , p.2
以上GPU原理(當然不只) ,外加整合CPU ,
然而就有了CUDA的coding環境出現。
使用CUDA必須注意的事情
• 使用哪一個NVIDIA GPU Architecture 。
• NVIDIA Tesla K20c
• 從https://developer.nvidia.com/cuda-gpus 可知Tesla K20c的
Compute Capability為3.5 。
• 安裝CUDA環境,可參考http://docs.nvidia.com/cuda/cuda-getting-
started-guide-for-linux/index.html#axzz33nDhVV00 。編譯器名稱為
nvcc 。
• 最新的CUDA版本為6.0 ,而我安裝的是5.0 XD(懶得升級 哈) 。
• 安裝完CUDA環境,可跑內建執行檔deviceQuery 去看看安裝對不
對。
/usr/local/cuda-5.0/samples/1_Utilities/deviceQuery
你會有個疑問 那我同一個CUDA他如何做到同
一個GPU不同SMX數也可以執行?
Block Scalability
Program Compilation
CUDA 5: Separate Compilation & Linking
From Introducing CUDA 5.pdf
Makefile範例
##########################################################
# compiler setting
##########################################################
CC = gcc
CXX = g++
NVCC = nvcc
CFLAGS = -g -Wall
CXXFLAGS = $(CFLAGS) -Weffc++ -pg
LIBS = -lm -lglut -lGLU -lGL
INCPATH = -I./
OBJS = main.o
c_a.o
c_b.o
cpp_a.o
cpp_b.o
cu_a.o
cu_b.o
EXEC = output
all: $(OBJS)
$(NVCC) $(OBJS) -o $(EXEC) $(LIBS) -pg
%.o:%.cu
$(NVCC) -c $< -o $@ -g –G -arch=sm_35
%.o:%.cpp
$(CXX) -c $(CXXFLAGS) $(INCPATH) $< -o $@
%.o:%.c
$(CC) -c $(CFLAGS) $(INCPATH) $< -o $@
#########################################################
假設拿到別人的平行化程式,
可試試看一個不錯可能改善效能的方法。
The ILP method <=小時候學的ILP可以這樣
用啊!!
• 多條thread合併->ILP增加 -> 有機會對coalesce global memory-> Block數減少 -
> 一個thread使用register個數增加 -> Ocuupancy降低
(Vasily Volkov, “Better Performance at Lower Occupancy”)
先說什麼是Occupancy
• Occupancy = Number of warps running concurrently on a
multiprocessor divided by maximum number of warps that can run
concurrently.(意思就是說你每個時間所同時跑的thread數,到底有
沒有塞滿GPU提供的最大同時間跑的thread數。)
From Optimizing CUDA – Part II © NVIDIA Corporation 2009
• 假設某GPU的其中一個SMX最
多同時間可跑1536個threads以
及32K register
NVIDIA工程師
(http://stackoverflow.com/users/749748/harrism)
在stackoverflow表示
• In general, as Jared mentions, using too many registers per thread is
not desirable because it reduces occupancy, and therefore reduces
latency hiding ability in the kernel. GPUs thrive on parallelism and do
so by covering memory latency with work from other threads.
• Therefore, you should probably not optimize arrays into registers.
Instead, ensure that your memory accesses to those arrays across
threads are as close to sequential as possible so you maximize
coalescing (i.e. minimize memory transactions).
http://stackoverflow.com/questions/12167926/forcing-cuda-to-use-register-for-a-variable
也就是說不管Occupancy高不高,要讓
memory有機會能coalesce來讀取。
繼續對ILP在NVIDIA GPU影響做說明
http://continuum.io/blog/cudapy_ilp_opt
搬資料
• Core
• Memory
controller
上面的效果對應CODE是什麼啊?
ILP = 2時,右邊用
pseudocode表示
# read
i = thread.id
ai = a[i]
bi = b[i]
j = i+5
aj = a[j]
bj = b[j]
# compute
ci = core(ai, bi)
cj = core(aj, bj)
# write
c[i] = ci
c[j] = cj
ILP=4時,實際效果=>讓GPU pipeline效果變
高
http://continuum.io/blog/cudapy_ilp_opt
上述主要概念整理
•Hide latency = do other operations when
waiting for latency
• ILP增加
• 增加Occupancy
剛提到the ILP method ,
一個thread 所使用的register個數是一個重要考
量。
Interpreting Output of --ptxas-options=-v
http://stackoverflow.com/questions/12388207/interpreting-output-of-ptxas-options-v
http://stackoverflow.com/questions/7241062/is-local-memory-slower-than-shared-memory-in-cuda
• Each CUDA thread is using 46 registers?
Yes, correct
• There is no register spilling to local memory(shared memory)?
Yes, correct
• Is 72 bytes the sum-total of the memory for the stack frames of the __global__ (撰寫
平行化的副程式)and __device__(給__global__函數呼叫的副程式) functions?
Yes, correct
我要怎麼限制一個thread的register使用數
• control register usage with the nvcc flag: --maxrregcount
假設threads的分配register總量超過GPU上的
register數量,編譯器會怎做?
stackoverflow神人表示
• PTX level allows many more virtual registers than the hardware.
Those are mapped to hardware registers at load time. The register
limit you specify allows you to set an upper limit on the hardware
resources used by the generated binary. It serves as a heuristic for the
compiler to decide when to spill (see below) registers when compiling
to PTX already so certain concurrency needs can be met.
• For Fermi GPUs there are at most 64 hardware registers. The 64th is
used by the ABI as the stack pointer and thus for "register spilling" (it
means freeing up registers by temporarily storing their values on the
stack and happens when more registers are needed than available) so
it is untouchable.
http://stackoverflow.com/questions/12167926/forcing-cuda-to-use-register-for-a-variable
剛剛說利用增加register來賺memory coalesce的
時間。 register用超過會增加memory存取時間。
怎辦啊?
哈! 再怎嘴砲,也是要coding才知阿~~~~~
我可以寫程式把所需資料放在哪呢?
Mohamed Zahran,
“Lecture 6: CUDA Memories”
• 存取速度
shared memory >
constant memory >
global memory >
要怎宣告的資料,代表存取哪種memory啊?
描述有錯,要看
compiler放在哪裡
Stackoverflow神人
• Dynamically indexed arrays cannot be stored in registers, because the GPU
register file is not dynamically addressable.
• Scalar variables are automatically stored in registers by the compiler.
• Statically-indexed (i.e. where the index can be determined at compile
time), small arrays (say, less than 16 floats) may be stored in registers by the
compiler.
http://stackoverflow.com/questions/12167926/forcing-cuda-to-use-register-for-a-variable
來看一個簡單的範例
Summing two vectors
Jason Sa nders, Edward Kandrot, “CUDA by Example”
資料哪來啊? 從CPU Memory搬到global
memory
Jason Sa nders, Edward Kandrot, “CUDA by Example”
怎麼呼叫自己寫的平行化程式押?
• 呼叫時需要指定每個block有thread數,一個grid有多少block
• 上面意思是說一個grid有N個blocks ,每個block有1個thread再
執行
threadsblocks
Jason Sa nders, Edward Kandrot, “CUDA by Example”
從GPU global memory寫回到CPU memory去
處理
Jason Sa nders, Edward Kandrot, “CUDA by Example”
整理以上流程
http://en.wikipedia.org/wiki/CUDA
為什麼要指定的thread數block數會有
1D,2D,3D阿?
• 1 block 4
• 一個block是9x9,因為
100 thread所以有兩
個block
• 2 blocks
在thread數不是32倍數的狀況下,1D,2D,3D
的分法就是要比較哪個warp塞比較滿!!!
要怎量GPU跑的時間
Profiling Tool: nvprof
nvprof --events warps_launched,threads_launched ./執行檔 執行檔輸入參數 >
result
Q&A
Q&A-1: flow divergence的討論
• JIT的作法
• 程式用profile知道哪些true或false的狀況,分開同時丟給JIT去執行
• Brower就是用這樣的方式去加快處理
• 這樣的做法很吃memory
Q&A-2: NVIDA/AMD
• NVIDA
• 筆電,伺服器
• AMD
• 手機
Q&A-3:Single Instruction, Multiple Addresses
的討論
• 對於compiler處理random access
• Point analysis
Q&A-4:
• CUDA LLVM Compiler
• 目前CUDA不支援OpenCL 2.0
• https://developer.nvidia.com/opencl
Q&A-5: trace code討論
• cuda-gdb
• http://docs.nvidia.com/cuda/cuda-gdb/#axzz34ufkPsqt
• EX:
• Note: For disassembly instruction to work
properly, cuobjdump must be installed and present in your $PATH.
Q&A-6: GPU machine code放到哪執行阿?
 不知道GPU有沒有在討論locality問題?
Q&A-7 把function切開平行化是否有好處?
• Function()
function1()
function2()
function3()
• ?
Q&A-8 5 axis machine 的防碰撞平行化
• cutter每走一步就用GPU檢查有沒有撞到
• 問題: GPU持續耗電
• 如果5軸機開雕刻一整天 GPU不就耗電很恐怖?
• Trade off: 耗電/速度
CUDA Toolkit Documentation
• http://docs.nvidia.com/cuda/index.html#axzz33uurtJU9

More Related Content

What's hot

[B11] 基礎から知るSSD(いまさら聞けないSSDの基本) by Hironobu Asano
[B11] 基礎から知るSSD(いまさら聞けないSSDの基本) by Hironobu Asano[B11] 基礎から知るSSD(いまさら聞けないSSDの基本) by Hironobu Asano
[B11] 基礎から知るSSD(いまさら聞けないSSDの基本) by Hironobu AsanoInsight Technology, Inc.
 
NEDIA_SNIA_CXL_講演資料.pdf
NEDIA_SNIA_CXL_講演資料.pdfNEDIA_SNIA_CXL_講演資料.pdf
NEDIA_SNIA_CXL_講演資料.pdfYasunori Goto
 
ACRiウェビナー:岩渕様ご講演資料
ACRiウェビナー:岩渕様ご講演資料ACRiウェビナー:岩渕様ご講演資料
ACRiウェビナー:岩渕様ご講演資料直久 住川
 
Linux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance ShowdownLinux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance ShowdownScyllaDB
 
Qemu device prototyping
Qemu device prototypingQemu device prototyping
Qemu device prototypingYan Vugenfirer
 
いまさら聞けないarmを使ったNEONの基礎と活用事例
いまさら聞けないarmを使ったNEONの基礎と活用事例いまさら聞けないarmを使ったNEONの基礎と活用事例
いまさら聞けないarmを使ったNEONの基礎と活用事例Fixstars Corporation
 
UEFIによるELFバイナリの起動
UEFIによるELFバイナリの起動UEFIによるELFバイナリの起動
UEFIによるELFバイナリの起動uchan_nos
 
BlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year InBlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year InSage Weil
 
Netmcr 40 - Salt + Netbox + Vyos = Network Automation + Routing Security
Netmcr 40 - Salt + Netbox + Vyos = Network Automation + Routing SecurityNetmcr 40 - Salt + Netbox + Vyos = Network Automation + Routing Security
Netmcr 40 - Salt + Netbox + Vyos = Network Automation + Routing SecurityFaelix Ltd
 
ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)
ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)
ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)Mr. Vengineer
 
謎の言語Forthが謎なので実装した
謎の言語Forthが謎なので実装した謎の言語Forthが謎なので実装した
謎の言語Forthが謎なので実装したt-sin
 
QEMU Disk IO Which performs Better: Native or threads?
QEMU Disk IO Which performs Better: Native or threads?QEMU Disk IO Which performs Better: Native or threads?
QEMU Disk IO Which performs Better: Native or threads?Pradeep Kumar
 
DPDKによる高速コンテナネットワーキング
DPDKによる高速コンテナネットワーキングDPDKによる高速コンテナネットワーキング
DPDKによる高速コンテナネットワーキングTomoya Hibi
 
DPDKを用いたネットワークスタック,高性能通信基盤開発
DPDKを用いたネットワークスタック,高性能通信基盤開発DPDKを用いたネットワークスタック,高性能通信基盤開発
DPDKを用いたネットワークスタック,高性能通信基盤開発slankdev
 
Introduction to eBPF and XDP
Introduction to eBPF and XDPIntroduction to eBPF and XDP
Introduction to eBPF and XDPlcplcp1
 
Portacle : Common Lispのオールインワン開発環境
Portacle : Common Lispのオールインワン開発環境Portacle : Common Lispのオールインワン開発環境
Portacle : Common Lispのオールインワン開発環境Satoshi imai
 
0円でできる自宅InfiniBandプログラム
0円でできる自宅InfiniBandプログラム0円でできる自宅InfiniBandプログラム
0円でできる自宅InfiniBandプログラムMinoru Nakamura
 

What's hot (20)

[B11] 基礎から知るSSD(いまさら聞けないSSDの基本) by Hironobu Asano
[B11] 基礎から知るSSD(いまさら聞けないSSDの基本) by Hironobu Asano[B11] 基礎から知るSSD(いまさら聞けないSSDの基本) by Hironobu Asano
[B11] 基礎から知るSSD(いまさら聞けないSSDの基本) by Hironobu Asano
 
NEDIA_SNIA_CXL_講演資料.pdf
NEDIA_SNIA_CXL_講演資料.pdfNEDIA_SNIA_CXL_講演資料.pdf
NEDIA_SNIA_CXL_講演資料.pdf
 
SR-IOV Introduce
SR-IOV IntroduceSR-IOV Introduce
SR-IOV Introduce
 
ACRiウェビナー:岩渕様ご講演資料
ACRiウェビナー:岩渕様ご講演資料ACRiウェビナー:岩渕様ご講演資料
ACRiウェビナー:岩渕様ご講演資料
 
Linux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance ShowdownLinux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance Showdown
 
Qemu device prototyping
Qemu device prototypingQemu device prototyping
Qemu device prototyping
 
いまさら聞けないarmを使ったNEONの基礎と活用事例
いまさら聞けないarmを使ったNEONの基礎と活用事例いまさら聞けないarmを使ったNEONの基礎と活用事例
いまさら聞けないarmを使ったNEONの基礎と活用事例
 
UEFIによるELFバイナリの起動
UEFIによるELFバイナリの起動UEFIによるELFバイナリの起動
UEFIによるELFバイナリの起動
 
BlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year InBlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year In
 
Netmcr 40 - Salt + Netbox + Vyos = Network Automation + Routing Security
Netmcr 40 - Salt + Netbox + Vyos = Network Automation + Routing SecurityNetmcr 40 - Salt + Netbox + Vyos = Network Automation + Routing Security
Netmcr 40 - Salt + Netbox + Vyos = Network Automation + Routing Security
 
ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)
ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)
ZynqMPのブートとパワーマネージメント : (ZynqMP Boot and Power Management)
 
謎の言語Forthが謎なので実装した
謎の言語Forthが謎なので実装した謎の言語Forthが謎なので実装した
謎の言語Forthが謎なので実装した
 
RISC-V Zce Extension
RISC-V Zce ExtensionRISC-V Zce Extension
RISC-V Zce Extension
 
QEMU Disk IO Which performs Better: Native or threads?
QEMU Disk IO Which performs Better: Native or threads?QEMU Disk IO Which performs Better: Native or threads?
QEMU Disk IO Which performs Better: Native or threads?
 
DPDKによる高速コンテナネットワーキング
DPDKによる高速コンテナネットワーキングDPDKによる高速コンテナネットワーキング
DPDKによる高速コンテナネットワーキング
 
DPDKを用いたネットワークスタック,高性能通信基盤開発
DPDKを用いたネットワークスタック,高性能通信基盤開発DPDKを用いたネットワークスタック,高性能通信基盤開発
DPDKを用いたネットワークスタック,高性能通信基盤開発
 
Making Linux do Hard Real-time
Making Linux do Hard Real-timeMaking Linux do Hard Real-time
Making Linux do Hard Real-time
 
Introduction to eBPF and XDP
Introduction to eBPF and XDPIntroduction to eBPF and XDP
Introduction to eBPF and XDP
 
Portacle : Common Lispのオールインワン開発環境
Portacle : Common Lispのオールインワン開発環境Portacle : Common Lispのオールインワン開発環境
Portacle : Common Lispのオールインワン開発環境
 
0円でできる自宅InfiniBandプログラム
0円でできる自宅InfiniBandプログラム0円でできる自宅InfiniBandプログラム
0円でできる自宅InfiniBandプログラム
 

Viewers also liked

[SITCON2015] 自己的異質多核心平台自己幹
[SITCON2015] 自己的異質多核心平台自己幹[SITCON2015] 自己的異質多核心平台自己幹
[SITCON2015] 自己的異質多核心平台自己幹Aj MaChInE
 
[MOSUT] Format String Attacks
[MOSUT] Format String Attacks[MOSUT] Format String Attacks
[MOSUT] Format String AttacksAj MaChInE
 
[若渴計畫]64-bit Linux Return-Oriented Programming
[若渴計畫]64-bit Linux Return-Oriented Programming[若渴計畫]64-bit Linux Return-Oriented Programming
[若渴計畫]64-bit Linux Return-Oriented ProgrammingAj MaChInE
 
閱讀文章分享@若渴 2016.1.24
閱讀文章分享@若渴 2016.1.24閱讀文章分享@若渴 2016.1.24
閱讀文章分享@若渴 2016.1.24Aj MaChInE
 
[若渴計畫2015.8.18] SMACK
[若渴計畫2015.8.18] SMACK[若渴計畫2015.8.18] SMACK
[若渴計畫2015.8.18] SMACKAj MaChInE
 
[若渴計畫] Studying Concurrency
[若渴計畫] Studying Concurrency[若渴計畫] Studying Concurrency
[若渴計畫] Studying ConcurrencyAj MaChInE
 
大學部101級專題 cuda
大學部101級專題 cuda大學部101級專題 cuda
大學部101級專題 cuda迺翔 黃
 
Cuda optimization
Cuda optimizationCuda optimization
Cuda optimizationCHIHTE LU
 
[MOSUT20150131] Linux Runs on SoCKit Board with the GPGPU
[MOSUT20150131] Linux Runs on SoCKit Board with the GPGPU[MOSUT20150131] Linux Runs on SoCKit Board with the GPGPU
[MOSUT20150131] Linux Runs on SoCKit Board with the GPGPUAj MaChInE
 
Introduction to gpu architecture
Introduction to gpu architectureIntroduction to gpu architecture
Introduction to gpu architectureCHIHTE LU
 
圖形處理器於腦部核磁共振影像處理應用
圖形處理器於腦部核磁共振影像處理應用圖形處理器於腦部核磁共振影像處理應用
圖形處理器於腦部核磁共振影像處理應用NVIDIA Taiwan
 

Viewers also liked (12)

[SITCON2015] 自己的異質多核心平台自己幹
[SITCON2015] 自己的異質多核心平台自己幹[SITCON2015] 自己的異質多核心平台自己幹
[SITCON2015] 自己的異質多核心平台自己幹
 
[MOSUT] Format String Attacks
[MOSUT] Format String Attacks[MOSUT] Format String Attacks
[MOSUT] Format String Attacks
 
[若渴計畫]64-bit Linux Return-Oriented Programming
[若渴計畫]64-bit Linux Return-Oriented Programming[若渴計畫]64-bit Linux Return-Oriented Programming
[若渴計畫]64-bit Linux Return-Oriented Programming
 
閱讀文章分享@若渴 2016.1.24
閱讀文章分享@若渴 2016.1.24閱讀文章分享@若渴 2016.1.24
閱讀文章分享@若渴 2016.1.24
 
[若渴計畫2015.8.18] SMACK
[若渴計畫2015.8.18] SMACK[若渴計畫2015.8.18] SMACK
[若渴計畫2015.8.18] SMACK
 
Code GPU with CUDA - SIMT
Code GPU with CUDA - SIMTCode GPU with CUDA - SIMT
Code GPU with CUDA - SIMT
 
[若渴計畫] Studying Concurrency
[若渴計畫] Studying Concurrency[若渴計畫] Studying Concurrency
[若渴計畫] Studying Concurrency
 
大學部101級專題 cuda
大學部101級專題 cuda大學部101級專題 cuda
大學部101級專題 cuda
 
Cuda optimization
Cuda optimizationCuda optimization
Cuda optimization
 
[MOSUT20150131] Linux Runs on SoCKit Board with the GPGPU
[MOSUT20150131] Linux Runs on SoCKit Board with the GPGPU[MOSUT20150131] Linux Runs on SoCKit Board with the GPGPU
[MOSUT20150131] Linux Runs on SoCKit Board with the GPGPU
 
Introduction to gpu architecture
Introduction to gpu architectureIntroduction to gpu architecture
Introduction to gpu architecture
 
圖形處理器於腦部核磁共振影像處理應用
圖形處理器於腦部核磁共振影像處理應用圖形處理器於腦部核磁共振影像處理應用
圖形處理器於腦部核磁共振影像處理應用
 

Similar to [若渴計畫]由GPU硬體概念到coding CUDA

Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.J On The Beach
 
lecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptxlecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptxssuser413a98
 
[Osxdev]metal
[Osxdev]metal[Osxdev]metal
[Osxdev]metalNAVER D2
 
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAMaking the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAFacultad de Informática UCM
 
Transparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaTransparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaKazuaki Ishizaki
 
Scala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala TaiwanScala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala TaiwanJimin Hsieh
 
[CB20] Vulnerabilities of Machine Learning Infrastructure by Sergey Gordeychik
[CB20] Vulnerabilities of Machine Learning Infrastructure by Sergey Gordeychik[CB20] Vulnerabilities of Machine Learning Infrastructure by Sergey Gordeychik
[CB20] Vulnerabilities of Machine Learning Infrastructure by Sergey GordeychikCODE BLUE
 
Introduction to cuda geek camp singapore 2011
Introduction to cuda   geek camp singapore 2011Introduction to cuda   geek camp singapore 2011
Introduction to cuda geek camp singapore 2011Raymond Tay
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
gpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsngpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsnARUNACHALAM468781
 
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsAMD Developer Central
 
SMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgiSMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgiTakuya ASADA
 
Bits of Advice for the VM Writer, by Cliff Click @ Curry On 2015
Bits of Advice for the VM Writer, by Cliff Click @ Curry On 2015Bits of Advice for the VM Writer, by Cliff Click @ Curry On 2015
Bits of Advice for the VM Writer, by Cliff Click @ Curry On 2015curryon
 
Java on the GPU: Where are we now?
Java on the GPU: Where are we now?Java on the GPU: Where are we now?
Java on the GPU: Where are we now?Dmitry Alexandrov
 
lecture_GPUArchCUDA02-CUDAMem.pdf
lecture_GPUArchCUDA02-CUDAMem.pdflecture_GPUArchCUDA02-CUDAMem.pdf
lecture_GPUArchCUDA02-CUDAMem.pdfTigabu Yaya
 
QCon2016--Drive Best Spark Performance on AI
QCon2016--Drive Best Spark Performance on AIQCon2016--Drive Best Spark Performance on AI
QCon2016--Drive Best Spark Performance on AILex Yu
 

Similar to [若渴計畫]由GPU硬體概念到coding CUDA (20)

Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.
 
lecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptxlecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptx
 
[Osxdev]metal
[Osxdev]metal[Osxdev]metal
[Osxdev]metal
 
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAMaking the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
 
Transparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaTransparent GPU Exploitation for Java
Transparent GPU Exploitation for Java
 
Scala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala TaiwanScala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala Taiwan
 
[CB20] Vulnerabilities of Machine Learning Infrastructure by Sergey Gordeychik
[CB20] Vulnerabilities of Machine Learning Infrastructure by Sergey Gordeychik[CB20] Vulnerabilities of Machine Learning Infrastructure by Sergey Gordeychik
[CB20] Vulnerabilities of Machine Learning Infrastructure by Sergey Gordeychik
 
Introduction to cuda geek camp singapore 2011
Introduction to cuda   geek camp singapore 2011Introduction to cuda   geek camp singapore 2011
Introduction to cuda geek camp singapore 2011
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
gpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsngpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsn
 
GPU: Understanding CUDA
GPU: Understanding CUDAGPU: Understanding CUDA
GPU: Understanding CUDA
 
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
 
SMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgiSMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgi
 
Bits of Advice for the VM Writer, by Cliff Click @ Curry On 2015
Bits of Advice for the VM Writer, by Cliff Click @ Curry On 2015Bits of Advice for the VM Writer, by Cliff Click @ Curry On 2015
Bits of Advice for the VM Writer, by Cliff Click @ Curry On 2015
 
Java on the GPU: Where are we now?
Java on the GPU: Where are we now?Java on the GPU: Where are we now?
Java on the GPU: Where are we now?
 
lecture_GPUArchCUDA02-CUDAMem.pdf
lecture_GPUArchCUDA02-CUDAMem.pdflecture_GPUArchCUDA02-CUDAMem.pdf
lecture_GPUArchCUDA02-CUDAMem.pdf
 
QCon2016--Drive Best Spark Performance on AI
QCon2016--Drive Best Spark Performance on AIQCon2016--Drive Best Spark Performance on AI
QCon2016--Drive Best Spark Performance on AI
 

More from Aj MaChInE

An Intro on Data-oriented Attacks
An Intro on Data-oriented AttacksAn Intro on Data-oriented Attacks
An Intro on Data-oriented AttacksAj MaChInE
 
A Study on .NET Framework for Red Team - Part I
A Study on .NET Framework for Red Team - Part IA Study on .NET Framework for Red Team - Part I
A Study on .NET Framework for Red Team - Part IAj MaChInE
 
A study on NetSpectre
A study on NetSpectreA study on NetSpectre
A study on NetSpectreAj MaChInE
 
Introduction to Adversary Evaluation Tools
Introduction to Adversary Evaluation ToolsIntroduction to Adversary Evaluation Tools
Introduction to Adversary Evaluation ToolsAj MaChInE
 
[若渴] A preliminary study on attacks against consensus in bitcoin
[若渴] A preliminary study on attacks against consensus in bitcoin[若渴] A preliminary study on attacks against consensus in bitcoin
[若渴] A preliminary study on attacks against consensus in bitcoinAj MaChInE
 
[RAT資安小聚] Study on Automatically Evading Malware Detection
[RAT資安小聚] Study on Automatically Evading Malware Detection[RAT資安小聚] Study on Automatically Evading Malware Detection
[RAT資安小聚] Study on Automatically Evading Malware DetectionAj MaChInE
 
[若渴] Preliminary Study on Design and Exploitation of Trustzone
[若渴] Preliminary Study on Design and Exploitation of Trustzone[若渴] Preliminary Study on Design and Exploitation of Trustzone
[若渴] Preliminary Study on Design and Exploitation of TrustzoneAj MaChInE
 
[若渴]Study on Side Channel Attacks and Countermeasures
[若渴]Study on Side Channel Attacks and Countermeasures [若渴]Study on Side Channel Attacks and Countermeasures
[若渴]Study on Side Channel Attacks and Countermeasures Aj MaChInE
 
[若渴計畫] Challenges and Solutions of Window Remote Shellcode
[若渴計畫] Challenges and Solutions of Window Remote Shellcode[若渴計畫] Challenges and Solutions of Window Remote Shellcode
[若渴計畫] Challenges and Solutions of Window Remote ShellcodeAj MaChInE
 
[若渴計畫] Introduction: Formal Verification for Code
[若渴計畫] Introduction: Formal Verification for Code[若渴計畫] Introduction: Formal Verification for Code
[若渴計畫] Introduction: Formal Verification for CodeAj MaChInE
 
[若渴計畫] Studying ASLR^cache
[若渴計畫] Studying ASLR^cache[若渴計畫] Studying ASLR^cache
[若渴計畫] Studying ASLR^cacheAj MaChInE
 
[若渴計畫] Black Hat 2017之過去閱讀相關整理
[若渴計畫] Black Hat 2017之過去閱讀相關整理[若渴計畫] Black Hat 2017之過去閱讀相關整理
[若渴計畫] Black Hat 2017之過去閱讀相關整理Aj MaChInE
 

More from Aj MaChInE (12)

An Intro on Data-oriented Attacks
An Intro on Data-oriented AttacksAn Intro on Data-oriented Attacks
An Intro on Data-oriented Attacks
 
A Study on .NET Framework for Red Team - Part I
A Study on .NET Framework for Red Team - Part IA Study on .NET Framework for Red Team - Part I
A Study on .NET Framework for Red Team - Part I
 
A study on NetSpectre
A study on NetSpectreA study on NetSpectre
A study on NetSpectre
 
Introduction to Adversary Evaluation Tools
Introduction to Adversary Evaluation ToolsIntroduction to Adversary Evaluation Tools
Introduction to Adversary Evaluation Tools
 
[若渴] A preliminary study on attacks against consensus in bitcoin
[若渴] A preliminary study on attacks against consensus in bitcoin[若渴] A preliminary study on attacks against consensus in bitcoin
[若渴] A preliminary study on attacks against consensus in bitcoin
 
[RAT資安小聚] Study on Automatically Evading Malware Detection
[RAT資安小聚] Study on Automatically Evading Malware Detection[RAT資安小聚] Study on Automatically Evading Malware Detection
[RAT資安小聚] Study on Automatically Evading Malware Detection
 
[若渴] Preliminary Study on Design and Exploitation of Trustzone
[若渴] Preliminary Study on Design and Exploitation of Trustzone[若渴] Preliminary Study on Design and Exploitation of Trustzone
[若渴] Preliminary Study on Design and Exploitation of Trustzone
 
[若渴]Study on Side Channel Attacks and Countermeasures
[若渴]Study on Side Channel Attacks and Countermeasures [若渴]Study on Side Channel Attacks and Countermeasures
[若渴]Study on Side Channel Attacks and Countermeasures
 
[若渴計畫] Challenges and Solutions of Window Remote Shellcode
[若渴計畫] Challenges and Solutions of Window Remote Shellcode[若渴計畫] Challenges and Solutions of Window Remote Shellcode
[若渴計畫] Challenges and Solutions of Window Remote Shellcode
 
[若渴計畫] Introduction: Formal Verification for Code
[若渴計畫] Introduction: Formal Verification for Code[若渴計畫] Introduction: Formal Verification for Code
[若渴計畫] Introduction: Formal Verification for Code
 
[若渴計畫] Studying ASLR^cache
[若渴計畫] Studying ASLR^cache[若渴計畫] Studying ASLR^cache
[若渴計畫] Studying ASLR^cache
 
[若渴計畫] Black Hat 2017之過去閱讀相關整理
[若渴計畫] Black Hat 2017之過去閱讀相關整理[若渴計畫] Black Hat 2017之過去閱讀相關整理
[若渴計畫] Black Hat 2017之過去閱讀相關整理
 

Recently uploaded

Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...NETWAYS
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Delhi Call girls
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfhenrik385807
 
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Krijn Poppe
 
Philippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptPhilippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptssuser319dad
 
George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024eCommerce Institute
 
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )Pooja Nehwal
 
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxFamilyWorshipCenterD
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Hasting Chen
 
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...NETWAYS
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@vikas rana
 
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrSaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrsaastr
 
LANDMARKS AND MONUMENTS IN NIGERIA.pptx
LANDMARKS  AND MONUMENTS IN NIGERIA.pptxLANDMARKS  AND MONUMENTS IN NIGERIA.pptx
LANDMARKS AND MONUMENTS IN NIGERIA.pptxBasil Achie
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...NETWAYS
 
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...NETWAYS
 
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...NETWAYS
 
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝soniya singh
 
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Kayode Fayemi
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesPooja Nehwal
 

Recently uploaded (20)

Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
 
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
 
Philippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptPhilippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.ppt
 
George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024
 
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
 
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
 
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@
 
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrSaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
 
LANDMARKS AND MONUMENTS IN NIGERIA.pptx
LANDMARKS  AND MONUMENTS IN NIGERIA.pptxLANDMARKS  AND MONUMENTS IN NIGERIA.pptx
LANDMARKS AND MONUMENTS IN NIGERIA.pptx
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
 
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
 
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
 
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
 
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
 

[若渴計畫]由GPU硬體概念到coding CUDA

Editor's Notes

  1. splashtop
  2. 當初有個議題
  3. 對於programming model的解釋, 我在上次研討會的時候,我覺得有一個不錯的解釋 使用此programming model 他會帶來給你怎樣的設計概念.
  4. 所以接下來就針對SIMT來說明
  5. 這裡就是在CUDA中,描述每條thread 要做的事情 i代表每條thread 某一條thread i B陣列的i元素 和c陣列的i元素 存到a陣列的i元素裡面 每條thread所占用的硬體資源會如右圖
  6. 每條tread i 取某個lut陣列的b[i]元素 這樣的意思是說 每個thread可以對memory 自己抓自己的memory address來處理.
  7. Warp表示一個一個warp 每個warp由32 threads組成 本來32條 threads執行在同一個warp同步執行 會變成兩個warp循序執行
  8. Latency hiding 當有等地latency hiding時 去做別的事情
  9. 在每個thread使用作多個register時, 可同時執行8個warp 所以如果設計成9個Dispatch的話,在這狀況下1個dispatch就多於了