11. What LLVM does
(Diagram: front ends for C, C++, Obj-C, and yet another programming language run Lexer → Parser → Generate IR to produce LLVM IR; LLVM then does Optimize and Generate Asm / JIT to target X86, ARM, Power, and yet another hardware platform. A minimal Python sketch of that IR-in, assembly-out flow follows.)
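To make the flow above concrete, here is a minimal sketch of the IR-in, assembly-out path driven from Python. The talk uses its own boost::python binding for this; the snippet below substitutes the llvmlite binding purely for illustration, so take the library choice as an assumption rather than what the slides used.

# Sketch: hand LLVM some textual IR and have a backend emit native assembly.
# llvmlite stands in for the talk's own boost::python LLVM binding.
import llvmlite.binding as llvm

llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()

ir_text = """
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}
"""

mod = llvm.parse_assembly(ir_text)   # the front end's job: produce LLVM IR
mod.verify()
# (optimization passes would run here; omitted to keep the sketch short)

target = llvm.Target.from_default_triple()   # X86, ARM, ... whatever the host is
tm = target.create_target_machine()
print(tm.emit_assembly(mod))                 # the back end's job: native asm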
18. What can you do with it?
(Diagram: with the CUDA Compiler SDK Preview, yet another programming language emits NVVM IR; libNVVM runs Optimize and Codegen to turn it into PTX for an NVIDIA GPU.)
20. What can you do with it? (repeats the diagram from slide 18)
24. The finer details
• __device__, __constant__, and friends are expressed with LLVM's notion of address space numbers (see the IR-building sketch after this list)
• Whether a function is __global__, __device__, and so on is expressed through metadata
• A handful of extra intrinsics (nvvm.*)
• Synchronization, texture/surface access, special registers, etc.
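As a rough illustration of how those pieces look when the IR is built programmatically, here is a sketch using llvmlite's IR builder (again a stand-in for the talk's own binding). The address-space numbers and the nvvm.annotations / llvm.nvvm.read.ptx.sreg.tid.x names follow my reading of the NVVM IR spec, so treat the details as assumptions rather than the slides' exact output.

# Sketch: a tiny NVVM-style module built with an IR builder.
# Address spaces and metadata layout follow my reading of the NVVM IR spec.
from llvmlite import ir

module = ir.Module(name="nvvm_sketch")

# __constant__ float scale;  ->  a global in address space 4
scale = ir.GlobalVariable(module, ir.FloatType(), name="scale", addrspace=4)

# __global__ void kern(float *dest)  ->  an ordinary function plus metadata
fnty = ir.FunctionType(ir.VoidType(), [ir.PointerType(ir.FloatType(), addrspace=1)])
kern = ir.Function(module, fnty, name="kern")

# threadIdx.x comes from an nvvm special-register intrinsic
tid_ty = ir.FunctionType(ir.IntType(32), [])
tid_x = ir.Function(module, tid_ty, name="llvm.nvvm.read.ptx.sreg.tid.x")

builder = ir.IRBuilder(kern.append_basic_block(name="entry"))
dest = kern.args[0]
i = builder.call(tid_x, [])                # i = threadIdx.x
ptr = builder.gep(dest, [i])               # &dest[i]
builder.store(builder.load(scale), ptr)    # dest[i] = scale
builder.ret_void()

# Mark kern as a __global__ kernel through nvvm.annotations metadata.
md = module.add_metadata([kern, "kernel", ir.Constant(ir.IntType(32), 1)])
module.add_named_metadata("nvvm.annotations").add(md)

print(module)   # dump the textual IR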
25. NVVM Library API
• Start and Shutdown
• Create Compiler Unit
• Verify
• Compile To PTX
• Get Result (a ctypes sketch of this call sequence follows)
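The steps above map fairly directly onto a ctypes call sequence. The preview SDK in the talk speaks of explicit start/shutdown and a "compiler unit"; later libNVVM releases call the same thing a "program", and the sketch below uses those program-style entry points. The library filename and exact signatures depend on your CUDA install, so treat them as assumptions; error checking is omitted.

# Sketch: drive libNVVM from Python via ctypes to turn NVVM IR into PTX.
# Entry-point names follow the later "program" API; the preview SDK's
# compiler-unit API differs, so adapt to whichever libnvvm you have.
import ctypes

nvvm = ctypes.CDLL("libnvvm.so")    # assumption: findable by the loader

def compile_ir_to_ptx(nvvm_ir: bytes, name: bytes = b"module") -> bytes:
    prog = ctypes.c_void_p()
    nvvm.nvvmCreateProgram(ctypes.byref(prog))            # create the unit/program
    nvvm.nvvmAddModuleToProgram(prog, nvvm_ir,
                                ctypes.c_size_t(len(nvvm_ir)), name)
    nvvm.nvvmVerifyProgram(prog, 0, None)                 # Verify
    nvvm.nvvmCompileProgram(prog, 0, None)                # Compile To PTX
    size = ctypes.c_size_t()
    nvvm.nvvmGetCompiledResultSize(prog, ctypes.byref(size))
    buf = ctypes.create_string_buffer(size.value)
    nvvm.nvvmGetCompiledResult(prog, buf)                 # Get Result
    nvvm.nvvmDestroyProgram(ctypes.byref(prog))           # cleanup / shutdown
    return buf.value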
32. 1 import pycuda.driver as drv
2 import pycuda.tools
3 import pycuda.autoinit
4 import numpy
5 import numpy.linalg as la
6 from pycuda.compiler import SourceModule
7
8 mod = SourceModule("""
9 __global__ void multiply_them(float *dest, float *a, float *b)
10 {
11 const int i = threadIdx.x;
12 dest[i] = a[i] * b[i];
13 }
14 """)
15
16 multiply_them = mod.get_function("multiply_them")
17
18 a = numpy.random.randn(400).astype(numpy.float32)
19 b = numpy.random.randn(400).astype(numpy.float32)
20
21 dest = numpy.zeros_like(a)
22 multiply_them(
23 drv.Out(dest), drv.In(a), drv.In(b),
24 block=(400,1,1))
25
26 print dest-a*b
GPU programming from Python, made easy!
34. 4 import numpy
5 import numpy.linalg as la
6 from pycuda.compiler import SourceModule
7
8 mod = SourceModule("""
9 __global__ void multiply_them(float *dest, float *a, float *b)
10 {
11 const int i = threadIdx.x;
12 dest[i] = a[i] * b[i];
13 }
14 """)
15
16 multiply_them = mod.get_function("multiply_them")
17
18 a = numpy.random.randn(400).astype(numpy.float32)
19 b = numpy.random.randn(400).astype(numpy.float32)
20
21 dest = numpy.zeros_like(a)
22 multiply_them(
23 drv.Out(dest), drv.In(a), drv.In(b),
24 block=(400,1,1))
35. 8 mod = SourceModule("""
9 __global__ void multiply_them(float *dest, float *a, float *b)
10 {
11 const int i = threadIdx.x;
12 dest[i] = a[i] * b[i];
13 }
14 """)
38. We'd like to write it like this (a sketch of how such a decorator could be wired follows the listing):
1 import pycuda.driver as drv
2 import pycuda.tools
3 import pycuda.autoinit
4 import numpy
5 import numpy.linalg as la
6 from pycuda.compiler import SourceModule
7
8 @kernel
9 def multiply_them(dest, a, b):
10 i = threadIdx.x
11 dest[i] = a[i] * b[i];
12
13 a = numpy.random.randn(400).astype(numpy.float32)
14 b = numpy.random.randn(400).astype(numpy.float32)
15
16 dest = numpy.zeros_like(a)
17 multiply_them(
18 drv.Out(dest), drv.In(a), drv.In(b),
19 block=(400,1,1))
20
21 print dest-a*b
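One plausible shape for such a decorator is sketched below. It only shows the plumbing: grab the function's source, parse it, and hand back a launcher. compile_ast_to_ptx and launch_ptx are hypothetical placeholders for the AST-to-NVVM-IR translator and the driver-side launch described on the following slides; they do not exist as-is.

# Hypothetical plumbing for an @kernel decorator: capture the Python source
# of the decorated function and return a callable that launches compiled code.
import ast
import functools
import inspect

def kernel(func):
    source = inspect.getsource(func)   # recover the "def ..." text
    tree = ast.parse(source)           # the AST that a translator would walk

    @functools.wraps(func)
    def launcher(*args, **launch_config):
        # Placeholders: compile_ast_to_ptx / launch_ptx stand in for the
        # Python-AST -> NVVM IR -> PTX translator and the GPU launch.
        ptx = compile_ast_to_ptx(tree)                  # hypothetical
        return launch_ptx(ptx, args, **launch_config)   # hypothetical
    return launcher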
40. Code generation with the @kernel decorator
(Callouts on this slide: the pynvvm_ctaid_x() / pynvvm_ntid_x() / pynvvm_tid_x() calls are dedicated intrinsics, and the In/Out host-device transfers are inferred and performed automatically at code-generation time.)
1 import numpy as np
2
3 from pynvvm.kernel import kernel
4 from pynvvm.nvtype import array, float32, int32
5
6 @kernel(array(float32), array(float32), array(float32), float32(), int32(), int32())
7 def saxpy(z, x, y, a, w, h):
8 xidx = pynvvm_ctaid_x() * pynvvm_ntid_x() + pynvvm_tid_x()
9 yidx = pynvvm_ctaid_y() * pynvvm_ntid_y() + pynvvm_tid_y()
10
11 if yidx < h and xidx < w:
12 i = yidx * w + xidx
13 z[i] = a * x[i] + y[i]
14
15 return
16
17 n = 1024
18
19 x = np.random.randn(n*n).astype(np.float32)
20 y = np.random.randn(n*n).astype(np.float32)
21 a = np.float32(2.71828183)
22
23 z = np.zeros_like(x)
24
25 bsz = (16, 16, 1)
26 gsz = ((n+16-1)/16, (n+16-1)/16, 1)
27
28 saxpy(bsz, gsz)(z, x, y, a, np.int32(n), np.int32(n))
29
30 print(z)
41. (Pipeline diagram inside the @kernel decorator: kernel function → inspect.getsource() → ast.parse() → AST → type inference → Typed AST → NVVM Codegen → NVVM IR → PTX Codegen → PTX → GO GPU! The first two steps are shown as a runnable snippet below.)
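The first two boxes of that pipeline are plain standard library; the runnable snippet below shows exactly what they yield before type inference and NVVM code generation take over (run it as a script, since inspect.getsource needs a source file).

# Runnable: recover a kernel function's source and parse it into the AST
# that the type-inference and NVVM codegen stages would then consume.
import ast
import inspect

def multiply_them(dest, a, b):
    i = threadIdx.x          # never executed on the CPU; only parsed
    dest[i] = a[i] * b[i]

source = inspect.getsource(multiply_them)
tree = ast.parse(source)
print(ast.dump(tree))        # the untyped AST handed to the later stages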
43. What I built
• A Python binding for the LLVM library
• boost::python is wonderful
• A Python binding for the NVVM library
• ctypes is wonderful (a minimal version-query sketch follows this list)
• A translator from Python AST to NVVM IR
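As a minimal taste of the ctypes side, the snippet below loads libnvvm and asks it for its version. nvvmVersion is part of the libNVVM C API as far as I know; the library filename is platform-specific, so both are assumptions about your setup.

# Minimal ctypes binding check: load libnvvm and query its version.
import ctypes

nvvm = ctypes.CDLL("libnvvm.so")    # nvvm.dll on Windows

major = ctypes.c_int()
minor = ctypes.c_int()
status = nvvm.nvvmVersion(ctypes.byref(major), ctypes.byref(minor))
print(status, major.value, minor.value)   # status 0 means NVVM_SUCCESS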
50. Summary
• NVVM IR can be built with the LLVM IR builder,
so getting your own pet language running on the GPU is easy!
• Even if you emit naive IR, it gets optimized in all sorts of ways, so you're happy
• Everyone, go play with the CUDA Compiler SDK
51. Related Work
• py2llvm
• python -> LLVM IR
• http://code.google.com/p/py2llvm/
• copperhead
• Automatic parallelization
• http://code.google.com/p/copperhead/