GPU Computing with Ruby

SpeedGo Computing

Chung Shin Yee
shinyee@speedgocomputing.com

CPU vs GPU Architecture
6 cores vs 1024 cores
6 GB/s vs 300 GB/s memory bandwidth

[Figure from the CUDA C Programming Guide]

CUDA Programming Model

[Figure: CUDA programming model, from the CUDA C Programming Guide]

Existing Programming Tools
● Cg
● BrookGPU
● GLSL (OpenGL Shading Language)
● Nvidia CUDA C/C++
● OpenCL
● PyCUDA

Where is the red Ruby?

Bridging Ruby & CUDA C/C++
● Ruby C extension
– Hard to manipulate Ruby objects in C.
– Compilation problems.
● Ruby FFI
– Bridging purely in Ruby.
– Supports multiple Ruby implementations.

Developing SGC Ruby CUDA
● Object-oriented API.
● Start with crucial operations.
– Memory allocation.
– Memory transfer.
– Kernel launch.
– Wrapper for structures.
● Documented with YARD.

Driver vs Runtime API
● CUDA Driver API
– For system developers.
– Supported by PyCUDA.
● CUDA Runtime API
– For computation centric developers.

We are going to support both APIs!

Using SGC Ruby CUDA
● Kernel program in CUDA C.
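
A minimal sketch of what such a kernel could look like, held in a Ruby heredoc so it can be written out as vadd.cu. The signature is an assumption, inferred from the launch parameters used later in this deck (three device pointers and an element count); it is not copied from the SGC Ruby CUDA repository. `VADD_SOURCE` and `write_kernel` are hypothetical names used only for illustration.

```ruby
# Assumed CUDA C source for the "vadd" kernel (illustrative, not the project's file).
VADD_SOURCE = <<~CUDA
  // extern "C" keeps the symbol unmangled so it can be looked up by the name "vadd".
  extern "C" __global__ void vadd(const int* a, const int* b, int* c, int n)
  {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n)
          c[i] = a[i] + b[i];
  }
CUDA

# Hypothetical helper: save the source so it can be compiled with nvcc.
def write_kernel(path = "vadd.cu")
  File.write(path, VADD_SOURCE)
  path
end
```

Calling `write_kernel` produces a vadd.cu file that the nvcc step on the next slide can take as input.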

Using SGC Ruby CUDA
● Compiling kernel into PTX.
– nvcc --ptx vadd.cu

Using SGC Ruby CUDA
● Setup
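require 'rubycu'
include SGC::CU
CUInit.init                        # initialize the driver API
d = CUDevice.get(0)                # first CUDA device
c = CUContext.create(d)            # context on that device
m = CUModule.new.load("vadd.ptx")  # load the compiled PTX
f = m.function("vadd")             # look up the kernel by name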

Using SGC Ruby CUDA
● Memory allocations
da = CUDevice.malloc(10*4)
db = CUDevice.malloc(10*4)
dc = CUDevice.malloc(10*4)
ha = Buffer.new(:int, 10)
hb = Buffer.new(:int, 10)
hc = Buffer.new(:int, 10)
hd = Buffer.new(:int, 10)
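
The `10*4` above is a byte count: 10 elements times 4 bytes per C int. A small sketch of deriving that size instead of hard-coding it, using only core Ruby. `SIZEOF_INT` and `int_buffer_bytes` are hypothetical names for illustration, not part of the SGC Ruby CUDA API.

```ruby
# Size in bytes of a native C int, measured by packing one value ('i' = native signed int).
SIZEOF_INT = [0].pack("i").bytesize

# Hypothetical helper: bytes needed for a device buffer holding +count+ ints.
def int_buffer_bytes(count)
  count * SIZEOF_INT
end

puts int_buffer_bytes(10)
```

Keeping the element count and the element size separate makes it harder for the allocation size and the later copy size to drift apart.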

Using SGC Ruby CUDA
● Initialization
(0...10).each { |i|
  ha[i] = i              # first input
  hb[i] = 1              # second input
  hc[i] = ha[i] + hb[i]  # expected result, computed on the CPU
  hd[i] = 0              # will receive the GPU result
}

Using SGC Ruby CUDA
● Transfer inputs to the GPU
CUMemory.memcpy_htod(da, ha, 4*10)
CUMemory.memcpy_htod(db, hb, 4*10)
CUMemory.memcpy_htod(dc, hd, 4*10)  # zero the output so the check is meaningful

Using SGC Ruby CUDA
● Launch kernel on GPU
# Launch with a 1x1x1 grid of 10x1x1 blocks,
# 0 bytes of shared memory, on stream 0.
params = [da, db, dc, 10]
f.launch_kernel(1, 1, 1, 10, 1, 1, 0, 0, params)
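
To make the grid and block arguments concrete, here is a CPU-only sketch of how a one-dimensional launch maps threads to array elements. It assumes the kernel computes its index as block index times block size plus thread index, which is the usual CUDA pattern but is not confirmed by this deck. `emulate_vadd` is a hypothetical name and runs no GPU code.

```ruby
# Emulates on the CPU how a 1-D launch maps threads to elements.
# grid_x and block_x mirror the first and fourth launch_kernel arguments.
def emulate_vadd(a, b, n, grid_x:, block_x:)
  c = Array.new(n, 0)
  grid_x.times do |block_idx|
    block_x.times do |thread_idx|
      i = block_idx * block_x + thread_idx  # global thread index
      c[i] = a[i] + b[i] if i < n           # threads past the end do nothing
    end
  end
  c
end

a = (0...10).to_a
b = Array.new(10, 1)
p emulate_vadd(a, b, 10, grid_x: 1, block_x: 10)
```

With 1 block of 10 threads, each thread handles exactly one of the 10 elements. A launch with more threads than elements, such as 3 blocks of 4, gives the same result because of the bounds check.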

[Figures from the CUDA C Programming Guide]

Using SGC Ruby CUDA
● Transfer results back to system memory
CUMemory.memcpy_dtoh(hd, dc, 4*10)
● Verify results
(0...10).each { |i|
  assert_equal(hc[i], hd[i])
}

Problematic CUDA Runtime API
● Designed for use inside a CUDA C/C++ program.
● Workaround
– CUDA C/C++ effectively uses C/C++ bindings.
– Create a dynamic library for the kernel programs.
– Load the library at runtime.

Current Limitations
● Supports only a limited set of data types.
– Fixnum → int
– ?? → long
– Float → float
– ?? → double
● No support for CUDA C++ templates.
● No Ruby in a kernel program.
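
The type mapping matters because a Ruby Float is a C double, so sending it as a C float narrows it. A sketch using core Ruby's `Array#pack` to show the byte layouts and the precision loss; the helper names are hypothetical and this is not how SGC Ruby CUDA converts values internally.

```ruby
# Pack Ruby Integers into the byte layout of native C ints.
def to_c_int(values)
  values.pack("i*")
end

# Pack Ruby Floats into the byte layout of native single-precision C floats.
def to_c_float(values)
  values.pack("f*")
end

# Round-trip one Ruby Float (a C double) through a C float.
def through_c_float(x)
  [x].pack("f").unpack1("f")
end

puts to_c_int([1, 2, 3]).bytesize
puts to_c_float([1.0, 2.0]).bytesize
puts through_c_float(0.5) == 0.5  # 0.5 is exactly representable as a float
puts through_c_float(0.1) == 0.1  # 0.1 is not, so the narrowed value differs
```

Results copied back from a float buffer can therefore differ slightly from the same arithmetic done with Ruby Floats on the CPU, which is worth remembering when comparing them in tests.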

To Support
● Texture memory.
● New features in CUDA 4.0
– Multi-GPU.
– Unified Virtual Addressing.
● More C data types.
● Mac platform.

Try It Now!
git clone git://github.com/xman/sgc-ruby-cuda.git
cd sgc-ruby-cuda
gem install ffi yard
rake test
rake yard

Thank You!
