GPU Computing with Ruby




Presented at Pecha Kucha SG, a follow-up party to RedDotRubyConf 2011.




GPU Computing with Ruby: Presentation Transcript

  • GPU Computing with Ruby
    SpeedGo Computing
    Chung Shin Yee
  • CPU vs GPU Architecture
    6 Core vs 1024 Core
    6 GB/s vs 300 GB/s Memory Bandwidth
    By CUDA C Programming Guide
  • CUDA Programming Model
    By CUDA C Programming Guide
  • Existing Programming Tools
    ● Cg
    ● BrookGPU
    ● GLSL (OpenGL Shading Language)
    ● Nvidia CUDA C/C++
    ● OpenCL
    ● PyCUDA
    Where is the Red Ruby?
  • Bridging Ruby & CUDA C/C++
    ● Ruby C extension
      – Hard to manipulate Ruby objects in C.
      – Compilation problems.
    ● Ruby FFI
      – Bridging purely in Ruby.
      – Support multiple Ruby implementations.
  • Ruby Bridge Sample
  • Developing SGC Ruby CUDA
    ● Object-oriented API.
    ● Start with crucial operations.
      – Memory allocation.
      – Memory transfer.
      – Kernel launch.
      – Wrapper for structures.
    ● Documented with YARD.
  • Driver vs Runtime API
    ● CUDA Driver API
      – For system developers.
      – Supported by PyCUDA.
    ● CUDA Runtime API
      – For computation-centric developers.
    We are going to support both APIs!
  • Using SGC Ruby CUDA
    ● Kernel program in CUDA C.
  • Using SGC Ruby CUDA
    ● Compiling kernel into PTX.
      – nvcc --ptx
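The kernel source itself did not survive in this transcript. As a rough guide to what the later slides expect from it, here is a plain-Ruby CPU reference, assuming the `vadd` kernel performs element-wise addition (suggested by the verification step, where `hc[i] = ha[i] + hb[i]`). The name `vadd_reference` is hypothetical and is not part of SGC Ruby CUDA.

```ruby
# CPU reference for the ASSUMED behaviour of the vadd kernel:
# c[i] = a[i] + b[i] for every index below n.
# vadd_reference is a hypothetical name used only for illustration.
def vadd_reference(a, b, n)
  raise ArgumentError, "inputs shorter than n" if a.size < n || b.size < n
  Array.new(n) { |i| a[i] + b[i] }
end

ha = (0...10).to_a        # 0, 1, ..., 9, matching the initialization slide
hb = Array.new(10, 1)     # all ones
hc = vadd_reference(ha, hb, 10)
puts hc.inspect
```

A reference like this is useful on the host side for checking whatever the GPU hands back, without needing a device at all.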
  • Using SGC Ruby CUDA
    ● Setup
        require 'rubycu'
        include SGC::CU
        CUInit.init
        d = CUDevice.get(0)
        c = CUContext.create(d)
        m ="vadd.ptx")
        f = m.function("vadd")
  • Using SGC Ruby CUDA
    ● Memory allocations
        da = CUDevice.malloc(10*4)
        db = CUDevice.malloc(10*4)
        dc = CUDevice.malloc(10*4)
        ha =, 10)
        hb =, 10)
        hc =, 10)
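The `10*4` in the allocations above is the byte count for ten 32-bit ints. A small plain-Ruby sketch of that arithmetic, using `Array#pack` to measure the element width; `INT_BYTES` and `buffer_bytes` are illustrative names, not SGC Ruby CUDA API.

```ruby
# Size in bytes of one 32-bit signed int, measured by packing a single value.
# The "l" directive is always 32 bits wide.
INT_BYTES = [0].pack("l").bytesize

# Bytes needed for a buffer holding `count` 32-bit ints.
# buffer_bytes is a hypothetical helper for illustration only.
def buffer_bytes(count)
  count * INT_BYTES
end

puts INT_BYTES
puts buffer_bytes(10)
```

Deriving the size from the element width, rather than hard-coding `4`, keeps the allocation and the later `memcpy` calls in agreement if the element type changes.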
  • Using SGC Ruby CUDA
    ● Initialization
        (0...10).each { |i|
          ha[i] = i
          hb[i] = 1
          hc[i] = ha[i] + hb[i]
          hd[i] = 0
        }
  • Using SGC Ruby CUDA
    ● Transfer inputs to the GPU
        CUMemory.memcpy_htod(da, ha, 4*10)
        CUMemory.memcpy_htod(db, hb, 4*10)
        CUMemory.memcpy_htod(dc, hc, 4*10)
  • Using SGC Ruby CUDA
    ● Launch kernel on GPU
        # Launch with 1x1x1 grid,
        # 10x1x1 blocks,
        params = [da, db, dc, 10]
        f.launch_kernel(1, 1, 1, 10, 1, 1, 0, 0, params)
    By CUDA C Programming Guide
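The launch above uses a 1x1x1 grid whose single block has 10x1x1 threads, so ten threads run in total. As a sketch of how such a launch maps threads to array elements, the plain-Ruby snippet below enumerates the standard CUDA one-dimensional global index, `blockIdx.x * blockDim.x + threadIdx.x`. The helper `global_thread_ids` is hypothetical and only simulates the indexing on the CPU.

```ruby
# Enumerate the global x-index of every thread in a 1-D launch,
# using the usual CUDA formula: blockIdx.x * blockDim.x + threadIdx.x.
# global_thread_ids is a hypothetical helper for illustration only.
def global_thread_ids(grid_dim_x, block_dim_x)
  (0...grid_dim_x).flat_map do |block_idx|
    (0...block_dim_x).map { |thread_idx| block_idx * block_dim_x + thread_idx }
  end
end

ids = global_thread_ids(1, 10)   # 1 block of 10 threads, as on the slide
puts ids.inspect
```

With one block of ten threads, each thread index lines up with exactly one of the ten array elements, which is why the element count 10 is also passed as the last kernel parameter.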
  • Using SGC Ruby CUDA
    ● Transfer results back to system memory
        CUMemory.memcpy_dtoh(hd, dc, 4*10)
    ● Verify results
        (0...10).each { |i|
          assert_equal(hc[i], hd[i])
        }
  • Problematic CUDA Runtime API
    ● For use in a CUDA C/C++ program.
    ● Workaround
      – CUDA C/C++ effectively uses C/C++ bindings.
      – Create dynamic library for the kernel programs.
      – Load the library at runtime.
  • Current Limitations
    ● Support limited data types.
      – Fixnum → int
      – ?? → long
      – Float → float
      – ?? → double
    ● No support for CUDA C++ templates.
    ● No Ruby in a kernel program.
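One practical consequence of the `Float → float` mapping: Ruby's `Float` is double precision, so values are narrowed to 32 bits on the way to the GPU. The plain-Ruby sketch below imitates that narrowing with `Array#pack`; `to_float32` is a hypothetical helper, and how SGC Ruby CUDA performs the conversion internally is not stated in the slides.

```ruby
# Round-trip a Ruby Float (double precision) through a 32-bit float,
# which is what a Float -> float mapping implies.
# to_float32 is a hypothetical helper, not part of SGC Ruby CUDA.
def to_float32(x)
  [x].pack("f").unpack1("f")
end

x = 0.1
y = to_float32(x)
puts(x == y)                  # false: 0.1 changes when narrowed to 32 bits
puts((x - y).abs < 1e-7)      # true: the error is small but nonzero
puts(to_float32(0.5) == 0.5)  # true: 0.5 is exact in both formats
```

When verifying GPU results against values computed in Ruby, comparing within a tolerance is therefore safer than exact equality for floating-point data.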
  • To Support
    ● Texture memory.
    ● New features in CUDA 4.0
      – Multi-GPU.
      – Unified Virtual Memory.
    ● More C data types.
    ● Mac platform.
  • Try It Now!
        git clone git:// sgc-ruby-cuda
        gem install ffi yard
        rake test
        rake yard
    Thank You ~