GPU Computing with Ruby



      SpeedGo Computing

         Chung Shin Yee
 shinyee@speedgocomputing.com
CPU vs GPU Architecture
        6 cores vs 1024 cores
6 GB/s vs 300 GB/s memory bandwidth

       (Figure: CPU vs GPU architecture, from the CUDA C Programming Guide)
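A back-of-the-envelope reading of the bandwidth figures above, assuming a purely bandwidth-bound operation that streams 1 GB once and ignoring latency and caches. The helper name `stream_seconds` is illustrative only:

```ruby
# Seconds needed to stream `bytes` once at `gb_per_s` gigabytes per second,
# assuming the operation is limited only by memory bandwidth.
def stream_seconds(bytes, gb_per_s)
  bytes / (gb_per_s * 1e9)
end

bytes = 1e9 # 1 GB of data (decimal gigabyte, matching GB/s)

cpu = stream_seconds(bytes, 6)   # CPU figure from the slide
gpu = stream_seconds(bytes, 300) # GPU figure from the slide

puts format("CPU: %.4f s, GPU: %.4f s, ratio: %.0fx", cpu, gpu, cpu / gpu)
# prints "CPU: 0.1667 s, GPU: 0.0033 s, ratio: 50x"
```

The ratio is simply 300 / 6; real kernels rarely reach peak bandwidth on either side.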
CUDA Programming Model

        (Figure from the CUDA C Programming Guide)
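The core idea of the model, a grid of thread blocks in which each thread computes its own element index, can be emulated sequentially in Ruby. This sketch only illustrates the 1-D indexing formula `blockIdx.x * blockDim.x + threadIdx.x` used in CUDA C; nothing runs on a GPU, and the function name is hypothetical:

```ruby
# Sequential emulation of how a 1-D CUDA launch maps (block, thread)
# pairs to a global element index. On a GPU these threads run in
# parallel; here they are simply enumerated in order.
def global_indices(grid_dim, block_dim)
  indices = []
  grid_dim.times do |block_idx|     # plays the role of blockIdx.x
    block_dim.times do |thread_idx| # plays the role of threadIdx.x
      indices << block_idx * block_dim + thread_idx
    end
  end
  indices
end

p global_indices(2, 4) # => [0, 1, 2, 3, 4, 5, 6, 7]
```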
Existing Programming Tools
●   Cg
●   BrookGPU
●   GLSL (OpenGL Shading Language)
●   Nvidia CUDA C/C++
●   OpenCL
●   PyCUDA
Where is the Red Ruby?
Bridging Ruby & CUDA C/C++
●   Ruby C extension
       –   Hard to manipulate Ruby objects in C.
       –   Compilation problems.
●   Ruby FFI
       –   Bridging purely in Ruby.
       –   Supports multiple Ruby implementations.
Ruby Bridge Sample
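The sample shown on this slide is not recoverable from this copy. As a stand-in, here is a minimal sketch of bridging to C purely in Ruby. It uses Fiddle from Ruby's standard library rather than the `ffi` gem named above, so it runs without extra gems, but the idea is the same: declare a C signature in Ruby, then call it. Calling libc's `abs` assumes a platform (such as Linux or macOS) where libc is already loaded into the Ruby process; the module name `LibC` is arbitrary.

```ruby
require 'fiddle'
require 'fiddle/import'

# Declare a C function from Ruby and call it, with no C extension to compile.
module LibC
  extend Fiddle::Importer
  # Search symbols already loaded into the running process (libc included).
  dlload Fiddle::Handle::DEFAULT
  # Parse the C prototype and define LibC.abs as a Ruby module function.
  extern 'int abs(int)'
end

puts LibC.abs(-42) # prints 42
```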
Developing SGC Ruby CUDA
●   Object-oriented API.
●   Start with crucial operations.
       –   Memory allocation.
       –   Memory transfer.
       –   Kernel launch.
       –   Wrapper for structures.
●   Documented with YARD.
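To illustrate the "wrapper for structures" and YARD points together, here is a hypothetical Ruby class (not part of SGC Ruby CUDA) that wraps a C struct laid out like CUDA's `dim3` (three unsigned ints) and packs it into the 12 bytes a C function would expect. The `@param`, `@return` and `@raise` comments are ordinary YARD tags.

```ruby
# A hypothetical Ruby wrapper for a C struct laid out like CUDA's dim3:
#   struct dim3 { unsigned int x, y, z; };
class Dim3
  # @return [Integer] extent in each dimension
  attr_reader :x, :y, :z

  # @param x [Integer] extent along x (must be >= 1)
  # @param y [Integer] extent along y, defaults to 1
  # @param z [Integer] extent along z, defaults to 1
  # @raise [ArgumentError] if any extent is less than 1
  def initialize(x, y = 1, z = 1)
    raise ArgumentError, 'extents must be >= 1' if [x, y, z].any? { |v| v < 1 }
    @x, @y, @z = x, y, z
  end

  # Pack into native-endian 32-bit unsigned ints, as the C struct stores them.
  # @return [String] a 12-byte binary string
  def to_bytes
    [x, y, z].pack('L3')
  end

  # @return [Integer] total number of elements (x * y * z)
  def volume
    x * y * z
  end
end

block = Dim3.new(10)
p block.to_bytes.bytesize # => 12
p block.volume            # => 10
```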
Driver vs Runtime API
●   CUDA Driver API
      –   For system developers.
      –   Supported by PyCUDA.
●   CUDA Runtime API
      –   For computation-centric developers.


          We are going to support both APIs!
Using SGC Ruby CUDA
●   Kernel program in CUDA C.
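The kernel source is not reproduced in this copy. Judging from the launch parameters used later (`[da, db, dc, 10]`) and the expected result `hc[i] = ha[i] + hb[i]`, it is presumably an element-wise vector add with a length argument. The pure-Ruby emulation below of that assumed per-thread logic, including the usual `i < n` bounds guard, can double as a CPU reference; the names `vadd_thread` and `vadd_launch` and the argument order are assumptions.

```ruby
# CPU emulation of the assumed per-thread body of the vadd kernel:
#   int i = blockIdx.x * blockDim.x + threadIdx.x;
#   if (i < n) c[i] = a[i] + b[i];
def vadd_thread(a, b, c, n, block_idx, block_dim, thread_idx)
  i = block_idx * block_dim + thread_idx
  c[i] = a[i] + b[i] if i < n # guard: surplus threads do nothing
end

# Run every (block, thread) pair of a launch sequentially.
def vadd_launch(a, b, c, n, grid_dim, block_dim)
  grid_dim.times do |bi|
    block_dim.times { |ti| vadd_thread(a, b, c, n, bi, block_dim, ti) }
  end
  c
end

a = (0...10).to_a
b = Array.new(10, 1)
p vadd_launch(a, b, Array.new(10, 0), 10, 1, 10) # => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```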
Using SGC Ruby CUDA
●   Compiling kernel into PTX.
       –   nvcc --ptx vadd.cu
Using SGC Ruby CUDA
●   Setup
        require 'rubycu'
        include SGC::CU
        CUInit.init
        d = CUDevice.get(0)
        c = CUContext.create(d)
        m = CUModule.new.load("vadd.ptx")
        f = m.function("vadd")
Using SGC Ruby CUDA
●   Memory allocations
        da = CUDevice.malloc(10*4)
        db = CUDevice.malloc(10*4)
        dc = CUDevice.malloc(10*4)
        ha = Buffer.new(:int, 10)
        hb = Buffer.new(:int, 10)
        hc = Buffer.new(:int, 10)
        hd = Buffer.new(:int, 10)
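The `10*4` in each device allocation is the element count times `sizeof(int)`, which is 4 bytes on the platforms CUDA targets. A small pure-Ruby check of that arithmetic, using `Array#pack` with the 32-bit `'l'` directive as a stand-in for a C `int`; the helper `nbytes` is hypothetical and not part of SGC Ruby CUDA:

```ruby
# Size in bytes of one C int, measured by packing a single 32-bit value.
INT_SIZE = [0].pack('l').bytesize # 'l' is always a 32-bit signed integer

# Hypothetical helper: bytes needed for `count` elements of `elem_size` bytes.
def nbytes(count, elem_size = INT_SIZE)
  count * elem_size
end

host = (0...10).to_a.pack('l*') # 10 ints laid out contiguously, as in C
p INT_SIZE      # => 4
p nbytes(10)    # => 40, the 10*4 used above
p host.bytesize # => 40
```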
Using SGC Ruby CUDA
●   Initialization
         (0...10).each { |i|
                ha[i] = i
                hb[i] = 1
                hc[i] = ha[i] + hb[i]
                hd[i] = 0
         }
Using SGC Ruby CUDA
●   Transfer inputs to the GPU
        CUMemory.memcpy_htod(da, ha, 4*10)
        CUMemory.memcpy_htod(db, hb, 4*10)
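        # dc receives the kernel's output, so it is not uploaded here;
        # copying hc (the expected sums) into it would make the check pass trivially.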
Using SGC Ruby CUDA
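●   Launch kernel on GPU
            # Launch a 1x1x1 grid of one 10x1x1 thread block (10 threads).
            # The trailing 0, 0 presumably select no dynamic shared memory
            # and the default stream, as in cuLaunchKernel.
            params = [da, db, dc, 10]
            f.launch_kernel(1, 1, 1, 10, 1, 1, 0, 0, params)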

    (Figures from the CUDA C Programming Guide)
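The launch above uses a single block of 10 threads, which only fits small inputs. A common way to size a 1-D launch for an arbitrary `n` is to fix a block size and round the grid size up, relying on the kernel's bounds check to idle the leftover threads. A sketch of that arithmetic in plain Ruby; the block size of 256 and the name `launch_config` are illustrative choices, not values from the slides:

```ruby
# Choose a 1-D launch configuration that covers n elements.
# Returns [grid_dim_x, block_dim_x].
def launch_config(n, block_dim = 256)
  raise ArgumentError, 'n must be positive' unless n.positive?
  grid_dim = (n + block_dim - 1) / block_dim # integer ceiling division
  [grid_dim, block_dim]
end

p launch_config(10, 10) # => [1, 10], the 1x1x1 grid of 10 threads used above
p launch_config(1000)   # => [4, 256], 1024 threads; 24 are idled by i < n
```

With SGC Ruby CUDA the two values would presumably go into the grid and block x arguments of `launch_kernel`.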
Using SGC Ruby CUDA
●   Transfer results back to system memory
         CUMemory.memcpy_dtoh(hd, dc, 4*10)
●   Verify results
         (0...10).each { |i|
               assert_equal(hc[i], hd[i])
         }
Problematic CUDA Runtime API
●   Designed for use from within a CUDA C/C++ program.
●   Workaround
       –   CUDA C/C++ kernels can be wrapped in host functions
            with plain C bindings.
       –   Create a dynamic library for the kernel
            programs.
       –   Load the library at runtime.
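The last two workaround steps (build the kernels into a dynamic library, then load it at runtime) can be sketched with Fiddle from Ruby's standard library. A real setup would open a library built with nvcc that exports plain C functions; since none exists here, this sketch looks up libc's `abs` in the running process instead. The library name in the comment is hypothetical.

```ruby
require 'fiddle'

# In the workaround this would be something like
#   Fiddle.dlopen('./libvadd.so') # hypothetical nvcc-built library
# Here we open the running process itself (nil), whose loaded libc exports abs.
lib = Fiddle.dlopen(nil)

# Look the symbol up by name at runtime and describe its C signature.
abs_fn = Fiddle::Function.new(lib['abs'], [Fiddle::TYPE_INT], Fiddle::TYPE_INT)

puts abs_fn.call(-5) # prints 5
```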
Current Limitations
●   Supports only a limited set of data types.
       –   Fixnum   → int
       –   ??       → long
       –   Float    → float
       –   ??       → double
●   No support for CUDA C++ templates.
●   Kernels cannot be written in Ruby.
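The mapping matters because a Ruby `Float` is a 64-bit double, so passing one as a C `float` drops precision. A small pure-Ruby illustration using `Array#pack`, where `'l'`, `'q'`, `'f'` and `'d'` are the 32-bit integer, 64-bit integer, single-precision and double-precision directives. Treating `'q'` as a C `long` assumes an LP64 platform such as 64-bit Linux.

```ruby
# Byte widths of the C types in the mapping, measured via Array#pack.
widths = {
  int:    [0].pack('l').bytesize,   # 32-bit signed integer
  long:   [0].pack('q').bytesize,   # 64-bit signed integer (LP64 long)
  float:  [0.0].pack('f').bytesize, # IEEE single precision
  double: [0.0].pack('d').bytesize  # IEEE double precision
}
p widths.values # => [4, 8, 4, 8]

# Round-tripping a Ruby Float through a C float loses precision...
as_float = [0.1].pack('f').unpack1('f')
# ...while a C double preserves it exactly.
as_double = [0.1].pack('d').unpack1('d')

p as_float == 0.1  # => false
p as_double == 0.1 # => true
```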
Planned Support
●   Texture memory.
●   New features in CUDA 4.0
       –   Multi-GPU.
       –   Unified Virtual Addressing (UVA).
●   More C data types.
●   Mac platform.
Try It Now! Thank You ~
git clone https://github.com/xman/sgc-ruby-cuda.git
cd sgc-ruby-cuda
gem install ffi yard
rake test
rake yard
