
Rubyconfindia2018 - GPU accelerated libraries for Ruby

The RbCUDA and ArrayFire gems deliver outstanding performance and can handle real-world problems by crunching huge datasets. Deep learning libraries for Ruby with GPU acceleration are on their way. In this talk I show how RbCUDA and ArrayFire help you easily accelerate your code.


  1. GPU Accelerated Libraries for Ruby / Prasun Anand
  2. About me
     ● SciRuby contributor
     ● Google Summer of Code 2016, 2017
     ● GeneNetwork project
     ● Ruby Grant 2017
     ● Projects:
       ○ JRuby port of NMatrix
       ○ ArrayFire gem
       ○ RbCUDA
  3. Scientific Computing
  4. Why should Python programmers have all the fun?
  5. Arrays / Matrices
  6. BLAS and LAPACK
  7. GPU computing is not easy!
  8. CUDA and OpenCL
  9-12. Af_Array
     [1] pry(main)> a = ArrayFire::Af_Array.new 2, [2,2], [1,2,3,4]
     No Name Array
     [2 2 1 1]
     Offsets: [0 0 0 0]
     Strides: [1 2 4 4]
         1.0000     3.0000
         2.0000     4.0000
     => #<ArrayFire::Af_Array:0x000000020aeab8>
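     The [1,2,3,4] input fills the array column by column: ArrayFire arrays are column-major, which the printed strides [1 2 4 4] confirm (stride 1 moves down a column, stride 2 moves across to the next column).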
  13. [2] pry(main)> b = a + a
      No Name Array
      [2 2 1 1]
      Offsets: [0 0 0 0]
      Strides: [1 2 4 4]
          2.0000     6.0000
          4.0000     8.0000
      => #<ArrayFire::Af_Array:0x000000020625c8>
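      The + runs element-wise on the device, doubling every entry without copying data back to the host. A minimal sketch of the same pattern with other operators; that the gem overloads -, * and / like + is an assumption, only + appears in the slides:

          # Element-wise arithmetic on the device
          # (operator overloads beyond + are assumed, not shown in the talk).
          c = a - a   # all zeros
          d = a * a   # element-wise square, not a matrix product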
  14. [1] pry(main)> left = ArrayFire::Af_Array.new 2, [3,3], [1, 4, 6, 4, 11, 2, -5, 8, 10]
      No Name Array
      [3 3 1 1]
          1.0000     4.0000    -5.0000
          4.0000    11.0000     8.0000
          6.0000     2.0000    10.0000
      => #<ArrayFire::Af_Array:0x000000014e56c8>
      [2] pry(main)> right = ArrayFire::Af_Array.new 2, [3,2], [1, 0, 8, 10, -11, 8]
      No Name Array
      [3 2 1 1]
          1.0000    10.0000
          0.0000   -11.0000
          8.0000     8.0000
      => #<ArrayFire::Af_Array:0x00000001591db0>
  15. [3] pry(main)> result = ArrayFire::BLAS.matmul(left, right, :AF_MAT_NONE, :AF_MAT_NONE)
      No Name Array
      [3 2 1 1]
        -39.0000   -74.0000
         68.0000   -17.0000
         86.0000   118.0000
      => #<ArrayFire::Af_Array:0x000000016136f8>
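      The result checks out against a CPU computation. A minimal verification sketch with Ruby's standard-library Matrix class (not part of the talk); the columns are built first because Af_Array data is column-major:

          # CPU-side check of the GPU matmul, using Ruby's stdlib.
          require 'matrix'

          left  = Matrix.columns([[1, 4, 6], [4, 11, 2], [-5, 8, 10]])
          right = Matrix.columns([[1, 0, 8], [10, -11, 8]])

          left * right
          # => Matrix[[-39, -74], [68, -17], [86, 118]]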
  16. VALUE arf_init(int argc, VALUE* argv, VALUE self) {
        afstruct* afarray;
        Data_Get_Struct(self, afstruct, afarray);

        /* argv[0] = ndims, argv[1] = shape array, argv[2] = flat data array */
        dim_t ndims = (dim_t)NUM2LONG(argv[0]);
        dim_t* dimensions = (dim_t*)malloc(ndims * sizeof(dim_t));
        dim_t count = 1;
        for (size_t index = 0; index < ndims; index++) {
          dimensions[index] = (dim_t)NUM2LONG(RARRAY_AREF(argv[1], index));
          count *= dimensions[index];
        }

        /* Copy the Ruby Array into a host-side C buffer. */
        double* host_array = (double*)malloc(count * sizeof(double));
        for (size_t index = 0; index < count; index++) {
          host_array[index] = (double)NUM2DBL(RARRAY_AREF(argv[2], index));
        }

        /* ArrayFire uploads the host buffer to the device. */
        af_create_array(&afarray->carray, host_array, ndims, dimensions, f64);
        return self;
      }
  17. static VALUE arf_matmul(VALUE self, VALUE left_val, VALUE right_val,
                              VALUE left_prop_val, VALUE right_prop_val) {
        afstruct* left;
        afstruct* right;
        afstruct* result = ALLOC(afstruct);

        Data_Get_Struct(left_val, afstruct, left);
        Data_Get_Struct(right_val, afstruct, right);

        /* Map Ruby symbols such as :AF_MAT_NONE to ArrayFire's af_mat_prop enum. */
        af_mat_prop left_mat_prop  = arf_mat_type_from_rbsymbol(left_prop_val);
        af_mat_prop right_mat_prop = arf_mat_type_from_rbsymbol(right_prop_val);

        af_matmul(&result->carray, left->carray, right->carray,
                  left_mat_prop, right_mat_prop);

        /* arf_free is registered so the GC releases the device array. */
        return Data_Wrap_Struct(CLASS_OF(left_val), NULL, arf_free, result);
      }
  18. BLAS functionality
      ● matmul
      ● transpose
      LAPACK functionality
      ● det
      ● inverse
      ● norm
      ● qr
      ● cholesky
      ● svd
      ● lu
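      matmul's two property flags select transposed variants without materializing a transpose first. A minimal sketch; :AF_MAT_TRANS mirrors ArrayFire's AF_MAT_TRANS enum and is an assumption here, since only :AF_MAT_NONE appears in the slides:

          # left' * right: flag the left operand as transposed
          # (:AF_MAT_TRANS symbol is assumed, mirroring ArrayFire's C enum).
          result = ArrayFire::BLAS.matmul(left, right, :AF_MAT_TRANS, :AF_MAT_NONE)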
  19. Statistics
      ● mean
      ● median
      ● variance
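      A hypothetical sketch of how these could be called; the ArrayFire::Statistics module and method names below are illustrative assumptions based on the slide's list, not confirmed API:

          # Hypothetical API: module and method names are assumptions.
          a = ArrayFire::Af_Array.new 2, [2,2], [1, 2, 3, 4]
          ArrayFire::Statistics.mean(a)
          ArrayFire::Statistics.median(a)
          ArrayFire::Statistics.variance(a)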
  20. Benchmarks
      ● AMD FX 8350 octa-core processor
      ● Nvidia GTX 750 Ti GPU
      ● double dtype
  21. 10x faster than NMatrix-Ruby-LAPACK
  22. 10,000x faster than NMatrix-Ruby
  23. 100,000x faster than NMatrix-Ruby-BLAS
  24. RbCUDA
  25. GPU Array
      ● A generic pointer that handles an array of elements on the GPU.
      ● Memory copying from CPU to GPU and vice versa.
      ● Interfaces with NMatrix and NArray.
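      A hypothetical round-trip sketch; RbCUDA's actual allocation and copy calls are not shown in the slides, so every name below is an illustrative assumption:

          # Hypothetical API: class and method names are assumptions,
          # not confirmed RbCUDA calls.
          host = [1.0, 2.0, 3.0, 4.0]
          dev  = RbCUDA::GPUArray.new(host.size)  # allocate device memory (assumed)
          dev.copy_from(host)                     # CPU -> GPU (assumed)
          back = dev.copy_to_host                 # GPU -> CPU (assumed)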
  26. vadd_kernel_src = <<-EOS
        extern "C" {
          __global__ void matSum(int *a, int *b, int *c) {
            int tid = blockIdx.x;
            if (tid < 100) c[tid] = a[tid] + b[tid];
          }
        }
      EOS

      f = compile(vadd_kernel_src)
      RbCUDA::Driver.run_kernel(f.path)
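      Each element is summed by its own thread block: blockIdx.x supplies the index, and the tid < 100 guard keeps any extra blocks from writing past the 100-element arrays.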
  27. ● cuBLAS
      ● cuSOLVER
      ● cuRAND
  28. Benchmarks
      ● AMD FX 8350 octa-core processor
      ● Nvidia GTX 750 Ti GPU
      ● double dtype
  29. 1,000,000x faster than NMatrix-Ruby-BLAS
  30. Future Work
      ● Image-processing APIs and indexers
      ● Multiple dtypes
      ● RbCUDA is under active development.
      ● RbCUDA is funded by the Ruby Association (Ruby Grant 2017).
  31. Hobby Projects
      1. Synapse: a Ruby-first deep learning library
      2. Ninjaplot: a plotting library
  32. ● https://github.com/arrayfire/arrayfire-rb
      ● https://github.com/prasunanand/rbcuda
      ● https://github.com/prasunanand/synapse
      ● https://github.com/prasunanand/ninjaplot
      Contributions are welcome!
  33. Acknowledgements
      1. Pjotr Prins
      2. Pradeep Garigipati
      3. Kenta Murata
  34. Thanks!
      GitHub: prasunanand
      Twitter: @prasun_anand
      Blog: prasunanand.com
  35. Questions
