High Performance GPU Computing with Ruby, RubyKaigi 2018

Using the ArrayFire and RbCUDA gems for high-performance GPU computing with Ruby. RubyKaigi 2018.

  1. High Performance GPU Computing with Ruby, by Prasun Anand
  2. About me
      - Modak Analytics
      - GeneNetwork project
      - SciRuby contributor
      - Google Summer of Code 2016, 2017
      - Ruby Grant 2017
      - Fukuoka Ruby Award 2018
      - Projects: JRuby port of NMatrix, ArrayFire gem, RbCUDA
  3. Data is the new Oil!
  4. Highlights: Modak Analytics is helping implement one of the largest life sciences data platforms in the world.
      Platform details: 2,100 structured data sources, 500k tables, 1,350 unstructured sources, 1.3 billion files, 1,200 data nodes, 6 petabytes of usable information.
      - 1,000+ clinical trials being standardized to the CDISC (SDTM) model for cross-study analysis, placebo baselines, etc.
      - A single integrated data platform comprising compound, activity results, assay protocol, and project information.
      - "Like-minded" data has been grouped into data domains by business area, e.g. Clinical, Assay, Gene, Regulatory.
      - Around 17+ solutions have been developed and deployed for the business.
      Awarded at the prestigious Strata Data Conference 2017 for building this platform in record time.
  5. Governed Data Lake approach (architecture diagram): automated data ingestion ("Data Spider") from structured sources (Oracle, Postgres, SQL Server, MySQL, SAS data sets) and unstructured sources (file shares, SharePoint, Documentum) into a Foundation Layer; automated curation bots for data tagging, masking, cleansing, lineage, profiling, fingerprinting, and mapping/standardization; Integration, Solutions, and Semantic Layers (visualisation, dashboards, and reports) on top, with a metadata catalog (KOSH), auto-generated StreamSets pipelines, data governance, and data security. Modak provides end-to-end service for the platform (automated ingestion, curation, and solutions) plus 24x7 support.
  6. Genome Wide Association Studies (GWAS)
  7. Matrix Multiplication?
  8. Arrays / Matrices
  9. BLAS and LAPACK
  10. GPU Computing is not easy!
  11. CUDA and OpenCL
  12. Af_Array
      [1] pry(main)> a = ArrayFire::Af_Array.new 2, [2,2], [1,2,3,4]
      No Name Array
      [2 2 1 1]
         Offsets: [0 0 0 0]
         Strides: [1 2 4 4]
          1.0000     3.0000
          2.0000     4.0000
      => #<ArrayFire::Af_Array:0x000000020aeab8>
      (Slides 13-15 show the same Af_Array example.)
  16. [2] pry(main)> b = a + a
      No Name Array
      [2 2 1 1]
         Offsets: [0 0 0 0]
         Strides: [1 2 4 4]
          2.0000     6.0000
          4.0000     8.0000
      => #<ArrayFire::Af_Array:0x000000020625c8>
  17. [1] pry(main)> left = ArrayFire::Af_Array.new 2, [3,3], [1, 4, 6, 4, 11, 2, -5, 8, 10]
      No Name Array
      [3 3 1 1]
          1.0000     4.0000    -5.0000
          4.0000    11.0000     8.0000
          6.0000     2.0000    10.0000
      => #<ArrayFire::Af_Array:0x000000014e56c8>
      [2] pry(main)> right = ArrayFire::Af_Array.new 2, [3,2], [1, 0, 8, 10, -11, 8]
      No Name Array
      [3 2 1 1]
          1.0000    10.0000
          0.0000   -11.0000
          8.0000     8.0000
      => #<ArrayFire::Af_Array:0x00000001591db0>
  18. [3] pry(main)> result = ArrayFire::BLAS.matmul(left, right, :AF_MAT_NONE, :AF_MAT_NONE)
      No Name Array
      [3 2 1 1]
        -39.0000   -74.0000
         68.0000   -17.0000
         86.0000   118.0000
      => #<ArrayFire::Af_Array:0x000000016136f8>
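      The flat arrays passed to Af_Array.new fill each matrix column by column, as the printed output above shows, so the result can be sanity-checked with plain Ruby arithmetic. A minimal check of the first entry:

          # Row 0 of `left` is [1, 4, -5]; column 0 of `right` is [1, 0, 8]
          # (column-major fill of the flat input arrays above).
          left_row0  = [1, 4, -5]
          right_col0 = [1, 0, 8]
          entry = left_row0.zip(right_col0).sum { |l, r| l * r }
          puts entry   # => -39, matching the first entry of the printed result above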
  19. VALUE arf_init(int argc, VALUE* argv, VALUE self)
      {
        afstruct* afarray;
        Data_Get_Struct(self, afstruct, afarray);

        /* argv[0]: number of dimensions, argv[1]: Ruby array of dimension
           sizes, argv[2]: flat Ruby array of element values. */
        dim_t ndims = (dim_t)NUM2LONG(argv[0]);
        dim_t* dimensions = (dim_t*)malloc(ndims * sizeof(dim_t));
        dim_t count = 1;
        for (size_t index = 0; index < ndims; index++) {
          dimensions[index] = (dim_t)NUM2LONG(RARRAY_AREF(argv[1], index));
          count *= dimensions[index];
        }

        /* Copy the Ruby values into a host buffer of doubles. */
        double* host_array = (double*)malloc(count * sizeof(double));
        for (size_t index = 0; index < count; index++) {
          host_array[index] = (double)NUM2DBL(RARRAY_AREF(argv[2], index));
        }

        /* Hand the host buffer to ArrayFire, which creates the device array. */
        af_create_array(&afarray->carray, host_array, ndims, dimensions, f64);

        return self;
      }
  20. static VALUE arf_matmul(VALUE self, VALUE left_val, VALUE right_val,
                              VALUE left_prop_val, VALUE right_prop_val)
      {
        afstruct* left;
        afstruct* right;
        afstruct* result = ALLOC(afstruct);

        Data_Get_Struct(left_val, afstruct, left);
        Data_Get_Struct(right_val, afstruct, right);

        /* Map the Ruby symbols (e.g. :AF_MAT_NONE) to ArrayFire's
           af_mat_prop enum values. */
        af_mat_prop left_mat_prop  = arf_mat_type_from_rbsymbol(left_prop_val);
        af_mat_prop right_mat_prop = arf_mat_type_from_rbsymbol(right_prop_val);

        af_matmul(&result->carray, left->carray, right->carray,
                  left_mat_prop, right_mat_prop);

        /* Wrap the result struct in a new Ruby object of the same class,
           released by arf_free when it is garbage collected. */
        return Data_Wrap_Struct(CLASS_OF(left_val), NULL, arf_free, result);
      }
  21. BLAS functionality: matmul, transpose
      LAPACK functionality: det, inverse, norm, QR, Cholesky, SVD, LU
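      A minimal sketch of how these routines could be called from Ruby. The module and method names below (ArrayFire::BLAS.transpose, ArrayFire::LAPACK.det, ArrayFire::LAPACK.inverse) are assumed to mirror the ArrayFire::BLAS.matmul call on slide 18; they are illustrative, not confirmed API, so check the gem's documentation for the exact names:

          require 'arrayfire'   # require name assumed

          a = ArrayFire::Af_Array.new 2, [3,3], [1, 4, 6, 4, 11, 2, -5, 8, 10]

          # Hypothetical wrapper calls -- names are illustrative only.
          at  = ArrayFire::BLAS.transpose(a)    # BLAS transpose
          d   = ArrayFire::LAPACK.det(a)        # LAPACK determinant
          inv = ArrayFire::LAPACK.inverse(a)    # LAPACK inverse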
  22. Statistics: mean, median, variance
  23. Benchmarks: AMD FX 8350 octa-core CPU, Nvidia GTX 750 Ti GPU, double dtype
  24. 10x faster than NMatrix-Ruby-Lapack
  25. 10,000x faster than NMatrix-Ruby
  26. 100,000x faster than NMatrix-Ruby-BLAS
  27. RbCUDA
  28. GPU Array
      - A generic pointer used to handle an array of elements on the GPU.
      - Memory copying from CPU to GPU and vice versa.
      - Interfaced with NMatrix and NArray.
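      A minimal sketch of that round trip, with hypothetical names: the class and methods below (RbCUDA::GPUArray, #to_host) are illustrative placeholders, not the gem's confirmed API, since RbCUDA is still under development:

          require 'rbcuda'   # require name assumed

          host = [1.0, 2.0, 3.0, 4.0]
          dev  = RbCUDA::GPUArray.new(host)   # hypothetical: copy CPU -> GPU
          # ... run a kernel or a CuBLAS call against dev here ...
          back = dev.to_host                  # hypothetical: copy GPU -> CPU
          p back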
  29. vadd_kernel_src = <<-EOS
        extern "C" {
          __global__ void matSum(int *a, int *b, int *c)
          {
            int tid = blockIdx.x;
            if (tid < 100)
              c[tid] = a[tid] + b[tid];
          }
        }
      EOS
      f = compile(vadd_kernel_src)
      RbCUDA::Driver.run_kernel(f.path)
  30. CuBLAS, CuSolver, CuRand
  31. Benchmarks: AMD FX 8350 octa-core CPU, Nvidia GTX 750 Ti GPU, double dtype
  32. 1,000,000x faster than NMatrix-Ruby-BLAS
  33. Fastest Matrix Multiplication Library in Ruby!
  34. Future Work: image processing APIs and indexers, multiple dtypes; RbCUDA is under development.
  35. Contributions are welcome!
      - https://github.com/arrayfire/arrayfire-rb
      - https://github.com/prasunanand/rbcuda
  36. Acknowledgements: Pjotr Prins, Pradeep Garigipati, Kenta Murata, Ruby Science Foundation, Ruby Association, Modak Analytics
  37. Thanks! GitHub: prasunanand, Twitter: @prasun_anand, Blog: prasunanand.com
  38. Questions
