GPU computing on Ruby and Scientific computing on JRuby.

- 2. Objective ● Scientific libraries are memory-intensive, and speed counts. How can JRuby be used effectively to create a great tool/gem? ● A general-purpose GPU library for Ruby that can be used by industry in production and by academia for research.
- 3. ● Ruby Science Foundation ● SciRuby has been trying to push Ruby for scientific computing. ● Popular Rubygems: 1. NMatrix 2. Daru 3. Mixed_models
- 4. NMatrix NMatrix is SciRuby’s numerical matrix core, implementing dense matrices as well as two types of sparse (linked-list-based and Yale/CSR). It currently relies on ATLAS/CBLAS/CLAPACK and standard LAPACK for several of its linear algebra operations.
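The dense-versus-sparse distinction above can be illustrated with a tiny plain-Ruby sketch of the Yale/CSR idea: store only the non-zero values plus column indices and row pointers. The helper `to_csr` is illustrative only; NMatrix's actual storage types are implemented in C.

```ruby
# Illustrative CSR (Yale) layout: values, their column indices, and a
# row-pointer array marking where each row starts in `values`.
def to_csr(dense)
  values, col_idx, row_ptr = [], [], [0]
  dense.each do |row|
    row.each_with_index do |v, j|
      next if v.zero?
      values << v
      col_idx << j
    end
    row_ptr << values.length
  end
  [values, col_idx, row_ptr]
end

to_csr([[5, 0, 0], [0, 0, 3]])
# => [[5, 3], [0, 2], [0, 1, 2]]
```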
- 6. Daru
- 7. Mixed_models
- 8. Nyaplot
- 9. Why nya?
- 11. Contributors wanted ● IRC: #sciruby ● Slack channel: #sciruby ● Google group: #sciruby
- 12. Known for performance ● JRuby is about 10 times faster than CRuby. ● With Truffle, it is around 40 times faster than CRuby.
- 13. Say hello
- 14. NMatrix for JRuby ● MDArray is a great gem for linear algebra on JRuby, but it is not a unified interface for SciRuby gems. ● Every gem that uses NMatrix as a dependency would need to be reimplemented on top of MDArray. ● Hence the effort to port and optimise NMatrix itself.
- 15. NMatrix for JRuby ● Parallelism => no Global Interpreter Lock, unlike MRI ● Easy deployment (Warbler gem)
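A minimal sketch of the parallelism point: on JRuby, Ruby threads map to JVM threads, so CPU-bound work can run on multiple cores at once, while on MRI the Global Interpreter Lock would serialise it. Plain Ruby; it runs anywhere, but only JRuby executes the threads truly in parallel.

```ruby
# CPU-bound work split across four threads. On JRuby these run on
# separate cores; on MRI the GIL serialises them.
def cpu_work(n)
  acc = 0
  n.times { |i| acc += i * i }
  acc
end

threads = 4.times.map { Thread.new { cpu_work(1_000_000) } }
results = threads.map(&:value)  # four identical sums, computed concurrently
results.uniq.length             # => 1
```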
- 16. How NMatrix works ● N-Dimensional ● 2-Dimensional NMatrix
- 17. N-dimensional NMatrix N-dimensional matrices are stored as a one-dimensional Array.
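The flattening above follows the usual row-major scheme: an N-dimensional coordinate maps to a single offset in the backing array. A plain-Ruby sketch (`flat_index` is an illustrative helper, not NMatrix code):

```ruby
# Row-major flattening: coordinates [i, j, k, ...] in a matrix of the
# given shape map to one offset in the 1-D backing array.
def flat_index(shape, coords)
  coords.zip(shape).reduce(0) { |acc, (c, dim)| acc * dim + c }
end

shape = [2, 3, 4]              # a 2x3x4 matrix => 24 backing slots
flat_index(shape, [0, 0, 0])   # => 0
flat_index(shape, [1, 2, 3])   # => 1*12 + 2*4 + 3 = 23
```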
- 19. Elementwise operations ● Iterate through the elements ● Access the backing array, apply the operation, and return the result ● [:add, :subtract, :sin, :gamma]
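The scheme above can be sketched in plain Ruby: walk the 1-D backing store, apply the operation element by element, and return a new store. Illustrative helpers only, not the actual NMatrix-JRuby implementation.

```ruby
# Binary elementwise op (:add-style): combine two backing stores.
def elementwise_binary(a, b, op)
  a.each_index.map { |i| a[i].public_send(op, b[i]) }
end

# Unary elementwise op (:sin, :gamma, ...): map a Math function over the store.
def elementwise_unary(a, op)
  a.map { |x| Math.send(op, x) }
end

elementwise_binary([1.0, 2.0], [3.0, 4.0], :+)  # => [4.0, 6.0]
elementwise_unary([0.0, Math::PI / 2], :sin)
```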
- 20. Determinants and Factorization ● Two-dimensional matrix operations ● In NMatrix-MRI, BLAS level-3 and LAPACK routines are implemented using their respective libraries ● NMatrix-JRuby depends on Java functions.
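As a plain-Ruby sketch of one such two-dimensional routine: a determinant via Gaussian elimination with partial pivoting. Illustrative only; NMatrix-JRuby delegates this kind of work to Java code, and NMatrix-MRI to LAPACK.

```ruby
# Determinant via Gaussian elimination with partial pivoting.
def determinant(m)
  a = m.map(&:dup)           # work on a copy
  n = a.length
  det = 1.0
  n.times do |k|
    pivot = (k...n).max_by { |r| a[r][k].abs }
    return 0.0 if a[pivot][k].zero?
    if pivot != k            # a row swap flips the sign
      a[k], a[pivot] = a[pivot], a[k]
      det = -det
    end
    det *= a[k][k]
    ((k + 1)...n).each do |i|
      f = a[i][k] / a[k][k]
      (k...n).each { |j| a[i][j] -= f * a[k][j] }
    end
  end
  det
end

determinant([[4.0, 3.0], [6.0, 3.0]])  # ≈ -6.0
```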
- 21. Mixed models ● After NMatrix for doubles was ready, I tested it with mixed_models.
- 22. Challenges ● Autoboxing and multiple data types ● Minimising copying of data ● Handling large arrays
- 23. Autoboxing ● :float64 => double only ● Strict dtypes => create the data type in Java instead of guessing it ● Errors => that can’t be reproduced :P [0.11, 0.05, 0.34, 0.14] + [0.21, 0.05, 0.14, 0.14] = [0, 0, 0, 0] ([0.11, 0.05, 0.34, 0.14] + 5) + ([0.21, 0.05, 0.14, 0.14] + 5) - 10 = [0.32, 0.1, 0.48, 0.28]
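The zero-result bug above can be mimicked in plain Ruby: if the backing storage is guessed as an integer type, small float sums truncate to zero, while strict double storage preserves them. A sketch of the failure mode, not the actual NMatrix code path.

```ruby
a = [0.11, 0.05, 0.34, 0.14]
b = [0.21, 0.05, 0.14, 0.14]

# Wrong dtype guess: values truncate to integers on store.
guessed = a.zip(b).map { |x, y| (x + y).to_i }      # => [0, 0, 0, 0]

# Strict :float64-style storage keeps the fractional parts.
strict  = a.zip(b).map { |x, y| (x + y).round(2) }  # => [0.32, 0.1, 0.48, 0.28]
```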
- 24. Minimise copying of data ● Avoid unnecessary copies when data moves between Ruby and Java.
- 25. Handling large arrays ● Array size ● Accessing elements ● Chaining to Java methods ● Speed and memory required
- 26. Ruby Code

```ruby
require 'benchmark'
require 'java'

b = Java::double[15_000, 15_000].new
c = Java::double[15_000, 15_000].new

# Fill a 15,000 x 15,000 Java primitive array from the Ruby side:
index = 0
puts Benchmark.measure {
  (0...15_000).each do |i|
    (0...15_000).each do |j|
      b[i][j] = index
      index += 1
    end
  end
}
#  43.260000   3.250000  46.510000 ( 39.606356)

# Copy it element by element from the Ruby side:
puts Benchmark.measure {
  (0...15_000).each do |i|
    (0...15_000).each do |j|
      c[i][j] = b[i][j]
    end
  end
}
#  67.790000   0.070000  67.860000 ( 65.126546)
# RAM consumed => 5.4GB
```
- 28. Java Code

```java
public class MatrixGenerator {
  static final int row = 15_000, col = 15_000;
  static double[][] b = new double[row][col];
  static double[][] c = new double[row][col];

  // Fill the array on the Java side:
  public static void test1() {
    for (int index = 0, i = 0; i < row; i++) {
      for (int j = 0; j < col; j++) {
        b[i][j] = index;
        index++;
      }
    }
  }

  // Copy it on the Java side:
  public static void test2() {
    for (int i = 0; i < row; i++) {
      for (int j = 0; j < col; j++) {
        c[i][j] = b[i][j];
      }
    }
  }
}
```

```ruby
puts Benchmark.measure { MatrixGenerator.test1 }
#  0.032000   0.001000   0.032000 ( 0.031000)

puts Benchmark.measure { MatrixGenerator.test2 }
#  0.034000   0.001000   0.034000 ( 0.033000)
# RAM consumed => 300MB
```
- 29. Results Improvements: ● about 1000 times the speed ● about 10 times less memory
- 31. System Specifications ● CPU: AMD FX-8350 Octa-core 4.2GHz ● RAM: 16GB
- 32. Addition
- 33. Subtraction
- 34. Gamma
- 36. Determinant
- 37. Factorization
- 38. Benchmark conclusions ● NMatrix-JRuby is dramatically faster for N-dimensional matrices where elementwise operations are concerned. ● NMatrix-MRI is faster for two-dimensional matrices when calculating matrix multiplication, determinants, and factorizations.
- 39. Improvements ● Make NMatrix-JRuby faster than NMatrix-MRI using BLAS level-3 and LAPACK routines. ● How? ● Why not JBlas?
- 40. Future Work ● Add support for complex dtype. ● Convert NMatrix-JRuby Enumerators to Java code. ● Add sparse support.
- 41. Am I done?
- 42. Nope!
- 46. Enter GPU
- 47. A General-Purpose GPU library ● Combine the beauty of Ruby with transparent GPU processing. ● Works both on client computers and on servers that use NVIDIA Tesla and Intel Xeon Phi hardware. ● Developer activity and support for the existing projects is mixed at best, and they are hard to use: they involve writing kernels and demand a lot of effort for buffer/RAM optimisation.
- 48. ArrayFire-rb ● Wraps the ArrayFire library
- 49. Using ArrayFire
- 50. MRI ● C extension ● Architecture inspired by NMatrix and NArray ● Each C++ function is placed in a namespace (e.g., namespace af { }) or declared static where possible. ● The corresponding C function receives the prefix af_, e.g., af_multiply() (this function also happens to be static). ● C macros are capitalised and generally have the prefix AF_, as with AF_DTYPE(). ● C functions (and macros, for consistency) are placed within extern "C" { } blocks to turn off C++ name mangling.
- 51. JRuby ● The approach is the same as for NMatrix-JRuby. ● Java Native Interface (JNI) ● Builds on ArrayFire-Java
- 53. System Specifications ● CPU: AMD FX Octa-core 4.2GHz ● RAM: 16GB ● GPU: Nvidia GTX 750 Ti ● GPU RAM: 4GB GDDR5
- 54. Matrix Addition
- 57. Factorization
- 58. Transparency ● Integrate with NArray ● Integrate with NMatrix ● Integrate with Rails
- 59. Applications ● Endless possibilities ;) ● Bioinformatics ● Integrate Tensorflow ● Image Processing ● Computational Fluid Dynamics
- 60. Conclusion
- 61. Useful Links ● https://github.com/sciruby/nmatrix ● https://github.com/arrayfire/arrayfire-rb ● https://github.com/prasunanand/arrayfire-rb/tree/temp
- 62. Acknowledgements 1. Pjotr Prins 2. Charles Nutter 3. John Woods 4. Alexej Gossmann 5. Sameer Deshmukh 6. Pradeep Garigipati
- 63. Thank You GitHub: prasunanand Twitter: @prasun_anand Blog: prasunanand.com