Scientific Computing on JRuby

GPU computing on Ruby and Scientific computing on JRuby.

Published in: Technology

  1. 1. Scientific Computing on JRuby github.com/prasunanand
  2. 2. Objective ● A scientific library is memory-intensive, and speed counts. How can JRuby be used effectively to create a great tool/gem? ● A general-purpose GPU library for Ruby that can be used by industry in production and by academia for research.
  3. 3. ● Ruby Science Foundation ● SciRuby has been trying to push Ruby for scientific computing. ● Popular Ruby gems: 1. NMatrix 2. Daru 3. mixed_models
  4. 4. NMatrix NMatrix is SciRuby’s numerical matrix core, implementing dense matrices as well as two types of sparse (linked-list-based and Yale/CSR). It currently relies on ATLAS/CBLAS/CLAPACK and standard LAPACK for several of its linear algebra operations.
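To make the Yale/CSR idea concrete, here is a minimal sketch of a textbook compressed sparse row layout in plain Ruby. It is not NMatrix's internal implementation, and the names `dense_to_csr`, `csr_get`, `values`, `col_indices` and `row_ptr` are illustrative assumptions, not part of the NMatrix API.

```ruby
# Convert a dense matrix (array of rows) into textbook CSR arrays.
# All names here are hypothetical and chosen for illustration only.
def dense_to_csr(dense)
  values = []       # non-zero entries, row by row
  col_indices = []  # column of each stored entry
  row_ptr = [0]     # row_ptr[i]...row_ptr[i + 1] spans row i in values
  dense.each do |row|
    row.each_with_index do |entry, j|
      next if entry.zero?
      values << entry
      col_indices << j
    end
    row_ptr << values.size
  end
  { values: values, col_indices: col_indices, row_ptr: row_ptr }
end

# Look up element (i, j) by scanning only the stored entries of row i.
def csr_get(csr, i, j)
  start = csr[:row_ptr][i]
  stop = csr[:row_ptr][i + 1]
  (start...stop).each do |k|
    return csr[:values][k] if csr[:col_indices][k] == j
  end
  0
end

dense = [
  [5, 0, 0],
  [0, 0, 8],
  [0, 3, 0]
]
csr = dense_to_csr(dense)
puts csr.inspect
```

Only the three non-zero entries are stored, which is why sparse formats pay off when most entries are zero.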
  5. 5. Daru
  6. 6. Mixed_models
  7. 7. Nyaplot
  8. 8. Why nya?
  9. 9. Contributors wanted ● IRC #sciruby ● Slack-channel #sciruby ● Google-group #sciruby
  10. 10. Known for performance JRuby is 10 times faster than CRuby. With Truffle, it’s around 40 times faster than CRuby.
  11. 11. Say hello
  12. 12. NMatrix for JRuby ● MDArray is not a unified interface for SciRuby gems. ● MDArray is a great gem for linear algebra. ● However, every gem that uses NMatrix as a dependency would need to be reimplemented with MDArray. ● Hence the effort went into porting and optimizing NMatrix for JRuby.
  13. 13. NMatrix for JRuby ● Parallelism => no Global Interpreter Lock, unlike MRI ● Easy deployment (Warbler gem)
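The parallelism point can be sketched with plain Ruby threads. The helper name `parallel_map` is a hypothetical one, not an NMatrix function. The same code runs on both interpreters: on MRI the Global Interpreter Lock keeps the threads from executing Ruby code simultaneously, while on JRuby they are native threads that can run on separate cores.

```ruby
# Split an elementwise computation across several threads.
# parallel_map is a hypothetical helper, shown only to illustrate the idea.
def parallel_map(data, thread_count, &block)
  slice_size = (data.size.to_f / thread_count).ceil
  slice_size = 1 if slice_size.zero?
  threads = data.each_slice(slice_size).map do |slice|
    Thread.new { slice.map(&block) }
  end
  # Thread#value joins the thread and returns the result of its block;
  # collecting in creation order preserves the original element order.
  threads.flat_map(&:value)
end

squares = parallel_map((1..10).to_a, 4) { |x| x * x }
puts squares.inspect
```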
  14. 14. How NMatrix works ● N-Dimensional ● 2-Dimensional NMatrix
  15. 15. N-dimensional NMatrix N-dimensional matrices are stored as a one-dimensional Array.
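A minimal sketch of how N-dimensional coordinates can map onto a one-dimensional array, assuming row-major order (the slide does not state the ordering). `flat_index` is a hypothetical helper name, not an NMatrix method.

```ruby
# Map N-dimensional coordinates to a position in a flat, row-major array.
# flat_index is a hypothetical helper; row-major order is an assumption.
def flat_index(shape, coords)
  raise ArgumentError, "rank mismatch" unless shape.size == coords.size
  coords.each_with_index.reduce(0) do |offset, (coord, dim)|
    unless (0...shape[dim]).cover?(coord)
      raise IndexError, "coordinate #{coord} out of range for dimension #{dim}"
    end
    # Horner-style accumulation: ((c0 * s1) + c1) * s2 + c2 ...
    offset * shape[dim] + coord
  end
end

shape = [2, 3, 4]                  # a 2 x 3 x 4 matrix
storage = Array.new(shape.reduce(:*), 0.0)
storage[flat_index(shape, [1, 2, 3])] = 42.0
puts flat_index(shape, [1, 2, 3])
```

With this layout the last coordinate varies fastest, so iterating over the flat array visits elements in the same order as nested loops over the dimensions.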
  16. 16. Elementwise Operation ● Iterate through the elements ● Access the array; do the operation, return it ● [:add, :subtract, :sin, :gamma]
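The elementwise pattern described above (iterate, apply the operation, return the result) can be sketched in plain Ruby over flat arrays. The dispatch table `ELEMENTWISE_OPS` and the method `elementwise` are illustrative assumptions, not the NMatrix implementation.

```ruby
# Hypothetical dispatch table for the four operations named on the slide.
ELEMENTWISE_OPS = {
  add:      ->(a, b) { a + b },
  subtract: ->(a, b) { a - b },
  sin:      ->(a) { Math.sin(a) },
  gamma:    ->(a) { Math.gamma(a) }
}.freeze

# Apply an operation to every element of the flat storage array(s)
# and return a new array holding the results.
def elementwise(op, left, right = nil)
  fn = ELEMENTWISE_OPS.fetch(op)
  if fn.arity == 2
    unless right && left.size == right.size
      raise ArgumentError, "binary operation needs two arrays of equal size"
    end
    left.each_index.map { |i| fn.call(left[i], right[i]) }
  else
    left.map { |x| fn.call(x) }
  end
end

puts elementwise(:add, [1.0, 2.0], [3.0, 4.0]).inspect
puts elementwise(:gamma, [5.0]).inspect
```

Because the storage is flat, the same loop serves matrices of any dimensionality.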
  17. 17. Determinants and Factorization ● Two-dimensional matrix operations ● In NMatrix-MRI, BLAS level-3 and LAPACK routines are implemented using their respective libraries ● NMatrix-JRuby depends on Java functions.
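As an illustration of what a determinant routine does with a two-dimensional matrix held in flat storage, here is a sketch using Gaussian elimination with partial pivoting. It is neither the LAPACK routine used by NMatrix-MRI nor the Java function used by NMatrix-JRuby; `determinant` and its arguments are assumptions made for the example.

```ruby
# Determinant of an n x n matrix stored as a flat, row-major array.
# A plain-Ruby sketch, not the LAPACK or Java code the slide refers to.
def determinant(flat, n)
  raise ArgumentError, "expected #{n * n} elements" unless flat.size == n * n
  a = flat.map(&:to_f)  # work on a copy so the caller's data is untouched
  det = 1.0
  n.times do |col|
    # Partial pivoting: pick the row with the largest entry in this column.
    pivot_row = (col...n).max_by { |r| a[r * n + col].abs }
    return 0.0 if a[pivot_row * n + col].zero?
    if pivot_row != col
      n.times do |k|
        a[col * n + k], a[pivot_row * n + k] = a[pivot_row * n + k], a[col * n + k]
      end
      det = -det  # each row swap flips the sign of the determinant
    end
    pivot = a[col * n + col]
    det *= pivot
    ((col + 1)...n).each do |r|
      factor = a[r * n + col] / pivot
      (col...n).each { |k| a[r * n + k] -= factor * a[col * n + k] }
    end
  end
  det
end

puts determinant([1, 2, 3, 4], 2)
```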
  18. 18. Mixed models ● After NMatrix for doubles was ready, I tested it with mixed_models.
  19. 19. Challenges ● Autoboxing and multiple data types ● Minimise copying of data ● Handling large arrays
  20. 20. Autoboxing ● :float64 => double only ● Strict dtypes => creating data type in Java: not guessing ● Errors => that can’t be reproduced :P [0.11, 0.05, 0.34, 0.14] + [0.21, 0.05, 0.14, 0.14] = [0, 0, 0, 0] ([0.11, 0.05, 0.34, 0.14] + 5) + ([0.21, 0.05, 0.14, 0.14] + 5) - 10 = [0.32, 0.1, 0.48, 0.28]
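One way to read "strict dtypes, not guessing" is to convert every element to a double explicitly before handing data to Java, and to fail loudly on anything else. The sketch below shows that idea in plain Ruby; `to_float64` is a hypothetical helper, not the NMatrix-JRuby code.

```ruby
# Convert every element to a Float (a :float64 / Java double) up front,
# instead of letting the runtime guess a type per element.
# to_float64 is a hypothetical helper used only for illustration.
def to_float64(values)
  values.map do |v|
    unless v.is_a?(Integer) || v.is_a?(Float)
      raise TypeError, "expected Integer or Float, got #{v.class}"
    end
    v.to_f
  end
end

puts to_float64([0, 1, 2.5]).inspect
```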
  21. 21. Minimise copying of data ● Make sure you make copies of data
  22. 22. Handling large arrays ● Array size ● Accessing elements ● Chaining to Java methods ● Speed and memory required
  23. 23. Ruby Code

      require 'java'
      require 'benchmark'

      # Two 15,000 x 15,000 primitive double arrays allocated through JRuby
      b = Java::double[15_000, 15_000].new
      c = Java::double[15_000, 15_000].new

      # Fill b element by element from Ruby
      index = 0
      puts Benchmark.measure {
        (0...15_000).each do |i|
          (0...15_000).each do |j|
            b[i][j] = index
            index += 1
          end
        end
      }
      # 43.260000 3.250000 46.510000 ( 39.606356)

      # Copy b into c element by element from Ruby
      index = 0
      puts Benchmark.measure {
        (0...15_000).each do |i|
          (0...15_000).each do |j|
            c[i][j] = b[i][j]
            index += 1
          end
        end
      }
      # 67.790000 0.070000 67.860000 ( 65.126546)
      # RAM consumed => 5.4GB
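  24. 24. Java Code

      public class MatrixGenerator {
        // row, col, b and c are used by test2 but not declared on the slide;
        // they are assumed here to be static fields.
        static int row = 15000;
        static int col = 15000;
        static double[][] b = new double[row][col];
        static double[][] c = new double[row][col];

        // Allocate two local arrays and fill b with consecutive values
        public static void test1() {
          double[][] b = new double[15000][15000];
          double[][] c = new double[15000][15000];
          for (int index = 0, i = 0; i < row; i++) {
            for (int j = 0; j < col; j++) {
              b[i][j] = index;
              index++;
            }
          }
        }

        // Copy b into c
        public static void test2() {
          for (int index = 0, i = 0; i < row; i++) {
            for (int j = 0; j < col; j++) {
              c[i][j] = b[i][j];
              index++;
            }
          }
        }
      }

      Called from JRuby:

      puts Benchmark.measure { MatrixGenerator.test1 }
      #0.032000 0.001000 00.032000 ( 00.03100)

      puts Benchmark.measure { MatrixGenerator.test2 }
      #0.034000 0.001000 00.034000 ( 00.03300)
      #RAM consumed => 300MB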
  25. 25. Results Moving the loops into Java improves: ● Speed by about 1000 times ● Memory use by about 10 times
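The two ratios can be checked against the figures reported on the Ruby Code and Java Code slides. The short calculation below uses the "total" column of each `Benchmark.measure` line and takes 1 GB as 1000 MB, which is an assumption. The results are the same order of magnitude as the rounded figures on the Results slide.

```ruby
# Totals (third column of Benchmark.measure output) as reported on the slides.
ruby_copy_seconds = 67.86
java_copy_seconds = 0.034
ruby_fill_seconds = 46.51
java_fill_seconds = 0.032

# Reported memory use in MB (assuming 1 GB = 1000 MB).
ruby_ram_mb = 5400.0
java_ram_mb = 300.0

copy_speedup = ruby_copy_seconds / java_copy_seconds
fill_speedup = ruby_fill_seconds / java_fill_seconds
memory_ratio = ruby_ram_mb / java_ram_mb

puts format("copy speedup: %.0fx", copy_speedup)
puts format("fill speedup: %.0fx", fill_speedup)
puts format("memory ratio: %.0fx", memory_ratio)
```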
  26. 26. Benchmarking NMatrix functionalities
  27. 27. System Specifications ● CPU: AMD FX8350 octa-core 4.2GHz ● RAM: 16GB
  28. 28. Addition
  29. 29. Subtraction
  30. 30. Gamma
  31. 31. Matrix Multiplication
  32. 32. Determinant
  33. 33. Factorization
  34. 34. Benchmark conclusion ● NMatrix-JRuby is considerably faster for N-dimensional matrices where elementwise operations are concerned. ● NMatrix-MRI is faster for 2-dimensional matrices in matrix multiplication, determinant calculation and factorization.
  35. 35. Improvements ● Make NMatrix-JRuby faster than NMatrix-MRI using BLAS level-3 and LAPACK routines. ● How? ● Why not JBlas?
  36. 36. Future Work ● Add support for complex dtype. ● Convert NMatrix-JRuby Enumerators to Java code. ● Add sparse support.
  37. 37. Am I done?
  38. 38. Nope!
  39. 39. Enter GPU
  40. 40. A General-Purpose GPU library ● Combine the beauty of Ruby with transparent GPU processing ● This will work both on client computers and on servers that use NVIDIA Tesla and Intel Xeon Phi solutions. ● Developer activity and support for the current projects are mixed at best, and the projects are tough to use because they involve writing kernels and require a lot of effort in buffer/RAM optimisation.
  41. 41. ArrayFire-rb ● Wraps ArrayFire library
  42. 42. Using ArrayFire
  43. 43. MRI ● C extension ● Architecture is inspired by NMatrix and NArray ● The C++ function is placed in a namespace (e.g., namespace af { }) or is declared static if possible. The C function receives the prefix af_, e.g., af_multiply() (this function also happens to be static). ● C macros are capitalized and generally have the prefix AF_, as with AF_DTYPE(). ● C functions (and macros, for consistency) are placed within extern "C" { } blocks to turn off C++ mangling.
  44. 44. JRuby ● The approach is the same as for NMatrix-JRuby. ● Java Native Interface (JNI) ● Work on ArrayFire-Java
  45. 45. Benchmarking ArrayFire
  46. 46. System Specification ● CPU: AMD FX octa-core 4.2GHz ● RAM: 16GB ● GPU: Nvidia GTX 750 Ti ● GPU RAM: 4GB GDDR5
  47. 47. Matrix Addition
  48. 48. Matrix Multiplication
  49. 49. Matrix Determinant
  50. 50. Factorization
  51. 51. Transparency ● Integrate with NArray ● Integrate with NMatrix ● Integrate with Rails
  52. 52. Applications ● Endless possibilities ;) ● Bioinformatics ● Integrate TensorFlow ● Image Processing ● Computational Fluid Dynamics
  53. 53. Conclusion
  54. 54. Useful Links ● https://github.com/sciruby/nmatrix ● https://github.com/arrayfire/arrayfire-rb ● https://github.com/prasunanand/arrayfire-rb/tree/temp
  55. 55. Acknowledgements 1. Pjotr Prins 2. Charles Nutter 3. John Woods 4. Alexej Gossmann 5. Sameer Deshmukh 6. Pradeep Garigipati
  56. 56. Thank You GitHub: prasunanand Twitter: @prasun_anand Blog: prasunanand.com
