GPU computing on Ruby and Scientific computing on JRuby.

- 1. Scientific Computing on JRuby github.com/prasunanand
- 2. Objective ● A scientific library is memory-intensive, and speed counts. How can JRuby be used effectively to create a great tool/gem? ● A general-purpose GPU library for Ruby that can be used by industry in production and by academia for research.
- 3. ● Ruby Science Foundation ● SciRuby has been trying to push Ruby for scientific computing. ● Popular Rubygems: 1. NMatrix 2. Daru 3. Mixed_models
- 4. NMatrix NMatrix is SciRuby’s numerical matrix core, implementing dense matrices as well as two types of sparse (linked-list-based and Yale/CSR). It currently relies on ATLAS/CBLAS/CLAPACK and standard LAPACK for several of its linear algebra operations.
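
To make the CSR idea on the slide above concrete, here is a minimal sketch in plain Ruby. It shows textbook compressed sparse row storage only; it is not NMatrix's internal layout, and the helper names `dense_to_csr` and `csr_fetch` are hypothetical.

```ruby
# Hypothetical helper, not part of NMatrix: convert a dense 2-D array to CSR.
def dense_to_csr(rows)
  values = []       # non-zero entries, stored row by row
  col_indices = []  # column index of each stored entry
  row_ptr = [0]     # row i occupies values[row_ptr[i]...row_ptr[i + 1]]

  rows.each do |row|
    row.each_with_index do |x, j|
      next if x.zero?
      values << x
      col_indices << j
    end
    row_ptr << values.length
  end

  { values: values, col_indices: col_indices, row_ptr: row_ptr }
end

# Look up element (i, j) without rebuilding the dense matrix.
def csr_fetch(csr, i, j)
  (csr[:row_ptr][i]...csr[:row_ptr][i + 1]).each do |k|
    return csr[:values][k] if csr[:col_indices][k] == j
  end
  0 # entries that are not stored are zero
end

dense = [
  [5, 0, 0],
  [0, 0, 8],
  [0, 3, 0]
]
csr = dense_to_csr(dense)
```

Only the three non-zero values are stored, together with their column indices and one row pointer per row plus one.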
- 5. Daru
- 6. Mixed_models
- 7. Nyaplot
- 8. Why nya?
- 9. Contributors wanted ● IRC #sciruby ● Slack-channel #sciruby ● Google-group #sciruby
- 10. Known for performance ● JRuby is 10 times faster than CRuby. ● With Truffle it is around 40 times faster than CRuby.
- 11. Say hello
- 12. NMatrix for JRuby ● MDArray does not offer a unified interface for SciRuby gems. ● MDArray is a great gem for linear algebra. ● However, every gem that uses NMatrix as a dependency would need to be reimplemented with MDArray. ● Hence the effort put into optimizing NMatrix for JRuby.
- 13. NMatrix for JRuby ● Parallelism: no Global Interpreter Lock, unlike MRI ● Easy deployment (Warbler gem)
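
The parallelism point can be illustrated with ordinary Ruby threads. This is a sketch, not code from NMatrix: `parallel_sum` is a hypothetical helper and the worker count is an arbitrary assumption. The same code runs on MRI, but there the Global Interpreter Lock keeps the threads from executing Ruby code simultaneously, whereas on JRuby they can run on separate cores.

```ruby
# Hypothetical helper: sum a large array in chunks, one thread per chunk.
def parallel_sum(data, workers = 4)
  slice_size = [(data.length / workers.to_f).ceil, 1].max
  threads = data.each_slice(slice_size).map do |chunk|
    Thread.new { chunk.sum } # each thread returns its partial sum
  end
  threads.sum(&:value)       # Thread#value joins and returns the block result
end

data = (1..1_000_000).to_a
total = parallel_sum(data)
```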
- 14. How NMatrix works ● N-Dimensional ● 2-Dimensional NMatrix
- 15. N-dimensional NMatrix N-dimensional matrices are stored as a one-dimensional Array.
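
A small sketch of how N-dimensional coordinates can map onto a one-dimensional array. It assumes row-major ordering (last dimension varies fastest), which the slide does not state; `strides_for` and `flat_index` are hypothetical names.

```ruby
# Row-major strides for a shape: the last dimension varies fastest.
def strides_for(shape)
  strides = Array.new(shape.length, 1)
  (shape.length - 2).downto(0) do |d|
    strides[d] = strides[d + 1] * shape[d + 1]
  end
  strides
end

# Position of an N-dimensional coordinate in the flat storage array.
def flat_index(shape, coords)
  strides_for(shape).zip(coords).sum { |stride, c| stride * c }
end

shape = [2, 3, 4]
storage = (0...shape.reduce(:*)).to_a # 24 elements in one flat array
value = storage[flat_index(shape, [1, 2, 3])]
```

For shape `[2, 3, 4]` the strides are `[12, 4, 1]`, so coordinate `[1, 2, 3]` lands at flat position 12 + 8 + 3 = 23, the last element.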
- 16. Elementwise Operation ● Iterate through the elements ● Access the array; do the operation, return it ● [:add, :subtract, :sin, :gamma]
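
The iterate, operate, return pattern described above can be sketched in plain Ruby over flat arrays. This is an illustration, not the NMatrix-JRuby implementation; `elementwise` and `zip_map` are hypothetical helpers, while `Math.sin` and `Math.gamma` are standard Ruby.

```ruby
# Apply a binary block to matching elements of two equally sized arrays.
def zip_map(left, right)
  raise ArgumentError, "length mismatch" unless right && left.length == right.length
  left.zip(right).map { |a, b| yield(a, b) }
end

# Dispatch on the operation name and return a new array with the results.
def elementwise(op, left, right = nil)
  case op
  when :add      then zip_map(left, right) { |a, b| a + b }
  when :subtract then zip_map(left, right) { |a, b| a - b }
  when :sin      then left.map { |x| Math.sin(x) }
  when :gamma    then left.map { |x| Math.gamma(x) }
  else raise ArgumentError, "unknown operation: #{op}"
  end
end

a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 0.5, 0.5, 0.5]
sums = elementwise(:add, a, b)
gammas = elementwise(:gamma, a)
```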
- 17. Determinants and Factorization ● Two-dimensional matrix operations ● In NMatrix-MRI, BLAS-III and LAPACK routines are implemented using their respective libraries ● NMatrix-JRuby depends on Java functions.
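
To show what a determinant via factorization involves, here is a pure-Ruby sketch using Gaussian elimination with partial pivoting (the elimination step behind LU decomposition). It is not what NMatrix-MRI or NMatrix-JRuby call internally; those delegate to LAPACK and Java routines respectively. The function name `determinant` is an assumption for illustration.

```ruby
# Determinant by elimination with partial pivoting.
def determinant(matrix)
  a = matrix.map { |row| row.map(&:to_f) } # work on a float copy
  n = a.length
  det = 1.0

  n.times do |k|
    # Use the row with the largest absolute value in column k as the pivot.
    pivot = (k...n).max_by { |i| a[i][k].abs }
    return 0.0 if a[pivot][k].zero? # singular matrix

    if pivot != k
      a[k], a[pivot] = a[pivot], a[k]
      det = -det # a row swap flips the sign of the determinant
    end

    det *= a[k][k]
    ((k + 1)...n).each do |i|
      factor = a[i][k] / a[k][k]
      (k...n).each { |j| a[i][j] -= factor * a[k][j] }
    end
  end

  det
end

m = [
  [2, 0, 1],
  [1, 3, 2],
  [1, 1, 4]
]
det = determinant(m)
```

The determinant is the product of the pivots, with the sign flipped once per row swap.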
- 18. Mixed models ● After NMatrix for doubles was ready, I tested it with mixed_models.
- 19. Challenges ● Autoboxing and multiple data types ● Minimising copying of data ● Handling large arrays
- 20. Autoboxing ● :float64 => double only ● Strict dtypes => creating data type in Java: not guessing ● Errors => that can’t be reproduced :P ● [0.11, 0.05, 0.34, 0.14] + [0.21, 0.05, 0.14, 0.14] = [0, 0, 0, 0] ● ([0.11, 0.05, 0.34, 0.14] + 5) + ([0.21, 0.05, 0.14, 0.14] + 5) - 10 = [0.32, 0.1, 0.48, 0.28]
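
One way to read "strict dtypes: not guessing" is to coerce every element to a double before it crosses into Java, and to fail loudly on anything else. The sketch below is an assumption about that idea, written in plain Ruby; `to_float64` is a hypothetical helper and not part of NMatrix.

```ruby
# Hypothetical helper: force every element to a Ruby Float (a 64-bit double)
# up front, and raise on anything that is not an Integer or Float.
def to_float64(values)
  values.map do |v|
    unless v.is_a?(Integer) || v.is_a?(Float)
      raise TypeError, "expected Integer or Float, got #{v.class}"
    end
    v.to_f
  end
end

a = to_float64([0.11, 0.05, 0.34, 0.14])
b = to_float64([0.21, 0.05, 0.14, 0.14])

# With both operands known to be doubles, the sum is the expected one
# (rounded to two places to hide floating-point noise).
sums = a.zip(b).map { |x, y| (x + y).round(2) }
```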
- 21. Minimise copying of data ● Make sure you make copies of data
- 22. Handling large arrays ● Array Size ● Accessing elements ● Chaining to java method ● Speed and Memory Required
- 23. Ruby Code

      require 'java'
      require 'benchmark'

      # Allocate two 15,000 x 15,000 Java double arrays from JRuby.
      b = Java::double[15_000, 15_000].new
      c = Java::double[15_000, 15_000].new

      # Fill b element by element.
      index = 0
      puts Benchmark.measure {
        (0...15_000).each do |i|
          (0...15_000).each do |j|
            b[i][j] = index
            index += 1
          end
        end
      }
      # 43.260000 3.250000 46.510000 ( 39.606356)

      # Copy b into c element by element.
      index = 0
      puts Benchmark.measure {
        (0...15_000).each do |i|
          (0...15_000).each do |j|
            c[i][j] = b[i][j]
            index += 1
          end
        end
      }
      # 67.790000 0.070000 67.860000 ( 65.126546)
      # RAM consumed => 5.4GB
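- 24. Java Code

      public class MatrixGenerator {
          // The slide does not show these declarations; static fields are assumed
          // here so that both methods compile and share the same arrays.
          static final int row = 15000;
          static final int col = 15000;
          static double[][] b;
          static double[][] c;

          // Allocate both arrays and fill b.
          public static void test1() {
              b = new double[row][col];
              c = new double[row][col];
              for (int index = 0, i = 0; i < row; i++) {
                  for (int j = 0; j < col; j++) {
                      b[i][j] = index;
                      index++;
                  }
              }
          }

          // Copy b into c.
          public static void test2() {
              for (int index = 0, i = 0; i < row; i++) {
                  for (int j = 0; j < col; j++) {
                      c[i][j] = b[i][j];
                      index++;
                  }
              }
          }
      }

      // Called from JRuby: puts Benchmark.measure { MatrixGenerator.test1 }
      // #0.032000 0.001000 00.032000 ( 00.03100)
      // Called from JRuby: puts Benchmark.measure { MatrixGenerator.test2 }
      // #0.034000 0.001000 00.034000 ( 00.03300)
      // #RAM consumed => 300MB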
- 25. Results ● Speed improves by 1000 times ● Memory use improves by 10 times
- 26. Benchmarking NMatrix functionalities
- 27. System Specifications ● CPU: AMD FX-8350 octa-core, 4.2GHz ● RAM: 16GB
- 28. Addition
- 29. Subtraction
- 30. Gamma
- 31. Matrix Multiplication
- 32. Determinant
- 33. Factorization
- 34. Benchmark conclusion ● NMatrix-JRuby is much faster for N-dimensional matrices where elementwise operations are concerned. ● NMatrix-MRI is faster for 2-dimensional matrices in matrix multiplication, determinant calculation and factorization.
- 35. Improvements ● Make NMatrix-JRuby faster than NMatrix-MRI using BLAS level-3 and LAPACK routines. ● How? ● Why not JBlas?
- 36. Future Work ● Add support for complex dtype. ● Convert NMatrix-JRuby Enumerators to Java code. ● Add sparse support.
- 37. Am I done?
- 38. Nope!
- 39. Enter GPU
- 40. A General-Purpose GPU library ● Combine the beauty of Ruby with transparent GPU processing. ● This will work both on client computers and on servers that use NVIDIA Tesla and Intel Xeon Phi hardware. ● Developer activity and support for the current projects are mixed at best, and the projects are tough to use: they involve writing kernels and require a lot of effort on buffer/RAM optimisation.
- 41. ArrayFire-rb ● Wraps ArrayFire library
- 42. Using ArrayFire
- 43. MRI ● C extension ● Architecture is inspired by NMatrix and NArray ● The C++ function is placed in a namespace (e.g., namespace af { }) or is declared static if possible. The C function receives the prefix af_, e.g., af_multiply() (this function also happens to be static). ● C macros are capitalized and generally have the prefix AF_, as with AF_DTYPE(). ● C functions (and macros, for consistency) are placed within extern "C" { } blocks to turn off C++ mangling.
- 44. JRuby ● The approach is the same as for NMatrix-JRuby. ● Java Native Interface (JNI) ● Work on ArrayFire-Java
- 45. Benchmarking ArrayFire
- 46. System Specification ● CPU: AMD FX octa-core, 4.2GHz ● RAM: 16GB ● GPU: Nvidia GTX 750Ti ● GPU RAM: 4GB GDDR5
- 47. Matrix Addition
- 48. Matrix Multiplication
- 49. Matrix Determinant
- 50. Factorization
- 51. Transparency ● Integrate with NArray ● Integrate with NMatrix ● Integrate with Rails
- 52. Applications ● Endless possibilities ;) ● Bioinformatics ● Integrate TensorFlow ● Image Processing ● Computational Fluid Dynamics
- 53. Conclusion
- 54. Useful Links ● https://github.com/sciruby/nmatrix ● https://github.com/arrayfire/arrayfire-rb ● https://github.com/prasunanand/arrayfire-rb/tree/temp
- 55. Acknowledgements 1. Pjotr Prins 2. Charles Nutter 3. John Woods 4. Alexej Gossmann 5. Sameer Deshmukh 6. Pradeep Garigipati
- 56. Thank You Github: prasunanand Twitter: @prasun_anand Blog: prasunanand.com
