Scientific Computing on JRuby
github.com/prasunanand
FOSDEM 2017: Scientific Computing on JRuby

Number Crunching on CRuby and JRuby using CPU and GPU.
  1. Scientific Computing on JRuby (github.com/prasunanand)
  2. Objective
     ● A scientific library is memory-intensive and speed counts. How do we use JRuby effectively to create a great tool/gem?
     ● A general-purpose GPU library for Ruby that can be used by industry in production and by academia for research.
  3. Ruby Science Foundation
     ● SciRuby has been trying to push Ruby for scientific computing.
     ● Popular Rubygems:
       1. NMatrix
       2. Daru
       3. Mixed_models
  4. NMatrix
     ● NMatrix is SciRuby’s numerical matrix core, implementing dense matrices as well as two types of sparse matrices (linked-list-based and Yale/CSR).
     ● It currently relies on ATLAS/CBLAS/CLAPACK and standard LAPACK for several of its linear algebra operations.
  5. Daru
  6. Mixed_models
  7. Nyaplot
  8. SciRuby vs SciPy
     ● We love Ruby.
     ● We love Rails.
     ● Expressiveness of Ruby.
  9. ● Known for its performance, JRuby is around 10 times faster than CRuby.
     ● With Truffle it is around 40 times faster than CRuby. Truffle is supported by Oracle.
  10. Say Hello!
  11. NMatrix for JRuby
     ● Parallelism => no Global Interpreter Lock, unlike MRI
     ● Easy deployment (Warbler gem)
     ● Automatic garbage collection
     ● Speed
     ● NMatrix for JRuby relies on Apache Commons Math
  12. MDArray
     ● There is no unified interface for SciRuby gems => why not build a wrapper around MDArray?
     ● MDArray is a great gem for linear algebra.
     ● MDArray used Parallel Colt, which has been deprecated.
     ● However, every gem that used NMatrix as a dependency would need to be reimplemented on top of MDArray.
  13. How does NMatrix work?
     ● N-dimensional
     ● 2-dimensional NMatrix
  14. N-dimensional matrices are stored as a one-dimensional Array!
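The flat layout above can be sketched in plain Ruby: with row-major strides, the flat index of coordinates [i, j, k] in a matrix of shape [2, 3, 4] is i*12 + j*4 + k. A minimal sketch (the method names are illustrative, not NMatrix's actual internals):

```ruby
# Sketch: addressing an N-dimensional matrix stored in a flat Array.
# Row-major strides: the last dimension varies fastest.
def strides_for(shape)
  strides = Array.new(shape.size, 1)
  (shape.size - 2).downto(0) { |d| strides[d] = strides[d + 1] * shape[d + 1] }
  strides
end

def flat_index(coords, strides)
  coords.zip(strides).sum { |c, s| c * s }
end

strides = strides_for([2, 3, 4])   # => [12, 4, 1]
flat_index([1, 2, 3], strides)     # => 1*12 + 2*4 + 3 = 23
```

This is why a single flat store plus stride arithmetic is enough to represent any number of dimensions.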
  15. NMatrix Architecture [diagram: MRI vs JRuby]
  16. N-dimensional Matrix
  17. Elementwise Operations
     ● [:add, :subtract, :sin, :gamma]
     ● Iterate through the elements.
     ● Access each element, perform the operation, and return the result.
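The iterate-and-apply pattern on this slide can be sketched as a small dispatch table over the flat store (illustrative names, not NMatrix's actual API):

```ruby
# Sketch: elementwise dispatch over a flat element store.
# Binary ops take two stores of equal length; unary ops take one.
ELEMENTWISE = {
  add:      ->(a, b) { a + b },
  subtract: ->(a, b) { a - b },
  sin:      ->(a)    { Math.sin(a) }
}

def elementwise(op, left, right = nil)
  fn = ELEMENTWISE.fetch(op)
  if right
    left.each_index.map { |i| fn.call(left[i], right[i]) }
  else
    left.map { |a| fn.call(a) }
  end
end

elementwise(:add, [1.0, 2.0], [3.0, 4.0])  # => [4.0, 6.0]
```

Because the store is one-dimensional, the same loop serves matrices of any shape.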
  18. Challenges
     ● Autoboxing and multiple data types
     ● Minimising copying of data
  19. Errors that can’t be reproduced :p
      [0.11, 0.05, 0.34, 0.14] + [0.21, 0.05, 0.14, 0.14] = [0, 0, 0, 0]
      ([0.11, 0.05, 0.34, 0.14] + 5) + ([0.21, 0.05, 0.14, 0.14] + 5) - 10 = [0.32, 0.1, 0.48, 0.28]
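The offset trick on this slide works because the constant shifts cancel algebraically: ((a + 5) + (b + 5)) - 10 = a + b for each element, up to floating-point rounding. A plain-Ruby check of that identity (illustrative, not the buggy code path itself):

```ruby
# The two arrays from the slide.
a = [0.11, 0.05, 0.34, 0.14]
b = [0.21, 0.05, 0.14, 0.14]

direct  = a.each_index.map { |i| a[i] + b[i] }
shifted = a.each_index.map { |i| (a[i] + 5) + (b[i] + 5) - 10 }

# Both routes agree elementwise up to floating-point rounding.
direct.zip(shifted).all? { |x, y| (x - y).abs < 1e-9 }  # => true
```

So the workaround changes only which intermediate values the runtime sees, not the mathematical result.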
  20. Autoboxing
     ● :float64 => double only
     ● Strict dtypes => creating the data type in Java; can’t rely on reflection
     ● @s = Array.new()
     ● @s = Java::double[rows * cols].new()
  21. Autoboxing and Enumerators
      def each_with_indices
        nmatrix = create_dummy_nmatrix
        stride = get_stride(self)
        offset = 0
        coords = Array.new(dim) { 0 }
        shape_copy = Array.new(dim)
        (0...size).each do |k|
          dense_storage_coords(nmatrix, k, coords, stride, offset)
          slice_index = dense_storage_pos(coords, stride)
          ary = Array.new
          if @dtype == :object
            ary << self.s[slice_index]
          else
            ary << self.s.toArray.to_a[slice_index]
          end
          (0...dim).each do |p|
            ary << coords[p]
          end
          yield(ary)
        end if block_given?
        return nmatrix
      end
  22. Minimise copying of data
     ● Make sure you don’t make copies of data.
     ● Pass-by-reference in action:
       ○ Use static methods as helpers.
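Because Ruby passes object references, a helper can write results into a caller-owned buffer instead of allocating intermediate arrays. A minimal sketch of the helper pattern (the class and method names are illustrative):

```ruby
# Sketch: a static helper that fills an existing buffer in place,
# so no intermediate result arrays are allocated.
class MatrixHelper
  def self.add_into!(out, a, b)
    a.each_index { |i| out[i] = a[i] + b[i] }
    out
  end
end

buffer = Array.new(4, 0.0)
MatrixHelper.add_into!(buffer, [1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5])
buffer  # => [1.5, 2.5, 3.5, 4.5]
```

The same idea applies on JRuby with primitive Java arrays, where avoiding copies also avoids boxing each element.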
  23. 2-dimensional Matrix
  24. 2-dimensional Matrix Operations
     ● [:dot, :det, :factorize_lu]
     ● In NMatrix-MRI, BLAS level-3 and LAPACK routines are implemented using their respective libraries.
     ● NMatrix-JRuby depends on Java functions.
  25. Challenges
     ● Converting a 1-D array to a 2-D array
     ● Array size and accessing elements
     ● Speed and memory required
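The 1-D-to-2-D conversion can either stay virtual (index arithmetic over the flat store) or materialise nested arrays. A short sketch of both, in plain Ruby (names are illustrative):

```ruby
# A 2x3 matrix stored row-major in a flat array.
flat = (0...6).to_a
rows, cols = 2, 3

# Virtual 2-D view: no copying, just index arithmetic.
get = ->(i, j) { flat[i * cols + j] }
get.call(1, 2)  # => 5

# Materialised 2-D structure: copies the data into nested arrays,
# which is what a Java double[][] requires.
nested = flat.each_slice(cols).to_a  # => [[0, 1, 2], [3, 4, 5]]
```

The virtual view is cheap but element access pays the multiply-add on every call; materialising costs one copy up front, which is the trade-off the benchmarks on the next slides measure.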
  26. Ruby Code
      index = 0
      puts Benchmark.measure {
        (0...15000).each do |i|
          (0...15000).each do |j|
            c[i][j] = b[i][j]
            index += 1
          end
        end
      }
      # 67.790000  0.070000  67.860000 ( 65.126546)
      # RAM consumed => 5.4GB

      b = Java::double[15_000, 15_000].new
      c = Java::double[15_000, 15_000].new
      index = 0
      puts Benchmark.measure {
        (0...15000).each do |i|
          (0...15000).each do |j|
            b[i][j] = index
            index += 1
          end
        end
      }
      # 43.260000  3.250000  46.510000 ( 39.606356)
  27. Java Code
      public class MatrixGenerator {
        public static void test2() {
          for (int index = 0, i = 0; i < row; i++) {
            for (int j = 0; j < col; j++) {
              c[i][j] = b[i][j];
              index++;
            }
          }
        }
      }
      puts Benchmark.measure { MatrixGenerator.test2 }
      # 0.034000  0.001000  0.034000 ( 0.033000)
      # RAM consumed => 300MB

      public class MatrixGenerator {
        public static void test1() {
          double[][] b = new double[15000][15000];
          double[][] c = new double[15000][15000];
          for (int index = 0, i = 0; i < row; i++) {
            for (int j = 0; j < col; j++) {
              b[i][j] = index;
              index++;
            }
          }
        }
      }
      puts Benchmark.measure { MatrixGenerator.test1 }
      # 0.032000  0.001000  0.032000 ( 0.031000)
  28. Results
      Improves:
     ● Speed by ~1000 times
     ● Memory usage by ~10 times
  29. Mixed models
     ● After NMatrix for doubles was ready, I tested it with mixed_models.
  30. Benchmarking NMatrix functionalities
  31. System Specifications
     ● CPU: AMD FX8350 octa-core 4.2GHz
     ● RAM: 16GB
  32. Addition
  33. Subtraction
  34. Gamma
  35. Matrix Multiplication
  36. Determinant
  37. Factorization
  38. Benchmark conclusions
     ● NMatrix-JRuby is considerably faster for elementwise operations on N-dimensional matrices.
     ● NMatrix-MRI is faster for 2-dimensional matrices when calculating matrix multiplication, determinants and factorization.
  39. Improvements
     ● Make NMatrix-JRuby faster than NMatrix-MRI using BLAS level-3 and LAPACK routines.
     ● How?
     ● Why not JBlas?
  40. [diagram: MRI vs JRuby]
  41. Future Work
     ● Add support for complex dtypes.
     ● Convert NMatrix-JRuby Enumerators to Java code.
     ● Add sparse-matrix support.
  42. Am I done?
  43. Nope!
  44. Enter GPU
  45. A General-Purpose GPU library
     ● Combine the beauty of Ruby with transparent GPU processing.
     ● This will work both on client computers and on servers that use Tesla and Intel Xeon Phi solutions.
     ● Developer activity and support for the current projects is mixed at best, and they are tough to use: they involve writing kernels and require a lot of effort to be put into buffer/RAM optimisation.
  46. ArrayFire-rb
     ● Wraps the ArrayFire library
  47. ArrayFire
     ● ArrayFire is an open-source GPGPU library written in C++ that uses JIT compilation.
     ● ArrayFire supports CUDA-capable NVIDIA GPUs, OpenCL devices, and a CPU backend.
     ● It abstracts away the difficult task of writing kernels for multiple architectures, handling memory management, and performing tuning and optimisation.
  48. Using ArrayFire
  49. MRI
     ● C extension
     ● Architecture is inspired by NMatrix and NArray.
     ● The C++ function is placed in a namespace (e.g., namespace af { }) or is declared static if possible. The C function receives the prefix arf_, e.g., arf_multiply() (this function also happens to be static).
     ● C macros are capitalized and generally have the prefix ARF_, as with ARF_DTYPE().
     ● C functions (and macros, for consistency) are placed within extern "C" { } blocks to turn off C++ name mangling.
     ● C macros (in extern blocks) may represent C++ constants (which are always …)
  50. #include <ruby.h>

      typedef struct AF_STRUCT {
        size_t ndims;
        size_t count;
        size_t* dimension;
        double* array;
      } afstruct;

      void Init_arrayfire() {
        ArrayFire = rb_define_module("ArrayFire");
        Blas = rb_define_class_under(ArrayFire, "BLAS", rb_cObject);
        rb_define_singleton_method(Blas, "matmul", (METHOD)arf_matmul, 2);
      }

      static VALUE arf_matmul(VALUE self, VALUE left_val, VALUE right_val) {
        afstruct* left;
        afstruct* right;
        afstruct* result = ALLOC(afstruct);
        Data_Get_Struct(left_val, afstruct, left);
        Data_Get_Struct(right_val, afstruct, right);
        result->ndims = left->ndims;
        size_t dimension[2];
        dimension[0] = left->dimension[0];
        dimension[1] = right->dimension[1];
        size_t count = dimension[0] * dimension[1];
        result->dimension = dimension;
        result->count = count;
        arf::matmul(result, left, right);
        return Data_Wrap_Struct(CLASS_OF(left_val), NULL, arf_free, result);
      }
  51. #include <arrayfire.h>

      namespace arf {
        using namespace af;
        static void matmul(afstruct *result, afstruct *left, afstruct *right) {
          array l = array(left->dimension[0], left->dimension[1], left->array);
          array r = array(right->dimension[0], right->dimension[1], right->array);
          array res = matmul(l, r);
          result->array = res.host<double>();
        }
      }

      extern "C" {
        #include "arrayfire.c"
      }
  52. JRuby
     ● The approach is the same as NMatrix-JRuby.
     ● Java Native Interface (JNI)
     ● Work on ArrayFire-Java.
  53. ● Place 'libaf.so' in the load path.

      require 'ext/vendor/ArrayFire.jar'

      class Af_Array
        attr_accessor :dims, :elements

        def matmul(other)
          Blas.matmul(self.arr, other)
        end
      end
  54. Benchmarking ArrayFire
  55. System Specification
      CPU: AMD FX octa-core 4.2GHz
      RAM: 16GB
      GPU: Nvidia GTX 750Ti
      GPU RAM: 4GB GDDR5
  56. Matrix Addition
  57. Matrix Multiplication
  58. Matrix Determinant
  59. Factorization
  60. Transparency
     ● Integrate with NArray
     ● Integrate with NMatrix
     ● Integrate with Rails
  61. Applications
     ● Endless possibilities ;)
     ● Bioinformatics
     ● Integrate TensorFlow
     ● Image processing
     ● Computational fluid dynamics
  62. Conclusion
  63. Useful Links
     ● https://github.com/sciruby/nmatrix
     ● https://github.com/arrayfire/arrayfire-rb
     ● https://github.com/prasunanand/arrayfire-rb/tree/temp
  64. Acknowledgements
      1. Pjotr Prins
      2. Charles Nutter
      3. John Woods
      4. Alexej Gossmann
      5. Sameer Deshmukh
      6. Pradeep Garigipati
  65. Thank You
      GitHub: prasunanand
      Twitter: @prasun_anand
      Blog: prasunanand.com
