Scientific Computing on JRuby
github.com/prasunanand
Objective
● A scientific library is memory-intensive, and speed counts. How can JRuby be used effectively to create a great tool/gem?
● A general-purpose GPU library for Ruby that can be used by industry in production and by academia for research.
● Ruby Science Foundation
● SciRuby has been trying to push Ruby for scientific computing.
● Popular Rubygems:
1. NMatrix
2. Daru
3. Mixed_models
NMatrix
● NMatrix is SciRuby’s numerical matrix core, implementing dense matrices as
well as two types of sparse (linked-list-based and Yale/CSR).
● It currently relies on ATLAS/CBLAS/CLAPACK and standard LAPACK for
several of its linear algebra operations.
Daru
Mixed_models
Nyaplot
SciRuby vs SciPy
● We love Ruby.
● We love Rails.
● Expressiveness of Ruby.
● JRuby is known for performance: it is about 10 times faster than CRuby.
● With Truffle, it is around 40 times faster than CRuby. Truffle is backed by Oracle.
Say Hello!
NMatrix for JRuby
● Parallelism => no Global Interpreter Lock, unlike MRI
● Easy deployment (Warbler gem)
● Automatic garbage collection
● Speed
● NMatrix for JRuby relies on Apache Commons Math
MDArray
● SciRuby gems lack a unified interface => why not build a wrapper around MDArray?
● MDArray is a great gem for linear algebra.
● MDArray used Parallel Colt, which is now deprecated.
● However, every gem that uses NMatrix as a dependency would have needed to be reimplemented with MDArray.
How does NMatrix work?
● N-Dimensional
● 2-Dimensional NMatrix
N-dimensional matrices are stored as a one-dimensional Array!
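The flat, row-major storage described above can be sketched in plain Ruby (a hypothetical helper for illustration, not NMatrix's actual code): each dimension gets a stride, and coordinates are folded into a single offset.

```ruby
# A 2x3 matrix stored row-major in a flat array.
shape = [2, 3]
store = [1, 2, 3,
         4, 5, 6]

# stride[k] = product of the dimensions after k, so [3, 1] here.
stride = shape.each_index.map { |k| shape[(k + 1)..].inject(1, :*) }

# Fold coordinates into a single flat offset.
def flat_index(coords, stride)
  coords.zip(stride).sum { |c, s| c * s }
end

flat_index([1, 2], stride)          # => 5
store[flat_index([1, 2], stride)]   # => 6
```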
NMatrix Architecture
(architecture diagram: MRI vs JRuby backends)
N-dimensional Matrix
Elementwise Operation
● [:add, :subtract, :sin, :gamma]
● Iterate through the elements.
● Access each element, apply the operation, and return the result.
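The three steps above can be sketched in plain Ruby (this is only an illustration; the actual NMatrix-JRuby code walks a Java double[] store):

```ruby
# Iterate through the flat store, apply the operation to each element,
# and collect the results into a new store.
def elementwise(store, op)
  store.map { |x| op.call(x) }
end

a = [1.0, 4.0, 9.0]
elementwise(a, ->(x) { Math.sqrt(x) })  # => [1.0, 2.0, 3.0]
```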
Challenges
● Autoboxing and multiple data types
● Minimise copying of data
Errors that can’t be reproduced :p
[ 0.11, 0.05, 0.34, 0.14 ]
+ [ 0.21, 0.05, 0.14, 0.14 ]
= [ 0, 0, 0, 0 ]

([ 0.11, 0.05, 0.34, 0.14 ] + 5)
+ ([ 0.21, 0.05, 0.14, 0.14 ] + 5)
- 10
= [ 0.32, 0.1, 0.48, 0.28 ]
Autoboxing
● :float64 => double only
● Strict dtypes => create the data type on the Java side; we can't rely on reflection.
● @s = Array.new()
● @s = Java::double[rows*cols].new()
Autoboxing and Enumerators
def each_with_indices
  nmatrix = create_dummy_nmatrix
  stride = get_stride(self)
  offset = 0
  coords = Array.new(dim) { 0 }
  shape_copy = Array.new(dim)
  (0...size).each do |k|
    dense_storage_coords(nmatrix, k, coords, stride, offset)
    slice_index = dense_storage_pos(coords, stride)
    ary = Array.new
    if @dtype == :object
      ary << self.s[slice_index]
    else
      ary << self.s.toArray.to_a[slice_index]
    end
    (0...dim).each do |p|
      ary << coords[p]
    end
    yield(ary)
  end if block_given?
  return nmatrix
end
Minimise copying of data
● Make sure you don’t make copies of data.
● Pass-by-Reference in action:
○ Use static methods as helpers.
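The pass-by-reference point above can be sketched with a hypothetical helper (not NMatrix's actual code): operate on the store that was passed in rather than allocating a copy.

```ruby
# Mutate the backing store in place; no new array is allocated.
def scale_in_place!(store, factor)
  store.each_index { |i| store[i] *= factor }
  store  # same object that was passed in
end

s = [1.0, 2.0, 3.0]
scale_in_place!(s, 2.0)
s  # => [2.0, 4.0, 6.0]
```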
2-dimensional Matrix
2-dimensional Matrix Operations
● [:dot, :det, :factorize_lu]
● In NMatrix-MRI, BLAS-III and LAPACK routines are implemented using their
respective libraries.
● NMatrix-JRuby depends on Java functions.
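For reference, here is what :dot computes, as a naive plain-Ruby sketch (NMatrix-JRuby delegates the real work to Java; this is only to illustrate the operation):

```ruby
# Naive matrix multiplication over nested arrays: result[i][j] is the
# dot product of row i of a with column j of b.
def dot(a, b)
  rows  = a.length
  cols  = b[0].length
  inner = b.length
  Array.new(rows) do |i|
    Array.new(cols) do |j|
      (0...inner).sum { |k| a[i][k] * b[k][j] }
    end
  end
end

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
dot(a, b)  # => [[19, 22], [43, 50]]
```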
Challenges
● Converting a 1-D array to a 2-D array
● Array size and element access
● Speed and memory required
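The first challenge above can be sketched in plain Ruby: each_slice re-nests the flat row-major store. Note that this copies the data; the Java-side code below avoids the copy by allocating double[rows][cols] directly.

```ruby
# Re-nest a flat row-major store into rows of `cols` elements each.
flat = [1, 2, 3, 4, 5, 6]
rows, cols = 2, 3
matrix = flat.each_slice(cols).to_a
# => [[1, 2, 3], [4, 5, 6]]
```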
Ruby Code
b = Java::double[15_000, 15_000].new
c = Java::double[15_000, 15_000].new

index = 0
puts Benchmark.measure {
  (0...15_000).each do |i|
    (0...15_000).each do |j|
      b[i][j] = index
      index += 1
    end
  end
}
# 43.260000 3.250000 46.510000 (39.606356)

index = 0
puts Benchmark.measure {
  (0...15_000).each do |i|
    (0...15_000).each do |j|
      c[i][j] = b[i][j]
      index += 1
    end
  end
}
# 67.790000 0.070000 67.860000 (65.126546)
# RAM consumed => 5.4GB
Java Code
public class MatrixGenerator {
  public static void test2() {
    for (int index = 0, i = 0; i < row; i++) {
      for (int j = 0; j < col; j++) {
        c[i][j] = b[i][j];
        index++;
      }
    }
  }
}

puts Benchmark.measure { MatrixGenerator.test2 }
# 0.034000 0.001000 00.034000 (00.03300)
# RAM consumed => 300MB

public class MatrixGenerator {
  public static void test1() {
    double[][] b = new double[15000][15000];
    double[][] c = new double[15000][15000];
    for (int index = 0, i = 0; i < row; i++) {
      for (int j = 0; j < col; j++) {
        b[i][j] = index;
        index++;
      }
    }
  }
}

puts Benchmark.measure { MatrixGenerator.test1 }
# 0.032000 0.001000 00.032000 (00.03100)
Results
Improves:
● Speed by roughly 1000 times
● Memory footprint by roughly 10 times
Mixed models
● After NMatrix for doubles was ready, I tested it with mixed_models.
Benchmarking NMatrix functionalities
System Specifications
● CPU: AMD FX8350 Octacore 4.2GHz
● RAM: 16GB
Addition
Subtraction
Gamma
Matrix Multiplication
Determinant
Factorization
Benchmark conclusion
● NMatrix-JRuby is dramatically faster for elementwise operations on N-dimensional matrices.
● NMatrix-MRI is faster for 2-dimensional matrices in matrix multiplication, determinant calculation, and factorization.
Improvements
● Make NMatrix-JRuby faster than NMatrix-MRI using BLAS level-3 and
LAPACK routines.
● How?
● Why not JBlas?
MRI
JRuby
Future Work
● Add support for complex dtype.
● Convert NMatrix-JRuby Enumerators to Java code.
● Add sparse support.
Am I done?
Nope!
Enter GPU
A General-Purpose GPU library
● Combine the beauty of Ruby with transparent GPU processing.
● This will work both on client computers and on servers that use NVIDIA Tesla and Intel Xeon Phi solutions.
● Developer activity and support for the current projects is mixed at best, and they are hard to use: they involve writing kernels and demand a lot of effort on buffer/RAM optimisation.
ArrayFire-rb
● Wraps ArrayFire library
ArrayFire
● ArrayFire is an open-source GPGPU library written in C++ that uses JIT compilation.
● ArrayFire supports CUDA-capable NVIDIA GPUs, OpenCL devices, and a CPU backend.
● It abstracts away the difficult task of writing kernels for multiple architectures, handling memory management, tuning, and optimisation.
Using ArrayFire
MRI
● C extension
● Architecture is inspired by NMatrix and NArray
● The C++ function is placed in a namespace (e.g., namespace arf { }) or is declared static if possible. The C function receives the prefix arf_, e.g., arf_matmul() (this function also happens to be static).
● C macros are capitalized and generally carry the prefix ARF_, as with ARF_DTYPE().
● C functions (and macros, for consistency) are placed within extern "C" { } blocks to turn off C++ name mangling.
● C macros (in extern blocks) may represent C++ constants (which are always …
#include <ruby.h>

typedef struct AF_STRUCT {
  size_t ndims;
  size_t count;
  size_t* dimension;
  double* array;
} afstruct;

static VALUE arf_matmul(VALUE self, VALUE left_val, VALUE right_val);

void Init_arrayfire() {
  ArrayFire = rb_define_module("ArrayFire");
  Blas = rb_define_class_under(ArrayFire, "BLAS", rb_cObject);
  rb_define_singleton_method(Blas, "matmul", (METHOD)arf_matmul, 2);
}

static VALUE arf_matmul(VALUE self, VALUE left_val, VALUE right_val) {
  afstruct* left;
  afstruct* right;
  afstruct* result = ALLOC(afstruct);

  Data_Get_Struct(left_val, afstruct, left);
  Data_Get_Struct(right_val, afstruct, right);

  result->ndims = left->ndims;
  size_t dimension[2];
  dimension[0] = left->dimension[0];
  dimension[1] = right->dimension[1];
  size_t count = dimension[0] * dimension[1];
  result->dimension = dimension;
  result->count = count;

  arf::matmul(result, left, right);

  return Data_Wrap_Struct(CLASS_OF(left_val), NULL, arf_free, result);
}
#include <arrayfire.h>

namespace arf {
  using namespace af;

  static void matmul(afstruct *result, afstruct *left, afstruct *right)
  {
    array l = array(left->dimension[0], left->dimension[1], left->array);
    array r = array(right->dimension[0], right->dimension[1], right->array);
    array res = matmul(l, r);
    result->array = res.host<double>();
  }
}

extern "C" {
  #include "arrayfire.c"
}
JRuby
● The approach is the same as for NMatrix-JRuby.
● Java Native Interface (JNI)
● Work on ArrayFire-Java.
● Place 'libaf.so' in the load path.
require 'ext/vendor/ArrayFire.jar'

class Af_Array
  attr_accessor :dims, :elements

  def matmul(other)
    Blas.matmul(self.arr, other)
  end
end
Benchmarking ArrayFire
System Specification
CPU: AMD FX Octacore 4.2GHz
RAM: 16GB
GPU: Nvidia GTX 750Ti
GPU RAM: 4GB GDDR5
Matrix Addition
Matrix Multiplication
Matrix Determinant
Factorization
Transparency
● Integrate with NArray
● Integrate with NMatrix
● Integrate with Rails
Applications
● Endless possibilities ;)
● Bioinformatics
● Integrate Tensorflow
● Image Processing
● Computational Fluid Dynamics
Conclusion
Useful Links
● https://github.com/sciruby/nmatrix
● https://github.com/arrayfire/arrayfire-rb
● https://github.com/prasunanand/arrayfire-rb/tree/temp
Acknowledgements
1. Pjotr Prins
2. Charles Nutter
3. John Woods
4. Alexej Gossmann
5. Sameer Deshmukh
6. Pradeep Garigipati
Thank You
Github: prasunanand
Twitter: @prasun_anand
Blog: prasunanand.com

FOSDEM 2017: Scientific Computing on JRuby
