
JobLib: Running Python function as pipeline jobs

Lightning talk at VII PythonBrasil, São Paulo, SP, about the joblib framework:

http://packages.python.org/joblib/


  1. JobLib: Running Python function as pipeline jobs
     Marcel Caraciolo, @marcelcaraciolo
  2. JobLib: Set of tools to provide lightweight pipelining in Python
     http://packages.python.org/joblib/

         easy_install joblib
  3. JobLib: Avoid computing the same thing twice

         >>> from joblib import Memory
         >>> mem = Memory(cachedir='/tmp/joblib')
         >>> import numpy as np
         >>> a = np.vander(np.arange(3))
         >>> square = mem.cache(np.square)
         >>> b = square(a)
         ________________________________________________________________________________
         [Memory] Calling square...
         square(array([[0, 0, 1],
                [1, 1, 1],
                [4, 2, 1]]))
         ___________________________________________________________square - 0.0s, 0.0min
         >>> c = square(a)
         >>> # The above call did not trigger an evaluation

     Memoize pattern with fast disk caching.
  4. JobLib: Use cases

         >>> import numpy as np
         >>> @memory.cache
         ... def g(x):
         ...     print('A long-running calculation, with parameter', x)
         ...     return np.hamming(x)
         >>> @memory.cache
         ... def h(x):
         ...     print('A second long-running calculation, using g(x)')
         ...     return np.vander(x)
         >>> a = g(3)
         A long-running calculation, with parameter 3
         >>> a
         array([ 0.08,  1.  ,  0.08])
         >>> g(3)
         array([ 0.08,  1.  ,  0.08])
         >>> b = h(a)
         A second long-running calculation, using g(x)
         >>> b2 = h(a)
         >>> b2
         array([[ 0.0064,  0.08  ,  1.    ],
                [ 1.    ,  1.    ,  1.    ],
                [ 0.0064,  0.08  ,  1.    ]])
         >>> np.allclose(b, b2)
         True

     NumPy arrays are supported as arguments and return values.
  5. JobLib: Benchmarks - Fibonacci

         In [3]: timeit normal_fib(30)
         100 loops, best of 3: 576 ms per loop

     After caching ...

         In [9]: timeit fib(30)
         1000 loops, best of 3: 262 us per loop
  6. JobLib: Transparent parallelization using the multiprocessing module

     Before

         >>> from math import sqrt
         >>> [sqrt(i**2) for i in range(10)]
         [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

     After

         >>> from math import sqrt
         >>> from joblib import Parallel, delayed
         >>> Parallel(n_jobs=2, verbose=1)(delayed(sqrt)(i**2) for i in range(10))
         [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
  7. JobLib: Running Python function as pipeline jobs
     http://packages.python.org/joblib/index.html
     Marcel Caraciolo, @marcelcaraciolo
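
The Fibonacci benchmark on slide 5 does not show how `normal_fib` and `fib` were defined. A minimal sketch of what a disk-cached version might look like follows. It assumes a recent joblib release, where the cache directory is passed as `location` rather than the `cachedir` argument used on the slides. The temporary directory and the `evaluations` list are illustrative additions, not part of the talk.

```python
import tempfile

from joblib import Memory

# Assumption: a throwaway cache directory (the slides used /tmp/joblib).
cache_dir = tempfile.mkdtemp(prefix="joblib_fib_")
memory = Memory(location=cache_dir, verbose=0)

# Illustrative bookkeeping: records each argument that is really computed.
evaluations = []


@memory.cache
def fib(n):
    """Naive recursive Fibonacci; the recursive calls also go through the cache."""
    evaluations.append(n)
    return n if n < 2 else fib(n - 1) + fib(n - 2)


result = fib(30)                       # computes each n in 0..30 once
evaluations_first = len(evaluations)

result_again = fib(30)                 # served from the on-disk cache
evaluations_second = len(evaluations)  # unchanged: nothing was recomputed

print(result, evaluations_first, evaluations_second)
```

Because the recursion calls the cached `fib`, each argument is evaluated once and later calls are read back from disk, which is the kind of effect behind the speed-up reported on the slide.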
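
The before/after comparison on slide 6 can also be written as a small script that checks both versions agree. This is only a sketch: the helper name `parallel_roots` is hypothetical, and `n_jobs=2` is taken from the slide rather than tuned for any particular machine.

```python
from math import sqrt

from joblib import Parallel, delayed


def parallel_roots(n_items, n_jobs=2):
    """Hypothetical helper: compute sqrt(i**2) for i in range(n_items) in parallel."""
    # delayed(sqrt)(x) captures the call without running it;
    # Parallel dispatches the captured calls to the workers
    # and returns the results in input order.
    return Parallel(n_jobs=n_jobs)(delayed(sqrt)(i ** 2) for i in range(n_items))


sequential = [sqrt(i ** 2) for i in range(10)]  # the "Before" version
parallel = parallel_roots(10)                   # the "After" version

print(parallel == sequential)
```

For a function as cheap as `sqrt`, the cost of dispatching work to other processes probably outweighs any gain, so the pattern pays off mainly when each call is expensive.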
