JobLib: Running Python function as pipeline jobs

Lightning talk at VII PythonBrasil, São Paulo, SP, about the joblib framework:

http://packages.python.org/joblib/

Published in: Technology


  1. JobLib: Running Python function as pipeline jobs. Marcel Caraciolo, @marcelcaraciolo
  2. JobLib: a set of tools to provide lightweight pipelining in Python. http://packages.python.org/joblib/ Install with: easy_install joblib
  3. JobLib: avoid computing the same thing twice. Memoize pattern with fast disk caching.

         >>> from joblib import Memory
         >>> mem = Memory(cachedir='/tmp/joblib')
         >>> import numpy as np
         >>> a = np.vander(np.arange(3))
         >>> square = mem.cache(np.square)
         >>> b = square(a)
         ________________________________________________________________________________
         [Memory] Calling square...
         square(array([[0, 0, 1],
                [1, 1, 1],
                [4, 2, 1]]))
         ___________________________________________________________square - 0.0s, 0.0min
         >>> c = square(a)
         >>> # The above call did not trigger an evaluation
  4. JobLib: use cases. NumPy arrays are supported. Here memory is a Memory instance, like mem on the previous slide.

         >>> import numpy as np
         >>> @memory.cache
         ... def g(x):
         ...     print('A long-running calculation, with parameter', x)
         ...     return np.hamming(x)
         >>> @memory.cache
         ... def h(x):
         ...     print('A second long-running calculation, using g(x)')
         ...     return np.vander(x)
         >>> a = g(3)
         A long-running calculation, with parameter 3
         >>> a
         array([ 0.08,  1.  ,  0.08])
         >>> g(3)
         array([ 0.08,  1.  ,  0.08])
         >>> b = h(a)
         A second long-running calculation, using g(x)
         >>> b2 = h(a)
         >>> b2
         array([[ 0.0064,  0.08  ,  1.    ],
                [ 1.    ,  1.    ,  1.    ],
                [ 0.0064,  0.08  ,  1.    ]])
         >>> np.allclose(b, b2)
         True
  5. JobLib: benchmarks (Fibonacci).

         In [3]: timeit normal_fib(30)
         100 loops, best of 3: 576 ms per loop

     After caching:

         In [9]: timeit fib(30)
         1000 loops, best of 3: 262 us per loop
  6. JobLib: transparent parallelization using the multiprocessing module.

     Before:

         >>> from math import sqrt
         >>> [sqrt(i**2) for i in range(10)]
         [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

     After:

         >>> from math import sqrt
         >>> from joblib import Parallel, delayed
         >>> Parallel(n_jobs=2, verbose=1)(delayed(sqrt)(i**2) for i in range(10))
         [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
  7. JobLib: Running Python function as pipeline jobs. http://packages.python.org/joblib/index.html Marcel Caraciolo, @marcelcaraciolo
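
The memoize pattern from slides 3 and 4 can be checked with a small self-contained script. This is a minimal sketch, not the speaker's code: the function slow_square and the calls list are hypothetical names introduced for the demo, and it assumes a recent joblib release, where the cache directory is passed as location (the slides use the older cachedir spelling).

```python
import tempfile

from joblib import Memory

# Assumption: recent joblib releases take the cache directory as `location`;
# the slides were written against an older release that used `cachedir`.
cache_dir = tempfile.mkdtemp()
memory = Memory(location=cache_dir, verbose=0)

# Hypothetical helper for this demo: records every real evaluation.
calls = []


@memory.cache
def slow_square(x):
    """Stand-in for an expensive computation."""
    calls.append(x)
    return x * x


first = slow_square(12)   # computed and written to the disk cache
second = slow_square(12)  # same argument: loaded from the cache
third = slow_square(5)    # new argument: computed again
```

After the three calls, calls holds only the arguments that triggered a real evaluation, which is one way to confirm that the second call was served from disk.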

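The Fibonacci benchmark on slide 5 does not show its function definitions, so the sketch below is an assumption about what normal_fib and fib might look like: a naive recursive function and the same function wrapped with Memory.cache. The evaluations list is a hypothetical counter added for the demo, and location again assumes a recent joblib release. It counts evaluations rather than measuring time, since timings depend on the machine.

```python
import tempfile

from joblib import Memory

# Assumption: `location` is the cache directory argument in recent joblib.
memory = Memory(location=tempfile.mkdtemp(), verbose=0)

# Hypothetical counter: one entry per real evaluation of the cached function.
evaluations = []


def normal_fib(n):
    """Naive recursive Fibonacci, recomputing every subproblem."""
    return n if n < 2 else normal_fib(n - 1) + normal_fib(n - 2)


@memory.cache
def fib(n):
    """Same recursion, but each recursive call goes through the disk cache."""
    evaluations.append(n)
    return n if n < 2 else fib(n - 1) + fib(n - 2)


uncached = normal_fib(15)
cached = fib(15)
evals_first_call = len(evaluations)    # one evaluation per n in 0..15

fib(15)                                # repeated call: nothing is recomputed
evals_after_repeat = len(evaluations)
```

Because the recursive calls resolve to the cached wrapper, each value of n is evaluated only once, and a repeated call does no new work at all.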
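
The before/after comparison on slide 6 can be wrapped in a function and checked against the sequential result. This is a minimal sketch: parallel_roots and sequential_roots are hypothetical names, and only the Parallel and delayed calls come from the slide.

```python
from math import sqrt

from joblib import Parallel, delayed


def sequential_roots(n):
    """The 'before' version from the slide: a plain list comprehension."""
    return [sqrt(i ** 2) for i in range(n)]


def parallel_roots(n, n_jobs=2):
    """The 'after' version: the same calls dispatched to n_jobs workers."""
    return Parallel(n_jobs=n_jobs)(delayed(sqrt)(i ** 2) for i in range(n))


expected = sequential_roots(10)
roots = parallel_roots(10)  # results come back in input order
```

For a function as cheap as sqrt, the cost of starting workers will likely outweigh any gain; the pattern pays off when each call is expensive.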