
Robustifying concurrent.futures, Thomas Moreau


Published at PyParis 2017
http://pyparis.org


Robustifying concurrent.futures, Thomas Moreau

  1. Robustifying concurrent.futures Thomas Moreau - Olivier Grisel
  2. Embarrassingly parallel computation in python using a pool of workers. Three APIs are available: multiprocessing : first implementation. concurrent.futures : reimplementation using multiprocessing under the hood. loky : robustification of concurrent.futures .
  3. The concurrent.futures API
  4. from concurrent.futures import ThreadPoolExecutor def fit_model(params): # Heavy computation return model fit_model is the function that we want to run asynchronously. The Executor : a worker pool
  5. from concurrent.futures import ThreadPoolExecutor def fit_model(params): # Heavy computation return model # Create an executor with 4 threads with ThreadPoolExecutor(max_workers=4) as executor: fit_model is the function that we want to run asynchronously. We instantiate a ThreadPoolExecutor with 4 threads. The Executor : a worker pool
  6. from concurrent.futures import ThreadPoolExecutor def fit_model(params): # Heavy computation return model # Create an executor with 4 threads with ThreadPoolExecutor(max_workers=4) as executor: # Submit an asynchronous job and return a Future future1 = executor.submit(fit_model, param1) fit_model is the function that we want to run asynchronously. We instantiate a ThreadPoolExecutor with 4 threads. A new job is submitted to the Executor . The Future object future1 holds the state of the computation. The Executor : a worker pool
  7. from concurrent.futures import ThreadPoolExecutor def fit_model(params): # Heavy computation return model # Create an executor with 4 threads with ThreadPoolExecutor(max_workers=4) as executor: # Submit an asynchronous job and return a Future future1 = executor.submit(fit_model, param1) # Submit other job future2 = executor.submit(fit_model, param2) fit_model is the function that we want to run asynchronously. We instantiate a ThreadPoolExecutor with 4 threads. A new job is submitted to the Executor . The Future object future1 holds the state of the computation. The Executor : a worker pool
  8. from concurrent.futures import ThreadPoolExecutor def fit_model(params): # Heavy computation return model # Create an executor with 4 threads with ThreadPoolExecutor(max_workers=4) as executor: # Submit an asynchronous job and return a Future future1 = executor.submit(fit_model, param1) # Submit other job future2 = executor.submit(fit_model, param2) # Run other computation ... fit_model is the function that we want to run asynchronously. We instantiate a ThreadPoolExecutor with 4 threads. A new job is submitted to the Executor . The Future object future1 holds the state of the computation. The Executor : a worker pool
  9. from concurrent.futures import ThreadPoolExecutor def fit_model(params): # Heavy computation return model # Create an executor with 4 threads with ThreadPoolExecutor(max_workers=4) as executor: # Submit an asynchronous job and return a Future future1 = executor.submit(fit_model, param1) # Submit other job future2 = executor.submit(fit_model, param2) # Run other computation ... # Blocking call, wait and return the result model1 = future1.result(timeout=None) model2 = future2.result(timeout=None) fit_model is the function that we want to run asynchronously. We instantiate a ThreadPoolExecutor with 4 threads. A new job is submitted to the Executor . The Future object future1 holds the state of the computation. Wait for the computation to end and return the result with f.result . The Executor : a worker pool
  10. from concurrent.futures import ThreadPoolExecutor def fit_model(params): # Heavy computation return model # Create an executor with 4 threads with ThreadPoolExecutor(max_workers=4) as executor: # Submit an asynchronous job and return a Future future1 = executor.submit(fit_model, param1) # Submit other job future2 = executor.submit(fit_model, param2) # Run other computation ... # Blocking call, wait and return the result model1 = future1.result(timeout=None) model2 = future2.result(timeout=None) # The resources have been cleaned up print(model1, model2) fit_model is the function that we want to run asynchronously. We instantiate a ThreadPoolExecutor with 4 threads. A new job is submitted to the Executor . The Future object future1 holds the state of the computation. Wait for the computation to end and return the result with f.result . The resources are cleaned up. The Executor : a worker pool
  11. from concurrent.futures import ThreadPoolExecutor def fit_model(params): # Heavy computation return model # Create an executor with 4 threads with ThreadPoolExecutor(max_workers=4) as executor: # Submit an asynchronous job and return a Future future1 = executor.submit(fit_model, param1) # Submit other job future2 = executor.submit(fit_model, param2) # Run other computation ... # Blocking call, wait and return the result model1 = future1.result(timeout=None) model2 = future2.result(timeout=None) # The resources have been cleaned up print(model1, model2) fit_model is the function that we want to run asynchronously. We instantiate a ThreadPoolExecutor with 4 threads. A new job is submitted to the Executor . The Future object future1 holds the state of the computation. Wait for the computation to end and return the result with f.result . The resources are cleaned up. The Executor : a worker pool Submitting more than one job returns an iterator: executor.map
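
A minimal runnable sketch of the executor.map variant mentioned just above; fit_model and param_grid here are placeholders, not the code from the slides:

    from concurrent.futures import ThreadPoolExecutor

    def fit_model(params):
        # Placeholder for the heavy computation shown on the slides
        return params

    # Hypothetical parameter grid, for illustration only
    param_grid = [{"alpha": a} for a in (0.1, 1.0, 10.0)]

    with ThreadPoolExecutor(max_workers=4) as executor:
        # executor.map returns an iterator over the results,
        # yielded in the order the arguments were submitted
        for model in executor.map(fit_model, param_grid):
            print(model)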
  12. Blocking methods f.result(timeout=None) and f.exception(timeout=None) wait for the computation to be done. The Future object: an asynchronous result state. States Future objects hold the state of the asynchronous computation, which can be in one of 4 states: Not started, Running, Cancelled and Done. The state of a Future can be checked using f.running() , f.cancelled() , f.done() .
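
A small sketch of these blocking and state-checking calls in practice (slow_square is a made-up stand-in for a heavy computation):

    import time
    from concurrent.futures import ThreadPoolExecutor

    def slow_square(x):
        time.sleep(1)      # stands in for a heavy computation
        return x * x

    with ThreadPoolExecutor(max_workers=2) as executor:
        future = executor.submit(slow_square, 3)
        print(future.running(), future.done())   # typically: True False
        print(future.result(timeout=10))         # blocks, then prints 9
        print(future.exception())                # None, since the call succeeded
        print(future.done())                     # True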
  13. from concurrent.futures import ThreadPoolExecutor def fit_model(params): # Heavy computation return model # Create an executor with 4 threads with ThreadPoolExecutor(max_workers=4) as executor: # Submit an asynchronous job and return a Future future1 = executor.submit(fit_model, param1) # Submit other job future2 = executor.submit(fit_model, param2) # Run other computation ... # Blocking call, wait and return the result model1 = future1.result(timeout=None) model2 = future2.result(timeout=None) # The resources have been cleaned up print(model1, model2) The Executor : a worker pool
  21. Choosing the type of worker: Thread or Process?
  22. Running on multiple cores Python GIL The internal implementation of the python interpreter relies on a "Global Interpreter Lock" (GIL), protecting concurrent access to python objects: Only one thread can acquire it. Not designed for efficient multicore computing. Global lock every time we access a python object. Released when performing long I/O operations or by some libraries (e.g. numpy, OpenMP, ...).
  23. Real system threads: pthread, windows threads. All the computations are done within a single interpreter. Advantages: Fast spawning. Reduced memory overhead. No communication overhead (shared python objects). Thread
  24. Real system threads: pthread, windows threads. All the computations are done within a single interpreter. Advantages: Fast spawning. Reduced memory overhead. No communication overhead (shared python objects). Thread Wait... shared python objects and a single interpreter?!?
  25. Real system threads: pthread, windows threads. All the computations are done within a single interpreter. Advantages: Fast spawning. Reduced memory overhead. No communication overhead (shared python objects). Thread Wait... shared python objects and a single interpreter?!? There is only one GIL!
  26. Thread Multiple threads running python code: this is no quicker than sequential execution, even on a multicore machine.
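
A quick way to see this effect (count is a toy CPU-bound function; exact timings depend on the machine, but on CPython the two runs are typically comparable):

    import time
    from concurrent.futures import ThreadPoolExecutor

    def count(n=5_000_000):
        # Pure-python loop: the GIL is held for the whole computation
        while n > 0:
            n -= 1

    t0 = time.perf_counter()
    count(); count()
    print("sequential :", time.perf_counter() - t0)

    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=2) as executor:
        for f in [executor.submit(count), executor.submit(count)]:
            f.result()
    print("two threads:", time.perf_counter() - t0)
    # The GIL prevents the two threads from running python bytecode in parallel,
    # so the threaded version is not faster.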
  27. Thread Threads hold the GIL when running python code. They release it when blocking for I/O, or when calling into some C libraries.
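
Using time.sleep as a stand-in for a blocking call that releases the GIL, four tasks spread over four threads take roughly the time of one:

    import time
    from concurrent.futures import ThreadPoolExecutor

    def io_task():
        # time.sleep releases the GIL, like most blocking I/O calls
        # and like numpy or OpenMP sections in compiled extensions
        time.sleep(1)

    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as executor:
        for f in [executor.submit(io_task) for _ in range(4)]:
            f.result()
    print("4 blocking tasks in 4 threads:", time.perf_counter() - t0)  # ~1s, not ~4s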
  28. Create a new python interpreter per worker. Each worker runs in its own interpreter. Drawbacks: Slow spawning. Higher memory overhead. Higher communication overhead. Process
  29. Create a new python interpreter per worker. Each worker runs in its own interpreter. Drawbacks: Slow spawning. Higher memory overhead. Higher communication overhead. Process But there is no GIL! The computations can be done in parallel, even for python code.
  30. Create a new python interpreter per worker. Each worker runs in its own interpreter. Drawbacks: Slow spawning. Higher memory overhead. Higher communication overhead. Process But there is no GIL! The computations can be done in parallel, even for python code. Methods to create a new interpreter: fork or spawn
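
A sketch of selecting the start method from the standard library. Note that the mp_context argument of ProcessPoolExecutor only exists from Python 3.7 on; on Python 3.6 (the version shown in these slides) one would call multiprocessing.set_start_method("spawn") once at program start instead:

    import multiprocessing as mp
    from concurrent.futures import ProcessPoolExecutor

    def square(x):
        return x * x

    if __name__ == "__main__":
        # Ask for the 'spawn' start method explicitly; the platform default
        # is 'fork' on most Unix platforms and 'spawn' on Windows.
        ctx = mp.get_context("spawn")
        with ProcessPoolExecutor(max_workers=4, mp_context=ctx) as executor:
            print(list(executor.map(square, range(4))))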
  31. Advantages: Low spawning overhead. The interpreter starts warm (imports already done). Drawbacks: Bad interaction with multithreaded programs. Does not respect the POSIX specifications ⇒ some libraries crash: numpy on OSX, OpenMP, ... Launching a new interpreter: fork Duplicate the current interpreter. (Only available on UNIX)
  32. Advantages: Safe (respects POSIX). Fresh interpreter without extra libraries. Drawbacks: Slower to start. Need to reload the libraries, redefine the functions... Launching a new interpreter: spawn Create a new interpreter from scratch.
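
A small illustration of the re-import behaviour of spawn, and of why the usual if __name__ == "__main__" guard is needed (the file name is made up):

    # spawn_demo.py -- hypothetical module name, for illustration
    import os
    import multiprocessing as mp

    print("module imported in process", os.getpid())

    def who_am_i(_):
        return os.getpid()

    if __name__ == "__main__":
        # With 'spawn', each worker starts a fresh interpreter and re-imports
        # this module, so the line above is printed once per worker. Without
        # the __main__ guard, the workers would also re-run the pool creation.
        ctx = mp.get_context("spawn")
        with ctx.Pool(processes=2) as pool:
            print(sorted(set(pool.map(who_am_i, range(8)))))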
  33. Comparison between Thread and Process. Efficient multicore run: Process (fork) and Process (spawn), but not Thread. No communication overhead: Thread only. POSIX safe: Thread and Process (spawn). No spawning overhead: Thread and Process (fork).
  34. Comparison between Thread and Process. Efficient multicore run: Process (fork) and Process (spawn), but not Thread. No communication overhead: Thread only. POSIX safe: Thread and Process (spawn). No spawning overhead: Thread and Process (fork). ⇒ Hide the spawning overhead by reusing the pool of processes.
  35. Reusing a ProcessPoolExecutor .
  36. Reusing a ProcessPoolExecutor . To avoid the spawning overhead, reuse a previously started ProcessPoolExecutor . The spawning overhead is only paid once. This is easy using a global pool of processes (sketched below). Main issue: is that robust?
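
A naive sketch of such a global pool; get_executor and run_many are made-up helpers, and this is not how loky does it. In particular nothing here detects a broken executor, which is exactly the issue on the next slides:

    from concurrent.futures import ProcessPoolExecutor

    # Module-level executor, started once and then reused by all callers
    _executor = None

    def get_executor(max_workers=4):
        global _executor
        if _executor is None:
            _executor = ProcessPoolExecutor(max_workers=max_workers)
        return _executor

    def run_many(func, args_list):
        executor = get_executor()          # spawning cost is paid only once
        futures = [executor.submit(func, a) for a in args_list]
        return [f.result() for f in futures]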
  37. Managing the state of the executor Example deadlock >>> from concurrent.futures import ProcessPoolExecutor >>> with ProcessPoolExecutor(max_workers=4) as e: ... e.submit(lambda: 1).result() ... Traceback (most recent call last): File "/usr/lib/python3.6/multiprocessing/queues.py", line 241, in _feed obj = _ForkingPickler.dumps(obj) File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps cls(buf, protocol).dump(obj) _pickle.PicklingError: Can't pickle <function <lambda> at 0x7fcff0184d08>: attribute lookup <lambda> on __main__ failed ^C It can be tricky to know which submit call crashed the Executor .
  38. Managing the state of the executor Example deadlock >>> from concurrent.futures import ProcessPoolExecutor >>> with ProcessPoolExecutor(max_workers=4) as e: ... e.submit(lambda: 1) ... Traceback (most recent call last): File "/usr/lib/python3.6/multiprocessing/queues.py", line 241, in _feed obj = _ForkingPickler.dumps(obj) File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps cls(buf, protocol).dump(obj) _pickle.PicklingError: Can't pickle <function <lambda> at 0x7f5c787bd488>: attribute lookup <lambda> on __main__ failed ^C Even worse, shutdown itself is deadlocked.
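
With the stock ProcessPoolExecutor, the usual workaround is to submit a module-level function, which can be pickled by reference; loky also ships cloudpickle-based serialization so that lambdas and interactively defined functions can be submitted. A minimal sketch of the workaround:

    from concurrent.futures import ProcessPoolExecutor

    def one():
        # A module-level function can be pickled by reference,
        # unlike the lambda submitted on the slides above
        return 1

    if __name__ == "__main__":
        with ProcessPoolExecutor(max_workers=4) as executor:
            print(executor.submit(one).result())   # prints 1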
  39. Reusable pool of workers: loky .
  40. loky : a robust pool of workers >>> from loky import ProcessPoolExecutor >>> class CrashAtPickle(object): ... """Bad object that triggers a segfault at unpickling time.""" ... def __reduce__(self): ... raise RuntimeError() ... >>> with ProcessPoolExecutor(max_workers=4) as e: ... e.submit(CrashAtPickle()).result() ... Traceback (most recent call last): ... RuntimeError Traceback (most recent call last): ... BrokenExecutor: The QueueFeederThread was terminated abruptly while feeding a new job. This can be due to a job pickling error. >>> Return and raise a user-friendly exception. Fix some other deadlocks.
  41. >>> from loky import get_reusable_executor >>> executor = get_reusable_executor(max_workers=4) >>> print(executor.executor_id) 0 >>> executor.submit(id, 42).result() 139655595838272 >>> executor = get_reusable_executor(max_workers=4) >>> print(executor.executor_id) 0 >>> executor.submit(CrashAtUnpickle()).result() Traceback (most recent call last): ... BrokenExecutorError >>> executor = get_reusable_executor(max_workers=4) >>> print(executor.executor_id) 1 >>> executor.submit(id, 42).result() 139655595838272 Create a ProcessPoolExecutor using the factory function get_reusable_executor . A reusable ProcessPoolExecutor
  42. >>> from loky import get_reusable_executor >>> executor = get_reusable_executor(max_workers=4) >>> print(executor.executor_id) 0 >>> executor.submit(id, 42).result() 139655595838272 >>> executor = get_reusable_executor(max_workers=4) >>> print(executor.executor_id) 0 >>> executor.submit(CrashAtUnpickle()).result() Traceback (most recent call last): ... BrokenExecutorError >>> executor = get_reusable_executor(max_workers=4) >>> print(executor.executor_id) 1 >>> executor.submit(id, 42).result() 139655595838272 Create a ProcessPoolExecutor using the factory function get_reusable_executor . The executor can be used exactly as a ProcessPoolExecutor . A reusable ProcessPoolExecutor
  43. >>> from loky import get_reusable_executor >>> executor = get_reusable_executor(max_workers=4) >>> print(executor.executor_id) 0 >>> executor.submit(id, 42).result() 139655595838272 >>> executor = get_reusable_executor(max_workers=4) >>> print(executor.executor_id) 0 >>> executor.submit(CrashAtUnpickle()).result() Traceback (most recent call last): ... BrokenExecutorError >>> executor = get_reusable_executor(max_workers=4) >>> print(executor.executor_id) 1 >>> executor.submit(id, 42).result() 139655595838272 Create a ProcessPoolExecutor using the factory function get_reusable_executor . The executor can be used exactly as a ProcessPoolExecutor . When the factory is called elsewhere, reuse the same executor if it is still working. A reusable ProcessPoolExecutor
  44. >>> from loky import get_reusable_executor >>> executor = get_reusable_executor(max_workers=4) >>> print(executor.executor_id) 0 >>> executor.submit(id, 42).result() 139655595838272 >>> executor = get_reusable_executor(max_workers=4) >>> print(executor.executor_id) 0 >>> executor.submit(CrashAtUnpickle()).result() Traceback (most recent call last): ... BrokenExecutorError >>> executor = get_reusable_executor(max_workers=4) >>> print(executor.executor_id) 1 >>> executor.submit(id, 42).result() 139655595838272 Create a ProcessPoolExecutor using the factory function get_reusable_executor . The executor can be used exactly as a ProcessPoolExecutor . When the factory is called elsewhere, reuse the same executor if it is still working. When the executor is broken, automatically re-spawn a new one. A reusable ProcessPoolExecutor
  45. Conclusion
  46. Conclusion Threads can be efficient to run multicore programs if your code releases the GIL. Otherwise, you should use Process with spawn and try to reuse the pool of processes as much as possible. loky can help you do that ;). It improves the management of a pool of workers in projects such as joblib .
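
As an illustration of that last point, recent joblib releases use loky as their process-based backend and reuse the pool of workers across consecutive calls (exact behaviour depends on the joblib version):

    from math import sqrt
    from joblib import Parallel, delayed

    # With the loky backend, the pool of worker processes is kept warm
    # and reused by later Parallel calls instead of being re-spawned.
    results = Parallel(n_jobs=4)(delayed(sqrt)(i ** 2) for i in range(10))
    print(results)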
  47. Thanks for your attention! Slides available at tommoral.github.io/pyparis17/ More on the GIL by Dave Beazley : dabeaz.com/python/GIL.pdf Loky project : github.com/tommoral/loky @tomamoral
