
Strategies and Tools for Parallel Machine Learning in Python


  1. Strategies & Tools for Parallel Machine Learning in Python (PyConFR 2012, Paris, Sunday, September 16, 2012)
  2. The Story: scikit-learn, the Cloud, stackoverflow & kaggle
  3. ...
  4. Parts of the Ecosystem: Single Machine with Multiple Cores (multiprocessing ♥♥♥); Multiple Machines with Multiple Cores (♥♥♥)
  5. The Problem: Big CPU (supercomputers, MPI) means simulating stuff from models; Big Data (Google scale, MapReduce) means counting stuff in logs and indexing the Web; Machine Learning is often somewhere in the middle.
  6. Parallel ML Use Cases • Model Evaluation with Cross Validation • Model Selection with Grid Search • Bagging Models: Random Forests • Averaged Models
  7. Embarrassingly Parallel ML Use Cases • Model Evaluation with Cross Validation • Model Selection with Grid Search • Bagging Models: Random Forests • Averaged Models
  8. Inter-Process Comm. Use Cases • Model Evaluation with Cross Validation • Model Selection with Grid Search • Bagging Models: Random Forests • Averaged Models
  9. Cross Validation: input data, labels to predict
  10. Cross Validation (diagram: the data split into three folds A, B, C)
  11. Cross Validation: a subset of the data is used to train the model; a held-out test set is kept for evaluation
  12. Cross Validation (diagram: the three train/test splits, with C, then B, then A held out in turn)
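The fold rotation sketched above can be written as a small stdlib-only helper (a toy sketch; the function name is mine, not scikit-learn's):

```python
def kfold_indices(n_samples, n_folds=3):
    """Yield (train_indices, test_indices) pairs: each contiguous
    fold serves once as the held-out test set while the remaining
    folds train the model."""
    fold_size = n_samples // n_folds
    indices = list(range(n_samples))
    for k in range(n_folds):
        test = indices[k * fold_size:(k + 1) * fold_size]
        train = indices[:k * fold_size] + indices[(k + 1) * fold_size:]
        yield train, test

# 6 samples, 3 folds: ([2, 3, 4, 5], [0, 1]), ([0, 1, 4, 5], [2, 3]),
# ([0, 1, 2, 3], [4, 5])
for train, test in kfold_indices(6, n_folds=3):
    print(train, test)
```

Each split is independent of the others, which is why cross validation is embarrassingly parallel: every (train, test) pair can be scored in a separate worker.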
  13. Model Selection, the hyperparameters hell: p1 in [1, 10, 100], p2 in [1e3, 1e4, 1e5]; find the combination of parameters that maximizes the cross-validated score.
  14. Grid Search (diagram: p1 on one axis, p2 on the other): (1, 1e3) (10, 1e3) (100, 1e3) / (1, 1e4) (10, 1e4) (100, 1e4) / (1, 1e5) (10, 1e5) (100, 1e5)
  15. (diagram: the same grid of parameter combinations, each cell evaluated independently)
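The 3 × 3 grid above is just the cartesian product of the two parameter lists; a minimal sketch (the scoring function below is a placeholder standing in for a real train/evaluate loop):

```python
from itertools import product

p1_values = [1, 10, 100]
p2_values = [1e3, 1e4, 1e5]

# the 9 (p1, p2) combinations of the grid; each one is
# independent, hence embarrassingly parallel
grid = list(product(p1_values, p2_values))

def cross_validated_score(p1, p2):
    # hypothetical score, peaked at (10, 1e4) for illustration
    return -((p1 - 10) ** 2 + (p2 - 1e4) ** 2)

best = max(grid, key=lambda params: cross_validated_score(*params))
print(best)  # (10, 10000.0)
```

Since each cell only needs the data and its own (p1, p2) pair, the `max` over the grid can be computed by mapping `cross_validated_score` over the combinations in parallel.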
  16. Grid Search: Qualitative Results
  17. Grid Search: Cross Validated Scores
  18. Enough ML theory! Let's go shopping^W parallel computing!
  19. Single Machine with Multiple Cores ♥ ♥ ♥ ♥
  20. multiprocessing:
      >>> from multiprocessing import Pool
      >>> p = Pool(4)
      >>> p.map(type, [1, 2., "3"])
      [int, float, str]
      >>> r = p.map_async(type, [1, 2., "3"])
      >>> r.get()
      [int, float, str]
  21. multiprocessing • Part of the standard lib • Nice API • Cross-platform support (even Windows!) • Some support for shared memory • Support for synchronization (Lock)
  22. multiprocessing: limitations • No docstrings in a stdlib module? WTF? • Tricky / impossible to use the shared memory values with NumPy • Bad support for KeyboardInterrupt
  23. (image slide)
  24. joblib • Transparent disk-caching of the output values and lazy re-evaluation (memoize pattern) • Easy simple parallel computing • Logging and tracing of the execution
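The disk-caching bullet is the memoize pattern applied to function outputs; joblib's real API for this is `joblib.Memory`, but the idea can be sketched with the stdlib only (the helper names below are mine, not joblib's):

```python
import hashlib
import os
import pickle
import tempfile

CACHE_DIR = tempfile.mkdtemp()

def disk_memoize(func):
    """Toy version of the memoize-to-disk pattern: cache the
    output of func on disk, keyed by a hash of its arguments."""
    def wrapper(*args):
        key = hashlib.sha1(pickle.dumps((func.__name__, args))).hexdigest()
        path = os.path.join(CACHE_DIR, key + ".pkl")
        if os.path.exists(path):
            # lazy re-evaluation: reload the cached result instead
            # of recomputing
            with open(path, "rb") as f:
                return pickle.load(f)
        result = func(*args)
        with open(path, "wb") as f:
            pickle.dump(result, f)
        return result
    return wrapper

@disk_memoize
def expensive(x):
    return x ** 2

expensive(4)   # computed and written to disk
expensive(4)   # loaded from the cache on the second call
```

Because the cache lives on disk rather than in memory, results survive across interpreter restarts, which is what makes the pattern useful for long-running ML experiments.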
  25. joblib.Parallel:
      >>> from os.path import join
      >>> from joblib import Parallel, delayed
      >>> Parallel(2)(
      ...     delayed(join)('/ect', s)
      ...     for s in 'abc')
      ['/ect/a', '/ect/b', '/ect/c']
  26. Usage in scikit-learn • Cross Validation: cross_val(model, X, y, n_jobs=4, cv=3) • Grid Search: GridSearchCV(model, n_jobs=4, cv=3).fit(X, y) • Random Forests: RandomForestClassifier(n_jobs=4).fit(X, y)
  27. joblib.Parallel: shared memory
      >>> from joblib import Parallel, delayed
      >>> import numpy as np
      >>> Parallel(2, max_nbytes=1e6)(
      ...     delayed(type)(np.zeros(int(i)))
      ...     for i in [1e4, 1e6])
      [<type 'numpy.ndarray'>, <class 'numpy.core.memmap.memmap'>]
  28. Only 3 allocated datasets shared by all the concurrent workers performing the grid search: (1, 1e3) (10, 1e3) (100, 1e3) / (1, 1e4) (10, 1e4) (100, 1e4) / (1, 1e5) (10, 1e5) (100, 1e5)
  29. Multiple Machines with Multiple Cores ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥
  30. MapReduce? (diagram: mappers turn input chunks into lists of (k, v) pairs, which are shuffled by key to reducers)
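The mapper / shuffle / reducer flow in the diagram can be sketched in-memory with a classic word count (a toy illustration, not a distributed implementation):

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # map phase: emit one (key, value) pair per word
    return [(word, 1) for word in line.split()]

def reducer(key, values):
    # reduce phase: aggregate all values seen for one key
    return (key, sum(values))

lines = ["a b a", "b c"]
pairs = [kv for line in lines for kv in mapper(line)]
# shuffle: bring all pairs with the same key together
pairs.sort(key=itemgetter(0))
counts = dict(reducer(k, [v for _, v in group])
              for k, group in groupby(pairs, key=itemgetter(0)))
print(counts)  # {'a': 2, 'b': 2, 'c': 1}
```

Counting words fits the (k, v) model naturally; as the next slide argues, iterative ML algorithms, whose state is a model rather than a stream of pairs, fit it much less well.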
  31. Why MapReduce does not always work: it writes a lot of stuff to disk for failover; the [(k, v)] mapper [(k, v)] reducer [(k, v)] pipeline is inefficient for small to medium problems; how do you express data and model params as (k, v) pairs?; it is complex to leverage for iterative algorithms.
  32. IPython.parallel • Parallel processing library • Interactive exploratory shell • Multi-core & distributed
  33. Demo!
  34. The AllReduce Pattern • Compute an aggregate (average) of active node data • Do not clog a single node with incoming data transfer • Traditionally implemented in MPI systems
  35. AllReduce 0/3, initial state: six nodes with values 1.0, 2.0, 0.5, 1.1, 3.2, 0.9
  36. AllReduce 1/3, spanning tree: the same six nodes organized into a tree
  37. AllReduce 2/3, upward averages: the leaves send (sum, count) pairs (1.1, 1), (3.1, 1), (0.9, 1) up to their parents
  38. AllReduce 2/3, upward averages: the inner nodes aggregate their subtrees into (2.1, 3) and (0.7, 2)
  39. AllReduce 2/3, upward averages: the root aggregates the whole tree into (1.38, 6)
  40. AllReduce 3/3, downward updates: the root adopts the global average 1.38
  41. AllReduce 3/3, downward updates: the inner nodes adopt 1.38
  42. AllReduce 3/3, downward updates: all six nodes now hold 1.38
  43. AllReduce, final state: every node holds the value 1.38
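The two passes over the spanning tree can be sketched in plain Python (a toy single-process model of the pattern, not an MPI implementation; note that the slides' intermediate figures are rounded, and the exact average of these six values is 1.45):

```python
class Node:
    """One worker in the spanning tree."""
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

    def reduce_up(self):
        """Upward pass: aggregate a (sum, count) pair from the
        whole subtree rooted at this node."""
        total, count = self.value, 1
        for child in self.children:
            s, c = child.reduce_up()
            total += s
            count += c
        return total, count

    def broadcast_down(self, value):
        """Downward pass: every node adopts the global average."""
        self.value = value
        for child in self.children:
            child.broadcast_down(value)

def allreduce_average(root):
    total, count = root.reduce_up()
    root.broadcast_down(total / count)

# the six values from the diagram, arranged in a spanning tree
leaves = [Node(1.1), Node(3.2), Node(0.9)]
root = Node(1.0, [Node(2.0, leaves[:2]), Node(0.5, leaves[2:])])
allreduce_average(root)
print(round(root.value, 2))  # 1.45
```

No single node ever receives more messages than it has tree neighbors, which is the "do not clog a single node" property the pattern is designed for.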
  44. AllReduce Implementations: http://mpi4py.scipy.org, or IPC directly with IPython.parallel: https://github.com/ipython/ipython/tree/master/docs/examples/parallel/interengine
  45. Working in the Cloud with StarCluster • Launch a cluster of machines in one command: starcluster start mycluster -b 0.07, then starcluster sshmaster mycluster • Supports spot instances! • Ships blas, atlas, numpy, scipy! • IPython plugin!
  46. Perspectives
  47. 2012 results by Stanford / Google
  48. The YouTube Neuron
  49. Thanks • http://j.mp/ogrisel-pyconfr-2012 • http://scikit-learn.org • http://packages.python.org/joblib • http://ipython.org • http://star.mit.edu/cluster/ • @ogrisel
  50. Goodies: a Super Ctrl-C to kill all nosetests
  51. A psutil nosetests killer script in Automator
  52. Bind killnose to Shift-Ctrl-C
  53. Or simpler: ps aux | grep IPython.parallel | grep -v grep | cut -d ' ' -f 2 | xargs kill
