Strategies and Tools for Parallel Machine Learning in Python

  1. Strategies & Tools for Parallel Machine Learning in Python. PyConFR 2012 - Paris, Sunday, September 16, 2012
  2. The Story: scikit-learn, The Cloud, stackoverflow & kaggle
  3. ...
  4. Parts of the Ecosystem: Single Machine with Multiple Cores ♥♥♥: multiprocessing. Multiple Machines with Multiple Cores ♥♥♥
  5. The Problem: Big CPU (Supercomputers - MPI): simulating stuff from models. Big Data (Google scale - MapReduce): counting stuff in logs / indexing the Web. Machine Learning? Often somewhere in the middle.
  6. Parallel ML Use Cases: • Model Evaluation with Cross Validation • Model Selection with Grid Search • Bagging Models: Random Forests • Averaged Models
  7. Embarrassingly Parallel ML Use Cases: • Model Evaluation with Cross Validation • Model Selection with Grid Search • Bagging Models: Random Forests • Averaged Models
  8. Inter-Process Comm. Use Cases: • Model Evaluation with Cross Validation • Model Selection with Grid Search • Bagging Models: Random Forests • Averaged Models
  9. Cross Validation: Input Data, Labels to Predict
  10. Cross Validation: input data split into blocks A B C; labels split into blocks A B C
  11. Cross Validation: A B C (data), A B C (labels). A and B: subset of the data used to train the model. C: held-out test set for evaluation.
  12. Cross Validation, three rotations with data and labels split alike: A B C; A C B; B C A
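
The rotation on slides 10 to 12 can be sketched in a few lines of plain Python. This is a minimal illustration, not scikit-learn's implementation: the helper name `k_fold_splits` is hypothetical, and the blocks are stood in for by the labels "A", "B" and "C". Each split is independent of the others, which is why cross validation counts as embarrassingly parallel.

```python
def k_fold_splits(folds):
    """Yield (train_folds, test_fold) pairs, holding out each fold once.

    Hypothetical helper: folds are held out from last to first so the
    order matches the slides (C, then B, then A).
    """
    for i in range(len(folds) - 1, -1, -1):
        test = folds[i]
        train = folds[:i] + folds[i + 1:]
        yield train, test


splits = list(k_fold_splits(["A", "B", "C"]))
for train, test in splits:
    print("train on", train, "/ evaluate on", test)
```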
  13. Model Selection, the Hyperparameters hell: p1 in [1, 10, 100], p2 in [1e3, 1e4, 1e5]. Find the best combination of parameters that maximizes the Cross Validated Score.
  14. Grid Search (p1 across, p2 down): row 1: (1, 1e3) (10, 1e3) (100, 1e3); row 2: (1, 1e4) (10, 1e4) (100, 1e4); row 3: (1, 1e5) (10, 1e5) (100, 1e5)
  15. The same grid: (1, 1e3) (10, 1e3) (100, 1e3); (1, 1e4) (10, 1e4) (100, 1e4); (1, 1e5) (10, 1e5) (100, 1e5)
  16. Grid Search: Qualitative Results
  17. Grid Search: Cross Validated Scores
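
The grid on slides 13 to 15 is the Cartesian product of the two parameter lists. The sketch below enumerates it and picks the best point. The `cv_score` function is a made-up stand-in for a real cross validated score, chosen only so the example has a clear maximum; it is not taken from the talk.

```python
from itertools import product

p1_values = [1, 10, 100]
p2_values = [1e3, 1e4, 1e5]

# Every (p1, p2) combination: 3 x 3 = 9 candidate settings.
grid = list(product(p1_values, p2_values))


def cv_score(p1, p2):
    """Hypothetical score surface that peaks at p1=10, p2=1e4."""
    return -abs(p1 - 10) - abs(p2 - 1e4) / 1e3


# Each grid point can be scored independently, hence in parallel.
best = max(grid, key=lambda params: cv_score(*params))
print(len(grid), "candidates, best:", best)
```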
  18. Enough ML theory! Let's go shopping^W parallel computing!
  19. Single Machine with Multiple Cores ♥ ♥ ♥ ♥
  20. multiprocessing
          >>> from multiprocessing import Pool
          >>> p = Pool(4)
          >>> p.map(type, [1, 2., '3'])
          [int, float, str]
          >>> r = p.map_async(type, [1, 2., '3'])
          >>> r.get()
          [int, float, str]
  21. multiprocessing: • Part of the standard lib • Nice API • Cross-Platform support (even Windows!) • Some support for shared memory • Support for synchronization (Lock)
  22. multiprocessing: limitations • No docstrings in a stdlib module? WTF? • Tricky / impossible to use the shared memory values with NumPy • Bad support for KeyboardInterrupt
  23. (no text on this slide)
  24. • transparent disk-caching of the output values and lazy re-evaluation (memoize pattern) • easy simple parallel computing • logging and tracing of the execution
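
The memoize pattern named on slide 24 can be sketched with the standard library alone. This is an illustration of the idea, not joblib's implementation: the decorator name `disk_memoize` and the cache layout (one pickle file per call, keyed by a hash of the function name and arguments) are assumptions made for the example.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def disk_memoize(cache_dir):
    """Hypothetical decorator: cache results on disk, keyed by the arguments."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            key = hashlib.sha1(pickle.dumps((func.__name__, args))).hexdigest()
            path = os.path.join(cache_dir, key + ".pkl")
            if os.path.exists(path):
                # Cache hit: load the stored output instead of recomputing.
                with open(path, "rb") as f:
                    return pickle.load(f)
            result = func(*args)
            with open(path, "wb") as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator


calls = []  # records every real evaluation
cache_dir = tempfile.mkdtemp()


@disk_memoize(cache_dir)
def slow_square(x):
    calls.append(x)
    return x * x


first = slow_square(4)   # computed and written to disk
second = slow_square(4)  # read back from disk, function body not run
```

The second call never runs the function body, so `calls` only records one evaluation.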
  25. joblib.Parallel
          >>> from os.path import join
          >>> from joblib import Parallel, delayed
          >>> Parallel(2)(
          ...     delayed(join)('/ect', s)
          ...     for s in 'abc')
          ['/ect/a', '/ect/b', '/ect/c']
  26. Usage in scikit-learn: • Cross Validation: cross_val_score(model, X, y, n_jobs=4, cv=3) • Grid Search: GridSearchCV(model, n_jobs=4, cv=3).fit(X, y) • Random Forests: RandomForestClassifier(n_jobs=4).fit(X, y)
  27. joblib.Parallel: shared memory
          >>> from joblib import Parallel, delayed
          >>> import numpy as np
          >>> Parallel(2, max_nbytes=1e6)(
          ...     delayed(type)(np.zeros(int(i)))
          ...     for i in [1e4, 1e6])
          [<type 'numpy.ndarray'>, <class 'numpy.core.memmap.memmap'>]
  28. Only 3 allocated datasets shared by all the concurrent workers performing the grid search: (1, 1e3) (10, 1e3) (100, 1e3); (1, 1e4) (10, 1e4) (100, 1e4); (1, 1e5) (10, 1e5) (100, 1e5)
  29. Multiple Machines with Multiple Cores ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥
  30. MapReduce? [(k1, v1), (k2, v2), ...] -> mappers -> [(k3, v3), (k4, v4), ...] -> reducers -> [(k5, v5), (k6, v6), ...]
  31. Why MapReduce does not always work: • Write a lot of stuff to disk for failover • Inefficient for small to medium problems • [(k, v)] -> mapper -> [(k, v)] -> reducer -> [(k, v)]: data and model params as (k, v) pairs? • Complex to leverage for Iterative Algorithms
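
The (k, v) data flow on slides 30 and 31 can be sketched on a single machine. The example below counts tokens in a few log-like lines, in the spirit of the "counting stuff in logs" use case from slide 5. The function names and the input lines are made up for illustration, and a real MapReduce system would distribute the map, shuffle and reduce steps across machines and persist intermediate results to disk.

```python
from collections import defaultdict


def mapper(line):
    """Emit one (key, value) pair per token in the line."""
    for token in line.split():
        yield token, 1


def reducer(key, values):
    """Combine all the values emitted for one key."""
    return key, sum(values)


def map_reduce(lines):
    # Shuffle step: group the mapper output by key.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


counts = map_reduce(["GET /a", "GET /b", "POST /a"])
print(counts)
```

Fitting a model this way means encoding both the data and the model parameters as (k, v) pairs and repeating the whole pipeline once per iteration, which is the awkwardness slide 31 points at.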
  32. IPython.parallel: • Parallel Processing Library • Interactive Exploratory Shell. Multi Core & Distributed
  33. Demo!
  34. The AllReduce Pattern: • Compute an aggregate (average) of active node data • Do not clog a single node with incoming data transfer • Traditionally implemented in MPI systems
  35. AllReduce 0/3, Initial State. Node values: 1.0, 2.0, 0.5, 1.1, 3.2, 0.9
  36. AllReduce 1/3, Spanning Tree. Node values: 1.0, 2.0, 0.5, 1.1, 3.2, 0.9
  37. AllReduce 2/3, Upward Averages. Node values: 1.0, 2.0, 0.5, 1.1, 3.2, 0.9. Leaves send (average, count): (1.1, 1) (3.2, 1) (0.9, 1)
  38. AllReduce 2/3, Upward Averages. Node values: 1.0, 2.0, 0.5, 1.1, 3.2, 0.9. Inner nodes send (2.1, 3) and (0.7, 2); leaves sent (1.1, 1) (3.2, 1) (0.9, 1)
  39. AllReduce 2/3, Upward Averages. Node values: 1.0, 2.0, 0.5, 1.1, 3.2, 0.9. The root computes (1.45, 6) from (2.1, 3) and (0.7, 2); leaves sent (1.1, 1) (3.2, 1) (0.9, 1)
  40. AllReduce 3/3, Downward Updates. Node values: 1.45 (root), 2.0, 0.5, 1.1, 3.2, 0.9
  41. AllReduce 3/3, Downward Updates. Node values: 1.45, 1.45, 1.45, 1.1, 3.2, 0.9
  42. AllReduce 3/3, Downward Updates. Node values: 1.45, 1.45, 1.45, 1.45, 1.45, 1.45
  43. AllReduce Final State. Node values: 1.45, 1.45, 1.45, 1.45, 1.45, 1.45
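
The two passes on the AllReduce slides can be sketched as a recursive walk over a spanning tree. This is a single-process illustration, not an MPI or IPython.parallel implementation. The `Node` class and function names are hypothetical, and the tree shape is an assumption inferred from the (average, count) pairs on the slides: the root holds 1.0, one child holds 2.0 with leaves 1.1 and 3.2, the other child holds 0.5 with leaf 0.9.

```python
class Node:
    """Hypothetical tree node holding one local value."""

    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)


def reduce_up(node):
    """Upward pass: return (average, count) for the subtree rooted at node."""
    total, count = node.value, 1
    for child in node.children:
        child_avg, child_count = reduce_up(child)
        total += child_avg * child_count  # weight each average by its count
        count += child_count
    return total / count, count


def broadcast_down(node, value):
    """Downward pass: overwrite every node with the global aggregate."""
    node.value = value
    for child in node.children:
        broadcast_down(child, value)


def all_values(node):
    """Collect the values of every node in the tree."""
    values = [node.value]
    for child in node.children:
        values.extend(all_values(child))
    return values


# Assumed topology, see the note above.
root = Node(1.0, [Node(2.0, [Node(1.1), Node(3.2)]), Node(0.5, [Node(0.9)])])

average, count = reduce_up(root)
broadcast_down(root, average)
final_values = all_values(root)
```

Each node only talks to its parent and its children, so no single node has to receive data from all the others.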
  44. AllReduce Implementations: http://mpi4py.scipy.org; IPC directly w/ IPython.parallel: https://github.com/ipython/ipython/tree/master/docs/examples/parallel/interengine
  45. Working in the Cloud: • Launch a cluster of machines in one cmd: starcluster start mycluster -b 0.07; starcluster sshmaster mycluster • Supports spot instances! • Ships blas, atlas, numpy, scipy! • IPython plugin!
  46. Perspectives
  47. 2012 results by Stanford / Google
  48. The YouTube Neuron
  49. Thanks: • http://j.mp/ogrisel-pyconfr-2012 • http://scikit-learn.org • http://packages.python.org/joblib • http://ipython.org • http://star.mit.edu/cluster/ @ogrisel
  50. Goodies: Super Ctrl-C to killall nosetests
  51. psutil nosetests killer script in Automator
  52. Bind killnose to Shift-Ctrl-C
  53. Or simpler: ps aux | grep IPython.parallel | grep -v grep | cut -d ' ' -f 2 | xargs kill