
Parallel Processing with IPython

In this screencast, Travis Oliphant gives an introduction to IPython, an extremely useful tool for task-based parallel processing with Python.



  1. Parallel Processing with IPython (January 22, 2010)
  2. Enthought Python Distribution (EPD)
     MORE THAN SIXTY INTEGRATED PACKAGES
     • Python 2.6
     • Science (NumPy, SciPy, etc.)
     • Plotting (Chaco, Matplotlib)
     • Visualization (VTK, Mayavi)
     • Multi-language Integration (SWIG, Pyrex, f2py, weave)
     • Repository access
     • Data Storage (HDF, NetCDF, etc.)
     • Networking (twisted)
     • User Interface (wxPython, Traits UI)
     • Enthought Tool Suite (Application Development Tools)
  3. Enthought Training Courses
     Python Basics, NumPy, SciPy, Matplotlib, Chaco, Traits, TraitsUI, …
  4. PyCon (http://us.pycon.org/2010/tutorials/)
     Introduction to Traits
     Introduction to Enthought Tool Suite
     Taught by Corran Webster
     Fantastic deal: normally $700; at PyCon, get the same material for $275.
  5. Upcoming Training Classes (http://www.enthought.com/training/)
     March 1-5, 2010: Python for Scientists and Engineers, Austin, Texas, USA
     March 1-5, 2010: Python for Quants, London, UK
  6. Parallel Processing with IPython
  7. IPython.kernel
     • IPython's interactive kernel provides a simple but powerful interface for task-based parallel programming.
     • Allows fast development and tuning of a task-parallel algorithm to better utilize resources.
  8. Getting started: local cluster
     UNIX and OS X (and now Windows):
       # run ipcluster to start up a controller and a set of engines
       $ ipcluster local -n 4
       Your cluster is up and running.
     You can then cleanly stop the cluster from IPython using:
       mec.kill(controller=True)
     You can also hit Ctrl-C to stop it, or use kill -INT 20465 from the command line.
     Creates several key-files in ~/.ipython/security:
       ipcontroller-engine.furl
       ipcontroller-mec.furl
       ipcontroller-tc.furl
     Windows, manually:
       # run ipcontroller and then ipengine for each desired engine
       > start /B C:\Python25\Scripts\ipcontroller.exe
       > start /B C:\Python25\Scripts\ipengine.exe
       > start /B C:\Python25\Scripts\ipengine.exe
       > start /B C:\Python25\Scripts\ipengine.exe
       2009-02-11 23:58:26-0600 [-] Log opened.
       2009-02-11 23:58:28-0600 [-] Using furl file: C:\Documents and Settings\demo\_ipython\security\ipcontroller-engine.furl
       2009-02-11 23:58:28-0600 [-] registered engine with id: 3
       2009-02-11 23:58:28-0600 [-] distributing Tasks
       2009-02-11 23:58:28-0600 [Negotiation,client] engine registration succeeded, got id: 3
     Creates several key-files in %HOME%\_ipython\security:
       ipcontroller-engine.furl
       ipcontroller-mec.furl
       ipcontroller-tc.furl
  9. Getting started: distributed
     • Run ipcontroller on a host to create .furl files:
       • Creates separate .furl files to be used by the different connections (engine, multiengine client, task client).
       • Places .furl files by default in ~/.ipython/security (UNIX or Mac OS X) or %HOME%\_ipython\security (Windows).
       • Takes --<connection>-furl-file=FILENAME options, where <connection> is engine, multiengine, or task, to place the .furl files somewhere else.
     • Ensure the ipcontroller-engine.furl file is available to each host that will run an engine, and run ipengine on these hosts:
       • Either place it in the default security directory, or
       • Use the --furl-file=FILENAME option to ipengine.
     • Ensure the multiengine (task) .furl file is available to each host that will run a multiengine (task) client:
       • Either place it in the default security directory, or
       • Pass the FILENAME as the first argument to the client constructor.
  10. Initialize client
      >>> from IPython.kernel import client
      MULTIENGINECLIENT
        # * allows fine-grained control
        # * each engine has an id number
        # * more intuitive for beginners
        # optional argument can be location of mec furl-file created by the controller
        >>> mec = client.MultiEngineClient()
        >>> mec.get_ids()
        [0, 1, 2, 3]
        mec.map      -- parallel map
        mec.parallel -- parallel function decorator
        mec.execute  -- execute in parallel
        mec.push     -- push data
        mec.pull     -- pull data
        mec.scatter  -- spread out
        mec.gather   -- collect back
        mec.kill     -- kill engines and controller
      TASKCLIENT
        # * does not expose individual engines
        # * presents a load-balanced, fault-tolerant queue
        # optional argument can be location of tc furl-file created by the controller
        >>> tc = client.TaskClient()
        tc.map             -- parallel map
        tc.parallel        -- function decorator
        tc.run             -- run Tasks
        tc.get_task_result -- get result
        client.MapTask     -- function-like
        client.StringTask  -- code-string
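The TaskClient's load-balanced queue has the same submit-then-collect shape as modern standard-library executors. As a conceptual analogue only (this is the stdlib concurrent.futures module, not the IPython.kernel API; the `task` function is a made-up stand-in):

```python
# Conceptual analogue of TaskClient's submit/collect pattern using the
# standard library. Each submitted call is an independent task; the pool
# hands tasks to whichever worker is free, like a load-balanced queue.
from concurrent.futures import ThreadPoolExecutor

def task(x):
    # stand-in for an independent unit of work
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(task, i) for i in range(8)]  # like tc.run(...)
    results = [f.result() for f in futures]             # like tc.get_task_result(...)

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Because results are collected from the futures in submission order, the output order matches the input order even though execution order is up to the pool.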
  11. MultiEngineClient
      SCALAR FUNCTION → PARALLEL VECTORIZED FUNCTION
      # Using map
      >>> def func(x):
      ...     return x**2.5 * (3*x - 2)
      # standard map
      >>> result = map(func, range(32))
      # mec.map
      >>> parallel_result = mec.map(func, range(32))
      # mec.parallel
      >>> pfunc = mec.parallel()(func)
      or using decorators:
      >>> @mec.parallel
      ... def pfunc(x):
      ...     return x**2.5 * (3*x - 2)
      >>> parallel_result2 = pfunc(range(32))
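The point of the slide is that the parallel map is a drop-in for the builtin: same inputs, same ordered outputs, just computed across engines. A serial reference check (plain Python, no cluster needed; `expected` is an illustrative name, not part of the slide):

```python
# Serial reference for the slide's example: mec.map(func, seq) and
# pfunc(seq) should return the same ordered values as the builtin map.
def func(x):
    return x**2.5 * (3*x - 2)

expected = [func(x) for x in range(32)]
result = list(map(func, range(32)))

assert result == expected
print(result[1])  # 1.0  (1**2.5 * (3*1 - 2))
```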
  12. TaskClient: Load Balancing
      SCALAR FUNCTION → PARALLEL VECTORIZED FUNCTION
      # Using map
      >>> def func(x):
      ...     return x**2.5 * (3*x - 2)
      # standard map
      >>> result = map(func, range(32))
      # tc.map
      >>> parallel_result = tc.map(func, range(32))
      # tc.parallel
      >>> pfunc = tc.parallel()(func)
      or using decorators:
      >>> @tc.parallel
      ... def pfunc(x):
      ...     return x**2.5 * (3*x - 2)
      >>> parallel_result2 = pfunc(range(32))
  13. MultiEngineClient
      EXECUTE CODE STRING IN PARALLEL
      >>> from enthought.blocks.api import func2str
      # decorator that turns Python code into a string
      >>> @func2str
      ... def code():
      ...     import numpy as np
      ...     a = np.random.randn(N, N)
      ...     eigs, vals = np.linalg.eig(a)
      ...     maxeig = max(abs(eigs))
      >>> mec['N'] = 100
      >>> result = mec.execute(code)
      >>> print mec['maxeig']
      [10.471428625885835, 10.322386155553213, 10.237638983818622, 10.614715948426941]
  14. TaskClient: Load-Balancing Queue
      EXECUTE CODE STRING IN PARALLEL
      >>> from enthought.blocks.api import func2str
      # decorator that turns Python code into a string
      >>> @func2str
      ... def code():
      ...     import numpy as np
      ...     a = np.random.randn(N, N)
      ...     eigs, vals = np.linalg.eig(a)
      ...     maxeig = max(abs(eigs))
      >>> task = client.StringTask(str(code), push={'N': 100}, pull='maxeig')
      >>> ids = [tc.run(task) for i in range(4)]
      >>> res = [tc.get_task_result(id) for id in ids]
      >>> print [x['maxeig'] for x in res]
      [10.439989436983467, 10.250842410862729, 10.040835983392991, 10.603885977189803]
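The code string on slides 13 and 14 is ordinary NumPy; each engine just runs it with its own random draw. A serial sketch of one such task (no cluster; the exact value varies per draw, unlike the fixed lists on the slides):

```python
# Serial version of the code string each engine executes: the largest
# absolute eigenvalue of a random N x N Gaussian matrix.
import numpy as np

N = 100
a = np.random.randn(N, N)
eigs, vecs = np.linalg.eig(a)   # eigenvalues, eigenvectors
maxeig = max(abs(eigs))
```

By the circular law, the spectral radius of an N x N standard Gaussian matrix is close to sqrt(N) = 10 for N = 100, which is why the slides report values around 10.0-10.6.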
  15. Parallel FFT on a Memory-Mapped File
      Processors   Time (seconds)   Speed Up
      1            11.75            1.0
      2            6.06             1.9
      4            3.36             3.5
      8            2.50             4.7
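The Speed Up column is just the 1-processor time divided by the p-processor time; dividing again by p gives parallel efficiency, which the slide does not show. A quick check of the table's arithmetic (times taken from the slide):

```python
# Speed-up = T1 / Tp; efficiency = speed-up / p.
# Times (seconds) from the Parallel FFT table on slide 15.
times = {1: 11.75, 2: 6.06, 4: 3.36, 8: 2.50}
t1 = times[1]

for p, tp in sorted(times.items()):
    speedup = t1 / tp
    efficiency = speedup / p
    print(p, round(speedup, 1), round(efficiency, 2))
```

The declining efficiency (near-perfect at 2 processors, well under 1.0 at 8) is the usual signature of I/O and communication overhead dominating as the per-processor work shrinks.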
  16. EPD: http://www.enthought.com/products/epd.php
      Webinars: http://www.enthought.com/training/webinars.php
      Enthought Training: http://www.enthought.com/training/
