Cloud computing made easy in Joblib
Alexandre Abadie
Outline
An overview of Joblib
Joblib for cloud computing
Future work
Joblib in a word
A Python package to make your algorithms run faster
http://joblib.readthedocs.io
The ecosystem
54 different contributors since the beginning in 2008
Contributors per month
Joblib is the computing backend used by Scikit-Learn
Stable and mature code base
https://github.com/joblib/joblib
Why Joblib?
Because we want to make use of all available computing resources
⇒ And ensure algorithms run as fast as possible
Because we work on large datasets
⇒ Data that just fits in RAM
Because we want the internal algorithm logic to remain unchanged
⇒ Adapted to embarrassingly parallel problems
Because we love simple APIs
⇒ And parallel programming is not user friendly in general
How?
Embarrassingly Parallel computing helper
⇒ make parallel computing easy
Efficient disk caching to avoid recomputation
⇒ computation resource friendly
Fast I/O persistence
⇒ limit cache access time
No dependencies, optimized for numpy arrays
⇒ simple installation and integration in other projects
Overview
Parallel helper
>>> from joblib import Parallel, delayed
>>> from math import sqrt
>>> Parallel(n_jobs=3, verbose=50)(delayed(sqrt)(i**2) for i in range(6))
[Parallel(n_jobs=3)]: Done 1 tasks | elapsed: 0.0s
[...]
[Parallel(n_jobs=3)]: Done 6 out of 6 | elapsed: 0.0s finished
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
⇒ API can be extended with external backends
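The same helper works with any plain Python function, not only with library functions such as sqrt. A minimal sketch, assuming a hypothetical cube function that stands in for an expensive, independent computation; the threading backend is chosen here only so the snippet runs without spawning worker processes:

```python
from joblib import Parallel, delayed


def cube(x):
    # Hypothetical stand-in for an expensive, independent computation.
    return x ** 3


# Each delayed(cube)(i) call describes one task; Parallel dispatches the
# tasks to 2 workers and gathers the return values.
results = Parallel(n_jobs=2, backend="threading")(
    delayed(cube)(i) for i in range(5)
)

# Results are returned in the order the tasks were submitted.
print(results)  # [0, 1, 8, 27, 64]
```

The body of cube is untouched by the parallelization: only the loop that calls it is rewritten as a generator of delayed calls.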
Parallel backends
Single machine backends: work on a laptop
⇒ threading, multiprocessing and soon Loky
Multi-machine backends: available as optional extensions
⇒ distributed, ipyparallel, CMFActivity, Hadoop Yarn
>>> from math import sqrt
>>> from distributed.joblib import DistributedBackend
>>> from joblib import (Parallel, delayed,
...                     register_parallel_backend, parallel_backend)
>>> register_parallel_backend('distributed', DistributedBackend)
>>> with parallel_backend('distributed', scheduler_host='dscheduler:8786'):
...     Parallel(n_jobs=3)(delayed(sqrt)(i**2) for i in range(6))
[...]
Future: new backends for Celery, Spark
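The context-manager pattern can be tried on a single machine before any cluster is available. A minimal sketch, assuming only the built-in threading backend; the call to Parallel itself carries no backend-specific argument, so the same line could later run under another registered backend:

```python
from math import sqrt

from joblib import Parallel, delayed, parallel_backend

# The backend and the number of workers are selected by the surrounding
# context, not by the Parallel call itself.
with parallel_backend("threading", n_jobs=2):
    roots = Parallel()(delayed(sqrt)(i ** 2) for i in range(6))

print(roots)  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```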
Caching on disk
Use a memoize pattern with the Memory object
>>> from joblib import Memory
>>> import numpy as np
>>> a = np.vander(np.arange(3)).astype(float)
>>> mem = Memory(location='/tmp/joblib')
>>> square = mem.cache(np.square)
>>> b = square(a)
________________________________________________________________________________
[Memory] Calling square...
square(array([[ 0., 0., 1.],
       [ 1., 1., 1.],
       [ 4., 2., 1.]]))
___________________________________________________________square - 0...s, 0.0min
>>> c = square(a)  # no recomputation
>>> c
array([[ 0., 0., 1.],
[...]
Least Recently Used (LRU) cache replacement policy
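One way to check that the second call really skips the computation is to record every execution of the wrapped function. A minimal sketch, assuming a recent Joblib release where Memory takes a location argument; slow_double and the calls list are hypothetical names used only for illustration:

```python
import tempfile

from joblib import Memory

calls = []  # records every real execution of the function


def slow_double(x):
    # Hypothetical stand-in for an expensive computation.
    calls.append(x)
    return 2 * x


with tempfile.TemporaryDirectory() as cache_dir:
    memory = Memory(location=cache_dir, verbose=0)
    cached_double = memory.cache(slow_double)

    first = cached_double(21)   # computed, then written to the cache
    second = cached_double(21)  # same argument: read back from the cache

print(first, second, calls)  # 42 42 [21]
```

The function body ran once even though it was called twice, which is the behaviour the memoize pattern is meant to provide.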
Persistence
Convert/create an arbitrary object into/from a string of bytes
Streamable persistence to/from file or socket objects
>>> import numpy as np
>>> import joblib
>>> obj = [('a', [1, 2, 3]), ('b', np.arange(10))]
>>> joblib.dump(obj, '/tmp/test.pkl')
['/tmp/test.pkl']
>>> with open('/tmp/test.pkl', 'rb') as f:
...     joblib.load(f)
[('a', [1, 2, 3]), ('b', array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))]
Use compression for fast I/O:
support for zlib, gz, bz2, xz and lzma compressors
>>> joblib.dump(obj, '/tmp/test.pkl.gz', compress=True)
['/tmp/test.pkl.gz']
>>> joblib.load('/tmp/test.pkl.gz')
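Both persistence styles can be exercised in a round trip. A minimal sketch, assuming a temporary directory instead of /tmp and an explicit ("gzip", 3) compressor and level pair; the file names are hypothetical:

```python
import os
import tempfile

import joblib
import numpy as np

obj = {"labels": ["a", "b"], "values": np.arange(5)}

with tempfile.TemporaryDirectory() as tmp_dir:
    # 1. Compressed dump to a path, with an explicit compressor and level.
    gz_path = os.path.join(tmp_dir, "obj.pkl.gz")
    joblib.dump(obj, gz_path, compress=("gzip", 3))
    restored = joblib.load(gz_path)

    # 2. Streaming dump and load through already opened file objects.
    raw_path = os.path.join(tmp_dir, "obj.pkl")
    with open(raw_path, "wb") as f:
        joblib.dump(obj, f)
    with open(raw_path, "rb") as f:
        streamed = joblib.load(f)

print(restored["labels"], restored["values"])
```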
Outline
Joblib in a word
⇒ Joblib for cloud computing
Future work
The Cloud trend
Lots of Cloud providers on the market
Existing solutions for processing Big Data
Existing container orchestration solutions: Docker Swarm, Kubernetes
How can Joblib be used with them?
The general idea
Use pluggable multi-machine parallel backends
Principle: configure your backend and wrap the calls to Parallel
>>> import time
>>> import ipyparallel as ipp
>>> from ipyparallel.joblib import register as register_joblib
>>> from joblib import parallel_backend, Parallel, delayed
>>> # Set up the ipyparallel backend
>>> register_joblib()
>>> dview = ipp.Client()[:]
>>> # Start the job
>>> with parallel_backend("ipyparallel", view=dview):
...     Parallel(n_jobs=20, verbose=50)(delayed(time.sleep)(1) for i in range(10))
Complete examples exist for:
Dask distributed: https://github.com/ogrisel/docker-distributed
Hadoop Yarn: https://github.com/joblib/joblib-hadoop
Use pluggable store backends
Extends the Memory API with other store providers
Not available upstream yet:
⇒ PR opened at https://github.com/joblib/joblib/pull/397
>>> import numpy as np
>>> from joblib import Memory
>>> from joblibhadoop.hdfs import register_hdfs_store_backend
>>> # Register the HDFS store backend provider
>>> register_hdfs_store_backend()
>>> # Persist data in hdfs://namenode:9000/user/john/cache/joblib
>>> mem = Memory(location='cache', backend='hdfs',
...              host='namenode', port=9000, user='john', compress=True)
>>> multiply = mem.cache(np.multiply)
Store backends available:
Amazon S3: https://github.com/aabadie/joblib-s3
Hadoop HDFS: https://github.com/joblib/joblib-hadoop
Using Hadoop with Joblib
joblib-hadoop package: https://github.com/joblib/joblib-hadoop
Provides Docker container helpers for development and testing
⇒ no need for a production Hadoop cluster
⇒ makes developers' lives easier: CI on Travis is possible
⇒ the local repository on the host is shared with the joblib-hadoop-node container
Outline
Joblib in a word
Joblib for cloud computing
⇒ Future work and conclusion
Future work
In-memory object caching
⇒ Should save RAM during a parallel job
Allow overriding of parallel backends
⇒ See PR: https://github.com/joblib/joblib/pull/524
⇒ Seamless distributed computing in scikit-learn
Replace multiprocessing parallel backend with Loky
⇒ See PR: https://github.com/joblib/joblib/pull/516
Extend Cloud provider support
⇒ Using Apache libcloud: gives access to many more Cloud providers
Conclusion
The Parallel helper is adapted to embarrassingly parallel problems
Already a lot of parallel backends available
⇒ threading, multiprocessing, loky, CMFActivity, distributed, ipyparallel, Yarn
Use caching techniques to avoid recomputation
Extra store backends available ⇒ HDFS (Hadoop) and AWS S3
Use Joblib either on your laptop or in a Cloud with very few code changes
Thanks!
PyParis 2017 / Cloud computing made easy in Joblib, by Alexandre Abadie
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
 
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
 
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
 
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick MoyOsis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
 
Osis18_Cloud : Pas de commun sans communauté ?
Osis18_Cloud : Pas de commun sans communauté ?Osis18_Cloud : Pas de commun sans communauté ?
Osis18_Cloud : Pas de commun sans communauté ?
 
Osis18_Cloud : Projet Wolphin
Osis18_Cloud : Projet Wolphin Osis18_Cloud : Projet Wolphin
Osis18_Cloud : Projet Wolphin
 
Osis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMAOsis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMA
 
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur BittorrentOsis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
 
Osis18_Cloud : Software-heritage
Osis18_Cloud : Software-heritageOsis18_Cloud : Software-heritage
Osis18_Cloud : Software-heritage
 
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
 
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riotOSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
 
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
 
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
 
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
 
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
 
PyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelatPyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelat
 

Recently uploaded

Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 

Recently uploaded (20)

Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 

PyParis2017 / Cloud computing made easy in Joblib, by Alexandre Abadie

  • 1. Cloud computing made easy in Joblib
      Alexandre Abadie
  • 2. Outline
      An overview of Joblib
      Joblib for cloud computing
      Future work
  • 4. Joblib in a word
      A Python package to make your algorithms run faster
      http://joblib.readthedocs.io
  • 7. The ecosystem
      54 different contributors since the beginning in 2008
      [Chart: contributors per month]
      Joblib is the computing backend used by Scikit-Learn
      Stable and mature code base
      https://github.com/joblib/joblib
  • 16. Why Joblib?
      Because we want to make use of all available computing resources
        ⇒ And ensure algorithms run as fast as possible
      Because we work on large datasets
        ⇒ Data that just fits in RAM
      Because we want the internal algorithm logic to remain unchanged
        ⇒ Adapted to embarrassingly parallel problems
      Because we love simple APIs
        ⇒ And parallel programming is not user friendly in general
  • 20. How?
      Embarrassingly parallel computing helper
        ⇒ make parallel computing easy
      Efficient disk caching to avoid recomputation
        ⇒ computation resource friendly
      Fast I/O persistence
        ⇒ limit cache access time
      No dependencies, optimized for numpy arrays
        ⇒ simple installation and integration in other projects
  • 23. Parallel helper
      >>> from joblib import Parallel, delayed
      >>> from math import sqrt
      >>> Parallel(n_jobs=3, verbose=50)(delayed(sqrt)(i**2) for i in range(6))
      [Parallel(n_jobs=3)]: Done 1 tasks | elapsed: 0.0s
      [...]
      [Parallel(n_jobs=3)]: Done 6 out of 6 | elapsed: 0.0s finished
      [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
      ⇒ API can be extended with external backends
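The `delayed(sqrt)(i**2)` idiom on this slide records a call instead of running it, so that a runner can dispatch the recorded calls to workers. The following is a minimal standard-library sketch of that idea, not Joblib's implementation: `run_parallel` is a hypothetical helper, and a thread pool stands in for Joblib's backends.

```python
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def delayed(func):
    """Return a wrapper that records a call instead of running it."""
    def capture(*args, **kwargs):
        return func, args, kwargs
    return capture


def run_parallel(tasks, n_jobs=3):
    """Hypothetical runner: execute recorded calls in a thread pool, in input order."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [pool.submit(func, *args, **kwargs) for func, args, kwargs in tasks]
        return [future.result() for future in futures]


# Same computation as on the slide: sqrt(i**2) for i in 0..5.
results = run_parallel(delayed(sqrt)(i ** 2) for i in range(6))
print(results)
```

Because each task is independent of the others, the runner needs no coordination beyond collecting results in order, which is what makes the problem embarrassingly parallel.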
  • 27. Parallel backends
      Single machine backends: works on a laptop
        ⇒ threading, multiprocessing and soon Loky
      Multi machine backends: available as optional extensions
        ⇒ distributed, ipyparallel, CMFActivity, Hadoop Yarn
      >>> from math import sqrt
      >>> from distributed.joblib import DistributedBackend
      >>> from joblib import (Parallel, delayed,
      ...                     register_parallel_backend, parallel_backend)
      >>> register_parallel_backend('distributed', DistributedBackend)
      >>> with parallel_backend('distributed', scheduler_host='dscheduler:8786'):
      ...     Parallel(n_jobs=3)(delayed(sqrt)(i**2) for i in range(6))
      [...]
      Future: new backends for Celery, Spark
  • 30. Caching on disk
      Use a memoize pattern with the Memory object
      >>> from joblib import Memory
      >>> import numpy as np
      >>> a = np.vander(np.arange(3)).astype(float)
      >>> mem = Memory(location='/tmp/joblib')
      >>> square = mem.cache(np.square)
      >>> b = square(a)
      ________________________________________________________________________________
      [Memory] Calling square...
      square(array([[ 0., 0., 1.],
             [ 1., 1., 1.],
             [ 4., 2., 1.]]))
      ___________________________________________________________square - 0...s, 0.0min
      >>> c = square(a)  # no recomputation
      array([[ 0., 0., 1.],
      [...]
      Least Recently Used (LRU) cache replacement policy
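The memoize pattern named on this slide can be illustrated with the standard library alone. The sketch below is not how `Memory` is implemented: `disk_cache` is a hypothetical decorator that assumes arguments and results are picklable, and it hashes the pickled arguments to locate a cached result on disk.

```python
import hashlib
import os
import pickle
import tempfile
from functools import wraps


def disk_cache(cache_dir):
    """Hypothetical decorator: store each result on disk under a hash of the arguments."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Build the cache key from the function name and its pickled arguments.
            payload = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            key = hashlib.sha256(payload).hexdigest()
            path = os.path.join(cache_dir, key + ".pkl")
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return pickle.load(f)  # cache hit: skip the computation
            result = func(*args, **kwargs)
            with open(path, "wb") as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator


cache_dir = tempfile.mkdtemp()
calls = []  # records every time the function body really runs


@disk_cache(cache_dir)
def square(x):
    calls.append(x)
    return x * x


first = square(4)
second = square(4)  # read back from disk; the body of `square` is not run again
```

After both calls, `calls` holds a single entry, which is the "no recomputation" behaviour the slide demonstrates. This sketch never evicts entries, so the LRU replacement policy mentioned on the slide is not modelled here.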
  • 32. Persistence
      Convert/create an arbitrary object into/from a string of bytes
      Streamable persistence to/from file or socket objects
      >>> import numpy as np
      >>> import joblib
      >>> obj = [('a', [1, 2, 3]), ('b', np.arange(10))]
      >>> joblib.dump(obj, '/tmp/test.pkl')
      ['/tmp/test.pkl']
      >>> with open('/tmp/test.pkl', 'rb') as f:
      ...     joblib.load(f)
      [('a', [1, 2, 3]), ('b', array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))]
      Use compression for fast I/O:
        support for zlib, gz, bz2, xz and lzma compressors
      >>> joblib.dump(obj, '/tmp/test.pkl.gz', compress=True, cache_size=0)
      ['/tmp/test.pkl.gz']
      >>> joblib.load('/tmp/test.pkl.gz')
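The two ideas on this slide, turning an object into a string of bytes and streaming it through a file-like object with compression, can be sketched with `pickle`, `gzip` and `io` from the standard library. This only illustrates the concepts; it does not reproduce Joblib's numpy-optimized format, and a plain list replaces the numpy array so the example has no third-party dependency.

```python
import gzip
import io
import pickle

obj = [("a", [1, 2, 3]), ("b", list(range(10)))]

# Object -> string of bytes -> object.
raw = pickle.dumps(obj)
restored_from_bytes = pickle.loads(raw)

# Stream the pickle through an in-memory file object with gzip compression.
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as f:
    pickle.dump(obj, f)

# Rewind the buffer and read the object back through the same kind of stream.
buffer.seek(0)
with gzip.GzipFile(fileobj=buffer, mode="rb") as f:
    restored_from_stream = pickle.load(f)
```

The same code works with any object exposing `read` and `write`, such as an open file or the file object returned by `socket.makefile`.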
  • 33. Outline
      Joblib in a word
      ⇒ Joblib for cloud computing
      Future work
  • 37. The Cloud trend
      Lots of Cloud providers on the market
      Existing solutions for processing Big Data
      Existing container orchestration solutions: Docker Swarm, Kubernetes
      How can Joblib be used with them?
  • 40. Use pluggable multi-machine parallel backends
      Principle: configure your backend and wrap the calls to Parallel
      >>> import time
      >>> import ipyparallel as ipp
      >>> from ipyparallel.joblib import register as register_joblib
      >>> from joblib import parallel_backend, Parallel, delayed
      >>> # Setup ipyparallel backend
      >>> register_joblib()
      >>> dview = ipp.Client()[:]
      >>> # Start the job
      >>> with parallel_backend("ipyparallel", view=dview):
      ...     Parallel(n_jobs=20, verbose=50)(delayed(time.sleep)(1) for i in range(10))
      Complete examples exist for:
        Dask distributed: https://github.com/ogrisel/docker-distributed
        Hadoop Yarn: https://github.com/joblib/joblib-hadoop
  • 43. Use pluggable store backends
      Extends Memory API with other store providers
      Not available upstream yet:
        ⇒ PR opened at https://github.com/joblib/joblib/pull/397
      >>> import numpy as np
      >>> from joblib import Memory
      >>> from joblibhadoop.hdfs import register_hdfs_store_backend
      >>> # Register HDFS store backend provider
      >>> register_hdfs_store_backend()
      >>> # Persist data in hdfs://namenode:9000/user/john/cache/joblib
      >>> mem = Memory(location='cache', backend='hdfs',
      ...              host='namenode', port=9000, user='john', compress=True)
      >>> multiply = mem.cache(np.multiply)
      Store backends available:
        Amazon S3: https://github.com/aabadie/joblib-s3
        Hadoop HDFS: https://github.com/joblib/joblib-hadoop
  • 47. Using Hadoop with Joblib
      joblib-hadoop package: https://github.com/joblib/joblib-hadoop
      Provides Docker container helpers for developing and testing
        ⇒ no need for a production Hadoop cluster
        ⇒ makes developers' lives easier: CI on Travis is possible
        ⇒ local repository on host is shared with the joblib-hadoop-node container
  • 48. Outline
      Joblib in a word
      Joblib for cloud computing
      ⇒ Future work and conclusion
  • 52. Future work
      In-memory object caching
        ⇒ Should save RAM during a parallel job
      Allow overriding of parallel backends
        ⇒ See PR: https://github.com/joblib/joblib/pull/524
        ⇒ Seamless distributed computing in scikit-learn
      Replace multiprocessing parallel backend with Loky
        ⇒ See PR: https://github.com/joblib/joblib/pull/516
      Extend Cloud providers support
        ⇒ Using Apache libcloud: give access to a lot more Cloud providers
  • 58. Conclusion
      Parallel helper is adapted to embarrassingly parallel problems
      Already a lot of parallel backends available
        ⇒ threading, multiprocessing, loky, CMFActivity, distributed, ipyparallel, Yarn
      Use caching techniques to avoid recomputation
      Extra store backends available
        ⇒ HDFS (Hadoop) and AWS S3
      Use Joblib either on your laptop or in a Cloud with very few code changes