SlideShare a Scribd company logo
1 of 45
Download to read offline
The genesis of clusterlib
An open source library to tame your
favourite supercomputer
Arnaud Joly
May 2015, Phd hour discussions
Use case for the birth of clusterlib
Solving supervised learning tasks
The goal of supervised learning is to learn a function from
input-output pairs in order to predict the output for any new
input.
A supervised learning task
1.5 1.0 0.5 0.0 0.5 1.0 1.5
1.5
1.0
0.5
0.0
0.5
1.0
1.5
A learnt function
1.5 1.0 0.5 0.0 0.5 1.0 1.5
1.5
1.0
0.5
0.0
0.5
1.0
1.5
Time is running out
A huge set of tasks and a practical upper bound
O(#datasets×#algorithms×#parameters) ≤ Time before deadline
Embarrassingly parallel tasks.
Supercomputer = scheduler + workers
Supercomputer = cluster of computers
CECI is in !
Thanks to the CECI1 at the University of Liège, we have access
to
7 supercomputers (or clusters of computers);
≈ 20 000 cores;
>60 000 GB ram;
12 high performance scientific GPU.
1
The CECI (http: // www. ceci-hpc. be/ ) is a supercomputer
consortium funded by the FNRS.
A glimpse to the SLURM scheduler user interface
How to sumit a job?
First, we need to write a bash file job.sh which specifies
resource requirements and calls the program.
#!/bin/bash
#SBATCH --job-name=job-name
#SBATCH --time=10:00
#SBATCH --mem=1000
srun hostname
Then you can launch the job in the queue
$ sbatch job.sh
How to launch easily many jobs?
With clusterlib, how to launch easily many jobs??
Let’s generate jobs submission command on the fly!
>>> from clusterlib.scheduler import submit
>>> script = submit(job_command="srun hostname",
... job_name="test",
... time="10:00",
... memory=1000,
... backend="slurm")
>>> print(script)
echo ’#!/bin/bash
srun hostname’ | sbatch --job-name=test --time=10:00 --mem=1000
>>> # Let’s launch the job
>>> os.system(script)
A glimpse to the SLURM scheduler user interface
How to check if a job is running?
To check if a job is running, you can use the squeue command.
$ squeue -u ‘whoami‘
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1225128 defq job-9999 someone PD 0:00 1 (Priority)
1225129 defq job-9998 someone PD 0:00 1 (Priority)
...
1224607 defq job-0003 someone R 7:39:16 1 node025
1224605 defq job-0002 someone R 7:43:25 1 node040
1224593 defq job-0001 someone R 8:06:33 1 node035
How to avoid launching running, queued or completed jobs?
With clusterlib, how to avoid re-launching ... ?
... running and queued jobs?
Let’s give to jobs a unique name, then let’s retrieve the names
of running / queued jobs.
>>> from clusterlib.scheduler import queued_or_running_jobs
>>> queued_or_running_jobs()
["job-0001", "job-0002", ..., "job-9999"]
... completed jobs?
A job must indicate through the file system that it has been
completed e.g. by creating a file or by registering completion into a
database.
clusterlib provides a small NO-SQL database based on sqlite3.
Simplicity beats complexity
With only 4 functions
from clusterlib.scheduler import queued_or_running_jobs
from clusterlib.scheduler import submit
from clusterlib.storage import sqlite3_dumps
from clusterlib.storage import sqlite3_loads
We launch easily thousands of jobs.
We are task DRY (Don’t repeat yourself).
We are smoothly working on SLURM and SGE schedulers.
We have only a dependency on python,
Are we done?
Are we done?
Let’s go open source!
Why making an open source library?
Give back to the open source community.
Open source initiative affiliate communities
Why making an open source library?
Bug reports are great!
Why making an open source library?
Welcome new contributors !
From left to right: Olivier Grisel, Antonio Sutera, Loic Esteve (no
photo) and Konstantin Petrov (no photo).
Why making open source in sciences? Reproducibility!
Be proud of your code
Open source way: Host your code publicly
www.myproject.com
Another awesome host platform
Tip Sit on the shoulders of a giant!
Bonus Use a control version system such as git (clusterlib
choice), mercurial,. . .
Open source way: Choose a license
No license = closed source
In short
You can’t use or even read my code.
Open source way: Choose a license
No license = closed source
GPL-like license = copyleft
In short
You can read / use / share / modify my code, but derivatives
must retains those rights.
Open source way: Choose a license
No license = closed source
GPL-like license = copyleft
BSD / MIT-style license = permissive
In short
Do whatever you want with the code, but keep my name with it.
Open source way: Choose a license
No license = closed source
GPL-like license = copyleft
BSD / MIT-style license = permissive
For the sake of open source, pick a popular open source license.
Open source way: Choose a license
No license = closed source
GPL-like license = copyleft
BSD / MIT-style license = permissive
For the sake of open source, pick a popular open source license.
For the sake of wisdom or money, choose carefully.
Open source way: Choose a license
No license = closed source
GPL-like license = copyleft
BSD / MIT-style license = permissive
For the sake of open source, pick a popular open source license.
For the sake of wisdom or money, choose carefully.
For the sake of science, go with BSD / MIT-style license.
Clusterlib is BSD Licensed.
Open source way: Start an issue tracker
An issue tracker allows managing and maintaining a list of
issues.
Tip Sit on the shoulders of giants! (again)
Open source way: Let users discuss with core contributors
1. Issue tracker (only viable for small project)
2. Mailing list : sourceforge, google groups, . . .
3. Stack overflow tag for big projects
Tip Sit on the shoulders of giants! (again * 2)
Are we done?
Are we done?
The grand seduction!
Know who you are!
Vision
The goal of the clusterlib is to ease the creation, launch and
management of embarrassingly parallel jobs on supercomputers
with schedulers such as SLURM and SGE.
Core values
Pure python, simple, user-friendly!
Attrative documentation
Readme
Attrative documentation
API documentation is nice.
Tip Follow a standard such as "PEP 0257 – Docstring
Conventions".
Attrative documentation
Beautiful doc is better.
clusterlib uses sphinx for building its doc.
Attrative documentation
And even better with examples.
Attrative documentation
A narrative documentation is awesome.
Attrative documentation
What’s new?
Thanks to all clusterlib contributors!
Appealing test suite
How good is your test suite (code coverage)?
All code lines are hit by the tests (100% line coverage), but not
all code paths (branches) are tested.
$ make test
nosetests clusterlib doc
Name Stmts Miss Branch BrMiss Cover Missing
------------------------------------------------------------------
clusterlib 1 0 0 0 100%
clusterlib.scheduler 82 0 40 16 87%
clusterlib.storage 35 0 14 0 100%
------------------------------------------------------------------
TOTAL 118 0 54 16 91%
------------------------------------------------------------------
Ran 12 tests in 0.189s
OK (SKIP=3)
Publicity
Are we done?
Are we done?
Not yet! Let’s make our live easier!
Automation, automation, . . .
Continuous testing
clusterlib uses Travis CI (works for many languages).
Tip Sit on the shoulders of giants! (again * 3)
Automation, automation, . . .
Continuous integration pays off in the long term!
Ensure test suite is often run. Awesome during development.
Tip Continuous integration can be enhanced with code
test coverage and code quality coverage.
Automation, automation, . . .
Continuous doc building
Tip Sit on the shoulders of giants! (again * 4)
Create and join an open source projects!
Join a community! Learn the best practices and technologies!
Make your code survive more than the one project! Give your
code to the world!
scikit-learn sprint 2014
A full example of clusterlib usage
# main.py
import sys, os
from clusterlib.storage import sqlite3_dumps
NOSQL_PATH = os.path.join(os.environ["HOME"], "job.sqlite3")
def main(argv=None):
# For ease here, function parameters are sys.argv
if argv is None:
argv = sys.argv
# Do heavy computation
# Save script evaluation on the hard disk
if __name__ == "__main__":
main()
# Great, the jobs is done!
sqlite3_dumps({" ".join(sys.argv): "JOB DONE"}, NOSQL_PATH)
A full example of clusterlib usage
# launcher.py
import sys
from clusterlib.scheduler import queued_or_running_jobs
from clusterlib.scheduler import submit
from clusterlib.storage import sqlite3_loads
from main import NOSQL_PATH
if __name__ == "__main__":
scheduled_jobs = set(queued_or_running_jobs())
done_jobs = sqlite3_loads(NOSQL_PATH)
for param in range(100):
job_name = "job-param=%s" % param
job_command = ("%s main.py --param %s"
% (sys.executable, param))
if (job_name not in scheduled_jobs and
job_command not in done_jobs):
script = submit(job_command, job_name=job_name)

More Related Content

What's hot

Reversing the dropbox client on windows
Reversing the dropbox client on windowsReversing the dropbox client on windows
Reversing the dropbox client on windowsextremecoders
 
PyParis 2017 / Pandas - What's new and whats coming - Joris van den Bossche
PyParis 2017 / Pandas - What's new and whats coming - Joris van den BosschePyParis 2017 / Pandas - What's new and whats coming - Joris van den Bossche
PyParis 2017 / Pandas - What's new and whats coming - Joris van den BosschePôle Systematic Paris-Region
 
[Harvard CS264] 02 - Parallel Thinking, Architecture, Theory & Patterns
[Harvard CS264] 02 - Parallel Thinking, Architecture, Theory & Patterns[Harvard CS264] 02 - Parallel Thinking, Architecture, Theory & Patterns
[Harvard CS264] 02 - Parallel Thinking, Architecture, Theory & Patternsnpinto
 
WAD : A Module for Converting Fatal Extension Errors into Python Exceptions
WAD : A Module for Converting Fatal Extension Errors into Python ExceptionsWAD : A Module for Converting Fatal Extension Errors into Python Exceptions
WAD : A Module for Converting Fatal Extension Errors into Python ExceptionsDavid Beazley (Dabeaz LLC)
 
Practicing Python 3
Practicing Python 3Practicing Python 3
Practicing Python 3Mosky Liu
 
Ekon 25 Python4Delphi_MX475
Ekon 25 Python4Delphi_MX475Ekon 25 Python4Delphi_MX475
Ekon 25 Python4Delphi_MX475Max Kleiner
 
Quality assurance of large c++ projects
Quality assurance of large c++ projectsQuality assurance of large c++ projects
Quality assurance of large c++ projectscorehard_by
 
Puppet DSL: back to the basics
Puppet DSL: back to the basicsPuppet DSL: back to the basics
Puppet DSL: back to the basicsJulien Pivotto
 
Developer-friendly taskqueues: What you should ask yourself before choosing one
Developer-friendly taskqueues: What you should ask yourself before choosing oneDeveloper-friendly taskqueues: What you should ask yourself before choosing one
Developer-friendly taskqueues: What you should ask yourself before choosing oneSylvain Zimmer
 
EKON 25 Python4Delphi_mX4
EKON 25 Python4Delphi_mX4EKON 25 Python4Delphi_mX4
EKON 25 Python4Delphi_mX4Max Kleiner
 
Python for IoT, A return of experience
Python for IoT, A return of experiencePython for IoT, A return of experience
Python for IoT, A return of experienceAlexandre Abadie
 
Sour Pickles
Sour PicklesSour Pickles
Sour PicklesSensePost
 
Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...
Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...
Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...David Beazley (Dabeaz LLC)
 
Introduction to IPython & Jupyter Notebooks
Introduction to IPython & Jupyter NotebooksIntroduction to IPython & Jupyter Notebooks
Introduction to IPython & Jupyter NotebooksEueung Mulyana
 
Data analytics in the cloud with Jupyter notebooks.
Data analytics in the cloud with Jupyter notebooks.Data analytics in the cloud with Jupyter notebooks.
Data analytics in the cloud with Jupyter notebooks.Graham Dumpleton
 
Using Python3 to Build a Cloud Computing Service for my Superboard II
Using Python3 to Build a Cloud Computing Service for my Superboard IIUsing Python3 to Build a Cloud Computing Service for my Superboard II
Using Python3 to Build a Cloud Computing Service for my Superboard IIDavid Beazley (Dabeaz LLC)
 
Tensorflow on Android
Tensorflow on AndroidTensorflow on Android
Tensorflow on AndroidKoan-Sin Tan
 

What's hot (20)

Reversing the dropbox client on windows
Reversing the dropbox client on windowsReversing the dropbox client on windows
Reversing the dropbox client on windows
 
PyParis 2017 / Pandas - What's new and whats coming - Joris van den Bossche
PyParis 2017 / Pandas - What's new and whats coming - Joris van den BosschePyParis 2017 / Pandas - What's new and whats coming - Joris van den Bossche
PyParis 2017 / Pandas - What's new and whats coming - Joris van den Bossche
 
[Harvard CS264] 02 - Parallel Thinking, Architecture, Theory & Patterns
[Harvard CS264] 02 - Parallel Thinking, Architecture, Theory & Patterns[Harvard CS264] 02 - Parallel Thinking, Architecture, Theory & Patterns
[Harvard CS264] 02 - Parallel Thinking, Architecture, Theory & Patterns
 
Doing the Impossible
Doing the ImpossibleDoing the Impossible
Doing the Impossible
 
WAD : A Module for Converting Fatal Extension Errors into Python Exceptions
WAD : A Module for Converting Fatal Extension Errors into Python ExceptionsWAD : A Module for Converting Fatal Extension Errors into Python Exceptions
WAD : A Module for Converting Fatal Extension Errors into Python Exceptions
 
Practicing Python 3
Practicing Python 3Practicing Python 3
Practicing Python 3
 
Ekon 25 Python4Delphi_MX475
Ekon 25 Python4Delphi_MX475Ekon 25 Python4Delphi_MX475
Ekon 25 Python4Delphi_MX475
 
Quality assurance of large c++ projects
Quality assurance of large c++ projectsQuality assurance of large c++ projects
Quality assurance of large c++ projects
 
Puppet DSL: back to the basics
Puppet DSL: back to the basicsPuppet DSL: back to the basics
Puppet DSL: back to the basics
 
Developer-friendly taskqueues: What you should ask yourself before choosing one
Developer-friendly taskqueues: What you should ask yourself before choosing oneDeveloper-friendly taskqueues: What you should ask yourself before choosing one
Developer-friendly taskqueues: What you should ask yourself before choosing one
 
EKON 25 Python4Delphi_mX4
EKON 25 Python4Delphi_mX4EKON 25 Python4Delphi_mX4
EKON 25 Python4Delphi_mX4
 
Python for IoT, A return of experience
Python for IoT, A return of experiencePython for IoT, A return of experience
Python for IoT, A return of experience
 
Sour Pickles
Sour PicklesSour Pickles
Sour Pickles
 
Perl-C/C++ Integration with Swig
Perl-C/C++ Integration with SwigPerl-C/C++ Integration with Swig
Perl-C/C++ Integration with Swig
 
Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...
Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...
Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...
 
Introduction to IPython & Jupyter Notebooks
Introduction to IPython & Jupyter NotebooksIntroduction to IPython & Jupyter Notebooks
Introduction to IPython & Jupyter Notebooks
 
Data analytics in the cloud with Jupyter notebooks.
Data analytics in the cloud with Jupyter notebooks.Data analytics in the cloud with Jupyter notebooks.
Data analytics in the cloud with Jupyter notebooks.
 
Is Python still production ready ? Ludovic Gasc
Is Python still production ready ? Ludovic GascIs Python still production ready ? Ludovic Gasc
Is Python still production ready ? Ludovic Gasc
 
Using Python3 to Build a Cloud Computing Service for my Superboard II
Using Python3 to Build a Cloud Computing Service for my Superboard IIUsing Python3 to Build a Cloud Computing Service for my Superboard II
Using Python3 to Build a Cloud Computing Service for my Superboard II
 
Tensorflow on Android
Tensorflow on AndroidTensorflow on Android
Tensorflow on Android
 

Viewers also liked

Open Source Technology for Libraries
Open Source Technology for LibrariesOpen Source Technology for Libraries
Open Source Technology for LibrariesNicole C. Engard
 
Practical Open Source Software for Libraries (part 1)
Practical Open Source Software for Libraries (part 1)Practical Open Source Software for Libraries (part 1)
Practical Open Source Software for Libraries (part 1)Nicole C. Engard
 
Open Source Software and Libraries
Open Source Software and LibrariesOpen Source Software and Libraries
Open Source Software and LibrariesEllyssa Kroski
 
Power Point Presentation on Open Source Software
Power Point Presentation on Open Source Software Power Point Presentation on Open Source Software
Power Point Presentation on Open Source Software opensourceacademy
 
Open Source Software Presentation
Open Source Software PresentationOpen Source Software Presentation
Open Source Software PresentationHenry Briggs
 
OPEN SOURCE SEMINAR PRESENTATION
OPEN SOURCE SEMINAR PRESENTATIONOPEN SOURCE SEMINAR PRESENTATION
OPEN SOURCE SEMINAR PRESENTATIONRitwick Halder
 

Viewers also liked (7)

Open Source Technology for Libraries
Open Source Technology for LibrariesOpen Source Technology for Libraries
Open Source Technology for Libraries
 
Practical Open Source Software for Libraries (part 1)
Practical Open Source Software for Libraries (part 1)Practical Open Source Software for Libraries (part 1)
Practical Open Source Software for Libraries (part 1)
 
Open Source Software and Libraries
Open Source Software and LibrariesOpen Source Software and Libraries
Open Source Software and Libraries
 
Power Point Presentation on Open Source Software
Power Point Presentation on Open Source Software Power Point Presentation on Open Source Software
Power Point Presentation on Open Source Software
 
Open Source Technology
Open Source TechnologyOpen Source Technology
Open Source Technology
 
Open Source Software Presentation
Open Source Software PresentationOpen Source Software Presentation
Open Source Software Presentation
 
OPEN SOURCE SEMINAR PRESENTATION
OPEN SOURCE SEMINAR PRESENTATIONOPEN SOURCE SEMINAR PRESENTATION
OPEN SOURCE SEMINAR PRESENTATION
 

Similar to The genesis of clusterlib - An open source library to tame your favourite supercomputer

Open frameworks 101_fitc
Open frameworks 101_fitcOpen frameworks 101_fitc
Open frameworks 101_fitcbenDesigning
 
Hands-on Lab: Red Hat Container Development & OpenShift
Hands-on Lab: Red Hat Container Development & OpenShiftHands-on Lab: Red Hat Container Development & OpenShift
Hands-on Lab: Red Hat Container Development & OpenShiftAmazon Web Services
 
Python testing like a pro by Keith Yang
Python testing like a pro by Keith YangPython testing like a pro by Keith Yang
Python testing like a pro by Keith YangPYCON MY PLT
 
Hacking the Kinect with GAFFTA Day 1
Hacking the Kinect with GAFFTA Day 1Hacking the Kinect with GAFFTA Day 1
Hacking the Kinect with GAFFTA Day 1benDesigning
 
"The OpenCV Open Source Computer Vision Library: Latest Developments," a Pres...
"The OpenCV Open Source Computer Vision Library: Latest Developments," a Pres..."The OpenCV Open Source Computer Vision Library: Latest Developments," a Pres...
"The OpenCV Open Source Computer Vision Library: Latest Developments," a Pres...Edge AI and Vision Alliance
 
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Luciano Resende
 
What is Python? An overview of Python for science.
What is Python? An overview of Python for science.What is Python? An overview of Python for science.
What is Python? An overview of Python for science.Nicholas Pringle
 
MobileConf 2021 Slides: Let's build macOS CLI Utilities using Swift
MobileConf 2021 Slides:  Let's build macOS CLI Utilities using SwiftMobileConf 2021 Slides:  Let's build macOS CLI Utilities using Swift
MobileConf 2021 Slides: Let's build macOS CLI Utilities using SwiftDiego Freniche Brito
 
OptView2 - C++ on Sea 2022
OptView2 - C++ on Sea 2022OptView2 - C++ on Sea 2022
OptView2 - C++ on Sea 2022Ofek Shilon
 
PyCon2022 - Building Python Extensions
PyCon2022 - Building Python ExtensionsPyCon2022 - Building Python Extensions
PyCon2022 - Building Python ExtensionsHenry Schreiner
 
python-160403194316.pdf
python-160403194316.pdfpython-160403194316.pdf
python-160403194316.pdfgmadhu8
 
Python Seminar PPT
Python Seminar PPTPython Seminar PPT
Python Seminar PPTShivam Gupta
 
Microservices Application Tracing Standards and Simulators - Adrians at OSCON
Microservices Application Tracing Standards and Simulators - Adrians at OSCONMicroservices Application Tracing Standards and Simulators - Adrians at OSCON
Microservices Application Tracing Standards and Simulators - Adrians at OSCONAdrian Cockcroft
 

Similar to The genesis of clusterlib - An open source library to tame your favourite supercomputer (20)

Open frameworks 101_fitc
Open frameworks 101_fitcOpen frameworks 101_fitc
Open frameworks 101_fitc
 
Hands-on Lab: Red Hat Container Development & OpenShift
Hands-on Lab: Red Hat Container Development & OpenShiftHands-on Lab: Red Hat Container Development & OpenShift
Hands-on Lab: Red Hat Container Development & OpenShift
 
Python testing like a pro by Keith Yang
Python testing like a pro by Keith YangPython testing like a pro by Keith Yang
Python testing like a pro by Keith Yang
 
Hacking the Kinect with GAFFTA Day 1
Hacking the Kinect with GAFFTA Day 1Hacking the Kinect with GAFFTA Day 1
Hacking the Kinect with GAFFTA Day 1
 
"The OpenCV Open Source Computer Vision Library: Latest Developments," a Pres...
"The OpenCV Open Source Computer Vision Library: Latest Developments," a Pres..."The OpenCV Open Source Computer Vision Library: Latest Developments," a Pres...
"The OpenCV Open Source Computer Vision Library: Latest Developments," a Pres...
 
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
 
What is Python? An overview of Python for science.
What is Python? An overview of Python for science.What is Python? An overview of Python for science.
What is Python? An overview of Python for science.
 
MobileConf 2021 Slides: Let's build macOS CLI Utilities using Swift
MobileConf 2021 Slides:  Let's build macOS CLI Utilities using SwiftMobileConf 2021 Slides:  Let's build macOS CLI Utilities using Swift
MobileConf 2021 Slides: Let's build macOS CLI Utilities using Swift
 
05 python.pdf
05 python.pdf05 python.pdf
05 python.pdf
 
OptView2 - C++ on Sea 2022
OptView2 - C++ on Sea 2022OptView2 - C++ on Sea 2022
OptView2 - C++ on Sea 2022
 
Spock Framework
Spock FrameworkSpock Framework
Spock Framework
 
Software Engineering
Software EngineeringSoftware Engineering
Software Engineering
 
PyCon2022 - Building Python Extensions
PyCon2022 - Building Python ExtensionsPyCon2022 - Building Python Extensions
PyCon2022 - Building Python Extensions
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
 
python-160403194316.pdf
python-160403194316.pdfpython-160403194316.pdf
python-160403194316.pdf
 
Python
PythonPython
Python
 
Python Seminar PPT
Python Seminar PPTPython Seminar PPT
Python Seminar PPT
 
HPC Examples
HPC ExamplesHPC Examples
HPC Examples
 
Kubernetes 101
Kubernetes 101Kubernetes 101
Kubernetes 101
 
Microservices Application Tracing Standards and Simulators - Adrians at OSCON
Microservices Application Tracing Standards and Simulators - Adrians at OSCONMicroservices Application Tracing Standards and Simulators - Adrians at OSCON
Microservices Application Tracing Standards and Simulators - Adrians at OSCON
 

Recently uploaded

AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfayushiqss
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfkalichargn70th171
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfVishalKumarJha10
 
Pharm-D Biostatistics and Research methodology
Pharm-D Biostatistics and Research methodologyPharm-D Biostatistics and Research methodology
Pharm-D Biostatistics and Research methodologyAnusha Are
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 

Recently uploaded (20)

AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
Pharm-D Biostatistics and Research methodology
Pharm-D Biostatistics and Research methodologyPharm-D Biostatistics and Research methodology
Pharm-D Biostatistics and Research methodology
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 

The genesis of clusterlib - An open source library to tame your favourite supercomputer

  • 1. The genesis of clusterlib An open source library to tame your favourite supercomputer Arnaud Joly May 2015, Phd hour discussions
  • 2. Use case for the birth of clusterlib Solving supervised learning tasks The goal of supervised learning is to learn a function from input-output pairs in order to predict the output for any new input. A supervised learning task 1.5 1.0 0.5 0.0 0.5 1.0 1.5 1.5 1.0 0.5 0.0 0.5 1.0 1.5 A learnt function 1.5 1.0 0.5 0.0 0.5 1.0 1.5 1.5 1.0 0.5 0.0 0.5 1.0 1.5
  • 3. Time is running out A huge set of tasks and a practical upper bound O(#datasets×#algorithms×#parameters) ≤ Time before deadline Embarrassingly parallel tasks.
  • 4. Supercomputer = scheduler + workers Supercomputer = cluster of computers
  • 5. CECI is in ! Thanks to the CECI1 at the University of Liège, we have access to 7 supercomputers (or clusters of computers); ≈ 20 000 cores; >60 000 GB ram; 12 high performance scientific GPU. 1 The CECI (http: // www. ceci-hpc. be/ ) is a supercomputer consortium funded by the FNRS.
  • 6. A glimpse to the SLURM scheduler user interface How to sumit a job? First, we need to write a bash file job.sh which specifies resource requirements and calls the program. #!/bin/bash #SBATCH --job-name=job-name #SBATCH --time=10:00 #SBATCH --mem=1000 srun hostname Then you can launch the job in the queue $ sbatch job.sh How to launch easily many jobs?
  • 7. With clusterlib, how to launch easily many jobs?? Let’s generate jobs submission command on the fly! >>> from clusterlib.scheduler import submit >>> script = submit(job_command="srun hostname", ... job_name="test", ... time="10:00", ... memory=1000, ... backend="slurm") >>> print(script) echo ’#!/bin/bash srun hostname’ | sbatch --job-name=test --time=10:00 --mem=1000 >>> # Let’s launch the job >>> os.system(script)
  • 8. A glimpse to the SLURM scheduler user interface How to check if a job is running? To check if a job is running, you can use the squeue command. $ squeue -u ‘whoami‘ JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 1225128 defq job-9999 someone PD 0:00 1 (Priority) 1225129 defq job-9998 someone PD 0:00 1 (Priority) ... 1224607 defq job-0003 someone R 7:39:16 1 node025 1224605 defq job-0002 someone R 7:43:25 1 node040 1224593 defq job-0001 someone R 8:06:33 1 node035 How to avoid launching running, queued or completed jobs?
  • 9. With clusterlib, how to avoid re-launching ... ? ... running and queued jobs? Let’s give to jobs a unique name, then let’s retrieve the names of running / queued jobs. >>> from clusterlib.scheduler import queued_or_running_jobs >>> queued_or_running_jobs() ["job-0001", "job-0002", ..., "job-9999"] ... completed jobs? A job must indicate through the file system that it has been completed e.g. by creating a file or by registering completion into a database. clusterlib provides a small NO-SQL database based on sqlite3.
  • 10. Simplicity beats complexity With only 4 functions from clusterlib.scheduler import queued_or_running_jobs from clusterlib.scheduler import submit from clusterlib.storage import sqlite3_dumps from clusterlib.storage import sqlite3_loads We launch easily thousands of jobs. We are task DRY (Don’t repeat yourself). We are smoothly working on SLURM and SGE schedulers. We have only a dependency on python,
  • 12. Are we done? Let’s go open source!
  • 13. Why making an open source library? Give back to the open source community. Open source initiative affiliate communities
  • 14. Why making an open source library? Bug reports are great!
  • 15. Why making an open source library? Welcome new contributors ! From left to right: Olivier Grisel, Antonio Sutera, Loic Esteve (no photo) and Konstantin Petrov (no photo).
  • 16. Why making open source in sciences? Reproducibility!
  • 17. Be proud of your code
  • 18. Open source way: Host your code publicly www.myproject.com Another awesome host platform Tip Sit on the shoulders of a giant! Bonus Use a control version system such as git (clusterlib choice), mercurial,. . .
  • 19. Open source way: Choose a license No license = closed source In short You can’t use or even read my code.
  • 20. Open source way: Choose a license No license = closed source GPL-like license = copyleft In short You can read / use / share / modify my code, but derivatives must retains those rights.
  • 21. Open source way: Choose a license No license = closed source GPL-like license = copyleft BSD / MIT-style license = permissive In short Do whatever you want with the code, but keep my name with it.
  • 22. Open source way: Choose a license No license = closed source GPL-like license = copyleft BSD / MIT-style license = permissive For the sake of open source, pick a popular open source license.
  • 23. Open source way: Choose a license No license = closed source GPL-like license = copyleft BSD / MIT-style license = permissive For the sake of open source, pick a popular open source license. For the sake of wisdom or money, choose carefully.
  • 24. Open source way: Choose a license No license = closed source GPL-like license = copyleft BSD / MIT-style license = permissive For the sake of open source, pick a popular open source license. For the sake of wisdom or money, choose carefully. For the sake of science, go with BSD / MIT-style license. Clusterlib is BSD Licensed.
  • 25. Open source way: Start an issue tracker An issue tracker allows managing and maintaining a list of issues. Tip Sit on the shoulders of giants! (again)
  • 26. Open source way: Let users discuss with core contributors 1. Issue tracker (only viable for small project) 2. Mailing list : sourceforge, google groups, . . . 3. Stack overflow tag for big projects Tip Sit on the shoulders of giants! (again * 2)
  • 28. Are we done? The grand seduction!
  • 29. Know who you are! Vision The goal of the clusterlib is to ease the creation, launch and management of embarrassingly parallel jobs on supercomputers with schedulers such as SLURM and SGE. Core values Pure python, simple, user-friendly!
  • 31. Attrative documentation API documentation is nice. Tip Follow a standard such as "PEP 0257 – Docstring Conventions".
  • 32. Attrative documentation Beautiful doc is better. clusterlib uses sphinx for building its doc.
  • 33. Attrative documentation And even better with examples.
  • 34. Attrative documentation A narrative documentation is awesome.
  • 35. Attrative documentation What’s new? Thanks to all clusterlib contributors!
  • 36. Appealing test suite How good is your test suite (code coverage)? All code lines are hit by the tests (100% line coverage), but not all code paths (branches) are tested. $ make test nosetests clusterlib doc Name Stmts Miss Branch BrMiss Cover Missing ------------------------------------------------------------------ clusterlib 1 0 0 0 100% clusterlib.scheduler 82 0 40 16 87% clusterlib.storage 35 0 14 0 100% ------------------------------------------------------------------ TOTAL 118 0 54 16 91% ------------------------------------------------------------------ Ran 12 tests in 0.189s OK (SKIP=3)
  • 39. Are we done? Not yet! Let’s make our live easier!
  • 40. Automation, automation, . . . Continuous testing clusterlib uses Travis CI (works for many languages). Tip Sit on the shoulders of giants! (again * 3)
  • 41. Automation, automation, . . . Continuous integration pays off in the long term! Ensure test suite is often run. Awesome during development. Tip Continuous integration can be enhanced with code test coverage and code quality coverage.
  • 42. Automation, automation, . . . Continuous doc building Tip Sit on the shoulders of giants! (again * 4)
  • 43. Create and join an open source projects! Join a community! Learn the best practices and technologies! Make your code survive more than the one project! Give your code to the world! scikit-learn sprint 2014
  • 44. A full example of clusterlib usage # main.py import sys, os from clusterlib.storage import sqlite3_dumps NOSQL_PATH = os.path.join(os.environ["HOME"], "job.sqlite3") def main(argv=None): # For ease here, function parameters are sys.argv if argv is None: argv = sys.argv # Do heavy computation # Save script evaluation on the hard disk if __name__ == "__main__": main() # Great, the jobs is done! sqlite3_dumps({" ".join(sys.argv): "JOB DONE"}, NOSQL_PATH)
  • 45. A full example of clusterlib usage # launcher.py import sys from clusterlib.scheduler import queued_or_running_jobs from clusterlib.scheduler import submit from clusterlib.storage import sqlite3_loads from main import NOSQL_PATH if __name__ == "__main__": scheduled_jobs = set(queued_or_running_jobs()) done_jobs = sqlite3_loads(NOSQL_PATH) for param in range(100): job_name = "job-param=%s" % param job_command = ("%s main.py --param %s" % (sys.executable, param)) if (job_name not in scheduled_jobs and job_command not in done_jobs): script = submit(job_command, job_name=job_name)