This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101017529.
Copernicus - eoSC AnaLytics Engine
C-SCALE tutorial: Snakemake
Sebastian Luna-Valero, EGI Foundation
sebastian.luna.valero@egi.eu
C-SCALE tutorial: Snakemake | 29th November 2022 | Online
Outline
• Why workflows?
• Why snakemake?
• Let’s build a workflow!
Why workflows?
Credits: https://github.com/c-scale-community/use-case-hisea
Goals:
● from raw data to figures
○ with “one click”
● re-run with new config
○ spatial scale
○ temporal scale
● re-run half-way through
○ recover from issues
● dependency management
○ between tasks
○ software packages
Why workflows?
When to build a workflow?
● Re-run the same analysis over and over again, with different input parameters
● Ability to re-run the work partially; recover from intermediate failures
● Combine heterogeneous tooling in the same analysis
○ Python, R, Julia, Docker, Bash, etc.
Why snakemake?
• Mature workflow management system.
• Great community around it.
• Easy to learn? :)
• A Snakemake workflow scales without modification from single-core workstations and
multi-core servers to batch systems (e.g. Slurm).
• Snakemake integrates with the package manager Conda and the container engine
Singularity such that defining the software stack becomes part of the workflow itself.
• Further information: https://snakemake.readthedocs.io/
Let’s build a workflow!
• Snakemake follows the GNU Make paradigm: workflows are defined in terms of rules that
define how to create output files from input files.
• $ snakemake --cores 1
• The application of a rule to generate a set of output files is called a job.
Snakefile:
rule count_countries:
    input:
        "european-countries.txt"
    output:
        "number-of-countries.txt"
    shell:
        "wc --lines european-countries.txt > number-of-countries.txt"

$ cat european-countries.txt
Netherlands
Greece
Spain
Portugal
Italy
Poland
Austria
Let’s build a workflow!
• Snakemake follows the GNU Make paradigm: workflows are defined in terms of rules that
define how to create output files from input files.
• $ snakemake --cores 1
• Snakemake only re-runs jobs if one of the input files is newer than one of the output files
or one of the input files will be updated by another job.
Snakefile:
rule count_countries:
    input:
        "european-countries.txt"
    output:
        "number-of-countries.txt"
    shell:
        "wc --lines european-countries.txt > number-of-countries.txt"

$ cat european-countries.txt
Netherlands
Greece
Spain
Portugal
Italy
Poland
Austria
Belgium
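The re-run criterion on this slide can be sketched in plain Python as a comparison of file modification times. This is only an illustration: `needs_rerun` and `demo` are hypothetical helpers, not part of Snakemake, and Snakemake itself may take further criteria into account.

```python
import os
import tempfile


def needs_rerun(inputs, outputs):
    """Return True if an output is missing or older than the newest input."""
    if not all(os.path.exists(path) for path in outputs):
        return True
    newest_input = max(os.path.getmtime(path) for path in inputs)
    oldest_output = min(os.path.getmtime(path) for path in outputs)
    return newest_input > oldest_output


def demo():
    """Walk through three states: no output, fresh output, updated input."""
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "european-countries.txt")
        result = os.path.join(workdir, "number-of-countries.txt")
        with open(source, "w") as handle:
            handle.write("Netherlands\nGreece\n")
        before_first_run = needs_rerun([source], [result])  # output missing

        with open(result, "w") as handle:
            handle.write("2\n")
        # Set explicit timestamps so the demo does not depend on timing.
        os.utime(source, (1000, 1000))
        os.utime(result, (2000, 2000))
        after_first_run = needs_rerun([source], [result])

        os.utime(source, (3000, 3000))  # simulates adding Belgium to the list
        after_input_change = needs_rerun([source], [result])
        return before_first_run, after_first_run, after_input_change


if __name__ == "__main__":
    print(demo())
```

Adding Belgium to the input file makes the input newer than the output, which is why the job runs again.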
Let’s build a workflow!
• Generalize the rule:
• $ snakemake --cores 1
• $ wc --lines european-countries.txt > number-of-countries.txt
Snakefile:
rule count_countries:
    input:
        "european-countries.txt"
    output:
        "number-of-countries.txt"
    shell:
        "wc --lines {input} > {output}"

$ cat european-countries.txt
Netherlands
Greece
Spain
Portugal
Italy
Poland
Austria
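How the `{input}` and `{output}` placeholders turn into the executed command can be mimicked with ordinary string formatting. This is a minimal sketch of the simplest case only; `render_command` is a hypothetical helper, and Snakemake's own formatting supports more than is shown here.

```python
def render_command(template, inputs, outputs):
    """Fill {input} and {output} with space-separated file names."""
    return template.format(input=" ".join(inputs), output=" ".join(outputs))


single = render_command(
    "wc --lines {input} > {output}",
    inputs=["european-countries.txt"],
    outputs=["number-of-countries.txt"],
)
several = render_command(
    "wc --lines {input} > {output}",
    inputs=["european-countries.txt", "other-countries.txt"],
    outputs=["number-of-countries.txt"],
)

if __name__ == "__main__":
    print(single)
    print(several)
```

With several input files, the names are joined with spaces, so the same shell line keeps working.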
Let’s build a workflow!
• Adding more than one input file:
• $ snakemake --cores 1
• $ wc --lines european-countries.txt other-countries.txt > number-of-countries.txt
Snakefile:
rule count_countries:
    input:
        "european-countries.txt",
        "other-countries.txt"
    output:
        "number-of-countries.txt"
    shell:
        "wc --lines {input} > {output}"

$ cat european-countries.txt
Netherlands
Greece
Spain
Portugal
Italy
Poland
Austria

$ cat other-countries.txt
US
Canada
Let’s build a workflow!
• It’s better to organize your working directory:
• $ snakemake --cores 1
Snakefile:
rule count_countries:
    input:
        "countries/european-countries.txt",
        "countries/other-countries.txt"
    output:
        "stats/number-of-countries.txt"
    shell:
        "wc --lines {input} > {output}"

$ cat european-countries.txt
Netherlands
Greece
Spain
Portugal
Italy
Poland
Austria

$ cat other-countries.txt
US
Canada
Let’s build a workflow!
• Connecting rules! Targets can be rule names or output files.
• $ snakemake --cores 1 <target>
Snakefile:
rule pre_processing:
    input:
        "stats/number-of-countries.txt"
    output:
        "pre-processing.done"
    shell:
        "touch pre-processing.done"

rule count_countries:
    input:
        "countries/european-countries.txt",
        "countries/other-countries.txt"
    output:
        "stats/number-of-countries.txt"
    shell:
        "wc --lines {input} > {output}"

$ cat european-countries.txt
Netherlands
Greece
Spain
Portugal
Italy
Poland
Austria

$ cat other-countries.txt
US
Canada
Let’s build a workflow!
• Updating intermediate files (however, see Snakemake issues #1978 and #2011)
• $ snakemake --cores 1 <target>
Snakefile:
rule pre_processing:
    input:
        "stats/number-of-countries.txt"
    output:
        "pre-processing.done"
    shell:
        "touch pre-processing.done"

rule count_countries:
    input:
        "countries/european-countries.txt",
        "countries/other-countries.txt"
    output:
        "stats/number-of-countries.txt"
    shell:
        "wc --lines {input} > {output}"

$ cat european-countries.txt
Netherlands
Greece
Spain
Portugal
Italy
Poland
Austria
Belgium

$ cat other-countries.txt
US
Canada
Let’s build a workflow!
• Dependencies between the rules are determined automatically, creating a directed acyclic graph (DAG) of jobs
• $ snakemake --cores 1 --dag | dot -Tsvg > dag.svg
Snakefile:
rule pre_processing:
    input:
        "stats/number-of-countries.txt"
    output:
        "pre-processing.done"
    shell:
        "touch pre-processing.done"

rule count_countries:
    input:
        "countries/european-countries.txt",
        "countries/other-countries.txt"
    output:
        "stats/number-of-countries.txt"
    shell:
        "wc --lines {input} > {output}"
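The way file names link the two rules into a graph can be sketched with the standard library (`graphlib`, Python 3.9 or later). The `RULES` dictionary and `build_dag` are hypothetical stand-ins for the Snakefile above, not Snakemake internals.

```python
from graphlib import TopologicalSorter

# Hypothetical plain-Python description of the two rules in the Snakefile.
RULES = {
    "pre_processing": {
        "input": ["stats/number-of-countries.txt"],
        "output": ["pre-processing.done"],
    },
    "count_countries": {
        "input": [
            "countries/european-countries.txt",
            "countries/other-countries.txt",
        ],
        "output": ["stats/number-of-countries.txt"],
    },
}


def build_dag(rules):
    """Map each rule to the set of rules that produce its input files."""
    producer = {
        path: name for name, rule in rules.items() for path in rule["output"]
    }
    return {
        name: {producer[path] for path in rule["input"] if path in producer}
        for name, rule in rules.items()
    }


dag = build_dag(RULES)
# static_order() yields every rule after the rules it depends on.
order = list(TopologicalSorter(dag).static_order())

if __name__ == "__main__":
    print(order)
```

Here `pre_processing` depends on `count_countries` only because one rule's output file is the other rule's input file; no explicit ordering is written anywhere.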
Let’s build a workflow!
• Python
• $ snakemake --cores 1 <target>
Snakefile:
rule pre_processing:
    input:
        "stats/number-of-countries.txt"
    output:
        "pre-processing.done"
    shell:
        "python myscript.py --input stats/number-of-countries.txt"

rule count_countries:
    input:
        "countries/european-countries.txt",
        "countries/other-countries.txt"
    output:
        "stats/number-of-countries.txt"
    shell:
        "wc --lines {input} > {output}"
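The slides do not show `myscript.py`, so the following is a purely hypothetical sketch of what such a script could look like: it accepts `--input`, parses the `wc --lines` output, and creates the marker file that the rule declares as output. The `--output` option and the parsing logic are assumptions made for illustration.

```python
import argparse
import tempfile
from pathlib import Path


def parse_counts(text):
    """Parse `wc --lines` output into a {name: count} dictionary."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        count, name = line.split(maxsplit=1)
        counts[name] = int(count)
    return counts


def main(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical pre-processing step")
    parser.add_argument("--input", required=True,
                        help="file written by count_countries")
    parser.add_argument("--output", default="pre-processing.done",
                        help="marker file to create (assumed option)")
    args = parser.parse_args(argv)

    counts = parse_counts(Path(args.input).read_text())
    for name, count in counts.items():
        print(f"{name}: {count}")
    Path(args.output).touch()  # the file the rule lists under output
    return counts


if __name__ == "__main__":
    # Demonstration in a temporary directory; a real script would call main().
    with tempfile.TemporaryDirectory() as workdir:
        stats = Path(workdir) / "number-of-countries.txt"
        stats.write_text(" 7 european-countries.txt\n 2 other-countries.txt\n 9 total\n")
        marker = Path(workdir) / "pre-processing.done"
        main(["--input", str(stats), "--output", str(marker)])
        print("marker created:", marker.exists())
```

Whatever the script does, it has to create every file listed under `output`, otherwise the job is reported as failed.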
Let’s build a workflow!
• Containers
• $ snakemake --cores 1 <target>
Snakefile:
rule pre_processing:
    input:
        "stats/number-of-countries.txt"
    output:
        "pre-processing.done"
    shell:
        "udocker run example"

rule count_countries:
    input:
        "countries/european-countries.txt",
        "countries/other-countries.txt"
    output:
        "stats/number-of-countries.txt"
    shell:
        "wc --lines {input} > {output}"
Let’s build a workflow!
• Pre-built support for Singularity (see docs for more details)
Snakefile:
rule pre_processing:
    input:
        "stats/number-of-countries.txt"
    output:
        "pre-processing.done"
    container:
        "docker://repo/image"
    script:
        "scripts/plot.R"

rule count_countries:
    input:
        "countries/european-countries.txt",
        "countries/other-countries.txt"
    output:
        "stats/number-of-countries.txt"
    shell:
        "wc --lines {input} > {output}"
Let’s build a workflow!
• Configuration
• $ snakemake --cores 1
Snakefile:
configfile: "config.yaml"

rule count_countries:
    input:
        expand("{input}", input=config['european']),
        expand("{input}", input=config['other'])
    output:
        "stats/number-of-countries.txt"
    shell:
        "wc --lines {input} > {output}"

$ cat config.yaml
european: 'countries/european-countries.txt'
other: 'countries/other-countries.txt'

$ cat european-countries.txt
Netherlands
Greece
Spain
Portugal
Italy
Poland
Austria

$ cat other-countries.txt
US
Canada
Let’s build a workflow!
• Logging
• $ snakemake --cores 1
Snakefile:
rule count_countries:
    input:
        "countries/european-countries.txt",
        "countries/other-countries.txt"
    output:
        "stats/number-of-countries.txt"
    log:
        "logs/count_countries.log"
    shell:
        "wc --lines {input} > {output} 2> {log}"

$ cat european-countries.txt
Netherlands
Greece
Spain
Portugal
Italy
Poland
Austria

$ cat other-countries.txt
US
Canada
Let’s build a workflow!
• Benchmarking
• $ snakemake --cores 1
Snakefile:
rule count_countries:
    input:
        "countries/european-countries.txt",
        "countries/other-countries.txt"
    output:
        "stats/number-of-countries.txt"
    benchmark:
        "benchmarks/count_countries.txt"
    shell:
        "wc --lines {input} > {output}"

$ cat european-countries.txt
Netherlands
Greece
Spain
Portugal
Italy
Poland
Austria

$ cat other-countries.txt
US
Canada
Let’s build a workflow!
• Modularization
rules/count_countries.smk:
rule count_countries:
    input:
        "countries/european-countries.txt",
        "countries/other-countries.txt"
    output:
        "stats/number-of-countries.txt"
    shell:
        "wc --lines {input} > {output}"

Snakefile:
include: "rules/count_countries.smk"

rule pre_processing:
    input:
        "stats/number-of-countries.txt"
    output:
        "pre-processing.done"
    shell:
        "touch pre-processing.done"

$ cat european-countries.txt
Netherlands
Greece
Spain
Portugal
Italy
Poland
Austria

$ cat other-countries.txt
US
Canada
Let’s build a workflow!
• Integration with conda
• $ snakemake --cores 1 --use-conda --conda-frontend mamba
Snakefile:
rule count_countries:
    input:
        "countries/european-countries.txt",
        "countries/other-countries.txt"
    output:
        "stats/number-of-countries.txt"
    conda:
        "envs/count_countries.yaml"
    shell:
        "wc --lines {input} > {output}"

$ cat envs/count_countries.yaml
name: count_countries
channels:
  - conda-forge
  - defaults
dependencies:
  - coreutils

$ cat european-countries.txt
Netherlands
Greece
Spain
Portugal
Italy
Poland
Austria

$ cat other-countries.txt
US
Canada
Let’s build a workflow!
• Other examples
• https://github.com/c-scale-community/c-scale-tutorial-snakemake
• https://github.com/c-scale-community/use-case-hisea/pull/41/files
Let’s build a workflow!
• Advanced features
• Pre-built functionality for scatter-gather jobs
• Cluster execution: snakemake --cluster qsub (see SLURM docs)
• Self-contained HTML reports
• Accessing remote storage:
• Amazon S3, Google Cloud Storage, Microsoft Azure Blob Storage
• SFTP, HTTP, FTP, Dropbox, XRootD, WebDAV, GFAL, GridFTP, iRODS, etc.
• Best practices
• https://snakemake.readthedocs.io/en/stable/snakefiles/best_practices.html
• FAQs: https://snakemake.readthedocs.io/en/stable/project_info/faq.html
Thank you for your attention.
contact@c-scale.eu
https://c-scale.eu
@C_SCALE_EU
Let’s build a workflow!
• Wildcards example:
• $ snakemake --cores 1 stats/number-of-european-countries.txt
• $ snakemake --cores 1 stats/number-of-other-countries.txt
Snakefile:
rule count_countries:
    input:
        "countries/{category}-countries.txt"
    output:
        "stats/number-of-{category}-countries.txt"
    shell:
        "wc --lines {input} > {output}"

$ cat list-of-countries.txt
Netherlands
Greece
Spain
Portugal
Italy
Poland
Austria
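How a requested file name is matched against the output pattern, and how the matched wildcard then determines the input file, can be sketched with a regular expression. `pattern_to_regex` and `resolve` are hypothetical helpers; this is a simplified illustration that assumes each wildcard name occurs once per pattern, not Snakemake's actual matching code.

```python
import re


def pattern_to_regex(pattern):
    """Turn 'dir/{name}-x.txt' into a regex with one named group per wildcard."""
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        # Even positions hold literal text, odd positions hold wildcard names.
        regex += re.escape(part) if index % 2 == 0 else f"(?P<{part}>.+)"
    return re.compile(regex)


def resolve(target, output_pattern, input_pattern):
    """Return the input file needed for `target`, or None if it does not match."""
    match = pattern_to_regex(output_pattern).fullmatch(target)
    if match is None:
        return None
    return input_pattern.format(**match.groupdict())


if __name__ == "__main__":
    print(resolve(
        "stats/number-of-european-countries.txt",
        "stats/number-of-{category}-countries.txt",
        "countries/{category}-countries.txt",
    ))
```

Requesting `stats/number-of-other-countries.txt` fills `{category}` with `other` in the same way, so one rule serves both targets.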
Let’s build a workflow!
• Many to many with glob_wildcards:
• $ snakemake --cores 1
Snakefile:
CATEGORIES, = glob_wildcards("countries/{category}-countries.txt")
print(CATEGORIES)

rule all:
    input:
        expand("stats/number-of-{category}-countries.txt", category=CATEGORIES)

rule count_countries:
    input:
        "countries/{category}-countries.txt"
    output:
        "stats/number-of-{category}-countries.txt"
    shell:
        "wc --lines {input} > {output}"

$ cat list-of-countries.txt
Netherlands
Greece
Spain
Portugal
Italy
Poland
Austria
[Figure: inputs input-1, input-2, ..., input-n mapped to outputs output-1, output-2, ..., output-n]
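The combination of discovering categories from existing files and expanding them into target names can be imitated with the standard library. `find_categories` and `expand_targets` are hypothetical stand-ins written for this sketch; they only approximate what `glob_wildcards` and `expand` do in the Snakefile above.

```python
import re
import tempfile
from pathlib import Path


def find_categories(directory):
    """Collect the category part of every '<category>-countries.txt' file."""
    pattern = re.compile(r"(?P<category>.+)-countries\.txt")
    matches = (pattern.fullmatch(path.name) for path in Path(directory).iterdir())
    return sorted(match.group("category") for match in matches if match)


def expand_targets(template, categories):
    """Fill the {category} placeholder once per discovered category."""
    return [template.format(category=category) for category in categories]


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        countries = Path(workdir) / "countries"
        countries.mkdir()
        for name in ("european", "other"):
            (countries / f"{name}-countries.txt").write_text("placeholder\n")
        (countries / "README.md").write_text("ignored\n")  # does not match
        categories = find_categories(countries)
    return expand_targets("stats/number-of-{category}-countries.txt", categories)


if __name__ == "__main__":
    print(demo())
```

Dropping a new `<category>-countries.txt` file into the directory adds one more target without touching the workflow definition.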
Let’s build a workflow!
• Dependencies between the rules are determined automatically, creating a DAG (directed
acyclic graph) of jobs that can be automatically parallelized.
• Snakemake only re-runs jobs if one of the input files is newer than one of the output files
or one of the input files will be updated by another job.
• https://github.com/snakemake/snakemake/issues/1978
• Snakemake works backwards from requested output, and not from available input.
• Targets
• rule names can be targets
• output files can be targets
• if no target is given at the command line, Snakemake will define the first rule of the
Snakefile as the target. Hence, it is best practice to have a rule all at the top of the
workflow which has all typically desired target files as input files.
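Working backwards from a target, and falling back to the first rule when no target is given, can be sketched as a small recursive lookup. The `RULES` dictionary (including its `all` rule) and the `plan` function are hypothetical; the sketch ignores whether files are already up to date.

```python
# Hypothetical rule table; dictionary order stands in for order in the Snakefile.
RULES = {
    "all": {"input": ["pre-processing.done"], "output": []},
    "pre_processing": {
        "input": ["stats/number-of-countries.txt"],
        "output": ["pre-processing.done"],
    },
    "count_countries": {
        "input": [
            "countries/european-countries.txt",
            "countries/other-countries.txt",
        ],
        "output": ["stats/number-of-countries.txt"],
    },
}


def plan(rules, target=None):
    """Return the rules needed for `target`, dependencies first."""
    if target is None:
        target = next(iter(rules))  # default target: the first rule
    producer = {
        path: name for name, rule in rules.items() for path in rule["output"]
    }
    ordered = []

    def visit(name):
        # Inputs without a producing rule are treated as existing source files.
        for path in rules[name]["input"]:
            if path in producer:
                visit(producer[path])
        if name not in ordered:
            ordered.append(name)

    # A target is either a rule name or an output file name.
    visit(target if target in rules else producer[target])
    return ordered


if __name__ == "__main__":
    print(plan(RULES))
    print(plan(RULES, "stats/number-of-countries.txt"))
```

Only the rules on the path to the requested target are selected, which is why asking for an intermediate file runs less than asking for `all`.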

More Related Content

Similar to C-SCALE Tutorial: Snakemake

Node-RED and getting started on the Internet of Things
Node-RED and getting started on the Internet of ThingsNode-RED and getting started on the Internet of Things
Node-RED and getting started on the Internet of Things
Boris Adryan
 
Feature Detection in Ajax-enabled Web Applications
Feature Detection in Ajax-enabled Web ApplicationsFeature Detection in Ajax-enabled Web Applications
Feature Detection in Ajax-enabled Web Applications
Nikolaos Tsantalis
 
Beam_installation123456785678777777.pptx
Beam_installation123456785678777777.pptxBeam_installation123456785678777777.pptx
Beam_installation123456785678777777.pptx
SravanthiVaka1
 

Similar to C-SCALE Tutorial: Snakemake (20)

InfluxDB Live Product Training
InfluxDB Live Product TrainingInfluxDB Live Product Training
InfluxDB Live Product Training
 
GitHub Actions - using Free Oracle Cloud Infrastructure (OCI)
GitHub Actions - using Free Oracle Cloud Infrastructure (OCI)GitHub Actions - using Free Oracle Cloud Infrastructure (OCI)
GitHub Actions - using Free Oracle Cloud Infrastructure (OCI)
 
Scilab: Computing Tool For Engineers
Scilab: Computing Tool For EngineersScilab: Computing Tool For Engineers
Scilab: Computing Tool For Engineers
 
Cape2013 scilab-workshop-19Oct13
Cape2013 scilab-workshop-19Oct13Cape2013 scilab-workshop-19Oct13
Cape2013 scilab-workshop-19Oct13
 
Node-RED and Minecraft - CamJam September 2015
Node-RED and Minecraft - CamJam September 2015Node-RED and Minecraft - CamJam September 2015
Node-RED and Minecraft - CamJam September 2015
 
Practical virtual network functions with Snabb (SDN Barcelona VI)
Practical virtual network functions with Snabb (SDN Barcelona VI)Practical virtual network functions with Snabb (SDN Barcelona VI)
Practical virtual network functions with Snabb (SDN Barcelona VI)
 
Node-RED and getting started on the Internet of Things
Node-RED and getting started on the Internet of ThingsNode-RED and getting started on the Internet of Things
Node-RED and getting started on the Internet of Things
 
An Introduction to OMNeT++ 6.0
An Introduction to OMNeT++ 6.0An Introduction to OMNeT++ 6.0
An Introduction to OMNeT++ 6.0
 
Creating basic workflows as Jupyter Notebooks to use Cytoscape programmatically.
Creating basic workflows as Jupyter Notebooks to use Cytoscape programmatically.Creating basic workflows as Jupyter Notebooks to use Cytoscape programmatically.
Creating basic workflows as Jupyter Notebooks to use Cytoscape programmatically.
 
Machinel Learning with spark
Machinel Learning with spark Machinel Learning with spark
Machinel Learning with spark
 
LAB 1 Report.docx
LAB 1 Report.docxLAB 1 Report.docx
LAB 1 Report.docx
 
Kubernetes 101
Kubernetes 101Kubernetes 101
Kubernetes 101
 
Optimizing Your CI Pipelines
Optimizing Your CI PipelinesOptimizing Your CI Pipelines
Optimizing Your CI Pipelines
 
An introduction to workflow-based programming with Node-RED
An introduction to workflow-based programming with Node-REDAn introduction to workflow-based programming with Node-RED
An introduction to workflow-based programming with Node-RED
 
Feature Detection in Ajax-enabled Web Applications
Feature Detection in Ajax-enabled Web ApplicationsFeature Detection in Ajax-enabled Web Applications
Feature Detection in Ajax-enabled Web Applications
 
Graphical packet generator
Graphical packet generatorGraphical packet generator
Graphical packet generator
 
Larson and toubro
Larson and toubroLarson and toubro
Larson and toubro
 
Building TaxBrain: Numba-enabled Financial Computing on the Web
Building TaxBrain: Numba-enabled Financial Computing on the WebBuilding TaxBrain: Numba-enabled Financial Computing on the Web
Building TaxBrain: Numba-enabled Financial Computing on the Web
 
ESP8266 and IOT
ESP8266 and IOTESP8266 and IOT
ESP8266 and IOT
 
Beam_installation123456785678777777.pptx
Beam_installation123456785678777777.pptxBeam_installation123456785678777777.pptx
Beam_installation123456785678777777.pptx
 

Recently uploaded

%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 

Recently uploaded (20)

%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
SHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions PresentationSHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions Presentation
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 

C-SCALE Tutorial: Snakemake

  • 1. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101017529. Copernicus - eoSC AnaLytics Engine C-SCALE tutorial: Snakemake Sebastian Luna-Valero, EGI Foundation sebastian.luna.valero@egi.eu C-SCALE tutorial: Snakemake | 29th November 2022 | Online
  • 2. Outline • Why workflows? • Why snakemake? • Let’s build a workflow! 2 C-SCALE tutorial: Snakemake | 29th November 2022 | Online
  • 3. Why workflows? Credits: https://github.com/c-scale-community/use-case-hisea Goals: ● from raw data to figures ○ with “one click” ● re-run with new config ○ spatial scale ○ temporal scale ● re-run half-way through ○ recover from issues ● dependency management ○ between tasks ○ software packages 3 C-SCALE tutorial: Snakemake | 29th November 2022 | Online
  • 4. Why workflows? When to build a workflow? ● Re-run the same analysis over and over again, with different input parameters ● Ability to re-run the work partially; recover from intermediate failures ● Combine together heterogeneous tooling into the same analysis ○ Python, R, Julia, Docker, Bash, etc. 4 C-SCALE tutorial: Snakemake | 29th November 2022 | Online
  • 5. Why snakemake? • Mature workflow management system. • Great community around it. • Easy to learn? :) • A Snakemake workflow scales without modification from single core workstations and multi-core servers to batch systems (e.g. slurm) • Snakemake integrates with the package manager Conda and the container engine Singularity such that defining the software stack becomes part of the workflow itself. • Further information: https://snakemake.readthedocs.io/ 5 C-SCALE tutorial: Snakemake | 29th November 2022 | Online
  • 6. Let’s build a workflow! • Snakemake follows the GNU Make paradigm: workflows are defined in terms of rules that define how to create output files from input files. • $ snakemake --cores 1 • The application of a rule to generate a set of output files is called job. 6 C-SCALE tutorial: Snakemake | 29th November 2022 | Online rule count_countries: input: "european-countries.txt" output: "number-of-countries.txt" shell: "wc --lines european-countries.txt > number-of-countries.txt" $ cat european-countries.txt Netherlands Greece Spain Portugal Italy Poland Austria Snakefile
  • 7. Let’s build a workflow! • Snakemake follows the GNU Make paradigm: workflows are defined in terms of rules that define how to create output files from input files. • $ snakemake --cores 1 • Snakemake only re-runs jobs if one of the input files is newer than one of the output files or one of the input files will be updated by another job. 7 C-SCALE tutorial: Snakemake | 29th November 2022 | Online rule count_countries: input: "european-countries.txt" output: "number-of-countries.txt" shell: "wc --lines european-countries.txt > number-of-countries.txt" $ cat european-countries.txt Netherlands Greece Spain Portugal Italy Poland Austria Belgium Snakefile
  • 8. Let’s build a workflow! • Generalize the rule: • $ snakemake --cores 1 • $ wc --lines european-countries.txt > number-of-countries.txt 8 C-SCALE tutorial: Snakemake | 29th November 2022 | Online rule count_countries: input: "european-countries.txt" output: "number-of-countries.txt" shell: "wc --lines {input} > {output}" $ cat european-countries.txt Netherlands Greece Spain Portugal Italy Poland Austria Snakefile
  • 9. Let’s build a workflow! • Adding more than one input file: • $ snakemake --cores 1 • $ wc --lines european-countries.txt other-countries.txt > number-of-countries.txt 9 C-SCALE tutorial: Snakemake | 29th November 2022 | Online rule count_countries: input: "european-countries.txt", "other-countries.txt" output: "number-of-countries.txt" shell: "wc --lines {input} > {output}" $ cat european-countries.txt Netherlands Greece Spain Portugal Italy Poland Austria Snakefile $ cat other-countries.txt US Canada
  • 10. Let’s build a workflow! • It’s better to organize your working directory: • $ snakemake --cores 1 10 C-SCALE tutorial: Snakemake | 29th November 2022 | Online rule count_countries: input: "countries/european-countries.txt", "countries/other-countries.txt" output: "stats/number-of-countries.txt" shell: "wc --lines {input} > {output}" $ cat european-countries.txt Netherlands Greece Spain Portugal Italy Poland Austria Snakefile $ cat other-countries.txt US Canada
  • 11. Let’s build a workflow! • Connecting rules! Targets can be rules, output files. • $ snakemake --cores 1 <target> 11 C-SCALE tutorial: Snakemake | 29th November 2022 | Online rule pre_processing: input: "stats/number-of-countries.txt" output: "pre-processing.done" shell: "touch pre-processing.done" rule count_countries: input: "countries/european-countries.txt", "countries/other-countries.txt" output: "stats/number-of-countries.txt" shell: "wc --lines {input} > {output}" $ cat european-countries.txt Netherlands Greece Spain Portugal Italy Poland Austria Snakefile $ cat other-countries.txt US Canada
  • 12. Let’s build a workflow! • Updating intermediate files (however: #1978 and #2011) • $ snakemake --cores 1 <target> 12 C-SCALE tutorial: Snakemake | 29th November 2022 | Online rule pre_processing: input: "stats/number-of-countries.txt" output: "pre-processing.done" shell: "touch pre-processing.done" rule count_countries: input: "countries/european-countries.txt", "countries/other-countries.txt" output: "stats/number-of-countries.txt" shell: "wc --lines {input} > {output}" Snakefile $ cat european-countries.txt Netherlands Greece Spain Portugal Italy Poland Austria Belgium $ cat other-countries.txt US Canada
  • 13. Let’s build a workflow! • Dependencies between the rules are determined creating a Directed Acyclic Graph • $ snakemake --cores 1 --dag | dot -Tsvg > dag.svg 13 C-SCALE tutorial: Snakemake | 29th November 2022 | Online rule pre_processing: input: "stats/number-of-countries.txt" output: "pre-processing.done" shell: "touch pre-processing.done" rule count_countries: input: "countries/european-countries.txt", "countries/other-countries.txt" output: "stats/number-of-countries.txt" shell: "wc --lines {input} > {output}" Snakefile
  • 14. Let’s build a workflow! • Python • $ snakemake --cores 1 <target> 14 C-SCALE tutorial: Snakemake | 29th November 2022 | Online rule pre_processing: input: "stats/number-of-countries.txt" output: "pre-processing.done" shell: "python --input stats/number-of-countries.txt myscript.py" rule count_countries: input: "countries/european-countries.txt", "countries/other-countries.txt" output: "stats/number-of-countries.txt" shell: "wc --lines {input} > {output}" Snakefile
  • 15. Let’s build a workflow! • Containers • $ snakemake --cores 1 <target> 15 C-SCALE tutorial: Snakemake | 29th November 2022 | Online rule pre_processing: input: "stats/number-of-countries.txt" output: "pre-processing.done" shell: "udocker run example" rule count_countries: input: "countries/european-countries.txt", "countries/other-countries.txt" output: "stats/number-of-countries.txt" shell: "wc --lines {input} > {output}" Snakefile
  • 16. Let’s build a workflow! • Pre-built support for Singularity (see docs for more details) 16 C-SCALE tutorial: Snakemake | 29th November 2022 | Online rule pre_processing: input: "stats/number-of-countries.txt" output: "pre-processing.done" container: "docker://repo/image" script: "scripts/plot.R" rule count_countries: input: "countries/european-countries.txt", "countries/other-countries.txt" output: "stats/number-of-countries.txt" shell: "wc --lines {input} > {output}" Snakefile
  • 17. Let’s build a workflow! • Configuration • $ snakemake --cores 1 17 C-SCALE tutorial: Snakemake | 29th November 2022 | Online configfile: "config.yaml" rule count_countries: input: expand("{input}", input=config['european']), expand("{input}", input=config['other']) output: "stats/number-of-countries.txt" shell: "wc --lines {input} > {output}" $ cat european-countries.txt Netherlands Greece Spain Portugal Italy Poland Austria Snakefile $ cat config.yaml european: 'countries/european-countries.txt' other: 'countries/other-countries.txt' $ cat other-countries.txt US Canada
  • 18. Let’s build a workflow! • Logging • $ snakemake --cores 1 18 C-SCALE tutorial: Snakemake | 29th November 2022 | Online rule count_countries: input: "countries/european-countries.txt", "countries/other-countries.txt" output: "stats/number-of-countries.txt" log: "logs/count_countries.log" shell: "wc --lines {input} > {output}" $ cat european-countries.txt Netherlands Greece Spain Portugal Italy Poland Austria Snakefile $ cat other-countries.txt US Canada
  • 19. Let’s build a workflow!
      • Benchmarking
      • $ snakemake --cores 1
      • Snakefile:

          rule count_countries:
              input:
                  "countries/european-countries.txt",
                  "countries/other-countries.txt"
              output:
                  "stats/number-of-countries.txt"
              benchmark:
                  "benchmarks/count_countries.txt"
              shell:
                  "wc --lines {input} > {output}"

      • $ cat european-countries.txt

          Netherlands
          Greece
          Spain
          Portugal
          Italy
          Poland
          Austria

      • $ cat other-countries.txt

          US
          Canada
  • 20. Let’s build a workflow!
      • Modularization
      • rules/count_countries.smk:

          rule count_countries:
              input:
                  "countries/european-countries.txt",
                  "countries/other-countries.txt"
              output:
                  "stats/number-of-countries.txt"
              shell:
                  "wc --lines {input} > {output}"

      • Snakefile:

          include: "rules/count_countries.smk"

          rule pre_processing:
              input:
                  "stats/number-of-countries.txt"
              output:
                  "pre-processing.done"
              shell:
                  "touch pre-processing.done"

      • $ cat european-countries.txt

          Netherlands
          Greece
          Spain
          Portugal
          Italy
          Poland
          Austria

      • $ cat other-countries.txt

          US
          Canada
  • 21. Let’s build a workflow!
      • Integration with conda
      • $ snakemake --cores 1 --use-conda --conda-frontend mamba
      • Snakefile:

          rule count_countries:
              input:
                  "countries/european-countries.txt",
                  "countries/other-countries.txt"
              output:
                  "stats/number-of-countries.txt"
              conda:
                  "envs/count_countries.yaml"
              shell:
                  "wc --lines {input} > {output}"

      • $ cat envs/count_countries.yaml

          name: count_countries
          channels:
            - conda-forge
            - defaults
          dependencies:
            - coreutils

      • $ cat european-countries.txt

          Netherlands
          Greece
          Spain
          Portugal
          Italy
          Poland
          Austria

      • $ cat other-countries.txt

          US
          Canada
  • 22. Let’s build a workflow!
      • Other examples
          • https://github.com/c-scale-community/c-scale-tutorial-snakemake
          • https://github.com/c-scale-community/use-case-hisea/pull/41/files
  • 23. Let’s build a workflow!
      • Advanced features
          • Pre-built functionality for scatter-gather jobs
          • Cluster execution: snakemake --cluster qsub (see SLURM docs)
          • Self-contained HTML reports
          • Accessing remote storage:
              • Amazon S3, Google Cloud Storage, Microsoft Azure Blob Storage
              • SFTP, HTTP, FTP, Dropbox, XRootD, WebDAV, GFAL, GridFTP, iRODS, etc.
      • Best practices: https://snakemake.readthedocs.io/en/stable/snakefiles/best_practices.html
      • FAQs: https://snakemake.readthedocs.io/en/stable/project_info/faq.html
  • 24. Thank you for your attention.
      • This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101017529.
      • Copernicus - eoSC AnaLytics Engine
      • contact@c-scale.eu | https://c-scale.eu | @C_SCALE_EU
      • Sebastian Luna-Valero, EGI Foundation, sebastian.luna.valero@egi.eu
  • 25. Let’s build a workflow!
      • Wildcards example:
          • $ snakemake --cores 1 stats/number-of-european-countries.txt
          • $ snakemake --cores 1 stats/number-of-other-countries.txt
      • Snakefile:

          rule count_countries:
              input:
                  "countries/{category}-countries.txt"
              output:
                  "stats/number-of-{category}-countries.txt"
              shell:
                  "wc --lines {input} > {output}"

      • $ cat list-of-countries.txt

          Netherlands
          Greece
          Spain
          Portugal
          Italy
          Poland
          Austria
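To make the wildcard slide more concrete, here is a plain-Python sketch of the idea: the requested target is matched against the rule's output pattern, and the captured value is substituted into the input pattern. It does not use Snakemake and is not its implementation; `pattern_to_regex` and `resolve_input` are names invented for this illustration, and the sketch assumes each wildcard appears only once in a pattern and can match any non-empty string.

```python
import re


def pattern_to_regex(pattern):
    """Compile 'stats/number-of-{category}-countries.txt' into a regex."""
    # re.split with a capturing group alternates literal text (even
    # positions) and wildcard names (odd positions).
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for position, part in enumerate(parts):
        if position % 2 == 0:
            regex += re.escape(part)
        else:
            regex += f"(?P<{part}>.+)"
    return re.compile(regex)


def resolve_input(target, output_pattern, input_pattern):
    """Return the input file implied by a requested target, or None."""
    match = pattern_to_regex(output_pattern).fullmatch(target)
    if match is None:
        return None  # this rule cannot produce the target
    return input_pattern.format(**match.groupdict())


OUTPUT = "stats/number-of-{category}-countries.txt"
INPUT = "countries/{category}-countries.txt"

print(resolve_input("stats/number-of-european-countries.txt", OUTPUT, INPUT))
# prints: countries/european-countries.txt
```

This is why the two commands on the slide need an explicit target: without one, nothing tells the rule which value `{category}` should take.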
  • 26. Let’s build a workflow!
      • Many to many with glob_wildcards:
      • $ snakemake --cores 1
      • Snakefile:

          CATEGORIES, = glob_wildcards("countries/{category}-countries.txt")
          print(CATEGORIES)

          rule all:
              input:
                  expand("stats/number-of-{category}-countries.txt", category=CATEGORIES)

          rule count_countries:
              input:
                  "countries/{category}-countries.txt"
              output:
                  "stats/number-of-{category}-countries.txt"
              shell:
                  "wc --lines {input} > {output}"

      • $ cat list-of-countries.txt

          Netherlands
          Greece
          Spain
          Portugal
          Italy
          Poland
          Austria

      • (Diagram: input-1, input-2, ..., input-n each map to output-1, output-2, ..., output-n.)
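The combination of glob_wildcards and expand on this slide can be pictured in two steps: scan the disk for files that fit the input pattern, then build one target per value found. The standard-library sketch below imitates those two steps for this specific pattern; it is an analogy and not Snakemake's code, and `find_categories`, `expand_targets` and `demo` are hypothetical helper names.

```python
import re
import tempfile
from pathlib import Path


def find_categories(directory):
    """Collect the <category> part of every '<category>-countries.txt' file."""
    found = []
    for path in sorted(Path(directory).glob("*-countries.txt")):
        match = re.fullmatch(r"(.+)-countries\.txt", path.name)
        if match:
            found.append(match.group(1))
    return found


def expand_targets(categories):
    """Build one output path per category, like the input of `rule all`."""
    return [f"stats/number-of-{c}-countries.txt" for c in categories]


def demo():
    """Run both steps on a throwaway directory with two matching files."""
    with tempfile.TemporaryDirectory() as tmp:
        countries = Path(tmp) / "countries"
        countries.mkdir()
        for name in ("european-countries.txt", "other-countries.txt", "notes.md"):
            (countries / name).touch()
        return expand_targets(find_categories(countries))


print(demo())
# prints: ['stats/number-of-european-countries.txt', 'stats/number-of-other-countries.txt']
```

Adding a third file such as countries/asian-countries.txt would add a third target without touching the rules, which is the point of the many-to-many pattern.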
  • 27. Let’s build a workflow!
      • Dependencies between the rules are determined automatically, creating a DAG (directed acyclic graph) of jobs that can be automatically parallelized.
      • Snakemake only re-runs jobs if one of the input files is newer than one of the output files or one of the input files will be updated by another job.
          • https://github.com/snakemake/snakemake/issues/1978
      • Snakemake works backwards from requested output, and not from available input.
      • Targets
          • Rule names can be targets.
          • Output files can be targets.
          • If no target is given at the command line, Snakemake uses the first rule of the Snakefile as the target. Hence, it is best practice to have a rule all at the top of the workflow that lists all typically desired target files as its input files.
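The two ideas on this slide, working backwards from the requested target and re-running a job when an input is newer than its output, can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not Snakemake's scheduler: it covers only the timestamp rule quoted above, assumes each rule has a single output, and `needs_run`, `plan`, `RULES` and the integer timestamps are all made up for the example.

```python
def needs_run(output_mtime, input_mtimes):
    """Timestamp rule: run if the output is missing or older than an input."""
    if output_mtime is None:
        return True
    return any(mtime > output_mtime for mtime in input_mtimes)


def plan(target, rules, mtimes):
    """Return the rule names to run, in execution order, to build `target`.

    rules:  {output_file: (rule_name, [input_files])}
    mtimes: {file: modification time} for the files that exist on disk
    """
    jobs = []

    def visit(path):
        if path not in rules:
            return False  # a source file: nothing produces it
        name, inputs = rules[path]
        # Walk upstream first, so producers are scheduled before consumers.
        upstream_updated = [visit(i) for i in inputs]
        existing_inputs = [mtimes[i] for i in inputs if i in mtimes]
        if any(upstream_updated) or needs_run(mtimes.get(path), existing_inputs):
            if name not in jobs:
                jobs.append(name)
            return True
        return False

    visit(target)
    return jobs


RULES = {
    "stats/number-of-countries.txt": (
        "count_countries",
        ["countries/european-countries.txt", "countries/other-countries.txt"],
    ),
    "pre-processing.done": ("pre_processing", ["stats/number-of-countries.txt"]),
}

# Only the source files exist, so both jobs are planned, producer first.
fresh = {"countries/european-countries.txt": 1, "countries/other-countries.txt": 1}
print(plan("pre-processing.done", RULES, fresh))
# prints: ['count_countries', 'pre_processing']
```

Starting from the target means that rules whose outputs nobody asked for are never visited, which matches the slide's remark that Snakemake does not work forwards from the available inputs.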