SlideShare a Scribd company logo
SciPipe
A light-weight workflow library
inspired by flow-based
programming
Samuel Lampa, @smllmp, bionics.it
Dept. Pharm. Biosci. UU, 2016-04-28
Top light-weight workflow tools
Snakemake
● Great for short one-off explorative stuff
● Tricky for complex graphs
Bpipe
● Easy to use for highly linear workflows
● Not so easy with branching workflows
Nextflow
● Dataflow means dynamic scheduling possible(!)
● Own way of organizing outputs
● No “re-usable components” support
SciLuigi and SciPipe
SciLuigi
● Great re-usable components story
● Highly customizable output file naming
● Easy to extend API
● No dynamic scheduling :(
● Performance problems with more than 64 workers
SciPipe
● (Same benefits as SciLuigi)
● Also: Allows dynamic scheduling
● Also: Much lower resource usage
(1000s of workers is OK)
● Also: Simpler, less code, less maintenance
● Also: High-performance for in-line components
SciPipe in brief
● Website: scipipe.org
● Simple, very little code => maintainable
● Write workflows in a subset of Go(lang)
● Execute readable .go-files:
go run myworkflow.go
● Optional compilation to static executable files:
go build; ./myworkflow
● No new language. Use existing Go tooling:
● Editors, Debuggers, Linters, Profilers ...
Flow-based programming
www.jpaulmorrison.com/fbp
Flow-based programming principles
● Separate network definition
(separate from process definitions)
● Named ports
● Channels with bounded buffers
● Information packets (IPs) with defined lifetimes
● More info:
en.wikipedia.org/wiki/Flow-based programming
www.jpaulmorrison.com/fbp
Define components
From shell commands:
… or in plain Go:
… and then mix & match! (See e.g. this example)
Connect components
Just assign out-ports to in-ports
(will make them use the same Go channel):
That’s about it.
SciPipe in action: ”Hello World” example
SciPipe:
A bit longer
example...
SciPipe:
A bit longer
example...
In this area, all processes will
run tasks in parallel, one for
each split produced by “split”
Architecture: Basic Components
● scipipe.SciProcess
● Long-running
● Typically one per operation
● Typically spawns one task per input
● scipipe.SciTask
● Short lived
● Executes just one shell command or custom Go
function
● Typically one per operation/set of in-data files
● scipipe.FileTarget
● Most common data type passed between processes
Architecture: Processes vs. Tasks
Thank you!
@smllmp
bionics.it

More Related Content

What's hot

Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use Cases
Max De Marzi
 
YAML Engineering: why we need a new paradigm
YAML Engineering: why we need a new paradigmYAML Engineering: why we need a new paradigm
YAML Engineering: why we need a new paradigm
Raphaël PINSON
 
The Science of a Great Career in Data Science
The Science of a Great Career in Data ScienceThe Science of a Great Career in Data Science
The Science of a Great Career in Data Science
Kate Matsudaira
 
The Purpose of a Process
The Purpose of a ProcessThe Purpose of a Process
The Purpose of a Process
Business Enterprise Mapping
 
Drupalcon 2023 - How Drupal builds your pages.pdf
Drupalcon 2023 - How Drupal builds your pages.pdfDrupalcon 2023 - How Drupal builds your pages.pdf
Drupalcon 2023 - How Drupal builds your pages.pdf
Luca Lusso
 
Structuring Spark: DataFrames, Datasets, and Streaming
Structuring Spark: DataFrames, Datasets, and StreamingStructuring Spark: DataFrames, Datasets, and Streaming
Structuring Spark: DataFrames, Datasets, and Streaming
Databricks
 
Training Series - Intro to Neo4j
Training Series - Intro to Neo4jTraining Series - Intro to Neo4j
Training Series - Intro to Neo4j
Neo4j
 
Drone.io のご紹介
Drone.io のご紹介Drone.io のご紹介
Drone.io のご紹介
Uchio Kondo
 
AUDIENCE - 4 Keys to the Future of Mobile Video
AUDIENCE - 4 Keys to the Future of Mobile Video AUDIENCE - 4 Keys to the Future of Mobile Video
AUDIENCE - 4 Keys to the Future of Mobile Video
Liam Boogar-Azoulay
 
Neo4j Bloom: Data Visualization for Everyone
Neo4j Bloom: Data Visualization for EveryoneNeo4j Bloom: Data Visualization for Everyone
Neo4j Bloom: Data Visualization for Everyone
Neo4j
 
Hexagonal And Beyond
Hexagonal And BeyondHexagonal And Beyond
Hexagonal And Beyond
Thomas Pierrain
 
PSFK Future of Work Report
PSFK Future of Work ReportPSFK Future of Work Report
PSFK Future of Work Report
PSFK
 
REST vs. GraphQL: Critical Look
REST vs. GraphQL: Critical LookREST vs. GraphQL: Critical Look
REST vs. GraphQL: Critical Look
Nordic APIs
 
How Google Works
How Google WorksHow Google Works
How Google Works
Eric Schmidt
 
15 Quotes To Nurture Your Creative Soul!
15 Quotes To Nurture Your Creative Soul!15 Quotes To Nurture Your Creative Soul!
15 Quotes To Nurture Your Creative Soul!
DesignMantic
 
14 Tips to Entrepreneurs to start the Right Stuff
14 Tips to Entrepreneurs to start the Right Stuff14 Tips to Entrepreneurs to start the Right Stuff
14 Tips to Entrepreneurs to start the Right Stuff
Patrick Stähler
 
What is Rack Hijacking API
What is Rack Hijacking APIWhat is Rack Hijacking API
What is Rack Hijacking API
Nomo Kiyoshi
 
00- Indonesia's PPP - Bidding Process - Investor Guide
00- Indonesia's PPP - Bidding Process - Investor Guide00- Indonesia's PPP - Bidding Process - Investor Guide
00- Indonesia's PPP - Bidding Process - Investor Guide
H2O Management
 
Oil and gas big data analytics data Visualization
Oil and gas big data analytics data VisualizationOil and gas big data analytics data Visualization
Oil and gas big data analytics data Visualization
Infobrandz
 
Tests unitaires pour PostgreSQL avec pgTap
Tests unitaires pour PostgreSQL avec pgTapTests unitaires pour PostgreSQL avec pgTap
Tests unitaires pour PostgreSQL avec pgTap
Rodolphe Quiédeville
 

What's hot (20)

Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use Cases
 
YAML Engineering: why we need a new paradigm
YAML Engineering: why we need a new paradigmYAML Engineering: why we need a new paradigm
YAML Engineering: why we need a new paradigm
 
The Science of a Great Career in Data Science
The Science of a Great Career in Data ScienceThe Science of a Great Career in Data Science
The Science of a Great Career in Data Science
 
The Purpose of a Process
The Purpose of a ProcessThe Purpose of a Process
The Purpose of a Process
 
Drupalcon 2023 - How Drupal builds your pages.pdf
Drupalcon 2023 - How Drupal builds your pages.pdfDrupalcon 2023 - How Drupal builds your pages.pdf
Drupalcon 2023 - How Drupal builds your pages.pdf
 
Structuring Spark: DataFrames, Datasets, and Streaming
Structuring Spark: DataFrames, Datasets, and StreamingStructuring Spark: DataFrames, Datasets, and Streaming
Structuring Spark: DataFrames, Datasets, and Streaming
 
Training Series - Intro to Neo4j
Training Series - Intro to Neo4jTraining Series - Intro to Neo4j
Training Series - Intro to Neo4j
 
Drone.io のご紹介
Drone.io のご紹介Drone.io のご紹介
Drone.io のご紹介
 
AUDIENCE - 4 Keys to the Future of Mobile Video
AUDIENCE - 4 Keys to the Future of Mobile Video AUDIENCE - 4 Keys to the Future of Mobile Video
AUDIENCE - 4 Keys to the Future of Mobile Video
 
Neo4j Bloom: Data Visualization for Everyone
Neo4j Bloom: Data Visualization for EveryoneNeo4j Bloom: Data Visualization for Everyone
Neo4j Bloom: Data Visualization for Everyone
 
Hexagonal And Beyond
Hexagonal And BeyondHexagonal And Beyond
Hexagonal And Beyond
 
PSFK Future of Work Report
PSFK Future of Work ReportPSFK Future of Work Report
PSFK Future of Work Report
 
REST vs. GraphQL: Critical Look
REST vs. GraphQL: Critical LookREST vs. GraphQL: Critical Look
REST vs. GraphQL: Critical Look
 
How Google Works
How Google WorksHow Google Works
How Google Works
 
15 Quotes To Nurture Your Creative Soul!
15 Quotes To Nurture Your Creative Soul!15 Quotes To Nurture Your Creative Soul!
15 Quotes To Nurture Your Creative Soul!
 
14 Tips to Entrepreneurs to start the Right Stuff
14 Tips to Entrepreneurs to start the Right Stuff14 Tips to Entrepreneurs to start the Right Stuff
14 Tips to Entrepreneurs to start the Right Stuff
 
What is Rack Hijacking API
What is Rack Hijacking APIWhat is Rack Hijacking API
What is Rack Hijacking API
 
00- Indonesia's PPP - Bidding Process - Investor Guide
00- Indonesia's PPP - Bidding Process - Investor Guide00- Indonesia's PPP - Bidding Process - Investor Guide
00- Indonesia's PPP - Bidding Process - Investor Guide
 
Oil and gas big data analytics data Visualization
Oil and gas big data analytics data VisualizationOil and gas big data analytics data Visualization
Oil and gas big data analytics data Visualization
 
Tests unitaires pour PostgreSQL avec pgTap
Tests unitaires pour PostgreSQL avec pgTapTests unitaires pour PostgreSQL avec pgTap
Tests unitaires pour PostgreSQL avec pgTap
 

Viewers also liked

AddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based ProgrammingAddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based Programming
Samuel Lampa
 
2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
Samuel Lampa
 
Hooking up Semantic MediaWiki with external tools via SPARQL
Hooking up Semantic MediaWiki with external tools via SPARQLHooking up Semantic MediaWiki with external tools via SPARQL
Hooking up Semantic MediaWiki with external tools via SPARQL
Samuel Lampa
 
iRODS Rule Language Cheat Sheet
iRODS Rule Language Cheat SheetiRODS Rule Language Cheat Sheet
iRODS Rule Language Cheat Sheet
Samuel Lampa
 
Taming Snakemake
Taming SnakemakeTaming Snakemake
Taming Snakemake
Jeremy Leipzig
 
Principals, Practices, and Habits
Principals, Practices, and HabitsPrincipals, Practices, and Habits
Principals, Practices, and Habits
Jeremy Leipzig
 
The RDFIO Extension - A Status update
The RDFIO Extension - A Status updateThe RDFIO Extension - A Status update
The RDFIO Extension - A Status update
Samuel Lampa
 
Continuous modeling - automating model building on high-performance e-Infrast...
Continuous modeling - automating model building on high-performance e-Infrast...Continuous modeling - automating model building on high-performance e-Infrast...
Continuous modeling - automating model building on high-performance e-Infrast...
Ola Spjuth
 
Python Generators - Talk at PySthlm meetup #15
Python Generators - Talk at PySthlm meetup #15Python Generators - Talk at PySthlm meetup #15
Python Generators - Talk at PySthlm meetup #15
Samuel Lampa
 
Thesis presentation Samuel Lampa
Thesis presentation Samuel LampaThesis presentation Samuel Lampa
Thesis presentation Samuel Lampa
Samuel Lampa
 
Reproducible bioinformatics pipelines with Docker and Anduril
Reproducible bioinformatics pipelines with Docker and AndurilReproducible bioinformatics pipelines with Docker and Anduril
Reproducible bioinformatics pipelines with Docker and Anduril
Christian Frech
 
Agile large-scale machine-learning pipelines in drug discovery
Agile large-scale machine-learning pipelines in drug discoveryAgile large-scale machine-learning pipelines in drug discovery
Agile large-scale machine-learning pipelines in drug discovery
Ola Spjuth
 
Batch import of large RDF datasets into Semantic MediaWiki
Batch import of large RDF datasets into Semantic MediaWikiBatch import of large RDF datasets into Semantic MediaWiki
Batch import of large RDF datasets into Semantic MediaWiki
Samuel Lampa
 
Neuro4j Workflow Overview
Neuro4j Workflow OverviewNeuro4j Workflow Overview
Neuro4j Workflow Overview
Dmytro Pavlikovskiy
 
Flow based programming an overview
Flow based programming   an overviewFlow based programming   an overview
Flow based programming an overview
Samuel Lampa
 
Flow-based programming with Elixir
Flow-based programming with ElixirFlow-based programming with Elixir
Flow-based programming with Elixir
Anton Mishchuk
 
Vagrant, Ansible and Docker - How they fit together for productive flexible d...
Vagrant, Ansible and Docker - How they fit together for productive flexible d...Vagrant, Ansible and Docker - How they fit together for productive flexible d...
Vagrant, Ansible and Docker - How they fit together for productive flexible d...
Samuel Lampa
 
ZeroMQ Is The Answer
ZeroMQ Is The AnswerZeroMQ Is The Answer
ZeroMQ Is The Answer
Ian Barber
 
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive SystemsGo Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Jonas Bonér
 
Reactive Streams: Handling Data-Flow the Reactive Way
Reactive Streams: Handling Data-Flow the Reactive WayReactive Streams: Handling Data-Flow the Reactive Way
Reactive Streams: Handling Data-Flow the Reactive Way
Roland Kuhn
 

Viewers also liked (20)

AddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based ProgrammingAddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based Programming
 
2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
 
Hooking up Semantic MediaWiki with external tools via SPARQL
Hooking up Semantic MediaWiki with external tools via SPARQLHooking up Semantic MediaWiki with external tools via SPARQL
Hooking up Semantic MediaWiki with external tools via SPARQL
 
iRODS Rule Language Cheat Sheet
iRODS Rule Language Cheat SheetiRODS Rule Language Cheat Sheet
iRODS Rule Language Cheat Sheet
 
Taming Snakemake
Taming SnakemakeTaming Snakemake
Taming Snakemake
 
Principals, Practices, and Habits
Principals, Practices, and HabitsPrincipals, Practices, and Habits
Principals, Practices, and Habits
 
The RDFIO Extension - A Status update
The RDFIO Extension - A Status updateThe RDFIO Extension - A Status update
The RDFIO Extension - A Status update
 
Continuous modeling - automating model building on high-performance e-Infrast...
Continuous modeling - automating model building on high-performance e-Infrast...Continuous modeling - automating model building on high-performance e-Infrast...
Continuous modeling - automating model building on high-performance e-Infrast...
 
Python Generators - Talk at PySthlm meetup #15
Python Generators - Talk at PySthlm meetup #15Python Generators - Talk at PySthlm meetup #15
Python Generators - Talk at PySthlm meetup #15
 
Thesis presentation Samuel Lampa
Thesis presentation Samuel LampaThesis presentation Samuel Lampa
Thesis presentation Samuel Lampa
 
Reproducible bioinformatics pipelines with Docker and Anduril
Reproducible bioinformatics pipelines with Docker and AndurilReproducible bioinformatics pipelines with Docker and Anduril
Reproducible bioinformatics pipelines with Docker and Anduril
 
Agile large-scale machine-learning pipelines in drug discovery
Agile large-scale machine-learning pipelines in drug discoveryAgile large-scale machine-learning pipelines in drug discovery
Agile large-scale machine-learning pipelines in drug discovery
 
Batch import of large RDF datasets into Semantic MediaWiki
Batch import of large RDF datasets into Semantic MediaWikiBatch import of large RDF datasets into Semantic MediaWiki
Batch import of large RDF datasets into Semantic MediaWiki
 
Neuro4j Workflow Overview
Neuro4j Workflow OverviewNeuro4j Workflow Overview
Neuro4j Workflow Overview
 
Flow based programming an overview
Flow based programming   an overviewFlow based programming   an overview
Flow based programming an overview
 
Flow-based programming with Elixir
Flow-based programming with ElixirFlow-based programming with Elixir
Flow-based programming with Elixir
 
Vagrant, Ansible and Docker - How they fit together for productive flexible d...
Vagrant, Ansible and Docker - How they fit together for productive flexible d...Vagrant, Ansible and Docker - How they fit together for productive flexible d...
Vagrant, Ansible and Docker - How they fit together for productive flexible d...
 
ZeroMQ Is The Answer
ZeroMQ Is The AnswerZeroMQ Is The Answer
ZeroMQ Is The Answer
 
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive SystemsGo Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
 
Reactive Streams: Handling Data-Flow the Reactive Way
Reactive Streams: Handling Data-Flow the Reactive WayReactive Streams: Handling Data-Flow the Reactive Way
Reactive Streams: Handling Data-Flow the Reactive Way
 

Similar to SciPipe - A light-weight workflow library inspired by flow-based programming

apacheairflow-160827123852.pdf
apacheairflow-160827123852.pdfapacheairflow-160827123852.pdf
apacheairflow-160827123852.pdf
vijayapraba1
 
Jupyter notebooks on steroids
Jupyter notebooks on steroidsJupyter notebooks on steroids
Jupyter notebooks on steroids
Jose Enrique Ruiz
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
Sumit Maheshwari
 
Performance optimization techniques for Java code
Performance optimization techniques for Java codePerformance optimization techniques for Java code
Performance optimization techniques for Java code
Attila Balazs
 
HiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOSHiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOS
Tulipp. Eu
 
Iga workflow
Iga workflowIga workflow
Iga workflow
WEBdeBS
 
How to write a well-behaved Python command line application
How to write a well-behaved Python command line applicationHow to write a well-behaved Python command line application
How to write a well-behaved Python command line application
gjcross
 
FireWorks overview
FireWorks overviewFireWorks overview
FireWorks overview
Anubhav Jain
 
PyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web ApplicationsPyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web Applications
Graham Dumpleton
 
Machine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkMLMachine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkML
Arnab Biswas
 
Wait, IPython can do that?! (30 minutes)
Wait, IPython can do that?! (30 minutes)Wait, IPython can do that?! (30 minutes)
Wait, IPython can do that?! (30 minutes)
Sebastian Witowski
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uring
ScyllaDB
 
Callgraph analysis
Callgraph analysisCallgraph analysis
Callgraph analysis
Roberto Agostino Vitillo
 
Go at Skroutz
Go at SkroutzGo at Skroutz
Go at Skroutz
AgisAnastasopoulos
 
Building your first aplication using Apache Apex
Building your first aplication using Apache ApexBuilding your first aplication using Apache Apex
Building your first aplication using Apache Apex
Yogi Devendra Vyavahare
 
Building Your First Apache Apex Application
Building Your First Apache Apex ApplicationBuilding Your First Apache Apex Application
Building Your First Apache Apex Application
Apache Apex
 
Great Tools Heavily Used In Japan, You Don't Know.
Great Tools Heavily Used In Japan, You Don't Know.Great Tools Heavily Used In Japan, You Don't Know.
Great Tools Heavily Used In Japan, You Don't Know.
Junichi Ishida
 
My "Perfect" Toolchain Setup for Grails Projects
My "Perfect" Toolchain Setup for Grails ProjectsMy "Perfect" Toolchain Setup for Grails Projects
My "Perfect" Toolchain Setup for Grails Projects
GR8Conf
 
Api world apache nifi 101
Api world   apache nifi 101Api world   apache nifi 101
Api world apache nifi 101
Timothy Spann
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache Flink
AKASH SIHAG
 

Similar to SciPipe - A light-weight workflow library inspired by flow-based programming (20)

apacheairflow-160827123852.pdf
apacheairflow-160827123852.pdfapacheairflow-160827123852.pdf
apacheairflow-160827123852.pdf
 
Jupyter notebooks on steroids
Jupyter notebooks on steroidsJupyter notebooks on steroids
Jupyter notebooks on steroids
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
Performance optimization techniques for Java code
Performance optimization techniques for Java codePerformance optimization techniques for Java code
Performance optimization techniques for Java code
 
HiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOSHiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOS
 
Iga workflow
Iga workflowIga workflow
Iga workflow
 
How to write a well-behaved Python command line application
How to write a well-behaved Python command line applicationHow to write a well-behaved Python command line application
How to write a well-behaved Python command line application
 
FireWorks overview
FireWorks overviewFireWorks overview
FireWorks overview
 
PyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web ApplicationsPyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web Applications
 
Machine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkMLMachine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkML
 
Wait, IPython can do that?! (30 minutes)
Wait, IPython can do that?! (30 minutes)Wait, IPython can do that?! (30 minutes)
Wait, IPython can do that?! (30 minutes)
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uring
 
Callgraph analysis
Callgraph analysisCallgraph analysis
Callgraph analysis
 
Go at Skroutz
Go at SkroutzGo at Skroutz
Go at Skroutz
 
Building your first aplication using Apache Apex
Building your first aplication using Apache ApexBuilding your first aplication using Apache Apex
Building your first aplication using Apache Apex
 
Building Your First Apache Apex Application
Building Your First Apache Apex ApplicationBuilding Your First Apache Apex Application
Building Your First Apache Apex Application
 
Great Tools Heavily Used In Japan, You Don't Know.
Great Tools Heavily Used In Japan, You Don't Know.Great Tools Heavily Used In Japan, You Don't Know.
Great Tools Heavily Used In Japan, You Don't Know.
 
My "Perfect" Toolchain Setup for Grails Projects
My "Perfect" Toolchain Setup for Grails ProjectsMy "Perfect" Toolchain Setup for Grails Projects
My "Perfect" Toolchain Setup for Grails Projects
 
Api world apache nifi 101
Api world   apache nifi 101Api world   apache nifi 101
Api world apache nifi 101
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache Flink
 

More from Samuel Lampa

Using Flow-based programming to write tools and workflows for Scientific Comp...
Using Flow-based programming to write tools and workflows for Scientific Comp...Using Flow-based programming to write tools and workflows for Scientific Comp...
Using Flow-based programming to write tools and workflows for Scientific Comp...
Samuel Lampa
 
Linked Data for improved organization of research data
Linked Data  for improved organization  of research dataLinked Data  for improved organization  of research data
Linked Data for improved organization of research data
Samuel Lampa
 
How to document computational research projects
How to document computational research projectsHow to document computational research projects
How to document computational research projects
Samuel Lampa
 
Reproducibility in Scientific Data Analysis - BioScience Seminar
Reproducibility in Scientific Data Analysis - BioScience SeminarReproducibility in Scientific Data Analysis - BioScience Seminar
Reproducibility in Scientific Data Analysis - BioScience Seminar
Samuel Lampa
 
First encounter with Elixir - Some random things
First encounter with Elixir - Some random thingsFirst encounter with Elixir - Some random things
First encounter with Elixir - Some random things
Samuel Lampa
 
Profiling go code a beginners tutorial
Profiling go code   a beginners tutorialProfiling go code   a beginners tutorial
Profiling go code a beginners tutorial
Samuel Lampa
 
My lightning talk at Go Stockholm meetup Aug 6th 2013
My lightning talk at Go Stockholm meetup Aug 6th 2013My lightning talk at Go Stockholm meetup Aug 6th 2013
My lightning talk at Go Stockholm meetup Aug 6th 2013
Samuel Lampa
 
3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
Samuel Lampa
 

More from Samuel Lampa (8)

Using Flow-based programming to write tools and workflows for Scientific Comp...
Using Flow-based programming to write tools and workflows for Scientific Comp...Using Flow-based programming to write tools and workflows for Scientific Comp...
Using Flow-based programming to write tools and workflows for Scientific Comp...
 
Linked Data for improved organization of research data
Linked Data  for improved organization  of research dataLinked Data  for improved organization  of research data
Linked Data for improved organization of research data
 
How to document computational research projects
How to document computational research projectsHow to document computational research projects
How to document computational research projects
 
Reproducibility in Scientific Data Analysis - BioScience Seminar
Reproducibility in Scientific Data Analysis - BioScience SeminarReproducibility in Scientific Data Analysis - BioScience Seminar
Reproducibility in Scientific Data Analysis - BioScience Seminar
 
First encounter with Elixir - Some random things
First encounter with Elixir - Some random thingsFirst encounter with Elixir - Some random things
First encounter with Elixir - Some random things
 
Profiling go code a beginners tutorial
Profiling go code   a beginners tutorialProfiling go code   a beginners tutorial
Profiling go code a beginners tutorial
 
My lightning talk at Go Stockholm meetup Aug 6th 2013
My lightning talk at Go Stockholm meetup Aug 6th 2013My lightning talk at Go Stockholm meetup Aug 6th 2013
My lightning talk at Go Stockholm meetup Aug 6th 2013
 
3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
 

Recently uploaded

GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 

Recently uploaded (20)

GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 

SciPipe - A light-weight workflow library inspired by flow-based programming

  • 1. SciPipe A light-weight workflow library inspired by flow-based programming Samuel Lampa, @smllmp, bionics.it Dept. Pharm. Biosci. UU, 2016-04-28
  • 2. Top light-weight workflow tools Snakemake ● Great for short one-off explorative stuff ● Tricky for complex graphs Bpipe ● Easy to use for highly linear workflows ● Not so easy with branching workflows Nextflow ● Dataflow means dynamic scheduling possible(!) ● Own way of organizing outputs ● No “re-usable components” support
  • 3. SciLuigi and SciPipe SciLuigi ● Great re-usable components story ● Highly customizable output file naming ● Easy to extend API ● No dynamic scheduling :( ● Performance problems with more than 64 workers SciPipe ● (Same benefits as SciLuigi) ● Also: Allows dynamic scheduling ● Also: Much lower resource usage (1000s of workers is OK) ● Also: Simpler, less code, less maintenance ● Also: High-performance for in-line components
  • 4. SciPipe in brief ● Website: scipipe.org ● Simple, very little code => maintainable ● Write workflows in a subset of Go(lang) ● Execute readable .go-files: go run myworkflow.go ● Optional compilation to static executable files: go build; ./myworkflow ● No new language. Use existing Go tooling: ● Editors, Debuggers, Linters, Profilers ...
  • 6. Flow-based programming principles ● Separate network definition (separate from process definitions) ● Named ports ● Channels with bounded buffers ● Information packets (IPs) with defined lifetimes ● More info: en.wikipedia.org/wiki/Flow-based programming www.jpaulmorrison.com/fbp
  • 7. Define components From shell commands: … or in plain Go: … and then mix & match! (See e.g. this example)
  • 8. Connect components Just assign out-ports to in-ports (will make them use the same Go channel): That’s about it.
  • 9. SciPipe in action: ”Hello World” example
  • 11. SciPipe: A bit longer example... In this area, all processes will run tasks in parallel, one for each split produced by “split”
  • 12. Architecture: Basic Components ● scipipe.SciProcess ● Long-running ● Typically one per operation ● Typically spawns one task per input ● scipipe.SciTask ● Short lived ● Executes just one shell command or custom Go function ● Typically one per operation/set of in-data files ● scipipe.FileTarget ● Most common data type passed between processes