SciPipe - A light-weight workflow library inspired by flow-based programming

Samuel Lampa, @smllmp, bionics.it
Dept. Pharm. Biosci. UU, 2016-04-28

Top light-weight workflow tools
Snakemake
● Great for short one-off explorative stuff
● Tricky for complex graphs
Bpipe
● Easy to use for highly linear workflows
● Not so easy with branching workflows
Nextflow
● Dataflow model makes dynamic scheduling possible(!)
● Own way of organizing outputs
● No “re-usable components” support

SciLuigi and SciPipe
SciLuigi
● Great re-usable components story
● Highly customizable output file naming
● Easy to extend API
● No dynamic scheduling :(
● Performance problems with more than 64 workers
SciPipe
● (Same benefits as SciLuigi)
● Also: Allows dynamic scheduling
● Also: Much lower resource usage (1000s of workers is OK)
● Also: Simpler, less code, less maintenance
● Also: High-performance for in-line components

SciPipe in brief
● Website: scipipe.org
● Simple, very little code => maintainable
● Write workflows in a subset of Go(lang)
● Execute readable .go-files:
go run myworkflow.go
● Optional compilation to static executable files:
go build; ./myworkflow
● No new language. Use existing Go tooling:
  ● Editors, debuggers, linters, profilers ...

Flow-based programming
www.jpaulmorrison.com/fbp

Flow-based programming principles
● Separate network definition (separate from process definitions)
● Named ports
● Channels with bounded buffers
● Information packets (IPs) with defined lifetimes
● More info:
en.wikipedia.org/wiki/Flow-based_programming
www.jpaulmorrison.com/fbp

Define components
From shell commands:
… or in plain Go:
… and then mix & match! (See e.g. this example)
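
As a rough illustration of mixing the two kinds of components, here is a standard-library-only sketch. It does not use SciPipe's API: the Component interface, the ShellComponent and FuncComponent types, and the "{in}" placeholder are all hypothetical names chosen for this example, and the shell variant assumes a POSIX sh is available.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// Component is a hypothetical common interface for both kinds of component.
type Component interface {
	Process(input string) (string, error)
}

// ShellComponent wraps a shell command template. "{in}" is a made-up
// placeholder that is replaced by the input string.
type ShellComponent struct{ Template string }

// Command returns the command line with the placeholder filled in.
func (c ShellComponent) Command(input string) string {
	return strings.ReplaceAll(c.Template, "{in}", input)
}

// Process runs the command through sh and returns its trimmed stdout.
func (c ShellComponent) Process(input string) (string, error) {
	out, err := exec.Command("sh", "-c", c.Command(input)).Output()
	return strings.TrimSpace(string(out)), err
}

// FuncComponent adapts a plain Go function to the same interface.
type FuncComponent func(string) (string, error)

// Process calls the wrapped function.
func (f FuncComponent) Process(input string) (string, error) { return f(input) }

func main() {
	// Mix and match: a shell step followed by a Go step.
	components := []Component{
		ShellComponent{Template: "echo {in} | tr a-z A-Z"},
		FuncComponent(func(s string) (string, error) { return s + "!", nil }),
	}
	data := "hello"
	for _, c := range components {
		var err error
		if data, err = c.Process(data); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println(data)
}
```

Since both types satisfy the same interface, the loop in main does not need to know which steps shell out and which run in-process.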

Connect components
Just assign out-ports to in-ports
(will make them use the same Go channel):
That’s about it.

SciPipe in action: “Hello World” example

SciPipe: A bit longer example...
In this area, all processes will run tasks in parallel, one for each split produced by “split”

Architecture: Basic Components
● scipipe.SciProcess
  ● Long-running
  ● Typically one per operation
  ● Typically spawns one task per input
● scipipe.SciTask
  ● Short-lived
  ● Executes just one shell command or custom Go function
  ● Typically one per operation/set of in-data files
● scipipe.FileTarget
  ● Most common data type passed between processes

Architecture: Processes vs. Tasks
