
Using Flow-based programming to write tools and workflows for Scientific Computing in Go



  1. Using Flow-based programming ... to write Tools and Workflows for Scientific Computing
     Go Stockholm Conference, Oct 6, 2018
     Samuel Lampa | bionics.it | @saml (slack) | @smllmp (twitter)
     Ex - Dept. of Pharm. Biosci, Uppsala University | www.farmbio.uu.se | pharmb.io
     Savantic AB savantic.se | RIL Partner AB rilpartner.com
  2. About the speaker
     ● Name: Samuel Lampa
     ● PhD in Pharm. Bioinformatics from UU / pharmb.io (since 1 week)
     ● Researched: Flow-based programming-based workflow tools to build predictive models for drug discovery
     ● Previously: HPC sysadmin & developer, Web developer, etc.; M.Sc. in molecular biotechnology engineering
     ● Next week: R&D Engineer at Savantic AB (savanticab.com)
     ● (Also: AfricArxiv (africarxiv.org) and RIL Partner AB (rilpartner.com))
  3. Read more about my research
     bit.ly/samlthesis → (bionics.it/posts/phdthesis)
  4. Flow-based … what?
  5. Flow-based programming (FBP)
     Note: Doesn’t need to be done visually though!
  6. FBP in brief
     ● Black box, asynchronously running processes
     ● Data exchange across predefined connections between named ports (with bounded buffers) by message passing only
     ● Connections specified separately from processes
     ● Processes can be reconnected endlessly to form different applications without having to be changed internally
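The "bounded buffers" point maps fairly directly onto Go's buffered channels. The following is a minimal sketch of that idea, not code from the talk; the function name `fillBuffer` and the capacities used are made up for illustration. It uses a non-blocking send to show where a sending process would have to wait.

```go
package main

import "fmt"

// fillBuffer (hypothetical helper) tries to send `attempts` packets on a
// connection whose buffer holds at most `capacity` packets, while nobody is
// receiving. It returns how many sends were accepted before the buffer filled.
func fillBuffer(capacity, attempts int) int {
	conn := make(chan int, capacity) // bounded buffer between two processes
	accepted := 0
	for i := 0; i < attempts; i++ {
		select {
		case conn <- i:
			accepted++
		default:
			// Buffer is full: a plain `conn <- i` would block here until
			// the downstream process receives a packet.
		}
	}
	return accepted
}

func main() {
	fmt.Println(fillBuffer(2, 5)) // prints 2
}
```

In a real network the sender would simply block instead of dropping packets, which is what gives an FBP network its built-in back-pressure.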
  8. The Central Dogma of Biology … from DNA to RNA to Proteins
     (Figure labels: DNA, mRNA, Protein, Amino acids, Ribosome, RNA polymerase, Cell nucleus, Cell)
     Image credits: Nicolle Rager, National Science Foundation. License: Public domain
  9. FBP vs Dataflow
     “FBP is a particular form of dataflow programming based on bounded buffers, information packets with defined lifetimes, named ports, and separate definition of connections”
  10. Benefits abound
      ● Change of connection wiring without rewriting components
      ● Inherently concurrent - suited for the multi-core CPU world
      ● Testing, monitoring and logging very easy: Just plug in a mock-, logging- or debugging component.
      ● Etc etc ...
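The "just plug in a logging component" benefit can be sketched in plain Go. This is an illustration under assumptions, not code from the talk: the components `source`, `tap` and `upper` and the helper `runWithTap` are hypothetical names. The point is that `tap` is inserted between two existing components without either of them being changed.

```go
package main

import (
	"fmt"
	"strings"
)

// source emits the given strings and then closes its output.
func source(items ...string) <-chan string {
	out := make(chan string, 16)
	go func() {
		defer close(out)
		for _, s := range items {
			out <- s
		}
	}()
	return out
}

// upper is a stand-in processing component: it upper-cases every string.
func upper(in <-chan string) <-chan string {
	out := make(chan string, 16)
	go func() {
		defer close(out)
		for s := range in {
			out <- strings.ToUpper(s)
		}
	}()
	return out
}

// tap is a hypothetical logging component: it forwards every packet
// unchanged and hands a copy to the supplied log function.
func tap(in <-chan string, log func(string)) <-chan string {
	out := make(chan string, 16)
	go func() {
		defer close(out)
		for s := range in {
			log(s)
			out <- s
		}
	}()
	return out
}

// runWithTap wires source -> tap -> upper and returns both what the tap
// logged and what came out of the end of the network.
func runWithTap(items ...string) (logged, results []string) {
	tapped := tap(source(items...), func(s string) { logged = append(logged, s) })
	for s := range upper(tapped) {
		results = append(results, s)
	}
	return logged, results
}

func main() {
	logged, results := runWithTap("hej", "hopp")
	fmt.Println(logged, results)
}
```

Removing the tap again only means changing the wiring line; `source` and `upper` stay untouched.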
  11. Invented by J. Paul Morrison at IBM in the late 1960s
      jpaulmorrison.com(/fbp)
  12. FBP in Go: GoFlow
      github.com/trustmaster/goflow
      by Vladimir Sibirov @sibiroff (twitter)
  13. FBP in plain Go (almost) without frameworks?
  14. Generator functions
      Adapted from Rob Pike’s slides: talks.golang.org/2012/concurrency.slide#25

          package main

          import "fmt"

          func main() {
              c := generateInts(10) // Call function to get a channel
              for v := range c {    // … and loop over it
                  fmt.Println(v)
              }
          }

          // generateInts returns a channel that yields the ints 0 through max (inclusive).
          func generateInts(max int) <-chan int {
              c := make(chan int)
              go func() { // Start a goroutine inside the function
                  defer close(c)
                  for i := 0; i <= max; i++ {
                      c <- i
                  }
              }()
              return c // Return the channel
          }
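  15. Chaining generator functions 1/2

          // reverse reads strings from cin and sends each one, reversed, on the returned channel.
          func reverse(cin chan string) chan string {
              cout := make(chan string)
              go func() {
                  defer close(cout)
                  for s := range cin {         // Loop over in-chan
                      cout <- reverseString(s) // Send on out-chan
                  }
              }()
              return cout
          }

          // reverseString is an assumed helper (not shown on the slide) that reverses one string.
          func reverseString(s string) string {
              r := []rune(s)
              for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
                  r[i], r[j] = r[j], r[i]
              }
              return string(r)
          }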
  16. Chaining generator functions 2/2

          // Chain the generator functions
          dna := generateDNA() // Generator func of strings
          rev := reverse(dna)
          compl := complement(rev)

          // Drive the chain by reading from last channel
          for dnaString := range compl {
              fmt.Println(dnaString)
          }
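The chaining fragment leaves `generateDNA` and `complement` undefined. A self-contained sketch of the whole chain could look as follows; the bodies of `generateDNA`, `complement`, the shared `mapStrings` stage and the example sequences are all assumptions made up for illustration, not taken from the talk.

```go
package main

import "fmt"

// generateDNA emits a few hard-coded example sequences (made-up data).
func generateDNA() <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, s := range []string{"AAAG", "CCGT", "GATTACA"} {
			out <- s
		}
	}()
	return out
}

// reverseString returns s reversed (ASCII input assumed).
func reverseString(s string) string {
	b := []byte(s)
	for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
		b[i], b[j] = b[j], b[i]
	}
	return string(b)
}

// complementString swaps A<->T and C<->G and leaves other bytes untouched.
func complementString(s string) string {
	pairs := map[byte]byte{'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
	b := []byte(s)
	for i, c := range b {
		if p, ok := pairs[c]; ok {
			b[i] = p
		}
	}
	return string(b)
}

// mapStrings is a hypothetical shared stage: it applies f to every string
// read from in and sends the result on the returned channel.
func mapStrings(in <-chan string, f func(string) string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for s := range in {
			out <- f(s)
		}
	}()
	return out
}

func reverse(in <-chan string) <-chan string    { return mapStrings(in, reverseString) }
func complement(in <-chan string) <-chan string { return mapStrings(in, complementString) }

func main() {
	// Chain the generator functions and drive the chain from the last channel.
	for s := range complement(reverse(generateDNA())) {
		fmt.Println(s)
	}
}
```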
  18. Problems with the generator approach
      ● Inputs not named in connection code (no keyword arguments)
      ● Multiple return values depend on positional arguments:

          leftPart, rightPart := splitInHalves(chanOfStrings)
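To make the positional-return problem concrete, here is one possible `splitInHalves`. Its body and the helper `collectHalves` are assumptions for illustration only; the slide shows just the call. Both returned channels have the same type, so the compiler cannot notice if a caller swaps them.

```go
package main

import "fmt"

// splitInHalves (hypothetical implementation) sends the first half of each
// incoming string on the first returned channel and the rest on the second.
func splitInHalves(in <-chan string) (<-chan string, <-chan string) {
	left := make(chan string, 16)
	right := make(chan string, 16)
	go func() {
		defer close(left)
		defer close(right)
		for s := range in {
			mid := len(s) / 2
			left <- s[:mid]
			right <- s[mid:]
		}
	}()
	return left, right
}

// collectHalves feeds items through splitInHalves and gathers both outputs.
func collectHalves(items []string) (lefts, rights []string) {
	in := make(chan string, len(items))
	for _, s := range items {
		in <- s
	}
	close(in)

	// Only the position tells us which channel is which. Writing
	// `rightPart, leftPart := splitInHalves(in)` would compile just as well.
	leftPart, rightPart := splitInHalves(in)
	for l := range leftPart {
		lefts = append(lefts, l)
		rights = append(rights, <-rightPart)
	}
	return lefts, rights
}

func main() {
	lefts, rights := collectHalves([]string{"ABCD", "GATTACA"})
	fmt.Println(lefts, rights)
}
```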
  19. Could we emulate named ports?

          type P struct {
              in  chan string // Channels as struct fields, to act as “named ports”
              out chan string
          }

          func NewP() *P { // Initialize a new component
              return &P{
                  in:  make(chan string, 16),
                  out: make(chan string, 16),
              }
          }

          func (p *P) Run() {
              defer close(p.out)
              for s := range p.in { // Refer to struct fields when reading ...
                  p.out <- s        // ... and writing
              }
          }
  20. Could we emulate named ports?

          func main() {
              p1 := NewP()
              p2 := NewP()

              p2.in = p1.out // Connect dependencies here, by assigning to same chan

              go p1.Run()
              go p2.Run()

              go func() { // Feed the input of the network
                  defer close(p1.in)
                  for i := 0; i <= 10; i++ {
                      p1.in <- "Hej"
                  }
              }()

              for s := range p2.out { // Drive the chain from the main goroutine
                  fmt.Println(s)
              }
          }
  21. Add almost no additional code, and get: flowbase.org
  22. Real-world use of FlowBase
      ● RDF → (Semantic) MediaWiki XML
      ● Import via MediaWiki XML import
      ● Code: github.com/rdfio/rdf2smw
      ● Paper: bit.ly/rdfiopub
  23. Connecting dependencies with FlowBase

          ttlFileRead.OutTriple = aggregator.In
          aggregator.Out = indexCreator.In

          indexCreator.Out = indexFanOut.In
          indexFanOut.Out["serialize"] = indexToAggr.In
          indexFanOut.Out["conv"] = triplesToWikiConverter.InIndex

          indexToAggr.Out = triplesToWikiConverter.InAggregate

          triplesToWikiConverter.OutPage = xmlCreator.InWikiPage

          xmlCreator.OutTemplates = templateWriter.In
          xmlCreator.OutProperties = propertyWriter.In
          xmlCreator.OutPages = pageWriter.In

      Source: github.com/rdfio/rdf2smw/blob/e7e2b3/main.go#L100-L125
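The wiring above sends one output to two consumers through a fan-out component whose out-ports live in a map keyed by name. A minimal sketch of such a component is shown below. It is written from scratch for illustration and is not the FlowBase or rdf2smw implementation; the type `FanOut` and the helper `fanOutDemo` are assumed names.

```go
package main

import (
	"fmt"
	"sync"
)

// FanOut (hypothetical) copies every packet from In to all channels
// registered in Out, which is keyed by a port name.
type FanOut struct {
	In  chan string
	Out map[string]chan string
}

func NewFanOut() *FanOut {
	return &FanOut{
		In:  make(chan string, 16),
		Out: make(map[string]chan string),
	}
}

// Run forwards packets until In is closed, then closes all out-ports.
func (f *FanOut) Run() {
	defer func() {
		for _, out := range f.Out {
			close(out)
		}
	}()
	for s := range f.In {
		for _, out := range f.Out {
			out <- s
		}
	}
}

// fanOutDemo wires two consumers to the fan-out by assigning channels to
// named map entries, then returns what each consumer received.
func fanOutDemo(items []string) (serialized, converted []string) {
	f := NewFanOut()
	serializeIn := make(chan string, 16)
	convIn := make(chan string, 16)
	f.Out["serialize"] = serializeIn // connect by assigning the same channel
	f.Out["conv"] = convIn

	go f.Run()
	go func() { // feed the network
		defer close(f.In)
		for _, s := range items {
			f.In <- s
		}
	}()

	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		for s := range serializeIn {
			serialized = append(serialized, s)
		}
	}()
	go func() {
		defer wg.Done()
		for s := range convIn {
			converted = append(converted, s)
		}
	}()
	wg.Wait()
	return serialized, converted
}

func main() {
	a, b := fanOutDemo([]string{"triple1", "triple2"})
	fmt.Println(a, b)
}
```

Both consumers are drained concurrently so that a full buffer on one port cannot stall delivery to the other.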
  24. Taking it further: Port structs

          ttlFileRead.OutTriple().To(aggregator.In())
          aggregator.Out().To(indexCreator.In())

          indexCreator.Out().To(indexToAggr.In())
          indexCreator.Out().To(triplesToWikiConverter.InIndex())

          indexToAggr.Out().To(triplesToWikiConverter.InAggregate())

          triplesToWikiConverter.OutPage().To(xmlCreator.InWikiPage())

          xmlCreator.OutTemplates().To(templateWriter.In())
          xmlCreator.OutProperties().To(propertyWriter.In())
          xmlCreator.OutPages().To(pageWriter.In())

      (So far only used in SciPipe, not yet FlowBase)
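One way to get the `Out().To(In())` style is to wrap channels in small port types. The sketch below is an assumption about how that could be done, not SciPipe's actual port implementation; `InPort`, `OutPort`, `Send`, `Close` and `portDemo` are hypothetical names. Note how calling `To` twice on the same out-port replaces the explicit fan-out component.

```go
package main

import (
	"fmt"
	"sync"
)

// InPort (hypothetical) owns the channel that a process reads from.
type InPort struct {
	Chan chan string
}

func NewInPort() *InPort {
	return &InPort{Chan: make(chan string, 16)}
}

// OutPort (hypothetical) remembers every in-port it has been connected to.
type OutPort struct {
	targets []*InPort
}

// To connects this out-port to an in-port. It must be called before sending starts.
func (o *OutPort) To(in *InPort) {
	o.targets = append(o.targets, in)
}

// Send delivers a packet to every connected in-port.
func (o *OutPort) Send(s string) {
	for _, t := range o.targets {
		t.Chan <- s
	}
}

// Close closes the channels of all connected in-ports.
func (o *OutPort) Close() {
	for _, t := range o.targets {
		close(t.Chan)
	}
}

// portDemo connects one out-port to two in-ports and returns what each received.
func portDemo(items []string) (first, second []string) {
	out := &OutPort{}
	a, b := NewInPort(), NewInPort()
	out.To(a)
	out.To(b) // a second To() gives fan-out without a separate component

	go func() {
		defer out.Close()
		for _, s := range items {
			out.Send(s)
		}
	}()

	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		for s := range a.Chan {
			first = append(first, s)
		}
	}()
	go func() {
		defer wg.Done()
		for s := range b.Chan {
			second = append(second, s)
		}
	}()
	wg.Wait()
	return first, second
}

func main() {
	first, second := portDemo([]string{"index1", "index2"})
	fmt.Println(first, second)
}
```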
  25. SciPipe: Write Scientific Workflows in Go
      ● Define processes with shell command patterns
      ● Atomic writes, Restartable workflows, Caching
      ● Automatic file naming
      ● Audit logging
      ● Workflow graph plotting
      ● Intro & Docs: scipipe.org
      ● Preprint paper: doi.org/10.1101/380808
  26. SciPipe
      ● Workflow
        ● Keeps track of dependency graph
      ● Process
        ● Added to workflows
        ● Long-running
        ● Typically one per operation
      ● Task
        ● Spawned by processes
        ● Executes just one shell command or custom Go function
        ● Typically one task spawned per operation on a set of input files
      ● Information Packet (IP)
        ● Most common data type passed between processes
      (Diagram labels: Workflow, Process, File IP, Task)
  27. “Hello World” in SciPipe

          package main

          import (
              // Import the SciPipe package, aliased to 'sp'
              sp "github.com/scipipe/scipipe"
          )

          func main() {
              // Init workflow with a name, and max concurrent tasks
              wf := sp.NewWorkflow("hello_world", 4)

              // Initialize processes and set output file paths
              hello := wf.NewProc("hello", "echo 'Hello ' > {o:out}")
              hello.SetOut("out", "hello.txt")

              world := wf.NewProc("world", "echo $(cat {i:in}) World >> {o:out}")
              world.SetOut("out", "{i:in|%.txt}_world.txt")

              // Connect network
              world.In("in").From(hello.Out("out"))

              // Run workflow
              wf.Run()
          }
  28. “Hello World” in SciPipe: the steps in the program
      ● Import SciPipe
      ● Set up any default variables or data, handle flags etc
      ● Initiate workflow
      ● Create processes
      ● Define outputs and paths
      ● Connect outputs to inputs (dependencies / data flow)
      ● Run the workflow
  35. Writing SciPipe workflows

          package main

          import (
              "github.com/scipipe/scipipe"
          )

          const dna = "AAAGCCCGTGGGGGACCTGTTC"

          func main() {
              wf := scipipe.NewWorkflow("DNA Base Complement Workflow", 4)

              makeDNA := wf.NewProc("Make DNA", "echo "+dna+" > {o:dna}")
              makeDNA.SetOut("dna", "dna.txt")

              complmt := wf.NewProc("Base Complement", "cat {i:in} | tr ATCG TAGC > {o:compl}")
              complmt.SetOut("compl", "{i:in|%.txt}.compl.txt")

              reverse := wf.NewProc("Reverse", "cat {i:in} | rev > {o:rev}")
              reverse.SetOut("rev", "{i:in|%.txt}.rev.txt")

              complmt.In("in").From(makeDNA.Out("dna"))
              reverse.In("in").From(complmt.Out("compl"))

              wf.Run()
          }

      ● Import SciPipe
      ● Set up any default variables or data, handle flags etc
      ● Initiate workflow
      ● Create processes
      ● Define outputs and paths
      ● Connect outputs to inputs (dependencies / data flow)
      ● Run the workflow
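As a sanity check on what the two shell steps in this workflow compute, the same base complement (`tr ATCG TAGC`) and reversal (`rev`) can be done in a few lines of plain Go. This is only an illustrative sketch; the function names `complementDNA` and `reverseDNA` are made up, and the program does not use SciPipe at all.

```go
package main

import "fmt"

const dna = "AAAGCCCGTGGGGGACCTGTTC"

// complementDNA mirrors `tr ATCG TAGC`: A<->T and C<->G, byte by byte.
func complementDNA(s string) string {
	pairs := map[byte]byte{'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
	b := []byte(s)
	for i, c := range b {
		if p, ok := pairs[c]; ok {
			b[i] = p
		}
	}
	return string(b)
}

// reverseDNA mirrors `rev` for a single line of ASCII text.
func reverseDNA(s string) string {
	b := []byte(s)
	for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
		b[i], b[j] = b[j], b[i]
	}
	return string(b)
}

func main() {
	compl := complementDNA(dna)
	fmt.Println(compl)             // prints TTTCGGGCACCCCCTGGACAAG
	fmt.Println(reverseDNA(compl)) // prints GAACAGGTCCCCCACGGGCTTT
}
```

The second printed line is the reverse complement, which is the content one would expect in the final output file of the workflow.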
  42. Running it

          go run revcompl.go

  43. Dependency graph plotting
  44. Structured audit log (Hierarchical JSON)
  45. Turn Audit log into TeX/PDF report
      TeX template by Jonathan Alvarsson @jonalv
  46. Benefits of SciPipe - Thanks to Go + FBP
      ● Intuitive behaviour: Like conveyor belts & stations in a factory.
      ● Flexible: Combine command-line programs with Go components
      ● Custom file naming: Easy to manually browse output files
      ● Portable: Distribute as Go code or as compiled executable files
      ● Easy to debug: Use any Go debugging tools or even just println()
      ● Powerful audit logging: Stream outputs via UNIX FIFO files
      ● Efficient & Parallel: Fast code + Efficient use of multi-core CPU
  47. More info at: scipipe.org
  48. Thank you for your time!
      Using Flow-based programming ... to write Tools and Workflows for Scientific Computing
      Talk at Go Stockholm Conference, Oct 6, 2018
      Samuel Lampa | bionics.it | @saml (slack) | @smllmp (twitter)
      Dept. of Pharm. Biosci, Uppsala University | www.farmbio.uu.se | pharmb.io
