
Using Flow-based programming to write tools and workflows for Scientific Computing in Go


Presentation at Go Stockholm Conference October 6, 2018, at Google Stockholm.

Published in: Engineering


  1. 1. Using Flow-based programming ... to write Tools and Workflows for Scientific Computing Go Stockholm Conference Oct 6, 2018 Samuel Lampa | bionics.it | @saml (slack) | @smllmp (twitter) Ex - Dept. of Pharm. Biosci, Uppsala University | www.farmbio.uu.se | pharmb.io Savantic AB savantic.se | RIL Partner AB rilpartner.com
  2. 2. About the speaker
      ● Name: Samuel Lampa
      ● PhD in Pharm. Bioinformatics from UU / pharmb.io (since 1 week)
      ● Researched: Flow-based programming-based workflow tools to build predictive models for drug discovery
      ● Previously: HPC sysadmin & developer, web developer, etc.; M.Sc. in molecular biotechnology engineering
      ● Next week: R&D Engineer at Savantic AB (savanticab.com)
      ● (Also: AfricArxiv (africarxiv.org) and RIL Partner AB (rilpartner.com))
  3. 3. Read more about my research bit.ly/samlthesis → (bionics.it/posts/phdthesis)
  4. 4. Flow-based … what?
  5. 5. Flow-based programming (FBP) Note: Doesn’t need to be done visually though!
  6. 6. FBP in brief
      ● Black-box, asynchronously running processes
      ● Data exchange across predefined connections between named ports (with bounded buffers), by message passing only
      ● Connections specified separately from processes
      ● Processes can be reconnected endlessly to form different applications without having to be changed internally
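The points above can be sketched in plain Go. This is a minimal illustration, not code from the talk: the process names `upper` and `exclaim`, the helper `runNetwork`, and the buffer size of 4 are all assumptions chosen for the example. Each process is a black box that only sees its own ports, and the connections are declared separately in `runNetwork`.

```go
package main

import (
	"fmt"
	"strings"
)

// upper is a black-box process: it only knows its own in- and out-port.
func upper(in <-chan string, out chan<- string) {
	defer close(out)
	for s := range in {
		out <- strings.ToUpper(s)
	}
}

// exclaim is a second black-box process with the same port shape.
func exclaim(in <-chan string, out chan<- string) {
	defer close(out)
	for s := range in {
		out <- s + "!"
	}
}

// runNetwork defines the connections (bounded buffers of size 4)
// separately from the processes, then runs the network to completion.
func runNetwork(inputs []string) []string {
	a := make(chan string, 4)
	b := make(chan string, 4)
	c := make(chan string, 4)

	// Each process runs asynchronously in its own goroutine.
	go upper(a, b)
	go exclaim(b, c)

	// Feed the network and close the first connection when done.
	go func() {
		defer close(a)
		for _, s := range inputs {
			a <- s
		}
	}()

	var results []string
	for s := range c {
		results = append(results, s)
	}
	return results
}

func main() {
	for _, s := range runNetwork([]string{"hej", "hello"}) {
		fmt.Println(s)
	}
}
```

Swapping the two `go` lines to `go exclaim(a, b)` and `go upper(b, c)` rewires the network without touching either process.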
  8. 8. The Central Dogma of Biology: from DNA to RNA to proteins. [Figure labels: DNA, mRNA, Protein, Amino acids, Ribosome, RNA polymerase, Cell nucleus, Cell.] Image credits: Nicolle Rager, National Science Foundation. License: Public domain.
  9. 9. FBP vs Dataflow: “FBP is a particular form of dataflow programming based on bounded buffers, information packets with defined lifetimes, named ports, and separate definition of connections”
  10. 10. Benefits abound
      ● Change of connection wiring without rewriting components
      ● Inherently concurrent: suited for the multi-core CPU world
      ● Testing, monitoring and logging are very easy: just plug in a mock, logging or debugging component
      ● Etc. etc. ...
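As an illustration of the "just plug in a logging component" point, here is a small sketch of a pass-through component. The names `logTap` and `source` are hypothetical and not taken from any library; the tap forwards every message unchanged and hands a copy to a caller-supplied log function.

```go
package main

import (
	"fmt"
)

// logTap is a hypothetical pass-through component: it forwards every
// message unchanged and passes a formatted copy to logf.
func logTap(name string, in <-chan string, logf func(string)) <-chan string {
	out := make(chan string, 4)
	go func() {
		defer close(out)
		for s := range in {
			logf(fmt.Sprintf("[%s] %s", name, s))
			out <- s
		}
	}()
	return out
}

// source emits the given strings on a channel and then closes it.
func source(items ...string) <-chan string {
	out := make(chan string, 4)
	go func() {
		defer close(out)
		for _, s := range items {
			out <- s
		}
	}()
	return out
}

func main() {
	// Plug the tap between the source and the consumer.
	tapped := logTap("tap", source("AAAG", "CCGT"), func(line string) {
		fmt.Println(line)
	})
	for s := range tapped {
		fmt.Println("received:", s)
	}
}
```

Because the tap has the same port shape as the connection it sits on, neither the upstream nor the downstream component needs to change when it is added or removed.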
  11. 11. Invented by J. Paul Morrison at IBM in the late 1960s. jpaulmorrison.com(/fbp)
  12. 12. FBP in Go: GoFlow, by Vladimir Sibirov, @sibiroff (twitter). github.com/trustmaster/goflow
  13. 13. FBP in plain Go (almost) without frameworks?
  14. 14. Generator functions. Adapted from Rob Pike’s slides: talks.golang.org/2012/concurrency.slide#25

          package main

          import "fmt"

          func main() {
              c := generateInts(10) // Call function to get a channel
              for v := range c {    // … and loop over it
                  fmt.Println(v)
              }
          }

          // Return a channel of ints (yields 0 through max, inclusive)
          func generateInts(max int) <-chan int {
              c := make(chan int)
              go func() { // Init go-routine inside function
                  defer close(c)
                  for i := 0; i <= max; i++ {
                      c <- i
                  }
              }()
              return c // Return the channel
          }
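  15. 15. Chaining generator functions 1/2

          func reverse(cin chan string) chan string {
              cout := make(chan string)
              go func() {
                  defer close(cout)
                  for s := range cin { // Loop over in-chan
                      // reverseString is an assumed plain string-reversing helper, not
                      // shown on the slide (the slide calls reverse(s), which would not
                      // compile, since reverse takes a channel)
                      cout <- reverseString(s) // Send on out-chan
                  }
              }()
              return cout
          }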
  16. 16. Chaining generator functions 2/2

          // Chain the generator functions
          dna := generateDNA() // Generator func of strings
          rev := reverse(dna)
          compl := complement(rev)

          // Drive the chain by reading from last channel
          for dnaString := range compl {
              fmt.Println(dnaString)
          }
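The two slides on chaining only show fragments. A self-contained version might look like the sketch below. The bodies of `generateDNA` and `complement`, the helpers `reverseString` and `complementString`, and the two example sequences are assumptions filled in for illustration; only the chaining pattern itself comes from the slides.

```go
package main

import (
	"fmt"
	"strings"
)

// generateDNA emits a few example DNA strings (made-up test data).
func generateDNA() chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, s := range []string{"AAAG", "CCGT"} {
			out <- s
		}
	}()
	return out
}

// reverseString reverses a string byte by byte (fine for ASCII DNA).
func reverseString(s string) string {
	b := []byte(s)
	for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
		b[i], b[j] = b[j], b[i]
	}
	return string(b)
}

// complementString swaps each base for its complement in a single pass.
func complementString(s string) string {
	return strings.NewReplacer("A", "T", "T", "A", "C", "G", "G", "C").Replace(s)
}

// reverse is a generator function that reverses every string it receives.
func reverse(cin chan string) chan string {
	cout := make(chan string)
	go func() {
		defer close(cout)
		for s := range cin {
			cout <- reverseString(s)
		}
	}()
	return cout
}

// complement is a generator function that base-complements every string.
func complement(cin chan string) chan string {
	cout := make(chan string)
	go func() {
		defer close(cout)
		for s := range cin {
			cout <- complementString(s)
		}
	}()
	return cout
}

func main() {
	// Chain the generator functions and drive the chain from the last channel.
	for dnaString := range complement(reverse(generateDNA())) {
		fmt.Println(dnaString) // prints CTTT, then ACGG
	}
}
```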
  18. 18. Problems with the generator approach
      ● Inputs not named in connection code (no keyword arguments)
      ● Multiple return values are told apart only by their position: leftPart, rightPart := splitInHalves(chanOfStrings)
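To make the second problem concrete, here is one way `splitInHalves` could be written. The slide only shows the call, so the body below is an assumption; the buffer size of 16 is also arbitrary and only large enough for this small example, where one output is drained before the other.

```go
package main

import "fmt"

// splitInHalves sends the left and right half of each incoming string on
// two separate channels, which are returned positionally.
func splitInHalves(in <-chan string) (<-chan string, <-chan string) {
	left := make(chan string, 16)
	right := make(chan string, 16)
	go func() {
		defer close(left)
		defer close(right)
		for s := range in {
			mid := len(s) / 2
			left <- s[:mid]
			right <- s[mid:]
		}
	}()
	return left, right
}

func main() {
	in := make(chan string, 2)
	in <- "AAAGCC"
	in <- "GTTC"
	close(in)

	// Nothing but the order of the return values tells the halves apart:
	// swapping the two names below would still compile without warning.
	leftPart, rightPart := splitInHalves(in)
	for s := range leftPart {
		fmt.Println("left:", s)
	}
	for s := range rightPart {
		fmt.Println("right:", s)
	}
}
```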
  19. 19. Could we emulate named ports?

          type P struct {
              in  chan string // Channels as struct fields, to act as “named ports”
              out chan string
          }

          func NewP() *P { // Initialize a new component
              return &P{
                  in:  make(chan string, 16),
                  out: make(chan string, 16),
              }
          }

          func (p *P) Run() {
              defer close(p.out)
              for s := range p.in { // Refer to struct fields when reading ...
                  p.out <- s        // ... and writing
              }
          }
  20. 20. Could we emulate named ports?

          func main() {
              p1 := NewP()
              p2 := NewP()
              p2.in = p1.out // Connect dependencies here, by assigning to same chan

              go p1.Run()
              go p2.Run()

              go func() { // Feed the input of the network
                  defer close(p1.in)
                  for i := 0; i <= 10; i++ {
                      p1.in <- "Hej"
                  }
              }()

              for s := range p2.out { // Drive the chain from the main go-routine
                  fmt.Println(s)
              }
          }
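The same struct-field idea extends to components with several outputs, which is where positional return values were awkward. The sketch below is an illustration only: the `Splitter` type, its port names and the `drain` helper are hypothetical, and the buffer size of 16 is just large enough for draining one port before the other in this small example.

```go
package main

import "fmt"

// Splitter is a hypothetical component with one named in-port and two
// named out-ports.
type Splitter struct {
	In       chan string
	OutLeft  chan string
	OutRight chan string
}

// NewSplitter initializes the component with buffered channels as ports.
func NewSplitter() *Splitter {
	return &Splitter{
		In:       make(chan string, 16),
		OutLeft:  make(chan string, 16),
		OutRight: make(chan string, 16),
	}
}

// Run splits every incoming string in half and sends each half on the
// correspondingly named out-port.
func (p *Splitter) Run() {
	defer close(p.OutLeft)
	defer close(p.OutRight)
	for s := range p.In {
		mid := len(s) / 2
		p.OutLeft <- s[:mid]
		p.OutRight <- s[mid:]
	}
}

// drain reads a channel until it is closed and returns what it received.
func drain(ch <-chan string) []string {
	var items []string
	for s := range ch {
		items = append(items, s)
	}
	return items
}

func main() {
	split := NewSplitter()
	go split.Run()

	go func() { // Feed the input of the component
		defer close(split.In)
		split.In <- "AAAGCC"
		split.In <- "GTTC"
	}()

	// The port names, not the argument order, say which stream is which.
	fmt.Println("left: ", drain(split.OutLeft))
	fmt.Println("right:", drain(split.OutRight))
}
```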
  21. 21. Add almost no additional code, and get: flowbase.org
  22. 22. Real-world use of FlowBase
      ● RDF → (Semantic) MediaWiki XML
      ● Import via MediaWiki XML import
      ● Code: github.com/rdfio/rdf2smw
      ● Paper: bit.ly/rdfiopub
  23. 23. Connecting dependencies with FlowBase

          ttlFileRead.OutTriple = aggregator.In
          aggregator.Out = indexCreator.In
          indexCreator.Out = indexFanOut.In
          indexFanOut.Out["serialize"] = indexToAggr.In
          indexFanOut.Out["conv"] = triplesToWikiConverter.InIndex
          indexToAggr.Out = triplesToWikiConverter.InAggregate
          triplesToWikiConverter.OutPage = xmlCreator.InWikiPage
          xmlCreator.OutTemplates = templateWriter.In
          xmlCreator.OutProperties = propertyWriter.In
          xmlCreator.OutPages = pageWriter.In

      Source: github.com/rdfio/rdf2smw/blob/e7e2b3/main.go#L100-L125
  24. 24. Taking it further: Port structs

          ttlFileRead.OutTriple().To(aggregator.In())
          aggregator.Out().To(indexCreator.In())
          indexCreator.Out().To(indexToAggr.In())
          indexCreator.Out().To(triplesToWikiConverter.InIndex())
          indexToAggr.Out().To(triplesToWikiConverter.InAggregate())
          triplesToWikiConverter.OutPage().To(xmlCreator.InWikiPage())
          xmlCreator.OutTemplates().To(templateWriter.In())
          xmlCreator.OutProperties().To(propertyWriter.In())
          xmlCreator.OutPages().To(pageWriter.In())

      (So far only used in SciPipe, not yet FlowBase)
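A rough idea of how such port structs could work is sketched below. This is not SciPipe's implementation: `InPort`, `OutPort`, `Proc`, `NewProc` and `runPipeline` are hypothetical, minimal stand-ins, and unlike the slide (where `indexCreator.Out()` is connected twice) this sketch supports only one receiver per out-port.

```go
package main

import (
	"fmt"
	"strings"
)

// InPort and OutPort are hypothetical, minimal port structs.
type InPort struct{ Chan chan string }
type OutPort struct{ Chan chan string }

// To connects an out-port to an in-port by sharing the in-port's channel.
// In this sketch each out-port has at most one receiver (no fan-out).
func (o *OutPort) To(in *InPort) { o.Chan = in.Chan }

// Proc is a generic component that applies fn to every string it receives.
type Proc struct {
	in  *InPort
	out *OutPort
	fn  func(string) string
}

// NewProc creates a component with a buffered in-port and an unconnected out-port.
func NewProc(fn func(string) string) *Proc {
	return &Proc{
		in:  &InPort{Chan: make(chan string, 16)},
		out: &OutPort{},
		fn:  fn,
	}
}

func (p *Proc) In() *InPort   { return p.in }
func (p *Proc) Out() *OutPort { return p.out }

// Run must be started only after the out-port has been connected.
func (p *Proc) Run() {
	defer close(p.out.Chan)
	for s := range p.in.Chan {
		p.out.Chan <- p.fn(s)
	}
}

// runPipeline wires two components and a sink, then collects the results.
func runPipeline(inputs []string) []string {
	upper := NewProc(strings.ToUpper)
	exclaim := NewProc(func(s string) string { return s + "!" })
	sink := &InPort{Chan: make(chan string, 16)}

	// Wiring reads as "out-port to in-port", in the style of the slide.
	upper.Out().To(exclaim.In())
	exclaim.Out().To(sink)

	go upper.Run()
	go exclaim.Run()

	go func() { // Feed the input of the network
		defer close(upper.In().Chan)
		for _, s := range inputs {
			upper.In().Chan <- s
		}
	}()

	var results []string
	for s := range sink.Chan {
		results = append(results, s)
	}
	return results
}

func main() {
	fmt.Println(runPipeline([]string{"hej", "hello"}))
}
```

Compared with assigning channels to struct fields directly, the method-based form leaves room for the port types to validate or record connections, which is one plausible motivation for the design.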
  25. 25. SciPipe: Write Scientific Workflows in Go
      ● Define processes with shell command patterns
      ● Atomic writes, restartable workflows, caching
      ● Automatic file naming
      ● Audit logging
      ● Workflow graph plotting
      ● Intro & Docs: scipipe.org
      ● Preprint paper: doi.org/10.1101/380808
  26. 26. SciPipe
      ● Workflow
          ● Keeps track of dependency graph
      ● Process
          ● Added to workflows
          ● Long-running
          ● Typically one per operation
      ● Task
          ● Spawned by processes
          ● Executes just one shell command or custom Go function
          ● Typically one task spawned per operation on a set of input files
      ● Information Packet (IP)
          ● Most common data type passed between processes
      [Diagram labels: Workflow, Process, File IP, Task]
  27. 27. “Hello World” in SciPipe

          package main

          import (
              // Import the SciPipe package, aliased to 'sp'
              sp "github.com/scipipe/scipipe"
          )

          func main() {
              // Init workflow with a name, and max concurrent tasks
              wf := sp.NewWorkflow("hello_world", 4)

              // Initialize processes and set output file paths
              hello := wf.NewProc("hello", "echo 'Hello ' > {o:out}")
              hello.SetOut("out", "hello.txt")

              world := wf.NewProc("world", "echo $(cat {i:in}) World >> {o:out}")
              world.SetOut("out", "{i:in|%.txt}_world.txt")

              // Connect network
              world.In("in").From(hello.Out("out"))

              // Run workflow
              wf.Run()
          }
  28. 28. “Hello World” in SciPipe: the steps in the code on the previous slide
      ● Import SciPipe
      ● Set up any default variables or data, handle flags etc.
      ● Initiate workflow
      ● Create processes
      ● Define outputs and paths
      ● Connect outputs to inputs (dependencies / data flow)
      ● Run the workflow
  35. 35. Writing SciPipe workflows

          package main

          // Import SciPipe
          import (
              "github.com/scipipe/scipipe"
          )

          // Set up any default variables or data, handle flags etc.
          const dna = "AAAGCCCGTGGGGGACCTGTTC"

          func main() {
              // Initiate workflow
              wf := scipipe.NewWorkflow("DNA Base Complement Workflow", 4)

              // Create processes, and define outputs and paths
              makeDNA := wf.NewProc("Make DNA", "echo "+dna+" > {o:dna}")
              makeDNA.SetOut("dna", "dna.txt")

              complmt := wf.NewProc("Base Complement", "cat {i:in} | tr ATCG TAGC > {o:compl}")
              complmt.SetOut("compl", "{i:in|%.txt}.compl.txt")

              reverse := wf.NewProc("Reverse", "cat {i:in} | rev > {o:rev}")
              reverse.SetOut("rev", "{i:in|%.txt}.rev.txt")

              // Connect outputs to inputs (dependencies / data flow)
              complmt.In("in").From(makeDNA.Out("dna"))
              reverse.In("in").From(complmt.Out("compl"))

              // Run the workflow
              wf.Run()
          }
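For readers who want to check what the two shell steps in this workflow compute, the sketch below reproduces them in plain Go, without SciPipe. The functions `complement` and `reverse` are illustrative stand-ins for `tr ATCG TAGC` and `rev`, assuming ASCII input; they are not part of the SciPipe API.

```go
package main

import "fmt"

// complement mirrors `tr ATCG TAGC`: each base is mapped to its partner,
// and any other byte passes through unchanged.
func complement(dna string) string {
	pairs := map[byte]byte{'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
	out := make([]byte, len(dna))
	for i := 0; i < len(dna); i++ {
		if c, ok := pairs[dna[i]]; ok {
			out[i] = c
		} else {
			out[i] = dna[i]
		}
	}
	return string(out)
}

// reverse mirrors `rev` for a single line of ASCII input.
func reverse(s string) string {
	b := []byte(s)
	for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
		b[i], b[j] = b[j], b[i]
	}
	return string(b)
}

func main() {
	const dna = "AAAGCCCGTGGGGGACCTGTTC"
	compl := complement(dna)
	fmt.Println(compl)          // output of the complement step: TTTCGGGCACCCCCTGGACAAG
	fmt.Println(reverse(compl)) // output of the reverse step: GAACAGGTCCCCCACGGGCTTT
}
```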
  42. 42. Running it go run revcompl.go
  43. 43. Dependency graph plotting
  44. 44. Structured audit log (Hierarchical JSON)
  45. 45. Turn Audit log into TeX/PDF report TeX template by Jonathan Alvarsson @jonalv
  46. 46. Benefits of SciPipe - Thanks to Go + FBP
      ● Intuitive behaviour: Like conveyor belts & stations in a factory.
      ● Flexible: Combine command-line programs with Go components
      ● Custom file naming: Easy to manually browse output files
      ● Portable: Distribute as Go code or as compiled executable files
      ● Easy to debug: Use any Go debugging tools or even just println()
      ● Powerful audit logging: Stream outputs via UNIX FIFO files
      ● Efficient & Parallel: Fast code + Efficient use of multi-core CPU
  47. 47. More info at: scipipe.org
  48. 48. Thank you for your time! Using Flow-based programming ... to write Tools and Workflows for Scientific Computing Talk at Go Stockholm Conference Oct 6, 2018 Samuel Lampa | bionics.it | @saml (slack) | @smllmp (twitter) Dept. of Pharm. Biosci, Uppsala University | www.farmbio.uu.se | pharmb.io
