Bioinformatic Alchemy 101: Transmuting dark script matter into reusable tools - Ross Lazarus

Reproducibility is a fundamental goal of good experimental science. Despite the increasing availability and deployment of analytic frameworks such as Galaxy, readily reproducible bioinformatic analysis remains difficult to achieve. Mature complex workflows often require small tweaks to accommodate the idiosyncrasies of new datasets, but integrating the required new capabilities into the framework is prohibitively complex and expensive. As a result, when problems are encountered in an existing pipeline, data may be temporarily diverted for manual processing outside the framework. These manual steps typically involve relatively trivial, transient, undocumented and poorly curated programs or scripts. This "dark script matter" rarely reaches the local version control or archiving systems where production code is maintained, which threatens the goal of reproducible analysis. The Galaxy Toolfactory is a Galaxy tool that allows scripts (R, Perl, Python, Bash...) to be run directly and repeatably through the normal Galaxy interface. The Toolfactory optionally generates all the boilerplate code needed for a new Galaxy tool that permanently wraps the script for reuse. Newly generated tools can be uploaded to a local or remote Galaxy Toolshed. Tools can be installed in a running Galaxy server from any Toolshed through the administrative interface for subsequent use in workflows and analyses. The conversion of a trivial script into a working, shareable Galaxy tool will be demonstrated during the presentation.

  1. Bioinformatic Alchemy 101: Transmuting dark script matter into reusable tools
     Ross Lazarus, BakerIDI
  2. Context: bioinformatic analyses
     • Big data; complex analyses
     • Repeatable, automated pipelines
     • Reproducibility real goal
     • Reproducibility is hard
  3. Frameworks
     • E.g. VGL
     • Local SOPs for biologists
     • Tools, canned workflows
     • Minimise opportunities for error
     • Maximise reproducibility
  4. In real life
     • 90/10 rule
     • Need to tweak SOPs
     • Trivial disposable scripts
     • Not documented or curated
     • Not reliably available to re-run
     • “Dark script matter”
  5. Dark Script Matter
     • Outside usual VCS/pipelines
     • Manual ≠ reproducible
     • Necessary evil?
     • Platform extensions complex
     • E.g. Galaxy – hours of work
  6. Plan
     • Context: Reproducible analyses
     • Frameworks vs Dark Scripts
     • Alchemy: script to Galaxy tool
     • Demonstration
     • Summary
     • Conclusions
  7. Galaxy Tool Factory
     • An installable Galaxy tool
     • Runs scripts: Python, R, Perl, sh
     • Generates new Galaxy tools
     • Tool code wraps the script
     • Minutes – not hours
  8. Galaxy Tool Shed
     • Separate server
     • Stores/serves Galaxy tools
     • Admin can install to Galaxy
     • Mercurial VCS archives
     • Explicit tool versioning
     • Sharing and reproducibility
  9. Demo 1: Install the Tool Factory
  10. Demo 2: Create a new tool
  11. Prepare script
      • Python; R; Perl; Sh
      • Parse CL params – 1=in, 2=out
      • Typically workflow transformations
      • Arbitrary complexity
      • Simple example
      • Write transpose of a tabular file
  12. Prepare/upload test data
      • SMALL sample input
      • Becomes functional test case
        h1   h2   h3   h4
        r11  r12  r13  r14
        r21  r22  r23  r24
        r31  r32  r33  r34
  13. # R: transpose a tabular input file and write as
      # a tabular output file
      ourargs = commandArgs(TRUE)   # trailing arguments only
      inf = ourargs[1]              # first argument: input path
      outf = ourargs[2]             # second argument: output path
      inp = read.table(inf, header=FALSE, row.names=NULL, sep="\t")
      outp = t(inp)
      write.table(outp, outf, quote=FALSE, sep="\t", row.names=FALSE, col.names=FALSE)
  14. Demo part 1
      • As an admin, test run the code
      • Can't make a new tool until it works!
      • Admin-only real-time scripting in Galaxy.
      • Overrides ALL other security.
      • Generated tools run with normal security.
  15. Use Redo button; Generate
      • When working right
      • Use Redo to save retyping
      • Select Generate option
      • Provide tool ID, help text
      • Execute
      • Expect a toolfactory.gz in history
      • Copy link (floppy disk icon)
  16. What's in the toolshed.gz?
      • A gzipped Mercurial tool repository (!)
      • Auto-generated tool XML file
      • Auto-generated tool Python wrapper
      • Functional test case - the sample data
      • Familiar Galaxy tool for all users
      • Executes your script over their data
      • Interoperably inside Galaxy
  17. Upload TS gzip to new repository
      • Upload to any tool shed
      • Create new repo; sensible name!
      • Choose Upload files to new repo
      • Paste URL (floppy disk save icon)
      • New tool ready to install
  18. Install and Test New Tool
      • Back to Galaxy admin interface
      • Browse local tool shed
      • Choose new tool
      • Install to local Galaxy
      • Try it out
      • Run functional test
  19. Summary
      • GTF = script to tool in minutes
      • Integrated with Galaxy and TS
      • Simple workflow components
      • If needed, generate simple tool
      • Then add parameters manually
  20. Tool Factory Operation Guide
      • Script (Python, R, perl, sh)
      • Tool Factory: Tool Form; Paste script; Upload/paste sample input for functional test; Test run; Check outputs; Rerun/fix; Generate TS gzip; Copy download link for Tool Shed pasting
      • Tool Shed: Create new repository; Upload files – paste TS gzip link and upload
      • Galaxy: Install new tool from toolshed from Galaxy admin page; Test; Functional test
  21. GALAXY: http://usegalaxy.org
  22. Galaxy Tool Factory
      • Generate a new Galaxy tool
      • From a python, R, Perl or bash script
      • Using a Galaxy tool
      • Via a Tool Shed
      # transpose a tabular input file and write as a tabular output file
      ourargs = commandArgs(TRUE)
      inf = ourargs[1]
      outf = ourargs[2]
      inp = read.table(inf, header=FALSE, row.names=NULL, sep="\t")
      outp = t(inp)
      write.table(outp, outf, quote=FALSE, sep="\t", row.names=FALSE, col.names=FALSE)
  23. Tool Factory Operation Guide
      • Script – R, perl, python
      • Tool Factory: Tool Form; Paste script; Upload/paste sample input for functional test; Test run; Check outputs; Rerun/fix; Generate TS gzip; Copy download link for Tool Shed pasting
      • Tool Shed: Create new repository; Upload files – paste TS gzip link and upload
      • Galaxy: Install new tool from toolshed from Galaxy admin page; Test; Functional test
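
The script contract from slide 11 (first command-line parameter is the input path, second is the output path) and the transpose example from slide 13 can also be sketched in Python, one of the script languages the deck says the Tool Factory runs. This is a minimal sketch, not the author's code: the names `transpose_rows` and `transpose_file` and the file name `transpose.py` are hypothetical, and a rectangular tab-separated input like the slide 12 sample is assumed.

```python
#!/usr/bin/env python3
"""Transpose a tab-separated file (illustrative sketch; names are hypothetical).

Follows the slide 11 convention: argv[1] is the input path, argv[2] the output path.
"""
import csv
import sys


def transpose_rows(rows):
    """Return the transpose of a list of rows.

    Assumes rectangular input: zip() stops at the shortest row, so ragged
    rows would be silently truncated.
    """
    return [list(column) for column in zip(*rows)]


def transpose_file(in_path, out_path):
    """Read a tabular file, transpose it, and write it tab-separated."""
    with open(in_path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(transpose_rows(rows))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        transpose_file(sys.argv[1], sys.argv[2])
    else:
        print("usage: transpose.py INPUT OUTPUT", file=sys.stderr)
```

Run as `python transpose.py in.tsv out.tsv` on the slide 12 sample, the first output row would be `h1 r11 r21 r31` (tab-separated). As in the R version, the header line is treated as an ordinary data row.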
