Bioinformatics alchemy 101: transmuting dark script matter into reusable tools - Ross Lazarus

Reproducibility is a fundamental goal of good experimental science. Despite the increasing availability and deployment of analytic frameworks such as Galaxy, readily reproducible bioinformatic analysis remains difficult to achieve. Mature complex workflows often require small tweaks to accommodate the idiosyncrasies of new datasets, but integrating the required new capabilities into the framework is prohibitively complex and expensive. As a result, when problems are encountered in an existing pipeline, data may be temporarily diverted for manual processing outside the framework. These manual steps typically involve relatively trivial, transient, undocumented and poorly curated programs or scripts. This "dark script matter" rarely reaches the local version control or archiving systems where production code is maintained, which threatens the goal of reproducible analysis. The Galaxy Toolfactory is a Galaxy tool that allows scripts (R, Perl, Python, Bash...) to be run directly and repeatably through the normal Galaxy interface. The Toolfactory optionally generates all the boilerplate code needed for a new Galaxy tool that permanently wraps the script for reuse. Newly generated tools can be uploaded to a local or remote Galaxy Toolshed. Tools can be installed in a running Galaxy server from any Toolshed through the administrative interface for subsequent use in workflows and analyses. The conversion of a trivial script into a working, shareable Galaxy tool will be demonstrated during the presentation.

Presentation Transcript

  • Bioinformatic Alchemy 101: Transmuting dark script matter into reusable tools. Ross Lazarus, BakerIDI.
  • Context: bioinformatic analyses. Big data; complex analyses. Repeatable, automated pipelines. Reproducibility real goal. Reproducibility is hard.
  • Frameworks. E.g. VGL. Local SOPs for biologists. Tools, canned workflows. Minimise opportunities for error. Maximise reproducibility.
  • In real life. 90/10 rule. Need to tweak SOPs. Trivial disposable scripts. Not documented or curated. Not reliably available to re-run. “Dark script matter”.
  • Dark Script Matter. Outside usual VCS/pipelines. Manual ≠ reproducible. Necessary evil? Platform extensions complex. E.g. Galaxy – hours of work.
  • Plan. Context: reproducible analyses. Frameworks vs dark scripts. Alchemy: script to Galaxy tool. Demonstration. Summary. Conclusions.
  • Galaxy Tool Factory. An installable Galaxy tool. Runs scripts: Python, R, Perl, sh. Generates new Galaxy tools. Tool code wraps the script. Minutes – not hours.
  • Galaxy Tool Shed. Separate server. Stores/serves Galaxy tools. Admin can install to Galaxy. Mercurial VCS archives. Explicit tool versioning. Sharing and reproducibility.
  • Demo 1: Install the Tool Factory
  • Demo 2: Create a new tool
  • Prepare script Python; R; Perl; Sh Parse CL params – 1=in, 2=out Typically workflow transformations Arbitrary complexity Simple example Write transpose of a tabular file 11
  • Prepare/upload test data SMALL sample input Becomes functional test case h1 h2 h3 h4 r11 r12 r13 r14 r21 r22 r23 r24 r31 r32 r33 r34 12
  • R script:
    # R transpose a tabular input file and write as
    # a tabular output file
    ourargs = commandArgs(TRUE)
    inf = ourargs[1]
    outf = ourargs[2]
    inp = read.table(inf, head=F, row.names=NULL, sep="\t")
    outp = t(inp)
    write.table(outp, outf, quote=FALSE, sep="\t", row.names=F, col.names=FALSE)
  • Demo part 1. As an admin, test run the code. Can't make a new tool until it works! Admin-only real-time scripting in Galaxy. Overrides ALL other security. Generated tools run with normal security.
  • Use Redo button; Generate. When working right, use Redo to save retyping. Select Generate option. Provide tool ID, help text. Execute. Expect a toolfactory.gz in history. Copy link (floppy disk icon).
  • What's in the toolshed.gz? A gzipped Mercurial tool repository (!). Auto-generated tool XML file. Auto-generated tool Python wrapper. Functional test case - the sample data. Familiar Galaxy tool for all users. Executes your script over their data. Interoperably inside Galaxy.
  • Upload TS gzip to new repository. Upload to any tool shed. Create new repo; sensible name! Choose Upload files to new repo. Paste URL (floppy disk save icon). New tool ready to install.
  • Install and Test New Tool. Back to Galaxy admin interface. Browse local tool shed. Choose new tool. Install to local Galaxy. Try it out. Run functional test.
  • Summary. GTF = script to tool in minutes. Integrated with Galaxy and TS. Simple workflow components. If needed, generate simple tool. Then add parameters manually.
  • Tool Factory Operation Guide. Script (Python, R, Perl, sh). Sample input for functional test. Tool Factory: tool form; paste script; upload/paste sample input; test run; check outputs; rerun/fix; generate TS gzip; copy download link for Tool Shed pasting. Create new repository; upload files – paste TS gzip link and upload. Galaxy: install new tool from toolshed from Galaxy admin page; test; functional test.
  • GALAXY: http://usegalaxy.org
  • Galaxy Tool Factory. Generate a new Galaxy tool from a Python, R, Perl or bash script, using a Galaxy tool, via a Tool Shed.
    # transpose a tabular input file and write as a tabular output file
    ourargs = commandArgs(T)
    inf = ourargs[1]
    outf = ourargs[2]
    inp = read.table(inf, head=F, row.names=NULL, sep="\t")
    outp = t(inp)
    write.table(outp, outf, quote=F, sep="\t", row.names=F, col.names=F)
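
The transpose example in the slides is written in R, but the slides also list Python among the script languages the Tool Factory runs. A minimal Python sketch of the same idea might look like the following. It follows the convention stated on the "Prepare script" slide (first command-line parameter is the input path, second is the output path) and assumes a rectangular, tab-separated input. The function name `transpose_tabular` and the file name `transpose.py` are purely illustrative and do not come from the presentation.

```python
import csv
import sys


def transpose_tabular(in_path, out_path):
    """Read a tab-separated file and write its transpose, also tab-separated."""
    with open(in_path, newline="") as src:
        rows = list(csv.reader(src, delimiter="\t"))
    # zip(*rows) turns rows into columns. This assumes every row has the same
    # number of fields; zip would otherwise truncate to the shortest row.
    transposed = zip(*rows)
    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerows(transposed)


# Same convention as the slides: parameter 1 = input, parameter 2 = output.
# Example (hypothetical file names): python transpose.py in.tab out.tab
if __name__ == "__main__" and len(sys.argv) == 3:
    transpose_tabular(sys.argv[1], sys.argv[2])
```

Run on the 4 x 4 sample table from the test-data slide, the first output row would be `h1 r11 r21 r31`, that is, the first input column written as a row.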