The Galaxy toolshed
This presentation provides some technical details on the function of the Galaxy toolshed. It was prepared for a group (Biobix at UGent), during my previous job.

Usage rights: CC Attribution-ShareAlike License

Presentation transcript

  • The Galaxy ToolShed The software repository of Galaxy
  • Galaxy is an interface to a system (diagram: the Galaxy GUI on top of the system's CPU, storage, binaries and libraries)
  • Toolshed: software repository ● In the long term: start from an empty Galaxy and, at installation, install only the wanted tools, on a per-user basis.
  • Toolshed: get your own?... ● Code is part of main distribution ● ./run_tool_shed.sh ● Very easy to have it run locally...
  • Code is shared through hg (diagram: the Galaxy main code on Bitbucket is pulled with hg onto the Galaxy server and the Toolshed)
  • Toolshed: run your own? ● The Toolshed is a completely separate process from Galaxy ● It uses its own PostgreSQL database: you need to create a new user account ● The Toolshed's files need to be stored separately, next to the Galaxy root
  • Sharing a tool is basically simple. All you have to share is (if it's a simple script): tool_conf.xml and tool.pl. This can be distributed using the ToolShed. Dependencies have to be installed separately.
  • Sharing through the toolshed: Galaxy is moving to installing everything through the Tool Shed; see shed_tool_conf.xml:
    <?xml version="1.0"?>
    <toolbox tool_path="/shed_tools">
      <section id="textutil" name="Text Manipulation" version="">
        <tool file="/shed_tools/toolshed.g2.bx.psu.edu/repos/bjoern-gruening/sed_wrapper/e850a63e5aed/sed_wrapper/sed.xml" guid="toolshed.g2.bx.psu.edu/repos/bjoern-gruening/sed_wrapper/sed_stream_editor/0.0.1">
          <tool_shed>toolshed.g2.bx.psu.edu</tool_shed>
          <repository_name>sed_wrapper</repository_name>
          <repository_owner>bjoern-gruening</repository_owner>
          <installed_changeset_revision>e850a63e5aed</installed_changeset_revision>
          <id>toolshed.g2.bx.psu.edu/repos/bjoern-gruening/sed_wrapper/sed_stream_editor/0.0.1</id>
          <version>0.0.1</version>
        </tool>
      </section>
    </toolbox>
  • Tasks of the toolshed ● Communicate with any Galaxy that wants to install a tool from it (the Galaxy admin who accepts the tool needs to add your Toolshed) ● Periodically run functional tests on the tools ● Allow people to update the tools ● Co-develop tools
  • Philosophy: the task of a tool. Some functionality is encoded redundantly in tools. An example is visualising data: some tools call R, some call gnuplot. I really think that the preferred output of Galaxy should be text. A versatile, powerful visualisation tool can then draw graphs as needed from that output. (PNG, PDF and other visual formats are supported.) BTW: the 2 different repository types comply with this view,
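A small sketch of how the entries in shed_tool_conf.xml can be read back, e.g. to list which Toolshed repositories are installed and at which changeset. It assumes the guid always ends in <owner>/<repository>/<tool id>/<version>, as in the example above; `installed_tools` is a hypothetical helper, not part of Galaxy.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the shed_tool_conf.xml example above.
SHED_TOOL_CONF = """<?xml version="1.0"?>
<toolbox tool_path="/shed_tools">
  <section id="textutil" name="Text Manipulation" version="">
    <tool file="/shed_tools/toolshed.g2.bx.psu.edu/repos/bjoern-gruening/sed_wrapper/e850a63e5aed/sed_wrapper/sed.xml"
          guid="toolshed.g2.bx.psu.edu/repos/bjoern-gruening/sed_wrapper/sed_stream_editor/0.0.1">
      <tool_shed>toolshed.g2.bx.psu.edu</tool_shed>
      <repository_name>sed_wrapper</repository_name>
      <repository_owner>bjoern-gruening</repository_owner>
      <installed_changeset_revision>e850a63e5aed</installed_changeset_revision>
      <version>0.0.1</version>
    </tool>
  </section>
</toolbox>"""


def installed_tools(xml_text):
    """Return one dict per <tool> in a shed_tool_conf.xml document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    tools = []
    for section in root.iter("section"):
        for tool in section.findall("tool"):
            # Assumed guid layout: <shed>/repos/<owner>/<repository>/<tool id>/<version>
            owner, repo, tool_id, version = tool.get("guid").split("/")[-4:]
            tools.append({
                "section": section.get("name"),
                "repository": f"{owner}/{repo}",
                "tool_id": tool_id,
                "version": version,
                "revision": tool.findtext("installed_changeset_revision").strip(),
            })
    return tools


for t in installed_tools(SHED_TOOL_CONF):
    print(t["section"], t["repository"], t["tool_id"], t["version"], t["revision"])
```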
  • My original aim... (diagram: a production Galaxy, a test Galaxy, developers Dev 1, Dev 2 and Dev 3, a Tool Shed (BITS?), and updates from the official galaxy-dist)
  • 'Official' advice ● Run Galaxy and a Toolshed locally ● Develop your tool in your local Galaxy ● If everything runs, wrap it up as a .tar ● Upload everything to the Toolshed of your choice ● Test the download from the Toolshed in a test Galaxy ● Debug... Do not use the Toolshed as a development environment.
  • All code is shared through hg (diagram: the Galaxy main code on Bitbucket is pulled with hg onto the Galaxy server; the Toolshed server holds one hg repository per tool, e.g. FancyTool, SuperTool, PowerTool, made from your uploaded .tar balls)
  • Code is shared through hg
  • Tips ● To test installation: empty your local toolbox
  • What is Mercurial? A version/source control system. Without Mercurial: you keep continuously adding changes.
  • What is Mercurial? With Mercurial: you fix certain states of your files (“commits”).
  • What is Mercurial? 1. Keep track of the changes YOU make to your files, scripts, folders,...
    joachim@joachim-laptop:~/Projects/hgprojects$ hg log
    changeset:   2:726fa53bcd7d
    tag:         tip
    user:        Joachim Jacob <joachim.jacob@gmail.com>
    date:        Fri Nov 16 11:24:09 2012 +0100
    summary:     Third change, playing with copy and remove
    changeset:   1:744894cb4ee6
    user:        Joachim Jacob <joachim.jacob@gmail.com>
    date:        Fri Nov 16 11:09:49 2012 +0100
    summary:     I have added a small change to hello.txt
    changeset:   0:b84e0105967f
    user:        Joachim Jacob <joachim.jacob@gmail.com>
    date:        Fri Nov 16 11:08:01 2012 +0100
  • What is Mercurial? You can go back to a previous revision (e.g. hg update 2) and make changes to the files, which creates multiple “heads”.
  • What is Mercurial? You can go back to a previous revision and make changes to the files:
    joachim@joachim-laptop:~/Projects/hgprojects$ hg update 1
    1 files updated, 0 files merged, 3 files removed, 0 files unresolved
    joachim@joachim-laptop:~/Projects/hgprojects$ nano hello.txt
    joachim@joachim-laptop:~/Projects/hgprojects$ hg commit -m "Bug fix"
    created new head
    joachim@joachim-laptop:~/Projects/hgprojects$ hg summary
    parent: 3:2d1d80bd0124 tip
     Bug fix
    branch: default
    commit: (clean)
    update: 1 new changesets, 2 branch heads (merge)
  • What is Mercurial? When you are done with a change, you can merge the heads back together into one tip (“merge”):
    joachim@joachim-laptop:~/Projects/hgprojects$ hg merge
    merging hello.txt and another.txt to another.txt
    merging hello.txt and mvtest.txt to mvtest.txt
    1 files updated, 2 files merged, 0 files removed, 0 files unresolved
    (branch merge, don't forget to commit)
  • What is Mercurial? After the merge, commit to make it permanent (“commit”):
    joachim@joachim-laptop:~/Projects/hgprojects$ hg commit -m 'Commit the bug fix permanently'
    In case of conflicts, use 'hg resolve --list' to view the conflicting files, and fix them by hand.
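The notion of “heads” in the session above can be modelled in a few lines: a head is a changeset that no other changeset has as a parent. This is only a toy model of the history graph in plain Python, using the local revision numbers from the slides; it is not how Mercurial stores history.

```python
def heads(parents):
    """Return the revisions that are not a parent of any other revision."""
    has_child = {p for ps in parents.values() for p in ps}
    return sorted(rev for rev in parents if rev not in has_child)


# Local revision number -> list of parent revisions.
history = {0: [], 1: [0], 2: [1]}
print(heads(history))      # a single head: [2]

history[3] = [1]           # hg update 1, edit hello.txt, hg commit -m "Bug fix"
print(heads(history))      # "created new head": [2, 3]

history[4] = [3, 2]        # hg merge + hg commit: a merge changeset has two parents
print(heads(history))      # back to a single head: [4]
```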
  • What is Mercurial? 1. Keep track of the changes YOU make to your files, scripts, folders,... 2. Clone your working directory to a new directory (e.g. to work on another feature) (“clone”).
  • What is Mercurial? You can compare two different repositories with hg incoming (“incoming”). To bring the changes into your repository, use hg pull (“pull”), and then combine them with your own work using hg merge (“merge”).
  • What is Mercurial? Use hg commit to make the change permanent! (“commit”)
  • What is Mercurial? So, in your directory, either you change/add files yourself, or Mercurial does this for you (during a merge; undo with 'rollback'). Both need to be followed by a commit.
  • What is Mercurial? 1. Keep track of the changes YOU make to your files, scripts, folders,... 2. Clone your working directory to a new directory (e.g. to work on another feature). 3. Share changes with other users.
  • Sharing in Mercurial: the repositories might be located
    - in local directories: hg pull /path/to/directory
    - on your intranet (hg serve): hg pull http://10.10.10.100:8000
    - on the internet: hg clone http://joachim@toolshed.bits.vib.be/repos/joachim/clcaligner
    You can also export a commit, send it by email, and import it. You can also set up a push repository online on Bitbucket.
  • What is Mercurial? Guides: http://mercurial.selenic.com/guide/ and http://hginit.com/
  • Galaxy Toolshed Galaxy Toolshed contains a bunch of Mercurial repositories you can clone
  • Getting ready for Galaxy development. How I develop for Galaxy (diagram: template → set tool name → Toolshed upload → hg clone into the dev Galaxy → hg push back to the Toolshed)
  • Getting ready for Galaxy development. And the last step (diagram: template → set tool name → Toolshed upload → hg clone into the dev Galaxy → hg push back to the Toolshed → Galaxy.bits.vib.be)
  • Getting ready for Galaxy development. How I develop for Galaxy:
    - you need a personal Galaxy (hg clone …)
    - you might use a Toolshed repository
    1. Get a template: a tarball with some files:
    ● README ● tool_data_table_conf.xml.sample ● tool_dependencies.xml ● tool_indices.loc.sample ● tool_wrapper_template.pl ● tool_wrapper.xml
  • Getting ready for Galaxy development. 2. Rename the files: replace 'tool' with your tool name.
    [galaxy@joagal razers]$ ls
    razers3_wrapper.xml  README  tool_data_table_conf.xml.sample  tool_indices.loc.sample  tool_wrapper_template.pl
  • Getting ready for Galaxy development. 3. Edit the wrapper XML file: the <tool> section.
  • Getting ready for Galaxy development. 4. Pack everything into a tarball again and upload it to the test Toolshed, in a new repository.
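Step 2 can also be scripted. A minimal sketch, assuming a flat template directory like the one listed above, and assuming that tool_data_table_conf.xml.sample and tool_dependencies.xml must keep their names because the Toolshed looks for them literally. `rename_template` is a hypothetical helper; unlike the manual rename in the listing, it renames every other tool_* file.

```python
import tempfile
from pathlib import Path

# Assumed: these names are looked up literally, so they are never renamed.
RESERVED = {"tool_data_table_conf.xml.sample", "tool_dependencies.xml"}


def rename_template(directory, tool_name):
    """Replace the leading 'tool' in template file names with tool_name.

    Returns the sorted file names in the directory afterwards.
    """
    folder = Path(directory)
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.name.startswith("tool_") and path.name not in RESERVED:
            path.rename(path.with_name(tool_name + path.name[len("tool"):]))
    return sorted(p.name for p in folder.iterdir())


if __name__ == "__main__":
    template = ["README", "tool_data_table_conf.xml.sample", "tool_dependencies.xml",
                "tool_indices.loc.sample", "tool_wrapper_template.pl", "tool_wrapper.xml"]
    with tempfile.TemporaryDirectory() as d:
        for name in template:
            (Path(d) / name).touch()
        print(rename_template(d, "razers3"))
```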
  • Getting ready for Galaxy development. 5. hg clone your repository to a folder in your development Galaxy.
    [galaxy@joagal GalaxyHangar]$ hg clone http://joachim@192.168.10.23:9009/repos/joachim/fastqseqlen
    destination directory: fastqseqlen
    requesting all changes
    adding changesets
    adding manifests
    adding file changes
    added 1 changesets with 2 changes to 2 files
    updating to branch default
    resolving manifests
    getting README
    getting fastqseqlen.xml
    2 files updated, 0 files merged, 0 files removed, 0 files unresolved
  • Getting ready for Galaxy development. 5. (continued)
    [galaxy@joagal GalaxyHangar]$ cd fastqseqlen/
    [galaxy@joagal fastqseqlen]$ ls
    fastqseqlen.xml  README
    [galaxy@joagal fastqseqlen]$ hg summary
    parent: 0:3f22736718ef tip
     Uploaded files
    branch: default
    commit: (clean)
    update: (current)
  • Getting ready for Galaxy development. 6. Link the complete directory to a directory under $GALAXY_HOME/tools/ and make Galaxy aware of it by adding it to tool_conf.xml.
  • Getting ready for Galaxy development. 7. (Re)start your Galaxy:
    $ ./run.sh --reload
    and check whether the tool loads.
  • Getting ready for Galaxy development. 8. Get your tool's parameter display straight: fill in the rest of the tool's XML file. Also add the .loc file (which contains your reference data) if needed. (When you modify the XML, you have to restart Galaxy to see the changes: kill Galaxy and run ./run.sh --reload again.)
  • Starting Galaxy tools development. 9. Fun! Start developing your tool. Development happens in the development Galaxy, committing changes from time to time (optionally pushing them to the Toolshed):
    $ hg commit -m "Alpha version of RazerS3 wrapper"
    $ hg push --debug
    $ hg commit -m "Some small changes"
    $ hg push --debug
  • Starting Galaxy tools development. Mercurial credentials should be stored in ~/.hgrc (Mercurial.ini on Windows):
    [ui]
    username = "joachim <joachim.jacob@vib.be>"
    verbose = True
    [extensions]
    hgext.graphlog =
    [auth]
    bb.prefix = http://192.168.10.26:9009/repos/joachim/razers
    bb.username = joachim
    bb.password = ********
  • When development is ready: push the last changes to the Galaxy test Toolshed, export the repository from the Galaxy test Toolshed and import it into the BITS Toolshed, then install the tool in Galaxy.bits.vib.be.
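Making Galaxy aware of the tool means adding a <tool file="..."/> line to tool_conf.xml. A sketch with Python's xml.etree, assuming the tool directory was linked as tools/fastqseqlen (paths in tool_conf.xml are relative to tools/). `add_tool` and the "mytools" section are hypothetical. ElementTree drops comments and the XML declaration when it writes the tree back, so for a heavily commented tool_conf.xml editing by hand may be preferable.

```python
import xml.etree.ElementTree as ET

TOOL_CONF = """<?xml version="1.0"?>
<toolbox>
  <section name="FASTA manipulation" id="fasta_manipulation">
    <tool file="fasta_tools/fasta_compute_length.xml" />
  </section>
</toolbox>"""


def add_tool(tool_conf_xml, section_id, section_name, tool_file):
    """Return tool_conf.xml text with <tool file="tool_file"/> in the given section.

    The section is created when missing; an existing entry is not duplicated.
    """
    root = ET.fromstring(tool_conf_xml)
    section = root.find(f"section[@id='{section_id}']")
    if section is None:
        section = ET.SubElement(root, "section", name=section_name, id=section_id)
    if all(tool.get("file") != tool_file for tool in section.findall("tool")):
        ET.SubElement(section, "tool", file=tool_file)
    return ET.tostring(root, encoding="unicode")


print(add_tool(TOOL_CONF, "mytools", "My tools", "fastqseqlen/fastqseqlen.xml"))
```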
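The [auth] entries are grouped by the part before the dot (here `bb`), and Mercurial uses the group with the longest prefix that matches the URL being pushed to. A simplified sketch of that lookup with Python's configparser, assuming full-URL prefixes only (Mercurial's own matching also accepts prefixes without a scheme and a `schemes` setting, both ignored here); `auth_for` is hypothetical.

```python
import configparser

# Same layout as the ~/.hgrc above; the password is a placeholder.
HGRC = """
[ui]
username = joachim <joachim.jacob@vib.be>
verbose = True

[extensions]
hgext.graphlog =

[auth]
bb.prefix = http://192.168.10.26:9009/repos/joachim/razers
bb.username = joachim
bb.password = ********
"""


def auth_for(url, hgrc_text):
    """Return (username, password) for the [auth] group with the longest matching prefix."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(hgrc_text)
    groups = {}
    for key, value in cfg.items("auth"):
        group, _, field = key.partition(".")   # "bb.prefix" -> ("bb", "prefix")
        groups.setdefault(group, {})[field] = value
    matches = [g for g in groups.values() if "prefix" in g and url.startswith(g["prefix"])]
    if not matches:
        return None
    best = max(matches, key=lambda g: len(g["prefix"]))
    return best.get("username"), best.get("password")


print(auth_for("http://192.168.10.26:9009/repos/joachim/razers", HGRC))
```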
  • Galaxy manages scripts (tools)
    1. Galaxy knows the location of tools, as this is set in one or more XML files.
    2. The tool referenced by an XML file can be
    - a script that does all calculations by itself (e.g. bash script, Python script,...)
    - a script that does calculations by using 3rd-party libraries (e.g. R)
    - a script that does calculations by calling a 3rd-party binary
  • 4 different XML files
    ● integrated_tool_panel.xml: layout of the tool panel
    ● shed_tool_conf.xml: tools from the Toolshed
    ● tool_conf.xml: tools from the installation, or your own
    ● migrated_tools_conf.xml: tools removed from tool_conf.xml upon updating
    Note: these XML files were only introduced with the latest update!
  • Galaxy installation directory
    ● Galaxy is installed as the user galaxy in /home/galaxy/galaxy-dist
    ● Installation and version control of this directory are done by Mercurial (configuration in the .hg directory; the .hgignore file lists files Mercurial should ignore)
    ● Installation for production required some changes: a PostgreSQL database, Apache serving static content, network settings, running Galaxy as a daemon in the background
    http://wiki.g2.bx.psu.edu/Admin/Get%20Galaxy
  • Galaxy installation directory
    ● Galaxy is installed on Linux as the user galaxy in /home/galaxy/galaxy-dist
    ● Important locations under this directory:
    - universe_wsgi.ini → general configuration file
    - *.xml → 'embedding' of tools and datatypes
    - tools/ → location of the scripts
    - database/ → location of the datasets
    http://wiki.g2.bx.psu.edu/Admin/Get%20Galaxy
  • integrated_tool_panel.xml
    <toolbox>
      <section id="fasta_manipulation" name="FASTA manipulation" version="">
        <tool id="fasta_compute_length" />
        <tool id="fasta_filter_by_length" />
        <tool id="fasta_concatenate0" />
        <tool id="fasta2tab" />
        <tool id="tab2fasta" />
        <tool id="cshl_fasta_formatter" />
        <tool id="cshl_fasta_nucleotides_changer" />
        <tool id="cshl_fastx_collapser" />
      </section>
    </toolbox>
    Galaxy assembles this file from shed_tool_conf.xml and tool_conf.xml. The id of a tool refers to the id value in the other *.xml files. Edit it ONLY to change a tool's position in the tool panel.
  • tool_conf.xml
    <?xml version="1.0"?>
    <toolbox>
      <section name="FASTA manipulation" id="fasta_manipulation">
        <tool file="fasta_tools/fasta_compute_length.xml" />
        <tool file="fasta_tools/fasta_filter_by_length.xml" />
        <tool file="fasta_tools/fasta_concatenate_by_species.xml" />
        <tool file="fasta_tools/fasta_to_tabular.xml" />
        <tool file="fasta_tools/tabular_to_fasta.xml" />
        <tool file="fastx_toolkit/fasta_formatter.xml" />
        <tool file="fastx_toolkit/fasta_nucleotide_changer.xml" />
        <tool file="fastx_toolkit/fastx_collapser.xml" />
      </section>
    </toolbox>
    Developers edit this file to add new tools: each entry points to the location of the tool's XML file, relative to the tools directory (tools/, as set in universe_wsgi.ini).
  • tool.xml, the tool definition file
    <tool id="fasta_compute_length" name="Compute sequence length">
      <description></description>
      <command interpreter="python">
        fasta_compute_length.py $input $output $keep_first
      </command>
      <inputs>
        <param name="input" type="data" format="fasta" label="Compute length for these sequences"/>
        <param name="keep_first" type="integer" size="5" value="0" label="How many title characters to keep?" help="'0' = keep the whole thing"/>
      </inputs>
      <outputs>
        <data name="output" format="tabular"/>
      </outputs>
      <tests/>
      <help/>
    </tool>
    Every tool has an XML file that points to the script, builds the interface and passes the parameters to the tool.
  • The tool interface is built from the XML
  • The tool XML points to a script, ./tools/fasta_tools/fasta_compute_length.py:
    #!/usr/bin/env python
    """
    Uses fasta_to_len converter code.
    """
    import sys
    from galaxy.datatypes.converters.fasta_to_len import compute_fasta_length
    compute_fasta_length( sys.argv[1], sys.argv[2], sys.argv[3] )
    In this case the computation takes place in Python itself. Sometimes, however, 3rd-party libraries need to be installed.
  • The tool XML points to a binary (start of bwa_wrapper.py):
    #!/usr/bin/env python
    """
    Runs BWA on single-end or paired-end data.
    Produces a SAM file containing the mappings.
    Works with BWA version 0.5.9.
    usage: bwa_wrapper.py [options]
    See below for options
    """
    import optparse, os, shutil, subprocess, sys, tempfile
    def stop_err( msg ):
        sys.stderr.write( '%s\n' % msg )
        sys.exit()
    def check_is_double_encoded( fastq ):
        # check that first read is bases, not one base followed by numbers
        bases = [ 'A', 'C', 'G', 'T', 'a', 'c', 'g', 't', 'N' ]
        nums = [ '0', '1', '2', '3' ]
        for line in file( fastq, 'rb'):
            if not line.strip() or line.startswith( '@' ):
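Roughly speaking, Galaxy fills the $-placeholders in <command> with the values from the form (and with file paths for datasets) and then runs the result. Galaxy uses Cheetah templates for this; the sketch below substitutes Python's string.Template, which only covers plain $name placeholders, and the dataset paths in the usage are hypothetical.

```python
import xml.etree.ElementTree as ET
from string import Template

# Trimmed copy of the tool XML above.
TOOL_XML = """<tool id="fasta_compute_length" name="Compute sequence length">
  <command interpreter="python">
    fasta_compute_length.py $input $output $keep_first
  </command>
  <inputs>
    <param name="input" type="data" format="fasta" label="Compute length for these sequences"/>
    <param name="keep_first" type="integer" size="5" value="0" label="How many title characters to keep?"/>
  </inputs>
  <outputs>
    <data name="output" format="tabular"/>
  </outputs>
</tool>"""


def build_command(tool_xml, values):
    """Fill in the <command> template; unset parameters fall back to their value= default."""
    root = ET.fromstring(tool_xml)
    params = {p.get("name"): p.get("value") for p in root.iter("param")}
    params.update(values)
    command = root.find("command")
    line = Template(" ".join(command.text.split())).substitute(params)
    interpreter = command.get("interpreter")
    return f"{interpreter} {line}" if interpreter else line


print(build_command(TOOL_XML, {
    "input": "/galaxy/database/files/000/dataset_1.dat",    # hypothetical paths
    "output": "/galaxy/database/files/000/dataset_2.dat",
}))
```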
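What the imported compute_fasta_length does comes down to a single pass over the FASTA file. The sketch below is a hypothetical stand-alone re-implementation, not Galaxy's code: it writes title<TAB>length per sequence and treats keep_first the way the tool's help describes it, with 0 keeping the whole title.

```python
import sys


def fasta_lengths(lines, keep_first=0):
    """Yield (title, sequence length) for each record in an iterable of FASTA lines.

    keep_first > 0 truncates the title to that many characters; 0 keeps the whole title.
    """
    title, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if title is not None:
                yield title, length
            title = line[1:][:keep_first] if keep_first > 0 else line[1:]
            length = 0
        else:
            length += len(line)
    if title is not None:
        yield title, length


def compute_lengths(in_path, out_path, keep_first=0):
    """Write one 'title<TAB>length' line per sequence (hypothetical stand-in)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for title, length in fasta_lengths(src, keep_first):
            dst.write(f"{title}\t{length}\n")


if __name__ == "__main__" and len(sys.argv) == 4:
    # Same argument order as in the <command>: $input $output $keep_first
    compute_lengths(sys.argv[1], sys.argv[2], int(sys.argv[3]))
```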
  • Options for building interfaces. An overview of the tags is at http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Config%20Syntax. The parameters that construct the interface are placed within <inputs> </inputs> tags. The tags you use in the <inputs> section define much of the syntax to use in other tag sets, such as <outputs> and <command>. Basic use:
    <param name="[param_name]" type="text" value="default" label="Explanation of the parameter" help="help"/>
  • Select a dataset from the history. If the type of the input is "data", a dropdown list of history items appears. The accepted format should be included as format="format".
    <param name="input" type="data" format="tabular" label="Dataset"/>
  • Choose from a list
    <param name="detection_thresh" type="select" multiple="true" label="Detection thresholds">
      <option value="0.001">0.001</option>
      <option value="0.002">0.002</option>
      <option value="0.003">0.003</option>
      <option value="0.004">0.004</option>
    </param>
  • Select reference data
    <param name="indices" type="select" label="Select a reference genome">
      <options from_data_table="bwa_indexes">
        <filter type="sort_by" column="2" />
        <validator type="no_options" message="No indexes are available" />
      </options>
      <!-- is not option -->
    </param>
    For some tools, indexed data can be made available (e.g. BLAST, NGS mappers, …). Indexed sets are referenced through tool_data_table_conf.xml, whose entries point to ./tool-data/<toolname>.loc files.
  • Select reference data
    ./tool_data_table_conf.xml:
    <table name="bwa_indexes" comment_char="#">
      <columns>value, dbkey, name, path</columns>
      <file path="tool-data/bwa_index.loc" />
    </table>
    ./tool-data/<toolname>.loc (columns value, dbkey, name, path; tab-separated in the real file):
    hg19_chr21  hg19  Human chrom 21 bld 37 (hg19)  /mnt/genomes/hg19_chrom21/bwa/base/build37_chr21.fa
    hg18        hg18  Human genome bld 36 (hg18)    /mnt/genomes/hg18/bwa/base/build36.fa
    hg19        hg19  Human genome bld 37 (hg19)    /mnt/genomes/hg19/bwa/base/build37.fa
    The reference data is on a disk mounted on /mnt/genomes.
  • Select reference data. The reference data is on a disk mounted on /mnt/genomes:
    /mnt/genomes/ (800GB)
    |-- hg18
    |   |-- bfast
    |   |-- bowtie
    |   |-- bwa
    |-- hg19
    |   |-- bfast
    |   |-- bowtie
    |   |-- bwa
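A sketch of how the .loc file maps onto the data table: each non-comment line is split on tabs and zipped with the <columns> list. It also shows the order the dropdown would use with <filter type="sort_by" column="2" />, assuming columns are counted from 0 (so column 2 is name). `read_loc` is hypothetical, not Galaxy's parser, and the comment line in the sample is illustrative.

```python
def read_loc(lines, columns, comment_char="#"):
    """Parse .loc lines into dicts keyed by the data table's column names."""
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith(comment_char):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} tab-separated fields in {line!r}")
        rows.append(dict(zip(columns, fields)))
    return rows


# Columns as declared in <columns>value, dbkey, name, path</columns>.
COLUMNS = [c.strip() for c in "value, dbkey, name, path".split(",")]
LOC_LINES = [
    "# value\tdbkey\tname\tpath",
    "hg19_chr21\thg19\tHuman chrom 21 bld 37 (hg19)\t/mnt/genomes/hg19_chrom21/bwa/base/build37_chr21.fa",
    "hg18\thg18\tHuman genome bld 36 (hg18)\t/mnt/genomes/hg18/bwa/base/build36.fa",
    "hg19\thg19\tHuman genome bld 37 (hg19)\t/mnt/genomes/hg19/bwa/base/build37.fa",
]

rows = read_loc(LOC_LINES, COLUMNS)
sort_column = COLUMNS[2]  # column="2" in the <filter>, counted from 0 (assumption)
for row in sorted(rows, key=lambda r: r[sort_column]):
    print(row["name"], "->", row["path"])
```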
  • Other useful input: conditional
    <conditional name="[conditional_name]">
      <param name="[select_name]" type="select" … >
        <option value="no">No</option>
        <option value="yes">Yes</option>
      </param>
      <when value="no">
        <param name="[param_name]" … /> <!-- e.g. ask for input -->
      </when>
      <when value="yes" />
    </conditional>
  • Other useful input: conditional
  • Output section. It is easiest if your script accepts the name of the file to write its results to: Galaxy then passes the actual output file names to your program.
    <outputs>
      <data format="fasta" name="trim_fasta" label="${tool.name} on ${on_string} seq"/>
    </outputs>
    <command … >
      myscript.pl -i $input -o $trim_fasta
    </command>
    Important: set the format to the correct type! Optional output files can be handled with the <conditional> tag set, linked to a <filter> tag in the outputs section.
  • How to integrate a tool? You have: a script that accepts parameters and writes the results to a text file. TODO:
    1. Put your script in ~/galaxy-dist/tools/mytools/
    2. In that directory, create a mytool.xml file pointing to that script, with all tag sets set correctly.
    3. In ~/galaxy-dist/tool_conf.xml, add a line with your tool XML file.
    4. Restart Galaxy: # service galaxyd restart
    (4'. Optional: change the location of your tool in integrated_tool_panel.xml and restart again.)
    5. There's the magic. Enjoy your tool!
  • Wrapping binaries. Things get a bit more difficult with wrapper scripts: scripts that drive a third-party binary, which needs to be available on the system. I have installed 3rd-party binaries in /opt. (In one case, I found myself writing a Python script to drive a 3rd-party bash script that consecutively executed a Java binary and an R command to generate a PDF document. The correct implementation: execute the Java binary and generate text; let the visualisation tools in Galaxy generate the graphs.)
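The same pattern as `myscript.pl -i $input -o $trim_fasta`, sketched in Python with argparse: the script never chooses file names itself, it only writes to the path Galaxy passes with -o. The per-line trimming and the -l option are hypothetical placeholders for the real work.

```python
import argparse
import sys


def trim_lines(src, dst, length):
    """Placeholder work: keep FASTA header lines, cut other lines to `length` characters."""
    for line in src:
        line = line.rstrip("\n")
        dst.write((line if line.startswith(">") else line[:length]) + "\n")


def main(argv):
    parser = argparse.ArgumentParser(description="Hypothetical Galaxy tool script")
    parser.add_argument("-i", dest="input", required=True, help="input dataset ($input)")
    parser.add_argument("-o", dest="output", required=True,
                        help="output file name chosen by Galaxy ($trim_fasta)")
    parser.add_argument("-l", dest="length", type=int, default=50,
                        help="hypothetical maximum line length")
    args = parser.parse_args(argv)
    # Galaxy has already decided where the result must go: just write there.
    with open(args.input) as src, open(args.output, "w") as dst:
        trim_lines(src, dst, args.length)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```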
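A minimal wrapper sketch in the style of stop_err in bwa_wrapper.py above: run the binary with subprocess, return its output, and turn failures into a message on stderr. Galaxy releases of this period treat output on stderr as a failed job by default; the non-zero exit code is an extra, assumed safeguard. The binary path is whatever you installed (e.g. under /opt); nothing here is Galaxy API.

```python
import subprocess
import sys


def stop_err(msg):
    """Write msg to stderr and exit with a non-zero status."""
    sys.stderr.write(f"{msg}\n")
    sys.exit(1)


def run_binary(cmd):
    """Run a third-party binary (e.g. installed under /opt) and return its stdout."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except FileNotFoundError:
        stop_err(f"Binary not found: {cmd[0]}")
    except subprocess.CalledProcessError as err:
        stop_err(f"{cmd[0]} failed with exit code {err.returncode}:\n{err.stderr}")
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a real binary: the current Python interpreter.
    print(run_binary([sys.executable, "-c", "print('hello from a binary')"]), end="")
```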
  • Tool dependencies. Some tools in the Toolshed require a common code base, e.g. R, samtools, GATK. You specify these requirements in your tool's .xml file.
  • Tool dependencies. The requirements in your tool's .xml must match those in tool_dependencies.xml.
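A sketch of that consistency check: collect the <requirement type="package"> entries from the tool XML and the <package name="..." version="..."> entries from tool_dependencies.xml, and report what is missing. The tool, the versions and the R requirement below are hypothetical, and the install instructions inside <package> are left out.

```python
import xml.etree.ElementTree as ET

TOOL_XML = """<tool id="my_tool" name="My tool">
  <requirements>
    <requirement type="package" version="0.1.18">samtools</requirement>
    <requirement type="package" version="2.15.0">R</requirement>
  </requirements>
</tool>"""

DEPENDENCIES_XML = """<tool_dependency>
  <package name="samtools" version="0.1.18">
    <!-- install instructions or a repository reference go here -->
  </package>
</tool_dependency>"""


def tool_requirements(tool_xml):
    """(name, version) pairs from <requirement type="package"> tags."""
    root = ET.fromstring(tool_xml)
    return {(r.text.strip(), r.get("version"))
            for r in root.iter("requirement") if r.get("type") == "package"}


def declared_packages(dependencies_xml):
    """(name, version) pairs from <package> tags in tool_dependencies.xml."""
    root = ET.fromstring(dependencies_xml)
    return {(p.get("name"), p.get("version")) for p in root.iter("package")}


def unmatched_requirements(tool_xml, dependencies_xml):
    """Requirements of the tool that tool_dependencies.xml does not declare."""
    missing = tool_requirements(tool_xml) - declared_packages(dependencies_xml)
    return sorted(missing, key=lambda nv: (nv[0], nv[1] or ""))


print(unmatched_requirements(TOOL_XML, DEPENDENCIES_XML))
```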
  • tool_dependencies.xml
    1. Define a dependency as a repository in a Toolshed that contains a tool dependency definition (repository type), or
    2. write the instructions to install the dependency directly in tool_dependencies.xml, and make it available system-wide.
    Galaxy aims to be platform-independent, so this is A HELL OF A JOB.
    http://wiki.galaxyproject.org/ToolShedToolFeatures#Automatic_third-party_tool_dependency_installa
  • Tool_dependencies.xml This is the simplest you can get. Really.
  • Tool_dependencies.xml A more complex example
  • Lesson 1: it pays off to use / build on repositories started by others.
  • The problem is the testing
    1. Build your tool and make it work in your Galaxy.
    2. Define your dependencies.
    3. Search the (test) Toolshed for repositories you can use: tool dependency definitions ("just installing packages, without providing an interface").
    4. Put them as requirements in your tool.xml.
    5. For the ones you do not find, decide whether to create a separate tool dependency definition and integrate it, OR
    5'. add them to your tool_dependencies.xml file.
    6'. Upload to a Toolshed.
    7'. Fire up a test Galaxy and plug the tool in to see whether it works.
  • The problem is the testing. You might consider a virtual test machine, e.g. in VirtualBox:
    1. install your OS
    2. fetch Galaxy
    3. prepare universe_wsgi.ini (admin, location,...)
    4. plug in your repository
    5. SNAPSHOT your machine
    6. install your tool through the graphical interface
    7. determine what went wrong
    7'. update the repository
    7''. restore the snapshot
    8. iterate until SUCCESS!
  • Tool dependencies
    Dependencies
    IGENOMES (http://support.illumina.com/sequencing/sequencing_software/igenome.ilmn)
    - gtf file: $IGENOMES_ROOT/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf
    - reference whole genome sequence: $IGENOMES_ROOT/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/
    - reference chromosome sequences: $IGENOMES_ROOT/Mus_musculus/Ensembl/GRCm38/Sequence/Chromosomes/
    - PhiX control sequences: $IGENOMES_ROOT/Mus_musculus/Ensembl/GRCm38/Sequence/AbundantSequences/phix.fa
    - TopHat2 (Bowtie2) and STAR indexes: $IGENOMES_ROOT/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index and $IGENOMES_ROOT/Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex
    - chromosome size file: $IGENOMES_ROOT/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/ChromInfo.txt
    Binaries
    - STAR (https://code.google.com/p/rna-star/)
    - TOPHAT2 (http://tophat.cbcb.umd.edu/)
    - BLASTP (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/) or USEARCH (http://www.drive5.com/usearch/download.html)
    - R (http://www.r-project.org/)
    - SAMTOOLS (http://sourceforge.net/projects/samtools/files/samtools/)
    - GATK (http://www.broadinstitute.org/gatk/download)
    - PICARD (http://sourceforge.net/projects/picard/files/picard-tools/)
    - SQLITE3 (http://www.sqlite.org/download.html)
    Custom Ensembl SQLite DB, tables included: coord_system, exon_transcript, intergene (made by the intergenic TIScalling script based on gene), transcript, exon, gene, data