Bioconda and the Conda Package Manager

Slides from Thom Cuddihy for the 12 July 2018 EMBL-ABR webinar about Bioconda and Conda.

Thom, a bioinformatician from QFAB and the Research Computing Centre at the University of Queensland, talked about Bioconda, the most popular and widely used bioinformatics channel for Conda (a package, dependency and environment management tool for any language: Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN). Adding a tool to the Bioconda ecosystem makes it widely available as an installable package for various operating systems and hardware, stored in a fully supported, global repository of bioinformatics tools.

The webinar covered Bioconda basics and how Australian researchers can use it to streamline bioinformatics tool wrapping for use on various systems.

A recording of the webinar is available here: https://youtu.be/lGa9PCSH5IU

Published in: Science

  1. 1. Bioconda and the Conda Package Manager Thom Cuddihy QFAB, QCIF, RCC, UQ
  2. 2. Overview • Discuss Conda and Bioconda • Using Conda • Wrapping a tool for (Bio)conda • Submitting a tool to Bioconda
  3. 3. Why Conda? • Manages self-contained environments, including dependencies. • No sudo required • Large ecosystem of precompiled packages, organized as 'channels' (e.g. conda-forge, bioconda) • Language agnostic (not only Python) • Creating new packages is straightforward compared with many systems • Is the supported tool installation method for Galaxy, making tools available to additional users
  4. 4. What about brew / linuxbrew? • macOS brew and linuxbrew are essentially separate projects with little cooperation. • Anecdotally, brew/linuxbrew bioinformatics installations were often low quality, frequently broken. • homebrew-science is now deprecated
  5. 5. What about apt, yum, etc. • Targeted at system-wide packages - typically don't handle non-privileged installations or concurrent versions gracefully, if at all • Official distribution (Debian, Ubuntu, CentOS) software repositories rarely keep up to date with recent versions • Creating packages seems needlessly complex for researchers
  6. 6. What about modules, LMOD etc • Well suited to using the shell environment (i.e. PATH) to manage concurrent versions • LMOD environments can be 'additive' (many 'modules' can be loaded simultaneously, unlike conda environments) • LMOD isn't a package manager (doesn't handle download/compilation/installation)
  7. 7. What about pip and virtualenv? • Both come with Python, great for Python packages, large ecosystem of existing packages • Not well suited to non-Python packages (binaries and dependencies)
  8. 8. Installing Conda • Miniconda installs the conda package manager in your home directory, in its own conda 'environment' (Miniconda is a trimmed-down version of 'Anaconda') • Miniconda can be found at: https://conda.io/miniconda.html
  9. 9. Using Conda • conda create • Creates new environments • conda list • Lists installed packages in environment • conda install • Installs packages into environment • conda remove • Removes packages from environment
  10. 10. Setting up Conda • Add default channels • ‘defaults’ – Anaconda defaults • ‘conda-forge’ – Common software, libraries and dependencies • ‘bioconda’ – Software for biological sciences
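
A minimal sketch of how the three channels on this slide could be registered. The function name is hypothetical, and the order simply follows the slide; the order recommended in the Bioconda documentation has changed over time, so check the current docs before copying it.

```shell
# Sketch only: wrapped in a function so nothing is changed until it is called.
# setup_channels is a hypothetical helper name, not part of conda.
setup_channels() {
    # Each --add places the channel at the top of the list, so the channel
    # added last ends up with the highest priority.
    conda config --add channels defaults
    conda config --add channels conda-forge
    conda config --add channels bioconda

    # Print the resulting channel list for a quick check.
    conda config --show channels
}
```

Calling `setup_channels` once after installing Miniconda writes the channel list to your user-level conda configuration.
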
  11. 11. Using Environments “--yes” flag: skip asking for confirmation “--name” flag: environment name
  12. 12. Using Environments • Load environments: • “source activate environment” • Unload current environment: • “source deactivate” • Install software: • “conda install software[=version]” • Note that omitting the “--name” flag performs the action in the currently activated environment
  13. 13. Using Environments
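
The commands on the "Using Environments" slides might be combined roughly as follows. This is a sketch under assumptions: the environment name `my_env`, the `python=3.6` pin and the `samtools` package are illustrative choices and do not come from the slides.

```shell
# Sketch only: wrapped in a function so nothing is created until it is called.
# demo_environment, my_env and the package choices are hypothetical.
demo_environment() {
    # --yes skips the confirmation prompt; --name sets the environment name.
    conda create --yes --name my_env python=3.6

    # Load the environment (conda 4.4 and later also accept "conda activate my_env").
    source activate my_env

    # With no --name flag, these act on the currently activated environment.
    conda install --yes samtools
    conda list

    # Remove the package again, then unload the environment.
    conda remove --yes samtools
    source deactivate
}
```
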
  14. 14. Exporting Conda Environments
  15. 15. Importing Conda Environments
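
Exporting and importing an environment could look like the sketch below. The helper names and the example file name are assumptions made for illustration; `conda env export` and `conda env create` are the underlying commands.

```shell
# Sketch only: helper names are hypothetical; nothing runs until they are called.

# Write the package list (with versions and build strings) of a named
# environment to a YAML file.
export_env() {
    env_name="$1"
    out_file="$2"
    conda env export --name "$env_name" > "$out_file"
}

# Recreate an environment under a chosen name from a previously exported file.
import_env() {
    env_name="$1"
    in_file="$2"
    conda env create --name "$env_name" --file "$in_file"
}
```

For example, `export_env my_env environment.yml` on one machine followed by `import_env my_env_copy environment.yml` on another would be one way to move an environment between systems.
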
  16. 16. Conda Environments • New conda environments “stack” • Legacy behaviour “replaced” • Keep in mind when writing scripts
  17. 17. Creating (Bio)conda packages • Conda supports packaging for: • Linux • OSX • Windows (not supported by Bioconda) • Package recipes consist of: • Package metadata (meta.yaml), required • Package install scripts (build.sh, bld.bat), required for the desired platform/s • Package test scripts (run_test.sh, run_test.bat), optional
  18. 18. Defining metadata (meta.yaml)
  19. 19. Defining metadata (meta.yaml) Source
  20. 20. Defining metadata (meta.yaml) Requirements • Build • High level dependencies for building • New “{{ compiler() }}” templates • Host • Platform specific dependencies for building (e.g. linked libraries) • If using the new “{{ compiler() }}” templates, include all non-build dependencies • Run • Dependencies for running
  21. 21. Defining metadata (meta.yaml) Test
  22. 22. Defining metadata (meta.yaml) About
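
The meta.yaml sections covered on the preceding slides (package, source, requirements, test, about) fit together roughly as in the sketch below. Everything specific here is a placeholder: the tool name `my_tool`, the example.org URLs, the hash, the zlib dependency and the license are assumptions, not values from the talk.

```shell
# Write a skeleton recipe into a scratch directory (hypothetical tool "my_tool").
recipe_dir="$(mktemp -d)/my_tool"
mkdir -p "$recipe_dir"

# The quoted 'EOF' keeps the shell from touching the {{ ... }} templates.
cat > "$recipe_dir/meta.yaml" <<'EOF'
{% set name = "my_tool" %}
{% set version = "1.0.0" %}

package:
  name: {{ name }}
  version: {{ version }}

source:
  # Placeholder URL and hash; use a stable release URL and its real checksum.
  url: https://example.org/my_tool-{{ version }}.tar.gz
  sha256: replace-with-the-real-sha256

build:
  number: 0

requirements:
  build:
    - {{ compiler('c') }}   # build tools
  host:
    - zlib                  # libraries linked at build time
  run:
    - zlib                  # needed when the tool runs

test:
  commands:
    - my_tool --version

about:
  home: https://example.org/my_tool
  license: MIT
  summary: Hypothetical tool used to illustrate the recipe layout
EOF

echo "wrote $recipe_dir/meta.yaml"
```
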
  23. 23. Build scripts (build.sh, bld.bat) • Regular installation script • Linux, OSX (build.sh) • Windows (bld.bat) • Special variables • $PREFIX, %PREFIX% - Build prefix to which the build script should install • $PKG_NAME, %PKG_NAME% - Name of the package being built • $CPU_COUNT, %CPU_COUNT% - The number of CPUs on the system • E.g.
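
A build.sh using these variables might look like the sketch below. It assumes a hypothetical package that ships a ready-to-run script named after the package. Under conda-build the three variables are already set; the fallback values and the stand-in source directory exist only so the sketch can be tried outside conda-build.

```shell
#!/usr/bin/env bash
set -euo pipefail

# conda-build normally provides these; the defaults are for standalone runs.
PREFIX="${PREFIX:-$(mktemp -d)}"
PKG_NAME="${PKG_NAME:-my_tool}"
CPU_COUNT="${CPU_COUNT:-1}"

# Stand-in for the unpacked source tree (hypothetical content).
fake_src="$(mktemp -d)"
printf '#!/bin/sh\necho "%s ok"\n' "$PKG_NAME" > "$fake_src/$PKG_NAME"

# A compiled tool would typically build at this point, e.g.: make -j"$CPU_COUNT"
echo "building $PKG_NAME with $CPU_COUNT job(s)"

# Install into the prefix so the tool ends up on PATH in the final environment.
mkdir -p "$PREFIX/bin"
cp "$fake_src/$PKG_NAME" "$PREFIX/bin/$PKG_NAME"
chmod +x "$PREFIX/bin/$PKG_NAME"
```
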
  24. 24. Test scripts (run_test.sh, run_test.bat) • Optional test scripts • Linux, OSX (run_test.sh) • Windows (run_test.bat) • Used for extended testing beyond meta.yaml • Linux/OSX – use “set -euo pipefail” so the script fails on any error • Must return exit code 0 (no errors) to pass
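
As a rough illustration of such a test script, the sketch below checks that a tool reports its version. `my_tool` is a hypothetical command; a stub is defined only when no such command exists, so the sketch can run on its own.

```shell
#!/usr/bin/env bash
# -e: stop at the first failing command
# -u: treat unset variables as errors
# -o pipefail: a pipeline fails if any stage fails, not only the last one
set -euo pipefail

# Stand-in for the packaged tool, used only when my_tool is not installed.
if ! command -v my_tool >/dev/null 2>&1; then
    my_tool() { echo "my_tool 1.0.0"; }
fi

# If either side of the pipe fails, the script exits with a non-zero code
# and the package test is reported as failed.
my_tool --version | grep "my_tool" > /dev/null

echo "all tests passed"
```
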
  25. 25. Why use Bioconda? • Large channel of pre-packaged bioinformatics software • 4103 recipes • 472 contributors • Conda environments aid reproducibility • An environment.yml file can record tool versions (+build number) that can be reliably reinstalled elsewhere. • Quality control • Well documented guidelines, automatic testing and package builds (CircleCI), pull requests reviewed by core team. • Exposure
  26. 26. Why use Bioconda?
  27. 27. Dos and Don’ts for Bioconda Do • Use a stable URL • Include a SHA256/MD5 hash • Include adequate tests • Check license allows redistribution • Include program homepage, summary and license information Don’t • Include Windows build (bld.bat) • Include unnecessary comments • Push inappropriate recipes • Re-package existing recipes • Use git URLs
  28. 28. Setting up for Bioconda Development https://github.com/bioconda/bioconda-recipes
  29. 29. Setting up for Bioconda Development • Clone fork • git clone https://github.com/thomcuddihy/bioconda-recipes • Add Bioconda as “upstream” • git remote add upstream https://github.com/bioconda/bioconda-recipes.git • Create new branch for recipe • git checkout -b my_tool
  30. 30. Building your own recipe for Bioconda
  31. 31. Example - Parsnp
  32. 32. Example - Parsnp Build Script • Ensure prefix (output) directory has a ‘bin’ directory • Copy precompiled binary to ’bin’ Test Script • Download test data and run parsnp
  33. 33. Building your own recipe • Building your recipes locally requires ‘conda-build’ • Install into base environment using Conda
  34. 34. Building your own recipe
  35. 35. Building your own recipe • Once built successfully, tool is packaged into an archive for distribution • Locally built tool packages can be installed into conda environment for further testing or use
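
The local build steps on these slides could be sketched as follows. The helper name, the recipe path argument and the `_test` environment suffix are assumptions for illustration; `conda build` and the `--use-local` flag come from conda and conda-build.

```shell
# Sketch only: hypothetical helper; nothing runs until it is called.
# Usage: build_and_install_local recipes/my_tool my_tool
build_and_install_local() {
    recipe_dir="$1"
    pkg="$2"

    # conda-build goes into the base environment.
    conda install --yes --name base conda-build

    # Build the recipe; on success the package is written as an archive.
    conda build "$recipe_dir"

    # Print the path of the archive that the build produces.
    conda build "$recipe_dir" --output

    # Install the locally built package into a fresh environment for testing.
    conda create --yes --name "${pkg}_test" --use-local "$pkg"
}
```
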
  36. 36. Testing your recipe • Local builds using conda-build may not be truly cross-compatible • Bioconda provides two methods for comprehensive testing locally • CircleCI docker client • Easier to use, quicker • Bioconda-utils • More stringent, allows for macOS testing
  37. 37. Testing your recipe • CircleCI runs from base directory • Requires Docker to be installed • Downloads latest Linux testing image
  38. 38. Testing your recipe
  39. 39. Submitting your recipe to Bioconda • Go to https://github.com/bioconda/bioconda-recipes and make a pull request for your branch • CircleCI will automatically build your branch. • If CircleCI tests pass, the Bioconda team will review your PR and if it's okay, merge it • Recipes in the Bioconda master branch are automatically built on Linux and macOS and uploaded to the Anaconda Bioconda channel.
  40. 40. Submitting your recipe to Bioconda
  41. 41. Submitting your recipe to Bioconda
  42. 42. Submitting your recipe to Bioconda • Wait for CircleCI checks to finish • macOS takes a while to start • Add comment requesting code review, mentioning the Bioconda Core (“@bioconda/core”) • If unable to mention, register as a contributor: https://github.com/bioconda/bioconda-recipes/issues/1
  43. 43. Submitting your recipe to Bioconda • Core team members may comment or request changes before merging • If there are new commits to the master repository since your initial commit, sync your branch: git checkout master; git pull upstream master; git push origin master; git checkout my_branch; git merge origin/master
  44. 44. Summary • Conda • Simple package and environment management • Easy recipe packaging for software • Allows tool deployment across multiple OS platforms • Allows tool inclusion in environments like HPCs and Galaxy • Bioconda • Comprehensive tool testing • Active community of contributors and moderators • Extensive guidelines and best practices ensure high quality tool packaging • Most popular and widely used bioinformatics channel
  45. 45. Acknowledgements • Simon Gladman, Saskia Hiltemann and Eric Rasche • https://www.melbournebioinformatics.org.au/projects-blog/bioconda/ • Andrew Perry • https://github.com/MonashBioinformaticsPlatform/bioconda-tutorial Further Information • https://conda.io/docs/user-guide/tasks/build-packages • https://bioconda.github.io/contributing.html