Group Meeting 2017-08-09, Tech
“Conda and Bioconda”
by Davide Chicco
davide.chicco@gmail.com
Conda
● Conda is a package manager application that quickly installs,
runs, and updates packages and their dependencies.
● The conda command is the primary interface for managing
installations of various packages. It can query and search the
package index and current installation, create new environments,
and install and update packages into existing conda
environments.
Bioconda
Bioconda is a channel for the conda package manager
specializing in bioinformatics software. Bioconda consists of:
● a repository of recipes hosted on GitHub
● a build system that turns these recipes into conda packages
● a repository of >2700 bioinformatics packages ready to use
with the conda install command
Conda, Bioconda, Anaconda, Miniconda
Anaconda is the original scientific Python distribution that
includes Conda.
Conda is a package manager that became so useful and
important on its own that the developers decided to release it
as a separate open source package.
Miniconda is a minimal installer that contains only conda and
its dependencies, as opposed to the full-size Anaconda.
Bioconda is a channel of bioinformatics packages for Conda.
Conda advantages
Two main advantages of using Conda as a package manager,
compared to the traditional approach (installing packages
independently):
1 – Installing Conda packages is easy, fast, and smooth, because
Conda resolves and installs all the dependencies automatically.
2 – It provides virtual environments similar to virtualenv, which
is used in the Python community. You can have any Python
version on your system (e.g. 2.6, 3.4 and so on), each in its own
environment, without having to install that Python version
yourself (e.g. “Project X depends on version 1.x, but Project Y
needs 4.x”).
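As a rough illustration of the second advantage, the sketch below sets up one environment per project. The environment names (projectX, projectY) and the Python versions are placeholders, not something from the talk, and the `run` helper is a hypothetical dry-run wrapper: by default it only prints each command, so the snippet is safe to paste even where conda is not installed.

```shell
# Set DRY_RUN=0 to really execute the commands; by default they are only printed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# One isolated environment per project, each with its own Python version
# (names and versions are examples only).
run conda create --yes --name projectX python=2.7
run conda create --yes --name projectY python=3.6

# List the environments conda knows about.
run conda env list
```

To work inside one of them, activate it with `source activate projectX` (or `conda activate projectX` on newer conda releases).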
Bioconda: how to use it
(1) Manually install Miniconda:
https://conda.io/miniconda.html
(2) After installing conda you will need to add the bioconda
channel as well as the other channels bioconda depends on. It is
important to add them in this order so that the priority is set
correctly (that is, bioconda is highest priority).
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
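Each `conda config --add channels` call puts the new channel at the top of the list, which is why the last channel added ends up with the highest priority. The sketch below writes the channel list one would expect in ~/.condarc after the three commands above (assuming the configuration had no channels before) to a temporary file, then reads back the top entry. The temporary file stands in for the real ~/.condarc; on a real installation, `conda config --show channels` prints the actual list.

```shell
# Stand-in for ~/.condarc after the three "conda config --add channels" calls
# (assumption: no channels were configured beforehand).
condarc="$(mktemp)"
cat > "$condarc" <<'EOF'
channels:
  - bioconda
  - conda-forge
  - defaults
EOF

# The first list entry is the highest-priority channel.
top_channel="$(awk '/^ *- /{print $2; exit}' "$condarc")"
echo "highest priority channel: $top_channel"
```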
(3) Bioconda is now enabled, so it's possible to install new
packages into the current Conda environment:
conda install new_package
That's it!
Problem with Conda installation:
● The Conda installer edits the .bashrc file and puts Miniconda
first in the PATH
● This makes the computer use the Miniconda R version, instead of
the usual /usr/bin/R version
● The Miniconda R version has a problem with the TCL package,
and does not let you install new packages
Solution:
● Manually edit the .bashrc file and put your /usr/bin/ path
before the Miniconda one
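One possible way to apply this fix, assuming the Miniconda installer has already added its own PATH line to ~/.bashrc, is to add a line after it that puts /usr/bin back in front. This is a sketch, not the exact edit made in the talk:

```shell
# Place this after the PATH line that the Miniconda installer wrote to ~/.bashrc,
# so that /usr/bin is searched before the Miniconda bin directory.
export PATH="/usr/bin:$PATH"

# Show which directory is searched first and which R is now picked up.
first_dir="${PATH%%:*}"
echo "first PATH entry: $first_dir"
command -v R || echo "R not found on PATH"
```

Note that this gives every program in /usr/bin (not only R) precedence over the Miniconda version of the same program.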
Bioconda: how to add a new package
First, understand the difference between recipe and package:
● A recipe is a directory containing a small set of files that define
the name, version, dependencies, and URL for the source code. A
recipe typically contains a meta.yaml file that defines these
settings and a build.sh script that builds the software.
● A recipe is converted into a package by running conda-build on
the recipe. A package is a bzip2-compressed tar file (.tar.bz2)
that contains the built software. Packages are uploaded to
anaconda.org so that users can install them with conda install.
Then create a recipe for your new package
( https://bioconda.github.io/guidelines.html ). Requirements:
● Have a stable URL for the source package (for example, from
GitHub or Bitbucket)
● Separate the download of the data files into another script
● Remove comments and print statements
● Add an md5 or sha256 hash for the source package
● No hard-coded paths in the code
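The hash requirement can be met with standard tools. The sketch below computes a sha256 digest for a file; the temporary file is only a stand-in for the real downloaded source archive, and the fallback to `shasum -a 256` is there for systems (such as macOS) without `sha256sum`.

```shell
# Stand-in for the downloaded source archive (hypothetical content).
src="$(mktemp)"
printf 'example source archive contents\n' > "$src"

# Compute the sha256 digest; keep only the hex string, not the file name.
if command -v sha256sum >/dev/null 2>&1; then
    hash="$(sha256sum "$src" | cut -d' ' -f1)"
else
    hash="$(shasum -a 256 "$src" | cut -d' ' -f1)"
fi

echo "sha256: $hash"
```

The printed value is what goes into the sha256 field of the recipe's source section.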
Example: BEHST recipe
Three files:
build.sh
meta.yaml
run_test.sh
Example: BEHST recipe build.sh file
#!/bin/bash
# Install the BEHST scripts into the environment's bin directory
mkdir -p "$PREFIX/bin"
cp "$SRC_DIR"/bin/* "$PREFIX/bin"
cd "$PREFIX/bin"
# Download the BEHST default data files
./download_behst_data.sh ~/thisBEHSTdataFolder
Example: BEHST recipe run_test.sh file
# Download a minimal data set for testing
./download_behst_data.sh ~/thisBEHSTdataFolder --small
# Run a BEHST test that takes a few minutes
./behst.py ~/thisBEHSTdataFolder/pressto_LIVER_enhancers.bed ~/thisBEHSTdataFolder
Example: BEHST recipe meta.yaml file
package:
  name: behst
  version: 0.7
source:
  fn: 3b1d38405007.zip
  url: https://bitbucket.org/hoffmanlab/behst/get/3b1d38405007.zip
  sha256: 67d554adbb765ec53cb557e4ed9fbe8b6fc1dff133352e2c64f3a3c85797fe7a
build:
  number: 0
  skip: True  # [not py27 or osx]
requirements:
  build:
    - python
    - pandas
    - zlib
    - libpng
    - gnu-wget
  run:
    - python
    - pandas
    - zlib
    - libpng
    - bedtools
    - pybedtools
    - r-base
    - curl
    - r-gprofiler
test:
  files:
    - run_test.sh
about:
  home: https://bitbucket.org/hoffmanlab/behst
  license: BSD
  license_file: LICENSE.txt
  summary: BEHST package
Steps I followed to put BEHST as a package on Bioconda:
1) I forked the Bioconda recipes project on GitHub
2) I created a branch for BEHST
3) I cloned the repository on my PC (git clone…)
4) In /bioconda-recipes/recipes/ I created the BEHST folder
5) I put my recipe files into this BEHST folder
6) I created a profile on Travis CI and linked it to my GitHub account
7) I edited my code and tested the Bioconda recipe locally on my PC
8) I committed and pushed changes (git commit & git push...)
9) After each push, Travis CI starts a test job on a Linux machine. I
can check it here:
https://travis-ci.org/davidechicco/bioconda-recipes/builds
10) Once the recipe worked both locally and on Travis CI, I opened
a pull request to the Bioconda developers
11) They finally added it to the official list of Bioconda
recipes
12) Now it's freely available and it can be installed easily:
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda install -c bioconda behst
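The local part of the workflow above (creating the branch, adding the recipe folder, committing) can be sketched as a small shell function. The function name `add_recipe` and its arguments are hypothetical, and it assumes you already have a clone of your own fork of the recipes repository; forking, cloning, and the pull request itself still happen on GitHub.

```shell
# Hypothetical helper covering the branch, recipe folder, and commit steps.
#   $1 = path to a local clone of your fork of the recipes repository
#   $2 = recipe name (also used as the branch name)
#   $3 = directory holding your recipe files (meta.yaml, build.sh, ...)
add_recipe() {
    repo="$1"
    name="$2"
    files="$3"
    git -C "$repo" checkout -b "$name" &&
    mkdir -p "$repo/recipes/$name" &&
    cp "$files"/* "$repo/recipes/$name/" &&
    git -C "$repo" add "recipes/$name" &&
    git -C "$repo" commit -m "Add $name recipe"
}

# Example call (paths are placeholders), followed by the push that
# triggers the CI test job:
#   add_recipe ~/bioconda-recipes behst ~/behst-recipe
#   git -C ~/bioconda-recipes push origin behst
```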
Lessons learned:
● The package requirements on the Bioconda website are
incomplete: the Bioconda developers require other
specifications as well
● No hard-coded file names in the package code
● No data files in the package code (create a separate
download_data_files script instead)
The end
