More Related Content

More from CSUC - Consorci de Serveis Universitaris de Catalunya(20)

The Conda environment system and how to use it on CSUC machines

  1. The Conda environment system and how to use it on CSUC machines Víctor Pérez 13/12/2022
  2. Index 1. What is Conda? 2. Scope of the project 3. Using Conda 4. Conda environments 5. Basic management operations 6. Using Mamba 7. Script examples 8. Best practices
  3. What is Conda? ✓ Originally: Anaconda, a distribution of Python including common scientific packages. https://www.anaconda.com/ ✓ Extended to include R and R packages, scientific libraries, other software, etc. ✓ Conda is the core package manager for the Anaconda project. ✓ Well established within the statistics, bioinformatics, data science, artificial intelligence communities, amongst others.
  4. What is Conda? ✓ Conda installs and updates binary versions of Python and R packages from its own (or third party) repositories. ✓ It is an alternative to other repository systems, like Pip for Python or CRAN for R. ✓ It is also a convenient way to manage dependencies for Python and R packages. ✓ Particularly useful for fields in which there are complex analysis pipelineswhich require many different packages.
  5. What is Conda? ✓ But Conda isn't... - A repository of system software packages. - A repository of source code. - A replacement for environment modules. - Perfect, infinite or infallible. ✓ Be mindful of its limitations. ✓ You can find full documentation at: https://conda.io
  6. Scope of the project ✓ Conda has grown over the past few years to encompass a large interconnected ecosystem of software, including: ✓ Python and Python packages ✓ R and R packages ✓ IDEs: Jupyter, Spyder, Rstudio... ✓ Scientific packages: Numpy, SciPy, Pandas, Numba, Dask... ✓ Machine learning packages: Scikit-learn, TensorFlow, Theano... ✓ Data visualisation packages: Matplotlib, Bokeh, Datashader, Holoviews... ✓ External libraries and tools: compilers, libraries, programs...
  7. Scope of the project ✓ Channels are thematic collections of packages, equivalent to repositories, which are useful to avoid version conflicts. ✓ Examples: - main: default channel - free: software with licences that restrict commercial use - conda-forge: large collection of third-party packages - bioconda: software for bioinformatics - r: tailored to R users ✓ Companies and reference research groups often create their own channel, to ensure that pre-compiled, compatible versions of certain packages are portable between machines within the organisation.
  8. How to use Conda ✓ All we need to do to use Conda on our cluster is to load an environment module: module load conda/current ✓ This will put the Conda executable in our path and will configure everything else for us.
  9. How to use Conda ✓ Then we use the command conda (+ action) to run it: conda list conda activate conda deactivate conda create conda search conda install conda remove conda update conda clean conda info
  10. Conda environments ✓ Inside a given installation of Conda, there can be any number of environments. ✓ Environments are like Python profiles: each will be an independent ecosystem with a different list of packages and versions installed. ✓ On our machines, Conda is always used via private environments, owned by a specific user. ✓ Users within the same group can easily share their environments with each other through a simple configuration step.
  11. Conda environments ✓ To see a list of environments: conda env list ✓ To load an environment: conda activate <env_name> ✓ To unload: conda deactivate
  12. Conda environments ✓ To see the contents of an env: conda list [-n env_name] ✓ If no environment is specified, actions typically apply to the currently active one. ✓ Note: source activate and source deactivate are obsolete.
  13. Basic management operations ✓ Your environments are stored at /scratch/$USER/.conda/envs ✓ To create a new empty environment: conda create -n <env_name> ✓ To create a new environment with packages already installed in it: conda create –n <env_name> [list of packages] ✓ To install packages in an existing environment: conda install [-n env_name] <list of packages> ✓ Version and channel can also be specified: conda install [-n env_name] [-C channel] <package=version>
  14. Basic management operations ✓ To update packages in an environment: conda update [-n env_name] <packages> or conda update [-n env_name] --all ✓ To uninstall packages: conda remove [-n env_name] <packages> ✓ To completely delete an environment: conda env remove -n <env_name> ✓ conda keeps track of modifications to each environment; you can see a history with conda list [-n env_name] --revisions
  15. Basic management operations ✓ Although not ideal, sometimes you will need software that isn't packaged for Conda inside a Conda environment. ✓ It is possible to install Python packages using Pip, or R packages using BioConductor or CRAN, within a Conda environment, but it requires manually configuring a proxy. ✓ Those packages won't be managed by Conda, so be careful. ✓ Let us know if you need to do this so that we can set it up for you.
  16. Using Mamba Mamba is a reimplementation of Conda with fast execution in mind. ✓ It is rewritten in C++, with multithreading support for downloads, installations and searches. ✓ It is fully integrated with Conda and can be used as an alternative on an execution-by-execution basis.
  17. Using Mamba ✓ To use it, simply replace conda with mamba in your command: mamba list mamba create mamba search mamba install mamba remove mamba update mamba clean mamba info ✓ Note: to activate or deactivate an environment, you should still use conda activate/deactivate instead
  18. Using Mamba ✓ Mamba will take the same options and arguments as Conda. ✓ The additional performance brought by Mamba will be most useful when managing large numbers of packages at once, such as when: - Installing many packages at once - Updating a large environment - Searching caches - Solving dependency clashes ✓ Conda will still be a fallback for the rare functionalities not yet implemented in Mamba.
  19. Script examples: Python ✓ Minimal example running Python through Conda: #!/bin/bash #SBATCH –p std #SBATCH –N 1 #SBATCH –n 1 #SBATCH –t 1:00:00 module load conda/current conda activate my_py_environment python example.py > out.log
  20. Script examples: R ✓ Minimal example running R through Conda: #!/bin/bash #SBATCH –p std #SBATCH –N 1 #SBATCH –n 1 #SBATCH –t 1:00:00 module load conda/current conda activate my_R_environment Rscript example.R > out.log
  21. Best practices ✓ For software used across a group, it's typically more convenient to designate one user as the owner and manager of an environment and have everyone else access it. ✓ Avoid clutter; it's better to create multiple single-purpose environments than one large environment with too many packages. ✓ Be mindful of version clashes when updating environments; if you don't need to update, don't. ✓ When in doubt, contact us – we can do it for you.
  22. That's all! Thank you for your attention!