A presentation for the encapsulation section at the Dataverse 2020 Community Meeting. Includes an overview of Singularity and dummy implementation / idea for data containers.
9. The problems Singularity solves
SINGULARITY: CONTAINERS FOR HPC
Reproducibility and archive of software and environments
A container is a file. It’s portable and shareable.
Integration with traditional HPC
The user inside the container is the user outside
15. SINGULARITY: THE BUILD SPECIFICATION
Singularity
Bootstrap: docker
From: python:3.7
%post
    echo "This is just run once after bootstrap!"
    echo "Install and add stuff here!"
    apt-get update
    apt-get install -y vim wget
%runscript
    exec /usr/bin/python "$@"
sudo singularity build container.sif Singularity
21. SINGULARITY: DISCOVERABILITY
Bootstrap: docker
From: python:3.7
%post
…
%help
Hey there! This is how you can run this container:
$ singularity exec container.sif /code/script.py input1
$ singularity inspect -H container.sif
Help within container
31. %app[action] [name]
hello-world.scif
%apprun hello-world
    /bin/bash hello-world.sh
%appinstall hello-world
    echo "echo 'Hello World!'" >> $SCIF_APPBIN/hello-world.sh
    chmod u+x $SCIF_APPBIN/hello-world.sh
%appenv hello-world
    THEBESTAPP=$SCIF_APPNAME
    export THEBESTAPP
%applabels hello-world
    MAINTAINER Vanessasaur
%apphelp hello-world
    This is an example "Hello World" application.
    You can install it to a Scientific Filesystem
    (scif) with the command:
    scif install hello-world.scif
https://sci-f.github.io/tutorial-quick-start
34. SINGULARITY: DISCOVERABILITY
- Inspect for global container metadata and labels
- Help for user friendly instructions
- Read Only containers for portability, reproducibility
- SCIF is natively implemented in Singularity
39. What are the features of a data container?
SINGULARITY: DATA CONTAINERS
- Exclusively data (no operating system)
- Allows query of data / search of metadata
- Container can be bound as a volume
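To make the "search of metadata" feature concrete, here is a minimal Python sketch of what querying a data container's manifest could look like. The manifest layout, the field names, the example paths, and the `search` helper are all hypothetical; none of this is part of Singularity, it only illustrates the idea.

```python
from typing import Dict, List

# Hypothetical manifest: one metadata record per file stored in the data container.
MANIFEST: List[Dict[str, str]] = [
    {"path": "/data/sub-01/anat.nii.gz", "subject": "sub-01", "modality": "anat"},
    {"path": "/data/sub-01/func.nii.gz", "subject": "sub-01", "modality": "func"},
    {"path": "/data/sub-02/anat.nii.gz", "subject": "sub-02", "modality": "anat"},
]


def search(manifest: List[Dict[str, str]], **criteria: str) -> List[str]:
    """Return the paths of records whose metadata matches every criterion."""
    return [
        record["path"]
        for record in manifest
        if all(record.get(key) == value for key, value in criteria.items())
    ]


if __name__ == "__main__":
    # List every anatomical file without touching the data itself.
    print(search(MANIFEST, modality="anat"))
```

The point of the design is that a query only reads metadata, so the same container could still be bound as a volume for the actual analysis.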
54. Why is this so powerful?
SINGULARITY: DATA CONTAINERS
- Entrypoint functions can be optimized for dataset
- Metadata extraction can be too
- Container can be bound as a volume
55. What would a data container look like for your use case?
Hey friends, I’m Vanessa, and I’m a research software dinosaur at the Stanford Research Computing Center. And today we are going to talk about...
Containers, containers, containers! Now, I don’t want to mislead you: containers are old news. You could probably have given a talk in 2015 about this exciting newish technology that is helping research, but that was 5 years ago dawg, that’s OLD NEWS.
You probably know this too, but we can abstractly think of a container as this isolated environment that lets you package an operating system, software libraries, oh yeah, and your life’s work, which might include data and scripts to some extent.
I’m guessing that you are familiar with Docker, and at some point, especially if you wanted to run Docker on a cluster, it’s probably felt like this.
Well, back in 2016 we decided that scientists need containers too. This was really when the early versions of Singularity started to come together, and I was an early developer.
Singularity is just like Docker in that you can put your analysis code, dependencies, and everything about the environment in a container, move it from your local machine to a cluster, and it’s going to work. Singularity containers will run locally, under SLURM, and in basically all the places on a shared cluster where Docker cannot.
So what about provenance and data? This is probably what we really care about.
Like Docker, Singularity has support for adding labels, which is something I first added way back in 2017. And you’ll notice there is also this help section, so you can literally ask the container for help, and it will show you some text dump that was written by the creator in a text file.
This might seem simple and silly, but without some external documentation, a container is really just a black box. So I pushed for this as some sort of best practice because we really don’t want our containers to be black boxes!
And this inspect command is still present now. Basic metadata is automatically added to your container when it is built: when it was built, what the bootstrap was from, and the version of Singularity. And this can also be exported as JSON.
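As a rough illustration of what you can do with that JSON export, here is a small Python sketch that pulls the labels out of an inspect payload. The sample payload is an assumption about the shape printed by `singularity inspect --json` in 3.x releases, and its values are made up; the nesting and label keys can differ between Singularity versions, so check the output of your own install.

```python
import json
from typing import Dict

# Assumed shape of `singularity inspect --json container.sif` output.
# The values are illustrative only, not taken from a real container.
SAMPLE = """
{
  "data": {
    "attributes": {
      "labels": {
        "org.label-schema.build-date": "Monday_1_June_2020_12:00:00_UTC",
        "org.label-schema.usage.singularity.deffile.bootstrap": "docker",
        "org.label-schema.usage.singularity.deffile.from": "python:3.7",
        "org.label-schema.usage.singularity.version": "3.5.3"
      }
    }
  },
  "type": "container"
}
"""


def read_labels(inspect_json: str) -> Dict[str, str]:
    """Return the label dictionary from an inspect payload, or {} if absent."""
    payload = json.loads(inspect_json)
    return payload.get("data", {}).get("attributes", {}).get("labels", {})


if __name__ == "__main__":
    for key, value in sorted(read_labels(SAMPLE).items()):
        print(f"{key}: {value}")
```

In practice you would feed `read_labels` the text captured from the real command instead of `SAMPLE`.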
And a feature that I developed that is way out of scope for this discussion is the scientific filesystem, which gives you that same ability to define labels, environment, and custom entrypoints or help snippets, but just for a single app installed inside your container.
So the interesting thing here is that these three sections, for a simple container, do very well to capture the kind of information that our scientist would need for reproducibility. The challenge, however, is that all of this is traditionally accessible only via one entrypoint, or via some kind of inspect command that is specific to the controller. And what’s the problem with this?
So a container with a good design will not only be able to serve one or more entrypoints. I also want to be able to interact with it, and have it tell me information about the runtime and content.
When we install a SCIF, we still have this global content, and it still hopefully conforms to the Linux filesystem hierarchy. But what we have now is internal modularity. And actually it’s predictable internal modularity, because each of foo and bar has been installed to the container in a way that I can just look at the file content and know what belongs to each, both in terms of runtime variables and whatever file content is associated with the applications.
You are going to notice different sections here. Each starts with %app, then has a term to describe the section, like run, install, or labels, and then is followed by an app name. It’s a very nice recipe format that a user can sit down and write sections for, and then install to different hosts.
So it follows that this entire thing is for the “hello world” application, and we have these chunks sitting in the same file. And there are no rules about how many sections are required for an application: an application could be just a runscript, just a file, or just a label or environment variable. There are also no requirements about how many applications you can define in a file. If they interact with one another you should use the same file, but if not you can keep them separate and install just one or both of them to some host.
So what does it look like to interact with a SCIF? Well, the first thing you would want to do is create one on your host. And a host can be your actual computer or, more likely for our purposes, the inside of a container. In this case the development flow is simple and familiar. You write a recipe, in this recipe you define your different applications, environments, and entry points, and then you install the recipe to the host using the provided client. Finally, you interact with your SCIF.
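To show what "predictable" means here, this Python sketch computes where an app's pieces would live. The `app_layout` helper is hypothetical, and the `/scif` base plus the `apps/<name>/bin`, `lib`, `scif`, and `data/<name>` layout are my reading of the SCIF specification rather than something shown on these slides, so treat the exact paths as assumptions.

```python
from pathlib import PurePosixPath
from typing import Dict


def app_layout(name: str, base: str = "/scif") -> Dict[str, str]:
    """Map SCIF-style variable names to the paths assumed for one app."""
    root = PurePosixPath(base) / "apps" / name
    meta = root / "scif"  # per-app metadata folder (assumed name)
    return {
        "SCIF_APPNAME": name,
        "SCIF_APPROOT": str(root),
        "SCIF_APPBIN": str(root / "bin"),
        "SCIF_APPLIB": str(root / "lib"),
        "SCIF_APPMETA": str(meta),
        "SCIF_APPRUN": str(meta / "runscript"),
        "SCIF_APPDATA": str(PurePosixPath(base) / "data" / name),
    }


if __name__ == "__main__":
    # Two apps installed side by side never share a folder.
    for app in ("foo", "bar"):
        print(app, app_layout(app)["SCIF_APPBIN"])
```

Because every path is derived from the app name alone, a tool (or a person) can find what belongs to foo or bar without running anything.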
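The section naming rule is regular enough to parse in a few lines. The Python sketch below groups the lines of a recipe by app name and section type. It is only an illustration of the `%app[action] [name]` convention, not the parser the scif client uses, and `parse_recipe` is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict, List

# A section header looks like "%apprun hello-world": "%app" + section + app name.
SECTION = re.compile(r"^%app(?P<section>[a-z]+)\s+(?P<name>\S+)\s*$")

RECIPE = """\
%apprun hello-world
    /bin/bash hello-world.sh
%appinstall hello-world
    echo "echo 'Hello World!'" >> $SCIF_APPBIN/hello-world.sh
    chmod u+x $SCIF_APPBIN/hello-world.sh
%appenv hello-world
    THEBESTAPP=$SCIF_APPNAME
    export THEBESTAPP
%applabels hello-world
    MAINTAINER Vanessasaur
"""


def parse_recipe(text: str) -> Dict[str, Dict[str, List[str]]]:
    """Group recipe lines as {app name: {section: [lines]}}."""
    apps: Dict[str, Dict[str, List[str]]] = defaultdict(dict)
    current = None  # list collecting lines for the section being read
    for line in text.splitlines():
        stripped = line.strip()
        match = SECTION.match(stripped)
        if match:
            current = apps[match["name"]].setdefault(match["section"], [])
        elif current is not None and stripped:
            current.append(stripped)
    return dict(apps)


if __name__ == "__main__":
    for app, sections in parse_recipe(RECIPE).items():
        print(app, sorted(sections))
```

Since no section is required, an app with only `%applabels` or only `%apprun` parses just as well as one with all of them.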
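That flow can be written down as a short list of client calls. The Python sketch below only builds the commands and does not execute them. `scif install` appears in the recipe help text above; `scif apps`, `scif help`, and `scif run` are subcommands of the scif client as I understand it, so confirm them against your installed version. The `scif_workflow` helper itself is hypothetical.

```python
import shlex
from typing import List


def scif_workflow(recipe: str, app: str) -> List[List[str]]:
    """Return the client commands for install -> discover -> run, in order."""
    return [
        ["scif", "install", recipe],  # install the recipe to the host
        ["scif", "apps"],             # list the installed applications
        ["scif", "help", app],        # show the app's help section
        ["scif", "run", app],         # execute the app's runscript
    ]


if __name__ == "__main__":
    for command in scif_workflow("hello-world.scif", "hello-world"):
        # Quote each argument so the printed line is safe to paste into a shell.
        print(" ".join(shlex.quote(part) for part in command))
```

Keeping the commands as argument lists means they could later be handed to `subprocess.run` without any shell quoting problems.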
You would find it just by executing the container, and it would show you all the commands available to interact with the scientific applications within.
So how does this work? And what do these commands actually look like? Let’s talk a little bit about the SCIF itself.