Alternatives to layer-based image distribution
using a CERN filesystem for distributing container images
George Lestaris

@glestaris
• Software engineer at Pivotal
• working for Cloud Foundry GrootFS
• github.com/cloudfoundry/grootfs
• ex-CERNois
About me
docker run elasticsearch:2.3.5
Container image
Container image format
• Container image formats are specified by Docker, appc (ACI) and the OCI Image spec
• Generally, an image is the combination of:
  • a set of layers
  • metadata
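A minimal sketch of "layers plus metadata" as a data structure. The class and field names are made up for illustration and do not follow the schema of Docker, ACI or the OCI Image spec.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass(frozen=True)
class Layer:
    """A layer, identified by the digest of its contents."""
    digest: str
    size: int


@dataclass
class Image:
    """An image: ordered layers (base first) plus free-form metadata."""
    layers: list
    metadata: dict = field(default_factory=dict)


def make_layer(content: bytes) -> Layer:
    # Illustrative only: hash the raw bytes that stand in for a layer archive.
    return Layer("sha256:" + hashlib.sha256(content).hexdigest(), len(content))


base = make_layer(b"python runtime files")
app = make_layer(b"application files")
image = Image(layers=[base, app],
              metadata={"entrypoint": ["python", "/myapp/manage.py"]})
```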
Building an image
FROM python:3.5
ADD . /myapp
RUN pip install -r /myapp/requirements.txt
ENTRYPOINT python /myapp/manage.py runserver 0.0.0.0:8000
Container images are composed of layers
A layer is a set of files and directories
FROM python:3.5
Layers let us inherit from other images
FROM python:3.5
ADD . /myapp
RUN pip install …
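A rough sketch of how stacked layers produce one filesystem view, and how two images that inherit from the same base share that layer. Layers are modelled as plain dictionaries; the paths and contents are invented for the example, and real formats also handle deletions and permissions, which are left out here.

```python
def flatten(layers):
    """Merge layers (base first) into one path -> content view; later layers win."""
    root = {}
    for layer in layers:
        root.update(layer)
    return root


base = {"/usr/bin/python": "python 3.5 binary"}   # FROM python:3.5
app = {"/myapp/manage.py": "app code"}            # ADD . /myapp
deps = {"/myapp/deps/package": "installed files"} # RUN pip install … (path is made up)

image_a = [base, app, deps]
image_b = [base, {"/otherapp/main.py": "other code"}]

# Layers that both images can reuse instead of storing or fetching twice.
shared = [layer for layer in image_a if any(layer is other for other in image_b)]
view = flatten(image_a)
```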
• Different image formats use different distribution mechanisms
• Docker: layers are downloaded from a registry over HTTP
• Layers of base images can be reused
• Images are fetched efficiently by downloading layers in parallel
Container image distribution
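The two properties above (reuse of cached layers, parallel downloads) can be sketched as follows. This is not Docker's implementation: `fetch` is a hypothetical callable standing in for an HTTP GET against a registry, and the digests are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def pull(layer_digests, cache, fetch, max_workers=4):
    """Fetch only the layers missing from `cache`, in parallel.

    `fetch` maps a digest to bytes (a stand-in for a registry request).
    Returns the list of digests that actually had to be downloaded.
    """
    missing = [d for d in layer_digests if d not in cache]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for digest, blob in zip(missing, pool.map(fetch, missing)):
            cache[digest] = blob
    return missing


# Stand-in registry and a client that already has the base layer cached.
registry = {"sha256:base": b"base layer", "sha256:app": b"app layer"}
cache = {"sha256:base": b"base layer"}
downloaded = pull(["sha256:base", "sha256:app"], cache, registry.__getitem__)
```

Only the application layer is requested; the cached base layer is skipped.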
[Diagram: a registry serving four clients. Labelled cases: new image, from cached base, update dependencies]
Distributing software in HEP (high-energy physics)
Data
Frequent releases
Simulation engine
Analysis framework
Experiment geometry
Experiment software
Dependencies
WLCG (Worldwide LHC Computing Grid): 170 computing centres in 42 countries
CernVM-FS
• Network file system
• No packages or layers, only files and directories
• FUSE-based
• Lazy: downloads only the files that are used
• Deduplication: downloaded files are cached in content-addressable storage
using a network filesystem
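A small sketch of why content-addressable storage deduplicates: a blob is stored under the hash of its bytes, so identical files from different releases occupy one entry. The `ContentStore` class is hypothetical and SHA-256 is used only for illustration; this is not CernVM-FS code.

```python
import hashlib


class ContentStore:
    """Stores blobs under the hash of their content."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        key = "sha256:" + hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)  # identical content is stored once
        return key

    def get(self, key: str) -> bytes:
        return self.blobs[key]


store = ContentStore()
k1 = store.put(b"shared library contents")
k2 = store.put(b"shared library contents")  # same bytes from another release
k3 = store.put(b"application contents")
```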
[Diagram: CernVM-FS read path]
User application: open /dir/file
VFS -> FUSE kernel module -> CernVM-FS FUSE
Catalog: /dir/file -> sha256:…
Cache: stat sha256:…
CernVM-FS service: GET catalog, GET /blob/sha256:…
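The read path in the diagram can be approximated like this: look the path up in the catalog, check the local cache for that hash, and download the blob only on a miss. `LazyFS` and `fetch_blob` are hypothetical names; the real client works through FUSE and HTTP, which this sketch replaces with dictionaries.

```python
import hashlib


def digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


class LazyFS:
    def __init__(self, catalog, fetch_blob):
        self.catalog = catalog        # path -> content hash
        self.fetch_blob = fetch_blob  # stand-in for GET /blob/<hash>
        self.cache = {}               # local cache keyed by content hash
        self.downloads = 0

    def read(self, path):
        key = self.catalog[path]      # catalog lookup
        if key not in self.cache:     # cache miss: download on first use
            self.cache[key] = self.fetch_blob(key)
            self.downloads += 1
        return self.cache[key]


files = {"/dir/file": b"hello", "/dir/unused": b"never read"}
server = {digest(data): data for data in files.values()}
catalog = {path: digest(data) for path, data in files.items()}

fs = LazyFS(catalog, server.__getitem__)
first = fs.read("/dir/file")
second = fs.read("/dir/file")   # served from the cache
```

The file that is never opened is never downloaded, which is the point of lazy fetching.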
Similarities between HEP software and container images
• Most images are based on a Linux distribution
• redis 3.2.3
  • Image size: 190 MB (74 MB compressed)
  • Used to boot: 11 MB (5.7 %)
• node 6.5.0: 5.4 %
• nginx 1.11: 3.1 %
Applications use a small fraction of the image
• nginx 1.10 to 1.11:
  • Real changes: 4.02 MB
  • Changed layers: 58 MB (two of the three layers)
  • 14.4 times the size of the diff
• nginx 1.9 to 1.10: 4.8 times the size of the diff
Small changes between versions
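The 14.4 figure follows directly from the two sizes quoted for nginx 1.10 to 1.11. A quick check, using only the numbers on the slide:

```python
real_changes_mb = 4.02   # files that actually differ between the versions
layer_changes_mb = 58    # size of the layers that must be downloaded again

overhead = layer_changes_mb / real_changes_mb
print(f"{overhead:.1f} times the size of the diff")
```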
Demo
CernVM-FS and runC
• Small tool to create containers
• Low-level interface, not intended to be a container runtime
• Used internally by container runtimes (Docker, Garden)
runC
Performance comparison
• http://github.com/glestaris/container-camp
• Used iCE (see PyCon UK 2015)
• 20 AWS VMs in eu-west (m4.large)
• 1 CernVM-FS server on an AWS VM (m4.large) in eu-central
• Docker Hub
Experiment setup
• All VMs create a redis:3.2.3 container in parallel
• Comparing runC, Docker and Docker with a warm cache
• Run the server and ping it (wait for the server to come up)
Scenario
redis-server --daemonize yes
while ! redis-cli ping; do
  echo 'retrying'
done
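One way such a wait loop could be timed, as a sketch. How the experiment actually recorded its timings is not stated on the slides; `wait_until` and its parameters are hypothetical, and the probe below is a stand-in rather than a real `redis-cli ping` call.

```python
import time


def wait_until(probe, timeout=30.0, interval=0.1):
    """Call `probe` until it returns True; return (attempts, seconds waited).

    Raises TimeoutError if the probe has not succeeded within `timeout`.
    """
    start = time.monotonic()
    attempts = 0
    while True:
        attempts += 1
        if probe():
            return attempts, time.monotonic() - start
        if time.monotonic() - start >= timeout:
            raise TimeoutError(f"not ready after {attempts} attempts")
        time.sleep(interval)


# Stand-in probe that succeeds on the third call. A real probe might run
# `redis-cli ping` in a subprocess and check its exit status.
calls = iter([False, False, True])
attempts, elapsed = wait_until(lambda: next(calls), timeout=5.0, interval=0.01)
```

Unlike the shell loop, this version sleeps between attempts and gives up after a timeout.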
• IPFS: InterPlanetary File System
• Deduplication: content-addressed storage for objects
• History: versioned objects
• Decentralized: P2P transfers
• Objects are files, directories or changes (commits)
Other approaches
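A toy model of content-addressed files, directories and commits, to show how versioning and deduplication combine. The object layout and helper names are invented for this sketch and are not IPFS's data format or API.

```python
import hashlib
import json

store = {}


def put(obj):
    """Store a JSON-serialisable object under the hash of its encoding."""
    data = json.dumps(obj, sort_keys=True).encode()
    key = hashlib.sha256(data).hexdigest()
    store[key] = obj
    return key


def put_dir(entries):
    # A directory refers to its children by hash.
    return put({"type": "dir", "entries": entries})


def put_commit(root, parent=None):
    # A commit refers to a directory tree and to the previous commit.
    return put({"type": "commit", "root": root, "parent": parent})


readme = put({"type": "file", "data": "hello"})
app_v1 = put({"type": "file", "data": "print(1)"})
app_v2 = put({"type": "file", "data": "print(2)"})

c1 = put_commit(put_dir({"README": readme, "app.py": app_v1}))
c2 = put_commit(put_dir({"README": readme, "app.py": app_v2}), parent=c1)
```

The unchanged README is stored once and referenced from both versions.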
• CI server
• Large clusters that fetch images in parallel
• Network contention
• Maintaining a private registry
• Serverless (?)
Use cases
