Presented by Daniele Venzano, Research Engineer, EURECOM
We built Zoe, an open source, user-facing service that ties together Spark, a framework for data-intensive big data computation, and Swarm, the Docker clustering system. It targets data scientists who need to run their data analysis applications without worrying about system details. Zoe can execute long-running Spark jobs, as well as Scala or IPython interactive notebooks and streaming applications, covering the full Spark development cycle. Because all processes run in Docker containers, resources are freed automatically when a computation finishes and become available for other uses.
In this talk we present why Zoe, the Container Analytics as a Service, was born, its architecture, and the problems it tries to solve. Zoe would not exist without Swarm and Docker, and we will also talk about some of the stumbling blocks we encountered and the solutions we found, in particular in transparently connecting Docker hosts through a physical network. Zoe was born as a research prototype, but it is now stable and is used to run real jobs from users in our research institution. The presentation will also cover application scheduling on top of Swarm and optimized container placement.