
DockerCon EU 2015: Zoe: Swarming Spark applications


Presented by Daniele Venzano, Research Engineer, EURECOM

We built Zoe, an open source user-facing service that ties together Spark, a data-intensive framework for big data computation, and Swarm, the Docker clustering system. It targets data scientists who need to run their data analysis applications without having to worry about system details. Zoe can execute long-running Spark jobs, as well as interactive Scala or iPython notebooks and streaming applications, covering the full Spark development cycle. When a computation is finished, its resources are automatically freed and made available for other uses, since all processes run in Docker containers.

In this talk we present why Zoe, the Container Analytics as a Service, was born, its architecture, and the problems it tries to solve. Zoe would not exist without Swarm and Docker, and we will also talk about some of the stumbling blocks we encountered and the solutions we found, in particular in transparently connecting Docker hosts through a physical network. Zoe was born as a research prototype, but it is now stable and is being used to run real jobs from users in our research institution. Application scheduling on top of Swarm and optimized container placement will also be covered during the presentation.

Published in: Technology


  1. Zoe: Swarming Spark applications
     Daniele Venzano, Research Engineer, EURECOM
  2. My background
     Software engineering (2010)
       • Linux embedded systems, kernel drivers, graphical interfaces
     Research (2012)
       • Code analysis, OpenFlow, automatic bug detection
     More research (now)
       • Virtualization, networking, distributed systems performance
  3. DSG and Eurecom
     Research center on the French Riviera. Like this?
  4. DSG and Eurecom
     Research center on the French Riviera. Or more like this?
  5. DSG and Eurecom
     Engineering research center
       • Academic research in telecommunication, multimedia, networks and security
       • Close ties with local and international companies
     Distributed Systems Group
       • Focusing on data-intensive applications (so-called “big data”) at all levels
       • Performance impact of virtualization, storage and network technologies (that’s me!)
       • Data processing frameworks (Hadoop, Spark)
       • Machine learning algorithms
  6. Docker at the Distributed Systems Group
     Started investigating Docker in 2012
       • Virtualization platform for Big Data research
     Summer 2015
       • Built Swarm cluster
       • Planning to shift from VMs to Containers for most use cases
     Bigfoot project
  7. Use cases
     Internally at Eurecom:
       • Laboratory sessions for Data Science course
           • ~100 students, fixed configuration, throw-away environments
       • Academic research
           • very dynamic loads, all kinds of software combinations, higher priorities near deadlines
     Companies have similar use cases
       • Production jobs
           • Fixed configuration, periodic executions
       • Research teams
     Smart airports
     Power load forecasting
     Customer location forecasting
  8. The last 3 years: OpenStack + Sahara
     Public/private cloud with VM-based virtualization
     We contributed Spark support to Sahara
     Users can create clusters on-demand
     Assumes infinite resources
     Slow
       • Create an HDFS+Spark cluster: 5 to 10 minutes
       • Swarm takes a few seconds for the same task
     Supporting new services/versions requires code changes
     Users make static allocations
  9. Why build on top of Docker and Swarm?
     Swarm has a simple, documented API
     Start solving our problem immediately
     Packaging software is very easy
     Freedom to experiment
     Fast deployments
     No static allocation, automatic resizing
     Swarm does only one thing and does it well
  10. Zoe
      Application scheduler on top of Swarm
      Queues requests when resources are scarce
      Users can submit their own applications
      And create their own container images!
      Dynamically resizes active applications
      Free unused resources to speed-up other apps
      Can coexist with other Swarm users
      MSC Zoe
        • Launch: August 2015
        • Tonnage: 197,362t
        • Capacity: 19,224 TEU
        • Length: 395.4 m
        • Engine: 83,800 HP
        • Crew: 22
  11. What is a Zoe application?
  12. Zoe architecture
      Diagram labels: Zoe scheduler; Swarm; Images from private registry or Docker Hub; Monitoring data
      Users submit application descriptions
      Zoe schedules requests
  13. Automatic resize of running applications
      Diagram labels: Volumes; Data layer; Applications
      Example: a data layer is not needed if there are no users
      Data is kept in volumes
      The data layer can be restarted when needed
  14. Examples of scheduling policies
      FIFO – First In First Out
      Priority based
        • Researchers near deadlines have more priority
        • Fits nicely the Swarm priority model
      Deadline
        • Finish this work by 3 p.m.
        • Streaming analysis latency must be less than 200ms
      Size-based
        • Run first the smallest applications
        • Need to know the runtime in advance
  15. Zoe implementation
      Two client implementations
        • Web interface
        • Command line for scripting
      Simple FIFO scheduler
      Docker images for Spark, HDFS, iPython and Spark notebooks
      Open source on GitHub, images available on the Docker Hub
  16. Zoe - future
      Set date: March 2016, version 1.0
      Big plans for Zoe
        • One full-time programmer
        • All the companies we spoke to are very interested
      Features for 1.0 and after:
        • Create Zoe applications with more and more services
        • Automatic resizing of applications
        • Use the new volume management
        • Monitoring
        • Advanced scheduling
  17. Using Docker Swarm for data-intensive apps
      L2 networking for Docker containers
      Service discovery via DNS
      Diagram labels: Docker bridge, eth0, eth1 (twice); containers c1, c2, c3, c4
      What about Swarm 1.0 multi-host networking?
        - We need hostnames to be visible from outside
        - Will run measurements on overlay network performance
  18. Key takeaways
      1. Zoe is a data-intensive application scheduler that targets data scientists and private clouds
      2. It is very easy to build cloud applications on top of Swarm
      3. Data-intensive frameworks like Spark can run easily and efficiently on top of Swarm
      4. Network between Docker containers on different hosts can be made transparent
  19. Thank you!
      Daniele Venzano
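
The queueing behaviour on slide 10 and three of the policies on slide 14 (FIFO, priority-based, size-based) can be illustrated with a small sketch. This is not Zoe's code: the names `AppRequest`, `Scheduler`, `fifo`, `by_priority` and `smallest_first` are hypothetical, cluster capacity is reduced to a single count of containers, and nothing here talks to Swarm. It only shows how a scheduler might hold requests back when resources are scarce and pick the next one according to a pluggable ordering.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AppRequest:
    """Hypothetical application request (not Zoe's real description format)."""
    name: str
    containers: int            # containers the application needs
    priority: int = 0          # larger value runs earlier under by_priority
    est_runtime: float = 0.0   # estimated runtime, required by smallest_first
    seq: int = 0               # submission order, set by the scheduler


# Each policy is a sort key: the pending request with the smallest key runs first.
def fifo(req: AppRequest):
    return req.seq


def by_priority(req: AppRequest):
    return (-req.priority, req.seq)


def smallest_first(req: AppRequest):
    # Size-based: needs the runtime to be known (or estimated) in advance.
    return (req.est_runtime, req.seq)


class Scheduler:
    """Queues requests while resources are scarce and starts them by policy."""

    def __init__(self, capacity: int, policy: Callable = fifo):
        self.free = capacity                      # containers currently unused
        self.policy = policy
        self.pending: List[AppRequest] = []
        self.running: Dict[str, AppRequest] = {}
        self._seq = 0

    def submit(self, req: AppRequest) -> List[str]:
        req.seq = self._seq
        self._seq += 1
        self.pending.append(req)
        return self._dispatch()

    def finish(self, name: str) -> List[str]:
        # A finished application frees its containers for queued requests.
        req = self.running.pop(name)
        self.free += req.containers
        return self._dispatch()

    def _dispatch(self) -> List[str]:
        # Start requests in policy order; stop at the first one that does not
        # fit (no backfilling, a simplification chosen for this sketch).
        started = []
        self.pending.sort(key=self.policy)
        while self.pending and self.pending[0].containers <= self.free:
            req = self.pending.pop(0)
            self.free -= req.containers
            self.running[req.name] = req
            started.append(req.name)
        return started
```

For example, with `Scheduler(capacity=4, policy=smallest_first)`, a request that needs all four containers starts at once, later requests wait in the queue, and `finish()` on the first application returns the names of the queued applications that could then be started, shortest estimated runtime first. The deadline policy from slide 14 is left out because it would need a model of elapsed time.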