Choosing the Right Framework for Running Docker Containers in Prod

In this talk, I cover the key elements of running multiple Docker containers per VM, the major frameworks available to assist with this, and when to choose each.

  1. 1. Choosing the Right Framework for Running Docker Containers in Prod Presented by: Josh Padnick Phoenix DevOps
  2. 2. Docker is a game-changer.
  3. 3. Why Devs • Lightweight: Containers are just isolated processes. We can start a new container in seconds. • Portable: My Mac, the Linux EC2 instance, and your Windows PC all run the exact same container. • Ecosystem: I can easily share images, manage private images, and use “official” images for virtually all open source software.
  4. 4. Why Devs • Squeeze More Resources out of a Single Server: Did you know this dirty secret of the Infrastructure-as-a-Service world? [Chart: Typical Data Center Resource Utilization; values 85% and 15%; legend: In Use, Free] SOURCE: http://radar.oreilly.com/2014/12/why-the-data-center-needs-an-operating-system.html
  5. 5. So can I run multiple containers in a single VM?
  6. 6. Something like this? [Diagram: VM-1 and VM-2 running containers Service A, Service A, Service B, Service C, Service B]
  7. 7. Yes! But, well…
  8. 8. The Gartner Tech Hypecycle Any guesses where the “multi-container VM” paradigm is? SOURCE: https://setandbma.wordpress.com/2012/05/28/technology-adoption-shift/
  9. 9. My Take on This In reality, the exact spot varies by team, so this is a bit of a generalization.
  10. 10. Today’s talk is about our options for that red dot. For each option, we’ll cover: • Pros • Cons • When to use
  11. 11. Josh Padnick: I help software teams scale their app using DevOps and AWS. http://PhoenixDevOps.com @OhMyGoshJosh • Full-stack web-app engineer for 12+ years. • Since I’ve worked with many different teams, I generally help teams accelerate the DevOps/AWS learning curve. • PhxDevOps clients include: Intel, Infusionsoft, American Bible Society, CÜR Music, plus multiple startups and web design companies. • These slides are posted on http://joshpadnick.com • Want to know more about building scalable apps on AWS? Check out a 12,000+ word article I wrote at http://bit.ly/1EtYRbL.
  12. 12. Disclaimers • I have a bias toward AWS and may leave out solutions from other IaaS providers such as Azure. • The solutions we cover today are deep and diverse. This talk reflects my own experiences, but your mileage may vary!
  13. 13. Agenda • CoreOS in 60 seconds • Theory of Multi-Container VM’s • The Three Paradigms of Multi-Container VM’s • Cover all the Major Solutions
  14. 14. We only have 60 minutes. So we’ll move fast.
  15. 15. CoreOS in 60 Seconds
  16. 16. What is CoreOS? • It’s super stripped-down Linux. You don’t even get a package manager. • The idea is you run everything as a container. • CoreOS is based on ChromiumOS, which itself is based on Gentoo Linux. • Uses systemd for init.
  17. 17. CoreOS and This Presentation • Because CoreOS is “built for Docker”, many solutions use it as their default OS. • In reality, you can usually use any OS that runs Docker natively, but CoreOS is often the “recommended” Linux distro for Docker.
  18. 18. Theory of Multi-Container VM’s
  19. 19. What does it take to run this? [Diagram: VM-1 and VM-2 running containers Service A, Service A, Service B, Service C, Service B]
  20. 20. Docker Builder We need somewhere to build our image.
  21. 21. • If we build from a fresh environment each time, every Docker image/layer is downloaded from scratch. • Ideally, our “Docker Builder” has pre-downloaded (“seeded”) all our most popular Docker images. • In practice, this is managed by your build tool, like Jenkins, CircleCI, Shippable, etc. Docker Builder We need somewhere to build our image.
  22. 22. Docker Registry We need somewhere to store our built images. Docker Builder We need somewhere to build our image.
  23. 23. • Main options here are: • Cloud: Docker Hub, Quay.io • On-Premise: Docker Trusted Registry (paid), Docker Distribution (free), Quay.io (paid) Docker Registry We need somewhere to store our built images. Docker Builder We need somewhere to build our image.
  24. 24. Docker Registry We need somewhere to store our built images. Automated Deployment We need a way to deploy our Docker image into the cluster. Docker Builder We need somewhere to build our image.
  25. 25. Docker Registry We need somewhere to store our built images. Automated Deployment We need a way to deploy our Docker image into the cluster. Container Scheduling We need something to decide which host will launch our new container. Docker Builder We need somewhere to build our image.
  26. 26. [Diagram: VM-1 and VM-2 running containers Service A, Service A, Service B, Service C, Service B]
  27. 27. [Diagram: same cluster] “Increase Service C container count from 1 to 2.”
  28. 28. [Diagram: same cluster, now with a second Service C container]
  29. 29. Docker Builder We need somewhere to build our image. Docker Registry We need somewhere to store our built images. Automated Deployment We need a way to deploy our Docker image into the cluster. Container Scheduling We need something to decide which host will launch our new container.
  30. 30. Docker Registry We need somewhere to store our built images. Automated Deployment We need a way to deploy our Docker image into the cluster. Container Scheduling We need something to decide which host will launch our new container. • One of the most important considerations when choosing a host is “who’s got the memory and CPU I need?” • But we also need to know who’s in a different Availability Zone / Data Center so we can achieve high fault tolerance. Docker Builder We need somewhere to build our image.
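The two scheduling considerations on this slide (enough free memory and CPU, plus spreading across Availability Zones) can be sketched in a few lines of Python. This is only an illustrative toy, not how any of the frameworks in this talk implement scheduling; the `Host` fields, the `pick_host` function, and the host and zone names are all made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    name: str
    zone: str          # availability zone / data center label (hypothetical)
    free_cpu: float    # free CPU cores
    free_mem_mb: int   # free memory in MB


def pick_host(hosts: list[Host], cpu: float, mem_mb: int,
              zones_in_use: set[str]) -> Optional[Host]:
    """Return a host with enough free CPU and memory, preferring a zone the
    service is not already running in. Returns None if nothing fits."""
    candidates = [h for h in hosts
                  if h.free_cpu >= cpu and h.free_mem_mb >= mem_mb]
    if not candidates:
        return None
    # Sort key: unused zones first (False sorts before True), then most free memory.
    return min(candidates,
               key=lambda h: (h.zone in zones_in_use, -h.free_mem_mb))


hosts = [
    Host("vm-1", "us-east-1a", free_cpu=2.0, free_mem_mb=4096),
    Host("vm-2", "us-east-1b", free_cpu=1.0, free_mem_mb=2048),
    Host("vm-3", "us-east-1a", free_cpu=0.5, free_mem_mb=512),
]

# Service C already runs in us-east-1a, so a second copy should prefer another zone.
chosen = pick_host(hosts, cpu=1.0, mem_mb=1024, zones_in_use={"us-east-1a"})
print(chosen.name if chosen else "no host fits")
```

Here vm-3 is filtered out for lack of resources, and vm-2 wins over vm-1 because it sits in a zone the service does not yet occupy. A real scheduler would also track reservations, ports, and constraints.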
  31. 31. Docker Registry We need somewhere to store our built images. Automated Deployment We need a way to deploy our Docker image into the cluster. Routing / Load Balancing We need a way to route a request to any of our containers. Container Scheduling We need something to decide which host will launch our new container. Docker Builder We need somewhere to build our image.
  32. 32. [Diagram: VM-1 and VM-2 running containers Service A, Service A, Service B, Service C, Service B]
  33. 33. [Diagram: same cluster, with an incoming request] GET ServiceB HTTP/1.1
  34. 34. [Diagram: same cluster, fronted by a Routing / Load Balancing Solution]
  35. 35. Docker Registry We need somewhere to store our built images. Automated Deployment We need a way to deploy our Docker image into the cluster. Routing / Load Balancing We need a way to route a request to any of our containers. Container Scheduling We need something to decide which host will launch our new container. Docker Builder We need somewhere to build our image.
  36. 36. Docker Registry We need somewhere to store our built images. Automated Deployment We need a way to deploy our Docker image into the cluster. Service Discovery When we launch new containers, we need to tell our router they exist. Routing / Load Balancing We need a way to route a request to any of our containers. Container Scheduling We need something to decide which host will launch our new container. Docker Builder We need somewhere to build our image.
  37. 37. [Diagram: VM-1 and VM-2 running containers Service A, Service A, Service B, Service C, Service B, fronted by a Routing / Load Balancing Solution]
  38. 38. [Diagram: same cluster and router] “Increase Service C container count from 1 to 2.”
  39. 39. [Diagram: same cluster and router, now with a second Service C container]
  40. 40. [Diagram: same cluster and router, now with a second Service C container]
  41. 41. [Diagram: same cluster and router, now with a second Service C container]
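The interplay between service discovery and routing shown in these slides can be sketched as a toy in-memory registry: new containers register themselves, and the router picks a backend round-robin. The `ServiceRegistry` class and the `vm-1:8003` style addresses are hypothetical; a real cluster would keep this information in a cluster datastore rather than in process memory.

```python
import itertools
from collections import defaultdict


class ServiceRegistry:
    """Toy registry mapping a service name to its running containers."""

    def __init__(self):
        self._backends = defaultdict(list)   # service name -> ["host:port", ...]
        self._cursors = {}                   # service name -> round-robin iterator

    def register(self, service, address):
        """Service discovery: called when a new container starts."""
        self._backends[service].append(address)
        # Rebuild the iterator so the new backend joins the rotation.
        self._cursors[service] = itertools.cycle(list(self._backends[service]))

    def route(self, service):
        """Routing: return the next backend for the service, round-robin."""
        if not self._backends[service]:
            raise LookupError(f"no running containers for {service}")
        return next(self._cursors[service])


registry = ServiceRegistry()
registry.register("service-c", "vm-1:8003")
first = registry.route("service-c")

# "Increase Service C container count from 1 to 2."
registry.register("service-c", "vm-2:8003")
picks = [registry.route("service-c") for _ in range(4)]
print(first, picks)
```

Until the second container registers, every request lands on vm-1; afterwards requests alternate between the two hosts. Deregistering failed containers is left out to keep the sketch short.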
  42. 42. Docker Registry We need somewhere to store our built images. Automated Deployment We need a way to deploy our Docker image into the cluster. Service Discovery When we launch new containers, we need to tell our router they exist. Routing / Load Balancing We need a way to route a request to any of our containers. Container Scheduling We need something to decide which host will launch our new container. Docker Builder We need somewhere to build our image.
  43. 43. Docker Registry We need somewhere to store our built images. Automated Deployment We need a way to deploy our Docker image into the cluster. Auto-Restart Failed Containers Something needs to know that a container failed and auto-restart it. Service Discovery When we launch new containers, we need to tell our router they exist. Routing / Load Balancing We need a way to route a request to any of our containers. Container Scheduling We need something to decide which host will launch our new container. Docker Builder We need somewhere to build our image.
  44. 44. Docker Registry We need somewhere to store our built images. Automated Deployment We need a way to deploy our Docker image into the cluster. Auto-Restart Failed Containers Something needs to know that a container failed and auto-restart it. Extract Container Logs We need a way to read logs from all containers. Service Discovery When we launch new containers, we need to tell our router they exist. Routing / Load Balancing We need a way to route a request to any of our containers. Container Scheduling We need something to decide which host will launch our new container. Docker Builder We need somewhere to build our image.
  45. 45. Docker Registry We need somewhere to store our built images. Automated Deployment We need a way to deploy our Docker image into the cluster. Extract Container Logs We need a way to read logs from all containers. Monitor Everything We need to monitor cluster resources and individual containers. Auto-Restart Failed Containers Something needs to know that a container failed and auto-restart it. Service Discovery When we launch new containers, we need to tell our router they exist. Routing / Load Balancing We need a way to route a request to any of our containers. Container Scheduling We need something to decide which host will launch our new container. Docker Builder We need somewhere to build our image.
  46. 46. Docker Registry We need somewhere to store our built images. Automated Deployment We need a way to deploy our Docker image into the cluster. Extract Container Logs We need a way to read logs from all containers. Monitor Everything We need to monitor cluster resources and individual containers. Auto-Restart Failed Containers Something needs to know that a container failed and auto-restart it. Service Discovery When we launch new containers, we need to tell our router they exist. Routing / Load Balancing We need a way to route a request to any of our containers. Container Scheduling We need something to decide which host will launch our new container. Docker Builder We need somewhere to build our image.
  47. 47. Does our cluster have “state”? • Yes! • Router needs to know which containers are from which services. • We need to know which hosts are actually in our cluster.
  48. 48. Storing Cluster State • This topic alone warrants full books. • One option for storing state is to simply use a database like PostgreSQL. • But the more popular option is for a set of hosts in the cluster to replicate state among themselves and agree on every change using a “consensus algorithm.” I call this a cluster datastore. • The most popular such solutions are: etcd, Consul, and ZooKeeper.
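One practical consequence of using a consensus-based cluster datastore is worth a tiny worked example. etcd and Consul use Raft and ZooKeeper uses Zab, and all of them need a majority of members reachable to accept writes. The helper names below are made up for illustration; the arithmetic is the standard majority rule.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest majority of the cluster."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return cluster_size - quorum_size(cluster_size)


for n in (1, 3, 4, 5):
    print(f"{n} members: quorum {quorum_size(n)}, "
          f"survives {tolerated_failures(n)} failure(s)")
```

Note that going from 3 to 4 members does not buy any extra fault tolerance, which is why these datastores are usually run with an odd number of members.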
  49. 49. Unit of Container Deployment • We need something that describes what kind of container to deploy. • This is typically a declarative file in either YAML or JSON that declares all aspects of our docker run command, whether 2+ containers are run together, etc.
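To make the idea of a declarative unit of deployment concrete, here is a minimal sketch that turns a small spec (the kind of thing you would write in YAML or JSON) into a `docker run` command line. The spec keys (`name`, `image`, `ports`, `env`, `memory`) and the image name are invented for this example; each framework in this talk defines its own schema. The Docker flags used (`-d`, `--name`, `-p`, `-e`, `--memory`) are standard `docker run` options.

```python
from __future__ import annotations

import shlex


def docker_run_command(spec: dict) -> list[str]:
    """Translate a declarative container spec into docker run arguments."""
    cmd = ["docker", "run", "-d", "--name", spec["name"]]
    for host_port, container_port in spec.get("ports", {}).items():
        cmd += ["-p", f"{host_port}:{container_port}"]
    for key, value in spec.get("env", {}).items():
        cmd += ["-e", f"{key}={value}"]
    if "memory" in spec:
        cmd += ["--memory", spec["memory"]]
    cmd.append(spec["image"])
    return cmd


# Hypothetical spec, as it might look after loading a YAML or JSON file.
spec = {
    "name": "service-b",
    "image": "example/service-b:1.0",
    "ports": {8080: 80},
    "env": {"LOG_LEVEL": "info"},
    "memory": "256m",
}

command = shlex.join(docker_run_command(spec))
print(command)
```

The point of the declarative file is that the framework, not a human, performs this translation on whichever host the scheduler picks.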
  50. 50. The Theory in Summary • Docker builder • Docker registry • Automated deployment • Container scheduling • Routing / Load Balancing • Service discovery • Auto-restart failed containers • Logging • Monitoring • Cluster datastore • Unit of Container Deployment
  51. 51. Paradigms of Multi-Container VM’s
  52. 52. Three Paradigms • Cluster Frameworks • Platform-as-a-Service (PaaS) • Data Center Operating Systems
  53. 53. Paradigm #1: Cluster Frameworks
  54. 54. The Big Idea • You control the infrastructure (e.g. AWS, Azure) • You’re given an unopinionated set of primitives on top of which you can build your own solution. • Primitives include launching containers, but not full deployment.
  55. 55. Major Cluster Frameworks: CoreOS + Fleet, EC2 Container Service (ECS), Docker Swarm (We’ll cover each of these today)
  56. 56. Paradigm #2: Platform-as-a-Service (PaaS)
  57. 57. The Big Idea • You control the infrastructure (e.g. AWS, Azure) • Install the PaaS tool on top of your own infrastructure. • PaaS tool typically sits on top of a Cluster Framework. • You’re not 100% sure how it works, but it solves your needs today and you can always deep dive later, or (hopefully) get commercial support.
  58. 58. Major PaaS Solutions (We’ll cover Deis in depth shortly) https://github.com/remind101/empire
  59. 59. Paradigm #3: Data Center Operating Systems
  60. 60. The Big Idea • You control the infrastructure (e.g. AWS, Azure) • You’re given an opinionated framework which has everything you need to deploy. • You operate at the abstraction level of “cluster” and really don’t care when individual hosts die. • These tend to be the most powerful, and the most complex.
  61. 61. Major Data Center Operating System Frameworks: Mesos, Kubernetes (We’ll cover each of these today) ( + ? )
  62. 62. Hybrid Paradigms: Where A runs on B
  63. 63. Hybrid Combo #1: Run Kubernetes on Mesos
  64. 64. Hybrid Combo #2: Run Kubernetes as a Hosted Service
  65. 65. Hybrid Combo #3: Run the Open Source PaaS “Empire” on top of EC2 Container Service https://github.com/remind101/empire
  66. 66. Hybrid Combo #4: Use Docker Swarm as the UX to Mesos.
  67. 67. Mentally Managing the Hybrids • Don’t get too caught up on these exotic combinations. • Focus first on one of the “non-hybrid” technologies. • Then evaluate what can be run on top of your choice technology, and whether it will make life easier for you.
  68. 68. The Major Solutions
  69. 69. Disclaimers • We just don’t have enough time to cover each solution in depth. • We only get about 3 minutes per solution, so let’s get started!

  70. 70. Cluster Framework Solution: CoreOS + Fleet
  71. 71. How It Works • Launch a CoreOS cluster on the IaaS platform of your choice (e.g. AWS, Azure, VMware, etc.) • CoreOS comes with a CLI tool, fleet, that enables launching containers but does not constitute a full deployment system. • Best thought of as a set of primitives you can work with, not a full-fledged framework. • Define systemd unit files to describe the Docker container you want to launch.
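As a rough illustration of the last bullet, the sketch below renders a systemd unit file that wraps a Docker container, with an optional `[X-Fleet]` section. The function name, unit name, and image are hypothetical, and the unit contents are an assumption about what a typical fleet unit looks like rather than anything taken from this talk; check the fleet documentation for the exact `[X-Fleet]` options (such as `Conflicts`) before relying on it.

```python
def render_fleet_unit(name: str, image: str, conflicts: str = "") -> str:
    """Render a systemd unit that runs one Docker container.

    `conflicts` is an optional glob of unit names that fleet should not
    co-locate with this unit (assumed [X-Fleet] Conflicts semantics).
    """
    lines = [
        "[Unit]",
        f"Description={name}",
        "After=docker.service",
        "Requires=docker.service",
        "",
        "[Service]",
        # The leading '-' tells systemd to ignore a failure of this cleanup step.
        f"ExecStartPre=-/usr/bin/docker rm -f {name}",
        f"ExecStart=/usr/bin/docker run --name {name} {image}",
        f"ExecStop=/usr/bin/docker stop {name}",
        "Restart=always",
    ]
    if conflicts:
        lines += ["", "[X-Fleet]", f"Conflicts={conflicts}"]
    return "\n".join(lines) + "\n"


unit = render_fleet_unit("service-b-1", "example/service-b:1.0",
                         conflicts="service-b-*")
print(unit)
```

A file like this would then be handed to fleet (for example with `fleetctl start`), which picks a machine and lets systemd on that machine run it. Note that nothing in the unit describes CPU or memory needs, which matches the lack of resource-aware scheduling discussed on the following slides.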
  72. 72. • Docker Builder: Roll Your Own • Docker Registry: Roll Your Own • Deployment (Scheduling): Built into fleet, but no resource-aware scheduling • Deployment (Routing): Roll Your Own • Deployment (Service Discovery): Roll Your Own • Auto-Restart Failed Containers: Built into fleet • Monitoring: Roll Your Own • Logging: Roll Your Own • Cluster Data Store: etcd • Unit of Deployment: systemd unit file
  73. 73. Pros • Relatively mature/stable among Cluster Frameworks. • Once you’ve set up etcd, everything else “just works”. • RESTful API into fleet allows for easily building out your own custom solution. • Fleet will auto-restart failed containers. • Tagging cluster nodes allows for clever distribution of containers (e.g. across Availability Zones). • CoreOS gives us a well-defined method for updating individual cluster nodes to the latest CoreOS. • Commercial support available.
  74. 74. Cons • Setting up etcd can be painful. • Fleet does not allow resource-aware scheduling, so containers may run out of resources. • Fleet does not expose a primitive for “transferring” a container from one cluster node to another. • No built-in way to monitor cluster-wide resource consumption. • Not usable for a production cluster without significant setup overhead (e.g. setting up service discovery). • Learning fleet ultimately requires learning systemd and discovering what fleet commands actually do.
  75. 75. When To Use It • You want to learn the foundations of CoreOS. • You want high customizability over your setup and can tolerate non-resource-aware scheduling. • You’re willing to manually handle many operations such as launching additional containers.
  76. 76. Cluster Framework Solution: EC2 Container Service (ECS)
  77. 77. How It Works • Launch at least 3 EC2 instances in AWS. • Install the ECS agent on each node (or launch an AMI with the agent pre-installed). • Cluster setup “just works”. • Define a “Task Definition” to describe how one or more Docker containers should be launched. • Define a “Service” that launches one or more instances of the Task Definition, and ECS auto-deploys your Tasks (containers).
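To give a feel for the Task Definition and Service concepts, here is a sketch of their general shape, built as Python dictionaries and printed as JSON. The field names (`family`, `containerDefinitions`, `portMappings`, `desiredCount`, and so on) follow the ECS API, but the values, the image name, and the service name are placeholders, and this is not a complete or validated definition.

```python
import json

# A Task Definition describes how one or more containers should be launched.
task_definition = {
    "family": "service-b",
    "containerDefinitions": [
        {
            "name": "service-b",
            "image": "example/service-b:1.0",   # hypothetical image
            "cpu": 256,                          # CPU units reserved for the container
            "memory": 512,                       # memory limit in MiB
            "essential": True,
            "portMappings": [{"containerPort": 80, "hostPort": 8080}],
        }
    ],
}

# A Service asks ECS to keep a number of copies of that task running.
service = {
    "serviceName": "service-b",
    "taskDefinition": "service-b",
    "desiredCount": 2,
}

task_definition_json = json.dumps(task_definition, indent=2)
print(task_definition_json)
print(json.dumps(service, indent=2))
```

Because the task declares its CPU and memory needs up front, the scheduler has what it needs to place tasks in a resource-aware way.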
  78. 78. • Docker Builder: Roll Your Own • Docker Registry: Roll Your Own • Deployment (Scheduling): Resource-aware, pluggable scheduler. Can be swapped w/ custom one. • Deployment (Routing): Leverages AWS Elastic Load Balancers • Deployment (Service Discovery): Built in to services • Auto-Restart Failed Containers: Built in to services • Monitoring: Basic monitoring included at cluster level. • Logging: Roll Your Own • Cluster Data Store: Zookeeper (but this is hidden to us) • Unit of Deployment: Task Definition
  79. 79. Pros • Very easy to set up. • Simple UX. • Low learning curve. • Covers most of what you need out of the box, including built-in routing and service discovery. • Presumably AWS will keep improving it. • Supported via AWS.
  80. 80. Cons • Doesn’t support dynamic port mapping from container to host. • Each service requires its own Elastic Load Balancer, which is $18/month (unless you’re willing to expose a service on a port other than 80/443). • Supports rolling deployments provided you have a spare node to launch a new service instance on. Blue/Green deployments are claimed as a feature, but require out-of-band customization. • It is not recommended to leverage the existing Zookeeper cluster datastore, so you may have to run two cluster datastores (e.g., Zookeeper + Consul). • Use of “private subnets” requires two separate clusters, one for public and one for private.
  81. 81. When To Use It • You use AWS and… • You want to get up and running quickly with your Docker-based microservices framework. • You want to run your monolith using containers today, knowing you can migrate to other cluster technologies in the future. • You want an official solution with official support. • You want to minimize the number of vendors/technologies you work with.
  82. 82. Cluster Framework Solution: Docker Swarm
  83. 83. How It Works • You use “docker-machine” to launch multiple EC2 instances (or other VMs). Each EC2 instance is configured with the Docker daemon and the docker-swarm agent (which is just a container). • You launch one or more “Swarm Masters”, one of which is the “master leader.” You use this to control your cluster. • You can now run Docker CLI commands to launch containers on your cluster.
  84. 84. • Docker Builder: Roll Your Own • Docker Registry: Roll Your Own • Deployment (Scheduling): Resource-aware, pluggable scheduler. Can be swapped w/ custom one. • Deployment (Routing): Roll Your Own • Deployment (Service Discovery): Roll Your Own • Auto-Restart Failed Containers: Open GitHub Issue: https://github.com/docker/swarm/issues/599 • Monitoring: Roll Your Own • Logging: Roll Your Own • Cluster Data Store: Pluggable! See https://docs.docker.com/swarm/discovery/ • Unit of Deployment: Docker container, or Docker Compose manifest
  85. 85. Pros • Use the Docker CLI you’ve come to know and love. • You can run other Docker tools that call the old Docker CLI directly on top of Swarm and they will “just work”. Does this matter? • Potentially simpler to program against compared to CoreOS fleet. • Resource-aware scheduling. • An open source event bus allows for interesting possibilities in response to cluster events (esp. around service discovery). • Official Docker solution.
  86. 86. Cons • Not yet recommended for production. • My own experience with docker-machine and docker-swarm has been underwhelming in terms of stability. • Other than “official Docker solution” and “use Docker CLI”, I don’t see any features superior to the alternatives.
  87. 87. When To Use It • For experiments and curiosity. • Docker Swarm is intriguing, but it is under heavy development and doesn’t yet present a clear value proposition compared to alternatives. • But check back in 6 months, and it may be a solid contender. See Project Orca for an exciting opinionated take on the UX: https://youtu.be/8vSPpPSd00w?t=1h25m16s

  88. 88. PaaS Solution: Deis
  89. 89. How Deis Works Control Plane Component Cluster Control Plane Component Control Plane Component Data Plane Component Data Plane Component Router per Host Router per Host
  90. 90. How Deis Works Control Plane Component Cluster Control Plane Component Control Plane Component Data Plane Component Data Plane Component Router per Host Router per Host Service A Service B Service B Service C
  91. 91. Deis Workflow for Devs (basically, Heroku on your own infrastructure) SOURCE: http://docs.deis.io/en/latest/understanding_deis/concepts/
  92. 92. How Deis Works for Operators SOURCE: http://docs.deis.io/en/latest/understanding_deis/architecture/
  93. 93. • Docker Builder: Included! • Docker Registry: Included! • Deployment (Scheduling): Defaults to resource-unaware fleet. Pluggable schedulers are in tech-preview. • Deployment (Routing): Included! • Deployment (Service Discovery): Included! • Auto-Restart Failed Containers: Included! • Monitoring: Roll Your Own • Logging: Built-in logspout means you can send your logs anywhere. • Cluster Data Store: PostgreSQL + Ceph • Unit of Deployment: Heroku buildpack, Dockerfile, or Docker image.
  94. 94. Pros • Everything in one package, ready to go. Get up and running pretty quickly. • Nice workflow for devs. • Open source. • Great community. • Good paradigm of what you would eventually need to build. • Commercial support available through Engine Yard.
  95. 95. Cons • Learning curve for operators can feel steep. • When the PaaS fails, it’s time to start climbing the learning curve. For example, I once terminated a node, broke the Ceph cluster, and had to dig into the guts to figure out how to fix it. • Deis’s architectural opinions may differ from your own.
  96. 96. Data Center Operating Solution #1: Mesos
  97. 97. How Mesos Works [Diagram: an empty cluster]
  98. 98. How Mesos Works [Diagram: cluster with three Zookeeper nodes]
  99. 99. How Mesos Works [Diagram: three Zookeeper nodes] Our job is to store cluster state!
  100. 100. How Mesos Works [Diagram: three Zookeeper nodes and three Mesos Masters]
  101. 101. How Mesos Works [Diagram: three Zookeeper nodes and three Mesos Masters] We make “resource offers” to “frameworks”.
  102. 102. How Mesos Works [Diagram: three Zookeeper nodes and three Mesos Masters]
  103. 103. How Mesos Works [Diagram: three Zookeeper nodes, three Mesos Masters, and ten Mesos Slaves (aka Agents)] We run “tasks”, which are Docker containers. We take our orders from the Master.
  104. 104. How Mesos Works [Diagram: same cluster] A Resource Offer: “Master, I have 2 CPU cores, 8 GB RAM, and 25 GB of disk space available!”
  105. 105. How Mesos Works [Diagram: same cluster] A Resource Offer: “Master, why go for 2 CPUs when I’ve got 3 CPUs and 10 GB of RAM!”
  106. 106. You’ve just seen Tier 1 of the Mesos resource scheduling algorithm.
  107. 107. How Mesos Works [Diagram: same cluster] Master Forwards Resource Offers: “I have received your offers and they will be forwarded to whomever I please.”
  108. 108. How Mesos Works [Diagram: same cluster, plus four frameworks: Chronos, Marathon, Apache Spark, Cassandra]
  109. 109. How Mesos Works [Diagram: same cluster and frameworks] Master Forwards Resource Offers: “Marathon, I choose you first! I can offer you 3 CPUs and 10 GB of RAM.”
  110. 110. How Mesos Works [Diagram: same cluster and frameworks] Frameworks Accept/Reject Offers: “Yawn. Pass.”
  111. 111. How Mesos Works [Diagram: same cluster and frameworks] Master Forwards Resource Offers: “Chronos, surely you have need of resources. I offer you the same!”
  112. 112. How Mesos Works [Diagram: same cluster and frameworks] Frameworks Accept/Reject Offers: “Let’s do this. I need 1 GB of RAM and 1 CPU core.”
  113. 113. How Mesos Works [Diagram: same cluster and frameworks] Master Schedules Task on a Slave: “It shall be so! Agent #22, you shall run this task.”
  114. 114. How Mesos Works [Diagram: same cluster and frameworks] Slave Receives the Task: “Strength and honor, sire!”
  115. 115. How Mesos Works [Diagram: same cluster and frameworks] Slave starts running the task as a Docker container.
  116. 116. How Mesos Works [Diagram: same cluster and frameworks]
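The offer cycle walked through in the slides above (agents offer resources, the master forwards each offer to frameworks in turn, and a framework either passes or accepts with a task) can be sketched as a small simulation. This is a deliberately simplified toy, not the Mesos API: the `Offer` and `Framework` classes, the `master_forward` function, and the task name are all invented for illustration, and real Mesos decides which framework sees an offer first using its own allocation policy.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    agent: str
    cpus: float
    mem_gb: float


@dataclass
class Framework:
    name: str
    pending: list = field(default_factory=list)  # [(task_name, cpus, mem_gb), ...]

    def consider(self, offer):
        """Accept the offer for the first pending task that fits, else pass (None)."""
        for task in self.pending:
            _, cpus, mem_gb = task
            if cpus <= offer.cpus and mem_gb <= offer.mem_gb:
                self.pending.remove(task)
                return task
        return None


def master_forward(offer, frameworks):
    """Forward the offer to each framework in turn until one accepts it."""
    for fw in frameworks:
        task = fw.consider(offer)
        if task is not None:
            # The master tells the offering agent to run the accepted task.
            return fw.name, task[0], offer.agent
    return None


marathon = Framework("Marathon")                                # nothing to run: "Yawn. Pass."
chronos = Framework("Chronos", [("nightly-report", 1.0, 1.0)])  # needs 1 CPU, 1 GB RAM

offer = Offer(agent="agent-22", cpus=3.0, mem_gb=10.0)
result = master_forward(offer, [marathon, chronos])
print(result)
```

In this run Marathon declines, Chronos accepts for its one pending task, and the task lands on the agent that made the offer. The leftover resources from the offer (2 CPUs and 9 GB here) would go back into the pool for later offers, which the sketch does not model.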
  117. 117. • Docker Builder: Roll Your Own • Docker Registry: Roll Your Own • Deployment (Scheduling): Resource-aware scheduler. Can run other “frameworks” side by side. • Deployment (Routing): Roll Your Own • Deployment (Service Discovery): Roll Your Own • Auto-Restart Failed Containers: Included! • Monitoring: Roll Your Own • Logging: Roll Your Own • Cluster Data Store: Zookeeper • Unit of Deployment: Mesos Task (which will usually be a Docker container, usually submitted through Marathon).
  118. 118. But wait, there’s more! • Setting up Mesos involves coordinating many different moving pieces. • Also, there’s no immediate way to gain a cluster-wide view of total memory/CPU/disk space use. • Also, the learning curve can be steep.
  119. 119. Mesosphere DCOS is meant to solve these problems. • Offers “turn-key” setup (though the setup itself is not really production-grade). • Offers a fancy UI for viewing cluster resource usage. • Offers a special CLI for installing frameworks with 1 command. • It’s very much in active development and would work best with a Mesosphere support plan.
  120. 120. Pros • I find the Mesos abstraction the most intuitive when it comes to managing cluster resources. • Scalability is off the charts. Verizon, Siri, Yelp, Twitter and OpenTable all use Mesos. • Growing community. • Multiple “frameworks” already supported such as Apache Spark and Cassandra. • Solomon Hykes called it the “gold standard” for running Docker containers in a cluster.
  121. 121. Cons • It can take weeks to set up if you need to do it right. • The learning curve for devs is manageable, but for operators there are many moving pieces. • There are certain edge cases that are rare but that would affect cluster performance over time. • If you want to run Mesos on CoreOS, you either need to violate the CoreOS way or run the Mesos Master / Slave (Agent) in Docker containers, which is officially not recommended.
  122. 122. When To Use It • You’re running multiple microservices, and you anticipate significant scale. • You want to squeeze as much utilization as possible out of your large cluster. • You’re ready to adopt the cluster as the primary abstraction and expect to co-mingle prod and dev, multiple services, and multiple frameworks. • Note: Smallest company I met at MesosCon was ~60 employees. That is probably the lower limit of company size before Mesos makes sense (IMO).
  123. 123. Mesos + Docker Swarm • At MesosCon (August 2015), Docker showed Docker Swarm as the CLI-based way to control Mesos deployments.
  124. 124. Data Center Operating Solution #2: Kubernetes
  125. 125. Disclaimers • Mr. Padnick may or may not have any actual real-world experience with Kubernetes but felt it necessary to include it here for the sake of completeness.
  126. 126. Kubernetes Pods • A pod is a group of Docker containers that should be run together. [Diagram: a Pod containing a Web Server and a Content Management Server] SOURCE: Illustrations reproduced from https://www.youtube.com/watch?v=Fcb4aoSAZ98
  127. 127. Kubernetes Labels • Labels are key-value pairs attached to pods that allow Kubernetes to identify groups of pods. • The concept of labels is baked into most APIs. [Diagram: pods carrying labels such as FE, BI, and v2] SOURCE: Illustrations reproduced from https://www.youtube.com/watch?v=Fcb4aoSAZ98
  128. 128. Kubernetes Replication Controllers • A replication controller is a definition: “I want to run this pod 5 times.” • If one of the pods fails, Kubernetes will automatically start a new one. [Diagram: a Replication Controller with #Pods = 2 and label selector v1, managing two pods labeled v1] SOURCE: Illustrations reproduced from https://www.youtube.com/watch?v=Fcb4aoSAZ98
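The way labels and replication controllers fit together can be sketched as a tiny reconcile loop: select the live pods whose labels match the selector, then create replacements until the desired count is met. This is a conceptual toy and not the Kubernetes API; the pod dictionaries, the `matches` and `reconcile` functions, and the pod names are all made up for the example.

```python
def matches(labels: dict, selector: dict) -> bool:
    """A pod matches when it carries every key-value pair in the selector."""
    return all(labels.get(key) == value for key, value in selector.items())


def reconcile(pods: list, selector: dict, replicas: int) -> list:
    """Return the pod list after one pass of a replication-controller-style loop."""
    alive = [p for p in pods if p["alive"]]
    managed = [p for p in alive if matches(p["labels"], selector)]
    others = [p for p in alive if not matches(p["labels"], selector)]
    # Create replacements for any missing replicas.
    for i in range(max(replicas - len(managed), 0)):
        managed.append({"name": f"pod-new-{i}",
                        "labels": dict(selector), "alive": True})
    # Trim extras if there are more matching pods than desired.
    return others + managed[:replicas]


pods = [
    {"name": "pod-a", "labels": {"version": "v1"}, "alive": True},
    {"name": "pod-b", "labels": {"version": "v1"}, "alive": False},  # failed pod
    {"name": "pod-c", "labels": {"version": "v2"}, "alive": True},
]

# "#Pods = 2, label selector: v1"
after = reconcile(pods, selector={"version": "v1"}, replicas=2)
print([p["name"] for p in after])
```

The failed pod is dropped and a fresh one is created in its place, while the v2 pod is left alone because it does not match the selector. The same selector mechanism is what lets other parts of the system address "all pods labeled v1" without knowing their names.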
  129. 129. Kubernetes Cluster [Diagram: Kubernetes Masters, each running etcd, an API Server, a Controller Manager Server, and a Scheduler Server; Nodes, each running a kubelet agent and a proxy alongside their pods]
  130. 130. • Docker Builder: Roll Your Own • Docker Registry: Roll Your Own • Deployment (Scheduling): Resource-aware scheduler. • Deployment (Routing): Included! • Deployment (Service Discovery): Included! • Auto-Restart Failed Containers: Included! • Monitoring: Optimal support with Google Cloud Engine. Limited support for others. • Logging: Optimal support with Google Cloud Engine. Limited support for others. • Cluster Data Store: etcd • Unit of Deployment: Pod
  131. 131. Pros • Produced by Google. • Very well-documented. • Open source. • The “successor” to CoreOS + Fleet. Commercially supported by CoreOS as Tectonic. • If run in Google Cloud Engine, can potentially be quite powerful.
  132. 132. Cons • Preferential support for Google Cloud Engine. • Produced by Google but not necessarily the exact system Google uses to run its own cluster (though based on it). • I may or may not be aware of additional issues.
  133. 133. When To Use It • You’re running Google Cloud Engine. • You have prior experience from working at Google. • I may or may not be aware of additional use cases.
  134. 134. Mesos + Kubernetes • You can run Kubernetes on top of Mesos as an alternative to Marathon.
  135. 135. Final Thoughts
  136. 136. Closing Thoughts • To get started quickly, choose EC2 Container Service. • To get a feel for the core technologies, choose a PaaS like Deis and slowly learn CoreOS. • To run multi-container VM’s at (potentially huge) scale, choose Mesos. • There are many more “satellite” projects I didn’t cover solving unique problems!
  137. 137. Now have fun and docker on!
  138. 138. Q&A
