
Moving to Kubernetes on Amazon Web Services at Scale

A few months ago, Vungle’s infrastructure was showing its age. As the company moved more toward microservices and globally distributed infrastructure, our old approach of deploying a single app to a group of Ubuntu machines with Chef (either with autoscaling or manually) was becoming a bottleneck. We were also worried that we were not utilizing our server resources well. We were already using Docker to streamline our development environments and CI systems, so moving production to a Docker-based system seemed like an obvious choice. After evaluating the options (Kubernetes, Mesos, Fleet, etc.), we decided to go with Kubernetes on CoreOS.

This talk will focus on the technical decisions we made about our Kubernetes infrastructure to allow us to scale all over the globe, some of the issues we faced and how we worked around them, and the benefits we have seen.

Some highlights:
Setting up clusters in VPCs using CloudFormation
Moving from legacy infrastructure
Exposing services to the outside world
Making complex HTTP routing easily configurable by services
Communication between clusters
Limitations in AWS support
Integration into Deployment process

Vungle has benefitted greatly by embracing containers as the basic method for packaging services, and Kubernetes has really allowed us to become container-native all the way into production. It’s a lot of work to get it right, but putting in the effort is really paying off.

http://sched.co/4Uqg


Moving to Kubernetes on Amazon Web Services at Scale

  1. MAKING VIDEO ADS PERSONAL
  2. Moving to Kube on AWS at Scale (November 2015)
  3. ABOUT ME • Daniel Nelson, Staff Ops Engineer at Vungle • @packetcollision • daniel@dcn.io
  4. WHAT WE HAD BEFORE • Ubuntu 12.04 • Chef • ASGs and manually started instances • One app per server • Uneven resource utilization • Dev very different from prod
  5. THE STAGE • Docker dev environment • Prod still piles of Chef • Moving to SOA • More services • More servers
  6. THE DREAM • Prod uses the same Docker images as dev/QA • Easily add/remove machines • Minimal maintenance • Self-service
  7. THE OS • Options: CoreOS, RancherOS • Criteria: minimal, simple updates, community
  9. THE CONTENDERS • Mesos • Fleet • Docker Swarm • Amazon ECS • Kubernetes
  14. Construction Challenges • Can’t use kube-up.sh (hardcoded VPC name) • Can’t use ELB (hardcoded VPC name) • CloudFormation templates get big • Tons of little things
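Because kube-up.sh assumed its own VPC, clusters like these are typically stood up from hand-rolled CloudFormation instead. A minimal sketch of the network piece, in modern YAML template syntax (names, CIDRs, and layout are illustrative, not Vungle's actual template):

```yaml
# Minimal CloudFormation sketch of a cluster VPC (illustrative values only)
AWSTemplateFormatVersion: "2010-09-09"
Description: Kubernetes cluster network (sketch)
Resources:
  ClusterVpc:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsSupport: true
      EnableDnsHostnames: true
  NodeSubnetA:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref ClusterVpc
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: !Select [0, !GetAZs ""]
```

The template grows quickly once masters, workers, security groups, and routing are added, which is the "templates get big" complaint above.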
  18. REPLACING ELB • ASG of router/load-balancer machines • Not in the Kube cluster • In the same SDN as the Kube cluster • Running Vulcand (or Waco Kid) • Romulus for auto-configuration (use dev branch) • Nginx/HAProxy ingress balancer
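Since the router boxes sit on the cluster's overlay network, they can reach ordinary cluster services directly; a watcher such as Romulus then syncs Kubernetes state into Vulcand's configuration. A hypothetical service definition such a setup might front (the name and the `route` label are illustrative assumptions, not Romulus's actual convention):

```yaml
# Hypothetical service fronted by the external Vulcand routers
apiVersion: v1
kind: Service
metadata:
  name: ad-api          # illustrative name
  labels:
    route: enabled      # hypothetical label a watcher might select on
spec:
  selector:
    app: ad-api
  ports:
    - name: http
      port: 80
      targetPort: 8080
```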
  20. REPLACING LEGACY SYSTEMS • Kafka is our backbone: MirrorMaker from new to old • Move consumers only once producers are on Kube • GTM or Route53 to slowly move traffic over • Help project teams move to Docker prod deploys and pod/service configs
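Slowly moving traffic with Route53 maps naturally onto weighted record sets: two records for the same name, one pointing at the legacy stack and one at the new cluster's routers, with the weights shifted over time. A sketch as CloudFormation resources (zone and DNS names are made up):

```yaml
# Weighted Route53 records for gradual cutover (illustrative names)
Resources:
  LegacyRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: example.com.
      Name: api.example.com.
      Type: CNAME
      TTL: "60"
      SetIdentifier: legacy
      Weight: 90               # start with most traffic on the old stack
      ResourceRecords: [legacy-lb.example.com]
  KubeRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: example.com.
      Name: api.example.com.
      Type: CNAME
      TTL: "60"
      SetIdentifier: kube
      Weight: 10               # dial this up as confidence grows
      ResourceRecords: [kube-routers.example.com]
```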
  21. DESIGN FOR SCALE AND RELIABILITY • Multi-region (also useful for keeping clusters smaller within one region) • Make adjusting scale easy • Assume machines and zones will go down
  22. SAVE MONEY • Spot pricing is awesome • You can bid on a bunch of different instance types • Kubernetes makes instance eviction less painful
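Bidding on spot capacity for worker nodes can be as simple as a `SpotPrice` on the launch configuration; running one such group per instance type spreads eviction risk, and Kubernetes reschedules pods off any nodes that get reclaimed. A sketch (AMI, instance type, and price are placeholders):

```yaml
# Spot-priced worker group (placeholder AMI/price; one group per instance type)
Resources:
  SpotWorkerLc:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      ImageId: ami-xxxxxxxx        # CoreOS AMI placeholder
      InstanceType: m4.large
      SpotPrice: "0.06"            # max bid in USD/hour
  SpotWorkerAsg:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      LaunchConfigurationName: !Ref SpotWorkerLc
      MinSize: "3"
      MaxSize: "30"
      VPCZoneIdentifier:
        - subnet-xxxxxxxx          # cluster subnet placeholder
```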
  23. COMMUNICATING BETWEEN CLUSTERS • Avoid (synchronous) communication • Kafka • VPN • Internal load balancer
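For the synchronous calls that can't go over Kafka, an internal-scheme ELB keeps cross-cluster traffic off the public internet. A sketch using the classic load balancer resource of the era (subnet and port values are assumptions):

```yaml
# Internal ELB for cross-cluster calls that can't go through Kafka (sketch)
Resources:
  CrossClusterElb:
    Type: AWS::ElasticLoadBalancing::LoadBalancer
    Properties:
      Scheme: internal             # reachable only from inside the VPC / peered networks
      Subnets:
        - subnet-xxxxxxxx          # placeholder
      Listeners:
        - LoadBalancerPort: "80"
          InstancePort: "30080"    # hypothetical NodePort on the target cluster's nodes
          Protocol: TCP
```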
  24. DEPLOYMENT • Have a standard • Empower the project teams • Automate, automate, automate
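In 2015-era Kubernetes, "having a standard" typically meant a templated replication controller plus service per project, which CI could stamp out and roll with `kubectl rolling-update`. A minimal sketch (names, labels, and image are illustrative):

```yaml
# Illustrative per-service standard: one replication controller, versioned image tag
apiVersion: v1
kind: ReplicationController
metadata:
  name: ad-api-v42               # version in the name enables rolling updates
spec:
  replicas: 3
  selector:
    app: ad-api
    version: v42
  template:
    metadata:
      labels:
        app: ad-api
        version: v42
    spec:
      containers:
        - name: ad-api
          image: registry.example.com/ad-api:v42   # same image in dev/QA/prod
          ports:
            - containerPort: 8080
```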
  25. That’s it
  26. WE’RE HIRING • Senior Software Engineer, Data • Senior Software Engineer, Machine Learning Infrastructure • Senior Software Engineer, Machine Learning • Data Scientist Engineer
  27. Thank you! @packetcollision
