
Container World 2018


Running Containers at Scale at Netflix. An update on the usage of containers at Netflix, with technical discussion of new features and concepts added across container scheduling and execution.

Published in: Technology

  1. Running Containers at Scale at Netflix (@aspyker, @corindwyer)
  2. The Titus Team
     ● Develop
     ● Operate
     ● Support
  3. Netflix’s Container Management Platform: Titus
     Scheduling
     ● Service & batch jobs
     ● Resource management
     Container Execution
     ● Docker/AWS integration
     ● Netflix infra support
     (Diagram: job and fleet management, resource management & optimization, and container execution integration, spanning service and batch workloads)
  4. Growing set of container use cases
     ● 1000+ applications
     ● Netflix API, NodeJS backend UI scripts
     ● Machine learning (GPUs) for personalization
     ● Encoding and content use cases
     ● Netflix Studio use cases
     ● CDN tracking and planning
     ● Massively parallel CI system
     ● Data pipeline routing and SPaaS
     ● Big data platform use cases
     (Timeline: Batch, Q4 2015 → Basic Service, Q1 2016 → Production Service, Q4 2016 → Customer-Facing Service, Q2 2017)
  5. High-Level Titus Architecture
     ● Titus Control Plane: API, scheduling (Fenzo), job lifecycle control; state stored in Cassandra
     ● Titus Agents: Mesos agent, Titus executor, and Docker running user containers on AWS virtual machines
     ● Supporting infrastructure: Mesos, EC2 Autoscaling, Docker Registry
     ● Callers: CI/CD (service jobs), batch/workflow systems (batch jobs), Titus system services
  6. Q1 2018 Container Usage
     Common
     ● Jobs launched: 176K/day
     ● Different applications: 1K+ different images
     ● Regionally isolated Titus stacks: 7
     Service
     ● Single-app cluster size: 5K containers (real), 12K containers (benchmark)
     ● Agents managed: 16K VMs
     Batch
     ● Containers launched: 430K/day
     ● Agents autoscaled: 350K VMs/month
  7. Leveraging existing Netflix and AWS infrastructure
     A single consistent cloud environment between VMs and containers: service and batch apps run on the same cloud platform (metrics, IPC, health) whether on EC2 VMs or in Titus containers, sharing Atlas, Eureka, Edda, the EC2 autoscaler, and Titus job control within the VPC.
  8. Most native AWS container platform
     IP per container
     ● VPC IP, ENI, and security group
     ● Optimized to share ENIs
     ● ENI pre-attaching, opportunistic batching of IPs (for bursty deploys)
     IAM roles and metadata endpoint per container
     ● Container-scoped view of the instance metadata service
     Cryptographic identity per container
     ● Using the Amazon instance identity document
     Service job container autoscaling
     ● Using native AWS CloudWatch and autoscaling policies and engine
     ● Application Load Balancing (ALB)
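The "optimized to share ENIs" point can be illustrated with a small sketch: a container that needs a given set of security groups reuses an already-attached ENI with the same group set, and only on a miss does Titus pay for a slow, rate-limited EC2 attach. This is a hedged Python sketch, not Titus's real code; `attach_eni` is a stand-in for the actual EC2 call.

```python
# Illustrative ENI sharing: reuse an attached ENI whose security groups
# match the container's; otherwise attach a new one (expensive).

def eni_for(container_sgs, attached_enis, attach_eni):
    """Return an ENI whose security groups exactly match container_sgs."""
    wanted = set(container_sgs)
    for eni in attached_enis:
        if eni["sgs"] == wanted:
            return eni                  # reuse: no EC2 call needed
    eni = attach_eni(wanted)            # miss: attach a fresh ENI
    attached_enis.append(eni)
    return eni
```

Because containers with the same security groups cluster onto shared ENIs, the scheduler's preference for "agents already running this security group" (slide 17) directly increases the reuse hit rate.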
  9. Advanced Scheduling and Control Plane Technologies
  10. Scheduling / Placement
      Considering the realities of ...
      ● Docker, Linux, image pulling, etc.
      ● Complex resources (ENIs)
      ● Amazon rate limiting
      ● Filtering (constraints) and ranking (fitness)
      ● Different profiles for service vs. batch, critical vs. operational, etc.
      Trade-offs: reliability vs. provisioning time vs. cost
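The filtering/ranking split above is the core of Fenzo-style placement: hard constraints filter out agents that cannot run the task, then a fitness function ranks the survivors. Fenzo itself is a Java library; this is a minimal Python sketch where the field names, weights, and fitness terms are all illustrative assumptions, not Titus's real policy.

```python
# Hypothetical filter-then-rank placement sketch (Fenzo-style).

def can_fit(agent, task):
    """Hard constraints: enough CPU, memory, and a free ENI slot."""
    return (agent["free_cpu"] >= task["cpu"]
            and agent["free_mem"] >= task["mem"]
            and agent["free_enis"] > 0)

def fitness(agent, task):
    """Soft ranking: prefer security-group reuse (cheaper networking)
    and tighter bin-packing (less leftover CPU)."""
    sg_reuse = 1.0 if task["sg"] in agent["attached_sgs"] else 0.0
    packing = 1.0 - (agent["free_cpu"] - task["cpu"]) / agent["total_cpu"]
    return 0.6 * sg_reuse + 0.4 * packing

def place(task, agents):
    candidates = [a for a in agents if can_fit(a, task)]
    if not candidates:
        return None  # no capacity: a signal to autoscale the agent fleet
    return max(candidates, key=lambda a: fitness(a, task))
```

Different workload profiles (service vs. batch, critical vs. operational) would plug in different constraint sets and fitness weights over the same skeleton.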
  11. Capacity Management
      Users configure “capacity groups” based on workload type.
      Critical (Reserved Instances)
      ● Preallocated instances in order to achieve low provisioning time
      ● Buffer to support temporary extra capacity needs during deployments
      Flex (On-Demand)
      ● Autoscaled instances based on demand
      Opportunistic (Spot) - coming
      ● Utilize extra instances, with the ability to preempt or evict the workload
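The three-tier split can be sketched as a small mapping from a job's needs to an instance tier. The tier names follow the slide, but the selection rules and attribute names here are assumptions for illustration, not Titus's actual configuration model.

```python
# Illustrative capacity-group tiers and a toy tier-selection rule.

TIERS = {
    "critical":      {"instances": "reserved",  "preallocated": True,  "preemptible": False},
    "flex":          {"instances": "on-demand", "preallocated": False, "preemptible": False},
    "opportunistic": {"instances": "spot",      "preallocated": False, "preemptible": True},
}

def pick_tier(job):
    if job.get("customer_facing"):
        return "critical"       # preallocated RIs give low provisioning time
    if job.get("preemptible_ok"):
        return "opportunistic"  # cheap spot capacity, may be evicted
    return "flex"               # autoscaled on-demand instances
```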
  12. Centralized Agent Management
      A unified component tracks agent information from health checks, cluster lifecycle, and other signals, and powers other subsystems such as task migration, canaries, and agent remediation.
      Example, task migration: cluster A's agents are set to a non-schedulable state and their tasks are drained, while cluster B's agents remain schedulable and receive the migrated tasks.
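The schedulable/drain flow in the task-migration example can be sketched as a tiny state machine. State names and the `Agent` shape are illustrative assumptions, not Titus's real agent-management API.

```python
# Toy agent lifecycle: health signals flip an agent to non-schedulable,
# after which a task-migration step drains it onto a healthy agent.

class Agent:
    def __init__(self, name):
        self.name = name
        self.state = "schedulable"
        self.tasks = []

    def mark_unhealthy(self):
        # Stop placing new work here; existing tasks will be drained.
        self.state = "non-schedulable"

def migrate_tasks(source, target):
    """Move all tasks off a draining agent onto a schedulable one."""
    assert source.state == "non-schedulable"
    assert target.state == "schedulable"
    target.tasks.extend(source.tasks)
    source.tasks = []
```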
  13. Integrations
      Interfaces
      ● REST / gRPC
      ● Streaming updates
      External resources
      ● Load balancers
      ● Autoscaling
      ● Spinnaker (task migration)
      Operational visibility
      ● Telemetry: Atlas
      ● Event storage: Elasticsearch
  14. Advanced Container Runtime Technologies
  15. Multi-tenant networking is hard
      We decided early on that we wanted full IP stacks per container. But what about:
      ● Security group support
      ● IAM role support
      ● Network bandwidth isolation
      ● Leveraging the VPC
  16. Titus Networking
      (Diagram: on each virtual-machine host, the Titus executor keeps eth0 in the Titus control-plane security group and attaches per-app ENIs, e.g. eth1 with sg=A,B and eth2 with sg=B,C. Each container gets its own IP and a per-container metadata-service endpoint; IPvlan, BPF, and IFBs route app traffic between container eth0 interfaces and the host ENIs.)
  17. Next challenge: speed limits of EC2 networking
      Our largest EC2 challenge was the speed of networking reconfiguration.
      Changes in how we work with EC2 APIs:
      ● Worked with Amazon to redefine networking-related API rate limits and buckets
      ● Pre-attach all networking interfaces
      ● Request IPs opportunistically in bulk
      Also, coordination with the scheduler:
      ● Prefer instances already running containers in the same security group
      Result for large-scale failovers: before ... hours; after ... minutes
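The bulk-IP idea above amortizes one rate-limited EC2 call over many container starts: instead of one `AssignPrivateIpAddresses` call per container, addresses are requested in batches against a pre-attached ENI and handed out from a local pool. This is a hedged sketch; `assign_private_ips` is a stand-in for the real EC2 call, and the pool logic is an assumption about the shape of the optimization.

```python
# Opportunistic IP batching sketch: batch the rate-limited EC2 call,
# then serve container starts from a local free list.

class EniIpPool:
    def __init__(self, assign_private_ips, batch_size=8):
        self.assign = assign_private_ips  # stand-in for the EC2 API call
        self.batch_size = batch_size
        self.free = []

    def allocate(self):
        if not self.free:
            # One batched API call covers the next batch_size containers.
            self.free.extend(self.assign(self.batch_size))
        return self.free.pop()
```

Under a bursty deploy, N container starts cost roughly N / batch_size EC2 calls instead of N, which is what moves large-scale failovers from hours toward minutes.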
  18. Overcoming failures on each agent
      ● Detection - health checks
        ○ Linux subsystems (systemd, filesystems)
        ○ Docker aspects (runtime health, registry pulls)
        ○ Titus processes (networking, GPU, security drivers)
        ○ Mesos aspects (agent, executor)
      ● Remediation
        ○ Local reconciliation
        ○ Docker image cleanup
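The per-agent checks above feed an overall schedulability decision: any failing subsystem takes the agent out of rotation until remediation runs. A minimal sketch of that aggregation, assuming a simple name-to-boolean report shape (the check names mirror the slide; the structure is an assumption):

```python
# Toy health aggregation: any failed subsystem check makes the whole
# agent non-schedulable and names the subsystems needing remediation.

def agent_health(checks):
    """checks: dict of subsystem name -> bool (True = healthy)."""
    failed = sorted(name for name, ok in checks.items() if not ok)
    return {"schedulable": not failed, "failed": failed}
```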
  19. Process Model Evolution
      Single-process containers
      ● Worked for some time, until we needed system services
      System services
      ● Telemetry, IAM support, log uploading
      ● First added as host-installed daemons; isolation & multi-tenancy concerns
      ● Currently injecting system services into containers
      Composing system services into containers
      ● Considered pods; lifecycle and usage complexities limited their value
      ● Considering the future of both systemd and Docker image composability
  20. Resource Isolation
      ● CPU
        ○ Started with bursting; it interfered with predictability
        ○ Resource tiers to avoid interference problems
      ● Memory
        ○ Hard limit; OOM kills the entire container
      ● Network
        ○ Bandwidth throttling
      ● Disk space
      ● GPUs
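The CPU and memory points roughly map onto standard Linux cgroup (v1) knobs: a CPU weight plus, for the non-bursting tier, a CFS quota pinning the container to its allocation, and a hard memory limit that causes a kernel OOM kill of the whole container when exceeded. The knob names below are the real cgroup v1 files, but the tier names and the mapping policy are illustrative assumptions, not Titus's actual implementation.

```python
# Sketch: translate a container's resource request into cgroup v1 settings.

def cgroup_settings(cpus, mem_mb, tier="static"):
    settings = {
        "cpu.shares": cpus * 1024,                       # relative CPU weight
        "memory.limit_in_bytes": mem_mb * 1024 * 1024,   # hard limit -> OOM kill
    }
    if tier == "static":
        # Pin to an absolute CFS quota so noisy neighbors can't interfere
        # (no bursting above the requested CPUs).
        settings["cpu.cfs_period_us"] = 100_000
        settings["cpu.cfs_quota_us"] = cpus * 100_000
    return settings
```

A "burst" tier would simply omit the quota, keeping only the relative weight, which is the behavior the team moved away from for predictability-sensitive workloads.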
  21. Security Isolation
      ● Deployed user namespaces
        ○ Challenging due to shared systems without UID shifting
      ● Needed ad hoc debugging
        ○ titus-ssh gives users access to their own containers
        ○ Kernel functions still required power-user access
        ○ Working to automate this through tools like Vector (NetflixOSS)
      ● seccomp overhead and complexity is prohibitive
        ○ Working towards automated policies and BPF-driven implementations
  22. Open Sourcing
      Currently in a private open-source collaboration with those who want ...
      ● The NetflixOSS container solution (Spinnaker + Titus + Netflix RPC)
      ● A unified batch and service Mesos scheduler
      ● A more robust and AWS-native container platform
      We hope to fully open source in early Q2.
      ● If you want access now, let us know
      ● Looking for collaborators and feedback
  23. Q&A