© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Another Week, Another Million Containers on Amazon EC2 (CMP376) - AWS re:Invent 2018
Andrew Spyker
Software Engineering Manager
Netflix
Joe Hsieh
Principal Technical Account Manager
Amazon Web Services
Why containers?
Given our VM architecture comprised of …
Amazingly resilient
Microservice driven
Cloud native
CI/CD DevOps enabled
Elastically scalable
Do we really need containers?
What was missing from our VM environment?
Packaging
• Simple-to-customize, application-focused artifacts
• Especially for growth of polyglot environments
• Notably for platforms with OS-level dependencies
Local development
• Ability to run applications locally on developer laptops
Simple way to manage compute resources
• Especially for ad hoc batch processing
Titus, Netflix’s container management platform
Scheduling
• Service & batch job lifecycle
• Resource management
Container execution
• AWS Integration
• Netflix Ecosystem Support
[Diagram: Service and Batch jobs sit on Job and Fleet Management, Resource Management & Optimization, and Container Execution]
The Titus team
• Design
• Develop
• Operate
• Support
Titus and containers product strategy
• Ordered priority focus on
• Developer velocity
• Reliability
• Cost efficiency
Easy migration from VMs to containers
Easy container integration with VMs and Amazon Services
Focus on just what Netflix needs
“Our focus is to leverage EC2 deeply in Titus, not abstract it away or implement similar features. We see this as a differentiator of Titus versus other container management solutions.”
High level architecture
Titus Control Plane
• API
• Scheduling
• Job Lifecycle Control
• Fenzo
Mesos
Titus Agents
• User Containers
• Docker
• Mesos Agent
• Netflix System Services
AWS Virtual Machines
Docker Registry
Cassandra
AWS Auto Scaling
EC2 virtual machine portability
Early on we decided a container MUST …
• Natively integrate with VPC for networking
• Natively integrate with security groups for firewalling
• Work with IAM based Amazon Web Services
Key leverage points
Amazon EC2
GPUs - 10s of p2.8xlarges
Memory optimized - 100s of r4.16xlarges
General purpose - 1000s of m4.16xlarges
VPC and security groups
EC2 VM
• Titus Container Mgmt
• ENI0: to control plane
• ENI1: SG = w
• ENI2: SG = x
• ENIn: SG = z
• Container 1: SG = w, ENI1 IP1
• Container 2: SG = w, ENI1 IP2
• Container 3: SG = y, ENI3 IP1
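The mapping on this slide, where containers with the same security group share an ENI and each get their own IP on it, could be sketched roughly as follows. This is a minimal illustration, not Titus code: the `Eni` class, the `assign_container` function, and the per-ENI IP limit are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Eni:
    """Hypothetical model of one network interface attached to the VM."""
    name: str
    security_group: Optional[str] = None   # None means not yet claimed
    max_ips: int = 2                        # assumed per-ENI IP limit
    containers: List[str] = field(default_factory=list)


def assign_container(enis: List[Eni], container: str, sg: str) -> Tuple[str, int]:
    """Place a container on an ENI that carries its security group.

    Returns (ENI name, IP slot number). Prefers an ENI already using the
    same security group; otherwise claims an unclaimed ENI.
    """
    for eni in enis:
        if eni.security_group == sg and len(eni.containers) < eni.max_ips:
            eni.containers.append(container)
            return eni.name, len(eni.containers)
    for eni in enis:
        if eni.security_group is None:
            eni.security_group = sg
            eni.containers.append(container)
            return eni.name, 1
    raise RuntimeError(f"no ENI available for security group {sg!r}")


# ENI1 and ENI2 already carry security groups w and x; ENI3 is unclaimed.
enis = [Eni("ENI1", "w"), Eni("ENI2", "x"), Eni("ENI3")]
placements = {
    name: assign_container(enis, name, sg)
    for name, sg in [("Container 1", "w"), ("Container 2", "w"), ("Container 3", "y")]
}
print(placements)
```

With these inputs the two containers in security group w share ENI1 with separate IP slots, and the container in security group y claims ENI3, which mirrors the layout shown on the slide.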
IAM based services
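[Diagram: on the EC2 VM, Container 1 has two interfaces, eth0 and ethMD. Normal networking goes through ENI1; requests to 169.254.169.254 go to the Titus Metadata Proxy, which uses ENI0 to reach the Amazon Metadata Service and Security Token Service (STS)]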
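One way to picture the metadata proxy's role is as a per-container decision about what to answer for each request sent to 169.254.169.254. The sketch below is an assumption-laden illustration, not the Titus implementation: the function name, the IP-to-role mapping, and the returned action strings are hypothetical. Only the `/latest/meta-data/iam/security-credentials/` path is taken from the EC2 instance metadata service.

```python
from typing import Dict, Tuple

# Path prefix used by the EC2 instance metadata service for IAM credentials.
CREDENTIALS_PREFIX = "/latest/meta-data/iam/security-credentials/"


def route_metadata_request(
    source_ip: str, path: str, role_by_container_ip: Dict[str, str]
) -> Tuple[int, str]:
    """Decide how a hypothetical metadata proxy answers one request.

    Returns (HTTP status, body or action description).
    """
    role = role_by_container_ip.get(source_ip)
    if role is None:
        return 403, "unknown container"
    if path == CREDENTIALS_PREFIX:
        # Listing roles: expose only the role configured for this container.
        return 200, role
    if path.startswith(CREDENTIALS_PREFIX):
        requested = path[len(CREDENTIALS_PREFIX):]
        if requested != role:
            return 404, "role not available to this container"
        # A real proxy would obtain temporary credentials from STS here.
        return 200, f"assume-role:{role}"
    # Everything else is passed through in this sketch (an assumption).
    return 200, "forward-to-instance-metadata"


roles = {"10.0.0.5": "appRole"}  # hypothetical container IP and role name
print(route_metadata_request("10.0.0.5", CREDENTIALS_PREFIX + "appRole", roles))
```

The point of the design is that each container sees only its own role, even though many containers with different roles share one EC2 VM.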
Instance cryptographic identity
[Diagram: Titus Host, Metatron Service, User Container]
All I really needed to know about containers, I learned from Titus …
Choices for Auto Scaling Titus applications
Use the two existing Netflix autoscaling engines we already had
• Pro: Code existed
• Con: Lacking features, we’d have to operate
Write a new one
• Pro: Would be specific to our needs
• Con: Would be lacking features, we’d have to operate
Look for one from Amazon Web Services
• Pro: Already well understood for VMs, feature-rich
• Con: Only works for VMs
A true story
A product manager introduction, development team interchanges, and multiple iterations later …
Application Auto Scaling with custom resources
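As a rough illustration of what registering a custom resource with Application Auto Scaling involves, the sketch below builds the parameters for the `RegisterScalableTarget` call. The `custom-resource` namespace and the `custom-resource:ResourceType:Property` dimension are the values that API defines for custom resources. The helper name, the endpoint URL, and the capacity numbers are placeholders invented for this example, and nothing here is taken from the Titus integration itself.

```python
from typing import Any, Dict


def scalable_target_request(
    resource_url: str, min_capacity: int, max_capacity: int
) -> Dict[str, Any]:
    """Build keyword arguments for RegisterScalableTarget on a custom resource."""
    if min_capacity > max_capacity:
        raise ValueError("min_capacity must not exceed max_capacity")
    return {
        "ServiceNamespace": "custom-resource",
        "ResourceId": resource_url,
        "ScalableDimension": "custom-resource:ResourceType:Property",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


request = scalable_target_request(
    # Placeholder endpoint: a real one would point at the service that
    # exposes the scalable resource.
    "https://example.execute-api.us-east-1.amazonaws.com/prod/scalableTargetDimensions/my-job",
    min_capacity=2,
    max_capacity=20,
)
print(request)
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("application-autoscaling").register_scalable_target(**request)
```

Keeping the request construction separate from the API call makes the example runnable without AWS credentials.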
Configuring Auto Scaling in Spinnaker
Titus and Application Auto Scaling integration
[Diagram: User Containers, Control Plane]
Titus API call pattern
[Charts: total vs. throttled calls for CreateNetworkInterface, AttachNetworkInterface, ModifyNetworkInterfaceAttribute, and AssignPrivateIpAddresses]
An infrastructure view of applications
[Diagram: applications as Auto Scaling groups in a VPC]
API calls
RunInstances
CreateNetworkInterface
AttachNetworkInterface
AssignPrivateIpAddresses
ModifyNetworkInterfaceAttribute
Netflix regional failover
Kong evacuation of us-east-1
Traffic diverted to other regions
Fail back to us-east-1
Traffic moved back to us-east-1
us-east-1
eu-west-1
Infrastructure challenge
• Increase capacity during scale up of savior region
• Launch 1000s of containers in seven minutes
Easy, right?
“we reduced time to schedule 30,000 pods onto 1,000 nodes from 8,780 seconds to 587 seconds”
Titus can do this by …
• Dynamically changeable scheduling behavior
• Fleet wide networking optimizations
Normal scheduling
[Diagram: App 1 and App 2 containers spread across VM1, VM2, … VMn; each VM uses ENI 1 with IP1]
Trade-off for reliability
Failover scheduling
[Diagram: the same VM1, VM2, … VMn, now with additional App 1 and App 2 containers packed onto each VM and extra IPs (IP2, IP3, IP4) on ENI 1]
Trade-off for speed
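The contrast between the two scheduling modes can be sketched with a toy placement function. This is only an illustration of "spread" versus "pack" placement under assumed names (`place`, `vm_load`, `capacity`). The actual Titus scheduler is not shown in the slides, and the rationale in the docstring is an interpretation of the diagrams.

```python
from typing import Dict, List


def place(vm_load: Dict[str, int], capacity: int, count: int, mode: str) -> List[str]:
    """Place `count` containers and return the chosen VM for each.

    mode="spread" picks the least-loaded VM (favoring reliability);
    mode="pack" picks the most-loaded VM that still has room (favoring
    speed, on the assumption that already-used hosts are quicker to start on).
    """
    if mode not in ("spread", "pack"):
        raise ValueError(f"unknown mode: {mode}")
    load = dict(vm_load)  # work on a copy so the caller's view is unchanged
    chosen = []
    for _ in range(count):
        candidates = [vm for vm, used in load.items() if used < capacity]
        if not candidates:
            raise RuntimeError("fleet is full")
        if mode == "spread":
            vm = min(candidates, key=lambda v: (load[v], v))
        else:
            vm = max(candidates, key=lambda v: (load[v], v))
        load[vm] += 1
        chosen.append(vm)
    return chosen


fleet = {"VM1": 2, "VM2": 1, "VM3": 0}
spread = place(fleet, capacity=4, count=3, mode="spread")
packed = place(fleet, capacity=4, count=3, mode="pack")
print(spread)  # ['VM3', 'VM2', 'VM3']
print(packed)  # ['VM1', 'VM1', 'VM2']
```

Switching `mode` at runtime corresponds to the "dynamically changeable scheduling behavior" mentioned earlier: the same fleet is used, only the placement preference changes.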
On each host
Change when ENI creation and attachment are performed
• Moved this to instance start time
• No longer needed on-demand
Need to burst allocate IP addresses
• Opportunistically batch allocate at container launch time
• Likely if one container was launched more are coming
• Garbage collect unused later
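The burst-allocation idea above can be sketched as a small per-host IP pool: allocate several addresses in one call when the first container arrives, serve later launches from the pool, and reclaim leftovers afterwards. The `IpPool` class, its batch size, and the fake address generator are hypothetical. The `_assign_batch` method merely stands in for a single `AssignPrivateIpAddresses` call.

```python
from typing import List, Set


class IpPool:
    """Hypothetical per-host pool of private IPs on a pre-attached ENI."""

    def __init__(self, batch_size: int = 4) -> None:
        self.batch_size = batch_size      # assumed burst size
        self.free: List[str] = []
        self.in_use: Set[str] = set()
        self.api_calls = 0                # simulated AssignPrivateIpAddresses calls
        self._next = 1

    def _assign_batch(self) -> None:
        # Stand-in for one API call that requests several IPs at once.
        self.api_calls += 1
        for _ in range(self.batch_size):
            self.free.append(f"10.0.0.{self._next}")
            self._next += 1

    def acquire(self) -> str:
        """Hand out an IP, batch-allocating only when the pool is empty."""
        if not self.free:
            self._assign_batch()
        ip = self.free.pop(0)
        self.in_use.add(ip)
        return ip

    def release(self, ip: str) -> None:
        """Return an IP to the pool when its container exits."""
        self.in_use.discard(ip)
        self.free.append(ip)

    def garbage_collect(self, keep: int = 0) -> List[str]:
        """Return the IPs that would be unassigned, keeping `keep` spares."""
        surplus = self.free[keep:]
        self.free = self.free[:keep]
        return surplus


pool = IpPool(batch_size=4)
ips = [pool.acquire() for _ in range(5)]   # five container launches
reclaimed = pool.garbage_collect()         # later cleanup of unused IPs
print(pool.api_calls, reclaimed)
```

Five launches cost two simulated API calls instead of five, which is the kind of reduction in call volume that helps avoid throttling during a burst.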
Titus API pattern
[Charts: total vs. throttled calls for ModifyNetworkInterfaceAttribute and AssignPrivateIpAddresses]
Results
us-east-1 / prod
[Chart: containers started per minute]
7,500 launched in 5 minutes
Netflix load balancing
IP based Application Load Balancing
Configuring EC2 load balancers in Spinnaker
Titus and Load Balancing integration
[Diagram: User Containers, Control Plane, IP Target Group]
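Keeping an IP target group in sync with running containers amounts to a set difference between the container IPs that should receive traffic and those currently registered. The sketch below assumes a hypothetical `reconcile_targets` helper. The `{"Id": ..., "Port": ...}` shape follows the target description used by the Elastic Load Balancing v2 `RegisterTargets` and `DeregisterTargets` calls, but how Titus performs this step is not described in the slides.

```python
from typing import Dict, List, Set


def reconcile_targets(
    desired_ips: Set[str], registered_ips: Set[str], port: int
) -> Dict[str, List[Dict[str, object]]]:
    """Compute which IP targets to register and which to deregister."""
    to_register = sorted(desired_ips - registered_ips)
    to_deregister = sorted(registered_ips - desired_ips)
    return {
        "register": [{"Id": ip, "Port": port} for ip in to_register],
        "deregister": [{"Id": ip, "Port": port} for ip in to_deregister],
    }


# Container IPs currently running vs. IPs already in the target group.
plan = reconcile_targets(
    desired_ips={"10.0.0.1", "10.0.0.2"},
    registered_ips={"10.0.0.2", "10.0.0.3"},
    port=7001,  # placeholder port
)
print(plan)
# With boto3, each list could be passed as Targets=... to
# elbv2.register_targets / elbv2.deregister_targets along with a TargetGroupArn.
```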
Use cases on Titus
• Netflix API, Node.js Backend UI Scripts
• Machine Learning (GPUs) for personalization
• Encoding and Content use cases
• Netflix Studio use cases
• CDN tracking and planning
• Massively parallel CI system
• Data Pipeline routing and SPaaS
• Big Data platform use cases
• Q4 2015: Batch
• Q1 2016: Basic Services
• Q4 2016: Production Services
• Q2 2017: Customer Facing Services
Q4 2018 container usage
Common
• Jobs launched: 255K jobs / day
• Different applications: 1K+ different images
• Isolated Titus deployments: 7 stacks
Services
• Single app cluster size: 5K (real), 12K containers (benchmark)
• Hosts managed: 7K VMs (435,000 CPUs)
Batch
• Containers launched: 450K / day (750K / day peak)
• Hosts managed (autoscaled): 55K VMs / month
Open Source
Open sourced April 2018
Help other communities by sharing our approach
Current and future work
• Advanced CPU Isolation
• Opportunistic Workloads
• Nitro and Bare Metal Instances
• Next Amazon and Netflix Partnership
Thank you!
Andrew Spyker
@aspyker
Joe Hsieh
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.

More Related Content

What's hot

How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018
How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018
How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018Amazon Web Services
 
Earn Your DevOps Black Belt: Deployment Scenarios with AWS CloudFormation (DE...
Earn Your DevOps Black Belt: Deployment Scenarios with AWS CloudFormation (DE...Earn Your DevOps Black Belt: Deployment Scenarios with AWS CloudFormation (DE...
Earn Your DevOps Black Belt: Deployment Scenarios with AWS CloudFormation (DE...Amazon Web Services
 
Data Lake Patterns for Voice, Vision, Advanced Analytics, & ML Using Serverle...
Data Lake Patterns for Voice, Vision, Advanced Analytics, & ML Using Serverle...Data Lake Patterns for Voice, Vision, Advanced Analytics, & ML Using Serverle...
Data Lake Patterns for Voice, Vision, Advanced Analytics, & ML Using Serverle...Amazon Web Services
 
Deploy Alexa for Business in Your Organization & Build Your First Private Ski...
Deploy Alexa for Business in Your Organization & Build Your First Private Ski...Deploy Alexa for Business in Your Organization & Build Your First Private Ski...
Deploy Alexa for Business in Your Organization & Build Your First Private Ski...Amazon Web Services
 
Continuous Integration Best Practices (DEV319-R1) - AWS re:Invent 2018
Continuous Integration Best Practices (DEV319-R1) - AWS re:Invent 2018Continuous Integration Best Practices (DEV319-R1) - AWS re:Invent 2018
Continuous Integration Best Practices (DEV319-R1) - AWS re:Invent 2018Amazon Web Services
 
Reserve Amazon EC2 On-Demand Capacity for Any Duration with On-Demand Capacit...
Reserve Amazon EC2 On-Demand Capacity for Any Duration with On-Demand Capacit...Reserve Amazon EC2 On-Demand Capacity for Any Duration with On-Demand Capacit...
Reserve Amazon EC2 On-Demand Capacity for Any Duration with On-Demand Capacit...Amazon Web Services
 
Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...
Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...
Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...Amazon Web Services
 
Voice-Powered Serverless Analytics (SRV240-R1) - AWS re:Invent 2018
Voice-Powered Serverless Analytics (SRV240-R1) - AWS re:Invent 2018Voice-Powered Serverless Analytics (SRV240-R1) - AWS re:Invent 2018
Voice-Powered Serverless Analytics (SRV240-R1) - AWS re:Invent 2018Amazon Web Services
 
DEM18 How SendBird Built a Serverless Log-Processing Pipeline in a Week
DEM18 How SendBird Built a Serverless Log-Processing Pipeline in a WeekDEM18 How SendBird Built a Serverless Log-Processing Pipeline in a Week
DEM18 How SendBird Built a Serverless Log-Processing Pipeline in a WeekAmazon Web Services
 
Resiliency Testing: Verify That Your System Is as Reliable as You Think (ARC4...
Resiliency Testing: Verify That Your System Is as Reliable as You Think (ARC4...Resiliency Testing: Verify That Your System Is as Reliable as You Think (ARC4...
Resiliency Testing: Verify That Your System Is as Reliable as You Think (ARC4...Amazon Web Services
 
Compliance and Security Mitigation Techniques
Compliance and Security Mitigation TechniquesCompliance and Security Mitigation Techniques
Compliance and Security Mitigation TechniquesAmazon Web Services
 
Lessons Learned from Building an AWS Service on AWS Lambda (SRV327-R1) - AWS ...
Lessons Learned from Building an AWS Service on AWS Lambda (SRV327-R1) - AWS ...Lessons Learned from Building an AWS Service on AWS Lambda (SRV327-R1) - AWS ...
Lessons Learned from Building an AWS Service on AWS Lambda (SRV327-R1) - AWS ...Amazon Web Services
 
SRV318 Running Kubernetes with Amazon EKS
SRV318 Running Kubernetes with Amazon EKSSRV318 Running Kubernetes with Amazon EKS
SRV318 Running Kubernetes with Amazon EKSAmazon Web Services
 
Leadership Session: Using DevOps, Microservices, and Serverless to Accelerate...
Leadership Session: Using DevOps, Microservices, and Serverless to Accelerate...Leadership Session: Using DevOps, Microservices, and Serverless to Accelerate...
Leadership Session: Using DevOps, Microservices, and Serverless to Accelerate...Amazon Web Services
 
SRV205 Architectures and Strategies for Building Modern Applications on AWS
 SRV205 Architectures and Strategies for Building Modern Applications on AWS SRV205 Architectures and Strategies for Building Modern Applications on AWS
SRV205 Architectures and Strategies for Building Modern Applications on AWSAmazon Web Services
 
Machine Learning at the IoT Edge (IOT214) - AWS re:Invent 2018
Machine Learning at the IoT Edge (IOT214) - AWS re:Invent 2018Machine Learning at the IoT Edge (IOT214) - AWS re:Invent 2018
Machine Learning at the IoT Edge (IOT214) - AWS re:Invent 2018Amazon Web Services
 
Driving DevOps Transformation in Enterprises (DEV320) - AWS re:Invent 2018
Driving DevOps Transformation in Enterprises (DEV320) - AWS re:Invent 2018Driving DevOps Transformation in Enterprises (DEV320) - AWS re:Invent 2018
Driving DevOps Transformation in Enterprises (DEV320) - AWS re:Invent 2018Amazon Web Services
 
Deep Dive on Cloud File System Offerings: What to Use, Where, and Why (STG392...
Deep Dive on Cloud File System Offerings: What to Use, Where, and Why (STG392...Deep Dive on Cloud File System Offerings: What to Use, Where, and Why (STG392...
Deep Dive on Cloud File System Offerings: What to Use, Where, and Why (STG392...Amazon Web Services
 
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...Amazon Web Services
 
Update Microcontroller Devices Over-the-Air with Amazon FreeRTOS (IOT304-R1) ...
Update Microcontroller Devices Over-the-Air with Amazon FreeRTOS (IOT304-R1) ...Update Microcontroller Devices Over-the-Air with Amazon FreeRTOS (IOT304-R1) ...
Update Microcontroller Devices Over-the-Air with Amazon FreeRTOS (IOT304-R1) ...Amazon Web Services
 

What's hot (20)

How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018
How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018
How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018
 
Earn Your DevOps Black Belt: Deployment Scenarios with AWS CloudFormation (DE...
Earn Your DevOps Black Belt: Deployment Scenarios with AWS CloudFormation (DE...Earn Your DevOps Black Belt: Deployment Scenarios with AWS CloudFormation (DE...
Earn Your DevOps Black Belt: Deployment Scenarios with AWS CloudFormation (DE...
 
Data Lake Patterns for Voice, Vision, Advanced Analytics, & ML Using Serverle...
Data Lake Patterns for Voice, Vision, Advanced Analytics, & ML Using Serverle...Data Lake Patterns for Voice, Vision, Advanced Analytics, & ML Using Serverle...
Data Lake Patterns for Voice, Vision, Advanced Analytics, & ML Using Serverle...
 
Deploy Alexa for Business in Your Organization & Build Your First Private Ski...
Deploy Alexa for Business in Your Organization & Build Your First Private Ski...Deploy Alexa for Business in Your Organization & Build Your First Private Ski...
Deploy Alexa for Business in Your Organization & Build Your First Private Ski...
 
Continuous Integration Best Practices (DEV319-R1) - AWS re:Invent 2018
Continuous Integration Best Practices (DEV319-R1) - AWS re:Invent 2018Continuous Integration Best Practices (DEV319-R1) - AWS re:Invent 2018
Continuous Integration Best Practices (DEV319-R1) - AWS re:Invent 2018
 
Reserve Amazon EC2 On-Demand Capacity for Any Duration with On-Demand Capacit...
Reserve Amazon EC2 On-Demand Capacity for Any Duration with On-Demand Capacit...Reserve Amazon EC2 On-Demand Capacity for Any Duration with On-Demand Capacit...
Reserve Amazon EC2 On-Demand Capacity for Any Duration with On-Demand Capacit...
 
Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...
Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...
Reliability of the Cloud: How AWS Achieves High Availability (ARC317-R1) - AW...
 
Voice-Powered Serverless Analytics (SRV240-R1) - AWS re:Invent 2018
Voice-Powered Serverless Analytics (SRV240-R1) - AWS re:Invent 2018Voice-Powered Serverless Analytics (SRV240-R1) - AWS re:Invent 2018
Voice-Powered Serverless Analytics (SRV240-R1) - AWS re:Invent 2018
 
DEM18 How SendBird Built a Serverless Log-Processing Pipeline in a Week
DEM18 How SendBird Built a Serverless Log-Processing Pipeline in a WeekDEM18 How SendBird Built a Serverless Log-Processing Pipeline in a Week
DEM18 How SendBird Built a Serverless Log-Processing Pipeline in a Week
 
Resiliency Testing: Verify That Your System Is as Reliable as You Think (ARC4...
Resiliency Testing: Verify That Your System Is as Reliable as You Think (ARC4...Resiliency Testing: Verify That Your System Is as Reliable as You Think (ARC4...
Resiliency Testing: Verify That Your System Is as Reliable as You Think (ARC4...
 
Compliance and Security Mitigation Techniques
Compliance and Security Mitigation TechniquesCompliance and Security Mitigation Techniques
Compliance and Security Mitigation Techniques
 
Lessons Learned from Building an AWS Service on AWS Lambda (SRV327-R1) - AWS ...
Lessons Learned from Building an AWS Service on AWS Lambda (SRV327-R1) - AWS ...Lessons Learned from Building an AWS Service on AWS Lambda (SRV327-R1) - AWS ...
Lessons Learned from Building an AWS Service on AWS Lambda (SRV327-R1) - AWS ...
 
SRV318 Running Kubernetes with Amazon EKS
SRV318 Running Kubernetes with Amazon EKSSRV318 Running Kubernetes with Amazon EKS
SRV318 Running Kubernetes with Amazon EKS
 
Leadership Session: Using DevOps, Microservices, and Serverless to Accelerate...
Leadership Session: Using DevOps, Microservices, and Serverless to Accelerate...Leadership Session: Using DevOps, Microservices, and Serverless to Accelerate...
Leadership Session: Using DevOps, Microservices, and Serverless to Accelerate...
 
SRV205 Architectures and Strategies for Building Modern Applications on AWS
 SRV205 Architectures and Strategies for Building Modern Applications on AWS SRV205 Architectures and Strategies for Building Modern Applications on AWS
SRV205 Architectures and Strategies for Building Modern Applications on AWS
 
Machine Learning at the IoT Edge (IOT214) - AWS re:Invent 2018
Machine Learning at the IoT Edge (IOT214) - AWS re:Invent 2018Machine Learning at the IoT Edge (IOT214) - AWS re:Invent 2018
Machine Learning at the IoT Edge (IOT214) - AWS re:Invent 2018
 
Driving DevOps Transformation in Enterprises (DEV320) - AWS re:Invent 2018
Driving DevOps Transformation in Enterprises (DEV320) - AWS re:Invent 2018Driving DevOps Transformation in Enterprises (DEV320) - AWS re:Invent 2018
Driving DevOps Transformation in Enterprises (DEV320) - AWS re:Invent 2018
 
Deep Dive on Cloud File System Offerings: What to Use, Where, and Why (STG392...
Deep Dive on Cloud File System Offerings: What to Use, Where, and Why (STG392...Deep Dive on Cloud File System Offerings: What to Use, Where, and Why (STG392...
Deep Dive on Cloud File System Offerings: What to Use, Where, and Why (STG392...
 
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...
 
Update Microcontroller Devices Over-the-Air with Amazon FreeRTOS (IOT304-R1) ...
Update Microcontroller Devices Over-the-Air with Amazon FreeRTOS (IOT304-R1) ...Update Microcontroller Devices Over-the-Air with Amazon FreeRTOS (IOT304-R1) ...
Update Microcontroller Devices Over-the-Air with Amazon FreeRTOS (IOT304-R1) ...
 

Similar to Another Week, Another Million Containers on Amazon EC2 (CMP376) - AWS re:Invent 2018

Getting Started with Kubernetes on AWS
Getting Started with Kubernetes on AWSGetting Started with Kubernetes on AWS
Getting Started with Kubernetes on AWSAmazon Web Services
 
Expert Tips for Successful Kubernetes Deployment - AWS Summit Sydney 2018
Expert Tips for Successful Kubernetes Deployment - AWS Summit Sydney 2018Expert Tips for Successful Kubernetes Deployment - AWS Summit Sydney 2018
Expert Tips for Successful Kubernetes Deployment - AWS Summit Sydney 2018Amazon Web Services
 
Expert Tips for Successful Kubernetes Deployment on AWS
Expert Tips for Successful Kubernetes Deployment on AWSExpert Tips for Successful Kubernetes Deployment on AWS
Expert Tips for Successful Kubernetes Deployment on AWSAmazon Web Services
 
[AWS Container Service] Getting Started with Kubernetes on AWS
[AWS Container Service] Getting Started with Kubernetes on AWS[AWS Container Service] Getting Started with Kubernetes on AWS
[AWS Container Service] Getting Started with Kubernetes on AWSAmazon Web Services Korea
 
Expert Tips for Successful Kubernetes Deployments on AWS
Expert Tips for Successful Kubernetes Deployments on AWSExpert Tips for Successful Kubernetes Deployments on AWS
Expert Tips for Successful Kubernetes Deployments on AWSAmazon Web Services
 
Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018
Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018
Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018AWS Germany
 
More Containers Less Operations
More Containers Less OperationsMore Containers Less Operations
More Containers Less OperationsDonnie Prakoso
 
Exciting world of Amazon container services with AWS Fargate and Amazon EKS
Exciting world of Amazon container services with AWS Fargate and Amazon EKSExciting world of Amazon container services with AWS Fargate and Amazon EKS
Exciting world of Amazon container services with AWS Fargate and Amazon EKSAmazon Web Services
 
Getting Started with Containers on AWS
Getting Started with Containers on AWSGetting Started with Containers on AWS
Getting Started with Containers on AWSAmazon Web Services
 
Set Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tool...
Set Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tool...Set Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tool...
Set Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tool...Amazon Web Services
 
Introduction to Serverless computing and AWS Lambda - Floor28
Introduction to Serverless computing and AWS Lambda - Floor28Introduction to Serverless computing and AWS Lambda - Floor28
Introduction to Serverless computing and AWS Lambda - Floor28Boaz Ziniman
 
Introduction to Serverless computing and AWS Lambda | AWS Floor28
Introduction to Serverless computing and AWS Lambda | AWS Floor28Introduction to Serverless computing and AWS Lambda | AWS Floor28
Introduction to Serverless computing and AWS Lambda | AWS Floor28Amazon Web Services
 
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018Amazon Web Services
 
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...Amazon Web Services
 
SRV314 Containerized App Development with AWS Fargate
SRV314 Containerized App Development with AWS FargateSRV314 Containerized App Development with AWS Fargate
SRV314 Containerized App Development with AWS FargateAmazon Web Services
 
Best practices for optimizing your EC2 costs with Spot Instances | AWS Floor28
Best practices for optimizing your EC2 costs with Spot Instances | AWS Floor28Best practices for optimizing your EC2 costs with Spot Instances | AWS Floor28
Best practices for optimizing your EC2 costs with Spot Instances | AWS Floor28Amazon Web Services
 
Getting Started with Containers in the Cloud: AWS Developer Workshop at Web S...
Getting Started with Containers in the Cloud: AWS Developer Workshop at Web S...Getting Started with Containers in the Cloud: AWS Developer Workshop at Web S...
Getting Started with Containers in the Cloud: AWS Developer Workshop at Web S...Amazon Web Services
 
Getting-started-with-containers on AWS
Getting-started-with-containers on AWSGetting-started-with-containers on AWS
Getting-started-with-containers on AWSAmazon Web Services
 

Similar to Another Week, Another Million Containers on Amazon EC2 (CMP376) - AWS re:Invent 2018 (20)

Getting Started with Kubernetes on AWS
Getting Started with Kubernetes on AWSGetting Started with Kubernetes on AWS
Getting Started with Kubernetes on AWS
 
Expert Tips for Successful Kubernetes Deployment - AWS Summit Sydney 2018
Expert Tips for Successful Kubernetes Deployment - AWS Summit Sydney 2018Expert Tips for Successful Kubernetes Deployment - AWS Summit Sydney 2018
Expert Tips for Successful Kubernetes Deployment - AWS Summit Sydney 2018
 
Expert Tips for Successful Kubernetes Deployment on AWS
Expert Tips for Successful Kubernetes Deployment on AWSExpert Tips for Successful Kubernetes Deployment on AWS
Expert Tips for Successful Kubernetes Deployment on AWS
 
Deep Dive into Amazon Fargate
Deep Dive into Amazon FargateDeep Dive into Amazon Fargate
Deep Dive into Amazon Fargate
 
[AWS Container Service] Getting Started with Kubernetes on AWS
[AWS Container Service] Getting Started with Kubernetes on AWS[AWS Container Service] Getting Started with Kubernetes on AWS
[AWS Container Service] Getting Started with Kubernetes on AWS
 
Introducing AWS Fargate
Introducing AWS FargateIntroducing AWS Fargate
Introducing AWS Fargate
 
Expert Tips for Successful Kubernetes Deployments on AWS
Expert Tips for Successful Kubernetes Deployments on AWSExpert Tips for Successful Kubernetes Deployments on AWS
Expert Tips for Successful Kubernetes Deployments on AWS
 
Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018
Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018
Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018
 
More Containers Less Operations
More Containers Less OperationsMore Containers Less Operations
More Containers Less Operations
 
Exciting world of Amazon container services with AWS Fargate and Amazon EKS
Another Week, Another Million Containers on Amazon EC2 (CMP376) - AWS re:Invent 2018

  • 2. Another Week, Another Million Containers on Amazon EC2 (CMP376)
       Andrew Spyker, Software Engineering Manager, Netflix
       Joe Hsieh, Principal Technical Account Manager, Amazon Web Services
       © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 3. Why containers?
       Given our VM architecture comprised of …
         • Amazingly resilient
         • Microservice driven
         • Cloud native
         • CI/CD DevOps enabled
         • Elastically scalable
       Do we really need containers?
  • 4. What was missing from our VM environment?
       Packaging
         • Simple to customize application focused artifacts
         • Especially for growth of polyglot environments
         • Notably for platforms with OS level dependencies
       Local development
         • Ability to run applications locally on developer laptops
       Simple way to manage compute resources
         • Especially for ad hoc batch processing
  • 5. Titus, Netflix’s container management platform
       Scheduling
         • Service & batch job lifecycle
         • Resource management
       Container execution
         • AWS Integration
         • Netflix Ecosystem Support
       Diagram labels: Service, Batch; Job and Fleet Management; Resource Management & Optimization; Container Execution
  • 6. The Titus team
         • Design
         • Develop
         • Operate
         • Support
  • 7. Titus and containers product strategy
         • Ordered priority focus on
             • Developer velocity
             • Reliability
             • Cost efficiency
       Easy migration from VMs to containers
       Easy container integration with VMs and Amazon Services
       Focus on just what Netflix needs
  • 8. “Our focus is to leverage EC2 deeply in Titus, not abstract it away or implement similar features. We see this as a differentiator of Titus versus other container management solutions.”
  • 9. High level architecture
       Titus Control Plane
         • API
         • Scheduling
         • Job Lifecycle Control
       Other diagram labels: Fenzo, Mesos, Titus Agents, User Containers, Docker, Mesos Agent, Netflix System Services, AWS Virtual Machines, Docker Registry, Cassandra, AWS Auto Scaling
  • 11. EC2 virtual machine portability
       Early on we decided a container MUST …
         • Natively integrate with VPC for networking
         • Natively integrate with security groups for firewalling
         • Work with IAM based Amazon Web Services
  • 12. Key leverage points
  • 13. Amazon EC2
         • GPUs - 10’s of p2.8xlarges
         • Memory optimized - 100’s of r4.16xlarges
         • General purpose - 1000’s of m4.16xlarges
  • 14. VPC and security groups
       Diagram labels: EC2 VM; Titus Container Mgmt
         • ENI0 (to control plane)
         • ENI1: SG = w
         • ENI2: SG = x
         • ENIn: SG = z
         • Container 1: SG = w, ENI1, IP1
         • Container 2: SG = w, ENI1, IP2
         • Container 3: SG = y, ENI3, IP1
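The layout on slide 14 can be read as a simple placement rule: a container lands on an ENI that carries its security group and receives its own IP address on that ENI. The following is a minimal sketch of that reading, not Titus code. The `Eni` class, the `place` function, and the per-ENI IP limit are hypothetical names and values chosen for illustration, and reassigning an ENI to a different security group is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Eni:
    """Hypothetical model of one network interface on a host."""
    name: str
    security_group: str
    max_ips: int
    assigned: dict = field(default_factory=dict)  # IP label -> container name

    def next_free_ip(self):
        """Return the lowest unused IP label on this ENI, or None if it is full."""
        for i in range(1, self.max_ips + 1):
            label = f"IP{i}"
            if label not in self.assigned:
                return label
        return None


def place(enis, container, security_group):
    """Place a container on the first ENI whose security group matches.

    Returns (eni_name, ip_label). Raises LookupError when no matching ENI
    has a free IP; a real system would have to handle that case.
    """
    for eni in enis:
        if eni.security_group != security_group:
            continue
        label = eni.next_free_ip()
        if label is not None:
            eni.assigned[label] = container
            return eni.name, label
    raise LookupError(f"no ENI with free IP for security group {security_group!r}")


if __name__ == "__main__":
    host = [Eni("ENI1", "w", 2), Eni("ENI2", "x", 2), Eni("ENI3", "y", 2)]
    for name, sg in [("Container 1", "w"), ("Container 2", "w"), ("Container 3", "y")]:
        print(name, place(host, name, sg))
```

With three ENIs tagged `w`, `x`, and `y`, the two containers in group `w` share ENI1 with separate IPs and the container in group `y` gets the first IP on ENI3, which mirrors the labels on the slide.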
  • 15. IAM based services
       Diagram labels: EC2 VM; ENI0; ENI1; Container 1 (eth0, ethMD); Titus Metadata Proxy; Normal networking; 169.254.169.254; Amazon Metadata Service and Security Token Service (STS)
  • 16. Titus Host; Instance cryptographic identity; Metatron Service; User Container
  • 18. All I really needed to know about containers, I learned from Titus …
  • 20. Choices for Auto Scaling Titus applications
       Use the two existing Netflix autoscaling engines we already had
         • Pro: Code existed
         • Con: Lacking features, we’d have to operate
       Write a new one
       Look for one from Amazon Web Services
  • 21. Choices for Auto Scaling Titus applications
       Use the two existing Netflix autoscaling engines we already had
       Write a new one
         • Pro: Would be specific to our needs
         • Con: Would be lacking features, we’d have to operate
       Look for one from Amazon Web Services
  • 22. Choices for Auto Scaling Titus applications
       Use the two existing Netflix autoscaling engines we already had
       Write a new one
       Look for one from Amazon Web Services
         • Pro: Already well understood for VMs, feature-rich
         • Con: Only works for VMs
  • 23. A true story
  • 24. A product manager introduction, development team interchanges, and multiple iterations later …
  • 25. Application Auto Scaling with custom resources
  • 26. Configuring Auto Scaling in Spinnaker
  • 27. Titus and Application Auto Scaling integration
       Diagram labels: User Containers; Control Plane
  • 29. Titus API call pattern
       Chart series (total and throttled calls for each):
         • CreateNetworkInterface
         • AttachNetworkInterface
         • ModifyNetworkInterfaceAttribute
         • AssignPrivateIpAddresses
  • 30. An infrastructure view of applications
       Diagram labels: Auto Scaling group (three shown)
  • 31. An infrastructure view of applications
       Diagram labels: Auto Scaling group; VPC
  • 32. API calls
         • RunInstances
         • CreateNetworkInterface
         • AttachNetworkInterface
         • AssignPrivateIpAddress
         • ModifyNetworkInterface
  • 33. Netflix regional failover
       Chart annotations: Kong evacuation of us-east-1; Traffic diverted to other regions; Fail back to us-east-1; Traffic moved back to us-east-1
       Chart series: us-east-1, eu-west-1
  • 34. Infrastructure challenge
         • Increase capacity during scale up of savior region
         • Launch 1000s of containers in seven minutes
  • 35. Easy right?
       “we reduced time to schedule 30,000 pods onto 1,000 nodes from 8,780 seconds to 587 seconds”
  • 37. Titus can do this by …
         • Dynamically changeable scheduling behavior
         • Fleet wide networking optimizations
  • 38. Normal scheduling: trade-off for reliability
       Diagram labels: VM1, VM2, VMn; App 1, App 2; ENI 1; IP1
  • 39. Failover scheduling: trade-off for speed
       Diagram labels: the same VMs, ENIs, and containers as in normal scheduling, plus additional App 1 and App 2 containers and additional IP addresses (IP2, IP3; IP2, IP3, IP4; IP2, IP3)
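Slides 37 to 39 describe scheduling behavior that can be switched between a normal mode that trades speed for reliability and a failover mode that trades reliability for speed. One plausible reading, assumed here and not confirmed by the slides, is that the normal mode spreads containers across hosts while the failover mode packs them onto hosts that are already in use. The sketch below illustrates only that assumed difference; the `schedule` function and its `spread` and `pack` mode names are hypothetical and are not the Fenzo or Titus API.

```python
def schedule(used, capacity, count, mode):
    """Place `count` containers and return the chosen host for each one.

    used:     dict mapping host name -> containers already running (mutated)
    capacity: maximum containers per host (same for every host in this sketch)
    mode:     "spread" picks the least loaded host (assumed normal behavior),
              "pack" picks the most loaded host that still has room
              (assumed failover behavior)
    """
    placements = []
    for _ in range(count):
        open_hosts = [h for h in used if used[h] < capacity]
        if not open_hosts:
            raise RuntimeError("no capacity left on any host")
        if mode == "spread":
            host = min(open_hosts, key=lambda h: (used[h], h))
        elif mode == "pack":
            host = min(open_hosts, key=lambda h: (-used[h], h))
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        used[host] += 1
        placements.append(host)
    return placements


if __name__ == "__main__":
    print(schedule({"vm1": 0, "vm2": 0, "vm3": 0}, 4, 3, "spread"))
    print(schedule({"vm1": 0, "vm2": 0, "vm3": 0}, 4, 5, "pack"))
```

Spreading limits how many containers of one application are lost when a single host fails. Packing touches fewer hosts, so more launches can reuse network interfaces that are already attached.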
  • 40. On each host
       Change when ENIs are created and attached
         • Moved this to instance start time
         • No longer needed on demand
       Need to burst allocate IP addresses
         • Opportunistically batch allocate at container launch time
         • If one container was launched, more are likely coming
         • Garbage collect unused addresses later
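The burst allocation idea on slide 40 can be sketched as a small per-host pool: when a container needs an IP and none is free, request a whole batch in one call, hand out one address, and keep the rest for the launches expected to follow. This is a minimal illustration under stated assumptions. `HostIpPool`, `make_fake_allocator`, the batch size of 4, and the `10.0.0.x` addresses are all hypothetical, and the allocator callable merely stands in for a cloud API call such as the AssignPrivateIpAddresses call named on slide 29.

```python
import itertools


class HostIpPool:
    """Hypothetical per-host pool that allocates IPs in batches."""

    def __init__(self, allocate_batch, batch_size=4):
        self._allocate_batch = allocate_batch  # callable(n) -> list of n IPs
        self._batch_size = batch_size
        self._free = []
        self._in_use = {}  # container id -> IP
        self.api_calls = 0  # how many times the allocator was invoked

    def acquire(self, container_id):
        """Give the container an IP, batch allocating only when the pool is empty."""
        if not self._free:
            self._free.extend(self._allocate_batch(self._batch_size))
            self.api_calls += 1
            if not self._free:
                raise RuntimeError("allocator returned no addresses")
        ip = self._free.pop(0)
        self._in_use[container_id] = ip
        return ip

    def release(self, container_id):
        """Return a finished container's IP to the free list."""
        self._free.append(self._in_use.pop(container_id))

    def garbage_collect(self, keep=0):
        """Remove and return free IPs beyond `keep` so they can be unassigned."""
        surplus = self._free[keep:]
        del self._free[keep:]
        return surplus


def make_fake_allocator(prefix="10.0.0."):
    """Build a stand-in allocator that hands out sequential fake addresses."""
    counter = itertools.count(1)

    def allocate(n):
        return [f"{prefix}{next(counter)}" for _ in range(n)]

    return allocate


if __name__ == "__main__":
    pool = HostIpPool(make_fake_allocator(), batch_size=4)
    print([pool.acquire(f"c{i}") for i in range(5)], pool.api_calls)
    print(pool.garbage_collect())
```

In this sketch five launches cost two allocator calls instead of five, which is the kind of reduction in throttled API calls the slide is after. A real collector would presumably wait some time before reclaiming addresses; that timing is not modeled here.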
  • 41. Titus API pattern
       Chart series (total and throttled calls for each):
         • ModifyNetworkInterfaceAttribute
         • AssignPrivateIpAddresses
  • 42. Results
       Chart: us-east-1 / prod containers started per minute
       Chart annotation: 7500 launched in 5 minutes
  • 44. Netflix load balancing
  • 45. IP based Application Load Balancing
  • 46. Configuring EC2 load balancers in Spinnaker
  • 47. Titus and Load Balancing integration
       Diagram labels: User Containers; Control Plane; IP Target Group
  • 49. Use cases on Titus
         • Netflix API, Node.js Backend UI Scripts
         • Machine Learning (GPUs) for personalization
         • Encoding and Content use cases
         • Netflix Studio use cases
         • CDN tracking and planning
         • Massively parallel CI system
         • Data Pipeline routing and SPaaS
         • Big Data platform use cases
       Timeline: Batch (Q4 2015); Basic Services (Q1 2016); Production Services (Q4 2016); Customer Facing Services (Q2 2017)
  • 51. Q4 2018 container usage
       Common
         • Jobs launched: 255K jobs / day
         • Different applications: 1K+ different images
         • Isolated Titus deployments: 7 stacks
       Services
         • Single app cluster size: 5K (real), 12K containers (benchmark)
         • Hosts managed: 7K VMs (435,000 CPUs)
       Batch
         • Containers launched: 450K / day (750K / day peak)
         • Hosts managed (autoscaled): 55K VMs / month
  • 52. Open Source
         • Open sourced April 2018
         • Help other communities by sharing our approach
  • 53. Current and future work
         • Advanced CPU Isolation
         • Opportunistic Workloads
         • Nitro and Bare Metal Instances
         • Next Amazon and Netflix Partnership
  • 54. Thank you!
       Andrew Spyker (@aspyker)
       Joe Hsieh