© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Andrew Spyker (@aspyker)
12/1/2016
Container Scheduling, Execution and AWS Integration
What to Expect from the Session
• Why containers?
• Including current use cases and scale
• How did we get there?
• Overview of our container cloud platform
• Collaboration with ECS
About Netflix
• 86.7M members
• 1000+ developers
• 190+ countries
• More than ⅓ of North American internet download traffic
• 500+ microservices
• Over 100,000 VMs
• 3 regions across the world
Why containers?
Given our VM architecture, which is amazingly resilient, microservice driven, cloud native, CI/CD and DevOps enabled, and elastically scalable, do we really need containers?
Our Container System Provides Innovation Velocity
• Iterative local development, deploy when ready
• Manage app and dependencies easily and completely
• Simpler way to express resources, let system manage
Innovation Velocity - Use Cases
• Media Encoding: encoding research development time
• Using VMs: 1 month; using containers: 1 week
• Niagara
• Build all Netflix codebases in hours
• Saves developers hundreds of hours of debugging
• Edge Rearchitecture with NodeJS
• Focus returns to app development
• Simplifies and speeds up testing and deployment
Why not use an existing container management solution?
• Most solutions are focused on the datacenter
• Most solutions are:
• Working to abstract the datacenter and cross-cloud
• Delivering more than a cluster manager
• Not yet at our level of scale
• Wanted to leverage our existing cloud platform
• Not appropriate for Netflix
Batch
What do batch users want?
• Simple shared resources, run till done, job files
• NOT
• EC2 instance sizes, autoscaling, AMI OSes
• WHY
• Offloads resource management ops, simpler
Historic use of containers
• General Workflow (Meson), Stream Processing (Mantis)
• Proven using cgroups and Mesos
• With simple isolation
• Using specific packaging formats
Enter Titus
• Job Management: Batch
• Resource Management & Optimization
• Container Execution
• Integration
Sample batch use cases
• Algorithm model training
GPU usage
• Personalization and recommendation
• Deep learning with neural nets/mini batch
• Titus
• Added g2 support using nvidia-docker-plugin
• Mounts nvidia drivers and devices into Docker container
• Distribution of training jobs and infrastructure made self-service
• Recently moved to p2.8xlarge instances
• 2x performance improvement with the same CUDA-based code
Sample batch use cases
• Media Encoding Experimentation
• Digital Watermarking
Sample batch use cases
• Ad hoc reporting
• Open Connect CDN reporting
Lessons learned from batch
• Docker helped generalize use cases
• Cluster autoscaling adds efficiency
• Advanced scheduling required
• Initially ignored failures (with retries)
• Time-sensitive batch came later
Titus Batch Usage (Week of 11/7)
• Started ~300,000 containers during the week
• Peak of 1,000 containers per minute
• Peak of 3,000 instances (mix of r3.8xlarge and m4.4xlarge)
Services
Adding Services to Titus
• Job Management: Batch and Service
• Resource Management & Optimization
• Container Execution
• Integration
Services are just long-running batch, right?
Services are more complex
• Services resize constantly and run forever
• Autoscaling
• Hard to upgrade underlying hosts
• Services have more state
• Ready for traffic vs. just started/stopped
• Even harder to upgrade
• Existing, well-defined dev, deploy, runtime & ops tools
Real Networking is Hard
Multi-Tenant Networking is Hard
• IP per container
• Security group support
• IAM role support
• Network bandwidth isolation
Solutions
• VPC networking driver
• Supports ENIs: full IP functionality
• Security groups, handled together with scheduling
• Supports traffic control (bandwidth isolation)
• EC2 metadata proxy
• Adds container “node” identity
• Delivers IAM roles
VPC Networking Integration with Docker (step 1)
Titus Executor → Titus Networking Driver:
- Create and attach an ENI with a security group and an IP address
- Create the net namespace
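Step 1 can be sketched against the EC2 API. This is a minimal sketch, not the Titus networking driver: it assumes a boto3-style EC2 client is passed in and uses only the real EC2 calls `create_network_interface` and `attach_network_interface`; the helper name, its parameters, and the returned dict are hypothetical.

```python
from typing import Any, Dict, List


def create_and_attach_eni(
    ec2: Any,  # assumption: a boto3 EC2 client, e.g. boto3.client("ec2")
    subnet_id: str,
    security_group_ids: List[str],
    instance_id: str,
    device_index: int,
) -> Dict[str, str]:
    """Hypothetical helper: create an ENI in the requested security groups
    and attach it to the container host at the given device index."""
    # The ENI carries the security groups; EC2 assigns a private IP from the subnet.
    eni = ec2.create_network_interface(
        SubnetId=subnet_id, Groups=security_group_ids
    )["NetworkInterface"]
    # Attach the ENI to the host VM so its IP can later be routed to a container.
    attachment = ec2.attach_network_interface(
        NetworkInterfaceId=eni["NetworkInterfaceId"],
        InstanceId=instance_id,
        DeviceIndex=device_index,
    )
    return {
        "eni_id": eni["NetworkInterfaceId"],
        "ip": eni["PrivateIpAddress"],
        "attachment_id": attachment["AttachmentId"],
    }
```

Passing the client in keeps the sketch testable with a stub. The "Putting it all together" diagram shows two containers in sg=X sharing one ENI, which suggests a real driver reuses an attached ENI for the same security-group set instead of creating one per container.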
VPC Networking Integration with Docker (step 2)
Titus Executor → Titus Networking Driver:
- Launch a “pod root” container with the IP address, using a “pause” container with net=none
Docker creates the net namespace for the Pod Root Container.
VPC Networking Integration with Docker (step 3)
Titus Executor → Titus Networking Driver, for the Pod Root Container (pod_root_id):
- Create virtual ethernet
- Configure routing rules
- Configure metadata proxy iptables NAT
- Configure traffic control for bandwidth
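Step 3 maps onto standard Linux networking commands. The sketch below only builds the command lines and runs nothing. The interface names, routing-table number, bandwidth figure, and the choice of a token bucket filter (`tbf`) qdisc are assumptions for illustration, not what the Titus driver is known to issue; the 169.254.169.254:80 → host_ip:9999 redirect comes from the metadata proxy slide.

```python
from typing import List

METADATA_IP = "169.254.169.254"


def network_setup_commands(
    veth_host: str,        # hypothetical host-side interface name
    veth_container: str,   # hypothetical container-side interface name
    netns_pid: int,        # PID of the pod root container (owns the netns)
    container_ip: str,
    route_table: int,      # hypothetical per-ENI routing table
    host_ip: str,
    proxy_port: int = 9999,
    rate_mbit: int = 1000,  # assumed bandwidth cap
) -> List[List[str]]:
    """Return argv lists for one container's network plumbing (not executed)."""
    return [
        # Virtual ethernet pair; one end is moved into the pod root's namespace.
        ["ip", "link", "add", veth_host, "type", "veth", "peer", "name", veth_container],
        ["ip", "link", "set", veth_container, "netns", str(netns_pid)],
        # Policy-based routing: traffic from this IP uses its own routing table.
        ["ip", "rule", "add", "from", container_ip, "table", str(route_table)],
        # Redirect this container's metadata requests to the host-local proxy.
        ["iptables", "-t", "nat", "-A", "PREROUTING", "-i", veth_host,
         "-p", "tcp", "-d", METADATA_IP, "--dport", "80",
         "-j", "DNAT", "--to-destination", f"{host_ip}:{proxy_port}"],
        # Shape traffic leaving the host-side veth (toward the container).
        ["tc", "qdisc", "add", "dev", veth_host, "root", "tbf",
         "rate", f"{rate_mbit}mbit", "burst", "256kbit", "latency", "400ms"],
    ]
```

A real setup also needs steps omitted here, such as bringing both ends up with `ip link set ... up`, assigning the address inside the namespace, and populating the routing table; all of these need root.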
VPC Networking Integration with Docker (step 4)
Titus Executor → Docker: create the App Container with --net=container:pod_root_id, so it shares the network namespace of the Pod Root Container (pod_root_id)
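On the Docker side, steps 2 and 4 use two stock `docker run` networking modes: `--net=none` for the pod root and `--net=container:<name|id>` for the app. A sketch that only builds the CLI commands, assuming a named pod root and placeholder image names (all hypothetical); the networking driver's step 3 work would happen after the `docker inspect` and before the final `docker run`.

```python
from typing import List


def pod_launch_commands(pod_name: str, pause_image: str, app_image: str) -> List[List[str]]:
    """Return docker CLI argv lists for a pod root plus one app container."""
    return [
        # Pod root: a long-lived "pause" container with no Docker-managed
        # networking; it exists to own the network namespace.
        ["docker", "run", "-d", "--name", pod_name, "--net=none", pause_image],
        # The driver needs the pod root's PID to move a veth end into its netns.
        ["docker", "inspect", "--format", "{{.State.Pid}}", pod_name],
        # The app joins the pod root's namespace: same interfaces, IP, and routes.
        ["docker", "run", "-d", f"--net=container:{pod_name}", app_image],
    ]
```

Keeping the namespace in a separate pause container means the app container can be restarted without losing the IP address, routes, and NAT rules attached to the pod.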
Titus Metadata Proxy
A metadata proxy container answers requests that app containers send to the Amazon Metadata Service (169.254.169.254):
- What is my IP, instance ID, hostname? Return Titus-assigned values
- What is my AMI, instance type, etc.? Unknown
- Give me my role credentials: assume the container's role and return its credentials
- Give me anything else: proxy to the real service
Traffic from each container's veth<id> to 169.254.169.254:80 is redirected by iptables/NAT to the proxy at host_ip:9999.
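The four rules on this slide amount to a routing table keyed on the request path. A minimal sketch of only the decision logic: it has no HTTP server and makes no STS `AssumeRole` call. The `/latest/meta-data/...` paths are standard EC2 instance metadata paths, while the container field names and action labels are hypothetical.

```python
from typing import Dict, Tuple

META_PREFIX = "/latest/meta-data/"
# Identity questions answered with Titus-assigned values (path key -> container field).
IDENTITY = {"local-ipv4": "ip", "instance-id": "task_id", "hostname": "hostname"}
# Host details the container is not told about.
HIDDEN = {"ami-id", "instance-type"}
CREDENTIALS = "iam/security-credentials/"


def route_metadata_request(path: str, container: Dict[str, str]) -> Tuple[str, str]:
    """Decide how the proxy answers one request; returns (action, value)."""
    if path.startswith(META_PREFIX):
        key = path[len(META_PREFIX):]
        if key in IDENTITY:
            return ("answer", container[IDENTITY[key]])
        if key in HIDDEN:
            return ("not_found", key)
        if key.startswith(CREDENTIALS):
            # A real proxy would assume this role and return temporary credentials.
            return ("assume_role", container["iam_role"])
    # Everything else passes through to the real service at 169.254.169.254.
    return ("proxy", path)
```

Defaulting to "proxy" keeps unrecognized paths working for existing AWS SDKs, while the identity and credential answers are overridden per container.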
Putting it all together
Virtual machine host with three ENIs:
- ENI1 (sg=A): non-routable IP, used by the metadata proxy
- ENI2 (sg=X): IP1 and IP2, each used by a container in sg=X
- ENI3 (sg=Y,Z): IP3, used by a container in sg=Y,Z
Containers 1 to 4 each consist of an app container plus a pod root, attached to the host through its own veth<id>. Linux policy-based routing plus traffic control ties each veth to its ENI, and NAT redirects 169.254.169.254 to the metadata proxy.
Additional AWS Integrations
• Log file access, both live and after rotation to S3
• Multi-tenant resource isolation (disk)
• Environmental context
• Automatic instance type selection
• Elastic scaling of underlying resource pool
Netflix Infrastructure Integration
• Spinnaker CI/CD
• Atlas telemetry
• Discovery/IPC
• Edda (and dependent systems)
• Healthcheck, system metrics pollers
• Chaos testing
Why? Single consistent cloud platform
All three stacks run on EC2 virtual machines in the VPC:
- VM applications: Cloud Platform Libraries (metrics, IPC, health), managed by the AWS Autoscaler service
- Container service applications: Cloud Platform Libraries (metrics, IPC, health), managed by Titus Job Control
- Container batch applications: Cloud Platform Libraries (metrics, IPC), managed by Titus Job Control
All stacks integrate with the same Edda, Eureka and Atlas.
Titus Spinnaker Integration
• Deploy based on new Docker registry tags
• Deployment strategies same as ASGs
• IAM roles and security groups per container
• Basic resource requirements
• Easily see healthcheck & service discovery status
Fenzo – The heart of Titus scheduling
Extensible Library for Scheduling Frameworks
• Plugin-based scheduling objectives
• Bin packing, etc.
• Heterogeneous resources & tasks
• Cluster autoscaling
• Multiple instance types
• Plugin-based constraints evaluator
• Resource affinity, task locality, etc.
• Single offer mode added in support of ECS
Fenzo scheduling strategy
For each task:
  On each host:
    Validate hard constraints
    Evaluate fitness and soft constraints
  Until fitness is “good enough” and a minimum number of hosts has been evaluated
Fitness and constraint evaluation are provided by plugins.
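The strategy above can be sketched as a greedy loop. This is a minimal illustration, not Fenzo's Java API: the class names, the `good_enough` and `min_hosts` parameters, and the practice of passing fitness and hard constraints as plain functions are assumptions standing in for Fenzo's plugins.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Host:
    name: str
    free_cpus: float
    free_mem_gb: float
    attrs: Dict[str, str] = field(default_factory=dict)


@dataclass
class Task:
    name: str
    cpus: float
    mem_gb: float


HardConstraint = Callable[[Task, Host], bool]
Fitness = Callable[[Task, Host], float]  # plugin: 0.0 (worst) .. 1.0 (best)


def pick_host(task: Task, hosts: List[Host], hard: List[HardConstraint],
              fitness: Fitness, good_enough: float = 0.9,
              min_hosts: int = 3) -> Optional[Host]:
    best: Optional[Host] = None
    best_score = -1.0
    evaluated = 0
    for host in hosts:
        # Hard constraints (including "does the task fit") filter hosts out.
        if host.free_cpus < task.cpus or host.free_mem_gb < task.mem_gb:
            continue
        if not all(ok(task, host) for ok in hard):
            continue
        # Fitness (which can fold in soft constraints) ranks the survivors.
        evaluated += 1
        score = fitness(task, host)
        if score > best_score:
            best, best_score = host, score
        # Stop early once the best is good enough AND enough hosts were compared.
        if best_score >= good_enough and evaluated >= min_hosts:
            break
    return best


def schedule(tasks: List[Task], hosts: List[Host], hard: List[HardConstraint],
             fitness: Fitness, **kwargs) -> Dict[str, str]:
    placements: Dict[str, str] = {}
    for task in tasks:
        host = pick_host(task, hosts, hard, fitness, **kwargs)
        if host is None:
            continue  # unplaced; a real scheduler would retry or scale up
        host.free_cpus -= task.cpus
        host.free_mem_gb -= task.mem_gb
        placements[task.name] = host.name
    return placements
```

The early exit trades a possibly better placement for speed on large clusters, and `min_hosts` keeps the scheduler from settling on the very first acceptable host.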
Scheduling – Capacity Guarantees
Titus maintains …
• Critical tier: guaranteed capacity & start latencies
• Flex tier: more dynamic capacity & variable start latency
Diagram: Titus Master scheduler (Fenzo), with each tier's capacity shown between Desired and Max.
Scheduling – Bin Packing, Elastic Scaling
User adds work tasks
• Titus does bin packing to ensure that we can downscale entire hosts efficiently
Diagram: Titus Master scheduler (Fenzo) packs tasks onto fewer hosts; emptied hosts (✖) can terminate, shrinking the pool from Max toward Desired and Min.
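One way to get this effect is a fitness function that prefers the host that will be most utilized after placement, so work piles onto busy hosts and others empty out. A sketch assuming CPU is the only resource, with hypothetical names; it is a generic best-fit heuristic, not Fenzo's actual bin-packing plugin.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    name: str
    total_cpus: float
    used_cpus: float = 0.0


def packing_fitness(task_cpus: float, host: Host) -> float:
    """CPU utilization of the host *after* placing the task; higher packs tighter."""
    return (host.used_cpus + task_cpus) / host.total_cpus


def place(task_cpus: float, hosts: List[Host]) -> Host:
    fits = [h for h in hosts if h.total_cpus - h.used_cpus >= task_cpus]
    if not fits:
        raise RuntimeError("no host fits; the cluster would need to scale up")
    # max() returns the first host among ties, so list order breaks ties.
    best = max(fits, key=lambda h: packing_fitness(task_cpus, h))
    best.used_cpus += task_cpus
    return best


def terminable(hosts: List[Host]) -> List[str]:
    """Hosts running nothing can be terminated when scaling down."""
    return [h.name for h in hosts if h.used_cpus == 0]
```

Placing four 4-CPU tasks on three empty 16-CPU hosts fills `h1` completely and leaves `h2` and `h3` idle, so both can be terminated. A spreading ("worst fit") policy would instead leave every host partly busy and none removable.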
Scheduling – Constraints including AZ Balancing
User specifies constraints
• AZ balancing
• Resource and task affinity
• Hard and soft
Diagram: Titus Master scheduler (Fenzo) balancing tasks across Availability Zone A and Availability Zone B, each sized between Min and Desired.
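AZ balancing works naturally as a soft constraint: among zones that pass the hard filter, prefer the one already running the fewest tasks of the same job, but still place the task if only an imbalanced choice remains. A sketch with hypothetical function and zone names, where the hard filter is represented simply as the list of zones that currently have capacity.

```python
from collections import Counter
from typing import Iterable, List, Optional


def choose_zone(job_task_zones: Iterable[str], zones_with_capacity: List[str]) -> Optional[str]:
    """Soft AZ balancing: among zones that can take the task (the hard filter),
    prefer the one already running the fewest tasks of this job."""
    if not zones_with_capacity:
        return None
    counts = Counter(job_task_zones)  # zones with no tasks count as 0
    return min(zones_with_capacity, key=lambda z: counts[z])  # ties: list order


def place_job(num_tasks: int, zones: List[str]) -> List[str]:
    placed: List[str] = []
    for _ in range(num_tasks):
        zone = choose_zone(placed, zones)
        if zone is None:
            break
        placed.append(zone)
    return placed
```

With two zones, five tasks alternate A, B, A, B, A. If zone B loses capacity, the next task still lands in A, which is the difference between a soft preference and a hard constraint.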
Scheduling – Rolling new Titus code
Operator updates Titus agent codebase
• New scheduling on new cluster
• Batch jobs drain
• Service tasks are migrated via Spinnaker pipelines
• Old cluster autoscales down
Diagram: Titus Master scheduler (Fenzo) moving work from ASG version 001 to ASG version 002; hosts in version 001 (✖) terminate as it scales down.
Current Service Usage
• Approach
• Started with internal applications
• Moved on to line-of-fire NodeJS (shadow first, prod 1Q17)
• Moved on to stream processing (prod 4Q)
• Current: ~2,000 long-running containers
Timeline: 1Q batch; 2Q service pre-prod; 3Q service shadow; 4Q service prod
Collaboration with ECS
Why ECS?
• Decrease operational overhead of underlying cluster state management
• Allow open source collaboration on ECS Agent
• Work with Amazon and others on EC2 enablement
• GPUs, VPC, security groups, IAM roles, etc.
• Over time this enablement should result in less maintenance
Titus Today
Container host: mesos-agent, Titus executor, containers
Control plane: Mesos master, Titus Scheduler, EC2 Integration
Outbound
- Launch/Terminate Container
- Reconciliation
Inbound
- Container Host Events (and offers)
- Container Events
First Titus ECS Implementation
Container host: ECS agent, Titus executor, containers
Control plane: ECS, Titus Scheduler, EC2 Integration
Outbound
- Launch/Terminate Container
- Polling for Container Host Events and Container Events
Collaboration with ECS team starts
• Collaboration on ECS “event stream” that could provide
• “Real time” task & container instance state changes
• Event based architecture more scalable than polling
• Great engineering collaboration
• Face to face focus
• Monthly interlocks
• Engineer to engineer focused
Current Titus ECS Implementation
Container host: ECS agent, Titus executor, containers
Control plane: ECS, Titus Scheduler, EC2 Integration
Outbound
- Launch/Terminate Container
- Reconciliation
Inbound (via CloudWatch Events → SQS)
- Container Host Events
- Container Events
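With this design the scheduler's inbound path becomes an SQS consumer. When a CloudWatch Events rule targets an SQS queue, each message body is the event JSON, and ECS emits `ECS Task State Change` and `ECS Container Instance State Change` events. The sketch only parses one message body into a hypothetical internal update record; the SQS polling itself (for example boto3's `receive_message` and `delete_message`) is left out, and the update format is invented for illustration.

```python
import json
from typing import Dict, Optional


def to_scheduler_update(message_body: str) -> Optional[Dict[str, str]]:
    """Translate one ECS event (CloudWatch Events -> SQS) into a hypothetical
    Titus-side update; returns None for events we ignore."""
    event = json.loads(message_body)
    detail = event.get("detail", {})
    kind = event.get("detail-type")
    if kind == "ECS Task State Change":
        # Container events: a task moved, e.g. PENDING -> RUNNING -> STOPPED.
        return {
            "kind": "task",
            "task_arn": detail["taskArn"],
            "last_status": detail["lastStatus"],
            "desired_status": detail["desiredStatus"],
        }
    if kind == "ECS Container Instance State Change":
        # Container host events: an agent host changed state.
        return {
            "kind": "host",
            "container_instance_arn": detail["containerInstanceArn"],
            "status": detail["status"],
        }
    return None
```

Standard SQS queues deliver at least once, so applying these updates should be idempotent; events can also arrive late, which is one reason the outbound reconciliation path remains.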
Analysis - Periodic Reconciliation
For tasks in listTasks:
describeTasks (batches of 100)
Number of API calls per reconcile: 1 + ⌈num tasks / 100⌉
1,280 containers across 40 nodes
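The call count follows from batching: one `listTasks` call, as counted on the slide, plus one `describeTasks` call per batch of up to 100 tasks, the DescribeTasks per-call limit. A sketch with hypothetical helper names; for the 1,280-container example it gives 1 + 13 = 14 calls per reconcile.

```python
import math
from typing import Iterator, List, Sequence

DESCRIBE_TASKS_MAX = 100  # DescribeTasks accepts at most 100 tasks per call


def describe_batches(task_arns: Sequence[str], size: int = DESCRIBE_TASKS_MAX) -> Iterator[List[str]]:
    """Split the listTasks result into describeTasks-sized batches."""
    for start in range(0, len(task_arns), size):
        yield list(task_arns[start:start + size])


def reconcile_api_calls(num_tasks: int) -> int:
    """1 listTasks call + one describeTasks call per (possibly partial) batch."""
    return 1 + math.ceil(num_tasks / DESCRIBE_TASKS_MAX)


print(reconcile_api_calls(1280))  # 14
```

The cost grows linearly with the number of tasks and is paid on every reconcile interval, which is why event-driven updates carry the common path and reconciliation only catches drift.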
Analysis - Scheduling
• Number of API calls: 2X number of tasks
• registerTaskDefinition and startTask
• Largest Titus historical job
• 1000 tasks per minute
• Possible with increased rate limits
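The rate-limit concern is simple arithmetic: at two ECS calls per task, the largest historical job's 1,000 tasks per minute becomes 2,000 calls per minute, about 33 calls per second sustained. A small helper with hypothetical names makes the dependence on calls per task explicit, for example if `registerTaskDefinition` and `startTask` were combined as proposed in the collaboration items.

```python
from typing import Tuple


def ecs_call_rate(tasks_per_minute: float, calls_per_task: int = 2) -> Tuple[float, float]:
    """Return (calls per minute, calls per second rounded to 0.1) for a launch rate."""
    per_minute = tasks_per_minute * calls_per_task
    return per_minute, round(per_minute / 60.0, 1)


print(ecs_call_rate(1000))                    # (2000, 33.3)
print(ecs_call_rate(1000, calls_per_task=1))  # (1000, 16.7)
```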
Continued areas of scheduling collaboration
• Combining/batching registerTaskDefinition and startTask
• More resource types in the control plane
• Disk, Network Bandwidth, ENIs
• To fit with existing scheduler approach
• Extensible message fields in task state transitions
• Named tasks (beyond ARNs) for terminate
• Starting vs. Started state
Possible phases of ECS support in Titus
• Work in progress
• ECS completing scheduling collaboration items
• Complete transition to ECS for overall cluster manager
• Allows us to contribute Netflix cloud platform and EC2 integration points to the open source ECS agent
• Future
• Provide Fenzo as the ECS task placement service
• Extend Titus Job Management features to ECS
Titus Future Focus
Future Strategy of Titus
• Service Autoscaling and global traffic integration
• Service/Batch SLA management
• Capacity guarantees, fair shares and pre-emption
• Trough / Internal spot market management
• Exposing pods to users
• More use cases and scale
Questions?
Andrew Spyker (@aspyker)
Thank you!
Remember to complete
your evaluations!
NetflixOSS Meetup S6E2 - Spinnaker, KayentaNetflixOSS Meetup S6E2 - Spinnaker, Kayenta
NetflixOSS Meetup S6E2 - Spinnaker, Kayenta
aspyker
 
NetflixOSS Meetup S6E1 - Titus & Containers
NetflixOSS Meetup S6E1 - Titus & ContainersNetflixOSS Meetup S6E1 - Titus & Containers
NetflixOSS Meetup S6E1 - Titus & Containers
aspyker
 
SRECon Lightning Talk
SRECon Lightning TalkSRECon Lightning Talk
SRECon Lightning Talk
aspyker
 
Netflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open SourceNetflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open Source
aspyker
 
Netflix OSS Meetup Season 5 Episode 1
Netflix OSS Meetup Season 5 Episode 1Netflix OSS Meetup Season 5 Episode 1
Netflix OSS Meetup Season 5 Episode 1
aspyker
 
Series of Unfortunate Netflix Container Events - QConNYC17
Series of Unfortunate Netflix Container Events - QConNYC17Series of Unfortunate Netflix Container Events - QConNYC17
Series of Unfortunate Netflix Container Events - QConNYC17
aspyker
 
Netflix Container Scheduling and Execution - QCon New York 2016
Netflix Container Scheduling and Execution - QCon New York 2016Netflix Container Scheduling and Execution - QCon New York 2016
Netflix Container Scheduling and Execution - QCon New York 2016
aspyker
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
aspyker
 
Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016
aspyker
 
Netflix Open Source Meetup Season 4 Episode 1
Netflix Open Source Meetup Season 4 Episode 1Netflix Open Source Meetup Season 4 Episode 1
Netflix Open Source Meetup Season 4 Episode 1
aspyker
 
CS80A Foothill College Open Source Talk
CS80A Foothill College Open Source TalkCS80A Foothill College Open Source Talk
CS80A Foothill College Open Source Talk
aspyker
 
Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015
aspyker
 
Netflix Open Source Meetup Season 3 Episode 2
Netflix Open Source Meetup Season 3 Episode 2Netflix Open Source Meetup Season 3 Episode 2
Netflix Open Source Meetup Season 3 Episode 2
aspyker
 
Netflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open SourceNetflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open Source
aspyker
 
NetflixOSS and ZeroToDocker Talk
NetflixOSS and ZeroToDocker TalkNetflixOSS and ZeroToDocker Talk
NetflixOSS and ZeroToDocker Talk
aspyker
 
Docker Demo IBM Impact 2014
Docker Demo IBM Impact 2014Docker Demo IBM Impact 2014
Docker Demo IBM Impact 2014
aspyker
 

More from aspyker (20)

Herding Kats - Netflix’s Journey to Kubernetes Public
Herding Kats - Netflix’s Journey to Kubernetes PublicHerding Kats - Netflix’s Journey to Kubernetes Public
Herding Kats - Netflix’s Journey to Kubernetes Public
 
Season 7 Episode 1 - Tools for Data Scientists
Season 7 Episode 1 - Tools for Data ScientistsSeason 7 Episode 1 - Tools for Data Scientists
Season 7 Episode 1 - Tools for Data Scientists
 
CMP376 - Another Week, Another Million Containers on Amazon EC2
CMP376 - Another Week, Another Million Containers on Amazon EC2CMP376 - Another Week, Another Million Containers on Amazon EC2
CMP376 - Another Week, Another Million Containers on Amazon EC2
 
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemons
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and DaemonsQConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemons
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemons
 
NetflixOSS Meetup S6E2 - Spinnaker, Kayenta
NetflixOSS Meetup S6E2 - Spinnaker, KayentaNetflixOSS Meetup S6E2 - Spinnaker, Kayenta
NetflixOSS Meetup S6E2 - Spinnaker, Kayenta
 
NetflixOSS Meetup S6E1 - Titus & Containers
NetflixOSS Meetup S6E1 - Titus & ContainersNetflixOSS Meetup S6E1 - Titus & Containers
NetflixOSS Meetup S6E1 - Titus & Containers
 
SRECon Lightning Talk
SRECon Lightning TalkSRECon Lightning Talk
SRECon Lightning Talk
 
Netflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open SourceNetflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open Source
 
Netflix OSS Meetup Season 5 Episode 1
Netflix OSS Meetup Season 5 Episode 1Netflix OSS Meetup Season 5 Episode 1
Netflix OSS Meetup Season 5 Episode 1
 
Series of Unfortunate Netflix Container Events - QConNYC17
Series of Unfortunate Netflix Container Events - QConNYC17Series of Unfortunate Netflix Container Events - QConNYC17
Series of Unfortunate Netflix Container Events - QConNYC17
 
Netflix Container Scheduling and Execution - QCon New York 2016
Netflix Container Scheduling and Execution - QCon New York 2016Netflix Container Scheduling and Execution - QCon New York 2016
Netflix Container Scheduling and Execution - QCon New York 2016
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016
 
Netflix Open Source Meetup Season 4 Episode 1
Netflix Open Source Meetup Season 4 Episode 1Netflix Open Source Meetup Season 4 Episode 1
Netflix Open Source Meetup Season 4 Episode 1
 
CS80A Foothill College Open Source Talk
CS80A Foothill College Open Source TalkCS80A Foothill College Open Source Talk
CS80A Foothill College Open Source Talk
 
Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015
 
Netflix Open Source Meetup Season 3 Episode 2
Netflix Open Source Meetup Season 3 Episode 2Netflix Open Source Meetup Season 3 Episode 2
Netflix Open Source Meetup Season 3 Episode 2
 
Netflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open SourceNetflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open Source
 
NetflixOSS and ZeroToDocker Talk
NetflixOSS and ZeroToDocker TalkNetflixOSS and ZeroToDocker Talk
NetflixOSS and ZeroToDocker Talk
 
Docker Demo IBM Impact 2014
Docker Demo IBM Impact 2014Docker Demo IBM Impact 2014
Docker Demo IBM Impact 2014
 

Recently uploaded

Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
ViralQR
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 

Recently uploaded (20)

Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 

re:Invent 2016: Container Scheduling, Execution and AWS Integration

  • 1. Container Scheduling, Execution and AWS Integration
    Andrew Spyker (@aspyker), 12/1/2016
    © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 2. What to Expect from the Session
    • Why containers?
      • Including current use cases and scale
    • How did we get there?
      • Overview of our container cloud platform
    • Collaboration with ECS
  • 3. About Netflix
    • 86.7M members
    • 1000+ developers
    • 190+ countries
    • More than ⅓ of North American internet download traffic
    • 500+ microservices
    • Over 100,000 VMs
    • 3 regions across the world
  • 4. Why containers?
    Given a VM architecture that is already amazingly resilient, microservice-driven, cloud-native, CI/CD and DevOps enabled, and elastically scalable, do we really need containers?
  • 5. Our Container System Provides Innovation Velocity
    • Iterative local development; deploy when ready
    • Manage the app and its dependencies easily and completely
    • A simpler way to express resource needs; let the system manage them
  • 6. Innovation Velocity: Use Cases
    • Media encoding: encoding research development time
      • Using VMs: 1 month; using containers: 1 week
    • Niagara
      • Build all Netflix codebases in hours
      • Saves developers hundreds of hours of debugging
    • Edge rearchitecture with NodeJS
      • Focus returns to app development
      • Simplifies and speeds up test and deployment
  • 7. Why not use an existing container management solution?
    • Most solutions are focused on the datacenter
    • Most solutions are
      • Working to abstract the datacenter and cross-cloud
      • Delivering more than a cluster manager
      • Not yet at our level of scale
    • Wanted to leverage our existing cloud platform
    • Not appropriate for Netflix
  • 9. What do batch users want?
    • Simple shared resources, run till done, job files
    • NOT
      • EC2 instance sizes, autoscaling, AMI OSes
    • WHY
      • Offloads resource management ops; simpler
  • 10. Historic use of containers
    • General Workflow (Meson), Stream Processing (Mantis)
    • Proven using cgroups and Mesos
      • With simple isolation
      • Using specific packaging formats
    Diagram: Linux cgroups
  • 11. Enter Titus
    Diagram: Job Management (Batch); Resource Management & Optimization; Container Execution; Integration
  • 12. Sample batch use cases
    • Algorithm model training
  • 13. GPU usage
    • Personalization and recommendation
    • Deep learning with neural nets/mini-batch
    • Titus
      • Added g2 support using nvidia-docker-plugin
      • Mounts NVIDIA drivers and devices into the Docker container
      • Made distribution of training jobs and infrastructure self-service
    • Recently moved to p2.8xlarge instances
      • 2X performance improvement with the same CUDA-based code
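Slide 13 says nvidia-docker-plugin mounts the NVIDIA drivers and devices into the container. The sketch below shows only the general idea, using plain `docker run` flags: it passes the NVIDIA device nodes with `--device` and bind-mounts a driver directory read-only. It does not reproduce what nvidia-docker-plugin or Titus actually emit. The host driver directory, the `/usr/local/nvidia` mount point and the image name are assumptions.

```python
from typing import List


def gpu_docker_args(gpu_indices: List[int], driver_dir: str, image: str) -> List[str]:
    """Build an illustrative `docker run` argv exposing NVIDIA GPUs to a container.

    Assumption (hypothetical): `driver_dir` is a host directory holding the
    NVIDIA user-space driver libraries, mounted read-only at /usr/local/nvidia.
    """
    args = ["docker", "run", "--rm"]
    # Control and unified-memory device nodes are shared by all GPUs.
    for dev in ("/dev/nvidiactl", "/dev/nvidia-uvm"):
        args.append(f"--device={dev}")
    # One device node per GPU assigned to this container.
    for i in gpu_indices:
        args.append(f"--device=/dev/nvidia{i}")
    # Driver libraries must match the host kernel module, so mount them from the host.
    args.append(f"--volume={driver_dir}:/usr/local/nvidia:ro")
    args.append(image)
    return args
```

For example, `gpu_docker_args([0, 1], "/opt/nvidia-driver", "trainer:latest")` yields one `--device` flag per GPU plus the shared control nodes.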
  • 14. Sample batch use cases
    • Media encoding experimentation
    • Digital watermarking
  • 15. Sample batch use cases
    • Ad hoc reporting
    • Open Connect CDN reporting
  • 16. Lessons learned from batch
    • Docker helped generalize use cases
    • Cluster autoscaling adds efficiency
    • Advanced scheduling required
    • Initially ignored failures (with retries)
    • Time-sensitive batch came later
  • 17. Titus Batch Usage (week of 11/7)
    • Started ~300,000 containers during the week
    • Peak of 1000 containers per minute
    • Peak of 3,000 instances (mix of r3.8xlarge and m4.4xlarge)
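To see how bursty this batch load is, compare the weekly average launch rate with the 1000-per-minute peak. This is a back-of-the-envelope calculation. It assumes the ~300,000 starts were spread over a full 7-day week, so the result is approximate.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes


def average_per_minute(total_starts: int, minutes: int = MINUTES_PER_WEEK) -> float:
    """Average container starts per minute over the period."""
    return total_starts / minutes


avg = average_per_minute(300_000)
peak = 1000
print(f"average about {avg:.1f}/min, peak/average about {peak / avg:.1f}x")
# prints: average about 29.8/min, peak/average about 33.6x
```

A peak roughly 34 times the average is why cluster autoscaling (slide 16) pays off: capacity sized for the peak would sit mostly idle.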
  • 19. Adding Services to Titus
    Diagram: Job Management (Batch, Service); Resource Management & Optimization; Container Execution; Integration
  • 20. Services are just long running batch, right?
  • 21. Services are more complex
    • Services resize constantly and run forever
      • Autoscaling
      • Hard to upgrade underlying hosts
    • Services have more state
      • Ready for traffic vs. just started/stopped
      • Even harder to upgrade
    • Existing well-defined dev, deploy, runtime & ops tools
  • 23. Multi-Tenant Networking is Hard
    • IP per container
    • Security group support
    • IAM role support
    • Network bandwidth isolation
  • 24. Solutions
    • VPC networking driver
      • Supports ENIs with full IP functionality
      • With scheduling: security groups
      • Supports traffic control (isolation)
    • EC2 metadata proxy
      • Adds container "node" identity
      • Delivers IAM roles
  • 25. VPC Networking Integration with Docker
    Diagram: Titus Executor, Titus Networking Driver
    • Create and attach an ENI with
      • security group
      • IP address
    • Create net namespace
  • 26. VPC Networking Integration with Docker
    Diagram: Titus Executor, Titus Networking Driver, Docker, Pod Root Container
    • Launch a "pod root" container with
      • IP address
      • Using a "pause" container
      • Using net=none
    • Create net namespace
  • 27. VPC Networking Integration with Docker
    Diagram: Titus Executor, Titus Networking Driver, Pod Root Container (pod_root_id)
    • Create a virtual ethernet (veth) pair
    • Configure routing rules
    • Configure iptables NAT for the metadata proxy
    • Configure traffic control for bandwidth
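The "iptables NAT for the metadata proxy" step on slide 27 can be sketched as a single DNAT rule. Traffic that a container sends to 169.254.169.254:80 is rewritten to the proxy on host_ip:9999 (the port shown on slide 29). The rule matches on the host side of the container's veth, so each container's metadata requests are captured separately. The interface name and host IP below are hypothetical, and this is not Titus's actual code.

```python
import ipaddress
from typing import List

METADATA_IP = "169.254.169.254"
PROXY_PORT = 9999  # port label taken from the slide 29 diagram


def metadata_dnat_rule(host_veth: str, host_ip: str, proxy_port: int = PROXY_PORT) -> List[str]:
    """Return the iptables argv redirecting one container's metadata traffic to the proxy."""
    ipaddress.ip_address(host_ip)  # raises ValueError on a malformed address
    return [
        "iptables", "-t", "nat", "-A", "PREROUTING",
        "-i", host_veth,                      # only packets arriving from this container's veth
        "-p", "tcp",
        "-d", METADATA_IP, "--dport", "80",   # requests aimed at the EC2 metadata service
        "-j", "DNAT", "--to-destination", f"{host_ip}:{proxy_port}",
    ]
```

Returning an argv list rather than a shell string avoids quoting problems if it is later passed to `subprocess.run`.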
  • 28. VPC Networking Integration with Docker
    Diagram: Titus Executor, Docker, Pod Root Container (pod_root_id), App Container
    • Create the app container with --net=container:pod_root_id, so it joins the pod root's network namespace
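Slides 26 to 28 use a two-container pattern. A "pause" pod root is started with `--net=none`, so Docker creates its network namespace with only a loopback interface and the networking driver can wire in the ENI address. The app container then shares that namespace through `--net=container:<id>`. A minimal sketch of the two `docker run` invocations follows. The pause image name and container names are hypothetical.

```python
from typing import List


def pod_root_cmd(pause_image: str, name: str) -> List[str]:
    """Start the pod root: it only holds the network namespace.

    With --net=none Docker gives the namespace just a loopback interface;
    the networking driver is expected to add the veth/IP afterwards.
    """
    return ["docker", "run", "-d", "--name", name, "--net=none", pause_image]


def app_container_cmd(pod_root: str, image: str, *cmd: str) -> List[str]:
    """Start the app container inside the pod root's network namespace
    (same interfaces, IP address and routes)."""
    return ["docker", "run", "-d", f"--net=container:{pod_root}", image, *cmd]
```

Keeping the namespace in a separate, do-nothing container means the app container can restart without losing its IP address or routing setup.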
  • 29. Metadata Proxy
    Diagram: container requests to 169.254.169.254:80 on veth<id> are NATed by iptables to host_ip:9999, where the Titus Metadata Proxy sits in front of the Amazon Metadata Service (169.254.169.254)
    • What is my IP, instance ID, hostname? Return the Titus-assigned values
    • What is my AMI, instance type, etc.? Unknown
    • Give me my role credentials: assume the container's role, return those credentials
    • Give me anything else: proxy
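The proxy's decision table on slide 29 maps naturally onto request paths of the EC2 instance metadata service. A minimal routing sketch follows. The paths are standard IMDS keys. Which keys fall into each bucket is our assumption for illustration, not Titus's actual list.

```python
from enum import Enum


class Action(Enum):
    TITUS_ASSIGNED = "return the Titus-assigned value"
    UNKNOWN = "hide host details (respond not found)"
    ASSUME_ROLE = "assume the container's role, return its credentials"
    PROXY = "forward to the real metadata service"


PREFIX = "/latest/meta-data/"
# Assumed bucketing of IMDS keys, following the slide's categories.
IDENTITY_KEYS = {"local-ipv4", "instance-id", "hostname", "local-hostname"}
HIDDEN_KEYS = {"ami-id", "instance-type"}


def route(path: str) -> Action:
    """Decide how the proxy answers one metadata request path."""
    if not path.startswith(PREFIX):
        return Action.PROXY  # e.g. /latest/user-data or dynamic documents
    key = path[len(PREFIX):]
    if key in IDENTITY_KEYS:
        return Action.TITUS_ASSIGNED
    if key in HIDDEN_KEYS:
        return Action.UNKNOWN
    if key.startswith("iam/security-credentials"):
        return Action.ASSUME_ROLE
    return Action.PROXY
```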
  • 30. Putting it all together
    Diagram: one virtual machine host
    • ENIs: ENI1 (sg=A, non-routable IP), ENI2 (sg=X), ENI3 (sg=Y,Z)
    • Container IPs: IP1 (sg=X), IP2 (sg=X), IP3 (sg=Y,Z), and a non-routable IP (sg=A)
    • Containers 1 to 4: each an app container plus a pod root, connected through veth<id>
    • Linux policy-based routing + traffic control on the host
    • Metadata proxy reached via NAT of 169.254.169.254
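Because containers on one host sit behind different ENIs with different security groups, the host needs policy-based routing. Traffic sourced from a container's IP must leave through the ENI that owns that IP. A minimal sketch generating the usual iproute2 pair follows: a rule selecting a per-ENI routing table by source address, and a default route in that table. The IPs, gateway, device and table number are hypothetical, and this is not Titus's code.

```python
import ipaddress
from typing import List


def eni_routing_cmds(container_ip: str, gateway: str, device: str, table: int) -> List[List[str]]:
    """Return `ip rule` / `ip route` argvs sending a container's traffic out its own ENI."""
    ip = ipaddress.ip_address(container_ip)
    gw = ipaddress.ip_address(gateway)
    # Stay clear of the reserved tables 0 and 253-255 (unspec, default, main, local);
    # this sketch simply restricts ids to 1-252.
    if not 1 <= table <= 252:
        raise ValueError("table id must be between 1 and 252")
    return [
        # Packets whose source is the container IP consult the per-ENI table...
        ["ip", "rule", "add", "from", str(ip), "table", str(table)],
        # ...whose default route leaves through that ENI's device.
        ["ip", "route", "add", "default", "via", str(gw), "dev", device, "table", str(table)],
    ]
```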
  • 31. Additional AWS Integrations
    • Log file access, both live and rotated to S3
    • Multi-tenant resource isolation (disk)
    • Environmental context
    • Automatic instance type selection
    • Elastic scaling of the underlying resource pool
  • 32. Netflix Infrastructure Integration
    • Spinnaker CI/CD
    • Atlas telemetry
    • Discovery/IPC
    • Edda (and dependent systems)
    • Healthcheck, system metrics pollers
    • Chaos testing
  • 33. Why? A single consistent cloud platform
    Diagram: three stacks on VPC and EC2 virtual machines, sharing Edda, Eureka and Atlas
    • AWS Autoscaler: VMs running service applications with cloud platform libraries (metrics, IPC, health)
    • Titus Job Control: containers running service applications with cloud platform libraries (metrics, IPC, health)
    • Titus Job Control: containers running batch applications with cloud platform libraries (metrics, IPC)
  • 35. Deploy Based On New Docker Registry Tags
  • 36. Deployment strategies same as ASGs
    • IAM roles and security groups per container
    • Basic resource requirements
  • 40. Fenzo: the heart of Titus scheduling
    Extensible library for scheduling frameworks
    • Plugin-based scheduling objectives
      • Bin packing, etc.
    • Heterogeneous resources & tasks
    • Cluster autoscaling
      • Multiple instance types
    • Plugin-based constraints evaluator
      • Resource affinity, task locality, etc.
    • Single offer mode added in support of ECS
  • 41. Fenzo scheduling strategy
    For each task
      On each host
        Validate hard constraints
        Evaluate fitness and soft constraints
      Until fitness is "good enough" and a minimum number of hosts has been evaluated
    Constraints and fitness functions are plugins
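The loop on slide 41 can be sketched as a greedy scheduler. For each task it skips hosts that fail a hard constraint and scores the rest with a fitness plugin. It stops early once the best score is good enough and at least `min_hosts` feasible hosts have been scored. Fenzo itself is a Java library, so this Python sketch is not its API. Soft constraints are folded into the fitness function, and the `fits`/`cpu_packing` plugins and thresholds are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class Host:
    name: str
    total_cpu: float
    free_cpu: float
    free_mem: float


@dataclass
class Task:
    name: str
    cpu: float
    mem: float


HardConstraint = Callable[[Task, Host], bool]
Fitness = Callable[[Task, Host], float]  # 0.0 = worst fit, 1.0 = best fit


def fits(task: Task, host: Host) -> bool:
    """Hard constraint plugin: the host has enough free CPU and memory."""
    return task.cpu <= host.free_cpu and task.mem <= host.free_mem


def cpu_packing(task: Task, host: Host) -> float:
    """Bin-packing fitness plugin: fraction of host CPU in use after placement."""
    return (host.total_cpu - host.free_cpu + task.cpu) / host.total_cpu


def pick_host(task: Task, hosts: Sequence[Host], hard: List[HardConstraint],
              fitness: Fitness, good_enough: float = 0.9,
              min_hosts: int = 2) -> Optional[Host]:
    best: Optional[Host] = None
    best_score = -1.0
    evaluated = 0  # feasible hosts scored so far
    for host in hosts:
        if not all(check(task, host) for check in hard):
            continue
        score = fitness(task, host)
        evaluated += 1
        if score > best_score:
            best, best_score = host, score
        # Stop once the best fit is good enough and enough hosts were seen.
        if best_score >= good_enough and evaluated >= min_hosts:
            break
    return best


def schedule(tasks: Sequence[Task], hosts: Sequence[Host], hard: List[HardConstraint],
             fitness: Fitness, **kwargs: float) -> Dict[str, str]:
    placements: Dict[str, str] = {}
    for task in tasks:
        host = pick_host(task, hosts, hard, fitness, **kwargs)
        if host is None:
            continue  # left unplaced; a real scheduler might retry or scale up
        host.free_cpu -= task.cpu
        host.free_mem -= task.mem
        placements[task.name] = host.name
    return placements
```

The early exit trades placement quality for speed, which matters when a scheduling iteration must handle hundreds of tasks across thousands of hosts.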
  • 42. Scheduling: Capacity Guarantees
    Titus maintains …
    • Critical tier
      • Guaranteed capacity & start latencies
    • Flex tier
      • More dynamic capacity & variable start latency
    Diagram: Titus Master Scheduler (Fenzo); Desired and Max capacity levels
  • 43. Scheduling: Bin Packing, Elastic Scaling
    User adds work tasks
    • Titus does bin packing to ensure that we can downscale entire hosts efficiently
    Diagram: Titus Master Scheduler (Fenzo); Min, Desired and Max capacity levels; emptied hosts marked ✖ "Can terminate"
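Why bin packing enables downscaling: packing tasks onto as few hosts as possible leaves whole hosts empty, and only empty hosts can be terminated. Spreading the same tasks leaves no empty host at all. The sketch below compares first-fit packing with round-robin spreading for unit-sized tasks on identical hosts. The `minimum` floor stands in for the Min level in the slide's diagram. It is a simplification, not Fenzo's algorithm.

```python
from typing import List


def place(num_tasks: int, num_hosts: int, capacity: int, strategy: str) -> List[int]:
    """Place unit-sized tasks on identical hosts; return the task count per host."""
    if num_tasks > num_hosts * capacity:
        raise ValueError("not enough capacity")
    load = [0] * num_hosts
    for i in range(num_tasks):
        if strategy == "pack":
            # First fit: the lowest-numbered host that still has room.
            h = next(j for j in range(num_hosts) if load[j] < capacity)
        elif strategy == "spread":
            h = i % num_hosts  # round robin
        else:
            raise ValueError(f"unknown strategy {strategy!r}")
        load[h] += 1
    return load


def terminable(load: List[int], minimum: int) -> List[int]:
    """Indices of idle hosts that can be terminated without going below `minimum` hosts."""
    idle = [h for h, n in enumerate(load) if n == 0]
    return idle[:max(0, len(load) - minimum)]
```

With 10 tasks on 4 hosts of capacity 8, packing gives `[8, 2, 0, 0]`, so two hosts can go. Spreading gives `[3, 3, 2, 2]`, so none can.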
  • 44. Scheduling: Constraints, including AZ Balancing
    User specifies constraints
    • AZ balancing
    • Resource and task affinity
    • Hard and soft
    Diagram: Titus Master Scheduler (Fenzo) placing tasks across Availability Zone A and Availability Zone B; Min and Desired capacity levels
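AZ balancing works well as a soft constraint. It does not forbid any zone, but scores a candidate host higher when its zone currently runs the fewest of the job's tasks. A minimal scoring sketch follows, normalized to [0, 1] so it could be combined with a fitness function like the one for slide 41. The scoring formula is our assumption, not Titus's.

```python
from collections import Counter
from typing import Iterable


def az_balance_score(candidate_az: str, job_task_azs: Iterable[str], all_azs: Iterable[str]) -> float:
    """Soft-constraint score: 1.0 if the candidate AZ has the fewest of this job's
    tasks, falling to 0.0 for the most loaded AZ."""
    counts = Counter({az: 0 for az in all_azs})  # include AZs with no tasks yet
    counts.update(job_task_azs)
    lowest, highest = min(counts.values()), max(counts.values())
    if highest == lowest:
        return 1.0  # already balanced: every AZ is equally good
    return 1.0 - (counts[candidate_az] - lowest) / (highest - lowest)
```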
  • 45. Scheduling: Rolling New Titus Code
    Operator updates the Titus agent codebase
    • New scheduling on new cluster
    • Batch jobs drain
    • Service tasks are migrated via Spinnaker pipelines
    • Old cluster autoscales down
    Diagram: Titus Master Scheduler (Fenzo); ASG version 001 (old) and ASG version 002 (new), each with Min and Desired levels; old hosts marked ✖
  • 46. Current Service Usage
    • Approach
      • Started with internal applications
      • Moved on to line-of-fire NodeJS (shadow first, prod 1Q17)
      • Moved on to stream processing (prod 4Q)
    • Current: ~2000 long-running containers
    Timeline: 1Q batch; 2Q service pre-prod; 3Q service shadow; 4Q service prod
  • 48. Why ECS?
    • Decrease operational overhead of underlying cluster state management
    • Allow open source collaboration on the ECS Agent
    • Work with Amazon and others on EC2 enablement
      • GPUs, VPC, security groups, IAM roles, etc.
      • Over time, this enablement should result in less maintenance
  • 49. Titus Today
    Diagram: Titus Scheduler and Mesos master; container host running mesos-agent, the Titus executor and containers; EC2 integration
    • Outbound
      • Launch/terminate container
      • Reconciliation
    • Inbound
      • Container host events (and offers)
      • Container events
  • 50. First Titus ECS Implementation
    Diagram: Titus Scheduler and ECS; container host running the ECS agent, the Titus executor and containers; EC2 integration
    • Outbound
      • Launch/terminate container
      • Polling for
        • Container host events
        • Container events
  • 51. Collaboration with ECS team starts
    • Collaboration on an ECS "event stream" that could provide
      • "Real time" task & container instance state changes
      • An event-based architecture, more scalable than polling
    • Great engineering collaboration
      • Face-to-face focus
      • Monthly interlocks
      • Engineer-to-engineer focused
  • 52. Current Titus ECS Implementation
    Diagram: Titus Scheduler and ECS; container host running the ECS agent, the Titus executor and containers; EC2 integration; CloudWatch Events and SQS
    • Outbound
      • Launch/terminate container
      • Reconciliation
    • Inbound
      • Container host events
      • Container events
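Slide 52 replaces polling with events delivered through CloudWatch Events and SQS. A minimal sketch of the consumer side follows. It assumes each SQS message body is the raw CloudWatch Events JSON for an ECS event: `detail-type` "ECS Task State Change", with `taskArn`, `lastStatus` and a `version` counter inside `detail`. Standard SQS queues deliver at least once and do not guarantee ordering, so the sketch keeps only the newest version seen per task. Receiving and deleting messages (for example with boto3's `receive_message`/`delete_message`) is left out. `TaskState` and `apply_event` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TaskState:
    task_arn: str
    last_status: str
    version: int


def parse_task_event(body: str) -> Optional[TaskState]:
    """Extract task state from an ECS task state change event; ignore other events."""
    event = json.loads(body)
    if event.get("source") != "aws.ecs" or event.get("detail-type") != "ECS Task State Change":
        return None  # e.g. container instance events would be handled elsewhere
    detail = event["detail"]
    return TaskState(detail["taskArn"], detail["lastStatus"], int(detail.get("version", 0)))


def apply_event(state: Dict[str, TaskState], ev: TaskState) -> bool:
    """Record the event unless we already hold the same or a newer version."""
    current = state.get(ev.task_arn)
    if current is not None and current.version >= ev.version:
        return False  # duplicate or out-of-order delivery
    state[ev.task_arn] = ev
    return True
```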
  • 53. Analysis: Periodic Reconciliation
    For tasks in listTasks
      describeTasks (batches of 100)
    • Number of API calls: 1 + num tasks / 100 per reconcile
    • 1280 containers across 40 nodes
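For the slide's 1280 containers, the formula gives 1 + ⌈1280/100⌉ = 14 calls per reconcile. That count treats the listing as a single `listTasks` call. If each `listTasks` page returns at most 100 ARNs (the page limit as we understand the ECS API), listing 1280 tasks takes 13 pages, and a full pass costs 26 calls. A minimal sketch of the loop follows. `client` is assumed to be anything with boto3-style `list_tasks(cluster=..., nextToken=...)` and `describe_tasks(cluster=..., tasks=[...])` methods, such as `boto3.client("ecs")`.

```python
import math
from typing import Any, Dict, List, Tuple

DESCRIBE_BATCH = 100  # describeTasks accepts up to 100 task ARNs per call


def slide_estimate(num_tasks: int) -> int:
    """The slide's count: one listTasks call plus one describeTasks call per 100 tasks."""
    return 1 + math.ceil(num_tasks / DESCRIBE_BATCH)


def reconcile(client: Any, cluster: str) -> Tuple[List[Dict], int]:
    """List every task ARN, then describe them in batches; return (tasks, API calls made)."""
    calls = 0
    arns: List[str] = []
    kwargs: Dict[str, str] = {"cluster": cluster}
    while True:  # follow nextToken until the listing is exhausted
        page = client.list_tasks(**kwargs)
        calls += 1
        arns.extend(page.get("taskArns", []))
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token
    tasks: List[Dict] = []
    for i in range(0, len(arns), DESCRIBE_BATCH):
        resp = client.describe_tasks(cluster=cluster, tasks=arns[i:i + DESCRIBE_BATCH])
        calls += 1
        tasks.extend(resp.get("tasks", []))
    return tasks, calls
```

Either way, the call count grows linearly with the number of containers for every reconcile pass. This is why the event stream on slide 51 matters at scale.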
  • 54. Analysis: Scheduling
    • Number of API calls: 2X number of tasks
      • registerTaskDefinition and startTask
    • Largest Titus historical job
      • 1000 tasks per minute
      • Possible with increased rate limits
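The arithmetic behind "possible with increased rate limits": at two API calls per task, the largest historical job's 1000 tasks per minute means 2000 calls per minute. That works out to about 33 calls per second sustained against the ECS control plane.

```python
CALLS_PER_TASK = 2  # registerTaskDefinition + startTask, per the slide


def required_call_rate(tasks_per_minute: float) -> float:
    """API calls per second needed to launch tasks at the given rate."""
    return tasks_per_minute * CALLS_PER_TASK / 60.0


rate = required_call_rate(1000)
print(f"{rate:.1f} calls/s")  # prints: 33.3 calls/s
```

Batching the two calls, as proposed on slide 55, would halve this rate.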
  • 55. Continued areas of scheduling collaboration
    • Combining/batching registerTaskDefinition and startTask
    • More resource types in the control plane
      • Disk, network bandwidth, ENIs
      • To fit with existing scheduler approach
    • Extensible message fields in task state transitions
    • Named tasks (beyond ARNs) for terminate
    • Starting vs. Started state
  • 56. Possible phases of ECS support in Titus
    • Work in progress
      • ECS completing scheduling collaboration items
      • Complete transition to ECS for overall cluster manager
        • Allows us to contribute Netflix cloud platform and EC2 integration points to the open source ECS agent
    • Future
      • Provide Fenzo as the ECS task placement service
      • Extend Titus Job Management features to ECS
  • 58. Future Strategy of Titus
    • Service autoscaling and global traffic integration
    • Service/batch SLA management
    • Capacity guarantees, fair shares and pre-emption
    • Trough / internal spot market management
    • Exposing pods to users
    • More use cases and scale

Editor's Notes

  1. Will talk about how this led to rate limiting: com.amazonaws.services.ecs.model.AmazonECSException: Rate exceeded (Service: AmazonECS; Status Code: 400; Error Code: ThrottlingException; Request ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
  2. Talking point: We were able to do this with our existing scheduler and task placement service (Fenzo) due to our architecture.
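The ThrottlingException in note 1 is the usual signal to back off and retry. AWS SDKs typically retry throttled calls on their own. The sketch below just makes the idea explicit: exponential backoff with "full jitter", meaning a random sleep up to a bound that doubles on each attempt and is capped. `ThrottledError` is a hypothetical stand-in for an SDK error whose code is ThrottlingException. The attempt count, base delay and cap are illustrative.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for an SDK error with code ThrottlingException."""


def with_backoff(call: Callable[[], T], max_attempts: int = 5, base: float = 0.1,
                 cap: float = 5.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying on throttling with capped, jittered exponential backoff."""
    attempt = 0
    while True:
        try:
            return call()
        except ThrottledError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # give up and surface the throttling error
            # Full jitter: random delay in [0, min(cap, base * 2^(attempt-1))].
            sleep(random.uniform(0, min(cap, base * 2 ** (attempt - 1))))
```

Jitter spreads retries from many concurrent schedulers over time, so they do not all hit the rate limit again at the same moment.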