Heterogeneous Resource Scheduling Using
Apache Mesos for Cloud Native Frameworks
Sharma Podila
Senior Software Engineer
Netflix
Aug 20th
MesosCon 2015
Agenda
● Context, motivation
● Fenzo scheduler library
● Usage at Netflix
● Future direction
Why use Apache Mesos in a cloud?
Resource granularity
Application start latency
A tale of two frameworks
Reactive stream processing
Container deployment and management
Reactive stream processing, Mantis
● Cloud native
● Lightweight, dynamic jobs
○ Stateful, multi-stage
○ Real time, anomaly detection, etc.
● Task placement constraints
○ Cloud constructs
○ Resource utilization
● Service and batch style
Mantis job topology
[Figure labels: Job, Worker, SourceApp, Stage1, Stage2, Stage3, Sink]
A job is a set of 1 or more stages
A stage is a set of 1 or more workers
A worker is a Mesos task
Container management, Titan
● Cloud native
● Service and batch workloads
● Jobs with multiple sets of container tasks
● Container placement constraints
○ Cloud constructs
○ Resource affinity
○ Task locality
Container scheduling model
[Figure labels: Job 1, Job 2, Set 0, Set 1]
Co-locate tasks from multiple task sets
Why develop a new framework?
Easy to write a new framework?
What about scale?
Performance?
Fault tolerance?
Availability?
And scheduling is a hard problem to solve
Long term justification is needed to
create a new Mesos framework
Our motivations for new framework
● Cloud native
(autoscaling)
● Customizable task placement optimizations
(Mix of service, batch, and stream topologies)
Cluster autoscaling challenge
[Figure: Host 1 to Host 4 shown in two alternative states, compared with "vs."]
For long running stateful services
Components of a mesos framework
API for users to interact
Be connected to Mesos via the driver
Compute resource assignments for tasks
Fenzo
A common scheduling library for Mesos
frameworks
Fenzo usage in frameworks
[Figure labels: Mesos master, Mesos framework, Fenzo task scheduler, Persistence]
Inputs to Fenzo: task requests, available resource offers
Task assignment result
• Host1: Task1, Task2
• Host2: Task3, Task4
Fenzo scheduling library
● Heterogeneous resources
● Autoscaling of cluster
● Visibility of scheduler actions
● Plugins for constraints, fitness
● High speed
● Heterogeneous task requests
Announcing availability of Fenzo in
Netflix OSS suite
Fenzo details
Scheduling problem
N tasks to assign from M possible slaves
[Figure labels: Fitness, Urgency, Pending, Assigned]
Scheduling optimizations
Speed vs. accuracy: real world trade-offs
First fit assignment (speed): ~ O(1)
Optimal assignment (accuracy): ~ O(N * M) [1]
[1] Assuming tasks are not reassigned
Scheduling strategy
For each task
    On each host
        Validate hard constraints
        Eval fitness and soft constraints
    Until fitness good enough, and
        A minimum #hosts evaluated
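The loop above can be sketched in a few lines of Python. This is only an illustration of the strategy as written on the slide, not Fenzo's implementation: `pick_host`, `fits`, `packed_after_placement`, and the `good_enough` and `min_hosts` parameters are hypothetical names, soft constraints are assumed to be folded into the fitness score, and only hosts that pass the hard constraints are counted as evaluated.

```python
from typing import Callable, Optional, Sequence

Host = dict  # hypothetical host record, e.g. {"name": "h1", "used": 2, "total": 8}
Task = dict  # hypothetical task record, e.g. {"cpus": 2}


def pick_host(
    task: Task,
    hosts: Sequence[Host],
    hard_constraints: Sequence[Callable[[Task, Host], bool]],
    fitness: Callable[[Task, Host], float],
    good_enough: float = 0.9,
    min_hosts: int = 3,
) -> Optional[Host]:
    """Return the best host seen, stopping early once the score is good enough."""
    best_host, best_score, evaluated = None, -1.0, 0
    for host in hosts:
        # Hard constraints are pass/fail: a failing host is skipped outright.
        if not all(check(task, host) for check in hard_constraints):
            continue
        evaluated += 1
        score = fitness(task, host)  # assumed to include soft constraints
        if score > best_score:
            best_host, best_score = host, score
        # Stop when fitness is good enough AND enough hosts were looked at.
        if best_score >= good_enough and evaluated >= min_hosts:
            break
    return best_host


def fits(task: Task, host: Host) -> bool:
    """Hypothetical hard constraint: the host has enough free CPUs."""
    return host["used"] + task["cpus"] <= host["total"]


def packed_after_placement(task: Task, host: Host) -> float:
    """Hypothetical fitness: CPU utilization of the host once the task lands."""
    return (host["used"] + task["cpus"]) / host["total"]
```

Lowering `good_enough` and `min_hosts` pushes the loop toward first fit; raising them pushes it toward scanning every host.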
Task constraints
Soft
Hard
Extensible
Built-in Constraints
Host attribute value constraint
Task
HostAttrConstraint:instanceType=r3
Host1
Attr:instanceType=m3
Host2
Attr:instanceType=r3
Host3
Attr:instanceType=c3
Fenzo
Unique host attribute constraint
Task
UniqueAttr:zone
Host1
Attr:zone=1a
Host2
Attr:zone=1a
Host3
Attr:zone=1b
Fenzo
Balance host attribute constraint
Host1
Attr:zone=1a
Host2
Attr:zone=1b
Host3
Attr:zone=1c
Job with 9 tasks, BalanceAttr:zone
Fenzo
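One way to read the three built-in constraints is as predicates over a candidate host's attributes and the hosts already used by the job's other tasks. The sketch below is an interpretation of the slides, not Fenzo's API: the function names, the `co_task_hosts` argument, and the rule used for balancing (a value is allowed only while it is among the least used) are assumptions.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List

Attrs = Dict[str, str]                        # host attributes, e.g. {"zone": "1a"}
Constraint = Callable[[Attrs, List[Attrs]], bool]


def host_attr_value(attr: str, value: str) -> Constraint:
    """Host attribute value constraint: the host must carry attr=value."""
    def check(host: Attrs, co_task_hosts: List[Attrs]) -> bool:
        return host.get(attr) == value
    return check


def unique_attr(attr: str) -> Constraint:
    """Unique host attribute constraint: no two tasks of the job share a value."""
    def check(host: Attrs, co_task_hosts: List[Attrs]) -> bool:
        taken = {h.get(attr) for h in co_task_hosts}
        return host.get(attr) not in taken
    return check


def balance_attr(attr: str, known_values: Iterable[str]) -> Constraint:
    """Balance host attribute constraint: spread tasks evenly across values."""
    values = list(known_values)
    def check(host: Attrs, co_task_hosts: List[Attrs]) -> bool:
        counts = Counter(h.get(attr) for h in co_task_hosts)
        # Allow this value only if no other known value has fewer tasks.
        return counts[host.get(attr)] <= min(counts[v] for v in values)
    return check
```

With the hosts from the slides, `host_attr_value("instanceType", "r3")` accepts only Host2, `unique_attr("zone")` rejects Host2 once a sibling task runs on Host1, and `balance_attr("zone", ["1a", "1b", "1c"])` places a job with 9 tasks as 3 per zone.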
Fitness evaluation
Degree of fitness
Composable
Bin packing fitness calculator
fitness = usedCPUs / totalCPUs
Host:     Host1   Host2   Host3   Host4   Host5
Fitness:  0.25    0.5     0.75    1.0     0.0
[Figure: a ✔ marks the selected host]
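As a worked example of the formula, the sketch below reproduces the five fitness values from the slide and then picks the fullest host that can still take a task. The host capacity (4 CPUs each), the 1-CPU task, and the function names are assumptions made for illustration; they are not taken from Fenzo.

```python
from typing import Dict, Optional, Tuple


def cpu_bin_pack_fitness(used_cpus: float, total_cpus: float) -> float:
    """fitness = usedCPUs / totalCPUs, as on the slide."""
    return used_cpus / total_cpus


def choose_host(hosts: Dict[str, Tuple[float, float]], task_cpus: float) -> Optional[str]:
    """Pick the host with the highest fitness among those with room for the task."""
    candidates = [
        (cpu_bin_pack_fitness(used, total), name)
        for name, (used, total) in hosts.items()
        if used + task_cpus <= total  # a full host cannot take the task
    ]
    if not candidates:
        return None
    return max(candidates)[1]  # ties are broken by host name


# (used CPUs, total CPUs) per host; assumed values that yield the slide's numbers.
hosts = {
    "Host1": (1, 4),
    "Host2": (2, 4),
    "Host3": (3, 4),
    "Host4": (4, 4),
    "Host5": (0, 4),
}
fitness_by_host = {name: cpu_bin_pack_fitness(u, t) for name, (u, t) in hosts.items()}
best = choose_host(hosts, task_cpus=1)
```

Under these assumed numbers, Host4 scores highest but has no free CPU, so the sketch selects Host3.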
Composable fitness calculators
Fitness
= ( BinPackFitness * BinPackWeight +
RuntimePackFitness * RuntimeWeight
) / 2.0
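The formula above translates directly into code. In this minimal sketch the function and parameter names are hypothetical; the arithmetic is exactly what the slide shows. If both fitness values and both weights lie in [0, 1], the result also stays in [0, 1].

```python
def combined_fitness(
    bin_pack_fitness: float,
    runtime_pack_fitness: float,
    bin_pack_weight: float = 1.0,
    runtime_weight: float = 1.0,
) -> float:
    """Weighted combination of two fitness scores, divided by 2.0 as on the slide."""
    return (
        bin_pack_fitness * bin_pack_weight
        + runtime_pack_fitness * runtime_weight
    ) / 2.0


equal_weights = combined_fitness(0.75, 0.5)                       # (0.75 + 0.5) / 2
runtime_halved = combined_fitness(0.75, 0.5, runtime_weight=0.5)  # (0.75 + 0.25) / 2
```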
Cluster autoscaling in Fenzo
Autoscale rules:
ASG/Cluster: mantisagent, MinIdle: 8, MaxIdle: 20, CooldownSecs: 360
ASG/Cluster: computeCluster, MinIdle: 8, MaxIdle: 20, CooldownSecs: 360
Actions from Fenzo:
ScaleUp action: Cluster, N
ScaleDown action: Cluster, HostList
Rules based cluster autoscaling
● Set up rules per host attribute value
○ E.g., one autoscale rule per ASG/cluster, one cluster for network-intensive jobs, another for CPU/memory-intensive jobs
● Sample:
[Figure: # of idle hosts over time; dropping below min triggers scale up, rising above max triggers scale down]
Cluster Name    Min Idle Count    Max Idle Count    Cooldown Secs
NetworkClstr    5                 15                360
ComputeClstr    10                20                300
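A rule of this shape could be evaluated roughly as follows. The sketch is an illustration under stated assumptions, not Fenzo's API: `AutoscaleRule` and `evaluate_rule` are hypothetical names, scale up is assumed to request just enough hosts to get back to the minimum idle count, and scale down is assumed to return the idle hosts beyond the maximum. The returned tuples mirror the "Cluster, N" and "Cluster, HostList" actions named on the earlier slide.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Action = Tuple[str, str, Union[int, List[str]]]


@dataclass(frozen=True)
class AutoscaleRule:
    cluster: str
    min_idle: int
    max_idle: int
    cooldown_secs: int


def evaluate_rule(
    rule: AutoscaleRule,
    idle_hosts: List[str],
    now: float,
    last_action_at: Optional[float] = None,
) -> Optional[Action]:
    """Return a ScaleUp/ScaleDown action, or None if nothing should happen."""
    # Within the cooldown period the rule stays quiet.
    if last_action_at is not None and now - last_action_at < rule.cooldown_secs:
        return None
    idle = len(idle_hosts)
    if idle < rule.min_idle:
        # Assumed sizing: add enough hosts to reach the minimum idle count.
        return ("ScaleUp", rule.cluster, rule.min_idle - idle)
    if idle > rule.max_idle:
        # Assumed selection: terminate the idle hosts beyond the maximum.
        return ("ScaleDown", rule.cluster, list(idle_hosts[rule.max_idle:]))
    return None


network_rule = AutoscaleRule("NetworkClstr", min_idle=5, max_idle=15, cooldown_secs=360)
```

For example, `evaluate_rule(network_rule, ["h1", "h2", "h3"], now=1000.0)` asks for 2 more hosts, while the same call 100 seconds after a previous action returns `None`.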
Shortfall analysis based autoscaling
● Rule-based scale up has a cool down period
○ What if there’s a surge of incoming requests?
● Pending requests trigger shortfall analysis
○ Scale up happens regardless of cool down period
○ Remembers which tasks have already been covered
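The "remembers which tasks have already been covered" point can be sketched as a small tracker that only counts each pending task once. Everything here is hypothetical: the `ShortfallTracker` name, the fixed `cpus_per_host`, and the sizing rule (total pending CPUs divided by host size, rounded up), which is a crude lower bound that ignores fragmentation. No cooldown check appears because the slide says shortfall scale up happens regardless of it.

```python
import math
from typing import Dict, Set


class ShortfallTracker:
    """Sizes a scale up for pending tasks, counting each task only once."""

    def __init__(self, cpus_per_host: int) -> None:
        self.cpus_per_host = cpus_per_host
        self._covered: Set[str] = set()  # task IDs already accounted for

    def hosts_needed(self, pending: Dict[str, int]) -> int:
        """pending maps task ID to requested CPUs; returns extra hosts to request."""
        new_tasks = {tid: cpus for tid, cpus in pending.items() if tid not in self._covered}
        self._covered.update(new_tasks)
        if not new_tasks:
            return 0
        # Assumed sizing: a lower bound based on total CPUs requested.
        return math.ceil(sum(new_tasks.values()) / self.cpus_per_host)
```

A real tracker would also forget tasks once they are assigned or withdrawn; that bookkeeping is left out to keep the sketch short.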
Usage at Netflix
Cluster autoscaling
[Figure: number of Mesos slaves over time]
Scheduler run time in milliseconds (over a week)
Average 2 ms
Maximum 38 ms
Note: times can vary depending on # of tasks, # and types of constraints, and # of hosts
Experimenting with Fenzo
Note: Experiments can be run without requiring a physical cluster
A bin packing experiment
Mesos slaves: Host 1, Host 2, Host 3, ..., Host 3000
Task mix: tasks with cpu=1, tasks with cpu=3, tasks with cpu=6
Start with an idle cluster
Fenzo iteratively assigns a bunch of tasks
Bin packing sample results
Bin pack tasks using Fenzo’s built-in CPU bin packer
Task runtime bin packing sample
Bin pack tasks based on custom fitness calculator to pack
short vs. long run time jobs separately
Scheduler speed experiment
Hosts: 8-CPUs each
Task mix: 20% running 1-CPU jobs, 40% running 4-CPU, and 40% running 6-CPU jobs
Goal: starting from an empty cluster, assign tasks to fill all hosts
Scheduling strategy: CPU bin packing
# of hosts   # of tasks to assign each time   Avg time   Avg time per task   Min time   Max time   Total time
1,000        1                                3 ms       3 ms                1 ms       188 ms     9 s
1,000        200                              40 ms      0.2 ms              17 ms      100 ms     0.5 s
10,000       1                                29 ms      29 ms               10 ms      240 ms     870 s
10,000       200                              132 ms     0.66 ms             22 ms      434 ms     19 s
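The "Avg time per task" column follows from the other columns, which a short calculation makes explicit. The numbers below are copied from the table; the function names are hypothetical, and the estimate of how many scheduling runs each experiment took assumes that the total time is simply the sum of the individual runs.

```python
def per_task_ms(avg_run_ms: float, tasks_per_run: int) -> float:
    """Average time per task = average run time / tasks assigned per run."""
    return avg_run_ms / tasks_per_run


def approx_runs(total_s: float, avg_run_ms: float) -> float:
    """Rough number of scheduling runs, assuming total = runs * average."""
    return total_s * 1000.0 / avg_run_ms


# 1,000 hosts: batching 200 tasks cuts the per-task cost from 3 ms to 0.2 ms.
small_single = per_task_ms(3.0, 1)
small_batch = per_task_ms(40.0, 200)

# 10,000 hosts: the same batching cuts it from 29 ms to 0.66 ms.
large_single = per_task_ms(29.0, 1)
large_batch = per_task_ms(132.0, 200)

# Assigning one task at a time took about 3,000 and 30,000 runs respectively.
runs_small = approx_runs(9.0, 3.0)
runs_large = approx_runs(870.0, 29.0)
```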
Accessing Fenzo
Code at
https://github.com/Netflix/Fenzo
Wiki at
https://github.com/Netflix/Fenzo/wiki
Future directions
● Task management SLAs
● Support for newer Mesos features
● Collaboration
To summarize...
Fenzo: scheduling library for frameworks
● Heterogeneous resources
● Autoscaling of cluster
● Visibility of scheduler actions
● Plugins for constraints, fitness
● High speed
● Heterogeneous task requests
Fenzo is now available in Netflix
OSS suite at
https://github.com/Netflix/Fenzo
Questions?
Heterogeneous Resource Scheduling Using
Apache Mesos for Cloud Native Frameworks
Sharma Podila
spodila @ netflix . com
@podila

Prezo at-mesos con2015-final

  • 1. Heterogeneous Resource Scheduling Using Apache Mesos for Cloud Native Frameworks Sharma Podila Senior Software Engineer Netflix Aug 20th MesosCon 2015
  • 2. Agenda ● Context, motivation ● Fenzo scheduler library ● Usage at Netflix ● Future direction
  • 3. Why use Apache Mesos in a cloud? Resource granularity Application start latency
  • 4. A tale of two frameworks Reactive stream processing Container deployment and management
  • 5. Reactive stream processing, Mantis ● Cloud native ● Lightweight, dynamic jobs ○ Stateful, multi-stage ○ Real time, anomaly detection, etc. ● Task placement constraints ○ Cloud constructs ○ Resource utilization ● Service and batch style
  • 6. Mantis job topology Worker SourceApp Stage1 Stage2 Stage3 Sink A job is set of 1 or more stages A stage is a set of 1 or more workers A worker is a Mesos task Job
  • 7. Container management, Titan ● Cloud native ● Service and batch workloads ● Jobs with multiple sets of container tasks ● Container placement constraints ○ Cloud constructs ○ Resource affinity ○ Task locality
  • 8. Job 2 Set 0 Set 1 Container scheduling model Job 1 Co-locate tasks from multiple task sets
  • 9. Why develop a new framework?
  • 10. Easy to write a new framework?
  • 11. Easy to write a new framework? What about scale? Performance? Fault tolerance? Availability?
  • 12. Easy to write a new framework? What about scale? Performance? Fault tolerance? Availability? And scheduling is a hard problem to solve
  • 13. Long term justification is needed to create a new Mesos framework
  • 14. Our motivations for new framework ● Cloud native (autoscaling)
  • 15. Our motivations for new framework ● Cloud native (autoscaling) ● Customizable task placement optimizations (Mix of service, batch, and stream topologies)
  • 16. Cluster autoscaling challenge Host 1 Host 2 Host 3 Host 4 Host 1 Host 2 Host 3 Host 4 vs. For long running stateful services
  • 17. Cluster autoscaling challenge Host 1 Host 2 Host 3 Host 4 Host 1 Host 2 Host 3 Host 4 vs. For long running stateful services
  • 18. Components of a mesos framework API for users to interact
  • 19. Components of a mesos framework API for users to interact Be connected to Mesos via the driver
  • 20. Components of a mesos framework API for users to interact Be connected to Mesos via the driver Compute resource assignments for tasks
  • 22. Fenzo A common scheduling library for Mesos frameworks
  • 23. Fenzo usage in frameworks Mesos master Mesos framework Task requests Available resource offers Fenzo task scheduler Task assignment result • Host1 • Task1 • Task2 • Host2 • Task3 • Task4 Persistence
  • 24. Fenzo scheduling library Heterogeneous resources Autoscaling of cluster Visibility of scheduler actions Plugins for Constraints, Fitness High speed Heterogeneous task requests
  • 25. Announcing availability of Fenzo in Netflix OSS suite
  • 28. Scheduling optimizations Speed: first fit assignment, ~O(1) Accuracy: optimal assignment, ~O(N * M) [1] Real world trade-offs [1] Assuming tasks are not reassigned
  • 29. Scheduling strategy For each task, on each host: validate hard constraints, evaluate fitness and soft constraints, until fitness is good enough and a minimum # of hosts has been evaluated
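The strategy on this slide can be sketched roughly as follows. This is a minimal illustration in Python, not Fenzo's code: the `Host` class, `has_room`, `cpu_fitness`, `pick_host`, and the `good_enough` and `min_hosts` defaults are all hypothetical names and values, and soft constraints are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    total_cpus: float
    used_cpus: float


def has_room(task_cpus, host):
    """Hard constraint (assumed): the host must have enough free CPUs."""
    return host.used_cpus + task_cpus <= host.total_cpus


def cpu_fitness(task_cpus, host):
    """Fitness in [0, 1]: how full the host would be after placing the task."""
    return (host.used_cpus + task_cpus) / host.total_cpus


def pick_host(task_cpus, hosts, good_enough=0.9, min_hosts=2):
    """Return the best host seen so far, stopping early once fitness is good
    enough and at least min_hosts hosts have been evaluated."""
    best, best_score, evaluated = None, -1.0, 0
    for host in hosts:
        evaluated += 1
        if not has_room(task_cpus, host):      # validate hard constraints
            continue
        score = cpu_fitness(task_cpus, host)   # evaluate fitness
        if score > best_score:
            best, best_score = host, score
        if best_score >= good_enough and evaluated >= min_hosts:
            break                              # good enough: stop scanning
    return best
```

Raising `good_enough` or `min_hosts` moves the sketch toward the slower, more accurate end of the trade-off; lowering them moves it toward first fit.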
  • 33. Host attribute value constraint Task HostAttrConstraint:instanceType=r3 Host1 Attr:instanceType=m3 Host2 Attr:instanceType=r3 Host3 Attr:instanceType=c3 Fenzo
  • 34. Unique host attribute constraint Task UniqueAttr:zone Host1 Attr:zone=1a Host2 Attr:zone=1a Host3 Attr:zone=1b Fenzo
  • 35. Balance host attribute constraint Host1 Attr:zone=1a Host2 Attr:zone=1b Host3 Attr:zone=1c Job with 9 tasks, BalanceAttr:zone Fenzo
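The three constraints on slides 33 to 35 could be expressed as simple predicates. The sketch below is an assumption-laden Python illustration, not Fenzo's plugin interface: the function names, the dictionary representation of host attributes, and the "fewest tasks so far" reading of balancing are all hypothetical.

```python
from collections import Counter


def host_attr_value_ok(host_attrs, name, required):
    """Host attribute value constraint: the host must carry the given value."""
    return host_attrs.get(name) == required


def unique_attr_ok(host_attrs, name, used_values):
    """Unique attribute constraint: no two tasks of a job share a value."""
    return host_attrs.get(name) not in used_values


def balance_attr_ok(host_attrs, name, counts, all_values):
    """Balance constraint (assumed rule): only allow hosts whose attribute
    value currently holds the fewest tasks of the job."""
    lowest = min(counts.get(v, 0) for v in all_values)
    return counts.get(host_attrs.get(name), 0) == lowest


def place_balanced(hosts, name, n_tasks):
    """Place n_tasks one by one on the first host that passes the balance
    check, and return the number of tasks per attribute value."""
    all_values = sorted({attrs[name] for attrs in hosts.values()})
    counts = Counter()
    for _ in range(n_tasks):
        for attrs in hosts.values():
            if balance_attr_ok(attrs, name, counts, all_values):
                counts[attrs[name]] += 1
                break
    return counts
```

With the three hosts of slide 35 (zones 1a, 1b, 1c), `place_balanced(hosts, "zone", 9)` spreads the 9 tasks evenly, 3 per zone.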
  • 36. Fitness evaluation Degree of fitness Composable
  • 37. Bin packing fitness calculator Fitness for Host1 Host2 Host3 Host4 Host5 fitness = usedCPUs / totalCPUs
  • 38. Bin packing fitness calculator fitness = usedCPUs / totalCPUs Fitness for Host1: 0.25, Host2: 0.5, Host3: 0.75, Host4: 1.0, Host5: 0.0
  • 39. Bin packing fitness calculator Fitness for 0.25 0.5 0.75 1.0 0.0 ✔ Host1 Host2 Host3 Host4 Host5 fitness = usedCPUs / totalCPUs Host1 Host2 Host3 Host4 Host5
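As a worked example of the formula fitness = usedCPUs / totalCPUs, the Python sketch below reproduces the five fitness values from the slide. The 8-CPU host size, the `USED` numbers, the 1-CPU task, and the rule "pick the fullest host that still has room" are assumptions made for illustration, not details taken from the talk.

```python
def cpu_bin_pack_fitness(used_cpus, total_cpus):
    """Formula from the slide: fitness = usedCPUs / totalCPUs."""
    return used_cpus / total_cpus


def best_host(used_by_host, total_cpus, task_cpus):
    """Assumed selection rule: among hosts with room for the task,
    pick the one with the highest bin packing fitness."""
    candidates = {
        name: cpu_bin_pack_fitness(used, total_cpus)
        for name, used in used_by_host.items()
        if used + task_cpus <= total_cpus
    }
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical usage on 8-CPU hosts that yields 0.25, 0.5, 0.75, 1.0, 0.0.
USED = {"Host1": 2, "Host2": 4, "Host3": 6, "Host4": 8, "Host5": 0}
```

Under these assumptions a 1-CPU task goes to Host3: Host4 scores highest but is already full, so the fullest host with room wins.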
  • 40. Composable fitness calculators Fitness = ( BinPackFitness * BinPackWeight + RuntimePackFitness * RuntimeWeight ) / 2.0
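The composition formula on this slide is straightforward to write down. The Python function below is only a sketch of that arithmetic; the function name and the default weights of 1.0 are assumptions.

```python
def combined_fitness(bin_pack_fitness, runtime_pack_fitness,
                     bin_pack_weight=1.0, runtime_weight=1.0):
    """Weighted combination of two fitness values, as on the slide:
    (BinPackFitness * BinPackWeight + RuntimePackFitness * RuntimeWeight) / 2.0
    """
    return (bin_pack_fitness * bin_pack_weight
            + runtime_pack_fitness * runtime_weight) / 2.0
```

For example, a bin packing fitness of 0.75 and a runtime packing fitness of 0.5 with both weights at 1.0 combine to 0.625. The result stays within [0, 1] as long as each input fitness and each weight is within [0, 1].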
  • 41. Cluster autoscaling in Fenzo Rules: ASG/Cluster: mantisagent, MinIdle: 8, MaxIdle: 20, CooldownSecs: 360; ASG/Cluster: computeCluster, MinIdle: 8, MaxIdle: 20, CooldownSecs: 360 Fenzo ScaleUp action: Cluster, N ScaleDown action: Cluster, HostList
  • 42. Rules based cluster autoscaling
    ● Set up rules per host attribute value
      ○ E.g., one autoscale rule per ASG/cluster, one cluster for network-intensive jobs, another for CPU/memory-intensive jobs
    ● Sample:
      (Diagram of # idle hosts with min and max marks: trigger up scale, trigger down scale)
      Cluster Name | Min Idle Count | Max Idle Count | Cooldown Secs
      NetworkClstr | 5 | 15 | 360
      ComputeClstr | 10 | 20 | 300
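A rule of this shape could be evaluated roughly as in the Python sketch below. It is an illustration under stated assumptions, not Fenzo's implementation: `AutoscaleRule` and `evaluate` are hypothetical names, and the amounts to scale by (up to the minimum, down to the maximum) are a guess. The returned tuples mirror the action shapes from slide 41 (ScaleUp: cluster, N; ScaleDown: cluster, host list).

```python
from dataclasses import dataclass


@dataclass
class AutoscaleRule:
    cluster: str
    min_idle: int
    max_idle: int
    cooldown_secs: int


def evaluate(rule, idle_hosts, now, last_action_at=None):
    """Return a scale action for the rule's cluster, or None.

    idle_hosts is a list of idle host names; now and last_action_at are
    timestamps in seconds.
    """
    # No rule-based action while the cool down period is still running.
    if last_action_at is not None and now - last_action_at < rule.cooldown_secs:
        return None
    idle = len(idle_hosts)
    if idle < rule.min_idle:
        # Assumed amount: add enough hosts to get back to the minimum.
        return ("ScaleUp", rule.cluster, rule.min_idle - idle)
    if idle > rule.max_idle:
        # Assumed amount: release the idle hosts beyond the maximum.
        return ("ScaleDown", rule.cluster, idle_hosts[rule.max_idle:])
    return None
```

With the sample rule `AutoscaleRule("NetworkClstr", 5, 15, 360)`, 3 idle hosts produce a scale up by 2, and 17 idle hosts produce a scale down that names 2 hosts.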
  • 43. Shortfall analysis based autoscaling ● Rule-based scale up has a cool down period ○ What if there’s a surge of incoming requests? ● Pending requests trigger shortfall analysis ○ Scale up happens regardless of cool down period ○ Remembers which tasks have already been covered
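The "remembers which tasks have already been covered" idea might look like the Python sketch below. Everything here is hypothetical: the `ShortfallAnalyzer` class, the CPU-only sizing, and the estimate of hosts as total pending CPUs divided by CPUs per host (which ignores fragmentation) are assumptions for illustration only.

```python
import math


class ShortfallAnalyzer:
    """Estimates extra hosts for pending tasks, counting each task only once."""

    def __init__(self, cpus_per_host):
        self.cpus_per_host = cpus_per_host
        self.covered = set()   # task ids already included in a scale up

    def hosts_needed(self, pending):
        """pending maps task id -> requested CPUs. Returns the number of
        hosts to add now; this path is not gated by any cool down period."""
        new = {tid: cpus for tid, cpus in pending.items()
               if tid not in self.covered}
        if not new:
            return 0
        self.covered.update(new)
        # Crude estimate: total CPUs of uncovered tasks over host size.
        return math.ceil(sum(new.values()) / self.cpus_per_host)
```

Calling `hosts_needed` again with the same pending tasks returns 0, so a surge does not trigger repeated scale ups for tasks that are already accounted for.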
  • 46. Scheduler run time in milliseconds (over a week) Average: 2 ms Maximum: 38 ms Note: times can vary depending on # of tasks, # and types of constraints, and # of hosts
  • 47. Experimenting with Fenzo Note: Experiments can be run without requiring a physical cluster
  • 48. A bin packing experiment Mesos slaves: Host 1, Host 2, Host 3, ..., Host 3000 Tasks with cpu=1, cpu=3, cpu=6 Fenzo: start with an idle cluster and iteratively assign a bunch of tasks
  • 49. Bin packing sample results Bin pack tasks using Fenzo’s built-in CPU bin packer
  • 51. Task runtime bin packing sample Bin pack tasks based on custom fitness calculator to pack short vs. long run time jobs separately
  • 52. Scheduler speed experiment
    Hosts: 8 CPUs each
    Task mix: 20% running 1-CPU jobs, 40% running 4-CPU jobs, and 40% running 6-CPU jobs
    Goal: starting from an empty cluster, assign tasks to fill all hosts
    Scheduling strategy: CPU bin packing
    # of hosts | # of tasks to assign each time | Avg time | Avg time per task | Min time | Max time | Total time
  • 53. Scheduler speed experiment
    Hosts: 8 CPUs each
    Task mix: 20% running 1-CPU jobs, 40% running 4-CPU jobs, and 40% running 6-CPU jobs
    Goal: starting from an empty cluster, assign tasks to fill all hosts
    Scheduling strategy: CPU bin packing
    # of hosts | # of tasks to assign each time | Avg time | Avg time per task | Min time | Max time | Total time
    1,000 | 1 | 3 ms | 3 ms | 1 ms | 188 ms | 9 s
    1,000 | 200 | 40 ms | 0.2 ms | 17 ms | 100 ms | 0.5 s
  • 54. Scheduler speed experiment
    Hosts: 8 CPUs each
    Task mix: 20% running 1-CPU jobs, 40% running 4-CPU jobs, and 40% running 6-CPU jobs
    Goal: starting from an empty cluster, assign tasks to fill all hosts
    Scheduling strategy: CPU bin packing
    # of hosts | # of tasks to assign each time | Avg time | Avg time per task | Min time | Max time | Total time
    1,000 | 1 | 3 ms | 3 ms | 1 ms | 188 ms | 9 s
    1,000 | 200 | 40 ms | 0.2 ms | 17 ms | 100 ms | 0.5 s
    10,000 | 1 | 29 ms | 29 ms | 10 ms | 240 ms | 870 s
    10,000 | 200 | 132 ms | 0.66 ms | 22 ms | 434 ms | 19 s
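The "Avg time per task" column appears to be the average time of one scheduling run divided by the number of tasks assigned in that run. The short Python check below recomputes it from the other columns; the `RUNS` list and the `avg_time_per_task` helper are illustrative names, and the reading of the column is an inference from the numbers rather than something stated on the slide.

```python
# (hosts, tasks assigned per run, avg time per run in ms), from the table above.
RUNS = [
    (1_000, 1, 3),
    (1_000, 200, 40),
    (10_000, 1, 29),
    (10_000, 200, 132),
]


def avg_time_per_task(tasks_per_run, avg_ms):
    """Average scheduling time attributable to a single task, in ms."""
    return avg_ms / tasks_per_run


PER_TASK_MS = [round(avg_time_per_task(tasks, ms), 2) for _, tasks, ms in RUNS]
```

The recomputed values are 3.0, 0.2, 29.0, and 0.66 ms, matching the table. Read this way, assigning tasks in batches of 200 cuts the per-task cost from 3 ms to 0.2 ms on 1,000 hosts and from 29 ms to 0.66 ms on 10,000 hosts.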
  • 55. Accessing Fenzo Code at https://github.com/Netflix/Fenzo Wiki at https://github.com/Netflix/Fenzo/wiki
  • 56. Future directions ● Task management SLAs ● Support for newer Mesos features ● Collaboration
  • 58. Fenzo: scheduling library for frameworks Heterogeneous resources Autoscaling of cluster Visibility of scheduler actions Plugins for Constraints, Fitness High speed Heterogeneous task requests
  • 59. Fenzo is now available in Netflix OSS suite at https://github.com/Netflix/Fenzo
  • 60. Questions? Heterogeneous Resource Scheduling Using Apache Mesos for Cloud Native Frameworks Sharma Podila spodila@netflix.com @podila