Rich placement constraints: Who said YARN cannot schedule services?

Konstantinos Karanasos, Wangda Tan
Dataworks Summit 2018
San Jose, June 20

> Interactive data-intensive applications
• Spark, Hive LLAP
> Streaming systems
• Flink, Storm, Kafka Streams
> Latency-sensitive applications
• HBase, Memcached
> ML frameworks
• TensorFlow, Spark ML
Shift towards long-running containers
> Short-running containers
• MapReduce, Scope, Tez

> 10-100
[Figure: machines used for LRAs (%), y-axis from 0 to 100]

Important Less important
So we introduced the YARN service framework (Apache Hadoop 3.1.0)

> Performance:
Storm HBase
> Resilience:
HBase
> Cluster objectives:
[Figure: racks r1 and r2, upgrade domains A and B, and nodes n1 to n8 as examples of node groups, with MR, HBase, and Storm containers placed on the nodes]
It is all about placement with constraints!

[Figure: unavailable machines (%) over days 0 to 4; series: total]
> Less than 10% of cluster nodes unavailable
> Cluster is organized in node groups

[Figure: unavailable machines (%) over days 0 to 4; series: total, node group A, upgrade domain A]
> Machines become unavailable in groups
> With random placement, an LRA might lose all its containers at once
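
To make the risk concrete, here is a minimal sketch that computes the largest share of an LRA's containers that a single node-group outage could take down. The function name and the mapping of nodes to groups A and B are hypothetical, chosen only for illustration; the slides do not give the actual mapping.

```python
from collections import Counter


def worst_case_loss(placement, node_group):
    """Largest fraction of an LRA's containers lost if one node group fails.

    placement:  list of node names, one entry per container
    node_group: dict mapping node name -> group name (e.g. upgrade domain)
    """
    per_group = Counter(node_group[node] for node in placement)
    return max(per_group.values()) / len(placement)


# Hypothetical mapping of nodes to two upgrade domains.
node_group = {"n1": "A", "n2": "A", "n3": "A", "n4": "A",
              "n5": "B", "n6": "B", "n7": "B", "n8": "B"}

clustered = ["n1", "n2", "n3", "n4"]   # all containers in domain A
spread = ["n1", "n2", "n5", "n6"]      # containers split across A and B

print(worst_case_loss(clustered, node_group))  # 1.0
print(worst_case_loss(spread, node_group))     # 0.5
```

A placement that ignores node groups can end up like `clustered`, where one group becoming unavailable removes every container of the LRA.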

> Random
> target nodes
> static machine attributes
> Kubernetes

> How to refer to container groups and node groups?
> Container tags and node groups
> How to express constraints related to LRA containers?
> Expressive constraints within and across LRAs
> How to achieve high quality placement without affecting task-based jobs?
> Placement constraint processor

> Idea: tags
[Figure: Storm and HBase containers on nodes n1 and n2 within a rack. Tag sets shown: "KV, HBase master, memory_critical, appID_1" and "Storm, nimbus, appID_2", labelled as container tags, node tags, and rack tags]
> Idea: logical node groups to refer to dynamic node sets
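
One way to picture the tag idea is an index that counts, per node and per rack, how many running containers carry each tag. The sketch below is an assumption-laden illustration: the class `TagIndex`, its methods, and the node-to-rack mapping are hypothetical and are not the YARN implementation.

```python
from collections import Counter, defaultdict


class TagIndex:
    """Counts containers carrying each tag, per node and per rack."""

    def __init__(self, node_to_rack):
        self.node_to_rack = dict(node_to_rack)
        self.node_tags = defaultdict(Counter)
        self.rack_tags = defaultdict(Counter)

    def add_container(self, node, tags):
        # A node (and its rack) inherits the tags of the containers it hosts.
        rack = self.node_to_rack[node]
        for tag in set(tags):
            self.node_tags[node][tag] += 1
            self.rack_tags[rack][tag] += 1

    def remove_container(self, node, tags):
        # Tags disappear from the node when the container finishes,
        # which is what makes these node sets dynamic.
        rack = self.node_to_rack[node]
        for tag in set(tags):
            self.node_tags[node][tag] -= 1
            self.rack_tags[rack][tag] -= 1

    def nodes_with(self, tag):
        return {n for n, counts in self.node_tags.items() if counts[tag] > 0}


index = TagIndex({"n1": "r1", "n2": "r1"})  # hypothetical topology
index.add_container("n1", ["Storm", "nimbus", "appID_2"])
index.add_container("n2", ["KV", "HBase master", "memory_critical", "appID_1"])
print(index.nodes_with("Storm"))        # {'n1'}
print(index.rack_tags["r1"]["appID_1"])  # 1
```

Because the index is updated as containers start and stop, a tag such as "Storm" names a node set that changes over time, unlike a static machine attribute.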

> Affinity: “Place 3 Storm containers in the same rack as an HBase container”
storm=3, IN, RACK, hbase
> Cardinality: “Place 7 Storm containers with no more than 5 containers per node”
storm=7, CARDINALITY, NODE, storm, 0, 5
> Anti-affinity: “Place 5 Storm containers on different nodes than Spark”
storm=5, NOTIN, NODE, spark
> Placement Constraints API
Static methods for LRAs to specify constraints
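
The three constraint strings above can be read as one shape: a bound on how many containers with the target tag may share the scope. The sketch below parses that comma-separated notation into a minimum and maximum count. It assumes IN means "at least one" and NOTIN means "exactly zero"; the `Constraint` class, `parse`, and `satisfied` are hypothetical names and not the YARN API. How the count is obtained (for example, whether the container being placed is included) is left open here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    source_tag: str   # tag of the containers being placed
    count: int        # how many containers are requested
    scope: str        # "NODE" or "RACK"
    target_tag: str   # tag the constraint refers to
    min_card: int
    max_card: float   # math.inf means "no upper bound"


def parse(spec):
    """Parse strings such as 'storm=3, IN, RACK, hbase'."""
    fields = [f.strip() for f in spec.split(",")]
    source, count = fields[0].split("=")
    kind, scope, target = fields[1].upper(), fields[2].upper(), fields[3]
    if kind == "IN":                 # affinity: at least one target in scope
        lo, hi = 1, math.inf
    elif kind == "NOTIN":            # anti-affinity: no target in scope
        lo, hi = 0, 0
    elif kind == "CARDINALITY":      # explicit bounds
        lo, hi = int(fields[4]), int(fields[5])
    else:
        raise ValueError(f"unknown constraint type: {kind}")
    return Constraint(source.strip(), int(count), scope, target, lo, hi)


def satisfied(constraint, target_count_in_scope):
    """Interval check against a count supplied by the caller."""
    return constraint.min_card <= target_count_in_scope <= constraint.max_card


affinity = parse("storm=3, IN, RACK, hbase")
cardinality = parse("storm=7, CARDINALITY, NODE, storm, 0, 5")
anti_affinity = parse("storm=5, NOTIN, NODE, spark")
print(satisfied(affinity, 0), satisfied(cardinality, 5), satisfied(anti_affinity, 1))
```

Under this reading, affinity and anti-affinity are the two extreme cases of a cardinality bound.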

zk=3 NOTIN NODE not-self/zk
hbase=5 IN RACK all/zk
zk=5 IN RACK hbase NOTIN NODE zk
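
The `not-self/zk` and `all/zk` targets restrict which applications' containers count towards a constraint. The sketch below shows one possible reading of that scoping. The `Container` type and function names are hypothetical, and treating a bare tag such as `zk` as referring to the requesting application's own containers is an assumption of this sketch.

```python
from typing import NamedTuple


class Container(NamedTuple):
    app_id: str
    node: str
    tags: frozenset


def split_target(target):
    """Split 'not-self/zk' into ('not-self', 'zk'); bare tags default to 'self'."""
    namespace, sep, tag = target.partition("/")
    return (namespace, tag) if sep else ("self", target)


def count_on_node(containers, node, requester_app, target):
    """Count containers on `node` that match `target` as seen by `requester_app`."""
    namespace, tag = split_target(target)

    def in_namespace(c):
        if namespace == "self":
            return c.app_id == requester_app
        if namespace == "not-self":
            return c.app_id != requester_app
        if namespace == "all":
            return True
        raise ValueError(f"unknown namespace: {namespace}")

    return sum(1 for c in containers
               if c.node == node and tag in c.tags and in_namespace(c))


containers = [
    Container("app_zk_1", "n1", frozenset({"zk"})),
    Container("app_zk_2", "n1", frozenset({"zk"})),
    Container("app_hbase", "n1", frozenset({"hbase"})),
]
print(count_on_node(containers, "n1", "app_zk_1", "not-self/zk"))  # 1
print(count_on_node(containers, "n1", "app_zk_1", "all/zk"))       # 2
```

With this scoping, `zk=3 NOTIN NODE not-self/zk` keeps an application's ZooKeeper containers away from nodes hosting another application's `zk` containers, while `all/zk` looks at every application.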

> Idea: introduce Placement Constraint Processor for satisfying constraints of LRA requests
[Figure: architecture diagram. Components: Tasks and LRAs (requests), LRA Interface, Placement Constraint Processor, Constraint Manager (constraints, container tags, node groups), Cluster State, Capacity/Fair Scheduler. Labels: High quality placement; Fast task-based allocations]

> LRA scheduling algorithm in the Placement Constraint Processor
Invoked when an LRA is submitted, considers multiple containers
Satisfy LRA constraints without affecting task-based jobs
[Figure: architecture diagram. Components: Tasks and LRAs (requests), LRA Interface, Processor, Allocation, Constraint Manager, Node Groups, Cluster State, Capacity/Fair Scheduler]
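
The slides do not spell out the scheduling algorithm, so the following is only a minimal sketch of what "considers multiple containers" could look like: a greedy, all-or-nothing heuristic that places every container of an LRA in one pass under a per-node cardinality cap. The function `place_lra`, the slot-based capacity model, and the tie-breaking rule are all assumptions made for illustration.

```python
def place_lra(count, free_slots, max_per_node):
    """Greedy all-or-nothing placement of `count` identical containers.

    free_slots:   dict mapping node name -> free container slots
    max_per_node: cardinality cap for this LRA's containers on one node
    Returns a list of node names, or None if not every container fits.
    """
    remaining = dict(free_slots)
    placed = {node: 0 for node in free_slots}
    assignment = []
    for _ in range(count):
        candidates = [n for n in remaining
                      if remaining[n] > 0 and placed[n] < max_per_node]
        if not candidates:
            return None  # reject the whole LRA rather than place it partially
        # Prefer the node with the fewest containers of this LRA,
        # then the most free slots, then the name for determinism.
        best = min(candidates, key=lambda n: (placed[n], -remaining[n], n))
        remaining[best] -= 1
        placed[best] += 1
        assignment.append(best)
    return assignment


print(place_lra(7, {"n1": 4, "n2": 4, "n3": 4}, max_per_node=5))
print(place_lra(7, {"n1": 4, "n2": 4}, max_per_node=3))  # None
```

Looking at all containers of the request together lets the sketch reject an LRA whose constraints cannot all be met, instead of discovering that after some containers are already running.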

TensorFlow ML workflow with 1M iterations using 32 workers with varying workers per node
[Figure: runtime (min) vs. max cardinality per node (1, 2, 4, 8, 16, 32); series: low utilized cluster, high utilized cluster; annotations: Anti-affinity, Affinity]

[Figure: runtime (min) vs. max cardinality per node (1, 2, 4, 8, 16, 32); series: 5% utilized cluster, 70% utilized cluster; annotations: 34%, 42%, Over-utilization, Network overhead]
Cardinality constraints are important
Affinity and anti-affinity are not enough
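
As a small worked example of what the cardinality sweep means for the 32 workers, the sketch below computes the minimum number of nodes they must span for each cap. It assumes every node has room for that many workers; the function name is hypothetical.

```python
import math

WORKERS = 32


def nodes_needed(workers, max_per_node):
    """Minimum number of nodes when at most `max_per_node` workers share a node."""
    return math.ceil(workers / max_per_node)


for max_per_node in (1, 2, 4, 8, 16, 32):
    print(max_per_node, nodes_needed(WORKERS, max_per_node))
```

A cap of 1 spreads the workers over 32 nodes and a cap of 32 allows a single node, so the values in between are the settings that neither pure anti-affinity nor pure affinity can express.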

> Pre-production cluster
> Workloads
> Constraints

[Figure: TensorFlow runtime (min), showing 5th percentile, median, and 99th percentile, for four configurations: no-constraints (YARN 2.x); affinity (Kubernetes); + cardinality (Kubernetes++); + multiple containers (YARN 3.1); annotations: 58%, 54%]
Significant performance and predictability improvement!

http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html

> Important additions for long-running applications/services in Hadoop 3.1
> Deployment, packaging, upgrade, discovery of LRAs
> Scheduling of LRAs
• Expressive constraints (affinity, anti-affinity, cardinality)
• High quality placement via constraint processor
> Many more things to be done: come help us!
> Demo time!

Scheduling scalability
> Latency for placing all containers of an LRA
[Figure: runtime (sec), y-axis from 0 to 60, for YARN (no-constraints), KUBE (affinity), KUBE++ (+ cardinality), and MEDEA (+ multiple containers)]
Impact of MEDEA on task performance
> Short tasks
Task-based job runtimes are not affected

LRA performance: inter-app
[Figure: CDF of request latency vs. request latency (ms), 0 to 800; series: No Constraints, Intra only, Intra-Inter; annotation: 4.6x better]
> Memcached lookups mean latency:
No constraints: 372 ms
Intra-app: 361.7 ms
Intra-inter: 78 ms
> Total mean latency:
7.6x better over default
5x over intra-only
Both intra- and inter-application constraints are crucial to application performance
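
For reference, the ratios implied by the Memcached lookup latencies listed above can be checked with a few lines of arithmetic. Reading the figure's 4.6x annotation as the intra-only to intra-inter ratio is an inference from these numbers, not something the slide states.

```python
no_constraints_ms = 372.0
intra_only_ms = 361.7
intra_inter_ms = 78.0

# Speedup of intra+inter constraints over each baseline.
print(round(no_constraints_ms / intra_inter_ms, 1))  # 4.8
print(round(intra_only_ms / intra_inter_ms, 1))      # 4.6
```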
