Mesos: The Datacenter Operating System
David Greenberg, Two Sigma

Who am I?
• Architected a project to build a massive Mesos cluster
• Building a custom framework and leveraging open source

The Plan
• What is Mesos?
• How can I use Mesos?
• How can I build on Mesos?

What is Mesos?

A long time ago…
“Are you done with the machine? I need to load my cards.”
“Lol no; maybe tomorrow.”

1957
“Oh man! Let’s all share the computer, AT THE SAME TIME!”
John McCarthy popularized timesharing.

A long time ago…
“Are you done with the Hadoop cluster? I need to run my analytics job.”
“Lol no; maybe tomorrow.”

2010
“Oh man! Let’s all share the cluster, AT THE SAME TIME!”
Ben Hindman popularized Mesos.

Good ideas today mirror good ideas of yesteryear.

Mesos: an Operating System
Isolation
Resource Sharing

Common Infrastructure
• read(), write(), open()
• bind(), connect()
• apt-get, yum
• launchTask(), killTask(), statusUpdate()
• Docker

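Of the calls listed above, launchTask(), killTask(), and statusUpdate() play the role for a cluster that system calls play for a single machine. As a rough illustration of why a status-update callback matters, here is a minimal Python sketch of a framework scheduler that re-queues a task when an update reports failure. The class and method names are hypothetical stand-ins, not the Mesos bindings; only the state names (TASK_RUNNING, TASK_FINISHED, TASK_FAILED, TASK_LOST) are borrowed from Mesos's task states.

```python
from collections import deque


class RetryingScheduler:
    """Hypothetical framework scheduler that reacts to task status updates."""

    def __init__(self, tasks):
        self.queue = deque(tasks)  # tasks waiting to be launched
        self.running = set()       # tasks launched and not yet reported done
        self.finished = []         # tasks that completed successfully

    def next_task(self):
        """Pick the next task to launch (a real framework would call its launch API here)."""
        task = self.queue.popleft()
        self.running.add(task)
        return task

    def status_update(self, task, state):
        """Record success, or put the task back in the queue if it failed or was lost."""
        if state == "TASK_FINISHED":
            self.running.discard(task)
            self.finished.append(task)
        elif state in ("TASK_FAILED", "TASK_LOST"):
            self.running.discard(task)
            self.queue.append(task)  # retry later on whatever resources show up next
        # Non-terminal states (e.g. "TASK_RUNNING") leave the task marked as running.
```

For example, launching "t1" and "t2", then reporting "t1" as TASK_FAILED and "t2" as TASK_FINISHED, leaves "t1" back in the queue and "t2" in the finished list.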
Distributed System* Anatomy
(Diagram: a coordinator directing a set of workers.)
* Excluding peer-to-peer systems

Static Partitioning
(Diagram: Coordinator (Hadoop) and Coordinator (Storm), each with its own fixed set of workers.)

Mesos: a Level of Indirection
(Diagram: two coordinators sit on top of Mesos (master), which manages the workers through Mesos (slaves).)

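To make the contrast between static partitioning and a shared level of indirection concrete, here is a toy Python calculation with made-up numbers: eight machines, a Hadoop job that wants six, and a Storm topology that wants two. The shared pool serves frameworks in alphabetical order purely to keep the sketch short; that ordering is an assumption, not how the Mesos allocator decides.

```python
def static_partitioning(demands, partitions):
    """Each framework may only use the machines in its own fixed slice."""
    return {fw: min(want, partitions[fw]) for fw, want in demands.items()}


def shared_pool(demands, total_machines):
    """All frameworks draw from one pool (served in name order, for simplicity)."""
    used, free = {}, total_machines
    for fw in sorted(demands):
        used[fw] = min(demands[fw], free)
        free -= used[fw]
    return used


demands = {"hadoop": 6, "storm": 2}  # hypothetical machine demand
static = static_partitioning(demands, {"hadoop": 4, "storm": 4})
shared = shared_pool(demands, total_machines=8)
print(static, sum(static.values()))  # {'hadoop': 4, 'storm': 2} 6
print(shared, sum(shared.values()))  # {'hadoop': 6, 'storm': 2} 8
```

With static slices, Hadoop is capped at four machines while two of Storm's machines sit idle; with one shared pool, all eight machines are in use.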
Coordinating Execution ≈ Scheduling

s/Coordinator/Scheduler/
(Diagram: Mesos (master) and Mesos (slaves), with the coordinator relabeled as the Scheduler.)

Apache Hadoop
(Diagram: the JobTracker plays the Scheduler role, above Mesos (master) and Mesos (slaves).)

Distributed System ≈ (Mesos) framework

A Mesos framework is a distributed system that has a coordinator.

A Mesos framework is a distributed system that has a scheduler.

A Mesos framework is an app for your cluster.

How can I use Mesos?

Tons of Flexibility!

Jenkins
• Continuous build server
• Just install a plugin!

Hadoop
• Multi-cluster isolation
• Fast startup
• Just run the Cloudera CDH 4.2.1 MR1 distribution repackaged for Mesos

Marathon
• PaaS on Mesos
• init.d for the cluster
• Docker support
• Scales at the click of a button
• Manages edge routers: HAProxy

Chronos
• Distributed cron
• Supports job dependencies
• REST API

Aurora
• Advanced PaaS on Mesos
• Powers Twitter
• Supports phased rollouts
• Supports complex deployments

Spark
• In-memory MapReduce, built for “Medium Data”
• Supports SQL as well as Java, Python, and Scala
• Designed for interactive analysis via a REPL

How do I use these?
• Free online interactive tutorials!
– http://mesosphere.io/learn
• Covers all of the frameworks mentioned above and many more

How can I build on Mesos?

Cluster Manager Status Quo
Application/Human → Specification → Cluster Manager
The specification includes as much information as possible to assist the cluster manager in scheduling and execution.
Application/Human ⋯ Cluster Manager: wait for the task to be executed.
Cluster Manager → Result → Application/Human

Problems with Specifications
① Hard to specify certain desires or constraints
② Hard to update specifications dynamically as tasks execute and finish/fail

An Alternative Model
Scheduler → request (3 CPUs, 2 GB RAM) → Mesos
• A request is a purposely simplified subset of a specification
• It is just the resources required at that point in time

What should you do if you can’t satisfy a request?
① Wait until you can …
② Offer the best you can immediately

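The two options can be contrasted in a small Python sketch. The Resources type and both policy functions are hypothetical names for illustration. Option ② is the one the Mesos model builds on: the answer is whatever is free right now, rather than a yes/no on the exact request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resources:
    cpus: float
    mem_gb: float

    def covers(self, other: "Resources") -> bool:
        """True if these resources are enough to satisfy `other`."""
        return self.cpus >= other.cpus and self.mem_gb >= other.mem_gb


def wait_until_you_can(request: Resources, available: Resources) -> Optional[Resources]:
    """Option 1: grant only a full match; None means the caller must block and retry later."""
    return request if available.covers(request) else None


def offer_best_immediately(request: Resources, available: Resources) -> Resources:
    """Option 2: answer right away with what is free now.

    The request is deliberately ignored: the answer reflects availability,
    and the caller decides what to do with it.
    """
    return available


request = Resources(cpus=3, mem_gb=2)
available = Resources(cpus=2, mem_gb=4)
print(wait_until_you_can(request, available))      # None
print(offer_best_immediately(request, available))  # Resources(cpus=2, mem_gb=4)
```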
Mesos Model
Mesos → offer (hostname, 4 CPUs, 4 GB RAM) → Scheduler
• Resources are allocated via resource offers
• A resource offer represents a snapshot of available resources that a scheduler can use to run tasks

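“Snapshot” is the key word: an offer records what was free on one machine at the moment it was made, and does not track later changes. A minimal Python sketch of that idea follows; Agent, Offer, and make_offer are hypothetical names, and the sketch does not model what Mesos does when an offer goes stale.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical live view of one machine's free resources (changes as tasks start and stop)."""
    hostname: str
    free_cpus: float
    free_mem_gb: float


@dataclass(frozen=True)
class Offer:
    """A resource offer: an immutable snapshot of what was free when it was made."""
    hostname: str
    cpus: float
    mem_gb: float


def make_offer(agent: Agent) -> Offer:
    # Copy the current values; later changes to the agent do not alter this offer.
    return Offer(agent.hostname, agent.free_cpus, agent.free_mem_gb)


agent = Agent("host-a", free_cpus=4, free_mem_gb=4)
offer = make_offer(agent)
agent.free_cpus = 1  # something else started on the machine
print(offer)         # Offer(hostname='host-a', cpus=4, mem_gb=4)
```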
An Analogue: non-blocking sockets
Application → Kernel: write(s, buffer, size);
Kernel → Application: 42 of 100 bytes written!

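The same behavior can be observed directly with Python's socket module: on a non-blocking socket, send() returns immediately with the number of bytes the kernel accepted, which may be far fewer than requested. In this sketch nobody reads from the other end of the pair, so the send buffer fills; the exact count depends on the operating system's buffer sizes, and the 10 MB payload is an arbitrary choice.

```python
import socket


def try_nonblocking_send(payload: bytes) -> int:
    """Attempt one non-blocking send and report how many bytes the kernel accepted."""
    a, b = socket.socketpair()  # connected pair; nothing ever reads from `b`
    try:
        a.setblocking(False)
        try:
            return a.send(payload)  # may accept only part of the payload
        except BlockingIOError:
            return 0  # buffer already full: nothing accepted right now
    finally:
        a.close()
        b.close()


payload = b"x" * 10_000_000  # far larger than a typical socket buffer
sent = try_nonblocking_send(payload)
print(f"{sent} of {len(payload)} bytes written")
```

As with a resource offer, the kernel does not block until the full request can be met; it reports what it could do, and the application decides what to do next.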
Mesos Model
Mesos → several offers (each: hostname, 4 CPUs, 4 GB RAM) → Scheduler
The scheduler uses the offers to decide what tasks to run.
Scheduler → task (3 CPUs, 2 GB RAM) → Mesos: “two-level scheduling”

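How a scheduler turns offers into tasks is the framework's own choice. As one possible policy, here is a first-fit sketch in Python using the numbers from the slide: offers of 4 CPUs and 4 GB RAM, tasks of 3 CPUs and 2 GB RAM. First-fit is a swapped-in example policy, and the Offer/Task types and host names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    hostname: str
    cpus: float
    mem_gb: float


@dataclass(frozen=True)
class Task:
    name: str
    cpus: float
    mem_gb: float


def match_tasks_to_offers(offers, tasks):
    """First-fit: place each task on the first offer that still has room for it."""
    remaining = {o.hostname: (o.cpus, o.mem_gb) for o in offers}
    launches, pending = [], []
    for task in tasks:
        for host, (cpus, mem) in remaining.items():
            if task.cpus <= cpus and task.mem_gb <= mem:
                remaining[host] = (cpus - task.cpus, mem - task.mem_gb)
                launches.append((task.name, host))
                break
        else:
            pending.append(task.name)  # no offer fits; wait for future offers
    return launches, pending


offers = [Offer(f"host-{c}", 4, 4) for c in "abcd"]
tasks = [Task(f"t{i}", 3, 2) for i in range(5)]
launches, pending = match_tasks_to_offers(offers, tasks)
print(launches)  # [('t0', 'host-a'), ('t1', 'host-b'), ('t2', 'host-c'), ('t3', 'host-d')]
print(pending)   # ['t4']
```

Each host can hold only one 3-CPU task, so the fifth task waits, and 1 CPU and 2 GB per host are left unused by this framework.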
Two-level Scheduling
• Mesos: controls resource allocations to schedulers
• Schedulers: make decisions about what tasks to run given allocated resources

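The division of labor can be sketched in a few lines of Python. Level one (allocate) decides which framework sees each offer; level two (each FrameworkScheduler) decides whether the offer is useful for its own tasks. All names are hypothetical. The rotating round-robin order stands in for a real allocation policy (the Mesos default allocator is based on Dominant Resource Fairness), and accepting an offer here consumes the whole offer, which is a simplification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Offer:
    hostname: str
    cpus: float
    mem_gb: float


@dataclass
class FrameworkScheduler:
    """Level 2: a framework decides whether an offer is useful for its own tasks."""
    name: str
    task_cpus: float
    task_mem_gb: float
    pending: int  # tasks still waiting to run
    launched: list = field(default_factory=list)

    def resource_offer(self, offer):
        """Accept (launch one task on this host) or decline the whole offer."""
        if self.pending and self.task_cpus <= offer.cpus and self.task_mem_gb <= offer.mem_gb:
            self.pending -= 1
            self.launched.append(offer.hostname)
            return True
        return False


def allocate(offers, frameworks):
    """Level 1: choose which framework sees each offer, rotating the starting framework.

    A declined offer is passed to the next framework; offers nobody wants are returned.
    """
    unused, start = [], 0
    for offer in offers:
        for i in range(len(frameworks)):
            if frameworks[(start + i) % len(frameworks)].resource_offer(offer):
                break
        else:
            unused.append(offer.hostname)
        start = (start + 1) % len(frameworks)
    return unused


batch = FrameworkScheduler("batch", task_cpus=3, task_mem_gb=2, pending=1)
service = FrameworkScheduler("service", task_cpus=1, task_mem_gb=1, pending=2)
offers = [Offer("host-a", 4, 4), Offer("host-b", 4, 4), Offer("host-c", 2, 8), Offer("host-d", 4, 4)]
unused = allocate(offers, [batch, service])
print(batch.launched, service.launched, unused)  # ['host-a'] ['host-b', 'host-c'] ['host-d']
```

The allocator never looks at task sizes; only the frameworks do. That is the point of splitting the decision into two levels.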
Two-level Scheduling Elsewhere
• Mesos was influenced by operating-system-supported user-space scheduling
– e.g., green threads, goroutines
• Mesos is designed less like a “cluster manager” and more like an operating system (or kernel)

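The user-space analogy can be seen in miniature: the OS schedules the process (level one), and a scheduler inside the process decides which lightweight thread runs next (level two). Below, Python generators stand in for green threads. This illustrates the idea only; it is not how Go's runtime schedules goroutines.

```python
from collections import deque


def green_thread(name, steps, log):
    """A cooperative 'thread': does one step of work, then yields control back."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield


def run_round_robin(threads):
    """User-space scheduler: the OS sees one process; this loop picks who runs next."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)
        except StopIteration:
            continue  # finished; do not reschedule
        ready.append(thread)  # still has work; back of the queue


log = []
run_round_robin([green_thread("a", 2, log), green_thread("b", 3, log)])
print(log)  # ['a0', 'b0', 'a1', 'b1', 'b2']
```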
Language Bindings

Should I build it on Mesos?
• Theme of MesosCon: it’s easy to build frameworks
• Open source and proprietary frameworks are being created all the time
– Two Sigma
– Netflix
– Twitter
– HubSpot

But should I really build it on Mesos?
• Most users just use Marathon, Hadoop, Spark, and Chronos
• Why did we build our own?
– Exotic workload

The Plan, redux
• What is Mesos?
• How can I use Mesos?
• How can I build on Mesos?

Questions?
Thank you

Mesos: The Operating System for your Datacenter
