Mesos
A Platform for Fine-Grained
Resource Sharing in the Data Center
Background
• Rapid innovation in cluster computing frameworks
Problem
• Rapid innovation in cluster computing frameworks
• No single framework optimal for all applications
• Want to run multiple frameworks in a single cluster
» …to maximize utilization
» …to share data between frameworks
Where We Want to Go
Solution
• Mesos is a common resource sharing layer over which diverse
frameworks can run
Mesos Goals
• High utilization of resources
• Support diverse frameworks (current & future)
• Scalability to 10,000’s of nodes
• Reliability in the face of failures
Mesos
• Fine‐Grained Sharing
» Improved utilization, responsiveness, data locality
• Resource Offers
» Offer available resources to frameworks, let them pick which
resources to use and which tasks to launch
» Keeps Mesos simple, lets it support future frameworks
Mesos Architecture
Mesos architecture diagram, showing two running frameworks
Resource Offers
• Mesos decides how many resources to offer each
framework, based on an organizational policy such as
fair sharing, while frameworks decide which resources
to accept and which tasks to run on them
• A framework can reject resources that do not satisfy its
constraints in order to wait for ones that do
• Delegating control over scheduling in this way pushes
control of task scheduling and execution to the
frameworks
Resource Offers
• Mesos consists of a master process that manages slave daemons
running on each cluster node, and frameworks that run tasks on
these slaves.
• Each resource offer is a list of free resources on multiple slaves.
• Each framework running on Mesos consists of two components:
» a scheduler that registers with the master to be offered resources,
» an executor process that is launched on slave nodes to run the
framework’s tasks.
• When a framework accepts offered resources, it passes Mesos a
description of the tasks it wants to launch on them.
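A minimal sketch of this offer cycle in Python. Everything here (Master,
Offer, ExampleScheduler, the resource numbers) is an illustrative
stand-in for the mechanism described above, not the real Mesos API:

    from dataclasses import dataclass

    @dataclass
    class Offer:
        slave: str
        cpus: float
        mem_gb: float

    @dataclass
    class Task:
        name: str
        cpus: float
        mem_gb: float

    class ExampleScheduler:
        """Framework side: decides which offered resources to accept."""
        def resource_offers(self, offers):
            accepted = []
            for offer in offers:
                # Launch one task per offer that fits; offers without
                # tasks are implicitly declined.
                if offer.cpus >= 1 and offer.mem_gb >= 2:
                    accepted.append((offer, [Task("worker", 1, 2)]))
            return accepted

    class Master:
        """Mesos side: offers free resources, records launched tasks."""
        def __init__(self, free):
            self.free = free  # slave -> (cpus, mem_gb)

        def offer_cycle(self, scheduler):
            offers = [Offer(s, c, m) for s, (c, m) in self.free.items()]
            for offer, tasks in scheduler.resource_offers(offers):
                for t in tasks:
                    print(f"launch {t.name} on {offer.slave}")
                    c, m = self.free[offer.slave]
                    self.free[offer.slave] = (c - t.cpus, m - t.mem_gb)

    Master({"slave1": (4, 8), "slave2": (0.5, 1)}).offer_cycle(ExampleScheduler())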
Resource Offers
Resource offer example
Optimization: Filters
• Let frameworks short‐circuit rejection by providing a
predicate on resources to be offered
» E.g. “nodes from list L” or “nodes with > 8 GB RAM”
» Could generalize to other hints as well
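A sketch of such filters as plain Python predicates. In Mesos proper
these are declarative hints the framework hands to the master; the
callable form and the node names below are made up for illustration:

    PREFERRED = {"node3", "node7"}  # stands in for the "list L" above

    def on_list(offer):
        return offer["slave"] in PREFERRED

    def enough_ram(offer):
        return offer["mem_gb"] > 8

    def apply_filters(offers, predicates):
        # The master can skip resources a framework's filters would
        # reject, short-circuiting the offer/decline round trip.
        return [o for o in offers if all(p(o) for p in predicates)]

    offers = [{"slave": "node3", "mem_gb": 16},
              {"slave": "node9", "mem_gb": 4}]
    print(apply_filters(offers, [on_list, enough_ram]))  # only node3 survives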
Analysis
• Resource offers work well when:
» Frameworks can scale up and down elastically
» Task durations are homogeneous
» Frameworks have many preferred nodes
• These conditions hold in current data analytics
frameworks (MapReduce, Dryad, …)
» Work divided into short tasks to facilitate load balancing and fault
recovery
» Data replicated across multiple nodes
Resource Allocation
• Mesos delegates allocation decisions to a pluggable
allocation module, so that organizations can tailor
allocation to their needs.
• Two allocation modules have been implemented:
» one that performs fair sharing based on a generalization of max‐min
fairness for multiple resources (DRF; see the sketch below)
» one that implements strict priorities
• Task revocation
» if a cluster becomes filled by long tasks, e.g. due to a buggy job
or a greedy framework, the allocation module can also revoke
(kill) tasks
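A toy round of the DRF allocation loop. Cluster totals and per-task
demands follow the standard DRF example (9 CPUs and 18 GB; framework A
asks for 1 CPU + 4 GB per task, B for 3 CPUs + 1 GB); the code is a
sketch of the idea, not Mesos's actual allocation module:

    TOTAL = {"cpus": 9, "mem": 18}

    frameworks = {
        "A": {"demand": {"cpus": 1, "mem": 4}, "alloc": {"cpus": 0, "mem": 0}},
        "B": {"demand": {"cpus": 3, "mem": 1}, "alloc": {"cpus": 0, "mem": 0}},
    }

    def dominant_share(fw):
        # A framework's dominant share is its largest fraction of any resource.
        return max(fw["alloc"][r] / TOTAL[r] for r in TOTAL)

    def fits(fw, free):
        return all(fw["demand"][r] <= free[r] for r in TOTAL)

    free = dict(TOTAL)
    while True:
        # Offer to the framework with the lowest dominant share that still fits.
        candidates = [fw for fw in frameworks.values() if fits(fw, free)]
        if not candidates:
            break
        fw = min(candidates, key=dominant_share)
        for r in TOTAL:
            fw["alloc"][r] += fw["demand"][r]
            free[r] -= fw["demand"][r]

    for name, fw in frameworks.items():
        print(name, fw["alloc"], f"dominant share {dominant_share(fw):.2f}")

Both frameworks end at a dominant share of 2/3: A holds 3 CPUs + 12 GB
(memory-dominant), B holds 6 CPUs + 2 GB (CPU-dominant).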
Fault Tolerance
• Master failover using ZooKeeper
• Mesos master has only soft state: the list of active slaves,
active frameworks, and running tasks
» a new master can completely reconstruct its internal state from
information held by the slaves and the framework schedulers
• When the active master fails, the slaves and schedulers
connect to the next elected master and repopulate its
state.
• Aside from handling master failures, Mesos reports node
failures and executor crashes to frameworks’ schedulers.
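A schematic of that soft-state rebuild, with invented message shapes:
the newly elected master starts empty and repopulates its tables purely
from what slaves and schedulers report as they reconnect:

    class Master:
        def __init__(self):
            self.slaves = {}      # slave_id -> resources
            self.frameworks = {}  # framework_id -> scheduler endpoint
            self.tasks = {}       # task_id -> (framework_id, slave_id)

        def reregister_slave(self, slave_id, resources, running):
            # Slaves report what they have and what is running on them.
            self.slaves[slave_id] = resources
            for task_id, fw_id in running:
                self.tasks[task_id] = (fw_id, slave_id)

        def reregister_framework(self, fw_id, endpoint):
            self.frameworks[fw_id] = endpoint

    new_master = Master()  # elected via ZooKeeper, starts with no state
    new_master.reregister_slave("s1", {"cpus": 4}, [("t1", "hadoop")])
    new_master.reregister_framework("hadoop", "scheduler@host:port")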
Isolation
• Mesos provides performance isolation between
framework executors running on the same slave by
leveraging existing OS isolation mechanisms
• Currently isolates resources using OS container
technologies, specifically Linux Containers and Solaris
Projects
• These technologies can limit the CPU, memory, network
bandwidth, and (in new Linux kernels) I/O usage of a
process tree
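For concreteness, a sketch of the kind of per-process-tree limit these
mechanisms expose, written against the legacy cgroup-v1 filesystem
interface (paths vary by distribution, root is required, and this is
not Mesos code):

    import os

    def limit_executor(name, pid, mem_bytes, cpu_shares):
        for subsystem, knob, value in [
            ("memory", "memory.limit_in_bytes", mem_bytes),
            ("cpu", "cpu.shares", cpu_shares),
        ]:
            group = f"/sys/fs/cgroup/{subsystem}/{name}"
            os.makedirs(group, exist_ok=True)
            with open(os.path.join(group, knob), "w") as f:
                f.write(str(value))
            # Children forked after this point inherit the cgroup, so
            # the limit covers the executor's whole process tree.
            with open(os.path.join(group, "cgroup.procs"), "w") as f:
                f.write(str(pid))

    # e.g. limit_executor("executor-42", 12345, 2 * 1024**3, 512)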
Data Locality with Resource Offers
• Ran 16 instances of Hadoop on a shared HDFS cluster
• Used delay scheduling in Hadoop to get locality (wait a
short time to acquire data‐local nodes)
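A compact sketch of that delay-scheduling rule, with a made-up WAIT
threshold and node names: decline non-local offers until the locality
wait expires, then take whatever is offered:

    import time

    WAIT = 5.0  # seconds to hold out for a data-local node

    class DelayScheduler:
        def __init__(self, local_nodes):
            self.local_nodes = set(local_nodes)
            self.waiting_since = None

        def accept(self, node, now=None):
            now = time.monotonic() if now is None else now
            if node in self.local_nodes:
                self.waiting_since = None
                return True  # data-local: always take it
            if self.waiting_since is None:
                self.waiting_since = now
            if now - self.waiting_since >= WAIT:
                self.waiting_since = None  # give up on locality this round
                return True
            return False

    sched = DelayScheduler({"node1", "node2"})
    print(sched.accept("node9", now=0.0))  # False: keep waiting for locality
    print(sched.accept("node9", now=6.0))  # True: waited long enough
    print(sched.accept("node1", now=6.5))  # True: data-local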
Scalability
• Mesos only performs inter‐framework scheduling (e.g. fair
sharing), which is easier than intra‐framework scheduling
• Result: scaled to 50,000 emulated slaves, 200 frameworks,
and 100K tasks (30 s task length)
Conclusion
• Mesos shares clusters efficiently among diverse
frameworks thanks to two design elements:
» Fine‐grained sharing at the level of tasks
» Resource offers, a scalable mechanism for
application‐controlled scheduling
• Enables co‐existence of current frameworks and
development of new specialized ones
• In use at Twitter, UC Berkeley, Conviva, and UCSF
