Mesos
A Platform for Fine-Grained
Resource Sharing in the Data Center
Background
• Rapid innovation in cluster computing frameworks
Problem
• Rapid innovation in cluster computing frameworks
• No single framework optimal for all applications
• Want to run multiple frameworks in a single cluster
» …to maximize utilization
» …to share data between frameworks
Where We Want to Go
Solution
• Mesos is a common resource sharing layer over which diverse
frameworks can run
Mesos Goals
• High utilization of resources
• Support diverse frameworks (current & future)
• Scalability to 10,000’s of nodes
• Reliability in the face of failures
Mesos
• Fine‐Grained Sharing
» Improved utilization, responsiveness, data locality
• Resource Offers
» Offer available resources to frameworks, let them pick which
resources to use and which tasks to launch
» Keeps Mesos simple, lets it support future frameworks
Mesos Architecture
Mesos architecture diagram, showing two running frameworks
Resource Offers
• Mesos decides how many resources to offer each
framework, based on an organizational policy such as
fair sharing, while frameworks decide which resources
to accept and which tasks to run on them
• A framework can reject resources that do not satisfy its
constraints in order to wait for ones that do
• Delegating control over scheduling in this way pushes
control of task scheduling and execution to the
frameworks
Resource Offers
• Mesos consists of a master process that manages slave daemons
running on each cluster node, and frameworks that run tasks on
these slaves.
• Each resource offer is a list of free resources on multiple slaves.
• Each framework running on Mesos consists of two components:
» a scheduler that registers with the master to be offered resources,
» an executor process that is launched on slave nodes to run the
framework’s tasks.
• When a framework accepts offered resources, it passes Mesos a
description of the tasks it wants to launch on them.
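A minimal sketch of this offer cycle in Python. Everything here (Master,
Offer, ExampleScheduler, the resource numbers) is an illustrative
stand-in for the mechanism described above, not the real Mesos API:

    from dataclasses import dataclass

    @dataclass
    class Offer:
        slave: str
        cpus: float
        mem_gb: float

    @dataclass
    class Task:
        name: str
        cpus: float
        mem_gb: float

    class ExampleScheduler:
        """Framework side: decides which offered resources to accept."""
        def resource_offers(self, offers):
            accepted = []
            for offer in offers:
                # Launch one task per offer that fits; offers without
                # tasks are implicitly declined.
                if offer.cpus >= 1 and offer.mem_gb >= 2:
                    accepted.append((offer, [Task("worker", 1, 2)]))
            return accepted

    class Master:
        """Mesos side: offers free resources, records launched tasks."""
        def __init__(self, free):
            self.free = free  # slave -> (cpus, mem_gb)

        def offer_cycle(self, scheduler):
            offers = [Offer(s, c, m) for s, (c, m) in self.free.items()]
            for offer, tasks in scheduler.resource_offers(offers):
                for t in tasks:
                    print(f"launch {t.name} on {offer.slave}")
                    c, m = self.free[offer.slave]
                    self.free[offer.slave] = (c - t.cpus, m - t.mem_gb)

    Master({"slave1": (4, 8), "slave2": (0.5, 1)}).offer_cycle(ExampleScheduler())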
Resource Offers
Resource offer example
Optimization: Filters
• Let frameworks short‐circuit rejection by providing a
predicate on resources to be offered
» E.g. “nodes from list L” or “nodes with > 8 GB RAM”
» Could generalize to other hints as well
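A sketch of such filters as plain Python predicates. In Mesos proper
these are declarative hints the framework hands to the master; the
callable form and the node names below are made up for illustration:

    PREFERRED = {"node3", "node7"}  # stands in for the "list L" above

    def on_list(offer):
        return offer["slave"] in PREFERRED

    def enough_ram(offer):
        return offer["mem_gb"] > 8

    def apply_filters(offers, predicates):
        # The master can skip resources a framework's filters would
        # reject, short-circuiting the offer/decline round trip.
        return [o for o in offers if all(p(o) for p in predicates)]

    offers = [{"slave": "node3", "mem_gb": 16},
              {"slave": "node9", "mem_gb": 4}]
    print(apply_filters(offers, [on_list, enough_ram]))  # only node3 survives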
Analysis
• Resource offers work well when:
» Frameworks can scale up and down elastically
» Task durations are homogeneous
» Frameworks have many preferred nodes
• These conditions hold in current data analytics
frameworks (MapReduce, Dryad, …)
» Work divided into short tasks to facilitate load balancing and fault
recovery
» Data replicated across multiple nodes
Resource Allocation
• Mesos delegates allocation decisions to a pluggable
allocation module, so that organizations can tailor
allocation to their needs.
• Two allocation modules have been implemented:
» one that performs fair sharing based on a generalization of max‐min
fairness for multiple resources (DRF; see the sketch below)
» one that implements strict priorities
• Task revocation
» if a cluster becomes filled by long tasks, e.g. due to a buggy job
or a greedy framework, the allocation module can also revoke
(kill) tasks
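A toy round of the DRF allocation loop. Cluster totals and per-task
demands follow the standard DRF example (9 CPUs and 18 GB; framework A
asks for 1 CPU + 4 GB per task, B for 3 CPUs + 1 GB); the code is a
sketch of the idea, not Mesos's actual allocation module:

    TOTAL = {"cpus": 9, "mem": 18}

    frameworks = {
        "A": {"demand": {"cpus": 1, "mem": 4}, "alloc": {"cpus": 0, "mem": 0}},
        "B": {"demand": {"cpus": 3, "mem": 1}, "alloc": {"cpus": 0, "mem": 0}},
    }

    def dominant_share(fw):
        # A framework's dominant share is its largest fraction of any resource.
        return max(fw["alloc"][r] / TOTAL[r] for r in TOTAL)

    def fits(fw, free):
        return all(fw["demand"][r] <= free[r] for r in TOTAL)

    free = dict(TOTAL)
    while True:
        # Offer to the framework with the lowest dominant share that still fits.
        candidates = [fw for fw in frameworks.values() if fits(fw, free)]
        if not candidates:
            break
        fw = min(candidates, key=dominant_share)
        for r in TOTAL:
            fw["alloc"][r] += fw["demand"][r]
            free[r] -= fw["demand"][r]

    for name, fw in frameworks.items():
        print(name, fw["alloc"], f"dominant share {dominant_share(fw):.2f}")

Both frameworks end at a dominant share of 2/3: A holds 3 CPUs + 12 GB
(memory-dominant), B holds 6 CPUs + 2 GB (CPU-dominant).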
Fault Tolerance
• Master failover using ZooKeeper
• Mesos master has only soft state: the list of active slaves,
active frameworks, and running tasks
» a new master can completely reconstruct its internal state from
information held by the slaves and the framework schedulers
• When the active master fails, the slaves and schedulers
connect to the next elected master and repopulate its
state.
• Aside from handling master failures, Mesos reports node
failures and executor crashes to frameworks’ schedulers.
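A schematic of that soft-state rebuild, with invented message shapes:
the newly elected master starts empty and repopulates its tables purely
from what slaves and schedulers report as they reconnect:

    class Master:
        def __init__(self):
            self.slaves = {}      # slave_id -> resources
            self.frameworks = {}  # framework_id -> scheduler endpoint
            self.tasks = {}       # task_id -> (framework_id, slave_id)

        def reregister_slave(self, slave_id, resources, running):
            # Slaves report what they have and what is running on them.
            self.slaves[slave_id] = resources
            for task_id, fw_id in running:
                self.tasks[task_id] = (fw_id, slave_id)

        def reregister_framework(self, fw_id, endpoint):
            self.frameworks[fw_id] = endpoint

    new_master = Master()  # elected via ZooKeeper, starts with no state
    new_master.reregister_slave("s1", {"cpus": 4}, [("t1", "hadoop")])
    new_master.reregister_framework("hadoop", "scheduler@host:port")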
Isolation
• Mesos provides performance isolation between
framework executors running on the same slave by
leveraging existing OS isolation mechanisms
• Currently isolates resources using OS container
technologies, specifically Linux Containers and Solaris
Projects
• These technologies can limit the CPU, memory, network
bandwidth, and (in new Linux kernels) I/O usage of a
process tree
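For concreteness, a sketch of the kind of per-process-tree limit these
mechanisms expose, written against the legacy cgroup-v1 filesystem
interface (paths vary by distribution, root is required, and this is
not Mesos code):

    import os

    def limit_executor(name, pid, mem_bytes, cpu_shares):
        for subsystem, knob, value in [
            ("memory", "memory.limit_in_bytes", mem_bytes),
            ("cpu", "cpu.shares", cpu_shares),
        ]:
            group = f"/sys/fs/cgroup/{subsystem}/{name}"
            os.makedirs(group, exist_ok=True)
            with open(os.path.join(group, knob), "w") as f:
                f.write(str(value))
            # Children forked after this point inherit the cgroup, so
            # the limit covers the executor's whole process tree.
            with open(os.path.join(group, "cgroup.procs"), "w") as f:
                f.write(str(pid))

    # e.g. limit_executor("executor-42", 12345, 2 * 1024**3, 512)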
Data Locality with Resource Offers
• Ran 16 instances of Hadoop on a shared HDFS cluster
• Used delay scheduling in Hadoop to get locality (wait a
short time to acquire data‐local nodes)
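A compact sketch of that delay-scheduling rule, with a made-up WAIT
threshold and node names: decline non-local offers until the locality
wait expires, then take whatever is offered:

    import time

    WAIT = 5.0  # seconds to hold out for a data-local node

    class DelayScheduler:
        def __init__(self, local_nodes):
            self.local_nodes = set(local_nodes)
            self.waiting_since = None

        def accept(self, node, now=None):
            now = time.monotonic() if now is None else now
            if node in self.local_nodes:
                self.waiting_since = None
                return True  # data-local: always take it
            if self.waiting_since is None:
                self.waiting_since = now
            if now - self.waiting_since >= WAIT:
                self.waiting_since = None  # give up on locality this round
                return True
            return False

    sched = DelayScheduler({"node1", "node2"})
    print(sched.accept("node9", now=0.0))  # False: keep waiting for locality
    print(sched.accept("node9", now=6.0))  # True: waited long enough
    print(sched.accept("node1", now=6.5))  # True: data-local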
Scalability
• Mesos only performs inter‐framework scheduling (e.g. fair
sharing), which is easier than intra‐framework scheduling
• Result: scaled to 50,000 emulated slaves, 200 frameworks,
and 100K tasks (30 s task length)
Conclusion
• Mesos shares clusters efficiently among diverse
frameworks thanks to two design elements:
» Fine‐grained sharing at the level of tasks
» Resource offers, a scalable mechanism for
application‐controlled scheduling
• Enables co‐existence of current frameworks and
development of new specialized ones
• In use at Twitter, UC Berkeley, Conviva, and UCSF
