3. Problem
• Rapid innovation in cluster computing frameworks
• No single framework optimal for all applications
• Want to run multiple frameworks in a single cluster
» …to maximize utilization
» …to share data between frameworks
5. Solution
• Mesos is a common resource sharing layer over which diverse
frameworks can run
6. Mesos Goals
• High utilization of resources
• Support diverse frameworks (current & future)
• Scalability to 10,000s of nodes
• Reliability in face of failures
7. Mesos
• Fine‐Grained Sharing
» Improved utilization, responsiveness, and data locality
• Resource Offers
» Offer available resources to frameworks, let them pick which
resources to use and which tasks to launch
» Keeps Mesos simple, lets it support future frameworks
9. Resource Offers
• Mesos decides how many resources to offer each
framework, based on an organizational policy such as
fair sharing, while frameworks decide which resources
to accept and which tasks to run on them
• A framework can reject resources that do not satisfy its
constraints in order to wait for ones that do
• By delegating scheduling decisions in this way, Mesos
pushes control of task scheduling and execution to the
frameworks
10. Resource Offers
• Mesos consists of a master process that manages slave daemons
running on each cluster node, and frameworks that run tasks on
these slaves.
• Each resource offer is a list of free resources on multiple slaves.
• Each framework running on Mesos consists of two components:
» a scheduler that registers with the master to be offered resources,
» an executor process that is launched on slave nodes to run the
framework’s tasks.
• When a framework accepts offered resources, it passes Mesos a
description of the tasks it wants to launch on them
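The offer cycle above can be sketched as a toy simulation. This is a minimal sketch with hypothetical names (`Slave`, `Scheduler`, `resource_offered`), not the real Mesos API: the master bundles free resources on multiple slaves into an offer, and the framework's scheduler decides which resources to accept and which tasks to launch on them.

```python
# Toy sketch of the Mesos resource-offer cycle (illustrative names,
# not the real Mesos API).

class Slave:
    """Free resources on one cluster node, as reported in an offer."""
    def __init__(self, name, cpus, mem_gb):
        self.name, self.cpus, self.mem_gb = name, cpus, mem_gb

class Scheduler:
    """Framework-side component: picks resources to use and tasks to run."""
    def __init__(self, task_cpus, task_mem_gb, pending_tasks):
        self.task_cpus = task_cpus
        self.task_mem_gb = task_mem_gb
        self.pending = pending_tasks

    def resource_offered(self, offer):
        """Return (task, slave) launch descriptions for accepted resources."""
        launched = []
        for slave in offer:
            # Greedily pack pending tasks onto this slave's free resources.
            while (self.pending and slave.cpus >= self.task_cpus
                   and slave.mem_gb >= self.task_mem_gb):
                slave.cpus -= self.task_cpus
                slave.mem_gb -= self.task_mem_gb
                launched.append((self.pending.pop(0), slave.name))
        return launched

# The master offers free resources on two slaves; each task needs 2 CPUs, 2 GB.
slaves = [Slave("s1", cpus=4, mem_gb=8), Slave("s2", cpus=2, mem_gb=4)]
sched = Scheduler(task_cpus=2, task_mem_gb=2, pending_tasks=["t1", "t2", "t3"])
tasks = sched.resource_offered(slaves)
print(tasks)  # [('t1', 's1'), ('t2', 's1'), ('t3', 's2')]
```

The point of the design shows up here: all application-specific placement logic lives in `resource_offered`, so the master never needs to understand the framework's constraints.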
14. Optimization: Filters
• Let frameworks short‐circuit rejection by providing a
predicate on resources to be offered
» E.g. “nodes from list L” or “nodes with > 8 GB RAM”
» Could generalize to other hints as well
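A filter is essentially a predicate the framework registers with the master, so the master can skip offers the framework would reject anyway. A minimal sketch, with illustrative names (`make_filter`, the node dicts) that are assumptions rather than the Mesos API:

```python
# Sketch: a filter as a predicate over candidate nodes, covering the
# two example filters from the slide ("nodes from list L", "> 8 GB RAM").

def make_filter(preferred=None, min_mem_gb=0):
    """Build a predicate the master can apply before sending offers."""
    def accept(node):
        if preferred is not None and node["name"] not in preferred:
            return False
        return node["mem_gb"] > min_mem_gb
    return accept

nodes = [{"name": "a", "mem_gb": 16}, {"name": "b", "mem_gb": 4}]
wants_big = make_filter(min_mem_gb=8)        # "nodes with > 8 GB RAM"
offerable = [n["name"] for n in nodes if wants_big(n)]
print(offerable)  # ['a']
```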
15. Analysis
• Resource offers work well when:
» Frameworks can scale up and down elastically
» Task durations are homogeneous
» Frameworks have many preferred nodes
• These conditions hold in current data analytics
frameworks (MapReduce, Dryad, …)
» Work divided into short tasks to facilitate load balancing and fault
recovery
» Data replicated across multiple nodes
16. Resource Allocation
• Mesos delegates allocation decisions to a pluggable
allocation module, so that organizations can tailor
allocation to their needs.
• Have implemented two allocation modules:
» one that performs fair sharing based on a generalization of
max-min fairness to multiple resources (Dominant Resource Fairness, DRF)
» one that implements strict priorities
• Task revocation
» if a cluster becomes filled by long tasks, e.g., due to a buggy job
or a greedy framework, the allocation module can also revoke
(kill) tasks
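The DRF idea behind the fair-sharing module can be sketched in a few lines: a framework's dominant share is its largest fractional share of any one resource, and the allocator repeatedly offers resources to the framework with the smallest dominant share. The cluster size and per-task demands below are illustrative, not from the slides.

```python
# Sketch of Dominant Resource Fairness (DRF): always offer to the
# framework with the lowest dominant share.

TOTAL = {"cpus": 9.0, "mem": 18.0}  # illustrative cluster capacity

def dominant_share(usage):
    """Largest fraction of any single resource a framework holds."""
    return max(usage[r] / TOTAL[r] for r in TOTAL)

def next_framework(usages):
    """Framework to offer to next: smallest dominant share (name breaks ties)."""
    return min(usages, key=lambda f: (dominant_share(usages[f]), f))

# Framework A runs memory-heavy tasks, B runs CPU-heavy tasks.
usage = {"A": {"cpus": 0.0, "mem": 0.0}, "B": {"cpus": 0.0, "mem": 0.0}}
demand = {"A": {"cpus": 1.0, "mem": 4.0}, "B": {"cpus": 3.0, "mem": 1.0}}

for _ in range(5):  # allocate five tasks, one offer at a time
    f = next_framework(usage)
    for r in TOTAL:
        usage[f][r] += demand[f][r]

# A launches 3 tasks, B launches 2: both end with dominant share 2/3.
print(usage)
```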
17. Fault Tolerance
• Master failover using ZooKeeper
• Mesos master has only soft state: the list of active slaves,
active frameworks, and running tasks
» a new master can completely reconstruct its internal state from
information held by the slaves and the framework schedulers
• When the active master fails, the slaves and schedulers
connect to the next elected master and repopulate its
state.
• Aside from handling master failures, Mesos reports node
failures and executor crashes to frameworks’ schedulers.
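The soft-state design can be sketched as follows: a freshly elected master starts empty, and re-registration messages from slaves (which report their running tasks) and from framework schedulers repopulate its tables. Class and method names here are hypothetical, for illustration only.

```python
# Sketch of soft-state reconstruction after master failover
# (illustrative names, not the real Mesos protocol).

class Master:
    def __init__(self):
        # All state is soft: rebuilt entirely from re-registrations.
        self.slaves, self.frameworks, self.tasks = set(), set(), {}

    def register_framework(self, framework_id):
        self.frameworks.add(framework_id)

    def register_slave(self, slave_id, running_tasks):
        # Slaves report their running tasks, rebuilding the task table.
        self.slaves.add(slave_id)
        for task_id, framework_id in running_tasks:
            self.tasks[task_id] = (framework_id, slave_id)

# After failover, slaves and schedulers connect to the new master.
new_master = Master()
new_master.register_framework("hadoop")
new_master.register_slave("s1", [("t1", "hadoop")])
new_master.register_slave("s2", [("t2", "hadoop")])
print(sorted(new_master.tasks))  # ['t1', 't2']
```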
18. Isolation
• Mesos provides performance isolation between
framework executors running on the same slave by
leveraging existing OS isolation mechanisms
• Mesos currently isolates resources using OS container
technologies, specifically Linux Containers and Solaris
Projects
• These technologies can limit the CPU, memory, network
bandwidth, and (in new Linux kernels) I/O usage of a
process tree
19. Data Locality with Resource Offers
• Ran 16 instances of Hadoop on a shared HDFS cluster
• Used delay scheduling in Hadoop to get locality (wait a
short time to acquire data‐local nodes)
20. Scalability
• Mesos only performs inter-framework scheduling (e.g. fair
sharing), which is easier than intra-framework scheduling
• Result: scaled to 50,000 emulated slaves,
200 frameworks, and 100K tasks (30 s task length)
21. Conclusion
• Mesos shares clusters efficiently among diverse
frameworks thanks to two design elements:
» Fine‐grained sharing at the level of tasks
» Resource offers, a scalable mechanism for
application‐controlled scheduling
• Enables co‐existence of current frameworks and
development of new specialized ones
• In use at Twitter, UC Berkeley, Conviva, and UCSF