Introduction to Mesos
1. Mesos Cluster
• A Mesos cluster consists of
– a Mesos master node
– Mesos agent nodes (formerly called slave nodes)
• All agents in the cluster report their available resources
(e.g., 4 CPUs and 6 GB of memory) to the Mesos master.
• Master aggregates the reported resources.
• Master creates bundles of resources called ‘Resource offers’.
• Master sends the ‘resource offer(s)’ to the registered
framework(s), based on the allocation policy.
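The reporting and bundling steps above can be sketched in a few lines of Python. This is a toy model, not the Mesos API: the `Resources` class, the `aggregate` and `make_offers` helpers, the agent names and the one-offer-per-agent bundling are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpus: float
    mem_gb: float


def aggregate(reports):
    """Sum the free resources reported by every agent."""
    return Resources(
        cpus=sum(r.cpus for r in reports.values()),
        mem_gb=sum(r.mem_gb for r in reports.values()),
    )


def make_offers(reports):
    """Bundle each agent's free resources into one offer per agent (assumed policy)."""
    return [
        {"agent": agent, "cpus": r.cpus, "mem_gb": r.mem_gb}
        for agent, r in sorted(reports.items())
        if r.cpus > 0 and r.mem_gb > 0
    ]


# Hypothetical agents and numbers, chosen only for illustration.
reports = {
    "agent-1": Resources(cpus=4, mem_gb=6),
    "agent-2": Resources(cpus=2, mem_gb=8),
}
total = aggregate(reports)
offers = make_offers(reports)
print(total)
print(offers)
```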
06 August 2016 murali.s.iyengar@gmail.com 1
2. Mesos Cluster (contd…)
• A framework can either accept or reject an offer.
• If the offer is accepted:
– Framework sends the master the description of the tasks to be run.
– The master sends these tasks to corresponding agent(s).
– Agent allocates appropriate resources to the framework’s executor.
– Executor in turn launches the tasks.
• If the framework rejects the offer, the resources are offered to
other framework(s).
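A minimal sketch of the accept/reject cycle, assuming each framework is modelled as a plain Python function that returns task descriptions to accept an offer or `None` to reject it. The function names and the two example frameworks are hypothetical and do not come from the Mesos API.

```python
def offer_cycle(offer, frameworks):
    """Offer resources to frameworks in order until one accepts.

    Each framework is a (name, decide) pair, where decide(offer) returns a
    list of task descriptions to accept the offer, or None to reject it.
    """
    for name, decide in frameworks:
        tasks = decide(offer)
        if tasks is None:
            continue  # rejected: the resources go to the next framework
        # Accepted: the master forwards the tasks to the agent, whose
        # executor launches them.
        return [f"{offer['agent']}: executor of {name} launches {t}" for t in tasks]
    return []  # nobody accepted; the resources stay free


def needs_gpu(offer):
    # Hypothetical framework that only accepts offers containing a GPU.
    return ["train-model"] if offer.get("gpus", 0) > 0 else None


def small_batch(offer):
    # Hypothetical framework that accepts any offer with at least 2 CPUs.
    return ["batch-job-1", "batch-job-2"] if offer["cpus"] >= 2 else None


offer = {"agent": "agent-1", "cpus": 4, "mem_gb": 6}
launched = offer_cycle(offer, [("ml", needs_gpu), ("batch", small_batch)])
print(launched)
```

Here the first framework rejects the offer because it contains no GPU, so the same resources are offered to the second framework, which accepts.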
4. Mesos Framework
• In Mesos terminology, a framework is software that
manages and executes jobs on a cluster. It has two components:
– Scheduler: registers with the master and receives resource offers
– Executor: launched on agent nodes to run the framework’s tasks
• Mesos scheduling is distributed and occurs at two levels, using a
mechanism called resource offers.
– The Mesos master decides "how many" resources to offer each
framework, according to a given policy (fair sharing or strict priority).
– The framework scheduler decides "which" of the offered resources to
accept and which tasks to run on them.
• Allowing frameworks to ‘reject’ resource offers is Mesos’s way
of letting each framework satisfy its own constraints.
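The two levels can be illustrated with a simplified sketch. It assumes a single resource (CPUs) and a plain fair-share rule, in which the framework holding the smallest share gets the next offer. The function names, framework names and numbers are hypothetical, and a real allocation policy may weigh several resources at once.

```python
def pick_framework(allocated_cpus, total_cpus):
    """Level 1 (master): offer next to the framework with the smallest share."""
    return min(allocated_cpus, key=lambda name: allocated_cpus[name] / total_cpus)


def framework_accepts(offer_cpus, pending_task_cpus):
    """Level 2 (framework): choose which pending tasks fit in the offer."""
    accepted, free = [], offer_cpus
    for cpus in pending_task_cpus:
        if cpus <= free:
            accepted.append(cpus)
            free -= cpus
    return accepted


# Hypothetical cluster of 16 CPUs shared by two frameworks.
allocated = {"spark": 6, "hadoop": 2}
chosen = pick_framework(allocated, total_cpus=16)
accepted = framework_accepts(offer_cpus=4, pending_task_cpus=[3, 2, 1])
print(chosen, accepted)
```

The master never needs to know why a task fits or not. It only decides who is offered resources, and the framework decides what to do with them.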
5. Efficiency & Robustness
• Resource filters
– offer me nodes from list L only
– offer me nodes with at least R resources free
• Default filter: resources from a rejected offer are not re-offered
to the same framework for 5 seconds.
• Hey framework, I know you are a responsible citizen. Let me
put a check in place too – Mesos.
– I’ll count the offered resources towards your share of the cluster.
• Offers eventually time out and are rescinded.
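The 5-second refusal filter can be sketched as a small bookkeeping class. The class and method names are hypothetical, and the current time is passed in explicitly so the behaviour is easy to follow. This is an illustration of the idea, not how the Mesos master implements it.

```python
class OfferFilter:
    """Tracks which agents a framework has recently refused."""

    def __init__(self, refuse_seconds=5.0):
        self.refuse_seconds = refuse_seconds
        self._refused_until = {}  # (framework, agent) -> time the filter expires

    def decline(self, framework, agent, now):
        """Record a rejected offer; the filter lasts refuse_seconds from now."""
        self._refused_until[(framework, agent)] = now + self.refuse_seconds

    def may_offer(self, framework, agent, now):
        """True if the agent's resources may be offered to the framework again."""
        return now >= self._refused_until.get((framework, agent), 0.0)


filters = OfferFilter()
filters.decline("batch", "agent-1", now=100.0)
print(filters.may_offer("batch", "agent-1", now=102.0))  # still filtered
print(filters.may_offer("batch", "agent-1", now=105.0))  # filter has expired
print(filters.may_offer("web", "agent-1", now=102.0))    # other frameworks unaffected
```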
6. Resource Allocation & Revocation
• Allocation modules are pluggable.
• Buggy job? Or greedy framework? Kill their tasks. Yes, after
a grace period, of course.
– Mesos asks the respective executor to kill the task; if the
executor doesn’t respond, Mesos kills the entire executor and all its tasks.
• Tell me my threshold – framework.
• Below the ‘guaranteed allocation’ (threshold)? No tasks are killed.
• Consider my task priorities while killing my tasks – framework.
• What is the trigger for revocation?
– Frameworks indicate their interest in offers, so Mesos knows which
of them would use more resources if offered.
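One way to picture revocation that respects both the guaranteed allocation and the framework's task priorities is the sketch below. It is an assumption-laden illustration, not Mesos's revocation logic: the function name, the `(name, cpus, priority)` task tuples and the CPU-only accounting are hypothetical, and the grace period is not modelled.

```python
def tasks_to_revoke(tasks, guaranteed_cpus, cpus_needed):
    """Pick tasks to kill, lowest priority first.

    tasks: list of (name, cpus, priority); a higher priority means the
    framework wants the task kept longer.
    A task is skipped if killing it would push the framework below
    guaranteed_cpus.
    """
    used = sum(cpus for _, cpus, _ in tasks)
    victims, freed = [], 0
    for name, cpus, _ in sorted(tasks, key=lambda t: t[2]):
        if freed >= cpus_needed:
            break
        if used - cpus < guaranteed_cpus:
            continue  # killing this task would break the guarantee
        victims.append(name)
        used -= cpus
        freed += cpus
    return victims


# Hypothetical framework running three 2-CPU tasks.
tasks = [("web", 2, 10), ("batch-a", 2, 1), ("batch-b", 2, 2)]
protected = tasks_to_revoke(tasks, guaranteed_cpus=4, cpus_needed=4)
unprotected = tasks_to_revoke(tasks, guaranteed_cpus=0, cpus_needed=4)
print(protected, unprotected)
```

With a guarantee of 4 CPUs only one low-priority task can be revoked. With no guarantee, both low-priority tasks are revoked and the high-priority one survives.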
7. Resource Isolation
• Mesos leverages the resource isolation mechanism of the OS.
• Isolation modules are pluggable.
• Isolation mechanisms limit CPU, memory, network
bandwidth and other resources.
• Dynamic reconfiguration of resource limits is essential for
achieving finer granularity of resource sharing.
8. Fault Tolerance
• Hot-standby masters for failover.
• The Mesos master holds only soft state, which a new master can
reconstruct from the periodic messages sent by agents and framework schedulers.
• ZooKeeper elects a new master and directs all agents and
framework schedulers to this new master.
• Mesos reports failures of task, agent and executor to
frameworks’ schedulers.
• A framework can register multiple schedulers to address
scheduler failures. The schedulers need to share state among
themselves.
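The soft-state idea can be illustrated as follows: a newly elected master starts empty and rebuilds its view purely from what agents and schedulers report. The message shapes, field names and the `rebuild_master_state` helper are hypothetical and chosen only to show the principle.

```python
def rebuild_master_state(agent_reports, scheduler_reports):
    """Rebuild a new master's view from re-registration messages."""
    state = {"agents": {}, "frameworks": {}, "tasks": {}}
    for report in agent_reports:
        # Each agent reports its resources and the tasks it is running.
        state["agents"][report["agent"]] = report["resources"]
        for task, framework in report["running"].items():
            state["tasks"][task] = {"agent": report["agent"], "framework": framework}
    for report in scheduler_reports:
        # Each scheduler reports which framework it belongs to.
        state["frameworks"][report["framework"]] = report["scheduler"]
    return state


# Hypothetical messages received by the new master after failover.
agent_reports = [
    {"agent": "agent-1", "resources": {"cpus": 4, "mem_gb": 6},
     "running": {"task-7": "batch"}},
    {"agent": "agent-2", "resources": {"cpus": 2, "mem_gb": 8}, "running": {}},
]
scheduler_reports = [{"framework": "batch", "scheduler": "scheduler-a"}]
state = rebuild_master_state(agent_reports, scheduler_reports)
print(state)
```

Because nothing in `state` exists only on the failed master, no separate persistent store is needed to recover it.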