marc@sloppy.io
Introduction to Apache Mesos
Overview
● Introduction
● Architecture
● Security
● Container
● High availability
Introduction
● First released in 2009 at the University of California, Berkeley
● Framework for using datacenter resources
efficiently
● Combines CPU, storage, memory, etc. into one big
shared virtual resource
● A distributed systems kernel
● About 10,000 lines of C++ code
Introduction - Definitions
● Master - Coordinates the cluster and makes resource offers
● Slaves - Worker nodes
● Frameworks - Applications running on Mesos
● Executors - Run tasks on the slaves
● Executor task - Job running on a slave
● Resource offer - Slave resources that can
be used by the frameworks
Architecture
Introduction - Frameworks
https://docs.mesosphere.com/frameworks/
Resource Allocation
Resource Allocation
1) Slave 1 reports to the master that it has 4
CPUs and 4 GB of memory free. The master
then invokes the allocation policy module, which
tells it that framework 1 should be offered all
available resources.
2) The master sends a resource offer describing
what is available on slave 1 to framework 1.
Resource Allocation
3) The framework’s scheduler replies to the
master with information about two tasks to run
on the slave, using <2 CPUs, 1 GB RAM> for
the first task and <1 CPU, 2 GB RAM> for the
second task.
4) Finally, the master sends the tasks to the
slave, which allocates appropriate resources
to the framework’s executor, which in turn
launches the two tasks.
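The offer cycle in steps 1 to 4 can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not Mesos code: the `launch` function and the dictionary layout are hypothetical names chosen for this sketch, and the numbers are the ones from the slides.

```python
def launch(offer, tasks):
    """Accept `tasks` against `offer` and return the resources left over.

    Hypothetical helper: mimics the check that the launched tasks
    must fit inside the offered resources.
    """
    cpus = sum(task["cpus"] for task in tasks)
    mem_gb = sum(task["mem_gb"] for task in tasks)
    if cpus > offer["cpus"] or mem_gb > offer["mem_gb"]:
        raise ValueError("tasks exceed the offered resources")
    return {"cpus": offer["cpus"] - cpus, "mem_gb": offer["mem_gb"] - mem_gb}


# Steps 1 and 2: slave 1 reports 4 CPUs and 4 GB; the master offers them to framework 1.
offer = {"cpus": 4, "mem_gb": 4}

# Step 3: the framework's scheduler answers with two tasks.
tasks = [{"cpus": 2, "mem_gb": 1}, {"cpus": 1, "mem_gb": 2}]

# Step 4: the tasks are launched; whatever is left can be offered again.
remaining = launch(offer, tasks)
print(remaining)
```

The leftover 1 CPU and 1 GB are not used by framework 1, so the allocation module is free to offer them to another framework.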
Resource Allocation - DRF
● Resource offer decisions are made by the
resource allocation module in the master
● In a heterogeneous environment, resource
allocation is difficult
● What is a fair share when:
User A requires 1 CPU, 4 GB RAM
User B requires 3 CPUs, 1 GB RAM
● Mesos: Dominant Resource Fairness (DRF)
DRF
● A modified fair share algorithm
● The goal is that each framework receives a fair
share of the resource it needs most
● Dominant resource: The resource most demanded by
the framework
● Dominant share: The largest fraction of any
resource type held by the framework
DRF - Example
● Resource offer: 9 CPUs, 18 GB RAM
● Tasks of user A: 1 CPU, 4 GB RAM - RAM = DR (dominant resource)
● Tasks of user B: 3 CPUs, 1 GB RAM - CPU = DR
● Each framework ends up with a dominant share (DS) of 2/3
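The 2/3 result can be reproduced with a small progressive-filling loop: always give the next task to the user with the lowest dominant share, as long as that task still fits. This is a simplified sketch of the DRF idea, not the allocator shipped with Mesos; `drf_allocate` and its argument layout are names invented for this example.

```python
from fractions import Fraction


def drf_allocate(total, demands):
    """Progressive filling: serve the user with the lowest dominant share first.

    total   -- {resource: capacity}
    demands -- {user: {resource: amount needed per task}}
    Returns (tasks per user, dominant share per user).
    """
    used = {resource: 0 for resource in total}
    tasks = {user: 0 for user in demands}

    def dominant_share(user):
        # Largest fraction of any single resource currently held by the user.
        return max(
            Fraction(tasks[user] * demands[user][r], total[r]) for r in total
        )

    while True:
        # Only users whose next task still fits are candidates.
        fitting = [
            user
            for user in demands
            if all(used[r] + demands[user][r] <= total[r] for r in total)
        ]
        if not fitting:
            break
        user = min(fitting, key=dominant_share)
        for r in total:
            used[r] += demands[user][r]
        tasks[user] += 1

    return tasks, {user: dominant_share(user) for user in demands}


total = {"cpus": 9, "mem_gb": 18}
demands = {"A": {"cpus": 1, "mem_gb": 4}, "B": {"cpus": 3, "mem_gb": 1}}
tasks, shares = drf_allocate(total, demands)
print(tasks, shares)
```

With the numbers from the slide, user A ends up with 3 tasks (12 of 18 GB RAM) and user B with 2 tasks (6 of 9 CPUs), so both dominant shares are 2/3.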
DRF - Example
● Framework 1: 1 CPU, 4 GB RAM
● Framework 2: 3 CPUs, 1 GB RAM
● Buggy tasks can be killed by Mesos
● A framework can have a guaranteed allocation, within which none
of its tasks should be killed
RA Master Configuration
Name                 Default  Example
allocation_interval  1s
framework_sorter     drf
user_sorter          drf
offer_timeout        -        5 minutes
roles                -        marathon,jenkins
weights              -        marathon=2,jenkins=1
RA Slave Configuration
Name          Default  Example
attributes    -        ssd:true;rack:2
default_role  *
resources     -        cpus(jenkins):1;disk(jenkins):10000;cpus(marathon):3;mem(marathon):2000
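To make the `resources` syntax easier to read, the following sketch splits such a string into per-role reservations. It is an illustration only: `parse_resources` is a hypothetical helper, it handles just the scalar `name(role):value` form shown in the table, and it is not the parser Mesos itself uses.

```python
import re
from collections import defaultdict

# name, optional "(role)", then a numeric value, e.g. "cpus(jenkins):1"
ITEM_PATTERN = re.compile(r"(\w+)(?:\((\w+|\*)\))?:(\d+(?:\.\d+)?)")


def parse_resources(spec):
    """Parse 'name(role):value;...' into {role: {name: value}}.

    Items without an explicit role are filed under the default role "*".
    """
    by_role = defaultdict(dict)
    for item in (part.strip() for part in spec.split(";")):
        if not item:
            continue  # tolerate a trailing semicolon
        match = ITEM_PATTERN.fullmatch(item)
        if match is None:
            raise ValueError(f"cannot parse resource item: {item!r}")
        name, role, value = match.groups()
        by_role[role or "*"][name] = float(value)
    return dict(by_role)


spec = "cpus(jenkins):1;disk(jenkins):10000;cpus(marathon):3;mem(marathon):2000"
reservations = parse_resources(spec)
print(reservations)
```

Read this way, the example reserves 1 CPU and 10000 MB of disk for the jenkins role, and 3 CPUs and 2000 MB of memory for the marathon role.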
Mesos Security
● Default configuration = No security
        Name                 Example
Master  authenticate_slaves  true
        credentials          /etc/mesos.pw
        authenticators       crammd5
        authenticate         true
Slave   credential           /etc/mesos.pw
Framework Security
Authorization allows:
1) Frameworks to (re-)register with authorized roles
2) Frameworks to launch tasks/executors as authorized
users
3) Authorized principals to shut down frameworks
through the “/shutdown” HTTP endpoint
Security ACLs
Subjects    Actions              Objects
principals  register_frameworks  roles
usernames   run_tasks            users
            shutdown_frameworks  framework_principals
● A set of subjects can perform an action
on a set of objects
Security ACLs Example
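As a rough sketch of what an ACL definition can look like, the snippet below builds a JSON document with one rule per action. The layout (`"values"` lists keyed by subject and object type) follows the Mesos authorization documentation as far as I recall it for this generation of Mesos, so it should be checked against the version in use. The principals `marathon` and `ops` and the user `www-data` are made-up values, and `acl_entry` is a hypothetical helper.

```python
import json


def acl_entry(subject_key, subjects, object_key, objects):
    """One ACL rule: the given subjects may act on the given objects."""
    return {
        subject_key: {"values": subjects},
        object_key: {"values": objects},
    }


acls = {
    # Principal "marathon" may register frameworks in the role "marathon".
    "register_frameworks": [
        acl_entry("principals", ["marathon"], "roles", ["marathon"]),
    ],
    # Principal "marathon" may run tasks as the OS user "www-data".
    "run_tasks": [
        acl_entry("principals", ["marathon"], "users", ["www-data"]),
    ],
    # Principal "ops" may shut down frameworks registered by "marathon".
    "shutdown_frameworks": [
        acl_entry("principals", ["ops"], "framework_principals", ["marathon"]),
    ],
}

acls_json = json.dumps(acls, indent=2)
print(acls_json)
```

Each rule reads as the sentence from the previous slide: a set of subjects can perform an action on a set of objects.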
Extract of the Mesos API
URL                                            Function
master:5050/help                               REST documentation
master:5050/metrics/snapshot                   Metrics of the cluster
master:5050/master/tasks.json                  List Mesos tasks
master:5050/master/redirect                    307 to the leading master
master:5050/master/shutdown                    Shut down a framework
master:5050/registrar(1)/registry              Content of the current registry
slave:5051/files/browse.json?path=pathOnSlave  Browse files in the sandbox
slave:5051/files/read.json?path=stdoutOnSlave  Read stdout from the sandbox
slave:5051/system/stats.json                   Local system metrics
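The endpoints in the table return JSON over plain HTTP, so they can be queried with the Python standard library. The sketch below assumes an unauthenticated master reachable under the placeholder host name `master`; `endpoint_url` and `fetch_json` are hypothetical helper names, and no request is sent until `fetch_json` is called.

```python
import json
from urllib.request import urlopen


def endpoint_url(host, port, path):
    """Build the URL for one of the endpoints listed in the table."""
    return f"http://{host}:{port}/{path.lstrip('/')}"


def fetch_json(url, timeout=5.0):
    """GET `url` and decode the JSON body (needs a reachable cluster)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


metrics_url = endpoint_url("master", 5050, "/metrics/snapshot")
print(metrics_url)

# Against a running cluster:
# metrics = fetch_json(metrics_url)
```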
Resource Isolation
● Mesos supports Docker containers and Mesos containers
● Resource isolation with cgroups or POSIX
Mesos and Docker
Mesos HA
Mesos Task States
TaskState      Int  Description
TASK_STARTING  0
TASK_RUNNING   1    Task is running
TASK_FINISHED  2    TERMINAL: The task finished successfully
TASK_FAILED    3    TERMINAL: The task failed to finish successfully
TASK_KILLED    4    TERMINAL: The task was killed by the executor
TASK_LOST      5    TERMINAL: The task failed but can be rescheduled
TASK_STAGING   6    Initial state
TASK_ERROR     7    TERMINAL: Task description contains an error
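When working with status updates in a framework, it can help to mirror this table in code. The following sketch copies the integer values from the table into an `IntEnum`; the `TERMINAL_STATES` set and the `is_terminal` helper are names introduced here for illustration and are not part of the Mesos API.

```python
from enum import IntEnum


class TaskState(IntEnum):
    """Task states with the integer values from the table."""

    TASK_STARTING = 0
    TASK_RUNNING = 1
    TASK_FINISHED = 2
    TASK_FAILED = 3
    TASK_KILLED = 4
    TASK_LOST = 5
    TASK_STAGING = 6
    TASK_ERROR = 7


# States marked TERMINAL in the table: the task will not run any further.
TERMINAL_STATES = frozenset(
    {
        TaskState.TASK_FINISHED,
        TaskState.TASK_FAILED,
        TaskState.TASK_KILLED,
        TaskState.TASK_LOST,
        TaskState.TASK_ERROR,
    }
)


def is_terminal(state):
    """Return True if `state` (enum member or raw int) is terminal."""
    return TaskState(state) in TERMINAL_STATES


print(TaskState(5).name, is_terminal(5))
```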
References
● http://mesos.apache.org/documentation/latest/
● Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
● Dominant Resource Fairness: Fair Allocation of Multiple Resource Types
● playing-traffic-cop-resource-allocation-in-apache-mesos
● https://mesosphere.com/
Thank you


Editor's Notes

  • #7 Blue: PaaS, Green: Big Data Processing, Violet: Batch Scheduling, Pink: Data Storage
  • #15 allocation_interval:
  • #22 -docker_remove_delay=VALUE