© 2015 Mesosphere, Inc. All Rights Reserved. 1
SCALING LIKE
TWITTER WITH
APACHE MESOS
Philip Norman & Sunil Shah
Dan the Datacenter Operator
Doesn’t sleep very well
Loves automation
Wants to control what runs in his
datacenter
Alice the Application Developer
Finds setting up infrastructure tedious
Wants her application to be deployed
as quickly as possible
MODERN INFRASTRUCTURE
Clean separation of
responsibilities
No more 3am wake
ups
Easy programmatic
deployment
3 TENETS
Modern Infrastructure
CLEAN
SEPARATION
Modern Infrastructure
Before
- Dan cares about his
hardware and Alice’s
software that runs on it
- Alice cares about her
software and what
hardware Dan provides
Now
- With Mesos, all the nodes
are provisioned exactly the
same (but may have
heterogeneous hardware).
- Dan doesn’t care what
software is deployed since
applications are well
encapsulated.
- Alice doesn’t care where
her software is deployed
because it’s easy enough to
scale up and down.
NO MORE
3AM WAKE
UPS
Modern Infrastructure
Before
- Dan had to react every time
an application or machine
went down.
Now
- Mesos and Marathon
monitor running tasks.
- If a task fails or is lost (due
to a machine going offline),
Mesos communicates that
to Marathon.
- Marathon restarts the
application.
- Dan gets to sleep
peacefully!
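The restart behaviour above can be pictured as a reconciliation loop that keeps the number of running tasks at a desired count. The sketch below is a toy model, not Marathon's code: `reconcile` and the replacement task ids are hypothetical, and only the task state names are borrowed from Mesos.

```python
import itertools

# Hypothetical toy model of "keep N instances running"; not Marathon's code.
_new_ids = itertools.count(1)


def reconcile(tasks, desired_instances):
    """Drop tasks that are no longer running, then launch replacements
    until the number of running tasks matches the desired count."""
    running = {tid: state for tid, state in tasks.items() if state == "TASK_RUNNING"}
    while len(running) < desired_instances:
        # Stand-in for asking Mesos to launch a task on some other agent.
        running[f"replacement-{next(_new_ids)}"] = "TASK_RUNNING"
    return running


# One task failed and one was lost when its machine went offline.
tasks = {"a": "TASK_RUNNING", "b": "TASK_FAILED", "c": "TASK_LOST"}
healed = reconcile(tasks, desired_instances=3)
print(sorted(healed))
```

The point of the sketch is that nobody has to be paged: the loop only compares actual state with desired state and fills the gap.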
EASY
PROGRAMMATIC
DEPLOYMENT
Modern Infrastructure
Before
- Servers were handcrafted.
- Deploying new or updated
software would require
oversight and involvement
from both Alice and Dan.
Now
- Dan provides Alice with her
own instance of Marathon
that makes it hard for her
to take down someone
else’s application.
- Running applications are
isolated from each other by
Mesos.
- Marathon offers a nice API
that allows Alice to easily
deploy new versions safely.
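As a rough illustration of programmatic deployment, the sketch below builds (but does not send) the HTTP request that creates an app through Marathon's `/v2/apps` endpoint. The host name, app id and resource values are assumptions made up for this example.

```python
import json
import urllib.request

# Assumed address for illustration; replace with your own Marathon instance.
MARATHON_URL = "http://marathon.example.com:8080"

# Hypothetical app definition: two small instances of a static web server.
app = {
    "id": "/alice/web",
    "cmd": "python3 -m http.server $PORT0",
    "cpus": 0.25,
    "mem": 64,
    "instances": 2,
}


def build_deploy_request(base_url, app_definition):
    """Build the POST request that asks Marathon to create an app."""
    return urllib.request.Request(
        url=f"{base_url}/v2/apps",
        data=json.dumps(app_definition).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_deploy_request(MARATHON_URL, app)
print(request.get_method(), request.full_url)
# To actually deploy, send it with: urllib.request.urlopen(request)
```

Because the app definition is plain data, Alice can keep it in version control and submit it from a script or a CI job.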
Apache Mesos
LAYER OF ABSTRACTION
Apache Mesos
INTRODUCTION
Apache Mesos is a cluster resource manager.
It handles:
Aggregating resources and offering them to schedulers
Launching tasks (i.e. processes) on those resources
Communicating the state of those tasks back to schedulers
PRODUCTION CUSTOMERS AND MESOS USERS
Government Agencies
MESOS:
ORIGINS
THE BIRTH OF MESOS
Spring 2009: CS262B
Ben Hindman, Andy Konwinski and
Matei Zaharia create “Nexus” as their
CS262B class project.
March 2010: TWITTER TECH TALK
The grad students working on Mesos
give a tech talk at Twitter.
September 2010: MESOS PUBLISHED
“Mesos: A Platform for Fine-Grained
Resource Sharing in the Data Center” is
published as a technical report.
December 2010: APACHE INCUBATION
Mesos enters the Apache Incubator.
Sharing resources between batch
processing frameworks
Hadoop
MPI
Spark
What does an operating system provide?
Resource management
Programming abstractions
Security
Monitoring, debugging, logging
TECHNOLOGY VISION
ARCHITECTURE
MESOS FUNDAMENTALS
Agents advertise resources to Master
Master offers resources to Scheduler
Scheduler rejects/uses resources
Agents report task status to Master
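The offer cycle described above can be sketched as a toy simulation. This is an illustration under simplifying assumptions, not the Mesos scheduler API: `Offer`, `Task` and `schedule` are hypothetical names, and each offer is used for at most one task.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    agent: str
    cpus: float
    mem: int  # megabytes


@dataclass
class Task:
    name: str
    cpus: float
    mem: int  # megabytes


def schedule(offers, pending):
    """Toy scheduler: use an offer for the first pending task that fits,
    and decline offers that no pending task fits into."""
    launched, declined = [], []
    pending = list(pending)
    for offer in offers:
        fit = next(
            (t for t in pending if t.cpus <= offer.cpus and t.mem <= offer.mem),
            None,
        )
        if fit is None:
            declined.append(offer.agent)  # resources go back to the master
        else:
            pending.remove(fit)
            launched.append((fit.name, offer.agent))
    return launched, declined


# The master has aggregated resources from two agents and offers them.
offers = [Offer("agent-1", 4.0, 8192), Offer("agent-2", 0.5, 512)]
launched, declined = schedule(offers, [Task("web", 1.0, 1024)])
print(launched, declined)
```

The design point is that the master only offers resources; the decision about what to run where stays with each scheduler.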
A naive approach to handling varied app
requirements: static partitioning.
This can cope with heterogeneity, but is
very expensive.
KEEP IT STATIC
Maintaining sufficient headroom to
handle peak workloads on all partitions
leads to poor utilisation overall.
KEEP IT STATIC
Multiple frameworks can use the same
cluster resources, with their share
adjusting dynamically.
SHARED RESOURCES
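A small worked example makes the utilisation argument concrete. The demand figures below are made up for illustration; they only assume that the two frameworks peak at different times.

```python
# Illustrative, made-up demand (in machines) over four time slots for two
# frameworks whose peaks fall at different times.
demand = {
    "hadoop": [10, 10, 80, 80],
    "web": [70, 70, 20, 20],
}

# Static partitioning: each framework gets enough machines for its own peak.
static_capacity = sum(max(series) for series in demand.values())

# Shared cluster: capacity only has to cover the peak of the combined demand.
combined = [sum(slot) for slot in zip(*demand.values())]
shared_capacity = max(combined)

average_load = sum(combined) / len(combined)
static_utilisation = average_load / static_capacity
shared_utilisation = average_load / shared_capacity

print(static_capacity, shared_capacity)
print(round(static_utilisation, 2), round(shared_utilisation, 2))
```

With these numbers, static partitioning needs 150 machines at 60% average utilisation, while the shared cluster needs 100 machines at 90%.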
TWITTER &
MESOS
● Former Google engineers at Twitter
thought Mesos could provide the same
functionality as Borg.
● Mesos actually works pretty well for
long-running services.
MESOS REALLY HELPS
MESOS WITH
MARATHON IN
PRODUCTION
WHAT IS MARATHON?
Mesos with Marathon in Production
Service scheduler for Mesos
init.d for long-running apps
Your own private PaaS
USEFUL MARATHON FEATURES
Mesos with Marathon in Production
Start, stop, scale, update apps
Highly available, no SPoF
Native Docker support
Powerful Web UI
Fully featured REST API
Pluggable event bus
Artifact staging
USEFUL MARATHON FEATURES: DEPLOY LIKE FACEBOOK
Mesos with Marathon in Production
Application versioning
Rolling deploy / restart
Deployment strategies
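As a hedged sketch of how a rolling deploy can be tuned, the example below uses the `upgradeStrategy` block of a Marathon app definition (`minimumHealthCapacity`, `maximumOverCapacity`). The app id and values are made up, and the rounding in `rollout_bounds` is an assumption for illustration rather than Marathon's exact arithmetic.

```python
import json
import math

# Hypothetical app update: keep at least 80% of instances healthy during the
# rollout and allow up to 20% extra instances while old and new overlap.
app_update = {
    "id": "/alice/web",
    "instances": 10,
    "upgradeStrategy": {
        "minimumHealthCapacity": 0.8,
        "maximumOverCapacity": 0.2,
    },
}


def rollout_bounds(app):
    """Return (minimum healthy instances, maximum total instances) that the
    strategy implies during a deployment. Rounding here is an assumption."""
    strategy = app["upgradeStrategy"]
    instances = app["instances"]
    min_healthy = math.ceil(instances * strategy["minimumHealthCapacity"])
    max_total = instances + math.floor(instances * strategy["maximumOverCapacity"])
    return min_healthy, max_total


bounds = rollout_bounds(app_update)
print(bounds)
print(json.dumps(app_update, indent=2))
```

Setting `minimumHealthCapacity` to 1.0 corresponds to the cautious behaviour of starting healthy new instances before stopping old ones; lower values trade safety margin for spare capacity.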
USEFUL MARATHON FEATURES: DEPLOY LIKE A TELCO
Mesos with Marathon in Production
Application versioning
Hot/hot new/old clusters
Authentic scale testing
Manual *and* automated testing
MESOS WITH
MARATHON IN
ACTION
Mesos with Marathon in Action
TASK FAILURE
How do my applications discover each other?
Two main service discovery mechanisms:
1. DNS-based (Mesos-DNS)
2. HAProxy-based (Marathon-lb)
SERVICE DISCOVERY
Mesos with Marathon in Production
MESOS-DNS
Service Discovery
● Ingests cluster state
periodically.
● Uses cluster state to
generate DNS records for
all running Mesos tasks.
● Services query DNS server
to discover IP address and
port of other services.
● Primarily used for internal
service discovery.
● No extra configuration
required!
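To show what a service would actually look up, the sketch below builds the DNS names for a task following the Mesos-DNS naming convention as we understand it (`task.framework.domain` for A records, `_task._protocol.framework.domain` for SRV records). The helper name and the `search` task are hypothetical, and the default domain `mesos` is assumed.

```python
def mesos_dns_names(task, framework="marathon", domain="mesos", protocol="tcp"):
    """Return the (A record, SRV record) names a client would query.

    The A record resolves to the task's IP address; the SRV record also
    carries the port, which matters when ports are assigned dynamically.
    """
    a_record = f"{task}.{framework}.{domain}"
    srv_record = f"_{task}._{protocol}.{framework}.{domain}"
    return a_record, srv_record


a_name, srv_name = mesos_dns_names("search")
print(a_name)
print(srv_name)
```

A client can then resolve these names with an ordinary DNS library, provided its resolver is pointed at the Mesos-DNS server.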
MARATHON-LB
Service Discovery
● Ingests state of running
Marathon applications.
● Regenerates HAProxy
configuration.
● Supports virtual hosts!
● Can be used for both
internal and external
service discovery.
● Must add
HAPROXY_GROUP and
HAPROXY_0_VHOST
variables to your
marathon.json.
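The two variables go into the `labels` section of the app definition. The sketch below adds them to a minimal, made-up app and prints the resulting marathon.json; the app id, group name and virtual host are assumptions for illustration.

```python
import json


def with_marathon_lb(app, vhost, group="external"):
    """Return a copy of the app definition with the labels Marathon-lb reads."""
    labelled = dict(app)
    labelled["labels"] = {
        **app.get("labels", {}),
        "HAPROXY_GROUP": group,    # which load balancer group exposes the app
        "HAPROXY_0_VHOST": vhost,  # virtual host for the app's first port
    }
    return labelled


# Hypothetical app definition.
app = {"id": "/alice/web", "cpus": 0.25, "mem": 64, "instances": 2}
exposed = with_marathon_lb(app, vhost="web.example.com")
print(json.dumps(exposed, indent=2))
```

The original `app` dictionary is left untouched, so the same definition can be deployed with or without load balancer exposure.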
Using Chef/Puppet/Ansible (or a reliable intern):
Install ZooKeeper and Mesos
Install your scheduler (Marathon)
Deploy some long-running services.
See https://open.mesosphere.com/getting-started/tools/ for more docs
HOW TO DEPLOY A MESOS CLUSTER (THE HARD WAY)
Mesos as the Datacenter Kernel
Visit http://mesosphere.com
Hit the ‘Get Started’ button
HOW TO DEPLOY A MESOS CLUSTER (OUR WAY)
Mesos as the Datacenter Kernel
Existing Infrastructure
Mesosphere DCOS
Services & Containers
MESOS AS THE DATACENTER KERNEL
Jenkins on Mesos allows you to share build resources between multiple Jenkins
masters.
PayPal does this with hundreds of Jenkins masters.
Between them, they use fewer than a hundred build slaves to service several
thousand developers.
Combining Jenkins with a PaaS like Marathon or Kubernetes makes continuous
deployment easy to practice.
JENKINS: BUILD RESOURCE POOLING
Mesos as the Datacenter Kernel
An example continuous
deployment pipeline
using Mesos, Marathon,
Docker and Jenkins on
Mesos.
CONTINUOUS
DELIVERY
DEMO
KUBERNETES ON MESOS
Mesos as the Datacenter Kernel
Mesos was built for, and is well suited to, running big data workloads:
Chronos (time scheduled jobs)
Spark
Cassandra
Kafka
Hadoop/YARN (via Myriad)
BIG DATA ON MESOS
Mesos as the Datacenter Kernel
QUESTIONS?
THANK YOU!
Come and talk to us!
● Email us at philip@mesosphere.io, sunil@mesosphere.io
● Slides will be up at http://mesosphere.github.io/presentations


Editor's Notes

  • #5 Before: In the old world, both our datacenter operator Dan and our application developer Alice had to be concerned with both the physical infrastructure running an application and the actual configuration and deployment of that application.
    - Dan would have to know how to run Alice's application in case a node went away.
    - We get a painful situation where Dan has to treat each of his machines like a carefully crafted snowflake.
    - Alice would have to know how much hardware to ask Dan for, and would usually overestimate to be safe.
    - This means that resource utilisation is often pretty bad.
    Now:
    - In the Mesos world, you have a homogeneous cluster running your applications. It doesn't matter where your software gets deployed since it brings its dependencies with it.
    - Dan doesn't have to worry about specific machines going away and disproportionately impacting specific applications.
    - Alice doesn't have to worry too much about requesting enough resources ahead of time, because it's quick and easy to scale her application.
  • #6 Before: If an application or server went down, your infrastructure might have had enough slack to tolerate it, but often it did not. Your operator and/or developer got paged and had to restart the application manually or try to revive the faulty hardware.
    Now:
    - Mesos and Marathon provide a health checking mechanism that allows them to tell if an application is up and responsive. If it's not, they'll reap the application and start it again somewhere else.
    - Similarly, if a machine goes down, Mesos communicates the task failure to Marathon. Marathon will try to maintain the desired state (e.g. 100 instances) and re-deploy the application somewhere else.
    - Dan doesn't get woken up because DCOS takes care of these failures.
  • #7 Before:
    - Provisioning machines was a privileged action. Alice would have to petition Dan for an upgrade. Changes were slow.
    Now:
    - Dan can provide Alice with her own instance of Marathon. Her tasks are inherently isolated from other tasks running on the same cluster. Alice is able to run what she likes using this instance (with some reasonable constraints).
    - Alice can now programmatically deploy applications to Marathon using its APIs.
  • #8 Coordinator -> scheduler Slaves -> agents
  • #12 Timeline:
    - Spring 2009: 262B project between BenH, Andy and Matei
    - 2009: Nexus project started at UC Berkeley’s RAD Lab (later renamed to Mesos)
    - March 2010: Grad students give talk at Twitter, Twitter starts exploring deployment
    - September 2010: Mesos technical report published (http://mesos.berkeley.edu/mesos_tech_report.pdf)
    - December 2010: Mesos enters Apache Incubator (http://incubator.apache.org/projects/mesos.html)
  • #14 Slave -> agent
  • #23 Sunil starts here. Let’s talk a little about why Mesos was so useful to Twitter.
  • #24 When Mesos was presented at Twitter, engineers in the audience immediately saw potential. A couple of engineers, who had been at Google, saw this as a possible replacement for Borg.
  • #25 Mesos actually works pretty well for long running services.
  • #49 This is a little more involved.
    - Marathon takes care of deploying your application: just post marathon.json to http://dcos-host/service/my-marathon
    - It has different deployment strategies. The default is to start as many healthy new instances of the app as are currently running before killing the old ones.
    - The strategy is configurable.