SlideShare a Scribd company logo
Apache Mesos Overview and
Integration:
Docker, Kubernetes, and Beyond
Alex Barreto
Disney Interactive - Data Technology
alex.barreto@disney.com
@shakamunyi
What you should take away
● Overview
● History
● Use Cases
● Integration
– Dokah, dokah, dokah
– Kubernetes
What is Apache Mesos?
Apache Mesos is a cluster manager that
provides efficient resource isolation and sharing
across distributed applications, or frameworks.
It can run Hadoop, MPI, Hypertable, Spark,
Elastic Search, Storm, Aurora, Marathon ... and
other applications on a dynamically shared pool of
nodes.
Rephrased:
- Mesos is a distributed systems kernel -
● Mesos is built using the same principles as the Linux
kernel, only at a different level of abstraction. The
Mesos kernel runs on every machine and provides
applications (e.g., Hadoop, Spark, Kafka, Elastic
Search) with API’s for resource management and
scheduling across entire datacenter and cloud
environments.
● Or you could think of it as a distributed system to build
distributed systems.
Project Highlights
● Top-level Apache project ~ 1 year (mesos.apache.org)
● Scales to 10,000s of nodes
● Obviates the need for virtual machines in some use cases
● Isolation for CPU, RAM, I/O, FS, etc.
● Fault-tolerant leader election (HA) based on Zookeeper
● API's in C++, Java/Scala, Python, Go, Erlang, Haskell.
● Web UI for inspecting state
● Available for Linux, OpenSolaris, Mac OSX
Mesos From 50K.
Who is using Mesos? - This list is old.
Google Refs
● The Datacenter as a Computer: An Introduction to the Design
of Warehouse-Scale Machines
http://research.google.com/pubs/pub35290.html
● 2011 GAFS Omega John Wilkes:
http://youtu.be/0ZFMlO98Jkc
● Omega: flexible, scalable schedulers for large compute
clusters
http://eurosys2013.tudos.org/wp-content/uploads/2013/paper
/Schwarzkopf.pdf
● Taming Latency Variability and Scaling Deep Learning
https://plus.google.com/u/0/+ResearchatGoogle/posts/C1dPh
QhcDRv
History
Understanding of Datacenter Computing
Google has been doing data center computing for
years to address the complexities of large-scale
data workflows:
● Leveraging the modern kernel isolation. (cgroups)
● Containerization !Virtualization (lmctfy - Docker)
● Most (>80) jobs are batch jobs, but the majority of
resources(55-80%) are allocated to service jobs.
● Mixed workloads, multi-tenancy
● Relatively high utilization rates
● JVM? Not so much...
● Reality: scheduling batch is simple;
– scheduling services is hard/expensive.
Refs.
● The Datacenter as a Computer: An Introduction
to the Design of Warehouse-Scale Machines
– http://research.google.com/pubs/pub35290.html
● GAFS Omega John Wilkes
– https://www.youtube.com/watch?v=0ZFMlO98Jkc
● Taming Latency Variability and Scaling Deep
Learning
– http://youtu.be/nK6daeTZGA8
Why Mesos?
● Many existing solutions had architectural
deficiencies around a constraining model.
– Everything is a “job”, good for batch.
– Everything was a “service”, good for PaaS.
– What happens when you want to write your own
distributed application? (no primitives)
– What happens when you want to write your own
scheduler (elastic service). Reinvent the Square
wheel.
The New Reality
● New applications need to be:
– Fault Tolerant (Withstand failure)
– Scalable (Not crumble under it's own weight)
– Elastic (Can grow and shrink based on demand)
– Multi-tenent (It can't have it's own dedicated cluster)
● Must play nice with the other kids.
● So what does that really mean?
Distributed Applications
● “There's Just No Getting Around It: You're Building a Distributed System” Mark Cavage
– queue.acm.org/detail.cfm?id=2482856
● Key takeaways on architecture:
– Decompose the business applications into discrete services on the boundaries of fault
domains, scaling, and data workload.
– Make as many things as possible stateless
– When dealing with state, deeply understand CAP, latency, throughput, and durability
requirements.
“Without practical experience working on successful—and failed—systems, most engineers take
a "hopefully it works" approach and attempt to string together off-the-shelf software, whether
open source or commercial, and often are unsuccessful at building a resilient, performant system.
In reality, building a distributed system requires a methodical approach to requirements along the
boundaries of failure domains, latency, throughput, durability, consistency, and desired SLAs for
the business application at all aspects of the application.”
Google Refs
● The Datacenter as a Computer: An Introduction to the Design
of Warehouse-Scale Machines
http://research.google.com/pubs/pub35290.html
● 2011 GAFS Omega John Wilkes:
http://youtu.be/0ZFMlO98Jkc
● Omega: flexible, scalable schedulers for large compute
clusters
http://eurosys2013.tudos.org/wp-content/uploads/2013/paper
/Schwarzkopf.pdf
● Taming Latency Variability and Scaling Deep Learning
https://plus.google.com/u/0/+ResearchatGoogle/posts/C1dPh
QhcDRv
Emerging
at Berkeley
BDAS Stack
Prior Practice: Dedicated Servers
• low utilization rates
• longer time to ramp up new services
DATACENTER
Prior Practice: Virtualization
DATACENTER PROVISIONED VMS
• even more machines to manage
• substantial performance decrease
due to virtualization
• VM licensing costs
Prior Practice: Static Partitioning
STATIC PARTITIONING
• even more machines to manage
• substantial performance decrease
due to virtualization
• VM licensing costs
• failures make static partitioning
more complex to manage
DATACENTER
What are the costs of Single Tenancy?
0%
25%
50%
75%
100%
RAILS CPU
LOAD
MEMCACHED
CPU LOAD
0%
25%
50%
75%
100%
HADOOP CPU
LOAD
0%
25%
50%
75%
100%
tt
0%
25%
50%
75%
100%
Rails
Memcached
Hadoop
COMBINED CPU LOAD (RAILS,
MEMCACHED, HADOOP)
MESOS
Mesos: One Large Pool of Resources
“We wanted people to be able to program
for the datacenter just like they program
for their laptop."
Ben Hindman
DATACENTER
Kernel
Apps
servicesbatch
Frameworks
Python
JVM
C
++
Workloads
distributed file system
Chronos
DFS
distributed resources: CPU, RAM, I/O, FS, rack locality, etc. Cluster
Storm
Kafka JBoss Django RailsImpalaScalding
Marathon
SparkHadoopMPI
MySQL
Mesos – architecture
Google Refs
● The Datacenter as a Computer: An Introduction to the Design
of Warehouse-Scale Machines
http://research.google.com/pubs/pub35290.html
● 2011 GAFS Omega John Wilkes:
http://youtu.be/0ZFMlO98Jkc
● Omega: flexible, scalable schedulers for large compute
clusters
http://eurosys2013.tudos.org/wp-content/uploads/2013/paper
/Schwarzkopf.pdf
● Taming Latency Variability and Scaling Deep Learning
https://plus.google.com/u/0/+ResearchatGoogle/posts/C1dPh
QhcDRv
Use Cases
Case Study: Twitter (bare metal / on premise)
“Mesos is the cornerstone of our elastic compute infrastructure –
it’s how we build all our new services and is critical for Twitter’s
continued success at scale. It's one of the primary keys to our
data center efficiency."
Chris Fry, SVP Engineering
blog.twitter.com/2013/mesos-graduates-from-apache-incubation
wired.com/gadgetlab/2013/11/qa-with-chris-fry/
• key services run in production: analytics, typeahead, ads
• Twitter engineers rely on Mesos to build all new services
• instead of thinking about static machines, engineers think
about resources like CPU, memory and disk
• allows services to scale and leverage a shared pool of
servers across datacenters efficiently
• reduces the time between prototyping and launching
Case Study: Airbnb (fungible cloud infrastructure)
“We think we might be pushing data science in the field of travel
more so than anyone has ever done before… a smaller number
of engineers can have higher impact through automation on
Mesos."
Mike Curtis, VP Engineering
gigaom.com/2013/07/29/airbnb-is-engineering-itself-into-a-data...
• improves resource management and efficiency
• helps advance engineering strategy of building small teams
that can move fast
• key to letting engineers make the most of AWS-based
infrastructure beyond just Hadoop
• allowed company to migrate off Elastic MapReduce
• enables use of Hadoop along with Chronos, Spark, Storm, etc.
Case Study: HubSpot (cluster management)
Tom Petr
youtu.be/ROn14csiikw
• 500 deployable objects; 100 deploys/day to
production; 90 engineers; 3 devops on Mesos
cluster
• “Our QA cluster is now a fixed $10K/month —
that used to fluctuate”
Dock-ah, dock-ah, dock-ah
"If a Docker application is a Lego brick, Kubernetes would be like a kit
for building the Millennium Falcon and the Mesos cluster would be like
a whole Star Wars universe made of Legos." ~ Solomon
● Mesos used to have plugable add-ons for docker, and now they are 1st
classed into the core.
– Lots of churn between 0.20-0.21 on feedback from the field.
● It enables programmatic scheduling of containers from batch (cronos), to
service orchestration (marathon & Aurora).
● Common deployment is Marathon + HA-Proxy + Docker running from
O(10^1) - O(10^3) with mesos clusters O(10^4).
● Some users are exploring customized elastic batch.
● Custom frameworks are quite a large space, we only see a tip of the iceburg.
Kubernetes+Mesos = Awesome
● Kubernetes provides robust declarative
primitives for distributed micro-services.
● Mesos provides an imperative framework by
which application developers can define
scheduling policy in a programmatic fashion.
● When leveraged together it provides a data-
center with the ability to both.
– Both batch and service.
Kubernetes+Mesos cont.
● Launching pods
– Implement Kube-scheduler API
– Pick a Pod (FCFS), match it to an offer.
– Launch it!
– Kubelet as Executor+Containerizer
● Pod Labels: for Service Discovery + Load Balancing
● Running multi-node on GCE
● Replication Control
● Use resource shapes to schedule pods
● Even smarter scheduling
The BEYOND Section
Coming – “Soon-ish”
● Primitives to support Stateful Services (HDFS,
Cassandra, etc.)
– Persistent disk data.
– Lazy resource reservation
– Expanded Disk Isolation
● I/O
● FS/Mount namespace
● Support multiple spindles
– Dynamic slave attributes.
“Soon-ish” - part 2
● Opportunistic Offers
– Free-for-all like Omega.
– Requires Framework API modifications
– Increased utilization at cloud-scale.
Google Refs
● The Datacenter as a Computer: An Introduction to the Design
of Warehouse-Scale Machines
http://research.google.com/pubs/pub35290.html
● 2011 GAFS Omega John Wilkes:
http://youtu.be/0ZFMlO98Jkc
● Omega: flexible, scalable schedulers for large compute
clusters
http://eurosys2013.tudos.org/wp-content/uploads/2013/paper
/Schwarzkopf.pdf
● Taming Latency Variability and Scaling Deep Learning
https://plus.google.com/u/0/+ResearchatGoogle/posts/C1dPh
QhcDRv
Thank You!
mesos.apache.org
@shakamunyi

More Related Content

What's hot

RHTE2015_CloudForms_OpenStack
RHTE2015_CloudForms_OpenStackRHTE2015_CloudForms_OpenStack
RHTE2015_CloudForms_OpenStack
Jerome Marc
 
Openshift YARN - strata 2014
Openshift YARN - strata 2014Openshift YARN - strata 2014
Openshift YARN - strata 2014
Hortonworks
 
Red hat's updates on the cloud & infrastructure strategy
Red hat's updates on the cloud & infrastructure strategyRed hat's updates on the cloud & infrastructure strategy
Red hat's updates on the cloud & infrastructure strategy
Orgad Kimchi
 
Red Hat OpenStack - Open Cloud Infrastructure
Red Hat OpenStack - Open Cloud InfrastructureRed Hat OpenStack - Open Cloud Infrastructure
Red Hat OpenStack - Open Cloud Infrastructure
Alex Baretto
 
Putting Drupal in the Cloud with Red Hat's OpenShift PaaS #DrupalCon/Prague
Putting Drupal in the Cloud with Red Hat's OpenShift PaaS  #DrupalCon/Prague Putting Drupal in the Cloud with Red Hat's OpenShift PaaS  #DrupalCon/Prague
Putting Drupal in the Cloud with Red Hat's OpenShift PaaS #DrupalCon/Prague
OpenShift Origin
 
Red Hat presentatie: Open stack Latest Pure Tech
Red Hat presentatie: Open stack Latest Pure TechRed Hat presentatie: Open stack Latest Pure Tech
Red Hat presentatie: Open stack Latest Pure Tech
ProxyServices
 
OpenShift Meetup 8th july 2019 at ConSol - OpenShift v4
OpenShift Meetup 8th july 2019 at ConSol - OpenShift v4OpenShift Meetup 8th july 2019 at ConSol - OpenShift v4
OpenShift Meetup 8th july 2019 at ConSol - OpenShift v4
Robert Bohne
 
Open stack platform director
Open stack platform director Open stack platform director
Open stack platform director
Jsonr4
 
LatinoWare 2013 An OpenSource Blueprint for Cloud presented by Diane Mueller,...
LatinoWare 2013 An OpenSource Blueprint for Cloud presented by Diane Mueller,...LatinoWare 2013 An OpenSource Blueprint for Cloud presented by Diane Mueller,...
LatinoWare 2013 An OpenSource Blueprint for Cloud presented by Diane Mueller,...
OpenShift Origin
 
OpenShift In a Nutshell - Episode 01 - Introduction
OpenShift In a Nutshell - Episode 01 - IntroductionOpenShift In a Nutshell - Episode 01 - Introduction
OpenShift In a Nutshell - Episode 01 - Introduction
Behnam Loghmani
 
Discover the all new Mesosphere DC/OS 1.10
Discover the all new Mesosphere DC/OS 1.10Discover the all new Mesosphere DC/OS 1.10
Discover the all new Mesosphere DC/OS 1.10
Mesosphere Inc.
 
What is the OpenStack Platform? By Peter Dens - Kangaroot
What is the OpenStack Platform? By Peter Dens - KangarootWhat is the OpenStack Platform? By Peter Dens - Kangaroot
What is the OpenStack Platform? By Peter Dens - Kangaroot
Kangaroot
 
Meetup
MeetupMeetup
Open shift 4 infra deep dive
Open shift 4    infra deep diveOpen shift 4    infra deep dive
Open shift 4 infra deep dive
Winton Winton
 
Oracle week Israel - OpenStack Platform - 2013
Oracle week Israel - OpenStack Platform - 2013Oracle week Israel - OpenStack Platform - 2013
Oracle week Israel - OpenStack Platform - 2013
Arthur Berezin
 
Red Hat OpenStack Deployment
Red Hat OpenStack DeploymentRed Hat OpenStack Deployment
Red Hat OpenStack Deployment
Michael Solberg
 
Deploying & Scaling OpenShift on OpenStack using Heat - OpenStack Seattle Mee...
Deploying & Scaling OpenShift on OpenStack using Heat - OpenStack Seattle Mee...Deploying & Scaling OpenShift on OpenStack using Heat - OpenStack Seattle Mee...
Deploying & Scaling OpenShift on OpenStack using Heat - OpenStack Seattle Mee...
Diane Mueller
 
Kangaroot open shift best practices - straight from the battlefield
Kangaroot open shift best practices - straight from the battlefieldKangaroot open shift best practices - straight from the battlefield
Kangaroot open shift best practices - straight from the battlefield
Kangaroot
 
An Introduction to Red Hat Enterprise Linux OpenStack Platform
An Introduction to Red Hat Enterprise Linux OpenStack PlatformAn Introduction to Red Hat Enterprise Linux OpenStack Platform
An Introduction to Red Hat Enterprise Linux OpenStack Platform
Rhys Oxenham
 
OpenStack Summit Tokyo 2015: Scale or Fail: Containers on OpenStack with Open...
OpenStack Summit Tokyo 2015: Scale or Fail: Containers on OpenStack with Open...OpenStack Summit Tokyo 2015: Scale or Fail: Containers on OpenStack with Open...
OpenStack Summit Tokyo 2015: Scale or Fail: Containers on OpenStack with Open...
Diane Mueller
 

What's hot (20)

RHTE2015_CloudForms_OpenStack
RHTE2015_CloudForms_OpenStackRHTE2015_CloudForms_OpenStack
RHTE2015_CloudForms_OpenStack
 
Openshift YARN - strata 2014
Openshift YARN - strata 2014Openshift YARN - strata 2014
Openshift YARN - strata 2014
 
Red hat's updates on the cloud & infrastructure strategy
Red hat's updates on the cloud & infrastructure strategyRed hat's updates on the cloud & infrastructure strategy
Red hat's updates on the cloud & infrastructure strategy
 
Red Hat OpenStack - Open Cloud Infrastructure
Red Hat OpenStack - Open Cloud InfrastructureRed Hat OpenStack - Open Cloud Infrastructure
Red Hat OpenStack - Open Cloud Infrastructure
 
Putting Drupal in the Cloud with Red Hat's OpenShift PaaS #DrupalCon/Prague
Putting Drupal in the Cloud with Red Hat's OpenShift PaaS  #DrupalCon/Prague Putting Drupal in the Cloud with Red Hat's OpenShift PaaS  #DrupalCon/Prague
Putting Drupal in the Cloud with Red Hat's OpenShift PaaS #DrupalCon/Prague
 
Red Hat presentatie: Open stack Latest Pure Tech
Red Hat presentatie: Open stack Latest Pure TechRed Hat presentatie: Open stack Latest Pure Tech
Red Hat presentatie: Open stack Latest Pure Tech
 
OpenShift Meetup 8th july 2019 at ConSol - OpenShift v4
OpenShift Meetup 8th july 2019 at ConSol - OpenShift v4OpenShift Meetup 8th july 2019 at ConSol - OpenShift v4
OpenShift Meetup 8th july 2019 at ConSol - OpenShift v4
 
Open stack platform director
Open stack platform director Open stack platform director
Open stack platform director
 
LatinoWare 2013 An OpenSource Blueprint for Cloud presented by Diane Mueller,...
LatinoWare 2013 An OpenSource Blueprint for Cloud presented by Diane Mueller,...LatinoWare 2013 An OpenSource Blueprint for Cloud presented by Diane Mueller,...
LatinoWare 2013 An OpenSource Blueprint for Cloud presented by Diane Mueller,...
 
OpenShift In a Nutshell - Episode 01 - Introduction
OpenShift In a Nutshell - Episode 01 - IntroductionOpenShift In a Nutshell - Episode 01 - Introduction
OpenShift In a Nutshell - Episode 01 - Introduction
 
Discover the all new Mesosphere DC/OS 1.10
Discover the all new Mesosphere DC/OS 1.10Discover the all new Mesosphere DC/OS 1.10
Discover the all new Mesosphere DC/OS 1.10
 
What is the OpenStack Platform? By Peter Dens - Kangaroot
What is the OpenStack Platform? By Peter Dens - KangarootWhat is the OpenStack Platform? By Peter Dens - Kangaroot
What is the OpenStack Platform? By Peter Dens - Kangaroot
 
Meetup
MeetupMeetup
Meetup
 
Open shift 4 infra deep dive
Open shift 4    infra deep diveOpen shift 4    infra deep dive
Open shift 4 infra deep dive
 
Oracle week Israel - OpenStack Platform - 2013
Oracle week Israel - OpenStack Platform - 2013Oracle week Israel - OpenStack Platform - 2013
Oracle week Israel - OpenStack Platform - 2013
 
Red Hat OpenStack Deployment
Red Hat OpenStack DeploymentRed Hat OpenStack Deployment
Red Hat OpenStack Deployment
 
Deploying & Scaling OpenShift on OpenStack using Heat - OpenStack Seattle Mee...
Deploying & Scaling OpenShift on OpenStack using Heat - OpenStack Seattle Mee...Deploying & Scaling OpenShift on OpenStack using Heat - OpenStack Seattle Mee...
Deploying & Scaling OpenShift on OpenStack using Heat - OpenStack Seattle Mee...
 
Kangaroot open shift best practices - straight from the battlefield
Kangaroot open shift best practices - straight from the battlefieldKangaroot open shift best practices - straight from the battlefield
Kangaroot open shift best practices - straight from the battlefield
 
An Introduction to Red Hat Enterprise Linux OpenStack Platform
An Introduction to Red Hat Enterprise Linux OpenStack PlatformAn Introduction to Red Hat Enterprise Linux OpenStack Platform
An Introduction to Red Hat Enterprise Linux OpenStack Platform
 
OpenStack Summit Tokyo 2015: Scale or Fail: Containers on OpenStack with Open...
OpenStack Summit Tokyo 2015: Scale or Fail: Containers on OpenStack with Open...OpenStack Summit Tokyo 2015: Scale or Fail: Containers on OpenStack with Open...
OpenStack Summit Tokyo 2015: Scale or Fail: Containers on OpenStack with Open...
 

Similar to Apache Mesos Overview and Integration

Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache Mesos
Timothy St. Clair
 
Strata SC 2014: Apache Mesos as an SDK for Building Distributed Frameworks
Strata SC 2014: Apache Mesos as an SDK for Building Distributed FrameworksStrata SC 2014: Apache Mesos as an SDK for Building Distributed Frameworks
Strata SC 2014: Apache Mesos as an SDK for Building Distributed Frameworks
Paco Nathan
 
Datacenter Computing with Apache Mesos - シリコンバレー日本人駐在員Meetup
Datacenter Computing with Apache Mesos - シリコンバレー日本人駐在員MeetupDatacenter Computing with Apache Mesos - シリコンバレー日本人駐在員Meetup
Datacenter Computing with Apache Mesos - シリコンバレー日本人駐在員Meetup
Paco Nathan
 
OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...
OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...
OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...
OW2
 
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
OCCIware
 
Openstack - Enterprise cloud management platform
Openstack - Enterprise cloud management platformOpenstack - Enterprise cloud management platform
Openstack - Enterprise cloud management platform
Nagaraj Shenoy
 
DCOS Presentation
DCOS PresentationDCOS Presentation
DCOS Presentation
Jan Repnak
 
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platform
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platformOCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platform
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platform
Marc Dutoo
 
Mesos vs kubernetes comparison
Mesos vs kubernetes comparisonMesos vs kubernetes comparison
Mesos vs kubernetes comparison
Krishna-Kumar
 
Modern Elastic Datacenter Architecture
Modern Elastic Datacenter ArchitectureModern Elastic Datacenter Architecture
Modern Elastic Datacenter Architecture
Weston Bassler
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OS
Steve Wong
 
OCCIware@CloudExpoLondon2017 - an extensible, standard XaaS Cloud consumer pl...
OCCIware@CloudExpoLondon2017 - an extensible, standard XaaS Cloud consumer pl...OCCIware@CloudExpoLondon2017 - an extensible, standard XaaS Cloud consumer pl...
OCCIware@CloudExpoLondon2017 - an extensible, standard XaaS Cloud consumer pl...
Marc Dutoo
 
Extensible and Standard-based XaaS Platform To Manage Everything in The Cloud...
Extensible and Standard-based XaaS Platform To Manage Everything in The Cloud...Extensible and Standard-based XaaS Platform To Manage Everything in The Cloud...
Extensible and Standard-based XaaS Platform To Manage Everything in The Cloud...
OCCIware
 
OCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, Smile
OCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, SmileOCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, Smile
OCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, Smile
OCCIware
 
Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017
Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017
Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017
Marc Dutoo
 
Journey to Containerized Application / Google Container Engine
Journey to Containerized Application / Google Container EngineJourney to Containerized Application / Google Container Engine
Journey to Containerized Application / Google Container Engine
Google Cloud Platform - Japan
 
Docker
DockerDocker
Docker
Yansi Keim
 
Datacenter Computing with Apache Mesos - BigData DC
Datacenter Computing with Apache Mesos - BigData DCDatacenter Computing with Apache Mesos - BigData DC
Datacenter Computing with Apache Mesos - BigData DC
Paco Nathan
 
At the Crossroads of HPC and Cloud Computing with Openstack
At the Crossroads of HPC and Cloud Computing with OpenstackAt the Crossroads of HPC and Cloud Computing with Openstack
At the Crossroads of HPC and Cloud Computing with Openstack
Ryan Aydelott
 
[WSO2Con Asia 2018] Architecting for Container-native Environments
[WSO2Con Asia 2018] Architecting for Container-native Environments[WSO2Con Asia 2018] Architecting for Container-native Environments
[WSO2Con Asia 2018] Architecting for Container-native Environments
WSO2
 

Similar to Apache Mesos Overview and Integration (20)

Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache Mesos
 
Strata SC 2014: Apache Mesos as an SDK for Building Distributed Frameworks
Strata SC 2014: Apache Mesos as an SDK for Building Distributed FrameworksStrata SC 2014: Apache Mesos as an SDK for Building Distributed Frameworks
Strata SC 2014: Apache Mesos as an SDK for Building Distributed Frameworks
 
Datacenter Computing with Apache Mesos - シリコンバレー日本人駐在員Meetup
Datacenter Computing with Apache Mesos - シリコンバレー日本人駐在員MeetupDatacenter Computing with Apache Mesos - シリコンバレー日本人駐在員Meetup
Datacenter Computing with Apache Mesos - シリコンバレー日本人駐在員Meetup
 
OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...
OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...
OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...
 
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
 
Openstack - Enterprise cloud management platform
Openstack - Enterprise cloud management platformOpenstack - Enterprise cloud management platform
Openstack - Enterprise cloud management platform
 
DCOS Presentation
DCOS PresentationDCOS Presentation
DCOS Presentation
 
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platform
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platformOCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platform
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platform
 
Mesos vs kubernetes comparison
Mesos vs kubernetes comparisonMesos vs kubernetes comparison
Mesos vs kubernetes comparison
 
Modern Elastic Datacenter Architecture
Modern Elastic Datacenter ArchitectureModern Elastic Datacenter Architecture
Modern Elastic Datacenter Architecture
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OS
 
OCCIware@CloudExpoLondon2017 - an extensible, standard XaaS Cloud consumer pl...
OCCIware@CloudExpoLondon2017 - an extensible, standard XaaS Cloud consumer pl...OCCIware@CloudExpoLondon2017 - an extensible, standard XaaS Cloud consumer pl...
OCCIware@CloudExpoLondon2017 - an extensible, standard XaaS Cloud consumer pl...
 
Extensible and Standard-based XaaS Platform To Manage Everything in The Cloud...
Extensible and Standard-based XaaS Platform To Manage Everything in The Cloud...Extensible and Standard-based XaaS Platform To Manage Everything in The Cloud...
Extensible and Standard-based XaaS Platform To Manage Everything in The Cloud...
 
OCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, Smile
OCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, SmileOCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, Smile
OCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, Smile
 
Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017
Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017
Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017
 
Journey to Containerized Application / Google Container Engine
Journey to Containerized Application / Google Container EngineJourney to Containerized Application / Google Container Engine
Journey to Containerized Application / Google Container Engine
 
Docker
DockerDocker
Docker
 
Datacenter Computing with Apache Mesos - BigData DC
Datacenter Computing with Apache Mesos - BigData DCDatacenter Computing with Apache Mesos - BigData DC
Datacenter Computing with Apache Mesos - BigData DC
 
At the Crossroads of HPC and Cloud Computing with Openstack
At the Crossroads of HPC and Cloud Computing with OpenstackAt the Crossroads of HPC and Cloud Computing with Openstack
At the Crossroads of HPC and Cloud Computing with Openstack
 
[WSO2Con Asia 2018] Architecting for Container-native Environments
[WSO2Con Asia 2018] Architecting for Container-native Environments[WSO2Con Asia 2018] Architecting for Container-native Environments
[WSO2Con Asia 2018] Architecting for Container-native Environments
 

Recently uploaded

June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfAI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
Techgropse Pvt.Ltd.
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
CAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on BlockchainCAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on Blockchain
Claudio Di Ciccio
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 

Recently uploaded (20)

June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfAI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
CAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on BlockchainCAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on Blockchain
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 

Apache Mesos Overview and Integration

  • 1. Apache Mesos Overview and Integration: Docker, Kubernetes, and Beyond Alex Barreto Disney Interactive - Data Technology alex.barreto@disney.com @shakamunyi
  • 2. What you should take away ● Overview ● History ● Use Cases ● Integration – Dokah, dokah, dokah – Kubernetes
  • 3. What is Apache Mesos? Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, MPI, Hypertable, Spark, Elastic Search, Storm, Aurora, Marathon ... and other applications on a dynamically shared pool of nodes.
  • 4. Rephrased: - Mesos is a distributed systems kernel - ● Mesos is built using the same principles as the Linux kernel, only at a different level of abstraction. The Mesos kernel runs on every machine and provides applications (e.g., Hadoop, Spark, Kafka, Elastic Search) with API’s for resource management and scheduling across entire datacenter and cloud environments. ● Or you could think of it as a distributed system to build distributed systems.
  • 5. Project Highlights ● Top-level Apache project ~ 1 year (mesos.apache.org) ● Scales to 10,000s of nodes ● Obviates the need for virtual machines in some use cases ● Isolation for CPU, RAM, I/O, FS, etc. ● Fault-tolerant leader election (HA) based on Zookeeper ● API's in C++, Java/Scala, Python, Go, Erlang, Haskell. ● Web UI for inspecting state ● Available for Linux, OpenSolaris, Mac OSX
  • 7. Who is using Mesos? - This list is old.
  • 8.
  • 9. Google Refs ● The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines http://research.google.com/pubs/pub35290.html ● 2011 GAFS Omega John Wilkes: http://youtu.be/0ZFMlO98Jkc ● Omega: flexible, scalable schedulers for large compute clusters http://eurosys2013.tudos.org/wp-content/uploads/2013/paper /Schwarzkopf.pdf ● Taming Latency Variability and Scaling Deep Learning https://plus.google.com/u/0/+ResearchatGoogle/posts/C1dPh QhcDRv History
  • 10. Understanding of Datacenter Computing Google has been doing data center computing for years to address the complexities of large-scale data workflows: ● Leveraging the modern kernel isolation. (cgroups) ● Containerization !Virtualization (lmctfy - Docker) ● Most (>80) jobs are batch jobs, but the majority of resources(55-80%) are allocated to service jobs. ● Mixed workloads, multi-tenancy ● Relatively high utilization rates ● JVM? Not so much... ● Reality: scheduling batch is simple; – scheduling services is hard/expensive.
  • 11. Refs. ● The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines – http://research.google.com/pubs/pub35290.html ● GAFS Omega John Wilkes – https://www.youtube.com/watch?v=0ZFMlO98Jkc ● Taming Latency Variability and Scaling Deep Learning – http://youtu.be/nK6daeTZGA8
  • 12. Why Mesos? ● Many existing solutions had architectural deficiencies around a constraining model. – Everything is a “job”, good for batch. – Everything was a “service”, good for PaaS. – What happens when you want to write your own distributed application? (no primitives) – What happens when you want to write your own scheduler (elastic service). Reinvent the Square wheel.
  • 13. The New Reality ● New applications need to be: – Fault Tolerant (Withstand failure) – Scalable (Not crumble under it's own weight) – Elastic (Can grow and shrink based on demand) – Multi-tenent (It can't have it's own dedicated cluster) ● Must play nice with the other kids. ● So what does that really mean?
  • 14. Distributed Applications ● “There's Just No Getting Around It: You're Building a Distributed System” Mark Cavage – queue.acm.org/detail.cfm?id=2482856 ● Key takeaways on architecture: – Decompose the business applications into discrete services on the boundaries of fault domains, scaling, and data workload. – Make as many things as possible stateless – When dealing with state, deeply understand CAP, latency, throughput, and durability requirements. “Without practical experience working on successful—and failed—systems, most engineers take a "hopefully it works" approach and attempt to string together off-the-shelf software, whether open source or commercial, and often are unsuccessful at building a resilient, performant system. In reality, building a distributed system requires a methodical approach to requirements along the boundaries of failure domains, latency, throughput, durability, consistency, and desired SLAs for the business application at all aspects of the application.”
  • 15. Google Refs ● The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines http://research.google.com/pubs/pub35290.html ● 2011 GAFS Omega John Wilkes: http://youtu.be/0ZFMlO98Jkc ● Omega: flexible, scalable schedulers for large compute clusters http://eurosys2013.tudos.org/wp-content/uploads/2013/paper /Schwarzkopf.pdf ● Taming Latency Variability and Scaling Deep Learning https://plus.google.com/u/0/+ResearchatGoogle/posts/C1dPh QhcDRv Emerging at Berkeley
  • 17. Prior Practice: Dedicated Servers • low utilization rates • longer time to ramp up new services DATACENTER
  • 18. Prior Practice: Virtualization DATACENTER PROVISIONED VMS • even more machines to manage • substantial performance decrease due to virtualization • VM licensing costs
  • 19. Prior Practice: Static Partitioning STATIC PARTITIONING • even more machines to manage • substantial performance decrease due to virtualization • VM licensing costs • failures make static partitioning more complex to manage DATACENTER
  • 20. What are the costs of Single Tenancy? 0% 25% 50% 75% 100% RAILS CPU LOAD MEMCACHED CPU LOAD 0% 25% 50% 75% 100% HADOOP CPU LOAD 0% 25% 50% 75% 100% tt 0% 25% 50% 75% 100% Rails Memcached Hadoop COMBINED CPU LOAD (RAILS, MEMCACHED, HADOOP)
  • 21. MESOS Mesos: One Large Pool of Resources “We wanted people to be able to program for the datacenter just like they program for their laptop." Ben Hindman DATACENTER
  • 22. Kernel Apps servicesbatch Frameworks Python JVM C ++ Workloads distributed file system Chronos DFS distributed resources: CPU, RAM, I/O, FS, rack locality, etc. Cluster Storm Kafka JBoss Django RailsImpalaScalding Marathon SparkHadoopMPI MySQL Mesos – architecture
  • 23. Google Refs ● The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines http://research.google.com/pubs/pub35290.html ● 2011 GAFS Omega John Wilkes: http://youtu.be/0ZFMlO98Jkc ● Omega: flexible, scalable schedulers for large compute clusters http://eurosys2013.tudos.org/wp-content/uploads/2013/paper /Schwarzkopf.pdf ● Taming Latency Variability and Scaling Deep Learning https://plus.google.com/u/0/+ResearchatGoogle/posts/C1dPh QhcDRv Use Cases
  • 24. Case Study: Twitter (bare metal / on premise) “Mesos is the cornerstone of our elastic compute infrastructure – it’s how we build all our new services and is critical for Twitter’s continued success at scale. It's one of the primary keys to our data center efficiency." Chris Fry, SVP Engineering blog.twitter.com/2013/mesos-graduates-from-apache-incubation wired.com/gadgetlab/2013/11/qa-with-chris-fry/ • key services run in production: analytics, typeahead, ads • Twitter engineers rely on Mesos to build all new services • instead of thinking about static machines, engineers think about resources like CPU, memory and disk • allows services to scale and leverage a shared pool of servers across datacenters efficiently • reduces the time between prototyping and launching
  • 25. Case Study: Airbnb (fungible cloud infrastructure) “We think we might be pushing data science in the field of travel more so than anyone has ever done before… a smaller number of engineers can have higher impact through automation on Mesos." Mike Curtis, VP Engineering gigaom.com/2013/07/29/airbnb-is-engineering-itself-into-a-data... • improves resource management and efficiency • helps advance engineering strategy of building small teams that can move fast • key to letting engineers make the most of AWS-based infrastructure beyond just Hadoop • allowed company to migrate off Elastic MapReduce • enables use of Hadoop along with Chronos, Spark, Storm, etc.
  • 26. Case Study: HubSpot (cluster management) Tom Petr youtu.be/ROn14csiikw • 500 deployable objects; 100 deploys/day to production; 90 engineers; 3 devops on Mesos cluster • “Our QA cluster is now a fixed $10K/month — that used to fluctuate”
  • 27. Dock-ah, dock-ah, dock-ah "If a Docker application is a Lego brick, Kubernetes would be like a kit for building the Millennium Falcon and the Mesos cluster would be like a whole Star Wars universe made of Legos." ~ Solomon ● Mesos used to have plugable add-ons for docker, and now they are 1st classed into the core. – Lots of churn between 0.20-0.21 on feedback from the field. ● It enables programmatic scheduling of containers from batch (cronos), to service orchestration (marathon & Aurora). ● Common deployment is Marathon + HA-Proxy + Docker running from O(10^1) - O(10^3) with mesos clusters O(10^4). ● Some users are exploring customized elastic batch. ● Custom frameworks are quite a large space, we only see a tip of the iceburg.
  • 28. Kubernetes+Mesos = Awesome ● Kubernetes provides robust declarative primitives for distributed micro-services. ● Mesos provides an imperative framework by which application developers can define scheduling policy in a programmatic fashion. ● When leveraged together it provides a data- center with the ability to both. – Both batch and service.
  • 29. Kubernetes+Mesos cont. ● Launching pods – Implement Kube-scheduler API – Pick a Pod (FCFS), match it to an offer. – Launch it! – Kubelet as Executor+Containerizer ● Pod Labels: for Service Discovery + Load Balancing ● Running multi-node on GCE ● Replication Control ● Use resource shapes to schedule pods ● Even smarter scheduling
  • 31. Coming – “Soon-ish” ● Primitives to support Stateful Services (HDFS, Cassandra, etc.) – Persistent disk data. – Lazy resource reservation – Expanded Disk Isolation ● I/O ● FS/Mount namespace ● Support multiple spindles – Dynamic slave attributes.
  • 32. “Soon-ish” - part 2 ● Opportunistic Offers – Free-for-all like Omega. – Requires Framework API modifications – Increased utilization at cloud-scale.
  • 33. Google Refs ● The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines http://research.google.com/pubs/pub35290.html ● 2011 GAFS Omega John Wilkes: http://youtu.be/0ZFMlO98Jkc ● Omega: flexible, scalable schedulers for large compute clusters http://eurosys2013.tudos.org/wp-content/uploads/2013/paper /Schwarzkopf.pdf ● Taming Latency Variability and Scaling Deep Learning https://plus.google.com/u/0/+ResearchatGoogle/posts/C1dPh QhcDRv Thank You! mesos.apache.org @shakamunyi