Dr. Bernd Mathiske

Senior Software Architect

Mesosphere

Why the Datacenter Needs an Operating System

Bringing Google-Scale Computing to Everybody
A Slice of Google Tech Transfer History
2005: MapReduce -> Hadoop (Yahoo)
2007: Linux cgroups for lightweight isolation (Google)
2009: BigTable -> MongoDB
2009: “The Datacenter as a Computer” - Barroso, Hölzle (Google)



2009: Mesos - a distributed operating system kernel (UC Berkeley)
2010: Large scale production Mesos deployment (Twitter)
since 2010: Many more frameworks and quite a few meta-frameworks

Notable Operating System Developments
Single-something => multi-something: user, tasking, threading, core, …

More: bits, memory, storage, bandwidth…

OS virtualization => lightweight virtualization (cgroups, LXCs, jails, …)

Packaging => containers (docker, rkt, lmctfy, …)

Static libraries => dynamic libraries => static libraries

Cluster Operating Systems (Hardware Clustering)
Researched since the 1980s

Trying to provide (the illusion of) a single system image

Aiming at HA, load balancing, location transparency (e.g. for storage)

Many systems: Amoeba, ChorusOS, GLUnix, Hurricane, MOSIX, Plan9, RHCS,
Spring, Sprite, Sumo, QNX, Solaris MC, UnixWare, VAXclusters, …



Relatively low scale (up to 100s of nodes) 

Complicated to manage, less dynamic than software clustering

From HPC Grid to Enterprise Cloud
Condor, LSF, Maui, Moab, Quartz, SLURM, …

Typically for batch jobs

Also cover services => SOA => more job schedulers

=> grid computing => grid middleware … => cloud stacks

From Server Virtualization to App Aggregation
Client-Server Era: small apps, big servers => server virtualization
Cloud Era: big apps, small servers => app aggregation
(Diagram: server virtualization places several apps on one server; app aggregation spreads one app across several servers.)
Cloud Computing
SaaS: Salesforce demonstrated success, then many followed

PaaS: Deis, Dotcloud, OpenShift, Heroku, Pivotal, Stackato, …

IaaS: AWS, Azure, DigitalOcean, GCE…

Private cloud stacks including IaaS: Eucalyptus, CloudStack,
Joyent, OpenStack, SmartCloud, vSphere, …

Datacenter
✴ A facility used to house computer systems and associated
components (e.g. networking, storage, cooling, sensors)

✴ In this talk we focus on how to manage and use a single
production cluster of networked computers in a datacenter

✴ Such clusters range in size from 10s to 10000s of nodes

✴ Why should we and how can we end up with just one production cluster?
Datacenter Services
✴ LAMP (Linux, Apache, MySQL, PHP) or similar

✴ MEAN (MongoDB, Express.js, Angular.js, Node.js) or similar

✴ Cassandra, ElasticSearch, Exelixi, Hadoop, Hypertable, Jenkins,
Kafka, MPI, Spark, Storm, SSSP, Torque, …

✴ Private PaaS: Deis, …

✴ …
Operate your Laptop like your Datacenter?
From Static Partitioning to Elastic Sharing
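(Diagram: two utilization charts, each scaled to 100%. Under static partitioning, WEB, CACHE, and HADOOP each own a fixed share of the cluster and the unused part of every share is WASTED. Under elastic sharing, WEB, CACHE, and HADOOP draw from one pool and the unused capacity stays FREE.)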
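To put rough numbers on the difference, here is a minimal sketch in Java. The demand figures are invented for illustration and do not come from the talk; the class and method names are hypothetical. It assumes that static partitioning sizes each partition for its own peak, while elastic sharing sizes one pool for the peak of the combined demand.

```java
import java.util.Arrays;

// Illustrative arithmetic only; the workloads and numbers are made up.
class PartitioningSketch {

    /** Static partitioning: every workload gets its own partition, sized for its own peak. */
    static int staticCapacity(int[][] demand) {
        int total = 0;
        for (int[] workload : demand) {
            total += Arrays.stream(workload).max().orElse(0);
        }
        return total;
    }

    /** Elastic sharing: one shared pool, sized for the peak of the summed demand. */
    static int elasticCapacity(int[][] demand) {
        int slots = demand.length == 0 ? 0 : demand[0].length;
        int peak = 0;
        for (int t = 0; t < slots; t++) {
            int sum = 0;
            for (int[] workload : demand) {
                sum += workload[t];
            }
            peak = Math.max(peak, sum);
        }
        return peak;
    }

    public static void main(String[] args) {
        // Rows: web, cache, hadoop. Columns: nodes needed in three time slots.
        int[][] demand = {
            {8, 2, 4},
            {2, 3, 2},
            {2, 8, 4},
        };
        System.out.println("static partitioning: " + staticCapacity(demand) + " nodes");
        System.out.println("elastic sharing:     " + elasticCapacity(demand) + " nodes");
    }
}
```

With these made-up figures the static layout needs 19 nodes and the shared pool needs 13, because the workloads do not peak at the same time.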
Software Clustering
Layer between node OS and
application frameworks

Scale
Multi-tenancy
High availability
Available Open Source Components
✴ 2-level scheduler: Apache Mesos

✴ Meta-frameworks / schedulers: Aurora, Chronos, Marathon,
Kubernetes, Swarm, …

✴ Service discovery: Consul, HAProxy, Mesos DNS, …

✴ Highly available configuration: zk, etcd, …

✴ Storage: HDFS, Ceph, …

✴ Node OSs: lots of Linux variants

✴ Lots of app frameworks: Spark, Storm, Cassandra, Kafka, …
2-Level Scheduling
Scale: from 1 node to at least 10000s of nodes

Optimizing resource management

End-to-end principle: “application-specific functions ought to reside
in the end nodes of a network rather than intermediary nodes”

-> Requirement for general multi-tenancy

-> Requirement for having only one production cluster
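The division of labour in 2-level scheduling can be sketched as a toy simulation. This is not the Mesos API: every type and method name below is hypothetical, resources are reduced to whole CPUs, and the allocator simply offers each node to the frameworks in list order. Level 1 decides who sees an offer; level 2 (the framework) decides what to do with it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of 2-level scheduling; all names here are hypothetical, not Mesos types.
class TwoLevelSchedulingSketch {

    /** Level 2: a framework decides for itself how much of an offer to use. */
    interface Framework {
        String name();
        /** Returns the CPUs taken from the offer; 0 means the offer is declined. */
        int consider(String node, int offeredCpus);
    }

    /** A framework with a fixed backlog of equally sized tasks. */
    static final class BatchFramework implements Framework {
        private final String name;
        private final int cpusPerTask;
        private int pendingTasks;

        BatchFramework(String name, int cpusPerTask, int pendingTasks) {
            this.name = name;
            this.cpusPerTask = cpusPerTask;
            this.pendingTasks = pendingTasks;
        }

        public String name() { return name; }

        public int consider(String node, int offeredCpus) {
            int fit = offeredCpus / cpusPerTask;        // tasks that fit into the offer
            int launch = Math.min(fit, pendingTasks);   // never launch more than needed
            pendingTasks -= launch;
            return launch * cpusPerTask;
        }
    }

    /** Level 1: offer each node's free CPUs to the frameworks in turn. */
    static List<String> allocate(String[] nodes, int cpusPerNode, List<Framework> frameworks) {
        List<String> log = new ArrayList<>();
        for (String node : nodes) {
            int free = cpusPerNode;
            for (Framework fw : frameworks) {
                if (free == 0) break;
                int used = fw.consider(node, free);
                if (used > 0) {
                    free -= used;
                    log.add(fw.name() + " uses " + used + " CPUs on " + node);
                }
            }
        }
        return log;
    }

    public static void main(String[] args) {
        List<Framework> frameworks = new ArrayList<>();
        frameworks.add(new BatchFramework("web", 1, 2));
        frameworks.add(new BatchFramework("hadoop", 2, 3));
        for (String line : allocate(new String[] {"n1", "n2"}, 4, frameworks)) {
            System.out.println(line);
        }
    }
}
```

The allocator never needs to know what a web task or a Hadoop task is; it only tracks free resources. That separation is what lets unrelated frameworks share one cluster.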
How Mesos Works
(Diagram: architecture overview with the labels App, Framework, Scheduler, Master (four instances), Slave, Executor, Task, and zk/etcd.)
Ways to Run an Application
1. Vanilla job

• Employ meta-framework for invocation: Chronos, Aurora, Kubernetes, …

2. Application of an adapted framework

• Hadoop, Spark, Storm, ElasticSearch, Cassandra, Kafka, many more…

3. Non-adapted services

• Employ meta-framework for invocation: Marathon, Aurora, Kubernetes, …

• Provide (select) a service discovery solution

4. Program your own scheduler (and executor)

The Mesos Framework API
✴ Currently like internal Mesos communication:

• protobuf messages over HTTP

✴ Soon:

• JSON messages over HTTP (stream)

=> no need to link with binary Mesos library and/or less to reimplement

ca. a dozen programming languages => any language
How to implement a framework
✴ Scheduler interface: 1 half of 2-level scheduling

• The framework knows best when to do what with what kind of resources

• About a dozen callbacks, main functionality in 2 of them:

- receive resource offers

- receive task status updates

✴ Executor interface: task life-cycle management and monitoring

• Command line executor included in Mesos

• Docker executor included in Mesos

• Custom executors often not needed
Scheduler SPI (implemented by Framework)
public interface Scheduler {
    void registered(SchedulerDriver driver, FrameworkID frameworkId,
                    MasterInfo masterInfo);
    void reregistered(SchedulerDriver driver, MasterInfo masterInfo);
    void resourceOffers(SchedulerDriver driver, List<Offer> offers);
    void offerRescinded(SchedulerDriver driver, OfferID offerId);
    void statusUpdate(SchedulerDriver driver, TaskStatus status);
    void frameworkMessage(SchedulerDriver driver, ExecutorID executorId,
                          SlaveID slaveId, byte[] data);
    void disconnected(SchedulerDriver driver);
    void slaveLost(SchedulerDriver driver, SlaveID slaveId);
    void executorLost(SchedulerDriver driver, ExecutorID executorId,
                      SlaveID slaveId, int status);
    void error(SchedulerDriver driver, String message);
}
Minimal Scheduler Implementation
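import java.util.List;
import org.apache.mesos.Protos.Offer;
import org.apache.mesos.Protos.TaskInfo;
import org.apache.mesos.Protos.TaskStatus;
import org.apache.mesos.Scheduler;
import org.apache.mesos.SchedulerDriver;

class MyFrameworkScheduler implements Scheduler {
    …
    private TaskGenerator _taskGen;

    // Called by Mesos whenever resources are offered to this framework.
    public void resourceOffers(SchedulerDriver driver, List<Offer> offers) {
        if (_taskGen.doneCreatingTasks()) {
            // Nothing left to launch: hand every offer back.
            for (Offer offer : offers) {
                driver.declineOffer(offer.getId());
            }
        } else {
            // Let the task generator decide what to launch on each offer.
            for (Offer offer : offers) {
                List<TaskInfo> taskInfos = _taskGen.generateTaskInfos(offer);
                driver.launchTasks(offer.getId(), taskInfos, _filters);
            }
        }
    }

    // Called by Mesos whenever the state of a task changes.
    public void statusUpdate(SchedulerDriver driver, TaskStatus status) {
        _taskGen.observeTaskStatusUpdate(status);
        if (_taskGen.done()) {
            driver.stop();
        }
    }
    …
}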
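The slide leaves `TaskGenerator` undefined. As a rough idea of the bookkeeping that `_taskGen` might do, here is a stand-alone sketch. It is an assumption, not the speaker's code: the class name is hypothetical, it uses plain strings and its own enum instead of the Mesos protobuf types so that it runs without the Mesos library, and it treats every non-running state as terminal without retrying failed tasks.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, loosely modelled on the slide's TaskGenerator.
// It deliberately avoids the Mesos protobuf types so it can run stand-alone.
class TaskBookkeepingSketch {

    // Names follow Mesos task states; only a subset is modelled here.
    enum State { TASK_RUNNING, TASK_FINISHED, TASK_FAILED, TASK_LOST }

    private final int totalTasks;   // how many tasks the framework wants to run
    private int created = 0;        // tasks handed out for launch so far
    private final Map<String, State> lastState = new HashMap<>();

    TaskBookkeepingSketch(int totalTasks) {
        this.totalTasks = totalTasks;
    }

    /** True once every task has been handed out; further offers can be declined. */
    boolean doneCreatingTasks() {
        return created >= totalTasks;
    }

    /** Hands out the next task id, or null if no tasks are left to create. */
    String nextTaskId() {
        if (doneCreatingTasks()) {
            return null;
        }
        created++;
        return "task-" + created;
    }

    /** Remembers the most recent state reported for a task. */
    void observeTaskStatusUpdate(String taskId, State state) {
        lastState.put(taskId, state);
    }

    /** True once every task has reached a non-running (terminal) state. */
    boolean done() {
        long terminal = lastState.values().stream()
                .filter(s -> s != State.TASK_RUNNING)
                .count();
        return terminal == totalTasks;
    }

    public static void main(String[] args) {
        TaskBookkeepingSketch gen = new TaskBookkeepingSketch(2);
        String first = gen.nextTaskId();
        String second = gen.nextTaskId();
        gen.observeTaskStatusUpdate(first, State.TASK_RUNNING);
        System.out.println("done after first update? " + gen.done());
        gen.observeTaskStatusUpdate(first, State.TASK_FINISHED);
        gen.observeTaskStatusUpdate(second, State.TASK_FAILED);
        System.out.println("done after all updates?  " + gen.done());
    }
}
```

A real framework would usually relaunch failed or lost tasks instead of counting them as done; that policy is exactly the kind of application-specific decision the scheduler callbacks leave to the framework.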
The Developer’s Perspective
✴ Focus on application logic, not datacenter structure

✴ Avoid networking-related code

✴ Reuse of built-in fault-tolerance and high availability

✴ Reuse distributed (infrastructure) frameworks (e.g., storage)

=> API, SDK for datacenter services
The Operations Engineer’s Perspective
✴ Ease of deployment/management

✴ Uniformity of deployment/management

✴ Hardware utilization rate

✴ Scaling up as business grows

✴ Scaling out sporadically 

✴ Cost and time for moving to a different datacenter

✴ High availability and fault-tolerance of system services

✴ Monitoring

✴ Troubleshooting
Necessary Multi-Tenancy Features
Task containerization

Resource isolation

Resource and task attributes

Static and dynamic resource reservations

Reservation levels

Meta-frameworks

Dynamic scheduler update and reconfiguration

Security

Desirable Multi-Tenancy Features
Optimistic offers

Oversubscription

Task preemption, migration, resizing, reconfiguration

Rate limiting

Auto-scaling => hybrid cloud

Infrastructure frameworks

Using Docker Containers in Mesos
Mesos Master Server
init
|
+ mesos-master
|
+ marathon

Mesos Slave Server
init
|
+ docker
| |
| + lxc
| |
| + (user task, under container init system)
|
+ mesos-slave
| |
| + /var/lib/mesos/executors/docker
| | |
| | + docker run …

When a user requests a container, Mesos, LXC, and Docker are tied together for launch.
(Diagram: the master server, the slave server, and a Docker Registry, with the launch steps numbered 1 to 8.)
Other Schedulers as Meta-Frameworks in a 2-level Scheduler

YARN => https://github.com/mesos/myriad

Kubernetes => https://github.com/mesosphere/kubernetes-mesos

Swarm => Swarm on Mesos (new project)

=> run everything in one cluster

Myriad: Virtual YARN Clusters on Mesos
	 ◦	 POST /api/clusters: Registers a new YARN cluster

	 ◦	 GET /api/clusters: Lists all registered clusters

	 ◦	 GET /api/clusters/{clusterId}: Lists the cluster with {clusterId}

	 ◦	 PUT /api/clusters/{clusterId}/flexup: Expands the size of cluster with {clusterId}

	 ◦	 PUT /api/clusters/{clusterId}/flexdown: Shrinks the size of cluster with {clusterId}

	 ◦	 DELETE /api/clusters/{clusterId}: Unregisters YARN cluster with {clusterId}. Also, kills all the nodes.
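The endpoints listed above could be addressed from Java roughly as follows. This is a sketch under assumptions: the base URL is a placeholder, the slide does not describe request bodies, so the PUT is built with an empty body, and the class and method names are hypothetical. It only constructs the requests with the JDK's `java.net.http` API (Java 11 or later) and sends nothing.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds (but does not send) requests for the endpoints listed on the slide.
class MyriadRequestsSketch {

    // Placeholder host: point this at wherever the Myriad scheduler is reachable.
    static final String BASE = "http://myriad.example.com";

    /** GET /api/clusters: list all registered clusters. */
    static HttpRequest listClusters() {
        return HttpRequest.newBuilder(URI.create(BASE + "/api/clusters"))
                .GET()
                .build();
    }

    /** PUT /api/clusters/{clusterId}/flexup: expand the cluster (empty body is an assumption). */
    static HttpRequest flexUp(String clusterId) {
        return HttpRequest.newBuilder(URI.create(BASE + "/api/clusters/" + clusterId + "/flexup"))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** DELETE /api/clusters/{clusterId}: unregister the cluster and kill its nodes. */
    static HttpRequest unregister(String clusterId) {
        return HttpRequest.newBuilder(URI.create(BASE + "/api/clusters/" + clusterId))
                .DELETE()
                .build();
    }

    public static void main(String[] args) {
        for (HttpRequest request : new HttpRequest[] {
                listClusters(), flexUp("c1"), unregister("c1")}) {
            System.out.println(request.method() + " " + request.uri());
        }
    }
}
```

To actually issue a request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.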
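(Diagram: a Mesos master and a Mesos slave node, the YARN ResourceManager (RM) with the Myriad Scheduler, the Myriad Executor, and a YARN NodeManager (NM). Step 1, "Launch NodeManager", follows a flexUp request. The resource labels read 2.5 CPU / 2.5 GB and 2.0 CPU / 2.0 GB, shown with the containers C1 and C2.)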
Kubernetes in Mesos
Portability
(Diagram: Mesos runs on a public cloud, a managed cloud, or your own DC; on top of it sit framework apps, meta-frameworks, vanilla apps, and infrastructure frameworks.)
The Application User’s Perspective
✴ Focus on apps, services, parameters, results

✴ Avoid dealing with datacenter operations/management

✴ Avoid adjusting system settings

✴ High availability

✴ Throughput

✴ Responsiveness

✴ Predictability

✴ Run everything I need

✴ Return on and safety of investment
The Datacenter is the new form factor
✴ 2-level scheduler => single production cluster

✴ scalability and portability => avoiding hardware/cloud lock-in

✴ built-in container support => running containers at scale

✴ automation => operator efficiency

✴ repositories => apps/services readily available

✴ API and SDK => productive/quick app/service development
Above the Clouds with Open Source!

OSDC 2015: Bernd Mathiske | Why the Datacenter Needs an Operating System
