DOING BIG DATA FOR
REAL WITH DOCKER
MESOSPHERE DCOS
Elizabeth Lingg
elizabeth@mesosphere.io
AGENDA
1. Intro
2. Mesosphere, Docker, and DCOS Overview
3. Big Data Container Orchestration using DCOS and Docker
4. Demo
5. Q & A
INTRO
Engineering Manager @ Mesosphere
M.S. Computer Science with a Specialization in Artificial
Intelligence from Stanford
B.S. Computer Science with a Minor in Math, B.S. Policy
and Management from Carnegie Mellon
Experience in AI, Big Data, and Systems
Enjoys applying Distributed Systems to Manage and
Reason Over Large Amounts of Data
MESOS
Provides primitives to author datacenter-native apps.
PRIMITIVES
Resources (cpu, mem, disk, ports)
Asset fetching
Task state tracking
API for the datacenter
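As a rough illustration of the "task state tracking" primitive, the sketch below folds a stream of status updates into the latest state per task. The state names mirror Mesos task states; the helper `latest_states` and its "ignore updates after a terminal state" rule are assumptions made for the example, not Mesos code.

```python
from enum import Enum


class TaskState(Enum):
    # Names follow Mesos task states; only a subset is listed here.
    TASK_STAGING = "TASK_STAGING"
    TASK_STARTING = "TASK_STARTING"
    TASK_RUNNING = "TASK_RUNNING"
    TASK_FINISHED = "TASK_FINISHED"
    TASK_FAILED = "TASK_FAILED"
    TASK_KILLED = "TASK_KILLED"
    TASK_LOST = "TASK_LOST"


# States after which a task will not run again.
TERMINAL = {
    TaskState.TASK_FINISHED,
    TaskState.TASK_FAILED,
    TaskState.TASK_KILLED,
    TaskState.TASK_LOST,
}


def latest_states(updates):
    """Fold (task_id, state) updates into the latest known state per task.

    Hypothetical helper: updates arriving after a terminal state are ignored.
    """
    latest = {}
    for task_id, state in updates:
        if task_id in latest and latest[task_id] in TERMINAL:
            continue
        latest[task_id] = state
    return latest
```

A framework scheduler would typically use a view like this to decide which tasks need to be relaunched.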
STATUS QUO IS STATIC
PARTITIONING
AND USE OF VIRTUAL MACHINES
MESOS LETS US TREAT A CLUSTER OF
NODES...
AS ONE BIG COMPUTER

Not as individual
machines

Not as VMs
BUT AS COMPUTATIONAL
RESOURCES LIKE CORES, MEMORY,
DISKS, ETC.
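A minimal sketch of the "one big computer" idea: per-node resources are summed into a pool, and a task is placed on any node that can hold it. The `Resources` type and the first-fit placement are assumptions for illustration; they are a simplification and not the allocator Mesos actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpus: float
    mem_mb: int
    disk_mb: int

    def __add__(self, other):
        return Resources(
            self.cpus + other.cpus,
            self.mem_mb + other.mem_mb,
            self.disk_mb + other.disk_mb,
        )

    def fits(self, demand):
        """True if `demand` can be satisfied from these resources."""
        return (
            demand.cpus <= self.cpus
            and demand.mem_mb <= self.mem_mb
            and demand.disk_mb <= self.disk_mb
        )


def pool(nodes):
    """Total capacity of the cluster, viewed as one set of resources."""
    total = Resources(0.0, 0, 0)
    for node in nodes:
        total = total + node
    return total


def first_fit(nodes, demand):
    """Index of the first node that can run the task, or None (illustrative only)."""
    for index, node in enumerate(nodes):
        if node.fits(demand):
            return index
    return None
```

The pooled total is what operators reason about for capacity; an individual task still has to fit on a single node.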
WE LOVE CONTAINERS
MOST MODERN APPLICATIONS ARE A WEB OF
CONTAINERS
A CONTAINER ORCHESTRATION PLATFORM
Containerization in Mesos, a brief history
MESOSPHERE DCOS
Software to provide a complete OS: init, cron, apt-get,
discovery, routing
Beautiful web UI and CLI
Support
Ecosystem of DCOS Services
Mesos Master and Mesos Workers Running in Docker
Containers
DCOS UI
DCOS CLI
$ dcos
Command line utility for the Mesosphere Datacenter Operating
System (DCOS). The Mesosphere DCOS is a distributed operating
system built around Apache Mesos. This utility provides tools
for easy management of a DCOS installation.
Available DCOS commands:
config Get and set DCOS CLI configuration properties
help Display command line usage information
marathon Deploy and manage applications on the DCOS
node Manage DCOS nodes
package Install and manage DCOS software packages
service Manage DCOS services
task Manage DCOS tasks
BIG DATA DISTRIBUTED
APPLICATIONS ON DCOS
Mesos Master and Mesos Workers Running in Docker
Containers
Distributed Applications Running in Containers on the
Mesos Workers
Container Orchestration done by Apache Mesos
Resource Allocation and Scaling Managed by Apache
Mesos
BIG DATA DISTRIBUTED
APPLICATIONS ON DCOS
Popular Distributed Apps easily deployed on a single
DCOS Cluster
Kafka, Cassandra, HDFS, Spark, and other Big Data
Services
Health checks and failure recovery are automated
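For apps deployed through Marathon, automated health checking is usually declared in the app definition. The sketch below adds a `healthChecks` entry to an app dictionary; the field names follow Marathon's app definition format, while the helper name and the specific timing values are assumptions chosen for the example.

```python
import copy


def with_health_check(app, path="/", max_failures=3):
    """Return a copy of a Marathon app definition with an HTTP health check.

    Timing values are illustrative, not recommendations.
    """
    updated = copy.deepcopy(app)
    updated.setdefault("healthChecks", []).append(
        {
            "protocol": "HTTP",
            "path": path,
            "portIndex": 0,  # check the first port assigned to the task
            "gracePeriodSeconds": 300,  # ignore failures while the task starts
            "intervalSeconds": 60,
            "timeoutSeconds": 20,
            "maxConsecutiveFailures": max_failures,  # then the task is replaced
        }
    )
    return updated
```

Once a task exceeds `maxConsecutiveFailures`, Marathon kills it and starts a replacement, which is the automated recovery the slide refers to.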
APPLICATION NETWORKING
Use the CLI or REST APIs to interact with the
services
Mesos DNS Resolution
Docker networking is mainly done through host-mode
networking, which works seamlessly
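With host-mode networking, ports are assigned at launch time, so services are normally located by name. The sketch below builds the names Mesos-DNS conventionally serves, assuming the default `mesos` domain and the `marathon` framework; the helper functions are hypothetical and only format strings, they do not perform lookups.

```python
def mesos_dns_name(task, framework="marathon", domain="mesos"):
    """A-record name for a task, following the task.framework.domain pattern."""
    return f"{task}.{framework}.{domain}"


def mesos_dns_srv_name(task, framework="marathon", protocol="tcp", domain="mesos"):
    """SRV-record name, which also carries the dynamically assigned port."""
    return f"_{task}._{protocol}.{framework}.{domain}"
```

On a cluster node, the A-record name could then be passed to a standard resolver call such as `socket.gethostbyname`.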
DATA SECURITY
Services storing secure data run on private worker nodes
in the cluster
Private nodes can only be accessed through VPN
As needed, services choose what is exposed through a
proxy running on a public node
Distributed applications can authenticate with the Master
using Framework Authentication (Kerberos option)
EXAMPLE: SIMPLE DOCKER APP ON
DCOS
{
  "id": "/mesosphere/cd-demo-app",
  "instances": 1,
  "cpus": 1,
  "mem": 512,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "mesosphere/cd-demo-app:$tag",
      "network": "BRIDGE",
      "portMappings": [
        {
          "servicePort": 28080,
          "containerPort": 80,
          "hostPort": 0,
          "protocol": "tcp"
        }
      ]
    }
  }
}
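The `$tag` in the image name is a placeholder to be filled in before deployment. A minimal sketch of that step, assuming the definition is closed off after the port mapping and submitted to Marathon's `POST /v2/apps` endpoint: the function names and the Marathon URL are hypothetical, and the request is only built here, not sent.

```python
import json
from string import Template
from urllib.request import Request

# App definition from the slide, closed off so that it parses as JSON.
APP_TEMPLATE = Template("""
{
  "id": "/mesosphere/cd-demo-app",
  "instances": 1,
  "cpus": 1,
  "mem": 512,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "mesosphere/cd-demo-app:$tag",
      "network": "BRIDGE",
      "portMappings": [
        {"servicePort": 28080, "containerPort": 80, "hostPort": 0, "protocol": "tcp"}
      ]
    }
  }
}
""")


def render_app(tag):
    """Substitute the image tag and parse the result as JSON."""
    return json.loads(APP_TEMPLATE.substitute(tag=tag))


def build_deploy_request(marathon_url, app):
    """Build (but do not send) the request that would create the app."""
    return Request(
        marathon_url.rstrip("/") + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` would submit it; `hostPort: 0` asks for a dynamically assigned host port that is mapped to container port 80.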
EXAMPLE: CASSANDRA DCOS
SERVICE
FEATURES
Managed node configuration
Health Monitoring
REST API
DNS Names for nodes
Multiple Rings in one cluster
INSTALL
$ dcos package install cassandra
CUSTOMIZABLE INSTALL OPTIONS
{
  "cassandra": {
    "cluster-name": "dev",
    "resources": {
      "cpus": 3.0,
      "mem": 6144,
      "disk": 30720
    }
  }
}
$ dcos package install cassandra --options=options.json
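Mesos expresses memory and disk in megabytes, so the values above correspond to 6 GiB of memory and 30 GiB of disk. A small sketch for generating the options file from friendlier units; the helper names are hypothetical and only the keys shown on the slide are produced.

```python
import json

MB_PER_GIB = 1024


def build_options(cluster_name, cpus, mem_gib, disk_gib):
    """Build the options structure, converting GiB to the MB values expected."""
    return {
        "cassandra": {
            "cluster-name": cluster_name,
            "resources": {
                "cpus": cpus,
                "mem": mem_gib * MB_PER_GIB,
                "disk": disk_gib * MB_PER_GIB,
            },
        }
    }


def write_options(path, options):
    """Write the options as JSON, ready to pass via --options=<path>."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(options, handle, indent=2)
```

Calling `write_options("options.json", build_options("dev", 3.0, 6, 30))` reproduces the file used in the install command.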
INSTALLING
HEALTHY
REST API
GET /node/all
GET /health/cluster/report
POST /node/{node}/replace
POST /cluster/repair/start
POST /scale/nodes?nodeCount={count}
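A sketch of how a client might build requests for a few of the endpoints listed above. The base URL where the service's API is reachable is an assumption (it depends on the cluster), the function names are hypothetical, and nothing is sent; response formats are not shown on the slide and are not assumed here.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def list_nodes(base_url):
    """GET /node/all"""
    return Request(f"{base_url}/node/all", method="GET")


def replace_node(base_url, node):
    """POST /node/{node}/replace, with the node identifier URL-escaped."""
    return Request(f"{base_url}/node/{quote(node, safe='')}/replace", method="POST")


def scale_nodes(base_url, count):
    """POST /scale/nodes?nodeCount={count}"""
    if count < 1:
        raise ValueError("nodeCount must be at least 1")
    query = urlencode({"nodeCount": count})
    return Request(f"{base_url}/scale/nodes?{query}", method="POST")
```

Each returned request can be handed to `urllib.request.urlopen` once the real base URL is known.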
DEMO!
Q & A
THANKS!
LET'S CHAT!
WE'RE HIRING!
DCOS: mesosphere.com
Join: mesosphere.com/careers/
