Container and Kubernetes without limits

CONTAINER AND KUBERNETES
WITHOUT LIMITS
ANTJE BARTH
Advanced Spark and TensorFlow Meetup
O'Reilly AI Conference, London
October 9th, 2018

© 2018 MapR Technologies, Inc. // MapR Confidential
Learn how a MODERN DATA PLATFORM can help to support
stateful applications in large containerized environments,
and how to handle persistent data
across multiple data centers or geographic locations.
#Kubernetes4Data
Today's Session

Who
ANTJE BARTH
Partner Engineer, MapR
abarth@mapr.com
antje-barth-413258bb
@anbarth
Chapter Lead Duesseldorf (Germany)
https://www.meetup.com/Women-in-Big-Data-Dusseldorf/
BIG DATA
ML/AI
CONTAINER
K8S
ADV. ANALYTICS

Agenda
QUICK INTRO / RECAP
CONTAINERS
• Architectural concepts
• Container challenges
CONTAINER ORCHESTRATION
• Kubernetes
• Challenges for stateful applications
MODERN DATA PLATFORM
• Data Persistence across data centers / geographic regions
• #Kubernetes4Data
AI BONUS TRACK - Kubernetes plays Cupid for Data Scientists and IT

Virtual Machines are Computers
in a Box
Containers are Applications
in a Box

VM vs Container
VM stack: hardware → os → hypervisor → 2 × vm (os, libs, app)
Container stack: hardware → os → 3 × container (libs, app)

Pets vs Cattle
- long lived
- name them
- care for them
- ephemeral
- brand them with #’s
- well.. vets are expensive

Containers
• Are lightweight
• Are stateless
• Are portable
• Targeted for developing applications
• Surely moving towards production
• Docker made it popular
… and added a whole lot of jargon for us to learn! :-)

But…
Containers have a problem

Challenges in using / deploying containers
Source: CNCF Survey, 2018.
https://www.cncf.io/blog/2018/08/29/cncf-survey-use-of-cloud-native-technologies-in-production-has-grown-over-200-percent/
• Cultural Changes with
Development Team
• Complexity
• Lack of Training
• Security
• Monitoring
• Storage
• Networking

Some of the things Docker can’t do
• Monitor running containers
• Handle dead containers
• Move containers so utilization improves
• Auto-scale container instances to handle load
• Solve port mapping hell
• …

You can never get away from pets
unless:
• You have an environment to
support cattle
• You handle the problem of
container state

Kubernetes
kubernetes (n.) - Greek word for pilot or helmsman

Now home
at the CNCF!
Large-scale cluster management at Google with Borg, 2015.
https://ai.google/research/pubs/pub43438
Kubernetes started life as
a successor to Google’s
Borg project...
https://www.cncf.io/ https://kubernetes.io/

Kubernetes is an API and agents
The Kubernetes API provides containers with
scheduling, configuration, networking, and
storage
The Kubernetes runtime manages the containers

Magical View of Kubernetes
Kubernetes

App 1
Kubernetes
Kubernetes starts application
containers “somewhere”

App 1 App 3
Kubernetes
Later containers may be started
elsewhere due to “affinities”

App 1 App 2 App 3
Kubernetes
Kubernetes provides super fast
naming via DNS so containers
can find each other
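
The DNS naming mentioned on this slide can be sketched as follows. This is a minimal illustration, assuming the default cluster domain `cluster.local` and the `default` namespace; the Service names `app1` to `app3` are hypothetical stand-ins for the apps in the diagram.

```python
def service_dns_name(service: str, namespace: str = "default",
                     cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name of a Kubernetes Service.

    Kubernetes publishes Services as <service>.<namespace>.svc.<cluster-domain>.
    The defaults here are assumptions; a cluster may use another domain.
    """
    return f"{service}.{namespace}.svc.{cluster_domain}"


# Hypothetical Service names, for illustration only.
for app in ("app1", "app2", "app3"):
    print(service_dns_name(app))
```

A container addresses its peer by this name and never needs to know which machine the peer is running on.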

Note that you don’t think about
which machine at all
No more names from The Hobbit
Just cattle!

Kubernetes – Why is it so popular?
• There are many management software
solutions to create, manage & delete
containers, with newer vendors
emerging every day
• Kubernetes remains the leader with
83% (up from 77%)
• The ecosystem and developer
community augmented by Google’s
support gave Kubernetes the edge over
others
Source: CNCF Survey, 2018.
https://www.cncf.io/blog/2018/08/29/cncf-survey-use-of-cloud-native-technologies-in-production-has-grown-over-200-percent/

Kubernetes – an open, pluggable framework
Source: CNCF landscape, see https://github.com/cncf/landscape and http://l.cncf.io

We still have a problem

State!

Problem with Containers and State
• State in containers messes things up
• Restarts lose the state
• Replicating state makes services complex
• Application developers just aren’t systems developers
• State life-cycle doesn’t match app life-cycle
• …

App 1 App 2 App 3
Kubernetes

App 1 App 2 App 3
Kubernetes
rpc
stream
LogFile

App 1 App 2 App 3
Kubernetes
rpc
stream
LogFile
We need
multiple
forms of
persistence!

Data platform
App 1 App 2 App 3
Kubernetes
rpc

What Does This Data Platform Need to Have?
Global namespace across entire Kubernetes cluster
• Between clusters as well if possible
All three forms of primitive persistence
• Files, streams, tables
Inherently scalable
• Performance, cardinality, locality
Uniform access and control
• Path names for all objects, identical permission scheme

The Data Platform needs
to be like Kubernetes.
For Data.

MapR Data Platform
MAPR DATA PLATFORM
FILES / OBJECTS / TABLES / STREAMS APIs: NFS, POSIX, REST, S3, HDFS, HBASE, JSON, KAFKA
DATA CENTER CLOUD MULTI-CLOUD EDGE KUBERNETES
COMMODITY
SERVER
VIRTUAL
MACHINE
IoT & Edge
AI / ML
ADV. ANALYTICS
ENTERPRISE
APPLICATIONS
Pod Pod Pod Pod

Scale. It distributes data across the cluster and offers a global namespace for a unified view of data
regardless of its physical location
High Availability. Offers configurable levels of replication to ensure data durability. In event of a failure,
all nodes participate to self-heal and reconstruct data automatically
Data Protection. End-to-end security, per-volume access control expressions, space-efficient
snapshots, and volume mirroring offer several choices for building a data protection strategy
Intelligent Data Placement. Offers three different storage tiers with automated storage policies to
place data based on their SLAs
Edge, on-premises, cloud. Can be deployed in on-premises data centers, at the edge, and in the cloud
MapR Data Platform

MapR Volumes
Volumes are logical units of management, holding files, directories, tables, messages.
WHAT CAN YOU DO WITH VOLUMES?
• Schedule snapshots
• Schedule mirrors
• Control data placement
• Access permissions
• Enforce volume quotas
• Manage performance
• Specify replication factor
Volumes:
Shared MapR Cluster
r : user:sally |
(group:research & group:managers)
MAPR ACCESS CONTROL EXPRESSIONS
/mktg /finance /projectx
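
How the boolean expression on this slide grants or denies access can be sketched with a small evaluator. This is an illustrative sketch only, not MapR's implementation: it handles just the `user:` and `group:` terms and the `|`, `&`, `!` operators in the form written on the slide, and the function name `allows` is hypothetical.

```python
import re

# One token: an operator/parenthesis, or a principal such as user:sally.
TOKEN = re.compile(r"\s*([()&|!]|(?:user|group):[\w.-]+)")


def tokenize(expr):
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected input at position {pos}: {expr[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def allows(expr, user, groups):
    """Return True if the expression grants access to this user and group set."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or():  # lowest precedence: a | b
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # always parse the right side, then combine
            value = value or rhs
        return value

    def parse_and():  # a & b
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():  # !a, (a), or a single principal
        tok = take()
        if tok == "!":
            return not parse_not()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if tok is None or tok in ")&|":
            raise ValueError(f"unexpected token: {tok!r}")
        kind, name = tok.split(":", 1)
        return name == user if kind == "user" else name in groups

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return result


READ_ACE = "user:sally | (group:research & group:managers)"
print(allows(READ_ACE, "sally", set()))                   # sally matches by name
print(allows(READ_ACE, "bob", {"research"}))              # only one of two groups
print(allows(READ_ACE, "bob", {"research", "managers"}))  # member of both groups
```

Read access goes to sally directly, or to anyone who belongs to both the research and managers groups.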

✓ Global data view in a single
namespace
✓ Distributed data processing
✓ Unified Security
✓ Global Replication For Data
Distribution & DR
✓ Bandwidth-aware to manage
global data flows
✓ Simplify cross cloud application
development & deployment
Global Namespace – common path to connect to any data
Globally Protected
Globally Accessible
Globally Managed
Globally Replicated
Across Locations Across Clouds
/mapr
/us.mapr.com
/eu_cloud.mapr.com
/asia.mapr.com
/us_cloud.mapr.com
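
The "common path" idea can be sketched as plain path construction. This is a minimal sketch, assuming the global namespace is mounted under `/mapr` as the slide shows; the cluster names come from the slide, while `global_path` and the file `projectx/data.csv` are hypothetical.

```python
from pathlib import PurePosixPath

# Mount point of the global namespace, as drawn on the slide.
MOUNT_ROOT = PurePosixPath("/mapr")


def global_path(cluster: str, path_in_cluster: str) -> PurePosixPath:
    """Compose /mapr/<cluster>/<path>, the same shape for every location."""
    return MOUNT_ROOT / cluster / path_in_cluster.lstrip("/")


clusters = ("us.mapr.com", "eu_cloud.mapr.com", "asia.mapr.com", "us_cloud.mapr.com")
for cluster in clusters:
    # Hypothetical file; only the cluster segment changes between locations.
    print(global_path(cluster, "/projectx/data.csv"))
```

An application that reads through such a path only needs a different cluster segment to reach data in another location.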

Automatically Synchronized Globally Distributed Data
[Diagram: four DATA PLATFORM instances, including EDGE, On-Premises, and S3, sharing three Topics]
Multi-Cloud Data Movement & Application Portability
Enabling Application and Data Portability
#Kubernetes4Data

MapR Persistent Application Client Container (PACC)
• Pre-built, certified container image
for connecting to MapR services
• Secure authentication at
container level, secure connection
• Extensible support for application
layers
• Available in Docker Hub, Dockerfile
for customizability
MapR POSIX Client
for Containers
MapR Converged
Client for
Containers
Space for Customer Application
MapR PACC
MAPR DATA PLATFORM
EVENT DATA
STREAMS
ANALYTICS & ML
ENGINES
OPERATIONAL
DATABASE
CLOUD-SCALE
FILE AND OBJECT
STORE

Containerized Microservices have real-time access to
files/tables/streams
Microservices
Databases/files
Microservices
Databases/files
Microservices
Databases/files
Microservices Microservices Microservices
Microservices Microservices Microservices
MAPR DATA PLATFORM
Stream Stream

MapR Data Fabric for Kubernetes

[Diagram: three Kubernetes nodes, each running two Pods and a MAPR client, on top of the MAPR DATA PLATFORM]
• Integration with Kubernetes APIs,
packaged and run as a POSIX client on
each Kubernetes host
• MapR Volumes are mounted for
containers
• Persist data for containerized
applications
• Scale data and performance as
containers grow
• Highly available by leveraging replicas,
snapshots, mirroring of data
• Benefit from MapR tickets, for end-to-
end security
• Multi-tenant deployment and access

There are two ways to provision a volume:
1. Kubernetes Volume with Static Provisioner
This is used to mount an existing MapR Volume to Containers
managed by Kubernetes.
2. Kubernetes Persistent Volumes with Dynamic Provisioner
This is used to create and mount a new MapR Volume to Containers
managed by Kubernetes.
Kubernetes Integration via Volume Driver Plugin

[Diagram components: pod, kubelet, docker, plugin, mapr, fuse]
Example 1: You have a Postgres container that needs persistent
storage. Plugin mounts MapR path via fuse
Static Provisioning
• KDF volume plugin
• Admin provisions the volume
• Fast, uses POSIX drivers
• Secured with MapR tickets
• MapR cluster can be external
to K8s

KUBERNETES (CLIENT HOST)
YOUR CONTAINER
MAPR VOLUME PLUGIN - POSIX CLIENT
K8S PERSISTENT VOLUME
MAPR DATA PLATFORM
1. Request Volume
2. Mount POSIX Volume
Mounting An Existing MapR Volume (Static Provisioner)

Static provisioning

[Diagram components: pod, kubelet, docker, plugin, mapr, fuse, provisioner, rest]
Example 2: You are testing a new container. You want the
storage in MapR automatically allocated for the container.
Dynamic provisioning
• KDF provisioner
• Uses MapR REST APIs to
allocate/delete MapR
volumes
• Mounting is the same as
static provisioning

KUBERNETES (CLIENT HOST)
YOUR CONTAINER
MAPR VOLUME PLUGIN - POSIX CLIENT
K8S PERSISTENT VOLUME
MAPR DATA PLATFORM
1. Request Volume
5. Mount POSIX Volume
PERSISTENT VOLUME CLAIM
STORAGE CLASS
DYNAMIC PROVISIONER
2. Request Volume
3. Request Volume
Creating A New MapR Volume (Dynamic Provisioner)
The Volume Claim binds the
volume created to the
container(s)
Storage Classes, defined by
administrators, express
the type, size, and other
characteristics that the
volume should have
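
The Storage Class and Persistent Volume Claim pairing can be sketched as two Kubernetes objects, again as Python dictionaries printed as JSON. The object kinds and the claim fields are standard Kubernetes; the provisioner name `mapr.com/maprfs` and the parameter keys are assumptions to verify against the MapR documentation, and all names and values are hypothetical.

```python
import json


def storage_class(name, parameters):
    """StorageClass: the administrator's template for new volumes."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        # ASSUMPTION: provisioner name; verify in the MapR docs.
        "provisioner": "mapr.com/maprfs",
        "parameters": dict(parameters),
    }


def volume_claim(name, storage_class_name, size):
    """PersistentVolumeClaim: the request that triggers dynamic provisioning."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class_name,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical parameter keys and values, for illustration only.
sc = storage_class("mapr-test", {
    "cluster": "my.cluster.com",
    "restServers": "rest1.example.com:8443",
    "namePrefix": "test",
})
pvc = volume_claim("test-claim", "mapr-test", "5Gi")

for obj in (sc, pvc):
    print(json.dumps(obj, indent=2))
```

A Pod then references `test-claim` in its volumes section, and the provisioner creates the backing volume on demand.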

Dynamic
provisioning

Consequences
Installation of the plugin is a K8s-level operation
• No per-node attention required
Use of the plugin is an overlay operation
• No change needed for a container
• Any Helm chart can use the plugin for conventional file access
Can share storage/compute, isolate them, or scale them independently
State is no longer a dirty word for
Kubernetes! :-)

Application
GLOBAL DATA MANAGEMENT
Edge Private Cloud
On Premise
Public Cloud Public Cloud Public Cloud
Pod Pod Pod ML/AI workloads Classic ETL
Scheduling & Scaling
MapR Kubernetes Volume Driver
Containers and Kubernetes without limits
Creating an “Ubernetes” Platform with MapR

AI BONUS TRACK
Kubernetes plays Cupid
for Data Scientists and IT

Data Science Phases
Exploration Training Deployment Production
In this phase, the
executable code that is
used to train models is
developed and some
prototyping is done.
• Typically uses data
science notebooks
• Output is code
The executable training
code is run on very large
datasets.
• Phase where
compute power
matters
• Output is a model
Models are deployed
into a framework that
allows for the scoring
of data.
• Can be done in
batch or real time
• Output is a
microservices
framework
Models are monitored
and updated in
production.
• Requires CI/CD
pipeline capability
• Output is “insights”

Data science workflows benefit from containerization in every phase of the
pipeline from exploration, training, and deploying models to production.
• For Exploration: containerization enables isolated personalized development
environments
• For Training: containerization provides compute agility and the ability to
iterate with varying parameters
• For Deployment: containerization provides the ability to create a robust
microservices architecture
Containerization is good for Machine Learning
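
The "iterate with varying parameters" point for the training phase can be sketched as one Kubernetes Job per hyperparameter combination. This is a minimal sketch: `batch/v1` Jobs are standard Kubernetes, while the image name, the parameter names, and the `--key=value` argument convention are hypothetical.

```python
import itertools
import json


def training_jobs(image, grid):
    """Return one Job manifest per combination of values in the grid."""
    keys = sorted(grid)  # fixed key order keeps job numbering reproducible
    jobs = []
    for index, values in enumerate(itertools.product(*(grid[k] for k in keys))):
        args = [f"--{key}={value}" for key, value in zip(keys, values)]
        jobs.append({
            "apiVersion": "batch/v1",
            "kind": "Job",
            "metadata": {"name": f"train-{index}"},
            "spec": {
                "template": {
                    "spec": {
                        "restartPolicy": "Never",
                        "containers": [{
                            "name": "train",
                            "image": image,
                            "args": args,
                        }],
                    }
                }
            },
        })
    return jobs


# Hypothetical image and hyperparameters, for illustration only.
jobs = training_jobs("example/train:latest", {"lr": [0.1, 0.01], "batch": [32, 64]})
print(len(jobs), "jobs")
print(json.dumps(jobs[0], indent=2))
```

Each Job runs in its own container, so the runs are isolated from one another and can be scheduled wherever the cluster has capacity.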

Everything on One Cluster
ON-PREMISES, MULTI-CLOUD, IoT EDGE
COMMODITY
SERVER
VIRTUAL
MACHINE
IoT & Edge
MAPR DATA PLATFORM
APIs: NFS, POSIX, REST, S3, HDFS, HBASE, JSON, KAFKA
Accessing Data In-Place
/f1
MAPR DATA PLATFORM
HDFS API
MAPR POSIX
CLIENT
MAPR CLIENTS
FOR CONTAINERS
MapR Makes Doing Data Science Easier

An Open Approach to Tooling
• Pre-built, certified container images connect
to MapR platform services
• Customizable using Volume Plugin and Dockerfile
to support any POSIX-compliant library or tool
• Provides a unified security model, enabling
secure connection between container and cluster
• High I/O throughput data connection to storage
layer with POSIX client
• Enables seamless multi-tenancy and job isolation MAPR DATA PLATFORM
Model A Model B
…..
For Tools
MAPR KUBERNETES VOLUME DRIVER
MAPR CLIENT
FOR
CONTAINERS
MAPR CLIENT
FOR
CONTAINERS
For Algorithms For Architectures

Kubernetes Namespace
• Used to manage and isolate cluster
resources
• Provides a multi-tenant architecture for
jobs, pods and deployments
Storage Namespace
• Can join data across architectural or
geographical divides
• Read/Write access to any dataset the user
has access to as if it were a local resource
• Data security and isolation at the user,
team, and tenant level
Kubernetes and Storage Namespaces
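
On the Kubernetes side, per-tenant isolation can be sketched as a Namespace plus a ResourceQuota. Both are standard Kubernetes objects; the tenant name and the limits are hypothetical, and the storage-side namespace is not modeled here.

```python
import json


def tenant_namespace(tenant, cpu, memory):
    """Return a Namespace and a ResourceQuota that caps the tenant's requests."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": tenant},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{tenant}-quota", "namespace": tenant},
        "spec": {"hard": {"requests.cpu": cpu, "requests.memory": memory}},
    }
    return [namespace, quota]


# Hypothetical tenant and limits, for illustration only.
objects = tenant_namespace("team-a", cpu="8", memory="32Gi")
for obj in objects:
    print(json.dumps(obj, indent=2))
```

Jobs, pods, and deployments created in `team-a` then count against that tenant's quota only.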

End to End Machine Learning on ALL of your Data
MAPR DATA PLATFORM
Exploration Training Deployment
A
B
Leverage MapR to deploy and run data science workflow end-to-end using your favorite tools

End to End Machine Learning on ALL of your Data
Leverage MapR to deploy and run data science workflow end-to-end using your favorite tools
MAPR DATA PLATFORM
Exploration Training Deployment
A
B
?

COMPUTE AGILITY
MAPR DATA PLATFORM
Containers & Kubernetes without limits!
MAPR KUBERNETES VOLUME PLUGIN
TENANT N
Application…..
TENANT 1
Application APP AGILITY
DATA AGILITY
DATA CENTER CLOUD MULTI-CLOUD KUBERNETES EDGE ACROSS
INFRASTRUCTURES
ENTERPRISE
APPLICATIONS
AI AND ML
ADV. ANALYTICS
Stateful app
container
MAPR POSIX
CLIENT FOR
CONTAINERS
Application

MapR The Leading Data Platform for AI and Analytics
https://mapr.com/solutions/ai-analytics/
Blog: Containers, Kubernetes, and MapR: The Time is Now
https://mapr.com/blog/containers-kubernetes-and-mapr-the-time-is-now/
https://mapr.com/solutions/data-fabric/kubernetes/
MapR Data Fabric for Kubernetes - Documentation
https://mapr.com/docs/60/PersistentStorage/kdf_overview.html
MapR Data Platform

O’Reilly (e)books!
Download the e-book
here:
https://mapr.com/ebook/
machine-learning-logistics/
by Ted Dunning and
Ellen Friedman
Just released at Strata
New York, Sept 2018
Download the e-book
here:
https://mapr.com/ebook/
ai-and-analytics-in-
production/

THANK YOU!
#MapR
#Kubernetes4Data
