Container Attached Storage with OpenEBS
@JeffryMolanus

Date: 26/6/2019

https://openebs.io
About me
MayaData and the OpenEBS project
[Diagram: the MayaData platform — DMaaS, analytics, alerting, compliance, policies and an advisory chatbot around a declarative data plane API, running on premises, on Google Cloud or on packet.net]
Resistance Is Futile
• K8s based on the original Google Borg paper 

• Containers are the “unit” of management 

• Mostly web based applications 

• Typically the apps were stateless — if you agree there is such a thing

• In its most simplistic form k8s is a control loop

• Converge to the desired state based on declarative intent provided by the DevOps
persona

• Abstract away underlying compute cluster details and decouple apps from
infrastructure: avoid lock-in

• Have developers focus on application deployment and not worry about the
environment it runs in

• HW independent (commodity)
Borg Schematic
Persistence in Volatile Environments
• Container storage is ephemeral; data is only stored during the lifetime of
the container(s)

• This either means that temporary data has no value or it can be regenerated

• Sharing data between containers is also a challenge — need to persist

• In the case of serverless — the intermediate state between tasks is ephemeral

• The problem then: containers need persistent volumes in order to run stateful
workloads

• While doing so: abstract away the underlying storage details and decouple
the data from the underlying infra: avoid lock-in

• The “bar” has been set in terms of expectations by the cloud providers, e.g. GCE PD, EBS

• Volumes available across multiple DCs and/or regions, and replicated
Data Loss Is Almost Guaranteed
apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
  - image: k8s.gcr.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /test-pd
      name: test-volume
  volumes:
  - name: test-volume
    hostPath:
      # directory location on host
      path: /data
Unless…
Use a “Cloud” Disk
apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
  - image: k8s.gcr.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /test-pd
      name: test-volume
  volumes:
  - name: test-volume
    # This GCE PD must already exist!
    gcePersistentDisk:
      pdName: my-data-disk
      fsType: ext4
Evaluation and Progress
• In both cases we tie ourselves to a particular node — that defeats the agility
found natively in k8s and fails to abstract away the details
• We are cherry-picking pets from our herd
• An anti-pattern — easy to say and hard to avoid in some cases

• The second example allows us to mount (who?) the PV to different nodes
but requires volumes to be created prior to launching the workload

• Good — not great

• More abstraction through community efforts around Persistent Volumes
(PV) and Persistent Volume Claims (PVC) and CSI

• Container Storage Interface (CSI) to handle vendor-specific needs before, for
example, mounting the volume

• Avoid a wildfire of “volume plugins” or “drivers” in the k8s main repo
The PV and PVC
kind: PersistentVolume
apiVersion: v1
metadata:
  name: task-pv-volume
spec:
  storageClassName: manual
  capacity:
    storage: 3Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: task-pv-claim
spec:
  storageClassName: manual
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
---
kind: Pod
apiVersion: v1
metadata:
  name: mypod
spec:
  containers:
  - name: myfrontend
    image: nginx
    volumeMounts:
    - mountPath: "/var/www/html"
      name: mypd
  volumes:
  - name: mypd
    persistentVolumeClaim:
      claimName: task-pv-claim
Summary So Far
• Register a set of “mountable” things to the cluster (PV)

• Take ownership of a “mountable” thing in the cluster (PVC)

• Refer to the PVC in the application

• Dynamic provisioning: create PVs on the fly when a claim asks for something
that does not exist yet (see the sketch after this list)

• Remove the need to preallocate them (is that a good thing?)

• The attaching and detaching of volumes to nodes is standardised by means
of CSI, a gRPC interface that handles the details of creating,
attaching, staging, destroying, etc.

• Vendor-specific implementations are hidden from the users
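Dynamic provisioning and CSI come together in a StorageClass that names a provisioner plus a PVC that references the class. The sketch below is illustrative only — the driver name csi.example.com and the class/claim names are assumptions, not any particular vendor's manifest:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast-csi                     # illustrative class name
provisioner: csi.example.com         # hypothetical CSI driver
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: dynamic-claim
spec:
  storageClassName: fast-csi         # no pre-created PV; one is provisioned on demand
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi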
The Basics — Follow the Workload
[Diagram: a pod and its PVC moving between nodes — the volume follows the workload]
Problem Solved?
• How does a developer configure the PV such that it has exactly the features
that are required for that particular workload? (see the sketch after this list)
• Number of replicas, compression, snapshots and clones (opt in/out)
• How do we abstract away differences between storage vendors when
moving to/from private or public cloud?

• Differences in replication approaches — usually not interchangeable 

• Abstract away access protocol and feature mismatch

• Provide a cloud-native-storage-like “look and feel” on premises?

• Don't throw away our existing million-dollar storage infra

• GKE On-Prem, AWS Outposts — if you are not going to the cloud it will come to
you; resistance is futile 

• Make data as agile as the applications it serves
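With CAS, that per-workload intent can live in the StorageClass itself. A minimal sketch assuming the OpenEBS cStor configuration style of the time — the annotation keys, pool name and provisioner shown here are assumptions, so check the docs of your release:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: openebs-cstor-3replica
  annotations:
    openebs.io/cas-type: cstor
    cas.openebs.io/config: |
      - name: StoragePoolClaim
        value: "cstor-disk-pool"     # hypothetical pool of local disks
      - name: ReplicaCount
        value: "3"                   # replication declared per class/workload
provisioner: openebs.io/provisioner-iscsi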
Data Gravity
• As data grows — it has the tendency to pull applications towards it (gravity)

• Everything will revolve around the sun, and it dominates the planets

• Latency, throughput, IO blender 

• If the sun goes supernova — all your apps circling it will be gone instantly

• Some solutions involve replicating the sun towards some other location in
the “space time continuum”

• It works — but it exacerbates the problem
Picard Knows the Borg Like no Other
What if…
Storage for containers was itself container native?
Cloud Native Architecture?
• Applications have changed, and somebody forgot to tell storage
• Cloud native applications are distributed systems themselves

• May use a variety of protocols to achieve consensus (Paxos, Gossip, etc)

• Is a distributed storage system still needed? 

• Designed to fail and expected to fail

• Across racks, DCs, regions and providers, physical or virtual

• Scalability batteries included

• HAProxy, Envoy, NGINX

• Datasets of individual containers are relatively small in terms of IO and size
• Prefer having a collection of small stars over a big sun?

• The rise of cloud native languages such as Ballerina, Metaparticle, etc.
HW / Storage Trends
• Hardware trends enforce a change in the way we do things
• 40GbE and 100GbE are ramping up, RDMA capable

• NVMe and NVMe-oF (a transport — works on any device)

• Increasing core counts — concurrency primitives built into languages

• Storage limitations bubble up in SW design (infra as code)

• “don’t do this because of that” — “don’t run X while I run my backup”

• Friction between teams creates “shadow IT” — the (storage) problems start when
we move from the dark side of the moon back into the sun
• “We simply use DAS — as there is nothing faster than that”

• small stars, that would work — no “enterprise features”?

• “they have to figure that out for themselves”

• Seems like storage is an agility anti-pattern?
HW Trends
The Persona Changed
• Deliver fast and frequently

• Infrastructure as code, declarative
intent, GitOps, ChatOps

• K8s as the unified cross-cloud
control plane (control loop)

• So what about storage? It has not
changed at all
The Idea
Manifests express intent
[Diagram: stateless application containers paired with stateful data containers, all running as ordinary containers on any server, any cloud]
Design Constraints
• Built on top of the substrate of Kubernetes

• That was a bet we made ~2 years ago that turned out to be right

• Not yet another distributed storage system; small is the new big
• Not to be confused with not scalable
• One on top of the other, an operational nightmare?

• Per workload: using declarative intent defined by the persona (see the sketch after this list)

• Runs in containers for containers — so it needs to run in user space
• Make volumes omnipresent — compute follows the storage?

• Where is the value? Compute or the data that feeds the compute?

• Not a clustered storage instance, but rather a cluster of storage instances
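Putting it together, each workload claims its own declaratively configured volume — one claim per application rather than carving volumes out of a shared array. The names below continue the illustrative ones from the earlier sketch:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: mongo-data                            # hypothetical; one claim per workload
spec:
  storageClassName: openebs-cstor-3replica    # the illustrative class sketched earlier
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
kind: Pod
apiVersion: v1
metadata:
  name: mongo
spec:
  containers:
  - name: mongo
    image: mongo:4
    volumeMounts:
    - mountPath: /data/db
      name: data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: mongo-data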
Decompose the Data
SAN/NAS vs. DAS/CAS
Container Attached Storage
How Does That Look?
Topology Visualisation
Route Your Data Where You Need It To Be
[Diagram: a PV served by CAS, routed across TheBox 1, TheBox 2 and TheBox 3]
Composable
[Diagram: a composable PV pipeline — ingress, local and remote transformation stages T(x), and egress, e.g. compress, encrypt, mirror]
User Space and Performance
• NVMe as a transport is a game changer not just for its speed potential, but
also due to its relentless break away from the SCSI layer (1978)
• A lot of similarities with the InfiniBand technology found in HPC for many years
(1999, as the result of a merger)
Less Is More
HW Changes Enforce A Change
• With these low-latency devices, CPUs are becoming the
bottleneck

• Post Spectre/Meltdown, syscalls have become more expensive
than ever
Hugepages
PMD User Space IO
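A user-space, poll-mode IO engine (SPDK/DPDK style) wants pinned hugepage memory. A minimal sketch of how such a data-plane pod could request it from k8s — the image name is hypothetical, and hugepage requests must equal their limits:

apiVersion: v1
kind: Pod
metadata:
  name: userspace-io-engine
spec:
  containers:
  - name: engine
    image: example/user-space-io:latest   # hypothetical image
    resources:
      requests:
        memory: 1Gi
        hugepages-2Mi: 1Gi
      limits:
        memory: 1Gi
        hugepages-2Mi: 1Gi                # hugepages: request must equal limit
    volumeMounts:
    - mountPath: /hugepages
      name: hugepage
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages                   # backed by pre-allocated hugepages on the node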
Testing It DevOps Style
K8S as a Control Loop
[Diagram: the primary k8s control loop — YAML intent submitted to the API servers; the scheduler and kubelets reconcile observed state (-) with desired state (+)]
Extending the K8S Control Loop
[Diagram: the primary k8s loop extended with a secondary control loop (MOAC) that adapts storage-specific intent from the same YAML]
Raising the Bar — Automated Error Correction
[Diagram: FIO pods replay the block-IO patterns of various apps against CAS volumes while kubectl scales them up and down; logs and telemetry feed a regression DB and AI/ML to learn which failures impact which apps, and how — a sketch of such an FIO Job follows]
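A minimal sketch of one such FIO run as a k8s Job — the image and claim names are assumptions, and a real test would replay captured IO patterns rather than this synthetic random read/write profile:

apiVersion: batch/v1
kind: Job
metadata:
  name: fio-replay
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: fio
        image: example/fio:latest        # hypothetical image that ships fio
        args:
        - --name=randrw
        - --filename=/data/fio.bin
        - --rw=randrw                    # mixed random read/write
        - --bs=4k
        - --size=1g
        - --runtime=60
        - --time_based
        - --direct=1
        volumeMounts:
        - mountPath: /data
          name: test-vol
      volumes:
      - name: test-vol
        persistentVolumeClaim:
          claimName: cas-test-claim      # hypothetical CAS-backed claim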
Declarative Data Plane API
Storage just fades away as a concern
Questions?!
