Presented at FOSDEM 2019
3. • Screwed up the recording; hope it's all good for this year
• Touched briefly on storage history, how SAN and NAS came to be
• Mostly to set the context here
• Introduced the concept of Container Attached Storage (CAS)
Today
• Talk about the progress we made, our maiden voyage with Rust, and go over
some of the concepts that we are working on
• What you see here today is worked on by only two people
• Hopefully a quick demo
• If what you hear today somewhat excites you: we are hiring (remote)
OpenEBS last year (2018)
4. • Open-source project started roughly two years ago
• Sponsored by my employer, MayaData
• Provides a cloud native storage abstraction: a data plane as well as a
control plane, operated by means of declarative intent, so that it
provides a platform for persistent cloud native workloads
• Built on top of Kubernetes, which has demonstrated that abstraction and
intent with reconciliation allow developers to focus on deploying the
app rather than on the underlying infrastructure
• What k8s does for apps, we aspire to do for data
About openEBS
5. How does that look?
[Architecture diagram: MayaOnline (analytics, alerting, compliance, policies, advisory chatbot) driving a declarative data plane through an API, across on premises, Google and packet.net]
6. Motivation
• Applications have changed and someone forgot to tell storage
• The way modern-day software is developed and deployed has changed
a lot due to the introduction of Docker (a tarball on steroids)
• Scalability and availability “batteries” are included
• Small teams of people need to deliver “fast and frequently”, and
innovation tends to happen in so-called shadow IT (skunkworks)
• Born in the cloud: adopts cloud native patterns
• Hardware trends force a change in the way we do things
• These changes propagate into our software and the languages we use
• K8s as a universal control plane to deploy containerised applications
• Public cloud is moving on premises (GKE On-Prem, AWS Outposts)
• K8s is capable of doing more than containers thanks to controllers (VMs)
7. • Register a set of “mountable” things with the k8s cluster (PV)
• Take ownership of such a mountable thing by claiming it (PVC)
• Refer to the PVC in the application (see the sketch below)
• To avoid having to fill up a pool of PVs by hand, a dynamic provisioner
can be used that does so automatically
• Potential implications may vary per storage solution (max LUs)
• Storage is typically the mother of all snowflakes
• To avoid a wildfire of plugins, a Container Storage Interface (CSI) has
been developed by community members
• Vendor-specific implementation (or black magic) is hidden from the user
• Make it a pure consumption model
PVs and PVCs in a nutshell
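A minimal sketch of that flow, assuming a dynamic provisioner registered under a hypothetical storage class name (openebs-standard): the PVC claims storage, and the pod refers to the claim.

```yaml
# PVC: claim 5Gi from a (hypothetical) dynamically provisioned class
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-claim
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: openebs-standard   # assumed class name
  resources:
    requests:
      storage: 5Gi
---
# Pod: refer to the PVC by name
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: demo-claim
```

With a dynamic provisioner in place, creating the PVC is enough; the matching PV is provisioned and bound automatically.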
11. • How does a developer compose their volume in terms of storage-specific
features for that particular workload?
• snapshots, clones, compression, encryption; the persona in control
• How do we unify storage differences between different cloud providers
and/or storage vendors?
• They are as incompatible as they can be, by design
• How do we provide the cloud native “EBS volume” look and feel on premises
using your existing storage infra?
• Don't throw away existing storage solutions and/or vendors
• Make storage as agile as the applications that it serves
Problem solved?
12. • As data grows, it has the tendency to pull applications towards it
• Everything revolves around the storage system
• Latency, throughput: the IO blender
• If the sun goes supernova, all the apps around it will be gone instantly,
i.e. a huge blast radius
• Typically you have far more PVs/PVCs than you have LUs in a virtual
environment (1000?)
• Typical solution: let us replicate the sun!
• Exacerbates the problem instead of solving it?
Data gravity
13.
14. • Data placement is expressed in YAML as part of the application
• Replication factors can be dynamically changed (patch); see the sketch
below
• Provide a set of composable transformation layers that can be enabled
based on application-specific needs
• As monolithic apps are decomposed, so are their storage needs
• Volumes are typically small, which allows for data agility
• Allows us to reimagine how we manage the data
• Runs in containers, for containers; avoids feature mismatches between
different kernel flavours across distributions and “cloud” images
• Decompose the data into a collection of small stars
• Monolith vs Micro
OpenEBS approach
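A sketch of what declarative placement and replication intent can look like; the annotation keys below follow the OpenEBS (Jiva) docs of that era and should be treated as illustrative rather than authoritative.

```yaml
# StorageClass carrying storage policy as declarative intent
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-repl3
  annotations:
    openebs.io/cas-type: jiva            # CAS engine (illustrative)
    cas.openebs.io/config: |
      - name: ReplicaCount
        value: "3"                       # keep the data on three replicas
provisioner: openebs.io/provisioner-iscsi
```

Because the intent lives in the cluster as an object, changing the replication factor becomes a `kubectl patch` against the owning resource rather than a ticket to a storage admin.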
15.
16. • The user is faced with writing an application that might run in DC1 or
DC2, as the k8s cluster spans both
• DC1 happens to have vendor A and DC2 has vendor B
• typically, vendor A does not work with vendor B, at least not efficiently
• OpenEBS can be used to abstract away the differences between the two
storage systems and make the volume available in both DCs
• Almost like a ‘real’ EBS volume, except we have more control
Data availability example
17. Simple replication of small datasets
[Diagram: a PV served by a CAS volume that replicates across TheBox 1, TheBox 2 and TheBox 3]
• Data routing: you specify where you want your data to go
• It is openEBS that connects to TheBox, not the OS
• The openEBS operator, not shown, instantiates the needed virtual devices
on the fly
18. • Facing different types of storage protocols and performance tiers
• OpenEBS can't fill the performance gap; it is storage, not magic
• As time moves on, we want to get “rid” of the slow tier as a faster tier
becomes available
• PVs come and go all the time; likewise, the slow tier will be repurposed
• The alternative is to “not deploy” and wait for storage
• How to move the data non-disruptively?
• Hydrate and then vacate, formerly known as migration, aka copy =)
Data Mobility use case
19. Data hydration and mobility
[Diagram: a PV whose CAS volume fronts iSCSI and NBD backends, with a hydrate/mirror path between them]
• Asymmetrical backends; performance depends on the replication mode and
the interconnect
• async, semi-sync and sync
• Data migration and hydration: small is the new big, we are copying GBs,
not PBs!
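A minimal sketch of the hydrate/mirror idea in Rust, assuming single-threaded I/O that never crosses a segment boundary; the `Backend` trait and the 16 MB segment size are illustrative, not the actual OpenEBS data plane.

```rust
/// Hydrate/mirror sketch: writes are mirrored to both backends, reads are
/// served from the new (fast) tier once a segment has been copied over.
const SEGMENT_SIZE: u64 = 16 * 1024 * 1024; // 16 MB segments (illustrative)

trait Backend {
    fn read(&self, offset: u64, buf: &mut [u8]);
    fn write(&mut self, offset: u64, buf: &[u8]);
}

struct HydratingVolume<B: Backend> {
    old: B,              // slow tier being vacated
    new: B,              // fast tier being hydrated
    hydrated: Vec<bool>, // one flag per segment
}

impl<B: Backend> HydratingVolume<B> {
    fn segment(offset: u64) -> usize {
        (offset / SEGMENT_SIZE) as usize
    }

    /// Mirror writes so the new tier never goes stale.
    fn write(&mut self, offset: u64, buf: &[u8]) {
        self.old.write(offset, buf);
        self.new.write(offset, buf);
    }

    /// Route reads: fast tier if the segment was copied, slow tier otherwise.
    fn read(&self, offset: u64, buf: &mut [u8]) {
        if self.hydrated[Self::segment(offset)] {
            self.new.read(offset, buf);
        } else {
            self.old.read(offset, buf);
        }
    }

    /// Background hydration: copy one segment, then flip its flag.
    /// Once every flag is set, the old backend can be vacated.
    fn hydrate_segment(&mut self, seg: usize) {
        let mut buf = vec![0u8; SEGMENT_SIZE as usize];
        self.old.read(seg as u64 * SEGMENT_SIZE, &mut buf);
        self.new.write(seg as u64 * SEGMENT_SIZE, &buf);
        self.hydrated[seg] = true;
    }
}
```

Whether the mirroring is async, semi-sync or sync only changes when the write to the new tier is acknowledged, not the routing logic.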
20. • Volumes are small, so rebuilds are generally quick; but how do we know
what to rebuild?
• Although small, you really don't want to rebuild unused blocks
• General approach is to segment the drive(s) into fixed-size blocks
(e.g. 16 MB)
• Keep a bitmap of dirty segments as writes come in
• Where to store the bitmap?
• Remember: small (Bonwick on space maps)
• As a new drive/LU is added, write out the marked segments to the other
drive(s)
• But what about thin provisioning, clones, snapshots?
• We have something that does that, but.. maybe next year
• Most of this is not new; we are standing on the shoulders of giants
• “The Design and Implementation of a Log-Structured File System”
Rebuilding
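A sketch of the dirty-segment bitmap described above, assuming 16 MB segments and an in-memory map; a real implementation would also persist the map (hence the space-map reference).

```rust
/// Dirty-segment tracking for rebuilds: mark segments on the write path,
/// then replay only the marked ones to a newly added drive/LU.
const SEGMENT_SIZE: u64 = 16 * 1024 * 1024; // 16 MB, as on the slide

struct DirtyMap {
    bits: Vec<u64>, // packed bitmap, one bit per segment
}

impl DirtyMap {
    fn new(volume_size: u64) -> Self {
        let segments = (volume_size + SEGMENT_SIZE - 1) / SEGMENT_SIZE;
        DirtyMap { bits: vec![0; ((segments + 63) / 64) as usize] }
    }

    /// Write path: mark every segment the write touches (assumes len > 0).
    fn mark(&mut self, offset: u64, len: u64) {
        let first = offset / SEGMENT_SIZE;
        let last = (offset + len - 1) / SEGMENT_SIZE;
        for seg in first..=last {
            self.bits[(seg / 64) as usize] |= 1u64 << (seg % 64);
        }
    }

    /// Rebuild path: yield only the segments that ever saw a write.
    fn dirty_segments(&self) -> impl Iterator<Item = u64> + '_ {
        (0..self.bits.len() as u64 * 64)
            .filter(move |&seg| self.bits[(seg / 64) as usize] & (1u64 << (seg % 64)) != 0)
    }
}
```

One bit per 16 MB keeps the map tiny: a 1 TB volume needs only 8 KB, which is why persisting it next to the data is cheap.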