Container Attached Storage with OpenEBS - CNCF Paris Meetup


These slides were presented by Jeffry Molanus, CTO of MayaData in the CNCF Paris meetup held on 26th June 2019.

Published in: Software

  1. 1. Container Attached Storage with OpenEBS @JeffryMolanus Date: 26/6/2019
  2. 2. About me MayaData and the OpenEBS project
  3. 3. On premises Google DMaaS Analytics Alerting Compliance Policies Declarative Data Plane API Advisory Chatbot
  4. 4. Resistance Is Futile • K8s is based on the original Google Borg paper • Containers are the “unit” of management • Mostly web-based applications • Typically the apps were stateless, if you agree there is such a thing • In its simplest form, k8s is a control loop • Converge to the desired state based on declarative intent provided by the DevOps persona • Abstract away underlying compute cluster details and decouple apps from infrastructure: avoid lock-in • Let developers focus on application deployment and not worry about the environment it runs in • HW independent (commodity)
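The control-loop idea on this slide can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Kubernetes code: `ClusterState`, `reconcile` and `converge` are hypothetical names, and the only state tracked is a replica count.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical state: just the number of running replicas."""
    replicas: int


def reconcile(desired: ClusterState, observed: ClusterState) -> ClusterState:
    """One pass of the loop: move the observed state one step toward the desired state."""
    if observed.replicas < desired.replicas:
        return ClusterState(observed.replicas + 1)  # start one replica
    if observed.replicas > desired.replicas:
        return ClusterState(observed.replicas - 1)  # stop one replica
    return observed  # already converged, nothing to do


def converge(desired: ClusterState, observed: ClusterState, max_steps: int = 100):
    """Repeat reconcile until observed equals desired (or the step budget runs out)."""
    steps = 0
    while observed != desired and steps < max_steps:
        observed = reconcile(desired, observed)
        steps += 1
    return observed, steps


final_state, steps_taken = converge(ClusterState(replicas=3), ClusterState(replicas=0))
```

The user only declares the desired state; the loop works out the steps, which is the "declarative intent" the slide refers to.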
  5. 5. Borg Schematic
  6. 6. Persistency in Volatile Environments • Container storage is ephemeral; data is only stored during the lifetime of the container(s) • This means either that temporary data has no value or that it can be regenerated • Sharing data between containers is also a challenge: it needs to persist • In the case of serverless, the intermediate state between tasks is ephemeral • The problem then: containers need persistent volumes in order to run stateful workloads • While doing so: abstract away the underlying storage details and decouple the data from the underlying infra: avoid lock-in • The “bar” has been set in terms of expectations by the cloud providers, e.g. PD, EBS • Volumes available at multiple DCs and/or regions, and replicated
  7. 7. Data Loss Is Almost Guaranteed
      apiVersion: v1
      kind: Pod
      metadata:
        name: test-pd
      spec:
        containers:
        - image:
          name: test-container
          volumeMounts:
          - mountPath: /test-pd
            name: test-volume
        volumes:
        - name: test-volume
          hostPath:
            # directory location on host
            path: /data
      Unless…
  8. 8. Use a “Cloud” Disk
      apiVersion: v1
      kind: Pod
      metadata:
        name: test-pd
      spec:
        containers:
        - image:
          name: test-container
          volumeMounts:
          - mountPath: /test-pd
            name: test-volume
        volumes:
        - name: test-volume
          # This GCE PD must already exist!
          gcePersistentDisk:
            pdName: my-data-disk
            fsType: ext4
  9. 9. Evaluation and Progress • In both cases we tie ourselves to a particular node; that defeats the agility found natively in k8s and fails to abstract away details • We are cherry-picking pets from our herd • An anti-pattern: easy to say and hard to avoid in some cases • The second example allows us to mount (who?) the PV to different nodes but requires volumes to be created prior to launching the workload • Good, not great • More abstraction through community efforts around Persistent Volumes (PV), Persistent Volume Claims (PVC) and CSI • Container Storage Interface (CSI) to handle vendor-specific needs before, for example, mounting the volume • Avoid a wildfire of “volume plugins” or “drivers” in the k8s main repo
  10. 10. The PV and PVC
      kind: PersistentVolume
      apiVersion: v1
      metadata:
        name: task-pv-volume
      spec:
        storageClassName: manual
        capacity:
          storage: 3Gi
        accessModes:
        - ReadWriteOnce
        hostPath:
          path: "/mnt/data"
      ---
      kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: task-pv-claim
      spec:
        storageClassName: manual
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 3Gi
      ---
      kind: Pod
      apiVersion: v1
      metadata:
        name: mypod
      spec:
        containers:
        - name: myfrontend
          image: nginx
          volumeMounts:
          - mountPath: "/var/www/html"
            name: mypd
        volumes:
        - name: mypd
          persistentVolumeClaim:
            claimName: task-pv-claim
  11. 11. Summary So Far • Register a set of “mountable” things to the cluster (PV) • Take ownership of a “mountable” thing in the cluster (PVC) • Refer in the application to the PVC • Dynamic provisioning: create PVs ad hoc when claiming something that does not exist yet • Removes the need to preallocate them (is that a good thing?) • The attaching and detaching of volumes to nodes is standardised by means of CSI, which is a gRPC interface that handles the details of creating, attaching, staging, destroying, etc. • Vendor-specific implementations are hidden from the users
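The relationship between registering volumes, claiming them, and dynamic provisioning can be illustrated with a small Python model. It is a deliberately simplified sketch, not the Kubernetes binding controller: the `PV`, `PVC` and `bind` names, the matching rule (same storage class, enough capacity, smallest fit first) and the `pvc-<claim name>` naming are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class PV:
    """A registered 'mountable' thing (hypothetical, simplified)."""
    name: str
    storage_class: str
    capacity_gi: int
    claimed_by: Optional[str] = None


@dataclass
class PVC:
    """A request to take ownership of a volume (hypothetical, simplified)."""
    name: str
    storage_class: str
    request_gi: int


def bind(claim: PVC, volumes: List[PV], dynamic_classes: Set[str]) -> Optional[PV]:
    """Bind the claim to a free matching PV, or provision one ad hoc if the class allows it."""
    candidates = [
        pv for pv in volumes
        if pv.claimed_by is None
        and pv.storage_class == claim.storage_class
        and pv.capacity_gi >= claim.request_gi
    ]
    if candidates:
        pv = min(candidates, key=lambda v: v.capacity_gi)  # smallest volume that fits
    elif claim.storage_class in dynamic_classes:
        # Dynamic provisioning: nothing was preallocated, so create a PV on demand.
        pv = PV(f"pvc-{claim.name}", claim.storage_class, claim.request_gi)
        volumes.append(pv)
    else:
        return None  # the claim stays pending
    pv.claimed_by = claim.name
    return pv


volumes = [PV("task-pv-volume", "manual", 3)]
first = bind(PVC("task-pv-claim", "manual", 3), volumes, dynamic_classes={"fast"})
second = bind(PVC("another-claim", "manual", 3), volumes, dynamic_classes={"fast"})
third = bind(PVC("db-claim", "fast", 10), volumes, dynamic_classes={"fast"})
```

In this model the second claim stays pending because the only `manual` volume is already taken, while the third succeeds because its class supports on-demand creation.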
  12. 12. The Basics — Follow the Workload Node Node POD PVC
  13. 13. Problem Solved? • How does a developer configure the PV such that it has exactly the features required for that particular workload? • Number of replicas, compression, snapshots and clones (opt in/out) • How do we abstract away differences between storage vendors when moving to/from private or public cloud? • Differences in replication approaches, usually not interchangeable • Abstract away access protocol and feature mismatch • Provide a cloud-native storage “look and feel” on premises? • Don't throw away our million-dollar existing storage infra • GKE on premises, AWS Outposts: if you are not going to the cloud it will come to you, resistance is futile • Make data as agile as the applications that it serves
  14. 14. Data Gravity • As data grows, it has the tendency to pull applications towards it (gravity) • Everything will revolve around the sun, and it dominates the planets • Latency, throughput, IO blender • If the sun goes supernova, all your apps circling it will be gone instantly • Some solutions involve replicating the sun towards some other location in the “space-time continuum” • It works, but it exacerbates the problem
  15. 15. Picard Knows the Borg Like no Other
  16. 16. What if…. Storage for containers was itself container native ?
  17. 17. Cloud Native Architecture? • Applications have changed, and somebody forgot to tell storage • Cloud native applications are distributed systems themselves • May use a variety of protocols to achieve consensus (Paxos, Gossip, etc.) • Is a distributed storage system still needed? • Designed to fail and expected to fail • Across racks, DCs, regions and providers, physical or virtual • Scalability batteries included • HAProxy, Envoy, Nginx • Datasets of individual containers are relatively small in terms of IO and size • Prefer having a collection of small stars over a big sun? • The rise of cloud native languages such as Ballerina, Metaparticle, etc.
  18. 18. HW / Storage Trends • Hardware trends force a change in the way we do things • 40GbE and 100GbE are ramping up, RDMA capable • NVMe and NVMe-oF (transport: works on any device) • Increasing core counts; concurrency primitives built into languages • Storage limitations bubble up in SW design (infra as code) • “Don’t do this because of that”; “don’t run X while I run my backup” • Friction between teams creates “shadow IT”; the (storage) problems start when we move from the dark side of the moon back into the sun • “We simply use DAS, as there is nothing faster than that” • Small stars, that would work; no “enterprise features”? • “They have to figure that out for themselves” • Seems like storage is an agility anti-pattern?
  19. 19. HW Trends
  20. 20. The Persona Changed • Deliver fast and frequently • Infrastructure as code, declarative intent, gitOps, chatOps • K8s as the unified cross cloud control plane (control loop) • So what about storage? It has not changed at all
  21. 21. The Idea Manifests express intent stateless Container 1 Container 2 Container 3 stateful Data Container Data Container Data Container Any Server, Any Cloud Any Server, Any Cloud container(n) container(n) container(n) container(n) container(n) container(n)
  22. 22. Design Constraints • Built on top of the substrate of Kubernetes • That was a bet we made ~2 years ago that turned out to be right • Not yet another distributed storage system; small is the new big • Not to be confused with not scalable • One on top of the other, an operational nightmare? • Per workload: using declarative intent defined by the persona • Runs in containers for containers, so it needs to run in user space • Make volumes omnipresent: compute follows the storage? • Where is the value? Compute or the data that feeds the compute? • Not a clustered storage instance but rather a cluster of storage instances
  23. 23. Decompose the Data
  24. 24. SAN/NAS vs. DAS • CAS: Container Attached Storage
  25. 25. How Does That Look?
  26. 26. Topology Visualisation
  27. 27. Route Your Data Where You Need It To Be PV CAS TheBox 1 TheBox 2 TheBox 3
  28. 28. Composable PV Ingress local remote T(x) T(x) T(x) Egress compress, encrypt, mirror
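One way to read the composable PV diagram is as a chain of transforms T(x) applied to each block between ingress and egress, with mirroring fanning the result out to several replicas. The Python sketch below shows that reading only; it is not OpenEBS code. The names `compose`, `mirror`, `egress` and `ingress` are hypothetical, and compression via the standard-library `zlib` stands in for the full compress/encrypt/mirror set (an encryption stage would be one more function in the list).

```python
import zlib
from functools import reduce
from typing import Callable, Dict, List

# A transform T(x) takes a block of bytes and returns a block of bytes.
Transform = Callable[[bytes], bytes]


def compose(stages: List[Transform]) -> Transform:
    """Chain stages left to right: compose([f, g])(x) == g(f(x))."""
    return lambda block: reduce(lambda data, stage: stage(data), stages, block)


def mirror(block: bytes, replicas: List[Dict[int, bytes]], lba: int) -> None:
    """Write the same block to every replica at the given logical block address."""
    for replica in replicas:
        replica[lba] = block


# Write path and read path are built from the same kind of building blocks;
# the read path applies the inverse stages in reverse order.
egress = compose([zlib.compress])
ingress = compose([zlib.decompress])

replicas: List[Dict[int, bytes]] = [{}, {}, {}]  # three in-memory stand-ins for storage targets
payload = b"container attached storage " * 8
mirror(egress(payload), replicas, lba=0)
restored = ingress(replicas[2][0])
```

Because every stage has the same bytes-in, bytes-out shape, stages can be added or removed per volume without touching the others, which is the point of composing the data path per workload.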
  29. 29. User Space and Performance • NVMe as a transport is a game changer, not just for its speed potential but also for its relentless break away from the SCSI layer (1978) • A lot of similarities with InfiniBand technology, found in HPC for many years (1999, as a result of a merger)
  30. 30. Less Is More
  31. 31. HW Changes Enforce A Change • With these low-latency devices, CPUs are becoming the bottleneck • Post Spectre/Meltdown, syscalls have become more expensive than ever
  32. 32. Hugepages
  33. 33. PMD User Space IO
  34. 34. Testing It DevOps Style
  35. 35. K8S as a Control Loop Kubelet K8s Master YAML + - Primary loop (k8s) OP Sched API Servers …..
  36. 36. Extending the K8S Control Loop Kubelet k8s++ Adapt YAML + - RefMO Primary loop (k8s) Secondary loop (MOAC)
  37. 37. Raising the Bar: Automated Error Correction CAS FIO FIO FIO replay blk IO pattern of various apps kubectl scale up and down DB Regression AI/ML Logs Telemetry Learn what failure impacts app how Declarative Data Plane API
  38. 38. Storage just fades away as a concern
  39. 39. Questions?!