Multitenant Data Architectures with Kubernetes
Paul Brown
paul.brown@salesforce.com
Tectonic Summit 2016: Multitenant Data Architectures with Kubernetes


Paul Brown walks through multitenant data architectures with Kubernetes.

12/12/16


  1. Multitenant Data Architectures with Kubernetes
     Paul Brown, paul.brown@salesforce.com
  2. Motivation
     • Software development and data science have distinct lifecycles.
     • Repeatability is fundamental to both.
     • Bridging the data science lifecycle into the software development lifecycle presents challenges.
  3. Multi-tenancy with Multiplicity
     • No tool really does it all. (Sorry.)
     • Data wrangling, ETL/ELT, different algorithms hosted in different compute frameworks, …
     • Data pipeline or workflow to tie it all together.
     • Everyone wants something different, sometimes for good reasons.
     Being able to run a large number of different workloads for a large number of different users is a win.
  4. Containers
     • Package apps with their libraries in a (relatively) clean manner, which is especially important for native code.
     • Ensure traceability of code, presuming that there is a solid CI and repository solution in place.
  5. Kubernetes is awesome.
     For reasons you already know:
     • Bin packing.
     • Horizontal scale-out for the platform, auto-scaling for pods.
     • Service discovery, load balancing.
     • Self-healing.
     • Batch execution.
     And more reasons in the future:
     • GPU affinity.
     • Backplane for Spark.
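To make the "bin packing" bullet on slide 5 concrete, here is a minimal sketch of first-fit decreasing, a classic bin-packing heuristic. It is only an illustration of the idea of packing resource requests onto nodes; it is not the algorithm the Kubernetes scheduler uses, and the function name and millicore units are assumptions chosen for the example.

```python
def first_fit_decreasing(requests: list[int], node_capacity: int) -> list[list[int]]:
    """Pack CPU requests (in millicores) onto as few equal-sized nodes as the heuristic finds.

    Illustrative only: a textbook heuristic, not the Kubernetes scheduler.
    """
    nodes: list[list[int]] = []  # requests placed on each node
    free: list[int] = []         # remaining capacity of each node

    # Placing the largest requests first tends to leave less wasted space.
    for req in sorted(requests, reverse=True):
        if req > node_capacity:
            raise ValueError(f"request {req} exceeds node capacity {node_capacity}")
        for i, remaining in enumerate(free):
            if req <= remaining:
                nodes[i].append(req)
                free[i] -= req
                break
        else:
            # No existing node has room, so open a new one.
            nodes.append([req])
            free.append(node_capacity - req)
    return nodes


packed = first_fit_decreasing([500, 250, 750, 250, 250], node_capacity=1000)
print(packed)
```

Five requests totalling 2000 millicores fit on two 1000-millicore nodes here, which is the kind of utilization win the slide alludes to.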
  6. A Simple Idea
     What if we could package workloads in containers and then kubectl could be our fundamental devops primitive…?
     Napkin Sketch:
     1. Build a control plane that knows how to stamp out workloads via a Provisioning API.
     2. Profit.
     [Diagram labels: Kubernetes, Control Plane, Workload1, Workload2, Workload3, Provisioning API]
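The napkin sketch on slide 6 could be approximated as follows. This is a hedged sketch, not the speaker's implementation: the function names, the one-namespace-per-tenant layout, the `tenant` and `workload` labels, and the image name are all assumptions. It builds plain Kubernetes manifests as dictionaries and serializes them as a `List` object that could be piped to `kubectl apply -f -`.

```python
import json


def stamp_workload(tenant: str, workload: str, image: str, replicas: int = 1) -> list[dict]:
    """Return the manifests for one tenant workload (hypothetical layout)."""
    namespace = f"tenant-{tenant}"  # assumption: one namespace per tenant
    labels = {"tenant": tenant, "workload": workload}
    namespace_manifest = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": namespace, "labels": {"tenant": tenant}},
    }
    controller_manifest = {
        "apiVersion": "v1",
        "kind": "ReplicationController",
        "metadata": {"name": workload, "namespace": namespace, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": labels,
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": workload, "image": image}]},
            },
        },
    }
    return [namespace_manifest, controller_manifest]


def to_kubectl_input(manifests: list[dict]) -> str:
    """Wrap manifests in a v1 List so they can be piped to `kubectl apply -f -`."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2)


manifests = stamp_workload("acme", "spark-etl", "example/spark:1.0", replicas=2)
print(to_kubectl_input(manifests))
```

A provisioning API in this style would call `stamp_workload` per request and hand the output to kubectl, which keeps kubectl as the single devops primitive the slide proposes.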
  7. Challenges
     • Typical workloads consist of multiple types of containers that need to collaborate.
     • Containerization (often) isn’t that bad, depending on your taste.
     • Many workloads or components thereof (e.g., Spark) aren’t designed in a manner that permits the best use of Kubernetes facilities. Surgery (or holding your nose) is frequently required, but sometimes (e.g., TensorFlow!) things work well from the start.
  8. Example
     Problem:
     • Zookeeper
     • Nodes have distinct identity, and the client protocol is designed to defy load balancing.
     Solution:
     • Replication controller per node and call it a day.
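The Zookeeper solution on slide 8 can be sketched as one single-replica ReplicationController per ensemble member. The slide names only the replication controller; the per-node Service that gives each member a stable DNS name, the `zk-N` naming, the image, and the `ZK_MYID` environment variable are assumptions added for illustration, and a real image would need to read that variable to write its `myid` file.

```python
def zookeeper_manifests(ensemble_size: int, image: str = "example/zookeeper:3.4") -> list[dict]:
    """Build one ReplicationController and one Service per Zookeeper node."""
    manifests: list[dict] = []
    for node_id in range(1, ensemble_size + 1):
        name = f"zk-{node_id}"
        labels = {"app": "zookeeper", "zk-id": str(node_id)}
        manifests.append({
            "apiVersion": "v1",
            "kind": "ReplicationController",
            "metadata": {"name": name, "labels": labels},
            "spec": {
                "replicas": 1,  # exactly one pod carries this node's identity
                "selector": labels,
                "template": {
                    "metadata": {"labels": labels},
                    "spec": {"containers": [{
                        "name": "zookeeper",
                        "image": image,
                        # Hypothetical variable; the image must honor it.
                        "env": [{"name": "ZK_MYID", "value": str(node_id)}],
                    }]},
                },
            },
        })
        manifests.append({
            "apiVersion": "v1",
            "kind": "Service",
            "metadata": {"name": name, "labels": labels},
            "spec": {
                "selector": labels,
                "ports": [
                    {"name": "client", "port": 2181},
                    {"name": "peer", "port": 2888},
                    {"name": "election", "port": 3888},
                ],
            },
        })
    return manifests


def server_lines(ensemble_size: int) -> list[str]:
    """zoo.cfg server entries that address each node by its own Service name."""
    return [f"server.{i}=zk-{i}:2888:3888" for i in range(1, ensemble_size + 1)]


print("\n".join(server_lines(3)))
```

Because every node has its own controller and name, clients and peers address members individually instead of going through a load-balanced endpoint, which matches the problem statement on the slide.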
  9. Some Familiar Problems
     Once you can stamp out workloads, you get down to familiar problems:
     • Tenant-attributed logging (workload and user) and metrics.
     • “Billing” and metering.
     • Visibility and other flavors of operability.
     • Security against purposeful or accidental attackers.
     • Workload isolation, e.g., for PII.
     Fixing these problems frequently requires surgery, and none of these problems are unique to containerization or cluster scheduling of workloads, i.e., you have to solve them anyway.
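Two of the problems on slide 9, tenant-attributed logging and metering, can be illustrated with a small sketch. The record fields, the `UsageSample` type, and the choice of CPU-seconds as the metered unit are assumptions for the example; the talk does not prescribe a schema.

```python
import json
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageSample:
    """One metering observation, attributed to a tenant and a workload."""
    tenant: str
    workload: str
    cpu_seconds: float


def log_record(tenant: str, workload: str, user: str, message: str) -> str:
    """Emit a structured log line that carries tenant, workload, and user."""
    return json.dumps(
        {"tenant": tenant, "workload": workload, "user": user, "message": message},
        sort_keys=True,
    )


def meter_by_tenant(samples: list[UsageSample]) -> dict[str, float]:
    """Sum CPU-seconds per tenant as a basis for "billing"."""
    totals: dict[str, float] = defaultdict(float)
    for sample in samples:
        totals[sample.tenant] += sample.cpu_seconds
    return dict(totals)


samples = [
    UsageSample("acme", "spark-etl", 1.5),
    UsageSample("acme", "train-model", 2.5),
    UsageSample("globex", "spark-etl", 3.0),
]
print(meter_by_tenant(samples))
print(log_record("acme", "spark-etl", "alice", "job started"))
```

Attaching the tenant and workload at the point where a record is produced is what makes later aggregation trivial; retrofitting attribution onto unlabelled logs is the "surgery" the slide warns about.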
  10. Wrap Up
     • Building a data processing platform on Kubernetes has some obvious starting points and some familiar challenges.
     • More data scientists and middleware makers are starting with containers as a packaging scheme.