Tectonic Summit 2016: Multitenant Data Architectures with Kubernetes

CoreOS

Paul Brown walks through multitenant data architectures with Kubernetes. 12/12/16


Multitenant Data Architectures with Kubernetes
Paul Brown
paul.brown@salesforce.com

Motivation
• Software development and data science have distinct lifecycles.
• Repeatability is fundamental to both.
• Bridging the data science lifecycle into the software development lifecycle presents challenges.

Multi-tenancy with Multiplicity
• No tool really does it all. (Sorry.)
• Data wrangling, ETL/ELT, different algorithms hosted in different compute frameworks, …
• Data pipeline or workflow to tie it all together.
• Everyone wants something different, sometimes for good reasons.
Being able to run a large number of different workloads for a large number of different users is a win.

Containers
• Package apps with their libraries in a (relatively) clean manner, which is especially important for native code.
• Ensure traceability of code, provided that a solid CI and repository solution is in place.
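
To make the traceability point concrete, here is a minimal Python sketch of one common convention (an assumption, not something the talk prescribes): CI tags each image with the full git commit SHA it was built from, and deployments use digest-pinned references so the running image cannot drift from what was built. The registry name `registry.example.com/team/etl` and both helper functions are hypothetical.

```python
import re

# Hypothetical conventions for illustration; adapt to your CI and registry.
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")
DIGEST_SUFFIX = re.compile(r"@sha256:[0-9a-f]{64}$")


def image_ref_for_commit(repository: str, commit_sha: str) -> str:
    """Tag an image with the full git commit SHA it was built from."""
    if not FULL_SHA.match(commit_sha):
        raise ValueError(f"expected a full 40-character git SHA, got {commit_sha!r}")
    return f"{repository}:{commit_sha}"


def is_digest_pinned(image_ref: str) -> bool:
    """True if the reference names an immutable content digest rather than a mutable tag."""
    return DIGEST_SUFFIX.search(image_ref) is not None


if __name__ == "__main__":
    sha = "0123456789abcdef0123456789abcdef01234567"
    print(image_ref_for_commit("registry.example.com/team/etl", sha))
    print(is_digest_pinned("registry.example.com/team/etl@sha256:" + "a" * 64))
```

The tag gives a human-readable link back to the commit; the digest is what guarantees the deployed bits are exactly the ones CI produced, so a control plane may want to record both.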

Kubernetes is awesome.
For reasons you already know:
• Bin packing.
• Horizontal scale-out for the platform, auto-scaling for pods.
• Service discovery, load balancing.
• Self-healing.
• Batch execution.
And more reasons in the future:
• GPU affinity.
• Backplane for Spark.
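
Batch execution and bin packing meet in the Job object: the scheduler places pods according to their resource requests. A minimal Python sketch, assuming manifests are generated programmatically and piped to `kubectl apply -f -` (kubectl accepts JSON as well as YAML). It targets the current `batch/v1` Job API; the job name, namespace, image, command, and request sizes are hypothetical.

```python
import json


def batch_job(name, namespace, image, command, cpu="500m", memory="1Gi", retries=2):
    """Return a batch/v1 Job manifest as a plain dict."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Number of retries before the Job is marked failed.
            "backoffLimit": retries,
            "template": {
                "spec": {
                    # Batch pods run to completion instead of being restarted forever.
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "command": command,
                            # Requests are what the scheduler bin-packs against.
                            "resources": {"requests": {"cpu": cpu, "memory": memory}},
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    job = batch_job("nightly-etl", "tenant-acme", "registry.example.com/etl:2.3",
                    ["python", "run_etl.py"])
    print(json.dumps(job, indent=2))
```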

A Simple Idea
What if we could package workloads in containers and then kubectl could be our fundamental DevOps primitive…?
Napkin Sketch:
1. Build a control plane that knows how to stamp out workloads via a Provisioning API.
2. Profit.
[Diagram: Kubernetes control plane with a Provisioning API, stamping out Workload1, Workload2, and Workload3.]
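
The napkin sketch can be made slightly more concrete. This is a minimal Python sketch of the core of such a Provisioning API, under assumptions of our own: each tenant gets a namespace, each workload type maps to a template in a hypothetical catalog (`WORKLOAD_TEMPLATES`), and every object is stamped with `tenant`/`workload`/`instance` labels so that later logging and metering can attribute it. It emits manifests for kubectl rather than calling the Kubernetes API directly, and uses the current `apps/v1` Deployment API; image names and ports are placeholders.

```python
import json

# Hypothetical catalog; a real control plane would load versioned, CI-built templates.
WORKLOAD_TEMPLATES = {
    "notebook": {"image": "registry.example.com/notebook:1.0", "port": 8888},
    "etl": {"image": "registry.example.com/etl:2.3", "port": 8080},
}


def provision(tenant, workload_type, instance):
    """Stamp out the manifests for one workload instance owned by one tenant."""
    template = WORKLOAD_TEMPLATES[workload_type]  # KeyError for unknown workload types
    namespace = f"tenant-{tenant}"
    name = f"{workload_type}-{instance}"
    labels = {"tenant": tenant, "workload": workload_type, "instance": instance}
    namespace_obj = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": namespace, "labels": {"tenant": tenant}},
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": namespace, "labels": dict(labels)},
        "spec": {
            "replicas": 1,
            # The selector must match the pod template's labels.
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [
                        {
                            "name": workload_type,
                            "image": template["image"],
                            "ports": [{"containerPort": template["port"]}],
                        }
                    ]
                },
            },
        },
    }
    return [namespace_obj, deployment]


if __name__ == "__main__":
    # kubectl accepts a v1 List wrapping several objects.
    print(json.dumps({"apiVersion": "v1", "kind": "List",
                      "items": provision("acme", "notebook", "a1")}, indent=2))
```

Because `kubectl apply` is declarative, re-applying the same tenant namespace for a second workload is harmless, which keeps the provisioning call simple to retry.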

Challenges
• Typical workloads consist of multiple types of containers that need to collaborate.
• Containerization (often) isn’t that bad, depending on your taste.
• Many workloads or components thereof (e.g., Spark) aren’t designed in a manner that permits the best use of Kubernetes facilities.
Surgery (or holding your nose) is frequently required, but sometimes (e.g., TensorFlow!) things work well from the start.
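
Kubernetes' unit for containers that must collaborate closely is the pod: containers in one pod share a network namespace (they can reach each other on `localhost`) and can share volumes. A minimal Python sketch of a two-container pod, assuming a hypothetical worker image that writes output files and a hypothetical shipper image that reads them from the same `emptyDir` volume; all names and images are placeholders.

```python
import json


def collaborating_pod(name, namespace, worker_image, shipper_image):
    """A pod whose two containers exchange files through a shared emptyDir volume."""
    volume_name = "shared-data"
    mount = {"name": volume_name, "mountPath": "/shared"}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # emptyDir lives as long as the pod and is visible to every container mounting it.
            "volumes": [{"name": volume_name, "emptyDir": {}}],
            "containers": [
                {"name": "worker", "image": worker_image, "volumeMounts": [dict(mount)]},
                {"name": "shipper", "image": shipper_image, "volumeMounts": [dict(mount)]},
            ],
        },
    }


if __name__ == "__main__":
    pod = collaborating_pod("etl-a1", "tenant-acme",
                            "registry.example.com/etl:2.3",
                            "registry.example.com/shipper:0.1")
    print(json.dumps(pod, indent=2))
```

Containers that need to scale independently belong in separate pods that find each other through Services instead.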

Example
Problem:
• ZooKeeper
• Nodes have distinct identities, and the client protocol is designed to defy load balancing.
Solution:
• A replication controller per node, and call it a day.
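
The per-node pattern is easy to generate. This Python sketch emits, for each ZooKeeper server, a ReplicationController with `replicas: 1` plus a Service of the same name, so each server keeps a stable DNS name and identity while Kubernetes still restarts it if it dies. The environment variables `ZOO_MY_ID` and `ZOO_SERVERS` follow the convention of the official `zookeeper` Docker image as we understand it (verify against the image version you use); the namespace, image tag, and `zk-N` naming are assumptions.

```python
import json

CLIENT_PORT, PEER_PORT, ELECTION_PORT = 2181, 2888, 3888
PORTS = [("client", CLIENT_PORT), ("peer", PEER_PORT), ("election", ELECTION_PORT)]


def zookeeper_ensemble(size=3, namespace="zk", image="zookeeper:3.4"):
    """One Service plus one single-replica ReplicationController per ZooKeeper server."""
    hosts = [f"zk-{i}" for i in range(1, size + 1)]
    # Every server needs the full ensemble list; each Service name acts as a stable hostname.
    servers = " ".join(
        f"server.{i}={host}:{PEER_PORT}:{ELECTION_PORT}" for i, host in enumerate(hosts, start=1)
    )
    objects = []
    for i, host in enumerate(hosts, start=1):
        labels = {"app": "zookeeper", "zk-id": str(i)}
        objects.append({
            "apiVersion": "v1",
            "kind": "Service",
            "metadata": {"name": host, "namespace": namespace},
            "spec": {
                "selector": dict(labels),
                "ports": [{"name": n, "port": p} for n, p in PORTS],
            },
        })
        objects.append({
            "apiVersion": "v1",
            "kind": "ReplicationController",
            "metadata": {"name": host, "namespace": namespace},
            "spec": {
                "replicas": 1,  # exactly one pod per server identity
                "selector": dict(labels),
                "template": {
                    "metadata": {"labels": dict(labels)},
                    "spec": {
                        "containers": [{
                            "name": "zookeeper",
                            "image": image,
                            "env": [
                                {"name": "ZOO_MY_ID", "value": str(i)},
                                {"name": "ZOO_SERVERS", "value": servers},
                            ],
                            "ports": [{"name": n, "containerPort": p} for n, p in PORTS],
                        }],
                    },
                },
            },
        })
    return objects


if __name__ == "__main__":
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": zookeeper_ensemble()}, indent=2))
```

Clients can be handed the full list of `zk-N:2181` addresses, which sidesteps the load-balancing problem. Depending on the ZooKeeper version and network setup, a server may also need `quorumListenOnAllIPs=true`, because its own Service IP is not bound inside its pod. Later Kubernetes releases added StatefulSets, which provide stable per-pod identities directly.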

Some Familiar Problems
Once you can stamp out workloads, you get down to familiar problems:
• Tenant-attributed logging (workload and user) and metrics.
• “Billing” and metering.
• Visibility and other flavors of operability.
• Security against purposeful or accidental attackers.
• Workload isolation, e.g., for PII.
Fixing these problems frequently requires surgery, and none of these problems is unique to containerization or cluster scheduling of workloads, i.e., you have to solve them anyway.
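
Tenant attribution is far easier if the provisioning layer stamps a tenant label on everything it creates; metering then reduces to grouping usage by that label. A minimal Python sketch, assuming a hypothetical usage-record shape (pod labels, CPU cores, and seconds) exported by whatever metrics pipeline you run; the label key `tenant` is also an assumption.

```python
from collections import defaultdict

TENANT_LABEL = "tenant"  # hypothetical label key stamped on every pod at provisioning time


def cpu_core_seconds_by_tenant(records):
    """Sum CPU core-seconds per tenant; pods without the label land in 'unattributed'."""
    totals = defaultdict(float)
    for rec in records:
        tenant = rec["labels"].get(TENANT_LABEL, "unattributed")
        totals[tenant] += rec["cpu_cores"] * rec["seconds"]
    return dict(totals)


if __name__ == "__main__":
    usage = [
        {"labels": {"tenant": "acme", "workload": "etl"}, "cpu_cores": 0.5, "seconds": 3600},
        {"labels": {"tenant": "acme", "workload": "notebook"}, "cpu_cores": 2, "seconds": 60},
        {"labels": {}, "cpu_cores": 1, "seconds": 10},
    ]
    print(cpu_core_seconds_by_tenant(usage))  # {'acme': 1920.0, 'unattributed': 10.0}
```

Keeping an explicit "unattributed" bucket makes gaps in labeling visible instead of silently dropping that usage from the bill.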

Wrap Up
• Building a data processing platform on Kubernetes has some obvious starting points and some familiar challenges.
• More data scientists and middleware makers are starting with containers as a packaging scheme.
