SlideShare a Scribd company logo
Serverless Compute Platforms
on Kubernetes:
Beyond Web Applications
Alex Glikson
Senior Research Architect, Cloud Platforms
Carnegie Mellon University, Pittsburgh, USA
(IBM Research, Israel)
KubeCon, May 2019
with Ping-Min Lin (Pinterest), Shengjie Luo (VMware), Ke Chang (Facebook), Shichao Nie (Alibaba)
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Outline
● Introduction
○ Serverless
■ Serverless Compute
● FaaS
● Non-FaaS
● Our Use-Cases
○ Interactive Computing
■ Demo
○ Deep Learning
● Conclusions
2
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Serverless
● Many definitions. In a nutshell:
● Avoid management of servers, as a representative example of tasks that:
○ Keep you distracted from developing your *core* business capabilities, and
○ Can be outsourced to someone you trust, for whom this would be *their* core business
● Serverless = Distraction-Free
● Separation of concerns
● Developer experience??
3
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Serverless = Distraction-Free (Examples)
● Object Storage:
○ Core: data organization
○ Distraction: servers, storage, network, high availability, fault tolerance, replication, consistency
● Micro-services:
○ Core: services logic, interfaces
○ Distraction: infra, scaling, LB, HA/FT, API management, routing, service discovery, databases
● Async/Event-driven:
○ Core: event-processing logic
○ Distraction: eventing, messaging, queuing, notifications, etc (+infra/scaling/LB/HA/FT/auth/etc)
● …
4
Example:
Amazon S3
Example:
Kubernetes+Istio+…
Example:
Lambda, SNS, etc
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Serverless Compute Platform (SCP)
● Platform that executes user-provided code (BYOC)
● Often optimized for specific application patterns
○ Often associated fine-grained elasticity, scaling to zero, etc
● Distraction-free
○ Simplified management
■ Deployment, scaling, metering, monitoring, logging, updates, etc
○ Seamless integration with services that the ‘compute’ interacts with (or depends on)
■ Event sources, data, communication middleware, etc.
5
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
SCP: Function as a Service (FaaS)
6
Platform
Property
General-Purpose FaaS
Examples Lambda, Azure functions, Google Functions;
Kubeless, OpenFaaS, OpenWhisk
Code
Application
Pattern
Management
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
SCP: Function as a Service (FaaS)
7
Platform
Property
General-Purpose FaaS
Examples Lambda, Azure functions, Google Functions;
Kubeless, OpenFaaS, OpenWhisk
Code Arbitrary functions
Application
Pattern
Management
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
SCP: Function as a Service (FaaS)
8
Platform
Property
General-Purpose FaaS
Examples Lambda, Azure functions, Google Functions;
Kubeless, OpenFaaS, OpenWhisk
Code Arbitrary functions
Application
Pattern
(Not too) short-lived, ephemeral functions, triggered by events or requests;
High load variability (including periods of idleness), (relatively) low sensitivity to latency
Management
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
SCP: Function as a Service (FaaS)
9
Platform
Property
General-Purpose FaaS
Examples Lambda, Azure functions, Google Functions;
Kubeless, OpenFaaS, OpenWhisk
Code Arbitrary functions
Application
Pattern
(Not too) short-lived, ephemeral functions, triggered by events or requests;
High load variability (including periods of idleness), (relatively) low sensitivity to latency
Management Fully managed runtime containers; functions & function invocations as first class citizens
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
SCP: Function as a Service (FaaS)
10
Platform
Property
General-Purpose FaaS
Examples Lambda, Azure functions, Google Functions;
Kubeless, OpenFaaS, OpenWhisk
Code Arbitrary functions
Application
Pattern
(Not too) short-lived, ephemeral functions, triggered by events or requests;
High load variability (including periods of idleness), (relatively) low sensitivity to latency
Management Fully managed runtime containers; functions & function invocations as first class citizens
Integration Seamless integration with multiple event sources
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
SCP: Specialized (Embedded) FaaS
11
Platform
Property
Programmable network edge FaaS
Examples PubNub Functions, Lambda@Edge
Code
Application
Pattern
Management
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
SCP: Specialized (Embedded) FaaS
12
Platform
Property
Programmable network edge FaaS
Examples PubNub Functions, Lambda@Edge
Code Arbitrary functions (programming languages often limited)
Application
Pattern
Management
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
SCP: Specialized (Embedded) FaaS
13
Platform
Property
Programmable network edge FaaS
Examples PubNub Functions, Lambda@Edge
Code Arbitrary functions (programming languages often limited)
Application
Pattern
High throughput, low latency packet processing
Management
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
SCP: Specialized (Embedded) FaaS
14
Platform
Property
Programmable network edge FaaS
Examples PubNub Functions, Lambda@Edge
Code Arbitrary functions (programming languages often limited)
Application
Pattern
High throughput, low latency packet processing
Management Fully managed isolated runtime
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
SCP: Specialized (Embedded) FaaS
15
Platform
Property
Programmable network edge FaaS
Examples PubNub Functions, Lambda@Edge
Code Arbitrary functions (programming languages often limited)
Application
Pattern
High throughput, low latency packet processing
Management Fully managed isolated runtime
Integration The hosting platform
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Other (Non-FaaS?) SCPs: Serverless ETL
16
Platform
Property
Serverless ETL
Examples
Code
Application
Pattern
Management
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Other (Non-FaaS?) SCPs: Serverless ETL
17
Platform
Property
Serverless ETL
Examples AWS Glue
Code
Application
Pattern
Management
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Other (Non-FaaS?) SCPs: Serverless ETL
18
Platform
Property
Serverless ETL
Examples AWS Glue
Code PySpark, PyShell jobs
Application
Pattern
Management
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Other (Non-FaaS?) SCPs: Serverless ETL
19
Platform
Property
Serverless ETL
Examples AWS Glue
Code PySpark, PyShell jobs
Application
Pattern
Data-parallel Spark jobs (periodic or ad-hoc)
Non-parallel pre/post-processing jobs
Management
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Other (Non-FaaS?) SCPs: Serverless ETL
20
Platform
Property
Serverless ETL
Examples AWS Glue
Code PySpark, PyShell jobs
Application
Pattern
Data-parallel Spark jobs (periodic or ad-hoc)
Non-parallel pre/post-processing jobs
Management Fully managed Spark cluster; Python runtime
Integration Data catalogue
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Non-FaaS SCP: Cloud-Native Web Applications
21
Platform
Property
Cloud-Native Web Applications
Examples
Code
Application
Pattern
Management
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Non-FaaS SCP: Cloud-Native Web Applications
22
Platform
Property
Cloud-Native Web Applications
Examples Knative
Code
Application
Pattern
Management
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Non-FaaS SCP: Cloud-Native Web Applications
23
Platform
Property
Cloud-Native Web Applications
Examples Knative
Code Arbitrary application serving HTTP requests
Application
Pattern
Management
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Non-FaaS SCP: Cloud-Native Web Applications
24
Platform
Property
Cloud-Native Web Applications
Examples Knative
Code Arbitrary application serving HTTP requests
Application
Pattern
Long-running, scale-out services; Linear resource demand per request
Often high-throughput, low-latency
Management
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Non-FaaS SCP: Cloud-Native Web Applications
25
Platform
Property
Cloud-Native Web Applications
Examples Knative
Code Arbitrary application serving HTTP requests
Application
Pattern
Long-running, scale-out services; Linear resource demand per request
Often high-throughput, low-latency
Management K8s features + code-to-deploy, revisions, canary deployment, etc
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Non-FaaS SCP: Cloud-Native Web Applications
26
Platform
Property
Cloud-Native Web Applications
Examples Knative
Code Arbitrary application serving HTTP requests
Application
Pattern
Long-running, scale-out services; Linear resource demand per request
Often high-throughput, low-latency
Management K8s features + code-to-deploy, revisions, canary deployment, etc
Integration Service mash, build, eventing
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
What Other Application Patterns Could Justify a Specialized SCP?
27
Platform
Property
?
Examples ?
Code ?
Application
Pattern
?
Management ?
Integration ?
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Outline
● Introduction
○ Serverless
■ Serverless Compute
● FaaS
● Non-FaaS
● Our Use-Cases
○ Interactive Computing
○ Deep Learning
● Conclusions
28
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Interactive Computing
● Example: Data Science using Jupyter Notebook
● Architecture 1: Python + Spark
○ Scale-out Spark jobs
○ Requires Spark programming model
● Architecture 2: “pure” Python
○ Local execution, using non-parallel
Python libraries
○ Not designed for scale-out,
but can take advantage of scale-up
● Other example: Linux Shell
29
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
SCP for Interactive Computing
30
Property
Interactive Computing (Jupyter, Shell)
Code
Application
Pattern
Management
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
SCP for Interactive Computing
31
Property
Interactive Computing (Jupyter, Shell)
Code Python, Bash
Application
Pattern
Management
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
SCP for Interactive Computing
32
Property
Interactive Computing (Jupyter, Shell)
Code Python, Bash
Application
Pattern
Iterative invocation of stateful, non-parallel, computation-intensive,
ad-hoc tasks, triggered by explicit user interaction
Management
Integration
Efficient persistence of state across invocations
Scale-up rather than scale-out Easily re-programmable (code as payload)
Scale to zero when idle
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
SCP for Interactive Computing
33
Property
Interactive Computing (Jupyter, Shell)
Code Python, Bash
Application
Pattern
Iterative invocation of stateful, non-parallel, computation-intensive,
ad-hoc tasks, triggered by explicit user interaction
Management Provisioning, management, scaling of underlying resources
Integration
Efficient persistence of state across invocations
Scale-up rather than scale-out
Scale to zero when idle
Easily re-programmable (code as payload)
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
SCP for Interactive Computing
34
Property
Interactive Computing (Jupyter, Shell)
Code Python, Bash
Application
Pattern
Iterative invocation of stateful, non-parallel, computation-intensive,
ad-hoc tasks, triggered by explicit user interaction
Management Provisioning, management, scaling of underlying resources
Integration Data sources, auth, etc
Efficient persistence of state across invocations
Scale-up rather than scale-out
Scale to zero when idle
Easily re-programmable (code as payload)
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Runbox: Elastic Persistent Execution Environment on K8s
https://github.com/slsvm/runbox
35
Notebook Filesystem Data Volume
Pod/RS
Container
Dev Machine
Runbox
Runbox
Controller*
Kubernetes Cluster
UI
(e.g.,
Jupyter,
Bash)
Runbox
Proxy
Create
Exec
Recycle
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
DEMO – Bash
● https://github.com/slsvm/runbox
36
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
37
Runbox environment:
Pod, Image, Volume,
(+deployment, side-car)
Remote command execution
Filesystem synchronization
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
38
Filesystem synchronization
Persistent over recycling
of idle resource (e.g., by
Runbox controller)
Runbox environment:
Pod, Image, Volume,
(+deployment, side-car)
Remote command execution
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
39
Filesystem synchronization
Persistent over recycling
of idle resource (e.g., by
Runbox controller)
Per-command vertical scaling
Runbox environment:
Pod, Image, Volume,
(+deployment, side-car)
Remote command execution
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
40
Filesystem synchronization
Persistent over recycling
of idle resource (e.g., by
Runbox controller)
Per-command vertical scaling
Runbox environment:
Pod, Image, Volume,
(+deployment, side-car)
Remote command execution
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
DEMO – Jupyter
● https://github.com/slsvm/runbox-jupyter (COMING SOON)
41
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
42
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
46
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
47
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Architecture - Jupyter
48
Jupyter-Browser
Jupyter Server
Runbox
Extension
Notebook Filesystem Data Volume
Pod/RS
Container
Dev Machine
Runbox
Runbox
Controller*
sync
cold
save
GC
1 start kernel
4 resize
3 sync
2 create
6 resize
up
5 run cell
7 exec
9 exec
11
12
10 save
8 restore
Kubernetes Cluster
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Design Details
● Special Jupyter Kernels, delegating execution to a K8s Pod using `kubectl exec`
○ E.g., scp-python, scp-bash
● State is persisted in a K8s volume attached to the Pod
○ Snapshot/restore in-memory state using `dill` in Python and `set/source` in Bash
○ Also, state is synchronized from/to the local machine via a side-car running unison
● Pod is scaled down (optionally, to zero) when nothing is executed
○ E.g., by scaling the containing ReplicaSet, or using in-place Pod vertical scaling (WIP)
○ Tradeoff between capacity for ‘warm’ containers and latency managed by dedicated controller
● When image changes (e.g., after `apt install`), a new image is committed
○ Using tags for versioning; docker-squash to remove redundant layers
● Magics to control the non-functional properties
○ E.g., resource allocation, whether or not image snapshot is needed, etc
49
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Lessons Learned
● Kubernetes originally focused on scale-out workloads, but can also support
scale-up
○ New kind of controller?
● Generic support for application-assisted snapshots could be useful
● For use-cases involving ephemeral compute, API for direct access to volumes
could be useful
50
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Outline
● Introduction
○ Serverless
■ Serverless Compute
● FaaS
● Non-FaaS
● Our Use-Cases
○ Interactive Computing
○ Deep Learning
● Conclusions
51
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Deep Learning
● Resource-intensive
○ (1) model training, (2) inference
● Frameworks: Tensorflow, Keras, PyTorch, etc.
● ‘Hot’ research area – new algorithms, frameworks, etc
● Example application: Image Classification
○ Given a model + unlabeled example(s), predict label(s)
○ Compute-intensive, scale-out, can leverage GPUs
52
transportation medicine smart cities, security consumer games e-commerce
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
SCP for Deep Learning Inference
53
Property
Deep Learning Inference
Code
Application
Pattern
Management
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
SCP for Deep Learning Inference
54
Property
Deep Learning Inference
Code Model inference implementation (Python)
Application
Pattern
Management
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
SCP for Deep Learning Inference
55
Property
Deep Learning Inference
Code Model inference implementation (Python)
Application
Pattern
Long-running, scale-out services; Linear resource demand per request; Load variance
Can benefit from running on GPUs; potentially large “cold-start” latencies
Management
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
SCP for Deep Learning Inference
56
Property
Deep Learning Inference
Code Model inference implementation (Python)
Application
Pattern
Long-running, scale-out services; Linear resource demand per request; Load variance
Can benefit from running on GPUs; potentially large “cold-start” latencies
Management
Same as Knative: build, serving, eventing
Load balancing between GPU and CPU resources; Minimal ‘cold-start’ latency
Integration
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
SCP for Deep Learning Inference
57
Property
Deep Learning Inference
Code Model inference implementation (Python)
Application
Pattern
Long-running, scale-out services; Linear resource demand per request; Load variance
Can benefit from running on GPUs; potentially large “cold-start” latencies
Management
Same as Knative: build, serving, eventing
Load balancing between GPU and CPU resources; Minimal ‘cold-start’ latency
Integration K8s, Istio, model storage, etc
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Our Architecture
58
Pod
scaling
GPU Nodes
Pod Pod
scaling
Knative
Service 2
PodPodPodPod
Knative
Service 1
Pod
scaling
CPU Nodes
Pod Pod
scaling
Knative
Service 4
PodPodPodPod
Knative
Service 3
Pod
Standby
Pool
GPU-aware
Load Balancer
LB
GPU
Scheduler
Pool
Manager
User
Hybrid Service
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Design Details
● Build: Automatically add HTTP interface
○ Augment the provided inference logic with a Django ‘wrapper’, then use Knative build to deploy it
● Load-balancing across GPU-enabled and CPU-only nodes
○ Patch Knative to support GPU resources
○ Based on model properties, indicate in the Knative service template whether a GPU is preferable
○ Two-level scheduling: 1 GPU service and 1 CPU service for each app; fair time-sharing of GPUs
● Maintain a pool of ‘warm’ Pods
○ “Pool” is a ReplicaSet with ‘warm’ (running) Pods
■ Size is adjusted dynamically by the Pool Controller (cluster utilization, estimated demand)
○ Knative scaling logic consumes a warm Pod from the Pool instead of provisioning a new one
■ Pod “migration” is implemented by label manipulation + update of the Istio side-car via API
59
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Lessons Learned
● Standardized HTTP wrappers can be used to deliver FaaS-like experience
○ Can leverage existing open source FaaS solutions (e.g., OpenWhisk)
● More fine-grained management of GPU resources would be beneficial
○ The overhead of 2-level scheduling is substantial
● For reuse of ‘warm’ Pods, stronger notion of ‘similarity’ between Pods is needed
○ E.g., same model version?
● Even pool of size 1 significantly reduces the chances of cold starts
○ Instead of pools, can we reuse priority classes and make Knative scaling logic adjust priorities?
60
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Outline
● Introduction
○ Serverless
■ Serverless Compute
● FaaS
● Non-FaaS
● Our Use-Cases
○ Deep Learning
○ Interactive Computing
● Conclusions
61
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Conclusions
● “Serverless” = BYOC + distraction-free
● “Serverless” derives different requirements for different workloads
● No one-size-fits-all!
● Lots of opportunities to deliver ‘serverless’ experience for new workloads!
○ Knative can be enhanced to achieve “serverless” goals for DL inference (KFserving?)
○ SCP for Interactive Computing requires new capabilities on top of Kubernetes
■ https://github.com/slsvm/runbox
62
KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
Questions? Ideas? Suggestions? Collaboration?
● alex dot glikson at gmail dot com
63

More Related Content

Similar to Serverless Compute Platforms on Kubernetes

RedisConf18 - Redis in Dev, Test, and Prod with the OpenShift Service Catalog
RedisConf18 - Redis in Dev, Test, and Prod with the OpenShift Service CatalogRedisConf18 - Redis in Dev, Test, and Prod with the OpenShift Service Catalog
RedisConf18 - Redis in Dev, Test, and Prod with the OpenShift Service Catalog
Redis Labs
 

Similar to Serverless Compute Platforms on Kubernetes (20)

Building Cloud-Native Applications with Kubernetes, Helm and Kubeless
Building Cloud-Native Applications with Kubernetes, Helm and KubelessBuilding Cloud-Native Applications with Kubernetes, Helm and Kubeless
Building Cloud-Native Applications with Kubernetes, Helm and Kubeless
 
[Capitole du Libre] #serverless -  mettez-le en oeuvre dans votre entreprise...
[Capitole du Libre] #serverless -  mettez-le en oeuvre dans votre entreprise...[Capitole du Libre] #serverless -  mettez-le en oeuvre dans votre entreprise...
[Capitole du Libre] #serverless -  mettez-le en oeuvre dans votre entreprise...
 
Cloud Architecture - Multi Cloud, Edge, On-Premise
Cloud Architecture - Multi Cloud, Edge, On-PremiseCloud Architecture - Multi Cloud, Edge, On-Premise
Cloud Architecture - Multi Cloud, Edge, On-Premise
 
stackconf 2020 | The path to a Serverless-native era with Kubernetes by Paolo...
stackconf 2020 | The path to a Serverless-native era with Kubernetes by Paolo...stackconf 2020 | The path to a Serverless-native era with Kubernetes by Paolo...
stackconf 2020 | The path to a Serverless-native era with Kubernetes by Paolo...
 
Kubernetes workshop -_the_basics
Kubernetes workshop -_the_basicsKubernetes workshop -_the_basics
Kubernetes workshop -_the_basics
 
apidays LIVE Hong Kong 2021 - Event-driven APIs & Schema governance for Apach...
apidays LIVE Hong Kong 2021 - Event-driven APIs & Schema governance for Apach...apidays LIVE Hong Kong 2021 - Event-driven APIs & Schema governance for Apach...
apidays LIVE Hong Kong 2021 - Event-driven APIs & Schema governance for Apach...
 
Cloud computing: highlights
Cloud computing: highlightsCloud computing: highlights
Cloud computing: highlights
 
AWS Cloud Solutions Architects & Tech Enthusiasts
AWS Cloud Solutions Architects & Tech EnthusiastsAWS Cloud Solutions Architects & Tech Enthusiasts
AWS Cloud Solutions Architects & Tech Enthusiasts
 
Cloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQ
Cloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQCloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQ
Cloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQ
 
RedisConf18 - Redis in Dev, Test, and Prod with the OpenShift Service Catalog
RedisConf18 - Redis in Dev, Test, and Prod with the OpenShift Service CatalogRedisConf18 - Redis in Dev, Test, and Prod with the OpenShift Service Catalog
RedisConf18 - Redis in Dev, Test, and Prod with the OpenShift Service Catalog
 
Building Serverless Microservices Using Serverless Framework on the Cloud
Building Serverless Microservices Using Serverless Framework on the CloudBuilding Serverless Microservices Using Serverless Framework on the Cloud
Building Serverless Microservices Using Serverless Framework on the Cloud
 
Confluent Operator as Cloud-Native Kafka Operator for Kubernetes
Confluent Operator as Cloud-Native Kafka Operator for KubernetesConfluent Operator as Cloud-Native Kafka Operator for Kubernetes
Confluent Operator as Cloud-Native Kafka Operator for Kubernetes
 
Streaming Movies brings you Streamlined Applications -- How Adopting Netflix ...
Streaming Movies brings you Streamlined Applications -- How Adopting Netflix ...Streaming Movies brings you Streamlined Applications -- How Adopting Netflix ...
Streaming Movies brings you Streamlined Applications -- How Adopting Netflix ...
 
Building Cross-Cloud Platform Cognitive Microservices Using Serverless Archit...
Building Cross-Cloud Platform Cognitive Microservices Using Serverless Archit...Building Cross-Cloud Platform Cognitive Microservices Using Serverless Archit...
Building Cross-Cloud Platform Cognitive Microservices Using Serverless Archit...
 
An introduction to Serverless
An introduction to ServerlessAn introduction to Serverless
An introduction to Serverless
 
CNCF Introduction - Feb 2018
CNCF Introduction - Feb 2018CNCF Introduction - Feb 2018
CNCF Introduction - Feb 2018
 
MongoDB World 2018: Partner Talk - Red Hat: Deploying to Enterprise Kubernetes
MongoDB World 2018: Partner Talk - Red Hat: Deploying to Enterprise KubernetesMongoDB World 2018: Partner Talk - Red Hat: Deploying to Enterprise Kubernetes
MongoDB World 2018: Partner Talk - Red Hat: Deploying to Enterprise Kubernetes
 
Migrate a on-prem platform to the public cloud with Java - SpringBoot and PCF
Migrate a on-prem platform to the public cloud with Java - SpringBoot and PCFMigrate a on-prem platform to the public cloud with Java - SpringBoot and PCF
Migrate a on-prem platform to the public cloud with Java - SpringBoot and PCF
 
Kubernetes - Cloud Native Application Orchestration - Catalin Jora
Kubernetes - Cloud Native Application Orchestration - Catalin JoraKubernetes - Cloud Native Application Orchestration - Catalin Jora
Kubernetes - Cloud Native Application Orchestration - Catalin Jora
 
Software Engineering in the (AWS) Cloud
Software Engineering in the (AWS) CloudSoftware Engineering in the (AWS) Cloud
Software Engineering in the (AWS) Cloud
 

More from Alex Glikson

More from Alex Glikson (8)

AWS Re:Invented
AWS Re:InventedAWS Re:Invented
AWS Re:Invented
 
From chroot to Docker to Kubernetes
From chroot to Docker to KubernetesFrom chroot to Docker to Kubernetes
From chroot to Docker to Kubernetes
 
Cloud-Native Application and Kubernetes
Cloud-Native Application and KubernetesCloud-Native Application and Kubernetes
Cloud-Native Application and Kubernetes
 
Mixing bare-metal and virtualized workloads on OpenStack - 2014
Mixing bare-metal and virtualized workloads on OpenStack - 2014Mixing bare-metal and virtualized workloads on OpenStack - 2014
Mixing bare-metal and virtualized workloads on OpenStack - 2014
 
Serverless, IoT and OpenWhisk
Serverless, IoT and OpenWhiskServerless, IoT and OpenWhisk
Serverless, IoT and OpenWhisk
 
Container-Based Platforms and Kubernetes
Container-Based Platforms and KubernetesContainer-Based Platforms and Kubernetes
Container-Based Platforms and Kubernetes
 
Going Serverless with OpenWhisk
Going Serverless with OpenWhiskGoing Serverless with OpenWhisk
Going Serverless with OpenWhisk
 
The Serverless Paradigm, OpenWhisk and FIWARE
The Serverless Paradigm, OpenWhisk and FIWAREThe Serverless Paradigm, OpenWhisk and FIWARE
The Serverless Paradigm, OpenWhisk and FIWARE
 

Recently uploaded

Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 

Recently uploaded (20)

Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 

Serverless Compute Platforms on Kubernetes

  • 1. Serverless Compute Platforms on Kubernetes: Beyond Web Applications Alex Glikson Senior Research Architect, Cloud Platforms Carnegie Mellon University, Pittsburgh, USA (IBM Research, Israel) KubeCon, May 2019 with Ping-Min Lin (Pinterest), Shengjie Luo (VMware), Ke Chang (Facebook), Shichao Nie (Alibaba)
  • 2. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Outline ● Introduction ○ Serverless ■ Serverless Compute ● FaaS ● Non-FaaS ● Our Use-Cases ○ Interactive Computing ■ Demo ○ Deep Learning ● Conclusions 2
  • 3. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Serverless ● Many definitions. In a nutshell: ● Avoid management of servers, as a representative example of tasks that: ○ Keep you distracted from developing your *core* business capabilities, and ○ Can be outsourced to someone you trust, for whom this would be *their* core business ● Serverless = Distraction-Free ● Separation of concerns ● Developer experience?? 3
  • 4. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Serverless = Distraction-Free (Examples) ● Object Storage: ○ Core: data organization ○ Distraction: servers, storage, network, high availability, fault tolerance, replication, consistency ● Micro-services: ○ Core: services logic, interfaces ○ Distraction: infra, scaling, LB, HA/FT, API management, routing, service discovery, databases ● Async/Event-driven: ○ Core: event-processing logic ○ Distraction: eventing, messaging, queuing, notifications, etc (+infra/scaling/LB/HA/FT/auth/etc) ● … 4 Example: Amazon S3 Example: Kubernetes+Istio+… Example: Lambda, SNS, etc
  • 5. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Serverless Compute Platform (SCP) ● Platform that executes user-provided code (BYOC) ● Often optimized for specific application patterns ○ Often associated fine-grained elasticity, scaling to zero, etc ● Distraction-free ○ Simplified management ■ Deployment, scaling, metering, monitoring, logging, updates, etc ○ Seamless integration with services that the ‘compute’ interacts with (or depends on) ■ Event sources, data, communication middleware, etc. 5
  • 6. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 SCP: Function as a Service (FaaS) 6 Platform Property General-Purpose FaaS Examples Lambda, Azure functions, Google Functions; Kubeless, OpenFaaS, OpenWhisk Code Application Pattern Management Integration
  • 7. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 SCP: Function as a Service (FaaS) 7 Platform Property General-Purpose FaaS Examples Lambda, Azure functions, Google Functions; Kubeless, OpenFaaS, OpenWhisk Code Arbitrary functions Application Pattern Management Integration
  • 8. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 SCP: Function as a Service (FaaS) 8 Platform Property General-Purpose FaaS Examples Lambda, Azure functions, Google Functions; Kubeless, OpenFaaS, OpenWhisk Code Arbitrary functions Application Pattern (Not too) short-lived, ephemeral functions, triggered by events or requests; High load variability (including periods of idleness), (relatively) low sensitivity to latency Management Integration
  • 9. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 SCP: Function as a Service (FaaS) 9 Platform Property General-Purpose FaaS Examples Lambda, Azure functions, Google Functions; Kubeless, OpenFaaS, OpenWhisk Code Arbitrary functions Application Pattern (Not too) short-lived, ephemeral functions, triggered by events or requests; High load variability (including periods of idleness), (relatively) low sensitivity to latency Management Fully managed runtime containers; functions & function invocations as first class citizens Integration
  • 10. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 SCP: Function as a Service (FaaS) 10 Platform Property General-Purpose FaaS Examples Lambda, Azure functions, Google Functions; Kubeless, OpenFaaS, OpenWhisk Code Arbitrary functions Application Pattern (Not too) short-lived, ephemeral functions, triggered by events or requests; High load variability (including periods of idleness), (relatively) low sensitivity to latency Management Fully managed runtime containers; functions & function invocations as first class citizens Integration Seamless integration with multiple event sources
  • 11. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 SCP: Specialized (Embedded) FaaS 11 Platform Property Programmable network edge FaaS Examples PubNub Functions, Lambda@Edge Code Application Pattern Management Integration
  • 12. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 SCP: Specialized (Embedded) FaaS 12 Platform Property Programmable network edge FaaS Examples PubNub Functions, Lambda@Edge Code Arbitrary functions (programming languages often limited) Application Pattern Management Integration
  • 13. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 SCP: Specialized (Embedded) FaaS 13 Platform Property Programmable network edge FaaS Examples PubNub Functions, Lambda@Edge Code Arbitrary functions (programming languages often limited) Application Pattern High throughput, low latency packet processing Management Integration
  • 14. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 SCP: Specialized (Embedded) FaaS 14 Platform Property Programmable network edge FaaS Examples PubNub Functions, Lambda@Edge Code Arbitrary functions (programming languages often limited) Application Pattern High throughput, low latency packet processing Management Fully managed isolated runtime Integration
  • 15. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 SCP: Specialized (Embedded) FaaS 15 Platform Property Programmable network edge FaaS Examples PubNub Functions, Lambda@Edge Code Arbitrary functions (programming languages often limited) Application Pattern High throughput, low latency packet processing Management Fully managed isolated runtime Integration The hosting platform
  • 16. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Other (Non-FaaS?) SCPs: Serverless ETL 16 Platform Property Serverless ETL Examples Code Application Pattern Management Integration
  • 17. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Other (Non-FaaS?) SCPs: Serverless ETL 17 Platform Property Serverless ETL Examples AWS Glue Code Application Pattern Management Integration
  • 18. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Other (Non-FaaS?) SCPs: Serverless ETL 18 Platform Property Serverless ETL Examples AWS Glue Code PySpark, PyShell jobs Application Pattern Management Integration
  • 19. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Other (Non-FaaS?) SCPs: Serverless ETL 19 Platform Property Serverless ETL Examples AWS Glue Code PySpark, PyShell jobs Application Pattern Data-parallel Spark jobs (periodic or ad-hoc) Non-parallel pre/post-processing jobs Management Integration
  • 20. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Other (Non-FaaS?) SCPs: Serverless ETL 20 Platform Property Serverless ETL Examples AWS Glue Code PySpark, PyShell jobs Application Pattern Data-parallel Spark jobs (periodic or ad-hoc) Non-parallel pre/post-processing jobs Management Fully managed Spark cluster; Python runtime Integration Data catalogue
  • 21. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Non-FaaS SCP: Cloud-Native Web Applications 21 Platform Property Cloud-Native Web Applications Examples Code Application Pattern Management Integration
  • 22. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Non-FaaS SCP: Cloud-Native Web Applications 22 Platform Property Cloud-Native Web Applications Examples Knative Code Application Pattern Management Integration
  • 23. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Non-FaaS SCP: Cloud-Native Web Applications 23 Platform Property Cloud-Native Web Applications Examples Knative Code Arbitrary application serving HTTP requests Application Pattern Management Integration
  • 24. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Non-FaaS SCP: Cloud-Native Web Applications 24 Platform Property Cloud-Native Web Applications Examples Knative Code Arbitrary application serving HTTP requests Application Pattern Long-running, scale-out services; Linear resource demand per request Often high-throughput, low-latency Management Integration
  • 25. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Non-FaaS SCP: Cloud-Native Web Applications 25 Platform Property Cloud-Native Web Applications Examples Knative Code Arbitrary application serving HTTP requests Application Pattern Long-running, scale-out services; Linear resource demand per request Often high-throughput, low-latency Management K8s features + code-to-deploy, revisions, canary deployment, etc Integration
  • 26. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Non-FaaS SCP: Cloud-Native Web Applications 26 Platform Property Cloud-Native Web Applications Examples Knative Code Arbitrary application serving HTTP requests Application Pattern Long-running, scale-out services; Linear resource demand per request Often high-throughput, low-latency Management K8s features + code-to-deploy, revisions, canary deployment, etc Integration Service mash, build, eventing
  • 27. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 What Other Application Patterns Could Justify a Specialized SCP? 27 Platform Property ? Examples ? Code ? Application Pattern ? Management ? Integration ?
  • 28. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Outline ● Introduction ○ Serverless ■ Serverless Compute ● FaaS ● Non-FaaS ● Our Use-Cases ○ Interactive Computing ○ Deep Learning ● Conclusions 28
  • 29. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Interactive Computing ● Example: Data Science using Jupyter Notebook ● Architecture 1: Python + Spark ○ Scale-out Spark jobs ○ Requires Spark programming model ● Architecture 2: “pure” Python ○ Local execution, using non-parallel Python libraries ○ Not designed for scale-out, but can take advantage of scale-up ● Other example: Linux Shell 29
  • 30. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 SCP for Interactive Computing 30 Property Interactive Computing (Jupyter, Shell) Code Application Pattern Management Integration
  • 31. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 SCP for Interactive Computing 31 Property Interactive Computing (Jupyter, Shell) Code Python, Bash Application Pattern Management Integration
  • 32. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 SCP for Interactive Computing 32 Property Interactive Computing (Jupyter, Shell) Code Python, Bash Application Pattern Iterative invocation of stateful, non-parallel, computation-intensive, ad-hoc tasks, triggered by explicit user interaction Management Integration Efficient persistence of state across invocations Scale-up rather than scale-out Easily re-programmable (code as payload) Scale to zero when idle
  • 33. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 SCP for Interactive Computing 33 Property Interactive Computing (Jupyter, Shell) Code Python, Bash Application Pattern Iterative invocation of stateful, non-parallel, computation-intensive, ad-hoc tasks, triggered by explicit user interaction Management Provisioning, management, scaling of underlying resources Integration Efficient persistence of state across invocations Scale-up rather than scale-out Scale to zero when idle Easily re-programmable (code as payload)
  • 34. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 SCP for Interactive Computing 34 Property Interactive Computing (Jupyter, Shell) Code Python, Bash Application Pattern Iterative invocation of stateful, non-parallel, computation-intensive, ad-hoc tasks, triggered by explicit user interaction Management Provisioning, management, scaling of underlying resources Integration Data sources, auth, etc Efficient persistence of state across invocations Scale-up rather than scale-out Scale to zero when idle Easily re-programmable (code as payload)
  • 35. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Runbox: Elastic Persistent Execution Environment on K8s https://github.com/slsvm/runbox 35 Notebook Filesystem Data Volume Pod/RS Container Dev Machine Runbox Runbox Controller* Kubernetes Cluster UI (e.g., Jupyter, Bash) Runbox Proxy Create Exec Recycle
  • 36. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 DEMO – Bash ● https://github.com/slsvm/runbox 36
  • 37. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 37 Runbox environment: Pod, Image, Volume, (+deployment, side-car) Remote command execution Filesystem synchronization
  • 38. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 38 Filesystem synchronization Persistent over recycling of idle resource (e.g., by Runbox controller) Runbox environment: Pod, Image, Volume, (+deployment, side-car) Remote command execution
  • 39. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 39 Filesystem synchronization Persistent over recycling of idle resource (e.g., by Runbox controller) Per-command vertical scaling Runbox environment: Pod, Image, Volume, (+deployment, side-car) Remote command execution
  • 40. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 40 Filesystem synchronization Persistent over recycling of idle resource (e.g., by Runbox controller) Per-command vertical scaling Runbox environment: Pod, Image, Volume, (+deployment, side-car) Remote command execution
  • 41. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 DEMO – Jupyter ● https://github.com/slsvm/runbox-jupyter (COMING SOON) 41
  • 42. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 42
  • 43. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
  • 44. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
  • 45. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019
  • 46. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 46
  • 47. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 47
  • 48. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Architecture - Jupyter 48 Jupyter-Browser Jupyter Server Runbox Extension Notebook Filesystem Data Volume Pod/RS Container Dev Machine Runbox Runbox Controller* sync cold save GC 1 start kernel 4 resize 3 sync 2 create 6 resize up 5 run cell 7 exec 9 exec 11 12 10 save 8 restore Kubernetes Cluster
  • 49. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Design Details ● Special Jupyter Kernels, delegating execution to a K8s Pod using `kubectl exec` ○ E.g., scp-python, scp-bash ● State is persisted in a K8s volume attached to the Pod ○ Snapshot/restore in-memory state using `dill` in Python and `set/source` in Bash ○ Also, state is synchronized from/to the local machine via a side-car running unison ● Pod is scaled down (optionally, to zero) when nothing is executed ○ E.g., by scaling the containing ReplicaSet, or using in-place Pod vertical scaling (WIP) ○ Tradeoff between capacity for ‘warm’ containers and latency managed by dedicated controller ● When image changes (e.g., after `apt install`), a new image is committed ○ Using tags for versioning; docker-squash to remove redundant layers ● Magics to control the non-functional properties ○ E.g., resource allocation, whether or not image snapshot is needed, etc 49
  • 50. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Lessons Learned ● Kubernetes originally focused on scale-out workloads, but can also support scale-up ○ New kind of controller? ● Generic support for application-assisted snapshots could be useful ● For use-cases involving ephemeral compute, API for direct access to volumes could be useful 50
  • 51. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Outline ● Introduction ○ Serverless ■ Serverless Compute ● FaaS ● Non-FaaS ● Our Use-Cases ○ Interactive Computing ○ Deep Learning ● Conclusions 51
  • 52. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Deep Learning ● Resource-intensive ○ (1) model training, (2) inference ● Frameworks: Tensorflow, Keras, PyTorch, etc. ● ‘Hot’ research area – new algorithms, frameworks, etc ● Example application: Image Classification ○ Given a model + unlabeled example(s), predict label(s) ○ Compute-intensive, scale-out, can leverage GPUs 52 transportation medicine smart cities, security consumer games e-commerce
  • 53. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 SCP for Deep Learning Inference 53 Property Deep Learning Inference Code Application Pattern Management Integration
  • 54. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 SCP for Deep Learning Inference 54 Property Deep Learning Inference Code Model inference implementation (Python) Application Pattern Management Integration
  • 55. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 SCP for Deep Learning Inference 55 Property Deep Learning Inference Code Model inference implementation (Python) Application Pattern Long-running, scale-out services; Linear resource demand per request; Load variance Can benefit from running on GPUs; potentially large “cold-start” latencies Management Integration
  • 56. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 SCP for Deep Learning Inference 56 Property Deep Learning Inference Code Model inference implementation (Python) Application Pattern Long-running, scale-out services; Linear resource demand per request; Load variance Can benefit from running on GPUs; potentially large “cold-start” latencies Management Same as Knative: build, serving, eventing Load balancing between GPU and CPU resources; Minimal ‘cold-start’ latency Integration
  • 57. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 SCP for Deep Learning Inference 57 Property Deep Learning Inference Code Model inference implementation (Python) Application Pattern Long-running, scale-out services; Linear resource demand per request; Load variance Can benefit from running on GPUs; potentially large “cold-start” latencies Management Same as Knative: build, serving, eventing Load balancing between GPU and CPU resources; Minimal ‘cold-start’ latency Integration K8s, Istio, model storage, etc
  • 58. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Our Architecture 58 Pod scaling GPU Nodes Pod Pod scaling Knative Service 2 PodPodPodPod Knative Service 1 Pod scaling CPU Nodes Pod Pod scaling Knative Service 4 PodPodPodPod Knative Service 3 Pod Standby Pool GPU-aware Load Balancer LB GPU Scheduler Pool Manager User Hybrid Service
  • 59. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Design Details ● Build: Automatically add HTTP interface ○ Augment the provided inference logic with a Django ‘wrapper’, then use Knative build to deploy it ● Load-balancing across GPU-enabled and CPU-only nodes ○ Patch Knative to support GPU resources ○ Based on model properties, indicate in the Knative service template whether a GPU is preferable ○ Two-level scheduling: 1 GPU service and 1 CPU service for each app; fair time-sharing of GPUs ● Maintain a pool of ‘warm’ Pods ○ “Pool” is a ReplicaSet with ‘warm’ (running) Pods ■ Size is adjusted dynamically by the Pool Controller (cluster utilization, estimated demand) ○ Knative scaling logic consumes a warm Pod from the Pool instead of provisioning a new one ■ Pod “migration” is implemented by label manipulation + update of the Istio side-car via API 59
  • 60. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Lessons Learned ● Standardized HTTP wrappers can be used to deliver FaaS-like experience ○ Can leverage existing open source FaaS solutions (e.g., OpenWhisk) ● More fine-grained management of GPU resources would be beneficial ○ The overhead of 2-level scheduling is substantial ● For reuse of ‘warm’ Pods, stronger notion of ‘similarity’ between Pods is needed ○ E.g., same model version? ● Even pool of size 1 significantly reduces the chances of cold starts ○ Instead of pools, can we reuse priority classes and make Knative scaling logic adjust priorities? 60
  • 61. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Outline ● Introduction ○ Serverless ■ Serverless Compute ● FaaS ● Non-FaaS ● Our Use-Cases ○ Deep Learning ○ Interactive Computing ● Conclusions 61
  • 62. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Conclusions ● “Serverless” = BYOC + distraction-free ● “Serverless” derives different requirements for different workloads ● No one-size-fits-all! ● Lots of opportunities to deliver ‘serverless’ experience for new workloads! ○ Knative can be enhanced to achieve “serverless” goals for DL inference (KFserving?) ○ SCP for Interactive Computing requires new capabilities on top of Kubernetes ■ https://github.com/slsvm/runbox 62
  • 63. KubeCon / CloudNativeCon, Barcelona, May 20-23, 2019 Questions? Ideas? Suggestions? Collaboration? ● alex dot glikson at gmail dot com 63