A look behind the scenes
and how Knative changes
the serverless landscape
Jeremias Werner | Senior Software Developer | IBM
Who I am...
2Jeremias Werner @JereWerner
Jeremias Werner
Senior Software Developer
IBM Research & Development
jerewern@de.ibm.com
LinkedIn: jeremias-werner
Twitter: JereWerner
Opinions are my own!
Who knows
what serverless
is?
3Jeremias Werner @JereWerner
Who has used
serverless
technology?
4Jeremias Werner @JereWerner
Who knows
Knative?
5Jeremias Werner @JereWerner
Content
What is Serverless?
What is Serverless good for?
What is Knative?
How Knative works behind-the-scenes?
How Knative changes the Serverless Landscape?
• Positioning
• Value Proposition
6Jeremias Werner @JereWerner
An evolution of compute!
Break-up the monolith
– Inherent Scaling
– Better resource
utilization
– Reduced costs
Abstraction of
infrastructure
– Devs focus on code
not infrastructure
– Faster time to
market = $
7
Increasing focus on business logic
Decreasing concern (and control) over stack implementation
Figure: the compute spectrum from Bare Metal and Virtual Machines over Containers and Apps/PaaS to Functions and Serverless
Jeremias Werner @JereWerner
Value Proposition
No management and
operation of
infrastructures
Focus on developing
value-adding code and
on driving innovations
Transparently scales
with the number of
requests being served
Only pay for resources
being used, instead of
resources idling
around
8
$
Jeremias Werner @JereWerner
This is the price of 1 GB allocated for 1 s on
IBM Cloud Functions
$0.000017
9Jeremias Werner @JereWerner
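As a back-of-the-envelope illustration of that rate (the function size, duration and invocation count are made-up example numbers, not from the deck): a 128 MB function running 200 ms per invocation, one million times, costs roughly
$10^{6} \times 0.125\,\text{GB} \times 0.2\,\text{s} = 25{,}000\ \text{GB·s}$
$25{,}000\ \text{GB·s} \times \$0.000017/\text{GB·s} \approx \$0.43$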
Traditional Model
Worry about when and how to scale
Worry about resiliency & cost
Charged even when idling / not 100% utilized
Continuous polling due to missing event
programming model
Jeremias Werner @JereWerner
Serverless Model
Scales inherently: one process per request
No cost overhead for resiliency
Introduces event programming model
Charges only for what is used
Only worry about code
higher dev velocity, lower operational costs
Jeremias Werner @JereWerner
Demo - A FaaS experience!
Jeremias Werner @JereWerner 12
Container cold start is in the ballpark of...
~200ms
13Jeremias Werner @JereWerner
Content
What is Serverless?
What is Serverless good for?
What is Knative?
How Knative works behind-the-scenes?
How Knative changes the Serverless Landscape?
• Use Cases
• Customer Stories
• Demo
14Jeremias Werner @JereWerner
Some customers
Many more in-prod
customer projects
across numerous
industries
incl automotive,
banking, insurance,
entertainment, retail,
manufacturing, etc.
Jeremias Werner @JereWerner
The list of customers has been removed!
Serverless is more than FaaS
While FaaS is the key anchor point for serverless,
there is a growing set of services from other
domains also delivering serverless attributes
This enables customers to build application
topologies which are entirely serverless
Build your serverless architecture!
Jeremias Werner @JereWerner
Common Use Cases
Serverless API
Backends /
Microservices
Mobile backends
Conversational
applications
Scheduled tasks
Massively parallel
compute / “Map”
operations
Parallel data
processing
Data-at-rest
processing & ETL
Pipelines
Data processing
enriched with cognitive
capabilities
Event Stream
Processing
IoT
17Jeremias Werner @JereWerner
Serverless API backend
Maps API endpoints
to functions
Diagram: Client → API Gateway → FaaS → DBaaS
Jeremias Werner @JereWerner
Mobile Backend
Remember the value
proposition!
If there is no request,
nothing to run and no
charges occur!
FaaS
Jeremias Werner @JereWerner
Data Processing
Ideally suited for working with structured data, text,
audio, image and video data:
• Data enrichment, transformation, validation,
cleansing
• PDF processing
• Audio normalization
• Image rotation, sharpening, noise reduction
• Thumbnail generation
• Image OCR’ing
• Video transcoding
20
Diagram: FaaS → DBaaS, with example image tags (Elephant, Animal, Sign)
Jeremias Werner @JereWerner
Gain for Customer
10x
faster
21Jeremias Werner @JereWerner
Gain for Customer
-90%
cost
22Jeremias Werner @JereWerner
IBM & ESPN bring AI to Fantasy Football
23
https://www.youtube.com/watch?v=uDeP5b3iKfU
Jeremias Werner @JereWerner
ESPN Fantasy Football
Functions are used to
compute the content of
user dashboards
24Jeremias Werner @JereWerner
ESPN Fantasy Football has
10+ million daily
active users
25Jeremias Werner @JereWerner
Weather radar video processing
and thumbnail generation
Periodic trigger to scan
weather data from
object storage
Functions to generate
thumbnails and
animated gifs
26
(https://www.wunderground.com/maps)
Jeremias Werner @JereWerner
Demo
Map and Reduce the values of a column in 288 CSV
files with a total of 360 million rows stored on cloud
object storage to generate a histogram
27Jeremias Werner @JereWerner
pywren.io + knative.dev
PyWren Rocks!
28
Number of forecasts | Local run     | FaaS
100,000             | ~10,000 secs  | ~140 secs
Jeremias Werner @JereWerner
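That corresponds to roughly a 70x speed-up over the local run:
$\dfrac{\sim 10{,}000\,\text{s}}{\sim 140\,\text{s}} \approx 71$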
Content
What is Serverless?
What is Serverless good for?
What is Knative?
How Knative works behind-the-scenes?
How Knative changes the Serverless Landscape?
• Knative
• Knative Serving in Detail
• Demo
29Jeremias Werner @JereWerner
„Open source building blocks for
Serverless on Kubernetes“
30
https://knative.dev/
Jeremias Werner @JereWerner
What is Knative?
Run serverless containers, apps and functions on
Kubernetes with ease
Knative takes care of the details of
• networking,
• autoscaling (+ from-zero, to-zero)
• revision tracking
You just have to focus on your core logic
Simplified UX on top of Kubernetes
31
https://knative.dev/docs/
Jeremias Werner @JereWerner
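As a minimal sketch of that simplified UX (the structure follows the Knative Serving API shown later in this deck; the image is just the hello-world example used there):

apiVersion: serving.knative.dev/v1beta1
kind: Service
metadata:
  name: helloworld
spec:
  template:
    spec:
      containers:
      - image: jerewern/helloworld   # any stateless, HTTP-serving container image

Everything else (Route, Configuration, Revisions, autoscaling) is derived from this single resource.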
Main Components
Serving is the runtime
component that hosts
and scales your
application as K8s
pods
Tekton (Build) provides
Kubernetes building blocks to
run pipelines that create
images from source, now part of
the Continuous Delivery Foundation
Eventing contains
tools for managing
events between
loosely coupled
services
Client owns the kn CLI to
manage knative resources
32
https://tekton.dev
Jeremias Werner @JereWerner
Today, we focus on
Serving!
33Jeremias Werner @JereWerner
Serving
Service – Is the top level resource that controls the
deployment and life-cycle of the workload
Route – Is the externally visible endpoint of the
service and routes the traffic to individual
revisions
Configuration – Describes the current desired state
of the deployment
Revision – Is created for each modification of the
configuration and reflects a point-in-time
configuration of the deployment
34
https://knative.dev/docs/serving/
Jeremias Werner @JereWerner
Serving, and how it works
1. Deploy app as pod/revision
2. Networking auto-setup
3. Revisions are scaled up/down based on load
4. Updates of the Service create new Revisions
5. Traffic splitting based on %
6. Dedicated URLs to Revisions
35Jeremias Werner @JereWerner
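Points 5 and 6 could look like this in the Service spec (a sketch: the revision names, percentages and the tag are made up for illustration):

apiVersion: serving.knative.dev/v1beta1
kind: Service
metadata:
  name: helloworld
spec:
  template:
    spec:
      containers:
      - image: jerewern/helloworld
  traffic:
  - revisionName: helloworld-00001   # hypothetical older revision
    percent: 90
  - revisionName: helloworld-00002   # hypothetical newer revision
    percent: 10
    tag: canary                      # exposes a dedicated URL for this revision

Knative then routes 90% of the requests to the first revision and 10% to the second, and the tagged revision additionally gets its own URL.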
Demo
36
Knative Community
37
Releases: v0.10
Individual contributors: ~450
Stars: ~2.4k
Pull requests: >5k
Working groups: 9
Steering Committee seats: 7 (4x Google, 1x IBM, 1x Red Hat, 1x Pivotal)
Jeremias Werner @JereWerner
Content
What is Serverless?
What is Serverless good for?
What is Knative?
How Knative works behind-the-scenes?
How Knative changes the Serverless Landscape?
• How Knative works
• Revisit the customer requirements
• Understand scaling behaviour
• Capacity considerations
38Jeremias Werner @JereWerner
The API Specification
containers: Kubernetes container spec but only
1 container is allowed
containerConcurrency: The maximum number
of concurrent requests being handled by a single
container instance. Basis for scaling.
resources: The requested and limited memory
and cpu resources of the container
apiVersion: serving.knative.dev/v1beta1
kind: Service
metadata:
  name: helloworld
spec:
  template:
    spec:
      containerConcurrency: 10
      containers:
      - image: jerewern/helloworld
        resources:
          limits:
            memory: 256Mi
            cpu: 2000m
          requests:
            memory: 128Mi
            cpu: 100m
https://github.com/knative/docs/blob/master/docs/serving/spec/knative-api-specification-1.0.md
Jeremias Werner @JereWerner
The Request Flow (simplified)
1. Istio Ingress Gateway is configured by the
Route and terminates the request
2. Ingress Gateway forwards the requests to the
Activator
3. Activator buffers requests when scaled to zero
or when in burst mode
4. Queue Proxy terminates the request in the
service pod and forwards the request to the
user container
5. Autoscaler is scraping metrics from the
Activator and Queue Proxies and scales the
Deployment
(assuming Istio sidecar injection is disabled)
Jeremias Werner @JereWerner
Think big! A customer requirement
~ 1s response
guarantee
41Jeremias Werner @JereWerner
Interactive workload
Cold start
latency
42Jeremias Werner @JereWerner
The Problem Statement
43
The Pod startup depends on a couple of factors,
like:
– Size of the container image which might need
to be pulled
– Creation and startup of user container, queue
proxy and (optionally) istio sidecar
– Process startup and waiting for readiness
– Network namespace setup for the Pod
– Network setup in k8s and making the Pod
available in the deployment and service
– Load on the worker node machine
Diagram: Pod creation, image pull, container startup, cluster network setup
Jeremias Werner @JereWerner
It's in the ballpark of...
(do you remember the FaaS experience from above?)
~3-5s
overhead
44Jeremias Werner @JereWerner
Possible Knative
Improvements
45
Discussions in the Knative community
– Improve load-balancing in activator
– Do not wait for readiness probe
– Do not wait for Pod being reachable behind
ClusterIP and address the Pod directly
– Get rid of the queue-proxy side-car container
– Write a custom kubelet or use virtual kubelet
– Pre-warming images for specific runtimes, e.g.
nodejs, … and only inject code!
https://docs.google.com/document/d/1Jdd8eu3cJRv
CVkl8Y48Fg3fVY6Bpp7Tv8HshSg58dOg/edit#
Jeremias Werner @JereWerner
What the user can do...
As a user
– Use light-weight frameworks for your
application container and ensure fast startup
times, like Quarkus
– Small container image and reduce #layers
(GraalVM)
– Find the right container concurrency > 1
– Configure min replicas to avoid scale to zero
46Jeremias Werner @JereWerner
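The last two points above could look like this in the Service template (a sketch; the autoscaling annotation names follow the Knative autoscaling documentation, and the values are only examples):

apiVersion: serving.knative.dev/v1beta1
kind: Service
metadata:
  name: helloworld
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"   # keep at least one pod, i.e. no scale to zero
    spec:
      containerConcurrency: 10                  # tune how many concurrent requests one pod handles
      containers:
      - image: jerewern/helloworld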
Think big! A customer requirement
1 petabyte
image data
47Jeremias Werner @JereWerner
Massive Parallel Workload
Burst!
48Jeremias Werner @JereWerner
Revisit the Pywren Scenario
Execution with 100 requests in parallel
took 128s for ~360 million records and
7.7GB
49
Chart: actual vs. desired pods over time
Jeremias Werner @JereWerner
Panic and Stable Mode
50
Panic Mode = True
6s panic window
60s sliding stable window
Goal: average of 70% of
container concurrency in sliding
window
Panic when observed
concurrency > 200% of
desired concurrency
Check metrics of observed
concurrency every 2s (tick interval)
Jeremias Werner @JereWerner
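Putting the numbers from this slide together, the stable-mode calculation is roughly (a simplified sketch, not the exact autoscaler code):
$\text{desired pods} \approx \left\lceil \dfrac{\text{observed concurrency (60 s average)}}{0.7 \times \text{containerConcurrency}} \right\rceil$
In panic mode the same calculation runs over the 6 s window, and the autoscaler panics when the observed concurrency exceeds roughly 200% of what the currently running pods are targeted to handle.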
The „Panic Mode“
51
Activator as buffer!
52
”Proxy Mode” – if …
a) … the spare capacity, i.e. the number of
requests the pods could handle, is smaller
than the target burst capacity
b) … scaled to zero
“Serve Mode” – otherwise
Proxy vs Serve
https://docs.google.com/document/d/1Jdd8eu3cJRv
CVkl8Y48Fg3fVY6Bpp7Tv8HshSg58dOg/edit#
Spare capacity <
threshold
Jeremias Werner @JereWerner
Think big! A customer requirement
1 million
req/min peak
53Jeremias Werner @JereWerner
Throughput matters!
High volume
& low latency
54Jeremias Werner @JereWerner
Testing autoscaler stability
and precision
Test:
– 1000 req/s
– 100ms duration of
each request
– container
concurrency is 40
– 5 minutes test run
Produce a constant rate of N req/s
Higher container concurrency
Actual vs Expected:
• number of scale-up and scale-down
• number of pods scaled
• success and error rates
• latency
55
Expected:
$\dfrac{1000\,\text{req/s} \times 0.1\,\text{s}}{40\,\text{req} \times 70\%} \approx 3.5\ \text{Pods}$
100 ms latency
100% success
Jeremias Werner @JereWerner
It works as expected!
56
Panic, scale-out
to 100 Pods
Stabilize,
4 Pods
Initial wave, buffered
110 ≈ 40 req × 0.7 × 4 Pods
$\dfrac{1000\,\text{req/s} \times 0.1\,\text{s}}{40\,\text{req} \times 70\%} \approx 3.5\ \text{Pods}$
Activator in proxy
mode
Jeremias Werner @JereWerner
Scaling matters...
57
1. Understand component on the critical path
• The Activator is becoming more and more part of the
critical path, e.g. new in the 0.9 release
2. Identify bottlenecks
• Consider network bandwidth
3. Scale horizontally and vertically
• Istio is CPU hungry and requires 0.5 vCPU per 1k
req/s
Jeremias Werner @JereWerner
We could easily scale up to...
140k
req/s
58Jeremias Werner @JereWerner
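Applying the rule of thumb from the previous slide to that peak rate (an illustrative calculation, not a measured number):
$\dfrac{140{,}000\ \text{req/s}}{1{,}000\ \text{req/s}} \times 0.5\ \text{vCPU} = 70\ \text{vCPU just for Istio}$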
Think big! A customer requirement
7 TB of
memory
59Jeremias Werner @JereWerner
Capacity considerations (very simplified)
Capacity!
60Jeremias Werner @JereWerner
61
Let's talk a bit about resources
and placement
Jeremias Werner @JereWerner
Remember
resources: The requested and limited memory
and cpu resources of the container
apiVersion: serving.knative.dev/v1beta1
kind: Service
metadata:
  name: helloworld
spec:
  template:
    spec:
      containerConcurrency: 10
      timeoutSeconds: 600
      containers:
      - image: jerewern/helloworld
        resources:
          limits:
            memory: 256Mi
            cpu: 2000m
          requests:
            memory: 128Mi
            cpu: 100m
https://github.com/knative/docs/blob/master/docs/serving/spec/knative-api-specification-1.0.md
Jeremias Werner @JereWerner
Placement
63
Placement is done based on the resource requests
values for Memory and CPU
If a pod cannot be placed, the node autoscaler
kicks in and provisions additional nodes
https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/
32/32 GB = 32 GB requests / 32 GB limit
Jeremias Werner @JereWerner
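A very rough illustration of what a requirement like the earlier "7 TB of memory" means for placement (the 32 GB worker node size comes from the figure; everything else is simplified):
$\dfrac{7\,\text{TB}}{32\ \text{GB per worker node}} \approx 220\ \text{worker nodes, before any system overhead}$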
Resource Limits and Quota
64
Containers can have resource limits assigned
If the container reaches the limit
• Memory limit → kill
• CPU limit → throttle
Note: Resource limits count for the resource quota
given to a k8s namespace
32/32 GB = 32 GB requests / 32 GB limit
https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/
Jeremias Werner @JereWerner
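A sketch of how such a per-namespace budget can be expressed with a standard Kubernetes ResourceQuota (the namespace name and values are made up for illustration):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: serving-quota
  namespace: my-serverless-namespace
spec:
  hard:
    requests.cpu: "16"        # sum of all CPU requests in the namespace
    requests.memory: 16Gi     # sum of all memory requests in the namespace
    limits.cpu: "64"          # sum of all CPU limits in the namespace
    limits.memory: 32Gi       # sum of all memory limits in the namespace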
Think big! A customer requirement
20 minutes
model training
65Jeremias Werner @JereWerner
This is what you get if a request takes 10min
504 Gateway
Timeout
66Jeremias Werner @JereWerner
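One reason is the per-revision request timeout: the "Remember" example above sets timeoutSeconds: 600, i.e. 10 minutes, so a request running that long is cut off. A sketch (the field is part of the Serving spec shown earlier; the service name is hypothetical, and how far the timeout can be raised depends on the installation's configured maximum):

apiVersion: serving.knative.dev/v1beta1
kind: Service
metadata:
  name: model-training            # hypothetical service name
spec:
  template:
    spec:
      timeoutSeconds: 600         # requests exceeding this are answered with 504 Gateway Timeout
      containers:
      - image: jerewern/helloworld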
It's sort of a...
Gap!
67
We would need something
asynchronous...
68
1. Client should NOT need to wait for the request
to finish
2. Client should be able to submit non-blocking
requests that run asynchronously
3. Client should be able to query the state of the
request in order to check if the request was
successful or failed
4. System should NOT scale down the pods while
the request is running asynchronously
5. System should count an asynchronous request
toward the container concurrency
https://docs.google.com/document/d/11Fryfns-
KQL6JXfNG9TyMsh3MDVF_gFGT2sl9bOpc_o/edit
Jeremias Werner @JereWerner
This problem is...
Not solved
yet!
69Jeremias Werner @JereWerner
Content
What is Serverless?
What is Serverless good for?
What is Knative?
How Knative works behind-the-scenes?
How Knative changes the serverless landscape?
70Jeremias Werner @JereWerner
Serverless is now more!
Run serverless
containers, apps and
functions on
Kubernetes with ease
That’s why it changes
the serverless
landscape!
71
Increasing focus on business logic
Decreasing concern (and control) over stack implementation
Figure: the same compute spectrum as before (Bare Metal, Virtual Machines, Containers, Apps/PaaS, Functions), with Serverless now spanning more than just Functions
Jeremias Werner @JereWerner
Traditional 12-factor app
With Knative you can run your traditional application
and container in a serverless fashion. Easy lift-and-
shift.
Knative scales your app and container from-zero
and to-zero
Scale by number of requests instead of
CPU/Memory
Jeremias Werner @JereWerner
Portability!
73
Knative runs where Kubernetes runs!
Knative brings serverless to Kubernetes
Operators and developers can leverage the same
infrastructure, tools and skills as for Kubernetes
+
Jeremias Werner @JereWerner
Strengths and Weaknesses of
Knative
Best suited for high-volume request-response
workload allowing much higher throughput than
traditional FaaS
Scaling based on requests and concurrency vs
memory/CPU
Allows higher memory limits than traditional
FaaS services (AWS Lambda, Azure Functions,...)
Designed to handle bursty workload, but slow
reaction due to async feedback loop
Short critical path allows very low latency for
“warm“ invocations, i.e. ~1ms.
Lags behind existing FaaS services on container
cold start time (seconds vs milliseconds)
Lack of long-running invocations
74Jeremias Werner @JereWerner
A lot of work to do...
Help
wanted!
75
https://github.com/knative
Jeremias Werner @JereWerner
Or you want to...
Try it
out?
76Jeremias Werner @JereWerner
It's available as a managed
service in IBM Cloud
77Jeremias Werner @JereWerner
Questions?
Thank you!
78
