A look behind the scenes
and how Knative changes
the serverless landscape
Jeremias Werner | Senior Software Developer | IBM
Who I am...
2Jeremias Werner @JereWerner
Jeremias Werner
Senior Software Developer
IBM Research & Development
jerewern@de.ibm.com
LinkedIn: jeremias-werner
Twitter: JereWerner
Opinions are my own!
Who knows
what serverless
is?
3Jeremias Werner @JereWerner
Who has used
serverless
technology?
4Jeremias Werner @JereWerner
Who knows
Knative?
5Jeremias Werner @JereWerner
Content
What is Serverless?
What is Serverless good for?
What is Knative?
How Knative works behind-the-scenes?
How Knative changes the Serverless Landscape?
• Positioning
• Value Proposition
6Jeremias Werner @JereWerner
An evolution of compute!
Break-up the monolith
– Inherent Scaling
– Better resource
utilization
– Reduced costs
Abstraction of
infrastructure
– Devs focus on code
not infrastructure
– Faster time to
market = $
7
Increasing focus on business logic
Decreasing concern (and control) over stack implementation
Figure: the compute spectrum from Bare Metal and Virtual Machines over Containers and Apps/PaaS to Functions and Serverless
Jeremias Werner @JereWerner
Value Proposition
No management and
operation of
infrastructures
Focus on developing
value-adding code and
on driving innovations
Transparently scales
with the number of
requests being served
Only pay for resources
being used, instead of
resources idling
around
8
$
Jeremias Werner @JereWerner
This is the price of 1 GB allocated for 1 s on
IBM Cloud Functions
$0.000017
9Jeremias Werner @JereWerner
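As a back-of-the-envelope illustration of that rate (the function size, duration and invocation count are made-up example numbers, not from the deck): a 128 MB function running 200 ms per invocation, one million times, costs roughly
$10^{6} \times 0.125\,\text{GB} \times 0.2\,\text{s} = 25{,}000\ \text{GB·s}$
$25{,}000\ \text{GB·s} \times \$0.000017/\text{GB·s} \approx \$0.43$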
Traditional Model
Worry about when and how to scale
Worry about resiliency & cost
Charged even when idling / not 100% utilized
Continuous polling due to missing event
programming model
Jeremias Werner @JereWerner
Serverless Model
Scales inherently: one process per request
No cost overhead for resiliency
Introduces event programming model
Charges only for what is used
Only worry about code
higher dev velocity, lower operational costs
Jeremias Werner @JereWerner
Demo - A FaaS experience!
Jeremias Werner @JereWerner 12
Container cold start is in the ballpark of...
~200ms
13Jeremias Werner @JereWerner
Content
What is Serverless?
What is Serverless good for?
What is Knative?
How Knative works behind-the-scenes?
How Knative changes the Serverless Landscape?
• Use Cases
• Customer Stories
• Demo
14Jeremias Werner @JereWerner
Some customers
Many more in-prod
customer projects
across numerous
industries
incl automotive,
banking, insurance,
entertainment, retail,
manufacturing, etc.
Jeremias Werner @JereWerner
The list of customers has been removed!
Serverless is more than FaaS
While FaaS is the key anchor point for serverless,
there is a growing set of services from other
domains also delivering serverless attributes
This enables customers to build application
topologies which are entirely serverless
Build your serverless architecture!
Jeremias Werner @JereWerner
Common Use Cases
Serverless API
Backends /
Microservices
Mobile backends
Conversational
applications
Scheduled tasks
Massively parallel
compute / “Map”
operations
Parallel data
processing
Data-at-rest
processing & ETL
Pipelines
Data processing
enriched with cognitive
capabilities
Event Stream
Processing
IoT
17Jeremias Werner @JereWerner
Serverless API backend
Maps API endpoints
to functions
Diagram: Client → API Gateway → FaaS → DBaaS
Jeremias Werner @JereWerner
Mobile Backend
Remember the value
proposition!
If there is no request,
nothing to run and no
charges occur!
FaaS
Jeremias Werner @JereWerner
Data Processing
Ideally suited for working with structured data, text,
audio, image and video data:
• Data enrichment, transformation, validation,
cleansing
• PDF processing
• Audio normalization
• Image rotation, sharpening, noise reduction
• Thumbnail generation
• Image OCR’ing
• Video transcoding
20
Diagram: FaaS → DBaaS, with example image tags (Elephant, Animal, Sign)
Jeremias Werner @JereWerner
Gain for Customer
10x
faster
21Jeremias Werner @JereWerner
Gain for Customer
-90%
cost
22Jeremias Werner @JereWerner
IBM & ESPN bring AI to Fantasy Football
23
https://www.youtube.com/watch?v=uDeP5b3iKfU
Jeremias Werner @JereWerner
ESPN Fantasy Football
Functions are used to
compute the content of
user dashboards
24Jeremias Werner @JereWerner
ESPN Fantasy Football has
10+ million daily
active users
25Jeremias Werner @JereWerner
Weather radar video processing
and thumbnail generation
Periodic trigger to scan
weather data from
object storage
Functions to generate
thumbnails and
animated gifs
26
(https://www.wunderground.com/maps)
Jeremias Werner @JereWerner
Demo
Map and Reduce the values of a column in 288 CSV
files with a total of 360 million rows stored on cloud
object storage to generate a histogram
27Jeremias Werner @JereWerner
pywren.io + knative.dev
PyWren Rocks!
28
Number of forecasts | Local run     | FaaS
100,000             | ~10,000 secs  | ~140 secs
Jeremias Werner @JereWerner
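That corresponds to roughly a 70x speed-up over the local run:
$\dfrac{\sim 10{,}000\,\text{s}}{\sim 140\,\text{s}} \approx 71$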
Content
What is Serverless?
What is Serverless good for?
What is Knative?
How Knative works behind-the-scenes?
How Knative changes the Serverless Landscape?
• Knative
• Knative Serving in Detail
• Demo
29Jeremias Werner @JereWerner
„Open source building blocks for
Serverless on Kubernetes“
30
https://knative.dev/
Jeremias Werner @JereWerner
What is Knative?
Run serverless containers, apps and functions on
Kubernetes with ease
Knative takes care of the details of
• networking,
• autoscaling (+ from-zero, to-zero)
• revision tracking
You just have to focus on your core logic
Simplified UX on top of Kubernetes
31
https://knative.dev/docs/
Jeremias Werner @JereWerner
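As a minimal sketch of that simplified UX (the structure follows the Knative Serving API shown later in this deck; the image is just the hello-world example used there):

apiVersion: serving.knative.dev/v1beta1
kind: Service
metadata:
  name: helloworld
spec:
  template:
    spec:
      containers:
      - image: jerewern/helloworld   # any stateless, HTTP-serving container image

Everything else (Route, Configuration, Revisions, autoscaling) is derived from this single resource.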
Main Components
Serving is the runtime
component that hosts
and scales your
application as K8s
pods
Tekton (Build) provides
Kubernetes building blocks to
run pipelines that create
images from source, now part of
the Continuous Delivery Foundation
Eventing contains
tools for managing
events between
loosely coupled
services
Client owns the kn CLI to
manage knative resources
32
https://tekton.dev
Jeremias Werner @JereWerner
Today, we focus on
Serving!
33Jeremias Werner @JereWerner
Serving
Service – Is the top level resource that controls the
deployment and life-cycle of the workload
Route – Is the externally visible endpoint of the
service and routes the traffic to individual
revisions
Configuration – Describes the current desired state
of the deployment
Revision – Is created for each modification of the
configuration and reflects a point-in-time
configuration of the deployment
34
https://knative.dev/docs/serving/
Jeremias Werner @JereWerner
Serving, and how it works
1. Deploy app as pod/revision
2. Networking auto-setup
3. Revisions are scaled up/down based on load
4. Updates of the Service create new Revisions
5. Traffic splitting based on %
6. Dedicated URLs to Revisions
35Jeremias Werner @JereWerner
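Points 5 and 6 could look like this in the Service spec (a sketch: the revision names, percentages and the tag are made up for illustration):

apiVersion: serving.knative.dev/v1beta1
kind: Service
metadata:
  name: helloworld
spec:
  template:
    spec:
      containers:
      - image: jerewern/helloworld
  traffic:
  - revisionName: helloworld-00001   # hypothetical older revision
    percent: 90
  - revisionName: helloworld-00002   # hypothetical newer revision
    percent: 10
    tag: canary                      # exposes a dedicated URL for this revision

Knative then routes 90% of the requests to the first revision and 10% to the second, and the tagged revision additionally gets its own URL.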
Demo
36
Knative Community
37
Releases: v0.10
Individual contributors: ~450
Stars: ~2.4k
Pull requests: >5k
Working groups: 9
Steering Committee seats: 7 (4x Google, 1x IBM, 1x Red Hat, 1x Pivotal)
Jeremias Werner @JereWerner
Content
What is Serverless?
What is Serverless good for?
What is Knative?
How Knative works behind-the-scenes?
How Knative changes the Serverless Landscape?
• How Knative works
• Revisit the customer requirements
• Understand scaling behaviour
• Capacity considerations
38Jeremias Werner @JereWerner
The API Specification
containers: Kubernetes container spec but only
1 container is allowed
containerConcurrency: The maximum number
of concurrent requests being handled by a single
container instance. Basis for scaling.
resources: The requested and limited memory
and cpu resources of the container
apiVersion: serving.knative.dev/v1beta1
kind: Service
metadata:
  name: helloworld
spec:
  template:
    spec:
      containerConcurrency: 10
      containers:
      - image: jerewern/helloworld
        resources:
          limits:
            memory: 256Mi
            cpu: 2000m
          requests:
            memory: 128Mi
            cpu: 100m
https://github.com/knative/docs/blob/master/docs/serving/spec/knative-api-specification-1.0.md
Jeremias Werner @JereWerner
The Request Flow (simplified)
1. Istio Ingress Gateway is configured by the
Route and terminates the request
2. Ingress Gateway forwards the requests to the
Activator
3. Activator buffers requests when scaled to zero
or when in burst mode
4. Queue Proxy terminates the request in the
service pod and forwards the request to the
user container
5. Autoscaler is scraping metrics from the
Activator and Queue Proxies and scales the
Deployment
(assuming Istio sidecar injection is disabled)
Jeremias Werner @JereWerner
Think big! A customer requirement
~ 1s response
guarantee
41Jeremias Werner @JereWerner
Interactive workload
Cold start
latency
42Jeremias Werner @JereWerner
The Problem Statement
43
The Pod startup depends on a couple of factors,
like:
– Size of the container image which might need
to be pulled
– Creation and startup of user container, queue
proxy and (optionally) istio sidecar
– Process startup and waiting for readiness
– Network namespace setup for the Pod
– Network setup in k8s and making the Pod
available in the deployment and service
– Load on the worker node machine
Diagram: Pod creation, image pull, container startup, cluster network setup
Jeremias Werner @JereWerner
It's in the ballpark of...
(do you remember the FaaS experience from above?)
~3-5s
overhead
44Jeremias Werner @JereWerner
Possible Knative
Improvements
45
Discussions in the Knative community
– Improve load-balancing in activator
– Do not wait for readiness probe
– Do not wait for Pod being reachable behind
ClusterIP and address the Pod directly
– Get rid of the queue-proxy side-car container
– Write a custom kubelet or use virtual kubelet
– Pre-warming images for specific runtimes, e.g.
nodejs, … and only inject code!
https://docs.google.com/document/d/1Jdd8eu3cJRv
CVkl8Y48Fg3fVY6Bpp7Tv8HshSg58dOg/edit#
Jeremias Werner @JereWerner
What the user can do...
As a user
– Use light-weight frameworks for your
application container and ensure fast startup
times, like Quarkus
– Small container image and reduce #layers
(GraalVM)
– Find the right container concurrency > 1
– Configure min replicas to avoid scale to zero
46Jeremias Werner @JereWerner
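The last two points above could look like this in the Service template (a sketch; the autoscaling annotation names follow the Knative autoscaling documentation, and the values are only examples):

apiVersion: serving.knative.dev/v1beta1
kind: Service
metadata:
  name: helloworld
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"   # keep at least one pod, i.e. no scale to zero
    spec:
      containerConcurrency: 10                  # tune how many concurrent requests one pod handles
      containers:
      - image: jerewern/helloworld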
Think big! A customer requirement
1 petabyte
image data
47Jeremias Werner @JereWerner
Massive Parallel Workload
Burst!
48Jeremias Werner @JereWerner
Revisit the Pywren Scenario
Execution with 100 requests in parallel
took 128s for ~360 million records and
7.7GB
49
Chart: actual vs. desired pods over time
Jeremias Werner @JereWerner
Panic and Stable Mode
50
Panic Mode = True
6s panic window
60s sliding stable window
Goal: average of 70% of
container concurrency in sliding
window
Panic when observed
concurrency > 200% of
desired concurrency
Check metrics of observed
concurrency every 2s (tick interval)
Jeremias Werner @JereWerner
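Putting the numbers from this slide together, the stable-mode calculation is roughly (a simplified sketch, not the exact autoscaler code):
$\text{desired pods} \approx \left\lceil \dfrac{\text{observed concurrency (60 s average)}}{0.7 \times \text{containerConcurrency}} \right\rceil$
In panic mode the same calculation runs over the 6 s window, and the autoscaler panics when the observed concurrency exceeds roughly 200% of what the currently running pods are targeted to handle.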
The „Panic Mode“
51
Activator as buffer!
52
”Proxy Mode” – if …
a) … the spare capacity, i.e. the number of
requests the pods could handle, is smaller
than the target burst capacity
b) … scaled to zero
“Serve Mode” – otherwise
Proxy vs Serve
https://docs.google.com/document/d/1Jdd8eu3cJRv
CVkl8Y48Fg3fVY6Bpp7Tv8HshSg58dOg/edit#
Spare capacity <
threshold
Jeremias Werner @JereWerner
Think big! A customer requirement
1 million
req/min peak
53Jeremias Werner @JereWerner
Throughput matters!
High volume
& low latency
54Jeremias Werner @JereWerner
Testing autoscaler stability
and precision
Test:
– 1000 req/s
– 100ms duration of
each request
– container
concurrency is 40
– 5 minutes test run
Produce a constant rate of N req/s
Higher container concurrency
Actual vs Expected:
• number of scale-up and scale-down
• number of pods scaled
• success and error rates
• latency
55
Expected:
$\dfrac{1000\,\text{req/s} \times 0.1\,\text{s}}{40\,\text{req} \times 70\%} \approx 3.5\ \text{Pods}$
100 ms latency
100% success
Jeremias Werner @JereWerner
It works as expected!
56
Panic, scale-out
to 100 Pods
Stabilize,
4 Pods
Initial wave, buffered
110 ≈ 40 req × 0.7 × 4 Pods
$\dfrac{1000\,\text{req/s} \times 0.1\,\text{s}}{40\,\text{req} \times 70\%} \approx 3.5\ \text{Pods}$
Activator in proxy
mode
Jeremias Werner @JereWerner
Scaling matters...
57
1. Understand component on the critical path
• The Activator is becoming more and more part of the
critical path, e.g. new in the 0.9 release
2. Identify bottlenecks
• Consider network bandwidth
3. Scale horizontally and vertically
• Istio is CPU hungry and requires 0.5 vCPU per 1k
req/s
Jeremias Werner @JereWerner
We could easily scale up to...
140k
req/s
58Jeremias Werner @JereWerner
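Applying the rule of thumb from the previous slide to that peak rate (an illustrative calculation, not a measured number):
$\dfrac{140{,}000\ \text{req/s}}{1{,}000\ \text{req/s}} \times 0.5\ \text{vCPU} = 70\ \text{vCPU just for Istio}$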
Think big! A customer requirement
7 TB of
memory
59Jeremias Werner @JereWerner
Capacity considerations (very simplified)
Capacity!
60Jeremias Werner @JereWerner
61
Let's talk a bit about resources
and placement
Jeremias Werner @JereWerner
Remember
resources: The requested and limited memory
and cpu resources of the container
apiVersion: serving.knative.dev/v1beta1
kind: Service
metadata:
  name: helloworld
spec:
  template:
    spec:
      containerConcurrency: 10
      timeoutSeconds: 600
      containers:
      - image: jerewern/helloworld
        resources:
          limits:
            memory: 256Mi
            cpu: 2000m
          requests:
            memory: 128Mi
            cpu: 100m
https://github.com/knative/docs/blob/master/docs/serving/spec/knative-api-specification-1.0.md
Jeremias Werner @JereWerner
Placement
63
Placement is done based on the resource requests
values for Memory and CPU
If a pod cannot be placed, the node autoscaler
kicks in and provisions additional nodes
https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/
32/32 GB = 32 GB requests / 32 GB limit
Jeremias Werner @JereWerner
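A very rough illustration of what a requirement like the earlier "7 TB of memory" means for placement (the 32 GB worker node size comes from the figure; everything else is simplified):
$\dfrac{7\,\text{TB}}{32\ \text{GB per worker node}} \approx 220\ \text{worker nodes, before any system overhead}$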
Resource Limits and Quota
64
Containers can have resource limits assigned
If the container reaches the limit
• Memory limit → kill
• CPU limit → throttle
Note: Resource limits count for the resource quota
given to a k8s namespace
32/32 GB = 32 GB requests / 32 GB limit
https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/
Jeremias Werner @JereWerner
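A sketch of how such a per-namespace budget can be expressed with a standard Kubernetes ResourceQuota (the namespace name and values are made up for illustration):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: serving-quota
  namespace: my-serverless-namespace
spec:
  hard:
    requests.cpu: "16"        # sum of all CPU requests in the namespace
    requests.memory: 16Gi     # sum of all memory requests in the namespace
    limits.cpu: "64"          # sum of all CPU limits in the namespace
    limits.memory: 32Gi       # sum of all memory limits in the namespace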
Think big! A customer requirement
20 minutes
model training
65Jeremias Werner @JereWerner
This is what you get if a request takes 10min
504 Gateway
Timeout
66Jeremias Werner @JereWerner
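One reason is the per-revision request timeout: the "Remember" example above sets timeoutSeconds: 600, i.e. 10 minutes, so a request running that long is cut off. A sketch (the field is part of the Serving spec shown earlier; the service name is hypothetical, and how far the timeout can be raised depends on the installation's configured maximum):

apiVersion: serving.knative.dev/v1beta1
kind: Service
metadata:
  name: model-training            # hypothetical service name
spec:
  template:
    spec:
      timeoutSeconds: 600         # requests exceeding this are answered with 504 Gateway Timeout
      containers:
      - image: jerewern/helloworld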
It's sort of a...
Gap!
67
We would need something
asynchronous...
68
1. Client should NOT need to wait for the request
to finish
2. Client should be able to submit non-blocking
requests that run asynchronously
3. Client should be able to query the state of the
request in order to check if the request was
successful or failed
4. System should NOT scale down the pods while
the request is running asynchronously
5. System should count an asynchronous request
toward the container concurrency
https://docs.google.com/document/d/11Fryfns-
KQL6JXfNG9TyMsh3MDVF_gFGT2sl9bOpc_o/edit
Jeremias Werner @JereWerner
This problem is...
Not solved
yet!
69Jeremias Werner @JereWerner
Content
What is Serverless?
What is Serverless good for?
What is Knative?
How Knative works behind-the-scenes?
How Knative changes the serverless landscape?
70Jeremias Werner @JereWerner
Serverless is now more!
Run serverless
containers, apps and
functions on
Kubernetes with ease
That’s why it changes
the serverless
landscape!
71
Increasing focus on business logic
Decreasing concern (and control) over stack implementation
Figure: the same compute spectrum as before (Bare Metal, Virtual Machines, Containers, Apps/PaaS, Functions), with Serverless now spanning more than just Functions
Jeremias Werner @JereWerner
Traditional 12-factor app
With Knative you can run your traditional application
and container in a serverless fashion. Easy lift-and-
shift.
Knative scales your app and container from-zero
and to-zero
Scale by number of requests instead of
CPU/Memory
Jeremias Werner @JereWerner
Portability!
73
Knative runs where Kubernetes runs!
Knative brings serverless to Kubernetes
Operators and developers can leverage the same
infrastructure, tools and skills as for Kubernetes
+
Jeremias Werner @JereWerner
Strengths and Weaknesses of
Knative
Best suited for high-volume request-response
workload allowing much higher throughput than
traditional FaaS
Scaling based on requests and concurrency vs
memory/CPU
Allows higher memory limits than traditional
FaaS services (AWS Lambda, Azure Functions,...)
Designed to handle bursty workload, but slow
reaction due to async feedback loop
Short critical path allows very low latency for
“warm“ invocations, i.e. ~1ms.
Lags behind existing FaaS services on container
cold start time (seconds vs milliseconds)
Lack of long-running invocations
74Jeremias Werner @JereWerner
A lot of work to do...
Help
wanted!
75
https://github.com/knative
Jeremias Werner @JereWerner
Or you want to...
Try it
out?
76Jeremias Werner @JereWerner
It's available as a managed
service in IBM Cloud
77Jeremias Werner @JereWerner
Questions?
Thank you!
78
