Service-Level Objective for
Serverless Applications
Hai Duc Nguyen, Aleksander Slominski, and Lionel Villard
Background
The business improvement loop:
- Business objectives: market growth, maximize user experience
- Solution deployments: functionality, latency, throughput
- Feedback: performance metrics, earnings, etc.
Encode business logic in terms of performance
indicators that can be measured and enforced by a proper
system design
- Service-Level Agreement (SLA): a performance-guarantee
agreement defined with the user and enforced
by the implementation
- Service-Level Objective (SLO): the key elements of an SLA;
the objectives the implementation must achieve to
satisfy the SLA
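As an illustration of what "measured and enforced" can mean in practice, here is a minimal sketch (ours, not from the talk) of checking a percentile-latency SLO against measured samples; `meets_latency_slo` and the nearest-rank percentile helper are hypothetical names.

```python
# Illustrative sketch: checking one SLO -- "99% of requests complete
# in under 2 seconds" -- against a batch of measured latencies.
def percentile(samples, p):
    """Return the p-th percentile (nearest-rank) of a list of numbers."""
    ordered = sorted(samples)
    rank = max(0, int(round(p / 100.0 * len(ordered))) - 1)
    return ordered[rank]

def meets_latency_slo(latencies_s, p=99, threshold_s=2.0):
    return percentile(latencies_s, p) < threshold_s

# 100 fast requests and 1 slow one: the 99th percentile is still < 2s
samples = [0.5] * 100 + [5.0]
print(meets_latency_slo(samples))  # True
```

The SLA is the user-facing agreement; checks like this one are how the SLO side is made measurable.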
2
SLO/SLA Example: Reefer Container Shipment
reference implementation*
*see https://ibm-cloud-architecture.github.io/refarch-kc/
• Business goal: customer
experience should not be
affected by delay
• Translate to SLA/SLO
- 99% of order placement < 2s
- No order placement lost
- Availability > 99.999%
- Capacity > 1M concurrent
orders
• Business logic changes over time
- Market growth
- Response to competitors’ improvements
3
Can we translate business goals into system
implementations quickly?
Serverless (industry perspective)
• According to Cloud Native Computing Foundation*: “Serverless computing refers to the concept of
building and running applications that do not require server management. It describes a finer-
grained deployment model where applications, bundled as one or more functions, are uploaded to
a platform and then executed, scaled, and billed in response to the exact demand needed at the
moment”
* https://www.cncf.io/blog/2018/02/14/cncf-takes-first-step-towards-serverless-computing/
[Figure: serverless resource allocation tracks the workload over time, so cost follows actual demand]
- Autoscaling: dynamically add/remove
resources according to the workload; scale to zero
- Truly pay-as-you-go: pay only for function
execution time, with fine-grained pricing (per
100ms)
- Detaches resource management from the
software development/maintenance process
- Speeds up software evolution
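The pay-as-you-go model above can be made concrete with a small sketch; the per-100ms rate below is illustrative, not any provider's actual price.

```python
import math

# Hypothetical per-100ms pay-as-you-go billing.
RATE_PER_100MS = 0.000002  # USD per 100 ms of execution (illustrative)

def invocation_cost(exec_ms):
    billed_units = math.ceil(exec_ms / 100)  # round up to the next 100 ms
    return billed_units * RATE_PER_100MS

# Scale-to-zero means idle time costs nothing: only invocations are billed.
print(invocation_cost(130))  # 4e-06 (billed as 200 ms)
print(invocation_cost(0))    # 0.0
```

Contrast with a provisioned server, which accrues cost for every second it is allocated, busy or idle.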
4
Serverless Implementation
Kubernetes is “an open-source system for automating
deployment, scaling, and management of containerized
applications”*
• It has become the de facto standard platform for container orchestration.
• Market adoption is strong: 80% of companies used Kubernetes in
some form in 2019, and that number is expected to increase***
Knative is “an open source community project which adds
components for deploying, running, and managing serverless,
cloud native applications to Kubernetes”**.
* https://kubernetes.io
** https://www.redhat.com/en/topics/microservices/what-is-knative
*** Kubernetes Ecosystem, https://thenewstack.io/ebooks/kubernetes/state-of-kubernetes-ecosystem-second-edition-2020/
5
Example: handling a new shipment order
[Pipeline: New order → Preprocess → Find container → Schedule shipment → Response]
[Diagram: each stage is a function (𝑓) running in a container on Knative over Kubernetes, connected through Kafka.
Labeled steps: 1. Submit, 2. Establish, 3. Subscribe, 4. Invoke, 5. Publish, 6. Trigger]
6
Why Serverless SLO?
• In serverless, it is an order of magnitude faster to translate between
business logic and system design
• Zero deployment/maintenance effort
• New way of utilizing resources
• Finer-grained resource management → truly pay-as-you-go
• High flexibility → more room for resource-management optimization
• Spatial: stateless
• Temporal: short execution times
→ A potential way to quickly define and enforce SLOs
7
Experiment setup for serverless sequence*
Workload (input):
- Periodic bursts
- Constant ramp-up: 4 orders/sec²
- Limited rate: up to 120 orders per sec
[Pipeline: New order → forward-1 (Preprocess, 100ms) → Kafka → forward-2 (Find container, 100ms) → Kafka → forward-3 (Schedule shipment, 100ms) → Response]
System specs:
- Knative v0.15.1 over Kubernetes v1.18.2 supported by KinD**
- Events are transmitted through Kafka v2.5.0
- 12 CPUs, 48 GB of memory, and a 256 GB SSD
*Script available at https://github.com/ngduchai/event-driven-apps
**KinD deploys Kubernetes clusters on a local machine, see https://kind.sigs.k8s.io
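The workload shape described above (ramp-up, rate limit, periodic bursts) can be sketched as a simple rate function; the burst period and magnitude below are made-up parameters, not the exact values in the experiment scripts.

```python
# Hypothetical sketch of the input workload: a constant ramp-up of
# 4 orders/sec^2, capped at 120 orders/sec, with periodic bursts on top.
def request_rate(t_sec, ramp=4.0, cap=120.0, burst_every=60, burst_rate=40.0):
    base = min(ramp * t_sec, cap)  # linear ramp-up, then the rate limit
    burst = burst_rate if int(t_sec) % burst_every == 0 else 0.0
    return base + burst

print(request_rate(10))  # 40.0: still ramping (4 * 10), no burst at t=10
print(request_rate(45))  # 120.0: ramp has hit the 120 orders/sec cap
```

Bursts and ramps are exactly the conditions that expose autoscaling lag, which the next slides measure.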
8
Invocation Overhead of forward-1
• Scaling delay: the serverless framework needs time to detect workload changes and
schedule resources accordingly
• Cold start: It takes time to allocate and initialize new function sandboxes
[Figures: demand vs. system capacity; latency distribution]
9
Autoscaler behaviors over the Sequence
[Figures: scale-up delay propagation; latency delay propagation]
- The serverless platform is not aware of the sequence topology
- The overhead is amplified as computation propagates along the topology
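A toy model (ours, not the paper's) of why the overhead is amplified: each stage's autoscaler reacts only after the previous stage starts emitting traffic, so per-stage delays accumulate along the sequence rather than overlapping.

```python
# Toy model of delay propagation along a serverless sequence.
def end_to_end_startup_delay(per_stage_scaling_s, per_stage_cold_start_s):
    # Stage i's autoscaler only sees load once stage i-1 is serving,
    # so scaling delays and cold starts add up end to end.
    return sum(per_stage_scaling_s) + sum(per_stage_cold_start_s)

# Three stages (forward-1..3), each with ~2 s scaling delay and ~1 s cold start
print(end_to_end_startup_delay([2, 2, 2], [1, 1, 1]))  # 9
```

A topology-aware platform could scale all three stages in parallel as soon as load hits the first one, paying roughly one stage's delay instead of the sum.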
10
Problem and Approach
• Can we leverage the serverless model to define and enforce a new set
of SLA/SLO to support fast translation from business logic to
application solutions?
[Diagram: Business + Application Logic → Application + SLO Description → SLO Enforcer, driven by the workload.
Steps: 1. Define, 2. Deploy, 3. Run, 4. Feedback with perf. metrics, 5. Configure, 6. Evaluate]
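The loop above (steps 3 through 6) can be sketched as a single control iteration; the `measure`/`config` interface below is hypothetical, not the enforcer's actual API.

```python
# Minimal sketch of one iteration of the SLO enforcement loop:
# measure latency, evaluate it against the SLO, and reconfigure
# (here, add a reserved pod) when the SLO is violated.
def enforce_once(measure, slo_p95_ms, config):
    p95 = measure()                   # feedback with performance metrics
    if p95 > slo_p95_ms:              # evaluate against the SLO
        config["reserved_pods"] += 1  # reconfigure the deployment
    return config

cfg = {"reserved_pods": 0}
cfg = enforce_once(lambda: 1800, slo_p95_ms=1500, config=cfg)  # violated
print(cfg)  # {'reserved_pods': 1}
```

In a real enforcer the reconfiguration step would adjust Knative deployment parameters rather than a plain dict.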
11
Application Description: Open Application
Model (OAM)
• “Open Application Model (OAM) is a runtime-agnostic specification for defining
cloud native applications and enables building application-centric platforms
naturally. Focused on application rather than container or orchestrator, Open
Application Model brings modular, extensible, and portable design for building
application centric platforms on any runtime systems like Kubernetes, cloud, or
IoT devices.” *
• Provides a vocabulary for translating business goals into application designs and descriptions,
including
• Computation components (e.g. Micro-services)
• Topology
• Deployment environment
• We extend OAM for defining and enforcing SLO by
• Adding serverless (Knative deployment) description support
• Adding SLO description
*OAM see https://github.com/oam-dev/spec
12
SLO Description
• As long as
  • input rate < 100 rps
  • ramp-up < 10 req/sec²
• Guarantee:
  • 95th-percentile latency < 1000ms
  • 400ms <= mean latency <= 600ms
  • latency stddev <= 200ms
Per-request pricing:
  End-to-end latency (ms)   Earning ($)
  < 600                     0.00002
  < 1000                    0.00001
  Otherwise                 0
Translate to SLO
Add to blueprint (slodesc.yaml)
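A sketch of evaluating this SLO and pricing table against measured end-to-end latencies; the function names and structure are ours, not the slodesc.yaml schema.

```python
import statistics

# Evaluate the SLO on this slide against measured latencies (ms):
# 95th percentile < 1000 ms, mean in [400, 600] ms, stddev <= 200 ms.
def meets_slo(latencies):
    ordered = sorted(latencies)
    p95 = ordered[max(0, int(round(0.95 * len(ordered))) - 1)]
    mean = statistics.mean(latencies)
    return (p95 < 1000
            and 400 <= mean <= 600
            and statistics.pstdev(latencies) <= 200)

# Price each request using the per-request pricing table above.
def earning(latency_ms):
    if latency_ms < 600:
        return 0.00002
    if latency_ms < 1000:
        return 0.00001
    return 0.0

lat = [450, 500, 550, 500, 480, 520, 700]
print(meets_slo(lat), sum(earning(l) for l in lat))
```

The input-rate and ramp-up conditions act as guards: the guarantee only has to hold while the workload stays within them.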
13
Control Invocation Overhead with Knative
• Cold start: multiplex
multiple invocations into big
pods (i.e. containers) to
reduce the number of cold starts
• Scaling delay: reserve extra
resources to buy time when
the workload surges
14
Note: 400m+1 = pod size of 400m with
a reservation of one 400m pod
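A toy model (ours) of how the two knobs interact: a larger pod handles more concurrent invocations, so fewer pods must cold-start, while reserved pods absorb part of a surge before new pods finish starting.

```python
import math

# How many pods must cold-start when a surge arrives, given the pod's
# per-pod concurrency (a stand-in for pod size) and the reservation.
def cold_starts(surge_invocations, per_pod_concurrency, reserved_pods):
    pods_needed = math.ceil(surge_invocations / per_pod_concurrency)
    return max(0, pods_needed - reserved_pods)  # pods that must cold-start

print(cold_starts(40, per_pod_concurrency=10, reserved_pods=1))  # 3
print(cold_starts(40, per_pod_concurrency=20, reserved_pods=1))  # 1
```

This is why "400m+1" in the note trades a small standing cost (one reserved 400m pod) for fewer cold starts on a surge.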
Topology Awareness Configuration with
Knative
• Select the right pod size
• Select the right reservation
Increasing pod size doesn’t always
improve concurrency significantly
Increasing workflow size increases the
per-component deployment cost
15
Demonstration
• Naïve serverless vs. serverless + SLO enforcement
• Reuse the previous workload, application logic, and system setup
• SLO: as long as the input rate < 120 orders per sec:
• 95th percentile of end-to-end latency < 1500ms
• Recorded Videos
• LINK TO BE ADDED
16
Highlights (forward-3)
17
[Figures: Serverless vs. Serverless + SLO enforcement]
Results: Latency distribution
Serverless deployment Serverless + SLO enforcer deployment
Extremely high latency due to scaling lag
and topology unawareness
18
The SLO Enforcer successfully meets the SLO
requirements by choosing the right pod size
and reservation
Results: Earning
• Cost is calculated based on IBM
container pricing*, at 0.000034
USD/second/core
* https://cloud.ibm.com/kubernetes/catalog/about
Per-request income:
  End-to-end latency (ms)   Income ($)
  < 500                     0.0000012
  < 1000                    0.0000011
  < 1500                    0.000001
  Otherwise                 0
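The earning computation on this slide can be sketched as income from the table minus resource cost at the quoted rate; the request mix and core-seconds below are made-up inputs for illustration.

```python
# Net earning = per-request income (from the table) minus resource cost
# at IBM's quoted rate of 0.000034 USD/second/core.
COST_PER_CORE_SECOND = 0.000034

def income(latency_ms):
    if latency_ms < 500:
        return 0.0000012
    if latency_ms < 1000:
        return 0.0000011
    if latency_ms < 1500:
        return 0.000001
    return 0.0

def net_earning(latencies_ms, core_seconds_used):
    total_income = sum(income(l) for l in latencies_ms)
    return total_income - core_seconds_used * COST_PER_CORE_SECOND

# 1000 fast requests vs. the core-seconds consumed to serve them
print(net_earning([400] * 1000, core_seconds_used=20))
```

Over-provisioning raises the cost term while latency-misses lower the income term, which is the trade-off the enforcer navigates.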
19
The SLO Enforcer meets the SLO at reasonably low
cost, thereby creating high earnings and satisfying
the business goal.
Related Work
• Handling invocation overhead
• Cold start: SAND (ATC ’18), SOCK (ATC ’18), Catalyzer (ASPLOS ’20)
• Scaling: Shahrad et al. (ATC ’20)
• Per-function optimization, no topology support
• Topology-aware Deployment for Serverless: IBM Composer, AWS Step
Function
• Simple topology (sequence + parallel), no performance guarantee
• Performance Guarantee for Serverless: Real-time Serverless (WoSC19)
• Rate guarantee but no topology support
20
Conclusion and Future Vision
• Serverless opens opportunities to quickly build and adjust software
solutions to match business goals, but many challenges arise
• Scaling overhead
• Lack of topology awareness
• We propose an SLO Serverless interface to describe and enforce
business goals in terms of SLOs
• Long-term vision
• Support more complicated workload topologies
• Efficient SLO enforcement (smarter metrics selection, ML approaches, etc.)
• Generic mechanism for all serverless platforms (not just Knative)
21
Coming soon…
• Available soon:
• Blog post
• Demonstration scripts
• … slides, demo recording, and talk recording will be posted after the
talk
22
Thank you!
23
Backup slides
24
[Diagram: supply chain from manufacturer to retailer. New orders flow through a chain of functions (𝑓):
Inspect → Valid? → (No: Return; Yes: Find voyages → Find container).
A Workload Generator submits Orders to a function that finds containers for each order,
emitting AllocatedOrder or RejectedOrder events consumed by a Workload Observer.]
The Reality
25
[Diagrams: experiment variants, from full to empty:
(1) Workload Generator → Order → 𝑓 (find containers for an order) → AllocatedOrder / RejectedOrder → Workload Observer;
(2) Workload Generator → Order → 𝑓;
(3) Workload Generator → 𝑓;
(4) Workload Generator only (empty).]
26