This document presents an approach for maintaining service level objectives (SLOs) of cloud-native applications via self-adaptive resource sharing. The approach involves collecting performance data under varying resource limits and workloads, removing anomalies from the data, learning prediction models to map service level indicators to resource limits and workloads, and optimizing resource allocation to meet SLOs under changing conditions. The approach is evaluated using three applications deployed on Kubernetes and aims to augment Kubernetes' vertical pod autoscaler.
Maintaining SLOs of Cloud-native Applications via Self-Adaptive Resource Sharing
1. Vladimir Podolskiy*, Michael Mayo**, Abigail Koay**,
Michael Gerndt*, Panos Patros**
*Technical University of Munich (TUM), Germany
**University of Waikato, New Zealand
IEEE SASO 2019
Umeå, Sweden, June 18th 2019
Full Paper
Cloud-based Adaptation
2. Panos Patros & Vladimir Podolskiy | Maintaining SLOs of Cloud-native Applications via Self-Adaptive…
We are…
Vladimir Podolskiy
3rd-year PhD, TUM
Predictive autoscaling and anomaly detection
Panos Patros
Lecturer in Software Engineering
Head of ORCA lab
3. • ORCA Lab at Waikato, NZ
• Started in Jan 2019
• 1 Research Assistant
• 9 Research Students
• 7 Faculty
• 2 Interns
• 8 International Collaborators (Canada, Germany, USA)
• New graduate-level course (70% Project Component)
• COMPX529 Engineering Self-Adaptive Systems
Oceania Researchers in Cloud and Adaptive-systems
Ohu Rangahau Kapua Aunoa
4. • Background:
Containerization & Cloud Computing
Resource Sharing & Container Orchestration via Kubernetes
Machine Learning & Lasso Regression
• Motivation of the Study
• Research Problem
• Data Collection
• Proposed Approach:
Anomalies Removal
Prediction of the Service Level Indicators
SLO-compliant Resource Allocation
• Evaluation:
Method
Results
Limitations
• Usage Scenario: Augmenting Vertical Pods Autoscaler (VPA) of Kubernetes
• Conclusions & Future Work
Contents
6. • Cloud
• Abstracts Computing Resources
• Containers
• Low-overhead OS-level virtualization
• Multitenancy
• Containerization of apps
• Kubernetes
• Orchestration of app containers
• Resource Management
• Soft (e.g. CPU shares) and hard (e.g. CFS quota) limits
• Machine Learning
• Least Absolute Shrinkage and Selection Operator (LASSO)
• Linear regression with an L1 penalty that shrinks coefficients towards zero
Background
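As a minimal sketch of the lasso regression mentioned above (not the paper's code; scikit-learn and synthetic data are assumptions), the L1 penalty drives the coefficients of irrelevant features to exactly zero while keeping the informative ones:

```python
# Hedged sketch: lasso regression with scikit-learn on synthetic data,
# illustrating how the L1 penalty shrinks coefficients of irrelevant
# features to (near) zero.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only the first two features actually influence the target.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = Lasso(alpha=0.1).fit(X, y)
# model.coef_ keeps large weights on features 0 and 1 and shrinks the rest.
```

This sparsity is what makes lasso attractive for performance modeling: it effectively selects which workload and resource features matter.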
7. • Satisfying requirements at varying loads
• Engineering is for the people
• Business, Environment and Society Sustainability
• Cloud Service Level Agreements (SLAs)
• Service Level Objectives (SLOs)
• Financial penalties
• Service Level Indicators (SLIs)
• Tradeoffs
• Performance
• Resource Consumption and Cost
• Isolation and Security
• State of the art: scaling out (adding containers)
Motivation
8. • Consider a saturated container cloud
• Little/no benefit from scaling out
• Or up (finite pie)
• Instead, change resource limits
• However, CPU utilization SLIs do not mean much to end-users
• Instead, use response time and throughput
• However, hard to autonomously map to limits
• Therefore, main contribution
• Collect dataset from multitenant deployments
• Detect and remove anomalies (GC, etc.)
• Machine Learn RT/Thru performance models
• Resize containers based on loads and target SLOs
Research Problem
10. • Machine provided by STRATUS (cybersecurity) project
• Availability is a key cybersecurity requirement!
• 24-CPU Intel Xeon
• 256GB RAM
• Private and local cloud
• Performance isolation is imperative
• Can't rely on public clouds
• 8 VMs (4 CPUs + 4GB RAM)
• 1 master, 8 workers
• Kubernetes using Oracle's Vagrant Script
• Load was driven from a separate machine
• Performance isolation is imperative!
Testbed
11. 1. NGINX (app1)
• Single container image
• Replicated 7 times (Load balancer Kubernetes service
exposed)
2. IBM WebSphere Liberty Profile (app2)
• Single container image: runs IBM JVM + Liberty Profile
• Replicated 7 times (Load balancer Kubernetes service
exposed)
3. Redis + PHP Guestbook
• PHP x7, Redis-Master x1, Redis-Slave x7
Deployed Applications
12. • Separate Machine provided by University of Waikato
• 8-CPU Intel Xeon
• 16GB RAM
• Stress-Testing Script (dataset creation):
1. Select random workloads
2. Select random CPU limits (soft and hard)
3. Redeploy apps
4. Fire requests using Apache ab x16
5. Collect RT and Thru SLIs
6. Repeat x500
Load Driving
13. Approach to Maintain SLOs of Cloud-native
Applications via Self-Adaptive Resource Sharing:
Anomalies Removal
Prediction of the Service Level Indicators
SLO-compliant Resource Allocation
Limitations of the Approach
14. MAPE-K Inspired Architecture of the Solution
15. Approach Overview
1. Collecting SLI values for various resource limits and workload rates
2. Anomaly identification and removal
3. Learning prediction models 𝑆𝐿𝐼 = 𝑓(𝑊𝑜𝑟𝑘𝑙𝑜𝑎𝑑, 𝑅𝑒𝑠𝐿𝑖𝑚)
4. Deriving the resource limits for applications via optimization
16. Approach Overview (recap)
17. • Any data-based method, including prediction, is only as good as the data given to it (garbage-in-garbage-out principle)
If the input does not contain the information needed to describe the output, then the model produced by any approach won't be accurate
Anomalies Removal: Motivation
18. • Solution:
Anomalies Removal: Motivation
Add Missing
Information to the Input
(collect anew?)
Remove
the Intractable Data
from the Output
OR
19. Anomalies Removal: Motivation
GOAL – to leave only the explainable data (~normal distribution): remove the intractable data from the output
21. 1. Expectation-Maximization (EM)
Clustering with 10-fold cross-
validation to get the clusters of
similar observations
2. Find the cluster that represents
the anomalies (“too high” and
“too low” SLI values + high
standard deviation)
3. Remove the data grouped into
this cluster from the dataset and
try to fit the model to see
whether R2 score improves.
~13% of the observations are removed as anomalies
Anomalies Removal: Approach
Alternative Isolation Forest
approach gives similar ~11%
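The three steps above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: scikit-learn's `GaussianMixture` stands in for EM clustering, the data is synthetic, and the anomalous cluster is picked by its higher response-time spread.

```python
# Hedged sketch: EM clustering (Gaussian mixture) to isolate an anomalous
# cluster of SLI observations, then checking whether dropping it improves
# the R^2 of a simple response-time model.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
workload = rng.uniform(10, 100, size=300)
rt = 5.0 + 0.5 * workload + rng.normal(scale=2.0, size=300)  # normal behaviour
rt[:30] += rng.uniform(100, 200, size=30)                    # spikes (e.g. GC pauses)

X = np.column_stack([workload, rt])
gm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gm.predict(X)
# Heuristic: the anomalous cluster has the higher response-time spread.
anom = np.argmax([rt[labels == k].std() for k in range(2)])
keep = labels != anom

w = workload.reshape(-1, 1)
r2_all = LinearRegression().fit(w, rt).score(w, rt)
r2_clean = LinearRegression().fit(w[keep], rt[keep]).score(w[keep], rt[keep])
# r2_clean should exceed r2_all once the anomalous cluster is removed.
```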
22. Anomalies Removal: Result
23. Approach Overview (recap)
25. • Performance model allows us to answer the following question
What performance can be achieved for the given configuration?
… without testing all the possible options.
• In our context that question becomes:
What SLI values (throughput and 99th-percentile response time) can be achieved for the given workload rate and resource limits (CPU in millicores) of our co-located containerized applications?
Learning the Performance Model: Motivation
26. • Possible answers:
Learning the Performance Model: Motivation
Analytical (Expert Modeling) OR Black Box (Machine Learning)
27. Learning the Performance Model: Motivation
Black Box (Machine Learning) …BUT WHY?
28. Learning the Performance Model: Motivation
Black Box (Machine Learning): HYPE? FUNDING? TO GET THE PAPER ACCEPTED? LAZINESS? TO GET CITATIONS?
30. Learning the Performance Model: Motivation
Black Box (Machine Learning): TOO MANY APPS TO GENERALIZE WITH FIXED MODELS
32. • Challenge – many ML approaches (linear regression, lasso regression,
neural networks). How to select?
• Via R-squared (R2)! It is a statistical measure that represents the
proportion of the variance for a dependent variable that's explained by
variables in a regression model.
Pre-Study I – ML Approach Selection
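The selection procedure can be sketched as follows (an illustration, not the paper's setup: the candidate models, data, and hyperparameters here are placeholders), comparing candidates by cross-validated R²:

```python
# Hedged sketch: selecting among candidate regressors by cross-validated R^2.
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(300, 4))   # stand-in for workload rate + resource limits
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.05, size=300)  # stand-in SLI

candidates = {"linear": LinearRegression(), "lasso": Lasso(alpha=0.001)}
scores = {name: cross_val_score(m, X, y, cv=10, scoring="r2").mean()
          for name, m in candidates.items()}
best = max(scores, key=scores.get)   # candidate explaining the most variance
```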
33. Pre-Study I – ML Approach Selection
34. • Unresolved Questions:
What should be predicted? All? SLIs for the given application?
Specific SLI for all apps?
What degree of the polynomial should be selected for the model?
Pre-Study II – Model & Parameters Selection
35. • Option I: Independent Models (single output variable):
A) Without target variables as predictors
B) With target variables as predictors
Pre-Study II – Model & Parameters Selection
36. • Option II: Application-wise Models (two output variables):
A) Without target variables as predictors
B) With target variables as predictors
Pre-Study II – Model & Parameters Selection
37. • Option III: SLI-wise Models (three output variables):
A) Without target variables as predictors
B) With target variables as predictors
Pre-Study II – Model & Parameters Selection
38. • Option IV: All-targets Models (six output variables):
Pre-Study II – Model & Parameters Selection
39. • Model of choice – Application-wise model of degree 1 with target
variables as predictors
Learning the Performance Model: Results
[Diagram: the performance model for App 1 takes the workload rates, the resource limits, and the SLIs of Apps 2 and 3 as inputs and predicts the SLIs of App 1; the model for App 2 analogously takes the SLIs of Apps 1 and 3 and predicts the SLIs of App 2.]
40. • Reasons:
Consistent resource limits per app for the later stages
Well-balanced (High R-squared and small fitting time)
Scales well with increase in the number of apps
Learning the Performance Model: Results
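The chosen model structure can be sketched as below. This is a hedged reconstruction, not the authors' code: `MultiTaskLasso`, the feature names, and the synthetic targets are assumptions; what it shows is that app 1's model takes workload rates, resource limits, and the co-located apps' SLIs as predictors and outputs app 1's own two SLIs (degree 1).

```python
# Hedged sketch of the application-wise model (degree 1, lasso) with
# target variables of the other apps used as predictors.
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(3)
n = 400
workloads = rng.uniform(1, 100, size=(n, 3))   # request rates per app
limits = rng.uniform(100, 1000, size=(n, 3))   # CPU limits (mCPUs) per app
sli_other = rng.uniform(0, 1, size=(n, 4))     # RT/Thru of apps 2 and 3

X = np.column_stack([workloads, limits, sli_other])
# Synthetic stand-ins for app 1's measured 99th-percentile RT and throughput.
y = np.column_stack([
    0.8 * workloads[:, 0] - 0.3 * limits[:, 0] / 100,
    0.5 * limits[:, 0] / 100,
]) + rng.normal(scale=0.1, size=(n, 2))

model = MultiTaskLasso(alpha=0.01).fit(X, y)
rt_pred, thru_pred = model.predict(X[:1])[0]   # predicted SLIs for app 1
```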
41. Approach Overview (recap)
44. • Service Level Objectives (SLOs) are the thresholds put on Service
Level Indicators such as throughput or response time that characterize
appropriate behavior of the system
• Example: the user should receive a response in under 800 ms for 99% of the requests, and the system should process not less than 30 requests per second
• If there are not enough resources (CPU, memory…), the requests could
end up being dropped or served in more than 800 ms
SLO-compliant Resource Allocation: Motivation
→ The system should have enough capacity to minimize SLO violations under the changing workload
45. SLO-compliant Resource Allocation: Motivation
Seems like an OPTIMIZATION PROBLEM!
46. • SLOs for ith app:
on throughput:
on response time:
• Per app-SLI pair cost functions:
for response time:
for throughput:
• Application-wise cost function:
SLO-compliant Resource Allocation: Formalism
Predicted SLIs
47. • Formulation of constrained optimization problem:
SLO-compliant Resource Allocation: Formalism
51. • Constraint: NP-hard nonlinear integer programming formulation
• Workaround: solving as continuous constrained optimization problem
• Selected optimization method: trust region-based for nonlinear
constrained optimization
• Alternatives and augmentations:
pure fine-grained brute force with step size 10 by 10 by 10 (BF-10)
pure coarse-grained brute force with step size 50 by 50 by 50 (BF-50)
pure trust region-based continuous optimization (CO)
continuous optimization with coarse-grained brute force
(Hyb = CO + BF-50)
SLO-compliant Resource Allocation: Design
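The hybrid (CO + BF-50) idea can be sketched as below. This is a toy illustration, not the paper's implementation: the cost function is a placeholder for the learned SLO-violation cost, and the capacity figure is the 3000 mCPU budget from the evaluation.

```python
# Hedged sketch of the hybrid allocation step: a continuous trust-region
# solve over the three apps' CPU limits, followed by a coarse brute-force
# refinement on a 50-mCPU grid around the continuous optimum.
import itertools
import numpy as np
from scipy.optimize import minimize, LinearConstraint

def cost(x):
    # Placeholder for the SLO-violation cost built from the performance
    # models: here, cost simply falls as each app's limit grows.
    return sum(1e5 / xi for xi in x)

capacity = LinearConstraint(np.ones(3), -np.inf, 3000)  # shared CPU budget
bounds = [(100, 3000)] * 3
res = minimize(cost, x0=[500, 500, 500], method="trust-constr",
               bounds=bounds, constraints=[capacity])

# Coarse brute force (step 50) on the grid cell around the continuous
# optimum, so the final limits are valid integer millicore values.
grid = [np.array([50 * np.floor(v / 50), 50 * np.ceil(v / 50)]) for v in res.x]
feasible = [p for p in itertools.product(*grid) if sum(p) <= 3000]
best = min(feasible, key=cost)
```

With this symmetric toy cost the optimum splits the budget evenly; the real cost functions come from the learned per-app performance models.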
53. • Parameters:
For all apps - ms and RPS
Limit on CPU: 3000 mCPUs
Number of tests: 10 for both soft and hard limits
• Evaluation results:
SLO-compliant Resource Allocation: Design
54. • Evaluation results:
• Conclusions on method selection for resource allocation:
Hybrid method is more accurate than pure continuous optimization
Fine-grained brute force is ruled out due to its high execution time
Coarse-grained brute force has good accuracy and low execution time
but scales badly
Hybrid method finds a good balance between execution time and the
quality of solution
SLO-compliant Resource Allocation: Design
55. SLO-compliant Resource Allocation: Design
Hence, we will allocate the resources with the hybrid approach
57. • Validation test consists of two parts:
Preliminary Validation Test (PVT) to acquire the SLI values used as input to the application-wise performance model (16 times with ab); no optimization
Evaluation Validation Test (EVT) to conduct the real evaluation based
on values from PVT (16 times with ab); optimization is done with hybrid
approach
Evaluation Method
58. • Test settings:
PVT:
EVT:
Evaluation Method
Result of PVT
59. • The approach proved to be appropriate for the installation and SLOs:
at most 2 SLO violations out of 16 trials for the 99th-percentile response time
at most 1 SLO violation out of 16 trials for the throughput
Evaluation Results
60. • A simplistic dataset that does not allow proving whether the approach is feasible for more complex applications
• Performance models susceptible to influences of events that are not
reflected by the input variables (such as garbage collection in Java apps)
• Cost of the resources is not taken into account
• Focus on CPU
Limitations of the Approach
61. Usage Scenario:
Kubernetes’ Vertical Pods Autoscaler (VPA)
Augmenting VPA
62. • Known Kubernetes autoscaling options:
Horizontal Pod Autoscaler (HPA) // production-ready; changes the number of pods
Cluster Autoscaler (CA) // production-ready; changes the number of instances for various cloud providers, overriding native autoscalers
Vertical Pod Autoscaler (VPA) // beta; changes the amount of resources (CPU, memory) allocated to a pod, but requires a pod restart
Addon Resizer (AR) // beta; a simplified VPA that modifies resource requests based on the number of nodes
• All options utilize a reactive approach
• HPA supports arbitrary scaling metrics
• VPA and HPA are currently incompatible when scaling on memory/CPU
Autoscaling in Kubernetes
63. Vertical Pod Autoscaler (VPA) sets resource requests automatically based on usage, thus allowing proper scheduling onto nodes so that an appropriate amount of resources is available for each pod1)
Core components of VPA – the Recommender and the Updater
Vertical Pod Autoscaler
1) https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler
64. Recommender computes the recommended resource requests for pods based on current and
historical usage of resources.
First, it equally shares the minimal amount of resources among the containers in the given pod (by default – 250 MB of memory, 25 mCPUs):

$r_i = \frac{1}{n_j} \cdot R_j$

where, for the given resource type, $r_i$ is the minimal amount of the given resource for the $i$th container of the $j$th pod, $n_j$ is the number of containers in the $j$th pod, and $R_j$ is the minimal amount of the given resource for the $j$th pod.
VPA: Recommender (1)
65. Second, it utilizes three chains of estimators to produce the target estimate, the lower-bound estimate, and the upper-bound estimate of the resources to allocate to the pod. Each chain contains a percentile estimator (90%, 50%, and 95% respectively) followed by a margin estimator (15% overhead). For the lower and upper bounds, a confidence multiplier $k = (1 + m/d)^e$ is added. The maximum of this estimate and of the minimal amount of resource from before is selected. The input is resource usage data collected over $d$ days.

So, for either of the two resource types (CPU, memory) we have:

target: $R_j = \sum_{i=1}^{n_j} \max\left(r_i,\ 1.15 \cdot r_{i,90\%}\right)$

lower bound: $R_j = \sum_{i=1}^{n_j} \max\left(r_i,\ 1.15 \cdot r_{i,50\%} \cdot \left(1 + \frac{0.001}{d}\right)^{-2}\right)$

upper bound: $R_j = \sum_{i=1}^{n_j} \max\left(r_i,\ 1.15 \cdot r_{i,95\%} \cdot \left(1 + \frac{1}{d}\right)\right)$
VPA: Recommender (2)
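The three estimator chains above can be sketched numerically. This is an illustrative reading of the formulas, not VPA source code: the usage samples are synthetic, and the target chain is expressed as the general form with $m = 0$ (no confidence multiplier).

```python
# Hedged sketch of the three VPA Recommender estimates: per container,
# a percentile of observed usage, a 15% safety margin, and a confidence
# multiplier (1 + m/d)^e that widens the bounds when only d days of
# history are available.
import numpy as np

def pod_estimate(usage_per_container, r_min, percentile, m, e, d):
    """Sum over the pod's containers of max(minimal share, chain estimate)."""
    total = 0.0
    for usage in usage_per_container:
        est = 1.15 * np.percentile(usage, percentile) * (1 + m / d) ** e
        total += max(r_min, est)
    return total

d = 8                                                            # days of history
usage = [np.random.default_rng(4).uniform(200, 400, size=1000)]  # one container, mCPUs

target = pod_estimate(usage, r_min=25, percentile=90, m=0, e=1, d=d)
lower = pod_estimate(usage, r_min=25, percentile=50, m=0.001, e=-2, d=d)
upper = pod_estimate(usage, r_min=25, percentile=95, m=1, e=1, d=d)
```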
66. The Updater runs in the Kubernetes cluster and decides which pods should be restarted based on the resource allocation recommendations calculated by the Recommender. Practically speaking, the Updater evicts the pods to be updated, whereas the actual recreation of pods with new resource requests is delegated to the particular controller of the pods (e.g. Deployment/ReplicaSet).
The only noticeable thing about the Updater is that pods are updated in order of priority. Update priority is proportional to the fraction by which resources should be increased / decreased. Hence, the update priority for the $j$th pod is computed as follows:

$p_j = \frac{\sum_{i=1}^{n_j}\left(CPU_i^{Req} - CPU_i^{Rec}\right)}{\sum_{i=1}^{n_j} CPU_i^{Req}} + \frac{\sum_{i=1}^{n_j}\left(mem_i^{Req} - mem_i^{Rec}\right)}{\sum_{i=1}^{n_j} mem_i^{Req}}$

Currently the only supported update strategy is based on pod restarts.
VPA: Updater
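The priority formula can be sketched as follows. This is an illustration with hypothetical per-container values, not VPA source code; the absolute value reflects the stated intent that both increases and decreases raise priority.

```python
# Hedged sketch of the update-priority formula: the larger the relative
# gap between requested and recommended resources (CPU plus memory),
# the earlier the pod is evicted and recreated.
def update_priority(cpu_req, cpu_rec, mem_req, mem_rec):
    """Relative CPU gap plus relative memory gap, summed over containers."""
    cpu_gap = abs(sum(cpu_req) - sum(cpu_rec)) / sum(cpu_req)
    mem_gap = abs(sum(mem_req) - sum(mem_rec)) / sum(mem_req)
    return cpu_gap + mem_gap

# Pod A is close to its recommendation; pod B is far below it,
# so pod B gets updated first.
p_a = update_priority([500], [550], [256], [300])
p_b = update_priority([500], [1500], [256], [768])
```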
67. • Add relevant VPA settings, e.g. SLOs, number of trials to get data
• Add VPA Performance Data Collector
• Add VPA Recommender option to use the presented approach
for SLO-compliant resource allocation
• Augment VPA Updater and runtimes to avoid restart of pods
on vertical scaling
Augmenting VPA: the Proposal
68. Conclusions & Future Work
69. • Major contribution:
approach to the SLO-compliant resource allocation problem for co-
located containerized applications with the following steps:
1. Collecting SLI values for various resource limits and workload rates
2. Removing anomalies that cannot be explained through available
features via clustering
3. Learning prediction models relating SLIs to parameters of workload
and resource limits
4. Deriving the resource limits for applications deployment via
continuous optimization and limited brute force search for known
SLOs
the approach is validated with at most 2 SLO violations among 16 trials
Conclusions
70. • Additional findings:
an approach to select the model features and parameters thereof in
order to increase the accuracy of the SLI prediction model
lasso regression-based models of degree 2 seem to suffice for
predicting SLIs
Conclusions
75. • Augment the approach and repeat the study for larger and more realistic
set of apps
• Evaluation of the impact of runtime/technology-specific behaviors like
garbage collection on SLIs and search for predictors for such behaviors
• Evaluation of artificial neural networks for predicting SLIs
• SLO-compliant resource allocation for individual microservices of
compound applications
• Derivation of models for allocation of other resources like RAM
Future Work