This is a tutorial presented at ICPE 2016 (https://icpe2016.spec.org/). In this tutorial, we present the problem of estimating the parameters of performance models from measurements of real systems and discuss algorithms that can support researchers and practitioners in this task. The focus lies on performance models based on queueing systems, where the estimation of request arrival rates and service demands is a required input to the model. In the tutorial, we review existing estimation methods for service demands and present models to characterize time-varying arrival processes. The tutorial also demonstrates the use of relevant tools that automate demand estimation, such as LibRede, FG and M3A.
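The simplest demand-estimation approach mentioned in this line of work rests on the Utilization Law, U = X * D. The sketch below is illustrative only (it is not taken from LibRede, FG, or M3A; the function name and sample values are mine) and assumes a single workload class:

```python
# Minimal service-demand estimation from utilization measurements,
# using the Utilization Law U = X * D (single workload class).
# Names and sample values are illustrative, not from the tutorial's tools.

def estimate_demand(utilization_samples, throughput_samples):
    """Average the per-sample ratios U/X to estimate the service demand D."""
    ratios = [u / x for u, x in zip(utilization_samples, throughput_samples)
              if x > 0]
    return sum(ratios) / len(ratios)

# Synthetic measurements: utilization observed at three throughput levels.
U = [0.20, 0.41, 0.59]   # CPU utilization (fraction of busy time)
X = [10.0, 20.0, 30.0]   # requests per second

D = estimate_demand(U, X)  # estimated seconds of CPU per request
print(round(D, 3))         # ~0.020 s per request
```

With multiple workload classes, the same law becomes a linear system and the tools above fit it by regression or filtering.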
The document discusses parallel processing and matrix multiplication. It introduces parallel processing concepts like dividing a task between multiple processors to complete it faster. As an example, it explains how two people can add 100 numbers in half the time it takes one person. It then discusses using parallel processing to compute the convex hull of a set of points by dividing the set in half and merging the results. The rest of the document focuses on computational models for parallel processing like PRAM and different types of PRAM models including EREW, CREW, CRCW and how they handle read/write conflicts. It also provides an example of using parallel processing to perform matrix multiplication faster by dividing the matrices and merging the results.
- One-to-all broadcast and all-to-one reduction operations can be performed efficiently on networks like rings, meshes, and hypercubes using recursive doubling or similar algorithms.
- All-to-all broadcast, reduction, and personalized communication generalize these operations and can be implemented using similar techniques while accounting for increasing message sizes.
- Operations like all-reduce, prefix-sums, scatter, gather and circular shift can also be implemented efficiently using these basic group communication patterns and algorithms.
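The recursive-doubling idea behind these broadcasts can be simulated in a few lines. This is a sketch of the communication pattern only (a real implementation would exchange MPI messages between processes; the function name is mine), and it assumes the processor count is a power of two:

```python
# Simulated one-to-all broadcast via recursive doubling on a hypercube
# of p processors: in each round, every processor that already holds the
# value forwards it to the partner whose id differs in one more bit.

def one_to_all_broadcast(p, source, value):
    assert p & (p - 1) == 0, "p must be a power of two"
    data = [None] * p
    data[source] = value
    step = 1
    while step < p:
        # snapshot the senders so all transfers in a round are simultaneous
        senders = [pid for pid in range(p) if data[pid] is not None]
        for pid in senders:
            data[pid ^ step] = value  # partner across dimension log2(step)
        step *= 2
    return data

print(one_to_all_broadcast(8, 0, 42))  # value reaches all 8 processors in 3 rounds
```

The doubling of informed processors per round is what yields the log p communication steps cited for hypercubes.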
This document discusses several graph algorithms:
- Minimum spanning tree algorithms like Prim's and parallel formulations.
- Single-source and all-pairs shortest path algorithms like Dijkstra's and Floyd-Warshall. Parallel formulations are described.
- Other graph algorithms like connected components, transitive closure. Parallel formulations using techniques like merging forests are summarized.
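The Floyd-Warshall algorithm named above can be stated compactly in its serial form; the parallel 2-D formulation distributes blocks of this same distance matrix across a processor grid. A minimal serial sketch (variable names are mine):

```python
# Serial Floyd-Warshall all-pairs shortest paths: after iteration k,
# d[i][j] is the shortest i->j distance using intermediates from {0..k}.
INF = float("inf")

def floyd_warshall(w):
    n = len(w)
    d = [row[:] for row in w]  # copy the weight matrix
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

graph = [[0, 3, INF],
         [INF, 0, 1],
         [2, INF, 0]]
print(floyd_warshall(graph))  # → [[0, 3, 4], [3, 0, 1], [2, 5, 0]]
```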
Traffic Class Assignment for Mixed-Criticality Frames in TTEthernet, by Voica Gavrilut
The document discusses assigning traffic classes to messages in mixed-criticality systems using TTEthernet. It presents:
1) The motivation for integrating systems with applications of different criticality on a shared platform and focuses on mixed time-criticality systems.
2) The architecture and application models considered, including hard real-time, soft real-time, and non-critical messages that must be assigned traffic classes.
3) The problem of determining the optimal traffic class assignment to maximize schedulability of hard real-time messages and total utility of soft real-time messages.
This document summarizes several algorithms for parallel matrix operations, including matrix-vector multiplication, matrix-matrix multiplication, and solving systems of linear equations via Gaussian elimination. For matrix-vector multiplication, it describes row-wise and column-wise partitioning approaches. For matrix-matrix multiplication, it discusses algorithms based on row/column broadcasting, Cannon's algorithm, and a 3D domain decomposition approach. For Gaussian elimination, it analyzes pipelined and 2D mapping implementations. The key aspects of parallelization, communication costs, computation loads, scalability, and cost efficiency are analyzed for each algorithm.
This paper proposes a multiple query optimization (MQO) scheme for change point detection (CPD) that can significantly reduce the number of operators needed. CPD is used to detect anomalies in time series data but requires tuning parameters, which leads to running multiple CPDs with different parameters. The paper identifies four patterns for sharing CPD operators between queries based on whether parameter values are the same. Experiments show the proposed MQO approach reduces the number of operators by up to 80% compared to running each CPD independently, thus improving performance. Integrating MQO with hardware accelerators is suggested as future work.
This document discusses dynamic programming and provides examples of serial and parallel formulations for several problems. It introduces classifications for dynamic programming problems based on whether the formulation is serial/non-serial and monadic/polyadic. Examples of serial monadic problems include the shortest path problem and 0/1 knapsack problem. The longest common subsequence problem is an example of a non-serial monadic problem. Floyd's all-pairs shortest path is a serial polyadic problem, while the optimal matrix parenthesization problem is non-serial polyadic. Parallel formulations are provided for several of these examples.
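The longest common subsequence problem cited as a non-serial monadic example has a classic DP table; its anti-diagonal dependency structure is what the parallel formulations exploit. A serial sketch (function name is mine):

```python
# Longest common subsequence length via the classic DP table:
# dp[i][j] = LCS length of a[:i] and b[:j].
def lcs_length(a, b):
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

print(lcs_length("ABCBDAB", "BDCABA"))  # → 4 (e.g. "BCBA")
```

Cells on the same anti-diagonal depend only on earlier diagonals, so they can be filled concurrently in a parallel formulation.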
This document proposes a new algorithmic framework called Cache-Oblivious Wavefront (COW) for parallelizing recursive dynamic programming algorithms. COW aims to improve parallelism without sacrificing cache efficiency. It does so by scheduling tasks for execution as soon as their real dependency constraints are satisfied, while still using the same recursive divide-and-conquer strategy as cache-optimal algorithms to maintain optimal cache performance. The document shows that COW can theoretically reduce the span of several important dynamic programming algorithms like Floyd-Warshall's algorithm and longest common subsequence, while keeping the total work and cache complexity optimal. Experimental results on real machines demonstrate a 3-5x speedup in running time and 10-20x improvement
This document discusses various sorting algorithms that can be used on parallel computers. It begins with an overview of sorting and comparison-based sorting algorithms. It then covers sorting networks like bitonic sort, which can sort in parallel using a network of comparators. It discusses how bitonic sort can be mapped to hypercubes and meshes. It also covers parallel implementations of bubble sort variants, quicksort, and shellsort. For each algorithm, it analyzes the parallel runtime and efficiency. The document provides examples and diagrams to illustrate the sorting networks and parallel algorithms.
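The bitonic sorting network mentioned here can be simulated serially; each `bitonic_merge` level corresponds to one stage of comparators that would fire in parallel on real hardware. A sketch assuming the input length is a power of two (function names are mine):

```python
# Serial simulation of a bitonic sorting network (input length must be
# a power of two). Each compare-and-swap is one comparator in the network.

def _compare_and_swap(a, i, j, ascending):
    if (a[i] > a[j]) == ascending:
        a[i], a[j] = a[j], a[i]

def _bitonic_merge(a, lo, n, ascending):
    # merge a bitonic sequence of length n starting at lo into sorted order
    if n > 1:
        m = n // 2
        for i in range(lo, lo + m):   # one parallel comparator stage
            _compare_and_swap(a, i, i + m, ascending)
        _bitonic_merge(a, lo, m, ascending)
        _bitonic_merge(a, lo + m, m, ascending)

def bitonic_sort(a, lo=0, n=None, ascending=True):
    if n is None:
        n = len(a)
    if n > 1:
        m = n // 2
        bitonic_sort(a, lo, m, True)        # ascending half
        bitonic_sort(a, lo + m, m, False)   # descending half -> bitonic
        _bitonic_merge(a, lo, n, ascending)

data = [3, 7, 4, 8, 6, 2, 1, 5]
bitonic_sort(data)
print(data)  # → [1, 2, 3, 4, 5, 6, 7, 8]
```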
These slides were prepared from the perspective of Design and Analysis of Algorithms, by collecting material from many sites.
I am Danish Javed, a BSCS (Hons.) student at ITU (Information Technology University), Lahore, Punjab, Pakistan.
This document discusses queuing analysis and its applications. Queuing theory models systems with queues and servers that process items. It is useful for analyzing network and system performance when load or design changes are expected. The document outlines different analysis methods and key metrics like arrival rate, service time, waiting time, number of items, and utilization. It also covers important assumptions like Poisson arrivals, service time distributions, Little's Law, and example applications like database servers and multi-processor systems.
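Little's Law, one of the assumptions named above, relates the mean number of items in a system to the arrival rate and the mean time spent in the system. A one-line illustration (names and sample numbers are mine):

```python
# Little's Law: mean number in system N = arrival rate (lambda) * mean
# time in system (T). It holds for any stable system, regardless of the
# arrival or service distributions.
def littles_law_N(arrival_rate, mean_time_in_system):
    return arrival_rate * mean_time_in_system

# e.g. 20 requests/s, each spending 0.25 s in the system on average
print(littles_law_N(20.0, 0.25))  # → 5.0 requests in the system
```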
Applying Reinforcement Learning for Network Routing, by butest
This document discusses the application of reinforcement learning in network routing. It provides an overview of reinforcement learning, including its key elements like the agent, environment, policy, reward function, and value function. It also discusses important reinforcement learning problems like Markov decision processes and elementary methods including dynamic programming, Monte Carlo methods, and temporal-difference learning. Finally, it presents Q-routing and dual reinforcement Q-routing as examples of applying reinforcement learning concepts to optimize network routing.
This document discusses queueing theory and queuing networks. It begins by defining a queue as a model where arrivals come at random times and require random amounts of service from one or more servers. A queuing network can then be modeled as interconnected queues. Key inputs for analyzing a queue include the arrival and service processes, number of servers, and queueing rules. Additional inputs are needed for queueing networks, such as the interconnections between queues and routing strategies. Queues can be open, with arrivals from outside and departures, or closed, with a fixed number of jobs circulating. The document outlines analytical approaches for studying queues and networks through equilibrium analysis, focusing on obtaining mean performance parameters.
Big Graph Analytics Systems (SIGMOD 2016 Tutorial), by Yuanyuan Tian
In recent years we have witnessed a surging interest in developing Big Graph processing systems. To date, tens of Big Graph systems have been proposed. This tutorial provides a timely and comprehensive review of existing Big Graph systems, and summarizes their pros and cons from various perspectives. We start from the existing vertex-centric systems, in which a programmer thinks intuitively like a vertex when developing parallel graph algorithms. We then introduce systems that adopt other computation paradigms and execution settings. The topics covered in this tutorial include programming models and algorithm design, computation models, communication mechanisms, out-of-core support, fault tolerance, dynamic graph support, and so on. We also highlight future research opportunities on Big Graph analytics.
The document discusses performance evaluation of parallel computers. It defines key metrics like parallel runtime, speedup and efficiency used to evaluate parallel algorithms. Speedup is the ratio of sequential to parallel runtime and measures how much faster a program runs in parallel. Efficiency measures processor utilization. The document also discusses performance measures, benchmarks, sources of parallel overhead, and performance models like Amdahl's law, Gustafson's law and Sun & Ni's law that define relationships between speedup, processors and problem size. It concludes with the scalability metric and isoefficiency function to measure a system's ability to efficiently use more processors by increasing problem size.
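Amdahl's and Gustafson's laws, both named above, give contrasting speedup predictions from a serial fraction f and processor count p. A minimal illustration (function names are mine):

```python
# Two classic speedup models for p processors with serial fraction f.

def amdahl_speedup(f, p):
    """Fixed problem size: the serial fraction f caps the speedup at 1/f."""
    return 1.0 / (f + (1.0 - f) / p)

def gustafson_speedup(f, p):
    """Scaled problem size: speedup grows almost linearly with p."""
    return p - f * (p - 1)

# With 5% serial work on 16 processors, the two models disagree sharply:
print(round(amdahl_speedup(0.05, 16), 2))     # → 9.14
print(round(gustafson_speedup(0.05, 16), 2))  # → 15.25
```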
This document summarizes search algorithms for discrete optimization problems. It begins with an overview of discrete optimization and definitions. It then discusses sequential search algorithms like depth-first search, best-first search, A*, and iterative deepening search. The document next covers parallel search algorithms including parallel depth-first search using dynamic load balancing. It analyzes different load balancing schemes and evaluates them through experiments on satisfiability problems. Finally, it discusses techniques for termination detection in parallel search algorithms.
The document discusses parallel computing platforms and techniques for hiding memory latency. It covers the following key points:
1) Implicit parallelism in microprocessors has increased through pipelining and superscalar execution, but memory latency remains a bottleneck. Caches help reduce effective latency by exploiting data locality.
2) Multithreading and prefetching are approaches to hide memory latency by keeping the processor occupied while waiting for data, but they increase bandwidth demands and hardware costs.
3) Different applications utilize different types of parallelism, like data-level parallelism for throughput or task-level parallelism for aggregate performance. Understanding performance bottlenecks is important for parallelization.
The document discusses parallel algorithms and their analysis. It introduces a simple parallel algorithm for adding n numbers using log n steps. Parallel algorithms are analyzed based on their time complexity, processor complexity, and work complexity. For adding n numbers in parallel, the time complexity is O(log n), processor complexity is O(n), and work complexity is O(n log n). The document also discusses models of parallel computation like PRAM and designs of parallel architectures like meshes and hypercubes.
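The log-step addition scheme described above can be simulated directly: each round sums disjoint pairs, so n numbers need ceil(log2 n) rounds. A sketch of the idea (this serializes what a PRAM would do concurrently; the function name is mine):

```python
# Pairwise (tree) addition of n numbers in ceil(log2 n) synchronous steps,
# mirroring the O(log n)-time, O(n)-processor parallel algorithm.
def parallel_sum(values):
    vals = list(values)
    steps = 0
    while len(vals) > 1:
        if len(vals) % 2:
            vals.append(0)  # pad so every element has a partner
        # one parallel step: all disjoint pairs are added "simultaneously"
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
        steps += 1
    return vals[0], steps

total, steps = parallel_sum(range(1, 9))  # the numbers 1..8
print(total, steps)  # → 36 3
```

With n processors doing log n steps, the work is O(n log n), which is why the text notes the algorithm is not work-optimal.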
IJCER (www.ijceronline.com) International Journal of Computational Engineerin..., by ijceronline
1) The document proposes a mathematical model and optimization service to predict the optimal number of parallel TCP streams needed to maximize data throughput in a distributed computing environment.
2) It develops a novel model that can predict the optimal number using only three data points, and implements this service in the Stork Data Scheduler.
3) Experimental results show the optimized transfer time using this prediction and optimization service is much less than without optimization in most cases.
This document discusses analytical modeling of parallel systems. It begins by outlining topics like sources of overhead in parallel programs, performance metrics, and scalability. It then discusses basics of analytical modeling, noting that parallel runtime depends on input size, number of processors, and machine communication parameters. Several performance measures are introduced, like wall clock time and speedup. Sources of overhead like idling, excess computation, and communication are described. Metrics like parallel time, total overhead, speedup, and efficiency are formally defined. The impact of non-cost optimality and ways to build granularity are discussed. Finally, scaling characteristics and isoefficiency as a metric of scalability are covered.
- The document discusses methods for estimating traffic matrices, which describe the flow of traffic between origin-destination pairs in a network.
- Early methods relied on direct measurements, which are computationally intensive. Recent approaches use inference based on link measurements and routing information.
- Current research looks at techniques like principal component analysis, Kalman filtering, and incorporating additional data like access link measurements to improve estimates while reducing measurement needs. Hybrid methods combining analysis and some direct measurements are also promising.
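A standard baseline in this literature (not named in the summary above, but widely used as a prior for the inference methods it mentions) is the gravity model: the flow from origin o to destination d is estimated from the total traffic entering at o and leaving at d. A sketch with made-up node names and volumes:

```python
# Gravity-model prior for a traffic matrix: flow(o, d) is proportional to
# T_out(o) * T_in(d) / total. Real estimators refine this prior with
# link measurements and routing information.
def gravity_estimate(out_traffic, in_traffic):
    total = sum(out_traffic.values())
    return {(o, d): out_traffic[o] * in_traffic[d] / total
            for o in out_traffic for d in in_traffic}

out_t = {"A": 60.0, "B": 40.0}   # traffic entering the network at each node
in_t  = {"A": 30.0, "B": 70.0}   # traffic leaving the network at each node

tm = gravity_estimate(out_t, in_t)
print(tm[("A", "B")])  # → 42.0  (60 * 70 / 100)
```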
Queuing theory is the mathematical study of waiting lines and delays. It examines properties like average wait time, number of servers, arrival and service rates. Queues form when demand for a service exceeds capacity. The simplest queuing system has two components - a queue and server - with attributes of inter-arrival and service times. Queuing models use Kendall notation to describe systems, and the M/M/1 model is commonly used to analyze average queue length, wait times, and probability of overflow for single server queues. Queuing theory has applications in fields like telecommunications, healthcare, and computer networking.
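The M/M/1 model mentioned above has closed-form steady-state metrics whenever utilization rho = lambda/mu stays below 1. A minimal illustration (function name and sample rates are mine):

```python
# M/M/1 (Kendall notation) steady-state metrics for a single-server queue
# with Poisson arrivals at rate lam and exponential service at rate mu.
# Formulas are valid only when rho = lam/mu < 1 (stable queue).
def mm1_metrics(lam, mu):
    rho = lam / mu                  # server utilization
    return {
        "rho": rho,
        "L":  rho / (1 - rho),      # mean number in system
        "W":  1 / (mu - lam),       # mean time in system
        "Lq": rho ** 2 / (1 - rho), # mean number waiting in queue
        "Wq": rho / (mu - lam),     # mean waiting time
    }

m = mm1_metrics(lam=8.0, mu=10.0)   # 8 arrivals/s, service rate 10/s
print(round(m["L"], 2), round(m["W"], 2))  # → 4.0 0.5
```

Note how L = lam * W here (4.0 = 8.0 * 0.5), consistent with Little's Law.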
This document summarizes basic communication operations for parallel computing including:
- One-to-all broadcast and all-to-one reduction which involve sending a message from one processor to all others or combining messages from all processors to one.
- All-to-all broadcast and reduction where all processors simultaneously broadcast or reduce messages.
- Collective operations like all-reduce and prefix-sum which combine messages from all processors using associative operators.
- Examples of implementing these operations on different network topologies like rings, meshes and hypercubes are presented along with analyzing their communication costs. The document provides an overview of fundamental communication patterns in parallel computing.
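The prefix-sum operation listed above has a well-known log-round parallel scheme (often attributed to Hillis and Steele) that the sketch below simulates serially; each round adds the value from 2^k positions back, using the previous round's array (the function name is mine):

```python
# Simulated parallel prefix-sum: log2(n) synchronous rounds, each adding
# the value from 2^k positions earlier. All updates in a round read the
# old array, as simultaneous processors would.
def parallel_prefix_sum(values):
    x = list(values)
    n = len(x)
    step = 1
    while step < n:
        x = [x[i] + (x[i - step] if i >= step else 0) for i in range(n)]
        step *= 2
    return x

print(parallel_prefix_sum([3, 1, 7, 0, 4, 1, 6, 3]))
# → [3, 4, 11, 11, 15, 16, 22, 25]
```

The same structure works for any associative operator, which is why the text groups prefix-sums with all-reduce.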
The document discusses principles of parallel algorithm design. It introduces parallel algorithms, decomposition techniques, and characteristics of tasks and interactions. Recursive, data, exploratory, and hybrid decomposition techniques are covered. Mapping tasks to processes aims to minimize execution time by balancing load, minimizing interaction between processes, and assigning independent tasks to different processes. Granularity, degree of concurrency, and critical path length are used to analyze decompositions and their performance.
This paper investigates fairness among network sessions that use the Multiplicative Increase Multiplicative Decrease (MIMD) congestion control algorithm. It first studies how two MIMD sessions share bandwidth in the presence of synchronous and asynchronous packet losses. It finds that rate-dependent losses lead to fair sharing, while rate-independent losses cause unfairness. The paper also examines fairness between sessions using MIMD (e.g. Scalable TCP) versus Additive Increase Multiplicative Decrease (AIMD, e.g. standard TCP). Simulations show the AIMD sessions converge to equal throughput, while MIMD sessions' throughput depends on initial conditions. Adding rate-dependent losses can achieve fairness between the competing sessions.
Self-adaptive container monitoring with performance-aware Load-Shedding policies, by Rolando Brondolin, PhD student in System Architecture at Politecnico di Milano
Collaborative learning is an educational strategy in which students work together in small groups to help one another learn a topic. Students assist each other and share resources and knowledge to improve their understanding. This method promotes social interaction and critical thinking by requiring students to explain their ideas and listen to others' perspectives.
This document discusses various sorting algorithms that can be used on parallel computers. It begins with an overview of sorting and comparison-based sorting algorithms. It then covers sorting networks like bitonic sort, which can sort in parallel using a network of comparators. It discusses how bitonic sort can be mapped to hypercubes and meshes. It also covers parallel implementations of bubble sort variants, quicksort, and shellsort. For each algorithm, it analyzes the parallel runtime and efficiency. The document provides examples and diagrams to illustrate the sorting networks and parallel algorithms.
From the perspective of Design and Analysis of Algorithm. I made these slide by collecting data from many sites.
I am Danish Javed. Student of BSCS Hons. at ITU Information Technology University Lahore, Punjab, Pakistan.
This document discusses queuing analysis and its applications. Queuing theory models systems with queues and servers that process items. It is useful for analyzing network and system performance when load or design changes are expected. The document outlines different analysis methods and key metrics like arrival rate, service time, waiting time, number of items, and utilization. It also covers important assumptions like Poisson arrivals, service time distributions, Little's Law, and example applications like database servers and multi-processor systems.
Applying Reinforcement Learning for Network Routingbutest
This document discusses the application of reinforcement learning in network routing. It provides an overview of reinforcement learning, including its key elements like the agent, environment, policy, reward function, and value function. It also discusses important reinforcement learning problems like Markov decision processes and elementary methods including dynamic programming, Monte Carlo methods, and temporal-difference learning. Finally, it presents Q-routing and dual reinforcement Q-routing as examples of applying reinforcement learning concepts to optimize network routing.
This document discusses queueing theory and queuing networks. It begins by defining a queue as a model where arrivals come at random times and require random amounts of service from one or more servers. A queuing network can then be modeled as interconnected queues. Key inputs for analyzing a queue include the arrival and service processes, number of servers, and queueing rules. Additional inputs are needed for queueing networks, such as the interconnections between queues and routing strategies. Queues can be open, with arrivals from outside and departures, or closed, with a fixed number of jobs circulating. The document outlines analytical approaches for studying queues and networks through equilibrium analysis, focusing on obtaining mean performance parameters.
Big Graph Analytics Systems (Sigmod16 Tutorial)Yuanyuan Tian
In recent years we have witnessed a surging interest in developing Big Graph processing systems. To date, tens of Big Graph systems have been proposed. This tutorial provides a timely and comprehensive review of existing Big Graph systems, and summarizes their pros and cons from various perspectives. We start from the existing vertex-centric systems, which which a programmer thinks intuitively like a vertex when developing parallel graph algorithms. We then introduce systems that adopt other computation paradigms and execution settings. The topics covered in this tutorial include programming models and algorithm design, computation models, communication mechanisms, out-of-core support, fault tolerance, dynamic graph support, and so on. We also highlight future research opportunities on Big Graph analytics.
The document discusses performance evaluation of parallel computers. It defines key metrics like parallel runtime, speedup and efficiency used to evaluate parallel algorithms. Speedup is the ratio of sequential to parallel runtime and measures how faster a program runs in parallel. Efficiency measures processor utilization. The document also discusses performance measures, benchmarks, sources of parallel overhead, and performance models like Amdahl's law, Gustafson's law and Sun & Ni's law that define relationships between speedup, processors and problem size. It concludes with the scalability metric and isoefficiency function to measure a system's ability to efficiently use more processors by increasing problem size.
This document summarizes search algorithms for discrete optimization problems. It begins with an overview of discrete optimization and definitions. It then discusses sequential search algorithms like depth-first search, best-first search, A*, and iterative deepening search. The document next covers parallel search algorithms including parallel depth-first search using dynamic load balancing. It analyzes different load balancing schemes and evaluates them through experiments on satisfiability problems. Finally, it discusses techniques for termination detection in parallel search algorithms.
The document discusses parallel computing platforms and techniques for hiding memory latency. It covers the following key points:
1) Implicit parallelism in microprocessors has increased through pipelining and superscalar execution, but memory latency remains a bottleneck. Caches help reduce effective latency by exploiting data locality.
2) Multithreading and prefetching are approaches to hide memory latency by keeping the processor occupied while waiting for data, but they increase bandwidth demands and hardware costs.
3) Different applications utilize different types of parallelism, like data-level parallelism for throughput or task-level parallelism for aggregate performance. Understanding performance bottlenecks is important for parallelization.
The document discusses parallel algorithms and their analysis. It introduces a simple parallel algorithm for adding n numbers using log n steps. Parallel algorithms are analyzed based on their time complexity, processor complexity, and work complexity. For adding n numbers in parallel, the time complexity is O(log n), processor complexity is O(n), and work complexity is O(n log n). The document also discusses models of parallel computation like PRAM and designs of parallel architectures like meshes and hypercubes.
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
1) The document proposes a mathematical model and optimization service to predict the optimal number of parallel TCP streams needed to maximize data throughput in a distributed computing environment.
2) It develops a novel model that can predict the optimal number using only three data points, and implements this service in the Stork Data Scheduler.
3) Experimental results show the optimized transfer time using this prediction and optimization service is much less than without optimization in most cases.
This document discusses analytical modeling of parallel systems. It begins by outlining topics like sources of overhead in parallel programs, performance metrics, and scalability. It then discusses basics of analytical modeling, noting that parallel runtime depends on input size, number of processors, and machine communication parameters. Several performance measures are introduced, like wall clock time and speedup. Sources of overhead like idling, excess computation, and communication are described. Metrics like parallel time, total overhead, speedup, and efficiency are formally defined. The impact of non-cost optimality and ways to build granularity are discussed. Finally, scaling characteristics and isoefficiency as a metric of scalability are covered.
- The document discusses methods for estimating traffic matrices, which describe the flow of traffic between origin-destination pairs in a network.
- Early methods relied on direct measurements, which are computationally intensive. Recent approaches use inference based on link measurements and routing information.
- Current research looks at techniques like principal component analysis, Kalman filtering, and incorporating additional data like access link measurements to improve estimates while reducing measurement needs. Hybrid methods combining analysis and some direct measurements are also promising.
Queuing theory is the mathematical study of waiting lines and delays. It examines properties like average wait time, number of servers, arrival and service rates. Queues form when demand for a service exceeds capacity. The simplest queuing system has two components - a queue and server - with attributes of inter-arrival and service times. Queuing models use Kendall notation to describe systems, and the M/M/1 model is commonly used to analyze average queue length, wait times, and probability of overflow for single server queues. Queuing theory has applications in fields like telecommunications, healthcare, and computer networking.
This document summarizes basic communication operations for parallel computing including:
- One-to-all broadcast and all-to-one reduction which involve sending a message from one processor to all others or combining messages from all processors to one.
- All-to-all broadcast and reduction where all processors simultaneously broadcast or reduce messages.
- Collective operations like all-reduce and prefix-sum which combine messages from all processors using associative operators.
- Examples of implementing these operations on different network topologies like rings, meshes and hypercubes are presented along with analyzing their communication costs. The document provides an overview of fundamental communication patterns in parallel computing.
The document discusses principles of parallel algorithm design. It introduces parallel algorithms, decomposition techniques, and characteristics of tasks and interactions. Recursive, data, exploratory, and hybrid decomposition techniques are covered. Mapping tasks to processes aims to minimize execution time by balancing load, minimizing interaction between processes, and assigning independent tasks to different processes. Granularity, degree of concurrency, and critical path length are used to analyze decompositions and their performance.
This paper investigates fairness among network sessions that use the Multiplicative Increase Multiplicative Decrease (MIMD) congestion control algorithm. It first studies how two MIMD sessions share bandwidth in the presence of synchronous and asynchronous packet losses. It finds that rate-dependent losses lead to fair sharing, while rate-independent losses cause unfairness. The paper also examines fairness between sessions using MIMD (e.g. Scalable TCP) versus Additive Increase Multiplicative Decrease (AIMD, e.g. standard TCP). Simulations show the AIMD sessions converge to equal throughput, while MIMD sessions' throughput depends on initial conditions. Adding rate-dependent losses can achieve fairness between the two.
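The dependence on initial conditions summarized above can be seen in a toy simulation (an illustrative sketch, not the paper's model; the window sizes, gain factors, and loss pattern are made-up assumptions). Under synchronous, rate-independent losses, MIMD multiplies both windows by the same factors, so their ratio never changes:

```python
# Toy AIMD/MIMD window updates (illustrative parameters, not from the paper).

def aimd_step(w, loss, alpha=1.0, beta=0.5):
    """Additive increase, multiplicative decrease (standard TCP style)."""
    return w * beta if loss else w + alpha

def mimd_step(w, loss, up=1.01, down=0.875):
    """Multiplicative increase, multiplicative decrease (Scalable TCP style)."""
    return w * down if loss else w * up

# Two MIMD flows under synchronous, rate-independent losses:
w1, w2 = 10.0, 20.0
for t in range(100):
    loss = (t % 10 == 9)  # both flows see the same loss events
    w1, w2 = mimd_step(w1, loss), mimd_step(w2, loss)
# w2/w1 is still exactly 2.0: the initial imbalance persists.
```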
Self-adaptive container monitoring with performance-aware Load-Shedding policies, by Rolando Brondolin, PhD student in System Architecture at Politecnico di Milano
Collaborative learning is an educational strategy in which students work together in small groups to help one another learn a topic. Students assist each other and share resources and knowledge to improve their understanding. This method promotes social interaction and critical thinking by requiring students to explain their ideas and listen to the perspectives of others.
The document discusses the effects of global warming, including increased pollution, the continued destruction of the ozone layer, and the intensification of the greenhouse effect.
This document summarizes the activities of Rosehub, a small group of creative and tech-skilled individuals, in 2015. Some key points:
- They ran educational content on their YouTube channel rosehubTV, gaining over 1,250 subscribers and 130,000 views by December 2015. Content covered cultural sites across India.
- They conducted workshops on learning skills, math, animation, and filmmaking for schools, reaching over 150 students. Evaluations found the workshops improved students' math skills and engagement.
- As Rosehub Studio, they provided design, video, and photography services for business and personal clients, creating websites, videos, and other marketing materials.
- Plans for 2016 included documenting 51
The document discusses the changing consumer behavior landscape and rise of social and data-driven marketing. It notes that consumers are more savvy and demanding than ever, controlling when and where they buy. Brands must have multi-platform sales strategies and take a "phygital" approach, integrating social and digital into their retail strategies. It emphasizes that understanding consumer passions and influences through psychographic profiling is key to deeper engagement and repeat business in today's changing environment.
Melanie Chudyk has experience working for Elections Manitoba and as a foster provider. She previously worked for DASCH Inc. from 2002-2015 in roles such as case management, foster care coordination, and staff supervision. She has certifications in first aid, non-violent crisis intervention, applied physical techniques, and suicide prevention. Her work history also includes positions as a program coordinator, manager, substitute teacher, and child care coordinator dating back to 1998. She holds a Bachelor of Education degree from the University of Winnipeg.
The document summarizes research on using blogs as electronic portfolios for student teachers. It finds that student teachers were generally satisfied with using an online website builder to create blog-based portfolios, finding it user-friendly. Student teachers also felt more knowledgeable about applying blog portfolios for learning and teaching after receiving information technology training. Elements like discussion forums, blog posts, polls and embedded videos were seen as enhancing communication between teachers and students.
The document describes updates to Recruiting Solutions' project and clipboard functionality. The updates include:
1) Making the "Add to project" and "Add to clipboard" buttons more prominent in the profile view.
2) Allowing users to easily see which projects a profile is already added to and add/edit status values.
3) Allowing users to remove a profile from a project or clipboard directly from the profile view with an "undo" option.
Up North Apparel company report, by Elizabeth Bahr
This document provides an overview of the Up North Apparel company started by students at West Fargo High School as part of a Junior Achievement program. Up North Apparel sells long sleeve pocket tees that embrace northern styles. The company has generated over $2000 in sales since March 2016. It has a website and social media presence and is working to expand distribution to local retailers. The students have learned valuable business lessons through starting this company and plan to continue operating Up North Apparel beyond the Junior Achievement program.
Ankush Sharma is seeking a job as an electrical engineer where he can upgrade his skills working with experienced professionals. He has a Bachelor's degree in Electrical Engineering with over 70% marks. His experience includes roles in sales engineering, design engineering, and estimation engineering where he was responsible for tasks like cable sizing, cost estimating, and developing new products. He has strong organizational, technical, and communication skills and is proficient in software like MS Office and MS Project.
John J. Cederstrom is seeking a position that utilizes his strong communication, problem-solving, and customer service skills. He has over 20 years of experience in customer service roles including as a corrections officer, assistant general manager, and driver for Papa John's Pizza. Cederstrom has multiple degrees including an Associate's in General Studies, Bachelor's in Criminal Justice, and some coursework completed towards a Master's in Criminal Justice/Homeland Security. He is skilled in areas such as corrections, leadership, research, and maintaining calm under pressure.
This document is a resume for Zachary Taylor Eich. It outlines his education at Saint John's University where he is majoring in computer science and anticipates graduating in May 2015. It also lists his related coursework and programming knowledge. For related experience, it describes his internship with the US Army in Africa, leadership roles in military training programs, and experience shadowing an Army Lieutenant. It concludes with additional experience as a firefighter and interests including commissioning as an Army 2nd Lieutenant and running marathons.
London Exponential Technologies Meetup, by Peter Morgan
This document summarizes an inaugural meetup on exponential technologies hosted by the London Exponential Technologies Meetup group. It provides an overview of several major exponential technologies that will be covered by future meetups, including artificial intelligence, longevity, bioengineering, and quantum computing. It also lists recommended reading materials on these topics and announces upcoming meetup themes on robotics, high-performance computing, quantum computing, and other emerging technologies. The meetup is sponsored by organizations like the Data Science Partnership, Innovation Warehouse, O'Reilly, and Nvidia.
The document tells the story of two friends who decided to climb a mountain in Switzerland without a guide. They ignored a shepherd's warnings and ended up trapped on the mountain overnight in a storm, narrowly surviving hypothermia. The story serves as a lesson on the importance of relying on an experienced guide, just as Jesus is the spiritual guide for reaching our goals in the new year.
Topics covered in this presentation:
1. What is drive testing?
2. Why is drive testing needed?
3. Types of drive testing
4. Key Performance Indicators and Parameters
5. Test Methods
The document discusses the synergy of forming early-childhood education (PAUD) clusters as a service for early-childhood education, each consisting of one core PAUD and several satellite PAUDs. PAUD clusters are formed to improve the quality of early-childhood education services through developing the competence of PAUD educators and managers and by making use of local learning resources.
This document describes the diagnosis and treatment of complete edentulism. It explains the anatomical, aesthetic, and psychological consequences of losing all teeth, as well as the support, retention, and stability mechanisms of complete dentures. It also classifies edentulous patients according to residual bone height and location to determine the appropriate type of prosthesis.
Georgia Tech: Performance Engineering - Queuing Theory and Predictive Modeling, by Brian Wilson
This is one lecture in a semester-long course 'CS4803EPR' I put together and taught at Georgia Tech, entitled "Enterprise Computing Performance Engineering"
----
Performance Engineering Overview - Part 2…
Queuing Theory Overview
Early life-cycle performance modeling
Simple Distributed System Model
Sequence Diagrams
Maximum Likelihood Estimation of Closed Queueing Network Demands from Queue L..., by Weikun Wang
This document summarizes a presentation on maximum likelihood estimation of demands from queue length data in closed queueing networks. It introduces a queueing maximum likelihood estimation (QMLE) approach that can efficiently estimate service demands using only mean queue length observations. The QMLE approach is validated on random models and a commercial application case study, showing errors below 4% and matching observed throughputs. An extension is also presented to handle load-dependent demand scaling.
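The QMLE likelihood itself is beyond a short snippet, but the task it addresses, estimating service demands from measurements, can be illustrated with the much simpler utilization-law regression that such methods are commonly compared against (a hedged sketch; the measurements below are invented). For a single request class the utilization law gives U = D * X, so the demand D can be fitted from observed (throughput, utilization) pairs:

```python
# Hedged sketch: utilization-law demand estimation, NOT the QMLE method
# from the talk. Single request class: U_i ~ D * X_i, so the least-squares
# fit through the origin is D = sum(U_i * X_i) / sum(X_i^2).

def estimate_demand(throughputs, utilizations):
    num = sum(u * x for u, x in zip(utilizations, throughputs))
    den = sum(x * x for x in throughputs)
    return num / den

# Invented measurements with true demand 0.04 s per request:
X = [5.0, 10.0, 15.0]      # requests/s observed in three intervals
U = [0.20, 0.40, 0.60]     # CPU utilization measured in those intervals
D = estimate_demand(X, U)  # 0.04
```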
Queuing theory and traffic analysis in depth, by IdcIdk1
This document provides a summary of concepts in queuing theory and network traffic analysis. It discusses queuing theory concepts like Little's Law, M/M/1 queues, and Kendall's notation. It then covers an empirical study of router delay that models delays using a fluid queue and reports on busy period metrics. Finally, it discusses the concept of network traffic self-similarity found in measurements of Ethernet LAN traffic.
Queuing theory is the mathematical study of waiting lines in systems like customer service lines. The document discusses the M/M/c queuing model, which models systems with exponential arrival and service times and c parallel servers. Key measures calculated by queuing models include expected wait times, number of customers, and server utilization. An example analyzes a hospital emergency room's performance with 1 or 2 doctors. With 2 doctors, average wait times drop significantly while more patients can be served.
Queuing theory is the mathematical study of waiting lines in systems like customer service lines. It enables the analysis of processes like customer arrivals, waiting times, and service times. The document discusses the M/M/c queuing model, which assumes arrivals and service times follow exponential distributions and there are c parallel servers. It provides the steady state probabilities and performance measures like expected number of customers in the system and in the queue for the M/M/c model. An example applies the M/M/1 model to analyze whether a hospital should hire a second doctor based on arrival and service rates.
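The M/M/c results used in examples like the ones above come from the Erlang-C formula. A minimal sketch, with invented emergency-room rates (not the figures from the documents):

```python
from math import factorial

def mmc_metrics(lam, mu, c):
    """Steady-state metrics of an M/M/c queue: arrival rate lam,
    per-server service rate mu, c parallel servers (lam < c*mu)."""
    a = lam / mu   # offered load in Erlangs
    rho = a / c    # per-server utilization
    assert rho < 1, "queue is unstable"
    # Probability of an empty system:
    p0 = 1.0 / (sum(a**k / factorial(k) for k in range(c))
                + a**c / (factorial(c) * (1 - rho)))
    # Erlang-C: probability an arrival has to wait
    pw = (a**c / (factorial(c) * (1 - rho))) * p0
    wq = pw / (c * mu - lam)  # mean wait in queue
    lq = lam * wq             # mean queue length (Little's Law)
    return {"rho": rho, "Pwait": pw, "Wq": wq, "Lq": lq}

# Invented scenario in the spirit of the emergency-room example:
# 3 patients/hour, each doctor treats 2/hour. One doctor would be
# unstable (3 > 2), so staff two doctors:
two_docs = mmc_metrics(3.0, 2.0, 2)  # mean queue wait ~0.64 h
```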
This document provides an overview of stream processing. It discusses how stream processing systems are used to process large volumes of real-time data continuously and produce actionable information. Examples of applications discussed include traffic monitoring, network monitoring, smart grids, and sensor networks. Key concepts of stream processing covered include data streams, operators, windows, programming models, fault tolerance, and platforms like Storm and Spark Streaming.
This document provides an introduction to queueing theory and queue-based models. It discusses key concepts like arrival and service rates, queue performance metrics, and common queue models like M/M/1, M/D/1, and M/G/1. These models make assumptions like Poisson arrivals, exponential or deterministic service times, and can provide insights into system throughput, response times, and waiting times. The document also briefly covers more advanced topics like queueing networks and their open or closed formulations.
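As a small illustration of how the service-time assumptions above change the answers (the rates below are made up): by the Pollaczek-Khinchine formula, an M/D/1 queue with deterministic service has exactly half the mean queueing wait of the corresponding M/M/1 queue:

```python
# Mean wait in queue under M/M/1 vs. M/D/1 (illustrative rates).

def mm1_wq(lam, mu):
    """Mean wait in queue for M/M/1 (exponential service)."""
    rho = lam / mu
    return rho / (mu - lam)

def md1_wq(lam, mu):
    """Mean wait in queue for M/D/1 (deterministic service):
    exactly half the M/M/1 value, by Pollaczek-Khinchine."""
    return mm1_wq(lam, mu) / 2

wq_mm1 = mm1_wq(8.0, 10.0)  # 0.4 time units
wq_md1 = md1_wq(8.0, 10.0)  # 0.2: removing service variability halves the wait
```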
This document discusses parallel matrix multiplication algorithms on the Parallel Random Access Machine (PRAM) model. It describes algorithms that multiply matrices using different numbers of processors, from n^3 processors down to n^2 processors. The time complexity is O(log n) in all cases, while the processor and work complexities vary with the number of processors. Block matrix multiplication is also introduced as a more efficient approach for shared-memory machines because it improves data locality.
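The PRAM algorithms themselves require a parallel machine, but the block (tiled) multiplication mentioned at the end can be sketched sequentially (the block size and example matrices below are illustrative assumptions, with n divisible by b for brevity):

```python
# Blocked matrix multiplication: iterate over b-by-b tiles so that each
# tile of A, B, and C is reused many times while it is cache-resident.

def blocked_matmul(A, B, n, b):
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, b):
        for j0 in range(0, n, b):
            for k0 in range(0, n, b):
                # Multiply the (i0,k0) block of A by the (k0,j0) block of B
                # and accumulate into the (i0,j0) block of C.
                for i in range(i0, i0 + b):
                    for k in range(k0, k0 + b):
                        aik = A[i][k]
                        for j in range(j0, j0 + b):
                            C[i][j] += aik * B[k][j]
    return C

# 4x4 example multiplied in 2x2 blocks:
A = [[1, 2, 0, 0], [3, 4, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
B = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 5, 6], [0, 0, 7, 8]]
C = blocked_matmul(A, B, 4, 2)
```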
This document discusses techniques for mining data streams. It begins by defining different types of streaming data like time-series data and sequence data. It then discusses the characteristics of data streams like their huge volume, fast changing nature, and requirement for real-time processing. The key challenges in stream query processing are the unbounded memory requirements and need for approximate query answering. The document outlines several synopsis data structures and techniques used for mining data streams, including random sampling, histograms, sketches, and randomized algorithms. It also discusses architectures for stream query processing and classification of dynamic data streams.
This document discusses using reduced 3D models in vibrational design processes. It presents tools for model reduction including variable separation, parametric models, and domain decomposition. These tools combine finite element modeling, experimental modal analysis, and reduced order models to efficiently simulate complex systems for design studies while controlling accuracy.
Nafems15 Technical meeting on system modeling, by SDTools
This presentation illustrates the main mechanisms of model reduction used in generating efficient system models that can be used in vibration design. Examples from automotive, aeronautics and train industries are used as illustrations.
This document discusses quantitative process analysis techniques. It begins by explaining why flow analysis alone is not sufficient and how queuing analysis and simulation can address its limitations by considering waiting times due to resource contention. It then provides an overview of key concepts in queuing theory such as utilization, waiting time, and Little's Law. The document also introduces common probability distributions used in queuing analysis and simulation like the Poisson process and negative exponential distribution. It presents an example queuing analysis of an emergency room and demonstrates how additional doctors could reduce waiting times. Finally, it discusses process simulation methodology and tools that can be used to model processes and analyze performance under different scenarios.
Adaptive check-pointing and replication strategy to tolerate faults in comput..., by IOSR Journals
This document summarizes an adaptive checkpointing and replication strategy to tolerate faults in computational grids. It proposes maintaining a balance between the overheads of replication and checkpointing. Tasks are replicated on up to three resources based on each resource's probability of permanent failure. Checkpoints are taken adaptively based on the probability of recoverable failure. If a resource fails permanently, the task resumes from the last checkpoint. If a failure is recoverable, the task resumes on the same resource. This strategy aims to minimize resource wastage from replication while utilizing different resource speeds.
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D..., by Soumya Banerjee
In this research we use a decentralized computing approach to allocate and schedule tasks on a massively distributed grid. Using emergent properties of multi-agent systems, the algorithm dynamically creates and dissociates clusters
to serve the changing resource demands of a global task queue. The algorithm is compared to a standard First-in First-out (FIFO) scheduling algorithm. Experiments
done on a simulator show that the distributed resource allocation protocol (dRAP) algorithm outperforms the FIFO scheduling algorithm on time to empty
queue, average waiting time and CPU utilization. Such a decentralized computing approach holds promise for massively distributed processing scenarios like SETI@home and Google MapReduce.
This document discusses unit 2 of a course on high speed networks. It covers queuing analysis and models, including single server queues, effects of congestion and congestion control, traffic management, and congestion control in packet switching networks and frame relay. It provides an overview of key concepts like performance measures, solution methodologies, queuing system concepts, stability and steady-state, and causes of delay and bottlenecks. It also discusses analytical and simulation approaches to modeling queues and provides examples.
Many machine learning inference workloads compute predictions using a limited number of models that are deployed together in the system. These models often share common structure and state, which leaves large room for runtime and memory optimizations. Current systems fall short in exploiting this because they employ a black-box view of ML models and tasks, and are therefore unaware of optimization and sharing opportunities.
Pretzel, by contrast, adopts a white-box description of ML models, which allows the framework to perform optimizations over deployed models and running tasks, saving memory and increasing overall system performance. In this talk we will show the motivations behind Pretzel, its current design, and possible future developments.
This document presents research on using machine learning models to predict resource usage for web applications in cloud computing environments. It aims to develop a prediction model that can forecast future resource needs to enable timely provisioning of virtual machines. The researchers evaluate support vector regression (SVR), neural networks, and linear regression using workload data from the TPC-W benchmark. The results show that SVR achieved more accurate predictions of CPU utilization, throughput, and response time than the other models, with error reductions of up to 80%. This suggests SVR may be best suited for predicting resource usage in non-linear systems like multi-tier web applications.
Talks about what queuing is and its applications and practical uses, working through a complex problem statement and its solution, and covering preemptive and non-preemptive queue models and their algorithms.
Science Talk 100111 陸行, by guestf4730f1
The document discusses computing waiting times for queueing systems using mathematical models. It introduces concepts such as average waiting time, variance of waiting time, probability distribution functions of waiting time, and models like M/M/1 queues. It provides examples of applying these concepts and models to analyze single line and multiple line queues, and compares the performance of single vs multiple server systems.
Similar to Automated Parameterization of Performance Models from Measurements (20)
Best 20 SEO Techniques To Improve Website Visibility In SERP, by Pixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence, by IndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Infrastructure Challenges in Scaling RAG with Custom AI models, by Zilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack, by shyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer's life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Driving Business Innovation: Latest Generative AI Advancements & Success Story, by Safe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
UiPath Test Automation using UiPath Test Suite series, part 6, by DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
HCL Notes and Domino License Cost Reduction in the World of DLAU, by panagenda (webinar in German)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX model have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefit it brings you. Above all, you certainly want to stay within budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary expenses, e.g. using a person document instead of a mail-in for shared mailboxes. We show you such cases and their solutions. And of course we explain the new licensing model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and know-how to stay on top of things. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
These topics will be covered
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
HCL Notes and Domino License Cost Reduction in the World of DLAU, by panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
GraphRAG for Life Science to increase LLM accuracy, by Tomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Automated Parameterization of Performance Models from Measurements
1. Automated Parameterization of Performance Models from Measurements
Giuliano Casale
Imperial College London, UK
Simon Spinner
University of Würzburg, Germany
Weikun Wang
Imperial College London, UK
Tutorial @ ICPE 2016, Delft, the Netherlands, March 13, 2016
2. 2
Workload Characterization
Common parameters in performance models:
Service demand of a request
CPU time, bandwidth consumed, …
Arrival rate of requests
Applications
Automated performance modelling
Resource cost splitting
Performance anomaly detection
…
4. 4
Example: Cost Splitting
[Figure: transaction-rate time series (transactions/hour, Mon–Sun) for Workloads 1, 2, and 3, together with the resulting CPU utilization (0–100%) over the same week.]
How to recover the weight of the individual contributions to utilization?
We know from theory that the weight is exactly the service demand!
5. 5
A Typical Challenge
[Figure: number of HTTP requests in the WS (web server) over an observation period T, varying over time between 0, 1, and 3 concurrent requests.]
6. 6
A Typical Challenge
The OS schedules jobs in round robin
If n requests run simultaneously, each approximately receives 1/n of the CPU time
Processor Sharing (PS) is round robin in the limit where the quantum of time assigned to each request is infinitesimal
[Figure: CPU timeline with three requests arriving simultaneously; while 3 requests are running each receives 33% of the CPU time, then 50% each, then 100% for the last (blue) request. The service time S of the yellow request spans from its arrival to its completion.]
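The PS completion order sketched above (shares 1/3, then 1/2, then 100%) can be computed directly. A minimal sketch in Python; the demand values are hypothetical, not taken from the slides:

```python
def ps_completion_times(demands):
    """Completion times of jobs that arrive simultaneously at a PS server.

    With n jobs in service, each receives a 1/n share of the CPU, so the job
    with the smallest remaining demand finishes first and the share of the
    survivors grows as jobs depart."""
    remaining = sorted(demands)
    n = len(remaining)
    t, done, prev = 0.0, 0, 0.0
    completions = []
    for s in remaining:
        # (s - prev) extra seconds of work, served at share 1/(n - done)
        t += (s - prev) * (n - done)
        completions.append(t)
        prev = s
        done += 1
    return completions

# Three simultaneous requests needing 1, 2, and 4 seconds of CPU time:
# shares are 1/3 each, then 1/2 each, then 100% for the longest request.
print(ps_completion_times([1, 2, 4]))  # [3.0, 5.0, 7.0]
```

The first job sees its 1 second of work stretched to 3 seconds because it only ever holds a 1/3 share, exactly the effect the slide illustrates.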
10. 3
Response Time Approximation
Trivial approximation: D_{i,c} ≈ R_c
Assumptions:
the resource dominates the system response time
waiting time in queue ≪ D_{i,c}
Only applicable at low resource utilization
11. 4
Service Demand Law
Basic operational law:
D_{i,c} = U_{i,c} / X_{0,c}
Partial utilization U_{i,c} is hard to derive:
Operating system: per-process statistics
Profilers: high overhead
2 alternative solutions:
Controlled experiment
Partitioning
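The service demand law itself is a one-line computation. A minimal sketch with hypothetical utilization and throughput values:

```python
def service_demand(utilization: float, throughput: float) -> float:
    """Service demand law: D_{i,c} = U_{i,c} / X_{0,c}."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return utilization / throughput

# Example: a CPU is 60% utilized by a class that completes 20 req/s,
# so each request demands 0.6 / 20 = 0.03 s of CPU time.
demand = service_demand(0.60, 20.0)
print(demand)  # 0.03
```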
12. 5
Controlled Experiment
Requests are executed in separate experiments; the CPU utilization observed over the measurement interval directly gives the resource demand (e.g., 40% for Request 1, 30% for Request 2).
Problems:
Not applicable at runtime
Mutual interference
13. 6
Partitioning
[Figure: a mixed workload produces 60% CPU utilization over the measurement interval; the split between Request 1 and Request 2 is unknown.]
How to partition the processing time?
Using response times
Using additional performance counters
14. 7
Estimation Approaches
[Diagram: the utilization approach and the response time approach, each combining the same building blocks — data collection, modeling assumptions (scheduling, service distribution), demand estimation, and model solution.]
18. 11
Other Approaches
Robust regression
Least Absolute Differences: Zhang et al. (2007)
Least Trimmed Squares: Casale et al. (2008)
Machine-learning
Clusterwise linear regression: Cremonesi et al. (2010)
Pattern matching: Cremonesi et al. (2014)
19. 12
Utilization Approaches
Utilization-based approaches
Advantages
Only utilization and throughput data required
Minimal assumptions:
– Any scheduling strategy
– Any interarrival distribution
– (Any service time distribution)
Disadvantages
Robustness
Amount of data
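Utilization-based approaches typically fit the linear model U_k = Σ_c X_{k,c}·D_c over several measurement intervals. A minimal least-squares sketch with synthetic, noise-free two-class data (all numbers are illustrative, and the 2x2 normal equations are solved explicitly for brevity):

```python
def estimate_demands(X, U):
    """Least-squares fit of U_k = sum_c X[k][c] * D[c] via normal equations.

    X: throughput rows (one per measurement interval, 2 classes here),
    U: measured utilizations. Returns the estimated demand vector D."""
    a11 = sum(x[0] * x[0] for x in X)
    a12 = sum(x[0] * x[1] for x in X)
    a22 = sum(x[1] * x[1] for x in X)
    b1 = sum(x[0] * u for x, u in zip(X, U))
    b2 = sum(x[1] * u for x, u in zip(X, U))
    det = a11 * a22 - a12 * a12       # invert A = X^T X explicitly (2x2)
    return [(a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]

# Synthetic data: true demands 0.02 s and 0.05 s per request
true_D = [0.02, 0.05]
X = [[10, 5], [20, 2], [5, 12], [15, 8]]                     # per-class throughputs
U = [sum(x * d for x, d in zip(row, true_D)) for row in X]   # noise-free utilizations
print(estimate_demands(X, U))  # recovers [0.02, 0.05]
```

With real (noisy) data the same regression is used, and the robustness issues listed above become relevant.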
20. 13
Response Time Approaches
Assumptions
Single queue: closed-form solution exists
Queueing network: product-form
Response time equations depend on
Scheduling strategy
Service distribution
Interarrival time distribution
If M/G/1 with PS or LCFS scheduling, or M/M/1 with FCFS and class-independent service times, then R = D / (1 − U)
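When these conditions hold, the relation R = D / (1 − U) can be inverted to estimate the demand from a measured response time and utilization. A minimal sketch with hypothetical measurements:

```python
def demand_from_response(R: float, U: float) -> float:
    """Invert R = D / (1 - U), valid for M/G/1 with PS or LCFS scheduling
    and for M/M/1-FCFS with class-independent service times."""
    if not 0.0 <= U < 1.0:
        raise ValueError("utilization must lie in [0, 1)")
    return R * (1.0 - U)

# Mean response time 0.1 s measured at 70% utilization -> D = 0.03 s
print(demand_from_response(0.1, 0.7))  # ≈ 0.03
```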
21. 14
General Optimization
Assumptions:
Variables D = (D_{1,1}, …, D_{1,n}, …, D_{i,1}, …, D_{i,n})
Queueing network QN: z = f(D)
Observation data ẑ
Optimization problem: min_D ‖z − ẑ‖ (in an arbitrary norm)
D may be subject to certain constraints
22. 15
Examples
Menascé (2008): squared response time error, constrained to valid solutions
Liu et al. (2006): squared error against the product-form solution (non-linear!)
23. 16
Kalman Filter
Dynamical system
State model: X_k = F_{k−1}·X_{k−1} + B_{k−1}·u_{k−1} + w_{k−1}
(next state = transition of the previous state + controlled input + uncorrelated noise)
Observation model: Z_k = H_k·X_k + v_k
(v_k is the observation noise)
Filter: from the time series of observations z_1, …, z_k, estimate X̂_k ~ N(x̂_k, P̂_k)
(estimated mean value and estimated covariance)
24. 17
Applied to Demand Estimation
State vector X_k = D
Constant state model: X_k = X_{k−1} + w_{k−1}
Observation model (e.g., Kumar et al. 2009):
(R_1, …, U)ᵀ = (D_1/(1 − U), …, X_1·D_1 + ⋯ + X_C·D_C)ᵀ
Other observation models are possible (e.g.,
Zheng et al. 2008, Wang et al. 2012)
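A single-class special case of this filter fits in a few lines: the state is the (assumed constant) demand D and the observation is the utilization U_k = X_k·D + v_k. A pure-Python sketch on synthetic data — the true demand, noise levels, and throughput range are all illustrative:

```python
import random

def kalman_track_demand(throughputs, utilizations, q=1e-6, r=1e-4):
    """Scalar Kalman filter for the constant-state model X_k = X_{k-1} + w
    with observation U_k = x_k * D + v (one class, utilization only).
    q and r are the state and observation noise variances; returns the
    sequence of demand estimates."""
    d_est, p = 0.0, 1.0                 # initial state mean and variance
    estimates = []
    for x, u in zip(throughputs, utilizations):
        p += q                          # predict: constant state + noise w
        k = p * x / (x * x * p + r)     # Kalman gain for H = x
        d_est += k * (u - x * d_est)    # correct with the innovation
        p *= (1 - k * x)
        estimates.append(d_est)
    return estimates

random.seed(1)
true_D = 0.03
xs = [random.uniform(5, 25) for _ in range(200)]        # observed throughputs
us = [x * true_D + random.gauss(0, 0.01) for x in xs]   # noisy utilizations
est = kalman_track_demand(xs, us)
print(est[-1])  # close to the true demand 0.03
```

Because the state model is a random walk, the same filter also tracks demands that drift slowly over time, which is the main appeal of filtering approaches at runtime.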
28. 2
Paradigm Shift
[Diagram: the utilization approach and the response time approach, each composed of data collection, modeling assumptions (scheduling, service distribution), demand estimation, and model generation and solution.]
29. 3
Estimate demand D from response time R
We investigate the likelihood function in
First-Come First-Served (FCFS) queues
– e.g., admission control, disk drive buffers, …
Processor Sharing (PS) queues
– e.g., CPUs, bandwidth sharing, …
Maximum Likelihood Estimator:
1. For each observed R sample
2. Draw D from the parameter space
3. Compute the likelihood P[R|D]
4. Move in the parameter space to maximize P[R|D]
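Steps 2–4 amount to maximizing a likelihood over the parameter space. As a toy illustration (not one of the tutorial's estimators): in an M/M/1-FCFS queue the response time is exponential with rate 1/D − λ, so P[R|D] has a closed form and even a plain grid search recovers D:

```python
import math
import random

def mm1_fcfs_loglik(D, arrival_rate, samples):
    """Log-likelihood of response-time samples in an M/M/1-FCFS queue:
    R ~ Exp(1/D - lambda), valid while the queue is stable (1/D > lambda)."""
    rate = 1.0 / D - arrival_rate
    if rate <= 0:
        return float("-inf")            # unstable guess: impossible
    return sum(math.log(rate) - rate * r for r in samples)

random.seed(7)
lam, true_D = 5.0, 0.1                  # arrivals at 5/s, demand 0.1 s (mu = 10)
resp_rate = 1.0 / true_D - lam          # = 5: response times ~ Exp(5)
samples = [random.expovariate(resp_rate) for _ in range(2000)]

# Steps 2-4 above, implemented as a grid search over candidate demands D
grid = [0.01 + 0.001 * i for i in range(150)]
best_D = max(grid, key=lambda d: mm1_fcfs_loglik(d, lam, samples))
print(best_D)  # close to the true demand 0.1
```

For FCFS and PS queues with general backlogs, the likelihood is instead computed from an absorbing CTMC, as the following slides describe.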
30. 4
Model response time using absorbing CTMCs
Under FCFS, future arrivals do not affect
response time distribution of the tagged job
RT Likelihood in FCFS queues
[Figure: the backlog seen upon arrival (e.g., jobs with demands D1, D2, D2, D1, D1 ahead of the tagged job) maps to the state space of an absorbing Markov chain; n_k denotes the number of class-k jobs in the queue, and the completion of the tagged job is the absorbing transition.]
31. 5
RT Likelihood in FCFS queues
Probability of being absorbed by time t from a given CTMC state
Well understood: PH-type distributed
FCFS example: the backlog seen upon arrival of a class-1 job determines the initial state
ML problem (K classes)
32. 6
Monitoring dataset: admission state (mix) and response times
Example active mix: (1, 2, 1, 1, 0)
Assumptions: multi-core server with V CPUs, W workers, and R classes + class switching
[Figure: requests pass admission and are served by workers on the multi-core server; response times are measured from admission.]
33. 7
PI: Trajectory Inference
Class switching probabilities: users submit requests cyclically, and the requests issued change class over time
Closed class-switching queueing model: V CPUs, R classes ⇒ O(V2R) states
Inference of the full trajectory is too complex
34. 8
Dataset characteristics
CI: Complete information (baseline)
V=1 CPU: full state trajectory
V>1 CPUs: no individual CPU states
We split demand proportionally, taking into
account the active workers
PI: Partial information
Sample admission state and response time
Mean throughput is assumed known
35. 9
CI: Demand estimation
V = 1 CPU: the full demand distribution is recoverable; the demand of request j of class r is its runtime weighted by the number of active workers
V > 1 CPUs: demands are additionally scaled by the number of active CPUs
37. 11
CI requires very detailed measurements
Closed queueing network model
Assume a fixed mix as seen upon job arrival
No class switching (hence tractable)
Model can estimate response time of arriving job
PI: Approximation
[Figure: admitted jobs arrive with a given inter-admission time and are served by CPU-0 and CPU-1, modeled jointly as a single PS queue (CPU 0+1).]
38. 12
RPS: Regression Approach
V = 1 CPU: the model is assumed in equilibrium; the response time of a class-r job is regressed against the queue seen at admission (including the arriving job)
V > 1 CPUs: individual CPU states are estimated; the class-r demand is obtained from the average queue per CPU
40. 14
MLPS: Maximum Likelihood
Maximum Likelihood Estimation (MLPS)
Search over mean demand guesses
Maximize likelihood of observed dataset
Response time likelihood
Tagged customer method (absorbing CTMC)
Initialized with state seen upon admission
Mean demand guess → CTMC rates
41. 15
Job of class 3 arrives at system with V=1 CPU
Mix seen upon arrival: 1 job of class 1, 3 jobs of class 2
We study the transient of this CTMC to obtain the
response time distribution of the class-3 job
MLPS: Absorbing CTMC
[Figure: absorbing CTMC over states (n1, n2, n3); the initial state is (1, 3, 1). Under PS, the tagged class-3 job completes at rate 1/(n·E[D3]) when n jobs are in service (rates 1/(5E[D3]), 1/(4E[D3]), …, 1/E[D3]); the remaining transitions depend on E[D1] and E[D2].]
42. 16
MLPS: Absorbing CTMC
V = 1 CPU: for each sample in the dataset, the likelihood is the probability that the CTMC, initialized with the state seen at admission, follows a trajectory that completes the tagged job within the observed response time r_i; the CTMC generator is built from the completion rates (1/demand)
V > 1 CPUs: load-dependent rates
46. 20
MinPS: Sensitivity Analysis
Magnitude of class demands: with 3 orders of magnitude difference, the CI gap is ~insensitive
Class switch probability: whether switching is frequent or rare, the CI gap is ~insensitive
Non-exponential service: for low CV the CI gap is weakly sensitive; it remains ~insensitive for CV < 2
47. 21
Case Study: SAP ERP
3-tier commercial application
Modified MLPS with setup times
Transactions grouped in R=2 classes
[Figure: workload generator users send requests to the SAP ERP application server (dispatcher + workers), which accesses the SAP ERP database; response times are measured end to end.]
48. 22
SAP ERP Queueing Model
[Figure: measured (MEAS) vs. simulated (SIM) response times, using the estimated demands, for populations N, 5N, 7N, 10N, and 15N; baseline N = 6 class-1 jobs and 4 class-2 jobs; response times lie roughly between 0 and 0.25 s.]
49. 23
Fluid MLPS (FMLPS)
Limit behaviour of the CTMC as rates grow: request dynamics become increasingly deterministic
V = scale factor; the request mix is unchanged
The limit behaviour can be modelled via ODEs
[Figure: state occupancy measure at time t = 100.]
53. 27
Gibbs Sampling (GQL)
Monitor occupancy at all resources
Observations: the problem is ill-posed unless think times are known
Probabilistic model of the distributed system (with a prior)
Gibbs: iteratively sample the posterior
58. 32
GQL vs Other MCMC Methods
Far better convergence properties than
Metropolis-Hastings and Slice sampling
About 13-15% error in estimating demands
against cloud ERP data (Apache OFBiz)
59. 33
QMLE Approximation
Based on Maximum Likelihood Estimation
Works with mean queue length
A simple approximation of the MLE:
Consider the demand vector where
More details at tomorrow’s talk!
61. 35
Tool support
FG - “Filling-the-Gap”
Batch offline analysis, support for Condor
Open Source Software
MCR executables (BSD-3)
Main repo:
https://github.com/Imperial-AESOP/Filling-the-Gap
Manual available in the repo
62. 36
FG: Initial design
Outputs
Model parameters
Visualization
Forecasting
–Requires analysis, but not decision-making
User control knobs
Analysis frequency
Horizon of analysis
Monitoring intensity
Maximum collection window
68. 2
Monitor occupancy at all resources
Observations:
Ill-posed, unless think times known
Probabilistic model of distributed system
Markov-Chain Monte-Carlo (MCMC)
draw samples from target distribution
averaging samples provides estimate
Gibbs Sampling (GQL)
76. 10
QMLE Approximation
Based on Maximum Likelihood Estimation
Works with mean queue length
A simple approximation of the MLE:
Consider the demand vector where
Approach generalizes to load-dependent QNs
More details at tomorrow’s talk!
79. 13
FG: Initial design
Outputs
Model parameters
Visualization
User control knobs
Analysis frequency
Horizon of analysis
Algorithm selection
87. 4
Compared Approaches
Based on Service Demand Law (Brosig et al.
2009)
Utilization Regression (Rolia and Vetland 1995)
Kalman Filter (Kumar et al. 2009)
Optimization 1 (Menascé 2008)
Optimization 2 (Liu et al. 2006)
Response time regression (Kraft et al. 2009)
Gibbs Sampling (Wang et al. 2013)
89. 6
Number of Samples
[Bar chart: relative error (%) of SDL, utilization regression, Kalman filter, Optim. 1, Optim. 2, RT regression, and Gibbs sampling on dataset D1, comparing 600 vs. 3600 samples; errors stay below ~8%.]
The number of samples has only limited impact.
90. 7
Number of Workload Classes
[Bar chart: relative error (%) of the same estimators on dataset D1 with 1, 2, and 5 workload classes; errors reach up to ~180%.]
The number of classes has a strong impact on D1.
91. 8
Number of workload classes
[Bar chart: relative error (%) of the same estimators on dataset D2 with 1, 2, and 3 workload classes; errors stay below ~30%.]
The number of classes has a much smaller impact on D2.
102. 19
Case Study: SAP HANA
Admission control
Multi-tenant application (extended TPC-W)
SAP HANA cloud platform
Supports performance isolation between tenants
IEEE/ACM CCGrid 2014.
106. 23
Case Study: Zimbra
Goal: Automatic vertical CPU scaling of VMs
Zimbra is a collaboration server
Transactional workload
SLA: Mails need to be delivered within 2 minutes
Mails may be queued
IEEE SASO 2014.
108. 25
Layered Performance Model
Application
layer
Virtual resource
layer
Physical resource
layer
[Figure: three layers — the application layer (vApp deployed in VM1 and VM2), the virtual resource layer (vCPUs, adding hypervisor scheduling delays, OS scheduling delays, and wait times for other resources), and the physical resource layer (physical CPUs, whose service rate depends on the hardware).]
Hierarchical modeling approach (Method of Layers [1]): the service time at layer i equals the response time of an underlying closed queueing network at layer i − 1
Load-dependent service demands
109. 26
Influence of Layers
Zimbra MTA with linearly increasing workload:
[Figure: estimated demands (in seconds) over time.]
The estimated demands reflect contention at the hypervisor and application levels.
110. 27
Reconfigurations
Zimbra MTA VM:

Controller              Mean latency [s]  Reconfigurations  Mean vCPUs  Max vCPUs
Model-based             20.48             13                1.4         2
Trigger-based (1 min)   10.82             273               1.83        3
Trigger-based (5 min)   25.97             72                1.46        3
Static allocation       1385              0                 1           1

The model-based controller needs fewer reconfigurations and fewer resources.
111. 28
Bibliography
Menascé, D. A. (2008). “Computing missing service demand
parameters for performance models”. In: CMG Conference
Proceedings, pp. 241–248
Liu, Z., L. Wynter, C. H. Xia, and F. Zhang (2006). “Parameter
inference of queueing models for IT systems using end-to-end
measurements”. In: Perform. Eval. 63.1, pp. 36–60
Kumar, D., A. N. Tantawi, and L. Zhang (2009a). “Real-time
performance modeling for adaptive software systems with multi-
class workload”. In: Proceedings of the 17th Annual Meeting of the
IEEE/ACM International Symposium on Modelling, Analysis and
Simulation of Computer and Telecommunication Systems,
MASCOTS, pp. 1–4
Zheng, T., C. M. Woodside, and M. Litoiu (2008). “Performance
Model Estimation and Tracking Using Optimal Filters”. In: IEEE
Trans. Software Eng. 34.3, pp. 391–406
112. 29
Bibliography
Wang, W., X. Huang, X. Qin, W. Zhang, J. Wei, and H. Zhong
(2012). “Application-Level CPU Consumption Estimation: Towards
Performance Isolation of Multi-tenancy Web Applications”. In:
Proceedings of the 2012 IEEE Fifth International Conference on
Cloud Computing, CLOUD, pp. 439–446
Brosig, F., S. Kounev, and K. Krogmann (2009). “Automated
extraction of palladio component models from running enterprise
Java applications”. In: Proceedings of the 4th International
Conference on Performance Evaluation Methodologies and Tools,
VALUETOOLS, p. 10
Rolia, J. and V. Vetland (1995). “Parameter estimation for
performance models of distributed application systems”. In:
Proceedings of the 1995 Conference of the Centre for Advanced
Studies on Collaborative Research, CASCON, p. 54
113. 30
Bibliography
Kraft, S., S. Pacheco-Sanchez, G. Casale, and S. Dawson (2009).
“Estimating service resource consumption from response time
measurements”. In: Proceedings of the 4th International
Conference on Performance Evaluation Methodologies and Tools,
VALUETOOLS, p. 48
Wang, W. and G. Casale (2013). “Bayesian Service Demand
Estimation Using Gibbs Sampling”. In: Proceedings of the 2013
IEEE 21st International Symposium on Modelling, Analysis and
Simulation of Computer and Telecommunication Systems,
MASCOTS, pp. 567–576
Casale, G., P. Cremonesi, and R. Turrin (2008). “Robust Workload
Estimation in Queueing Network Performance Models”. In: Proceedings
of the 16th Euromicro Conference on Parallel, Distributed and
Network-Based Processing, PDP, pp. 183–187
114. 31
Bibliography
Cremonesi, P., K. Dhyani, and A. Sansottera (2010). “Service Time
Estimation with a Refinement Enhanced Hybrid Clustering Algorithm”.
In: Analytical and Stochastic Modeling Techniques and Applications,
LNCS vol. 6148, Springer, pp. 291–305
Cremonesi, P. and A. Sansottera (2014). “Indirect estimation of
service demands in the presence of structural changes”. In: Perform.
Eval. 73, pp. 18–40
116. 2
Outline
Introduction
Moments and probabilities in Marked MAPs
Fitting of second-order acyclic Marked MAPs
Results
Conclusions
117. 3
Requests Traffic
Time-varying peaks of user activity — will the system sustain the load at high performance?
[Figure: inter-arrival times [µs] vs. request number over Sun–Thu, alternating between fast-rate and slow-rate periods.]
118. 4
Stochastic Models
Arrival Process Modelling: stochastic analysis
Generate statistically similar request arrival patterns
Analytical models accelerate the search for optimal decisions
[Figure: a two-state Markovian traffic model (slow rate ↔ fast rate, with transition probabilities such as 30%/70% and 50%/50%) reproduces the alternating interarrival-time pattern of the trace.]
119. 5
Stochastic Models
Arrival Process Modelling: automated fitting methods
Generate statistically similar request arrival patterns
Analytical models accelerate the search for optimal decisions
[Figure: starting from an initial guess, automated fitting evaluates ~350 candidate Markovian traffic models to match the trace.]
120. 6
Traffic Decomposition for QN
Network of queues:
a mathematical abstraction for prediction, what-if scenarios, …
describes billions of possible states for the resources
efficient output analysis techniques [Smirni, QEST’09]
[Figure: incoming requests, generated by the fitted fast/slow-rate traffic model, flow through the cloud resources — web server (CPU), storage (disks), database (CPUs) — until completion.]
125. 11
PH-type Distribution
CTMC representation PH(D0, α): N transient states, 1 absorbing state, exit vector, no-mass-at-0 assumption
Phase-type distribution: the distribution of the time to absorption
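The mean of a PH(D0, α) distribution is α(−D0)^{-1}·1. A minimal sketch for a hypothetical two-phase example, with the 2x2 inverse written out explicitly:

```python
def ph_mean(alpha, D0):
    """Mean of a PH(D0, alpha) distribution: alpha * (-D0)^{-1} * 1,
    computed with an explicit 2x2 matrix inverse for brevity."""
    # Entries of M = -D0
    a, b = -D0[0][0], -D0[0][1]
    c, d = -D0[1][0], -D0[1][1]
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    row = [alpha[0] * inv[0][0] + alpha[1] * inv[1][0],
           alpha[0] * inv[0][1] + alpha[1] * inv[1][1]]
    return row[0] + row[1]

# Erlang-like example: start in phase 1 (rate 3), then move to phase 2 (rate 2),
# then absorb. Mean time to absorption = 1/3 + 1/2.
alpha = [1.0, 0.0]
D0 = [[-3.0, 3.0], [0.0, -2.0]]
print(ph_mean(alpha, D0))  # 0.8333... = 1/3 + 1/2
```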
126. 12
PH-type Renewal Process
Renewal Process with Phase-type distribution
Inter-arrival times: i.i.d. with PH(D0, α) distribution
Counting process N(t) is a CTMC
Blocks of N states
Block k: N(t) = k
After absorption, go to block k+1
Initial state in block k+1: probability α
Rate of exit from state i and restart from state j = si αj
Q =
| D0  s·α   0    0  … |
|  0  D0   s·α   0  … |
|  0   0   D0   s·α … |
|  …                  |

Diagonal blocks (D0): transitions within block k, where N(t) = k. Superdiagonal blocks (s·α): an arrival event, i.e., exit from block k and restart in block k+1.
127. 13
Some Tools for PH Fitting
EMpht (1996)
http://home.imf.au.dk/asmus/pspapers.html
EM algorithm for ML fitting, based on Runge-Kutta
methods
Local optimization technique
jPhase (2006)
http://copa.uniandes.edu.co/software/jmarkov/index.html
Java library
ML and canonical form fitting algorithms
128. 14
Some Tools for PH Fitting
PhFit (2002)
http://webspn.hit.bme.hu/~telek/tools.htm
Separate fit of distribution body and tail
Both continuous and discrete ML distributions
G-FIT (2007)
http://ls4-www.cs.uni-dortmund.de/home/thummler/gfit.tgz
Hyper-Erlang PHs used as building block
Automatic aggregation of large traces, dramatic
speed-up of computational times compared to EMpht
129. 15
Correlated Arrivals
[Figure: Microsoft Live Maps back-end trace, disk read/write inter-issue times — comparison of phase-type renewal processes vs. Markovian arrival processes.]
130. 16
Markovian Arrival Process
Phase-type Renewal Process
Rate of exit from state i and restart from state j = si αj
Markovian Arrival Process (MAP)
Rate of exit from state i and restart from state j = sij
Generalization of PH-Renewal: allows to model correlation
Q =
| D0  D1   0   0 … |
|  0  D0  D1   0 … |
|  0   0  D0  D1 … |
|  …               |

D0 =
| −λ1  r12  r13 … |
| r21  −λ2  r23 … |
|  …        −λn   |

D1 =
| s11  s12  s13 … |
| s21  s22  s23 … |
|  …         snn  |

Representation: MAP(D0, D1), with interval-stationary initialization.
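A MAP(D0, D1) can be simulated by competing exponential transitions: off-diagonal D0 entries are hidden phase changes, D1 entries produce an arrival. A pure-Python sketch with an illustrative bursty two-state MAP (the rate values are hypothetical):

```python
import random

def simulate_map(D0, D1, n_arrivals, seed=0):
    """Simulate interarrival times of a 2-state MAP(D0, D1).

    From state i, the total outgoing rate is -D0[i][i]; we sample the holding
    time, then pick the winning transition proportionally to its rate."""
    rng = random.Random(seed)
    state, times, t = 0, [], 0.0
    while len(times) < n_arrivals:
        # Candidate transitions: (rate, next state, is it an arrival?)
        cands = [(D0[state][j], j, False) for j in range(2) if j != state]
        cands += [(D1[state][j], j, True) for j in range(2)]
        total = sum(r for r, _, _ in cands if r > 0)
        t += rng.expovariate(total)
        u, acc = rng.random() * total, 0.0
        for r, j, arrival in cands:
            if r <= 0:
                continue
            acc += r
            if u <= acc:
                if arrival:
                    times.append(t)   # record the interarrival time
                    t = 0.0
                state = j
                break
    return times

# Bursty 2-state MAP: fast state 0 (arrival rate 10/s), slow state 1 (1/s)
D0 = [[-10.2, 0.2], [0.1, -1.1]]
D1 = [[10.0, 0.0], [0.0, 1.0]]
iats = simulate_map(D0, D1, 20000, seed=3)
mean_iat = sum(iats) / len(iats)
print(mean_iat)  # between the fast (0.1 s) and slow (1.0 s) regimes
```

The theoretical arrival rate is λ = π·D1·1, with π the stationary vector of D0 + D1; for these matrices λ = 4, so the empirical mean interarrival time should be near 0.25 s.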
131. 17
Tools for MAP fitting
KPC-Toolbox (2008)
http://www.cs.wm.edu/MAPQN/kpctoolbox.html
Moment-matching method
Composition of large MAPs from two-state MAPs (the KPC process)
Property of the KPC process: E[X^k] = E[X_a^k]·E[X_b^k] / k!
(similar relations hold for higher-order moments, ACF, …)
132. 18
Motivation and Goals
Marked Markovian Arrival Processes (MMAPs)
Generalization of MAPs to model multi-class arrivals
Allows modeling non-Poisson, cross-correlated arrivals
Allows efficient solution of the models with matrix-analytic methods
Modeling the arrival process at a queueing system (MMAP[K]/PH[K]/1-FCFS queue)
FCFS queues can be analyzed analytically using the age process
Q-MAM: https://bitbucket.org/qmam/qmam/src
BU-Tools: http://webspn.hit.bme.hu/~telek/tools/butools/
133. 19
Multi-class Arrivals
[Figure: Microsoft Live Maps back-end trace, disk read/write inter-issue times — comparison of Markovian arrival processes vs. marked Markovian arrival processes.]
134. 20
Marked MAPs
(D0, D1) is a representation of the MAP underlying the MMAP
(D0, D11, D12) is a representation of an MMAP[2] process (2 classes)
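From a representation (D0, D11, D12), basic multi-class quantities follow from the stationary vector π of Q = D0 + Σ_c D1c; for example, the per-class arrival probabilities are p_c = π·D1c·1 / (π·D1·1). A minimal two-state sketch with illustrative matrices:

```python
def mmap_class_probabilities(D0, D1_by_class):
    """Stationary per-class arrival probabilities of a 2-state MMAP:
    p_c = (pi * D1c * 1) / (pi * D1 * 1), where pi is the stationary
    vector of the underlying CTMC generator Q = D0 + sum_c D1c."""
    n = 2
    Q = [[D0[i][j] + sum(D[i][j] for D in D1_by_class) for j in range(n)]
         for i in range(n)]
    # Stationary vector of a 2-state generator: pi0*q01 = pi1*q10
    q01, q10 = Q[0][1], Q[1][0]
    pi = [q10 / (q01 + q10), q01 / (q01 + q10)]
    rates = [sum(pi[i] * D[i][j] for i in range(n) for j in range(n))
             for D in D1_by_class]
    total = sum(rates)
    return [r / total for r in rates]

# Hypothetical MMAP[2]: class-1 arrivals in (fast) state 0, class-2 in state 1
D0  = [[-10.2, 0.2], [0.1, -1.1]]
D11 = [[10.0, 0.0], [0.0, 0.0]]
D12 = [[0.0, 0.0], [0.0, 1.0]]
probs = mmap_class_probabilities(D0, [D11, D12])
print(probs)  # [5/6, 1/6]: class 1 dominates because state 0 is visited 1/3
              # of the time but generates arrivals ten times faster
```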
135. 21
Fitting
Fitting problem: from a marked trace (X_i, C_i) measured on a real system, fit an MMAP
Queues with MMAP arrivals can be solved analytically
Two families of methods: maximum likelihood, and matching of moments (or other characteristics)
We focus on moment matching:
more computationally efficient
in real systems, it is easier to save moments than the whole trace
136. 22
Issues of moment matching
Representation of MMAPs is not minimal
Number of parameters >> Degrees of freedom
Hard to obtain analytical fitting formulas for
the parameters
Easy: Parameters -> Moments
Hard: Moments -> Parameters
Requires solving a non-linear system of equations in the
general case
Non-linear least squares for MMAP fitting [Buchholz, 2010]
137. 23
Issues of moment matching
Feasibility: given a number of states n for the
MMAP, which values of the moments can be fitted
exactly?
Related issue: how to perform approximate fitting?
Which characteristics best capture the queueing
behavior?
Caveat 1: not all characteristics have known analytical
formulas
Caveat 2: inverting the analytical formulas might be harder
for some characteristics
139. 25
Definitions
Ordinary moment of order j
Backward moment of order j for class c
Forward moment of order j for class c
Cross moment of order j for class c followed by class k
Probability of a class-c arrival
“Transition” probability of a class-c arrival followed by a class-k arrival
140. 26
Moment Dependencies
Ordinary moments can be expressed as a linear combination of:
the forward moments, weighted by the class probabilities
the backward moments, weighted by the class probabilities
the cross moments, weighted by the class-transition probabilities
For 2 classes and j = 1, this yields a linear system for the cross moments M1ck with 4 unknowns and rank 3: a cross moment might be needed to uniquely determine a second-order MMAP[2].
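The weighting relations above can be checked directly on a marked trace: the ordinary first moment equals the class-probability-weighted sum of the backward moments. A sketch on synthetic data (the class mix and the exponential rates are illustrative):

```python
import random

def marked_trace_moments(trace):
    """First ordinary moment, class probabilities, and per-class backward
    first moments of a marked trace [(x_i, c_i)]: the backward moment B1c
    averages the interarrival time preceding a class-c arrival."""
    n = len(trace)
    m1 = sum(x for x, _ in trace) / n
    p, b1 = {}, {}
    for c in sorted({c for _, c in trace}):
        xs = [x for x, cc in trace if cc == c]
        p[c] = len(xs) / n
        b1[c] = sum(xs) / len(xs)
    return m1, p, b1

random.seed(5)
trace = [(random.expovariate(2.0 if c == 1 else 5.0), c)
         for c in (random.choice([1, 2]) for _ in range(10000))]
m1, p, b1 = marked_trace_moments(trace)

# Ordinary moment = class-probability-weighted sum of backward moments
print(m1, sum(p[c] * b1[c] for c in p))  # the two values agree (up to FP)
```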
142. 28
AMMAP[2] Fitting
An AMMAP[2] has 7 degrees of freedom:
4 for the underlying AMAP — 3 for the marginal phase-type distribution and 1 for the autocorrelation decay
3 for the multi-class characteristics
[Figure: D0 and D1 of the underlying AMAP.]
143. 29
AMMAP[2] Fitting
Any MAP(2) has geometric autocorrelation decay with rate γ
Canonical form for the underlying MAP(2) [Bodrog et al., 2010]
Acyclic: two forms, one for γ > 0 and one for γ < 0
For γ = 0, it reduces to an acyclic phase-type renewal process
Each canonical form (γ > 0 and γ < 0) has 3 degrees of freedom.
146. 32
AMMAP[2] Fitting
How to spend the 3 available degrees of
freedom?
We have found closed, analytical formulas for the
three parameters q11, q21, q22, for both canonical
forms
Three different sets of characteristics considered
Class probabilities and…
1) Forward moments and backward moments
2) Forward moments and class transition probabilities
3) Backward moments and class transition probabilities
147. 33
AMMAP[m] Fitting
How to handle more than 2 classes?
[Figure: per-class characteristics used for fitting, e.g., class probabilities p1 = 0.29, p2 = 0.43, p3 = 0.29 with forward moments F11 = 0.08, F12 = 0.13, F13 = 0.08 and backward moments B11 = 0.08, B12 = 0.12, B13 = 0.09.]
148. 34
M3A Toolbox
Latest version:
https://github.com/Imperial-AESOP/M3A
A set of Matlab functions designed for computing
the statistical descriptors of MMAPs and fitting
marked traces with MMAPs
Syntax compatibility with KPC-Toolbox
– M3A’s MMAPs are treated by KPC-Toolbox as MAPs
154. 40
Bibliography
Casale, G. and E. Smirni (2012). “KPC-toolbox: fitting Markovian
arrival processes and phase-type distributions with MATLAB”. In:
SIGMETRICS Perform. Eval. Rev. 39.4, p. 47
Sansottera, A., G. Casale, and P. Cremonesi (2013). “Fitting
second-order acyclic Marked Markovian Arrival Processes”. In: DSN,
pp. 1–12
Work supported by the EU projects DICE (644869) and MODAClouds (318484) and the EPSRC project
OptiMAM (EP/M009211/1).