Deep Software Variability for Resilient Performance Models of Configurable Systems

Deep Software Variability
for Resilient Performance Models of Conﬁgurable Systems
President : Élisa FROMONT Full Professor, Université de Rennes 1, IUF, FRANCE
Reviewers : Lidia FUENTES Full Professor, Universidad de Málaga, ITIS Software, SPAIN
Rick RABISER Full Professor, Johannes Kepler Universität, AUSTRIA
Examiners : Maxime CORDY Research Scientist, University of Luxembourg, LUXEMBOURG
Pooyan JAMSHIDI Assistant Professor, University of South Carolina, UNITED STATES
Supervisors : Mathieu ACHER Full Professor, INSA Rennes, IUF, FRANCE
Arnaud BLOUIN Associate Professor, INSA Rennes, FRANCE
Jean-Marc JÉZÉQUEL Full Professor, Université de Rennes 1, FRANCE
PhD Defense - Luc Lesoil

Software Systems are everywhere!
Context
Software is eating the world 2/50

Developers provide software options to (de)activate
Options
Context
What’s an option? 3

Software conﬁgurations lead to distinct performance values
Performance
Conﬁguration
x264
--cabac
--me dia
--output compressed.264
video.mkv
x264
--no-cabac
--me tesa
--output compressed.264
video.mkv
time = 34 seconds
time = 2 min 25 seconds
Context
The power of configurations ! 4

2320
# atoms in
universe
231
# seconds in a
human life
grep
11 options 20 000 options
48 options 119 options
A huge[2]
number of options & conﬁgurations
2x
possible
configurations
1 500 options
X
independent
boolean
options
Context
#options is already high & won’t decrease 5
[2] Tianyin Xu, Long Jin, Xuepeng Fan, Yuanyuan Zhou, Shankar Pasupathy, Rukma Talwadker,
Hey, you have given me too many knobs! Understanding and dealing with over-designed configuration in system software, FSE’15, Link

Options
Challenging to interpret options’ effects
~25 MB ~50 MB ~1 GB
RANDOMIZE
BASE
DEBUG
INFO
REDUCED
UBSAN
SANITIZE
ALL
DEBUG
INFO
SPLIT
SENSORS
ADM1031
BACKLIGHT
GPIO
DEBUG
DRIVER
ENCLOSURE
SERVICES
REFCOUNT
FULL
Kernel Size
Performance
….
.conﬁg ﬁle
….
….
~7 MB
DEBUG
INFO
20 000
options
Context
Too many options for humans? 6
+ interactions

Whole
Population of
Configurations
Predict
Performance
Sample
Configurations
Measure
Performance
Train Performance
Model [3]
Learning
x264 --no-cabac --ref 1 …
compressed.264
video.mkv
x264 --cabac --ref 2 …
compressed.264
video.mkv
x264 --cabac --ref 7 …
compressed.264
video.mkv
Performance
Configurations
Machine
Learning
Configurations Performance
Configurations
Machine
Learning
Performance
24 seconds
57 seconds
39 seconds
[3] J. Guo, K. Czarnecki, S. Apel, N. Siegmund and A.
Wąsowski, Variability-aware performance prediction: A
statistical learning approach, ASE’13,
10.1109/ASE.2013.6693089
Sampling, Measuring,
Context
State-of-the-art Solution :
But not too many options for ML 7

Performance
?
?
?
Performance also depends on the software stack
Hardware
Operating
System
Software
Input Data
Problem
Software layer is not enough 9
[4] Pooyan Jamshidi, Norbert Siegmund, Miguel Velez,
Christian Kästner, Akshay Patel, Yuvraj Agarwal,
Transfer Learning for Performance Modeling of
Configurable Systems: An Exploratory Analysis,
ASE’17, Link

10.4
x264
--mbtree
...
x264
--no-mbtree
...
x264
--no-mbtree
...
x264
--mbtree
...
20.04
Dell latitude
7400
Raspberry Pi
4 model B
vertical
animation vertical
animation vertical
animation vertical
animation
Duration (s) 22 25 73 72
6 6 351 359
Size (MB) 28 34 33 21
33 21 28 34
A B
2
1 2
1
Hardware
Operating
System
Software
Input Data
Real-world example with x264
Problem
Example of interactions 10

Age # Cores GPU
Compil. Version
Version Option Distrib.
Size Length Res.
Run-time
Hardware
Operating
System
Software
Input Data
Introducing Deep Variability[5]
Deep Variability =
set of interactions between
the different elements of
software environment
Impacting performance
distributions
Threatens
the generalisation of
performance models
Bug
Perf. ↗
Perf. ↘
Problem
Introducing deep variability 11
[5] L. Lesoil, M.Acher, A.Blouin, and J-M. Jézéquel,
Deep software variability: Towards handling
cross-layer configuration,VaMoS'21, Link

Age # Cores GPU
Compil. Version
Size Length Res.
Run-time
Hardware
Operating
System
Software
Input Data
Deep Variability =
set of interactions between
the different elements of
software environment
Impacting performance
distributions
Threatens
the generalisation of
performance models
Bug
Perf. ↗
Perf. ↘
If performance distributions vary with software stacks,
performance models are obsolete when their environment change
Problem
Introducing deep variability 12
Introducing Deep Variability[5] [5] L. Lesoil, M.Acher, A.Blouin, and J-M. Jézéquel,
Deep software variability: Towards handling
cross-layer configuration,VaMoS'21, Link

Why should I care ?
Problem
Deep Variability Stakeholders 13
Default software
conﬁg. is not optimal
Users Developers
Documentation can
not cover all soft. envs.
Researchers
Performance Models
are not applicable
Scientists
May introduce a bias
in experiments
Companies
Server conﬁguration not
adapted to software

Contributions
Performance models

1. Characterize and spot empirical evidence of the existence of deep variability
2. Propose solutions to include deep variability in performance models
Contributions
Contributions 15

1. Characterize & Spot Deep Variability
On the Input Sensitivity of Conﬁgurable Systems

Software systems process inputs that are different in terms of…
Contrib 1. Spot & Characterize Deep Variability
Define the notion of input 17
Nature Scale & Complexity

On the Input Sensitivity[6]
of Conﬁgurable Systems
+
Input Video 1
+
Input Video 2
Software Input Data
≠ inputs ⇒ ≠ perf. distributions 18
[6] Yufei Ding, Jason Ansel, Kalyan Veeramachaneni, Xipeng Shen, Una-May O’Reilly, Saman Amarasinghe,
Autotuning algorithmic choice for input sensitivity. SIGPLAN Not.‘2015, Link

Performance distributions & models change with inputs
-O2 -fnoasm
-Ofast
-O1
-fﬂoat-store
Software
System
Performance
Conﬁgurations
Machine
Learning
Decision
Tree
Compil. time
Input
Data
This part investigates, demonstrates and quantifies
what effects configurable system inputs have on performance
Study input-config interactions 19

Gather data about Input Sensitivity
Measure performance
5
6
4
3
2
1
8 software systems
≠ domains
≠ inputs
A typical measurement process 20

RQ1 - Do software performance stay consistent across inputs?
+
-
Performance distributions of ① & ➁
are negatively correlated
Spearman correlations
A configuration could be good
for profile ① but not for ➁
Config. ranks change with inputs 21

RQ2 - Do conﬁguration option's effects change with input data?
An option could be good to activate for one input but bad for others
Individual impact of options change with inputs
Options effects change with inputs 22

RQ3 - Can we ignore Input Sensitivity?
S1 S2
We both predict
optimal
configurations
I value
Input
Sensitivity
I do not care
about Input
Sensitivity
Performance up to x10
when considering inputs
S1 ≈ S2 + 38% perf
Significant diff. of performance 23

RQ4 - How do research papers address Input Sensitivity?
65 papers
Q-A. Is there a software system processing input data in the study?
Q-B. Does the experimental protocol include several inputs?
Q-C. Is the problem of Input Sensitivity mentioned e.g. in threat?
Q-D. Does the paper propose a solution to generalize performance
models across inputs?
94%
63%
47%
26%
Inputs in research papers 24

RQ5 - How to quantify Input Sensitivity?
Score of Input Sensitivity IS
0 = not input-sensitive
1 = input-sensitive
Proposing input sensitivity score 25

Input Sensitivity threatens the concrete application of performance models
A model trained on one input will not be reusable on any other input
Conclusion
[7] Stefan Mühlbauer, Florian Sattler, Christian Kaltenecker, Johannes Dorn, Sven Apel, Norbert Siegmund,
Analyzing the Impact of Workloads on Modeling the Performance of Configurable Software Systems, ICSE’23, Link
Our results were also found by another team [7]
Inputs mess with perf. models 26

Age # Cores GPU
Compil. Version
Size Length Res.
Run-time
Hardware
Operating
System
Software
Input Data
Towards modelling Deep Variability…
ICSR’22
+
SPLC’21
ICPE’22
VaMoS’22
JSS’23
TSE’21
+
SPLC’22
Deep variability is real !! 27

Concrete insights from our experiments
Concrete insights about deep var 28
[not pub] Hardware change perf. distributions linearly (few exceptions)
[A] OS parameter can affect software performance evolution
[B+C] OS version change the effect of OS options
[D+E] Compile-time options mostly interact in a linear way with run-time options
[D+E] Non-linear interactions between compile- & run-time are uncommon
[F] Choice of software can change the impact & effect of common options
[G] Inputs can interact in a non-linear way with run-time options
[G] These interactions are limited to some software & performance properties
[A] L. Lesoil, M. Acher, A. Blouin, J-M. Jézéquel, Beware of the
interactions of variability layers when reasoning about evolution of
mongodb, ICPE'22, Link
[B] H. Martin, M. Acher, JA. Pereira, L. Lesoil, J-M. Jezequel, DE.
Khelladi, Transfer learning across variants and versions : The case
of linux kernel size, TSE’21. Link
[C] M. Acher, H. Martin, JA. Pereira, L. Lesoil, A. Blouin, J-M. Jézéquel,
DE. Khelladi, O. Barais, Feature Subset Selection for Learning Huge
Configuration Spaces: The case of Linux Kernel Size, SPLC’22, Link
[D] X. Tërnava, M. Acher, L. Lesoil, A. Blouin, J-M. Jézéquel, Scratching
the surface of ./configure: Learning the effects of compile-time
options on binary size and gadgets, ICSR’22, Link
[E] L. Lesoil, M. Acher, X. Tërnava, A. Blouin, J-M. Jézéquel, The
interplay of compile-time and run-time options for performance
prediction, SPLC'21, Link
[F] L. Lesoil, H. Martin, M. Acher, A. Blouin and J-M. Jezequel,
Transferring performance between distinct configurable systems :
A case study, VaMoS'22, Link
[G] L. Lesoil, M. Acher, A. Blouin, J-M. Jézéquel, Input sensitivity on
the performance of configurable systems : An Empirical Study,
JSS'23, Link

Age # Cores GPU
Compil. Version
Size Length Res.
Run-time
Hardware
Operating
System
Software
Input Data
Towards modelling Deep Variability… with other researchers!
DV is also addressed in SOTA 29
[H] S. Mühlbauer, F. Sattler, C. Kaltenecker, J. Dorn,
S. Apel, N. Siegmund, Analyzing the Impact of
Workloads on Modeling the Performance of
Configurable Software Systems, ICSE’23, Link
[I] S. Mühlbauer, S. Apel, N. Siegmund,
Identifying Software Performance Changes
Across Variants and Versions, ASE’20, Link
[J] MS Iqbal, R. Krishna, MA Javidian, B. Ray,
P. Jamshidi, Unicorn: Reasoning about
Configurable System Performance through the
Lens of Causality, Eurosys’22, Link
–
[K] Marko Boras, Josip Balen, Kresimir Vdovjak,
Performance Evaluation of Linux Operating
Systems, ICSST’20, Link
[L] D. Cotroneo, R. Natella, R. Pietrantuono,
S. Russo, Software Aging Analysis
of the Linux Operating System, ISSRE’10, Link
[H] [I]
[J]
[L]
[K]

1. Characterize and spot empirical evidence of the existence of deep variability
2. Propose solutions to include deep variability in performance models
Contributions
Contributions 30

2. Train Resilient Performance Models
How to embed deep variability in models?

Make Performance Models Resist to Deep Variability
Hardware
Operating
System
Software
Input Data
0.152.2854 0.155.2917
Include deep variability
in the model
Train a performance model
valid for as many possible
software environments
Contrib 2. Train Resilient Performance Models
The future of perf. modelling 32

Current State-the-art solution : Transfer Learning
Source
Input
Perf
P.
target
?
training
prediction
source
?
Shifting function
Source Model
2
1
Source Model
Shifting function
Training
Test
1
Learn the ≠
between
source & target
2
Train a model
on the source
3
Apply ① and ②
on the test set
Model Shift[8]
3
Measuring, Learning
for each new input
Time- & resource-
consuming for users
[8] Pavel Valov, Jean-Christophe Petkovich, Jianmei Guo, Sebastian Fischmeister and Czarnecki Krzysztof,
Transferring Performance Prediction Models Across Different Hardware Platforms, ICPE’17, Link
TL avoids deep variability 33
vertical
animation
Target
Input

Measuring on each new input has a non-negligible cost
Cost of (Transfer) Learning 34
New video
Target
Input
3’
Default config. Optim. config.
1’
10*3’
Source
Input
animation
Target
Input
Optim. config.
1’
100*3’
Fixed config.
3’
Transfer Learning Std. Learning
31’ 301’

How? Contextual performance models[9]
with input properties
Spatial = 2.78
Temporal = 0.18
Chunk = 4.42
Color = 0.19
8 Software Systems
+ Domain knowledge
360p
720p
Complexity
Category = Sports
Input Properties
Include env. properties in training 35
[9] Paul Temple, Mathieu Acher, Jean-Marc Jézéquel, Olivier Barais.
Learning-Contextual Variability Models, IEEE Soft’17, Link

Input-aware Learning
+
+
Input Video 1
+
Input Video 2
Performance
Conﬁgurations Input Properties
Train machine learning models
predicting the performance of software conﬁgurations
AND robusts to the change of input data
Alternative approach 36

Input Properties
RQ1. To what extent are input properties helpful? (1/2)
Classify inputs into previously identified performance profiles
(70% accuracy)
No need for additional measurements
Input properties instead of domain knowledge
Performance proﬁles
① ② ③ ④
Input props for benchmarking 37
Machine
Learning

RQ1. To what extent are input properties helpful? (2/2)
Transfer Learning
Since inputs matter, so does the
choice of the source input [10]
Source
Input
Target
Input
?
Input props for TL source selection 38
4 different policies to select the
best input:
- uniform selection of input
- closest input properties
- closest performance distr.
- input of the same perf. profile
[10] Rahul Krishna, Vivek Nair, Pooyan Jamshidi and Tim Menzies, Whence to Learn?
Transferring Knowledge in Configurable Systems Using BEETLE, TSE’20, Link

RQ2. How do conﬁgurations & inputs affect the model?
Input-aware Learning
is possible
Robust to new inputs
without new
measurements
Prediction error stable ~ 25 inputs
Various results according to software systems
Input-aware models are real 39

RQ3. Which approach to recommend? (1/2)
In general, standard learning beats the
input-aware learning
Better than SOTA for low budgets 40
For small number of configurations,
input-aware learning outperforms
SOTA standard learning

RQ3. Which approach to recommend? (2/2)
Transfer learning > standard learning
with a fancy selection of the source thanks
to input properties
Improving TL with deep var. 41
Knowledge about inputs allows transfer learning
to beat the standard learning

Performance prediction should be quick (measurement cost --)
Input-aware learning is possible
With very low budget of configurations,
Contextual performance models using input properties outperform transfer learning
Conclusion
Few messages 42

Conclusion
Conclusion
- Put a name on (& promote) deep variability
- Gathered data to empirically prove it exists
- Proposed ways to benchmark deep variability
- Practical implications for performance models
Deep variability rocks 44

Open Access
Reproducibility & Availability of our work
Conclusion
Open Research 45
Measurement
process

Why should I care?
Conclusion
Benefits of Deep Variability 46
Recommend optimal
conﬁguration
Researchers
Make performance
models usable
Scientists
Control DV to enable
reproducible science
Companies
Recommend optimal
env. conﬁg for servers
Users Developers
Automatic testing of
deep variability

Deep Variability-Aware Performance-Inﬂuence Models
Hardware
Operating
System
Software
Input Data
Built in 16 minutes
lscpu command cat /etc/*-release
version
compile-time options
run-time options
#LOCs for a .c ﬁle
resolution for a video
Linux version
Linux variant
distribution
#cores
L1/L2/L3 cache
street price
[11] Y. Wang, V. Lee, GY. Wei, D. Brooks, Predicting New Workload or
CPU Performance by Analyzing Public Datasets, TACO’19, Link
Perspectives
Let’s extend that to other layers ! 48
[9]

Build a common benchmark to study Deep Variability
Hardware
Operating
System
Software
Input Data
[12] N. Siegmund, S. Kolesnikov, C. Kästner, S. Apel, D. Batory, M.
Rosenmüller, G. Saake, Predicting Performance via Automated
Feature-Interaction Detection, ICSE’12, Link
Perspectives
A common benchmark for DV 49
0.152.2854 0.155.2917
Weakness of this work:
only one layer at a time !
Share the computational effort
Agree on a common playground to test
deep variability

Deep Software Variability
for Resilient Performance Models of Conﬁgurable Systems

H. Martin, M. Acher, JA. Pereira, L. Lesoil, JM. Jézéquel and DE. Khelladi, Transfer learning across variants and versions:
The case of linux kernel size, Transactions on Software Engineering (TSE’21). https://hal.inria.fr/hal-03358817
Across versions of software systems
of
4.13 4.15 4.20 5.0 5.4 5.7 5.8
Evolve performance models with software
Recycle data
52

Joint evolution of mongoDB change points (top) and performance values (bottom)
Code
User #1 User #2
Thread Level = 512
Perf ↘ Perf ↗
Thread Level = 1
Dev
?
Across time
Interactions between
the runtime environment &
the evolution of the software
L. Lesoil, M. Acher, A. Blouin, JM Jézéquel, Beware of the Interactions of Variability Layers When Reasoning about Evolution of MongoDB,
International Conference on Performance Engineering (ICPE'22). https://hal.archives-ouvertes.fr/hal-03624309/
[1]
53

Tend to confirm the negative results for
hardware of SOTA, e.g. [1]
Across hardware platforms
[1] P. Valov, JC. Petkovich, J. Guo, S. Fischmeister, K. Czarnecki, Transferring Performance Prediction Models Across Different
Hardware Platforms, International Conference on Performance Engineering (ICPE’17). https://dl.acm.org/doi/10.1145/3030207.3030216
Hardware
Software
Input Data
30 clusters of Grid’5000 with different
hardware models
fixed with the same operating system
8 videos
201 configs
Only weak interactions (aka linear)
between hardware and configurations
54

10.4
x264
--mbtree
...
x264
--no-mbtree
...
x264
--no-mbtree
...
x264
--mbtree
...
20.04
Dell latitude
7400
Raspberry Pi
4 model B
vertical
animation vertical
animation vertical
animation vertical
animation
Duration (s) 22 25 73 72
6 6 351 359
Size (MB) 28 34 33 21
33 21 28 34
A B
2
1 2
1
≈*16
≈*12
Hardware
Operating
System
Software
Input Data
Is software variability deeper than expected ?
55
Problem

PoC : Input-aware performance models
Input Data
Configuration Performance
+
This (RESIST) paper proposes to train performance models robusts to the change of input data
L. Lesoil, H. Spieker, A. Gotlieb, M. Acher, A. Blouin and JM. Jézéquel, Learning Input-aware Performance
Models of Configurable Systems: An Empirical Evaluation. No preprint yet. Submitted
Software Input Data
Input-aware Performance Model
56

RQ1. How to choose a (machine learning) algorithm establishing a
relevant performance prediction model?
● Supervised Online Approach
● All Inputs & Systems
● Separate in train-test
Gradient Boosting Tree
~5% prediction error
57

Inputs
OFFLINE
1. Measure performance related to inputs and conﬁgs
2. Train the model
ONLINE
1. Compute input properties
2. Apply the model
Configs Model
Input
Config
Performance
Perf.
Model
2
Input
Config
User
Perf.
1
1
2
Difference Online &Ofﬂine
58

Transfer Learning
Closest Perf. selection
No Many
Few
Yes
No
Are you ready to
measure configurations?
Did someone measure
inputs on this system?
Did someone measure
inputs on this system?
We cannot guarantee
a robust prediction
Supervised - Online
Tune hyperparameters
No
Supervised - Offline
(Random selection)
Cannot predict
Yes
9% 3%
5%
Actionable Conclusion
Online
Offline
59

git clone https://github.com/mirror/x264
./x264 --me tesa
Com
Dow d
Run
Use
./conﬁgure [--enable-asm] …
make
./conﬁgure --disable-asm …
make
./x264 --me umh ./x264 --me tesa ./x264 --me umh
10.6 seconds 3.4 seconds 81.5 seconds 25.9 seconds
At compile- and run-time
A B
1 1
2 2
L. Lesoil, M. Acher, X. Tërnava, A. Blouin and JM. Jézéquel, The Interplay of Compile-time and Run-time Options for Performance
Prediction, International Systems and Software Product Line Conference 2022 (SPLC ’21). https://hal.ird.fr/INRIA/hal-03286127
60

RQ1.1. Do the run-time performances of configurable systems vary with compile-time options?
RQ1 - Do compile-time options change the performance distributions?
6
1
Fixed run-time config.
Different compile-time config.
Results:
- Size => stable
- xz => stable
- x264 => vary with run-time
- nodeJS => vary

RQ1.2. How many performance can we gain/lose when changing the default compile-time configuration?
RQ1 - Do compile-time options change the performance distributions?
6
2
Performance Ratio (r, c) =
performance of the run-time configuration r for the compile-time configuration c
performance of the run-time configuration r for the default compile-time configuration
Results:
- Size => no gain
- xz and poppler => negligible
- Good default performance
- Can vary with input data

RQ2.1. Do compile-time options interact with the run-time options?
63
RQ2 - How to tune software performances at the compile-time level?
Spearman correlations (3a)
Random Forest importances (3b)
Results (nodeJS):
- Compile-time options alter perf. rankings
-> Interplay
- Both compile- and run-time are useful
-> Interplay

RQ2.2. How to use these interactions to find a set of good compile-time options and tune the configurable system?
64
RQ2 - How to tune software performances at the compile-time level?
Predict the best compile-time configuration
Vary the training size : 1%, 5% and 10% of measurements
Depict the performance ratio per input and per training size in Table 3
Results (nodeJS):
Do not need too much data - 5% is enough to get close to the oracle
Up to 50 % improvement of performance

Input Video
Output Size (MB)
20.1 3.8 16.1
19.2 3.7 15
--preset slow
--ref 1
--preset fast
--ref 16
Input Video
Output Size (MB)
4.9 ? ?
? 0.9 11.1
Conﬁguration
Apply
transfer learning
between distinct
software systems
Across concurrent software systems
L. Lesoil, H. Martin, M. Acher, A. Blouin, JM. Jézéquel, Transferring Performance between Distinct Configurable Systems : A Case Study.
International Working Conference on Variability Modelling of Software-Intensive Systems (VaMoS’22). https://hal.inria.fr/hal-03514984/ 65

A Brief History of Transfer Learning
Valov et al
IPCE’17
Hardware
(Model Shift)
Martin et al
TSE’21
Variants &
Versions
(tEAMS)
This paper
VaMoS’22
≠ software
systems
Jamshidi et
al
SEAMS’17
Challenges
Valov et al
IPCE’20
Pareto
frontier
Jamshidi et
al
ASE’17
Hardware &
workloads
Jamshidi et
al
FSE’18
Exploit
similarities
(L2S)
Krishna et
al
TSE’20
Bellwhether
(Beetle)
…applied to software systems…
…and predicting their performance properties
66

24 options
Human ML
Speed Slow Fast
Accuracy Worse Better
With more
training data
Lost Progressing
Motivation Bored Always !
Why do we need Machine Learning?
Same conditions for both
encoded size
201 conﬁgs
For huge conﬁgurable systems, e.g. 20k options for Linux, ML scales but not human
67

Run-time configurations matter
Threads = ’auto-detect’
Tile size = 64 pixels
Progressive refine = True
Thread(s) = 1
Tile size = 12 pixels
Progressive refine = False
Rendering a 3D scene
1
2
~ 3 seconds
~ 3 minutes
Performance
Run-time
configuration
Input data
What about
compile-time
configurations?
68

Problem
Features do not have
the same name
--level --level-idc
The feature of one system
encapsulates one feature
of the other
--fullrange --range full
A feature
is not implemented X --rc-grain
A feature value
is not implemented
--me ‘star’ X
Features do not have the
same default value
--qpmax [51] --qpmax [69]
Different requirements
or feature interactions
.yuv format
=> --input-res
.yuv format
≠> --input-res
Feature ranges differ
between source & target
--crf [0-51] --crf [0-51] --crf [0-69]
Challenges (2/2) - Align Configuration Spaces
Transfer requires a common
configuration space
How to automate the alignment of
config. space?
Different cases
to handle
69

Experiment - Compute the DTW
for all combinations of hardware platforms
Impact of hardware platforms on software evolution
Heatmap of DTW between times series
related to different variants of hardware
ⓑ DTW = 0.38
ⓓ DTW = 5.39
What is the Dynamic Time Warping?
Similar
Different
Result - Identify hardware platforms
having similar evolutions
to reduce the cost of benchmarking

ⓑ DRPC = 1.61%
ⓒ DRPC = 25.07%
Impact of workloads on software evolution
Experiment - Compute the DRPC
distribution for each workload
Result - Identify stable workloads
to use in benchmarks
Daily Relative Percentage Change
● p(t) the performance value at the time t
● d(t, t+1) the number of days between t and t+1

The bias of public datasets of hardware performance
Challenging (even for google & cie) to
build a representative benchmark of
hardware platforms
Phoronix is the result of a life work,
though it is missing many SKUs

History of hardware micro-architectures

Clustering of software systems

Predicting performance of hardware platforms based on properties
Built in 16 minutes
[9] Y. Wang, V. Lee, GY. Wei, D. Brooks, Predicting New Workload or
CPU Performance by Analyzing Public Datasets, TACO’19, Link
[9]
Case 1 - Prediction for New SKUs

Case 2 - Prediction for New Systems
78

Case 3 - Cross-Prediction Between Suites (aka different sources)
79

x264
on Youtube UGC
to compress
input videos
no-mbtree
no-cabac
Wang, Yilin, Sasi Inguva, and Balu Adsumilli.
"YouTube UGC dataset for video compression research."
IEEE 21st International Workshop on Multimedia Signal Processing (MMSP) 2019
https://media.withyoutube.com/
encoding size
encoding time
fps
cpu consumption
bitrate
spatial/temporal/chunk complexity
resolution of the video
video fps
Input
videos
Conﬁgs
Properties
Perfs
80

fno-asm
O1/02/Ofast
gcc
on PolyBench
to compile
input .c programs
Louis-Noel Pouchet
Polybench: The polyhedral benchmark suite v3.1
http://web.cs.ucla.edu/~pouchet/software/polybench/
binary size,
compilation time,
execution time
size of the program
# LOCs
# methods
# imports
Input
programs
Conﬁgs
Properties Perfs
81

quality
thread
ImageMagick
on an excerpt of ImageNet
to blur
input images
“ImageNet: A large-scale hierarchical image database.“
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei.
IEEE Conf. on Computer Vision and Pattern Recognition, 2009
https://doi.org/10.1109/CVPR.2009.5206848
size of the result
extraction time
initial size of image
# rgb
Input
images
Conﬁgs
Properties
Perfs
82

Lingeling memlimit
minimize
lingeling
on SAT Compet. bench.
to solve
input formulae
Francisco Gomes de Oliveira Neto, Richard Torkar et al.
Evolution of statistical analysis in empirical software engineering research [...]
Journal of Systems and Software, 2019
https://doi.org/10.1016/j.jss.2019.07.002
#conﬂicts
#reductions
# propositions
# and
# or
Input
formulae
Conﬁgs
Properties
Perfs
83

debug
wasm
NodeJS
on its test suite
to interpret
.js scripts
NodeJS test suite:
https://github.com/nodejs/node
# operations per
second
size of the script
# LOCs
# methods
# imports
Input
scripts
Conﬁgs
Properties
Perfs
84

poppler
on Trent Nelson’s list
to extract images
out of input .pdf ﬁles
ccit
format
Trent Nelson
Technically-oriented pdf collection (github repo) 2014
https://github.com/tpn/pdfs
size of the comp. images
time
avg size of images
# pages
# images
Input pdfs
Conﬁgs
Properties
Perfs
85

maxsize
memtrace
SQLite
on TPC-H
to query
input databases
Meikel Poess and Chris Floyd
New TPC Benchmarks for Decision Support and Web Commerce
SIGMOD 2000
https://doi.org/10.1145/369275.369291
15 queries
-> time to handle the query
# memory size
# lines
Input DBs
Conﬁgs
Properties
Perfs
86

format
level
memory
xz
on different corpora
to compress
input files
The Canterbury Corpus + The Silesia Corpus.
http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
size
time
type of file
size
Input files
Configs
Properties
Perfs
87

Devs are aware of the input sensitivity problem
88
browsing commits of x264

Feature Subset Selection for Learning Huge Configuration Spaces:
The case of Linux Kernel Size
https://www.kaggle.com/competitions/linux-kernel-size/overview
90
We can benefit from contributions of the machine
learning community…
And our dataset/problems are raising interests.

“Neuroimaging pipelines are known to generate different results
depending on the computing platform where they are compiled and
executed.” Significant differences were revealed between
FreeSurfer version v5.0.0 and the two earlier versions.
[...] About a factor two smaller differences were detected
between Macintosh and Hewlett-Packard workstations
and between OSX 10.5 and OSX 10.6. The observed
differences are similar in magnitude as effect sizes
reported in accuracy evaluations and neurodegenerative
studies.
see also Krefting, D., Scheel, M., Freing, A., Specovius, S., Paul, F., and
Brandt, A. (2011). “Reliability of quantitative neuroimage analysis using
freesurfer in distributed environments,” in MICCAI Workshop on
High-Performance and Distributed Computing for Medical Imaging. 91

“Neuroimaging pipelines are known to generate different results
depending on the computing platform where they are compiled and
executed.”
Reproducibility of neuroimaging
analyses across operating
systems, Glatard et al., Front.
Neuroinform., 24 April 2015
The implementation of mathematical functions manipulating single-precision floating-point
numbers in libmath has evolved during the last years, leading to numerical differences in
computational results. While these differences have little or no impact on simple analysis
pipelines such as brain extraction and cortical tissue classification, their accumulation
creates important differences in longer pipelines such as the subcortical tissue
classification, RSfMRI analysis, and cortical thickness extraction.
92

Can a coupled ESM simulation be restarted from a diﬀerent machine without causing
climate-changing modiﬁcations in the results? Using two versions of EC-Earth: one “non-replicable”
case (see below) and one replicable case.
93

Deep Software Variability for Resilient Performance Models of Configurable Systems

Recommended

Recommended

More Related Content

Similar to Deep Software Variability for Resilient Performance Models of Configurable Systems

Similar to Deep Software Variability for Resilient Performance Models of Configurable Systems (20)

More from Luc Lesoil

More from Luc Lesoil (6)

Recently uploaded

Recently uploaded (20)

Deep Software Variability for Resilient Performance Models of Configurable Systems