Deep Software Variability
for Resilient Performance Models of Configurable Systems
President: Élisa FROMONT, Full Professor, Université de Rennes 1, IUF, FRANCE
Reviewers: Lidia FUENTES, Full Professor, Universidad de Málaga, ITIS Software, SPAIN
Rick RABISER, Full Professor, Johannes Kepler Universität, AUSTRIA
Examiners: Maxime CORDY, Research Scientist, University of Luxembourg, LUXEMBOURG
Pooyan JAMSHIDI, Assistant Professor, University of South Carolina, UNITED STATES
Supervisors: Mathieu ACHER, Full Professor, INSA Rennes, IUF, FRANCE
Arnaud BLOUIN, Associate Professor, INSA Rennes, FRANCE
Jean-Marc JÉZÉQUEL, Full Professor, Université de Rennes 1, FRANCE
PhD Defense - Luc Lesoil
Software Systems are everywhere!
Context
Software is eating the world 2/50
Developers provide software options to (de)activate
Options
Context
What's an option? 3
Software conļ¬gurations lead to distinct performance values
Performance
Conļ¬guration
x264
--cabac
--me dia
--output compressed.264
video.mkv
x264
--no-cabac
--me tesa
--output compressed.264
video.mkv
time = 34 seconds
time = 2 min 25 seconds
Context
The power of configurations! 4
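The gap can be reproduced by scripting the two invocations; a minimal sketch (assuming x264 is installed and a local video.mkv, with the options taken from the slide):

```python
import subprocess
import time

# Two x264 configurations from the slide; absolute timings differ per machine.
CONFIGS = [
    ["--cabac", "--me", "dia"],
    ["--no-cabac", "--me", "tesa"],
]

for opts in CONFIGS:
    start = time.perf_counter()
    subprocess.run(
        ["x264", *opts, "--output", "compressed.264", "video.mkv"],
        check=True, capture_output=True,
    )
    print(" ".join(opts), f"-> {time.perf_counter() - start:.1f} s")
```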
2^320 > # atoms in the universe
2^31 ≈ # seconds in a human life
A huge[2] number of options & configurations: x independent boolean options ⇒ 2^x possible configurations
Example option counts: grep: 11 options; others: 48, 119, 1 500, … up to 20 000 options (Linux kernel)
Context
#options is already high & won't decrease 5
[2] Tianyin Xu, Long Jin, Xuepeng Fan, Yuanyuan Zhou, Shankar Pasupathy, Rukma Talwadker, Hey, you have given me too many knobs! Understanding and dealing with over-designed configuration in system software, FSE'15, Link
Options
Challenging to interpret options' effects
Example Linux kernel .config files: options such as RANDOMIZE_BASE, DEBUG_INFO_REDUCED, UBSAN_SANITIZE_ALL, DEBUG_INFO_SPLIT, SENSORS_ADM1031, BACKLIGHT_GPIO, DEBUG_DRIVER, ENCLOSURE_SERVICES, REFCOUNT_FULL, DEBUG_INFO, … yield kernel sizes of ~7 MB, ~25 MB, ~50 MB, or ~1 GB
Kernel size as a performance property: 20 000 options + interactions
Context
Too many options for humans? 6
State-of-the-art solution: Sampling, Measuring, Learning
1. Sample configurations from the whole population of configurations
2. Measure performance, e.g.:
x264 --no-cabac --ref 1 … video.mkv → compressed.264: 24 seconds
x264 --cabac --ref 2 … video.mkv → compressed.264: 57 seconds
x264 --cabac --ref 7 … video.mkv → compressed.264: 39 seconds
3. Train a performance model[3] via machine learning
4. Predict the performance of the whole population of configurations
[3] J. Guo, K. Czarnecki, S. Apel, N. Siegmund and A. Wąsowski, Variability-aware performance prediction: A statistical learning approach, ASE'13, 10.1109/ASE.2013.6693089
Context
But not too many options for ML 7
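A minimal sketch of this sample-measure-learn loop (toy data standing in for real measurements; the regressor choice is illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# One row per sampled configuration: 1 = option enabled
# (hypothetical columns: cabac, ref=1, ref=2, ref=7).
sampled_configs = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
])
measured_times = np.array([24.0, 57.0, 39.0])  # seconds, from the slide

# Train a performance model on the sample ...
model = DecisionTreeRegressor().fit(sampled_configs, measured_times)

# ... then predict unseen configurations without measuring them.
unseen_config = np.array([[0, 0, 1, 0]])
print(model.predict(unseen_config))
```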
Problem
Performance also depends on the software stack
Hardware · Operating System · Software · Input Data
Problem
Software layer is not enough 9
[4] Pooyan Jamshidi, Norbert Siegmund, Miguel Velez, Christian Kästner, Akshay Patel, Yuvraj Agarwal, Transfer Learning for Performance Modeling of Configurable Systems: An Exploratory Analysis, ASE'17, Link
Real-world example with x264: encoding two input videos ("vertical" and "animation") with --mbtree vs --no-mbtree, on a Dell Latitude 7400 and a Raspberry Pi 4 Model B (OS versions 10.4 and 20.04):

              Dell Latitude 7400         Raspberry Pi 4 Model B
              --mbtree    --no-mbtree    --no-mbtree    --mbtree
Duration (s)   22 / 6      25 / 6         73 / 351       72 / 359
Size (MB)      28 / 33     34 / 21        33 / 28        21 / 34
(values given as vertical / animation)

Hardware · Operating System · Software · Input Data
Problem
Example of interactions 10
Introducing Deep Variability[5]
Layer factors: Age, # Cores, GPU (Hardware); Version, Distrib. (Operating System); Version, Option, Compil., Run-time (Software); Size, Length, Res. (Input Data)
Deep Variability = set of interactions between the different elements of the software environment, impacting performance distributions (bugs, perf. ↗, perf. ↘) and threatening the generalisation of performance models
Problem
Introducing deep variability 11
[5] L. Lesoil, M. Acher, A. Blouin, and J-M. Jézéquel, Deep software variability: Towards handling cross-layer configuration, VaMoS'21, Link
If performance distributions vary with software stacks, performance models become obsolete when their environment changes
Problem
Introducing deep variability 12
Why should I care?
Problem
Deep Variability Stakeholders 13
Users: the default software config. is not optimal
Developers: documentation cannot cover all software environments
Researchers: performance models are not applicable
Scientists: deep variability may introduce a bias in experiments
Companies: server configuration not adapted to the software
Contributions
1. Characterize and spot empirical evidence of the existence of deep variability
2. Propose solutions to include deep variability in performance models
Contributions 15
1. Characterize & Spot Deep Variability
On the Input Sensitivity of Configurable Systems
Software systems process inputs that are different in terms of… nature, scale & complexity
Contrib 1. Spot & Characterize Deep Variability
Define the notion of input 17
On the Input Sensitivity[6] of Configurable Systems
Software + Input Data: Input Video 1 vs Input Video 2
Contrib 1. Spot & Characterize Deep Variability
≠ inputs ⇒ ≠ perf. distributions 18
[6] Yufei Ding, Jason Ansel, Kalyan Veeramachaneni, Xipeng Shen, Una-May O'Reilly, Saman Amarasinghe, Autotuning algorithmic choice for input sensitivity, SIGPLAN Not. 2015, Link
Performance distributions & models change with inputs
Example: gcc configurations (-O2 -fno-asm, -Ofast, -O1 -ffloat-store) × input data → software system → compil. time; a decision tree is learned from the measured configurations
This part investigates, demonstrates and quantifies what effects configurable system inputs have on performance
Contrib 1. Spot & Characterize Deep Variability
Study input-config interactions 19
Gather data about Input Sensitivity: measure the performance of 8 software systems (≠ domains), each on ≠ inputs
Contrib 1. Spot & Characterize Deep Variability
A typical measurement process 20
RQ1 - Does software performance stay consistent across inputs?
Performance distributions of profiles ① & ④ are negatively correlated (Spearman correlations): a configuration could be good for profile ① but not for ④
Contrib 1. Spot & Characterize Deep Variability
Config. ranks change with inputs 21
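A minimal sketch of the rank-correlation check behind this RQ (toy numbers; assuming SciPy):

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical times (s) of the same five configurations on two inputs;
# only the configuration rankings matter for Spearman's rho.
perf_input_1 = np.array([24.0, 57.0, 39.0, 12.0, 45.0])
perf_input_2 = np.array([80.0, 15.0, 40.0, 95.0, 30.0])

rho, _ = spearmanr(perf_input_1, perf_input_2)
print(f"Spearman rho = {rho:.2f}")  # -1.00 here: configuration ranks are inverted
```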
RQ2 - Do configuration options' effects change with input data?
The individual impact of options changes with inputs: an option could be good to activate for one input but bad for others
Contrib 1. Spot & Characterize Deep Variability
Options' effects change with inputs 22
RQ3 - Can we ignore Input Sensitivity?
S1 ("I value Input Sensitivity") and S2 ("I do not care about Input Sensitivity") both predict optimal configurations
Performance up to ×10 when considering inputs; S1 ≈ S2 + 38% perf
Contrib 1. Spot & Characterize Deep Variability
Significant diff. of performance 23
RQ4 - How do research papers address Input Sensitivity? (65 papers)
Q-A. Is there a software system processing input data in the study? 94%
Q-B. Does the experimental protocol include several inputs? 63%
Q-C. Is the problem of Input Sensitivity mentioned, e.g., as a threat to validity? 47%
Q-D. Does the paper propose a solution to generalize performance models across inputs? 26%
Contrib 1. Spot & Characterize Deep Variability
Inputs in research papers 24
RQ5 - How to quantify Input Sensitivity?
Score of Input Sensitivity (IS): 0 = not input-sensitive, 1 = input-sensitive
Contrib 1. Spot & Characterize Deep Variability
Proposing an input sensitivity score 25
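The exact score is defined in the thesis ([G], JSS'23); as a rough sketch of the underlying idea only (my own instantiation via rank correlations, not necessarily the thesis' formula), rank instability across inputs can be mapped to [0, 1]:

```python
import numpy as np
from scipy.stats import spearmanr

def input_sensitivity(perf: np.ndarray) -> float:
    """Sketch: 0 = configuration ranks identical across inputs, 1 = fully unstable.

    perf[i, j] = performance of configuration j on input i.
    """
    n = perf.shape[0]
    rhos = [spearmanr(perf[a], perf[b])[0]
            for a in range(n) for b in range(a + 1, n)]
    # Mean rank correlation in [-1, 1], mapped to a sensitivity in [0, 1].
    return (1.0 - float(np.mean(rhos))) / 2.0

perf = np.array([[24.0, 57.0, 39.0],   # input 1
                 [25.0, 60.0, 41.0],   # input 2: same ranks as input 1
                 [70.0, 10.0, 50.0]])  # input 3: ranks inverted
print(round(input_sensitivity(perf), 2))  # ~0.67: quite input-sensitive
```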
Conclusion
Input Sensitivity threatens the concrete application of performance models: a model trained on one input will not be reusable on any other input
Our results were independently confirmed by another team [7]
[7] Stefan Mühlbauer, Florian Sattler, Christian Kaltenecker, Johannes Dorn, Sven Apel, Norbert Siegmund, Analyzing the Impact of Workloads on Modeling the Performance of Configurable Software Systems, ICSE'23, Link
Contrib 1. Spot & Characterize Deep Variability
Inputs mess with perf. models 26
Towards modelling Deep Variability…
Hardware · Operating System · Software · Input Data
Our publications across the layers: TSE'21 + SPLC'22, ICSR'22 + SPLC'21, ICPE'22, VaMoS'22, JSS'23
Contrib 1. Spot & Characterize Deep Variability
Deep variability is real!! 27
Concrete insights from our experiments
Contrib 1. Spot & Characterize Deep Variability
Concrete insights about deep var 28
[not pub] Hardware changes perf. distributions linearly (few exceptions)
[A] An OS parameter can affect software performance evolution
[B+C] The OS version changes the effect of OS options
[D+E] Compile-time options mostly interact in a linear way with run-time options
[D+E] Non-linear interactions between compile- & run-time options are uncommon
[F] The choice of software can change the impact & effect of common options
[G] Inputs can interact in a non-linear way with run-time options
[G] These interactions are limited to some software systems & performance properties
[A] L. Lesoil, M. Acher, A. Blouin, J-M. Jézéquel, Beware of the interactions of variability layers when reasoning about evolution of mongodb, ICPE'22, Link
[B] H. Martin, M. Acher, JA. Pereira, L. Lesoil, J-M. Jezequel, DE. Khelladi, Transfer learning across variants and versions: The case of linux kernel size, TSE'21, Link
[C] M. Acher, H. Martin, JA. Pereira, L. Lesoil, A. Blouin, J-M. Jézéquel, DE. Khelladi, O. Barais, Feature Subset Selection for Learning Huge Configuration Spaces: The case of Linux Kernel Size, SPLC'22, Link
[D] X. Tërnava, M. Acher, L. Lesoil, A. Blouin, J-M. Jézéquel, Scratching the surface of ./configure: Learning the effects of compile-time options on binary size and gadgets, ICSR'22, Link
[E] L. Lesoil, M. Acher, X. Tërnava, A. Blouin, J-M. Jézéquel, The interplay of compile-time and run-time options for performance prediction, SPLC'21, Link
[F] L. Lesoil, H. Martin, M. Acher, A. Blouin and J-M. Jezequel, Transferring performance between distinct configurable systems: A case study, VaMoS'22, Link
[G] L. Lesoil, M. Acher, A. Blouin, J-M. Jézéquel, Input sensitivity on the performance of configurable systems: An Empirical Study, JSS'23, Link
Towards modelling Deep Variability… with other researchers!
Hardware · Operating System · Software · Input Data
Contrib 1. Spot & Characterize Deep Variability
DV is also addressed in SOTA 29
[H] S. Mühlbauer, F. Sattler, C. Kaltenecker, J. Dorn, S. Apel, N. Siegmund, Analyzing the Impact of Workloads on Modeling the Performance of Configurable Software Systems, ICSE'23, Link
[I] S. Mühlbauer, S. Apel, N. Siegmund, Identifying Software Performance Changes Across Variants and Versions, ASE'20, Link
[J] MS Iqbal, R. Krishna, MA Javidian, B. Ray, P. Jamshidi, Unicorn: Reasoning about Configurable System Performance through the Lens of Causality, EuroSys'22, Link
[K] Marko Boras, Josip Balen, Kresimir Vdovjak, Performance Evaluation of Linux Operating Systems, ICSST'20, Link
[L] D. Cotroneo, R. Natella, R. Pietrantuono, S. Russo, Software Aging Analysis of the Linux Operating System, ISSRE'10, Link
1. Characterize and spot empirical evidence of the existence of deep variability
2. Propose solutions to include deep variability in performance models
Contributions 30
2. Train Resilient Performance Models
How to embed deep variability in models?
Make Performance Models Resist Deep Variability
Hardware · Operating System · Software · Input Data
Include deep variability in the model: train a performance model valid for as many software environments as possible
Contrib 2. Train Resilient Performance Models
The future of perf. modelling 32
Current state-of-the-art solution: Transfer Learning (Model Shift[8])
① Learn the ≠ (a shifting function) between source & target
② Train a model on the source
③ Apply ① and ② on the test set
Measuring & learning for each new (target) input: time- & resource-consuming for users
Contrib 2. Train Resilient Performance Models
[8] Pavel Valov, Jean-Christophe Petkovich, Jianmei Guo, Sebastian Fischmeister and Krzysztof Czarnecki, Transferring Performance Prediction Models Across Different Hardware Platforms, ICPE'17, Link
TL avoids deep variability 33
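A minimal sketch of the model-shift idea (synthetic data; a linear shifting function in the spirit of [8], not the authors' exact pipeline):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical data: 100 configurations of 4 boolean options each.
configs = rng.integers(0, 2, size=(100, 4)).astype(float)
perf_source = 10 + 5 * configs[:, 0] + 3 * configs[:, 1] + rng.normal(0, 0.5, 100)
perf_target = 2.2 * perf_source + 4  # the target environment shifts performance

# (2) Train a model on the fully measured source environment.
source_model = RandomForestRegressor(random_state=0).fit(configs, perf_source)

# (1) Learn a shifting function from a few measurements on the target.
few = rng.choice(100, size=10, replace=False)
src_pred = source_model.predict(configs[few]).reshape(-1, 1)
shift = LinearRegression().fit(src_pred, perf_target[few])

# (3) Apply model + shift to predict the whole target environment.
pred = shift.predict(source_model.predict(configs).reshape(-1, 1))
print(f"mean abs. error on target: {np.abs(pred - perf_target).mean():.2f}")
```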
Measuring on each new input has a non-negligible cost (assuming 3' per measurement)
Transfer Learning: 10×3' of measurements on the new target video + 1' to derive the optim. config. = 31'
Standard Learning: 100×3' of measurements on the new target video + 1' to derive the optim. config. = 301'
Contrib 2. Train Resilient Performance Models
Cost of (Transfer) Learning 34
How? Contextual performance models[9] with input properties
Input properties (e.g., for a video: Spatial = 2.78, Temporal = 0.18, Chunk = 4.42, Color = 0.19; resolution 360p/720p; complexity; Category = Sports) + domain knowledge, across 8 software systems
Contrib 2. Train Resilient Performance Models
Include env. properties in training 35
[9] Paul Temple, Mathieu Acher, Jean-Marc Jézéquel, Olivier Barais, Learning-Contextual Variability Models, IEEE Soft'17, Link
Input-aware Learning
Input Video 1 + Input Video 2 → Configurations + Input Properties → Performance
Train machine learning models predicting the performance of software configurations AND robust to the change of input data
Contrib 2. Train Resilient Performance Models
Alternative approach 36
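A minimal sketch of such an input-aware model (synthetic data; gradient-boosted trees match the algorithm retained later in these slides, but the feature names here are hypothetical):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)

# Training matrix = configuration options ++ input properties.
n = 200
options = rng.integers(0, 2, size=(n, 3)).astype(float)  # e.g. cabac, mbtree, ...
props = rng.uniform(0, 5, size=(n, 2))                   # e.g. spatial, temporal
X = np.hstack([options, props])
y = 20 + 8 * options[:, 0] * props[:, 0] + rng.normal(0, 1, n)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# For a new input only its (cheap) properties are computed:
# no new performance measurement is required.
new_video_props = np.array([2.78, 0.18])
candidate_config = np.array([1.0, 0.0, 1.0])
x = np.hstack([candidate_config, new_video_props]).reshape(1, -1)
print(model.predict(x))
```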
RQ1. To what extent are input properties helpful? (1/2)
Machine learning classifies inputs into previously identified performance profiles ① ② ③ ④ (70% accuracy) from input properties alone
No need for additional measurements: input properties instead of domain knowledge
Contrib 2. Train Resilient Performance Models
Input props for benchmarking 37
RQ1. To what extent are input properties helpful? (2/2)
Transfer Learning: since inputs matter, so does the choice of the source input [10]
4 different policies to select the best source input:
- uniform selection of input
- closest input properties
- closest performance distr.
- input of the same perf. profile
Contrib 2. Train Resilient Performance Models
Input props for TL source selection 38
[10] Rahul Krishna, Vivek Nair, Pooyan Jamshidi and Tim Menzies, Whence to Learn? Transferring Knowledge in Configurable Systems Using BEETLE, TSE'20, Link
RQ2. How do configurations & inputs affect the model?
Input-aware learning is possible: robust to new inputs without new measurements
Prediction error stabilizes after ~25 training inputs; results vary across software systems
Contrib 2. Train Resilient Performance Models
Input-aware models are real 39
RQ3. Which approach to recommend? (1/2)
In general, standard learning beats input-aware learning; but for a small number of configurations, input-aware learning outperforms SOTA standard learning
Contrib 2. Train Resilient Performance Models
Better than SOTA for low budgets 40
RQ3. Which approach to recommend? (2/2)
Transfer learning > standard learning when the source is carefully selected thanks to input properties
Contrib 2. Train Resilient Performance Models
Improving TL with deep var. 41
Conclusion
Knowledge about inputs allows transfer learning to beat standard learning
Performance prediction should be quick (measurement cost matters)
Input-aware learning is possible
With a very low budget of configurations, contextual performance models using input properties outperform transfer learning
Contrib 2. Train Resilient Performance Models
Few messages 42
Conclusion
- Put a name on (& promoted) deep variability
- Gathered data to empirically prove it exists
- Proposed ways to benchmark deep variability
- Practical implications for performance models
Deep variability rocks 44
Open Access: reproducibility & availability of our work (incl. the measurement process)
Conclusion
Open Research 45
Why should I care?
Conclusion
Benefits of Deep Variability 46
Users: recommend the optimal configuration
Developers: automatic testing of deep variability
Researchers: make performance models usable
Scientists: control DV to enable reproducible science
Companies: recommend the optimal env. config for servers
Perspectives
Deep Variability-Aware Performance-Influence Models
Hardware: # cores, L1/L2/L3 cache, street price (via the lscpu command)
Operating System: Linux version, Linux variant, distribution (via cat /etc/*-release)
Software: version, compile-time options, run-time options
Input Data: # LOCs for a .c file, resolution for a video
Built in 16 minutes
[11] Y. Wang, V. Lee, GY. Wei, D. Brooks, Predicting New Workload or CPU Performance by Analyzing Public Datasets, TACO'19, Link
Let's extend that to other layers! 48
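A minimal sketch of harvesting such layer properties on Linux (lscpu output fields vary across versions; treat the parsed keys as assumptions):

```python
import platform
import subprocess

def env_features() -> dict:
    """Collect a few hardware/OS properties usable as model features."""
    feats = {"machine": platform.machine(), "os_release": platform.release()}
    try:
        out = subprocess.run(["lscpu"], capture_output=True, text=True).stdout
        for line in out.splitlines():
            if line.startswith(("CPU(s):", "Model name:", "L3 cache:")):
                key, _, value = line.partition(":")
                feats[key.strip()] = value.strip()
    except FileNotFoundError:  # lscpu is Linux-only
        pass
    return feats

print(env_features())
```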
Build a common benchmark to study Deep Variability
Hardware · Operating System · Software · Input Data
Weakness of this work: only one layer varied at a time!
Share the computational effort; agree on a common playground to test deep variability
[12] N. Siegmund, S. Kolesnikov, C. Kästner, S. Apel, D. Batory, M. Rosenmüller, G. Saake, Predicting Performance via Automated Feature-Interaction Detection, ICSE'12, Link
Perspectives
A common benchmark for DV 49
Deep Software Variability
for Resilient Performance Models of Configurable Systems
Back-up slides
Across versions of software systems
Evolve performance models with the software (Linux kernel 4.13, 4.15, 4.20, 5.0, 5.4, 5.7, 5.8); recycle data
H. Martin, M. Acher, JA. Pereira, L. Lesoil, JM. Jézéquel and DE. Khelladi, Transfer learning across variants and versions: The case of linux kernel size, Transactions on Software Engineering (TSE'21). https://hal.inria.fr/hal-03358817
52
Across time: interactions between the runtime environment & the evolution of the software
Joint evolution of mongoDB change points (top) and performance values (bottom)
User #1 (Thread Level = 512): Perf ↘ · User #2 (Thread Level = 1): Perf ↗ · Dev: ?
[1] L. Lesoil, M. Acher, A. Blouin, JM. Jézéquel, Beware of the Interactions of Variability Layers When Reasoning about Evolution of MongoDB, International Conference on Performance Engineering (ICPE'22). https://hal.archives-ouvertes.fr/hal-03624309/
53
Across hardware platforms
30 clusters of Grid'5000 with different hardware models, fixed with the same operating system; 8 videos × 201 configs
Only weak (aka linear) interactions between hardware and configurations
Tend to confirm the negative results for hardware of SOTA, e.g. [1]
[1] P. Valov, JC. Petkovich, J. Guo, S. Fischmeister, K. Czarnecki, Transferring Performance Prediction Models Across Different Hardware Platforms, International Conference on Performance Engineering (ICPE'17). https://dl.acm.org/doi/10.1145/3030207.3030216
Hardware · Software · Input Data
54
Is software variability deeper than expected?
(Same x264 experiment as slide 10: for the same configuration, durations differ by ≈×16 and ≈×12 across environments)
Problem
55
PoC: Input-aware performance models
Input Data + Configuration → Performance
This (RESIST) paper proposes to train performance models robust to the change of input data
L. Lesoil, H. Spieker, A. Gotlieb, M. Acher, A. Blouin and JM. Jézéquel, Learning Input-aware Performance Models of Configurable Systems: An Empirical Evaluation. No preprint yet. Submitted.
Input-aware Performance Model
56
RQ1. How to choose a (machine learning) algorithm establishing a relevant performance prediction model?
● Supervised Online Approach
● All Inputs & Systems
● Separate in train-test
Gradient Boosting Tree: ~5% prediction error
57
Difference Online & Offline
OFFLINE: 1. Measure performance related to inputs and configs; 2. Train the model
ONLINE: 1. Compute input properties; 2. Apply the model
58
Actionable Conclusion
Are you ready to measure configurations?
- No → Did someone measure inputs on this system? Yes → Supervised - Offline (random selection): ~9% error. No → we cannot guarantee a robust prediction (cannot predict).
- Yes, a few → Did someone measure inputs on this system? Yes → Transfer Learning (Closest Perf. selection): ~5% error.
- Yes, many → Supervised - Online (tune hyperparameters): ~3% error.
59
At compile- and run-time
Download: git clone https://github.com/mirror/x264
Compile: A: ./configure [--enable-asm] … ; make · B: ./configure --disable-asm … ; make
Run & use:
A (--enable-asm): ./x264 --me tesa → 10.6 seconds · ./x264 --me umh → 3.4 seconds
B (--disable-asm): ./x264 --me tesa → 81.5 seconds · ./x264 --me umh → 25.9 seconds
L. Lesoil, M. Acher, X. Tërnava, A. Blouin and JM. Jézéquel, The Interplay of Compile-time and Run-time Options for Performance Prediction, International Systems and Software Product Line Conference (SPLC'21). https://hal.ird.fr/INRIA/hal-03286127
60
RQ1 - Do compile-time options change the performance distributions?
RQ1.1. Do the run-time performances of configurable systems vary with compile-time options?
Fixed run-time config., different compile-time configs.
Results:
- Size => stable
- xz => stable
- x264 => vary with run-time
- nodeJS => vary
61
RQ1 - Do compile-time options change the performance distributions?
RQ1.2. How much performance can we gain/lose when changing the default compile-time configuration?
Performance Ratio(r, c) = (performance of run-time configuration r under compile-time configuration c) / (performance of run-time configuration r under the default compile-time configuration)
Results:
- Size => no gain
- xz and poppler => negligible
- Good default performance
- Can vary with input data
62
RQ2 - How to tune software performances at the compile-time level?
RQ2.1. Do compile-time options interact with the run-time options?
Spearman correlations (3a); Random Forest importances (3b)
Results (nodeJS):
- Compile-time options alter perf. rankings -> Interplay
- Both compile- and run-time options are useful -> Interplay
63
RQ2 - How to tune software performances at the compile-time level?
RQ2.2. How to use these interactions to find a set of good compile-time options and tune the configurable system?
Predict the best compile-time configuration; vary the training size (1%, 5% and 10% of measurements); depict the performance ratio per input and per training size (Table 3)
Results (nodeJS):
- No need for much data: 5% is enough to get close to the oracle
- Up to 50% improvement of performance
64
Across concurrent software systems: apply transfer learning between distinct software systems
Source system, Output Size (MB) per input video: --preset slow --ref 1 → 20.1, 3.8, 16.1; --preset fast --ref 16 → 19.2, 3.7, 15
Target system, Output Size (MB): --preset slow --ref 1 → 4.9, ?, ?; --preset fast --ref 16 → ?, 0.9, 11.1
L. Lesoil, H. Martin, M. Acher, A. Blouin, JM. Jézéquel, Transferring Performance between Distinct Configurable Systems: A Case Study. International Working Conference on Variability Modelling of Software-Intensive Systems (VaMoS'22). https://hal.inria.fr/hal-03514984/
65
A Brief History of Transfer Learning… applied to software systems… and predicting their performance properties
- Valov et al., ICPE'17: hardware (Model Shift)
- Jamshidi et al., SEAMS'17: challenges
- Jamshidi et al., ASE'17: hardware & workloads
- Jamshidi et al., FSE'18: exploit similarities (L2S)
- Krishna et al., TSE'20: bellwether (Beetle)
- Valov et al., ICPE'20: Pareto frontier
- Martin et al., TSE'21: variants & versions (tEAMS)
- This paper, VaMoS'22: ≠ software systems
66
Why do we need Machine Learning? (same conditions for both: 24 options, 201 configs, encoded size)
                         Human    ML
Speed                    Slow     Fast
Accuracy                 Worse    Better
With more training data  Lost     Progressing
Motivation               Bored    Always!
For huge configurable systems, e.g. 20k options for Linux, ML scales but humans do not
67
Run-time configurations matter: rendering a 3D scene
① Threads = 'auto-detect', Tile size = 64 pixels, Progressive refine = True → ~3 seconds
② Thread(s) = 1, Tile size = 12 pixels, Progressive refine = False → ~3 minutes
Performance depends on the run-time configuration and the input data. What about compile-time configurations?
68
Problem
Challenges (2/2) - Align Configuration Spaces
Transfer requires a common configuration space; how to automate the alignment of config. spaces? Different cases to handle:
- Features do not have the same name: --level vs --level-idc
- The feature of one system encapsulates one feature of the other: --fullrange vs --range full
- A feature is not implemented: X vs --rc-grain
- A feature value is not implemented: --me 'star' vs X
- Features do not have the same default value: --qpmax [51] vs --qpmax [69]
- Different requirements or feature interactions: .yuv format => --input-res vs .yuv format ≠> --input-res
- Feature ranges differ between source & target: --crf [0-51] vs --crf [0-69]
69
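A minimal sketch of a (hand-written) alignment table handling some of these cases (option names from the slide; the mapping itself is illustrative, not a generic alignment algorithm):

```python
# None marks a feature (value) with no counterpart in the target system.
ALIGNMENT = {
    "--level": "--level-idc",       # feature renamed between systems
    "--fullrange": "--range full",  # one feature encapsulates the other
    "--me star": None,              # feature value not implemented in the target
}

def translate(source_config: list) -> list:
    """Map a source configuration onto the target space, dropping options
    that have no counterpart (one of the cases listed above)."""
    target = []
    for opt in source_config:
        mapped = ALIGNMENT.get(opt, opt)  # unknown options pass through
        if mapped is not None:
            target.extend(mapped.split())
    return target

print(translate(["--level", "--fullrange", "--me star", "--crf 18"]))
# ['--level-idc', '--range', 'full', '--crf', '18']
```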
Impact of hardware platforms on software evolution
Experiment - Compute the DTW (Dynamic Time Warping) for all combinations of hardware platforms: heatmap of DTW between time series related to different variants of hardware (e.g., ⓑ DTW = 0.38: similar; ⓓ DTW = 5.39: different)
Result - Identify hardware platforms having similar evolutions, to reduce the cost of benchmarking
Impact of workloads on software evolution
Experiment - Compute the DRPC distribution for each workload (e.g., ⓑ DRPC = 1.61%; ⓒ DRPC = 25.07%)
Result - Identify stable workloads to use in benchmarks
Daily Relative Percentage Change (DRPC), with:
● p(t) the performance value at time t
● d(t, t+1) the number of days between t and t+1
The bias of public datasets of hardware performance
Challenging (even for Google & co.) to build a representative benchmark of hardware platforms
Phoronix is the result of a life's work, though it is missing many SKUs
History of hardware micro-architectures · Test suites in Phoronix · Clustering of software systems · Phoronix dataset features
Predicting performance of hardware platforms based on properties
Built in 16 minutes[9]
[9] Y. Wang, V. Lee, GY. Wei, D. Brooks, Predicting New Workload or CPU Performance by Analyzing Public Datasets, TACO'19, Link
Case 1 - Prediction for New SKUs
Case 2 - Prediction for New Systems
78
Case 3 - Cross-Prediction Between Suites (aka different sources)
79
x264 on YouTube UGC to compress input videos
Configs: no-mbtree, no-cabac, …
Perfs: encoding size, encoding time, fps, cpu consumption, bitrate
Input properties: spatial/temporal/chunk complexity, resolution of the video, video fps
Wang, Yilin, Sasi Inguva, and Balu Adsumilli, "YouTube UGC dataset for video compression research," IEEE 21st International Workshop on Multimedia Signal Processing (MMSP), 2019. https://media.withyoutube.com/
80
gcc on PolyBench to compile input .c programs
Configs: fno-asm, O1/O2/Ofast, …
Perfs: binary size, compilation time, execution time
Input properties: size of the program, # LOCs, # methods, # imports
Louis-Noel Pouchet, Polybench: The polyhedral benchmark suite v3.1. http://web.cs.ucla.edu/~pouchet/software/polybench/
81
ImageMagick on an excerpt of ImageNet to blur input images
Configs: quality, thread, …
Perfs: size of the result, extraction time
Input properties: initial size of image, # rgb
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei, "ImageNet: A large-scale hierarchical image database," IEEE Conf. on Computer Vision and Pattern Recognition, 2009. https://doi.org/10.1109/CVPR.2009.5206848
82
lingeling on SAT Competition benchmarks to solve input formulae
Configs: memlimit, minimize, …
Perfs: # conflicts, # reductions
Input properties: # propositions, # and, # or
Francisco Gomes de Oliveira Neto, Richard Torkar et al., Evolution of statistical analysis in empirical software engineering research [...], Journal of Systems and Software, 2019. https://doi.org/10.1016/j.jss.2019.07.002
83
NodeJS on its test suite to interpret input .js scripts
Configs: debug, wasm, …
Perfs: # operations per second
Input properties: size of the script, # LOCs, # methods, # imports
NodeJS test suite: https://github.com/nodejs/node
84
poppler on Trent Nelson's list to extract images out of input .pdf files
Configs: ccit, format, …
Perfs: size of the comp. images, time
Input properties: avg size of images, # pages, # images
Trent Nelson, Technically-oriented pdf collection (github repo), 2014. https://github.com/tpn/pdfs
85
SQLite on TPC-H to query input databases
Configs: maxsize, memtrace, …
Perfs: 15 queries → time to handle the query
Input properties: # memory size, # lines
Meikel Poess and Chris Floyd, New TPC Benchmarks for Decision Support and Web Commerce, SIGMOD 2000. https://doi.org/10.1145/369275.369291
86
xz on different corpora to compress input files
Configs: format, level, memory, …
Perfs: size, time
Input properties: type of file, size
The Canterbury Corpus + The Silesia Corpus. http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
87
Devs are aware of the input sensitivity problem
88
browsing commits of x264
89
Feature Subset Selection for Learning Huge Configuration Spaces: The case of Linux Kernel Size
https://www.kaggle.com/competitions/linux-kernel-size/overview
We can benefit from contributions of the machine learning community… and our datasets/problems are raising interest.
90
"Neuroimaging pipelines are known to generate different results depending on the computing platform where they are compiled and executed." Significant differences were revealed between FreeSurfer version v5.0.0 and the two earlier versions. [...] About a factor two smaller differences were detected between Macintosh and Hewlett-Packard workstations and between OSX 10.5 and OSX 10.6. The observed differences are similar in magnitude as effect sizes reported in accuracy evaluations and neurodegenerative studies.
See also Krefting, D., Scheel, M., Freing, A., Specovius, S., Paul, F., and Brandt, A. (2011), "Reliability of quantitative neuroimage analysis using freesurfer in distributed environments," in MICCAI Workshop on High-Performance and Distributed Computing for Medical Imaging.
91
"Neuroimaging pipelines are known to generate different results depending on the computing platform where they are compiled and executed." (Reproducibility of neuroimaging analyses across operating systems, Glatard et al., Front. Neuroinform., 24 April 2015)
The implementation of mathematical functions manipulating single-precision floating-point numbers in libmath has evolved during the last years, leading to numerical differences in computational results. While these differences have little or no impact on simple analysis pipelines such as brain extraction and cortical tissue classification, their accumulation creates important differences in longer pipelines such as the subcortical tissue classification, RSfMRI analysis, and cortical thickness extraction.
92
Can a coupled ESM simulation be restarted from a different machine without causing climate-changing modifications in the results? Using two versions of EC-Earth: one "non-replicable" case (see below) and one replicable case.
93

More Related Content

Similar to Deep Software Variability for Resilient Performance Models of Configurable Systems

Predicting system trustworthyness
Predicting system trustworthynessPredicting system trustworthyness
Predicting system trustworthyness
Saransh Garg
Ā 
PhD Thesis Defense
PhD Thesis DefensePhD Thesis Defense
PhD Thesis Defense
Filip Krikava
Ā 
Performance testing based on time complexity analysis for embedded software
Performance testing based on time complexity analysis for embedded softwarePerformance testing based on time complexity analysis for embedded software
Performance testing based on time complexity analysis for embedded software
Mr. Chanuwan
Ā 
Performancetestingbasedontimecomplexityanalysisforembeddedsoftware 1008150404...
Performancetestingbasedontimecomplexityanalysisforembeddedsoftware 1008150404...Performancetestingbasedontimecomplexityanalysisforembeddedsoftware 1008150404...
Performancetestingbasedontimecomplexityanalysisforembeddedsoftware 1008150404...
NNfamily
Ā 
Software performance simulation strategies for high-level embedded system design
Software performance simulation strategies for high-level embedded system designSoftware performance simulation strategies for high-level embedded system design
Software performance simulation strategies for high-level embedded system design
Mr. Chanuwan
Ā 
Resume_VenkataRakeshGudipalli Master - Copy
Resume_VenkataRakeshGudipalli Master - CopyResume_VenkataRakeshGudipalli Master - Copy
Resume_VenkataRakeshGudipalli Master - Copy
Venkata Rakesh Gudipalli
Ā 
Super COMPUTING Journal
Super COMPUTING JournalSuper COMPUTING Journal
Super COMPUTING Journal
Pandey_G
Ā 

Similar to Deep Software Variability for Resilient Performance Models of Configurable Systems (20)

Analysis of Software Complexity Measures for Regression Testing
Analysis of Software Complexity Measures for Regression TestingAnalysis of Software Complexity Measures for Regression Testing
Analysis of Software Complexity Measures for Regression Testing
Ā 
Machine programming
Machine programmingMachine programming
Machine programming
Ā 
(Structural) Feature Interactions for Variability-Intensive Systems Testing
(Structural) Feature Interactions for Variability-Intensive Systems Testing (Structural) Feature Interactions for Variability-Intensive Systems Testing
(Structural) Feature Interactions for Variability-Intensive Systems Testing
Ā 
Anders Claesson - Test Strategies in Agile Projects - EuroSTAR 2010
Anders Claesson - Test Strategies in Agile Projects - EuroSTAR 2010Anders Claesson - Test Strategies in Agile Projects - EuroSTAR 2010
Anders Claesson - Test Strategies in Agile Projects - EuroSTAR 2010
Ā 
A Unique Test Bench for Various System-on-a-Chip
A Unique Test Bench for Various System-on-a-Chip A Unique Test Bench for Various System-on-a-Chip
A Unique Test Bench for Various System-on-a-Chip
Ā 
Predicting system trustworthyness
Predicting system trustworthynessPredicting system trustworthyness
Predicting system trustworthyness
Ā 
Sofware Engineering Important Past Paper 2019
Sofware Engineering Important Past Paper 2019Sofware Engineering Important Past Paper 2019
Sofware Engineering Important Past Paper 2019
Ā 
PhD Thesis Defense
PhD Thesis DefensePhD Thesis Defense
PhD Thesis Defense
Ā 
OS VERIFICATION- A SURVEY AS A SOURCE OF FUTURE CHALLENGES
OS VERIFICATION- A SURVEY AS A SOURCE OF FUTURE CHALLENGESOS VERIFICATION- A SURVEY AS A SOURCE OF FUTURE CHALLENGES
OS VERIFICATION- A SURVEY AS A SOURCE OF FUTURE CHALLENGES
Ā 
Bridging Concepts and Practice in eScience via Simulation-driven Engineering
Bridging Concepts and Practice in eScience via Simulation-driven EngineeringBridging Concepts and Practice in eScience via Simulation-driven Engineering
Bridging Concepts and Practice in eScience via Simulation-driven Engineering
Ā 
poster_3.0
poster_3.0poster_3.0
poster_3.0
Ā 
Performance testing based on time complexity analysis for embedded software
Performance testing based on time complexity analysis for embedded softwarePerformance testing based on time complexity analysis for embedded software
Performance testing based on time complexity analysis for embedded software
Ā 
Performancetestingbasedontimecomplexityanalysisforembeddedsoftware 1008150404...
Performancetestingbasedontimecomplexityanalysisforembeddedsoftware 1008150404...Performancetestingbasedontimecomplexityanalysisforembeddedsoftware 1008150404...
Performancetestingbasedontimecomplexityanalysisforembeddedsoftware 1008150404...
Ā 
Linux Assignment 3
Linux Assignment 3Linux Assignment 3
Linux Assignment 3
Ā 
Software performance simulation strategies for high-level embedded system design
Software performance simulation strategies for high-level embedded system designSoftware performance simulation strategies for high-level embedded system design
Software performance simulation strategies for high-level embedded system design
Ā 
ē؋åŗå‘˜å®žč·µä¹‹č·Æ
ē؋åŗå‘˜å®žč·µä¹‹č·Æē؋åŗå‘˜å®žč·µä¹‹č·Æ
ē؋åŗå‘˜å®žč·µä¹‹č·Æ
Ā 
Resume_VenkataRakeshGudipalli Master - Copy
Resume_VenkataRakeshGudipalli Master - CopyResume_VenkataRakeshGudipalli Master - Copy
Resume_VenkataRakeshGudipalli Master - Copy
Ā 
Super COMPUTING Journal
Super COMPUTING JournalSuper COMPUTING Journal
Super COMPUTING Journal
Ā 
Can ML help software developers? (TEQnation 2022)
Can ML help software developers? (TEQnation 2022)Can ML help software developers? (TEQnation 2022)
Can ML help software developers? (TEQnation 2022)
Ā 
SE-Lecture1.ppt
SE-Lecture1.pptSE-Lecture1.ppt
SE-Lecture1.ppt
Ā 

More from Luc Lesoil

More from Luc Lesoil (6)

ICPE 2022 - Data Challenge
ICPE 2022 - Data ChallengeICPE 2022 - Data Challenge
ICPE 2022 - Data Challenge
Ā 
VaMoS 2022 - Transfer Learning across Distinct Software Systems
VaMoS 2022 - Transfer Learning across Distinct Software SystemsVaMoS 2022 - Transfer Learning across Distinct Software Systems
VaMoS 2022 - Transfer Learning across Distinct Software Systems
Ā 
Introduction ML
Introduction MLIntroduction ML
Introduction ML
Ā 
Slimfast
SlimfastSlimfast
Slimfast
Ā 
SPLC 2021 - The Interplay of Compile-time and Run-time Options for Performan...
SPLC 2021  - The Interplay of Compile-time and Run-time Options for Performan...SPLC 2021  - The Interplay of Compile-time and Run-time Options for Performan...
SPLC 2021 - The Interplay of Compile-time and Run-time Options for Performan...
Ā 
VaMoS 2021 - Deep Software Variability: Towards Handling Cross-Layer Configur...
VaMoS 2021 - Deep Software Variability: Towards Handling Cross-Layer Configur...VaMoS 2021 - Deep Software Variability: Towards Handling Cross-Layer Configur...
VaMoS 2021 - Deep Software Variability: Towards Handling Cross-Layer Configur...
Ā 

Recently uploaded

Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Peter Udo Diehl
Ā 

Recently uploaded (20)

Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Ā 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
Ā 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
Ā 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Ā 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Ā 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
Ā 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Ā 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Ā 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Ā 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
Ā 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Ā 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Ā 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Ā 
Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...
Ā 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Ā 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Ā 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Ā 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Ā 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Ā 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Ā 

Deep Software Variability for Resilient Performance Models of Configurable Systems

  • 1. Deep Software Variability for Resilient Performance Models of Conļ¬gurable Systems President : Ɖlisa FROMONT Full Professor, UniversitĆ© de Rennes 1, IUF, FRANCE Reviewers : Lidia FUENTES Full Professor, Universidad de MĆ”laga, ITIS Software, SPAIN Rick RABISER Full Professor, Johannes Kepler UniversitƤt, AUSTRIA Examiners : Maxime CORDY Research Scientist, University of Luxembourg, LUXEMBOURG Pooyan JAMSHIDI Assistant Professor, University of South Carolina, UNITED STATES Supervisors : Mathieu ACHER Full Professor, INSA Rennes, IUF, FRANCE Arnaud BLOUIN Associate Professor, INSA Rennes, FRANCE Jean-Marc JƉZƉQUEL Full Professor, UniversitĆ© de Rennes 1, FRANCE PhD Defense - Luc Lesoil
  • 2. Software Systems are everywhere! Context Software is eating the world 2/50
  • 3. Developers provide software options to (de)activate Options Context Whatā€™s an option? 3
  • 4. Software conļ¬gurations lead to distinct performance values Performance Conļ¬guration x264 --cabac --me dia --output compressed.264 video.mkv x264 --no-cabac --me tesa --output compressed.264 video.mkv time = 34 seconds time = 2 min 25 seconds Context The power of configurations ! 4
  • 5. 2320 # atoms in universe 231 # seconds in a human life grep 11 options 20 000 options 48 options 119 options A huge[2] number of options & conļ¬gurations 2x possible configurations 1 500 options X independent boolean options Context #options is already high & wonā€™t decrease 5 [2] Tianyin Xu, Long Jin, Xuepeng Fan, Yuanyuan Zhou, Shankar Pasupathy, Rukma Talwadker, Hey, you have given me too many knobs! Understanding and dealing with over-designed configuration in system software, FSEā€™15, Link
  • 6. Options Challenging to interpret optionsā€™ effects ~25 MB ~50 MB ~1 GB RANDOMIZE BASE DEBUG INFO REDUCED UBSAN SANITIZE ALL DEBUG INFO SPLIT SENSORS ADM1031 BACKLIGHT GPIO DEBUG DRIVER ENCLOSURE SERVICES REFCOUNT FULL Kernel Size Performance ā€¦. .conļ¬g ļ¬le ā€¦. ā€¦. ~7 MB DEBUG INFO 20 000 options Context Too many options for humans? 6 + interactions
  • 7. Whole Population of Configurations Predict Performance Sample Configurations Measure Performance Train Performance Model [3] Learning x264 --no-cabac --ref 1 ā€¦ compressed.264 video.mkv x264 --cabac --ref 2 ā€¦ compressed.264 video.mkv x264 --cabac --ref 7 ā€¦ compressed.264 video.mkv Performance Conļ¬gurations Machine Learning Conļ¬gurations Performance Conļ¬gurations Machine Learning Performance 24 seconds 57 seconds 39 seconds [3] J. Guo, K. Czarnecki, S. Apel, N. Siegmund and A. Wąsowski, Variability-aware performance prediction: A statistical learning approach, ASEā€™13, 10.1109/ASE.2013.6693089 Sampling, Measuring, Context State-of-the-art Solution : But not too many options for ML 7
  • 9. Performance ? ? ? Performance also depends on the software stack Hardware Operating System Software Input Data Problem Software layer is not enough 9 [4] Pooyan Jamshidi, Norbert Siegmund, Miguel Velez, Christian KƤstner, Akshay Patel, Yuvraj Agarwal, Transfer Learning for Performance Modeling of Configurable Systems: An Exploratory Analysis, ASEā€™17, Link
  • 10. 10.4 x264 --mbtree ... x264 --no-mbtree ... x264 --no-mbtree ... x264 --mbtree ... 20.04 Dell latitude 7400 Raspberry Pi 4 model B vertical animation vertical animation vertical animation vertical animation Duration (s) 22 25 73 72 6 6 351 359 Size (MB) 28 34 33 21 33 21 28 34 A B 2 1 2 1 Hardware Operating System Software Input Data Real-world example with x264 Problem Example of interactions 10
  • 11. Age # Cores GPU Compil. Version Version Option Distrib. Size Length Res. Run-time Hardware Operating System Software Input Data Introducing Deep Variability[5] Deep Variability = set of interactions between the different elements of software environment Impacting performance distributions Threatens the generalisation of performance models Bug Perf. ā†— Perf. ā†˜ Problem Introducing deep variability 11 [5] L. Lesoil, M.Acher, A.Blouin, and J-M. JĆ©zĆ©quel, Deep software variability: Towards handling cross-layer configuration,VaMoS'21, Link
  • 12. Age # Cores GPU Compil. Version Version Option Distrib. Size Length Res. Run-time Hardware Operating System Software Input Data Deep Variability = set of interactions between the different elements of software environment Impacting performance distributions Threatens the generalisation of performance models Bug Perf. ā†— Perf. ā†˜ If performance distributions vary with software stacks, performance models are obsolete when their environment change Problem Introducing deep variability 12 Introducing Deep Variability[5] [5] L. Lesoil, M.Acher, A.Blouin, and J-M. JĆ©zĆ©quel, Deep software variability: Towards handling cross-layer configuration,VaMoS'21, Link
  • 13. Why should I care ? Problem Deep Variability Stakeholders 13 Default software conļ¬g. is not optimal Users Developers Documentation can not cover all soft. envs. Researchers Performance Models are not applicable Scientists May introduce a bias in experiments Companies Server conļ¬guration not adapted to software
  • 15. 1. Characterize and spot empirical evidence of the existence of deep variability 2. Propose solutions to include deep variability in performance models Contributions Contributions 15
  • 16. 1. Characterize & Spot Deep Variability On the Input Sensitivity of Conļ¬gurable Systems
  • 17. Software systems process inputs that are different in terms ofā€¦ Contrib 1. Spot & Characterize Deep Variability Define the notion of input 17 Nature Scale & Complexity
  • 18. On the Input Sensitivity[6] of Conļ¬gurable Systems + Input Video 1 + Input Video 2 Software Input Data Contrib 1. Spot & Characterize Deep Variability ā‰  inputs ā‡’ ā‰  perf. distributions 18 [6] Yufei Ding, Jason Ansel, Kalyan Veeramachaneni, Xipeng Shen, Una-May Oā€™Reilly, Saman Amarasinghe, Autotuning algorithmic choice for input sensitivity. SIGPLAN Not.ā€˜2015, Link
  • 19. Performance distributions & models change with inputs -O2 -fnoasm -Ofast -O1 -fļ¬‚oat-store Software System Performance Conļ¬gurations Machine Learning Decision Tree Compil. time Input Data This part investigates, demonstrates and quantifies what effects configurable system inputs have on performance Contrib 1. Spot & Characterize Deep Variability Study input-config interactions 19
  • 20. Gather data about Input Sensitivity Measure performance 5 6 4 3 2 1 8 software systems ā‰  domains ā‰  inputs Contrib 1. Spot & Characterize Deep Variability A typical measurement process 20
  • 21. RQ1 - Do software performance stay consistent across inputs? + - Performance distributions of ā‘  & āž are negatively correlated Spearman correlations A configuration could be good for profile ā‘  but not for āž Contrib 1. Spot & Characterize Deep Variability Config. ranks change with inputs 21
  • 22. RQ2 - Do conļ¬guration option's effects change with input data? An option could be good to activate for one input but bad for others Individual impact of options change with inputs Contrib 1. Spot & Characterize Deep Variability Options effects change with inputs 22
  • 23. RQ3 - Can we ignore Input Sensitivity? S1 S2 We both predict optimal configurations I value Input Sensitivity I do not care about Input Sensitivity Performance up to x10 when considering inputs S1 ā‰ˆ S2 + 38% perf Contrib 1. Spot & Characterize Deep Variability Significant diff. of performance 23
  • 24. RQ4 - How do research papers address Input Sensitivity? 65 papers Q-A. Is there a software system processing input data in the study? Q-B. Does the experimental protocol include several inputs? Q-C. Is the problem of Input Sensitivity mentioned e.g. in threat? Q-D. Does the paper propose a solution to generalize performance models across inputs? 94% 63% 47% 26% Contrib 1. Spot & Characterize Deep Variability Inputs in research papers 24
  • 25. RQ5 - How to quantify Input Sensitivity? Score of Input Sensitivity IS 0 = not input-sensitive 1 = input-sensitive Contrib 1. Spot & Characterize Deep Variability Proposing input sensitivity score 25
  • 26. Input Sensitivity threatens the concrete application of performance models A model trained on one input will not be reusable on any other input Conclusion [7] Stefan MĆ¼hlbauer, Florian Sattler, Christian Kaltenecker, Johannes Dorn, Sven Apel, Norbert Siegmund, Analyzing the Impact of Workloads on Modeling the Performance of Configurable Software Systems, ICSEā€™23, Link Our results were also found by another team [7] Contrib 1. Spot & Characterize Deep Variability Inputs mess with perf. models 26
  • 27. Age # Cores GPU Compil. Version Version Option Distrib. Size Length Res. Run-time Hardware Operating System Software Input Data Towards modelling Deep Variabilityā€¦ ICSRā€™22 + SPLCā€™21 ICPEā€™22 VaMoSā€™22 JSSā€™23 Contrib 1. Spot & Characterize Deep Variability TSEā€™21 + SPLCā€™22 Deep variability is real !! 27
  • 28. Concrete insights from our experiments Concrete insights about deep var 28 Contrib 1. Spot & Characterize Deep Variability [not pub] Hardware change perf. distributions linearly (few exceptions) [A] OS parameter can affect software performance evolution [B+C] OS version change the effect of OS options [D+E] Compile-time options mostly interact in a linear way with run-time options [D+E] Non-linear interactions between compile- & run-time are uncommon [F] Choice of software can change the impact & effect of common options [G] Inputs can interact in a non-linear way with run-time options [G] These interactions are limited to some software & performance properties [A] L. Lesoil, M. Acher, A. Blouin, J-M. JĆ©zĆ©quel, Beware of the interactions of variability layers when reasoning about evolution of mongodb, ICPE'22, Link [B] H. Martin, M. Acher, JA. Pereira, L. Lesoil, J-M. Jezequel, DE. Khelladi, Transfer learning across variants and versions : The case of linux kernel size, TSEā€™21. Link [C] M. Acher, H. Martin, JA. Pereira, L. Lesoil, A. Blouin, J-M. JĆ©zĆ©quel, DE. Khelladi, O. Barais, Feature Subset Selection for Learning Huge Configuration Spaces: The case of Linux Kernel Size, SPLCā€™22, Link [D] X. TĆ«rnava, M. Acher, L. Lesoil, A. Blouin, J-M. JĆ©zĆ©quel, Scratching the surface of ./configure: Learning the effects of compile-time options on binary size and gadgets, ICSRā€™22, Link [E] L. Lesoil, M. Acher, X. TĆ«rnava, A. Blouin, J-M. JĆ©zĆ©quel, The interplay of compile-time and run-time options for performance prediction, SPLC'21, Link [F] L. Lesoil, H. Martin, M. Acher, A. Blouin and J-M. Jezequel, Transferring performance between distinct configurable systems : A case study, VaMoS'22, Link [G] L. Lesoil, M. Acher, A. Blouin, J-M. JĆ©zĆ©quel, Input sensitivity on the performance of configurable systems : An Empirical Study, JSS'23, Link
  • 29. Age # Cores GPU Compil. Version Version Option Distrib. Size Length Res. Run-time Hardware Operating System Software Input Data Towards modelling Deep Variabilityā€¦ with other researchers! Contrib 1. Spot & Characterize Deep Variability DV is also addressed in SOTA 29 [H] S. MĆ¼hlbauer, F. Sattler, C. Kaltenecker, J. Dorn, S. Apel, N. Siegmund, Analyzing the Impact of Workloads on Modeling the Performance of Configurable Software Systems, ICSEā€™23, Link [I] S. MĆ¼hlbauer, S. Apel, N. Siegmund, Identifying Software Performance Changes Across Variants and Versions, ASEā€™20, Link [J] MS Iqbal, R. Krishna, MA Javidian, B. Ray, P. Jamshidi, Unicorn: Reasoning about Configurable System Performance through the Lens of Causality, Eurosysā€™22, Link ā€“ [K] Marko Boras, Josip Balen, Kresimir Vdovjak, Performance Evaluation of Linux Operating Systems, ICSSTā€™20, Link [L] D. Cotroneo, R. Natella, R. Pietrantuono, S. Russo, Software Aging Analysis of the Linux Operating System, ISSREā€™10, Link [H] [I] [J] [L] [K]
  • 30. 1. Characterize and spot empirical evidence of the existence of deep variability 2. Propose solutions to include deep variability in performance models Contributions Contributions 30
  • 31. 2. Train Resilient Performance Models How to embed deep variability in models?
  • 32. Make Performance Models Resist to Deep Variability Hardware Operating System Software Input Data 0.152.2854 0.155.2917 Include deep variability in the model Train a performance model valid for as many possible software environments Contrib 2. Train Resilient Performance Models The future of perf. modelling 32
  • 33. Current State-the-art solution : Transfer Learning Source Input Perf P. target ? training prediction source ? Shifting function Source Model 2 1 Source Model Shifting function Training Test 1 Learn the ā‰  between source & target 2 Train a model on the source 3 Apply ā‘  and ā‘” on the test set Model Shift[8] 3 Measuring, Learning for each new input Time- & resource- consuming for users Contrib 2. Train Resilient Performance Models [8] Pavel Valov, Jean-Christophe Petkovich, Jianmei Guo, Sebastian Fischmeister and Czarnecki Krzysztof, Transferring Performance Prediction Models Across Different Hardware Platforms, ICPEā€™17, Link TL avoids deep variability 33 vertical animation Target Input
• 34. Measuring on each new input has a non-negligible cost. For a new target video (one configuration measured in ~3', model training in ~1'): Transfer Learning needs ~10 measured configurations, i.e. 10Ɨ3' + 1' ≈ 31'; standard learning needs ~100, i.e. 100Ɨ3' + 1' ≈ 301'. Contrib 2. Train Resilient Performance Models Cost of (Transfer) Learning 34
• 35. How? Contextual performance models [9] with input properties instead of pure domain knowledge. Example properties for a video input: Spatial = 2.78, Temporal = 0.18, Chunk = 4.42, Color = 0.19, resolution (360p, 720p), complexity, category (= Sports). Studied on 8 software systems. Contrib 2. Train Resilient Performance Models Include env. properties in training 35 [9] Paul Temple, Mathieu Acher, Jean-Marc JĆ©zĆ©quel, Olivier Barais, Learning Contextual-Variability Models, IEEE Software'17, Link
• 36. Input-aware Learning: train machine learning models that predict the performance of software configurations AND are robust to changes of input data, by feeding both the configurations and the properties of each input (Input Video 1, Input Video 2, …) to the learner, as in the sketch below. Contrib 2. Train Resilient Performance Models Alternative approach 36
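A minimal sketch of this input-aware setup, assuming options and input properties are already numeric; configs, props, measured_performance, and new_input_props are illustrative placeholders, and the Gradient Boosting learner echoes the algorithm retained later (slide 57):

```python
# Input-aware learning sketch: one training row = (configuration options,
# properties of the input it was measured on) -> measured performance.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.hstack([configs, props])      # (n rows, k options + p properties)
model = GradientBoostingRegressor(random_state=0)
model.fit(X, measured_performance)

# For a NEW input, only its cheap static properties are computed; no new
# performance measurement is required before predicting.
new_rows = np.hstack([configs, np.tile(new_input_props, (len(configs), 1))])
predicted = model.predict(new_rows)
best_config = configs[np.argmin(predicted)]  # e.g. minimize encoding time
```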
• 37. RQ1. To what extent are input properties helpful? (1/2) Machine learning classifies inputs into previously identified performance profiles (①-④) with 70% accuracy: no need for additional measurements, input properties instead of domain knowledge. A possible realization is sketched below. Contrib 2. Train Resilient Performance Models Input props for benchmarking 37
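A hypothetical two-step realization: cluster the measured inputs into the four performance profiles, then learn to map input properties to a profile. Only the four profiles and the property-based classification come from the slide; the clustering choice and names are assumptions:

```python
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

# Step 1: group inputs by the shape of their performance distributions.
# perf_matrix: one row per input, one column per configuration.
profiles = KMeans(n_clusters=4, random_state=0).fit_predict(perf_matrix)

# Step 2: predict the profile of an unseen input from its properties
# alone, i.e. without benchmarking it.
clf = RandomForestClassifier(random_state=0)
clf.fit(input_props, profiles)
predicted_profile = clf.predict(new_input_props)
```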
• 38. RQ1. To what extent are input properties helpful? (2/2) Transfer Learning: since inputs matter, so does the choice of the source input [10]. Four policies to select the best source input: uniform selection; closest input properties; closest performance distribution; input of the same performance profile (see the sketch below). Contrib 2. Train Resilient Performance Models Input props for TL source selection 38 [10] Rahul Krishna, Vivek Nair, Pooyan Jamshidi and Tim Menzies, Whence to Learn? Transferring Knowledge in Configurable Systems Using BEETLE, TSE'20, Link
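For instance, the "closest input properties" policy can be as simple as a nearest-neighbour lookup; the Euclidean metric and variable names are assumptions:

```python
import numpy as np

def closest_source(target_props, candidate_props):
    """Index of the candidate input whose property vector is closest
    (Euclidean distance) to the target input's properties."""
    dists = np.linalg.norm(candidate_props - target_props, axis=1)
    return int(np.argmin(dists))

# usage: source = measured_inputs[closest_source(new_props, measured_props)]
```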
• 39. RQ2. How do configurations & inputs affect the model? Input-aware learning is possible: it is robust to new inputs without new measurements, the prediction error stabilizes at ~25 training inputs, and results vary across software systems. Contrib 2. Train Resilient Performance Models Input-aware models are real 39
• 40. RQ3. Which approach to recommend? (1/2) In general, standard learning beats input-aware learning; but for small numbers of configurations, input-aware learning outperforms SOTA standard learning. Contrib 2. Train Resilient Performance Models Better than SOTA for low budgets 40
• 41. RQ3. Which approach to recommend? (2/2) Transfer learning beats standard learning when the source is carefully selected thanks to input properties: knowledge about inputs allows transfer learning to beat standard learning. Contrib 2. Train Resilient Performance Models Improving TL with deep var. 41
• 42. Few messages: performance prediction should be quick (reduce measurement cost); input-aware learning is possible; with a very low budget of configurations, contextual performance models using input properties outperform transfer learning. Conclusion Contrib 2. Train Resilient Performance Models 42
• 44. Conclusion Conclusion - Put a name on (& promoted) deep variability - Gathered data to empirically prove it exists - Proposed ways to benchmark deep variability - Drew practical implications for performance models Deep variability rocks 44
• 45. Open access: reproducibility & availability of our work, including the measurement process. Conclusion Open Research 45
• 46. Why should I care? Benefits of deep variability for users, developers, researchers, scientists, and companies: recommending optimal configurations, making performance models usable, controlling DV to enable reproducible science, recommending optimal environment configurations for servers, and automatically testing deep variability. Conclusion Benefits of Deep Variability 46
• 48. Deep Variability-Aware Performance-Influence Models: feed the model with properties of every layer. Hardware: #cores, L1/L2/L3 cache, street price (lscpu command). Operating System: Linux version, Linux variant, distribution (cat /etc/*-release). Software: version, compile-time options, run-time options. Input Data: #LOCs for a .c file, resolution for a video. Built in 16 minutes. A possible harvesting script is sketched below. [11] Y. Wang, V. Lee, GY. Wei, D. Brooks, Predicting New Workload or CPU Performance by Analyzing Public Datasets, TACO'19, Link Perspectives Let's extend that to other layers ! 48
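A hedged sketch of what harvesting these per-layer properties could look like on Linux; the two commands come from the slide, everything else (function name, parsing left to the caller) is an assumption:

```python
import glob
import subprocess

def collect_environment():
    """Gather raw hardware and OS descriptors for a performance model."""
    env = {}
    # Hardware layer: #cores, L1/L2/L3 cache sizes, CPU model via lscpu.
    env["lscpu"] = subprocess.run(["lscpu"], capture_output=True,
                                  text=True).stdout
    # Operating-system layer: distribution, variant and version.
    env["os_release"] = "".join(open(path).read()
                                for path in glob.glob("/etc/*-release"))
    return env
```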
• 49. Build a common benchmark to study Deep Variability (Hardware, Operating System, Software, Input Data). Weakness of this work: only one layer at a time! Share the computational effort; agree on a common playground to test deep variability. [12] N. Siegmund, S. Kolesnikov, C. KƤstner, S. Apel, D. Batory, M. RosenmĆ¼ller, G. Saake, Predicting Performance via Automated Feature-Interaction Detection, ICSE'12, Link Perspectives A common benchmark for DV 49
• 50. Deep Software Variability for Resilient Performance Models of Configurable Systems
• 52. H. Martin, M. Acher, JA. Pereira, L. Lesoil, JM. JĆ©zĆ©quel and DE. Khelladi, Transfer learning across variants and versions: The case of Linux kernel size, Transactions on Software Engineering (TSE'21). https://hal.inria.fr/hal-03358817 Across versions of software systems (Linux 4.13, 4.15, 4.20, 5.0, 5.4, 5.7, 5.8): evolve performance models with the software, recycle data. 52
• 53. Joint evolution of MongoDB change points (top) and performance values (bottom): the same code change made performance decrease for user #1 (thread level = 512) and increase for user #2 (thread level = 1). Across time: interactions between the runtime environment & the evolution of the software. L. Lesoil, M. Acher, A. Blouin, JM. JĆ©zĆ©quel, Beware of the Interactions of Variability Layers When Reasoning about Evolution of MongoDB, International Conference on Performance Engineering (ICPE'22). https://hal.archives-ouvertes.fr/hal-03624309/ [1] 53
• 54. Across hardware platforms: our results tend to confirm the negative results for hardware reported in SOTA, e.g. [1]. Setup: 30 clusters of Grid'5000 with different hardware models, fixed with the same operating system; 8 videos; 201 configs. Finding: only weak (i.e. linear) interactions between hardware and configurations. [1] P. Valov, JC. Petkovich, J. Guo, S. Fischmeister, K. Czarnecki, Transferring Performance Prediction Models Across Different Hardware Platforms, International Conference on Performance Engineering (ICPE'17). https://dl.acm.org/doi/10.1145/3030207.3030216 54
• 55. Is software variability deeper than expected? Real-world x264 example crossing four layers: hardware (Dell Latitude 7400 vs Raspberry Pi 4 model B), operating system (Ubuntu 10.04 vs 20.04), run-time option (--mbtree vs --no-mbtree), input video (vertical vs animation). Encoding durations range from 6 s to 359 s (up to ≈16Ɨ and ≈12Ɨ across hardware for the same configuration), and output sizes (21-34 MB) flip between options depending on the input. Problem 55
• 56. PoC: Input-aware performance models (Software + Input Data + Configuration -> Performance). This (RESIST) paper proposes to train performance models robust to changes of input data. L. Lesoil, H. Spieker, A. Gotlieb, M. Acher, A. Blouin and JM. JĆ©zĆ©quel, Learning Input-aware Performance Models of Configurable Systems: An Empirical Evaluation. No preprint yet. Submitted. 56
• 57. RQ1. How to choose a (machine learning) algorithm establishing a relevant performance prediction model? Supervised online approach; all inputs & systems; split into train and test sets. Best performer: Gradient Boosting Trees, ~5% prediction error. 57
• 58. Difference Online & Offline. OFFLINE: 1. Measure performance related to inputs and configs 2. Train the model. ONLINE: 1. Compute the input properties 2. Apply the model. 58
• 59. Actionable conclusion. Are you ready to measure configurations? If yes, with many measurements: Supervised - Online, tune hyperparameters. If yes, with few measurements: Transfer Learning with closest-performance source selection when someone already measured inputs on this system, otherwise Supervised - Online. If no: Supervised - Offline (random selection) when someone already measured inputs on this system, otherwise we cannot guarantee a robust prediction. Reported prediction errors across these branches: 3%, 5%, and 9%. One plausible encoding of this tree is sketched below. 59
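One plausible reading of that decision tree, transcribed as a function; the branch structure follows the slide, but the exact branch-to-error mapping is left out because the slide does not make it explicit:

```python
def recommend(ready_to_measure, many_measurements, inputs_measured_here):
    """Suggest a prediction approach for a new software environment."""
    if ready_to_measure:
        if many_measurements:
            return "Supervised - Online (tune hyperparameters)"
        if inputs_measured_here:
            return "Transfer Learning (closest-performance source)"
        return "Supervised - Online"
    if inputs_measured_here:
        return "Supervised - Offline (random selection)"
    return "No robust prediction can be guaranteed"
```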
• 60. At compile- and run-time. Download: git clone https://github.com/mirror/x264. Compile: A ./configure [--enable-asm] … make, or B ./configure --disable-asm … make. Run: ./x264 --me umh or ./x264 --me tesa. Resulting encoding times: A --me umh 3.4 seconds, A --me tesa 10.6 seconds; B --me umh 25.9 seconds, B --me tesa 81.5 seconds. L. Lesoil, M. Acher, X. TĆ«rnava, A. Blouin and JM. JĆ©zĆ©quel, The Interplay of Compile-time and Run-time Options for Performance Prediction, International Systems and Software Product Line Conference (SPLC'21). https://hal.ird.fr/INRIA/hal-03286127 60
• 61. RQ1.1. Do the run-time performances of configurable systems vary with compile-time options? RQ1 - Do compile-time options change the performance distributions? Setup: fixed run-time config., different compile-time configs. Results: size => stable; xz => stable; x264 => varies with run-time; nodeJS => varies. 61
• 62. RQ1.2. How much performance can we gain/lose when changing the default compile-time configuration? RQ1 - Do compile-time options change the performance distributions? Performance ratio (r, c) = performance of the run-time configuration r for the compile-time configuration c / performance of the run-time configuration r for the default compile-time configuration (see the helper below). Results: size => no gain; xz and poppler => negligible; good default performance; can vary with input data. 62
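The ratio as a one-line helper, assuming measurements are stored in a dictionary keyed by (run-time config, compile-time config); the storage layout is an assumption:

```python
def performance_ratio(perf, r, c, default="default"):
    """perf[(r, c)]: performance of run-time config r compiled with c.
    A ratio far from 1 means compile-time config c changes the behaviour
    of r relative to the default build (better or worse depending on
    whether the metric is higher-is-better)."""
    return perf[(r, c)] / perf[(r, default)]
```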
• 63. RQ2.1. Do compile-time options interact with the run-time options? RQ2 - How to tune software performances at the compile-time level? Analyses: Spearman correlations (3a) and Random Forest importances (3b), as sketched below. Results (nodeJS): compile-time options alter performance rankings -> interplay; both compile- and run-time options are useful -> interplay. 63
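A sketch of both analyses; the (compile-time Ɨ run-time) matrix layout and variable names are assumptions:

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestRegressor

# perf[i, j]: performance of run-time config j under compile-time config i.
# A low rank correlation between two rows means the compile-time choice
# reshuffles the ranking of run-time configurations (interplay).
rho, _ = spearmanr(perf[0], perf[1])

# Importances of a forest fed with BOTH kinds of options: non-negligible
# weights on both sides again indicate interplay.
rf = RandomForestRegressor(random_state=0)
rf.fit(np.hstack([compile_opts, run_opts]), perfs)
print(rf.feature_importances_)
```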
• 64. RQ2.2. How to use these interactions to find a set of good compile-time options and tune the configurable system? RQ2 - How to tune software performances at the compile-time level? Method: predict the best compile-time configuration; vary the training size (1%, 5% and 10% of measurements); depict the performance ratio per input and per training size (Table 3). Results (nodeJS): not much data is needed, 5% is enough to get close to the oracle; up to 50% performance improvement. 64
• 65. Across concurrent software systems: apply transfer learning between distinct software systems. Example: output sizes (MB) measured per input video for configurations such as --preset slow --ref 1 and --preset fast --ref 16 on the source system (20.1, 3.8, 16.1, 19.2, 3.7, 15); on the target system only a few cells are measured (4.9, 0.9, 11.1) and the remaining ones (?) must be predicted. L. Lesoil, H. Martin, M. Acher, A. Blouin, JM. JĆ©zĆ©quel, Transferring Performance between Distinct Configurable Systems: A Case Study. International Working Conference on Variability Modelling of Software-Intensive Systems (VaMoS'22). https://hal.inria.fr/hal-03514984/ 65
• 66. A Brief History of Transfer Learning …applied to software systems… …and predicting their performance properties: Valov et al ICPE'17 Hardware (Model Shift); Jamshidi et al SEAMS'17 Challenges; Jamshidi et al ASE'17 Hardware & workloads; Jamshidi et al FSE'18 Exploit similarities (L2S); Krishna et al TSE'20 Bellwether (BEETLE); Valov et al ICPE'20 Pareto frontier; Martin et al TSE'21 Variants & Versions (tEAMS); This paper VaMoS'22 ≠ software systems. 66
• 67. Why do we need Machine Learning? Human vs ML, under the same conditions for both (encoded size, 201 configs, 24 options): speed: slow vs fast; accuracy: worse vs better; with more training data: lost vs progressing; motivation: bored vs always! For huge configurable systems, e.g. 20k options for Linux, ML scales but humans do not. 67
• 68. Run-time configurations matter. Rendering a 3D scene: ① threads = 'auto-detect', tile size = 64 pixels, progressive refine = True -> ~3 seconds; ② thread(s) = 1, tile size = 12 pixels, progressive refine = False -> ~3 minutes. Performance depends on the run-time configuration and the input data. What about compile-time configurations? 68
• 69. Challenges (2/2) - Align Configuration Spaces. Transfer requires a common configuration space; how to automate the alignment of config. spaces? Different cases to handle (see the sketch below): features do not have the same name (--level vs --level-idc); the feature of one system encapsulates one feature of the other (--fullrange vs --range full); a feature is not implemented (--rc-grain); a feature value is not implemented (--me 'star'); features do not have the same default value (--qpmax [51] vs --qpmax [69]); different requirements or feature interactions (.yuv format => --input-res vs .yuv format ≠> --input-res); feature ranges differ between source & target (--crf [0-51] vs --crf [0-69]). Problem 69
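A hypothetical alignment table covering the cases above for a pair of encoders; the option names come from the slide, while the mapping direction and value transformations are illustrative assumptions:

```python
# source option -> (target option or None, optional value transformation)
ALIGN = {
    "--level":     ("--level-idc", None),            # simple rename
    "--fullrange": ("--range", lambda v: "full"),    # value encapsulated
    "--rc-grain":  (None, None),                     # feature missing
    "--crf":       ("--crf", lambda v: min(v, 51)),  # reconcile ranges
    "--me":        ("--me", lambda v: None if v == "star" else v),
}

def translate(option, value):
    target, transform = ALIGN.get(option, (option, None))  # default: keep
    if target is None:
        return None  # not implemented on the target system
    return target, transform(value) if transform else value
```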
• 70. Impact of hardware platforms on software evolution. Experiment: compute the Dynamic Time Warping (DTW) distance for all combinations of hardware platforms; heatmap of DTW between time series related to different variants of hardware (e.g. DTW = 0.38: similar evolutions; DTW = 5.39: different). Result: identify hardware platforms having similar evolutions to reduce the cost of benchmarking. What is Dynamic Time Warping? A plain implementation is sketched below.
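For readers unfamiliar with the metric, here is the textbook dynamic-programming formulation over two one-dimensional performance series; this is an illustration, not the implementation used in the experiment:

```python
import numpy as np

def dtw(a, b):
    """Dynamic Time Warping distance between two performance series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]  # small => similar evolutions, large => different
```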
• 71. Impact of workloads on software evolution. Experiment: compute the Daily Relative Percentage Change (DRPC) distribution for each workload, with p(t) the performance value at time t and d(t, t+1) the number of days between t and t+1 (e.g. DRPC = 1.61% for a stable workload vs 25.07% for an unstable one). Result: identify stable workloads to use in benchmarks.
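The slide gives p(t) and d(t, t+1) but not the full aggregation; a minimal sketch under the assumption that DRPC averages the per-day relative change between consecutive measurements (function and variable names are illustrative):

```python
def drpc(perfs, days):
    """perfs: performance values p(t); days: gaps d(t, t+1) in days.
    Returns the average per-day relative change, in percent."""
    changes = [abs(perfs[t + 1] - perfs[t]) / perfs[t] / days[t] * 100.0
               for t in range(len(perfs) - 1)]
    return sum(changes) / len(changes)
```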
• 72. The bias of public datasets of hardware performance: it is challenging (even for Google & co.) to build a representative benchmark of hardware platforms. Phoronix is the result of a life's work, though it is still missing many SKUs.
  • 73. History of hardware micro-architectures
  • 74. Test suites in Phoronix
• 77. Case 1 - Prediction for New SKUs: predicting performance of hardware platforms based on their properties; built in 16 minutes. [9] Y. Wang, V. Lee, GY. Wei, D. Brooks, Predicting New Workload or CPU Performance by Analyzing Public Datasets, TACO'19, Link
  • 78. Case 2 - Prediction for New Systems 78
  • 79. Case 3 - Cross-Prediction Between Suites (aka different sources) 79
• 80. x264 on YouTube UGC to compress input videos. Configs: no-mbtree, no-cabac. Input properties: spatial/temporal/chunk complexity, resolution of the video, video fps. Perfs: encoding size, encoding time, fps, cpu consumption, bitrate. Wang, Yilin, Sasi Inguva, and Balu Adsumilli, YouTube UGC dataset for video compression research, IEEE 21st International Workshop on Multimedia Signal Processing (MMSP) 2019, https://media.withyoutube.com/ 80
• 81. gcc on PolyBench to compile input .c programs. Configs: fno-asm, O1/O2/Ofast. Input properties: size of the program, #LOCs, #methods, #imports. Perfs: binary size, compilation time, execution time. Louis-Noel Pouchet, Polybench: The polyhedral benchmark suite v3.1, http://web.cs.ucla.edu/~pouchet/software/polybench/ 81
• 82. ImageMagick on an excerpt of ImageNet to blur input images. Configs: quality, thread. Input properties: initial size of image, #rgb. Perfs: size of the result, extraction time. "ImageNet: A large-scale hierarchical image database", Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei, IEEE Conf. on Computer Vision and Pattern Recognition, 2009, https://doi.org/10.1109/CVPR.2009.5206848 82
• 83. lingeling on SAT Competition benchmarks to solve input formulae. Configs: memlimit, minimize. Input properties: #propositions, #and, #or. Perfs: #conflicts, #reductions. Francisco Gomes de Oliveira Neto, Richard Torkar et al., Evolution of statistical analysis in empirical software engineering research [...], Journal of Systems and Software, 2019, https://doi.org/10.1016/j.jss.2019.07.002 83
• 84. NodeJS on its test suite to interpret .js scripts. Configs: debug, wasm. Input properties: size of the script, #LOCs, #methods, #imports. Perfs: #operations per second. NodeJS test suite: https://github.com/nodejs/node 84
• 85. poppler on Trent Nelson's list to extract images out of input .pdf files. Configs: ccit format. Input properties: #pages, #images. Perfs: size of the extracted images, extraction time, avg size of images. Trent Nelson, Technically-oriented pdf collection (github repo), 2014, https://github.com/tpn/pdfs 85
• 86. SQLite on TPC-H to query input databases. Configs: maxsize, memtrace. Input properties: #memory size, #lines. Perfs: time to handle each of 15 queries. Meikel Poess and Chris Floyd, New TPC Benchmarks for Decision Support and Web Commerce, SIGMOD 2000, https://doi.org/10.1145/369275.369291 86
• 87. xz on different corpora to compress input files. Configs: format, level, memory. Input properties: type of file, size. Perfs: size, time. The Canterbury Corpus + The Silesia Corpus, http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia 87
• 88. Devs are aware of the input sensitivity problem (browsing the commits of x264). 88
• 90. Feature Subset Selection for Learning Huge Configuration Spaces: The case of Linux Kernel Size, https://www.kaggle.com/competitions/linux-kernel-size/overview We can benefit from contributions of the machine learning community… and our dataset/problems are attracting interest. 90
• 91. "Neuroimaging pipelines are known to generate different results depending on the computing platform where they are compiled and executed." Significant differences were revealed between FreeSurfer version v5.0.0 and the two earlier versions. [...] About a factor two smaller differences were detected between Macintosh and Hewlett-Packard workstations and between OSX 10.5 and OSX 10.6. The observed differences are similar in magnitude as effect sizes reported in accuracy evaluations and neurodegenerative studies. See also Krefting, D., Scheel, M., Freing, A., Specovius, S., Paul, F., and Brandt, A. (2011), "Reliability of quantitative neuroimage analysis using freesurfer in distributed environments," in MICCAI Workshop on High-Performance and Distributed Computing for Medical Imaging. 91
• 92. "Neuroimaging pipelines are known to generate different results depending on the computing platform where they are compiled and executed." Reproducibility of neuroimaging analyses across operating systems, Glatard et al., Front. Neuroinform., 24 April 2015. The implementation of mathematical functions manipulating single-precision floating-point numbers in libmath has evolved during the last years, leading to numerical differences in computational results. While these differences have little or no impact on simple analysis pipelines such as brain extraction and cortical tissue classification, their accumulation creates important differences in longer pipelines such as the subcortical tissue classification, RSfMRI analysis, and cortical thickness extraction. 92
• 93. Can a coupled ESM simulation be restarted from a different machine without causing climate-changing modifications in the results? Using two versions of EC-Earth: one "non-replicable" case (see below) and one replicable case. 93