With every commit and release, hundreds of tests are run under varying conditions (e.g., over different hardware and workloads) to help understand the evolution of software performance and ensure non-regression. We hypothesize that performance is sensitive not only to the evolution of the software, but also to different variability layers
of its execution environment, spanning the hardware, the operating system, the build, and the workload processed by the software.
Leveraging the MongoDB dataset, our results show that changes in
hardware and workload can drastically impact performance evolution and thus should be taken into account when reasoning about evolution. An open problem resulting from this study is how to manage the variability layers in order to efficiently test the performance
evolution of a software system.
1. Beware of the Interactions of Variability Layers
when Reasoning about the Evolution of MongoDB
Luc Lesoil, Mathieu Acher, Arnaud Blouin & Jean-Marc Jézéquel
2022/04/12
Beijing, China
Data Challenge
2. ≠ Thread Levels ⇒ ≠ Perf Evolutions
[Figure: joint evolution of MongoDB change points (top) and performance values (bottom). The same code is run by User #1 with Thread Level = 512 (Perf ↘) and by User #2 with Thread Level = 1 (Perf ↗), leaving the Dev wondering which evolution is the real one.]
Dataset: Expanded Metrics, Project: sys-perf, Task: industry_benchmark_wmajority, Hardware: linux-3-node-replSet, Test: csb_50_read_50_update_w_majority
Impact of runtime environments on software evolution
Interactions between the runtime environment & the evolution of the software [1]
[1] The Use of Change Point Detection to Identify Software Performance Regressions in a Continuous Integration System, Daly et al., ICPE 2020, https://dl.acm.org/doi/abs/10.1145/3358960.3375791
3. Experiment - Compute the DTW for all combinations of hardware platforms
Impact of hardware platforms on software evolution
What is Dynamic Time Warping? A distance measure between two time series that aligns them in time before comparing them: a low DTW means similar evolutions, a high DTW means different ones.
[Figure: heatmap of DTW between time series related to different variants of hardware; e.g. ⓑ DTW = 0.38 → similar, ⓓ DTW = 5.39 → different]
Result - Identify hardware platforms having similar evolutions, to reduce the cost of benchmarking
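The DTW comparison sketched above can be reproduced with the standard dynamic-programming formulation. This is a minimal pure-Python sketch, not the slide's actual implementation (which would typically rely on a library such as dtaidistance):

```python
# Minimal sketch of Dynamic Time Warping (DTW) between two performance
# time series, used here to compare evolutions across hardware platforms.

def dtw(a, b):
    """Return the DTW distance between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# Identical evolutions -> distance 0; diverging evolutions -> larger values.
print(dtw([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
print(dtw([1.0, 2.0, 3.0], [1.0, 2.0, 9.0]))  # 6.0
```

Running this for every pair of hardware platforms yields the heatmap on the slide, where low pairwise values flag platforms whose benchmarks are redundant.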
4. Impact of workloads on software evolution
Experiment - Compute the DRPC (Daily Relative Percentage Change) distribution for each workload, where:
● p(t) is the performance value at time t
● d(t, t+1) is the number of days between t and t+1
[Figure: DRPC distributions per workload; e.g. ⓑ DRPC = 1.61% (stable), ⓒ DRPC = 25.07% (unstable)]
Result - Identify stable workloads to use in benchmarks
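The slide defines p(t) and d(t, t+1) but not the DRPC formula itself. A hedged sketch, assuming the natural reading of those definitions: the relative change between consecutive measurements, expressed per day elapsed:

```python
# Hedged sketch of the Daily Relative Percentage Change (DRPC).
# ASSUMED formula (not given on the slide):
#   DRPC(t) = |p(t+1) - p(t)| / p(t) * 100 / d(t, t+1)

def drpc(perfs, days):
    """perfs: performance values p(t); days: measurement dates (in days).
    Returns one DRPC value per consecutive pair of measurements."""
    out = []
    for (p0, d0), (p1, d1) in zip(zip(perfs, days),
                                  zip(perfs[1:], days[1:])):
        out.append(abs(p1 - p0) / p0 * 100.0 / (d1 - d0))
    return out

# A stable workload yields small DRPC values; a noisy one, large values.
print(drpc([100.0, 101.0, 99.0], [0, 1, 3]))
```

The distribution of these per-pair values over a workload's history is what the slide summarizes into a single DRPC percentage per workload.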
5. Takeaway Message
Runtime environments matter (when quantifying software evolution)!
@David and MongoDB performance team
Need feedback & domain knowledge to draw actionable conclusions
Thanks for this Data Challenge!
7. Pre-processing of Time Series
① Only consider the period of definition common to the two Time Series
② Linear interpolation if a point is present only in one TS
[Figure: performance over time for TS #1 and TS #2, illustrating the common period and the interpolated points]
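The two pre-processing steps above can be sketched as follows, assuming each time series is represented as a dict mapping timestamps to performance values (the slide does not specify a representation):

```python
# Sketch of the pre-processing on the slide:
# 1) restrict both time series to their common period of definition;
# 2) linearly interpolate a point that is present in only one series.

def preprocess(ts1, ts2):
    lo = max(min(ts1), min(ts2))   # start of the common period
    hi = min(max(ts1), max(ts2))   # end of the common period
    times = sorted(t for t in set(ts1) | set(ts2) if lo <= t <= hi)

    def interp(ts, t):
        if t in ts:
            return ts[t]
        # linear interpolation between the surrounding known points
        left = max(u for u in ts if u < t)
        right = min(u for u in ts if u > t)
        w = (t - left) / (right - left)
        return ts[left] * (1 - w) + ts[right] * w

    return ([interp(ts1, t) for t in times],
            [interp(ts2, t) for t in times])

a = {0: 1.0, 2: 3.0, 4: 5.0}
b = {1: 10.0, 2: 20.0, 3: 30.0}
print(preprocess(a, b))  # ([2.0, 3.0, 4.0], [10.0, 20.0, 30.0])
```

After this step both series share the same timestamps, which is what the pairwise DTW computation requires.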
8. High DTW values for couples of hardware platforms
We have to standardise because the TS have different scales.
These high values can be due to:
- the standardisation, if the standard deviation of the distribution is too low
- outliers in the TS
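The standardisation mentioned above is presumably a z-score (the slide does not say which scheme is used); a minimal sketch showing why a near-zero standard deviation inflates the standardised values, and hence the DTW:

```python
# Z-score standardisation of a time series (ASSUMED scheme: the slide
# only says the TS are standardised because they have different scales).
# When the standard deviation is very small, dividing by it blows up
# the standardised values, which is one stated cause of high DTW.

def standardise(ts):
    mean = sum(ts) / len(ts)
    var = sum((x - mean) ** 2 for x in ts) / len(ts)
    std = var ** 0.5
    return [(x - mean) / std for x in ts]

print(standardise([10.0, 20.0, 30.0]))
```

Guarding against a tiny `std` (e.g., skipping standardisation below a threshold) would mitigate the first cause listed on the slide.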