SlideShare a Scribd company logo
Embracing Deep Variability For
Reproducibility and Replicability
Mathieu Acher, Benoit Combemale, Georges Aaron Randrianaina, Jean-Marc Jézéquel
https://hal.science/hal-04582287
@acherm
Embracing Deep Variability For Reproducibility and
Replicability
Abstract: Reproducibility (aka determinism in some cases) constitutes a fundamental
aspect in various fields of computer science, such as floating-point computations in
numerical analysis and simulation, concurrency models in parallelism, reproducible builds
for third parties integration and packaging, and containerization for execution
environments. These concepts, while pervasive across diverse concerns, often exhibit
intricate inter-dependencies, making it challenging to achieve a comprehensive
understanding. In this short and vision paper we delve into the application of software
engineering techniques, specifically variability management, to systematically identify and
explicit points of variability that may give rise to reproducibility issues (eg language,
libraries, compiler, virtual machine, OS, environment variables, etc). The primary
objectives are: i) gaining insights into the variability layers and their possible interactions,
ii) capturing and documenting configurations for the sake of reproducibility, and iii)
exploring diverse configurations to replicate, and hence validate and ensure the
robustness of results. By adopting these methodologies, we aim to address the
complexities associated with reproducibility and replicability in modern software systems
and environments, facilitating a more comprehensive and nuanced perspective on these
critical aspects.
Computational science
depends on software and its engineering
3
design of mathematical model
mining and analysis of data
executions of large simulations
problem solving
executable paper
from a set of scripts to automate the deployment to… a
comprehensive system containing several features that
help researchers exploring various hypotheses
Computational science
depends on software and its engineering
4
Dealing with software collapse: software stops working eventually
Konrad Hinsen 2019
Configuration failures represent one of the most common types of
software failures Sayagh et al. TSE 2018
multi-million line of code base
multi-dependencies
multi-systems
multi-layer
multi-version
multi-person
multi-variant
“Insanity is doing the same thing over and over again and expecting different results”
Overoptimistic reaction: we are not insane in computational science; we
just live with the fact that many variability factors can lead to “different” results.
Reality check: we are not insane in CS; we fix everything and explore a tiny
portion and ignore which variability factors have a significant effect on results
hope that variability factors won’t refute our findings
5
http://throwgrammarfromthetrain.blogspot.com/2010/10/definition-of-insanity.html
Reproducible science with variability
6
“Authors provide all the necessary data and the computer codes to run the
analysis again, re-creating the results.”
Yet, despite the availability of data and code, several studies report that unexplored variability
in software can lead to varying results up to the point discrepancies can radically change the
conclusions and contradict established knowledge
from a set of scripts to automate the deployment to… a
comprehensive system containing several features that
help researchers exploring various hypotheses
Computational science with deep variability
7
hardware
variability
25,000+ options,
10^6000 variants
(operating system)
thousands of
compiler flags
dozens of library
versions
dozens of
command-line
parameters
(container)
configuration files
(distributed
environment)
hyperparameters
(application code)
variability in data
energy
consumption
execution time
binary
42
accuracy
Deep Software Variability
“refers to the interaction of all external “factors” modifying the behavior (including both functional and
nonfunctional properties) of a software system” Lesoil et al. VaMoS 2020
Combinatorial explosion of the epistemic and ontological variability with impacts on computational
result and non-functional properties
8
always 42 ?
9
Deep Software Variability
10
Deep Software Variability
11
Deep Software Variability
https://github.com/FAMILIAR-project/reproducibility-associativity/
Deep Software Variability
Deep Software Variability
“refers to the interaction of all external “factors”
modifying the behavior (including both functional
and nonfunctional properties) of a software
system” Lesoil et al. VaMoS 2020
Combinatorial explosion of the epistemic and
ontological variability with impacts on
computational result and non-functional properties
Non-linear interplays between variability
layers!
Some evidence of deep variability:
● Climate model
● Machine learning
● Neuroimaging
● Bluff-body aerodynamics
● Performance modeling of software
● Reproducible builds
12
always 42 ?
Can a coupled ESM simulation be restarted from a different machine without causing climate-changing modifications in the results? Using
two versions of EC-Earth: one “non-replicable” case (see below) and one replicable case.
13
We demonstrate that effects of parameter, hardware, and software variation are
detectable, complex, and interacting. However, we find most of the effects of
parameter variation are caused by a small subset of parameters. Notably, the
entrainment coefficient in clouds is associated with 30% of the variation seen in
climate sensitivity, although both low and high values can give high climate
sensitivity. We demonstrate that the effect of hardware and software is small relative
to the effect of parameter variation and, over the wide range of systems tested, may
be treated as equivalent to that caused by changes in initial conditions.
57,067 climate model runs. These runs sample parameter space for 10 parameters
with between two and four levels of each, covering 12,487 parameter combinations
(24% of possible combinations) and a range of initial conditions
14
Joelle Pineau “Building Reproducible, Reusable, and Robust Machine Learning Software” ICSE’19 keynote “[...] results
can be brittle to even minor perturbations in the domain or experimental procedure”
What is the magnitude of the effect
hyperparameter settings can have on baseline
performance?
How does the choice of network architecture for
the policy and value function approximation affect
performance?
How can the reward scale affect results?
Can random seeds drastically alter performance?
How do the environment properties affect
variability in reported RL algorithm performance?
Are commonly used baseline implementations
comparable? 15
“Completing a full replication study of our previously published findings on bluff-body
aerodynamics was harder than we thought. Despite the fact that we have good
reproducible-research practices, sharing our code and data openly.”
16
Can Machine Learning Pipelines Be Better Configured?
Wang et al. FSE’2023
“A pipeline is subject to misconfiguration if
it exhibits significantly inconsistent performance upon changes in
the versions of its configured libraries or the combination of these
libraries. We refer to such performance inconsistency as a pipeline
configuration (PLC) issue.”
17
Should software version numbers determine science?
Significant differences were revealed between
FreeSurfer version v5.0.0 and the two earlier versions.
[...] About a factor two smaller differences were detected
between Macintosh and Hewlett-Packard workstations
and between OSX 10.5 and OSX 10.6. The observed
differences are similar in magnitude as effect sizes
reported in accuracy evaluations and neurodegenerative
studies.
see also Krefting, D., Scheel, M., Freing, A., Specovius, S., Paul, F., and
Brandt, A. (2011). “Reliability of quantitative neuroimage analysis using
freesurfer in distributed environments,” in MICCAI Workshop on
High-Performance and Distributed Computing for Medical Imaging.
18
“Neuroimaging pipelines are known to generate different results
depending on the computing platform where they are compiled and
executed.”
Reproducibility of neuroimaging
analyses across operating systems,
Glatard et al., Front. Neuroinform., 24
April 2015
The implementation of mathematical functions manipulating single-precision floating-point
numbers in libmath has evolved during the last years, leading to numerical differences in
computational results. While these differences have little or no impact on simple analysis
pipelines such as brain extraction and cortical tissue classification, their accumulation
creates important differences in longer pipelines such as the subcortical tissue
classification, RSfMRI analysis, and cortical thickness extraction.
19
Data analysis workflows in many scientific domains have become increasingly complex and flexible (=
subject to variability). Here we assess the effect of this flexibility on the results of functional magnetic
resonance imaging by asking 70 independent teams to analyse the same dataset, testing the same 9
ex-ante hypotheses. The flexibility of analytical approaches is exemplified by the fact that no two teams
chose identical workflows to analyse the data. This flexibility resulted in sizeable variation in the results of
hypothesis tests, even for teams whose statistical maps were highly correlated at intermediate stages of
the analysis pipeline. Variation in reported results was related to several aspects of analysis methodology.
Notably, a meta-analytical approach that aggregated information across teams yielded a significant
consensus in activated regions. Furthermore, prediction markets of researchers in the field revealed an
overestimation of the likelihood of significant findings, even by researchers with direct knowledge of the
dataset. Our findings show that analytical flexibility can have substantial effects on scientific conclusions,
and identify factors that may be related to variability in the analysis of functional magnetic resonance
imaging. The results emphasize the importance of validating and sharing complex analysis workflows, and
demonstrate the need for performing and reporting multiple analyses of the same data. Potential
approaches that could be used to mitigate issues related to analytical variability are discussed.
20
Deep variability problem (statement)
Fundamentally, we have a huge multi-dimensional variant space (eg 10^6000)
run (source_code, input_data) => result
vs
run (hardware, operating_system, build_environment, input_data’, source_code’, …) => eq~(result)
Fixing variability once and for all in all dimensions/layers, is the obvious solution…
Challenging per se; tools available
But it is either impossible (extreme example: the ages of processor can have an impact
on execution time)... And above all, not desirable:
● non-robust result
● generalization/transferability of the results/findings
● kill innovation
21
Our Vision: Embrace
deep variability!
Explicit modeling of the variability
points and their relationships, such as:
1. Get insights into the variability “factors”
and their possible interactions
2. Capture and document configurations
for the sake of reproducibility
3. Explore diverse configurations to
replicate, and hence optimize, validate,
increase the robustness, or provide
better resilience
⇒ We aim to address the complexities associated
with reproducibility and replicability in modern
software systems and environments, facilitating a
more comprehensive and nuanced perspective on these
critical “factors”.
22
https://hal.science/hal-04582287
Replicability is the holy grail!
Exploring various configurations:
● Make more robust scientific findings
● Define and assess the validity envelope
● Enable exploration and optimization
● Innovation and new hypothesis, insights, knowledge
⇒ We propose to embrace deep variability for the sake of
replicability (challenge: results can will be different… define an equivalence relation and manage
uncertainty: confidence intervals, error margin, etc.)
⇒ Good news: software engineering techniques have been and are
developed to support variability management! 23
Feature model: widely studied and used formalism in software engineering (proposed in 1990!)
● Formal abstractions are definitely needed to encode variability knowledge
and pilot the exploration of computational experiments
● Numerous works/techniques to specify and reverse engineer (out of
spreadsheet, command-line parameters, source code, doc., configurations, etc.) feature models
24
Whole
Population of
Configurations
● Performance
prediction
● Identification of
important
variability factors
● Transferability
● Optimization
Training
Sample
Performance
Measurements
Prediction
Model
J. Alves Pereira, H. Martin, M. Acher, J.-M. Jézéquel, G. Botterweck and A. Ventresque
“Learning Software Configuration Spaces: A Systematic Literature Review” JSS, 2021
Automated and strategic exploration with feature
models: sampling and learning (regression, classification)
25
▸ Solutions and challenges
▸ abstractions/models (feature models)
▸ learning and sampling
▸ reuse of configuration knowledge
▸ leveraging stability
▸ systematic exploration
▸ identification of root causes
▸ LLMs to support exploration of variants’ space
▸ incremental build of configuration space (Randrianaina
et al. ICSE’22)
▸ debloating variability (Ternava et al. SAC’23)
▸ feature subset selection (Martin et al. SPLC’23)
▸ Essentially, we aim to reduce the dimensionality of the
problem as well as the computational and human cost
to foster verification of results and innovation
26
Backup slides
27
28
https://github.com/FAMILIAR-project/reproducibility-associativity/
(excerpt)
textual notation (UVL)
Reproducible Science as a Testing Problem
#1 Test Generation Problem (input)
inputs: computing environment, parameters of an algorithm, versions of
a library or tool, choice of a programming language
#2 Oracle Problem (output)
we usually ignore the outcome! (open problems; open questions; new
knowledge)
System under
Study
(replicable)
Input Output
(scientific
result)
29
Reproduction vs replication http://rescience.github.io/faq/
“Reproduction of a computational study means running the same computation on the same input data, and then checking if the
results are the same, or at least “close enough” when it comes to numerical approximations. Reproduction can be considered as
software testing at the level of a complete study.”
We don’t “test” in one run, in one computing environment, with one kind of input data, etc.
“Replication of a scientific study (computational or other) means repeating a published protocol, respecting its spirit and intentions
but varying the technical details. For computational work, this would mean using different software, running a simulation from
different initial conditions, etc. The idea is to change something that everyone believes shouldn’t matter, and see if the scientific
conclusions are affected or not.”
It is the most interesting direction, basically for synthesizing new scientific knowledge!
In both cases, there is the need to
harness the combinatorial explosion
of deep software variability
30
31
Exact same results? No
32

More Related Content

Similar to Embracing Deep Variability For Reproducibility and Replicability

Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
A Comparison of Traditional Simulation and MSAL (6-3-2015)
A Comparison of Traditional Simulation and MSAL (6-3-2015)A Comparison of Traditional Simulation and MSAL (6-3-2015)
A Comparison of Traditional Simulation and MSAL (6-3-2015)
Bob Garrett
 
PCA_2022-In_and_out.pptx zxczxczxczxczxcxzczx
PCA_2022-In_and_out.pptx zxczxczxczxczxcxzczxPCA_2022-In_and_out.pptx zxczxczxczxczxcxzczx
PCA_2022-In_and_out.pptx zxczxczxczxczxcxzczx
JuanManuelNasralaAlv1
 
The effect of distributed archetypes on complexity theory
The effect of distributed archetypes on complexity theoryThe effect of distributed archetypes on complexity theory
The effect of distributed archetypes on complexity theory
Vinícius Uchôa
 
IEEE Fuzzy system Title and Abstract 2016
IEEE Fuzzy system Title and Abstract 2016 IEEE Fuzzy system Title and Abstract 2016
IEEE Fuzzy system Title and Abstract 2016
tsysglobalsolutions
 
4213ijsea04
4213ijsea044213ijsea04
4213ijsea04
ijseajournal
 
THE REMOVAL OF NUMERICAL DRIFT FROM SCIENTIFIC MODELS
THE REMOVAL OF NUMERICAL DRIFT FROM SCIENTIFIC MODELSTHE REMOVAL OF NUMERICAL DRIFT FROM SCIENTIFIC MODELS
THE REMOVAL OF NUMERICAL DRIFT FROM SCIENTIFIC MODELS
IJSEA
 
System Structure for Dependable Software Systems
System Structure for Dependable Software SystemsSystem Structure for Dependable Software Systems
System Structure for Dependable Software Systems
Vincenzo De Florio
 
The Smart Way To Invest in AI and ML_SFStartupDay
The Smart Way To Invest in AI and ML_SFStartupDayThe Smart Way To Invest in AI and ML_SFStartupDay
The Smart Way To Invest in AI and ML_SFStartupDay
Amazon Web Services
 
50120140506002
5012014050600250120140506002
50120140506002
IAEME Publication
 
Scimakelatex.93126.cocoon.bobbin
Scimakelatex.93126.cocoon.bobbinScimakelatex.93126.cocoon.bobbin
Scimakelatex.93126.cocoon.bobbin
Agostino_Marchetti
 
SIMTECT-2013-Images-That-Change-KP
SIMTECT-2013-Images-That-Change-KPSIMTECT-2013-Images-That-Change-KP
SIMTECT-2013-Images-That-Change-KP
Kurt Pudniks
 
Collaborative archietyped for ipv4
Collaborative archietyped for ipv4Collaborative archietyped for ipv4
Collaborative archietyped for ipv4
Fredrick Ishengoma
 
Early Detection of Collaboration Conflicts & Risks in Software Development
Early Detection of Collaboration Conflicts & Risks in Software DevelopmentEarly Detection of Collaboration Conflicts & Risks in Software Development
Early Detection of Collaboration Conflicts & Risks in Software Development
Roopesh Jhurani
 
Model checking
Model checkingModel checking
Model checking
Richard Ashworth
 
Software aging prediction – a new approach
Software aging prediction – a new approach Software aging prediction – a new approach
Software aging prediction – a new approach
IJECEIAES
 
EMPIRICAL APPLICATION OF SIMULATED ANNEALING USING OBJECT-ORIENTED METRICS TO...
EMPIRICAL APPLICATION OF SIMULATED ANNEALING USING OBJECT-ORIENTED METRICS TO...EMPIRICAL APPLICATION OF SIMULATED ANNEALING USING OBJECT-ORIENTED METRICS TO...
EMPIRICAL APPLICATION OF SIMULATED ANNEALING USING OBJECT-ORIENTED METRICS TO...
ijcsa
 
Software Defect Prediction Using Radial Basis and Probabilistic Neural Networks
Software Defect Prediction Using Radial Basis and Probabilistic Neural NetworksSoftware Defect Prediction Using Radial Basis and Probabilistic Neural Networks
Software Defect Prediction Using Radial Basis and Probabilistic Neural Networks
Editor IJCATR
 
Harmful interupts
Harmful interuptsHarmful interupts
Harmful interupts
Richard Ashworth
 
Mastering Software Variability for Innovation and Science
Mastering Software Variability for Innovation and ScienceMastering Software Variability for Innovation and Science
Mastering Software Variability for Innovation and Science
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 

Similar to Embracing Deep Variability For Reproducibility and Replicability (20)

Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 
A Comparison of Traditional Simulation and MSAL (6-3-2015)
A Comparison of Traditional Simulation and MSAL (6-3-2015)A Comparison of Traditional Simulation and MSAL (6-3-2015)
A Comparison of Traditional Simulation and MSAL (6-3-2015)
 
PCA_2022-In_and_out.pptx zxczxczxczxczxcxzczx
PCA_2022-In_and_out.pptx zxczxczxczxczxcxzczxPCA_2022-In_and_out.pptx zxczxczxczxczxcxzczx
PCA_2022-In_and_out.pptx zxczxczxczxczxcxzczx
 
The effect of distributed archetypes on complexity theory
The effect of distributed archetypes on complexity theoryThe effect of distributed archetypes on complexity theory
The effect of distributed archetypes on complexity theory
 
IEEE Fuzzy system Title and Abstract 2016
IEEE Fuzzy system Title and Abstract 2016 IEEE Fuzzy system Title and Abstract 2016
IEEE Fuzzy system Title and Abstract 2016
 
4213ijsea04
4213ijsea044213ijsea04
4213ijsea04
 
THE REMOVAL OF NUMERICAL DRIFT FROM SCIENTIFIC MODELS
THE REMOVAL OF NUMERICAL DRIFT FROM SCIENTIFIC MODELSTHE REMOVAL OF NUMERICAL DRIFT FROM SCIENTIFIC MODELS
THE REMOVAL OF NUMERICAL DRIFT FROM SCIENTIFIC MODELS
 
System Structure for Dependable Software Systems
System Structure for Dependable Software SystemsSystem Structure for Dependable Software Systems
System Structure for Dependable Software Systems
 
The Smart Way To Invest in AI and ML_SFStartupDay
The Smart Way To Invest in AI and ML_SFStartupDayThe Smart Way To Invest in AI and ML_SFStartupDay
The Smart Way To Invest in AI and ML_SFStartupDay
 
50120140506002
5012014050600250120140506002
50120140506002
 
Scimakelatex.93126.cocoon.bobbin
Scimakelatex.93126.cocoon.bobbinScimakelatex.93126.cocoon.bobbin
Scimakelatex.93126.cocoon.bobbin
 
SIMTECT-2013-Images-That-Change-KP
SIMTECT-2013-Images-That-Change-KPSIMTECT-2013-Images-That-Change-KP
SIMTECT-2013-Images-That-Change-KP
 
Collaborative archietyped for ipv4
Collaborative archietyped for ipv4Collaborative archietyped for ipv4
Collaborative archietyped for ipv4
 
Early Detection of Collaboration Conflicts & Risks in Software Development
Early Detection of Collaboration Conflicts & Risks in Software DevelopmentEarly Detection of Collaboration Conflicts & Risks in Software Development
Early Detection of Collaboration Conflicts & Risks in Software Development
 
Model checking
Model checkingModel checking
Model checking
 
Software aging prediction – a new approach
Software aging prediction – a new approach Software aging prediction – a new approach
Software aging prediction – a new approach
 
EMPIRICAL APPLICATION OF SIMULATED ANNEALING USING OBJECT-ORIENTED METRICS TO...
EMPIRICAL APPLICATION OF SIMULATED ANNEALING USING OBJECT-ORIENTED METRICS TO...EMPIRICAL APPLICATION OF SIMULATED ANNEALING USING OBJECT-ORIENTED METRICS TO...
EMPIRICAL APPLICATION OF SIMULATED ANNEALING USING OBJECT-ORIENTED METRICS TO...
 
Software Defect Prediction Using Radial Basis and Probabilistic Neural Networks
Software Defect Prediction Using Radial Basis and Probabilistic Neural NetworksSoftware Defect Prediction Using Radial Basis and Probabilistic Neural Networks
Software Defect Prediction Using Radial Basis and Probabilistic Neural Networks
 
Harmful interupts
Harmful interuptsHarmful interupts
Harmful interupts
 
Mastering Software Variability for Innovation and Science
Mastering Software Variability for Innovation and ScienceMastering Software Variability for Innovation and Science
Mastering Software Variability for Innovation and Science
 

More from University of Rennes, INSA Rennes, Inria/IRISA, CNRS

A Demonstration of End-User Code Customization Using Generative AI
A Demonstration of End-User Code Customization Using Generative AIA Demonstration of End-User Code Customization Using Generative AI
A Demonstration of End-User Code Customization Using Generative AI
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
24 Reasons Why Variability Models Are Not Yet Universal (24RWVMANYU)
24 Reasons Why Variability Models Are Not Yet Universal (24RWVMANYU)24 Reasons Why Variability Models Are Not Yet Universal (24RWVMANYU)
24 Reasons Why Variability Models Are Not Yet Universal (24RWVMANYU)
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
On Programming Variability with Large Language Model-based Assistant
On Programming Variability with Large Language Model-based AssistantOn Programming Variability with Large Language Model-based Assistant
On Programming Variability with Large Language Model-based Assistant
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Generative AI for Reengineering Variants into Software Product Lines: An Expe...
Generative AI for Reengineering Variants into Software Product Lines: An Expe...Generative AI for Reengineering Variants into Software Product Lines: An Expe...
Generative AI for Reengineering Variants into Software Product Lines: An Expe...
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Tackling Deep Software Variability Together
Tackling Deep Software Variability TogetherTackling Deep Software Variability Together
Tackling Deep Software Variability Together
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
On anti-cheating in chess, science, reproducibility, and variability
On anti-cheating in chess, science, reproducibility, and variabilityOn anti-cheating in chess, science, reproducibility, and variability
On anti-cheating in chess, science, reproducibility, and variability
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Feature Subset Selection for Learning Huge Configuration Spaces: The case of ...
Feature Subset Selection for Learning Huge Configuration Spaces: The case of ...Feature Subset Selection for Learning Huge Configuration Spaces: The case of ...
Feature Subset Selection for Learning Huge Configuration Spaces: The case of ...
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Machine Learning and Deep Software Variability
Machine Learning and Deep Software VariabilityMachine Learning and Deep Software Variability
Machine Learning and Deep Software Variability
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Transfer Learning Across Variants and Versions: The Case of Linux Kernel Size
Transfer Learning Across Variants and Versions: The Case of Linux Kernel SizeTransfer Learning Across Variants and Versions: The Case of Linux Kernel Size
Transfer Learning Across Variants and Versions: The Case of Linux Kernel Size
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Reproducible Science and Deep Software Variability
Reproducible Science and Deep Software VariabilityReproducible Science and Deep Software Variability
Reproducible Science and Deep Software Variability
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Software Variability and Artificial Intelligence
Software Variability and Artificial IntelligenceSoftware Variability and Artificial Intelligence
Software Variability and Artificial Intelligence
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Teaching Software Product Lines: A Snapshot of Current Practices and Challenges
Teaching Software Product Lines: A Snapshot of Current Practices and ChallengesTeaching Software Product Lines: A Snapshot of Current Practices and Challenges
Teaching Software Product Lines: A Snapshot of Current Practices and Challenges
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Exploiting the Enumeration of All Feature Model Configurations: A New Perspec...
Exploiting the Enumeration of All Feature Model Configurations: A New Perspec...Exploiting the Enumeration of All Feature Model Configurations: A New Perspec...
Exploiting the Enumeration of All Feature Model Configurations: A New Perspec...
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Assessing Product Line Derivation Operators Applied to Java Source Code: An E...
Assessing Product Line Derivation Operators Applied to Java Source Code: An E...Assessing Product Line Derivation Operators Applied to Java Source Code: An E...
Assessing Product Line Derivation Operators Applied to Java Source Code: An E...
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Synthesis of Attributed Feature Models From Product Descriptions
Synthesis of Attributed Feature Models From Product DescriptionsSynthesis of Attributed Feature Models From Product Descriptions
Synthesis of Attributed Feature Models From Product Descriptions
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
From Basic Variability Models to OpenCompare.org
From Basic Variability Models to OpenCompare.orgFrom Basic Variability Models to OpenCompare.org
From Basic Variability Models to OpenCompare.org
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Pandoc: a universal document converter
Pandoc: a universal document converterPandoc: a universal document converter
Pandoc: a universal document converter
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Metamorphic Domain-Specific Languages
Metamorphic Domain-Specific LanguagesMetamorphic Domain-Specific Languages
Metamorphic Domain-Specific Languages
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
3D Printing, Customization, and Product Lines
3D Printing, Customization, and Product Lines3D Printing, Customization, and Product Lines
3D Printing, Customization, and Product Lines
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
WebFML: Synthesizing Feature Models Everywhere (@ SPLC 2014)
WebFML: Synthesizing Feature Models Everywhere (@ SPLC 2014)WebFML: Synthesizing Feature Models Everywhere (@ SPLC 2014)
WebFML: Synthesizing Feature Models Everywhere (@ SPLC 2014)
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 

More from University of Rennes, INSA Rennes, Inria/IRISA, CNRS (20)

A Demonstration of End-User Code Customization Using Generative AI
A Demonstration of End-User Code Customization Using Generative AIA Demonstration of End-User Code Customization Using Generative AI
A Demonstration of End-User Code Customization Using Generative AI
 
24 Reasons Why Variability Models Are Not Yet Universal (24RWVMANYU)
24 Reasons Why Variability Models Are Not Yet Universal (24RWVMANYU)24 Reasons Why Variability Models Are Not Yet Universal (24RWVMANYU)
24 Reasons Why Variability Models Are Not Yet Universal (24RWVMANYU)
 
On Programming Variability with Large Language Model-based Assistant
On Programming Variability with Large Language Model-based AssistantOn Programming Variability with Large Language Model-based Assistant
On Programming Variability with Large Language Model-based Assistant
 
Generative AI for Reengineering Variants into Software Product Lines: An Expe...
Generative AI for Reengineering Variants into Software Product Lines: An Expe...Generative AI for Reengineering Variants into Software Product Lines: An Expe...
Generative AI for Reengineering Variants into Software Product Lines: An Expe...
 
Tackling Deep Software Variability Together
Tackling Deep Software Variability TogetherTackling Deep Software Variability Together
Tackling Deep Software Variability Together
 
On anti-cheating in chess, science, reproducibility, and variability
On anti-cheating in chess, science, reproducibility, and variabilityOn anti-cheating in chess, science, reproducibility, and variability
On anti-cheating in chess, science, reproducibility, and variability
 
Feature Subset Selection for Learning Huge Configuration Spaces: The case of ...
Feature Subset Selection for Learning Huge Configuration Spaces: The case of ...Feature Subset Selection for Learning Huge Configuration Spaces: The case of ...
Feature Subset Selection for Learning Huge Configuration Spaces: The case of ...
 
Machine Learning and Deep Software Variability
Machine Learning and Deep Software VariabilityMachine Learning and Deep Software Variability
Machine Learning and Deep Software Variability
 
Transfer Learning Across Variants and Versions: The Case of Linux Kernel Size
Transfer Learning Across Variants and Versions: The Case of Linux Kernel SizeTransfer Learning Across Variants and Versions: The Case of Linux Kernel Size
Transfer Learning Across Variants and Versions: The Case of Linux Kernel Size
 
Reproducible Science and Deep Software Variability
Reproducible Science and Deep Software VariabilityReproducible Science and Deep Software Variability
Reproducible Science and Deep Software Variability
 
Software Variability and Artificial Intelligence
Software Variability and Artificial IntelligenceSoftware Variability and Artificial Intelligence
Software Variability and Artificial Intelligence
 
Teaching Software Product Lines: A Snapshot of Current Practices and Challenges
Teaching Software Product Lines: A Snapshot of Current Practices and ChallengesTeaching Software Product Lines: A Snapshot of Current Practices and Challenges
Teaching Software Product Lines: A Snapshot of Current Practices and Challenges
 
Exploiting the Enumeration of All Feature Model Configurations: A New Perspec...
Exploiting the Enumeration of All Feature Model Configurations: A New Perspec...Exploiting the Enumeration of All Feature Model Configurations: A New Perspec...
Exploiting the Enumeration of All Feature Model Configurations: A New Perspec...
 
Assessing Product Line Derivation Operators Applied to Java Source Code: An E...
Assessing Product Line Derivation Operators Applied to Java Source Code: An E...Assessing Product Line Derivation Operators Applied to Java Source Code: An E...
Assessing Product Line Derivation Operators Applied to Java Source Code: An E...
 
Synthesis of Attributed Feature Models From Product Descriptions
Synthesis of Attributed Feature Models From Product DescriptionsSynthesis of Attributed Feature Models From Product Descriptions
Synthesis of Attributed Feature Models From Product Descriptions
 
From Basic Variability Models to OpenCompare.org
From Basic Variability Models to OpenCompare.orgFrom Basic Variability Models to OpenCompare.org
From Basic Variability Models to OpenCompare.org
 
Pandoc: a universal document converter
Pandoc: a universal document converterPandoc: a universal document converter
Pandoc: a universal document converter
 
Metamorphic Domain-Specific Languages
Metamorphic Domain-Specific LanguagesMetamorphic Domain-Specific Languages
Metamorphic Domain-Specific Languages
 
3D Printing, Customization, and Product Lines
3D Printing, Customization, and Product Lines3D Printing, Customization, and Product Lines
3D Printing, Customization, and Product Lines
 
WebFML: Synthesizing Feature Models Everywhere (@ SPLC 2014)
WebFML: Synthesizing Feature Models Everywhere (@ SPLC 2014)WebFML: Synthesizing Feature Models Everywhere (@ SPLC 2014)
WebFML: Synthesizing Feature Models Everywhere (@ SPLC 2014)
 

Recently uploaded

Properties of virus(Ultrastructure and types of virus)
Properties of virus(Ultrastructure and types of virus)Properties of virus(Ultrastructure and types of virus)
Properties of virus(Ultrastructure and types of virus)
saloniswain225
 
Testing the Son of God Hypothesis (Jesus Christ)
Testing the Son of God Hypothesis (Jesus Christ)Testing the Son of God Hypothesis (Jesus Christ)
Testing the Son of God Hypothesis (Jesus Christ)
Robert Luk
 
smallintestinedisorders-causessymptoms-240626051934-b669b27d.pptx
smallintestinedisorders-causessymptoms-240626051934-b669b27d.pptxsmallintestinedisorders-causessymptoms-240626051934-b669b27d.pptx
smallintestinedisorders-causessymptoms-240626051934-b669b27d.pptx
muralinath2
 
[1] Data Mining - Concepts and Techniques (3rd Ed).pdf
[1] Data Mining - Concepts and Techniques (3rd Ed).pdf[1] Data Mining - Concepts and Techniques (3rd Ed).pdf
[1] Data Mining - Concepts and Techniques (3rd Ed).pdf
PANDURANGLAWATE1
 
Detection of the elusive dangling OH ice features at ~2.7 μm in Chamaeleon I ...
Detection of the elusive dangling OH ice features at ~2.7 μm in Chamaeleon I ...Detection of the elusive dangling OH ice features at ~2.7 μm in Chamaeleon I ...
Detection of the elusive dangling OH ice features at ~2.7 μm in Chamaeleon I ...
Sérgio Sacani
 
largeintestinepathologiesconditions-240627071428-3c936a47 (2).pptx
largeintestinepathologiesconditions-240627071428-3c936a47 (2).pptxlargeintestinepathologiesconditions-240627071428-3c936a47 (2).pptx
largeintestinepathologiesconditions-240627071428-3c936a47 (2).pptx
muralinath2
 
The cryptoterrestrial hypothesis: A case for scientific openness to a conceal...
The cryptoterrestrial hypothesis: A case for scientific openness to a conceal...The cryptoterrestrial hypothesis: A case for scientific openness to a conceal...
The cryptoterrestrial hypothesis: A case for scientific openness to a conceal...
Sérgio Sacani
 
MCQ in Electrostatics. for class XII pptx
MCQ in Electrostatics. for class XII  pptxMCQ in Electrostatics. for class XII  pptx
MCQ in Electrostatics. for class XII pptx
ArunachalamM22
 
poikilocytosis 237654378658585210854.pptx
poikilocytosis 237654378658585210854.pptxpoikilocytosis 237654378658585210854.pptx
poikilocytosis 237654378658585210854.pptx
muralinath2
 
degree Certificate of Aston University
degree Certificate of Aston Universitydegree Certificate of Aston University
degree Certificate of Aston University
ebgyz
 
PART 1 & PART 2 The New Natural Principles of Newtonian Mechanics, Electromec...
PART 1 & PART 2 The New Natural Principles of Newtonian Mechanics, Electromec...PART 1 & PART 2 The New Natural Principles of Newtonian Mechanics, Electromec...
PART 1 & PART 2 The New Natural Principles of Newtonian Mechanics, Electromec...
Thane Heins
 
poikilocytosis 23765437865210857453257844.pptx
poikilocytosis 23765437865210857453257844.pptxpoikilocytosis 23765437865210857453257844.pptx
poikilocytosis 23765437865210857453257844.pptx
muralinath2
 
Lunar Mobility Drivers and Needs - Artemis
Lunar Mobility Drivers and Needs - ArtemisLunar Mobility Drivers and Needs - Artemis
Lunar Mobility Drivers and Needs - Artemis
Sérgio Sacani
 
Active and Passive Surveillance of pharmacovigillance
Active and Passive Surveillance of pharmacovigillanceActive and Passive Surveillance of pharmacovigillance
Active and Passive Surveillance of pharmacovigillance
SejalAgrawal43
 
Transmission Spectroscopy of the Habitable Zone Exoplanet LHS 1140 b with JWS...
Transmission Spectroscopy of the Habitable Zone Exoplanet LHS 1140 b with JWS...Transmission Spectroscopy of the Habitable Zone Exoplanet LHS 1140 b with JWS...
Transmission Spectroscopy of the Habitable Zone Exoplanet LHS 1140 b with JWS...
Sérgio Sacani
 
BIOPHYSICS Interactions of molecules in 3-D space-determining binding and.pptx
BIOPHYSICS Interactions of molecules in 3-D space-determining binding and.pptxBIOPHYSICS Interactions of molecules in 3-D space-determining binding and.pptx
BIOPHYSICS Interactions of molecules in 3-D space-determining binding and.pptx
alishyt102010
 
The Dynamical Origins of the Dark Comets and a Proposed Evolutionary Track
The Dynamical Origins of the Dark Comets and a Proposed Evolutionary TrackThe Dynamical Origins of the Dark Comets and a Proposed Evolutionary Track
The Dynamical Origins of the Dark Comets and a Proposed Evolutionary Track
Sérgio Sacani
 
SCIENTIFIC INVESTIGATIONS – THE IMPORTANCE OF FAIR TESTING.pptx
SCIENTIFIC INVESTIGATIONS – THE IMPORTANCE OF FAIR TESTING.pptxSCIENTIFIC INVESTIGATIONS – THE IMPORTANCE OF FAIR TESTING.pptx
SCIENTIFIC INVESTIGATIONS – THE IMPORTANCE OF FAIR TESTING.pptx
JoanaBanasen1
 
Probing the northern Kaapvaal craton root with mantle-derived xenocrysts from...
Probing the northern Kaapvaal craton root with mantle-derived xenocrysts from...Probing the northern Kaapvaal craton root with mantle-derived xenocrysts from...
Probing the northern Kaapvaal craton root with mantle-derived xenocrysts from...
James AH Campbell
 
Anatomy, and reproduction of Gnetum.pptx
Anatomy, and reproduction of Gnetum.pptxAnatomy, and reproduction of Gnetum.pptx
Anatomy, and reproduction of Gnetum.pptx
karthiksaran8
 

Recently uploaded (20)

Properties of virus(Ultrastructure and types of virus)
Properties of virus(Ultrastructure and types of virus)Properties of virus(Ultrastructure and types of virus)
Properties of virus(Ultrastructure and types of virus)
 
Testing the Son of God Hypothesis (Jesus Christ)
Testing the Son of God Hypothesis (Jesus Christ)Testing the Son of God Hypothesis (Jesus Christ)
Testing the Son of God Hypothesis (Jesus Christ)
 
smallintestinedisorders-causessymptoms-240626051934-b669b27d.pptx
smallintestinedisorders-causessymptoms-240626051934-b669b27d.pptxsmallintestinedisorders-causessymptoms-240626051934-b669b27d.pptx
smallintestinedisorders-causessymptoms-240626051934-b669b27d.pptx
 
[1] Data Mining - Concepts and Techniques (3rd Ed).pdf
[1] Data Mining - Concepts and Techniques (3rd Ed).pdf[1] Data Mining - Concepts and Techniques (3rd Ed).pdf
[1] Data Mining - Concepts and Techniques (3rd Ed).pdf
 
Detection of the elusive dangling OH ice features at ~2.7 μm in Chamaeleon I ...
Detection of the elusive dangling OH ice features at ~2.7 μm in Chamaeleon I ...Detection of the elusive dangling OH ice features at ~2.7 μm in Chamaeleon I ...
Detection of the elusive dangling OH ice features at ~2.7 μm in Chamaeleon I ...
 
largeintestinepathologiesconditions-240627071428-3c936a47 (2).pptx
largeintestinepathologiesconditions-240627071428-3c936a47 (2).pptxlargeintestinepathologiesconditions-240627071428-3c936a47 (2).pptx
largeintestinepathologiesconditions-240627071428-3c936a47 (2).pptx
 
The cryptoterrestrial hypothesis: A case for scientific openness to a conceal...
The cryptoterrestrial hypothesis: A case for scientific openness to a conceal...The cryptoterrestrial hypothesis: A case for scientific openness to a conceal...
The cryptoterrestrial hypothesis: A case for scientific openness to a conceal...
 
MCQ in Electrostatics. for class XII pptx
MCQ in Electrostatics. for class XII  pptxMCQ in Electrostatics. for class XII  pptx
MCQ in Electrostatics. for class XII pptx
 
poikilocytosis 237654378658585210854.pptx
poikilocytosis 237654378658585210854.pptxpoikilocytosis 237654378658585210854.pptx
poikilocytosis 237654378658585210854.pptx
 
degree Certificate of Aston University
degree Certificate of Aston Universitydegree Certificate of Aston University
degree Certificate of Aston University
 
PART 1 & PART 2 The New Natural Principles of Newtonian Mechanics, Electromec...
PART 1 & PART 2 The New Natural Principles of Newtonian Mechanics, Electromec...PART 1 & PART 2 The New Natural Principles of Newtonian Mechanics, Electromec...
PART 1 & PART 2 The New Natural Principles of Newtonian Mechanics, Electromec...
 
poikilocytosis 23765437865210857453257844.pptx
poikilocytosis 23765437865210857453257844.pptxpoikilocytosis 23765437865210857453257844.pptx
poikilocytosis 23765437865210857453257844.pptx
 
Lunar Mobility Drivers and Needs - Artemis
Lunar Mobility Drivers and Needs - ArtemisLunar Mobility Drivers and Needs - Artemis
Lunar Mobility Drivers and Needs - Artemis
 
Active and Passive Surveillance of pharmacovigillance
Active and Passive Surveillance of pharmacovigillanceActive and Passive Surveillance of pharmacovigillance
Active and Passive Surveillance of pharmacovigillance
 
Transmission Spectroscopy of the Habitable Zone Exoplanet LHS 1140 b with JWS...
Transmission Spectroscopy of the Habitable Zone Exoplanet LHS 1140 b with JWS...Transmission Spectroscopy of the Habitable Zone Exoplanet LHS 1140 b with JWS...
Transmission Spectroscopy of the Habitable Zone Exoplanet LHS 1140 b with JWS...
 
BIOPHYSICS Interactions of molecules in 3-D space-determining binding and.pptx
BIOPHYSICS Interactions of molecules in 3-D space-determining binding and.pptxBIOPHYSICS Interactions of molecules in 3-D space-determining binding and.pptx
BIOPHYSICS Interactions of molecules in 3-D space-determining binding and.pptx
 
The Dynamical Origins of the Dark Comets and a Proposed Evolutionary Track
The Dynamical Origins of the Dark Comets and a Proposed Evolutionary TrackThe Dynamical Origins of the Dark Comets and a Proposed Evolutionary Track
The Dynamical Origins of the Dark Comets and a Proposed Evolutionary Track
 
SCIENTIFIC INVESTIGATIONS – THE IMPORTANCE OF FAIR TESTING.pptx
SCIENTIFIC INVESTIGATIONS – THE IMPORTANCE OF FAIR TESTING.pptxSCIENTIFIC INVESTIGATIONS – THE IMPORTANCE OF FAIR TESTING.pptx
SCIENTIFIC INVESTIGATIONS – THE IMPORTANCE OF FAIR TESTING.pptx
 
Probing the northern Kaapvaal craton root with mantle-derived xenocrysts from...
Probing the northern Kaapvaal craton root with mantle-derived xenocrysts from...Probing the northern Kaapvaal craton root with mantle-derived xenocrysts from...
Probing the northern Kaapvaal craton root with mantle-derived xenocrysts from...
 
Anatomy, and reproduction of Gnetum.pptx
Anatomy, and reproduction of Gnetum.pptxAnatomy, and reproduction of Gnetum.pptx
Anatomy, and reproduction of Gnetum.pptx
 

Embracing Deep Variability For Reproducibility and Replicability

  • 1. Embracing Deep Variability For Reproducibility and Replicability Mathieu Acher, Benoit Combemale, Georges Aaron Randrianaina, Jean-Marc Jézéquel https://hal.science/hal-04582287 @acherm
  • 2. Embracing Deep Variability For Reproducibility and Replicability Abstract: Reproducibility (aka determinism in some cases) constitutes a fundamental aspect in various fields of computer science, such as floating-point computations in numerical analysis and simulation, concurrency models in parallelism, reproducible builds for third parties integration and packaging, and containerization for execution environments. These concepts, while pervasive across diverse concerns, often exhibit intricate inter-dependencies, making it challenging to achieve a comprehensive understanding. In this short and vision paper we delve into the application of software engineering techniques, specifically variability management, to systematically identify and explicit points of variability that may give rise to reproducibility issues (eg language, libraries, compiler, virtual machine, OS, environment variables, etc). The primary objectives are: i) gaining insights into the variability layers and their possible interactions, ii) capturing and documenting configurations for the sake of reproducibility, and iii) exploring diverse configurations to replicate, and hence validate and ensure the robustness of results. By adopting these methodologies, we aim to address the complexities associated with reproducibility and replicability in modern software systems and environments, facilitating a more comprehensive and nuanced perspective on these critical aspects.
  • 3. Computational science depends on software and its engineering 3 design of mathematical model mining and analysis of data executions of large simulations problem solving executable paper from a set of scripts to automate the deployment to… a comprehensive system containing several features that help researchers exploring various hypotheses
  • 4. Computational science depends on software and its engineering 4 Dealing with software collapse: software stops working eventually Konrad Hinsen 2019 Configuration failures represent one of the most common types of software failures Sayagh et al. TSE 2018 multi-million line of code base multi-dependencies multi-systems multi-layer multi-version multi-person multi-variant
  • 5. “Insanity is doing the same thing over and over again and expecting different results” Overoptimistic reaction: we are not insane in computational science; we just live with the fact that many variability factors can lead to “different” results. Reality check: we are not insane in CS; we fix everything and explore a tiny portion and ignore which variability factors have a significant effect on results hope that variability factors won’t refute our findings 5 http://throwgrammarfromthetrain.blogspot.com/2010/10/definition-of-insanity.html
  • 6. Reproducible science with variability 6 “Authors provide all the necessary data and the computer codes to run the analysis again, re-creating the results.” Yet, despite the availability of data and code, several studies report that unexplored variability in software can lead to varying results up to the point discrepancies can radically change the conclusions and contradict established knowledge from a set of scripts to automate the deployment to… a comprehensive system containing several features that help researchers exploring various hypotheses
  • 7. Computational science with deep variability 7 hardware variability 25,000+ options, 10^6000 variants (operating system) thousands of compiler flags dozens of library versions dozens of command-line parameters (container) configuration files (distributed environment) hyperparameters (application code) variability in data energy consumption execution time binary 42 accuracy
  • 8. Deep Software Variability “refers to the interaction of all external “factors” modifying the behavior (including both functional and nonfunctional properties) of a software system” Lesoil et al. VaMoS 2020 Combinatorial explosion of the epistemic and ontological variability with impacts on computational result and non-functional properties 8 always 42 ?
  • 12. Deep Software Variability “refers to the interaction of all external “factors” modifying the behavior (including both functional and nonfunctional properties) of a software system” Lesoil et al. VaMoS 2020 Combinatorial explosion of the epistemic and ontological variability with impacts on computational result and non-functional properties Non-linear interplays between variability layers! Some evidence of deep variability: ● Climate model ● Machine learning ● Neuroimaging ● Bluff-body aerodynamics ● Performance modeling of software ● Reproducible builds 12 always 42 ?
  • 13. Can a coupled ESM simulation be restarted from a different machine without causing climate-changing modifications in the results? Using two versions of EC-Earth: one “non-replicable” case (see below) and one replicable case. 13
  • 14. We demonstrate that effects of parameter, hardware, and software variation are detectable, complex, and interacting. However, we find most of the effects of parameter variation are caused by a small subset of parameters. Notably, the entrainment coefficient in clouds is associated with 30% of the variation seen in climate sensitivity, although both low and high values can give high climate sensitivity. We demonstrate that the effect of hardware and software is small relative to the effect of parameter variation and, over the wide range of systems tested, may be treated as equivalent to that caused by changes in initial conditions. 57,067 climate model runs. These runs sample parameter space for 10 parameters with between two and four levels of each, covering 12,487 parameter combinations (24% of possible combinations) and a range of initial conditions 14
  • 15. Joelle Pineau “Building Reproducible, Reusable, and Robust Machine Learning Software” ICSE’19 keynote “[...] results can be brittle to even minor perturbations in the domain or experimental procedure” What is the magnitude of the effect hyperparameter settings can have on baseline performance? How does the choice of network architecture for the policy and value function approximation affect performance? How can the reward scale affect results? Can random seeds drastically alter performance? How do the environment properties affect variability in reported RL algorithm performance? Are commonly used baseline implementations comparable? 15
  • 16. “Completing a full replication study of our previously published findings on bluff-body aerodynamics was harder than we thought. Despite the fact that we have good reproducible-research practices, sharing our code and data openly.” 16
  • 17. Can Machine Learning Pipelines Be Better Configured? Wang et al. FSE’2023 “A pipeline is subject to misconfiguration if it exhibits significantly inconsistent performance upon changes in the versions of its configured libraries or the combination of these libraries. We refer to such performance inconsistency as a pipeline configuration (PLC) issue.” 17
  • 18. Should software version numbers determine science? Significant differences were revealed between FreeSurfer version v5.0.0 and the two earlier versions. [...] About a factor two smaller differences were detected between Macintosh and Hewlett-Packard workstations and between OSX 10.5 and OSX 10.6. The observed differences are similar in magnitude as effect sizes reported in accuracy evaluations and neurodegenerative studies. see also Krefting, D., Scheel, M., Freing, A., Specovius, S., Paul, F., and Brandt, A. (2011). “Reliability of quantitative neuroimage analysis using freesurfer in distributed environments,” in MICCAI Workshop on High-Performance and Distributed Computing for Medical Imaging. 18
  • 19. “Neuroimaging pipelines are known to generate different results depending on the computing platform where they are compiled and executed.” Reproducibility of neuroimaging analyses across operating systems, Glatard et al., Front. Neuroinform., 24 April 2015 The implementation of mathematical functions manipulating single-precision floating-point numbers in libmath has evolved during the last years, leading to numerical differences in computational results. While these differences have little or no impact on simple analysis pipelines such as brain extraction and cortical tissue classification, their accumulation creates important differences in longer pipelines such as the subcortical tissue classification, RSfMRI analysis, and cortical thickness extraction. 19
  • 20. Data analysis workflows in many scientific domains have become increasingly complex and flexible (= subject to variability). Here we assess the effect of this flexibility on the results of functional magnetic resonance imaging by asking 70 independent teams to analyse the same dataset, testing the same 9 ex-ante hypotheses. The flexibility of analytical approaches is exemplified by the fact that no two teams chose identical workflows to analyse the data. This flexibility resulted in sizeable variation in the results of hypothesis tests, even for teams whose statistical maps were highly correlated at intermediate stages of the analysis pipeline. Variation in reported results was related to several aspects of analysis methodology. Notably, a meta-analytical approach that aggregated information across teams yielded a significant consensus in activated regions. Furthermore, prediction markets of researchers in the field revealed an overestimation of the likelihood of significant findings, even by researchers with direct knowledge of the dataset. Our findings show that analytical flexibility can have substantial effects on scientific conclusions, and identify factors that may be related to variability in the analysis of functional magnetic resonance imaging. The results emphasize the importance of validating and sharing complex analysis workflows, and demonstrate the need for performing and reporting multiple analyses of the same data. Potential approaches that could be used to mitigate issues related to analytical variability are discussed. 20
  • 21. Deep variability problem (statement) Fundamentally, we have a huge multi-dimensional variant space (eg 10^6000) run (source_code, input_data) => result vs run (hardware, operating_system, build_environment, input_data’, source_code’, …) => eq~(result) Fixing variability once and for all in all dimensions/layers, is the obvious solution… Challenging per se; tools available But it is either impossible (extreme example: the ages of processor can have an impact on execution time)... And above all, not desirable: ● non-robust result ● generalization/transferability of the results/findings ● kill innovation 21
  • 22. Our Vision: Embrace deep variability! Explicit modeling of the variability points and their relationships, such as: 1. Get insights into the variability “factors” and their possible interactions 2. Capture and document configurations for the sake of reproducibility 3. Explore diverse configurations to replicate, and hence optimize, validate, increase the robustness, or provide better resilience ⇒ We aim to address the complexities associated with reproducibility and replicability in modern software systems and environments, facilitating a more comprehensive and nuanced perspective on these critical “factors”. 22 https://hal.science/hal-04582287
  • 23. Replicability is the holy grail! Exploring various configurations: ● Make more robust scientific findings ● Define and assess the validity envelope ● Enable exploration and optimization ● Innovation and new hypothesis, insights, knowledge ⇒ We propose to embrace deep variability for the sake of replicability (challenge: results can will be different… define an equivalence relation and manage uncertainty: confidence intervals, error margin, etc.) ⇒ Good news: software engineering techniques have been and are developed to support variability management! 23
  • 24. Feature model: widely studied and used formalism in software engineering (proposed in 1990!) ● Formal abstractions are definitely needed to encode variability knowledge and pilot the exploration of computational experiments ● Numerous works/techniques to specify and reverse engineer (out of spreadsheet, command-line parameters, source code, doc., configurations, etc.) feature models 24
  • 25. Whole Population of Configurations ● Performance prediction ● Identification of important variability factors ● Transferability ● Optimization Training Sample Performance Measurements Prediction Model J. Alves Pereira, H. Martin, M. Acher, J.-M. Jézéquel, G. Botterweck and A. Ventresque “Learning Software Configuration Spaces: A Systematic Literature Review” JSS, 2021 Automated and strategic exploration with feature models: sampling and learning (regression, classification) 25
  • 26. ▸ Solutions and challenges ▸ abstractions/models (feature models) ▸ learning and sampling ▸ reuse of configuration knowledge ▸ leveraging stability ▸ systematic exploration ▸ identification of root causes ▸ LLMs to support exploration of variants’ space ▸ incremental build of configuration space (Randrianaina et al. ICSE’22) ▸ debloating variability (Ternava et al. SAC’23) ▸ feature subset selection (Martin et al. SPLC’23) ▸ Essentially, we aim to reduce the dimensionality of the problem as well as the computational and human cost to foster verification of results and innovation 26
  • 29. Reproducible Science as a Testing Problem #1 Test Generation Problem (input) inputs: computing environment, parameters of an algorithm, versions of a library or tool, choice of a programming language #2 Oracle Problem (output) we usually ignore the outcome! (open problems; open questions; new knowledge) System under Study (replicable) Input Output (scientific result) 29
  • 30. Reproduction vs replication http://rescience.github.io/faq/ “Reproduction of a computational study means running the same computation on the same input data, and then checking if the results are the same, or at least “close enough” when it comes to numerical approximations. Reproduction can be considered as software testing at the level of a complete study.” We don’t “test” in one run, in one computing environment, with one kind of input data, etc. “Replication of a scientific study (computational or other) means repeating a published protocol, respecting its spirit and intentions but varying the technical details. For computational work, this would mean using different software, running a simulation from different initial conditions, etc. The idea is to change something that everyone believes shouldn’t matter, and see if the scientific conclusions are affected or not.” It is the most interesting direction, basically for synthesizing new scientific knowledge! In both cases, there is the need to harness the combinatorial explosion of deep software variability 30
  • 31. 31