With large-scale and complex configurable systems, it is hard for users to choose the right combination of options (i.e., configurations) to obtain the desired trade-off between functionality and performance goals such as speed or size. Machine learning can help relate these goals to the configurable system's options, and thus predict the effect of options on the outcome, typically after a costly training step. However, many configurable systems evolve at such a rapid pace that it is impractical to retrain a model from scratch for each new version. In this paper, we propose a new method to enable transfer learning of binary size predictions among versions of the same configurable system. Taking the extreme case of the Linux kernel with its ≈14,500 configuration options, we first investigate how predictions of kernel binary size degrade over successive versions. We show that the direct reuse of an accurate prediction model from 2017 quickly becomes inaccurate as Linux evolves, reaching a 32% mean error by August 2020. We thus propose a new approach, Transfer Evolution-Aware Model Shifting (TEAMS), which leverages the structure of a configurable system to transfer an initial predictive model towards its future versions with a minimal amount of extra processing per version. We show that TEAMS vastly outperforms state-of-the-art approaches over the 3-year history of Linux kernels, from 4.13 to 5.8.
preprint: https://hal.inria.fr/hal-03720273
presented at SPLC 2022 research track
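As a rough, hypothetical illustration of the general model-shifting idea behind transfer learning across versions (not the paper's exact TEAMS algorithm), the sketch below trains a size model on synthetic "old version" data, then fits a simple linear shift from only a handful of measurements of a drifted "new version". All option counts, weights, and drift factors are made-up assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 50 binary options; kernel size is a linear
# function of the enabled options plus noise.
n_opts = 50
w = rng.uniform(0, 10, n_opts)              # per-option size contributions
X_src = rng.integers(0, 2, (1000, n_opts))  # configurations of the old version
y_src = X_src @ w + rng.normal(0, 1, 1000)

# "Source" model: ordinary least squares on the old version.
w_hat, *_ = np.linalg.lstsq(X_src, y_src, rcond=None)

# New version: same options, but sizes drifted (assumed scale + offset).
X_tgt = rng.integers(0, 2, (20, n_opts))    # only 20 cheap new measurements
y_tgt = 1.15 * (X_tgt @ w) + 300 + rng.normal(0, 1, 20)

# Model shifting: fit a 1-D linear map from old predictions to new sizes,
# instead of retraining a full model from scratch.
pred_src = X_tgt @ w_hat
a, b = np.polyfit(pred_src, y_tgt, 1)

def predict_shifted(X):
    return a * (X @ w_hat) + b

# The shifted model should track the drifted ground truth closely.
X_test = rng.integers(0, 2, (200, n_opts))
truth = 1.15 * (X_test @ w) + 300
mape = np.mean(np.abs(predict_shifted(X_test) - truth) / truth) * 100
```

The point of the sketch is the cost asymmetry: the source model needs many measured configurations, while the shift is recovered from a few.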
Linux kernels are used in a wide variety of appliances, many of them having strong requirements on kernel size due to constraints such as limited memory or instant boot. With more than nine thousand configuration options to choose from, developers and users of Linux spend significant effort to document, understand, and eventually tune (combinations of) options to meet a kernel size target. In this paper, we describe a large-scale endeavour to automate this task and predict the binary size of a given Linux kernel out of unmeasured configurations. We first show experimentally that state-of-the-art solutions specifically made for configurable systems, such as performance-influence models, cannot cope with that number of options, suggesting that software product line techniques may need to be adapted to such huge configuration spaces. We then show that tree-based feature selection can learn a model achieving low prediction errors over a reduced set of options. The resulting model, trained on 95,854 kernel configurations, is fast to compute, simple to interpret, and even outperforms the accuracy of learning without feature selection.
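A minimal sketch of tree-based feature selection in this spirit, using scikit-learn; the data here is synthetic (a few assumed influential options among many irrelevant ones), not the paper's kernel dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Hypothetical data: 200 binary options, but only the first 5 drive size.
X = rng.integers(0, 2, (500, 200)).astype(float)
true_w = np.zeros(200)
true_w[:5] = [500, 300, 200, 100, 50]       # assumed influential options
y = X @ true_w + rng.normal(0, 5, 500)

# Step 1: fit a forest and rank options by impurity-based importance.
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
top = np.argsort(forest.feature_importances_)[::-1][:10]

# Step 2: retrain on the reduced option set only; the small model keeps
# the predictive power while being far easier to interpret.
reduced = RandomForestRegressor(n_estimators=50, random_state=0)
reduced.fit(X[:, top], y)
```

With real kernel configurations the ranking step is what prunes thousands of options down to the handful that matter for size.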
For certain workloads and environments: consolidation on large virtualized servers raises utilization, reduces core requirements, and lowers cost per workload.
The document summarizes the advantages of IBM LinuxONE systems over traditional x86 servers for running Linux workloads. LinuxONE systems provide massive scale with high performance, throughput, and security across many workloads like MongoDB, Docker containers, and virtual machines. They also have significantly lower total cost of ownership compared to solutions on x86 servers due to higher utilization rates and lower management costs.
This document discusses parallel and cluster computing. It begins with an introduction to cluster computing and classifications of cluster computing. It then discusses technologies used in cluster computing like Beowulf clusters and their construction. It describes how cluster computing is used in fields like bioinformatics and parallel computing through projects like Folding@Home. The document outlines different types of clusters and provides details about building a science cluster, including hardware, networking, operating systems, and parallel programming environments. It gives examples of cluster applications in science, computation, and other domains.
IBM and Erb's Presentation, Insider's Edition Event, September 2010 (Erb's Marketing)
IBM and Erb's partnership has spanned 20 years. Together we provide world-class computing and business technology solutions to growing companies across Iowa and Western Illinois. Here is a joint presentation from a recent Erb's customer event, in which Erb's and IBM review IBM's leadership in server and storage technology.
Achieving Performance Isolation with Lightweight Co-Kernels, by Jiannan Ouyang, PhD
These slides were presented at the 24th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '15).
Performance isolation is emerging as a requirement for High Performance Computing (HPC) applications, particularly as HPC architectures turn to in situ data processing and application composition techniques to increase system throughput. These approaches require the co-location of disparate workloads on the same compute node, each with different resource and runtime requirements. In this paper we claim that these workloads cannot be effectively managed by a single Operating System/Runtime (OS/R). Therefore, we present Pisces, a system software architecture that enables the co-existence of multiple independent and fully isolated OS/Rs, or enclaves, that can be customized to address the disparate requirements of next generation HPC workloads. Each enclave consists of a specialized lightweight OS co-kernel and runtime, which is capable of independently managing partitions of dynamically assigned hardware resources. Contrary to other co-kernel approaches, in this work we consider performance isolation to be a primary requirement and present a novel co-kernel architecture to achieve this goal. We further present a set of design requirements necessary to ensure performance isolation, including: (1) elimination of cross OS dependencies, (2) internalized management of I/O, (3) limiting cross enclave communication to explicit shared memory channels, and (4) using virtualization techniques to provide missing OS features. The implementation of the Pisces co-kernel architecture is based on the Kitten Lightweight Kernel and Palacios Virtual Machine Monitor, two system software architectures designed specifically for HPC systems. Finally we will show that lightweight isolated co-kernels can provide better performance for HPC applications, and that isolated virtual machines are even capable of outperforming native environments in the presence of competing workloads.
The document compares several open source virtualization and container technologies. It describes experiments conducted to measure the performance and scalability of KVM, Xen, Linux-VServer, OpenVZ, VirtualBox and KQEMU. The experiments involved benchmarking the technologies using various workloads to test aspects like overhead, throughput, and ability to run multiple virtual machines concurrently. Linux-VServer showed the best overall performance, while Xen performed well except on I/O-bound workloads. The conclusions recommend using OpenVZ for network applications, Linux-VServer for general use, and KVM or VirtualBox for development environments.
Open Source Software on OpenPOWER systems.
With 100% open source system software (including the firmware), OpenPOWER is the most open server architecture in the market. Based on the IBM POWER8 chip, this new family of servers featuring the latest Nvidia NVLink technology runs all the software solutions presented at OPEN'16 with significant cost advantages. This session explains how Docker, EnterpriseDB and many others benefit from this advanced design, and how 200+ technology companies including Google and RackSpace are collaborating in an open development alliance to build the datacenter of the future.
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit (Chester Chen)
Machine Learning at the Limit
John Canny, UC Berkeley
How fast can machine learning and graph algorithms be? In "roofline" design, every kernel is driven toward the limits imposed by CPU, memory, network, etc. This can lead to dramatic improvements: BIDMach is a toolkit for machine learning that uses rooflined design and GPUs to achieve two to three orders of magnitude improvements over other toolkits on single machines. These speedups are larger than have been reported for *cluster* systems (e.g. Spark/MLlib, PowerGraph) running on hundreds of nodes, and BIDMach with a GPU outperforms these systems for most common machine learning tasks. For algorithms (e.g. graph algorithms) which do require cluster computing, we have developed a rooflined network primitive called "Kylix". We can show that Kylix approaches the roofline limits for sparse Allreduce, and empirically holds the record for distributed Pagerank. Beyond rooflining, we believe there are great opportunities from deep algorithm/hardware codesign. Gibbs Sampling (GS) is a very general tool for inference, but is typically much slower than alternatives. SAME (State Augmentation for Marginal Estimation) is a variation of GS which was developed for marginal parameter estimation. We show that it has high parallelism and a fast GPU implementation. Using SAME, we developed a GS implementation of Latent Dirichlet Allocation whose running time is 100x faster than other samplers, and within 3x of the fastest symbolic methods. We are extending this approach to general graphical models, an area where there is currently a void of (practically) fast tools. It seems at least plausible that a general-purpose solution based on these techniques can closely approach the performance of custom algorithms.
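For context, the roofline bound mentioned in the abstract is a simple formula: attainable performance is the minimum of peak compute and memory bandwidth times arithmetic intensity. The peak numbers below are illustrative assumptions, not any particular machine's specification:

```python
# Roofline model: perf(I) = min(peak_flops, peak_bandwidth * I),
# where I is arithmetic intensity in FLOP per byte moved.
peak_flops = 4.0e12   # assumed 4 TFLOP/s compute peak
peak_bw = 300.0e9     # assumed 300 GB/s memory bandwidth

def roofline(intensity_flops_per_byte):
    return min(peak_flops, peak_bw * intensity_flops_per_byte)

# A kernel at 2 FLOP/byte is bandwidth-bound under these assumptions:
# 300 GB/s * 2 FLOP/byte = 600 GFLOP/s, far below the compute peak.
bound_at_2 = roofline(2.0)

# The "ridge point" where compute takes over: peak_flops / peak_bw.
ridge = peak_flops / peak_bw
```

Kernels to the left of the ridge point are memory-bound, so optimization means moving fewer bytes; to the right, it means using the ALUs better.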
Bio
John Canny is a professor in computer science at UC Berkeley. He is an ACM dissertation award winner and a Packard Fellow. He is currently a Data Science Senior Fellow in Berkeley's new Institute for Data Science and holds an INRIA (France) International Chair. Since 2002, he has been developing and deploying large-scale behavioral modeling systems. He designed and prototyped production systems for Overstock.com, Yahoo, eBay, Quantcast, and Microsoft. He currently works on several applications of data mining for human learning (MOOCs and early language learning), health and well-being, and applications in the sciences.
This document provides an overview of IBM's portfolio of hardware, software, and services for an enterprise-grade Linux operating environment on IBM LinuxONE systems. It highlights key capabilities such as freedom and agility, standards-based design, speed of innovation, developer productivity, community collaboration, quality software, dynamic resource allocation, non-disruptive scalability, continuous business availability, and operational efficiency. The document also discusses IBM LinuxONE solutions for cloud, DevOps, analytics, elastic pricing, and announcements. It provides examples of workloads such as databases that benefit from the LinuxONE architecture.
The document discusses plans to establish an institutional high performance computing (HPC) facility at North-West University. It outlines the technical goals of building a Beowulf cluster to link existing departmental clusters and integrate with national and international computational grids. It also discusses management principles for the new HPC facility to ensure sustainability, efficiency, reliability, availability and high performance.
Interface for Performance Environment Autoconfiguration Framework, by Liang Men
The document presents PEAK (Performance Environment Autoconfiguration frameworK), a framework that helps developers and users find optimal configurations for scientific applications on HPC systems. PEAK's interface automates benchmarking of applications using different compiler, library, and environment settings. It develops a performance database to provide reference for optimal configurations to achieve speedups. The framework was tested on two HPC systems, comparing performance of applications compiled with different compilers and linked to numerical libraries including LibSci, ACML, Intel MKL, FFTW, and CRAFFT. It found default settings were not always optimal, and the framework can help select better performing configurations for applications.
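The sweep-and-record idea behind such a framework can be sketched in a few lines; the compiler and library names below are placeholders, and the benchmark is a stand-in loop rather than a real compiled application:

```python
import itertools
import time

# Hypothetical sweep: benchmark a workload under every (compiler, library)
# combination and record timings in a small "performance database".
compilers = ["gcc", "icc"]          # placeholder toolchain names
libraries = ["mkl", "fftw"]         # placeholder numerical libraries

def run_benchmark(compiler, library):
    # Stand-in for compiling with `compiler`, linking `library`,
    # and timing the real application.
    t0 = time.perf_counter()
    sum(i * i for i in range(10000))
    return time.perf_counter() - t0

perf_db = {
    (c, lib): run_benchmark(c, lib)
    for c, lib in itertools.product(compilers, libraries)
}

# The recommended configuration is simply the fastest recorded one.
best = min(perf_db, key=perf_db.get)
```

A real framework would persist `perf_db` so later users get the recommendation without re-running the sweep.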
Hands-On Workshop on Performance Optimization for Intel Xeon Phi Processor Family x200 (formerly Knights Landing) from Colfax International. More information at http://colfaxresearch.com/knl-webinar/
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage (MayaData Inc)
Webinar Session - https://youtu.be/_5MfGMf8PG4
In this webinar, we share how the Container Attached Storage pattern makes performance tuning more tractable, by giving each workload its own storage system, thereby decreasing the variables needed to understand and tune performance.
We then introduce MayaStor, a breakthrough in the use of containers and Kubernetes as a data plane. MayaStor is the first containerized data engine available that delivers near the theoretical maximum performance of underlying systems. MayaStor performance scales with the underlying hardware and has been shown, for example, to deliver in excess of 10 million IOPS in a particular environment.
From Rack-scale Computers to Warehouse-scale Computers, by Ryousei Takano
This document discusses the transition from rack-scale computers to warehouse-scale computers through the disaggregation of technologies. It provides examples of rack-scale architectures like Open Compute Project and Intel Rack Scale Architecture. For warehouse-scale computers, it examines HP's The Machine project using application-specific cores, universal memory, and photonics fabric. It also outlines UC Berkeley's FireBox project utilizing 1 terabit/sec optical fibers, many-core systems-on-chip, and non-volatile memory modules connected via high-radix photonic switches.
Evolution of the Windows Kernel Architecture, by Dave Probert
Dave Probert is a kernel architect at Microsoft who has over 13 years of experience working on Windows kernels. He helped design key aspects of Windows such as multi-core support and user-mode scheduling. Probert provided an overview of how the Windows kernel architecture has evolved over time from Windows NT to recent versions, focusing on changes made to improve scalability, security and energy efficiency. He also discussed Microsoft's Windows Academic Program which provides universities access to Windows kernel source code and curriculum materials.
Dave Probert is a kernel architect at Microsoft who has over 13 years of experience working on Windows kernels. He helped design key aspects of kernels from Windows 2000 to Windows 7 such as multi-core support and user-mode scheduling. Probert also runs the Windows Academic Program which provides Windows kernel source code and curriculum materials to universities to help teach operating systems concepts.
This document summarizes Intel's Cluster Studio XE and Parallel Studio XE development tools. It provides an overview of the tools' capabilities for developing applications that can efficiently scale from few cores to many cores. The tools include compilers, performance profiling and analysis tools, and libraries that support parallel programming models and provide optimized math functions and algorithms.
Dave Probert is a kernel architect at Microsoft who has worked on Windows kernels for over 13 years. He manages platform-independent kernel development and works on support for multi-core and heterogeneous parallel computing. Probert also co-instigated the Windows Academic Program which provides kernel source code and curriculum materials to universities to aid in operating systems education. The document discusses differences between the UNIX and Windows NT design environments and how those influenced OS design choices. It provides an overview of the Windows kernel architecture and changes made in newer versions like Windows 7 to improve scalability and support for multi-core systems.
The OpenEBS Hangout #4 was held on 22nd December 2017 at 11:00 AM (IST and PST), where a live demo of cMotion was shown. Storage policies of OpenEBS 0.5 were also explained.
The document provides an introduction to the Linux operating system, including:
- A brief history of UNIX and Linux, describing their origins in the 1960s-1990s.
- An overview of Linux distributions, kernels, features, and structure, explaining concepts like monolithic vs. microkernel designs.
- Descriptions of key Linux components like modules, eBPF, and the roles of processes, user mode, kernel mode, and context switches.
- Discussions of ongoing developments like extended BPF which allow more dynamic programmability of the Linux kernel.
Journal Seminar: Is Singularity-based Container Technology Ready for Running ... (Kento Aoyama)
1. The study evaluates the performance of running MPI applications in Singularity containers on HPC clouds and local clusters.
2. Experiments were conducted on Chameleon Cloud using Intel Xeon Haswell nodes and a local cluster using Intel Knights Landing nodes.
3. Results show that Singularity incurs less than 8% overhead for MPI point-to-point and collective communications and for HPC applications, demonstrating its potential for efficiently running MPI workloads on HPC clouds and systems.
The Supermicro X12 product line, powered by 3rd Gen Intel® Xeon® Scalable processors, contains many innovations that give organizations more performance for a variety of workloads.
Join this webinar to learn more about the outstanding performance you can get by using Supermicro X12 servers and storage systems using the latest technologies from Intel®.
Watch the webinar: https://www.brighttalk.com/webcast/17278/514618
The document provides an overview of a training session on clustering 101 and the Rocks cluster distribution. It discusses cluster types, components, pioneers in the field, interconnect technologies, sample applications and benchmarks, cluster software options, challenges of managing clusters, and the philosophy and approach of the Rocks distribution for easily building clusters.
The document describes a system called Libra, formerly known as PROSE, which aims to run applications in isolated partitions for improved control, reliability and performance. It allows creating specialized kernels as easily as applications. Resources are shared between partitions using the 9P2000 protocol over a unified file namespace. This allows finer-grained control over system services while maintaining simplicity and reliability.
Vijayendra Shamanna from SanDisk presented on optimizing the Ceph distributed storage system for all-flash architectures. Some key points:
1) Ceph is an open-source distributed storage system that provides file, block, and object storage interfaces. It operates by spreading data across multiple commodity servers and disks for high performance and reliability.
2) SanDisk has optimized various aspects of Ceph's software architecture and components like the messenger layer, OSD request processing, and filestore to improve performance on all-flash systems.
3) Testing showed the optimized Ceph configuration delivering over 200,000 IOPS and low latency with random 8K reads on an all-flash setup.
The document discusses PROSE (Partitioned Reliable Operating System Environment), an approach that runs applications in specialized kernel partitions for finer control over system resources and improved reliability. It aims to simplify development of specialized kernels and enable resource sharing across partitions. The approach is evaluated using IBM's research hypervisor rHype, which shows PROSE can reduce noise and provide more deterministic performance than Linux. Future work focuses on running larger commercial workloads and further performance/noise experiments.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
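One of the strategies above, uniform random sampling of a constrained configuration space, can be sketched as follows; the feature names and the single constraint are invented for illustration, not taken from a real feature model:

```python
import random

# Hypothetical feature model: 6 optional features with one constraint
# (COMPRESSION requires CACHE).
features = ["CACHE", "COMPRESSION", "DEBUG", "LTO", "NET", "SOUND"]

def valid(cfg):
    # The only cross-tree constraint in this toy model.
    return not (cfg["COMPRESSION"] and not cfg["CACHE"])

def sample_valid(rng, n):
    # Rejection sampling: draw random configurations, keep valid ones.
    out = []
    while len(out) < n:
        cfg = {f: rng.random() < 0.5 for f in features}
        if valid(cfg):
            out.append(cfg)
    return out

rng = random.Random(42)
configs = sample_valid(rng, 100)
```

Rejection sampling only works when valid configurations are not too rare; heavily constrained spaces need dedicated samplers (e.g., SAT-based), which is part of what makes exploring deep variability hard.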
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Invited talk at the Journées Nationales du GDR GPL 2024
Producing a variant of code is highly challenging, particularly for individuals unfamiliar with programming. This demonstration introduces a novel use of generative AI to aid end-users in customizing code. We first describe how generative AI can be used to customize code through prompts and instructions, and further demonstrate its potential in building end-user tools for configuring code. We showcase how to transform an undocumented, technical, low-level TikZ drawing into a user-friendly, Web-based customization tool, written in Python, HTML, CSS, and JavaScript, and itself configurable. We discuss how generative AI can support this transformation process and traditional variability engineering tasks, such as the identification and implementation of features, the synthesis of a template code generator, and the development of end-user configurators. We believe this is a first step towards democratizing variability programming, opening a path for end-users to adapt code to their needs.
Similar to Transfer Learning Across Variants and Versions: The Case of Linux Kernel Size
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...Chester Chen
Machine Learning at the Limit
John Canny, UC Berkeley
How fast can machine learning and graph algorithms be? In "roofline" design, every kernel is driven toward the limits imposed by CPU, memory, network etc. This can lead to dramatic improvements: BIDMach is a toolkit for machine learning that uses rooflined design and GPUs to achieve two- to three-orders of magnitude improvements over other toolkits on single machines. These speedups are larger than have been reported for *cluster* systems (e.g. Spark/MLLib, Powergraph) running on hundreds of nodes, and BIDMach with a GPU outperforms these systems for most common machine learning tasks. For algorithms (e.g. graph algorithms) which do require cluster computing, we have developed a rooflined network primitive called "Kylix". We can show that Kylix approaches the rooline limits for sparse Allreduce, and empirically holds the record for distributed Pagerank. Beyond rooflining, we believe there are great opportunities from deep algorithm/hardware codesign. Gibbs Sampling (GS) is a very general tool for inference, but is typically much slower than alternatives. SAME (State Augmentation for Marginal Estimation) is a variation of GS which was developed for marginal parameter estimation. We show that it has high parallelism, and a fast GPU implementation. Using SAME, we developed a GS implementation of Latent Dirichlet Allocation whose running time is 100x faster than other samplers, and within 3x of the fastest symbolic methods. We are extending this approach to general graphical models, an area where there is currently a void of (practically) fast tools. It seems at least plausible that a general-purpose solution based on these techniques can closely approach the performance of custom algorithms.
Bio
John Canny is a professor in computer science at UC Berkeley. He is an ACM dissertation award winner and a Packard Fellow. He is currently a Data Science Senior Fellow in Berkeley's new Institute for Data Science and holds a INRIA (France) International Chair. Since 2002, he has been developing and deploying large-scale behavioral modeling systems. He designed and protyped production systems for Overstock.com, Yahoo, Ebay, Quantcast and Microsoft. He currently works on several applications of data mining for human learning (MOOCs and early language learning), health and well-being, and applications in the sciences.
This document provides an overview of IBM's portfolio of hardware, software, and services for an enterprise-grade Linux operating environment on IBM LinuxONE systems. It highlights key capabilities such as freedom and agility, standards-based design, speed of innovation, developer productivity, community collaboration, quality software, dynamic resource allocation, non-disruptive scalability, continuous business availability, and operational efficiency. The document also discusses IBM LinuxONE solutions for cloud, DevOps, analytics, elastic pricing, and announcements. It provides examples of workloads such as databases that benefit from the LinuxONE architecture.
The document discusses plans to establish an institutional high performance computing (HPC) facility at North-West University. It outlines the technical goals of building a Beowulf cluster to link existing departmental clusters and integrate with national and international computational grids. It also discusses management principles for the new HPC facility to ensure sustainability, efficiency, reliability, availability and high performance.
Interface for Performance Environment Autoconfiguration Framework, by Liang Men
The document presents PEAK (Performance Environment Autoconfiguration frameworK), a framework that helps developers and users find optimal configurations for scientific applications on HPC systems. PEAK's interface automates benchmarking of applications using different compiler, library, and environment settings. It develops a performance database to provide reference for optimal configurations to achieve speedups. The framework was tested on two HPC systems, comparing performance of applications compiled with different compilers and linked to numerical libraries including LibSci, ACML, Intel MKL, FFTW, and CRAFFT. It found default settings were not always optimal, and the framework can help select better performing configurations for applications.
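The core loop of such an autoconfiguration framework can be sketched in a few lines: sweep the Cartesian product of build settings, time each one, and record the fastest as the reference configuration. Everything below is hypothetical (the setting names echo those in the abstract, and the timings are fabricated stand-ins for real benchmark runs), not PEAK's actual implementation.

```python
# Hedged sketch of a PEAK-style configuration sweep. The compiler and
# library names come from the abstract; benchmark() is a stand-in for a
# real timed run and returns fabricated numbers.

from itertools import product

compilers = ["gcc", "icc", "cray"]
libraries = ["MKL", "FFTW", "LibSci"]

def benchmark(compiler, library):
    """Pretend to build and run the application; return runtime in seconds."""
    fake_times = {("icc", "MKL"): 9.1, ("gcc", "FFTW"): 11.4}
    return fake_times.get((compiler, library), 13.0)  # 13.0 = "default" time

# Sweep every (compiler, library) pair and keep the fastest.
results = {cfg: benchmark(*cfg) for cfg in product(compilers, libraries)}
best = min(results, key=results.get)
assert best == ("icc", "MKL")  # the default settings were not optimal here
```

The resulting `results` dictionary plays the role of the performance database: later users can look up the best-known configuration instead of re-benchmarking.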
Hands-On Workshop on Performance Optimization for Intel Xeon Phi Processor Family x200 (formerly Knights Landing) from Colfax International. More information at http://colfaxresearch.com/knl-webinar/
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage, by MayaData Inc
Webinar Session - https://youtu.be/_5MfGMf8PG4
In this webinar, we share how the Container Attached Storage pattern makes performance tuning more tractable, by giving each workload its own storage system, thereby decreasing the variables needed to understand and tune performance.
We then introduce MayaStor, a breakthrough in the use of containers and Kubernetes as a data plane. MayaStor is the first containerized data engine available that delivers near the theoretical maximum performance of underlying systems. MayaStor performance scales with the underlying hardware and has been shown, for example, to deliver in excess of 10 million IOPS in a particular environment.
From Rack scale computers to Warehouse scale computers, by Ryousei Takano
This document discusses the transition from rack-scale computers to warehouse-scale computers through the disaggregation of technologies. It provides examples of rack-scale architectures like Open Compute Project and Intel Rack Scale Architecture. For warehouse-scale computers, it examines HP's The Machine project using application-specific cores, universal memory, and photonics fabric. It also outlines UC Berkeley's FireBox project utilizing 1 terabit/sec optical fibers, many-core systems-on-chip, and non-volatile memory modules connected via high-radix photonic switches.
Evolution of the Windows Kernel Architecture, by Dave Probert
Dave Probert is a kernel architect at Microsoft who has over 13 years of experience working on Windows kernels. He helped design key aspects of Windows such as multi-core support and user-mode scheduling. Probert provided an overview of how the Windows kernel architecture has evolved over time from Windows NT to recent versions, focusing on changes made to improve scalability, security and energy efficiency. He also discussed Microsoft's Windows Academic Program which provides universities access to Windows kernel source code and curriculum materials.
Dave Probert is a kernel architect at Microsoft who has over 13 years of experience working on Windows kernels. He helped design key aspects of kernels from Windows 2000 to Windows 7 such as multi-core support and user-mode scheduling. Probert also runs the Windows Academic Program which provides Windows kernel source code and curriculum materials to universities to help teach operating systems concepts.
This document summarizes Intel's Cluster Studio XE and Parallel Studio XE development tools. It provides an overview of the tools' capabilities for developing applications that can efficiently scale from few cores to many cores. The tools include compilers, performance profiling and analysis tools, and libraries that support parallel programming models and provide optimized math functions and algorithms.
Dave Probert is a kernel architect at Microsoft who has worked on Windows kernels for over 13 years. He manages platform-independent kernel development and works on support for multi-core and heterogeneous parallel computing. Probert also co-instigated the Windows Academic Program which provides kernel source code and curriculum materials to universities to aid in operating systems education. The document discusses differences between the UNIX and Windows NT design environments and how those influenced OS design choices. It provides an overview of the Windows kernel architecture and changes made in newer versions like Windows 7 to improve scalability and support for multi-core systems.
The OpenEBS Hangout #4 was held on 22nd December 2017 at 11:00 AM (IST and PST), where a live demo of cMotion was shown. Storage policies of OpenEBS 0.5 were also explained.
The document provides an introduction to the Linux operating system, including:
- A brief history of UNIX and Linux, describing their origins in the 1960s-1990s.
- An overview of Linux distributions, kernels, features, and structure, explaining concepts like monolithic vs. microkernel designs.
- Descriptions of key Linux components like modules, eBPF, and the roles of processes, user mode, kernel mode, and context switches.
- Discussions of ongoing developments like extended BPF which allow more dynamic programmability of the Linux kernel.
Journal Seminar: Is Singularity-based Container Technology Ready for Running ..., by Kento Aoyama
1. The study evaluates the performance of running MPI applications in Singularity containers on HPC clouds and local clusters.
2. Experiments were conducted on Chameleon Cloud using Intel Xeon Haswell nodes and a local cluster using Intel Knights Landing nodes.
3. Results show that Singularity incurs less than 8% overhead for MPI point-to-point and collective communications and for HPC applications, demonstrating its potential for efficiently running MPI workloads on HPC clouds and systems.
The Supermicro X12 product line, powered by 3rd Gen Intel® Xeon® Scalable processors, contains many innovations that give organizations more performance for a variety of workloads.
Join this webinar to learn more about the outstanding performance you can get by using Supermicro X12 servers and storage systems using the latest technologies from Intel®.
Watch the webinar: https://www.brighttalk.com/webcast/17278/514618
The document provides an overview of a training session on clustering 101 and the Rocks cluster distribution. It discusses cluster types, components, pioneers in the field, interconnect technologies, sample applications and benchmarks, cluster software options, challenges of managing clusters, and the philosophy and approach of the Rocks distribution for easily building clusters.
The document describes a system called Libra, formerly known as PROSE, which aims to run applications in isolated partitions for improved control, reliability and performance. It allows creating specialized kernels as easily as applications. Resources are shared between partitions using the 9P2000 protocol over a unified file namespace. This allows finer-grained control over system services while maintaining simplicity and reliability.
Vijayendra Shamanna from SanDisk presented on optimizing the Ceph distributed storage system for all-flash architectures. Some key points:
1) Ceph is an open-source distributed storage system that provides file, block, and object storage interfaces. It operates by spreading data across multiple commodity servers and disks for high performance and reliability.
2) SanDisk has optimized various aspects of Ceph's software architecture and components like the messenger layer, OSD request processing, and filestore to improve performance on all-flash systems.
3) Testing showed the optimized Ceph configuration delivering over 200,000 IOPS and low latency with random 8K reads on an all-flash setup.
The document discusses PROSE (Partitioned Reliable Operating System Environment), an approach that runs applications in specialized kernel partitions for finer control over system resources and improved reliability. It aims to simplify development of specialized kernels and enable resource sharing across partitions. The approach is evaluated using IBM's research hypervisor rHype, which shows PROSE can reduce noise and provide more deterministic performance than Linux. Future work focuses on running larger commercial workloads and further performance/noise experiments.
Similar to Transfer Learning Across Variants and Versions: The Case of Linux Kernel Size (20)
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
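One of the dimensionality-reduction techniques listed above, transfer learning across versions, can be illustrated with a tiny "model shifting" sketch: reuse a prediction model trained on one software version and correct it with a small linear shift learned from a handful of measurements on the new version. All data below is synthetic; this is an illustration of the general idea, not the exact algorithm of any cited work.

```python
# Hedged sketch of linear model shifting for transfer learning.
# old_preds: predictions of a model trained on version A;
# new_measures: a few costly measurements on version B.

def fit_linear_shift(old_preds, new_measures):
    """Least-squares fit of new = a * old + b from a few paired samples."""
    n = len(old_preds)
    mx = sum(old_preds) / n
    my = sum(new_measures) / n
    a = sum((x - mx) * (y - my) for x, y in zip(old_preds, new_measures)) \
        / sum((x - mx) ** 2 for x in old_preds)
    b = my - a * mx
    return a, b

# Synthetic scenario: the new version inflates sizes by 10% plus a constant.
old = [10.0, 20.0, 30.0, 40.0]
new = [1.1 * x + 2.0 for x in old]
a, b = fit_linear_shift(old, new)
assert abs(a - 1.1) < 1e-9 and abs(b - 2.0) < 1e-9
```

Only the shift (two parameters here) must be learned per version, so a few extra measurements can keep an otherwise expensive model usable as the system evolves.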
Invited talk, Journées Nationales du GDR GPL 2024
Producing a variant of code is highly challenging, particularly for individuals unfamiliar with programming. This demonstration introduces a novel use of generative AI to aid end-users in customizing code. We first describe how generative AI can be used to customize code through prompts and instructions, and further demonstrate its potential in building end-user tools for configuring code. We showcase how to transform an undocumented, technical, low-level TikZ into a user-friendly, configurable, Web-based customization tool written in Python, HTML, CSS, and JavaScript and itself configurable. We discuss how generative AI can support this transformation process and traditional variability engineering tasks, such as identification and implementation of features, synthesis of a template code generator, and development of end-user configurators. We believe it is a first step towards democratizing variability programming, opening a path for end-users to adapt code to their needs.
Slides presented at MODEVAR 2024 co-located with VaMoS 2024: https://modevar.github.io/program/ Thanks to Jessie Galasso and Chico Sundermann for the invitation Abstract: "Variability models (e.g., feature models) are widely considered in software engineering research and practice, in order to develop software product lines, configurable systems, generators, or adaptive systems. Numerous success stories of variability models have been reported in different domains (e.g., automotive, aerospace, avionics) and the applicability broadens. However, the use of variability models is not yet universally adopted… Why? Some examples: variability models are not continuously extracted from projects and artefacts; each time a variability model is used, its expressiveness and language should be specialized; learning-based models are decoupled from variability models; modeling software variability should cover different layers and their interactions, etc. The list is arguably opinionated, incomplete, and based on my own practical experience, but also on observations of existing works. I will give 24 reasons to occupy your day as a researcher or practitioner in the field of variability modeling."
Slides: https://github.com/acherm/24RWVMANYU-VaMoS-MODEVAR/blob/main/vamos2024.md (Markdown content) PDF: https://github.com/acherm/24RWVMANYU-VaMoS-MODEVAR/blob/main/VaMoS2024-MODEVAR.pdf (slides in PDF format)
Programming variability is central to the design and implementation of software systems that can adapt to a variety of contexts and requirements, providing increased flexibility and customization. Managing the complexity that arises from having multiple features, variations, and possible configurations is known to be highly challenging for software developers. In this paper, we explore how large language model (LLM)-based assistants can support the programming of variability. We report on new approaches made possible with LLM-based assistants, like: features and variations can be implemented as prompts; augmentation of variability out of LLM-based domain knowledge; seamless implementation of variability in different kinds of artefacts, programming languages, and frameworks, at different binding times (compile-time or run-time). We are sharing our data (prompts, sessions, generated code, etc.) to support the assessment of the effectiveness and robustness of LLMs for variability-related tasks.
https://inria.hal.science/hal-04153310/
The migration and reengineering of existing variants into a software product line (SPL) is an error-prone and time-consuming activity. Many extractive approaches have been proposed, spanning different activities from feature identification and naming to the synthesis of reusable artefacts. In this paper, we explore how large language model (LLM)-based assistants can support domain analysts and developers. We revisit four illustrative cases from the literature where the challenge is to migrate variants written in different formalisms (UML class diagrams, Java, GraphML, statecharts). We systematically report on our experience with ChatGPT-4, describing our strategy to prompt LLMs and documenting positive aspects but also failures. We compare the use of LLMs with a state-of-the-art approach, BUT4Reuse. While LLMs offer potential in assisting domain analysts and developers in transitioning software variants into SPLs, their intrinsic stochastic nature and restricted ability to manage large variants or complex structures necessitate a semi-automatic approach, complete with careful review, to counteract inaccuracies.
The document discusses deep software variability and the challenges of predicting software performance across different configuration layers. It provides an example of measuring the video encoder x264 across different hardware platforms, operating systems, compiler options, and input video files. The key points are that software performance prediction models may not generalize if they do not account for variability across these different layers. The document calls for more empirical studies of deep software variability in real-world systems to better understand how it manifests and which layers have the most influence. This knowledge could help reduce costs when exploring configuration spaces and transferring performance models between environments.
talk @ DiverSE coffee (informal, work in progress)
abstract: Cheating in chess is a serious issue, mainly due to the use of chess engines during play.
Chess engines like Stockfish or AlphaZero-like variants can give the cheater a decisive advantage, since almost perfect moves can be played.
Already in 2006, Topalov accused Kramnik of cheating during their world championship match.
The Sébastien Feller (https://en.wikipedia.org/wiki/S%C3%A9bastien_Feller) case at the Chess Olympiad in 2010 was a shock.
With the rise of online games and $$$, cheating is even more problematic.
A few weeks ago, Magnus Carlsen (https://en.wikipedia.org/wiki/Magnus_Carlsen) accused Hans Niemann (HN) of cheating over the board and refused to ever play him again.
You may have heard headlines about an "anal bead" supposedly helping HN.
In fact, I'm not specifically aiming to talk about chess (and cheating).
I rather want to discuss how science has been (and will be) at the heart of the anti-cheating chess problem.
I will first argue that many people (chess hobbyists/experts, data nerds, etc.), most being non-scientists, have actually done science for trying to demonstrate or refute the cheating case.
The basic idea is to confront moves played by humans (players) with those of computer engines.
With the sharing of data (analysis of chess games like those played by HN), scripts, and methods, numerous results and conclusions have emerged, getting popularized with social media (twitch, Youtube, twitter, etc.)
On the one hand, I've been quite excited to see all this energy for trying to advance our understanding and propose interesting ideas/analysis.
On the other hand, there have been some failures in the quality of some analysis or the choice of closed systems to compute unclear metrics.
In between, there have been a report by chess.com and the analysis of the computer scientist Ken Regan (https://cse.buffalo.edu/~regan/), the world-renowned specialist.
I still think the problem is open (e.g., Regan's method is too conservative and misses many cases; chess.com's methodology, though unclear and opaque at some points, is certainly effective for online cheating, but not for over-the-board detection).
I will present a variability model of the space of experiments/methods that can be considered to address the problem.
This model can be used to pilot the collaborative effort, to reproduce, replicate or reject some experiments, and to gain confidence or robustness in some conclusions.
Another last point I want to discuss is that most probably the anti-cheating chess problem cannot be resolved solely with retrospective computational analysis.
It's just too uncertain, especially if cheaters are "smart".
(Cyber-)security experts, psychologists, chess players, and of course computer science nerds/professionals can contribute to address this multidisciplinary problem.
Keynote VariVolution/VM4ModernTech@SPLC 2022
At compile-time or at runtime, varying software is a powerful means to achieve optimal functional and performance goals. An observation is that considering only the software layer might be naive when tuning the performance of the system or testing that the functionality behaves correctly. In fact, many layers (hardware, operating system, input data, build process, etc.), themselves subject to variability, can alter the performance of software configurations. For instance, configuration options may have very different effects on execution time or energy consumption when used with different input data, depending on how the software has been compiled and the hardware on which it is executed.
In this talk, I will introduce the concept of “deep software variability” which refers to the interactions of all external layers modifying the behavior or non-functional properties of a software system. I will show how compile-time options, inputs, and software evolution (versions), some dimensions of deep variability, can question the generalization of the variability knowledge of popular configurable systems like Linux, gcc, xz, or x264.
I will then argue that machine learning (ML) is particularly suited to manage very large variants space. The key idea of ML is to build a model based on sample data -- here observations about software variants in variable settings -- in order to make predictions or decisions. I will review state-of-the-art solutions developed in software engineering and software product line engineering while connecting with works in ML (e.g., transfer learning, dimensionality reduction, adversarial learning). Overall, the key challenge is to leverage the right ML pipeline in order to harness all variability layers (and not only the software layer), leading to more efficient systems and variability knowledge that truly generalizes to any usage and context.
From this perspective, we are starting an initiative to collect data, software, reusable artefacts, and body of knowledge related to (deep) software variability: https://deep.variability.io
Finally, I will open a broader discussion on how machine learning and deep software variability relate to the reproducibility, replicability, and robustness of scientific, software-based studies (e.g., in neuroimaging and climate modelling).
Software has eaten the world and will continue to impact society, spanning numerous application domains, including the way we're doing science and acquiring knowledge.
Software variability is key since developers, engineers, entrepreneurs, and scientists can explore different hypotheses, methods, and custom products to hopefully fit a diversity of needs and usages.
In this talk, I will first illustrate the importance of software variability using different concrete examples.
I will then show that, though highly desirable, software variability also introduces an enormous complexity due to the combinatorial explosion of possible variants.
For example, 10^6000 variants of the Linux kernel may be built, much more than the estimated number of atoms in the universe (10^80).
Hence, the big picture and challenge for the audience: any organization capable of mastering software variability will have a competitive advantage.
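The order of magnitude above is easy to sanity-check: even a naive bound that treats the kernel's ~14,500 options as independent booleans already yields a number of variants with thousands of decimal digits (tristate options and real option counts push the estimate higher still, toward the 10^6000 figure).

```python
# Back-of-the-envelope check of the combinatorial explosion.
# 14,500 is the approximate Linux option count; treating every option as an
# independent boolean gives a simple (under-)estimate of the variant space.

import math

options = 14_500
digits = math.floor(options * math.log10(2)) + 1  # decimal digits of 2^14500
assert digits > 4000  # vastly more than the ~10^80 atoms in the universe
```

So even this deliberately conservative bound exceeds 10^80 by thousands of orders of magnitude, which is why exhaustive exploration is hopeless and sampling plus learning is needed.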
Summer School EIT Digital, Rennes, 5th July 2022
Biology, medicine, physics, astrophysics, chemistry: all these scientific domains need to process large amounts of data with more and more complex software systems. For achieving reproducible science, there are several challenges ahead involving multidisciplinary collaboration and socio-technical innovation with software at the center of the problem. Despite the availability of data and code, several studies report that the same data analyzed with different software can lead to different results. I see this problem as a manifestation of deep software variability: many factors (operating system, third-party libraries, versions, workloads, compile-time options and flags, etc.), themselves subject to variability, can alter the results, up to the point that they can dramatically change the conclusions of some scientific studies. In this keynote, I argue that deep software variability is a threat and also an opportunity for reproducible science. I first outline some works about (deep) software variability, reporting on preliminary evidence of complex interactions between variability layers. I then link the ongoing works on variability modelling and deep software variability in the quest for reproducible science.
Most modern software systems are subject to variation or come in many variants. Web browsers like Firefox or Chrome are available on different operating systems, in different languages, while users can configure 2000+ preferences or install numerous third-party extensions (or plugins). Web servers like Apache, operating systems like the Linux kernel, or a video encoder like x264 are other examples of software systems that are highly configurable at compile-time or at run-time for delivering the expected functionality and meeting the various desires of users. Variability ("the ability of a software system or artifact to be efficiently extended, changed, customized or configured for use in a particular context") is therefore a crucial property of software systems. Organizations capable of mastering variability can deliver high-quality variants (or products) in a short amount of time and thus attract numerous customers, new use-cases or usage contexts. A hard problem for end-users or software developers is to master the combinatorial explosion induced by variability: Hundreds of configuration options can be combined, each potentially with distinct functionality and effects on execution time, memory footprint, quality of the result, etc. The first part of this course will introduce variability-intensive systems, their applications and challenges, in various software contexts. We will use intuitive examples (like a generator of LaTeX paper variants) and real-world systems (like the Linux kernel). A second objective of this course is to show the relevance of Artificial Intelligence (AI) techniques for exploring and taming such enormous variability spaces. In particular, we will introduce how (1) satisfiability and constraint programming solvers can be used to properly model and reason about variability; (2) how machine learning can be used to discover constraints and predict the variability behavior of configurable systems or software product lines.
http://ejcp2019.icube.unistra.fr/
Presented at SIGCSE 2018: https://sigcse2018.sigcse.org/
Software Product Line (SPL) engineering has emerged to provide the means to efficiently model, produce, and maintain multiple similar software variants, exploiting their common properties, and managing their variabilities (differences). With over two decades of existence, the community of SPL researchers and practitioners is thriving as can be attested by the extensive research output and the numerous successful industrial projects. Education has a key role to support the next generation of practitioners to build highly complex, variability-intensive systems. Yet, it is unclear how the concepts of variability and SPLs are taught, what are the possible missing gaps and difficulties faced, what are the benefits, or what is the material available. Also, it remains unclear whether scholars teach what is actually needed by industry. In this article we report on three initiatives we have conducted with scholars, educators, industry practitioners, and students to further understand the connection between SPLs and education, i.e., an online survey on teaching SPLs we performed with 35 scholars, another survey on learning SPLs we conducted with 25 students, as well as two workshops held at the International Software Product Line Conference in 2014 and 2015 with both researchers and industry practitioners participating. We build upon the two surveys and the workshops to derive recommendations for educators to continue improving the state of practice of teaching SPLs, aimed at both individual educators as well as the wider community.
This document proposes exploiting the enumeration of all configurations of a feature model as a new perspective for automated reasoning with distributed computing. It discusses (1) enumerating configurations in parallel to improve scalability, (2) pre-compiling configurations offline to speed up costly operations like counting, core, and dead features, and (3) how the approach is not always best and depends on the feature model, operation, and time requirements. Preliminary evaluations show the approach is more efficient for large models but not always, and future work is needed to determine when enumeration is best versus traditional solvers.
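The enumeration idea is easy to illustrate on a toy feature model: once all valid configurations are materialized, counting, core features (present in every product) and dead features (present in none) reduce to simple set queries over the enumeration. The feature model and constraints below are invented for the example; real models require the parallel/pre-compiled machinery the abstract describes.

```python
# Illustrative sketch of enumeration-based feature-model analysis on a
# hypothetical 4-feature model (brute force; only viable for tiny models).

from itertools import product

features = ["base", "gui", "cli", "themes"]

def valid(cfg):
    # Toy constraints: base is mandatory, gui xor cli, themes requires gui.
    return (cfg["base"]
            and (cfg["gui"] != cfg["cli"])
            and (not cfg["themes"] or cfg["gui"]))

# Enumerate the 2^4 candidate configurations and keep the valid ones.
configs = [dict(zip(features, bits))
           for bits in product([False, True], repeat=len(features))
           if valid(dict(zip(features, bits)))]

core = {f for f in features if all(c[f] for c in configs)}   # in every product
dead = {f for f in features if not any(c[f] for c in configs)}  # in no product
assert len(configs) == 3 and core == {"base"} and dead == set()
```

The trade-off noted in the abstract shows even here: the enumeration costs exponential work up front, but each subsequent analysis over `configs` is a cheap linear scan.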
Product Derivation is a key activity in Software Product Line Engineering. During this process, derivation operators modify or create core assets (e.g., model elements, source code instructions, components) by adding, removing or substituting them according to a given configuration. The result is a derived product that generally needs to conform to a programming or modeling language. Some operators lead to invalid products when applied to certain assets, some others do not; knowing this in advance can help to better use them. However, this is challenging, especially if we consider assets expressed in extensive and complex languages such as Java. In this paper, we empirically answer the following question: which product line operators, applied to which program elements, can synthesize variants of programs that are incorrect, correct or perhaps even conforming to test suites? We implement source code transformations, based on the derivation operators of the Common Variability Language. We automatically synthesize more than 370,000 program variants from a set of 8 real large Java projects (up to 85,000 lines of code), obtaining an extensive panorama of the sanity of the operations.
Paper was presented at SPLC'15
Many real-world product lines are only represented as non-hierarchical collections of distinct products, described by their configuration values. As the manual preparation of feature models is a tedious and labour-intensive activity, some techniques have been proposed to automatically generate boolean feature models from product descriptions. However, none of these techniques is capable of synthesizing feature attributes and relations among attributes, despite the huge relevance of attributes for documenting software product lines. In this paper, we introduce for the first time an algorithmic and parametrizable approach for computing a legal and appropriate hierarchy of features, including feature groups, typed feature attributes, domain values and relations among these attributes. We have performed an empirical evaluation by using both randomized configuration matrices and real-world examples. The initial results of our evaluation show that our approach can scale up to matrices containing 2,000 attributed features, and 200,000 distinct configurations in a couple of minutes.
Paper has been presented at SPLC'15 (Nashville, USA)
Variability is omnipresent in numerous kinds of artefacts (e.g., source code, product matrices) and in different shapes (e.g., conditional compilation, differences among product descriptions). For understanding, reasoning about, maintaining or evolving variability, practitioners usually need an explicit encoding of variability (i.e., a variability model). As a result, numerous techniques have been developed to reverse engineer variability (e.g., through the mining of features and constraints from source code) or for migrating a set of products as a variability system. In this talk we will first present tool-supported techniques for synthesizing variability models from constraints or product descriptions.
Practitioners can build Boolean feature models with an interactive environment for selecting a meaningful and sound hierarchy.
Attributes can also be synthesized for encoding numerical values and constraints among them.
We will present key results we obtain through experiments with the SPLOT repository and product comparison matrices coming from Wikipedia and BestBuy.
Finally we will introduce OpenCompare.org, a recent initiative for editing, reasoning about, and mining product comparison matrices.
This talk has been done at REVE'15 workshop co-located with SPLC'15 (software product line conference): http://www.isse.jku.at/reve2015/program.html
Pandoc (http://pandoc.org/) is a universal document converter. Taking as input Markdown/Wiki-like syntax, you can obtain an .odt, .docx, .tex, or even Beamer slides (like here)!
In this presentation we introduce Pandoc.
We then zoom on some interesting technical details (parameters, template language, ability to embed LaTeX into Markdown, ways to rewrite your AST with external bindings, etc.)
We believe Pandoc is a very useful piece of software that is worth studying (including for education/research purposes).
The slides have been written by Erwan Bousse (http://people.irisa.fr/Erwan.Bousse/) in Pandoc (of course!) and presented in the context of the DiverSE coffee (a weekly meeting of the DiverSE team: http://diverse.irisa.fr).
#1 The diversity of terminology shows the large spectrum of shapes DSLs can take.
#2 As syntax and development environment matter, DSLs should allow the user to choose the right shape according to their usage or task.
#3 A metamorphic DSL vision is proposed where DSLs can adapt to the most appropriate shape, including transitioning between shapes based on usage or task.
The document discusses customization and 3D printing from a software product line perspective. The researchers observed the Thingiverse community to see how its members interact and collaborate to customize and produce 3D models. They found that while variability concepts are present, there is no constraint modeling, and configuration leads to many issues due to huge complexity: 38 parameters across 8 tabs, yielding about 10^28 possible configurations. Software product line engineering techniques, such as variability modeling and implementation, could help address the complexity and cognitive effort faced by non-software developers customizing 3D models, but may not provide clear benefits for small communities in garages. Future work includes automated techniques to better analyze large datasets and help communities manage complexity.
Feature Models (FMs) are the de-facto standard for documenting, model checking, and reasoning about the configurations of a software system. This paper introduces WebFML, a comprehensive environment for synthesizing FMs from various kinds of artefacts (e.g., propositional formulas, dependency graphs, FMs, or product comparison matrices). A key feature of WebFML is its interactive support (through ranking lists, clusters, and logical heuristics) for choosing a sound and meaningful hierarchy. WebFML opens avenues for numerous practical applications (e.g., merging multiple product lines, slicing a configuration process, reverse engineering configurable systems).
tinyurl.com/WebFMLDemo
Transfer Learning Across Variants and Versions: The Case of Linux Kernel Size
1. Transfer Learning Across Variants and Versions: The Case of Linux Kernel Size
Hugo Martin, Mathieu Acher, Juliana Alves Pereira, Luc Lesoil, Jean-Marc Jézéquel, and Djamel Eddine Khelladi
Published at IEEE Transactions on Software Engineering (TSE) in 2021
Preprint: https://hal.inria.fr/hal-03358817
6. Challenge 1: you cannot build ≈ 10^6000 configurations; sampling and learning to the rescue, but still (very) costly!
[Figure: measured kernel sizes range from 7.1 Mb to 176.8 Mb depending on the configuration; variability in space]
7. Challenge 2: Linux evolves; (heterogeneous) transfer learning!
[Figure: kernel sizes from 7.1 Mb to 176.8 Mb across versions v4.13 to v5.8, 3 years later; variability in space and variability in time]
8. Instead of learning from scratch or reusing “as is”, we propose to transfer (adapt) a prediction model. A problem overlooked in the literature is that the feature spaces differ among versions, since options are added/removed. We propose TEAMS: transfer evolution-aware model shifting.
Key results (more details in the rest of the talk):
● Rapid degradation of prediction models due to evolution (up to 32% prediction errors)
● Effective transfer learning with TEAMS (reaching the same accuracy 3 years later and outperforming the state of the art)
● Possible to learn colossal configuration spaces such as the Linux kernel's, across variants and versions
12. A challenging case
● Targeted non-functional, quantitative property: binary size
○ of interest for maintainers/users of the Linux kernel (embedded systems, cloud, etc.)
○ challenging to predict (cross-cutting options, interplay with compilers/build systems, etc.)
● Dataset: version 4.13.3, x86_64 arch, measurements of 95K+ random configurations
○ paranoiac about deep variability since 2017: Docker to control the build environment and scale
○ build: 8 minutes on average
○ diversity: from 7 Mb to 1.9 Gb
● Do existing techniques work?
○ most work in performance prediction considers a relatively low number of options (<50)
○ Linux has 9K+ options for x86_64
14. TUXML: Sampling, Measuring, Learning
Docker for a reproducible environment, with the tools/packages needed and Python procedures inside.
Easy to launch a campaign: ”python3 kernel_generator.py 10” builds/measures 10 random configurations (information sent to a database).
https://github.com/TuxML/
16. Data: versions 4.13.3 and 4.15 (x86_64)
74K+ configurations for Linux 4.15
95K+ configurations for Linux 4.13.3
(and 15K hours of computation on a computing grid)
17. A challenging case
● Linear-based algorithms: high error rate (it's not additive!)
● Polynomial regression & performance-influence models: out of memory (too many interactions; not designed for 9K+ options)
● Tree-based algorithms & neural networks: low error rate
Mean Absolute Percentage Error (MAPE): the lower the better
N: percentage of the dataset used for training, for Linux 4.13.3
Mathieu Acher, Hugo Martin, Juliana Alves Pereira, Arnaud Blouin, Jean-Marc Jézéquel, Djamel Eddine Khelladi, Luc Lesoil, and Olivier Barais. Learning Very Large Configuration Spaces: What Matters for Linux Kernel Sizes (2019). https://hal.inria.fr/hal-02314830
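The MAPE metric used to compare these learners can be sketched in a few lines of Python; the kernel sizes below are illustrative values, not measurements from the dataset:

```python
def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent: the lower the better."""
    return 100.0 * sum(abs(t - p) / t for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative kernel sizes (Mb) and model predictions (not real measurements)
sizes = [7.1, 35.0, 176.8]
preds = [7.8, 33.0, 160.0]
print(f"MAPE = {mape(sizes, preds):.1f}%")  # MAPE = 8.4%
```

Because the error is relative to each true size, a 2 Mb miss on a tiny kernel weighs far more than on a 1.9 Gb one, which suits the wide size range of the dataset.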
18. Challenge 2: Linux evolves; (heterogeneous) transfer learning!
[Figure: kernel sizes from 7.1 Mb to 176.8 Mb across versions v4.13 to v5.8, 3 years later; variability in space and variability in time]
19. Can we reuse a prediction model among versions?
No. Accuracy quickly decreases (prediction error):
○ 4.13: 6%
○ 4.15: 20%
○ 5.7: 35%
Insights about the evolution of (important) options in the TSE article!
20. Transfer learning to the rescue
“Inductive transfer refers to any algorithmic process by which structure or knowledge derived from a learning problem is used to enhance learning on a related problem.” - Jeremy West, in A theoretical foundation for inductive transfer
● 95K+ configuration measurements, 15,000 hours of computation
● Mission Impossible: Saving Private Model 4.13
● Heterogeneous transfer learning: the feature space is different!
21. Heterogeneity between versions
● Feature changes
○ Deleted features
○ New features
● Model compatible only with 4.13 features (only them, all of them)
○ Delete new features
○ Create dummy values for deleted features
■ Set all such features to 1
[Figure: a 4.15 configuration made compatible with Model 4.13]
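The alignment step above can be sketched as follows; the option names are hypothetical, and the dummy value 1 follows the slide's choice of enabling deleted features:

```python
def align_to_source(target_config, source_features):
    """Project a configuration from a newer kernel version onto the
    feature space of the source model (here: Linux 4.13):
    - options added after 4.13 are simply dropped;
    - options deleted since 4.13 get a dummy value (1, i.e. enabled)."""
    return {f: target_config.get(f, 1) for f in source_features}

# Hypothetical option names, for illustration only
source_features = ["CONFIG_A", "CONFIG_B", "CONFIG_OLD"]
cfg_4_15 = {"CONFIG_A": 0, "CONFIG_B": 1, "CONFIG_NEW": 1}  # OLD deleted, NEW added
print(align_to_source(cfg_4_15, source_features))
# {'CONFIG_A': 0, 'CONFIG_B': 1, 'CONFIG_OLD': 1}
```

After this projection, any configuration of the target version has exactly the 4.13 feature set, so the 4.13 model can be applied to it.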
22. Transfer learning to the rescue
● Heterogeneous transfer learning: the feature space is different
● TEAMS: transfer evolution-aware model shifting
○ train a source model (aka reuse of a prediction model)
○ “align” the feature space
○ learn the transfer function through model shifting
■ train (1) over the targeted version and (2) using the predicted value of the source
■ a linear model shift is not sufficient (intuitively, the relation “source_size * k = target_size” does not hold and is much more complex!)
● Bonus: TEAMS is compositional and you can combine transfers across successive versions
Insights about composing TEAMS in the TSE article!
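The shifting step can be sketched as below. Note that the paper uses a non-linear (tree-based) shifting model, as the slide stresses; for brevity this toy uses an ordinary least-squares shift of the source prediction, and all names and data are illustrative:

```python
def fit_linear_shift(source_preds, target_sizes):
    """Fit target ~= a * source + b by ordinary least squares.
    (A toy stand-in for the non-linear shifting model of TEAMS.)"""
    n = len(source_preds)
    mx = sum(source_preds) / n
    my = sum(target_sizes) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(source_preds, target_sizes))
    var = sum((x - mx) ** 2 for x in source_preds)
    a = cov / var
    b = my - a * mx
    return lambda s: a * s + b

# Toy source model and a few measured target-version configurations
source_model = lambda config: 2.0 * sum(config) + 5.0
target_configs = [[0, 1, 0], [1, 1, 0], [1, 1, 1]]
target_sizes = [10.0, 16.0, 22.0]  # illustrative target-version sizes

shift = fit_linear_shift([source_model(c) for c in target_configs], target_sizes)
teams_predict = lambda config: shift(source_model(config))
print(round(teams_predict([0, 0, 1]), 1))  # 10.0
```

The key idea survives the simplification: only a small budget of measured target configurations is needed to learn the shift, because the source model already encodes most of the option/size relationship.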
24. TEAMS: transfer evolution-aware model shifting
Budget: 1,000 configurations
● Model shifting: from 6.7% to 10.6% error rate
● Scratch: from 14.9% to 16.7% error rate
Budget: 5,000 configurations
● Model shifting: from 5.6% to 7.1% error rate
● Scratch: from 8.2% to 9.2% error rate
Budget: 10,000 configurations
● Model shifting: from 5.2% to 6.1% error rate
● Scratch: from 7.1% to 7.7% error rate
(5K configurations for training)
25. Kpredict
Python module for Python 3.8+ (https://github.com/HugoJPMartin/kpredict)
Works for many kernel versions and any x86_64 configuration
Error: ≃ 6.3%
97% of the predictions are below 20% error
H. Martin, M. Acher, J. A. Pereira, L. Lesoil, J.-M. Jézéquel and D. E. Khelladi, “Transfer learning across variants and versions: The case of Linux kernel size”, IEEE Transactions on Software Engineering (TSE), 2021
26. Conclusion
Learning across variants and versions at the scale of Linux
Direct perspectives for Linux:
● Considering more recent versions (negative transfer?)
● Deep variability: interplay with compilers (gcc/clang versions) or architectures (e.g., x86_32)
● Beyond binary size, targeting other quantitative properties (performance, energy, security, build time, etc.)
Beyond Linux, mobile/web apps, web browsers, compilers, database systems, cloud services, image processing, distributed streaming platforms, and data transfer tools are also:
● highly configurable (variability in space)
● continuously evolving, with the addition or removal of options and maintenance of the code base (variability in time)
Does evolution degrade performance models?
Is (heterogeneous) transfer learning (e.g., TEAMS) effective?
27. Published at IEEE Transactions on Software Engineering (TSE) in 2021
Preprint: https://hal.inria.fr/hal-03358817
29. Transfer learning
“Inductive transfer refers to any algorithmic process by which structure or knowledge derived from a learning problem is used to enhance learning on a related problem.” - Jeremy West, in A theoretical foundation for inductive transfer
● 100,000 configuration measurements, 15,000 hours of computation
● Mission Impossible: Saving Private Model 4.13
○ Budget: 5,000 configuration measurements (one night's worth of ISTIC computing power)
49. Incremental Model Shifting
Simple Model Shifting (Source + Shifting Model = Full Model):
Model 4.13 + Shifting Model 4.15 = Model 4.15
Model 4.13 + Shifting Model 4.20 = Model 4.20
Model 4.13 + Shifting Model 5.0 = Model 5.0
Model 4.13 + Shifting Model 5.4 = Model 5.4
Model 4.13 + Shifting Model 5.7 = Model 5.7
Model 4.13 + Shifting Model 5.8 = Model 5.8
Incremental Model Shifting (each shifted model becomes the source for the next version):
Model 4.13 + Shifting Model 4.15 = Model 4.15
Model 4.15 + Shifting Model 4.20 = Model 4.20
Model 4.20 + Shifting Model 5.0 = Model 5.0
Model 5.0 + Shifting Model 5.4 = Model 5.4
Model 5.4 + Shifting Model 5.7 = Model 5.7
Model 5.7 + Shifting Model 5.8 = Model 5.8
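Incremental shifting can be sketched as function composition; here models are plain callables on the predicted value, a simplification of the paper's setting, where shifting models also see the configuration, and all shifts are hypothetical toys:

```python
def compose_shifts(source_model, shifting_models):
    """Chain shifting models so each shifted model becomes the source
    for the next version (e.g. 4.13 -> 4.15 -> 4.20 -> ... -> 5.8)."""
    def predict(config):
        value = source_model(config)
        for shift in shifting_models:
            value = shift(value)
        return value
    return predict

# Toy example: a source model plus two successive (hypothetical) shifts
source = lambda config: float(sum(config))
shifts = [lambda v: 2.0 * v, lambda v: v + 1.0]  # 4.13->4.15, 4.15->4.20
model_4_20 = compose_shifts(source, shifts)
print(model_4_20([1, 1, 0]))  # 5.0
```

The chain structure is also why errors can accumulate: each shifting model inherits whatever error the previous one made, which matches the sensitivity reported in the results below.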
54. Results
Source Model 4.13 trained with a budget of 85,000 configurations:
Budget: 1,000 configurations
● Model shifting: from 6.7% to 10.6% error rate
● Scratch: from 14.9% to 16.7% error rate
● Incremental shifting: from 6.7% to 13.3%
Budget: 5,000 configurations
● Model shifting: from 5.6% to 7.1% error rate
● Scratch: from 8.2% to 9.2% error rate
● Incremental shifting: from 5.6% to 7.5%
Budget: 10,000 configurations
● Model shifting: from 5.2% to 6.1% error rate
● Scratch: from 7.1% to 7.7% error rate
● Incremental shifting: from 5.2% to 6.5%
Source Model 4.13 trained with a budget of 20,000 configurations:
Budget: 1,000 configurations
● Model shifting: from 8.5% to 11.6% error rate
● Scratch: from 14.9% to 16.7% error rate
● Incremental shifting: from 8.5% to 13.8%
Budget: 5,000 configurations
● Model shifting: from 6.7% to 7.9% error rate
● Scratch: from 8.2% to 9.2% error rate
● Incremental shifting: from 6.7% to 7.9%
Budget: 10,000 configurations
● Model shifting: from 6.2% to 6.7% error rate
● Scratch: from 7.1% to 7.7% error rate
● Incremental shifting: from 6.1% to 6.7%
62. Summary
● Model 4.13 is saved
○ An old model can be reused on new versions at a lower cost
○ Better than learning from scratch for years
● Incremental Shifting
○ More sensitive to the errors of previous models
○ Makes better use of a larger transfer budget