Feature Subset Selection for Learning Huge Configuration Spaces: The Case of Linux Kernel Size
Mathieu Acher, Hugo Martin, Juliana Alves Pereira, Luc Lesoil, Arnaud Blouin, Jean-Marc Jézéquel, Djamel Eddine Khelladi, Olivier Barais
Preprint: https://hal.inria.fr/hal-03720273
Linux 5.2.8, arm: 15,000+ options
(figure: breakdown of option types, in % of all options)
≈10^6000 variants (without constraints), i.e., a 1 followed by 6,000 zeros
Linux Kernel: ≈10^6000 variants
≈10^80 is the estimated number of atoms in the universe
≈10^40 is the estimated number of possible chess positions
Dimensionality reduction with feature selection
Huge configuration space: ≈10^6000 configurations
Large option/feature* set: 9K+ options for x86_64
Hypothesis: only a subset of options matter when predicting properties of variants.
Very few studies at this scale.
p options → p' options with p' << p, over n configurations
*options (~Linux features) are encoded as features (~predictive variables in learning problems)
Hypothesis: only a subset of options matter when predicting properties of variants. Key results:
● Some state-of-the-art solutions do not scale, owing to "too many feature interactions" (think of the combinatorial explosion with thousands of features!)
● Only ~300 features* (instead of 9K+) are sufficient for efficient prediction, and even outperform the accuracy of learning over all features/options
● Training time can be decreased
● The identification of influential options is consistent with, and can even improve, expert knowledge about Linux kernel configuration
*options (~Linux features) are encoded as features (~predictive variables in learning problems)
Configurable software system → Configurations → Variants → Quantitative property (e.g., related to performance, security, energy consumption)
For the Linux kernel: .config (compile-time/Kconfig) → kernel variants (binaries) → binary size
Measured examples: 16.1MB, 77.2MB, 176.8MB... and, for a configuration not yet built: ?
Challenge: you cannot build ≈10^6000 configurations; sampling and learning to the rescue, but…
Is it accurate? Is it effective with p' features and feature selection?
How many features*? Which options matter?
(figure: example sizes 7.1MB and 176.8MB, plus an unknown ?)
p' options with p' << p
*options (~Linux features) are encoded as features (~predictive variables in learning problems)
A challenging case
● Targeted non-functional, quantitative property: binary size
○ of interest for maintainers/users of the Linux kernel (embedded systems, cloud, etc.)
○ challenging to predict (cross-cutting options, interplay with compilers/build systems, etc.)
● Dataset: version 4.13.3, x86_64 arch, measurements of 95K+ random configurations
○ paranoid about deep variability since 2017: Docker to control the build environment and scale
○ build: 8 minutes on average
○ diversity: from 7MB to 1.9GB
TUXML: Sampling, Measuring, Learning
Most existing work considers a relatively low number of options (<50); Linux has 9K+ options for x86_64.
Feature subset selection vs recursive feature elimination: do they scale? Are they accurate?
*Legend of the related-work table — EX: execution, SI: simulation, SA: static analysis, UF: user feedback, SM: synthetic measurements.
TUXML: Sampling, Measuring, Learning
Docker for a reproducible environment, with the needed tools/packages and Python procedures inside.
Easy to launch a campaign:
"python3 kernel_generator.py 10"
builds and measures 10 random configurations (information is sent to a database).
https://github.com/TuxML/
Data: version 4.13.3 (x86_64)
95K+ configurations for Linux 4.13.3 (and 15K hours of computation on a computing grid)
RQ1: How do state-of-the-art (SOTA) techniques perform on huge configuration spaces?
● Linear-based algorithms: high error rate (binary size is not additive!)
● Polynomial regression & performance-influence models: out of memory (too many interactions, and not designed for 9K+ options)
● Tree-based algorithms & neural networks: low error rate
Mean Absolute Percentage Error (MAPE): the lower, the better.
N: percentage of the dataset used for training.
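For reference, a minimal sketch of the MAPE computation (the measured and predicted sizes below are hypothetical; this is not the paper's evaluation code):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error: mean |error| relative to the measured value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / y_true)

# hypothetical kernel sizes in MB: measured vs predicted
print(f"{mape([16.1, 77.2, 176.8], [15.0, 80.0, 170.0]):.1f}%")  # ~4.8%
```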
Dimensionality reduction with feature selection
Huge configuration space: ≈10^6000 configurations
Large option/feature set: 9K+ options for x86_64
Only a subset of options matter when predicting properties of variants.
RQ2: How accurate is the prediction model with and without feature selection?
p options → p' options with p' << p, over n configurations
Dimensionality reduction with tree-based feature selection (sketched in code below):
1. Learn a tree-based algorithm (Random Forest) on the full dataset (p = 8,743 options).
2. Derive a feature ranking list (based on feature importance), e.g.:
DEBUG_INFO (0.33), active_options (0.19), group_129 (0.14), DEBUG_INFO_REDUCED (0.11), DEBUG_INFO_SPLIT (0.08)
3. Filter the dataset, keeping only the top p' <<<<< p options (reduced dataset).
4. Learn any algorithm on the reduced dataset.
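A minimal sketch of this pipeline with scikit-learn, on synthetic stand-in data (the 0/1 option matrix, toy size model, and the p' = 300 threshold are illustrative; the slides do not fix the hyper-parameters):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 8743)).astype(float)  # n configs x p options
y = 7 + X[:, :40] @ rng.uniform(1, 40, 40)               # toy binary sizes in MB

# Steps 1-2: train a Random Forest on the full dataset and rank options
rf = RandomForestRegressor(n_estimators=50, n_jobs=-1, random_state=0).fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]      # feature ranking list

# Step 3: filter, keeping only the top p' << p options
p_prime = 300
X_reduced = X[:, ranking[:p_prime]]

# Step 4: train any learning algorithm on the reduced dataset
gbt = GradientBoostingRegressor(random_state=0).fit(X_reduced, y)
```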
RQ2: Tree-based feature selection pays off!
● Tree-based algorithms & neural networks:
○ lower error rate
○ lower training time
■ Random Forest: 18x faster
■ Gradient Boosting Tree: 5x faster
● Simpler models, easier to train, and improved accuracy
● Bonus: interpretable and consistent with domain knowledge
RQ2: Optimal number of features/options when performing feature selection
● Depends on the algorithm:
○ Gradient Boosting Trees & neural networks: 1,500
○ Random Forest: 250 options
● Depends on the training set size
Sweet spot: only ~300 features are sufficient to efficiently train a Random Forest and a Gradient Boosting Tree and obtain a prediction model that outperforms the other baselines operating over the full set of features (6% prediction error with 40K configurations).
RQ3+4: Stability of influential options and training time reduction
Using an ensemble of Random Forests yields a far more stable list, with more than 95% of features in common in the top 300 across multiple lists (a measurement sketch follows below).
Tree-based feature selection speeds up model training by a factor of 5 to 48 (since p' <<<< p).
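One possible way to quantify such stability (a sketch on toy data; the paper's exact protocol may differ):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(4000, 1000)).astype(float)  # toy configs
y = 7 + X[:, :40] @ rng.uniform(1, 40, 40)               # toy sizes in MB

def top_k(X, y, k=300, seed=0):
    """Top-k options by Random Forest feature importance."""
    rf = RandomForestRegressor(n_estimators=50, n_jobs=-1, random_state=seed).fit(X, y)
    return set(np.argsort(rf.feature_importances_)[::-1][:k])

# retrain on two random halves of the data and measure top-300 overlap
X_a, X_b, y_a, y_b = train_test_split(X, y, test_size=0.5, random_state=1)
common = top_k(X_a, y_a) & top_k(X_b, y_b)
print(f"{100 * len(common) / 300:.1f}% common options in the top 300")
```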
RQ5: How do feature ranking lists, as computed by tree-based feature selection, relate to Linux knowledge?

Rank range in feature ranking list | Documented options in Kconfig (147 in total)
0 - 50                             | 7
50 - 250                           | 6
250 - 500                          | 6
500 - 1500                         | 28
1500 and beyond                    | 69

The top 50 options in the feature ranking list represent 95% of the feature importance; collinearity and interpretability: beware!
Incompleteness of the Linux documentation:
● The vast majority of influential options are either not documented or not referring to size: only 7 options of the top 50 are documented as having a clear influence on size.
● Leveraging all 147 options in the Linux documentation (and only them) leads to a prediction error of 23.6% (instead of <6% for our feature ranking list).
Relevance: investigations and exchanges with domain experts confirm the relevance of the top 50, yielding 6 categories of options.
Effective identification of important features:
● consistent with Linux knowledge (Kconfig documentation and expert insight)
● can be used to refine or augment the incomplete documentation of the Linux kernel.
Kaggle competition using our dataset
https://www.kaggle.com/competitions/linux-kernel-size/overview
We can benefit from contributions of the machine learning community…
And our dataset/problems are attracting interest.
Conclusion: feature subset selection is effective over the huge configuration space of Linux:
● only ~300 features out of 9K+
● accuracy is better with tree-based feature selection than without
● training time is decreased
● interpretability: the identification of influential options is consistent with, and can even improve, expert knowledge about Linux kernel configuration
Future work:
● replication on different versions of Linux
● does the feature ranking list transfer to other versions?
https://www.kaggle.com/competitions/linux-kernel-size/overview
Computation time (figure)
Decision Tree
● Ability to handle interactions between features
● Low impact of combinatorial explosion
● Competitive accuracy
● Interpretability (see the sketch after this list)
○ decision rules
○ feature importance
● Ensembles: Random Forests, Gradient Boosting Trees...
○ more accurate, less interpretable
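A sketch of both interpretability mechanisms on a single tree (toy data; the option names and size effects are made up for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
options = ["DEBUG_INFO", "DEBUG_INFO_REDUCED", "KASAN", "UBSAN"]  # hypothetical
X = rng.integers(0, 2, size=(500, len(options))).astype(float)
# toy size model with an interaction between the first two options
y = 30 + 80 * X[:, 0] - 40 * X[:, 0] * X[:, 1] + 15 * X[:, 2]

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=options))                # decision rules
print(dict(zip(options, tree.feature_importances_.round(2))))  # feature importance
```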
Kpredict
Python module for Python 3.8+ (https://github.com/HugoJPMartin/kpredict)
Works for many kernel versions and any x86_64 configuration
Error: ≈6.3%; 97% of the predictions are below 20% error
H. Martin, M. Acher, J. A. Pereira, L. Lesoil, J.-M. Jézéquel, and D. E. Khelladi, "Transfer learning across variants and versions: the case of Linux kernel size," IEEE Transactions on Software Engineering (TSE), 2021.
Preprint: https://hal.inria.fr/hal-03358817
Backup / Draft slides
Transfer learning
"Inductive transfer refers to any algorithmic process by which structure or knowledge derived from a learning problem is used to enhance learning on a related problem." - Jeremy West, in A Theoretical Foundation for Inductive Transfer
● 100,000 configuration measurements, 15,000 hours of computation
● Mission Impossible: Saving Private Model 4.13
○ Budget: 5,000 configuration measurements (one night's worth of ISTIC computing power)
Model 4.13: Genesis

Training data, with options as features (f1..fn) and binary size as target:

f1   f2   f3   ...  fn   | Size
1    0    0    ...  1    | 16MB
0    1    0    ...  0    | 52MB
...  ...  ...  ...  ...  | ...
1    1    1    ...  0    | 115MB

A Gradient Boosting Tree algorithm is trained on this table (Features → Target), yielding Model 4.13.
Model 4.13 then predicts the size of new, unmeasured configurations, and the predictions are accurate:

f1   f2   f3   ...  fn   | Predicted size
0    1    1    ...  0    | 18MB ✅
1    0    0    ...  1    | 25MB ✅
...  ...  ...  ...  ...  | ...
1    0    1    ...  0    | 228MB ✅
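A minimal sketch of this training-and-prediction step (synthetic stand-in data; the actual model is trained on the measured 4.13.3 dataset, and the slides do not fix the hyper-parameters):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(413)
w = rng.uniform(0, 2, 300)                                    # toy per-option costs
X_train = rng.integers(0, 2, size=(5000, 300)).astype(float)  # configs x options
y_train = 16 + X_train @ w                                    # toy sizes in MB

# "Model 4.13": a Gradient Boosting Tree mapping features (options) to the target (size)
model_413 = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# predict the size of new, unmeasured configurations
X_new = rng.integers(0, 2, size=(3, 300)).astype(float)
print(model_413.predict(X_new))
```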
Model Shifting

A new kernel version (4.15) arrives, with its own training data:

f1   f2   f3   ...  fn   | Size
1    0    0    ...  1    | 22MB
0    1    0    ...  0    | 68MB
...  ...  ...  ...  ...  | ...
1    1    1    ...  0    | 105MB

Training a Model 4.15 from scratch on this small budget gives poor predictions on new configurations:
Predicted: 19MB ❌, 26MB ❌, ..., 298MB ✅

Model shifting reuses Model 4.13: its predictions ("Old Size", e.g., 16MB, 52MB, ..., 115MB on the training configurations and 18MB, 25MB, ..., 228MB on the new ones) are added as an extra feature.
The resulting Shifting Model 4.15, a Gradient Boosting Tree trained on the augmented table, now predicts accurately:
Predicted: 21MB ✅, 35MB ✅, ..., 298MB ✅
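A sketch of model shifting on toy data (the sizes and drift factor are illustrative; the key point is that the old model's prediction becomes an input feature of the new model):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
w = rng.uniform(0, 2, 300)

# source version (4.13): large measurement budget
X_413 = rng.integers(0, 2, size=(5000, 300)).astype(float)
model_413 = GradientBoostingRegressor(random_state=0).fit(X_413, 16 + X_413 @ w)

# target version (4.15): small budget, shifted size behavior
X_415 = rng.integers(0, 2, size=(1000, 300)).astype(float)
y_415 = 22 + X_415 @ (1.1 * w)

def augment(X):
    """Append Model 4.13's prediction ("Old Size") as an extra feature."""
    return np.hstack([X, model_413.predict(X).reshape(-1, 1)])

shifting_415 = GradientBoostingRegressor(random_state=0).fit(augment(X_415), y_415)
X_new = rng.integers(0, 2, size=(3, 300)).astype(float)
print(shifting_415.predict(augment(X_new)))  # predicted 4.15 sizes
```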
Results

Error rates, model shifting vs learning from scratch:

Budget (configurations) | Model shifting     | Scratch
1,000                   | from 6.7% to 10.6% | from 14.9% to 16.7%
5,000                   | from 5.6% to 7.1%  | from 8.2% to 9.2%
10,000                  | from 5.2% to 6.1%  | from 7.1% to 7.7%
Incremental Model Shifting

Simple model shifting (Source + Shifting Model = Full Model), always starting from Model 4.13:
Model 4.13 + Shifting Model 4.15 = Model 4.15
Model 4.13 + Shifting Model 4.20 = Model 4.20
Model 4.13 + Shifting Model 5.0 = Model 5.0
Model 4.13 + Shifting Model 5.4 = Model 5.4
Model 4.13 + Shifting Model 5.7 = Model 5.7
Model 4.13 + Shifting Model 5.8 = Model 5.8

Incremental model shifting instead chains the models, each full model becoming the source for the next version (see the sketch below):
Model 4.13 + Shifting Model 4.15 = Model 4.15
Model 4.15 + Shifting Model 4.20 = Model 4.20
Model 4.20 + Shifting Model 5.0 = Model 5.0
Model 5.0 + Shifting Model 5.4 = Model 5.4
Model 5.4 + Shifting Model 5.7 = Model 5.7
Model 5.7 + Shifting Model 5.8 = Model 5.8
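A sketch of the chaining scheme (toy data; the ChainedModel wrapper and drift factors are illustrative, not the paper's implementation):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

class ChainedModel:
    """A shifting model stacked on top of a source model."""
    def __init__(self, source, X, y):
        self.source = source
        self.shift = GradientBoostingRegressor(random_state=0).fit(self._augment(X), y)
    def _augment(self, X):
        # the source model's prediction is appended as an extra feature
        return np.hstack([X, self.source.predict(X).reshape(-1, 1)])
    def predict(self, X):
        return self.shift.predict(self._augment(X))

rng = np.random.default_rng(0)
w = rng.uniform(0, 2, 100)
X0 = rng.integers(0, 2, size=(3000, 100)).astype(float)
model = GradientBoostingRegressor(random_state=0).fit(X0, 16 + X0 @ w)  # "Model 4.13"

# incremental shifting: each full model is the source for the next version
for drift in (1.1, 1.2, 1.3):  # stand-ins for versions 4.15, 4.20, 5.0, ...
    Xv = rng.integers(0, 2, size=(800, 100)).astype(float)
    model = ChainedModel(model, Xv, 20 + Xv @ (drift * w))

print(model.predict(rng.integers(0, 2, size=(3, 100)).astype(float)))
```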
Results

With the source Model 4.13 trained on a budget of 85,000 configurations:

Budget (configurations) | Model shifting     | Scratch             | Incremental shifting
1,000                   | from 6.7% to 10.6% | from 14.9% to 16.7% | from 6.7% to 13.3%
5,000                   | from 5.6% to 7.1%  | from 8.2% to 9.2%   | from 5.6% to 7.5%
10,000                  | from 5.2% to 6.1%  | from 7.1% to 7.7%   | from 5.2% to 6.5%

With the source Model 4.13 trained on a budget of 20,000 configurations:

Budget (configurations) | Model shifting     | Scratch             | Incremental shifting
1,000                   | from 8.5% to 11.6% | from 14.9% to 16.7% | from 8.5% to 13.8%
5,000                   | from 6.7% to 7.9%  | from 8.2% to 9.2%   | from 6.7% to 7.9%
10,000                  | from 6.2% to 6.7%  | from 7.1% to 7.7%   | from 6.1% to 6.7%
Summary
● Model 4.13 is saved
○ an old model can be productively reused on a new version, at a lower cost
○ better than learning from scratch, across years of kernel versions
● Incremental shifting
○ more sensitive to the errors of previous models
○ makes better use of larger transfer budgets