Machine learning suffers from a reproducibility crisis. Deterministic machine learning is essential for academia to verify published results, and for developers to debug, audit, and regression-test models.
Because non-determinism in machine learning has many causes, especially when GPUs are in play, I conducted several experiments to identify these causes and their solutions (where available).
2. About me
● Bioinformatics MSc from the University of Tübingen
● Research Software Engineer at the Quantitative Biology Center Tübingen
● Expert in reproducible research
3. About the Quantitative Biology Center (QBiC)
● Bioinformatics core facility at the University of Tübingen
○ Data management and data analysis
● Strong contributor to reproducible research
● Job opening for a Scientific Data Steward
5. Why do we even care?
● 400 papers in machine learning evaluated [1]
○ Only 24% reproducible
● Determinism matters for science, auditing, experimentation, debugging, and regression testing
[1] Gundersen, Odd Erik & Kjensmo, Sigbjørn. (2018). State of the Art: Reproducibility in Artificial Intelligence.
6. Primary reasons for non-reproducible machine learning [1]
● Data and code not shared
● Insufficient documentation
○ Hyperparameters, metrics, ...
○ Hardware used
● Irreproducible environment
● Usage of GPUs
○ Non-deterministic operations
[1] Gundersen, Odd Erik & Kjensmo, Sigbjørn. (2018). State of the Art: Reproducibility in Artificial Intelligence.
7. The elephant in the deterministic machine learning room
● Sum-reduce algorithm
○ Based on CUDA atomic add operations
○ GPUs operate massively in parallel
○ Summing up requires synchronization
8. On highly parallel floating point addition
● The order of thread synchronization leads to different floating-point rounding errors
● Floating-point summation is not associative
● Repeated applications of the algorithm amplify the differences
● Most machine learning libraries are based on atomic operations
● There are plenty more reasons for non-deterministic behavior
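The non-associativity above is easy to demonstrate in plain Python, with double-precision floats and no frameworks involved:

```python
# Two groupings of the same three summands give different results:
# 1.0 is absorbed when added to a value as large as 1e16, so the
# result depends on the order in which partial sums are combined.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c    # exact cancellation first, then + 1.0
right = a + (b + c)   # 1.0 is lost to rounding before cancellation

print(left, right)  # 1.0 0.0
```

On a GPU, the grouping is decided by whichever thread's atomic add lands first, so the rounding error changes from run to run.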
9. Recent developments
● Open questions:
○ Do deterministic algorithms work as expected?
○ Are options available for all algorithms?
○ What is the effect on the run time?
● (Optional) deterministic algorithms are now offered
○ Implemented without atomic operations
○ PyTorch: since v0.4.0 (2017)
○ TensorFlow: since v2.1.0 (2020)
○ XGBoost: since v1.1.0 (2020)
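For PyTorch and TensorFlow the deterministic algorithms are an explicit opt-in; a sketch of the switches (API names from recent releases, versions noted in comments; not runnable without the frameworks installed):

```python
# PyTorch (>= 1.8): a single global switch. Ops without a
# deterministic implementation raise an error instead of
# silently running non-deterministically.
import torch
torch.use_deterministic_algorithms(True)

# TensorFlow >= 2.9 bundles the determinism flags into one call;
# on TF 2.1-2.8 the environment variable TF_DETERMINISTIC_OPS="1"
# was used instead.
import tensorflow as tf
tf.config.experimental.enable_op_determinism()
```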
10. Evaluating determinism - the setup
● Containerized projects
○ PyTorch, TensorFlow: MNIST
○ XGBoost: Covertype
● Three settings for CPU, single GPU and multiple GPUs
○ No random seeds
○ All possible random seeds
○ Deterministic algorithms and random seeds
● 5 runs per setting
System | Hardware
1 - Laptop | Intel i5-7300HQ and NVIDIA 1050M
2 - deNBI K80 | 12-core Intel and 2× NVIDIA Tesla K80
3/4 - deNBI V100 | 24-core Intel and 2× NVIDIA Tesla V100
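The "all possible random seeds" setting above amounts to pinning every random number generator a project touches. A minimal sketch using only the standard library (the framework-specific calls, which are outside the stdlib, are indicated as comments):

```python
import os
import random

def set_seeds(seed: int) -> None:
    """Pin the stdlib sources of randomness to a fixed seed."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    # Framework RNGs would be pinned here as well, e.g.
    # numpy.random.seed(seed), torch.manual_seed(seed),
    # tf.random.set_seed(seed) -- omitted to keep this stdlib-only.

set_seeds(42)
first = [random.random() for _ in range(3)]
set_seeds(42)
second = [random.random() for _ in range(3)]
print(first == second)  # the same seed reproduces the same draws
```

On CPU this is usually enough for bit-identical runs; on GPU it is not, which is exactly what the next slides show.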
11. GPU with just seeds is non-deterministic
(Figure: run-to-run results on Intel i5-7300HQ with NVIDIA 1050M)
15. Primary takeaways
● Deterministic algorithms work
○ But are badly tested
○ And need to be forced
● Not every algorithm has a deterministic option
○ Difficult to get complete lists
○ Even harder to keep track
● Determinism depends on the hardware architecture
● Negligible effect on the runtime
○ Duncan Riach (NVIDIA): ~6% [1]
[1] https://github.com/NVIDIA/framework-determinism
16. Requirements for Deterministic Machine Learning
Complex requirements demand an intuitive software solution
18. Enabling deterministic machine learning with mlf-core [1][2]
[1] https://mlf-core.com
[2] https://pypi.org/project/mlf-core/
Inspired by nf-core
30. Acknowledgements
● Sven Nahnsen
● Philipp Hennig
● Gisela Gabernet
● Duncan Riach
● nf-core
● deNBI
This work was supported by the BMBF-funded de.NBI Cloud within the German Network
for Bioinformatics Infrastructure (de.NBI)(031A537B, 031A533A, 031A538A, 031A533B,
031A535A, 031A537C, 031A534A, 031A532B).
33. Further reasons for non-determinism
● NVIDIA cuDNN benchmark
○ Disable benchmarking
● Bias additions, max-pooling, batch normalization
○ Usually based on atomic add
○ Circumvent with deterministic algorithms
● Many more non-obvious functions
○ index_add
○ gate_gradients
○ …
○ Usually no solution available
● GPU batch distribution
○ Library specific
● CUDA version
○ Must be compiled with the correct version
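The cuDNN benchmark point above maps to a pair of flags in PyTorch; a sketch (assumes PyTorch with CUDA installed, so it is not runnable standalone):

```python
import torch

# Benchmarking autotunes the fastest convolution algorithm per input
# shape, which can pick a different (non-deterministic) algorithm on
# each run; disabling it removes that source of variation.
torch.backends.cudnn.benchmark = False
# Additionally ask cuDNN to use only deterministic algorithms.
torch.backends.cudnn.deterministic = True
```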
34. Deterministic sum_reduce
● One of the easier algorithms in which to replace atomic add
● Reshape the input into a row vector and multiply it with a column of ones
import tensorflow as tf

def reduce_sum_det(x):
    # Flatten x into a 1 x n row vector.
    v = tf.reshape(x, [1, -1])
    # v @ ones^T yields the sum as a single 1 x 1 matrix multiplication.
    return tf.reshape(tf.matmul(v, tf.ones_like(v), transpose_b=True), [])
● GPUs are good at matrix multiplication