Parallel Ablation Studies for
Machine Learning with Maggy on
Apache Spark
Sina Sheikholeslami
PhD Student, KTH Royal Institute of Technology
Jim Dowling
CEO, Logical Clocks AB
Assoc Prof, KTH Royal Institute of Technology
sinash93
jim_dowling
Agenda
▪ Ablation Studies: why are they important for deep learning?
▪ Asynchronous ML Trials on Spark: the Maggy framework
▪ Parallel Ablation Studies with Maggy: programming model with a worked-through example
Ablation for Machine Learning
3
[Figure: the ML development workflow (Problem Definition → Data Preparation → Model Selection → Model Training → Evaluate, repeated if needed), where training combines a Dataset, a Machine Learning Model, and an Optimizer. The running example is a housing dataset with features area, rooms, and floors and the target price.]
Ablation study: Remove, retrain, measure.
[Figure: the model's accuracy (0.98) broken down into per-component contributions of 0.6, 0.17, 0.05, 0.05, and 0.1: remove a component, retrain, and measure the change.]
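To make "remove, retrain, measure" concrete, here is a minimal, self-contained sketch (plain scikit-learn, not Maggy code) that drops one feature at a time from a toy dataset, retrains from scratch, and prints the resulting accuracy next to the baseline; the dataset and model choices are arbitrary illustrations.

# Toy "remove, retrain, measure" loop (illustration only, not Maggy code).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def fit_and_score(train_X, test_X):
    model = LogisticRegression(max_iter=5000)
    model.fit(train_X, y_tr)
    return accuracy_score(y_te, model.predict(test_X))

baseline = fit_and_score(X_tr, X_te)
for i in range(X.shape[1]):
    # Remove feature i, retrain from scratch, and measure the accuracy change.
    acc = fit_and_score(np.delete(X_tr, i, axis=1), np.delete(X_te, i, axis=1))
    print(f"without feature {i}: {acc:.3f} (baseline {baseline:.3f})")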
4
Problem: Rewrite ML Code for Ablations, Distribution
Explore and Design → Experimentation: Tune and Search → Model Training (Distributed) → Explainability and Ablation Studies
5
Maggy: Unified code for Distributed ML + Ablations
OBLIVIOUS TRAINING FUNCTION
# RUNS ON THE WORKERS
def train():
    def input_fn():  # return dataset
        ...
    model = ...
    optimizer = ...
    model.compile(...)
    rc = tf.estimator.RunConfig('CollectiveAllReduceStrategy')
    keras_estimator = tf.keras.estimator.model_to_estimator(...)
    tf.estimator.train_and_evaluate(keras_estimator, input_fn)
EDA · HParam Tuning · Training (Dist) · Ablation Studies
Apache V2 - https://github.com/logicalclocks/maggy
6
Maggy: Programming Model
from maggy import experiment
experiment.set_dataset_generator(gen_dataset)
experiment.set_model_generator(gen_model)
# Hyperparameter optimization
experiment.set_context('optimization', 'randomsearch', searchspace)
result = experiment.lagom(train_fun)
params = result.get('best_hp')
# Distributed Training
experiment.set_context('dist_training', 'MultiWorkerMirroredStrategy', params)
experiment.lagom(train_fun)
# Ablation study
experiment.set_context('ablation', 'loco', ablation_study, params)
experiment.lagom(train_fun)
7
Maggy: Distribution and Tracking in One Function*
# RUNS ON THE WORKERS
def train(depth, lr):                      # training function & HParams
    from hops import model as mr
    def build_data():
        ...
    model = generate_model()
    optimizer = ...
    model.compile(...)
    print(...)                             # print to notebook & store in experiment log
    mr.export_model(model)                 # save model to Hopsworks Model Registry
    return {'accuracy': acc}               # track this dict with Experiment results

# RUNS ON THE DRIVER
from maggy import experiment
sp = Searchspace(depth=('INTEGER', [2, 8]), lr=(..))       # define HParams
experiment.set_context('optimization', 'random', sp,
                       direction='max', num_trials=15)     # define Trials
experiment.lagom(train)                    # launch 15 'train' functions on workers
https://youtu.be/xora_4iDcQ8
8
Maggy vs. MLflow
*https://www.logicalclocks.com/blog/hopsworks-ml-experiments
Maggy: Tracking, Model Registry & HParam Tuning

def train(depth, weight):
    X_train, X_test, y_train, y_test = build_data(..)
    ...
    model.fit(X_train, y_train)  # auto-logging
    ...
    hops.export_model(model, "tensorflow", .., model_name)
    ...
    # import matplotlib, create diagram.png
    plt.savefig('diagram.png')
    return {'accuracy': accuracy, 'diagram': 'diagram.png'}

from maggy import experiment
sp = Searchspace(depth=('INTEGER', [2, 8]), weight=('INTEGER', [2, 8]))
experiment.set_context('optimization', 'random', sp,
                       direction='max', num_trials=15)
experiment.lagom(train)
MLflow: Tracking & Model Registry (no HParam tuning)

def train(depth, weight):
    X_train, X_test, y_train, y_test = build_data(..)
    mlflow.set_tracking_uri("jdbc:mysql://uname:pwd@host:3306/db")
    mlflow.set_experiment("My Experiment")
    with mlflow.start_run() as run:
        ...
        mlflow.log_param("depth", depth)
        mlflow.log_param("weight", weight)
        with open("test.txt", "w") as f:
            f.write("hello world!")
        mlflow.log_artifact("/full/path/to/test.txt")  # log_artifact for a single file
        ...
        model.fit(X_train, y_train)  # auto-logging
        ...
        mlflow.tensorflow.log_model(model, "tensorflow-model",
                                    registered_model_name=model_name)
10
PySpark for Distribution
[Figure: Spark topologies for distribution. A Driver coordinates Workers 1..8 and distributes TF_CONFIG; the Driver also acts as the Experiment Controller over Workers 1..N, with a single-host case shown as well. The workflow stages Explore and Design, Experimentation: Tune and Search, Model Training (Distributed), and Explainability and Ablation Studies appear alongside.]
11
Maggy makes transparent:
▪ fixing parameters
▪ launching the function
▪ launching trials (parametrized instantiations of the function)
▪ generating new trials
▪ collecting and logging results
▪ setting up TF_CONFIG
▪ wrapping in a Distribution Strategy
▪ launching the function as workers
▪ collecting results
12
Maggy: Asynchronous Trials in PySpark for Ablations
[Figure: long-running tasks Task 1..N execute inside a barrier stage; each task streams Metrics to the Driver (Global Optimizer) and receives a New Trial in return.]
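As a toy, single-process illustration of the message exchange in the figure (not Maggy's implementation), the sketch below uses one thread per "worker" and two queues for the Metrics and New Trial messages: the driver issues a new trial to whichever worker reports back first instead of waiting for a whole batch to finish.

# Toy illustration of the Metrics / New Trial exchange (not Maggy internals).
import queue
import random
import threading

trials = queue.Queue()    # driver -> workers: new trials (hyperparameter dicts)
metrics = queue.Queue()   # workers -> driver: finished-trial results

def worker(worker_id):
    while True:
        trial = trials.get()
        if trial is None:             # poison pill: experiment is finished
            break
        acc = random.random()         # stand-in for actual training + evaluation
        metrics.put((worker_id, trial, acc))

NUM_WORKERS, NUM_TRIALS = 4, 12
threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()

# Driver (global optimizer): keep every worker busy by issuing a new trial as
# soon as a result arrives.
issued = 0
while issued < NUM_WORKERS:
    trials.put({"lr": random.choice([0.1, 0.01, 0.001])})
    issued += 1
for _ in range(NUM_TRIALS):
    wid, trial, acc = metrics.get()
    print(f"worker {wid} finished {trial} with acc={acc:.2f}")
    if issued < NUM_TRIALS:
        trials.put({"lr": random.choice([0.1, 0.01, 0.001])})
        issued += 1
for _ in threads:
    trials.put(None)                  # shut the workers down
for t in threads:
    t.join()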
13
Ablation Studies in Maggy
15
LOCO: Leave One Component Out
A simple, “natural” ablation policy, implemented as an ablator
Currently supports Feature, Layer, and Module Ablation
16
Feature Ablation
Uses the Feature Store to access the dataset metadata
Generates Python callables that, once called, return modified datasets
▪ Removes one feature at a time
17
[Figure: the housing dataset before (area, rooms, floors, price) and after (rooms, floors, price) ablating the area feature.]
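As a conceptual sketch of what such a generated callable does (illustrative, not the code Maggy generates), dropping one column yields one ablation trial; the feature names and the file location are assumptions borrowed from the housing example.

# Conceptual sketch of a per-feature dataset-generator callable (illustrative).
import pandas as pd

ALL_FEATURES = ["area", "rooms", "floors"]   # hypothetical feature names
LABEL = "price"

def make_dataset_generator(ablated_feature):
    """Return a callable that loads the data with one feature left out."""
    def gen_dataset():
        df = pd.read_csv("housing.csv")      # hypothetical dataset location
        features = [f for f in ALL_FEATURES if f != ablated_feature]
        return df[features], df[LABEL]
    return gen_dataset

# One callable, and hence one LOCO trial, per left-out feature:
generators = {f: make_dataset_generator(f) for f in ALL_FEATURES}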
Model Ablation
Uses a base model function
Generates Python callables that, once called, return modified models
▪ Uses the model configuration to find and remove layer(s)
▪ Removes one layer, one layer group, or one module at a time
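A rough sketch of the same idea for layers (illustrative, not the code Maggy generates): rebuild the base model's configuration without one named layer. The architecture and layer names are assumptions, and the approach shown only handles a simple Sequential model.

# Conceptual sketch of a per-layer model-generator callable (illustrative).
import tensorflow as tf

def base_model():
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", name="dense_one"),
        tf.keras.layers.Dense(32, activation="relu", name="dense_two"),
        tf.keras.layers.Dense(1, name="output"),
    ])

def make_model_generator(ablated_layer):
    """Return a callable that rebuilds the base model without one named layer."""
    def gen_model():
        config = base_model().get_config()
        config["layers"] = [layer for layer in config["layers"]
                            if layer["config"]["name"] != ablated_layer]
        return tf.keras.Sequential.from_config(config)
    return gen_model

model_without_dense_two = make_model_generator("dense_two")()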
18
Ablation User & Developer API
(Scan for Example Notebooks)
Programming Workflow
20
User API: Define Dataset Creation
21
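The notebook for this step is shown as a screenshot on the slide; as a stand-in, here is a hedged sketch of a dataset-generator function. The file name, feature names, and batching are assumptions carried over from the housing example, not the actual notebook code.

# Hedged sketch of a dataset generator (names and file location are assumed).
import pandas as pd
import tensorflow as tf

def gen_dataset():
    df = pd.read_csv("housing.csv")                       # hypothetical dataset
    features = df[["area", "rooms", "floors"]].values
    labels = df["price"].values
    return tf.data.Dataset.from_tensor_slices((features, labels)).batch(32)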
User API: Define Model Creation
22
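Again a hedged stand-in for the notebook screenshot: a base-model generator. The layer names are illustrative; they are what the model-ablation step below refers to.

# Hedged sketch of a base-model generator (architecture and names are assumed).
import tensorflow as tf

def gen_model():
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", name="dense_one"),
        tf.keras.layers.Dense(32, activation="relu", name="dense_two"),
        tf.keras.layers.Dense(1, name="output"),
    ])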
User API: Define Training Function
23
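A hedged sketch of the training function: in the Maggy ablation examples of the time it receives the (possibly ablated) dataset and model as callables, though the exact signature may differ in the unified API shown on the Programming Model slide. Loss, optimizer, and epochs here are arbitrary.

# Hedged sketch of the training function run on the workers for each trial.
def train_fun(dataset_function, model_function):
    dataset = dataset_function()          # possibly ablated dataset
    model = model_function()              # possibly ablated model
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    history = model.fit(dataset, epochs=5, verbose=0)
    # The returned dict is what Maggy tracks as the trial's result.
    return {"loss": float(history.history["loss"][-1])}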
User API: Initialize the Study
24
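A hedged sketch of initializing the study, following the Maggy documentation of the time (argument names may have changed since). The dataset name and label are assumptions; the training dataset is assumed to be registered in the Hopsworks Feature Store, as on the Feature Ablation slide.

# Hedged sketch: create the AblationStudy object (names/arguments are assumed).
from maggy.ablation import AblationStudy

ablation_study = AblationStudy("housing_train_dataset",
                               training_dataset_version=1,
                               label_name="price")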
User API: Setup Model Ablation
25
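A hedged sketch of the model-ablation setup: mark named layers of the base model for LOCO ablation, one trial per layer. The ablation_study.model.layers API follows the Maggy docs of the time and may differ between versions; the layer names come from the base-model sketch above.

# Hedged sketch: one LOCO trial per included layer of the base model.
ablation_study.model.layers.include("dense_one", "dense_two")
# The deck also mentions ablating layer groups and whole modules at a time.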
User API: Setup Feature Ablation
26
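A hedged sketch of the feature-ablation setup: mark dataset features for LOCO ablation, one trial per feature. The ablation_study.features API follows the Maggy docs of the time; the feature names are from the housing example.

# Hedged sketch: one LOCO trial per included feature.
ablation_study.features.include("area", "rooms", "floors")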
User API: Launch Parallel Trials
27
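Launching the trials uses the calls shown on the Programming Model slide; only train_fun, gen_dataset, gen_model, and ablation_study from the sketches above are assumptions.

# Launch the ablation trials (API as shown on the Programming Model slide).
from maggy import experiment

experiment.set_dataset_generator(gen_dataset)
experiment.set_model_generator(gen_model)

# 'params' would be the best hyperparameters from a preceding tuning run,
# e.g. params = result.get('best_hp') as on the Programming Model slide.
experiment.set_context('ablation', 'loco', ablation_study, params)
result = experiment.lagom(train_fun)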
Developer API: Policy Implementation (1/2)
28
Developer API: Policy Implementation (2/2)
29
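The developer API slides are also screenshots; as a heavily hedged skeleton of what a custom ablation policy ("ablator") might look like, the base-class and method names below are assumptions based on the Maggy documentation of the time, and the actual interface is in the repository linked on the next slide.

# Heavily hedged skeleton of a custom ablator (class/method names are assumed).
from maggy.ablation.ablator import AbstractAblator

class MyAblator(AbstractAblator):
    def get_number_of_trials(self):
        # e.g. one trial per included feature plus one per included layer
        ...

    def initialize(self):
        # enqueue the trials this policy wants to run
        ...

    def get_trial(self, ablation_trial=None):
        # hand the next trial (a dataset generator plus a model generator)
        # to a free worker
        ...

    def finalize_experiment(self, trials):
        # post-process results once all trials have completed
        ...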
Maggy is Open-source
Code Repository: https://github.com/logicalclocks/maggy
API Documentation: https://maggy.readthedocs.io/en/latest/
30
Acknowledgments
Thanks to our colleagues at Logical Clocks and DC@KTH:
Moritz Meister, Robin Andersson, Kim Hammar,
Kai Jeggle, Alessio Molinari, Alex Ormenisan, Tianze Wang,
Amir Payberah, Vladimir Vlassov
This work is supported by the ExtremeEarth
project funded by European Union’s Horizon
2020 Research and Innovation Programme
under grant agreement No. 825258.
Demo: Ablation Study of Common
DL Network Architectures with Maggy
Feedback
Your feedback is important to us.
Don’t forget to rate
and review the sessions.
