Parallel Ablation Studies for
Machine Learning with Maggy on
Apache Spark
Sina Sheikholeslami
PhD Student, KTH Royal Institute of Technology
Jim Dowling
CEO, Logical Clocks AB
Assoc Prof, KTH Royal Institute of Technology
sinash93
jim_dowling
Agenda
▪ Ablation Studies: why are they important for deep learning?
▪ Asynchronous ML Trials on Spark: the Maggy framework
▪ Parallel Ablation Studies with Maggy: programming model with a worked-through example
Ablation for Machine Learning
3
[Figure: the ML development workflow (Problem Definition → Data Preparation → Model Selection → Model Training → Evaluate, repeated if needed), where training combines a Dataset, a Machine Learning Model, and an Optimizer. The running example is a housing dataset with features area, rooms, and floors and the target price.]
Ablation study: Remove, retrain, measure.
[Figure: the model's accuracy (0.98) broken down into per-component contributions of 0.6, 0.17, 0.05, 0.05, and 0.1: remove a component, retrain, and measure the change.]
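To make "remove, retrain, measure" concrete, here is a minimal, self-contained sketch (plain scikit-learn, not Maggy code) that drops one feature at a time from a toy dataset, retrains from scratch, and prints the resulting accuracy next to the baseline; the dataset and model choices are arbitrary illustrations.

# Toy "remove, retrain, measure" loop (illustration only, not Maggy code).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def fit_and_score(train_X, test_X):
    model = LogisticRegression(max_iter=5000)
    model.fit(train_X, y_tr)
    return accuracy_score(y_te, model.predict(test_X))

baseline = fit_and_score(X_tr, X_te)
for i in range(X.shape[1]):
    # Remove feature i, retrain from scratch, and measure the accuracy change.
    acc = fit_and_score(np.delete(X_tr, i, axis=1), np.delete(X_te, i, axis=1))
    print(f"without feature {i}: {acc:.3f} (baseline {baseline:.3f})")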
4
Problem: Rewrite ML Code for Ablations, Distribution
Explore and Design → Experimentation: Tune and Search → Model Training (Distributed) → Explainability and Ablation Studies
5
Maggy: Unified code for Distributed ML + Ablations
OBLIVIOUS TRAINING FUNCTION
# RUNS ON THE WORKERS
def train():
    def input_fn():  # return dataset
        ...
    model = ...
    optimizer = ...
    model.compile(...)
    rc = tf.estimator.RunConfig('CollectiveAllReduceStrategy')
    keras_estimator = tf.keras.estimator.model_to_estimator(...)
    tf.estimator.train_and_evaluate(keras_estimator, input_fn)
EDA · HParam Tuning · Training (Dist) · Ablation Studies
Apache V2 - https://github.com/logicalclocks/maggy
6
Maggy: Programming Model
from maggy import experiment
experiment.set_dataset_generator(gen_dataset)
experiment.set_model_generator(gen_model)
# Hyperparameter optimization
experiment.set_context('optimization', 'randomsearch', searchspace)
result = experiment.lagom(train_fun)
params = result.get('best_hp')
# Distributed Training
experiment.set_context('dist_training', 'MultiWorkerMirroredStrategy', params)
experiment.lagom(train_fun)
# Ablation study
experiment.set_context('ablation', 'loco', ablation_study, params)
experiment.lagom(train_fun)
7
Maggy: Distribution and Tracking in One Function*
# RUNS ON THE WORKERS
def train(depth, lr):                      # training function & HParams
    from hops import model as mr
    def build_data():
        ...
    model = generate_model()
    optimizer = ...
    model.compile(...)
    print(...)                             # print to notebook & store in experiment log
    mr.export_model(model)                 # save model to Hopsworks Model Registry
    return {'accuracy': acc}               # track this dict with Experiment results

# RUNS ON THE DRIVER
from maggy import experiment
sp = Searchspace(depth=('INTEGER', [2, 8]), lr=(..))       # define HParams
experiment.set_context('optimization', 'random', sp,
                       direction='max', num_trials=15)     # define Trials
experiment.lagom(train)                    # launch 15 'train' functions on workers
https://youtu.be/xora_4iDcQ8
8
Maggy vs. MLflow
*https://www.logicalclocks.com/blog/hopsworks-ml-experiments
Maggy: Tracking, Model Registry & HParam Tuning

def train(depth, weight):
    X_train, X_test, y_train, y_test = build_data(..)
    ...
    model.fit(X_train, y_train)  # auto-logging
    ...
    hops.export_model(model, "tensorflow", .., model_name)
    ...
    # import matplotlib, create diagram.png
    plt.savefig('diagram.png')
    return {'accuracy': accuracy, 'diagram': 'diagram.png'}

from maggy import experiment
sp = Searchspace(depth=('INTEGER', [2, 8]), weight=('INTEGER', [2, 8]))
experiment.set_context('optimization', 'random', sp,
                       direction='max', num_trials=15)
experiment.lagom(train)
MLflow: Tracking & Model Registry (no HParam tuning)

def train(depth, weight):
    X_train, X_test, y_train, y_test = build_data(..)
    mlflow.set_tracking_uri("jdbc:mysql://uname:pwd@host:3306/db")
    mlflow.set_experiment("My Experiment")
    with mlflow.start_run() as run:
        ...
        mlflow.log_param("depth", depth)
        mlflow.log_param("weight", weight)
        with open("test.txt", "w") as f:
            f.write("hello world!")
        mlflow.log_artifact("/full/path/to/test.txt")  # log_artifact for a single file
        ...
        model.fit(X_train, y_train)  # auto-logging
        ...
        mlflow.tensorflow.log_model(model, "tensorflow-model",
                                    registered_model_name=model_name)
10
PySpark for Distribution
[Figure: Spark topologies for distribution. A Driver coordinates Workers 1..8 and distributes TF_CONFIG; the Driver also acts as the Experiment Controller over Workers 1..N, with a single-host case shown as well. The workflow stages Explore and Design, Experimentation: Tune and Search, Model Training (Distributed), and Explainability and Ablation Studies appear alongside.]
11
Maggy makes transparent:
▪ fixing parameters
▪ launching the function
▪ launching trials (parametrized instantiations of the function)
▪ generating new trials
▪ collecting and logging results
▪ setting up TF_CONFIG
▪ wrapping in a Distribution Strategy
▪ launching the function as workers
▪ collecting results
12
Maggy: Asynchronous Trials in PySpark for Ablations
[Figure: long-running tasks Task 1..N execute inside a barrier stage; each task streams Metrics to the Driver (Global Optimizer) and receives a New Trial in return.]
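As a toy, single-process illustration of the message exchange in the figure (not Maggy's implementation), the sketch below uses one thread per "worker" and two queues for the Metrics and New Trial messages: the driver issues a new trial to whichever worker reports back first instead of waiting for a whole batch to finish.

# Toy illustration of the Metrics / New Trial exchange (not Maggy internals).
import queue
import random
import threading

trials = queue.Queue()    # driver -> workers: new trials (hyperparameter dicts)
metrics = queue.Queue()   # workers -> driver: finished-trial results

def worker(worker_id):
    while True:
        trial = trials.get()
        if trial is None:             # poison pill: experiment is finished
            break
        acc = random.random()         # stand-in for actual training + evaluation
        metrics.put((worker_id, trial, acc))

NUM_WORKERS, NUM_TRIALS = 4, 12
threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()

# Driver (global optimizer): keep every worker busy by issuing a new trial as
# soon as a result arrives.
issued = 0
while issued < NUM_WORKERS:
    trials.put({"lr": random.choice([0.1, 0.01, 0.001])})
    issued += 1
for _ in range(NUM_TRIALS):
    wid, trial, acc = metrics.get()
    print(f"worker {wid} finished {trial} with acc={acc:.2f}")
    if issued < NUM_TRIALS:
        trials.put({"lr": random.choice([0.1, 0.01, 0.001])})
        issued += 1
for _ in threads:
    trials.put(None)                  # shut the workers down
for t in threads:
    t.join()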
13
Ablation Studies in Maggy
15
LOCO: Leave One Component Out
A simple, “natural” ablation policy, implemented as an ablator
Currently supports Feature, Layer, and Module Ablation
16
Feature Ablation
Uses the Feature Store to access the dataset metadata
Generates Python callables that, once called, return modified datasets
▪ Removes one feature at a time
17
[Figure: the housing dataset before (area, rooms, floors, price) and after (rooms, floors, price) ablating the area feature.]
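As a conceptual sketch of what such a generated callable does (illustrative, not the code Maggy generates), dropping one column yields one ablation trial; the feature names and the file location are assumptions borrowed from the housing example.

# Conceptual sketch of a per-feature dataset-generator callable (illustrative).
import pandas as pd

ALL_FEATURES = ["area", "rooms", "floors"]   # hypothetical feature names
LABEL = "price"

def make_dataset_generator(ablated_feature):
    """Return a callable that loads the data with one feature left out."""
    def gen_dataset():
        df = pd.read_csv("housing.csv")      # hypothetical dataset location
        features = [f for f in ALL_FEATURES if f != ablated_feature]
        return df[features], df[LABEL]
    return gen_dataset

# One callable, and hence one LOCO trial, per left-out feature:
generators = {f: make_dataset_generator(f) for f in ALL_FEATURES}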
Model Ablation
Uses a base model function
Generates Python callables that, once called, return modified models
▪ Uses the model configuration to find and remove layer(s)
▪ Removes one layer, one layer group, or one module at a time
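A rough sketch of the same idea for layers (illustrative, not the code Maggy generates): rebuild the base model's configuration without one named layer. The architecture and layer names are assumptions, and the approach shown only handles a simple Sequential model.

# Conceptual sketch of a per-layer model-generator callable (illustrative).
import tensorflow as tf

def base_model():
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", name="dense_one"),
        tf.keras.layers.Dense(32, activation="relu", name="dense_two"),
        tf.keras.layers.Dense(1, name="output"),
    ])

def make_model_generator(ablated_layer):
    """Return a callable that rebuilds the base model without one named layer."""
    def gen_model():
        config = base_model().get_config()
        config["layers"] = [layer for layer in config["layers"]
                            if layer["config"]["name"] != ablated_layer]
        return tf.keras.Sequential.from_config(config)
    return gen_model

model_without_dense_two = make_model_generator("dense_two")()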
18
Ablation User & Developer API
(Scan for Example Notebooks)
Programming Workflow
20
User API: Define Dataset Creation
21
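The notebook for this step is shown as a screenshot on the slide; as a stand-in, here is a hedged sketch of a dataset-generator function. The file name, feature names, and batching are assumptions carried over from the housing example, not the actual notebook code.

# Hedged sketch of a dataset generator (names and file location are assumed).
import pandas as pd
import tensorflow as tf

def gen_dataset():
    df = pd.read_csv("housing.csv")                       # hypothetical dataset
    features = df[["area", "rooms", "floors"]].values
    labels = df["price"].values
    return tf.data.Dataset.from_tensor_slices((features, labels)).batch(32)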
User API: Define Model Creation
22
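Again a hedged stand-in for the notebook screenshot: a base-model generator. The layer names are illustrative; they are what the model-ablation step below refers to.

# Hedged sketch of a base-model generator (architecture and names are assumed).
import tensorflow as tf

def gen_model():
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", name="dense_one"),
        tf.keras.layers.Dense(32, activation="relu", name="dense_two"),
        tf.keras.layers.Dense(1, name="output"),
    ])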
User API: Define Training Function
23
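A hedged sketch of the training function: in the Maggy ablation examples of the time it receives the (possibly ablated) dataset and model as callables, though the exact signature may differ in the unified API shown on the Programming Model slide. Loss, optimizer, and epochs here are arbitrary.

# Hedged sketch of the training function run on the workers for each trial.
def train_fun(dataset_function, model_function):
    dataset = dataset_function()          # possibly ablated dataset
    model = model_function()              # possibly ablated model
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    history = model.fit(dataset, epochs=5, verbose=0)
    # The returned dict is what Maggy tracks as the trial's result.
    return {"loss": float(history.history["loss"][-1])}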
User API: Initialize the Study
24
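A hedged sketch of initializing the study, following the Maggy documentation of the time (argument names may have changed since). The dataset name and label are assumptions; the training dataset is assumed to be registered in the Hopsworks Feature Store, as on the Feature Ablation slide.

# Hedged sketch: create the AblationStudy object (names/arguments are assumed).
from maggy.ablation import AblationStudy

ablation_study = AblationStudy("housing_train_dataset",
                               training_dataset_version=1,
                               label_name="price")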
User API: Setup Model Ablation
25
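A hedged sketch of the model-ablation setup: mark named layers of the base model for LOCO ablation, one trial per layer. The ablation_study.model.layers API follows the Maggy docs of the time and may differ between versions; the layer names come from the base-model sketch above.

# Hedged sketch: one LOCO trial per included layer of the base model.
ablation_study.model.layers.include("dense_one", "dense_two")
# The deck also mentions ablating layer groups and whole modules at a time.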
User API: Setup Feature Ablation
26
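A hedged sketch of the feature-ablation setup: mark dataset features for LOCO ablation, one trial per feature. The ablation_study.features API follows the Maggy docs of the time; the feature names are from the housing example.

# Hedged sketch: one LOCO trial per included feature.
ablation_study.features.include("area", "rooms", "floors")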
User API: Launch Parallel Trials
27
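Launching the trials uses the calls shown on the Programming Model slide; only train_fun, gen_dataset, gen_model, and ablation_study from the sketches above are assumptions.

# Launch the ablation trials (API as shown on the Programming Model slide).
from maggy import experiment

experiment.set_dataset_generator(gen_dataset)
experiment.set_model_generator(gen_model)

# 'params' would be the best hyperparameters from a preceding tuning run,
# e.g. params = result.get('best_hp') as on the Programming Model slide.
experiment.set_context('ablation', 'loco', ablation_study, params)
result = experiment.lagom(train_fun)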
Developer API: Policy Implementation (1/2)
28
Developer API: Policy Implementation (2/2)
29
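The developer API slides are also screenshots; as a heavily hedged skeleton of what a custom ablation policy ("ablator") might look like, the base-class and method names below are assumptions based on the Maggy documentation of the time, and the actual interface is in the repository linked on the next slide.

# Heavily hedged skeleton of a custom ablator (class/method names are assumed).
from maggy.ablation.ablator import AbstractAblator

class MyAblator(AbstractAblator):
    def get_number_of_trials(self):
        # e.g. one trial per included feature plus one per included layer
        ...

    def initialize(self):
        # enqueue the trials this policy wants to run
        ...

    def get_trial(self, ablation_trial=None):
        # hand the next trial (a dataset generator plus a model generator)
        # to a free worker
        ...

    def finalize_experiment(self, trials):
        # post-process results once all trials have completed
        ...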
Maggy is Open-source
Code Repository: https://github.com/logicalclocks/maggy
API Documentation: https://maggy.readthedocs.io/en/latest/
30
Acknowledgments
Thanks to our colleagues at Logical Clocks and DC@KTH:
Moritz Meister, Robin Andersson, Kim Hammar,
Kai Jeggle, Alessio Molinari, Alex Ormenisan, Tianze Wang,
Amir Payberah, Vladimir Vlassov
This work is supported by the ExtremeEarth
project funded by European Union’s Horizon
2020 Research and Innovation Programme
under grant agreement No. 825258.
Demo: Ablation Study of Common
DL Network Architectures with Maggy
Feedback
Your feedback is important to us.
Don’t forget to rate
and review the sessions.
