REPRODUCIBLE AI
USING PYTORCH AND
MLFLOW
GEETA CHAUHAN
AI PARTNER ENGINEERING, FACEBOOK AI
NOV, 2020
AGENDA
01 PYTORCH COMMUNITY GROWTH
02 REPRODUCIBLE AI CHALLENGE
03 SOLUTION: MLFLOW + PYTORCH
04 REFERENCES
PYTORCH COMMUNITY GROWTH
~1,619 CONTRIBUTORS
50%+ YOY GROWTH
34K+ PYTORCH FORUM USERS
GROWING USAGE IN OPEN SOURCE
Source: https://paperswithcode.com/trends
GROWTH OF DATA IN ML PIPELINES @ FACEBOOK
• FB data used in an ML pipeline in 2018: 30%
• FB data used in an ML pipeline today: 50%
• Data warehouse growth since 2018: 2x
• ML data growth since 2018: 3x
GROWTH OF ML TRAINING @ FACEBOOK
• Workflows: 5x increase
• Unique users: 2x increase
• Compute consumed: 8x increase
REPRODUCIBLE AI CHALLENGE
TRADITIONAL SOFTWARE VS MACHINE LEARNING
• Continuous, iterative process; optimize for a metric
• Quality depends on data and tuning parameters
• Experiment tracking is difficult
• Data changes over time, causing model drift
• Compare and combine many libraries and models
• Diverse deployment environments
REPRODUCIBILITY CHALLENGE
RESEARCH
• Difficult to reproduce the results of a paper
• Missing data, model weights, and scripts
PRODUCTION
• Hyperparameters, features, data, vocabulary, and other artifacts lost
• People leaving the company
REPRODUCIBLE RESEARCH
NeurIPS 2019 Reproducibility Checklist
REPRODUCIBILITY CHECKLIST
• Dependencies — does a repository have information on
dependencies or instructions on how to set up the environment?
• Training scripts — does a repository contain a way to train/fit
the model(s) described in the paper?
• Evaluation scripts — does a repository contain a script to
calculate the performance of the trained model(s) or run
experiments on models?
• Pretrained models — does a repository provide free access to
pretrained model weights?
• Results — does a repository contain a table/plot of main results
and a script to reproduce those results?
ARXIV + PAPERS WITH CODE -> REPRODUCIBLE RESEARCH
https://medium.com/paperswithcode/papers-with-code-partners-with-arxiv-ecc362883167
MLFLOW + PYTORCH
MLFLOW + PYTORCH FOR REPRODUCIBILITY
• Tracking: record and query experiments (code, data, config, and results). PyTorch integration: autologging
• Projects: package data science code in a format that enables reproducible runs on many platforms. PyTorch integration: PyTorch examples with MLproject files
• Models: deploy machine learning models in diverse serving environments. PyTorch integration: TorchScripted models, save/load of extra artifacts
• Model Registry: store, annotate, and manage models in a central repository. PyTorch integration: MLflow TorchServe deployment plugin
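For the Projects piece, a minimal sketch of launching a reproducible run through the MLflow Projects API; the project URI, the "main" entry point, and the max_epochs parameter are placeholders for whatever the MLproject file actually defines:

# Minimal sketch: launch a reproducible run of an MLflow Project.
# Assumes the current directory (or a Git URI) contains an MLproject file
# whose "main" entry point accepts a "max_epochs" parameter; both names
# are placeholders.
import mlflow

submitted_run = mlflow.projects.run(
    uri=".",                 # local project directory or Git URL
    entry_point="main",
    parameters={"max_epochs": 5},
)
print(submitted_run.run_id)  # the tracking run created for this execution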
MLFLOW AUTOLOGGING
• PyTorch autologging with the Lightning training loop
• Model hyper-parameters such as learning rate, plus model summary, optimizer name, early-stopping min delta and best score
• Early stopping and other callbacks
• Log every N iterations
• User-defined metrics such as F1 score and test accuracy
• ….
import os
import argparse
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint, LearningRateLogger
import mlflow
from mlflow.pytorch.pytorch_autolog import autolog

# LightningMNISTClassifier is defined elsewhere (e.g., in the MNIST example script)
parser = argparse.ArgumentParser()
parser = LightningMNISTClassifier.add_model_specific_args(parent_parser=parser)
args = parser.parse_args()
dict_args = vars(args)

autolog()  # just add this and your autologging should work!

mlflow.set_tracking_uri(dict_args["tracking_uri"])
model = LightningMNISTClassifier(**dict_args)

early_stopping = EarlyStopping(monitor="val_loss", mode="min", verbose=True)
checkpoint_callback = ModelCheckpoint(
    filepath=os.getcwd(), save_top_k=1, verbose=True,
    monitor="val_loss", mode="min", prefix="",
)
lr_logger = LearningRateLogger()

trainer = pl.Trainer.from_argparse_args(
    args,
    callbacks=[lr_logger],
    early_stop_callback=early_stopping,
    checkpoint_callback=checkpoint_callback,
    train_percent_check=0.1,
)
trainer.fit(model)
trainer.test()
COMPARE EXPERIMENT RUNS
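Runs can also be compared programmatically against the tracking API instead of in the UI. A minimal sketch, assuming a local tracking server and runs that logged a val_loss metric and an lr parameter (placeholder names):

# Minimal sketch: compare logged runs outside the MLflow UI.
# The tracking URI, experiment id, metric name ("val_loss") and parameter
# name ("lr") are placeholders for whatever the training runs actually logged.
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
runs = mlflow.search_runs(experiment_ids=["0"], order_by=["metrics.val_loss ASC"])
print(runs[["run_id", "params.lr", "metrics.val_loss"]].head())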
SAVE ARTIFACTS
• Additional artifacts for model reproducibility
• For example: vocabulary files for NLP models, requirements.txt and other extra files for TorchServe deployment
mlflow.pytorch.save_model(
model,
path=args.model_save_path,
requirements_file="requirements.txt",
extra_files=["class_mapping.json", "bert_base_uncased_vocab.txt"],
)
:param requirements_file: An (optional) string containing the path to a requirements file.
If ``None``, no requirements file is added to the model.
:param extra_files: An (optional) list containing the paths to corresponding extra files.
For example, consider the following ``extra_files`` list::
    extra_files = ["s3://my-bucket/path/to/my_file1",
                   "s3://my-bucket/path/to/my_file2"]
In this case, the ``my_file1`` and ``my_file2`` extra files are downloaded from S3.
If ``None``, no extra files are added to the model.
TORCHSCRIPTED MODEL
• Log a TorchScripted model
• Serialize and optimize models for a Python-free process
• Recommended for production inference
import os
import torch
import mlflow
import mlflow.pytorch

mlflow.set_tracking_uri(dict_args["tracking_uri"])
model = LightningMNISTClassifier(**dict_args)

# Convert to a TorchScripted model
scripted_model = torch.jit.script(model)

mlflow.start_run()
# Log the scripted model using log_model
mlflow.pytorch.log_model(scripted_model, "scripted_model")

# If you need to reload the model, just call load_model
uri_path = mlflow.get_artifact_uri()
scripted_loaded_model = mlflow.pytorch.load_model(os.path.join(uri_path, "scripted_model"))
mlflow.end_run()
TORCHSERVE
• Default handlers for common use cases (e.g., image segmentation, text classification), support for custom handlers for other use cases, and a model zoo
• Multi-model serving, model versioning, and the ability to roll back to an earlier version
• Automatic batching of individual inferences across HTTP requests
• Logging, including common metrics and the ability to incorporate custom metrics
• Robust HTTP APIs for management and inference
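For a sense of what those HTTP APIs look like, a minimal sketch of calling a running TorchServe instance, assuming the default ports (8080 inference, 8081 management) and an already-registered model named mnist; the model name and image path are placeholders:

# Minimal sketch: query TorchServe over its management and inference HTTP APIs.
# Assumes TorchServe is running locally on the default ports and that a model
# named "mnist" has already been registered.
import requests

# Management API: list the models registered with the server
print(requests.get("http://localhost:8081/models").json())

# Inference API: send an input payload to a registered model
with open("test_data/one.png", "rb") as f:
    response = requests.post("http://localhost:8080/predictions/mnist", data=f)
print(response.json())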
DEPLOYMENT PLUGIN
New TorchServe deployment plugin
Test models during the development cycle: pull models from the MLflow model registry and run them
• CLI
• Run against a local or a remote TorchServe instance
• Python API

mlflow deployments predict --name mnist_test --target torchserve \
    --input_path sample.json --output_path output.json
import os
import matplotlib.pyplot as plt
from torchvision import transforms
from mlflow.deployments import get_deploy_client

# Load a sample image and convert it to a tensor
img = plt.imread(os.path.join(os.getcwd(), "test_data/one.png"))
mnist_transforms = transforms.Compose([transforms.ToTensor()])
image = mnist_transforms(img)

# Create a TorchServe deployment from the saved model and run a prediction
plugin = get_deploy_client("torchserve")
config = {
    "MODEL_FILE": "mnist_model.py",
    "HANDLER_FILE": "mnist_handler.py",
}
plugin.create_deployment(name="mnist_test", model_uri="mnist_cnn.pt", config=config)
prediction = plugin.predict("mnist_test", image)
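When a test deployment is no longer needed, the same deployment client can list and remove it. A minimal sketch, reusing the mnist_test deployment created above:

# Minimal sketch: manage test deployments through the generic MLflow
# deployments client; "mnist_test" is the deployment created above.
from mlflow.deployments import get_deploy_client

plugin = get_deploy_client("torchserve")
print(plugin.list_deployments())        # deployments known to the TorchServe target
plugin.delete_deployment("mnist_test")  # tear down the test deployment when finished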
DEMO
RESEARCH TO PRODUCTION CYCLE @ FACEBOOK
[Diagram] Research with PyText: new idea / paper -> model authoring -> training -> evaluation -> parameter sweeping, running as a PyTorch model behind a Python service with small-scale metrics. Production with PyText: export to TorchScript -> export validation -> performance tuning, running as PyTorch TorchScript behind a C++ inference service.
FUTURE
• Model interpretability: Captum
• Hyperparameter optimization: Ax / BoTorch
• More examples ….
REFERENCES
• PyTorch 1.7: https://pytorch.org/blog/pytorch-1.7-released/
• Reproducibility Checklist: https://www.cs.mcgill.ca/~jpineau/ReproducibilityChecklist.pdf
• NeurIPS Reproducibility updates: https://ai.facebook.com/blog/new-code-completeness-checklist-and-reproducibility-updates/
• arXiv + Papers with Code: https://medium.com/paperswithcode/papers-with-code-partners-with-arxiv-ecc362883167
• NeurIPS 2020 RC: https://paperswithcode.com/rc2020
• MLflow PyTorch autolog: https://github.com/mlflow/mlflow/tree/master/mlflow/pytorch
• MLflow TorchServe deployment plugin: https://github.com/mlflow/mlflow-torchserve
• MLflow + PyTorch examples: https://github.com/mlflow/mlflow/tree/master/examples/pytorch
• PyTorch Medium: https://medium.com/pytorch
QUESTIONS?
Contact:
Email: gchauhan@fb.com
LinkedIn: https://www.linkedin.com/in/geetachauhan/
Feedback
Your feedback is important to us.
Don’t forget to rate
and review the sessions.
WHAT IS PYTORCH?
• Simplicity over complexity
• Dynamic neural networks
• Eager & graph-based execution
• Distributed training
• Hardware accelerated inference
INDUSTRY USAGE
https://medium.com/pytorch
