Model deployment and integration often consist of several moving parts whose intricate steps must be woven together. Automating this pipeline and feedback loop can be incredibly challenging, especially in light of varying model development techniques.
3. About the Speakers
Peter Tamisin
Technical Lead, Customer Success
● 20+ year career focused on data analytics and engineering
● Member of the Databricks Automation SME team, contributing multiple blogs and published best-practice guides on CI/CD
● Based in Atlanta, GA; enjoys cheering on the Atlanta Hawks and playing video/board games with his wife and 4 kids
Mary Grace Moesta
Customer Success Engineer
● Supporting customers in the retail and CPG space
● Former data scientist with focused work on customer experience and brand acceleration
● Databricks Labs AutoML contributor
● Based in Detroit, MI; enjoys running and golfing in her free time
4. Agenda
▪ Setting definitions and assumptions
▪ Defining MLOps
▪ The importance of MLOps in a production system
▪ Basics of CI/CD
▪ How MLflow pivots CI/CD basics for ML
▪ An example of promoting ML code and a model as artifacts
▪ Version control
▪ Interfacing with MLflow
▪ Registering the model
▪ Building a DevOps pipeline to trigger production training and inference runs
5. Continuous Delivery for Machine Learning
Danilo Sato, Arif Wider, Christoph Windheuser
“Continuous Delivery for Machine Learning (CD4ML) is a software engineering approach in which a cross-functional team produces machine learning applications based on code, data, and models in small and safe increments that can be reproduced and reliably released at any time, in short adaptation cycles.”
6. Why MLOps is Relevant
▪ The data science and machine learning development framework is traditionally centered around local development
▪ Work stays scoped to a scientist’s laptop: code is saved locally, compute is limited to the local machine, etc.
▪ As data and process complexity grows, so does the number of integration points
▪ Leveraging tools like Spark means there are more machines to manage and data is stored across various locations, so kicking off a single run becomes much more complex and expensive
▪ Machine learning operations allow for development at scale and hands-off execution of production runs
▪ Automated execution enables the business to implement these powerful machine learning solutions with ease
7. The Basics of Continuous Integration and Continuous Delivery
▪ Code
▪ Develop code and run tests in a local IDE, Databricks, Databricks Connect, etc.
▪ Manually run tests
▪ Commit code and tests to a version-controlled code branch
▪ Build
▪ Pull together new code + tests
▪ Run automated tests
▪ Build library and non-notebook code
▪ Release
▪ Generate release artifact
▪ Deploy
▪ Two methods of deployment
▪ Deploying notebooks
▪ Deploying libraries / release artifacts
▪ Test
▪ Run automated tests
▪ Report results
▪ Operate
▪ Programmatically schedule downstream data engineering, machine learning, and analytics workloads
(Continuous Integration | Continuous Delivery)
8. The Basics of Continuous Integration and
Continuous Delivery with an ML Twist
▪ Same stages as the previous slide — Code, Build, Release, Deploy, Test, Operate — annotated with ML-specific twists:
➔ Notebook / IDE environment to develop on feature branches using favorite ML tools: sklearn, SparkML, TF, PyTorch, etc.
➔ Writing tests for the machine learning code / feature engineering
➔ Using MLflow to track experiments, runs, hyperparameters, code, artifacts, etc.
➔ Training runs at scale with new model features, hyperparameters, etc. implemented
➔ Model, entire pipeline, image, code, etc. as artifacts
➔ Leveraging tools like Jenkins, AzDO, etc. to trigger new model builds in production when new changes are merged to master
➔ Batch scoring, real-time serving, containers, cloud inference services
➔ Tracking different model versions in production using Model Registry
9. How MLflow Contributes to Seamless MLOps
[Diagram: MLflow components. Tracking records parameters, metrics, artifacts, metadata, and models. Models are packaged in multiple flavors, including custom models. The Model Registry moves versions (v1, v2, v3) between Staging, Production, and Archived stages, bridging data scientists and deployment engineers. Serving options include in-line code, containers, batch & stream scoring, cloud inference services, and OSS serving solutions.]
10. Code and Version Control
● Pick your favorite version control system
○ GitHub, AzDO, Bitbucket, etc.
● Branch from master for development
○ Hyperparameter space search, alternative feature sets, algorithm refinements, etc.
● Track development metrics and criteria using MLflow Tracking
○ Track artifacts that will be used in the build / release stages
11. Controlling Model Flow Through Build and Release Stages
▪ After new models have been trained in the feature branch:
▪ Parameterize the right experiment path using widgets
▪ Set the decision criteria for a best run
▪ Search through filtered runs to identify the best_run and build the model URI to reference programmatically later
12. Controlling Model Flow Through Build and Release Stages
▪ Once the best_run has been identified, use Model Registry to track the flow of models in and out of production
▪ Note that the stages defined in the registry do not directly translate to environments
▪ The registry also doesn’t span multiple workspaces; registries map 1:1 to workspaces
▪ Initially register the new model
▪ Archive the current model out of production
▪ Promote to the production stage
13. Continuous Deployment
▪ Deployment depends on the business problem and the SLA attached
▪ Let’s look at batch inference as an example:
▪ The pipeline is triggered any time there’s a change to the master branch (release)
▪ Environment variables + secrets have been redacted from the script
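Keeping secrets out of the script typically means reading them from pipeline-injected environment variables. A minimal sketch, assuming the conventional `DATABRICKS_HOST` / `DATABRICKS_TOKEN` variable names:

```python
# Sketch: pull workspace credentials from environment variables so secrets
# stay out of the committed script (injected by AzDO secret variables, etc.).
import os


def get_databricks_config():
    """Read host/token injected by the CI/CD pipeline; None if unset."""
    return {
        "host": os.environ.get("DATABRICKS_HOST"),
        "token": os.environ.get("DATABRICKS_TOKEN"),
    }
```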
14. Continuous Deployment Example: AzDO Pipeline
▪ Define the instance OS
▪ Install Python
▪ Install the Databricks CLI
▪ Configure the Databricks CLI with a secret token
15. Copying Code to Hands-Off Production Environments
▪ Copy the code from the master branch to a desired location
▪ This could be a ‘hands-off’ production environment, isolated within a single workspace, etc.
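The copy step can be scripted around the Databricks CLI’s `workspace import_dir` command, which recursively imports a local directory into the workspace. The source and destination paths below are illustrative:

```python
# Sketch: build the CLI command that copies checked-out master-branch code
# into a locked-down workspace folder. Paths are illustrative assumptions.
import subprocess


def build_copy_command(local_src, workspace_dst):
    # "-o" overwrites notebooks that already exist at the destination
    return ["databricks", "workspace", "import_dir", local_src, workspace_dst, "-o"]


cmd = build_copy_command("./notebooks", "/Production/inference")
# subprocess.run(cmd, check=True)  # executed on the build agent
```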
16. Building and Spinning Up the Cluster
▪ Specify the cluster configuration
▪ Boot up the cluster
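Spinning up the cluster goes through the Clusters API (`/api/2.0/clusters/create`). A minimal sketch that only builds the request; the host, runtime version, and node type are illustrative assumptions:

```python
# Sketch: build (not send) a Clusters API create request.
# Host, runtime, and node type are illustrative assumptions.
import json
import urllib.request


def create_cluster_request(host, token, config):
    return urllib.request.Request(
        f"{host}/api/2.0/clusters/create",
        data=json.dumps(config).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )  # urllib.request.urlopen(req) would submit it on the build agent


cluster_config = {
    "cluster_name": "prod-inference",     # illustrative
    "spark_version": "13.3.x-scala2.12",  # illustrative LTS runtime
    "node_type_id": "i3.xlarge",          # illustrative node type
    "num_workers": 2,
}
req = create_cluster_request("https://example.cloud.databricks.com", "dapi-test", cluster_config)
```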
17. Creating the Inference Job with Parameters
▪ Create the job using the name defined as an environment variable
▪ Pass the parameters that specify the experiment location
▪ Set job configurations
▪ Increasing the number of concurrent runs allows multiple inference jobs to run at the same time
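The job definition posted to the Jobs API (`/api/2.1/jobs/create`) can be sketched as below. The environment-variable name, notebook path, and experiment parameter are illustrative assumptions:

```python
# Sketch of a Jobs API 2.1 job definition. The env var, notebook path, and
# base_parameters key are illustrative assumptions.
import os

job_config = {
    # Job name comes from a (hypothetical) pipeline environment variable.
    "name": os.environ.get("INFERENCE_JOB_NAME", "prod-inference"),
    # Allow several inference runs to execute at the same time.
    "max_concurrent_runs": 3,
    "tasks": [
        {
            "task_key": "batch_inference",
            "notebook_task": {
                "notebook_path": "/Production/inference/score",  # illustrative
                # Parameters that tell the notebook which experiment to read.
                "base_parameters": {"experiment_path": "/Shared/demand_forecast"},
            },
        }
    ],
}
```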
18. Running the Inference Job
▪ Uses the run-now endpoint to run the job via the API
▪ The job shows up in the Databricks UI
▪ Navigate via the GUI or the API to get more details about each job run
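Triggering the job uses the `/api/2.1/jobs/run-now` endpoint. A minimal sketch that builds the request without sending it; host, token, and job ID are illustrative:

```python
# Sketch: build (not send) a Jobs API run-now request.
# Host, token, and job_id are illustrative assumptions.
import json
import urllib.request


def run_now_request(host, token, job_id):
    return urllib.request.Request(
        f"{host}/api/2.1/jobs/run-now",
        data=json.dumps({"job_id": job_id}).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = run_now_request("https://example.cloud.databricks.com", "dapi-test", 123)
# urllib.request.urlopen(req) would trigger the run; the response contains a
# run_id usable with /api/2.1/jobs/runs/get for per-run details.
```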
19. Wrapping it all Up
▪ MLOps is an important piece of the machine learning framework that enables the business to consume downstream results with ease
▪ The basics of CI/CD can be pivoted to fit the structure of a machine learning project, helping establish the feedback loop from development to production
▪ MLflow can act as the governing body to help regulate and track the entire lifecycle
▪ Automation tools like Jenkins, Azure DevOps, etc. are applied to orchestrate the end-to-end process