2. Agenda
• MLflow model serving
• How to score models with MLflow
○ Online scoring with MLflow scoring server
○ Offline scoring with Spark
• Custom model deployment and scoring
3. Agenda
Prediction
Offline Prediction
● High throughput
● Bulk predictions
● Predict with Spark
● Batch Prediction
● Structured Streaming
Online Prediction
● Low latency
● Individual predictions
● Real-time model serving
● MLflow scoring server
● MLflow deployment plugin
● Serving outside MLflow
● Databricks model serving
4. Vocabulary - Synonyms
Model Serving
● Model serving
● Model scoring
● Model prediction
● Model inference
● Model deployment
Prediction
● Offline == batch prediction
○ Spark batch prediction
○ Spark streaming prediction
● Online == real-time prediction
6. Offline Scoring
• Spark is ideally suited for offline scoring
• Can use either Spark batch or structured streaming
• Use SQL with MLflow UDFs
• Distributed scoring with Spark
• Load model from MLflow Model Registry
7. MLflow Offline Scoring Example
Model URI
model_uri = "models:/sklearn-wine/production"
Directly invoke model
model = mlflow.spark.load_model(model_uri)
predictions = model.transform(data)
UDF - SQL
udf = mlflow.pyfunc.spark_udf(spark, model_uri)
spark.udf.register("predict", udf)
%sql select *, predict(*) as prediction from my_data
UDF - Python
udf = mlflow.pyfunc.spark_udf(spark, model_uri)
predictions = data.withColumn("prediction", udf(*data.columns))
8. Online Scoring
• Score models outside of Spark
• Variety of real-time deployment options:
○ REST API - web servers - self-managed or as containers in cloud providers
○ Embedded in browsers, e.g. TensorFlow Lite
○ Edge devices: mobile phones, sensors, routers, etc.
○ Hardware accelerators - GPU, NVIDIA, Intel
9. Options to serve real-time models
• Embed model in application code
• Model deployed as service
• Model published as data
• Martin Fowler’s Continuous Delivery for Machine Learning
10. MLflow Scoring Server
• Basis of different MLflow deployment targets
• Standard REST API:
○ Input: CSV or JSON (pandas-split or pandas-records formats)
○ Output: JSON list of predictions
• Implementation: Python Flask or Java Jetty server
• MLflow CLI builds a server with embedded model
• See Built-In Deployment Tools
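As a hedged illustration (MLflow 1.x CLI and request formats; the model URI and input columns are placeholders), serving and scoring locally looks roughly like this:
# Launch the Flask-based scoring server with the model embedded (model URI is a placeholder)
mlflow models serve --model-uri models:/sklearn-wine/production --port 5000
# Score with a pandas-split JSON payload; the response is a JSON list of predictions
curl -X POST http://localhost:5000/invocations \
  -H "Content-Type: application/json; format=pandas-split" \
  -d '{"columns": ["alcohol", "pH"], "data": [[12.8, 3.2]]}'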
11. MLflow Scoring Server deployment options
• Local web server
• SageMaker docker container
• Azure ML docker container
• Generic docker container
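A sketch of the generic docker option (MLflow 1.x CLI; the model URI and image name are illustrative):
# Build a docker image with the model and scoring server baked in
mlflow models build-docker --model-uri models:/sklearn-wine/production --name sklearn-wine-serving
# Run it anywhere docker runs; the server listens on port 8080 inside the container
docker run -p 5001:8080 sklearn-wine-serving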
12. MLflow Scoring Server container types
Python Server
● Model and its ML framework embedded in Flask server.
● Only for non-Spark ML models.
Python Server + Spark
● Flask server delegates scoring to Spark server.
● Only for Spark ML models.
Java MLeap Server
● Model and MLeap runtime are embedded in Jetty server.
● Only for Spark ML models.
● No Spark runtime.
18. MLflow SageMaker and Azure ML containers
• Two types of containers to deploy to cloud providers
○ Python container - Flask web server process with embedded model
○ SparkML container - Flask web server process and Spark process
• SageMaker container
○ Most versatile container type
○ Can run in local mode on laptop as regular docker container
19. Python container
• Flask web server
• Model (e.g. sklearn or Keras/TF) is embedded inside web server
20. SparkML container
• Two processes:
○ Flask web server
○ Spark server - OSS Spark - no Databricks
• Web server delegates scoring to Spark server
21. MLflow SageMaker container deployment
• CLI:
○ mlflow sagemaker build-and-push-container
○ mlflow sagemaker deploy
○ mlflow sagemaker run-local
• API: mlflow.sagemaker API
• Deploy a model on Amazon SageMaker
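A hedged sketch of that flow (MLflow 1.x; the app name, region, and role ARN are placeholders - check each command's --help for exact flags):
# Build the MLflow serving image and push it to ECR
mlflow sagemaker build-and-push-container
# Smoke-test the container locally before deploying
mlflow sagemaker run-local --model-uri models:/sklearn-wine/production --port 5001
# Create a SageMaker endpoint serving the model
mlflow sagemaker deploy --app-name sklearn-wine \
  --model-uri models:/sklearn-wine/production \
  --execution-role-arn arn:aws:iam::<account-id>:role/<sagemaker-role> \
  --region-name us-west-2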
23. MLflow AzureML container deployment
• Deploy to Azure Kubernetes Service (AKS) or Azure Container Instances (ACI)
• CLI - mlflow azureml build-image
• API - mlflow.azureml.build_image
• Deploy a python_function model on Microsoft Azure ML
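A hedged Python sketch of the API route (MLflow 1.x mlflow.azureml plus the azureml-sdk; the workspace details, model URI, and names are placeholders):
import mlflow.azureml
from azureml.core import Workspace
from azureml.core.webservice import AciWebservice, Webservice

# Attach to an existing Azure ML workspace (placeholders)
workspace = Workspace.get(name="my-workspace",
                          subscription_id="<subscription-id>",
                          resource_group="<resource-group>")

# Build an Azure ML container image from the registered MLflow model
image, azure_model = mlflow.azureml.build_image(model_uri="models:/sklearn-wine/production",
                                                workspace=workspace,
                                                image_name="sklearn-wine",
                                                synchronous=True)

# Deploy the image to Azure Container Instances (ACI)
webservice = Webservice.deploy_from_image(workspace=workspace,
                                          name="sklearn-wine-svc",
                                          image=image,
                                          deployment_config=AciWebservice.deploy_configuration())
webservice.wait_for_deployment(show_output=True)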
25. MLeap
• Non-Spark serialization format for Spark models
• Advantage: can score a Spark ML model without the overhead of a Spark runtime
• Faster than scoring Spark ML models with Spark
• Drawbacks: stability, maturity, and lack of dedicated commercial support
27. Databricks Model Serving
• Expose MLflow model predictions as REST endpoint
• One-node cluster is automatically provisioned
• HTTP endpoint publicly exposed
• Limited production capability - intended for light loads and testing
• Production-grade serving on the product roadmap
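A hedged request example (the URL pattern follows the Databricks model serving docs referenced on the resources slide; the instance, token, model name, and columns are placeholders):
# Databricks serves the registered model at /model/<model-name>/<version-or-stage>/invocations
curl -X POST https://<databricks-instance>/model/sklearn-wine/Production/invocations \
  -H "Authorization: Bearer <personal-access-token>" \
  -H "Content-Type: application/json; format=pandas-split" \
  -d '{"columns": ["alcohol", "pH"], "data": [[12.8, 3.2]]}'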
28. Model Serving on Databricks
[Diagram: MLflow workflow, from Data Scientists to Deployment Engineers]
● Tracking - record and query experiments: code, metrics, parameters, artifacts, models
● Models - general format that standardizes deployment options (the logged model)
● Model Registry - centralized and collaborative model lifecycle management (Staging / Production / Archived)
● Model Serving - turnkey serving solution to expose models as REST endpoints, consumed by reports, applications, etc.
30. Databricks Model Serving Resources
• Blog posts
○ Quickly Deploy, Test, and Manage ML Models as REST Endpoints with MLflow Model Serving on Databricks - 2020-11-02
○ Announcing MLflow Model Serving on Databricks - 2020-06-25
• Documentation
○ MLflow Model Serving on Databricks
31. MLflow Deployment Plugins
• MLflow Deployment Plugins - Deploy model to custom serving tool
• Current deployment plugins:
○ https://github.com/RedisAI/mlflow-redisai
○ https://github.com/mlflow/mlflow-torchserve
32. MLflow PyTorch
• MLflow PyTorch integration released in MLflow 1.12.0 (Nov. 2020)
• Features:
○ PyTorch Autologging
○ TorchScript Models
○ TorchServe integration
• Deploy MLflow PyTorch models into TorchServe
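A hedged sketch of the TorchServe plugin flow (flags follow the mlflow-torchserve README; the deployment name, model URI, and file paths are placeholders, and TorchServe must already be running):
# Install the deployment plugin
pip install mlflow-torchserve
# Deploy a logged PyTorch model to TorchServe via the generic deployments CLI
mlflow deployments create -t torchserve \
  --name wine_classifier \
  -m models:/pytorch-wine/production \
  -C "MODEL_FILE=model.py" \
  -C "HANDLER=handler.py"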
33. MLflow PyTorch and TorchServe Resources
• PyTorch and MLflow Integration Announcement - 2020-11-12
• MLflow 1.12 Features Extended PyTorch Integration - 2020-11-13
• MLflow and PyTorch — Where Cutting Edge AI meets MLOps
• https://github.com/mlflow/mlflow-torchserve
34. MLflow and TensorFlow Serving Custom Deployment
• Example of how to deploy models to a custom deployment target
• An MLflow deployment plugin has not yet been developed
• Deployment builds a standard TensorFlow Serving docker container with an MLflow model from the Model Registry
• https://github.com/amesar/mlflow-tools/tree/master/mlflow_tools/tensorflow_serving
35. Keras/TensorFlow Model Formats
• HDF5 format
○ Default in Keras TF 1.x but is legacy in Keras TF 2.x
○ Not supported by TensorFlow Serving
• SavedModel format
○ TensorFlow standard format is default in Keras TF 2.x
○ Supported by TensorFlow Serving
36. MLflow Keras/TensorFlow Flavors
• MLflow currently only supports the Keras HDF5 format as a flavor
• MLflow will not support a SavedModel flavor until these are resolved:
○ https://github.com/mlflow/mlflow/issues/3224
○ https://github.com/mlflow/mlflow/pull/3552
• Can save SavedModel as an unmanaged artifact for now
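A minimal sketch of that workaround (TF 2.x and MLflow 1.x; model is assumed to be a trained tf.keras model, and the artifact names mirror the run example on the next slide):
import mlflow
import mlflow.keras
import tensorflow as tf

with mlflow.start_run():
    # Managed flavor: Keras HDF5 model, servable as a PyFunc
    mlflow.keras.log_model(model, "keras-hd5-model")

    # Unmanaged artifact: export the SavedModel directory and log it as-is
    tf.saved_model.save(model, "tensorflow-model")
    mlflow.log_artifacts("tensorflow-model", artifact_path="tensorflow-model")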
38. TensorFlow Serving Example
Launch server in Docker container
docker run -t --rm --publish 8501:8501 \
  --volume /opt/mlflow/mlruns/1/f48dafd70be044298f71488b0ae10df4/artifacts/tensorflow-model:/models/keras_wine \
  --env MODEL_NAME=keras_wine \
  tensorflow/serving
Request
curl -d '{"instances": [[12.8, 0.03, 0.48, 0.98, 6.2, 29, 1.2, 0.4, 75]]}' \
  -X POST http://localhost:8501/v1/models/keras_wine:predict
Response
{ "predictions": [[-0.70597136]] }
39. MLflow Keras/TensorFlow Run Example
MLflow Flavors - Managed Models
● Models
○ keras-hd5-model
○ onnx-model
● Can be served as PyFunc models
MLflow Unmanaged Models
● Models
○ tensorflow-model - Standard TF SavedModel format
○ tensorflow-lite-model
○ tensorflow.js model
● Load as raw artifacts and then serve manually
● Sample code: wine_train.py and wine_predict.py
40. ONNX
• Interoperable model format supported by:
○ MSFT, Facebook, AWS, Nvidia, Intel, etc.
• Train once, deploy anywhere (in theory) - depends on ML tool
• Save MLflow model as ONNX flavor
• Deploy in standard MLflow scoring server
• ONNX runtime is embedded in Python server in MLflow container
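A hedged sketch for a scikit-learn model (uses the skl2onnx converter; the trained model, feature count, and artifact name are illustrative placeholders):
import mlflow
import mlflow.onnx
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# Convert the trained scikit-learn model to ONNX (sk_model and 9 input features are placeholders)
onnx_model = convert_sklearn(sk_model, initial_types=[("input", FloatTensorType([None, 9]))])

# Log the ONNX flavor; the standard MLflow scoring server can then serve it via the ONNX runtime
with mlflow.start_run():
    mlflow.onnx.log_model(onnx_model, "onnx-model")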
What did they do with us?
What are they trying to do? Recommendation? Content curation?
How does that work?
How can Delta, Spark, and related tools help with that (recommendation, or whatever they do)?
1. Aha! feature
2. Description of the feature:
With MLflow we already have a model format that abstracts away the ML framework. It doesn't matter whether you use TensorFlow, scikit-learn, or Spark ML: if you log it as an MLflow Model, you can deploy it in the same way.
The tracking server lets you log those models in experiments, along with metrics, parameters, and other artifacts.
Once you want to deploy a model, the Model Registry takes care of versioning, the deployment lifecycle, and the hand-off between Data Scientists and Deployment Engineers.
From the Model Registry, there are several deployment options. However, many customers have asked us for a turnkey solution to serve models as REST endpoints, so we are building a Serving solution to do exactly that, with two clicks.
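A minimal sketch of that log, register, and load flow for a scikit-learn model (the model name, version, and data are illustrative placeholders):
import mlflow
import mlflow.sklearn
import mlflow.pyfunc
from mlflow.tracking import MlflowClient

# Log the trained model and register it in the Model Registry in one step
with mlflow.start_run():
    mlflow.sklearn.log_model(model, "model", registered_model_name="sklearn-wine")

# Promote a version to Production - the hand-off point to deployment engineers
client = MlflowClient()
client.transition_model_version_stage(name="sklearn-wine", version=1, stage="Production")

# Any flavor is loaded and scored the same way through the pyfunc abstraction
pyfunc_model = mlflow.pyfunc.load_model("models:/sklearn-wine/Production")
predictions = pyfunc_model.predict(data)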
3. Value of the feature (aka what databricks was before this feature, and what this feature will do for databricks)
Compared to other serving solutions, this requires only two clicks to enable. However, it is important to highlight that this is for reporting/debugging purposes and will not autoscale to arbitrary request loads.
4. Associated summary slide bullet point:
Native support for serving MLflow Models as REST endpoints
5. Cloud: All
6. Deployment (MT, ST, Azure): All