Databricks Overview for MLOps
Clemens Mewald, Director of Product Management
The Databricks ML Platform
MLOps / Governance
Data Science Workspace: Data Ingestion, Data Versioning, Model Training, Model Tuning, Runtime and Environments, Monitoring, Batch Scoring, and Online Serving
Collaborative Data Science Workspace
For data engineers, data scientists, ML engineers, and data analysts.
Databricks Notebooks: Cloud-native Collaboration Features
Multi-language: Scala, SQL, Python, and R, all in one notebook.
Collaborative: real-time co-presence, co-editing, and commenting.
(Git-based) Projects
CI/CD integration: work moves between Development / Experimentation and Git / CI/CD systems for versioning, review, and testing, and from there into production jobs.
Integrates with the supported Git providers.
High Quality Data at Scale
Delta Lake underpins the Data Science Workspace (MLflow) and SQL Analytics, serving Business Intelligence, Data Science, and Machine Learning on structured, semi-structured, and unstructured data.
Ingest any format at any scale from any source.
ACID transactions guarantee data validity.
Versioning and time travel built in.
Automated logging of data and version information.
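The versioning and time-travel bullets can be sketched in miniature. This is a hedged pure-Python illustration of the idea (Delta Lake itself exposes it as table versions you can read back), not Delta Lake's implementation:

```python
class VersionedTable:
    """Toy illustration of Delta-style versioning: every write commits a
    new immutable version, and old versions stay readable (time travel)."""

    def __init__(self):
        self._versions = []  # list of full snapshots; index = version number

    def write(self, rows):
        snapshot = list(self._versions[-1]) if self._versions else []
        snapshot.extend(rows)
        self._versions.append(snapshot)   # commit is all-or-nothing
        return len(self._versions) - 1    # version number of this commit

    def read(self, version_as_of=None):
        if not self._versions:
            return []
        v = len(self._versions) - 1 if version_as_of is None else version_as_of
        return list(self._versions[v])

table = VersionedTable()
v0 = table.write([{"id": 1}])
v1 = table.write([{"id": 2}])
assert table.read(version_as_of=v0) == [{"id": 1}]   # time travel to v0
assert table.read() == [{"id": 1}, {"id": 2}]        # latest version
```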
Turnkey ML Training at Scale
ML Runtime: a DevOps-free environment optimized for machine learning.
Packages up the most popular ML toolkits.
Simplifies distributed ML/DL: distribute and scale any single-machine ML code to thousands of machines.
Built-in AutoML and auto-logging: hyperparameter tuning, AutoML, automated tracking, and visualizations with MLflow.
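The auto-logging idea (capture parameters and metrics for every training run without touching the training code) can be sketched as a decorator. This is a hypothetical stand-in, not the MLflow autologging API:

```python
import functools
import time

RUNS = []  # stand-in for a tracking server

def autolog(train_fn):
    """Toy sketch of auto-logging: record params, metrics, and timing
    for every training call, with no changes to the training code."""
    @functools.wraps(train_fn)
    def wrapper(**params):
        start = time.time()
        metrics = train_fn(**params)
        RUNS.append({
            "params": params,
            "metrics": metrics,
            "duration_s": round(time.time() - start, 3),
        })
        return metrics
    return wrapper

@autolog
def train(lr=0.1, epochs=5):
    # pretend "loss" improves with more epochs
    return {"loss": 1.0 / epochs}

train(lr=0.01, epochs=10)
assert RUNS[0]["params"] == {"lr": 0.01, "epochs": 10}
assert RUNS[0]["metrics"] == {"loss": 0.1}
```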
Distributed Training
▪ Built-in support in the ML Runtime:
  TensorFlow native Distribution Strategy (Spark TensorFlow Distributor)
  HorovodRunner (Keras, TensorFlow, and PyTorch)
The driver coordinates training tasks across the worker nodes.
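The data-parallel pattern behind HorovodRunner can be sketched in plain Python: each worker computes a gradient on its own data shard, the gradients are averaged (an allreduce), and the shared parameters are updated. A toy single-process sketch, not Horovod itself:

```python
def worker_gradient(shard, w):
    # gradient of mean squared error for a 1-D linear model y = w * x
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(grads):
    # stand-in for the allreduce that averages gradients across workers
    return sum(grads) / len(grads)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # y = 2x
shards = [data[:2], data[2:]]                             # one shard per "worker"

w = 0.0
for _ in range(50):
    grads = [worker_gradient(s, w) for s in shards]  # runs in parallel in reality
    w -= 0.05 * allreduce_mean(grads)

assert abs(w - 2.0) < 1e-3  # converges to the true slope
```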
Distributed Tuning
▪ Built-in support in the ML Runtime
The driver distributes tuning trials across the worker nodes.
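The driver/trials split can be sketched with a thread pool standing in for worker nodes. A minimal illustration of fanning trials out and keeping the best, not the ML Runtime's actual tuning integration:

```python
from concurrent.futures import ThreadPoolExecutor

def objective(lr):
    # pretend validation loss, minimized at lr = 0.1
    return (lr - 0.1) ** 2

# the "driver" fans trials out to "workers" and collects the results
trials = [0.001, 0.01, 0.1, 0.5, 1.0]
with ThreadPoolExecutor(max_workers=4) as pool:
    losses = list(pool.map(objective, trials))

best_lr = trials[losses.index(min(losses))]
assert best_lr == 0.1
```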
Support for all Deployment Modes
Models, Tracking, and the Model Registry
Tracking: parameters, metrics, and artifacts.
Models: models and metadata, packaged as flavors (Flavor 1, Flavor 2, custom models).
Model Registry: model versions (v1, v2, v3) move through Staging, Production, and Archived stages, handed off between data scientists and deployment engineers.
Deployment options: in-line code, containers, batch and stream scoring, cloud inference services, and OSS serving solutions.
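The stage-transition flow can be sketched as a tiny registry. This is hypothetical code, not the MLflow Model Registry API (names like `ModelRegistry` and `transition` are made up for illustration):

```python
# Minimal sketch of a model registry with versions and stage transitions,
# illustrating the Staging / Production / Archived flow.
STAGES = {"None", "Staging", "Production", "Archived"}

class ModelRegistry:
    def __init__(self):
        self._models = {}  # name -> list of {"version", "stage", "uri"}

    def register(self, name, uri):
        versions = self._models.setdefault(name, [])
        versions.append({"version": len(versions) + 1, "stage": "None", "uri": uri})
        return versions[-1]["version"]

    def transition(self, name, version, stage):
        assert stage in STAGES
        for entry in self._models[name]:
            if entry["version"] == version:
                entry["stage"] = stage
            elif stage == "Production" and entry["stage"] == "Production":
                entry["stage"] = "Archived"  # keep a single Production version

    def get(self, name, stage):
        return next(e for e in self._models[name] if e["stage"] == stage)

reg = ModelRegistry()
v1 = reg.register("churn", "runs:/abc/model")
v2 = reg.register("churn", "runs:/def/model")
reg.transition("churn", v1, "Production")
reg.transition("churn", v2, "Production")  # v1 is auto-archived
assert reg.get("churn", "Production")["version"] == 2
```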
Deploying an MLlib model as a Spark UDF
Deploying a scikit-learn model as a Spark UDF
Deploying a TensorFlow model as a Spark UDF
Yes, they are all the same! As are the commands to deploy these models as Docker containers, etc.
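The "all the same" point rests on a flavor-agnostic model interface: every model, whatever its native API, is wrapped behind one `predict()` call (in MLflow this is the pyfunc abstraction, e.g. `mlflow.pyfunc.spark_udf`). A hedged pure-Python sketch of that idea:

```python
class PyFuncModel:
    """Flavor-agnostic wrapper: deployment code only ever sees predict(),
    so the deployment command is identical for every model flavor."""

    def __init__(self, predict_fn):
        self._predict_fn = predict_fn

    def predict(self, rows):
        return [self._predict_fn(r) for r in rows]

# Two "flavors" with different native behavior behind the same interface...
sklearn_like = PyFuncModel(lambda r: sum(r))  # stand-in for a scikit-learn model
tf_like = PyFuncModel(lambda r: max(r))       # stand-in for a TensorFlow model

# ...scored with the exact same call, as the slides emphasize:
batch = [[1, 2], [3, 4]]
assert sklearn_like.predict(batch) == [3, 7]
assert tf_like.predict(batch) == [2, 4]
```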
End-to-end MLOps / Governance
Data Governance, Experiment Tracking, Reproducibility, and Model Governance across the entire workflow, from data ingestion to online serving.
Automated data source capture and versioning (data source / lineage, data versioning)
Automated capture of feature usage (feature-level data lineage / usage)
Automated capture of ML metrics, parameters, models, and artifacts
Automated capture of hyperparameter search trials
Automated model interpretability
Automated capture of code, environment, and cluster specification (code versioning, environment configuration, cluster configuration)
Model sharing, reuse, and ACLs (model discoverability, model stage-based ACLs)
Automated model lineage and governance (approval process for stage transitions, audit log of model changes)
Turnkey model serving, integrated with model versions and stages
Model quality monitoring (quality / performance metric monitoring)
Auto-Logging Reproducibility Checklist (Reproduce Run feature)
✓ Code versioning
✓ Data versioning
✓ Cluster configuration
✓ Environment specification
The result: full end-to-end governance and reproducibility.
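The checklist can be read as a predicate on a run record: a run is reproducible only when all four items were captured. A hypothetical sketch (not the MLflow run schema; field names are made up for illustration):

```python
# The four checklist items from the slide, as required run-record fields.
CHECKLIST = ("code_version", "data_version", "cluster_config", "environment_spec")

def capture_run(**fields):
    """Build a run record and flag whether it is fully reproducible."""
    run = {k: fields.get(k) for k in CHECKLIST}
    run["reproducible"] = all(run[k] is not None for k in CHECKLIST)
    return run

run = capture_run(
    code_version="git:9f2c1ab",                           # hypothetical commit
    data_version="delta:v42",                             # hypothetical table version
    cluster_config={"workers": 8, "node_type": "m5.xlarge"},
    environment_spec={"python": "3.10", "mlflow": "2.x"},
)
assert run["reproducible"]

# Missing any checklist item means the run cannot be reproduced exactly.
assert not capture_run(code_version="git:9f2c1ab")["reproducible"]
```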