Andre Mesarovic
Sr. Specialist Solutions Architect
23 November 2022
Object Relationships
Databricks
● Databricks MLflow objects (runs, experiments, registered models and their
versions, notebooks) form a complex web of relationships.
● Objects live in different places: the workspace, DBFS (cloud storage) and MySQL.
○ A run’s metadata lives in MySQL, its artifacts in cloud and its notebook in the workspace and/or git.
● Experiments have zero or more runs.
● Registered models have zero or more versions, each of which points to a run’s MLflow model.
● Code that generated a run’s MLflow model:
○ MLflow runs have pointers to a notebook revision that generated the model.
○ Runs will/should have pointers to the git version of a notebook that generated the model.
Overview
● Model is an overloaded term with three meanings:
○ Native model artifact - this is the lowest level and is simply the native flavor’s serialized format. For
sklearn it’s a pickle file; for Keras it’s a directory with files in TensorFlow’s native SavedModel format.
○ MLflow model - a wrapper around the native model artifact with metadata in the MLmodel file and
environment information in conda.yaml and requirements.txt files.
○ Registered model - a bucket of model versions. A model version contains one MLflow model that is
cached in the model repository. A version has the following links (expressed as tags):
■ run_id - points to the run that generated the version’s model.
■ source - points to the path of the MLflow model in the run that corresponds to the version’s model.
■ workspace_uri - currently missing. Needed when using a shared model registry. See ML-19472.
Model terminology
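The layering described above can be made concrete with a small sketch. It assumes an MLflow model has already been downloaded to a local directory, and it treats the three files named in the slide as the MLflow wrapper metadata. Classifying every other file as part of the native model artifact is a simplification made for illustration; the helper name describe_model_dir is hypothetical.

```python
from pathlib import Path

# Wrapper files named in the slide. Anything else in the directory is treated
# here as belonging to the native model artifact (a simplifying assumption).
MLFLOW_METADATA_FILES = ("MLmodel", "conda.yaml", "requirements.txt")


def describe_model_dir(model_dir):
    """Split an MLflow model directory into wrapper metadata and native files."""
    names = sorted(p.name for p in Path(model_dir).iterdir())
    metadata = [n for n in names if n in MLFLOW_METADATA_FILES]
    native = [n for n in names if n not in MLFLOW_METADATA_FILES]
    return {"metadata": metadata, "native": native}
```

For an sklearn model, the native list would be expected to hold the pickle file; for Keras, the directory with the SavedModel files.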
Model relationships
● Runs
○ Contains one or more MLflow models
● Experiments
○ Notebook experiments
○ Workspace experiments
● Registered models
○ A registered model contains versions
○ A version points to one run’s MLflow model
○ Native model artifacts - the actual bits that execute predictions; they are part of the MLflow model
● Notebooks
Databricks MLflow object relationships
Databricks MLflow objects relationships
● Diagram uses the UML modeling language.
○ *: indicates a many relationship.
○ 1: indicates a required one relationship.
○ 0..1: indicates an optional one relationship.
● This is a logical diagram. For simplicity, not all nuances are captured.
● The diagram represents a notebook experiment.
● A workspace experiment is not represented in the diagram.
Diagram legend
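The multiplicities in the legend can be read as a small data model. The sketch below is one possible reading, derived from the bullet text in this deck rather than from the diagram itself; all class and field names are illustrative and are not MLflow classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NotebookRevision:
    path: str
    revision_id: str
    git_commit: Optional[str] = None  # 0..1: optional git reference


@dataclass
class Run:
    run_id: str
    notebook: NotebookRevision  # 1: a run is linked to one notebook revision
    model_paths: List[str] = field(default_factory=list)  # *: MLflow models


@dataclass
class Experiment:
    path: str
    runs: List[Run] = field(default_factory=list)  # *: zero or more runs


@dataclass
class ModelVersion:
    version: int
    run_id: str  # 1: the run that generated the version's model
    source: str  # path of the MLflow model inside that run


@dataclass
class RegisteredModel:
    name: str
    versions: List[ModelVersion] = field(default_factory=list)  # *: versions
```

A list field stands for a "many" (*) relationship, a plain field for a required one (1), and an Optional field for an optional one (0..1).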
● A registered model is a bucket for model versions.
● A version has one MLflow model which is linked to the run that generated it.
● The Production and Staging stages each have one "latest" version.
● Registered model versions are cached in the model registry. The cached copy is a clone of the run's MLflow model that the version points to.
● If the source run is in a different workspace, there is a lineage reachability problem.
See ML-19472 - Add workspace URI field in ModelVersion for a registered
model to make the run reachable.
Registered models
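A minimal sketch of following a version back to its run, and of what the reachability problem looks like in practice. It assumes client is an mlflow.tracking.MlflowClient (or any object with the same get_model_version and get_run methods). The helper name and the shape of the returned dictionary are illustrative, and catching a broad Exception is a simplification.

```python
def resolve_version_run(client, model_name, version):
    """Follow a model version's run_id link and report whether the run is reachable."""
    mv = client.get_model_version(model_name, version)
    link = {"run_id": mv.run_id, "source": mv.source, "reachable": False}
    if not mv.run_id:
        # The version was registered without a run link.
        return link
    try:
        run = client.get_run(mv.run_id)
    except Exception:
        # Assumption: the lookup fails when the run lives in another
        # workspace (or was deleted), so the lineage cannot be followed.
        return link
    link["reachable"] = True
    link["experiment_id"] = run.info.experiment_id
    return link
```

When the registry is shared across workspaces, a result with reachable set to False is the symptom that the missing workspace URI field is meant to address.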
● An experiment has zero or more runs.
● Two types of experiments:
○ Notebook experiment
■ Relationship of experiment to notebook is one-to-one.
■ Workspace path of the experiment is the same as its notebook.
○ Workspace experiment
■ Relationship of experiment to notebook is one-to-many.
■ Explicitly specify the experiment path with the set_experiment method.
■ Different notebooks can create runs in the same experiment.
Experiments
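A sketch of the workspace experiment case, in which several notebooks write runs to one explicitly named experiment. The mlflow module is passed in as a parameter so that the function has no hard dependency on an installed MLflow; the helper name is hypothetical, and only the set_experiment and start_run calls are used.

```python
def log_run_to_workspace_experiment(mlflow, experiment_path, run_name):
    """Start a run in an explicitly named workspace experiment.

    `mlflow` is the imported mlflow module, or any object exposing
    set_experiment and start_run with the same behavior.
    """
    # Without this call, a Databricks notebook logs to its own notebook experiment.
    mlflow.set_experiment(experiment_path)
    with mlflow.start_run(run_name=run_name) as run:
        # Params, metrics and models would be logged here.
        return run.info.run_id, run.info.experiment_id
```

Calling this from two different notebooks with the same experiment_path should yield two run IDs under one experiment ID, which is the one-to-many relationship described above.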
● A run belongs to only one experiment.
● A run is linked to one notebook revision. MLflow notebook tags:
○ mlflow.databricks.notebookRevisionID
○ mlflow.databricks.notebookID
○ mlflow.databricks.notebookPath
● Optionally a run’s notebook can be linked to a git reference.
○ See discussion on Notebook below for details.
● A run can have one or more MLflow models (flavors) such as Sklearn and ONNX.
● Every run has a default Pyfunc flavor, which is a wrapper around the native model.
Runs
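The notebook link can be read straight from a run's tags. A minimal sketch, assuming tags is the plain dictionary found at run.data.tags on a run returned by MlflowClient.get_run; the helper name and the short keys in the result are illustrative.

```python
# Tag names as listed in the slide.
NOTEBOOK_TAGS = {
    "revision_id": "mlflow.databricks.notebookRevisionID",
    "notebook_id": "mlflow.databricks.notebookID",
    "path": "mlflow.databricks.notebookPath",
}


def notebook_link(tags):
    """Return the notebook fields of a run; a missing tag maps to None."""
    return {key: tags.get(tag) for key, tag in NOTEBOOK_TAGS.items()}
```

A run created outside a Databricks notebook would be expected to return None for all three fields.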
MLflow Run Details
● An MLflow Run has three basic components
○ Metadata (params, metrics, tags) residing in a MySQL database.
○ MLflow model artifact which lives in DBFS (cloud). Note you can also have arbitrary customer
artifacts.
○ Link to code:
■ For Databricks, the run points to either:
● A workspace notebook revision
● A Repos notebook, which is a pointer to git.
■ For open source MLflow, the link points to git.
MLflow Run Details Legend
● A notebook has many revisions.
● Optionally, a notebook revision can be checked into git with Databricks Repos.
● Need to capture a git reference analogous to the MLflow open source tags:
○ mlflow.source.git.commit
○ mlflow.source.git.repoURL
○ mlflow.gitRepoURL
● See ML-19473 - Add git reference tags to Databricks run if its notebook is synced with
Repos
● A notebook snapshot has two sources of truth, which can be confusing:
○ Databricks notebook revision
○ Git version
Notebooks
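One way to cope with the two sources of truth is to pick a single code reference per run. The sketch below uses only the tag names listed in this deck. Preferring the git commit over the notebook revision is an assumption made for illustration, not a rule stated in the deck, and the helper name is hypothetical.

```python
def code_reference(tags):
    """Pick one code reference for a run from its tags dictionary."""
    commit = tags.get("mlflow.source.git.commit")
    if commit:
        # Assumption: a git commit, when present, wins over the notebook revision.
        repo_url = tags.get("mlflow.source.git.repoURL") or tags.get("mlflow.gitRepoURL")
        return {"kind": "git", "commit": commit, "repo_url": repo_url}
    revision = tags.get("mlflow.databricks.notebookRevisionID")
    if revision:
        return {
            "kind": "notebook_revision",
            "revision_id": revision,
            "path": tags.get("mlflow.databricks.notebookPath"),
        }
    return {"kind": "unknown"}
```

Until the git tags are populated for Repos notebooks (ML-19473), Databricks runs would be expected to fall through to the notebook revision branch.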
Happy journey!
