Accelerating the Machine Learning Lifecycle with MLflow (Paper Review)
Databricks Inc., Matei Zaharia… Summary notes by 강석우
Abstract
MLflow is an open source platform that streamlines the machine learning lifecycle.
Machine learning development creates multiple new challenges; the paper highlights three:
Experimentation
Reproducibility
Model Deployment
Introduction
ML applications need to be deployed to production. This is especially challenging when deployment
requires collaboration with another team.
In practice, an organization needs to run models built with multiple ML libraries, multiple TensorFlow versions, etc.,
and has to design its own infrastructure for this task.
MLflow’s key principle is an open interface design. It provides APIs for experiment tracking,
reproducible runs, and model packaging and deployment, usable from Python, Java and R.
The challenge is how to structure the ML lifecycle while leaving ML developers
maximum flexibility to build the best possible model.
The goal in machine learning is to optimize a specific metric, such as prediction accuracy.
Four challenges arise repeatedly for ML users:
Multitude of tools (users want to try every available tool to see whether it improves results)
Experiment tracking (results can be affected by anything from input data to code and hyperparameters)
Reproducibility (teams often have trouble getting the same code to work again, especially after others modify it)
Production deployment (there is a plethora of possible inference environments, and the pipeline needs to be reliably converted)
MLflow Overview
MLflow provides three components, which can be used together or separately:
MLflow Tracking: records and logs experiment runs, which can be queried through an API or UI
MLflow Projects: a simple format for packaging code into reusable projects
MLflow Models: a generic format for packaging models
MLflow Tracking
MLflow Tracking is an API for logging and querying experiment runs. Within a run, users can:
Log parameters, which are arbitrary key-value pairs
Log metrics, each of which can be updated throughout the run
Log artifacts, which are arbitrary output files
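As a rough illustration of the three kinds of logged data, the sketch below models a run in plain Python. `ToyRun` and its fields are hypothetical names made up for these notes; this is not MLflow's implementation. In the MLflow Python client, the corresponding calls are `mlflow.log_param`, `mlflow.log_metric` and `mlflow.log_artifact`.

```python
import json
import tempfile
import time
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ToyRun:
    """Hypothetical in-memory stand-in for one tracked run (not MLflow code)."""
    params: dict = field(default_factory=dict)     # key -> value
    metrics: dict = field(default_factory=dict)    # key -> [(step, value, timestamp)]
    artifacts: list = field(default_factory=list)  # paths of output files

    def log_param(self, key, value):
        # Storing values as strings is an assumption of this sketch.
        self.params[key] = str(value)

    def log_metric(self, key, value, step):
        # Each metric keeps its full history, so it can be updated during the run.
        self.metrics.setdefault(key, []).append((step, float(value), time.time()))

    def log_artifact(self, path):
        # An artifact is just a reference to an arbitrary output file.
        self.artifacts.append(str(path))

    def latest(self, key):
        # Value of the most recently logged point for this metric.
        return self.metrics[key][-1][1]


run = ToyRun()
run.log_param("learning_rate", 0.01)

# Pretend training loop: the same metric is logged once per epoch.
for epoch, loss in enumerate([0.9, 0.5, 0.3]):
    run.log_metric("loss", loss, step=epoch)

# Write an output file and record it as an artifact.
out_dir = Path(tempfile.mkdtemp())
summary = out_dir / "summary.json"
summary.write_text(json.dumps({"final_loss": run.latest("loss")}))
run.log_artifact(summary)

print(run.params)
print([(step, value) for step, value, _ in run.metrics["loss"]])
```

Keeping the whole metric history, rather than only the last value, is what makes it possible to plot or compare how a metric evolved across runs.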
MLflow Projects
MLflow Projects provide a simple format for packaging reproducible data science code.
The project file uses the YAML format (YAML Ain’t Markup Language)
Focused on readability (the file specifies the project name, its environment and dependencies, and its entry points)
YAML was inspired in part by the e-mail message format
Often compared with JSON (for example, in how types are defined)
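To make the idea of a project file concrete, here is a minimal sketch of how an entry point with parameters could be turned into a command line. The project description is written as a Python dict that mirrors the shape of a YAML project file; the project name, file names, parameters and the `build_command` helper are all hypothetical and only serve as an illustration.

```python
# Hypothetical project description: a dict mirroring the shape of a YAML
# project file (name, environment, entry points with parameters).
project = {
    "name": "demo_project",
    "conda_env": "conda.yaml",
    "entry_points": {
        "main": {
            "parameters": {
                "alpha": {"type": "float", "default": 0.5},
                "data_path": {"type": "path"},  # no default: caller must supply it
            },
            "command": "python train.py --alpha {alpha} --data {data_path}",
        }
    },
}


def build_command(spec, entry_point, user_params):
    """Fill an entry point's command template with user values and defaults."""
    entry = spec["entry_points"][entry_point]
    values = {}
    for name, decl in entry["parameters"].items():
        if name in user_params:
            values[name] = user_params[name]
        elif "default" in decl:
            values[name] = decl["default"]
        else:
            raise ValueError(f"missing required parameter: {name}")
    return entry["command"].format(**values)


print(build_command(project, "main", {"data_path": "data/train.csv"}))
# prints: python train.py --alpha 0.5 --data data/train.csv
```

Because the environment and the entry points are declared in one place, anyone who receives the project can rerun it with the same command and only override the parameters they care about.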
MLflow Models
MLflow Models are a convention for packaging machine learning models.
A model can be saved in multiple "flavors", allowing diverse tools to understand the model at different levels of abstraction
The model description also uses the YAML format
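The flavor idea can be sketched as follows: the model metadata lists several flavors, and each tool picks the most specific one it understands. The dict below mirrors the shape of a YAML model description; its contents and the `pick_flavor` helper are hypothetical and meant only as an illustration.

```python
# Hypothetical model metadata: a dict mirroring the shape of a YAML model
# description that lists the flavors the model was saved in.
model_meta = {
    "flavors": {
        "sklearn": {"pickled_model": "model.pkl"},
        "python_function": {"loader_module": "mlflow.sklearn"},
    },
}


def pick_flavor(meta, supported):
    """Return (name, config) of the first flavor in `supported` the model offers.

    `supported` is the tool's own preference order, most specific first.
    """
    for name in supported:
        if name in meta["flavors"]:
            return name, meta["flavors"][name]
    raise LookupError(f"no supported flavor among {sorted(meta['flavors'])}")


# A generic deployment tool only understands the generic Python function flavor.
print(pick_flavor(model_meta, ["python_function"])[0])

# A library-aware tool prefers the native flavor and falls back to the generic one.
print(pick_flavor(model_meta, ["sklearn", "python_function"])[0])
```

The same saved model can therefore be served by a generic tool and inspected by a library-specific one without being exported twice.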
Conclusion
MLflow is a software platform that structures the machine learning lifecycle while giving users broad
flexibility to use their own ML algorithms, software libraries and development processes.
https://www.mlflow.org
