Building a machine learning model is an iterative process. A data scientist will build many tens to hundreds of models before arriving at one that meets some acceptance criteria. However, the current style of model building is ad-hoc and there is no practical way for a data scientist to manage models that are built over time. In addition, there are no means to run complex queries on models and related data.
In this talk, we present ModelDB, a novel end-to-end system for managing machine learning (ML) models. Using client libraries, ModelDB automatically tracks and versions ML models in their native environments (e.g. spark.ml, scikit-learn). A common set of abstractions enable ModelDB to capture models and pipelines built across different languages and environments. The structured representation of models and metadata then provides a platform for users to issue complex queries across various modeling artifacts. Our rich web frontend provides a way to query ModelDB at varying levels of granularity.
ModelDB has been open-sourced at https://github.com/mitdbg/modeldb.