What’s new in ?
Corey Zumar
Software Engineer, Databricks
A System to Accelerate the Machine Learning Lifecycle
Overview of new developments
Feature deep dives
Autologging for scikit-learn
Model schemas & input examples
MLflow’s plugin ecosystem
What’s next for MLflow?
Outline
: An Open Source ML Platform
Experiment
management
Model managementReproducible runs Model packaging
and deployment
T R A C K I N G P R O J E C T S M O D E L R E G I S T R YM O D E L S
Training
Deployment
Raw Data
Data Prep
ML Engineer
Application
DeveloperData Engineer
Any Language
Any ML Library
: Overview of new developments
TRACKING PROJECTS MODELS MODEL REGISTRY
- Scikit-learn
autologging (1.11)
- Fast.ai
autologging (1.9)
- UI accessibility,
syntax highlighting
for artifacts + PDF
support (1.9, 1.11)
- Backend plugin
support (1.9)
- YARN execution
backend
- Expanded
artifact
resolution
capabilities (1.9)
- Model schemas &
input examples
(1.9)
- Deployment
plugin support
(1.9)
- Spacy model
flavor (1.8)
- Tags for registered
models and model
versions (1.9)
- Enhanced version
comparison UI, including
schemas (1.11)
- Simplified model
archiving (1.10)
Overview of new developments
Feature deep dives
Autologging for scikit-learn
Model signatures & input examples
MLflow’s plugin ecosystem
What’s next for MLflow?
Outline
Instrumenting scikit-learn workflows
mlflow.start_run()
mlflow.log_param(“alpha”, alpha)
mlflow.log_param(“l1_ratio”, l1_ratio)
model = ElasticNet(alpha, l1_ratio)
model.fit(X, y)
y_pred = model.predict(X)
loss = mean_absolute_error(y, y_pred)
mlflow.log_metric(“loss”, loss)
mlflow.sklearn.log_model(model)
mlflow.end_run()
Instrumenting scikit-learn workflows
mlflow.start_run()
mlflow.log_param(“alpha”, alpha)
mlflow.log_param(“l1_ratio”,
l1_ratio)
model = ElasticNet(alpha, l1_ratio)
model.fit(X, y)
y_pred = model.predict(X)
loss = mean_absolute_error(y, y_pred)
mlflow.log_metric(“loss”, loss)
mlflow.sklearn.log_model(model)
mlflow.end_run()
ElasticNet
ARDRegression
AdaBoostClassifier
AdaBoostRegressor
AdditiveChi2Sampler
AffinityPropagation
AgglomerativeClustering
BaggingClassifier
BaggingRegressor
BayesianGaussianMixture
BayesianRidge
BernoulliNB
BernoulliRBM
Binarizer
Birch
CCA
CalibratedClassifierCV
CategoricalNB
ClassifierChain
ColumnTransformer
ComplementNB
CountVectorizer
DBSCAN
DecisionTreeClassifier
DecisionTreeRegressor
DictVectorizer
DictionaryLearning
DummyClassifier
DummyRegressor
ElasticNet
ElasticNetCV
EllipticEnvelope
EmpiricalCovariance
ExtraTreeClassifier
ExtraTreeRegressor
ExtraTreesClassifier
ExtraTreesRegressor
FactorAnalysis
FastICA
FeatureAgglomeration
FeatureHasher
FeatureUnion
FunctionTransformer
GammaRegressor
GaussianMixture
GaussianNB
GaussianProcessClassifier
GaussianProcessRegressor
GaussianRandomProjection
GenericUnivariateSelect
GradientBoostingClassifier
GradientBoostingRegressor
GraphicalLasso
GraphicalLassoCV
GridSearchCV
HashingVectorizer
HistGradientBoostingClassifier
HistGradientBoostingRegressor
HuberRegressor
IncrementalPCA
IsolationForest
Isomap
IsotonicRegression
IterativeImputer
KBinsDiscretizer
KMeans
KNNImputer
KNeighborsClassifier
KNeighborsRegressor
KNeighborsTransformer
KernelCenterer
KernelDensity
KernelPCA
KernelRidge
LabelBinarizer
LabelEncoder
LabelPropagation
LabelSpreading
Lars
LarsCV
Lasso
LassoCV
LassoLars
LassoLarsCV
LassoLarsIC
LatentDirichletAllocation
LedoitWolf
LinearDiscriminantAnalysis
LinearRegression
LinearSVC
LinearSVR
LocalOutlierFactor
LocallyLinearEmbedding
LogisticRegression
LogisticRegressionCV
MDS
MLPClassifier
MLPRegressor
MaxAbsScaler
MeanShift
MinCovDet
MinMaxScaler
MiniBatchDictionaryLearning
MiniBatchKMeans
MiniBatchSparsePCA
MissingIndicator
MultiLabelBinarizer
MultiOutputClassifier
MultiOutputRegressor
MultiTaskElasticNet
MultiTaskElasticNetCV
MultiTaskLasso
MultiTaskLassoCV
MultinomialNB
NMF
NearestCentroid
NearestNeighbors
NeighborhoodComponentsAnaly
sis
Normalizer
NuSVC
NuSVR
Nystroem
OAS
OPTICS
OneClassSVM
OneHotEncoder
OneVsOneClassifier
OneVsRestClassifier
OrdinalEncoder
OrthogonalMatchingPursuit
OrthogonalMatchingPursuitCV
OutputCodeClassifier
PCA
PLSCanonical
PLSRegression
PLSSVD
PassiveAggressiveClassifier
PassiveAggressiveRegressor
PatchExtractor
Perceptron
Pipeline
PoissonRegressor
PolynomialFeatures
PowerTransformer
QuadraticDiscriminantAnalysis
QuantileTransformer
RANSACRegressor
RBFSampler
RFE
RFECV
RadiusNeighborsClassifier
RadiusNeighborsRegressor
RadiusNeighborsTransformer
RandomForestClassifier
RandomForestRegressor
RandomTreesEmbedding
RandomizedSearchCV
RegressorChain
Ridge
RidgeCV
RidgeClassifier
RidgeClassifierCV
RobustScaler
SGDClassifier
SGDRegressor
SVC
SVR
SelectFdr
SelectFpr
SelectFromModel
SelectFwe
SelectKBest
SelectPercentile
ShrunkCovariance
SimpleImputer
SkewedChi2Sampler
SparseCoder
SparsePCA
SparseRandomProjection
SpectralBiclustering
SpectralClustering
SpectralCoclustering
SpectralEmbedding
StackingClassifier
StackingRegressor
StandardScaler
TSNE
TfidfTransformer
TfidfVectorizer
TheilSenRegressor
TransformedTargetRegressor
TruncatedSVD
TweedieRegressor
VarianceThreshold
VotingClassifier
VotingRegressor
Autologging for scikit-learn
One line to record params, metrics and models:
mlflow.sklearn.autolog()
new
in
1.11
Autologging: supported libraries
Overview of new developments
Feature deep dives
Autologging for scikit-learn
Model schemas & input examples
MLflow’s plugin ecosystem
What’s next for MLflow?
Outline
Model Schemas
Specify input and output data types for models
Incompatible schemas!
Model
Input Schema
Output Schema
Check Compatibility
and Validate New
Model Versions
new
in
1.9
zipcode: string,
sqft: double,
distance: double
price: double
log_model(…)
Model Schemas
Infer model input / output signature from data
infer_signature(
inputs,
outputs
)
inputs: [
'year built': long,
'year sold': long,
'lot area': long,
'zip code': long,
'quality': long
]
outputs: ['sale price': double]
log_model(
...,
signature
)
Model Schemas
Validate inputs against schema during inference
input_frame = pd.DataFrame.from_dict({
"year built": data["year built"],
"year sold" data["year sold"],
"lot area": data["lot area"],
"zip code": data["zip code"],
"condition": data["condition"],
})
Input Schema
Output Schema
Model
Schema Mismatch Error
Overview of new developments
Feature deep dives
Autologging for scikit-learn
Model schemas & input examples
MLflow’s plugin ecosystem
What’s next for MLflow?
Outline
Plugins
Seamlessly integrate third-party code & infrastructure with MLflow
Third-party plugin code
TrackingStore
Artifact Repository
Deployment Client
setup(
entry_points = {
"mlflow.tracking_store": …,
…
}
)
Installation entrypoint MLflow Library
Plugins
Supported plugin types:
Tracking storage & search (Tracking Store)
Tracking metadata collection (Run Context Provider)
Model registry storage & search (Model Registry Store)
Project execution (Project Backend)
Model deployment (Deployment Client)
Creating a plugin
1. Implement the plugin interface
(https://mlflow.org/docs/latest/plugins.html)
2. Add an MLflow plugin entrypoint & upload your package to PyPI
3. Add your plugin to the MLflow docs:
https://mlflow.org/docs/latest/plugins.html#community-plugins
Pluggable way to create and manage
deployment endpoints in MLflow
Used in new endpoint
Other integrations being ported:
Deployments API
mlflow deployments create -t redisai -n spam
-m models:/SpamScorer/production
mlflow deployments predict -t redisai –n spam
-f emails.json
new
in
1.9
Pluggable way to execute MLflow
Projects on a variety of compute
resources
Used for new YARN backend
Other integrations being ported:
Project Backend Plugins
mlflow run --backend yarn
https://github.com/mlflow/mlflow#examples/pytorch
new
in
1.9
Community Plugins
Elastic Search backend for MLflow Tracking (experimental)
(https://pypi.org/project/mlflow-elasticsearchstore/)
Model deployment to RedisAI (https://pypi.org/project/mlflow-redisai/)
Project execution on YARN (https://pypi.org/project/mlflow-yarn/)
Artifact storage in SQL Server
(https://pypi.org/project/mlflow-dbstore/)
Artifact storage in Alibaba Cloud OSS
(https://pypi.org/project/aliyunstoreplugin/)
Overview of new developments
Feature deep dives
Autologging for scikit-learn
Model schemas & input examples
MLflow’s plugin ecosystem
What’s next for MLflow?
Outline
What’s next for ?
Model explainability
integrations (see Data + AI
Summit Europe 2020)
Support for tensor input /
output schemas
Expanded input example &
schema collection for
autologging (XGBoost,
TensorFlow, and more)
Input Schema
Output Schema
autolog()
Model
Model Serving on Databricks
Tracking
Experiment tracking
Logged
Model
Model Registry
Model management
Model Serving
Turnkey serving for
MLflow models
new
Staging Production Archived
Data Scientists Application Engineers
Reports
Applications
...
REST
Endpoint
public
preview
Deployment Backends
DATABRICKS.COM/DATAAISUMMIT
Thank you!
pip install mlflow
GitHub Repository: github.com/mlflow/mlflow
Website: mlflow.org
Slack server: https://tinyurl.com/mlflow-slack-server

Whats new in_mlflow

  • 1.
    What’s new in? Corey Zumar Software Engineer, Databricks A System to Accelerate the Machine Learning Lifecycle
  • 2.
    Overview of newdevelopments Feature deep dives Autologging for scikit-learn Model schemas & input examples MLflow’s plugin ecosystem What’s next for MLflow? Outline
  • 3.
    : An OpenSource ML Platform Experiment management Model managementReproducible runs Model packaging and deployment T R A C K I N G P R O J E C T S M O D E L R E G I S T R YM O D E L S Training Deployment Raw Data Data Prep ML Engineer Application DeveloperData Engineer Any Language Any ML Library
  • 4.
    : Overview ofnew developments TRACKING PROJECTS MODELS MODEL REGISTRY - Scikit-learn autologging (1.11) - Fast.ai autologging (1.9) - UI accessibility, syntax highlighting for artifacts + PDF support (1.9, 1.11) - Backend plugin support (1.9) - YARN execution backend - Expanded artifact resolution capabilities (1.9) - Model schemas & input examples (1.9) - Deployment plugin support (1.9) - Spacy model flavor (1.8) - Tags for registered models and model versions (1.9) - Enhanced version comparison UI, including schemas (1.11) - Simplified model archiving (1.10)
  • 5.
    Overview of newdevelopments Feature deep dives Autologging for scikit-learn Model signatures & input examples MLflow’s plugin ecosystem What’s next for MLflow? Outline
  • 6.
    Instrumenting scikit-learn workflows mlflow.start_run() mlflow.log_param(“alpha”,alpha) mlflow.log_param(“l1_ratio”, l1_ratio) model = ElasticNet(alpha, l1_ratio) model.fit(X, y) y_pred = model.predict(X) loss = mean_absolute_error(y, y_pred) mlflow.log_metric(“loss”, loss) mlflow.sklearn.log_model(model) mlflow.end_run()
  • 7.
    Instrumenting scikit-learn workflows mlflow.start_run() mlflow.log_param(“alpha”,alpha) mlflow.log_param(“l1_ratio”, l1_ratio) model = ElasticNet(alpha, l1_ratio) model.fit(X, y) y_pred = model.predict(X) loss = mean_absolute_error(y, y_pred) mlflow.log_metric(“loss”, loss) mlflow.sklearn.log_model(model) mlflow.end_run() ElasticNet ARDRegression AdaBoostClassifier AdaBoostRegressor AdditiveChi2Sampler AffinityPropagation AgglomerativeClustering BaggingClassifier BaggingRegressor BayesianGaussianMixture BayesianRidge BernoulliNB BernoulliRBM Binarizer Birch CCA CalibratedClassifierCV CategoricalNB ClassifierChain ColumnTransformer ComplementNB CountVectorizer DBSCAN DecisionTreeClassifier DecisionTreeRegressor DictVectorizer DictionaryLearning DummyClassifier DummyRegressor ElasticNet ElasticNetCV EllipticEnvelope EmpiricalCovariance ExtraTreeClassifier ExtraTreeRegressor ExtraTreesClassifier ExtraTreesRegressor FactorAnalysis FastICA FeatureAgglomeration FeatureHasher FeatureUnion FunctionTransformer GammaRegressor GaussianMixture GaussianNB GaussianProcessClassifier GaussianProcessRegressor GaussianRandomProjection GenericUnivariateSelect GradientBoostingClassifier GradientBoostingRegressor GraphicalLasso GraphicalLassoCV GridSearchCV HashingVectorizer HistGradientBoostingClassifier HistGradientBoostingRegressor HuberRegressor IncrementalPCA IsolationForest Isomap IsotonicRegression IterativeImputer KBinsDiscretizer KMeans KNNImputer KNeighborsClassifier KNeighborsRegressor KNeighborsTransformer KernelCenterer KernelDensity KernelPCA KernelRidge LabelBinarizer LabelEncoder LabelPropagation LabelSpreading Lars LarsCV Lasso LassoCV LassoLars LassoLarsCV LassoLarsIC LatentDirichletAllocation LedoitWolf LinearDiscriminantAnalysis LinearRegression LinearSVC LinearSVR LocalOutlierFactor LocallyLinearEmbedding LogisticRegression LogisticRegressionCV MDS MLPClassifier MLPRegressor MaxAbsScaler MeanShift MinCovDet MinMaxScaler MiniBatchDictionaryLearning MiniBatchKMeans MiniBatchSparsePCA MissingIndicator MultiLabelBinarizer MultiOutputClassifier MultiOutputRegressor MultiTaskElasticNet MultiTaskElasticNetCV MultiTaskLasso MultiTaskLassoCV MultinomialNB NMF NearestCentroid NearestNeighbors NeighborhoodComponentsAnaly sis Normalizer NuSVC NuSVR Nystroem OAS OPTICS OneClassSVM OneHotEncoder OneVsOneClassifier OneVsRestClassifier OrdinalEncoder OrthogonalMatchingPursuit OrthogonalMatchingPursuitCV OutputCodeClassifier PCA PLSCanonical PLSRegression PLSSVD PassiveAggressiveClassifier PassiveAggressiveRegressor PatchExtractor Perceptron Pipeline PoissonRegressor PolynomialFeatures PowerTransformer QuadraticDiscriminantAnalysis QuantileTransformer RANSACRegressor RBFSampler RFE RFECV RadiusNeighborsClassifier RadiusNeighborsRegressor RadiusNeighborsTransformer RandomForestClassifier RandomForestRegressor RandomTreesEmbedding RandomizedSearchCV RegressorChain Ridge RidgeCV RidgeClassifier RidgeClassifierCV RobustScaler SGDClassifier SGDRegressor SVC SVR SelectFdr SelectFpr SelectFromModel SelectFwe SelectKBest SelectPercentile ShrunkCovariance SimpleImputer SkewedChi2Sampler SparseCoder SparsePCA SparseRandomProjection SpectralBiclustering SpectralClustering SpectralCoclustering SpectralEmbedding StackingClassifier StackingRegressor StandardScaler TSNE TfidfTransformer TfidfVectorizer TheilSenRegressor TransformedTargetRegressor TruncatedSVD TweedieRegressor VarianceThreshold VotingClassifier VotingRegressor
  • 8.
    Autologging for scikit-learn Oneline to record params, metrics and models: mlflow.sklearn.autolog() new in 1.11
  • 20.
  • 21.
    Overview of newdevelopments Feature deep dives Autologging for scikit-learn Model schemas & input examples MLflow’s plugin ecosystem What’s next for MLflow? Outline
  • 22.
    Model Schemas Specify inputand output data types for models Incompatible schemas! Model Input Schema Output Schema Check Compatibility and Validate New Model Versions new in 1.9 zipcode: string, sqft: double, distance: double price: double log_model(…)
  • 23.
    Model Schemas Infer modelinput / output signature from data infer_signature( inputs, outputs ) inputs: [ 'year built': long, 'year sold': long, 'lot area': long, 'zip code': long, 'quality': long ] outputs: ['sale price': double] log_model( ..., signature )
  • 24.
    Model Schemas Validate inputsagainst schema during inference input_frame = pd.DataFrame.from_dict({ "year built": data["year built"], "year sold" data["year sold"], "lot area": data["lot area"], "zip code": data["zip code"], "condition": data["condition"], }) Input Schema Output Schema Model Schema Mismatch Error
  • 34.
    Overview of newdevelopments Feature deep dives Autologging for scikit-learn Model schemas & input examples MLflow’s plugin ecosystem What’s next for MLflow? Outline
  • 35.
    Plugins Seamlessly integrate third-partycode & infrastructure with MLflow Third-party plugin code TrackingStore Artifact Repository Deployment Client setup( entry_points = { "mlflow.tracking_store": …, … } ) Installation entrypoint MLflow Library
  • 36.
    Plugins Supported plugin types: Trackingstorage & search (Tracking Store) Tracking metadata collection (Run Context Provider) Model registry storage & search (Model Registry Store) Project execution (Project Backend) Model deployment (Deployment Client)
  • 37.
    Creating a plugin 1.Implement the plugin interface (https://mlflow.org/docs/latest/plugins.html) 2. Add an MLflow plugin entrypoint & upload your package to PyPI 3. Add your plugin to the MLflow docs: https://mlflow.org/docs/latest/plugins.html#community-plugins
  • 38.
    Pluggable way tocreate and manage deployment endpoints in MLflow Used in new endpoint Other integrations being ported: Deployments API mlflow deployments create -t redisai -n spam -m models:/SpamScorer/production mlflow deployments predict -t redisai –n spam -f emails.json new in 1.9
  • 39.
    Pluggable way toexecute MLflow Projects on a variety of compute resources Used for new YARN backend Other integrations being ported: Project Backend Plugins mlflow run --backend yarn https://github.com/mlflow/mlflow#examples/pytorch new in 1.9
  • 40.
    Community Plugins Elastic Searchbackend for MLflow Tracking (experimental) (https://pypi.org/project/mlflow-elasticsearchstore/) Model deployment to RedisAI (https://pypi.org/project/mlflow-redisai/) Project execution on YARN (https://pypi.org/project/mlflow-yarn/) Artifact storage in SQL Server (https://pypi.org/project/mlflow-dbstore/) Artifact storage in Alibaba Cloud OSS (https://pypi.org/project/aliyunstoreplugin/)
  • 41.
    Overview of newdevelopments Feature deep dives Autologging for scikit-learn Model schemas & input examples MLflow’s plugin ecosystem What’s next for MLflow? Outline
  • 42.
    What’s next for? Model explainability integrations (see Data + AI Summit Europe 2020) Support for tensor input / output schemas Expanded input example & schema collection for autologging (XGBoost, TensorFlow, and more) Input Schema Output Schema autolog() Model
  • 43.
    Model Serving onDatabricks Tracking Experiment tracking Logged Model Model Registry Model management Model Serving Turnkey serving for MLflow models new Staging Production Archived Data Scientists Application Engineers Reports Applications ... REST Endpoint public preview Deployment Backends
  • 44.
  • 45.
    Thank you! pip installmlflow GitHub Repository: github.com/mlflow/mlflow Website: mlflow.org Slack server: https://tinyurl.com/mlflow-slack-server