RESTful Machine Learning with Flask and TensorFlow Serving
Carlo Mazzaferro
Data Scientist, ByteCubed
PyData DC 2018
Goals
● Explain some common problems that arise when developing
machine learning products
● Describe some of the solutions used to address those problems, along with their limitations
● Open source a library
Background
● BS Biomed Eng/Bioinformatics
● DNA stuff
○ Lots of data (wrangling)
○ Data pipelines
○ Data management
○ Data Science aspirant
● Currently
○ ML Engineering @ ByteCubed
○ Some frontend work
○ Putting models in the client’s hands
● Since always
○ Workflow optimization for ML/DS
Takeaways
● To make great products
○ Repeatability matters
○ Ease of deployment matters
○ Quick iteration is key
○ 80-20 rule
Ultimately, reduce friction between R&D and the rest of the Software Practice
Takeaways
● To make great products
○ “Do machine learning like the great engineer you are, not like the great machine learning expert you aren’t.” [Google’s best practices for ML]
○ Provide DS with the right tooling, and the entire organization will benefit
TensorFlow Serving
A high-performance serving system for machine learning models, designed for production environments
Capabilities
● High-performance inference
● Model discovery
● RPC interface (gRPC and REST)
● Much more: tensorflow.org/serving
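To make the REST interface concrete: TensorFlow Serving exposes a predict endpoint at `/v1/models/{name}:predict` that accepts a JSON body of instances. The sketch below only builds the URL and request body, and does not assume a running server; the host, port, and model name are placeholders (8501 is TensorFlow Serving’s default REST port).

```python
import json

# Placeholder host and model name; a real deployment would serve a
# model you exported yourself.
HOST = "localhost:8501"
MODEL_NAME = "keras-simple-regression"


def predict_request(instances):
    """Build the URL and JSON body for a TF Serving REST predict call."""
    url = f"http://{HOST}/v1/models/{MODEL_NAME}:predict"
    body = json.dumps({"instances": instances})
    return url, body


# One instance with 5 features, matching a model with input_dim=5
url, body = predict_request([[0.1, 0.2, 0.3, 0.4, 0.5]])
print(url)
print(body)
```

The actual call would then be a plain HTTP POST of `body` to `url` with any HTTP client, which is what makes integration easy for non-ML developers.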
How?
Capabilities
● Keeping models loaded without needing to restore the dataflow graph and
session
● gRPC interface for high-performance, low-latency inference
● Low-level control on how models are loaded
○ C++ API
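As an illustration of that control over model loading: the model server can be driven by a model config file (a text-format protobuf); the name and path below are placeholders.

```
# models.config -- passed to the server via --model_config_file
model_config_list {
  config {
    name: "keras-simple-regression"
    base_path: "/models/keras-simple-regression"
    model_platform: "tensorflow"
  }
}
```

TensorFlow Serving then watches `base_path` for numeric version subdirectories (e.g. `.../1/`, `.../2/`) and, by default, loads and serves the highest version it finds, without a server restart.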
Limitations
● Relies on a C++ API for low-level control
● Undocumented Python API
○ Small subset of features implemented
● gRPC API only
○ REST API (for inference only) introduced in August 2018
Needs
● Pythonic API ✅
● JSON + REST ✅
● Test different models, architectures, configurations ✅
● Track models over time ✅
Model Deployment Frameworks
Current state of existing solutions:
● Clipper.ai → great, but not very active (last commit in June 2018)
● P/SaaS platforms (Algorithmia, Rekognition, etc.)
● Mlflow.org → R&D, tracking of model performance
● Custom solutions of varied quality
Introducing racket
Minimalistic framework for ML model deployment & management
● Access to most of the TensorFlow Serving functionality in a Pythonic way
● Model exploration and deployment, consolidated
racket
Motivation
● Automation
○ Quick iteration
● Reduce boilerplate
○ Versioning & serving
● Exploration & visibility
○ Loading a different version
○ Ability to track & query model performance
● Flexibility
○ Multiple models can be loaded and are accessible through a simple API
racket
Features
● Automated model versioning
● RESTful interface with rich capabilities
○ Loading a different model
○ Loading a different version
○ Ability to track & query model performance
● Automatic API documentation with Swagger
● Train, explore, and deploy with a single tool
● CLI access
● Static typing
ML For Humans
Enabling integration with non-ML experts: the developer perspective
● REST access
○ Ease of integration
● Ease of deployment
○ Containers not just an afterthought
● Visibility: discover the shapes of inputs needed and of outputs produced
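The “discover shapes” point maps to TensorFlow Serving’s model metadata endpoint (`GET /v1/models/{name}/metadata`), whose JSON response includes the signature definition with input/output dtypes and shapes. A sketch of pulling input shapes out of such a response; the `sample` dict here is a trimmed, hand-written example of the response format, not real server output.

```python
def input_shapes(metadata):
    """Extract input tensor shapes from a TF Serving /metadata response."""
    sig = metadata["metadata"]["signature_def"]["signature_def"]["serving_default"]
    shapes = {}
    for name, info in sig["inputs"].items():
        # Dimension sizes arrive as strings in the JSON-encoded proto
        shapes[name] = [int(d["size"]) for d in info["tensor_shape"]["dim"]]
    return shapes


# Trimmed, hand-written example response for a model taking a batch of
# 5-feature rows (-1 = unknown batch dimension).
sample = {
    "metadata": {"signature_def": {"signature_def": {"serving_default": {
        "inputs": {"dense_input": {
            "dtype": "DT_FLOAT",
            "tensor_shape": {"dim": [{"size": "-1"}, {"size": "5"}]},
        }},
    }}}},
}

print(input_shapes(sample))  # -> {'dense_input': [-1, 5]}
```

This is what lets a frontend or backend developer find out, without reading the training code, what a deployed model expects.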
ML For Humans
Enabling integration with non-ML experts: the Data Scientist perspective
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from racket import KerasLearner


class KerasModel(KerasLearner):
    VERSION = '1.1.1'
    MODEL_TYPE = 'regression'
    MODEL_NAME = 'keras-simple-regression'

    def build_model(self):
        optimizer = tf.train.RMSPropOptimizer(0.001)
        model = Sequential()
        model.add(Dense(24, input_dim=5, kernel_initializer='normal', activation='relu'))
        model.add(Dense(48, kernel_initializer='normal', activation='relu'))
        model.add(Dense(1, kernel_initializer='normal'))
        model.compile(loss='mean_absolute_error', optimizer=optimizer)
        return model

    def fit(self, x, y, x_val=None, y_val=None, epochs=2, batch_size=20):
        self.model.fit(x, y, epochs=epochs, batch_size=batch_size, verbose=0,
                       validation_data=(x_val, y_val))


if __name__ == '__main__':
    # x, y: your feature matrix (5 columns) and regression targets
    X_train, X_test, y_train, y_test = train_test_split(x, y)
    kf = KerasModel()
    kf.fit(X_train, y_train, x_val=X_test, y_val=y_test)
    kf.store(autoload=True)
● Define the model; the rest is taken care of
● Scoring done automatically
○ Including user-defined/multiple metrics
● Access to previous runs’ data
● Multiple models & versions
Quick demo
Demo link: https://asciinema.org/a/xxoebEfyu1bzO84hWWAams577
Docs: https://r-racket.readthedocs-hosted.com/en/latest/
Code: https://github.com/carlomazzaferro/racket
Moving Forward
● Needs
○ Support for batching requests
○ Better testing
○ Deployment guide
■ Kubernetes & co.
○ Feature parity between CLI and REST access
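For context on the deployment-guide item: a single-model TensorFlow Serving deployment is already close to a one-liner with the official Docker image; the host path and model name below are placeholders.

```
docker run -p 8501:8501 \
  --mount type=bind,source=/path/to/keras-simple-regression,target=/models/keras-simple-regression \
  -e MODEL_NAME=keras-simple-regression -t tensorflow/serving
```

A deployment guide would cover the step beyond this: running the same container under Kubernetes & co. with model storage, scaling, and rollout handled properly.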
Questions?
