RESTful Machine Learning with Flask and TensorFlow Serving
Carlo Mazzaferro
Data Scientist, ByteCubed
PyData DC 2018
Goals
● Explain some common problems that arise when developing
machine learning products
● Describe some of the solutions used to address those problems, along with their limitations
● Open source a library
Background
● BS Biomed Eng/Bioinformatics
● DNA stuff
○ Lots of data (wrangling)
○ Data pipelines
○ Data management
○ Data Science aspirant
● Currently
○ ML Engineering @ ByteCubed
○ Some frontend work
○ Putting models in the client’s hands
● Since always
○ Workflow optimization for ML/DS
Takeaways
● To make great products
○ Repeatability matters
○ Ease of deployment matters
○ Quick iteration is key
○ 80-20 rule
Ultimately, reduce friction between R&D and the rest of the Software Practice
Takeaways
● To make great products
○ “Do machine learning like the great engineer you are, not like the great machine learning expert you aren’t.” [Google’s best practices for ML]
○ Provide DS with the right tooling, and the entire organization will benefit
TensorFlow Serving
A high-performance serving system for machine learning models, designed for production environments
Capabilities
● High-performance inference
● Model discovery
● RPC interface (gRPC and REST)
● Much more: tensorflow.org/serving
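To make the REST interface concrete: TensorFlow Serving exposes a predict endpoint at `/v1/models/{name}:predict` that accepts a JSON body of instances. The sketch below only builds the URL and request body, and does not assume a running server; the host, port, and model name are placeholders (8501 is TensorFlow Serving’s default REST port).

```python
import json

# Placeholder host and model name; a real deployment would serve a
# model you exported yourself.
HOST = "localhost:8501"
MODEL_NAME = "keras-simple-regression"


def predict_request(instances):
    """Build the URL and JSON body for a TF Serving REST predict call."""
    url = f"http://{HOST}/v1/models/{MODEL_NAME}:predict"
    body = json.dumps({"instances": instances})
    return url, body


# One instance with 5 features, matching a model with input_dim=5
url, body = predict_request([[0.1, 0.2, 0.3, 0.4, 0.5]])
print(url)
print(body)
```

The actual call would then be a plain HTTP POST of `body` to `url` with any HTTP client, which is what makes integration easy for non-ML developers.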
How?
Capabilities
● Keeping models loaded without needing to restore the dataflow graph and
session
● gRPC interface for high-performance, low-latency inference
● Low-level control on how models are loaded
○ C++ API
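As an illustration of that control over model loading: the model server can be driven by a model config file (a text-format protobuf); the name and path below are placeholders.

```
# models.config -- passed to the server via --model_config_file
model_config_list {
  config {
    name: "keras-simple-regression"
    base_path: "/models/keras-simple-regression"
    model_platform: "tensorflow"
  }
}
```

TensorFlow Serving then watches `base_path` for numeric version subdirectories (e.g. `.../1/`, `.../2/`) and, by default, loads and serves the highest version it finds, without a server restart.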
Limitations
● Relies on a C++ API for low-level control
● Undocumented Python API
○ Small subset of features implemented
● gRPC API only
○ REST API (for inference only) introduced in August 2018
Needs
● Pythonic API ✅
● JSON + REST ✅
● Test different models, architectures, configurations ✅
● Track models over time ✅
Model Deployment Frameworks
Current state of existing solutions:
● Clipper.ai → great, but not very active (last commit in June 2018)
● P/SaaS platforms (Algorithmia, Rekognition, etc.)
● Mlflow.org → R&D, tracking of model performance
● Custom solutions of varied quality
Introducing racket
Minimalistic framework for ML model deployment & management
● Access to most of the TensorFlow Serving functionality in a Pythonic way
● Model exploration and deployment, consolidated
racket
Motivation
● Automation
○ Quick iteration
● Reduce boilerplate
○ Versioning & serving
● Exploration & visibility
○ Loading a different version
○ Ability to track & query model performance
● Flexibility
○ Multiple models can be loaded and are accessible through a simple API
racket
Features
● Automated model versioning
● RESTful interface with rich capabilities
○ Loading a different model
○ Loading a different version
○ Ability to track & query model performance
● Automatic API documentation with Swagger
● Train, explore, and deploy with a single tool
● CLI access
● Static typing
ML For Humans
Enabling integration with non-ML experts: the developer perspective
● REST access
○ Ease of integration
● Ease of deployment
○ Containers not just an afterthought
● Visibility: discover the shapes of inputs needed and of outputs produced
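The “discover shapes” point maps to TensorFlow Serving’s model metadata endpoint (`GET /v1/models/{name}/metadata`), whose JSON response includes the signature definition with input/output dtypes and shapes. A sketch of pulling input shapes out of such a response; the `sample` dict here is a trimmed, hand-written example of the response format, not real server output.

```python
def input_shapes(metadata):
    """Extract input tensor shapes from a TF Serving /metadata response."""
    sig = metadata["metadata"]["signature_def"]["signature_def"]["serving_default"]
    shapes = {}
    for name, info in sig["inputs"].items():
        # Dimension sizes arrive as strings in the JSON-encoded proto
        shapes[name] = [int(d["size"]) for d in info["tensor_shape"]["dim"]]
    return shapes


# Trimmed, hand-written example response for a model taking a batch of
# 5-feature rows (-1 = unknown batch dimension).
sample = {
    "metadata": {"signature_def": {"signature_def": {"serving_default": {
        "inputs": {"dense_input": {
            "dtype": "DT_FLOAT",
            "tensor_shape": {"dim": [{"size": "-1"}, {"size": "5"}]},
        }},
    }}}},
}

print(input_shapes(sample))  # -> {'dense_input': [-1, 5]}
```

This is what lets a frontend or backend developer find out, without reading the training code, what a deployed model expects.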
ML For Humans
Enabling integration with non-ML experts: the Data Scientist perspective
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from racket import KerasLearner


class KerasModel(KerasLearner):
    VERSION = '1.1.1'
    MODEL_TYPE = 'regression'
    MODEL_NAME = 'keras-simple-regression'

    def build_model(self):
        optimizer = tf.train.RMSPropOptimizer(0.001)
        model = Sequential()
        model.add(Dense(24, input_dim=5, kernel_initializer='normal', activation='relu'))
        model.add(Dense(48, kernel_initializer='normal', activation='relu'))
        model.add(Dense(1, kernel_initializer='normal'))
        model.compile(loss='mean_absolute_error', optimizer=optimizer)
        return model

    def fit(self, x, y, x_val=None, y_val=None, epochs=2, batch_size=20):
        self.model.fit(x, y, epochs=epochs, batch_size=batch_size, verbose=0,
                       validation_data=(x_val, y_val))


if __name__ == '__main__':
    # x, y: your feature matrix (5 columns) and regression targets
    X_train, X_test, y_train, y_test = train_test_split(x, y)
    kf = KerasModel()
    kf.fit(X_train, y_train, x_val=X_test, y_val=y_test)
    kf.store(autoload=True)
● Define the model; the rest is taken care of
● Scoring done automatically
○ Including user-defined/multiple metrics
● Access to previous runs’ data
● Multiple models & versions
Quick demo
Demo link: https://asciinema.org/a/xxoebEfyu1bzO84hWWAams577
Docs: https://r-racket.readthedocs-hosted.com/en/latest/
Code: https://github.com/carlomazzaferro/racket
Moving Forward
● Needs
○ Support for batching requests
○ Better testing
○ Deployment guide
■ Kubernetes & co.
○ Feature parity between CLI and REST access
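For context on the deployment-guide item: a single-model TensorFlow Serving deployment is already close to a one-liner with the official Docker image; the host path and model name below are placeholders.

```
docker run -p 8501:8501 \
  --mount type=bind,source=/path/to/keras-simple-regression,target=/models/keras-simple-regression \
  -e MODEL_NAME=keras-simple-regression -t tensorflow/serving
```

A deployment guide would cover the step beyond this: running the same container under Kubernetes & co. with model storage, scaling, and rollout handled properly.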
Questions?
