ONNX and MLflow
Andre Mesarovic
February 26, 2020
1
Open Neural Network Exchange
Lingua Franca of Deep Learning
Hagay Lupesko, Engineering Leader, AI and ML at Facebook
2
3
Agenda
4
• What is ONNX?
• Who is using ONNX?
• ONNX and MLflow
ONNX
5
• Open interoperable format to represent ML models
• Model portability across frameworks
• Decouple training and scoring
• Interoperability between frameworks, compilers, runtimes, and hardware accelerators
• Accelerated inferencing on cloud and edge (IoT)
ONNX
6
• Wide support by ML industry vendors
• Bringing the worlds of AI research and products closer together so that they innovate and deploy faster
• Train in one framework and score in another
• Optimize models for deployments on multiple platforms
• Originally for DL models, but now covers non-DL models such as sklearn (see the sketch below)
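A minimal sketch of the non-DL case: converting a plain sklearn model, assuming the skl2onnx converter package. The model choice, input name, and shape are illustrative.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# Train an ordinary sklearn model
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

# Declare the ONNX graph's input name and shape, then convert
initial_types = [("input", FloatTensorType([None, 4]))]
onnx_model = convert_sklearn(model, initial_types=initial_types)
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())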
ONNX Pillars
7
• ONNX format standard - Linux Foundation - Nov. 2019
• ONNX converters (see the tf2onnx sketch below)
○ TensorFlow to ONNX, ONNX to TensorFlow, etc.
• ONNX Runtime
○ MSFT ONNX Runtime - open-source
○ Nvidia TensorRT
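A hedged sketch of the TensorFlow-to-ONNX direction, assuming the tf2onnx package and its from_keras entry point; the toy Keras model and input signature are illustrative.

import tensorflow as tf
import tf2onnx

# A toy Keras model standing in for a real one
keras_model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# Convert to an ONNX ModelProto; the input signature pins the name and shape
spec = [tf.TensorSpec([None, 4], tf.float32, name="input")]
onnx_model, _ = tf2onnx.convert.from_keras(keras_model, input_signature=spec)
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())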
ONNX Timelines
8
• Founded in Sep. 2017 by Microsoft and Facebook
• AWS, Intel, AMD and NVIDIA support by Dec. 2017
• Enhanced support announced at Facebook F8 in May 2018
• ONNX Runtime open-sourced by MSFT in Dec. 2018
• ONNX joins the Linux Foundation (Nov. 2019)
ONNX Community
9
ONNX Training Frameworks
10
ONNX
11
Sample ONNX Optimization Pipeline
12
ONNX Model Zoo
13
• Collection of pre-trained, state-of-the-art DL models (see the loading sketch below)
• Central repository of reusable models
• Python Jupyter notebooks
• Image classification, natural language, vision, etc.
• https://github.com/onnx/models
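A minimal sketch of loading and inspecting a zoo model, assuming a copy has been downloaded locally; the file name mnist.onnx is a hypothetical path.

import onnx
import onnxruntime

# "mnist.onnx" stands in for any model downloaded from the zoo
model = onnx.load("mnist.onnx")
onnx.checker.check_model(model)   # validate the graph structure and ops

session = onnxruntime.InferenceSession(model.SerializeToString())
print([(i.name, i.shape) for i in session.get_inputs()])   # expected inputs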
Intermediate Representation (IR)
14
• IR is a common concept in compilers and virtual machines
• Two key features of an IR:
○ Capture source code without loss of information
○ Be independent of target platform
• Common representation for source tensor formats
• Providers optimize IR for target hardware devices
ONNX IR
15
• 116 operators (see the inspection sketch below)
• Acos, BatchNormalization, HardSigmoid, Relu, Softmax, etc.
• Model export is reasonably robust
• Import is less robust due to unimplemented ops
• https://github.com/onnx/onnx/blob/master/docs/Operators.md
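A minimal sketch of inspecting which IR operators an exported model actually uses; the file name model.onnx is illustrative.

import onnx

# "model.onnx" is any exported model
model = onnx.load("model.onnx")
ops = sorted({node.op_type for node in model.graph.node})
print(ops)   # e.g. ['Add', 'MatMul', 'Relu', 'Softmax']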
Optimizer Processing
16
• Fusion - Fuse multiple ops
• Data layout abstraction
• Data reuse - reuse for subgraphs
• Graph scheduling - run similar subgraphs in parallel
• Graph partitioning - Partition subgraphs to run on different devices
• Memory management
Computation Graph
17
import numpy as np

a = np.ones(10)        # node a: vector of ones
b = np.ones(10) * 2    # node b: vector of twos
c = b * a              # node c: element-wise multiply of b and a
d = c + 1              # node d: element-wise add of the constant 1
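The same graph can be expressed directly in ONNX's protobuf IR. A minimal sketch using the onnx.helper API; the names and float type are illustrative, and the constant 1 becomes an initializer tensor.

import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

# Graph inputs a and b, and output d, as 10-element float vectors
a = helper.make_tensor_value_info("a", TensorProto.FLOAT, [10])
b = helper.make_tensor_value_info("b", TensorProto.FLOAT, [10])
d = helper.make_tensor_value_info("d", TensorProto.FLOAT, [10])

# The constant 1, broadcast here as an initializer vector
one = numpy_helper.from_array(np.ones(10, dtype=np.float32), name="one")

mul = helper.make_node("Mul", inputs=["b", "a"], outputs=["c"])    # c = b * a
add = helper.make_node("Add", inputs=["c", "one"], outputs=["d"])  # d = c + 1

graph = helper.make_graph([mul, add], "toy-graph", [a, b], [d], initializer=[one])
model = helper.make_model(graph)
onnx.checker.check_model(model)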
ONNX and MLeap
18
• Think of ONNX as MLeap on steroids
• Both address ML model interoperability but...
• MLeap focuses on real-time scoring with Spark ML
• ONNX support for Spark ML is weak
• MLeap: its two-person company, Combust, is no longer supporting it
• ONNX is backed by a large number of major ML vendors
Microsoft
19
• Focus on WinML and ML.NET
• Office, Windows, Cognitive Services, Skype, Bing Ads, PowerBI
• 100s millions of devices, serving billions of requests
• In 2019 MSFT announced that Windows 10 would embed ONNX in the OS, with the ability to run ML models natively with hardware acceleration
Microsoft ONNX Runtime Usage
20
• Used in millions of Windows devices and powers core models across Office, Bing, and Azure
• Average of 2x performance gains
• Office - 14.6x reduction in latency
• Bing QnA - 2.8x reduction in latency
• Azure Cognitive Services - 3.5x reduction in latency for OCR
Microsoft - BERT - ONNX
21
• Bidirectional Encoder Representations from Transformers
• Google’s state-of-the-art NLP model
• BERT is widely used in Bing
• MSFT open-sourced its BERT inference optimizations in Jan. 2020
• 17x BERT inference acceleration with ONNX Runtime
• Scores on an Nvidia V100 GPU in 1.7 milliseconds
• https://cloudblogs.microsoft.com/opensource/2020/01/21/microsoft-onnx-open-source-optimizations-transformer-inference-gpu-cpu
Microsoft Raven - SQL Server + ONNX
22
• Extending relational query processing with ML inference
○ http://cidrdb.org/cidr2020/papers/p24-karanasos-cidr20.pdf
• Project Raven
○ Deep native integration of ONNX runtimes with SQL Server and a unified IR
○ Advanced cross-optimizations between ML and database operators
• Can in-RDBMS scoring outperform dedicated frameworks?
Raven Overview
23
Raven Concepts
24
• Introduces an IR that includes both ML and relational operators
• Optimizes inference queries that include both data and ML operations in a holistic manner
• Leverages relational operator and data properties to optimize the ML part of the query
Raven Operator Sets
25
• Relational algebra (RA)
• Linear algebra (LA)
• Other ML operators and data featurizers (MLD) - classical non-NN frameworks such as sklearn
• UDFs - used to wrap non-optimizable code as a black box
Raven Inference Engine
26
• Inference execution modes
○ In-process execution (Raven)
○ Out-of-process execution (Raven Ext)
○ Containerized execution
• For small data sets, Raven is slower than the ONNX Runtime
• For large data sets, Raven is 5x faster
Microsoft - SQL Database Edge
27
• Deploy and make predictions with an ONNX model in SQL Database Edge Preview - 2019-11-04
○ https://docs.microsoft.com/en-us/azure/sql-database-edge/deploy-onnx
• Machine learning and AI with ONNX in SQL Database Edge Preview - 2019-11-07
○ https://docs.microsoft.com/en-us/azure/sql-database-edge/onnx-overview
AWS
28
• ONNX is already integrated with MXNet
• ONNX installed on AWS Deep Learning AMIs (DLAMI)
• New Inferentia chip supports ONNX
• Amazon Elastic Inference supports ONNX
• Model Server for Apache MXNet (MMS)
• Score with ONNX.js using Lambda and Serverless
Facebook
29
• PyTorch 1.0 has had native ONNX export since May 2018 (see the export sketch below)
• Has not been nearly as active recently as Microsoft
• But is quietly contributing to the ONNX GitHub repos
• https://www.facebook.com/onnxai
• https://ai.facebook.com/blog/onnx-expansion-speeds-ai-development-
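A minimal sketch of that native export path via torch.onnx.export; the toy model and dummy input shape are illustrative.

import torch

# A toy model standing in for a real one
model = torch.nn.Sequential(
    torch.nn.Linear(4, 10),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 3),
)
model.eval()

# Export is trace-based: a dummy input drives one forward pass
dummy_input = torch.randn(1, 4)
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"])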
Nvidia
30
• TensorRT - SDK for high-performance DL inferencing
• Nvidia GPU Cloud ONNX support for TensorRT in Dec. 2017
• ONNX Runtime support for TensorRT in Dec. 2018
• TensorRT backend for ONNX
○ https://github.com/onnx/onnx-tensorrt
• Jetson Nano
Nvidia TensorRT
31
Intel
32
Apple
33
• Production-grade Core ML to ONNX conversion
• https://github.com/onnx/onnx-coreml
• https://apple.github.io/coremltools/
ONNX and Spark ML
34
• Spark ML is not advertised as ONNX-supported
• A conversion project does exist (see the sketch below):
○ https://github.com/onnx/onnxmltools
• Very few examples
• Preliminary testing reveals problems
• Opportunity to contribute!
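A hedged sketch of one conversion path via onnxmltools' convert_sparkml; given the problems noted above, treat it as illustrative only (the model, column names, and shapes are made up).

from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors
from onnxmltools import convert_sparkml
from onnxmltools.convert.common.data_types import FloatTensorType

# Fit a tiny Spark ML model on a toy DataFrame
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(Vectors.dense([0.0, 1.0]), 0.0), (Vectors.dense([1.0, 0.0]), 1.0)],
    ["features", "label"])
spark_model = LogisticRegression().fit(df)

# Declare the vector input column's name and width, then convert
initial_types = [("features", FloatTensorType([None, 2]))]
onnx_model = convert_sparkml(spark_model, "spark-lr", initial_types)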
ONNX and MLflow
35
• ONNX support introduced in MLflow 1.5.0
• Convert model to ONNX format
• Save ONNX model as ONNX flavor
• No automatic ONNX model logging as there is with MLeap
• Scoring: use the ONNX Runtime (e.g. via the pyfunc sketch below) or convert to a native flavor
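A minimal sketch of scoring a logged ONNX model through MLflow's generic pyfunc flavor, which runs the ONNX Runtime under the hood; the run ID and input frame are hypothetical.

import mlflow.pyfunc
import numpy as np
import pandas as pd

# Hypothetical URI of a model logged with mlflow.onnx.log_model(..., "onnx-model")
model_uri = "runs:/<run-id>/onnx-model"
pyfunc_model = mlflow.pyfunc.load_model(model_uri)

# For a single-input ONNX model, pyfunc feeds the frame's values to the runtime
data_df = pd.DataFrame(np.random.rand(5, 4).astype(np.float32))
predictions = pyfunc_model.predict(data_df)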
ONNX MLflow spin around the block
36
ONNX and MLflow Test Coverage
37
MLflow and ONNX Sample Code
38
Github
• https://github.com/amesar/mlflow-examples/tree/master/python/sklearn
• https://github.com/amesar/mlflow-examples/tree/master/python/keras
MLflow and ONNX Keras Example
39
Log Model

import mlflow.onnx
import onnxmltools

# Convert the trained Keras model to ONNX and log it with the MLflow run
onnx_model = onnxmltools.convert_keras(model)
mlflow.onnx.log_model(onnx_model, "onnx-model")

Read and Score Model

import numpy as np
import onnxruntime

# model is the ONNX model (e.g. loaded back via mlflow.onnx.load_model);
# data_np is a numpy array of input rows
session = onnxruntime.InferenceSession(model.SerializeToString())
input_name = session.get_inputs()[0].name
predictions = session.run(None, {input_name: data_np.astype(np.float32)})[0]
Thank you
Have a nice day
40
