ONNX and MLflow
Andre Mesarovic
February 26, 2020
1
Open Neural Network Exchange
Lingua Franca of Deep Learning
Hagay Lupesko, Engineering Leader, AI and ML at Facebook
2
3
Agenda
4
• What is ONNX?
• Who is using ONNX?
• ONNX and MLflow
ONNX
5
• Open interoperable format to represent ML models
• Model portability across frameworks
• Decouple training and scoring
• Interoperability between frameworks, compilers, runtimes, and hardware accelerators
• Accelerated inferencing on cloud and edge (IoT)
ONNX
6
• Wide support by ML industry vendors
• Bringing the worlds of AI research and products closer together so that they innovate and deploy faster
• Train in one framework and score in another
• Optimize models for deployments on multiple platforms
• Originally for DL models, but now covers non-DL models such as sklearn (see the sketch below)
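A minimal sketch of the non-DL case: converting a plain sklearn model, assuming the skl2onnx converter package. The model choice, input name, and shape are illustrative.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# Train an ordinary sklearn model
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

# Declare the ONNX graph's input name and shape, then convert
initial_types = [("input", FloatTensorType([None, 4]))]
onnx_model = convert_sklearn(model, initial_types=initial_types)
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())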
ONNX Pillars
7
• ONNX format standard - Linux Foundation - Nov. 2019
• ONNX converters (see the tf2onnx sketch below)
○ TensorFlow to ONNX, ONNX to TensorFlow, etc.
• ONNX Runtime
○ MSFT ONNX Runtime - open-source
○ Nvidia TensorRT
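A hedged sketch of the TensorFlow-to-ONNX direction, assuming the tf2onnx package and its from_keras entry point; the toy Keras model and input signature are illustrative.

import tensorflow as tf
import tf2onnx

# A toy Keras model standing in for a real one
keras_model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# Convert to an ONNX ModelProto; the input signature pins the name and shape
spec = [tf.TensorSpec([None, 4], tf.float32, name="input")]
onnx_model, _ = tf2onnx.convert.from_keras(keras_model, input_signature=spec)
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())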
ONNX Timelines
8
• Founded in Sep. 2017 by Microsoft and Facebook
• AWS, Intel, AMD and NVIDIA support by Dec. 2017
• Enhanced support announced at Facebook F8 in May 2018
• ONNX Runtime open-sourced by MSFT in Dec. 2018
• ONNX joins the Linux Foundation (Nov. 2019)
ONNX Community
9
ONNX Training Frameworks
10
ONNX
11
Sample ONNX Optimization Pipeline
12
ONNX Model Zoo
13
• Collection of pre-trained, state-of-the-art DL models (see the loading sketch below)
• Central repository of reusable models
• Python Jupyter notebooks
• Image classification, natural language, vision, etc.
• https://github.com/onnx/models
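A minimal sketch of loading and inspecting a zoo model, assuming a copy has been downloaded locally; the file name mnist.onnx is a hypothetical path.

import onnx
import onnxruntime

# "mnist.onnx" stands in for any model downloaded from the zoo
model = onnx.load("mnist.onnx")
onnx.checker.check_model(model)   # validate the graph structure and ops

session = onnxruntime.InferenceSession(model.SerializeToString())
print([(i.name, i.shape) for i in session.get_inputs()])   # expected inputs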
Intermediate Representation (IR)
14
• IR is a common concept in compilers and virtual machines
• Two key features of an IR:
○ Capture source code without loss of information
○ Be independent of target platform
• Common representation for source tensor formats
• Providers optimize IR for target hardware devices
ONNX IR
15
• 116 operators (see the inspection sketch below)
• Acos, BatchNormalization, HardSigmoid, Relu, Softmax, etc.
• Model export is reasonably robust
• Import is less robust due to unimplemented ops
• https://github.com/onnx/onnx/blob/master/docs/Operators.md
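A minimal sketch of inspecting which IR operators an exported model actually uses; the file name model.onnx is illustrative.

import onnx

# "model.onnx" is any exported model
model = onnx.load("model.onnx")
ops = sorted({node.op_type for node in model.graph.node})
print(ops)   # e.g. ['Add', 'MatMul', 'Relu', 'Softmax']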
Optimizer Processing
16
• Fusion - Fuse multiple ops
• Data layout abstraction
• Data reuse - reuse for subgraphs
• Graph scheduling - run similar subgraphs in parallel
• Graph partitioning - Partition subgraphs to run on different devices
• Memory management
Computation Graph
17
import numpy as np

a = np.ones(10)        # node a: vector of ones
b = np.ones(10) * 2    # node b: vector of twos
c = b * a              # node c: element-wise multiply of b and a
d = c + 1              # node d: element-wise add of the constant 1
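The same graph can be expressed directly in ONNX's protobuf IR. A minimal sketch using the onnx.helper API; the names and float type are illustrative, and the constant 1 becomes an initializer tensor.

import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

# Graph inputs a and b, and output d, as 10-element float vectors
a = helper.make_tensor_value_info("a", TensorProto.FLOAT, [10])
b = helper.make_tensor_value_info("b", TensorProto.FLOAT, [10])
d = helper.make_tensor_value_info("d", TensorProto.FLOAT, [10])

# The constant 1, broadcast here as an initializer vector
one = numpy_helper.from_array(np.ones(10, dtype=np.float32), name="one")

mul = helper.make_node("Mul", inputs=["b", "a"], outputs=["c"])    # c = b * a
add = helper.make_node("Add", inputs=["c", "one"], outputs=["d"])  # d = c + 1

graph = helper.make_graph([mul, add], "toy-graph", [a, b], [d], initializer=[one])
model = helper.make_model(graph)
onnx.checker.check_model(model)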
ONNX and MLeap
18
• Think of ONNX as MLeap on steroids
• Both address ML model interoperability but...
• MLeap focuses on real-time scoring with Spark ML
• ONNX support for Spark ML is weak
• MLeap: its two-person company, Combust, is no longer supporting it
• ONNX is backed by a large number of major ML vendors
Microsoft
19
• Focus on WinML and ML.NET
• Office, Windows, Cognitive Services, Skype, Bing Ads, PowerBI
• 100s millions of devices, serving billions of requests
• In 2019 MSFT announced that Windows 10 would embed ONNX in the OS, with the ability to run ML models natively with hardware acceleration
Microsoft ONNX Runtime Usage
20
• Used in millions of Windows devices and powers core models across Office, Bing, and Azure
• Average of 2x performance gains
• Office - 14.6x reduction in latency
• Bing QnA - 2.8x reduction in latency
• Azure Cognitive Services - 3.5x reduction in latency for OCR
Microsoft - BERT - ONNX
21
• Bidirectional Encoder Representations from Transformers
• Google’s state-of-the-art NLP model
• BERT is widely used in Bing
• MSFT open-sourced its BERT inference optimizations in Jan. 2020
• 17x BERT inference acceleration with ONNX Runtime
• Scores on an Nvidia V100 GPU in 1.7 milliseconds
• https://cloudblogs.microsoft.com/opensource/2020/01/21/microsoft-onnx-open-source-optimizations-transformer-inference-gpu-cpu
Microsoft Raven - SQL Server + ONNX
22
• Extending relational query processing with ML inference
○ http://cidrdb.org/cidr2020/papers/p24-karanasos-cidr20.pdf
• Project Raven
○ Deep native integration of ONNX runtimes with SQL Server and a unified IR
○ Advanced cross-optimizations between ML and database operators
• Can in-RDBMS scoring outperform dedicated frameworks?
Raven Overview
23
Raven Concepts
24
• Introduces an IR that includes both ML and relational operators
• Optimizes inference queries that include both data and ML operations in a holistic manner
• Leverages relational operator and data properties to optimize the ML part of the query
Raven Operator Sets
25
• Relational algebra (RA)
• Linear algebra (LA)
• Other ML operators and data featurizers (MLD) - classical non-NN frameworks such as sklearn
• UDFs - used to wrap non-optimizable code as a black box
Raven Inference Engine
26
• Inference execution modes
○ In-process execution (Raven)
○ Out-of-process execution (Raven Ext)
○ Containerized execution
• For small data sets, Raven is slower than the ONNX Runtime
• For large data sets, Raven is 5x faster
Microsoft - SQL Database Edge
27
• Deploy and make predictions with an ONNX model in SQL Database Edge Preview - 2019-11-04
○ https://docs.microsoft.com/en-us/azure/sql-database-edge/deploy-onnx
• Machine learning and AI with ONNX in SQL Database Edge Preview - 2019-11-07
○ https://docs.microsoft.com/en-us/azure/sql-database-edge/onnx-overview
AWS
28
• ONNX is already integrated with MXNet
• ONNX installed on AWS Deep Learning AMIs (DLAMI)
• New Inferentia chip supports ONNX
• Amazon Elastic Inference supports ONNX
• Model Server for Apache MXNet (MMS)
• Score with ONNX.js using Lambda and Serverless
Facebook
29
• PyTorch 1.0 has had native ONNX export since May 2018 (see the export sketch below)
• Has not been nearly as active recently as Microsoft
• But is quietly contributing to the ONNX GitHub repos
• https://www.facebook.com/onnxai
• https://ai.facebook.com/blog/onnx-expansion-speeds-ai-development-
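A minimal sketch of that native export path via torch.onnx.export; the toy model and dummy input shape are illustrative.

import torch

# A toy model standing in for a real one
model = torch.nn.Sequential(
    torch.nn.Linear(4, 10),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 3),
)
model.eval()

# Export is trace-based: a dummy input drives one forward pass
dummy_input = torch.randn(1, 4)
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"])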
Nvidia
30
• TensorRT - SDK for high-performance DL inferencing
• Nvidia GPU Cloud ONNX support for TensorRT in Dec. 2017
• ONNX Runtime support for TensorRT in Dec. 2018
• TensorRT backend for ONNX
○ https://github.com/onnx/onnx-tensorrt
• Jetson Nano
Nvidia TensorRT
31
Intel
32
Apple
33
• Production-grade Core ML to ONNX conversion
• https://github.com/onnx/onnx-coreml
• https://apple.github.io/coremltools/
ONNX and Spark ML
34
• Spark ML is not advertised as ONNX-supported
• A conversion project does exist (see the sketch below):
○ https://github.com/onnx/onnxmltools
• Very few examples
• Preliminary testing reveals problems
• Opportunity to contribute!
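A hedged sketch of one conversion path via onnxmltools' convert_sparkml; given the problems noted above, treat it as illustrative only (the model, column names, and shapes are made up).

from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors
from onnxmltools import convert_sparkml
from onnxmltools.convert.common.data_types import FloatTensorType

# Fit a tiny Spark ML model on a toy DataFrame
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(Vectors.dense([0.0, 1.0]), 0.0), (Vectors.dense([1.0, 0.0]), 1.0)],
    ["features", "label"])
spark_model = LogisticRegression().fit(df)

# Declare the vector input column's name and width, then convert
initial_types = [("features", FloatTensorType([None, 2]))]
onnx_model = convert_sparkml(spark_model, "spark-lr", initial_types)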
ONNX and MLflow
35
• ONNX support introduced in MLflow 1.5.0
• Convert model to ONNX format
• Save ONNX model as ONNX flavor
• No automatic ONNX model logging as there is with MLeap
• Scoring: use the ONNX Runtime (e.g. via the pyfunc sketch below) or convert to a native flavor
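A minimal sketch of scoring a logged ONNX model through MLflow's generic pyfunc flavor, which runs the ONNX Runtime under the hood; the run ID and input frame are hypothetical.

import mlflow.pyfunc
import numpy as np
import pandas as pd

# Hypothetical URI of a model logged with mlflow.onnx.log_model(..., "onnx-model")
model_uri = "runs:/<run-id>/onnx-model"
pyfunc_model = mlflow.pyfunc.load_model(model_uri)

# For a single-input ONNX model, pyfunc feeds the frame's values to the runtime
data_df = pd.DataFrame(np.random.rand(5, 4).astype(np.float32))
predictions = pyfunc_model.predict(data_df)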
ONNX MLflow spin around the block
36
ONNX and MLflow Test Coverage
37
MLflow and ONNX Sample Code
38
Github
• https://github.com/amesar/mlflow-examples/tree/master/python/sklearn
• https://github.com/amesar/mlflow-examples/tree/master/python/keras
MLflow and ONNX Keras Example
39
Log Model

import mlflow.onnx
import onnxmltools

# Convert the trained Keras model to ONNX and log it with the MLflow run
onnx_model = onnxmltools.convert_keras(model)
mlflow.onnx.log_model(onnx_model, "onnx-model")

Read and Score Model

import numpy as np
import onnxruntime

# model is the ONNX model (e.g. loaded back via mlflow.onnx.load_model);
# data_np is a numpy array of input rows
session = onnxruntime.InferenceSession(model.SerializeToString())
input_name = session.get_inputs()[0].name
predictions = session.run(None, {input_name: data_np.astype(np.float32)})[0]
Thank you
Have a nice day
40
