Deploying End-to-End Deep Learning Pipelines with ONNX

A deep learning model is often viewed as fully self-contained, freeing practitioners from the burden of data processing and feature engineering. However, in most real-world applications of AI, these models have data pre-processing, feature extraction and transformation requirements just as complex as those of more traditional ML models. Any non-trivial use case requires care to ensure that no skew exists between the training-time data pipeline and the inference-time data pipeline.

This is not simply theoretical – small differences or errors can be difficult to detect but can have dramatic impact on the performance and efficacy of the deployed solution. Despite this, there are currently few widely accepted, standard solutions for enabling simple deployment of end-to-end deep learning pipelines to production. Recently, the Open Neural Network Exchange (ONNX) standard has emerged for representing deep learning models in a standardized format.

While this is useful for representing the core model inference phase, we need to go further to encompass deployment of the end-to-end pipeline. In this talk I will introduce ONNX for exporting deep learning computation graphs, as well as the ONNX-ML component of the specification, for exporting both ‘traditional’ ML models as well as common feature extraction, data transformation and post-processing steps.

I will cover how to use ONNX and the growing ecosystem of exporter libraries for common frameworks (including TensorFlow, PyTorch, Keras, scikit-learn and now Apache SparkML) to deploy complete deep learning pipelines.

Finally, I will explore best practices for working with and combining these disparate exporter toolkits, as well as highlight the gaps, issues and missing pieces to be taken into account and still to be addressed.



  1. Deploying end-to-end deep learning pipelines with ONNX — Nick Pentreath, Principal Engineer @MLnick
  2. About IBM Developer / © 2019 IBM Corporation – @MLnick on Twitter & GitHub – Principal Engineer, IBM CODAIT (Center for Open-Source Data & AI Technologies) – Machine learning & AI – Apache Spark committer & PMC – Author of Machine Learning with Spark – Various conferences & meetups
  3. CODAIT (Center for Open Source Data & AI Technologies) – Improving the enterprise AI lifecycle in open source. CODAIT aims to make AI solutions dramatically easier to create, deploy, and manage in the enterprise. We contribute to and advocate for the open-source technologies that are foundational to IBM's AI offerings. 30+ open-source developers!
  4. The Machine Learning Workflow
  5. Perception
  6. In reality the workflow spans teams …
  7. … and tools …
  8. … and is a small (but critical!) piece of the puzzle. *Source: Hidden Technical Debt in Machine Learning Systems
  9. Machine Learning Deployment
  10. What, where, how? – What are you deploying? • What is a "model"? – Where are you deploying? • Target environment (cloud, browser, edge) • Batch, streaming, real-time? – How are you deploying? • "devops" deployment mechanism • Serving framework. We will talk mostly about the what.
  11. What is a "model"?
  12. Deep learning doesn't need feature engineering or data processing … right?
  13. Deep learning pipeline? Input image → Inference → Prediction (beagle: 0.82). Source: https://ai.googleblog.com/2016/03/train-your-own-image-classifier-with.html
  14. Deep learning pipeline! Input image → Image pre-processing (decode image, resize, normalization, convert types / format – PIL, OpenCV, tf.image, custom Python, …) → Inference → Post-processing ([0.2, 0.3, …] → sort, label map → (label, prob)) → Prediction (beagle: 0.82, basset: 0.09, bluetick: 0.07, …). *Logos trademarks of their respective projects
  15. Image pre-processing – Every stage (decode image, resize, normalization, convert types / format) hides choices that change the output: decoding lib (PIL vs OpenCV vs tf.image vs skimage; libJPEG vs OpenCV; INTEGER_FAST vs INTEGER_ACCURATE), color mode (RGB vs BGR), data layout (NCHW vs NHWC) – enough to flip a prediction (cat: 0.45 vs beagle: 0.34).
  16. Image pre-processing – Operation order matters too: normalize INT pixels (mean 128) then convert to Float32, vs convert to Float32 then normalize floats (mean 0.5) – the two orderings are not equivalent.
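The order-of-operations pitfall from slide 16 can be reproduced in a few lines. A minimal sketch in plain Python, where `% 256` stands in for unsigned 8-bit wraparound (the mean value 128 follows the slide; the pixel value is illustrative):

```python
# Mean-subtraction applied before vs. after the uint8 -> float conversion.
pixel = 100  # a raw 8-bit pixel value

# Wrong order: normalize while still in unsigned 8-bit integer space,
# then convert to float. 100 - 128 wraps around to 228.
wrapped = (pixel - 128) % 256
a = float(wrapped)

# Right order: convert to float first, then normalize.
b = float(pixel) - 128.0

print(a, b)  # 228.0 -28.0
```

The same nominal pipeline produces wildly different model inputs, which is exactly the kind of skew that is hard to spot but ruins deployed accuracy.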
  17. Inference post-processing – Convert to numpy array [0.2, 0.3, …] → sort → label map → (label, prob). Loading the label mapping / vocab is itself custom code: TF SavedModel (assets), Keras decode_predictions, or hand-rolled.
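The sort-plus-label-map step is simple, but it must match exactly between training and serving. A minimal sketch in plain Python (the probabilities and dog-breed labels are illustrative):

```python
# Raw model output: one probability per class index.
probs = [0.02, 0.82, 0.07, 0.09]

# Label map loaded separately (e.g. from a SavedModel asset or a vocab file).
label_map = {0: "bluetick", 1: "beagle", 2: "whippet", 3: "basset"}

# Sort class indices by probability, descending, and attach labels (top-3).
top_k = sorted(enumerate(probs), key=lambda ip: ip[1], reverse=True)[:3]
predictions = [(label_map[i], p) for i, p in top_k]

print(predictions)  # [('beagle', 0.82), ('basset', 0.09), ('whippet', 0.07)]
```

If the serving side loads a label map in a different order than training used, every prediction is silently wrong, which is why the deck treats this as part of the pipeline, not an afterthought.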
  18. Pipelines, not models – Deploying just the model part of the workflow is not enough – The entire pipeline must be deployed: • Data transforms • Feature extraction & pre-processing • DL / ML model • Prediction transformation – Even ETL is part of the pipeline! – Pipelines in frameworks: • scikit-learn • Spark ML pipelines • TensorFlow Transform • pipeliner (R)
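Slide 18's point is that pre-processing and model are one fitted artifact. A minimal sketch of what that looks like in scikit-learn, one of the frameworks the slide names (assuming scikit-learn is installed; the dataset and estimator choices are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The scaler and the model are fitted together and must be
# deployed together - exporting only the model would skew inputs.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X, y)

# At serving time, raw features pass through the *same* fitted scaler.
preds = pipeline.predict(X[:5])
print(preds.shape)  # (5,)
```

Serializing this whole `Pipeline` object, rather than just the `LogisticRegression`, is what ONNX-ML aims to make portable across runtimes.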
  19. Challenges – Formats: • Each framework does things differently • Proprietary formats: lock-in, not portable – Lack of standardization leads to custom solutions and extensions – Need to manage and bridge many different: • Languages – Python, R, notebooks, Scala / Java / C • Frameworks – too many to count! • Dependencies • Versions – Performance characteristics can be highly variable across these dimensions – Friction between teams: • Data scientists & researchers – latest & greatest • Production – stability, control, minimal change, performance • Business – metrics, business impact, the product must always work!
  20. Containers for ML Deployment
  21. Containers are "The Solution" … right? – Container-based deployment has significant benefits: • Repeatability • Ease of configuration • Separation of concerns – focus on what, not how • Allow data scientists & researchers to use their language / framework of choice • Container frameworks take care of (certain) monitoring, fault tolerance, HA, etc. – But … • What goes in the container is still the most important factor • Performance can be highly variable across language, framework and version • Requires devops knowledge, CI / deployment pipelines, good practices • Does not solve the issue of standardization (formats, APIs exposed) • A serving framework is still required on top
  22. Open Standards for Model Serialization & Deployment
  23. Why a standard? A standard format enables execution, optimization, tooling (viz, analysis, …) and a single stack.
  24. Why an open standard? – Open source vs open standard: an open-source license is only one aspect • OSS licensing allows free use and modification, inspection of the code, etc. • … but may not confer any control – Open governance is critical: • Avoids concentration of control (typically by large companies or vendors) • Visibility of development processes, strategic planning, roadmaps – However, there are downsides: • A standard needs wide adoption and critical mass to succeed • A standard can move slowly in terms of new features, fixes and enhancements • Design by committee • Keeping up with the pace of framework development
  25. Open Neural Network Exchange (ONNX) – Championed by Facebook & Microsoft – Protobuf for the serialization format and type specification – Describes: • the computation graph (inputs, outputs, operators) – a DAG • values (weights) – In this way the serialized graph is "self-contained" – Focused on deep learning / tensor operations – Baked into PyTorch from 1.0.0 / Caffe2 as the serialization & interchange format
  26. ONNX graphs – A single Mul node computing Z = X * Y serializes as: graph { node { input: "X" input: "Y" output: "Z" name: "matmult" op_type: "Mul" } input { name: "X" type { … } } output { name: "Z" type { … } } } Source: http://onnx.ai/sklearn-onnx/auto_examples/plot_pipeline.html
  27. ONNX graphs – SqueezeNet graph visualization. Source: https://github.com/onnx/tutorials/blob/master/tutorials/VisualizingAModel.md
  28. ONNX-ML – Provides support for (parts of) "traditional" machine learning: • Additional types – sequences, maps • Operators – vectorizers (numeric & string data), one-hot encoding, label encoding, scalers (normalization, scaling), models (linear, SVM, TreeEnsemble), … https://github.com/onnx/onnx/blob/master/docs/Operators-ml.md
  29. ONNX-ML – Exporter support: • scikit-learn – 60+ • LightGBM • XGBoost • Apache Spark ML – 25+ • Keras – all layers + TF custom layers • LIBSVM • Apple Core ML https://github.com/onnx/onnxmltools/ http://onnx.ai/sklearn-onnx/index.html https://github.com/onnx/keras-onnx
  30. ONNX-ML for Apache Spark – Exporter support: • Linear models, DT, RF, GBT, NaiveBayes, OneVsRest • Scalers, Imputer, Binarizer, Bucketizer • String indexing, stop words • One-hot encoding, feature selection / slicing • PCA, Word2Vec, LSH https://github.com/onnx/onnxmltools/tree/master/onnxmltools/convert/sparkml Example: initial_types = [("label", StringTensorType([1, 1])), …]; pipeline_model = pipeline.fit(training_data); onnx_model = convert_sparkml(pipeline_model, …, initial_types)
  31. ONNX-ML for Apache Spark – Missing exporters: • Feature hashing, TF-IDF • RFormula • NGram • SQLTransformer • Models – clustering, FP-growth, ALS – Current issues: • Tokenizer – supported but not in the ONNX spec (custom operator) • Limited invalid-data handling • Python / PySpark only
  32. ONNX ecosystem – the ONNX spec at the center, with converters, the ONNX Model Zoo, network visualization tools and other compliant runtimes forming a single stack.
  33. ONNX governance – Move towards an open governance model: • Multiple vendors • High-level steering committee • Special Interest Groups (SIGs) – converters, training, pipelines, model zoos • Working groups – However … not in a foundation. https://github.com/onnx/onnx/tree/master/community
  34. ONNX missing pieces – ONNX: • Operator / converter coverage – e.g. TensorFlow coverage • Image processing – support for basic resize and crop, but no support for reading an image directly (like tf.image.decode_jpeg) • String processing • Comprehensive benchmarks – ONNX-ML: • Types – datetime • Operators – string processing / NLP (e.g. tokenization), hashing, clustering models • Specific exporters – Apache Spark ML is Python-only; very basic tokenization in scikit-learn; no support for the Keras tokenizer • Combining frameworks – still ad hoc, requires custom code
  35. Summary – Pros: • Backing by large industry players • Growing rapidly with lots of momentum • Open governance model • Focused on deep learning operators • ONNX-ML provides some support for "traditional" ML and feature processing – Cons: • Still relatively new • Difficult to keep up with the breadth and depth of framework evolution • Work still required for feature processing and other data types (strings, datetime, etc.) • Limited image & text pre-processing
  36. Conclusion – An open standard for serialization and deployment of deep learning pipelines: • True portability across languages, frameworks, runtimes and versions • Execution environment independent of the producer • One execution stack – Solves a significant pain point for the deployment of ML pipelines in a truly open manner – However, there are risks: • ONNX is still relatively young • Operator / framework coverage • Limitations of the standard • Can one standard encompass all requirements & use cases? Get involved – it's open source (open governance)! https://onnx.ai/
  37. Thank you. Sign up for IBM Cloud and try Watson Studio: https://ibm.biz/BdznGk codait.org twitter.com/MLnick github.com/MLnick developer.ibm.com
