This document discusses neural network file formats and inference frameworks. It describes common file formats, such as ONNX and NNEF, that can represent neural networks. Inference frameworks such as TensorFlow Lite, OpenVINO, and ONNX Runtime provide APIs to run these models. They operate at two levels: constructing a compute graph from the model file, then executing that graph with kernel functions optimized for different hardware targets such as CPUs, GPUs, and NPUs. The document also introduces the Kneron NPU and its compiler flow for optimized inference.
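The two levels mentioned above can be illustrated with a toy sketch. This is not the API of any real framework; it is a minimal, assumed model of the pattern: a graph of named operator nodes is built first, then executed by dispatching each node to a per-operator "kernel" function (here plain Python, standing in for the hardware-optimized kernels a real runtime would select).

```python
from typing import Callable, Dict, List

# Level 2: "kernels" -- per-operator functions. In a real framework these
# would be optimized for the target hardware (CPU SIMD, GPU, NPU).
KERNELS: Dict[str, Callable] = {
    "add": lambda a, b: [x + y for x, y in zip(a, b)],
    "relu": lambda a: [max(0.0, x) for x in a],
}

class Node:
    """One operator in the compute graph, connecting named tensors."""
    def __init__(self, op: str, inputs: List[str], output: str):
        self.op, self.inputs, self.output = op, inputs, output

def run_graph(nodes: List[Node], feeds: Dict[str, list]) -> Dict[str, list]:
    # Level 1: walk the graph (assumed already in topological order),
    # dispatching each node to its kernel and storing results by name.
    tensors = dict(feeds)
    for n in nodes:
        tensors[n.output] = KERNELS[n.op](*(tensors[i] for i in n.inputs))
    return tensors

# Graph computing y = relu(a + b)
graph = [Node("add", ["a", "b"], "t0"), Node("relu", ["t0"], "y")]
result = run_graph(graph, {"a": [1.0, -2.0], "b": [0.5, 0.5]})
print(result["y"])  # [1.5, 0.0]
```

Real runtimes add much more on top of this skeleton (shape inference, graph optimization passes, memory planning, and hardware-specific kernel selection), but the separation between graph construction and kernel execution is the same.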