ncnn: A universal and efficient neural network inference with Vulkan

Mobility Technologies Co., Ltd.
郭　卓然

Today's contents
- Overview of ncnn
- Optimizations performed by ncnn
- Benchmarks

What is ncnn?
● A high-performance neural network inference framework from Tencent
● https://github.com/Tencent/ncnn
reference: https://www.bilibili.com/video/BV1fQ4y1Z7Yx/

Overview of ncnn
● Open source since 2017
● Accelerated with the Vulkan API
(Vulkan is a graphics and compute API for GPUs)
● Supports almost all major NN frameworks
○ PyTorch, TensorFlow, ONNX, etc.
● Runs on a range of edge devices (CPU and GPU)
○ NVIDIA, AMD, Intel, Qualcomm, Apple, etc.
● Used in several Tencent applications
○ WeChat, QQ, etc.

ncnn AI demos
● Object detection
● Face parsing
● Super resolution
● Frame interpolation
reference: https://github.com/Tencent/ncnn

 
 
 
Platforms
reference: https://github.com/Tencent/ncnn

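Memory pool reuse
Layer chain: blob1 → convolution → blob2 → relu (in-place) → blob3 → pooling → blob4
Memory pool usage, step by step: (blob1, blob2) → (blob2/blob3) → (blob4, blob3)
Only two blocks of memory are in use at any point during the computation.
blob = short for Binary Large OBject: an object that represents binary data, or the type used to store it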
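The slot reuse in this slide can be sketched in a few lines of Python. This is a minimal illustration for a simple layer chain, not ncnn's allocator: the function `assign_slots`, the tuple layout of `layers`, and the free-list policy are all assumptions made for the example.

```python
def assign_slots(layers, input_blob):
    """Assign each blob to a memory slot, reusing slots that are no longer needed.

    layers: list of (layer_name, input_blob, output_blob, inplace) tuples
            describing a simple chain (hypothetical format for this sketch).
    """
    free_slots = []   # slots whose contents have already been consumed
    slot_of = {}      # blob name -> slot index
    slot_count = 0    # total number of slots ever allocated

    def acquire():
        nonlocal slot_count
        if free_slots:
            return free_slots.pop()      # reuse a released slot
        slot_count += 1
        return slot_count - 1            # allocate a new slot

    slot_of[input_blob] = acquire()
    for _name, src, dst, inplace in layers:
        if inplace:
            slot_of[dst] = slot_of[src]  # in-place layer writes over its input
        else:
            slot_of[dst] = acquire()
            free_slots.append(slot_of[src])  # input is dead after this layer
    return slot_of, slot_count


layers = [
    ("convolution", "blob1", "blob2", False),
    ("relu", "blob2", "blob3", True),
    ("pooling", "blob3", "blob4", False),
]
slot_of, slot_count = assign_slots(layers, "blob1")
print(slot_of, slot_count)
```

With the three layers from the slide, blob4 ends up in the slot that blob1 released, so the whole chain runs with two slots.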

Operator fusion
Before: blob1 → x=min(x,10) → blob2 → x=max(x,0) → blob3
After: blob1 → x=torch.clamp(x, min=0, max=10) → blob3
Fusing the min and max operators into a single operation speeds up inference.
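As a rough sketch of the rewrite shown on this slide, the following Python fuses a `min` followed by a `max` into one `clamp` on a toy operator list. The tuple-based op format and the functions `fuse_min_max` and `run` are hypothetical and only serve the example; ncnn's real graph optimizer works on its own model format.

```python
def fuse_min_max(ops):
    """Replace each ("min", hi) followed by ("max", lo) with ("clamp", lo, hi)."""
    fused = []
    i = 0
    while i < len(ops):
        op = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        # The rewrite is only valid when lo <= hi.
        if op[0] == "min" and nxt is not None and nxt[0] == "max" and nxt[1] <= op[1]:
            fused.append(("clamp", nxt[1], op[1]))
            i += 2
        else:
            fused.append(op)
            i += 1
    return fused


def run(ops, x):
    """Apply the operator list to a scalar x."""
    for op in ops:
        if op[0] == "min":
            x = min(x, op[1])
        elif op[0] == "max":
            x = max(x, op[1])
        elif op[0] == "clamp":
            x = min(max(x, op[1]), op[2])
    return x


ops = [("min", 10), ("max", 0)]
fused = fuse_min_max(ops)
print(fused)
```

The fused list has one operator instead of two, which removes the intermediate blob2 and one pass over the data.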

 
 
 
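Small bits representation (FP16/BF16)
FP32 Tensor → FP16 Tensor: FP16 operator kernels, supported on architectures such as A55 and A75
FP32 Tensor → BF16 Tensor: BF16 operator kernels, supported on all ARM CPUs
● FP16 and BF16 represent a floating-point number in 16 bits
● 50% memory savings compared with FP32
● Allows efficient operator kernel implementations
● Faster inference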
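To make the two 16-bit formats concrete, here is a small sketch using only Python's `struct` module. The BF16 conversion below simply keeps the upper 16 bits of the FP32 pattern (truncation); that choice and the helper names are assumptions for illustration, not a statement about how ncnn's kernels convert.

```python
import struct


def to_fp16_bytes(x):
    """Pack a float as IEEE 754 half precision (2 bytes)."""
    return struct.pack("<e", x)


def from_fp16_bytes(b):
    return struct.unpack("<e", b)[0]


def to_bf16_bits(x):
    """Keep the upper 16 bits of the FP32 bit pattern (truncation, no rounding)."""
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits32 >> 16


def from_bf16_bits(bits16):
    """Expand BF16 back to a float by zero-filling the lower 16 bits."""
    return struct.unpack("<f", struct.pack("<I", bits16 << 16))[0]


x = 0.1
fp16 = from_fp16_bytes(to_fp16_bytes(x))
bf16 = from_bf16_bits(to_bf16_bits(x))
print(len(struct.pack("<f", x)), len(to_fp16_bytes(x)))  # bytes per value: FP32 vs FP16
print(fp16, bf16)
```

FP16 keeps more mantissa bits (10 versus 7), so it is closer to the original value here, while BF16 keeps the full FP32 exponent range and converts with a plain shift.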

 
 
 
Quantization
FP32 pipeline: convolution → relu → convolution (all blobs are FP32blob)
INT8 pipeline with dequantize: quantize → int8 convolution → dequantize → relu → quantize → int8 convolution → dequantize (all blobs between stages are FP32blob)
INT8 pipeline with requantize: quantize → int8 convolution → requantize → int8 convolution → requantize (blobs: FP32blob, INT8blob, INT8blob)
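The difference between the dequantize and requantize paths can be sketched with scalar arithmetic. This example assumes symmetric int8 quantization with a single scale per tensor and uses a dot product as a stand-in for a convolution; the function names and the power-of-two scales are illustrative choices, not ncnn's API or calibration results.

```python
def clamp_int8(v):
    return max(-127, min(127, v))


def quantize(values, scale):
    """FP32 -> INT8: multiply by the scale, round, clamp."""
    return [clamp_int8(round(v * scale)) for v in values]


def int8_dot(qx, qw):
    """Stand-in for an int8 convolution: integer multiply-accumulate (int32 result)."""
    return sum(a * b for a, b in zip(qx, qw))


def dequantize(acc, scale_in, scale_w):
    """INT32 accumulator -> FP32."""
    return acc / (scale_in * scale_w)


def requantize(acc, scale_in, scale_w, scale_out):
    """INT32 accumulator -> INT8 for the next layer, without an FP32 blob in between."""
    return clamp_int8(round(acc * scale_out / (scale_in * scale_w)))


x = [0.5, -1.0, 0.25]
w = [0.5, 0.25, -0.75]
scale_in, scale_w, scale_out = 64.0, 128.0, 32.0  # illustrative scales

acc = int8_dot(quantize(x, scale_in), quantize(w, scale_w))
via_fp32 = quantize([dequantize(acc, scale_in, scale_w)], scale_out)[0]
direct = requantize(acc, scale_in, scale_w, scale_out)
print(acc, dequantize(acc, scale_in, scale_w), via_fp32, direct)
```

Both paths produce the same int8 value for the next layer; the requantize path just skips materializing the FP32 blob between the two int8 convolutions.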

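CPU/GPU mixed inference
● When some layers cannot run on the GPU, execution must switch to the CPU automatically
● The optimal memory layout differs between CPU and GPU
● ncnn automatically selects the conversion pipeline between CPU and GPU
● The fp16 data type is preferred for transfers on the GPU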
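The fallback idea can be illustrated with a toy placement pass. Everything here is hypothetical: the set `GPU_SUPPORTED`, the layer names, and the plan format are made up for the sketch, and the real conversion steps in ncnn also deal with memory layout and data type, which this example only names.

```python
GPU_SUPPORTED = {"convolution", "relu", "pooling"}  # hypothetical capability list


def place_layers(layer_types, gpu_supported):
    """Run each layer on the GPU when supported, else on the CPU,
    and insert a conversion step whenever the device changes."""
    plan = []
    device = "cpu"  # the input blob is assumed to start in CPU memory
    for layer in layer_types:
        target = "gpu" if layer in gpu_supported else "cpu"
        if target != device:
            plan.append(f"convert {device}->{target}")
            device = target
        plan.append(f"{layer}@{device}")
    if device != "cpu":
        plan.append("convert gpu->cpu")  # bring the final output back to the CPU
    return plan


plan = place_layers(["convolution", "custom_op", "relu"], GPU_SUPPORTED)
print(plan)
```

Each `convert` entry marks a point where the blob would change layout and move between devices, which is why keeping consecutive layers on the same device matters for speed.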

Parallel inference
Dispatch tasks to multiple devices
○ Higher GPU utilization
○ Makes use of the available computing power
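A minimal sketch of dispatching tasks across devices, using a thread pool and round-robin assignment. The device names and `run_on_device` are placeholders for real inference calls; this is one simple scheduling policy chosen for illustration, not ncnn's scheduler.

```python
from concurrent.futures import ThreadPoolExecutor

DEVICES = ["cpu", "gpu0", "gpu1"]  # hypothetical device list


def run_on_device(device, task):
    """Placeholder for running one inference task on the given device."""
    return (task, device)


def dispatch(tasks, devices):
    """Assign tasks to devices round-robin and run them concurrently."""
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        futures = [
            pool.submit(run_on_device, devices[i % len(devices)], task)
            for i, task in enumerate(tasks)
        ]
        return [f.result() for f in futures]  # results in submission order


results = dispatch(["img_a", "img_b", "img_c", "img_d"], DEVICES)
print(results)
```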

 
Platform-specific tricks
● Prefer VkImage on Qualcomm Adreno <= 540
○ Hardware texture fetch gives a large speedup
○ Prefer VkBuffer everywhere else, including Adreno 640+
● Blacklist / whitelist for old, buggy drivers
○ Filter by vendor ID + driver version + Vulkan API version
○ Driver quality improves with Android system upgrades, especially from 8.1 onward
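The two rules on this slide could be expressed roughly as follows. The `GpuInfo` record, the blacklist entry, and the version numbers are invented for the sketch; only the Adreno <= 540 rule and the three filter fields come from the slide. The vendor ID 0x5143 is used here on the assumption that it identifies Qualcomm.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class GpuInfo:
    vendor_id: int
    driver_version: int
    api_version: Tuple[int, int]         # (major, minor) of the Vulkan API
    adreno_model: Optional[int] = None   # e.g. 540, 640; None for non-Adreno GPUs


def preferred_storage(gpu):
    """VkImage for Adreno <= 540 (hardware texture fetch), VkBuffer otherwise."""
    if gpu.adreno_model is not None and gpu.adreno_model <= 540:
        return "VkImage"
    return "VkBuffer"


# Hypothetical entry: (vendor_id, newest known-bad driver version, api_version)
BLACKLIST = [(0x5143, 100, (1, 0))]


def is_blacklisted(gpu, blacklist):
    """Filter on vendor ID + driver version + Vulkan API version."""
    return any(
        gpu.vendor_id == vendor and gpu.driver_version <= bad_driver and gpu.api_version == api
        for vendor, bad_driver, api in blacklist
    )


old_adreno = GpuInfo(vendor_id=0x5143, driver_version=90, api_version=(1, 0), adreno_model=540)
new_adreno = GpuInfo(vendor_id=0x5143, driver_version=512, api_version=(1, 1), adreno_model=640)
print(preferred_storage(old_adreno), is_blacklisted(old_adreno, BLACKLIST))
print(preferred_storage(new_adreno), is_blacklisted(new_adreno, BLACKLIST))
```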

Vulkan continuous integration
● SwiftShader as the Vulkan driver on CPU
○ Makes sure the Vulkan code produces the expected results
reference: https://github.com/Tencent/ncnn

ncnn benchmark
● ncnn benchmark on Apple M1 Silicon (ms, lower is better)

ncnn benchmark
● ncnn benchmark compared with other inference engines
Small model, 1 thread / Large model, 1 thread
reference: https://www.zhihu.com/question/400143354

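ncnn from a business application perspective
- Pros
  - Easy to embed in apps (few dependencies, small binary)
  - Sample executables are provided for multiple platforms
  - Supports iOS and Android, so it can be shared across platforms
- Cons
  - The Vulkan API is required to use the GPU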
