Learning About AI Semiconductors, Session 2: "Blaize Graph Streaming Processor"

Company and Product Overview
Introduction to Blaize
January 2022
Copyright ©2020 Blaize. All rights reserved. Company Confidential.

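Today's Agenda
1. Company introduction (Okada)
2. Blaize products (Okada)
3. The world Blaize is aiming for (Okada)
4. Adoption case studies (Okada)
5. GSP architecture (Kobayashi)
6. Software development environment (Kobayashi)
7. Demo (Kobayashi)
8. Q&A (Okada & Kobayashi)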

Blaize Company Overview
• Founded in 2010
• Main product: AI chip for edge inference, the Graph Streaming Processor (GSP)
• Three products on sale as of fiscal year 2021
• Major shareholders
• More than 300 employees
Worldwide locations
• El Dorado, CA (HQ)
• Automotive CoE
• Hyderabad, India
• Manila, Philippines
• Tokyo, Japan

Blaize Management Team
• Ke Yin: VP of GSP Engineering, Co-founder
• Satyaki Koneru: Chief Technology Officer, Co-founder
• Val G. Cook: Chief Software Architect, Co-founder
• Dinakar Munagala: Chief Executive Officer, Co-founder
• Tom Trill: Vice President, Worldwide Sales
• Harminder Sehmi: Vice President, Finance, Investor Relations
• Santiago Fernandez-Gomez: Vice President, Operations, Quality
• Dmitry Zakharchenko: Vice President R&D

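Markets Blaize Is Targeting
Edge AI demand is surging: computing where the data is
Moving functionality from the data center to the edge
• Growth in edge computing continues to accelerate
• Diverse use cases keep expanding
• Entirely different AI computing requirements
• Broad application development/deployment needs
Segments: Retail, Automotive/Mobility, Robotics/Industry 4.0, Security
• Automotive and mobility: sensor fusion, in-cabin, autonomous vehicles
• Commercial: smart city, smart retail, telecom/5G
• Industrial: 3D sensing solutions, smart factory, non-visible spectrum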

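What Blaize does well = the processing the edge demands
• Simultaneous processing of multiple inputs
• Simultaneous processing of multiple models
  - YOLO
  - ResNet
  - Custom models
• Low latency
• Low power consumption
• No batch processing!
• etc.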
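Approximate system efficiency*, GSP vs. GPU
Metrics compared: required TOPS, power consumption, TOPS optimization, latency, external memory access, cost efficiency (Perf/Watt/$)
Values shown on the slide (row/column alignment lost in extraction): GPU >2x, 1x, 1x, 5x, >+15%, 1x, 6x, 1x, 5x, 1x, 1x, 15x
*Blaize P1600 vs. Nvidia Xavier AGX running Unet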

Blaize Products
BLAIZE AI Edge Computing Platform

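Blaize Products
• AI software development environment: AI Studio™, Picasso SDK™
  An integrated development environment that covers development, optimization, and productization.
• AI computing hardware platform
  Low power, low latency, and low TCO, optimized for edge AI computing.
• GSP architecture: AI architecture developed in-house by Blaize
  Developed for AI applications, the Graph Streaming Processor is 100% programmable and optimized for AI.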

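PATHFINDER P1600 Embedded System-on-Module
GSP embedded platform
Use cases
• Embedded cameras at the sensor edge
• Network edge equipment
Headline figures (labels lost in extraction): 16, 10x, 100%, 10x
Interfaces: Ethernet, CSI, USB, CODEC, I/O…
Commercial & Industrial Grades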

XPLORER™ ACCELERATOR PLATFORMS
Usage scenarios
• Edge and enterprise servers
• Automated optical inspection
• Smart parking, traffic control
• Smart retail
• Network video recorders
• Logistics & factory applications
• Security systems
• Industrial PCs and servers
Xplorer X1600E EDSFF / Xplorer X1600P PCIe
• PCIe 3.0 x4, 4GB LPDDR4
• EDSFF E1.S
• Enterprise Grade
• 10W
• Unlabeled figures on slide: 16, 16-20
Xplorer X1600P-Q
• PCIe 3.0 x4, x16, 16GB LPDDR4
• 65 - 80W
• Commercial & Enterprise Versions
• Unlabeled figures on slide: 64, 80

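Automotive Applications
Many automotive applications run on the same GSP architecture
Replace power-hungry, hard-to-develop GPUs and FPGAs with the scalable Blaize GSP platform
• Vehicle system control and monitoring
• Local image pre-/post-processing
• Local sensor pre-/post-processing
• In-cabin monitoring
• ADAS & autonomous driving
• Infotainment & driver UX
• Centralized sensor fusion for cameras, lidar, and radar
• Centralized computing & AI acceleration...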

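Use case: Semantic segmentation
Blaize Pathfinder P1600
Autonomous driving at low power
Results measured by Blaize
• Latency under 50 ms; latency 25 ms; latency under 25 ms
• Front camera, lane detection
• Surround-view cameras
• System efficiency: 15x
• Power: 1/5
• Performance equal to or better than competing products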

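Smart City Segment
Customer use case
Pathfinder P1600
Application requirements
• 5 to 10 HD PoE cameras
• 3 independent neural networks running at 50 FPS
Monitoring
• Human detection
• Human pose and position
• Vehicle detection and vehicle type
• License plate detection and reading
• Intersection safety and security
1. Better intersection utilization through efficiency and safety
2. License plate reading for toll tracking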

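Retail Security Segment
Customer use case
Xplorer X1600P
Application requirements
• 10 to 20 HD PoE cameras
• 4 independent neural networks running at 50 FPS
Monitoring
• Human detection
• Human pose
• Face mask detection
• Theft
• Store safety and security
• Product detection
1. Theft prevention
2. Retail analytics
3. Mask detection and wearing status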

The World Blaize Is Aiming For

Providing an end-to-end computing platform for edge AI
Hardware
• System-on-chip with an architecture suited to deep neural networks (DNN)
• Accelerators, system-on-modules
Software
• Low-code/no-code development environment (AI Studio)
• Marketplace offering pre-trained models

AI Studio
The first development platform to visualize the operation/development cycle of edge AI applications
Visualizes the edge AI operational flow from idea -> development -> deployment -> management

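Smart agritech, made possible with Blaize AI Studio!
- Even a small improvement makes a big difference to society
Farming professionals are not C++, Python, or machine learning professionals.
Blaize solution
• Inference device: Blaize P1600 SOM
• Build the app by following the guidance in Blaize AI Studio
Results
• Reduced water consumption
• Reduced fertilizer use and pollution
• Improved yield
• Better land management
Demo Video Link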

Adoption Case Studies

Press Releases

Blaize GSP also runs with AXELL's ailia SDK!
• AXELL's 'ailia' SDK is deep learning middleware specialized for inference at the edge. It provides pre- and post-processing along with validated pre-trained models that are publicly available on the internet, as well as original models.
• Customers can easily integrate their AI application with the 'ailia' SDK and run inference with Blaize GSP hardware acceleration via the ailia inference API.
• The 'ailia' SDK also provides a 'Unity' plugin for setting up an application quickly. This demo accesses the webcam and video stream inside Unity, and the application UI runs on the Unity game engine.
Demo: GoogLeNet classification on Blaize GSP
Pipeline: Webcam video stream -> ailia preprocessing -> Blaize GSP GoogLeNet classification -> ailia postprocessing -> Unity for UI -> video output
Stages are connected through the C# API (Unity side) and the ailia API (ailia SDK side).

Thank you!

Blaize GSP (Graph Streaming Processor)
Architectural Differentiation
Tatsuya Kobayashi, Principal Field Application Engineer
https://www.linkedin.com/in/tatsuya-kobayashi/

OpenVX Cross-Vendor Vision and Inferencing
• Image processing and AI inference acceleration is performed through OpenVX.
• OpenVX uses a higher-level graph abstraction to enable optimized cross-vendor drivers.
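To make the "graph abstraction" idea concrete, here is a minimal Python sketch. It is not the OpenVX API: the `Graph` class and its `add_node`, `verify`, and `process` methods are hypothetical names chosen for illustration. The point it shows is that the whole graph is declared first, so the runtime can resolve an execution order (and, on real hardware, optimize) before any data flows.

```python
from graphlib import TopologicalSorter  # Python 3.9+


class Graph:
    """Toy dataflow graph: declare every node first, then verify and run."""

    def __init__(self):
        self.nodes = {}  # node name -> (function, list of input names)

    def add_node(self, name, func, inputs=()):
        self.nodes[name] = (func, list(inputs))

    def verify(self):
        # Resolve a valid execution order once, before any data flows.
        # Inputs that are not nodes are treated as external graph inputs.
        deps = {
            name: {i for i in inputs if i in self.nodes}
            for name, (_, inputs) in self.nodes.items()
        }
        return list(TopologicalSorter(deps).static_order())

    def process(self, **external_inputs):
        values = dict(external_inputs)
        for name in self.verify():
            func, inputs = self.nodes[name]
            values[name] = func(*(values[i] for i in inputs))
        return values


# Usage: a three-node vision-style graph over a tiny list of RGB pixels.
g = Graph()
g.add_node("gray", lambda px: [sum(p) // 3 for p in px], ["frame"])
g.add_node("mask", lambda gray: [1 if v > 127 else 0 for v in gray], ["gray"])
g.add_node("count", sum, ["mask"])
result = g.process(frame=[(255, 255, 255), (0, 0, 0), (200, 100, 90)])
```

Because the application only describes nodes and edges, a vendor runtime is free to schedule, fuse, or stream them as it sees fit, which is the property the slide attributes to the graph abstraction.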

Architecture Secret Sauce – Blaize Graph Streaming Processor
Up to 100x lower memory bandwidth, 10x higher IPS/W/$ system efficiency
Graph Model Processing Steps: Vision Processing/Object Detection Example Processing Comparison
[Figure: object model graph from INPUT (Pixels) through nodes A, B, C, D to OUTPUT (Image), with tasks 1-6, showing instructions/programs and intermediate data; a timeline compares sequential execution of A, B, C, D with streamed execution.]
Legacy Sequential Processing (CPU/GPU)
• Data parallel only
• Tasks completed before data sent to next node
• High memory utilization to store data while tasks complete
CPUs/GPUs: Use of external DRAM storage between steps increases latency, increases power and increases system cost.
Graph Streaming Processing
• Data & task parallel
• Result of task sent to next node when ready
• Low memory utilization as data is sent directly to next node
Blaize GSP massively shrinks bandwidth to external DRAM and host CPU, lowering latency, power and system cost.
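The difference between the two execution styles can be sketched in a few lines of Python. This is a toy model, not Blaize code: a "frame" is a list of tiles, each stage is a plain function, and `peak` simply counts how many intermediate tiles must be held between stages. The function names are hypothetical.

```python
def run_sequential(frame_tiles, stages):
    """Each stage finishes the whole frame before the next stage starts."""
    data = list(frame_tiles)
    peak = 0
    for stage in stages:
        # The full intermediate result is buffered before moving on.
        data = [stage(tile) for tile in data]
        peak = max(peak, len(data))
    return data, peak


def run_streaming(frame_tiles, stages):
    """Each tile flows through every stage as soon as it is ready."""
    out = []
    peak = 0
    for tile in frame_tiles:
        for stage in stages:
            # Only the tile currently in flight is held between stages.
            tile = stage(tile)
            peak = max(peak, 1)
        out.append(tile)
    return out, peak


# Usage: three toy stages applied to a frame of 8 tiles.
stages = [lambda t: t + 1, lambda t: t * 2, lambda t: t - 3]
seq_out, seq_peak = run_sequential(range(8), stages)
str_out, str_peak = run_streaming(range(8), stages)
```

Both functions produce the same output; the sequential version buffers all 8 tiles between stages while the streaming version holds 1. In the slide's terms, the buffered intermediates are what ends up in external DRAM on a CPU/GPU.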

Architectural 'BATCH 1' advantage for multi-stream video processing
• GPU-based solution
  - Batch processing is required to handle multiple camera inputs, which increases latency
  - YUV-RGB conversion is required before batching
  - This increases the amount of memory transfer for colour conversion and introduces latency due to batch processing.
• Blaize (Picasso SDK/GSF)
  - Task-level parallelism eliminates the need for batch processing and reduces latency
  - No batch processing allows different inference graphs to be assigned to different cameras
  - High efficiency and low power consumption due to low memory bandwidth, plus flexibility of implementation
• Efficiency - Task Level Parallelism
• Pre-fetch/Pre-allocate Data
• Minimize Off-chip Memory Traffic
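The latency argument for batch 1 can be illustrated with a simple timing model in Python. All numbers below are made up for illustration and are not measurements; the function names and the single-device, first-come-first-served assumption are also mine, not from the slide.

```python
def batched_latencies(arrivals_ms, batch_infer_ms):
    """Latency per frame when inference waits for the whole batch."""
    start = max(arrivals_ms)  # the batch starts when the last frame arrives
    done = start + batch_infer_ms
    return [done - a for a in arrivals_ms]


def batch1_latencies(arrivals_ms, frame_infer_ms):
    """Latency per frame when each frame is processed on arrival."""
    latencies = []
    free_at = 0  # time at which the device is next idle
    for a in sorted(arrivals_ms):
        start = max(a, free_at)
        free_at = start + frame_infer_ms
        latencies.append(free_at - a)
    return latencies


# Usage: four cameras whose frames arrive 5 ms apart (illustrative values).
arrivals = [0, 5, 10, 15]
with_batch = batched_latencies(arrivals, batch_infer_ms=16)
without_batch = batch1_latencies(arrivals, frame_infer_ms=4)
```

With these assumed values, the earliest frame in the batched case waits for the other three cameras before inference even starts, while in the batch-1 case every frame sees only its own processing time.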

Picasso™ SDK
• The Picasso™ Software Development Kit (SDK) from Blaize includes all the drivers, libraries and tools necessary to create compelling computer vision applications. These applications are fully hardware accelerated by the second generation of Blaize's El Cano Graph Streaming Processor (GSP).
• NetDeploy - a tool that helps convert a neural network inference model into hardware-specific code for running on the Blaize GSP.
• Picasso Libraries - include the OpenVX Computer Vision (CV) APIs, ONNX Operators, and Image Signal Processing (ISP) Modules
• Picasso Python Package to utilise the GSP from Python
• Pre-built NN models
• LLDB Source Level Debugger
• Blaize Performance Profiler
Blaize Picasso Software Development Platform (stack diagram)
• ML Frameworks; Applications (C/C++, OpenVX)
• Picasso Libraries: OpenVX API, ONNX API, ISP API; Picasso Python Package
• Blaize NetDeploy: Automated Neural Network Optimization; Pre-built NN models
• Graph Framework (OpenVX, OpenCL): Graph Compiler, Custom Kernel Compiler (OpenCL C/C++), Graph Runtime
• Dev Kit: Performance Profiler, LLDB Debugger, Integrated Dev Environment

Graph Framework
• OpenVX provides fundamental computer vision support within a capable framework.
• The framework provides for data flow graphs or task-based execution graphs.
• Data flow graphs are accelerated natively by the Blaize hardware.
A few of the key enhancements are listed below:
• Fully hardware accelerated user kernels written in OpenCL C/C++
• Fundamental Computer Vision support
• Optimized Neural Network support with ONNX API
• OpenVX and ONNX API can be accessed from Python
• Varied data types (8-bit integer, half-precision brain floating-point (BF16) numbers)
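As background on the BF16 data type named above: bfloat16 keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits of an IEEE 754 float32. The following Python sketch shows the conversion using only the standard library. It is a generic illustration of the format, not Picasso SDK code, and NaN inputs are not handled specially.

```python
import struct


def float_to_bf16_bits(x):
    """Convert a float to 16 bfloat16 bits using round-to-nearest-even."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Add 0x7FFF plus the lowest kept bit so that ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b):
    """Expand 16 bfloat16 bits back to a float by zero-filling the low bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x


# Usage: 1.0 is exactly representable; 3.14159 loses mantissa precision.
one_bits = float_to_bf16_bits(1.0)
approx_pi = bf16_bits_to_float(float_to_bf16_bits(3.14159))
```

Here `one_bits` is `0x3F80` and `approx_pi` is `3.140625`: the range of float32 is preserved while precision drops to roughly two to three decimal digits, which is the trade-off that makes the format attractive for neural network inference.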

Picasso™ SDK: Creating an AI application
To create an AI application using the Picasso SDK:
• Convert an NN model written in Python for PyTorch/TensorFlow into a C++ ONNX API NN graph file using NetDeploy.
• I/O processing such as camera input is coded in C/C++ and connected to the C++ NN graph code generated by NetDeploy via the OpenVX API.
• OpenVX and custom kernels written in OpenCL can be used to perform pre-processing, post-processing, etc. of the C++ NN graphs.
  - Pre-processing can be ISP processing (YUV-RGB conversion, rescaling the image to the DNN input size).
  - Post-processing can be NMS processing for YOLO.
• Run the binary compiled for the target to perform NN processing.
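For readers unfamiliar with the NMS post-processing step mentioned above, here is a minimal pure-Python sketch of greedy, per-class non-maximum suppression. On the GSP this step would be a custom OpenCL kernel according to the slide; the code below only illustrates the algorithm, and the detection tuple layout `(box, score, class_id)` is an assumption made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over a list of (box, score, class_id) tuples."""
    kept = []
    # Visit detections from highest to lowest score.
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        # Keep a box unless it overlaps a kept box of the same class.
        if all(det[2] != k[2] or iou(det[0], k[0]) <= iou_threshold
               for k in kept):
            kept.append(det)
    return kept


# Usage: two overlapping class-0 boxes, one distant box, one class-1 box.
dets = [
    ((0, 0, 10, 10), 0.9, 0),
    ((1, 1, 11, 11), 0.8, 0),
    ((50, 50, 60, 60), 0.7, 0),
    ((1, 1, 11, 11), 0.6, 1),
]
kept = nms(dets)
```

The second box overlaps the first with an IoU of about 0.68 and is suppressed; the class-1 box at the same position survives because suppression is applied per class.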

DEMONSTRATION
Demonstration of optimized GSP
application reference for PoC design

Graph Streaming Framework (GSF)
Overview
• Optimized evaluation/demonstration tool
  - The Graph Streaming Framework (GSF) is an evaluation and demonstration tool that provides an end-to-end workflow for training, customizing and deploying networks, visualising results and benchmarking performance.
  - This software is only intended to assist customers with quick and easy evaluation of their use cases, creation of prototypes and proof-of-concepts on Blaize hardware.
• Supported networks: YOLOv2/v3/v4, ResNet, UNet, SwiftNet, OpenPose

Demonstration Overview
• Runs YOLOv4/OpenPose on the GSF framework.
• Video decoding, preprocessing, inference, and postprocessing (reshape/NMS) are performed on the Blaize Pathfinder (embedded board).
• The Full HD H.264 video is sent from the host PC to the Pathfinder via a socket (TCP/IP connection).
• The Pathfinder sends the coordinates, class and probability of each bounding box to the host PC. The host PC draws the result on the video and displays it.
• Notes
  - The displayed FPS is derived from the time from camera input to image output. It is not the performance of DNN processing alone.
  - Because this session is online, the screen you see is the display of a remotely connected host PC relayed through the conference stream, so the refresh rate you observe does not reflect the actual rate.
  - Training used open datasets, so the weights/biases are not optimized for the data used in this demo.
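The slides do not describe the wire format used between the Pathfinder and the host PC. As one plausible way to carry bounding-box coordinates, class and probability over a TCP stream, the sketch below uses a length-prefixed JSON message. The message layout, field names, and function names are all hypothetical and are not taken from GSF.

```python
import json
import struct


def encode_detections(frame_id, detections):
    """Serialize detections as a 4-byte big-endian length plus JSON body."""
    body = json.dumps({"frame": frame_id, "detections": detections})
    payload = body.encode("utf-8")
    return struct.pack("!I", len(payload)) + payload


def decode_detections(buffer):
    """Return (message, remaining bytes), or (None, buffer) if incomplete."""
    if len(buffer) < 4:
        return None, buffer
    (length,) = struct.unpack("!I", buffer[:4])
    if len(buffer) < 4 + length:
        return None, buffer
    message = json.loads(buffer[4:4 + length].decode("utf-8"))
    return message, buffer[4 + length:]


# Usage: the board side encodes, the host side decodes from its receive buffer.
wire = encode_detections(
    7, [{"box": [10, 20, 110, 220], "class": "person", "score": 0.91}]
)
message, rest = decode_detections(wire)
```

The length prefix matters because TCP is a byte stream with no message boundaries: the host keeps appending received bytes to a buffer and calls `decode_detections` until it returns `None`, then draws the boxes for each decoded frame.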

Acknowledgements
• The following materials are used in the demonstration
- Video by Olya Kobruseva from Pexels
- Video by Kelly L from Pexels
- Video by Anastasia Shuraeva from Pexels
- Video by George Morina from Pexels
- Video by RODNAE Productions from Pexels
- Video by Mikhail Nilov from Pexels
- Video by Allan Mas from Pexels
