[TMS 2018] Technology Development / FuriosaAI CEO June Paik (백준호): Opportunities Found on the Global Battleground

Competing on the Global Battleground:
The Great Shift Toward Conceptual Design
June Paik
FuriosaAI

Contents
Introduction to AI chips
How do we build AI chips?
What is the right team to build the great winner AI chips?
The Tech Startup Seen Through Its Meaning

Poplar graph: ResNet-50 conv1. The input data tensors to the 7x7 convolution, shown in green and yellow on the right of the image, are processed by the convolution vertices into partials (light blue). Reductions (orange) process the partials and pass them on to the non-linearity (blue). (Source: Graphcore)

Google TPU Pod
64 2nd-gen TPUs
11.5 petaflops
4 terabytes of memory
2-D toroidal mesh network
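
As a rough sanity check, the pod-level figures above can be divided by the device count to get per-device numbers. This is only arithmetic on the slide's values; decimal units (1 TB = 10^12 bytes) are an assumption, and the variable names are illustrative.

```python
# Per-device figures implied by the pod-level numbers on the slide.
POD_DEVICES = 64          # "64 2nd-gen TPUs"
POD_FLOPS = 11.5e15       # 11.5 petaflops
POD_MEMORY_BYTES = 4e12   # 4 terabytes (decimal units assumed)

flops_per_device = POD_FLOPS / POD_DEVICES
memory_per_device = POD_MEMORY_BYTES / POD_DEVICES

print(f"{flops_per_device / 1e12:.1f} TFLOPS per device")
print(f"{memory_per_device / 1e9:.1f} GB per device")
```

Under these assumptions each device contributes roughly 180 TFLOPS and a little over 60 GB of memory.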

AI Chip Scale of Computation
> 1 TOPS, > 10 TOPS, > 100 TOPS
1 TOPS = 1,000,000,000,000 operations per second

Scale of Storage: Size
Speech / Vision / Translation High-Accuracy Model: > 100 MB
Mobile Model: > 1 MB
Recommendation Systems: > 1 GB
Mixture of Experts: > 1 TB
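
To relate these size classes to parameter counts, a minimal sketch follows. It assumes decimal units and that model size is dominated by weights; the helper `params_that_fit` and the class labels are hypothetical names used only for illustration.

```python
# Hypothetical helper: how many parameters fit in a given model size
# at a given storage precision. Decimal units (1 MB = 10**6 bytes) assumed.
SIZE_CLASSES = {
    "mobile": 1e6,
    "speech/vision/translation": 100e6,
    "recommendation": 1e9,
    "mixture of experts": 1e12,
}

def params_that_fit(size_bytes: float, bits_per_param: int) -> int:
    """Number of whole parameters that fit in size_bytes."""
    return int(size_bytes * 8 // bits_per_param)

for name, size in SIZE_CLASSES.items():
    fp32 = params_that_fit(size, 32)
    int8 = params_that_fit(size, 8)
    print(f"{name:28s} fp32: {fp32:>15,d}  int8: {int8:>15,d}")
```

The same storage holds four times as many 8-bit parameters as 32-bit ones, which is one reason precision matters for where a model can live in the memory hierarchy.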

Scale of Storage: Bandwidth
Parameters: R = 3, W = 112, N = 64, p = 0.2
Layer                      Compute               Data access   Compute / access   BW per 125 TFLOP
Fully connected            W^4 N^2               W^2 N         W^2 N              3 Gb/s
3 x 3 conv                 W^2 R^2 N^2           W^2 N         R^2 N              3 Tb/s
Depthwise separable conv   W^2 R^2 N + W^2 N^2   W^2 N         R^2 + N            30 Tb/s
Batch norm / layer norm    5 W^2 N               W^2 N         5                  800 Tb/s
(Source: Cerebras)
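
The bandwidth column can be approximately reproduced from the compute and data-access expressions. The sketch below uses the formulas as read from this slide with R = 3, W = 112, N = 64; the 16-bit element width (`BITS_PER_ELEMENT`) is an assumption that the slide does not state, and p is not used.

```python
# Reproduce the compute/access ratios, using the slide's parameters.
# BITS_PER_ELEMENT is an assumption, not a value given on the slide.
W, R, N = 112, 3, 64
PEAK_OPS = 125e12          # "BW per 125 TFLOP"
BITS_PER_ELEMENT = 16      # assumed 16-bit elements

# layer name -> (compute operations, data elements accessed)
layers = {
    "fully connected":          (W**4 * N**2,                   W**2 * N),
    "3x3 conv":                 (W**2 * R**2 * N**2,            W**2 * N),
    "depthwise separable conv": (W**2 * R**2 * N + W**2 * N**2, W**2 * N),
    "batch/layer norm":         (5 * W**2 * N,                  W**2 * N),
}

def required_bandwidth(compute: int, access: int) -> float:
    """Bits per second needed to keep PEAK_OPS busy."""
    ops_per_element = compute / access
    return PEAK_OPS / ops_per_element * BITS_PER_ELEMENT

for name, (compute, access) in layers.items():
    ratio = compute / access
    bw = required_bandwidth(compute, access)
    print(f"{name:26s} ops/element = {ratio:>9.0f}   bandwidth ~ {bw / 1e12:8.3f} Tb/s")
```

With these assumptions the first three rows land close to the slide's rounded 3 Gb/s, 3 Tb/s, and 30 Tb/s. The normalization row comes out at 400 Tb/s rather than 800 Tb/s, so the original figure presumably counts element width or read/write traffic differently. The point of the table survives either way: layers with low compute per element are bandwidth bound.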

Scale of Model Diversities
1. Perceptron (P)
2. Feed Forward (FF)
3. Radial Basis Network (RBF)
4. Recurrent Neural Network (RNN)
5. Long / Short Term Memory (LSTM)
6. Gated Recurrent Unit (GRU)
7. Auto Encoder (AE)
8. Variational AE (VAE)
9. Denoising AE (DAE)
10. Sparse AE (SAE)
11. Markov Chain (MC)
12. Hopfield Network (HN)
13. Boltzmann Machine (BM)
14. Restricted BM (RBM)
15. Deep Belief Network (DBN)
16. Deep Convolutional Network (DCN)
17. Deconvolutional Network (DN)
18. Deep Convolutional Inverse Graphics Network (DCIGN)
19. Generative Adversarial Network (GAN)
20. Liquid State Machine (LSM)
21. Extreme Learning Machine (ELM)
22. Echo State Network (ESN)
23. Deep Residual Network (DRN)
24. Kohonen Network (KN)
25. Support Vector Machine (SVM)
26. Neural Turing Machine (NTM)

AI Chip
What is the AI chip?
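An AI chip is a semiconductor chip built to process AI computation with the highest performance and efficiency.
An AI chip is the crystallization of future engineering, in which application, algorithm, software, and hardware are organically integrated, and it is a core technology that determines the fundamental competitiveness of the AI industry.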
Ex: Google TPU, Tesla Autopilot, Alexa AI Speaker

AI Chip Global Competition
Competition heating up: vertical & regional
The AI chip market is a global technology battleground,
and it is moving in a vertical direction, country by country and company by company.
Ex: Nvidia, Intel, Google, Amazon, Facebook, Samsung, Qualcomm, ARM, Baidu, Alibaba,
Graphcore, Cerebras, Groq, Cambricon, Horizon Robotics, Habana…

Weakness & Strength

AI chip engineering is precision engineering in which many component technologies are applied in combination.
Application
Algorithm
Software
Microarchitecture
Verification
Physical Implementation
Manufacturing
Packaging
Testing
Board Design

What is the strength of our ecosystem?
We are competitive in AI chip manufacturing. Caution: very capital intensive.
Application
Algorithm
Software
Microarchitecture
Verification
Physical Implementation (moderate)
Manufacturing (strong)
Packaging (strong)
Testing (strong)
Board Design (moderate)

What is the weakness of ecosystem?
Our AI chip design competitiveness is very weak compared with global companies.
Application (weak)
Algorithm (weak)
Software (weak)
Microarchitecture (weak)
Verification (weak)
Physical Implementation
Manufacturing
Packaging (strong)
Testing (strong)
Board Design (strong)

Weakness example: Microarchitecture
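What does it mean to say that microarchitecture is weak?
Microarchitecture belongs to the domain of fundamental conceptual design.
The book 축적의 시간 ("Time of Accumulation"), a set of proposals for the future of Korean industry, identifies fundamental conceptual design as the weakest problem area in our industry and as a challenge that must be overcome.
Fundamental conceptual design rests on the power of intellect and is the key to products with high added value.
Furiosa is a company that takes on fundamental conceptual design.
The following slides define microarchitecture and discuss the methodology for designing it, which is the essence of fundamental conceptual design.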

Illustration of physical chips
Zoom into a microchip

Microarchitecture = micro + architecture
The chip design company hands a detailed design, based on its architecture blueprint, to the fab company, which manufactures the chip.

Great architecture needs a great architect.
Microarchitecture belongs to the domain of fundamental conceptual design.
A great building serves people, enabling the best human activities in the most humane manner possible given the building materials.
A great microarchitecture serves the computation process, enabling the best applications in the most efficient manner possible given the silicon, power, and budget.
§ Real estate in the micro world
§ A great architect should know the ins and outs of everything and be able to implement the chip on schedule within the given budget

Microarchitect’s toolkit
Fundamental conceptual design must be grounded in the fundamental concepts of the field.
§ Instruction Set Architecture
§ VLIW, SIMD, Vector, Systolic Array
§ Superscalar, Multithreading, Dataflow
§ Pipelining
§ Virtualization
§ Prefetching, Caching
§ IO, Memory subsystem
§ Finite State Machine
§ …

Key Question:
What is the great winner architecture for
AI computation?

More important questions
How can we explore and find the best
architecture and build it?

Build the performance modeling simulator
It is a so-called cycle-accurate simulator: it simulates both the behavior and the performance of the machine we are building at a very fine granularity and abstraction level, usually that of a clock cycle. This enforces the discipline of
§ Concrete and precise thinking
§ Data-driven evaluation of the important trade-offs among design choices
The architect should have strong (or at least reasonable) software skills to build this simulator.
An OOP language and the event-driven programming paradigm are a natural fit for this job. C++ is the standard choice.
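
To make the idea concrete, here is a minimal sketch of an event-driven, cycle-level simulator. It is written in Python for brevity rather than the C++ the slide names as the standard choice, and it models a made-up machine, not any real chip: `Simulator`, `ToyAccelerator`, the tile counts, and the latencies are all hypothetical. It shows how one design choice (the number of on-chip buffers) can be evaluated with data.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass(order=True)
class Event:
    cycle: int
    seq: int  # tie-breaker keeps same-cycle events in FIFO order
    action: Callable[[], None] = field(compare=False)

class Simulator:
    """Minimal event-driven kernel: events fire in clock-cycle order."""
    def __init__(self) -> None:
        self.now = 0
        self._queue: List[Event] = []
        self._seq = 0

    def schedule(self, delay: int, action: Callable[[], None]) -> None:
        heapq.heappush(self._queue, Event(self.now + delay, self._seq, action))
        self._seq += 1

    def run(self) -> int:
        while self._queue:
            event = heapq.heappop(self._queue)
            self.now = event.cycle
            event.action()
        return self.now

class ToyAccelerator:
    """Hypothetical machine: one DMA engine feeding one compute unit."""
    def __init__(self, sim: Simulator, num_tiles: int, load_cycles: int,
                 compute_cycles: int, buffers: int) -> None:
        self.sim = sim
        self.load_cycles = load_cycles
        self.compute_cycles = compute_cycles
        self.free_buffers = buffers
        self.tiles_to_load = num_tiles
        self.ready_tiles = 0
        self.dma_busy = False
        self.compute_busy = False
        self.busy_cycles = 0

    def try_start(self) -> None:
        # Start a load if the DMA is idle and a buffer is free.
        if not self.dma_busy and self.tiles_to_load and self.free_buffers:
            self.dma_busy = True
            self.tiles_to_load -= 1
            self.free_buffers -= 1
            self.sim.schedule(self.load_cycles, self.load_done)
        # Start computing if the unit is idle and a loaded tile is waiting.
        if not self.compute_busy and self.ready_tiles:
            self.compute_busy = True
            self.ready_tiles -= 1
            self.busy_cycles += self.compute_cycles
            self.sim.schedule(self.compute_cycles, self.compute_done)

    def load_done(self) -> None:
        self.dma_busy = False
        self.ready_tiles += 1
        self.try_start()

    def compute_done(self) -> None:
        self.compute_busy = False
        self.free_buffers += 1  # the tile's buffer can be reused
        self.try_start()

def simulate(buffers: int) -> Tuple[int, float]:
    """Return (total cycles, compute utilization) for a buffer count."""
    sim = Simulator()
    acc = ToyAccelerator(sim, num_tiles=8, load_cycles=10,
                         compute_cycles=10, buffers=buffers)
    acc.try_start()
    total = sim.run()
    return total, acc.busy_cycles / total

for b in (1, 2):
    total, util = simulate(b)
    print(f"buffers={b}: {total} cycles, compute utilization {util:.0%}")
```

In this toy setup a single buffer serializes load and compute (160 cycles, 50% utilization), while double buffering overlaps them (90 cycles, about 89%). A real performance model applies the same pattern to far more detailed components.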

Arch exploration takes time and experience.
Korean industry has neglected this part because we didn't (or couldn't afford to) allocate enough time to define and explore the design space and arrive at a solid architecture specification. It takes time because
§ Workload characterization and prediction take time.
§ Simulation needs supercomputer-scale computation.
§ Understanding very detailed design trade-offs just takes time.
In other words, cultivating intuition by refining it iteratively against methodical, well-chosen measurements takes time.

Time Schedule
So let's say it takes 1.5~2 years to build a commercial AI chip from concept to production. We need to allocate at least 6~8 months for performance modeling, which runs in parallel with the implementation.
Performance Modeling / Architecting
RTL Implementation
Software Architecting / Implementation
Verification
Physical Design / Manufacturing

Arch Examples: Quantization (suggested by Google)
§ Aggressive operator fusion: Performing as many operations as possible in a single pass can
lower the cost of memory accesses and provide significant improvements in run-time and
power consumption
§ Compressed memory access: One can optimize memory bandwidth by supporting on the fly
de-compression of weights (and activations). A simple way to do that is to support lower
precision storage of weights and possibly activations.
§ Lower precision 4/8/16 bit arithmetic processing
§ Per-layer selection of bitwidths
§ Per-channel quantization
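
The last two items can be illustrated with a small sketch comparing one scale per layer against one scale per output channel. It assumes symmetric signed 8-bit quantization, which the slide does not specify; the weights are made-up illustration data and the function names are hypothetical.

```python
# Symmetric int8 quantization of a weight matrix, per layer vs per channel.
# The scheme and the weights are assumptions made for illustration.
from typing import List

QMAX = 127  # largest magnitude representable in signed 8-bit

def quantize(values: List[float], scale: float) -> List[int]:
    """Map floats to integers in [-QMAX, QMAX] using the given scale."""
    return [max(-QMAX, min(QMAX, round(v / scale))) for v in values]

def dequantize(q: List[int], scale: float) -> List[float]:
    return [x * scale for x in q]

def channel_errors(weights: List[List[float]], scales: List[float]) -> List[float]:
    """Worst-case reconstruction error of each output channel."""
    errors = []
    for row, scale in zip(weights, scales):
        restored = dequantize(quantize(row, scale), scale)
        errors.append(max(abs(a - b) for a, b in zip(row, restored)))
    return errors

# Two output channels with very different dynamic ranges.
weights = [
    [0.010, -0.020, 0.015, 0.005],   # small-range channel
    [1.500, -2.000, 0.750, 1.250],   # large-range channel
]

# Per-layer: a single scale shared by every channel.
layer_scale = max(abs(w) for row in weights for w in row) / QMAX
per_layer_err = channel_errors(weights, [layer_scale] * len(weights))

# Per-channel: each output channel gets its own scale.
channel_scales = [max(abs(w) for w in row) / QMAX for row in weights]
per_channel_err = channel_errors(weights, channel_scales)

for i, (a, b) in enumerate(zip(per_layer_err, per_channel_err)):
    print(f"channel {i}: per-layer error {a:.6f}, per-channel error {b:.6f}")
```

With a shared scale, the small-range channel is quantized to only a handful of integer levels and loses most of its information. With its own scale it uses the full 8-bit range, while the large-range channel is unaffected.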

Example of AI chips: Google TPU

Example of AI chips: Furiosa MadRun

Team building:
What’s the right team to build great
winner AI chips?

New Organization Essential
An organization that is driven by application (business) + algorithm + software, and therefore structured differently from existing ones, is essential.
Any organization that designs a system… will inevitably produce a design whose structure is a copy of the organization's communication structure. - Conway, Cliff Young
• Large companies are at a disadvantage compared with startups.
• It is not easy for startups either.

FuriosaAI: organization structure
Application + Algorithm + Software Driven
• Application Partners: Naver, BinaryVR, Molocos, Neosapience, Seoul Smart City
Project…
• Algorithm (2)
• Software (6): Compiler, Runtime, Driver, Tool Chains
• Microarchitect (4): NPU Core, NoC, DRAM subsystem
• Logic Design (3)
• Physical Design (1): Outsourcing to SiFive or China Partners
• Manufacturing / Packaging / Board: TSMC or Samsung and Design house.

The Tech Startup Seen Through Its Meaning:
Furiosa Perspective

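Korean History Seen Through Its Meaning (뜻으로 본 한국 역사), Ham Seok-heon (함석헌)
Queen of Suffering
A history of overcoming humiliation, division, oppression, loss, and frustration
A Korean technology startup, from its rugged position in the global and domestic ecosystems, is a challenging climb up the steep terrain of suffering and, at the same time, a strong will and hope to stand tall in the end despite countless failures.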

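The Tech Startup Seen Through Its Meaning
Ssial (seed) for new creation
Historical meaning of ssial: the word ssial means "the people" (민). It is the seed for liberating ourselves from historical evil and for earning, through our own effort, the qualification for new creation.
Ecosystem meaning of the tech startup: it is the seed for solving fundamental problems on the basis of intellect (People + AI), liberating our ecosystem from its existing inertia, and earning, through our own effort, the qualification to create innovative business models.
Keywords: autonomy, fundamentality, purity, vitality, relationality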

Final Word: ecosystem
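We should do deep research on the local and global ecosystems.
For a technology company, actively cooperating with partners who need its technology is essential, and this must rest on a deep understanding of the domestic and global ecosystems together with a thorough awareness of itself.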
