Competing on the Global Battleground:
The Great Shift Toward Conceptual Design
June Paik
FuriosaAI
Contents
Introduction to AI chips
How do we build AI chips?
What is the right team to build winning AI chips?
Technology Startups Seen Through Meaning
Introduction to AI chips
Neural Network In A Minute
Poplar graph: ResNet-50 conv1. Input data tensors to the 7x7 convolution (on the right of the image, in green and
yellow) are processed by the convolution vertices into partials (light blue). Reductions (orange) process the partials
and pass them on to the non-linearity (blue). (Source: Graphcore)
The Mark I Perceptron machine
Google TPU Pod
64 2nd-gen TPUs
11.5 petaflops
4 terabytes of memory
2-D toroidal mesh network
AI Chip Scale of Computation
> 1 TOPS > 10 TOPS > 100 TOPS
1 TOPS = 1,000,000,000,000 operations per second
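To make the TOPS scale concrete, the sketch below computes the peak throughput of a multiply-accumulate (MAC) array. It assumes the common convention that one MAC counts as two operations; the array size and clock frequency are hypothetical and are not figures from the talk.

```python
def peak_tops(num_macs: int, clock_hz: float, ops_per_mac: int = 2) -> float:
    """Peak throughput in TOPS (1 TOPS = 1e12 operations per second).

    ops_per_mac=2 is an assumed convention: one multiply plus one add.
    """
    return num_macs * ops_per_mac * clock_hz / 1e12


# Hypothetical example: a 128 x 128 MAC array clocked at 1 GHz.
example_tops = peak_tops(128 * 128, 1e9)
print(f"{example_tops:.3f} TOPS")
```

Under these assumptions the array lands in the "> 10 TOPS" class; peak figures like this ignore memory stalls and utilization.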
Scale of Storage: Size
Mobile model: > 1 MB
Speech / Vision / Translation high-accuracy model: > 100 MB
Recommendation systems: > 1 GB
Mixture of Experts: > 1 TB
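The size classes above follow from simple arithmetic: parameter count times bits per parameter. The sketch below illustrates this with a hypothetical 25-million-parameter model; the parameter count is an assumption chosen for illustration, not a number from the slide.

```python
def model_size_bytes(num_params: int, bits_per_param: int) -> int:
    """Storage needed for the weights alone, in bytes."""
    return num_params * bits_per_param // 8


# Hypothetical model with 25 million parameters.
params = 25_000_000
for bits in (32, 16, 8, 4):
    size_mb = model_size_bytes(params, bits) / 1e6
    print(f"{bits:2d}-bit weights: {size_mb:.1f} MB")
```

The same model moves from the 100 MB class at 32 bits to the tens-of-megabytes class at 8 bits, which is one reason weight precision matters for chip memory sizing.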
Scale of Storage: Bandwidth
[Table: Compute, Data Access, Compute / Access, and BW per 125 TFLOP for Fully Connected, 3 x 3 Conv, Depthwise Separable Conv, Batch Norm, and Layer Norm layers, with R = 3, W = 112, N = 64, p = 0.2.]
Expressions shown: W^4 n^2, W^2 n, R^2 n, R^2 + n, W^2 r^2 n^2, W^2 r^2 n + W^2 n^2, 5 W^2 n
Bandwidth values shown: 3 Gb/s, 3 Tb/s, 30 Tb/s, 800 Tb/s
(Source: Cerebras)
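The idea behind the compute / access ratio can be sketched with a simple counting model. The formulas below are assumptions made for illustration (square W x W feature maps, N input and N output channels, an R x R kernel, each tensor read or written once, 2-byte elements) and may differ from the counting used in the Cerebras table; the parameter p is not used here.

```python
def conv_macs(width: int, kernel: int, channels: int) -> int:
    """MACs for an R x R convolution over a W x W map with N in/out channels."""
    return width**2 * kernel**2 * channels**2


def conv_elements(width: int, kernel: int, channels: int) -> int:
    """Elements touched once: input map + output map + weights (assumed model)."""
    activations = 2 * width**2 * channels
    weights = kernel**2 * channels**2
    return activations + weights


def fc_macs(n: int) -> int:
    """MACs for a fully connected layer with n inputs and n outputs, batch 1."""
    return n * n


def fc_elements(n: int) -> int:
    """Weights plus input and output vectors."""
    return n * n + 2 * n


def required_bandwidth(flops_per_s: float, macs: int, elements: int,
                       bytes_per_element: int = 2) -> float:
    """Bytes per second needed to sustain flops_per_s at this layer's intensity."""
    flops_per_byte = (2 * macs) / (elements * bytes_per_element)
    return flops_per_s / flops_per_byte


target = 125e12  # 125 TFLOP/s, the reference rate named on the slide
conv_bw = required_bandwidth(target, conv_macs(112, 3, 64), conv_elements(112, 3, 64))
fc_bw = required_bandwidth(target, fc_macs(4096), fc_elements(4096))  # 4096 is hypothetical
print(f"3x3 conv: {conv_bw / 1e12:.2f} TB/s, fully connected: {fc_bw / 1e12:.2f} TB/s")
```

Even in this rough model, the fully connected layer needs orders of magnitude more bandwidth than the convolution at the same compute rate, because each weight is used only once.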
Scale of Model Diversities
1. Perceptron (P)
2. Feed Forward (FF)
3. Radial Basis Network (RBF)
4. Recurrent Neural Network (RNN)
5. Long/Short Term Memory (LSTM)
6. Gated Recurrent Unit (GRU)
7. Auto Encoder (AE)
8. Variational AE (VAE)
9. Denoising AE (DAE)
10. Sparse AE (SAE)
11. Markov Chain (MC)
12. Hopfield Network (HN)
13. Boltzmann Machine (BM)
14. Restricted BM (RBM)
15. Deep Belief Network (DBN)
16. Deep Convolutional Network (DCN)
17. Deconvolutional Network (DN)
18. Deep Convolutional Inverse Graphics Network (DCIGN)
19. Generative Adversarial Network (GAN)
20. Liquid State Machine (LSM)
21. Extreme Learning Machine (ELM)
22. Echo State Network (ESN)
23. Deep Residual Network (DRN)
24. Kohonen Network (KN)
25. Support Vector Machine (SVM)
26. Neural Turing Machine (NTM)
Scale of Services
AI Chip
What is the AI chip?
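An AI chip is a semiconductor chip built to process AI computation with the highest performance and efficiency.
An AI chip is the culmination of future engineering, in which Application + Algorithm + Software + Hardware are
organically integrated, and it is a core technology that determines the fundamental competitiveness of the AI industry.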
Ex: Google TPU, Tesla Autopilot, Alexa AI Speaker
AI Chip Global Competition
Competition heating up: vertical & regional
The AI chip market is a global technology battleground,
and it is moving in a vertical direction, country by country and company by company.
Ex: Nvidia, Intel, Google, Amazon, Facebook, Samsung, Qualcomm, ARM, Baidu, Alibaba,
Graphcore, Cerebras, Groq, Cambricon, Horizon Robotics, Habana…
How do we build AI chips?
Weakness & Strength
How do we build AI chips?
AI chip engineering is precision engineering in which many component technologies are applied in combination.
Application
Algorithm
Software
Microarchitecture
Verification
Physical Implementation
Manufacturing
Packaging
Testing
Board Design
What is the strength of our ecosystem?
We do have competitiveness in AI chip manufacturing. Caution: very capital intensive.
Application
Algorithm
Software
Microarchitecture
Verification
Physical Implementation 😐
Manufacturing ☺
Packaging ☺
Testing ☺
Board Design 😐
What is the weakness of our ecosystem?
Our AI chip design competitiveness is very weak compared with global companies.
Application ☹
Algorithm ☹
Software ☹
Microarchitecture ☹
Verification ☹
Physical Implementation
Manufacturing
Packaging ☺
Testing ☺
Board Design ☺
Weakness example: Microarchitecture
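What does it mean to say that microarchitecture is weak?
Microarchitecture is the domain of fundamental conceptual design.
The book "축적의 시간" (Time of Accumulation), a set of proposals for the future of Korean industry, identifies fundamental
conceptual design as the weakest problem area of our industry and as a challenge that must be overcome.
Fundamental conceptual design rests on the power of intellect and is the key that leads to high value-added products.
Furiosa is a company that takes on the challenge of fundamental conceptual design.
The following slides define microarchitecture and
discuss the methodology of microarchitecture design, the essence of fundamental conceptual design.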
Illustration of physical chips
Zoom into a microchip
Microarchitecture = micro + architecture
The chip design company delivers detailed designs, based on its architecture blueprint,
to the fab company, which manufactures the chip.
Great architecture needs a great architect.
Microarchitecture is the domain of fundamental conceptual design.
A great building serves people, enabling the best human activities in the most humane manner possible given the
building materials.
A great microarchitecture serves the computation process, enabling the best applications in the most efficient
manner possible given the silicon, power, and budget.
§ Real estate in the micro world
§ A great architect should know the ins and outs of everything and be able to implement the chip on schedule
within the given budgets
Microarchitect’s toolkit
Fundamental conceptual design must be grounded in the fundamental concepts of the field.
§ Instruction Set Architecture
§ VLIW, SIMD, Vector, Systolic Array
§ Superscalar, Multithreading, Dataflow
§ Pipelining
§ Virtualization
§ Prefetching, Caching
§ IO, Memory subsystem
§ Finite State Machine
§ …
Key Question:
What is the winning architecture for
AI computation?
A more important question:
How can we explore and find the best
architecture and build it?
Build the performance modeling simulator
This is a so-called cycle-accurate simulator: it simulates both the behavior and the performance of the
machine we are building at a very fine granularity and abstraction level, usually that of individual
clock cycles. Building it enforces the discipline of
§ Concrete and precise thinking
§ Data-driven evaluation of the important trade-offs among design choices
The architect should have strong (or at least reasonable) software skills to build this simulator.
An object-oriented language and the event-driven programming paradigm are a natural fit for this job. C++ is the
standard choice.
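As a rough illustration of the event-driven style, the sketch below models a toy core that loads tiles from memory and computes on them, and uses the simulator to compare two design choices (with and without prefetching). Python is used here only for brevity, whereas the slide names C++ as the standard choice. The `Simulator` and `ToyCore` classes, the latencies, and the tile count are all hypothetical; a real cycle-accurate model tracks far more state.

```python
import heapq
import itertools


class Simulator:
    """Minimal event-driven kernel: events are callbacks ordered by cycle."""

    def __init__(self):
        self.now = 0
        self._queue = []
        self._seq = itertools.count()  # tie-breaker: same-cycle events run FIFO

    def schedule(self, delay, callback):
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), callback))

    def run(self):
        while self._queue:
            self.now, _, callback = heapq.heappop(self._queue)
            callback()
        return self.now  # cycle of the last event


class ToyCore:
    """Hypothetical core: one load port, one compute unit, tiles processed in order."""

    def __init__(self, sim, num_tiles, mem_latency, compute_cycles, prefetch):
        self.sim, self.num_tiles = sim, num_tiles
        self.mem_latency, self.compute_cycles = mem_latency, compute_cycles
        self.prefetch = prefetch
        self.ready = 0      # tiles loaded but not yet computed
        self.done = 0       # tiles fully computed
        self.busy = False   # compute unit occupied
        self.start_load(0)

    def start_load(self, index):
        self.sim.schedule(self.mem_latency, lambda: self.load_done(index))

    def load_done(self, index):
        self.ready += 1
        if self.prefetch and index + 1 < self.num_tiles:
            self.start_load(index + 1)  # overlap the next load with compute
        self.try_compute()

    def try_compute(self):
        if not self.busy and self.ready > 0:
            self.ready -= 1
            self.busy = True
            self.sim.schedule(self.compute_cycles, self.compute_done)

    def compute_done(self):
        self.busy = False
        self.done += 1
        if not self.prefetch and self.done < self.num_tiles:
            self.start_load(self.done)  # serial: load only after compute ends
        self.try_compute()


def run_toy(num_tiles, mem_latency, compute_cycles, prefetch):
    sim = Simulator()
    ToyCore(sim, num_tiles, mem_latency, compute_cycles, prefetch)
    return sim.run()


print(run_toy(4, 10, 4, prefetch=False), run_toy(4, 10, 4, prefetch=True))
```

Running both configurations on the same workload is the kind of data-driven trade-off evaluation the slide describes, here reduced to a single number of total cycles.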
Architecture exploration takes time and experience.
Korean industry has neglected this part because we didn't (or couldn't afford to) allocate
enough time to define and explore the design space and arrive at a solid architecture
specification. It takes time because
§ Workload characterization and prediction take time.
§ Simulation needs supercomputer-scale computation.
§ Understanding very detailed design trade-offs simply takes time.
In other words, cultivating intuition, by refining it iteratively through methodical and careful measurement,
takes time.
Time Schedule
Suppose it takes 1.5 to 2 years to build a commercial AI chip from concept to production. We need
to allocate at least 6 to 8 months for performance modeling, which runs in parallel with
implementation.
Performance Modeling / Architecture
RTL Implementation
Software Architecture / Implementation
Verification
Physical Design / Manufacturing
Arch Examples: Quantization (suggested by Google)
§ Aggressive operator fusion: Performing as many operations as possible in a single pass can
lower the cost of memory accesses and provide significant improvements in run time and
power consumption.
§ Compressed memory access: One can optimize memory bandwidth by supporting on-the-fly
decompression of weights (and activations). A simple way to do that is to support lower-
precision storage of weights and possibly activations.
§ Lower-precision 4/8/16-bit arithmetic processing
§ Per-layer selection of bitwidths
§ Per-channel quantization
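The difference between per-layer and per-channel quantization can be sketched in a few lines. The example below assumes symmetric linear quantization (one scale, no zero point), which the slide does not specify; the function names and weight values are hypothetical.

```python
def symmetric_scale(values, bits):
    """Scale that maps the largest magnitude onto the largest signed integer."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def fake_quantize(values, scale, bits):
    """Quantize to signed integers, clamp, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def channel_errors(weights, bits=8, per_channel=True):
    """Largest absolute reconstruction error for each output channel.

    weights is a list of channels, each a list of floats.
    """
    layer_scale = symmetric_scale([v for ch in weights for v in ch], bits)
    errors = []
    for ch in weights:
        scale = symmetric_scale(ch, bits) if per_channel else layer_scale
        recon = fake_quantize(ch, scale, bits)
        errors.append(max(abs(a - b) for a, b in zip(ch, recon)))
    return errors


# Hypothetical layer: one channel with small weights, one with large weights.
layer = [[0.010, -0.020, 0.015], [1.0, -2.0, 1.5]]
per_layer_errors = channel_errors(layer, bits=8, per_channel=False)
per_channel_errors = channel_errors(layer, bits=8, per_channel=True)
print(per_layer_errors, per_channel_errors)
```

With a single per-layer scale, the large channel sets the step size and the small channel loses most of its resolution; per-channel scales avoid this at the cost of storing one scale per channel, which is a hardware design choice in its own right.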
Example of AI chips: Google TPU
Example of AI chips: Furiosa MadRun
Team building:
What's the right team to build
winning AI chips?
New Organization Essential
An organization driven by Application (Business) + Algorithm + Software, structured differently from
the conventional one, is essential.
Any organization that designs a system… will inevitably produce a design whose structure
is a copy of the organization's communication structure. - Conway, Cliff Young
• Large companies are at a disadvantage compared with startups.
• It is not easy for startups either.
FuriosaAI: organization structure
Application + Algorithm + Software Driven
• Application Partners: Naver, BinaryVR, Molocos, Neosapience, Seoul Smart City
Project…
• Algorithm (2)
• Software (6): Compiler, Runtime, Driver, Tool Chains
• Microarchitect (4): NPU Core, NoC, DRAM subsystem
• Logic Design (3)
• Physical Design (1): Outsourcing to SiFive or China Partners
• Manufacturing / Packaging / Board: TSMC or Samsung and Design house.
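Technology Startups Seen Through Meaning:
Furiosa Perspective
Korean History Seen Through Meaning (뜻으로 본 한국역사), Ham Seok-heon (함석헌)
Queen of Suffering
A history of overcoming humiliation, division, oppression, loss, and frustration
Korean technology startups stand on rugged ground in both the global and the domestic ecosystem.
They are a challenger's climb up through a steep terrain of suffering and, at the same time,
a strong will and hope to stand tall in the end despite countless failures.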
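Technology Startups Seen Through Meaning
Ssial (씨알): a seed for new creation
Historical meaning of ssial: the word ssial means min (people). It is the seed that frees us from historical
evil and earns for itself the qualification for new creation.
Ecosystem meaning of the technology startup: a seed that, on the strength of intellect (People + AI), solves
fundamental problems, frees our ecosystem from its existing inertia, and earns for itself the qualification
to create innovative business models.
Keywords: autonomy, fundamentality, purity, vitality, relatedness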
Final Word: ecosystem
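We should do deep research on the local and global ecosystems.
For a technology company, actively cooperating with partners who need its technology is essential. This
cooperation must rest on a deep understanding of the domestic and global ecosystems, together with a thorough
awareness of the company itself.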
Thank you

[TMS 2018] Technology Development / FuriosaAI CEO June Paik (백준호): Opportunities Found on the Global Battleground
