Scale-out Computing Model on Massive Core
System: From HPC to Fabric-Based SoC
Dr. Fu Li
li@qcftech.com
Quantum Cloud Future (Beijing) Technologies Co., Ltd.
Quantum Cloud Future (Beijing) Technology Co. Ltd.
Cook Book
1. What is a Massive Core System (MCS)?
1.1. HPC system
1.2. GPU system
1.3. Microslides: fabric-based SoC
2. Why is scale-out computing important in MCS?
3. How to make MCS faster?
3.1. MPI and OpenMP in HPC
3.2. Memory coalescing and cudaDMA in GPU computing
4. QCF's scale-out computing model for Microslides
4.1. The hardware (Socionext)
4.2. The architecture
4.3. The results (ARM vs. x86 vs. GPU)
Quantum Theory and Spectroscopy
Molecular Dynamics, Fast Fourier Transform
HPC
Content-Centric Networking
Cloud Storage
Doppler ASIC Boba FPGA
MPI, OpenMP, CUDA, Statistical Mechanics
GPU switch
PacketShader
Introduction to Quantum Cloud
With a background in quantum calculation,
1) we perform large-scale molecular dynamics simulations on HPC clusters using Amber and GROMACS,
2) we optimize Fourier transforms and matrix operations on multicore systems.
Then we found that GPUs are a great tool for both molecular dynamics and matrix operations.
Later we found similar systems with massive numbers of CPU cores.
Today we will show some practical examples of our scale-out algorithm on these systems.
System and Cores: Communication Matters
[Figure (QCF & SOCIONEXT): number of cores (1 to 100,000) versus system power consumption in watts (10 to 1M). Systems plotted: PC, server, blade server, supercomputer, GPU, GPU cluster, traditional ARM server, ARM SoC, Microslides of ARM CPU, Microslides of ARM SoC. Region labels: general-purpose, special-purpose, Microslides. Timeline markers: 2006, 2012, 2018. Connection types: intra-CPU connection, inter-CPU connection, cluster connection.]
Data Communication Between Systems Is the Obstacle
[Diagram: two systems, each drawn as the hierarchy cores, L1 cache, L2/L3 cache, intra-CPU fabric, socket bus, memory, and networking, with the shared labels Cache/Storage and I/O.]
A hierarchical structure is critical for the von Neumann architecture.
[Diagram: the two-system hierarchy (cores, L1 cache, L2/L3 cache, intra-CPU fabric, socket bus, memory, networking, cache/storage, I/O) annotated with three levels of parallelism: instruction-level, OS-level, and algorithm-level. Techniques listed: batch, share-nothing; stateless computing; big RAM; avoid context switching; TLB, cache-conscious; big.LITTLE; GPU, FPGA; fast cache, cache prefetch; vector processing, SIMD/AVX.]
Consolidation will be the next wave of innovation in chip design and system optimization:
• I/O consolidation: networking, bus, fabric
• Storage consolidation: memory, cache, networking buffer
Parallelism and Scaling
Fabric-Based ARM SoC
From SOCIONEXT
• PCIe fabric for networking
• 768 cores
• c2c 10 Gbps, 36 µs latency
• 1 TB DDR4 RAM
• 700 watts TDP per chassis
Power per core (watts/core):
• ARM SoC: 1
• x86: 16~25
• GPU: 0.3~0.5
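The ARM SoC figure of about 1 watt per core can be checked from the chassis numbers on this slide. The sketch below is only that arithmetic; attributing the whole chassis TDP to the cores is an assumption made here for illustration, and the function name is hypothetical.

```python
# Figures quoted on the slide for the Socionext chassis.
CHASSIS_TDP_WATTS = 700
CHASSIS_CORES = 768


def watts_per_core(total_watts, cores):
    """Average power per core, assuming all chassis power is attributed to cores."""
    return total_watts / cores


arm_soc = watts_per_core(CHASSIS_TDP_WATTS, CHASSIS_CORES)
print(f"ARM SoC: {arm_soc:.2f} W/core")  # prints "ARM SoC: 0.91 W/core"
```

The result, roughly 0.91 W/core, is consistent with the rounded value of 1 in the list above.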
Cluster Management Tools
PBS
  basis: batch process
  pros: very fast; very flexible; normally used with MPI
  cons: no isolation
  scenario: scientific calculation
OpenStack
  basis: KVM
  pros: very secure; very stable; system-level isolation
  cons: high overhead; slow
  scenario: private cloud
Kubernetes
  basis: container
  pros: fast; secure; production ready
  cons: container apps only; not flexible enough
  scenario: application CI
Mesos
  basis: container/non-container
  pros: fast; compatible with processes and containers; production ready; can be secure
  cons: complexity
  scenario: datacenter OS
Share-Nothing + Message Queue Architecture
Stateless computing architecture
[Diagram: a host with several compute cores and one core dedicated to I/O.]
Use a dedicated core to handle I/O for the host in order to increase throughput.
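The pattern on this slide can be sketched in a few lines. This is a minimal illustration, not QCF's implementation: Python threads stand in for cores, `queue.Queue` stands in for the message queue, and the names `compute_worker`, `io_worker`, and `run_pipeline` are hypothetical. Compute workers keep no shared state and communicate only through messages, while a single dedicated worker performs all output.

```python
import queue
import threading

STOP = object()  # sentinel message telling a worker to shut down


def compute_worker(tasks, results):
    """Stateless worker: each result depends only on the incoming message."""
    while True:
        item = tasks.get()
        if item is STOP:
            break
        results.put((item, item * item))


def io_worker(results, sink):
    """Dedicated I/O worker: the only place where output is written."""
    while True:
        item = results.get()
        if item is STOP:
            break
        sink.append(item)  # stands in for a disk or network write


def run_pipeline(values, n_workers=3):
    tasks, results, sink = queue.Queue(), queue.Queue(), []
    io_thread = threading.Thread(target=io_worker, args=(results, sink))
    workers = [
        threading.Thread(target=compute_worker, args=(tasks, results))
        for _ in range(n_workers)
    ]
    io_thread.start()
    for w in workers:
        w.start()
    for v in values:
        tasks.put(v)
    for _ in workers:
        tasks.put(STOP)  # one stop message per compute worker
    for w in workers:
        w.join()
    results.put(STOP)  # all results are queued before the I/O worker stops
    io_thread.join()
    return sorted(sink)


print(run_pipeline(range(5)))
```

Because the workers share nothing except the queues, adding workers does not require any locking in the compute path, and all I/O cost is concentrated in one place.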
Example: PacketShader on GPU
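Example: Rendering on Arm
[Figure, panel "Render@Baremetal": Intel vs. ARM on the scenes buggy, fishy, cat, bmps, teeglasFX, splash, and poked; vertical axis 0 to 4.]
[Figure, panel "Render@Container": bmw27 and classroom benchmarks on bare metal and with 1, 2, and 4 containers; vertical axis 0 to 2.]
Annotations: 3x improvement under concurrency; 1.8x improvement with multiple concurrent instances.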
[Figure: Intel vs. ARM SoC in three groups (performance, scaled 1, scaled 2); vertical axis 0 to 30.]
scaled 1: performance normalized by clock frequency and core count
scaled 2: performance normalized by clock frequency, core count, and power (watts)
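The slides do not give the exact normalization formula, so the following is only one plausible reading: divide the raw score by frequency times core count for "scaled 1", and additionally by watts for "scaled 2". The `System` class, both function names, and the numbers are hypothetical placeholders chosen for easy arithmetic; they are not measurements from the deck.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    raw_score: float  # measured benchmark performance, higher is better
    freq_ghz: float   # clock frequency in GHz
    cores: int        # number of cores used
    watts: float      # power drawn during the run


def scaled_1(s):
    """Performance per GHz per core (assumed meaning of "scaled 1")."""
    return s.raw_score / (s.freq_ghz * s.cores)


def scaled_2(s):
    """Performance per GHz per core per watt (assumed meaning of "scaled 2")."""
    return scaled_1(s) / s.watts


# Placeholder system, not a measurement from the slides.
example = System(name="example", raw_score=120.0, freq_ghz=2.0, cores=6, watts=10.0)
print(scaled_1(example), scaled_2(example))  # prints "10.0 1.0"
```

Normalizing this way lets a low-frequency, low-power part be compared against a faster part on per-resource efficiency rather than on raw throughput.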
Example: AI on Arm
[Figure: "Caffe@Container, ARM vs. Intel vs. GPU (scaled)": Intel, ARM, and GPU 1070 on three CIFAR-10 runs (CIFAR 10 - 1, CIFAR 10 - 2, CIFAR 10 - 3); vertical axis 0 to 1.6.]
Example: AI on Arm SoC
[Figure: two panels, Training and Inference, each comparing Intel and the SoC on caffe, scaled caffe, darknet, and scaled darknet; vertical axes 0 to 16 and 0 to 9.]
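Quantum Cloud Future (Beijing) Technology Co. Ltd. (hereafter "Quantum Cloud") is a vertical-industry cloud computing company focused mainly on the film and television industry.
Quantum Cloud specializes in bringing the film and television industry to the cloud. Working with internationally known film studios and visual-effects companies, it provides film and television customers with one-stop solutions including production software, graphics workstations, high-performance storage, and rendering services.
ADDRESS Room 2101, Office Tower A, Sanlitun SOHO, No. 8 Gongti North Road, Chaoyang District, Beijing
NUMBER 010-53518265
EMAIL info@lzyco.com
WEBSITE www.lzyco.com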
THANKS

More Related Content

What's hot

User-space Network Processing
User-space Network ProcessingUser-space Network Processing
User-space Network ProcessingRyousei Takano
 
From Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersFrom Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersRyousei Takano
 
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore EraFlow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore EraRyousei Takano
 
計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?Shinnosuke Furuya
 
Early Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic ComputingEarly Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic ComputingDESMOND YUEN
 
Playing BBR with a userspace network stack
Playing BBR with a userspace network stackPlaying BBR with a userspace network stack
Playing BBR with a userspace network stackHajime Tazaki
 
Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand SolutionsMellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand Solutionsinside-BigData.com
 
Hardware architecture of Summit Supercomputer
 Hardware architecture of Summit Supercomputer Hardware architecture of Summit Supercomputer
Hardware architecture of Summit SupercomputerVigneshwarRamaswamy
 
Scallable Distributed Deep Learning on OpenPOWER systems
Scallable Distributed Deep Learning on OpenPOWER systemsScallable Distributed Deep Learning on OpenPOWER systems
Scallable Distributed Deep Learning on OpenPOWER systemsGanesan Narayanasamy
 
JAWS-UG HPC #17 - Supercomputing'19 参加報告 - PFN 福田圭祐
JAWS-UG HPC #17 - Supercomputing'19 参加報告 - PFN 福田圭祐JAWS-UG HPC #17 - Supercomputing'19 参加報告 - PFN 福田圭祐
JAWS-UG HPC #17 - Supercomputing'19 参加報告 - PFN 福田圭祐Preferred Networks
 
Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)Hajime Tazaki
 
2014-4Q-OpenStack-Fall-presentation-public-20150310a
2014-4Q-OpenStack-Fall-presentation-public-20150310a2014-4Q-OpenStack-Fall-presentation-public-20150310a
2014-4Q-OpenStack-Fall-presentation-public-20150310aKen Igarashi
 
LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1Hajime Tazaki
 
MIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platformMIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platformGanesan Narayanasamy
 
Stig Telfer - OpenStack and the Software-Defined SuperComputer
Stig Telfer - OpenStack and the Software-Defined SuperComputerStig Telfer - OpenStack and the Software-Defined SuperComputer
Stig Telfer - OpenStack and the Software-Defined SuperComputerDanny Abukalam
 
Jetson AGX Xavier and the New Era of Autonomous Machines
Jetson AGX Xavier and the New Era of Autonomous MachinesJetson AGX Xavier and the New Era of Autonomous Machines
Jetson AGX Xavier and the New Era of Autonomous MachinesDustin Franklin
 
A Platform for Accelerating Machine Learning Applications
 A Platform for Accelerating Machine Learning Applications A Platform for Accelerating Machine Learning Applications
A Platform for Accelerating Machine Learning ApplicationsNVIDIA Taiwan
 

What's hot (20)

User-space Network Processing
User-space Network ProcessingUser-space Network Processing
User-space Network Processing
 
From Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersFrom Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computers
 
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore EraFlow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
 
計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?
 
Early Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic ComputingEarly Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic Computing
 
Playing BBR with a userspace network stack
Playing BBR with a userspace network stackPlaying BBR with a userspace network stack
Playing BBR with a userspace network stack
 
Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand SolutionsMellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
 
mTCP使ってみた
mTCP使ってみたmTCP使ってみた
mTCP使ってみた
 
Hardware architecture of Summit Supercomputer
 Hardware architecture of Summit Supercomputer Hardware architecture of Summit Supercomputer
Hardware architecture of Summit Supercomputer
 
Scallable Distributed Deep Learning on OpenPOWER systems
Scallable Distributed Deep Learning on OpenPOWER systemsScallable Distributed Deep Learning on OpenPOWER systems
Scallable Distributed Deep Learning on OpenPOWER systems
 
JAWS-UG HPC #17 - Supercomputing'19 参加報告 - PFN 福田圭祐
JAWS-UG HPC #17 - Supercomputing'19 参加報告 - PFN 福田圭祐JAWS-UG HPC #17 - Supercomputing'19 参加報告 - PFN 福田圭祐
JAWS-UG HPC #17 - Supercomputing'19 参加報告 - PFN 福田圭祐
 
Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)
 
2014-4Q-OpenStack-Fall-presentation-public-20150310a
2014-4Q-OpenStack-Fall-presentation-public-20150310a2014-4Q-OpenStack-Fall-presentation-public-20150310a
2014-4Q-OpenStack-Fall-presentation-public-20150310a
 
LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1
 
MIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platformMIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platform
 
Stig Telfer - OpenStack and the Software-Defined SuperComputer
Stig Telfer - OpenStack and the Software-Defined SuperComputerStig Telfer - OpenStack and the Software-Defined SuperComputer
Stig Telfer - OpenStack and the Software-Defined SuperComputer
 
Chainer v4 and v5
Chainer v4 and v5Chainer v4 and v5
Chainer v4 and v5
 
Jetson AGX Xavier and the New Era of Autonomous Machines
Jetson AGX Xavier and the New Era of Autonomous MachinesJetson AGX Xavier and the New Era of Autonomous Machines
Jetson AGX Xavier and the New Era of Autonomous Machines
 
A Platform for Accelerating Machine Learning Applications
 A Platform for Accelerating Machine Learning Applications A Platform for Accelerating Machine Learning Applications
A Platform for Accelerating Machine Learning Applications
 
RAPIDS Overview
RAPIDS OverviewRAPIDS Overview
RAPIDS Overview
 

Similar to Scale-out AI Training on Massive Core System from HPC to Fabric-based SOC

HPC Infrastructure To Solve The CFD Grand Challenge
HPC Infrastructure To Solve The CFD Grand ChallengeHPC Infrastructure To Solve The CFD Grand Challenge
HPC Infrastructure To Solve The CFD Grand ChallengeAnand Haridass
 
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Eric Van Hensbergen
 
Seminar Accelerating Business Using Microservices Architecture in Digital Age...
Seminar Accelerating Business Using Microservices Architecture in Digital Age...Seminar Accelerating Business Using Microservices Architecture in Digital Age...
Seminar Accelerating Business Using Microservices Architecture in Digital Age...PT Datacomm Diangraha
 
Trends and challenges in IP based SOC design
Trends and challenges in IP based SOC designTrends and challenges in IP based SOC design
Trends and challenges in IP based SOC designAishwaryaRavishankar8
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceAmazon Web Services
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceAmazon Web Services
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 
The von Neumann Memory Barrier and Computer Architectures for the 21st Century
The von Neumann Memory Barrier and Computer Architectures for the 21st CenturyThe von Neumann Memory Barrier and Computer Architectures for the 21st Century
The von Neumann Memory Barrier and Computer Architectures for the 21st CenturyPerry Lea
 
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Amazon Web Services
 
組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステムShinnosuke Furuya
 
Parallelism Processor Design
Parallelism Processor DesignParallelism Processor Design
Parallelism Processor DesignSri Prasanna
 
Future Cloud Infrastructure
Future Cloud InfrastructureFuture Cloud Infrastructure
Future Cloud Infrastructureexponential-inc
 
A Framework with Cloud Integration for CNN Acceleration on FPGA Devices
A Framework with Cloud Integration for CNN Acceleration on FPGA DevicesA Framework with Cloud Integration for CNN Acceleration on FPGA Devices
A Framework with Cloud Integration for CNN Acceleration on FPGA DevicesNECST Lab @ Politecnico di Milano
 
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...Paul Brebner
 
Arm Neoverse market update_05122020.pdf
Arm Neoverse market update_05122020.pdfArm Neoverse market update_05122020.pdf
Arm Neoverse market update_05122020.pdfPaul Yang
 
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...PROIDEA
 
Exploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudExploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudRyousei Takano
 
Heterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsHeterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsAnand Haridass
 

Similar to Scale-out AI Training on Massive Core System from HPC to Fabric-based SOC (20)

HPC Infrastructure To Solve The CFD Grand Challenge
HPC Infrastructure To Solve The CFD Grand ChallengeHPC Infrastructure To Solve The CFD Grand Challenge
HPC Infrastructure To Solve The CFD Grand Challenge
 
uCluster
uClusteruCluster
uCluster
 
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
 
Seminar Accelerating Business Using Microservices Architecture in Digital Age...
Seminar Accelerating Business Using Microservices Architecture in Digital Age...Seminar Accelerating Business Using Microservices Architecture in Digital Age...
Seminar Accelerating Business Using Microservices Architecture in Digital Age...
 
Trends and challenges in IP based SOC design
Trends and challenges in IP based SOC designTrends and challenges in IP based SOC design
Trends and challenges in IP based SOC design
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance Performance
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance Performance
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
The von Neumann Memory Barrier and Computer Architectures for the 21st Century
The von Neumann Memory Barrier and Computer Architectures for the 21st CenturyThe von Neumann Memory Barrier and Computer Architectures for the 21st Century
The von Neumann Memory Barrier and Computer Architectures for the 21st Century
 
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
 
組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム
 
Parallelism Processor Design
Parallelism Processor DesignParallelism Processor Design
Parallelism Processor Design
 
Future Cloud Infrastructure
Future Cloud InfrastructureFuture Cloud Infrastructure
Future Cloud Infrastructure
 
A Framework with Cloud Integration for CNN Acceleration on FPGA Devices
A Framework with Cloud Integration for CNN Acceleration on FPGA DevicesA Framework with Cloud Integration for CNN Acceleration on FPGA Devices
A Framework with Cloud Integration for CNN Acceleration on FPGA Devices
 
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
 
Arm Neoverse market update_05122020.pdf
Arm Neoverse market update_05122020.pdfArm Neoverse market update_05122020.pdf
Arm Neoverse market update_05122020.pdf
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
 
Exploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudExploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC Cloud
 
Heterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsHeterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of Systems
 

More from inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 

More from inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
Scale-out AI Training on Massive Core System from HPC to Fabric-based SOC

  • 1. Scale-out Computing Model on Massive Core System: From HPC to Fabric-Based SoC Dr. Fu Li li@qcftech.com Quantum Cloud Future (Beijing) Technologies Co., Ltd.
  • 2. Cook Book. 1. What is a Massive Core System (MCS)? 1.1. HPC system; 1.2. GPU system; 1.3. MicroSlides: fabric-based SoC. 2. Why is scale-out computing important in MCS? 3. How to make MCS faster? 3.1. MPI and OpenMP in HPC; 3.2. Memory coalescing and cudaDMA in GPU computing. 4. QCF's scale-out computing model for Microslides: 4.1. the hardware (Socionext); 4.2. the architecture; 4.3. the result (ARM vs x86 vs GPU).
  • 3. Introduction to Quantum Cloud. [Diagram topics: Quantum Theory and Spectroscopy, Molecular Dynamics, Fast Fourier Transform, HPC, Content-Centric Networking, Cloud Storage, Doppler ASIC, Boba FPGA, MPI, OpenMP, CUDA, Statistical Mechanics, GPU switch, PacketShader.] With a background in quantum calculation, 1) we perform large-scale molecular dynamics simulations on HPC clusters using Amber and Gromacs; 2) we optimize Fourier transforms and matrix operations on multicore systems.
  • 4. Introduction to Quantum Cloud. Then we found that the GPU is a great tool for both molecular dynamics and matrix operations.
  • 5. Introduction to Quantum Cloud. Later we found similar systems with massive numbers of CPU cores.
  • 6. Introduction to Quantum Cloud. Today we will show some practical examples of our scale-out algorithm on these systems.
  • 7. System and Cores: Communication Matters (QCF & SOCIONEXT). [Figure: number of cores (1 to 100,000) versus system power consumption (10 W to 1 MW). General-purpose systems shown: PC, server, blade server, supercomputer.]
  • 8. System and Cores: Communication Matters. [Same figure, adding special-purpose systems: GPU, GPU cluster.]
  • 9. System and Cores: Communication Matters. [Same figure, adding traditional ARM server and ARM SoC.]
  • 10. System and Cores: Communication Matters. [Same figure, adding general-purpose Microslides: Microslides of ARM CPU and Microslides of ARM SoC.]
  • 11. System and Cores: Communication Matters. [Same figure, adding the years 2006, 2012, and 2018 and the labels intra-CPU connection, inter-CPU connection, and cluster connection.]
  • 12. Data Communication Between Systems Is the Obstacle. [Diagram: two systems, each layered as cores, L1 cache, L2/L3 cache, intra-CPU fabric, sockets, bus, memory, and networking, joined by cache/storage I/O.] A hierarchical structure is critical for the Von Neumann architecture.
  • 13. Data Communication Between Systems Is the Obstacle. [Same diagram without the caption.]
  • 14. Data Communication Between Systems Is the Obstacle. [Same diagram, adding three levels: instruction-level parallelism, OS-level parallelism, algorithm-level parallelism.]
  • 15. Data Communication Between Systems Is the Obstacle. [Same diagram, adding techniques: batch, share-nothing, stateless computing; big RAM, avoiding context switching, TLB, cache-conscious design; big.LITTLE, GPU, FPGA; fast cache, cache prefetch; vector processing, SIMD/AVX.]
  • 16. Data Communication Between Systems Is the Obstacle. [Same diagram.] Consolidation will be the next wave of innovation in chip design and system optimization. IO consolidation: networking, bus, fabric. Storage consolidation: memory, cache, networking buffer.
  • 17. Parallel and Scaling
  • 18. Fabric-Based ARM SoC From SOCIONEXT. PCIe fabric for networking; 768 cores; c2c 10 Gbps, 36 microsecond latency; 1 TB DDR4 RAM; 700 watts TDP per chassis. Watt/core: ARM SoC 1; x86 16~25; GPU 0.3~0.5.
  • 19. Cluster Management Tools. PBS: basic unit is the batch process; pros: very fast, very flexible, normally used with MPI; cons: no isolation; scenario: scientific calculation. OpenStack: basic unit is KVM; pros: very secure, very stable, system-level isolation; cons: high overhead, slow; scenario: private cloud. Kubernetes: basic unit is the container; pros: fast, secure, production ready; cons: container apps only, not flexible enough; scenario: application CI. Mesos: basic unit is container or non-container; pros: fast, compatible with processes and containers, production ready, can be secure; cons: complexity; scenario: datacenter OS.
  • 20. Share-Nothing + Message Queue Architecture. Stateless computing architecture. [Diagram: host, core, core, IO core.] Use an "individual" core to do IO for the host to increase throughput.
  • 21. Example: PacketShader on GPU
  • 22. Example: Rendering on Arm. [Chart, Render@Baremetal: Intel vs ARM on the scenes buggy, fishy cat, bmps, teeglasFX, splash, poked. Chart, Render@Container: bmw27 and classroom benchmarks on bare metal and with 1, 2, and 4 containers.] Annotations: 3x improvement under concurrency; 1.8x improvement with multiple concurrent instances.
  • 23. Example: Rendering on Arm. [Chart: performance, scaled 1, and scaled 2, each comparing Intel and ARM SoC.] Scaled 1: performance scaled by frequency and core count. Scaled 2: performance scaled by frequency, core count, and watts.
  • 24. Example: AI on Arm. Caffe@Container, ARM vs Intel vs GPU (scaled). [Chart: CIFAR 10 - 1, CIFAR 10 - 2, CIFAR 10 - 3 for Intel, ARM, and GPU 1070.]
  • 25. Example: AI on Arm SoC. [Two charts, Training and Inference: caffe, scaled caffe, darknet, and scaled darknet, each comparing Intel and SoC.]
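
The share-nothing plus message-queue layout on slide 20 can be illustrated with a minimal Python sketch. It is not QCF's implementation: the names `compute_worker`, `io_worker`, and `run_pipeline` are hypothetical, threads stand in for cores, and squaring a number stands in for the real workload. The point it shows is structural: each compute worker owns a private inbox and keeps no state between messages, and one dedicated IO worker is the only place where results leave the pipeline.

```python
import queue
import threading

STOP = object()  # sentinel that tells a compute worker to exit


def compute_worker(inbox, outbox):
    """Stateless worker: each result depends only on the message received."""
    while True:
        item = inbox.get()
        if item is STOP:
            break
        outbox.put((item, item * item))  # placeholder workload (assumption)


def io_worker(outbox, sink, n_expected):
    """Dedicated IO worker: the single writer for all results."""
    for _ in range(n_expected):
        sink.append(outbox.get())


def run_pipeline(tasks, n_workers=3):
    tasks = list(tasks)
    # One private inbox per worker, so workers share nothing with each other.
    inboxes = [queue.Queue() for _ in range(n_workers)]
    outbox = queue.Queue()
    sink = []

    workers = [
        threading.Thread(target=compute_worker, args=(q, outbox))
        for q in inboxes
    ]
    io = threading.Thread(target=io_worker, args=(outbox, sink, len(tasks)))
    for t in workers + [io]:
        t.start()

    # Round-robin dispatch; any partitioning scheme would do here.
    for i, task in enumerate(tasks):
        inboxes[i % n_workers].put(task)
    for q in inboxes:
        q.put(STOP)

    for t in workers + [io]:
        t.join()
    return sorted(sink)  # arrival order is nondeterministic, so sort


if __name__ == "__main__":
    print(run_pipeline(range(6)))
```

On real hardware the workers would more likely be separate processes or containers pinned to cores, with the IO worker on its own core as the slide suggests; threads are used here only to keep the sketch short and portable.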