5. Today’s supercomputers
[Figure: hierarchy of a modern supercomputer, from the smallest unit to the largest. Processor core (diagram labels: DF, W, FU, M, clock); multi-core CPU (chip) with L2, L3 cache; multi-chip node (board) with HBM, GPU accelerators, and NIC; rack (connects many nodes); system (connects many racks).]
6. What’s happened in supercomputing
• Mainframe era (circa 1953 - circa 1972)
  • Memory capacity is the main limit
  • All fundamental computer architecture techniques are invented
• First rise of HW acceleration: vector computers (1974-1993)
  • Processing speed on matrix algebra is the main limit
  • SIMD processing, domain-specific architectures
• Rise of massive homogeneous parallelism (1994-2007)
  • Memory bandwidth is the main limit (memory wall)
  • Moore's law boosts clock speed and scale of integration in HIGH VOLUME PRODUCTION processors (killer processors)
  • Parallel architectures built from commodity CPUs overtake vector processors
• The renaissance of acceleration units (2008 - …?)
  • Power consumption is the main limit
  • Hardware specialization allows better power efficiency (FLOPS/W)
  • GPUs are the first example, because they come from a HIGH VOLUME MARKET
  • Need specialized yet widely re-usable, power-efficient accelerator chips
7. Today’s Embedded Systems for IoT
[Figure: an IoT sensor node shown as three blocks: Sense, Analyze, Transmit.
Sense: MEMS microphone, ULP imager, EMG/ECG/EIT; 100 µW to 2 mW.
Analyze: µController with IOs, local memory, and accelerators; was 1 to 25 MOPS in 1 to 10 mW (e.g. Cortex-M), now is >1000 MOPS in 1 to 10 mW.
Transmit: idle ~1 µW, active ~10 mW. Link labels: long range, low BW; short range, BW; low rate (periodic) data; SW update, commands.]
Courtesy of Luca Benini, ETH Zurich
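The figures on this slide can be turned into a rough efficiency comparison. The sketch below is illustrative only: it assumes the upper end of each quoted range (25 MOPS and 10 mW), treats ">1000 MOPS" as exactly 1000 MOPS, reads the idle/active figures as belonging to the transmit block, and uses a hypothetical 1% duty cycle that the slide does not state. All function and variable names are made up for this example.

```python
# Figures read off the slide. Using the upper end of each quoted range
# is an assumption made only to keep the arithmetic simple.
OLD_MOPS = 25.0     # earlier microcontroller class (e.g. Cortex-M)
NEW_MOPS = 1000.0   # lower bound quoted for current nodes with accelerators
BUDGET_MW = 10.0    # power envelope quoted for both generations


def mops_per_mw(mops: float, power_mw: float) -> float:
    """Throughput per unit power. 1 MOPS/mW is the same as 1 GOPS/W."""
    return mops / power_mw


def average_power_mw(duty_cycle: float,
                     active_mw: float = 10.0,
                     idle_mw: float = 0.001) -> float:
    """Time-averaged power of a block that is active for `duty_cycle` of the time."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * active_mw + (1.0 - duty_cycle) * idle_mw


old_eff = mops_per_mw(OLD_MOPS, BUDGET_MW)
new_eff = mops_per_mw(NEW_MOPS, BUDGET_MW)
gain = new_eff / old_eff

# Hypothetical 1% duty cycle; the slide does not state one.
radio_avg_mw = average_power_mw(0.01)

print(f"old: {old_eff:.1f} GOPS/W, new: {new_eff:.1f} GOPS/W, gain: {gain:.0f}x")
print(f"average transmit power at 1% duty cycle: {radio_avg_mw:.3f} mW")
```

Under these assumptions the same 10 mW budget buys 40 times more operations per second. The duty-cycle helper shows why a block that idles at about 1 µW can average far less than its 10 mW active power.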
8. What’s happened in Embedded Systems for IoT
• Wireless Sensor Network era (2000 - 2010)
  • Limited or absent Internet connection; limited, local processing
• First-generation Internet of Things (2010 - 2018)
  • Internet connection; processing delegated to the cloud
• Artificial Intelligence on the Internet of Things (2018 - …?)
  • Need very high computing power for AI applications (the VGG16 convolutional NN requires >100 billion operations per inference)
  • Need to favor local processing (edge computing) over processing off-load (cloud computing) to reduce communication overhead
  • Need very high power efficiency for local processing
  • Hardware acceleration and parallel computation
  • Need a supercomputer on the sensor node
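A back-of-the-envelope calculation illustrates the scale of the problem. The sketch below combines the ">100 billion operations per inference" figure quoted above with the MOPS and mW figures from the earlier IoT slide. It assumes exactly 100 billion operations, sustained peak throughput, and a constant 10 mW draw, none of which the slides guarantee. The function name `inference_cost` is hypothetical.

```python
OPS_PER_INFERENCE = 100e9  # lower bound quoted for VGG16 on the slide


def inference_cost(throughput_mops: float, power_mw: float,
                   ops: float = OPS_PER_INFERENCE) -> tuple[float, float]:
    """Return (seconds, joules) for one inference at a sustained throughput."""
    if throughput_mops <= 0 or power_mw < 0:
        raise ValueError("throughput must be positive and power non-negative")
    seconds = ops / (throughput_mops * 1e6)   # MOPS -> operations per second
    joules = seconds * (power_mw / 1000.0)    # mW -> W, then W * s = J
    return seconds, joules


# Earlier microcontroller class: 25 MOPS in 10 mW (upper end of the quoted ranges).
mcu_seconds, mcu_joules = inference_cost(25.0, 10.0)

# Current node with accelerators: 1000 MOPS in 10 mW (quoted lower bound).
acc_seconds, acc_joules = inference_cost(1000.0, 10.0)

print(f"25 MOPS node:   {mcu_seconds:.0f} s, {mcu_joules:.1f} J per inference")
print(f"1000 MOPS node: {acc_seconds:.0f} s, {acc_joules:.1f} J per inference")
```

Even at 1000 MOPS, one inference takes on the order of 100 seconds under these assumptions. This gap is what motivates hardware acceleration and parallel computation on the node itself.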
9. Summary of the trends...
• HPC is a strategic goal pursued at the political level worldwide
  • The EPI project alone totals 120 million euros in funding
• HPC systems (supercomputers) must target not only higher speed, but also ever higher power efficiency
• HPC history shows that supercomputers need high-volume market devices to remain economically viable while using top-level technology
• Embedded systems (IoT, automotive) demand not only power efficiency, but also ever higher computing speed
• Embedded systems have the market volume to justify mass production of computing devices
• Key enabling technologies in the near future:
  • 7-5-3 nm FinFET processes
  • 3D stacked memories (HBM), memory stacked on the CPU (heat!)
  • Chiplets
  • Optical links