1) A CPU core can cycle over 3 billion times per second, yet light travels only about 10 cm during a single cycle.
2) Servers will soon have 128 CPU cores delivering over 400 billion cycles per second, but most of that power will be wasted waiting for data.
3) By 2022 transistors per chip will have increased 128x, but disk storage cannot keep up with bandwidth demands: it will be relegated to archival use, with DRAM, flash, and phase-change memory holding the active data.
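The 10 cm figure in point 1 can be verified with a one-line calculation; the 3 GHz clock rate is taken from the summary above:

```python
# How far does light travel during one cycle of a 3 GHz CPU?
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # metres per second
CLOCK_HZ = 3e9                        # 3 GHz core, as in the text

def light_distance_per_cycle_cm(clock_hz: float) -> float:
    """Distance light covers during one clock period, in centimetres."""
    period_s = 1.0 / clock_hz
    return SPEED_OF_LIGHT_M_PER_S * period_s * 100

print(round(light_distance_per_cycle_cm(CLOCK_HZ), 1))  # about 10 cm
```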
11. 2010 - 2022
128X increase in transistors per chip
[Diagram: CPU connected to NIC, RAM, flash, and disk]
Moore's Law will continue for at least 10 years
Transistors per area will double roughly every 2 years
128x increase in ~12 years
2022: 512 Gbit per DRAM chip, 8 Tbit per flash chip
Frequency gains are difficult
Pollack's rule: power scales quadratically with clock performance
Parallelism with more cores is a must
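As a sanity check on these numbers, the doubling cadence implied by a 128x increase over 12 years can be back-computed (a sketch; the slide itself only hedges with "~every 2 years"):

```python
import math

def doubling_period_years(growth_factor: float, span_years: float) -> float:
    """Years per doubling implied by a total growth factor over a time span."""
    return span_years / math.log2(growth_factor)

# 128x = 2^7, so 7 doublings must fit into 12 years:
print(round(doubling_period_years(128, 12), 2))  # 1.71 years per doubling
```

Seven doublings in twelve years works out to one every ~1.7 years, slightly faster than a strict two-year cadence.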
12. 2010 - 2022
128X increase in transistors per chip
2014: 64 cores; 2016: 128 cores; 2022: 1024 cores
Memory and IO bandwidth need to grow with processing power
Disks cannot follow!
13. 2010 - 2022
128X increase in transistors per chip

                    2010      2022
Cores per chip      10        1024
Memory bandwidth    40 Gb/s   2.5 Tb/s   (challenging, but needed to feed the cores!)
IO bandwidth        2 Gb/s    250 Gb/s
•No big change: single-core clock rate (will stay < 5 GHz)
•But impressive overall computing power: 5000 (cores × GHz)
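The 5000 core·GHz figure follows directly from the projected core count and clock ceiling; the same projections also show how little memory bandwidth each core would get. A sketch using the slide's 2022 numbers:

```python
CORES_2022 = 1024
CLOCK_GHZ = 5.0        # single-core clock stays below ~5 GHz
MEM_BW_GBIT_S = 2500   # projected 2.5 Tb/s memory bandwidth

def aggregate_compute_core_ghz(cores: int, clock_ghz: float) -> float:
    """Overall computing power in core*GHz, the unit used on the slide."""
    return cores * clock_ghz

def mem_bandwidth_per_core_gbit(total_gbit_s: float, cores: int) -> float:
    """Memory bandwidth each core gets if shared evenly, in Gbit/s."""
    return total_gbit_s / cores

print(aggregate_compute_core_ghz(CORES_2022, CLOCK_GHZ))                 # 5120.0, i.e. ~5000
print(round(mem_bandwidth_per_core_gbit(MEM_BW_GBIT_S, CORES_2022), 2))  # 2.44
```

Roughly 2.4 Gbit/s per core is why the slide calls feeding the cores "challenging".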
14. Disks are Tape
"Spinning rust"
Forget hard disks!
Disks cannot go faster
Disks cannot follow bandwidth requirements
A random-read scan of 1 TB of disk space today takes 15 – 150 days (!)
To reach 1 TB/s you would need 10,000 disks in parallel
Disks are only good as archives any more (sequential access)
DRAM, flash and PCM will be the replacement
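The 15–150 day and 10,000-disk claims can be reproduced with back-of-the-envelope arithmetic. The read size (4 KB) and per-disk figures (20–200 random IOPS, ~100 MB/s sequential) are assumptions for the sketch, not taken from the slide:

```python
def random_scan_days(capacity_bytes: float, read_bytes: int, iops: float) -> float:
    """Days needed to touch every byte of a disk using random reads."""
    reads = capacity_bytes / read_bytes
    return reads / iops / 86_400  # 86,400 seconds per day

def disks_for_bandwidth(target_bytes_s: float, per_disk_bytes_s: float) -> int:
    """How many disks must run in parallel to hit a target bandwidth."""
    return int(target_bytes_s / per_disk_bytes_s)

TB = 1e12
print(round(random_scan_days(TB, 4096, 200), 1))  # ~14 days  (fast disk)
print(round(random_scan_days(TB, 4096, 20), 1))   # ~141 days (slow disk)
print(disks_for_bandwidth(1e12, 100e6))           # 10000 disks for 1 TB/s
```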
15. 2010 - 2022
128X increase in transistors per chip

                    2010      2022
Cores per chip      16        1024
Memory bandwidth    40 GB/s   2.5 TB/s
IO bandwidth        2 GB/s    250 GB/s

No big change: latency
16. Latency and Bandwidth
Two determining factors, which won't change:
RAM – CPU latency: ~0.1 µs
NIC latency via LAN or WAN: 0.1 – 100 ms
NIC: moves to PCI Express; 10 – 100 Gbit/s already today; may move onto the CPU chip; latency in a cluster of ~1 µs possible (InfiniBand / optical Ethernet); LAN/WAN latency stays at 0.1 – 100 ms
Flash: throughput doubles every year; access time falls by 50% per year; moves from SATA to PCI Express
Disk: archive only
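Converted into clock cycles, these latencies show the gap software has to bridge; a 3 GHz core is assumed:

```python
CLOCK_HZ = 3e9  # assumed 3 GHz core

def latency_cycles(latency_seconds: float, clock_hz: float = CLOCK_HZ) -> int:
    """Number of CPU cycles that elapse while waiting out a given latency."""
    return int(latency_seconds * clock_hz)

print(latency_cycles(0.1e-6))  # RAM-CPU, 0.1 us    ->         300 cycles
print(latency_cycles(1e-6))    # cluster NIC, 1 us  ->       3,000 cycles
print(latency_cycles(100e-3))  # WAN, 100 ms        -> 300,000,000 cycles
```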
19. A CPU accesses Level 1 cache memory in 1 – 2 cycles.
It accesses Level 2 cache memory in 6 – 20 cycles.
20. It accesses Level 2 cache memory in 6 – 20 cycles.
It accesses RAM in 100 – 400 cycles.
21. It accesses RAM in 100 – 400 cycles.
It accesses Flash memory in 5,000 cycles.
22. It accesses Flash memory in 5,000 cycles.
It accesses disk storage in 1,000,000 cycles.
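Put side by side, those cycle counts give the slowdown of each tier relative to Level 1 cache. A sketch, taking the upper bound of each range quoted on the slides:

```python
# Worst-case access cost in CPU cycles, as quoted on the slides.
ACCESS_CYCLES = {
    "L1 cache": 2,
    "L2 cache": 20,
    "RAM": 400,
    "Flash": 5_000,
    "Disk": 1_000_000,
}

def slowdown_vs_l1(tier: str) -> float:
    """How many times slower a tier is than Level 1 cache."""
    return ACCESS_CYCLES[tier] / ACCESS_CYCLES["L1 cache"]

for tier in ACCESS_CYCLES:
    print(f"{tier}: {slowdown_vs_l1(tier):,.0f}x slower than L1")
```

Disk comes out half a million times slower than L1, which is what the Mars analogy on the next slide dramatises.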
23. Translate cycles to miles and assume you were a CPU core …
… then Level 1 cache would be in the building …
Level 2 cache would be at the edge of the city …
RAM would be in a different state …
Flash memory would be in a different country …
… and disk storage would be on the planet Mars.
25. Software Implications
Latency and locality are the determining factors
What could that mean?
Roundtrip latency from the CPU:
RAM: ~500 cycles
Flash: 1,000 – 5,000 cycles
Disk (archive): 1,000,000 cycles
NIC: 500,000,000 cycles
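Why locality is decisive: processing one million records costs fractions of a second or many minutes depending purely on where the data lives. A sketch using the roundtrip cycle counts above and an assumed 3 GHz clock:

```python
CLOCK_HZ = 3e9  # assumed 3 GHz core

def processing_time_s(accesses: int, cycles_per_access: int,
                      clock_hz: float = CLOCK_HZ) -> float:
    """Wall-clock seconds to perform N data accesses at a given roundtrip cost."""
    return accesses * cycles_per_access / clock_hz

N = 1_000_000  # one million record lookups
print(round(processing_time_s(N, 500), 2))        # RAM:   ~0.17 s
print(round(processing_time_s(N, 5_000), 2))      # flash: ~1.67 s
print(round(processing_time_s(N, 1_000_000), 1))  # disk:  ~333.3 s
```

The same workload goes from under a second in RAM to over five minutes on disk.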
26. Why Bother?
Systems may just get smaller!
More users for transaction processing on a single machine: isn't that great?
Already today most customers could run the ERP load of a company on a single blade
Commodity hardware becomes sufficient for ERP
No threat!
(… or maybe becoming a commodity is the threat?)