2010 - 2022: 128X increase in transistors per chip
[Diagram: CPU, NIC, RAM, FLASH, DISK]
Moore’s Law will continue for at least 10 years
• Transistors per area will double ~ every 2 years
• 128X increase in ~ 12 years
• 2022: 512 Gbit / DRAM, 8 Tbit / Flash
Frequency gains are difficult
• Pollack’s rule: power scales quadratically with clock performance
• Parallelism with more cores is a must
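The growth figures on this slide can be checked with a short calculation. The sketch below is only an illustration: the function names are made up for this example, and it assumes plain exponential growth with a fixed doubling period. A strict 2-year doubling gives 64X over 12 years, while the quoted 128X corresponds to doubling roughly every 1.7 years, so the "~" on the slide matters.

```python
import math


def growth_factor(years: float, doubling_period: float) -> float:
    """Growth after `years` if a quantity doubles every `doubling_period` years."""
    return 2.0 ** (years / doubling_period)


def implied_doubling_period(years: float, factor: float) -> float:
    """Doubling period (in years) needed to grow by `factor` within `years`."""
    return years / math.log2(factor)


if __name__ == "__main__":
    # Strict doubling every 2 years over the 12 years from 2010 to 2022.
    print(f"12 years at 2-year doubling: {growth_factor(12, 2.0):.0f}X")
    # Doubling period that would yield the 128X quoted on the slide.
    print(f"128X in 12 years implies doubling every "
          f"{implied_doubling_period(12, 128):.2f} years")
```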
2010 - 2022: 128X increase in transistors per chip
• 2014: 64 cores, 2016: 128 cores, 2022: 1024 cores
• Memory/IO bandwidth needs to grow with processing power
• Disks cannot follow!
2010 - 2022: 128X increase in transistors per chip
                     2010       2022
Cores per chip       10         1024
Memory bandwidth     40 Gb/s    2.5 Tb/s    Challenging! But needed to feed the cores!
IO bandwidth         2 Gb/s     250 Gb/s
• No big change: single-core clock rate (will stay < 5 GHz)
• But impressive overall computing power: 5000 (core * GHz)
Disks are Tape
“Spinning Rust”: forget hard disks!
• Disks cannot go faster
• Disks cannot follow bandwidth requirements
• Random-read scanning of 1 TB of disk space today takes 15 – 150 days (!)
• To reach 1 TB/s you would need 10,000 disks in parallel
• Disks can now only serve as archives (sequential access)
• DRAM, Flash and PCM will be the replacements
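A back-of-the-envelope sketch of the two disk figures above. The inputs are assumptions chosen for illustration and do not come from the slides: about 5 ms per random read, block sizes of 512 bytes and 4 KiB, and about 100 MB/s of sequential throughput per disk. With those values the results land in the same order of magnitude as the slide's numbers.

```python
import math

SECONDS_PER_DAY = 86_400


def random_scan_days(capacity_bytes: int, block_bytes: int,
                     seconds_per_read: float) -> float:
    """Days needed to read a whole disk one block at a time via random reads."""
    reads = capacity_bytes / block_bytes
    return reads * seconds_per_read / SECONDS_PER_DAY


def disks_for_bandwidth(target_bytes_per_s: int,
                        per_disk_bytes_per_s: int) -> int:
    """Number of disks that must stream in parallel to reach the target rate."""
    return math.ceil(target_bytes_per_s / per_disk_bytes_per_s)


if __name__ == "__main__":
    one_tb = 10 ** 12
    seek = 0.005  # assumed: 5 ms per random read
    for block in (4096, 512):  # assumed block sizes in bytes
        days = random_scan_days(one_tb, block, seek)
        print(f"{block:>5} B blocks: {days:6.1f} days")
    # Assumed: 100 MB/s sequential throughput per disk.
    print("disks for 1 TB/s:", disks_for_bandwidth(one_tb, 100 * 10 ** 6))
```

Smaller blocks mean more seeks, which is why the scan time spans an order of magnitude.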
2010 - 2022: 128X increase in transistors per chip
                     2010       2022
Cores per chip       16         1024
Memory bandwidth     40 GB/s    2.5 TB/s
IO bandwidth         2 GB/s     250 GB/s
• No big change: latency
Latency and Bandwidth
Two determining factors, which won’t change:
• RAM – CPU latency: ~ 0.1 µs
• NIC latency via LAN or WAN: 0.1 – 100 ms
[Diagram: CPU, RAM, NIC, FLASH, DISK (archive)]
NIC
• NICs move to PCI Express
• May move onto CPU chip
• 10 – 100 Gbit/s already today
• Latency in cluster ~1 µs possible (Infiniband / optical Ethernet)
• LAN/WAN latency 0.1 – 100 ms
FLASH
• Throughput x 2 / year
• Access time falls by 50% / year
• Goes from SATA to PCI Express
A CPU accesses Level 1 cache memory in 1 – 2 cycles.
It accesses Level 2 cache memory in 6 – 20 cycles.
It accesses RAM in 100 – 400 cycles.
It accesses Flash memory in 5000 cycles.
It accesses Disc storage in 1,000,000 cycles.
Translate cycles to miles and assume you were a CPU core ...
… then Level 1 cache would be in the building
… Level 2 cache would be at the edge of this city
… RAM would be in a different state
… Flash memory would be in a different country
… and disc storage would be the planet Mars.
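The cycle counts above can also be translated into wall-clock time. The sketch below is a simple illustration: the 2.5 GHz clock rate is an assumption (the slides only say the clock stays below 5 GHz), and the table and function names are made up for this example. The cycle ranges are the ones quoted on the slides.

```python
# Access cost in CPU cycles as (low, high), taken from the slides.
ACCESS_CYCLES = {
    "Level 1 cache": (1, 2),
    "Level 2 cache": (6, 20),
    "RAM": (100, 400),
    "Flash memory": (5_000, 5_000),
    "Disc storage": (1_000_000, 1_000_000),
}


def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a cycle count to nanoseconds: one cycle lasts 1 / clock_ghz ns."""
    return cycles / clock_ghz


if __name__ == "__main__":
    clock_ghz = 2.5  # assumed clock rate, not taken from the slides
    for tier, (low, high) in ACCESS_CYCLES.items():
        low_ns = cycles_to_ns(low, clock_ghz)
        high_ns = cycles_to_ns(high, clock_ghz)
        print(f"{tier:<14} {low_ns:>12,.1f} ns to {high_ns:>12,.1f} ns")
```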
Software Implications
Roundtrip latency from the CPU:
RAM               500 cycles
FLASH             5,000 cycles
DISK (archive)    1,000,000 cycles
NIC               1000 – 500,000,000 cycles
Latency and locality are the determining factors
What could that mean?
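One way to read "latency and locality are the determining factors" is as a weighted average: the cost of a typical access depends on how often it has to leave RAM. The sketch below uses the roundtrip latencies from the slide; the access mixes and the function name are hypothetical and only illustrate the effect.

```python
# Roundtrip latency in CPU cycles, as quoted on the slide.
TIER_CYCLES = {"RAM": 500, "FLASH": 5_000, "DISK": 1_000_000}


def average_cycles(access_mix: dict) -> float:
    """Average cycles per access for a mix {tier: fraction}; fractions sum to 1."""
    total = sum(access_mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"fractions must sum to 1, got {total}")
    return sum(fraction * TIER_CYCLES[tier]
               for tier, fraction in access_mix.items())


if __name__ == "__main__":
    # Hypothetical access mixes, chosen only to illustrate the effect.
    mixes = {
        "all in RAM": {"RAM": 1.0},
        "1% from flash": {"RAM": 0.99, "FLASH": 0.01},
        "1% from disk": {"RAM": 0.99, "DISK": 0.01},
    }
    baseline = average_cycles(mixes["all in RAM"])
    for name, mix in mixes.items():
        avg = average_cycles(mix)
        print(f"{name:<14} {avg:>10,.0f} cycles ({avg / baseline:.1f}x RAM only)")
```

Under these assumed mixes, sending even 1% of accesses to disk dominates the average cost, which is the argument for keeping the working set in memory.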
Why Bother?
Systems may just get smaller!
• More users for transaction processing on a single machine - isn’t that great?
• Already today most customers could run the ERP load of a company on a single blade
• Commodity hardware becomes sufficient for ERP
No threat! (… or maybe becoming a commodity is a threat?)