1) A CPU core can cycle over 3 billion times per second, yet light travels only about 10 cm during a single cycle.
2) Servers will soon have 128 CPU cores delivering over 400 billion cycles per second, but most of that power will be wasted waiting for data.
3) By 2022 transistors per chip will have increased 128x, but disk storage cannot keep up with bandwidth demands: it will be relegated to archival use, with DRAM, flash, and phase-change memory holding the active data.
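The 10 cm figure in point 1 can be verified with a one-line calculation; the 3 GHz clock rate is taken from the summary above:

```python
# How far does light travel during one cycle of a 3 GHz CPU?
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # metres per second
CLOCK_HZ = 3e9                        # 3 GHz core, as in the text

def light_distance_per_cycle_cm(clock_hz: float) -> float:
    """Distance light covers during one clock period, in centimetres."""
    period_s = 1.0 / clock_hz
    return SPEED_OF_LIGHT_M_PER_S * period_s * 100

print(round(light_distance_per_cycle_cm(CLOCK_HZ), 1))  # about 10 cm
```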
11. 2010 - 2022
128X increase in transistors per chip
[Diagram: CPU connected to NIC, RAM, flash, and disk]
Moore's Law will continue for at least 10 years
Transistors per area will double roughly every 2 years
128x increase in ~12 years
2022: 512 Gbit per DRAM chip, 8 Tbit per flash chip
Frequency gains are difficult
Pollack's rule: power scales quadratically with clock performance
Parallelism with more cores is a must
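As a sanity check on these numbers, the doubling cadence implied by a 128x increase over 12 years can be back-computed (a sketch; the slide itself only hedges with "~every 2 years"):

```python
import math

def doubling_period_years(growth_factor: float, span_years: float) -> float:
    """Years per doubling implied by a total growth factor over a time span."""
    return span_years / math.log2(growth_factor)

# 128x = 2^7, so 7 doublings must fit into 12 years:
print(round(doubling_period_years(128, 12), 2))  # 1.71 years per doubling
```

Seven doublings in twelve years works out to one every ~1.7 years, slightly faster than a strict two-year cadence.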
12. 2010 - 2022
128X increase in transistors per chip
2014: 64 cores; 2016: 128 cores; 2022: 1024 cores
Memory and IO bandwidth need to grow with processing power
Disks cannot follow!
13. 2010 - 2022
128X increase in transistors per chip

                    2010      2022
Cores per chip      10        1024
Memory bandwidth    40 Gb/s   2.5 Tb/s   (challenging, but needed to feed the cores!)
IO bandwidth        2 Gb/s    250 Gb/s
•No big change: single-core clock rate (will stay < 5 GHz)
•But impressive overall computing power: 5000 (cores × GHz)
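The 5000 core·GHz figure follows directly from the projected core count and clock ceiling; the same projections also show how little memory bandwidth each core would get. A sketch using the slide's 2022 numbers:

```python
CORES_2022 = 1024
CLOCK_GHZ = 5.0        # single-core clock stays below ~5 GHz
MEM_BW_GBIT_S = 2500   # projected 2.5 Tb/s memory bandwidth

def aggregate_compute_core_ghz(cores: int, clock_ghz: float) -> float:
    """Overall computing power in core*GHz, the unit used on the slide."""
    return cores * clock_ghz

def mem_bandwidth_per_core_gbit(total_gbit_s: float, cores: int) -> float:
    """Memory bandwidth each core gets if shared evenly, in Gbit/s."""
    return total_gbit_s / cores

print(aggregate_compute_core_ghz(CORES_2022, CLOCK_GHZ))                 # 5120.0, i.e. ~5000
print(round(mem_bandwidth_per_core_gbit(MEM_BW_GBIT_S, CORES_2022), 2))  # 2.44
```

Roughly 2.4 Gbit/s per core is why the slide calls feeding the cores "challenging".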
14. Disks are Tape
"Spinning rust"
Forget hard disks!
Disks cannot go faster
Disks cannot follow bandwidth requirements
A random-read scan of 1 TB of disk space today takes 15 – 150 days (!)
To reach 1 TB/s you would need 10,000 disks in parallel
Disks are only good as archives any more (sequential access)
DRAM, flash and PCM will be the replacement
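The 15–150 day and 10,000-disk claims can be reproduced with back-of-the-envelope arithmetic. The read size (4 KB) and per-disk figures (20–200 random IOPS, ~100 MB/s sequential) are assumptions for the sketch, not taken from the slide:

```python
def random_scan_days(capacity_bytes: float, read_bytes: int, iops: float) -> float:
    """Days needed to touch every byte of a disk using random reads."""
    reads = capacity_bytes / read_bytes
    return reads / iops / 86_400  # 86,400 seconds per day

def disks_for_bandwidth(target_bytes_s: float, per_disk_bytes_s: float) -> int:
    """How many disks must run in parallel to hit a target bandwidth."""
    return int(target_bytes_s / per_disk_bytes_s)

TB = 1e12
print(round(random_scan_days(TB, 4096, 200), 1))  # ~14 days  (fast disk)
print(round(random_scan_days(TB, 4096, 20), 1))   # ~141 days (slow disk)
print(disks_for_bandwidth(1e12, 100e6))           # 10000 disks for 1 TB/s
```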
15. 2010 - 2022
128X increase in transistors per chip

                    2010      2022
Cores per chip      16        1024
Memory bandwidth    40 GB/s   2.5 TB/s
IO bandwidth        2 GB/s    250 GB/s

No big change: latency
16. Latency and Bandwidth
Two determining factors, which won't change:
RAM – CPU latency: ~0.1 µs
NIC latency via LAN or WAN: 0.1 – 100 ms
NIC: moves to PCI Express; 10 – 100 Gbit/s already today; may move onto the CPU chip; latency in a cluster of ~1 µs possible (InfiniBand / optical Ethernet); LAN/WAN latency stays at 0.1 – 100 ms
Flash: throughput doubles every year; access time falls by 50% per year; moves from SATA to PCI Express
Disk: archive only
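Converted into clock cycles, these latencies show the gap software has to bridge; a 3 GHz core is assumed:

```python
CLOCK_HZ = 3e9  # assumed 3 GHz core

def latency_cycles(latency_seconds: float, clock_hz: float = CLOCK_HZ) -> int:
    """Number of CPU cycles that elapse while waiting out a given latency."""
    return int(latency_seconds * clock_hz)

print(latency_cycles(0.1e-6))  # RAM-CPU, 0.1 us    ->         300 cycles
print(latency_cycles(1e-6))    # cluster NIC, 1 us  ->       3,000 cycles
print(latency_cycles(100e-3))  # WAN, 100 ms        -> 300,000,000 cycles
```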
19. A CPU accesses Level 1 cache memory in 1 – 2 cycles.
It accesses Level 2 cache memory in 6 – 20 cycles.
20. It accesses Level 2 cache memory in 6 – 20 cycles.
It accesses RAM in 100 – 400 cycles.
21. It accesses RAM in 100 – 400 cycles.
It accesses Flash memory in 5,000 cycles.
22. It accesses Flash memory in 5,000 cycles.
It accesses disk storage in 1,000,000 cycles.
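Put side by side, those cycle counts give the slowdown of each tier relative to Level 1 cache. A sketch, taking the upper bound of each range quoted on the slides:

```python
# Worst-case access cost in CPU cycles, as quoted on the slides.
ACCESS_CYCLES = {
    "L1 cache": 2,
    "L2 cache": 20,
    "RAM": 400,
    "Flash": 5_000,
    "Disk": 1_000_000,
}

def slowdown_vs_l1(tier: str) -> float:
    """How many times slower a tier is than Level 1 cache."""
    return ACCESS_CYCLES[tier] / ACCESS_CYCLES["L1 cache"]

for tier in ACCESS_CYCLES:
    print(f"{tier}: {slowdown_vs_l1(tier):,.0f}x slower than L1")
```

Disk comes out half a million times slower than L1, which is what the Mars analogy on the next slide dramatises.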
23. Translate cycles to miles and assume you were a CPU core …
… then Level 1 cache would be in the building …
Level 2 cache would be at the edge of the city …
RAM would be in a different state …
Flash memory would be in a different country …
… and disk storage would be on the planet Mars.
25. Software Implications
Latency and locality are the determining factors
What could that mean?
Roundtrip latency from the CPU:
RAM: ~500 cycles
Flash: 1,000 – 5,000 cycles
Disk (archive): 1,000,000 cycles
NIC: 500,000,000 cycles
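Why locality is decisive: processing one million records costs fractions of a second or many minutes depending purely on where the data lives. A sketch using the roundtrip cycle counts above and an assumed 3 GHz clock:

```python
CLOCK_HZ = 3e9  # assumed 3 GHz core

def processing_time_s(accesses: int, cycles_per_access: int,
                      clock_hz: float = CLOCK_HZ) -> float:
    """Wall-clock seconds to perform N data accesses at a given roundtrip cost."""
    return accesses * cycles_per_access / clock_hz

N = 1_000_000  # one million record lookups
print(round(processing_time_s(N, 500), 2))        # RAM:   ~0.17 s
print(round(processing_time_s(N, 5_000), 2))      # flash: ~1.67 s
print(round(processing_time_s(N, 1_000_000), 1))  # disk:  ~333.3 s
```

The same workload goes from under a second in RAM to over five minutes on disk.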
26. Why Bother?
Systems may just get smaller!
More users for transaction processing on a single machine: isn't that great?
Already today most customers could run the ERP load of a company on a single blade
Commodity hardware becomes sufficient for ERP
No threat!
(… or maybe becoming a commodity is the threat?)