Ravi Gummaluri, Director, CXL System Architecture at Micron, describes use cases for memory expansion with tiered DRAM and CXL memory, along with performance data.
Q1 Memory Fabric Forum: Memory expansion with CXL-Ready Systems and Devices
1. Memory Expansion with CXL-Ready Systems and Devices
Presenter: Ravi Kiran Gummaluri, Micron Technology
2. Agenda
• Memory demand and scaling challenges
• CXL memory expansion
• Capacity expansion solutions
• Database performance analysis on AMD Platform
• Bandwidth expansion solutions
• AI inference performance analysis on Intel Platform
• Conclusions and Next steps
3. Memory Demand and Scaling Challenges
Growing demand for memory in data center applications (~26% YoY).
Memory latency is improving only ~1.1× every two years.
Processor speed has been doubling every two years.
DRAM is not scaling: memory capacity is doubling only every four years.
Increased TCO for data centers: memory is ~50% of the overall server cost.
How do we solve increased memory bandwidth and capacity requirements while reducing TCO?
Figure 1: Growing memory usage. Source: https://www.statista.com/statistics/871513/worldwide-data-created/
Figure 2: Memory wall
Figure 3: Memory capacity vs. CPU cores. Source: based on capacity and core counts from publicly available AMD and Intel datasheets, and public statements.
4. CXL Memory Expansion
Cache-line granular access semantics.
CXL memory appears to the system as a CPU-less NUMA node (not dependent on CPU architecture).
Hot-pluggable memory.
Works with various form factors: E1.S, E3.S, E5.S, add-in card, etc.
Interoperable with various memory types (DDR4, DDR5, LPDDR5, NVM, ..)
CXL Memory Capacity Expansion
CXL direct-attached memory tiering
1. Application transparent
OS managed
User-space library
2. Application managed
Application aware (e.g., libnuma)
Modified (e.g., libmemkind)
CXL switch / fabric-attached memory tiering
Another memory tier added to the system, with higher latencies.
CXL Memory Bandwidth Expansion
CXL heterogeneous interleave solutions
1. Hardware-based interleave
2. Software and HW heterogeneous interleave
3. Software-based NUMA interleave
Figure: Memory hierarchy
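Because a CXL expander has memory but no cores, it shows up to the OS as a NUMA node with an empty CPU list. The sketch below models that check; the topology dict is a hypothetical example (on a real Linux host the per-node CPU list lives in `/sys/devices/system/node/nodeN/cpulist`), not measured data.

```python
# Minimal sketch: a CXL memory expander appears as a CPU-less NUMA node.
# The topology below is a hypothetical two-socket system, not real data.

def cpuless_nodes(topology):
    """Return NUMA node IDs that have no local CPUs (e.g. CXL memory)."""
    return sorted(node for node, cpus in topology.items() if not cpus)

example = {
    0: list(range(0, 32)),   # socket 0: CPUs + local DRAM
    1: list(range(32, 64)),  # socket 1: CPUs + local DRAM
    2: [],                   # CXL memory node: capacity only, no CPUs
}

print(cpuless_nodes(example))  # -> [2]
```

On such a system, `numactl --hardware` lists the CXL node with no CPUs attached, and standard NUMA policies can target it like any other node.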
6. TPC-H: DRAM vs. Tiered Memory (DRAM+CXL)
CXL can provide better performance for capacity-intensive workloads.
7. HW Heterogeneous Interleave
The system address map is interleaved between local DRAM and CXL memory.
Pros
Easy to configure.
Cons
Kernel/OS cannot manage memory allocations.
⎻ Affects kernel memory.
⎻ Hides the NUMA topology from the OS.
Fixed configuration: not scalable for all workloads.
CMM capacity is restricted to align with local DRAM capacity.
Figure: HW heterogeneous interleave
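The idea behind HW interleave can be sketched as follows: the address decoder alternates cache lines between local DRAM and CXL, so the OS sees one flat range and cannot steer allocations. The 64-byte granularity and 2-way split here are illustrative assumptions, not Micron's actual decoder configuration.

```python
# Sketch of HW heterogeneous interleave: consecutive cache lines in the
# system address map alternate between local DRAM and CXL memory.
# Granularity and way count are illustrative assumptions.

CACHELINE = 64  # bytes; assumed interleave granularity

def interleave_target(addr):
    """Return which memory target serves a given physical address."""
    return "local-DRAM" if (addr // CACHELINE) % 2 == 0 else "CXL"

# Consecutive cache lines alternate between the two targets:
print([interleave_target(a) for a in range(0, 256, 64)])
# -> ['local-DRAM', 'CXL', 'local-DRAM', 'CXL']
```

This also makes the capacity restriction on the slide concrete: a fixed alternating map only works if the CXL region is sized to match the local DRAM it interleaves with.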
8. HW + SW Heterogeneous Interleave
HW: supports associating DRAM channels with different NUMA domains.
SW: interleave 4 (local) : 1 (CXL) across NUMA domains using numactl.
NPS4: each socket is partitioned into 4 NUMA domains; each NUMA domain has 3 memory channels.
Pros
NUMA topology is enabled.
Kernel/OS can manage the memory allocations.
Overcomes capacity limitations imposed by the HW interleave solution.
Cons
Fixed configuration: not scalable for all workloads.
Figure: HW + SW 4:1 interleave
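The 4:1 split falls out of plain round-robin interleave: with NPS4 the socket exposes four local NUMA domains plus one CXL node, so interleaving pages evenly across all five nodes (as `numactl --interleave` would) lands four of every five pages on local DRAM. A sketch, with illustrative node names:

```python
# Sketch of the 4(local):1(CXL) split: round-robin page interleave
# across four local NUMA domains (NPS4) plus one CXL node.
# Node names are illustrative, not a real system's topology.

from collections import Counter

def round_robin_pages(n_pages, nodes):
    """Distribute pages round-robin across NUMA nodes, like page interleave."""
    return Counter(nodes[i % len(nodes)] for i in range(n_pages))

nodes = ["local0", "local1", "local2", "local3", "cxl"]
placement = round_robin_pages(100, nodes)
local = sum(v for k, v in placement.items() if k != "cxl")
print(local, placement["cxl"])  # -> 80 20
```

The ratio is fixed by the node count, which is why the slide lists "fixed configuration" as the remaining con: changing the DRAM:CXL ratio means repartitioning the NUMA domains.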
9. SW Heterogeneous Interleave
Figure: SW interleave with weights. An application requesting 100 pages receives 80 pages from Node 0 (local DRAM, Socket 0) and 20 pages from Node 1 (CXL memory).
Memory allocations are performed according to per-node weights.
Pros
Scalable: not a fixed configuration.
o Applications can configure different weights according to BW requirements.
o This only applies when explicitly enabled for a job.
NUMA topology is enabled.
Kernel/OS can manage the memory allocations.
Overcomes capacity limitations imposed by the HW interleave solution.
Cons
CXL switch / fabric-attached memory tiers cannot take advantage of this configuration.
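The per-node weights can be sketched as a proportional allocator: with weights {DRAM: 4, CXL: 1}, a 100-page request splits 80/20 as in the figure. (Recent Linux kernels expose such weights under `/sys/kernel/mm/mempolicy/weighted_interleave/`; the model below is a simplification for illustration, not the kernel implementation.)

```python
# Sketch of SW weighted interleave: pages are distributed across NUMA
# nodes in proportion to per-node weights. Simplified model, not the
# kernel's actual weighted-interleave allocator.

from collections import Counter

def weighted_placement(n_pages, weights):
    """Distribute pages across nodes proportionally to their weights."""
    total = sum(weights.values())
    # One slot per unit of weight, cycled over the request:
    order = [node for node, w in weights.items() for _ in range(w)]
    return Counter(order[i % total] for i in range(n_pages))

print(weighted_placement(100, {"node0-DRAM": 4, "node1-CXL": 1}))
# -> Counter({'node0-DRAM': 80, 'node1-CXL': 20})
```

Unlike the fixed 4:1 NUMA-domain split, the weights here can be retuned per job to match a workload's bandwidth needs, which is the scalability advantage the slide claims.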
10. LLM Performance Optimization with Micron's CXL Memory SW Interleaving
CXL can provide better performance for bandwidth-intensive workloads.
11. Conclusion / Next Steps
Conclusions:
CXL memory expansion can provide a solution to increased memory bandwidth and capacity requirements.
CXL memory can help with bandwidth expansion using SW interleaving between DDR and CXL memory. Bandwidth-sensitive workloads, such as AI and HPC, benefit from this.
CXL memory, when introduced as tiered memory, can help increase memory capacity and reduce the latency impact of storage media. Capacity-sensitive workloads, such as database and data-analytics applications, can benefit from this.
Next Steps:
Application-aware and optimized page-allocation algorithms can further improve system performance by utilizing various memory tiers and media characteristics.
CXL memory pooling and fabric-attached memory can help further in defining various memory tiers to reduce system TCO.
12. Introducing Micron CZ120 CXL Memory Module
Delivering capacity, bandwidth, flexibility
128GB / 256GB: up to 2TB incremental server capacity supporting CXL 2.0 [1]
36GB/s memory bandwidth per module using PCIe® Gen5 x8: up to 34% increased server memory bandwidth [2]
E3.S 2T x8: industry-standard form factor for broad deployment
1. By adding 8x 256GB CZ120s; system limitations apply.
2. Memory Latency Checker bandwidth compared to a 12-channel 4800MT/s RDIMM server.