Astera Labs Proprietary & Confidential
Intelligent Connectivity for Cloud and AI Infrastructure
Supercomputing 2023 – CXL Forum
November 2023
Our Mission
To deliver semiconductor-based connectivity solutions purpose-built to unleash the full potential of cloud and AI infrastructure
Unprecedented Scale for Cloud & AI Infrastructure!
Unprecedented Reach: Aries PCIe/CXL Smart DSP Retimers
PCIe 5.0 reach 3X over copper cables for parallel processing of AI accelerators
[Diagram labels: Active Electrical PCIe 5.0 Cabling; AI Data Platform; SSD (x8); PCIe Fabric; GPU (x2)]
Unprecedented Bandwidth: Leo CXL Smart Memory Controllers
Increase memory bandwidth by 50% and reduce latency by 25%
48 DDR5 DIMMs with 2-Socket System (32 DIMMs without Astera Labs)
Lower kW, Lower TCO
[Diagram labels: 5th Gen Intel® Xeon® Scalable Processors; Host; DDR5 5600 (x2); x16 (x4); CXL Memory Controller; HW Interleaving]
Unprecedented Flexibility: Taurus Ethernet Smart Cable Modules
Widest range of Active Copper Cables for up to 800G rack-scale Ethernet connectivity
[Diagram labels: Core; Cloud; Network; General Compute; AI/ML]
Intelligent Connectivity for Cloud and AI Infrastructure
Aries PCIe/CXL Smart DSP Retimers: unprecedented PCIe 5.0 reach for parallel processing of AI accelerators
Leo CXL Smart Memory Controllers: unprecedented memory capacity for volume data center servers
Taurus Ethernet Smart Cable Modules: unprecedented flexibility for rack-scale Ethernet connectivity over copper
Agenda
• Breaking Through the Memory Wall
• Memory Bound Use Cases
• CXL for Modular Shared Infrastructure
• Critical CXL Collaboration Happening Now
• Call to Action
Breaking Through the Memory Wall
Challenges with Previous Attempts
1. Memory BW and capacity did not scale efficiently
2. Latency inferior to local CPU memory
3. Not deployable at scale
4. Not easily adopted by existing applications
Breaking Through the Memory Wall with CXL
1. Increase server memory BW and capacity by 50%
2. No-compromise performance: reduce latency by 25%
3. Standard DRAM for flexible supply chain and cost
4. Plug-and-play support with no software changes
Memory Bound Use Cases
eCommerce & Business Intelligence
• Online Transaction Processing (OLTP): What is happening?
• Online Analytical Processing (OLAP): What has happened?
Opportunity for CXL to Boost MySQL Database Performance
AI Inferencing
• Recommendation Engines
• Semantic Cache
Opportunity for CXL to Boost Vector Database Performance
[Diagram labels: Users; Query; Inference Server (REST, Models); Inference; Query/Store; Vector Database]
Industry’s Highest-Performing CXL Type 3 Device
Leo CXL Smart Memory Add-in Card
CXL 1.1/2.0
x16 32 GT/s CXL Link
4x DDR5-5600 RDIMM Slots
2TB Memory Expansion
Leo CXL Smart Memory Controllers
Leo E-Series for Memory Expansion
Leo P-Series for Memory Expansion, Pooling & Sharing
Ready to Accelerate Memory Intensive Applications with CXL
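As a rough sanity check on the add-in card figures above, the sketch below works out the raw link bandwidth, the peak bandwidth of a single DDR5-5600 DIMM, and the DIMM size implied by 2TB across four slots. It assumes the link is 16 lanes at 32 GT/s with PCIe 5.0's 128b/130b line encoding, and that one DDR5 DIMM transfers 8 bytes of data per transfer. The function names are illustrative, and protocol overhead beyond line encoding is ignored, so real throughput will be lower.

```python
def link_bandwidth_gbytes(lanes: int, gtps: float, encoding: float = 128 / 130) -> float:
    """Raw one-direction link bandwidth in GB/s, counting line encoding only."""
    return lanes * gtps * encoding / 8  # gigabits -> gigabytes


def ddr5_dimm_peak_gbytes(mtps: int, bytes_per_transfer: int = 8) -> float:
    """Peak bandwidth of one DDR5 DIMM (64 data bits) in GB/s."""
    return mtps * bytes_per_transfer / 1000


link = link_bandwidth_gbytes(lanes=16, gtps=32)   # x16 at 32 GT/s (assumed reading)
dimm = ddr5_dimm_peak_gbytes(5600)                # DDR5-5600
per_slot_gb = 2 * 1024 // 4                       # 2TB spread over 4 RDIMM slots

print(f"x16 link, one direction: {link:.1f} GB/s")
print(f"one DDR5-5600 DIMM:      {dimm:.1f} GB/s")
print(f"DIMM size for 2TB:       {per_slot_gb} GB per slot")
```

The slide does not say how the four slots map to memory channels, so the sketch deliberately stops at per-DIMM numbers rather than an aggregate.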
Leo CXL Smart Memory Controller
Memory expansion, pooling and sharing up to 2TB capacity
Server-grade customizable RAS and security features
Flexible and scalable memory interface with low-latency data paths
Seamless interoperation with all CPUs/GPUs, Memory DIMMs
Expand, Share & Pool Up to 2TB of Memory
[Diagram labels: CPU (x2); Leo CXL Expansion (x2); Leo CXL Pooling/Sharing]
Unprecedented Memory Capacity with Industry’s Highest-Performing CXL Controller
[Block diagram: Leo CXL Smart Memory Controller. Blocks: SerDes (Decision Feedback EQ, CTLE+VGA, Phase Locked Loop, Eyescan, CDR, Tx EQ); CXL Controller and Bifurcation (CXL 1.1, CXL 2.0); Low Latency Data Path; Telemetry, RAS, Security; CSP Customization; Memory PHY and Controller (DDR5-5600)]
Breaking Through the Memory Wall
System Under Test Configuration
CPU: 4th Gen Intel® Xeon® Scalable Processor (Single-Socket)
Storage: 2x NVMe PCIe 4.0 SSDs
Local Memory: 128GB (8x 16GB DDR5-4800)
CXL-Attached Memory: 128GB (2x 64GB DDR5-5600)
Mode: Memory Tiering (MemVerge Memory Machine)
Benchmark: Sysbench (Percona Labs TPC-C Scripts)
150% More TPS with only 15% More CPU Utilization
Online Transaction Processing Performance
[Chart: Transactions per Second (TPS). Y-axis: TPS, normalized (0% to 280%); X-axis: clients (0 to 1000); series: DRAM, DRAM + CXL; callout: 150%+ TPS]
[Chart: CPU Utilization. Y-axis: CPU utilization, normalized (0% to 60%); X-axis: clients (0 to 1000); series: DRAM, DRAM + CXL; callout: 15%+ CPU Utilization]
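To put the OLTP headline in one number, the sketch below converts "150% more TPS" and "15% more CPU utilization" into ratios and divides them to get throughput per unit of CPU. It assumes both percentages are measured against the DRAM-only run at the same client count; the derived ratio is not stated on the slide and the helper name is illustrative.

```python
def relative(pct_more: float) -> float:
    """Turn 'N% more' into a multiplier against the baseline."""
    return 1 + pct_more / 100


tps_ratio = relative(150)  # DRAM + CXL vs. DRAM-only throughput (from the slide)
cpu_ratio = relative(15)   # DRAM + CXL vs. DRAM-only CPU utilization (from the slide)

# Derived figure (an assumption-laden reading, not a published result):
tps_per_cpu = tps_ratio / cpu_ratio

print(f"TPS multiplier:          {tps_ratio:.2f}x")
print(f"CPU multiplier:          {cpu_ratio:.2f}x")
print(f"TPS per unit of CPU use: {tps_per_cpu:.2f}x")
```

Read this way, the extra throughput comes mostly from keeping more of the working set in memory rather than from burning proportionally more CPU.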
System Under Test Configuration
CPU: 5th Gen Intel® Xeon® Scalable Processor (Single-Socket)
Storage: 4x NVMe PCIe 4.0 SSDs
Local Memory: 512GB (8x 64GB DDR5-5600)
CXL-Attached Memory: 256GB (4x 64GB DDR5-5600)
Mode: 12-Way Heterogeneous Interleaving
Benchmark: TPC-H (1000 scale factor)
Cut Big Query Times in Half with CXL Memory
Data Warehouse
[Chart: TPC-H Query Times. Y-axis: average query times, normalized (0% to 100%); X-axis: queries Q1 to Q22; series: DRAM+CXL, DRAM-Only]
50% Query Time Improvement with CXL!
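The 12-way interleaving in the TPC-H configuration can be pictured as striping consecutive chunks of the address space across 8 local DIMMs and 4 CXL-attached DIMMs. The sketch below is a simplified model only: the 256-byte granule, the round-robin order, and the target names are assumptions for illustration, not the actual decoder configuration used in the test.

```python
LOCAL_WAYS = 8   # 8x 64GB local DDR5-5600 DIMMs (from the slide)
CXL_WAYS = 4     # 4x 64GB CXL-attached DDR5-5600 DIMMs (from the slide)
GRANULE = 256    # bytes per interleave granule: an assumption for illustration

# Hypothetical target names; real systems identify targets differently.
TARGETS = [f"ddr{i}" for i in range(LOCAL_WAYS)] + [f"cxl{i}" for i in range(CXL_WAYS)]


def target_for(addr: int) -> str:
    """Map a physical address to an interleave target (simple round-robin)."""
    return TARGETS[(addr // GRANULE) % len(TARGETS)]


def cxl_fraction(num_granules: int) -> float:
    """Fraction of consecutive granules that land on CXL-attached memory."""
    on_cxl = sum(target_for(g * GRANULE).startswith("cxl") for g in range(num_granules))
    return on_cxl / num_granules


print(f"ways: {len(TARGETS)}")
print(f"share of traffic on CXL: {cxl_fraction(12_000):.3f}")
print(f"share of capacity on CXL: {256 / (512 + 256):.3f}")
```

With equal-size DIMMs, the 8:4 split of ways matches the 512GB:256GB split of capacity, so uniform striping fills local and CXL memory at the same rate.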
[Chart: CXL Interleaving Benchmark Results, SPECrate® 2017 Floating Point. Y-axis: normalized benchmark score (0% to 180%); series: DRAM only, DRAM+CXL; workloads: CFD (Computational Fluid Dynamics), LBM (Computational Fluid Dynamics LBM), WRF (Weather Research and Forecasting Model), EDA (Computational Electromagnetics), ROMS (Regional Ocean Modeling System)]
HPC Performance with Industry Standard Benchmark Suite
CPU: Dual 5th Gen Intel® Xeon® Scalable Processors
OS: Ubuntu 22.04.3 LTS 5.17.11gpm
Compiler: C/C++: Version 2023.0 of Intel oneAPI DPC++/C++ Compiler for Linux;
Fortran: Version 2023.0 of Intel Fortran Compiler for Linux
File System: ext4
Base Pointers: 64-bit
CXL Interleaving Up to 50%+ Performance Improvement for HPC Workloads
Optimized TCO for In-Memory Databases
• Interleaving across CXL-Attached Memory
• 2.33x memory capacity and 1.66x memory bandwidth per socket with CXL
• Lower TCO for memory-intensive applications
Without CXL: Popular Certified & Supported SAP HANA® Hardware
48 DIMMs with Two 2-Socket Systems
High kW, High TCO
With CXL: Optimized Hardware for In-Memory Databases
56 DIMMs with One 2-Socket System
Lower kW, Lower TCO
[Diagram labels: DDR5 4800 (x4); x16 (x8)]
Unprecedented Memory Density with Leo CXL Smart Memory Controllers
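The 2.33x capacity figure follows from the DIMM counts on this slide once they are normalized per socket. The sketch below reproduces that arithmetic, assuming every DIMM has the same capacity; the helper name is illustrative.

```python
def dimms_per_socket(total_dimms: int, systems: int, sockets_per_system: int = 2) -> float:
    """Average number of DIMMs attached to each CPU socket."""
    return total_dimms / (systems * sockets_per_system)


without_cxl = dimms_per_socket(48, systems=2)  # 48 DIMMs across two 2-socket systems
with_cxl = dimms_per_socket(56, systems=1)     # 56 DIMMs in one 2-socket system

capacity_ratio = with_cxl / without_cxl        # equal-size DIMMs assumed

print(f"without CXL: {without_cxl:.0f} DIMMs per socket")
print(f"with CXL:    {with_cxl:.0f} DIMMs per socket")
print(f"capacity per socket: {capacity_ratio:.2f}x")
```

The bandwidth figure depends on how many memory channels sit behind each CXL controller, which the slide does not state, so it is left out of the sketch.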
Enabling CXL for Heterogeneous Infrastructure
Local, CXL-Attached
• Use Case: Memory Expansion
• Real-time Apps
• MB / PCI CEM Connectivity
Short Reach, CXL-Attached
• Use Case: JBOM Enablement
• Intelligent Tiering/Placement
• Midplane or Backplane Connectivity
Long Reach, CXL-Attached
• Use Case: Shared/Pooled Memory
• High-Capacity In-Memory Compute
• PCIe Cabling Connectivity
[Diagram labels: CPU (x2); Direct; Backplane; PCIe Cabling; PCIe Retimer (x3); CXL (x3)]
Leo CXL Smart Memory Controllers
Aries PCIe/CXL Smart DSP Retimers
Extending Reach for PCIe/CXL Memory
[Chart: Relative MLC Performance with and without Retimers. Axes: relative latency (0.0x to 1.1x) and relative bandwidth (0% to 110%); configurations: CXL, CXL + 1 Retimer, CXL + 2 Retimers; series: Bandwidth, Latency]
Minimal Impact to Performance with Extended Reach
Calls to Action
Visit Check out how we smashed through
Memory
[OCP Map and where we are]
Learn More
www.asteralabs.com
CXL Resources
• Linux: https://pmem.io/ndctl/collab
• Interop: https://www.asteralabs.com/interop
Ecosystem Alliance Contact:
• michael.ocampo@asteralabs.com
See the Demos at the CXL Consortium & PCI-SIG Booths
• Increase memory bandwidth by 50% and reduce latency by 25%
• PCIe 5.0 reach 3X over copper cables for parallel processing of AI accelerators
www.asteralabs.com
Thank You


Editor's Notes

  • #10 Create a Taurus one here
  • #11 Supermicro X13 Hyper offers up to four PCIe 5.0 slots, supporting CXL Type 3 devices to boost in-memory performance and reduce TCO. Leo Memory Connectivity Platform enhances memory capacity and performance for OLTP (Online Transaction Processing) with server-grade RAS and security. MemVerge provides comprehensive memory telemetry and improves performance for memory-intensive applications. MemVerge In-Memory Snapshots improve memory reliability using transparent checkpoint and restore services without IO to storage.
  • #12 bwaves_r = Computational Fluid Dynamics; lbm_r = Computational Fluid Dynamics LBM; wrf_r = Weather Research and Forecasting Model; fotonik3d_r = Computational Electromagnetics; roms_r = Regional Ocean Modeling System
  • #13 Without CXL: Lenovo ThinkSystem 2U SR665 V3 or 1U SR645 V3. With CXL: Lenovo ThinkSystem SR675 V3. TCO benefits: one platform instead of two; 2x CPU instead of 4x; less rack space (2x2U vs 1x3U); lower power; fewer licenses.