Harnessing light to power new possibilities
Advantages of Optical CXL
for Disaggregated Compute Architectures
Ron Swartzentruber
Director of Engineering

Agenda
• Memory-centric shift in the data center
• AI Large Language Model growth
• Need for optical CXL technology
• Case study: OPT inference benefits using optical CXL
© Lightelligence, Inc.

Disaggregation is the Future for Datacenter
[Figure: one panel shows Physical Machines 0 and 1 hosting Virtual Machines 0-3, each physical machine with a stranded resource; another panel shows Physical Machines 0, 1, 2, ... serving Virtual Machines 0-4. Legend: CPU cores, DRAM, Accelerators. Labels: FLEXIBLE, MANAGEABLE, ECONOMICAL, OPEN.]

AI trends
• AI and Large Language Models will continue to grow and consume more compute
• Disaggregated memory architectures are required in order to continue to scale
• Optical interconnects are required to extend reach
Source: https://medium.com/riselab/ai-and-memory-wall-2cb4265cb0b8
Source: https://hc34.hotchips.org/assets/program/tutorials/CXL/Hot%20Chips%202022%20CXL%20MemoryChallenges.pdf

Optical Interconnect Latency
[Figure: latency scale with the labels "100s of ns" and "100s of μs".]

CXL is the Predominant Standard for Disaggregation

                  Cache coherence   Latency   Memory decouple
CXL               Yes               ~100ns    Supported
RDMA (Ethernet)   No                ~3μs      Not supported

[Figure: CXL 2.0 Switch with Standardized Fabric Manager. Hosts H1, H2, H3, H4, ... H# are shown with link labels CXL2.0 (all five); devices D1, D2, D3, D4, ... D# are shown with link labels CXL1.0 (one) and CXL2.0 (four).]

Optical CXL is Required for Scaling

Attenuation (dB) vs. propagation distance (m), assuming AWG26 wire, PCIe 5.0 signal:

Distance   Copper   Optics
1 m        -4       0
10 m       -40      -0.003

Cabling supporting 4x PCIe 5.0 x16:
• Copper: 32 cables with diameter > 6mm (CAT8); bundle > 30 mm
• Optics: 16 fibers with diameter of 0.125mm; bundle < 1mm
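
The chart labels imply roughly 4 dB of loss per meter for copper and about 0.0003 dB per meter for fiber. The following is a minimal Python sketch, not taken from the slides, that converts those figures into the fraction of signal power left after a given cable length. It assumes attenuation in dB grows linearly with length; the per-meter constants are derived from the 10 m labels, and the function name is hypothetical.

```python
# Per-meter losses inferred from the slide's 10 m data labels.
# Assumption: attenuation in dB scales linearly with cable length.
COPPER_DB_PER_M = 40.0 / 10.0   # AWG26 copper carrying a PCIe 5.0 signal
FIBER_DB_PER_M = 0.003 / 10.0   # optical fiber


def remaining_power_fraction(db_per_m: float, length_m: float) -> float:
    """Fraction of input power left after length_m meters of cable."""
    loss_db = db_per_m * length_m
    return 10 ** (-loss_db / 10)


for length in (1, 10):
    copper = remaining_power_fraction(COPPER_DB_PER_M, length)
    fiber = remaining_power_fraction(FIBER_DB_PER_M, length)
    print(f"{length:>2} m  copper: {copper:.4%}  fiber: {fiber:.4%}")
```

Under these assumptions, 10 m of copper leaves about one ten-thousandth of the input power, while 10 m of fiber leaves more than 99.9% of it.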

Optical CXL in the Datacenter
[Figure: Compute and Memory Banks, with the caption "Break Through the Rack!"]

Case study: LLM Inference
• 2U Supermicro server
• 2x AMD Genoa CXL 1.1 CPUs
• MemVerge Memory Tiering and Pooling Software
• 2x Micron 256GB Memory Expanders, each with a CXL/PCIe Gen5 x8 link
• Nvidia GPU running LLM inference
• All VMs have access to CXL memory
• Secure application, encrypted data
[Figure labels: Server (2x CXL 1.1 CPUs), Photowave Card (x2), Memory Expansion Module, CXL Memory Expander (x2).]

LLM Model List

Model      Weight memory   KV-cache per sample   Activation per sample   Context length
           (float16)       (float16)             (float16)
OPT-1.3B   2.4 GB          0.095 GB              0.002 GB                512
OPT-13B    23.921 GB       0.397 GB              0.005 GB                512
OPT-30B    55.803 GB       0.667 GB              0.007 GB                512
OPT-66B    122.375 GB      1.143 GB              0.009 GB                512
OPT-175B   325 GB          2.285 GB              0.012 GB                512

KV-cache size: data_type * dimension * num_layers * batch_size * context_len * 2
e.g., for OPT-1.3B, FP16 -> 2 Bytes * 2048 * 24 * 1 * 512 * 2 = 100,663,296 Bytes
Activation size: data_type * dimension * batch_size * context_len
Entire OPT-66B model fits within one 128GB CXL memory expander
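
The two sizing formulas on this slide can be checked with a few lines of Python. This is a sketch for illustration only: the function and parameter names are hypothetical, and only the OPT-1.3B inputs (hidden dimension 2048, 24 layers) come from the slide's worked example.

```python
def kv_cache_bytes(bytes_per_value, hidden_dim, num_layers, batch_size, context_len):
    """KV-cache size per the slide's formula; the trailing 2 covers keys and values."""
    return bytes_per_value * hidden_dim * num_layers * batch_size * context_len * 2


def activation_bytes(bytes_per_value, hidden_dim, batch_size, context_len):
    """Activation size per the slide's formula."""
    return bytes_per_value * hidden_dim * batch_size * context_len


# OPT-1.3B in FP16 (2 bytes per value), one sample, context length 512.
kv = kv_cache_bytes(2, 2048, 24, 1, 512)
act = activation_bytes(2, 2048, 1, 512)

print(kv)            # 100663296
print(kv / 2**30)    # 0.09375
print(act / 2**30)   # 0.001953125
```

The results are about 0.094 and 0.002 when divided by 2^30, which is close to the 0.095 GB and 0.002 GB listed for OPT-1.3B and suggests the table counts in binary gigabytes.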


Results
OPT-66B model results (annotation: ~2.4x)

                               Disk (NVMe)   CXL Memory   System Memory   MemVerge 60:40 Policy
Decode Throughput (Tokens/s)   1.984         4.868        6.216           6.237
Decode Latency (s)             338.7         138.2        108.1           107.7

• CXL: 882MB/s, System Memory: 857MB/s, Disk: 582MB/s, MemVerge: 493MB/s
• CXL: 2365MB/s, System Memory: 2609MB/s, Disk: 1887MB/s, MemVerge: 2173MB/s
© Lightelligence, Inc.
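
The "~2.4x" annotation can be reproduced from the table. The sketch below is an illustration rather than the presenter's own analysis: it takes the decode throughput figures as given and computes each configuration's speedup over NVMe disk offloading. The dictionary layout and function name are hypothetical.

```python
# Decode results for OPT-66B, copied from the results table.
results = {
    "Disk (NVMe)": {"tokens_per_s": 1.984, "latency_s": 338.7},
    "CXL Memory": {"tokens_per_s": 4.868, "latency_s": 138.2},
    "System Memory": {"tokens_per_s": 6.216, "latency_s": 108.1},
    "MemVerge 60:40": {"tokens_per_s": 6.237, "latency_s": 107.7},
}


def speedup_over(baseline, table):
    """Throughput of every configuration relative to the baseline configuration."""
    base = table[baseline]["tokens_per_s"]
    return {name: row["tokens_per_s"] / base for name, row in table.items()}


speedups = speedup_over("Disk (NVMe)", results)
for name, ratio in speedups.items():
    print(f"{name:<16} {ratio:.2f}x vs. disk")
```

CXL memory comes out at about 2.45x the disk throughput, in line with the slide's annotation, and system memory and the MemVerge 60:40 policy both land a little above 3.1x.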

PHOTOWAVE™ OPTICAL CXL MEMORY EXPANDER
[Figure: demo dashboard.
Hardware: AMD Genoa CPU; Samsung CXL 128GB; NVIDIA GPU: 1x A10 24GB
Parameters: inference engine: FlexGen; run mode: CXL; weights: 122.375GB; KV cache: 109.688GB
OPT-66B model progress: 99%
Gauges: GPU utilization, GPU mem. utilization, CPU utilization, mem. utilization, CXL mem. utilization, decode throughput (tokens/s)
Readings as extracted: 95%, 77%, 51%, 27%, 77%
Other labels: CXL, DISK, NVMe]

Summary of Results
CXL memory offloading is efficient and beneficial
• LLM inference case study
• Allows use of lower cost memory
Similar performance compared to pure system memory
1.9x TCO improvement with inexpensive GPUs at similar throughput
2.4x performance advantage compared to SSD/NVMe disk offloading

Photowave™ Form Factors
Product suite: Low Profile PCIe Card, OCP 3.0 SFF Card, Active Optical Cables
Features:
• CXL 2.0/PCIe Gen5 x16
• Jitter reduction, SI cleanup
• Sideband signals over optics
• x8, x4 or x2 bifurcation
• End-to-end latency:
  • Card: under 20ns + TOF
  • AOC: 1ns + TOF
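
Reading TOF as time of flight through the fiber, the latency figures above can be turned into a rough budget. The sketch below is illustrative only: the group index of 1.468 is an assumed typical value for silica fiber and is not from the slides, the function names are hypothetical, and the 20 ns card figure is an upper bound per the slide.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458
FIBER_GROUP_INDEX = 1.468  # assumption: typical value for silica fiber


def time_of_flight_ns(length_m, group_index=FIBER_GROUP_INDEX):
    """Propagation delay through length_m meters of fiber, in nanoseconds."""
    return length_m * group_index / SPEED_OF_LIGHT_M_PER_S * 1e9


def link_latency_ns(length_m, fixed_ns):
    """Fixed device latency plus time of flight."""
    return fixed_ns + time_of_flight_ns(length_m)


for length in (1, 10, 50):
    card = link_latency_ns(length, 20)  # slide: card adds under 20 ns
    aoc = link_latency_ns(length, 1)    # slide: AOC adds 1 ns
    print(f"{length:>2} m  card: under {card:.1f} ns  AOC: {aoc:.1f} ns")
```

With this index, fiber contributes a little under 5 ns per meter, so at rack-to-rack distances the time of flight, not the card or cable electronics, dominates the total.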

Endnotes
Hardware configuration
Supermicro server
• AMD EPYC 9124 16-Core CPU
• Samsung DDR5 4800 MT/s
• MEM0 size: 256GB
• MEM1 size: 256GB
• Bandwidth: 307GB/s
Nvidia GPU
• Gen4 x16, DMEM size: 24GB
• Bandwidth: 32GB/s
Samsung NVMe
• Gen4 x4, MEM size: 1.92TB
• Bandwidth: 8GB/s
Micron CXL Memory
• Gen5 x8, MEM size: 256GB
• Bandwidth: 32GB/s
Algorithm & Software
• LLM: OPT-66B
• Batch size = 24
• Context length = 512
• Output length = 8
• FlexGen
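
The bandwidth figures in the endnotes are consistent with raw link rates, which the following sketch reproduces. It is a back-of-the-envelope check, not part of the original deck. It uses the standard per-lane rates of 16 GT/s for PCIe Gen4 and 32 GT/s for Gen5 and ignores encoding and protocol overhead. The DDR5 channel count of 8 is an inference from the 307GB/s figure, not something the slides state, and the function names are hypothetical.

```python
PCIE_GEN4_GT_PER_S = 16  # per lane
PCIE_GEN5_GT_PER_S = 32  # per lane


def pcie_raw_gb_per_s(gt_per_s_per_lane, lanes):
    """Raw one-direction PCIe rate in GB/s (one bit per transfer per lane, no overhead)."""
    return gt_per_s_per_lane * lanes / 8


def ddr_gb_per_s(mt_per_s, channels, bytes_per_transfer=8):
    """Peak DRAM bandwidth in GB/s for 64-bit (8-byte) channels."""
    return mt_per_s * bytes_per_transfer * channels / 1000


print(pcie_raw_gb_per_s(PCIE_GEN4_GT_PER_S, 16))  # Nvidia GPU, Gen4 x16
print(pcie_raw_gb_per_s(PCIE_GEN4_GT_PER_S, 4))   # Samsung NVMe, Gen4 x4
print(pcie_raw_gb_per_s(PCIE_GEN5_GT_PER_S, 8))   # Micron CXL memory, Gen5 x8
print(ddr_gb_per_s(4800, channels=8))             # DDR5-4800, assumed 8 channels
```

These evaluate to 32, 8, 32 and 307.2 GB/s, matching the endnotes. The CXL expander and the GPU therefore share the same raw link rate, which is about a tenth of the system memory figure.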

Q1 Memory Fabric Forum: Advantages of Optical CXL for Disaggregated Compute Architectures


Editor's Notes

  1. Key message: CXL is industry consensus for disaggregation
  2. MemVerge policy: System memory 60%, CXL Memory 40%
  3. What is CPU%

     Output of main.py:

     Metric           mem                     cxl                 disk
     CPU%             29.525                  31.431              81.46
     MEM%             27.2                    27.2                11.722
     GPU%             99.49                   97.05               53.01
     CXLMEM%          0.0016306192454823602   77.11430249904593   0.1663918208702139
     GPUMEM%          45.17                   49.0                34.71
     GPUMEM_USED_MB   9213.4375               9213.4375           8979.4375
     PCI_TX_MBps      274.2578125             191.357421875       81.064453125
     PCI_RX_MBps      2007.3828125            1422.421875         1158.056640625

     main.py:36 also emitted a MatplotlibDeprecationWarning: calling gca() with keyword arguments, as in ax = plt.gca(facecolor='black'), was deprecated in Matplotlib 3.4; the warning suggests plt.axes() or plt.subplot() for creating axes with non-default arguments.