Harnessing light to power new possibilities
Advantages of Optical CXL
for Disaggregated Compute Architectures
Ron Swartzentruber
Director of Engineering
Agenda
• Memory-centric shift in the data center
• AI Large Language Model growth
• Need for optical CXL technology
• Case study: OPT inference benefits using optical CXL
© Lightelligence, Inc.
Disaggregation is the Future for the Datacenter
[Figure: Virtual Machines 0-3 hosted on Physical Machines 0 and 1, each physical machine left with a stranded resource, compared with Virtual Machines 0-4 drawing on CPU cores, DRAM, and accelerators pooled across Physical Machines 0, 1, 2, ... Labels: FLEXIBLE, MANAGEABLE, ECONOMICAL, OPEN.]
AI trends
• AI and Large Language Models will continue to grow and consume more compute
• Disaggregated memory architectures are required in order to continue to scale
• Optical interconnects are required to extend reach
Source: https://medium.com/riselab/ai-and-memory-wall-2cb4265cb0b8
Source: https://hc34.hotchips.org/assets/program/tutorials/CXL/Hot%20Chips%202022%20CXL%20MemoryChallenges.pdf
Optical Interconnect Latency
[Figure: latency comparison; labeled values: 100s of ns and 100s of μs.]
CXL is the Predominant Standard for Disaggregation
                   Cache coherence   Latency    Memory decoupling
CXL                Yes               ~100 ns    Supported
RDMA (Ethernet)    No                ~3 μs      Not supported
[Figure: CXL 2.0 switch with a standardized fabric manager, connecting hosts H1, H2, H3, H4, ... H# to devices D1, D2, D3, D4, ... D#. Host-side links are labeled CXL 2.0; device-side links are labeled CXL 1.0 (first device) and CXL 2.0 (the rest).]
Optical CXL is Required for Scaling
Attenuation (dB) vs. propagation distance (m)
            1 m     10 m
Copper      -4      -40
Optics      0       -0.003
Assuming AWG26 wire, PCIe 5.0 signal
Cabling needed to support PCIe 5.0 x64
            Count        Diameter of each      Bundle diameter
Copper      32 cables    > 6 mm (CAT8)         > 30 mm
Optics      16 fibers    0.125 mm              < 1 mm
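The attenuation values on this slide are in decibels, a logarithmic unit that hides how large the gap is. The following is a minimal sketch that converts the two 10 m values into the fraction of signal power that survives, assuming the slide reports power attenuation in dB. The function and constant names are illustrative and not taken from the presentation.

```python
def db_to_power_fraction(db: float) -> float:
    """Convert a gain in dB (negative means loss) to the fraction of power remaining."""
    return 10 ** (db / 10)


# Attenuation at 10 m as read from the slide (AWG26 copper, PCIe 5.0 signal).
COPPER_DB_AT_10M = -40.0
OPTICS_DB_AT_10M = -0.003

copper_fraction = db_to_power_fraction(COPPER_DB_AT_10M)
optics_fraction = db_to_power_fraction(OPTICS_DB_AT_10M)

print(f"Copper keeps {copper_fraction:.4%} of the signal power over 10 m")
print(f"Optics keeps {optics_fraction:.4%} of the signal power over 10 m")
```

A loss of 40 dB is a factor of 10,000 in power, which is why copper links at this signaling rate stay short while fiber loss over the same distance is negligible.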
Optical CXL in the Datacenter
Break Through the Rack!
[Figure: compute connected to memory banks beyond the rack over optical CXL.]
Case study: LLM Inference
• 2U Supermicro server
• 2x AMD Genoa CXL 1.1 CPUs
• MemVerge Memory Tiering and Pooling Software
• 2x Micron 256GB Memory Expanders, each with a CXL/PCIe Gen5 x8 link
• Nvidia GPU running LLM inference
• All VMs have access to CXL memory
• Secure application, encrypted data
Demo @ booth #1392
[Figure: server with 2x CXL 1.1 CPUs and a Photowave card, linked to a second Photowave card on a memory expansion module that holds two 128GB CXL memory expanders.]
LLM Model List
Model       Weight memory   KV-cache per sample   Activation per sample   Context length
            (float16)       (float16)             (float16)
OPT-1.3B    2.4 GB          0.095 GB              0.002 GB                512
OPT-13B     23.921 GB       0.397 GB              0.005 GB                512
OPT-30B     55.803 GB       0.667 GB              0.007 GB                512
OPT-66B     122.375 GB      1.143 GB              0.009 GB                512
OPT-175B    325 GB          2.285 GB              0.012 GB                512
KV-cache size: data_type * dimension * num_layers * batch_size * context_len * 2
e.g., for OPT-1.3B, FP16 -> 2 bytes * 2048 * 24 * 1 * 512 * 2 = 100,663,296 bytes
Activation size: data_type * dimension * batch_size * context_len
Entire OPT-66B model fits within one 128GB CXL memory expander
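The two sizing formulas on this slide can be checked with a few lines of Python. This is a minimal sketch that reproduces the OPT-1.3B worked example; the function and parameter names are illustrative, and the expander capacity is compared in the same GB units the slide uses.

```python
FP16_BYTES = 2  # bytes per float16 element


def kv_cache_bytes(dtype_bytes, dimension, num_layers, batch_size, context_len):
    """KV-cache size per the slide's formula.

    The trailing factor of 2 accounts for storing both keys and values.
    """
    return dtype_bytes * dimension * num_layers * batch_size * context_len * 2


def activation_bytes(dtype_bytes, dimension, batch_size, context_len):
    """Activation size per the slide's formula."""
    return dtype_bytes * dimension * batch_size * context_len


# OPT-1.3B worked example from the slide: dimension 2048, 24 layers.
opt_1_3b_kv = kv_cache_bytes(FP16_BYTES, dimension=2048, num_layers=24,
                             batch_size=1, context_len=512)
opt_1_3b_act = activation_bytes(FP16_BYTES, dimension=2048,
                                batch_size=1, context_len=512)
print(opt_1_3b_kv)   # 100663296
print(opt_1_3b_act)  # 2097152

# Weight memory for OPT-66B from the table vs. one 128GB expander.
OPT_66B_WEIGHTS_GB = 122.375
EXPANDER_GB = 128
weights_fit = OPT_66B_WEIGHTS_GB <= EXPANDER_GB
print(weights_fit)   # True
```

Both sizes scale linearly with batch_size and context_len, so the per-sample columns in the table can be multiplied by the batch size to estimate the footprint of a batched run.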

CXL: 882MB/s, System Memory 857MB/s, Disk: 582MB/s, MemVerge: 493MB/s

CXL: 2365MB/s, System Memory: 2609MB/s, Disk: 1887MB/s, MemVerge: 2173MB/s
Results
OPT-66B model results          Disk (NVMe)   CXL Memory   System Memory   MemVerge 60:40 Policy
Decode throughput (tokens/s)   1.984         4.859        6.216           6.237
Decode latency (s)             338.7         138.2        108.1           107.7
CXL memory vs. disk (NVMe): ~2.4x
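The ~2.4x figure can be recomputed from the numbers in this table. The sketch below is a minimal check that assumes the four values in each row follow the column order Disk (NVMe), CXL Memory, System Memory, MemVerge 60:40 Policy; the dictionary keys and the helper name are illustrative.

```python
# Decode results for OPT-66B, keyed by where the offloaded data lives.
throughput_tokens_per_s = {
    "disk_nvme": 1.984,
    "cxl_memory": 4.859,
    "system_memory": 6.216,
    "memverge_60_40": 6.237,
}
latency_s = {
    "disk_nvme": 338.7,
    "cxl_memory": 138.2,
    "system_memory": 108.1,
    "memverge_60_40": 107.7,
}


def ratio(metric: dict, numerator: str, denominator: str) -> float:
    """Ratio of one configuration's metric to another's."""
    return metric[numerator] / metric[denominator]


# Higher throughput is better, so CXL goes in the numerator.
cxl_vs_disk_throughput = ratio(throughput_tokens_per_s, "cxl_memory", "disk_nvme")
# Lower latency is better, so disk goes in the numerator.
cxl_vs_disk_latency = ratio(latency_s, "disk_nvme", "cxl_memory")
# How close CXL-only offloading gets to plain system memory.
cxl_vs_system_throughput = ratio(throughput_tokens_per_s, "cxl_memory", "system_memory")

print(f"CXL vs. disk throughput: {cxl_vs_disk_throughput:.2f}x")
print(f"CXL vs. disk latency:    {cxl_vs_disk_latency:.2f}x")
print(f"CXL vs. system memory:   {cxl_vs_system_throughput:.2f}x")
```

Throughput and latency give the same CXL-to-disk ratio because both rows describe the same decode workload.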
Photowave™ Optical CXL Memory Expander
[Screenshot: demo dashboard. Hardware: AMD Genoa CPU, Samsung CXL 128GB, NVIDIA GPU 1x A10 24GB. Parameters: inference engine FlexGen, run mode CXL, weights 122.375GB, KV cache 109.688GB. OPT-66B model progress: 99%. Gauges: GPU utilization, GPU memory utilization, CPU utilization, memory utilization, CXL memory utilization, decode throughput (tokens/s). Displayed gauge values: 95%, 77%, 51%, 27%, 77%. Run-mode selector: CXL / Disk.]
Summary of Results
CXL memory offloading is efficient and beneficial
• LLM inference case study
• Allows use of lower cost memory
Similar performance compared to pure system memory
1.9x TCO improvement with inexpensive GPUs at similar throughput
2.4x performance advantage compared to SSD/NVMe disk offloading
Photowave™ Form Factors
Product suite: Low Profile PCIe Card, OCP 3.0 SFF Card, Active Optical Cables
Features
• CXL 2.0/PCIe Gen5 x16
• Jitter reduction, SI cleanup
• Sideband signals over optics
• x8, x4 or x2 bifurcation
• End-to-end latency:
  • Card: under 20 ns + TOF
  • AOC: 1 ns + TOF
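The latency figures on this slide are a fixed component plus TOF, so total link latency depends on fiber length. The sketch below estimates it, assuming TOF denotes time of flight through the fiber and assuming a group index of about 1.468, a typical value for silica fiber that the slide does not specify. All names are illustrative, and the card figure uses the 20 ns upper bound.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458
FIBER_GROUP_INDEX = 1.468  # assumption: typical silica fiber, not from the slide

CARD_FIXED_NS = 20.0  # slide: "under 20ns", so this is an upper bound
AOC_FIXED_NS = 1.0    # slide: "1ns"


def time_of_flight_ns(length_m: float, group_index: float = FIBER_GROUP_INDEX) -> float:
    """Propagation delay through a fiber of the given length, in nanoseconds."""
    return length_m * group_index / SPEED_OF_LIGHT_M_PER_S * 1e9


def link_latency_ns(length_m: float, fixed_ns: float) -> float:
    """Fixed component plus time of flight."""
    return fixed_ns + time_of_flight_ns(length_m)


for length_m in (1, 10, 100):
    card = link_latency_ns(length_m, CARD_FIXED_NS)
    aoc = link_latency_ns(length_m, AOC_FIXED_NS)
    print(f"{length_m:>3} m: card <= {card:.1f} ns, AOC ~ {aoc:.1f} ns")
```

Under these assumptions the fiber adds roughly 5 ns per meter, so beyond a few meters the time of flight rather than the fixed component dominates.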
Endnotes
Hardware configuration
Super Micro Server
• AMD EPYC 9124 16-Core CPU
• Samsung DDR5 4800 MT/s
• MEM0 size: 256GB
• MEM1 size: 256GB
• Bandwidth: 307GB/s
Nvidia GPU
• Gen4 x16, DMEM size: 24GB
• Bandwidth: 32GB/s
Samsung NVMe
• Gen4 x4, MEM size: 1.92TB
• Bandwidth: 8GB/s
Samsung CXL Memory
• Gen5 x8, MEM size: 128GB
• Bandwidth: 32GB/s
Algorithm & Software
• LLM: OPT-66B
• Batch size = 24
• Context length = 512
• Output length = 8
• FlexGen
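The bandwidth figures in these endnotes can be reproduced from link parameters. The sketch below is a minimal check using raw PCIe signaling rates (16 GT/s per lane for Gen4, 32 GT/s for Gen5, ignoring encoding and protocol overhead) and 64-bit DDR5 channels. The eight-channel count is an inference that happens to match the quoted 307GB/s; the slide does not state it. Function names are illustrative.

```python
# Raw per-lane signaling rate in GT/s for each PCIe generation used here.
PCIE_GT_PER_S = {4: 16, 5: 32}


def pcie_raw_gb_per_s(gen: int, lanes: int) -> float:
    """Raw one-direction bandwidth in GB/s, before encoding and protocol overhead."""
    return PCIE_GT_PER_S[gen] * lanes / 8  # 8 bits per byte


def ddr_gb_per_s(mega_transfers_per_s: int, channels: int, bus_bytes: int = 8) -> float:
    """Peak DRAM bandwidth in GB/s for 64-bit (8-byte) channels."""
    return mega_transfers_per_s * bus_bytes * channels / 1000


gpu_link = pcie_raw_gb_per_s(gen=4, lanes=16)   # Nvidia GPU, Gen4 x16
nvme_link = pcie_raw_gb_per_s(gen=4, lanes=4)   # Samsung NVMe, Gen4 x4
cxl_link = pcie_raw_gb_per_s(gen=5, lanes=8)    # Samsung CXL memory, Gen5 x8
dram = ddr_gb_per_s(4800, channels=8)           # assumption: 8 channels

print(gpu_link, nvme_link, cxl_link)  # 32.0 8.0 32.0
print(f"{dram:.1f}")                  # 307.2
```

On these numbers the Gen5 x8 CXL link offers four times the raw bandwidth of the Gen4 x4 NVMe link and matches the GPU's Gen4 x16 link.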

Photowave Presentation Slides - 11.8.23.pptx

  • 1. Harnessing light to power new possibilities Advantages of Optical CXL for Disaggregated Compute Architectures Ron Swartzentruber Director of Engineering
  • 2. Agenda
      - Memory centric shift in the data center
      - AI Large Language Model growth
      - Need for optical CXL technology
      - Case study: OPT inference benefits using optical CXL
  • 3. Disaggregation is the Future for Datacenter
      [Figure: physical machines hosting virtual machines, with stranded resources marked; legend: CPU cores, DRAM, Accelerators; labels: FLEXIBLE, MANAGEABLE, ECONOMICAL, OPEN]
  • 4. AI trends
      - AI and Large Language Models will continue to grow and consume more compute
      - Disaggregated memory architectures are required in order to continue to scale
      - Optical interconnects are required to extend reach
      Source: https://medium.com/riselab/ai-and-memory-wall-2cb4265cb0b8
      Source: https://hc34.hotchips.org/assets/program/tutorials/CXL/Hot%20Chips%202022%20CXL%20MemoryChallenges.pdf
  • 5. Optical Interconnect Latency
      [Figure labels: 100s of ns; 100s of μs]
  • 6. CXL is the Predominant Standard for Disaggregation
                          Cache-coherence   Latency   Memory decouple
        CXL               Yes               ~100ns    Supported
        RDMA (ethernet)   No                ~3μs      Not supported
      [Figure labels: CXL 2.0 Switch; Standardized Fabric Manager; hosts H1, H2, H3, H4 … H#; devices D1, D2, D3, D4 … D#; links labeled CXL2.0, one labeled CXL1.0]
  • 7. Optical CXL is Required for Scaling
      Attenuation (dB) vs. propagation distance (m), assuming AWG26 wire, PCIe 5.0 signal:
        Distance   Copper   Optics
        1m         -4       0
        10m        -40      -0.003
      Supporting PCIe 5.0 x64:
        Copper: 32 cables with diameter > 6mm (CAT8)
        Optics: 16 fibers with diameter of 0.125mm
      [Figure dimension labels: 6mm; > 30 mm (Copper); <1mm (Optics)]
  • 8. Optical CXL in the Datacenter: Break Through the Rack!
      [Figure labels: Compute; Memory Banks]
  • 9. Case study: LLM Inference
      [Figure labels: Server, 2x CXL 1.1 CPUs; Photowave Card (x2); Memory Expansion Module; 128GB CXL Memory Expander (x2)]
      - 2U Supermicro server
      - 2x AMD Genoa CXL 1.1 CPUs
      - MemVerge Memory Tiering and Pooling Software
      - 2x Micron 256GB Memory Expanders each with CXL/PCIe Gen5x8 link
      - Nvidia GPU running LLM inference
      - All VMs access to CXL memory
      - Secure application, encrypted data
      Demo @ booth #1392
  • 10. LLM Model List
        Model      Weight memory   KV-cache per sample   Activation per sample   Context length
                   (float16)       (float16)             (float16)
        OPT-1.3B   2.4 GB          0.095 GB              0.002 GB                512
        OPT-13B    23.921 GB       0.397 GB              0.005 GB                512
        OPT-30B    55.803 GB       0.667 GB              0.007 GB                512
        OPT-66B    122.375 GB      1.143 GB              0.009 GB                512
        OPT-175B   325 GB          2.285 GB              0.012 GB                512
      KV-cache size: data_type * dimension * num_layers * batch_size * context_len * 2
      e.g., for OPT-1.3B, FP16 -> 2 bytes * 2048 * 24 * 1 * 512 * 2 = 100,663,296 bytes
      Activation size: data_type * dimension * batch_size * context_len
      Entire OPT-66B model fits within one 128GB CXL memory expander
  • 11. Results
        OPT-66B model results          Disk (NVMe)   CXL Memory   System Memory   MemVerge 60:40 Policy
        Decode Throughput (Tokens/s)   1.984         4.859        6.216           6.237
        Decode Latency (s)             338.7         138.2        108.1           107.7
      [Chart annotation: ~2.4x]
      - CXL: 882MB/s, System Memory: 857MB/s, Disk: 582MB/s, MemVerge: 493MB/s
      - CXL: 2365MB/s, System Memory: 2609MB/s, Disk: 1887MB/s, MemVerge: 2173MB/s
  • 12. PHOTOWAVE™ OPTICAL CXL MEMORY EXPANDER
  • 13. PHOTOWAVE™ OPTICAL CXL MEMORY EXPANDER
      [Demo dashboard]
      Hardware: Genoa AMD CPU; Samsung CXL 128GB; Nvidia GPU: 1x A10 24GB
      Parameters: OPT-66B model; inference engine: FlexGen; KV cache: 109.688GB; weights: 122.375GB; run mode: CXL
      Run-mode options shown: CXL, DISK (NVMe)
      Gauges: GPU utilization; GPU mem. utilization; CPU utilization; mem. utilization; CXL mem. utilization; decode throughput (tokens/s); progress 99%
      Gauge values shown: 95%, 77%, 51%, 27%, 77%
  • 14. Summary of Results
      CXL memory offloading is efficient and beneficial
        - LLM inference case study
        - Allows use of lower cost memory
      Similar performance compared to pure system memory
      1.9x TCO improvement with inexpensive GPUs at similar throughput
      2.4x performance advantage compared to SSD/NVMe disk offloading
  • 15. Photowave™ Form Factors
      Product suite: Low Profile PCIe Card; OCP 3.0 SFF Card; Active Optical Cables
      Features:
        - CXL 2.0/PCIe Gen5 x16
        - Jitter reduction, SI cleanup
        - Sideband signals over optics
        - x8, x4 or x2 bifurcation
        - End-to-end latency:
            - Card: under 20ns + TOF
            - AOC: 1ns + TOF
  • 16. Endnotes: Hardware configuration
      Super Micro Server
        - AMD EPYC 9124 16-Core CPU
        - Samsung DDR5 4800 MT/s
        - MEM0 size: 256GB
        - MEM1 size: 256GB
        - Bandwidth: 307GB/s
      Nvidia GPU
        - Gen4x16, DMEM size: 24GB
        - Bandwidth: 32GB/s
      Samsung NVME
        - Gen4x4, MEM size: 1.92TB
        - Bandwidth: 8GB/s
      Samsung CXL Memory
        - Gen5x8, MEM size: 128GB
        - Bandwidth: 32GB/s
      Algorithm & Software
        - LLM: OPT-66B
        - Batch size = 24
        - Context length = 512
        - Output length = 8
        - FlexGen
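
The KV-cache and activation formulas on slide 10 can be checked with a few lines of Python. This is a minimal sketch, not the presenters' tooling: the function and constant names are made up for illustration, and only the OPT-1.3B inputs (dimension 2048, 24 layers, batch size 1, context length 512, FP16) come from the slide.

```python
# Sketch of the slide-10 sizing formulas. Function names are illustrative,
# not taken from the deck or from FlexGen.

FP16_BYTES = 2          # data_type size for float16
GIB = 1024 ** 3         # bytes per GiB, used only for display


def kv_cache_bytes(dtype_bytes, dimension, num_layers, batch_size, context_len):
    """data_type * dimension * num_layers * batch_size * context_len * 2.

    The trailing factor of 2 accounts for storing both keys and values.
    """
    return dtype_bytes * dimension * num_layers * batch_size * context_len * 2


def activation_bytes(dtype_bytes, dimension, batch_size, context_len):
    """data_type * dimension * batch_size * context_len."""
    return dtype_bytes * dimension * batch_size * context_len


# OPT-1.3B inputs as given in the slide's worked example.
opt_1_3b_kv = kv_cache_bytes(FP16_BYTES, 2048, 24, 1, 512)
opt_1_3b_act = activation_bytes(FP16_BYTES, 2048, 1, 512)

print(f"KV cache:   {opt_1_3b_kv:,} bytes ({opt_1_3b_kv / GIB:.3f} GiB)")
print(f"Activation: {opt_1_3b_act:,} bytes ({opt_1_3b_act / GIB:.3f} GiB)")
```

The KV-cache result reproduces the slide's 100,663,296 bytes. It is close to, but not exactly, the 0.095 GB listed in the table; the slide does not state the exact inputs or unit convention behind the table values.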
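
The attenuation figures on slide 7 and the "+ TOF" latency terms on slide 15 are easier to compare once converted to power fractions and nanoseconds. The sketch below is only illustrative: the fiber group index of 1.468 is an assumed typical value for silica fiber, not a number from the deck, and the 20 ns card overhead is the slide's stated upper bound ("under 20ns").

```python
# Illustrative link arithmetic. FIBER_GROUP_INDEX is an assumption (typical
# silica fiber); it does not appear in the slides.

C_VACUUM_M_PER_S = 299_792_458   # speed of light in vacuum
FIBER_GROUP_INDEX = 1.468        # assumed; light travels at c / index in fiber


def db_to_power_fraction(attenuation_db):
    """Convert a (negative) dB attenuation into the fraction of power remaining."""
    return 10 ** (attenuation_db / 10)


def time_of_flight_ns(length_m, group_index=FIBER_GROUP_INDEX):
    """Propagation delay (TOF) through a fiber of the given length, in ns."""
    return length_m * group_index / C_VACUUM_M_PER_S * 1e9


def card_latency_upper_bound_ns(length_m, card_overhead_ns=20.0):
    """Slide 15 quotes 'under 20ns + TOF' for the card; 20 ns is the bound."""
    return card_overhead_ns + time_of_flight_ns(length_m)


# Attenuation values as read from the slide-7 chart labels.
for label, db in [("-4 dB", -4.0), ("-40 dB", -40.0), ("-0.003 dB", -0.003)]:
    print(f"{label}: {db_to_power_fraction(db):.6f} of input power remains")

for meters in (1, 10):
    print(f"{meters} m fiber: TOF {time_of_flight_ns(meters):.1f} ns, "
          f"card bound {card_latency_upper_bound_ns(meters):.1f} ns")
```

Under these assumptions, a 10 m fiber adds a little under 50 ns of flight time, which is consistent with the "100s of ns" scale on slide 5.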

Editor's Notes

  1. Key message: CXL is industry consensus for disaggregation
  2. MemVerge policy: System memory 60%, CXL Memory 40%
  3. What is CPU%
     Console output of main.py, per run mode:
       Metric           mem                     cxl                 disk
       CPU%             29.525                  31.431              81.46
       MEM%             27.2                    27.2                11.722
       GPU%             99.49                   97.05               53.01
       CXLMEM%          0.0016306192454823602   77.11430249904593   0.1663918208702139
       GPUMEM%          45.17                   49.0                34.71
       GPUMEM_USED_MB   9213.4375               9213.4375           8979.4375
       PCI_TX_MBps      274.2578125             191.357421875       81.064453125
       PCI_RX_MBps      2007.3828125            1422.421875         1158.056640625
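
The "~2.4x" annotation on slide 11 and the summary's "2.4x performance advantage" can be rechecked from the decode results table. This is a small sketch with illustrative names; the numbers are copied from slide 11 and nothing else is assumed.

```python
# Decode results for OPT-66B, copied from the slide-11 table.
DECODE_THROUGHPUT_TOKENS_PER_S = {
    "disk_nvme": 1.984,
    "cxl_memory": 4.859,
    "system_memory": 6.216,
    "memverge_60_40": 6.237,
}
DECODE_LATENCY_S = {
    "disk_nvme": 338.7,
    "cxl_memory": 138.2,
    "system_memory": 108.1,
    "memverge_60_40": 107.7,
}


def ratio_vs(results, baseline):
    """Return each entry divided by the baseline entry."""
    base = results[baseline]
    return {name: value / base for name, value in results.items()}


throughput_vs_disk = ratio_vs(DECODE_THROUGHPUT_TOKENS_PER_S, "disk_nvme")
for name, ratio in throughput_vs_disk.items():
    print(f"{name}: {ratio:.2f}x the disk throughput")

# Latency improves by roughly the same factor as throughput for CXL vs disk.
latency_gain_cxl = DECODE_LATENCY_S["disk_nvme"] / DECODE_LATENCY_S["cxl_memory"]
print(f"CXL decode latency is {latency_gain_cxl:.2f}x lower than disk")

# Fraction of pure system-memory throughput that CXL offloading reaches.
cxl_vs_system = (DECODE_THROUGHPUT_TOKENS_PER_S["cxl_memory"]
                 / DECODE_THROUGHPUT_TOKENS_PER_S["system_memory"])
print(f"CXL reaches {cxl_vs_system:.0%} of system-memory throughput")
```

The CXL-to-disk throughput ratio comes out at about 2.45, matching the rounded 2.4x on the slides.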