Using CXL with AI Applications
Memory Fabric Forum
Steve Scargall, Senior Product Manager
Agenda
• CXL 1.1 Memory Expansion Form Factors
• Latency and Bandwidth Memory Placement Strategies
• RDBMS Investigation and Results
• Vector Database Investigation and Results
• Understanding Your Application Behavior
CXL Memory Expansion Form Factors
Form factors: DDR DIMMs, E3.S memory modules, and PCIe Add-In Cards (AICs)

Add-in Card (AIC)
• Flexible capacity, up to 2 TB per card
• Higher bandwidth, up to x16 PCIe5 lanes (~1x DDR5 channel)

E3.S Module
• Easy front loading, same as SSDs
• Fixed capacity: 128, 256, & 512 GB
• Lower bandwidth at x8 PCIe5 lanes
Using CXL Type 3 Memory with Apps
1. Configure CXL as ‘Special Purpose’ in the BIOS
2. The Linux kernel creates a DEVDAX device (/dev/daxX.Y)
3. Convert the device to system-ram mode:
   $ sudo daxctl reconfigure-device --mode=system-ram daxX.Y
4. The CXL memory appears as a new memory-only NUMA node
# numactl -H
available: 3 nodes (0-2)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 …
node 0 size: 515714 MB
node 0 free: 2321 MB
node 1 cpus: 48 49 50 51 52 53 54 55 56 …
node 1 size: 516064 MB
node 1 free: 499038 MB
node 2 cpus:
node 2 size: 129023 MB   <-- CXL Memory
node 2 free: 203 MB
node distances:
node    0    1    2
   0:  10   21   14
   1:  21   10   24
   2:  14   24   10
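With the CXL node online, applications can use it through standard NUMA tooling with no code changes. A minimal sketch (node numbers follow the output above; ./my_app is a hypothetical application):

$ daxctl list                           # confirm the dax device is in system-ram mode
$ numactl --membind=2 ./my_app          # hard-bind all allocations to the CXL node
$ numactl --preferred=2 ./my_app        # prefer CXL, fall back to DRAM when node 2 fills

--membind is a strict binding, mostly useful for measuring worst-case CXL-only behavior; --preferred places pages on the CXL node first but falls back to DRAM once it is full.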
Latency Optimized Memory Placement
• Latency tiering intelligently manages data placement and movement across heterogeneous memory devices, optimizing performance based on the "temperature" of memory pages (Hot or Cold(er)) and on device characteristics (a kernel-level sketch follows the diagram).
[Diagram: the application's Hot pages are kept on DRAM; Cold(er) pages are placed on CXL.]
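On recent Linux kernels this hot/cold policy can be approximated with built-in tiering support. A minimal sketch, assuming a kernel with page demotion (~5.15+) and tiering-aware NUMA balancing (~6.1+); exact knobs may differ on your build:

$ echo 1 | sudo tee /sys/kernel/mm/numa/demotion_enabled   # let reclaim demote cold pages from DRAM to the CXL node
$ echo 2 | sudo tee /proc/sys/kernel/numa_balancing        # mode 2 = NUMA_BALANCING_MEMORY_TIERING: promote hot pages back to DRAM

The "Kernel TPP" series in the results that follow presumably relies on this kernel mechanism, while MM3.x applies its own hot-page detection and promotion heuristics.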
TPC-C Results: MySQL (Latency Policy)
SUT: Intel Xeon Gold 6438Y, 1024 GiB DRAM (1DPC), 1x CXL AIC Memory Expander (x16 lanes), OS: Ubuntu 22.04.4, Kernel 6.2.15, MySQL 8.x
Performance varies by use, configuration, and other factors. Performance results are based on testing as of the benchmark date using production and pre-production hardware and software. Your results may vary.
[Charts: four panels plot results against the number of clients (0-1000), comparing MM3.x with the Kernel TPP baseline:
• TPS relative to TPP (higher is better)
• QPS relative to TPP (higher is better)
• P95 latency (ms) relative to TPP (lower is better)
• CPU utilization (%) relative to TPP (lower is better)]
Bandwidth Optimized Memory Placement
• The goal is to maximize overall system bandwidth by strategically placing data between DRAM and CXL.
• Both Hot and Cold data can be placed on DRAM and CXL.
• Strategies include Equal Interleaving, Weighted Interleaving, Random Page Selection, Intelligent Page Selection, etc.
• The DRAM:CXL ratio needs to be determined; use STREAM or Intel MLC to obtain DRAM and CXL bandwidth numbers (see the measurement sketch below).
[Diagram: the application's data is spread across both DRAM and CXL.]
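A simple way to obtain the per-node bandwidth numbers is to pin the benchmark threads to one socket and vary only the memory binding. A sketch assuming the node numbering from the earlier numactl output and a locally built STREAM binary:

$ numactl --cpunodebind=0 --membind=0 ./stream_c.exe    # local DRAM bandwidth for socket 0
$ numactl --cpunodebind=0 --membind=2 ./stream_c.exe    # CXL bandwidth through the same socket
$ mlc --max_bandwidth                                   # alternative: Intel MLC bandwidth measurement

The measured DRAM:CXL ratio feeds directly into the napkin math on the next slide.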
Bandwidth Napkin Math
• The DRAM:CXL ratio needs to be determined; use STREAM or Intel MLC to obtain DRAM and CXL bandwidth numbers.
Example (per CPU socket):
o DRAM: 8 x DDR5-4800 (1DPC) ~= 300 GB/s
o CXL: 1 x AIC (x16 lanes) ~= 60 GB/s
o Bandwidth ratio ~5:1 DRAM:CXL (CXL is ~20% of DRAM bandwidth; applied in the sketch below)
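Applying the ratio: a 1:1 interleave under-uses DRAM because half of all accesses wait on the slower CXL link, so weighted interleaving matches the page distribution to measured bandwidth. A sketch assuming node 0 = DRAM, node 2 = CXL, a kernel with the weighted-interleave sysfs interface (~6.9+), a recent numactl, and a placeholder ./my_app:

$ numactl --interleave=0,2 ./my_app                      # equal 1:1 interleave across DRAM and CXL
$ echo 5 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node0
$ echo 1 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node2
$ numactl --weighted-interleave=0,2 ./my_app             # ~5 pages on DRAM for every 1 page on CXL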
Weaviate Results (Bandwidth Policy)
SUT: 2x Intel Xeon Platinum 8568CXL, 1024 GiB DRAM DDR5-4800 (1DPC), 1x CXL AIC Memory Expander (x16 lanes), OS: Ubuntu 22.04.4, Kernel 6.2.15, Weaviate v1.23.7
Performance varies by use, configuration, and other factors. Performance results are based on testing as of the benchmark date using production and pre-production hardware and software. Your results may vary.
[Charts: Weaviate benchmark results under the bandwidth placement policy, shown across two slides.]
Understanding Your Application
• Use the Top-Down Microarchitecture Analysis Method
  o Modern CPUs employ pipelining and techniques like hardware threading, out-of-order execution, and instruction-level parallelism to utilize resources as effectively as possible.
  o A hierarchical organization of event-based metrics that identifies the dominant performance bottlenecks in an application.
Source: https://www.intel.com/content/www/us/en/docs/vtune-profiler/cookbook/2023-0/top-down-microarchitecture-analysis-method.html
Understanding Your Application
• Intel VTune Profiler and toplev are great tools to use (a VTune sketch follows the toplev output below)

$ toplev -l2 --nodes '!+Memory_Bound*/3,+Backend_Bound,+MUX' \
    stream_c.exe --ntimes 1000 --array-size 40M --malloc
<.... Generated application output ... >
# 4.7-full on Intel(R) Xeon(R) Gold 6438Y+ [spr/sapphire_rapids]
BE Backend_Bound % Slots 88.6 [20.0%]
BE/Mem Backend_Bound.Memory_Bound % Slots 62.1 [20.0%]<==
This metric represents fraction of slots the Memory
subsystem within the Backend was a bottleneck...
warning: 5 nodes had zero counts: DRAM_Bound L1_Bound L2_Bound L3_Bound Store_Bound
Run toplev --describe Memory_Bound^ to get more information on bottleneck
Add --run-sample to find locations
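VTune provides the same Top-Down breakdown from the command line. A sketch reusing the STREAM invocation above (analysis-type names can vary slightly between VTune releases):

$ vtune -collect uarch-exploration -result-dir r001 -- ./stream_c.exe --ntimes 1000 --array-size 40M --malloc
$ vtune -report summary -result-dir r001    # includes the Memory Bound percentage and its DRAM/cache breakdown

A large Backend_Bound.Memory_Bound share (62.1% in the toplev run above) marks the workload as a good candidate for the latency- and bandwidth-placement experiments described earlier.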
Call to Action
• Website: https://memoryfabricforum.com/
• YouTube: https://www.youtube.com/@MemoryFabricForum
• SlideShare: https://www.slideshare.net/cxladmin
• LinkedIn: https://www.linkedin.com/groups/14324322/
• Discord: https://discord.gg/crKjfp3xCf