- Memory-intensive workloads are dominating computing, and increasing memory capacity with CPU-attached DRAM alone is getting expensive.
- CXL augments the system memory footprint at lower cost by running over existing PCIe links to add memory outside the CPU package.
- The Intel Xeon roadmap fully supports CXL starting with 5th Gen Xeon processors, and Intel CPUs offer unique hardware-based tiering modes between native DRAM and CXL memory that do not depend on the operating system.
- CXL has full industry support as the standard for coherent input/output.
Q1 Memory Fabric Forum: Compute Express Link (CXL) 3.1 Update (Memory Fabric Forum)
OCP Steering Committee member and former President of the CXL Consortium, Siamak Tavallaei, provides an update on the CXL specifications with a focus on the recently released 3.1 specification.
Torry Steed, Sr. Product Marketing Manager at SMART Modular, provides an overview of CXL PCIe Add-in Cards (AICs) and memory modules that can be used to expand capacity in servers or in external memory pooling systems.
Q1 Memory Fabric Forum: Memory Processor Interface 2023, Focus on CXL (Memory Fabric Forum)
Thibault Grossi, Sr. Technology & Market Analyst, shares excerpts from the recently published report, Memory Processor Interface, Focus on CXL. The report provides a taxonomy of CXL market segments and revenue forecasts through 2028.
During the CXL Forum at OCP Global Summit, Michael Ocampo of Astera Labs explained the problem of the memory wall and how CXL memory powered by Astera Labs can break through it.
During the CXL Forum at OCP Global Summit, memory system architect Jungmin Choi of SK hynix talks about the need for memory bandwidth and capacity, and the SK hynix Niagara solution.
Q1 Memory Fabric Forum: Building Fast and Secure Chips with CXL IP (Memory Fabric Forum)
Gary Ruggles, Sr. Product Manager for PCIe and CXL Controller IP, provides example use cases for CXL adoption, an introduction to Synopsys CXL IP solutions, and interoperability proof points.
In the CXL Forum Theater at SC23 hosted by MemVerge, the Open Compute Project provided an overview of CXL, as well as CXL-related hardware and software projects at OCP
During the CXL Forum at OCP Global Summit, Dharmesh Jani of Meta and Siamak Tavallaei of the CXL Consortium describe the extensive work being done by the Open Compute Project related to CXL.
During the CXL Forum at OCP Global Summit 23, Rick Kutcipal and Sreeni Bagalkote of Broadcom presented their PCIe/CXL Roadmap and announced their Atlas 4 CXL switch.
During the CXL Forum at OCP Global Summit, Enfabrica CEO Rochan Sankar described how to bridge the network and memory worlds with their accelerated compute fabric switch.
During the CXL Forum at OCP Global Summit, Mahesh Wagh, CXL Consortium TTF Co-chair and Senior Fellow at AMD, presented an update on the CXL Consortium mission and roadmap.
Arm: Enabling CXL devices within the Data Center with Arm Solutions (Memory Fabric Forum)
During the CXL Forum at OCP Summit, Arm Director of Segment Marketing Parag Beeraka provides an overview of the Arm portfolio of CXL products for the data center.
During the CXL Forum at OCP Global Summit, SMART Modular Director of Product Marketing Arthur Sainio provides an overview of the company's CXL memory cards and modules.
All Presentations during CXL Forum at Flash Memory Summit 22 (Memory Fabric Forum)
The document summarizes a full-day forum hosted by the CXL Consortium and MemVerge on CXL. The morning agenda includes presentations on CXL from representatives of Google, Intel, PCI-SIG, Marvell, Samsung, and Micron. The afternoon agenda includes panels on CXL usage models from Meta, OCP, Anthropic, and MemVerge. A keynote presentation provides an update on the CXL Consortium and the recently released CXL 3.0 specification, including its expanded fabric capabilities and management features. The specification is aimed at enabling new usage models for memory sharing and expansion to address industry trends toward increased data processing demands.
Shared Memory Centric Computing with CXL & OMI (Allan Cantle)
Discusses how CXL can be better utilized as a Fabric Cache domain separate from a processor's own Local Cache domain. This is done by leveraging a Shared Memory Centric architecture that uses both the Open Memory Interface (OMI) and Compute Express Link (CXL) for the memory ports.
How Development Teams Cut Costs with ScyllaDB.pdf (ScyllaDB)
Now that teams are increasingly being pressed to cut costs, the database can be a low-hanging fruit for sizable cost reduction – especially if you’re managing terabytes to petabytes of data with millions of read/write operations per second.
Join Tzach Livyatan, VP of Product at ScyllaDB, as he shares four ways that teams commonly cut database costs by rethinking their database strategy. We’ll cover topics including:
- Cutting admin costs by reducing node sprawl and reducing the need for tuning
- ScyllaDB as a better, compatible Amazon DynamoDB
- Options to increase price performance through new cloud instances
- Ways to safely add more workloads to your cluster without compromising the performance of your latency-sensitive workloads
In this deck, Yuichiro Ajima from Fujitsu presents: The Tofu Interconnect D.
"Through the development of post-K, which will be equipped with this CPU, Fujitsu will contribute to the resolution of social and scientific issues in such computer simulation fields as cutting-edge research, health and longevity, disaster prevention and mitigation, energy, as well as manufacturing, while enhancing industrial competitiveness and contributing to the creation of Society 5.0 by promoting applications in big data and AI fields."
Learn more: https://insidehpc.com/2018/08/fujitsu-unveils-details-post-k-supercomputer-processor-powered-arm/
and
http://www.fujitsu.com/jp/solutions/business-technology/tc/catalog/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
During the CXL Forum at OCP Global Summit, Jeff Hilland of HPE explained what CXL, PCI SIG, DMTF, OFA, OCP, and SNIA are doing to make CXL fabric, memory and device management interoperable.
MySQL exposes a collection of tunable parameters and indicators that is frankly intimidating. But a poorly tuned MySQL server is a bottleneck for your PHP application scalability. This session shows how to do InnoDB tuning and read the InnoDB status report in MySQL 5.5.
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203 (Linaro)
Session ID: SFO17-203
Session Name: Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203
Speaker: Fu Wei
Track: LEG
★ Session Summary ★
This presentation gives an updated RAS architecture for ARM64 based on the RAS extension (in ARMv8.2), SDEI (Software Delegated Exception Interface), APEI, and UEFI PI-SMM. It covers all the components of the new RAS architecture on ARM64 and gives the audience the current status and next steps of development.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/sfo17/sfo17-203/
Presentation:
Video: https://www.youtube.com/watch?v=NReFBzbeWi0
---------------------------------------------------
★ Event Details ★
Linaro Connect San Francisco 2017 (SFO17)
25-29 September 2017
Hyatt Regency San Francisco Airport
---------------------------------------------------
Keyword:
'http://www.linaro.org'
'http://connect.linaro.org'
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://twitter.com/linaroorg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961
Q1 Memory Fabric Forum: Memory expansion with CXL-Ready Systems and Devices (Memory Fabric Forum)
Ravi Gummaluri, Director of CXL System Architecture at Micron, describes use cases for memory expansion with tiered DRAM and CXL memory, along with performance data.
Ecosystem Alliance Manager Michael Ocampo talks about the CXL industry's effort to break through the memory wall, memory bound use cases, CXL for modular shared infrastructure, and critical CXL collaboration that's happening now.
CXL is enabling new memory architectures by connecting CPUs and GPUs to shared memory pools. Early CXL 1.1 focused on memory expansion by connecting processors to DRAM modules. CXL 2.0 allowed for small memory pools accessible by a few servers. CXL 3.0 supports larger shared memory fabrics by connecting thousands of nodes and enabling true shared memory regions accessible coherently by multiple hosts and accelerators. However, shared memory fabrics using CXL 3.0 may experience greater latency variability and congestion compared to single-host or small memory pooling configurations.
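The capacity difference between pooling and sharing can be made concrete with a small model. The sketch below is purely illustrative (the `CXLMemoryPool` class and all numbers are invented for this example, not a real CXL API): CXL 2.0-style pooling carves exclusive capacity out of the pool per host, while a CXL 3.0-style shared region is counted once yet visible to every host.

```python
# Hypothetical sketch (not a real CXL API): contrasting CXL 2.0-style pooling
# (capacity *partitioned* among hosts) with CXL 3.0-style sharing (one region
# coherently *visible to all* hosts).

class CXLMemoryPool:
    def __init__(self, total_gib):
        self.total = total_gib
        self.allocations = {}      # host -> GiB exclusively owned (pooling)
        self.shared_regions = []   # regions visible to every host (sharing)

    def _used(self):
        return sum(self.allocations.values()) + sum(self.shared_regions)

    def allocate(self, host, gib):
        """CXL 2.0-style pooling: carve out exclusive capacity for one host."""
        if self._used() + gib > self.total:
            raise MemoryError("pool exhausted")
        self.allocations[host] = self.allocations.get(host, 0) + gib

    def create_shared_region(self, gib):
        """CXL 3.0-style sharing: one region, counted once, seen by all hosts."""
        if self._used() + gib > self.total:
            raise MemoryError("pool exhausted")
        self.shared_regions.append(gib)
        return len(self.shared_regions) - 1

pool = CXLMemoryPool(total_gib=1024)
pool.allocate("host-a", 256)
pool.allocate("host-b", 256)
region = pool.create_shared_region(128)   # visible to host-a AND host-b
# 256 + 256 + 128 = 640 GiB consumed; the shared 128 GiB is not duplicated
```

The design point the model captures is that sharing avoids replicating a common data set per host, at the cost of the coherence and congestion effects the paragraph above notes.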
MemVerge CEO Charles Fan describes why memory-hungry generative AI is a driver for CXL technology, the new computing model for AI, and MemVerge software for CXL and AI.
This document presents benchmarks to analyze the memory subsystem performance of multicore processors from AMD and Intel. The benchmarks measure latency and bandwidth for different cache coherence states and locations in the memory hierarchy. Testing was done on dual-socket systems using AMD Opteron 2300 (Shanghai) and Intel Xeon 5500 (Nehalem-EP) quad-core processors. Results show significant performance differences driven by each processor's distinct cache architecture and coherence protocol implementations.
From Rack scale computers to Warehouse scale computers (Ryousei Takano)
This document discusses the transition from rack-scale computers to warehouse-scale computers through the disaggregation of technologies. It provides examples of rack-scale architectures like Open Compute Project and Intel Rack Scale Architecture. For warehouse-scale computers, it examines HP's The Machine project using application-specific cores, universal memory, and photonics fabric. It also outlines UC Berkeley's FireBox project utilizing 1 terabit/sec optical fibers, many-core systems-on-chip, and non-volatile memory modules connected via high-radix photonic switches.
Heterogeneous Computing: The Future of Systems (Anand Haridass)
Charts from NITK-IBM Computer Systems Research Group (NCSRG)
- Dennard Scaling,Moore's Law, OpenPOWER, Storage Class Memory, FPGA, GPU, CAPI, OpenCAPI, nVidia nvlink, Google Microsoft Heterogeneous system usage
RedisConf18 - Re-architecting Redis-on-Flash with Intel 3D XPoint™ Memory (Redis Labs)
The document discusses re-architecting Redis-on-Flash with Intel 3D XPoint memory. It introduces 3D XPoint memory as a new type of memory that is persistent, has high capacity of 6 TB per system, and is cheaper than DRAM. RedisLabs and Intel are collaborating to build the next version of Redis-on-Flash using 3D XPoint memory to increase scalability through larger memory modules and reduce costs compared to DRAM. The challenges include higher latency compared to DRAM and evolving standards.
Q1 Memory Fabric Forum: Using CXL with AI Applications - Steve Scargall.pptx (Memory Fabric Forum)
MemVerge product manager and software architect Steve Scargall discusses key factors related to the use of CXL with AI apps, including memory expansion form factors, latency- and bandwidth-aware memory placement strategies, RDBMS and vector database investigations and results, and understanding your application's behavior.
I understand that physics and hardware emmaded on the use of finete .pdf (anil0878)
I understand that physics and hardware have driven the use of finite element methods to predict fluid flow over airplane wings, and that progress is likely to continue. However, in recent years this progress has been achieved through greatly increased hardware complexity with the rise of multicore and manycore processors, and this is affecting the ability of application developers to achieve the full potential of these systems. Currently, performance is measured on a dense matrix-matrix multiplication test which has questionable relevance to real applications, despite the incredible advances in processor technology and all of the accompanying aspects of computer system design, such as the memory subsystem and networking.
Embedded systems combine hardware and software into a single functional unit, so application developers must work with both to achieve the full potential of these systems on advanced processor technology.
Hardware
(1) Memory
Advances in memory technology have struggled to keep pace with the phenomenal advances in
processors. This difficulty in improving the main memory bandwidth led to the development of a
cache hierarchy with data being held in different cache levels within the processor. The idea is
that instead of fetching the required data multiple times from the main memory, it is instead
brought into the cache once and re-used multiple times. Intel allocates about half of the chip to
cache, with the largest LLC (last-level cache) being 30MB in size. IBM's new Power8 CPU has an even larger L3 cache of up to 96MB [4]. By contrast, the largest L2 cache in NVIDIA's GPUs is only 1.5MB. These different hardware design choices are motivated by careful consideration of the range of applications being run by typical users.
One complication which has become more common and more important in the past few years is
non-uniform memory access. Ten years ago, most shared-memory multiprocessors would have
several CPUs sharing a memory bus to access a single main memory. A final comment on the
memory subsystem concerns the energy cost of moving data compared to performing a single
floating point computation.
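The "fetch once, re-use many times" idea behind the cache hierarchy can be illustrated with a toy cache model (a simplified LRU simulator written for this example, not a model of any specific CPU): when the working set fits in cache, only the first pass pays the cost of main-memory fetches.

```python
# Minimal, illustrative LRU cache model: data brought in once is re-used
# from cache on every later pass instead of being fetched from main memory.

class SimpleCache:
    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = []        # LRU order: most recently used at the end
        self.hits = 0
        self.misses = 0

    def access(self, line_addr):
        if line_addr in self.lines:
            self.hits += 1
            self.lines.remove(line_addr)
            self.lines.append(line_addr)   # refresh LRU position
        else:
            self.misses += 1               # would go to main memory
            if len(self.lines) == self.capacity:
                self.lines.pop(0)          # evict least recently used line
            self.lines.append(line_addr)

cache = SimpleCache(capacity_lines=64)
working_set = list(range(32))              # fits entirely in the cache
for _ in range(10):                        # 10 passes over the same data
    for addr in working_set:
        cache.access(addr)
# Only the first pass misses (32 misses); the other 9 passes hit (288 hits)
```

If the working set were larger than the cache, each pass would evict data before it could be re-used, which is exactly the pressure that drives vendors toward the large LLCs described above.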
(2) Processors
CPUs had a single processing core, and the increase in performance came partly from an increase
in the number of computational pipelines, but mainly through an increase in clock frequency.
Unfortunately, the power consumption is approximately proportional to the cube of the frequency, and this led to CPUs with a power consumption of up to 250W. CPUs address memory bandwidth limitations by devoting half or more of the chip to LLC, so that small applications can be held entirely within the cache. They address the 200-cycle latency issue by using very complex cores which are capable of out-of-order execution. By contrast, GPUs adopt a very different design philosophy because of the different needs of the graphical applications they target. A GPU usually has a number of functional units.
CXL Memory Expansion, Pooling, Sharing, FAM Enablement, and Switching (Memory Fabric Forum)
The document discusses CXL, a new open standard protocol for efficient CPU and memory connectivity. CXL allows for memory disaggregation and pooling across devices by enabling high-bandwidth, low-latency connections between CPUs, GPUs, accelerators, and memory. This helps address the growing CPU-memory bottleneck by allowing expansion of memory capacity beyond what can physically connect to the CPU. CXL also enables memory tiering by providing different performance and cost options for "near" directly attached memory versus "far" switched or fabric attached memory.
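The "near" versus "far" tiering described above is, at heart, a placement policy. The sketch below is a hypothetical policy with assumed latency numbers (the 100 ns / 250 ns figures and all function names are invented for illustration): keep the hottest pages in direct-attached DRAM and spill the rest to CXL-attached memory, trading some latency for much larger capacity.

```python
# Illustrative two-tier placement policy (assumed numbers, not vendor data):
# hot pages in "near" direct-attached DRAM, cold pages in "far" CXL memory.

NEAR_LAT_NS, FAR_LAT_NS = 100, 250        # assumed access latencies

def place_pages(pages, near_capacity):
    """pages: list of (page_id, access_count). Hottest pages fill DRAM first."""
    ranked = sorted(pages, key=lambda p: p[1], reverse=True)
    near = dict(ranked[:near_capacity])   # direct-attached DRAM tier
    far = dict(ranked[near_capacity:])    # CXL-attached tier
    return near, far

def avg_latency(pages, near, far):
    """Access-weighted average latency under the chosen placement."""
    total_accesses = sum(count for _, count in pages)
    weighted = sum(count * (NEAR_LAT_NS if page in near else FAR_LAT_NS)
                   for page, count in pages)
    return weighted / total_accesses

pages = [(0, 900), (1, 50), (2, 30), (3, 20)]   # one hot page, three cold
near, far = place_pages(pages, near_capacity=1)
# 900 hot accesses at 100 ns + 100 cold accesses at 250 ns -> 115 ns average
```

Because access patterns are typically skewed, placing only the hot fraction in DRAM keeps the average latency close to DRAM's while most of the capacity lives in the cheaper far tier, which is the economic argument for tiering.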
Reliable Hydra SSD Architecture for General Purpose Controllers (IJMER)
Solid State Drives (SSDs) have almost replaced traditional Hard Disk Drives (HDDs) in modern computing systems. SSDs possess advanced features such as low power consumption, faster random access, and greater shock resistance. In general, NAND flash memories are used for bulk storage applications. This project focuses on an advanced SSD architecture, called Reliable Hydra, to enhance SSD performance. Hydra SSDs overcome the discrepancy between the slow flash memory bus and fast host interfaces like SATA, SCSI, and USB. They use multiple high-level memory controllers to execute flash memory operations without the intervention of the FTL (Flash Translation Layer), and they accelerate the processing of host write requests by aggressive write buffering. Memories are subject to bit-flipping errors, so this project also considers the incorporation of a matrix code to increase the reliability, and hence yield, of the system. Highly sophisticated controllers for real-time systems in industrial, robotics, medical, and scientific applications require high-performance, reliable memories. The aim of this project is to design a Reliable Hydra SSD architecture for controller applications to enhance performance. The architecture is coded in VHDL using Xilinx ISE tools.
VMworld 2015: The Future of Software-Defined Storage - What Does it Look Like... (VMworld)
The document discusses the future of software-defined storage in 3 years. It predicts that storage media will continue to advance with higher capacities and lower latencies using technologies like 3D NAND and NVDIMMs. Networking and interconnects like NVMe over Fabrics will allow disaggregated storage resources to be pooled and shared across servers. Software-defined storage platforms will evolve to provide common services for distributed data platforms beyond just block storage, with advanced data placement and policy controls to optimize different workloads.
1. Building exascale computers requires moving to sub-nanometer scales and steering individual electrons to solve problems more efficiently.
2. Moving data is a major challenge, as moving data off-chip uses 200x more energy than computing with it on-chip.
3. Future computers should optimize for data movement at all levels, from system design to microarchitecture, to minimize energy usage.
Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors (Michelle Holley)
Speaker: Daniel Towner, System Architect for Wireless Access, Intel Corporation
5G brings many new capabilities over 4G including higher bandwidths, lower latencies, and more efficient use of radio spectrum. However, these improvements require a large increase in computing power in the base station. Fortunately the Xeon Scalable Processor series (Skylake-SP) recently introduced by Intel has a new high-performance instruction set called Intel® Advanced Vector Extensions 512 (Intel® AVX-512) which is capable of delivering the compute needed to support the exciting new world of 5G.
In his talk Daniel will give an overview of the new capabilities of the Intel AVX-512 instruction set and show why they are so beneficial to supporting 5G efficiently. The most obvious difference is that Intel AVX-512 has double the compute performance of previous generations of instruction sets. Perhaps surprisingly though it is the addition of brand new instructions that can make the biggest improvements. The new instructions mean that software algorithms can become more efficient, thereby enabling even more effective use of the improvements in computing performance and leading to very high performance 5G NR software implementations.
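The "double the compute performance" claim follows directly from register width. The toy model below is not real AVX-512 code (the `vector_add` helper and lane counts are illustrative only): it simply counts how many simulated vector instructions are needed to sweep a buffer at 8 lanes (256-bit, AVX2-style) versus 16 lanes (512-bit, AVX-512-style).

```python
# Illustrative model of SIMD width (not real vector intrinsics): doubling the
# lane count halves the number of instructions needed for the same work.

def vector_add(a, b, lanes):
    """Add two equal-length lists, `lanes` elements per simulated instruction."""
    out, issued = [], 0
    for i in range(0, len(a), lanes):
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        issued += 1                       # one simulated vector instruction
    return out, issued

a = list(range(1024))
b = list(range(1024))
_, avx2_ops = vector_add(a, b, lanes=8)      # 256-bit regs, 8 x fp32 lanes
_, avx512_ops = vector_add(a, b, lanes=16)   # 512-bit regs, 16 x fp32 lanes
# 128 simulated instructions vs 64: double the lanes, half the instructions
```

As the talk notes, the brand-new instructions matter at least as much as raw width, since they let algorithms be restructured rather than merely widened; that effect is beyond this simple lane-count model.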
This document discusses approaches to scaling in-memory databases on multicore hardware. There are two main approaches: employing a symmetric database engine where a single process uses all cores to access shared memory, and employing a partitioned database engine where the database is divided into partitions each managed by a dedicated core. A challenge is that cache coherency limits scalability as it does not scale to thousands of cores. The document recommends a software-hardware co-design approach, avoiding centralized critical sections, leveraging hardware message passing, and using techniques like optimistic concurrency control to improve scalability on high core count systems.
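The partitioned-engine idea can be sketched in a few lines. In this hypothetical model (class and method names are invented for illustration), each key hashes to exactly one partition, which in the real design would be owned by a single dedicated core, so the data itself never needs locks or cross-core cache-line transfers.

```python
# Sketch of a hash-partitioned in-memory store: every key is owned by exactly
# one partition (one core in the real design), avoiding shared-data locking.

class PartitionedStore:
    def __init__(self, n_partitions):
        self.partitions = [{} for _ in range(n_partitions)]

    def _owner(self, key):
        return hash(key) % len(self.partitions)   # static key ownership

    def put(self, key, value):
        # In a real engine this would be a message sent to the owning
        # core's queue (hardware message passing), not a direct write.
        self.partitions[self._owner(key)][key] = value

    def get(self, key):
        return self.partitions[self._owner(key)].get(key)

store = PartitionedStore(n_partitions=4)
store.put("user:1", "alice")
store.put("user:2", "bob")
# each key lives in exactly one partition; no cross-partition coordination
```

The trade-off, as the summary notes, is that cross-partition transactions now require message passing between owners, which is why the document recommends software-hardware co-design rather than relying on cache coherency at thousand-core scale.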
The document summarizes the memory performance limitations of Intel Xeon microprocessors based on the Nehalem and Westmere architectures. It finds that per core and per thread memory bandwidth is restricted to about 1/3 of theoretical maximum values. Moving from Nehalem to Westmere, read performance scales well but write performance suffers, revealing scalability issues in Westmere's design. The document aims to provide an accurate analysis of memory bandwidth and latency limitations to help application developers optimize code efficiency.
Similar to Q1 Memory Fabric Forum: Intel Enabling Compute Express Link (CXL)
Q1 Memory Fabric Forum: ZeroPoint. Remove the waste. Release the power. (Memory Fabric Forum)
Nilesh Shah provides an overview of the ZeroPoint portable hardware IP portfolio for lossless memory compression and compaction. The IP boosts memory capacity by 2-4x, improves bandwidth and performance/watt by 50%, and is 1,000x faster than competitors.
Q1 Memory Fabric Forum: CXL-Related Activities within OCP (Memory Fabric Forum)
OCP steering committee member, and former President of the CXL Consortium, Siamak Tavallaei, provides an overview of CXL-related activities happening within the Open Compute Project.
Q1 Memory Fabric Forum: CXL Controller by Montage Technology (Memory Fabric Forum)
For CXL AIC and memory module designers, Nilesh Shah of Montage provides an overview of their CXL memory controller product, technology, and performance.
Nick Kriczsky and Gorden Getty provide an overview of Teledyne LeCroy's Austin Labs portfolio of products and services, including: 1) testing for protocol and electrical compliance, interoperability, data integrity, and performance; 2) in-depth protocol training (PCIe, USB, NVMe, NVMe-oF, Fibre Channel); and 3) automation (solutions for analysis, jamming, and generation).
Torry Steed, Sr. Staff Product Manager at SMART Modular, covers the changing shape of memory leading to new categories of CXL form factors. He dives deeper to address EDSFF and AIC variations, mechanical sizes, installation locations, capacity considerations, and power ratings.
Q1 Memory Fabric Forum: Memory Fabric in a Composable System (Memory Fabric Forum)
Eddie McMorrow, Sr. Product Manager at GigaIO, defines composable infrastructure and memory fabrics, then provides an overview of the FabreX memory fabric.
Q1 Memory Fabric Forum: Micron CXL-Compatible Memory Modules (Memory Fabric Forum)
Michael Abraham, Director of Product Management at Micron, discusses data center challenges, the memory and storage hierarchy, Micron CZ120 memory modules, database (TPC-H) improvements, AI inferencing improvements, and how to enable the technology in your company.
Q1 Memory Fabric Forum: Advantages of Optical CXL for Disaggregated Compute ... (Memory Fabric Forum)
Ron Swartzentruber, Director of Engineering at Lightelligence, explains why optical connectivity is needed for CXL fabrics, and provides an overview of the Photowave line of port expander PCIe cards and active optical cables.
Arvind Jagannath of VMware makes the case for bridging the CPU-Memory imbalance with memory tiering, describes their vision for memory disaggregation, and explains that VMware will support CXL Expanders – Specific Configurations, Memory Tiering to reduce overall TCO, and Memory Accelerators to enable CXL-based use-cases.
MemVerge Field CTO Yong Tian shows what memory expansion costs with an analysis of various server configurations with up to 8TB of tiered DRAM and CXL memory.
In the CXL Forum Theater at SC23 hosted by MemVerge, Lightelligence describes CXL's need for optical connectivity and their portfolio of CXL optical expander cards and cables
Synopsys: Achieve First Pass Silicon Success with Synopsys CXL IP Solutions (Memory Fabric Forum)
This document discusses Synopsys' CXL IP solutions for enabling first pass silicon success. It provides an overview of:
- How large data sets are driving the need for CXL and larger, more efficient cache coherent storage.
- How CXL allows memory expansion by enabling one interface to connect to various memory types like DDR, LPDDR, and persistent memory.
- Synopsys' complete CXL IP solution which uses proven PCIe IP to provide a highly efficient 512-bit controller and 32GT/s PHY for maximum bandwidth and low latency.
- Synopsys' work with XConn to achieve first pass silicon success on a 256 lane CXL 2.0 switch SOC
In the CXL Forum Theater at SC23 hosted by MemVerge, Samsung described the architecture and use cases of their hybrid drive that includes DRAM and flash memory.
Project Gismo introduces a global I/O-free shared memory object (Gismo) library that utilizes CXL to provide direct memory access across nodes. This allows distributed applications to access remote objects as fast as local memory, eliminating object serialization and data copying. Demo results show Gismo can improve performance of AI/ML workloads like Ray by up to 675% and reduce database synchronization times. The Gismo API provides functions to connect, create, access, and manage shared memory objects globally without I/O.
2. 2
1 Source: Intel. Results may vary.
2 Source: https://flashmemorysummit.com/English/Collaterals/Proceedings/2018/20180809_NEWM-301A-1_Gervasi.pdf
3 Source: Intel Internal. Estimates – based on large scale deployments and does not include software costs.
[Chart: Current cost percentage of server memory compared to other components³]
[Chart: DRAM Density Over Time² (1 MB in 1985 to 32 GB projected by 2025): historical growth of ~4x every 3 years and ~2x every 3 years is slowing to a projected ~2x every 4 years. Scaling of DRAM density is slowing.²]
[Chart: CPU Core Growth Projection Over Time¹ (2017-2023): exponential CPU core growth. Compute performance growth is accelerating.¹]
Memory density and costs are not keeping pace with data center workload and infrastructure cost requirements.
Popular memory-intensive workloads: AI/ML with LLMs, databases & analytics, web-caching apps, content delivery networks, virtual desktop infrastructure.
Today, memory costs dominate the server's BOM.
4. 4
CXL on Motherboards: Same slot for PCIe OR CXL
Starting with 4th Gen Xeon (Sapphire Rapids, SPR) processors
• Flexible port configured for PCIe or CXL during link-up
[Diagram: an Intel® Xeon® host drives a PCIe x16 @ 32 GT/s connector/slot through a mux that selects CXL or PCIe, so the same slot can carry either an x16 CXL device or an x16 PCIe device. Photo: Intel Archer City PCIe slots.]
5. 5
Augment System Memory with CXL
CPU-attached DRAM (native DDR5):
• Expensive to add more DRAM channels to the CPU package
• Memory capacity expansion with 2 DPC often causes a drop in total memory bandwidth (DDR5-5600 to DDR5-4800)
• Has the lowest memory latency
CXL-attached memory (EDSFF E3 or E1, PCIe CEM/custom board):
• Cheaper to add CXL channels to the CPU package: 66 pins for one x16 CXL link vs. 250 pins for two DDR5 channels (note: bandwidth of one x16 CXL link ~= two DDR5 channels)
• Allows for bandwidth expansion irrespective of the DRAM configuration on the CXL memory buffer
• Reduces TCO by re-use of older DDR4 memory or by use of cheaper low-bandwidth memory like NVM
• Has higher latency compared to CPU-attached DRAM
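The pin and bandwidth comparison can be sanity-checked with back-of-envelope arithmetic. The rates below are common rule-of-thumb peak numbers, assumed for illustration rather than taken from this deck:

```python
# Back-of-envelope check of the "one x16 CXL link ~= two DDR5 channels"
# claim, using assumed rule-of-thumb peak rates: CXL over PCIe 5.0 runs
# 32 GT/s per lane; a DDR5-5600 channel moves 5600 MT/s over a 64-bit
# (8-byte) bus.
lanes = 16
gt_per_sec = 32
cxl_one_way = lanes * gt_per_sec / 8        # ~64 GB/s per direction
cxl_total = 2 * cxl_one_way                 # ~128 GB/s both directions

ddr5_mt_per_sec = 5600
ddr5_channel = ddr5_mt_per_sec * 8 / 1000   # ~44.8 GB/s per channel
two_channels = 2 * ddr5_channel             # ~89.6 GB/s

print(f"x16 CXL: {cxl_one_way:.0f} GB/s/dir, {cxl_total:.0f} GB/s total")
print(f"2x DDR5-5600: {two_channels:.1f} GB/s")
```

So the "~=" holds when counting the link's aggregate (bidirectional) bandwidth, which is also why the pin-count comparison favors CXL so heavily.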
6. 6
CXL Memory Tiering
CXL-based memory addition creates memory tiers (NUMA nodes): native DRAM ("Near Memory") / CXL memory ("Far Memory")
Software (hypervisor/OS/app) assisted memory tiering:
• Mechanism: software performs hot/cold page movement
• Larger-granularity (4 KB+) transfers
• Tracking/telemetry overheads
Hardware-controlled memory tiering, options:
• (1) Interleave DRAM and CXL memory address space: system memory & bandwidth expansion; lowers average latency
• (2) Intel Flat Memory Mode (on BHS): system memory expansion; TCO reduction
Intel's HW-controlled tiering feature is unique to Intel Xeon CPUs; the system boots as a single NUMA node, providing OS-version-agnostic performance gains.
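The software-assisted mechanism can be sketched as a toy page tracker. All names and thresholds here are illustrative, not a kernel API; the point is where the page-granularity and telemetry costs come from:

```python
# Toy sketch of software-assisted tiering (hypothetical names/thresholds,
# not a kernel API): track per-4KB-page access counts and promote hot
# pages from the CXL "far" tier to the DRAM "near" tier on each scan.
from collections import Counter

PAGE = 4096          # tiering granularity: whole pages, not cachelines
HOT_THRESHOLD = 3    # accesses per scan interval before promotion

class TieringSim:
    def __init__(self):
        self.near, self.far = set(), set()   # page numbers per tier
        self.hits = Counter()                # the tracking/telemetry cost

    def access(self, addr):
        self.hits[addr // PAGE] += 1

    def scan(self):
        # Periodic scan: each promotion models a full 4 KB page copy.
        promoted = [p for p in self.far if self.hits[p] >= HOT_THRESHOLD]
        for p in promoted:
            self.far.discard(p)
            self.near.add(p)
        self.hits.clear()
        return promoted

sim = TieringSim()
sim.far.update({0, 1})          # both pages start in CXL far memory
for _ in range(4):
    sim.access(0)               # page 0 is hot
sim.access(PAGE)                # page 1 touched once: stays cold
print(sim.scan())               # → [0]
```

Every access bumps a counter and every promotion moves 4 KB, which is exactly the overhead the hardware-controlled modes avoid.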
7. 7
Intel Xeon Roadmap Fully Aligned with CXL Roadmap
Intel CXL Enabling Strategy (*PoC: Proof of Concept)
• 4th & 5th Gen Intel® Xeon® CPUs: Sapphire Rapids (SPR) / Emerald Rapids (EMR), Eagle Stream platform: supports CXL v1.1 spec; leadership in CXL ecosystem enablement
• 6th Gen Intel® Xeon® CPUs: Granite Rapids (GNR) / Sierra Forest (SRF), Birch Stream platform: supports CXL v2.0 spec; enhanced support for CXL memory; Flat Memory Mode; memory pooling for PoC
• Future Gen Intel® Xeon® CPU: support for CXL v3.X spec
8. 8
CXL is emerging as the industry focal point for coherent IO
• August 2, 2022, Flash Memory Summit: CXL Consortium and OpenCAPI Consortium sign letter of intent to transfer the OpenCAPI specification and assets to the CXL Consortium
• February 2022: the CXL Consortium and Gen-Z Consortium signed an agreement to transfer the Gen-Z specification and assets to the CXL Consortium
CXL Standard Firmly Entrenched
[Graphic: CXL Board of Directors; industry open standard for high-speed communications; 250+ member companies]
Compute Express Link™ and CXL™ Consortium are trademarks of the Compute Express Link Consortium; Confidential | CXL™ Consortium 2020
9. 9
Summary
Memory-intensive workloads are dominating the computing landscape today
• Increasing memory capacity purely using CPU-attached DRAM is getting expensive
The CXL protocol, running over the same existing PCIe links, allows for augmenting the system memory footprint at a lower cost
The Intel Xeon® roadmap fully supports CXL starting with 5th Gen Xeon® CPUs
• Intel CPUs offer unique hardware-based tiering modes which do NOT depend on the OS's data-movement capabilities
The CXL protocol has full support from all major computing industry players
12. 12
CXL Memory Expansion: Intel Flat Memory Mode
[Diagram: host with DDR5 system-memory channels plus a CXL Type-3 card (CXL memory expander) carrying DDRx memory channels behind the CXL link: DDR, LPDDR, or NVM.]
Plain memory tier:
• Memory classification: second memory tier; total system address space = native DRAM + CXL memory
• CXL memory attributes: bandwidth/latency similar to direct-attach DDR, OR lower bandwidth/higher latency vs. direct-attach DDR
• Software considerations: OS version must support CXL memory as the next tier; performance managed by the OS with AutoNUMA, or by the workload or OS moving hot/cold pages (4 KB+)
Flat Memory Mode:
• Memory classification: single memory tier; total system address space = native DRAM + CXL memory
• CXL memory attributes: bandwidth/latency similar to direct-attach DDR, OR lower bandwidth/higher latency vs. direct-attach DDR
• Software considerations: completely HW-managed movement of data between the two tiers; no SW (workload or OS) involvement; granularity of data movement is a cacheline (64 B), giving lower latency
13. 13
Intel Flat Memory Mode
• Special GNR/SRF CXL memory expansion mode
• Both DRAM and far memory are exposed to the OS as combined physical memory
• Data resides in either DRAM or FM (no replication)
• Hot data is swapped into DRAM one cacheline at a time, not a whole 4 KB page
• Performance is very good due to the 1:1 near/far memory ratio
[Diagram: Flat MM (1:1 ratio): 512 GB far memory + 512 GB DRAM appear as a single pool of OS-visible memory, backed by DRAM plus far memory.]
Flat Memory Mode is a feature unique to GNR/SRF on the BHS platform
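The swap behavior can be modeled with a toy direct-mapped scheme (tiny illustrative sizes, not the real hardware algorithm): with a 1:1 capacity ratio, every far-memory line has exactly one home slot in DRAM, so a near-memory miss swaps a single 64 B cacheline instead of migrating a 4 KB page.

```python
# Toy model of the Flat Memory Mode swap. LINE and SLOTS are tiny
# illustrative values, not real capacities.
LINE = 64
SLOTS = 4                          # near-memory (DRAM) line slots

near = {}                          # slot -> line currently held in DRAM
hits = misses = 0

def access(addr):
    global hits, misses
    line = addr // LINE
    slot = line % SLOTS            # the line's single home slot (1:1 ratio)
    if near.get(slot) == line:
        hits += 1                  # served at DRAM latency
    else:
        misses += 1                # HW swaps this 64 B line with DRAM
        near[slot] = line

for a in [0, 0, 0, 64, 64, 256]:   # mostly-hot access pattern
    access(a)
print(hits, misses)                # → 3 3
```

Because the displaced line always fits in the far-memory slot it came from, the swap never forces unrelated evictions, which is the advantage of the 1:1 ratio the slide calls out.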
14. 14
Flat Memory Mode Performance Demo Test Configuration
Future Intel Xeon processor code-named "Granite Rapids", Intel Flat Memory Mode
• Performance test:
• SAP in-memory database HANA*
• Online Analytics Processing (OLAP) workload measuring analytic queries
• OS: SUSE Enterprise Linux SLES 15
• Insights:
• 98% performance when compared to using only all-native DDR5 memory
• More than 80% of memory capacity (native DRAM + CXL memory) in use
• Less than 4% miss rate: Intel Flat Memory Mode serves more than 96% of memory accesses from native DRAM, with hardware-managed tiering between native DRAM (DDR5) and CXL-attached DDR4 memory
[Diagram: two configurations compared on the "Granite Rapids" system: baseline with 256 GB DDR5 memory on native channels only, vs. Flat Memory Mode with 128 GB DDR5 memory + 128 GB CXL-attached DDR4 memory.]
*Note: This is a performance test and not a support statement from SAP
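The sub-4% miss rate explains the 98% result: most accesses see DRAM latency. A rough weighted-average estimate, using illustrative latency assumptions rather than Intel measurements:

```python
# Rough weighted-average latency implied by a ~96% DRAM hit rate.
# near_ns and far_ns are illustrative assumptions, not Intel data.
near_ns = 100     # assumed native DDR5 load-to-use latency
far_ns = 250      # assumed CXL-attached DDR4 latency
hit_rate = 0.96   # from the demo: >96% of accesses served from DRAM

avg_ns = hit_rate * near_ns + (1 - hit_rate) * far_ns
print(round(avg_ns, 1))   # → 106.0
```

Even with far memory assumed 2.5x slower, the average access is only a few percent above native DRAM latency, consistent with the 98% performance figure.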
15. 15
CXL Memory Bandwidth Expansion
Value prop: enable bandwidth-hungry workloads like ML; enable higher core counts
Interleaving modes:
• Interleave across CXL devices within the CXL memory region
• Hetero-interleave between the CPU's DDR5 & CXL memory (for bandwidth usage expansion only)
• System configuration chosen at boot time
CXL memory attributes: bandwidth sustained over the CXL link similar to direct-attach DDR
Methods:
1) Completely HW-based interleaving (no OS tiering capability required)
2) SW (OS, middleware) based page interleaving
[Diagram: CPU with direct-attach DDR5 plus CXL memory in EDSFF E3 or E1, or PCIe CEM/custom board form factors.]
HW-assisted hetero-interleaving is a feature unique to EMR & GNR/SRF
16. 16
H/W-Assisted Hetero-Interleave Mode (EGS-EMR)
Completely HW-controlled tiering mode
• CXL memory recognized as a single NUMA node
No page movements
No dependence on OS-based tiering techniques
System address space 'striped' across:
• 8 native DRAM channels (for 5th gen Xeons)
• Memory attached via 2 x16 CXL 1.1 links (~= 4x DDR5 channels)
• Total = 12-way interleave
Results in higher system memory bandwidth
[Diagram: EMR CPU with 8x DDR5 channels (DDR5 DIMMs, 8-way interleave) and two x16 CXL 1.1 links to buffers carrying DDR5 (2-way channel interleave per buffer, 4-way across the CXL region).]
Intel's hetero-interleave mode is beneficial to bandwidth-hungry workloads like AI/ML, with no dependency on OS version/capability
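The 12-way striping amounts to a simple address-to-channel mapping. The 256 B stripe size below is an assumed value for illustration; real interleave granularity is platform-configured at boot:

```python
# Illustrative model of 12-way hetero-interleaving: 8 native DDR5
# channels plus CXL-attached memory equivalent to 4 more channels.
# STRIPE and the channel names are assumptions for illustration.
STRIPE = 256
CHANNELS = [f"ddr5-{i}" for i in range(8)] + [f"cxl-{i}" for i in range(4)]

def channel_for(addr):
    # Consecutive stripes rotate round-robin across all 12 targets.
    return CHANNELS[(addr // STRIPE) % len(CHANNELS)]

# A streaming access touches every channel in turn, so sequential reads
# draw bandwidth from all 12 targets in parallel:
print([channel_for(i * STRIPE) for i in range(12)])
```

Since the mapping is a fixed function of the physical address, the hardware needs no page tables or OS hooks, which is why this mode is OS-version agnostic.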
17. 17
AI-Based Image Analysis, EMR-Based Demo
• 23% speedup with hetero mode (12-ch) CXL memory
• Hetero mode memory BW utilization, read/write ratio: 2:1
[Chart: Bone Age Assessment throughput (fps), higher is better: native-only = 100%, 12-ch hetero mode = 123%*.]
[Diagram: AI inference pipeline: input image through localization, regression, and heatmap networks, producing gender, bone age assessment, and key-points heatmap outputs.]
*123% is using production CXL silicon. The demo is running pre-production silicon that shows a 112% speedup.
Editor's Notes
As you all know, PCIe has been the primary link connecting external devices to the CPU for almost two decades. The PCIe link has, of course, evolved to address the bandwidth needs and new features required by modern CPUs.
So then many of you probably wonder about the need to invent a new CXL protocol.
Slide showing the challenges not quite addressed by the existing PCIe links between processors & devices.
Under the hood, CXL includes 3 sub-protocols: the first is exactly the same as the PCIe protocol (called CXL.io), and the other two are the new coherent protocols (CXL.mem / CXL.cache).
However this does not mean PCIe has no future. There are certain applications – like those related to big data block transfers – which are better addressed by PCIe than CXL. But that is not the subject we will discuss today.
We now have a way to add to the system memory using CXL links..
The CXL Consortium got a big momentum boost in 2022 when both OpenCAPI (IBM's coherent link protocol) and Gen-Z (another coherent fabric protocol pushed by major OEMs) merged their assets with those of CXL.
Later another coherent link group (CCIX) followed suit. Today the CXL Consortium stands 250+ companies strong and is here to stay.
These are the basic cases when the memory is directly attached locally on the CXL end-point & the memory is pretty much captive to the local host.
And one could consider memory tiering to be a special case of memory expansion, but here the software can play a bigger role in tiering optimizations: it can move pages in and out of tiered memory based on application execution dynamics. This is especially so when a lower-bandwidth memory like persistent memory is used.
Now contrast this with Flat Memory Mode, a unique feature offered only on Intel CPUs starting with the GNR generation. In this mode the BIOS, which initially enumerates the system memory (native + CXL memory), presents it to the OS as only one NUMA node, or one tier. So OS-based page movement is not invoked during system operation. Instead, the CPU hardware swaps cachelines with the CXL memory when a miss occurs in the DRAM. This is a quick 64-byte transfer, unlike the full 4 KB page of an OS-based data movement, meaning the workload is stalled only briefly.
Since this is a hardware-controlled data movement, there is no dependence on a particular Linux version for a CXL page-movement capability. Any Linux kernel that can detect CXL memory (as early as v5.1) will suffice. The OS is still needed, of course, for housekeeping tasks like launching applications, error handling, etc.
Want to share a CXL-memory expansion mode which is unique to GNR/SRF family of CPUs.
Want to compare this with other CXL-memory expansion use cases we discussed so far.
Unlike the other memory expansion modes, where the additional memory resides in a separate tier and any cacheline in that tier incurs a higher latency, in the Flat2LM mode the fetched cacheline is swapped with the corresponding cacheline in the near memory. Note that this is always possible since the ratio between the two memory sizes is 1:1.
FlatMM offers even better performance compared to the Memory Mode of Intel Optane persistent memory, since there the ratio was 1:4, leading to more unrelated evictions when swapping was done. Of course, one has to provision the same amount of memory on the CXL side.
FlatMM's value prop is the big memory TCO reduction. A big emerging use case is the re-use of older generations of DDR memory which otherwise would have been recycled. So I term this advantage '2 for the price of 1'.
Testing done on SAP HANA database, SUSE Enterprise Linux SLES 15 with OS kernel 5.19, GNR X3 A stepping 66C, BHSDCRB1.SYS.2526.D01.2307311547 BIOS version, with DDR5 and CXL DDR4 x8 memory
Even though we are using DDR4 memory on the CXL side the hit to perf is only 2%.
Keep in mind that Flat MM is a TCO play, not a performance-improvement play. We want to show that, in spite of using a cheaper, lower-bandwidth memory on the CXL side, performance stays within a few percent of all-native DDR5.
And the other big use case of CXL-attached memory is bandwidth expansion. Today we already use memory interleaving when accessing local DRAM, by simultaneously accessing all available channels: instead of fetching a big chunk of data out of just one DRAM channel sequentially, we split that chunk across all 8 available DRAM channels and fetch multiple smaller chunks simultaneously.
With CXL attached memory, one can extend this idea to add CXL memory to the above interleaving scheme.
And all this happens completely under the hood, meaning the CPU hardware can be configured at boot time to access memory in this manner, without the software having to do any addressing tricks.
This feature is unique to Intel CPUs. It is supported on both Eagle Stream & Birch Stream platforms.