2. Agenda
1. What is CXL?
2. Why do we need CXL?
3. CXL vs PCIe
4. Where CXL fits
5. CXL protocols
6. CXL device types
3. What is CXL?
• CXL (Compute Express Link) is an open, industry-supported cache-coherent
interconnect for processors, memory expansion, and
accelerators.
• Essentially, CXL maintains memory coherency
between the CPU memory space and memory on attached
devices.
4. Why do we need CXL?
1. It aims to provide a high-speed, low-latency connection between CPUs, GPUs, FPGAs,
and other accelerators while enabling coherent memory access between these devices.
2. CXL can work alongside PCIe, extending its capabilities and addressing the needs of
emerging workloads such as artificial intelligence, machine learning, and
high-performance computing.
5. CXL vs PCIe
Purpose and Design:
PCIe (5.0): a widely used interconnect standard primarily
designed for connecting components within a computer system, such as
graphics cards, storage devices, and networking cards.
CXL: designed with a focus on memory coherency and acceleration. It aims
to provide a high-speed, low-latency connection between CPUs, GPUs, FPGAs,
and other accelerators while enabling coherent memory access between these
devices.
6. Continuation
Memory Coherency:
PCIe: While PCIe supports peer-to-peer data transfers, it does not inherently
provide memory coherency between devices. In other words, when data is shared
between devices over PCIe, managing data consistency in different caches and
memory spaces can be complex and may require additional software overhead.
CXL: CXL, by contrast, is designed to support memory coherency, which means that
devices connected via CXL can directly access each other's memory and maintain data
consistency with less software intervention.
7. Continuation
Workload Acceleration:
• PCIe: While PCIe is excellent for connecting a wide variety of devices, its lack of
memory coherency support can lead to more significant overhead in certain
scenarios, where data needs to be frequently synchronized between devices.
• CXL: CXL's focus on memory coherency and acceleration makes it well-suited
for tasks that require intensive data sharing and parallel processing, such as
artificial intelligence and scientific computing.
8. Where CXL fits
• CXL builds upon the physical and electrical interfaces of PCIe, adding
protocols that establish coherency, simplify the software stack, and maintain
compatibility with existing standards.
• CXL uses a PCIe 5.0 feature that allows alternate protocols to run over the
physical PCIe layer.
• CXL transaction protocols are activated only if both sides support CXL.
Otherwise, the devices operate as PCIe devices.
9. Terminology / Acronyms
• Accelerator: a device that may be used by software running on host processors to
offload or perform any type of compute or I/O task.
Examples of accelerators include programmable agents (such as GPUs/GPGPUs), fixed-
function agents, and reconfigurable agents such as FPGAs.
• Cache coherence: the uniformity of shared data held in multiple caches. In a
multiprocessor system, copies of the same data in different caches (or in a cache and
main memory) can become inconsistent; coherence protocols keep these copies consistent.
12. MESI protocol
• The MESI protocol is an invalidation-based cache coherence protocol, and is one of the most
common protocols that support write-back caches. It is also known as the Illinois protocol.
• The letters in the acronym MESI represent four exclusive states that a cache line
can be marked with (encoded using two additional bits):
• STATES:
• Modified (M): the data stored in the
cache and main memory differ. The data in the cache has
been modified, and the changes eventually need to be written back to main memory.
• Exclusive (E): the data is clean, i.e., the cache
and main memory hold identical data, and no other cache holds a copy.
13. Continuation
• Shared (S): the cache line is clean (it matches main memory), and copies of the line
may also reside in other caches.
• Invalid (I): the cache line holds no valid data; before use, the data must be fetched from
another cache or main memory.
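The four states and the two-bit encoding mentioned above can be sketched as a small enum. The specific bit values here are an illustrative assumption; real hardware chooses its own encodings:

```python
from enum import Enum

class MESI(Enum):
    # Hypothetical 2-bit encodings; actual implementations pick their own.
    MODIFIED = 0b11   # dirty: cache differs from main memory
    EXCLUSIVE = 0b10  # clean, and no other cache holds the line
    SHARED = 0b01     # clean, copies may exist in other caches
    INVALID = 0b00    # line holds no valid data

def is_clean(state: MESI) -> bool:
    """A line in Exclusive or Shared state matches main memory."""
    return state in (MESI.EXCLUSIVE, MESI.SHARED)

print(is_clean(MESI.MODIFIED))  # False: a Modified line must be written back
```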
• Operation:
14. Continuation
• The MESI protocol is defined by a finite-state machine that transitions from one state to
another based on two stimuli.
• The first stimulus is a processor-specific read or write request. For example: a
processor P1 has a block X in its cache, and there is a request from the processor to read
from or write to that block.
• The second stimulus comes through the bus connecting the processors. In particular, these
"bus-side requests" come from other processors that don't have the cache block or the
updated data in their cache.
• Different types of processor requests and bus-side requests:
• Processor requests to the cache include the following operations:
1. PrRd: the processor requests to read a cache block.
2. PrWr: the processor requests to write a cache block.
15. Continuation
• Bus-side requests are the following:
• BusRd: snooped request indicating a read request to a cache block by another
processor.
• BusRdX: snooped request indicating a write request to a cache block by another
processor that doesn't already have the block.
• BusUpgr: snooped request indicating a write request to a cache block by
another processor that already has that cache block resident in its own cache.
• Flush: snooped request indicating that an entire cache block is being written back to main memory by
another processor.
• FlushOpt: snooped request indicating that an entire cache block is posted on the bus in order to
supply it to another processor (cache-to-cache transfer).
• Snooping operation: in a snooping system, all caches on a bus monitor all the
transactions on that bus. Every cache keeps the sharing status of every block of
physical memory it has stored, and the state of a block changes according to the state
diagram of the protocol used (refer to the MESI state diagram). The bus has
snoopers on both sides:
1. A snooper on the processor/cache side.
2. On the memory side, the snooping function is performed by the memory controller.
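The transitions driven by the two stimuli above can be sketched as a pair of lookup tables for a single cache line. This is a deliberately simplified model (for instance, a PrRd miss enters E when no other cache holds the line and S otherwise; the table below assumes the S case), not a full protocol implementation:

```python
# Simplified MESI transition tables for one cache line.
# Keys: (current_state, event); values: next_state.
# Processor-side events: PrRd, PrWr. Bus-side (snooped) events: BusRd, BusRdX, BusUpgr.

PROCESSOR = {
    ("I", "PrRd"): "S",   # read miss; fetch line (assumes another sharer exists)
    ("I", "PrWr"): "M",   # write miss; issue BusRdX, then write
    ("S", "PrRd"): "S",
    ("S", "PrWr"): "M",   # issue BusUpgr to invalidate other copies
    ("E", "PrRd"): "E",
    ("E", "PrWr"): "M",   # silent upgrade; no bus transaction needed
    ("M", "PrRd"): "M",
    ("M", "PrWr"): "M",
}

SNOOP = {
    ("M", "BusRd"): "S",   # flush dirty data, then keep a shared copy
    ("M", "BusRdX"): "I",  # flush dirty data, then invalidate
    ("E", "BusRd"): "S",
    ("E", "BusRdX"): "I",
    ("S", "BusRd"): "S",
    ("S", "BusRdX"): "I",
    ("S", "BusUpgr"): "I",
}

def next_state(state: str, event: str) -> str:
    table = PROCESSOR if event.startswith("Pr") else SNOOP
    return table.get((state, event), state)  # unlisted events leave the state unchanged

# Example: P1 writes a line (-> M); P2 then reads it, so P1 snoops BusRd (-> S).
s = next_state("I", "PrWr")
s = next_state(s, "BusRd")
print(s)  # S
```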
18. CXL protocols
• 1. CXL.io: this protocol is functionally equivalent to the PCIe protocol.
As the foundational communication protocol, CXL.io is used for device
discovery, enumeration, and link-up.
• 2. CXL.cache: this protocol, designed for more specific
applications, enables accelerators to efficiently access and cache host
memory for optimized performance.
• 3. CXL.mem: this protocol enables a host, such as a processor, to
access device-attached memory using load/store commands.
Together, these three protocols facilitate the coherent sharing of memory
resources between computing devices.
19. CXL device types
Type 1 devices: CXL.io + CXL.cache
Type 2 devices: CXL.io + CXL.cache + CXL.mem
Type 3 devices: CXL.io + CXL.mem
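The mapping above can be summarized in a small lookup table. This is a reference sketch of the type/protocol relationship, not an API:

```python
# CXL device types and the protocol sets each supports.
DEVICE_TYPES = {
    1: {"CXL.io", "CXL.cache"},              # e.g. smart NICs (cache, no exposed memory)
    2: {"CXL.io", "CXL.cache", "CXL.mem"},   # e.g. GPUs/FPGAs with local DDR or HBM
    3: {"CXL.io", "CXL.mem"},                # e.g. memory expanders
}

def supports(device_type: int, protocol: str) -> bool:
    return protocol in DEVICE_TYPES[device_type]

print(supports(3, "CXL.cache"))  # False: a Type 3 memory expander has no device cache
```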
20. Type 1 - Device with Cache
• Type 1 devices: accelerators such as smart NICs
typically lack local memory. Via CXL, these devices can
communicate with the host processor’s DDR memory.
• Type 1 CXL devices have special needs for which a fully
coherent cache in the device becomes valuable.
• The size of cache that can be supported for such devices
depends on the host’s snoop filtering capacity.
21. Type 2 Device
• Type 2 devices: GPUs, ASICs, and FPGAs are all
equipped with DDR or HBM memory and can use CXL
to make the host processor’s memory locally available
to the accelerator—and the accelerator’s memory
locally available to the CPU.
• The key goal for CXL is to provide a means for the Host to
push operands into device-attached memory and to
pull results out of device-attached memory.
• The bias-based coherency model defines two
states of bias for device-attached memory:
1. Host bias
2. Device bias
22. Continuation
• Host bias: when the device-attached memory is in Host Bias state, it appears to
the device just as regular host-attached memory does. That is, if the device needs
to access it, it sends a request to the Host, which resolves coherency for
the requested line.
• Device bias: when the device-attached memory is in Device Bias state, the device
is guaranteed that the Host does not hold the line in any cache. As such, the device
can access it without sending any transaction (requests, snoops, etc.) to the Host
whatsoever.
Note: the Host itself sees a uniform view of device-attached memory regardless of the bias state. In
both modes, coherency is preserved for device-attached memory.
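The two bias states boil down to a simple access-path decision for the device, which can be sketched as follows. This is illustrative only; in real hardware the decision is made by coherency engines, not software:

```python
from enum import Enum

class Bias(Enum):
    HOST = "host"      # Host may cache the line; device must ask the Host
    DEVICE = "device"  # Host guaranteed not to cache the line

def device_access_path(bias: Bias) -> str:
    """Path a device access to its own attached memory takes under each bias."""
    if bias is Bias.DEVICE:
        # The Host holds no cached copy, so the device reads/writes its memory
        # directly, sending no request or snoop to the Host.
        return "direct"
    # In Host bias, the Host must resolve coherency for the requested line first.
    return "via-host"

print(device_access_path(Bias.DEVICE))  # direct
```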
23. Type 3 Device
• A Type 3 CXL device supports the CXL.io and CXL.mem
protocols. An example of a Type 3 CXL device is a memory
expander for the Host, as shown in the figure below.
• Type 3 devices: memory devices can be attached via
CXL to provide additional bandwidth and capacity to
host processors. The type of memory is independent of
the host’s main memory.
• The device operates primarily over CXL.mem to service
requests sent from the Host. The CXL.io protocol is primarily
used for device discovery, enumeration, error reporting,
and management.
• The CXL.io protocol is permitted to be used by the device for
other I/O-specific application usages.
25. Flex Bus
• A Flex Bus port allows designs to choose between providing native
PCIe protocol or CXL over a high-bandwidth, off-package link; the
selection happens during link training via alternate protocol
negotiation.
Flex Bus link features:
• Native PCIe mode, with full feature support as defined in the PCIe
specification.
• Signaling rate of 32 GT/s, with degraded rates of 16 GT/s or 8 GT/s, in CXL
mode.
• Link width support for x16, x8, x4, x2 (degraded mode), and x1
(degraded mode) in CXL mode.
• Bifurcation (aka link subdivision) support down to x4 in CXL mode.
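The outcome of the mode selection can be sketched in a few lines. This is a heavy simplification of alternate protocol negotiation: real link training also negotiates rate, width, and other parameters:

```python
def negotiate_mode(host_supports_cxl: bool, device_supports_cxl: bool) -> str:
    """Flex Bus alternate protocol negotiation, reduced to its outcome:
    CXL transaction protocols are enabled only if both sides support CXL;
    otherwise the link trains as a plain PCIe link."""
    return "CXL" if host_supports_cxl and device_supports_cxl else "PCIe"

print(negotiate_mode(True, False))  # PCIe
print(negotiate_mode(True, True))   # CXL
```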