HKG15-106 (LMG): Replacing CMEM
Meeting TI's SoC shared buffer allocation, management, and address translation requirements.
Presented by: Gil Pitney + UMM team
Date: 9 Feb 2015
Event: Linaro Connect HKG15
Goals
● CMEM (http://processors.wiki.ti.com/index.php/CMEM_Overview) is a
contiguous memory buffer allocator/manager, designed to meet
multimedia, embedded, OpenCL compute, and other use cases on TI SoCs.
● Though open source, CMEM lives outside of Linux mainline.
● TI would like to explore meeting those SoC use case requirements using
existing mainline Linux mechanisms, or Linaro-UMM (cenalloc/dma-buf).
● Upstreaming CMEM may be an option if there are no alternatives, and if
there is community buy-in.
TI SoC Use Cases and Requirements
● Use Case: OpenCL compute, MultiMedia Apps
○ Requirements:
■ Contiguous memory buffer allocation (e.g., CMA)
■ Buffers shared between ARM+DSP (via rpmsg/virtio) or between ARM+HW
Accelerator/DMA
■ Huge buffer allocation (> 2GB)
■ Allocation from pools of fixed or variable-sized buffers.
■ User Space must be able to source/sink (write/read) buffers.
● Use Case: User Mode Ethernet Driver (Transport Layer)
○ Added Requirements:
■ Ability to allocate memory mapped as necessary to support Keystone II DMA-cache
coherence.
■ User space driver must be able to program DMA registers.
Current TI Solutions: CMEM
● CMEM: http://git.ti.com/ipc/ludev
● What it is:
○ Contiguous physical memory allocator, Kernel Module and User Space API.
○ Used on several TI ARM+Accelerator platforms (DaVinci, OMAP3+, Keystone II).
● How it works:
○ Memory Reservation: CMA or Linux memory carveout.
■ if CMA:
● Either the global CMA pool is assumed (dma_alloc_coherent())
● or a number of CMA pools are reserved (via a CMEM kernel stub calling
dma_declare_contiguous() from the platform’s MACHINE_START.reserve fxn.)
■ if carveout:
● Either via DT "reserved-memory" node or kernel cmd line: mem=<size>@0x<address>
○ Memory Allocation: From a Heap, or Pools of fixed size buffers.
■ The number of pools, buffer sizes, and type of memory (CMA or carveout) are defined in DT (or via
the CMEM insmod command line).
■ Internal Heap module implements alloc() and free().
○ Buffer Tracking:
■ Buffers are registered per fd for buffer ownership tracking and cleanup (i.e., no dma-buf support)
○ Mapping to User Space: CMEM’s mmap() calls remap_pfn_range(); mappings can be cached or non-cached (see the usage sketch below).
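For context, here is a minimal sketch of the typical CMEM user-space flow, assuming the ti/cmem.h API from the ludev tree (CMEM_init()/CMEM_alloc()/CMEM_getPhys()/CMEM_free()); the exact header path and return types vary between releases, so treat the details as illustrative.

/* Sketch only: typical CMEM user-space flow, assuming the ti/cmem.h API. */
#include <stdio.h>
#include <ti/cmem.h>

int main(void)
{
    CMEM_AllocParams params = CMEM_DEFAULTPARAMS;
    unsigned long long phys;
    void *buf;

    if (CMEM_init() < 0)                    /* opens /dev/cmem, reads pool config */
        return 1;

    params.type = CMEM_POOL;                /* allocate from a fixed-size pool... */
    params.flags = CMEM_CACHED;             /* ...with a cached user-space mapping */
    buf = CMEM_alloc(1024 * 1024, &params); /* returns a user-space virtual address */
    if (buf == NULL)
        return 1;

    phys = (unsigned long long)CMEM_getPhys(buf); /* physical address to hand to a DSP/DMA */
    printf("virt %p -> phys 0x%llx\n", buf, phys);

    CMEM_free(buf, &params);
    CMEM_exit();
    return 0;
}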
Current TI Solutions: hplib
● hplib: http://git.ti.com/keystone-rtos/hplib
● What it is:
○ "High Performance Library": Kernel Module and User Space API.
○ Cached/uncached physically contiguous memory block allocator and virtual memory
mapping/cache operations for user space Ethernet drivers.
● Client: User Space Ethernet Transport Layer/Driver on Keystone II, used by ODP app.
○ Programs Keystone II Queue Manager and CPPI DMA registers from user space.
○ High packet rates, so low-latency operation is required.
● How it works:
○ Allocates a 16MB CMA buffer, maps into user space.
○ Only one virtual/physical translation is maintained; the user-space transport carves up the
allocation to manage sub-buffers for DMA (see the sketch below).
○ Keystone II SDK adds a kernel patch to allow CMA memory to be cacheable.
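To make the single-translation scheme concrete (this is an illustration, not the hplib API; all names below are invented for the example), the sketch shows how user space can carve one large, once-translated contiguous region into DMA sub-buffers:

/* Illustrative only: not the hplib API. Shows how one large contiguous
 * mapping plus a single virt/phys translation lets user space compute the
 * physical address of any sub-buffer by offset. */
#include <stdint.h>
#include <stddef.h>

struct contig_region {
    void     *virt_base;   /* user-space mapping of the CMA block */
    uint64_t  phys_base;   /* physical base, obtained once from the kernel module */
    size_t    size;        /* e.g. 16 MB */
};

/* Translate a pointer inside the region to its physical address. */
static uint64_t region_virt_to_phys(const struct contig_region *r, const void *p)
{
    size_t off = (size_t)((const char *)p - (const char *)r->virt_base);
    return r->phys_base + off;
}

/* Carve the region into fixed-size packet buffers for DMA descriptors. */
static void *region_sub_buffer(const struct contig_region *r, size_t buf_size, unsigned int idx)
{
    return (char *)r->virt_base + (size_t)idx * buf_size;
}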
Potential Alternative Solutions (1/3)
● ION (Android): (a user-space allocation sketch follows this list)
○ Pros:
■ Provides separate heaps, CMA pools
■ Can “plug-in” new allocators
■ User space API.
■ From user space, can do cache operations (deprecated?).
■ dma-buf supported.
○ Cons:
■ From user space, no API to get phys address (?)
■ Though ION is in staging, not evident that ION is destined for
Linux mainline*
* http://www.slideshare.net/linaroorg/lca14-509-ionupstreamingstatusnextsteps
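For comparison, a minimal sketch of the legacy (staging-era) ION user-space allocation path, assuming the ion_allocation_data / ION_IOC_ALLOC / ION_IOC_SHARE interface of that period; the header location and exact field names varied across trees, so treat the details as assumptions:

/* Sketch only: legacy (staging) ION user-space interface, circa the 3.x era. */
#include <stddef.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/ion.h>                  /* staging uapi header; location varies by tree */

int alloc_ion_dmabuf(size_t len, unsigned int heap_id_mask)
{
    struct ion_allocation_data alloc = {
        .len = len,
        .align = 0,
        .heap_id_mask = heap_id_mask,   /* selects a heap, e.g. a CMA heap */
        .flags = ION_FLAG_CACHED,
    };
    struct ion_fd_data share;
    struct ion_handle_data hdl;
    int dmabuf_fd = -1;
    int ion_fd = open("/dev/ion", O_RDWR);

    if (ion_fd < 0)
        return -1;

    if (ioctl(ion_fd, ION_IOC_ALLOC, &alloc) == 0) {
        share.handle = alloc.handle;
        if (ioctl(ion_fd, ION_IOC_SHARE, &share) == 0)  /* export as a dma-buf fd */
            dmabuf_fd = share.fd;
        hdl.handle = alloc.handle;
        ioctl(ion_fd, ION_IOC_FREE, &hdl);              /* dma-buf fd keeps the buffer alive */
    }
    close(ion_fd);
    return dmabuf_fd;                   /* mmap()-able and shareable with devices */
}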
Potential Alternative Solutions (2/3)
● DRM/GEM/PRIME: (a dumb-buffer/PRIME export sketch follows this list)
○ Pros:
■ GEM provides some generic buffer allocation capability, and CMA
buffer allocation helpers.
■ dma-buf handles can be exported and shared, through PRIME.
○ Cons:
■ Mostly designed for graphics, so much of the capabilities are
geared towards managing GPU/display buffers.
■ CMEM use cases don’t typically need command execution
buffers, writecombining, domain tracking or other DRM/GEM
capabilities.
■ A new DRM driver needs to be created to export the GEM APIs.
■ No GEM concept of pools of fixed buffers.
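For illustration, a minimal sketch of exporting a dma-buf through PRIME using a KMS "dumb" buffer (DRM_IOCTL_MODE_CREATE_DUMB, then DRM_IOCTL_PRIME_HANDLE_TO_FD); the device path and header locations are assumptions, and, as noted above, a CMEM-style allocator would still need its own DRM driver to expose GEM allocations:

/* Sketch only: allocate a KMS "dumb" buffer and export it as a dma-buf via PRIME. */
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <drm/drm.h>                    /* uapi headers; often installed by libdrm */
#include <drm/drm_mode.h>

int export_dumb_buffer(const char *card, unsigned int width, unsigned int height)
{
    struct drm_mode_create_dumb create = {
        .width  = width,
        .height = height,
        .bpp    = 32,
    };
    struct drm_prime_handle prime = { 0 };
    int drm_fd = open(card, O_RDWR);    /* e.g. "/dev/dri/card0" */

    if (drm_fd < 0)
        return -1;
    if (ioctl(drm_fd, DRM_IOCTL_MODE_CREATE_DUMB, &create) < 0) {
        close(drm_fd);
        return -1;
    }

    prime.handle = create.handle;
    prime.flags  = DRM_CLOEXEC;
    if (ioctl(drm_fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &prime) < 0) {
        close(drm_fd);
        return -1;
    }
    close(drm_fd);                      /* the dma-buf fd keeps the GEM object alive */
    return prime.fd;                    /* shareable with other devices/drivers */
}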
Potential Alternative Solutions (3/3)
● cenalloc/dma-buf: (WIP: git.linaro.org/people/sumit.semwal/linux-3.x.git)
○ Pros:
■ Uses dma-buf (may become dma-buf “allocation helpers”).
■ Potentially allows unification of allocators under a central
framework device (/dev/cenalloc).
■ Allows for memory constraints sharing between devices.
■ User mode can get physical address if needed.
○ Cons:
■ Selection of a pool ID (like an ION heap ID) at allocation time is not
supported.
■ Some use cases don’t fit the delayed allocation model: e.g., when
user space needs to be the source in a pipeline, the actual allocation
needs to occur earlier than the importing driver’s
dma_buf_map_attachment() call.
How CMEM might fit under cenalloc (1/2)
#include <linux/cenalloc/cenalloc.h>

int main(int argc, char **argv)
{
    int cenalloc_fd;
    int buf_fd;
    int ret;
    void *payload;
    struct cenalloc_fd_data alloc_data;

    /* Assumption: At boot-time, the TI platform registers the CMEM allocator and mapping masks with cenalloc,
     * by calling cenalloc_device_add_allocator(&cmem_cma)
     */

    /* Allocate a dma-buf for a CMEM buffer */
    cenalloc_fd = open("/dev/cenalloc", O_RDWR);
    alloc_data.flags = CA_FLAG_CACHED | CA_CMA_POOL_ID;
    alloc_data.len = PAYLOADSIZE;
    alloc_data.align = 0;
    ret = ioctl(cenalloc_fd, CA_IOC_CREATE, &alloc_data); /* Exports a dma-buf, not yet backed by phys memory. */
    buf_fd = alloc_data.fd;

    /* Allocate the CMEM buffer, and get a virtual address to the CMEM buffer.
     * We need to prod the importer (rpmsg socket driver) to do the cenalloc-required
     * dma_buf_attach()/dma_buf_map_attachment() on the buf_fd, which will finally allocate the buffer.
     */
    ioctl(rpmsg_sock_fd, RPMSG_BUF_ALLOCATE, &rpmsg_arg_struct); /* Args contain buf_fd; rpmsg_sock_fd is set up elsewhere. */
    payload = mmap(...., buf_fd, ...);
How CMEM might fit under cenalloc (2/2)
.
.
.
    /* Send buffers from user space to the remote processor (e.g., DSP) via the rpmsg socket driver */
    for (i = 0; ; i++) {
        /* Read a block of data and fill the CMEM buffer */
        ret = fread(payload, 1, PAYLOADSIZE, stdin);
        if (ret <= 0)
            break;                               /* stop at EOF or read error */
        /* And send to the DSP (remote processor) */
        err = send(rpmsg_sock_fd, msg, PAYLOADSIZE, 0); /* msg is a struct that contains the dma-buf fd */
    }

    close(cenalloc_fd);
    return 0;
}
NOTE: Example usage would change if cenalloc allocators get merged under dma-buf.
Other Options?
● CMEM is sufficiently generic, but doesn't require the delayed allocation
and constraint-sharing features of cenalloc/dma-buf helpers.
● However, it may be useful to plug CMEM allocators under a centralized
dma-buf allocation framework.
● Any other Linux solutions to replace the CMEM capabilities?
