HKG15-106 (LMG): Replacing CMEM
Meeting TI's SoC shared buffer allocation, management, and address translation requirements.
Presented by: Gil Pitney + UMM team
Date: 9 Feb 2015
Event: Linaro Connect HKG15
Goals
● CMEM (http://processors.wiki.ti.com/index.php/CMEM_Overview) is a
contiguous memory buffer allocator/manager, designed to meet
multimedia, embedded, OpenCL compute, and other use cases on TI SoCs.
● Though open source, CMEM lives outside of Linux mainline.
● TI would like to explore meeting those SoC use case requirements using
existing mainline Linux mechanisms, or Linaro-UMM (cenalloc/dma-buf).
● Upstreaming CMEM may be an option if there are no alternatives, and if
there is community buy-in.
TI SoC Use Cases and Requirements
● Use Case: OpenCL compute, MultiMedia Apps
○ Requirements:
■ Contiguous memory buffer allocation (e.g., CMA)
■ Buffers shared between ARM+DSP (via rpmsg/virtio) or between ARM+HW
Accelerator/DMA
■ Huge buffer allocation (> 2GB)
■ Allocation from pools of fixed or variable-sized buffers.
■ User Space must be able to source/sink (write/read) buffers.
● Use Case: User Mode Ethernet Driver (Transport Layer)
○ Added Requirements:
■ Ability to allocate memory mapped as necessary to support Keystone II DMA-cache
coherence.
■ User space driver must be able to program DMA registers.
Current TI Solutions: CMEM
● CMEM: http://git.ti.com/ipc/ludev
● What it is:
○ Contiguous physical memory allocator, Kernel Module and User Space API.
○ Used on several TI ARM+Accelerator platforms (DaVinci, OMAP3+, Keystone II).
● How it works:
○ Memory Reservation: CMA or Linux memory carveout.
■ if CMA:
● Either the global CMA pool is assumed (dma_alloc_coherent())
● or a number of CMA pools are reserved (via a CMEM kernel stub calling
dma_declare_contiguous() from the platform’s MACHINE_START.reserve fxn.)
■ if carveout:
● Either via DT "reserved-memory" node or kernel cmd line: mem=<size>@0x<address>
○ Memory Allocation: From a Heap, or Pools of fixed size buffers.
■ The number of pools, buffer sizes, and type of memory (CMA or carveout) are defined in DT (or via
the CMEM insmod command line).
■ Internal Heap module implements alloc() and free().
○ Buffer Tracking:
■ Buffers are registered per fd for buffer ownership tracking and cleanup (i.e., no dma-buf support)
○ Mapping to User Space: CMEM’s mmap() calls remap_pfn_range(); mappings can be cached or non-cached (see the usage sketch below).
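For context, here is a minimal sketch of the typical CMEM user-space flow, assuming the ti/cmem.h API from the ludev tree (CMEM_init()/CMEM_alloc()/CMEM_getPhys()/CMEM_free()); the exact header path and return types vary between releases, so treat the details as illustrative.

/* Sketch only: typical CMEM user-space flow, assuming the ti/cmem.h API. */
#include <stdio.h>
#include <ti/cmem.h>

int main(void)
{
    CMEM_AllocParams params = CMEM_DEFAULTPARAMS;
    unsigned long long phys;
    void *buf;

    if (CMEM_init() < 0)                    /* opens /dev/cmem, reads pool config */
        return 1;

    params.type = CMEM_POOL;                /* allocate from a fixed-size pool... */
    params.flags = CMEM_CACHED;             /* ...with a cached user-space mapping */
    buf = CMEM_alloc(1024 * 1024, &params); /* returns a user-space virtual address */
    if (buf == NULL)
        return 1;

    phys = (unsigned long long)CMEM_getPhys(buf); /* physical address to hand to a DSP/DMA */
    printf("virt %p -> phys 0x%llx\n", buf, phys);

    CMEM_free(buf, &params);
    CMEM_exit();
    return 0;
}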
Current TI Solutions: hplib
● hplib: http://git.ti.com/keystone-rtos/hplib
● What it is:
○ "High Performance Library": Kernel Module and User Space API.
○ Cached/uncached physically contiguous memory block allocator and virtual memory
mapping/cache operations for user space Ethernet drivers.
● Client: User Space Ethernet Transport Layer/Driver on Keystone II, used by ODP app.
○ Programs Keystone II Queue Manager and CPPI DMA registers from user space.
○ High packet rates, so low-latency operation is required.
● How it works:
○ Allocates a 16MB CMA buffer, maps into user space.
○ Only one virtual/physical translation is maintained; the user-space transport carves up the
allocation to manage sub-buffers for DMA (see the sketch below).
○ Keystone II SDK adds a kernel patch to allow CMA memory to be cacheable.
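To make the single-translation scheme concrete (this is an illustration, not the hplib API; all names below are invented for the example), the sketch shows how user space can carve one large, once-translated contiguous region into DMA sub-buffers:

/* Illustrative only: not the hplib API. Shows how one large contiguous
 * mapping plus a single virt/phys translation lets user space compute the
 * physical address of any sub-buffer by offset. */
#include <stdint.h>
#include <stddef.h>

struct contig_region {
    void     *virt_base;   /* user-space mapping of the CMA block */
    uint64_t  phys_base;   /* physical base, obtained once from the kernel module */
    size_t    size;        /* e.g. 16 MB */
};

/* Translate a pointer inside the region to its physical address. */
static uint64_t region_virt_to_phys(const struct contig_region *r, const void *p)
{
    size_t off = (size_t)((const char *)p - (const char *)r->virt_base);
    return r->phys_base + off;
}

/* Carve the region into fixed-size packet buffers for DMA descriptors. */
static void *region_sub_buffer(const struct contig_region *r, size_t buf_size, unsigned int idx)
{
    return (char *)r->virt_base + (size_t)idx * buf_size;
}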
Potential Alternative Solutions (1/3)
● ION (Android): (a user-space allocation sketch follows this list)
○ Pros:
■ Provides separate heaps, CMA pools
■ Can “plug-in” new allocators
■ User space API.
■ From user space, can do cache operations (deprecated?).
■ dma-buf supported.
○ Cons:
■ From user space, no API to get phys address (?)
■ Though ION is in staging, not evident that ION is destined for
Linux mainline*
* http://www.slideshare.net/linaroorg/lca14-509-ionupstreamingstatusnextsteps
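For comparison, a minimal sketch of the legacy (staging-era) ION user-space allocation path, assuming the ion_allocation_data / ION_IOC_ALLOC / ION_IOC_SHARE interface of that period; the header location and exact field names varied across trees, so treat the details as assumptions:

/* Sketch only: legacy (staging) ION user-space interface, circa the 3.x era. */
#include <stddef.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/ion.h>                  /* staging uapi header; location varies by tree */

int alloc_ion_dmabuf(size_t len, unsigned int heap_id_mask)
{
    struct ion_allocation_data alloc = {
        .len = len,
        .align = 0,
        .heap_id_mask = heap_id_mask,   /* selects a heap, e.g. a CMA heap */
        .flags = ION_FLAG_CACHED,
    };
    struct ion_fd_data share;
    struct ion_handle_data hdl;
    int dmabuf_fd = -1;
    int ion_fd = open("/dev/ion", O_RDWR);

    if (ion_fd < 0)
        return -1;

    if (ioctl(ion_fd, ION_IOC_ALLOC, &alloc) == 0) {
        share.handle = alloc.handle;
        if (ioctl(ion_fd, ION_IOC_SHARE, &share) == 0)  /* export as a dma-buf fd */
            dmabuf_fd = share.fd;
        hdl.handle = alloc.handle;
        ioctl(ion_fd, ION_IOC_FREE, &hdl);              /* dma-buf fd keeps the buffer alive */
    }
    close(ion_fd);
    return dmabuf_fd;                   /* mmap()-able and shareable with devices */
}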
Potential Alternative Solutions (2/3)
● DRM/GEM/PRIME: (a dumb-buffer/PRIME export sketch follows this list)
○ Pros:
■ GEM provides some generic buffer allocation capability, and CMA
buffer allocation helpers.
■ dma-buf handles can be exported and shared, through PRIME.
○ Cons:
■ Mostly designed for graphics, so much of the capabilities are
geared towards managing GPU/display buffers.
■ CMEM use cases don’t typically need command execution
buffers, writecombining, domain tracking or other DRM/GEM
capabilities.
■ A new DRM driver needs to be created to export the GEM APIs.
■ No GEM concept of pools of fixed buffers.
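For illustration, a minimal sketch of exporting a dma-buf through PRIME using a KMS "dumb" buffer (DRM_IOCTL_MODE_CREATE_DUMB, then DRM_IOCTL_PRIME_HANDLE_TO_FD); the device path and header locations are assumptions, and, as noted above, a CMEM-style allocator would still need its own DRM driver to expose GEM allocations:

/* Sketch only: allocate a KMS "dumb" buffer and export it as a dma-buf via PRIME. */
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <drm/drm.h>                    /* uapi headers; often installed by libdrm */
#include <drm/drm_mode.h>

int export_dumb_buffer(const char *card, unsigned int width, unsigned int height)
{
    struct drm_mode_create_dumb create = {
        .width  = width,
        .height = height,
        .bpp    = 32,
    };
    struct drm_prime_handle prime = { 0 };
    int drm_fd = open(card, O_RDWR);    /* e.g. "/dev/dri/card0" */

    if (drm_fd < 0)
        return -1;
    if (ioctl(drm_fd, DRM_IOCTL_MODE_CREATE_DUMB, &create) < 0) {
        close(drm_fd);
        return -1;
    }

    prime.handle = create.handle;
    prime.flags  = DRM_CLOEXEC;
    if (ioctl(drm_fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &prime) < 0) {
        close(drm_fd);
        return -1;
    }
    close(drm_fd);                      /* the dma-buf fd keeps the GEM object alive */
    return prime.fd;                    /* shareable with other devices/drivers */
}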
Potential Alternative Solutions (3/3)
● cenalloc/dma-buf: (WIP: git.linaro.org/people/sumit.semwal/linux-3.x.git)
○ Pros:
■ Uses dma-buf (may become dma-buf “allocation helpers”).
■ Potentially allows unification of allocators under a central
framework device (/dev/cenalloc).
■ Allows for memory constraints sharing between devices.
■ User mode can get physical address if needed.
○ Cons:
■ Selection of a pool ID (like an ION heap ID) at allocation time is not
supported.
■ Some use cases don’t fit the delayed allocation model: e.g., when
user space needs to be the source in a pipeline, the actual allocation
needs to occur earlier than the importing driver’s
dma_buf_map_attachment() call.
How CMEM might fit under cenalloc (1/2)
#include <linux/cenalloc/cenalloc.h>

int main(int argc, char **argv)
{
    int cenalloc_fd;
    int buf_fd;
    int ret;
    void *payload;
    struct cenalloc_fd_data alloc_data;

    /* Assumption: At boot-time, the TI platform registers the CMEM allocator and mapping masks with cenalloc,
     * by calling cenalloc_device_add_allocator(&cmem_cma)
     */

    /* Allocate a dma-buf for a CMEM buffer */
    cenalloc_fd = open("/dev/cenalloc", O_RDWR);
    alloc_data.flags = CA_FLAG_CACHED | CA_CMA_POOL_ID;
    alloc_data.len = PAYLOADSIZE;
    alloc_data.align = 0;
    ret = ioctl(cenalloc_fd, CA_IOC_CREATE, &alloc_data); /* Exports a dma-buf, not yet backed by phys memory. */
    buf_fd = alloc_data.fd;

    /* Allocate the CMEM buffer, and get a virtual address to the CMEM buffer.
     * We need to prod the importer (rpmsg socket driver) to do the cenalloc-required
     * dma_buf_attach()/dma_buf_map_attachment() on the buf_fd, which will finally allocate the buffer.
     */
    ioctl(rpmsg_sock_fd, RPMSG_BUF_ALLOCATE, &rpmsg_arg_struct); /* Args contain buf_fd; rpmsg_sock_fd is set up elsewhere. */
    payload = mmap(...., buf_fd, ...);
How CMEM might fit under cenalloc (2/2)
.
.
.
    /* Send buffers from user space to the remote processor (e.g., DSP) via the rpmsg socket driver */
    for (i = 0; ; i++) {
        /* Read a block of data and fill the CMEM buffer */
        ret = fread(payload, 1, PAYLOADSIZE, stdin);
        if (ret <= 0)
            break;                               /* stop at EOF or read error */
        /* And send to the DSP (remote processor) */
        err = send(rpmsg_sock_fd, msg, PAYLOADSIZE, 0); /* msg is a struct that contains the dma-buf fd */
    }

    close(cenalloc_fd);
    return 0;
}
NOTE: Example usage would change if cenalloc allocators get merged under dma-buf.
Other Options?
● CMEM is sufficiently generic, but doesn't require the delayed allocation
and constraint-sharing features of cenalloc/dma-buf helpers.
● However, it may be useful to plug CMEM allocators under a centralized
dma-buf allocation framework.
● Any other Linux solutions to replace the CMEM capabilities?
