What is the issue ION solves?
Provides a way to allocate buffers so that they can be
shared between different hardware devices (via DMA)
to avoid copying
Different devices have different constraints
– Physically contiguous memory
– Smaller memory aperture (32bit device accessing LPAE/64bit memory)
– Different pagetable sizes
Provides a method to select a type of buffer that satisfies these constraints
While mostly used for graphics, ION is not graphics-specific
kmalloc for physically contiguous allocation
CMA allows the kernel to make space for
physically contiguous allocations
Carveout memory is physically contiguous
memory reserved at boot
Provides way for userland to allocate buffers
from various “pools of memory” (aka: heaps)
– SYSTEM: Virtually contiguous (vmalloc)
– SYSTEM_CONTIG: Small physically contiguous
– CARVEOUT: Large reserved physically contiguous
– CHUNK: Carveout + large page tables
– CUSTOM: Whatever hardware vendors want (ick)
– CMA: Sometime in the future?
ION Interface (cont)
Allows freeing, mapping and passing of
those buffers to other applications and drivers
– Buffers shared as file descriptors
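To make "buffers shared as file descriptors" concrete, here is a minimal sketch of how one process can hand an open fd to another. It uses the standard SCM_RIGHTS mechanism on a Unix-domain socket purely as a stand-in; Android itself normally passes fds over Binder, and nothing here is ION-specific. The helper names send_fd() and recv_fd() are made up for this illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control-message buffer sized and aligned for exactly one fd. */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send descriptor fd over the Unix-domain socket sock.
 * Returns 0 on success, -1 on error. (Hypothetical helper.) */
int send_fd(int sock, int fd)
{
    char byte = 'F';  /* one byte of ordinary data carries the control message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(sock >= 0 && fd >= 0);
    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;          /* payload is an array of fds */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from sock.
 * Returns the new fd, or -1 on error. (Hypothetical helper.) */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;  /* refers to the same open file as the sender's fd */
}
```

The receiver gets its own descriptor number for the same underlying object, which is what lets a buffer outlive the process that allocated it.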
Using our examples
CPU + GPU: SYSTEM
CPU + MMC: SYSTEM_CONTIG
CPU + CAMERA: CARVEOUT
CPU + GPU + CAMERA: CARVEOUT
CPU + GPU + MMC: SYSTEM_CONTIG
Note: ION does not help calculate what the proper
heap is for the given combination of hardware. It just
provides userland an interface to specify a heap that
userland knows satisfies the hardware constraints
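Since ION leaves the choice to userland, the table above has to live somewhere in userspace. A minimal sketch of one way to encode it: each device gets a mask of heaps it can use, the masks are intersected, and the least constrained heap that survives is picked. The per-device masks and the bit values are assumptions chosen only to reproduce the five examples on this slide; they are not ION's real heap IDs.

```c
#include <assert.h>

/* Heap names follow the slides; bit values are illustrative only.
 * Lower bit = less constrained (cheaper) heap. */
enum heap {
    HEAP_SYSTEM        = 1u << 0,  /* virtually contiguous */
    HEAP_SYSTEM_CONTIG = 1u << 1,  /* small physically contiguous */
    HEAP_CARVEOUT      = 1u << 2   /* large reserved physically contiguous */
};

/* Assumed capabilities: which heaps each device can consume. */
#define DEV_CPU    (HEAP_SYSTEM | HEAP_SYSTEM_CONTIG | HEAP_CARVEOUT)
#define DEV_GPU    (HEAP_SYSTEM | HEAP_SYSTEM_CONTIG | HEAP_CARVEOUT)
#define DEV_MMC    (HEAP_SYSTEM_CONTIG | HEAP_CARVEOUT)
#define DEV_CAMERA (HEAP_CARVEOUT)

/* Return the least constrained heap every device accepts,
 * or 0 if no single heap works for all of them. */
unsigned int pick_heap(const unsigned int *dev_masks, int ndevs)
{
    unsigned int usable = ~0u;
    int i;

    assert(ndevs >= 0);
    for (i = 0; i < ndevs; i++)
        usable &= dev_masks[i];

    /* Isolate the lowest set bit of the intersection. */
    return usable & (0u - usable);
}
```

For example, passing { DEV_CPU, DEV_GPU, DEV_MMC } yields HEAP_SYSTEM_CONTIG, matching the "CPU + GPU + MMC" line above.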
ION developer priorities
Android developers very focused on avoiding “jank”
– Frame drops, jerky animations
Want very deterministic behavior
– They worry about CMA since it may spend a variable
amount of time to move memory on a large allocation
– Delayed constraint-solving dma-buf allocation ideas are
similarly not considered viable (by Android devs)
Want to centralize as much logic as possible in the ION
core, so any optimization only has to be made once
– Avoid lots of per-driver tweaking
Isn't this what dma-buf does?
ION pre-dates dma-buf
dma-buf provides a subset of what ION does
dma-buf is more of an encapsulation structure for
buffers of different types
– Allows buffers to be passed between different drivers and applications
– Basically a marshaling structure
– Does not specify how the buffers are allocated
ION also has its own buffer encapsulation structure
– ION added support to export dmabufs (sort of)
Isn't this what CMA does?
Again: Sort of.
CMA allows for large physically contiguous memory allocations by
migrating memory to make room for the large allocation
– Avoids wasting memory with carveouts if they aren't in use.
– CMA has pluggable allocators and options that can allow for allocations that
satisfy the constraints needed.
– CMA is kernel-internal only for now, and doesn't have an interface to allow
userland to allocate buffers or specify constraint options
– Migrating pages to make room can cause non-deterministic delays. Android
developers want deterministic behavior.
Patches to support CMA via ION have been submitted by Benjamin
Gaignard (Android developers plan on accepting them).
What about TTM, GEM and PRIME?
You are now in the acronym pit of despair!
DRM, DRI, DRI2, EXA, UXA, GEM, TTM, UMA, GTT
What about TTM, GEM?
TTM: Graphics memory manager for discrete GPUs that have
their own video RAM.
– Considered complicated / poorly documented
– Provides fence synchronization facility
GEM: More minimal approach to TTM
– Developed by Intel, focused on their hardware
– Limited to UMA devices (ie: integrated graphics)
– No synchronization (fence) primitives
Those have to be implemented w/ driver-specific ioctls
– Allows for sharing of buffers between applications by named ids
GEM-ified TTM: TTM backend w/ GEM API
What about PRIME?
PRIME: GEM extended to use file
descriptors for passing object
references/buffers between drivers and applications
– Uses dmabuf for passing buffers around
– Required for “hybrid graphics” where there
are multiple GPUs (discrete and integrated)
Issues with ION
Doesn't build on architectures other than 32-bit ARM
Quite a bit of DMA api misuse
– Lots of ARM specific assumptions about DMA rules that aren't portable
Exports kernel pointers to userland (complicates compat_ioctl support)
Larger portability issue: applications have to
understand the hardware buffer constraints in order to
select the right heap to use
– On different hardware, different heaps may be available, as
well as different devices with different constraints
– Same userland wouldn't necessarily work on different hardware
CPUs and Devices both cache memory
– To keep coherency, we need to flush caches
before initiating DMA
– This requires a direction and a device
ION pre-syncs data before knowing which
device it's going to, and leaves the device value as
NULL. Works for their uses
– Broken for IOMMUs
What is our plan with ION?
Working w/ Android and ARM developers to address the 32bit ARM build limitation
Working with Arnd to try to sort out if we can address the
dma-api misuse, or decide if new dma-apis are needed
Try to come up with a way for the interface to expose less
hardware specific detail
– Query devices for an opaque heap-cookie they support, which
could be OR-ed with other cookies to determine which heap to use
for cross device buffers
All of this may break current interface compatibility :(
I suspect getting ION into staging is as good as it will get
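The heap-cookie idea above is only a proposal, so the following is a speculative sketch rather than any existing interface. It assumes each device reports a cookie of constraint bits, userland ORs the cookies together without interpreting them, and the allocator picks the first heap whose properties cover every bit. All names, bits and the heap table are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constraint bits a device might report in its cookie. */
#define NEEDS_PHYS_CONTIG (1u << 0)  /* no scatter-gather / IOMMU */
#define NEEDS_LOW_32BIT   (1u << 1)  /* can only address low memory */

enum heap_id { HEAP_SYSTEM, HEAP_SYSTEM_CONTIG, HEAP_CARVEOUT, HEAP_COUNT };

/* Assumed properties of each heap, ordered from least to most constrained. */
static const unsigned int heap_satisfies[HEAP_COUNT] = {
    [HEAP_SYSTEM]        = 0,
    [HEAP_SYSTEM_CONTIG] = NEEDS_PHYS_CONTIG,
    [HEAP_CARVEOUT]      = NEEDS_PHYS_CONTIG | NEEDS_LOW_32BIT
};

/* Given the OR of all device cookies, return the first heap that
 * satisfies every requested constraint, or -1 if none does. */
int heap_for_cookie(unsigned int cookie)
{
    size_t i;

    for (i = 0; i < HEAP_COUNT; i++) {
        if ((heap_satisfies[i] & cookie) == cookie)
            return (int)i;
    }
    return -1;
}
```

With this shape, userland would only do something like heap_for_cookie(gpu_cookie | camera_cookie) and never need to know what the individual bits mean, which is the portability gain the bullet is after.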
What is Sync?
Provides synchronization primitives that can be
shared across processes
Used mostly to synchronize both drivers and
applications drawing to the screen
Like a condition-wait variable, but can be backed by hardware
– Some GPUs support hardware mutexes
Provides lots of debugging data for sorting out
In staging directory as of 3.10
Timelines and fences
– Applications set fences at specific points on
timeline and wait
struct sw_sync_create_fence_data data;
data.value = fence_count;
ioctl(timeline_fd, SW_SYNC_IOC_CREATE_FENCE, &data);
ioctl(data.fence, SYNC_IOC_WAIT, &timeout);
– Controlling thread increments timeline, waking
any processes waiting.
ioctl(timeline_fd, SW_SYNC_IOC_INC, &count);
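The timeline/fence relationship in the ioctl calls above can be modelled in a few lines of plain C. This is a single-threaded toy model of the semantics only (a fence signals once its timeline's counter reaches the fence's value); it is not the sync driver's implementation, the struct and function names are invented, and counter wraparound is deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A timeline is a monotonically increasing counter. */
struct timeline {
    unsigned int value;
};

/* A fence marks one point on one timeline. */
struct fence {
    const struct timeline *tl;
    unsigned int value;
};

/* Model of SW_SYNC_IOC_CREATE_FENCE: place a fence at 'value'. */
struct fence fence_create(const struct timeline *tl, unsigned int value)
{
    struct fence f;

    assert(tl != NULL);
    f.tl = tl;
    f.value = value;
    return f;
}

/* Model of what SYNC_IOC_WAIT blocks on: a waiter may proceed
 * once the timeline has reached the fence's point. */
bool fence_signaled(const struct fence *f)
{
    return f->tl->value >= f->value;
}

/* Model of SW_SYNC_IOC_INC: advance the timeline by 'count',
 * which signals every fence at or below the new value. */
void timeline_inc(struct timeline *tl, unsigned int count)
{
    tl->value += count;
}
```

A fence never becomes unsignaled again, because the timeline only moves forward.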
What about Dmabuf-fences?
Developed by Maarten Lankhorst, Daniel Vetter and
Creates similar synchronization fences that are tied to
specific dma-buf buffers
Provides implicit synchronization
– Android's Sync is explicit synchronization, requiring
developers to add the logic
Limited to dma-buf buffers
– Android's Sync driver can be used in more varied contexts
Daniel Vetter's take:
“The fundamental difference between android syncpoints and the dma_buf
fences is that syncpoints use explicit userspace synchronization objects which
get passed around as fds. Whereas dma_buf fences are all implicitly attached to
the respective dma_bufs, so userspace can just pass around the buffer object fds
and the kernel ensures that magic happens and everything is synced up properly.
Imo the later approach has two big upsides:
- Implicit sync objects are a _much_ simpler programming model. Think
synchronous file i/o vs. aio. And if the kernel doesn't suck, there's not really a
performance disadvantage, at least for the shared buffer use-case. GL drivers
might still need explicit syncing for their gpu state objects for the last ounce of
performance, but that's not relevant.
- Having fences attached directly to dma_buf objects is the only way to make
dynamic buffers (i.e. eviction from garts/memory) possible. Currently every
graphics driver on android seems to just pin their buffers into main memory so
there's no need for that. And ion also only cares about pinned buffers. But I
expect that this will change.”
What about wait/wound-style mutexes?
Also developed by Maarten Lankhorst and Daniel Vetter
Developed to handle the case where buffers are shared
between devices. Since buffers may not be ordered in the same
way on all devices, there may be the possibility of ABBA deadlocks.
Wait/wound style mutexes provide a global ticket (or context)
which orders acquisitions. If a deadlock occurs, the oldest ticket
holder waits for the mutex, while the younger holders have to
“back off” and drop the locks they hold.
Kernel driver interface only, not something userspace can use.
I suspect this to be a base for dmabuf-fences
Queued to be merged for 3.11
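The ticket rule described above boils down to one comparison. The sketch below shows only that decision, under the assumption that a lower ticket number means an older context; it is not the kernel's ww_mutex API, and the enum and function names are made up.

```c
#include <assert.h>

enum ww_action {
    WW_WAIT,     /* older context: sleep until the lock is released */
    WW_BACK_OFF  /* younger context: drop all held locks and retry */
};

/* Decide what a context should do when the lock it wants is already
 * held by another context. Tickets are handed out from a global
 * counter, so they are unique and a lower ticket is older. */
enum ww_action ww_on_contention(unsigned long requester_ticket,
                                unsigned long holder_ticket)
{
    assert(requester_ticket != holder_ticket);
    return requester_ticket < holder_ticket ? WW_WAIT : WW_BACK_OFF;
}
```

In an ABBA situation (context 1 holds A and wants B, context 2 holds B and wants A) the two calls return different answers, so exactly one side backs off and the cycle is broken.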
What is our plan with Sync?
Try to stir discussion between community
and Android developers on explicit vs
implicit synchronization issues
Follow along to see if any part of the
implementations can be shared
What is KMS?
Kernel Mode Setting
Makes the kernel responsible for graphics
mode (resolution, refresh, orientation)
– Avoids races with userland and hardware
– Can switch modes on an oops to display the message
What is HWComposer?
Per-platform userspace code that manages
Part of the HAL layer
Currently using fb
Would be nice to convert HWComposer to KMS
What is our plan with KMS?
Android devs likely already working on KMS enabled HAL
– Likely to be optimized specifically for next hardware release
– Not likely to be generic KMS HAL
Areas that may need work:
– Sync and vsync notifications with KMS
Hopefully this resolves the pageflipping framebuffer issue?
– Gralloc allocates 2x y_res
– Most fb drivers don't support this
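The "2x y_res" point can be illustrated with a small sketch of framebuffer page flipping by panning: the virtual height holds two screens and a flip just moves the visible window. The struct below is a hypothetical stand-in whose field names mirror yres, yres_virtual and yoffset in the kernel's struct fb_var_screeninfo; a real implementation would hand the new offset to the driver with a pan ioctl, which is exactly the step that fails when the driver does not support a double-height virtual screen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant fb_var_screeninfo fields. */
struct fb_geometry {
    unsigned int yres;          /* visible height in lines */
    unsigned int yres_virtual;  /* total allocated height in lines */
    unsigned int yoffset;       /* first visible line */
};

/* Make buffer 'index' (0 = front, 1 = back, ...) the visible one.
 * Returns 0 on success, or -1 if the virtual screen is too small
 * to hold that buffer, i.e. the driver gave us less than asked for. */
int flip_to(struct fb_geometry *fb, unsigned int index)
{
    assert(fb != NULL);
    if (fb->yres == 0 || (index + 1) * fb->yres > fb->yres_virtual)
        return -1;

    fb->yoffset = index * fb->yres;
    return 0;
}
```

With yres_virtual equal to 2 * yres, flipping alternates yoffset between 0 and yres; with a driver that only provides yres_virtual equal to yres, the second buffer does not exist and flip_to() fails.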