The current state of the art in displaying guest video is to copy pixel data from domU memory into a buffer in the device model domain, and then to render the display using something like X or VNC. The quantity of data copied is partially mitigated by dirty page tracking. However, when the VM is used to play video or perform other tasks that require frequent full-screen updates, copying is a significant drag on system performance and power consumption. By using the DRM subsystem in dom0 on systems with a unified memory architecture, it is possible to make arbitrary pages available for direct scanout by the graphics hardware. The in-kernel graphics drivers make this relatively straightforward and maintainable. This presentation explains how the current display path works, and how to use DRM to improve it.
XPDS13: Zero-copy display of guest framebuffers using GEM - John Baboval, Citrix
1. Zero-Copy Display of Guest Framebuffers using GEM
John Baboval
Citrix Systems, Inc.
October 24th, 2013
2. Agenda
• Xen video basics
• Existing Optimizations
• GEM
3. Overview of QEMU Graphics (The quick version)
[Diagram labels: VM, Video Device Driver, QEMU, HW (device specific), I/O port handlers, Linear Framebuffer, DisplaySurface, UI (device independent), Output Device (X11, TCP…)]
• dpy_update()
  – Called by HW when scanlines change
• gfx_hw_update()
  – Called on interval to process guest pixel data into DisplaySurface
• dpy_refresh()
  – Called on interval to update display
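The three callbacks on this slide can be sketched as a toy model. This is not QEMU's real API: the `toy_` types, the 4x4 framebuffer, and the `copies` counter are assumptions made for illustration, and only the callback names and their roles come from the slide. The sketch mainly shows that a changed scanline is copied twice before it reaches the output device.

```c
#include <stdint.h>
#include <string.h>

enum { TOY_WIDTH = 4, TOY_HEIGHT = 4 };

/* Toy stand-ins for the boxes on the slide; not QEMU's real types. */
struct toy_display {
    uint32_t guest_fb[TOY_HEIGHT][TOY_WIDTH]; /* linear framebuffer written by the guest */
    uint32_t surface[TOY_HEIGHT][TOY_WIDTH];  /* DisplaySurface in the device model */
    uint32_t output[TOY_HEIGHT][TOY_WIDTH];   /* what the output device shows */
    int dirty_first, dirty_last;              /* scanline range reported via dpy_update */
    unsigned copies;                          /* scanline copies performed so far */
};

static void toy_init(struct toy_display *d)
{
    memset(d, 0, sizeof(*d));
    d->dirty_first = TOY_HEIGHT;              /* empty dirty range */
    d->dirty_last = -1;
}

/* Device-specific code reports that scanlines [y, y + h) changed. */
static void toy_dpy_update(struct toy_display *d, int y, int h)
{
    if (y < d->dirty_first)
        d->dirty_first = y;
    if (y + h - 1 > d->dirty_last)
        d->dirty_last = y + h - 1;
}

/* Called on an interval: process guest pixel data into the DisplaySurface. */
static void toy_gfx_hw_update(struct toy_display *d)
{
    for (int y = 0; y < TOY_HEIGHT; y++) {
        if (memcmp(d->surface[y], d->guest_fb[y], sizeof(d->surface[y])) != 0) {
            memcpy(d->surface[y], d->guest_fb[y], sizeof(d->surface[y]));
            d->copies++;
            toy_dpy_update(d, y, 1);
        }
    }
}

/* Called on an interval: push the dirty scanlines to the output device. */
static void toy_dpy_refresh(struct toy_display *d)
{
    for (int y = d->dirty_first; y <= d->dirty_last; y++) {
        memcpy(d->output[y], d->surface[y], sizeof(d->output[y]));
        d->copies++;
    }
    d->dirty_first = TOY_HEIGHT;
    d->dirty_last = -1;
}
```

In this model a single changed pixel costs two scanline copies, one into the DisplaySurface and one to the output; the later slides are about removing those copies.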
4. Worst Case Scenario
• Emulated VRAM
[Diagram labels: domU, dom0, QEMU, Virtual VRAM, Display Surface, SDL Surface (window), Compositing Buffer, Back Buffer, libSDL, X.org, Linux/DRM, Xen]
5. Existing Optimizations
• Foreign Page Mapping (Shared Memory)
[Diagram labels: domU, dom0, QEMU, Virtual VRAM, Dirty Page Data, Display Surface, SDL Surface (window), Compositing Buffer, Back Buffer, libSDL, X.org, Linux/DRM, Xen]
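With the guest VRAM mapped into the device model, dirty page data lets the copy skip pages that have not changed. The following is a minimal sketch of that idea, assuming a bitmap with one bit per 4 KiB page; the function name, the bitmap layout, and `FB_PAGE_SIZE` are illustrative assumptions, not the interface Xen or QEMU actually exposes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FB_PAGE_SIZE 4096u   /* assumed page size */

/*
 * Copy only the pages whose bit is set in dirty_bitmap from the mapped
 * guest VRAM into the display surface. Bit i of the bitmap (LSB first
 * within each byte) describes page i. Returns the number of pages copied.
 */
static size_t copy_dirty_pages(uint8_t *surface, const uint8_t *vram,
                               const uint8_t *dirty_bitmap, size_t npages)
{
    size_t copied = 0;

    for (size_t i = 0; i < npages; i++) {
        if (dirty_bitmap[i / 8] & (1u << (i % 8))) {
            memcpy(surface + i * FB_PAGE_SIZE, vram + i * FB_PAGE_SIZE,
                   FB_PAGE_SIZE);
            copied++;
        }
    }
    return copied;
}
```

When every page is dirty on every frame, as with full-screen video, this degenerates to a full copy, which is the case the abstract calls out.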
6. Existing Optimizations
• Shared Buffer Mode + Foreign Page Mapping
[Diagram labels: domU, dom0, QEMU, Virtual VRAM, Display Surface, Dirty Page Data, SDL Surface (window), Compositing Buffer, Back Buffer, libSDL, X.org, Linux/DRM, Xen]
7. Graphics Hardware (The REALLY Short Version)
• Systems commonly have a unified memory architecture (UMA)
  – GPU is connected to the same memory bus as the CPU
  – Can scan directly out of system RAM
• UMA GPUs have their own virtual address space
  – Can access any domain’s memory if the GPU’s page table is appropriately programmed…
  – …and… well, it’s the REALLY short version
8. The Obvious Solution!
• Map the linear framebuffer into the GPU page table
• Program the CRTC base address so that the GPU scans out of the framebuffer
• Simple!
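The two steps above can be illustrated with a toy model of a UMA GPU. Nothing here corresponds to real hardware registers: the `toy_gpu` struct, the eight-entry page table, and the function names are all assumptions. The sketch only shows why no copy is needed once the framebuffer pages are in the GPU page table and the CRTC base points at them.

```c
#include <stddef.h>
#include <stdint.h>

#define GTT_PAGE_SIZE 4096u   /* assumed page size */
#define GTT_ENTRIES   8u      /* toy GPU page table size */

/* Toy UMA GPU: it reads system RAM through its own page table. */
struct toy_gpu {
    uint8_t (*ram)[GTT_PAGE_SIZE]; /* system RAM, shared with the CPU */
    uint32_t gtt[GTT_ENTRIES];     /* GPU page number -> RAM page index */
    uint32_t crtc_base;            /* GPU virtual address where scanout starts */
};

/* Step 1: map the framebuffer pages (RAM page indices) at first_slot. */
static int toy_map_framebuffer(struct toy_gpu *gpu, uint32_t first_slot,
                               const uint32_t *fb_pages, uint32_t npages)
{
    if (first_slot > GTT_ENTRIES || npages > GTT_ENTRIES - first_slot)
        return -1;
    for (uint32_t i = 0; i < npages; i++)
        gpu->gtt[first_slot + i] = fb_pages[i];
    return 0;
}

/* Step 2: point the CRTC at the GPU address of the first mapped page. */
static void toy_set_crtc_base(struct toy_gpu *gpu, uint32_t first_slot)
{
    gpu->crtc_base = first_slot * GTT_PAGE_SIZE;
}

/* The byte the display engine would fetch at a given scanout offset. */
static uint8_t toy_scanout_byte(const struct toy_gpu *gpu, uint32_t offset)
{
    uint32_t gpu_addr = gpu->crtc_base + offset;
    uint32_t slot = gpu_addr / GTT_PAGE_SIZE;

    if (slot >= GTT_ENTRIES)
        return 0;                  /* outside the toy page table */
    return gpu->ram[gpu->gtt[slot]][gpu_addr % GTT_PAGE_SIZE];
}
```

The framebuffer pages do not have to be contiguous in RAM; the page table makes them contiguous from the GPU's point of view, and a guest write is visible to scanout immediately.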
9. Unpleasant Reality
• GPUs are complicated
• GPU vendors are secretive
• GPUs change constantly
• You probably want to use your GPU for more than one thing at a time
10. GEM
• Linux already knows how to program your GPU
• The API is “standardized”
• …but we want to tell it WHICH pages to use.
11. GEM Objects
• Basically a bunch of GPU specific state tracking and a scatter-gather list of pages
• Add an interface that fills in the scatter-gather list with the right pages
• Pass the object to KMS
• … but which pages?
12. Foreign pages?
• Mapping the guest’s foreign pages provides virtual addresses to the memory
• No machine addresses
• Not Linux memory, so no page structs to add to the scatter-gather list
13. Grants?
• domU allocates a big grant table
• Creates a grantref for each page in the framebuffer
• Passes the table address and size to the device model
• Grant mappings get added to the m2p_override table, so we have page structs
• Requires a cooperative guest
• Uses a lot of grant refs
• Sets up mappings that never get used
14. Manual m2p_override
• Use the translate_gpfn_list hypercall to get a list of MFNs
• Use the m2p_override infrastructure from the grant code to redirect pages
• Use redirected pages in the scatter-gather list
• No guest knowledge
• No unnecessary mappings
15. Foreign-backed GEM Objects
• Add new ioctl to create i915_gem_foreign_object
• Standard GEM object, but override get_pages and put_pages handlers using existing hooks
• Create m2p_override mappings for framebuffer pages
• Fill in the scatter-gather list with the overridden pages
• Pass the object to KMS
• Success!
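The override pattern described above can be sketched in user-space C. This is not the i915 driver code: the `toy_` types, the plain array standing in for the scatter-gather list, and `toy_translate_gfn` (a made-up stand-in for the hypercall and m2p_override steps) are all assumptions. Only the idea of a standard object whose get_pages and put_pages handlers are replaced comes from the slide.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_gem_object;

/* Per-object hooks, loosely modelled on the get_pages/put_pages handlers. */
struct toy_gem_ops {
    int  (*get_pages)(struct toy_gem_object *obj);
    void (*put_pages)(struct toy_gem_object *obj);
};

struct toy_gem_object {
    const struct toy_gem_ops *ops;
    size_t npages;
    uint64_t *sg_list;   /* stand-in for the scatter-gather list: one frame per page */
};

/* Foreign-backed variant: remembers which guest frames back the object. */
struct toy_foreign_object {
    struct toy_gem_object base;   /* first member, so a base pointer can be cast back */
    uint16_t domid;
    uint64_t first_gfn;
};

/* Hypothetical GFN -> MFN translation; a real one would ask the hypervisor. */
static uint64_t toy_translate_gfn(uint16_t domid, uint64_t gfn)
{
    return ((uint64_t)domid << 32) | (gfn + 0x1000);
}

/* Fill the list with the guest's pages instead of freshly allocated ones. */
static int foreign_get_pages(struct toy_gem_object *obj)
{
    struct toy_foreign_object *f = (struct toy_foreign_object *)obj;

    obj->sg_list = malloc(obj->npages * sizeof(*obj->sg_list));
    if (!obj->sg_list)
        return -1;
    for (size_t i = 0; i < obj->npages; i++)
        obj->sg_list[i] = toy_translate_gfn(f->domid, f->first_gfn + i);
    return 0;
}

/* Drop the list; the guest still owns the pages, so nothing else is freed. */
static void foreign_put_pages(struct toy_gem_object *obj)
{
    free(obj->sg_list);
    obj->sg_list = NULL;
}

static const struct toy_gem_ops foreign_ops = {
    .get_pages = foreign_get_pages,
    .put_pages = foreign_put_pages,
};

static void toy_foreign_init(struct toy_foreign_object *f, uint16_t domid,
                             uint64_t first_gfn, size_t npages)
{
    f->base.ops = &foreign_ops;
    f->base.npages = npages;
    f->base.sg_list = NULL;
    f->domid = domid;
    f->first_gfn = first_gfn;
}
```

Code that consumes the object only calls through `ops`, so it does not need to know whether the pages belong to dom0 or to a guest.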
16. New DRM ioctl
• (intel)
• I915_GEM_FOREIGN
  – In:
    • GFN
    • Size
    • Domain ID
  – Out:
    • GEM handle
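An argument block carrying these four fields might look like the sketch below. The slide names the fields but not their types, order, or units, so the struct name, field names, widths, and the assumption that Size is in bytes are all hypothetical; the helper shows the kind of sanity check a handler could apply before building the object.

```c
#include <stdint.h>

#define IOCTL_PAGE_SIZE 4096u   /* assumed page size */

/* Hypothetical layout: only the meaning of the four fields comes from the slide. */
struct toy_i915_gem_foreign {
    uint64_t gfn;     /* in:  first guest frame number of the framebuffer */
    uint64_t size;    /* in:  framebuffer size, assumed to be in bytes */
    uint16_t domid;   /* in:  domain ID of the guest that owns the pages */
    uint16_t pad;     /* explicit padding, keeps the layout stable */
    uint32_t handle;  /* out: GEM handle for the new object */
};

/*
 * Number of pages the request covers, or 0 if the size is zero or not a
 * whole number of pages.
 */
static uint64_t toy_foreign_num_pages(const struct toy_i915_gem_foreign *args)
{
    if (args->size == 0 || args->size % IOCTL_PAGE_SIZE != 0)
        return 0;
    return args->size / IOCTL_PAGE_SIZE;
}
```

For example, a 1024x768 framebuffer at 32 bits per pixel occupies 3,145,728 bytes, which is 768 pages of 4 KiB.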
18. What to do With Your GEM Object
• Take over the scanout buffer of an entire vt
  – Lowest runtime overhead (none)
  – Requires source of input events (/dev/event/*)
• Convert it to a Prime object for use with DRI2/DRI3 X extensions
• Convert it to an EGL named image, and bind it to a texture
  – Hardware accelerated framebuffer tricks
    • Scaling
    • Rotation
    • Compositing
    • Effects
• Use it to display XenGT hardware accelerated framebuffers
  – Be sure to see Haitao Shan’s XenGT presentation tomorrow…