QEMU and Xen:
Reducing the Attack
Surface
Paul Durrant,
Senior Principal Software Engineer,
Citrix Systems R&D
1
Acknowledgements:
● Ian Jackson
● George Dunlap
● Anthony Perard
2
Background:
How do we use QEMU
with Xen?
3
Paravirtual Backend
Service Domain Guest Domain
Xen
4
I/O Drivers
Frontend Frontend
QEMU
Backend Backend
Kernel
User
XenStore
Paravirtual Backend
Service Domain Guest Domain
Xen
5
Frontend Frontend
QEMU
Backend Backend
Kernel
User
Ring and data shared
directly using grants
XenStore
I/O Drivers
Paravirtual Backend
Service Domain Guest Domain
Xen
6
Frontend Frontend
QEMU
Backend Backend
Kernel
User
XenStore
Service domain trusted by guest
but doesn’t need mapping privilege
I/O Drivers
Paravirtual Backend
Service Domain Guest Domain
Xen
7
Frontend Frontend
QEMU
Backend Backend
Kernel
User
XenStore
I/O Drivers
Backends do use
hypercalls
I/O Emulation
Emulator Domain Guest Domain
Xen
8
Driver Driver
QEMU
Device
Model
Device
Model
Kernel
User
IO / MMIO
I/O Drivers
IOREQ
Server
I/O Emulation
Emulator Domain Guest Domain
Xen
9
Driver Driver
QEMU
Device
Model
Device
Model
Kernel
User
IO / MMIO / PCI
I/O trapped
by Xen and
forwarded
to QEMU
I/O Drivers
IOREQ
Server
I/O Emulation
Emulator Domain Guest Domain
Xen
10
Driver Driver
QEMU
Device
Model
Device
Model
Kernel
User
Emulator domain has mapping
privilege to access data
I/O Drivers
IOREQ
Server
I/O Emulation
Emulator Domain Guest Domain
Xen
11
Driver Driver
QEMU
Device
Model
Device
Model
Kernel
User
IOREQ
Server
I/O Drivers
Emulator uses hypercalls and
has shared memory interface
with Xen
Attack Vector:
IOREQ Pages
12
IOREQ Pages
Emulator Domain Guest Domain
Xen
13
QEMU
Kernel
User
IOREQ
Server
Memory
SYNC
BUFFERED
E820
Reserved
Control pages
allocated from
guest memory
IOREQ Server
creation
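The speaker notes for this slide mention that older QEMU binaries (pre-dating the IOREQ Server API) discover these control pages by reading back a pair of 'magic' HVM parameters. A minimal sketch of that legacy flow, assuming the libxenctrl accessor and parameter names below (neither appears in the deck, so treat them as illustrative):

#include <xenctrl.h>
#include <xen/hvm/params.h>

/* Sketch: how a pre-IOREQ-Server-API QEMU learns where the control pages
 * live. The GFNs come back from guest memory space, which is exactly why
 * the guest can also reach them. */
static int get_ioreq_gfns(xc_interface *xch, domid_t domid,
                          uint64_t *sync_gfn, uint64_t *buf_gfn)
{
    int rc = xc_hvm_param_get(xch, domid, HVM_PARAM_IOREQ_PFN, sync_gfn);

    if (!rc)
        rc = xc_hvm_param_get(xch, domid, HVM_PARAM_BUFIOREQ_PFN, buf_gfn);

    /* The GFNs are then mapped with an ordinary foreign-memory mapping
     * operation, as shown on the next slide. */
    return rc;
}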
IOREQ Pages
Emulator Domain Guest Domain
Xen
14
QEMU
Kernel
User
IOREQ
Server
Memory
SYNC
BUFFERED
E820
Reserved
Then they are
mapped by
QEMU and Xen
Map foreign
memory hypercall
IOREQ Pages
Emulator Domain Guest Domain
Xen
15
QEMU
Kernel
User
IOREQ
Server
Memory
SYNC
BUFFERED
E820
Reserved
Requests and responses are
passed between Xen and QEMU
using the shared pages
IOREQ Pages
Emulator Domain Guest Domain
Xen
16
QEMU
Kernel
User
IOREQ
Server
Memory
E820
Reserved
But the pages could still be
manipulated directly by the guest
SYNC
BUFFERED
Mitigation:
IOREQ Server Enable
17
IOREQ Server Enable
Emulator Domain Guest Domain
Xen
18
QEMU
Kernel
User
IOREQ
Server
Memory
E820
Reserved
The protocol
is not
immediately
operational
SYNC
BUFFERED
IOREQ Server Enable
Emulator Domain Guest Domain
Xen
19
QEMU
Kernel
User
IOREQ
Server
Memory
E820
Reserved
Then they are
mapped by
QEMU
Enable IOREQ
Server hypercall
MFNs are removed from
the guest P2M...
IOREQ Server Enable
Emulator Domain Guest Domain
Xen
20
QEMU
Kernel
User
IOREQ
Server
Memory
E820
Reserved
Then they are
mapped by
QEMU
…better if they were never
there at all
Better Mitigation:
XENMEM_acquire_resource
21
IOREQ Page Mapping
Emulator Domain Guest Domain
Xen
22
QEMU
Kernel
User
IOREQ
Server
Control pages
allocated from
‘Xen’ memory
Create IOREQ
Server hypercall
SYNC
BUFFERED
IOREQ Page Mapping
Emulator Domain Guest Domain
Xen
23
QEMU
Kernel
User
IOREQ
Server
XENMEM_acquire_resource
New hypercall
SYNC
BUFFERED
/*
 * Get the pages for a particular guest resource, so that they can be
 * mapped directly by a tools domain.
 */
#define XENMEM_acquire_resource 28
struct xen_mem_acquire_resource {
    domid_t domid;
    uint16_t type;
    uint32_t id;
    uint32_t nr_frames;
    uint32_t flags;
    uint64_aligned_t frame;
    XEN_GUEST_HANDLE(xen_pfn_t) frame_list;
};
24
XENMEM_acquire_resource
/*
 * Get the pages for a particular guest resource, so that they can be
 * mapped directly by a tools domain.
 */
#define XENMEM_acquire_resource 28
struct xen_mem_acquire_resource {
    domid_t domid;
    uint16_t type;
    uint32_t id;
    uint32_t nr_frames;
    uint32_t flags;
    uint64_aligned_t frame;
    XEN_GUEST_HANDLE(xen_pfn_t) frame_list;
};
25
XENMEM_acquire_resource
XENMEM_resource_ioreq_server
Resource type for all
IOREQ Server control
pages
/*
 * Get the pages for a particular guest resource, so that they can be
 * mapped directly by a tools domain.
 */
#define XENMEM_acquire_resource 28
struct xen_mem_acquire_resource {
    domid_t domid;
    uint16_t type;
    uint32_t id;
    uint32_t nr_frames;
    uint32_t flags;
    uint64_aligned_t frame;
    XEN_GUEST_HANDLE(xen_pfn_t) frame_list;
};
26
XENMEM_acquire_resource
Frame identifiers
distinguish between
SYNC and BUFFERED
pages
XENMEM_resource_ioreq_server_frame_bufioreq
XENMEM_resource_ioreq_server_frame_ioreq(n)
/*
 * Get the pages for a particular guest resource, so that they can be
 * mapped directly by a tools domain.
 */
#define XENMEM_acquire_resource 28
struct xen_mem_acquire_resource {
    domid_t domid;
    uint16_t type;
    uint32_t id;
    uint32_t nr_frames;
    uint32_t flags;
    uint64_aligned_t frame;
    XEN_GUEST_HANDLE(xen_pfn_t) frame_list;
};
27
XENMEM_acquire_resource
If the tools domain is PV then, upon return, frame_list
will be populated with the MFNs of the resource.
If the tools domain is HVM then it is expected that, on
entry, frame_list will be populated with a list of GFNs
that will be mapped to the MFNs of the resource.
Emulator could be
running in either PV or
HVM domain
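Putting the constants from the last few slides together, a hedged sketch of how an emulator might fill in the hypercall argument to acquire the BUFFERED page plus the first SYNC page. The field usage is my reading of the structure above, and do_memory_op() stands in for whatever wrapper actually issues the memory_op hypercall:

#include <xen/xen.h>
#include <xen/memory.h>

static int acquire_ioreq_pages(domid_t guest_domid, uint32_t ioservid,
                               xen_pfn_t frames[2])
{
    struct xen_mem_acquire_resource xmar = {
        .domid     = guest_domid,
        .type      = XENMEM_resource_ioreq_server,
        .id        = ioservid,
        .frame     = XENMEM_resource_ioreq_server_frame_bufioreq,
        .nr_frames = 2,   /* the buffered frame plus the first sync frame */
        .flags     = 0,
    };

    /* PV tools domain: frames[] comes back filled with MFNs.
     * HVM tools domain: frames[] must be pre-loaded with GFNs for Xen to
     * populate in the caller's P2M. */
    set_xen_guest_handle(xmar.frame_list, frames);

    return do_memory_op(XENMEM_acquire_resource, &xmar);   /* hypothetical wrapper */
}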
struct privcmd_mmap_resource {
    domid_t dom;
    __u32 type;
    __u32 id;
    __u32 idx;
    __u64 num;
    __u64 addr;
} privcmd_mmap_resource_t;
28
IOCTL_PRIVCMD_MMAP_RESOURCE
commit 9a80bfbdd23242168a508b950ffdc80f675ce695
Author: Paul Durrant <paul.durrant@citrix.com>
Date: Fri Jul 28 11:22:49 2017 +0100
xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE
Currently queued for
upstream
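A rough userspace sketch of the new ioctl, based only on the structure above. The interpretation of idx, num and addr (starting frame, frame count, and a destination address obtained by mmap()ing the privcmd device beforehand) is an assumption on my part, as is the header location:

#include <stdint.h>
#include <sys/ioctl.h>
#include <xen/memory.h>          /* XENMEM_resource_ioreq_server */
#include <xen/sys/privcmd.h>     /* assumed location of the ioctl definitions */

/* Sketch: map nr frames of an IOREQ server's control pages at addr. */
static int map_ioreq_frames(int privcmd_fd, domid_t guest_domid,
                            uint32_t ioservid, unsigned int nr, void *addr)
{
    struct privcmd_mmap_resource res = {
        .dom  = guest_domid,
        .type = XENMEM_resource_ioreq_server,
        .id   = ioservid,
        .idx  = 0,                  /* first frame */
        .num  = nr,                 /* number of frames */
        .addr = (uintptr_t)addr,    /* from a prior mmap() of /dev/xen/privcmd */
    };

    /* A failure here typically means an older kernel: fall back to the
     * legacy GFN-based mapping path. */
    return ioctl(privcmd_fd, IOCTL_PRIVCMD_MMAP_RESOURCE, &res);
}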
Attack Vector:
Hypercall Memory
Handles
29
Hypercall Memory Handles
Emulator Domain Guest Domain
Xen
30
QEMU
Kernel
User
privcmd
Hypercall Memory Handles
Emulator Domain Guest Domain
Xen
31
Kernel
User
privcmd
Guest attacks and
compromises
QEMU
QEMU
Hypercall Memory Handles
Emulator Domain Guest Domain
Xen
32
QEMU
Kernel
User
privcmd
IOCTL_PRIVCMD_HYPERCALL
QEMU
Hypercall Memory Handles
Emulator Domain Guest Domain
Xen
33
Kernel
User
privcmd HYPERVISOR_???
34
HVMOP_track_dirty_vram
/* Track dirty VRAM. */
#define HVMOP_track_dirty_vram 6
struct xen_hvm_track_dirty_vram {
    /* Domain to be tracked. */
    domid_t domid;
    /* Number of pages to track. */
    uint32_t nr;
    /* First pfn to track. */
    uint64_aligned_t first_pfn;
    /* OUT variable. */
    /* Dirty bitmap buffer. */
    XEN_GUEST_HANDLE_64(uint8) dirty_bitmap;
};
Emulator Domain Memory
QEMU controlled
value
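To make the attack concrete, here is a hedged sketch of what a compromised QEMU could push through the old, unaudited ioctl path. struct privcmd_hypercall is paraphrased from the Linux privcmd header, the header locations and the hypercall-number constant should be checked against the real headers; the point is simply that nothing below is inspected before it reaches Xen:

#include <stdint.h>
#include <sys/ioctl.h>
#include <xen/xen.h>
#include <xen/hvm/hvm_op.h>
#include <xen/sys/privcmd.h>     /* assumed header locations */

static void attack_kernel_memory(int privcmd_fd, domid_t guest_domid,
                                 uint64_t vram_first_pfn, uint32_t vram_nr,
                                 void *kernel_address)
{
    struct xen_hvm_track_dirty_vram op = {
        .domid     = guest_domid,
        .nr        = vram_nr,
        .first_pfn = vram_first_pfn,
    };
    struct privcmd_hypercall call = {
        .op     = __HYPERVISOR_hvm_op,
        .arg[0] = HVMOP_track_dirty_vram,   /* 6, as on the slide above */
    };

    /* dirty_bitmap is entirely QEMU-controlled: point it at a kernel
     * virtual address instead of a QEMU buffer. */
    set_xen_guest_handle(op.dirty_bitmap, (uint8_t *)kernel_address);
    call.arg[1] = (uintptr_t)&op;

    /* privcmd marshals this blindly; Xen then writes the (guest-controlled)
     * bitmap pattern straight to kernel_address in the emulator domain. */
    ioctl(privcmd_fd, IOCTL_PRIVCMD_HYPERCALL, &call);
}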
Hypercall Memory Handles
Emulator Domain Guest Domain
Xen
35
Kernel
User
privcmd
Xen writes to
emulator domain
kernel memory
HVMOP_track_dirty_vram
QEMU
Mitigation:
HYPERCALL_dm_op
36
privcmd
Hypercall Memory
Emulator Domain Guest Domain
Xen
37
QEMU
Kernel
User
libxendevicemodel
privcmd
Hypercall Memory
Emulator Domain Guest Domain
Xen
38
QEMU
Kernel
User
libxendevicemodel
IOCTL_PRIVCMD_DM_OP
HYPERVISOR_dm_op
privcmd
Hypercall Memory
Emulator Domain Guest Domain
Xen
39
QEMU
Kernel
User
libxendevicemodel
IOCTL_PRIVCMD_DM_OP
HYPERVISOR_dm_op
This can be
audited
40
IOCTL_PRIVCMD_DM_OP
struct privcmd_dm_op {
    domid_t dom;
    __u16 num;
    const privcmd_dm_op_buf_t __user *ubufs;
};

struct privcmd_dm_op_buf {
    void __user *uptr;
    size_t size;
};
access_ok()?
Emulator Domain Memory
41
HYPERVISOR_dm_op
HYPERVISOR_dm_op(domid_t domid,
                 unsigned int nr_bufs,
                 xen_dm_op_buf_t bufs[]);

struct xen_dm_op_buf {
    XEN_GUEST_HANDLE(void) h;
    xen_ulong_t size;
};
Emulator Domain Memory
Operation information
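The auditing hinted at by access_ok()? on the previous slide can be sketched as follows. This paraphrases what the privcmd dm_op path does rather than quoting it, and note that the access_ok() calling convention differs between kernel versions:

/* Kernel-side sketch: every buffer handed to dm_op must lie within the
 * calling process's user address space. */
static int audit_dm_op_bufs(const struct privcmd_dm_op_buf __user *ubufs,
                            unsigned int num,
                            struct privcmd_dm_op_buf *kbufs)
{
    unsigned int i;

    if (copy_from_user(kbufs, ubufs, num * sizeof(*kbufs)))
        return -EFAULT;

    for (i = 0; i < num; i++) {
        /* Older kernels spell this access_ok(VERIFY_WRITE, uptr, size). */
        if (!access_ok(kbufs[i].uptr, kbufs[i].size))
            return -EFAULT;
    }

    /* Only now are the buffers repackaged as xen_dm_op_buf entries and
     * handed on via HYPERVISOR_dm_op(). */
    return 0;
}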
Attack Vector:
Hypercall Target
Domain
42
Hypercall Target Domain
A
Emulator Domain Guest Domain
Xen
43
QEMU
Kernel
User
privcmd
libxenforeignmemory B
Guest Domain
IOCTL_PRIVCMD_MMAP
domid = B
Hypercall Target Domain
A
Emulator Domain Guest Domain
Xen
44
QEMU
Kernel
User
privcmd
libxenforeignmemory B
Guest Domain
Target
domain
unaudited
Hypercall Target Domain
A
Emulator Domain Guest Domain
Xen
45
QEMU
Kernel
User
privcmd
libxenforeignmemory B
Guest Domain
Memory mapped
from ‘wrong’
guest
Mitigation:
IOCTL_PRIVCMD_RESTRICT
46
Hypercall Target Domain
A
Emulator Domain Guest Domain
Xen
47
QEMU
Kernel
User
privcmd
libxenforeignmemory B
Guest Domain
IOCTL_PRIVCMD_RESTRICT
domid = A
Handle now restricted to operations on domain A
Hypercall Target Domain
A
Emulator Domain Guest Domain
Xen
48
QEMU
Kernel
User
privcmd
libxenforeignmemory B
Guest Domain
domid = A
IOCTL_PRIVCMD_MMAP
domid = B
Hypercall not
issued
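A hedged sketch of the mechanism described in the speaker notes: the restrict ioctl records a domid against the open file handle, and every later operation that names a domid is checked against it. The structure and function names here are illustrative, not lifted from the driver:

/* Illustrative per-open-file state in privcmd. */
struct privcmd_file_state {
    domid_t restrict_domid;              /* DOMID_INVALID until restricted */
};

static long privcmd_do_restrict(struct privcmd_file_state *st, domid_t domid)
{
    /* One-way: once restricted, a handle can never be widened again. */
    if (st->restrict_domid != DOMID_INVALID && st->restrict_domid != domid)
        return -EACCES;
    st->restrict_domid = domid;
    return 0;
}

static long privcmd_check_target(const struct privcmd_file_state *st,
                                 domid_t domid)
{
    /* mmap and dm_op expose their target domid, so they can be audited
     * here; a restricted handle refuses raw IOCTL_PRIVCMD_HYPERCALL
     * outright because its target cannot be seen at all. */
    if (st->restrict_domid != DOMID_INVALID && st->restrict_domid != domid)
        return -EACCES;
    return 0;
}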
Multiple Handles
Emulator Domain
Xen
49
QEMU
Kernel
User
privcmd
libxenforeignmemory libxendevicemodel libxenstore libxenevtchn libxengnttab
privcmd xenbus gntdev evtchn
QEMU has lots of handles
open to many different
drivers
50
libxenforeignmemory libxendevicemodel libxenstore libxenevtchn libxengnttab
One Library to Rule Them All
QEMU
libxentoolcore
New library
51
libxenforeignmemory libxendevicemodel libxenstore libxenevtchn libxengnttab
Handle Registration
QEMU
libxentoolcore
xentoolcore__register_active_handle()
Other libraries register a restriction
callback for each open handle
52
libxenforeignmemory libxendevicemodel libxenstore libxenevtchn libxengnttab
Handle Restriction
QEMU
libxentoolcore
xentoolcore_restrict_all()
active_handle->restrict_callback()
IOCTL_PRIVCMD_RESTRICT
Restriction ‘aware’
implementation
QEMU makes single call to restrict
all handles
53
libxenforeignmemory libxendevicemodel libxenstore libxenevtchn libxengnttab
Handle Restriction
QEMU
libxentoolcore
xentoolcore_restrict_all()
xentoolcore__restrict_by_dup2_null()
Restriction ‘unaware’
implementation
QEMU still runs as
root
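The shape of the libxentoolcore machinery on the last three slides, sketched with illustrative names (the real declarations live in the Xen tools headers; only the registered-callback idea is taken from the deck):

struct active_handle {
    struct active_handle *next;
    int (*restrict_callback)(struct active_handle *ah, domid_t domid);
};

static struct active_handle *active_handles;

/* Each library registers every handle it opens, plus a callback. */
static void register_active_handle(struct active_handle *ah)
{
    ah->next = active_handles;
    active_handles = ah;
}

/* QEMU makes one call, before the guest is unpaused. */
static int restrict_all(domid_t domid)
{
    struct active_handle *ah;
    int rc = 0;

    for (ah = active_handles; ah; ah = ah->next) {
        /* Restriction-aware libraries issue e.g. IOCTL_PRIVCMD_RESTRICT;
         * unaware ones fall back to dup2()ing the fd onto /dev/null. */
        int r = ah->restrict_callback(ah, domid);

        if (r && !rc)
            rc = r;
    }

    return rc;
}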
54
Let’s not do that...
55
56
[pauldu@brixham:~] /usr/local/lib/xen/bin/qemu-system-i386 --help
.
.
.
-runas user change to user id user just before starting the VM
.
.
.
QEMU Command Line
57
[pauldu@brixham:~] /usr/local/lib/xen/bin/qemu-system-i386 --help
.
.
.
-runas user change to user id user just before starting the VM
.
.
.
Not actually a UID
but a user name
QEMU Command Line
58
Shared UID Problem
QEMU
Shared UID
QEMU QEMU
59
Shared UID Problem
QEMU
Shared UID
QEMU QEMU
Compromised
process
60
Shared UID Problem
QEMU
Shared UID
QEMU QEMU
ptrace(2)
Compromised
process
61
UID per VM
domid space: < 16 bits
uid space: 32 bits
Space is much larger so
reserve a region
System
reserved
region
UID base
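The reservation scheme boils down to a fixed offset; a trivial sketch (the base is simply whatever UID range the administrator sets aside):

#include <assert.h>
#include <sys/types.h>
#include <xen/xen.h>    /* domid_t, DOMID_FIRST_RESERVED */

/* One UID per VM: domid space is < 16 bits, so a contiguous block of
 * DOMID_FIRST_RESERVED UIDs starting at 'base' is enough. */
static uid_t qemu_uid_for_domain(uid_t base, domid_t domid)
{
    assert(domid < DOMID_FIRST_RESERVED);
    return base + domid;
}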
62
[pauldu@brixham:~] /usr/local/lib/xen/bin/qemu-system-i386 --help
.
.
.
-runas user change to user id user just before starting the VM
.
.
.
This is going to be
awkward
QEMU Command Line
63
[pauldu@brixham:~] /usr/local/lib/xen/bin/qemu-system-i386 --help
.
.
.
-runas user change to user id user just before starting the VM
.
.
.
commit 2c42f1e80103cb926c0703d4c1ac1fb9c3e2c600
Author: Ian Jackson <ian.jackson@eu.citrix.com>
Date: Fri Sep 15 18:10:44 2017 +0100
os-posix: Provide new -runas <uid>:<gid> facility
This allows the caller to specify a uid and gid to use, even if there
is no corresponding password entry. This will be useful in certain
Xen configurations.
But this makes it
better
QEMU Command Line
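With that patch in place the toolstack can compute base + domid and pass it straight through, along these lines (the numbers are purely illustrative, standing in for the reserved UID base plus the domid and a similarly reserved gid):

/usr/local/lib/xen/bin/qemu-system-i386 -runas 65538:65538 ...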
QEMU
64
Cleanup
QEMU QEMU QEMU QEMU
uid = base + 1
uid = base + DOMID_FIRST_RESERVED - 1
uid = base + 2
uid = base + 3
UIDs cycle as domains
come and go...
QEMU
65
Cleanup
QEMU QEMU QEMU QEMU
uid = base + 1
uid = base + DOMID_FIRST_RESERVED - 1
uid = base + 2
uid = base + 3
... but something is
lurking here
QEMU
66
Cleanup
QEMU
uid = base + 2
uid = base + 2
Compromised process
with the same UID that
was not killed...
QEMU
67
Cleanup
QEMU
uid = base + 2
uid = base + 2
Compromised process
with the same UID that
was not killed...
... But why was it not
killed?
QEMU
68
Killing Processes Is Tricky
while(1) {
    if(!fork())
        _exit(0);
}
kill(qemu_pid);
Toolstack
Not going to work since
PID is continuously
changing
uid = base + 2
69
Reliable Mechanism
while(1) {
    if(!fork())
        _exit(0);
}
setresuid(..., base + 2, ...);
kill(-1, SIGKILL)
QEMU
Toolstack
uid = base + 2
QEMU
70
Reliable Mechanism
while(1) {
    if(!fork())
        _exit(0);
}
Toolstack
uid = base + 2
Carefully crafted so
QEMU can’t kill
Toolstack
setresuid(..., base + 2, ...);
kill(-1, SIGKILL)
QEMU
71
Reliable Mechanism
while(1) {
    if(!fork())
        _exit(0);
}
Toolstack
uid = base + 2
Example code here
setresuid(..., base + 2, ...);
kill(-1, SIGKILL)
https://github.com/gwd/runner-reaper
● Direct resource mapping makes guest attack
on QEMU more difficult
72
Summary
● Direct resource mapping makes guest attack
on QEMU more difficult
● Hypercall auditing and restriction reduces
ability of compromised QEMU to attack host
or other guests
73
Summary
● Direct resource mapping makes guest attack
on QEMU more difficult
● Hypercall auditing and restriction reduces
ability of compromised QEMU to attack host
or other guests
● De-privileging QEMU stops it bypassing
those restrictions
74
Summary
Problems
75
● Migration
● PCI Pass-Through
76
Problems
● Migration
● PCI Pass-Through
77
Problems
Problem: Signaling uses xenstore
● Migration
● PCI Pass-Through
78
Problems
Problem: Signaling uses xenstore
Solution: Use QMP instead
79
Problems
Problem: Signaling uses xenstore
Solution: Use QMP instead
● Migration
● PCI Pass-Through
Audit Handles
80
QEMU
Kernel
User
privcmd
libxenforeignmemory libxendevicemodel libxenstore libxenevtchn libxengnttab
privcmd xenbus gntdev evtchn
Audit Handles
81
QEMU
Kernel
User
privcmd
libxenforeignmemory libxendevicemodel libxenstore libxenevtchn libxengnttab
privcmd xenbus gntdev evtchn
● Migration
● PCI Pass-Through
82
Problems
Problem: Signaling uses xenstore
Solution: Use QMP instead
Problem: pcilib(7)
● Migration
● PCI Pass-Through
83
Problems
Problem: Signaling uses xenstore
Solution: Use QMP instead
Continue to run QEMU as root
Problem: pcilib(7)
Further Restrictions
84
● chroot(2)
● setrlimit(2)
● Linux namespaces
85
Further Restrictions
● chroot(2)
● setrlimit(2)
● Linux namespaces
86
Further Restrictions
qemu-system-i386 -chroot <dir>
● chroot(2)
● setrlimit(2)
● Linux namespaces
87
Further Restrictions
qemu-system-i386 -chroot <dir>
Virtual CD-ROM?
● chroot(2)
● setrlimit(2)
● Linux namespaces
88
Further Restrictions
qemu-system-i386 -chroot <dir>
Use QMP add-fd
● chroot(2)
● setrlimit(2)
● Linux namespaces
89
Further Restrictions
qemu-system-i386 -chroot <dir>
Use QMP add-fd
RLIMIT_CORE
RLIMIT_FSIZE
RLIMIT_LOCKS
RLIMIT_NOFILE
.
.
.
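A minimal sketch of applying one of these before QEMU is exposed to guest input; RLIMIT_FSIZE is the example called out in the speaker notes, and the limit value is illustrative:

#include <sys/resource.h>

/* Stop a rogue QEMU from filling the emulator domain's filesystem. */
static int limit_qemu_file_size(rlim_t max_bytes)
{
    struct rlimit rl = { .rlim_cur = max_bytes, .rlim_max = max_bytes };

    return setrlimit(RLIMIT_FSIZE, &rl);
}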
● chroot(2)
● setrlimit(2)
● Linux namespaces
90
Further Restrictions
qemu-system-i386 -chroot <dir>
Use QMP add-fd
RLIMIT_CORE
RLIMIT_FSIZE
RLIMIT_LOCKS
RLIMIT_NOFILE
.
.
.
unshare(CLONE_NEWNS | CLONE_NEWIPC);
Q&A
91


Editor's Notes

  • #2 Hi. My name is Paul Durrant. I’m a Senior Principal Software Engineer in the XenServer group at Citrix Systems R&D, based in Cambridge UK. I’ve been a contributor to the Xen Project for many years now. I’m community lead for the Windows PV Drivers but I also contribute to Linux, QEMU and the hypervisor itself. Today I’m going to talk about some of the recent work that has been going on to reduce the attack surface, both of a guest against QEMU and a compromised QEMU against the rest of the system.
  • #3 However, before I start, I’d like to acknowledge the help of Ian Jackson and George Dunlap in creating this presentation. A lot of what you’ll see here is the result of many discussions and work done by all three of us, as well as Anthony and other members of the Xen community. Also a big thanks to Anthony for reviewing this presentation.
  • #4 So, first a bit of background… How do we typically use QEMU with an installation of Xen? Actually it plays two distinct roles...
  • #5 The first is as a set of paravirtual backends… For things like storage, network, keyboard/mouse and framebuffer
  • #6 In this role it behaves in the same way as a kernel backend would… In this picture we have QEMU running in what I’ve called a ‘service’ domain, but typically this would be dom0. It maps the shared rings which are set up by the guest and advertised through xenstore, and services the guest I/O requests as they appear. Mapping is done using grants and signalling is done using event channels. Xen provides shared libraries for all these interactions and QEMU is linked against those.
  • #7 So in this case QEMU running in the service domain has a trust relationship with the frontends in the guest (the extent of that trust relationship having been somewhat of a contentious issue on xen-devel lately) but the service domain has limited privilege… E.g. it does not need mapping privilege over the guest (since the protocols use grants)...
  • #8 However, because of the use of grants, event channels, etc. the service domain will need to make hypercalls. So in this role there is some scope for QEMU to be attacked (e.g. via bugs in the PV protocol or interactions with xenstore) and then some scope for a compromised QEMU to attack the system (e.g. via the hypercall privileges the service domain has). But for the rest of this talk I’m going to focus on QEMU’s other role...
  • #9 … which is as a set of hardware device emulation models to support an HVM guest. In this picture I’ve shown QEMU running in an ‘emulator’ domain. This could be a stub domain (i.e. an emulator domain dedicated to a single guest) but in many deployments - including XenServer - the emulator domain is dom0. The reason that stubdomains are generally not used in enterprise environments is scalability. Without the ability to share pages between VMs (which is there in Xen but not well tested or really trusted) you potentially end up with hundreds of identical stub OS images, which is a massive resource cost, and customers are just not willing to live with an overhead of this magnitude.
  • #10 So, Xen traps I/O (port, memory mapped or PCI config) accesses made by the hardware drivers running in the guest and uses the IOREQ Server subsystem to forward those requests to an emulator… which in this case, of course, is QEMU.
  • #11 QEMU then handles those I/O requests and, if the data is not immediate in the request, will need to map guest memory to do so. Of course the device models may also be emulating DMA capable hardware and so may also directly read or write guest memory for that reason. Hence the emulator domain must have mapping privilege over the guest.
  • #12 And, like the service domain, the emulator domain also needs to make hypercalls… but in addition it shares memory directly with Xen to participate in the IOREQ protocol (i.e. the protocol used for issuing the I/O emulation requests to QEMU, and getting the results back).
  • #13 So, the first attack I’ll focus on is via that shared memory interface...
  • #14 When QEMU starts up it needs to get Xen to set up the shared pages that are going to be used to drive the IOREQ protocol. I’ll just mention at this point that there are actually a pair of pages: The ‘SYNC’ page contains an array of IOREQ structures, one per guest vCPU. When a vCPU traps for I/O emulation the details are written into its slot in the array and the vCPU is paused whilst QEMU services the emulation request The ‘BUFFERED’ page contains a more traditional PV ring structure. It’s there because some emulations, mainly memory mapped writes through the VGA aperture - 0xA0000 thru 0xBFFFF - don’t actually need to be dealt with synchronously. So Xen forwards these on using the ring in the buffered page. Now, shared memory set-up can happen in a couple of ways… For older versions of QEMU (that are unaware of the IOREQ Server API… in one of Xen’s shared libraries) it happens when QEMU reads back the GFNs of the IOREQ pages (which are actually allocated in guest space, in an E820 reserved region) via a couple of magic HVM parameters. Newer versions of QEMU, that are aware of the IOREQ Server API, do the set up using specific hypercalls. The pages are still located in the same E820 reserved region in guest space though.
  • #15 So, having allocated the GFNs, they are then mapped by Xen and (using a normal map-foreign-memory operation) by QEMU.
  • #16 And now the protocol can start operation.
  • #17 But those pages are visible in guest memory space… And, whilst they are in an E820 reserved region, they actually have well known frame numbers and could easily be mapped by a malicious guest… So now the guest can play with the IOREQ protocol directly and not only attack QEMU but Xen… Doesn’t sound good.
  • #18 Fortunately, when the IOREQ Server API was introduced, it contained a mitigation against such attacks in the form of the ‘enable’ operation...
  • #19 When the IOREQ Server API is used, the pages are mapped by Xen and QEMU, as I said before. But actually the protocol does not start operating immediately. Xen will not touch its mapping of the pages until...
  • #20 QEMU issues the IOREQ Server ‘enable’ hypercall. This hypercall: Firstly removes the MFNs of the SYNC and BUFFERED pages from the guest P2M Zeroes them (blowing away whatever the guest may have put in them) And only then sets a flag to allow Xen to start using them for the IOREQ protocol So, this stops the guest from messing with them whilst the protocol is operational but it does introduce a little complexity when it comes to migration… There is an IOREQ Server ‘disable’ hypercall and this needs to be performed before the final guest memory migration to put the pages back into the guest P2M… Otherwise the IOREQ Server pages are leaked and, when trying to start QEMU on the receiving host it goes bang because there are no pages to map.
  • #21 So, it would really be better - and it would make migration simpler - if the IOREQ Server pages were never in the guest P2M at all.
  • #22 So, to this end, there is now a new memory op in Xen 4.11 called XENMEM_acquire_resource… The idea behind this is to allow an emulator domain to directly map memory-based guest resources without them ever needing to be visible to the guest itself.
  • #23 So now, when the hypercall is made to create an IOREQ Server, the SYNC and BUFFERED pages are allocated directly by Xen. They do not come from guest memory space. (Actually they are allocated using alloc_domheap_pages() and assigned to the emulator domain).
  • #24 Now, the new ‘acquire resources’ hypercall is issued and the pages are directly mapped by QEMU. For consistency and backwards compatibility, the IOREQ Server enable hypercall is still there but it is no longer so crucial for protection. Let’s take a closer look at the details of the hypercall...
  • #25 This is the snippet from the Xen public memory header… There are various fields, the first being the domid whose resource is to be mapped...
  • #26 The next is the type of resource… At the moment there is only one resource type defined and that is ‘ioreq server’. The intention is that other types of guest resource can be mapped this way in future and, in fact, I have already posted patches to add support for grant table mapping… but they didn’t make 4.11.
  • #27 Then there is the id field… For the ‘ioreq server’ resource type there are ids for the BUFFERED page and for SYNC pages… Notice I said pages (plural) here… You may recall that, in the SYNC page, there is an array of IOREQ structures... one per guest vCPU… and because of the size of that structure there is actually only space for 128 of them in a page. So, if we ever want more than 128 vCPUs in an HVM guest we’re going to need to handle more than one SYNC page and using this mechanism we now can.
  • #28 Then there is the number of frames we want to map starting from the base id, some flags, and finally the frame list… Now this is a little bit subtle because the emulator domain (i.e. tools domain in this context) could be PV or HVM. If the emulator domain is PV then, when the hypercall returns, the frame_list will be populated with the MFNs of the resource and the domain can then ask Xen to set up PTEs for those MFNs. However, if the emulator domain is HVM then it needs to allocate some GFN space (usually done by grabbing a piece of the balloon) and then pass those GFNs into the hypercall so that Xen can add the resource MFNs into its P2M.
  • #29 Happily I’ve hidden this subtlety behind a new IOCTL in privcmd - the driver used to issue most hypercalls - so the new xenforeignmemory_map_resource() API call doesn’t need to worry about it! You can see the date on that commit is pretty old but because it took such a long time to get the hypercall into Xen (due to some tangential security issues spotted during review) the Linux patch has now been queued up for 4.18.
  • #30 Now I’ll move on to look at another couple of attacks… For these, and for the rest of the presentation, we’ll assume that QEMU has been compromised (probably via a bug in one of the device models) and look at what it might attempt to do once compromised. The first attack we’ll look at are the memory handles that may be passed in a hypercall.
  • #31 So, we’ll focus on the way that a significant number of the hypercalls that QEMU issues, via the Xen shared libraries, are issued… As I mentioned briefly before these go via the privcmd driver in kernel.
  • #32 First, as I said, let’s assume that the guest has attacked and successfully compromised QEMU...
  • #33 Now it causes QEMU to issue a hypercall. It does this by sending an ioctl that encapsulates that hypercall to privcmd.
  • #34 privcmd then just blindly marshalls the arguments from that ioctl and makes whatever hypercall it was told to… No auditing is done. So the guest has carte blanche to issue arbitrary hypercalls.
  • #35 Now let’s look at a particular hypercall… ‘track dirty VRAM’... It has some slightly odd semantics but it’s basically used to get Xen to update a bitmap with all the pages of guest VRAM which have been written to since the hypercall was previously invoked. It’s used for keeping the guest VNC console up to date without having to scrape the entire VRAM each time a framebuffer update is requested. The dirty_bitmap memory handle is entirely under QEMU’s control though and, as we’ve seen, is passed completely unaudited to Xen. So let’s say the malicious guest writes a pattern into its VRAM, then uses the compromised QEMU to issue a ‘track dirty VRAM’ hypercall with dirty_bitmap set to a kernel virtual address. There’s no auditing to stop it doing this.
  • #36 Xen will then blindly write the bitmap (and hence a guest controlled bit pattern) to the address it was given… So, in this way the guest effectively has arbitrary write access, via the compromised QEMU, to the emulator domain’s kernel memory.
  • #37 So what can we do about this? Well, it’s reasonably obvious… Don’t allow unaudited hypercalls to be issued by QEMU. Now this could be done by introspecting on the hypercalls in privcmd, but that means we have to teach privcmd about specific hypercalls and keep Xen and privcmd completely in sync. But there is a better way we can tackle this… We introduce a new hypercall...
  • #38 We’ll start with broadly the same picture as before, but this time notice that I’ve included a specific shared library: libxendevicemodel. This was new in Xen 4.10.
  • #39 We start again assuming that QEMU has been compromised and is issuing a hypercall. But this time let’s say it is actually going to use libxendevicemodel to issue the hypercall. This library sends IOCTL_PRIVCMD_DM_OP, rather than IOCTL_PRIVCMD_HYPERCALL, which results in privcmd issuing (specifically) a new hypercall called dm_op. Now you may ask how we stop QEMU avoiding the library and just sending IOCTL_PRIVCMD_HYPERCALL as it did before… but we’ll come to that in a few slides’ time. The important thing about the dm op ioctl though is...
  • #40 It can be audited. Let’s have a closer look at how it is made up.
  • #41 The ioctl structure itself just contains the domid to which the hypercall relates and an array of pointers to buf structures. Each buf structure then specifies an area of emulator domain memory and so it is quite straightforward for privcmd to make an access_ok() check on each of those areas to verify that they are actually pointing into process user space.
  • #42 And then the translation into the dm_op hypercall is entirely mechanical because the hypercall structures are analogous to the ioctl structures. By convention a dm_op expects to find its operation information in buf[0] so no introspection of the detail of the hypercall is necessary in privcmd.
  • #43 So that’s how we can audit emulator memory handles but what about target domains?
  • #44 To illustrate, we’ll start with another variant of the picture… The QEMU process relating to domain A has been compromised and, this time, it’s issuing an IOCTL_PRIVCMD_MMAP (to map domain memory)… But the operation is targeting domain B’s memory. Now, as I said, we’re not considering stub domains here so the emulator domain here does have mapping privilege over domain B
  • #45 Now, privcmd is just a kernel module in the emulator domain… It doesn’t know which domain’s memory it should or should not be mapping on behalf of a particular QEMU instance so...
  • #46 Domain B’s memory is mapped and now the domain is owned by the compromised QEMU.
  • #47 So I said privcmd doesn’t know which domain’s memory it should or should not be mapping on behalf of a particular QEMU instance… Well how about we tell it!
  • #48 Before QEMU starts handling any emulation requests from the guest, and hence before there’s any possibility of it being compromised by the guest (because the guest has not yet been unpaused), QEMU can send an ioctl to privcmd to restrict the operations it can do and limit them to a particular target domain. The restriction is applied by privcmd storing the domid (A in this case) in a structure related to the process’s file handle...
  • #49 So when later, after QEMU has been compromised, privcmd can check the domid in the mmap ioctl against the one referenced by the file handle, spot the mismatch and deny the operation. Note that to get this to work it means that privcmd needs to be able to ‘see’ the domid of the operation… This is true for mmap operations but not of arbitrary hypercalls issued via IOCTL_PRIVCMD_HYPERCALL. But recall IOCTL_PRIVCMD_DM_OP… The domid is present in the ioctl structure so we can also audit the domain id of all dm_ops, as well as the memory handles. So, when IOCTL_PRIVCMD_RESTRICT is issued, privcmd will also refuse to handle any future IOCTL_PRIVCMD_HYPERCALL that may be issued… and that’s how we stop QEMU going round the side of libxendevicemodel. But there is a problem with restricting file handles...
  • #50 And that is that there are a lot of them… Because of the way that the Xen shared libraries work, it turns out that there isn’t just one handle open from QEMU to privcmd… and also QEMU doesn’t just open handles to privcmd; there are other drivers such as evtchn (for event handling) and gntdev (for grant operations). Having QEMU issue individual restrictions for each file handle is going to get messy… Also, not all APIs are currently restrictable, so what do we do about those?
  • #51 Well, we introduce a new library to control restriction… libxentoolcore
  • #52 The idea here is that each of the libraries that open handles to drivers registers those handles, along with a callback, with libxentoolcore
  • #53 QEMU now makes a single call into libxentoolcore to restrict all its handles… Then, for each of the registered handles from the other libraries, libxentoolcore invokes the registered callback… And these callbacks then do the restriction. So, for example, for libxenforeignmemory (or libxendevicemodel) the callback issues an IOCTL_PRIVCMD_RESTRICT to privcmd. libxenevtchn issues a similar ioctl to the evtchn driver, but not all the APIs (or their underlying implementations) are (yet) restriction ‘aware’ so there is a fall-back...
  • #54 For handles to those libraries libxentoolcore dup2()s them to a handle open to /dev/null so they are essentially neutralized and anything sent down them will go nowhere. So now we have a QEMU which can only issue auditable hypercalls, can only map memory from the ‘correct’ domain, but there is still a problem...
  • #55 QEMU still runs as root, so it can just open a new handle to privcmd and completely bypass all the restrictions we have set up.
  • #56 So let’s not run it as root…
  • #57 Happily QEMU already has an option to run as an alternative user after all the initial set up (i.e. after we have opened handles and restricted) but...
  • #58 Whilst it says user id in the help text, that argument is actually a user name. Anyway, let’s set up an account for a QEMU user and use that. There is a problem with this...
  • #59 Here we have a few QEMU processes all running using the id of the new user we set up...
  • #60 But now let’s say one of them is compromised...
  • #61 It could now invoke ptrace on any other QEMU process using the shared UID, so whilst it cannot gain direct access to another VM’s memory (via a now-restricted mmap operation) it can still observe the interactions between another domain and its instance of QEMU. So, to avoid this possibility, what we really want is a separate UID per VM...
  • #62 So, how are we going to do it? Well domid space is actually less than 16 bits wide (it goes from 0 to 32751, i.e. 0x7FEF) whilst uid space is generally (close to) 32 bits wide (on modern OS, although POSIX only guarantees 16 bits). So, we should be able to reserve enough consecutive UIDs to have one per domain, such that UID = base + domid. (This is good because we only need to tell the toolstack about the base uid).
  • #63 So, now we’ve reserved our UID space but that QEMU command line argument is a little awkward… It means we need to actually add an account (i.e. a passwd entry) for each of our reserved UIDs!
  • #64 Well actually we don’t… Ian recently upstreamed a patch to QEMU so we don’t need to do that. Now we can run QEMU with an individual UID for each VM but what happens when the VM is shut down? Well there is a little problem with this...
  • #65 Let’s say we’re starting and stopping guests and the QEMU UIDs are tracking the domids as we want… Eventually the domid is going to hit that limit of 32751, or DOMID_FIRST_RESERVED - 1, and is going to cycle back round (obviously not to 0)... But when the domid and hence the UID cycles back round there could be an issue...
  • #66 Notice how there is something lurking in the background there...
  • #67 It’s a compromised QEMU process that was not killed and now has the same UID as our brand new QEMU process, which makes our new QEMU instance vulnerable (for the reasons we went into earlier).
  • #68 But why was it not killed? The toolstack should have done that when the associated VM was shut down, right?
  • #69 Well it turns out killing processes is tricky. If our compromised QEMU starts executing a loop like the one illustrated here then its PID is not going to be the one that the toolstack originally forked and it’s going to be changing so rapidly that it’s likely to win the race even if the toolstack uses killall. We need a more reliable kill mechanism.
  • #70 Happily there turns out to be one… If kill is issued with a PID of -1 then this means that all other processes belonging to the same UID as the process that invokes kill will be terminated. So all the toolstack needs to do is set its UID to that of the QEMU instance it needs to kill, and then issue a kill with PID -1.
  • #71 But we have to be a little bit careful… There are various UIDs that a process can have - real, effective and saved - and if the toolstack chooses to set the wrong one it could leave itself open to being killed by the compromised QEMU (using UID -1). Happily there is an asymmetry between who can kill and who will be killed. Thus, by carefully crafting a call to setresuid, the effective UID of the toolstack process can be changed safely and thus QEMU can be reliably terminated.
  • #72 For those who are interested, George Dunlap has posted sample code to illustrate this issue which you can find at this URL
  • #73 So to summarize what we’ve covered so far… We’ve looked at direct resource mapping, which closes off one way in which a guest could attack QEMU
  • #74 We’ve looked at hypercall auditing and restriction to reduce the amount of damage that a compromised QEMU can do...
  • #75 And we’ve looked at de-privileging QEMU to stop it bypassing those restrictions...
  • #76 But there are a couple of problems with restriction and de-privileging that have yet to be solved...
  • #77 These are: Migration… this also includes save and restore, which use a lot of the same mechanisms internally. And PCI pass-through… taking a discrete piece of PCI hardware and giving the guest access to it.
  • #78 The problem with migration is the signalling done between the toolstack and QEMU: to instruct QEMU to save its state on save/migration out, and to indicate when QEMU is done parsing the state on restore/migration in. Both use xenstore, and during restriction QEMU’s xenstore handle gets dup2-ed onto /dev/null… so the signalling no longer works
  • #79 The proposed solution to this one is to do the signalling via QMP, which is arguably how it should be done anyway. (And Anthony tells me that he has candidate patches for this and has now successfully run a migration).
  • #80 There is still a question of auditing though...
  • #81 We want to be able to be sure that all the various QEMU handles are restricted before QEMU is exposed to any untrusted input. Ian Jackson has written a tool to do this and, when starting a new VM it’s fairly straightforward to run… You just start the guest paused, run the tool and...
  • #82 It will (hopefully) give QEMU a clean bill of health and you can unpause the VM. The problem is that on a restore or migration in, we need to regard the inbound state as malicious (as it may have come from a compromised QEMU) so we also need to figure out a way to run the audit tool between starting up QEMU as part of a domain restore and when QEMU starts to process the inbound state… The current proposal is to interpose on the inbound state by means of a pipe so we can block QEMU in a read operation until the audit tool has been run.
  • #83 The other main problem with restriction is PCI pass-through and this comes down to QEMU needing to use APIs which require it to be running as root e.g. things underpinned by sysfs, such as pcilib.
  • #84 And really there isn’t an obvious solution to this problem. Any solution is going to require significant re-work of PCI pass-through… and actually there are moves to bring pass-through into the hypervisor itself, so for the moment - if you want pass-through - you’re going to have to continue to run QEMU as root… and indeed this is what we do in XenServer.
  • #85 So, that brings me to the end of the talk but before I finish I’d just like to cover a few more QEMU restrictions that haven’t been implemented yet but that we’ll likely add in the near future...
  • #86 These are: Running QEMU in a chroot... Applying process limits... And using Linux namespaces
  • #87 Using a chroot is pretty easy… QEMU already has an option to run itself in a chroot, but there is an implication of doing this…
  • #88 What about virtual CD-ROMs (or any removable media device really)… Currently presenting an ISO, or a QCOW, or whatever as a virtual CD-ROM involves having QEMU open a file when instructed to do so by the toolstack. But if we’ve chrooted it then it probably can’t see that file to open it… and its new UID may not let it open the file anyway.
  • #89 Happily QEMU already has a mechanism that can be used to avoid this problem… There is a QMP command to hand QEMU the handle of an already-open file or socket. So the toolstack can open the ISO or QCOW itself and then hand the handle into the chrooted and restricted QEMU.
  • #90 The next thing to do is to use setrlimit() to apply some process limits… E.g. We should set RLIMIT_FSIZE or there’s a chance that a rogue QEMU could fill up the emulator domain’s (and hence probably dom0’s) file system.
  • #91 And lastly there are Linux namespaces… It would be a good idea to put QEMU in its own mount and IPC name space so that it can’t even name system mount points or non-file-based IPC descriptors, so there’s no way it can attack them.
  • #92 I’ll end there. Thanks for listening. If anyone has any questions then please fire away...