QEMU Sandboxing for dummies

This presentation is about securing QEMU/KVM virtual machines with seccomp. Presented at DevConf 2018 in Brno, Czech Republic.

  1. QEMU Sandboxing for dummies
     Eduardo Otubo <otubo@redhat.com>, Senior Software Engineer
     27/Jan/2018
  3. Agenda
     1. Secure Computing: the basics
     2. Libseccomp
     3. QEMU sandboxing v1
     4. QEMU sandboxing v2 and more options
  4. Secure Computing: the basics
     ● The first version of kernel support dates from March 8th, 2005 (Linux 2.6.12); commit by Andrea Arcangeli
     ● A process enables it by calling prctl() with PR_SET_SECCOMP; from then on it may only use exit(), sigreturn(), read() and write()
       ○ Any other system call kills the process with SIGKILL
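A minimal sketch of strict mode, assuming Linux with glibc: a forked child switches itself to SECCOMP_MODE_STRICT, shows that write() still works, and is then killed by its first non-permitted call (getpid() here, an arbitrary choice). The helper name strict_mode_demo() is hypothetical; the fork keeps the calling process unrestricted.

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>   /* SECCOMP_MODE_STRICT */
#include <signal.h>          /* SIGKILL */
#include <string.h>
#include <sys/prctl.h>       /* prctl(), PR_SET_SECCOMP */
#include <sys/syscall.h>     /* SYS_getpid, SYS_exit */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper: runs a child in seccomp strict mode and returns
 * the signal that ended it (0 if it exited normally, -1 on setup failure).
 * Whatever the child managed to write() before dying is copied into buf. */
int strict_mode_demo(char *buf, size_t buflen)
{
    int fds[2];
    if (buflen == 0 || pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(fds[0]);
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            _exit(1);                     /* strict mode not available */
        /* write() is one of the four permitted system calls */
        if (write(fds[1], "ok", 2) != 2)
            syscall(SYS_exit, 1);         /* exit() is permitted too */
        /* getpid() is not permitted: the kernel kills the child here */
        syscall(SYS_getpid);
        _exit(0);                         /* not reached */
    }

    close(fds[1]);
    ssize_t n = read(fds[0], buf, buflen - 1);
    buf[n > 0 ? n : 0] = '\0';
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling strict_mode_demo(buf, sizeof buf) should return SIGKILL and leave "ok" in buf. Running the policy in a child matters because strict mode cannot be undone, and it does not even permit exit_group(), which glibc's _exit() uses.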
  5. Secure Computing: the basics
     ● Second kernel implementation, with dynamic seccomp policies: January 11th, 2011; commit by Will Drewry <wad@chromium.org>
     ● Now uses the seccomp() system call
     ● Uses BPF (Berkeley Packet Filter)
       ○ An in-kernel data-link-layer packet filter with an abstracted API that also works as a generic filter
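The same idea with a dynamic policy: a sketch, assuming x86-64 or AArch64 Linux, that installs a small BPF program through the seccomp() system call (invoked via syscall(), since glibc has not traditionally provided a wrapper). It allows everything except getppid(), which fails with EPERM instead of running. The choice of getppid(), the helper name filter_demo() and the architecture list are illustrative assumptions.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/audit.h>     /* AUDIT_ARCH_* */
#include <linux/filter.h>    /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/seccomp.h>   /* struct seccomp_data, SECCOMP_* */
#include <stddef.h>          /* offsetof */
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define ARCH_NR AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define ARCH_NR AUDIT_ARCH_AARCH64
#else
#error "set ARCH_NR for this architecture"
#endif

/* Illustrative policy: allow everything except getppid(), which gets EPERM */
static struct sock_filter filter[] = {
    /* Kill system calls made through an unexpected ABI */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ARCH_NR, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    /* Dispatch on the system call number */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_getppid, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Hypothetical demo helper: returns 1 if getppid() failed with EPERM
 * inside a filtered child, 0 otherwise, -1 on fork/wait failure. */
int filter_demo(void)
{
    struct sock_fprog prog = {
        .len = sizeof filter / sizeof filter[0],
        .filter = filter,
    };
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Without CAP_SYS_ADMIN, no_new_privs must be set first */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, 0, &prog) != 0)
            _exit(2);
        long r = syscall(SYS_getppid);
        _exit(r == -1 && errno == EPERM ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

filter_demo() should return 1. The architecture check comes first because system call numbers differ between ABIs; a filter that skips it can be bypassed by a call made through another ABI.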
  6. A hand-written BPF filter (excerpt; the filter continues beyond this slide)
         struct sock_filter filter[] = {
             /* Grab the system call number */
             BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_nr),
             /* Jump table for the allowed syscalls */
             BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_rt_sigreturn, 0, 1),
             BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
         #ifdef __NR_sigreturn
             BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_sigreturn, 0, 1),
             BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
         #endif
             BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit_group, 0, 1),
             BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
             BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit, 0, 1),
             BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
             BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 1, 0),
             BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_write, 3, 2),
             /* ... */
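To see how the jump table above is read: in BPF_JUMP(code, k, jt, jf), jt and jf count how many instructions to skip after the comparison, so (0, 1) means "if equal, fall through to the next RET ALLOW; otherwise skip it". The sketch below is a minimal userspace evaluator for just the three instruction forms the slide uses, assuming syscall_nr is the offset of nr in struct seccomp_data (as in the Linux kernel's seccomp sample code). eval_filter() and the demo filter are hypothetical, and this is not the kernel's interpreter.

```c
#include <linux/filter.h>    /* struct sock_filter, BPF_* opcodes */
#include <linux/seccomp.h>   /* struct seccomp_data, SECCOMP_RET_* */
#include <stddef.h>          /* offsetof, size_t */
#include <stdint.h>
#include <string.h>          /* memcpy */
#include <sys/syscall.h>     /* __NR_* numbers */

/* Assumed definition of the slide's syscall_nr */
#define syscall_nr (offsetof(struct seccomp_data, nr))

/* Hypothetical evaluator for the slide's subset of classic BPF: 32-bit
 * absolute loads, jump-if-equal against a constant, return-constant.
 * Anything else, or running off the end, yields SECCOMP_RET_KILL. */
uint32_t eval_filter(const struct sock_filter *prog, size_t len,
                     const struct seccomp_data *data)
{
    uint32_t acc = 0;                       /* the BPF accumulator */

    for (size_t pc = 0; pc < len; pc++) {
        const struct sock_filter *ins = &prog[pc];
        switch (ins->code) {
        case BPF_LD | BPF_W | BPF_ABS:      /* acc = 32-bit word at offset k */
            if (ins->k > sizeof *data - sizeof acc)
                return SECCOMP_RET_KILL;
            memcpy(&acc, (const char *)data + ins->k, sizeof acc);
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:     /* skip jt or jf instructions */
            pc += (acc == ins->k) ? ins->jt : ins->jf;
            break;
        case BPF_RET | BPF_K:               /* final verdict */
            return ins->k;
        default:
            return SECCOMP_RET_KILL;
        }
    }
    return SECCOMP_RET_KILL;
}

/* Hypothetical demo: part of the slide's jump table, closed with a kill */
static const struct sock_filter demo[] = {
    BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_nr),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit_group, 0, 1),
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit, 0, 1),
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
};
```

With the demo filter, exit() evaluates to SECCOMP_RET_ALLOW, while getpid() misses both comparisons and falls through to the final SECCOMP_RET_KILL. The same skip arithmetic explains the slide's last two lines: a read() skips one instruction and a write() skips three, each landing on a further check that the excerpt does not show.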
  7. Libseccomp
     ● Paul Moore (2011)
     ● Userspace layer to make life easier:
       ○ Abstracts complex BPF constructions
       ○ Abstracts differences between architectures and their ABIs
       ○ Optimizes filter construction for best performance
       ○ Actions on a matched filter include kill (SIGKILL), trap (SIGSYS) and allow, among others
  10. QEMU sandboxing v1
          /* each entry: { system call, priority hint for libseccomp } */
          static const struct QemuSeccompSyscall seccomp_whitelist[] = {
              { SCMP_SYS(timer_settime), 255 },
              { SCMP_SYS(timer_gettime), 254 },
              { SCMP_SYS(futex), 253 },
              { SCMP_SYS(select), 252 },
              { SCMP_SYS(recvfrom), 251 },
              { SCMP_SYS(sendto), 250 },
              { SCMP_SYS(read), 249 },
              { SCMP_SYS(brk), 248 },
              { SCMP_SYS(clone), 247 },
              { SCMP_SYS(mmap), 247 },
              { SCMP_SYS(mprotect), 246 },
              { SCMP_SYS(execve), 245 },
              { SCMP_SYS(open), 245 },
              { SCMP_SYS(ioctl), 245 },
              { SCMP_SYS(recvmsg), 245 },
              { SCMP_SYS(sendmsg), 245 },
              /* ... */
  11. QEMU sandboxing v1
      ● Basic whitelist approach (--sandbox=on)
        ○ Every system call is blocked except the ones that are explicitly whitelisted
      ● Various compatibility problems; requires lots of testing with different workloads
      ● It's safe, right?
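A sketch of the whitelist idea, assuming x86-64 or AArch64 Linux and raw BPF rather than the libseccomp calls QEMU itself uses: the filter is generated from a table of allowed system call numbers, and anything missing from the table hits the final SECCOMP_RET_KILL. install_whitelist(), whitelist_demo(), MAX_ALLOWED and the three-entry table are hypothetical.

```c
#define _GNU_SOURCE
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define ARCH_NR AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define ARCH_NR AUDIT_ARCH_AARCH64
#else
#error "set ARCH_NR for this architecture"
#endif

#define MAX_ALLOWED 32   /* hypothetical table size limit */

/* Hypothetical helper: builds and installs a default-deny filter with one
 * compare-and-allow pair per whitelisted system call. */
static int install_whitelist(const int *allowed, size_t n)
{
    struct sock_filter f[4 + 2 * MAX_ALLOWED + 1];
    size_t pc = 0;

    if (n > MAX_ALLOWED)
        return -1;
    /* Kill calls made through an unexpected ABI */
    f[pc++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                                           offsetof(struct seccomp_data, arch));
    f[pc++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ARCH_NR, 1, 0);
    f[pc++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL);
    f[pc++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                                           offsetof(struct seccomp_data, nr));
    for (size_t i = 0; i < n; i++) {
        /* equal: fall through to ALLOW; not equal: skip over it */
        f[pc++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                                               allowed[i], 0, 1);
        f[pc++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW);
    }
    /* Not in the whitelist */
    f[pc++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL);

    struct sock_fprog prog = { .len = (unsigned short)pc, .filter = f };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return (int)syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, 0, &prog);
}

/* Hypothetical demo helper: the child whitelists getpid, exit and
 * exit_group, then calls getppid. Returns the terminating signal
 * (0 if the child exited normally, -1 on error). */
int whitelist_demo(void)
{
    static const int allowed[] = { SYS_getpid, SYS_exit, SYS_exit_group };
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (install_whitelist(allowed, sizeof allowed / sizeof allowed[0]) != 0)
            _exit(1);
        syscall(SYS_getpid);    /* whitelisted: runs normally */
        syscall(SYS_getppid);   /* forgotten in the table: the child is killed */
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

whitelist_demo() should report that the child was killed by a signal: getpid() is listed, getppid() is not. That is the v1 failure mode in miniature; a single forgotten system call on a rarely used code path is fatal.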
  13. QEMU sandboxing v1: not actually!
      ● QEMU links against many different shared libraries, and there is no way to determine which code paths QEMU triggers in them, and therefore which system calls are genuinely needed.
      ● Sometimes a missing system call aborts QEMU right at startup, before the guest boots (which is arguably good), but sometimes the VM runs for days and then suddenly aborts (which is terrible).
  14. QEMU sandboxing v2
      ● Extended blacklist approach (--sandbox=on,...)
      ● Everything is allowed except for a few sets of system calls that are definitely not allowed:
        ○ Default: a basic set of forbidden system calls (kexec, swapon, swapoff, mount, umount, etc.)
        ○ obsolete
        ○ elevateprivileges
        ○ spawn
        ○ resourcecontrol
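The v2 options can be thought of as switches over groups of system calls. The hypothetical parser below shows how a string such as on,obsolete=allow,spawn=deny could map to a mask of groups to deny. It assumes one bit per group, that the default and obsolete groups are denied unless overridden (the obsolete default is stated on the Obsolete system calls slide), that the remaining groups are denied only when set to deny, and that elevateprivileges=children also denies its group. It is not QEMU's option parser; the SET_* names and bit values are made up.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit values, one per group of system calls */
enum {
    SET_DEFAULT    = 1u << 0,
    SET_OBSOLETE   = 1u << 1,
    SET_PRIVILEGED = 1u << 2,
    SET_SPAWN      = 1u << 3,
    SET_RESOURCE   = 1u << 4,
};

/* True if the n-byte span s equals the string literal lit */
static bool span_eq(const char *s, size_t n, const char *lit)
{
    return strlen(lit) == n && strncmp(s, lit, n) == 0;
}

/* Hypothetical parser: turns "on,key=value,..." into a deny mask.
 * Returns false on malformed input or an unknown key or value. */
bool parse_sandbox(const char *arg, uint32_t *deny)
{
    static const struct { const char *key; uint32_t set; } keys[] = {
        { "obsolete", SET_OBSOLETE },
        { "elevateprivileges", SET_PRIVILEGED },
        { "spawn", SET_SPAWN },
        { "resourcecontrol", SET_RESOURCE },
    };
    const size_t nkeys = sizeof keys / sizeof keys[0];
    uint32_t d = SET_DEFAULT | SET_OBSOLETE;   /* assumed defaults */

    if (strncmp(arg, "on", 2) != 0 || (arg[2] != '\0' && arg[2] != ','))
        return false;

    for (const char *p = arg + 2; *p == ','; ) {
        const char *item = p + 1;
        const char *comma = strchr(item, ',');
        size_t len = comma ? (size_t)(comma - item) : strlen(item);
        const char *eq = memchr(item, '=', len);
        if (eq == NULL)
            return false;
        size_t klen = (size_t)(eq - item);
        const char *val = eq + 1;
        size_t vlen = len - klen - 1;

        size_t i = 0;
        while (i < nkeys && !span_eq(item, klen, keys[i].key))
            i++;
        if (i == nkeys)
            return false;

        if (span_eq(val, vlen, "allow"))
            d &= ~keys[i].set;
        else if (span_eq(val, vlen, "deny") ||
                 (keys[i].set == SET_PRIVILEGED && span_eq(val, vlen, "children")))
            d |= keys[i].set;   /* "children" treated like deny here */
        else
            return false;
        p = item + len;         /* at the next ',' or the final '\0' */
    }
    *deny = d;
    return true;
}
```

With these assumptions, parse_sandbox("on,obsolete=allow,spawn=deny", &mask) succeeds and sets mask to SET_DEFAULT | SET_SPAWN.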
  15. Obsolete system calls
      ● Old system calls that were useful in the past but have become obsolete or been replaced by newer versions
        ○ For example, readdir() was replaced by getdents()
      ● They are blocked by default, but can be enabled with: --sandbox on,obsolete=allow
  16. Elevated privileges
      ● This option blocks all set*uid/set*gid system calls, which some features, such as bridge helpers, are known to require
      ● It also calls prctl(PR_SET_NO_NEW_PRIVS), which prevents new threads and child processes from escalating privileges as well
      ● It can be switched on or off with: --sandbox on,elevateprivileges=allow|deny|children
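A minimal sketch of the no_new_privs flag, assuming Linux 3.5 or later: the flag is set with prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0), can never be cleared, and is inherited by every process forked afterwards, which is what stops later execve() calls from gaining privileges through setuid binaries. nnp_inherited() is a hypothetical helper; it does its work in a child so the caller's own flag stays untouched.

```c
#include <sys/prctl.h>       /* PR_SET_NO_NEW_PRIVS, PR_GET_NO_NEW_PRIVS */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: sets no_new_privs in a child, then reports the flag
 * as seen by a grandchild forked afterwards (1 expected), or -1 on error. */
int nnp_inherited(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
            _exit(255);
        pid_t grandchild = fork();
        if (grandchild < 0)
            _exit(255);
        if (grandchild == 0)               /* 1 if the flag was inherited */
            _exit(prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0) == 1 ? 1 : 0);
        int st;
        if (waitpid(grandchild, &st, 0) < 0 || !WIFEXITED(st))
            _exit(255);
        _exit(WEXITSTATUS(st));
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 255 ? -1 : WEXITSTATUS(status);
}
```

nnp_inherited() should return 1: the grandchild reads the flag with PR_GET_NO_NEW_PRIVS and finds it already set, without ever setting it itself.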
  17. Spawn
      ● This option provides a straightforward way to prevent any new process from being created with fork() or exec(), privileged or not.
      ● Features such as the bridge helper, the SMB server, ifup/ifdown scripts and the migration exec: protocol are all disabled.
      ● It can be switched on or off with: --sandbox on,spawn=allow|deny
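A sketch of the spawn idea under stated assumptions (x86-64 or AArch64 Linux, raw BPF, and EPERM instead of a kill action): a child denies execve() and then tries to run a nonexistent path. Without the filter the call would fail with ENOENT; with it, the filter answers EPERM before the kernel looks at the path. spawn_deny_demo() is hypothetical, and a complete policy would also have to cover execveat() and the process-creation calls; note that glibc's fork() wrapper issues clone(), so denying the fork system call alone is not enough.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define ARCH_NR AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define ARCH_NR AUDIT_ARCH_AARCH64
#else
#error "set ARCH_NR for this architecture"
#endif

/* Illustrative policy: execve() fails with EPERM, everything else is allowed */
static struct sock_filter no_exec[] = {
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ARCH_NR, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_execve, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Hypothetical demo helper: returns 1 if execve() failed with EPERM in the
 * filtered child, 0 otherwise, -1 on fork/wait failure. */
int spawn_deny_demo(void)
{
    struct sock_fprog prog = {
        .len = sizeof no_exec / sizeof no_exec[0],
        .filter = no_exec,
    };
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        char *argv[] = { "/nonexistent", NULL };
        char *envp[] = { NULL };
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, 0, &prog) != 0)
            _exit(2);
        execve(argv[0], argv, envp);      /* returns only on failure */
        /* ENOENT here would mean the filter was not consulted */
        _exit(errno == EPERM ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

spawn_deny_demo() should return 1, because the EPERM comes from the filter rather than from the path lookup.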
  18. Resource control
      ● Prevents QEMU from setting process affinity, scheduler priority, etc.
      ● This should not be QEMU's responsibility, but rather that of management software such as libvirt.
      ● It can be switched on or off with: --sandbox on,resourcecontrol=allow|deny
  19. QEMU sandboxing v2
          static const struct QemuSeccompSyscall blacklist[] = {
              /* default set of syscalls to blacklist */
              { SCMP_SYS(reboot), QEMU_SECCOMP_SET_DEFAULT },
              { SCMP_SYS(swapon), QEMU_SECCOMP_SET_DEFAULT },
              { SCMP_SYS(swapoff), QEMU_SECCOMP_SET_DEFAULT },
              { SCMP_SYS(syslog), QEMU_SECCOMP_SET_DEFAULT },
              { SCMP_SYS(mount), QEMU_SECCOMP_SET_DEFAULT },
              { SCMP_SYS(umount), QEMU_SECCOMP_SET_DEFAULT },
              { SCMP_SYS(kexec_load), QEMU_SECCOMP_SET_DEFAULT },
              { SCMP_SYS(afs_syscall), QEMU_SECCOMP_SET_DEFAULT },
              { SCMP_SYS(break), QEMU_SECCOMP_SET_DEFAULT },
              { SCMP_SYS(ftime), QEMU_SECCOMP_SET_DEFAULT },
              { SCMP_SYS(getpmsg), QEMU_SECCOMP_SET_DEFAULT },
              { SCMP_SYS(gtty), QEMU_SECCOMP_SET_DEFAULT },
              { SCMP_SYS(lock), QEMU_SECCOMP_SET_DEFAULT },
              { SCMP_SYS(mpx), QEMU_SECCOMP_SET_DEFAULT },
              { SCMP_SYS(prof), QEMU_SECCOMP_SET_DEFAULT },
              { SCMP_SYS(profil), QEMU_SECCOMP_SET_DEFAULT },
              /* ... */
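A hypothetical model of how a table like this can be consumed: every entry is tagged with a group, and only entries whose group bit is set in the requested options become deny rules. Real code would pass each selected entry to libseccomp instead of collecting names; the SET_* values, the six sample rows and select_denied() are illustrative assumptions, not QEMU's definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical group bits, standing in for the QEMU_SECCOMP_SET_* values */
enum {
    SET_DEFAULT    = 1u << 0,
    SET_OBSOLETE   = 1u << 1,
    SET_PRIVILEGED = 1u << 2,
    SET_SPAWN      = 1u << 3,
    SET_RESOURCE   = 1u << 4,
};

struct deny_entry {
    const char *name;   /* system call name, in place of SCMP_SYS(name) */
    uint32_t set;       /* group this entry belongs to */
};

/* A few illustrative rows in the style of the slide's table */
static const struct deny_entry deny_table[] = {
    { "reboot",            SET_DEFAULT },
    { "mount",             SET_DEFAULT },
    { "readdir",           SET_OBSOLETE },
    { "setuid",            SET_PRIVILEGED },
    { "execve",            SET_SPAWN },
    { "sched_setaffinity", SET_RESOURCE },
};

/* Hypothetical selector: stores in out[] (at most cap names) the entries
 * whose group bit is set in opts, i.e. the ones that would become deny
 * rules, and returns how many were stored. */
size_t select_denied(uint32_t opts, const char **out, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof deny_table / sizeof deny_table[0]; i++) {
        if (!(opts & deny_table[i].set))
            continue;          /* group not requested: system call stays allowed */
        if (n == cap)
            break;
        out[n++] = deny_table[i].name;
    }
    return n;
}
```

For example, select_denied(SET_DEFAULT | SET_SPAWN, names, 8) returns 3: reboot and mount from the default group plus execve from the spawn group. Keeping the group tag in the table is what lets the command-line knobs stay high level while the syscall lists live in one place.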
  20. Some thoughts on QEMU sandboxing
      ● Sandboxing is not a definitive solution for virtualization security, but rather a good layer to stack on top of others, such as:
        ○ MAC/DAC (Mandatory Access Control and Discretionary Access Control)
        ○ SELinux
        ○ Remote management using SSH/TLS/SSL
        ○ Guest image encryption
        ○ Virtual Trusted Platform Module (vTPM)
      ● The sandbox v2 options are not low-level knobs that control individual system calls, but rather high-level knobs that control concepts.
  21. Questions?
  22. THANK YOU
