SlideShare a Scribd company logo
1 of 23
Download to read offline
Dave Voutila <dv@openbsd.org>, AsiaBSDCon 2023
Hardening Emulated Devices in
OpenBsd’s vmd(8) Hypervisor
fork&
exec&
fork&
exec.
Dave Voutila (dv@)
Vermont 🍁, USA 🇺🇸
(40 mins from Québec 🇨🇦)
Hypervisors as High Value Targets
Why do you rob a bank? It’s where the money is. 💰
• Shift to multi-tenant public cloud means target rich environments
• If it’s networked, it’s vulnerable.
• Pop one ESXi instance, now you own many systems
• “it’s ok, i’m running it in a vm”
A Potpourri of CVE’s
Some classic guest-to-host escapes
• QEMU
• CVE-2015-3456 — “VENOM”
• aka the QEMU
fl
oppy disk one
• CVE-2015-7504 — network device escape
• VMWare
• CVE-2020-3967 — EHCI heap over
fl
ow
• OpenBSD
• DHCP packet handler stack over
fl
ow (6.8, 6.9) CC BY-SA 4.0, Crowdstrike
Emulated Devices are the Problem
When in doubt, cut it out. ✂
• By de
fi
nition, always handling untrusted (guest) input
• Need read/write to guest physical memory
• Need read/write to host interfaces (network,
fi
les, etc.)
• Emulating in kernel is asking for trouble, so commonly done in userland
• In OpenBSD, only pvclock(4) is handled in the kernel (vmm(4))
Multi-process QEMU
First Type-2 open source hypervisor doing this?
• Oracle started work in 2017, landed in QEMU December 2020
• Elena U
fi
mtseva, Jag Raman, John G. Johnson
• https://lists.gnu.org/archive/html/qemu-devel/2020-12/msg00268.html
• …but, who uses it?
• I’d presume Oracle Cloud
• ¯_(ツ)_/¯ 🤷
• Documentation is primarily about design, future ideas, & not present day usage.
• Additional burden placed upon the poor administrators 😩
vmm(4) / vmd(8)
OpenBSD’s native hypervisor
• Originally released with OpenBSD 5.9 by mlarkin@ & reyk@
• Currently amd64 only with support for both amd64 and i386 guests
• Adopted privilege separation design
• fork+exec —> chroot(2) & pledge(2)
• drop from root to _vmd
• Components
• vmm(4) — in kernel monitor
• vmd(8) — userland daemon
• vmctl(8) — userland control utility
vmd(8) gaps & weaknesses
Room for improvement
• vm process is fork(2)’d from the vmm process, without execv*(2)
• Every vm gets same address space layout
• Every vm has junk left over from vmm process (global state)
• vm process isn’t chroot’ed & vmm process isn’t running as root
• vm process has multiple pledge(2) promises beyond “stdio”
• “recvfd” — vm send/recv
• “vmm” — vmm(4) ioctl for vcpu, interrupts, registers r/w
Existing Synthetic Mitigations
A house is only as good as its foundation.
• ASLR — force attackers to need info leaks (3.4)
• RETGUARD — control
fl
ow integrity (6.4 for amd64?)
• pledge(2) — minimize syscalls available to attackers (5.9)
• unveil(2) — hide parts of the
fi
le system without root (6.4)
• In -current and landing as part OpenBSD 7.3:
• mimmutable(2) — prevent changing page permissions or mappings
• pinsyscall(2) — minimize allowed call point for syscalls like execve
• execute-only — enforce execute-only on .text to prevent ROP (amd64 via kernel PK
Step 1: fork+exec for each vm
Minimizing info leaks across vm’s
• Switch vmm process from just fork(2) to: fork(2) + execvp(2)
• Easy wins 🙌
• Reuse socketpair for IPC between vmm proc and vm
• Simply pass the fd number for the channel in argv
• Headaches 😖
• vmm process needs absolute path to vmd executable
• vm process can’t rely on existing global state for con
fi
guration
Step 2: Isolate the VirtIO Device
“Breaking up is hard to do.” — the Oracle Blog post
• vmd uses multiple vm_mem_ranges (bios/reserved, mmio, regular)
• The approach:
• shm_mkstemp(3) — create temporary
fi
le for mapping shared
memory
• ftruncate(2) & shm_unlink(3) — size and remove the temp
fi
le
• fcntl(2) — set the fd to not close on exec
• mmap(2) guest memory ranges MAP_SHARED | MAP_CONCEAL
• Pass the fd number after exec & re-mmap vm_mem_ranges
Step 3: Wiring up RPC
Here’s a dime…call someone who cares.
• Sync Channel
• Bootstrapping device con
fi
g post-exec
• Virtio PCI register reads/writes
• Async Channel
• Lifecycle events (vm pause/resume, shutdown)
• Assert/Deassert IRQ
• Set host MAC (vionet)
Putting it all Together
Sorry for my artwork 🧑🎨
High-level Message Flow
Sorry for my artwork 🧑🎨
1. Guest
fi
lls bu
ff
ers,
updates virtqueues, etc.
2. Guest writes to Device
register via IO instructions
3. Device is noti
fi
ed it can
process data. Writes it to
fd.
4. Device kicks guest via
vcpu interrupt to notify
bu
ff
ers are processed
Security! But at what cost?
What about the user/admin experience? Does it change?
• OpenBSD 7.2
• # vmd -d
• # vmctl start -Lc -d disk.raw -m 8g guest
• Prototype
• # vmd -d
• # vmctl start -Lc -d disk.raw -m 8g guest
Security! But at what cost?
What about the user/admin experience? Does it change?
They’re the same commands.
Security! But at what Cost?
vmd(8) has room for performance improvement
• zero-copy virtio added only recently
• single emulated device thread handles all async io
• virtio PCI devices — network, disk, cdrom
• ns8250 serial device
• i8253 programmable interrupt timer
• Mutexes guard device state from competing vcpu & event threads
Quick & Dirty Benchmarking
This is not a benchmarking talk 🙅
• Lenovo X1 Carbon (10th gen)
• 12th gen Intel i7-1270P @ 2.2 GHz
• 32 GiB RAM
• 1 TB NVME disk
• Guest Operating Systems
• OpenBSD 7.2
• Alpine Linux 3.7 (kernel vTKTKT)
vioblk(4) benchmark
fi
o(1) [ports] performing 8 KiB random writes to 2GiB
fi
le for 5 minutes
• Very little di
ff
erence in
throughput
• Very little di
ff
erence in
latency
• Long-tail slightly
worse on Alpine??
Host
Version
Guest
Version
Throughput
MiB/s
clat avg
(usec)
clat stdev
(usec)
99.90th %
(usec)
99.95%
(usec)
OpenBSD-
current
OpenBSD-
current
90.3 89.9 8730 338 379
Prototype
OpenBSD-
current
100 76.4 7220 388 429
OpenBSD-
current
Alpine
Linux 3.17
131 11.2 534 32 38
Prototype
Alpine
Linux 3.17
132 11.7 682 594 685
vio(4) benchmark
iperf3(1) [ports] with 1 thread run for 30 seconds, alternating TX/RX
• iperf(3) used with 1
thread in alternating
client / server modes
• Observations
• More consistent
throughput in
OpenBSD guests
• Negligible
performance
decrease for Linux
guests (not sure why)
Host
Version
Guest
Version
Receiving
Bitrate
(Mbit/s)
%
Sending
Bitrate
(MBit/s)
%
-current
OpenBSD
7.2
0.86 — 1.26 —
prototype
OpenBSD
7.2
1.22 63% 1.35 7%
-current
Alpine Linux
3.17
1.26 — 4.26 —
prototype
Alpine Linux
3.17
1.30 5% 3.19 -25%
Headaches!
Some things weren’t so easy. 😰
• Dual-channel connectivity
• synchronous register io from vcpu thread needs to avoid deadlocks
• vcpu vs. event thread isolation
• libevent(3) is not thread-safe (OpenBSD bundles v1.x)
• Debugging multi-process, async code is often challenging
• printf & ktrace can quickly generate lots of noise
• nanosleep(2) + gdb attaching directly to device process helps
Future Work & Research
Plans for the next hackathon (m2k23) 📋
• Finish lifecycle cleanup (vm send/receive)
• QCOW2 disk support — only supports raw images at the moment
• Begin merging changes into tree after 7.3 release
• vm fork+exec dance
• vioblk
• vionet
• Expand to other devices
• ns8250?
• Tighten exposure of guest physical memory
• guest aware drivers? (is there anything to gain by limiting how much of guest memory we remap?)
Thanks!
See you in Ottawa?

More Related Content

Similar to Hardening Emulated Devices in OpenBSD’s vmd(8) Hypervisor

Hardware support for efficient virtualization
Hardware support for efficient virtualizationHardware support for efficient virtualization
Hardware support for efficient virtualizationLennox Wu
 
Bridging the Semantic Gap in Virtualized Environment
Bridging the Semantic Gap in Virtualized EnvironmentBridging the Semantic Gap in Virtualized Environment
Bridging the Semantic Gap in Virtualized EnvironmentAndy Lee
 
5. IO virtualization
5. IO virtualization5. IO virtualization
5. IO virtualizationHwanju Kim
 
Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack eurobsdcon
 
Considerations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfConsiderations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfhik_lhz
 
BlueHat v18 || Straight outta v mware - modern exploitation of the svga devic...
BlueHat v18 || Straight outta v mware - modern exploitation of the svga devic...BlueHat v18 || Straight outta v mware - modern exploitation of the svga devic...
BlueHat v18 || Straight outta v mware - modern exploitation of the svga devic...BlueHat Security Conference
 
Intel DPDK Step by Step instructions
Intel DPDK Step by Step instructionsIntel DPDK Step by Step instructions
Intel DPDK Step by Step instructionsHisaki Ohara
 
Erlang Lightning Talk
Erlang Lightning TalkErlang Lightning Talk
Erlang Lightning TalkGiltTech
 
AVX512 assembly language in FFmpeg
AVX512 assembly language in FFmpegAVX512 assembly language in FFmpeg
AVX512 assembly language in FFmpegKieran Kunhya
 
Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)Hajime Tazaki
 
Rmll Virtualization As Is Tool 20090707 V1.0
Rmll Virtualization As Is Tool 20090707 V1.0Rmll Virtualization As Is Tool 20090707 V1.0
Rmll Virtualization As Is Tool 20090707 V1.0guest72e8c1
 
Practical Windows Kernel Exploitation
Practical Windows Kernel ExploitationPractical Windows Kernel Exploitation
Practical Windows Kernel ExploitationzeroSteiner
 
Virtual Distro Dispatcher - A light-weight Desktop-as-a-Service solution
Virtual Distro Dispatcher - A light-weight Desktop-as-a-Service solutionVirtual Distro Dispatcher - A light-weight Desktop-as-a-Service solution
Virtual Distro Dispatcher - A light-weight Desktop-as-a-Service solutionFlavio Bertini
 
Module_2_Slides.pdf
Module_2_Slides.pdfModule_2_Slides.pdf
Module_2_Slides.pdfgoldfer1
 
Metasploit & Windows Kernel Exploitation
Metasploit & Windows Kernel ExploitationMetasploit & Windows Kernel Exploitation
Metasploit & Windows Kernel ExploitationzeroSteiner
 

Similar to Hardening Emulated Devices in OpenBSD’s vmd(8) Hypervisor (20)

Hardware support for efficient virtualization
Hardware support for efficient virtualizationHardware support for efficient virtualization
Hardware support for efficient virtualization
 
Bridging the Semantic Gap in Virtualized Environment
Bridging the Semantic Gap in Virtualized EnvironmentBridging the Semantic Gap in Virtualized Environment
Bridging the Semantic Gap in Virtualized Environment
 
5. IO virtualization
5. IO virtualization5. IO virtualization
5. IO virtualization
 
Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack
 
Considerations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfConsiderations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmf
 
Genode Compositions
Genode CompositionsGenode Compositions
Genode Compositions
 
BlueHat v18 || Straight outta v mware - modern exploitation of the svga devic...
BlueHat v18 || Straight outta v mware - modern exploitation of the svga devic...BlueHat v18 || Straight outta v mware - modern exploitation of the svga devic...
BlueHat v18 || Straight outta v mware - modern exploitation of the svga devic...
 
Intel DPDK Step by Step instructions
Intel DPDK Step by Step instructionsIntel DPDK Step by Step instructions
Intel DPDK Step by Step instructions
 
Erlang Lightning Talk
Erlang Lightning TalkErlang Lightning Talk
Erlang Lightning Talk
 
AVX512 assembly language in FFmpeg
AVX512 assembly language in FFmpegAVX512 assembly language in FFmpeg
AVX512 assembly language in FFmpeg
 
17-virtualization.pptx
17-virtualization.pptx17-virtualization.pptx
17-virtualization.pptx
 
Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)
 
RMLL / LSM 2009
RMLL / LSM 2009RMLL / LSM 2009
RMLL / LSM 2009
 
Rmll Virtualization As Is Tool 20090707 V1.0
Rmll Virtualization As Is Tool 20090707 V1.0Rmll Virtualization As Is Tool 20090707 V1.0
Rmll Virtualization As Is Tool 20090707 V1.0
 
Practical Windows Kernel Exploitation
Practical Windows Kernel ExploitationPractical Windows Kernel Exploitation
Practical Windows Kernel Exploitation
 
Virtual Distro Dispatcher - A light-weight Desktop-as-a-Service solution
Virtual Distro Dispatcher - A light-weight Desktop-as-a-Service solutionVirtual Distro Dispatcher - A light-weight Desktop-as-a-Service solution
Virtual Distro Dispatcher - A light-weight Desktop-as-a-Service solution
 
Module_2_Slides.pdf
Module_2_Slides.pdfModule_2_Slides.pdf
Module_2_Slides.pdf
 
Metasploit & Windows Kernel Exploitation
Metasploit & Windows Kernel ExploitationMetasploit & Windows Kernel Exploitation
Metasploit & Windows Kernel Exploitation
 
ARM-KVM: Weather Report
ARM-KVM: Weather ReportARM-KVM: Weather Report
ARM-KVM: Weather Report
 
Podman rootless containers
Podman rootless containersPodman rootless containers
Podman rootless containers
 

Recently uploaded

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 

Recently uploaded (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 

Hardening Emulated Devices in OpenBSD’s vmd(8) Hypervisor

  • 1. Dave Voutila <dv@openbsd.org>, AsiaBSDCon 2023 Hardening Emulated Devices in OpenBsd’s vmd(8) Hypervisor fork& exec& fork& exec.
  • 2. Dave Voutila (dv@) Vermont 🍁, USA 🇺🇸 (40 mins from Québec 🇨🇦)
  • 3. Hypervisors as High Value Targets Why do you rob a bank? It’s where the money is. 💰 • Shift to multi-tenant public cloud means target rich environments • If it’s networked, it’s vulnerable. • Pop one ESXi instance, now you own many systems • “it’s ok, i’m running it in a vm”
  • 4. A Potpourri of CVE’s Some classic guest-to-host escapes • QEMU • CVE-2015-3456 — “VENOM” • aka the QEMU fl oppy disk one • CVE-2015-7504 — network device escape • VMWare • CVE-2020-3967 — EHCI heap over fl ow • OpenBSD • DHCP packet handler stack over fl ow (6.8, 6.9) CC BY-SA 4.0, Crowdstrike
  • 5. Emulated Devices are the Problem When in doubt, cut it out. ✂ • By de fi nition, always handling untrusted (guest) input • Need read/write to guest physical memory • Need read/write to host interfaces (network, fi les, etc.) • Emulating in kernel is asking for trouble, so commonly done in userland • In OpenBSD, only pvclock(4) is handled in the kernel (vmm(4))
  • 6. Multi-process QEMU First Type-2 open source hypervisor doing this? • Oracle started work in 2017, landed in QEMU December 2020 • Elena U fi mtseva, Jag Raman, John G. Johnson • https://lists.gnu.org/archive/html/qemu-devel/2020-12/msg00268.html • …but, who uses it? • I’d presume Oracle Cloud • ¯_(ツ)_/¯ 🤷 • Documentation is primarily about design, future ideas, & not present day usage. • Additional burden placed upon the poor administrators 😩
  • 7. vmm(4) / vmd(8) OpenBSD’s native hypervisor • Originally released with OpenBSD 5.9 by mlarkin@ & reyk@ • Currently amd64 only with support for both amd64 and i386 guests • Adopted privilege separation design • fork+exec —> chroot(2) & pledge(2) • drop from root to _vmd • Components • vmm(4) — in kernel monitor • vmd(8) — userland daemon • vmctl(8) — userland control utility
  • 8. vmd(8) gaps & weaknesses Room for improvement • vm process is fork(2)’d from the vmm process, without execv*(2) • Every vm gets same address space layout • Every vm has junk left over from vmm process (global state) • vm process isn’t chroot’ed & vmm process isn’t running as root • vm process has multiple pledge(2) promises beyond “stdio” • “recvfd” — vm send/recv • “vmm” — vmm(4) ioctl for vcpu, interrupts, registers r/w
  • 9. Existing Synthetic Mitigations A house is only as good as its foundation. • ASLR — force attackers to need info leaks (3.4) • RETGUARD — control fl ow integrity (6.4 for amd64?) • pledge(2) — minimize syscalls available to attackers (5.9) • unveil(2) — hide parts of the fi le system without root (6.4) • In -current and landing as part OpenBSD 7.3: • mimmutable(2) — prevent changing page permissions or mappings • pinsyscall(2) — minimize allowed call point for syscalls like execve • execute-only — enforce execute-only on .text to prevent ROP (amd64 via kernel PK
  • 10. Step 1: fork+exec for each vm Minimizing info leaks across vm’s • Switch vmm process from just fork(2) to: fork(2) + execvp(2) • Easy wins 🙌 • Reuse socketpair for IPC between vmm proc and vm • Simply pass the fd number for the channel in argv • Headaches 😖 • vmm process needs absolute path to vmd executable • vm process can’t rely on existing global state for con fi guration
  • 11. Step 2: Isolate the VirtIO Device “Breaking up is hard to do.” — the Oracle Blog post • vmd uses multiple vm_mem_ranges (bios/reserved, mmio, regular) • The approach: • shm_mkstemp(3) — create temporary fi le for mapping shared memory • ftruncate(2) & shm_unlink(3) — size and remove the temp fi le • fcntl(2) — set the fd to not close on exec • mmap(2) guest memory ranges MAP_SHARED | MAP_CONCEAL • Pass the fd number after exec & re-mmap vm_mem_ranges
  • 12. Step 3: Wiring up RPC Here’s a dime…call someone who cares. • Sync Channel • Bootstrapping device con fi g post-exec • Virtio PCI register reads/writes • Async Channel • Lifecycle events (vm pause/resume, shutdown) • Assert/Deassert IRQ • Set host MAC (vionet)
  • 13. Putting it all Together Sorry for my artwork 🧑🎨
  • 14. High-level Message Flow Sorry for my artwork 🧑🎨 1. Guest fi lls bu ff ers, updates virtqueues, etc. 2. Guest writes to Device register via IO instructions 3. Device is noti fi ed it can process data. Writes it to fd. 4. Device kicks guest via vcpu interrupt to notify bu ff ers are processed
  • 15. Security! But at what cost? What about the user/admin experience? Does it change? • OpenBSD 7.2 • # vmd -d • # vmctl start -Lc -d disk.raw -m 8g guest • Prototype • # vmd -d • # vmctl start -Lc -d disk.raw -m 8g guest
  • 16. Security! But at what cost? What about the user/admin experience? Does it change? They’re the same commands.
  • 17. Security! But at what Cost? vmd(8) has room for performance improvement • zero-copy virtio added only recently • single emulated device thread handles all async io • virtio PCI devices — network, disk, cdrom • ns8250 serial device • i8253 programmable interrupt timer • Mutexes guard device state from competing vcpu & event threads
  • 18. Quick & Dirty Benchmarking This is not a benchmarking talk 🙅 • Lenovo X1 Carbon (10th gen) • 12th gen Intel i7-1270P @ 2.2 GHz • 32 GiB RAM • 1 TB NVME disk • Guest Operating Systems • OpenBSD 7.2 • Alpine Linux 3.7 (kernel vTKTKT)
  • 19. vioblk(4) benchmark fi o(1) [ports] performing 8 KiB random writes to 2GiB fi le for 5 minutes • Very little di ff erence in throughput • Very little di ff erence in latency • Long-tail slightly worse on Alpine?? Host Version Guest Version Throughput MiB/s clat avg (usec) clat stdev (usec) 99.90th % (usec) 99.95% (usec) OpenBSD- current OpenBSD- current 90.3 89.9 8730 338 379 Prototype OpenBSD- current 100 76.4 7220 388 429 OpenBSD- current Alpine Linux 3.17 131 11.2 534 32 38 Prototype Alpine Linux 3.17 132 11.7 682 594 685
  • 20. vio(4) benchmark iperf3(1) [ports] with 1 thread run for 30 seconds, alternating TX/RX • iperf(3) used with 1 thread in alternating client / server modes • Observations • More consistent throughput in OpenBSD guests • Negligible performance decrease for Linux guests (not sure why) Host Version Guest Version Receiving Bitrate (Mbit/s) % Sending Bitrate (MBit/s) % -current OpenBSD 7.2 0.86 — 1.26 — prototype OpenBSD 7.2 1.22 63% 1.35 7% -current Alpine Linux 3.17 1.26 — 4.26 — prototype Alpine Linux 3.17 1.30 5% 3.19 -25%
  • 21. Headaches! Some things weren’t so easy. 😰 • Dual-channel connectivity • synchronous register io from vcpu thread needs to avoid deadlocks • vcpu vs. event thread isolation • libevent(3) is not thread-safe (OpenBSD bundles v1.x) • Debugging multi-process, async code is often challenging • printf & ktrace can quickly generate lots of noise • nanosleep(2) + gdb attaching directly to device process helps
  • 22. Future Work & Research Plans for the next hackathon (m2k23) 📋 • Finish lifecycle cleanup (vm send/receive) • QCOW2 disk support — only supports raw images at the moment • Begin merging changes into tree after 7.3 release • vm fork+exec dance • vioblk • vionet • Expand to other devices • ns8250? • Tighten exposure of guest physical memory • guest aware drivers? (is there anything to gain by limiting how much of guest memory we remap?)