Xen currently has two major mechanisms for keeping hosts secure while running untrusted VMs without disrupting those guests: live patching and live migration. We introduce a third: live-updating Xen. A live-update operation loads the newly staged hypervisor into RAM, has the currently running Xen serialize its state, and then transfers control to the newly staged Xen, all without disrupting running instances beyond a brief pause while neither hypervisor is running guest vCPUs.
We present a proposal on the design of such a feature, and invite comments and feedback.
2. Live Update
• Update the running hypervisor with a new build
• Gracefully transfer running guests to the new Xen
• Guests may only notice a small pause
3. Why Do This
• AWS operates a large fleet of hosts
– Not much opportunity to reboot
● Long-running guests
– Operationally, we need to be ready to fix customer pain
• Roll out fixes
– Bug fixes
– Security fixes
• Bring new features
• Maintenance
– Reduce number of hypervisor versions needing support
• Development
– Reduce development time through faster testing and prototyping
4. Existing Techniques
• Live Patching
– Works well, operationally proven
– Requires backporting to multiple supported hypervisor versions
– Effort required increases with patch complexity
– Recurring work for each livepatch
• Live Migration
– Guest workload-dependent
– Not applicable for all device models in use
5. Live Update
• Currently restricting to minor version updates
– e.g. 4.11.1 → 4.11.2
• Considering just hypervisor updates
– Dom0, userspace, etc., out of scope for now
• Most of this talk is a request for comments
– The general idea and design are presented here
– Prototyping of these ideas started recently
– Design is deliberately fluid to incorporate feedback and various use cases
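The minor-version restriction can be expressed as a small eligibility check. This is a minimal sketch only: the function names are hypothetical, and treating only a higher point release within the same major.minor series as valid is an assumption, not something the proposal specifies.

```python
def parse_version(version):
    """Split a 'major.minor.patch' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_minor_update(running, target):
    """True if target is a newer point release of the same major.minor series."""
    r_major, r_minor, r_patch = parse_version(running)
    t_major, t_minor, t_patch = parse_version(target)
    # Assumption: same series, strictly newer patch level.
    return (r_major, r_minor) == (t_major, t_minor) and t_patch > r_patch
```

With this check, 4.11.1 → 4.11.2 qualifies, while 4.11.2 → 4.12.0 does not.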
6. Terminology
• Running Xen
– Current hypervisor on a host
– The “source” in the live-update operation
• Target Xen
– New build of hypervisor
– The “target” in the live-update operation
7. General Idea
• Load Target Xen in memory
• Initiate Live-Update
– Pause all domains
– Mask interrupts for domains
– Serialize domain states
– Serialize Running Xen state
– Jump to Target Xen
• Target Xen takes over
– Deserialize state
– Unpause domains
– Unmask interrupts
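The ordering of these steps can be illustrated with a toy model. This is a sketch under stated assumptions, not Xen code: the `Domain` class and both functions are hypothetical, the serialized state is a plain dictionary, and the jump itself is modelled as simply returning the state to the caller. It also records which domains were paused beforehand, since already-paused domains should remain paused after the update.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    paused: bool = False
    irqs_masked: bool = False


def running_xen_prepare(domains):
    """Running Xen side: quiesce every domain, then capture state for handover."""
    # Remember which domains were already paused so they stay paused afterwards.
    was_paused = {dom.name: dom.paused for dom in domains}
    for dom in domains:          # 1. pause all domains
        dom.paused = True
    for dom in domains:          # 2. mask interrupts for domains
        dom.irqs_masked = True
    # 3./4. serialize domain state and Running Xen state (placeholders here)
    state = {"domains": was_paused, "xen": {}}
    return state                 # 5. jump to Target Xen, handing over `state`


def target_xen_resume(state, domains):
    """Target Xen side: restore state, unpause, then unmask."""
    was_paused = state["domains"]          # 1. deserialize state
    for dom in domains:                    # 2. unpause domains that were running
        dom.paused = was_paused[dom.name]
    for dom in domains:                    # 3. unmask interrupts
        dom.irqs_masked = False
```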
9. Load Target Xen in Memory
• crashkernel area and kexec
• Load new Xen binary in crashkernel region
– kexec -l
• Currently, loading requires:
– $ zcat /boot/xen-4.12.gz > xen-4.12
– $ echo -en '\x3' | dd of=xen-4.12 bs=1 seek=16 conv=notrunc
– $ kexec -l xen-4.12 --append="..." --module "/boot/vmlinuz ..." --module /boot/initramfs -d --mem-min=0x2000000
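The `dd` step overwrites a single byte at offset 16 of the uncompressed image without truncating the file. In an ELF header the 16-byte `e_ident` array is followed by `e_type`, so this appears to patch the low byte of the file type field. The same in-place edit can be sketched as follows; the helper name is hypothetical, and reading the written value as 0x03 is an assumption.

```python
def patch_byte(path, offset, value):
    """Overwrite one byte in place, leaving the file length unchanged."""
    # "r+b" opens for update without truncating, like dd's conv=notrunc.
    with open(path, "r+b") as image:
        image.seek(offset)
        image.write(bytes([value]))
```

For example, `patch_byte("xen-4.12", 16, 0x03)` mirrors the `dd` invocation.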
11. Solutions to Challenges in Load
• Patches merged for kexec-tools v2.0.20
– Adds multiboot2 support
– Gets us relocation support – can now load in crashkernel area
– Don’t use lowmem areas
– Can directly use the ELF binary
– From Varad Gautam
● “[PATCH 1/2] elf: Support ELF loading with relocation”
● “[PATCH 2/2] x86: Support multiboot2 images”
13. Jump to Target Xen
• Needs a new hypercall; `kexec -e` alone is not sufficient
– The jump must be atomic with pausing dom0
• Do not drop to Real Mode
– Start in protected mode (or, later, even long mode)
– Stop using real-mode low memory
– Patches on the list from David
● “[RFC PATCH 0/7] Clean up x86_64 boot code”
• Skip startup
14. Consume State in Target Xen
• Two ways to transfer state
– Pointer to memory region via kexec command line
– Multiboot module with state
• Deserialize state from Running Xen
– Xen state; domain state
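Whichever transfer route is chosen, Target Xen needs a way to recognise and validate the state it is handed. One possible shape is a small header followed by a payload. This is a sketch only: the magic value, header layout, and JSON payload are all hypothetical and stand in for whatever format the design settles on.

```python
import json
import struct

MAGIC = b"LUPD"                  # hypothetical marker, not a real Xen constant
HEADER = struct.Struct("<4sHI")  # magic, format version, payload length


def serialize_state(xen_state, domain_states, version=1):
    """Running Xen side: pack Xen and domain state into one contiguous blob."""
    payload = json.dumps({"xen": xen_state, "domains": domain_states}).encode()
    return HEADER.pack(MAGIC, version, len(payload)) + payload


def deserialize_state(blob, expected_version=1):
    """Target Xen side: validate the header, then recover both kinds of state."""
    magic, version, length = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("not a live-update state blob")
    if version != expected_version:
        raise ValueError(f"unsupported state format version {version}")
    payload = blob[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated state blob")
    state = json.loads(payload)
    return state["xen"], state["domains"]
```

An explicit format version gives Target Xen a clean way to refuse state it does not understand, which matters once internal structures start to change between releases.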
16. Persisting Guest State
• We have Live Migration
– For minor version upgrades, state changes not expected
– Just slightly different from LM: migration across time, not space
• Persist memory
• Persist domain structures
• Collect state information
– domheap, page tables, start_info, shared_info_frame
17. Persisting Host State
• IOMMU state
– Mask interrupts
– DMA requests continue as normal
• Memory regions
– Xen memory, domain memory spread out
– Must ensure these areas are not overwritten
● And carefully relocate Target Xen later
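The "do not overwrite" requirement comes down to a range-overlap test between where Target Xen would be placed and every region that must survive the update. A minimal sketch, assuming regions are half-open `(start, end)` address pairs; the function names are hypothetical.

```python
def overlaps(a, b):
    """Half-open ranges [start, end) overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def conflicting_regions(load_range, preserved):
    """Return the preserved regions that the Target Xen image would overwrite."""
    return [region for region in preserved if overlaps(load_range, region)]
```

An empty result means the candidate placement is safe with respect to the listed regions.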
18. Prototyping in Persisting Guest State
• Ongoing work for a PV guest
• Modified `xl save` workflow to start serialization
– Skip memory scrubbing
– Allow domain destruction
– Store pointers in well-known location
• Launch new domain
– Re-use state information from previously-destroyed domain
– See if guest continues running
• Later
– Extend this to Dom0
– HVM domains
– Across kexec
19. Things to be Aware of (1/2)
• Pause time
– Guests should barely notice this activity
– A decent estimate could be “network connections don’t time out”
● 3 TCP RTT
– About 1-2 seconds is acceptable to begin with
– Leaving memory pages in RAM, not initializing IOMMU, skipping startup – all help
• Interrupts could get lost
– May have to find a way to queue them and reinject
• Domain states
– Already-paused domains should remain paused
• Ordering of pausing/masking activities during setup phase
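One way to avoid losing interrupts during the pause is to hold them and replay them in arrival order once guests are running again. The following is a toy sketch of that queue-and-reinject idea, not a description of how Xen handles interrupts: the class, its methods, and the `deliver` callback are all hypothetical.

```python
from collections import deque


class PendingInterrupts:
    """Hold interrupts that arrive while guests are paused, then replay them in order."""

    def __init__(self):
        self._queue = deque()
        self.blackout = False

    def raise_irq(self, domain, vector, deliver):
        """Deliver immediately, or queue if the update window is open."""
        if self.blackout:
            self._queue.append((domain, vector))
        else:
            deliver(domain, vector)

    def reinject(self, deliver):
        """Close the update window and replay queued interrupts, oldest first."""
        self.blackout = False
        while self._queue:
            deliver(*self._queue.popleft())
```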
20. Things to be Aware of (2/2)
• Host Time: Target Xen re-initializes RTC
– This can be off compared to Running Xen
• Guest Timekeeping
– pvclock sync
• Internal state / struct changes
– Handling major version updates
– Can also sneak in for security fixes
– Thoughts for the future
● Static annotation in source code / compile-time warnings
• Controlling capabilities per domain
– Currently, spread out: xen cmdline, global config, domain config, compile-time
– Control feature advertisements at launch based on Running Xen capabilities
21. More Information
• Discussions ongoing on IRC and devel list
• Sending out RFC patches as we write them
• Design session
• Wiki page
– https://wiki.xen.org/wiki/Live-Updating_Xen
– Links to WIP trees
– JIRA board
– General status information