Presentation for #illumos day at #surgecon, 2014. Video can be found at https://www.youtube.com/watch?v=TrfD3pC0VSs Source code is at https://github.com/joyent/illumos-joyent
The dream is alive! Running Linux containers on an illumos kernel
1. The dream is alive!
Running Linux containers
on an illumos kernel
Bryan Cantrill
CTO
bryan@joyent.com
@bcantrill
2. OS emulation: An old idea
• Operating systems have long employed system call
emulation to allow binaries from one operating system to
run on another on the same instruction set architecture
• Combines the binary footprint of the emulated system
with the operational advantages of the emulating system
• Sun first did this with SunOS 4.x binaries on Solaris 2.x
• With Solaris x86, it became possible to run binaries
targeted for Linux via SCO’s (open source) “lxrun”
• Packaging innovation in Linux in early 2000s + deeply
differentiated technologies in Solaris 10 (e.g. ZFS,
DTrace, zones) made Linux emulation more attractive
3. Rise of zones
• While Linux emulation became more important, it also
became more complicated: programs had grown beyond
single-process binaries
• Clear that “lxrun” would only work for applications, not
systems — needed a deeper solution
• Fortunately, coincided with the rise of operating system
virtualization embodied by zones
• Idea: introduce notion of a branded zone whereby an
entire foreign system (a brand) could be emulated within
the confines of a zone
4. BrandZ: LX-branded zones
• In 2006, team at Sun that included Nils Nieuwejaar and
Russ Blaine integrated BrandZ, a Linux branded zone
(PSARC 2005/471)
• Support was a user/kernel hybrid: lx system calls
bounced back to a user-level emulation library that
depended on some in-kernel emulation (e.g. futexes)
• Support was for RHEL 3 (!): glibc 2.3.2 + Linux 2.4
• Remarkable amount of work was done to handle device
pathing, signal handling, /proc — and arcana like TTY
ioctls, ptrace, etc.
• Worked for a surprising number of binaries!
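The core idea of system call emulation can be sketched as a dispatch table keyed by the foreign system's call numbers. The following is only a user-level toy under stated assumptions: the handler names (`emul_getpid`, `emul_write`) and the `emulate_syscall` function are hypothetical and are not taken from the lx brand source. The call numbers are the Linux i386 values for write (4) and getpid (20).

```python
import errno
import os

# Hypothetical handlers: each one emulates a single Linux system call
# using facilities of the host system. Names are illustrative only.
def emul_getpid():
    return os.getpid()

def emul_write(fd, data):
    return os.write(fd, data)

# Linux i386 system call numbers: write = 4, getpid = 20.
SYSCALL_TABLE = {
    4: emul_write,
    20: emul_getpid,
}

def emulate_syscall(number, *args):
    """Dispatch a Linux system call number to its emulation routine.

    Follows the Linux kernel convention of returning a negative errno
    value on failure; calls with no handler yield -ENOSYS.
    """
    handler = SYSCALL_TABLE.get(number)
    if handler is None:
        return -errno.ENOSYS
    try:
        return handler(*args)
    except OSError as e:
        return -e.errno
```

For example, `emulate_syscall(20)` returns the caller's process ID, while an unknown number returns `-errno.ENOSYS` rather than raising.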
5. What was missing?
• Support was only for 2.4 kernels
• Support for 2.6 required adding new, Linux-only
mechanisms that had native analogues (e.g., epoll)
• Only 32-bit was supported
• XVM (the Xen-on-Solaris effort inside of Sun) had much
more managerial support and was thought to be a “more
supportable” solution
6. The decline of the lx brand
After cresting in 2007, contributions to lx dwindled:
[Chart: pushes to usr/src/lib/brand/lx, 2006 to 2010]
7. Clinically dead
The lx brand was removed on June 11, 2010...
[Chart: pushes to usr/src/lib/brand/lx, 2006 to 2013]
8. The organ donation years
• Joyent customers asked for SmartOS to support htop, a
colorful Linux program for system process monitoring
• htop is very, very specific to Linux /proc — and porting it
to use illumos /proc seemed arduous and pointless…
• ...but a relatively complete Linux /proc had integrated
with the LX brand!
• In April 2012, the /proc portion of the LX brand was
extracted, cleaned up, and separately integrated
• Mounted at /system/lxproc in SmartOS zones; htop
modified to look for this path on illumos
9. Exhumed!
• In January 2014, David Mackay, an illumos community
member, announced that he was able to resurrect the lx
brand — and that it appeared to work!
Linked below is a webrev which restores LX branded zones
support to Illumos:
http://cr.illumos.org/~webrev/DavidJX8P/lx-zones-restoration/
I have been running OpenIndiana, using it daily on my
workstation for over a month with the above webrev applied to
the illumos-gate and built by myself.
It would definitely raise interest in Illumos. Indeed, I have
seen many people who are extremely interested in LX zones.
The LX zones code is minimally invasive on Illumos itself, and
is mostly segregated out.
I hope you find this of interest.
10. Could it be revived?
• David’s work inspired us to rethink LX-branded zones...
• It seemed that the reasons for the discontinuation of LX
brand support might not still be valid...
• ...and it seemed that the engineering challenges might
not be as structurally daunting
11. Has Linux made it easier?
• Linux is moving much more slowly: pace of development
of new user-visible kernel abstraction has slowed
• Torvalds discovered religion on ABI compatibility
• The need to run on older kernels has dissuaded
software from using the more obscure Linux-isms
• The glibc/kernel disconnect means that glibc (and apps!)
must reasonably be able to process ENOSYS
• Easier support model: the rise of the cloud has replaced
shrink-wrapped software with open source + SaaS
• Server focus: Mac OS X gave us Unix — and relegated
“Linux on the desktop” to “Duke Nukem Forever” status
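The ENOSYS point above matters to an emulator because software that already copes with a missing system call will degrade gracefully when the emulation has not implemented it yet. A minimal sketch of that application-side pattern follows; `call_with_fallback`, `newer_call`, and `older_call` are hypothetical stand-ins, not real glibc or kernel interfaces.

```python
import errno

def call_with_fallback(preferred, fallback, *args):
    """Try the newer interface; fall back only if the kernel lacks it."""
    try:
        return preferred(*args)
    except OSError as e:
        if e.errno != errno.ENOSYS:
            raise  # a real failure, not a missing system call
        return fallback(*args)

# Stand-in for a wrapper around a system call the running kernel
# (or emulation layer) does not implement.
def newer_call(x):
    raise OSError(errno.ENOSYS, "Function not implemented")

# Stand-in for the older, universally available equivalent.
def older_call(x):
    return x * 2
```

Only ENOSYS triggers the fallback; any other error is re-raised so that genuine failures are not masked.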
12. Have motivations changed?
• Originally, LX branded zones were about bringing Linux
applications into established Solaris environments for
purposes of hardware consolidation
• Port of KVM to illumos circa 2011 solved this problem
• ...but KVM has unresolvable performance and resource
limitations, and Linux on KVM only gets indirect benefit
from ZFS, DTrace and zones
• At the same time, enthusiasm for containers and OS-based
virtualization has blossomed (ht: Docker)
• There seems to be desire for a best-of-all worlds system
that combines Linux strengths (binary footprint) with
illumos technical differentiators (ZFS, zones, DTrace)
13. Reviving LX-branded zones
• Encouraged that the body might not have decomposed,
Joyent engineer Jerry Jelinek exhumed the LX brand
and reintegrated it into SmartOS on March 20, 2014
• Guiding principles:
• Do it all in the open
• Do it all on SmartOS master (illumos-joyent)
• Add base illumos facilities wherever possible
• Aim to upstream to illumos when we’re done
• Thanks to Jerry grinding out many, many LX bug fixes,
got Ubuntu 10.04 booting in April, Ubuntu 12.04 booting
in May and Ubuntu 14.04 booting in July
14. IT’S ALIVE!
Contributions to the lx brand since March:
[Chart: pushes to usr/src/lib/brand/lx, 2006 to 2014]
15. So what have we done?
• Fixed a ton of bugs (ht: LTP)
• Added native epoll(5) — though not in terms of event
ports but rather in terms of poll(7D)
• Added exclusive IP stacks for LX-branded zones
• Added support for netlink (RFC 3549) — but restricted
that support to the lx brand
• Added support for thunk-less native binaries within an
LX branded zone
• Added native inotify(5)
• Added initial 64-bit support
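To illustrate what it means to express an epoll-shaped interface in terms of a poll-style primitive, here is a user-level toy. It is not the illumos implementation: the real work lives in the kernel and is built on poll(7D), whereas this sketch uses Python's `select.poll` (poll(2)) as a stand-in. The class name `PollBackedEpoll` is hypothetical, and epoll features such as edge-triggering are not modelled.

```python
import errno
import os
import select

class PollBackedEpoll:
    """Toy epoll-like object layered on select.poll()."""

    def __init__(self):
        self._poller = select.poll()
        self._registered = set()

    def register(self, fd, eventmask):
        # Like epoll_ctl(EPOLL_CTL_ADD): adding an fd twice is EEXIST.
        if fd in self._registered:
            raise OSError(errno.EEXIST, os.strerror(errno.EEXIST))
        self._poller.register(fd, eventmask)
        self._registered.add(fd)

    def modify(self, fd, eventmask):
        # Like epoll_ctl(EPOLL_CTL_MOD): unknown fd is ENOENT.
        if fd not in self._registered:
            raise OSError(errno.ENOENT, os.strerror(errno.ENOENT))
        self._poller.modify(fd, eventmask)

    def unregister(self, fd):
        # Like epoll_ctl(EPOLL_CTL_DEL): unknown fd is ENOENT.
        if fd not in self._registered:
            raise OSError(errno.ENOENT, os.strerror(errno.ENOENT))
        self._poller.unregister(fd)
        self._registered.discard(fd)

    def wait(self, timeout_ms=None, maxevents=None):
        """Return a list of (fd, events) pairs, at most maxevents long."""
        ready = self._poller.poll(timeout_ms)
        return ready if maxevents is None else ready[:maxevents]
```

A caller registers a descriptor with `select.POLLIN` and then calls `wait()`, much as a Linux program would use `epoll_ctl` followed by `epoll_wait`.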
16. What is left to do?
• vsyscall support (needed for 64-bit)
• Anything else for 64-bit
• Stack switching (needed for Go)
• Multi-threaded ptrace support
• Lots of using it and figuring out what breaks!
17. How can you get involved?
• SmartOS contains latest-and-greatest bits; first step is to
get SmartOS running
• We have a 32-bit Ubuntu 14.04 image that can be used
to create a zone via vmadm:
b7493690-f019-4612-958b-bab5f844283e
• Will need to configure a VM with “kernel-version” set to
3.13.0 and “brand” to “lx” in the vmadm JSON payload
• If you find that something is broken, create an issue on
the illumos-joyent GitHub repo
• Once 64-bit is working, we will be very actively seeking
community engagement; stay tuned!
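As a rough sketch of the payload described above, the following builds a minimal JSON document with the brand, kernel version, and image UUID from the slide. The key spellings `alias`, `image_uuid`, and `kernel_version` are assumptions based on vmadm conventions (the slide writes the last one as “kernel-version”), and a real zone will likely need further properties such as memory, quota, and network settings that are omitted here. The helper name `lx_payload` is hypothetical.

```python
import json

# Image UUID quoted on the slide (32-bit Ubuntu 14.04).
UBUNTU_1404_IMAGE = "b7493690-f019-4612-958b-bab5f844283e"

def lx_payload(alias, image_uuid=UBUNTU_1404_IMAGE, kernel_version="3.13.0"):
    """Build a minimal payload dict for an LX-branded zone.

    Key names are assumptions; check vmadm(1M) on your SmartOS build.
    """
    return {
        "alias": alias,                    # assumed key: human-readable name
        "brand": "lx",                     # from the slide
        "kernel_version": kernel_version,  # slide spells it "kernel-version"
        "image_uuid": image_uuid,          # assumed key for the image UUID
    }

if __name__ == "__main__":
    print(json.dumps(lx_payload("lx-test"), indent=2))
```

The printed JSON could then be saved to a file and handed to vmadm when creating the zone; consult the vmadm documentation for the exact invocation and the full set of required properties.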
18. Thanks!
• The original BrandZ team at Sun for a remarkable
amount of work: Nils Nieuwejaar and Russ Blaine
• The illumos community — especially David Mackay! —
for inspiring the revival
• Jerry Jelinek for leading the charge — and doing the
vast majority of the work!
• @rmustacc for thunk-less native binary support
• @jmclulow for stack switching
• @djhoffma for his work on ptrace
• @joshwilsdon for vmadm support for LX brands