The dream is alive! Running Linux containers on an illumos kernel

22,921 views
22,151 views

Published on

Presentation for #illumos day at #surgecon, 2014. Video can be found at https://www.youtube.com/watch?v=TrfD3pC0VSs Source code is at https://github.com/joyent/illumos-joyent

Published in: Software
3 Comments
23 Likes
Statistics
Notes
  • Thanks Bryan.
    FYI, the Janus team, based in Watford UK, were Paul Winder, Ann Marie Westgate, Richard Cohen, Clive Standbridge, Joanna Farley and Andrew Stitcher. This team developed and implemented the guts of the Linux Emulation code in Solaris 10 x86 which was later continued by the BrandX team. I remember demonstrating Janus in Menlo Park. The demonstration showed a game running on Windows Xp running under Wine, running under Red Hat running under Solaris 10 x86. It performed flawlessly and remarkably well given the many layers involved. My recollection back in 2004 was that given how well it ran, and how well it was implemented (with little contamination to the Solaris kernel), the ARCs were concerned that it would confuse the Sun user base. There were concerns that Janus (Linux) would influence the user base to move to Linux having long term effects to the Sun Solaris marketplace. Given the way it ran alongside Solaris, there was also concern that the user would become confused which commands to use, the Linux ones, the Solaris ones or both. There was also concern about where applications would be installed (or updated) possibly contaminating the existing Solaris base. Thus was the impetus for Zoning the Linux environment....and it is my understanding that this was the impetus for BrandX.
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Berny,

    No slight intended; it's more that my memories of Janus were really murky and my slides were made quite quickly. ;) I'm sure I'll be giving this presentation again and in much more depth, so I'll be sure to get the Janus history in there -- thanks for the reference to the PSARC case!
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Hmmm. interesting indeed. But very surprised that Brian didn't reference Sun's project Janus (PSARC 2003/445) nor its team.... the original linux emulation development team blended into BrandZ including /proc, Linux system call emulation, path management and zoned Linux running on Solaris 10 x86, not to mention ltrace and strace working under Red Hat emulation. But anyway, great to to hear its been revived.
    Berny Goodheart
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Views
Total views
22,921
On SlideShare
0
From Embeds
0
Number of Embeds
1,677
Actions
Shares
0
Downloads
143
Comments
3
Likes
23
Embeds 0
No embeds

No notes for slide

The dream is alive! Running Linux containers on an illumos kernel

  1. 1. The dream is alive! Running Linux containers on an illumos kernel Bryan Cantrill CTO bryan@joyent.com @bcantrill
  2. 2. OS emulation: An old idea • Operating systems have long employed system call emulation to allow binaries from one operating system run on another on the same instruction set architecture • Combines the binary footprint of the emulated system with the operational advantages of the emulating system • Sun first did this with SunOS 4.x binaries on Solaris 2.x • With Solaris x86, it became possible to run binaries targeted for Linux via SCO’s (open source) “lxrun” • Packaging innovation in Linux in early 2000s + deeply differentiated technologies in Solaris 10 (e.g. ZFS, DTrace, zones) made Linux emulation more attractive
  3. 3. Rise of zones • While more important, the problem also became more complicated: programs became more complicated than single-process binaries • Clear that “lxrun” would only work for applications, not systems — needed a deeper solution • Fortunately, coincided with the rise of operating system virtualization embodied by zones • Idea: introduce notion of a branded zone whereby an entire foreign system (a brand) could be emulated within the confines of a zone
  4. 4. BrandZ: LX-branded zones • In 2006, team at Sun that included Nils Nieuwejaar and Russ Blaine integrated BrandZ, a Linux branded zone (PSARC 2005/471) • Support was a user/kernel hybrid: lx system calls bounced back to a user-level emulation library that depended on some in-kernel emulation (e.g. futexes) • Support was for RHEL 3 (!): glibc 2.3.2 + Linux 2.4 • Remarkable amount of work was done to handle device pathing, signal handling, /proc — and arcana like TTY ioctls, ptrace, etc. • Worked for a surprising number of binaries!
  5. 5. What was missing? • Support was only for 2.4 kernels • Support for 2.6 required adding new, Linux-only mechanisms that had native analogues (e.g., epoll) • Only 32-bit was supported • XVM (the Xen-on-Solaris effort inside of Sun) had much more managerial support and was thought to be a “more supportable” solution
  6. 6. The decline of the lx brand After cresting in 2007, contributions to lx dwindled: 30 20 10 0 2006 2007 2008 2009 2010 Pushes to usr/src/lib/brand/lx
  7. 7. Clinically dead The lx brand was removed on June 11, 2010... 30 20 10 0 2006 2007 2008 2009 2010 2011 2012 2013 Pushes to usr/src/lib/brand/lx
  8. 8. The organ donation years • Joyent customers asked for SmartOS to support htop, a colorful Linux program for system process monitoring • htop is very, very specific to Linux /proc — and porting it to use illumos /proc seemed arduous and pointless… • ...but a relatively complete Linux /proc had integrated with the LX brand! • In April 2012, the /proc portion of the LX brand was extracted, cleaned up, and separately integrated • Mounted at /system/lxproc in SmartOS zones; htop modified to look for this path on illumos
  9. 9. Exhumed! • In January 2014, David Mackay, an illumos community member, announced that he was able to resurrect the lx brand —and that it appeared to work! Linked below is a webrev which restores LX branded zones support to Illumos: http://cr.illumos.org/~webrev/DavidJX8P/lx-zones-restoration/ I have been running OpenIndiana, using it daily on my workstation for over a month with the above webrev applied to the illumos-gate and built by myself. It would definitely raise interest in Illumos. Indeed, I have seen many people who are extremely interested in LX zones. The LX zones code is minimally invasive on Illumos itself, and is mostly segregated out. I hope you find this of interest.
  10. 10. Could it be revived? • David’s work inspired us to rethink LX-branded zones... • It seemed that the reasons for the discontinuation of LX brand support might not still be valid... • ...and it seemed that the engineering challenges might not be as structurally daunting
  11. 11. Has Linux made it easier? • Linux is moving much more slowly: pace of development of new user-visible kernel abstraction has slowed • Torvalds discovered religion on ABI compatibility • The need to run on older kernels has dissuaded software from using the more obscure Linux-isms • The glibc/kernel disconnect means that glibc (and apps!) must reasonably be able to process ENOSYS • Easier support model: the rise of the cloud has replaced shrink-wrapped software with open source + SaaS • Server focus: Mac OS X gave us Unix — and relegated “Linux on the desktop” to “Duke Nukem Forever” status
  12. 12. Have motivations changed? • Originally, LX branded zones were about bringing Linux applications into established Solaris environments for purposes of hardware consolidation • Port of KVM to illumos circa 2011 solved this problem • ...but KVM has unresolvable performance and resource limitations, and Linux on KVM only gets indirect benefit from ZFS, DTrace and zones • At the same time, enthusiasm for containers and OS-based virtualization have blossomed (ht: Docker) • There seems to be desire for a best-of-all worlds system that combines Linux strengths (binary footprint) with illumos technical differentiators (ZFS, zones, DTrace)
  13. 13. Reviving LX-branded zones • Encouraged that the body might not have decomposed, Joyent engineer Jerry Jelinek exhumed the LX brand and reintegrated it into SmartOS on March 20, 2014 • Guiding principles: • Do it all in the open • Do it all on SmartOS master (illumos-joyent) • Add base illumos facilities wherever possible • Aim to upstream to illumos when we’re done • Thanks to Jerry grinding out many, many LX bug fixes, got Ubuntu 10.04 booting in April, Ubuntu 12.04 booting in May and Ubuntu 14.04 booting in July
  14. 14. IT’S ALIVE! Contributions to the lx brand since March: 100 lx brand/75 lib/src/50 usr/to Pushes 25 0 2006 2007 2008 2009 2010 2011 2012 2013 2014
  15. 15. So what have we done? • Fixed a ton of bugs (ht: LTP) • Added native epoll(5) — though not in terms of event ports but rather in terms of poll(7D) • Added exclusive IP stacks for LX-branded zones • Added support for netlink (RFC 3549) — but restricted that support to the lx brand • Added support for thunk-less native binaries within an LX branded zone • Added native inotify(5) • Added initial 64-bit support
  16. 16. What is left to do? • vsyscall support (needed for 64-bit) • Anything else for 64-bit • Stack switching (needed for Go) • Multi-threaded ptrace support • Lots of using it and figuring out what breaks!
  17. 17. How can you get involved? • SmartOS contains latest-and-greatest bits; first step is to get SmartOS running • We have a 32-bit Ubuntu 14.04 image that can be used to create a zone via vmadm: b7493690-f019-4612-958b-bab5f844283e • Will need to configure a VM with “kernel-version” set to 3.13.0 and “brand” to “lx” in the vmadm JSON payload • If you find that something is boken, create an issue on the illumos-joyent github repo • Once 64-bit is working, we will be very actively seeking community engagement; stay tuned!
  18. 18. Thanks! • The original BrandZ team at Sun for a remarkable amount of work: Nils Nieuwejaar and Russ Blaine • The illumos community — especially David Mackay! — for inspiring the revival • Jerry Jelinek for leading the charge — and doing the vast majority of the work! • @rmustacc for thunk-less native binary support • @jmclulow for stack switching • @djhoffma for his work on ptrace • @joshwilsdon for vmadm support for LX brands

×