
The Peril and Promise of Early Adoption: Arriving 10 Years Early to Containers

My presentation from #OReillySACon, March 19, 2015. More information on the forthcoming service is available at https://www.joyent.com/lp/preview.


  1. The Peril and Promise of Early Adoption: Arriving 10 Years Early to Containers
     Bryan Cantrill, CTO, Joyent • bryan@joyent.com • @bcantrill
  2. Who is Joyent?
     • In an interview with ACM Queue in 2008, Joyent's mission was described concisely — if ambitiously:
  3. Virtualization as cloud catalyst
     • This vision — dating back to 2005 — was an example of early cloud computing, but was itself not a new vision...
     • In the 1960s — shortly after the dawn of computing! — pundits foresaw a multi-tenant compute utility
     • The vision was four decades too early: it took the internet + commodity computing + virtualization to yield cloud computing
     • Virtualization is the essential ingredient for multi-tenant operation — but where in the stack to virtualize?
     • Choices around virtualization capture tensions between elasticity, tenancy, and performance
     • tl;dr: Virtualization choices drive economic tradeoffs
  4. Hardware-level virtualization?
     • The historical answer — since the 1960s — has been to virtualize at the level of the hardware:
     • A virtual machine is presented upon which each tenant runs an operating system of their choosing
     • There are as many operating systems as tenants
     • The singular advantage of hardware virtualization: it can run entire legacy stacks unmodified
     • However, hardware virtualization exacts a heavy price: operating systems are not designed to share resources like DRAM, CPU, I/O devices or the network
     • Hardware virtualization limits tenancy, elasticity and performance
  5. Platform-level virtualization?
     • Virtualizing at the application platform layer addresses the tenancy challenges of hardware virtualization
     • Added advantage of a much more nimble (& developer-friendly!) abstraction...
     • ...but at the cost of dictating abstraction to the developer
     • This creates the "Google App Engine problem": developers are in a straitjacket where toy programs are easy — but sophisticated apps are impossible
     • Virtualizing at the application platform layer poses many other challenges with respect to security, containment and scalability
  6. OS-level virtualization!
     • Virtualizing at the OS level hits the sweet spot:
     • Single OS (i.e., single kernel) allows for efficient use of hardware resources, maximizing tenancy and performance
     • Disjoint instances are securely compartmentalized by the operating system
     • Gives users what appears to be a virtual machine (albeit a very fast one) on which to run higher-level software
     • The ease of a PaaS with the generality of IaaS
     • The model was pioneered by FreeBSD jails, taken to its logical extreme by Solaris zones — and then aped by Linux containers
  7. OS-level virtualization in the cloud
     • Joyent runs OS containers in the cloud via SmartOS (our illumos derivative) — and we have run containers in multi-tenant production since ~2005
     • Core SmartOS facilities are container-aware and optimized: Zones, ZFS, DTrace, Crossbow, SMF, etc.
     • SmartOS also supports hardware-level virtualization — but we have long advocated OS-level virtualization for new build-out
     • We emphasized containers' operational characteristics (performance, elasticity, tenancy)... (a provisioning sketch follows below)
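
     To make the zone mechanics concrete, here is a minimal provisioning sketch using vmadm(1M) from the SmartOS global zone; the alias, memory cap, and image UUID are illustrative placeholders, not values from the talk:

$ cat > zone.json <<'EOF'
{
  "brand": "joyent",
  "alias": "example-zone",
  "image_uuid": "00000000-0000-0000-0000-000000000000",
  "max_physical_memory": 512,
  "nics": [ { "nic_tag": "external", "ip": "dhcp" } ]
}
EOF
$ vmadm create -f zone.json   # create the zone from the control plane (global zone)
$ vmadm list                  # the new container appears alongside other tenants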
  8. And it worked!
     • Our vision captured developers seeking to scale apps — and by 2007, a rapidly growing Twitter ran on Joyent Accelerators:
  9. But there were challenges...
     • OS-based virtualization was a tremendous strength — but SmartOS being (seemingly) spuriously different made it difficult to capture developer mind-share
     • Differences are more idiosyncratic than meaningful, but they became an obstacle to adoption...
     • Adopters had to be highly technical and really care about performance/scale
     • Differentiating on performance alone is challenging, especially when the platform is different: too tempting to blame the differences instead of using the differentiators
  10. Could we go upstack?
     • To recapture the developer, we needed to get upstack
     • First attempt was SmartPlatform (ca. 2009?), a JavaScript (SpiderMonkey!) + Perl Frankenstein of a PaaS
     • SmartPlatform had all of the problems of SpiderMonkey, Perl and a PaaS — but showed the value of server-side JavaScript
     • When node.js first appeared in late 2009, we were among the first to see its promise, and we lunged...
  11. node.js + OS-based virtualization?
     • In 2010, the challenge became to tie node.js to our most fundamental differentiator, OS-based virtualization
     • The first experiment was a high-tenancy container-based PaaS, no.de, launched for Node Knockout in Fall 2010
     • We ran high tenancy (400+ machines in 48GB), high performance — and developed DTrace-based graphical observability (see the sketch below)
     • Early results were promising...
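
     For a flavor of what DTrace-based observability means in practice, here is a minimal sketch (not the no.de cloud analytics themselves): a one-liner, safe to run against production workloads, that counts system calls made by every node process on a machine:

$ dtrace -n 'syscall:::entry /execname == "node"/ { @[probefunc] = count(); }'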
  12. node.js + OS-based virtualization!
  13. no.de: Challenges of a PaaS
     • We went on to develop full cloud analytics for no.de:
     • But the PaaS business is more than performance management — and it was clear that it was very early in what was going to be a tough business...
  14. node.js: Wins and frustrations
     • The SmartOS + node.js efforts were successful insofar as new developer converts to SmartOS were (and are!) often coming from node.js
     • The debugging we built into node.js on SmartOS is (frankly) jaw-dropping — and essential for serious use (a post-mortem sketch follows below)...
     • ...but our differentiators are production-oriented — developers still have to be highly technical, and still have to be willing to endure transitional pain
     • Exacerbated by the fact that applications aren't built in node.js — they are connected with node.js
     • We ended up back with familiar problems...
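
     To make the debugging claim concrete: with the mdb_v8 support, a core dump of a node process yields JavaScript-level stack traces post mortem. A minimal sketch, with a hypothetical process and core file name:

$ gcore $(pgrep -f server.js)   # snapshot a core from the running node process
$ mdb core.12345                # open it in the illumos modular debugger
> ::load v8                     # load the mdb_v8 debugger module
> ::jsstack                     # JavaScript (not just C++) stack frames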
  15. Hardware virtualization?
     • In late 2010, it was clear that — despite the (obvious!) technical superiority of OS-based virtualization — we also needed hardware-based virtualization
     • Could OS-based virtualization help us differentiate a hardware virtualization implementation?
     • If we could port KVM to SmartOS, we could offer advantages over other hypervisors: shared filesystem cache, double-hulled security, global observability
     • The problem is that KVM isn't, in fact, portable — and had never been ported to a different system
  16. KVM + SmartOS: Supergroup or stopgap?
     • In 2011, we managed to successfully port KVM to SmartOS, making it the first (and only) system to offer HW virtualization within OS virtualization
     • Over the course of 2011, we built SmartDataCenter, a container-based orchestration and cloud-management system around SmartOS
     • Deployed SmartDataCenter into production in the Joyent Public Cloud in late 2011
     • Over the course of 2012, our entire cloud moved to SDC
     • This was essential: most of our VMs today run inside KVM, and many customers are hybrid
  17. The limits of hardware virtualization
     • Ironically, our time on KVM helped to reinforce our most fundamental beliefs in OS-based virtualization...
     • We spent significant time making KVM on SmartOS perform — but there are physical limits
     • There are certain performance and resource problems around HW-based virtualization that are simply intractable
     • While it is indisputably the right abstraction for running legacy software, it is the wrong abstraction for future elastic infrastructure!
  18. Aside: Cloud storage
     • In 2011, the gaping hole in the Joyent Public Cloud was storage — but we were reluctant to build an also-ran S3
     • In thinking about this problem, it was tempting to fixate on ZFS, one of our most fundamental differentiators
     • ZFS rivals OS-based virtualization as our earliest differentiator: we were the first large, public deployment of ZFS (ca. 2006) — and a long-time proponent
     • While ZFS was part of the answer, it should have been no surprise that OS-based virtualization...
  19. ZFS + OS-based virtualization?
  20. Manta: ZFS + OS-based virtualization!
     • Over 2012 and early 2013, we built Manta, a ZFS- and container-based internet-facing object storage system offering in situ compute
     • OS-based virtualization allows the description of compute to be brought to where objects reside instead of having to backhaul objects to transient compute
     • The abstractions made available for computation are anything that can run on the OS...
     • ...and as a reminder, the OS — Unix — was built around the notion of ad hoc unstructured data processing, and allows for remarkably terse expressions of computation
  21. Aside: Unix
     • When Unix appeared in the early 1970s, it was not just a new system, but a new way of thinking about systems
     • Instead of a sealed monolith, the operating system was a collection of small, easily understood programs
     • First Edition Unix (1971) contained many programs that we still use today (ls, rm, cat, mv)
     • Its very name conveyed this minimalist aesthetic: Unix is a homophone of "eunuchs" — a castrated Multics
     "We were a bit oppressed by the big system mentality. Ken wanted to do something simple." — Dennis Ritchie
  22. Unix: Let there be light
     • In 1969, Doug McIlroy had the idea of connecting different components:
     "At the same time that Thompson and Ritchie were sketching out a file system, I was sketching out how to do data processing on the blackboard by connecting together cascades of processes"
     • This was the primordial pipe, but it took three years to persuade Thompson to adopt it:
     "And one day I came up with a syntax for the shell that went along with the piping, and Ken said, 'I'm going to do it!'"
  23. Unix: ...and there was light
     "And the next morning we had this orgy of one-liners." — Doug McIlroy
  24. The Unix philosophy
     • The pipe — coupled with the small-system aesthetic — gave rise to the Unix philosophy, as articulated by Doug McIlroy:
     • Write programs that do one thing and do it well
     • Write programs to work together
     • Write programs that handle text streams, because that is a universal interface
     • Four decades later, this philosophy remains the single most important revolution in software systems thinking!
  25. Doug McIlroy v. Don Knuth: FIGHT!
     • In 1986, Jon Bentley posed the challenge that became the Epic Rap Battle of computer science history:
     "Read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies."
     • Don Knuth's solution: an elaborate program in WEB, a Pascal-like literate programming system of his own invention, using a purpose-built algorithm
     • Doug McIlroy's solution (wrapped as a runnable script below) shows the power of the Unix philosophy:
     tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed ${1}q
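
     Note that the '\n' argument (a literal newline) is easily mangled in transcription. Wrapped as a script, McIlroy's six-stage pipeline runs unchanged on any modern Unix; the input file name below is illustrative:

$ cat > wordfreq.sh <<'EOF'
tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed ${1}q
EOF
$ sh wordfreq.sh 10 < mobydick.txt   # the ten most frequent words, with counts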
  26. Big Data: History repeats itself?
     • The original Google MapReduce paper (Dean et al., OSDI '04) poses a problem disturbingly similar to Bentley's challenge nearly two decades prior:
     "Count of URL Access Frequency: The map function processes logs of web page requests and outputs ⟨URL, 1⟩. The reduce function adds together all values for the same URL and emits a ⟨URL, total count⟩ pair"
     • But the solutions do not adhere to the Unix philosophy...
     • ...nor do they make use of the substantial Unix foundation for data processing
     • e.g., Appendix A of the OSDI '04 paper has a 71-line word count in C++ — with nary a wc in sight
  27. Manta: Unix for Big Data
     • Manta allows for an arbitrarily scalable variant of McIlroy's solution to Bentley's challenge:
     mfind -t o /bcantrill/public/v7/usr/man | mjob create -o -m "tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c" -r "awk '{ x[$2] += $1 } END { for (w in x) { print x[w] " " w } }' | sort -rn | sed ${1}q"
     • This description is not only terse, it is high-performing: data is left at rest — with the "map" phase doing heavy reduction of the data stream
     • As such, Manta — like Unix — is not merely syntactic sugar; it converges compute and data in a new way
  28. Manta revolution
     • Our experiences with Manta — like those with KVM — have served to strengthen our core belief in OS-based virtualization
     • Compute/data convergence is clearly the future of big data: stores of record must support computation as a first-class, in situ operation
     • Unix is a natural way of expressing this computation — and the OS is clearly the right level at which to virtualize to support this securely
     • Manta will surely not be the only system to represent the confluence of these; the rest of the world will (ultimately) figure out the power of OS-based virtualization
  29. Manta mental challenges
     • Our biggest challenge with Manta has been that the key underlying technology — OS-based virtualization — is not well understood
     • We underestimated the degree to which this would be an impediment: Manta felt "easy" to us
     • When technology requires a shift in mental model, its transformative power must be that much greater to compensate for its increased burden!
     • Would the world ever really figure out containers?!
  30. Containers as PaaS foundation?
     • Some saw the power of OS containers to facilitate up-stack platform-as-a-service abstractions
     • For example, dotCloud — a platform-as-a-service provider — built their PaaS on OS containers
     • Hearing that many were interested in their container orchestration layer (but not their PaaS), dotCloud open sourced their container-based orchestration layer...
  31. ...and Docker was born
  32. Docker revolution
     • Docker has used the rapid provisioning + shared underlying filesystem of containers to allow developers to think operationally
     • Developers can encode dependencies and deployment practices into an image (see the sketch below)
     • Images can be layered, allowing for swift development
     • Images can be quickly deployed — and re-deployed
     • As such, Docker is a perfect fit for microservices
     • Docker will do to apt what apt did to tar
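
     As an illustration of encoding dependencies and deployment practice into a layered image, a minimal, hypothetical Dockerfile (the application and port are placeholders):

$ cat > Dockerfile <<'EOF'
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y nodejs
COPY server.js /opt/server.js
CMD ["nodejs", "/opt/server.js"]
EOF
$ docker build -t myapp .            # each instruction becomes a cached, shareable layer
$ docker run -d -p 8080:8080 myapp   # deploy (and re-deploy) in seconds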
  33. Docker's challenges
     • The Docker model is the future of containers
     • Docker's challenges are largely around production deployment: security, network virtualization, persistence
     • Security concerns are real enough that for multi-tenancy, OS containers are currently running in hardware VMs (!!)
     • In SmartOS, we have spent a decade addressing these concerns — and we have proven it in production...
     • Could we combine the best of both worlds?
     • Could we somehow deploy Docker containers as SmartOS zones?
  34. Docker + SmartOS: Linux binaries?
     • First (obvious) problem: while it has been designed to be cross-platform, Docker is Linux-centric
     • While Docker could be ported, the encyclopedia of Docker images will likely forever remain Linux binaries
     • SmartOS is Unix — but it isn't Linux...
     • Could we somehow natively emulate Linux — and run Linux binaries directly on the SmartOS kernel?
  35. OS emulation: An old idea
     • Operating systems have long employed system call emulation to allow binaries from one operating system to run on another on the same instruction set architecture
     • Combines the binary footprint of the emulated system with the operational advantages of the emulating system
     • Done as early as 1969 with DEC's PA1050 (TOPS-10 on TOPS-20); Sun did this (for similar reasons) ca. 1993 with SunOS 4.x binaries running on Solaris 2.x
     • In the mid-2000s, Sun developed zone-based OS emulation for Solaris: branded zones
     • Several brands were developed — notably including an LX brand that allowed for Linux emulation
  36. LX-branded zones: Life and death
     • The LX-branded zone worked for RHEL 3 (!): glibc 2.3.2 + Linux 2.4
     • Remarkable amount of work was done to handle device pathing, signal handling, /proc — and arcana like TTY ioctls, ptrace, etc.
     • Worked for a surprising number of binaries!
     • But support was only for 2.4 kernels and only for 32-bit; 2.6 + 64-bit appeared daunting...
     • Support was ripped out of the system on June 11, 2010
     • Fortunately, this was after the system was open sourced in June 2005 — and the source was out there...
  37. LX-branded zones: Resurrection!
     • In January 2014, David Mackay, an illumos community member, announced that he was able to resurrect the LX brand — and that it appeared to work!
     "Linked below is a webrev which restores LX branded zones support to Illumos: http://cr.illumos.org/~webrev/DavidJX8P/lx-zones-restoration/ I have been running OpenIndiana, using it daily on my workstation for over a month with the above webrev applied to the illumos-gate and built by myself. It would definitely raise interest in Illumos. Indeed, I have seen many people who are extremely interested in LX zones. The LX zones code is minimally invasive on Illumos itself, and is mostly segregated out. I hope you find this of interest."
  38. LX-branded zones: Revival
     • Encouraged that the LX-branded work was salvageable, Joyent engineer Jerry Jelinek reintegrated the LX brand into SmartOS on March 20, 2014...
     • ...and started the (substantial) work to modernize it
     • Guiding principles for the LX-branded zone work:
     • Do it all in the open
     • Do it all on SmartOS master (illumos-joyent)
     • Add base illumos facilities wherever possible
     • Aim to upstream to illumos when we're done
  39. LX-branded zones: Progress
     • Working assiduously over the course of 2014, progress was difficult but steady:
     • Ubuntu 10.04 booted in April
     • Ubuntu 12.04 booted in May
     • Ubuntu 14.04 booted in July
     • 64-bit Ubuntu 14.04 booted in October (!)
     • Going into 2015, it was becoming increasingly difficult to find Linux software that didn't work... (a provisioning sketch follows below)
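
     As a concrete sketch, provisioning an LX-branded zone on a current SmartOS looks much like provisioning a native one. The alias and image UUID below are placeholders for an imported Ubuntu 14.04 lx image, and kernel_version is what the emulation reports to the guest:

$ cat > lxzone.json <<'EOF'
{
  "brand": "lx",
  "kernel_version": "3.13.0",
  "alias": "ubuntu-lx",
  "image_uuid": "00000000-0000-0000-0000-000000000000",
  "max_physical_memory": 1024
}
EOF
$ vmadm create -f lxzone.json   # a Linux userland running directly on the SmartOS kernel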
  40. LX-branded zones: Working well...
  41. ...and, um, well received
  42. Docker + SmartOS: Provisioning?
     • With the binary problem being tackled, focus turned to the mechanics of integrating Docker with the SmartOS facilities for provisioning
     • Provisioning a SmartOS zone operates via the global zone that represents the control plane of the machine
     • docker is a single binary that functions as both client and server — and with too much surface area to run in the global zone, especially for a public cloud
     • docker has also embedded Go- and Linux-isms that we did not want in the global zone; we needed to find a different approach...
  43. Docker Remote API
     • While docker is a single binary that can run as the client or the server, it does not run as both at once...
     • docker (the client) communicates with docker (the server) via the Docker Remote API
     • The Docker Remote API is expressive, modern and robust (i.e., versioned), allowing for docker to communicate with Docker backends that aren't docker (see the sketch below)
     • The clear approach was therefore to implement a Docker Remote API endpoint for SmartDataCenter
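
     Because the Remote API is versioned HTTP + JSON, any conforming backend can answer it. A minimal sketch against a hypothetical endpoint (v1.17 was the then-current API version):

$ curl -s http://docker.example.com:2375/v1.17/info             # what `docker info` calls
$ curl -s http://docker.example.com:2375/v1.17/containers/json  # what `docker ps` calls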
  44. Aside: SmartDataCenter
     • Orchestration software for SmartOS-based clouds
     • Unlike other cloud stacks, not designed to run arbitrary hypervisors, sell legacy hardware or get 160 companies to agree on something
     • SmartDataCenter is designed to leverage the SmartOS differentiators: ZFS, DTrace and (esp.) zones
     • Runs both the Joyent Public Cloud and business-critical on-premises clouds at well-known brands
     • Born proprietary — but made entirely open source on November 6, 2014: http://github.com/joyent/sdc
  45. SmartDataCenter: Architecture
     [Architecture diagram: a head-node (flash-booted SmartOS kernel) hosting the SDC 7 core services — booter (DHCP/TFTP), AMQP broker, binder/DNS, public API, customer portal, operator portal, firewall — orchestrating tens to hundreds of compute nodes per head-node; each compute node runs a network-booted SmartOS kernel with provisioner, instrumenter and heartbeater agents over AMQP, hosting virtual SmartOS instances (OS virt.) alongside Linux and Windows guests (HW virt.), each with virtual NICs, atop a ZFS-based multi-tenant filesystem]
  46. SmartDataCenter: Core Services
     [Diagram of the SDC7 core services on the head-node: key/value service (Moray), directory service (UFDS), workflow API, virtual machine API (VMAPI), compute-node API (CNAPI), network API (NAPI), firewall API (FWAPI), designation API (DAPI), image API, packaging API (PAPI), service API (SAPI), alerts & monitoring (Amon), and analytics aggregator — alongside booter (DHCP/TFTP), AMQP broker, binder/DNS, public API, customer portal, and operator portal/services, with connections to Manta and other DCs. Note: service interdependencies not shown for readability; other core services may be provisioned on compute nodes]
  47. SmartDataCenter + Docker
     • Implementing an SDC-wide endpoint for the Docker Remote API allows us to build in terms of our established core services: UFDS, CNAPI, VMAPI, Image API, etc.
     • Has the welcome side-effect of virtualizing the notion of the Docker host machine: Docker containers can be placed anywhere within the data center
     • From a developer perspective, one less thing to manage
     • From an operations perspective, allows for a flexible layer of management and control: Docker API endpoints become a potential administrative nexus
     • As such, virtualizing the Docker host is somewhat analogous to the way ZFS virtualized the filesystem...
  48. SmartDataCenter + Docker: Challenges
     • Some Docker constructs have (implicitly) encoded co-locality of Docker containers on a physical machine
     • Some of these constructs (e.g., --volumes-from) we will discourage but accommodate by co-scheduling
     • Others (e.g., host directory-based volumes) we are implementing via NFS backed by Manta, our (open source!) distributed object storage service
     • Moving forward, we are working with Docker to help assure that the Docker Remote API doesn't create new implicit dependencies on physical locality
  49. SmartDataCenter + Docker: Networking
     • Parallel to our SmartOS and Docker work, we have been working on next-generation software-defined networking for SmartOS and SmartDataCenter
     • Goal was to use standard encapsulation/decapsulation protocols (i.e., VXLAN) for overlay networks
     • We have taken a kernel-based (and ARP-inspired) approach to assure scale
     • Complements SDC's existing in-kernel, API-managed firewall facilities
     • All done in the open: in SmartOS (illumos-joyent) and as sdc-portolan
  50. Putting it all together: sdc-docker
     • Our Docker engine for SDC, sdc-docker, implements the endpoints for the Docker Remote API
     • Work is young (started in earnest in early fall 2014), but because it takes advantage of a proven orchestration substrate, progress has been very quick...
     • We are deploying it into early access production in the Joyent Public Cloud in Q1CY15 (yes: T-12 days!)
     • It's open source: http://github.com/joyent/sdc-docker; you can install SDC (either on hardware or on VMware) and check it out for yourself! (a client-side sketch follows below)
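
     From the developer's side, the intent is that using sdc-docker is nothing more than pointing the stock docker client at the datacenter-wide endpoint. A hedged sketch, with a hypothetical hostname:

$ export DOCKER_HOST=tcp://docker.sdc.example.com:2376
$ export DOCKER_TLS_VERIFY=1
$ docker info                  # describes the datacenter, not a single machine
$ docker run -it ubuntu bash   # lands in an LX-branded zone, placed anywhere in the DC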
  51. Containers: reflecting back
     • For nearly a decade, we at Joyent have believed that OS-virtualized containers are the future of computing
     • While the efficiency gains are tremendous, they have not alone been enough to propel containers into the mainstream
     • Containers are being propelled by Docker and its embodiment of an entirely different advantage of OS containers: developer agility
     • With Docker, the moment for the technology seems to have arrived: it is in the right place at the right time
     • Reflecting back on our adventure as an early adopter...
  52. Early adoption: The peril
     • When working on a revolutionary technology, it's easy to dismiss the inconveniences as casualties of the future
     • Some conveniences are actually constraints — but it can be very difficult to discern which!
     • When adopters must endure painful differences to enjoy the differentiators, the economic advantages of a technological revolution are undermined
     • And even when the thinking does shift, it can take a long time; as Keynes famously observed, "the market can stay irrational longer than you can stay solvent"!
  53. Early adoption: The promise
     • When the payoffs do come, they can be tremendously outsized with respect to the risk
     • Placing gutsy technological bets attracts like-minded technologists — which can create uniquely fertile environments for innovation
     • If and where early adoption is based on open source, the community of like-minded technologists is not confined to be within a company's walls
     • Open source innovation allows for new customers and/or new employees: for early adopters, open source is the farm system!
  54. Early adoption: The peril and the promise
     • While early adoption isn't for everyone, every organization should probably be doing some early adoption somewhere — and probably in the open
     • When you are an early adopter of a technology, don't innovate in too many directions at once: know the differentiators and focus on ease of use/adoption for everything else
     • Stay flexible and adaptable! You may very well be right on trajectory, but wrong on specifics
     • Don't give up! Technological revolutions happen much slower than you think they should — and then much more quickly than anyone would think possible
     • "God bless the early adopters!"
  55. Thank you!
     • Jerry Jelinek, @jmclulow, @pfmooney and @jperkin for their work on LX branded zones
     • @joshwilsdon, @trentmick, @cachafla and @orlandov for their work on sdc-docker
     • @rmustacc, @wayfaringrob, @fredfkuo and @notmatt for their work on SDC overlay networking
     • @dapsays for his work on Manta and node.js debugging
     • @tjfontaine for his work on node.js
     • The countless engineers who have worked on or with us because they believed in OS-based virtualization!
