node.js in production: Reflections on three years of riding the unicorn

My presentation at #NodeSummit, December 3, 2013. Video is at http://www.joyent.com/developers/videos/reflections-on-three-years-of-nodejs-in-production

Published in: Technology

  1. node.js in production: Reflections on three years of riding the unicorn
     Bryan Cantrill, SVP, Engineering
     bryan@joyent.com
     @bcantrill
     Tuesday, December 3, 2013
  2. Production systems
     • Production systems are ones doing real work: when they misbehave, users or other systems are affected
     • Production systems value reliability, performance and ease of deployment, usually in that order
     • Contrast with development systems, which value ease of development and speed of development, in that order
     • These values can be in tension: new languages and environments typically arise for their development values, not their production ones
     • Would node.js be any different?
  3. node.js advantages
     • In terms of production suitability, node.js had, and still has, a couple of major advantages going for it:
         • It’s not a new language
         • It’s built on a VM (V8) that itself was designed for performance
         • It leverages extant (Unix) abstractions
         • Its pure event-oriented model aligns ease of programming with scalability with respect to load
     • As the stewards of both node and SmartOS, Joyent had another advantage: we could change, improve or leverage SmartOS to accommodate node in production
  4. node.js challenges
     • But node.js also has a couple of major challenges:
         • JavaScript closures make it easy to accidentally reference memory
         • Because node.js is often used to connect backend components, failure to propagate back pressure can induce memory explosion and death
         • Single-threaded execution of JavaScript means that compute-bound code can entirely impede progress
         • High performance VM also implies inscrutable core dumps and very limited instrumentation
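The back-pressure hazard on this slide can be made concrete with node's own stream API. What follows is a minimal sketch, not Joyent's code: `makeStalledSink` and `writeWithBackPressure` are hypothetical helpers, and the 4-byte `highWaterMark` is chosen only to make the effect visible. It relies on the documented behavior that `writable.write()` returns `false` once buffered data reaches `highWaterMark`, and that `'drain'` signals when writing may resume.

```javascript
const { Writable } = require('stream');

// Hypothetical sink that never completes a write, simulating a backend
// component that has stopped consuming data.
function makeStalledSink(highWaterMark) {
  return new Writable({
    highWaterMark,
    write(chunk, encoding, callback) {
      // Intentionally never call callback(): the consumer is stalled.
    }
  });
}

// Write chunks only while the sink reports that it has room. When write()
// returns false, stop and wait for 'drain' instead of buffering without bound.
function writeWithBackPressure(sink, chunks) {
  const progress = { written: 0 };
  function next() {
    while (progress.written < chunks.length) {
      const hasRoom = sink.write(chunks[progress.written]);
      progress.written += 1;
      if (!hasRoom) {
        sink.once('drain', next);
        return;
      }
    }
  }
  next();
  return progress;
}

const sink = makeStalledSink(4);
const progress = writeWithBackPressure(sink, ['ab', 'cd', 'ef', 'gh']);
// Only the chunks that fit under the high-water mark have been handed over.
console.log(progress.written);
```

A producer that ignored the return value of `write()` would hand all four chunks to the stalled sink, and in a real service that queue would grow until the heap was exhausted.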
  5. August 2010: DTrace in node.js
     • Added simple user-level statically defined tracing (USDT) probes for node.js on platforms that support DTrace (e.g., Mac OS X, SmartOS)
     • Probes were around connection establishment, serving HTTP requests, etc.
     • Allowed questions to be dynamically asked of running, production node.js servers, e.g.:

         dtrace -n 'node*:::http-server-request{ printf("%s of %s from %s\n", args[0]->method, args[0]->url, args[1]->remoteAddress)}'

         dtrace -n http-server-request'{ @[args[1]->remoteAddress] = count()}'

         dtrace -n gc-start'{self->ts = timestamp}' -n gc-done'/self->ts/{@ = quantize(timestamp - self->ts)}'
  6. August 2010: Deploying 0.2.x
     • In August 2010, we deployed our first node.js-based service into production: a Node Knockout leaderboard that used node.js DTrace probes to geolocate connections to contestants in real time
     • Results were promising: it was surprisingly easy to develop and deploy a node.js-based service, and the service consumed very little CPU
     • Watching the Node Knockout contestants in production revealed they were all light on CPU:
     • But there was a storm cloud...
  7. August 2010: Deploying 0.2.x, cont.
     • We had a memory leak that resulted in heap exhaustion after several hours under heavy load
     • Our service was stateless and load balanced for HA, so this was more disconcerting than debilitating...
     • ...but we also had quite a few contestants that would run their RSS up and crash; there was clearly a larger issue:
  8. February 2011: 0.4.0
     • In February 2011, we deployed our first major node.js-based service (on 0.4.0)
     • Service was able to be built remarkably quickly, but with some pain points around Connect
     • Despite being potentially a compute-bound service, CPU consumption was (again) a non-issue
     • And with an updated node (and many fixed node leaks), memory consumption wasn’t necessarily as acute...
     • ...but we hit our first “spinning black hole” problem
  9. January 2011: node-dtrace-provider
     • Our DTrace probes in node were proving to be too low-level for higher-level services: we needed to allow USDT probes to be expressed in JavaScript
     • Fortunately, DTrace community member Chris Andrews extended his libusdt to node.js, allowing statically defined probes in JavaScript, e.g.:

         // Load the provider module (this require line is not on the original slide)
         var d = require('dtrace-provider');

         // Create a provider named 'foo' and add a probe to it
         var dtp = d.createDTraceProvider('foo');
         var probe = dtp.addProbe('foo-start');

         // The callback builds the probe arguments
         probe.fire(function(p) {
             return ([ { bar: 123, baz: 'bar' } ]);
         });
  10. April 2011: Restify
     • Based on our experiences with Connect/Express, we wanted to build a node module that was purpose-built to implement HTTP-based API endpoints
     • Based on Chris Andrews’ work, we wanted to have first-class support for DTrace
     • Joyent’s Mark Cavage developed node-restify, which quickly became the foundation for all of our services
     • Built-in DTrace support allows full observability into per-route/per-handler latency, a capability that we could not live without at this point
  11. November 2011: MDB support for V8
     • In mid-2011, Joyent’s Dave Pacheco dared to dream the impossible dream: full postmortem support for V8 in MDB, the debugger native to SmartOS
     • Several unspeakable layer violations later, mdb_v8 brought postmortem debugging to node.js
     • ::jsstack prints the full stack, including both native C++ frames and JavaScript frames
     • ::jsprint prints JavaScript objects from the dump
     • Thanks to mdb_v8, we were able to go back to a core dump from that infinite loop in our service deployed several months earlier, and nail it
  12. December 2011: DTrace ustack helper
     • mdb_v8 was actually a way station to an even bolder dream: a DTrace ustack helper for node.js
     • A ustack helper is a bit of code that accompanies a binary and assists DTrace in probe context to resolve stack frames to their higher-level names
     • Once completed, allows user-level stack traces to be associated with in-kernel events, like profiling events
     • Can use the DTrace profile provider to determine how a node.js program is consuming CPU via stack sampling
  13. December 2011: Flame graphs
     • Poring over stack traces can make hot functions difficult to visualize
     • Joyent’s Brendan Gregg developed flame graphs, which allow us to easily visualize thousands of sampled stacks:
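Flame graphs are drawn from aggregated stacks rather than raw samples. As a rough sketch of that aggregation step (and not Brendan Gregg's actual tooling), the hypothetical `foldStacks` below collapses identical sampled stacks into a semicolon-delimited "folded" line with a count. The function name and the sample data are invented for illustration.

```javascript
// Each sample is an array of frames, leaf first (as stack traces are
// usually printed).
function foldStacks(samples) {
  const counts = new Map();
  for (const frames of samples) {
    // The folded form lists frames root first, separated by semicolons.
    const key = frames.slice().reverse().join(';');
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Emit "stack count" lines, sorted for stable output.
  return Array.from(counts.keys())
    .sort()
    .map((key) => key + ' ' + counts.get(key));
}

// Invented samples: two identical stacks and one that differs at the leaf.
const samples = [
  ['parse', 'handleRequest', 'main'],
  ['parse', 'handleRequest', 'main'],
  ['render', 'handleRequest', 'main']
];
const folded = foldStacks(samples);
console.log(folded.join('\n'));
```

Each folded line becomes one tower in the graph, with its width proportional to the count, which is why thousands of samples remain readable.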
  14. January 2012: Bunyan
     • Logging was becoming more and more of a problem for us, especially as we were developing distributed systems in node.js
     • Joyent’s Trent Mick developed node-bunyan, a simple and fast JSON logging library for node.js
     • Provides standardized, JSON, line-based log output that can be easily processed with JSON tools, e.g.:

         {"name":"moray","hostname":"d1cfb6c7-c975-4ed8-a689-fb18f94b6bfc","pid":8393,"component":"manatee","path":"/manatee/sdc/election","level":20,"db":{"available":2,"max":15,"size":2,"waiting":0},"options":{"async":false,"read":true},"msg":"pg: entered","time":"2013-12-03T02:54:24.565Z","v":0}

     • Also includes a command line tool, bunyan, for displaying Bunyan logs
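Because each record is one JSON object per line, ordinary JSON tooling is enough to process a log. The sketch below is illustrative only: `filterByLevel` is a hypothetical helper, the three log lines are invented, and the code does not use the bunyan module itself. The numeric levels follow Bunyan's convention, in which 20 (the level in the record above) means debug.

```javascript
// Numeric levels as used in Bunyan records.
const LEVELS = { trace: 10, debug: 20, info: 30, warn: 40, error: 50, fatal: 60 };

// Parse line-delimited JSON and keep records at or above a minimum level.
function filterByLevel(text, minLevelName) {
  const min = LEVELS[minLevelName];
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line))
    .filter((record) => record.level >= min);
}

// Invented sample log: one debug, one warn and one error record.
const logText = [
  '{"name":"demo","level":20,"msg":"pg: entered","v":0}',
  '{"name":"demo","level":40,"msg":"pool nearly exhausted","v":0}',
  '{"name":"demo","level":50,"msg":"query failed","v":0}'
].join('\n');

const warnings = filterByLevel(logText, 'warn');
console.log(warnings.map((record) => record.msg));
```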
  15. February 2012: npm shrinkwrap
     • npm allows for fine-grained semver control over package dependencies, but we found that nested dependencies could result in non-replicable installs
     • “npm shrinkwrap” generates a file that shrinkwraps all nested dependencies into npm-shrinkwrap.json, thereby locking down all nested versions
     • Guarantees that all installs will have same semver versions of dependencies
     • Doesn’t necessarily guarantee identical installs, however; for this, one needs private npm repositories
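To see why a semver range can yield non-replicable installs, consider this deliberately simplified sketch. It is not npm's resolver: `satisfiesTilde` and `resolve` are hypothetical, handle only plain `~major.minor.patch` ranges (no prerelease tags), and the version lists are invented. The point is that the same range resolves differently once a new release is published, while a pinned version does not.

```javascript
function parse(version) {
  return version.split('.').map(Number);
}

function compare(a, b) {
  const pa = parse(a);
  const pb = parse(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// Simplified "~1.2.3": same major and minor, patch at least the minimum.
function satisfiesTilde(version, range) {
  const v = parse(version);
  const min = parse(range.slice(1));
  return v[0] === min[0] && v[1] === min[1] && v[2] >= min[2];
}

// Pick the pinned version if one is given, else the highest match in range.
function resolve(available, range, pinned) {
  if (pinned) return pinned;
  const matches = available
    .filter((v) => satisfiesTilde(v, range))
    .sort(compare);
  return matches[matches.length - 1];
}

// Invented registry contents before and after a new patch release.
const before = ['1.2.3', '1.2.4'];
const after = ['1.2.3', '1.2.4', '1.2.5'];

const installBefore = resolve(before, '~1.2.3');
const installAfter = resolve(after, '~1.2.3');
const installPinned = resolve(after, '~1.2.3', '1.2.4');
console.log(installBefore, installAfter, installPinned);
```

Pinning every nested dependency in this way is the role that npm-shrinkwrap.json plays for a whole dependency tree.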
  16. April 2012: node-vasync
     • There are a number of modules that deal with some of the mechanics of asynchronous control flow…
     • But we found that libraries that handle …
     • We found we needed one that emphasized debugging, and in particular, …
     • node-vasync captures a number of popular flow patterns and allows state to be inspected via MDB
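The idea of control flow whose state can be inspected might be sketched as follows. This is not node-vasync's implementation or API: `parallelWithState` is a hypothetical helper written for illustration. It returns a plain status object that is updated as each operation completes, so a debugger (or a core dump) can show which operation never called back.

```javascript
// Run all functions "in parallel" and return an inspectable status object.
function parallelWithState(funcs, done) {
  const state = {
    operations: funcs.map((func) => ({
      funcname: func.name || '(anonymous)',
      status: 'pending',
      err: undefined,
      result: undefined
    })),
    ndone: 0,
    nerrors: 0
  };

  funcs.forEach((func, i) => {
    func((err, result) => {
      const op = state.operations[i];
      op.status = err ? 'fail' : 'ok';
      op.err = err;
      op.result = result;
      state.ndone += 1;
      if (err) state.nerrors += 1;
      if (state.ndone === funcs.length) {
        done(state.nerrors > 0 ? new Error('some operations failed') : null, state);
      }
    });
  });

  return state;
}

// One operation completes at once; the other simulates a hung request.
function quickLookup(callback) { callback(null, 'ok'); }
function stuckLookup(callback) { /* never calls back */ }

const state = parallelWithState([quickLookup, stuckLookup], () => {});
console.log(state.operations.map((op) => op.funcname + ': ' + op.status));
```

With closures alone, the fact that `stuckLookup` is outstanding would be buried in anonymous scope; keeping it in a named property on an ordinary object is what makes it findable after the fact.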
  17. May 2012: ::findjsobjects
     • Building on Dave Pacheco’s mdb_v8, we implemented a debugger command that iterates over all of memory in a core dump, looking for JavaScript objects
     • Entirely brute force, but allows one to take a swing at a nasty node.js issue: semantic memory leaks

         > ::findjsobjects
           OBJECT #OBJECTS #PROPS CONSTRUCTOR: PROPS
         95709ac1      195      3 Object: socket, type, handle
         957093f9       66      9 Object: uid, windowsVerbatimArguments, stdio, …
         95f13181      130      5 <anonymous> (as exports.StringDecoder): …
         8432ff55      222      3 Buffer: length, offset, parent
         843304dd       91      9 Object: refreservation, creation, name, type, …
         8432cc55       99      9 Object: time, msg, level, hostname, pid, action, …
         95f08545       66     14 ChildProcess: _closesNeeded, stdio, …
         8432f2e1      546      2 Array
         9570cafd       47     24 Object: <sliced string>, <sliced string>, …
         8432be95      415      3 Array
         8432fb09       67     19 Socket: errorEmitted, _bytesDispatched, …
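A semantic memory leak is one where every object is still reachable, so the garbage collector is behaving correctly and the bug lies in the program's logic. The following is a minimal sketch with invented names (`leakyInFlight`, `handleRequestLeaky` and so on are hypothetical): a per-request table that is never cleaned up grows without bound, which is the kind of pattern an object census such as the one above would reveal as an ever-growing count.

```javascript
// Hypothetical per-request bookkeeping tables.
const leakyInFlight = new Map();
const fixedInFlight = new Map();

function handleRequestLeaky(id) {
  leakyInFlight.set(id, { id: id, startedAt: Date.now() });
  // Bug: the entry is never removed once the request completes, so every
  // request object stays reachable forever.
}

function handleRequestFixed(id) {
  fixedInFlight.set(id, { id: id, startedAt: Date.now() });
  try {
    // ... the request would be processed here ...
  } finally {
    // Remove the bookkeeping entry on every exit path.
    fixedInFlight.delete(id);
  }
}

for (let id = 0; id < 1000; id++) {
  handleRequestLeaky(id);
  handleRequestFixed(id);
}

console.log(leakyInFlight.size, fixedInFlight.size);
```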
  18. May 2012: ::findjsobjects -p
     • Searching by property name allows one to find particular objects in the JavaScript heap, e.g.:

         > ::findjsobjects -p ip4addr | ::findjsobjects | ::jsprint -a
         8432b109: {
             ip4addr: 9aee115d: "10.88.88.200",
             VLAN: 9aee1199: "0",
             Host Interface: 9aee1185: "e1000g0",
             Link Status: 9aee1175: "up",
             MAC Address: 9aee113d: "02:08:20:47:93:82",
         }
         …

     • While designed for postmortem debugging, this allows mdb_v8 to be used for in situ debugging in development
     • Also guides one to a best practice: towards unique property names (which we have historically done in the operating system via structure prefixing)
  19. July 2012: node-fast
     • While HTTP makes it very easy to put together a distributed system, parsing and connection management can become prohibitively expensive
     • In building Manta, we found that we needed something lighter/faster; Joyent’s Mark Cavage built node-fast
     • Only what you need: fully async/duplex/persistent connections, simple on-wire protocol (JSON), etc.
     • None of what you don’t want: no IDL madness, no object model, no binary translation madness, etc.
     • Deliberately light and limited: HTTP is still the right answer until it isn’t
  20. October 2012: Bunyan + DTrace
     • With all of our services using Bunyan, we could enable dynamic logging by adding DTrace USDT probes
     • Can use the raw DTrace probes:

         # dtrace -qn log-debug'{printf("%s\n", copyinstr(arg0))}' -x strsize=8k
         {"name":"wf-moray-backend","hostname":"414ffb35-adee-47b7-bdf4-d21cb039386c","pid":10952,"component":"MorayClient","host":"10.99.99.17","port":2020,"req_id":"bddb180f-1770-edcf-8df2-b3a81d97e9b1","level":20,"bucket":"wf_runners","key":"414ffb35-adee-47b7-bdf4-d21cb039386c","value":{"active_at":"2013-12-03T07:22:25.125Z","idle":false},"msg":"putObject: entered","time":"2013-12-03T07:22:25.135Z","v":0}
         ...

     • Added the json() subroutine to DTrace to make this easier to process
     • Can also use “bunyan -p” and avoid the lower-level DTrace details entirely
  21. May 2013: --abort-on-uncaught-exception
     • Crash dumps are great, but aborting after an uncaught exception makes it very difficult to determine the true origin of the exception
     • Dave Pacheco implemented a V8 patch to induce a process abort (and a core dump) on an uncaught exception
     • This allows us to use postmortem debugging to debug our everyday logic errors
     • Available starting in 0.10.x; we use it wherever we have it!
  22. July 2013: Thoth
     • One of the most important systems we have built in node is Manta, our object store featuring in situ compute
     • Manta is an excellent platform for building data-based services, especially for large data objects
     • We built manta-thoth, a platform for core and crash dump analysis that allows us to debug core dumps without moving them
     • Thoth has become critically important for us to track and automatically debug production node.js services
  23. December 2013: Dump analysis on Linux
     • Postmortem debugging has been a (the) tremendous breakthrough for node.js in production…
     • ...but despite node’s postmortem support all being open source, it has been limited to SmartOS
     • Some have toyed with porting MDB to Linux; this is in principle possible, but will be rough sledding
     • Joyent’s TJ Fontaine (of node core fame) observed what we had done with dump analysis on Manta and had a simpler idea…
     • What about making Linux dumps consumable on SmartOS, and therefore Manta?
  24. December 2013: Linux support in libproc
     • Over the course of a multiday engineering hackathon, TJ and Joyent’s Max Bruning added support for Linux crash dumps in SmartOS’s libproc
     • Fortunately, because of the way the postmortem work was done by Dave Pacheco, it Just Works
     • Do this yourself: https://gist.github.com/tjfontaine/de104fe058300a51f7cf
     • For Linux users: put your Linux dumps to Manta, and you can finally debug those pesky leaks and crashes!
     • Use --abort-on-uncaught-exception and you can use Manta and postmortem debugging to debug more quotidian programming errors!
  25. Node.js in production!
     • For us at Joyent, the tooling that we have built into node.js has resulted in what we believe to be the best dynamic environment for production use
     • Yes, even when compared to much older platforms like Java and Erlang...
     • There is still work to be done, especially around add-on development (see TJ’s shim work!) and potentially better bundling of objects…
     • We will continue to emphasize production deployment and use in our stewardship of node.js!
  26. Thank you
     • @dapsays, the Patron Saint of node.js in production, for DTrace support, MDB support, node-vasync, Manta, etc.
     • @mcavage for node-restify, node-fast, Manta, etc.
     • @trentmick for node-bunyan
     • @chrisandrews for node-dtrace-provider
     • @brendangregg for flame graphs
     • @tjfontaine for bringing postmortem debugging to an entirely new audience with Linux support for libproc!
