node.js in production: Reflections on three years of riding the unicorn

Bryan Cantrill
SVP, Engineering
bryan@joyent.com
@bcantrill
Tuesday, December 3, 2013

Production systems

•

Production systems are ones doing real work: when
they misbehave, users or other systems are affected

•

Production systems value reliability, performance and
ease of deployment — usually in that order

•

Contrast with development systems, which value ease of
development and speed of development — in that order

•

These values can be in tension: new languages and
environments typically arise for their development
values, not their production ones

•

Would node.js be any different?


node.js advantages

•

In terms of production suitability, node.js had — and still
has — a couple of major advantages going for it:

•

It’s built on a VM (V8) that itself was designed for
performance

•

It leverages extant (Unix) abstractions

•

It’s not a new language

•

Its pure event-oriented model aligns ease of
programming with scalability with respect to load

•

As the stewards of both node and SmartOS, Joyent had
another advantage: we could change, improve or
leverage SmartOS to accommodate node in production

node.js challenges

•

But node.js also has a couple of major challenges:

•

JavaScript closures make it easy to accidentally
reference memory

•

Because node.js is often used to connect backend
components, failure to propagate back pressure can
induce memory explosion and death

•

Single-threaded execution of JavaScript means that
compute-bound code can entirely impede progress

•

High performance VM also implies inscrutable core
dumps and very limited instrumentation
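
The back-pressure challenge can be illustrated with Node's stream API. What follows is a minimal sketch, not code from the talk: `makeSlowSink` and `pump` are hypothetical names, and the slow sink stands in for any backend component that consumes data more slowly than it is produced. The point it tries to show is that `write()` returns false once the sink's buffer reaches its high-water mark, and that a well-behaved producer stops writing until `'drain'` fires.

```javascript
const { Writable } = require('stream');

// Hypothetical slow backend: it holds each chunk until release() is called,
// standing in for a component that consumes more slowly than we produce.
function makeSlowSink(highWaterMark) {
  const pending = [];
  const sink = new Writable({
    highWaterMark,
    write(chunk, encoding, callback) {
      pending.push(callback); // completion is deferred until release()
    }
  });
  sink.release = function () {
    const callback = pending.shift();
    if (callback) callback();
  };
  return sink;
}

// Hand chunks to the sink only while it reports spare capacity.
// Returns a function reporting how many chunks have been handed over so far.
function pump(chunks, sink, onDone) {
  let next = 0;
  function writeSome() {
    while (next < chunks.length) {
      const ok = sink.write(chunks[next++]);
      if (!ok) {
        // The buffer has reached the high-water mark: stop producing and
        // resume only once the sink has drained.
        sink.once('drain', writeSome);
        return;
      }
    }
    onDone();
  }
  writeSome();
  return function written() { return next; };
}

// Ten 8-byte chunks against a 16-byte high-water mark (values are arbitrary).
const sink = makeSlowSink(16);
const chunks = Array.from({ length: 10 }, () => Buffer.alloc(8));
const written = pump(chunks, sink, () => console.log('all chunks written'));
console.log('handed to sink before pausing:', written());
```

A producer that ignores the return value of `write()` would instead queue all ten chunks in the sink's buffer at once; with a fast producer and a slow consumer, that queue is where the memory growth comes from.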

August 2010: DTrace in node.js

•

Added simple user-level statically defined tracing
(USDT) probes for node.js on platforms that support
DTrace (e.g., Mac OS X, SmartOS)

•

Probes were around connection establishment, serving
HTTP requests, etc.

•

Allowed questions to be dynamically asked of running,
production node.js servers, e.g.:
# Trace each HTTP request: method, URL and remote address
dtrace -n 'node*:::http-server-request{
    printf("%s of %s from %s\n", args[0]->method,
    args[0]->url, args[1]->remoteAddress)}'
# Count HTTP requests by remote address
dtrace -n http-server-request'{
    @[args[1]->remoteAddress] = count()}'
# Distribution of GC duration in nanoseconds
dtrace -n gc-start'{self->ts = timestamp}' \
    -n gc-done'/self->ts/{@ = quantize(timestamp - self->ts)}'


August 2010: Deploying 0.2.x

•

In August 2010, we deployed our first node.js-based
service into production: a Node Knockout leaderboard
that used node.js DTrace probes to geolocate
connections to contestants in real-time

•

Results were promising; surprisingly easy to develop
and deploy a node.js based service — and service
consumed very little CPU

•

Watching the Node Knockout contestants in production
revealed they were all light on CPU:

•

But there was a storm cloud...


August 2010: Deploying 0.2.x, cont.

•

We had a memory leak that resulted in heap exhaustion
after several hours under heavy load

•

Our service was stateless and load balanced for HA, so
this was more disconcerting than debilitating...

•

...but we also had quite a few contestants that would run
their RSS up and crash; there was clearly a larger issue:


February 2011: 0.4.0

•

In February 2011, we deployed our first major
node.js-based service (on 0.4.0)

•

Service was able to be built remarkably quickly — but
with some pain-points around Connect

•

Despite being potentially a compute-bound service,
CPU consumption was (again) a non-issue

•

And with an updated node (and many fixed node leaks),
memory consumption wasn’t necessarily as acute...

•

…but we hit our first “spinning black hole” problem


January 2011: node-dtrace-provider

•

Our DTrace probes in node were proving to be too
low-level for higher-level services; we needed to allow
USDT probes to be expressed in JavaScript

•

Fortunately, DTrace community member Chris Andrews
extended his libusdt to node.js, allowing statically
defined probes in JavaScript, e.g.:
var d = require('dtrace-provider');
var dtp = d.createDTraceProvider('foo');
// 'json' declares one argument, passed to DTrace as a JSON string
var probe = dtp.addProbe('foo-start', 'json');
dtp.enable();
probe.fire(function(p) {
    return ([ { bar: 123, baz: 'bar' } ]);
});


April 2011: Restify

•

Based on our experiences with Connect/Express, we
wanted to build a node module that was purpose-built to
implement HTTP-based API endpoints

•

Based on Chris Andrews’ work, we wanted to have
first-class support for DTrace

•

Joyent’s Mark Cavage developed node-restify, which
quickly became the foundation for all of our services

•

Built-in DTrace support allows full observability into
per-route/per-handler latency, a capability that we could
not live without at this point


November 2011: MDB support for V8

•

In mid-2011, Joyent’s Dave Pacheco dared to dream the
impossible dream: full postmortem support for V8 for
MDB, the debugger native to SmartOS

•

Several unspeakable layer violations later, mdb_v8 brought
postmortem debugging to node.js

•

::jsstack prints full stack including both native C++
frames and JavaScript frames

•

::jsprint prints JavaScript objects from the dump

•

Thanks to mdb_v8, we were able to go back to a core
dump from that infinite loop in our service deployed
several months earlier, and nail it

December 2011: DTrace ustack helper

•

mdb_v8 was actually a way station to an even bolder
dream: a DTrace ustack helper for node.js

•

A ustack helper is a bit of code that accompanies a
binary and assists DTrace in probe context to resolve
stack frames to their higher-level names

•

Once completed, allows user-level stack traces to be
associated with in-kernel events such as profiling events

•

Can use the DTrace profile provider to determine how a
node.js program is consuming CPU via stack sampling


December 2011: Flame graphs

•

Poring over stack traces can make hot functions
difficult to visualize

•

Joyent’s Brendan Gregg developed flame graphs, which
allow us to easily visualize thousands of sampled
stacks:
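
The aggregation step behind this kind of visualization can be sketched in a few lines. This is an illustrative sketch only, not Brendan Gregg's tooling: `foldStacks` is a hypothetical helper, the sample stacks are invented, and the output format (semicolon-joined frames followed by a count) is assumed here because it is the folded text form commonly fed to flame graph generators.

```javascript
// Collapse sampled stacks into "frame;frame;frame count" lines.
// Each sample is an array of frame names ordered from the outermost caller
// to the function that was on-CPU when the sample was taken.
function foldStacks(samples) {
  const counts = new Map();
  for (const frames of samples) {
    const key = frames.join(';');
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return Array.from(counts, ([stack, count]) => stack + ' ' + count).sort();
}

// Hypothetical samples, invented for illustration only.
const samples = [
  ['main', 'handleRequest', 'parseBody'],
  ['main', 'handleRequest', 'parseBody'],
  ['main', 'handleRequest', 'render'],
  ['main', 'gc']
];
const folded = foldStacks(samples);
console.log(folded.join('\n'));
```

Identical stacks collapse into a single line whose count determines how wide that stack is drawn, which is why thousands of samples remain readable once rendered.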


January 2012: Bunyan

•

Logging was becoming more and more of a problem for
us — especially as we were developing distributed
systems in node.js

•

Joyent’s Trent Mick developed node-bunyan, a simple
and fast JSON logging library for node.js

•

Provides standardized, JSON, line-based log output that
can be easily processed with JSON tools, e.g.:
{"name":"moray","hostname":"d1cfb6c7-c975-4ed8-a689-fb18f94b6bfc","pid":8393,"component":"manatee","path":"/manatee/sdc/election","level":20,"db":{"available":2,"max":15,"size":2,"waiting":0},"options":{"async":false,"read":true},"msg":"pg: entered","time":"2013-12-03T02:54:24.565Z","v":0}

•


Also includes command line tool, bunyan, for displaying
Bunyan logs
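
Because each record is one JSON object per line, processing the log needs nothing beyond a JSON parser. The sketch below is an assumption-laden illustration rather than part of Bunyan: `parseLogLines` and `summarize` are hypothetical helpers, and the sample record is a trimmed copy of the one on the slide. The numeric level table follows Bunyan's documented levels.

```javascript
// Bunyan's numeric levels mapped to their names.
const LEVEL_NAMES = {
  10: 'trace', 20: 'debug', 30: 'info', 40: 'warn', 50: 'error', 60: 'fatal'
};

// Parse line-delimited JSON log text, skipping blank or non-JSON lines.
function parseLogLines(text) {
  const records = [];
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue;
    try {
      records.push(JSON.parse(line));
    } catch (err) {
      // Not a JSON record (e.g. stray output on stdout); skip it.
    }
  }
  return records;
}

// Render one record as a short human-readable line.
function summarize(record) {
  const level = LEVEL_NAMES[record.level] || String(record.level);
  return record.time + ' ' + level.toUpperCase() + ' ' +
    record.name + '/' + record.component + ': ' + record.msg;
}

// Trimmed version of the record shown on the slide, plus a non-JSON line.
const sampleLog =
  '{"name":"moray","pid":8393,"component":"manatee","level":20,' +
  '"msg":"pg: entered","time":"2013-12-03T02:54:24.565Z","v":0}\n' +
  'not a json line\n';

const logRecords = parseLogLines(sampleLog);
logRecords.forEach((record) => console.log(summarize(record)));
```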

February 2012: npm shrinkwrap

•

npm allows for fine-grained semver control over
package dependencies, but we found that nested
dependencies could result in non-replicable installs

•

“npm shrinkwrap” generates a file that shrinkwraps all
nested dependencies into npm-shrinkwrap.json,
thereby locking down all nested versions

•

Guarantees that all installs will have same semver
versions of dependencies

•

Doesn’t necessarily guarantee identical installs,
however; for this, one needs private npm repositories


April 2012: node-vasync

•

There are a number of modules that deal with some of
the mechanics of asynchronous control flow…

•

But we found we needed one that emphasized debugging

•

node-vasync captures a number of popular flow patterns
and allows state to be inspected via MDB
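
The idea of a flow-control helper whose state can be inspected can be sketched without any library. The code below is a hypothetical illustration, not node-vasync's API or implementation: `parallelWithState` and the two example functions are invented names. The sketch returns a plain state object that is updated in place, so the status of every outstanding operation is visible in a debugger or a core dump.

```javascript
// Run every function in `funcs` and return a plain state object immediately.
// The same object is updated in place as operations complete, so it can be
// examined while work is still outstanding.
function parallelWithState(funcs, callback) {
  const state = {
    operations: funcs.map((func) => ({
      funcname: func.name || '(anon)',
      status: 'pending',
      err: null,
      result: undefined
    })),
    ndone: 0,
    nerrors: 0
  };

  funcs.forEach((func, i) => {
    func((err, result) => {
      const op = state.operations[i];
      if (op.status !== 'pending') return; // ignore duplicate callbacks
      op.status = err ? 'fail' : 'ok';
      op.err = err || null;
      op.result = result;
      state.ndone++;
      if (err) state.nerrors++;
      if (state.ndone === funcs.length) callback(state);
    });
  });

  return state;
}

// Hypothetical operations: one completes, one hangs forever.
const flowState = parallelWithState([
  function lookupUser(cb) { cb(null, { login: 'alice' }); },
  function fetchQuota(cb) { /* never calls back: simulates a hung backend */ }
], () => console.log('all operations finished'));

console.log(flowState.operations.map((op) => op.funcname + ': ' + op.status));
```

When a service hangs, an object like `flowState` shows which named operation never called back, which is much harder to recover from anonymous nested callbacks.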


May 2012: ::findjsobjects

•

Building on Dave Pacheco’s mdb_v8, we implemented a
debugger command that iterates over all of memory in a
core dump, looking for JavaScript objects

•

Entirely brute force, but allows one to take a swing at a
nasty node.js issue: semantic memory leaks
> ::findjsobjects
  OBJECT #OBJECTS #PROPS CONSTRUCTOR: PROPS
95709ac1      195      3 Object: socket, type, handle
957093f9       66      9 Object: uid, windowsVerbatimArguments, stdio, …
95f13181      130      5 <anonymous> (as exports.StringDecoder): …
8432ff55      222      3 Buffer: length, offset, parent
843304dd       91      9 Object: refreservation, creation, name, type, …
8432cc55       99      9 Object: time, msg, level, hostname, pid, action, …
95f08545       66     14 ChildProcess: _closesNeeded, stdio, …
8432f2e1      546      2 Array
9570cafd       47     24 Object: <sliced string>, <sliced string>, …
8432be95      415      3 Array
8432fb09       67     19 Socket: errorEmitted, _bytesDispatched, …
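
A semantic memory leak is one where objects stay reachable even though the program no longer needs them, so the garbage collector cannot reclaim them. The following is a minimal sketch with invented names (`inflight`, `startRequest`, `finishRequestLeaky`, `finishRequestFixed`), not code from any Joyent service. In a heap summary like the one above, a leak of this kind would be expected to show up as an unusually large count of objects sharing the same property names.

```javascript
// Hypothetical request tracker with a semantic leak: entries are added when
// a request starts, but the leaky "finish" path never removes them.
const inflight = new Map();

function startRequest(id, payload) {
  inflight.set(id, { id: id, payload: payload, startedAt: Date.now() });
}

function finishRequestLeaky(id) {
  // Bug: marks the request finished but never removes it, so the object
  // (and its payload) stays reachable from `inflight` indefinitely.
  const req = inflight.get(id);
  if (req) req.finished = true;
}

function finishRequestFixed(id) {
  // Dropping the last reference lets the garbage collector reclaim it.
  inflight.delete(id);
}

for (let i = 0; i < 1000; i++) {
  startRequest(i, Buffer.alloc(1024));
  finishRequestLeaky(i);
}
console.log('retained after leaky finish:', inflight.size);

for (let i = 0; i < 1000; i++) {
  finishRequestFixed(i);
}
console.log('retained after fixed finish:', inflight.size);
```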

May 2012: ::findjsobjects -p

•

Searching by property name allows one to find particular
objects in the JavaScript heap, e.g.:
> ::findjsobjects -p ip4addr | ::findjsobjects | ::jsprint -a
8432b109: {
ip4addr: 9aee115d: "10.88.88.200",
VLAN: 9aee1199: "0",
Host Interface: 9aee1185: "e1000g0",
Link Status: 9aee1175: "up",
MAC Address: 9aee113d: "02:08:20:47:93:82",
}
…

•

While designed for postmortem debugging, this allows
mdb_v8 to be used for in situ debugging in development

•

Also guides one to a best practice: towards unique
property names (which we have historically done in the
operating system via structure prefixing)


July 2012: node-fast

•

While HTTP makes it very easy to put together a
distributed system, parsing and connection
management can become prohibitively expensive

•

In building Manta, we found that we needed something
lighter/faster; Joyent’s Mark Cavage built node-fast

•

Only what you need: fully async/duplex/persistent
connections, simple on-wire protocol (JSON), etc.

•

None of what you don’t want: no IDL madness, no object
model, no binary translation madness, etc.

•

Deliberately light and limited — HTTP is still the right
answer until it isn’t


October 2012: Bunyan + DTrace

•

With all of our services using Bunyan, we could enable
dynamic logging by adding DTrace USDT probes

•

Can use the raw DTrace probes:
# dtrace -qn log-debug'{printf("%s\n", copyinstr(arg0))}' -x strsize=8k
{"name":"wf-moray-backend","hostname":"414ffb35-adee-47b7-bdf4-d21cb039386c","pid":10952,"component":"MorayClient","host":"10.99.99.17","port":2020,"req_id":"bddb180f-1770-edcf-8df2-b3a81d97e9b1","level":20,"bucket":"wf_runners","key":"414ffb35-adee-47b7-bdf4-d21cb039386c","value":{"active_at":"2013-12-03T07:22:25.125Z","idle":false},"msg":"putObject: entered","time":"2013-12-03T07:22:25.135Z","v":0}
...

•

Added the json() subroutine to DTrace to make this
easier to process

•

Can also use “bunyan -p” and avoid the lower-level
DTrace details entirely


May 2013: --abort-on-uncaught-exception

•

Crash dumps are great, but aborting only after an
uncaught exception has unwound the stack makes it very
difficult to determine the true origin of the exception

•

Dave Pacheco implemented a V8 patch to induce a
process abort (and a core dump) on an uncaught
exception

•

This allows us to use postmortem debugging to debug
our everyday logic errors

•

Available starting in 0.10.x — we use it wherever we
have it!
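
As a concrete illustration, here is a small sketch of the kind of everyday logic error this is aimed at. The functions `lookupPort` and `handleRequest`, the file name in the comment, and the `crash` argument are all hypothetical. Run with the flag named on this slide, the process would be expected to abort at the point of the throw, so that a core dump (if the operating system is configured to save one) still contains the frames that led to the error.

```javascript
// Hypothetical request handler containing an everyday logic error:
// it assumes every config object has a `listen` property.
function lookupPort(config) {
  return config.listen.port; // TypeError when `listen` is missing
}

function handleRequest(body) {
  return 'port=' + lookupPort(JSON.parse(body));
}

// Trigger the bug only when explicitly asked to, for example:
//   node --abort-on-uncaught-exception example.js crash
if (process.argv[2] === 'crash') {
  // Nothing catches this exception, so it is uncaught.
  setImmediate(() => handleRequest('{}'));
}
```

Without the flag, the same run prints a stack trace and exits, which leaves nothing to inspect after the fact beyond that trace.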


July 2013: Thoth

•

One of the most important systems we have built in
node is Manta, our object store featuring in situ compute

•

Manta is an excellent platform for building data-based
services — especially for large data objects

•

We built manta-thoth, a platform for core and crash
dump analysis that allows us to debug core dumps
without moving them

•

Thoth has become critically important for us to track and
automatically debug production node.js services


December 2013: Dump analysis on Linux

•

Postmortem debugging has been a (the) tremendous
breakthrough for node.js in production…

•

...but despite node’s postmortem support all being
open source, it has been limited to SmartOS

•

Some have toyed with porting MDB to Linux; this is in
principle possible, but will be rough sledding

•

Joyent’s TJ Fontaine (of node core fame) observed what
we had done with dump analysis on Manta and had a
simpler idea…

•

What about making Linux dumps consumable on
SmartOS — and therefore Manta?


December 2013: Linux support in libproc

•

Over the course of a multiday engineering hackathon,
TJ and Joyent’s Max Bruning added support for Linux
crash dumps in SmartOS’s libproc

•

Fortunately, because of the way the postmortem work
was done by Dave Pacheco, it Just Works

•

Do this yourself:
https://gist.github.com/tjfontaine/de104fe058300a51f7cf

•

For Linux users: put your Linux dumps to Manta, and
you can finally debug those pesky leaks and crashes!

•

Use --abort-on-uncaught-exception and you
can use Manta and postmortem debugging to debug
more quotidian programming errors!


Node.js in production!

•

For us at Joyent, the tooling that we have built into
node.js has resulted in what we believe to be the best
dynamic environment for production use

•

Yes, even when compared to much older platforms like
Java and Erlang...

•

There is still work to be done, especially around add-on
development (see TJ’s shim work!) and potentially better
bundling of objects…

•

We will continue to emphasize production deployment
and use in our stewardship of node.js!


Thank you

•

@dapsays, the Patron Saint of node.js in production, for
DTrace support, MDB support, node-vasync, Manta, etc.

•

@mcavage for node-restify, node-fast, Manta, etc.

•

@trentmick for node-bunyan

•

@chrisandrews for node-dtrace-provider

•

@brendangregg for flame graphs

•

@tjfontaine for bringing postmortem debugging to an
entirely new audience with Linux support for libproc!
