Eliminating every last bit of downtime caused by deployment and application errors takes some work. Learn how a combination of domains, sensible handling of uncaught exceptions, graceful connection termination, and process management with the cluster module and its friends can give you confidence that your application is always available.
18. try / catch doesn't do async.
    try {
      var f = function() {
        throw new Error("uh-oh");
      };
      setTimeout(f, 100);
    } catch (ex) {
      console.log("try / catch won't catch", ex);
    }
19. Domains are a bit like try / catch for async.
    var d = require('domain').create();
    d.on('error', function (err) {
      console.log("domain caught", err);
    });
    var f = d.bind(function() {
      throw new Error("uh-oh");
    });
    setTimeout(f, 100);
26. In Express, this might look like:
    var domainWrapper = function(req, res, next) {
      var reqDomain = domain.create();
      reqDomain.add(req);
      reqDomain.add(res);
      reqDomain.once('error', function(err) {
        res.send(500); // or next(err);
      });
      reqDomain.run(next);
    };
Based on
https://github.com/brianc/node-domain-middleware
https://github.com/mathrawka/express-domain-errors
27. Domain methods.
add: bind an EventEmitter to the domain.
run: run a function in the context of the domain.
bind: bind one function.
intercept: like bind, but handles the 1st arg (err).
dispose: cancels IO and timers.
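A minimal sketch of run and intercept working together, assuming Node's built-in domain module (deprecated in later Node releases, though still shipped). readConfig is a hypothetical async function made up for illustration; it is not part of any library.

```javascript
var domain = require('domain');

var d = domain.create();
var lastError = null;
var lastPort = null;

// One error handler for everything routed through this domain.
d.on('error', function (err) {
  lastError = err;
  console.log('domain caught:', err.message);
});

// intercept: an Error in the first argument goes to the domain's
// 'error' handler; otherwise the callback gets the remaining arguments.
var onConfig = d.intercept(function (config) {
  lastPort = config.port;
  console.log('config loaded, port', config.port);
});

// Hypothetical async function (not a real API) that follows the
// Node callback convention: callback(err, result).
function readConfig(shouldFail, callback) {
  setImmediate(function () {
    if (shouldFail) {
      callback(new Error('config missing'));
    } else {
      callback(null, { port: 8080 });
    }
  });
}

// run: execute a function in the context of the domain.
d.run(function () {
  readConfig(false, onConfig);
  readConfig(true, onConfig);
});
```

The callback passed to intercept never sees the err argument, so the "if (err)" check that usually opens a Node callback moves into the single domain error handler.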
34. Cluster module.
Node = one thread per process.
Most machines have multiple CPUs.
One process per CPU = cluster.
35. master / workers
1 master process forks n workers.
Master and workers communicate state via IPC.
When workers want to listen to a socket, master registers them for it.
Each new connection to socket is handed off to a worker.
No shared application state between workers.
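The master / worker split above can be sketched roughly as follows. This is an illustration under assumptions, not a production setup: the "start" command-line argument and port 8000 are arbitrary choices made here, and startMaster takes the cluster object as a parameter only so it can be exercised with a stand-in.

```javascript
var cluster = require('cluster');
var http = require('http');
var os = require('os');

// Master: fork one worker per CPU.
function startMaster(clusterApi, workerCount) {
  for (var i = 0; i < workerCount; i++) {
    clusterApi.fork();
  }
  return workerCount;
}

// Worker: every worker calls listen() on the same port. The master
// registers them and hands each new connection to one worker.
function startWorker(port) {
  return http.createServer(function (req, res) {
    res.end('handled by worker ' + process.pid + '\n');
  }).listen(port);
}

// Assumption: launched as `node app.js start`. Forked workers inherit
// the same arguments, so they take the else branch below.
// cluster.isMaster is also exposed as cluster.isPrimary in newer Node.
if (process.argv[2] === 'start') {
  if (cluster.isMaster) {
    startMaster(cluster, os.cpus().length);
  } else {
    startWorker(8000);
  }
}
```

Because workers share no application state, anything that must survive across requests (sessions, caches) has to live outside the worker process.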
38. Another use case for cluster:
Deployment.
Want to replace all existing servers.
Something must manage that = cluster master process.
39. Zero downtime deployment.
When master starts, give it a symlink to worker code.
After deploying new code, update the symlink.
Send signal to master: fork new workers!
Master tells old workers to shut down, forks new workers from new code.
Master process never stops running.
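One way the master's reload step might look. This sketch assumes the master was configured to fork the worker script through the symlink, so each fork picks up whatever the symlink points to at that moment. It also forks each replacement before disconnecting the old worker, which is one possible ordering chosen here to avoid a window with no listener. reloadWorkers is a name invented for this example.

```javascript
// Replace every running worker with a freshly forked one.
// clusterApi is expected to look like Node's cluster module in the
// master process: a `workers` map and a `fork()` method.
function reloadWorkers(clusterApi) {
  var oldWorkers = Object.keys(clusterApi.workers).map(function (id) {
    return clusterApi.workers[id];
  });

  oldWorkers.forEach(function (oldWorker) {
    var replacement = clusterApi.fork();

    // Once the replacement accepts connections, ask the old worker to
    // stop accepting new ones and let its existing connections finish.
    replacement.once('listening', function () {
      oldWorker.disconnect();
    });
  });

  return oldWorkers.length;
}

// Possible wiring in the master process:
//   process.on('SIGHUP', function () {
//     reloadWorkers(require('cluster'));
//   });
```

The master itself never restarts; only the set of workers it manages is swapped out.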
40. Signals.
A way to communicate with running processes.
SIGHUP: reload workers (some like SIGUSR2).
    $ kill -s HUP <pid>
    $ service <node-service-name> reload
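On the receiving side, Node delivers signals as events on the process object. A small sketch, where onReloadSignal is a placeholder for whatever the master does to replace its workers:

```javascript
var reloadRequests = 0;

// Placeholder reload hook; a real master would fork new workers here.
function onReloadSignal() {
  reloadRequests += 1;
  console.log('SIGHUP received by pid ' + process.pid + ', reloading workers');
}

// Registering a listener replaces the default action for SIGHUP,
// which would otherwise terminate the process.
process.on('SIGHUP', onReloadSignal);

// From another shell: kill -s HUP <pid>
```

A signal listener does not keep the process alive by itself, so this only makes sense inside a long-running process such as the cluster master.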
51. 1. Call server.close.
    var afterErrorHook = function(err) {
      server.close(); // <- ensure no new connections
    }
52. 2. Shut down keep-alive connections.
    var afterErrorHook = function(err) {
      app.set("isShuttingDown", true); // <- set state
      server.close();
    }
    var shutdownMiddle = function(req, res, next) {
      if (app.get("isShuttingDown")) { // <- check state
        req.connection.setTimeout(1); // <- kill keep-alive
      }
      next();
    }
Idea from https://github.com/mathrawka/express-graceful-exit
53. 3. Then call process.exit in server.close callback.
    var afterErrorHook = function(err) {
      app.set("isShuttingDown", true);
      server.close(function() {
        process.exit(1); // <- all clear to exit
      });
    }
56. On startup:
Cluster master comes up (for example, via Upstart).
Cluster master forks workers from symlink.
Each worker's server starts accepting connections.
57. On deploy:
Point symlink to new version.
Send signal to cluster master.
Master tells existing workers to stop accepting new connections.
Master forks new workers from new code.
Existing workers shut down gracefully.
60. Back to where we started:
1. Sensibly handle uncaught exceptions.
We have minimized these by using domains.
But they can still happen.
61. Node docs say not to keep running.
An unhandled exception means your
application — and by extension node.js
itself — is in an undefined state. Blindly
resuming means anything could happen.
You have been warned.
http://nodejs.org/api/process.html#process_event_uncaughtexception
65. On uncaught exception:
Log error.
Server stops accepting new connections.
Worker tells cluster master it's done.
Master forks a replacement worker.
Worker exits gracefully when all connections are closed, or after timeout.
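The worker-side steps above could be sketched like this. createFatalErrorHandler and its options are names invented for this example, and the message sent to the master in the wiring comment is an assumed shape, not a cluster convention.

```javascript
// Build a handler for a fatal error in a worker. Collaborators are
// passed in:
//   server       - an http.Server (anything with close(callback))
//   notifyMaster - hypothetical function telling the master we are done
//   exit         - usually process.exit
//   timeoutMs    - how long to wait for open connections to finish
function createFatalErrorHandler(options) {
  var shuttingDown = false;

  return function onFatalError(err) {
    if (shuttingDown) { return; } // ignore further errors during shutdown
    shuttingDown = true;

    // Log error.
    console.error('uncaught exception:', (err && err.stack) || err);

    // Exit anyway if connections do not close within the timeout.
    var timer = setTimeout(function () {
      options.exit(1);
    }, options.timeoutMs);
    timer.unref();

    // Stop accepting new connections; exit once existing ones close.
    options.server.close(function () {
      clearTimeout(timer);
      options.exit(1);
    });

    // Tell the master so it can fork a replacement right away.
    options.notifyMaster();
  };
}

// Possible wiring inside a worker:
//   process.on('uncaughtException', createFatalErrorHandler({
//     server: server,
//     notifyMaster: function () { process.send({ type: 'worker-dying' }); },
//     exit: process.exit,
//     timeoutMs: 10000
//   }));
```

The handler never resumes normal work after the error. It only buys time for in-flight requests while the master brings up a replacement.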
67. People are also under the illusion that it is
possible to trace back [an uncaught]
exception to the http request that caused
it...
-felixge, https://github.com/joyent/node/issues/2582
78. Good reading:
Node.js Best Practice Exception Handling (some answers more helpful than others)
Remove uncaught exception handler?
Isaacs stands by killing on uncaught
Domains don't incur performance hits compared to try catch
Rejected PR to add domains to Mongoose, with discussion
Don't call enter / exit across async
Comparison of naught and forever
What's changing in cluster