Debugging node in prod

Learn how we debug our Node apps in production at Netflix

Published in: Technology

  1. 1. Debugging Node.js in Prod Yunong Xiao @yunongx Software Engineer Node Platform November 2015
  2. 2. Node.js @ Netflix ❖ 65+ Million Subscribers ❖ Website (netflix.com) ❖ Dynamic asset packager ❖ PaaS on Node ❖ Internal Services
  3. 3. –Gene Kranz, Flight Director, Apollo 13 “Let's work the problem, people. Let's not make things any worse by guessing”
  4. 4. Apply the Scientific Method 1. Construct a Hypothesis 2. Collect data 3. Analyze data and draw a conclusion 4. Repeat
  5. 5. Production Crisis ❖ Runtime Performance ❖ Runtime Crashes ❖ Memory Leaks
  6. 6. Netflix is “Slow”
  7. 7. Gather Request Data http://restify.com http://github.com/restify/node-restify Observable REST Framework
  8. 8. restify to the Rescue [2014-12-09T14:07:26.293Z] INFO: shakti/restify-audit/20067: handled: 200, latency=1402 (req_id=b3fa3820-7fac-11e4-8908-a5c7b70d676f, latency=1435) GET / HTTP/1.1 host: www.netflix.com -- HTTP/1.1 200 OK x-netflix.client.instance: i-057e47ef x-frame-options: DENY content-type: text/html -- req.timers: { "parseBody": 700123, "apiRPC": 301911, "render": 400031 }
  9. 9. req.timers: { "parseBody": 700123, "apiRPC": 301911, "render": 400031 } On CPU
  10. 10. CPU is Critical ❖ Node is essentially “single threaded” ❖ Cascading effect on ALL requests in process
  11. 11. req.timers: { "parseBody": 700123, "apiRPC": 301911, "render": 400031 } Can’t process ANY other request for 1.1 seconds (parseBody + render, on CPU)
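The arithmetic behind the "1.1 seconds" figure can be sketched as follows. This is a minimal illustration, not restify code: `summarizeTimers` is a hypothetical helper, the timers are assumed to be in microseconds, and treating `parseBody` and `render` as the CPU-bound handlers (with `apiRPC` waiting on I/O) is an assumption inferred from the slide.

```javascript
// Hypothetical helper: sums per-handler timers (assumed to be microseconds)
// and reports total latency plus the portion spent blocking the event loop.
function summarizeTimers(timers, onCpuHandlers) {
  let totalUs = 0;
  let onCpuUs = 0;
  Object.keys(timers).forEach(function (name) {
    totalUs += timers[name];
    // Assumption: handlers listed in onCpuHandlers are CPU-bound.
    if (onCpuHandlers.indexOf(name) !== -1) {
      onCpuUs += timers[name];
    }
  });
  return {
    totalMs: Math.round(totalUs / 1000),
    onCpuMs: Math.round(onCpuUs / 1000)
  };
}

// Values taken from the req.timers output on the slide.
const timers = { parseBody: 700123, apiRPC: 301911, render: 400031 };
const summary = summarizeTimers(timers, ['parseBody', 'render']);
console.log(summary);
```

The total lines up with the `latency=1402` value in the audit log, and the on-CPU portion is the time during which no other request in the process can make progress.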
  12. 12. How Much Code? $ find . -name "*.js*" | xargs cat | wc -l 6 042 301
  13. 13. Statistically Sample Stack Traces
  14. 14. Snapshot What’s Currently Executing Stacktrace: A stack trace is a report of the active stack frames at a certain point in time during the execution of a program. > console.log(ex, ex.stack.split("\n")) ReferenceError: ex is not defined at repl:1:13 at REPLServer.defaultEval (repl.js:132:27) at bound (domain.js:254:14) at REPLServer.runBound [as eval] (domain.js:267:12) at REPLServer.<anonymous> (repl.js:279:12) at REPLServer.emit (events.js:107:17) at REPLServer.Interface._onLine (readline.js:214:10) at REPLServer.Interface._line (readline.js:553:8) at REPLServer.Interface._ttyWrite (readline.js:830:14) at ReadStream.onkeypress (readline.js:109:10)
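A stack snapshot can also be taken from inside the process. The sketch below is only an illustration of what a stack trace contains; `captureStack`, `outer`, and `inner` are hypothetical names, and this in-process approach is not the sampling technique the talk goes on to use.

```javascript
// Capture the active stack frames at the point of the call.
function captureStack() {
  // The first line of Error#stack is the error message; the rest are frames.
  return new Error().stack
    .split('\n')
    .slice(1)
    .map(function (line) {
      // Strip the leading whitespace and the "at " prefix V8 adds per frame.
      return line.trim().replace(/^at /, '');
    });
}

function inner() { return captureStack(); }
function outer() { return inner(); }

const frames = outer();
// Innermost frame first: captureStack, then inner, then outer.
console.log(frames.slice(0, 3));
```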
  15. 15. Two Problems 1) How to sample stack traces from a running process? 2) How to do 1) without affecting the process?
  16. 16. Linux Perf Events PERF(1) perf Manual PERF(1) NAME perf - Performance analysis tools for Linux SYNOPSIS perf [--version] [--help] COMMAND [ARGS] DESCRIPTION Performance counters for Linux are a new kernel-based subsystem that provide a framework for all things performance analysis. It covers hardware level (CPU/PMU, Performance Monitoring Unit) features and software features (software counters, tracepoints) as well.
  17. 17. Sample Stack Traces w/ perf(1) # perf record -F 99 -p `pgrep -n node` -g -- sleep 30 [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.524 MB perf.data (~22912 samples) ]
  18. 18. Sample Stack Trace ab2fee v8::internal::Heap::DeoptMarkedAllocationSites() (/apps/node/bin/ a69754 v8::internal::StackGuard::HandleInterrupts() (/apps/node/bin/node) c9f13b v8::internal::Runtime_StackGuard(int, v8::internal::Object** 3c793e3060bb (/tmp/perf-5382.map) 3c793e3060bb (/tmp/perf-5382.map) 3c793e3060bb (/tmp/perf-5382.map) 3c793e3060bb (/tmp/perf-5382.map) (repeated 30 more lines) 8e6b2f v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) (/apps/node/bin/node) 8f2281 v8::Function::Call(v8::Local<v8::Value>, int, v8::Local<v8::Value>*) (/apps/node/bin/node) df599a node::MakeCallback(node::Environment*, v8::Local<v8::Value>,... df5ccb node::CheckImmediate(uv_check_s*) (/apps/node/bin/node) fb1597 uv__run_check (/apps/node/bin/node) fabcee uv_run (/apps/node/bin/node) dfaa50 node::Start(int, char**) (/apps/node/bin/node) 7fcc3ef6876d __libc_start_main (/lib/x86_64-linux-gnu/libc-2.15.so) Missing JS Frames
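  19. 19. Why? V8 compiles JavaScript just in time (JIT), so symbols for JS functions exist only at runtime and perf(1) cannot resolve them from the node binary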
  20. 20. node --perf_basic_prof_only_functions “outputs the files in a format that the existing perf tool can consume.”
  21. 21. node --perf_basic_prof_only_functions Available right now in Node v5.x Coming soon to Node v4.x: https://github.com/nodejs/node/pull/3609
  22. 22. 3c793e446880 22c LazyCompile:~baseCallback /apps/node/webapp/node_modules/restify-errors/node_modules/lodash/index.js:1654 3c793e446b20 c4 LazyCompile:~baseReduce /apps/node/webapp/node_modules/restify-errors/node_modules/lodash/index.js:2519 3c793e446c60 330 LazyCompile:~ /apps/node/webapp/node_modules/restify-errors/node_modules/lodash/index.js:3040 3c793e447000 12c LazyCompile:~ /apps/node/webapp/node_modules/restify-errors/node_modules/lodash/index.js:2520 3c793e4471a0 2a4 LazyCompile:~ /apps/node/webapp/node_modules/restify-errors/lib/httpErrors.js:54 v8 Generated perf.map
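Each line of the generated map has the shape `<start address> <size> <symbol name>`, all in hex for the first two fields. The following sketch shows, under that assumption, how an address from a sampled stack could be matched to a JS function. `parsePerfMap` and `resolveAddress` are hypothetical helpers written for illustration; perf(1) does this lookup itself.

```javascript
// Two entries copied from the perf map shown on the slide.
const sampleMap = [
  '3c793e446880 22c LazyCompile:~baseCallback /apps/node/webapp/node_modules/restify-errors/node_modules/lodash/index.js:1654',
  '3c793e446b20 c4 LazyCompile:~baseReduce /apps/node/webapp/node_modules/restify-errors/node_modules/lodash/index.js:2519'
].join('\n');

// Parse "<hex start> <hex size> <name>" lines into address ranges.
function parsePerfMap(text) {
  const entries = [];
  text.split('\n').forEach(function (line) {
    const match = /^([0-9a-f]+) ([0-9a-f]+) (.+)$/.exec(line.trim());
    if (!match) return; // skip blank or malformed lines
    const start = parseInt(match[1], 16);
    entries.push({
      start: start,
      end: start + parseInt(match[2], 16), // exclusive upper bound
      name: match[3]
    });
  });
  return entries;
}

// Return the symbol whose range contains the address, or null if none does.
function resolveAddress(entries, address) {
  for (const entry of entries) {
    if (address >= entry.start && address < entry.end) {
      return entry.name;
    }
  }
  return null;
}

const entries = parsePerfMap(sampleMap);
console.log(resolveAddress(entries, 0x3c793e446b30));
```

A linear scan keeps the sketch short; a real symbolizer would sort the ranges and use binary search.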
  23. 23. Results node 5382 cpu-clock: 3c793e38b0c1 LazyCompile:DELETE native runtime.js:349 (/tmp/perf-5382.map) 3c793e31981d Builtin:JSConstructStubGeneric (/tmp/perf-5382.map) 3c793ff2ca94 (/tmp/perf-5382.map) 3c793e98a10f LazyCompile:~AtlasClient._run /apps/node/webapp/node_modules/nf-atlas-client/lib/client/AtlasClient.js:85 (/tmp/perf-5382.map) 3c793f47de29 LazyCompile:*AtlasClient.timer /apps/node/webapp/node_modules/nf-atlas-client/lib/client/AtlasClient.js:70 (/tmp/perf-5382.map) 3c793e9eee38 LazyCompile:~fetchSingleGetCallback /apps/node/webapp/singletons/ShaktiFetcher.js:120 (/tmp/perf-5382.map) 3c793f6cffee LazyCompile:*Model.get /apps/node/webapp/node_modules/nf-models/lib/Model.js:90 (/tmp/perf-5382.map) 3c793ed3e2ad (/tmp/perf-5382.map) 3c7940e4357b Handler:ca (/tmp/perf-5382.map) 3c793f060e3c Function:~ /apps/node/webapp/node_modules/vasync/lib/vasync.js:134 (/tmp/perf-5382.map) 3c79404edbfa (/tmp/perf-5382.map) 3c79401fd3f7 (/tmp/perf-5382.map) 3c79400e307b LazyCompile:*fetchMulti /apps/node/webapp/singletons/ShaktiFetcher.js:50 (/tmp/perf-5382.map) 3c793fb9a59f LazyCompile:*fetch /apps/node/webapp/singletons/ShaktiFetcher.js:32 (/tmp/perf-5382.map) 3c793e896697 (/tmp/perf-5382.map) 3c7943aaabbe (/tmp/perf-5382.map) 3c793ef4c53c Function:~ /apps/node/webapp/node_modules/vasync/lib/vasync.js:245 (/tmp/perf-5382.map) 3c793eaf4f01 LazyCompile:* /apps/node/webapp/node_modules/nf-packager/lib/index.js:194 (/tmp/perf-5382.map) 3c793eab130a LazyCompile:processImmediate timers.js:352 (/tmp/perf-5382.map) 3c793e319f7d Builtin:JSEntryTrampoline (/tmp/perf-5382.map) 3c793e3189e2 Stub:JSEntryStub (/tmp/perf-5382.map) a65baf v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, bool) (/apps/node/bin/node) 8e6b2f v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) (/apps/node/bin/node) 8f2281 v8::Function::Call(v8::Local<v8::Value>, int, v8::Local<v8::Value>*) (/apps/node/bin/node) df599a node::MakeCallback(node::Environment*, v8::Local<v8::Value>, v8::Local<v8::Function>, int, v8::Local<v8::Value>*) (/apps/node/bin/node) df5ccb node::CheckImmediate(uv_check_s*) (/apps/node/bin/node) fb1597 uv__run_check (/apps/node/bin/node) fabcee uv_run (/apps/node/bin/node) dfaa50 node::Start(int, char**) (/apps/node/bin/node) 7fcc3ef6876d __libc_start_main (/lib/x86_64-linux-gnu/libc-2.15.so) JS Frames Native Frames
  24. 24. Problem: Too Many Traces $ cat out.nodestacks01 | grep cpu-clock | wc -l 744 $ wc -l out.nodestacks01 58116
  25. 25. Too Many Traces
  26. 26. Solution: Flame Graphs
  27. 27. Flamegraph ❖ Each box represents a function in the stack (stack frame) ❖ x-axis: percent of time on CPU ❖ y-axis: stack depth ❖ colors: random, or can be a dimension ❖ https://github.com/brendangregg/FlameGraph Legend: v8, libc, JS, built-ins
  28. 28. Flame Graph Interpretation a() b() h() c() d() e() f() g() i()
  29. 29. Flame Graph Interpretation Top edge shows who is running on-CPU, and how much (width) a() b() h() c() d() e() f() g() i()
  30. 30. Flame Graph Interpretation Top-down shows ancestry e.g., from g(): h() d() e() i() a() b() c() f() g()
  31. 31. Flame Graph Interpretation a() b() h() c() d() e() f() g() i() Widths are proportional to presence in samples e.g., comparing b() to h() (incl. children)
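The two readings described on these slides (the top edge and the box widths) can be computed directly from sampled stacks. The sketch below is illustrative only: `summarize` is a hypothetical function, the four samples are made up, and the FlameGraph scripts use their own folded-stack format rather than this code.

```javascript
// Each sample is one sampled stack, listed root-first.
// These four samples are invented for illustration.
const samples = [
  ['a', 'b', 'c'],
  ['a', 'b', 'c'],
  ['a', 'b'],
  ['a', 'h', 'i']
];

function summarize(stacks) {
  const inclusive = new Map(); // samples in which the function appears at all
  const onCpu = new Map();     // samples in which the function is the top frame

  stacks.forEach(function (stack) {
    // A Set counts a recursive function once per sample.
    new Set(stack).forEach(function (fn) {
      inclusive.set(fn, (inclusive.get(fn) || 0) + 1);
    });
    const leaf = stack[stack.length - 1];
    onCpu.set(leaf, (onCpu.get(leaf) || 0) + 1);
  });

  // Convert raw counts into a fraction of all samples (the box width).
  const toShare = function (counts) {
    const out = {};
    counts.forEach(function (count, fn) {
      out[fn] = count / stacks.length;
    });
    return out;
  };

  return { inclusive: toShare(inclusive), onCpu: toShare(onCpu) };
}

const widths = summarize(samples);
console.log(widths.inclusive); // width of each box, children included
console.log(widths.onCpu);     // share of samples where the function was running
```

With these samples, `b()` is three times as wide as `h()` once children are included, which is the kind of comparison the slide describes.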
  32. 32. > 50% time on CPU
  33. 33. lodash!
  34. 34. function merge(object) { var args = arguments, length = 2; ...
  35. 35. Use _.assign() Instead
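The swap is only safe where a shallow copy is enough, so the difference in semantics is worth seeing. In the sketch below, `Object.assign` stands in for `_.assign()` and the hand-written `deepMerge` stands in for a recursive merge; neither is lodash's implementation, and the `defaults` and `overrides` objects are made up.

```javascript
// Simplified recursive merge, written for illustration only.
function deepMerge(target, source) {
  Object.keys(source).forEach(function (key) {
    const value = source[key];
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      // Make sure there is an object to merge into, then recurse.
      if (target[key] === null || typeof target[key] !== 'object') {
        target[key] = {};
      }
      deepMerge(target[key], value);
    } else {
      target[key] = value;
    }
  });
  return target;
}

const defaults = { retries: 3, headers: { accept: 'text/html' } };
const overrides = { headers: { 'x-frame-options': 'DENY' } };

// Deep merge walks every nested object and combines their keys.
const merged = deepMerge(deepMerge({}, defaults), overrides);

// Shallow assign copies top-level keys only; nested objects are replaced.
const assigned = Object.assign({}, defaults, overrides);

console.log(merged.headers);   // both header keys survive
console.log(assigned.headers); // only the override's headers remain
```

The shallow version does far less work per call, which is the point of the change, but it drops nested keys that a deep merge would have kept.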
  36. 36. Before
  37. 37. After
  38. 38. Flame Graphs Help You Find 1 LoC out of 6 Million
  39. 39. Results ❖ Dramatically reduced request latency ❖ Reduced CPU utilization ❖ Increased throughput
  40. 40. Runtime Performance Technique ❖ Sample stack traces via perf(1) ❖ Visualize code distribution with CPU flame graphs ❖ Identify candidate code paths for performance improvement ❖ Repeat
  41. 41. Runtime Crashes
  42. 42. - Chafin, R. "Pioneer F & G Telemetry and Command Processor Core Dump Program." JPL Technical Report XVI, no. 32-1526 (1971): 174. “The method described in this article was designed to provide a core dump… with a minimal impact on the spacecraft… as the resumption of data acquisition from the spacecraft is the highest priority.”
  43. 43. Core Dumps
  44. 44. Core Dumps
  45. 45. Core Dumps — A Brief History ❖ Magnetic core memory ❖ Dump out the contents of “core” memory for debugging ❖ “Core dump” was born ❖ Initially printed on paper! ❖ Postmortem debugging was born!
  46. 46. Production Constraints ❖ Uptime is critical ❖ Not easily reproducible ❖ Can’t simulate environment ❖ Resume normal operations ASAP
  47. 47. Postmortem Debugging ❖ Take core dump ❖ Restart app ❖ Load core dump elsewhere ❖ Engineer: Debug, Fix ❖ Continue serving traffic
  48. 48. Configure Node to Dump Core on Error $ node --abort_on_uncaught_exception throw.js Uncaught Error FROM Object.<anonymous> (/Users/yunong/throw.js:1:63) Module._compile (module.js:435:26) Object.Module._extensions..js (module.js:442:10) Module.load (module.js:356:32) Function.Module._load (module.js:311:12) Function.Module.runMain (module.js:467:10) startup (node.js:134:18) node.js:961:3 [1] 4131 illegal hardware instruction (core dumped) node --abort_on_uncaught_exception throw.js
  49. 49. Node Post Mortem Tooling ❖ Netflix uses Linux in Prod ❖ Linux — Work in progress ❖ https://github.com/tjfontaine/lldb-v8 ❖ https://github.com/indutny/llnode ❖ Solaris — Full featured, compatible with Linux cores ❖ https://github.com/joyent/mdb_v8
  50. 50. Socks & Duct Tape: Set Up a Debug Solaris Instance EC2: http://omnios.omniti.com/wiki.php/Installation#IntheCloud VM: http://omnios.omniti.com/wiki.php/Installation#Quickstart
  51. 51. Post Mortem Methodology ❖ Where: Inspect stack trace ❖ Why: Inspect heap and stack variable state
  52. 52. mdb(1) JS commands ❖ ::help <cmd> ❖ ::jsstack ❖ ::jsprint ❖ ::jssource ❖ ::jsconstructor ❖ ::findjsobjects ❖ ::jsfunctions
  53. 53. Load the Core Dump # mdb ./node-v4.2.2-linux/node-v4.2.2-linux-x64/bin/node ./core.7186 > ::load ./mdb_v8_amd64.so mdb_v8 version: 1.1.1 (release, from 28cedf2) V8 version: 143.156.132.195 Autoconfigured V8 support from target C++ symbol demangling enabled (Annotations: linux node binary; core dump; load mdb_v8 module)
  54. 54. ::jsstack > ::jsstack js: test js: storeHeader js: <anonymous> (as OutgoingMessage._storeHeader) js: <anonymous> (as ServerResponse.writeHead) js: restifyWriteHead js: _cb js: send js: <anonymous> (as <anon>) js: <anonymous> (as ReactRenderer._renderLayout) js: <anonymous> (as <anon>) js: <anonymous> (as <anon>) js: <anonymous> (as dispatchHandler) js: <anonymous> (as <anon>) js: runHooks js: runTransitionToHooks js: <anonymous> (as assign.to) js: <anonymous> (as <anon>) js: runHooks js: runTransitionFromHooks js: <anonymous> (as assign.from) js: <anonymous> (as React.createClass.statics.dispatch) native: _ZN2v88internalL6InvokeEbNS0_6HandleINS0_10JSFunctionEEENS1_INS0... native: v8::internal::Execution::Call+0xc8 native: v8::internal::Runtime_Apply+0x1ce frame type func name
  55. 55. Always name your functions! var foo = function foo() {}; Foo.prototype.bar = function bar() {}; foo(function bar() {});
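A quick way to see what tooling has to work with is the function's `name` property. The sketch below is a small illustration; `Foo`, `named`, `anonymous`, and `nameOfCallback` are hypothetical names, and debuggers such as mdb_v8 may additionally show an inferred name (the `<anonymous> (as ...)` entries in the `::jsstack` output).

```javascript
function Foo() {}

// Named function expression: the function itself carries the name.
Foo.prototype.named = function named() {};

// Anonymous function assigned to a property: no name of its own.
Foo.prototype.anonymous = function () {};

// Report the function's own name, or a placeholder when it has none.
function nameOfCallback(cb) {
  return cb.name || '<anonymous>';
}

console.log(nameOfCallback(Foo.prototype.named));
console.log(nameOfCallback(Foo.prototype.anonymous));
console.log(nameOfCallback(function bar() {}));
console.log(nameOfCallback(function () {}));
```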
  56. 56. ::jsstack -vn0 Frame and Function Args > ::jsstack -vn0 js: test file: native regexp.js posn: position 2677 this: 2421205bd4d9 (JSRegExp) arg1: 34d5391d8859 (SeqAsciiString) js: storeHeader file: http.js posn: position 18774 this: 2ad561306c91 (<unknown>) arg1: 3bd67e0669b9 (JSObject: ServerResponse) arg2: 3dfe966ae299 (JSObject: Object) arg3: 34d5391d8859 (SeqAsciiString) arg4: 34d5391d8881 (SeqAsciiString) js: <anonymous> (as OutgoingMessage._storeHeader) file: http.js posn: position 15652 this: 3bd67e0669b9 (JSObject: ServerResponse) arg1: 3dfe966ae271 (ConsString) arg2: 3dfe966add99 (JSObject: Object) js: restifyWriteHead file: /apps/node/webapp/node_modules/restify/lib/response.js posn: position 6964 this: 3bd67e0669b9 (JSObject: ServerResponse) (1 internal frame elided) js: _cb Func Name JS File Line # Func Args
  57. 57. ::jsstack -v Frame Source > ::jsstack -v js: storeHeader file: http.js posn: position 18774 this: 2ad561306c91 (<unknown>) arg1: 3bd67e0669b9 (JSObject: ServerResponse) arg2: 3dfe966ae299 (JSObject: Object) arg3: 34d5391d8859 (SeqAsciiString) arg4: 34d5391d8881 (SeqAsciiString) 652 653 function storeHeader(self, state, field, value) { 654 // Protect against response splitting. The if statement is there to 655 // minimize the performance impact in the common case. 656 if (/[\r\n]/.test(value)) 657 value = value.replace(/[\r\n]+[ \t]*/g, ''); 658 659 state.messageHeader += field + ': ' + value + CRLF; 660 661 if (connectionExpression.test(field)) { 662 state.sentConnectionHeader = true; 663 if (closeExpression.test(value)) { 664 self._last = true; 665 } else { 666 self.shouldKeepAlive = true; 667 } 668
  58. 58. ::jsstack Function Args > ::jsstack -vn0 js: test file: native regexp.js posn: position 2677 this: 2421205bd4d9 (JSRegExp) arg1: 34d5391d8859 (SeqAsciiString) js: storeHeader file: http.js posn: position 18774 this: 2ad561306c91 (<unknown>) arg1: 3bd67e0669b9 (JSObject: ServerResponse) arg2: 3dfe966ae299 (JSObject: Object) arg3: 34d5391d8859 (SeqAsciiString) arg4: 34d5391d8881 (SeqAsciiString) js: <anonymous> (as OutgoingMessage._storeHeader) file: http.js posn: position 15652 this: 3bd67e0669b9 (JSObject: ServerResponse) arg1: 3dfe966ae271 (ConsString) arg2: 3dfe966add99 (JSObject: Object) js: restifyWriteHead file: /apps/node/webapp/node_modules/restify/lib/response.js posn: position 6964 this: 3bd67e0669b9 (JSObject: ServerResponse) (1 internal frame elided) js: _cb Memory Address of Var Var Type
  59. 59. ::jsprint Print JS Objects > 3bd67e0669b9::jsprint { "_time": 1437690472539, "_headers": { "content-type": "text/html", "req_id": "5b7f18f2-7f12-4c68-b07f-3cd75698ba65", "set-cookie": "CENSORED; Domain=.netflix.com; Expires=Fri, 24 Jul 2015 10:27:52 GMT", "x-frame-options": "DENY", "x-ua-compatible": "IE=edge", "x-netflix.client.instance": "i-c420596c", }, "output": [], "_last": false, "_hangupClose": false, "_hasBody": true, "socket": { "_connecting": false, "_handle": [...], "_readableState": [...], "readable": true, "domain": null, "_events": [...], "_maxListeners": 10, "_writableState": [...], "writable": true, "allowHalfOpen": true, Actual JS Object Instance
  60. 60. ::jsconstructor Show Object Constructor > 3bd67e0669b9::jsconstructor -v ServerResponse (JSFunction: 2421205bced9)
  61. 61. ::jssource Print f() Source > 2421205bced9::jssource file: http.js 1066 function ServerResponse(req) { 1067 OutgoingMessage.call(this); 1068 1069 if (req.method === 'HEAD') this._hasBody = false; 1070 1071 this.sendDate = true; 1072 1073 if (req.httpVersionMajor < 1 || req.httpVersionMinor < 1) { 1074 this.useChunkedEncodingByDefault = chunkExpression.test(req.headers.te); 1075 this.shouldKeepAlive = false; 1076 } 1077 } 1078 util.inherits(ServerResponse, OutgoingMessage);
  62. 62. Core Dump === Complete Process State
  63. 63. Memory Leaks
  64. 64. Memory Leaks
  65. 65. Generate Core Dump Ad-hoc
  66. 66. gcore(1) GNU Tools gcore(1) NAME gcore - Generate a core file for a running process SYNOPSIS gcore [-o filename] pid
  67. 67. Take a Core Dump! root@demo:~# gcore `pgrep node` [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [New Thread 0x7facaeffd700 (LWP 5650)] [New Thread 0x7facaf7fe700 (LWP 5649)] [New Thread 0x7facaffff700 (LWP 5648)] [New Thread 0x7facbc967700 (LWP 5647)] [New Thread 0x7facbd168700 (LWP 5617)] [New Thread 0x7facbd969700 (LWP 5616)] [New Thread 0x7facbe16a700 (LWP 5615)] [New Thread 0x7facbe96b700 (LWP 5614)] 0x00007facbea5b5a9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6 Saved corefile core.5602
  68. 68. Problem: Find Leaking Objects
  69. 69. ::findjsobjects NAME findjsobjects - find JavaScript objects SYNOPSIS [ addr ] ::findjsobjects [-vb] [-r | -c cons | -p prop]
  70. 70. ::findjsobjects Find ALL JS Objects on Heap > ::findjsobjects OBJECT #OBJECTS #PROPS CONSTRUCTOR: PROPS ... 3dfe97453121 18 6721 Array 157a020e01 1304 101 <anonymous> (as Constructor): ... 8f1a53211 13879 12 ReactDOMComponent: _tag, tagName, props, ... 8f1a05691 85776 2 Array 3dfe97451a99 36 5589 Array 23e5d7d44351 1 218020 Object: .2f5hpw2hgjk.1.0.3, ... 8f1a05f31 40533 6 <anonymous> (as ReactElement): type, ... 8f1a04da1 252133 1 Array 8f1a04dc1 125869 7 Array 8f1a04f01 114914 8 Array 8f1a04d39 230924 7 Module: id, exports, parent, filename, ...
  71. 71. Memory Leak Strategy ❖ Look at objects on heap for suspicious objects ❖ Take successive core dumps and compare object counts ❖ Growing object counts are likely leaking ❖ Inspect object for more context ❖ Walk reverse references to find root object
  72. 72. Look at Object Delta Between Successive Core Dumps
  73. 73. Uptime = 45 mins > ::findjsobjects OBJECT #OBJECTS #PROPS CONSTRUCTOR: PROPS ... 8f1a04d39 230924 7 Module: id, exports, parent, filename, ...
  74. 74. Uptime = 90 mins > ::findjsobjects OBJECT #OBJECTS #PROPS CONSTRUCTOR: PROPS ... 8f1a04d39 323454 7 Module: id, exports, parent, filename, ...
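Comparing the two dumps amounts to a diff of object counts. The sketch below assumes the `::findjsobjects` output has already been reduced to one count per constructor, which is a simplification, since the tool groups objects by shape. `growingConstructors` is a hypothetical helper; the `Module` counts come from the slides, while the unchanged `ReactDOMComponent` count in the second dump is an assumption added for contrast.

```javascript
// Counts per constructor at each point in time.
const at45Minutes = { Module: 230924, ReactDOMComponent: 13879 };
// Assumption: ReactDOMComponent is unchanged in the second dump.
const at90Minutes = { Module: 323454, ReactDOMComponent: 13879 };

// List constructors whose object count grew, largest growth first.
function growingConstructors(before, after) {
  return Object.keys(after)
    .map(function (name) {
      const previous = before[name] || 0;
      return { name: name, before: previous, after: after[name], delta: after[name] - previous };
    })
    .filter(function (row) { return row.delta > 0; })
    .sort(function (a, b) { return b.delta - a.delta; });
}

const suspects = growingConstructors(at45Minutes, at90Minutes);
console.log(suspects);
```

Growth alone does not prove a leak, since caches warm up and traffic varies, but a count that keeps climbing across several dumps is a strong candidate for closer inspection.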
  75. 75. Analyze Leaked Objects
  76. 76. Representative Object > ::findjsobjects OBJECT #OBJECTS #PROPS CONSTRUCTOR: PROPS ... 8f1a04d39 323454 7 Module: id, exports, parent, filename, ... Representative Object, 1 of 323454
  77. 77. Look Closer > 8f1a04d39::jsprint { "id": "/apps/node/webapp/ui/js/pages/akiraClient.js", "exports": {}, "parent": { "id": "/apps/node/webapp/middleware/autoClientStrings.js", "exports": function autoExposeClientStrings, "parent": [...], "filename": "/apps/node/webapp/middleware/autoClientStrings.js", "loaded": true, "children": [...], "paths": [...], }, "filename": "/apps/node/webapp/ui/js/pages/akiraClient.js",
  78. 78. Use ::findjsobjects to Find All “Module” Objects > 8f1a04d39::findjsobjects 8f1a04d39 3fd996bffb39 3fd996bfcff1 3fd996bfbac1 3fd996bf8a19 3fd996bf7949 3fd996bf3ce9 3fd996bf0f19 3fd996bead71 3fd996bea821 3fd996bea001 3fd996be92b1 3fd996be73d1 3fd996be58d1 3fd996bd88b1 3fd996bcb459 3fd996bcaa41 3fd996bc7009
  79. 79. Analyze All 320K+ Objects?
  80. 80. Custom Querying With Pipes and Unix Tools 8f1a04d39::findjsobjects | ::jsprint ! grep filename | sort | uniq -c
  81. 81. Results ... 1 "filename": "/apps/node/webapp/ui/js/akira/components/messaging/paymentHold.js", 2 "filename": "/apps/node/webapp/ui/js/common/commonCore.js", 1 "filename": "/apps/node/webapp/ui/js/common/playPrediction/playPrediction.js", 3 "filename": "/apps/node/webapp/ui/js/common/presentationTracking/presentationTracking.js", 111061 "filename": "/apps/node/webapp/ui/js/common/playPrediction/playPrediction.js", 7103 "filename": "/apps/node/webapp/ui/js/pages/reactClientRender.js", 111061 "filename": "/apps/node/webapp/ui/js/pages/akiraClient.js", 118257 "filename": "/apps/node/webapp/middleware/autoClientStrings.js", ... Client Side Modules
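The `grep filename | sort | uniq -c` stage of the pipeline is a plain frequency count. For readers less familiar with those tools, the same aggregation can be sketched in JavaScript; `countFilenames` is a hypothetical helper, and the input lines are a made-up sample in the shape of `::jsprint` output.

```javascript
// Made-up sample of ::jsprint output lines.
const jsprintLines = [
  '    "id": "/apps/node/webapp/ui/js/pages/akiraClient.js",',
  '    "filename": "/apps/node/webapp/ui/js/pages/akiraClient.js",',
  '    "filename": "/apps/node/webapp/middleware/autoClientStrings.js",',
  '    "filename": "/apps/node/webapp/ui/js/pages/akiraClient.js",',
  '    "filename": "/apps/node/webapp/middleware/autoClientStrings.js",',
  '    "filename": "/apps/node/webapp/ui/js/pages/akiraClient.js",'
];

// Count how often each filename appears, most frequent first.
function countFilenames(lines) {
  const counts = new Map();
  lines.forEach(function (line) {
    // Equivalent of `grep filename`: ignore every other property.
    const match = /"filename":\s*"([^"]+)"/.exec(line);
    if (!match) return;
    counts.set(match[1], (counts.get(match[1]) || 0) + 1);
  });
  // Equivalent of `sort | uniq -c`, ordered by count.
  return Array.from(counts.entries()).sort(function (a, b) {
    return b[1] - a[1];
  });
}

const filenameCounts = countFilenames(jsprintLines);
console.log(filenameCounts);
```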
  82. 82. What’s holding on to these modules?
  83. 83. Aim: Find Root Object
  84. 84. Walk Reverse Refs with ::findjsobjects -r > 8f1a04d39::findjsobjects -r 8f1a04d39 referred to by 14fd6c5b13c1.parent
  85. 85. Root Object > 1f313791bb41::jsprint [ { "id": "/apps/node/webapp/ui/js/pages/akiraClient.js", "exports": [...], "parent": [...], "filename": "/apps/node/webapp/ui/js/pages/akiraClient.js", "loaded": false, "children": [...], "paths": [...], }, { "id": "/apps/node/webapp/ui/js/pages/akiraClient.js", "exports": [...], "parent": [...], "filename": "/apps/node/webapp/ui/js/pages/akiraClient.js", "loaded": false, "children": [...], "paths": [...], }, { "id": "/apps/node/webapp/ui/js/pages/akiraClient.js", "exports": [...], "parent": [...],
  86. 86. Spot the Leak var cache = {}; function checkCache(someModule) { var mod = cache[someModule]; if (!mod) { try { mod = require(someModule); cache[someModule] = mod; return mod; } catch (e) { /* Module could be client only, must catch. Should cache the fact we caught an exception here. */ return {}; } } return mod; }
  87. 87. Root Cause ❖ Node caches metadata for each module ❖ If require process throws an exception, the module metadata is leaked (bug?) ❖ Client side module meant we were throwing during every request, and not caching the fact we tried to require it ❖ Each request leaks 3+ module metadata objects
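One way to act on this root cause is to cache failed loads as well as successful ones, so a client-only module is required at most once per process. The sketch below is an assumption about what such a fix could look like, not the change Netflix shipped; `createModuleCache`, `flakyLoader`, and `client-only-module` are hypothetical names, and the loader is injected so the behaviour can be shown without touching the real module system.

```javascript
// Build a checkCache function around any loader (for example, require).
function createModuleCache(load) {
  const cache = {};
  return function checkCache(name) {
    // hasOwnProperty distinguishes "never tried" from "tried and got {}".
    if (Object.prototype.hasOwnProperty.call(cache, name)) {
      return cache[name];
    }
    let mod;
    try {
      mod = load(name);
    } catch (e) {
      // The module could be client only: remember the failure as {}.
      mod = {};
    }
    cache[name] = mod;
    return mod;
  };
}

// Hypothetical loader that always throws, as a client-only module would.
let attempts = 0;
function flakyLoader(name) {
  attempts += 1;
  throw new Error('Cannot load ' + name);
}

const checkCache = createModuleCache(flakyLoader);
checkCache('client-only-module');
checkCache('client-only-module');
checkCache('client-only-module');
console.log(attempts); // the loader ran only for the first call
```

In an application the cache would be created once with `createModuleCache(require)`, so repeated requests no longer trigger a throwing `require` each time.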
  88. 88. Memory Leaks ❖ Take successive core dumps (gcore(1)) ❖ Compare object counts (::findjsobjects) ❖ Growing objects are likely leaking ❖ Inspect object for more context (::jsprint) ❖ Walk reverse references to find root obj (::findjsobjects -r)
  89. 89. Post Mortem Debugging is Critical to Large Scale Node Deployments
  90. 90. More State than Just Logs ❖ Detailed stack trace (::jsstack) ❖ Function args for each frame (::jsstack -vn0) ❖ Get state of any object and its provenance (::jsprint, ::jsconstructor) ❖ Get source code of any function (::jssource) ❖ Find arbitrary JS objects (::findjsobjects) ❖ Unmodified Node binary!
  91. 91. Production Failures are Inevitable
  92. 92. But We Can Learn from Them
  93. 93. Production Debugging ❖ Runtime Performance ❖ CPU profiling/flame graphs ❖ Runtime Crashes ❖ Inspect program state with core dumps and mdb ❖ Memory leaks ❖ Analyze objects and references with core dumps and mdb
  94. 94. Use the Scientific Method
  95. 95. Epilogue: State of Tooling ❖ Join Working Group https://github.com/nodejs/post-mortem ❖ Help make mdb_v8 cross platform https://github.com/joyent/mdb_v8 ❖ Contribute to https://github.com/tjfontaine/lldb-v8 and https://github.com/indutny/llnode
  96. 96. Acknowledgements ❖ mdb_v8 ❖ Dave Pacheco, TJ Fontaine, Julien Gilli, Bryan Cantrill ❖ CPU Profiling/Flamegraphs ❖ Brendan Gregg, Google v8 team, Ali Ijaz Sheikh ❖ Linux Perf ❖ Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Peter Zijlstra ❖ lldb-v8 ❖ TJ Fontaine ❖ llnode ❖ Fedor Indutny
  97. 97. From Netflix: Node.js Page Fault Flame Graphs
  98. 98. Get Involved!
  99. 99. Thanks ❖ Questions? We’re Hiring! ❖ yunong@netflix.com ❖ @yunongx
  100. 100. Citations ❖ Slides 29-32 used with permission from “Java Mixed-Mode Flame Graphs”, Brendan Gregg, Oct 2015 ❖ Slide 26 used with permission from http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
