Top Node.js Metrics to Watch
Stefan Thies
Agenda
- Development of Node.js performance monitoring agents
- Node.js key metrics
Metrics Aggregation
- pre-aggregation in monitoring agent
- N measures per minute -> sum, max, min, avg, rates, percentiles
- aggregation plans in Sematext backend (Java / Hadoop)
- 1 min, 5 min, 1 hour, 1 day, 1 week, 1 month
- fast queries over long periods of time and multiple dimensions (e.g. filters for host, process/worker id)
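The pre-aggregation step above can be sketched as follows. This is a minimal illustration, not the agent's actual code: the function name `aggregate`, the result fields, and the nearest-rank percentile method are assumptions.

```javascript
// Minimal sketch: pre-aggregate the raw samples collected during one window.
// Function name and result shape are illustrative assumptions.
function aggregate(samples, windowSeconds = 60) {
  if (samples.length === 0) return null;
  const sorted = [...samples].sort((a, b) => a - b);
  const sum = sorted.reduce((acc, v) => acc + v, 0);
  // Nearest-rank percentile on the sorted samples.
  const percentile = (p) =>
    sorted[Math.max(0, Math.ceil((p / 100) * sorted.length) - 1)];
  return {
    count: sorted.length,
    sum,
    min: sorted[0],
    max: sorted[sorted.length - 1],
    avg: sum / sorted.length,
    ratePerSecond: sorted.length / windowSeconds,
    p50: percentile(50),
    p95: percentile(95),
  };
}

const result = aggregate([12, 7, 30, 18, 9, 25]);
console.log(result);
```

Sending one aggregated record per minute instead of N raw measures keeps the payload small, at the cost of losing the individual samples.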
Store & Forward
1) Buffer metrics when the receiver is not reachable ...
2) Re-transmit metrics, stored in NeDB
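A rough sketch of the two steps, under stated assumptions: `transport(batch, cb)` is a hypothetical sender, and a plain in-memory array stands in for the NeDB store mentioned above so the example stays self-contained.

```javascript
// Sketch of store & forward with an in-memory buffer.
// Assumption: `transport(batch, cb)` is a hypothetical sender that calls
// cb(err) on failure and cb(null) on success. An array stands in for NeDB.
class StoreAndForward {
  constructor(transport, maxBuffered = 1000) {
    this.transport = transport;
    this.maxBuffered = maxBuffered;
    this.buffer = [];
  }

  // Try to send immediately; keep the batch if the receiver is unreachable.
  send(batch) {
    this.transport(batch, (err) => {
      if (err) this.store(batch);
    });
  }

  store(batch) {
    this.buffer.push(batch);
    // Drop the oldest batches when the buffer is full.
    while (this.buffer.length > this.maxBuffered) this.buffer.shift();
  }

  // Re-transmit stored batches, oldest first; failed batches are stored again.
  retransmit() {
    const pending = this.buffer;
    this.buffer = [];
    for (const batch of pending) this.send(batch);
  }
}

// Usage with a fake transport that can be switched between down and up.
let receiverUp = false;
const delivered = [];
const fakeTransport = (batch, cb) => {
  if (!receiverUp) return cb(new Error('receiver not reachable'));
  delivered.push(batch);
  cb(null);
};

const forwarder = new StoreAndForward(fakeTransport);
forwarder.send({ cpu: 0.42 });
forwarder.send({ cpu: 0.55 });
receiverUp = true;
forwarder.retransmit();
console.log(delivered.length, forwarder.buffer.length); // prints: 2 0
```

A persistent store matters when the agent itself restarts during an outage; the in-memory version loses buffered metrics in that case.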
Node Client & Java Backend
Node.js client: async, non-blocking, main + event loop thread
Java server: threads, thread pool, limited, e.g. max 3
1 http.post(options, cb1)
3 http.post(options, cb2)
3 http.post(options, cb3)
4 http.post(options, cb4)
5 cb4 (err)
5 cb1
6 cb2
7 cb3
HTTP 500 internal server error
Luke be nice to Node.js Client
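One possible reading of the slide above: a non-blocking client can open more requests at once than a small server thread pool can handle, so the client should cap its in-flight requests. The sketch below illustrates that idea only. `createLimiter` and `task(done)` are hypothetical names, and the limit of 3 simply mirrors the "max 3" thread pool.

```javascript
// Sketch: cap the number of in-flight requests so a non-blocking client
// does not overrun a server with a small thread pool.
// `task(done)` is a hypothetical function that starts one request and
// calls done() when the response (or error) arrives.
function createLimiter(maxInFlight) {
  const queue = [];
  let inFlight = 0;

  function next() {
    if (inFlight >= maxInFlight || queue.length === 0) return;
    inFlight++;
    const task = queue.shift();
    task(() => {
      inFlight--;
      next();
    });
  }

  return {
    push(task) {
      queue.push(task);
      next();
    },
    stats: () => ({ inFlight, queued: queue.length }),
  };
}

// Usage: four requests against a limit of 3.
const limiter = createLimiter(3);
const finishers = [];
for (let i = 1; i <= 4; i++) {
  limiter.push((done) => finishers.push(done)); // stays open until done()
}
const beforeFirstResponse = limiter.stats(); // 3 in flight, 1 queued
finishers[0](); // first response arrives, the queued request starts
const afterFirstResponse = limiter.stats(); // 3 in flight, 0 queued
console.log(beforeFirstResponse, afterFirstResponse);
```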
“A stupid guy called ‘Travis’, made DoS attacks!!!”
8 Minute Unit Tests for network/storage test cases
30 seconds :)
OS Metrics limited in Node.js API
- Limited memory info: os.freemem(), os.totalmem()
- a few missing CPU metrics: os.cpus()
- No disk stats in Node.js API
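To make the limits concrete, here is a small sketch of what the `os` module does offer. The helper names `cpuSnapshot` and `cpuUsagePercent` are illustrative. Note that `os.cpus()` reports cumulative per-core times, so a usage percentage has to be derived from two snapshots.

```javascript
const os = require('os');

// Memory: only total and free bytes are exposed.
const totalMem = os.totalmem();
const freeMem = os.freemem();
const usedMemPercent = ((totalMem - freeMem) / totalMem) * 100;

// CPU: os.cpus() returns cumulative times (ms) per core,
// so a usage percentage needs two snapshots and a delta.
function cpuSnapshot() {
  let idle = 0;
  let total = 0;
  for (const cpu of os.cpus()) {
    for (const value of Object.values(cpu.times)) total += value;
    idle += cpu.times.idle;
  }
  return { idle, total };
}

function cpuUsagePercent(before, after) {
  const totalDelta = after.total - before.total;
  if (totalDelta <= 0) return 0;
  return (1 - (after.idle - before.idle) / totalDelta) * 100;
}

const first = cpuSnapshot();
setTimeout(() => {
  const second = cpuSnapshot();
  console.log('memory used %:', usedMemPercent.toFixed(1));
  console.log('cpu used %:', cpuUsagePercent(first, second).toFixed(1));
}, 500);
```

Disk I/O and disk space have no counterpart here, so an agent has to collect them by other means.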
How to Load the Monitoring Agent?
OK - for Devs, but Ops don’t like to touch source code ...
Node 4.x to the rescue! Pre-loading modules with ‘-r’ / ‘--require’
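A sketch of how preloading might look in practice. The file name `monitoring-preload.js` and the `startAgent` function are hypothetical stand-ins for a real agent; only the `-r` flag itself comes from the slide.

```javascript
// monitoring-preload.js (hypothetical file name)
// Load it without touching application code:
//   node -r ./monitoring-preload.js app.js
'use strict';

// Stand-in for starting a real monitoring agent: record the start time
// and report uptime when the process exits.
function startAgent() {
  const startedAt = Date.now();
  process.on('exit', (code) => {
    console.log(`[agent] exit code=${code} uptime=${Date.now() - startedAt} ms`);
  });
  return { startedAt };
}

const preloadedAgent = startAgent();
```

Because the module runs before the application's entry point, Ops can attach or remove the agent by changing the start command only.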
Garbage Collection
- Incremental marking and lazy sweeping
- marking ‘stop the world’
- Incremental GC cycles (scavenge)
- Full GC cycles
- What should be measured?
  - Count of GC cycles
  - Rate of GC cycles / time
  - Sum of GC time
  - Released memory (before GC - after GC)
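The four measurements listed above could be derived from raw GC events roughly like this. The event shape `{ durationMs, heapBefore, heapAfter }` is an assumption for illustration; actual GC modules name these fields differently.

```javascript
// Sketch: aggregate GC events collected during one reporting interval.
// The event shape { durationMs, heapBefore, heapAfter } is an assumption.
function summarizeGc(events, intervalSeconds) {
  let totalTimeMs = 0;
  let releasedBytes = 0;
  for (const e of events) {
    totalTimeMs += e.durationMs;
    // A cycle can end with a larger heap; count only memory actually freed.
    releasedBytes += Math.max(0, e.heapBefore - e.heapAfter);
  }
  return {
    count: events.length,
    cyclesPerMinute: (events.length * 60) / intervalSeconds,
    totalTimeMs,
    releasedBytes,
  };
}

const summary = summarizeGc(
  [
    { durationMs: 0.9, heapBefore: 2360232, heapAfter: 2257696 },
    { durationMs: 1.5, heapBefore: 2400000, heapAfter: 2300000 },
  ],
  30
);
console.log(summary);
```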
How to get GC info?
Find GC options: node --v8-options | grep _gc
node --trace-gc --trace_gc_nvp lib/index.js
[7729:0x101804600] [I:0x101804600] 26 ms: pause=0.9 mutator=-1455940110228.4 gc=s
external=0.0 mark=0.0 sweep=0.00 sweepns=0.00 sweepos=0.00 sweepcode=0.00 sweepcell=0.00
sweepmap=0.00 evacuate=0.0 new_new=0.0 root_new=0.0 old_new=0.0 compaction_ptrs=0.0
intracompaction_ptrs=0.0 misc_compaction=0.0 weak_closure=0.0 inc_weak_closure=0.0
weakcollection_process=0.0 weakcollection_clear=0.0 weakcollection_abort=0.0
total_size_before=2360232 total_size_after=2257696 holes_size_before=32 holes_size_after=32
allocated=2360232 promoted=0 semi_space_copied=929376 nodes_died_in_new=7
nodes_copied_in_new=5 nodes_promoted=0 promotion_ratio=0.0% average_survival_ratio=90.1%
promotion_rate=0.0% semi_space_copy_rate=90.1% new_space_allocation_throughput=0
context_disposal_rate=0.0 steps_count=0 steps_took=0.0 scavenge_throughput=1180677
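Output like the trace above is a flat list of name=value pairs, so it can be parsed with a small helper. This is a sketch only: `parseGcNvp` is an illustrative name, and the sample string is an excerpt of the line shown above.

```javascript
// Sketch: turn one --trace_gc_nvp line into an object of name/value pairs.
function parseGcNvp(line) {
  const result = {};
  const pairPattern = /(\w+)=(\S+)/g;
  let match;
  while ((match = pairPattern.exec(line)) !== null) {
    const [, key, raw] = match;
    // Strip a trailing percent sign, keep non-numeric values as strings.
    const num = Number(raw.replace(/%$/, ''));
    result[key] = Number.isNaN(num) ? raw : num;
  }
  return result;
}

const sampleLine =
  '[7729:0x101804600] 26 ms: pause=0.9 mutator=-1455940110228.4 gc=s ' +
  'total_size_before=2360232 total_size_after=2257696 promotion_ratio=0.0%';
const gc = parseGcNvp(sampleLine);
console.log(gc.gc, gc.pause, gc.total_size_before - gc.total_size_after);
// prints: s 0.9 102536
```

The difference between `total_size_before` and `total_size_after` gives the released memory for that cycle, and `pause` gives its duration in milliseconds.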
NPM modules for GC info
- gc-stats
- gc-profiler
- memwatch(-next)
  - missing GC times
  - + leak detection
  - + heap diff
Native C++ modules: V8 API / NAN 1.x vs. NAN 2.x
NPM GC packages
GC Insights as part of Node.js API?
Examples for Node.js Metrics
Server, Process, Application Metrics
- CPU Usage
- Memory Usage
- Disk
  - I/O reads/writes
  - Space
- Process Metrics
- Application Metrics
- in-process monitor
GC Metrics
- GC cycles / min: < 50
- GC time: < 20 ms / min
- Released memory: 2 MB / cycle
Example - Monitoring Kibana 4.1 Node.js App
http://blog.sematext.com/2015/05/27/monitoring-kibana-4s-node-js-app/
- 2.0 ruby server
- 3.0 HTML5 no server
- 4.0-4.2 Node Express
- > 4.3 Node Hapi.js
We run a managed ELK Stack / Logging SaaS
GC cycles - out of control!
GC cycles / min: 45,000 (!)
GC Time: < 10 sec / min
???
OOM Kill
Taming GC ...
GC cycles / min: 100
GC Time: < 92 ms / min
--max-old-space-size=200
GC cycles / min: 45,000 (!)
GC Time: < 10 sec / min
Update to Node.js 4.2.x
Event Loop Latency
Avg. Latency < 0.5 ms
Event Loop Latency
Event Loop Latency < 0.5 ms
3 / 15 ms !!!
Don’t Block the Event Loop ...
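Event loop latency can be estimated by checking how late a timer fires. The following is a minimal sketch of that approach; `createLagMonitor` is an illustrative name, and the busy loop exists only to provoke a visible delay.

```javascript
// Sketch: estimate event loop latency by checking how late a timer fires.
function createLagMonitor(intervalMs, onSample) {
  let last = process.hrtime();
  const timer = setInterval(() => {
    const [sec, nano] = process.hrtime(last);
    const elapsedMs = sec * 1e3 + nano / 1e6;
    // Lag is the time beyond the scheduled interval.
    onSample(Math.max(0, elapsedMs - intervalMs));
    last = process.hrtime();
  }, intervalMs);
  timer.unref(); // monitoring alone should not keep the process alive
  return { stop: () => clearInterval(timer) };
}

const lagSamples = [];
const monitor = createLagMonitor(100, (lagMs) => lagSamples.push(lagMs));

// Block the loop for about 80 ms to show the effect, then report and stop.
setTimeout(() => {
  const end = Date.now() + 80;
  while (Date.now() < end) {} // deliberately blocking: do not do this
}, 150);
setTimeout(() => {
  monitor.stop();
  console.log('max lag ms:', Math.max(...lagSamples).toFixed(1));
}, 450);
```

Any synchronous work that runs longer than a few milliseconds shows up as lag in these samples, which is why the metric is a good indicator of a blocked event loop.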
Process Memory
Example - Process Memory
OOM Kill
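Process memory can be sampled from inside the process with `process.memoryUsage()`. The sketch below is illustrative: the helper names, the 200 MB limit, and the 90% ratio are assumptions chosen for the example.

```javascript
// Sketch: sample process memory and flag growth toward a limit.
// The 200 MB limit and 0.9 ratio are hypothetical values for illustration.
const MB = 1024 * 1024;

function memorySample() {
  const { rss, heapTotal, heapUsed } = process.memoryUsage();
  return {
    rssMb: rss / MB,
    heapTotalMb: heapTotal / MB,
    heapUsedMb: heapUsed / MB,
  };
}

function isNearLimit(sample, limitMb, ratio = 0.9) {
  return sample.rssMb >= limitMb * ratio;
}

const memNow = memorySample();
console.log(memNow, 'near limit:', isNearLimit(memNow, 200));
```

Tracking the resident set size over time makes steady growth visible before the operating system kills the process.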
Number of Workers
Correlate Metrics for Different Workers
HTTP Metrics
HTTP Request and Error Rate
Error Breakdown
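Request rate, error rate, and an error breakdown can be collected with a few counters. This is a sketch under assumptions: `HttpStats` and its fields are illustrative names, and treating every status code of 400 or above as an error is a choice made for the example.

```javascript
// Sketch: count requests and errors, with a breakdown by status class.
// Names are illustrative assumptions, not a specific agent's API.
class HttpStats {
  constructor() {
    this.reset();
  }

  reset() {
    this.requests = 0;
    this.errors = 0;
    this.byClass = { '2xx': 0, '3xx': 0, '4xx': 0, '5xx': 0 };
  }

  // Call once per finished response, e.g. from res.on('finish', ...).
  record(statusCode) {
    this.requests++;
    const cls = `${Math.floor(statusCode / 100)}xx`;
    if (cls in this.byClass) this.byClass[cls]++;
    if (statusCode >= 400) this.errors++;
  }

  // Rates per second over the reporting interval.
  snapshot(intervalSeconds) {
    return {
      requestRate: this.requests / intervalSeconds,
      errorRate: this.errors / intervalSeconds,
      byClass: Object.assign({}, this.byClass),
    };
  }
}

const httpStats = new HttpStats();
[200, 200, 304, 404, 500, 200].forEach((code) => httpStats.record(code));
console.log(httpStats.snapshot(60));
```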
Get the Full Picture ...
Thank you!
www.npmjs.com/~megastef
www.npmjs.com/~sematext
@seti321 or @sematext
