SlideShare a Scribd company logo
1 of 22
Download to read offline
Instrumenting the
real-time web:
Running node.js in production

Bryan Cantrill
VP, Engineering

bryan@joyent.com
@bcantrill
“Real-time web?”

   • The term has enjoyed some popularity, but there is
     clearly confusion about the definition of “real-time”
   • A real-time system is one in which the correctness of the
     system is relative to its timeliness
   • A hard real-time system is one which the latency
     constraints are rigid: violation constitutes total system
     failure (e.g., an actuator on a physical device)
   • A soft real-time system is one in which latency
     constraints are more flexible: violation is undesirable but
     non-fatal (e.g., a video game or MP3 player)
   • Historically, the only real-time aspect of the web has
     been in some of its static content (e.g. video, audio)
The rise of the real-time web

    • The rise of mobile + HTML5 has given rise to a new
     breed of web application: ones in which dynamic data
     has real-time semantics
    • These data-intensive real-time applications present new
     semantics for web-facing applications
    • These present new data semantics for web applications:
     CRUD, ACID, BASE, CAP — meet DIRT!
The challenge of DIRTy apps

   • DIRTy applications tend to have the human in the loop
      • Good news: deadlines are soft — microseconds only
        matter when they add up to tens of milliseconds

      • Bad news: because humans are in the loop, demand
        for the system can be non-linear

   • One must deal not only with the traditional challenge of
     scalability, but also the challenge of a real-time system!
Building DIRTy apps

   • Embedded real-time systems are sufficiently controlled
     that latency bubbles can be architected away
   • Web-facing systems are far too sloppy to expect this!
   • Focus must shift from preventing latency bubbles to
     preventing latency bubbles from cascading
   • Operations that can induce latency (network, I/O, etc.)
     must not be able to take the system out with them!
   • Implies purely asynchronous and evented architectures,
     which are notoriously difficult to implement...
Enter node.js

   • node.js is a JavaScript-based framework for building
     event-oriented servers:
      var http = require(‘http’);

      http.createServer(function (req, res) {
             res.writeHead(200, {'Content-Type': 'text/plain'});
             res.end('Hello Worldn');
      }).listen(8124, "127.0.0.1");

      console.log(‘Server running at http://127.0.0.1:8124!’);
node.js as building block

    • node.js is a confluence of three ideas:
       • JavaScriptʼs rich support for asynchrony (i.e. closures)
       • High-performance JavaScript VMs (e.g. V8)
       • The system abstractions that God intended (i.e. UNIX)
    • Because everything is asynchronous, node.js is ideal for
     delivering scale in the presence of long-latency events!
The primacy of latency

   • As the correctness of the system is its timeliness, we
     must be able to measure the system to verify it
   • In a real-time system, it does not make sense to
     measure operations per second!
   • The only metric that matters is latency
   • This is dangerous to distill to a single number; the
     distribution of latency over time is essential
   • This poses both instrumentation and visualization
     challenges!
Instrumenting for latency

    • Instrumenting for latency requires modifying the system
     twice: as an operation starts and as it finishes
    • During an operation, the system must track — on a per-
     operation basis — the start time of the operation
    • Upon operation completion, the resulting stored data
     cannot be a scalar — the distribution is essential when
     understanding latency
    • Instrumentation must be systemic; must be able to
     reach to the sources of latency deep within the system
    • These constraints eliminate static instrumentation; we
     need a better way to instrument the system
Enter DTrace

   • Facility for dynamic instrumentation of production
     systems originally developed circa 2003 for Solaris 10
   • Open sourced (along with the rest of Solaris) in 2005;
     subsequently ported to many other systems (MacOS X,
     FreeBSD, NetBSD, QNX, nascent Linux port)
   • Support for arbitrary actions, arbitrary predicates, in
     situ data aggregation, statically-defined instrumentation
   • Designed for safe, ad hoc use in production: concise
     answers to arbitrary questions
   • Particularly well suited to real-time: the original design
     center was the understanding of latency bubbles
DTrace + Node?

   • DTrace instruments the system holistically, which is to
    say, from the kernel, which poses a challenge for
    interpreted environments
   • User-level statically defined tracing (USDT) providers
    describe semantically relevant points of instrumentation
   • Some interpreted environments (e.g., Ruby, Python,
    PHP, Erlang) have added USDT providers that
    instrument the interpreter itself
   • This approach is very fine-grained (e.g., every function
    call) and doesnʼt work in JITʼd environments
   • We decided to take a different tack for Node
DTrace for node.js

    • Given the nature of the paths that we wanted to
     instrument, we introduced a function into JavaScript that
     Node can call to get into USDT-instrumented C++
    • Introduces disabled probe effect: calling from JavaScript
     into C++ costs even when probes are not enabled
    • We use USDT is-enabled probes to minimize disabled
     probe effect once in C++
    • If (and only if) the probe is enabled, we prepare a
     structure for the kernel that allows for translation into a
     structure that is familiar to node programmers
Node USDT Provider

   • Example one-liners:
     dtrace -n ‘node*:::http-server-request{
        printf(“%s of %s from %sn”, args[0]->method,
            args[0]->url, args[1]->remoteAddress)}‘

     dtrace -n http-server-request’{@[args[1]->remoteAddress] = count()}‘

     dtrace -n gc-start’{self->ts = timestamp}’ 
        -n gc-done’/self->ts/{@ = quantize(timestamp - self->ts)}’



   • A script to measure HTTP latency:
     http-server-request
     {
            self->ts[args[1]->fd] = timestamp;
     }

     http-server-response
     /self->ts[args[0]->fd]/
     {
            @[zonename] = quantize(timestamp - self->ts[args[0]->fd]);
     }
User-defined USDT probes in node.js

   • Our USDT technique has been generalized by Chris
     Andrews in his node-dtrace-provider npm module:
       https://github.com/chrisa/node-dtrace-provider
   • Used by Joyentʼs Mark Cavage in his ldap.js to measure
     and validate operation latency
   • But how to visualize operation latency?
Visualizing latency

    • Could visualize latency as a scalar (i.e., average):




    • This hides outliers — and in a real-time system, it is the
     outliers that you care about!
    • Using percentiles helps to convey distribution — but
     crucial detail remains hidden
Visualizing latency as a heatmap

    • Latency is much better visualized as a heatmap, with
     time on the x-axis, latency on the y-axis, and frequency
     represented with color saturation:




    • Many patterns are now visible (as in this example of
     MySQL query latency), but critical data is still hidden
Visualizing latency as a 4D heatmap

   • Can use hue to represent higher dimensionality: time on
     the x-axis, latency on the y-axis, frequency via color
     saturation, and hue representing the new dimension:




   • In this example, the higher dimension is the MySQL
     database table associated with the operation
Visualizing node.js latency

    • Using the USDT probes as foundation, we developed a
     cloud analytics facility that visualizes latency in real-time
     via four dimensional heatmaps:




    • Facility is available via Joyentʼs no.de service, Joyentʼs
     public cloud, or Joyentʼs SmartDataCenter
Debugging latency

   • Latency visualization is essential for understanding
     where latency is being induced in a complicated system,
     but how can we determine why?
   • This requires associating an external event — an I/O
     request, a network packet, a profiling interrupt — with
     the code thatʼs inducing it
   • For node.js — like other dynamic environments — this is
     historically very difficult: the VM is opaque to the OS
   • Using DTraceʼs helper mechanism, we have developed
     a V8 ustack helper that allows OS-level events to be
     correlated to the node.js-backtrace that induced them
   • Available for node 0.6.7 on Joyentʼs SmartOS
Visualizing node.js CPU latency

   • Using the node.js ustack helper and the DTrace profile
     provider, we can determine the relative frequency of
     stack backtraces in terms of CPU consumption
   • Stacks can be visualized with flame graphs, a stack
     visualization developed by Joyentʼs Brendan Gregg:
node.js in production

    • node.js is particularly amenable for the DIRTy apps that
     typify the real-time web
    • The ability to understand latency must be considered
     when deploying node.js-based systems into production!
    • Understanding latency requires dynamic instrumentation
     and novel visualization
    • At Joyent, we have added DTrace-based dynamic
     instrumentation for node.js to SmartOS, and novel
     visualization into our cloud and software offerings
    • Better production support — better observability, better
     debuggability — remains an important area of node.js
     development!
Thank you!

   • @ryah and @rmustacc for Node DTrace USDT
    integration
   • @dapsays, @rmustacc, @rob_ellis and @notmatt for
    cloud analytics
   • @chrisandrews for node-dtrace-provider and
    @mcavage for putting it to such great use in ldap.js
   • @dapsays for the V8 DTrace ustack helper
   • @brendangregg for both the heatmap and flame graph
    visualizations
   • More information: http://dtrace.org/blogs/dap,
    http://dtrace.org/blogs/brendan and http://smartos.org

More Related Content

What's hot

Linux binary Exploitation - Basic knowledge
Linux binary Exploitation - Basic knowledgeLinux binary Exploitation - Basic knowledge
Linux binary Exploitation - Basic knowledgeAngel Boy
 
HES2011 - Tarjei Mandt – Kernel Pool Exploitation on Windows 7
HES2011 - Tarjei Mandt – Kernel Pool Exploitation on Windows 7HES2011 - Tarjei Mandt – Kernel Pool Exploitation on Windows 7
HES2011 - Tarjei Mandt – Kernel Pool Exploitation on Windows 7Hackito Ergo Sum
 
Linux cgroups and namespaces
Linux cgroups and namespacesLinux cgroups and namespaces
Linux cgroups and namespacesLocaweb
 
WASM(WebAssembly)入門 ペアリング演算やってみた
WASM(WebAssembly)入門 ペアリング演算やってみたWASM(WebAssembly)入門 ペアリング演算やってみた
WASM(WebAssembly)入門 ペアリング演算やってみたMITSUNARI Shigeo
 
Operating System-Concepts of Process
Operating System-Concepts of ProcessOperating System-Concepts of Process
Operating System-Concepts of ProcessShipra Swati
 
[C++ Korea] C++ 메모리 모델과 atomic 타입 연산들
[C++ Korea] C++ 메모리 모델과 atomic 타입 연산들[C++ Korea] C++ 메모리 모델과 atomic 타입 연산들
[C++ Korea] C++ 메모리 모델과 atomic 타입 연산들DongMin Choi
 
IPC in Microkernel Systems, Capabilities
IPC in Microkernel Systems, CapabilitiesIPC in Microkernel Systems, Capabilities
IPC in Microkernel Systems, CapabilitiesMartin Děcký
 
Windows internals
Windows internalsWindows internals
Windows internalsPiyush Jain
 
Process concept
Process conceptProcess concept
Process conceptjangezkhan
 
C++でCプリプロセッサを作ったり速くしたりしたお話
C++でCプリプロセッサを作ったり速くしたりしたお話C++でCプリプロセッサを作ったり速くしたりしたお話
C++でCプリプロセッサを作ったり速くしたりしたお話Kinuko Yasuda
 
송창규, unity build로 빌드타임 반토막내기, NDC2010
송창규, unity build로 빌드타임 반토막내기, NDC2010송창규, unity build로 빌드타임 반토막내기, NDC2010
송창규, unity build로 빌드타임 반토막내기, NDC2010devCAT Studio, NEXON
 
Computer architecture virtual memory
Computer architecture virtual memoryComputer architecture virtual memory
Computer architecture virtual memoryMazin Alwaaly
 
the Paxos Commit algorithm
the Paxos Commit algorithmthe Paxos Commit algorithm
the Paxos Commit algorithmpaolos84
 
Master Canary Forging: 新しいスタックカナリア回避手法の提案 by 小池 悠生 - CODE BLUE 2015
Master Canary Forging: 新しいスタックカナリア回避手法の提案 by 小池 悠生 - CODE BLUE 2015Master Canary Forging: 新しいスタックカナリア回避手法の提案 by 小池 悠生 - CODE BLUE 2015
Master Canary Forging: 新しいスタックカナリア回避手法の提案 by 小池 悠生 - CODE BLUE 2015CODE BLUE
 

What's hot (20)

Linux binary Exploitation - Basic knowledge
Linux binary Exploitation - Basic knowledgeLinux binary Exploitation - Basic knowledge
Linux binary Exploitation - Basic knowledge
 
HES2011 - Tarjei Mandt – Kernel Pool Exploitation on Windows 7
HES2011 - Tarjei Mandt – Kernel Pool Exploitation on Windows 7HES2011 - Tarjei Mandt – Kernel Pool Exploitation on Windows 7
HES2011 - Tarjei Mandt – Kernel Pool Exploitation on Windows 7
 
Linux cgroups and namespaces
Linux cgroups and namespacesLinux cgroups and namespaces
Linux cgroups and namespaces
 
WASM(WebAssembly)入門 ペアリング演算やってみた
WASM(WebAssembly)入門 ペアリング演算やってみたWASM(WebAssembly)入門 ペアリング演算やってみた
WASM(WebAssembly)入門 ペアリング演算やってみた
 
Understanding the Dalvik Virtual Machine
Understanding the Dalvik Virtual MachineUnderstanding the Dalvik Virtual Machine
Understanding the Dalvik Virtual Machine
 
Linux Memory Management
Linux Memory ManagementLinux Memory Management
Linux Memory Management
 
Deadlock
DeadlockDeadlock
Deadlock
 
Operating System-Concepts of Process
Operating System-Concepts of ProcessOperating System-Concepts of Process
Operating System-Concepts of Process
 
[C++ Korea] C++ 메모리 모델과 atomic 타입 연산들
[C++ Korea] C++ 메모리 모델과 atomic 타입 연산들[C++ Korea] C++ 메모리 모델과 atomic 타입 연산들
[C++ Korea] C++ 메모리 모델과 atomic 타입 연산들
 
IPC in Microkernel Systems, Capabilities
IPC in Microkernel Systems, CapabilitiesIPC in Microkernel Systems, Capabilities
IPC in Microkernel Systems, Capabilities
 
Windows internals
Windows internalsWindows internals
Windows internals
 
Virtual Machine Constructions for Dummies
Virtual Machine Constructions for DummiesVirtual Machine Constructions for Dummies
Virtual Machine Constructions for Dummies
 
Process concept
Process conceptProcess concept
Process concept
 
C++でCプリプロセッサを作ったり速くしたりしたお話
C++でCプリプロセッサを作ったり速くしたりしたお話C++でCプリプロセッサを作ったり速くしたりしたお話
C++でCプリプロセッサを作ったり速くしたりしたお話
 
Build Programming Language Runtime with LLVM
Build Programming Language Runtime with LLVMBuild Programming Language Runtime with LLVM
Build Programming Language Runtime with LLVM
 
송창규, unity build로 빌드타임 반토막내기, NDC2010
송창규, unity build로 빌드타임 반토막내기, NDC2010송창규, unity build로 빌드타임 반토막내기, NDC2010
송창규, unity build로 빌드타임 반토막내기, NDC2010
 
Computer architecture virtual memory
Computer architecture virtual memoryComputer architecture virtual memory
Computer architecture virtual memory
 
the Paxos Commit algorithm
the Paxos Commit algorithmthe Paxos Commit algorithm
the Paxos Commit algorithm
 
kernels
 kernels kernels
kernels
 
Master Canary Forging: 新しいスタックカナリア回避手法の提案 by 小池 悠生 - CODE BLUE 2015
Master Canary Forging: 新しいスタックカナリア回避手法の提案 by 小池 悠生 - CODE BLUE 2015Master Canary Forging: 新しいスタックカナリア回避手法の提案 by 小池 悠生 - CODE BLUE 2015
Master Canary Forging: 新しいスタックカナリア回避手法の提案 by 小池 悠生 - CODE BLUE 2015
 

Viewers also liked

FeedHenry at NodeJam (San Francisco, 25th Jan 2012)
FeedHenry at NodeJam (San Francisco, 25th Jan 2012)FeedHenry at NodeJam (San Francisco, 25th Jan 2012)
FeedHenry at NodeJam (San Francisco, 25th Jan 2012)Mícheál Ó Foghlú
 
ql.io: Consuming HTTP at Scale
ql.io: Consuming HTTP at Scale ql.io: Consuming HTTP at Scale
ql.io: Consuming HTTP at Scale Subbu Allamaraju
 
Probabilistic algorithms for fun and pseudorandom profit
Probabilistic algorithms for fun and pseudorandom profitProbabilistic algorithms for fun and pseudorandom profit
Probabilistic algorithms for fun and pseudorandom profitTyler Treat
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machineAlexei Starovoitov
 
Linux BPF Superpowers
Linux BPF SuperpowersLinux BPF Superpowers
Linux BPF SuperpowersBrendan Gregg
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and moreBrendan Gregg
 

Viewers also liked (8)

Node Summit 2012
Node Summit 2012Node Summit 2012
Node Summit 2012
 
FeedHenry at NodeJam (San Francisco, 25th Jan 2012)
FeedHenry at NodeJam (San Francisco, 25th Jan 2012)FeedHenry at NodeJam (San Francisco, 25th Jan 2012)
FeedHenry at NodeJam (San Francisco, 25th Jan 2012)
 
ql.io: Consuming HTTP at Scale
ql.io: Consuming HTTP at Scale ql.io: Consuming HTTP at Scale
ql.io: Consuming HTTP at Scale
 
Rqa14 secondary
Rqa14 secondaryRqa14 secondary
Rqa14 secondary
 
Probabilistic algorithms for fun and pseudorandom profit
Probabilistic algorithms for fun and pseudorandom profitProbabilistic algorithms for fun and pseudorandom profit
Probabilistic algorithms for fun and pseudorandom profit
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machine
 
Linux BPF Superpowers
Linux BPF SuperpowersLinux BPF Superpowers
Linux BPF Superpowers
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
 

Similar to Instrumenting the real-time web: Node.js in production

John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudyJohn Adams
 
The impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves GoelevenThe impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves GoelevenParticular Software
 
Performance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudPerformance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudBrendan Gregg
 
Build cloud native solution using open source
Build cloud native solution using open source Build cloud native solution using open source
Build cloud native solution using open source Nitesh Jadhav
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservicesBigstep
 
Birmingham-20060705
Birmingham-20060705Birmingham-20060705
Birmingham-20060705Miguel Vidal
 
node.js and Containers: Dispatches from the Frontier
node.js and Containers: Dispatches from the Frontiernode.js and Containers: Dispatches from the Frontier
node.js and Containers: Dispatches from the Frontierbcantrill
 
Sync in an NFV World (Ram, ITSF 2016)
Sync in an NFV World  (Ram, ITSF 2016)Sync in an NFV World  (Ram, ITSF 2016)
Sync in an NFV World (Ram, ITSF 2016)Adam Paterson
 
Sync in an NFV World (Ram, ITSF 2016)
Sync in an NFV World (Ram, ITSF 2016)Sync in an NFV World (Ram, ITSF 2016)
Sync in an NFV World (Ram, ITSF 2016)Calnex Solutions
 
Onboarding a Historical Company on the Cloud Journey
Onboarding a Historical Company on the Cloud JourneyOnboarding a Historical Company on the Cloud Journey
Onboarding a Historical Company on the Cloud JourneyMarius Zaharia
 
Moving to software-based production workflows and containerisation of media a...
Moving to software-based production workflows and containerisation of media a...Moving to software-based production workflows and containerisation of media a...
Moving to software-based production workflows and containerisation of media a...Kieran Kunhya
 
Fiware: Connecting to robots
Fiware: Connecting to robotsFiware: Connecting to robots
Fiware: Connecting to robotsJaime Martin Losa
 
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Docker, Inc.
 
Tech 2 tech low latency networking on Janet presentation
Tech 2 tech low latency networking on Janet presentationTech 2 tech low latency networking on Janet presentation
Tech 2 tech low latency networking on Janet presentationJisc
 
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...DataStax Academy
 
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsLeveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsJulien Anguenot
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitterRoger Xia
 

Similar to Instrumenting the real-time web: Node.js in production (20)

John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
The impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves GoelevenThe impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves Goeleven
 
Performance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudPerformance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloud
 
Build cloud native solution using open source
Build cloud native solution using open source Build cloud native solution using open source
Build cloud native solution using open source
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
 
Birmingham-20060705
Birmingham-20060705Birmingham-20060705
Birmingham-20060705
 
node.js and Containers: Dispatches from the Frontier
node.js and Containers: Dispatches from the Frontiernode.js and Containers: Dispatches from the Frontier
node.js and Containers: Dispatches from the Frontier
 
Sync in an NFV World (Ram, ITSF 2016)
Sync in an NFV World  (Ram, ITSF 2016)Sync in an NFV World  (Ram, ITSF 2016)
Sync in an NFV World (Ram, ITSF 2016)
 
Sync in an NFV World (Ram, ITSF 2016)
Sync in an NFV World (Ram, ITSF 2016)Sync in an NFV World (Ram, ITSF 2016)
Sync in an NFV World (Ram, ITSF 2016)
 
Onboarding a Historical Company on the Cloud Journey
Onboarding a Historical Company on the Cloud JourneyOnboarding a Historical Company on the Cloud Journey
Onboarding a Historical Company on the Cloud Journey
 
Moving to software-based production workflows and containerisation of media a...
Moving to software-based production workflows and containerisation of media a...Moving to software-based production workflows and containerisation of media a...
Moving to software-based production workflows and containerisation of media a...
 
Fiware: Connecting to robots
Fiware: Connecting to robotsFiware: Connecting to robots
Fiware: Connecting to robots
 
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
 
Tech 2 tech low latency networking on Janet presentation
Tech 2 tech low latency networking on Janet presentationTech 2 tech low latency networking on Janet presentation
Tech 2 tech low latency networking on Janet presentation
 
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
 
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsLeveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
 
Brad stack - Digital Health and Well-Being Festival
Brad stack - Digital Health and Well-Being Festival Brad stack - Digital Health and Well-Being Festival
Brad stack - Digital Health and Well-Being Festival
 
Intro to Databases
Intro to DatabasesIntro to Databases
Intro to Databases
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 

More from bcantrill

Predicting the Present
Predicting the PresentPredicting the Present
Predicting the Presentbcantrill
 
Sharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of ToolmakingSharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of Toolmakingbcantrill
 
Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...bcantrill
 
I have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systemsI have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systemsbcantrill
 
Towards Holistic Systems
Towards Holistic SystemsTowards Holistic Systems
Towards Holistic Systemsbcantrill
 
The Coming Firmware Revolution
The Coming Firmware RevolutionThe Coming Firmware Revolution
The Coming Firmware Revolutionbcantrill
 
Hardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden AgeHardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden Agebcantrill
 
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator tracesTockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator tracesbcantrill
 
No Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's LawNo Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's Lawbcantrill
 
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software EngineeringAndreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software Engineeringbcantrill
 
Visualizing Systems with Statemaps
Visualizing Systems with StatemapsVisualizing Systems with Statemaps
Visualizing Systems with Statemapsbcantrill
 
Platform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system softwarePlatform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system softwarebcantrill
 
Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?bcantrill
 
dtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the uniondtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the unionbcantrill
 
The Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systemsThe Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systemsbcantrill
 
Papers We Love: ARC after dark
Papers We Love: ARC after darkPapers We Love: ARC after dark
Papers We Love: ARC after darkbcantrill
 
Principles of Technology Leadership
Principles of Technology LeadershipPrinciples of Technology Leadership
Principles of Technology Leadershipbcantrill
 
Zebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data pathZebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data pathbcantrill
 
Platform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondPlatform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondbcantrill
 
Debugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindDebugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindbcantrill
 

More from bcantrill (20)

Predicting the Present
Predicting the PresentPredicting the Present
Predicting the Present
 
Sharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of ToolmakingSharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of Toolmaking
 
Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...
 
I have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systemsI have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systems
 
Towards Holistic Systems
Towards Holistic SystemsTowards Holistic Systems
Towards Holistic Systems
 
The Coming Firmware Revolution
The Coming Firmware RevolutionThe Coming Firmware Revolution
The Coming Firmware Revolution
 
Hardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden AgeHardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden Age
 
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator tracesTockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
 
No Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's LawNo Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's Law
 
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software EngineeringAndreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
 
Visualizing Systems with Statemaps
Visualizing Systems with StatemapsVisualizing Systems with Statemaps
Visualizing Systems with Statemaps
 
Platform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system softwarePlatform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system software
 
Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?
 
dtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the uniondtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the union
 
The Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systemsThe Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systems
 
Papers We Love: ARC after dark
Papers We Love: ARC after darkPapers We Love: ARC after dark
Papers We Love: ARC after dark
 
Principles of Technology Leadership
Principles of Technology LeadershipPrinciples of Technology Leadership
Principles of Technology Leadership
 
Zebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data pathZebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data path
 
Platform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondPlatform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyond
 
Debugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindDebugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mind
 

Recently uploaded

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

Instrumenting the real-time web: Node.js in production

  • 1. Instrumenting the real-time web: Running node.js in production Bryan Cantrill VP, Engineering bryan@joyent.com @bcantrill
  • 2. “Real-time web?” • The term has enjoyed some popularity, but there is clearly confusion about the definition of “real-time” • A real-time system is one in which the correctness of the system is relative to its timeliness • A hard real-time system is one which the latency constraints are rigid: violation constitutes total system failure (e.g., an actuator on a physical device) • A soft real-time system is one in which latency constraints are more flexible: violation is undesirable but non-fatal (e.g., a video game or MP3 player) • Historically, the only real-time aspect of the web has been in some of its static content (e.g. video, audio)
  • 3. The rise of the real-time web • The rise of mobile + HTML5 has given rise to a new breed of web application: ones in which dynamic data has real-time semantics • These data-intensive real-time applications present new semantics for web-facing applications • These present new data semantics for web applications: CRUD, ACID, BASE, CAP — meet DIRT!
  • 4. The challenge of DIRTy apps • DIRTy applications tend to have the human in the loop • Good news: deadlines are soft — microseconds only matter when they add up to tens of milliseconds • Bad news: because humans are in the loop, demand for the system can be non-linear • One must deal not only with the traditional challenge of scalability, but also the challenge of a real-time system!
  • 5. Building DIRTy apps • Embedded real-time systems are sufficiently controlled that latency bubbles can be architected away • Web-facing systems are far too sloppy to expect this! • Focus must shift from preventing latency bubbles to preventing latency bubbles from cascading • Operations that can induce latency (network, I/O, etc.) must not be able to take the system out with them! • Implies purely asynchronous and evented architectures, which are notoriously difficult to implement...
  • 6. Enter node.js • node.js is a JavaScript-based framework for building event-oriented servers: var http = require(‘http’); http.createServer(function (req, res) { res.writeHead(200, {'Content-Type': 'text/plain'}); res.end('Hello Worldn'); }).listen(8124, "127.0.0.1"); console.log(‘Server running at http://127.0.0.1:8124!’);
  • 7. node.js as building block • node.js is a confluence of three ideas: • JavaScriptʼs rich support for asynchrony (i.e. closures) • High-performance JavaScript VMs (e.g. V8) • The system abstractions that God intended (i.e. UNIX) • Because everything is asynchronous, node.js is ideal for delivering scale in the presence of long-latency events!
  • 8. The primacy of latency • As the correctness of the system is its timeliness, we must be able to measure the system to verify it • In a real-time system, it does not make sense to measure operations per second! • The only metric that matters is latency • This is dangerous to distill to a single number; the distribution of latency over time is essential • This poses both instrumentation and visualization challenges!
  • 9. Instrumenting for latency • Instrumenting for latency requires modifying the system twice: as an operation starts and as it finishes • During an operation, the system must track — on a per- operation basis — the start time of the operation • Upon operation completion, the resulting stored data cannot be a scalar — the distribution is essential when understanding latency • Instrumentation must be systemic; must be able to reach to the sources of latency deep within the system • These constraints eliminate static instrumentation; we need a better way to instrument the system
  • 10. Enter DTrace • Facility for dynamic instrumentation of production systems originally developed circa 2003 for Solaris 10 • Open sourced (along with the rest of Solaris) in 2005; subsequently ported to many other systems (MacOS X, FreeBSD, NetBSD, QNX, nascent Linux port) • Support for arbitrary actions, arbitrary predicates, in situ data aggregation, statically-defined instrumentation • Designed for safe, ad hoc use in production: concise answers to arbitrary questions • Particularly well suited to real-time: the original design center was the understanding of latency bubbles
  • 11. DTrace + Node? • DTrace instruments the system holistically, which is to say, from the kernel, which poses a challenge for interpreted environments • User-level statically defined tracing (USDT) providers describe semantically relevant points of instrumentation • Some interpreted environments (e.g., Ruby, Python, PHP, Erlang) have added USDT providers that instrument the interpreter itself • This approach is very fine-grained (e.g., every function call) and doesnʼt work in JITʼd environments • We decided to take a different tack for Node
  • 12. DTrace for node.js • Given the nature of the paths that we wanted to instrument, we introduced a function into JavaScript that Node can call to get into USDT-instrumented C++ • Introduces disabled probe effect: calling from JavaScript into C++ costs even when probes are not enabled • We use USDT is-enabled probes to minimize disabled probe effect once in C++ • If (and only if) the probe is enabled, we prepare a structure for the kernel that allows for translation into a structure that is familiar to node programmers
  • 13. Node USDT Provider • Example one-liners: dtrace -n ‘node*:::http-server-request{ printf(“%s of %s from %sn”, args[0]->method, args[0]->url, args[1]->remoteAddress)}‘ dtrace -n http-server-request’{@[args[1]->remoteAddress] = count()}‘ dtrace -n gc-start’{self->ts = timestamp}’ -n gc-done’/self->ts/{@ = quantize(timestamp - self->ts)}’ • A script to measure HTTP latency: http-server-request { self->ts[args[1]->fd] = timestamp; } http-server-response /self->ts[args[0]->fd]/ { @[zonename] = quantize(timestamp - self->ts[args[0]->fd]); }
  • 14. User-defined USDT probes in node.js • Our USDT technique has been generalized by Chris Andrews in his node-dtrace-provider npm module: https://github.com/chrisa/node-dtrace-provider • Used by Joyentʼs Mark Cavage in his ldap.js to measure and validate operation latency • But how to visualize operation latency?
  • 15. Visualizing latency • Could visualize latency as a scalar (i.e., average): • This hides outliers — and in a real-time system, it is the outliers that you care about! • Using percentiles helps to convey distribution — but crucial detail remains hidden
  • 16. Visualizing latency as a heatmap • Latency is much better visualized as a heatmap, with time on the x-axis, latency on the y-axis, and frequency represented with color saturation: • Many patterns are now visible (as in this example of MySQL query latency), but critical data is still hidden
  • 17. Visualizing latency as a 4D heatmap • Can use hue to represent higher dimensionality: time on the x-axis, latency on the y-axis, frequency via color saturation, and hue representing the new dimension: • In this example, the higher dimension is the MySQL database table associated with the operation
  • 18. Visualizing node.js latency • Using the USDT probes as foundation, we developed a cloud analytics facility that visualizes latency in real-time via four dimensional heatmaps: • Facility is available via Joyentʼs no.de service, Joyentʼs public cloud, or Joyentʼs SmartDataCenter
  • 19. Debugging latency • Latency visualization is essential for understanding where latency is being induced in a complicated system, but how can we determine why? • This requires associating an external event — an I/O request, a network packet, a profiling interrupt — with the code thatʼs inducing it • For node.js — like other dynamic environments — this is historically very difficult: the VM is opaque to the OS • Using DTraceʼs helper mechanism, we have developed a V8 ustack helper that allows OS-level events to be correlated to the node.js-backtrace that induced them • Available for node 0.6.7 on Joyentʼs SmartOS
  • 20. Visualizing node.js CPU latency • Using the node.js ustack helper and the DTrace profile provider, we can determine the relative frequency of stack backtraces in terms of CPU consumption • Stacks can be visualized with flame graphs, a stack visualization developed by Joyentʼs Brendan Gregg:
  • 21. node.js in production • node.js is particularly amenable for the DIRTy apps that typify the real-time web • The ability to understand latency must be considered when deploying node.js-based systems into production! • Understanding latency requires dynamic instrumentation and novel visualization • At Joyent, we have added DTrace-based dynamic instrumentation for node.js to SmartOS, and novel visualization into our cloud and software offerings • Better production support — better observability, better debuggability — remains an important area of node.js development!
  • 22. Thank you! • @ryah and @rmustacc for Node DTrace USDT integration • @dapsays, @rmustacc, @rob_ellis and @notmatt for cloud analytics • @chrisandrews for node-dtrace-provider and @mcavage for putting it to such great use in ldap.js • @dapsays for the V8 DTrace ustack helper • @brendangregg for both the heatmap and flame graph visualizations • More information: http://dtrace.org/blogs/dap, http://dtrace.org/blogs/brendan and http://smartos.org