Slides from my Scalapeno 2016 talk about BigPanda's journey from node.js to Scala on our core data processing service.
For more details, hit me up at @itrvd on Twitter.
From node.js to Scala - with a 100x performance boost
1. FROM NODE.JS TO SCALA
with a 100x perf. boost!
BY ITAMAR RAVID | MAY 3, 2016
2. AGENDA
WE’LL TALK ABOUT…
• What we do, our challenges and what led us to Scala and Akka;
• How we redesigned our core data processing service;
• Some useful lessons and patterns.
There will be relatively little node.js bashing. Promise.
3. BIGPANDA: THE ANSWER TO ALERT FATIGUE
Raw alerts:
• RABBIT IS DOWN!
• NO FREE SPACE!
• INBOUND QUEUE OVERFLOWING!
• OUTBOUND QUEUE OVERFLOWING!
• APPLICATION HEALTH CRITICAL!
• TOO MANY FAILED HTTP REQS!
Normalized alerts (host, check):
• rabbit-1, ping
• rabbit-2, disk
• queue-1, size
• queue-2, size
• app1, health
• app2, 500 codes
Incidents produced by the Correlation Algorithm:
• RabbitMQ cluster: ping, disk
• RabbitMQ node 3: queue size, queue size
• API server: health, failed reqs
4. IN TERMS OF STREAMS…
Event sources: Nagios, Datadog, AppDynamics
Raw alerts:
• RABBIT IS DOWN!
• NO FREE SPACE!
• INBOUND QUEUE OVERFLOWING!
• OUTBOUND QUEUE OVERFLOWING!
• APPLICATION HEALTH CRITICAL!
• TOO MANY FAILED HTTP REQS!
Normalization Stage output (host, check):
• rabbit-1, ping
• rabbit-2, disk
• queue-1, size
• queue-2, size
• app1, health
• app2, 500 codes
Correlation Stage output (Correlation Algorithm):
• RabbitMQ cluster: ping, disk
• RabbitMQ node 3: queue size, queue size
• API server: health, failed reqs
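A minimal sketch of the correlation stage, assuming alerts have already been normalized to (host, check) pairs. The talk does not show the real correlation algorithm. The `NormalizedAlert` and `Incident` types and the `groupOf` lookup are hypothetical, with the host-to-group mapping read off the slide's example by position.

```scala
// Hypothetical types; the real data model is not shown in the slides.
final case class NormalizedAlert(host: String, check: String)
final case class Incident(group: String, checks: List[String])

object CorrelationSketch {
  // Assumed host -> group mapping, inferred from the slide's example.
  val groupOf: Map[String, String] = Map(
    "rabbit-1" -> "RabbitMQ cluster",
    "rabbit-2" -> "RabbitMQ cluster",
    "queue-1"  -> "RabbitMQ node 3",
    "queue-2"  -> "RabbitMQ node 3",
    "app1"     -> "API server",
    "app2"     -> "API server"
  )

  // Bucket normalized alerts by group, preserving arrival order.
  // Hosts without a known group become their own single-host incident.
  def correlate(alerts: Seq[NormalizedAlert]): List[Incident] = {
    val keyed = alerts.toList.map(a => groupOf.getOrElse(a.host, a.host) -> a.check)
    keyed.map(_._1).distinct.map { group =>
      Incident(group, keyed.collect { case (g, check) if g == group => check })
    }
  }

  // The six normalized alerts from the slide.
  val slideAlerts: List[NormalizedAlert] = List(
    NormalizedAlert("rabbit-1", "ping"),
    NormalizedAlert("rabbit-2", "disk"),
    NormalizedAlert("queue-1", "size"),
    NormalizedAlert("queue-2", "size"),
    NormalizedAlert("app1", "health"),
    NormalizedAlert("app2", "500 codes")
  )
}
```

Calling `CorrelationSketch.correlate(CorrelationSketch.slideAlerts)` folds the six alerts into three incidents, one per group.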
21. ACTOR-BASED SOLUTION
Node Manager
• Customer A Pipeline
  • Kafka Reader
  • Algorithm runner
  • Mongo Writer
  • Rabbit Writer
• Customer B Pipeline
• Customer C Pipeline
SUPERVISION, MESSAGING
Input: customer_a_inputs
22. NEXT-GEN SOLUTION
(Same actor hierarchy as slide 21.)
SUPERVISION, MESSAGING, FAILURE ISOLATION
Input: customer_a_inputs
23. NEXT-GEN SOLUTION
(Same actor hierarchy as slide 21.)
SUPERVISION, MESSAGING, SEPARATE DISPATCHERS FOR QOS-TUNING
Input: customer_a_inputs
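A rough sketch of supervision and failure isolation with one pipeline actor per customer, using the Akka classic API (version 2.6.x and the scala-cli dependency directive are assumptions). `InputEvent`, `CustomerPipeline`, `NodeManager` and the "boom" payload are hypothetical names for illustration, not BigPanda's code. Dispatcher tuning is left out.

```scala
//> using dep "com.typesafe.akka::akka-actor:2.6.20"
import akka.actor.{Actor, ActorSystem, OneForOneStrategy, Props, SupervisorStrategy}
import akka.actor.SupervisorStrategy.Restart
import java.util.concurrent.{ConcurrentLinkedQueue, CountDownLatch, TimeUnit}
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.jdk.CollectionConverters._

// Hypothetical message type; not taken from the talk.
final case class InputEvent(customer: String, payload: String)

// One pipeline actor per customer. A "boom" payload simulates a processing bug.
class CustomerPipeline(processed: ConcurrentLinkedQueue[String], done: CountDownLatch)
    extends Actor {
  def receive: Receive = {
    case InputEvent(_, "boom") => throw new IllegalStateException("simulated failure")
    case InputEvent(customer, payload) =>
      processed.add(s"$customer:$payload")
      done.countDown()
  }
}

// Parent actor: routes events to per-customer children and supervises them.
class NodeManager(processed: ConcurrentLinkedQueue[String], done: CountDownLatch)
    extends Actor {
  // One-for-one: only the failing child is restarted; its siblings are untouched.
  override val supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy() { case _: Exception => Restart }

  def receive: Receive = {
    case e: InputEvent =>
      // Look up the customer's pipeline, creating it on first use.
      val pipeline = context.child(e.customer).getOrElse(
        context.actorOf(Props(new CustomerPipeline(processed, done)), e.customer))
      pipeline ! e
  }
}

object SupervisionDemo {
  // Sends one failing and two healthy events, then returns what was processed.
  def run(): Set[String] = {
    val processed = new ConcurrentLinkedQueue[String]()
    val done = new CountDownLatch(2)
    val system = ActorSystem("node")
    try {
      val manager = system.actorOf(Props(new NodeManager(processed, done)), "node-manager")
      manager ! InputEvent("customer-a", "boom") // crashes customer-a's pipeline
      manager ! InputEvent("customer-b", "ok")   // unaffected sibling
      manager ! InputEvent("customer-a", "ok")   // handled after the restart
      done.await(10, TimeUnit.SECONDS)
      processed.asScala.toSet
    } finally Await.result(system.terminate(), 10.seconds)
  }
}
```

In this sketch the failing message is dropped, Customer A's pipeline is restarted with its mailbox intact, and Customer B never notices the failure.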
28. PRUNING AN INFINITE DATA STREAM
5 6 7 8 9 10 … N
t=8, OK
MISSING ALERTS :-(
PRUNING STREAMS THAT RESULT IN STATE REQUIRES STATE RECOVERY.
29. PRUNING AN INFINITE DATA STREAM
5 6 7 8 9 10 … N
Snapshot Repository:
• <data …>, lastOffset: 4
• <data …>, lastOffset: 8
• <data …>, lastOffset: 10
ON BOOT, THE LATEST SNAPSHOT IS LOADED AND THE STREAM SEEKS TO THE STORED OFFSET.
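The recovery procedure can be sketched in plain Scala, with an in-memory list standing in for the pruned stream and for the snapshot repository. `AlertEvent`, `Snapshot` and the sample data are hypothetical. The sketch only shows the load-latest-snapshot-then-replay idea, not the real storage or Kafka access.

```scala
// Hypothetical event and snapshot shapes, for illustration only.
final case class AlertEvent(offset: Long, alertId: String, open: Boolean)
final case class Snapshot(openAlerts: Set[String], lastOffset: Long)

object SnapshotRecovery {
  // State transition: the state is the set of currently open alerts.
  def applyEvent(open: Set[String], e: AlertEvent): Set[String] =
    if (e.open) open + e.alertId else open - e.alertId

  // On boot: load the latest snapshot, then replay only the events after its offset.
  def recover(snapshots: Seq[Snapshot], retained: Seq[AlertEvent]): Snapshot = {
    val start =
      if (snapshots.isEmpty) Snapshot(Set.empty, -1L)
      else snapshots.maxBy(_.lastOffset)
    retained
      .filter(_.offset > start.lastOffset)
      .sortBy(_.offset)
      .foldLeft(start)((s, e) => Snapshot(applyEvent(s.openAlerts, e), e.offset))
  }

  // Full history; offsets 1 to 4 have since been pruned from the stream.
  val fullLog: List[AlertEvent] = List(
    AlertEvent(1, "a", open = true),
    AlertEvent(2, "b", open = true),
    AlertEvent(3, "a", open = false),
    AlertEvent(4, "c", open = true),
    AlertEvent(5, "d", open = true),
    AlertEvent(6, "d", open = false),
    AlertEvent(7, "e", open = true),
    AlertEvent(8, "c", open = false),
    AlertEvent(9, "f", open = true),
    AlertEvent(10, "e", open = false)
  )
  val retained: List[AlertEvent] = fullLog.filter(_.offset >= 5)

  // Snapshots taken at offsets 4 and 8.
  val snapshots: List[Snapshot] = List(
    Snapshot(Set("b", "c"), 4L),
    Snapshot(Set("b", "e"), 8L)
  )
}
```

With the snapshots, `recover(snapshots, retained)` yields the same state as replaying the full history. Replaying only the retained events without a snapshot loses alert "b", which was opened before the prune point.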
30. PRUNING AN INFINITE DATA STREAM
CHALLENGES:
- COMPACTNESS
- SCHEMA EVOLUTION
kryo/chill with a manual de/serializer <=> Map[String, Any]
Schema evolution support with some caveats
Big datasets are only a few MBs in size
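A small sketch of the manual de/serializer <=> Map[String, Any] idea, assuming a hypothetical `AlertState` record whose `severity` field was added in a later schema version. The Kryo/chill byte encoding of the map is deliberately omitted. Only the conversion layer that tolerates older snapshots is shown.

```scala
// Hypothetical state record; `severity` is assumed to be a later addition.
final case class AlertState(id: String, status: String, severity: String)

object AlertStateCodec {
  // Manual serializer: flatten the record into a generic map.
  def toMap(s: AlertState): Map[String, Any] =
    Map("id" -> s.id, "status" -> s.status, "severity" -> s.severity)

  // Manual deserializer: missing fields fall back to defaults,
  // so snapshots written before `severity` existed still load.
  def fromMap(m: Map[String, Any]): AlertState =
    AlertState(
      id = m("id").asInstanceOf[String],
      status = m("status").asInstanceOf[String],
      severity = m.get("severity").map(_.asInstanceOf[String]).getOrElse("unknown")
    )
}
```

Adding a field with a default is the easy case. Renaming or retyping a field needs explicit handling in `fromMap`, which is one kind of caveat such a scheme can run into.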
31. KEY TAKEAWAYS
USE SNAPSHOTS TO PRUNE STREAMS
JSON IS NOT THE ONLY SOLUTION!
47. FINAL NUMBERS AND BENEFITS
OVERALL RATE IMPROVEMENT:
~16 events/s on a single node.js process at peak
1600-2500 events/s on a single pipeline at peak (roughly 100x to 156x)
ISOLATION
Actor-per-Customer; failure isolation
COMPLETE DETERMINISM
Actions determined entirely by Kafka contents; amazing for debugging!
SCALABILITY
More nodes => more actors; reduced I/O
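To illustrate why determinism helps debugging, here is a hedged sketch in which every action is a pure function of the input log, so replaying the same log reproduces the same actions. `Input`, `Notify`, `Resolve` and the open/resolve rule are hypothetical and stand in for whatever the real pipeline does.

```scala
// Hypothetical input and action types, for illustration only.
final case class Input(offset: Long, check: String, status: String)
sealed trait Action
final case class Notify(check: String) extends Action
final case class Resolve(check: String) extends Action

object DeterministicPipeline {
  // Pure step: next state and emitted actions depend only on (state, input).
  def step(open: Set[String], in: Input): (Set[String], List[Action]) =
    in.status match {
      case "critical" if !open(in.check) => (open + in.check, List(Notify(in.check)))
      case "ok" if open(in.check)        => (open - in.check, List(Resolve(in.check)))
      case _                             => (open, Nil)
    }

  // Replaying the log in offset order always yields the same action list.
  def replay(log: Seq[Input]): List[Action] = {
    val (_, actions) =
      log.sortBy(_.offset).foldLeft((Set.empty[String], List.empty[Action])) {
        case ((open, acc), in) =>
          val (next, out) = step(open, in)
          (next, acc ++ out)
      }
    actions
  }
}
```

Given a copy of the input log, a bug report can be reproduced by calling `replay` on it locally, with no clocks or external state involved.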