Messaging becomes Data Distributions gets embedded event processing (not complex, made simple) - bending all the rules one benchmark at a time - Push Technology, Waratek and other things
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
Data distribution in the cloud with Node.js
1. Data Distribution
in the cloud with
Node.js
Copyright Push Technology 2012
2. • British startup. Founded in 2006.
• ‘Last mile’ data distribution specialist.
• Data-centric approach to
messaging/caching.
• Preferred by 6 of the top 10 online
eGaming exchanges.
• Growing fast. 400% year on year.
• Focus: Better bang for your bytes!
Introducing Push Technology
Copyright Push Technology 2012
Twitter: @push_technology
3. • Distributed Systems / HPC guy.
• Chief Scientist :- at Push Technology
• Alumnus of :-
Motorola, IONA, Betfair, JPMC, StreamBase.
• School: Trinity College Dublin.
- BA (Mod). Comp. Sci.+
- M.Sc. Networks & Distributed Systems
• Responds to: Guinness, Whisky
About me?
Copyright Push Technology 2012
Darach@PushTechnology.com
4. • Favorite language: Erlang
• Favorite bits?
• OTP – Behaviors
• Bit Syntax
• Least favorite language: Java
• Paid to write this stuff
• Love the JVM
• Liking Node a lot.
• Small fast data guy. I work in microseconds,
measure in nanoseconds. On my critical path
micro-benchmarking is a way of life.
About me?
Copyright Push Technology 2012
Darach@PushTechnology.com
5. 1st clean room certified JVM in 10
years. Built in Dublin! It rocks.
Tenant #1 Tenant #2 Tenant #N
(Diffusion) (Diffusion) (Diffusion)
Push Technology Diffusion
Waratek Cloud VM for Java
Benefits
• High density deployments
• Elastic. Scalable on demand
• Meterability: Bandwidth and compute utilization
• Multi-tenant. Each tenant fully isolated
Copyright Push Technology 2012
6. A US Cap Market second?
• 174 microseconds round trip
time rules out High
Frequency Trading
applications.
Not on the critical path!
Source: Me, former life
@StreamBase
• http://slidesha.re/guZOVe
8. Traditional Messaging
A
B
ba
bb
Producers ? Consumers
Pros Cons
• Loosely coupled. • No data model. Slinging blobs
• All you can eat messaging patterns • Fast producer, slow consumer? Ouch.
• Familiar • No data ‘smarts’. A blob is a blob.
Copyright Push Technology 2012
9. Invented yonks ago…
Before the InterWebs
For ‘reliable’ networks
For machine to machine
Remember DEC Message Queues?
- That basically. Vomit!
Copyright Push Technology 2012
10. When fallacies were simple
-The network is reliable
-Latency is zero
-Bandwidth is infinite
-There is one administrator
-The network is secure
-Transport cost is zero
-The network is homogeneous
Copyright Push Technology 2012
11. Then in 1992, this happened:
The phrase ‘surfing the internet’ was coined by Jean Poly.
First base.
Copyright Push Technology 2012
12. It grew, and it grew
Copyright Push Technology 2012
13. Then in 2007, this happened:
The god phone:
Surfing died. Touching happened.
Second base unlocked.
Copyright Push Technology 2012
14. Then in 2007, this happened:
So we took all the things and put them in the internet:
Cloud happened.
So we could touch all the things.
Messaging
Apps
Hardware
Virtualize all the things
Services
Skills,
Specialties
Copyright Push Technology 2012
15. Then in 2009, this happened:
Ryan Dahl, basically.
Tyrannically asynchronous.
Devilishly event oriented.
Amazoidingly non-blocking.
Copyright Push Technology 2012
16. It grew, and it grew
Like all the good things do.
Copyright Push Technology 2012
17. Stop. Fallacies? Reality:
-The network is not reliable
nor is it cost free.
-Latency is not zero
nor is it a democracy.
-Bandwidth is not infinite
nor predictable especially the last mile!
-There is not only one administrator
trust, relationships are key
-The network is not secure
nor is the data that flows through it
-Transport cost is not zero
but what you don’t do is free
-The network is not homogeneous
nor is it smart
Copyright Push Technology 2012
18. Look. What, How & Why?
-What and How are what geeks do.
-Why gets you paid
-Business Value and Trust dictate What and How
- Policies, Events and Content implements Business Value
-Science basically. But think like a carpenter:
-Measure twice. Cut once.
Copyright Push Technology 2012
19. The Problem: The bird, basically.
Immediately Inconsistent.
But, Eventually Consistent
… Maybe.
Copyright Push Technology 2012
20. Listen.
- Every nuance comes with a set of tradeoffs.
- Choosing the right ones can be hard, but it pays off.
- Context, Environment are critical
- Break all the rules, one benchmark at a time.
- Benchmark Driven Development FTW
Copyright Push Technology 2012
21. Act.
- You measured twice, right?
- So get cutting!
- Simples
Copyright Push Technology 2012
22. Act. Telepathy? Telemetry!
A
B
ba
bb
Buffer
Producers Bloat
Consumers
Virtualize client queues? Nuance: ‘See’ backlog, client affinity. Tradeoff GD harder :/
Copyright Push Technology 2012
23. Act. Stateless or Stateful Topics
A
B
ba x
bb x
Producers
Is it a Consumers
cache?
Data one hop closer to consumers. Good state? Touch it! Exploit it! Use it!
Copyright Push Technology 2012
24. Act. Finagle the data
A
B Snapshot Delta
ba x
bb x
Producers State! Consumers
Last value cached. Tradeoff? Memory. Snapshot on subscribe. Deltas thereafter
Copyright Push Technology 2012
25. Act. ‘Smart data’
A B C A C D
t0 t1
Don’t repeat yourself. Send the changes, not the whole list after initial ‘snapshot’.
Copyright Push Technology 2012
26. Act. Behaviors
A
B
ba x
bb x
X The
Producers
topic is Consumers
the
cloud!
Extensible. Nuance? Roll your own protocols. Tradeoff? 3rd party code in the engine :/
Copyright Push Technology 2012
27. Data Distribution
Messaging remixed around:
Relevance - Queue depth for conflatable data should be 0 or 1. No more
Responsiveness - Use HTTP/REST for things. Stream the little things
Timeliness - It’s relative. M2M != M2H.
Context - Packed binary, deltas mostly, snapshot on subscribe.
Environment- Don’t send 1M 1K events to a mobile phone with 0.5mbps.
Copyright Push Technology 2012
28. An Example
Operations:>
Tenants :>
Gaming Live Internet Apps Finance QA + Dev + UAT
Copyright Push Technology 2012
29. Either way?
It’s about the data.
Period.
The rest (analysis, storage, transformation) is sugar.
Copyright Push Technology 2012
30. Sugar? Streams
w w S
C Q
w w
Stream Operations
• Mapping. Change/enrich the data structurally.
• Aggregation. A ‘window of’ data. Eg. A seconds worth.
• Splitting & Filtering
• Combining multiple streams. Eg. Temporal pattern matching
• Access/Store. Eg: CRUD, variable, file, …
Copyright Push Technology 2012
31. Sugar? Streams
w w S
C Q
w w
Stream Operations
• Mapping. Just a function call in Node.js
• Aggregation. A ‘window of’ data. Eg. A seconds worth.
• Splitting & Filtering. An expression or a set thereof.
• Combining multiple streams. It depends. Can be ‘complex’
• Access/Store. Trivial.
Copyright Push Technology 2012
33. Introducing eep.js
w w S
C Q
w w
What is eep.js?
• Add aggregate functions and window operations to Node.js
• 4 window types: tumbling, sliding, periodic, monotonic
• Node makes evented IO easy. So just add windows.
• Fast. 8-40 million events per second (upper bound).
Copyright Push Technology 2012
34. eep.js: Tumbling Windows
x() x() x() x()
emit()
x() x() x() x() emit()
1 2 3 4
x() x() x() x()
emit()
2 3 4 5
init()
2 3 4 5
init()
init()
t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 ...
What is a tumbling window?
• Every N events, give me an average of the last N events
• Does not overlap windows
• ‘Closing’ a window, ‘Emits’ a result (the average)
• Closing a window, Opens a new window
Copyright Push Technology 2012
35. eep.js: Aggregate Functions
What is an aggregate function?
• A function that computes values over events.
• The cost of calculations are ammortized per event
• Just follow the above recipe
• Example: Aggregate 2M events (equity prices), send to GPU
on emit, receive 2M options put/call prices as a result.
Copyright Push Technology 2012
39. eep.js: Sliding Windows
init()
1 2 3 4 5 .. .. .. ..
x() 1 2 3 4 .. .. .. ..
x() 1 2 3 .. .. .. ..
x() 1 2 .. .. .. ..
t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 ...
What is a sliding window?
• Like tumbling, except can overlap. But O(N2), Keep N small.
• Every event opens a new window.
• After N events, every subsequent event emits a result.
• Like all windows, cost of calculation ammortized over events
Copyright Push Technology 2012
43. eep.js: Monotonic Windows
my my my
x() x() x() x()
emit()
x() x() x() x() emit()
1 2 3 4
x() x() x() x()
emit()
2 3 4 5
init()
2 3 4 5
init()
init()
t0 t1 t2 t3 ...
What is a monotonic window?
• Driven mad by ‘wall clock time’? Need a logical clock?
• No worries. Provide your own clock! Eg. Vector clock
Copyright Push Technology 2012
47. eep.js
Embedded Event Processing:
• Simple to use. Aggregates Functions and Windowed event processing.
• Get it from GitHub/npm soon. Use it. Fork it.
• Fast. CEP engines typically handle ~250K/sec.
• For small N (most common) is 34x - 200x faster than commercial CEP engines.
• But, at a small price. Simple. No multi-dimensional, infinite or predicate windows
• Reduces a flood of events into a few in near real time
• Can handle 8-40 million events per second (max, on my laptop). YMMV.
• Combinators may be added. [Ugh, if I need combinators]
Copyright Push Technology 2012
49. Performance? In perspective
• A 1 producer, 1 consumer lock-free wait-free full duplex queue implementation on
a 2.3GHz intel Sandybridge can:
• Distribute ~300M events between hyperthreads per second
• Distribute ~50M events between two hardware threads on two cores on the same physical die
• Distributed ~30M events between two hardware threads on two cores on separate physical dies
• You can, with a fully lock-free wait-free system (and you bypass the operating system
kernel), maybe, ~1M 1K events/second
• There’s no point being capable of > 30M events/second on a thread if you’re going over a wire.
• So, 8-40 million events/second in node is a pleasant sufficiency
• It’s not the algorithm. It’s the mechanical sympathy, stoopid!
• Lock free wait-free concurrency is easier than lock based concurrency. Try it.
Copyright Push Technology 2012
50. • Thank you for listening
• Thank you for having me
• Thank you Push for the beer budget
• Le twitter: @darachennis
• Expect eep.js in GitHub soon
• I’ll hashtag it #nodedublin
• Thank you @Waratek geeks.
About me?
Copyright Push Technology 2012
Darach@PushTechnology.com