Messaging becomes Data Distribution, gets embedded event processing (not complex, made simple): bending all the rules one benchmark at a time. Push Technology, Waratek and other things.
Data Distribution in the cloud with Node.js
Copyright Push Technology 2012
Introducing Push Technology
• British startup. Founded in 2006.
• ‘Last mile’ data distribution specialist.
• Data-centric approach to messaging/caching.
• Preferred by 6 of the top 10 online eGaming exchanges.
• Growing fast. 400% year on year.
• Focus: Better bang for your bytes!
Twitter: @push_technology
About me?
• Distributed Systems / HPC guy.
• Chief Scientist at Push Technology
• Alumnus of: Motorola, IONA, Betfair, JPMC, StreamBase.
• School: Trinity College Dublin.
- BA (Mod). Comp. Sci.+
- M.Sc. Networks & Distributed Systems
• Responds to: Guinness, Whisky
Darach@PushTechnology.com
About me?
• Favorite language: Erlang
• Favorite bits?
• OTP – Behaviors
• Bit Syntax
• Least favorite language: Java
• Paid to write this stuff
• Love the JVM
• Liking Node a lot.
• Small fast data guy. I work in microseconds, measure in nanoseconds. On my critical path, micro-benchmarking is a way of life.
First clean-room certified JVM in 10 years. Built in Dublin! It rocks.
[Diagram: Tenant #1, Tenant #2 to Tenant #N, each running Push Technology Diffusion, hosted on the Waratek Cloud VM for Java]
Benefits
• High density deployments
• Elastic. Scalable on demand
• Meterability: Bandwidth and compute utilization
• Multi-tenant. Each tenant fully isolated
A US Cap Market second?
• 174 microseconds round-trip time rules out High Frequency Trading applications.
Not on the critical path!
Source: Me, former life @StreamBase
• http://slidesha.re/guZOVe
Traditional Messaging
[Diagram: Producers → ? → Consumers, over topics A, B, ba, bb]
Pros
• Loosely coupled.
• All you can eat messaging patterns.
• Familiar.
Cons
• No data model. Slinging blobs.
• Fast producer, slow consumer? Ouch.
• No data ‘smarts’. A blob is a blob.
Invented yonks ago…
Before the InterWebs
For ‘reliable’ networks
For machine to machine
Remember DEC Message Queues?
- That basically. Vomit!
When fallacies were simple
-The network is reliable
-Latency is zero
-Bandwidth is infinite
-There is one administrator
-The network is secure
-Transport cost is zero
-The network is homogeneous
Then in 1992, this happened:
The phrase ‘surfing the internet’ was coined by Jean Armour Polly.
First base.
Then in 2007, this happened:
The god phone:
Surfing died. Touching happened.
Second base unlocked.
Then in 2007, this happened:
So we took all the things and put them in the internet:
Cloud happened.
So we could touch all the things.
[Image: Virtualize all the things: messaging, apps, hardware, services, skills, specialties]
Then in 2009, this happened:
Ryan Dahl, basically.
Tyrannically asynchronous.
Devilishly event oriented.
Amazoidingly non-blocking.
It grew, and it grew
Like all the good things do.
Stop. Fallacies? Reality:
-The network is not reliable
nor is it cost free.
-Latency is not zero
nor is it a democracy.
-Bandwidth is not infinite
nor predictable, especially the last mile!
-There is not only one administrator
so trust and relationships are key
-The network is not secure
nor is the data that flows through it
-Transport cost is not zero
but what you don’t do is free
-The network is not homogeneous
nor is it smart
Look. What, How & Why?
-What and How are what geeks do.
-Why gets you paid
-Business Value and Trust dictate What and How
-Policies, Events and Content implement Business Value
-Science basically. But think like a carpenter:
-Measure twice. Cut once.
The Problem: The bird, basically.
Immediately Inconsistent.
But, Eventually Consistent
… Maybe.
Listen.
- Every nuance comes with a set of tradeoffs.
- Choosing the right ones can be hard, but it pays off.
- Context and environment are critical
- Break all the rules, one benchmark at a time.
- Benchmark Driven Development FTW
Act.
- You measured twice, right?
- So get cutting!
- Simples
Act. Telepathy? Telemetry!
[Diagram: Producers → topics A, B, ba, bb → buffer (bloat) → Consumers]
Virtualize client queues? Nuance: ‘See’ backlog, client affinity. Tradeoff? GD harder :/
Act. Stateless or Stateful Topics
[Diagram: Producers → topics A, B, ba (x), bb (x) → Consumers. Is it a cache?]
Data one hop closer to consumers. Good state? Touch it! Exploit it! Use it!
Act. Finagle the data
[Diagram: Producers → stateful topics A, B, ba (x), bb (x) holding state → Consumers; snapshot first, then deltas]
Last value cached. Tradeoff? Memory. Snapshot on subscribe. Deltas thereafter.
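The snapshot-then-delta pattern can be sketched in a few lines of JavaScript. This is a minimal, hypothetical sketch: `StatefulTopic`, `subscribe` and `update` are illustrative names, not Diffusion's API, and topic state is assumed to be a flat object of fields.

```javascript
// Hypothetical sketch of a stateful topic (not the Diffusion API):
// last value cached, snapshot on subscribe, deltas thereafter.
class StatefulTopic {
  constructor() {
    this.state = {};       // last value cache: the tradeoff is memory
    this.subscribers = [];
  }

  subscribe(send) {
    this.subscribers.push(send);
    // A new subscriber gets a copy of the whole cached state once.
    send({ type: "snapshot", data: { ...this.state } });
  }

  update(fields) {
    // Keep only the fields whose values actually changed.
    const delta = {};
    for (const [key, value] of Object.entries(fields)) {
      if (this.state[key] !== value) {
        delta[key] = value;
        this.state[key] = value;
      }
    }
    if (Object.keys(delta).length === 0) return; // nothing changed: send nothing
    for (const send of this.subscribers) send({ type: "delta", data: delta });
  }
}

// Usage: subscribe after the first update, then receive only what changed.
const topic = new StatefulTopic();
topic.update({ bid: 1.1, ask: 1.2 });
const received = [];
topic.subscribe((msg) => received.push(msg));
topic.update({ bid: 1.1, ask: 1.3 }); // only `ask` changed
console.log(JSON.stringify(received));
// [{"type":"snapshot","data":{"bid":1.1,"ask":1.2}},{"type":"delta","data":{"ask":1.3}}]
```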
Act. ‘Smart data’
[Diagram: the list is A B C at t0 and A C D at t1]
Don’t repeat yourself. Send the changes, not the whole list after initial ‘snapshot’.
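For the A B C → A C D example, only "remove B, add D" needs to cross the wire. A minimal sketch, assuming list items are unique and new items are appended at the end; `listDelta` and `applyDelta` are hypothetical helpers, not part of any Push Technology API.

```javascript
// Hypothetical sketch: send the changes to a list, not the whole list.
// Assumes items are unique and additions are appended at the end.
function listDelta(before, after) {
  const prev = new Set(before);
  const next = new Set(after);
  return {
    removed: before.filter((item) => !next.has(item)),
    added: after.filter((item) => !prev.has(item)),
  };
}

// The receiver rebuilds the new list from its snapshot plus the delta.
function applyDelta(list, delta) {
  const removed = new Set(delta.removed);
  return list.filter((item) => !removed.has(item)).concat(delta.added);
}

const t0 = ["A", "B", "C"];
const t1 = ["A", "C", "D"];
const delta = listDelta(t0, t1);
console.log(delta);                 // { removed: [ 'B' ], added: [ 'D' ] }
console.log(applyDelta(t0, delta)); // [ 'A', 'C', 'D' ]
```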
Act. Behaviors
[Diagram: Producers → topics A, B, ba (x), bb (x), X → Consumers. The topic is the cloud!]
Extensible. Nuance? Roll your own protocols. Tradeoff? 3rd party code in the engine :/
Data Distribution
Messaging remixed around:
Relevance - Queue depth for conflatable data should be 0 or 1. No more.
Responsiveness - Use HTTP/REST for things. Stream the little things.
Timeliness - It’s relative. M2M != M2H.
Context - Packed binary, deltas mostly, snapshot on subscribe.
Environment - Don’t send 1M 1K events to a mobile phone with 0.5 Mbps.
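The "queue depth of 0 or 1" rule for conflatable data can be sketched as a per-client queue keyed by topic, where a newer value overwrites any unsent older one. A minimal sketch; `ConflatingQueue`, `offer`, `depth` and `drain` are hypothetical names, not Diffusion's API.

```javascript
// Hypothetical sketch of a conflating per-client queue: for conflatable data
// each topic holds at most one pending value, so queue depth is 0 or 1.
class ConflatingQueue {
  constructor() {
    this.pending = new Map(); // topic -> latest unsent value
  }

  offer(topic, value) {
    // A newer value replaces a stale one instead of queueing behind it.
    this.pending.set(topic, value);
  }

  depth(topic) {
    return this.pending.has(topic) ? 1 : 0;
  }

  drain() {
    // Called when the (possibly slow) client is ready to receive.
    const batch = [...this.pending];
    this.pending.clear();
    return batch;
  }
}

const queue = new ConflatingQueue();
queue.offer("EURUSD", 1.1);
queue.offer("EURUSD", 1.11); // replaces 1.1; the client never sees it
queue.offer("GBPUSD", 1.25);
console.log(queue.depth("EURUSD")); // 1
console.log(queue.drain());         // [ [ 'EURUSD', 1.11 ], [ 'GBPUSD', 1.25 ] ]
```

A slow consumer therefore receives the latest value per topic rather than a growing backlog of stale ones.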
An Example
[Diagram: Operations and Tenants; tenants: Gaming, Live Internet Apps, Finance, QA + Dev + UAT]
Either way?
It’s about the data.
Period.
The rest (analysis, storage, transformation) is sugar.
Sugar? Streams
Stream Operations
• Mapping. Change/enrich the data structurally.
• Aggregation. A ‘window of’ data. E.g. a second’s worth.
• Splitting & Filtering
• Combining multiple streams. E.g. temporal pattern matching
• Access/Store. E.g. CRUD, variable, file, …
Sugar? Streams
Stream Operations
• Mapping. Just a function call in Node.js
• Aggregation. A ‘window of’ data. E.g. a second’s worth.
• Splitting & Filtering. An expression or a set thereof.
• Combining multiple streams. It depends. Can be ‘complex’.
• Access/Store. Trivial.
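The claim that mapping, filtering and splitting are "just a function call" in Node.js can be sketched with plain callbacks. A minimal sketch; the `map`, `filter` and `split` helpers and the trade fields are hypothetical, chosen only for illustration.

```javascript
// Hypothetical sketch: stream operations as plain function composition.
// Each stage receives an event and pushes results to the next stage.
const map = (fn, next) => (event) => next(fn(event));
const filter = (predicate, next) => (event) => {
  if (predicate(event)) next(event);
};
const split = (predicate, onTrue, onFalse) => (event) =>
  predicate(event) ? onTrue(event) : onFalse(event);

const large = [];
const small = [];

// Enrich each trade with its notional value, drop empty trades,
// then route large and small trades to different consumers.
const pipeline = map(
  (trade) => ({ ...trade, notional: trade.price * trade.qty }),
  filter(
    (trade) => trade.qty > 0,
    split(
      (trade) => trade.notional >= 1000,
      (trade) => large.push(trade),
      (trade) => small.push(trade),
    ),
  ),
);

[
  { price: 10, qty: 200 },
  { price: 5, qty: 0 },
  { price: 2, qty: 100 },
].forEach((trade) => pipeline(trade));
console.log(large.length, small.length); // 1 1
```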
Embedded Event Processing with Node.js: eep.js
Introducing eep.js
What is eep.js?
• Add aggregate functions and window operations to Node.js
• 4 window types: tumbling, sliding, periodic, monotonic
• Node makes evented IO easy. So just add windows.
• Fast. 8-40 million events per second (upper bound).
eep.js: Tumbling Windows
[Diagram: tumbling windows of 4 events on a timeline t0 to t11; each window starts with init(), accumulates events with x(), and closes with emit()]
What is a tumbling window?
• Every N events, give me an average of the last N events
• Does not overlap windows
• Closing a window emits a result (the average)
• Closing a window opens a new window
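A tumbling average can be sketched as a counter and a running sum that reset whenever the window closes. This is a minimal illustration of the technique, not eep.js's actual API; `TumblingWindow` and `push` are hypothetical names.

```javascript
// Minimal sketch of a tumbling window that averages every N events.
// Illustrative only; this is not eep.js's API.
class TumblingWindow {
  constructor(size, onEmit) {
    this.size = size;
    this.onEmit = onEmit;
    this.init();
  }

  init() {
    // Opening a window resets the aggregate state.
    this.count = 0;
    this.sum = 0;
  }

  push(value) {
    this.count += 1;
    this.sum += value;
    if (this.count === this.size) {
      this.onEmit(this.sum / this.count); // closing a window emits the average
      this.init();                        // ...and opens the next one
    }
  }
}

const averages = [];
const window4 = new TumblingWindow(4, (avg) => averages.push(avg));
[1, 2, 3, 4, 5, 6, 7, 8, 9].forEach((v) => window4.push(v));
console.log(averages); // [ 2.5, 6.5 ]  (event 9 waits in the open window)
```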
eep.js: Aggregate Functions
What is an aggregate function?
• A function that computes values over events.
• The cost of the calculation is amortized per event
• Just follow the recipe: init(), x() per event, emit()
• Example: Aggregate 2M events (equity prices), send to GPU on emit, receive 2M options put/call prices as a result.
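The init / accumulate / emit recipe can be sketched as small aggregate objects that update their state per event, so emitting a result is O(1). A minimal sketch under assumptions: the `init`, `accumulate` and `emit` method names and the `aggregateOver` driver are hypothetical, not taken from eep.js. The variance aggregate uses Welford's online algorithm, a standard one-pass method that stores no events.

```javascript
// Hypothetical aggregate-function shape: init(), accumulate(value), emit().
// State is updated per event, so the work is amortized and emit() is O(1).
const mean = () => {
  let n = 0;
  let sum = 0;
  return {
    init() { n = 0; sum = 0; },
    accumulate(v) { n += 1; sum += v; },
    emit() { return n === 0 ? NaN : sum / n; },
  };
};

// Population variance via Welford's online algorithm: one pass, no stored events.
const variance = () => {
  let n = 0;
  let m = 0;  // running mean
  let m2 = 0; // running sum of squared deviations from the mean
  return {
    init() { n = 0; m = 0; m2 = 0; },
    accumulate(v) {
      n += 1;
      const d = v - m;
      m += d / n;
      m2 += d * (v - m);
    },
    emit() { return n === 0 ? NaN : m2 / n; },
  };
};

// Drive any aggregate over a batch of events, as one window would.
function aggregateOver(makeAggregate, events) {
  const agg = makeAggregate();
  agg.init();
  for (const e of events) agg.accumulate(e);
  return agg.emit();
}

const prices = [2, 4, 4, 4, 5, 5, 7, 9];
console.log(aggregateOver(mean, prices));     // 5
console.log(aggregateOver(variance, prices)); // approximately 4
```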
eep.js: Sliding Windows
[Diagram: overlapping sliding windows on a timeline t0 to t11; init() opens a new window at every event and x() adds each event to every open window]
What is a sliding window?
• Like tumbling, except windows can overlap. But the cost is O(N²), so keep N small.
• Every event opens a new window.
• After N events, every subsequent event emits a result.
• Like all windows, the cost of calculation is amortized over events
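The behavior described above, where every event opens a window and every event after the first N emits, can be sketched literally: each event is added to up to N open windows, so it costs O(N) per event and O(N²) per N events. A minimal illustration, not eep.js's implementation; `SlidingWindow` is a hypothetical name.

```javascript
// Minimal sketch of a sliding-window average over the last N events,
// following the description literally: every event opens a new window and
// is added to every open window, so each event costs O(N). Not eep.js.
class SlidingWindow {
  constructor(size, onEmit) {
    this.size = size;
    this.onEmit = onEmit;
    this.open = []; // open windows, oldest first; never more than `size`
  }

  push(value) {
    this.open.push({ count: 0, sum: 0 }); // every event opens a new window
    for (const w of this.open) {
      w.count += 1;
      w.sum += value;
    }
    // Once N events have arrived, the oldest window fills on every event.
    if (this.open[0].count === this.size) {
      const full = this.open.shift();
      this.onEmit(full.sum / full.count);
    }
  }
}

const results = [];
const sliding = new SlidingWindow(3, (avg) => results.push(avg));
[1, 2, 3, 4, 5, 6].forEach((v) => sliding.push(v));
console.log(results); // [ 2, 3, 4, 5 ]
```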
eep.js: Monotonic Windows
[Diagram: monotonic windows driven by a user-supplied clock, ticks t0 to t3; each window starts with init(), accumulates events with x(), and closes with emit()]
What is a monotonic window?
• Driven mad by ‘wall clock time’? Need a logical clock?
• No worries. Provide your own clock! E.g. a vector clock
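A monotonic window can be sketched by asking a caller-supplied clock function for the time instead of reading the wall clock. The sketch assumes a simple integer logical clock that the application advances; a vector clock is only partially ordered, so it would need an application-specific rule for deciding that a window has closed. `MonotonicWindow` and its parameters are hypothetical names, not eep.js's API.

```javascript
// Hypothetical sketch: a window driven by a caller-supplied clock, not wall-clock time.
// `clock` must return non-decreasing numbers; here it is a logical tick counter.
class MonotonicWindow {
  constructor(interval, clock, onEmit) {
    this.interval = interval; // window length in clock units
    this.clock = clock;
    this.onEmit = onEmit;
    this.init(clock());
  }

  init(start) {
    this.start = start;
    this.count = 0;
    this.sum = 0;
  }

  push(value) {
    const now = this.clock();
    if (now < this.start) throw new Error("clock went backwards");
    // The window is checked lazily: it closes when the next event arrives.
    if (now - this.start >= this.interval) {
      if (this.count > 0) this.onEmit(this.sum / this.count);
      this.init(now);
    }
    this.count += 1;
    this.sum += value;
  }
}

let tick = 0; // the application's logical clock
const emitted = [];
const monotonic = new MonotonicWindow(2, () => tick, (avg) => emitted.push(avg));
for (const value of [10, 20, 30, 40, 50]) {
  monotonic.push(value);
  tick += 1; // advance logical time after each event
}
console.log(emitted); // [ 15, 35 ]
```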
eep.js
Embedded Event Processing:
• Simple to use. Aggregate functions and windowed event processing.
• Get it from GitHub/npm soon. Use it. Fork it.
• Fast. CEP engines typically handle ~250K events/sec.
• For small N (the most common case), it is 34x to 200x faster than commercial CEP engines.
• But, at a small price: it is simple. No multi-dimensional, infinite or predicate windows.
• Reduces a flood of events into a few in near real time
• Can handle 8-40 million events per second (max, on my laptop). YMMV.
• Combinators may be added. [Ugh, if I need combinators]
Performance? In perspective
• A 1 producer, 1 consumer lock-free wait-free full duplex queue implementation on a 2.3GHz Intel Sandy Bridge can:
• Distribute ~300M events between hyperthreads per second
• Distribute ~50M events per second between two hardware threads on two cores on the same physical die
• Distribute ~30M events per second between two hardware threads on two cores on separate physical dies
• You can, with a fully lock-free wait-free system (and if you bypass the operating system kernel), maybe distribute ~1M 1K events/second
• There’s no point being capable of > 30M events/second on a thread if you’re going over a wire.
• So, 8-40 million events/second in node is a pleasant sufficiency
• It’s not the algorithm. It’s the mechanical sympathy, stoopid!
• Lock free wait-free concurrency is easier than lock based concurrency. Try it.
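To make the single-producer, single-consumer idea concrete, here is a sketch of a bounded SPSC ring buffer in JavaScript over a `SharedArrayBuffer` with `Atomics`. It is an assumption-laden illustration, not the queue benchmarked above: it carries numbers only, assumes a power-of-two capacity, and `SpscRing`, `offer` and `poll` are hypothetical names. Because each index has exactly one writer, neither side needs a lock or a CAS retry loop.

```javascript
// Hypothetical sketch of a single-producer, single-consumer (SPSC) ring buffer
// over shared memory. Not the queue measured above. The producer only writes
// `tail`, the consumer only writes `head`, so no locks or CAS loops are needed
// and each call finishes in a bounded number of steps (wait-free).
class SpscRing {
  constructor(capacity, sharedBuffer) {
    if (capacity <= 0 || (capacity & (capacity - 1)) !== 0) {
      throw new Error("capacity must be a power of two");
    }
    this.capacity = capacity;
    this.mask = capacity - 1;
    // Layout: [head:int32, tail:int32, slots: float64 x capacity].
    this.buffer = sharedBuffer ?? new SharedArrayBuffer(8 + 8 * capacity);
    this.indices = new Int32Array(this.buffer, 0, 2);
    this.slots = new Float64Array(this.buffer, 8, capacity);
  }

  offer(value) { // producer thread only
    const tail = Atomics.load(this.indices, 1);
    const head = Atomics.load(this.indices, 0);
    if (((tail - head) | 0) === this.capacity) return false; // full
    this.slots[tail & this.mask] = value;
    Atomics.store(this.indices, 1, tail + 1); // publish the slot after writing it
    return true;
  }

  poll() { // consumer thread only
    const head = Atomics.load(this.indices, 0);
    const tail = Atomics.load(this.indices, 1);
    if (head === tail) return undefined; // empty
    const value = this.slots[head & this.mask];
    Atomics.store(this.indices, 0, head + 1); // hand the slot back to the producer
    return value;
  }
}

// Single-threaded usage; across worker_threads, pass ring.buffer to the worker
// and wrap it with `new SpscRing(capacity, buffer)` on the other side.
const ring = new SpscRing(4);
for (let i = 1; i <= 5; i++) console.log(i, ring.offer(i)); // 5 is rejected: full
console.log(ring.poll(), ring.poll()); // 1 2
```

The `| 0` and `& mask` arithmetic keeps the indices correct when the 32-bit counters wrap around.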
• Thank you for listening
• Thank you for having me
• Thank you Push for the beer budget
• Le twitter: @darachennis
• Expect eep.js on GitHub soon
• I’ll hashtag it #nodedublin
• Thank you @Waratek geeks.