Data distribution in the cloud with Node.js

  1. Data Distribution in the cloud with Node.js Copyright Push Technology 2012
  2. Introducing Push Technology • British startup. Founded in 2006. • ‘Last mile’ data distribution specialist. • Data-centric approach to messaging/caching. • Preferred by 6 of the top 10 online eGaming exchanges. • Growing fast. 400% year on year. • Focus: Better bang for your bytes! Twitter: @push_technology
  3. About me? • Distributed Systems / HPC guy. • Chief Scientist at Push Technology. • Alumnus of: Motorola, IONA, Betfair, JPMC, StreamBase. • School: Trinity College Dublin. - BA (Mod). Comp. Sci.+ - M.Sc. Networks & Distributed Systems • Responds to: Guinness, Whisky
  4. About me? • Favorite language: Erlang • Favorite bits? • OTP – Behaviors • Bit Syntax • Least favorite language: Java • Paid to write this stuff • Love the JVM • Liking Node a lot. • Small fast data guy. I work in microseconds, measure in nanoseconds. On my critical path micro-benchmarking is a way of life.
  5. 1st clean room certified JVM in 10 years. Built in Dublin! It rocks. [Diagram: Tenant #1, Tenant #2, ... Tenant #N, each running Diffusion; Push Technology Diffusion on the Waratek Cloud VM for Java.] Benefits: • High density deployments • Elastic. Scalable on demand • Meterability: Bandwidth and compute utilization • Multi-tenant. Each tenant fully isolated
  6. A US Cap Market second? • 174 microseconds round trip time rules out High Frequency Trading applications. Not on the critical path! Source: Me, former life @StreamBase
  7. Data Distribution. Wat?
  8. Traditional Messaging [Diagram: producers A, B; consumers ba, bb; a ‘?’ in between.] Pros: • Loosely coupled. • All you can eat messaging patterns. • Familiar. Cons: • No data model. Slinging blobs. • Fast producer, slow consumer? Ouch. • No data ‘smarts’. A blob is a blob.
  9. Invented yonks ago… Before the InterWebs For ‘reliable’ networks For machine to machine Remember DEC Message Queues? - That basically. Vomit!
  10. When fallacies were simple -The network is reliable -Latency is zero -Bandwidth is infinite -There is one administrator -The network is secure -Transport cost is zero -The network is homogeneous
  11. Then in 1992, this happened: The phrase ‘surfing the internet’ was coined by Jean Armour Polly. First base.
  12. It grew, and it grew
  13. Then in 2007, this happened: The god phone: Surfing died. Touching happened. Second base unlocked.
  14. Then in 2007, this happened: So we took all the things and put them in the internet: Cloud happened. So we could touch all the things. Messaging Apps Hardware Virtualize all the things Services Skills, Specialties
  15. Then in 2009, this happened: Ryan Dahl, basically. Tyrannically asynchronous. Devilishly event oriented. Amazoidingly non-blocking.
  16. It grew, and it grew Like all the good things do.
  17. Stop. Fallacies? Reality: -The network is not reliable, nor is it cost free. -Latency is not zero, nor is it a democracy. -Bandwidth is not infinite nor predictable, especially the last mile! -There is not only one administrator: trust and relationships are key. -The network is not secure, nor is the data that flows through it. -Transport cost is not zero, but what you don’t do is free. -The network is not homogeneous, nor is it smart.
  18. Look. What, How & Why? -What and How are what geeks do. -Why gets you paid. -Business Value and Trust dictate What and How. -Policies, Events and Content implement Business Value. -Science, basically. But think like a carpenter: -Measure twice. Cut once.
  19. The Problem: The bird, basically. Immediately Inconsistent. But, Eventually Consistent … Maybe.
  20. Listen. - Every nuance comes with a set of tradeoffs. - Choosing the right ones can be hard, but it pays off. - Context, Environment are critical - Break all the rules, one benchmark at a time. - Benchmark Driven Development FTW
  21. Act. - You measured twice, right? - So get cutting! - Simples
  22. Act. Telepathy? Telemetry! [Diagram: producers A, B; consumers ba, bb; buffer bloat in between.] Virtualize client queues? Nuance: ‘See’ backlog, client affinity. Tradeoff? GD harder :/
  23. Act. Stateless or Stateful Topics [Diagram: producers A, B; consumers ba, bb; topic x in between. Is it a cache?] Data one hop closer to consumers. Good state? Touch it! Exploit it! Use it!
  24. Act. Finagle the data [Diagram: producers A, B; consumers ba, bb; topic x holding state, sending a snapshot then deltas. State!] Last value cached. Tradeoff? Memory. Snapshot on subscribe. Deltas thereafter.
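
The snapshot-on-subscribe idea can be sketched in a few lines of Node.js. This is a minimal illustration only, not Diffusion's API: the `Topic` class and its `subscribe` and `publish` methods are hypothetical names chosen for the example.

```javascript
// Hypothetical sketch: a stateful topic that caches the last value.
// New subscribers get a full snapshot; everyone gets deltas thereafter.
class Topic {
  constructor() {
    this.state = {};              // last value cache (the memory tradeoff)
    this.subscribers = new Set();
  }

  subscribe(onMessage) {
    this.subscribers.add(onMessage);
    // Snapshot on subscribe: a copy of the full current state.
    onMessage({ type: 'snapshot', data: { ...this.state } });
    return () => this.subscribers.delete(onMessage);
  }

  publish(update) {
    // Keep only the fields whose values actually changed.
    const delta = {};
    for (const [key, value] of Object.entries(update)) {
      if (this.state[key] !== value) {
        delta[key] = value;
        this.state[key] = value;
      }
    }
    if (Object.keys(delta).length === 0) return; // nothing new, send nothing
    for (const notify of this.subscribers) {
      notify({ type: 'delta', data: delta });
    }
  }
}

const topic = new Topic();
topic.publish({ bid: 100, ask: 101 });   // no subscribers yet, state is cached
const received = [];
topic.subscribe((msg) => received.push(msg));
topic.publish({ bid: 100, ask: 102 });   // only `ask` changed
```

The late subscriber first sees the whole cached state, then only the changed field. The cost is the memory held per topic, which is the tradeoff the slide names.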
  25. Act. ‘Smart data’ [Diagram: list A, B, C at t0 becomes A, C, D at t1.] Don’t repeat yourself. Send the changes, not the whole list after initial ‘snapshot’.
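
As a rough sketch of sending changes rather than the whole list, the two states from the slide (A, B, C then A, C, D) can be diffed as below. The helper names `diffList` and `applyDelta` are made up for this example, and the sketch assumes the list is treated as a set of unique items, so ordering changes are not tracked.

```javascript
// Hypothetical sketch: compute what changed between two versions of a list.
function diffList(previous, current) {
  const before = new Set(previous);
  const after = new Set(current);
  return {
    added: current.filter((item) => !before.has(item)),
    removed: previous.filter((item) => !after.has(item)),
  };
}

// Apply a delta on the consumer side to rebuild the current list.
function applyDelta(list, delta) {
  const removed = new Set(delta.removed);
  return list.filter((item) => !removed.has(item)).concat(delta.added);
}

const t0 = ['A', 'B', 'C'];   // initial snapshot
const t1 = ['A', 'C', 'D'];   // next state at the producer
const delta = diffList(t0, t1);          // what goes on the wire
const rebuilt = applyDelta(t0, delta);   // what the consumer ends up with
```

Only the removal of B and the addition of D travel to the consumer; A and C are never resent.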
  26. Act. Behaviors [Diagram: producers A, B; consumers ba, bb; topic x. The topic is the cloud!] Extensible. Nuance? Roll your own protocols. Tradeoff? 3rd party code in the engine :/
  27. Data Distribution. Messaging remixed around: Relevance - Queue depth for conflatable data should be 0 or 1. No more. Responsiveness - Use HTTP/REST for things. Stream the little things. Timeliness - It’s relative. M2M != M2H. Context - Packed binary, deltas mostly, snapshot on subscribe. Environment - Don’t send 1M 1K events to a mobile phone with 0.5 Mbps.
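
The "queue depth of 0 or 1" rule for conflatable data could look something like the following. It is a sketch under assumptions: `ConflatingQueue`, `offer` and `drain` are hypothetical names, and the currency pairs and prices are invented sample data.

```javascript
// Hypothetical sketch: a per-client conflating queue.
// At most one pending value is held per key; newer values replace older ones.
class ConflatingQueue {
  constructor() {
    this.pending = new Map();   // key -> latest value; Map keeps insertion order
  }

  offer(key, value) {
    // Overwrites any unsent value for the same key, so depth per key is 0 or 1.
    this.pending.set(key, value);
  }

  drain() {
    // Hand everything pending to the (slow) consumer and start afresh.
    const batch = [...this.pending.entries()];
    this.pending.clear();
    return batch;
  }
}

const queue = new ConflatingQueue();
queue.offer('EURUSD', 1.3012);
queue.offer('GBPUSD', 1.6101);
queue.offer('EURUSD', 1.3015);   // replaces the stale EURUSD price
const batch = queue.drain();     // two entries, each the latest for its key
```

A slow consumer therefore never receives a stale price, and the backlog is bounded by the number of keys rather than by the producer's rate.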
  28. An Example Operations:> Tenants :> Gaming Live Internet Apps Finance QA + Dev + UAT
  29. Either way? It’s about the data. Period. The rest (analysis, storage, transformation) is sugar.
  30. Sugar? Streams [Diagram: stream pipeline; labels S, C, Q and w.] Stream Operations: • Mapping. Change/enrich the data structurally. • Aggregation. A ‘window of’ data. E.g. a second’s worth. • Splitting & Filtering. • Combining multiple streams. E.g. temporal pattern matching. • Access/Store. E.g. CRUD, variable, file, …
  31. Sugar? Streams [Diagram: stream pipeline; labels S, C, Q and w.] Stream Operations: • Mapping. Just a function call in Node.js. • Aggregation. A ‘window of’ data. E.g. a second’s worth. • Splitting & Filtering. An expression or a set thereof. • Combining multiple streams. It depends. Can be ‘complex’. • Access/Store. Trivial.
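
To illustrate how mapping and filtering reduce to plain function calls, here is one possible sketch built on Node's standard `EventEmitter`. The helpers `mapStream` and `filterStream`, the `'data'` event name and the trade fields are assumptions made for the example.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helpers: derive a new stream from an existing emitter.
function mapStream(source, fn) {
  const out = new EventEmitter();
  source.on('data', (event) => out.emit('data', fn(event)));
  return out;
}

function filterStream(source, predicate) {
  const out = new EventEmitter();
  source.on('data', (event) => {
    if (predicate(event)) out.emit('data', event);
  });
  return out;
}

const ticks = new EventEmitter();
// Filtering: an expression over each event.
const bigTrades = filterStream(ticks, (t) => t.size >= 100);
// Mapping: a function call that reshapes each event.
const notional = mapStream(bigTrades, (t) => ({ symbol: t.symbol, value: t.price * t.size }));

const results = [];
notional.on('data', (event) => results.push(event));

ticks.emit('data', { symbol: 'ABC', price: 10, size: 50 });    // filtered out
ticks.emit('data', { symbol: 'ABC', price: 10, size: 200 });   // passes through
```

Because `emit` calls listeners synchronously, the chain behaves like nested function calls with no extra queueing.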
  32. Embedded Event Processing with Node.js. eep.js
  33. Introducing eep.js [Diagram: stream pipeline; labels S, C, Q and w.] What is eep.js? • Adds aggregate functions and window operations to Node.js • 4 window types: tumbling, sliding, periodic, monotonic • Node makes evented IO easy. So just add windows. • Fast. 8-40 million events per second (upper bound).
  34. eep.js: Tumbling Windows [Diagram: timeline t0 to t11; each window runs init(), then x() per event, then emit() on close.] What is a tumbling window? • Every N events, give me an average of the last N events • Does not overlap windows • ‘Closing’ a window ‘emits’ a result (the average) • Closing a window opens a new window
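
A count-based tumbling window can be sketched as follows. This is an independent illustration and not the eep.js API: `tumblingAverage` and its callback are hypothetical, and the comments map the steps onto the init(), x() and emit() labels from the diagram.

```javascript
// Hypothetical sketch of a count-based tumbling window (not the eep.js API).
function tumblingAverage(size, onEmit) {
  let sum = 0;     // init(): state for the currently open window
  let count = 0;
  return function enqueue(value) {
    sum += value;  // x(): fold the event into the open window
    count += 1;
    if (count === size) {
      onEmit(sum / count);   // emit(): closing the window yields a result
      sum = 0;               // init(): which opens the next window
      count = 0;
    }
  };
}

const averages = [];
const push = tumblingAverage(4, (avg) => averages.push(avg));
[1, 2, 3, 4, 5, 6, 7, 8, 9].forEach((v) => push(v));
// averages is now [2.5, 6.5]; the ninth event sits in a still-open window.
```

Each event is touched once and windows never overlap, so nine events with N = 4 produce exactly two results.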
  35. eep.js: Aggregate Functions What is an aggregate function? • A function that computes values over events. • The cost of calculation is amortized per event. • Just follow the above recipe. • Example: Aggregate 2M events (equity prices), send to GPU on emit, receive 2M options put/call prices as a result.
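
The recipe the slide refers to is not reproduced in this transcript. The sketch below assumes the init / accumulate / emit shape suggested by the window diagrams; `makeStats` and its method names are hypothetical and may differ from what eep.js actually exposes.

```javascript
// Hypothetical aggregate function following an init / accumulate / emit shape.
// Each call to accumulate() does O(1) work, so the cost is spread per event.
function makeStats() {
  let count, min, max, sum;
  return {
    init() { count = 0; sum = 0; min = Infinity; max = -Infinity; },
    accumulate(value) {
      count += 1;
      sum += value;
      if (value < min) min = value;
      if (value > max) max = value;
    },
    emit() { return { count, min, max, mean: sum / count }; },
  };
}

const stats = makeStats();
stats.init();
[3, 1, 4, 1, 5].forEach((v) => stats.accumulate(v));
const summary = stats.emit();   // { count: 5, min: 1, max: 5, mean: 2.8 }
```

Because the running state is updated on every event, `emit()` has almost nothing left to compute when the window closes.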
  36. meh: Fumbling Windows
  37. Lesser Fumbling Windows
  38. Event Windows, tumbling.
  39. eep.js: Sliding Windows [Diagram: timeline t0 to t11; init(), then overlapping windows, each applying x() to its events.] What is a sliding window? • Like tumbling, except windows can overlap. But O(N²), so keep N small. • Every event opens a new window. • After N events, every subsequent event emits a result. • Like all windows, the cost of calculation is amortized over events.
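
The "every event opens a new window" description can be sketched naively as below. This is an assumption-laden illustration rather than the eep.js implementation: `slidingAverage` is a hypothetical name, and the sketch simply keeps one running sum per open window.

```javascript
// Hypothetical sketch of a count-based sliding window (not the eep.js API).
// Every event opens a new window and is folded into all open windows,
// so each event costs O(N) work: keep N small.
function slidingAverage(size, onEmit) {
  const open = [];   // one running { sum, count } per open window, oldest first
  return function enqueue(value) {
    open.push({ sum: 0, count: 0 });        // this event opens a new window
    for (const w of open) {
      w.sum += value;
      w.count += 1;
    }
    if (open[0].count === size) {           // oldest window is now full
      const full = open.shift();
      onEmit(full.sum / full.count);
    }
  };
}

const averages = [];
const push = slidingAverage(3, (avg) => averages.push(avg));
[1, 2, 3, 4, 5].forEach((v) => push(v));
// averages is now [2, 3, 4]: one result per event once N events have arrived.
```

At most N windows are open at any time, which is why the per-event cost grows with the window size.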
  40. Event Windows, sliding.
  41. eep.js: Periodic Windows [Diagram: timeline t0 to t3; each window runs init(), then x() per event, then emit() on close.] What is a periodic window? • Driven by ‘wall clock time’ in milliseconds • Not monotonic, natch. Beware of NTP
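
A wall-clock-driven window might be sketched like this. It is not the eep.js API: `periodicSum`, `enqueue` and `tick` are hypothetical names, and the clock is passed in as a function (defaulting to `Date.now`) purely so the example can be run with a fake time source.

```javascript
// Hypothetical sketch of a periodic (wall clock) window, not the eep.js API.
// `now` defaults to Date.now; it is injectable so the example is repeatable.
function periodicSum(periodMs, onEmit, now = Date.now) {
  let windowStart = now();
  let sum = 0;
  return {
    enqueue(value) { sum += value; },
    // Call tick() regularly (for example from setInterval) to close windows.
    tick() {
      const t = now();
      // Wall clock time can step backwards (e.g. an NTP adjustment);
      // then t - windowStart is negative and the window just stays open.
      if (t - windowStart >= periodMs) {
        onEmit(sum);
        sum = 0;
        windowStart = t;
      }
    },
  };
}

let fakeTime = 0;
const sums = [];
const win = periodicSum(1000, (s) => sums.push(s), () => fakeTime);
win.enqueue(5); win.enqueue(7);
fakeTime = 400;  win.tick();   // too early, nothing emitted
fakeTime = 1000; win.tick();   // one second elapsed, emits 12
win.enqueue(1);
fakeTime = 900;  win.tick();   // clock stepped back, window stays open
fakeTime = 2000; win.tick();   // emits 1
```

The backwards step at 900 ms shows why the slide warns about NTP: a window keyed to wall clock time can run longer than its nominal period.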
  42. Event Windows, periodic.
  43. eep.js: Monotonic Windows [Diagram: timeline t0 to t3; each window runs init(), then x() per event, then emit() on close, driven by ‘my’ clock.] What is a monotonic window? • Driven mad by ‘wall clock time’? Need a logical clock? • No worries. Provide your own clock! E.g. vector clock
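
Supplying your own clock could look roughly like the sketch below. It is an illustration under assumptions, not the eep.js clock interface: `makeCounterClock` and `monotonicCount` are hypothetical, and a plain counter stands in for the vector clock mentioned on the slide.

```javascript
// Hypothetical sketch of a window driven by a caller-supplied logical clock
// (not the eep.js API). The clock only has to be monotonic: here it is a
// simple counter advanced by the application, e.g. one step per batch.
function makeCounterClock() {
  let time = 0;
  return {
    now: () => time,
    advance: (steps = 1) => { time += steps; },
  };
}

function monotonicCount(span, clock, onEmit) {
  let windowStart = clock.now();
  let count = 0;
  return {
    enqueue() { count += 1; },
    tick() {
      if (clock.now() - windowStart >= span) {
        onEmit(count);           // close the window: emit the event count
        count = 0;
        windowStart = clock.now();
      }
    },
  };
}

const clock = makeCounterClock();
const counts = [];
const win = monotonicCount(2, clock, (c) => counts.push(c));
win.enqueue(); win.enqueue(); win.enqueue();
clock.advance(); win.tick();    // logical time 1: window still open
clock.advance(); win.tick();    // logical time 2: emits 3
win.enqueue();
clock.advance(2); win.tick();   // logical time 4: emits 1
```

Since the application alone advances the clock, window boundaries are immune to wall clock adjustments.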
  44. Event Windows, monotonic.
  45. Event Windows, monotonic.
  46. Event Windows, monotonic.
  47. eep.js Embedded Event Processing: • Simple to use. Aggregate functions and windowed event processing. • Get it from GitHub/npm soon. Use it. Fork it. • Fast. CEP engines typically handle ~250K/sec. • For small N (most common) it is 34x - 200x faster than commercial CEP engines. • But, at a small price. Simple. No multi-dimensional, infinite or predicate windows. • Reduces a flood of events into a few in near real time. • Can handle 8-40 million events per second (max, on my laptop). YMMV. • Combinators may be added. [Ugh, if I need combinators]
  48. Le Performance? [Chart: millions of events per second (log scale, 0.001 to 100) against fixed window size (2 to 16384); series: Java Tumbling, Java Sliding, Node Tumbling, Node Sliding.]
  49. Performance? In perspective • A 1 producer, 1 consumer lock-free wait-free full duplex queue implementation on a 2.3GHz Intel Sandy Bridge can: • Distribute ~300M events between hyperthreads per second • Distribute ~50M events between two hardware threads on two cores on the same physical die • Distribute ~30M events between two hardware threads on two cores on separate physical dies • With a fully lock-free wait-free system (and if you bypass the operating system kernel) you can, maybe, manage ~1M 1K events/second • There’s no point being capable of > 30M events/second on a thread if you’re going over a wire. • So, 8-40 million events/second in Node is a pleasant sufficiency • It’s not the algorithm. It’s the mechanical sympathy, stoopid! • Lock-free wait-free concurrency is easier than lock based concurrency. Try it.
  50. • Thank you for listening • Thank you for having me • Thank you Push for the beer budget • Le twitter: @darachennis • Expect eep.js in GitHub soon • I’ll hashtag it #nodedublin • Thank you @Waratek geeks. About me?

Editor's Notes

  1. ‘Surfing the internet’ was coined by Jean Armour Polly.