Data distribution in the cloud with Node.js


Messaging becomes data distribution and gets embedded event processing (not complex, made simple). Bending all the rules one benchmark at a time. Push Technology, Waratek and other things.

Published in: Technology

  1. Data Distribution in the cloud with Node.js. Copyright Push Technology 2012
  2. Introducing Push Technology. • British startup. Founded in 2006. • ‘Last mile’ data distribution specialist. • Data-centric approach to messaging/caching. • Preferred by 6 of the top 10 online eGaming exchanges. • Growing fast. 400% year on year. • Focus: Better bang for your bytes! Twitter: @push_technology
  3. About me? • Distributed Systems / HPC guy. • Chief Scientist at Push Technology. • Alumnus of Motorola, IONA, Betfair, JPMC, StreamBase. • School: Trinity College Dublin. BA (Mod). Comp. Sci.+, M.Sc. Networks & Distributed Systems. • Responds to: Guinness, Whisky.
  4. About me? • Favorite language: Erlang. Favorite bits? OTP Behaviors, Bit Syntax. • Least favorite language: Java. Paid to write this stuff. Love the JVM. • Liking Node a lot. • Small fast data guy. I work in microseconds, measure in nanoseconds. On my critical path micro-benchmarking is a way of life.
  5. Push Technology Diffusion / Waratek Cloud VM for Java. 1st clean room certified JVM in 10 years. Built in Dublin! It rocks. [Diagram: Tenant #1 (Diffusion), Tenant #2 (Diffusion), Tenant #N (Diffusion)] Benefits: • High density deployments. • Elastic. Scalable on demand. • Meterability: bandwidth and compute utilization. • Multi-tenant. Each tenant fully isolated.
  6. A US Cap Market second? • 174 microseconds round trip time rules out High Frequency Trading applications. Not on the critical path! Source: Me, former life @StreamBase
  7. Data Distribution. Wat?
  8. Traditional Messaging. [Diagram: A, B, ba, bb; Producers -> ? -> Consumers] Pros: • Loosely coupled. • All you can eat messaging patterns. • Familiar. Cons: • No data model. Slinging blobs. • Fast producer, slow consumer? Ouch. • No data ‘smarts’. A blob is a blob.
  9. Invented yonks ago… Before the InterWebs. For ‘reliable’ networks. For machine to machine. Remember DEC Message Queues? That, basically. Vomit!
  10. When fallacies were simple: - The network is reliable - Latency is zero - Bandwidth is infinite - There is one administrator - The network is secure - Transport cost is zero - The network is homogeneous
  11. Then in 1992, this happened: The phrase ‘surfing the internet’ was coined by Jean Armour Polly. First base.
  12. It grew, and it grew.
  13. Then in 2007, this happened: The god phone. Surfing died. Touching happened. Second base unlocked.
  14. Then in 2007, this happened: So we took all the things and put them in the internet. Cloud happened. So we could touch all the things. Virtualize all the things: Messaging, Apps, Hardware, Services, Skills, Specialties.
  15. Then in 2009, this happened: Ryan Dahl, basically. Tyrannically asynchronous. Devilishly event oriented. Amazoidingly non-blocking.
  16. It grew, and it grew. Like all the good things do.
  17. Stop. Fallacies? Reality: - The network is not reliable, nor is it cost free. - Latency is not zero, nor is it a democracy. - Bandwidth is not infinite, nor predictable, especially the last mile! - There is not only one administrator; trust and relationships are key. - The network is not secure, nor is the data that flows through it. - Transport cost is not zero, but what you don’t do is free. - The network is not homogeneous, nor is it smart.
  18. Look. What, How & Why? - What and How are what geeks do. - Why gets you paid. - Business Value and Trust dictate What and How. - Policies, Events and Content implement Business Value. - Science, basically. But think like a carpenter: Measure twice. Cut once.
  19. The Problem: The bird, basically. Immediately Inconsistent. But, Eventually Consistent … Maybe.
  20. Listen. - Every nuance comes with a set of tradeoffs. - Choosing the right ones can be hard, but it pays off. - Context, Environment are critical. - Break all the rules, one benchmark at a time. - Benchmark Driven Development FTW.
  21. Act. - You measured twice, right? - So get cutting! - Simples.
  22. Act. Telepathy? Telemetry! [Diagram: A, B, ba, bb; Producers -> Buffer Bloat -> Consumers] Virtualize client queues? Nuance: ‘See’ backlog, client affinity. Tradeoff: GD harder :/
  23. Act. Stateless or Stateful Topics. [Diagram: A, B, ba x, bb x; Producers -> Is it a cache? -> Consumers] Data one hop closer to consumers. Good state? Touch it! Exploit it! Use it!
  24. Act. Finagle the data. [Diagram: A, B, ba x, bb x; Snapshot, Delta; Producers -> State! -> Consumers] Last value cached. Tradeoff? Memory. Snapshot on subscribe. Deltas thereafter.
  25. Act. ‘Smart data’. [Diagram: A B C at t0, A C D at t1] Don’t repeat yourself. Send the changes, not the whole list, after the initial ‘snapshot’.
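
A minimal sketch of 'snapshot on subscribe, deltas thereafter', using the list from slide 25 (read here as A, B, C at t0 and A, C, D at t1). The names `Topic`, `diff` and `applyDelta` are made up for illustration and are not the Diffusion API; the topic simply caches the last value so that a new subscriber can be handed a full snapshot.

```javascript
// Hypothetical helpers for illustration only; not part of any Push Technology API.

// Compute what changed between two versions of a list.
function diff(previous, current) {
  const before = new Set(previous);
  const after = new Set(current);
  return {
    added: current.filter((item) => !before.has(item)),
    removed: previous.filter((item) => !after.has(item)),
  };
}

// Apply a delta to a client's local copy (survivors keep their order, additions are appended).
function applyDelta(state, delta) {
  const removed = new Set(delta.removed);
  return state.filter((item) => !removed.has(item)).concat(delta.added);
}

// The topic caches the last value: subscribers get a snapshot first, then deltas.
class Topic {
  constructor(initial = []) {
    this.last = initial.slice();
    this.subscribers = [];
  }

  subscribe(onMessage) {
    this.subscribers.push(onMessage);
    onMessage({ type: 'snapshot', value: this.last.slice() });
  }

  publish(next) {
    const delta = diff(this.last, next);
    this.last = next.slice();
    for (const send of this.subscribers) send({ type: 'delta', ...delta });
  }
}

const topic = new Topic(['A', 'B', 'C']); // state at t0
let clientState = [];
const received = [];

topic.subscribe((msg) => {
  received.push(msg);
  clientState = msg.type === 'snapshot' ? msg.value : applyDelta(clientState, msg);
});

topic.publish(['A', 'C', 'D']); // state at t1: only "B removed, D added" goes on the wire
console.log(received, clientState);
```

The tradeoff named on slide 24 shows up directly: `this.last` is memory held per topic in exchange for smaller messages.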
  26. Act. Behaviors. [Diagram: A, B, ba x, bb x, X; Producers -> The topic is the cloud! -> Consumers] Extensible. Nuance? Roll your own protocols. Tradeoff? 3rd party code in the engine :/
  27. Data Distribution. Messaging remixed around: Relevance - Queue depth for conflatable data should be 0 or 1. No more. Responsiveness - Use HTTP/REST for things. Stream the little things. Timeliness - It’s relative. M2M != M2H. Context - Packed binary, deltas mostly, snapshot on subscribe. Environment - Don’t send 1M 1K events to a mobile phone with 0.5 Mbps.
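
The 'queue depth should be 0 or 1' rule can be sketched as a per-client buffer that keeps only the latest unsent value per topic. `ConflatingQueue` is a hypothetical name, and the topic names and prices are made-up sample data; this is one way to read the slide, not how Diffusion implements conflation.

```javascript
// Hypothetical sketch: a per-client outbound buffer that conflates by topic.
class ConflatingQueue {
  constructor() {
    this.pending = new Map(); // topic -> latest unsent value
  }

  // A newer value for the same topic overwrites the stale one.
  offer(topic, value) {
    this.pending.set(topic, value);
  }

  // Depth per topic is only ever 0 or 1.
  depth(topic) {
    return this.pending.has(topic) ? 1 : 0;
  }

  // Hand over everything pending (at most one entry per topic) and empty the buffer.
  drain() {
    const batch = Array.from(this.pending.entries());
    this.pending.clear();
    return batch;
  }
}

const queue = new ConflatingQueue();
queue.offer('EURUSD', 1.3012); // made-up sample values
queue.offer('EURUSD', 1.3015);
queue.offer('GBPUSD', 1.612);
queue.offer('EURUSD', 1.3011); // a slow consumer never sees the two stale EURUSD values

const depthBefore = queue.depth('EURUSD');
const batch = queue.drain();
console.log(depthBefore, batch);
```

A fast producer can call `offer` as often as it likes; a slow consumer pays only for the values that are still relevant when it next drains.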
  28. An Example. Operations: > Tenants: > Gaming, Live, Internet Apps, Finance, QA + Dev + UAT.
  29. Either way? It’s about the data. Period. The rest (analysis, storage, transformation) is sugar.
  30. Sugar? Streams. [Diagram: S, C, Q, w] Stream Operations: • Mapping. Change/enrich the data structurally. • Aggregation. A ‘window of’ data. E.g. a second’s worth. • Splitting & Filtering. • Combining multiple streams. E.g. temporal pattern matching. • Access/Store. E.g. CRUD, variable, file, …
  31. Sugar? Streams. [Diagram: S, C, Q, w] Stream Operations: • Mapping. Just a function call in Node.js. • Aggregation. A ‘window of’ data. E.g. a second’s worth. • Splitting & Filtering. An expression or a set thereof. • Combining multiple streams. It depends. Can be ‘complex’. • Access/Store. Trivial.
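
To make 'mapping is just a function call' and 'filtering is an expression' concrete, here is a small sketch on Node's built-in `EventEmitter`. The `trade` event name, the fields and the 1000 threshold are all assumptions chosen for the example.

```javascript
const { EventEmitter } = require('events');

const source = new EventEmitter();
const large = [];
const small = [];

source.on('trade', (trade) => {
  // Mapping: enrich the event structurally with a derived field.
  const enriched = { ...trade, notional: trade.price * trade.qty };

  // Splitting and filtering: a plain expression routes the event.
  if (enriched.notional >= 1000) {
    large.push(enriched);
  } else {
    small.push(enriched);
  }
});

// Made-up sample events.
source.emit('trade', { symbol: 'ABC', price: 10, qty: 50 });
source.emit('trade', { symbol: 'XYZ', price: 25, qty: 100 });

console.log({ large, small });
```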
  32. Embedded Event Processing with Node.js: eep.js
  33. Introducing eep.js. [Diagram: S, C, Q, w] What is eep.js? • Adds aggregate functions and window operations to Node.js. • 4 window types: tumbling, sliding, periodic, monotonic. • Node makes evented IO easy. So just add windows. • Fast. 8-40 million events per second (upper bound).
  34. eep.js: Tumbling Windows. [Diagram: event timeline t0, t1, t2, ...; each window runs init(), then x() per event, then emit()] What is a tumbling window? • Every N events, give me an average of the last N events. • Windows do not overlap. • ‘Closing’ a window ‘emits’ a result (the average). • Closing a window opens a new window.
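
A minimal sketch of a tumbling window that averages every N events, following the four bullets on slide 34. `TumblingWindow` and `enqueue` are illustrative names and should not be taken as the eep.js API.

```javascript
// Hypothetical sketch of a tumbling window; not the eep.js implementation.
class TumblingWindow {
  constructor(size, onEmit) {
    this.size = size;
    this.onEmit = onEmit;
    this.init();
  }

  // Open a fresh, empty window.
  init() {
    this.sum = 0;
    this.count = 0;
  }

  enqueue(value) {
    // Constant work per event: the cost is spread across the window.
    this.sum += value;
    this.count += 1;

    if (this.count === this.size) {
      this.onEmit(this.sum / this.count); // closing the window emits the average
      this.init();                        // and opens a new, non-overlapping window
    }
  }
}

const averages = [];
const win = new TumblingWindow(4, (avg) => averages.push(avg));
for (let i = 1; i <= 10; i++) win.enqueue(i);

console.log(averages); // events 9 and 10 are still sitting in the open window
```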
  35. eep.js: Aggregate Functions. What is an aggregate function? • A function that computes values over events. • The cost of calculation is amortized per event. • Just follow the above recipe. • Example: Aggregate 2M events (equity prices), send to GPU on emit, receive 2M options put/call prices as a result.
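
One way to picture an aggregate function is as an object with the three steps labelled on the window diagrams: `init()`, a per-event step (drawn as `x()`, called `accumulate` here) and `emit()`. The factory name `makeStats` and the sample prices are assumptions; the sketch is not the eep.js contract.

```javascript
// Hypothetical aggregate function: running count, min, max and mean.
function makeStats() {
  let count;
  let min;
  let max;
  let mean;

  return {
    init() {
      count = 0;
      min = Infinity;
      max = -Infinity;
      mean = 0;
    },
    // Constant work per event, so nothing expensive is left for emit().
    accumulate(x) {
      count += 1;
      if (x < min) min = x;
      if (x > max) max = x;
      mean += (x - mean) / count; // incremental mean, no stored history
    },
    emit() {
      return { count, min, max, mean };
    },
  };
}

const stats = makeStats();
stats.init();
for (const price of [10, 12, 11, 15]) stats.accumulate(price); // made-up prices
const summary = stats.emit();
console.log(summary);
```

Because the running state is a handful of numbers, the same aggregate can be plugged into any window type without the window having to buffer events.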
  36. meh: Fumbling Windows
  37. Lesser Fumbling Windows
  38. Event Windows, tumbling.
  39. eep.js: Sliding Windows. [Diagram: event timeline t0, t1, t2, ...; init(), then overlapping windows each fed by x()] What is a sliding window? • Like tumbling, except windows can overlap. But O(N²), so keep N small. • Every event opens a new window. • After N events, every subsequent event emits a result. • Like all windows, the cost of calculation is amortized over events.
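
The sliding-window bullets can be sketched literally: every event opens a new window and is accumulated into all open windows, and once the oldest window holds N events it emits and closes. `SlidingWindow` is a hypothetical name and this is not the eep.js implementation; the per-event loop over open windows is why N should stay small.

```javascript
// Hypothetical sketch of a sliding window computing an average over the last N events.
class SlidingWindow {
  constructor(size, onEmit) {
    this.size = size;
    this.onEmit = onEmit;
    this.open = []; // open windows, oldest first; each is { sum, count }
  }

  enqueue(value) {
    // Every event opens a new window...
    this.open.push({ sum: 0, count: 0 });

    // ...and is accumulated into every window that is currently open.
    for (const w of this.open) {
      w.sum += value;
      w.count += 1;
    }

    // Once the oldest window holds N events it emits and closes.
    if (this.open[0].count === this.size) {
      const full = this.open.shift();
      this.onEmit(full.sum / full.count);
    }
  }
}

const out = [];
const sliding = new SlidingWindow(3, (avg) => out.push(avg));
for (const v of [1, 2, 3, 4, 5]) sliding.enqueue(v);

console.log(out); // one result per event once the first 3 events have arrived
```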
  40. Event Windows, sliding.
  41. eep.js: Periodic Windows. [Diagram: wall-clock timeline t0, t1, t2, t3, ...; each window runs init(), then x() per event, then emit()] What is a periodic window? • Driven by ‘wall clock time’ in milliseconds. • Not monotonic, natch. Beware of NTP.
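
A sketch of a periodic window driven by wall-clock milliseconds via `Date.now()`. `PeriodicWindow` and its `tick` method are illustrative names, not the eep.js API. Because `Date.now()` can jump when the system clock is adjusted (the NTP caveat on the slide), a window may close early or late.

```javascript
// Hypothetical sketch of a wall-clock driven window.
class PeriodicWindow {
  constructor(intervalMs, onEmit) {
    this.intervalMs = intervalMs;
    this.onEmit = onEmit;
    this.init();
  }

  // Open a fresh window stamped with the current wall-clock time.
  init() {
    this.openedAt = Date.now();
    this.sum = 0;
    this.count = 0;
  }

  enqueue(value) {
    this.sum += value;
    this.count += 1;
  }

  // Call this regularly; the window closes once the interval has elapsed.
  tick() {
    if (Date.now() - this.openedAt >= this.intervalMs) {
      this.onEmit({ count: this.count, sum: this.sum });
      this.init();
    }
  }
}

const emitted = [];
const periodic = new PeriodicWindow(1000, (result) => emitted.push(result));
periodic.enqueue(5);
periodic.enqueue(7);
periodic.tick(); // called straight away, so the 1000 ms interval has not elapsed yet
```

In a running process the ticks would typically come from a timer, for example `setInterval(() => periodic.tick(), 100)`.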
  42. Event Windows, periodic.
  43. eep.js: Monotonic Windows. [Diagram: timeline t0, t1, t2, t3, ... on ‘my’ clock; each window runs init(), then x() per event, then emit()] What is a monotonic window? • Driven mad by ‘wall clock time’? Need a logical clock? • No worries. Provide your own clock! E.g. a vector clock.
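
A sketch of a window driven by a caller-supplied clock instead of wall-clock time. `CounterClock` is a deliberately simple logical clock (a scalar counter, not a vector clock) and `MonotonicWindow` is a hypothetical name; neither is the eep.js API.

```javascript
// A minimal logical clock: time only moves forward, and only when told to.
class CounterClock {
  constructor() {
    this.time = 0;
  }
  now() {
    return this.time;
  }
  advance(by = 1) {
    this.time += by;
    return this.time;
  }
}

// Hypothetical sketch of a window that closes after `interval` units of the supplied clock.
class MonotonicWindow {
  constructor(clock, interval, onEmit) {
    this.clock = clock;
    this.interval = interval;
    this.onEmit = onEmit;
    this.init();
  }

  init() {
    this.openedAt = this.clock.now();
    this.sum = 0;
    this.count = 0;
  }

  enqueue(value) {
    this.sum += value;
    this.count += 1;
  }

  tick() {
    if (this.clock.now() - this.openedAt >= this.interval) {
      this.onEmit({ at: this.clock.now(), count: this.count, sum: this.sum });
      this.init();
    }
  }
}

const clock = new CounterClock();
const results = [];
const mono = new MonotonicWindow(clock, 3, (r) => results.push(r));

for (const v of [1, 2, 3, 4, 5, 6, 7]) {
  mono.enqueue(v);
  clock.advance(); // the application decides when logical time moves
  mono.tick();
}

console.log(results);
```

Since the clock never goes backwards and never jumps on its own, the same input always produces the same windows, which also makes the behaviour easy to test.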
  44. Event Windows, monotonic.
  45. Event Windows, monotonic.
  46. Event Windows, monotonic.
  47. eep.js, Embedded Event Processing: • Simple to use. Aggregate functions and windowed event processing. • Get it from GitHub/npm soon. Use it. Fork it. • Fast. CEP engines typically handle ~250K/sec. • For small N (most common), it is 34x - 200x faster than commercial CEP engines. • But, at a small price. Simple. No multi-dimensional, infinite or predicate windows. • Reduces a flood of events into a few in near real time. • Can handle 8-40 million events per second (max, on my laptop). YMMV. • Combinators may be added. [Ugh, if I need combinators]
  48. Le Performance? [Chart: millions of events per second (log scale, 0.001 to 100) against fixed window size (2 to 16384); series: Java Tumbling, Java Sliding, Node Tumbling, Node Sliding]
  49. Performance? In perspective. • A 1 producer, 1 consumer lock-free wait-free full duplex queue implementation on a 2.3GHz Intel Sandy Bridge can: distribute ~300M events per second between hyperthreads; distribute ~50M events between two hardware threads on two cores on the same physical die; distribute ~30M events between two hardware threads on two cores on separate physical dies. • With a fully lock-free wait-free system (and bypassing the operating system kernel) you can move, maybe, ~1M 1K events/second. • There’s no point being capable of > 30M events/second on a thread if you’re going over a wire. • So, 8-40 million events/second in Node is a pleasant sufficiency. • It’s not the algorithm. It’s the mechanical sympathy, stoopid! • Lock-free wait-free concurrency is easier than lock based concurrency. Try it.
  50. About me? • Thank you for listening • Thank you for having me • Thank you Push for the beer budget • Le twitter: @darachennis • Expect eep.js in GitHub soon • I’ll hashtag it #nodedublin • Thank you @Waratek geeks.