Distributed rules engines and CEP

Speaker: John Davies
We've had powerful Rules Engines and Complex Event Processing for a good decade now, and there are several powerful ones on the market, some even open source. Many of these engines, though, have been built around single, albeit efficient, applications running on a single machine. As we take Big Data head on, we start to see the need for rules and complex event processing that we can distribute across our distributed systems. Combining Pivotal's GemFire and C24's Integration Objects, naturally glued together with a little Spring, we can process millions of complex events in seconds. John will walk through some of the design and use cases of these powerful systems.

Published in: Technology

Distributed rules engines and CEP

  1. Distributed rules engines and CEP By John T Davies © 2013 SpringOne 2GX. All rights reserved. Do not distribute without permission.
  2. About Me • I’m a CTO; I’ve spent 30 years in IT, designed hardware, and programmed in assembler, C, C++, Objective-C, Java and a whole host of other languages • Today I concentrate on ideas. I still program and I spend a lot of time prototyping (in Java 7-8, Scala, HTML5, JavaFX and Objective-C), but I don’t do 9-5 programming all day – “9-5” for programmers means 24/7 • We’re working on some seriously cool stuff with the Spring guys; hopefully I can inspire you with some ideas...
  3. Pragmatism vs. Buzz-words • During my talk you’ll hear a lot of buzz-words like “CEP” and “Big Data” • To me these are often vendor-attributed “tags” for something we techies know as something else • One of my goals in this talk is to demystify the buzz-words • At the end of the day, if the vendor’s telling you that you need an X but Y works, then use Y – Vendors are driven by marketing, and marketers are usually people who can’t understand technology or sales, so they go into marketing
  4. Agenda • The different types of rules • Big Data (definition) • Parsing messages (at a million per second) • Spring SpEL filters - too “slow” • In-memory compilation • Putting it all together
  5. Rules, Rules, Rules Engines and CEP • Every time one of the above comes up, the conversation starts bringing in the others – So let’s first look at the different usages of rules and event processing • I’d like to start with the idea that a rule, or being able to process an event, is basically programming logic – We decide whether to perform an action based on the content of an event or some outside condition, e.g. time
  6. Five types of Rules • There may be more, or fewer, but I have 5 for now – Message syntax and semantic rules • Wire-level and business-level rules associated with the message – Routing & Decision Rules • Routing based on message content – Rules Engines • As above but involving multiple sources, often with aggregation – CEP (Complex Event Processing) • Really as above but someone’s tagged it “CEP” – Matching and Transformation rules
  7. A little more detail... • Message syntax and semantic rules • Syntactical Rules – DateFormat, Integer range (0-255), Regex ([A-Z]{3}) – Can be mostly defined by XML Schema – CRC checksums • Semantic Rules – ElementX must be from a list in my database (e.g. Currencies) – TradeDate must be before SettlementDate – Total amount bought today must not exceed the daily limit
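To make the syntactic/semantic split concrete, here is a minimal Java sketch. The class name `TradeRules`, the hard-coded currency set and the method names are hypothetical, chosen only for illustration; in the talk's setting the currency list would come from a database and the checks would live in generated message code.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Set;
import java.util.regex.Pattern;

final class TradeRules {
    // Hypothetical reference data; stands in for "a list in my database".
    private static final Set<String> CURRENCIES = Set.of("USD", "EUR", "GBP");
    private static final Pattern THREE_UPPER = Pattern.compile("[A-Z]{3}");

    // Syntactic rule: the value must match the regex [A-Z]{3}.
    static boolean isCurrencySyntaxValid(String ccy) {
        return ccy != null && THREE_UPPER.matcher(ccy).matches();
    }

    // Syntactic rule: integer range 0-255.
    static boolean isInByteRange(int value) {
        return value >= 0 && value <= 255;
    }

    // Syntactic rule: the text must parse as an ISO date (yyyy-MM-dd).
    static boolean isIsoDate(String text) {
        try {
            LocalDate.parse(text, DateTimeFormatter.ISO_LOCAL_DATE);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Semantic rule: the value must be in the reference list.
    static boolean isKnownCurrency(String ccy) {
        return CURRENCIES.contains(ccy);
    }

    // Semantic rule: TradeDate must be before SettlementDate.
    static boolean isTradeBeforeSettlement(LocalDate trade, LocalDate settlement) {
        return trade.isBefore(settlement);
    }
}
```

Note that "XXX" passes the syntactic check but fails the semantic one, which is exactly why the two layers are kept apart.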
  8. Routing & Decision Rules • Probably mostly what you’re used to with Spring Integration and Spring Batch – <filter input-channel="filter-message-channel" output-channel="process-message-channel" expression="payload.commission > 1000"/> – <filter input-channel="in-channel" output-channel="out-channel" expression="msg.valid == 'false'"/> • These rules are both static and dynamic, and it’s the dynamic part we’re going to look at later
  9. Rules Engines • Enter the realm of Rules Engines vs. CEP • To me a rules engine is a container that executes rules • BUT, things are often made easier for the user by adding friendly features like a framework or DSL • Drools is a good example, though not the only one
  10. CEP - Complex Event Processing • Immediate thoughts turn towards trading platforms – Algo traders and Quants writing new trading algorithms • It’s difficult to separate from a rules engine, other than the fact that it’s more mature and usually runs in near “real-time” • Usually accompanied by shiny-suited salesmen, great marketing material with expensive graphics and money-making examples
  11. Matching and Transformation • Matching (and reconciliation) – This is like a special case of a search, basically we’re looking for matching pairs (or batches) of messages – Used in Financial Services for matching trades or reconciliation • Transformation – Messages are rarely in the right form to be used without some form of transformation – Transformation could be to a different wire protocol or to a totally different structure e.g. a canonical message format
  12. Layers of Rules • Basically we have wire-level rules – Mostly syntactic and some semantic (business) • Basic routing rules – Generally based around business content • Rules engines and complex rules (including CEP) – Usually involving multiple sources and aggregation of data over periods of time, some long running
  13. Rule Distribution • Distributing these rules has different problems... – Message syntax and semantic rules • Associated with the message so inherently distributed – Routing & Decision Rules • Rules can be chained along the message flow so relatively distributable – Rules Engines • To date, most engines run on single machines, very few are distributed – CEP • As with the engines, to date these are not distributed. They were not built for “big-data” sized problems
  14. Big Data - What is it? • Another buzzword, my definition... Data that won’t fit on one machine • We have machines with virtually unlimited storage but memory is still limited – 100k messages per second (24/7) is 8.64 billion/day – In 6 months that’s 1.5 trillion (1.5 PB for 1 kB messages) – Try sorting that without hitting the disk drive!
  15. Big Data Architecture • Big Data architecture is basically distributed architecture – By (my) definition, it’s data that doesn’t fit on one machine so it has to be distributed • So nothing really new, again it’s another buzzword sold to you by the vendors • Don’t get me wrong, Big Data is real and we do need it but it’s just a label put on things we’ve been doing for years
  16. OK - Getting down to it • NB: I’m being purposely vague about the clients; unfortunately we’re not allowed to mention them • Here are a couple of scenarios • Telco taking cell-tower data – Binary data, 200-700 bytes, 20-500k/sec, 24/7 • Investment Bank – Tag/Value data, 500 bytes-1.5 kB, 20-400k/sec (per exchange), 8-12 hours/day
  17. We have two scenarios here • One is filtering and processing the messages in near-realtime – Performance is critical; not being able to manage the peaks (300-400k+/sec) would result in massive backlogs, requiring impractical amounts of memory – The more we can filter at this level the less we need to do later... • The other is querying, sorting and triggers (event processing) – Not just realtime but also aggregated and historic data • Aggregate queries like min, max, avg, moving avg etc. – Some of the results from this “layer” can be used in the other • I.e. filtering and message processing can be based on aggregate data
  18. Putting the volume into perspective • If we can process 200k/sec and we get a peak of 400k/sec for an hour (e.g. Christmas, Thanksgiving, breaking news etc.) – We have a backlog of 200k/sec for 3,600 seconds; if the volume drops to 150k/sec it will take 4 hours to catch up – And we’ll need 360 GB of RAM to queue it!!! • Even working at 500k/sec with a peak of 550k/sec for 30 minutes, dropping to 400k/sec – 15 minutes’ delay and 45 GB of RAM • We need REALTIME
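The backlog figures above can be reproduced with a few lines of arithmetic. This is a sketch under one assumption that the slide does not state: an average message size of 500 bytes, which is the value that makes the 360 GB and 45 GB figures come out. The class and method names are illustrative only.

```java
final class BacklogEstimate {
    /** Messages queued when arrivals exceed capacity for the length of the peak. */
    static long backlogMessages(long capacityPerSec, long peakPerSec, long peakSeconds) {
        return Math.max(0L, peakPerSec - capacityPerSec) * peakSeconds;
    }

    /** Seconds needed to drain the backlog once arrivals drop below capacity. */
    static long catchUpSeconds(long backlog, long capacityPerSec, long laterPerSec) {
        long spare = capacityPerSec - laterPerSec;
        if (spare <= 0) {
            throw new IllegalArgumentException("no spare capacity, the backlog never drains");
        }
        return backlog / spare;
    }

    /** Memory needed to hold the backlog, for an assumed average message size. */
    static long queueBytes(long backlog, long bytesPerMessage) {
        return backlog * bytesPerMessage;
    }

    public static void main(String[] args) {
        long assumedBytesPerMessage = 500; // assumption, not stated on the slide

        // Scenario 1: capacity 200k/sec, peak 400k/sec for one hour, then 150k/sec.
        long backlog = backlogMessages(200_000, 400_000, 3_600);
        System.out.println("backlog (messages): " + backlog);
        System.out.println("catch-up (hours):   " + catchUpSeconds(backlog, 200_000, 150_000) / 3_600);
        System.out.println("queue (GB):         " + queueBytes(backlog, assumedBytesPerMessage) / 1_000_000_000L);
    }
}
```

With those inputs the first scenario gives 720 million queued messages, 4 hours to catch up and 360 GB; the second scenario (500k/sec capacity, 550k/sec for 30 minutes, then 400k/sec) gives 90 million messages, 15 minutes and 45 GB.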
  19. Message Parsing • Taking a message with about 30-40 attributes and parsing them so that they are available to Spring Integration/Batch is not exactly complex – But providing the ability to manage the changes and dozens of standards starts to add complexity – Things change over time, and we need to manage that change too • Two relatively simple examples from the Telco and Financial Services industries are RADIUS (RFC 2865) and FIX (from FPL) – Others include ASN.1 and ISO 8583
  20. RADIUS (RFC 2865) • Remote Authentication Dial In User Service (RADIUS) – 77 pages of binary spec...
  21. FIX FAST Spec... • Int32 Example – Mandatory Negative Number with sign-bit extension
  22. Modelling in C24 first • The RADIUS standard...
  23. The FIX standard... • Modelled in C24-iO
  24. What we can now do • The spec may say that bit 13 of a 32-bit field represents the presence of a field ABC (later in the message) – Programmatically we can test bit 13 with a mask 0x00002000 – Using the generated code we can simply call isAbcSet() • It may say that bits 4-6 represent the version ID – Programmatically we can mask it and then shift it... • mask 0x00000070 • shift >> 4 – Using the generated code we can simply call getVersionId()
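The mask-and-shift logic can be sketched by hand as follows. The method names `isAbcSet()` and `getVersionId()` come from the slide; the class `FlagBits` and its constants are hypothetical and are not the C24-generated code, which would hide these details behind the same two calls.

```java
final class FlagBits {
    static final int ABC_PRESENT_MASK = 0x00002000; // bit 13, counting from bit 0
    static final int VERSION_MASK     = 0x00000070; // bits 4-6
    static final int VERSION_SHIFT    = 4;

    // True when bit 13 is set, i.e. field ABC is present later in the message.
    static boolean isAbcSet(int flags) {
        return (flags & ABC_PRESENT_MASK) != 0;
    }

    // Isolate bits 4-6, then shift them down to get a value in the range 0-7.
    static int getVersionId(int flags) {
        return (flags & VERSION_MASK) >>> VERSION_SHIFT;
    }
}
```

For example, a flags word of 0x00002050 has bit 13 set and a version ID of 5.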
  25. From binary
        jd-server:Radius TestData jdavies$ hexdump -C radius.dat
        00000000  01 02 00 74 ea d5 7c 62  1f d0 f6 fe a3 bf 36 4c  |...t..|b......6L|
        00000010  35 25 e5 8c 1a 17 00 00  28 af 01 11 32 33 34 34  |5%......(...2344|
        00000020  35 37 30 36 32 37 38 38  35 33 36 01 11 32 33 34  |57062788536..234|
        00000030  31 35 39 30 36 32 35 38  38 35 33 36 1f 11 33 35  |159062588536..35|
        00000040  33 34 32 31 30 32 30 39  34 35 35 36 38 5e 0e 34  |3421020945568^.4|
        00000050  34 37 30 30 34 31 38 38  36 37 33 1a 0d 00 00 28  |47004188673....(|
        00000060  af 08 07 32 33 34 33 35  06 06 00 00 00 02 37 06  |...23435......7.|
        00000070  4e 97 57 a8                                       |N.W.|
        00000074
  26. To Spring Integration... • <filter input-channel="filter-message-channel" output-channel="process-message-channel" ref="payload" method="isAbcSet"/> • <filter input-channel="filter-message-channel" output-channel="process-message-channel" expression="payload.versionId == 5"/>
  27. So, job done? • No :-( • We had two problems. Firstly, the parsing of the binary into a bound Java object was slow: ~20k/sec (per core) – Fine for most purposes, and with an 8-core machine potentially good for 100k/sec, but we wanted better • Secondly, and we discovered this after fixing the above, the SpEL (Spring Expression Language) “queries” were too slow – We’ll come back to this in a few slides
  28. ByteBuffer • Java 1.4 added java.nio.ByteBuffer • Basically a wrapper for a byte[]
  29. Lazy parsing • Rather than parse all of the elements from the message every time, we retrieve the elements only when we need them – Similar to comparing a DOM with a SAX parser – We assume that we will only need to filter/sort/query on a limited number of fields, so we save a lot of redundant parsing • Performance goes from ~20k/sec (50 µs per message) to ~1 million/sec (1 µs per message), some 50 times faster – Even parsing the entire message is significantly faster
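A minimal sketch of the lazy approach, using the fixed RADIUS header from RFC 2865 (code, identifier and a big-endian 16-bit length in the first four bytes). The class `LazyRadiusPacket` is hypothetical and far simpler than the C24-generated classes; it only illustrates that nothing is decoded until a getter is called.

```java
import java.nio.ByteBuffer;

final class LazyRadiusPacket {
    private final ByteBuffer data;

    // Wrapping does not copy or parse anything; ByteBuffer is big-endian by default,
    // which matches network byte order.
    LazyRadiusPacket(byte[] raw) {
        this.data = ByteBuffer.wrap(raw);
    }

    // Each getter reads straight from the buffer at a fixed offset, on demand.
    // Absolute reads leave the buffer's position untouched.
    int getCode() {
        return data.get(0) & 0xFF;
    }

    int getIdentifier() {
        return data.get(1) & 0xFF;
    }

    int getLength() {
        return data.getShort(2) & 0xFFFF;
    }
}
```

Fed the first four bytes of the hexdump shown earlier in the deck (01 02 00 74), this gives code 1, identifier 2 and length 116, which matches the 0x74 bytes in the dump. A filter that only looks at the code never pays for decoding the attributes that follow.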
  30. Here’s where it gets interesting... • So we now have a Java object with a ByteBuffer holding the message data and dozens of get() methods to get the content – Message call = new Message(data); – call.getDuration(); • This works pretty fast, plus the JIT compiler kicks in after 10,000 iterations and optimises the method • However, if we use a SpEL expression it gets horribly “slow” • <filter input-channel="filter-message-channel" output-channel="process-message-channel" expression="payload.duration lt 0.1"/>
  31. Reflection • What’s happening under the hood is reflection: very powerful but sadly still rather “slow” – expression="payload.duration lt 0.1" – turns into something like... • This adds about 700 ns to each message – A few of these and we’ve more than halved the performance
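The slide's own expansion is not reproduced in this transcript. As a rough illustration only, and not a description of SpEL's actual internals, a reflective property read compared with a direct call might look like this. The `Call` class and both filter methods are hypothetical.

```java
import java.lang.reflect.Method;

final class ReflectiveAccess {
    // Hypothetical payload type with a single double-valued property.
    public static final class Call {
        private final double duration;

        public Call(double duration) {
            this.duration = duration;
        }

        public double getDuration() {
            return duration;
        }
    }

    // Direct call: the JIT can inline this down to a field read and a compare.
    static boolean directFilter(Call call) {
        return call.getDuration() < 0.1;
    }

    // Reflective equivalent of "payload.duration lt 0.1": look the getter up by
    // name, invoke it, unbox the result, then compare.
    static boolean reflectiveFilter(Object payload) {
        try {
            Method getter = payload.getClass().getMethod("getDuration");
            Object value = getter.invoke(payload); // the double comes back boxed
            return ((Number) value).doubleValue() < 0.1;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no readable 'duration' property", e);
        }
    }
}
```

Caching the `Method` object removes the lookup, but the `invoke` call and the boxing remain on every message.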
  32. The Java Compiler to the rescue! • New from Java 1.6: ToolProvider.getSystemJavaCompiler(); • We can create a generic accessor for double values... • We can now write a class on the fly that implements this method for the getter we want, e.g. getDuration()
  33. Java on the fly...
  34. Compile it...
  35. Running it... • Load up the compiled class (or byte-code) and run it • Note however that the first two lines (above) are done only once, outside of the loop • The result is “native” performance, as if it were code – Which of course it is
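The code from these slides is not reproduced in this transcript, so what follows is a minimal sketch of the same idea and not the speaker's implementation. It makes several assumptions: the "generic accessor for double values" is stood in for by the JDK's `ToDoubleFunction<ByteBuffer>`, the generated getter simply reads a double at a fixed offset, and the class file is written to a temporary directory instead of being kept purely in memory. It needs a JDK, since `ToolProvider.getSystemJavaCompiler()` returns null on a bare JRE.

```java
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.ToDoubleFunction;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

final class OnTheFlyAccessor {

    /** Holds Java source in a String so that no source file is needed on disk. */
    private static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Generates, compiles and loads an accessor that reads a double at the given offset. */
    @SuppressWarnings("unchecked")
    static ToDoubleFunction<ByteBuffer> compileDoubleGetter(int offset) {
        String className = "GeneratedAccessor" + offset; // hypothetical naming scheme
        String source =
              "import java.nio.ByteBuffer;\n"
            + "import java.util.function.ToDoubleFunction;\n"
            + "public final class " + className + " implements ToDoubleFunction<ByteBuffer> {\n"
            + "    public double applyAsDouble(ByteBuffer buf) { return buf.getDouble(" + offset + "); }\n"
            + "}\n";
        try {
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("no system Java compiler; run on a JDK");
            }
            Path outDir = Files.createTempDirectory("generated-accessors");
            List<String> options = List.of("-d", outDir.toString());
            List<JavaFileObject> units = List.of(new StringSource(className, source));
            boolean ok = compiler.getTask(null, null, null, options, null, units).call();
            if (!ok) {
                throw new IllegalStateException("compilation of " + className + " failed");
            }
            // The loader is deliberately left open in this sketch.
            URLClassLoader loader = new URLClassLoader(new URL[] { outDir.toUri().toURL() });
            Class<?> generated = loader.loadClass(className);
            return (ToDoubleFunction<ByteBuffer>) generated.getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            throw new IllegalStateException("could not build accessor", e);
        }
    }
}
```

The compile-and-load step is paid once per expression. After that, `accessor.applyAsDouble(buffer)` is an ordinary interface call that the JIT can optimise like any other code, with no reflection on the hot path.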
  36. Back to Spring... • SpEL expressions are currently interpreted • There is some basic caching – java.lang.reflect.Field and java.lang.reflect.Method objects are cached once discovered and reused on subsequent evaluations • But overall they are “slow” (for what we need) – We need to look at avoiding reflection by compiling the parsed expression into a class
  37. SpEL compiler • Currently proposed by Andy Clement (Spring) is a SpEL compiler – With the proposed changes SpEL has a mixed-mode interpreter (MMI) system in place – “Mixed” means it mixes the current interpreter with a real expression compiler • When an expression has been evaluated X number of times, SpEL compiles it to byte-code and uses that going forward – The user does nothing; evaluations just accelerate
  38. Does it make a difference? • Property access: foo.bar.boo
  39. Does it make a difference? • Method invocation: hello()
  40. Drawbacks? • Pure speed comes at the cost of flexibility • No checking = can go bang if you supply unexpected data – e.g. after 10,000 iterations change the type of parameters you are supplying to something incompatible – With great speed comes great responsibility • For cases of simple method invocation or property access on the same object types over and over, this is perfect
  41. Exceptions • Personally I like the Akka framework but we’re also using the Reactor framework • One thing to look out for in heavily threaded, high-performance code is exceptions • Throwing an exception can be very expensive, so we need to go out of our way to avoid them - where possible
  42. So “dynamic” rules at µs performance • If we can now compare and filter in under 1 µs (1 million/sec) then we can reduce the load on the “database” – In most cases the “database” is an in-memory store – GemFire • We can now use GemFire more efficiently for two reasons... – 1. We perform “triage” on the messages and only pass on the relevant messages – 2. As we’re using ByteBuffers we have no marshalling and unmarshalling of the data as we pass it into GemFire
  43. GemFire for distributing calculated values • If a user decides to analyse the top 1% of messages (by some value) then we need to calculate the 1% mark for the filter – With our compiled SpEL filter this now runs at “native” speed • GemFire can perform several roles here – Distributing the 1% value to distributed filters – Maintaining and distributing min/max/average values – Storing the filtered data for analysis
  44. Distributed CEP? • We can now calculate and maintain aggregates based on expressions over multiple data groups • We can send messages/alerts/call-backs based on incoming data or aggregates of that data • So we can, in effect, perform Complex Event Processing (CEP) in a distributed environment • In practice - it works
  45. More Spring • Read more on the tools we used... – Spring XD - http://projects.spring.io/spring-xd/ – Spring Integration - http://projects.spring.io/spring-integration/ – Spring Batch - http://projects.spring.io/spring-batch/ – Spring AMQP - http://projects.spring.io/spring-amqp/
  46. Thank you! • Twitter: @jtdavies • Google+: 104817601461885280216 (is this really a Google+ ID?) • LinkedIn: http://linkedin.com/in/jdavies/ • C24: www.c24.biz
