Introducing the WSO2 Complex Event Processor

  1. Introducing the WSO2 Complex Event Processor: Simplifying Complexities of Data Processing. S. Suhothayan, Software Engineer, Data Technologies Team.
  2. Outline
     - Introduction to CEP
     - WSO2 CEP Server
     - Siddhi Runtime
     - HA & Scalability of WSO2 CEP
     - WSO2 CEP server and WSO2 BAM
     - Use Cases
  3. Event Processing (Contd.)
     - Event processing is about listening to events and detecting patterns in near real time without storing all events.
     - Three models:
       - Simple Event Processing: simple filters (e.g. is this a gold or platinum customer?)
       - Event Stream Processing: looking across multiple event streams, joining multiple event streams, etc.
       - Complex Event Processing: processing multiple event streams to identify meaningful patterns, using complex conditions and temporal windows. E.g. there has been a more than 10% increase in overall trading activity AND the average price of commodities has fallen 2% in the last 4 hours.
  4. Complex Event Processing
     - We categorize events into different streams
     - Process with minimal storage
     - Use queries to evaluate the continuous event streams (usually in a SQL-like query language)
     - Very fast results (in the millisecond range)
  5. CEP Queries
     - The query types are:
       - Filters and projection
       - Windows: events are processed within temporal windows (e.g. for aggregation and joins). Time window vs. length window.
       - Ordering: identify event sequences and patterns (e.g. for a credit card, a new location followed by a small and then a large purchase might suggest fraud)
       - Joins: join two streams
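The difference between a length window and a time window can be sketched in a few lines of Python. This is only an illustration of the idea, not Siddhi code: the class names `LengthWindow` and `TimeWindow` and the timestamps are made up for the example.

```python
from collections import deque


class LengthWindow:
    """Keeps only the most recent `size` events (hypothetical helper)."""

    def __init__(self, size):
        self.events = deque(maxlen=size)

    def add(self, event):
        self.events.append(event)  # the oldest event drops out automatically
        return list(self.events)


class TimeWindow:
    """Keeps events newer than `duration` relative to the latest event."""

    def __init__(self, duration):
        self.duration = duration
        self.events = deque()

    def add(self, timestamp, event):
        self.events.append((timestamp, event))
        # Expire everything that is `duration` or more older than the newest event.
        while self.events[0][0] <= timestamp - self.duration:
            self.events.popleft()
        return [e for _, e in self.events]


length_win = LengthWindow(size=3)
for price in [10, 20, 30, 40]:
    in_length = length_win.add(price)

time_win = TimeWindow(duration=5)
for ts, price in [(0, 10), (2, 20), (6, 30), (7, 40)]:
    in_time = time_win.add(ts, price)
```

The length window always holds a fixed number of events, while the time window holds however many events arrived in the last `duration` time units.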
  6. Example Query
       from p=PINChangeEvents#window.time(3600) join t=TransactionEvents[amount>10000]#window.time(3600)
       on p.custid==t.custid
       return t.custid, t.amount;
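To make the semantics of the example query concrete, here is a rough Python sketch of a windowed join. It assumes events arrive in time order as `(timestamp, kind, custid, amount)` tuples; the function name, the tuple layout and the `"pin"`/`"txn"` tags are hypothetical, and a real engine would expire events incrementally rather than rebuilding lists.

```python
def join_pin_changes_with_transactions(events, window=3600, min_amount=10000):
    """Pair PIN changes with large transactions of the same customer
    when both fall inside the same time window."""
    pin_changes = []   # (timestamp, custid)
    transactions = []  # (timestamp, custid, amount), already filtered by amount
    matches = []
    for ts, kind, custid, amount in events:
        # Expire events that have left the time window.
        pin_changes = [p for p in pin_changes if p[0] > ts - window]
        transactions = [t for t in transactions if t[0] > ts - window]
        if kind == "pin":
            pin_changes.append((ts, custid))
            matches += [(c, a) for _, c, a in transactions if c == custid]
        elif kind == "txn" and amount > min_amount:
            transactions.append((ts, custid, amount))
            matches += [(custid, amount) for _, c in pin_changes if c == custid]
    return matches


events = [
    (0, "pin", "c1", None),
    (100, "txn", "c1", 20000),   # joins with the PIN change at t=0
    (200, "txn", "c2", 50000),   # no PIN change for c2
    (5000, "txn", "c1", 30000),  # PIN change at t=0 has expired
]
result = join_pin_changes_with_transactions(events)
```

Either side of the join can trigger a match, which is why both branches look up the opposite window.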
  7. Open Source CEP Runtimes
     - Siddhi
       - Apache License, a Java library, tuple-based event model
       - Supports distributed processing
       - Supports multiple query models
         - Based on a SQL-like language
         - Filters, Windows, Joins, Ordering and others
     - Esper
       - GPLv2 License, a Java library, events can be XML, Map, Object
       - Supports multiple query models
         - Based on a SQL-like language
         - Filters, Windows, Joins, Ordering and others
     - Drools Fusion
       - Apache License, a Java library
       - Support for temporal reasoning + windows
  8. WSO2 CEP Server
     - Enterprise-grade server for CEP runtimes
     - Provides support for several transports (network access) and data formats
       - SOAP/WS-Eventing: XML messages
       - REST/JSON: JSON messages
       - JMS: map messages, XML messages
       - Thrift: WSO2 data bridge format. A high-performance event capturing and delivery framework that supports Java/C/C++/C# via Thrift language bindings.
     - Supports multiple CEP runtimes
       - Siddhi: WSO2, new, very fast, distributed
       - Esper: well-known CEP runtime
       - Drools Fusion: rule based, but much slower
     - Easy to plug in new brokers and new CEP engines
  9. WSO2 CEP Server (Contd.): architecture diagram (surviving label: File System)
  10. CEP Buckets
     - A CEP bucket is a logical execution unit
     - Each CEP bucket has a set of queries, event sources, and input and output event mappings
     - Each bucket maps one-to-one to a CEP engine
  11. Management UI
     - To define buckets
     - Update running queries without resetting current execution states
     - Manage brokers (data adapters)
  12. Developer Studio UI
     - Eclipse-based tool to define buckets
     - Can manage the configurations through the production lifecycle
  13. Siddhi Complex Event Processing Engine
  14. Big Picture
     - Users provide one or more queries
     - Map event streams to queries
     - Siddhi keeps the queries running and invokes callbacks registered against one or more queries/streams
     - Example Query
         from cseEventStream[symbol == 'IBM']#win.time(50000)
         insert into IBMStockQuote symbol, avg(price) as avgPrice
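The flow described on this slide (register queries, map streams to them, get callbacks on output) can be mimicked with a toy runtime. The sketch below is plain Python and does not use or reproduce the Siddhi API; `MiniRuntime`, `make_ibm_average` and the `ts` timestamp field are invented for the illustration.

```python
from collections import deque


class MiniRuntime:
    """Toy stand-in for a CEP runtime: routes events through queries to callbacks."""

    def __init__(self):
        self.queries = {}    # input stream -> [(query function, output stream)]
        self.callbacks = {}  # stream -> [callback]

    def add_query(self, input_stream, query_fn, output_stream):
        self.queries.setdefault(input_stream, []).append((query_fn, output_stream))

    def add_callback(self, stream, callback):
        self.callbacks.setdefault(stream, []).append(callback)

    def send(self, stream, event):
        for callback in self.callbacks.get(stream, []):
            callback(event)
        for query_fn, output_stream in self.queries.get(stream, []):
            result = query_fn(event)
            if result is not None:
                self.send(output_stream, result)


def make_ibm_average(window_ms):
    """Filter on symbol == 'IBM' and average the price over a sliding time window."""
    window = deque()

    def query(event):
        if event["symbol"] != "IBM":
            return None
        window.append(event)
        while window[0]["ts"] <= event["ts"] - window_ms:
            window.popleft()
        avg = sum(e["price"] for e in window) / len(window)
        return {"symbol": "IBM", "avgPrice": avg}

    return query


runtime = MiniRuntime()
received = []
runtime.add_query("cseEventStream", make_ibm_average(50000), "IBMStockQuote")
runtime.add_callback("IBMStockQuote", received.append)
runtime.send("cseEventStream", {"ts": 0, "symbol": "IBM", "price": 100.0})
runtime.send("cseEventStream", {"ts": 10, "symbol": "MSFT", "price": 50.0})
runtime.send("cseEventStream", {"ts": 1000, "symbol": "IBM", "price": 200.0})
```

The query stays registered and keeps its window state between events, so every new IBM event produces an updated average on the output stream.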
  15. Siddhi High Level Architecture
  16. Siddhi Queries: Filters
       from <stream-name> [<conditions>]* insert into <stream-name>
     - Filters the events by conditions
     - Conditions
       - >, <, ==, <=, >=, !=
       - contains
       - and, or, not
     - Example
         from cseEventStream[price >= 20 and symbol=='IBM']
         insert into StockQuote symbol, volume
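In plain Python, the filter example above amounts to a condition plus a projection of selected attributes. The following is a minimal sketch under that reading; the function name and the dictionary-based event shape are assumptions for the example.

```python
def filter_and_project(events):
    """Keep IBM events priced at 20 or more and project symbol and volume."""
    for event in events:
        if event["price"] >= 20 and event["symbol"] == "IBM":
            yield {"symbol": event["symbol"], "volume": event["volume"]}


cse_event_stream = [
    {"symbol": "IBM", "price": 25.0, "volume": 100},
    {"symbol": "IBM", "price": 15.0, "volume": 200},   # fails the price condition
    {"symbol": "MSFT", "price": 30.0, "volume": 300},  # fails the symbol condition
]
stock_quotes = list(filter_and_project(cse_event_stream))
```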
  17. Window
       from <stream-name> [<conditions>]#window.<window-name>(<parameters>)
       insert [<output-type>] into <stream-name>
     - Types of windows
       - (Time | Length) (Sliding | Batch) windows
       - Unique window, First unique (not supported in 1.0)
     - Types of aggregate functions
       - sum, avg, max, min
     - Example
         from cseEventStream[price >= 20]#window.lengthBatch(50)
         insert expired-events into StockQuote symbol, avg(price) as avgPrice
         group by symbol having avgPrice>50
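A length batch window collects a fixed number of events and then processes them together. The Python sketch below illustrates batching, grouping and the `having` condition from the example; it deliberately ignores the `expired-events` output type, and the function name and parameters are hypothetical.

```python
from collections import defaultdict


def length_batch_average(events, batch_size, min_price=20, min_avg=50):
    """Average price per symbol for every full batch of filtered events."""
    batch, results = [], []
    for event in events:
        if event["price"] < min_price:
            continue  # filter: price >= 20
        batch.append(event)
        if len(batch) == batch_size:
            prices = defaultdict(list)
            for e in batch:
                prices[e["symbol"]].append(e["price"])  # group by symbol
            for symbol, values in prices.items():
                avg = sum(values) / len(values)
                if avg > min_avg:  # having avgPrice > 50
                    results.append({"symbol": symbol, "avgPrice": avg})
            batch = []  # start the next batch
    return results


events = [
    {"symbol": "IBM", "price": 60.0},
    {"symbol": "IBM", "price": 80.0},
    {"symbol": "MSFT", "price": 30.0},
    {"symbol": "IBM", "price": 10.0},   # dropped by the filter
    {"symbol": "MSFT", "price": 40.0},  # completes a batch of 4
]
averages = length_batch_average(events, batch_size=4)
```

A batch size of 4 is used here instead of 50 only to keep the example short.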
  18. Join
       from <stream>#<window> [unidirectional] join <stream>#<window>
       on <condition> within <time>
       insert into <stream>
     - Joins two streams based on a condition and a window
     - A join can take multiple forms: ((left|right|full outer) | inner) join. Only inner is supported in 1.0.
     - Unidirectional: only events arriving on the unidirectional stream trigger the join
     - Example
         from TickEvent[symbol=='IBM']#win.length(2000) join NewsEvent#win.time(500)
         insert into JoinStream *
  19. Pattern
       from [every] <condition> -> [every] <condition> … <condition> within <time>
       insert into StockQuote (<attribute-name>* | * )
     - Checks whether condition A happens before/after condition B
     - Can do iterative checks via the "every" keyword
     - With "within <time>", Siddhi emits only events that are within that time of each other
     - Example
         from every (a1 = purchase[price < 10]) -> a2 = purchase[price > 10000 and a1.cardNo==a2.cardNo]
         within 300000
         insert into potentialFraud a2.cardNo as cardNo, a2.price as price, as place
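The fraud pattern above can be approximated in Python as a list of pending small purchases that wait for a large purchase on the same card. This is a sketch of one plausible reading of the pattern, not Siddhi's implementation; the function name and the `(timestamp, cardNo, price)` tuple layout are assumptions, and the `place` attribute is left out.

```python
def detect_potential_fraud(purchases, within=300000):
    """Alert when a purchase under 10 is followed, within `within` time units,
    by a purchase over 10000 on the same card."""
    pending = []  # (timestamp, cardNo) of small purchases awaiting a match
    alerts = []
    for ts, card_no, price in purchases:
        # Drop small purchases that are too old to match anymore.
        pending = [(t, c) for t, c in pending if ts - t <= within]
        if price > 10000:
            # "every": each pending small purchase on this card yields one alert.
            for _ in [p for p in pending if p[1] == card_no]:
                alerts.append({"cardNo": card_no, "price": price})
            pending = [(t, c) for t, c in pending if c != card_no]
        elif price < 10:
            pending.append((ts, card_no))
    return alerts


purchases = [
    (0, "A", 5),
    (1000, "B", 5),
    (2000, "A", 20000),    # matches the small purchase on card A
    (400000, "B", 30000),  # too late for the small purchase on card B
]
alerts = detect_potential_fraud(purchases)
```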
  20. Sequence
       from <event-regular-expression> within <time> insert into <stream>
     - Regular expressions supported
       - * : zero or more matches (reluctant)
       - + : one or more matches (reluctant)
       - ? : zero or one match (reluctant)
       - or : either event
     - Events matched by * and + have to be referred to with square brackets to access a specific occurrence of that event
     - Example
         from a1 = requestOrder[action == "buy"],
              b1 = cseEventStream[price > a1.price and symbol==a1.symbol]+,
              b2 = cseEventStream[price < b1.price]
         insert into purchaseOrder a1.symbol as symbol, b1[0].price as firstPrice, b2.price as orderPrice
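A sequence query behaves like a small state machine over consecutive events. The Python sketch below is one possible interpretation of the example: it assumes that `b1.price` in the last condition refers to the most recent `b1` event and that any non-matching event resets the match. The function name and event dictionaries are hypothetical.

```python
def match_sequence(events):
    """Buy order, then one or more rising quotes, then a quote below the last rise."""
    orders = []
    a1, b1 = None, []
    for event in events:
        is_quote = event["stream"] == "cseEventStream"
        if a1 is not None and is_quote and b1 and event["price"] < b1[-1]["price"]:
            # b2 matched: emit the order and start over.
            orders.append({
                "symbol": a1["symbol"],
                "firstPrice": b1[0]["price"],  # b1[0] is the first '+' occurrence
                "orderPrice": event["price"],
            })
            a1, b1 = None, []
        elif (a1 is not None and is_quote
              and event["price"] > a1["price"] and event["symbol"] == a1["symbol"]):
            b1.append(event)  # another occurrence of b1
        elif event["stream"] == "requestOrder" and event.get("action") == "buy":
            a1, b1 = event, []  # (re)start the sequence
        else:
            a1, b1 = None, []  # the sequence is broken
    return orders


events = [
    {"stream": "requestOrder", "symbol": "IBM", "price": 100.0, "action": "buy"},
    {"stream": "cseEventStream", "symbol": "IBM", "price": 105.0},
    {"stream": "cseEventStream", "symbol": "IBM", "price": 110.0},
    {"stream": "cseEventStream", "symbol": "IBM", "price": 108.0},
]
purchase_orders = match_sequence(events)
```

The `b2` condition is checked before extending `b1`, which mirrors the reluctant matching of `+` described on the slide.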
  21. Performance Results
     - We compared Siddhi with Esper, the widely used open source CEP engine
     - For the evaluation, we set up different queries on both systems, pushed events into each system, and measured the time until all of them were processed
     - We used an Intel(R) Xeon(R) X3440 @2.53GHz, 4 cores, 8M cache, 8GB RAM, running Debian with the 2.6.32-5-amd64 kernel
  22. Performance Comparison With Esper: simple filter without window
       from StockTick[prize >6] return symbol, prize
  23. Performance Comparison With Esper: state machine query for pattern matching
       from f=FraudWarningEvent -> p=PINChangeEvent(accountNumber=f.accountNumber)
       return accountNumber;
  24. Siddhi Features
     - Supports state persistence
       - Enables queries to span lifetimes much greater than server uptime
       - Takes periodic snapshots and stores all state information and windows in a scalable persistence store (Apache Cassandra)
       - Pluggable persistence stores
     - Supports highly available deployment
       - Uses the Hazelcast distributed cache as a shared working memory
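The snapshot idea can be illustrated with a window that serializes its state and is later rebuilt from it. In this Python sketch a plain dictionary stands in for the persistence store (the slide names Apache Cassandra); the class name, the JSON format and the `"query-1"` key are assumptions made for the example.

```python
import json
from collections import deque


class PersistableLengthWindow:
    """Length window whose state can be snapshotted and restored."""

    def __init__(self, size):
        self.size = size
        self.events = deque(maxlen=size)

    def add(self, price):
        self.events.append(price)

    def average(self):
        return sum(self.events) / len(self.events)

    def snapshot(self):
        # Serialize everything needed to rebuild the window later.
        return json.dumps({"size": self.size, "events": list(self.events)})

    @classmethod
    def restore(cls, blob):
        state = json.loads(blob)
        window = cls(state["size"])
        window.events.extend(state["events"])
        return window


store = {}  # stand-in for an external persistence store
window = PersistableLengthWindow(size=3)
for price in [10.0, 20.0, 30.0]:
    window.add(price)
store["query-1"] = window.snapshot()  # periodic snapshot

# After a restart, processing continues from the saved state.
recovered = PersistableLengthWindow.restore(store["query-1"])
recovered.add(40.0)
```

Because the window contents survive the restart, the next event is averaged together with events seen before the snapshot.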
  25. HA / Persistence
     - This is the ability to recover the runtime state in case of a failure
     - The CEP server can support it if the CEP engine supports persistence (OK with Siddhi and Esper)
  26. Scaling
     - A CEP pipeline can be distributed, but queries such as windows, patterns, and joins are hard to distribute
     - WSO2 CEP with Siddhi uses a distributed cache (Hazelcast) as shared memory and a selective processing approach to achieve massive scalability in distributed processing
  27. Event Recording
     - Ability to record all or some of the events for future processing
     - A few options:
       - Publish them to a Cassandra cluster using the WSO2 data bridge API or BAM (data in Cassandra can be processed with Hadoop using WSO2 BAM)
       - Write them to the distributed cache
       - Custom Thrift-based event recorder
  28. WSO2 BAM
  29. CEP Role within WSO2 Platform
  30. DEMO
  31. Scenario
     - Monitoring a stock exchange for game-changing moments
     - Two input event streams:
       - Event stream of stock quotes from a stock exchange
       - Event stream of word counts of various company names from Twitter pages
     - Check whether the last traded price of the stock has changed significantly (by 2%) within the last minute, and whether people are tweeting about that company (> 10) within the last minute
  32. Example Scenario: diagram (surviving labels: JMS Event Publisher, JMS Event Receiver)
  33. Input events
     - Input events are JMS Maps
       - Stock Exchange Stream
           Map<String, Object> map1 = new HashMap<String, Object>();
           map1.put("symbol", "MSFT");
           map1.put("price", 26.36);
           publisher.publish("AllStockQuotes", map1);
       - Twitter Stream
           Map<String, Object> map1 = new HashMap<String, Object>();
           map1.put("company", "MSFT");
           map1.put("wordCount", 8);
           publisher.publish("TwitterFeed", map1);
  34. Queries
  35. Queries
       from allStockQuotes[win.time(60000)]
       insert into fastMovingStockQuotes symbol, price, avg(price) as averagePrice
       group by symbol
       having ((price > averagePrice*1.02) or (averagePrice*0.98 > price))

       from twitterFeed[win.time(60000)]
       insert into highFrequentTweets company as company, sum(wordCount) as words
       group by company
       having (words > 10)

       from fastMovingStockQuotes[win.time(60000)] as fastMovingStockQuotes
       join highFrequentTweets[win.time(60000)] as highFrequentTweets
       on
       insert into predictedStockQuotes fastMovingStockQuotes.symbol as company, fastMovingStockQuotes.averagePrice as amount, highFrequentTweets.words as words
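The combined effect of the three demo queries can be approximated in Python for a single one-minute snapshot of both streams. This sketch assumes the two streams are joined on the company symbol, which matches the scenario description but is not spelled out in the join query above; the function name and the tuple-based inputs are hypothetical.

```python
from collections import defaultdict


def predicted_stock_quotes(stock_window, tweet_window):
    """stock_window: (symbol, price) pairs from the last minute, oldest first.
    tweet_window: (company, wordCount) pairs from the last minute."""
    prices = defaultdict(list)
    for symbol, price in stock_window:
        prices[symbol].append(price)

    words = defaultdict(int)
    for company, count in tweet_window:
        words[company] += count

    results = []
    for symbol, values in prices.items():
        average = sum(values) / len(values)
        last = values[-1]
        # Price moved more than 2% away from the one-minute average.
        fast_moving = last > average * 1.02 or average * 0.98 > last
        if fast_moving and words[symbol] > 10:
            results.append({"company": symbol, "amount": average, "words": words[symbol]})
    return results


stock_window = [("MSFT", 26.0), ("MSFT", 26.0), ("MSFT", 28.0), ("IBM", 100.0), ("IBM", 100.5)]
tweet_window = [("MSFT", 8), ("MSFT", 5), ("IBM", 3)]
predictions = predicted_stock_quotes(stock_window, tweet_window)
```

Only MSFT qualifies in this sample: its last price is more than 2% above its average and its word count exceeds 10.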
  36. Alert
     - As XML
         <quotedata:StockQuoteDataEvent xmlns:xsi="" xmlns:xsd="" xmlns:quotedata="">
           <quotedata:StockSymbol>{company}</quotedata:StockSymbol>
           <quotedata:LastTradeAmount>{amount}</quotedata:LastTradeAmount>
           <quotedata:WordCount>{words}</quotedata:WordCount>
         </quotedata:StockQuoteDataEvent>
  37. Useful links
     - WSO2 CEP 2.0.0 Milestone 2
     - Distributed Processing Sample With Siddhi CEP and ActiveMQ JMS Broker
  38. Questions?
  39. Thank you.