WSO2 Complex Event Processor

At WSO2Con 2014 Europe

Published in: Data & Analytics, Technology

  1. 1. WSO2 Complex Event Processor Sriskandarajah Suhothayan (Suho) Srinath Perera WSO2 Inc.
  2. 2. Outline • BigData • Complex Event Processing • Basic Constructs of Query Language • CEP Solution Patterns • Scale, HA and Performance • Demo
  3. 3. Why is Big Data hard? • How to store? Assuming 1 TB per computer, it takes 1,000 computers to store 1 PB • How to move? Assuming a 10 Gb network, it takes 2 hours to copy 1 TB, or 83 days to copy 1 PB • How to search? Assuming each record is 1 KB and one machine can process 1,000 records per second, it needs about 278 CPU hours to process 1 TB and about 32 CPU years to process 1 PB • How to process? – How to convert algorithms to work at large scale – How to create new algorithms http://www.susanica.com/photo/9
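The storage, search, and transfer estimates above are simple divisions, and a short sketch makes them easy to re-run with other assumptions. The function names are hypothetical, decimal units (1 TB = 10^12 bytes) are assumed, and the effective throughput of about 1.1 Gb/s used below is only what a 2-hours-per-TB copy would imply, not a figure stated on the slide.

```python
# Back-of-envelope estimates under the slide's stated assumptions (decimal units).
TB = 10**12  # bytes
PB = 10**15  # bytes

def computers_needed(total_bytes, bytes_per_computer=TB):
    """Machines required if each one stores bytes_per_computer."""
    return -(-total_bytes // bytes_per_computer)  # ceiling division

def scan_seconds(total_bytes, record_bytes=1000, records_per_sec=1000):
    """Single-machine time to process every record once."""
    return total_bytes / record_bytes / records_per_sec

def transfer_seconds(total_bytes, effective_bits_per_sec):
    """Time to copy the data at a given effective throughput."""
    return total_bytes * 8 / effective_bits_per_sec

print(computers_needed(PB))                        # machines for 1 PB
print(scan_seconds(TB) / 3600)                     # CPU hours for 1 TB
print(scan_seconds(PB) / (3600 * 24 * 365))        # CPU years for 1 PB
print(transfer_seconds(TB, 1.1e9) / 3600)          # hours per TB at ~1.1 Gb/s (assumed)
```

Under these assumptions a full scan of 1 TB costs roughly 278 CPU hours and 1 PB roughly 32 CPU years on one machine, which is the point of the slide: single-machine processing does not scale.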
  4. 4. Why is it hard (Contd.)? • Systems are built of many computers • That handle lots of data • Running complex logic • This pushes us to the frontier of Distributed Systems and Databases • More data does not mean there is a simple model • Some models can be as complex as the system http://www.flickr.com/photos/mariachily/5250487136, Licensed CC
  5. 5. Big data Processing Technologies Landscape
  6. 6. WSO2 Bigdata Offerings
  7. 7. CEP Is & Is NOT! • Is NOT! – Simple filters (Simple Event Processing), e.g. Is this a gold or platinum customer? – Joining multiple event streams (Event Stream Processing) • Is! – Processing multiple event streams – Identifying meaningful patterns among streams – Using temporal windows – E.g. Notify if there is a 10% increase in overall trading activity AND the average price of commodities has fallen 2% in the last 4 hours
  8. 8. What is ?
  9. 9. Query Functions of CEP • Filter • Transformation • Window + { Aggregation, group by } • Join • Event Sequence • Event Table
  10. 10. CEP Architecture
  11. 11. Event Streams • An event stream is a sequence of events • Event streams are defined by Stream Definitions • Event streams have inflows and outflows • Inflows can be from – Event builders, which convert incoming XML, JSON, etc. events to an event stream – Execution plans • Outflows are to – Event formatters, which convert an event stream to XML, JSON, etc. events – Execution plans
  12. 12. Stream Definition { 'name':'phone.retail.shop', 'version':'1.0.0', 'nickName': 'Phone_Retail_Shop', 'description': 'Phone Sales', 'metaData':[ {'name':'clientType','type':'STRING'} ], 'correlationData':[ {'name':'transactionID','type':'STRING'} ], 'payloadData':[ {'name':'brand','type':'STRING'}, {'name':'quantity','type':'INT'}, {'name':'total','type':'INT'}, {'name':'user','type':'STRING'} ] }
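To illustrate what a stream definition buys you, here is a minimal sketch that checks an incoming payload against the `payloadData` attributes above. It is not how the CEP server validates events: the `PY_TYPES` mapping and the `payload_errors` helper are hypothetical, and the definition is rewritten as strict JSON with double quotes so the standard parser accepts it.

```python
import json

# The slide's definition rewritten as strict JSON (double quotes); only the
# payloadData section is kept for brevity.
STREAM_DEF = json.loads("""
{
  "name": "phone.retail.shop",
  "version": "1.0.0",
  "payloadData": [
    {"name": "brand", "type": "STRING"},
    {"name": "quantity", "type": "INT"},
    {"name": "total", "type": "INT"},
    {"name": "user", "type": "STRING"}
  ]
}
""")

# Hypothetical mapping from stream attribute types to Python types.
PY_TYPES = {"STRING": str, "INT": int}

def payload_errors(definition, payload):
    """Return a list of problems found in payload; empty means it conforms."""
    errors = []
    for attr in definition["payloadData"]:
        name, type_name = attr["name"], attr["type"]
        if name not in payload:
            errors.append(f"{name}: missing")
            continue
        value = payload[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, PY_TYPES[type_name]):
            errors.append(f"{name}: expected {type_name}")
    return errors

print(payload_errors(STREAM_DEF, {"brand": "Nokia", "quantity": "2", "total": 300}))
```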
  13. 13. Event Format • Standard event formats are available for • XML • JSON • Text • Map • WSO2 Event • If events adhere to the standard format, they do not need data mapping. • If events do not adhere to it, custom event mapping should be configured in the Event Builder and Event Formatter as appropriate.
  14. 14. Event Format Standard XML event format <events> <event> <metaData> <tenant_id>2</tenant_id> </metaData> <correlationData> <activity_id>ID5</activity_id> </correlationData> <payloadData> <clientPhoneNo>0771117673</clientPhoneNo> <clientName>Mohanadarshan</clientName> <clientResidenceAddress>15, Alexendra road, California</clientResidenceAddress> <clientAccountNo>ACT5673</clientAccountNo> </payloadData> </event> </events>
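As a rough illustration of what an event builder does with the standard XML format, the sketch below turns each `<event>` into a nested dictionary keyed by section (`metaData`, `correlationData`, `payloadData`). The `parse_events` helper is hypothetical, the sample is a shortened copy of the slide's event, and all values are kept as strings rather than converted to stream attribute types.

```python
import xml.etree.ElementTree as ET

XML_EVENT = """<events><event>
  <metaData><tenant_id>2</tenant_id></metaData>
  <correlationData><activity_id>ID5</activity_id></correlationData>
  <payloadData>
    <clientPhoneNo>0771117673</clientPhoneNo>
    <clientName>Mohanadarshan</clientName>
  </payloadData>
</event></events>"""

def parse_events(xml_text):
    """Convert standard-format XML into a list of {section: {field: text}} dicts."""
    events = []
    for event in ET.fromstring(xml_text).findall("event"):
        events.append(
            {section.tag: {field.tag: field.text for field in section}
             for section in event}
        )
    return events

for parsed in parse_events(XML_EVENT):
    print(parsed["payloadData"]["clientName"])
```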
  15. 15. CEP Execution Plan ● Is an isolated logical execution unit ● Each execution plan imports some of the event streams available in CEP and defines the execution logic using queries and exports the results as output event streams. ● Has one-to-one relationship with CEP Backend Runtime. ● Has many-to-many relationship with Event Streams. ● Each execution plan spawns a Siddhi Engine Instance.
  16. 16. CEP Solution patterns 1. Transformation - project, translate, enrich, split 2. Filter 3. Composition / Aggregation / Analytics ● basic stats, group by, moving averages 4. Join multiple streams 5. Detect patterns ● Coordinating events over time ● Trends - increasing, decreasing, stable, non-increasing, non-decreasing, mixed 6. Blacklisting 7. Building a profile
  17. 17. Siddhi Query Structure define stream <event stream> (<attribute> <type>,<attribute> <type>, ...); from <event stream> select <attribute>,<attribute>, ... insert into <event stream> ;
  18. 18. Siddhi Query : Projection define stream TempStream (deviceID long, roomNo int, temp double); from TempStream select roomNo, temp insert into OutputStream ;
  19. 19. Siddhi Query : Inferred Streams from TempStream select roomNo, temp insert into OutputStream ; define stream OutputStream (roomNo int, temp double);
  20. 20. Siddhi Query : Enrich from TempStream select roomNo, temp, 'C' as scale insert into OutputStream ; define stream OutputStream (roomNo int, temp double, scale string);
  21. 21. Siddhi Query : Enrich from TempStream select deviceID, roomNo, avg(temp) as avgTemp insert into OutputStream ;
  22. 22. Siddhi Query : Transformation from TempStream select concat(deviceID, '-', roomNo) as uid, toFahrenheit(temp) as tempInF, 'F' as scale insert into OutputStream ;
  23. 23. Siddhi Query : Split from TempStream select roomNo, temp insert into RoomTempStream ; from TempStream select deviceID, temp insert into DeviceTempStream ;
  24. 24. Siddhi Query : Filter from TempStream [temp > 30.0 and roomNo != 2043] select roomNo, temp insert into HotRoomsStream ;
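For readers unfamiliar with the bracket syntax, the filter query behaves like a predicate applied to each event followed by a projection. A minimal Python equivalent, with events assumed to be plain dictionaries and `hot_rooms` a hypothetical name:

```python
def hot_rooms(temp_stream):
    """Keep events with temp > 30.0 outside room 2043, projecting roomNo and temp."""
    for event in temp_stream:
        if event["temp"] > 30.0 and event["roomNo"] != 2043:
            yield {"roomNo": event["roomNo"], "temp": event["temp"]}

events = [
    {"deviceID": 1, "roomNo": 10, "temp": 35.0},
    {"deviceID": 2, "roomNo": 2043, "temp": 40.0},  # excluded by roomNo
    {"deviceID": 3, "roomNo": 11, "temp": 25.0},    # excluded by temp
]
print(list(hot_rooms(events)))
```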
  25. 25. Siddhi Query : Window from TempStream select roomNo, avg(temp) as avgTemp insert into HotRoomsStream ;
  26. 26. Siddhi Query : Window from TempStream#window.time(1 min) select roomNo, avg(temp) as avgTemp insert into HotRoomsStream ;
  27. 27. Siddhi Query : Window from TempStream#window.time(1 min) select roomNo, avg(temp) as avgTemp group by roomNo insert into HotRoomsStream ;
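The sliding time window with `group by` can be pictured as follows: every arriving event first evicts events older than one minute, then the average is recomputed over the remaining events of the same room. This is a simplified sketch, not Siddhi's implementation; the `SlidingAverage` class is hypothetical, timestamps are assumed to be in milliseconds and non-decreasing, and only output for newly arrived events is modeled.

```python
from collections import deque

class SlidingAverage:
    """Per-room average of temp over a sliding time window."""

    def __init__(self, window_ms=60_000):
        self.window_ms = window_ms
        self.events = deque()  # (timestamp_ms, roomNo, temp), oldest first

    def on_event(self, ts_ms, room_no, temp):
        self.events.append((ts_ms, room_no, temp))
        # Evict events that have fallen out of the window.
        while self.events[0][0] <= ts_ms - self.window_ms:
            self.events.popleft()
        temps = [t for _, r, t in self.events if r == room_no]
        return {"roomNo": room_no, "avgTemp": sum(temps) / len(temps)}

window = SlidingAverage()
print(window.on_event(0, 1, 20.0))
print(window.on_event(30_000, 1, 30.0))
print(window.on_event(61_000, 1, 40.0))  # the first reading has expired by now
```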
  28. 28. Siddhi Query : Batch Window from TempStream#window.timeBatch(5 min) select roomNo, avg(temp) as avgTemp group by roomNo insert into HotRoomsStream ;
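A batch window differs from a sliding one in that events are collected into non-overlapping chunks and results are emitted once per chunk. The sketch below assigns events to fixed five-minute buckets and averages per room. It is a simplification under stated assumptions: buckets are aligned to multiples of the batch length, which may differ from how the engine chooses batch boundaries, and `time_batch_averages` is a hypothetical helper.

```python
from collections import defaultdict

def time_batch_averages(events, batch_ms=300_000):
    """events: iterable of (ts_ms, roomNo, temp).

    Returns [(batch_index, {roomNo: avgTemp}), ...] in batch order.
    """
    batches = defaultdict(lambda: defaultdict(list))
    for ts_ms, room_no, temp in events:
        batches[ts_ms // batch_ms][room_no].append(temp)
    return [
        (index, {room: sum(temps) / len(temps) for room, temps in rooms.items()})
        for index, rooms in sorted(batches.items())
    ]

readings = [(0, 1, 20.0), (1_000, 1, 30.0), (2_000, 2, 10.0), (300_000, 1, 50.0)]
print(time_batch_averages(readings))
```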
  29. 29. Siddhi Query : Join define stream TempStream (deviceID long, roomNo int, temp double); define stream RegulatorStream (deviceID long, roomNo int, isOn bool);
  30. 30. Siddhi Query : Join define stream TempStream (deviceID long, roomNo int, temp double); define stream RegulatorStream (deviceID long, roomNo int, isOn bool); from TempStream[temp > 30.0]#window.time(1 min) as T join RegulatorStream[isOn == false]#window.length(1) as R on T.roomNo == R.roomNo select T.roomNo, R.deviceID, 'start' as action insert into RegulatorActionStream ;
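One way to read the join: hot temperature readings are held for one minute, only the single most recent "regulator is off" event is held, and an arrival on either side is matched against what the other side currently holds. The sketch below models that reading. The `RegulatorJoin` class is hypothetical, timestamps are assumed to be non-decreasing milliseconds, and the engine's exact trigger and expiry behavior may differ.

```python
from collections import deque

class RegulatorJoin:
    def __init__(self, window_ms=60_000):
        self.window_ms = window_ms
        self.hot = deque()     # (ts_ms, roomNo) for readings with temp > 30.0
        self.last_off = None   # most recent regulator event with isOn == False

    def _expire(self, now_ms):
        while self.hot and self.hot[0][0] <= now_ms - self.window_ms:
            self.hot.popleft()

    def on_temp(self, ts_ms, room_no, temp):
        self._expire(ts_ms)
        if temp <= 30.0:
            return []
        self.hot.append((ts_ms, room_no))
        off = self.last_off
        if off and off["roomNo"] == room_no:
            return [{"roomNo": room_no, "deviceID": off["deviceID"], "action": "start"}]
        return []

    def on_regulator(self, ts_ms, device_id, room_no, is_on):
        self._expire(ts_ms)
        if is_on:
            return []
        self.last_off = {"deviceID": device_id, "roomNo": room_no}
        return [
            {"roomNo": room_no, "deviceID": device_id, "action": "start"}
            for _, hot_room in self.hot
            if hot_room == room_no
        ]

join = RegulatorJoin()
join.on_temp(0, 5, 35.0)
print(join.on_regulator(1_000, 77, 5, False))
```

Because the regulator side keeps a single event across all rooms, an "off" event for another room replaces the previous one; partitioning by room would be needed to track every regulator independently.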
  31. 31. Siddhi Query : Detect Trend from t1=TempStream, t2=TempStream [t1.temp < t2.temp and t1.deviceID == t2.deviceID]+ within 5 min select t1.temp as initialTemp, t2.temp as finalTemp, t1.deviceID, t1.roomNo insert into IncreaingHotRoomsStream ;
  32. 32. Siddhi Query : Partition define partition Device by TempStream.deviceID ; define partition Temp by range TempStream.temp <= 0 as 'ICE', range TempStream.temp > 0 and TempStream.temp < 100 as 'WATER', range TempStream.temp > 100 as 'VAPOUR' ;
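Conceptually, a partition routes each event to an independent piece of query state chosen by a key, either an attribute value or a named range. A small sketch of both styles follows. The helper names are hypothetical, and since the ranges on the slide do not cover a temperature of exactly 100, the sketch assumes that value belongs to 'VAPOUR'.

```python
from collections import defaultdict

def temp_partition(temp):
    """Map a temperature to the range label used as partition key."""
    if temp <= 0:
        return "ICE"
    if temp < 100:
        return "WATER"
    return "VAPOUR"  # assumption: exactly 100 is treated as VAPOUR

def partition_events(events, key_fn):
    """Group events by partition key; each group would get its own query state."""
    partitions = defaultdict(list)
    for event in events:
        partitions[key_fn(event)].append(event)
    return dict(partitions)

events = [
    {"deviceID": 1, "temp": -5.0},
    {"deviceID": 2, "temp": 50.0},
    {"deviceID": 1, "temp": 120.0},
]
print(partition_events(events, lambda e: e["deviceID"]))
print(partition_events(events, lambda e: temp_partition(e["temp"])))
```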
  33. 33. Siddhi Query : Detect Trend per Partition define partition Device by TempStream.deviceID ; from t1=TempStream, t2=TempStream [t1.temp < t2.temp and t1.deviceID == t2.deviceID]+ within 5 min select t1.temp as initialTemp, t2.temp as finalTemp, t1.deviceID, t1.roomNo insert into IncreaingHotRoomsStream partition by Device ;
  34. 34. Siddhi Query : Detect Pattern define stream Purchase (price double, cardNo long,place string); from every (a1 = Purchase[price < 10] -> a3= ..) -> a2 = Purchase[price >10000 and a1.cardNo == a2.cardNo] within 1 day select a1.cardNo as cardNo, a2.price as price, a2.place as place insert into PotentialFraud ;
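The fraud pattern can be approximated outside the engine as "a small purchase followed, within one day, by a very large purchase on the same card". The sketch below implements only that reduced pattern: it ignores the elided intermediate step (`a3`) in the query above, assumes purchases arrive sorted by timestamp in milliseconds, and uses a hypothetical `potential_fraud` helper. Each pending small purchase produces at most one alert, mirroring the idea that `every` starts a new match per qualifying first event.

```python
DAY_MS = 24 * 60 * 60 * 1000

def potential_fraud(purchases, within_ms=DAY_MS):
    """purchases: iterable of (ts_ms, price, cardNo, place), sorted by ts_ms."""
    pending = {}  # cardNo -> timestamps of small purchases awaiting a match
    alerts = []
    for ts_ms, price, card_no, place in purchases:
        if price < 10:
            pending.setdefault(card_no, []).append(ts_ms)
        elif price > 10000:
            # Consume every pending small purchase; only unexpired ones match.
            live = [t for t in pending.pop(card_no, []) if ts_ms - t <= within_ms]
            alerts.extend(
                {"cardNo": card_no, "price": price, "place": place} for _ in live
            )
    return alerts

print(potential_fraud([(0, 5.0, 111, "A"), (1_000, 20000.0, 111, "B")]))
```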
  35. 35. Siddhi Query : Define Event Table define table CardUserTable (name string, cardNum long) ; define table CardUserTable (name string, cardNum long) from ('datasource.name'='CardDataSource', 'table.name'='UserTable', 'caching.algorithm'='LRU') ; Cache types supported ● Basic: A size-based algorithm based on FIFO. ● LRU (Least Recently Used): The least recently used event is dropped when cache is full. ● LFU (Least Frequently Used): The least frequently used event is dropped when cache is full.
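To make the LRU option concrete, here is a minimal least-recently-used cache built on `collections.OrderedDict`. It only illustrates the eviction policy described above; the `LRUCache` class is hypothetical and says nothing about how the event table cache is implemented internally.

```python
from collections import OrderedDict

class LRUCache:
    """Drops the least recently used row when capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.rows = OrderedDict()  # key -> row, least recently used first

    def get(self, key):
        if key not in self.rows:
            return None
        self.rows.move_to_end(key)  # mark as most recently used
        return self.rows[key]

    def put(self, key, row):
        self.rows[key] = row
        self.rows.move_to_end(key)
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)  # evict the least recently used row

cache = LRUCache(capacity=2)
cache.put(1001, "Alice")
cache.put(1002, "Bob")
cache.get(1001)            # touch 1001, so 1002 becomes least recently used
cache.put(1003, "Carol")   # evicts 1002
print(cache.get(1002))
```

A FIFO ("Basic") cache would differ only in skipping the `move_to_end` call on reads, so the oldest inserted row is always the one dropped.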
  36. 36. Siddhi Query : Query Event Table define stream Purchase (price double, cardNo long, place string); define table CardUserTable (name string, cardNum long) ; from Purchase#window.length(1) join CardUserTable on Purchase.cardNo == CardUserTable.cardNum select Purchase.cardNo as cardNo, CardUserTable.name as name, Purchase.price as price insert into PurchaseUserStream ;
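Joining a stream with an event table amounts to a lookup per event: each purchase is enriched with the matching table row, and purchases without a match produce no output. A minimal sketch, assuming `cardNum` is unique in the table and using a hypothetical `enrich_purchases` helper:

```python
def enrich_purchases(purchases, card_user_table):
    """Inner-join each purchase with the table row that has the same card number."""
    names = {row["cardNum"]: row["name"] for row in card_user_table}
    for purchase in purchases:
        if purchase["cardNo"] in names:
            yield {
                "cardNo": purchase["cardNo"],
                "name": names[purchase["cardNo"]],
                "price": purchase["price"],
            }

table = [{"name": "Alice", "cardNum": 1001}]
stream = [
    {"price": 12.5, "cardNo": 1001, "place": "Colombo"},
    {"price": 99.0, "cardNo": 2002, "place": "London"},  # no table row, dropped
]
print(list(enrich_purchases(stream, table)))
```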
  37. 37. Siddhi Query : Insert into Event Table define stream FraudStream (price double, cardNo long, userName string); define table BlacklistedUserTable (name string, cardNum long) ; from FraudStream select userName as name, cardNo as cardNum insert into BlacklistedUserTable ;
  38. 38. Siddhi Query : Update into Event Table define stream LoginStream (userID string, islogin bool, loginTime long); define table LastLoginTable (userID string, time long) ; from LoginStream select userID, loginTime as time update LastLoginTable on LoginStream.userID == LastLoginTable.userID ;
  39. 39. Siddhi Extensions ● Function extension ● Aggregator extension ● Window extension ● Transform extension
  40. 40. Siddhi Query : Function Extension from TempStream select deviceID, roomNo, custom:toKelvin(temp) as tempInKelvin, 'K' as scale insert into OutputStream ;
  41. 41. Siddhi Query : Aggregator Extension from TempStream select deviceID, roomNo, temp, custom:stdev(temp) as stdevTemp, 'C' as scale insert into OutputStream ;
  42. 42. Siddhi Query : Window Extension from TempStream #window.custom:lastUnique(roomNo,2 min) select * insert into OutputStream ;
  43. 43. Siddhi Query : Transform Extension from XYZSpeedStream #transform.custom:getVelocityVector(v,vx,vy,vz) select velocity, direction insert into SpeedStream ;
  44. 44. CEP Event Adaptors ● For receiving and publishing events ● Has the configurations to connect to external endpoints ● Has many-to-one relationship with Event Streams
  45. 45. CEP Event Adaptors Support for several transports (network access) ● SOAP ● HTTP ● JMS ● SMTP ● SMS ● Thrift ● Kafka Supported data formats ● XML ● JSON ● Map ● Text ● WSO2Event - WSO2 data format over Thrift for high-performance event transfer, supporting Java/C/C++/C# via Thrift language bindings
  46. 46. CEP Event Adaptors Supports database writes using Map messages ● Cassandra ● MySQL ● H2 Supports custom event adaptors via its pluggable architecture!
  47. 47. Monitoring & Debugging : Event Flow ● Visualization of the Event Stream flow in CEP ● Helps to get the big picture ● Good for debugging
  48. 48. Monitoring & Debugging : Event Tracer • Dump message traces in a textual format • Before and after processing each stage of event flow
  49. 49. Monitoring & Debugging : Event Statistics • Real-time statistics • via visual illustrations & JMX • Time based request & response counts • Stats on all components of CEP server
  50. 50. Real Time Dashboard • Provides tools to configure gadgets • Currently supports RDBMS only • Powered by WSO2 User Engagement Server (WSO2 UES)
  51. 51. Performance Results • Same-JVM performance (Siddhi vs. Esper; M means million), 4-core machine • Filters: 8M events/sec vs. Esper 2M • Windows: 2.5M events/sec vs. Esper 1M • Patterns: 1.4M events/sec, about 10X faster than Esper • Over-the-network performance (using the Thrift-based WSO2Event format), 8-core machine • Filter: 0.25M (250K) events/sec
  52. 52. CEP High Availability Execution plan in “RedundantNode” based distributed processing mode <executionPlan name="RedundantNodeExecutionPlan" statistics="enable" trace="enable" xmlns="http://wso2.org/carbon/eventprocessor"> ... <siddhiConfiguration> <property name="siddhi.enable.distributed.processing">RedundantNode</property> <property name="siddhi.persistence.snapshot.time.interval.minutes">0</property> </siddhiConfiguration> ... </executionPlan>
  53. 53. HA / Persistence • Option 1: Side by side – Recommended – Takes 2X hardware – Gives zero downtime • Option 2: Snapshot and restore – Uses less hardware – Will lose events between snapshots – Downtime during recovery • In some scenarios you can use event tables to keep intermediate state
  54. 54. Scaling • Vertical scaling – Can be distributed as a pipeline • Horizontal scaling – Queries like windows, patterns, and joins have shared state – Hard to distribute!
  55. 55. Scaling (Contd.) • Currently users have to set up the pipeline manually (the WSO2 team can help) • Work is underway to support the above pipeline and distributed operators out of the box
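The trade-off of snapshot and restore, losing whatever arrived after the last snapshot, can be shown with a toy stateful query. This sketch uses `pickle` purely for illustration; the `CountingQuery` class is hypothetical and unrelated to the persistence mechanism the CEP server actually uses.

```python
import pickle

class CountingQuery:
    """Toy stateful query: counts the events it has processed."""

    def __init__(self):
        self.count = 0

    def on_event(self, event):
        self.count += 1

    def snapshot(self):
        return pickle.dumps(self.__dict__)

    def restore(self, blob):
        self.__dict__.update(pickle.loads(blob))

primary = CountingQuery()
for event in range(3):
    primary.on_event(event)
saved = primary.snapshot()      # periodic snapshot
for event in range(2):
    primary.on_event(event)     # processed after the snapshot, then the node fails

recovered = CountingQuery()
recovered.restore(saved)
print(primary.count, recovered.count)  # the two post-snapshot events are lost
```

In the side-by-side option both nodes process every event, so there is no such gap, at the cost of running twice the hardware.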
  56. 56. Lambda Architecture
  57. 57. Demo
  58. 58. Scenario MyPizzaShop – On time delivery or free Pizza Offer !!!
  59. 59. Order Event { "event": { "correlationData": { "orderNo": "0023" }, "payloadData": { "orderInfo": "2 L PEPPERONI", "amount": "25.70", "name": "James Mark", "address": "29BX Finchwood Ave, Clovis, CA 93611", "tpNo": "(626)446-4601" } } }
  60. 60. Delivered to customer event correlation_orderNo:23, isDelivered:true
  61. 61. Email Notification Hi Alis Miranda Your order for 1 L CHICKEN pizza will be delivered in 30 mins to 779 Burl Ave, Clovis, CA 93611. The total cost of the order is $14.5. If you didn't get the pizza within 30 min you will be eligible to have those pizzas for free..!! MyPizzaShop
  62. 62. Final Payment Notification <event xmlns="http://wso2.org/carbon/event"> <correlationData> <orderNo>3</orderNo> </correlationData> <payloadData> <name>James Clark</name> <amount>54.0</amount> </payloadData> </event>
  63. 63. Thank You
