Complex Event Processing with Esper


Talk I gave at Codebits 2011 on 11/11/11 about Complex Event Processing using Esper.

Published in: Technology, Business

  • Introduce myself.
    Talk about the goals for this presentation:
    - motivate you to further explore the world of Complex Event Processing
    - get you started with building your own CEP apps with Esper
    - you won't learn how to use it, but you will have an idea of where to start
  • What is Complex Event Processing?
    Is anyone here familiar with the term?
    -> Ask around
  • It’s actually a pretty simple but powerful concept.
    Intuitively we all know what it is.

    A Complex Event is simply an event that can be inferred from other, simpler events.

    Complex Event Processing is, very basically, a framework for analyzing and extracting meaning, knowledge and value from the continuous stream of events produced and consumed by modern information systems:
    - business transactions
    - call center
    - financial
    - network events
    - events coming from Web APIs

    The concept was introduced in 2002 by David Luckham in his book The Power of Events. There he explores the evolution of event-driven businesses and what he calls the Event Cloud: all the events that modern businesses and systems produce and consume.

    I highly recommend this book to anyone interested in the area. Despite being almost a decade old, most of its concepts and principles still hold today and are still followed by few.
  • Wikipedia

    What can we infer from these?

    We can infer a new event, a Complex Event: a Wedding happened!
  • BAM (Business Activity Monitoring): call center, billing, payments, etc.
    HFT (high-frequency trading)
    Network Intrusion Detection
    Fraud Detection
    Sensor Networks, e.g. fire and tsunami detection, wind power
  • It’s a technological framework.
    You normally call a system a CEP system, though, when it presents a few defining characteristics.
  • You could say it’s now a buzzword used by most enterprise software providers.

    Subject to a set of commercialization fuzz
  • 0.5

    But it’s actually useful as a framework to think about how to take advantage of the Event Cloud, that is to say, all the data and events generated at an amazing pace nowadays.

    Simple set of principles about event processing and the use of events, and that is going to be subject to a similar set of commercialization fuzz in the future. -> Luckham

    CEP is about patterns of events. What kinds of patterns do you want to recognize? How do you define patterns? What are the important elements of an event pattern? For example, is timing important? Are large numbers of events important? Are their causal relationships important? Should you be able to define patterns that involve the causality between events? And so on. What do you do when you recognize a pattern? Can you abstract it into a higher-level event? OK, now you have hierarchies of events. So now, what sorts of hierarchies are important in event processing? Can you define your own hierarchy? Can you change it easily? Can you drill down from a higher-level event to find out how it happened? All of those kinds of issues form the principles of complex event processing. It's just a different take on what you do with that. -> Luckham

    Complex event processing (CEP) consists of processing many events happening across all the layers of an organization, identifying the most meaningful events within the event cloud, analyzing their impact, and taking subsequent action in real time. -> Wikipedia

    "Complex Event Processing, or CEP, is primarily an event processing concept that deals with the task of processing multiple events with the goal of identifying the meaningful events within the event cloud. CEP employs techniques such as detection of complex patterns of many events, event correlation and abstraction, event hierarchies, and relationships between events such as causality, membership, and timing, and event-driven processes." -> Wikipedia
  • This is the basic CEP architecture
    EDA (Event-Driven Architecture)
    Event Sources
    UI => ALERTS are most important on a tactical level

    Event Sources
    - Events are generated at a freakish pace from all over the place.
    - Web APIs
    - Logs
    - Business transactions

    As an EDA, all incoming events are published on a messaging bus

    Events that go into this system must be preprocessed and republished
    - transformed
    - normalized
    - split
  • CEP system

    Domain Specific Language
    Continuous Query
    Time or Length Windows
    Temporal Pattern Matching

    Event pattern detection
    Event abstraction
    Event hierarchies
    Event relationships
    - causality
    - membership
    - timing
    Abstracting event-driven processes => ??
  • Continuous Query
    Async
    Continuously responds to the actual events passing through the system
    Fits into EDA

    Segue into real-time...
  • Low latency
    React when it matters!!
    High throughput => because we are dealing with a lot of data/events
  • Goes across all layers of an organization

    Event hierarchies

    Traceability => DRILL DOWN
  • Summarize a bunch of events:
    - averages, counts, etc.
  • Horizontal

    and Vertical

    By cause
    Timing
    Membership

    Drill down
  • DSL to make it easy and performant to express the Event Processing Rules that realize the previous features and all Operations
  • General but rigorous definition
  • Continuously applied/Streaming
    Time/Length Windows
  • Commercial/Open Source
    JVM
    Latency
    Throughput
    Deals with lots of rules
    EPL

    Frame remaining talk
  • Event Stream Processing
    CEP System
  • It’s not the only piece you need

    You don’t need to build it yourself!

    All previous Operations are supported

    No glue code: we are able to easily apply filters and aggregations! Esper automatically maintains only the data we need to fulfill our queries and expires old events as new ones arrive.
  • 0.3

    Neither Databases nor OLAPs
  • 0.3
  • 1

    Some highly focused and optimized memory-based stores improve the situation and can, in some cases, actually be enough.

    However, there are no language constructs for continuous event processing and querying.
  • 2

    EPL queries are created and stored in the engine, and publish results to listeners as events are received by the engine or timer events occur that match the criteria specified in the query. Events can also be obtained from running EPL queries via the safeIterator and iterator methods that provide a pull-data API.

    The select clause in an EPL query specifies the event properties or events to retrieve. The from clause in an EPL query specifies the event stream definitions and stream names to use. The where clause in an EPL query specifies search conditions that specify which event or event combination to search for. For example, the following statement returns the average price for IBM stock ticks in the last 30 seconds.

    The Event Processing Language (EPL) is a SQL-like language with SELECT, FROM, WHERE, GROUP BY, HAVING and ORDER BY clauses. Streams replace tables as the source of data with events replacing rows as the basic unit of data. Since events are composed of data, the SQL concepts of correlation through joins, filtering and aggregation through grouping can be effectively leveraged.

    The INSERT INTO clause is recast as a means of forwarding events to other streams for further downstream processing. External data accessible through JDBC may be queried and joined with the stream data. Additional clauses such as the PATTERN and OUTPUT clauses are also available to provide the missing SQL language constructs specific to event processing.

    The purpose of the UPDATE clause is to update event properties. Update takes place before an event applies to any selecting statements or pattern statements.

    EPL statements are used to derive and aggregate information from one or more streams of events, and to join or merge event streams. This section outlines EPL syntax. It also outlines the built-in views, which are the building blocks for deriving and aggregating information from event streams.

    EPL statements contain definitions of one or more views. Similar to tables in a SQL statement, views define the data available for querying and filtering. Some views represent windows over a stream of events. Other views derive statistics from event properties, group events or handle unique event property values. Views can be staggered onto each other to build a chain of views. The Esper engine makes sure that views are reused among EPL statements for efficiency.

    The built-in set of views is:
    - Data window views: win:length, win:length_batch, win:time, win:time_batch, win:time_length_batch, win:time_accum, win:ext_timed, ext:sort_window, ext:time_order, std:unique, std:groupwin, std:lastevent, std:firstevent, std:firstunique, win:firstlength, win:firsttime.
    - Views that derive statistics: std:size, stat:uni, stat:linest, stat:correl, stat:weighted_avg.

    EPL provides the concept of a named window. Named windows are data windows that can be inserted into and deleted from by one or more statements, and that can be queried by one or more statements. Named windows have a global character, being visible and shared across an engine instance beyond a single statement. Use the CREATE WINDOW clause to create named windows. Use the ON MERGE clause to atomically merge events into named window state, the INSERT INTO clause to insert data into a named window, the ON DELETE clause to remove events from a named window, the ON UPDATE clause to update events held by a named window and the ON SELECT clause to perform a query triggered by a pattern or arriving event on a named window. Finally, the name of the named window can occur in a statement's FROM clause to query a named window or include the named window in a join or subquery.

    EPL allows execution of on-demand (fire-and-forget, non-continuous, triggered by API) queries against named windows through the runtime API. The query engine automatically indexes named window data for fast access by ON SELECT/UPDATE/INSERT/DELETE without the need to create an index explicitly. For fast on-demand query execution via the runtime API, use the CREATE INDEX syntax to create an explicit index.

    Use CREATE SCHEMA to declare an event type.

    Variables can come in handy to parameterize statements and change parameters on-the-fly and in response to events. Variables can be used in an expression anywhere in a statement as well as in the output clause for dynamic control of output rates.

    Esper can be extended by plugging in custom-developed views and aggregation functions.

    Segue into the EPL
  • Ways to define events: API, modules, etc.

    Event representations

    The event type will then be used in the proper rules/queries, as you would use a table name in SQL.

    Inheritance
  • Inheritance enables polymorphic rules
  • Rate of published URLs per minute
  • Rate of hashtag publishing per 30 seconds
    For each Hashtag
  • Sliding
    Tumbling
  • Chaining - View Composition
    Order matters
  • TRIX
    Sentiment Injection
  • Great for transactions!
  • Very common case
    A Missing Event is an Event
  • It’s not as powerful as what you can find in rules engines

    You can circumvent this by writing your own extensions in a JVM language

    Could be better
  • There’s no native support for tracing causal relationships between events

    You have to build it in your rules
  • Only the commercial version

    You can build your own
  • Rails
    Feedzai
  • Streambase
    Feedzai
    Siddhi-CEP
    ETALIS
    Oracle CEP
    Microsoft...
    Apama
    RuleCore
    Drools Fusion
    ...
  • Complex Event Processing with Esper
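
The speaker notes above say that Esper keeps only the data a query needs and expires old events as new ones arrive, and they mention an average over the last 30 seconds as an example. As a rough illustration of that idea, here is a minimal sketch in plain Python. It does not use Esper; the class name `SlidingTimeAverage`, its method names, and the assumption that events arrive in timestamp order are all hypothetical choices made for this example.

```python
from collections import deque


class SlidingTimeAverage:
    """Running average of the values seen in the last `window_seconds`.

    Hypothetical helper for illustration only; not part of Esper.
    Assumes events are delivered in non-decreasing timestamp order.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs still inside the window
        self.total = 0.0       # sum of the values currently in the window

    def on_event(self, timestamp, value):
        """Add one event, expire old ones, and return the current average."""
        self.events.append((timestamp, value))
        self.total += value

        # Expire everything that is at least `window_seconds` older than
        # the event that just arrived.
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, expired_value = self.events.popleft()
            self.total -= expired_value

        return self.total / len(self.events)


window = SlidingTimeAverage(window_seconds=30)
window.on_event(0, 10.0)
window.on_event(10, 20.0)
latest_average = window.on_event(35, 30.0)  # the event at t=0 has expired
```

In Esper the window and its expiry are declared in the query itself (for instance with a `win:time` view) rather than hand-written like this, which is the "no glue code" point made in the notes.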

    1. Complex Event Processing with Esper @antonioalegria
    2. Complex Event Processing? CEP
    3. “Complex Event is an event that could only happen if lots of other events happened” “CEP is a set of tools and techniques for analyzing and controlling the complex series of interrelated events that drive modern distributed information systems” David Luckham, 2002
    4. Example
        • Church bell ringing
        • Appearance of a man in a tuxedo
        • Appearance of a woman in a white gown
        • Rice flying through the air
    5. Example
        • Church bell ringing
        • Appearance of a man in a tuxedo
        • Appearance of a woman in a white gown
        • Rice flying through the air
        Wedding has happened!
    6. CEP Use Cases
        • Are our business processes running on time and correctly?
        • Can we detect an opportunity for arbitrage in our trading department?
        • Are we servicing our call center customers’ requests in a timely fashion?
        • Was there a breach in our network?
    7. It’s not a technology
    8. It’s a Buzzword like SOA!
    9. It’s an Architectural Pattern
    10. What do you need for CEP?
    11. Event driven
    12. (soft) Real-time
    13. (soft) Real-time Right
    14. Across all layers of organization
    15. Event Aggregation
    16. Event Relationships
        • Causality
        • Membership
        • Timing
    17. Event Patterns
    18. Domain Specific Language for Event Processing
    19. What you need for CEP
        • Event Driven
        • Right-time
        • Across all layers
        • Aggregation, Correlation & Traceability
        • Patterns
        • DSL
    20. Common CEP Operations
        • Windowing
        • Transformation
        • Aggregation/Grouping
        • Merging/Union
        • Filtering
        • Sorting
        • Correlation
        • Pattern Detection
    21. Esper
    22. Esper makes it easier to build a CEP app
    23. Not meant to replace Databases
    24. But some parallels can be made
    25. Esper                  DB
        Stores queries         Stores data
        Continuous queries     On-demand queries
        Time is a dimension    Time is a data type
    26. Esper                  DB
        EPL                    SQL
        Event Streams          Tables
        Events                 Rows
    27. Esper Processing Model
    28. EPL: Event Processing Language
    29. Event Definition (1/2)
        create schema Event (
          id string,  // Event unique identifier
          ts long     // Timestamp (milliseconds)
        );
        create schema Tweet (
          user string,        // username (e.g. 'codebits')
          text string,        // actual tweet
          retweet_of string   // references a
        ) inherits Event;
    30. Event Definition (2/2)
        create schema Hashtag (
          tweet_id string,  // references a
          user string,
          value string
        ) inherits Event;
        // Create Url and Mention event types as a copy of Hashtag
        create schema Url() copyfrom Hashtag;
        create schema Mention() copyfrom Hashtag;
    31. Looks like SQL...
        // All events
        select * from Event;
        // Only tweets
        select user, text as status
        from Tweet;
    32. Filtering
        // Tweets from @codebits
        select * from Tweet(user = 'codebits');
        // Another way to do it
        select * from Tweet where user = 'codebits';
        // All occurrences of #codebits not posted by @codebits
        select user, value as hashtag, current_timestamp() as ts
        from Hashtag(value = 'codebits' and user != 'codebits');
    33. Stream Creation and Redirection
        insert into CodebitsTweets
        select * from Tweet(user = 'codebits');
        select * from CodebitsTweets;
    34. Aggregation
        insert into UrlsPerSecond
        select count(*) as count from sec);
        // Every second (driven by above rule) calculate for last minute
        // - average Urls tweeted
        // - total Urls tweeted
        select avg(count), sum(count)
        from;
    35. Grouping
        select value as hashtag, count(*)
        from Hashtag(value is not null).win:time(30 seconds)
        group by value;
    36. Simple Event Views
        select * from min);
        select * from hour);
        select * from;
        select * from;
    37. Other Standard Event Views
        // Don’t use system clock, use event stream property
        select * from, 5 min);
        // Last 10 tweets per user
        select * from Tweet.std:groupwin(user).win:length(10);
        // Top 5 Hashtags
        select * from HashtagsPerMinute.ext:sort(5, count desc);
    38. You can create your own custom Views
    39. Correlation
        // Associate hashtags used to describe a URL
        insert into UrlTags
        select u.value as url, h.value as hashtag
        from Url.std:lastevent() as u, Hashtag.std:lastevent() as h
        where u.tweet_id = h.tweet_id;
        insert into UrlTagsCount
        select url, hashtag, count(*) as count
        from hour)
        group by url, hashtag;
    40. Correlation (1/2)
        // Every minute, output Top 3 hashtags per URL
        select * from UrlTagsCount.ext:sort(3, count desc)
        output snapshot at(*/1,*,*,*,*);
    41. Event Patterns
        // Measure how long it takes users to respond to Tweet
        insert into ResponseDelay
        select as tweet_id, t.user as author, m.value as responder,
               t.ts as start_ts, m.ts as stop_ts, m.ts - t.ts as duration
        from pattern [ every (t=Tweet -> m=Mention(value = t.user)) ];
    42. Detecting Missing Events
        // No Tweet from @codebits in 1 hour
        select *
        from pattern [
          every Tweet(user = 'codebits') ->
            (timer:interval(1 hour) and not Tweet(user = 'codebits'))
        ];
    43. Other features
        • Subqueries
        • Inner, outer joins
        • Named windows
        • 1st class integration with databases (JDBC)
        • Regex-like Event Pattern matching (match_recognize)
    44. Esper is awesome!
    45. It’s not a silver bullet (well, duh!)
    46. Memory Usage
    47. Resilience & Persistence
    48. Weak Pattern matching
    49. Drill-down not trivial
    50. It’s NOT distributed!
    51. Not full-stack
    52. Q&A. For more: @antonioalegria
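
Slide 42 treats a missing event as an event: after each Tweet from a user a timer starts, and the pattern fires if the timer runs out before another Tweet from that user arrives. A minimal plain-Python sketch of that logic follows. It is not Esper code; `MissingEventDetector`, `on_tweet` and `advance_time` are hypothetical names, time is passed in explicitly as seconds, and a Tweet arriving exactly at the deadline is treated as too late, which is a choice of this sketch rather than a statement about Esper's behaviour.

```python
class MissingEventDetector:
    """Records an alert when `user` stays silent for `timeout_seconds`.

    Hypothetical illustration of the "missing event" pattern; not Esper's API.
    """

    def __init__(self, user, timeout_seconds):
        self.user = user
        self.timeout_seconds = timeout_seconds
        self.deadline = None  # armed only after a matching Tweet is seen
        self.alerts = []      # timestamps at which silence was detected

    def advance_time(self, now):
        """Move the clock forward and fire if the deadline has passed."""
        if self.deadline is not None and now >= self.deadline:
            self.alerts.append(self.deadline)
            self.deadline = None  # wait for the next Tweet before re-arming

    def on_tweet(self, timestamp, user):
        """Process one Tweet event; Tweets from other users only move time."""
        self.advance_time(timestamp)
        if user == self.user:
            # Every matching Tweet (re)starts the timer.
            self.deadline = timestamp + self.timeout_seconds


detector = MissingEventDetector(user="codebits", timeout_seconds=3600)
detector.on_tweet(0, "codebits")         # deadline is now t=3600
detector.on_tweet(1800, "someone_else")  # does not reset the timer
detector.on_tweet(3000, "codebits")      # deadline moves to t=6600
detector.advance_time(6000)              # still within the hour, no alert
detector.advance_time(6600)              # one hour of silence, alert recorded
```

The point of the slide is that in EPL all of this bookkeeping collapses into a single pattern expression, with the engine supplying the timer.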