This document provides an overview of complex event processing (CEP) and Esper, an open source CEP engine. It defines CEP as a set of tools and techniques for analyzing and controlling the interrelated events that drive modern distributed systems. Esper makes it easier to build CEP applications by providing a SQL-like event processing language (EPL) for defining event types, continuous queries, and event patterns. It supports filtering, aggregation, windows, correlation, and pattern detection over streaming event data. While powerful, Esper has limitations around memory usage, resilience, and distribution that must be considered.
3. “Complex Event is an event
that could only happen if lots
of other events happened”
“CEP is a set of tools and
techniques for analyzing and
controlling the complex series
of interrelated events that drive
modern distributed information
systems”
David Luckham, 2002
4. Example
• Church bell ringing
• Appearance of a man in a tuxedo
• Appearance of a woman in a white gown
• Rice flying through the air
5. Example
• Church bell ringing
• Appearance of a man in a tuxedo
• Appearance of a woman in a white gown
• Rice flying through the air
Wedding has happened!
6. CEP Use Cases
• Are our business processes running on
time and correctly?
• Can we detect an opportunity for arbitrage
in our trading department?
• Are we servicing our call center customer’s
requests in a timely fashion?
• Was there a breach in our network?
31. Event Definition (1/2)
create schema Event (
id string, // Event unique identifier
ts long // Timestamp (milliseconds)
);
create schema Tweet (
user string, // username (e.g. 'codebits')
text string, // actual tweet
retweet_of string // references a Tweet.id
) inherits Event;
32. Event Definition (2/2)
create schema Hashtag (
tweet_id string, // references a Tweet.id
user string,
value string
) inherits Event;
// Create Url and Mention event types as a copy of Hashtag
create schema Url() copyfrom Hashtag;
create schema Mention() copyfrom Hashtag;
33. Looks like SQL...
// All events
select * from Event;
// Only tweets
select user, text as status
from Tweet;
34. Filtering
// Tweets from @codebits
select * from Tweet(user = 'codebits');
// Another way to do it
select * from Tweet where user = 'codebits';
// All occurrences of #codebits not posted by @codebits
select user,
value as hashtag,
current_timestamp() as ts
from Hashtag(value = 'codebits' and user != 'codebits');
35. Stream Creation and Redirection
insert into CodebitsTweets
select * from Tweet(user = 'codebits');
select * from CodebitsTweets;
36. Aggregation
insert into UrlsPerSecond
select count(*) as count from Url.win:time_batch(1 sec);
// Every second (driven by above rule) calculate for last minute
// - average Urls tweeted
// - total Urls tweeted
select avg(count), sum(count)
from UrlsPerSecond.win:length(60);
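The two-stage aggregation above can be sketched in plain Python — an illustration of the window semantics (a tumbling one-second count batch feeding a sliding 60-slot window), not Esper's implementation; the class name is made up:

```python
from collections import deque

class UrlsPerSecond:
    """Plain-Python sketch of win:time_batch(1 sec) feeding win:length(60)."""

    def __init__(self, slots=60):
        self.current_batch = 0              # URLs seen in the open 1-second batch
        self.counts = deque(maxlen=slots)   # last `slots` per-second counts

    def on_url(self):
        self.current_batch += 1

    def on_second_elapsed(self):
        # The batch closes: its count enters the sliding window and resets.
        self.counts.append(self.current_batch)
        self.current_batch = 0
        return sum(self.counts), sum(self.counts) / len(self.counts)
```

Note how the second query never sees raw Url events, only the per-second counts emitted by the first one — exactly the stream-redirection idea from the insert into slide.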
37. Grouping
select value as hashtag, count(*)
from Hashtag(value != null).win:time(30 seconds)
group by value;
38. Simple Event Views
select * from Tweet.win:time(5 min);
select * from Tweet.win:time_batch(1 hour);
select * from Tweet.win:length(10);
select * from Tweet.win:length_batch(10);
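The difference between the sliding (win:length) and tumbling (win:length_batch) variants above can be sketched as follows (plain Python, illustrative only — the function names are made up):

```python
from collections import deque

def sliding_length(events, n):
    """win:length(n): after each event, the window holds the last n events."""
    win = deque(maxlen=n)
    for e in events:
        win.append(e)
        yield list(win)

def length_batch(events, n):
    """win:length_batch(n): buffer n events, then release them all at once."""
    batch = []
    for e in events:
        batch.append(e)
        if len(batch) == n:
            yield batch
            batch = []
```

The time-based views (win:time, win:time_batch) follow the same sliding-vs-tumbling distinction, with the clock rather than an event count driving expiry.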
39. Other Standard Event Views
// Don’t use system clock, use event stream property
select * from Tweet.win:ext_timed(ts, 5 min);
// Last 10 tweets per user
select * from Tweet.std:groupwin(user).win:length(10);
// Top 5 Hashtags
select * from HashtagsPerMinute.ext:sort(5, count desc);
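The per-user grouped window (std:groupwin chained with win:length) behaves roughly like this plain-Python sketch (the function name is made up):

```python
from collections import defaultdict, deque

def groupwin_length(events, key, n):
    """std:groupwin(key).win:length(n): keep the last n events per group."""
    wins = defaultdict(lambda: deque(maxlen=n))
    for e in events:
        wins[key(e)].append(e)     # each group gets its own length window
    return {k: list(v) for k, v in wins.items()}
```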
41. Correlation (1/2)
// Associate hashtags used to describe a URL
insert into UrlTags
select u.value as url, h.value as hashtag
from Url.std:lastevent() as u,
Hashtag.std:lastevent() as h
where u.tweet_id = h.tweet_id;
insert into UrlTagsCount
select url,
hashtag,
count(*) as count
from UrlTags.win:time(1 hour)
group by url, hashtag;
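The std:lastevent join above can be approximated in plain Python. This is a sketch of the semantics — each stream retains only its most recent event, and the join condition is re-evaluated on every arrival — not Esper code:

```python
class LastEventJoin:
    """Sketch of joining Url.std:lastevent() with Hashtag.std:lastevent()."""

    def __init__(self):
        self.last_url = None   # (tweet_id, url value)
        self.last_tag = None   # (tweet_id, hashtag value)

    def on_url(self, tweet_id, value):
        self.last_url = (tweet_id, value)
        return self._match()

    def on_hashtag(self, tweet_id, value):
        self.last_tag = (tweet_id, value)
        return self._match()

    def _match(self):
        # Join condition: u.tweet_id = h.tweet_id
        if self.last_url and self.last_tag and self.last_url[0] == self.last_tag[0]:
            return {"url": self.last_url[1], "hashtag": self.last_tag[1]}
        return None
```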
42. Correlation (2/2)
// Every minute, output Top 3 hashtags per URL
select * from UrlTagsCount.ext:sort(3, count desc)
output snapshot at(*/1,*,*,*,*);
43. Event Patterns
// Measure how long it takes users to respond to Tweet
insert into ResponseDelay
select t.id as tweet_id,
t.user as author,
m.value as responder,
t.ts as start_ts,
m.ts as stop_ts,
m.ts - t.ts as duration
from pattern [
every (t=Tweet -> m=Mention(value = t.user))
];
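The `every (t=Tweet -> m=Mention(...))` pattern above can be sketched in plain Python. This illustrates the followed-by semantics under the simplifying assumption that each open pattern instance fires once, on the first matching mention:

```python
class ResponseDelayPattern:
    """Sketch of: every (t=Tweet -> m=Mention(value = t.user))."""

    def __init__(self):
        self.waiting = []   # open pattern instances: (tweet_id, user, start_ts)

    def on_tweet(self, tweet_id, user, ts):
        # `every` starts a fresh pattern instance for each tweet.
        self.waiting.append((tweet_id, user, ts))

    def on_mention(self, value, ts):
        # A mention completes every waiting instance whose author it names.
        matched = [w for w in self.waiting if w[1] == value]
        self.waiting = [w for w in self.waiting if w[1] != value]
        return [{"tweet_id": tid, "author": u, "duration": ts - t0}
                for tid, u, t0 in matched]
```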
44. Detecting Missing Events
// No Tweet from @codebits in 1 hour
select *
from pattern [ every Tweet(user = 'codebits') ->
(timer:interval(1 hour) and not Tweet(user = 'codebits'))
];
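The absence pattern above (a timer racing against the next matching event) can be sketched offline in plain Python, assuming timestamps in seconds; the function and parameter names are illustrative:

```python
def missing_event_alarms(event_times, now, deadline=3600):
    """Sketch of: every Tweet -> (timer:interval(1 hour) and not Tweet).

    Each event starts a timer; if no further event arrives within
    `deadline` seconds, an alarm fires at event_time + deadline
    (provided that moment has already passed `now`)."""
    alarms = []
    for i, t in enumerate(event_times):
        nxt = event_times[i + 1] if i + 1 < len(event_times) else None
        if (nxt is None or nxt - t > deadline) and t + deadline <= now:
            alarms.append(t + deadline)
    return alarms
```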
45. Other features
• Subqueries
• Inner, outer joins
• Named windows
• 1st class integration with databases (JDBC)
• Regex-like Event Pattern matching (match-recognize)
Introduce myself.
Talk about goals for this presentation:
- motivate you to further explore the world of Complex Event Processing
- get you started building your own CEP apps with Esper
You won't learn everything about using it, but you will have an idea of where to start.
What is Complex Event Processing?
Is anyone here familiar with the term?
-> Ask around
It's actually a pretty simple but powerful concept. Intuitively we all know what it is.

A Complex Event is simply an event that can be inferred from other, simpler events.

Complex Event Processing is, very basically, a framework for analyzing and extracting meaning, knowledge and value from the continuous stream of events produced and consumed by modern information systems:
- business transactions
- call center events
- financial events
- network events
- events coming from Web APIs

The concept was introduced in 2002 by David Luckham in his book The Power of Events, where he explores the evolution of event-driven businesses and what he calls the Event Cloud: all the events that modern businesses and systems produce and consume.

I highly recommend that anyone interested in the area read this book. Despite being almost a decade old, most of its concepts and principles still hold today, and are still followed by few.
Wikipedia.

What can we infer from these?

We can infer a new event, a Complex Event: a wedding happened!
It's a technological framework, although we normally call something a CEP system when it presents a few defining characteristics.

You could say it's now a buzzword used by most enterprise software providers, subject to a fair amount of commercialization fuzz.
But it's actually useful as a framework to think about how to take advantage of the Event Cloud, that is to say, all the data and events generated at an amazing pace nowadays.

It is a simple set of principles about event processing and the use of events, and it is going to be subject to a similar set of commercialization fuzz in the future. -> Luckham

CEP is about patterns of events. What kinds of patterns do you want to recognize? How do you define patterns? What are the important elements of an event pattern? For example, is timing important? Are large numbers of events important? Are their causal relationships important? Should you be able to define patterns that involve causality between events? And so on. What do you do when you recognize a pattern? Can you abstract it into a higher-level event? OK, now you have hierarchies of events. So now, what sorts of hierarchies are important in event processing? Can you define your own hierarchy? Can you change it easily? Can you drill down from a higher-level event to find out how it happened? All of those kinds of issues form the principles of complex event processing. It's just a different take on what you do with that. -> Luckham

Complex event processing (CEP) consists of processing many events happening across all the layers of an organization, identifying the most meaningful events within the event cloud, analyzing their impact, and taking subsequent action in real time. -> Wikipedia

"Complex Event Processing, or CEP, is primarily an event processing concept that deals with the task of processing multiple events with the goal of identifying the meaningful events within the event cloud. CEP employs techniques such as detection of complex patterns of many events, event correlation and abstraction, event hierarchies, and relationships between events such as causality, membership, and timing, and event-driven processes." -> Wikipedia
This is the basic CEP architecture (an EDA).
Event sources; UI => alerts are most important on a tactical level.

Event sources: events are generated at a freakish pace from all over the place:
- Web APIs
- logs
- business transactions

As in any EDA, all incoming events are published on a messaging bus.

Events that go into the system must be preprocessed and republished:
- transformed
- normalized
- split
Commercial / open source. JVM. Latency. Throughput. Deals with lots of rules. EPL.
Frame the remaining talk.

Event Stream Processing / CEP System.

It's not the only piece you need, but you don't need to build it yourself!

All the previous operations are supported. With no glue code we are able to easily apply filters and aggregations; Esper automatically maintains only the data needed to fulfill our queries and expires old events as new ones arrive.
Neither databases nor OLAP systems are a good fit.

Some highly focused and optimized memory-based stores improve the situation and can, in some cases, actually be enough. However, there are no language constructs for continuous event processing and querying.
EPL queries are created and stored in the engine, and publish results to listeners as events are received by the engine, or as timer events occur that match the criteria specified in the query. Events can also be obtained from running EPL queries via the safeIterator and iterator methods, which provide a pull-data API.

The select clause in an EPL query specifies the event properties or events to retrieve. The from clause specifies the event stream definitions and stream names to use. The where clause specifies search conditions that determine which event or event combination to search for. For example, a statement could return the average price for IBM stock ticks over the last 30 seconds.

The Event Processing Language (EPL) is a SQL-like language with SELECT, FROM, WHERE, GROUP BY, HAVING and ORDER BY clauses. Streams replace tables as the source of data, with events replacing rows as the basic unit of data. Since events are composed of data, the SQL concepts of correlation through joins, filtering, and aggregation through grouping can be effectively leveraged.

The INSERT INTO clause is recast as a means of forwarding events to other streams for further downstream processing. External data accessible through JDBC may be queried and joined with the stream data. Additional clauses such as PATTERN and OUTPUT provide the language constructs specific to event processing that SQL lacks.

The purpose of the UPDATE clause is to update event properties. The update takes place before the event applies to any selecting or pattern statements.

EPL statements are used to derive and aggregate information from one or more streams of events, and to join or merge event streams. EPL statements contain definitions of one or more views. Similar to tables in a SQL statement, views define the data available for querying and filtering. Some views represent windows over a stream of events. Other views derive statistics from event properties, group events, or handle unique event property values. Views can be staggered onto each other to build a chain of views. The Esper engine makes sure that views are reused among EPL statements for efficiency.

The built-in set of views:
- Data window views: win:length, win:length_batch, win:time, win:time_batch, win:time_length_batch, win:time_accum, win:ext_timed, ext:sort_window, ext:time_order, std:unique, std:groupwin, std:lastevent, std:firstevent, std:firstunique, win:firstlength, win:firsttime
- Views that derive statistics: std:size, stat:uni, stat:linest, stat:correl, stat:weighted_avg

EPL provides the concept of a named window. Named windows are data windows that can be inserted into and deleted from by one or more statements, and that can be queried by one or more statements. Named windows have a global character, being visible and shared across an engine instance beyond a single statement. Use the CREATE WINDOW clause to create named windows, the ON MERGE clause to atomically merge events into named window state, the INSERT INTO clause to insert data into a named window, the ON DELETE clause to remove events from a named window, the ON UPDATE clause to update events held by a named window, and the ON SELECT clause to perform a query triggered by a pattern or arriving event on a named window. Finally, the name of a named window can occur in a statement's FROM clause to query it, or to include it in a join or subquery.

EPL allows execution of on-demand (fire-and-forget, non-continuous, triggered-by-API) queries against named windows through the runtime API. The query engine automatically indexes named window data for fast access by ON SELECT/UPDATE/INSERT/DELETE without the need to create an index explicitly. For fast on-demand query execution via the runtime API, use the CREATE INDEX syntax to create an explicit index.

Use CREATE SCHEMA to declare an event type.

Variables come in handy to parameterize statements and change parameters on the fly, in response to events. Variables can be used in an expression anywhere in a statement, as well as in the output clause for dynamic control of output rates.

Esper can be extended by plugging in custom-developed views and aggregation functions.

Segue into the EPL.
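The named-window behavior described here can be sketched as a small shared store in plain Python (illustrative only; the class and method names are made up, not Esper API):

```python
class NamedWindow:
    """Sketch of an EPL named window: a shared data window that several
    statements can insert into, delete from, and query."""

    def __init__(self):
        self.rows = []

    def insert(self, event):
        # INSERT INTO <window> ...
        self.rows.append(event)

    def delete_where(self, pred):
        # ON <trigger> DELETE FROM <window> WHERE ...
        self.rows = [r for r in self.rows if not pred(r)]

    def select_where(self, pred):
        # ON SELECT / fire-and-forget query against the window
        return [r for r in self.rows if pred(r)]
```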
Ways to define events: API, modules, etc.
Event representations.
The event type will then be used in the proper rules/queries, as you would use a table name in SQL.
Inheritance: inheritance enables polymorphic rules.
Rate of published URLs per minute.
Rate of hashtag publishing per 30 seconds, for each hashtag.
Sliding / tumbling.
Chaining - view composition; order matters.
TRIX; sentiment injection.
Great for transactions!
Very common case: a missing event is an event.
CORRELATION / JOINS / PATTERNS
It's not as powerful as what you can find in rules engines. You can circumvent this by writing your own extensions in a JVM language. Could be better.

There's no native support for tracing causal relationships between events; you have to build it into your rules.

Only the commercial version. You can build your own.