Processing Flows of Information: From Data Streams to Complex Event Processing
Gianpaolo Cugola and Alessandro Margara
Politecnico di Milano, Dip. Elettronica e Informazione
[cugola,margara]@elet.polimi.it
“Processing Flows of Information: from data streams to complex event processing”, ACM Computing Surveys, to appear (but available from: http://home.dei.polimi.it/cugola)
Background and motivation
- Back in 2007 CEP was already a hot topic…
- …but having a good grasp of the area was rather hard
- As observed by Opher Etzion, the area was looking like the “Tower of Babel”
  - “Event Processing and the Babylon Tower”, Event Process Thinking blog, Sept. 8, 2007
[Figure: word cloud of overlapping terms: event, data, message, primitive, complex, composite, stream, flow, sequence, pattern, join, query, rule, condition, action, publish, subscribe, notify, advertise, send, receive, middleware, system, application, protocol, network, routing]
DEBS 2011 - Tutorial
Background and motivation
- Several communities were contributing to the area…
- …each bringing its own expertise and vocabulary…
- …but often working in isolation
[Figure: the same word cloud surrounded by the contributing communities. Researchers: DEBS, data management & databases, process modeling & automation. Tool vendors: middleware systems, ad-hoc tools (intrusion det., …), application servers, DBMSs]
Background and motivation
- That was 2007. What about today? Things did not change much
- From the “Event Process Thinking” blog, Sept. 2010: [Which is] the relation between event processing and data stream management?
  - They are aliases -- stream is just a collection of events, likewise, an event is just a member in a stream, and the functionality is the same
  - Stream management is a subset of event processing -- there are different ways to do event processing, streams is one of them
  - Event processing is a subset of stream management -- event streams is just one type of stream, but there are voice stream, video stream, …
  - Event processing and stream management are distinct and there is no overlapping between them
- At the same time, tool vendors are building tools that try to combine (or juxtapose?) different approaches
Our goal
- We would like to compare different systems in a precise way
- We would like to compare different approaches in a precise way
- We would like to help people coming from different areas communicate and compare their work with others
- We would like to isolate the open issues from those already solved
- We would like to precisely identify the challenges
- We would like to isolate the best parts of the various approaches…
  - …and find a way to combine them
- We need a modeling framework that can accommodate all the proposals
Forget your own vocabulary (temporarily)
- CEP, DSMSs, Active Databases…
- All these terms hide a peculiar view of the domain we have in mind
  - With subtle, “unsaid” (and often unclear) differences
- Before learning something new we have to forget what we already know
The Information Flow Processing domain
- The IFP engine processes incoming flows of information according to a set of processing rules
  - Processing is “on line”
- Sources produce the incoming information flows, sinks consume the results of processing, rule managers add or remove rules
- Information flows are composed of information items
  - Items that are part of the same flow are neither necessarily ordered nor of the same kind
[Figure: Sources -> Information flows -> IFP Engine -> Information flows -> Sinks; Rule managers -> Rules -> IFP Engine]
IFP: One name, several incarnations
- A lot of applications
  - Environmental monitoring through sensor networks
  - Financial applications
  - Fraud detection tools
  - Network intrusion detection systems
  - RFID-based inventory management
  - Manufacturing control systems
  - …
- Several technologies
  - Active databases
  - DSMSs
  - CEP systems
One model, several models
- Different models to capture different viewpoints
  - Functional model
  - Processing model
  - Deployment model
  - Interaction model
  - Time model
  - Data model
  - Rule model
  - Language model
Functional model
[Figure: functional architecture of an IFP engine. Components: Receiver, Decider, Producer, Forwarder, Clock, Rules, History, Knowledge base; the Decider hands the action A and the sequence Seq of triggering items to the Producer]
- Receiver: acts as a demultiplexer
- Forwarder: acts as a multiplexer
A short digression
- We assume rules can be (logically) decomposed into two parts: C -> A
  - C is the condition
  - A is the action
- Example (in CQL):
    Select IStream(Count(*))
    From F1 [Range 1 Minute]
    Where F1.A > 0
- This way we can split processing into two phases:
  - The detection phase determines the items that trigger the rule
  - The production phase uses those items to produce the output of the rule
[Figure: functional model with the condition part of the rules feeding the Decider and the action part feeding the Producer]
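The condition/action split can be illustrated with a minimal Python sketch of the CQL rule above. The names `detect` and `produce`, and the `(timestamp, A)` tuple layout, are assumptions made for this illustration; they are not part of CQL or of any engine. The sketch evaluates the window at a single instant and does not model the IStream operator.

```python
from typing import List, Tuple

# Hypothetical item layout: (timestamp in seconds, value of attribute A)
Item = Tuple[float, int]


def detect(flow: List[Item], now: float, range_s: float = 60.0) -> List[Item]:
    """Detection phase (condition C): items of the last `range_s` seconds with A > 0."""
    return [(ts, a) for ts, a in flow if now - range_s < ts <= now and a > 0]


def produce(seq: List[Item]) -> int:
    """Production phase (action A): count the triggering items."""
    return len(seq)


flow = [(0.0, 5), (30.0, -1), (50.0, 2), (100.0, 7)]
seq = detect(flow, now=100.0)   # items inside the one-minute window with A > 0
count = produce(seq)            # output of the rule at time 100
```

Keeping the two functions separate mirrors the Decider/Producer split: `detect` only selects items, `produce` only consumes the selected sequence.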
Functional model
- Decider
  - Accumulates partial results into the History
  - When a rule fires, passes to the Producer its action part and the triggering items
- Producer
  - Uses the items in Seq as stated in action A
- Clock
  - Periodically creates special information items holding the current time
  - Its presence models the ability to perform periodic processing of inputs
Functional model: Considerations
- The detection-production cycle
  - Fired by a new item I entering the engine through the Receiver
    - Including those periodically produced by the Clock, if present
- Detection phase: evaluates all the rules to find those enabled
  - Using item I plus the data in the Knowledge base, if present
  - The item I can be accumulated into the History for partially enabled rules
  - The action part of the enabled rules, together with the triggering items (A+Seq), is passed to the Producer
- Production phase: produces the output items
  - Combining the items that triggered the rule with data in the Knowledge base, if present
  - New items are sent to subscribed sinks (through the Forwarder)…
  - …but they could also be sent internally to be processed again (recursive processing)
  - In some systems the action part of fired rules may also change the set of deployed rules
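The cycle described above can be sketched in a few lines of Python. This is a simplified illustration, not the architecture of any specific system: the `Rule` and `Engine` classes, the dictionary item format, and the `sink` list standing in for the Forwarder are all assumptions, and the Clock, the Knowledge base, and recursive processing are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Item = Dict[str, object]  # hypothetical item format: a plain dictionary


@dataclass
class Rule:
    # Condition C: inspects the History and returns the triggering sequence Seq,
    # or None when the rule is not enabled.
    condition: Callable[[List[Item]], Optional[List[Item]]]
    # Action A: builds the output items from Seq.
    action: Callable[[List[Item]], List[Item]]


@dataclass
class Engine:
    rules: List[Rule]
    history: List[Item] = field(default_factory=list)
    sink: List[Item] = field(default_factory=list)  # stands in for Forwarder + sinks

    def receive(self, item: Item) -> None:
        """Run one detection-production cycle, fired by a new item."""
        self.history.append(item)  # accumulate the item into the History
        fired: List[Tuple[Callable[[List[Item]], List[Item]], List[Item]]] = []
        for rule in self.rules:    # detection phase
            seq = rule.condition(self.history)
            if seq is not None:
                fired.append((rule.action, seq))  # A + Seq go to the Producer
        for action, seq in fired:  # production phase
            self.sink.extend(action(seq))


def smoke_and_hot(history: List[Item]) -> Optional[List[Item]]:
    """Enabled when a Smoke item arrives and a Temp reading above 50 is in the History."""
    new = history[-1]
    if new["type"] != "Smoke":
        return None
    temps = [i for i in history if i["type"] == "Temp" and i["value"] > 50]
    return [new, temps[-1]] if temps else None


def fire(seq: List[Item]) -> List[Item]:
    return [{"type": "Fire", "temp": seq[1]["value"]}]


engine = Engine(rules=[Rule(smoke_and_hot, fire)])
for it in [{"type": "Temp", "value": 40}, {"type": "Temp", "value": 60}, {"type": "Smoke"}]:
    engine.receive(it)
```

After the three items are received, `engine.sink` holds a single `Fire` item built from the last high temperature reading.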
Functional model: Considerations
- The maximum length of Seq is a key aspect
  - 1: PubSub
  - Bounded: CQL-like languages without time-based windows; pattern-based languages without a Kleene+ operator
- Other key aspects that impact expressiveness
  - Presence of the Clock
    - Models the ability to process rules periodically
    - Available in almost half of the systems reviewed
    - Most Active DBs and DSMSs, but few CEP systems
  - Presence of the Knowledge base
    - Only available in systems coming from the database community
  - Presence of the looping flow exiting the Producer
    - Models the ability to perform recursive processing
    - Half of the CEP systems have it
    - All Active DBs but very few DSMSs have it
    - They have nested rules
  - Support for dynamic rule change
    - Few systems support it
    - Can be implemented externally, through sinks that also act as rule managers…
    - …but we think it is nice to have it internally
The semantics of processing
- What determines the output of each detection-production cycle?
  - The new item entering the engine
  - The set of deployed rules
  - The items stored in the History
  - The content of the Knowledge base
- Is this enough?
- Example (in Padres and CQL):
    Smoke && Temp>50
    Select IStream(Smoke.area)
    From Smoke[Rows 30 Slide 10], Temp[Rows 50 Slide 5]
    Where Smoke.area = Temp.area AND Temp.value > 50
Processing model
- Three policies affect the behavior of the system
  - The selection policy
  - The consumption policy
  - The load shedding policy
Selection policy
- Determines if a rule fires once or multiple times, and the items actually selected from the History
[Figure: example in which the Decider holds A0 and A1 when B arrives. Multiple selection: rule R fires for both A0 B and A1 B. Single selection: R fires for A0 B or for A1 B]
Selection policy: Considerations
- Most systems adopt a multiple selection policy
  - It is simpler to implement
  - Is it adequate?
- Example: raise a fire alert when smoke and high temperature are detected within a short time frame
  - If 10 sensors read high temperature and immediately afterward one detects smoke, I would like to receive a single alert, not 10
- A few systems allow this policy to be programmed…
  - …some of them on a per-rule basis
  - E.g., Amit, T-Rex
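The difference between the two policies in the fire alert example can be sketched as follows. The function `select` and the policy names `"each"`, `"first"`, and `"last"` are assumptions chosen for this illustration; the sketch assumes every reading already satisfies the high-temperature condition and falls inside the time frame.

```python
from typing import List


def select(temps: List[float], policy: str) -> List[float]:
    """Return the Temp readings paired with one Smoke event under a selection policy."""
    if not temps:
        return []
    if policy == "each":    # multiple selection: one alert per reading
        return list(temps)
    if policy == "last":    # single selection: only the most recent reading
        return [temps[-1]]
    if policy == "first":   # single selection: only the oldest reading
        return [temps[0]]
    raise ValueError(f"unknown policy: {policy}")


# Ten sensors read a high temperature, then one sensor detects smoke.
temps = [51.0 + i for i in range(10)]
alerts_multiple = select(temps, "each")   # one alert per temperature reading
alerts_single = select(temps, "last")     # a single alert
```

With a multiple selection policy the sink receives ten alerts; with a single selection policy it receives one.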
Selection policy: The TESLA case
- TESLA (Trio-based Event Specification Language): the T-Rex language
  - A rule language for CEP
  - Tries to combine expressiveness and efficiency
  - Has a formally defined semantics, expressed in Trio, a Metric Temporal Logic (see DEBS 2010)
  - Allows rule managers to choose their own selection policy on a per-rule basis
- Example: multiple selection
    define  Fire(area: string, measuredTemp: double)
    from    Smoke(area=$a) and each Temp(area=$a and val>50) within 1min. from Smoke
    where   area=Smoke.area and measuredTemp=Temp.val
- Example: single selection
    define  Fire(area: string, measuredTemp: double)
    from    Smoke(area=$a) and last Temp(area=$a and val>50) within 1min. from Smoke
    where   area=Smoke.area and measuredTemp=Temp.val
- Other selection operators: first…within, n-first…within, n-last…within
Consumption policy
- Determines how the History changes after the firing of a rule, and therefore what happens when new items enter the Decider
[Figure: example with rule R fired by A and B, comparing the zero and the selected consumption policies when a new B enters the Decider]
Consumption policy: Considerations
- Most systems couple a multiple selection policy with a zero consumption policy
  - This is the common case with DSMSs, which use (sliding) windows to select relevant events
- Example (in CQL):
    Select IStream(Smoke.area)
    From Smoke[Range 1 min], Temp[Range 1 min]
    Where Smoke.area = Temp.area AND Temp.val > 50
- The systems that allow the selection policy to be programmed often allow the consumption policy to be programmed, too
  - E.g., Amit, T-Rex
Consumption policy: The TESLA case
- Zero consumption policy
    define  Fire(area: string, measuredTemp: double)
    from    Smoke(area=$a) and each Temp(area=$a and val>50) within 1min. from Smoke
    where   area=Smoke.area and measuredTemp=Temp.val
- Selected consumption policy
    define  Fire(area: string, measuredTemp: double)
    from    Smoke(area=$a) and each Temp(area=$a and val>50) within 1min. from Smoke
    where   area=Smoke.area and measuredTemp=Temp.val
    consuming Temp
[Figure: timelines of T (Temp) and S (Smoke) items showing the Fire! notifications produced under each policy]
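The effect of the two policies can be sketched with a small Python simulation of the rule "Smoke and each Temp within 1 min from Smoke". The function `run`, the event encoding, and the sample stream are assumptions made for this illustration; every Temp is assumed to satisfy the content constraints, and this is not the T-Rex implementation.

```python
from typing import List, Tuple

# Hypothetical event encoding: (kind, timestamp in seconds); kind is "T" or "S"
Event = Tuple[str, float]


def run(events: List[Event], consume: bool, window: float = 60.0) -> int:
    """Count the Fire notifications produced for a stream of Temp and Smoke events."""
    temps: List[float] = []  # History: timestamps of the Temp items seen so far
    fires = 0
    for kind, ts in events:
        if kind == "T":
            temps.append(ts)
            continue
        # A Smoke item arrived: select every Temp inside the window (multiple selection)
        selected = [t for t in temps if ts - window <= t <= ts]
        fires += len(selected)
        if consume:
            # Selected consumption: the Temp items just used leave the History
            temps = [t for t in temps if t not in selected]
    return fires


events = [("T", 0.0), ("T", 10.0), ("S", 20.0), ("S", 30.0)]
fires_zero = run(events, consume=False)      # both Smoke items see both Temp items
fires_selected = run(events, consume=True)   # the second Smoke finds no Temp left
```

Under zero consumption the same Temp items contribute to every later Smoke; under selected consumption each Temp contributes at most once.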
Load shedding policy
- Problem: how to manage bursts of input data
- It may seem a system issue
  - i.e., an issue to solve in the Receiver
- But it strongly impacts the results produced
  - i.e., the “semantics” of the rules
- Accordingly, some systems allow this issue to be determined on a per-rule basis
  - e.g., Aurora allows rules to specify the expected QoS and sheds input to stay within limits with the available resources
- Conceptually the issue is addressed in the Decider
Deployment model
- IFP applications may include a large number of sources and sinks
  - Possibly dispersed over a wide geographical area
- It becomes important to consider the deployment architecture of the engine
  - How the components of the functional model can be distributed to achieve scalability
Deployment model
[Figure: deployment options for the IFP engine placed between sources, sinks, and rule managers: Centralized or Distributed, where Distributed is either Clustered or Networked]
Deployment model
- Most existing systems adopt a centralized solution
- When distributed processing is allowed, it is usually based on clustered solutions
- A few systems have recognized the importance of networked deployment for some applications
  - E.g., Microsoft StreamInsight (part of SQL Server)
    - Filtering near sources
    - Aggregation and correlation in-network
    - Analytics and historical data in a centralized server/cluster
- In most cases, deployment/configuration is not automatic
Deployment model
- Automatic distribution of processing introduces the operator placement problem
  - Given a set of rules (composed of operators) and a set of nodes
  - How to split the processing load
  - How to assign operators to available nodes
- In other words (Event Processing in Action)
  - Given an event processing network
  - How to map it onto the physical network of nodes
Operator placement
- The operator placement problem is still open
- Several proposals
  - Often adopting techniques coming from Operations Research
  - Difficult to compare solutions and results
- Even in its simplest form the problem is NP-hard
- Good survey paper by Lakshmanan, Li, and Strom: “Placement Strategies for Internet-Scale Data Stream Systems”
Operator placement
[Figure: operator placement, highlighting the Goals dimension]
Operator placement
- Different goals have been considered for operator placement
  - Depending on the deployment architecture
  - Depending on the application requirements
- Examples:
  - Minimize delay / network usage (bandwidth)
    - Pietzuch et al., “Network-Aware Operator Placement for Stream-Processing Systems”
  - Reduce hardware resource usage / obtain load balancing
    - Relevant in clustered scenarios
    - Several papers for IBM System S
Operator placement
[Figure: dimensions of operator placement: Goal, User-defined QoS, Heterogeneous Resources, Rule Rewriting, Centralized vs. Distributed, Re-use vs. Replication, On-line vs. Off-line]
More on deployment model
- Operator placement is only part of the problem
- Other issues
  - How to build the network of nodes?
  - How to maintain it?
  - How to gather the information required to solve the operator placement problem?
  - How to actually “place” the operators?
  - How to “replace” them when the situation changes?
    - New rules added, old rules removed…
    - …new sources/sinks
Deployment model and dynamics
- How to cope with mobile nodes?
  - Mobile sinks and sources…
  - …but also mobile “processors”
- The issue is relevant
  - We live in a mobile world
- Very few proposals
- A lot of work in the area of pure publish/subscribe
  - Several works published in DEBS, not to mention other major conferences/journals
  - May we reuse some of this work?
Interaction Model
- It is interesting to study the characteristics of the interactions among the main components of an IFP system
  - Who starts the communication?
[Figure: Sources -> IFP Engine -> Sinks]
Interaction Model
- Observation Model: Push or Pull
- Forwarding Model: Push or Pull
- Notification Model: Push or Pull
Time Model
- Relationship between information items and the passing of time
- Ability of an IFP system to associate some kind of happened-before (ordering) relationship to information items
- We identified 4 classes:
  - Stream-only
  - Causal
  - Absolute
  - Interval
Stream-Only Time Model
- Used in original DSMSs
- Timestamps may be present or not
- When present, they are used only to group items when they enter the processing engine
- They are not exposed to the language
  - With the exception of windowing constructs
- Ordering in output streams is conceptually separate from the ordering in input streams
- CQL/Stream:
    Select DStream(*)
    From   F1[Rows 5], F2[Range 1 Minute]
    Where  F1.A = F2.A
Causal Time Model
- Each item has a label reflecting some kind of causal relationship
- Partial order
- E.g., Rapide
  - An event is causally ordered after all events that led to its occurrence
- Gigascope (A.a and B.b monotonically increase):
    Select count(*)
    From   A, B
    Where  A.a-1 <= B.b and A.a+1 > B.b
Absolute Time Model
- Information items have an associated timestamp
  - Defining a single point in time w.r.t. a (logically) unique clock
- Total order
- Timestamps are fully exposed to the language
- Timestamps can be given by the source, or by the processing engine
- TESLA/T-Rex:
    Define  Fire(area: string, measuredTemp: double)
    From    Smoke(area=$a) and last Temp(area=$a and value>45) within 5 min. from Smoke
    Where   area=Smoke.area and measuredTemp=Temp.value
Interval Time Model
- Used for events to include “duration”
  - SnoopIB, Cayuga, NextCEP, …
- At first sight, it is a simple extension of the absolute time model
  - Timestamps with two values: start time and end time
- However, it opens many issues
  - What is the successor of an event?
  - What is the timestamp associated to a composite event?
Interval Time Model
- Which is the immediate successor of A?
  - Choose according to end time only: B
    - But it started before A!
  - Exclude B: C, D
    - Both of them? Which of them?
  - No other event strictly between A and its successor: C, D, E
    - Seems a natural definition
    - Unfortunately we lose associativity: X(YZ) ≠ (XY)Z
    - May impede some rule rewriting for processing optimizations
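The three candidate definitions of "immediate successor" can be compared with a short Python sketch. The interval values below are hypothetical: the slide's figure is not available, so they were chosen only to reproduce the outcomes listed above (B, then C and D, then C, D, and E). The function names are also assumptions, and F is an extra event added to show an exclusion.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start time, end time)

# Hypothetical intervals, picked to match the outcomes discussed on the slide
A: Interval = (2, 5)
events: Dict[str, Interval] = {
    "B": (1, 6), "C": (6, 9), "D": (7, 9), "E": (8, 12), "F": (13, 14),
}


def next_by_end(a: Interval, evs: Dict[str, Interval]) -> List[str]:
    """Successor = event(s) with the earliest end time after the end of `a`."""
    later = {n: iv for n, iv in evs.items() if iv[1] > a[1]}
    if not later:
        return []
    first_end = min(iv[1] for iv in later.values())
    return sorted(n for n, iv in later.items() if iv[1] == first_end)


def next_after(a: Interval, evs: Dict[str, Interval]) -> List[str]:
    """Same as next_by_end, but ignoring events that started before `a` ended."""
    started_later = {n: iv for n, iv in evs.items() if iv[0] > a[1]}
    return next_by_end(a, started_later)


def next_no_gap(a: Interval, evs: Dict[str, Interval]) -> List[str]:
    """Successors = events after `a` with no other event strictly in between."""
    after = {n: iv for n, iv in evs.items() if iv[0] > a[1]}
    return sorted(
        n for n, iv in after.items()
        if not any(other[1] < iv[0] for other in after.values())
    )


by_end = next_by_end(A, events)   # only B: it ends first, although it started before A
after = next_after(A, events)     # C and D tie on end time
no_gap = next_no_gap(A, events)   # C, D, and E; F is excluded because C lies in between
```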
Interval Time Model
- “What is “Next” in event processing?” by White et al.
  - Proposes a number of desired properties to be satisfied by the “Next” function
  - There is one model that satisfies them all: Complete History
- It is not sufficient to encode timestamps using a pair of values
  - Timestamps of composite events must embed the timestamps of all the events that led to their occurrence
  - Possibly, timestamps of unbounded size
    - In case of unbounded Seq
Data Model
- Studies how the different systems
  - Represent single data items
  - Organize them into data flows
[Figure: data model taxonomy]
- Data items
  - Nature of items: Generic Data, Event Notifications
  - Format: Tuples, Objects, …
  - Support for Uncertainty
- Data flows: Homogeneous, Heterogeneous
Nature of Items
- The meaning we associate to information items
  - Generic data
  - Event notifications
- Deeply influences several other aspects of an IFP system
  - Time model!
  - Rule language
  - Semantics of processing
- Heritage of the heterogeneous backgrounds of different communities
Nature of Items
- CQL/Stream (Generic Data):
    Select IStream(*)
    From   F1[Rows 5], F2[Range 1 Minute]
    Where  F1.A = F2.A
- TESLA/T-Rex (Event Notifications):
    Define  Fire(area: string, measuredTemp: double)
    From    Smoke(area=$a) and last Temp(area=$a and value>45) within 5 min. from Smoke
    Where   area=Smoke.area and measuredTemp=Temp.value
Format of Items
- How information is represented
- Influences the way items are processed
  - E.g., the relational model requires tuples
Support for Uncertainty
- Ability to associate a degree of uncertainty to information items
  - To the content of items
    - Imprecise temperature reading
  - To the presence of an item (occurrence of an event)
    - Spurious RFID reading
- When present, probabilistic information is usually exploited in rules during processing
Data Flows
- Homogeneous
  - Each flow contains data with the same format and “kind”
    - E.g., tuples with identical structure
  - Often associated with “database-like” rule languages
- Heterogeneous
  - Information flows are seen as channels connecting sources, processors, and sinks
  - Each channel may transport items of different kind and format
Rule Model
- Rules are much more complex entities than data items
- Large number of different approaches
  - Already observed in the previous slides
- Looking back to our functional model, we classify them into two macro classes
  - Transforming rules
  - Detecting rules
[Figure: rule model taxonomy: Type of Rules (Transforming Rules, Detecting Rules) and Support for Uncertainty]
Transforming Rules
- Do not present an explicit distinction between detection and production
- Define an execution plan combining primitive operators
  - Each operator transforms one or more input flows into one or more output flows
- The execution plan can be defined
  - explicitly (e.g., through graphical notation)
  - implicitly (using a high-level language)
- Often used with homogeneous information flows
  - To take advantage of the predefined structure of input and output
Detecting Rules
- Present an explicit distinction between detection and production
- Usually, the detection is based on a logical predicate that captures patterns of interest in the history of received items
Support for Uncertainty
- Two orthogonal aspects
  - Support for uncertain input
    - Allows rules to deal with/reason about uncertain input data
  - Support for uncertain output
    - Allows rules to associate a degree of uncertainty to the output produced
Language Model
- Specify operations to
  - Filter
  - Join
  - Aggregate input flows
  - …
- … to produce one or more output flows
- Following the rule model, we define two classes of languages:
  - Transforming languages
    - Declarative languages
    - Imperative languages
  - Detecting languages
    - Pattern-based
Language Model: Declarative languages
- Specify the expected result rather than the desired execution flow
- Usually derive from relational languages
  - Relational algebra / SQL
- CQL/Stream:
    Select IStream(*)
    From   F1[Rows 5], F2[Rows 10]
    Where  F1.A = F2.A
Language Model: Imperative languages
- Specify the desired execution flow
  - Starting from primitive operators
    - Can be user-defined
- Usually adopt a graphical notation
Imperative Languages
[Figure: Aurora (Boxes & Arrows Model)]
Hybrid Languages
[Figure: Oracle CEP]
Language Model: Pattern-based languages
- Specify a firing condition as a pattern
  - Select a portion of incoming flows through
    - Logic operators
    - Content / timing constraints
- The action uses selected items to produce new knowledge
Detecting Languages
- TESLA / T-Rex:
    Define  Fire(area: string, measuredTemp: double)
    From    Smoke(area=$a) and last Temp(area=$a and value>45) within 5 min. from Smoke
    Where   area=Smoke.area and measuredTemp=Temp.value
  - Action: the Define and Where clauses
  - Condition (pattern): the From clause
Language Model
- Different syntaxes / constructs / operators
- Comparison of language semantics is still an open issue
- Our approach:
  - Review all operators encountered in the analysis of systems
    - Specifying the classes of languages adopting them
  - Trying to capture some semantic relationships
    - Among operators
Language Model: Single-Item operators
- Selection operators
  - Filter items according to their content
- Elaboration operators
  - Projection: extracts a part of the content of an item
  - Renaming: changes the name of a field in languages based on records or tuples
- Present in all languages
  - Defined as primitive operators in imperative languages
  - Declarative languages inherit selection, projection, and renaming from relational algebra
- Example (Select clause: projection and renaming; Where clause: selection):
    Select RStream(I.Price as HighPrice)
    From   Items[Rows 1] as I
    Where  I.Price > 100
- Pattern-based languages
  - Selection inside the condition part (pattern)
  - Elaboration as part of the action
- Example (Define and Where clauses: projection and renaming; From clause: selection):
    Define  ExpensiveItem(highPrice: double)
    From    Item(price>100)
    Where   highPrice = price
Language Model: Logic Operators
- Conjunction, Disjunction, Repetition, Negation
- Explicitly present in pattern-based languages
- PADRES (& is conjunction, || is disjunction):
    (A & B) || (C & D)
- Some logic operators are blocking
  - They express patterns whose validity cannot be decided in a bounded amount of time
  - E.g., Negation
  - Used in conjunction with windows
- Example (negation bounded by a window):
    Define  Fire()
    From    Smoke(area=$a) and not Rain(area=$a) within 10 min from Smoke
- Traditionally, logic operators were not explicitly offered by declarative and imperative languages
  - However, they could be expressed as transformations of input flows
- Conjunction of A and B:
    Select IStream(F1.A, F2.B)
    From   F1 [Rows 10], F2 [Rows 20]
Language Model: Sequences
- Similar to logic operators
- Based on timing relations among items
- Present in almost all pattern-based languages
- Example (time-bounded sequence):
    Define  Fire()
    From    Smoke(area=$a) and last Temp(area=$a and value>45) within 5 min. from Smoke
- Traditionally, transforming languages did not provide sequences explicitly
  - Could be expressed with an explicit reference to timestamps
    - If present inside items
- Example (the Where clause imposes timestamp order):
    Select IStream(F1.A, F2.B)
    From   F1 [Rows 10], F2 [Rows 20]
    Where  F1.timestamp < F2.timestamp
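How a join with a timestamp constraint emulates a sequence can be sketched in Python. The function `sequence_join` and the `(timestamp, payload)` row layout are assumptions made for this illustration; the sketch computes the content of the join at one instant over count-based windows and does not model the IStream operator.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

Row = Tuple[int, str]  # hypothetical row layout: (timestamp, payload)


def sequence_join(f1: Iterable[Row], f2: Iterable[Row],
                  n1: int = 10, n2: int = 20) -> List[Tuple[str, str]]:
    """Pair rows of F1 and F2 such that the F1 row precedes the F2 row in time."""
    w1: Deque[Row] = deque(f1, maxlen=n1)  # [Rows n1]: keeps only the last n1 rows
    w2: Deque[Row] = deque(f2, maxlen=n2)  # [Rows n2]: keeps only the last n2 rows
    return [(a, b) for ts1, a in w1 for ts2, b in w2 if ts1 < ts2]


f1 = [(1, "a1"), (5, "a2")]
f2 = [(3, "b1"), (4, "b2")]
pairs = sequence_join(f1, f2)  # a1 precedes b1 and b2; a2 precedes nothing
```

Dropping the timestamp comparison turns the same join into the plain conjunction of the two flows.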
Language Model: Iterations
- Express possibly unbounded sequences of items…
- …satisfying an iterating condition
- Implicitly defines an ordering among items
- SASE+ (Shipment+ is an iteration, Kleene +):
    PATTERN SEQ(Alert a, Shipment+ b[ ])
    WHERE skip_till_any_match(a, b[ ]) {
        a.type = 'contaminated' and
        b[1].from = a.site and
        b[i].from = b[i-1].to }
    WITHIN 3 hours
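The iterating condition of the pattern above can be approximated in Python by enumerating every chain of shipments that starts at the contaminated site. This is a rough sketch of the idea, not the SASE+ evaluation algorithm: the function `chains`, the dictionary fields (`type`, `site`, `from`, `to`, `t` in hours), and the way the time bound is applied are assumptions.

```python
from typing import Dict, List

Event = Dict[str, object]  # hypothetical fields: type, site, from, to, t (hours)


def chains(alert: Event, shipments: List[Event], within: float = 3.0) -> List[List[Event]]:
    """Enumerate the shipment chains b[1..n] that follow a contaminated-site alert.

    b[1] must leave from the alert site, each b[i] must leave from where
    b[i-1] arrived, and every shipment must fall within `within` hours of the alert.
    """
    if alert["type"] != "contaminated":
        return []
    in_window = [s for s in shipments if 0 <= s["t"] - alert["t"] <= within]
    results: List[List[Event]] = []

    def extend(chain: List[Event], start: int, site: object) -> None:
        # Try every later shipment leaving from `site`, skipping irrelevant ones
        for i in range(start, len(in_window)):
            s = in_window[i]
            if s["from"] == site:
                longer = chain + [s]
                results.append(longer)
                extend(longer, i + 1, s["to"])

    extend([], 0, alert["site"])
    return results


alert = {"type": "contaminated", "site": "X", "t": 0.0}
shipments = [
    {"from": "X", "to": "Y", "t": 1.0},
    {"from": "Y", "to": "Z", "t": 2.0},
    {"from": "Q", "to": "R", "t": 2.5},
    {"from": "Z", "to": "W", "t": 5.0},  # outside the 3-hour bound
]
found = chains(alert, shipments)
routes = [[s["to"] for s in chain] for chain in found]
```

Because the number of shipments in a chain is not fixed in advance, the length of the triggering sequence is bounded only by the time window.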

Processing Flows of Information DEBS 2011

  • 1. Processing Flows of Information: From Data Streams to Complex Event Processing
      Gianpaolo Cugola and Alessandro Margara
      Politecnico di Milano, Dip. Elettronica e Informazione
      [cugola,margara]@elet.polimi.it
      “Processing Flows of Information: from data streams to complex event processing”, ACM Computing Surveys, to appear (but available from: http://home.dei.polimi.it/cugola)
  • 2. Background and motivation
      Back in 2007 CEP was already a hot topic…
      …but having a good grasp of the area was rather hard
      As observed by Opher Etzion, the area was looking like the “Tower of Babel” (Event Processing and the Babylon Tower, Event process thinking blog, Sept. 8, 2007)
      [Figure: word cloud of overlapping terms: event, data, message, primitive, complex, composite, stream, flow, sequence, pattern, join, query, rule, condition, action, publish, subscribe, notify, advertise, send, receive, middleware, system, application, protocol, network, routing]
  • 3. Background and motivation
      Several communities were contributing to the area…
      …each bringing its own expertise and vocabulary…
      …but often working in isolation
      [Figure: the same word cloud surrounded by the contributing communities. Labels: Researchers, DEBS, data management & databases, process modeling & automation, middleware systems, ad-hoc tools (intrusion det., …), application servers, DBMSs, Tool vendors]
  • 4. Background and motivation
      That was 2007. What about today? Things did not change much
      From the “Event Process Thinking” blog, Sept. 2010: [Which is] the relation between event processing and data stream management?
        They are aliases: stream is just a collection of events, likewise, an event is just a member in a stream, and the functionality is the same
        Stream management is a subset of event processing: there are different ways to do event processing, streams is one of them
        Event processing is a subset of stream management: event streams is just one type of stream, but there are voice stream, video stream, …
        Event processing and stream management are distinct and there is no overlapping between them
      At the same time tool vendors are building tools that try to combine ¿juxtapose? different approaches
  • 5. Our goal
      We would like to compare different systems in a precise way
      We would like to compare different approaches in a precise way
      We would like to help people coming from different areas communicate and compare their work with others
      We would like to isolate the open issues from those already solved
      We would like to precisely identify the challenges
      We would like to isolate the best part of the various approaches…
      …finding a way to combine them
      We need a modeling framework that could accommodate all the proposals
  • 6. Forget your own vocabulary (temporarily)
      CEP, DSMSs, Active Databases…
      All these terms hide a peculiar view of the domain we have in mind
      With subtle, “unsaid” (and often unclear) differences
      Before learning something new we have to forget what we already know
  • 8. IFP: One name, several incarnations
      A lot of applications
        Environmental monitoring through sensor networks
        Financial applications
        Fraud detection tools
        Network intrusion detection systems
        RFID-based inventory management
        Manufacturing control systems
        …
      Several technologies
        Active databases
        DSMSs
        CEP systems
  • 9. One model, several models
      Different models to capture different viewpoints
        Functional model
        Processing model
        Deployment model
        Interaction model
        Time model
        Data model
        Rule model
        Language model
  • 10. Functional model
      [Figure: functional model diagram. Components: Receiver, Decider, Producer, Forwarder, Clock, Rules, History, Knowledge base; the Decider passes the action A and the sequence Seq to the Producer]
  • 12. Acts as a demultiplexer
  • 14. Acts as a multiplexer
  • 15. A short digression
      We assume rules can be (logically) decomposed in two parts: C -> A
        C is the condition
        A is the action
      Example (in CQL):
        Select IStream(Count(*))
        From F1 [Range 1 Minute]
        Where F1.A > 0
      This way we can split processing in two phases:
        The detection phase determines the items that trigger the rule
        The production phase uses those items to produce the output of the rule
      [Figure: functional model diagram, with the rules split into their condition and action parts]
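The C -> A decomposition can be sketched in Python as a pair of functions, one per phase. This is a loose illustration of the idea, not a faithful CQL evaluator: the `Rule` class, the dictionary-based items with `ts` and `A` fields, and the 60-second window in seconds are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Rule:
    # C: selects the triggering items (Seq) from the stored items
    condition: Callable[[List[dict], float], List[dict]]
    # A: turns the triggering items into an output item
    action: Callable[[List[dict]], dict]

def recent_positive(history, now):
    """C: items from the last minute whose attribute A is positive."""
    return [it for it in history if now - it["ts"] <= 60 and it["A"] > 0]

def count_items(seq):
    """A: produce one output item holding the count of triggering items."""
    return {"count": len(seq)}

def process(rule, history, now):
    seq = rule.condition(history, now)  # detection phase
    return rule.action(seq)             # production phase

rule = Rule(condition=recent_positive, action=count_items)
history = [
    {"ts": 0, "A": 1},     # too old at time 100
    {"ts": 50, "A": -1},   # in the window, but A is not positive
    {"ts": 70, "A": 2},
    {"ts": 100, "A": 3},
]
result = process(rule, history, now=100)
```

Keeping the two functions separate mirrors the split between the Decider, which only needs the condition, and the Producer, which only needs the action and the selected items.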
  • 16. Functional model
  • 18. Accumulates partial results into the History
  • 19. When a rule fires, passes its action part and the triggering items to the Producer
  • 21. Uses the items in Seq as stated in action A
  • 26. Periodically creates special information items holding the current time
  • 27. Its presence models the ability to perform periodic processing of inputs
  • 28. Functional model: Considerations
      The detection-production cycle
        Fired by a new item I entering the engine through the Receiver
          Including those periodically produced by the Clock, if present
        Detection phase: evaluates all the rules to find those enabled
          Using item I plus the data in the Knowledge base, if present
          The item I can be accumulated into the History for partially enabled rules
          The action part of the enabled rules, together with the triggering items (A+Seq), is passed to the Producer
        Production phase: produces the output items
          Combining the items that triggered the rule with data in the Knowledge base, if present
          New items are sent to subscribed sinks (through the Forwarder)…
          …but they could also be sent internally to be processed again (recursive processing)
          In some systems the action part of fired rules may also change the set of deployed rules
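The detection-production cycle described on this slide can be sketched as a small Python loop. It is a minimal model under stated assumptions, not the architecture of any specific system: the `Engine` class, the `(condition, action)` rule pairs, the dictionary items, and the example rules are hypothetical, and the Knowledge base and Clock are left out for brevity.

```python
from collections import deque

class Engine:
    def __init__(self, rules, recursive=False):
        self.rules = rules          # list of (condition, action) pairs
        self.history = []           # items kept for partially enabled rules
        self.sinks = []             # stands in for the Forwarder's output
        self.recursive = recursive  # feed produced items back into the engine?

    def receive(self, item):
        queue = deque([item])
        while queue:
            current = queue.popleft()
            self.history.append(current)
            # Detection phase: find enabled rules and their triggering items.
            fired = []
            for condition, action in self.rules:
                seq = condition(current, self.history)
                if seq:
                    fired.append((action, seq))
            # Production phase: build output items from A + Seq.
            for action, seq in fired:
                out = action(seq)
                self.sinks.append(out)
                if self.recursive:
                    queue.append(out)  # recursive processing

def smoke_and_heat(current, history):
    if current["type"] != "Smoke":
        return []
    temps = [h for h in history if h["type"] == "Temp" and h["value"] > 50]
    return [current, temps[-1]] if temps else []

def is_fire(current, history):
    return [current] if current["type"] == "Fire" else []

rules = [(smoke_and_heat, lambda seq: {"type": "Fire"}),
         (is_fire, lambda seq: {"type": "Alarm"})]

engine = Engine(rules, recursive=True)
engine.receive({"type": "Temp", "value": 60})
engine.receive({"type": "Smoke"})
outputs = [o["type"] for o in engine.sinks]

flat = Engine(rules, recursive=False)
flat.receive({"type": "Temp", "value": 60})
flat.receive({"type": "Smoke"})
flat_outputs = [o["type"] for o in flat.sinks]
```

With recursive processing enabled, the produced Fire item re-enters the engine and triggers the second rule; without it, only Fire reaches the sinks.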
  • 29. Functional model: Considerations Maximum length of Seq a key aspect 1  PubSub Bounded  CQL like languages without time based windows Pattern based languages without a Kleene+ operator Other key aspects that impact expressiveness Presence of the Clock Models the ability to process rulesperiodically Available in almost half of the systemsreviewed Most Active DBs and DSMSs butfew CEP systems Knowledgebase A A A History History Seq History Decider Forwarder Receiver Producer ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- Clock Rules 22 DEBS 2011 - Tutorial
  • 30. Functional model: Considerations. Presence of the Knowledge base: only available in systems coming from the database community. Presence of the looping flow exiting the Producer: models the ability to perform recursive processing; half of the CEP systems have it; all Active DBs but very few DSMSs have it. They have nested rules. Support for dynamic rule change: few systems support it; it can be implemented externally, through sinks acting also as rule managers, but we think it is nice to have it internally. [functional model diagram: Receiver, Decider, Producer, Forwarder, History, Knowledge base, Clock, Rules]
  • 31. The semantics of processing. What determines the output of each detection-production cycle? The new item entering the engine, the set of deployed rules, the items stored in the History, and the content of the Knowledge Base. Is this enough? Example (in Padres and CQL): Smoke && Temp>50; Select IStream(Smoke.area) From Smoke[Rows 30 Slide 10], Temp[Rows 50 Slide 5] Where Smoke.area = Temp.area AND Temp.value > 50 [functional model diagram: Decider, Producer, History, Rules, Knowledge base]
  • 32. Processing model Three policies affect the behavior of the system The selection policy The consumption policy The load shedding policy 25 DEBS 2011 - Tutorial
  • 33. Selection policy. Determines if a rule fires once or multiple times and which items are actually selected from the History. Example: [diagram: with A0 and A1 stored and a new B reaching the Decider, rule R fires multiple times (A0 B and A1 B) or a single time (A0 B or A1 B)]
  • 34. Selection policy: Considerations. Most systems adopt a multiple selection policy. It is simpler to implement. Is it adequate? Example: alert fire when smoke and high temperature are detected in a short time frame. If 10 sensors read high temperature and immediately afterward one detects smoke, I would like to receive a single alert, not 10. A few systems allow this policy to be programmed, some of them on a per-rule basis. E.g., Amit, T-Rex
  • 35. Selection policy: The TESLA case. TESLA (Trio-based Event Specification Language): the T-Rex language. A rule language for CEP. Tries to combine expressiveness and efficiency. Has a formally defined semantics, expressed in Trio, a Metric Temporal Logic (see DEBS 2010). Allows rule managers to choose their own selection policy on a per-rule basis. Example, multiple selection: define Fire(area: string, measuredTemp: double) from Smoke(area=$a) and each Temp(area=$a and val>50) within 1min. from Smoke where area=Smoke.area and measuredTemp=Temp.val. Example, single selection: define Fire(area: string, measuredTemp: double) from Smoke(area=$a) and last Temp(area=$a and val>50) within 1min. from Smoke where area=Smoke.area and measuredTemp=Temp.val
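To make the difference between the each and last selection policies concrete, here is a minimal Python sketch. It is not the T-Rex implementation: the function `select_temps` and the `(timestamp, value)` tuple layout are assumptions made for illustration, and the window is taken to cover the 60 seconds preceding the Smoke item.

```python
def select_temps(temps, smoke_time, policy, window=60.0):
    """Return the Temp readings that a Smoke item at smoke_time is paired with.

    temps: iterable of (timestamp_seconds, value) tuples (hypothetical layout).
    policy: "each" (multiple selection), "last" or "first" (single selection).
    """
    # Candidates: readings above 50 that fall in the window before the Smoke.
    candidates = sorted(
        (t for t in temps if 0 <= smoke_time - t[0] <= window and t[1] > 50),
        key=lambda t: t[0],
    )
    if policy == "each":
        return candidates          # one Fire event per candidate
    if policy == "last":
        return candidates[-1:]     # at most one Fire event, most recent Temp
    if policy == "first":
        return candidates[:1]      # at most one Fire event, oldest Temp
    raise ValueError(f"unknown policy: {policy}")
```

With ten high-temperature readings followed by one Smoke item, `"each"` yields ten pairings (ten alerts) while `"last"` yields one, which is the behavior the fire-alert example on the previous slide asks for.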
  • 38. n-first…within, n-last…within
  • 39. Consumption policy. Determines how the History changes after the firing of a rule, and therefore what happens when new items enter the Decider. Example: [diagram: rule R fired by A B, then a new B enters the Decider; the outcome depends on whether the consumption policy is zero or selected]
  • 40. Consumption policy: Considerations. Most systems couple a multiple selection policy with a zero consumption policy. This is the common case with DSMSs, which use (sliding) windows to select relevant events. Example (in CQL): Select IStream(Smoke.area) From Smoke[Range 1 min], Temp[Range 1 min] Where Smoke.area = Temp.area AND Temp.val > 50. The systems that allow the selection policy to be programmed often allow the consumption policy to be programmed, too. E.g., Amit, T-Rex
  • 41. Consumption policy: The TESLA case. Zero consumption policy: define Fire(area: string, measuredTemp: double) from Smoke(area=$a) and each Temp(area=$a and val>50) within 1min. from Smoke where area=Smoke.area and measuredTemp=Temp.val. Selected consumption policy: define Fire(area: string, measuredTemp: double) from Smoke(area=$a) and each Temp(area=$a and val>50) within 1min. from Smoke where area=Smoke.area and measuredTemp=Temp.val consuming Temp. [diagram: Fire! notifications produced under each policy for an input of four Temp (T) items followed by two Smoke (S) items]
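The effect of the consuming clause can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not T-Rex code: `count_fires` and the `(timestamp, kind)` stream layout are hypothetical, every Temp is taken to satisfy the content filter, and timestamps are assumed unique.

```python
def count_fires(stream, consume_temp, window=60):
    """Count Fire events for a stream of (timestamp, kind) items.

    kind is "T" (a Temp that passes the filter) or "S" (a Smoke).
    Selection is "each": every Temp in the window before a Smoke fires once.
    """
    available = []   # timestamps of Temp items still in the History
    fires = 0
    for time, kind in stream:
        if kind == "T":
            available.append(time)
        elif kind == "S":
            matched = [t for t in available if 0 <= time - t <= window]
            fires += len(matched)
            if consume_temp:
                # Selected consumption: matched Temp items leave the History,
                # so they cannot contribute to later firings.
                available = [t for t in available if t not in matched]
    return fires
```

For the input T T T T S S used on the slide, zero consumption lets both Smoke items pair with all four Temp items, while consuming Temp leaves nothing for the second Smoke.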
  • 42. Load shedding policy. Problem: how to manage bursts of input data. It may seem a system issue, i.e., an issue to solve in the Receiver. But it strongly impacts the results produced, i.e., the “semantics” of the rules. Accordingly, some systems allow this issue to be determined on a per-rule basis; e.g., Aurora allows rules to specify the expected QoS and sheds input to stay within limits with the available resources. Conceptually the issue is addressed in the Decider. [functional model diagram: Receiver, Decider, Producer, Forwarder, History, Knowledge base, Clock, Rules]
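A very simple form of load shedding is random dropping when a burst exceeds the capacity of the engine. The sketch below shows only this generic idea; it is not Aurora's QoS-driven algorithm, and `shed` and its parameters are hypothetical names.

```python
import random


def shed(burst, capacity, rng):
    """Keep at most `capacity` items of a burst, dropping the rest at random.

    The relative order of the surviving items is preserved.
    """
    if len(burst) <= capacity:
        return list(burst)
    keep = sorted(rng.sample(range(len(burst)), capacity))
    return [burst[i] for i in keep]
```

Because the dropped items never reach the rules, a rule such as "Temp > 50" may or may not fire depending on which readings survive. This is why the slide argues that shedding affects the semantics of the rules and is not only a system issue.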
  • 43. Deployment model IFP applications may include a large number of sources and sinks Possibly dispersed over a wide geographical area It becomes important to consider the deployment architecture of the engine How the components of the functional model can be distributed to achieve scalability 34 DEBS 2011 - Tutorial
  • 44. Deployment model [diagram: sources, IFP engine, sinks, and rule managers connected by information flows and rules; the engine may be centralized or distributed, and a distributed engine may be clustered or networked]
  • 45. Deployment Model. Most existing systems adopt a centralized solution. When distributed processing is allowed, it is usually based on clustered solutions. A few systems have recognized the importance of networked deployment for some applications. E.g., Microsoft StreamInsight (part of SQL Server): filtering near sources, aggregation and correlation in-network, analytics and historical data in a centralized server/cluster. In most cases, deployment/configuration is not automatic.
  • 46. Deployment model. Automatic distribution of processing introduces the operator placement problem. Given a set of rules (composed of operators) and a set of nodes: how to split the processing load, how to assign operators to available nodes. In other words (Event Processing in Action): given an event processing network, how to map it onto the physical network of nodes.
  • 47. Operator placement. The operator placement problem is still open. Several proposals, often adopting techniques coming from Operations Research. Difficult to compare solutions and results. Even in its simplest form the problem is NP-hard. Good survey paper by Lakshmanan, Li, and Strom: “Placement Strategies for Internet-Scale Data Stream Systems”
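Since the problem is NP-hard, practical approaches rely on heuristics. As one hedged illustration (a generic greedy load-balancing heuristic, not a technique taken from the survey cited above), the sketch below assigns the heaviest operators first, each to the currently least loaded node. The function `place` and the per-operator load figures are assumptions, and network cost is ignored entirely.

```python
import heapq


def place(operators, nodes):
    """Greedy placement: heaviest operator first, onto the least loaded node.

    operators: dict mapping operator name -> estimated processing load.
    nodes: list of node names.
    Returns a dict mapping operator name -> node name.
    """
    heap = [(0.0, node) for node in nodes]   # (current load, node)
    heapq.heapify(heap)
    placement = {}
    for op, load in sorted(operators.items(), key=lambda kv: -kv[1]):
        total, node = heapq.heappop(heap)
        placement[op] = node
        heapq.heappush(heap, (total + load, node))
    return placement
```

A placement that also minimizes delay or bandwidth would need a model of the network between nodes, which this sketch does not have.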
  • 48. Operator placement Goals Operator Placement 39 DEBS 2011 - Tutorial
  • 49. Operator placement. Different goals have been considered for operator placement, depending on the deployment architecture and on the application requirements. Examples: minimize delay / network usage (bandwidth): Pietzuch et al., “Network-Aware Operator Placement for Stream-Processing Systems”. Reduce hardware resource usage / obtain load balancing: relevant in clustered scenarios; several papers for IBM System S.
  • 50. Operator placement [diagram of design dimensions: goal, heterogeneous resources, rule rewriting, user-defined QoS, centralized vs distributed, re-use vs replication, on-line vs off-line]
  • 51. More on deployment model Operator placement is only part of the problem Other issues How to build the network of nodes? How to maintain it? How to gather the information required to solve the operator placement problem? How to actually “place” the operators? How to “replace” them when the situation changes? New rules added, old rules removed… …new sources/sinks 42 DEBS 2011 - Tutorial
  • 52. Deployment model and dynamics. How to cope with mobile nodes? Mobile sinks and sources, but also mobile “processors”. The issue is relevant: we live in a mobile world. Very few proposals. A lot of work in the area of pure publish/subscribe: several works published in DEBS, not to mention other major conferences/journals. May we reuse some of this work?
  • 53. Interaction Model. It is interesting to study the characteristics of the interactions among the main components of an IFP system (sources, IFP engine, sinks). Who starts the communication?
  • 54.
  • 55. Pull
  • 56. Push
  • 57. Pull
  • 58. Push
  • 59. Pull
  • 60. Time Model Relationship between information items and passing of time Ability of an IFP system to associate some kind of happened-before (ordering) relationship to information items We identified 4 classes: Stream-only Causal Absolute Interval 46 DEBS 2011 - Tutorial
  • 61. Stream-Only Time Model Used in original DSMSs Timestamps may be present or not When present, they are used only to group items when they enter the processing engine They are not exposed to the language With the exception of windowing constructs Ordering in output streams is conceptually separate from the ordering in input streams CQL/Stream Select DStream(*) From F1[Rows 5], F2[Range 1 Minute] Where F1.A = F2.A 47 DEBS 2011 - Tutorial
  • 62. Causal Time Model. Each item has a label reflecting some kind of causal relationship. Partial order. E.g., Rapide: an event is causally ordered after all events that led to its occurrence. Gigascope: Select count(*) From A, B Where A.a-1 <= B.b and A.a+1 > B.b (A.a, B.b monotonically increase)
  • 63. Absolute Time Model. Information items have an associated timestamp, defining a single point in time w.r.t. a (logically) unique clock. Total order. Timestamps are fully exposed to the language. Timestamps can be given by the source, or by the processing engine. TESLA/T-Rex: Define Fire(area: string, measuredTemp: double) From Smoke(area=$a) and last Temp(area=$a and value>45) within 5 min. from Smoke Where area=Smoke.area and measuredTemp=Temp.value
  • 64. Interval Time Model. Used for events to include “duration”: SnoopIB, Cayuga, NextCEP, … At first sight, it is a simple extension of the absolute time model: timestamps with two values, start time and end time. However, it opens many issues: What is the successor of an event? What is the timestamp associated with a composite event?
  • 65. Interval Time Model. Which is the immediate successor of A? Choose according to end time only: B. But it started before A! Exclude B: C, D. Both of them? Which of them? No other event strictly between A and its successor: C, D, E. Seems a natural definition. Unfortunately we lose associativity: X(YZ) ≠ (XY)Z. This may impede some rule rewriting for processing optimizations.
  • 66. Interval Time Model. “What is “Next” in event processing?” by White et al. proposes a number of desired properties to be satisfied by the “Next” function. There is one model that satisfies them all: Complete History. It is not sufficient to encode timestamps using a pair of values. Timestamps of composite events must embed the timestamps of all the events that led to their occurrence. Possibly, timestamps of unbounded size, in case of unbounded Seq.
  • 79. Nature of Items. CQL/Stream (generic data): Select IStream(*) From F1[Rows 5], F2[Range 1 Minute] Where F1.A = F2.A. TESLA/T-Rex (event notifications): Define Fire(area: string, measuredTemp: double) From Smoke(area=$a) and last Temp(area=$a and value>45) within 5 min. from Smoke Where area=Smoke.area and measuredTemp=Temp.value
  • 99. Detecting Rules; Support for Uncertainty
  • 106. Language Model. Specify operations to filter, join, and aggregate input flows to produce one or more output flows. Following the rule model, we define two classes of languages: transforming languages (declarative languages, imperative languages) and detecting languages (pattern-based).
  • 107. Language Model: declarative languages (transforming). Specify the expected result rather than the desired execution flow. Usually derive from relational languages (relational algebra / SQL). CQL/Stream: Select IStream(*) From F1[Rows 5], F2[Rows 10] Where F1.A = F2.A
  • 108. Language Model: imperative languages (transforming). Specify the desired execution flow, starting from primitive operators, which can be user-defined. Usually adopt a graphical notation.
  • 109. Imperative Languages Aurora (Boxes & Arrows Model) 66 DEBS 2011 - Tutorial
  • 110. Hybrid Languages Oracle CEP 67 DEBS 2011 - Tutorial
  • 111. Language Model: detecting languages (pattern-based). Specify a firing condition as a pattern. Select a portion of incoming flows through logic operators and content / timing constraints. The action uses selected items to produce new knowledge.
  • 112. Detecting Languages. TESLA / T-Rex: Define Fire(area: string, measuredTemp: double) From Smoke(area=$a) and last Temp(area=$a and value>45) within 5 min. from Smoke Where area=Smoke.area and measuredTemp=Temp.value (slide labels: the Define and Where clauses form the action; the From clause is the condition, i.e., the pattern)
  • 113. Language Model. Different syntaxes / constructs / operators. Comparison of language semantics is still an open issue. Our approach: review all operators encountered in the analysis of systems, specifying the classes of languages adopting them, and trying to capture some semantic relationships among operators.
  • 114. Language Model: single-item operators. Selection operators: filter items according to their content. Elaboration operators: projection (extracts a part of the content of an item) and renaming (changes the name of a field in languages based on records or tuples). Present in all languages. Defined as primitive operators in imperative languages. Declarative languages inherit selection, projection, and renaming from relational algebra. Example: Select RStream (I.Price as HighPrice) From Items[Rows 1] as I Where I.Price > 100 (slide labels: "I.Price" is the projection, "as HighPrice" the renaming, the Where clause the selection)
  • 115. Language Model: single-item operators in pattern-based languages. Selection is performed inside the condition part (pattern); elaboration is part of the action. Example: Define ExpensiveItem(highPrice: double) From Item(price>100) Where highPrice = price (slide labels: "Item(price>100)" is the selection; the Define and Where clauses perform projection and renaming)
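The three single-item operators in the two examples above can be mirrored in plain Python. This is only a sketch of their meaning: items are modeled as dictionaries, the function name `high_prices` is hypothetical, and the `[Rows 1]` window is approximated by evaluating each item on its own as it arrives.

```python
def high_prices(items):
    """Apply selection, projection, and renaming to a flow of items."""
    for item in items:
        if item["Price"] > 100:                   # selection on content
            # projection keeps only Price; renaming exposes it as HighPrice
            yield {"HighPrice": item["Price"]}
```

For instance, `list(high_prices([{"Name": "a", "Price": 50}, {"Name": "b", "Price": 150}]))` keeps only the second item and drops its `Name` field.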
  • 116. Language Model Logic Operators Conjunction Disjunction Repetition Negation Explicitly present in pattern-based languages PADRES (A & B) || (C & D) Conjunction Disjunction 73 DEBS 2011 - Tutorial
  • 117. Language Model: logic operators (conjunction, disjunction, repetition, negation). Some logic operators are blocking: they express patterns whose validity cannot be decided in a bounded amount of time, e.g., negation. Used in conjunction with windows: Define Fire() From Smoke(area=$a) and not Rain(area=$a) within 10 min from Smoke (slide labels: "not Rain" is the negation, "within 10 min from Smoke" the window)
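The window is what makes the negation decidable: instead of waiting forever for a Rain item that might still arrive, the rule only checks a bounded interval. The sketch below assumes that "within 10 min from Smoke" refers to the 10 minutes preceding the Smoke item; the function `fire_alerts` and the `(timestamp, type, area)` event layout are hypothetical.

```python
def fire_alerts(events, window=600):
    """Return (timestamp, area) for each Smoke with no Rain in the window.

    events: list of (timestamp_seconds, type, area), sorted by timestamp.
    window: length in seconds of the interval checked before each Smoke.
    """
    alerts = []
    rains = []   # (timestamp, area) of the Rain items seen so far
    for ts, kind, area in events:
        if kind == "Rain":
            rains.append((ts, area))
        elif kind == "Smoke":
            rained = any(a == area and 0 <= ts - r <= window for r, a in rains)
            if not rained:
                alerts.append((ts, area))
    return alerts
```

Because the window looks backward from the Smoke item, the rule can be evaluated as soon as the Smoke arrives.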
  • 118. Language Model Logic Operators Conjunction Disjunction Repetition Negation Traditionally, logic operators were not explicitly offered by declarative and imperative languages However, they could be expressed as transformation of input flows Conjunction of A and B Select IStream (F1.A, F2.B) From F1 [Rows 10], F2 [Rows 20] 75 DEBS 2011 - Tutorial
  • 119. Language Model: sequences. Similar to logic operators. Based on timing relations among items. Present in almost all pattern-based languages. Example of a time-bounded sequence: Define Fire() From Smoke(area=$a) and last Temp(area=$a and value>45) within 5 min. from Smoke
  • 120. Language Model: sequences. Similar to logic operators. Based on timing relations among items. Traditionally, transforming languages did not provide sequences explicitly. They could be expressed with an explicit reference to timestamps, if present inside items: Select IStream (F1.A, F2.B) From F1 [Rows 10], F2 [Rows 20] Where F1.timestamp < F2.timestamp (the Where clause imposes the timestamp order)
  • 121. Language Model: iterations. Express possibly unbounded sequences of items satisfying an iterating condition. Implicitly define an ordering among items. Iteration (Kleene +) in SASE+: PATTERN SEQ(Alert a, Shipment+ b[ ]) WHERE skip_till_any_match(a, b[ ]) { a.type = 'contaminated' and b[1].from = a.site and b[i].from = b[i-1].to } WITHIN 3 hours
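To give a feel for what the Kleene+ pattern above matches, the following Python sketch enumerates chains of shipments that start at a contaminated site, where each shipment departs from the destination of the previous one. It is loosely modeled on the query and is not the SASE+ evaluation algorithm: the function `shipment_chains`, the `(time, src, dst)` layout, and the choice to report every non-empty chain (prefixes included) are assumptions.

```python
def shipment_chains(alert_site, alert_time, shipments, within):
    """Enumerate all chains b[1..n] with b[1].src == alert_site and
    b[i].src == b[i-1].dst, in timestamp order, inside the time window.

    shipments: list of (time, src, dst) tuples sorted by time.
    """
    in_window = [s for s in shipments
                 if alert_time <= s[0] <= alert_time + within]
    results = []

    def extend(chain, start, site):
        # Try every later shipment leaving `site`; irrelevant ones are skipped.
        for i in range(start, len(in_window)):
            _, src, dst = in_window[i]
            if src == site:
                new_chain = chain + [in_window[i]]
                results.append(new_chain)
                extend(new_chain, i + 1, dst)

    extend([], 0, alert_site)
    return results
```

The number of chains can grow exponentially with the number of shipments, which hints at why iteration over unbounded sequences is costly to evaluate.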
  • 122. Language Model. Logic operators, sequences, and iterations were traditionally not offered by transforming languages. And now? Current trend: embed patterns inside declarative languages, especially in commercial systems. Esper: Select A.price From pattern [every (A (B or C))] Where A.price > 100. IMO: difficult to write / understand. Mailing list of Esper!
  • 123. Language Model: windows. Kind: logical (time-based), physical (count-based), user-defined. Movement: fixed, landmark, sliding, pane, tumble. Logical: Select IStream(Count(*)) From F1[Range 1 Minute]. Physical: Select IStream(Count(*)) From F1[Rows 50 Slide 10]
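The two window kinds in the examples above can be sketched in Python as follows. The functions `count_windows` and `time_window` are hypothetical names; the count-based version assumes the window is emitted every `slide` arrivals even while it is still partially filled, which may differ from the behavior of a specific CQL implementation.

```python
from collections import deque


def count_windows(items, rows, slide):
    """Physical (count-based) window: the last `rows` items, every `slide` arrivals."""
    window = deque(maxlen=rows)
    for n, item in enumerate(items, start=1):
        window.append(item)
        if n % slide == 0:
            yield list(window)


def time_window(timed_items, now, range_s):
    """Logical (time-based) window: values with timestamp in (now - range_s, now]."""
    return [value for ts, value in timed_items if now - range_s < ts <= now]
```

A tumbling window is the special case `slide == rows`, where consecutive windows do not overlap.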
  • 124. Language Model User-Defined (Stream-Mill) WINDOW AGGREGATE positive min(Next Real): Real { TABLE inwindow(wnext real); INITIALIZE : { } ITERATE : { DELETE FROM inwindow WHERE wnext < 0 INSERT INTO RETURN SELECT min(wnext) FROM inwindow } EXPIRE : { } } 81 DEBS 2011 - Tutorial
  • 125. Language Model. Windows are used to limit the scope of blocking operators. They are generally available in declarative and imperative languages. They are not present in all pattern-based languages: some of them do not include blocking operators; some of them “embed” windows inside operators, making them unblocking. CEDR: EVENT Test-Rule WHEN UNLESS(A, B, 12 hours) WHERE A.a < B.b
  • 126. Language Model: flow management operators. Required by declarative and imperative languages to merge, split, organize, and process incoming flows of information. [diagram of flow management operators: Join, Order By, Group By, flow creation, Duplicate, and the bag operators Union, Except, Intersect, Remove-duplicates]
  • 127. Language Model: parameterization. Allows the binding of different information items based on their content. Offered implicitly by declarative and imperative languages, through a combination of join and selection. CQL / Stream: Select IStream (F1.A, F2.B) From F1 [Rows 10], F2 [Rows 20] Where F1.A > F2.B. Offered as an explicit operator in pattern-based languages. TESLA / T-Rex: Define Fire() From Smoke(area=$a) and last Temp(area=$a and value>45) within 5 min. from Smoke (the explicit parameter is $a)
  • 130. User-defined, Production, Aggregate
  • 131. Discussion 86 DEBS 2011 - Tutorial
  • 132. Abstraction. IFP languages and systems offer a level of abstraction over flowing information, similar to the role SQL / DBMSs play for static data. The heterogeneity of solutions suggests that finding the “right” abstraction is still an open issue. Several open questions. [diagram: applications on top of an API (including rules) on top of the IFP system]
  • 133. Abstraction Which are the requirements of applications? According to the results of EPTS survey Simple operations (mainly aggregate / join) Few sources (mainly 1-10, and 10-100) Low rates (mainly 10-100, and 100-1000 event/s) Data coming from DBMSs / files 88 DEBS 2011 - Tutorial
  • 134. Abstraction How can we explain these answers? IFP systems used as “enhanced” DBMSs IMO, companies are using the technology they know E.g. Language: no need for patterns or no knowledge about patterns? Features vs Simplicity/Integration This is the “consolidated” part of IFP systems IMO, raising the level of abstraction would make IFP systems more appealing for a larger audience Is it possible to combine advanced features and ease of use? 89 DEBS 2011 - Tutorial
  • 135. Abstraction. Other details are present in the EPTS survey. Most people were referring to technologies that were “already deployed” or “in development”. The systems were chosen based on trust in the vendor. This suggests that the latest (research) advancements in event processing may still be unknown. They lack a solid implementation that can easily work side by side with existing data processing systems.
  • 136. Abstraction How many kinds of languages / abstractions? Traditionally, two Transforming rules Based on the Stream model Detecting rules Based on patterns Often merged together in commercial systems “Merged together” does not mean “organically combined” The underlying model is usually derived from the Stream model Good integration with relational languages Difficult to integrate with patterns Difficult to understand the semantics of rules Can we offer a better model / formalism? 91 DEBS 2011 - Tutorial
  • 137. Abstraction Do we really need “One abstraction to rule ‘em all”? Again, we need to better understand application requirements … … but we also need to better analyze the expressiveness and semantics of languages! In our survey, we offer a description of existing operators Open issues: How do operators contribute to the overall expressiveness of languages? Is it possible to find a “minimal set” of operators to organically combine the capabilities of transforming and detecting rules? 92 DEBS 2011 - Tutorial
  • 138. Our Proposal: TESLA. A “one size fits all” language does not exist. At least, we have yet to find it. We started from the following requirements. Focus specifically on events, not generic data processing. Define a clear and precise semantics: issues of selection and consumption policies; general issue of formal semantics. Be expressive: useful for applications. Keep it simple: easy to read and write.
  • 139. TESLA rule structure: Define CE(Att1: Type1, …, Attn: Typen) From Pattern Where Att1 = f1(..), …, Attn = fn(..) Consuming e1, …, em
  • 140. Patterns in TESLA. Selection of a single event: A(x>10); Timer(). Selection of sequences: A(x>10) and each B within 5 min from A; A(x>10) and last B within 5 min from A; A(x>10) and first B within 5 min from A. Generalization: n-first / n-last. [diagrams: two B events within 5 min of A(X=15) and the complex events (CE) produced under each policy]
  • 141. Patterns in TESLA. TESLA allows *-within operators to be composed with each other. In chains of events: A and each B within 3 min from A and last C within 2 min from B. In parallel: A and each B within 3 min from A and last C within 4 min from A. Parameters can be added between events in a pattern. [diagrams: chain and parallel arrangements of events A, B, C]
  • 142. Negations and Aggregates Two kinds of negations: Interval based: A and last B within 3 min from A and not C between B and A Time based: A and not C within 3 min from A Similarly, two kinds of aggregates Interval based Use values appearing between two events Time based Use values appearing in a time interval 97 DEBS 2011 - Tutorial
  • 143. Iterations We believe that explicit operators for repetitions are difficult to use/understand Especially when different selection/consumption policies are allowed We achieve the same goal using hierarchies of events Complex events can be used to define (more) complex events Recurring schemes of repetitions Macros! 98 DEBS 2011 - Tutorial
  • 144. Formal Semantics. The semantics of each TESLA operator / rule is formally defined through a set of TRIO metric temporal logic formulas. They define an equivalence between the presence of a given pattern and the occurrence of one or more complex events, specifying: the occurrence time of complex events, the content of complex events, the set of consumed events. The language is unambiguous. A user can check the semantics of her rules in advance. This makes it possible to check the correctness of an event detection engine using a model checker: given a history of events and a set of TESLA rules, does the engine satisfy all the formulas?
  • 145. Abstraction Support for uncertainty Many applications deal with imprecise (or even incorrect) data from sources E.g., sensors, RFID, … Uncertain rules may increase the expressiveness / usefulness of an IFP system Addressed only in a few research papers … … and still missing in existing systems 100 DEBS 2011 - Tutorial
  • 146. Abstraction Which are the key requirements for an uncertainty model? It should be suitable for the application it is designed for Is it possible to have a single model? It should be easy to manage / understand To allow application domain experts to easily build rules on top of it It should have a limited impact on the performance of the overall system To preserve the required QoS 101 DEBS 2011 - Tutorial
  • 147. Abstraction Support for QoS policies Allow users to define application-dependent policies Which data is more important? Which items is it better to discard in case of system overload? Which QoS metric is more significant for the application? Completeness of results Latency … Traditionally offered only by DSMSs Most systems simply adopt a best-effort strategy 102 DEBS 2011 - Tutorial
  • 148. Abstraction Support for a Knowledge Base Some systems offer special operations to read from persistent storage Possibly a key-feature for the integration with existing DBMSs Support for periodic evaluation of rules As opposed to purely reactive evaluation 103 DEBS 2011 - Tutorial
  • 149. Abstraction Capability to dynamically change the set of deployed rules Add / Remove Activate / Deactivate Context awareness Tutorial at DEBS 2010 Could be used to increase the expressiveness of IFP … … but also to simplify / speedup processing Context / Situation used to reason on the status of the system / application Possibly combined with the possibility to activate / deactivate rules 104 DEBS 2011 - Tutorial
  • 150. Abstraction Where to offer the IFP abstraction? As a standalone product/system Similar to DBMSs As a middleware component Similar to a Publish/Subscribe system To guide the interaction of other components As part of a programming language Similar to what LINQ (Language-Integrated Query) offers to C# and to the .Net Framework Explored in EventJava Extension of Java for event correlation 105 DEBS 2011 - Tutorial
  • 151. Time Going back to our time model … … is there a “right approach” to deal with time? Using models that define a total order among information items seems to me easier to manage/understand Do we really need interval time for events? How do we define operators like sequences? To offer a proper level of abstraction … … but also to allow an efficient implementation 106 DEBS 2011 - Tutorial
  • 152. Time. Many existing systems offer operators that rely on some notion of time. Main problem: out-of-order arrival of information items. Requirements: preserve the semantics of processing; keep the delay for obtaining results as small as possible. Solutions: trade-off between correct and low-latency processing, sometimes controlled by users through predefined or customizable policies. Open issues: solutions for distributed / incremental processing may introduce delay at each processing step; the problem is not addressed in existing solutions for distributed processing.
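One common way to trade latency for correctness is to buffer items for a fixed slack before processing them in timestamp order. The sketch below illustrates that general idea only; it is not taken from any of the systems surveyed, and the class `SlackBuffer` and its `slack` parameter are hypothetical.

```python
import heapq
from itertools import count


class SlackBuffer:
    """Reorder items by timestamp, delaying each by at most `slack` time units."""

    def __init__(self, slack):
        self.slack = slack
        self._heap = []
        self._max_ts = float("-inf")
        self._seq = count()   # tie-breaker so items themselves are never compared

    def push(self, ts, item):
        """Insert an item; return the (ts, item) pairs now safe to process."""
        self._max_ts = max(self._max_ts, ts)
        heapq.heappush(self._heap, (ts, next(self._seq), item))
        watermark = self._max_ts - self.slack
        ready = []
        while self._heap and self._heap[0][0] <= watermark:
            t, _, it = heapq.heappop(self._heap)
            ready.append((t, it))
        return ready
```

A larger slack tolerates more disorder but delays every result by the same amount; an item that arrives later than the slack allows is still released, but out of order, so the semantics of processing is no longer preserved for it.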
  • 153. Processing. Many processing algorithms have been proposed. DSMSs cannot exploit traditional indexing techniques: new effort is required to efficiently evaluate standing queries. Most CEP systems are based on automata or, in general, on incremental processing: perform processing incrementally, as new information is available. Academia: ODE, Cayuga, NextCEP, Amit, T-Rex, etc. Industry: Esper/NEsper. We are currently investigating other solutions: keep the whole history of all received information items; ad-hoc data structures that simplify information retrieval; start evaluation only when required. Faster! Easier to parallelize (more on this later …)
  • 154. Our Experience [two plots of processing time (microsec): multiple selection policy and single selection policy]
  • 155. Processing. Are we using the right algorithms? Can we isolate the best algorithm for a particular set of operators? Accurate analysis of performance. Can we switch among different algorithms depending on the workload? Deployed rules, but also analysis of traffic. Adaptive middleware.
  • 156. Processing. Exploit parallel hardware: multi-core CPUs, but also GP-GPUs. To handle several rules concurrently. To speed up complex operations inside a single rule. Only partially investigated, e.g., streaming aggregates. Can it be extended to different operators? Also in this case the best solution may depend on the workload.
  • 157. Our Experience 112 DEBS 2011 - Tutorial
  • 158. Our Experience. GPUs can significantly speed up huge computations. The fixed overhead to activate them makes them unsuitable for simple tasks. Optimal workload: a reduced set of very complex rules. Additional advantage: CPU cores are free for other tasks, to process simple rules but also to handle marshalling/unmarshalling and communication with sources and sinks. The trend: GPUs are becoming easier to program and suitable for an increasing number of tasks, not only data-parallel algorithms.
  • 159. Processing Exploit existing models for computation on clusters / clouds Recently received great attention E.g. Map-Reduce Are these models suitable for IFP systems? To efficiently support the expressiveness of existing languages Can we create new models? Explicitly conceived to deal with IFP rules 114 DEBS 2011 - Tutorial
  • 160. Processing. Exploit domain-dependent information to speed up processing, e.g., implication between complex events or situations. Can be part of context information, or combined with it. In certain cases information can be automatically deduced, depending on the data and rule models adopted.
  • 161. Distribution of Processing Most systems rely on centralized processing When distributed processing is allowed, a clustered deployment model is often adopted Cooperating processors are co-located The problem of networked distribution has been addressed only marginally Are there applications that involve large scale scenarios? E.g. monitoring applications Potentially introduce new issues Bandwidth may become a bottleneck Data may be transferred over non-dedicated channels Several applications may run on the same “processing network” 116 DEBS 2011 - Tutorial
  • 162. Operator Placement. Current approaches mainly focus on (sub)optimal placement with global knowledge, often based on over-simplified models that ignore important implementation details. Research directions: distributed protocols allowing autonomous decisions at each node; fast adaptation to application load; monitoring and statistical analysis of data; optimization of multiple rules on the same processing network.
  • 163. Workloads: A major problem when it comes to evaluating and comparing different systems is the lack of workloads. Benchmarks: only one benchmark is available, Linear Road. It is strongly tailored to the Stream model and difficult to adapt to other systems; its huge parameter space makes it difficult to isolate relevant cases. The DEBS Grand Challenge may be a first step in this direction. Data coming from real / realistic use cases, to understand which operators are more important and what to optimize in the processing algorithm.
  • 164. Security / Privacy: Define security models and policies: which pieces of information can be consumed by a single operator or a set of operators. Important: output data may have a different visibility w.r.t. input data! The system may be allowed to consume single data items, but not to extract new information from them or to distribute this information. Implement protocols and algorithms to enforce these policies.
  • 165. Thanks for your Attention!
  • 166. Questions?
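
The trade-off noted on slide 158 (a fixed activation overhead against a large speedup on heavy computations) can be illustrated with a simple cost model. This is only a sketch: the overhead, the speedup factor, and all function names below are hypothetical values chosen for illustration, not measurements from the tutorial.

```python
# Hypothetical cost model: all numbers are illustrative assumptions.

def cpu_time(work_ms: float) -> float:
    """Time to process a rule on the CPU (taken as the baseline)."""
    return work_ms

def gpu_time(work_ms: float, overhead_ms: float = 5.0, speedup: float = 20.0) -> float:
    """Fixed activation overhead plus the accelerated computation."""
    return overhead_ms + work_ms / speedup

def break_even_work(overhead_ms: float = 5.0, speedup: float = 20.0) -> float:
    """Workload at which CPU and GPU take the same time.

    Solves work = overhead + work / speedup for work.
    """
    return overhead_ms * speedup / (speedup - 1.0)

def faster_on_gpu(work_ms: float, overhead_ms: float = 5.0, speedup: float = 20.0) -> bool:
    """True when offloading the rule to the GPU pays off."""
    return gpu_time(work_ms, overhead_ms, speedup) < cpu_time(work_ms)

simple_rule = faster_on_gpu(1.0)     # overhead dominates
complex_rule = faster_on_gpu(100.0)  # speedup dominates
```

Under these assumed numbers a simple rule stays on the CPU while a complex rule benefits from the GPU, which matches the "reduced set of very complex rules" workload described on the slide.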

Editor's Notes

  1. Mention integration with databases when discussing the knowledge base.
  2. Mention integration with databases when discussing the knowledge base.
  3. Mention integration with databases when discussing the knowledge base.
  4. Mention integration with databases when discussing the knowledge base.
  5. Mention integration with databases when discussing the knowledge base.
  6. Mention integration with databases when discussing the knowledge base.
  7. Mention integration with databases when discussing the knowledge base.
  8. Clustered -> Processing nodes are co-located and tightly connected, usually adopting a shared-memory model. The aim is to split the processing load among different processing elements. Networked -> Processing nodes are geographically distributed, with no shared memory. The aim is to reduce network usage by pushing processing as near as possible to the sources.
  9. Microsoft StreamInsight allows users / system administrators to deploy multiple services, so that [bulleted list]
  10. What do we mean by automatic distribution? Take for example P2P applications for file sharing. No need to manually configure the network when new nodes join or leave it.
  11. Heterogeneous Resources -> e.g., when the algorithm tries to deploy part of the processing load on small devices like mobile phones or even wireless sensors. Rule Rewriting -> Not only optimization inside single rules, but also multi-rule optimization. The rewriting algorithm depends on a set of rules. It is not clear whether reuse and replication are assumptions to start from or a “value” of the result, nor which one is better.
  12. Observation model -> hybrid is common in existing applications. Indeed, they include both sources that actively send information (e.g., sensors) and sources with stable information the engine can use during processing (e.g., databases). These are modeled in our functional model using the knowledge base component.
  13. Ordering of output streams is conceptually separate from the ordering of input streams -> Consider for example the use of Dstream, which outputs all the elements that are removed from the processing table. Or consider a sort-by clause over a generic attribute.
  14. Gigascope -> explicitly conceived to monitor traffic. Does not use the traditional Stream model. It does not use intermediate database tables for processing, but directly defines output streams starting from input streams. This requires some assumptions on the ordering of items to define a semantics for operators like join and group by. Allows the system to decide how long to keep an information item.
  15. Data model -> studies how the different systems represent information and organize it into streams.
  16. Processing of generic data, as derived from the relational model. As already discussed in the time model, no relation between processing and time / timestamps. In TESLA, event notifications: strong focus on time (usually totally ordered, absolute or interval).
  17. Data model -> studies how the different systems represent information and organize it into streams.
  18. Can we find an example?
  19. This distinction also appears (sometimes implicitly) in the related work section of many papers.
  20. We will provide a few examples to better clarify these aspects in the next few slides, when we enter more into the details about languages and operators.
  21. Probably because of our background (we are part of a software engineering group), the first topic that we want to address is the abstraction we are building with event processing. As we will see, this is also a rich field: although much work has been carried out, in my opinion much remains to be done.
  22. If we believe this data, we have already finished our work. We have already built (many) systems that satisfy these requirements.
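
Note 13 observes that the ordering of an output stream is separate from the ordering of the input stream, using Dstream as an example. The sketch below illustrates this under simplifying assumptions: the processing table is modeled as a list of Python sets, one per time step (CQL-style relations are bags, so duplicates are ignored here), and the helper names `istream` and `dstream` are chosen for illustration only.

```python
# Simplified model: states[t] is the content of the processing table at time t.

def istream(states):
    """Emit (t, item) for every item present at time t but not at t-1."""
    previous, output = set(), []
    for t, current in enumerate(states):
        current = set(current)
        output.extend((t, item) for item in sorted(current - previous))
        previous = current
    return output

def dstream(states):
    """Emit (t, item) for every item present at time t-1 but not at t."""
    previous, output = set(), []
    for t, current in enumerate(states):
        current = set(current)
        output.extend((t, item) for item in sorted(previous - current))
        previous = current
    return output

# "a" enters the table first, then "b"; "b" is removed first, then "a".
table_states = [{"a"}, {"a", "b"}, {"a"}, set()]
inserted = istream(table_states)  # [(0, 'a'), (1, 'b')]
deleted = dstream(table_states)   # [(2, 'b'), (3, 'a')]
```

Here the items enter the table in the order a, b but leave it in the order b, a, so the stream of deleted items is ordered differently from the stream of inserted ones.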