Processing Flows of Information DEBS 2011


Published on

Published in: Technology
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
  • Parlarediintgrazione con I DB quandoparlidiKnowledgeBase
  • Parlarediintgrazione con I DB quandoparlidiKnowledgeBase
  • Parlarediintgrazione con I DB quandoparlidiKnowledgeBase
  • Parlarediintgrazione con I DB quandoparlidiKnowledgeBase
  • Parlarediintgrazione con I DB quandoparlidiKnowledgeBase
  • Parlarediintgrazione con I DB quandoparlidiKnowledgeBase
  • Parlarediintgrazione con I DB quandoparlidiKnowledgeBase
  • Clustered -> Processing nodes are co-located. Strictly connected. Usually adopting a shared-memory model. The aim is to split the processing load among different processing elements.Networked -> Processing nodes are geographically distributed. No shared memory. The aim is to reduce network usage by pushing processing as near as possible to sources.
  • MicrosoftStreamInsight allows users / system administrators to deploy multiple services, so that [elencopuntato]
  • What do we mean by automatic distribution? Take for example P2P applications for file sharing. No need to manually configure the network when new nodes join or leave it.
  • Heteorgeneous Resources -> e.g., when the algorithm tries to deploy part of the processing load on small devices like mobile phones or even wireless sensorsRule Rewriting -> Not only optimization inside single rules, but also multi-rule optimization. The rewriting algorithm depends from a set of rules.It is not clear if reuse and replications are assumptons from which starting or they are a “value” of the result and which one is better.
  • Observation model -> hybrid is common in existing applications. Indeed, they include both sources that actively send information (e.g., sensors) and sources with stable information the engine can use during processing (e.g., databases). These are modeled in our functional model using the knowledge base component.
  • Ordering of output streams is conceptually separate from the ordering of input streams -> Consider for example the use of Dstream, that outputs all the elements that are removed from the processing table. Or consider a sort by clause over a generic attribute.
  • Gigascope -> explicitly conceived to monitor traffic. Does not use the traditional Stream model. It does not use intermediate database tables for processing, but directly defines output streams starting from input streams.This requires some assumptions on the ordering of items to define a semantics for operators like join and group by.Allows the system to decide how long to keep an information item
  • Data model -> studies how the different systems represent information and organize them into streams.
  • Processing of generic data, as derived from the relational modelAs already discussed in the time model, no relation between processing and time / timestampsIn TESLA  event notifications: strong focus on time (usually totally ordered, absolute or interval)
  • Data model -> studies how the different systems represent information and organize them into streams.
  • Siriesce a trovare un esempio?
  • This distinction also appears (sometimes implicitly) in the related work section of many papers.
  • We will provide a few examples to better clarify these aspects in the next few slides, when we enter more into the details about languages and operators.
  • Probably for our background (we are part of a software engineering group), the first topic that we want to address is about the abstraction we are building with event processing.As we will see this is also a rich field: despite many works have been carried out, in my opinion much has to be done.
  • If we believe this data, we have already finished our work. We already build (many) systems that satisfy these requirements.
  • Processing Flows of Information DEBS 2011

    1. 1. Processing Flows of InformationFrom Data Streams to Complex Event Processing<br />GianpaoloCugola and Alessandro Margara<br />Politecnicodi Milano<br />Dip. Elettronica e Informazione<br />[cugola,margara]<br />“Processing Flows of Information: from data streams to complex event processing”ACM Computing Surveys, to appear (but available from:<br />
    2. 2. Background and motivation<br />Back in 2007 CEP was already a hot topic…<br />… but having a good grasp of the area was rather hard<br />As observed by OpherEtzion the area was looking like the “Tower of Babel”<br />Event Processing and the Babylon Tower – Event process thinking blog – Sept. 8, 2007<br />event<br />data<br />message<br />primitive<br />complex<br />composite<br />stream<br />flow<br />sequence<br />pattern<br />join<br />query<br />rule<br />condition<br />action<br />publish<br />subscribe<br />notify<br />advertise<br />send<br />receive<br />middleware<br />system<br />application<br />protocol<br />network<br />routing<br />2<br />DEBS 2011 - Tutorial<br />
    3. 3. Background and motivation<br />Several communities were contributing to the area…<br />… each bringing its own expertise and vocabulary…<br />…but often working in isolation<br />event<br />Researchers<br />data<br />message<br />DEBS<br />primitive<br />data management& databases<br />process modeling& automation<br />complex<br />composite<br />stream<br />flow<br />sequence<br />pattern<br />join<br />query<br />rule<br />condition<br />action<br />publish<br />subscribe<br />notify<br />middlewaresystems<br />ad-hoc tools(intrusion det., …)<br />advertise<br />send<br />receive<br />applicationservers<br />DBMSs<br />middleware<br />system<br />application<br />protocol<br />network<br />Tool vendors<br />routing<br />3<br />DEBS 2011 - Tutorial<br />
    4. 4. Background and motivation<br />That was 2007. What about today?<br />Things did not change much<br />From the “Event Process Thinking” blog – Sept. 2010<br />[Which is] the relation between event processing and data stream management?<br />They are aliases -- stream is just a collection of events, likewise, an event is just a member in a stream, and the functionality is the same<br />Stream management is a subset of event processing -- there are different ways to do event processing, streams is one of them<br />Event processing is a subset of stream management -- event streams is just one type of stream, but there are voice stream, video stream, …<br />Event processing and stream management are distinct and there is no overlapping between them<br />At the same time tool vendors are building tools that try to combine ¿ juxtapose ? different approaches<br />4<br />DEBS 2011 - Tutorial<br />
    5. 5. Our goal<br />We would like to compare different systems in a precise way<br />We would like to compare different approaches in a precise way<br />We would like to help people coming from different areas communicate and compare their work with others<br />We would like to isolate the open issues from those already solved<br />We would like to precisely identify the challenges<br />We would like to isolate the best part of the various approaches…<br />… finding a way to combine them<br />We need a modeling frameworkthat could accommodate all the proposals<br />5<br />DEBS 2011 - Tutorial<br />
    6. 6. Forget your own vocabulary(temporarily)<br />CEP, DSMSs, Active Databases… <br />All these terms hide a peculiar view of the domain we have in mind<br />With subtle, “unsaid” (and often unclear) differences<br />Before learning something new we have to forget what we already know<br />6<br />DEBS 2011 - Tutorial<br />
    7. 7. The Information Flow Processing domain<br />The IFP engine processes incoming flows of information according to a set of processing rules<br />Processing is “on line”<br />Sources produce the incoming information flows, sinks consume the results of processing, rule managers add or remove rules<br />Information flows are composed of information items<br />Items part of the same flow are neither necessarily ordered nor of the same kind<br /><ul><li>Processing involve filtering,combining, and aggregatingflows, item by item as theyenter the engine</li></ul>Sources<br />Sinks<br />IFP Engine<br />Information Flows<br />Information Flows<br />-----<br />-----<br />-----<br />-----<br />Rules<br />-----<br />-----<br />-----<br />-----<br />Rule managers<br />7<br />DEBS 2011 - Tutorial<br />
    8. 8. IFP: One name several incarnations<br />A lot of applications<br />Environmental monitoring through sensor networks<br />Financial applications<br />Fraud detection tools<br />Network intrusion detection systems<br />RFID-based inventory management<br />Manufacturing control systems<br />…<br />Several technologies<br />Active databases<br />DSMSs<br />CEP Systems<br />8<br />DEBS 2011 - Tutorial<br />
    9. 9. One model, several models<br />Different models to capture different viewpoints<br />Functional model<br />Processing model<br />Deployment model<br />Interaction model<br />Time model<br />Data model<br />Rule model<br />Language model<br />9<br />DEBS 2011 - Tutorial<br />
    10. 10. Functional model<br />Knowledgebase<br />A<br />A<br />A<br />History<br />History<br />Seq<br />History<br />Decider<br />Forwarder<br />Receiver<br />Producer<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />Clock<br />Rules<br />10<br />DEBS 2011 - Tutorial<br />
    11. 11. Functional model<br /><ul><li>Implements the transport protocol to move information items along the net
    12. 12. Acts as a demultiplexer</li></ul>Knowledgebase<br />A<br />A<br />A<br />History<br />History<br />Seq<br />History<br />Decider<br />Forwarder<br />Receiver<br />Producer<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />Clock<br />Rules<br />11<br />DEBS 2011 - Tutorial<br />
    13. 13. Functional model<br /><ul><li>Implements the transport protocol to move information items along the net
    14. 14. Acts as a multiplexer</li></ul>Knowledgebase<br />A<br />A<br />A<br />History<br />History<br />Seq<br />History<br />Decider<br />Forwarder<br />Receiver<br />Producer<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />Clock<br />Rules<br />12<br />DEBS 2011 - Tutorial<br />
    15. 15. A short digression<br />We assume rules can be (logically)decomposed in two parts: C -> A<br />C is the condition<br />A is the action<br />Example (in CQL):<br /> Select IStream(Count(*))From F1 [Range 1 Minute]Where F1.A > 0<br />This way we can split processingin two phases:<br />The detection phase determines the items that trigger the rule<br />The production phase use those items to produce the output of the rule<br />Knowledgebase<br />A<br />A<br />A<br />History<br />History<br />Seq<br />History<br />action<br />Decider<br />Forwarder<br />Receiver<br />condition<br />Producer<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />Clock<br />Rules<br />13<br />DEBS 2011 - Tutorial<br />
    16. 16. Functional model<br />Knowledgebase<br />A<br />A<br />A<br />Seq<br />Decider<br />Forwarder<br />History<br />Receiver<br />History<br />Producer<br />History<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />Clock<br />Rules<br />14<br />DEBS 2011 - Tutorial<br />
    17. 17. Functional model<br /><ul><li>Implements the detection phase
    18. 18. Accumulates partial results into the history
    19. 19. When a rule fires passes to the producer its action part and the triggering items</li></ul>Knowledgebase<br />A<br />A<br />A<br />Seq<br />Decider<br />Forwarder<br />History<br />Receiver<br />History<br />Producer<br />History<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />Clock<br />Rules<br />15<br />DEBS 2011 - Tutorial<br />
    20. 20. Functional model<br />Knowledgebase<br />A<br />A<br />A<br /><ul><li>Implements the production phase
    21. 21. Uses the items in Seq as stated in action A</li></ul>Seq<br />Decider<br />Forwarder<br />History<br />Receiver<br />History<br />Producer<br />History<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />Clock<br />Rules<br />16<br />DEBS 2011 - Tutorial<br />
    22. 22. Functional model<br />Knowledgebase<br />A<br />A<br />A<br />Seq<br />Decider<br />Forwarder<br />History<br />Receiver<br />History<br />Producer<br />History<br /><ul><li>Some systems allow rules to be added or removed at processing time</li></ul>-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />Clock<br />Rules<br />17<br />DEBS 2011 - Tutorial<br />
    23. 23. Functional model<br /><ul><li>If present models the ability of performing recursive processing building hierarchies of items</li></ul>Knowledgebase<br />A<br />A<br />A<br />Seq<br />Decider<br />Forwarder<br />History<br />Receiver<br />History<br />Producer<br />History<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />Clock<br />Rules<br />18<br />DEBS 2011 - Tutorial<br />
    24. 24. Functional model<br /><ul><li>Some systems allows rules to combine flowing items with items previously stored into a (read only) storage</li></ul>Knowledgebase<br />A<br />A<br />A<br />Seq<br />Decider<br />Forwarder<br />History<br />Receiver<br />History<br />Producer<br />History<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />Clock<br />Rules<br />19<br />DEBS 2011 - Tutorial<br />
    25. 25. Functional model<br />Knowledgebase<br />A<br />A<br />A<br /><ul><li>Optional component
    26. 26. Periodically creates special information items holding current time
    27. 27. Its presence models the ability of performingperiodic processing of inputs</li></ul>Seq<br />Decider<br />Forwarder<br />History<br />Receiver<br />History<br />Producer<br />History<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />Clock<br />Rules<br />20<br />DEBS 2011 - Tutorial<br />
    28. 28. Functional model: Considerations<br />The detection-production cycle<br />Fired by a new item I entering the engine through the Receiver<br />Including those periodically produced by the Clock, if present<br />Detection phase: Evaluates all the rules to find those enabled<br />Using item I plus the data into the Knowledge base, if present<br />The item I can be accumulated into the History for partially enabled rules<br />The action part of the enabled rules together with the triggering items (A+Seq) is passed to the producer<br />Production phase: Produces the output items<br />Combining the items that triggered the rule with data present in the Knowledge base, if present<br />New items are sent to subscribed sinks (through the Forwarder)…<br />…but they could also be sent internally to be processed again (recursive processing)<br />In some systems the action part of fired rules may also change the set of deployed rules<br />21<br />DEBS 2011 - Tutorial<br />
    29. 29. Functional model: Considerations<br />Maximum length of Seq a key aspect<br />1  PubSub<br />Bounded  <br />CQL like languages without time based windows<br />Pattern based languages without a Kleene+ operator<br />Other key aspects that impact expressiveness<br />Presence of the Clock<br />Models the ability to process rulesperiodically<br />Available in almost half of the systemsreviewed<br />Most Active DBs and DSMSs butfew CEP systems<br />Knowledgebase<br />A<br />A<br />A<br />History<br />History<br />Seq<br />History<br />Decider<br />Forwarder<br />Receiver<br />Producer<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />Clock<br />Rules<br />22<br />DEBS 2011 - Tutorial<br />
    30. 30. Functional model: Considerations<br />Presence of the Knowledge base<br />Only available in systems coming from the database community<br />Presence of the looping flow exitingthe Producer<br />Models the ability of performing recursiveprocessing<br />Half CEP systems have it<br />All Active DBs but very few DSMSshave it<br />They have nested rules<br />Support to dynamic rule change<br />Few systems support it<br />Can be implemented externally…<br />Through sinks acting also as rule managers<br />…but we think it is nice to have it internally<br />Knowledgebase<br />A<br />A<br />A<br />History<br />History<br />Seq<br />History<br />Decider<br />Forwarder<br />Receiver<br />Producer<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />Clock<br />Rules<br />23<br />DEBS 2011 - Tutorial<br />
    31. 31. The semantics of processing<br />What determines the output of each detection-production cycle?<br />The new item entering the engine<br />The set of deployed rules<br />The items stored into the History<br />The content of the Knowledge Base<br />Is this enough?<br />Example (in Padres and CQL):<br />Smoke && Temp>50 <br />Select IStream(Smoke.area)From Smoke[Rows 30 Slide 10], Temp[Rows 50 Slide 5]Where Smoke.area = Temp.area AND Temp.value > 50<br />Knowledgebase<br />A<br />A<br />A<br />History<br />History<br />History<br />History<br />History<br />History<br />Rules<br />Knowledgebase<br />Seq<br />Decider<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />Producer<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />24<br />DEBS 2011 - Tutorial<br />
    32. 32. Processing model<br />Three policies affect the behavior of the system<br />The selection policy<br />The consumption policy<br />The load shedding policy<br />25<br />DEBS 2011 - Tutorial<br />
    33. 33. Selection policy<br />Determines if a rule fires once or multiple times and the items actually selected from the History<br />Example:<br />A0 B<br />A1 B<br />A0 B<br />A1 B<br />R<br />R<br />R<br />R<br />multiple<br />?<br />Decider<br />Receiver<br />A<br />A A<br />A0 A1<br />single<br />or<br />B<br />A<br />A B<br />26<br />DEBS 2011 - Tutorial<br />
    34. 34. Selection policy: Considerations<br />Most systems adopt a multiple selection policy<br />It is simpler to implement<br />Is it adequate?<br />Example: Alert fire when smoke and high temperature in a short time frame<br />If 10 sensors read high temperature and immediately afterward one detects smoke I would like to receive a single alert, not 10<br />A few systems allow this policy to be programmed…<br />…some of them on a per-rule base<br />E.g., Amit, T-Rex<br />27<br />DEBS 2011 - Tutorial<br />
    35. 35. Selection policy: The TESLA case<br />TESLA (Trio-based Event Specification Language): the T-Rex language<br />A rule language for CEP. Tries to combine expressiveness and efficiency<br />Has a formally defined semantics<br />Expressed in Trio, a Metric Temporal Logic (see DEBS 2010)<br />Allows rule managers to choose their own selection policy on a per rule base<br />Example: Multiple selection<br /> define Fire(area: string, measuredTemp: double)from Smoke(area=$a) and each Temp(area=$a and val>50) within 1min. from Smokewhere area=Smoke.area and measuredTemp=Temp.value<br />Example: Single selection<br /> define Fire(area: string, measuredTemp: double)from Smoke(area=$a) and last Temp(area=$a and val>50) within 1min. from Smokewhere area=Smoke.area and measuredTemp=Temp.val<br />28<br />DEBS 2011 - Tutorial<br />
    36. 36. Selection policy: The TESLA case<br />TESLA (Trio-based Event Specification Language): the T-Rex language<br />A rule language for CEP. Tries to combine expressiveness and efficiency<br />Has a formally defined semantics<br />Expressed in Trio, a Metric Temporal Logic (see DEBS 2010)<br />Allows rule managers to choose their own selection policy on a per rule base<br />Example: Multiple selection<br /> define Fire(area: string, measuredTemp: double)from Smoke(area=$a) and each Temp(area=$a and val>50) within 1min. from Smokewhere area=Smoke.area and measuredTemp=Temp.value<br />Example: Single selection<br /> define Fire(area: string, measuredTemp: double)from Smoke(area=$a) and last Temp(area=$a and val>50) within 1min. from Smokewhere area=Smoke.area and measuredTemp=Temp.val<br /><ul><li>Alternatively you may use:
    37. 37. first…within
    38. 38. n-first…within n-last…within</li></ul>29<br />DEBS 2011 - Tutorial<br />
    39. 39. Consumption policy<br />Determines how the history changes after firing of a rule  what happens when new items enter the Decider<br />Example:<br />A B<br />A B<br />R<br />R<br />?<br />Decider<br />Receiver<br />A<br />zero<br />B<br />A<br />A B<br />selected<br />30<br />DEBS 2011 - Tutorial<br />
    40. 40. Consumption policy: Considerations<br />Most systems couple a multiple selection policy with a zero consumption policy<br />This is the common case with DSMSs, which use (sliding) windows to select relevant events<br />Example (in CQL)<br /> Select IStream(Smoke.area)From Smoke[Range 1 min], Temp[Range 1 min]Where Smoke.area = Temp.area AND Temp.val > 50<br />The systems that allow the selection policy to be programmed often allow the consumption policy to be programmed, too<br />E.g., Amit, T-Rex<br />31<br />DEBS 2011 - Tutorial<br />
    41. 41. Consumption policy: The TESLA case<br />Zero consumption policy<br />define Fire(area: string, measuredTemp: double)from Smoke(area=$a) and each Temp(area=$a and val>50)within 1min. from Smokewhere area=Smoke.area and measuredTemp=Temp.value<br />Selected consumption policy<br />define Fire(area: string, measuredTemp: double)from Smoke(area=$a) and each Temp(area=$a and val>50)within 1min. from Smokewhere area=Smoke.area and measuredTemp=Temp.valueconsuming Temp<br />Fire!<br />Fire!<br />Fire!<br />Fire!<br />Fire!<br />Fire!<br />Fire!<br />Fire!<br />Fire!<br />Fire!<br />Fire!<br />Fire!<br />T<br />T<br />T<br />T<br />S<br />S<br />T<br />T<br />T<br />T<br />S<br />S<br />T<br />T<br />T<br />T<br />32<br />DEBS 2011 - Tutorial<br />
    42. 42. Load shedding policy<br />Problem: How to manage bursts of input data<br />It may seem a system issue<br />i.e., an issue to solve into theReceiver<br />But it strongly impacts the resultsproduced<br />i.e., the “semantics” of the rules<br />Accordingly, some systems allows this issue to be determined on a per-rule basis<br />e.g., Aurora allows rules to specify the expected QoS and sheds input to stay within limits with the available resources<br />Conceptually the issue is addressed into the decider<br />Knowledgebase<br />A<br />A<br />A<br />History<br />History<br />Seq<br />History<br />Decider<br />Forwarder<br />Receiver<br />Producer<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />-----<br />Clock<br />Rules<br />33<br />DEBS 2011 - Tutorial<br />
    43. 43. Deployment model<br />IFP applications may include a large number of sources and sinks<br />Possibly dispersed over a wide geographical area<br />It becomes important to consider the deployment architecture of the engine<br />How the components of the functional model can be distributed to achieve scalability<br />34<br />DEBS 2011 - Tutorial<br />
    44. 44. Deployment model<br />Sources<br />Sinks<br />IFP Engine<br />Centralized<br />Clustered<br />Networked<br />Information Flows<br />Information Flows<br />-----<br />-----<br />-----<br />-----<br />Rules<br />-----<br />-----<br />-----<br />-----<br />Distributed<br />Rule managers<br />35<br />DEBS 2011 - Tutorial<br />
    45. 45. Deployment Model<br />Most existing systems adopt a centralized solution<br />When distributed processing is allowed, it is usually based on clustered solutions<br />A few systems have recognized the importance of networked deployment for some applications<br />E.g. Microsoft StreamInsight (part of SQLServer)<br />Filtering near sources<br />Aggregation and correlation in-network<br />Analytics and historical data in a centralized server/cluster<br />In most cases, deployment/configuration is not automatic<br />36<br />DEBS 2011 - Tutorial<br />
    46. 46. Deployment model<br />Automatic distribution of processing introduces the operator placementproblem<br />Given a set of rules (composed of operators) and a set of nodes<br />How to split the processing load<br />How to assign operators to available nodes<br />In other words (Event Processing in Action)<br />Given an event processing network<br />How to map it onto the physical network of nodes<br />37<br />DEBS 2011 - Tutorial<br />
    47. 47. Operator placement<br />The operator placement problem is still open<br />Several proposals<br />Often adopting techniques coming from the Operational Research<br />Difficult to compare solutions and results<br />Even in its simplest form the problem is NP-hard<br />Good survey paper by Lakshmanan, Li, and Strom<br />“Placement Strategies for Internet-Scale Data Stream Systems”<br />38<br />DEBS 2011 - Tutorial<br />
    48. 48. Operator placement<br />Goals<br />Operator Placement<br />39<br />DEBS 2011 - Tutorial<br />
    49. 49. Operator placement<br />Different goals have been considered for operator placement<br />Depending from the deployment architecture<br />Depending from the application requirements<br />Examples:<br />Minimize delay / network usage (bandwidth)<br />Pietzuch et. Al. “Network-Aware Operator Placement for Stream-Processing Systems”<br />Reduce hardware resource usage / Obtain load balancing<br />Relevant in clustered scenarios<br />Several papers for IBM System S<br />40<br />DEBS 2011 - Tutorial<br />
    50. 50. Operator placement<br />Heterogeneous Resources<br />Rule Rewriting<br />Goal<br />User-defined QOS<br />Operator Placement<br />Centralized vs Distributed<br />Re-use vs Replication<br />On-line vsOff-line<br />41<br />DEBS 2011 - Tutorial<br />
    51. 51. More on deployment model<br />Operator placement is only part of the problem<br />Other issues<br />How to build the network of nodes?<br />How to maintain it?<br />How to gather the information required to solve the operator placement problem?<br />How to actually “place” the operators?<br />How to “replace” them when the situation changes?<br />New rules added, old rules removed…<br />…new sources/sinks<br />42<br />DEBS 2011 - Tutorial<br />
    52. 52. Deployment model and dynamics<br />How to cope with mobile nodes?<br />Mobile sinks and sources…<br />…but also mobile “processors”<br />The issue is relevant<br />We leave in a mobile world<br />Very few proposals<br />A lot of work in the area of pure publish/subscribe<br />Several works published in DEBS, not to mention other major conferences/journals<br />May we reuse some of this work?<br />43<br />DEBS 2011 - Tutorial<br />
    53. 53. Interaction Model<br />Sources<br />Sinks<br />IFP Engine<br />It is interesting to study the characteristics of the interactions among the main component of an IFP system<br />Who starts the communication?<br />44<br />DEBS 2011 - Tutorial<br />
    54. 54. Interaction Model<br />Sources<br />Sinks<br />IFP Engine<br />Observation Model<br />Forwarding Model<br />Notification Model<br /><ul><li>Push
    55. 55. Pull
    56. 56. Push
    57. 57. Pull
    58. 58. Push
    59. 59. Pull</li></ul>45<br />DEBS 2011 - Tutorial<br />
    60. 60. Time Model<br />Relationship between information items and passing of time<br />Ability of an IFP system to associate some kind of happened-before (ordering) relationship to information items<br />We identified 4 classes:<br />Stream-only<br />Causal<br />Absolute<br />Interval<br />46<br />DEBS 2011 - Tutorial<br />
    61. 61. Stream-Only Time Model<br />Used in original DSMSs<br />Timestamps may be present or not<br />When present, they are used only to group items when they enter the processing engine<br />They are not exposed to the language<br />With the exception of windowing constructs<br />Ordering in output streams is conceptually separate from the ordering in input streams<br />CQL/Stream<br />Select DStream(*)<br />From F1[Rows 5],<br /> F2[Range 1 Minute]<br />Where F1.A = F2.A<br />47<br />DEBS 2011 - Tutorial<br />
    62. 62. Causal Time Model<br />Each item has a label reflecting some kind of causal relationship<br />Partial order<br />E.g. Rapide<br />An event is causally ordered after all events that led to its occurrence<br />Gigascope<br />Selectcount(*)<br />From A, B<br />Where A.a-1 <= B.b and<br /> A.a+1 > B.b<br />A.a, B.b monotonically increase<br />A<br />B<br />C<br />48<br />DEBS 2011 - Tutorial<br />
    63. 63. Absolute Time Model<br />Information items have an associated timestamp<br />Defining a single point in time w.r.t. a (logically)unique clock<br />Total order<br />Timestamps are fully exposed to the language<br />Timestamps can be given by the source, or by the processing engine<br />TESLA/T-Rex<br />Define Fire(area: string,<br />measuredTemp: double) <br />From Smoke(area=$a) and last <br /> Temp(area=$a and value>45)<br /> within 5 min. from Smoke <br />Where area=Smoke.areaand<br />measuredTemp=Temp.value<br />49<br />DEBS 2011 - Tutorial<br />
    64. 64. Interval Time Model<br />Used for events to include “duration”<br />SnoopIB, Cayuga, NextCEP, …<br />At a first sight, it is a simple extension of the absolute time model<br />Timestamps with two values: start time and end time<br />However, it opens many issues<br />What is the successor of an event?<br />What is the timestamp associated to a composite event?<br />50<br />DEBS 2011 - Tutorial<br />
    65. 65. Interval Time Model<br />Which is the immediate successor of A?<br />Choose according to end time only: B<br />But it started before A!<br />Exclude B: C, D<br />Both of them?<br />Which of them?<br />No other event strictly between A and its successor: C, D, E<br />Seems a natural definition<br />Unfortunately we loose associativity!<br />X(YZ) ≠ (X Y)Z<br />May impede some rule rewriting for processing optimizations<br />51<br />DEBS 2011 - Tutorial<br />
    66. 66. Interval Time Model<br />“What is “Next” in event processing?” by White et. Al<br />Proposes a number of desired properties to be satisfied by the “Next” function<br />There is one model that satisfies them all<br />Complete History<br />It is not sufficient to encode timestamps using a couple of values<br />Timestamps of composite events must embed the timestamps of all the events that led to their occurrence<br />Possibly, timestamps of unbounded size<br />In case of unbounded Seq<br />52<br />DEBS 2011 - Tutorial<br />
    67. 67. Data Model<br />Studies how the different systems<br />Represent single data items<br />Organize them into data flows<br />Data<br />Data Items<br />Nature of Items<br /><ul><li>Generic Data
    68. 68. Event Notifications</li></ul>Format<br /><ul><li>Records
    69. 69. Tuples
    70. 70. Objects
    71. 71. …</li></ul>Support for Uncertainty<br />Data Flows<br /><ul><li>Homogeneous
    72. 72. Heterogeneous</li></ul>53<br />DEBS 2011 - Tutorial<br />
    73. 73. Nature of Items<br />The meaning we associate to information items<br />Generic data<br />Event notifications<br />Deeply influences several other aspects of an IFP system<br />Time model !!!<br />Rule language<br />Semantics of processing<br />Heritage of the heterogeneous backgrounds of different communities<br />Data<br />Data Items<br />Nature of Items<br /><ul><li>Generic Data
    74. 74. Event Notifications</li></ul>Format<br /><ul><li>Records
    75. 75. Tuples
    76. 76. Objects
    77. 77. …</li></ul>Support for Uncertainty<br />Data Flows<br /><ul><li>Homogeneous
    78. 78. Heterogeneous</li></ul>54<br />DEBS 2011 - Tutorial<br />
    79. 79. Nature of Items<br />CQL/Stream (Generic Data)<br />SelectIStream(*)<br />From F1[Rows 5],<br /> F2[Range 1 Minute]<br />Where F1.A = F2.A<br />TESLA/T-Rex (Event Notifications)<br />Define Fire (area: string,<br />measuredTemp: double) <br />From Smoke(area=$a)and last <br /> Temp(area=$a and value>45)<br />within 5 min.<br />from Smoke <br />Where area=Smoke.area and<br />measuredTemp=Temp.value<br />55<br />DEBS 2011 - Tutorial<br />
    80. 80. Format of Items<br />How information is represented<br />Influences the way items are processed<br />E.g., Relational model requires tuples<br />Data<br />Data Items<br />Nature of Items<br /><ul><li>Generic Data
    81. 81. Event Notifications</li></ul>Format<br /><ul><li>Records
    82. 82. Tuples
    83. 83. Objects
    84. 84. …</li></ul>Support for Uncertainty<br />Data Flows<br /><ul><li>Homogeneous
    85. 85. Heterogeneous</li></ul>56<br />DEBS 2011 - Tutorial<br />
    86. 86. Support for Uncertainty<br />Ability to associate a degree of uncertainty to information items<br />To the content of items<br />Imprecise temperature reading<br />To the presence of an item (occurrence of an event)<br />Spurious RFID reading<br />When present, probabilistic information is usually exploited in rules during processing<br />Data<br />Data Items<br />Nature of Items<br /><ul><li>Generic Data
    87. 87. Event Notifications</li></ul>Format<br /><ul><li>Records
    88. 88. Tuples
    89. 89. Objects
    90. 90. …</li></ul>Support for Uncertainty<br />Data Flows<br /><ul><li>Homogeneous
    91. 91. Heterogeneous</li></ul>57<br />DEBS 2011 - Tutorial<br />
    92. 92. Data Flows<br />Homogeneous<br />Each flow contains data with the same format and “kind”<br />E.g. Tuples with identical structure<br />Often associated with “database-like” rule languages<br />Heterogeneous<br />Information flows are seen as channels connecting sources, processors, and sinks<br />Each channel may transport items with different kind and format<br />Data<br />Data Items<br />Nature of Items<br /><ul><li>Generic Data
    93. 93. Event Notifications</li></ul>Format<br /><ul><li>Records
    94. 94. Tuples
    95. 95. Objects
    96. 96. …</li></ul>Support for Uncertainty<br />Data Flows<br /><ul><li>Homogeneous
    97. 97. Heterogeneous</li></ul>58<br />DEBS 2011 - Tutorial<br />
    98. 98. Rule Model<br />Rules are much more complex entities than data items<br />Large number of different approaches<br />Already observed in the previous slides<br />Looking back to our functional model, we classify them into two macro classes<br />Transforming rules<br />Detecting rules<br />Rule<br />Type of Rules<br /><ul><li>Transforming Rules
    99. 99. Detecting Rules</li></ul>Support for Uncertainty<br />59<br />DEBS 2011 - Tutorial<br />
    100. 100. Transforming Rules<br />Do not present an explicit distinction between detection and production<br />Define an execution plan combining primitiveoperators<br />Each operator transforms one or more input flows into one or more output flows<br />The execution plan can be defined<br />explicitly (e.g., through graphical notation)<br />implicitly (using a high level language)<br />Often used with homogeneous information flows<br />To take advantage of the predefined structure of input and output<br />Rule<br />Type of Rules<br /><ul><li>Transforming Rules
    101. 101. Detecting Rules</li></ul>Support for Uncertainty<br />60<br />DEBS 2011 - Tutorial<br />
    102. 102. Detecting Rules<br />Present an explicit distinction between detection and production<br />Usually, the detection is based on a logical predicate that captures patterns of interest in the history of received items<br />Rule<br />Type of Rules<br /><ul><li>Transforming Rules
    103. 103. Detecting Rules</li></ul>Support for Uncertainty<br />61<br />DEBS 2011 - Tutorial<br />
    104. 104. Support for Uncertainty<br />Two orthogonal aspects<br />Support for uncertain input<br />Allows rules to deal with/reason about uncertain input data<br />Support for uncertain output<br />Allows rules to associate a degree of uncertainty to the output produced<br />Rule<br />Type of Rules<br /><ul><li>Transforming Rules
    105. 105. Detecting Rules</li></ul>Support for Uncertainty<br />62<br />DEBS 2011 - Tutorial<br />
    106. 106. Language Model<br />Specify operations to<br />Filter<br />Join<br />Aggregate<br />input flows …<br />… to produce one or more output flows<br />Following the rule model, we define two classes of languages:<br />Transforming languages<br />Declarative languages<br />Imperative languages<br />Detecting languages<br />Pattern-based<br />63<br />DEBS 2011 - Tutorial<br />
    107. 107. Following the rule model, we define two classes of languages:<br />Transforming languages<br />Declarative languages<br />Imperative languages<br />Detecting languages<br />Pattern-based<br />Language Model<br />Specify the expected result rather than the desired execution flow<br />Usually derive from relational languages<br />Relational algebra / SQL<br />CQL/Stream:<br />Select IStream(*)<br />From F1[Rows 5],<br /> F2[Rows 10]<br />Where F1.A = F2.A<br />64<br />DEBS 2011 - Tutorial<br />
    108. 108. Following the rule model, we define two classes of languages:<br />Transforming languages<br />Declarative languages<br />Imperative languages<br />Detecting languages<br />Pattern-based<br />Language Model<br />Specify the desired execution flow<br />Starting from primitive operators<br />Can be user-defined<br />Usually adopt a graphical notation<br />65<br />DEBS 2011 - Tutorial<br />
    109. 109. Imperative Languages<br />Aurora (Boxes & Arrows Model)<br />66<br />DEBS 2011 - Tutorial<br />
    110. 110. Hybrid Languages<br />Oracle CEP<br />67<br />DEBS 2011 - Tutorial<br />
    111. 111. Following the rule model, we define two classes of languages:<br />Transforming languages<br />Declarative languages<br />Imperative languages<br />Detecting languages<br />Pattern-based<br />Language Model<br />Specify a firing condition as a pattern<br />Select a portion of incoming flows through<br />Logic operators<br />Content / timing constraints<br />The action uses selected items to produce new knowledge<br />68<br />DEBS 2011 - Tutorial<br />
    112. 112. Detecting Languages<br />Action<br />TESLA / T-Rex<br />DefineFire(area: string, measuredTemp: double) <br />From Smoke(area=$a) and last <br />Temp(area=$a and value>45)<br />within 5 min. from Smoke <br />Where area=Smoke.areaand<br />measuredTemp=Temp.value<br />Condition (Pattern)<br />69<br />DEBS 2011 - Tutorial<br />
    113. 113. Language Model<br />Different syntaxes / constructs / operators<br />Comparison of languages semantics still an open issue<br />Our approach:<br />Review all operators encountered in the analysis of systems<br />Specifying the classes of languages adopting them<br />Trying to capture some semantics relationship<br />Among operators<br />70<br />DEBS 2011 - Tutorial<br />
    114. 114. Language Model<br />Single-Item operators<br />Selection operators<br />Filter items according to their content<br />Elaboration operators<br />Projection<br />Extracts a part of the content of an item<br />Renaming<br />Changes the name of a field in languages based on records or tuples<br />Present in all languages<br />Defined as primitive operators in imperative languages<br />Declarative languages inherit selection, projection, and renaming from relational algebra<br />Renaming<br />Select RStream<br />(I.Price as HighPrice)<br />From Items[Rows 1] as I<br />Where I.Price > 100<br />Projection<br />Selection<br />71<br />DEBS 2011 - Tutorial<br />
    115. 115. Language Model<br />Single-Item operators<br />Selection operators<br />Filter items according to their content<br />Elaboration operators<br />Projection<br />Extracts a part of the content of an item<br />Renaming<br />Changes the name of a field in languages based on records or tuples<br />Pattern-based languages<br />Selection inside the condition part (pattern)<br />Elaboration as part of the action<br />Projection<br />DefineExpensiveItem (highPrice: double) <br />From Item(price>100)<br />Where highPrice = price<br />Selection<br />Renaming<br />72<br />DEBS 2011 - Tutorial<br />
    116. 116. Language Model<br />Logic Operators<br />Conjunction<br />Disjunction<br />Repetition<br />Negation<br />Explicitly present in pattern-based languages<br />PADRES<br />(A & B) || (C & D)<br />Conjunction<br />Disjunction<br />73<br />DEBS 2011 - Tutorial<br />
    117. 117. Language Model<br />Logic Operators<br />Conjunction<br />Disjunction<br />Repetition<br />Negation<br />Some logic operators are blocking<br />Express pattern whose validity cannot be decided into a bounded amount of time<br />E.g., Negation<br />Used in conjunction with windows<br />DefineFire()<br />From Smoke(area=$a) and<br /> not Rain(area=$a)<br /> within 10 min from Smoke<br />Negation<br />Window<br />74<br />DEBS 2011 - Tutorial<br />
    118. 118. Language Model<br />Logic Operators<br />Conjunction<br />Disjunction<br />Repetition<br />Negation<br />Traditionally, logic operators were not explicitly offered by declarative and imperative languages<br />However, they could be expressed as transformation of input flows<br />Conjunction of A and B<br />Select IStream<br /> (F1.A, F2.B)<br />From F1 [Rows 10],<br /> F2 [Rows 20]<br />75<br />DEBS 2011 - Tutorial<br />
    119. 119. Language Model<br />Sequences<br />Similar to logic operators<br />Based on timing relations among items<br />Present in almost all pattern-based languages<br />DefineFire() <br />From Smoke(area=$a) and last <br /> Temp(area=$a and value>45)<br /> within 5 min. from Smoke <br />Sequence (time-bounded)<br />76<br />DEBS 2011 - Tutorial<br />
    120. 120. Language Model<br />Sequences<br />Similar to logic operators<br />Based on timing relations among items<br />Traditionally, transforming languages did not provide sequences explicitly<br />Could be expressed with an explicit reference to timestamps<br />If present inside items<br />Select IStream<br /> (F1.A, F2.B)<br />From F1 [Rows 10], F2 [Rows 20]<br />Where F1.timestamp <F2.timestamp<br />Impose timestamp order<br />77<br />DEBS 2011 - Tutorial<br />
    121. 121. Language Model<br />Iterations<br />Express possibly unbounded sequences of items …<br />… satisfying an iterating condition<br />Implicitly defines an ordering among items<br />Iteration (Kleene +)<br />SASE+<br />PATTERN SEQ(Alert a, Shipment+ b[ ])<br />WHERE skip_till_any_match(a, b[ ]) {<br />a.type= ’contaminated’ and<br /> b[1].from = and b[i].from = b[i-1].to<br />} WITHIN 3 hours<br />78<br />DEBS 2011 - Tutorial<br />
    122. 122. Language Model<br />Logic operators, sequences, and iterations traditionally not offered by transforming languages<br />And now?<br />Current trend:<br />Embed patterns inside declarative languages<br />Especially adopted in commercial systems<br />Esper<br />Select A.price<br />From pattern<br /> [every (A (B or C))]<br />Where A.price > 100 <br />IMO: difficult to write / understand<br />Mailing list of Esper!<br />79<br />DEBS 2011 - Tutorial<br />
    123. 123. Language Model<br />Windows<br />Kind:<br />Logical (Time-Based)<br />Physical (Count-Based)<br />User-Defined<br />Movement<br />Fixed<br />Landmark<br />Sliding<br />Pane<br />Tumble<br />Logical<br />SelectIStream(Count(*))<br />From F1[Range 1 Minute]<br />Physical<br />Select IStream(Count(*))<br />From F1[Rows 50 Slide 10]<br />80<br />DEBS 2011 - Tutorial<br />
    124. 124. Language Model<br />User-Defined (Stream-Mill)<br />WINDOW AGGREGATE positive min(Next Real): Real {<br /> TABLE inwindow(wnext real);<br /> INITIALIZE : { }<br /> ITERATE : {<br /> DELETE FROM inwindow<br /> WHERE wnext < 0<br /> INSERT INTO RETURN<br /> SELECT min(wnext)<br /> FROM inwindow<br /> } <br /> EXPIRE : { }<br />}<br />81<br />DEBS 2011 - Tutorial<br />
    125. 125. Language Model<br />Windows are used to limit the scope of blocking operators<br />They are generally available in declarative and imperative languages<br />They are not present in all pattern-based languages<br />Some of them do not include blocking operators<br />Some of them “embed” windows inside operators<br />Making them unblocking<br />CEDR<br />EVENT Test-Rule<br />WHENUNLESS(A, B, 12 hours)<br />WHERE A.a< B.b<br />82<br />DEBS 2011 - Tutorial<br />
    126. 126. Language Model<br />Flow management operators<br />Required by declarative and imperative languages to merge, split, organize, and process incoming flows of information<br />Flow Management Operators<br />Join<br />Order By<br />Union<br />Group By<br />Except<br />Flow Creation<br />Bag Operators<br />Intersect<br />Duplicate<br />Remove-duplicates<br />83<br />DEBS 2011 - Tutorial<br />
    127. 127. Language Model<br />Parameterization<br />Allows the binding of different information items based on their content<br />Offered implicitly by declarative and imperative languages<br />Through a combination of join and selection<br />Offered as an explicit operator in pattern-based languages<br />Join<br />CQL / Stream<br />SelectIStream (F1.A, F2.B)<br />From F1 [Rows 10], F2 [Rows 20]<br />Where F1.A > F2.B<br />Selection<br />Explicit Parameter<br />84<br />TESLA / T-Rex<br />Define Fire() <br />From Smoke(area=$a) and last <br /> Temp(area=$a and value>45)<br /> within 5 min. from Smoke <br />DEBS 2011 - Tutorial<br />
    128. 128. Language Model<br />Detection<br />Aggregate<br />DefineFire(area: string,<br />measuredTemp: double) <br />From Smoke(area=$a) and<br />45 < Avg(Temp(area=$a).value<br />within 5 min. from Smoke)<br />Where area=Smoke.areaand measuredTemp=Temp.value<br />Aggregates<br />Scope<br /><ul><li>Detection Aggregates
    129. 129. Production Aggregates</li></ul>Definition<br />DefineFire(area: string,<br />measuredTemp: double) <br />From Smoke(area=$a) and<br />last Temp(area=$a and value>45)<br />within 5 min. from Smoke)<br />Where area=Smoke.areaand measuredTemp=Avg(Temp(area=$a).value)<br />within 1 hour from Smoke<br /><ul><li>Predefined
    130. 130. User-defined</li></ul>Production<br />Aggregate<br />85<br />DEBS 2011 - Tutorial<br />
    131. 131. Discussion<br />86<br />DEBS 2011 - Tutorial<br />
    132. 132. Abstraction<br />IFP languages and systems offer a level of abstraction over flowing information<br />Similar to the role SQL / DBMSs play for static data<br />The heterogeneity of solutions suggests that finding the “right” abstraction is still an open issue<br />Several open questions<br />Applications<br />API (Including Rules)<br />IFP System<br />87<br />DEBS 2011 - Tutorial<br />
    133. 133. Abstraction<br />Which are the requirements of applications?<br />According to the results of EPTS survey<br />Simple operations (mainly aggregate / join)<br />Few sources (mainly 1-10, and 10-100)<br />Low rates (mainly 10-100, and 100-1000 event/s)<br />Data coming from DBMSs / files<br />88<br />DEBS 2011 - Tutorial<br />
    134. 134. Abstraction<br />How can we explain these answers?<br />IFP systems used as “enhanced” DBMSs<br />IMO, companies are using the technology they know<br />E.g. Language: no need for patterns or no knowledge about patterns?<br />Features vs Simplicity/Integration<br />This is the “consolidated” part of IFP systems<br />IMO, raising the level of abstraction would make IFP systems more appealing for a larger audience<br />Is it possible to combine advanced features and ease of use?<br />89<br />DEBS 2011 - Tutorial<br />
    135. 135. Abstraction<br />Other details are present in the EPTS survey<br />Most of the people was referring to technologies that were “already deployed” or “in development”<br />The systems were chosen based on the trust on the vendor<br />This suggests that the last (research) advancements in event processing may be still unknown<br />They lack a solid implementation …<br />… that can easily work side by side with existing data processing systems<br />90<br />DEBS 2011 - Tutorial<br />
    136. 136. Abstraction<br />How many kinds of languages / abstractions?<br />Traditionally, two<br />Transforming rules<br />Based on the Stream model<br />Detecting rules<br />Based on patterns<br />Often merged together in commercial systems<br />“Merged together” does not mean “organically combined”<br />The underlying model is usually derived from the Stream model<br />Good integration with relational languages<br />Difficult to integrate with patterns<br />Difficult to understand the semantics of rules<br />Can we offer a better model / formalism?<br />91<br />DEBS 2011 - Tutorial<br />
    137. 137. Abstraction<br />Do we really need “One abstraction to rule ‘em all”?<br />Again, we need to better understand application requirements …<br />… but we also need to better analyze the expressiveness and semantics of languages!<br />In our survey, we offer a description of existing operators<br />Open issues:<br />How do operators contribute to the overall expressiveness of languages?<br />Is it possible to find a “minimal set” of operators to organically combine the capabilities of transforming and detecting rules?<br />92<br />DEBS 2011 - Tutorial<br />
    138. 138. Our Proposal: TESLA<br />“One size fits all” language does not exist<br />At least, we have to find it yet<br />We started from the following requirements<br />Focus specifically on events<br />Not generic data processing<br />Define a clear and precise semantics<br />Issues of selection and consumption policies<br />General issue of formal semantics<br />Be expressive<br />Useful for applications<br />Keep it simple<br />Easy to read and write<br />93<br />DEBS 2011 - Tutorial<br />
    139. 139. DefineCE(Att1 : Type1, …, Attn : Typen)<br />From Pattern<br />Where Att1 = f1(..), …, Attn = fn(..)<br />Consuminge1, …, em<br />TESLA<br />94<br />DEBS 2011 - Tutorial<br />
    140. 140. Patterns in TESLA<br />CE<br />Selection of a single event<br />A(x>10)<br />Timer()<br />Selection of sequences<br />A(x>10) and each B within 5 min from A<br />A(x>10) and last B within 5 min from A<br />A(x>10) and first B within 5 min from A<br />Generalization<br />n-first / n-last<br />A(X=15)<br />B<br />B<br />5 min<br />CE<br />CE<br />A (X=15)<br />B<br />B<br />CE<br />A (X=15)<br />B<br />B<br />CE<br />A (X=15)<br />B<br />B<br />95<br />DEBS 2011 - Tutorial<br />
    141. 141. Patterns in TESLA<br />TESLA allows *-within operators to be composed with each other:<br />In chains of events<br />A and each B<br />within 3 min from A<br /> and last C<br />within 2 min from B<br />In parallel<br />A and each B<br />within 3 min from A<br /> and last C<br />within 4 min from A<br />Parameters can be added between events in a pattern<br />A<br />B<br />C<br />B<br />A<br />C<br />96<br />DEBS 2011 - Tutorial<br />
    142. 142. Negations and Aggregates<br />Two kinds of negations:<br />Interval based:<br />A and last B<br />within 3 min from A<br />and not C between B and A<br />Time based:<br />A and not C within 3 min from A<br />Similarly, two kinds of aggregates<br />Interval based<br />Use values appearing between two events<br />Time based<br />Use values appearing in a time interval<br />97<br />DEBS 2011 - Tutorial<br />
    143. 143. Iterations<br />We believe that explicit operators for repetitions are difficult to use/understand<br />Especially when different selection/consumption policies are allowed<br />We achieve the same goal using hierarchies of events<br />Complex events can be used to define (more) complex events<br />Recurring schemes of repetitions<br />Macros! <br />98<br />DEBS 2011 - Tutorial<br />
    144. 144. Formal Semantics<br />The semantics of each TESLA operator / rule is formally defined through a set of TRIO metric temporal logic formulas<br />They define an equivalence between the presence of a given pattern and the occurrence of one or more complex events, specifying:<br />The occurrence time of complex events<br />The content of complex events<br />The set of consumed events<br />The language is unambiguous<br />A user can in advance check the semantics of her rules<br />This makes it possible to check the correctness of an event detection engine using a model checker<br />Given a history of events and a set of TESLA rules …<br />… does the engine satisfy all the formulas ?<br />99<br />DEBS 2011 - Tutorial<br />
    145. 145. Abstraction<br />Support for uncertainty<br />Many applications deal with imprecise (or even incorrect) data from sources<br />E.g., sensors, RFID, …<br />Uncertain rules may increase the expressiveness / usefulness of an IFP system<br />Addressed only in a few research papers …<br />… and still missing in existing systems<br />100<br />DEBS 2011 - Tutorial<br />
    146. 146. Abstraction<br />Which are the key requirements for an uncertainty model?<br />It should be suitable for the application it is designed for<br />Is it possible to have a single model?<br />It should be easy to manage / understand<br />To allow application domain experts to easily build rules on top of it<br />It should have a limited impact on the performance of the overall system<br />To preserve the required QoS<br />101<br />DEBS 2011 - Tutorial<br />
    147. 147. Abstraction<br />Support for QoS policies<br />Allow users to define application-dependent policies<br />Which data is more important?<br />Which items is it better to discard in case of system overload?<br />Which QoS metric is more significant for the application?<br />Completeness of results<br />Latency<br />…<br />Traditionally offered only by DSMSs<br />Most systems simply adopt a best-effort strategy<br />102<br />DEBS 2011 - Tutorial<br />
    148. 148. Abstraction<br />Support for a Knowledge Base<br />Some systems offer special operations to read from persistent storage<br />Possibly a key-feature for the integration with existing DBMSs<br />Support for periodic evaluation of rules<br />As opposed to purely reactive evaluation<br />103<br />DEBS 2011 - Tutorial<br />
    149. 149. Abstraction<br />Capability to dynamically change the set of deployed rules<br />Add / Remove<br />Activate / Deactivate<br />Context awareness<br />Tutorial at DEBS 2010<br />Could be used to increase the expressiveness of IFP …<br />… but also to simplify / speedup processing<br />Context / Situation used to reason on the status of the system / application<br />Possibly combined with the possibility to activate / deactivate rules<br />104<br />DEBS 2011 - Tutorial<br />
    150. 150. Abstraction<br />Where to offer the IFP abstraction?<br />As a standalone product/system<br />Similar to DBMSs<br />As a middleware component<br />Similar to a Publish/Subscribe system<br />To guide the interaction of other components<br />As part of a programming language<br />Similar to what LINQ (Language-Integrated Query) offers to C# and to the .Net Framework<br />Explored in EventJava<br />Extension of Java for event correlation<br />105<br />DEBS 2011 - Tutorial<br />
    151. 151. Time<br />Going back to our time model …<br />… is there a “right approach” to deal with time?<br />Using models that define a total order among information items seems to me easier to manage/understand<br />Do we really need interval time for events?<br />How do we define operators like sequences?<br />To offer a proper level of abstraction …<br />… but also to allow an efficient implementation<br />106<br />DEBS 2011 - Tutorial<br />
    152. 152. Time<br />Many existing systems offer operators that rely of some notion of time<br />Main problem: out-of-order arrival of information items<br />Requirements<br />Preserve the semantics of processing<br />Keep the delay for obtaining results as small as possible<br />Solutions<br />Trade-off between correct and low latency processing<br />Sometimes controlled by users through predefined or customizable policies<br />Open issues<br />Solutions for distributed / incremental processing may introduce delay at each processing step<br />Problem not addressed in existing solutions for distributed processing<br />107<br />DEBS 2011 - Tutorial<br />
    153. 153. Processing<br />Many processing algorithms proposed<br />DSMSs cannot exploit traditional indexing techniques<br />New effort is required to efficiently evaluate standing queries<br />Most CEP systems are based on automata …<br />… or in general incremental processing<br />Perform processing incrementally, as new information is available<br />Academia: ODE, Cayuga, NextCEP, Amit, TRex, etc.<br />Industry: Esper/Nesper<br />We are currently investigating other solutions<br />Keep the whole history of all received information items<br />Ad-hoc data structures that simplify information retrieval<br />Start evaluation only when required<br />Faster!<br />Easier to parallelize (more on this later …)<br />108<br />DEBS 2011 - Tutorial<br />
    154. 154. Our Experience<br />Processing Time (microsec)<br />Multiple Selection Policy<br />Processing Time (microsec)<br />Single Selection Policy<br />109<br />DEBS 2011 - Tutorial<br />
    155. 155. Processing<br />Are we using the right algorithms?<br />Can we isolate the best algorithm for a particular set of operators?<br />Accurate analysis of performance<br />Can we switch among different algorithms depending from the workload?<br />Deployed rules …<br />… but also analysis of traffic<br />Adaptive middleware<br />110<br />DEBS 2011 - Tutorial<br />
    156. 156. Processing<br />Exploit parallel hardware<br />Multi-core CPUs, but also GP-GPUs<br />To handle several rules concurrently<br />To speedup complex operations inside a single rule<br />Only partially investigated<br />E.g., streaming aggregates<br />Can be extended to different operators?<br />Also in this case the best solution may depend from the workload<br />111<br />DEBS 2011 - Tutorial<br />
    157. 157. Our Experience<br />112<br />DEBS 2011 - Tutorial<br />
    158. 158. Our Experience<br />GPU can significantly speedup huge computations<br />Fixed overhead to activate it makes it unsuitable for simple tasks<br />Optimal workload: reduced set of very complex rules<br />Additional advantage: CPU cores are free for other task<br />Process simple rules …<br />… but also handle marshalling/unmarshalling and communication with sources and sinks<br />The trend. GPUs are becoming:<br />Easier to program<br />Suitable for an increasing number of tasks<br />Not only data parallel algorithms<br />113<br />DEBS 2011 - Tutorial<br />
    159. 159. Processing<br />Exploit existing models for computation on clusters / clouds<br />Recently received great attention<br />E.g. Map-Reduce<br />Are these models suitable for IFP systems?<br />To efficiently support the expressiveness of existing languages<br />Can we create new models?<br />Explicitly conceived to deal with IFP rules<br />114<br />DEBS 2011 - Tutorial<br />
    160. 160. Processing<br />Exploit domain-dependent information to speedup processing<br />E.g., implication between complex events or situations<br />Can be part of context information, or combined with it<br />In certain cases information can be automatically deduced<br />Depending from the data and rule models adopted<br />115<br />DEBS 2011 - Tutorial<br />
    161. 161. Distribution of Processing<br />Most systems rely on centralized processing<br />When distributed processing is allowed, a clustered deployment model is often adopted<br />Cooperating processors are co-located<br />The problem of networked distribution has been addressed only marginally<br />Are there applications that involve large scale scenarios?<br />E.g. monitoring applications<br />Potentially introduce new issues<br />Bandwidth may become a bottleneck<br />Data may be transferred over non-dedicated channels<br />Several applications may run on the same “processing network”<br />116<br />DEBS 2011 - Tutorial<br />
    162. 162. Operator Placement<br />Current approaches mainly focus on (sub)optimal placement with global knowledge<br />Often based on over-simplified model …<br />… that ignore important implementation details<br />Research directions<br />Distributed protocols allowing autonomous decisions at each node<br />Fast adaptation to application load<br />Monitoring and statistical analysis of data<br />Optimization of multiple rules on the same processing network<br />117<br />DEBS 2011 - Tutorial<br />
    163. 163. Workloads<br />A major problem when it comes to evaluate and compare different systems is the lack of workloads<br />Benchmark<br />Only one benchmark available: Linear Road<br />Strongly tailored for the Stream model<br />Difficult to adapt to other systems<br />Huge parameter space<br />Difficult to isolate relevant cases<br />DEBS Grand Challenge may be a first step in this direction …<br />Data coming from real / realistic use cases<br />To understand which operators are more important …<br />… and what to optimize in the processing algorithm<br />118<br />DEBS 2011 - Tutorial<br />
    164. 164. Security / Privacy<br />Define security models and policies<br />Which pieces of information can be consumed be a single operator or a set of operators<br />Important: output data may have a different visibility w.r.t. input data!<br />The system may be allowed to consume single data items, but not to extract new information from them, or to distribute this information<br />Implement protocols and algorithms to enforce them<br />119<br />DEBS 2011 - Tutorial<br />
    165. 165. Thanks for your Attention!<br />120<br />DEBS 2011 - Tutorial<br />
    166. 166. Questions?<br />121<br />DEBS 2011 - Tutorial<br />