AAAI 2011 event processing tutorial
AAAI 2011 - Tutorial: Introduction to event processing and challenges for the next generations of event processing of interest to the AI community

AAAI 2011 event processing tutorial: Presentation Transcript

  • Event processing – State of the art and research challenges. AAAI 2011 Tutorial, San Francisco, August 7th, 2011. Opher Etzion ([email_address]), Yagil Engel ([email_address])
  • Slides available at: ie.technion.ac.il/~yagile/EP_Tutorial.pdf
  • Imagine that… A driver gets a notification on the car screen: the person crossing the street is an Alzheimer patient off his regular route; he lives at 5 King Street. A national park gets information from the car computers on all cars heading to the park; it can open more parking lots and notify cars that the park will be closed.
  • Agenda: I. Introduction and roots of event processing; II. Players and architecture of event processing; III. Current state of the art in event processing; IV. Challenges in event processing systems; V. Summary
  • I: Introduction and roots of event processing
  • What is “event processing” anyway? Event processing is a form of computing that performs operations on events
  • In computing, we have processed events since the early days (e.g., network and system management)
  • Emerging technologies in enterprise computing (Gartner Hype Cycle, Summer 2009)
  • What's new? The analogy: moving from files to DBMS. In recent years, architectures, abstractions, and dedicated commercial products have emerged to support functionality that was traditionally carried out within regular programming. For some applications this improves TCO; for others it breaks the cost-effectiveness barrier.
  • What is an event – three views. The happening view: An event is anything that happens, or is contemplated as happening. The state change view: An event is a change of state of anything. The detectable condition view: An event is a detectable condition that can trigger a notification.
  • In daily life we often react to events.
  • Many times we react to a combination of events within a context. The house sensor detects that the child did not arrive home within 2 hours of the scheduled end of classes for the day. I want to be notified when my own investment portfolio is down 5% since the start of the trading day; have an agent call me when I am available, send an SMS when I am in a meeting, and email me when I am out of the office.
  • Event Patterns Pattern detection is one of the notable functions of event processing
  • What we actually want to react to are situations (toll violator, frustrated customer). Sometimes the situation is determined by detecting that some pattern occurred in the flowing events (toll violation). Sometimes the events can only approximate or indicate with some certainty that the situation has occurred (frustrated customer).
  • Event processing is being used for various reasons
  • Ancestor: Production Rules. When Precondition, Fire Action. The precondition acts as an implicit event when the rule is activated in forward chaining
  • Ancestor: active databases. On event, When condition, Do action, With coupling mode. Composite events were inherited by event processing
  • Ancestor: Data Stream management system. Source: Ankur Jain's website
  • Event processing and Data stream management? Aliases? One of them subset of the other? Totally unrelated concepts?
  • Ancestor: Temporal databases There is a substantial temporal nature to event processing. Recently – also spatial and spatio-temporal functions are being added
  • Ancestor: Discrete event simulation
  • Ancestor: Formal Verification
  • Ancestor: Network and system management
  • Ancestor: Messaging – pub/sub middleware
  • II: Players and architecture of event processing
  • Event Driven Architecture Event driven architecture: asynchronous, decoupled; each component is autonomic.
  • Fast Flower Delivery (diagram: Flower Store, Van Driver, Driver's Guild, Bid System, Assignment System, Control System, Ranking and Reporting System, GPS Location Service). Source: http://www.ep-ts.com/EventProcessingInAction
  • Event Processing Agent Context Event Channel Event Consumer Event Type Event Producer Global State The seven Building blocks
  • Event processing network
  • Example of EPN – part of the FFD example
  • Event type definition Detection time, Occurrence time, source, Certainty… Stock id, quote, volume… Free comments…
  • Producer – State Observer in workflows State observer Push: Instrumentation points; Pull: Query the state
  • Producer – Code instrumentation
  • Producer – syndication
  • Producers – video streams to events
  • Producer – sensors
  • Producer and consumer - Sixth sense
  • Twitter as a producer and consumer
  • Consumer - Performance monitoring dashboard
  • Consumer - Ambient Orb
  • Event Processing Agent Filter Transform Detect Pattern Translate Aggregate Split Compose Enrich Project Event Processing Agents
  • The EPA picture
  • Filter EPA: A filter EPA is an EPA that performs filtering only, and has no matching or derivation steps, so it does not transform the input event.
  • Transform EPA sub types
  • Sample of pattern types
    • all pattern is satisfied when the relevant event set contains at least one instance of each event type in the participant set
    • any pattern is satisfied if the relevant event set contains an instance of any of the event types in the participant set
    • absence pattern is satisfied when there are no relevant events
    • relative N highest values pattern is satisfied by the events which have the N highest value of a specific attribute over all the relevant events, where N is an argument
    • value average pattern is satisfied when the value of a specific attribute, averaged over all the relevant events, satisfies the value average threshold assertion.
    • always pattern is satisfied when all the relevant events satisfy the always pattern assertion
    • sequence pattern is satisfied when the relevant event set contains at least one event instance for each event type in the participant set, and the order of the event instances is identical to the order of the event types in the participant set.
    • increasing pattern is satisfied by an attribute A if for all the relevant events, e1 << e2 ⇒ e1.A < e2.A
    • relative max distance pattern is satisfied when the maximal distance between any two relevant events satisfies the max threshold assertion
    • moving toward pattern is satisfied when for any pair of relevant events e1, e2 we have e1 << e2 ⇒ the location of e2 is closer to a certain object than the location of e1.
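A few of the pattern definitions above can be sketched as plain Python predicates. This is only an illustrative sketch: the `Event` class, its fields, and the function names are assumptions made here and do not come from any event processing product.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    type: str                      # event type name (hypothetical field)
    time: float                    # occurrence time
    attrs: dict = field(default_factory=dict)


def all_pattern(events, participant_types):
    """Satisfied when every participant type has at least one instance."""
    present = {e.type for e in events}
    return set(participant_types) <= present


def any_pattern(events, participant_types):
    """Satisfied when an instance of any participant type is present."""
    return any(e.type in participant_types for e in events)


def absence_pattern(events):
    """Satisfied when there are no relevant events."""
    return len(events) == 0


def sequence_pattern(events, participant_types):
    """Satisfied when instances of the participant types occur in the listed order."""
    idx = 0
    for e in sorted(events, key=lambda e: e.time):
        if idx < len(participant_types) and e.type == participant_types[idx]:
            idx += 1
    return idx == len(participant_types)


def increasing_pattern(events, attr):
    """Satisfied when e1 << e2 implies e1.attr < e2.attr.

    Assumes distinct occurrence times, so checking consecutive events suffices.
    """
    ordered = sorted(events, key=lambda e: e.time)
    values = [e.attrs[attr] for e in ordered]
    return all(a < b for a, b in zip(values, values[1:]))
```

Each function receives the relevant event set that a context has already selected, which keeps the pattern logic separate from windowing.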
  • Pattern detection example Pattern name: Manual Assignment Preparation Pattern Type: relative N highest Context: Bid Interval Relevant event types: Delivery Bid Pattern parameter: N = 5; value = Ranking Cardinality: Single deferred Find the five highest bids within the bid interval Taken from the Fast Flower Delivery use case
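The "relative N highest" pattern in this example can be sketched in Python as follows. The bid records and the `ranking` field name are hypothetical. The function is meant to be called once, when the bid interval closes, which loosely mirrors the "single deferred" cardinality.

```python
import heapq


def relative_n_highest(events, attribute, n):
    """Return the n events with the highest value of `attribute`, highest first."""
    return heapq.nlargest(n, events, key=lambda e: e[attribute])


# Hypothetical Delivery Bid events collected within one bid interval.
bids = [
    {"driver": "d1", "ranking": 7},
    {"driver": "d2", "ranking": 3},
    {"driver": "d3", "ranking": 9},
    {"driver": "d4", "ranking": 5},
    {"driver": "d5", "ranking": 8},
    {"driver": "d6", "ranking": 1},
    {"driver": "d7", "ranking": 6},
]

# Evaluated once, at the end of the bid interval.
top_five = relative_n_highest(bids, "ranking", 5)
```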
  • Our entire culture is context sensitive
    • In the play “The Teahouse of the August Moon”, one of the characters says: “Pornography question of geography”
    • This says that in different geographical contexts people view things differently
    • Furthermore, the syntax of the language (no verbs) is typical of the way the people of Okinawa talk
    When attending a concert, people do not talk or eat, and they keep their mobile phones on “silent”.
  • Context has three distinct roles (which may be combined). Partitioning the incoming events: the events that relate to each customer are processed separately. Grouping events together: grouping together events that happened in the same hour at the same location. Determining the processing: different processing for different context partitions.
  • Context Definition: A context is a named specification of conditions that groups event instances so that they can be processed in a related way. It assigns each event instance to one or more context partitions. A context may have one or more context dimensions: temporal, spatial, state oriented, segmentation oriented.
  • Context Types. Temporal: fixed interval, event interval, sliding fixed interval, sliding event interval. Spatial: fixed location, entity distance location, event distance location. State oriented. Segmentation oriented.
  • Context Types Examples. Temporal: “Every day between 08:00 and 10:00 AM”; “A week after borrowing a disk”; “A time window bounded by TradingDayStart and TradingDayEnd events”. Spatial: “3 miles from the traffic accident location”; “Within an authorized zone in a manufactory”. Segmentation oriented: “All children 2-5 years old”; “All platinum customers”. State oriented: “Airport security level is red”; “Weather is stormy”.
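As a rough illustration of how a context assigns event instances to partitions, the sketch below combines a segmentation-oriented dimension (per customer) with a fixed temporal interval (one hour). The event fields `customer` and `time` and the one-hour default are assumptions chosen for the example.

```python
from collections import defaultdict


def partition_key(event, interval_seconds=3600):
    """Composite context: segmentation by customer plus a fixed temporal interval."""
    window_index = int(event["time"] // interval_seconds)
    return (event["customer"], window_index)


def partition(events, interval_seconds=3600):
    """Group events into context partitions keyed by (customer, window index)."""
    partitions = defaultdict(list)
    for e in events:
        partitions[partition_key(e, interval_seconds)].append(e)
    return dict(partitions)


events = [
    {"customer": "a", "time": 10},
    {"customer": "a", "time": 3599},
    {"customer": "a", "time": 3600},   # falls into the next hourly window
    {"customer": "b", "time": 20},
]
parts = partition(events)
```

Each partition can then be handed to its own pattern evaluation, which is the "processed separately" role of a context.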
  • III: The current state of the art in event processing
  • An Observation: The Babylon Tower symbolizes the tendency of humanity to talk in multiple languages.
    • The Event Processing area is no different: most languages in the industry really follow the hammer and nails syndrome and extend existing approaches:
    • imperative script language
    • SQL extensions
    • Extension of inference rule language
    The EPTS language analysis workgroup aims to understand the various styles and extract common functions that can be used to define what an event processing language is; this tutorial is an interim report. It does not seem that we will succeed in settling on a single programming style in the near future.
  • The Babylon tower and current state of the practice
  • StreamBase Studio
  • StreamBase Pattern Matching
  • CCL Studio (Coral8 → Sybase)
  • CCL – Pattern Matching
    • RFID monitoring application
      • Checks if a tag has been seen by readers A and B, then C, but not D, within a 10 second window.
    Insert into StreamAlerts
    Select StreamA.id
    From StreamA a, StreamB b, StreamC c, StreamD d
    Matching [10 seconds: a && b, c, !d]
    On a.id = b.id = c.id = d.id
  • Microsoft StreamInsight
    var topfive = (from window in inputStream.Snapshot()
                   from e in window
                   orderby e.f ascending, e.i descending
                   select e).Take(5);
    var avgCount = from v in inputStream
                   group v by v.i % 4 into eachGroup
                   from window in eachGroup.Snapshot()
                   select new { avgNumber = window.Avg(e => e.number) };
  • Esper EPL – FFD Example
    /*
     * Not delivered up after 10 mins (600 secs) of the request target delivery time
     */
    insert into AlertW(requestId, message, driver, timestamp)
    select a.requestId, "not delivered", a.driver, current_timestamp()
    from pattern[
      every a=Assignment ->
        (timer:interval(600 + (a.deliveryTime-current_timestamp)/1000)
         and not DeliveryConfirmation(requestId = a.requestId)
         and not NoOneToReceiveMSG(requestId = a.requestId))
    ];
  • ruleCore - Reakt
    • Event stream view - a unique context of events
      • a view contains a window into the inbound stream of events and contains commonly only semantically related events
    • Situation - an interesting combination of multiple events as they occur over time
      • An item with an RFID tag being picked up from the shelf and then moving past the checkout without being paid for
    • Rule - an active event processing entity reacting to specific combinations of inbound events over time
    • Action - the last part of a rule's evaluation in response to a detected situation
  • Amit - Situation
  • IBM Websphere Business Events
  • Apama EPL – FFD Examples
  • Performance benchmarks: There is a large variance among applications; thus a collection of benchmarks should be devised, and each application should be classified to a benchmark. Some classification criteria: application complexity, filtering rate, required performance metrics.
  • Performance benchmarks – cont. Previous studies indicate that there is a major performance degradation as application complexity increases: Adi A., Etzion O. Amit - the situation manager. The VLDB Journal – The International Journal on Very Large Databases, Volume 13, Issue 2, 2004. Mendes M., Bizarro P., Marques P. Benchmarking event processing systems: current state and future directions. WOSP/SIPEW 2010: 259-260.
  • Throughput. Input throughput measures the number of input events that the system can digest within a given time interval. Processing throughput measures the total processing times / # of events processed within a given time interval. Output throughput measures the # of events that were emitted to consumers within a given time interval.
  • Latency. At the E2E level, latency is defined as the elapsed time FROM the time-point when the producer emits an input event TO the time-point when the consumer receives an output event. But an input event may not result in an output event: it may be filtered out, participate in a pattern that does not result in pattern detection, or participate in a deferred operation (e.g. aggregation). Similar definitions apply at the EPA level or the path level.
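The throughput and latency measures above can be sketched from timestamp logs. The log layout is an assumption made for this example: a list of timestamps for counting events, and dictionaries keyed by a hypothetical input event id for latency. Input events that never yield an output (filtered out, unmatched, or deferred) simply have no entry in `received` and are skipped.

```python
def events_in_interval(timestamps, start, end):
    """Count events whose timestamp falls in [start, end).

    Applied to input timestamps this gives input throughput; applied to
    timestamps of events emitted to consumers it gives output throughput.
    """
    return sum(1 for t in timestamps if start <= t < end)


def end_to_end_latencies(emitted, received):
    """Latency per input event id, skipping inputs that produced no output.

    emitted:  {input_id: time the producer emitted the input event}
    received: {input_id: time the consumer received the resulting output event}
    """
    return {i: received[i] - emitted[i] for i in emitted if i in received}


emitted = {1: 0.0, 2: 1.0, 3: 2.0}
received = {1: 0.5, 3: 2.25}          # input 2 was filtered out
latencies = end_to_end_latencies(emitted, received)
avg_latency = sum(latencies.values()) / len(latencies)
```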
  • Performance goals and metrics
    • Multi-objective optimization function:
      • min(α * avg latency + (1 − α) * (1 / throughput))
    Max throughput All/ 80% have max/avg latency < δ All/ 90% of time units have throughput > Ω minmax latency minavg latency latency leveling
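The multi-objective function above trades average latency against throughput through a weight between 0 and 1. A minimal sketch follows; the weight is named `alpha` here, and the candidate configurations and their measurements are hypothetical.

```python
def objective(avg_latency, throughput, alpha):
    """Weighted cost: alpha * avg latency + (1 - alpha) * (1 / throughput)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return alpha * avg_latency + (1 - alpha) * (1 / throughput)


# Hypothetical measurements for two deployment configurations.
candidates = {
    "config_a": {"avg_latency": 2.0, "throughput": 4.0},
    "config_b": {"avg_latency": 1.0, "throughput": 0.5},
}

# Pick the configuration that minimizes the weighted cost.
best = min(candidates, key=lambda name: objective(alpha=0.5, **candidates[name]))
```

With `alpha` close to 1 the choice is driven by latency; with `alpha` close to 0 it is driven by throughput.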
  • Scalability in event processing: various dimensions # of producers # of input events # of EPA types # of concurrent runtime instances # of concurrent runtime contexts Internal state size # of consumers # of derived events Processing complexity
  • Scalability solutions: Significant progress in scalability enablers makes systems based on large-scale event sources, event quantities, computations and actuators feasible: smart placement of processing elements with dynamic load balancing; fault tolerance techniques that enable trustworthy automatic processing; virtualization (scale-in); use of parallel processing (multi-core and GPU processors) without extra programming effort
  • IV: Challenges in event processing systems
  • Challenges
    • Inexact Event Processing
    • Predictive Event Processing
    • Use of Machine Learning
    • From Reactive to Proactive
    • Correctness
  • Inexact event processing
  • Uncertain situations False positive: The pattern is matched; The real-world situation does not occur False negative: The pattern is not matched; The real-world situation occurs
  • Temporal indeterminacy T1 T2
  • Predictive Event Processing (1) VS. Photo by Michael Gray, Flickr
  • Predictive Event Processing (2) VS. +
  • Predictive Event Patterns
      • Pattern → Future event, probability, time interval
      • “4 high value deposits from different geographic locations within 3 days”
        • → “0.6 chance for a large transfer abroad, in 1 day”
    “Output event will occur with distribution D over interval (t1,t2)”. Stock decrease of > 5% in 3 hours → Good chance for 2% increase within 2 hours
  • Limitations of the use of rules in specifying predictive event patterns
    • Limitations:
      • Partial patterns
      • Uncertain input events
      • Complex relationship between random variables
    Rule = hard-coded probabilistic Relationship
  • Dynamic event prediction Time Series Prediction Graphical models Temporal Graphical models
  • Graphical Model for Missing a Flight (Logistics Scenario)
  • Predictive Model for Missing a Flight (Logistics Scenario)
  • Continuous Time Bayesian Networks (CTBN, Nodelman et al, 2002)
    • Can be used to model probabilistic and temporal relationship between events E.g., Applied for the problem of detecting host-level attacks in network traffic (Xu and Shelton, 2008)
  • Anomaly Detection in Networks (Xu and Shelton, 2008)
  • CTBN model (Xu and Shelton, 2008)
  • Machine Learning in EP Systems
    • Required for training predictive capabilities:
      • Learn parameters / structure of graphical models
      • Learn predictive rules
    • Discover the patterns used by EPAs
  • Event Pattern Discovery
    • Most (almost all) deployed systems today rely on user input to obtain complex event patterns
    • How can (business) users obtain these patterns?
      • Users do not know all the patterns that are relevant
      • System must be built and maintained by domain experts
  • Requirements of Data Mining Algorithms
    • What DM algorithms should be able to do?
      • Low frequency patterns
      • Temporal Windows
      • Assertions and Thresholds
      • Non-Standard patterns
  • Low Frequency Patterns
    • Detecting rare events:
      • Frauds, attacks
      • Predict crashes
      • Equipment failure
      • Natural disasters
    • Solutions:
      • Low support mining
      • Unsupervised learning for anomaly detection
  • Temporal Windows
    • Time window should be output of the DM process
    • Work by Mannila et al., 1997: WINEPI
  • Assertions and Thresholds
    • Pattern “3 cash deposits in one day” may have no predictive value
    • BUT
    • “3 cash deposits above $10000 from 3 different locations” does
    • Multiattribute mining (Hellerstein et al.)
  • Other kinds of patterns
    • We may be interested in patterns which are not sequential:
      • “All”, “Absence”, “Max Value”, “Sometime”
      • “If there is no deposit to this account in the last year,…”
      • “If the maximal value of deposit to this account in the last year is $5,…”
      • “If at least one of the deposits were made from abroad,…”
  • From Reactive to Proactive
  • Example: Call Center Queue Assignment MDP Model: States (S) : queue status Actions (a) : assignments Reward (R) : penalty for waiting and blocking Transition (T) : call arrival, call ending
  • Proactivity: Call Center Example
  • Proactive Event-Driven Computing (1) predict (states, events) Real-time decision Proactive action events Event processing (filter, transform, match patterns) events Detect / Derive Predict Decide Act events Proactive event-driven computing is a new paradigm aimed at predicting the occurrence of problems or opportunities before they occur, and changing the course of actions to mitigate or leverage them
  • Energy Scenario Detect → Predict Decide → Act Consumption Level Production Level State Generator Failure Generator Fixed Weather Forecast (sun, wind, temp, storm) Consumption Forecast Production Forecast Outage Prediction Many Failed Generators Prediction Call for Urgent Generators Fix Activate Expensive Diesel Generators Declare “Peak Hours” for Tomorrow Activate Rolling Blackout
  • Detect Monitor shipment progress and various related alerts (traffic, cargo handling time at airport, carriers being late) Predict According to current route, the shipment will be 3 hours late and we will incur high penalty Decide Find alternative route which (given new condition) is faster than previous route Act Generate cargo reservations, reroute shipment Critical Shipment Logistics
  • Personal reschedule. Detect: I got out of the house 20 minutes late; there are three spots of traffic congestion on the way to the office; it is raining; and I have an important meeting in 25 minutes! Predict: I am not going to get to the meeting, not even close! Decide: Check whether there is a qualified person for this meeting who can replace me and has a lower priority task for the duration of this meeting, and reschedule his/her other obligations; alternatively, check if there is another time-slot later in the day to which the meeting can be rescheduled, and get a decision! Act: Notify all involved of their reschedule.
  • Electric car – battery replacement overload Detect Tracking the cars driving within a certain area and their battery status. Predict In 2 hours the service stations in the area will be out of charged batteries. Decide Whether there are available spare batteries nearby that can be shipped via car, or a helicopter need to be dispatched to ship batteries from the central store. Act Load batteries on selected means of transportation and start the journey! Background: A company leases electric cars that can drive up to 100 miles; it provides both personal and public battery charge spots, and robotic battery replacement service stations as part of the lease.
  • Portfolio tuning Detect Track corporate actions, news, exchange prices, and rumors about all securities in my portfolio Predict My portfolio is going to exceed my personal risk limit within 1 hour Decide Mark the securities to be sold and best timing to sell, find an alternative to buy that retain the risk limit. Act Buy/Sell orders
  • Predict
    • Uncertain Rules
    • Bayesian Network
    • Classifiers:
        • Decision trees
        • Naïve Bayes
    Decide
    • Temporal Decision Process
    • Optimization tools (black box)
    Probabilistic events Analytics
    • Events
    Actions Proactive Event-Driven Computing (2)
  • Event Processing DM vs. AI DM
    • EP: scalable decision making, under large streams of online information
    • AI: state-based, decision-theoretic deliberation
    • EP+AI: EP synthesizes streams into meaningful bits of information; AI operates on the reduced state space
  • Decision Making for Proactive E-D Computing
    • Decision Rules: EPAs that react to future events
    • Markov Decision Process
      • Model for policy optimization under uncertainty
      • Model must be updated when the predictive EP modules predict relevant future events
        • Requires online adjustment of policy
          • Brafman, Domshlak, Engel, and Feldman, AAAI 2011
    • External Optimization tools
      • E.g., route planner for the logistics scenario
      • Parameterization, or shared resource information
  • Proactive EP: Challenges to the EPN
    • Event Life Span
    • Response from Actuators
    • Multiple Proactive Agents
    • State Driven vs. Event-Driven
  • Correctness: The ability of a developer to create a correct implementation for all cases (including the boundaries). Observation: A substantial amount of effort is invested today in many of the tools to work around the inability of the language to easily create correct solutions
  • Some correctness topics The right interpretation of language constructs The right order of events The right classification of events to windows
  • The right interpretation of language constructs – example All (E1, E2) – what do we mean? A customer both sells and buys the same security in value of more than $1M within a single day Deal fulfillment: Package arrival and payment arrival 6/3 10:00 7/3 11:00 8/3 11:00 8/3 14:00
  • Fine tuning of the semantics (I): When should the derived event be emitted? When the pattern is matched? At the window end?
  • Fine tuning of the semantics (II): How many instances of derived events should be emitted? Only once? Every time there is a match?
  • Fine tuning of the semantics (III) What happens if the same event happens several times? Only one – first, last, higher/lower value on some predicate? All of them participate in a match?
  • Fine tuning of the semantics (IV) Can we consume or reuse events that participate in a match?
  • Fine tuning of semantics – conclusion
    • Some languages have explicit policies:
    • Example: CCL Keep policies
      • KEEP LAST PER Id
      • KEEP 3 MINUTES
      • KEEP EVERY 3 MINUTES
      • KEEP UNTIL ("MON 17:00:00")
      • KEEP 10 ROWS
      • KEEP LAST ROW
      • KEEP 10 ROWS PER Symbol
    In other cases, explicit programming and workarounds are used if the intended semantics differ from the default semantics
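To make the idea of an explicit retention policy concrete, here is a loose Python analogy for two of the policies listed above, a row-count policy and a duration policy. It is not CCL and does not claim to reproduce CCL's exact semantics; the class names and the boundary rule (a row expires once its age reaches the duration) are assumptions.

```python
from collections import deque


class KeepRows:
    """Retain only the most recent n rows (loosely analogous to KEEP n ROWS)."""

    def __init__(self, n):
        self.rows = deque(maxlen=n)

    def insert(self, row):
        self.rows.append(row)   # the oldest row is dropped automatically


class KeepDuration:
    """Retain rows younger than `seconds` (loosely analogous to KEEP 3 MINUTES)."""

    def __init__(self, seconds):
        self.seconds = seconds
        self.rows = deque()     # (timestamp, row) pairs in arrival order

    def insert(self, timestamp, row):
        self.rows.append((timestamp, row))
        # Assumed boundary rule: expire rows whose age has reached the duration.
        while self.rows and self.rows[0][0] <= timestamp - self.seconds:
            self.rows.popleft()


last_two = KeepRows(2)
for value in (1, 2, 3):
    last_two.insert(value)

three_minutes = KeepDuration(180)
for ts, row in ((0, "a"), (100, "b"), (200, "c")):
    three_minutes.insert(ts, row)
```

Stating the policy as an object makes the default visible, which is the point of the slide: when the default does not match the intended semantics, the difference has to be programmed explicitly.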
  • The right order of events - scenario
    • Bid scenario- ground rules:
    • All bidders that issued a bid within the validity interval participate in the bid.
    • The highest bid wins. In the case of tie between bids, the first accepted bid wins the auction
    ===Input Bids===
    Bid Start 12:55:00
    credit bid id=2,occurrence time=12:55:32,price=4
    cash bid id=29,occurrence time=12:55:33,price=4
    cash bid id=33,occurrence time=12:55:34,price=3
    credit bid id=66,occurrence time=12:55:36,price=4
    credit bid id=56,occurrence time=12:55:59,price=5
    Bid End 12:56:00
    ===Winning Bid===
    cash bid id=29,occurrence time=12:55:33,price=4
    Trace: Race conditions: between events; between events and window start/end
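The ground rules can be applied directly to the input bids in a few lines of Python. This sketch assumes that "first accepted" is judged by occurrence time and that the validity interval includes both endpoints; the dictionary layout is hypothetical.

```python
bids = [
    {"id": 2, "kind": "credit", "time": "12:55:32", "price": 4},
    {"id": 29, "kind": "cash", "time": "12:55:33", "price": 4},
    {"id": 33, "kind": "cash", "time": "12:55:34", "price": 3},
    {"id": 66, "kind": "credit", "time": "12:55:36", "price": 4},
    {"id": 56, "kind": "credit", "time": "12:55:59", "price": 5},
]


def winning_bid(bids, start, end):
    """Highest price wins; ties go to the earliest occurrence time.

    Times are zero-padded HH:MM:SS strings, so string comparison matches
    chronological order.
    """
    valid = [b for b in bids if start <= b["time"] <= end]
    if not valid:
        return None
    return min(valid, key=lambda b: (-b["price"], b["time"]))


winner = winning_bid(bids, "12:55:00", "12:56:00")
```

Under these assumptions the winner is bid 56, and without bid 56 the tie at price 4 would go to bid 2. Either way the result differs from the winning bid reported in the trace, which is the kind of discrepancy the listed race conditions can cause.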
  • Ordering in a distributed environment - possible issues: Even if the occurrence time of an event is accurate, it might arrive after some processing has already been done. If we use the occurrence time of an event as reported by the sources, it might not be accurate, due to clock accuracy in the source. Most systems order events by detection time, but events may switch their order on the way
  • Clock accuracy in the source: Clock synchronization. Time server, example: http://tf.nist.gov/service/its.htm
  • Buffering technique
    • Assumptions:
      • Events are reported by the producers as soon as they occur;
      • The delay in reporting events to the system is relatively small, and can be bounded by a time-out offset τ;
      • Events arriving after this time-out can be ignored.
    Producers → Sorted Buffer (by occurrence time) → Event Processing; an event with occurrence time To leaves the buffer when t > To + τ
    • Principles:
      • Let τ be the time-out offset. According to the assumptions, it is safe to assume that at any time-point t, all events whose occurrence time is earlier than t - τ have already arrived.
      • Each event whose occurrence time is To is then kept in the buffer until To + τ, at which time the buffer can be sorted by occurrence time, and then events can be processed in this sorted order.
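The buffering technique can be sketched with a heap ordered by occurrence time. This is a minimal illustration under the assumptions listed above; the class name, method names, and the use of plain numbers for time are choices made for the example.

```python
import heapq
import itertools


class ReorderBuffer:
    """Hold events until their time-out has passed, then release them in occurrence order."""

    def __init__(self, timeout_offset):
        self.timeout_offset = timeout_offset
        self._heap = []
        self._counter = itertools.count()   # tie-breaker for equal occurrence times

    def add(self, occurrence_time, event, now):
        """Buffer an event; ignore it if it arrives after its time-out."""
        if now > occurrence_time + self.timeout_offset:
            return False
        heapq.heappush(self._heap, (occurrence_time, next(self._counter), event))
        return True

    def release(self, now):
        """Return, sorted by occurrence time, the events whose time-out has passed."""
        ready = []
        while self._heap and now > self._heap[0][0] + self.timeout_offset:
            occurrence_time, _, event = heapq.heappop(self._heap)
            ready.append((occurrence_time, event))
        return ready


buf = ReorderBuffer(timeout_offset=5)
buf.add(10, "b", now=11)
buf.add(8, "a", now=12)                  # arrives out of order, still within the time-out
late_accepted = buf.add(1, "late", now=12)
first = buf.release(now=14)
second = buf.release(now=16)
```

The cost of this approach is added latency: every event waits for the full time-out offset before it is processed, even when nothing arrives out of order.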
  • Retrospective compensation Out of order event Recalculation Retraction of previous EPA results Not always possible!
  • Classification to windows - scenario. Calculate statistics for each player (aggregate per quarter). Calculate statistics for each team (aggregate per quarter). Window classification: Player statistics are calculated at the end of each quarter. Team statistics are calculated at the end of each quarter, based on the player events that arrived within the same quarter. All instances of player statistics that occur within a quarter window must be classified to the same window, even if they are derived after the window termination.
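One way to meet this requirement is to classify derived events by their occurrence time instead of the time at which they were derived. The sketch below assumes a hypothetical 12-minute quarter measured in seconds, and hypothetical field names for the player statistics events.

```python
def quarter_of(occurrence_time, quarter_length=720):
    """Quarter number (starting at 1) that contains the given occurrence time."""
    return int(occurrence_time // quarter_length) + 1


def team_stats(player_stats, quarter_length=720):
    """Sum player points per (team, quarter), using occurrence time for classification."""
    totals = {}
    for s in player_stats:
        key = (s["team"], quarter_of(s["occurrence_time"], quarter_length))
        totals[key] = totals.get(key, 0) + s["points"]
    return totals


player_stats = [
    # Derived at 725, after the first quarter ended at 720, but it belongs to quarter 1.
    {"team": "A", "occurrence_time": 700, "derived_at": 725, "points": 10},
    {"team": "A", "occurrence_time": 710, "derived_at": 715, "points": 6},
    {"team": "A", "occurrence_time": 900, "derived_at": 1445, "points": 4},
]
totals = team_stats(player_stats)
```

Classifying by `derived_at` instead would move the first record into the second quarter and make the team total for the first quarter wrong.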
  • V: Summary
  • Event processing is an emerging technology. It has already attracted coverage of analysts and all major software vendors. It has barely scratched the surface of its potential. Potential for mutually beneficial interaction with AI → make the next generation a vehicle to substantially change the world.
  • REFERENCES (StoA of Event Processing)
    • Opher Etzion and Peter Niblett, Event Processing in Action, Manning, 2010.
    • Mani Chandy and Roy Schulte, Event Processing: Designing IT Systems for Agile Companies, McGraw Hill, 2009.
    • David Luckham, The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems, Addison-Wesley, 2002.
    • Gianpaolo Cugola and Alessandro Margara, Processing Flows of Information: From Data Stream to Complex Event Processing, to appear in ACM Computing Surveys. Available through: http://home.dei.polimi.it/margara/papers/survey.pdf
  • REFERENCES (Challenges Section)
    • H. Mannila, H. Toivonen, and A. Inkeri Verkamo, Discovery of frequent episodes in event sequences, Data Mining and Knowledge Discovery, 1997.
    • J.L. Hellerstein, S. Ma, and C.S. Perng, Discovering actionable patterns in event data, IBM Systems Journal, 2002.
    • R.I. Brafman, C. Domshlak, Y. Engel, and Z. Feldman, Planning for Operational Control Systems with Predictable Exogenous Events, AAAI 2011.
    • Y. Engel and O. Etzion, Towards Proactive Event Driven Computing, DEBS 2011