A Stratified Approach for Supporting High Throughput Event Processing Applications
Geetika T. Lakshmanan, Yuri G. Rabinovich, Opher Etzion
A Stratified Approach for Supporting High Throughput Event Processing Applications -
DEBS 2009 presentation delivered by Geetika T. Lakshmanan

Transcript of "Debs Presentation 2009 July62009"

  1. A Stratified Approach for Supporting High Throughput Event Processing Applications
     Geetika T. Lakshmanan, IBM T. J. Watson Research Center, gtlakshm@us.ibm.com
     Yuri G. Rabinovich, IBM Haifa Research Lab, yurir@il.ibm.com
     Opher Etzion, IBM Haifa Research Lab, opher@il.ibm.com
     July 2009
  2. Outline
     • Why is scalable event processing an important problem?
     • Some terms (EPN, EPA)
     • What has been done already?
     • Overview of our solution
       – Credit-card scenario
       – Profiling and initial assignment of nodes to strata
       – Stratification
       – Load distribution algorithm
       – Algorithm optimizations and support for dynamic changes in the event processing graph
     • Implementation and results
     • Conclusion
  3. Our Goal
     Devise a generic framework to maximize the overall input (and thus output)
     throughput of an event processing application represented as an EPN, given a
     specific set of resources (a cluster of nodes with varying computational
     power) and a traffic model. The framework should be adaptive to changes in
     either the configuration or the traffic model.
     [Diagram: event producers feed an EPN of engines hosting EPAs, backed by a
     repository, which delivers derived events to event consumers]
  4. Why is this an important problem?
     • The quantity of events that a single application needs to process is
       constantly increasing (e.g. RFID events, massive online multiplayer games,
       financial transactions).
     • Manual partitioning is difficult (due to semantic dependencies between
       event processing agents), particularly when it is required to be adaptive
       and dynamic.
  5. Event Processing Agent
     • An event processing agent has input and output event channels.
     • In general it receives a collection of events as input, derives one or more
       events as output, and emits them on one or more output channels.
     • The input channels are partitioned according to a context, which partitions
       the space of events along semantically relevant boundaries.
     [Diagram: input channel → event processing agent → output channel; an agent
     spec pairs a context definition with an operation: Filter, Transform
     (Translate, Aggregate, Split, Enrich), Detect Pattern, or Route]
  6. Related Work
     • Scalability in event processing
       – Scalable event processing infrastructures (e.g. Astrolabe (2003), PIER
         (2003), Siena (2000)).
       – Controlled input load shedding (Kulkarni et al. (2008)).
       – CEP over streams (Wu et al. (2006)).
       – More work needs to be done.
     • Numerous centralized implementations, arising from interdependencies among
       event processing agents.
     • Synergy between stream processing and event processing.
       – Distributed stream processing techniques: Mehta et al., 1995; Shah et
         al., 2003; Balazinska et al., 2004; Kumar et al., 2005; Xing et al.,
         2005, 2006; Zhou et al., 2006; Pietzuch et al., 2006; Gu et al., 2007;
         Lakshmanan et al., 2008.
  7. Is this a solved problem?
     [Quadrant chart: event-at-a-time versus set-at-a-time implementations,
     centralized versus distributed. Centralized stream processing and centralized
     event processing implementations exist; load distribution algorithms for
     scalable stream processing (Shah et al., Mehta et al., Gu et al., Xing et
     al., Zhou et al., Liu et al., ...) and scalable event processing
     implementations (Astrolabe, PIER, Siena) each cover part of the space.]
  8. Overview of Our Solution
     • Profiling
       – Used to assign agents to nodes in order to maximize throughput.
     • Stratification of the EPN
       – Splitting the EPN into strata layers.
       – Based on semantic dependencies between agents.
       – Distributed implementation design, with an event proxy to relay events
         between strata.
     • Load distribution
       – Distribute load among agents dynamically at runtime, respecting
         statistical load relationships between nodes.
  9. Distributed Event Processing Network Architecture
     • Input: specification of an event processing application.
     • Output: stratified EPN (event processing operations mapped to event
       processing agents).
     [Diagram: events flow through Stratum 1, Stratum 2, and Stratum 3; each
     stratum holds EP nodes backed by a database, with an EP proxy between strata]
     • The event proxy receives input events and routes them to nodes in a stratum
       according to the event context.
     • The event proxy periodically collects performance statistics per node in a
       stratum.
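The slide says the proxy routes events to stratum nodes by context. One simple way to realize that (a sketch only; the class and field names are illustrative, and the paper does not specify the routing mechanism) is to pin each context to a node via a stable hash, so all events of one context reach the same node:

```python
import hashlib

class EventProxy:
    """Routes events to stratum nodes by context (illustrative sketch)."""

    def __init__(self, nodes):
        self.nodes = nodes        # node identifiers in this stratum
        self.routing = {}         # context key -> node, so a context stays put

    def route(self, event):
        ctx = event["context"]    # e.g. the credit-card number
        if ctx not in self.routing:
            # stable hash: the same context always maps to the same node
            h = int(hashlib.md5(str(ctx).encode()).hexdigest(), 16)
            self.routing[ctx] = self.nodes[h % len(self.nodes)]
        return self.routing[ctx]

proxy = EventProxy(["node-a", "node-b", "node-c"])
n1 = proxy.route({"context": "card-1234", "type": "Purchase"})
n2 = proxy.route({"context": "card-1234", "type": "Cancel"})
assert n1 == n2  # all events of one context reach the same node
```

Keeping a context on one node matters because pattern-detection agents (e.g. "more than 5 occurrences within 1 hour") hold per-context state.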
  10. Stratified Event Processing Graph
      1. Define the event processing application in the form of an event
         processing network dependency graph.
         • G=(V,E), with directed edges from event source to event target.
      2. Overview of the stratification algorithm:
         • Create partitions by finding subgraphs that are independent in the
           dependency graph.
         • For each subgraph, construct a network of EPAs.
         • Push filters to the beginning of the network to filter out irrelevant
           events.
         • Iterate through the graph and identify areas of strict interdependence
           (i.e. subgraphs with no connecting edges).
         • For each subgraph, define stratum levels.
      [Diagram: the credit-card dependency graph (Amount > 100, High Volume
      Purchase, Give Purchase Discount to Company, Discount Canceled, Cancel
      Follows Discount, High Volume Cancel, Cancel Discount to Company)
      partitioned into strata]
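The stratum-level step above amounts to layering a DAG by dependency depth. A minimal sketch (only of this layering step; the full algorithm also splits independent subgraphs and pushes filters forward, and the agent names below are just the credit-card example simplified):

```python
from collections import defaultdict

def stratify(agents, edges):
    """Assign each agent a stratum level: 1 + the length of the longest
    dependency path leading to it. `edges` are (source, target) pairs."""
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)

    level = {}
    def depth(a):
        if a not in level:
            # agents with no predecessors land in stratum 1 (the filters)
            level[a] = 1 + max((depth(p) for p in preds[a]), default=0)
        return level[a]

    for a in agents:
        depth(a)
    return level

# Credit-card scenario, simplified to one chain:
agents = ["Amount>100", "HighVolumePurchase", "GiveDiscount"]
edges = [("Amount>100", "HighVolumePurchase"),
         ("HighVolumePurchase", "GiveDiscount")]
print(stratify(agents, edges))
# {'Amount>100': 1, 'HighVolumePurchase': 2, 'GiveDiscount': 3}
```

An agent's level is well defined because the dependency graph is acyclic; agents at the same level have no edges between them, so they can run in parallel.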
  11. Example: Credit Card Scenario
      [Event processing dependency graph: an "Amount > 100" filter feeds "High
      Volume Purchase" (more than 5 occurrences within 1 hour), which triggers
      "Give Purchase Discount to Company"; "Discount Canceled" and "Cancel Follows
      Discount" feed "High Volume Cancel" (more than 3 occurrences within 1 hour),
      which triggers "Cancel Discount to Company". The stratification algorithm
      maps these agents into Stratum 1, Stratum 2, and Stratum 3, yielding the
      stratified event processing graph.]
  12. Assigning Nodes to Each Stratum
      • Goal: executing at a user-set percentage of their capacity, nodes in a
        stratum can process all of the incoming events at their stratum level in
        parallel under peak event traffic conditions.
        – Assume agents in a single stratum are replicated on all nodes in that
          stratum.
      • Overall strategy:
        – Profile nodes: determine the maximum event processing capability of the
          available nodes by observing performance under a synthetic workload.
        – Compute the ratio by which events are split between nodes for the first
          stratum.
        – Determine the number of nodes to assign to the stratum.
        – Repeat for the next stratum, and the next, until done.
  13. Assigning Nodes to Each Stratum
      – ti: user-set percentage of node capacity
      – mi: rate of all incoming events (events/sec)
      – ri: maximum possible event processing rate of node i (events/sec)
      – di: maximum possible derived event production rate of node i (events/sec)
      Formulas:
      – Percentage of the stratum-n event stream directed to node i:
        ((ti * ri) / mi) * 100
      – Derived event production rate of the nodes in stratum n, which becomes the
        input rate of stratum n+1: mi * (di / ri)
      Example: incoming event rate 200,000 ev/sec; processing capacity of node i
      36,000 ev/sec; ti = 0.95. Each node handles ((0.95 * 36,000) / 200,000) *
      100 = 17.1% of the stream, so 6 nodes are needed in this stratum. If
      di/ri = 0.5, the total rate of derived events created by the stratum-n nodes
      is 200,000 * 0.5 = 100,000 ev/sec.
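The slide's two formulas and its worked example can be checked directly (a sketch that, for simplicity, assumes all nodes in the stratum are identical; the function names are mine):

```python
import math

def nodes_for_stratum(incoming_rate, node_rate, t):
    """Nodes needed so that, running at fraction t of capacity, the stratum
    absorbs the peak incoming rate. Uses the slide's ((ti*ri)/mi)*100."""
    per_node_pct = (t * node_rate) / incoming_rate * 100
    return math.ceil(100 / per_node_pct)

def derived_rate(incoming_rate, d, r):
    """Derived-event output of a stratum, mi * (di/ri): this is the incoming
    rate that the next stratum must be sized for."""
    return incoming_rate * (d / r)

# Slide example: 200,000 ev/sec in, node capacity 36,000 ev/sec, ti = 0.95
assert nodes_for_stratum(200_000, 36_000, 0.95) == 6      # 17.1% per node
assert derived_rate(200_000, 18_000, 36_000) == 100_000.0  # di/ri = 0.5
```

Chaining `derived_rate` into the next call of `nodes_for_stratum` is how the "repeat for the next stratum" step of the previous slide proceeds.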
  14. Overview of Dynamic Load Distribution Algorithm
      • Statistics collected by the event proxy:
        – Number of input events processed by executions of agents in a particular
          context.
        – Number of derived events produced by the executions of agents in this
          context.
        – Number of different agent executions evaluated in this context.
        – Total latency to evaluate all agents executed in this context.
      • For these statistics, the event proxy maintains a time series and computes
        the mean, standard deviation, covariance, and correlation coefficient
        (between agents on the same node, and between contexts for the same
        agent).
      • These statistics dictate the choice of load donor and recipient nodes.
      • The definition of load is deliberately generic, to incorporate the
        priorities and preferences of the application.
  15. Overview of Dynamic Load Distribution Algorithm
      [Diagram: the EP proxy feeds the AMiT engine queues of the nodes in stratum
      n and stratum n+1]
      • The event proxy collects statistics, maintains a time series, and makes
        the following decisions:
        1. Identify the most heavily loaded node in a stratum (the donor node).
        2. Identify a heavy context to migrate from the donor node (also using
           load correlation as a guiding factor).
        3. Identify a recipient node for the migrated load.
        4. Estimate the post-migration utilization of the donor and recipient
           nodes. If the post-migration utilization of the recipient node is
           unsatisfactory, go back to step 3 and identify a new recipient node. If
           the post-migration utilization of the donor node is unsatisfactory, go
           back to step 2 and identify a new context to migrate.
        5. Execute the migration and wait for a time interval of length x. Go to
           step 1.
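Steps 1 through 3 of the proxy's loop can be sketched as a single selection round. This is illustrative only: the data structures are mine, the least-loaded-node rule for picking the recipient is an assumption (the slides do not say how the recipient is chosen), and the load-correlation factor is omitted:

```python
def pick_migration(node_loads, context_loads, donor_threshold=0.5):
    """One selection round: returns (donor, context, recipient) or None.
    node_loads: node -> utilization; context_loads: node -> {context: load}."""
    # Step 1: the most heavily loaded node is the donor
    donor = max(node_loads, key=node_loads.get)
    if node_loads[donor] < donor_threshold:
        return None  # no node is loaded enough to warrant a migration
    # Step 2: the heaviest context on the donor is the migration candidate
    ctx = max(context_loads[donor], key=context_loads[donor].get)
    # Step 3 (assumed heuristic): the least loaded node is the recipient
    recipient = min(node_loads, key=node_loads.get)
    return donor, ctx, recipient

loads = {"n1": 0.9, "n2": 0.3}
ctxs = {"n1": {"card-1": 0.6, "card-2": 0.3}, "n2": {"card-3": 0.3}}
assert pick_migration(loads, ctxs) == ("n1", "card-1", "n2")
```

Step 4, the post-migration utilization check that accepts or rejects this candidate, is the subject of the next slide.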
  16. Post Migration Utilization Calculation
      • We need to determine whether a migration will lead to overload: if it
        triggers further migrations, the system will become unstable. Therefore we
        compute the post-migration utilization of the donor and recipient
        machines.
      • If the average event arrival rate in time period t for context c is λ(c),
        and the average latency to evaluate context c is p(c), then the load of
        this context in time period t is λ(c) * p(c).
      • The post-migration utilizations, Ud' of the donor machine and Ur' of the
        recipient machine, after migrating a context c1, where nd and nr are the
        total number of contexts on the donor and recipient respectively, are:
          Ud' = Ud * (1 − λ(c1)p(c1) / Σ(i=1..nd) λ(ci)p(ci))
          Ur' = Ur * (1 + λ(c1)p(c1) / Σ(i=1..nr) λ(ci)p(ci))
      • The post-migration utilization of the donor and recipient nodes must be
        less than preset quality thresholds.
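The two formulas translate directly into code (a sketch; the function name and the dict-of-loads representation are mine, with each context's load precomputed as λ(c) * p(c)):

```python
def post_migration_utilization(U_donor, U_recipient,
                               donor_ctx_loads, recipient_ctx_loads,
                               migrated):
    """Estimate donor and recipient utilization after moving context
    `migrated`, per the slide: scale each machine's current utilization by
    the migrated context's share of that machine's total context load."""
    moved = donor_ctx_loads[migrated]               # lambda(c1) * p(c1)
    Ud_new = U_donor * (1 - moved / sum(donor_ctx_loads.values()))
    Ur_new = U_recipient * (1 + moved / sum(recipient_ctx_loads.values()))
    return Ud_new, Ur_new

donor = {"c1": 2.0, "c2": 2.0}      # load = lambda * p, per context
recipient = {"c3": 2.0}
Ud, Ur = post_migration_utilization(0.8, 0.4, donor, recipient, "c1")
assert abs(Ud - 0.4) < 1e-9   # donor sheds half its context load
assert abs(Ur - 0.8) < 1e-9   # recipient's context load doubles
```

Comparing `Ud_new` and `Ur_new` against the preset quality thresholds is what accepts or rejects a candidate migration in step 4 of the previous slide.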
  17. Implementation
      • Used nodes running IBM Active Middleware Technology (AMiT), a CEP engine
        that serves as a container for event processing agents.
      • Event processing scenario: the credit card scenario.
      • Node hardware characteristics:
        – Type 1: Dual-Core AMD Opteron 280, 2.4 GHz, 1 GB memory.
        – Type 2: Intel Pentium D, 3.6 GHz, 2 GB memory.
        – Type 3: Intel Xeon, 2.6 GHz, 2 GB memory.
      [Diagram: the stratified credit-card graph deployed across Stratum 1,
      Stratum 2, and Stratum 3]
  18. Results
      [Chart: input event processing rate (events/sec) of stratified versus
      partitioned event processing networks for 1:1:1 = 3 machines, 2:2:1 = 5
      machines, and 5:4:1 = 10 machines; reported rates range from 81,000 to
      398,000 ev/sec; centralized = 30,000 ev/sec]
      • Synthetic workload: consists strictly of events that trigger the
        generation of derived events. Number of nodes: 3, 5, 10; a heterogeneous
        mix of Types 1, 2, and 3. Ratios are selected to be "optimal."
      • Y-axis: the maximum input event processing rate, computed as the sum of
        the average input event processing rates of all nodes in the network.
      • Illustrates the maximum performance the event processing network can
        achieve when it is overloaded.
  19. Results
      [Chart: derived event production rate (events/sec) of stratified versus
      partitioned event processing networks for 1:1:1 = 3 machines, 2:2:1 = 5
      machines, and 5:4:1 = 10 machines; reported rates range from 4,500 to
      21,000 ev/sec; centralized = 1,500 ev/sec]
      • Synthetic workload: consists strictly of events that trigger the
        generation of derived events. Number of nodes: 3, 5, 10; a heterogeneous
        mix of Types 1, 2, and 3. Ratios are selected to be "optimal."
      • Y-axis: the maximum derived event production rate, computed as the sum of
        the average derived event production rates of all nodes in the network.
  20. Results
      [Chart: percentage improvement in the event processing rate and the derived
      events rate of the stratified network relative to a partitioned network, for
      the 100% - 5:4:1 and 12.5% - 5:4:1 configurations; values range from about
      -32% to +40%]
      • A stratified network of ten nodes, with nodes distributed among the three
        strata in proportion 5:4:1, compared with ten nodes in a partitioned
        network.
      • All nodes used for this experiment are of Type 1.
      • Illustrates how changing the proportion of input events that participate
        in derived event production in the first stratum level impacts the input
        event processing rate and the derived event production rate of the entire
        system.
  21. Results
      [Chart: average input event processing rate per node (events/sec) in a
      stratified network, comparing the fixed 5:4:1 ratio with the optimal ratio
      for each percentage of events participating in derived event production
      (100% - 5:4:1, 50% - 6:3:1, 25% - 8:3:1, 12.5% - 11:3:1); rates range from
      34,438 to 52,800 ev/sec]
      • Compares the average input event processing rate per node of a stratified
        network of ten nodes, with nodes distributed among the three strata in
        proportion 5:4:1, to an optimal configuration of nodes to strata.
      • Demonstrates that reconfiguring the system with the optimal ratio of nodes
        per stratum can improve performance and react effectively to changes in
        the proportion of input events that participate in derived event
        production in the first stratum level.
  22. Results
      [Charts: (left) total throughput since the system started event processing,
      over 0-1200 sec, comparing our dynamic load distribution with no load
      distribution; (right) mean throughput for 5 to 30 nodes, comparing our
      dynamic load distribution with largest-load-first (LLF), random, and no
      load distribution]
      • Load is defined as the total number of agent executions for a particular
        context.
      • 0.5: load threshold for a node to initiate load distribution.
      • 0.1: load threshold for a context's contribution to the percentage of the
        total load on a node, where this context has the highest load correlation
        coefficient with respect to the remaining contexts on the same node.
      • 0.85: acceptable post-migration utilization of a recipient node; 0.1 is
        the threshold for the percentage decrease in utilization of a donor node
        needed to warrant a migration.
      • Periodically fluctuating workload.
  23. Support for Dynamic Changes in the EP Graph
      • Our algorithm supports:
        – Addition of a new connected subgraph to the existing EPN.
        – Addition of an agent to a graph in the EPN.
        – Deletion of agents from a graph.
        – Failure of one or more nodes in a stratum level.
      • The algorithm is also amenable to agent-level optimizations (e.g.
        coalescing of neighboring agents).
  24. Conclusion and Future Work
      • We demonstrate a novel architecture for distributed event processing that
        maximizes the throughput of the event processing system, and a
        stratification algorithm that partitions an event processing application
        onto a distributed set of nodes.
      • The experimental results illustrate the effectiveness of the
        stratification technique for achieving an initial partitioning of the
        event processing graph in a distributed event processing system that
        anticipates a high volume of agent-triggering events.
      • The performance of a stratified network can be improved at runtime with
        the dynamic load distribution algorithm.
      • Future work:
        – Investigate high availability.
        – Techniques for optimizing stateful load migration between nodes
          dynamically at runtime.
        – Investigate variations of stratification (currently underway at IBM
          HRL).
  25. Backup Slides
  26. Goal of Implementation
      • Explore the benefits of event processing on a stratified versus a
        centralized (single node) versus a partitioned network (a single stratum
        in which load is distributed according to context) when the system is
        under heavy load, i.e. when the number of incoming events that trigger the
        generation of derived events increases.
      • Compare stratification with the partitioned approach when the system is
        not heavily loaded.
      • Explore the effectiveness and scalability of the load distribution
        algorithm.