Using Data Mining to Aid in Rule Construction for Event ...



MINING EVENT DATA FOR ACTIONABLE PATTERNS

Joseph L. Hellerstein and Sheng Ma
IBM T.J. Watson Research Center
Hawthorne, New York
{hellers, shengma}

A central problem in event management is constructing correlation rules. Doing so requires characterizing patterns of events for which actions should be taken (e.g., sequences of printer status changes that foretell a printer-off-line event). In most cases, rule construction requires experts to identify problem patterns, a process that is time-consuming and error prone. Herein, we describe how data mining can be used to identify actionable patterns. In particular, we present efficient mining algorithms for three kinds of patterns found in event data: event bursts, periodicities, and mutually dependent events.

1. Introduction

Event management is central to the operations of computer and communications systems in that it provides a way to focus on exceptional situations. As installations grow in size and complexity, event management has become increasingly burdensome. Automated operations (AO), which was introduced in the mid 1980s (e.g., [Mill86]), provides a way to automatically filter and act on events, typically by using correlation rules. While this reduces the burden on the operations staff, AO creates a challenge as well: constructing the correlation rules. Visual programming techniques have simplified the mechanics of rule construction. However, determining the content and effectiveness of rules remains an impediment to increased automation. Herein, we propose an approach to simplifying rule construction by using data mining to characterize the left-hand side of correlation rules.

[Figure 1: On-Line and Off-Line Components of Event Management. Panels: "Today: On-line monitoring" and "Proposed: Off-line analysis". Labels shown: raw events, parser, correlation engine, rules, database, display; discard, create, modify, run an application; EventBrowser, EventMiner, EventAnalyzer, Rule Generator, human expertise.]
Figure 1 summarizes how event management is done today and our vision for how it can be improved. The area above the dotted line depicts the current state of the art. Raw events flow into the event management system, where they are parsed and stored. Then, a correlation engine uses (correlation) rules to interpret these events. Some events are filtered. Others are coalesced. And some result in alarms, emails, or other actions. Correlation rules are structured as if-then statements. The if-part (or left-hand side) describes a situation in which actions are to be taken. The then-part (or right-hand side) details what is to be done when the condition is satisfied.

Our focus is rule construction, especially the left-hand side of rules. The rationale for this focus is that we must know the situation to be addressed before an action can be identified. Today, two broad approaches are used to specify the left-hand side of rules. The first is based on universal truths, such as exploiting topology information to make inferences about connectivity (e.g., [YeKlMoYeOh96]). Unfortunately, there remains a broad range of problems that are not addressed by universal truths. For these, human experts must construct correlation rules, a process that is time consuming and error prone.

We propose two approaches to assist experts in rule construction. The first is to visualize large volumes of event data to aid in determining relationships between events. The second employs data mining techniques to automate the search for patterns. These approaches can be used separately, but they are most effective when employed in combination.

To elaborate, consider the events displayed in Figure 2. These events were collected from a corporate intranet over a three-day period. The events consist of SNMP traps such as "threshold violated", "connection-closed", "port-up", and "port-down". The x-axis is time, and the y-axis is the host from which the event originated. The hosts are encoded as integers between 1 and 149, the number of hosts present. Note that while this plot contains a considerable amount of information, little can be discerned in terms of patterns.

[Figure 2: Three Days of SNMP Traps from a Corporate Intranet. Axes: time (x), host (y).]
[Figure 3: SNMP Traps With Hosts Ordered to Reveal Patterns. Axes: time (x), host ordering (y). Four regions are labeled Pat 1 through Pat 4.]

Now, consider Figure 3. Here, the hosts are ordered in a way to reveal patterns (as in [MaHe99]), many of which provide the basis for constructing correlation rules. For example, pattern 1 consists of "threshold violated" and "threshold reset" events that occur every thirty seconds. Such a pattern may be indicative of hosts nearing their capacity limits. Pattern 2 has a cloud-like appearance that consists of "port-up" and "port-down" events generated as a result of mobile users connecting to and disconnecting from hubs. Such patterns are probably of little interest to the operations staff and hence should be filtered since they represent normal behavior. Pattern 3, which happens every day at 2:00pm, consists of SNMP "request" and "authentication failure" events. This is most likely due to an improperly configured monitor. Pattern 4 is a series of link-up and link-down events that resulted from a software problem on a group of hubs.

In a well managed installation, errors are rare. Thus months of data are needed to identify actionable abnormalities. These data volumes can be substantial. For example, several installations we work with routinely collect five million events per week. Given the large volume of data and the different time scales at which patterns may be present, it is difficult to systematically identify patterns only through visualization. Clearly, automation is needed as well.

Our approach to this automation makes use of data mining. Data mining is a mixture of statistical and data management techniques that provide a way to cluster categorical data so as to find "interesting" combinations (e.g., a hub "cold start" trap is often preceded by "CPU threshold violated"). Considerable work has been done in mining transactional data (e.g., supermarket purchases), much of which is based on [AgImSw93] and [AgSr94]. For event data, time plays an important role. Follow-on research has pursued two directions that address this requirement. The first, sequential data mining (e.g., [AgSr95]), takes into account the sequences of events rather than just their occurrence. The second, temporal data mining (e.g., [MaToVe97]), considers the time between event occurrences.

Data mining has been applied to numerous domains. [ApHo94] discusses the construction of rules for capital markets. [CoSrMo97] describes approaches to finding patterns in web accesses. [ApWeGr93] discusses prediction of defects in disk drives. [MaToVe97] addresses sequential mining in the context of telecommunications events. The latter, while closely related to our interests, uses event data to motivate temporal associations, not to identify characteristic patterns and their interpretation. In summary, while much foundational work has been done on data mining and some consideration has been given to mining event data, no one has addressed the specifics of patterns that arise in enterprise event management.
The remainder of this paper is organized as follows. Section 2 provides background on data mining. Sections 3, 4, and 5 describe several patterns of particular interest in event management: event bursts, periodic patterns, and mutually dependent patterns. Our conclusions are contained in Section 6.

2. Data mining background

This section provides a brief overview of data mining as it pertains to the analysis of event data. We begin by describing market basket analysis, the context in which data mining was first proposed. Then, we discuss efficiency considerations, a topic of particular importance given the large size of event histories that must be mined. Last, we show how traditional data mining relates to event mining.

Market basket analysis originated from looking at data from supermarkets. The context is as follows. Each customer has a basket of goods. The question addressed is "Which items, when purchased, indicate that another item will be purchased as well?" This is commonly referred to as an association rule. For example, early studies found that when diapers are purchased beer is frequently purchased as well. Association rules indicate a one-way dependency. For example, it turns out that purchasers of beer are, in general, not particularly inclined to buy diapers.

To proceed, some notation is introduced. Let $I_1, \ldots, I_M$ be items that can be purchased. Thus, each market basket contains a subset of these items. We use $B_1, \ldots, B_N$ to denote the set of market baskets, where there is one basket per transaction. Thus, for $1 \le n \le N$, $B_n \subseteq \{I_1, \ldots, I_M\}$.

A key data mining problem is to find sets of items, typically referred to as itemsets, that occur in a large number of market baskets. This is captured in a metric called support. Support is computed by counting the number of baskets in which the itemset occurs and then dividing by $N$, the number of baskets. A second and closely related problem addresses prediction. Here, we are looking for $I_{j_1} \ldots I_{j_K}$ that have a high probability of predicting that $I_{j_{K+1}}$ will be in the same basket. The metric used here is confidence. Confidence is computed by counting the baskets in which $I_{j_1} \ldots I_{j_{K+1}}$ occur (which we denote by $\mathrm{Count}(I_{j_1} \ldots I_{j_{K+1}})$) and then dividing by $\mathrm{Count}(I_{j_1} \ldots I_{j_K})$.

Typically, mining involves finding all patterns whose support is larger than a minimum value of support, which we denote by MinSupp. A naïve approach is displayed in Algorithm 1.

    QC = ∅
    For each possible pattern P
        Count = 0
        For each market basket B
            For each item I in P
                If I is not in B, advance to next market basket
            End
            Count++
        End
        If Count > MinSupp, add P to QC
    End

Algorithm 1: Naive Approach to Finding Frequent Patterns

In Algorithm 1, QC is the set of qualified candidates, those patterns that have the minimum support level. The algorithm considers all possible patterns, scans through all market baskets for each pattern, and for each market basket, tests whether each item of the pattern is present in the market basket. Considerable computation time is required to perform this algorithm, even on modest-sized data sets. In particular, observe that the number of iterations in the outer loop is exponential in the number of items, since there are $2^M - 1$ possible patterns (where $M$ is the number of items). Clearly, Algorithm 1 scales poorly.

Fortunately, the search for frequent patterns can be made more efficient. Doing so rests on the following observation:

    The support for $I_{j_1} \ldots I_{j_{K+1}}$ can be no greater than the support for $I_{j_1} \ldots I_{j_K}$.

This means that if we find a pattern with low support, there is no need to consider any pattern that contains that pattern. This is an example of the downward closure property.

With the downward closure property, we can improve the efficiency of Algorithm 1. This is shown below in Algorithm 2, which considers patterns of increasing length. Such a strategy is referred to as level-wise search [AgImSw93].
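The definitions of support and confidence, and the level-wise idea of growing only those patterns that are already frequent, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names and the sample baskets are hypothetical, and `min_supp` is treated here as a fraction of baskets rather than a raw count.

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(lhs, rhs_item, baskets):
    """Count(lhs plus rhs_item) / Count(lhs); 0.0 if lhs never occurs."""
    lhs = frozenset(lhs)
    count_lhs = sum(lhs <= basket for basket in baskets)
    count_both = sum((lhs | {rhs_item}) <= basket for basket in baskets)
    return count_both / count_lhs if count_lhs else 0.0


def level_wise(baskets, min_supp):
    """Frequent itemsets, grown one item at a time with downward-closure pruning."""
    items = sorted({item for basket in baskets for item in basket})
    frequent_items = [i for i in items if support({i}, baskets) > min_supp]
    level = [frozenset({i}) for i in frequent_items]
    found = []
    while level:
        found.extend(level)
        # Extend only patterns that are already frequent, and only with frequent items.
        candidates = {p | {i} for p in level for i in frequent_items if i not in p}
        level = [c for c in candidates if support(c, baskets) > min_supp]
    return found


# Hypothetical baskets, invented for illustration only.
BASKETS = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"beer", "chips"},
    {"beer"},
    {"diapers", "beer", "chips"},
    {"milk"},
]
```

On this toy data, `confidence({"diapers"}, "beer", BASKETS)` is higher than `confidence({"beer"}, "diapers", BASKETS)`, which mirrors the one-way dependency noted for association rules.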
    FI = {I such that Count(I) > MinSupp}   /* Frequent Items */
    QC(1) = FI
    N = 1
    While QC(N) ≠ ∅
        For P ∈ QC(N) × FI
            Count = 0
            For each market basket B
                For each item I in P
                    If I is not in B, advance to next market basket
                End
                Count++
            End
            If Count > MinSupp, add P to QC(N+1)
        End
        N++
    End

Algorithm 2: Using Downward Closure to Find Frequent Patterns

The algorithm first finds frequent items, since by downward closure infrequent items cannot be in frequent patterns. QC(N) contains the qualified candidates with N items. The potential patterns with N+1 items are those that have N items in combination with one of the frequent items not already in the N-item pattern [1]. Even though Algorithm 2 has four loops instead of the three in Algorithm 1, Algorithm 2 avoids looking through an exponential number of patterns and so is considerably more efficient.

[1] QC(N) × FI is a shorthand for this statement, although it is not technically precise.

The downward closure property holds for some patterns and not for others. In particular, downward closure does not hold for the confidence of association rules. To see this, recall that confidence is computed as $\mathrm{Count}(I_{j_1} \ldots I_{j_{K+1}}) / \mathrm{Count}(I_{j_1} \ldots I_{j_K})$. Now consider the confidence with which $I_{j_1} \ldots I_{j_K}, I^*$ predicts $I_{j_{K+1}}$. Observe that by including a new item, we decrease (or at least do not increase) the numerator of $\mathrm{Count}(I_{j_1} \ldots I_{j_{K+1}}, I^*) / \mathrm{Count}(I_{j_1} \ldots I_{j_K}, I^*)$. However, including $I^*$ in the pattern affects the denominator as well. Thus, it is unclear if the resulting ratio will be smaller or larger than the original ratio. Hence, downward closure does not hold.

Now, we return to the problem of mining event data. Here, the context changes in a couple of ways. First, there is no concept of a market basket. However, events have a timestamp and so looking for patterns of events means looking at events that co-occur within a time range. These ranges may be windows (either of fixed or variable size) or they may be contiguous segments of the data that are designated in some other way. In the data mining literature, this is referred to as temporal mining or temporal association.

A second consideration needed in event mining relates to the attributes used to characterize membership in itemsets. Several attributes are common to event data. Event type describes the nature of the event. Event origin specifies the source of the event, which is a combination of the host from which the event originated and the process and/or application that generated the event. (Due to the limited granularity of the data used in our running examples, we simplify matters in the sequel by just referring to the host from which the event originated.) In addition to type and origin, there is a plethora of other attributes that depend on these two, such as the port associated with a "port down" event and the threshold value and metric in a "threshold violated" event.

The next three sections address patterns we have discovered in the course of analyzing event data: event bursts, periodicities, and mutually dependent events. Each is illustrated using the corporate intranet data. Then we discuss issues related to the efficient discovery of these patterns.

3. Event bursts

This section describes a commonly occurring pattern in problem situations: event bursts. We begin by motivating this pattern and providing an example. Next, we outline our approach to discovering these patterns.

Event bursts (or event storms) arise under several circumstances. For example, when a critical element fails in a network that lacks sufficient redundancy (e.g., the only name server fails), communications are impaired, thereby causing numerous "cannot reach destination" events to be generated in a short time period. Another situation relates to cascading problems, such as those introduced by a virus or, more subtly, by switching loads after a failure, a change that can result in additional failures due to heavier loads.
[Figure 4: Examples of Event Bursts]

Figure 4 provides a means for visual identification of event bursts in our corporate intranet data. The plot in the lower left contains the raw data in the same form as in Figure 3 (although the ordering of hosts on the y-axis is somewhat different). Given the coarse time scale of the plot relative to the granularity of event arrivals, there are many cases in which more than one event occupies the same pixel. As a result, it is difficult to discern event rates by inspection. We could do drill-downs in various sections of the plot to better determine event rates, but this is labor intensive. Instead, the upper left plot summarizes the rates of events for a specific window size (as indicated in the lower left). Further, the table in the upper right of Figure 4 summarizes those situations in which large event rates are present. This provides a convenient way to select subsets of the data to study in detail.

Mining event bursts consists of the following two steps:

1. Finding periods in which event rates are higher than a specified threshold

2. Mining for patterns common to the periods identified in step (1)

For step 1, we proceed by first intervalizing the data. Then, event rates within each interval are computed. Those intervals in which rates exceed a specified threshold are then identified. In Figure 4, these intervals are indicated by the vertical lines that lie above the threshold (which is indicated by the horizontal line).

Step 2 uses the intervals identified in Step 1 as the "market baskets" of events. For example, mining the three intervals with the largest event rates in Figure 4 finds the pattern "SNMP request", "Authentication Failure". Note that the mining employed here is essentially that done in Algorithm 2. However, our "market baskets" are just those intervals that have high event rates.

4. Periodicities

Periodic patterns consist of repeated occurrences of the same event or event set. Our experience has been that such patterns are common in event data, often accounting for one half to two thirds of the events present.

Why are periodic behaviors common in networks? Two factors contribute to this phenomenon. The first relates to monitoring: when a managed element emits a high severity event, the management server often initiates periodic monitoring of key resources (e.g., router CPU utilization). The second consideration is a consequence of scheduling routine maintenance tasks, such as rebooting print servers every morning or backing up data every week.
Our experience with analyzing events in computer networks is that periodic patterns often lead to actionable insights. There are two reasons for this. First, a periodic pattern indicates something persistent and predictable. Thus, there is value in identifying and characterizing the periodicity. Second, the period itself often provides a signature of the underlying phenomena, thereby facilitating diagnosis. In either case, patterns with a very low support are often of great interest. For example, we found a one-day periodic pattern due to a periodic port-scan. Although this pattern only happens three times in a three-day log, it provides a strong indication of a security intrusion.
[Figure 5: Partially Periodic Pattern]

Unfortunately, mining such periodic patterns is complicated by several factors.

1. Periodic behavior is not necessarily persistent. For example, in complex networks, periodic monitoring is initiated when an exception occurs (e.g., CPU utilization exceeds a threshold) and stops once the exceptional situation is no longer present. During the monitoring interval or "on" segment, the monitoring request and its response occur periodically. The "off" segment consists of a random gap in the periodicity until another exceptional situation initiates periodic monitoring. This makes it difficult to apply well established techniques such as the fast Fourier transform.
2. There may be phase shifts and variations in the period due to network delays, lack of clock synchronization, and rounding errors.

3. Period lengths are not known in advance. This means that either an exhaustive search is required or there must be a way to infer the periods. Further, periods may span a wide range, from milliseconds to days.

4. The number of occurrences of a periodic pattern typically depends on the period. For example, a pattern with a period of one day has, at most, seven occurrences in a week, while a pattern with a one-minute period may have as many as 1440 occurrences in a single day. Thus, mining patterns with longer periods requires adjusting support levels. In particular, mining patterns with low support greatly increases computational requirements in existing approaches to discovering temporal associations.

In order to capture items (1) and (2) above, we employ the concept of a partially periodic temporal association. We refer to this as a p-pattern. A p-pattern generalizes the concept of partial periodicity defined by [HaDoYi99] by combining it with temporal associations (akin to episodes in [MaToVe97]) and including the concept of time tolerance to account for imperfections in the periodicities.

Figure 5 illustrates the structure of a partially periodic pattern. Such patterns consist of an on-segment and an off-segment. During the on-segment, events are periodic with a period of $P$. No periodic event is present during the off-segment. Spurious events (or noise) may arise during both on-segments and off-segments.

Pattern 1 in Figure 3 is an example of a partial periodicity. Figure 6 displays a zoomed version of Figure 3 for two AFS servers in pattern 1. These partial periodicities contain two types of events: "threshold violated" (circled point) and "threshold reset" (uncircled point). As in Figure 2, the x-axis is time, and the y-axis is the host on which the events originated. Here the periodicities occur approximately every 30 seconds, although some are closer to 28 seconds and others are near 33 seconds.

We view mining for p-patterns as consisting of two sub-tasks: (a) finding period lengths and (b) finding temporal associations. While a variation of level-wise search can be employed to address the second sub-task, the first sub-task has not been addressed (to the best of our knowledge).

Our approach to finding the periods of p-patterns is to compute event inter-arrival times and then test if inter-arrival counts exceed what would have been expected by chance. Note that a simple threshold test is not sufficient here since small inter-arrival times are much more common than longer ones and hence the threshold must be adjusted by the size of the inter-arrival time. We address this by using a Poisson distribution as our null hypothesis for the count of events at specified inter-arrival times. A Chi-Square test is used to assess statistical significance. Next, we mine for the patterns at each statistically significant inter-arrival time. This is done by employing a level-wise search on each interval that comprises each period.
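The period-finding idea just described can be sketched in a few lines of Python. This is one possible reading of the test, not the authors' algorithm, and it rests on several assumptions that the text does not spell out: only gaps between adjacent events of a single event stream are used, gaps are grouped into bins of width `bin_width` to give a time tolerance, the Poisson null is expressed through exponentially distributed gaps, and the chi-square statistic is compared against 3.84 (one degree of freedom, 5% level). The `min_count` filter is an added safeguard because the chi-square approximation is unreliable for very sparse bins. All names and the sample timestamps are hypothetical.

```python
import math
from collections import Counter

CHI2_CRITICAL = 3.84  # chi-square critical value: 1 degree of freedom, 5% level


def find_periods(timestamps, bin_width=5.0, min_count=3):
    """Return bin centers of inter-arrival times that occur more often than chance."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    duration = ts[-1] - ts[0] if ts else 0
    if not gaps or duration <= 0:
        return []
    n = len(gaps)
    rate = n / duration  # estimated arrival rate under the Poisson null
    # Bin k collects gaps in [(k - 0.5) * bin_width, (k + 0.5) * bin_width).
    counts = Counter(math.floor(gap / bin_width + 0.5) for gap in gaps)
    periods = []
    for k, observed in sorted(counts.items()):
        if k == 0 or observed < min_count:
            continue  # skip near-zero gaps and bins too sparse for the test
        low, high = (k - 0.5) * bin_width, (k + 0.5) * bin_width
        # Poisson arrivals imply exponentially distributed gaps.
        p = math.exp(-rate * low) - math.exp(-rate * high)
        expected = n * p
        if expected <= 0 or p >= 1:
            continue
        chi2 = (observed - expected) ** 2 / (expected * (1 - p))
        if observed > expected and chi2 > CHI2_CRITICAL:
            periods.append(k * bin_width)
    return periods


# Hypothetical timestamps (seconds): two on-segments with a period near 30,
# separated by an off-segment that contains a few noise events.
TIMESTAMPS = [0, 30, 59, 90, 120, 151, 180, 400, 405, 700,
              1000, 1030, 1060, 1089, 1120, 1150]
```

Because the expected count shrinks as the gap grows, the same observed count is more significant at long inter-arrival times than at short ones, which is the adjustment the text asks of the threshold.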
[Figure 6: Partial Periodicity of Two AFS Servers. Legend: threshold violated, threshold reset.]

We have applied our algorithm for p-pattern discovery to the corporate intranet data. Over 30 patterns were discovered, ranging in length from 1 event to 13 events, with periods ranging from 1 second to 1 day.

5. Mutual dependencies

Thus far, we have considered frequently occurring patterns. That is, given a set of intervals, we look for pairs of event type and host that commonly occur together. While this problem statement is the focus in mainstream data mining, event management requires another perspective as well.

In particular, we are interested in events that occur together when they occur. We use the term mutually dependent pattern or m-pattern to refer to such a group of events. In Figure 3, pattern 4 is an m-pattern. It consists of a combination of "link down" and "link up" events. The occurrence of these events is displayed in Figure 7 (which is a zoomed version of Figure 3).
[Figure 7: m-patterns in Corporate Intranet Data]

The events in an m-pattern do not necessarily occur in the same sequence. However, they do occur as a group. Thus, m-patterns are quite different from association rules in that the latter only indicate a one-way dependency. Also, the metric for quantifying the presence of an m-pattern differs from that for association rules. Association rules are typically quantified in terms of support, the fraction of intervals in which the association is evidenced. In contrast, m-patterns require a metric more akin to confidence. This observation leads to some difficulties in terms of efficiently mining m-patterns since, as we discussed in Section 2, confidence does not have the downward closure property and hence Algorithm 2 cannot be used.

Figure 8 compares m-patterns and frequent patterns. Here, a, b, c, d are events, and each interval spans two time units (as indicated by the dotted lines). Suppose that: (a) the support threshold for a frequent pattern is 40% (i.e., a pattern must appear in 40% of the intervals to be considered frequent) and (b) the m-pattern co-occurrence threshold is 90%. (The latter is much higher because of the semantics of an m-pattern.) Observe that the pattern ab is frequent in that this pattern occurs in 50% of the intervals. However, there are four cases in which a occurs but b does not. Thus, ab is not an m-pattern. Now consider dc. This pattern is much less frequent than ab in that dc occurs in only two of the eight intervals, which is below the support threshold. However, whenever c or d occurs the other does as well. Thus, dc is an m-pattern.

What should be done when an m-pattern is discovered? Logically, the events in the pattern can be treated as a group. So, from an analysis perspective, it is desirable to coalesce an m-pattern into a single event. This not only reduces the number of events, it also provides the opportunity for higher level semantics (e.g., the m-pattern caused by a router failure).
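The Figure 8 comparison can be checked with a small Python sketch. The interval contents below are hypothetical, chosen only to match the counts stated in the text (eight intervals, ab in four of them, a without b in four, dc in two). The `co_occurrence` score, the smallest fraction of an event's intervals that also contain the whole pattern, is one simple way to quantify "occur together when they occur" and is an assumption of this sketch.

```python
SUPPORT_THRESHOLD = 0.40        # from the text: 40% of intervals
CO_OCCURRENCE_THRESHOLD = 0.90  # from the text: 90%

# Hypothetical contents of the eight two-unit intervals.
INTERVALS = [
    {"a", "b"},
    {"a"},
    {"a", "b", "c", "d"},
    {"a"},
    {"a", "b"},
    {"a"},
    {"a", "b", "c", "d"},
    {"a"},
]


def support(pattern, intervals):
    """Fraction of intervals in which every event of the pattern is present."""
    pattern = frozenset(pattern)
    return sum(pattern <= interval for interval in intervals) / len(intervals)


def co_occurrence(pattern, intervals):
    """Smallest, over events e in the pattern, of P(whole pattern | e)."""
    pattern = frozenset(pattern)
    together = sum(pattern <= interval for interval in intervals)
    ratios = []
    for event in pattern:
        with_event = sum(event in interval for interval in intervals)
        ratios.append(together / with_event if with_event else 0.0)
    return min(ratios)


def is_frequent(pattern, intervals):
    return support(pattern, intervals) > SUPPORT_THRESHOLD


def is_m_pattern(pattern, intervals):
    return co_occurrence(pattern, intervals) > CO_OCCURRENCE_THRESHOLD
```

With these intervals, `{"a", "b"}` passes `is_frequent` but fails `is_m_pattern`, while `{"c", "d"}` does the reverse, matching the discussion of ab and dc.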
[Figure 8: Illustration of an m-pattern. Events a, b, c, d plotted against time (0 to 16). ab is frequent, but not an m-pattern; dc is an m-pattern, but not frequent.]

We turn now to the specifics of m-pattern discovery. An m-pattern is present if $P(S_1 \mid S_2) > q$, where $S_1, S_2$ are subsets of the events in the m-pattern and $q$ is the co-occurrence threshold. $P(S_1 \mid S_2)$ is computed by counting all intervals in which both sets of events are present and dividing by the number of intervals in which $S_2$ is present.

First, observe that m-patterns have the downward closure property. To see this, let $T \supset S$ and suppose that $S$ is not an m-pattern. We show that $T$ is also not an m-pattern. By definition, there exist subsets $S_1, S_2$ of $S$ such that $P(S_1 \mid S_2) \le q$. But $S_1, S_2$ are subsets of $T$ as well. So, the conclusion follows.

Even though downward closure holds for m-patterns, other computational difficulties arise. In particular, if we use the definition of an m-pattern to test for its presence, then the number of tests we must make is exponential in the size of the pattern. Clearly, this scales poorly.

Fortunately, there is a way to simplify matters. We claim that if $\forall a \in S$, $P(S - \{a\} \mid a) > q$, then $S$ is an m-pattern. That is, we must demonstrate that under this assumption it follows that $P(S_1 \mid S_2) > q$ for $S_1, S_2 \subseteq S$. Let $a \in S_2$. Note that

$$P(S_1 \mid S_2) = \frac{P(S_1 \cup S_2)}{P(S_2)} \ge \frac{P(S_1 \cup S_2)}{P(a)} \ge \frac{P(S)}{P(a)} = P(S - \{a\} \mid a) > q$$

The first inequality holds because $a \in S_2$ implies $P(S_2) \le P(a)$, and the second because $S_1 \cup S_2 \subseteq S$ implies $P(S_1 \cup S_2) \ge P(S)$. This means that only a linear number of tests (one per event in $S$) is required to check for the presence of an m-pattern.

6. Conclusions

Event management is a fundamental part of systems management. Over the last fifteen years, automated operations has increased operator productivity by using correlation rules to interpret event patterns. While productivity has improved, a substantial bottleneck remains: determining what correlation rules to write.

This paper describes how data mining can be used to identify patterns of events that indicate underlying problems. Traditionally, data mining has been applied to consumer purchases, often referred to as market basket data. A central consideration in this work is scalability. This is achieved by using a level-wise search in which patterns are discovered by building larger patterns from smaller ones.

We show how to apply data mining to event data in an efficient and effective way. Several patterns are identified that are of particular interest to event management: event bursts, periodicities, and mutual dependencies. We provide interpretations for each, and we show how pattern discovery can be structured to exploit a level-wise search, thereby improving scalability. The latter is particularly important since, based on our experience, tens of millions of events may need to be analyzed in order to discover actionable patterns.

The first pattern we describe is event bursts (or event storms). These commonly occur when a critical component fails. Of particular interest is the set of events common to event bursts (e.g., in order to classify the kind of problem present). Here, the bursts serve as the market baskets to which a level-wise search is applied.

A second pattern is periodic occurrences of a set of events. Of most interest to event management are partial periodicities since periodic behavior is often initiated by some other source (e.g., violating a threshold). A key consideration here is finding the period. We describe how to construct a statistical test to do this. Once periods have been identified, they can be used in a level-wise search.

Last, we introduce mutually dependent patterns (or m-patterns). The objective is to identify groups of events that occur together when they occur, even though the occurrence of these events is not frequent. Looking for co-occurrences introduces some algorithmic challenges in that some new insights are required in order to achieve computational efficiencies. Even so, we show that m-patterns have the downward closure property, and we show how an efficient level-wise search can be applied to the discovery of m-patterns.

Acknowledgements

Our thanks to Luanne Burns and David Rabenhorst for developing the prototype visualization and mining facility used for the studies in this paper. We also thank Chang-Shing Perng, David Taylor, and Sujay Parekh who, in addition to Luanne Burns and David Rabenhorst, provided stimulating discussion and useful comments on this work.

References

[AgImSw93] R. Agrawal, T. Imielinski, and A. Swami. Mining Association Rules Between Sets of Items in Large Databases. Proc. of Very Large Data Bases, pp. 207-216, 1993.

[AgSr94] R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules. Proc. of Very Large Data Bases, 1994.

[AgSr95] R. Agrawal and R. Srikant. Mining Sequential Patterns. Proc. 1995 Int. Conf. Data Engineering, pp. 3-14, 1995.

[ApHo94] C. Apte and S.J. Hong. Predicting Equity Returns from Securities Data with Minimal Rule Generation. Knowledge Discovery and Data Mining, 1994.

[ApWeGr93] C. Apte, S. Weiss, and G. Grout. Predicting Defects in Disk Drive Manufacturing: A Case Study in High-Dimensional Classification. Proc. of IEEE Conference on Artificial Intelligence and its Applications, 1993.

[CoSrMo97] R. Cooley, J. Srivastava, and B. Mobasher. Web Mining: Information and Pattern Discovery on the World Wide Web. 9th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'97), 1997.

[HaDoYi99] J. Han, G. Dong, and Y. Yin. Efficient Mining of Partially Periodic Patterns in Time Series Database. International Conference on Data Engineering, 1999.

[MaHe99] S. Ma and J.L. Hellerstein. Ordering Categorical Data to Improve Visualization. IEEE Symposium on Information Visualization, 1999.

[MaToVe97] H. Mannila, H. Toivonen, and A.I. Verkamo. Discovery of Frequent Episodes in Event Sequences. Data Mining and Knowledge Discovery, 1(3), 1997.

[Mill86] K.R. Milliken, A.V. Cruise, R.L. Ennis, A.J. Finkel, J.L. Hellerstein, D.J. Loeb, D.A. Klein, M.J. Masullo, H.M. Van Woerkom, and N.B. Waite. YES/MVS and the Automation of Operations for Large Computer Complexes. IBM Systems Journal, Vol. 25, No. 2, 1986.

[YeKlMoYeOh96] S.A. Yemini, S. Kliger, E. Mozes, Y. Yemini, and D. Ohsie. High Speed and Robust Event Correlation. IEEE Communications Magazine, Vol. 34, No. 5, pp. 82-90, 1996.