Effective and efficient shape based pattern



  1. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. X, NO. Y, JANUARY 200Z. Digital Object Identifier 10.1109/TKDE.2010.223. 1041-4347/10/$26.00 © 2010 IEEE.

     Effective and Efficient Shape-Based Pattern Detection over Streaming Time Series

     Yueguo Chen, Ke Chen, and Mario A. Nascimento

     • Yueguo Chen is with the Key Laboratory of Data Engineering and Knowledge Engineering, MOE of China, Renmin University of China, China. E-mail: chenyueguo@ruc.edu.cn
     • Ke Chen is with College of Computer Science, Zhejiang University. E-mail: chenk@zju.edu.cn
     • Mario A. Nascimento is with Department of Computing Science, University of Alberta. E-mail: mn@cs.ualberta.ca
     This work is partially sponsored by NSERC, Canada.

     Abstract: Existing distance measures of time series such as the Euclidean distance, DTW and EDR are inadequate in handling certain degrees of amplitude shifting and scaling variances of data items. We propose a novel distance measure of time series, Spatial Assembling Distance (SpADe), that is able to handle noise, shifting and scaling in both temporal and amplitude dimensions. We further apply SpADe to streaming pattern detection, which is very useful in trend-related analysis, sensor networks and video surveillance. Our experimental results on real time series data sets show that SpADe is an effective distance measure of time series. Moreover, high accuracy and efficiency are achieved by SpADe for continuous pattern detection in streaming time series.

     Index Terms: Distance measure, time series, shifting and scaling, pattern detection.

     1 INTRODUCTION

     Studies on evaluating the similarity of time series have attracted the interest of the database community for many years. A number of distance measures [1], [2], [3], [4] have been proposed to improve the effectiveness of matching time series, which is highly affected by noise and warps within time series [5]. The so-called warps in temporal and amplitude dimensions of time series impose difficulties in evaluating distances between time series. Figure 1 shows cases of warps (shifting and scaling) existing between two time series s1 and s2. Note that s1 is similar to s2 at the semantic level, as there is a hump followed by an ascending trend in both of them. The first warp is temporal shifting, i.e., the lag of the ascending trend to the hump in s1 (measured as d − c) is different from that (measured as d' − c') in s2. The second is amplitude shifting, e.g., the values of data items between d and e in s1 are larger than those of the corresponding items between d' and e' in s2. The third is scaling: the extensions of the humps in s1 and s2 are different in both the temporal dimension (c − a versus c' − a') and the amplitude dimension (s1[b] − s1[a] versus s2[b'] − s2[a']). Noise (f') also exists in time series s2.

     Fig. 1. Illustration of noise, shifting and scaling in temporal and amplitude dimensions of time series.

     In this paper, we focus on shape-based time series where local shapes usually imply important semantics and are very useful in identifying objects and phenomena represented by the time series. Examples of shape-based time series are trajectories, silhouettes of objects, and signals from sensors. Shape-based time series may contain certain degrees of the various warping factors mentioned above. A distance measure of time series is sensitive to a warping factor if a large distance is generated for two similar time series with such a warping factor. An effective distance measure should be insensitive to the above warping factors.

     Existing distance measures of time series can be classified into three categories. The first category is Euclidean-based measures, in which Euclidean distance is used in measuring the distance between either two original time sequences or features extracted from the original time sequences. It has been observed that the Euclidean distance is very sensitive to distortion and noise [3], [6]. Moreover, it only handles global time scaling, by shrinking or stretching whole time sequences. The second category includes numerical warping distances such as Dynamic Time Warping (DTW) [1] and Edit distance with Real Penalty (ERP) [2]. The distance between two time series is aggregated over the pair-wise differences of data items in the optimal alignment between two time sequences. These distance measures handle local time shifting and scaling [7], but are still sensitive to certain degrees of amplitude shifting and scaling as the amplitude differences of data items are accumulated. The third category is ε-matching warping distances, in which distance is aggregated over bounded similarity scores determined by a matching threshold ε. Examples
  2. are Longest Common Subsequence (LCSS) [4] and Edit Distance on Real sequence (EDR) [3]. Compared to the second category, the ε-matching warping distances are robust in the presence of noise and partially handle some amplitude shifting and scaling variances. However, they are still sensitive to certain degrees of amplitude warps because ε-matching is directly based on amplitude values. Figure 2 shows two examples where amplitude shifting and scaling variances may affect the effectiveness of existing warping distances.

     Fig. 2. Impact of amplitude shifting and scaling (panels (a) and (b), sequences A, B and C). d(A, C) may be less than d(A, B) for warping distances.

     The local shapes of time series also affect the effectiveness of distances. Figure 3 shows an example where the DTW distance of two local shapes is quite small even though they are quite distinct in shape. Existing warping distances lose much information when matching local shapes.

     Fig. 3. Impact of local shapes on warping distances.

     Global amplitude shifting and scaling can be handled by normalization [3], [8]. Given a time series s, each data item s[i] can be normalized as s'[i] = (s[i] − μ)/σ, where μ and σ are the average and standard deviation of the data items in s. Many available time series data sets have been normalized [9]. However, local amplitude shifting and scaling (an example is shown in Figure 1) cannot be handled by simple normalization of the global time series. To fully handle noise, local shapes, shifting and scaling in temporal and amplitude dimensions of shape-based time series, we propose a novel distance measure, called Spatial Assembling Distance (SpADe).

     We investigate the use of SpADe in the context of detection of streaming patterns. Pattern detection on streaming time series is to continuously monitor matching subsequences of streaming time series against some given query patterns. A pattern in time series is a set of sequential data items collected at discrete time points, describing a meaningful tendency of evolving data items during a period of time, and therefore implying important phenomena of the monitored objects. A subsequence of streaming time series is said to be matched to a query pattern if the distance between the subsequence and the query pattern is no more than a given threshold δ.

     All the mentioned distance measures of time series are designed for full sequence matching, in which distance is measured based on the full length of sequences. However, for the problem of streaming pattern detection, we have no a priori knowledge of the positions and lengths of the possible matching subsequences. When using these distances, we need to first divide the potential subsequences from the streaming time series, and then compare them to query patterns based on full matching. An obvious solution is to compare the most recent subsequences of streaming time series to the query patterns whenever a new data item arrives. However, such an approach is computationally intensive, and incurs redundant computational overhead. Segmentation is a simple way to handle subsequence matching, in which potential matching subsequences are extracted from streaming time series and compared to query patterns. However, potential segments may be hard to extract as many time series patterns have no clear boundaries.

     As a subsequence matching problem, pattern detection on streaming time series is naturally expensive. Warping distances have so far not been extended for online pattern detection in streaming time series while taking both shifting and scaling into account. SpADe is applied to efficiently perform continuous detection of patterns on streaming time sequences without the need to perform sequence segmentation. Our contributions are as follows:

     • We propose a robust distance measure of shape-based time series, SpADe, which can be applied to both full sequence and subsequence matching. It is not sensitive to shifting and scaling in either the temporal or the amplitude dimensions of time series.
     • We propose a continuous SpADe computation approach which can naturally be used for streaming pattern detection. We improve the efficiency of pattern detection by using a pruning approach.
     • We extend the SpADe distance for streaming pattern detection of multivariate time series.
     • We present experimental results that show that SpADe is an effective distance measure of time series, and that it is both efficient and effective for subsequence matching on streaming time series.

     The rest of the paper is organized as follows. Section 2 gives an overview of distance measures of time series and existing solutions for subsequence matching. Section 3 defines the basic SpADe, and Section 4 proposes effective techniques for computing the SpADe distance. Section 5 introduces the approach of continuous pattern detection by SpADe. Section 6 extends the SpADe distance for streaming pattern detection of multivariate time series. Section 7 shows the experimental study of SpADe. Section 8 summarizes our conclusions.

     This paper improves on our previous work [10] by giving a thorough analysis of warping-based subsequence
  3. matching in Section 2.3, a detailed discussion on effective computation of the SpADe distance in Section 4, an extension of SpADe for streaming pattern detection of multivariate time series in Section 6, and an extensive experimental study on the impacts of parameters and the application of streaming human motion pattern detection in Section 7.

     2 PRELIMINARIES

     2.1 Distance measures of time series

     The distance between two time series is essentially computed from the aggregation of pair-wise differences of data items within them. Traditionally, the Euclidean distance is used to measure the distances between time series of the same length. Many dimensionality reduction techniques, such as Discrete Fourier Transform [11], Singular Value Decomposition [12], Discrete Wavelet Transform [13], Adaptive Piecewise Constant Approximation [14] and Chebyshev Polynomials [15], have been applied to feature vector extraction from time series, after which Euclidean distance can be applied in measuring distances of the extracted feature vectors. However, it has been observed that the Euclidean metric is very sensitive to distortion and noise [3], [6].

     Warping distances such as DTW [1] and EDR [3] have been proposed to measure distances of time series with arbitrary lengths. The optimal alignments of data items between two time sequences are obtained by repeating some data items so that the lengths of the two sequences can be the same. As a result, local time shifting and scaling [7] are handled under those warping distances. The distance is calculated by finding the best warping path in the distance matrix using dynamic programming, which has a complexity of O(mn) (m and n are the lengths of the time series). Lower bounds of warping distances [6], [16] have been proposed to prune some real computations of warping distances. However, existing warping distances are still sensitive to shifting and scaling in the amplitude dimension of time series.

     Effectively matching time series under shifting and scaling variances has been attempted by many studies [5], [17], [18], [19], [20]. However, the techniques proposed in these studies either support only uniform shifting and scaling or cannot fully address the shifting and scaling variances in both temporal and amplitude dimensions of time series. Moreover, time series are matched based on data items in these studies, where meaningful local shapes (as in the example in Figure 3) may not be effectively captured and matched.

     2.2 Pattern detection on streaming time series

     For subsequence matching, ST-index [21], Dual Match [22] and General Match [23] extract local patterns from sequences by fixed size sliding windows. They map each window of data items into a multidimensional point and use indexing techniques to efficiently match the subsequences in feature space. The limitation of these techniques is the use of Euclidean distance for measuring distances in feature space. Park et al. [24] proposed an approach for subsequence matching by applying DTW. The suffix tree is used to index possible subsequences of the data sequences. However, all these studies on subsequence matching try to search for the matches of short query patterns in long sequences in a database, where an index can be built on the long data sequences.

     Pattern detection on streaming time series is to detect subsequences within long streaming sequences that match any given query pattern. Wu et al. [25] proposed an online segmentation and pruning algorithm to simplify the data sequence as zigzag shapes. However, the piecewise linear representation limits its application in shape-based pattern matching on time series. Euclidean distance or its variations (e.g., correlations) were used in matching patterns in some recent works on streaming time series such as BRAID [26] and SPIRIT [27]. Gao et al. [28] also studied continuous pattern queries on streaming time series. They attempted to detect the nearest neighbor pattern when a new data value arrives. As mentioned earlier, the use of simple Euclidean distance or correlation in these studies affects the effectiveness of pattern matching where shifting and scaling exist. Streaming pattern detection on DTW distance has been recently studied in [29]. The matching subsequences are continuously monitored by computing DTW distances in a continuous fashion. This technique can also be applied to other warping distances such as EDR. However, as stated earlier, these warping distances do not handle shifting and scaling in amplitude.

     2.3 Warping-based subsequence matching

     Given two time series s1 and s2 of lengths m and n, a warping distance uses a matrix of (m + 1) × (n + 1) for computing the full sequence distance by a recursive function:

     \[ M[i, j] = f_{(x, y) \in \phi(i, j)} \bigl( M[x, y] + \mathrm{subcost}((x, y), (i, j)) \bigr) \]

     M[i, j] records an intermediate result of an optimal substructure, which describes the optimal matching of two prefixes s1[1 : i] and s2[1 : j]. The main function f is either the min or the max function, depending on whether it is to measure distances or similarities. Notation φ(i, j) denotes the set of entries in the matrix from which M[i, j] can be dynamically computed. Each element (x, y) ∈ φ(i, j) satisfies x ≤ i and y ≤ j, so that M[i, j] can be dynamically computed from entries which have already been computed. Typically, φ(i, j) = {(i − 1, j), (i, j − 1), (i − 1, j − 1)}. The function subcost((x, y), (i, j)) is the additional cost for computing M[i, j] from M[x, y]. It is typically a non-negative function. The actual distance of time series is aggregated over a number of subcosts through dynamic programming. The initial condition for computing the warping distances is M[0, 0] = 0, from which the distance is aggregated. The entries in M can be computed row-by-row or column-by-column. The last entry to be computed, M[m, n], finally determines the warping distance of the two time series. For each entry
  4. (i, j) ∈ M, there must be a warping path from which M[i, j] is aggregated. In full sequence matching, we should guarantee that the warping path of each entry (i, j) (i × j > 0) is initialized from the entry (0, 0).

     The warping distance can also be applied for subsequence matching, to handle the temporal variances between the querying patterns and matching subsequences. Given a querying pattern q and a long time series s of lengths m and n respectively, a wide distance matrix M of (m + 1) × (n + 1) can be created (shown in Figure 4). Instead of evaluating the distance of two full time series based on the warping path between two fixed corner entries, we propose to evaluate the distances between q and the subsequences of s based on the warping paths from the bottom edge to the top edge of M.

     Fig. 4. Subsequence matching using warping distances.

     The boundary entries are initialized as M[0, j] = 0, M[i, 0] = +∞ (i > 0). All the other entries in M are computed column-by-column following the same recursive function as full sequence matching. Within each column, they are computed in a bottom-up manner. In each column, the top entry M[m, e] is used to evaluate whether there is a matching subsequence ending at position e of the long time series s. For each entry (m, e) of the top edge of M, a warping path can be traced out. Given such a warping path (which starts at (0, b) and ends at (m, e)), the warping distance (or its square) between q and subsequence s[b : e] can then be measured as M[m, e]. The subsequence s[b : e] will be a matching subsequence to q if M[m, e] ≤ δ.

     For streaming time series scenarios, the length of s is not fixed. Data items of s evolve dynamically. We may maintain a sliding window of width w (which is comparable to m) as the width of matrix M. When a new data item is appended to s, a new column of M is computed by refreshing all entries in that column in a bottom-up manner. Such a technique can also be applied in subsequence matching when n is too large. In this case, instead of using a matrix of (m + 1) × (n + 1), a small matrix of (m + 1) × w is enough (w ≪ n).

     3 SPATIAL ASSEMBLING DISTANCE

     3.1 Local pattern match

     In full sequence matching, the distance between two time sequences s1[1 : m] and s2[1 : n] is measured based on the full length of the two sequences. We borrow the idea from General Match [23], and extract a set of small local patterns from time series by using a fixed size of sliding window. A local pattern l of length w from a time sequence s1 can be described as l = (θ_t, θ_a, θ_s), which are the position (mid point) of l in s1, the mean amplitude of data items in l and the shape signature of l respectively. The distance of two local patterns l in s1 and l' in s2 can be measured as D_1(l, l') = f(|θ_a − θ_a'|, |θ_s − θ_s'|), which is a weighted sum of the differences in amplitude and shape features of the two local patterns. The weights in f are application-specific, depending on the tolerance of the amplitude difference and that of the shape difference.

     A local pattern match (LPM) p is formed from l and l' if D_1(l, l') < ε, which means that there is a match between l and l'. We label the positions of l in s1 and l' in s2 as x_p and y_p respectively. A matching matrix of m × n is shown in Figure 5 to describe the match of local patterns in s1 and s2. The relative positions of l and l' are obtained by projecting p horizontally and vertically. An LPM p can be described by the coordinates of its two local patterns: p = (x_p, y_p, ψ_p) = (θ_t, θ_t', θ_t − θ_t'), where ψ_p represents the temporal shifting of the two local patterns.

     Fig. 5. An example of an LPM and its corresponding local patterns in matching matrix.

     Note that there are a number of local patterns extracted from the two time sequences s1 and s2. A large number of LPMs will be formed if s1 and s2 are similar in shape. Their distribution can be visualized in the matching matrix formed from the two sequences.

     3.2 Distance between two LPMs

     We measure the SpADe distance of two time series by finding the best combination of LPMs in the matching matrix, such that they maximize the matches of s1 and s2. The quality of an LPM combination is determined by the following two criteria: 1) the projections (vertical and horizontal) of the LPMs should cover large regions of s1 and s2. The larger the covered regions, the more data items in s1 and s2 are matched; 2) the temporal shiftings of two LPMs should be as close as possible, which means that the two LPMs can be obtained by a similar transformation from local patterns in s1 to local patterns in s2. We define the gaps between two LPMs p1 and p2 on s1 and s2 as D_x(p2, p1) and D_y(p2, p1) respectively:

     \[ D_x(p_2, p_1) = \begin{cases} \max(x_{p_2} - x_{p_1} - w, 0) & \text{if } x_{p_2} > x_{p_1}; \\ +\infty & \text{otherwise.} \end{cases} \]

     \[ D_y(p_2, p_1) = \begin{cases} \max(y_{p_2} - y_{p_1} - w, 0) & \text{if } y_{p_2} > y_{p_1}; \\ +\infty & \text{otherwise.} \end{cases} \]
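The local pattern match and the two gap functions of Sections 3.1 and 3.2 can be sketched in a few lines of Python. This is only an illustrative sketch: the class and function names, the L1 norm used for the shape difference, the default weights `wa` and `ws`, and the brute-force pairing are assumptions made here and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalPattern:
    theta_t: float   # mid-point position of the window in its series
    theta_a: float   # mean amplitude of the window
    theta_s: tuple   # shape signature (any fixed-length feature vector)


@dataclass(frozen=True)
class LPM:
    x: float         # position x_p of the local pattern in s1
    y: float         # position y_p of the local pattern in s2

    @property
    def psi(self):
        """Temporal shifting psi_p = theta_t - theta_t'."""
        return self.x - self.y


def d1(l, lp, wa=1.0, ws=1.0):
    """Weighted sum of amplitude and shape differences (weights assumed)."""
    shape_diff = sum(abs(a - b) for a, b in zip(l.theta_s, lp.theta_s))
    return wa * abs(l.theta_a - lp.theta_a) + ws * shape_diff


def find_lpms(patterns1, patterns2, eps):
    """Brute-force detection: an LPM is formed whenever D_1(l, l') < eps."""
    return [LPM(l.theta_t, lp.theta_t)
            for l in patterns1 for lp in patterns2 if d1(l, lp) < eps]


def gap(later, earlier, w):
    """Shared form of D_x and D_y for windows of length w."""
    if later > earlier:
        return max(later - earlier - w, 0.0)
    return math.inf


def dx(p2, p1, w):
    return gap(p2.x, p1.x, w)


def dy(p2, p1, w):
    return gap(p2.y, p1.y, w)
```

With w = 8, two LPMs at (10, 14) and (30, 40) have gaps D_x = 12 and D_y = 18, while swapping their order gives +∞ because the second LPM must lie strictly after the first in both sequences.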
  5. The gaps are used to handle the noise and local unmatched regions within time series.

     Definition 1: The distance of two LPMs p1 and p2 is defined as D_2(p2, p1) = g(D_x(p2, p1)) + g(D_y(p2, p1)) + g(|ψ_p2 − ψ_p1|).

     Function g(x) is a penalty on the gaps between two LPMs, which can be defined by users, but should satisfy the following properties: 1) g(0) = 0; 2) g(x + y) ≥ g(x) + g(y), (x, y ≥ 0). In our study, we simply use g(x) = x, which satisfies the requirements on g(x). We also define the distance (D_2) between an LPM p and a point at the top or bottom of the matching matrix, by assuming that the point is the mid point of a virtual LPM.

     3.3 SpADe in full sequence matching

     Definition 2: Given a path r = P_s → p_1 → ... → p_t → P_e formed by P_s(0, 0), P_e(m, n), and a number of LPMs p_1, ..., p_t, the length of r is defined as

     \[ \mathrm{Cost}(r) = D_2(p_1, P_s) + \sum_{i=1}^{t-1} D_2(p_{i+1}, p_i) + D_2(P_e, p_t) \]

     Given two sequences s1[1 : m] and s2[1 : n], a matching matrix can be built based on all the LPMs between s1 and s2. Given two corner points P_s(0, 0) and P_e(m, n) in the matching matrix, {r_i} includes all the paths derived from the LPMs and linking P_s and P_e.

     Definition 3: The SpADe distance of s1 to s2 under full sequence matching is defined as D(s1, s2) = min_t Cost(r_t), r_t ∈ {r_i}.

     In other words, the SpADe distance of two given time sequences is the length of the shortest path from the bottom-left corner to the top-right corner in the matching matrix of these two sequences. We find the best combination of LPMs using the shortest path connecting the two end points. That is why we call the distance the spatial assembling distance. Finding shortest paths has been well studied and the classic Dijkstra's algorithm [30] can be applied.

     4 EFFECTIVE SPADE COMPUTATION

     4.1 Handling scaling variations

     The scaling variations of two time series are not handled in the original definition of SpADe given in the previous section. To handle the scaling variations, one time series needs to be scaled into a number of time series in both temporal and amplitude dimensions. Then, for each local pattern in the original time series, a number of scaled local patterns can be extracted from the scaled time series. Figure 6 shows how a number of scaled local patterns are extracted based on an original local pattern l. First, a number of local patterns (l1 and l2 in the example) with the same mid points and different lengths are extracted from the original time series as a means of temporal scaling. Second, for each temporally scaled local pattern (l1 as an example), a number of amplitude scaled local patterns (l3 and l4) of the same length are extracted from the same positions of the amplitude scaled time series. The set of all scaled (both in time and amplitude) local patterns derived from l (including l) is denoted as V(l). If the local pattern l is cast into S_t temporal scales and S_a amplitude scales, then |V(l)| = S_t × S_a.

     Fig. 6. Scaling the local patterns (temporal scaling and amplitude scaling).

     Given two time series s1 and s2, we actually measure the distance between them by only scaling one time series, s1. An LPM p is formed by a local pattern l in s1 and a local pattern l' in s2 if ∃ l'' ∈ V(l), D_1(l'', l') < ε. According to the definition of SpADe, to compute the distance of s1 and s2, we need to extract O(n) local patterns from s2 and conduct O(n) ε-range queries over the O(m S_t S_a) scaled local patterns extracted from s1. As a result, the total computational cost of SpADe will be much higher compared to traditional distance measures of time series such as DTW and EDR. Therefore, we propose some approximate techniques to speed up the distance computation of SpADe.

     4.2 Efficient detection of LPMs

     Short local patterns are preferred to describe the fine-grained local shapes of time series. This is because long local patterns generate more false positive LPMs, as a large ε is needed for long patterns to reduce the false dismissal ratio of LPMs. The Haar wavelet [31] is a good candidate for extracting the θ_a and θ_s features from local patterns, as low band wavelet coefficients elegantly describe the mean amplitude and the general shape of local patterns. Moreover, the Haar wavelet is computationally efficient. In our solution, we propose to use the first 4 low band wavelet coefficients as the θ_a (the first low band wavelet coefficient) and θ_s (the second to the fourth low band wavelet coefficients) features of local patterns.

     In many applications of time series, distances of a querying time series to a number of database time series are typically computed online. To improve the efficiency of matching local patterns, those existing instances can be preprocessed, and scaled local patterns can be extracted from them. A multi-dimensional index such as the R-tree [32] can be used to index those local patterns so that ε-range queries can be efficiently processed.

     To handle the variances of shifting and scaling, given a local pattern l' extracted from a query time series q, a large number of existing local patterns extracted from all data sequences will match l'. Therefore, many branches in the R-tree are involved during the query, which incurs much computational overhead. Inspired by VA-File [33], we partition the feature space into cells, and approximate the distance between local patterns according to the cells they fall in. As the number of dimensions is small and adequate variation should be allowed, the total number of filled cells is expected to be much less than the number
  6. of local patterns. Consequently, each cell records a list of original local patterns whose scaled local patterns fall in the cell. Therefore, only one local pattern l is maintained even though more than one local pattern in V(l) falls in a cell c. Given a query local pattern l' (located in cell c), all local patterns within c and the direct neighbor cells around c are treated as the matching local patterns of l'. Therefore, efficiency of detecting LPMs is achieved by checking the matching local patterns within cells.

     The space of wavelet coefficients of local patterns is partitioned into cells. Effective widths of cells are learned from the distribution of wavelet coefficients extracted from the training data set. For each wavelet coefficient f_i, we normalize it as

     \[ \bar{f}_i = \frac{f_i - \mu_i}{\sigma_i} \]

     where μ_i and σ_i are the mean and standard deviation of f_i respectively. The widths of cells in the normalized wavelet coefficient space are set as 1/4 for each dimension. To limit the number of cells, all \bar{f}_i > 2 or all \bar{f}_i < −2 are treated as outlier partitions. Therefore, each dimension is segmented into 18 partitions, and there are in total 18^4 cells in the feature space of local patterns.

     4.3 Fast SpADe using disjoint sliding windows

     Local patterns can be extracted from time series with different granularities of sliding steps. The finest granularity is applied in the original definition of SpADe, i.e., local patterns are extracted at every position of both s1 and s2. As a result, the number of detected LPMs will be very large, incurring a high computational cost of SpADe. Inspired by the idea applied in [21], we propose to speed up the SpADe computation by using wider sliding steps so that the number of derived LPMs can be remarkably reduced. In our solution, disjoint sliding windows on the query time series s2, and a sliding step of w/c (c is introduced for determining the width of the sliding step) on the other time series s1, are used to extract local patterns from the two time series. The SpADe distance can then be computed from those LPMs. The longer the LPMs, the larger the sliding steps within s1 and s2, and the more efficiency can be achieved in SpADe computation.

     Fig. 7. SpADe computation by disjoint sliding windows.

     4.4 Parameter learning

     There are some parameters, w, S_t, S_a and c, which affect the accuracy of the SpADe distance. Effective values of these parameters can be learned from the training data set by maximizing the cross validation accuracy of a one nearest neighbor classification approach. To facilitate the wavelet transformation, we choose the pattern length w = 4k, where k is an integer. The range of w is considered based on the length of time series. It cannot be too small, as short local patterns may not be long enough to represent meaningful local shape patterns. Moreover, a small w incurs a large number of local patterns, and therefore reduces the efficiency of SpADe. On the other hand, w also cannot be too large, as 4 wavelet coefficients will not be enough to approximate the complex local shapes extracted from long local patterns. In practice (tested on many time series data sets), w can be chosen from \bar{n}/64 to \bar{n}/2, where \bar{n} is the average length of time series. We generate a number of scales in time and amplitude by specifying S_t and S_a. The granularity of scales is set as 0.1. For example, if S_t is 7, then we generate temporal scales of 0.7, 0.8, ..., 1.3. Parameter c is chosen from 8 to 16. It cannot be too small, as a small c generates large sliding steps which will lose some LPMs. On the contrary, c does not need to be larger than 16 because a sliding step of w/16 is already fine enough. The four parameters w, S_t, S_a and c are adjusted within their value ranges. The combination achieving the best accuracy in cross validation on the training data set is learned as the parameters of SpADe.

     5 SPADE ON SUBSEQUENCE MATCHING

     SpADe is useful not only for full sequence matching, but for subsequence matching as well. It is a good candidate to continuously monitor subsequences. In this section, we show how the SpADe distance can be continuously computed in subsequence matching. First we give some notions used in subsequence matching. A number of time series queries q, describing the phenomena users are interested in, are preprocessed and stored in the query engine. The streaming time series s continuously feeds data items to the query engine. The query engine continuously reports the matching subsequences whose distance to any query pattern q is no more than some given query threshold δ.

     5.1 Variance of SpADe in subsequence matching

     Given a query pattern q[1 : m] and some recent data items s[t_s : t_e] in the streaming time series, the local SpADe distance of s at time point t (t_s ≤ t < t_e) is defined as:

     Definition 4:

     \[ D(q, s, t) = \min_{i < t_e} D(q, s[t + 1 : i]) \]

     D(q, s, t) measures the distance of the best matching subsequence (to q) starting at time point t + 1 of s. As shown in Figure 8, D(q, s, t) can be explained as the shortest path from point P_s(0, t) to points P_e(m, t') (t < t' < t_e). Let t_r = argmin_{t'} D(q, s[t + 1 : t']). D(q, s, t) is actually the full sequence matching SpADe distance of q to s[t + 1 : t_r]. The global time scaling of a matching subsequence s[t + 1 : t_r] to q can be measured as u = (t_r − t)/m. If u = 1, the matching subsequence has the same length as q, and it is called an equal-length match; if u > 1, the matching subsequence is longer than q, and it is
called an expanding match; otherwise, $u < 1$, it is called a shrinking match.

Fig. 8. An example of local SpADe distance.

Pattern detection tries to find subsequences of $s$ whose SpADe distance to query $q$ is less than some threshold $\delta$. This can be achieved by continuously computing local SpADe distances, i.e., finding matching subsequences satisfying $D(q, s, t) \le \delta$ at every point of $s$. However, this is not efficient because each computation of the SpADe distance requires finding the shortest path of LPMs within some window size, which consumes much computation. To improve the efficiency of continuous SpADe computation, we propose an incremental way of computing the SpADe distance. For pattern detection, the probability of having a matching subsequence grows as the number of LPMs increases. Much computation will be saved if the SpADe distance is updated only when new LPMs are detected.

Definition 5: The cumulating SpADe distance of a detected LPM $p$ to query $q$, noted as $D_c(p)$, is the shortest path starting from points at the bottom edge of the matching matrix to $p$.

Definition 6: The potential SpADe distance of a LPM $p$ to query $q$ is defined as $D_p(p) = D_c(p) + g(m - x_p - \frac{w}{2})$.

$D_c(p)$ is a lower bound on the length of paths passing through $p$ and linking the bottom and top edges of the matching matrix. Once $D_c(p) > \delta$, $p$ will not emerge in the path of any qualified matching subsequence for $q$. On the other hand, if $D_c(p) \le \delta$, $p$ is a promising LPM. Meanwhile, $D_p(p)$ is an upper bound of the local SpADe distance. Therefore, once $D_p(p) \le \delta$, a qualified matching subsequence to the query $q$ is found.

5.2 Incremental computation of SpADe

For pattern detection in streaming time series, we actually detect LPMs by cutting the most recent local pattern from the streaming data sequence, extracting features from the chopped local pattern, and retrieving the LPMs of the local pattern. On detecting a LPM $p$, it would be ideal if $D_c(p)$ and $D_p(p)$ could be computed on the fly. The following lemma supports this incremental way of SpADe computation.

Lemma 1: The LPMs detected behind a LPM $p$ on streaming time series will not change $D_c(p)$.

Proof: Suppose $p_1$ is detected behind $p$. Therefore, $y_{p_1} \ge y_p$. If $p_1$ changes $D_c(p)$, it should be in the shortest path of $D_c(p)$. Let $p_1 \to \dots \to p_t \to p$ be a path from $p_1$ to $p$ in the shortest path. Then we must be able to find two consecutive LPMs $p_{t_1}$ and $p_{t_2}$ in the path, such that $p_{t_1}$ is detected behind $p_{t_2}$, i.e., $y_{p_{t_1}} \ge y_{p_{t_2}}$, and $D_x(p_{t_2}, p_{t_1}) = +\infty$. According to Definition 1, $D_2(p_{t_2}, p_{t_1}) = +\infty$. Therefore, $D_c(p) = +\infty$, which is impossible because we can at least find a path from $P_s(0, y_p - \frac{w}{2})$ to $p$ whose cost is only $g(x_p - \frac{w}{2})$. Consequently, $p_1$ cannot change the value of $D_c(p)$.

Lemma 1 guarantees that $D_c(p)$ can be immediately computed when $p$ is detected from the streaming time series. The computation of $D_c(p)$ is to find the previous LPM of $p$, noted as $p'$, from which the shortest path from the bottom edge of the matching matrix to $p$ is found, i.e., $p' = \arg\min_{p_1} (D_c(p_1) + D_2(p, p_1))$. According to Definition 1, $p'$ should be in the left-bottom corner of $p$. Figure 9 shows the searching region $ABOC$ of $p'$. This is because for those LPMs whose reference point is beyond $ABOC$, one of the gaps of $p$ to them will be $+\infty$.

Fig. 9. Searching region of previous LPM.

However, it is not necessary to search for $p'$ in the large region of $ABOC$, as large gaps are usually not allowed in practice. Therefore, the searching region of $p'$ can be reduced by constraining the gaps between two consecutive LPMs. Figure 9 shows the constrained searching region $A'B'OC'$ with a gap bound of $\xi$. The efficiency of computing $D_c(p)$ will be improved significantly when a small $\xi$ is applied. The cumulating SpADe distance and potential SpADe distance with the constrained region are denoted as $D_{c,\xi}(p)$ and $D_{p,\xi}(p)$ respectively. On detecting $p'$, we get $D_{c,\xi}(p) = D_{c,\xi}(p') + D(p, p')$. For a range query, if $D_{c,\xi}(p) > \delta$, we simply drop $p$ as it will not appear as a LPM in a qualified matching subsequence.

To find $p'$ of $p$, we need to maintain those LPMs in the searching region of $p'$, and test all the LPMs within this region column-by-column. To reduce the number of detected LPMs, we use disjoint sliding windows on the streaming time series. Meanwhile, for each query pattern $q$, a sliding step of $w/c$ is applied. As shown in Figure 9, the number of LPMs in $A'B'O'O''$ is bounded as $c\xi^2/w^2$ due to the strategy of sliding steps.

The above model guarantees that $D_{c,\xi}(p)$ can be computed column-by-column because the previous LPM of $p$ must be in the previous $\xi/w$ columns of the column where $p$ is located. Therefore, for each query pattern $q$, the number of LPMs that need to be dynamically maintained is bounded as $O(cm\xi/w^2)$. If there are $N$ query patterns with largest length $\bar{m}$, the memory cost of continuous SpADe computation will be bounded by the maximal number of LPMs that need to be maintained, $O(cN\bar{m}\xi/w^2)$. If $t$ is the average number of LPMs detected from one chopped local pattern of the streaming time series, the complexity of the whole pattern detection will be $O(Nn\xi t^2/w^2)$. In the regions where no matching subsequences appear, the number of LPMs will be very small, close to zero. Therefore, the computation of $D_{c,\xi}(p)$ will be very efficient.

Along with the computation of $D_{c,\xi}(p)$, we record the starting point of the shortest path to $p$. $D_{p,\xi}(p)$ is computed following the calculation of $D_{c,\xi}(p)$. As we have mentioned, once $D_{p,\xi}(p)$ is found to be less than $\delta$, a qualified matching subsequence is detected. The position of the matching subsequence is actually the vertical projection from the starting point of the shortest path of $p$ to the end point of $p$. Considering that the potential SpADe distance of some LPMs around $p$ may also satisfy the range query, the LPM which has the smallest $D_{p,\xi}(p)$ within a local region is returned as the end of a matching subsequence in this region.

5.3 Pruning approach in SpADe computation

The major computational cost of a range query on streaming time series comes from the computation of the cumulating SpADe distance of detected LPMs. Query processing will be efficient if some LPMs can be pruned without the computation of cumulating SpADe distances. In the following, we introduce the concepts of post-bound and estimate-bound, and show how such a pruning approach is achieved.

Definition 7: The post-bound of a LPM $p$ is the highest position of the potential posteriors of $p$ which can be located in the next column of $p$.

A LPM $p_2$ is a potential posterior of $p_1$ if $D_{c,\xi}(p_1) + D_2(p_2, p_1) \le \delta$. Suppose the post-bound of $p_1$ is $b_1$. According to the definition, any $p_3$ satisfying $y_{p_3} = y_{p_1} + w$ and $x_{p_3} > b_1$ will not be a potential posterior of $p_1$. Based on this, we have the following lemma.

Lemma 2: Any LPM $p_4$ satisfying $y_{p_4} \ge y_{p_1} + w$ and $x_{p_4} > b_1$, where $b_1$ is the post-bound of $p_1$, will not be a potential posterior of $p_1$.

Proof: We simplify $x_{p_i}$ and $y_{p_i}$ as $x_i$ and $y_i$. For a $p_4$ satisfying the conditions in Lemma 2, a virtual LPM $p_3$ can be found such that $y_3 = y_1 + w$, $x_3 = x_4 > b_1$, and $\psi_{p_3} = \psi_{p_4}$. Therefore, $p_3$ is not a potential posterior of $p_1$. To show that $p_4$ is also not a potential posterior of $p_1$, we only need to prove that $D_2(p_4, p_1) \ge D_2(p_3, p_1)$. The relationship of $p_1$, $p_3$ and $p_4$ is shown in Figure 10.

$$
\begin{aligned}
D_2(p_3, p_1) &= g(x_3 - x_1 - w) + g(|x_3 - x_1 - w|) \\
D_2(p_4, p_1) &= g(x_4 - x_1 - w) + g(y_4 - y_1 - w) + g(|(x_4 - x_1) - (y_4 - y_1)|) \\
x_3 &= x_4 \\
g(x + y) &\ge g(x) + g(y), \quad x, y \ge 0 \\
\therefore\ \Delta D &= D_2(p_4, p_1) - D_2(p_3, p_1) \\
&= g(y_4 - y_1 - w) + g(|(x_4 - x_1) - (y_4 - y_1)|) - g(|x_4 - x_1 - w|) \\
&= g(|(x_4 - x_1) - (y_4 - y_1)|) + g(y_4 - y_1 - w) - g(x_4 - x_1 - w) \\
&\ge g(|(x_4 - x_1) - (y_4 - y_1)|) - g(|(x_4 - x_1 - w) - (y_4 - y_1 - w)|) \\
&= 0.
\end{aligned}
$$

$\therefore D_2(p_4, p_1) \ge D_2(p_3, p_1)$, and $p_4$ is not a potential posterior of $p_1$.

Fig. 10. Pruning in SpADe distance computation.

We have mentioned that disjoint sliding windows are used to chop local patterns from streaming time series. Therefore, a column of LPMs will be obtained for every chopped local pattern. The post-bound of a LPM $p_i$ in column $A$ is $b_i$. The post-bound of column $A$ can be defined as $b_A = \max_{y_{p_i} = y_A} b_i$, i.e., the highest post-bound of the $p_i$ in column $A$. According to Lemma 2, any LPM over $b_A$ and behind column $A$ will not be a potential posterior of any LPM in column $A$.

Definition 8: The estimate-bound of a column $A$ is $B_A = \max_{y_A - \xi \le y_X < y_A} b_X$ ($X$ is a column before $A$).

Figure 10 shows an example of the estimate-bound $B_Z$ of column $Z$. A LPM $p_5$ over $B_Z$ in column $Z$ is not a potential posterior of any LPM in columns $W$, $X$, $Y$. In other words, the previous LPM of $p_5$ will not be found in the searching region of $p_5$. Therefore, $p_5$ can be pruned without the computation of $D_{c,\xi}(p_5)$. The estimate-bound of a column is continuously computed based on the post-bounds of previous columns. On getting a promising LPM in a new column, we update the post-bound of that column, which is further used to compute the estimate-bound of the following columns.

6 STREAMING PATTERN DETECTION FOR MULTI-FEATURE TIME SERIES

In SpADe, local patterns are approximated for efficient matching by using wavelet transformation and grid indexing. However, when time series are multivariate sequences (i.e., $s[i]$ is a multivariate vector instead of a univariate number), the number of grids for approximating local patterns will increase exponentially, due to the curse of dimensionality [34]. As a result, the cost of indexing and matching local patterns increases exponentially. To efficiently apply the SpADe distance to streaming pattern detection of multi-feature time series, we propose to decompose the multi-feature time series into a number of time series of univariate data, and then match them in parallel. The local distances of matching subsequences ending at the same position of different decomposed time series are aggregated on the fly, which gives an overall evaluation of the match between the subsequence (ending at the current position) of the streaming time series and the query pattern.
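The decomposition idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: `decompose`, `aggregate_distance` and `toy_distance` are hypothetical names, the per-feature distance is a simple stand-in rather than SpADe, and a sequential loop stands in for matching the decomposed series in parallel. Summation is assumed as the aggregate, in line with the definition of the overall match given later in this section.

```python
from typing import Callable, List, Sequence

Series = List[float]


def decompose(multi: Sequence[Sequence[float]]) -> List[Series]:
    """Split a multivariate series (one d-dimensional item per time point)
    into d univariate series, one per feature."""
    return [list(feature) for feature in zip(*multi)]


def aggregate_distance(
    query: Sequence[Sequence[float]],
    stream: Sequence[Sequence[float]],
    feature_distance: Callable[[Series, Series], float],
) -> float:
    """Match each decomposed feature separately and sum the per-feature
    distances (summation is an assumption of this sketch)."""
    q_parts = decompose(query)
    s_parts = decompose(stream)
    if len(q_parts) != len(s_parts):
        raise ValueError("query and stream must have the same number of features")
    return sum(feature_distance(q_i, s_i) for q_i, s_i in zip(q_parts, s_parts))


def toy_distance(q: Series, s: Series) -> float:
    """Hypothetical stand-in for a univariate distance; NOT SpADe."""
    return sum(abs(a - b) for a, b in zip(q, s))


query = [(1, 10), (2, 20)]
stream = [(1, 11), (3, 20)]
total = aggregate_distance(query, stream, toy_distance)
```

In a real setting, `toy_distance` would be replaced by a univariate SpADe computation per feature; the point of the sketch is only that each feature is indexed and matched in its own low-dimensional space, which avoids a grid whose size grows exponentially with the number of features.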
For a query pattern $q$ and a streaming sequence $s$, given a dimension $i$, SpADe is applied to evaluate the match between $q_i$ and $s_i$ (which are the $i$th decomposed sequences of $q$ and $s$). At column $j$, the local match of $s_i$ is defined as $D_{i,j}(q, s) = \min_{p \in P} D_p(p)$, where $P$ is the set of all LPMs (between $q_i$ and $s_i$) detected in column $j$. It is actually the minimal potential SpADe distance of all LPMs detected in column $j$. If there is no LPM in column $j$, $D_{i,j}(q, s) = g(m)$, where $m$ is the length of $q$. In the example shown in Figure 11, $D_{i,j}(q, s) = D_p(p_1)$.

Fig. 11. An example of local (best) match.

For feature sequences $q_i$ and $s_i$, the local best match of $s_i$ at column $j$ (written $\hat{D}_{i,j}$ here to distinguish it from the local match) is defined as $\hat{D}_{i,j}(q, s) = \min_{j' \le j} (D_{i,j'}(q, s) + g(w \times (j - j')))$. Assuming that the query pattern contains $d$ features, we then define the local best match of $s$ to $q$ at column $j$ as $D_j(q, s) = \sum_{i=1}^{d} \hat{D}_{i,j}(q, s)$. $D_j(q, s)$ is thus the aggregation of the local best matches of $s_i$ over all decomposed sequences. It therefore gives an overall evaluation of the distance of a subsequence of $s$ (ending at column $j$) to the query $q$. Because all decomposed sequences of $s$ are compared against the corresponding decomposed sequences of $q$ in parallel, the local best match of $s$ can then be continuously (column-by-column) computed.

7 PERFORMANCE EVALUATION

In our performance evaluation, we compare SpADe with some commonly used distance measures of time series: Euclidean distance, DTW and EDR, in terms of accuracy and efficiency. Our test platform is a PC with a Pentium 4 3.0G CPU and 1G RAM.

7.1 Full sequence matching of SpADe

We use the UCR Time Series Classification/Clustering data sets [9] for testing the performance of SpADe in full sequence matching.

7.1.1 Accuracy in full sequence matching

As in many other studies [35], [3], one nearest neighbor classification (1NN) is used to test the accuracy of distances under full sequence matching. In 1NN classification, for each sequence in the testing data set, we predict its label from its nearest neighbor in the training data set. If the derived label is the same as the original label of the testing sequence, we get a hit; otherwise, we get a miss. As shown in Table 1, we compare the distance measures based on the classification accuracy over 19 data sets. For each distance measure, we learn the parameters (e.g., warping width of DTW, matching threshold $\varepsilon$ of EDR, $w$, $c$, $S_t$ and $S_a$ of SpADe) from the training data set by maximizing the 1NN classification accuracy of leave-one-out cross validation. The classification accuracy on the test data set is shown in Table 1. We see that in many time series data sets, especially in many of those with smooth shapes, SpADe achieves higher accuracy than the other distance measures.

Data set     Euclidean   DTW     EDR     LCSS    SpADe
Syn. con.    0.880       0.983   0.960   0.877   0.953
Gun point    0.913       0.913   0.980   0.980   1.000
CBF          0.852       0.996   0.989   0.988   0.959
FaceAll      0.714       0.808   0.806   0.718   0.767
OSULeaf      0.517       0.616   0.785   0.777   0.889
Swed. leaf   0.787       0.843   0.904   0.867   0.888
50words      0.631       0.758   0.802   0.773   0.793
Trace        0.760       0.990   0.960   1.000   1.000
Two Pat.     0.910       0.998   0.998   0.999   0.990
Wafer        0.995       0.995   0.993   0.988   0.994
FaceFour     0.784       0.886   0.966   0.920   0.977
Lighting2    0.754       0.869   0.852   0.803   0.755
Lighting7    0.575       0.712   0.699   0.712   0.699
ECG200       0.880       0.880   0.900   0.870   0.840
Adiac        0.611       0.609   0.616   0.558   0.681
Yoga         0.830       0.845   0.806   0.849   0.857
Fish         0.783       0.840   0.920   0.914   0.943
Problem4     0.917       0.900   0.917   0.933   0.933
Problem12    0.829       0.913   0.883   0.895   0.898

TABLE 1. Accuracy of 1NN classification in full sequence matching.

7.1.2 Impact of parameters

The length of local patterns $w$ is an important parameter. It determines the complexity of shapes in the extracted local patterns. However, the optimal $w$ can be learned from training data sets, and it can also be set as a trade-off between the accuracy and efficiency of classification. We show the impact of pattern length on the accuracy of leave-one-out cross validation of 1NN classification in Figure 12. Three data sets of different shapes are used in this test. The shapes of some example time series are shown on the left, and the accuracy on the corresponding data set is shown on the right. In this test, given a pattern length $w$, the maximal accuracy achieved by adjusting $c$, $S_t$ and $S_a$ is recorded. We can see that shorter local patterns are preferred in the Fish data set (Figures 12(a) and 12(b)) to capture the local shapes more accurately, because those local shapes are important in identifying the labels of instances in this data set. For the Problem4 data set (Figures 12(c) and 12(d)), longer local patterns are preferred as there is too much high frequency dithering within the shapes of the time series. The shapes of short local patterns are meaningless in this data set. On the contrary, the wavelet approximation of long local patterns reduces the impact of high frequency dithering. It therefore smooths the shapes of the time series. For the Problem12 data set (Figures 12(e) and 12(f)), pattern length $w$ does not affect the accuracy too much. However, it cannot be too long as the wavelet