Effective and Efficient Shape-Based Pattern Detection over Streaming Time Series

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. X, NO. Y, JANUARY 200Z

Yueguo Chen, Ke Chen, and Mario A. Nascimento

• Yueguo Chen is with the Key Laboratory of Data Engineering and Knowledge Engineering, MOE of China, Renmin University of China, China. E-mail: chenyueguo@ruc.edu.cn
• Ke Chen is with the College of Computer Science, Zhejiang University. E-mail: chenk@zju.edu.cn
• Mario A. Nascimento is with the Department of Computing Science, University of Alberta. E-mail: mn@cs.ualberta.ca
This work is partially sponsored by NSERC, Canada.
Digital Object Identifier 10.1109/TKDE.2010.223
1041-4347/10/$26.00 © 2010 IEEE

Abstract—Existing distance measures of time series such as the Euclidean distance, DTW and EDR are inadequate in handling certain degrees of amplitude shifting and scaling variances of data items. We propose a novel distance measure of time series, Spatial Assembling Distance (SpADe), that is able to handle noise, shifting and scaling in both the temporal and amplitude dimensions. We further apply SpADe to streaming pattern detection, which is very useful in trend-related analysis, sensor networks and video surveillance. Our experimental results on real time series data sets show that SpADe is an effective distance measure of time series. Moreover, SpADe achieves high accuracy and efficiency for continuous pattern detection in streaming time series.

Index Terms—Distance measure, time series, shifting and scaling, pattern detection.

[Fig. 1. Illustration of noise, shifting and scaling in the temporal and amplitude dimensions of time series. Two series, $s_1$ and $s_2$, each show a hump followed by an ascending trend; points a, b, c, d and e are marked on $s_1$, points a′ to e′ on $s_2$, and f′ marks noise in $s_2$.]

1 INTRODUCTION

Studies on evaluating the similarity of time series have attracted the interest of the database community for many years. A number of distance measures [1], [2], [3], [4] have been proposed to improve the effectiveness of matching time series, which is highly affected by noise and warps within time series [5]. The so-called warps in the temporal and amplitude dimensions of time series make it difficult to evaluate distances between time series. Figure 1 shows cases of warps (shifting and scaling) between two time series $s_1$ and $s_2$. Note that $s_1$ is similar to $s_2$ at the semantic level, as both contain a hump followed by an ascending trend. The first warp is temporal shifting: the lag between the hump and the ascending trend in $s_1$ (measured as $d - c$) differs from that in $s_2$ (measured as $d' - c'$). The second is amplitude shifting: for example, the values of the data items between $d$ and $e$ in $s_1$ are larger than those of the corresponding items between $d'$ and $e'$ in $s_2$. The third is scaling: the extents of the humps in $s_1$ and $s_2$ differ in both the temporal dimension ($c - a$ versus $c' - a'$) and the amplitude dimension ($s_1[b] - s_1[a]$ versus $s_2[b'] - s_2[a']$). Noise ($f'$) also exists in time series $s_2$.

In this paper, we focus on shape-based time series, where local shapes usually carry important semantics and are very useful in identifying the objects and phenomena represented by the time series. Examples of shape-based time series are trajectories, silhouettes of objects, and signals from sensors. Shape-based time series may contain certain degrees of each of the warping factors mentioned above. A distance measure of time series is sensitive to a warping factor if it produces a large distance for two similar time series that differ by that warping factor. An effective distance measure should be insensitive to all of these warping factors.

Existing distance measures of time series can be classified into three categories. The first category is Euclidean-based measures, in which the Euclidean distance is used to measure the distance between either two original time sequences or features extracted from them. It has been observed that the Euclidean distance is very sensitive to distortion and noise [3], [6]. Moreover, it only handles global time scaling, by compulsively shrinking or stretching time sequences. The second category includes numerical warping distances such as Dynamic Time Warping (DTW) [1] and Edit distance with Real Penalty (ERP) [2]. The distance between two time series is aggregated over the pair-wise differences of data items in the optimal alignment between the two sequences. These distance measures handle local time shifting and scaling [7], but are still sensitive to certain degrees of amplitude shifting and scaling, because the amplitude differences of data items are accumulated. The third category is $\varepsilon$-matching warping distances, in which the distance is aggregated over bounded similarity scores determined by a matching threshold $\varepsilon$. Examples are Longest Common Subsequence (LCSS) [4] and Edit Distance on Real sequence (EDR) [3].
Compared to the second category, the $\varepsilon$-matching warping distances are robust in the presence of noise and partially handle some amplitude shifting and scaling variances. However, they are still sensitive to certain degrees of amplitude warps, because $\varepsilon$-matching is directly based on amplitude values. Figure 2 shows two examples where amplitude shifting and scaling variances may affect the effectiveness of existing warping distances.

[Fig. 2. Impact of amplitude shifting and scaling. Panels (a) and (b) each show series A, B and C; $d(A, C)$ may be less than $d(A, B)$ for warping distances.]

The local shapes of time series also affect the effectiveness of distances. Figure 3 shows an example where the DTW distance of two local shapes is quite small even though the shapes are quite distinct. Existing warping distances lose much information when matching local shapes.

[Fig. 3. Impact of local shapes on warping distances.]

Global amplitude shifting and scaling can be handled by normalization [3], [8]. Given a time series $s$, each data item $s[i]$ can be normalized as $s[i] \leftarrow (s[i] - \mu)/\sigma$, where $\mu$ and $\sigma$ are the mean and standard deviation of the data items in $s$. Many available time series data sets have been normalized [9]. However, local amplitude shifting and scaling (an example is shown in Figure 1) cannot be handled by simple normalization of the whole time series. To fully handle noise, local shapes, and shifting and scaling in the temporal and amplitude dimensions of shape-based time series, we propose a novel distance measure called Spatial Assembling Distance (SpADe).

We investigate the use of SpADe in the context of streaming pattern detection. Pattern detection on streaming time series continuously monitors the subsequences of a streaming time series for matches against some given query patterns. A pattern in a time series is a set of sequential data items collected at discrete time points, describing a meaningful tendency of the evolving data items during a period of time, and therefore implying an important phenomenon of the monitored objects. A subsequence of a streaming time series is said to match a query pattern if the distance between the subsequence and the query pattern is no more than a given threshold $\delta$.

All the distance measures of time series mentioned above are designed for full sequence matching, in which the distance is measured over the full length of the sequences. In streaming pattern detection, however, we have no a priori knowledge of the positions and lengths of the possible matching subsequences. To use these distances, we first need to extract the potential subsequences from the streaming time series and then compare them to the query patterns by full matching. An obvious solution is to compare the most recent subsequences of the streaming time series to the query patterns whenever a new data item arrives. However, such an approach is computationally intensive and incurs redundant computation. Segmentation is a simple way to handle subsequence matching, in which potential matching subsequences are extracted from the streaming time series and compared to the query patterns. However, potential segments may be hard to extract, as many time series patterns have no clear boundaries.

As a subsequence matching problem, pattern detection on streaming time series is naturally expensive. Warping distances have so far not been extended to online pattern detection in streaming time series while taking both shifting and scaling into account. SpADe is applied to efficiently perform continuous detection of patterns on streaming time sequences without the need to perform sequence segmentation. Our contributions are as follows:
• We propose a robust distance measure of shape-based time series, SpADe, which can be applied to both full sequence and subsequence matching. It is not sensitive to shifting and scaling in either the temporal or the amplitude dimension of time series.
• We propose a continuous SpADe computation approach which can naturally be used for streaming pattern detection. We improve the efficiency of pattern detection by using a pruning approach.
• We extend the SpADe distance to streaming pattern detection over multivariate time series.
• We conducted an experimental study. The results show that SpADe is an effective distance measure of time series, and that it is both efficient and effective for subsequence matching on streaming time series.

The rest of the paper is organized as follows. Section 2 gives an overview of distance measures of time series and existing solutions for subsequence matching. Section 3 defines the basic SpADe, and Section 4 proposes effective techniques for computing the SpADe distance. Section 5 introduces the approach of continuous pattern detection by SpADe. Section 6 extends the SpADe distance to streaming pattern detection over multivariate time series. Section 7 presents the experimental study of SpADe. Section 8 summarizes our conclusions.

This paper improves on our previous work [10] by giving a thorough analysis of warping-based subsequence matching in Section 2.3, a detailed discussion on effective computation of the SpADe distance in Section 4, an extension of SpADe to streaming pattern detection over multivariate time series in Section 6, and an extensive experimental study on the impact of parameters and on the application to streaming human motion pattern detection in Section 7.
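The global normalization $s[i] \leftarrow (s[i] - \mu)/\sigma$ discussed in the introduction can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function names `z_normalize` and `same_shape` are hypothetical, and we assume the population standard deviation, since the text does not say which estimator is used. The usage lines show that a globally shifted and scaled copy normalizes to the same sequence, while a copy in which only the second half is shifted does not, which is the limitation the text points out.

```python
import math
import statistics


def z_normalize(s):
    """Globally normalize a series: s[i] <- (s[i] - mu) / sigma."""
    mu = statistics.fmean(s)
    sigma = statistics.pstdev(s)  # population standard deviation (assumption)
    if sigma == 0:
        # A constant series has no shape; map it to all zeros.
        return [0.0 for _ in s]
    return [(v - mu) / sigma for v in s]


def same_shape(a, b, tol=1e-9):
    """True if two equal-length series coincide after global normalization."""
    za, zb = z_normalize(a), z_normalize(b)
    return all(math.isclose(x, y, abs_tol=tol) for x, y in zip(za, zb))


base = [0.0, 1.0, 3.0, 1.0, 0.0, 2.0, 4.0, 6.0]
global_warp = [10.0 + 3.0 * v for v in base]           # global shift and scale
local_warp = base[:4] + [v + 5.0 for v in base[4:]]    # shift only the second half

print(same_shape(base, global_warp))  # True
print(same_shape(base, local_warp))   # False
```

Global normalization removes one offset and one scale factor for the whole series, so any warp that varies along the series survives it.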
2 PRELIMINARIES

2.1 Distance measures of time series

The distance between two time series is essentially computed by aggregating the pair-wise differences of their data items. Traditionally, the Euclidean distance is used to measure the distance between time series of the same length. Many dimensionality reduction techniques, such as the Discrete Fourier Transform [11], Singular Value Decomposition [12], the Discrete Wavelet Transform [13], Adaptive Piecewise Constant Approximation [14] and Chebyshev Polynomials [15], have been applied to extract feature vectors from time series, after which the Euclidean distance can be applied to measure distances between the extracted feature vectors. However, it has been observed that the Euclidean metric is very sensitive to distortion and noise [3], [6].

Warping distances such as DTW [1] and EDR [3] have been proposed to measure distances between time series of arbitrary lengths. The optimal alignment of data items between two time sequences is obtained by repeating some data items so that the two sequences have the same length. As a result, local time shifting and scaling [7] are handled by these warping distances. The distance is calculated by finding the best warping path in the distance matrix using dynamic programming, which has a complexity of $O(mn)$, where $m$ and $n$ are the lengths of the time series. Lower bounds of warping distances [6], [16] have been proposed to prune some of the actual computations of warping distances. However, existing warping distances are still sensitive to shifting and scaling in the amplitude dimension of time series.

Effectively matching time series under shifting and scaling variances has been attempted by many studies [5], [17], [18], [19], [20]. However, the techniques proposed in these studies either support only uniform shifting and scaling or cannot fully address shifting and scaling variances in both the temporal and amplitude dimensions of time series. Moreover, time series are matched based on individual data items in these studies, so meaningful local shapes (such as the example in Figure 3) may not be effectively captured and matched.

2.2 Pattern detection on streaming time series

For subsequence matching, ST-index [21], Dual Match [22] and General Match [23] extract local patterns from sequences using fixed-size sliding windows. They map each window of data items to a multidimensional point and use indexing techniques to efficiently match the subsequences in feature space. The limitation of these techniques is the use of the Euclidean distance to measure distances in feature space. Park et al. [24] proposed an approach for subsequence matching under DTW, in which a suffix tree indexes the possible subsequences of the data sequences. However, all these studies on subsequence matching search for matches of short query patterns within long sequences in a database, where an index can be built on the long data sequences.

Pattern detection on streaming time series detects matching subsequences within long streaming sequences for any given query pattern. Wu et al. [25] proposed an online segmentation and pruning algorithm that simplifies the data sequence into zigzag shapes. However, the piecewise linear representation limits its application to shape-based pattern matching on time series. The Euclidean distance or its variants (e.g., correlations) were used for matching patterns in some recent works on streaming time series, such as BRAID [26] and SPIRIT [27]. Gao et al. [28] also studied continuous pattern queries on streaming time series, attempting to detect the nearest neighbor pattern whenever a new data value arrives. As mentioned earlier, the use of simple Euclidean distance or correlation in these studies affects the effectiveness of pattern matching when shifting and scaling exist. Streaming pattern detection under the DTW distance has recently been studied in [29]. The matching subsequences are continuously monitored by computing DTW distances in a continuous fashion. This technique can also be applied to other warping distances such as EDR. However, as stated earlier, these warping distances do not handle shifting and scaling in amplitude.

2.3 Warping-based subsequence matching

Given two time series $s_1$ and $s_2$ of lengths $m$ and $n$, a warping distance uses an $(m+1) \times (n+1)$ matrix to compute the full sequence distance by the recursive function

$$M[i, j] = f_{(x, y) \in \phi(i, j)} \big( M[x, y] + \mathrm{subcost}((x, y), (i, j)) \big).$$

$M[i, j]$ records an intermediate result of an optimal substructure, which describes the optimal matching of the two prefixes $s_1[1:i]$ and $s_2[1:j]$. The main function $f$ is either min or max, depending on whether distances or similarities are measured. The notation $\phi(i, j)$ denotes the set of entries in the matrix from which $M[i, j]$ can be dynamically computed. Every element $(x, y) \in \phi(i, j)$ satisfies $x \le i$ and $y \le j$, so that $M[i, j]$ can be computed from entries that have already been computed. Typically, $\phi(i, j) = \{(i-1, j), (i, j-1), (i-1, j-1)\}$. The function $\mathrm{subcost}((x, y), (i, j))$ is the additional cost of computing $M[i, j]$ from $M[x, y]$; it is typically non-negative. The actual distance of the time series is thus aggregated over a number of subcosts through dynamic programming. The initial condition for computing warping distances is $M[0, 0] = 0$, from which the distance is aggregated. The entries in $M$ can be computed row by row or column by column. The last entry to be computed, $M[m, n]$, finally determines the warping distance of the two time series. For each entry $(i, j) \in M$, there must be a warping path from which $M[i, j]$ is aggregated. In full sequence matching, we must guarantee that the warping path of each entry $(i, j)$ ($i \times j > 0$) starts from the entry $(0, 0)$.
The warping distance can also be applied to subsequence matching, to handle the temporal variances between the query patterns and the matching subsequences. Given a query pattern $q$ and a long time series $s$ of lengths $m$ and $n$ respectively, a wide $(m+1) \times (n+1)$ distance matrix $M$ can be created (shown in Figure 4). Instead of evaluating the distance of two full time series based on the warping path between two fixed corner entries, we propose to evaluate the distances between $q$ and the subsequences of $s$ based on the warping paths from the bottom edge to the top edge of $M$.

[Fig. 4. Subsequence matching using warping distances: query $q$ ($0..m$) against the long series $s$, with entries $M[0, b]$, $M[i, j]$ and $M[m, e]$ marked.]

The boundary entries are initialized as $M[0, j] = 0$ and $M[i, 0] = +\infty$ ($i > 0$). All other entries in $M$ are computed column by column, following the same recursive function as in full sequence matching; within each column, they are computed bottom-up. In each column, the top entry $M[m, e]$ is used to evaluate whether there is a matching subsequence ending at position $e$ of the long time series $s$. For each entry $(m, e)$ on the top edge of $M$, a warping path can be traced back. Given such a warping path (which starts at $(0, b)$ and ends at $(m, e)$), the warping distance (or its square) between $q$ and the subsequence $s[b:e]$ is given by $M[m, e]$. The subsequence $s[b:e]$ is a matching subsequence of $q$ if $M[m, e] \le \delta$.

For streaming time series, the length of $s$ is not fixed: data items of $s$ arrive dynamically. We may maintain a sliding window of width $w$ (comparable to $m$) as the width of matrix $M$. When a new data item is appended to $s$, a new column of $M$ is computed by refreshing all entries in that column bottom-up. The same technique can also be applied to subsequence matching when $n$ is very large: instead of an $(m+1) \times (n+1)$ matrix, a small $(m+1) \times w$ matrix is enough ($w \ll n$).

3 SPATIAL ASSEMBLING DISTANCE

3.1 Local pattern match

In full sequence matching, the distance between two time sequences $s_1[1:m]$ and $s_2[1:n]$ is measured over the full length of both sequences. We borrow the idea of General Match [23] and extract a set of small local patterns from the time series using a fixed-size sliding window. A local pattern $l$ of length $w$ from a time sequence $s_1$ can be described as $l = (\theta_t, \theta_a, \theta_s)$: the position (mid point) of $l$ in $s_1$, the mean amplitude of the data items in $l$, and the shape signature of $l$, respectively. The distance between two local patterns $l$ in $s_1$ and $l'$ in $s_2$ can be measured as $D_1(l, l') = f(|\theta_a - \theta'_a|, |\theta_s - \theta'_s|)$, a weighted sum of the differences in the amplitude and shape features of the two local patterns. The weights in $f$ are application-specific, depending on the tolerance for amplitude differences and for shape differences.

A local pattern match (LPM) $p$ is formed from $l$ and $l'$ if $D_1(l, l') < \varepsilon$, which means that $l$ and $l'$ match. We label the positions of $l$ in $s_1$ and $l'$ in $s_2$ as $x_p$ and $y_p$ respectively. An $m \times n$ matching matrix, shown in Figure 5, describes the matches of local patterns in $s_1$ and $s_2$. The relative positions of $l$ and $l'$ are obtained by projecting $p$ horizontally and vertically. An LPM $p$ can be described by the coordinates of the two local patterns: $p = (x_p, y_p, \psi_p) = (\theta_t, \theta'_t, \theta_t - \theta'_t)$, where $\psi_p$ represents the temporal shifting of the two local patterns.

[Fig. 5. An example of an LPM and its corresponding local patterns in the matching matrix: LPM $p$ at $(p.x, p.y)$, local pattern $l$ on $s_1$ (axis $x$, $0..m$) and $l'$ on $s_2$ (axis $y$, $0..n$).]

Note that a number of local patterns are extracted from the two time sequences $s_1$ and $s_2$. A large number of LPMs will be formed if $s_1$ and $s_2$ are similar in shape. Their distribution can be visualized in the matching matrix formed from the two sequences.

3.2 Distance between two LPMs

We measure the SpADe distance of two time series by finding the best combination of LPMs in the matching matrix, such that the matches between $s_1$ and $s_2$ are maximized. The quality of an LPM combination is determined by two criteria: 1) the projections (vertical and horizontal) of the LPMs should cover large regions of $s_1$ and $s_2$, since the larger the covered regions, the more data items in $s_1$ and $s_2$ are matched; 2) the temporal shifting of two LPMs should be as small as possible, which means that the two LPMs can be obtained by a similar transformation from local patterns in $s_1$ to local patterns in $s_2$. We define the gaps between two LPMs $p_1$ and $p_2$ on $s_1$ and $s_2$ as $D_x(p_2, p_1)$ and $D_y(p_2, p_1)$ respectively:

$$D_x(p_2, p_1) = \begin{cases} \max(x_{p_2} - x_{p_1} - w, 0) & \text{if } x_{p_2} > x_{p_1}; \\ +\infty & \text{otherwise.} \end{cases}$$

$$D_y(p_2, p_1) = \begin{cases} \max(y_{p_2} - y_{p_1} - w, 0) & \text{if } y_{p_2} > y_{p_1}; \\ +\infty & \text{otherwise.} \end{cases}$$

The gaps are used to handle noise and local unmatched regions within the time series.
Definition 1: The distance between two LPMs $p_1$ and $p_2$ is defined as
$$D_2(p_2, p_1) = g(D_x(p_2, p_1)) + g(D_y(p_2, p_1)) + g(|\psi_{p_2} - \psi_{p_1}|).$$

The function $g(x)$ is a penalty on the gaps between two LPMs. It can be defined by users, but should satisfy the following properties: 1) $g(0) = 0$; 2) $g(x + y) \ge g(x) + g(y)$ for $x, y \ge 0$. In our study, we simply use $g(x) = x$, which satisfies these requirements. We also define the distance ($D_2$) between an LPM $p$ and a point at the top or bottom of the matching matrix, by treating that point as the mid point of a virtual LPM.

3.3 SpADe in full sequence matching

Definition 2: Given a path $r = P_s \to p_1 \to \dots \to p_t \to P_e$ formed by $P_s(0, 0)$, $P_e(m, n)$ and a number of LPMs $p_1, \dots, p_t$, the length of $r$ is defined as
$$\mathrm{Cost}(r) = D_2(p_1, P_s) + \sum_{i=1}^{t-1} D_2(p_{i+1}, p_i) + D_2(P_e, p_t).$$

Given two sequences $s_1[1:m]$ and $s_2[1:n]$, a matching matrix can be built from all the LPMs between $s_1$ and $s_2$. Given the two corner points $P_s(0, 0)$ and $P_e(m, n)$ of the matching matrix, let $\{r_i\}$ be the set of all paths derived from the LPMs that link $P_s$ and $P_e$.

Definition 3: The SpADe distance of $s_1$ to $s_2$ under full sequence matching is defined as $D(s_1, s_2) = \min_t \mathrm{Cost}(r_t)$, $r_t \in \{r_i\}$.

In other words, the SpADe distance of two given time sequences is the length of the shortest path from the bottom-left corner to the top-right corner of their matching matrix. We find the best combination of LPMs as the shortest path connecting the two end points; this is why we call it the spatial assembling distance. Finding shortest paths has been well studied, and the classic Dijkstra's algorithm [30] can be applied.

4 EFFECTIVE SPADE COMPUTATION

4.1 Handling scaling variations

Scaling variations between two time series are not handled by the original definition of SpADe given in the previous section. To handle them, one time series needs to be scaled into a number of time series in both the temporal and amplitude dimensions. Then, for each local pattern in the original time series, a number of scaled local patterns can be extracted from the scaled time series. Figure 6 shows how scaled local patterns are extracted from an original local pattern $l$. First, a number of local patterns ($l_1$ and $l_2$ in the example) with the same mid point and different lengths are extracted from the original time series as a means of temporal scaling. Second, for each temporally scaled local pattern ($l_1$, for example), a number of amplitude-scaled local patterns ($l_3$ and $l_4$) of the same length are extracted from the same positions of the amplitude-scaled time series. The set of all scaled (in both time and amplitude) local patterns derived from $l$ (including $l$ itself) is denoted $V(l)$. If the local pattern $l$ is cast into $S_t$ temporal scales and $S_a$ amplitude scales, then $|V(l)| = S_t \times S_a$.

[Fig. 6. Scaling the local patterns: temporal scaling of $l$ yields $l_1$ and $l_2$; amplitude scaling of $l_1$ yields $l_3$ and $l_4$.]

Given two time series $s_1$ and $s_2$, we measure the distance between them by scaling only one time series, $s_1$. An LPM $p$ is formed by a local pattern $l$ in $s_1$ and a local pattern $l'$ in $s_2$ if $\exists\, l'' \in V(l)$ such that $D_1(l'', l') < \varepsilon$. According to the definition of SpADe, to compute the distance of $s_1$ and $s_2$, we need to extract $O(n)$ local patterns from $s_2$ and conduct $O(n)$ $\varepsilon$-range queries over the $O(m S_t S_a)$ scaled local patterns extracted from $s_1$. As a result, the total computational cost of SpADe is much higher than that of traditional distance measures of time series such as DTW and EDR. Therefore, we propose some approximate techniques to speed up the computation of the SpADe distance.

4.2 Efficient detection of LPMs

Short local patterns are preferred for describing the fine-grained local shapes of time series, because long local patterns generate more false positive LPMs: a large $\varepsilon$ is needed for long patterns to reduce the false dismissal ratio of LPMs. The Haar wavelet [31] is a good candidate for extracting the $\theta_a$ and $\theta_s$ features from local patterns, as the low-band wavelet coefficients elegantly describe the mean amplitude and the general shape of a local pattern. Moreover, the Haar wavelet is computationally efficient. In our solution, we use the first 4 low-band wavelet coefficients as the $\theta_a$ (the first coefficient) and $\theta_s$ (the second to the fourth coefficients) features of local patterns.

In many applications of time series, the distances of a query time series to a number of database time series are typically computed online. To improve the efficiency of matching local patterns, the existing database instances can be preprocessed, and scaled local patterns can be extracted from them. A multidimensional index such as the R-tree [32] can be used to index these local patterns so that $\varepsilon$-range queries can be processed efficiently.

To handle shifting and scaling variances, given a local pattern $l'$ extracted from a query time series $q$, a large number of existing local patterns extracted from all data sequences will match $l'$. Therefore, many branches of the R-tree are involved during the query, which incurs much computational overhead. Inspired by the VA-File [33], we partition the feature space into cells and approximate the distance between local patterns according to the cells they fall in. As the number of dimensions is small and adequate variation should be allowed, the total number of filled cells is expected to be much less than the number of local patterns.
Consequently, each cell records a list of the original local patterns whose scaled local patterns fall in it. Therefore, a local pattern $l$ is recorded only once in a cell $c$, even if more than one local pattern in $V(l)$ falls in $c$. Given a query local pattern $l'$ (located in cell $c$), all local patterns within $c$ and the direct neighbor cells around $c$ are treated as the matching local patterns of $l'$. Hence, LPMs are detected efficiently by checking the matching local patterns within cells.

The space of wavelet coefficients of local patterns is partitioned into cells. Effective cell widths are learned from the distribution of wavelet coefficients extracted from the training data set. Each wavelet coefficient $f_i$ is normalized as $\bar{f}_i = (f_i - \mu_i)/\sigma_i$, where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of $f_i$ respectively. The widths of cells in the normalized wavelet coefficient space are set to $1/4$ in each dimension. To limit the number of cells, all $\bar{f}_i > 2$ or all $\bar{f}_i < -2$ are treated as outlier partitions. Therefore, each dimension is segmented into 18 partitions (16 partitions of width $1/4$ covering $[-2, 2]$, plus two outlier partitions), and there are $18^4$ cells in total in the feature space of local patterns.

4.3 Fast SpADe using disjoint sliding windows

Local patterns can be extracted from time series with sliding steps of different granularity. The finest granularity is used in the original definition of SpADe, i.e., local patterns are extracted at every position of both $s_1$ and $s_2$. As a result, the number of detected LPMs is very large, incurring a high computational cost for SpADe. Inspired by the idea used in [21], we propose to speed up the SpADe computation by using wider sliding steps, so that the number of derived LPMs can be remarkably reduced. In our solution, disjoint sliding windows on the query time series $s_2$ and a sliding step of $w/c$ on the other time series $s_1$ ($c$ determines the width of the sliding step) are used to extract local patterns from the two time series. The SpADe distance is then computed from the resulting LPMs. The longer the LPMs, the larger the sliding steps within $s_1$ and $s_2$, and the more efficient the SpADe computation becomes.

[Fig. 7. SpADe computation by disjoint sliding windows: matching matrix of $s_1$ ($0..m$) and $s_2$ ($0..n$), from $P_s$ to $P_e$.]

4.4 Parameter learning

Several parameters, $w$, $S_t$, $S_a$ and $c$, affect the accuracy of the SpADe distance. Effective values of these parameters can be learned from the training data set by maximizing the cross-validation accuracy of one-nearest-neighbor classification. To facilitate the wavelet transformation, we choose the pattern length $w = 4k$, where $k$ is an integer. The range of $w$ is chosen based on the length of the time series. $w$ cannot be too small, as short local patterns may not be long enough to represent a meaningful local shape; moreover, a small $w$ produces a large number of local patterns and therefore reduces the efficiency of SpADe. On the other hand, $w$ cannot be too large either, as 4 wavelet coefficients will not be enough to approximate the complex local shapes of long local patterns. In practice (tested on many time series data sets), $w$ can be chosen from 64 to $\bar{n}/2$, where $\bar{n}$ is the average length of the time series.

We generate a number of scales in time and amplitude by specifying $S_t$ and $S_a$. The granularity of scales is set to 0.1. For example, if $S_t$ is 7, we generate the temporal scales 0.7, 0.8, ..., 1.3. The parameter $c$ is chosen from 8 to 16. It cannot be too small, as a small $c$ produces large sliding steps that lose some LPMs. On the other hand, $c$ does not need to be larger than 16, because a sliding step of $w/16$ is already fine enough. The four parameters $w$, $S_t$, $S_a$ and $c$ are adjusted within their value ranges, and the combination achieving the best cross-validation accuracy on the training data set is used as the parameters of SpADe.

5 SPADE ON SUBSEQUENCE MATCHING

SpADe is useful not only for full sequence matching but also for subsequence matching, and it is a good candidate for continuously monitoring subsequences. In this section, we show how the SpADe distance can be computed continuously in subsequence matching. We first introduce some notions used in subsequence matching. A number of query patterns $q$, describing phenomena of interest to users, are preprocessed and stored in a query engine. The streaming time series $s$ continuously feeds data items to the query engine, which continuously reports the matching subsequences whose distances to any query pattern $q$ are no more than a given query threshold $\delta$.

5.1 Variance of SpADe in subsequence matching

Given a query pattern $q[1:m]$ and some recent data items $s[t_s:t_e]$ of the streaming time series, the local SpADe distance of $s$ at time point $t$ ($t_s \le t < t_e$) is defined as follows.

Definition 4: $D(q, s, t) = \min_{i < t_e} D(q, s[t+1:i])$.

$D(q, s, t)$ measures the distance of the best matching subsequence (to $q$) starting at time point $t+1$ of $s$. As shown in Figure 8, $D(q, s, t)$ can be interpreted as the shortest path from the point $P_s(0, t)$ to the points $P_e(m, t')$ ($t < t' < t_e$). Let $t_r = \arg\min_{t'} D(q, s[t+1:t'])$. Then $D(q, s, t)$ is the full sequence matching SpADe distance of $q$ to $s[t+1:t_r]$. The global time scaling of a matching subsequence $s[t+1:t_r]$ relative to $q$ can be measured as $u = (t_r - t)/m$. If $u = 1$, the matching subsequence has the same length as $q$, and it is called an equal-length match; if $u > 1$, the matching subsequence is longer than $q$, and it is called an expanding match; otherwise ($u < 1$), it is called a shrinking match.
called an expanding match. Otherwise ($u < 1$), it is called a shrinking match.

Fig. 8. An example of local SpADe distance.

Pattern detection tries to find subsequences of $s$ whose SpADe distance to query $q$ is less than some threshold $\delta$. One way to do this is to compute local SpADe distances continuously, i.e., to find matching subsequences satisfying $D(q, s, t) \le \delta$ at every point of $s$. This is inefficient, because each computation of the SpADe distance requires finding the shortest path of LPMs within some window. To make continuous SpADe computation more efficient, we propose an incremental way of computing the SpADe distance. In pattern detection, the probability of having a matching subsequence grows as the number of LPMs increases. Much computation is therefore saved if the SpADe distance is updated only when new LPMs are detected.

Definition 5: The cumulating SpADe distance of a detected LPM $p$ to query $q$, denoted $D_c(p)$, is the shortest path starting from points on the bottom edge of the matching matrix to $p$.

Definition 6: The potential SpADe distance of an LPM $p$ to query $q$ is defined as $D_p(p) = D_c(p) + g(m - x_p - w/2)$.

$D_c(p)$ is a lower bound on the length of paths that pass through $p$ and link the bottom and top edges of the matching matrix. Once $D_c(p) > \delta$, $p$ cannot appear in the path of any qualified matching subsequence for $q$. Conversely, if $D_c(p) \le \delta$, $p$ is a promising LPM. $D_p(p)$ is an upper bound of the local SpADe distance, so once $D_p(p) \le \delta$, a qualified matching subsequence for the query $q$ has been found.

5.2 Incremental computation of SpADe

In pattern detection over streaming time series, LPMs are detected in three steps:
1. cut the most recent local pattern from the streaming data sequence;
2. extract features from the chopped local pattern;
3. retrieve the LPMs of that local pattern.

Ideally, $D_c(p)$ and $D_p(p)$ are computed on the fly when an LPM $p$ is detected. The following lemma supports this incremental way of computing SpADe.

Lemma 1: The LPMs detected after an LPM $p$ on the streaming time series do not change $D_c(p)$.

Proof: Suppose $p_1$ is detected after $p$, so that $y_{p_1} \ge y_p$. If $p_1$ changed $D_c(p)$, it would have to lie on the shortest path of $D_c(p)$. Let $p_1 \to \cdots \to p_t \to p$ be the part of this shortest path from $p_1$ to $p$. Then we can find two consecutive LPMs $p_{t1}$ and $p_{t2}$ on the path such that $p_{t1}$ is detected after $p_{t2}$, i.e., $y_{p_{t1}} \ge y_{p_{t2}}$, and $D_x(p_{t2}, p_{t1}) = +\infty$. According to Definition 1, $D_2(p_{t2}, p_{t1}) = +\infty$. Therefore $D_c(p) = +\infty$. This is impossible, because there is always a path from $P_s(0, y_p - w/2)$ to $p$ whose cost is only $g(x_p - w/2)$. Consequently, $p_1$ cannot change the value of $D_c(p)$.

Lemma 1 guarantees that $D_c(p)$ can be computed immediately when $p$ is detected from the streaming time series. Computing $D_c(p)$ amounts to finding the previous LPM of $p$, denoted $p'$, through which the shortest path from the bottom edge of the matching matrix to $p$ passes, i.e., $p' = \arg\min_{p_1} \left(D_c(p_1) + D_2(p, p_1)\right)$. According to Definition 1, $p'$ must lie in the lower-left region of $p$. Figure 9 shows the searching region $ABOC$ of $p'$. For any LPM whose reference point lies outside $ABOC$, one of the gaps from $p$ to it is $+\infty$.

Fig. 9. Searching region of previous LPM.

Searching for $p'$ in the whole region $ABOC$ is unnecessary, however, because large gaps are usually not allowed in practice. The searching region of $p'$ can therefore be reduced by constraining the gap between two consecutive LPMs. Figure 9 shows the constrained searching region $A'B'OC'$ with a gap bound of $\xi$. A small $\xi$ significantly improves the efficiency of computing $D_c(p)$. The cumulating and potential SpADe distances with the constrained region are denoted $D_{c,\xi}(p)$ and $D_{p,\xi}(p)$, respectively. Once $p'$ is found, $D_{c,\xi}(p) = D_{c,\xi}(p') + D_2(p, p')$. For a range query, if $D_{c,\xi}(p) > \delta$, we simply drop $p$, since it cannot appear as an LPM in a qualified matching subsequence.

To find $p'$ for $p$, we maintain the LPMs in the searching region of $p'$ and test all LPMs within this region column by column. To reduce the number of detected LPMs, we use disjoint sliding windows on the streaming time series and a sliding step of $w/c$ on each query pattern $q$. Because of these sliding steps, the number of LPMs in $A'B'O'O''$ (Figure 9) is bounded by $c\xi^2/w^2$.

This model guarantees that $D_{c,\xi}(p)$ can be computed column by column, because the previous LPM of $p$ must lie within the $\xi/w$ columns preceding the column of $p$. The resulting bounds are as follows:
- For each query pattern $q$, the number of LPMs that must be maintained dynamically is bounded by $O(cm\xi/w^2)$.
- With $N$ query patterns of largest length $\bar{m}$, the memory cost of continuous SpADe computation is bounded by the maximal number of LPMs that must be maintained, $O(cN\bar{m}\xi/w^2)$.
- If $\bar{t}$ is the average number of LPMs detected from one chopped local pattern of the streaming time series, the complexity of the whole pattern detection is $O(Nn\xi\bar{t}^2/w^2)$.

In regions where no matching subsequences appear, the number of LPMs is very small, close to zero, so computing $D_{c,\xi}(p)$ there is very efficient.

While computing $D_{c,\xi}(p)$, we also record the starting point of the shortest path to $p$. $D_{p,\xi}(p)$ is computed right after $D_{c,\xi}(p)$. As mentioned above, a qualified matching subsequence is detected once $D_{p,\xi}(p)$ is found to be less than $\delta$. The subsequence spans the vertical projections of the starting point of the shortest path of $p$ and of the end point of $p$. The potential SpADe distances of some LPMs around $p$ may also satisfy the range query. For this reason, the LPM with the smallest $D_{p,\xi}(p)$ within a local region is returned as the end of a matching subsequence in that region.

5.3 Pruning approach in SpADe computation

The major computational cost of a range query on streaming time series is computing the cumulating SpADe distances of detected LPMs. Query processing becomes efficient if some LPMs can be pruned without computing their cumulating SpADe distances. This section introduces the concepts of post-bound and estimate-bound and shows how they achieve such pruning.

Definition 7: The post-bound of an LPM $p$ is the highest position of the potential posteriors of $p$ that can be located in the next column of $p$.

An LPM $p_2$ is a potential posterior of $p_1$ if $D_{c,\xi}(p_1) + D_2(p_2, p_1) \le \delta$. Suppose the post-bound of $p_1$ is $b_1$. By definition, any $p_3$ with $y_{p_3} = y_{p_1} + w$ and $x_{p_3} > b_1$ is not a potential posterior of $p_1$. This leads to the following lemma.

Lemma 2: For any LPM $p_4$ with $y_{p_4} \ge y_{p_1} + w$ and $x_{p_4} > b_1$, where $b_1$ is the post-bound of $p_1$, $p_4$ is not a potential posterior of $p_1$.

Proof: Abbreviate $x_{p_i}$ and $y_{p_i}$ as $x_i$ and $y_i$. For a $p_4$ satisfying the conditions of Lemma 2, a virtual LPM $p_3$ can be found such that $y_3 = y_1 + w$, $x_3 = x_4 > b_1$, and $\psi_{p_3} = \psi_{p_4}$. Therefore $p_3$ is not a potential posterior of $p_1$. To show that $p_4$ is not a potential posterior of $p_1$ either, it suffices to prove that $D_2(p_4, p_1) \ge D_2(p_3, p_1)$. Figure 10 shows the relationship of $p_1$, $p_3$ and $p_4$.

$$
\begin{aligned}
D_2(p_3, p_1) &= g(x_3 - x_1 - w) + g(|x_3 - x_1 - w|) \\
D_2(p_4, p_1) &= g(x_4 - x_1 - w) + g(y_4 - y_1 - w) + g(|(x_4 - x_1) - (y_4 - y_1)|) \\
x_3 &= x_4, \qquad g(x + y) \ge g(x) + g(y) \quad (x, y \ge 0) \\
\therefore\ \Delta D &= D_2(p_4, p_1) - D_2(p_3, p_1) \\
&= g(y_4 - y_1 - w) + g(|(x_4 - x_1) - (y_4 - y_1)|) - g(|x_4 - x_1 - w|) \\
&= g(|(x_4 - x_1) - (y_4 - y_1)|) + g(y_4 - y_1 - w) - g(x_4 - x_1 - w) \\
&\ge g(|(x_4 - x_1) - (y_4 - y_1)|) - g(|(x_4 - x_1 - w) - (y_4 - y_1 - w)|) \\
&= 0
\end{aligned}
$$

$\therefore D_2(p_4, p_1) \ge D_2(p_3, p_1)$, and $p_4$ is not a potential posterior of $p_1$.

Fig. 10. Pruning in SpADe distance computation.

As mentioned above, disjoint sliding windows are used to chop local patterns from the streaming time series, so one column of LPMs is obtained for every chopped local pattern. Let $b_i$ be the post-bound of an LPM $p_i$ in column $A$. The post-bound of column $A$ is defined as $b_A = \max_{y_{p_i} = y_A} b_i$, i.e., the highest post-bound of the LPMs in column $A$. By Lemma 2, no LPM above $b_A$ and after column $A$ is a potential posterior of any LPM in column $A$.

Definition 8: The estimate-bound of a column $A$ is $B_A = \max_{y_A - \xi \le y_X < y_A} b_X$, where $X$ ranges over the columns before $A$.

Figure 10 shows an example of the estimate-bound $B_Z$ of column $Z$. An LPM $p_5$ above $B_Z$ in column $Z$ is not a potential posterior of any LPM in columns $W$, $X$ or $Y$. In other words, the previous LPM of $p_5$ cannot be found in its searching region, so $p_5$ can be pruned without computing $D_{c,\xi}(p_5)$.

The estimate-bound of a column is computed continuously from the post-bounds of the previous columns. Whenever a promising LPM is found in a new column, the post-bound of that column is updated. That post-bound is then used to compute the estimate-bounds of the following columns.

6 STREAMING PATTERN DETECTION FOR MULTI-FEATURE TIME SERIES

In SpADe, local patterns are approximated for efficient matching using wavelet transformation and grid indexing. Multivariate sequences are a problem for this scheme. When $s[i]$ is a multivariate vector instead of a univariate number, the number of grid cells needed to approximate local patterns grows exponentially because of the curse of dimensionality [34]. As a result, the cost of indexing and matching local patterns also grows exponentially.

To apply the SpADe distance efficiently to streaming pattern detection over multi-feature time series, we propose the following approach:
- Decompose the multi-feature time series into a number of univariate time series and match them in parallel.
- Aggregate on the fly the local distances of matching subsequences that end at the same position in the different decomposed time series.

The aggregate gives an overall evaluation of the match between the query pattern and the subsequence of the streaming time series ending at the current position.
Consider a query pattern $q$, a streaming sequence $s$ and a dimension $i$. SpADe evaluates the match between $q_i$ and $s_i$, the $i$th decomposed sequences of $q$ and $s$. At column $j$, the local match of $s_i$ is defined as $D_{i,j}(q, s) = \min_{p \in P} D_p(p)$, where $P$ is the set of all LPMs (between $q_i$ and $s_i$) detected in column $j$. In other words, it is the minimal potential SpADe distance of all LPMs detected in column $j$. If there is no LPM in column $j$, then $D_{i,j}(q, s) = g(m)$, where $m$ is the length of $q$. In the example shown in Figure 11, $D_{i,j}(q, s) = D_p(p_1)$.

Fig. 11. An example of local (best) match.

For feature sequences $q_i$ and $s_i$, the local best match of $s_i$ at column $j$ is defined as $\bar{D}_{i,j}(q, s) = \min_{j' \le j} \left(D_{i,j'}(q, s) + g(w \times (j - j'))\right)$. If the query pattern contains $d$ features, the local best match of $s$ to $q$ at column $j$ is defined as $D_j(q, s) = \sum_{i=1}^{d} \bar{D}_{i,j}(q, s)$. $D_j(q, s)$ aggregates the local best matches of $s_i$ over all decomposed sequences. It therefore gives an overall evaluation of the distance from the subsequence of $s$ ending at column $j$ to the query $q$. All decomposed sequences of $s$ are compared in parallel against the corresponding decomposed sequences of $q$, so the local best match of $s$ can be computed continuously, column by column.

7 PERFORMANCE EVALUATION

This section compares SpADe with some commonly used time series distance measures (the Euclidean distance, DTW and EDR) in terms of accuracy and efficiency. Our test platform is a PC with a Pentium 4 3.0 GHz CPU and 1 GB RAM.

7.1 Full sequence matching of SpADe

We use the UCR Time Series Classification/Clustering data sets [9] to test the performance of SpADe in full sequence matching.

7.1.1 Accuracy in full sequence matching

As in many other studies [35], [3], one nearest neighbor (1NN) classification is used to test the accuracy of distance measures under full sequence matching. In 1NN classification, each sequence in the testing data set is labeled with the label of its nearest neighbor in the training data set. If the predicted label matches the original label of the testing sequence, we count a hit; otherwise, we count a miss.

Table 1 compares the distance measures by classification accuracy over 19 data sets. For each distance measure, we learn the parameters from the training data set by maximizing the 1NN classification accuracy of leave-one-out cross validation. These parameters are the warping width of DTW, the matching threshold $\varepsilon$ of EDR, and $w$, $c$, $S_t$ and $S_a$ of SpADe. Table 1 reports the classification accuracy on the test data sets. On many time series data sets, especially many of those with smooth shapes, SpADe achieves higher accuracy than the other distance measures.

TABLE 1
Accuracy of 1NN classification in full sequence matching.

Data set      Euclidean   DTW     EDR     LCSS    SpADe
Syn. con.     0.880       0.983   0.960   0.877   0.953
Gun point     0.913       0.913   0.980   0.980   1.000
CBF           0.852       0.996   0.989   0.988   0.959
FaceAll       0.714       0.808   0.806   0.718   0.767
OSULeaf       0.517       0.616   0.785   0.777   0.889
Swed. leaf    0.787       0.843   0.904   0.867   0.888
50words       0.631       0.758   0.802   0.773   0.793
Trace         0.760       0.990   0.960   1.000   1.000
Two Pat.      0.910       0.998   0.998   0.999   0.990
Wafer         0.995       0.995   0.993   0.988   0.994
FaceFour      0.784       0.886   0.966   0.920   0.977
Lightning2    0.754       0.869   0.852   0.803   0.755
Lightning7    0.575       0.712   0.699   0.712   0.699
ECG200        0.880       0.880   0.900   0.870   0.840
Adiac         0.611       0.609   0.616   0.558   0.681
Yoga          0.830       0.845   0.806   0.849   0.857
Fish          0.783       0.840   0.920   0.914   0.943
Problem4      0.917       0.900   0.917   0.933   0.933
Problem12     0.829       0.913   0.883   0.895   0.898

7.1.2 Impact of parameters

The local pattern length $w$ is an important parameter, because it determines the complexity of the shapes in the extracted local patterns. The optimal $w$ can be learned from the training data sets, or it can be set as a trade-off between classification accuracy and efficiency. Figure 12 shows the impact of the pattern length on the accuracy of leave-one-out cross validation of 1NN classification. Three data sets with different shapes are used. Example time series from each data set are shown on the left, and the corresponding accuracy is shown on the right. For each pattern length $w$, the figure records the maximal accuracy achieved by adjusting $c$, $S_t$ and $S_a$. The results differ by data set:
- Fish (Figures 12(a) and 12(b)): shorter local patterns are preferred, because they capture the local shapes more accurately, and those local shapes are important for identifying the labels of instances in this data set.
- Problem4 (Figures 12(c) and 12(d)): longer local patterns are preferred. The shapes of the time series contain too much high-frequency dithering, so the shapes of short local patterns are meaningless. The wavelet approximation of long local patterns reduces the impact of this dithering and thereby smooths the shapes of the time series.
- Problem12 (Figures 12(e) and 12(f)): the pattern length $w$ does not affect the accuracy much. However, $w$ cannot be too long, as the wavelet
transformation of local patterns loses too much shape information if the patterns are too long.

Fig. 12. Impact of pattern length w on the accuracy of SpADe. (a) Fish data set; (b) Accuracy of Fish; (c) Problem4; (d) Accuracy of Problem4; (e) Problem12; (f) Accuracy of Problem12. The left panels show example series of each class. The right panels plot accuracy against w.

The pattern length $w$ has a significant impact on the efficiency of SpADe, because it determines the number of local patterns to extract and therefore the number of LPMs to derive. Figure 13(a) shows this impact on the Fish data set, with the other parameters fixed ($S_t = S_a = 1$, $c = 8$). It plots the average computation time of 1NN classification for one test time series against the pattern length. The computational cost of SpADe drops significantly as $w$ grows. Doubling $w$ reduces the computational cost by about one order of magnitude, because the number of local patterns is halved and the sliding step is doubled. In practice, we may choose $w$ such that reducing it further no longer improves the accuracy much. Figure 13(b) shows the impact of the sliding step $w/c$ on the efficiency of SpADe: the computation time is approximately inversely proportional to $w/c$.

Fig. 13. Impact of parameters on SpADe. (a) Efficiency for w (computation time in ms vs. w); (b) Efficiency for sliding step (computation time in ms vs. sliding step w/c).

Besides $w$ and $c$, the temporal and amplitude scales $S_t$ and $S_a$ also affect the accuracy and efficiency of SpADe. We learn $S_t$ and $S_a$ by maximizing the 1NN cross-validation accuracy on the training data set, using the GunPoint data set for this test. Figures 14(a) and 14(b) show the impact of the scaling parameters on the accuracy and efficiency of SpADe, respectively.

As Figure 14(a) shows, larger temporal and amplitude scales generally yield higher classification accuracy. The effect is most pronounced in data sets where similar instances vary in both time and amplitude. If $S_t$ and $S_a$ are too large, however, many false positive matches of local patterns occur, which reduces the accuracy of SpADe. Effective values of $S_t$ and $S_a$ can be learned by maximizing the 1NN cross-validation accuracy of SpADe on the training data set. The results in Figure 14(b) show that the computation time is approximately proportional to $S_t \times S_a$.

Fig. 14. Impact of temporal and amplitude scales. (a) accuracy; (b) efficiency.

7.1.3 Efficiency over disk-based datasets

To further study the performance of SpADe, we test its efficiency on disk-based datasets using the Gait in Parkinson's Disease dataset (http://www.physionet.org/pn3/gaitpdb/). We extract a total of $2^{18}$ = 256K time series from this dataset using disjoint sliding windows over all feature sequences. Each time series has length 256, and the whole time series dataset occupies 256 MB. We use 10NN queries to test the scalability of the distance measures (SpADe, DTW,
EDR) on datasets of $2^{10}$ to $2^{18}$ instances. We apply the testing method used in [6], i.e., we report the ratio of the time cost of each distance measure to the time cost of a linear scan using DTW. For DTW and EDR, a 16-dimensional in-memory index [6] is applied. For SpADe, we learn the parameters from 200 time series instances (5 from each class). The SpADe index is usually larger than the dataset (around 2-5 times in our experiments), so it is stored on disk.

As Figure 15 shows, SpADe outperforms the linear scan by one order of magnitude regardless of the dataset size. This is because, for a given query, only the inverted lists of the local patterns that match the query's local patterns are retrieved from the SpADe index. The pruning power of DTW and EDR, in contrast, increases as the dataset grows, so they perform better once the dataset is large enough. This experiment indicates that SpADe is not a good choice for indexing large disk-based datasets.

Fig. 15. Efficiency comparison of full sequence matching (time cost ratio to a linear scan with DTW, for SpADe, DTW and EDR, over dataset sizes from $2^{10}$ to $2^{18}$ time series).

7.2 Subsequence matching under SpADe

7.2.1 Data set

Measuring the performance of SpADe on continuous pattern detection requires streaming time series data sets. Unfortunately, we were unable to find a long streaming time series with labelled subsequences. We observe, however, that concatenating the samples of the MotorCurrent (MC) data set [9] in sequence yields a smooth streaming time series.

The MC data set contains 21 classes of signals, with 20 samples per class, for a total of 420 signals. We resample the MC data set with a fixed step, giving each sample sequence a length of 300. A synthetic labelled streaming time series (of length 12600) is then generated by simply connecting these samples. To ensure shifting and scaling along both the time and amplitude dimensions, we randomly apply shifting and scaling at random positions of the stream. On average, one shift (or scaling) is introduced per half of a query pattern's length. We test the distance measures under various degrees of shifting and scaling along the temporal and amplitude dimensions of this synthetic streaming time series.

7.2.2 Experimental settings

We compare SpADe with the Euclidean distance, DTW and EDR under streaming subsequence matching, in terms of accuracy and efficiency. For each class of signals in the MC data set, we generate one representative pattern as a query pattern, for a total of 21 query patterns. To test accuracy, we use a KNN query to retrieve the 10 nearest neighbors of each query pattern (each class has 10-11 repetitive patterns in the data set). Whether a detected matching subsequence is a hit or a miss is determined by its position. Distances of subsequences are measured continuously, and a small window is used to retrieve the best matching subsequence in each local region. Accuracy is reported as the average accuracy of the KNN queries over all 21 query patterns.

The competitors are configured as follows:
- Euclidean: subsequences with the same length as the query patterns are extracted from the streaming time series, and their Euclidean distances to the query patterns are computed. Early abandoning is applied to speed up the distance computation. In this general Euclidean-based streaming pattern detection approach, one subsequence is extracted at every position of the stream.
- Eu-sliding: same as Euclidean, but the sliding step is enlarged to the local pattern length of SpADe, so one subsequence is extracted per sliding step.
- DTW and EDR: the streaming pattern detection technique proposed in [29] is applied.
- SpADe: a small pattern length of $w = 8$ is used. $S_t$ and $S_a$ are set according to the maximal scaling variations introduced in the synthetic streaming time series. The largest allowed gap is $\xi = 5w$, which is large enough for most applications.

7.2.3 Performance comparison

The first test evaluates the distance measures under various degrees of time shifting. We introduce time shifting by moving some random partitions of the time series forward or backward by a random step, so that items between two consecutive partitions are no longer aligned. As shown in Figure 16(a), SpADe and EDR are much more accurate than the Euclidean distance and DTW. DTW handles time shifting, but allowing time shifting in DTW may generate many pathological paths. Moreover, DTW is more sensitive than EDR to the shape gaps caused by time shifting. The Euclidean distance performs worst, because time shifting misaligns the dimensions it compares.

Figure 16(e) shows the efficiency of the distance measures under the same degrees of time shifting as in Figure 16(a). SpADe is much more efficient than the others. DTW and EDR are expensive due to their $O(mn)$ complexity in subsequence matching of one
query pattern. Eu-sliding is around 8 times faster than the general Euclidean approach because of its wider sliding steps. The computational cost of the Euclidean distances increases slightly as time shifting grows, because the pruning bounds converge more slowly.

Fig. 16. Performance under various factors, with accuracy in the top row and efficiency in the bottom row.
- Panels: (a) Accuracy in time shifting; (b) Accuracy in time scaling; (c) Accuracy in amp. shifting; (d) Accuracy in amp. scaling; (e) Efficiency in time shifting; (f) Efficiency in time scaling; (g) Efficiency in amp. shifting; (h) Efficiency in amp. scaling.
- x-axes: max time shifting width (% of query length), max time scaling (1±x%), max amplitude shifting width (% of variance), max amplitude scaling (1±x%).
- y-axes: accuracy (top row); time in seconds (bottom row).
- Curves: Euclidean, Eu-sliding, DTW, EDR, SpADe.

Time scaling is a difficult factor for distance measures, because various time scales must be tested to guarantee no false dismissals. DTW and EDR naturally handle some time scaling through the best warping paths in the distance matrix. EDR is more sensitive to time scaling than DTW, because the length difference between two matching sequences is aggregated into the EDR distance. This is why DTW is more accurate than EDR when the time scale variation is large (e.g., larger than 20% in Figure 16(b)). SpADe achieves the best accuracy under all time scales. The Euclidean distance is not comparable to the others, because it cannot handle scaling without forced stretching and shrinking. The straightforward approach of stretching or shrinking query patterns to the various scales of subsequences incurs huge computation, and it handles only global scaling.

Figure 16(f) shows the efficiency of the distance measures under various degrees of time scaling. SpADe is about as efficient as Eu-sliding and much more efficient than the others. SpADe becomes expensive when $S_t$ is enlarged to handle larger time scales. The computational cost of the other distances, in contrast, stays close to that in Figure 16(e), because they pay no special cost for time scaling.

Figures 16(c) and 16(g) compare the four distance measures under various degrees of amplitude shifting. Except for SpADe, all distance measures have low accuracy when the amplitude shifting is large. EDR still works well when the amplitude shifting is small (no larger than the matching threshold $\varepsilon$). DTW and the Euclidean distance have similar accuracy across degrees of amplitude shifting. SpADe also remains much more efficient than the other distance measures.

Figures 16(d) and 16(h) show the performance of the distance measures under various degrees of amplitude scaling. SpADe again achieves the best accuracy at all amplitude scales, although it pays some additional cost to handle the amplitude scaling variations.

The above tests treat the variation factors independently. We also compare the distance measures in a combined case with maximum time shifting of 2%, maximum amplitude shifting and scaling of 10%, and maximum time scaling of 20%. Table 2 shows the results. SpADe is much more accurate than the other distances. Its efficiency is only slightly worse than that of Eu-sliding and much better than that of the remaining three.

TABLE 2
Performance comparison on a composite case.

                      Euclidean   Eu-sliding   DTW     EDR     SpADe
Accuracy              0.133       0.129        0.638   0.929   0.981
Time cost (seconds)   64.8        8.05         175.1   222.5   16.6

7.2.4 Pruning effect of SpADe

The pruning approach of SpADe reduces the number of cumulating SpADe distances that must be calculated. We test its effect under various scales of time shifting and scaling. As shown in Figure 17, pruning improves efficiency by a factor of 2-3 in these tests.

Fig. 17. Pruning effect of SpADe. (a) Time shifting; (b) Time scaling. Computation time in seconds, with and without pruning, plotted against max time shifting width (% of query length) and max time scaling (1±x%).
7.3 Pattern detection of multi-feature time series
We generate multi-feature time series from the real-life CMU motion capture database [36]. This data set has more than two thousand human motion sequences of more than 100 subjects, with sequence lengths varying from 2 to 22,948. Based on the geometric features proposed in [37], we extract four features related to the motion of the lower half of the body: the distance between the two feet, the angle of the left knee, the angle of the right knee, and the angle between the two femurs. Since the human motion sequences are captured at high resolution (120Hz sampling rate), we re-sample them by keeping one snapshot out of every 4. Therefore, each motion sequence contains 4 decomposed time series with a sampling rate of 30Hz.

We randomly choose 20 motion sequences whose length is more than 500. From each of them, we randomly extract one subsequence of length 128 as a query pattern. Therefore, each query pattern lasts more than 4 seconds in real motion (128 samples at 30Hz, about 4.3 seconds). Because the query patterns focus on the lower half of the body, we avoid using normal walking as a query pattern, since many motion sequences contain many instances of normal walking. We create a streaming motion sequence by concatenating the ten longest motion sequences in the dataset, which lasts for around 1450 seconds. We run the experiments as 10-nearest-neighbor queries for streaming motion pattern detection. We measure the accuracy of streaming pattern detection by manually comparing the query patterns with their 10 nearest neighbors, using a 3D visualization tool for motion sequences. The accuracy is finally evaluated based on the number of accurately matched subsequences.

The SpADe distance is compared against the DTW and EDR distances. The matching threshold of EDR is set to 0.3 according to the parameter-setting scheme discussed in [3]. We test the average response time of a 10-nearest-neighbor query. The average computational time of DTW and EDR for processing one 10-nearest-neighbor query over the streaming motion sequence is 51.7ms and 54.3ms, respectively. The average accuracy of DTW and EDR is 69% and 66.5%, respectively.

For the SpADe distance, we choose the length of local patterns as w = 8, which lasts for around 0.27 seconds. For each query, we first extract all scaled local patterns from the query sequences using a sliding step of 1 (c = w). We test the performance of SpADe by varying the temporal scales (St) and the amplitude scales (Sa). The average computational time and accuracy are shown in Figure 18.

Fig. 18. The performance of SpADe on streaming multi-feature time series. (a) Efficiency (b) Accuracy

On the efficiency of streaming pattern detection, all three distance measures are efficient enough. For DTW and EDR, it takes less than 0.06 seconds to match a query pattern against a streaming motion sequence of 1450 seconds. This implies that, with our experimental settings, the streaming pattern detection algorithm can monitor more than 20k query patterns simultaneously against one streaming motion sequence, using either DTW or EDR (1450 s / 0.06 s ≈ 24,000). From Figure 18(a), we can see that the efficiency of SpADe is highly dependent on the number of scales (St and Sa) it supports. The efficiency of SpADe is similar to that of DTW and EDR when St × Sa ≈ 40. SpADe is more efficient than DTW and EDR when it supports smaller St and Sa. Even when large values of St and Sa (e.g., St = 13 and Sa = 13) are supported, the computational time of SpADe is still no more than three times that of DTW or EDR.

On the accuracy of streaming pattern detection, SpADe outperforms DTW and EDR. Even with St = 1 and Sa = 1, SpADe is still slightly more accurate than DTW and EDR. This is because SpADe captures the local shapes of motion sequences, which are very useful in matching similar motion sequences. When St and Sa are slightly enlarged, the accuracy of SpADe improves to as much as 76.5%. Figure 18(b) shows that supporting amplitude variations improves the accuracy of SpADe on streaming pattern detection of motion sequences.

8 CONCLUSION
Motivated by the fact that the Euclidean distance, DTW and EDR have poor accuracy on pattern detection in streaming time series when shifting and scaling exist in the temporal or amplitude dimensions, we proposed a novel distance, SpADe, for measuring the distance between shape-based time series. SpADe is computed by detecting the best combination of LPMs through the shortest path in the matching matrix. We applied SpADe to pattern detection in streaming time series and streaming motion sequences. To speed up the computation of SpADe distances, we proposed to use wavelets to retrieve the important shape coefficients of local patterns. We used partitioned cells to approximate and index these multi-dimensional local patterns. To further speed up continuous query processing with SpADe, we proposed an incremental way of computing SpADe distances, which is well suited to pattern detection on streaming time series. We also proposed a pruning approach that limits the search region of the previous LPM and prunes the computation of cumulating SpADe distances. An extensive performance study was conducted, and the results showed that SpADe is an effective distance measure for shape-based time series, and that it is both efficient and effective for subsequence matching on streaming time series.

9 ACKNOWLEDGMENTS
We would like to acknowledge and thank Professors Beng Chin Ooi and Anthony K.H. Tung for their collaboration on an earlier version of this paper [10].

REFERENCES
[1] D. J. Berndt and J. Clifford, “Using dynamic time warping to find patterns in time series,” in KDD Workshop, 1994, pp. 359–370.
[2] L. Chen and R. T. Ng, “On the marriage of Lp-norms and edit distance,” in VLDB, 2004, pp. 792–803.
[3] L. Chen, M. T. Özsu, and V. Oria, “Robust and fast similarity search for moving object trajectories,” in SIGMOD Conference, 2005, pp. 491–502.
[4] M. Vlachos, D. Gunopulos, and G. Kollios, “Discovering similar multidimensional trajectories,” in ICDE, 2002, pp. 673–684.
[5] R. Agrawal, K.-I. Lin, H. S. Sawhney, and K. Shim, “Fast similarity search in the presence of noise, scaling, and translation in time-series databases,” in VLDB, 1995, pp. 490–501.
[6] E. J. Keogh, “Exact indexing of dynamic time warping,” in VLDB, 2002, pp. 406–417.
[7] A. W.-C. Fu, E. J. Keogh, L. Y. H. Lau, and C. A. Ratanamahatana, “Scaling and time warping in time series querying,” in VLDB, 2005, pp. 649–660.
[8] M. D. Morse and J. M. Patel, “An efficient and accurate method for evaluating time series similarity,” in SIGMOD Conference, 2007, pp. 569–580.
[9] UCR Time Series Data Mining Archive, http://www.cs.ucr.edu/~eamonn/time_series_data/.
[10] Y. Chen, M. A. Nascimento, B. C. Ooi, and A. K. H. Tung, “SpADe: On shape-based pattern detection in streaming time series,” in ICDE, 2007, pp. 786–795.
[11] R. Agrawal, C. Faloutsos, and A. N. Swami, “Efficient similarity search in sequence databases,” in FODO, 1993, pp. 69–84.
[12] F. Korn, H. V. Jagadish, and C. Faloutsos, “Efficiently supporting ad hoc queries in large datasets of time sequences,” in SIGMOD Conference, 1997, pp. 289–300.
[13] I. Popivanov and R. J. Miller, “Similarity search over time-series data using wavelets,” in ICDE, 2002, pp. 212–.
[14] B.-K. Yi and C. Faloutsos, “Fast time sequence indexing for arbitrary Lp norms,” in VLDB, 2000, pp. 385–394.
[15] Y. Cai and R. T. Ng, “Indexing spatio-temporal trajectories with Chebyshev polynomials,” in SIGMOD Conference, 2004, pp. 599–610.
[16] Y. Zhu and D. Shasha, “Warping indexes with envelope transforms for query by humming,” in SIGMOD Conference, 2003, pp. 181–192.
[17] K. K. W. Chu and M. H. Wong, “Fast time-series searching with scaling and shifting,” in PODS, 1999, pp. 237–248.
[18] W.-K. Loh, S.-W. Kim, and K.-Y. Whang, “A subsequence matching algorithm that supports normalization transform in time-series databases,” Data Min. Knowl. Discov., vol. 9, no. 1, pp. 5–28, 2004.
[19] A. W.-C. Fu, E. J. Keogh, L. Y. H. Lau, C. A. Ratanamahatana, and R. C.-W. Wong, “Scaling and time warping in time series querying,” VLDB J., vol. 17, no. 4, pp. 899–921, 2008.
[20] S. Gandhi, S. Nath, S. Suri, and J. Liu, “GAMPS: Compressing multi sensor data by grouping and amplitude scaling,” in SIGMOD Conference, 2009, pp. 771–784.
[21] C. Faloutsos, M. Ranganathan, and Y. Manolopoulos, “Fast subsequence matching in time-series databases,” in SIGMOD Conference, 1994, pp. 419–429.
[22] Y.-S. Moon, K.-Y. Whang, and W.-K. Loh, “Duality-based subsequence matching in time-series databases,” in ICDE, 2001, pp. 263–272.
[23] Y.-S. Moon, K.-Y. Whang, and W.-S. Han, “General match: a subsequence matching method in time-series databases based on generalized windows,” in SIGMOD Conference, 2002, pp. 382–393.
[24] S. Park, W. W. Chu, J. Yoon, and C. Hsu, “Efficient searches for similar subsequences of different lengths in sequence databases,” in ICDE, 2000, pp. 23–32.
[25] H. Wu, B. Salzberg, and D. Zhang, “Online event-driven subsequence matching over financial data streams,” in SIGMOD Conference, 2004, pp. 23–34.
[26] Y. Sakurai, S. Papadimitriou, and C. Faloutsos, “BRAID: Stream mining through group lag correlations,” in SIGMOD Conference, 2005, pp. 599–610.
[27] S. Papadimitriou, J. Sun, and C. Faloutsos, “Streaming pattern discovery in multiple time-series,” in VLDB, 2005, pp. 697–708.
[28] L. Gao and X. S. Wang, “Continually evaluating similarity-based pattern queries on a streaming time series,” in SIGMOD Conference, 2002, pp. 370–381.
[29] Y. Sakurai, C. Faloutsos, and M. Yamamuro, “Stream monitoring under the time warping distance,” in ICDE, 2007, pp. 1046–1055.
[30] E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numerische Mathematik, no. 1, pp. 269–271, 1959.
[31] C. K. Chui, An Introduction to Wavelets. San Diego: Academic Press, 1992.
[32] A. Guttman, “R-trees: a dynamic index structure for spatial searching,” in SIGMOD Conference. New York, NY, USA: ACM, 1984, pp. 47–57.
[33] R. Weber, H.-J. Schek, and S. Blott, “A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces,” in VLDB, 1998, pp. 194–205.
[34] C. Böhm, S. Berchtold, and D. A. Keim, “Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases,” ACM Comput. Surv., vol. 33, no. 3, pp. 322–373, 2001.
[35] E. J. Keogh and S. Kasetty, “On the need for time series data mining benchmarks: a survey and empirical demonstration,” in KDD, 2002, pp. 102–111.
[36] CMU Graphics Lab Motion Capture Database, http://mocap.cs.cmu.edu/.
[37] M. Müller, T. Röder, and M. Clausen, “Efficient content-based retrieval of motion capture data,” ACM Trans. Graph., vol. 24, no. 3, pp. 677–685, 2005.

Yueguo Chen received the B.S. and Master's degrees in Mechanical Engineering and Control Engineering from Tsinghua University, Beijing, in 2001 and 2004, respectively. He earned his Ph.D. degree in Computer Science from the National University of Singapore in 2009. He is currently an Assistant Professor in the Key Laboratory of DEKE, Renmin University of China. His research interests include community information systems and the management of RDF data, Web data, unstructured data and spatial temporal data.

Ke Chen received the Ph.D. degree in Computer Science from Zhejiang University in 2007. She was a postdoctoral associate at the School of Aeronautics and Astronautics, Zhejiang University, during 2007-2009. She is currently an Assistant Professor at the College of Computer Science in the same university. Her research interests include spatial temporal data management, peer-to-peer systems, and data privacy.

Mario A. Nascimento earned his Ph.D. degree in Computer Science at Southern Methodist University (Dallas, TX) and is currently an Associate Professor and Associate Chair (Research) at the University of Alberta's Department of Computing Science in Canada. He has also been a Visiting Professor at universities in Korea (CAU), Singapore (NUS) and Denmark (AAU). He has published over 60 papers in international journals, conferences and workshops, often serves as a program committee member of the top database conferences, has been program co-chair of several workshops, and has also served as guest editor for journals. Mario was also ACM SIGMOD's Information Director and ACM SIGMOD Record's Editor-in-Chief, and is a Senior Member of the ACM. His main research interests lie in data management of spatio-temporal data and in wireless sensor networks.