Querying and Mining of Time Series Data: Experimental Comparison ...

  1. 2. Motivation and Summary of Findings <ul><li>The tightness of lower bounding (and thus the pruning power and indexing effectiveness) of different representation methods for time series data makes, for the most part, very little difference on various data sets. </li></ul><ul><li>In classification, elastic measures (e.g., DTW, LCSS, EDR and ERP) can be significantly more accurate than other measures. </li></ul><ul><li>With a large training data set, Euclidean distance is competitive with elastic measures such as DTW (thus, in most cases, getting more data helps more than fussing with distance measures). </li></ul><ul><li>Time series are ubiquitous. </li></ul><ul><li>Key aspects for achieving effectiveness and efficiency: </li></ul><ul><ul><li>representation methods </li></ul></ul><ul><ul><li>similarity measures </li></ul></ul><ul><li>Consolidate the large amount of existing research efforts. </li></ul><ul><li>We conducted the largest (by a huge margin) set of time series data mining experiments. </li></ul>
  2. 3. Comparison of Time Series Representation Methods <ul><li>8 representation methods: SAX, DFT, DWT, DCT, PAA, CHEB, APCA, IPLA </li></ul><ul><li>Use tightness of lower bounds (TLB) as a metric for comparison: </li></ul><ul><ul><li>TLB = LowerBoundDist / TrueEuclideanDist </li></ul></ul><ul><li>The tightness of lower bounding (and thus the pruning power and the effectiveness of the indexing) of different representation methods makes, for the most part, little difference on various data sets. </li></ul>[Figures: TLB on an ECG data set (foetal_ecg excerpt), TLB on a bursty data set, TLB on a periodic data set; methods compared in each plot: SAX, DCT, APCA, DFT, PAA/DWT, CHEB, IPLA]
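The TLB ratio on the slide above can be illustrated with a minimal sketch for one of the eight representations, PAA. The function names (`paa`, `paa_lower_bound`, `tlb`), the random-walk test series, and the segment counts are assumptions made for illustration; they are not taken from the slides or from the authors' code.

```python
import numpy as np

def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment."""
    series = np.asarray(series, dtype=float)
    if len(series) % n_segments != 0:
        raise ValueError("series length must be divisible by n_segments")
    return series.reshape(n_segments, -1).mean(axis=1)

def paa_lower_bound(q, c, n_segments):
    """Lower bound on the Euclidean distance between q and c from their PAA forms."""
    seg_len = len(q) // n_segments
    diff = paa(q, n_segments) - paa(c, n_segments)
    return float(np.sqrt(seg_len * np.sum(diff ** 2)))

def tlb(q, c, n_segments):
    """TLB = LowerBoundDist / TrueEuclideanDist, a value in [0, 1]."""
    true_dist = float(np.linalg.norm(np.asarray(q, float) - np.asarray(c, float)))
    if true_dist == 0.0:
        return 1.0  # identical series: the bound is trivially tight
    return paa_lower_bound(q, c, n_segments) / true_dist

# Hypothetical data: two random walks of length 480.
rng = np.random.default_rng(0)
q = rng.standard_normal(480).cumsum()
c = rng.standard_normal(480).cumsum()
for k in (4, 6, 8, 10):
    print(k, "segments -> TLB =", round(tlb(q, c, k), 3))
```

A TLB closer to 1 means the reduced representation prunes more candidates before the true distance has to be computed.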
  3. 4. Comparison of Time Series Similarity Measures - Findings <ul><li>Compared 9 similarity measures and their variants: Euclidean, L1, L∞, DISSIM, TQuEST, DTW, EDR, ERP, LCSS, Swale and SpADe </li></ul><ul><li>on 38 diverse data sets </li></ul><ul><li>Used 1-Nearest Neighbor classification to evaluate the accuracy of the underlying measures </li></ul><ul><li>Used stratified cross-validation to minimize the impact of the class distribution of the data sets </li></ul><ul><li>As training set size increases, Euclidean distance quickly becomes as effective as elastic measures (e.g., DTW, EDR). </li></ul><ul><li>Edit-distance based measures are, for the most part, as effective as DTW (but require more effort for tuning). However, they are not vastly superior, as some have suggested. </li></ul><ul><li>Some measures (e.g., DISSIM, TQuEST) that were claimed to be vastly superior to simpler methods are in fact no better, or are worse. </li></ul>
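The evaluation protocol on the slide above (1-Nearest Neighbor classification under a given distance, scored with stratified cross-validation) can be sketched as follows. This is a simplified illustration, not the authors' experimental code: the helper names, the unconstrained DTW variant with squared point costs, the fold count, and the synthetic sine data are all assumptions.

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

def dtw(a, b):
    """Unconstrained DTW: square root of the minimal accumulated squared cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))

def one_nn_predict(train_X, train_y, query, dist):
    """Return the label of the training series closest to the query."""
    distances = [dist(query, x) for x in train_X]
    return train_y[int(np.argmin(distances))]

def stratified_folds(labels, k, rng):
    """Deal the indices of each class round-robin into k folds."""
    folds = [[] for _ in range(k)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        for pos, i in enumerate(idx):
            folds[pos % k].append(int(i))
    return folds

def cv_error(X, y, dist, k=5, seed=0):
    """Stratified k-fold error rate of 1-NN under the given distance."""
    rng = np.random.default_rng(seed)
    errors = 0
    for test_idx in stratified_folds(y, k, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        for i in test_idx:
            pred = one_nn_predict(X[train_idx], y[train_idx], X[i], dist)
            errors += int(pred != y[i])
    return errors / len(y)

# Hypothetical data: two classes of noisy, phase-shifted sine waves.
rng = np.random.default_rng(1)
t = np.linspace(0, 2 * np.pi, 30)
X, y = [], []
for label, freq in ((0, 1.0), (1, 2.0)):
    for _ in range(10):
        phase = rng.uniform(0, 0.5)
        X.append(np.sin(freq * t + phase) + 0.1 * rng.standard_normal(t.size))
        y.append(label)
X, y = np.array(X), np.array(y)

print("Euclidean 1-NN error:", cv_error(X, y, euclidean))
print("DTW 1-NN error:", cv_error(X, y, dtw))
```

Any other measure from the list can be plugged in as `dist`, as long as it takes two series and returns a number that is smaller for more similar series.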
  4. 5. Example: Impact of Training Data Set Size If a large training set is available, Euclidean distance may be as good as DTW, and is the fastest one can get…
  5. 6. Visualizing Classification Accuracy Using Scatter Plot (1) [Plots: Euclidean distance vs. L1 norm and L∞ norm; DTW distance vs. Euclidean distance]
  6. 7. Visualizing Classification Accuracy Using Scatter Plot (2) [Plots: LCSS distance vs. Euclidean and DTW distance; ERP distance vs. Euclidean and DTW distance]
  7. 8. Visualizing Classification Accuracy Using Scatter Plot (3) DISSIM distance vs. Euclidean and DTW distance: It has been claimed that DISSIM “efficiently retrieves similar trajectories in cases where related work fails”. However, on average it is no better than DTW. TQuEST distance vs. Euclidean and DTW distance: It has been claimed that “DTW is the only competitor that achieves roughly similar accuracy (to TQuEST)”. However, DTW and even Euclidean distance are significantly better than TQuEST on average.
  8. 9. Visualizing Classification Accuracy Using Scatter Plot (4) Both SpADe and Swale have been proposed as being significantly better than Euclidean distance and DTW. However, they are both about as good as Euclidean distance on average (shown to the left), and slightly worse than DTW on average.
  9. 10. Conclusions & Future Work <ul><li>We attempted to consolidate existing works on representation methods and similarity measures for time series data. </li></ul><ul><li>Future extensions include: </li></ul><ul><ul><li>Conducting statistical analysis to investigate relationships among different similarity measures and presenting a correlation-based comparison. </li></ul></ul><ul><ul><li>Investigating (meta) properties of the data sets that could favor the effectiveness of one similarity measure over another. </li></ul></ul>Anything else you suggest!
