Compressed learning for time series classification
1. Compressed Learning for Time Series Classification
Shueh-Han Shih
Department of Computer Science and Information
Engineering, National Taiwan University of Science and
Technology
5. Motivation (cont’d)
• The key to handling time series data effectively is choosing a suitable representation
• Transmission and storage issues are critical in IoT scenarios
• Providing interpretable results for humans is important
Time series sparse representation - envelope
6. Time series data type
1. Symbolic sequence
2. Complex symbolic sequence
3. Simple time series
4. Multivariate time series
A brief survey on sequence classification. Z Xing, J Pei, E Keogh - ACM SIGKDD, 2010
7. Classification of time series
• Assigning instances to one of the predefined classes.
[Figure: four example time series, one per class (Class 1, Class 2, Class 3, Class 4); time axis 0–200, amplitude −200 to 200]
8. Time series classification approaches
• Feature based
• Sequence distance based
• Model based
A brief survey on sequence classification Z Xing, J Pei, E Keogh - ACM SIGKDD, 2010
10. Conventional approach
• The number of samples needed for compressed sensing is far lower than the Nyquist rate requires.
Image: http://www.ni.com/
11. Main idea
• Most real-world signals are sparse in some basis
A𝑥 = 𝑦, A ∈ ℝ^(𝑝×𝑛) and 𝑝 ≪ 𝑛
• Dramatically reduces the transmission load
(𝑦 is the measurement)
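The measurement step above can be sketched in a few lines of NumPy. The sizes n, p, k and the 1/√p normalization are illustrative choices for this sketch, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 1000, 100, 5                  # ambient dimension, measurements, sparsity

x = np.zeros(n)                          # a k-sparse signal in the standard basis
support = rng.choice(n, size=k, replace=False)
x[support] = rng.standard_normal(k)

A = rng.standard_normal((p, n)) / np.sqrt(p)   # random Gaussian measurement matrix
y = A @ x                                # transmit p values instead of n

print(f"original length: {n}, transmitted length: {len(y)}")
```

Only the p-dimensional vector y needs to be transmitted or stored, which is where the reduction in load comes from.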
12. Requirements of compressed sensing
1. 𝑥 should be a 𝑘-sparse signal
– gives a one-to-one relation between the data and compressed domains
2. A must satisfy the restricted isometry property
(1 − δₚ)‖𝑥‖₂² ≤ ‖A𝑥‖₂² ≤ (1 + δₚ)‖𝑥‖₂², for some constant δₚ ∈ (0, 1)
A = randn(𝑝, 𝑛) / 𝑛 (mean = 0, σ = 1/𝑛)
Image: Mostafa Mohsenvand Projects
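The restricted isometry property can be sanity-checked empirically: for a Gaussian A, ‖A𝑥‖₂² concentrates around ‖𝑥‖₂² over many sparse signals. This sketch uses entry variance 1/p, a common normalization that makes the expected ratio exactly 1; the slide's scaling differs only by a constant factor:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k, trials = 512, 128, 8, 200

A = rng.standard_normal((p, n)) / np.sqrt(p)   # entries ~ N(0, 1/p)

ratios = []
for _ in range(trials):
    x = np.zeros(n)
    x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
    ratios.append(np.sum((A @ x) ** 2) / np.sum(x ** 2))

# RIP with constant delta means every ratio lies in (1 - delta, 1 + delta)
delta = max(abs(r - 1) for r in ratios)
print(f"empirical delta over {trials} sparse signals: {delta:.3f}")
```

A small empirical delta over many random sparse signals is consistent with (though not a proof of) the RIP bound above.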
14. Learning in the compressed domain
• Perform the task without recovery
• SVM keeps its learnability in the compressed domain
• Reduces model complexity
Image: Compressed learning: Universal sparse dimensionality reduction and learning in the
measurement domain. R Calderbank, S Jafarpour, R Schapire - preprint, 2009 - dsp.rice.edu
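The "learn without recovery" idea can be sketched by training a linear SVM directly on the measurements 𝑦 = A𝑥. The data here is synthetic and the class construction, sizes, and scikit-learn classifier are illustrative choices, not the talk's actual setup:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
n, p, m = 200, 40, 300                 # signal length, measurements, instances

# Synthetic two-class data: each class is sparse on a different half of the support
X = np.zeros((m, n))
labels = rng.integers(0, 2, size=m)
for i, c in enumerate(labels):
    support = rng.choice(n // 2, size=5, replace=False) + c * (n // 2)
    X[i, support] = rng.standard_normal(5) + 3.0

A = rng.standard_normal((p, n)) / np.sqrt(p)   # Gaussian measurement matrix
Y = X @ A.T                                    # compress; never reconstruct x

clf = LinearSVC(random_state=0).fit(Y, labels)
print(f"training accuracy in the compressed domain: {clf.score(Y, labels):.2f}")
```

The classifier only ever sees the p-dimensional measurements, which is the Calderbank et al. result in miniature: a linear SVM in the measurement domain performs close to the best linear classifier in the data domain.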
16. The origin
• The ‘envelope’ in finance
Image: http://www.investopedia.com/
17. Preliminaries
• A time series
– T = (t₁, t₂, …, tⱼ, …, tₙ), tⱼ ∈ ℝ
• A time series dataset
– D = {Tⁱ | Tⁱ = (t₁ⁱ, t₂ⁱ, …, tₙⁱ), i = 1 to m}
• Well-synchronized, with the same length
– A set of random samples from random variables 𝐓₁, 𝐓₂, …, 𝐓ⱼ, …, 𝐓ₙ
18. Envelope creation
• Given D, the envelope with size k
– Eₖ = {Z | Z = (z₁, z₂, …, zₙ), |zⱼ − μⱼ| ≤ k·stdⱼ, ∀ zⱼ ∈ ℝ}
• μⱼ = mean(𝐓ⱼ), stdⱼ = std(𝐓ⱼ)
– Profiles the time series dataset
19. Envelope encoding
• Encode a time series T as a sparse series S
• Sparsity indicates the similarity between a time series and D
sⱼ = +1, if tⱼ > μⱼ + k·stdⱼ
sⱼ = −1, if tⱼ < μⱼ − k·stdⱼ
sⱼ = 0, otherwise
for j = 1 to n
20. Guarantee of sparsity
• By Chebyshev's inequality,
– Pr(|X − μ| ≤ kσ) ≥ 1 − 1/k²
– holds regardless of the distribution of 𝐓ⱼ
– so each sⱼ is 0 with probability at least 1 − 1/k², which guarantees sparsity
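The distribution-free bound can be sanity-checked on a deliberately non-Gaussian distribution. The exponential distribution and the sample size here are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
k = 3
samples = rng.exponential(scale=1.0, size=100_000)   # skewed, non-Gaussian
mu, sigma = samples.mean(), samples.std()

# Fraction of points that would be encoded as +-1 (outside the k*sigma band)
outside = np.mean(np.abs(samples - mu) > k * sigma)
print(f"outside envelope: {outside:.4f}  (Chebyshev bound: {1 / k**2:.4f})")
```

The observed fraction is well under the 1/k² bound, as Chebyshev guarantees for any distribution with finite variance.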
23. Determination of 𝑘 (cont’d)
• Focus on time series multi-class classification
– The envelope representation should be discriminative
k* = arg maxₖ (−aₖ + λ·bₖ)
– k trades off sparsity against distinguishability
24. Encoding result visualization
• Sparsity indicates similarity
• Sparse property ⟹ transmission efficiency
• Encoding results are interpretable
(ECGFiveDays from UCR)
27. Outline
• Introduction
• Compressed sensing
• Sparse representation - envelope
• Classification framework
• Experimental results
• Case study
• Conclusion
1. Proposed method vs. state-of-the-art methods on the classification task
2. Compressibility of envelope representation
with compressed sensing
3. Noise resistance of envelope representation
4. Time efficiency
5. Space efficiency
28. Classification performance
• Benchmark dataset from UCR (5/42)
Dataset             #Classes   Training set   Testing set   Series length
CBF                 3          30             900           128
Coffee              2          28             28            286
ECGFiveDays         2          23             861           136
ItalyPowerDemand    2          67             1029          24
Sony II             2          27             953           65
29. Classification performance (cont’d)
• Result on benchmark dataset (5/42)
– Win: 9 / lose: 18 / in between: 15 (close: 12)
– The envelope may lose when training instances are scarce; not the case in IoT scenarios, where data is never lacking
Dataset             1NN-Euclidean    1NN-DTW (best/noWin)   Envelope + linear SVM
CBF                 85.2 (0.9357)    99.6 / 99.7            90.66
Coffee              75 (0.031608)    82.1 / 82.1            85.71
ECGFiveDays         79.7 (0.8758)    79.7 / 76.8            88.38
ItalyPowerDemand    95.5 (1.0661)    95.5 / 95              97.08
Sony II             69.5 (0.9986)    69.5 / 72.5            82.79
30. Influence of compression ratio
• Compression ratio = p/n = (number of measurements) / (data dimension)
31. Influence of compression ratio (cont’d)
• Using nearly 1/3 of the datasets from UCR
– Some datasets have excellent compressibility
32. Influence of compression ratio (cont’d)
• Results on benchmark datasets (5/42)
Dataset             1NN-Euclidean   1NN-DTW (best/noWin)   Ratio=10%   Ratio=20%   Ratio=50%
CBF                 85.2            99.6 / 99.7            80.44       88.22       88.44
Coffee              75              82.1 / 82.1            71.42       82.14       89.28
ECGFiveDays         79.7            79.7 / 76.8            78.86       81.3        81.64
ItalyPowerDemand    95.5            95.5 / 95              86.58       91.73       93.97
Sony II             69.5            69.5 / 72.5            76.91       78.38       80.06
34. Robustness to noise (cont’d)
• Using ECG200 dataset as example
(Left: the original envelope. Right: the envelope with noise at SNR = 10.)
35. Robustness to noise (cont’d)
• The envelope representation is noise-resistant
– The denoising stage can even be skipped
(Left: envelope built / SVM trained with clean data. Right: envelope built / SVM trained with noisy data.)
36. Time efficiency
1. Building the envelope takes O(m·n)
2. Encoding each instance takes O(n)
3. Training the linear SVM is expected to take O(m²)
Prediction runs in linear time
[Figure: scatter of testing-stage execution time (sec.), envelope vs. KNN+ED]
37. Space efficiency
1. A 32 : (2 · #classes) reduction ratio
2. A 32 : (32 · #classes · compression ratio) reduction ratio through compressed sensing
3. Run-length encoding for a further concise format
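Run-length encoding pays off here because the ternary code is mostly zeros, so long zero runs collapse into single (value, length) pairs. A minimal sketch (`rle` is an illustrative name):

```python
from itertools import groupby

def rle(seq):
    """Run-length encode a sequence into [(value, run_length), ...]."""
    return [(v, len(list(g))) for v, g in groupby(seq)]

s = [0, 0, 0, 1, 1, 0, 0, 0, 0, -1, 0, 0]   # a toy envelope encoding
print(rle(s))   # [(0, 3), (1, 2), (0, 4), (-1, 1), (0, 2)]
```

The sparser the encoding, the fewer and longer the zero runs, and the greater the space saving.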
47. Conclusion
• Proposed a sparse representation for time series
• Proposed a heuristic to determine the envelope size k
• Verified effectiveness, efficiency, and robustness
• Demonstrated real-world use cases
Editor's Notes
Time series classification
(dis)advantage of CS
Time series representation method
Then experiments
Last,
Data is collected continuously as time-correlated series; communication & storage are the issues
Reduce dimension & recover with little loss
Extract information from the time series for the ML model
www.aeris.com
www.comp-engine.org
www.ceremade.dauphine.fr
In order to ~~~ we propose envelope…
alphabet of symbols e.g. DNA
complex symbolic sequence e.g. transaction
simple time series e.g. electric meter
multivariate time series e.g. EEG
www.bios.net
www.apps.rus.mto.gov.on.ca
www.rowetel.com
www.dianliwenmi.com
Assign new instance to certain class based on given data
Using the example from case study
model each part as one state;
the mean of the state is the mean estimated from that part
For a gesture recognizer we build several of these models, one for each gesture, and use a training set to estimate the model parameters. During recognition we simply pick the model that describes the data best.
The goal of compressed sensing is to provide a measurement matrix A with the number of measurements p as small as possible
p is the number of samples, which is ≪ the Nyquist rate
A normal random matrix A generated with specific parameters is usually good enough for most real-world applications.
Most effort is spent in the recovery stage (which we discard)
The error of an SVM in the measurement domain is, with high probability, close to the error of the best linear classifier in the data domain
The idea of ‘envelope’ has been applied in finance for a long time
used by investors and traders to help identify extreme overbought and oversold conditions (the two bands are based on the opening and closing prices, respectively)
a vector in temporal order
D is well-synchronized with the same length
regarded as random samples from a set of random variables
A set of values covered mu+- k*std.
Envelope is the profile of the dataset
possibility of applying CS for further boost
k is critical: it directly affects distinguishability and sparsity
Propose a heuristic to make the envelope distinguishable
Too large or too small a k hurts distinguishability
Options for concise format
libSVM with one-vs-one (shorter training time)
few datasets from UCR
comparing the performance of the proposed method with the state of the art
The envelope may perform worse when training instances are scarce
It is possible to reduce the size to about 10–20% and still keep the classification performance, which is very promising.
Still keep good performance after compression
SNR terms: power of the signal, power of the noise, amplitude of the signal, amplitude of the noise
Robustness of proposed method to noise
The proposed method is noise-resistant.
Faster than KNN+ED
Space-saving
Using multiple devices with weak models
integrate them to get better performance
Using BLE for transmission
identify users from door opening trajectories.
treat each step as a time series instance
the proposed method is also suitable for distinct cases.
Integrate results of multi-steps to get better performance
Door and slippers make distinct predictions
Demo
Intro. of TS classification, Pros & cons of CS
Supervised feature extraction technique
Heuristic
Benchmark, noise, compression
Real-world cases