Compressed learning for time series classification
1. Compressed Learning for Time Series Classification
Shueh-Han Shih
Department of Computer Science and Information
Engineering, National Taiwan University of Science and
Technology
5. Motivation (cont’d)
• The key to handling time series data effectively is choosing a suitable representation
• Transmission and storage issues are critical in IoT scenarios
• Providing interpretable results for humans is important
Time series sparse representation - envelope
6. Time series data type
1. Symbolic sequence
2. Complex symbolic sequence
3. Simple time series
4. Multivariate time series
A brief survey on sequence classification. Z Xing, J Pei, E Keogh - ACM SIGKDD, 2010
7. Classification of time series
• Assigning instances to one of the predefined classes.
[Figure: four example time series, one per class (Class 1, Class 2, Class 3, Class 4); time axis 0–200, amplitude −200 to 200]
8. Time series classification approaches
• Feature based
• Sequence distance based
• Model based
A brief survey on sequence classification Z Xing, J Pei, E Keogh - ACM SIGKDD, 2010
10. Conventional approach
• The number of samples needed for compressed sensing is far lower than the Nyquist rate requires.
Image: http://www.ni.com/
11. Main idea
• Most real-world signals are sparse in some basis
A𝑥 = 𝑦, A ∈ ℝ^(𝑝×𝑛) and 𝑝 ≪ 𝑛
• Dramatically reduces the transmission load
(𝑦 is the measurement)
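The measurement step above can be sketched in a few lines of NumPy. The sizes n, p, k and the 1/√p normalization are illustrative choices for this sketch, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 1000, 100, 5                  # ambient dimension, measurements, sparsity

x = np.zeros(n)                          # a k-sparse signal in the standard basis
support = rng.choice(n, size=k, replace=False)
x[support] = rng.standard_normal(k)

A = rng.standard_normal((p, n)) / np.sqrt(p)   # random Gaussian measurement matrix
y = A @ x                                # transmit p values instead of n

print(f"original length: {n}, transmitted length: {len(y)}")
```

Only the p-dimensional vector y needs to be transmitted or stored, which is where the reduction in load comes from.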
12. Requirements of compressed sensing
1. 𝑥 should be a 𝑘-sparse signal
– gives a one-to-one relation between the data and compressed domains
2. A must satisfy the restricted isometry property
(1 − δₚ)‖𝑥‖₂² ≤ ‖A𝑥‖₂² ≤ (1 + δₚ)‖𝑥‖₂², for some constant δₚ ∈ (0, 1)
A = randn(𝑝, 𝑛) / 𝑛 (mean = 0, σ = 1/𝑛)
Image: Mostafa Mohsenvand Projects
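The restricted isometry property can be sanity-checked empirically: for a Gaussian A, ‖A𝑥‖₂² concentrates around ‖𝑥‖₂² over many sparse signals. This sketch uses entry variance 1/p, a common normalization that makes the expected ratio exactly 1; the slide's scaling differs only by a constant factor:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k, trials = 512, 128, 8, 200

A = rng.standard_normal((p, n)) / np.sqrt(p)   # entries ~ N(0, 1/p)

ratios = []
for _ in range(trials):
    x = np.zeros(n)
    x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
    ratios.append(np.sum((A @ x) ** 2) / np.sum(x ** 2))

# RIP with constant delta means every ratio lies in (1 - delta, 1 + delta)
delta = max(abs(r - 1) for r in ratios)
print(f"empirical delta over {trials} sparse signals: {delta:.3f}")
```

A small empirical delta over many random sparse signals is consistent with (though not a proof of) the RIP bound above.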
14. Learning in the compressed domain
• Perform the task without recovery
• SVM keeps its learnability in the compressed domain
• Reduces model complexity
Image: Compressed learning: Universal sparse dimensionality reduction and learning in the
measurement domain. R Calderbank, S Jafarpour, R Schapire - preprint, 2009 - dsp.rice.edu
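The "learn without recovery" idea can be sketched by training a linear SVM directly on the measurements 𝑦 = A𝑥. The data here is synthetic and the class construction, sizes, and scikit-learn classifier are illustrative choices, not the talk's actual setup:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
n, p, m = 200, 40, 300                 # signal length, measurements, instances

# Synthetic two-class data: each class is sparse on a different half of the support
X = np.zeros((m, n))
labels = rng.integers(0, 2, size=m)
for i, c in enumerate(labels):
    support = rng.choice(n // 2, size=5, replace=False) + c * (n // 2)
    X[i, support] = rng.standard_normal(5) + 3.0

A = rng.standard_normal((p, n)) / np.sqrt(p)   # Gaussian measurement matrix
Y = X @ A.T                                    # compress; never reconstruct x

clf = LinearSVC(random_state=0).fit(Y, labels)
print(f"training accuracy in the compressed domain: {clf.score(Y, labels):.2f}")
```

The classifier only ever sees the p-dimensional measurements, which is the Calderbank et al. result in miniature: a linear SVM in the measurement domain performs close to the best linear classifier in the data domain.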
16. The origin
• The ‘envelope’ in finance
Image: http://www.investopedia.com/
17. Preliminaries
• A time series
– T = (t₁, t₂, …, tⱼ, …, tₙ), tⱼ ∈ ℝ
• A time series dataset
– D = {Tⁱ | Tⁱ = (t₁ⁱ, t₂ⁱ, …, tₙⁱ), i = 1 to m}
• Well-synchronized, with the same length
– A set of random samples from random variables 𝐓₁, 𝐓₂, …, 𝐓ⱼ, …, 𝐓ₙ
18. Envelope creation
• Given D, the envelope with size k
– Eₖ = {Z | Z = (z₁, z₂, …, zₙ), |zⱼ − μⱼ| ≤ k·stdⱼ, ∀ zⱼ ∈ ℝ}
• μⱼ = mean(𝐓ⱼ), stdⱼ = std(𝐓ⱼ)
– Profiles the time series dataset
19. Envelope encoding
• Encode a time series T as a sparse series S
• Sparsity indicates the similarity between a time series and D
sⱼ = +1, if tⱼ > μⱼ + k·stdⱼ
sⱼ = −1, if tⱼ < μⱼ − k·stdⱼ
sⱼ = 0, otherwise
for j = 1 to n
20. Guarantee of sparsity
• By Chebyshev's inequality,
– Pr(|X − μ| ≤ kσ) ≥ 1 − 1/k²
– holds regardless of the distribution of 𝐓ⱼ
– so each sⱼ is 0 with probability at least 1 − 1/k², which guarantees sparsity
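The distribution-free bound can be sanity-checked on a deliberately non-Gaussian distribution. The exponential distribution and the sample size here are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
k = 3
samples = rng.exponential(scale=1.0, size=100_000)   # skewed, non-Gaussian
mu, sigma = samples.mean(), samples.std()

# Fraction of points that would be encoded as +-1 (outside the k*sigma band)
outside = np.mean(np.abs(samples - mu) > k * sigma)
print(f"outside envelope: {outside:.4f}  (Chebyshev bound: {1 / k**2:.4f})")
```

The observed fraction is well under the 1/k² bound, as Chebyshev guarantees for any distribution with finite variance.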
23. Determination of 𝑘 (cont’d)
• Focus on time series multi-class classification
– The envelope representation should be discriminative
k* = arg maxₖ (−aₖ + λ·bₖ)
– k trades off sparsity against distinguishability
24. Encoding result visualization
• Sparsity indicates similarity
• Sparse property ⟹ transmission efficiency
• Encoding results are interpretable
(ECGFiveDays from UCR)
27. Outline
• Introduction
• Compressed sensing
• Sparse representation - envelope
• Classification framework
• Experimental results
• Case study
• Conclusion
1. Proposed method vs. state-of-the-art methods on the classification task
2. Compressibility of envelope representation
with compressed sensing
3. Noise resistance of envelope representation
4. Time efficiency
5. Space efficiency
28. Classification performance
• Benchmark dataset from UCR (5/42)
Dataset             #Classes   Training set   Testing set   Series length
CBF                 3          30             900           128
Coffee              2          28             28            286
ECGFiveDays         2          23             861           136
ItalyPowerDemand    2          67             1029          24
Sony II             2          27             953           65
29. Classification performance (cont’d)
• Result on benchmark dataset (5/42)
– Win: 9 / lose: 18 / in between: 15 (close: 12)
– The envelope may lose when training instances are scarce; not the case in IoT scenarios, where data is never lacking
Dataset             1NN-Euclidean    1NN-DTW (best/noWin)   Envelope + linear SVM
CBF                 85.2 (0.9357)    99.6 / 99.7            90.66
Coffee              75 (0.031608)    82.1 / 82.1            85.71
ECGFiveDays         79.7 (0.8758)    79.7 / 76.8            88.38
ItalyPowerDemand    95.5 (1.0661)    95.5 / 95              97.08
Sony II             69.5 (0.9986)    69.5 / 72.5            82.79
30. Influence of compression ratio
• Compression ratio = p/n = (number of measurements) / (data dimension)
31. Influence of compression ratio (cont’d)
• Using nearly 1/3 of the datasets from UCR
– Some datasets have excellent compressibility
32. Influence of compression ratio (cont’d)
• Results on benchmark datasets (5/42)
Dataset             1NN-Euclidean   1NN-DTW (best/noWin)   Ratio=10%   Ratio=20%   Ratio=50%
CBF                 85.2            99.6 / 99.7            80.44       88.22       88.44
Coffee              75              82.1 / 82.1            71.42       82.14       89.28
ECGFiveDays         79.7            79.7 / 76.8            78.86       81.3        81.64
ItalyPowerDemand    95.5            95.5 / 95              86.58       91.73       93.97
Sony II             69.5            69.5 / 72.5            76.91       78.38       80.06
34. Robustness to noise (cont’d)
• Using ECG200 dataset as example
(Left: the original envelope. Right: the envelope with noise at SNR = 10.)
35. Robustness to noise (cont’d)
• The envelope representation is noise-resistant
– The denoising stage can even be skipped
(Left: envelope built / SVM trained with clean data. Right: envelope built / SVM trained with noisy data.)
36. Time efficiency
1. Building the envelope takes O(m·n)
2. Encoding each instance takes O(n)
3. Training the linear SVM is expected to take O(m²)
Prediction runs in linear time
[Figure: scatter of testing-stage execution time (sec.), envelope vs. KNN+ED]
37. Space efficiency
1. A 32 : (2 · #classes) reduction ratio
2. A 32 : (32 · #classes · compression ratio) reduction ratio through compressed sensing
3. Run-length encoding for a further concise format
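Run-length encoding pays off here because the ternary code is mostly zeros, so long zero runs collapse into single (value, length) pairs. A minimal sketch (`rle` is an illustrative name):

```python
from itertools import groupby

def rle(seq):
    """Run-length encode a sequence into [(value, run_length), ...]."""
    return [(v, len(list(g))) for v, g in groupby(seq)]

s = [0, 0, 0, 1, 1, 0, 0, 0, 0, -1, 0, 0]   # a toy envelope encoding
print(rle(s))   # [(0, 3), (1, 2), (0, 4), (-1, 1), (0, 2)]
```

The sparser the encoding, the fewer and longer the zero runs, and the greater the space saving.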
47. Conclusion
• Proposed a sparse representation for time series
• Proposed a heuristic to determine the envelope size k
• Verified effectiveness, efficiency, and robustness
• Demonstrated real-world use cases
Editor's Notes
Time series classification
(dis)advantage of CS
Time series representation method
Then experiments
Last,
Data is collected continuously as time-correlated series; communication & storage are the issues
Reduce dimension & recover with little loss
Extract information from the time series for the ML model
www.aeris.com
www.comp-engine.org
www.ceremade.dauphine.fr
In order to ~~~ we propose envelope…
alphabet of symbols e.g. DNA
complex symbolic sequence e.g. transaction
simple time series e.g. electric meter
multivariate time series e.g. EEG
www.bios.net
www.apps.rus.mto.gov.on.ca
www.rowetel.com
www.dianliwenmi.com
Assign new instance to certain class based on given data
Using the example from case study
model each part as one state;
the mean of the state is the mean estimated from that part
For a gesture recognizer we build several of these models, one for each gesture, and use a training set to estimate the model parameters. During recognition we simply pick the model that describes the data best.
The goal of compressed sensing is to provide a measurement matrix A with the number of measurements p as small as possible
p is the number of samples, which is ≪ the Nyquist rate
A normal random matrix A generated with specific parameters is usually good enough for most real-world applications.
Most effort is spent in the recovery stage (which we discard)
The error of an SVM in the measurement domain is, with high probability, close to the error of the best linear classifier in the data domain
The idea of ‘envelope’ has been applied in finance for a long time
used by investors and traders to help identify extreme overbought and oversold conditions (the two bands are based on the opening and closing prices, respectively)
a vector in temporal order
D is well-synchronized with the same length
regarded as random samples from a set of random variables
A set of values covered mu+- k*std.
Envelope is the profile of the dataset
possibility of applying CS for further boost
k is critical: it directly affects distinguishability and sparsity
Propose a heuristic to make the envelope distinguishable
Too large or too small a k hurts distinguishability
Options for concise format
libSVM with one-vs-one (shorter training time)
few datasets from UCR
comparing the performance of the proposed method with the state of the art
The envelope may perform worse when training instances are scarce
It is possible to reduce the size to about 10–20% and still keep the classification performance, which is very promising.
Still keep good performance after compression
SNR terms: power of the signal, power of the noise, amplitude of the signal, amplitude of the noise
Robustness of proposed method to noise
The proposed method is noise-resistant.
Faster than KNN+ED
Space-saving
Using multiple devices with weak models
integrate them to get better performance
Using BLE for transmission
identify users from door opening trajectories.
treat each step as a time series instance
the proposed method is also suitable for distinct cases.
Integrate results of multi-steps to get better performance
Door and slippers make distinct predictions
Demo
Intro. of TS classification, Pros & cons of CS
Supervised feature extraction technique
Heuristic
Benchmark, noise, compression
Real-world cases