Shapelet Poster Nurjahan Begum
Scalable Discovery of Multi Dimensional Time Series Shapelets
Nurjahan Begum¹, Jianhui Chen², Makoto Yamada² and Eamonn Keogh¹
¹University of California, Riverside  ²Yahoo Labs
Why Are Shapelets Important?
Introduction
Admissible Pruning – Entropy Pruning
State of the art algorithms
Multi Dimensional Shapelets
Time series shapelets are small, local patterns in a time series that are
highly predictive of a class and are thus very useful features for building
classifiers and for certain visualization and summarization tasks.
[Figure: example time series for two classes, Horned Lizards and Turtles; x-axis 0 to 1400]
Exact Algorithm [1]
[Figure: orderline from 0 to ∞ showing a candidate shapelet and its split point]
Locked Shapelet
[Figure panels: Euclidean Distance; Early Abandoned Euclidean Distance]
Admissible Pruning – Early Abandon
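Early abandoning, as it is commonly described for Euclidean distance, stops accumulating squared differences as soon as the running sum can no longer beat the best distance found so far. The following is a minimal sketch under that assumption; the function names `early_abandon_euclidean` and `subsequence_distance` are illustrative and do not come from the poster.

```python
import math

def early_abandon_euclidean(a, b, best_so_far):
    """Euclidean distance between equal-length sequences a and b.

    Returns math.inf as soon as the partial sum of squared differences
    reaches best_so_far squared, since the full distance cannot be smaller.
    """
    limit = best_so_far ** 2
    acc = 0.0
    for x, y in zip(a, b):
        acc += (x - y) ** 2
        if acc >= limit:
            return math.inf  # abandoned: cannot improve on best_so_far
    return math.sqrt(acc)

def subsequence_distance(series, shapelet):
    """Smallest distance between the shapelet and any window of the series."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        d = early_abandon_euclidean(window, shapelet, best)
        best = min(best, d)
    return best
```

The result is the same as with the plain Euclidean distance; only the amount of arithmetic per window changes.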
[Figure: orderline for the best-so-far (bsf) shapelet candidate, with its optimal split point]
Question: Of the 30,240 distinct ways the remaining five distances could be added to this line, could any of them result in an information gain that is better than the best so far?
Answer: This question has a constant-time answer!
Trick: consider the most optimistic scenarios and test them.
[Figure: optimistically assumed placements]
Information gain of the better of the two possibilities: [value not recoverable from the figure]
50% of the distance calculations are pruned!
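The optimistic test described above could be sketched as follows, assuming a two-class problem: the objects not yet measured are placed at the two extreme ends of the orderline (one class at 0, the other at ∞, then the reverse), and the better of the two resulting information gains is compared with the best so far. The helper names (`best_information_gain`, `optimistic_gain`, `can_prune`) are illustrative, not taken from the poster.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (in bits) of a Counter of class counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)

def best_information_gain(orderline):
    """Best gain over all split points of a list of (distance, label) pairs."""
    pts = sorted(orderline, key=lambda p: p[0])
    total = Counter(label for _, label in pts)
    n = len(pts)
    base = entropy(total)
    left = Counter()
    best = 0.0
    for i in range(n - 1):
        left[pts[i][1]] += 1
        if pts[i][0] == pts[i + 1][0]:
            continue  # no split point between equal distances
        right = total - left
        k = i + 1
        gain = base - (k / n) * entropy(left) - ((n - k) / n) * entropy(right)
        best = max(best, gain)
    return best

def optimistic_gain(partial_orderline, remaining):
    """Most optimistic gain once the unmeasured objects are added.

    remaining maps each of the two class labels to the number of objects
    whose distance has not been computed yet (assumption: exactly two
    classes, and measured distances are strictly positive).
    """
    a, b = sorted(remaining)
    best = 0.0
    for near, far in ((a, b), (b, a)):
        trial = list(partial_orderline)
        trial += [(0.0, near)] * remaining[near]      # one class at the 0 end
        trial += [(math.inf, far)] * remaining[far]   # the other at the far end
        best = max(best, best_information_gain(trial))
    return best

def can_prune(partial_orderline, remaining, best_so_far_gain):
    """True if even the optimistic placement cannot beat the best so far."""
    return optimistic_gain(partial_orderline, remaining) <= best_so_far_gain
```

When `can_prune` returns true, the remaining distance computations for that candidate can be skipped without changing which shapelet is finally selected.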
[Figure: motion capture time series from the Right Wrist, Right Side of Pelvis and Right Ankle for Subject A and Subject B; x-axis 0 to 100]
Why is the discovery of multi dimensional
shapelets expensive?
Brute force algorithm complexity
The number of candidate shapelets is high
For the lagged shapelet problem, the candidate set is even larger
Large number of distance computations
Each distance computation now takes more time because the problem setting is multi-dimensional
[Figure: hashtag time series for the event IranRevolution (#newiran, #Montazeri, #Neda; 16 June, 2009 and 20 June, 2009); x-axis: hours, 0 to 300]
[Figure: hashtag time series for #Taylor, #Kanye, #taylorswift and #kanyewest; x-axis 0 to 140]
Overall time complexity: O(N^2 m^4)
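To see where a bound of this shape can come from, the sketch below counts the work of a brute force search under stated assumptions: N is the number of time series, m is their common length, every subsequence of every series is a candidate, and each candidate is compared against every window of every series without any pruning. These readings of N and m, and the function names, are assumptions; the poster does not define them.

```python
def count_candidates(num_series, length, min_len=1):
    """Number of candidate subsequences: grows as N * m^2."""
    return sum(num_series * (length - l + 1)
               for l in range(min_len, length + 1))

def brute_force_point_ops(num_series, length, min_len=1):
    """Point-wise operations for an unpruned search: grows as N^2 * m^4."""
    ops = 0
    for l in range(min_len, length + 1):
        candidates = num_series * (length - l + 1)
        windows_per_series = length - l + 1
        # each candidate is slid over every series; each window costs l operations
        ops += candidates * num_series * windows_per_series * l
    return ops
```

Doubling the series length in this model multiplies the operation count by roughly sixteen, which is why pruning matters even for a single dimension.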
Lagged Shapelet
Our Observations
[Figure: histogram of information gain for Lag = 0; x-axis: Information Gain, 0 to 0.8; y-axis: frequency]
[Figure: information gain per shapelet for Lag = 0, Lag = 1 and Lag = 2; x-axis: shapelets, 0 to 25; y-axis: Information Gain, 0 to 1]
The number of useful shapelet candidates is small
It is enough to improve only the useful shapelet candidates
References:
[1] Ye, L., et al. Time Series Shapelets: A New Primitive for Data Mining. KDD, 2009.
Algorithm Outline
Phase 1:
Use a Logistic Regression type model to identify useful candidate shapelets
Phase 2:
Consider only the useful candidate shapelets for the lagged scenario, with some
fixed lag parameter (relaxed assumption)
Phase 3:
Estimate the optimal lag width, either heuristically or theoretically.
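One possible reading of Phase 1 is sketched below: each candidate shapelet contributes one feature (the distance from each time series to that candidate), a logistic regression is fitted on the standardized features, and candidates are ranked by the absolute value of their weight. This is an illustrative assumption, not the authors' stated model; the poster says only "Logistic Regression type model", and the names `standardize`, `fit_logistic` and `rank_candidates` are hypothetical.

```python
import math

def standardize(columns):
    """Scale each feature column to zero mean and unit variance."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        std = math.sqrt(var) or 1.0  # guard against constant columns
        out.append([(v - mean) / std for v in col])
    return out

def fit_logistic(rows, labels, lr=0.5, steps=500):
    """Plain batch gradient descent for binary logistic regression."""
    k = len(rows[0])
    n = len(rows)
    w, b = [0.0] * k, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(rows, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y
            grad_b += err
            for j in range(k):
                grad_w[j] += err * x[j]
        b -= lr * grad_b / n
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
    return w, b

def rank_candidates(distance_columns, labels):
    """Candidate indices ordered from most to least useful.

    distance_columns[j][i] is the distance from series i to candidate j;
    labels[i] is 0 or 1.
    """
    cols = standardize(distance_columns)
    rows = [list(r) for r in zip(*cols)]
    w, _ = fit_logistic(rows, labels)
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
```

Only the top-ranked candidates would then be passed on to the lagged search in Phase 2; how many to keep is left open here, as it is in the outline.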
Types of Multi Dimensional Shapelets