Genta Yoshimura, Atsunori Kanemura, Hideki Asoh,
"Enumerating Hub Motifs in Time Series Based on the Matrix Profile,"
5th Workshop on Mining and Learning from Time Series (MiLeTS’19).
MiLeTS'19: Enumerating Hub Motifs in Time Series Based on the Matrix Profile
1. Enumerating Hub Motifs in Time Series Based on the Matrix Profile
Genta Yoshimura (1, 2), Atsunori Kanemura (1, 3), Hideki Asoh (1)
1 National Institute of Advanced Industrial Science and Technology (AIST)
2 Mitsubishi Electric Corporation
3 LeapMind Inc.
5th Workshop on Mining and Learning from Time Series (MiLeTS’19)
Aug 5, 2019 - Anchorage, Alaska, USA
2. Outline
1. Introduction
• Motif Enumeration
• Problems in Existing Methods
2. Method
• Novel Motif Definition: Hub Motif
• Proposed Method: HubFinder
3. Experiments
• Synthetic Data
• Human Motion Data
4. Conclusion
• Summary
G. Yoshimura et al., Enumerating Hub Motifs in Time Series Based on the Matrix Profile, MiLeTS'19
4. Motif Enumeration from Time Series
Motif = a subsequence that occurs frequently in a time series
• Finding motifs is useful for many time series mining tasks
• Classification, forecasting, segmentation, anomaly detection, …
Motif Enumeration
• Enumerate multiple motifs in order of significance
rather than finding a single motif
• Most time series include multiple patterns
• In our problem setting, motif length W is a tunable parameter
• Not variable-length motifs, but fixed-length motifs
[Figure: a time series plotted over time, with occurrences of Motif 1 and Motif 2 marked, each of length W]
5. Existing Two Motif Definitions
The difference between the two definitions arises from
how we regard a subsequence as significant
1. Range motif
• A subsequence is significant
if there exist many subsequences
inside the sphere of radius R
2. Closest-pair motif
• A subsequence is significant
if the distance to its closest
subsequence is small
Note
• Z-normalized Euclidean Distance (ED) is used as subsequence distance
• Trivial matches are ignored when finding neighbor subsequences
[Figure: each point is a subsequence of length W. Range motif: neighbor counts n1=6 > n2=3 within radius R, so subsequence 1 is more significant. Closest-pair motif: d1=0.63 < d2=0.89, so subsequence 1 is more significant.]
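As a minimal sketch of the two significance measures on this slide, the Python below computes, for every length-W subsequence, the neighbor count n_i within radius R (range motif) and the distance d_i to the closest subsequence (closest-pair motif). The function names, the toy sine series, and the trivial-match rule (any subsequence that overlaps the query is skipped) are assumptions made for illustration, not taken from the slides.

```python
import numpy as np

def znorm(x):
    """Z-normalize a 1-D array (constant subsequences map to zeros)."""
    s = x.std()
    return (x - x.mean()) / s if s > 0 else np.zeros_like(x, dtype=float)

def distance_matrix(ts, w):
    """Pairwise z-normalized ED between all length-w subsequences."""
    subs = np.array([znorm(ts[i:i + w]) for i in range(len(ts) - w + 1)])
    diff = subs[:, None, :] - subs[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=2))
    # Assumed trivial-match rule: ignore every subsequence that overlaps
    # the query (start indices closer than w), including the query itself.
    idx = np.arange(len(subs))
    d[np.abs(idx[:, None] - idx[None, :]) < w] = np.inf
    return d

def range_counts(d, radius):
    """n_i: number of non-trivial neighbors inside the sphere of radius R."""
    return (d <= radius).sum(axis=1)

def closest_pair_distances(d):
    """d_i: distance from subsequence i to its closest non-trivial neighbor."""
    return d.min(axis=1)

# Toy data (hypothetical): a noisy sine with period 20.
rng = np.random.default_rng(0)
t = np.arange(200)
ts = np.sin(2 * np.pi * t / 20) + rng.normal(0, 0.05, size=t.size)

d = distance_matrix(ts, w=20)
n_i = range_counts(d, radius=1.0)
d_i = closest_pair_distances(d)
print("most significant (range motif):", int(n_i.argmax()))
print("most significant (closest-pair motif):", int(d_i.argmin()))
```

The brute-force distance matrix is quadratic in the series length, so this is only practical for short series.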
6. Existing Motif Enumeration Methods
Existing methods [Bagnall+14] require a radius parameter R
• Place spheres of radius R so as not to overlap each other
• Iteratively find the most significant subsequence as a motif
and remove the subsequences inside its sphere of radius R
1. SetFinder = Range-motif-based method
2. ScanMK = Closest-pair-motif-based method
[Figure: SetFinder repeatedly selects argmax_i n_i and ScanMK repeatedly selects argmin_i d_i; after each selection, the subsequences inside the sphere of radius R are removed.]
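The iterate-and-remove loop described on this slide can be sketched as follows. This is a simplified illustration of the idea, not the exact SetFinder or ScanMK procedures of [Bagnall+14]; the function name, the `mode` switch, and the 2-D toy points standing in for subsequences are all hypothetical.

```python
import numpy as np

def enumerate_by_radius(d, radius, k, mode="range"):
    """Greedy enumeration: pick the most significant item, then remove
    everything inside its sphere of radius R, and repeat up to k times."""
    alive = np.ones(d.shape[0], dtype=bool)
    motifs = []
    for _ in range(k):
        if not alive.any():
            break
        # Distances to already-removed items are ignored.
        sub = np.where(alive[None, :], d, np.inf)
        if mode == "range":              # range motif: argmax_i n_i
            score = (sub <= radius).sum(axis=1).astype(float)
            score[~alive] = -np.inf
            best = int(score.argmax())
        else:                            # closest-pair motif: argmin_i d_i
            score = sub.min(axis=1)
            score[~alive] = np.inf
            best = int(score.argmin())
        motifs.append(best)
        alive &= d[best] > radius        # remove the sphere's contents
        alive[best] = False              # and the motif itself
    return motifs

# Toy points (hypothetical): a cluster of 4 near (0, 0) and 3 near (5, 5).
points = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1],
                   [5, 5], [5.2, 5], [5, 5.2]], dtype=float)
d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2))
np.fill_diagonal(d, np.inf)

by_range = enumerate_by_radius(d, radius=1.0, k=2, mode="range")
by_pair = enumerate_by_radius(d, radius=1.0, k=2, mode="closest")
print(by_range, by_pair)
```

Both modes depend on R through the removal step, which is why the choice of R affects every motif after the first.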
7. Problems in Existing Methods
Existing methods suffer from the radius parameter R
1. It is not easy to tune R
• Appropriate parameter R changes
in accordance with the target dataset
• We cannot even know which R is appropriate
in most real applications where no ground truth is available
2. There are cases where the existing methods fail to
enumerate motifs successfully no matter how finely R is tuned
• Such cases are easy to construct and actually occur in real datasets
A novel motif enumeration method is necessary
[Figure: spheres of radius R placed on example data; R can be too small or too large, and the appropriate value is unknown]
9. Novel Motif Definition: Hub Motif
To eliminate the radius parameter R
1. Range motif
• A subsequence is significant
if there exist many subsequences
within the sphere of radius R
2. Closest-pair motif
• A subsequence is significant
if the distance to its closest
subsequence is small
3. Hub motif
• A subsequence is significant
if the sum of distances from
other subsequences is small
• Looks like a wheel hub
[Figure: the three definitions side by side. Range motif: n1=6, n2=3 within radius R. Closest-pair motif: d1=0.63, d2=0.89. Hub motif: Σk dik=5.12, Σk djk=7.36, so the subsequence with the smaller sum is more significant.]
10. Proposed Method: HubFinder
HubFinder does not require the radius parameter R
• Motif length: W
• Number of motifs: K
HubFinder consists of two steps
1. Extract candidates for motifs using the matrix profile
2. Refine candidates into K motifs according to the hub motif significance
[Figure: pipeline from the time series to candidates of length W (1. Extract), and from the candidates to K motifs (2. Refine)]
11. Step 1. Extract candidates for motifs
• Time series: X
• i-th subsequence: the length-W subsequence of X starting at position i
• Matrix profile: P
• P_i is the z-normalized Euclidean Distance (ED)
between the i-th subsequence and its closest subsequence,
excluding its trivial matches
• Can be computed efficiently using the STOMP algorithm [Zhu+16]
• The i-th subsequence is a motif candidate if P_i is a local minimum of P
• Use a sliding window over P to detect local minima
• Extracted candidates are added to a candidate set
[Figure: time series X and its matrix profile P computed by STOMP; a position where P is a local minimum within the sliding window becomes a candidate]
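Step 1 can be sketched in Python as below. This is not STOMP: the matrix profile is computed by brute force for clarity. The function names, the exclusion of overlapping subsequences as trivial matches, and the choice of a window of length W centered on each position are assumptions for illustration; the window length on the original slide is not recoverable here.

```python
import numpy as np

def znorm(x):
    """Z-normalize a 1-D array (constant subsequences map to zeros)."""
    s = x.std()
    return (x - x.mean()) / s if s > 0 else np.zeros_like(x, dtype=float)

def naive_matrix_profile(ts, w):
    """Brute-force matrix profile: distance from each subsequence to its
    closest non-trivial match (a slow stand-in for STOMP)."""
    subs = np.array([znorm(ts[i:i + w]) for i in range(len(ts) - w + 1)])
    n = len(subs)
    profile = np.full(n, np.inf)
    for i in range(n):
        dist = np.sqrt(((subs - subs[i]) ** 2).sum(axis=1))
        # Assumed trivial-match rule: skip subsequences overlapping i.
        dist[max(0, i - w + 1):min(n, i + w)] = np.inf
        profile[i] = dist.min()
    return profile

def local_minima(profile, window):
    """Indices whose profile value is the minimum of a centered window."""
    half = window // 2
    candidates = []
    for i, p in enumerate(profile):
        lo, hi = max(0, i - half), min(len(profile), i + half + 1)
        if np.isfinite(p) and p == profile[lo:hi].min():
            candidates.append(i)
    return candidates

# Toy data (hypothetical): noise with one pattern planted at 50 and 200.
rng = np.random.default_rng(0)
w = 32
ts = rng.normal(size=300)
pattern = np.sin(np.linspace(0, 2 * np.pi, w))
ts[50:50 + w] = pattern
ts[200:200 + w] = pattern

profile = naive_matrix_profile(ts, w)
candidates = local_minima(profile, window=w)
print(candidates)
```

The planted positions appear among the candidates; other local minima of the profile over the noise are candidates too, which is why a refinement step follows.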
12. Step 2. Refine candidates into K motifs
• Refine the candidate set into a motif set
• Cost function based on the hub motif definition
• Find the motif set that minimizes the cost function in a greedy manner
• New candidates are added to the motif set one by one
• If the motif set then exceeds K elements, the least significant candidate is removed
[Figure: the candidate set and the motif set]
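The greedy add-then-prune loop of Step 2 can be sketched as follows. The cost function itself is not recoverable from this transcript, so the sketch assumes a k-medoids-style cost (the sum, over all candidates, of the distance to the nearest motif). That cost, the function names, and the toy data are assumptions for illustration and may differ from the authors' definition.

```python
import numpy as np

def assumed_cost(d, motifs):
    """ASSUMED cost: each candidate's distance to its nearest motif, summed."""
    return d[:, motifs].min(axis=1).sum()

def refine(d, k, cost=assumed_cost):
    """Add candidates one by one; whenever the motif set exceeds k,
    drop the member whose removal leaves the lowest cost."""
    motifs = []
    for c in range(d.shape[0]):
        motifs.append(c)
        if len(motifs) > k:
            drop = min(
                motifs,
                key=lambda m: cost(d, [x for x in motifs if x != m]),
            )
            motifs.remove(drop)
    return motifs

# Toy candidates (hypothetical): two groups on a line.
positions = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
d = np.abs(positions[:, None] - positions[None, :])

motifs = refine(d, k=2)
print(sorted(motifs))
```

In this toy run the two surviving motifs come from different groups, since keeping two motifs from the same group leaves the other group far from any motif.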
14. Synthetic Data
Two motifs of length W=32 are arranged alternately
• Motif-1: z-normalized triangular wave + Gaussian noise
• Motif-2: z-normalized sine wave + Gaussian noise
[Figure: the synthetic time series; the alternating pattern repeats 50 times (T=9616)]
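A series of this kind can be generated with the sketch below. The motif shapes follow the slide (z-normalized triangular and sine waves of length W=32 plus Gaussian noise, alternating 50 times); the noise level, the Gaussian filler between motifs, and the gap length are assumptions, so the resulting length (9600) does not reproduce T=9616.

```python
import numpy as np

def znorm(x):
    """Z-normalize a 1-D array."""
    return (x - x.mean()) / x.std()

def make_series(w=32, repeats=50, gap=64, noise=0.1, seed=0):
    """Alternate two noisy motifs, separated by filler segments.
    Returns the series and a list of (start index, motif label)."""
    rng = np.random.default_rng(seed)
    phase = np.linspace(0, 1, w, endpoint=False)
    triangle = znorm(1 - 2 * np.abs(2 * phase - 1))   # Motif-1 shape
    sine = znorm(np.sin(2 * np.pi * phase))           # Motif-2 shape
    pieces, labels, pos = [], [], 0
    for _ in range(repeats):
        for label, shape in ((1, triangle), (2, sine)):
            pieces.append(rng.normal(0, 1, gap))      # assumed filler
            pos += gap
            pieces.append(shape + rng.normal(0, noise, w))
            labels.append((pos, label))
            pos += w
    return np.concatenate(pieces), labels

ts, labels = make_series()
print(len(ts), labels[:4])
```

The returned labels give the ground truth positions needed to score an enumeration result.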
15. Synthetic Data (Result)
Apply ScanMK, SetFinder, and HubFinder with W=32
• HubFinder succeeds in finding alternate motifs perfectly without tuning R
• Existing methods fail no matter how finely you tune R
• Existing methods are sensitive to R
[Figure: purity (the larger, the better) versus radius R. Labeled values: 58.5% (R=0.96), 69.0% (R=0.86), and 100% (constant). Insets: the extracted 1st and 2nd motifs and their neighbors.]
16. Human Motion Data
MotionSense Dataset [Malekzadeh+18]
• Collected with an iPhone 6s kept in the participant's front pocket
• Includes 3D accelerometer time series of human motion
• Total of 24 participants performed several activities
• 4 activities were chosen for this study
[Figure: 3D accelerometer signals (x, y, z axes) for the four activities: Downstairs, Upstairs, Walking, and Jogging]
17. Human Motion Data (Result)
Apply ScanMK, SetFinder, and HubFinder with W=64
• Positions of the top-4 motifs and their neighbors for participant #23
• Existing methods fail to find a motif in the Downstairs activity
• HubFinder successfully finds motifs in all 4 activities
[Figure: positions of the extracted motifs and neighbors across the Downstairs, Upstairs, Walking, and Jogging segments, for ScanMK, SetFinder, and HubFinder]
19. Summary
• Problems in existing motif enumeration methods caused by R
• Novel hub motif definition and HubFinder algorithm
• HubFinder succeeds in finding appropriate motifs without tuning R
• Existing methods fail no matter how finely R is tuned
Future Work
• Remove the motif length parameter W
(Extend to variable-length motifs)
• Utilize extracted motifs for other time series mining tasks
such as classification, forecasting, segmentation, and anomaly detection
Python code is available at
https://github.com/intellygenta/HubFinder
Thank you for your attention!!
20. References
[Bagnall+14] A. Bagnall, J. Hills, and J. Lines,
“Finding Motif Sets in Time Series,”
arXiv:1407.3685 (2014).
[Zhu+16] Y. Zhu, Z. Zimmerman, N. S. Senobari, C. C. M. Yeh,
G. Funning, A. Mueen, P. Brisk, and E. Keogh,
“Matrix Profile II: Exploiting a Novel Algorithm and GPUs to
Break the One Hundred Million Barrier for Time Series Motifs and Joins,”
IEEE 16th International Conference on Data Mining (ICDM), 739–748 (2016).
[Malekzadeh+18] M. Malekzadeh, R. G. Clegg, A. Cavallaro, and H. Haddadi,
“Protecting sensory data against sensitive inferences,”
Workshop on Privacy by Design in Distributed Systems (2018).
21. Purity
• Motif enumeration is similar to the clustering task to some extent
• Find representative patterns within a dataset in an unsupervised manner
• We adopt purity as the evaluation metric for this study
• One of the most popular metrics for clustering
• Purity = (1/N) Σk maxj |ωk ∩ ψj|, where N is the number of ground truth motif occurrences
• Ground truth motif clusters: ψ1, ψ2, …
• Enumerated motif clusters: ω1, ω2, …
• E.g. Purity = (5 + 1) / (5 + 5) = 0.60
[Figure: the example behind the calculation. Ground truth labels: five ψ1 and five ψ2. Enumerated labels: six ω1 and one ω2.]
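The purity calculation can be checked with a short Python sketch. It uses the standard clustering purity with the number of ground truth occurrences as the denominator, which matches the (5 + 1) / (5 + 5) example. The exact assignment below is a hypothetical arrangement consistent with the counts on the slide (six items in ω1, one in ω2, three left unassigned); the function name and label strings are illustrative.

```python
from collections import Counter

def purity(truth, assigned):
    """truth[i]: ground truth label of occurrence i.
    assigned[i]: enumerated cluster of occurrence i, or None if it was
    not picked up by any enumerated motif."""
    clusters = {}
    for t, a in zip(truth, assigned):
        if a is not None:
            clusters.setdefault(a, Counter())[t] += 1
    # Each enumerated cluster contributes its majority ground truth count.
    majority_total = sum(max(c.values()) for c in clusters.values())
    return majority_total / len(truth)

truth = ["psi1"] * 5 + ["psi2"] * 5
assigned = ["omega1"] * 5 + ["omega1", "omega2", None, None, None]

score = purity(truth, assigned)
print(score)
```

Here ω1 contributes 5 (its five ψ1 members outnumber its single ψ2 member) and ω2 contributes 1, over 10 ground truth occurrences.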
22. Time Complexity
• Running times on the synthetic time series
• HubFinder is faster than the existing methods for long time series
• HubFinder does not need multiple trials for tuning R
23. Dependency on the Number of Motifs K
Dependency of the purity on the number of motifs K for the synthetic data
• Blue and orange lines: ScanMK and SetFinder for their optimal radius R
• Gray lines: Existing methods with non-optimal radii
• HubFinder outperforms existing methods for all K and R
24. Human Motion Data (Result for All Participants)
HubFinder outperforms existing methods in terms of purity metric
[Figure: per-participant comparison with axes "ScanMK/SetFinder Purity (with the best radius R)" and "HubFinder Purity (without tuning R)"]