MiLeTS'19: Enumerating Hub Motifs in Time Series Based on the Matrix Profile

Enumerating Hub Motifs in Time Series Based on the Matrix Profile
Genta Yoshimura (1, 2), Atsunori Kanemura (1, 3), Hideki Asoh (1)
(1) National Institute of Advanced Industrial Science and Technology (AIST)
(2) Mitsubishi Electric Corporation
(3) LeapMind Inc.
5th Workshop on Mining and Learning from Time Series (MiLeTS’19)
Aug 5, 2019 - Anchorage, Alaska, USA

Outline
1. Introduction
• Motif Enumeration
• Problems in Existing Methods
2. Method
• Novel Motif Definition: Hub Motif
• Proposed Method: HubFinder
3. Experiments
• Synthetic Data
• Human Motion Data
4. Conclusion
• Summary

Motif Enumeration from Time Series
Motif = a subsequence that occurs frequently in time series
• Finding motifs is useful for many time series mining tasks
• Classification, forecasting, segmentation, anomaly detection, …
Motif Enumeration
• Enumerate multiple motifs in order of significance
rather than finding a single motif
• Most time series include multiple patterns
• In our problem setting, motif length W is a tunable parameter
• Not variable-length motifs, but fixed-length motifs
[Figure: a time series plotted over time, with the occurrences of Motif 1 and Motif 2 marked, each of length W]

Existing Two Motif Definitions
The difference between the two definitions arises from
how we regard a subsequence as significant
1. Range motif
• A subsequence is significant
if many subsequences lie
inside the sphere of radius R around it
2. Closest-pair motif
• A subsequence is significant
if the distance to its closest
subsequence is small
Note
• Z-normalized Euclidean Distance (ED) is used as the subsequence distance
• Trivial matches are ignored when finding neighbor subsequences
[Figure: subsequences of length W drawn as points. Range motif: n1=6 > n2=3 neighbors within radius R, so subsequence 1 is more significant. Closest-pair motif: d1=0.63 < d2=0.89, so subsequence 1 is more significant.]
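The two significance measures above can be sketched in a few lines of NumPy. This is a naive illustration, not the authors' code: the function names are hypothetical, and the trivial-match rule used here (start positions closer than W) is an assumption, since the slide does not state the exclusion zone.

```python
import numpy as np

def znorm(x):
    """Z-normalize a 1-D array (zero mean, unit variance)."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x)

def subsequence_distances(series, w):
    """Pairwise z-normalized ED between all length-w subsequences.

    Naive O(n^2 * w) time and memory, for small examples only.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - w + 1
    subs = np.array([znorm(series[i:i + w]) for i in range(n)])
    diff = subs[:, None, :] - subs[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # ASSUMPTION: a trivial match is any subsequence starting fewer than
    # w positions away; such pairs are ignored by setting them to infinity.
    idx = np.arange(n)
    dist[np.abs(idx[:, None] - idx[None, :]) < w] = np.inf
    return dist

def range_counts(dist, radius):
    """n_i: number of non-trivial neighbors within the given radius."""
    return (dist <= radius).sum(axis=1)

def closest_distances(dist):
    """d_i: distance to the closest non-trivial neighbor."""
    return dist.min(axis=1)
```

Under the range definition the subsequence with the largest `range_counts` value is the most significant; under the closest-pair definition it is the one with the smallest `closest_distances` value.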

Existing Motif Enumeration Methods
Existing methods [Bagnall+14] require a radius parameter R
• Place spheres of radius R so that they do not overlap each other
• Iteratively find the most significant subsequence as a motif
and remove the subsequences inside its sphere of radius R
1. SetFinder = Range motif based method (selects argmax_i n_i)
2. ScanMK = Closest-pair motif based method (selects argmin_i d_i)
[Figure: the most significant subsequence is selected (argmax_i n_i for SetFinder, argmin_i d_i for ScanMK), the subsequences inside its sphere of radius R are removed, and the procedure repeats]
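The select-and-remove loop described above can be sketched as follows. This is a simplified illustration of the idea, not the SetFinder or ScanMK implementations from [Bagnall+14]; the function name and the `mode` argument are hypothetical, and `dist` is assumed to be a precomputed pairwise distance matrix with infinity on the diagonal and on trivial matches.

```python
import numpy as np

def enumerate_by_radius(dist, radius, k, mode="range"):
    """Greedy radius-based enumeration (simplified sketch).

    mode="range":   pick the subsequence with the most neighbors within radius.
    mode="closest": pick the subsequence with the smallest closest-neighbor distance.
    After each pick, every subsequence inside its sphere is removed.
    """
    dist = np.asarray(dist, dtype=float)
    alive = np.ones(dist.shape[0], dtype=bool)
    motifs = []
    for _ in range(k):
        if not alive.any():
            break
        # Distances to already removed subsequences are ignored.
        sub = np.where(alive[None, :], dist, np.inf)
        if mode == "range":
            score = (sub <= radius).sum(axis=1).astype(float)  # n_i
            score[~alive] = -np.inf
            i = int(np.argmax(score))
        else:
            score = sub.min(axis=1)  # d_i
            score[~alive] = np.inf
            i = int(np.argmin(score))
            if not np.isfinite(score[i]):
                break  # no remaining subsequence has a neighbor
        motifs.append(i)
        alive[dist[i] <= radius] = False  # remove the sphere around the pick
        alive[i] = False
    return motifs
```

Both modes depend on `radius` for the removal step, which is why the result changes with R.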

Problems in Existing Methods
Existing methods suffer from the radius parameter R
1. It is not easy to tune R
• Appropriate parameter R changes
in accordance with the target dataset
• We cannot even know which R is appropriate
in most real applications where no ground truth is available
2. There are cases where the existing methods fail to
enumerate motifs successfully no matter how finely R is tuned
• Such cases are easy to construct and actually occur in real datasets
A novel motif enumeration method is necessary
[Figure: spheres of radius R that are too small or too large for the data, leaving the appropriate R unknown]

Novel Motif Definition: Hub Motif
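To get rid of the radius parameter R, we add a third definition
1. Range motif
• A subsequence is significant
if many subsequences lie
within the sphere of radius R around it
2. Closest-pair motif
• A subsequence is significant
if the distance to its closest
subsequence is small
3. Hub motif
• A subsequence is significant
if the sum of distances from
other subsequences is small
• Looks like a wheel hub
[Figure: range motif, n1=6 > n2=3 within radius R; closest-pair motif, d1=0.63 < d2=0.89; hub motif, Σk dik=5.12 < Σk djk=7.36, so subsequence i is more significant than subsequence j]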
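As a minimal sketch of the hub significance, the score of each subsequence can be taken as the sum of its distances to the others, with smaller sums being more significant. The function name is hypothetical, and which subsequences enter the sum is an assumption here (every row of a finite, symmetric distance matrix).

```python
import numpy as np

def hub_scores(dist):
    """Sum of distances from every other subsequence (smaller = more hub-like).

    ASSUMPTION: `dist` is a finite pairwise distance matrix; the diagonal
    (distance to itself) is excluded from each sum.
    """
    dist = np.array(dist, dtype=float)  # copy so the input is not modified
    np.fill_diagonal(dist, 0.0)
    return dist.sum(axis=1)
```

No radius appears anywhere in this score, which is the point of the definition.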

Proposed Method: HubFinder
HubFinder does not require the radius parameter R; its parameters are
• Motif length: W
• Number of motifs: K
HubFinder consists of two steps
1. Extract candidates for motifs using the matrix profile
2. Refine candidates into K motifs according to the hub motif significance
[Figure: time series over time → 1. Extract → candidates of length W → 2. Refine → K motifs]

Step 1. Extract candidates for motifs
• Time series: X (length T)
• i-th subsequence: the length-W subsequence of X starting at position i
• Matrix profile: P
• The i-th entry of P is the z-normalized Euclidean Distance (ED)
between the i-th subsequence and its closest subsequence,
excluding its trivial matches
• Can be computed efficiently using the STOMP algorithm [Zhu+16]
• The i-th subsequence is a motif candidate if the i-th entry of P is a local minimum of P
• Use a sliding window to detect local minima
• Extracted candidates are added to a candidate set
[Figure: STOMP converts time series X into matrix profile P (closest-pair distances); an entry of P that is a local minimum within the sliding window marks its subsequence as a candidate]
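The extraction step can be sketched with a brute-force matrix profile followed by a sliding-window local-minimum test. This is not STOMP, which computes the same profile far more efficiently; the function names, the exclusion zone (start positions closer than W), and the window length passed to `local_minima` are assumptions made for illustration.

```python
import numpy as np

def naive_matrix_profile(series, w, exclusion=None):
    """Brute-force matrix profile under z-normalized ED (O(n^2 * w))."""
    series = np.asarray(series, dtype=float)
    n = len(series) - w + 1
    excl = w if exclusion is None else exclusion  # ASSUMED trivial-match zone
    subs = np.lib.stride_tricks.sliding_window_view(series, w)
    mu = subs.mean(axis=1, keepdims=True)
    sd = subs.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # avoid division by zero on flat subsequences
    z = (subs - mu) / sd
    idx = np.arange(n)
    profile = np.empty(n)
    for i in range(n):
        d = np.sqrt(((z - z[i]) ** 2).sum(axis=1))
        d[np.abs(idx - i) < excl] = np.inf  # ignore trivial matches
        profile[i] = d.min()
    return profile

def local_minima(profile, window):
    """Indices whose profile value is the minimum of a centered window."""
    half = window // 2
    candidates = []
    for i in range(len(profile)):
        lo, hi = max(0, i - half), min(len(profile), i + half + 1)
        if np.isfinite(profile[i]) and profile[i] == profile[lo:hi].min():
            candidates.append(i)
    return candidates
```

Ties inside a window yield several candidates in this sketch; a production version would need a tie-breaking rule.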

Step 2. Refine candidates into K motifs
• Refine the candidate set into a motif set
• Cost function based on the hub motif definition
• Find the motif set which minimizes the cost function in a greedy manner
• New candidates are added to the motif set one by one
• If the motif set then holds more than K motifs, the least significant one is removed
[Figure: contents of the candidate set and of the resulting motif set]

Synthetic Data
Two motifs of length W=32 are arranged alternately
• Motif-1: z-normalized triangular wave + Gaussian noise
• Motif-2: z-normalized sine wave + Gaussian noise
[Figure: the alternating pattern of the two motifs, repeated 50 times (T=9616)]
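A generator in the spirit of this setup might look like the sketch below. The noise level, the exact wave shapes, and the optional filler between motifs are assumptions; with the default `gap=0` the series has length 3200, so it does not reproduce the T=9616 series used in the experiment.

```python
import numpy as np

def znorm(x):
    """Z-normalize a 1-D array."""
    return (x - x.mean()) / x.std()

def make_synthetic(w=32, repeats=50, gap=0, noise=0.1, seed=0):
    """Alternate a noisy triangular wave (label 1) and a noisy sine wave (label 2).

    Returns the series and a list of (start position, label) pairs.
    ASSUMPTIONS: noise scale, wave phases, and Gaussian filler of length `gap`.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(w)
    triangle = znorm(1.0 - np.abs(2.0 * t / (w - 1) - 1.0))  # rises, then falls
    sine = znorm(np.sin(2.0 * np.pi * t / w))
    pieces, labels, pos = [], [], 0
    for _ in range(repeats):
        for label, shape in ((1, triangle), (2, sine)):
            pieces.append(shape + rng.normal(0.0, noise, w))
            labels.append((pos, label))
            pos += w
            if gap > 0:
                pieces.append(rng.normal(0.0, 1.0, gap))
                pos += gap
    return np.concatenate(pieces), labels
```

The returned labels give the ground-truth motif positions needed to score an enumeration result.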

Synthetic Data (Result)
Apply ScanMK, SetFinder, and HubFinder with W=32
• HubFinder finds the alternating motifs perfectly without tuning R
• Existing methods fail no matter how finely R is tuned
• Existing methods are sensitive to R
[Figure: purity (the larger, the better) versus radius. The existing methods peak at 58.5% (R=0.96) and 69.0% (R=0.86); HubFinder stays at 100% (constant). Side panels show the extracted 1st and 2nd motifs and their neighbors.]

Human Motion Data
MotionSense Dataset [Malekzadeh+18]
• Collected with an iPhone 6s kept in the participant's front pocket
• Includes 3D accelerometer time series of human motion
• A total of 24 participants performed several activities
• 4 activities were chosen for this study
[Figure: x, y, and z accelerometer series for the four chosen activities: Downstairs, Upstairs, Walking, Jogging]

Human Motion Data (Result)
Apply ScanMK, SetFinder, and HubFinder with W=64
• Positions of the top-4 motifs and their neighbors for participant #23
• Existing methods fail to find a motif in the Downstairs activity
• HubFinder successfully finds motifs in all 4 activities
[Figure: positions of the motifs and neighbors found by ScanMK, SetFinder, and HubFinder across the Downstairs, Upstairs, Walking, and Jogging segments]

Summary
• Problems in existing motif enumeration methods caused by R
• Novel hub motif definition and HubFinder algorithm
• HubFinder succeeds in finding appropriate motifs without tuning R
• Existing methods fail no matter how finely R is tuned
Future Work
• Remove the motif length parameter W
(Extend to variable-length motifs)
• Utilize extracted motifs for other time series mining tasks
such as classification, forecasting, segmentation, and anomaly detection
Python code is available at
https://github.com/intellygenta/HubFinder
Thank you for your attention!!

References
[Bagnall+14] A. Bagnall, J. Hills, and J. Lines, “Finding Motif Sets in Time Series,” arXiv:1407.3685 (2014).
[Zhu+16] Y. Zhu, Z. Zimmerman, N. S. Senobari, C. C. M. Yeh, G. Funning, A. Mueen, P. Brisk, and E. Keogh, “Matrix Profile II: Exploiting a Novel Algorithm and GPUs to Break the One Hundred Million Barrier for Time Series Motifs and Joins,” IEEE 16th International Conference on Data Mining (ICDM), 739–748 (2016).
[Malekzadeh+18] M. Malekzadeh, R. G. Clegg, A. Cavallaro, and H. Haddadi, “Protecting sensory data against sensitive inferences,” Workshop on Privacy by Design in Distributed Systems (2018).

Purity
• Motif enumeration is similar to the clustering task to some extent
• Find representative patterns within a dataset in an unsupervised manner
• We adopt purity as the evaluation metric for this study
• One of the most popular metrics for clustering
• Ground truth motif clusters: ψ1, ψ2, …
• Enumerated motif clusters: ω1, ω2, …
• E.g. Purity = (5 + 1) / (5 + 5) = 0.60
[Figure: ground truth labels (ψ1, ψ2) and enumerated labels (ω1, ω2) for each motif occurrence in the example]
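The worked example can be reproduced with a small function. The definition below is inferred from the slide's arithmetic and is an assumption: each enumerated cluster contributes its largest overlap with any ground-truth cluster, and the total is divided by the number of ground-truth occurrences. The cluster memberships are hypothetical, chosen only to match the numbers (5 + 1) / (5 + 5).

```python
def purity(enumerated, ground_truth):
    """Purity of enumerated clusters against ground-truth clusters.

    Both arguments map a cluster name to a set of occurrence ids.
    ASSUMPTION: the denominator is the total number of ground-truth occurrences.
    """
    total = sum(len(members) for members in ground_truth.values())
    matched = sum(
        max(len(members & truth) for truth in ground_truth.values())
        for members in enumerated.values()
    )
    return matched / total

# Hypothetical memberships matching the slide's arithmetic.
ground_truth = {"psi1": {0, 1, 2, 3, 4}, "psi2": {5, 6, 7, 8, 9}}
enumerated = {"omega1": {0, 1, 2, 3, 4, 5}, "omega2": {6}}
example_purity = purity(enumerated, ground_truth)
```

Here `omega1` overlaps `psi1` in 5 occurrences and `omega2` overlaps `psi2` in 1, giving 6 matched occurrences out of 10.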

Time Complexity
• Running times on the synthetic time series
• HubFinder is faster than the existing methods for long time series
• HubFinder does not need multiple trials for tuning R

Dependence on the Number of Motifs K
Dependence of purity on the number of motifs K for the synthetic data
• Blue and orange lines: ScanMK and SetFinder for their optimal radius R
• Gray lines: Existing methods with non-optimal radii
• HubFinder outperforms existing methods for all K and R

Human Motion Data (Result for All Participants)
HubFinder outperforms existing methods in terms of the purity metric
[Figure: HubFinder purity (without tuning R) plotted against ScanMK/SetFinder purity (with the best radius R)]
