SlideShare a Scribd company logo
1 of 40
Download to read offline
M.Tech Seminar Presentation
2014-15
Presented by
Jitesh Khandelwal
10211015
Guided by
Dr. Durga Toshniwal
IIT Roorkee
Finance
e.g. Stock prices Medical
e.g. Electro Cardio Grams
Marketing
e.g. Forecasting product/brand demands
Operations
e.g. Monitoring control infrastructure
at LHC
Social Networks
e.g. Like count on a profile picture
based on gender
Almost everything is a time series!
Value Prediction Pattern Identification
2.1, 9.3, 4.5, 3, 6.7, 4.0, 18.8, 9.2, 5.8, ?
2.1, 9.3, 4.5, 3, 6.7, 4.0, 18.8, 9.2, 5.8
Based on mathematical models Based on human perception
What’s next?
Classification
Anomaly DetectionMotif discovery
Clustering
[16]
[16] [16]
Source: www.google.com
Raw time series
data
Similarity model
selection
Dimensionality
reduction
Index
construction
Mathematical formulation
of human perception
of similarity
High dimensionality
makes distance
calculation slow
Enables efficient querying
of big and fast incoming
time series data
1 32
Raw time series
data
Similarity model
selection
Dimensionality
reduction
Index
construction
1 32
Symbolic
Representation
Text Processing
Algorithms
Double is 4 byte, Char is 1 byte. Hence, lower memory footprint.
Lp Norms
DTW distance
Longest common
subsequence
Landmark
Similarity
ℒ 𝑝 𝑥 − 𝑦 =
𝑖=1
𝐿
𝑥𝑖 − 𝑦𝑖
𝑝
1
𝑝 ℒ1
ℒ2
- Manhattan distance
- Euclidean distance
Invariant to amplitude scaling when used with z-score normalization.
Source: www.google.com
Lp Norms
DTW distance
Longest common
subsequence
Landmark
Similarity
𝐷 𝑖, 𝑗 = 𝑥𝑖 − 𝑦𝑗
+ 𝑚𝑖𝑛 𝐷 𝑖 − 1, 𝑗 , 𝐷 𝑖 − 1, 𝑗 − 1 , 𝐷 𝑖, 𝑗 − 1
DTW is Dynamic Time Warping.
Allows comparison of variable length time series.
Computationally Expensive. Can be optimized using warping
window techniques and early abandoning using lower bounds.
Source: www.google.com
Lp Norms
DTW distance
Longest common
subsequence
Landmark
Similarity
Applicable only to symbolic representations of time series.
Non-metric because it does not satisfy triangle inequality.
𝑆𝑖𝑚 𝑥, 𝑦 = 𝐿𝐶𝑆 𝑥, 𝑦
A distance measure D is a metric if it satisfies the following properties:
1. Symmetry: D(X, Y) = D(Y, X)
2. Triangle Inequality: D(X, Y) + D(Y, Z) <= D(X, Z)
Threshold parameter, matching criteria for 2 points from x and y.
Warping threshold, constraint on matching of points along the
time axis.
Works the same ways as human remember patterns.
Definition of landmarks vary with application domains.
E.g. local minima, local maxima, inflection point etc.
Uses MDPP (Minimum Distance/Percentage Principle)
technique to eliminate noisy landmarks.
𝑥𝑖+1 − 𝑥𝑖 < 𝐷
𝑦𝑖+1 − 𝑦𝑖
𝑦𝑖 − 𝑦𝑖+1 2
< 𝑃
𝑀𝐷𝑃 𝐷, 𝑃 removes landmarks at and if𝑥𝑖 𝑥𝑖+1
Lp Norms
DTW distance
Longest common
subsequence
Landmark
Similarity
𝐷𝑟𝑒𝑑𝑢𝑐𝑒𝑑 𝑠𝑝𝑎𝑐𝑒 𝐴, 𝐵 ≤ 𝐷𝑡𝑟𝑢𝑒 𝐴, 𝐵
False Alarms False Dismissals
Objects that appear close in index space are actually
distant.
Objects appear distant in index space but are actually
closer.
Removed in post-processing step. Unacceptable.
DFT
DWT
PAA
eAPCA
APCA
Discrete Fourier Transform
𝑋 𝑓 =
1
𝑛
𝑡=0
𝑛−1
𝑥𝑡 𝑒
−𝑗2𝜋𝑡𝑓
𝑛
1. Choose coefficients corresponding to a few low values of
frequencies.
2. Choose coefficients corresponding to frequencies with higher
values of coefficients.
Based on Parseval’s Relation, Euclidean
distance is preserved in the Frequency
domain.
DFT
DWT
PAA
eAPCA
APCA
Discrete Fourier Transform
[6]
DFT
DWT
PAA
eAPCA
APCA
Discrete Fourier Transform
[6]
DFT
DWT
PAA
eAPCA
APCA
Discrete Wavelet Transform
𝜓𝑗,𝑘
= 2
𝑗
2 𝜓 2𝑗
𝑡 − 𝑘
Used with Haar wavelets as the basis function. Applicable only for time
series with lengths which are a power of 2.
Lower bound is tighter than DFT.
DFT
DWT
PAA
eAPCA
APCA
Discrete Wavelet Transform
Using Haar Wavelet as the basis function.
[6]
DFT
DWT
PAA
eAPCA
APCA
Piecewise Aggregate Approximation
𝑥𝑖 =
𝑁
𝑛
𝑗=
𝑛
𝑁 𝑖−1 +1
𝑛
𝑁 𝑖
𝑥𝑗
Reconstruction quality and estimated distance in index space is same as DWT
with the Haar Wavelet. With no restriction on length.
[6]
DFT
DWT
PAA
eAPCA
APCA
Piecewise Aggregate Approximation
𝐷 𝑋, 𝑌 =
𝑛
𝑁 𝑖=1
𝑁
𝑥𝑖 − 𝑦𝑖
2
A lower bound on the Euclidean distance in the PAA space.
N = actual number of points
n = number of PAA segments
DFT
DWT
PAA
eAPCA
APCA
Adaptive Piecewise Constant Approximation
Data adaptive. Shorter segments for areas of high activity.
An extension of PAA.
𝑋 = < 𝑥1, 𝑟1 >, < 𝑥2, 𝑟2 > ⋯ < 𝑥 𝑛, 𝑟 𝑛 >
𝑥𝑖 = 𝑚𝑒𝑎𝑛(𝑥 𝑟𝑖−1+1 … 𝑥 𝑟𝑖
)
[7]
DFT
DWT
PAA
eAPCA
APCA
Adaptive Piecewise Constant Approximation
𝐷 𝐶, 𝑄 =
𝑖=1
𝑀
𝑐𝑟𝑖 − 𝑐𝑟𝑖−1 𝑞𝑥𝑖 − 𝑐𝑥𝑖
2
M = number of APCA segments
A lower bound on the Euclidean distance in the APCA space.
DFT
DWT
PAA
eAPCA
APCA
Extended APCA
𝑆 = 𝜇1, 𝜎1, 𝑟1 , … , 𝜇 𝑚, 𝜎 𝑚, 𝑟 𝑚
Also stores variance along with mean for the segments.
As a result, it gives both a lower and upper bound on the Euclidean distance.
Formulas are very ugly!
SAX
iSAX
SFA
Based on PAA.
Symbolic Aggregate Approximation
Static alphabet size.
“Desirable to have a
discretization technique that
produce symbols with equal
probability.”
Can leverage run length encoding compression.
Breakpoints
[9]
SAX
iSAX
SFA
Symbolic Aggregate Approximation
Supports a lower bound distance measure to Euclidean distance.
𝐷𝑆𝐴𝑋 ≤ 𝐷 𝑃𝐴𝐴 ≤ 𝐷𝑡𝑟𝑢𝑒
Can be calculated in a streaming fashion.
[9]
SAX
iSAX
SFA
Indexable SAX
a, b, c, d (SAX)
00, 01, 10, 11
(iSAX)
0 00, 01, 10, 11
1 00, 01, 10, 11
1 00, 01, 0 10, 11
1 00, 01, 1 10, 11
Fixed number of segments. Dynamic alphabet size.
iSAX Notation: iSAX(T, segment count, alphabet size)
e.g. iSAX(T, 4, 8)
SAX
iSAX
SFA
Indexable SAX
Comparison of iSAX words with different alphabet size.
iSAX(A, 4, 8) = { 110, 110, 011, 000 }
iSAX(B, 4, 2) = { 0 , 0 , 1 , 1 }
Replace 0 with either of
{ 0 00, 0 01, 0 10, 0 11 }
whichever is closest to 110.
Similarly for all segments.
{ 011, 011, 100, 100} != iSAX(B, 4, 8)
Just a lower bound estimate. We
cannot undo lossy compression.
SAX
iSAX
SFA
Symbolic Fourier Approximation
Uses MCB (multiple coefficient binning) discretization.
Based on DFT.
SAX - assumes a common distribution
for all the coefficients of the reduced
representation
MCB – histograms are built for all the
coefficients and then equi-width
binning is used.
Tighter lower
bound than iSAX
[12]
SAX
iSAX
SFA
Symbolic Fourier Approximation
Every SFA symbol has some global information since it is based on
DFT. Cannot be calculated in a streaming fashion.
Unlike iSAX, Fixed alphabet size. Dynamic segment count. Quality of
representation improves with segment count.
Pruning
Power
Tightness of
Lower bound
number of data points examined to answer the query
total number of data points in the database
Intuitively captures the measure of the quality of representation. Free from
implementation bias.
On a random walk dataset with query lengths 256 to 1024 and dimension of
representation 16 to 64 [7]
Lower is better.
Pruning
Power
Tightness of
Lower bound
lower bounding distance
true distance
Tightness of lower bounds for various time series representations on the Koski ECG
dataset [10]
Higher is better.
R/R* Trees
SFA Trie
DS Tree
ADS index
iSAX Tree
R trees are multi-dimensional index structures.
Encloses close objects in a MBR (Minimum Bounding Rectangle).
Individual objects are at the leaves and intermediate nodes are
MBRs enclosing other MBRs or the objects.
Used for indexing time series after dimensionality reduction using
DFT, PAA, APCA and other numeric representations.
Unlike R trees, In R* trees, there is no overlap between the
different MBRs due to which it also works for range queries rather
than only point queries.
R/R* Trees
SFA Trie
DS Tree
ADS index
iSAX Tree
Based on the dynamic alphabet size of iSAX representation.
Given the segment count, say d. The root node has 2^d children.
A Leaf node, when overflows, is converted to an intermediate node.
A segment is selected and its cardinality is increased to produce 2 child
leaf nodes that contain the iSAX representations of the time series.
iSAX 2.0 is an improvement over iSAX where the segment on which
split occurs is selected based on the distribution of time series so that
the splitting is balanced.
R/R* Trees
SFA Trie
DS Tree
ADS index
iSAX Tree
Based on the SFA representation.
Time series with common SFA prefix lie in common sub-tree.
SFA is computed for more number of Fourier coefficients. But not all
are used in the index. Hence, small index size.
Example:
SFA( T1 ) = abaacde | SFA( T2 ) = abbadef
SFA( T1 ) = abaacde | SFA( T2 ) = abbadef | SFA( T3 ) = abaagef
R/R* Trees
SFA Trie
DS Tree
ADS index
iSAX Tree
Based on the Extended APCA reduction method.
Intermediate nodes store
𝜇𝑖
𝑚𝑖𝑛
, 𝜇𝑖
𝑚𝑎𝑥
, 𝜎𝑖
𝑚𝑖𝑛
, 𝜎𝑖
𝑚𝑎𝑥
for all segments i = 1 to m.
It also stores the splitting strategy chose during splitting.
Dynamic Segmentation Tree
R/R* Trees
SFA Trie
DS Tree
ADS index
iSAX Tree
Dynamic Segmentation Tree
Splitting strategies are of 2 types: Horizontal and Vertical.
Horizontal: using mean and variance.
Vertical: using segment splitting.
Splitting strategy is chosen based on the value of a Quality Measure.
The one with maximum value is selected.
𝑄 =
𝑖=1
𝑚
𝑟𝑖 − 𝑟𝑖−1 𝜇𝑖
𝑚𝑎𝑥
− 𝜇𝑖
𝑚𝑖𝑛 2
+ 𝜎𝑖
𝑚𝑎𝑥2
𝑆𝑝𝑙𝑖𝑡𝑡𝑖𝑛𝑔 𝐵𝑒𝑛𝑒𝑓𝑖𝑡 = 𝑄 𝑝𝑎𝑟𝑒𝑛𝑡 − 𝑄𝑙 + 𝑄 𝑟 2
R/R* Trees
SFA Trie
DS Tree
ADS index
iSAX Tree
Dynamic Segmentation Tree
Apart from similarity search as supported by other indices, it also allows
distance histogram computation for a given query.
For e.g.
Given a query Q, a list L = [ ([10, 20], 10), ([15, 30], 15), ([40, 50], 2) ]
means that there are 3 leaf nodes: N1, N2 and N3. N1 includes 10 time
series, and their distance from Q is between [10, 20]. Similarly for N2
and N3.
This is due to the lower and upper bounds provided by eAPCA.
R/R* Trees
SFA Trie
DS Tree
ADS index
iSAX Tree
Adaptive data series index
Based in iSAX representation.
Delays the construction of leaf nodes to query time.
Also, leaf nodes contain only the iSAX representations and the actual
data series remain in the disk. Even during splits, only the iSAX
representations are shuffled.
Trade off:
Small leaf size require splits that costs disk IO time, whereas
Big leaf size leads to increased query time for linear scan.
So, ADS+ uses adaptive leaf size. A bigger build time leaf size and a
much smaller query time leaf size.
[1] Agrawal, R., Faloutsos, C., & Swami, A. (1993). “Efficient similarity search in sequence databases”.
Proceedings of the 4th Conference on Foundations of Data Organization and Algorithms.
[2] Antonin Guttman, (1984). “R-trees: a dynamic index structure for spatial searching”. Proceedings of the
1984 ACM SIGMOD international conference on Management of data.
[3] Yi, B.K., Faloutsos, C. (2000) “Fast Time Sequence Indexing for Arbitrary Lp-Norms”. Proceedings of the
26th International Conference on Very Large Data Bases.
[4] Keogh, E. (2002) “Exact Indexing of Dynamic Time Warping”. Proceedings of the 28th international
conference on Very Large Data Bases.
[5] Perng, C., Wang H., Zhang S. R., Parker, D.S. (2000). “Landmarks: A New Model for Similarity-Based Pattern
Querying in Time Series Databases”. Proceedings of the 16th International Conference on Data Engineering
[6] Keogh, E., Chakrabarti, K., Pazzani, M. & Mehrotra, S. (2000). “Dimensionality Reduction for Fast
Similarity Search in Large Time Series Databases”. Published in Journal Knowledge and Information Systems.
[7] Keogh, E., Chakrabarti, K., Mehrotra, S., Pazzani, M. (2002) “Locally Adaptive Dimensionality Reduction for
Indexing Large Time Series Databases”. Published in Journal ACM Transactions on Database Systems.
[8] Wang, Y., Wang, P., Pei, J., Wang, W., Huang, S. (2013) “A Data-adaptive and Dynamic Segmentation Index
for Whole Matching on Time Series”. Proceedings of the VLDB Endowment
[9] Lin, J., Keogh, E., Lonardi, S., Chiu, B. (2003). “A symbolic representation of time series, with implications
for streaming algorithms”. Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining
and knowledge discovery.
[10] Shieh, J., Keogh, E., (2008) “iSAX: Indexing and Mining Terabyte Sized Time Series”. Proceedings of the
14th ACM SIGKDD international conference on Knowledge discovery and data mining.
[11] Camerra, A., Palpanas, T., Shieh, J., Keogh, E. (2010) “iSAX 2.0: Indexing and Mining One Billion Time
Series”. Proceedings of the IEEE International Conference on Data Mining.
[12] Schäfer, P., Högqvist, M. (2012) “SFA: A Symbolic Fourier Approximation and Index for Similarity Search in
High Dimensional Datasets”. Proceedings of the 15th International Conference on Extending Database
Technology.
[13] Beckmann, N., Kriegel, H., Schneider, R., Seeger, B. (1990) “The R*-tree: an efficient and robust access
method for points and rectangles”, Proceedings of the ACM SIGMOD international conference on
Management of data.
[14] Zoumpatianos, K., Idreos, S., Palpanas, T. (2014) “Indexing for Interactive Exploration of Big Data Series”.
Proceedings of the ACM SIGMOD International Conference on Data Management.
[15] Wu, Y.L., Agrawal, D., Abbadi, A.E., (2000) “A comparison of DFT and DWT based Similarity Search in
Time-Series Databases”. Proceedings of the ninth international conference on Information and knowledge
management.
[16] Esling, P., Agon, C. (2012) “Time-Series Data Mining”. Published in Journal ACM Computing Surveys
(CSUR).
Pattern Mining in large time series databases

More Related Content

What's hot

Clustering on database systems rkm
Clustering on database systems rkmClustering on database systems rkm
Clustering on database systems rkmVahid Mirjalili
 
Dimensionality reduction
Dimensionality reductionDimensionality reduction
Dimensionality reductionShatakirti Er
 
Lect 3 background mathematics
Lect 3 background mathematicsLect 3 background mathematics
Lect 3 background mathematicshktripathy
 
K means and dbscan
K means and dbscanK means and dbscan
K means and dbscanYan Xu
 
Kernal based speaker specific feature extraction and its applications in iTau...
Kernal based speaker specific feature extraction and its applications in iTau...Kernal based speaker specific feature extraction and its applications in iTau...
Kernal based speaker specific feature extraction and its applications in iTau...TELKOMNIKA JOURNAL
 
Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering Yan Xu
 
Clustering, k-means clustering
Clustering, k-means clusteringClustering, k-means clustering
Clustering, k-means clusteringMegha Sharma
 
Statistical Clustering
Statistical ClusteringStatistical Clustering
Statistical Clusteringtim_hare
 
3.3 hierarchical methods
3.3 hierarchical methods3.3 hierarchical methods
3.3 hierarchical methodsKrish_ver2
 
"Principal Component Analysis - the original paper" presentation @ Papers We ...
"Principal Component Analysis - the original paper" presentation @ Papers We ..."Principal Component Analysis - the original paper" presentation @ Papers We ...
"Principal Component Analysis - the original paper" presentation @ Papers We ...Adrian Florea
 
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...Adam Fausett
 
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...Simplilearn
 
Further results on the joint time delay and frequency estimation without eige...
Further results on the joint time delay and frequency estimation without eige...Further results on the joint time delay and frequency estimation without eige...
Further results on the joint time delay and frequency estimation without eige...IJCNCJournal
 
Cure, Clustering Algorithm
Cure, Clustering AlgorithmCure, Clustering Algorithm
Cure, Clustering AlgorithmLino Possamai
 
Optics ordering points to identify the clustering structure
Optics ordering points to identify the clustering structureOptics ordering points to identify the clustering structure
Optics ordering points to identify the clustering structureRajesh Piryani
 

What's hot (20)

Clustering on database systems rkm
Clustering on database systems rkmClustering on database systems rkm
Clustering on database systems rkm
 
Dimensionality reduction
Dimensionality reductionDimensionality reduction
Dimensionality reduction
 
K mean-clustering
K mean-clusteringK mean-clustering
K mean-clustering
 
Lect 3 background mathematics
Lect 3 background mathematicsLect 3 background mathematics
Lect 3 background mathematics
 
K means and dbscan
K means and dbscanK means and dbscan
K means and dbscan
 
Kernal based speaker specific feature extraction and its applications in iTau...
Kernal based speaker specific feature extraction and its applications in iTau...Kernal based speaker specific feature extraction and its applications in iTau...
Kernal based speaker specific feature extraction and its applications in iTau...
 
Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering
 
Birch
BirchBirch
Birch
 
Clustering, k-means clustering
Clustering, k-means clusteringClustering, k-means clustering
Clustering, k-means clustering
 
Statistical Clustering
Statistical ClusteringStatistical Clustering
Statistical Clustering
 
Pca analysis
Pca analysisPca analysis
Pca analysis
 
3.3 hierarchical methods
3.3 hierarchical methods3.3 hierarchical methods
3.3 hierarchical methods
 
"Principal Component Analysis - the original paper" presentation @ Papers We ...
"Principal Component Analysis - the original paper" presentation @ Papers We ..."Principal Component Analysis - the original paper" presentation @ Papers We ...
"Principal Component Analysis - the original paper" presentation @ Papers We ...
 
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
 
Kmeans
KmeansKmeans
Kmeans
 
Cluster Analysis for Dummies
Cluster Analysis for DummiesCluster Analysis for Dummies
Cluster Analysis for Dummies
 
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
 
Further results on the joint time delay and frequency estimation without eige...
Further results on the joint time delay and frequency estimation without eige...Further results on the joint time delay and frequency estimation without eige...
Further results on the joint time delay and frequency estimation without eige...
 
Cure, Clustering Algorithm
Cure, Clustering AlgorithmCure, Clustering Algorithm
Cure, Clustering Algorithm
 
Optics ordering points to identify the clustering structure
Optics ordering points to identify the clustering structureOptics ordering points to identify the clustering structure
Optics ordering points to identify the clustering structure
 

Similar to Pattern Mining in large time series databases

Fixed Point Realization of Iterative LR-Aided Soft MIMO Decoding Algorithm
Fixed Point Realization of Iterative LR-Aided Soft MIMO Decoding AlgorithmFixed Point Realization of Iterative LR-Aided Soft MIMO Decoding Algorithm
Fixed Point Realization of Iterative LR-Aided Soft MIMO Decoding AlgorithmCSCJournals
 
Adaptive blind multiuser detection under impulsive noise using principal comp...
Adaptive blind multiuser detection under impulsive noise using principal comp...Adaptive blind multiuser detection under impulsive noise using principal comp...
Adaptive blind multiuser detection under impulsive noise using principal comp...csandit
 
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...csandit
 
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...cscpconf
 
Getting started with chemometric classification
Getting started with chemometric classificationGetting started with chemometric classification
Getting started with chemometric classificationAlex Henderson
 
Eigen Subspace based Direction of Arrival Estimation for Coherent Sources
Eigen Subspace based Direction of Arrival Estimation for Coherent SourcesEigen Subspace based Direction of Arrival Estimation for Coherent Sources
Eigen Subspace based Direction of Arrival Estimation for Coherent SourcesINFOGAIN PUBLICATION
 
Channel and clipping level estimation for ofdm in io t –based networks a review
Channel and clipping level estimation for ofdm in io t –based networks a reviewChannel and clipping level estimation for ofdm in io t –based networks a review
Channel and clipping level estimation for ofdm in io t –based networks a reviewIJARIIT
 
Sparsity based Joint Direction-of-Arrival and Offset Frequency Estimator
Sparsity based Joint Direction-of-Arrival and Offset Frequency EstimatorSparsity based Joint Direction-of-Arrival and Offset Frequency Estimator
Sparsity based Joint Direction-of-Arrival and Offset Frequency EstimatorJason Fernandes
 
Performance analysis of music and esprit doa estimation algorithms for adapti...
Performance analysis of music and esprit doa estimation algorithms for adapti...Performance analysis of music and esprit doa estimation algorithms for adapti...
Performance analysis of music and esprit doa estimation algorithms for adapti...marwaeng
 
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...EuroIoTa
 
Accurate indoor positioning system based on modify nearest point technique
Accurate indoor positioning system based on modify nearest point techniqueAccurate indoor positioning system based on modify nearest point technique
Accurate indoor positioning system based on modify nearest point techniqueIJECEIAES
 
Suppression of grating lobes
Suppression of grating lobesSuppression of grating lobes
Suppression of grating lobeskavindrakrishna
 
Accurate time series classification using shapelets
Accurate time series classification using shapeletsAccurate time series classification using shapelets
Accurate time series classification using shapeletsIJDKP
 
RESOLVING CYCLIC AMBIGUITIES AND INCREASING ACCURACY AND RESOLUTION IN DOA ES...
RESOLVING CYCLIC AMBIGUITIES AND INCREASING ACCURACY AND RESOLUTION IN DOA ES...RESOLVING CYCLIC AMBIGUITIES AND INCREASING ACCURACY AND RESOLUTION IN DOA ES...
RESOLVING CYCLIC AMBIGUITIES AND INCREASING ACCURACY AND RESOLUTION IN DOA ES...csandit
 
DBSCAN (2014_11_25 06_21_12 UTC)
DBSCAN (2014_11_25 06_21_12 UTC)DBSCAN (2014_11_25 06_21_12 UTC)
DBSCAN (2014_11_25 06_21_12 UTC)Cory Cook
 

Similar to Pattern Mining in large time series databases (20)

Fixed Point Realization of Iterative LR-Aided Soft MIMO Decoding Algorithm
Fixed Point Realization of Iterative LR-Aided Soft MIMO Decoding AlgorithmFixed Point Realization of Iterative LR-Aided Soft MIMO Decoding Algorithm
Fixed Point Realization of Iterative LR-Aided Soft MIMO Decoding Algorithm
 
Adaptive blind multiuser detection under impulsive noise using principal comp...
Adaptive blind multiuser detection under impulsive noise using principal comp...Adaptive blind multiuser detection under impulsive noise using principal comp...
Adaptive blind multiuser detection under impulsive noise using principal comp...
 
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
 
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
 
Getting started with chemometric classification
Getting started with chemometric classificationGetting started with chemometric classification
Getting started with chemometric classification
 
Estimating Space-Time Covariance from Finite Sample Sets
Estimating Space-Time Covariance from Finite Sample SetsEstimating Space-Time Covariance from Finite Sample Sets
Estimating Space-Time Covariance from Finite Sample Sets
 
Eigen Subspace based Direction of Arrival Estimation for Coherent Sources
Eigen Subspace based Direction of Arrival Estimation for Coherent SourcesEigen Subspace based Direction of Arrival Estimation for Coherent Sources
Eigen Subspace based Direction of Arrival Estimation for Coherent Sources
 
Dycops2019
Dycops2019 Dycops2019
Dycops2019
 
Channel and clipping level estimation for ofdm in io t –based networks a review
Channel and clipping level estimation for ofdm in io t –based networks a reviewChannel and clipping level estimation for ofdm in io t –based networks a review
Channel and clipping level estimation for ofdm in io t –based networks a review
 
Sparsity based Joint Direction-of-Arrival and Offset Frequency Estimator
Sparsity based Joint Direction-of-Arrival and Offset Frequency EstimatorSparsity based Joint Direction-of-Arrival and Offset Frequency Estimator
Sparsity based Joint Direction-of-Arrival and Offset Frequency Estimator
 
Performance analysis of music and esprit doa estimation algorithms for adapti...
Performance analysis of music and esprit doa estimation algorithms for adapti...Performance analysis of music and esprit doa estimation algorithms for adapti...
Performance analysis of music and esprit doa estimation algorithms for adapti...
 
Clustering
ClusteringClustering
Clustering
 
Linear discriminant analysis
Linear discriminant analysisLinear discriminant analysis
Linear discriminant analysis
 
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
 
Accurate indoor positioning system based on modify nearest point technique
Accurate indoor positioning system based on modify nearest point techniqueAccurate indoor positioning system based on modify nearest point technique
Accurate indoor positioning system based on modify nearest point technique
 
Suppression of grating lobes
Suppression of grating lobesSuppression of grating lobes
Suppression of grating lobes
 
Accurate time series classification using shapelets
Accurate time series classification using shapeletsAccurate time series classification using shapelets
Accurate time series classification using shapelets
 
RESOLVING CYCLIC AMBIGUITIES AND INCREASING ACCURACY AND RESOLUTION IN DOA ES...
RESOLVING CYCLIC AMBIGUITIES AND INCREASING ACCURACY AND RESOLUTION IN DOA ES...RESOLVING CYCLIC AMBIGUITIES AND INCREASING ACCURACY AND RESOLUTION IN DOA ES...
RESOLVING CYCLIC AMBIGUITIES AND INCREASING ACCURACY AND RESOLUTION IN DOA ES...
 
DBSCAN (2014_11_25 06_21_12 UTC)
DBSCAN (2014_11_25 06_21_12 UTC)DBSCAN (2014_11_25 06_21_12 UTC)
DBSCAN (2014_11_25 06_21_12 UTC)
 
Clique and sting
Clique and stingClique and sting
Clique and sting
 

Recently uploaded

5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制vexqp
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制vexqp
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样wsppdmt
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...gajnagarg
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格q6pzkpark
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangeThinkInnovation
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制vexqp
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 

Recently uploaded (20)

5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 

Pattern Mining in large time series databases

  • 1. M.Tech Seminar Presentation 2014-15 Presented by Jitesh Khandelwal 10211015 Guided by Dr. Durga Toshniwal IIT Roorkee
  • 2. Finance e.g. Stock prices Medical e.g. Electro Cardio Grams Marketing e.g. Forecasting product/brand demands Operations e.g. Monitoring control infrastructure at LHC Social Networks e.g. Like count on a profile picture based on gender Almost everything is a time series!
  • 3. Value Prediction Pattern Identification 2.1, 9.3, 4.5, 3, 6.7, 4.0, 18.8, 9.2, 5.8, ? 2.1, 9.3, 4.5, 3, 6.7, 4.0, 18.8, 9.2, 5.8 Based on mathematical models Based on human perception What’s next?
  • 5. Raw time series data Similarity model selection Dimensionality reduction Index construction Mathematical formulation of human perception of similarity High dimensionality makes distance calculation slow Enables efficient querying of big and fast incoming time series data 1 32
  • 6. Raw time series data Similarity model selection Dimensionality reduction Index construction 1 32 Symbolic Representation Text Processing Algorithms Double is 4 byte, Char is 1 byte. Hence, lower memory footprint.
  • 7. Lp Norms DTW distance Longest common subsequence Landmark Similarity ℒ 𝑝 𝑥 − 𝑦 = 𝑖=1 𝐿 𝑥𝑖 − 𝑦𝑖 𝑝 1 𝑝 ℒ1 ℒ2 - Manhattan distance - Euclidean distance Invariant to amplitude scaling when used with z-score normalization. Source: www.google.com
  • 8. Lp Norms DTW distance Longest common subsequence Landmark Similarity 𝐷 𝑖, 𝑗 = 𝑥𝑖 − 𝑦𝑗 + 𝑚𝑖𝑛 𝐷 𝑖 − 1, 𝑗 , 𝐷 𝑖 − 1, 𝑗 − 1 , 𝐷 𝑖, 𝑗 − 1 DTW is Dynamic Time Warping. Allows comparison of variable length time series. Computationally Expensive. Can be optimized using warping window techniques and early abandoning using lower bounds. Source: www.google.com
  • 9. Lp Norms DTW distance Longest common subsequence Landmark Similarity Applicable only to symbolic representations of time series. Non-metric because it does not satisfy triangle inequality. 𝑆𝑖𝑚 𝑥, 𝑦 = 𝐿𝐶𝑆 𝑥, 𝑦 A distance measure D is a metric if it satisfies the following properties: 1. Symmetry: D(X, Y) = D(Y, X) 2. Triangle Inequality: D(X, Y) + D(Y, Z) <= D(X, Z) Threshold parameter, matching criteria for 2 points from x and y. Warping threshold, constraint on matching of points along the time axis.
  • 10. Works the same ways as human remember patterns. Definition of landmarks vary with application domains. E.g. local minima, local maxima, inflection point etc. Uses MDPP (Minimum Distance/Percentage Principle) technique to eliminate noisy landmarks. 𝑥𝑖+1 − 𝑥𝑖 < 𝐷 𝑦𝑖+1 − 𝑦𝑖 𝑦𝑖 − 𝑦𝑖+1 2 < 𝑃 𝑀𝐷𝑃 𝐷, 𝑃 removes landmarks at and if𝑥𝑖 𝑥𝑖+1 Lp Norms DTW distance Longest common subsequence Landmark Similarity
  • 11. 𝐷𝑟𝑒𝑑𝑢𝑐𝑒𝑑 𝑠𝑝𝑎𝑐𝑒 𝐴, 𝐵 ≤ 𝐷𝑡𝑟𝑢𝑒 𝐴, 𝐵 False Alarms False Dismissals Objects that appear close in index space are actually distant. Objects appear distant in index space but are actually closer. Removed in post-processing step. Unacceptable.
  • 12. DFT DWT PAA eAPCA APCA Discrete Fourier Transform 𝑋 𝑓 = 1 𝑛 𝑡=0 𝑛−1 𝑥𝑡 𝑒 −𝑗2𝜋𝑡𝑓 𝑛 1. Choose coefficients corresponding to a few low values of frequencies. 2. Choose coefficients corresponding to frequencies with higher values of coefficients. Based on Parseval’s Relation, Euclidean distance is preserved in the Frequency domain.
  • 15. DFT DWT PAA eAPCA APCA Discrete Wavelet Transform 𝜓𝑗,𝑘 = 2 𝑗 2 𝜓 2𝑗 𝑡 − 𝑘 Used with Haar wavelets as the basis function. Applicable only for time series with lengths which are a power of 2. Lower bound is tighter than DFT.
  • 16. DFT DWT PAA eAPCA APCA Discrete Wavelet Transform Using Haar Wavelet as the basis function. [6]
  • 17. DFT DWT PAA eAPCA APCA Piecewise Aggregate Approximation 𝑥𝑖 = 𝑁 𝑛 𝑗= 𝑛 𝑁 𝑖−1 +1 𝑛 𝑁 𝑖 𝑥𝑗 Reconstruction quality and estimated distance in index space is same as DWT with the Haar Wavelet. With no restriction on length. [6]
  • 18. DFT DWT PAA eAPCA APCA Piecewise Aggregate Approximation 𝐷 𝑋, 𝑌 = 𝑛 𝑁 𝑖=1 𝑁 𝑥𝑖 − 𝑦𝑖 2 A lower bound on the Euclidean distance in the PAA space. N = actual number of points n = number of PAA segments
  • 19. DFT DWT PAA eAPCA APCA Adaptive Piecewise Constant Approximation Data adaptive. Shorter segments for areas of high activity. An extension of PAA. 𝑋 = < 𝑥1, 𝑟1 >, < 𝑥2, 𝑟2 > ⋯ < 𝑥 𝑛, 𝑟 𝑛 > 𝑥𝑖 = 𝑚𝑒𝑎𝑛(𝑥 𝑟𝑖−1+1 … 𝑥 𝑟𝑖 ) [7]
  • 20. DFT DWT PAA eAPCA APCA Adaptive Piecewise Constant Approximation 𝐷 𝐶, 𝑄 = 𝑖=1 𝑀 𝑐𝑟𝑖 − 𝑐𝑟𝑖−1 𝑞𝑥𝑖 − 𝑐𝑥𝑖 2 M = number of APCA segments A lower bound on the Euclidean distance in the APCA space.
  • 21. DFT DWT PAA eAPCA APCA Extended APCA 𝑆 = 𝜇1, 𝜎1, 𝑟1 , … , 𝜇 𝑚, 𝜎 𝑚, 𝑟 𝑚 Also stores variance along with mean for the segments. As a result, it gives both a lower and upper bound on the Euclidean distance. Formulas are very ugly!
  • 22. SAX iSAX SFA Based on PAA. Symbolic Aggregate Approximation Static alphabet size. “Desirable to have a discretization technique that produce symbols with equal probability.” Can leverage run length encoding compression. Breakpoints [9]
  • 23. SAX iSAX SFA Symbolic Aggregate Approximation Supports a lower bound distance measure to Euclidean distance. 𝐷𝑆𝐴𝑋 ≤ 𝐷 𝑃𝐴𝐴 ≤ 𝐷𝑡𝑟𝑢𝑒 Can be calculated in a streaming fashion. [9]
  • 24. SAX iSAX SFA Indexable SAX a, b, c, d (SAX) 00, 01, 10, 11 (iSAX) 0 00, 01, 10, 11 1 00, 01, 10, 11 1 00, 01, 0 10, 11 1 00, 01, 1 10, 11 Fixed number of segments. Dynamic alphabet size. iSAX Notation: iSAX(T, segment count, alphabet size) e.g. iSAX(T, 4, 8)
  • 25. SAX iSAX SFA Indexable SAX Comparison of iSAX words with different alphabet size. iSAX(A, 4, 8) = { 110, 110, 011, 000 } iSAX(B, 4, 2) = { 0 , 0 , 1 , 1 } Replace 0 with either of { 0 00, 0 01, 0 10, 0 11 } whichever is closest to 110. Similarly for all segments. { 011, 011, 100, 100} != iSAX(B, 4, 8) Just a lower bound estimate. We cannot undo lossy compression.
  • 26. SAX iSAX SFA Symbolic Fourier Approximation Uses MCB (multiple coefficient binning) discretization. Based on DFT. SAX - assumes a common distribution for all the coefficients of the reduced representation MCB – histograms are built for all the coefficients and then equi-width binning is used. Tighter lower bound than iSAX [12]
  • 27. SAX iSAX SFA Symbolic Fourier Approximation Every SFA symbol has some global information since it is based on DFT. Cannot be calculated in a streaming fashion. Unlike iSAX, Fixed alphabet size. Dynamic segment count. Quality of representation improves with segment count.
  • 28. Pruning Power Tightness of Lower bound number of data points examined to answer the query total number of data points in the database Intuitively captures the measure of the quality of representation. Free from implementation bias. On a random walk dataset with query lengths 256 to 1024 and dimension of representation 16 to 64 [7] Lower is better.
  • 29. Pruning Power Tightness of Lower bound lower bounding distance true distance Tightness of lower bounds for various time series representations on the Koski ECG dataset [10] Higher is better.
  • 30. R/R* Trees SFA Trie DS Tree ADS index iSAX Tree R trees are multi-dimensional index structures. Encloses close objects in a MBR (Minimum Bounding Rectangle). Individual objects are at the leaves and intermediate nodes are MBRs enclosing other MBRs or the objects. Used for indexing time series after dimensionality reduction using DFT, PAA, APCA and other numeric representations. Unlike R trees, In R* trees, there is no overlap between the different MBRs due to which it also works for range queries rather than only point queries.
  • 31. R/R* Trees SFA Trie DS Tree ADS index iSAX Tree Based on the dynamic alphabet size of iSAX representation. Given the segment count, say d. The root node has 2^d children. A Leaf node, when overflows, is converted to an intermediate node. A segment is selected and its cardinality is increased to produce 2 child leaf nodes that contain the iSAX representations of the time series. iSAX 2.0 is an improvement over iSAX where the segment on which split occurs is selected based on the distribution of time series so that the splitting is balanced.
  • 32. R/R* Trees SFA Trie DS Tree ADS index iSAX Tree Based on the SFA representation. Time series with common SFA prefix lie in common sub-tree. SFA is computed for more number of Fourier coefficients. But not all are used in the index. Hence, small index size. Example: SFA( T1 ) = abaacde | SFA( T2 ) = abbadef SFA( T1 ) = abaacde | SFA( T2 ) = abbadef | SFA( T3 ) = abaagef
  • 33. R/R* Trees SFA Trie DS Tree ADS index iSAX Tree Based on the Extended APCA reduction method. Intermediate nodes store 𝜇𝑖 𝑚𝑖𝑛 , 𝜇𝑖 𝑚𝑎𝑥 , 𝜎𝑖 𝑚𝑖𝑛 , 𝜎𝑖 𝑚𝑎𝑥 for all segments i = 1 to m. It also stores the splitting strategy chose during splitting. Dynamic Segmentation Tree
  • 34. R/R* Trees SFA Trie DS Tree ADS index iSAX Tree Dynamic Segmentation Tree Splitting strategies are of 2 types: Horizontal and Vertical. Horizontal: using mean and variance. Vertical: using segment splitting. Splitting strategy is chosen based on the value of a Quality Measure. The one with maximum value is selected. 𝑄 = 𝑖=1 𝑚 𝑟𝑖 − 𝑟𝑖−1 𝜇𝑖 𝑚𝑎𝑥 − 𝜇𝑖 𝑚𝑖𝑛 2 + 𝜎𝑖 𝑚𝑎𝑥2 𝑆𝑝𝑙𝑖𝑡𝑡𝑖𝑛𝑔 𝐵𝑒𝑛𝑒𝑓𝑖𝑡 = 𝑄 𝑝𝑎𝑟𝑒𝑛𝑡 − 𝑄𝑙 + 𝑄 𝑟 2
  • 35. R/R* Trees SFA Trie DS Tree ADS index iSAX Tree Dynamic Segmentation Tree Apart from similarity search as supported by other indices, it also allows distance histogram computation for a given query. For e.g. Given a query Q, a list L = [ ([10, 20], 10), ([15, 30], 15), ([40, 50], 2) ] means that there are 3 leaf nodes: N1, N2 and N3. N1 includes 10 time series, and their distance from Q is between [10, 20]. Similarly for N2 and N3. This is due to the lower and upper bounds provided by eAPCA.
  • 36. R/R* Trees SFA Trie DS Tree ADS index iSAX Tree Adaptive data series index Based in iSAX representation. Delays the construction of leaf nodes to query time. Also, leaf nodes contain only the iSAX representations and the actual data series remain in the disk. Even during splits, only the iSAX representations are shuffled. Trade off: Small leaf size require splits that costs disk IO time, whereas Big leaf size leads to increased query time for linear scan. So, ADS+ uses adaptive leaf size. A bigger build time leaf size and a much smaller query time leaf size.
  • 37. [1] Agrawal, R., Faloutsos, C., & Swami, A. (1993). “Efficient similarity search in sequence databases”. Proceedings of the 4th Conference on Foundations of Data Organization and Algorithms. [2] Antonin Guttman, (1984). “R-trees: a dynamic index structure for spatial searching”. Proceedings of the 1984 ACM SIGMOD international conference on Management of data. [3] Yi, B.K., Faloutsos, C. (2000) “Fast Time Sequence Indexing for Arbitrary Lp-Norms”. Proceedings of the 26th International Conference on Very Large Data Bases. [4] Keogh, E. (2002) “Exact Indexing of Dynamic Time Warping”. Proceedings of the 28th international conference on Very Large Data Bases. [5] Perng, C., Wang H., Zhang S. R., Parker, D.S. (2000). “Landmarks: A New Model for Similarity-Based Pattern Querying in Time Series Databases”. Proceedings of the 16th International Conference on Data Engineering [6] Keogh, E., Chakrabarti, K., Pazzani, M. & Mehrotra, S. (2000). “Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases”. Published in Journal Knowledge and Information Systems.
  • 38. [7] Keogh, E., Chakrabarti, K., Mehrotra, S., Pazzani, M. (2002) “Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases”. Published in Journal ACM Transactions on Database Systems. [8] Wang, Y., Wang, P., Pei, J., Wang, W., Huang, S. (2013) “A Data-adaptive and Dynamic Segmentation Index for Whole Matching on Time Series”. Proceedings of the VLDB Endowment [9] Lin, J., Keogh, E., Lonardi, S., Chiu, B. (2003). “A symbolic representation of time series, with implications for streaming algorithms”. Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery. [10] Shieh, J., Keogh, E., (2008) “iSAX: Indexing and Mining Terabyte Sized Time Series”. Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. [11] Camerra, A., Palpanas, T., Shieh, J., Keogh, E. (2010) “iSAX 2.0: Indexing and Mining One Billion Time Series”. Proceedings of the IEEE International Conference on Data Mining.
  • 39. [12] Schäfer, P., Högqvist, M. (2012) “SFA: A Symbolic Fourier Approximation and Index for Similarity Search in High Dimensional Datasets”. Proceedings of the 15th International Conference on Extending Database Technology. [13] Beckmann, N., Kriegel, H., Schneider, R., Seeger, B. (1990) “The R*-tree: an efficient and robust access method for points and rectangles”, Proceedings of the ACM SIGMOD international conference on Management of data. [14] Zoumpatianos, K., Idreos, S., Palpanas, T. (2014) “Indexing for Interactive Exploration of Big Data Series”. Proceedings of the ACM SIGMOD International Conference on Data Management. [15] Wu, Y.L., Agrawal, D., Abbadi, A.E., (2000) “A comparison of DFT and DWT based Similarity Search in Time-Series Databases”. Proceedings of the ninth international conference on Information and knowledge management. [16] Esling, P., Agon, C. (2012) “Time-Series Data Mining”. Published in Journal ACM Computing Surveys (CSUR).