3. Test Dataset & Raw Datasets
TEST DATASETS
Generated 5 test datasets (20 observations each) using an ARMA(p, q) model.
ARMA(1,1): Z_t = a_t + φ·Z_{t-1} − θ·a_{t-1}, where a_0 = 0, Z_0 = 0, t = 1, …, 20
• Series 1: φ = −0.8, θ = 0.1
• Series 2: φ = 0.8, θ = −0.1
• Series 3: φ = 0.85, θ = −0.15
• Series 4: φ = −0.8, θ = 0.1, shifted, with 21 observations
• Series 5: φ = −0.85, θ = 0.15, with 21 observations
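The generating recursion above can be sketched in Python. This is a minimal illustration of the test-data setup, not the project's actual generation code; the helper name and the Gaussian white-noise assumption for a_t are mine.

```python
import random

def simulate_arma11(phi, theta, n=20, seed=1):
    """Simulate Z_t = a_t + phi*Z_{t-1} - theta*a_{t-1}, with a_0 = Z_0 = 0."""
    rng = random.Random(seed)      # fixed seed so a test series is reproducible
    z_prev, a_prev = 0.0, 0.0      # Z_0 = 0, a_0 = 0
    series = []
    for _ in range(n):             # t = 1, ..., n
        a = rng.gauss(0.0, 1.0)    # white-noise shock a_t (assumed Gaussian)
        z = a + phi * z_prev - theta * a_prev
        series.append(z)
        z_prev, a_prev = z, a
    return series

# e.g. Series 1 and Series 2 from the slide
series1 = simulate_arma11(phi=-0.8, theta=0.1)
series2 = simulate_arma11(phi=0.8, theta=-0.1)
```

Series 4 and 5 would be generated the same way with n=21, with Series 4 compared against Series 1 after a one-step shift.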
RAW DATASETS
Raw_itd1 & Raw_itd2: commodities datasets
• 119 series in Raw_itd1
• 120 series in Raw_itd2
• 222 monthly observations, from January 1997 to June 2015
Group                      Series
Similar Group              Series 2&3
Dissimilar Group           Series 1&2
Identical Shifted Group    Series 1&4
Similar Shifted Group      Series 1&5
4. Approach 1 Correlation Table
Test dataset:
Advantages:
• Able to detect period movements/shifts, but unstable
Disadvantages:
• Only captures linear correlation between series
• Sensitive to outliers
Table 1 Correlation Coefficients
Series      S1         S2         S3         S4         S5
S1          1          0.18454    0.14729   -0.9221     0.98857
S2          0.18454    1          0.99359    0.0392     0.15459
S3          0.14729    0.99359    1          0.05083    0.12369
S4         -0.9221     0.0392     0.05083    1         -0.95089
S5          0.98857    0.15459    0.12369   -0.95089    1
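Building the table amounts to computing the Pearson correlation for every pair of series. A minimal Python sketch (the project did this in SAS; function names here are illustrative):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def correlation_table(series):
    """All pairwise correlations, organized as a square table."""
    return [[pearson(s, t) for t in series] for s in series]
```

The diagonal is always 1, and values near ±1 flag linearly related pairs, exactly as read off Table 1.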
6. Approach 2 SAS Proc Similarity Method
• Use a distance matrix to compute a similarity measure for a pair of series
Data
Input series:  X1, X2, X3, …, Xn
Target series: Y1, Y2, Y3, …, Ym
Distance Matrix
               Input Series
           X1    X2    X3    …    Xn
Target Y1  D11   D12   D13   …    D1n
Series Y2  …     …     …     …    …
       Y3  …     …     …     …    …
       …   …     …     …     …    …
       Ym  Dm1   …     …     …    Dmn

Dij = input series value − target series value, e.g. D11 = X1 − Y1
The series are normalized, and the input is rescaled to the target; the procedure then computes all possible paths to traverse the matrix.
7. Approach 2 SAS Proc Similarity Method
Output:
1. Similarity Measure: Absolute Deviation
• MEASURE=ABSDEV / absolute deviation
• the total distance of the minimum path that traverses the distance matrix
2. Cost Statistics: statistics associated with minimum path
3. Path Statistics: indicating percentages of direct path (diagonal movement),
compression (vertical movement) and expansion (horizontal movement)
The Smaller the Absolute Deviation, the More Similar the Two Series Are
8. More on Proc Similarity in SAS
Basic Structure (option values shown are examples):
proc similarity data=data out=outsim;
   input S1 / normalize=absolute scale=absolute;
   target S2 / slide=index normalize=absolute measure=absdev
               compress=(localabs=0) expand=(localabs=0);
run;
Output:
Similarity Measures
Path & Cost Measures
Transformed Input & Target Series
Input & Target Path Index
Distance Metric Transformation
Normalization: absolute or standard
Scaling: absolute or standard
User-defined transformations: FCMPOPT statement & options
Measures
1. SQRDEV / ABSDEV: squared or absolute deviation
2. MSQRDEV / MABSDEV: mean squared or absolute deviation, taken relative to the length of the input or target sequence, or relative to the minimum or maximum valid path length
3. User-defined measures
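Along a diagonal path on equal-length series, the built-in measures reduce to simple sums and means. A hedged Python sketch of that special case (the general PROC SIMILARITY measures are computed over the minimum path, not shown here):

```python
def sqrdev(x, y):
    """Squared deviation along a diagonal path."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def absdev(x, y):
    """Absolute deviation along a diagonal path."""
    return sum(abs(a - b) for a, b in zip(x, y))

def msqrdev(x, y):
    """Mean squared deviation, here taken relative to the sequence length."""
    return sqrdev(x, y) / len(x)

def mabsdev(x, y):
    """Mean absolute deviation, here taken relative to the sequence length."""
    return absdev(x, y) / len(x)
```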
9. More on Proc Similarity in SAS
Path & Cost Statistics Plots
10. Approach 2 SAS Proc Similarity Method
Results of Raw_itd1
Advantages:
Higher accuracy rate in detecting similar pairs
Normalizes and rescales the series
Computes pair-wise similarity measures using a DO loop
Performs well on totally dissimilar series that cross each other
Disadvantages:
Bad at detecting similar and shifted series
Sensitive to outliers
Series Pairs                     Proc Similarity Measure
RW_CMACDG391 & RW_CMACDP553       44.00063414
RW_CMACDG183 & RW_CMACDG274       49.51170024
RW_CMACDG274 & RW_CMACDG391       52.46527532
RW_CMACDG274 & RW_CMACDG474       53.32933511
RW_CMACDG274 & RW_CMACDP553       54.0065316
RW_CMACDG274 & RW_CMACDG221      374.47813762
Table 2 Proc Similarity Measures
Series                                Absolute Deviation
Similar Group Series 2&3               1.92422
Dissimilar Group Series 1&2           20.25579
Identical Shifted Group Series 1&4     1.20447
Similar Shifted Group Series 1&5      33.96824
11. Approach 2 SAS Proc Similarity Method
Plots of an identical pair, a very similar pair and a similar pair, with similarity measures 2.78E−14, 14.11 and 51.98 respectively.
12. Approach 3 SIM Coefficient
SIM coefficient is calculated by the following. First take relative differences of each series:

y_t^(i) = (x_t^(i) − x_{t−1}^(i)) / x_{t−1}^(i),  for t = 2, …, T

then

Sim(y^(1), y^(2)) = (1/(T−1)) · Σ_{t=2}^{T} |y_t^(1) − y_t^(2)| / max(|y_t^(1)|, |y_t^(2)|)
The Closer the Value to Zero, the More Similar the Series Are.
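The formula above translates directly into code. A Python sketch (the project computed this in SAS with a DO loop; function names and the zero-denominator handling are my assumptions):

```python
def relative_diff(x):
    """y_t = (x_t - x_{t-1}) / x_{t-1} for t = 2, ..., T (assumes x_{t-1} != 0)."""
    return [(x[t] - x[t - 1]) / x[t - 1] for t in range(1, len(x))]

def sim_coefficient(x1, x2):
    """SIM coefficient of two equal-length series; closer to 0 => more similar."""
    y1, y2 = relative_diff(x1), relative_diff(x2)
    T = len(x1)
    total = 0.0
    for a, b in zip(y1, y2):
        denom = max(abs(a), abs(b))
        if denom > 0:              # both relative differences zero: contributes 0
            total += abs(a - b) / denom
    return total / (T - 1)
```

For identical series every term vanishes, so the coefficient is exactly 0, matching the rule that smaller is more similar.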
13. Approach 3 SIM Coefficient
Series Pairs SIM_final
RW_CMACDG291 & RW_CMACDG472 0.504446
RW_CMACDG291 & RW_CMACDG473 0.515302
RW_CMACDG284 & RW_CMACDG472 0.544256
RW_CMACDG221 & RW_CMACDG282 0.546508
RW_CMACDG221 & RW_CMACDG291 0.548653
Results of Raw_itd1
Advantages:
Computes pair-wise similarity measures using a DO loop
Performs well on totally dissimilar series that cross each other
Best at detecting both identical & shifted and similar & shifted series
Disadvantages:
Accuracy rate is lower than the Proc Similarity method
Cut-off: a pair is considered similar when the coefficient is below 0.7
Table 3 SIM Measures
Series                                SIM Coefficient
Similar Group Series 2&3               0.36887
Dissimilar Group Series 1&2            0.98743
Identical Shifted Group Series 1&4     0.23527
Similar Shifted Group Series 4&5       0.23978
14. Approach 4 Derivatives Comparison Method
Step 1: Use a spline function to represent each series
Step 2: Compute first derivatives (slopes) at each knot
Step 3: Compute second derivatives (rates of change) at each knot
Step 4: Compute the differences between the two series' first & second derivatives
The Smaller the Difference, the More Similar the Series Are.
Basic Idea:
spline function on each series + calculation of first & second derivatives
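The four steps can be sketched in Python. As an illustration only: the project fits splines and differentiates at the knots, whereas this sketch uses finite differences at every point as a stand-in for the spline derivatives.

```python
def first_derivatives(x):
    """Slope between consecutive points (unit spacing) -- a finite-difference
    stand-in for the spline's first derivative at each knot."""
    return [x[t + 1] - x[t] for t in range(len(x) - 1)]

def second_derivatives(x):
    """Rate of change of the slope (central second difference)."""
    return [x[t + 1] - 2 * x[t] + x[t - 1] for t in range(1, len(x) - 1)]

def derivative_differences(x, y):
    """Mean absolute difference of first and of second derivatives between two
    series; the smaller both numbers, the more similar the series."""
    d1x, d1y = first_derivatives(x), first_derivatives(y)
    d2x, d2y = second_derivatives(x), second_derivatives(y)
    diff1 = sum(abs(a - b) for a, b in zip(d1x, d1y)) / len(d1x)
    diff2 = sum(abs(a - b) for a, b in zip(d2x, d2y)) / len(d2x)
    return diff1, diff2
```

Because the comparison points are fixed, a one-step shift between otherwise identical series produces large derivative differences, which is exactly the weakness reported for the shifted groups.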
16. Quantitative Measures
Approach 4 Derivatives Comparison Method
Table 4 Comparison of Derivatives
Group                     Difference between 1st Derivatives   Difference between 2nd Derivatives
Similar Group              0.81508                              0.16769
Dissimilar Group          20.4207                               4.21562
Identical Shifted Group   24.2524                               5.17868
Similar Shifted Group     34.8150                               7.75581
Disadvantages:
Slow in processing time
Bad at detecting both identical & shifted and similar & shifted series
# of knots < # of observations
17. Approach 5 Spectral Analysis Method
SAS: proc spectra
Plot of frequency against the phase spectrum (in radians) of X and Y
Time Domain (e.g. ARIMA model):
• auto-covariance
• auto-correlation
Frequency Domain:
• spectral density function
Time series representations (plots): similar pair vs. dissimilar pair
18. Conclusion
Approach 1 Correlation Table:
Easy to Interpret & Fastest in Computing Pair-Wise Measures
Approach 2 SAS Proc Similarity Method:
More Functionality & Highest Accuracy Rate
Approach 3 SIM Coefficient:
Straightforward Formula & Adds More Accuracy
Approach 4 Derivatives Comparison Method:
Confirmation Mechanism
Approach 5 Spectral Analysis Method:
Different Perspective & Measures
At the beginning of the term I was introduced to the different seasonal adjustment options that the X-12 program can produce; similar series tend to share similar or even identical options. If we can determine similarity beforehand and obtain a quantitative measure of it, we can avoid redundant seasonal adjustment processing on similar series and be better prepared to explain similar or identical options for different series. What we were trying to quantify is the similarity of month-to-month movement. In this way we can discover relationships between series that we did not know about beforehand.
We came up with 5 approaches, and all of them were tested on both the test datasets and the raw/real datasets.
The test datasets are generated from an ARMA model with different values of phi and theta. With pre-determined similarity or dissimilarity between the datasets, I can verify each method by testing it on each dataset. I identified 4 types of groups (each series with 20 observations): a similar group containing Series 2 & 3; a dissimilar group (Series 1 & 2); an identical and shifted group, Series 1 & 4 (as you can see, they have the same phi and theta, just shifted by 1 time point); and a similar and shifted pair (Series 4 & 5, whose parameters are off by 0.05, each with 21 observations).
The raw datasets contain commodities information from 1997 to 2015. Dataset 1 is customs based and dataset 2 is based on balance of payment.
The first approach to quantifying similarity between series is to construct a correlation table: correlation coefficients for all combinations of series are computed and organized into a table. I first tested this approach on the test dataset. As the table shows, a correlation closer to 1 or −1 suggests strong linear correlation between the series, and the approach correctly identified the similar and dissimilar groups. It can also detect period movement, i.e. the shift in the data. However, in this case that could be a coincidence: since I shifted the series by 1 time unit, when one goes up and then down at certain time points, the shifted one goes down and then up, resulting in a negative correlation. Another major problem with this approach is that it can only capture linear correlation between series, while in reality many non-linear relationships also exist between series, and this approach will fail to detect them. It is also very sensitive to outliers: a single outlier can severely distort the correlation coefficients, which results in wrong identification of similar or dissimilar pairs.
The correlation coefficients in Table 1 confirm the similarity between df1 & df4, df1 & df5, df2 & df3 and df4 & df5 with their absolute values being close to 1. The dissimilar series are also correctly identified with correlation coefficients being close to 0.
It captures only linear correlation between series: any non-linear relationship between 2 series cannot be detected by this approach.
For the raw datasets, a correlation matrix was also calculated, and it identified numerous pairs of similar series. One way to visualize them is to plot one series against the other. As we see from the graphs, for a similar pair identified by linear correlation the points scatter around the diagonal, while for a dissimilar pair the points scatter off the diagonal. For the dissimilar plot, a pattern appears in the graph: most of the points are scattered around 2 regions, yet the correlation approach considers these series to have no linear correlation. This approach is the easiest to implement and can be used as a preliminary examination of the series. Next I explain the more sophisticated approaches.
When plotting one series against the other:
Similar pair: the points scatter around the diagonal
Dissimilar pair: the points scatter off the diagonal
The second approach is a SAS procedure called proc similarity, which computes similarity measures for time-stamped data, time series, and other sequentially ordered numeric data. The basic idea is to use a distance matrix to compute a quantitative measure for a pair of series. Initially there are 2 series, X and Y, with n and m observations; one is referred to as the input series, the other as the target series. These sequences are then normalized and rescaled, which can easily be specified and implemented within the procedure. A distance matrix is constructed by calculating the difference between each pair of data points, specifically input data point minus target data point, as in the table on the right: D11 is the value of series X at time 1 minus that of Y at time 1, and so on. The next step is to compute all possible paths that traverse the matrix from the left side to the right side. The procedure assigns a path index to each path to indicate the number of movements associated with it, where moving from one cell to the next counts as 1 step. For example, a path that takes 11 steps to complete has a path index of 11.
Input and target series are normalized and input sequence is scaled to the target sequence before constructing distance matrix
It computes all possible paths to traverse the matrix and assigns a path index to each path indicating the number of movements associated with that path.
Compression and Expansion
Next, we can choose the similarity measures and other statistics to produce. The first is the absolute deviation, which is the total distance of the minimum path that traverses the matrix. For this project we limit the path to the diagonal only, so that we have comparability across all combinations of series in the raw datasets. Cost statistics can also be produced, containing basic descriptive statistics of the minimum path. Path statistics are the proportions of direct path (the diagonal movement), compression (vertical movement) and expansion (horizontal movement). As a general rule, the smaller the distance associated with the direct path, the more similar the series are.
Since measuring similarity for my specific time series data (commodities data) uses only a small portion of the functionality of proc similarity, I will talk a bit more about the procedure and illustrate its flexibility.
This is the basic structure of the procedure: 2 series are coded as an input series and a target series. The procedure includes a transformation mechanism that can be easily specified; for this project, both normalization and rescaling were used. The reason is that when I ran the procedure without any rescaling, the similarity measure (the absolute deviation) could be larger for a similar pair whose values are far apart than for a dissimilar pair moving in opposite directions that even cross each other at certain time points. When I constructed the distance matrix without any transformation, the differences between input and target values for the far-apart similar pair were large, so the absolute deviation computed from that matrix was of course large. When I constructed the distance matrix of a dissimilar pair that crosses, the differences near the crossing section were very small and led to a smaller value of the similarity measure.
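The transformations described above can be sketched as follows. This is one plausible reading of the normalize/scale options (standardize each series; map the input's range onto the target's range), not the exact SAS definitions; function names are mine.

```python
import math

def standardize(x):
    """Standard normalization: zero mean, unit standard deviation."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / sd for v in x]

def scale_to_target(x, target):
    """Map the input's range linearly onto the target's range, so that two
    series far apart in level become directly comparable."""
    lo, hi = min(x), max(x)
    tlo, thi = min(target), max(target)
    return [tlo + (v - lo) * (thi - tlo) / (hi - lo) for v in x]
```

After a transformation like this, the distance matrix reflects shape differences rather than level differences, which is why the rescaled measure no longer rewards dissimilar pairs that happen to cross.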
In terms of the similarity measure, there are also various options to choose from. They fall into 2 major groups: squared or absolute deviation, and mean squared or absolute deviation. The means can be calculated relative to the length of the series, or to the minimum or maximum valid path length, to suit the needs of a specific analysis.
In terms of output, not only can the procedure produce various tables containing the statistics mentioned before, it can also produce various plots. The table here is just an example of one output: it lists the transformed input and target series, along with the path index for each series and the corresponding distance.
On the left is the path and cost statistics output. This is a screenshot from analyzing the commodity series, so only the diagonal path is used; but you can also specify expansion and compression limits to go off the diagonal. On the right is a series of graphs produced by the procedure: the left one is the original plot of the 2 series, which cross each other; the right one shows the rescaled and normalized series. The path plot indicates the minimum path that traverses the distance matrix. The distance of each path is also plotted, as well as the distribution of distances over all possible paths.
Plot and distribution of path relative distance
path relative distance = path distance / corresponding target sequence value
This is a snapshot of the top 5 most similar pairs identified by this approach, and on the right is a plot of the original data for one of those pairs. We can see from the graph that the 2 series are very similar in terms of month-to-month movement. One advantage of this approach is that it has the highest accuracy rate in detecting similar pairs compared to the other approaches. The series can be easily transformed in the code, and I was able to use a DO loop to compute the similarity measure for all pair-wise combinations, just like in the correlation matrix. Since the procedure involves normalization and rescaling, it performs very well when examining dissimilar series that cross each other.
One disadvantage of this method is that when dealing with series that are similar but shifted by some time period, its measurement is not very accurate. From the results on the test dataset, we can see it correctly detects the similar, dissimilar and identical & shifted pairs, but not the similar & shifted pair. Remember the rule is the smaller the better, but for test series that are similar but shifted by 1 time unit, its value is very large. It is also very sensitive to outliers when building the distance matrix: the distance at an outlier time point will be very large, which inflates the absolute deviation similarity measure.
Higher accuracy rate of detecting similar pairs compared to the other methods
Without rescaling, it performs badly on totally dissimilar series that cross each other, because of the intersection of the 2 series; but the trend is only a single consideration during seasonal adjustment, and the series can be de-trended.
Without rescaling, the proc similarity measure of these 2 series is 196.4936, which is smaller due to the intersection of the 2 series: at the intersection the distance between the 2 series is very small, which decreases the sum of absolute deviations.
These are a few plots of series with their corresponding similarity measures from this approach. As you can see, the series get increasingly similar as the number gets smaller; for the perfectly identical pair, the measure is essentially zero. The last graph is a result of combining the customs-based and balance-of-payments commodity datasets, so it could mean they are the same product.
When I combined 2 raw datasets, this approach performs very well in detecting and distinguishing similar (or identical) and dissimilar pairs.
What I want to point out about these plots is that even though the first pair is relatively closer together than the 2nd pair, the similarity measure is still able to distinguish the levels of similarity.
The third approach is called the SIM coefficient method: use the formula above to calculate a similarity measure between 2 series and evaluate them based on it. The first step is to take the relative differences of both series; call them the input and target series. The SIM coefficient is then calculated by the formula: take the absolute value of the difference between the 2 series at each time point, divide by the maximum absolute value at that time point, sum these up, and divide by the total number of observations minus 1. The general rule for this coefficient is the closer the value to zero, the more similar the series are. If both series are identical, then y_t^(1) minus y_t^(2) is zero; if y_t^(1) and y_t^(2) are always very close at every time point, the value will also be close to 0.
These are the top 5 most similar pairs identified by this approach, and you can see a plot of the most similar pair. Since the formula for this similarity measure is fairly straightforward, I was also able to quickly compute all pair-wise combinations. This approach also performs pretty well at determining dissimilar pairs that cross each other, since the measure is calculated after taking relative differences. One disadvantage is that its accuracy rate is lower than proc similarity, and it is slower and less flexible. On the other hand, this approach performs relatively well at detecting identical & shifted and similar & shifted pairs, as you can see from this table of results on the test datasets.
The next approach is the derivative comparison approach. The basic idea is that if 2 series are moving in the same direction at the same rate throughout the observation period, they should be similar in terms of month-to-month movements. I fit a spline function to each series to obtain a mathematical equation for the data, then compare the first and second derivatives between the 2 series. The red line is the slope and the green line is the rate of change at each knot.
This approach fits a spline function to each series and calculates first and second derivatives to evaluate the similarity between the 2 series.
A spline function with a specified number of knots is used to represent each series.
As you can see from the plots of a dissimilar pair, the red line, which is the slope, is very different from one series to the other. A more quantitative way to measure this is to take the differences between the series' first and second derivatives.
The results in this table are based on the test datasets: the similar group indeed has lower differences between 1st and 2nd derivatives, and the dissimilar group has larger values. But the method fails to distinguish the identical & shifted and similar & shifted groups. The reason is that since the knots of the spline function are fixed, derivatives are only calculated at each knot, which in the shifted case are off by 1 time point; I specified a knot at every other time point. Since the number of knots is smaller than the number of observations, this can be both a good thing and a bad thing: with one choice of knots you might skip over certain outliers, while with another you might include them. One way to fix this is to alternate the positions of the knots, construct multiple spline functions for each series, and repeat the derivative comparison to get the final results; this may accommodate the disadvantage of fixed knots and the method's inability to detect shifted pairs. Another drawback of this approach is that it runs pretty slowly, at least with the code I have written.
From the graphs, the first and second derivatives of the similar series (df2 & df3) are also similar, while the dissimilar ones (df1 & df2) show obvious distinctions. The magnitudes of the differences of derivatives between series are summarized in the following table.
Because the positions of the knots are fixed, the method is bad at detecting shifted pairs.
Finally, there is another method I looked into only briefly compared to the others: the spectral analysis approach. I wasn't familiar with the term, so I did quite a bit of research online. From what I understand, spectral analysis represents a time series using cyclical components of different frequencies, in contrast to the ARIMA model from the time series course, which uses previous realizations and white noise. Just as we study the auto-covariance and autocorrelation functions of a stationary time series in the time domain, we can study the spectral density function, or spectrum, as a function of frequency in the frequency domain.
Cross-spectral analysis is an extension of these techniques that enables 2 series to be analyzed simultaneously. The extent to which any frequency component in one series is correlated with the corresponding frequency component in another series can be estimated as the coherence; if you plot coherence against frequency, you can identify the pattern of correlation between pairs of components. The SAS procedure proc spectra can produce all of these statistics, including the squared coherence and the sine and cosine transforms. One statistic, the phase spectrum, which is a parameter of the cross-spectrum formula, is plotted against frequency: as you can see on the right, the similar pair has fewer peaks than the dissimilar pair. This approach explains time series data from a different perspective and can serve as another approach to look into for quantifying the similarity between 2 series.
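The phase spectrum of the cross-spectrum can be illustrated with a small standard-library sketch. This is not proc spectra's implementation: it uses a naive O(n²) DFT, and the function names are mine.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(n^2)), standard library only."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def phase_spectrum(x, y):
    """Phase (in radians) of the cross-spectrum X(f) * conj(Y(f))."""
    cross = [a * b.conjugate() for a, b in zip(dft(x), dft(y))]
    return [cmath.phase(c) for c in cross]
```

For identical series the cross-spectrum is the real, non-negative power spectrum, so the phase is zero at every frequency; shifts and disagreements between two series show up as non-zero phase, consistent with the similar pair having fewer peaks in the phase plot.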
In conclusion, each approach has its pros and cons. In short: the correlation matrix approach is easy to interpret and is the fastest at computing all pair-wise combinations. The proc similarity approach has the highest accuracy rate, with more functionality and flexibility in manipulating the data. The SIM coefficient approach has a straightforward formula for quantifying similarity and can add more accuracy on top of proc similarity. The derivative comparison approach can, in my opinion, serve as a confirmation mechanism to further verify the similarity results. And the spectral analysis approach looks at the series from a frequency-based perspective and can provide more quantitative measures of similarity. Note also that all of the methods are sensitive to outliers, so one way to improve the outcome would be to use outlier-treated datasets. In my opinion, the most recommended approach for quantifying similarity between time series is the SAS proc similarity procedure; if more accuracy is needed, the SIM coefficient method can serve as a second filter to further remove inaccurate or dissimilar pairs.