KDD Poster Nurjahan Begum
Observation 1:
The convergence of DTW and Euclidean distance results for increasing data sizes.
Observation 2:
The increasing effectiveness of lower-bounding pruning for increasing data sizes.
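The lower-bounding pruning in Observation 2 relies on cheap bounds that bracket the true DTW distance. The poster does not say which bounds are used, so the following is a minimal sketch under assumptions: LB_Keogh is taken as the lower bound, Euclidean distance as the upper bound, series are of equal length, and DTW is constrained to a Sakoe-Chiba band of half-width `w`. The function names are illustrative only.

```python
import math

def euclidean(a, b):
    """Euclidean distance; an upper bound on DTW for equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b, w):
    """DTW distance with a Sakoe-Chiba band of half-width w (equal-length series)."""
    n = len(a)
    inf = float("inf")
    prev = [inf] * (n + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        cur = [inf] * (n + 1)
        for j in range(max(1, i - w), min(n, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return math.sqrt(prev[n])

def lb_keogh(q, c, w):
    """LB_Keogh (assumed bound): distance from q to the band envelope of c."""
    n = len(q)
    total = 0.0
    for i in range(n):
        window = c[max(0, i - w):min(n, i + w + 1)]
        upper, lower = max(window), min(window)
        if q[i] > upper:
            total += (q[i] - upper) ** 2
        elif q[i] < lower:
            total += (q[i] - lower) ** 2
    return math.sqrt(total)

a = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
b = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
# The bounds bracket the true distance: lower <= DTW <= upper.
print(lb_keogh(a, b, 1), dtw(a, b, 1), euclidean(a, b))
```

Both bounds cost O(n) per pair against O(n·w) for banded DTW, which is why a pair whose bounds already settle the question never needs a full DTW computation.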
Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy
Nurjahan Begum, Liudmila Ulanova, Jun Wang¹ and Eamonn Keogh
University of California, Riverside; ¹UT Dallas
Why is DTW Clustering Hard?
Motivation of DTW Clustering
Density Peaks (DP) Algorithm
Why Existing Work is not the Answer?
TADPole: Our Proposed Algorithm
How ‘good’ are TADPole Clusters?
Case Study 1: Electromagnetic Articulograph
How Effective is TADPole’s Pruning?
[Figure: time series for the hashtags #kanyewest, #Michael, #MichaelJackson and #taylorswift over 0 to 120 hours. Labels: "Synonym Discovery?", "Association Discovery?", “I’mma let you finish”.]
[Figure: groupings of Bos taurus, Hyperoodon ampullatus and Talpa europaea under DTW and under Euclidean distance (ED). Group label: Cetartiodactyla.]
[Figure, two plots: 1-NN error rate vs. size of training set (0 to 2000) for Euclidean and DTW; Rand Index vs. dataset size (0 to 2000) for DTW and Euclidean.]
Neither of these two observations helps!
[Figure: two panels of 13 numbered items. Annotations: "Mislabeled by k-means", "Outlier".]
Scalability Issue:
DTW is not a metric, so it is very difficult to index
Quality Issue:
We need a clustering algorithm that is insensitive to outliers
3 steps
1. Density Calculation
2. Nearest Neighbor (NN) Distance Calculation within the Higher Density List
3. Cluster Assignment
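The three steps can be sketched on a precomputed distance matrix. This is a minimal illustration, not the authors' code: the function name `density_peaks`, breaking density ties by index, and picking the `k` centers by the largest product of density and NN distance are all assumptions made here for the example.

```python
def density_peaks(D, dc, k):
    """Sketch of Density Peaks on a full distance matrix D (list of lists)."""
    n = len(D)
    # Step 1: local density = number of other items closer than the cutoff dc.
    rho = [sum(1 for j in range(n) if j != i and D[i][j] < dc) for i in range(n)]
    # Items sorted by decreasing density (ties broken by index, an assumption).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    # Step 2: nearest neighbor among the items with higher density.
    delta = [0.0] * n
    nn = [None] * n
    for pos, i in enumerate(order):
        higher = order[:pos]
        if not higher:
            delta[i] = max(D[i])  # densest item: use its largest distance
        else:
            nn[i] = min(higher, key=lambda j: D[i][j])
            delta[i] = D[i][nn[i]]
    # Step 3: choose k centers (assumed rule: largest rho * delta), then each
    # remaining item inherits the label of its higher-density nearest neighbor.
    rank = {i: pos for pos, i in enumerate(order)}
    centers = set(sorted(range(n), key=lambda i: (-rho[i] * delta[i], rank[i]))[:k])
    labels = [None] * n
    next_label = 0
    for i in order:
        if i in centers or nn[i] is None:
            labels[i] = next_label
            next_label += 1
        else:
            labels[i] = labels[nn[i]]
    return rho, delta, labels

points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
D = [[abs(p - q) for q in points] for p in points]
rho, delta, labels = density_peaks(D, dc=0.15, k=2)
print(labels)
```

Because labels propagate in order of decreasing density, every item's higher-density neighbor is already labeled when the item itself is reached, so a single pass suffices.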
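[Figure: worked Density Peaks example on 13 numbered items, showing each item's local density ρ, its nearest neighbor among the elements with higher density, and the resulting assignment (item 1’s cluster label = item 3’s cluster label).]
Local density with cutoff distance $d_c$: $\rho_i = \sum_j \chi(d_{ij} - d_c)$, where $\chi(x) = 1$ if $x < 0$ and $0$ otherwise.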
Pruning During Local Density Computation
[Figure: four cases, panels A) to D), for a pair (i, j), placing LBMatrix(i,j), Dij and UBMatrix(i,j) relative to the cutoff dc. Panel A) is the trivial case Dij = 0.]
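The idea behind pruning during local density computation can be sketched as follows. The names `lb` and `ub` stand in for LBMatrix and UBMatrix, `dist` is a hypothetical callback that computes the exact distance, and the mapping of branches to the poster's panel letters is not asserted here. The sketch only assumes lb[i][j] <= Dij <= ub[i][j].

```python
def local_density(i, n, lb, ub, dc, dist):
    """Count items within dc of item i, computing dist(i, j) only when needed."""
    rho = 0
    computed = 0
    for j in range(n):
        if j == i:
            continue  # distance of an item to itself is 0; nothing to compute
        if ub[i][j] < dc:
            rho += 1  # upper bound already below dc: j is certainly within dc
        elif lb[i][j] >= dc:
            pass  # lower bound already at or above dc: j is certainly outside
        else:
            computed += 1  # bounds straddle dc: the exact distance is required
            if dist(i, j) < dc:
                rho += 1
    return rho, computed

points = [0.0, 1.0, 2.0, 10.0]
n = len(points)
true_d = lambda i, j: abs(points[i] - points[j])
# Illustrative bounds: loosened copies of the true distance.
lb = [[0.5 * true_d(i, j) for j in range(n)] for i in range(n)]
ub = [[1.5 * true_d(i, j) for j in range(n)] for i in range(n)]
print(local_density(0, n, lb, ub, dc=3.0, dist=true_d))
```

The pruning is admissible in the sense that the returned density equals the brute-force count; only the number of exact distance computations changes.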
Pruning During NN Distance Calculation From Higher Density List
[Figure: four cases, panels A) to D), for item i and candidates j1 to j4 from its higher density list, showing the distances D1 to D4 with their bounds LBMatrix(i,j1..j4) and UBMatrix(i,j1..j4).]
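One way to prune during the NN distance calculation is to keep a best-so-far bound and skip every candidate whose lower bound already exceeds it. The sketch below is an illustration under assumptions, not the poster's exact procedure: `lb` and `ub` stand in for LBMatrix and UBMatrix, `dist` is a hypothetical exact-distance callback, and visiting candidates in order of increasing lower bound is a choice made here.

```python
def nn_in_higher_density(i, candidates, lb, ub, dist):
    """Nearest neighbor of i among candidates, skipping provably worse ones."""
    if not candidates:
        return None, float("inf"), 0
    # The true NN distance cannot exceed the smallest upper bound.
    best = min(ub[i][j] for j in candidates)
    best_j = None
    computed = 0
    for j in sorted(candidates, key=lambda j: lb[i][j]):
        if lb[i][j] > best:
            break  # this and all later candidates cannot beat the best so far
        d = dist(i, j)
        computed += 1
        if d <= best:
            best, best_j = d, j
    return best_j, best, computed

points = [0.0, 1.0, 2.0, 10.0]
n = len(points)
true_d = lambda i, j: abs(points[i] - points[j])
# Illustrative bounds: loosened copies of the true distance.
lb = [[0.8 * true_d(i, j) for j in range(n)] for i in range(n)]
ub = [[1.2 * true_d(i, j) for j in range(n)] for i in range(n)]
print(nn_in_higher_density(0, [1, 2, 3], lb, ub, true_d))
```

In this toy run only one of the three candidate distances is computed exactly; the other two are ruled out by their lower bounds.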
[Figure, two plots: distance calculations vs. number of objects (0 to 3500) for brute force and TADPole, shown as absolute number (×10^6) and as percentage (0 to 100).]
DP: 9 Hours
TADPole: 9 minutes
Distance Computation Ordering: Anytime TADPole
[Figure: Rand Index (0.4 to 1) vs. distance computation percentage (0 to 100%) for Euclidean Distance, Oracle Order, Random Order and TADPole Order, with a zoom-in on the first 10%. Annotations: "This reflects the 90% of DTW calculations that were admissibly pruned"; "This reflects the 10% of DTW calculations that were calculated in anytime ordering".]
[Figure: Y and Z traces (0 to 150); Rand Index (0.84 to 1) vs. distance computation percentage (1 to 7) for Euclidean Distance, Oracle Order, Random Order and TADPole Order.]
Pruning: 94%
Case Study 2: Pulsus Dataset
[Figure: oximeter schematic (LED, photo detector, vein, artery); PPG traces (200 to 1800) for non-severe pulsus and severe pulsus; power spectral density vs. frequency (0 to 60), panels A) to F), including Patient 639, Patient 523, Patient 618 and Patient 2975918, with normalized respiration rate and normalized heart rate marked. Classes: Healthy, Suspected Pulsus, Severe Pulsus.]
Reproducibility
All the code and datasets used in this paper are publicly available at:
www.cs.ucr.edu/~nbegu001/SpeededClusteringDTW
Pruning: 88%