The presentation deals with clustering trajectories of moving objects. A k-means-like algorithm based on a Euclidean distance between piecewise linear curves is used. The main novelty of the paper is the introduction, within the clustering procedure, of a step that automatically weights the importance of sub-trajectories of the original ones. The algorithm uses an adaptive distances approach and a cluster-wise weighting. The proposed algorithm is tested on some benchmark trajectory datasets.
Presented at SIS 2019, Milan.
3. Outline
• We aim at clustering trajectories of moving objects.
• A k-means-like algorithm based on a Euclidean distance between piecewise linear curves is used. Each trajectory is decomposed into sub-trajectories.
• The importance of each sub-trajectory is automatically computed within the clustering algorithm using an adaptive distances approach.
• The proposed algorithm is tested on some benchmark trajectory datasets.
[Figures: some trajectories; examples of clustered trajectories]
19/6/2019 Sis 2019 2 / 47
4. Trajectory
A trajectory $T^j$ is a collection of ordered pairs of data $(s_i^j, t_i^j)$, $i = 1, \dots, n$, sampled at $n$ time-points, where $s_i^j$ is a spatial location (namely, a 2D or a 3D vector of spatial coordinates) and $t_i^j$ is a time-stamp. A trajectory can be enriched with other data recorded at each time-point, but we do not consider this case. Considering the order provided by the time-stamps, a trajectory $T^j$ is described as a curve in a 2D (or 3D) space.
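As a minimal illustration (ours, not from the slides), a trajectory can be stored as arrays of time-stamps and coordinates, with the time-stamps rescaled to $[0, 1]$:

```python
import numpy as np

def normalize(trajectory):
    """Rescale the time-stamps of a (t, x, y) trajectory to
    r_i = (t_i - t_0) / (t_n - t_0), so the curve is parametrized on [0, 1]."""
    t, x, y = (np.asarray(c, dtype=float) for c in trajectory)
    r = (t - t[0]) / (t[-1] - t[0])
    return r, x, y
```

For example, a trajectory sampled at times 10, 15, 30 gets normalized time-stamps 0, 0.25, 1.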
Trajectories are everywhere
Trajectories of:
• pedestrians
• animals
• vehicles
• hurricanes
• …
Sensed by:
• GPS systems
• GSM networks
• RFID and WiFi
• …
Clustering and classification have useful applications in:
• Transportation
• Urban planning
• Business
• …
5. Clustering trajectories
Clustering aims at grouping objects such that
• similar objects are grouped together
• different objects belong to different groups
Trajectory clustering looks for groups of trajectories, or of sub-trajectories, that represent a movement pattern in the data.
When are two trajectories similar?
6. Different approaches to trajectory clustering
In the literature, the following approaches represent the state of the art of trajectory clustering:
• Lee et al. [6] propose a distance between sub-trajectories, and an algorithm that implements an extension of density-based clustering for grouping sets of sub-trajectories.
• Ferreira et al. [4] estimate $k$ vector fields associated with $k$ groups of trajectories observed in a 2D space. This application is inspired by the problem of monitoring and predicting storm or hurricane paths.
• Another approach is provided by functional data analysis, where a trajectory is considered as a curve in a 2D or 3D space. Sangalli et al. [8] proposed a k-means-type algorithm using an alignment step.¹
¹We do not consider alignment in this paper.
7. What if trajectories have different time-lengths? Two choices.
Consider sub-trajectories: sub-trajectories of equal length can be selected and then compared.
• Pro: time length is preserved
• Con: computational cost can be high
Normalize lengths: time lengths are set equal to 1.
• Pros: trajectories are treated as single objects, distances are more interpretable, and the computational cost is acceptable. Averaging trajectories is possible.
• Con: if trajectories have very different time-lengths, some biases arise
9. Distances between trajectories
Some assumptions:
• We consider normalized trajectories.
• We consider trajectories as piecewise linear curves.
• For each piece, we assume a constant relative speed.
Under these assumptions, we can consider the Euclidean distance between two 2D trajectories² having the same $n$ time-stamps normalized in $[0, 1]$.
Given two normalized trajectories
$T_1 = \{\{(x_1^0, y_1^0), 0\}, \dots, \{(x_1^i, y_1^i), r_1^i\}, \dots, \{(x_1^{n_1}, y_1^{n_1}), 1\}\}$ and
$T_2 = \{\{(x_2^0, y_2^0), 0\}, \dots, \{(x_2^i, y_2^i), r_2^i\}, \dots, \{(x_2^{n_2}, y_2^{n_2}), 1\}\}$,
where $r_s^i = \frac{t_s^i - t_s^0}{t_s^{n_s} - t_s^0}$.
²The trajectory is on a plane, but the extension to 3D spaces is straightforward.
10. Euclidean distance between two trajectories i
It is possible to express the two trajectories with a common set of $r$'s by linear interpolation. Once the two trajectories are registered such that they have the same $L \in [\max(n_1, n_2), n_1 + n_2]$ normalized time-stamps, we compute the squared Euclidean distance between $T_1$ and $T_2$ as follows:
$$d_E^2(T_1, T_2) = \int_0^1 \left[ (x_1(r) - x_2(r))^2 + (y_1(r) - y_2(r))^2 \right] dr =$$
$$= \sum_{h=1}^{L} (r_h - r_{h-1}) \left\{ |\bar{x}_1(h) - \bar{x}_2(h)|^2 + |\bar{y}_1(h) - \bar{y}_2(h)|^2 + \frac{1}{3} \left[ |\hat{x}_1(h) - \hat{x}_2(h)|^2 + |\hat{y}_1(h) - \hat{y}_2(h)|^2 \right] \right\} \quad (1)$$
where:
11. Euclidean distance between two trajectories ii
• $\bar{x}_s(h) = \frac{x_s(r_h) + x_s(r_{h-1})}{2}$ and $\bar{y}_s(h) = \frac{y_s(r_h) + y_s(r_{h-1})}{2}$, for $s = 1, 2$. The point $(\bar{x}_s(h), \bar{y}_s(h))$ is the center of the segment that starts from $(x_s(r_{h-1}), y_s(r_{h-1}))$ and arrives at $(x_s(r_h), y_s(r_h))$;
• $\hat{x}_s(h) = \frac{x_s(r_h) - x_s(r_{h-1})}{2}$ and $\hat{y}_s(h) = \frac{y_s(r_h) - y_s(r_{h-1})}{2}$, for $s = 1, 2$. The pair $(\hat{x}_s(h), \hat{y}_s(h))$ collects the signed component-wise half-widths of the same segment.
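Under the assumptions above, Eq. (1) is straightforward to implement. The following sketch (an illustration of ours, not the authors' code; function names are hypothetical) registers two normalized $(r, x, y)$ trajectories on the union of their time-stamps by linear interpolation, then sums the segment-wise contributions of the centers and half-widths:

```python
import numpy as np

def register(traj1, traj2):
    """Put two normalized (r, x, y) trajectories on the common grid
    given by the union of their time-stamps, via linear interpolation."""
    r = np.union1d(traj1[0], traj2[0])
    (x1, y1), (x2, y2) = [(np.interp(r, t, x), np.interp(r, t, y))
                          for t, x, y in (traj1, traj2)]
    return r, (x1, y1), (x2, y2)

def dist2(traj1, traj2):
    """Squared Euclidean distance of Eq. (1) between two piecewise
    linear trajectories."""
    r, (x1, y1), (x2, y2) = register(traj1, traj2)
    dr = np.diff(r)
    # differences of segment centers and of signed half-widths
    mx = (x1[1:] + x1[:-1]) / 2 - (x2[1:] + x2[:-1]) / 2
    my = (y1[1:] + y1[:-1]) / 2 - (y2[1:] + y2[:-1]) / 2
    wx = (x1[1:] - x1[:-1]) / 2 - (x2[1:] - x2[:-1]) / 2
    wy = (y1[1:] - y1[:-1]) / 2 - (y2[1:] - y2[:-1]) / 2
    return float(np.sum(dr * (mx**2 + my**2 + (wx**2 + wy**2) / 3)))
```

As a sanity check, two parallel horizontal segments at vertical distance 1 give $d_E^2 = \int_0^1 1 \, dr = 1$, which the segment-wise sum reproduces exactly.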
13. Using adaptive distances
The hypothesis is that each sub-trajectory could have a different importance in the clustering process.
Adaptive distances (or weighted distances) [3]:
• We can consider a system of weights for each sub-trajectory, reflecting its importance in the clustering process.
• Using adaptive distances in a k-means algorithm, as suggested in [3], we extend the k-means algorithm for trajectories such that the system of weights is the solution of the minimization of the criterion function (or cost function) of the k-means algorithm.
• We propose a global weighting system (a weight for each sub-trajectory) and a cluster-wise one (a weight for each cluster). Note that the weights are computed by the algorithm, not provided by the user.
15. The algorithm: Initialization
0. Input: a dataset $T$ of normalized and registered trajectories, cut at some predefined normalized time-stamps, and a predefined number $K$ of clusters.
1. Initialization: set $t = 0$.
 1.1 Centers selection: randomly select $K$ trajectories and store them in $G^{(0)}$.
 1.2 Fix initial weights: fix $\Lambda^{(0)} = 1$.
 1.3 Assign: assign data to clusters according to a minimum-distance criterion, generating the initial partition of trajectories $\mathcal{P}^{(0)}$.
 1.4 Compute initial criterion: compute $W^{(0)}_{AG}$ (SKADAG) or $W^{(0)}_{AL}$ (SKADAL).
16. The algorithm: Iterative optimization
2. Repeat: set $t = t + 1$.
 2.1 Centers selection: with $\mathcal{P}^{(t-1)}$ and $\Lambda^{(t-1)}$ fixed, compute the average trajectory for each cluster and store them in $G^{(t)}$.
 2.2 Compute weights: with $\mathcal{P}^{(t-1)}$ and $G^{(t)}$ fixed, compute $\Lambda^{(t)}$ according to the constrained minimization of $W^{(t)}_{AG}$ (SKADAG) or $W^{(t)}_{AL}$ (SKADAL), using the Lagrange multiplier method.
 2.3 Assign: with $G^{(t)}$ and $\Lambda^{(t)}$ fixed, assign trajectories to clusters according to a minimum-distance criterion w.r.t. the average trajectories, and store the partition of trajectories in $\mathcal{P}^{(t)}$.
 2.4 Compute the new criterion: compute $W^{(t)}_{AG}$ (SKADAG) or $W^{(t)}_{AL}$ (SKADAL).
 2.5 Verify the stopping rule: if $W^{(t)}_{AG} < W^{(t-1)}_{AG}$ (SKADAG) or $W^{(t)}_{AL} < W^{(t-1)}_{AL}$ (SKADAL), then go to 2., else go to 3.
3. Return solution: return $\mathcal{P}^{(t)}$, $G^{(t)}$, $\Lambda^{(t)}$.
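As a rough sketch of the iterative scheme above (ours, and deliberately simplified: generic feature blocks stand in for sub-trajectories, piece distances are plain squared Euclidean, initialization is deterministic, and the weight update uses the product-to-one constraint of Diday and Govaert [3]; the exact SKADAG/SKADAL criteria are in the paper), a cluster-wise adaptive-distance k-means might look like:

```python
import numpy as np

def adaptive_kmeans(X, K, n_iter=20):
    """Cluster-wise adaptive-distance k-means sketch.
    X: (N, P, D) array -- N objects split into P pieces of dimension D
    (for trajectories, each piece would hold one sub-trajectory).
    Returns labels, centers G, and cluster-wise weights Lam
    (one weight per piece per cluster, with prod(Lam[k]) = 1)."""
    N, P, _ = X.shape
    G = X[:K].copy()            # deterministic init for the sketch
    Lam = np.ones((K, P))       # step 1.2: initial weights fixed to 1
    labels = np.zeros(N, dtype=int)
    for _ in range(n_iter):
        # piece-wise squared distances to every center: shape (N, K, P)
        d = ((X[:, None] - G[None]) ** 2).sum(-1)
        # assign step: minimum weighted distance
        labels = (d * Lam[None]).sum(-1).argmin(1)
        # centers step: average object per cluster
        for k in range(K):
            if (labels == k).any():
                G[k] = X[labels == k].mean(0)
        # weights step: per-piece dispersion Delta in each cluster; the
        # Lagrangian solution under prod(Lam[k]) = 1 is
        # Lam[k, p] = (prod_q Delta[k, q])**(1/P) / Delta[k, p]
        d = ((X[:, None] - G[None]) ** 2).sum(-1)
        for k in range(K):
            Delta = d[labels == k, k].sum(0) + 1e-12
            Lam[k] = np.prod(Delta) ** (1.0 / P) / Delta
    return labels, G, Lam
```

Pieces with small within-cluster dispersion receive large weights, so the discriminant sub-trajectories dominate the assignment, which is the intended effect of the adaptive distance.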
18. Two datasets
We apply the algorithm to two datasets.³
CROSS (road) dataset: CROSS is a dataset of 1,900 trajectories of vehicles approaching a crossroad. The trajectories are labeled into 19 different types.
LABOMNI: a dataset describing $K = 15$ sets of trajectories of 209 people in a laboratory.
³Available from http://cvrr.ucsd.edu/LISA/Datasets/TrajectoryClustering/CVRR_dataset_trajectory_clustering.zip
19. Main results: external validity indices
Table 2: CROSS and LABOMNI datasets. ARI = Adjusted Rand Index, PUR = Purity, NMI = Normalized Mutual Information.

                    CROSS (N=1,900, K=19)            LABOMNI (N=209, K=15)
Methods             ARI     PUR     NMI              ARI     PUR     NMI
K-means             0.8163  0.8389  0.9405           0.6715  0.8373  0.8248
                    cuts (0.15, 0.85)                cuts (0.005, 0.15, 0.85, 0.995)
K-means pieces      0.8210  0.8411  0.9443           0.8772  0.9234  0.9118
SKADAG              0.8192  0.8400  0.9426           0.8930  0.9330  0.9230
SKADAL              0.8200  0.8405  0.9433           0.8273  0.9139  0.8998

Note: algorithms based on sub-trajectories perform slightly better on CROSS and significantly better on LABOMNI. According to [7], CROSS has less complex trajectories than LABOMNI: CROSS contains more regular trajectories, since they show car behavior at a crossroad, while LABOMNI consists of trajectories of people walking almost freely in a laboratory.
20. LABOMNI clustering results i
In the following slides, we focus on the LABOMNI dataset.
• For each ground-truth class (top-left), we show the closest cluster (bottom-left).
• On the right, we report the respective variance function, namely
$$Var_k(r) = \frac{1}{N_k} \sum_{j \in C_k} \left[ T_j(r) - \bar{T}_k(r) \right]^2$$
For comparing results, we plot the square root of this function.
• We report the log of the relevance weights (which sum to zero because of the log transformation).
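On a common grid, the variance function reduces to a pointwise variance of the member curves around the cluster average; a minimal sketch (ours, for one coordinate, with a hypothetical array layout):

```python
import numpy as np

def variance_function(curves):
    """Pointwise variance of a cluster of curves sampled on a common grid.
    curves: (N_k, L) array, one row per member trajectory coordinate.
    Returns Var_k(r) evaluated at the L grid points."""
    mean_curve = curves.mean(axis=0)              # the average trajectory
    return ((curves - mean_curve) ** 2).mean(axis=0)
```

The slides then plot `np.sqrt(variance_function(...))` for comparison.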
46. Comments on LABOMNI
• We see that some clusters are strange! This is due to misaligned trajectories.
• While alignment is solvable in a hierarchical clustering approach (for example, distance matrices can be computed using warping), in a k-means approach it is less feasible.
47. Conclusions
• In this work
 – We presented a new k-means clustering algorithm for trajectories.
 – We observed that using sub-trajectories improves clustering results.
• In perspective
 – How many cuts, and where to cut? A possible solution is to cut everywhere, i.e., we are developing and testing a new algorithm where the cuts are continuous.
 – Alignment of trajectories in a k-means framework: recently, some proposals have been introduced for time-series.
 – Relaxing hypotheses without losing interpretability and performance.
48. References i
[1] Demšar, U., Buchin, K., Cagnacci, F., Safi, K., Speckmann, B., Van de Weghe, N., Weiskopf, D., Weibel, R.: Analysis and visualisation of movement: an interdisciplinary review. Movement Ecology, 3:5 (2015)
[2] Diday, E.: The dynamic clusters method in nonhierarchical
clustering. International Journal of Computer and Information
Sciences 2: 61 (1973) doi: 10.1007/BF00987153
[3] Diday, E. and Govaert, G.: Classification Automatique avec
Distances Adaptatives. R.A.I.R.O. Informatique Computer Science,
11 (4), 329-349 (1977)
49. References ii
[4] Ferreira, N., Klosowski, J. T., Scheidegger, C. E., Silva, C. T.: Vector Field k-Means: Clustering Trajectories by Fitting Multiple Vector Fields. Computer Graphics Forum, 32: 201-210 (2013) doi: 10.1111/cgf.12107
[5] Bian, J., Tian, D., Tang, Y., Tao, D.: A review of moving object trajectory clustering algorithms. Artificial Intelligence Review (2016) doi: 10.1007/s10462-016-9477-7
[6] Lee, J., Han, J., Whang, K.: Trajectory clustering: a
partition-and-group framework. Proceedings of the 2007 ACM
SIGMOD international conference on Management of data, pp.
593-604 (2007)
50. References iii
[7] Morris, B. T., Trivedi, M. M.: Learning Trajectory Patterns by Clustering: Experimental Studies and Comparative Evaluation. In: Proc. IEEE Inter. Conf. on Computer Vision and Pattern Recognition, Miami, Florida (2009)
[8] Sangalli, L.M., Secchi, P., Vantini, S., Vitelli, V.: K-mean alignment for curve clustering. Computational Statistics & Data Analysis, 54(5), 1219–1233 (2010)