Large amounts of mobility data are being generated from many different sources, and several data mining methods have been proposed for this data. One of the most critical steps for trajectory data mining is segmentation.
This task can be seen as a pre-processing step in which a trajectory is divided into several meaningful consecutive sub-sequences. This process is necessary because trajectory patterns may hold not over the entire trajectory but only on parts of it.
In this work we propose a supervised trajectory segmentation algorithm, called Wise Sliding Window Segmentation (WS-II).
It processes the trajectory coordinates to find behavioral changes in space and time, generating an error signal that is further used to train a binary classifier for segmenting trajectory data.
This algorithm is flexible and can be used in different domains. We evaluate our method over three real datasets from different domains (meteorology, fishing, and individuals' movements), and compare it with four other trajectory segmentation algorithms: OWS, GRASP-UTS, CB-SMoT, and SPD.
We observed that the proposed algorithm achieves the highest performance for all datasets with statistically significant differences in terms of the harmonic mean of purity and coverage.
Wise Sliding Window Segmentation: A classification-aided approach for trajectory segmentation
1. WISE SLIDING WINDOW SEGMENTATION: A CLASSIFICATION-AIDED APPROACH FOR TRAJECTORY SEGMENTATION
MOHAMMAD ETEMAD, ZAHRA ETEMAD, AMILCAR SOARES, VANIA BOGORNY, STAN MATWIN, LUIS TORGO
Canadian AI 2020: etemad@dal.ca
4. PROBLEM DEFINITION
GIVEN A RAW TRAJECTORY $\tau_o$, WE WOULD LIKE TO GENERATE A SEQUENCE OF SEGMENTS
$S = \langle s_0^o, \ldots, s_k^o \rangle$
EACH $s_i^o$ SATISFIES A CERTAIN HOMOGENEITY CRITERION FOR A GIVEN APPLICATION DOMAIN.
5. PROBLEM DEFINITION
TO EVALUATE THE PERFORMANCE OF THE GENERATED S, WE RELY ON THE KNOWLEDGE OF AN EXPERT USER TO PROVIDE A SET OF SEMANTIC TUPLES $(sid, label)$
• $sid$ identifies a segment $s_i$ of a trajectory, generated by the expert user
• $label$ is a semantic label attached by the expert to $s_i$
• Examples:
  • A transportation mode
  • Status of fishing or non-fishing
19. SMALL AIS DATA
COLLECTED BY AUTHORS. AVAILABLE AT [1] https://github.com/metemaad/
Number of trajectory points: 513012
Number of vessels: 10
MMSIs: 316027034, 316030538, 316032, 316036216, 316038739, 316250000, 316278000, 316302000, 319030600, 319035600
24. CONCLUSIONS
WS-II is a supervised trajectory segmentation method
Majority voting contributes to the robustness of the proposed method
Strengths of WS-II
• Higher performance
• Works better on long segments
• Supervised approach
Weaknesses of WS-II
• Supervised approach
• Low performance on short trajectories
25. RESOURCES
• [1] SOARES, AMÍLCAR, ET AL. "GRASP-UTS: AN ALGORITHM FOR UNSUPERVISED TRAJECTORY SEGMENTATION." INTERNATIONAL JOURNAL OF GEOGRAPHICAL INFORMATION SCIENCE 29.1 (2015): 46-68.
• [2] ETEMAD, MOHAMMAD, ET AL. "A TRAJECTORY SEGMENTATION ALGORITHM BASED ON INTERPOLATION-BASED CHANGE DETECTION STRATEGIES." EDBT/ICDT WORKSHOPS. 2019.
• [3] HTTPS://GITHUB.COM/METEMAAD/WS-II
• [4] PALMA, ANDREY TIETBOHL, ET AL. "A CLUSTERING-BASED APPROACH FOR DISCOVERING INTERESTING PLACES IN TRAJECTORIES." PROCEEDINGS OF THE 2008 ACM SYMPOSIUM ON APPLIED COMPUTING. 2008.
• [5] FENG, SHANSHAN, ET AL. "POI2VEC: GEOGRAPHICAL LATENT REPRESENTATION FOR PREDICTING FUTURE VISITORS." THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE. 2017.
• [6] ZHENG, YU, LIZHU ZHANG, XING XIE, AND WEI-YING MA. "MINING INTERESTING LOCATIONS AND TRAVEL SEQUENCES FROM GPS TRAJECTORIES." PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW 2009), MADRID, SPAIN. ACM PRESS: 791-800.
26. TRAJECTORY POINT
A TRAJECTORY POINT, $l_i^o$, IS THE LOCATION OF OBJECT $o$ AT TIME $i$, AND IS DEFINED AS
$l_i^o = \langle x_i^o, y_i^o \rangle$
• $x_i^o$ is the longitude of the location, which varies from 0° to ±180°
• $y_i^o$ is the latitude, which varies from 0° to ±90°
27. RAW TRAJECTORY
A RAW TRAJECTORY, OR SIMPLY TRAJECTORY, IS A TIME-ORDERED SEQUENCE OF TRAJECTORY POINTS OF SOME MOVING OBJECT $o$
$\tau_o = \langle l_1^o, l_2^o, \ldots, l_n^o \rangle$
[Figure: a trajectory drawn as the points $l_1^o$ through $l_{19}^o$]
I am Mohammad Etemad, a Ph.D. candidate at Dalhousie University. I am going to present a classification-aided approach for trajectory segmentation called wise sliding window segmentation, WS-II.
First, I talk briefly about trajectory segmentation applications.
Then, we formally describe the problem of trajectory segmentation.
Then I discuss some available solutions for trajectory segmentation.
After that, I present our approach to solving this problem, WS-II.
At the end, we explain our experimental results and provide some conclusions on our approach.
Trajectory segmentation is a preprocessing approach that can be used in a variety of applications.
It can be applied to detect fishing or non-fishing trajectories.
It can be used to detect animal migration and to support behavior analysis.
In tourism, it can help identify points of interest and visit patterns.
In traffic dynamics, trajectory segmentation can be applied to find segments with different transportation modes, such as walking, running, bike, bus, or car.
In the vessel navigation domain, this task can facilitate identifying patterns in vessel movement, finding abnormal movements, and planning an efficient path.
We formally define trajectory segmentation as dividing an ordered set of trajectory points, called a raw trajectory, into ordered subsets of trajectory points so that each subset satisfies specific criteria.
We evaluate the performance of the trajectory segmentation task by using knowledge of an expert represented in two variables, called sid and label.
Sid is the segment identifier assigned by an expert, and label is the semantic knowledge about that segment, such as transportation mode or fishing/non-fishing.
There are a few well-known solutions available for trajectory segmentation. The very first solution is based on finding the stops and moves of a trajectory. This solution derives from the idea that there is a stop segment between every two moving segments.
In this case, the moving object moves in the vicinity of a central point during a predefined time threshold.
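To make the stop criterion concrete, here is a minimal sketch of a stay-point style detector. It is not the SPD baseline used in the talk: the function name, the planar `(x, y, t)` point format, and the use of the first point of a run as the "central point" are all assumptions made for illustration.

```python
from math import hypot

def stay_segments(points, dist_threshold, time_threshold):
    """Return inclusive (start, end) index pairs of stop segments.

    points: list of (x, y, t) tuples in planar units and seconds
    (hypothetical format; real data would need a geodesic distance).
    """
    stops = []
    i, n = 0, len(points)
    while i < n:
        xi, yi, ti = points[i]
        j = i + 1
        # Extend the run while the object stays close to the anchor point i.
        while j < n and hypot(points[j][0] - xi, points[j][1] - yi) <= dist_threshold:
            j += 1
        # points[i..j-1] all lie within dist_threshold of the anchor.
        if points[j - 1][2] - ti >= time_threshold:
            stops.append((i, j - 1))
            i = j
        else:
            i += 1
    return stops
```

Everything between two detected stops would then be treated as a move segment.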
A more sophisticated approach to this problem is CB-SMoT, in which we find stop segments using the speed of moving objects and utilizing a density-based clustering approach.
GRASP-UTS is a more advanced approach that uses the minimum description length (MDL) principle to find homogeneous segments.
OWS is a recent approach based on utilizing a sliding window that uses an interpolation technique as its core to generate an error signal.
This error signal is a proxy to indicate behavioral changes of the moving object.
Here we propose WS-II, which consists of five major steps.
At first, we generate an error signal, which is the core part of the OWS algorithm.
Then we generate samples of error signal segments that are labeled by an expert.
Using the generated samples, we train a binary classifier to predict if an error signal segment includes a partitioning position or not.
Since we use a sliding window of size n, we have n predictions for each trajectory point. Using a majority vote, we decide whether the potential partitioning position is an actual partitioning position.
Then we select the partitioning position and produce our segments.
This illustration shows the steps of WS-II as a big picture. The approach works with two types of trajectory data:
one that is labeled by an expert and another that has no labels.
Therefore, this approach relies on the quality and amount of available labeled trajectory data. The error signal is generated with the same function for labeled and non-labeled data; for non-labeled data, of course, no label is associated with each sample.
Using the labeled samples, we train a binary classifier, and we use it to predict the label of each non-labeled sample.
Then we apply a majority vote to decide which potential partitioning position is an actual partitioning position.
As explained in detail in reference [2], the core of OWS is an error signal generated from the distance between the interpolated midpoint and the actual midpoint of a sliding window moved over a trajectory.
Here we can see how one error value is produced for the green sliding window: it is the distance between the actual midpoint, l4, and IC, the interpolated midpoint.
IC is positioned using two extrapolated points, LB and LF, obtained by extrapolating backward and forward. Extrapolating in both directions helps to smooth the error signal and provides robustness against some kinds of GPS noise, such as jump noise and gaps.
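The construction of one error value can be sketched as follows. This is a simplified illustration, not the OWS implementation from reference [2]: it assumes planar coordinates, uniformly sampled points, plain linear extrapolation, and IC taken as the midpoint of LF and LB. The function names are hypothetical.

```python
from math import hypot

def window_error(window):
    """Distance between the actual midpoint and the interpolated midpoint IC."""
    n = len(window)
    if n < 5 or n % 2 == 0:
        raise ValueError("window size must be odd and at least 5")
    m = n // 2  # index of the actual midpoint (l4 when n == 7)

    def extrapolate(first, last, steps_between):
        # Continue the average step from `first` to `last` one step past `last`.
        vx = (last[0] - first[0]) / steps_between
        vy = (last[1] - first[1]) / steps_between
        return (last[0] + vx, last[1] + vy)

    lf = extrapolate(window[0], window[m - 1], m - 1)   # forward, from the first half
    lb = extrapolate(window[-1], window[m + 1], m - 1)  # backward, from the second half
    ic = ((lf[0] + lb[0]) / 2, (lf[1] + lb[1]) / 2)
    return hypot(window[m][0] - ic[0], window[m][1] - ic[1])

def error_signal(points, window_size=7):
    """One error value per sliding-window position over the trajectory."""
    return [window_error(points[i:i + window_size])
            for i in range(len(points) - window_size + 1)]
```

On a straight, evenly sampled path every error value is zero; a change of direction or speed inside either half of the window pushes IC away from the actual midpoint and raises the error.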
After generating labeled samples, we pass them to a classifier that learns to predict whether a sample contains a partitioning position. We are not interested in finding the best binary classifier, because we use a majority vote in our approach.
All we need is a binary classifier that works better than a dummy classifier. As the sliding window gets longer, more predictions take part in the final decision, so we can worry less about the accuracy of any single prediction. We select a random forest with a limited number of estimators and features to increase the number of estimators participating in the final decision.
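The intuition that many modest predictions can add up to a reliable decision can be checked with a small calculation. The sketch below assumes that the n votes are independent and each correct with the same probability, which overlapping sliding windows do not strictly satisfy, so the numbers are only indicative. The function name is hypothetical.

```python
from math import comb

def majority_correct_probability(n_votes, p_correct, degree=0.5):
    """Probability that more than degree * n_votes independent votes are correct."""
    needed = int(n_votes * degree) + 1  # strictly more than degree * n_votes
    return sum(
        comb(n_votes, k) * p_correct ** k * (1 - p_correct) ** (n_votes - k)
        for k in range(needed, n_votes + 1)
    )

# A classifier that is right 60% of the time, voted over a window of 7.
print(round(majority_correct_probability(7, 0.6), 6))  # 0.710208
```

Under these assumptions the majority over 7 votes is right about 71% of the time, and the figure keeps rising as the window grows.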
We create labeled samples using the error signal generated for labeled data.
Therefore, for each point, we know whether it is a partitioning position or not.
Here we select a window size of 7 as an example.
So every seven consecutive error values, e1 to e7, form a sample, and the label is assigned based on whether the sample includes a partitioning position.
Zero is used when the sample does not include a partitioning position, and one is used when it does. To generate the next sample, we move our sliding window by one and repeat the above process.
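The sample construction described here can be sketched in a few lines. The function name and the representation of expert knowledge as a 0/1 flag per error value are assumptions made for illustration.

```python
def make_samples(errors, partition_flags, window_size=7):
    """Build (features, label) pairs from an error signal.

    errors: error value per position.
    partition_flags: 1 where the expert marked a partitioning position,
    else 0, aligned with `errors` (hypothetical representation).
    """
    samples = []
    for start in range(len(errors) - window_size + 1):
        end = start + window_size
        features = errors[start:end]                       # e1 .. e7 when window_size == 7
        label = 1 if any(partition_flags[start:end]) else 0
        samples.append((features, label))
    return samples
```

For non-labeled data the same windows would be built without the label, which the classifier predicts instead.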
In the case of non-labeled data, we do not care about the label of each error signal segment because our binary classifier will predict it.
For each midpoint of a non-labeled sample, we predict whether the sample includes a partitioning position. Because each sliding window has n trajectory points, we have n predictions for each window. The majority vote for each window produces the final decision on whether the window includes a partitioning position. If it does, the partitioning position is the midpoint of the window.
Here we explain the majority vote in more detail. The decision to select the midpoint of a sliding window as a partitioning position derives from the predictions of all samples that include the midpoint. Therefore, there are n predictions for a sliding window of size n. If the number of affirmative votes is more than n/2 (or, in general, more than the degree of majority vote times the window size), we select that midpoint as a partitioning position.
The BCLS in this table is the result of predictions of the binary classifier, and the MCLS is the majority vote for that point.
We use a parameter called degree of majority vote that indicates how many positive votes are required to accept a potential partitioning position as an actual partitioning position.
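A minimal sketch of the vote, under the assumption that every window prediction counts as one vote for each position the window covers. The function name and the list layout are hypothetical; positions near the ends of the signal are covered by fewer than n windows but are still compared against the same threshold.

```python
def majority_vote(window_predictions, window_size, degree=0.5):
    """Turn per-window predictions (BCLS) into per-position decisions (MCLS).

    window_predictions[k] is the 0/1 prediction for the window that covers
    positions k .. k + window_size - 1.
    """
    n_positions = len(window_predictions) + window_size - 1
    votes = [0] * n_positions
    for start, prediction in enumerate(window_predictions):
        if prediction:
            for pos in range(start, start + window_size):
                votes[pos] += 1
    threshold = degree * window_size  # a position needs strictly more votes than this
    return [1 if v > threshold else 0 for v in votes]

# Window size 3: three consecutive positive windows all contain position 3.
print(majority_vote([0, 1, 1, 1, 0], 3))        # [0, 0, 1, 1, 1, 0, 0]
print(majority_vote([0, 1, 1, 1, 0], 3, 0.9))   # [0, 0, 0, 1, 0, 0, 0]
```

Raising the degree of majority vote narrows the accepted positions to those supported by almost every window that covers them.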
When we decide which sliding window includes the actual partitioning position, we select the midpoint of that sliding window as the partitioning position, which is the last point of a segment. The next point is the start of the next segment.
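The cut itself is simple once the partitioning positions are known. The sketch below assumes the positions arrive as a 0/1 flag per point and have already been reduced to isolated positions; the function name is hypothetical.

```python
def split_at_partitions(points, partition_flags):
    """Split a trajectory so that each flagged point ends a segment."""
    segments, current = [], []
    for point, is_partition in zip(points, partition_flags):
        current.append(point)
        if is_partition:
            # The partitioning position is the last point of this segment;
            # the next point starts a new one.
            segments.append(current)
            current = []
    if current:
        segments.append(current)  # trailing points after the last partition
    return segments

print(split_at_partitions(list(range(6)), [0, 0, 1, 0, 1, 0]))
# [[0, 1, 2], [3, 4], [5]]
```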
We implemented stay point detection (SPD), CB-SMoT, OWS, and GRASP-UTS as baselines to compare WS-II against, since they are the available solutions for this problem.
We employed four datasets in this research. The fishing, hurricane, and Geolife datasets were used to compare our proposed model with other algorithms. We also collected and produced a dataset called the Small AIS dataset, which includes the AIS data of ten different vessels moving in the Halifax harbor, to test our algorithm.
We did not use the Small AIS dataset for comparing the algorithms because it was used for debugging and testing our algorithm.
This proprietary dataset includes 5190 trajectory points and 153 segments and was introduced in the GRASP-UTS evaluation research. We clean this dataset by removing short segments, that is, segments shorter than the sliding window.
This public dataset includes 1990 trajectory points and 182 segments. The segments are created based on wind speed and hurricane category and were introduced in the GRASP-UTS research.
This public dataset is a subset of the Geolife dataset, which was introduced by Microsoft Research; the subset includes 32,095 trajectory points and 304 segments. We select a subset because GRASP-UTS took an unreasonably long time to process larger subsets and was not able to produce results for the whole dataset.
We collected this dataset using an AIS antenna and a Raspberry Pi.
The data is shared with AISHub and is available at our GitHub address. We labeled some parts of this dataset using geographical features of the harbor.
This dataset is used in our debugging and testing of the proposed algorithm. Therefore we did not use it for our evaluations.
There are two major parameters for the WS-II algorithm: first, the size of the sliding window, and second, the degree of majority vote. We select five window sizes and two degrees of majority vote to tune our algorithm.
Increasing the degree of majority vote gives more robust results, and the results improve further as the window size increases.
For the hurricane dataset, the higher degree of majority vote provides higher quality segments. However, increasing the window size did not increase the performance. We think this is because of the sampling rate of this dataset, which is six hours.
For the Geolife dataset, shorter window sizes with a higher degree of majority vote provided more robust results.
To compare the performance of our proposed method, we applied the harmonic mean of purity and coverage. Purity and coverage were introduced in the GRASP-UTS research as performance measures. We combine them because the two measures are orthogonal, provided we assume that no two adjacent segments have the same label. We also evaluated purity and coverage independently; those results are not reported in the paper but are available on our GitHub page.
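The sketch below shows one way to compute the combined measure. The definitions are assumptions based on the usual reading of the GRASP-UTS measures, not the evaluation code used for the talk: purity averages, over predicted segments, the share of points that belong to the dominant true segment, and coverage is the same quantity with the roles of true and predicted segments swapped.

```python
from collections import Counter

def purity(true_ids, predicted_ids):
    """Average share of each predicted segment taken by its dominant true segment."""
    groups = {}
    for t, p in zip(true_ids, predicted_ids):
        groups.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) / sum(c.values()) for c in groups.values()) / len(groups)

def coverage(true_ids, predicted_ids):
    """Average share of each true segment covered by its best predicted segment."""
    return purity(predicted_ids, true_ids)

def harmonic_mean(p, c):
    return 2 * p * c / (p + c) if p + c else 0.0

# One segment id per trajectory point; the predicted cut is one point early.
true_ids = [1, 1, 1, 1, 2, 2, 2, 2]
pred_ids = [1, 1, 1, 2, 2, 2, 2, 2]
p, c = purity(true_ids, pred_ids), coverage(true_ids, pred_ids)
print(p, c)  # 0.9 0.875
```

Cutting a trajectory into many tiny segments drives purity up and coverage down, and merging everything does the opposite, so the harmonic mean rewards a segmentation only when both are high.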
As we can see, the proposed method provides higher performance than the other methods on all datasets in our experiments.
In summary, to our knowledge, WS-II is the first supervised method for trajectory segmentation. It benefits from the majority vote to make more sound decisions on the placement of partitioning position and segmenting our trajectories.
Although this method has some strengths, such as its high performance and its use of labeled data to learn and work in different domains, there are some weaknesses as well.
First, the assumption of having access to labeled data is a limitation of this algorithm. Second, the algorithm cannot produce reasonably good performance if the segments are shorter than the length of the sliding window.
These are our primary resources in this presentation.
There is a more comprehensive list of resources available in our paper.