Anna Monreale, Fabio Pinelli, Roberto Trasarti, Fosca Giannotti. WhereNext: a Location Predictor on Trajectory Pattern Mining. KDD 2009. Knowledge Discovery and Delivery Lab (ISTI-CNR & Univ. Pisa), www-kdd.isti.cnr.it
Wireless network infrastructures are the nerves of our territory: besides offering their services, they gather highly informative traces about human mobile activities. Miniaturization, wearability and pervasiveness will produce traces of increasing positioning accuracy and semantic richness.
From the analysis of the traces of our mobile phones, it is possible to reconstruct our mobile behaviour, i.e. the way we collectively move. This knowledge may help us improve decision-making in many mobility-related issues: planning traffic and public mobility systems in metropolitan areas; planning physical communication networks; forecasting traffic-related phenomena; organizing logistics systems; prediction.
 
Predicting the next location of a trajectory can improve a large set of services, such as navigational services, traffic management, location-based advertising, service pre-fetching and simulation. (Figure: candidate next locations marked "?", labelled .4, .8 and .35.)
How to realize this idea: extract patterns from all the available movements in a certain area, instead of from the individual history of an object; use these local movement patterns as predictive rules; build a prediction tree as a global model. (Figure: Trajectory dataset → Local patterns → Prediction Tree.)
Process: select the set of interesting trajectories; extract T-Patterns (a set of local models); merge the T-Patterns (global model); use the condensed model as a predictor. (Figure labels: Validation, Evaluation.)
The local pattern we use is the T-Pattern. It describes the common behavior of a group of users in space and time. F. Giannotti, M. Nanni, F. Pinelli, and D. Pedreschi. Trajectory pattern mining. KDD 2007: 330-339.
Generating all rules from each T-Pattern and using them to build a classifier is too expensive. (Figure: a T-Pattern over regions R1, R2, R3, R4 with transition times α1, α2, α3, and the rules derived from it.)
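To make the cost concrete, the sketch below enumerates the rules of a single T-Pattern. It assumes, as the figure suggests, that each rule splits the pattern into a prefix (antecedent) and the remaining suffix (consequent), so a pattern with n regions yields n-1 rules and a pattern set yields the sum of these counts. The `TPattern` and `Rule` classes and both functions are hypothetical illustrations, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[float, float]  # transition time bounds [a, b]


@dataclass(frozen=True)
class TPattern:
    regions: Tuple[str, ...]         # R1, ..., Rn (region identifiers)
    intervals: Tuple[Interval, ...]  # alpha_1, ..., alpha_{n-1}
    support: int

    def __post_init__(self) -> None:
        if len(self.intervals) != len(self.regions) - 1:
            raise ValueError("a T-Pattern with n regions needs n-1 intervals")


@dataclass(frozen=True)
class Rule:
    prefix: Tuple[str, ...]                 # antecedent regions
    prefix_intervals: Tuple[Interval, ...]
    suffix: Tuple[str, ...]                 # consequent regions
    suffix_intervals: Tuple[Interval, ...]  # the first one links prefix to suffix
    support: int


def rules_from_pattern(p: TPattern) -> List[Rule]:
    """Hypothetical rule generation: one rule per prefix/suffix split."""
    rules = []
    for k in range(1, len(p.regions)):
        rules.append(Rule(
            prefix=p.regions[:k],
            prefix_intervals=p.intervals[:k - 1],
            suffix=p.regions[k:],
            suffix_intervals=p.intervals[k - 1:],
            support=p.support,
        ))
    return rules


def rule_count(patterns: List[TPattern]) -> int:
    """Total number of rules a rule-based classifier would have to store and match."""
    return sum(len(p.regions) - 1 for p in patterns)


p = TPattern(("R1", "R2", "R3", "R4"), ((5, 10), (8, 12), (3, 6)), support=42)
for r in rules_from_pattern(p):
    print(r.prefix, "->", r.suffix)
```

Every rule repeats the regions and intervals it shares with the others; a prefix tree stores each shared prefix only once.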
To avoid rule generation, the T-Pattern set is organized as a prefix tree.
For each node v:
• Id identifies the node v
• Region is a spatial component of the T-Pattern
• Support is the support of the T-Pattern
For each edge j:
• [a,b] corresponds to the time interval α_n of the T-Pattern
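A minimal sketch of such a prefix tree follows. It assumes each T-Pattern is given as a (regions, intervals, support) triple, that two patterns share a node only when both the region and the edge interval [a,b] coincide, and that the node where a pattern ends stores that pattern's support. The `Node` class, `build_prefix_tree` and the placeholder interval on edges leaving the root are illustrative choices, not taken from the paper.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterable, Optional, Sequence, Tuple

Interval = Tuple[float, float]
ROOT_EDGE: Interval = (0.0, 0.0)  # placeholder: edges leaving the root carry no real interval


@dataclass
class Node:
    id: int
    region: Optional[str]  # spatial component; None for the root
    support: int = 0       # support of the T-Pattern ending here (0 if none)
    children: Dict[Tuple[Interval, str], "Node"] = field(default_factory=dict)


def build_prefix_tree(
    patterns: Iterable[Tuple[Sequence[str], Sequence[Interval], int]]
) -> Node:
    ids = count()
    root = Node(id=next(ids), region=None)
    for regions, intervals, support in patterns:
        node = root
        # the edge into region j carries alpha_{j-1}; the first region hangs off the root
        for interval, region in zip([ROOT_EDGE, *intervals], regions):
            key = (tuple(interval), region)
            if key not in node.children:
                node.children[key] = Node(id=next(ids), region=region)
            node = node.children[key]
        node.support = max(node.support, support)
    return root


def count_nodes(node: Node) -> int:
    return 1 + sum(count_nodes(c) for c in node.children.values())


patterns = [
    (("A", "B", "C"), ((5, 10), (3, 6)), 30),
    (("A", "B", "D"), ((5, 10), (7, 9)), 20),
    (("A", "E"), ((1, 2),), 15),
]
tree = build_prefix_tree(patterns)
print(count_nodes(tree))  # 6: the root plus A, B, C, D, E (prefix A→B is shared)
```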
Three steps: search for the best match; candidate generation; make predictions. How do we compute the best match? (Figure labels: Best Match, Prediction.)
The spatio-temporal distance is computed between the trajectory segment (bounded in time using the previous transition time) and the current node of the path. Case a: the trajectory segment intersects the region of the node. Case b: the enlarged trajectory segment intersects the region. Case c: the enlarged trajectory segment does not intersect the region. Here th_t is the time tolerance window defined by the user.
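The three cases can be sketched under simplifying assumptions: the trajectory is a list of timestamped points (x, y, t) rather than a continuous segment, the node region is an axis-aligned rectangle, the segment is the set of points whose time lies in [t_prev + a, t_prev + b] (t_prev being the time at which the previous node was matched and [a,b] the edge interval), and the enlarged segment widens that window by th_t on each side. The case-b score 1/(1 + d), with d the time by which the closest matching point falls outside the window, and the score 0 for case c are hypothetical choices; the slide does not give the exact score function.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]        # (x, y, t), t in minutes
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def inside(rect: Rect, x: float, y: float) -> bool:
    xmin, ymin, xmax, ymax = rect
    return xmin <= x <= xmax and ymin <= y <= ymax


def punctual_score(traj: List[Point], t_prev: float, interval: Tuple[float, float],
                   region: Rect, th_t: float) -> float:
    a, b = interval
    lo, hi = t_prev + a, t_prev + b
    # Case a: a point of the time-bounded segment falls in the node's region
    if any(lo <= t <= hi and inside(region, x, y) for x, y, t in traj):
        return 1.0
    # Case b: a point of the segment enlarged by th_t on both sides falls in the region
    gaps = [max(lo - t, t - hi) for x, y, t in traj
            if lo - th_t <= t <= hi + th_t and inside(region, x, y)]
    if gaps:
        d = min(gaps)            # how far outside [lo, hi] the closest match lies
        return 1.0 / (1.0 + d)   # hypothetical decay with the temporal distance
    # Case c: no match even with the tolerance
    return 0.0


traj = [(0, 0, 0), (5, 5, 10), (9, 9, 20)]
region = (4, 4, 6, 6)
print(punctual_score(traj, 0, (8, 12), region, th_t=2))   # case a
print(punctual_score(traj, 0, (12, 15), region, th_t=3))  # case b
print(punctual_score(traj, 0, (12, 15), region, th_t=1))  # case c
```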
The path score is the aggregation of all punctual scores along a path. The Best Match is the path having the maximum path score and at least one admissible prediction. (Figure: example path with times 10 min, 15 min, 8 min, 10 min, 11 min and 16 min; punctual scores 1, .58 and .8; path score .79.)
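In the slide's example, the path score .79 equals the average of the punctual scores 1, .58 and .8, since (1 + .58 + .8) / 3 ≈ .793. The sketch below selects the best match with that average, assuming each candidate path is summarized by its punctual scores and by a hypothetical count of admissible predictions (for instance, the number of children of the path's last node).

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

# (path id, punctual scores along the path, number of admissible predictions)
Candidate = Tuple[str, Sequence[float], int]


def path_score(punctual_scores: Sequence[float]) -> float:
    """Average aggregation of the punctual scores along a path."""
    return mean(punctual_scores)


def best_match(candidates: List[Candidate]) -> Optional[Tuple[str, float]]:
    """Return (path id, path score) of the best path, or None if no path can predict."""
    best: Optional[Tuple[str, float]] = None
    for path_id, scores, n_admissible in candidates:
        if n_admissible == 0:  # a best match must offer at least one admissible prediction
            continue
        score = path_score(scores)
        if best is None or score > best[1]:
            best = (path_id, score)
    return best


candidates = [
    ("p1", [1.0, 0.58, 0.8], 2),
    ("p2", [1.0, 1.0], 0),   # perfect score, but nothing left to predict
    ("p3", [0.5, 0.6], 1),
]
print(best_match(candidates))
```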
Average generalizes the distances between the trajectory and each node. Sum is based on the concept of depth. Max is the optimistic one: the best punctual score is selected as the path score. Context-dependent aggregations can take other aspects of the problem into account.
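The effect of the three aggregation functions can be seen on two made-up paths: a short path with perfect punctual scores and a deeper path with slightly lower ones. Avg and Max prefer the short path, while Sum, which grows with depth, prefers the deeper one. The dictionary keys follow the th_agg values (Avg, Sum, Max); `pick_path` is a hypothetical helper.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

AGGREGATIONS: Dict[str, Callable[[Sequence[float]], float]] = {
    "Avg": mean,  # generalizes the distances between trajectory and nodes
    "Sum": sum,   # rewards longer (deeper) matched paths
    "Max": max,   # optimistic: the best punctual score becomes the path score
}


def pick_path(paths: Dict[str, List[float]], th_agg: str) -> str:
    """Return the name of the path with the highest aggregated score."""
    agg = AGGREGATIONS[th_agg]
    return max(paths, key=lambda name: agg(paths[name]))


paths = {"short": [1.0, 1.0], "deep": [0.8, 0.8, 0.8]}
for th_agg in ("Avg", "Sum", "Max"):
    print(th_agg, pick_path(paths, th_agg))
```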
The WhereNext algorithm can be tuned using its parameters:
- th_t: time window tolerance
- th_s: space window tolerance
- th_score: minimum prediction score threshold
- th_agg: the aggregation function used to compute the path score (Avg, Sum or Max)
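The four parameters can be grouped into a small configuration object. This is a hypothetical sketch: the units (minutes for th_t, metres for th_s), the default value of th_agg and the `keep` helper, which applies th_score by discarding predictions that score below it, are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Prediction = Tuple[str, float]  # (predicted region id, prediction score)


@dataclass(frozen=True)
class WhereNextParams:
    th_t: float          # time window tolerance (assumed minutes)
    th_s: float          # space window tolerance (assumed metres)
    th_score: float      # minimum prediction score
    th_agg: str = "Avg"  # path-score aggregation: "Avg", "Sum" or "Max"

    def __post_init__(self) -> None:
        if self.th_t < 0 or self.th_s < 0:
            raise ValueError("tolerances must be non-negative")
        if self.th_agg not in ("Avg", "Sum", "Max"):
            raise ValueError(f"unknown aggregation: {self.th_agg}")

    def keep(self, predictions: List[Prediction]) -> List[Prediction]:
        """Discard predictions whose score is below th_score."""
        return [(region, s) for region, s in predictions if s >= self.th_score]


params = WhereNextParams(th_t=5, th_s=100, th_score=0.5, th_agg="Max")
print(params.keep([("R7", 0.79), ("R9", 0.3)]))
```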
It is very hard to determine the best set of T-patterns for building our model: a large set of T-patterns makes prediction very slow, while a small set of T-patterns leaves gaps in coverage. For this reason, we have defined a way to measure the prediction power of a T-Pattern set.
An evaluation function is defined to estimate the predictive power of a T-Pattern set. SpatialCoverage: the spatial coverage of the regions contained in the T-Pattern set. DatasetCoverage: how well the T-Pattern set represents the trajectories. RegionSeparation: the precision of the regions in the T-Pattern set. (Figure: Model 1 vs. Model 2, testing the a priori evaluation.)
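The slide does not give formulas for these measures. Purely as an illustration, the sketch below computes two hypothetical proxies on axis-aligned rectangular regions: SpatialCoverage as the fraction of grid cells of the study area whose centre falls in some region, and DatasetCoverage as the fraction of trajectories with at least one point inside some region. RegionSeparation is left out.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Traj = List[Tuple[float, float]]          # sequence of (x, y) points


def _inside(r: Rect, x: float, y: float) -> bool:
    return r[0] <= x <= r[2] and r[1] <= y <= r[3]


def spatial_coverage(regions: List[Rect], area: Rect, cell: float) -> float:
    """Hypothetical proxy: share of grid cells of `area` whose centre lies in a region."""
    xmin, ymin, xmax, ymax = area
    nx = int(round((xmax - xmin) / cell))
    ny = int(round((ymax - ymin) / cell))
    covered = 0
    for i in range(nx):
        for j in range(ny):
            cx = xmin + (i + 0.5) * cell
            cy = ymin + (j + 0.5) * cell
            if any(_inside(r, cx, cy) for r in regions):
                covered += 1
    return covered / (nx * ny)


def dataset_coverage(regions: List[Rect], trajectories: List[Traj]) -> float:
    """Hypothetical proxy: share of trajectories touching at least one region."""
    hit = sum(1 for tr in trajectories
              if any(_inside(r, x, y) for x, y in tr for r in regions))
    return hit / len(trajectories)


regions = [(0, 0, 5, 10)]
print(spatial_coverage(regions, (0, 0, 10, 10), cell=1.0))
print(dataset_coverage(regions, [[(1, 1), (7, 7)], [(8, 8)], [(6, 2), (2, 2)]]))
```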
The results are evaluated using the following measures. Accuracy: the number of correctly predicted locations (in space and time) divided by the total number of trajectories to be predicted. Average Error: the average distance between the real trajectory in the predicted interval and the predicted region. Prediction rate: the number of trajectories that receive a prediction divided by the total number of trajectories to be predicted. (Figure: original trajectory, cut point, predicted location and error.)
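The three measures can be computed from per-trajectory outcomes as below. The `Outcome` record is hypothetical: it assumes that deciding whether a prediction is correct in space and time, and measuring the distance between the real trajectory and the predicted region, happen upstream. The average error is taken over the trajectories that received a prediction.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Outcome:
    predicted: bool         # did the predictor return a region for this trajectory?
    correct: bool           # was the prediction right in space and time?
    error: Optional[float]  # distance between real trajectory and predicted region


def evaluate(outcomes: List[Outcome]) -> Tuple[float, float, float]:
    """Return (accuracy, average error, prediction rate)."""
    total = len(outcomes)  # all trajectories to be predicted
    predicted = [o for o in outcomes if o.predicted]
    accuracy = sum(o.correct for o in predicted) / total
    prediction_rate = len(predicted) / total
    errors = [o.error for o in predicted if o.error is not None]
    avg_error = sum(errors) / len(errors) if errors else math.nan
    return accuracy, avg_error, prediction_rate


outcomes = [
    Outcome(True, True, 0.0),
    Outcome(True, False, 120.0),
    Outcome(False, False, None),  # no prediction for this trajectory
    Outcome(True, True, 30.0),
]
print(evaluate(outcomes))
```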
We used a real-life GPS dataset obtained from 17,000 vehicles in the urban area of Milan. Training set: 4,000 trajectories between 7 am and 10 am on Wednesday. Test set: 500 trajectories between 7 am and 10 am on Thursday.
(Figure panels: Predicted vs. th_score; Average Error vs. th_space.)
(Figure panels: Accuracy vs. Average Error; Single Users: Accuracy and Prediction rate.)
A visual example of the application on Milan mobility data. The context is traffic management: we want to predict how traffic will move in the city center. We built a predictor on a “good” set of T-patterns that include the city gates of Milan. The application is part of the GeoPKDD integrated platform. F. Giannotti, D. Pedreschi, et al. GeoPKDD: Geographic privacy-aware knowledge discovery and delivery (European project), 2008.
- A new technique to predict the next locations of a trajectory based on the previous movements of all objects, without using any information about the users.
- Time information is used not only to order the events but is intrinsic to the T-Patterns used to build the prediction tree.
- The user can tune the method to obtain good accuracy and prediction rate.
- We are experimenting with the method in real-world applications.
 
(Figure: Trajectories dataset → Regions of interest → T-Patterns.)
 
The same exact spatial location (x,y) usually never occurs twice, and the same exact transition times usually do not occur twice either. Solution: allow approximation, with a notion of spatial neighborhood and a notion of temporal tolerance.
Two points match if one falls within a spatial neighborhood N() of the other. Two transition times match if their temporal difference is ≤ τ.
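A minimal sketch of this matching notion, assuming N() is a disc of fixed radius around a point (Euclidean distance) and that two equally long sequences of timestamped points (x, y, t) match when every pair of corresponding points matches and every pair of corresponding transition times differs by at most τ (`tau`). The function names are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, t)


def points_match(p: Tuple[float, float], q: Tuple[float, float], radius: float) -> bool:
    """Assumed N(): q lies in the disc of the given radius around p."""
    return math.dist(p, q) <= radius


def transitions_match(dt1: float, dt2: float, tau: float) -> bool:
    return abs(dt1 - dt2) <= tau


def sequences_match(s1: Sequence[Point], s2: Sequence[Point],
                    radius: float, tau: float) -> bool:
    if len(s1) != len(s2):
        return False
    # every pair of corresponding points must be spatial neighbours
    if not all(points_match(a[:2], b[:2], radius) for a, b in zip(s1, s2)):
        return False
    # every pair of corresponding transition times must agree within tau
    return all(
        transitions_match(s1[i][2] - s1[i - 1][2], s2[i][2] - s2[i - 1][2], tau)
        for i in range(1, len(s1))
    )


s1 = [(0, 0, 0), (100, 0, 10)]
s2 = [(3, 4, 2), (100, 5, 13)]
print(sequences_match(s1, s2, radius=5, tau=2))
```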
T-pattern mining can be mapped to a density estimation problem over $\mathbb{R}^{3n-1}$: 2 dimensions for each (x,y) in the pattern (2n) and 1 dimension for each transition (n-1). The density is computed by mapping each sub-sequence of n points of each input trajectory to $\mathbb{R}^{3n-1}$ and drawing an influence area for each point (a composition of N() and τ). This is too computationally expensive, so heuristics are needed. Our solution: a combination of sequential pattern mining and density-based clustering.
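The mapping to $\mathbb{R}^{3n-1}$ can be sketched as follows: each order-preserving sub-sequence of n points (x, y, t) becomes the vector (x_1, y_1, ..., x_n, y_n, Δt_1, ..., Δt_{n-1}). Enumerating the sub-sequences with `itertools.combinations` shows why the direct approach is expensive: a trajectory of m points has C(m, n) of them. Treating sub-sequences as not necessarily contiguous is an assumption of this sketch, and `to_vector`/`embed` are illustrative names.

```python
from itertools import combinations
from math import comb
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, t)


def to_vector(sub: Sequence[Point]) -> List[float]:
    """Map n points to R^(3n-1): 2n coordinates followed by n-1 transition times."""
    coords = [c for x, y, _ in sub for c in (x, y)]
    transitions = [sub[i][2] - sub[i - 1][2] for i in range(1, len(sub))]
    return coords + transitions


def embed(trajectory: Sequence[Point], n: int) -> List[List[float]]:
    """Vectors for every order-preserving sub-sequence of n points."""
    return [to_vector(sub) for sub in combinations(trajectory, n)]


traj = [(0, 0, 0), (1, 1, 5), (2, 2, 12), (3, 3, 20)]
vectors = embed(traj, 3)
print(len(vectors), len(vectors[0]))  # 4 8: C(4, 3) vectors of dimension 3*3-1
print(comb(100, 3))                   # 161700 sub-sequences for a 100-point trajectory
```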
