Where Next


  • We have found a way
  • We did an experiment
  • Made so far
  • Where Next

    1. 1. Anna Monreale, Fabio Pinelli, Roberto Trasarti, Fosca Giannotti. WhereNext: a Location Predictor on Trajectory Pattern Mining. KDD 2009. Knowledge Discovery and Delivery Lab (ISTI-CNR & Univ. Pisa), www-kdd.isti.cnr.it
    2. 2. <ul><li>Wireless network infrastructures are the nerves of our territory: </li></ul><ul><li>besides offering their services, they gather highly informative traces of human mobile activity. </li></ul><ul><li>Miniaturization, wearability and pervasiveness will produce traces of increasing </li></ul><ul><ul><li>positioning accuracy </li></ul></ul><ul><ul><li>semantic richness </li></ul></ul>
    3. 3. <ul><li>From the analysis of the traces of our mobile phones it is possible to reconstruct our mobile behaviour, the way we collectively move. </li></ul><ul><li>This knowledge may help us improve decision-making in many mobility-related issues: </li></ul><ul><ul><li>Planning traffic and public mobility systems in metropolitan areas </li></ul></ul><ul><ul><li>Planning physical communication networks </li></ul></ul><ul><ul><li>Forecasting traffic-related phenomena </li></ul></ul><ul><ul><li>Organizing logistics systems </li></ul></ul><ul><ul><li>Prediction </li></ul></ul>
    4. 5. <ul><li>Predicting the next location of a trajectory can improve a large set of services, such as: </li></ul><ul><ul><li>Navigational services </li></ul></ul><ul><ul><li>Traffic management </li></ul></ul><ul><ul><li>Location-based advertising </li></ul></ul><ul><ul><li>Service pre-fetching </li></ul></ul><ul><ul><li>Simulation </li></ul></ul>[Figure: unknown next locations (? ? ?) annotated with the values .4, .8 and .35]
    5. 6. <ul><li>How to realize this idea: </li></ul><ul><li>Extract patterns from all the available movements in a certain area, instead of from the individual history of a single object; </li></ul><ul><li>Use these local movement patterns as predictive rules; </li></ul><ul><li>Build a prediction tree as the global model. </li></ul>[Figure: Trajectory dataset → Local patterns → Prediction Tree]
    6. 7. [Figure: workflow. Select the set of interesting trajectories → Extract T-Patterns (a set of local models) → Merge T-Patterns (global model) → Use the condensed model as predictor; with Validation and Evaluation steps]
    7. 8. <ul><li>The local pattern we use is the T-Pattern. It describes the common behavior of a group of users in space and time. </li></ul>F. Giannotti, M. Nanni, F. Pinelli, and D. Pedreschi. Trajectory pattern mining. KDD 2007: 330-339.
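A T-Pattern can be pictured as a small record: an ordered sequence of regions, the transition time between each consecutive pair, and a support. The sketch below is a hypothetical Python representation, not the authors' data structure. It assumes regions are axis-aligned rectangles and stores each transition as an [a,b] interval, matching how the prefix tree labels its edges.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical representation: a region is an axis-aligned rectangle
# (xmin, ymin, xmax, ymax); real T-pattern regions need not be rectangles.
Region = Tuple[float, float, float, float]
Interval = Tuple[float, float]  # transition time interval [a, b], e.g. in minutes


@dataclass(frozen=True)
class TPattern:
    """A T-pattern: n+1 regions visited in order, n transition intervals, a support."""
    regions: Tuple[Region, ...]        # R_0, ..., R_n
    transitions: Tuple[Interval, ...]  # alpha_1, ..., alpha_n
    support: int                       # number of trajectories following the pattern

    def __post_init__(self) -> None:
        # n+1 regions are linked by exactly n transitions.
        if len(self.transitions) != len(self.regions) - 1:
            raise ValueError("n+1 regions require exactly n transition intervals")
        if any(a > b for a, b in self.transitions):
            raise ValueError("each transition interval must satisfy a <= b")
        if self.support < 0:
            raise ValueError("support must be non-negative")

    def __len__(self) -> int:
        """Length of the pattern, counted in regions."""
        return len(self.regions)
```

For example, `TPattern(((0, 0, 1, 1), (2, 2, 3, 3)), ((5, 10),), 42)` describes a group of 42 trajectories moving from the first rectangle to the second in 5 to 10 minutes.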
    8. 9. <ul><li>Generating all the rules from each T-pattern and using them to build a classifier is too expensive. </li></ul>[Figure: a T-Pattern with transition times α1, α2, α3 over regions R1 to R4, and the Rules derived from it]
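To see why rule generation is costly, assume (as a simplification) that each T-pattern R1 … Rn yields one rule per prefix, predicting the next region: R1 → R2, R1 R2 → R3, and so on. The hypothetical helpers `prefix_rules` and `count_rules` below ignore transition times and only count rules. They show that patterns sharing a prefix produce the same antecedents again and again, which is the redundancy a prefix tree avoids.

```python
from typing import List, Sequence, Tuple

Rule = Tuple[Tuple[str, ...], str]  # (antecedent regions, predicted next region)


def prefix_rules(pattern: Sequence[str]) -> List[Rule]:
    """Assumed rule form: every proper prefix of the pattern predicts the next region."""
    return [(tuple(pattern[:k]), pattern[k]) for k in range(1, len(pattern))]


def count_rules(patterns: Sequence[Sequence[str]]) -> Tuple[int, int]:
    """Return (rules generated, distinct antecedents) over a whole pattern set."""
    rules = [rule for p in patterns for rule in prefix_rules(p)]
    antecedents = {antecedent for antecedent, _ in rules}
    return len(rules), len(antecedents)
```

For the two patterns R1 R2 R3 R4 and R1 R2 R5, `count_rules` reports five rules but only three distinct antecedents: the prefixes R1 and R1 R2 are generated twice.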
    9. 10. <ul><li>To avoid rule generation, the T-Pattern set is organized as a prefix tree. </li></ul><ul><li>For each node v: </li></ul><ul><ul><li>Id identifies the node v; </li></ul></ul><ul><ul><li>Region is a spatial component of the T-Pattern; </li></ul></ul><ul><ul><li>Support is the support of the T-Pattern. </li></ul></ul><ul><li>For each edge j: </li></ul><ul><ul><li>[a,b] corresponds to the time interval α_n of the T-Pattern. </li></ul></ul>
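A minimal sketch of the prefix tree follows. It assumes regions are identified by plain string labels and edges are keyed by (interval, region). `Node`, `PredictionTree` and `insert` are illustrative names. The slide does not say how support is assigned to nodes shared by several T-patterns, so this sketch simply stores each pattern's support on the node where that pattern ends.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]            # [a, b] transition interval, e.g. minutes
EdgeKey = Tuple[Optional[Interval], str]  # (interval on the incoming edge, child region)


@dataclass
class Node:
    id: int                    # identifies the node
    region: Optional[str]      # spatial component (None for the root)
    support: int = 0           # support of the T-pattern ending here (assumption)
    children: Dict[EdgeKey, "Node"] = field(default_factory=dict)


class PredictionTree:
    def __init__(self) -> None:
        self._ids = count()
        self.root = Node(next(self._ids), None)

    def insert(self, regions: List[str], intervals: List[Interval], support: int) -> None:
        """Insert one T-pattern; T-patterns sharing a prefix share the same nodes."""
        if len(intervals) != len(regions) - 1:
            raise ValueError("a T-pattern needs one interval per transition")
        # Edges out of the root carry no interval: they only select the first region.
        keys: List[EdgeKey] = [(None, regions[0])] + list(zip(intervals, regions[1:]))
        node = self.root
        for key in keys:
            if key not in node.children:
                node.children[key] = Node(next(self._ids), key[1])
            node = node.children[key]
        node.support = support

    def size(self) -> int:
        """Number of nodes, root included."""
        stack, n = [self.root], 0
        while stack:
            node = stack.pop()
            n += 1
            stack.extend(node.children.values())
        return n
```

Inserting A → B → C and A → B → D with the same first interval creates only five nodes (root included), because the A → B prefix is stored once.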
    10. 11. <ul><li>Three steps: </li></ul><ul><ul><li>Search for the best match </li></ul></ul><ul><ul><li>Candidate generation </li></ul></ul><ul><ul><li>Make predictions </li></ul></ul>[Figure: Best Match and Prediction. How to compute the Best Match?]
    11. 12. <ul><li>The punctual score is based on the spatio-temporal distance computed between the trajectory segment (bounded in time using the previous transition time) and the region of the current node of the path. </li></ul>Case a: the trajectory segment intersects the region of the node. Case b: the enlarged trajectory segment intersects the region. Case c: the enlarged trajectory segment does not intersect the region. Here th_t is the time tolerance window defined by the user.
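The three cases can be sketched as follows. All of the following are hypothetical assumptions: a trajectory is a list of (t, x, y) points, a node region is a rectangle, the segment is the part of the trajectory whose timestamps fall in [t_prev + a, t_prev + b], and "enlarging" the segment means widening that time window by th_t on both sides. The score values (1 for case a, 1/(1+Δt) for case b, 0 for case c) are placeholders chosen for illustration, because the slide's own formula is not recoverable from this text.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]        # (t, x, y); t in minutes
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def inside(rect: Rect, x: float, y: float) -> bool:
    xmin, ymin, xmax, ymax = rect
    return xmin <= x <= xmax and ymin <= y <= ymax


def punctual_case(traj: List[Point], t_prev: float, interval: Tuple[float, float],
                  rect: Rect, th_t: float) -> Tuple[str, float]:
    """Classify the match with one node as case 'a', 'b' or 'c'.

    Returns (case, delta): delta is how far in time, outside the edge
    interval, the closest in-region point lies (0 for case a).
    """
    lo, hi = t_prev + interval[0], t_prev + interval[1]
    best = None
    for t, x, y in traj:
        # Only points inside the enlarged window and inside the region count.
        if not (lo - th_t <= t <= hi + th_t) or not inside(rect, x, y):
            continue
        delta = max(lo - t, 0.0, t - hi)  # 0 when t lies in [lo, hi]
        best = delta if best is None else min(best, delta)
    if best is None:
        return "c", float("inf")
    return ("a", 0.0) if best == 0 else ("b", best)


def punctual_score(case: str, delta: float) -> float:
    # Hypothetical scoring: 1 for a full match, decaying with the time
    # offset for a tolerated match, 0 when the region is missed.
    if case == "a":
        return 1.0
    if case == "b":
        return 1.0 / (1.0 + delta)
    return 0.0
```

With an edge interval of [10, 15] minutes, a point reaching the region at minute 5 is case c when th_t = 3 but case b (Δt = 5) when th_t = 6, which shows how th_t trades strictness for recall.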
    12. 13. <ul><li>The path score is the aggregation of all the punctual scores along a path. </li></ul><ul><li>The Best Match is the path having: </li></ul><ul><ul><li>the maximum path score; </li></ul></ul><ul><ul><li>at least one admissible prediction. </li></ul></ul>[Figure: a trajectory matched against a path, with time labels 10 min, 15 min, 8 min, 10 min, 11 min and 16 min; punctual scores 1, .58 and .8 give a path score of .79]
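Best-match selection can be sketched under stated assumptions: each candidate path comes with its punctual scores and the scores of the predictions it would offer, the path score is the average of the punctual scores, and a prediction counts as admissible when its score reaches th_score. `best_match` and the candidate tuple layout are hypothetical.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

# (path id, punctual scores along the path, scores of the predictions it offers)
Candidate = Tuple[str, Sequence[float], Sequence[float]]


def best_match(candidates: List[Candidate], th_score: float) -> Optional[Tuple[str, float]]:
    """Return (path id, path score) of the best match, or None if no path qualifies."""
    best: Optional[Tuple[str, float]] = None
    for path_id, punctual, predictions in candidates:
        if not punctual:
            continue
        # Condition 2: at least one admissible prediction (assumed: score >= th_score).
        if not any(p >= th_score for p in predictions):
            continue
        score = mean(punctual)  # path score with average aggregation
        # Condition 1: maximum path score among the remaining paths.
        if best is None or score > best[1]:
            best = (path_id, score)
    return best
```

With the punctual scores 1, .58 and .8 from the figure, that path scores about .79 and wins against a path scoring .5, while a path with perfect punctual scores but no admissible prediction is skipped.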
    13. 14. <ul><li>Average averages the punctual scores, i.e. the distances between the trajectory and each node of the path. </li></ul><ul><li>Sum is based on the concept of depth: longer matched paths accumulate higher scores. </li></ul><ul><li>Max is the optimistic one: the best punctual score is selected as the path score. </li></ul><ul><li>Context-dependent aggregations can take other aspects of the problem into account. </li></ul>
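The three aggregations are simple to state in code. This is an illustrative sketch: the function names and the `AGGREGATIONS` table are hypothetical, while the keys "avg", "sum" and "max" mirror the values of th_agg.

```python
from typing import Callable, Dict, Sequence


def agg_avg(scores: Sequence[float]) -> float:
    """Average of the punctual scores along the path."""
    return sum(scores) / len(scores)


def agg_sum(scores: Sequence[float]) -> float:
    """Sum: deeper (longer) matched paths accumulate larger scores."""
    return float(sum(scores))


def agg_max(scores: Sequence[float]) -> float:
    """Optimistic: the best punctual score becomes the path score."""
    return float(max(scores))


AGGREGATIONS: Dict[str, Callable[[Sequence[float]], float]] = {
    "avg": agg_avg,
    "sum": agg_sum,
    "max": agg_max,
}


def path_score(scores: Sequence[float], th_agg: str = "avg") -> float:
    if not scores:
        raise ValueError("a path needs at least one punctual score")
    return AGGREGATIONS[th_agg](scores)
```

For the punctual scores 1, .58 and .8, `path_score` gives about 0.79 with "avg" (the path score shown in the example), 2.38 with "sum" and 1.0 with "max".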
    14. 15. <ul><li>The WhereNext algorithm can be tuned through its parameters: </li></ul><ul><ul><li>th_t: time window tolerance </li></ul></ul><ul><ul><li>th_s: space window tolerance </li></ul></ul><ul><ul><li>th_score: minimum prediction score threshold </li></ul></ul><ul><ul><li>th_agg: the aggregation function used to compute the path score (Avg, Sum or Max) </li></ul></ul>
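The four parameters can be grouped in one validated configuration object. `WhereNextParams` and `admissible` are hypothetical names, and the units in the comments (minutes, metres) are assumptions; the slide only names the parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WhereNextParams:
    th_t: float          # time window tolerance (assumed unit: minutes)
    th_s: float          # space window tolerance (assumed unit: metres)
    th_score: float      # minimum prediction score threshold
    th_agg: str = "avg"  # path-score aggregation: "avg", "sum" or "max"

    def __post_init__(self) -> None:
        if self.th_t < 0 or self.th_s < 0:
            raise ValueError("tolerances must be non-negative")
        if self.th_agg not in ("avg", "sum", "max"):
            raise ValueError(f"unknown aggregation {self.th_agg!r}")

    def admissible(self, prediction_score: float) -> bool:
        # Assumption: a prediction is kept only if its score reaches th_score.
        return prediction_score >= self.th_score
```

Raising th_score makes fewer predictions admissible, which is the accuracy versus prediction-rate trade-off the parameters are meant to tune.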
    15. 16. <ul><li>It is very hard to determine the best set of T-patterns from which to build our model: </li></ul><ul><li>a big set of T-patterns → very slow prediction; </li></ul><ul><li>a small set of T-patterns → gaps in coverage. </li></ul><ul><li>For this reason we have defined a way to measure the predictive power of a T-Pattern set. </li></ul>
    16. 17. <ul><li>An evaluation function is defined to estimate the predictive power of a T-Pattern set, based on: </li></ul><ul><li>SpatialCoverage: the space coverage of the regions contained in the T-Pattern set; </li></ul><ul><li>DatasetCoverage: how well the T-Pattern set represents the trajectories; </li></ul><ul><li>RegionSeparation: the precision of the regions in the T-Pattern set. </li></ul>[Figure: Testing the a priori evaluation, Model 1 vs. Model 2]
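The slides name the three components but give neither their formulas nor how they are combined, so the definitions below are hypothetical stand-ins on a discretized area. Regions and trajectories are modeled as sets of grid-cell ids; SpatialCoverage is the fraction of cells covered by some region, DatasetCoverage the fraction of trajectories touching a region, and RegionSeparation drops below 1 when regions overlap. The sketch returns the three values separately rather than inventing a combination.

```python
from typing import Dict, List, Set

Cells = Set[int]  # hypothetical: a region or a trajectory as a set of grid-cell ids


def a_priori_measures(regions: List[Cells], trajectories: List[Cells],
                      n_cells: int) -> Dict[str, float]:
    union: Cells = set().union(*regions)
    total_region_cells = sum(len(r) for r in regions)
    # Share of the area covered by at least one region.
    spatial = len(union) / n_cells
    # Share of trajectories that pass through at least one region.
    dataset = (sum(1 for t in trajectories if t & union) / len(trajectories)
               if trajectories else 0.0)
    # 1.0 when regions are disjoint; lower when they overlap.
    separation = len(union) / total_region_cells if total_region_cells else 0.0
    return {
        "SpatialCoverage": spatial,
        "DatasetCoverage": dataset,
        "RegionSeparation": separation,
    }
```

Two overlapping regions {1, 2} and {2, 3} in a 10-cell area give SpatialCoverage 0.3 and RegionSeparation 0.75; trajectories {1, 5}, {7} and {3} give DatasetCoverage 2/3.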
    17. 18. You are here
    18. 19. <ul><li>The results are evaluated using the following measures: </li></ul><ul><li>Accuracy: the number of correctly predicted locations (in space and time) divided by the total number of trajectories to be predicted. </li></ul><ul><li>Average Error: the average distance between the real trajectories in the predicted interval and the predicted region. </li></ul><ul><li>Prediction rate: the number of trajectories that have a prediction divided by the total number of trajectories to be predicted. </li></ul>[Figure: original trajectory, cut point and predicted location, with the error between the predicted and the original location]
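The three measures follow directly from per-trajectory outcomes. In this sketch, `Outcome` and `evaluate` are hypothetical names, and the Average Error denominator (only trajectories that received a prediction) is an assumption, since the slide does not state it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Outcome:
    predicted: bool                # did the predictor return any location?
    correct: bool = False          # right region and right time interval
    error: Optional[float] = None  # distance between real trajectory and predicted region


def evaluate(outcomes: List[Outcome]) -> Dict[str, float]:
    total = len(outcomes)
    if total == 0:
        raise ValueError("no trajectories to evaluate")
    n_pred = sum(o.predicted for o in outcomes)
    n_ok = sum(o.predicted and o.correct for o in outcomes)
    errors = [o.error for o in outcomes if o.predicted and o.error is not None]
    return {
        "accuracy": n_ok / total,            # correct predictions / all trajectories
        "prediction_rate": n_pred / total,   # trajectories with a prediction / all
        "average_error": sum(errors) / len(errors) if errors else float("nan"),
    }
```

Note that accuracy and prediction rate share the same denominator, so accuracy can never exceed the prediction rate.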
    19. 20. <ul><li>We used a real-life GPS dataset obtained from 17,000 vehicles in the urban area of Milan. </li></ul>Training set: 4,000 trajectories between 7 am and 10 am on Wednesday. Test set: 500 trajectories between 7 am and 10 am on Thursday.
    20. 21. <ul><li>Predicted vs th_score </li></ul>Average Error vs th_space
    21. 22. <ul><li>Accuracy vs Average Error </li></ul>Single Users Accuracy and Prediction rate
    22. 23. <ul><li>A visual example of the application on Milan mobility data. The context is traffic management: we want to predict how traffic will move in the city center. </li></ul><ul><li>We built a predictor on a “good” set of T-patterns that includes the city gates of Milan. </li></ul>Part of the GeoPKDD integrated platform. F. Giannotti, D. Pedreschi, et al. GeoPKDD: Geographic privacy-aware knowledge discovery and delivery (European project), 2008.
    23. 24. <ul><li>A new technique to predict the next locations of a trajectory based on the previous movements of all objects, without using any information about the individual users. </li></ul><ul><li>Time information is used not only to order the events: it is intrinsically embedded in the T-Patterns used to build the prediction tree. </li></ul><ul><li>The user can tune the method to obtain good accuracy and prediction rate. </li></ul><ul><li>We are experimenting with the method in real-world applications. </li></ul>
    24. 26. [Figure: Trajectories Dataset → Regions of Interest → T-PATTERNS]
    25. 28. <ul><li>The exact same spatial location (x,y) almost never occurs twice </li></ul><ul><li>The exact same transition times rarely occur twice </li></ul><ul><li>Solution: allow approximation </li></ul><ul><ul><li>a notion of spatial neighborhood </li></ul></ul><ul><ul><li>a notion of temporal tolerance </li></ul></ul>
    26. 29. <ul><li>Two points match if one falls within a spatial neighborhood N() of the other </li></ul><ul><li>Two transition times match if their temporal difference is ≤ τ </li></ul><ul><li>Example: </li></ul>
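The two matching rules can be combined into a check of whether a trajectory sub-sequence fits a T-pattern. This is a sketch under a hypothetical choice of N(): a square of half-side eps around each pattern point. `matches_pattern` and its argument layout are illustrative, not the authors' API.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # (t, x, y)
XY = Tuple[float, float]


def in_neighborhood(p: XY, q: XY, eps: float) -> bool:
    # Assumption: N() is a square of half-side eps centred on q.
    return abs(p[0] - q[0]) <= eps and abs(p[1] - q[1]) <= eps


def times_match(d1: float, d2: float, tau: float) -> bool:
    # Two transition times match if they differ by at most tau.
    return abs(d1 - d2) <= tau


def matches_pattern(seq: Sequence[Point], pattern_pts: Sequence[XY],
                    alphas: Sequence[float], eps: float, tau: float) -> bool:
    """True if every point falls in N() of its pattern point and every
    transition time is within tau of the pattern's transition time."""
    if len(seq) != len(pattern_pts) or len(alphas) != len(seq) - 1:
        return False
    for (_t, x, y), q in zip(seq, pattern_pts):
        if not in_neighborhood((x, y), q, eps):
            return False
    for i, alpha in enumerate(alphas):
        if not times_match(seq[i + 1][0] - seq[i][0], alpha, tau):
            return False
    return True
```

A trajectory whose transitions take 10 and 15 minutes matches a pattern with transition times 11 and 14 when τ = 2, but not when τ = 0.5.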
    29. 32. <ul><li>T-pattern mining can be mapped to a density estimation problem over $\mathbb{R}^{3n-1}$: </li></ul><ul><ul><li>2 dimensions for each (x,y) in the pattern (2n) </li></ul></ul><ul><ul><li>1 dimension for each transition (n-1) </li></ul></ul><ul><li>Density computed by </li></ul><ul><ul><li>mapping each sub-sequence of n points of each input trajectory to $\mathbb{R}^{3n-1}$ </li></ul></ul><ul><ul><li>drawing an influence area for each point (composition of N() and τ) </li></ul></ul><ul><li>Too computationally expensive: heuristics needed </li></ul><ul><li>Our solution: a combination of sequential pattern mining and density-based clustering </li></ul>
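The mapping to $\mathbb{R}^{3n-1}$ can be made concrete: n points contribute 2n coordinates and n-1 transition times. The sketch below (hypothetical helper names) embeds every ordered, not necessarily contiguous, sub-sequence of n points and counts how many vectors a single trajectory produces, which is where the cost comes from. It stops before the density estimation itself.

```python
from itertools import combinations
from math import comb
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, float, float]  # (t, x, y), ordered by time


def embed(sub: Sequence[Point]) -> List[float]:
    """Map n points to R^(3n-1): 2n coordinates followed by n-1 transition times."""
    coords = [c for _, x, y in sub for c in (x, y)]
    transitions = [b[0] - a[0] for a, b in zip(sub, sub[1:])]
    return coords + transitions


def all_embeddings(traj: Sequence[Point], n: int) -> Iterator[List[float]]:
    """Embed every ordered sub-sequence of n points (order kept, gaps allowed)."""
    for idx in combinations(range(len(traj)), n):
        yield embed([traj[i] for i in idx])


def count_embeddings(m: int, n: int) -> int:
    """Number of vectors produced by one trajectory of m points."""
    return comb(m, n)
```

For a single trajectory of 50 points and n = 3, `count_embeddings(50, 3)` is 19,600 vectors in an 8-dimensional space, before any influence areas are drawn.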