Where Next
  • We have found a way
  • We did an experiment
  • Made so far

Where Next Presentation Transcript

  • Anna Monreale, Fabio Pinelli, Roberto Trasarti, Fosca Giannotti. Knowledge Discovery and Delivery Lab (ISTI-CNR & Univ. Pisa), www-kdd.isti.cnr.it. A. Monreale, F. Pinelli, R. Trasarti, F. Giannotti. WhereNext: a Location Predictor on Trajectory Pattern Mining. KDD 2009.
    • Wireless network infrastructures are the nerves of our territory
    • Besides offering their services, they gather highly informative traces of human mobile activities
    • Miniaturization, wearability, pervasiveness will produce traces of increasing
      • positioning accuracy
      • semantic richness
    • From the analysis of the traces of our mobile phones it is possible to reconstruct our mobile behaviour, the way we collectively move
    • This knowledge may help us improve decision-making in many mobility-related issues:
      • Planning traffic and public mobility systems in metropolitan areas;
      • Planning physical communication networks
      • Forecasting traffic-related phenomena
      • Organizing logistics systems
      • Prediction
    • Predicting the next location of a trajectory can improve a large set of services such as:
    • Navigational services.
    • Traffic management.
    • Location-based advertising.
    • Services Pre-fetching.
    • Simulation.
    [Figure: candidate next locations marked "?", with scores .4, .8 and .35]
    • How to realize this idea:
    • Extract patterns from all the available movements in a certain area, instead of from the individual history of an object.
    • Use these local movement patterns as predictive rules.
    • Build a prediction tree as the global model.
    [Figure: Trajectory dataset → Local patterns → Prediction Tree]
  • [Workflow diagram, step labels: Select the set of interesting trajectories; Extract T-Patterns (a set of local models); Merge T-Patterns (global model); Use the condensed model as predictor; Evaluation; Validation]
    • The local pattern we use is the T-Pattern. It describes the common behavior of a group of users in space and time.
    F. Giannotti, M. Nanni, F. Pinelli, and D. Pedreschi. Trajectory pattern mining. KDD 2007: 330-339.
    • Generating all rules from each T-pattern and using them to build a classifier is too expensive.
    [Figure: a T-Pattern over regions R1, R2, R3, R4 with transition times α1, α2, α3, and the rules generated from it]
    • To avoid rule generation, the T-Pattern set is organized as a prefix tree.
    • For each node v:
      • Id identifies the node v
      • Region is a spatial component of the T-Pattern
      • Support is the support of the T-Pattern
    • For each edge j:
      • [a,b] corresponds to the time interval α_n of the T-Pattern
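As a minimal sketch of the node and edge layout described above, the following builds a prefix tree from T-Patterns given as region sequences with transition-time intervals. The names `Node`, `PrefixTree` and `insert` are hypothetical, regions are reduced to plain labels, two edges are treated as the same only when their intervals are exactly equal, and a shared node keeps the largest support seen; these are simplifying assumptions, not the authors' merging procedure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int   # Id: identifies the node
    region: str    # Region: spatial component (a plain label in this sketch)
    support: int   # Support of the T-Pattern reaching this node
    # Outgoing edges as (interval, child); interval is (a, b), or None for
    # edges leaving the root, which carry no transition time.
    children: list = field(default_factory=list)


class PrefixTree:
    def __init__(self):
        self._next_id = 0
        self.root = Node(self._new_id(), region="ROOT", support=0)

    def _new_id(self):
        nid = self._next_id
        self._next_id += 1
        return nid

    def insert(self, regions, intervals, support):
        """Insert the T-Pattern R1 -[a1,b1]-> R2 -[a2,b2]-> ... with its support."""
        if len(intervals) != len(regions) - 1:
            raise ValueError("need exactly one interval per transition")
        node = self.root
        edge_labels = [None] + list(intervals)
        for region, interval in zip(regions, edge_labels):
            # Reuse an existing child only on an exact (interval, region) match.
            child = next(
                (c for iv, c in node.children
                 if iv == interval and c.region == region),
                None,
            )
            if child is None:
                child = Node(self._new_id(), region, support)
                node.children.append((interval, child))
            else:
                # Assumption: a shared prefix keeps the largest support seen.
                child.support = max(child.support, support)
            node = child
        return node


tree = PrefixTree()
tree.insert(["A", "B", "C"], [(5, 10), (8, 12)], support=20)
tree.insert(["A", "B", "D"], [(5, 10), (3, 6)], support=15)
```

The two patterns share the prefix A → B, so the tree stores that prefix once and branches only at the last region.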
    • Three steps:
      • Search for best match
      • Candidate generation
      • Make predictions
    How to compute the Best Match? [Figure labels: Best Match, Prediction]
    • The spatio-temporal distance is computed between the segment of trajectory (bounded in time using the previous transition time) and the current node of the path.
    Case a: The trajectory segment intersects the region of the node.
    Case b: The enlarged trajectory segment intersects the region.
    Case c: The enlarged trajectory segment doesn’t intersect the region.
    Here th_t is the time tolerance window defined by the user.
    • The path score is the aggregation of all punctual scores along a path.
    • The Best Match is the path having:
      • the maximum path score;
      • at least one admissible prediction.
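The selection rule above can be sketched as a filter followed by a maximum. The `CandidatePath` type and its fields are hypothetical; in particular, whether a path has an admissible prediction is assumed to be known already and is passed in as a flag.

```python
from typing import NamedTuple, Optional


class CandidatePath(NamedTuple):
    name: str
    path_score: float
    has_admissible_prediction: bool  # assumed to be computed elsewhere


def best_match(paths) -> Optional[CandidatePath]:
    """Return the admissible path with the maximum path score, or None."""
    admissible = [p for p in paths if p.has_admissible_prediction]
    if not admissible:
        return None
    return max(admissible, key=lambda p: p.path_score)


candidates = [
    CandidatePath("p1", 0.90, False),  # highest score, but nothing to predict
    CandidatePath("p2", 0.79, True),
    CandidatePath("p3", 0.40, True),
]
chosen = best_match(candidates)
```

Here `p1` has the highest score but is skipped because it offers no admissible prediction, so `p2` is chosen.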
    [Figure: example path with transition times in minutes (10, 15, 8, 10, 11, 16) and punctual scores 1, .58 and .8, giving a path score of .79]
    • Average generalizes distances between the trajectory and each node
    • Sum is based on the concept of depth
    • Max is the optimistic one, the best punctual score is selected as path score
    • Context-dependent aggregations can take into consideration other aspects of the problem.
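The three aggregations can be sketched as follows, using the punctual scores 1, .58 and .8 from the slide's example; the path score of .79 shown there matches the average. The function name `path_score` and the string keys are assumptions made for illustration.

```python
def path_score(punctual_scores, agg="avg"):
    """Aggregate the punctual scores along a path into a single path score."""
    if not punctual_scores:
        raise ValueError("a path needs at least one punctual score")
    if agg == "avg":
        return sum(punctual_scores) / len(punctual_scores)
    if agg == "sum":
        # Longer (deeper) paths accumulate more score.
        return sum(punctual_scores)
    if agg == "max":
        # Optimistic: the best punctual score stands for the whole path.
        return max(punctual_scores)
    raise ValueError(f"unknown aggregation: {agg}")


scores = [1.0, 0.58, 0.8]
avg_score = path_score(scores, "avg")
sum_score = path_score(scores, "sum")
max_score = path_score(scores, "max")
```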
    • The WhereNext algorithm can be tuned using its parameters:
      • th_t: time window tolerance
      • th_s: space window tolerance
      • th_score: minimum prediction score threshold
      • th_agg: the aggregation function used to compute the path score (Avg, Sum or Max)
    • It is very hard to know which set of T-patterns is best for building our model:
    • a big set of T-patterns → very slow prediction.
    • a small set of T-patterns → gaps in coverage.
    • For this reason we have defined a way to measure the predictive power of a T-Pattern set.
    • An evaluation function is defined to estimate the predictive power of a T-Pattern set.
    • SpatialCoverage : the space coverage of the regions contained in the T-Patterns set;
    • DatasetCoverage : measures how much the T-Pattern set represents the trajectories
    • RegionSeparation : the precision of the regions in the T-Pattern set.
    [Figure: testing the a priori evaluation on Model 1 and Model 2]
  • You are here
    • The results are evaluated using the following measures:
    • Accuracy : the number of correctly predicted locations (space and time) divided by the total number of trajectories to be predicted.
    • Average Error : the average distance between the real trajectories in the predicted interval and the region predicted.
    • Prediction rate : the number of trajectories which have a prediction divided by the total number of trajectories to be predicted.
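A minimal sketch of the three measures, under stated assumptions: each trajectory to be predicted is represented either by `None` (no prediction was made) or by a pair `(correct, error)`, where `error` is a precomputed distance between the real trajectory and the predicted region. The function name, the input layout, and averaging the error over predicted trajectories only are illustrative choices, not taken from the slides.

```python
def evaluate(outcomes):
    """Compute accuracy, prediction rate and average error.

    outcomes: one entry per trajectory to be predicted; None when no
    prediction was made, else (correct: bool, error: float).
    """
    total = len(outcomes)
    if total == 0:
        raise ValueError("no trajectories to evaluate")
    predicted = [o for o in outcomes if o is not None]
    n_correct = sum(1 for correct, _ in predicted if correct)
    avg_error = (
        sum(err for _, err in predicted) / len(predicted) if predicted else None
    )
    return {
        "accuracy": n_correct / total,             # correct / all trajectories
        "prediction_rate": len(predicted) / total, # predicted / all trajectories
        "average_error": avg_error,                # assumption: over predictions
    }


results = evaluate([(True, 0.0), (False, 120.0), None, (True, 0.0)])
```

Note that, following the definitions above, accuracy is divided by all trajectories to be predicted, so a trajectory without a prediction lowers accuracy as well as prediction rate.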
    [Figure: original trajectory, cut point and predicted location, for a correct prediction and for a prediction with error]
    • We used a real-life GPS dataset obtained from 17,000 vehicles in the urban area of the city of Milan.
    Training set: 4000 trajectories between 7 am and 10 am on Wednesday.
    Test set: 500 trajectories between 7 am and 10 am on Thursday.
    • [Plot: Predicted vs th_score]
    [Plot: Average Error vs th_space]
    • [Plot: Accuracy vs Average Error]
    [Plot: Single Users Accuracy and Prediction rate]
    • A visual example of the application on Milan mobility data. The context is traffic management and we want to predict how the traffic will move in the city center.
    • We have built a predictor on a “good” set of T-patterns that includes the city gates of Milan.
    Part of the GeoPKDD integrated platform. F. Giannotti, D. Pedreschi, et al. GeoPKDD: Geographic privacy-aware knowledge discovery and delivery (European project), 2008.
    • A new technique to predict the next locations of a trajectory, based on the previous movements of all the objects, without considering any information about the users.
    • The time information is used not only to order the events but is also embedded in the T-Patterns used to build the Prediction tree.
    • The user can tune the method to obtain good accuracy and prediction rate.
    • We are experimenting with the method in real-world applications.
  • [Figure: Trajectories Dataset → Regions of Interest → T-Patterns]
    • The exact same spatial location (x,y) almost never occurs twice
    • The same exact transition times usually do not occur twice
    • Solution: allow approximation
      • a notion of spatial neighborhood
      • a notion of temporal tolerance
    • Two points match if one falls within a spatial neighborhood N() of the other
    • Two transition times match if their temporal difference is ≤ τ
    • Example:
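The two matching rules can be sketched as simple predicates. The slides do not specify the shape of the neighborhood N(), so a circle of a given radius is assumed here; the function names and the `radius` parameter are hypothetical.

```python
import math


def points_match(p, q, radius):
    """True if q falls within the neighborhood N() of p.

    Assumption: N() is a circle of the given radius around p.
    """
    return math.dist(p, q) <= radius


def transitions_match(t1, t2, tau):
    """True if two transition times differ by at most tau."""
    return abs(t1 - t2) <= tau


same_place = points_match((0.0, 0.0), (3.0, 4.0), radius=5.0)   # distance is 5
same_timing = transitions_match(10.0, 12.0, tau=2.0)            # difference is 2
```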
    • T-pattern mining can be mapped to a density estimation problem over R^(3n-1)
      • 2 dimensions for each (x,y) in the pattern (2n)
      • 1 dimension for each transition (n-1)
    • Density computed by
      • mapping each sub-sequence of n points of each input trajectory to R^(3n-1)
      • drawing an influence area for each point (composition of N() and τ)
    • Too computationally expensive, heuristics needed
    • Our solution: a combination of sequential pattern mining and density-based clustering
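To make the dimension count of the density formulation concrete, the sketch below maps a sub-sequence of n timestamped points to a vector with 2n coordinates plus n-1 transition times, that is 3n-1 components. The function name `embed`, the (x, y, t) input layout and the ordering of the components are assumptions for illustration only.

```python
def embed(subsequence):
    """Map n points (x, y, t) to a vector with 3n-1 components.

    Layout (an assumption): all (x, y) pairs first, then the n-1
    transition times between consecutive points.
    """
    if not subsequence:
        raise ValueError("need at least one point")
    coords = [c for (x, y, _) in subsequence for c in (x, y)]
    transitions = [
        t_next - t_prev
        for (_, _, t_prev), (_, _, t_next) in zip(subsequence, subsequence[1:])
    ]
    return coords + transitions


vector = embed([(0, 0, 0), (1, 2, 10), (3, 4, 25)])
```

With n = 3 points the vector has 3·3 - 1 = 8 components: six coordinates followed by the two transition times 10 and 15.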