
  1. 1. Mirco Nanni e-mail: mirco.nanni@isti.cnr.it Roberto Trasarti e-mail: roberto.trasarti@isti.cnr.it KDD Lab, ISTI-CNR, Pisa, Italy SSTDM-09 Miami, 2009
  2. 2. <ul><li>The spread of GPS-equipped devices produces large amounts of positioning data for vehicles and individuals.</li></ul><ul><li>Applications in urban contexts require mapping this data onto a street network.</li></ul>
  3. 3. <ul><li>Geometric Map Matching: Consider a set of timestamps T = {0, 1, ..., t} and a function P_r = (latitude, longitude) that describes the real position of a user at time r ∈ T. Then, given a set of georeferenced map objects O = {o_1, ..., o_n}, a Map Matching function is defined as F(P_r) -> o_j, where o_j ∈ O.</li></ul><ul><li>There are three classes of approaches:</li></ul><ul><li>Point-to-point</li></ul><ul><ul><li>The original points are mapped onto nodes of the network.</li></ul></ul><ul><li>Point-to-segment</li></ul><ul><ul><li>The original points are mapped onto segments of the network.</li></ul></ul><ul><li>Segment-to-segment</li></ul><ul><ul><li>The segments obtained by joining each pair of consecutive original points are mapped onto segments of the network.</li></ul></ul>
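As a rough illustration of the point-to-segment class, the sketch below matches a point to the nearest segment by perpendicular projection. It assumes planar (already projected) coordinates rather than raw latitude and longitude, and the names `project_to_segment` and `point_to_segment_match` are hypothetical, not taken from the slides.

```python
import math

def project_to_segment(p, a, b):
    """Return (distance, closest_point) from point p to the segment a-b.

    Assumes planar coordinates; raw latitude/longitude would first need
    to be projected onto a plane.
    """
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        # Parameter of the orthogonal projection, clamped to the segment.
        t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))
    cx, cy = ax + t * dx, ay + t * dy
    return math.hypot(px - cx, py - cy), (cx, cy)

def point_to_segment_match(p, segments):
    """Return the id of the segment nearest to p.

    segments: dict mapping a segment id to a pair of endpoints (a, b).
    """
    return min(segments, key=lambda sid: project_to_segment(p, *segments[sid])[0])
```

For example, with two parallel streets `{"s1": ((0, 0), (10, 0)), "s2": ((0, 5), (10, 5))}`, the point `(3, 1)` is matched to `"s1"`.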
  4. 4. <ul><li>Accuracy of the devices:</li></ul><ul><ul><li>Positioning errors create ambiguity during map matching.</li></ul></ul><ul><ul><li>Usually resolved by heuristics.</li></ul></ul><ul><li>Sampling rate:</li></ul><ul><ul><li>The sampling rate (or storing rate) of the device can produce a disconnected path.</li></ul></ul><ul><ul><li>How should the path be completed?</li></ul></ul>
  5. 5. <ul><li>Best Match: Let M be a street map, composed of:</li></ul><ul><li>a set of nodes M.nodes,</li></ul><ul><li>a set of oriented segments M.segments ⊆ M.nodes × M.nodes,</li></ul><ul><li>and a cost function that associates each segment with a real value, M.cost : M.segments -> R.</li></ul><ul><li>Then, given a sequence of segments S = ⟨s_1, ..., s_n⟩, the match set of S over M is defined as follows:</li></ul><ul><li>and the best match of S over M is:</li></ul><ul><li>where</li></ul>
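The formal definitions on this slide did not survive in the transcript, so the following is only a sketch under a stated assumption: that the best match is the cheapest connected path on M traversing the segments of S in order, which can be built by joining each pair of consecutive segments with a minimum-cost path. The adjacency-list layout and the names `shortest_path` and `best_match` are hypothetical.

```python
import heapq

def shortest_path(adj, source, target):
    """Dijkstra's algorithm for non-negative costs.

    adj: dict mapping a node to a list of (neighbor, cost) pairs.
    Returns (cost, list_of_nodes), or (inf, []) if target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, c in adj.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

def best_match(adj, segments):
    """Connect a sequence of oriented segments (u, v) into one node path.

    Assumption: each gap is filled with a minimum-cost path from the end
    of one segment to the start of the next.
    """
    path = list(segments[0])
    for u, v in segments[1:]:
        _, gap = shortest_path(adj, path[-1], u)
        if not gap:
            raise ValueError("no connecting path between segments")
        path.extend(gap[1:])  # gap[0] is already the last node of path
        path.append(v)
    return path
```

With non-negative costs, filling each gap independently is enough under this assumption, because the total cost is the sum of the observed segments and the gap paths.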
  6. 6. <ul><li>From a set of disconnected segments to a connected path on the street network: </li></ul>
  7. 7. <ul><li>Point-to-segment</li></ul><ul><ul><li>When the two segments nearest to a point have distances from it that are equal up to a given tolerance, the segment whose starting vertex is closest to the end point of the previous segment of the trajectory is chosen.</li></ul></ul><ul><li>K-BestMatch</li></ul><ul><ul><li>We use a more flexible approach, which considers the k-optimal alternative paths between two disconnected segments of the initial set.</li></ul></ul>
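The point-to-segment tie-breaking rule above can be sketched as follows. The candidate layout (segment id, distance to the point, starting vertex) and the name `choose_segment` are assumptions made for illustration, and coordinates are again taken to be planar.

```python
import math

def choose_segment(candidates, prev_end, tolerance):
    """Pick a segment for a point, breaking near-ties by continuity.

    candidates: list of (segment_id, distance_to_point, start_vertex)
                tuples, a hypothetical layout for this sketch.
    prev_end:   end point of the previously matched segment.
    tolerance:  distances closer than this are treated as equal.
    """
    ranked = sorted(candidates, key=lambda c: c[1])
    # No ambiguity: a single candidate, or a clear winner.
    if len(ranked) == 1 or ranked[1][1] - ranked[0][1] > tolerance:
        return ranked[0][0]
    # The two nearest segments tie: prefer the one that starts closest
    # to where the previous segment ended.
    tied = ranked[:2]
    return min(tied, key=lambda c: math.dist(c[2], prev_end))[0]
```

A larger tolerance makes the continuity rule apply more often; a tolerance of zero reduces the function to plain nearest-segment matching except for exact ties.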
  8. 8. <ul><li>The previous representation of a path on the street network becomes a multipath </li></ul>
  9. 9. <ul><li>Item-frequency representation: Given a street map M, a sequence of segments S = ⟨s_1, ..., s_n⟩ and a positive integer k, the Item-Frequency representation IF(S, M, k) of S over M w.r.t. k is defined as a pair (I, f) such that:</li></ul><ul><li>The frequencies can be computed locally for each gap in S.</li></ul>
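The defining formula is missing from the transcript, so the sketch below rests on an assumption drawn from the surrounding slides: observed segments get frequency 1, and a segment that appears only in the reconstruction of a gap gets the fraction of that gap's k alternative paths containing it. Taking the maximum when a segment occurs in several gaps is a further assumption of this sketch. The alternative paths are passed in as input, and `item_frequency` is a hypothetical name.

```python
from collections import Counter

def item_frequency(observed, gap_alternatives):
    """Build an (I, f) pair from observed segments and per-gap alternatives.

    observed:         list of segments present in the original data.
    gap_alternatives: one entry per gap; each entry is a list of
                      alternative paths, each path a list of segments.
    """
    freq = {s: 1.0 for s in observed}
    for alternatives in gap_alternatives:
        if not alternatives:
            continue
        # Count, for each segment, how many alternatives contain it.
        counts = Counter(s for path in alternatives for s in set(path))
        for s, c in counts.items():
            share = c / len(alternatives)
            # Frequencies are computed locally for each gap; keep the
            # largest value if a segment shows up in several gaps.
            freq[s] = max(freq.get(s, 0.0), share)
    return set(freq), freq
```

With a single alternative per gap, every reconstructed segment receives frequency 1, which corresponds to the BestMatch case (k = 1).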
  10. 10. <ul><li>Frequency:</li></ul><ul><li>Red = 1</li></ul><ul><li>Orange ≥ 0.75</li></ul><ul><li>Yellow ≥ 0.5</li></ul><ul><li>Green ≥ 0.25</li></ul><ul><li>Blue < 0.25</li></ul>Figure panels: k = 1 (BestMatch) and k = 4. As expected, segments already contained in the original dataset have frequency 1 in both cases.
  11. 11. <ul><li>Point-to-segment: compares a set of points P with all the segments of the street network M.segments; the complexity is therefore O(|P| m), where m = |M.segments|.</li></ul><ul><li>K-BestMatch: with G the set of gaps in the dataset, the complexity is O(|G| k n (m + n log n)), where k is the K-BestMatch parameter and n = |M.nodes| (see [1]).</li></ul><ul><li>The overall complexity is O(|G| k n (m + n log n) + |P| m). If we assume that the road network M is fixed, so that n and m are constants, the complexity reduces to O(|G| k + |P|).</li></ul>[1] Ernesto Q. V. Martins and Marta M. B. Pascoal. A new implementation of Yen's ranking loopless paths algorithm. 4OR: A Quarterly Journal of Operations Research.
  12. 12. <ul><li>Given the Item-Frequency representations of two trajectories, IF1 = (I1, f1) and IF2 = (I2, f2), we define the following distances between IF1 and IF2: where: </li></ul>
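The distance formulas themselves are missing from the transcript. Since the next slide refers to a Jaccard distance, the sketch below shows one common frequency-weighted generalization of it (one minus the ratio of summed minima to summed maxima). This is an assumption for illustration and may differ from the authors' definition; `weighted_jaccard_distance` is a hypothetical name.

```python
def weighted_jaccard_distance(f1, f2):
    """Frequency-weighted Jaccard distance between two item-frequency maps.

    f1, f2: dicts mapping a segment to its frequency in (0, 1];
            segments missing from a dict are treated as frequency 0.
    """
    items = set(f1) | set(f2)
    shared = sum(min(f1.get(s, 0.0), f2.get(s, 0.0)) for s in items)
    total = sum(max(f1.get(s, 0.0), f2.get(s, 0.0)) for s in items)
    if total == 0:
        return 0.0  # two empty representations are treated as identical
    return 1.0 - shared / total
```

When all frequencies equal 1, this reduces to the ordinary Jaccard distance between the two segment sets.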
  13. 13. <ul><li>As a first validation of the method, we provide a visual account of the effects of the K-BestMatch reconstruction with the Jaccard distance on k-Nearest Neighbor (kNN) queries:</li></ul><ul><li>Query: return the 10 objects closest to a chosen pivot object.</li></ul><ul><li>The K-BestMatch result contains a larger core of segments and lacks the outlier paths that occur in the BestMatch-based result.</li></ul>Figure panels: pivot object; BestMatch result; K-BestMatch result.
  14. 14. <ul><li>To test the method on a clustering task, a generic agglomerative hierarchical clustering algorithm was adapted to work with the item-frequency representation of trajectories.</li></ul>Figure panels: a cluster obtained using K-BestMatch; the same set of trajectories using BestMatch (where they do not form a cluster).
  15. 15. <ul><li>In this experiment the K-BestMatch reconstruction was performed with several different values of k, and the resulting clusters were compared against those obtained with the BestMatch approach (k = 1).</li></ul><ul><li>The comparison between clustering results is performed using the standard F-measure:</li></ul>
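The formula is missing from the transcript. The sketch below implements a widely used clustering F-measure: for each reference cluster, take the best harmonic mean of precision and recall over the candidate clusters, then average weighted by reference cluster size. Whether the slide used exactly this variant is an assumption, and `clustering_f_measure` is a hypothetical name.

```python
from collections import Counter

def clustering_f_measure(reference, candidate):
    """F-measure of a candidate clustering against a reference clustering.

    reference, candidate: lists of cluster labels, aligned by object index.
    """
    n = len(reference)
    ref_sizes = Counter(reference)
    cand_sizes = Counter(candidate)
    overlap = Counter(zip(reference, candidate))
    total = 0.0
    for i, n_i in ref_sizes.items():
        best = 0.0
        for j, n_j in cand_sizes.items():
            n_ij = overlap.get((i, j), 0)
            if n_ij == 0:
                continue
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        # Weight each reference cluster by its share of the objects.
        total += (n_i / n) * best
    return total
```

In the setting of this slide, the reference would be the clustering obtained with k = 1 and the candidate the clustering obtained with a larger k.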
  16. 16. <ul><li>A first exploration of the effects that a more flexible map matching approach can have on the comparison, querying and mining of trajectories.</li></ul><ul><li>Preliminary results are encouraging, and suggest that overcoming the limits of standard best-match reconstruction strategies can benefit the subsequent analyses performed on such data.</li></ul><ul><li>The work also raised several open issues:</li></ul><ul><li>the need for refined methods to select the k-optimal alternative paths, for instance by trying to limit path redundancy;</li></ul><ul><li>the need to also consider the order in which segments are visited, thus moving from the item-frequency representation to a more complex one.</li></ul>
  17. 17. <ul><li>Thank you.</li></ul><ul><li>Questions?</li></ul>