• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Individual movements and geographical data mining. Clustering algorithms for highlighting hotspots in personal navigation routes.
 

Individual movements and geographical data mining. Clustering algorithms for highlighting hotspots in personal navigation routes.

on

  • 1,495 views

Individual movements and geographical data mining. Clustering algorithms for highlighting hotspots in personal navigation routes.

Individual movements and geographical data mining. Clustering algorithms for highlighting hotspots in personal navigation routes.
Giuseppe Borruso, Gabriella Schoier - University of Trieste

Statistics

Views

Total Views
1,495
Views on SlideShare
1,494
Embed Views
1

Actions

Likes
2
Downloads
129
Comments
0

1 Embed 1

http://www.linkedin.com 1

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Individual movements and geographical data mining. Clustering algorithms for highlighting hotspots in personal navigation routes. Individual movements and geographical data mining. Clustering algorithms for highlighting hotspots in personal navigation routes. Presentation Transcript

    • Individual movements and geographical data mining. Clustering algorithms for highlighting hotspots in personal navigation routes. Giuseppe Borruso, Gabriella Schoier Department of Economic, Business, Mathematic and Statistic Sciences - DEAMS, University of Trieste, Via Valerio, 4/1, 34127 Trieste, Italy {gabriella.schoier,giuseppe.borruso}@econ.units.it Geographical Analysis, Urban Modeling, Spatial Statistics GEOG-AN-MOD 11
    • Introduction
      • The rapid developments in the availability and access to spatially referenced information in a variety of areas has induced the need for better analysis techniques to understand the various phenomena.
      • In particular spatial clustering algorithms , which groups similar spatial objects into classes, can be used for the identification of areas sharing common characteristics.
      • The aim of this paper is to present a density based algorithm for the discovery of clusters of units in large spatial data sets.
      • In particular the analysis represents a first insight into a wealth of geographical data collected by individuals as activity dairy data , these representing the routes a set of individuals drive in their daily movements for reasons connected to work, children picking, free time, rest.
      • In this analysis the attention is drawn on point data sets corresponding to GPS traces driven along a same route in different days.
      • Our aim here is to explore the presence of clusters along the route , trying to understand the origins and motivations behind that in order to better understand the road network structure in terms of ’dense’ spaces along the network , these representing road congestion , rather than the presence of junctions or other factors affecting individual mobility .
      • In this paper the attention is therefore focused on methods to highlight such clusters and see their impact on the network structure.
      • Spatial clustering algorithms are examined (DBSCAN) and a comparison with other non-parametric density based algorithm (Kernel Density Estimation) is performed. A test is performed over the urban area of Trieste (Italy).
    • Geographical Data Mining
      • In recent years geographic data collection devices linked to location-aware technologies such as the global positioning system allow researchers to collect huge amounts of data.
      • Other devices such as cell phones, in-vehicle navigation systems and wireless Internet clients can capture data on individual movement patterns.
      • The process of extracting information and knowledge from these massive georeferenced databases is known as Geographic Knowledge Discovery (GKD) or Geographical Data Mining.
      • Real-time environmental monitoring systems such as intelligent transportation systems and location-based services are generating geo-referenced data in the form of dynamic flows and space-time trajectories.
      • The complexity of spatial objects and relationships in geo-referenced data, as well as the computationally intensity of many spatial algorithms, means that geographic background knowledge can play an important role in managing the GKD process.
    • Spatial Clustering Algorithms
      • DBSCAN Density-Based Spatial Clustering of Applications with Noise and its extensions judges the density around the neighborhood of an object to be sufficiently dense if the number of points within a distance Epscoord of an object is greater than MinPts number of points.
      • It classifies regions with sufficiently high density into clusters, and discovers clusters of arbitrary shape. The density based notion of clustering states that within each cluster the density of the points has to be significantly higher than the density of points outside the cluster
      • where D is a data set of points.
      • The distance function determines the shape of the neighborhood. MinPts is the minimum number of points that must be contained in the neighborhood of that point in the cluster..
    • Kernel Density Estimation
      • The function creates a density surface from a distribution of points (events) in space, providing an estimate of events withn its searching function according to their distance from the point where the estimate is computed.
      • =‘moving three dimensional functions that weights events within its sphere of influence according to their distance from the point at which the intensity is being estimated’ (Gatrell et al. 1996).
      Gatrell A., Bailey T, Diggle P., Rowlingson B. (1996): “Spatial Point Pattern Analysis and its Application in Geographical Epidemiology”. Transactions of the Institute of British Geographers 21: 256-74. Study Region R Event s i Location s Kernel k( ) Bandwidth τ
    • Kernel Density Estimation
      • The function creates a density surface from a distribution of points (events) in space, providing an estimate of events withn its searching function according to their distance from the point where the estimate is computed.
      • = estimate of intensity of the spatial distribution of events, measured in point s ;
      • s i = the i th event;
      • k( ) = kernel function
      • = bandwidth.
      • Varying the bandwidth it is possible to obtain smoother or sharper surfaces and analyze the phenomena at different scales.
    • Some Comparisons between DBSCAN and KDE
      • DBSCAN
      • Bandwidth
      • AND
      • N of points
      • Highlights clusters of points
      • KDE
      • Bandwidth
      • OR
      • N of points
      • Function (quartic, normal, uniform, etc.).
      • cell size
    • The Data and the Study Area
      • The analysis was performed over an activity dairy dataset as registered by individual users relying on a commercial GPS receiver for personal car navigation. The full dataset consists of journeys registering home-work, home-school, work free time locations (gym, theatres, etc.) collected over a 6 months period (July 2010 - January 2011) on a nearly daily basis and mainly based on car usage. The data were mainly collected in the Province of Trieste (Italy) although travels in the region, abroad or in other Italian cities were recordered.
      • In this initial stage of the research, the attention is dedicated to highlighting hot-spots in a linear dataset, as that represented by GPS traces.
      • Hotspots can be of interest for highlighting braking areas along a route and therefore those critical nodes and junctions in an urban road network characterized by traffic or other forms of congestion.
      • Also, the aim is that of allowing the realization of maps of fluidity or accessibility given the users travel time in certain areas.
    • A data sample: October data (5, 6, 7, 8, 11, 12)
    • Anticipation – density analysis on different routes & days
    • A data sample: October data (5, 6, 7, 8, 11, 12) Sample POIs University campus Gym Home (2) Home (1) Nursery
      • Selected route on different days (on same route)
      • 23, 24, 30 August
      • 1 September
      University campus Home (1) Nursery
    • The Analysis
      • DBSCAN
      • In the application Epscoord =0.0004 as the bandwidth is 44 m.
      • while MinPts = 15 in order to have meaningful clusters (=15 sec).
      • KDE
      • quartic kernel function,
      • with a 5 m cell resolution
      • 44 m bandwidth.
      • Tests:
        • 25 m bandwidth (good for highlighting hotspots at small scale)
        • 100 m (over smoothed too much the data)
      • Euclidean (not network) density estimation
      1 059 GPS points collected on 1° September 2010 (Home – Nursery journey)
    • Results from DBSCAN (right) and KDE (left)
    • The results
      • The results from the DBSCAN are six clusters visible from the numbers. They are represented with different colors: cluster 1 with red, cluster 2 green, cluster 3 blue, cluster 4 light blue, cluster 5 pink and cluster 6 yellow.
      • The results from the density analysis are visible as darker areas from light grey to black in the map. Clusters here visible as hotspots can be seen as black areas, while denser dark grey areas can be interpreted as slower segments of the journey.
      • Non-peak areas deriving from KDE analysis, as the dark grey areas along the journey.
      • Slow parts of the journey when compared to the rest of it.
      • When speed is slower than in the normal journey, and also in other parts of the route, in urbanized areas
    •  
    •  
    • University campus – entrance and bus stop
    • Junction – state roads
    • Nursery – end of the journey
    • Conclusions and further development
      • The attention is drawn on point datasets corresponding to GPS traces driven along a same route in different days.
      • The attention is focused on methods to highlight such clusters and see their impact on the network structure in terms of ’dense’ spaces along the network, these representing road congestion, rather than the presence of junctions or other factors affecting individual mobility.
      • The results from the application of both the DBSCAN algorithm and the KDE one, using the same searching radius,are comparable. They show the hotspots in the same places when the speed is slower than in the normal journey, in urbanized areas (beginning and ending of the journey) as well as in out of town road segments in correspondence to major bends.
      • understand pattern of movements of a set of individuals in a same day, as well as differences in ’city uses’ in different times of the day, and therefore highlight a city’s ’variable geometry’ in time and space.
      • Also, the presence of hotspots replicated on different journeys can allow highlighting the different speeds in an urban environment and also their directions, therefore allowing to understand urban metrics and shapes in a different way than that usually allowed by more simplified data sets.
    • Thank you for your attention! Questions?