Automatic Classification of Retail Spaces from a Large Scale Topographic Database
William A Mackaness, Omair Z Chaudhry
School of GeoSciences, University of Edinburgh, william.mackaness@ed.ac.uk
Environmental and Geographical Sciences, Manchester Metropolitan University, O.Chaudhry@mmu.ac.uk
 
A need to classify retail spaces
Characterising Retail Space
Retail Space
Measures to discern different Retail Spaces
Identifying Retail Spaces
The High Street
Boolean Approach
Boolean Approach
Boolean Logic
Fuzzy Logic
Outline
Bayesian Approach
 
Conclusion


Editor's Notes

  1. KEYWORDS: retail, classification methodologies, fuzzy logic, Bayesian
  2. Footnote: that many of the characteristics of different retail spaces are manifest in the extent and patterns of distribution of retail buildings. When coupled with additional information (such as access and transport information, parking areas and retail type), this paper illustrates that it is possible to automatically and systematically classify retail spaces using fine scale topographic data. We begin with a discussion of the characteristics of various retail spaces, then describe how various measures can be modelled using a variety of national coverage datasets. We contrast three methodologies that use these metrics in different ways to generate classifications. The outputs of the analysis are compared in order to assess the efficacy of these approaches.
  3. The city is comprised of different types of retail space, varying in density, composition, and location, from ‘the high street’ and the shopping mall to the retail park and the factory outlet (Figure 1).
     Figure 1: A hierarchy of types of retail spaces found in urban areas.
     Table 1: Defining characteristics of a subset of retail spaces (variously sourced from Guy 1998; and Schillers 2001).
  4. 2.1 ‘The High Street’
     Typically the High Street contains a multiplicity of owners, with a predominance of retail outlets clustered in an unplanned manner along arterial roads, originally ‘seeded’ from periodic markets that once served the local community. Benefits include ease of access via public transport and the clustering of shops along the high street, which may additionally provide parking.
     2.2 The Shopping Mall
     The Shopping Mall is defined as ‘many units of shops but managed as a single property’ (Pitt and Musa 2008), and comes in many guises (DeLisle 2007), a Regional Shopping Centre being a large version of the Shopping Mall. The Shopping Mall may form an adjunct to the High Street, and is typically well served by public transport. Thus the key defining characteristics of Shopping Malls, from a morphological point of view, are that they often lie within a city block and have a high density of shops contained within one building.
     2.3 Retail Parks
     Retail Parks are ‘loose groupings of superstores and retail warehouses located at nodal points in the suburbs’ (Bromley and Thomas 1988, p4). They are often adjacent to major arterial roads or junctions, with access focused on the car. Factory Outlets follow a similar profile. Retail parks are found at the edge of town, where lower rental values result in expansive, single storey outlets. Table 1 summarises key characteristics identified from the literature.
  5. 3.1 Form, Composition and Extent
     The information used to measure the various characteristics was sourced from various ‘layers’ within Ordnance Survey’s MasterMap product. The Address layer provided information on the type of shop and the number of shops within a single building.
     3.2 Urban Centrality
     Among the defining characteristics of retail spaces is their location relative to the centre of town. The centrality measure, at any given location ‘x’, was devised by fitting a convex hull to the urban extent (Chaudhry and Mackaness 2008) and normalising the value between the edge of the urban extent and its centre (OS Meridian 2) (Figure 2).
     Figure 2: A measure of centrality.
     3.3 Accessibility
     Retail spaces are served by a mix of public and private transport. Central sites benefit from the hub effect of bus services, whilst edge of centre sites suffer poorer bus services, instead catering more for the car, with large shared parking. The density of bus stops at the location of retail spaces was determined using PointX data (Ordnance Survey 2009b). The density of roads servicing a particular site can be determined from road network data (Ordnance Survey 2009a). The density for both datasets is determined using kernel density estimation (Wasserman 2005) (Figure 3).
     Figure 3: (a) Density of bus stops in Edinburgh city (b) Density of OS ITN road nodes in Edinburgh city.
     By using these various metrics, we were able to model five of Guy’s (1998) ten suggested dimensions (namely function, size of store, physical form, location and development type).
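The kernel density step in section 3.3 can be illustrated with a minimal sketch. The notes do not state which kernel or bandwidth was used, so the isotropic Gaussian kernel, the 100 m bandwidth, the function name `gaussian_kde_2d` and the coordinates below are all assumptions chosen for illustration, not the authors' settings or real PointX data.

```python
import math

def gaussian_kde_2d(points, query, bandwidth):
    """Estimate point density at `query` using an isotropic Gaussian kernel.

    `points` is a list of (x, y) coordinates, `bandwidth` is the kernel
    standard deviation in the same units as the coordinates.
    """
    if not points:
        raise ValueError("at least one point is required")
    qx, qy = query
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for px, py in points:
        d2 = (px - qx) ** 2 + (py - qy) ** 2
        total += norm * math.exp(-d2 / (2.0 * bandwidth ** 2))
    # Average the kernel contributions so the surface integrates to 1.
    return total / len(points)

# Hypothetical bus stop coordinates in metres (illustrative only).
bus_stops = [(0.0, 0.0), (50.0, 10.0), (60.0, -20.0), (1000.0, 1000.0)]

central_density = gaussian_kde_2d(bus_stops, (40.0, 0.0), bandwidth=100.0)
edge_density = gaussian_kde_2d(bus_stops, (2000.0, 2000.0), bandwidth=100.0)
```

Under these assumptions, a site surrounded by stops scores a higher density than one far from any stop. This is the contrast the accessibility measure relies on.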
  6. 4. Identifying Retail Spaces
     Before we can begin the process of classification, we must first identify the retail spaces themselves. Determining the extent of a site was based on previous work (Chaudhry et al. 2009). Reflecting on the adage that ‘function defines form’, the algorithm is able to group together different features according to their shared function. Figure 4 illustrates how the algorithm has grouped together various features (a), and identified an aggregated retail space (b).
     Figure 4: (a) Selected features that belong to a retail space (b) Aggregated geometry of the retail space from features in (a). (c) Representation of this retail space at 1:25,000 scale. (Ordnance Survey © Crown Copyright. All rights reserved)
  7. 5. Three Alternate Methodologies: Boolean, Fuzzy and Bayesian
     The next challenge was to classify these spaces (Figure 4b) according to a hierarchy (Figure 1) using various metrics (section 3). The detection and classification of ‘high streets’ was based upon work by Chaudhry et al. (in press). The approach involved identifying clusters of commercial buildings using minimum spanning trees (Cormen et al. 2001; Regnauld 2003) and combining them with roads (Thom 2005; Thomson and Brooks 2007; Chaudhry and Mackaness 2005). By identifying continuous lengths of road associated with clusters of retail outlets, it was possible to identify ‘the high streets’ of a city (Figure 5).
     Figure 5: The highlighted road (about 1000m long) is that of Shirley high street in Southampton, UK.
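The clustering step can be sketched as follows. This is not the authors' implementation: it shows only the generic idea of building a minimum spanning tree over building locations and cutting long edges, and it omits the combination with road segments. The function name `mst_clusters`, the 30 m cut-off and the coordinates are hypothetical.

```python
import math
from itertools import combinations

def mst_clusters(points, max_edge):
    """Cluster points by building an MST (Kruskal) and cutting edges longer than max_edge.

    Returns a sorted list of clusters, each a list of point indices.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first, as Kruskal's algorithm requires.
    edges = sorted(
        (math.dist(points[a], points[b]), a, b)
        for a, b in combinations(range(len(points)), 2)
    )
    for length, a, b in edges:
        if length > max_edge:
            break  # every remaining edge is longer and would be cut anyway
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # this edge belongs to the MST

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Hypothetical shop centroids in metres along a street (illustrative only).
shops = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (500.0, 0.0), (510.0, 0.0)]
groups = mst_clusters(shops, max_edge=30.0)
```

Here the first three shops form one cluster and the last two another. In the approach described above, such clusters would then be associated with continuous lengths of road.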
  8. For the remaining retail spaces one can envisage a number of different methodologies for classification, using various metrics. A simple approach would be to determine, for each site, whether a particular characteristic was present or not (section 5.1). Alternatively, a range of values for each metric could be considered, affording more flexible definitions of different retail spaces (section 5.2). A third approach is to take a set of retail spaces whose classification is known, and to classify an unknown retail space by measuring how similar or dissimilar it is to the known sample (section 5.3). Each approach was applied to three cities in Great Britain: Edinburgh, Glasgow and Southampton.
     5.1 Boolean Logic
     In the first approach we used crisp definitions and thresholds for defining shopping malls and retail parks. In this type of inference, a retail space either is a shopping mall or is not (1 or 0).
     Figure 6: Criteria used to define a shopping mall.
     Figure 7: Criteria used to define a retail park.
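A crisp rule of this kind can be sketched as below. The actual criteria are in Figures 6 and 7, which are not reproduced in these notes, so the field names, the direction of the centrality scale and every threshold except the shop count of 50 (mentioned later in the notes) are assumptions made purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class RetailSpace:
    shop_count: int          # number of shops within the site
    centrality: float        # assumed scale: 0 = edge of urban extent, 1 = centre
    bus_stop_density: float  # assumed to be normalised to the range 0..1

# Hypothetical thresholds; only the shop count of 50 appears in the notes.
MALL_MIN_SHOPS = 50
MALL_MIN_CENTRALITY = 0.6
MALL_MIN_BUS_DENSITY = 0.5

def is_shopping_mall(site: RetailSpace) -> bool:
    """Boolean inference: every criterion must pass, otherwise the answer is False."""
    return (
        site.shop_count >= MALL_MIN_SHOPS
        and site.centrality >= MALL_MIN_CENTRALITY
        and site.bus_stop_density >= MALL_MIN_BUS_DENSITY
    )

large_central_site = RetailSpace(shop_count=60, centrality=0.8, bus_stop_density=0.7)
small_central_site = RetailSpace(shop_count=25, centrality=0.9, bus_stop_density=0.9)
```

The second site fails on shop count alone, however strongly it satisfies the other criteria. This illustrates how brittle a crisp definition can be.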
  10. Table 2 shows the results for the three cities, using these crisp definitions of a shopping mall and a retail park. The candidate sites in Tables 2, 3 and 4 refer to retail spaces generated by the approach outlined in section 4. Each retail space has at least one building classified as a shop by the Address Layer. Table 2 compares the automatically classified results against manually identified shopping malls and retail parks for the three test areas. The results are very poor, perhaps reflecting Guy’s observation that the classification of retail spaces lies along a continuum, and that each type of space can vary in its character and may not have all the attributes that we typically associate with a particular retail space.
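Comparing automatic against manual classification amounts to counting agreements, commission errors and omission errors. The sketch below shows one way to compute these counts. The function name and the site identifiers are hypothetical, and no figures from Tables 2 to 4 are reproduced.

```python
def classification_errors(predicted, reference):
    """Return (correct, commission, omission) counts for two collections of site ids.

    `predicted` holds sites the automatic method assigned to a class;
    `reference` holds sites assigned to that class manually.
    """
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)
    commission = len(predicted - reference)  # flagged automatically but not manually
    omission = len(reference - predicted)    # identified manually but missed automatically
    return correct, commission, omission

# Hypothetical site identifiers (illustrative only).
auto_malls = {"site_a", "site_b", "site_c"}
manual_malls = {"site_b", "site_c", "site_d", "site_e"}
correct, commission, omission = classification_errors(auto_malls, manual_malls)
```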
  11. 5.2 Fuzzy Logic
      Fuzzy logic acknowledges the vague nature of some of these characteristics, and explicitly recognises that we hold prototypical views of the different retail spaces. Instead of using crisp thresholds (0 or 1), we use normalised values (0 to 1) for each measure. For instance, instead of assigning 0 to the shop count of a retail space with 25 shops (fewer than the threshold of 50), we assign it a value of 0.5, dividing the actual value (25) by the threshold (50) (e.g. Figure 8). This normalisation uses the actual value of each measure together with its threshold value (Figures 6 and 7).
      Figure 8: Fuzzy and Boolean values against the shop frequency.
      In order to classify a retail space as a shopping mall or a retail park, we need to combine the values from the different measures into an overall value. A weighted linear average (as proposed by Luscher et al. 2008) was used. We calculate the degree of congruence between reality and the ideal prototype using equation 1, where con(Cj, Rk) is the congruence value of a constituent concept Cj of Ci, and the weight wj is the influence value of that subconcept. Initially all weights were set to 1. con(Ci, Rk) = 0 when a realisation Rk differs completely from a template Ci, and con(Ci, Rk) = 1 when they match perfectly.
      This approach correctly identifies many of the shopping malls and retail parks in the three test areas (Table 3), but there are quite a few omission and commission errors. Commission errors are especially significant in the case of shopping malls. This is because all the factors are given equal weights: a building with just 1 shop but with high centrality and accessibility will be classified as a shopping mall. The algorithm was therefore re-run with adjusted weights (in effect, the importance attached to each metric) in order to improve the quality of classification, seeking to minimise errors of commission and omission. The weights were set experimentally. The results show a close correlation with a manual classification (Table 3, last two columns).
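The weighted linear average described above can be sketched in a few lines. Equation 1 itself is not reproduced in these notes. The form assumed here, con(Ci, Rk) = Σj wj · con(Cj, Rk) / Σj wj, is the standard weighted mean and agrees with the stated boundary values of 0 and 1. Capping membership at 1, the membership values of 0.9 and the adjusted weights are illustrative assumptions, not the authors' values.

```python
def fuzzy_membership(value, threshold):
    """Normalise a measure to 0..1 by dividing by its crisp threshold (capped at 1)."""
    return min(value / threshold, 1.0)

def congruence(memberships, weights=None):
    """Weighted linear average of sub-concept congruence values."""
    if weights is None:
        weights = [1.0] * len(memberships)  # all weights initially equal to 1
    return sum(w * m for w, m in zip(weights, memberships)) / sum(weights)

# The worked example from the text: 25 shops against a threshold of 50.
shop_score = fuzzy_membership(25, 50)

# A single-shop building with high centrality and accessibility
# (the 0.9 values are hypothetical).
single_shop = [fuzzy_membership(1, 50), 0.9, 0.9]
equal_weights_score = congruence(single_shop)
# Hypothetical re-weighting that emphasises shop count.
reweighted_score = congruence(single_shop, weights=[3.0, 1.0, 1.0])
```

With equal weights the single-shop building scores about 0.61, which shows how commission errors arise. Giving shop count more influence pulls the score down to about 0.37.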
  12. 5.3 Bayesian Inference
      A third approach was to use Bayesian inference, which obviates the need to normalise and weight the metrics. Bayes’ Rule is a simple way of calculating conditional probabilities (Hacking 2001). Using a Bayesian approach, we can answer questions of the following form: ‘For a given, unclassified retail space with a specific set of characteristics, what is the likelihood that it belongs to the population of ‘shopping malls’ with their specific set of characteristics?’ What is returned is a probability value reflecting the likelihood that the unclassified site is indeed a shopping mall. We used an approach similar to that proposed by Luscher et al. (2009). The joint conditional probability for a classification of an unknown is given by equation 2, where Pc is the conditional probability of the unknown for the predicted class C, N is the number of samples in the training dataset, K is the standard normal distribution function, and the remaining terms are the bandwidths, the vector of properties (Figures 6 and 7) of the unknown, and the vector of the same properties for the training dataset.
      In this research the manually classified shopping malls and retail parks for Edinburgh were used as the training dataset. Once trained, the classifier was used to classify shopping malls and retail parks in Glasgow and Southampton. The results in Table 4 show very few omission and commission errors compared with the previous two approaches.
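Equation 2 is not reproduced in these notes. One form consistent with the listed terms is a product-kernel density estimate, P_C(x) = (1/N) Σi Πj (1/hj) K((xj − xij)/hj), where the symbols h (bandwidths), x (unknown) and xi (training samples) are names assumed here. The sketch below implements that assumed form and applies Bayes' rule with priors proportional to class size, which is also an assumption. The feature vectors and bandwidths are invented for illustration.

```python
import math

def std_normal_pdf(u):
    """Standard normal density, used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def class_density(x, training, bandwidths):
    """Product-kernel density estimate of x given one class's training vectors."""
    total = 0.0
    for sample in training:
        product = 1.0
        for xj, sj, hj in zip(x, sample, bandwidths):
            product *= std_normal_pdf((xj - sj) / hj) / hj
        total += product
    return total / len(training)

def posterior(x, training_by_class, bandwidths):
    """Bayes' rule with priors proportional to class sizes in the training data."""
    n_total = sum(len(samples) for samples in training_by_class.values())
    joint = {
        label: (len(samples) / n_total) * class_density(x, samples, bandwidths)
        for label, samples in training_by_class.items()
    }
    evidence = sum(joint.values())
    return {label: value / evidence for label, value in joint.items()}

# Hypothetical training vectors: (shop count, centrality). Not the Edinburgh data.
training = {
    "shopping_mall": [(60, 0.90), (80, 0.80), (55, 0.85)],
    "retail_park": [(12, 0.20), (20, 0.30), (8, 0.10)],
}
bandwidths = (15.0, 0.2)  # assumed, one per property

probabilities = posterior((70, 0.8), training, bandwidths)
```

Because each property has its own bandwidth, the raw metrics need neither normalising nor weighting. This matches the advantage claimed for the Bayesian approach.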
  13. 6. Conclusion
      DeLisle (2005, p2) talks of a ‘dynamic tension between too few and too many classes’ in retail space classification, arguing that the basis of a sound system of classification is one that has metrics that are unambiguous, meaningful and measurable. Where definitions can be agreed, it is possible to automate the process of identifying and classifying different retail spaces. This paper has proposed a frame of reference that could be used as a basis for painting a national picture, as well as helping to guide any re-assessment of those criteria. Each methodology has its strengths and limitations in terms of clarity, ease of use, and data requirements. These techniques can produce a systematic classification of retail spaces through the uniform application of meaningful criteria. The evaluation supports the approach of seeking to minimise classification error. The work also illustrates the breadth of utility afforded by Ordnance Survey data, in particular MasterMap data, and how such databases can be enriched through the use of various analysis techniques.