Moving to real time segmentation: efficient computation of geodemographic classification
Moving to real time segmentation: efficient computation of geodemographic classification,
GISRUK 2009, University of Durham, Durham.

M Adnan, A D Singleton, C Brunsdon and P A Longley

Presentation Transcript

  • Moving to real time segmentation: efficient computation of Geodemographic classification Adnan, M., Singleton, A.D., Brunsdon, C., Longley, P.A.
  • Presentation Outline
    • Need for real time Geodemographics.
    • What are real time Geodemographics?
    • Computational challenges
    • Clustering Algorithms
      • K-means
      • Clara and GA (Genetic Algorithm)
    • Comparison of Clustering Algorithms
    • Web based clustering tool demo
  • Need for real time Geodemographics
    • Current classifications are created using static data sources.
    • The rate and scale of current population change are making large surveys (e.g. the census) increasingly redundant.
        • Significant hidden value in transactional data
    • Data is increasingly available in near real time, e.g. the ONS NESS API.
    • Application specific (bespoke) classifications have demonstrated utility.
  • What are real time Geodemographics?
  • Computational challenges
    • Integration of large and possibly disparate databases.
    • Data normalisation and optimisation for fast transactions.
    • Minimising the computational time of clustering algorithms (very important).
  • Some Clustering algorithms
    • K-Means
    • PAM (Partitioning Around Medoids)
    • CLARA (Clustering Large Applications)
    • GA (Genetic Algorithm)
    • K-Means++
    • Fuzzy Clustering Algorithms
    • This paper: K-means, CLARA, and GA.
  • K-means
    • Attempts to find cluster centroids by minimising the within-cluster sum of squared distances.
    • K-means is unstable because its result depends on the initial seed assignment.
    • Creating a Geodemographic classification therefore requires running the algorithm multiple times.
        • Computationally expensive in a real time environment.
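The instability described on this slide can be illustrated with a minimal sketch, assuming plain Lloyd-style k-means started from randomly chosen seed points. The function names, the number of runs, and the synthetic 2-D data are illustrative assumptions; this is not the OAC dataset or the authors' implementation.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, rng, max_iter=100):
    """One k-means run from random initial seeds; returns (centroids, wcss)."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Within-cluster sum of squares: the quantity k-means tries to minimise.
    wcss = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, wcss

def best_of_runs(points, k, n_runs=100, seed=0):
    """Repeat k-means with different seeds; return the best run and the score range."""
    rng = random.Random(seed)
    results = [kmeans(points, k, rng) for _ in range(n_runs)]
    scores = [wcss for _, wcss in results]
    best = min(results, key=lambda r: r[1])
    return best, min(scores), max(scores)

# Synthetic data: three loose groups of 30 points each (an assumption for the demo).
data_rng = random.Random(42)
centres = [(0, 0), (10, 0), (5, 8)]
points = [
    (cx + data_rng.gauss(0, 1), cy + data_rng.gauss(0, 1))
    for cx, cy in centres
    for _ in range(30)
]

(best_centroids, best_wcss), lowest, highest = best_of_runs(points, k=3, n_runs=20)
print(f"best WCSS {best_wcss:.1f}, range over runs {lowest:.1f} to {highest:.1f}")
```

Any gap between the lowest and highest score across runs reflects the dependence on initial seeds, and the cost of closing that gap grows linearly with the number of runs.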
  • K-means (100 runs of k-means on OAC data set for k=5)
  • An example of a bad clustering result
  • PAM, CLARA and Genetic Algorithm
    • PAM (Partitioning Around Medoids) tries to minimize the sum of distances from the objects to their cluster centers (medoids).
    • CLARA draws multiple samples of the dataset, applies PAM to each sample and returns the best result.
    • GA (Genetic Algorithm) is inspired by models of biological evolution. It produces results through a breeding procedure.
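The CLARA idea described above (sample, run PAM on the sample, keep the best result) can be sketched as follows. This is a simplified illustration under stated assumptions: the PAM step uses random starting medoids and accepts the first improving swap rather than the full BUILD/SWAP procedure, and the sample count, sample size, and function names are hypothetical.

```python
import random

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)

def pam(points, k, rng):
    """Simplified PAM: random start, then accept any swap that lowers the cost."""
    medoids = rng.sample(points, k)
    cost = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                trial_cost = total_cost(points, trial)
                if trial_cost < cost:
                    medoids, cost, improved = trial, trial_cost, True
    return medoids

def clara(points, k, n_samples=5, sample_size=40, seed=0):
    """Run PAM on several random samples; keep the medoids that fit ALL points best."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = pam(sample, k, rng)
        # The sample's medoids are scored against the full dataset, not the sample.
        cost = total_cost(points, medoids)
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost

# Synthetic demo data (an assumption): 300 random points in a 100 x 100 square.
demo_rng = random.Random(1)
demo_points = [(demo_rng.uniform(0, 100), demo_rng.uniform(0, 100)) for _ in range(300)]
demo_medoids, demo_cost = clara(demo_points, k=4)
print(demo_medoids, round(demo_cost, 1))
```

Because PAM only ever sees a fixed-size sample, the expensive swap search does not grow with the full dataset; only the final scoring pass does.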
  • Comparing computational efficiency… PAM, and GA on the three geographic aggregations of a dataset covering London.
    • Figure 1: OA (Output Area) level results
    • Figure 2: LSOA (Lower Super Output Area) level results
    • Figure 3: Ward level results
  • Comparing classification optimisation efficiency
    • Figure 4: OA (Output Area) level results
    • Figure 5: LSOA (Lower Super Output Area) level results
    • Figure 6: Ward level results
  • Algorithm Stability (w.r.t. computational time)
    • Figure 7: Running k-means on OA (Output Area) 120 times on each iteration
    • Figure 8: Running CLARA on OA (Output Area) 120 times on each iteration
    • Figure 9: Running GA on OA (Output Area) 120 times on each iteration
  • Some Outcomes
    • For larger datasets:
      • Computational (time) efficiency => PAM
      • Classification (better clustering) efficiency => Genetic Clustering
    • For smaller datasets:
      • Computational (time) efficiency => K-Means
      • Classification (better clustering) efficiency => PAM
  • K-means and Principal Component Analysis
    • PCA can be used to facilitate K-means clustering by reducing dimensions.
    • (Ding, C., He, X., 2004)
    • Figure 10: K-means result for 41 “OAC variables”
    • Figure 11: K-means result for 26 “OAC Principal Components”
  • Conclusion and Future work
    • CLARA and GA are plausible alternatives to k-means in a real time Geodemographic classification system.
    • K-means might be combined with PCA for improved computational performance.
    • In an online environment k-means is better for small data sets.
    • In a real time geodemographic classification system, a clustering algorithm can be chosen at run time.
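Choosing an algorithm at run time could look like the sketch below, which simply encodes the recommendations from the "Some Outcomes" slide as a lookup. The function name, the `priority` labels, and the size threshold separating "small" from "large" datasets are hypothetical; the presentation does not give a cut-off.

```python
def choose_algorithm(n_areas, priority="time", large_threshold=10_000):
    """Pick a clustering algorithm name from dataset size and the user's priority.

    n_areas         -- number of areas (rows) to be clustered
    priority        -- "time" for fastest computation, "quality" for better clustering
    large_threshold -- assumed boundary between small and large datasets
    """
    if priority not in ("time", "quality"):
        raise ValueError("priority must be 'time' or 'quality'")
    if n_areas >= large_threshold:
        # Larger datasets: PAM for speed, genetic clustering for quality.
        return "PAM" if priority == "time" else "GA"
    # Smaller datasets: k-means for speed, PAM for quality.
    return "k-means" if priority == "time" else "PAM"

print(choose_algorithm(500))
print(choose_algorithm(50_000, priority="quality"))
```

In a web-based tool the returned name would be used to dispatch to the corresponding clustering routine once the input data has been assembled.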
  • Web based clustering tool demo