Parallel k-means clustering using GPUs for the Geocomputation of Real-time Geodemographics
Muhammad Adnan, Alex Singleton, Paul Longley

Presentation Outline
- Geodemographic classifications
- Does one size fit all?
- Bespoke classifications
- Live data
- Major challenge for ‘on the fly’ classifications
- Enhancements for the k-means clustering algorithm
  - Parallel k-means clustering using Nvidia graphics cards
  - Comparison of k-means and parallel k-means
  - Enhancement using the within sum of squares and standard deviation of clusters
- Conclusion

Geodemographic Classifications
- OAC by ONS
- Mosaic by Experian
- Acorn by CACI
- Microvision by NDS/Equifax
- All of these classifications give a national-level overview of the UK by grouping areas into homogeneous clusters.

Does one size fit all?
OAC (Output Area Classification): London
OAC (Output Area Classification): Birmingham

Does one size fit all?
- There are some interesting underlying patterns at the finest geographical levels.
- Different bespoke classifications have been developed over the past few years:
  - E-society classification by UCL
  - Education classification by UCL
  - Crucible by Tesco
Employment classification of Yorkshire and the Humber
OAC (Output Area Classification)

Live data is available on the web
- ONS NESS API (live XML feeds)
- Police API (live XML feeds)
- Hundreds of data sources on education, crime, transport and the environment (www.data.gov.uk)
- This increases the need for ‘on the fly’ bespoke classifications:
  - Users control how the classification is created and how variables are weighted.
  - Classifications are produced within minutes rather than hours.

Major challenge for ‘on the fly’ classifications
- K-means is used to create geodemographic classifications.
- K-means is an unstable clustering algorithm.
- Creating a classification with k-means requires running it multiple times on a data set:
  - 10,000 times (Singleton & Longley, 2008)
- Creating OAC (k = 7 groups) with k-means requires approximately 11.75 hours on a high-specification computer.

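(Taken at face value, 11.75 hours for 10,000 restarts works out at roughly 4.2 seconds per individual k-means run.)
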
Enhancements for k-means
- This paper presents two enhancement methods for k-means:
  - A parallel version of k-means that runs on the GPUs of Nvidia graphics cards.
  - Convergence of the k-means algorithm by comparing the ‘within sum of squares’ and ‘standard deviation’ of consecutive runs.

Parallel k-means clustering algorithm

Nvidia Graphics Cards
- Nvidia graphics cards have multiple GPUs (graphical processing units).
  - The GeForce 8600M GS has 16 GPUs.
- Each GPU can run one process independently of the others.
- Programmers use C/C++ to write programs that run on Nvidia graphics cards (a minimal example follows below).
- GPUs can be used for the parallel computation of computationally expensive algorithms.
[Pictured: a card with 1000 GPUs]

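To make the C/C++ point concrete, here is a minimal CUDA C sketch (illustrative only, not from the original slides): a trivial kernel squares the elements of an array, with each GPU thread handling one element. The kernel name, array size and block size are arbitrary choices for the example.

// Minimal CUDA C example: each GPU thread squares one array element.
// Compile with nvcc.
#include <cstdio>

__global__ void square(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n) data[i] = data[i] * data[i];
}

int main() {
    const int n = 1024;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    float *dev;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    square<<<(n + 255) / 256, 256>>>(dev, n);        // launch the kernel on the GPU

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    printf("host[3] = %f\n", host[3]);               // prints 9.0
    return 0;
}
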
Parallel k-means
- The user specifies K and N, where K = number of clusters and N = number of k-means runs.

Step 1 (CPU)
- Count the number of GPUs.
- Prepare the data points.
- Upload the data to the GPUs.

Step 2 (graphics card: GPU-1, GPU-2, GPU-3, ..., GPU-N)
- Perform k-means clustering by minimizing the within sum of squares.
- Return the result to the CPU.

Step 3 (CPU)
- The CPU keeps delegating data points to the GPUs until N runs have completed.
- The CPU compares the ‘within sum of squares’ of each run.
- The run with the ‘minimum within sum of squares’ is the geodemographic classification.

(A code sketch of this three-step flow follows below.)

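The following CUDA C sketch illustrates the three-step flow above. It is a simplified illustration rather than the authors' implementation: it runs on a single device, keeps the centroid-update step on the CPU, and the data size, number of variables (DIM), number of runs and iteration count are all assumed for the example, whereas the original design delegates whole k-means runs to separate GPUs.

// Simplified parallel k-means sketch: the GPU kernel does the point-to-centroid
// assignment and accumulates the within sum of squares (WSS); the CPU updates
// centroids, repeats N random restarts, and keeps the run with the lowest WSS.
#include <cstdio>
#include <cstdlib>
#include <cfloat>

#define DIM 4   // number of census variables per area (illustrative)

__global__ void assign_points(const float *pts, const float *cent,
                              int *label, float *wss, int n, int k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float best = FLT_MAX; int bestc = 0;
    for (int c = 0; c < k; ++c) {                 // nearest centroid for point i
        float d = 0.f;
        for (int j = 0; j < DIM; ++j) {
            float diff = pts[i * DIM + j] - cent[c * DIM + j];
            d += diff * diff;
        }
        if (d < best) { best = d; bestc = c; }
    }
    label[i] = bestc;
    atomicAdd(wss, best);                         // accumulate within sum of squares
}

int main() {
    const int n = 10000, k = 7, runs = 100, iters = 20;      // illustrative sizes
    float *pts = (float *)malloc(n * DIM * sizeof(float));
    for (int i = 0; i < n * DIM; ++i) pts[i] = rand() / (float)RAND_MAX;

    float *d_pts, *d_cent, *d_wss; int *d_label;
    cudaMalloc(&d_pts, n * DIM * sizeof(float));
    cudaMalloc(&d_cent, k * DIM * sizeof(float));
    cudaMalloc(&d_label, n * sizeof(int));
    cudaMalloc(&d_wss, sizeof(float));
    cudaMemcpy(d_pts, pts, n * DIM * sizeof(float), cudaMemcpyHostToDevice);   // Step 1

    float cent[k * DIM], best_wss = FLT_MAX;
    int *label = (int *)malloc(n * sizeof(int));
    for (int r = 0; r < runs; ++r) {                          // Step 3: repeat N runs
        for (int c = 0; c < k; ++c) {                         // start from k random points
            int p = rand() % n;
            for (int j = 0; j < DIM; ++j) cent[c * DIM + j] = pts[p * DIM + j];
        }
        float wss = 0.f;
        for (int it = 0; it < iters; ++it) {                  // Step 2: assignment on the GPU
            wss = 0.f;
            cudaMemcpy(d_cent, cent, sizeof(cent), cudaMemcpyHostToDevice);
            cudaMemcpy(d_wss, &wss, sizeof(float), cudaMemcpyHostToDevice);
            assign_points<<<(n + 255) / 256, 256>>>(d_pts, d_cent, d_label, d_wss, n, k);
            cudaMemcpy(label, d_label, n * sizeof(int), cudaMemcpyDeviceToHost);
            cudaMemcpy(&wss, d_wss, sizeof(float), cudaMemcpyDeviceToHost);

            float sum[k * DIM] = {0}; int cnt[k] = {0};       // centroid update on the CPU
            for (int i = 0; i < n; ++i) {
                cnt[label[i]]++;
                for (int j = 0; j < DIM; ++j)
                    sum[label[i] * DIM + j] += pts[i * DIM + j];
            }
            for (int c = 0; c < k; ++c)
                if (cnt[c])
                    for (int j = 0; j < DIM; ++j)
                        cent[c * DIM + j] = sum[c * DIM + j] / cnt[c];
        }
        if (wss < best_wss) best_wss = wss;                   // keep the run with minimum WSS
    }
    printf("best within sum of squares over %d runs: %f\n", runs, best_wss);
    return 0;
}
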
Comparing k-means and parallel k-means
- OA (Output Area) level results

Comparing k-means and parallel k-means
- LSOA (Lower Super Output Area) level results

Comparing k-means and parallel k-means
- Ward level results

Efficiency achieved by using parallel k-means
- OAC classification by parallel k-means
- Parallel k-means gives roughly a 90% efficiency gain over k-means.

No. of clusters   K-means   Parallel k-means   Efficiency gain
7                 9 sec.    0.54 sec.          94%
12                25 sec.   1.5 sec.           93%
52                38 sec.   2.16 sec.          89%

Within sum of squares of k-means
- Running k-means on OA (Output Area) level data for the UK, with K = 7.

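For reference, the ‘within sum of squares’ that each k-means run minimizes, and on which the runs are compared, is the standard quantity
WSS = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
where C_j is the set of areas assigned to cluster j and \mu_j is that cluster's centroid (mean).
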
2nd performance enhancement for k-means: establishing a threshold value
- If the threshold remains the same for another 100 runs, terminate the algorithm. (A sketch of this stopping rule follows below.)

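Below is a minimal sketch of one way to read this stopping rule; it assumes the threshold is the best within sum of squares seen so far, and that the search stops once that value has not improved for 100 consecutive runs. The standard-deviation comparison mentioned on the enhancements slide is omitted, and run_kmeans() is a hypothetical stand-in for one complete k-means run.

// Sketch of the early-stopping rule: keep the lowest within sum of squares
// seen so far as the threshold and stop once it has not improved for
// 100 consecutive runs, instead of always running 10,000 restarts.
#include <cfloat>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for one complete k-means run; it returns a random
// "within sum of squares" here purely so the stopping rule can be exercised.
static float run_kmeans(void) {
    return 1000.0f + rand() / (float)RAND_MAX;
}

int main(void) {
    const int PATIENCE = 100;     // "another 100 runs" from the slide
    const int MAX_RUNS = 10000;   // upper bound from Singleton & Longley (2008)
    float best_wss = FLT_MAX;
    int runs_since_improvement = 0, total_runs = 0;

    for (int r = 0; r < MAX_RUNS; ++r) {
        float wss = run_kmeans();           // one restart of k-means
        total_runs = r + 1;
        if (wss < best_wss) {               // a new, lower threshold is established
            best_wss = wss;
            runs_since_improvement = 0;
        } else if (++runs_since_improvement >= PATIENCE) {
            break;                          // threshold unchanged for 100 runs: stop
        }
    }
    printf("stopped after %d runs, best WSS = %f\n", total_runs, best_wss);
    return 0;
}
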
Testing the approach
- OA (Output Area) level data for the UK was used.
- K-means for K = 7.
- This approach is substantially faster than running k-means 10,000 times.

Run number   Convergence achieved
1            1016 runs
2            928 runs
3            1800 runs
4            826 runs

Conclusion & Future Work
- The need for real-time bespoke geodemographic classifications is increasing.
- Parallel k-means is faster than the standard k-means clustering algorithm.
- Parallel k-means can be used for ‘on the fly’ creation of geodemographic classifications.
- Parallel k-means can be combined with the 2nd approach described in this paper for enhanced computational throughput.

Thank you for listening. Any questions?
