Operation Point Cluster - Blue Raster Esri Developer Summit 2013 Presentation
 

    Presentation Transcript

    • Brendan Collins
    • “The function of the brain and nervous system is to protect us from being overwhelmed and confused by this mass of largely useless and irrelevant knowledge, by shutting out most of what we should otherwise perceive or remember at any moment, and leaving only that very small and special selection which is likely to be practically useful.” - Aldous Huxley
    • 103,000 Public Schools (No Clustering)
    • 103,000 Public Schools (Count)
    • 103,000 Public Schools (Mean Student Teacher Ratio)
    • Operation Point Cluster
      • Review general clustering algorithms
      • Suggest strategies & implementations for clustering for web applications
        – Server-side (C#)
        – Offline w/ ArcGIS (Python)
        – Offline w/ 3rd Party (Python)
    • Data Classification (One-Dimensional Clustering)
      • Equal-interval
        – Clusters have the same interval (max – min)
      • Quantile
        – Clusters have the same count
      • Natural Breaks (Jenks)
        – Clusters have minimum deviation from their own mean
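The first two classification schemes above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the presenter's code: the function names and the sample student-teacher ratios are made up for the example, and Natural Breaks (Jenks) is left out because it needs an optimization step.

```python
def equal_interval_breaks(values, k):
    """Upper bounds of k classes that each span the same (max - min) / k range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]


def quantile_breaks(values, k):
    """Upper bounds of k classes that each hold roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    # The i-th class ends at rank ceil(i * n / k), computed with integer math.
    return [ordered[(i * n + k - 1) // k - 1] for i in range(1, k + 1)]


def classify(value, breaks):
    """Index of the first class whose upper bound is >= value."""
    for index, upper in enumerate(breaks):
        if value <= upper:
            return index
    return len(breaks) - 1


# Hypothetical student-teacher ratios, with one outlier at 40.
ratios = [8, 10, 11, 12, 13, 14, 15, 16, 18, 19, 22, 40]
print(equal_interval_breaks(ratios, 4))
print(quantile_breaks(ratios, 4))
```

With an outlier in the data, the equal-interval classes end up mostly empty at the high end, while the quantile classes stay evenly populated. That trade-off is usually the reason to pick one scheme over the other.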
    • KMeans (Centroid-based)
    • KMeans (Centroid-based)
      1. Choose random starting points (centroids)
      2. Assign each target point to its nearest centroid
      3. Replace each centroid with the mean of its group
      4. Repeat steps 2 & 3 until convergence
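The four steps above map almost directly onto code. The following is a small pure-Python sketch for 2-D points, not the implementation used in the talk; the `kmeans` name, the `seed` parameter, and the handling of empty clusters are assumptions made for the example. In practice a library such as Scikit-Learn would normally be used instead.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster 2-D points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: choose k random input points as the starting centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)
            for x, y in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: replace each centroid with the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(pts, k=2)
print(centroids, labels)
```

Because the starting centroids are random, which cluster gets which label can differ between seeds even when the grouping itself is the same.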
    • Grid Clustering (Grid-based)
      1. Overlay a mesh sized appropriately for the zoom level
      2. Compare point coordinates to the mesh to create clusters
      • Very common on client-side
      • Can lead to undesired “Grid” effect
      • Somewhat non-deterministic
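A rough sketch of the two grid steps, assuming points are already in a projected coordinate system. The function names are illustrative, and `cell_size_for_zoom` assumes Web Mercator with 256-pixel tiles and a hypothetical 64-pixel cell, neither of which comes from the slides.

```python
import math
from collections import defaultdict


def cell_size_for_zoom(zoom, cell_pixels=64):
    """Mesh size in map units for a zoom level (assumes Web Mercator, 256 px tiles)."""
    meters_per_pixel = 2 * math.pi * 6378137 / 256 / (2 ** zoom)
    return meters_per_pixel * cell_pixels


def grid_cluster(points, cell_size):
    """Group (x, y) points by mesh cell; returns (mean_x, mean_y, count) per cell."""
    cells = defaultdict(list)
    for x, y in points:
        # Step 2: the integer column/row of the cell that contains the point.
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append((x, y))
    clusters = []
    for members in cells.values():
        n = len(members)
        clusters.append(
            (sum(p[0] for p in members) / n, sum(p[1] for p in members) / n, n)
        )
    return clusters


print(grid_cluster([(1, 1), (2, 2), (11, 1), (-1, -1)], cell_size=10))
```

Two points that sit close together but on opposite sides of a cell boundary land in different clusters, which is one source of the “Grid” effect noted above.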
    • QuadTree (Distance-based) http://en.wikipedia.org/wiki/QUADTREE
    • QuadTree (Distance-based)
      1. Input minimum cluster tolerance
      2. Recursively insert points into existing tree
         1. Where distance < tolerance, number of points++
         2. Where distance > tolerance, insert to child node
      • Easy to implement
      • Can lead to “Grid” effect
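One possible reading of the insertion rule above, sketched as a point quadtree in which each node stands for a cluster. The class and function names are hypothetical, the descent is written as a loop rather than true recursion, and the result depends on insertion order, so treat this as an illustration rather than the presenter's server-side code.

```python
import math


class ClusterNode:
    """A quadtree node that doubles as a cluster: a location plus a point count."""

    def __init__(self, x, y):
        self.x, self.y = x, y
        self.count = 1
        self.children = {}  # quadrant (east?, north?) -> ClusterNode

    def insert(self, x, y, tolerance):
        node = self
        while True:
            # Within tolerance of this node: absorb the point into its cluster.
            if math.hypot(x - node.x, y - node.y) <= tolerance:
                node.count += 1
                return
            # Otherwise descend into the quadrant that contains the point.
            quadrant = (x >= node.x, y >= node.y)
            child = node.children.get(quadrant)
            if child is None:
                node.children[quadrant] = ClusterNode(x, y)
                return
            node = child


def quadtree_cluster(points, tolerance):
    """Return one (x, y, count) tuple per cluster."""
    if not points:
        return []
    root = ClusterNode(*points[0])
    for x, y in points[1:]:
        root.insert(x, y, tolerance)
    clusters, stack = [], [root]
    while stack:
        node = stack.pop()
        clusters.append((node.x, node.y, node.count))
        stack.extend(node.children.values())
    return clusters


print(quadtree_cluster([(0, 0), (0.5, 0.5), (10, 10), (10.2, 9.9), (-5, 5)], 1.0))
```

A point only ever compares itself against nodes along one branch, so two nearby points in different quadrants are never merged.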
    • DBSCAN (Density-based) http://en.wikipedia.org/wiki/DBSCAN
    • DBSCAN (Density-based)
      1. Takes search radius and minimum number of points for cluster
      2. Visit each point and count number of points in search radius
      • Clusters can be any shape
      • Search radius determined by zoom level
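A compact pure-Python sketch of DBSCAN using the two inputs named above. The parameter names `radius` and `min_points` are chosen for the example. The brute-force neighbor search is O(n²) and only suitable for small inputs, so a real deployment would more likely use a spatial index or the implementation in Scikit-Learn.

```python
import math

NOISE = -1


def dbscan(points, radius, min_points):
    """Label each (x, y) point with a cluster id (0, 1, ...) or NOISE."""

    def neighbors(i):
        xi, yi = points[i]
        # Every point within the search radius, including the point itself.
        return [j for j, (xj, yj) in enumerate(points)
                if math.hypot(xi - xj, yi - yj) <= radius]

    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_points:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            reachable = neighbors(j)
            if len(reachable) >= min_points:
                queue.extend(reachable)  # core point: keep growing the cluster
    return labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, radius=1.5, min_points=3))
```

Isolated points are reported as noise rather than forced into a cluster, which is one practical difference from KMeans.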
    • Strategies & Implementations for Web Apps (Server Object Extension vs. Pre-Crunched)
    • Where should clustering occur?
      • Client-side
        – Small number of points (< 10,000)
        – No additional server load
        – Widely available within client APIs
        – Limited by client-side languages
      • Server-side
        – Medium number of points (< 1M)
        – Many language/library options
        – Robust querying
        – Very maintainable / extendible
      • Offline
        – Large number of points (> 1M)
        – Many language/library options
        – Limited querying
        – Output: normal feature class
    • Clustering Server Object Extension (C#/QuadTree)
      1. Extends MapServer
      2. Wraps map query based on extent
      3. Returns clustered results
      4. Stateless
      5. Problems
         1. Re-calculates tree on each request
         2. Client-side wrappers
         3. Lost out-of-box ArcGIS Server functions
    • Clustering with Arcpy (distance-based / offline)
      1. Divide data into logical chunks (where clause)
      2. Integrate using tolerance
      3. Collect Events
      4. Spatial Join to add descriptive statistics
      5. Append all results
    • Clustering w/ Python
      • Numpy/Scipy – de facto standard
      • Scikit-Learn – Python machine learning library
      • PyTables – HDF5, akin to NetCDF, but with support for hierarchical tables and very scalable
        – http://bcdcspatial.blogspot.com/2013/02/converting-arcgis-feature-class-to.html
    • Scikit-Learn: btw, it’s awesome - http://scikit-learn.org/stable/
    • Bleeding Edge Python
      • PyPy, Cython, Anaconda, NumbaPro, Pandas
      • Python is now a first-class citizen on the GPU!
    • In Summary:
      • Clustering is not Panning
      • Think outside Count
      • Clustering is not only for spatial data
    • Thank You!
      • Follow us on Twitter: @blueraster @brendancol
      • Visit us at: blueraster.com/blog, bcdcspatial.blogspot.com