This presentation describes tools and possible workflows using the Grouping Analysis tool in ArcGIS. The tutorial developed from this material highlights practical usage of Grouping Analysis with additional tools to solve real-world problems in two scenarios and is suitable for ArcGIS users at any level of experience. The tutorial was produced as a Major Research Project in GIS for Business at the Centre of Geographic Sciences, sponsored by Esri.
3. Lauren Rosenshein Bennett, MS
Geoprocessing Product Engineer, Esri
Lbennett@esri.com
Dr. Konrad Dramowicz
Faculty, Centre of Geographic Sciences
Konrad.Dramowicz@nscc.ca
Dr. Ela Dramowicz
Faculty, Centre of Geographic Sciences
Ela.Dramowicz@nscc.ca
Introduction
Project Sponsor & Supervisors
4. Introduction
• Experimental testing of tool with
multiple datasets
• Incorporation of Grouping
Analysis with other tools
• Review of technical literature
on clustering algorithms
• Review of existing tutorials
Project Overview
5. Introduction
• Introduced at ArcGIS 10.1
• Available with Basic, Standard and
Advanced license levels
• Found in the Spatial Statistics
toolbox, within the Mapping
Clusters toolset
• Script tool
Grouping Analysis Tool
6. Introduction
• “...Performs a classification
procedure that tries to find natural
clusters in your data.” - Esri
• An aid for data comprehension
• Feature similarity is based on
attributes specified as analysis fields
and optionally, spatial constraints
• Given a number of groups, features
within each output group are as
similar as possible while groups are
as different as possible
Grouping Analysis Tool
7. Introduction
• Two algorithm types: cluster
analysis (traditional K-means) and
regionalization (spatial K-means)
• Thirteen parameters
(six required)
• Grouping results contingent on
the number of groups, analysis
fields, and type of spatial
constraint
Grouping Analysis Tool
8. Data
Features:
• Esri
• City of Vancouver
Multivariate Data:
• World Bank
• BBC
• Weatherbase
• Statistics Canada
Data Sources
9. Data
• Data Enrichment (ArcGIS Online)
• HTML table import
• Spreadsheet reformatting
• Table joins
• Feature class edits
Data Preparation
10. Data
Selection Criteria:
• Two scales of analysis
• Illustration of various spatial
constraint effects on results
• Sufficient number of features
• Visible spatial patterns in results
Tutorial Datasets
11. General Steps:
• Exploratory data analysis
• Preprocessing
• Determining appropriate Grouping
Analysis settings
• Postprocessing, interpretation
and evaluation of results
Grouping Analysis Workflows
12. Exploratory Data Analysis
1. Distribution of variable values
• Thematic mapping
• Spatial autocorrelation
2. Spatial relationships among
features
• Contiguity of features and number
of neighbours
• Spatial autocorrelation
Exploratory Data Analysis
13. Exploratory Data Analysis
• Explore distribution of dataset
variables
• Choropleth maps and graduated
symbol maps
• Identify set of variables to be used
for Grouping Analysis
Thematic Mapping
14. Exploratory Data Analysis
• Analyze contiguity relationships
among features
• Polygon Neighbors tool
• Determine relative connectivity of
features by counting number
of neighbours
• Frequency tool
Spatial Relationships
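The neighbour-counting step above can be sketched in plain Python. This is only an illustration: the `pairs` list is hypothetical sample data standing in for a table of source/neighbour IDs such as the Polygon Neighbors tool produces, and the two `Counter` objects mimic what the Frequency tool would summarize.

```python
from collections import Counter

# Hypothetical contiguity pairs (source_id, neighbour_id), listed in both directions.
pairs = [
    ("A", "B"), ("B", "A"),
    ("A", "C"), ("C", "A"),
    ("B", "C"), ("C", "B"),
    ("C", "D"), ("D", "C"),
]

# Number of neighbours per feature (one row per source/neighbour pair).
neighbour_count = Counter(src for src, _ in pairs)

# How many features have 1, 2, 3, ... neighbours.
count_frequency = Counter(neighbour_count.values())

for feature, n in sorted(neighbour_count.items()):
    print(feature, n)
print(dict(count_frequency))
```

Features with no neighbours at all never appear in a pair table like this one, so they would need to be added back with a count of zero.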
15. Exploratory Data Analysis
• Analyze contiguity and/or proximity
relationships among features using
GeoDa
• Create spatial weights
• Display histogram of feature
connectivity according to
defined spatial relationships
• Histogram linked to map and
attribute table
Alternative Approach
16. Exploratory Data Analysis
• Considers attribute values and
location of features simultaneously
• Moran’s I statistic determines
whether spatial pattern of values is
dispersed, random or clustered
• Significance of pattern evaluated
with corresponding z-score
• One variable at a time
Spatial Autocorrelation
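As a rough illustration of the statistic described above, the following sketch computes global Moran's I for one variable from a spatial weights matrix. The function name, the binary contiguity weights and the sample values are assumptions made for the example; the z-score used to judge significance is not computed here.

```python
def morans_i(values, weights):
    """Global Moran's I for one variable.

    values  -- list of attribute values, one per feature
    weights -- square matrix (list of lists); weights[i][j] > 0 when
               features i and j are considered neighbours
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / s0) * cross / sum(d * d for d in dev)


# Four features in a row; each one neighbours the features next to it.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

print(morans_i([1, 2, 3, 4], w))  # similar values adjacent: positive I
print(morans_i([1, 4, 1, 4], w))  # alternating values: negative I
```

Positive values point toward a clustered pattern, negative values toward a dispersed one, and values near zero toward a random one.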
17. Preprocessing
Use hot spots to limit study
area for Grouping Analysis:
• Calculate incremental spatial
autocorrelation
• Identify distance band of most
intense clustering
• Create hot spot map
• Select features from original
dataset based on location
of hot spots
Preprocessing
18. Grouping Analysis Settings
1. How many groups should be created?
2. Which analysis fields should be used?
3. Is a spatial constraint necessary?
If so, which type is appropriate?
Grouping Analysis Settings:
Key Considerations
19. Grouping Analysis Settings
• Default number is 2
• Sturges' rule:
C = 1 + 3.3 log₁₀(n), where
C is the number of groups and
n is the number of features
• Evaluate the optimal number of groups
(up to a maximum of 15)
Number of Groups
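The rule of thumb above is easy to evaluate for a given dataset. The sketch below assumes a base-10 logarithm (the 3.3 coefficient corresponds to log base 10) and rounds to the nearest whole number; the rounding and the function name are choices made for this example, not part of the rule as stated.

```python
import math


def sturges_groups(n_features: int) -> int:
    """Suggested number of groups, C = 1 + 3.3 * log10(n), rounded."""
    if n_features < 1:
        raise ValueError("n_features must be at least 1")
    return round(1 + 3.3 * math.log10(n_features))


for n in (10, 100, 1000):
    print(n, sturges_groups(n))
```

The result is only a starting point; it can be compared against the tool's own evaluation of the optimal number of groups.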
21. Grouping Analysis Settings
• Generally driven by research purpose
and objectives of grouping
• Guide selection of analysis fields with
exploratory data analysis findings
• Spatial variables may be used as
indirect spatial constraints
• Assess effectiveness of fields to
distinguish features with output report
Analysis Fields
23. Grouping Analysis Settings
• Choice of spatial constraint or no
spatial constraint determines which
algorithm is used for grouping
• No spatial constraint – traditional
K-Means (data space only)
• Any spatial constraint – Spatial ‘K’luster
Analysis by Tree Edge Removal (SKATER)
method (spatial K-Means)
Spatial Constraints
25. Grouping Analysis Settings
• Contiguity – edges only (“rook” type) or
edges and corners (“queen” type)
• Delaunay triangulation – contiguity of
representations of features as Voronoi
polygons
• Proximity – K nearest neighbours
• Spatial weights
Spatial Constraint Types
26. Grouping Analysis Settings
• Evaluate optimal number of groups
• Guide selection of analysis fields with
calculated R² values
• Visually assess results of specified
spatial constraint
Iterative Process for Optimizing
Grouping Analysis
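To show how a per-field R² value can separate useful analysis fields from weak ones, here is a minimal sketch. It assumes R² is the share of a field's total variation that is retained after grouping, that is, 1 minus the within-group sum of squares divided by the total sum of squares. The function name and sample values are hypothetical, and the tool's output report remains the reference for the values it actually reports.

```python
from collections import defaultdict


def field_r2(values, groups):
    """Share of total variation in one field that is explained by group membership."""
    mean = sum(values) / len(values)
    total_ss = sum((v - mean) ** 2 for v in values)

    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)

    within_ss = 0.0
    for members in by_group.values():
        group_mean = sum(members) / len(members)
        within_ss += sum((v - group_mean) ** 2 for v in members)

    return 1 - within_ss / total_ss


values = [1, 2, 9, 10]
print(field_r2(values, [0, 0, 1, 1]))  # groups follow the values: R² close to 1
print(field_r2(values, [0, 1, 0, 1]))  # groups ignore the values: R² close to 0
```

A field with a high value distinguishes the groups well; a field with a value near zero contributes little and is a candidate for removal in the next iteration.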
27. Interpretation & Evaluation
• Spatial distribution of groups (map)
• Global statistics (output report)
• Group and variable statistics
(output report)
• Group profiles
Interpretation of Results
30. Interpretation & Evaluation
• Consider global mean, median and
range for each variable
Group Profiles (3)
31. Interpretation & Evaluation
• Global Moran’s I statistic
• Determine spatial pattern of group
membership
• Measure spatial compactness of
group membership
• Clustered groups generally desired
Evaluation of Results:
Spatial Autocorrelation
Figure labels: Dispersed, Random, Clustered
32. Interpretation & Evaluation
• Ratio of smallest to largest group size
• Indicator of balance in group
membership
• Balanced number of group
members generally desired for
comparison of statistics
• Frequency tool
Evaluation of Results:
Cluster Size Ratio
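The cluster size ratio can be computed directly from the group assignments. In this sketch the `group_ids` list is hypothetical sample data, and the ratio is taken as the size of the smallest group divided by the size of the largest, so that 1.0 means perfectly balanced groups.

```python
from collections import Counter


def cluster_size_ratio(group_ids):
    """Size of the smallest group divided by the size of the largest."""
    sizes = Counter(group_ids).values()
    return min(sizes) / max(sizes)


group_ids = [1, 1, 1, 2, 2, 3]  # hypothetical group assignment per feature
print(cluster_size_ratio(group_ids))
```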
33. Interpretation & Evaluation
• Goodness measure that combines
concepts of cohesion and separation
• Adapted from cluster analysis to
consider attribute data and location
• Silhouette coefficient is calculated
for every feature and the average is
taken for the entire dataset
Evaluation of Results:
Silhouette
34. Interpretation & Evaluation
(B – A) / max(A, B) where
A is the distance between a
feature and its group center
B is the distance between the
feature and its neighbouring
group center
Silhouette Coefficient
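The formula above can be worked through in a few lines. This sketch follows the simplified definition on the slide (distances to group centers rather than to every other feature), uses Euclidean distance, and averages the coefficient over all features. The function name and sample points are hypothetical, and treating each point as a tuple of attribute values (optionally with coordinates appended) is an assumption of the example.

```python
import math
from collections import defaultdict


def centroid_silhouette(points, labels):
    """Average (B - A) / max(A, B) over all features.

    A -- distance from a feature to its own group center
    B -- distance from the feature to the nearest other group center
    """
    members = defaultdict(list)
    for p, g in zip(points, labels):
        members[g].append(p)
    if len(members) < 2:
        raise ValueError("at least two groups are required")

    # Group center = mean of each dimension over the group's members.
    centers = {
        g: tuple(sum(dim) / len(ps) for dim in zip(*ps))
        for g, ps in members.items()
    }

    scores = []
    for p, g in zip(points, labels):
        a = math.dist(p, centers[g])
        b = min(math.dist(p, c) for other, c in centers.items() if other != g)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels = [0, 0, 1, 1]
print(centroid_silhouette(points, labels))
```

With two tight, well separated groups as in this sample, the average comes out close to 1.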
35. Interpretation & Evaluation
• Range between –1 (poor)
and 1 (excellent)
• < 0.2 indicates poor clustering
• > 0.5 indicates good partition
of the data
Silhouette Coefficient Values
36. Tutorial Exercises
• Six exercises
• Two scenarios (3 exercises for each)
• Suitable for users at all levels of
experience
• Exercises take the user through the
steps of preprocessing, group
creation, interpretation and
evaluation of results outlined here
Grouping Analysis Tutorial
37. Tutorial Exercises
Exercises:
1. Data exploration
2. Grouping for exploratory data
analysis
3. Using Spatial Statistics tools to
target areas of interest
Scenario 1:
Analysis of Crime in Chicago
38. Tutorial Exercises
Exercises:
4. Create groups and use results to
write profiles
5. Explore effects of spatial
constraints
6. Evaluation of results
Scenario 2:
Analysis of Olympic Results
39. Tutorial Exercises
1. All tutorial exercises use polygon
data exclusively; point features not
covered
2. Space-time constraints using
spatial weights matrix file not
covered
3. Aimed at the general user; no
exercises specifically target
advanced users
Limitations
40. Recommendations
1. Exploratory data analysis
2. Grouping Analysis
3. Evaluation of results
Recommendations:
Enhancements and Additional Tools
41. Recommendations
• Multi-step process using Polygon
Neighbors, Frequency and table
joins could be simplified
• Dynamic linking of objects
can make use of existing
ArcGIS functionality
Determining Spatial Relationships
Among Features
42. Recommendations
• Expand types of spatial
relationships that can be analyzed
• Enable the analysis of higher order
relationships
Determining Spatial Relationships
Among Features (continued)
43. Recommendations
• Tools for determining most useful
diagnostic or predictor variables
• Guide selection of analysis fields for
data partitioning
• Adapt neural networks or other
data mining tools to work with
spatial constraints
Identification of Useful
Diagnostic Variables
45. Recommendations
• Spatial weights matrix can be
used as the spatial constraint
for creating groups
• Custom weights require
either manual table creation
or programming
• Solution: interactive feature
selection
User-defined spatial relationships
among features
46. Recommendations
• Expand beyond R² and F-statistic
values in output report
• Adapt methods used to evaluate
cluster analysis algorithms
(e.g. Silhouette)
• Challenge: universally applicable
evaluation methods may not be
feasible
Evaluation of Results