Developing a Tutorial for Grouping Analysis in ArcGIS

Daniel Pierre
May 29, 2014

1. Introduction
2. Data
3. Grouping Analysis Workflows
4. Tutorial Exercises
5. Conclusions: Recommendations
Presentation Outline

Lauren Rosenshein Bennett, MS
Geoprocessing Product Engineer, Esri
Lbennett@esri.com
Dr. Konrad Dramowicz
Faculty, Centre of Geographic Sciences
Konrad.Dramowicz@nscc.ca
Dr. Ela Dramowicz
Faculty, Centre of Geographic Sciences
Ela.Dramowicz@nscc.ca
Introduction
Project Sponsor & Supervisors

Introduction
• Experimental testing of tool with
multiple datasets
• Incorporation of Grouping
Analysis with other tools
• Review of technical literature
on clustering algorithms
• Review of existing tutorials
Project Overview

Introduction
• Introduced at ArcGIS 10.1
• Available with Basic, Standard and
Advanced license levels
• Found in the Spatial Statistics
toolbox, within the Mapping
Clusters toolset
• Script tool
Grouping Analysis Tool

Introduction
• “...Performs a classification
procedure that tries to find natural
clusters in your data.” - Esri
• An aid for data comprehension
• Feature similarity is based on
attributes specified as analysis fields
and optionally, spatial constraints
• Given a number of groups, features
within each output group are as
similar as possible while groups are
as different as possible

Introduction
• Two algorithm types: cluster
analysis (traditional K-means) and
regionalization (spatial K-means)
• Thirteen parameters
(six required)
• Grouping results contingent on
the number of groups, analysis
fields, and type of spatial
constraint

Data
Features:
• Esri
• City of Vancouver
Multivariate Data:
• World Bank
• BBC
• Weatherbase
• Statistics Canada
Data Sources

Data
• Data Enrichment (ArcGIS Online)
• HTML table import
• Spreadsheet reformatting
• Table joins
• Feature class edits
Data Preparation

Data
Selection Criteria:
• Two scales of analysis
• Illustration of various spatial
constraint effects on results
• Sufficient number of features
• Visible spatial patterns in results
Tutorial Datasets

General Steps:
• Exploratory data analysis
• Preprocessing
• Determining appropriate Grouping
Analysis settings
• Postprocessing, interpretation
and evaluation of results
Grouping Analysis Workflows

Exploratory Data Analysis
1. Distribution of variable values
• Thematic mapping
• Spatial autocorrelation
2. Spatial relationships among
features
• Contiguity of features and number
of neighbours
• Spatial autocorrelation

• Explore distribution of dataset
variables
• Choropleth maps and graduated
symbol maps
• Identify set of variables to be used
for Grouping Analysis
Thematic Mapping

• Analyze contiguity relationships
among features
• Polygon Neighbors tool
• Determine relative connectivity of
features by counting number
of neighbours
• Frequency tool
Spatial Relationships
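The neighbour-counting step can be sketched outside ArcGIS. This is a minimal illustration only, assuming a hypothetical list of (source ID, neighbour ID) pairs in which every adjacency is listed in both directions; the function names are invented for the example and are not part of any Esri tool.

```python
from collections import Counter


def neighbour_counts(pairs):
    """Count how many neighbours each feature has.

    `pairs` holds (source_id, neighbour_id) tuples, with every
    adjacency listed in both directions (an assumption of this sketch).
    """
    return Counter(src for src, _ in pairs)


def connectivity_frequency(counts):
    """Frequency table: number of neighbours -> number of features."""
    return Counter(counts.values())


# Hypothetical adjacency pairs for four polygons.
pairs = [(1, 2), (2, 1), (2, 3), (3, 2), (2, 4), (4, 2), (3, 4), (4, 3)]

counts = neighbour_counts(pairs)
freq = connectivity_frequency(counts)

print(dict(counts))  # neighbours per feature
print(dict(freq))    # how many features have each neighbour count
```

Features with very few neighbours stand out in the frequency table, which may matter when a contiguity constraint is later chosen.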

• Analyze contiguity and/or proximity
relationships among features using
GeoDa
• Create spatial weights
• Display histogram of feature
connectivity according to
defined spatial relationships
• Histogram linked to map and
attribute table
Alternative Approach

• Considers attribute values and
location of features simultaneously
• Moran’s I statistic determines
whether spatial pattern of values is
dispersed, random or clustered
• Significance of pattern evaluated
with corresponding z-score
• One variable at a time
Spatial Autocorrelation
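As a rough illustration of the statistic, the sketch below computes global Moran's I for one variable using the standard formula I = (n / S0) · Σ w_ij (x_i − x̄)(x_j − x̄) / Σ (x_i − x̄)². The binary contiguity weights and the sample values are assumptions made for the example, and the z-score used to judge significance is not computed here.

```python
def morans_i(values, weights):
    """Global Moran's I for one variable.

    values  : list of attribute values, one per feature
    weights : square matrix (list of lists); weights[i][j] > 0
              when features i and j are neighbours
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)          # sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))  # cross-products
    den = sum(d * d for d in dev)                   # total variation
    return (n / s0) * (num / den)


# Four features in a row; each is a neighbour of the next (binary weights).
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = [1, 2, 3, 4]   # similar values sit next to each other
dispersed = [1, 4, 1, 4]   # high and low values alternate

print(morans_i(clustered, w))  # positive
print(morans_i(dispersed, w))  # negative
```

Positive values point toward clustering and negative values toward dispersion; values near zero suggest a random pattern.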

Preprocessing
Use hot spots to limit study
area for Grouping Analysis:
• Calculate incremental spatial
autocorrelation
• Identify distance band of most
intense clustering
• Create hot spot map
• Select features from original
dataset based on location
of hot spots
Preprocessing

Grouping Analysis Settings
1. How many groups should be created?
2. Which analysis fields should be used?
3. Is a spatial constraint necessary?
If so, which type is appropriate?
Grouping Analysis Settings:
Key Considerations

• Default number is 2
• Sturges' rule:
C = 1 + 3.3 log10(n), where
C is the number of groups and
n is the number of features
• Evaluate the optimal number of groups
(up to a maximum of 15)
Number of Groups
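A quick way to try the rule on different dataset sizes is sketched below. It assumes the logarithm is base 10, which is what the 3.3 multiplier implies; the function name is hypothetical, and rounding to the nearest whole group is a choice made for this example.

```python
import math


def sturges_groups(n_features):
    """Suggested number of groups: C = 1 + 3.3 * log10(n)."""
    if n_features < 1:
        raise ValueError("n_features must be at least 1")
    return 1 + 3.3 * math.log10(n_features)


for n in (50, 100, 1000):
    c = sturges_groups(n)
    # Show the raw value and the nearest whole number of groups.
    print(n, round(c, 2), round(c))
```

The result is only a starting point; the optimal-number evaluation in the tool can then be used to compare nearby values.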

Two vs. Three Groups

• Generally driven by research purpose
and objectives of grouping
• Guide selection of analysis fields with
exploratory data analysis findings
• Spatial variables may be used as
indirect spatial constraints
• Assess effectiveness of fields to
distinguish features with output report
Analysis Fields

Temperature: Spatial Variable

• Choice of spatial constraint or no
spatial constraint determines which
algorithm is used for grouping
• No spatial constraint – traditional
K-Means (data space only)
• Any spatial constraint – Spatial ‘K’luster
Analysis by Tree Edge Removal (SKATER)
method (spatial K-Means)
Spatial Constraints

No Spatial Constraint vs.
Spatial Constraint

• Contiguity – edges only (“rook” type) or
edges and corners (“queen” type)
• Delaunay triangulation – contiguity of
representations of features as Voronoi
polygons
• Proximity – K nearest neighbours
• Spatial weights
Spatial Constraint Types
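The difference between the two contiguity types is easiest to see on a regular grid. The sketch below is an illustration only: it assumes square cells addressed by (row, column), which is a simplification of real polygon data, and the function name is invented for the example.

```python
def grid_neighbours(row, col, n_rows, n_cols, queen=False):
    """Neighbours of a grid cell under rook or queen contiguity.

    Rook: cells sharing an edge. Queen: cells sharing an edge or a corner.
    """
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]            # shared edges
    if queen:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]     # shared corners
    return [
        (row + dr, col + dc)
        for dr, dc in offsets
        if 0 <= row + dr < n_rows and 0 <= col + dc < n_cols
    ]


# Centre and corner cells of a 3 x 3 grid.
print(len(grid_neighbours(1, 1, 3, 3)))              # rook, centre
print(len(grid_neighbours(1, 1, 3, 3, queen=True)))  # queen, centre
print(len(grid_neighbours(0, 0, 3, 3)))              # rook, corner
print(len(grid_neighbours(0, 0, 3, 3, queen=True)))  # queen, corner
```

Queen contiguity never yields fewer neighbours than rook contiguity, so it tends to be the less restrictive constraint.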

• Evaluate optimal number of groups
• Guide selection of analysis fields with
calculated R² values
• Visually assess results of specified
spatial constraint
Iterative Process for Optimizing
Grouping Analysis
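To show what an R² value says about an analysis field, the sketch below assumes the usual definition R² = (TSS − ESS) / TSS, where TSS is the total sum of squared deviations from the global mean and ESS is the sum of squared deviations from each feature's group mean. The sample values and group labels are invented for the example.

```python
def field_r2(values, groups):
    """R-squared of one analysis field for a given grouping.

    values : attribute values, one per feature
    groups : group label for each feature (same order as values)
    """
    global_mean = sum(values) / len(values)
    tss = sum((v - global_mean) ** 2 for v in values)

    # Collect the values that fall in each group.
    members = {}
    for v, g in zip(values, groups):
        members.setdefault(g, []).append(v)

    # Variation left over inside the groups.
    ess = 0.0
    for vals in members.values():
        group_mean = sum(vals) / len(vals)
        ess += sum((v - group_mean) ** 2 for v in vals)

    return (tss - ess) / tss


groups = [1, 1, 1, 2, 2, 2]
separating_field = [1, 2, 3, 10, 11, 12]   # differs strongly between groups
weak_field = [5, 6, 7, 5, 6, 7]            # same spread in both groups

print(field_r2(separating_field, groups))  # close to 1
print(field_r2(weak_field, groups))        # 0.0
```

A field with a value near 1 separates the groups well, while a field near 0 contributes little and is a candidate for removal in the next iteration.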

Interpretation & Evaluation
• Spatial distribution of groups (map)
• Global statistics (output report)
• Group and variable statistics
(output report)
• Group profiles
Interpretation of Results

• Compare group means with each
other and global range
Group Profiles

• Compare group means and ranges
for each variable
Group Profiles (2)

• Consider global mean, median and
range for each variable
Group Profiles (3)

• Global Moran’s I statistic
• Determine spatial pattern of group
membership
• Measure spatial compactness of
group membership
• Clustered groups generally desired
Evaluation of Results:
Spatial Autocorrelation
Patterns illustrated: dispersed, clustered, random

• Ratio of smallest to largest group size
• Indicator of balance in group
membership
• Balanced number of group
members generally desired for
comparison of statistics
• Frequency tool
Cluster Size Ratio
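The ratio can be computed directly from the group label of each feature, as in the short sketch below. The labels are invented sample data and the function name is hypothetical.

```python
from collections import Counter


def cluster_size_ratio(labels):
    """Size of the smallest group divided by the size of the largest."""
    sizes = Counter(labels).values()
    return min(sizes) / max(sizes)


balanced = [1, 1, 1, 2, 2, 2]
unbalanced = [1, 1, 1, 2, 2, 3]

print(cluster_size_ratio(balanced))    # 1.0
print(cluster_size_ratio(unbalanced))  # 1/3
```

A value of 1 means all groups have the same number of members; values near 0 indicate that one group dominates.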

• Goodness measure that combines
concepts of cohesion and separation
• Adapted from cluster analysis to
consider attribute data and location
• Silhouette coefficient is calculated
for every feature and the average is
taken for the entire dataset
Silhouette

(B – A) / max(A, B) where
A is the distance between a
feature and its group center
B is the distance between the
feature and its neighbouring
group center
Silhouette Coefficient

• Range between –1 (poor)
and 1 (excellent)
• < 0.2 indicates poor clustering
• > 0.5 indicates good partition
of the data
Silhouette Coefficient Values
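The centre-based formula above can be sketched as follows. This is a simplified illustration, not an ArcGIS function: it assumes each feature is described by a tuple of numeric values (attributes, coordinates, or both), that plain Euclidean distance is adequate, and that at least two groups exist. The function names and sample points are invented for the example.

```python
import math


def centre(points):
    """Mean of a list of equal-length numeric tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def average_silhouette(points, labels):
    """Average of (B - A) / max(A, B) over all features.

    A: distance from a feature to its own group centre.
    B: distance from the feature to the nearest other group centre.
    """
    members = {}
    for p, g in zip(points, labels):
        members.setdefault(g, []).append(p)
    centres = {g: centre(ps) for g, ps in members.items()}

    scores = []
    for p, g in zip(points, labels):
        a = math.dist(p, centres[g])
        b = min(math.dist(p, c) for h, c in centres.items() if h != g)
        denom = max(a, b)
        # If both distances are zero the feature is undecided: score 0.
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Two well-separated groups of two features each.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels = [1, 1, 2, 2]

print(average_silhouette(points, labels))
```

With the sample data the average lies well above 0.5, which the thresholds above would read as a good partition.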

Tutorial Exercises
• Six exercises
• Two scenarios (3 exercises for each)
• Suitable for users at all levels of
experience
• Exercises take the user through the
steps of preprocessing, group
creation, interpretation and
evaluation of results outlined here
Grouping Analysis Tutorial

Tutorial Exercises
Exercises:
1. Data exploration
2. Grouping for exploratory data
analysis
3. Using Spatial Statistics tools to
target areas of interest
Scenario 1:
Analysis of Crime in Chicago

Tutorial Exercises
Exercises:
4. Create groups and use results to
write profiles
5. Explore effects of spatial
constraints
6. Evaluation of results
Scenario 2:
Analysis of Olympic Results

Tutorial Exercises
1. All tutorial exercises use polygon
data exclusively; point features not
covered
2. Space-time constraints using
spatial weights matrix file not
covered
3. Geared toward general users; no
exercises specifically target
advanced users
Limitations

Recommendations
1. Exploratory data analysis
2. Grouping Analysis
3. Evaluation of results
Recommendations:
Enhancements and Additional Tools

Recommendations
• Multi-step process using Polygon
Neighbors, Frequency and table
joins could be simplified
• Dynamic linking of objects
can make use of existing
ArcGIS functionality
Determining Spatial Relationships
Among Features

Recommendations
• Expand types of spatial
relationships that can be analyzed
• Enable the analysis of higher order
relationships
Determining Spatial Relationships
Among Features (continued)

Recommendations
• Tools for determining most useful
diagnostic or predictor variables
• Guide selection of analysis fields for
data partitioning
• Adapt neural networks or other
data mining tools to work with
spatial constraints
Identification of Useful
Diagnostic Variables

Recommendations
Enhancements
• Create unique identifier
• Replace null values

Recommendations
• Spatial weights matrix can be
used as the spatial constraint
for creating groups
• Custom weights require
either manual table creation
or programming
• Solution: interactive feature
selection
User-defined spatial relationships
among features

Recommendations
• Expand beyond R² and F-statistic
values in output report
• Adapt methods used to evaluate
cluster analysis algorithms
(e.g. Silhouette)
• Challenge: universally applicable
evaluation methods may not be
feasible
Evaluation of Results
