Spatial Analysis and Geomatics
- 1. Geoprocessing & Spatial Analysis
GES673
at Shady Grove
Richard Heimann
Richard Heimann © 2013
Thursday, February 21, 13
- 2. Course Description
The increased access to spatial data and the overall improved application of spatial analytical methods present
certain challenges to social scientific research. This graduate course is designed to focus on substantive social
science research topics while exposing students to the rewards and potential risks involved in applying geographic
information systems (GIS), spatial analysis, and spatial statistics in their own research.
The course will also highlight connections between spatial concepts and data availability. Traditional spatial
science data will be used alongside newly emerging social media data, which better reflect some of the more recent
developments in Big Data - most notably the socially critical exploration of such data. Substantive foci will include
readings and discussions of spatially explicit theory, leaning toward acknowledgment of a social and spatial turn
in Big Data and an enhanced role for, and extension of, spatial analysis to keep up with such trends.
Throughout the course, lectures and discussions will be complemented with lab sessions introducing spatial
analysis methods and GIS and spatial analysis software. The lab sessions will use GeoDa and ArcGIS, among other
software, and will introduce many methodological and technical issues relevant
to spatial analysis. Assignments for the course include up to two writing assignments, up to four lab assignments,
and a final project, which will be presented as a short 15-minute presentation and submitted as a term paper.
The writing assignments will include an annotated bibliography/brief literature review within a selected theme
area of spatial thinking/perspectives/methods. The lab assignments will focus on building geospatial databases,
basic spatial analysis, exploratory spatial data analysis, and spatial regression modeling. The course will include
other labs and assignments that will be completed for no grade; these are intended as mechanisms/opportunities
for developing and enhancing familiarity with selected software, data resources, and analytic methods.
Course Objectives:
- Examine methods and literature of geographic information science, spatial analysis and geographic knowledge
discovery.
- Learn about solving problems and answering questions using GIS and quantitative methods.
- Use GIS software to learn some of the analytical tools available - ArcGIS Desktop & GeoDa.
- Gain experience working with traditional and nontraditional social science data (e.g. Flickr, Twitter).
- 3. Course Notes
Text:
1. Geospatial Analysis, 3rd edition. By: Michael J. de Smith, Michael Goodchild, and Paul A. Longley. The text is available as
an Adobe readable file for download (uses special secure PDF reader), a version for the Kindle, on-line via a website, and as a
printed book. See http://www.spatialanalysisonline.com/ for further information.
2. Making Spatial Decisions Using GIS: A Workbook. 2nd edition. By: Kathryn Keranen and Robert Kolvoord. Should be
available in the Shady Grove Bookstore or from ESRI Press or Amazon:
http://www.amazon.com/Making-Spatial-Decisions-Using-GIS/dp/1589482808
3. GeoDa User Guide 0.9.3. (UG) The documentation will be somewhat unsynchronized with the software but not so much so
that you will be prevented from completing labs.
https://geodacenter.asu.edu/software/documentation
4. Exploring Spatial Data with GeoDa: A Workbook (UGW) http://www.csiss.org/clearinghouse/GeoDa/geodaworkbook.pdf
5. Other readings will be required and further suggested. They will be noted in the syllabus and either provided or will be
cited for your discovery.
Optional Text:
a. The GIS 20: Essential Skills - http://www.amazon.com/GIS-20-Essential-Skills/dp/1589482565
Evaluation
Midterm exam (15%) (20 “T/F with explanation”) Based on lectures and readings (open book)
Lab Assignment 50 points (25%) (5 x 10)
Reading Labs 40 points (20%) (4 x 10)
Paper (60 points) & Presentation (20) (40%)
- 4. What will we discuss…?
Methods
-Visual Data Analysis
-Spatial Analysis
-ESDA
-Geographic Knowledge Discovery
-Spatial Econometrics
-Spatial Modeling
Theory
-First Law of Geography
-Spatial Heterogeneity
-Spatially Explicit Theory
Data
Big Data, Small Data vs. Big Data
MAUP, Ecological Fallacy, Atomistic Fallacy, etc.
- 5. Why GeoDa, Python, and R?
Not a GIS, but…
•Complements all major GIS packages.
•Windows based, so familiar interface.
•Relies on same programming/math as the R package
spdep and extends into Python using PySAL.
• Incorporates more sophisticated statistical routines into
spatial analysis than a GIS (e.g. ArcGIS Desktop).
•Developed by Dr. Luc Anselin, Arizona State U.
•FREE!
•Python is an OS interpreted, object-oriented, high-level
programming language.
• R is an OS, strongly functional language and
environment for statistically exploring and analyzing
data sets.
- 6. What do I mean when I say OS?
Free and Open Source: you can think of it as “free” as in
“free speech,” and “free” as in “free beer.”
Open GeoDa is a cross-platform,
open source version.
PySAL is the underlying open
source library with extended
functionality.
- 7. Introductions
Name
Background
Experience w/ Spatial Analysis
Expectations…
Recently watched movie or book read…
- 8. Geoprocessing & Spatial Analysis (GES673)
What will we talk about today?
Just an introduction...but we will be gaining
momentum.
What is GIS? Spatial Analysis?
Why Spatial Analysis, and what are the four
levels?
The Social Turn in Big Data and the neospatial
analysis and mining for knowledge discovery.
- 9. What is GIS?
This is NOT a GIS Class.
Geographic Information
Information is knowledge about “what is where when”
Geographic/geospatial: synonymous.
...spatial subtly different.
What is the ‘S’ in GIS?
Systems: the technology.
Science: the concepts and theory.
Studies: the societal context.
- 10. Defining Geographic Information Systems (GIS)
The common ground between information processing and the
many fields using spatial analysis techniques. (Tomlinson, 1972)
A powerful set of tools for collecting, storing, retrieving,
transforming, and displaying spatial data from the real world.
(Burrough, 1986)
A computerised database management system for the capture,
storage, retrieval, analysis and display of spatial (locationally
defined) data. (NCGIA, 1987)
A decision support system involving the integration of spatially
referenced data in a problem solving environment. (Cowen, 1988)
- 11. Geographic Information System:
intuitive description
A map with a database behind it; a virtual
representation of the real world and its
infrastructure.
- 12. GI Systems, Science and Studies
Which will we do?
Systems
Advanced Seminar in GIS GES670
Professional Seminar in Geospatial Technologies GES659
*Geoprocessing and Spatial Analysis GES673
*Spatial Social Science GES679
Science
*Geoprocessing and Spatial Analysis GES673
GIS Modeling Techniques GES773
Spatial Social Science GES679
*Spatial Statistics GES774
Advanced Visualization and Presentation
Studies
*Geoprocessing and Spatial Analysis GES673
GIS Modeling Techniques GES773
*Spatial Social Science GES679
*Combine hands-on technical training with an understanding of the underlying science, and an emphasis on multidisciplinary applications
- 13. Where Most UMBC Students Work and Live
- 14. The GIS Data Model
- 15. The GIS Data Model: Purpose
Allows geographic features to be digitally
represented and stored in a database so that
they can be abstractly presented in map
(analog) form, and can also be worked with
and manipulated to address some problem.
(see associated diagrams)
- 17. A layer-cake of information
GIS Data Model
- 18. Spatial and Attribute Data
Spatial data (where)
specifies location; stored in a shapefile, geodatabase or
similar geographic file.
Attribute (descriptive) data (what, how much, when)
specifies characteristics at that location, natural or
human-created; stored in a database table.
GIS systems traditionally maintain spatial and
attribute data separately, then “join” them for display
or analysis.
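The join described above can be sketched in a few lines. This is only an illustration: the shared key name (`state_abbr`) and the coordinates are made up for the example, and real GIS software performs the join against shapefile or geodatabase tables rather than Python dictionaries.

```python
# Spatial data (where): hypothetical point locations keyed by a shared ID.
spatial = {
    "MD": (-76.6, 39.0),   # (longitude, latitude), illustrative values only
    "VA": (-78.7, 37.5),
    "DE": (-75.5, 39.0),
}

# Attribute data (what, how much): a separate table keyed by the same ID.
attributes = [
    {"state_abbr": "MD", "name": "MARYLAND"},
    {"state_abbr": "VA", "name": "VIRGINIA"},
]

def join(spatial, attributes, key="state_abbr"):
    """Attach attribute rows to the locations that share the same key."""
    joined = []
    for row in attributes:
        location = spatial.get(row[key])
        if location is not None:  # keep only rows with a matching location
            joined.append({**row, "lon": location[0], "lat": location[1]})
    return joined

records = join(spatial, attributes)
for rec in records:
    print(rec)
```

Keeping the two tables separate means the attribute table can be edited or replaced without touching the geometry, as long as the key stays stable.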
- 19. Spatial and Attribute Data
Lack of Locational Invariance (Goodchild et al)
• fundamental property of spatial analysis
• results change when location changes
• where matters
Attribute table (state name, abbreviation):
ALABAMA AL
ALASKA AK
ARIZONA AZ
ARKANSAS AR
CALIFORNIA CA
COLORADO CO
CONNECTICUT CT
DELAWARE DE
DISTRICT OF COLUMBIA DC
FLORIDA FL
GEORGIA GA
HAWAII HI
IDAHO ID
ILLINOIS IL
INDIANA IN
IOWA IA
KANSAS KS
KENTUCKY KY
LOUISIANA LA
MAINE ME
MARYLAND MD
MASSACHUSETTS MA
MICHIGAN MI
MINNESOTA MN
MISSISSIPPI MS
MISSOURI MO
MONTANA MT
NEBRASKA NE
NEVADA NV
NEW HAMPSHIRE NH
NEW JERSEY NJ
NEW MEXICO NM
NEW YORK NY
NORTH CAROLINA NC
NORTH DAKOTA ND
OHIO OH
OKLAHOMA OK
OREGON OR
PENNSYLVANIA PA
RHODE ISLAND RI
SOUTH CAROLINA SC
SOUTH DAKOTA SD
TENNESSEE TN
TEXAS TX
UTAH UT
VERMONT VT
VIRGINIA VA
WASHINGTON WA
WEST VIRGINIA WV
WISCONSIN WI
WYOMING WY
- 20. Representing Data with Raster and Vector Models
Raster Model
Area is covered by grid with (usually) equal-sized, square cells;
Regular Lattices.
Attributes are recorded by assigning each cell a single value based on
the majority feature (attribute) in the cell, such as land use type.
Image data is a special case of raster data in which the “attribute” is
a reflectance value from the electromagnetic spectrum
Cells in image data often called pixels (picture elements)
Vector Model
The fundamental concept of vector GIS is that all geographic features
in the real world can be represented either as:
Points or dots (nodes): Cities, human sensors, individual obs.
Lines (arcs): movement, connectedness, networks
Areas (polygons): Countries, States, Census Tracts, Cities, Irregular
Lattices - Multivariate in nature.
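As a rough sketch of the raster rule described above (each cell takes the majority attribute), assume a tiny 2 x 2 grid with a few hypothetical land-use observations per cell. The class names and cell contents are invented for illustration.

```python
from collections import Counter

# Hypothetical land-use observations falling inside each raster cell,
# keyed by (row, column).
samples = {
    (0, 0): ["forest", "forest", "urban"],
    (0, 1): ["urban", "urban", "water"],
    (1, 0): ["water", "water", "water"],
    (1, 1): ["forest", "urban", "urban"],
}

def rasterize_majority(samples, n_rows, n_cols):
    """Build a grid in which each cell stores the most common class in that cell."""
    grid = [[None] * n_cols for _ in range(n_rows)]
    for (row, col), values in samples.items():
        grid[row][col] = Counter(values).most_common(1)[0][0]
    return grid

grid = rasterize_majority(samples, n_rows=2, n_cols=2)
for row in grid:
    print(row)
```

Note what the single-value rule discards: the cell at (0, 0) is recorded as forest even though one of its three observations is urban.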
- 22. Lattice Data; Yes or No?
(Figure: three example maps, labeled Irregular Lattice, Regular Lattice, and Irregular Lattice)
- 23. What is spatial analysis?
From Data to Information
...beyond mapping.
transformations, manipulations and application of
analytical methods to spatial (geographic) data
Lack of locational invariance (Goodchild et al)
Fundamental property of spatial analysis.
Analyses where the outcome changes when the locations of
the objects under study change.
Median center vs. median; standard deviational ellipses;
autocorrelation vs. spatial autocorrelation.
Where matters
In an absolute sense (coordinates)
In a relative sense (spatial arrangement, distance)
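A small sketch of what "lack of locational invariance" means in practice, using invented coordinates and values. The ordinary mean ignores location, so it is unchanged when the same values are reassigned to different places; the value-weighted mean center is a spatial statistic, so it moves.

```python
from statistics import mean

# Hypothetical observations: (x, y, value).
points = [(0.0, 0.0, 10.0), (1.0, 0.0, 20.0), (0.0, 1.0, 30.0), (4.0, 4.0, 40.0)]

def attribute_mean(pts):
    """Aspatial statistic: ignores the coordinates entirely."""
    return mean(v for _, _, v in pts)

def weighted_mean_center(pts):
    """Spatial statistic: coordinates weighted by the attribute value."""
    total = sum(v for _, _, v in pts)
    cx = sum(x * v for x, _, v in pts) / total
    cy = sum(y * v for _, y, v in pts) / total
    return cx, cy

# Reassign the same values to different locations (here: reverse their order).
values = [v for _, _, v in points][::-1]
moved = [(x, y, v) for (x, y, _), v in zip(points, values)]

print(attribute_mean(points), attribute_mean(moved))
print(weighted_mean_center(points), weighted_mean_center(moved))
```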
- 24. Spatial analysis as a process
Problem formulation
Data gathering
Exploratory analysis
Hypothesis formulation
Modeling and testing
Consultation and review
Reporting and implementation
- 27. Components of Spatial Analysis
Visualization
Showing interesting patterns
Exploratory Spatial Data Analysis (ESDA)
Finding interesting patterns
Spatial Modeling, Regression
Explaining interesting patterns
- 28. THE PROBLEM … GEOGRAPHICAL LITERACY
Despite having a highly educated society, Americans are arguably the
world’s most geographically ignorant people
By comparison, children throughout much of the world are exposed to
geographic training in both primary and secondary schools
Most Americans learn what little geography they know in elementary
or middle school.
In the United States, the last time a student hears the word
“geography” is usually in the third grade
Discussion of geography at any higher level is hidden under the heading “social studies”
Concern over geographical illiteracy led President Reagan to declare
November 15-21, 1987 as the first Geography Awareness Week (a joint
resolution of the One Hundredth Congress)
- 29. GEOGRAPHY TODAY
The National Geographic Society released the Roper Public
Affairs 2006 Geographic Literacy Study in May 2006.
510 interviews were conducted among a sample of 18- to 24-year-old adults in
the continental United States between December 17, 2005 and January 20,
2006.
The sample has a margin of error of +/- 4.4% at the 95% confidence level.
Survey results …
Over six in ten (63%) of those surveyed could not locate Iraq on a map of the Middle
East
Nearly nine in ten (88%) could not identify Afghanistan on a map of Asia
Seven in ten (70%) could not find North Korea on a map, and 63% did not know its
border with South Korea is the most heavily fortified in the world
Sizeable percentages did not know that Sudan and Rwanda are located in Africa
(54% and 40%, respectively)
- 30. GEOGRAPHY TODAY (CONTINUED)
Three-quarters could not find Indonesia on a world map and were unaware that
a majority of Indonesia’s population is Muslim, making it the largest Muslim
country in the world.
A third or more could not find Louisiana or Mississippi on a map of the United
States.
Only 18% could correctly answer a multiple-choice question about the most
widely spoken native language in the world. (5 Part Questionnaire)
Although half said map reading skills are “absolutely necessary” in today’s
world, many Americans lack basic practical skills necessary for safety and
employment in today’s world.
One-third (34%) would go in the wrong direction in the event of an evacuation
One third (32%) would miss a conference call scheduled with colleagues in
another time zone.
Recommended Link
2006 National Geographic – Roper Survey of Geographic Literacy
http://www.nationalgeographic.com/roper2006/findings.html
- 31. Advanced Placement Human Geography
This college-level course introduces students to the systematic study of
patterns and processes that have shaped human understanding, use,
and alteration of Earth's surface.
Students employ spatial concepts and landscape analyses to analyze
human social organization and its environmental consequences. They
also learn about the methods and tools geographers use in their
science and practice.
Score | Percent
5 | 11.6%
4 | 16.7%
3 | 21.9%
2 | 16.6%
1 | 33.2%
In the 2009 administration, 50,730 students took the exam and
the mean score was a 2.57.
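As a quick arithmetic check, the reported mean can be recovered from the score distribution on this slide, assuming the percentages are shares of all test takers.

```python
# Score distribution reported on the slide (percent of test takers per score).
distribution = {5: 11.6, 4: 16.7, 3: 21.9, 2: 16.6, 1: 33.2}

total_percent = sum(distribution.values())
mean_score = sum(score * pct for score, pct in distribution.items()) / total_percent

print(f"total = {total_percent:.1f}%, mean score = {mean_score:.2f}")
```

The weighted average comes out at about 2.57, matching the slide.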
- 33. Human Geography
http://www.benjaminbarber.com/bio.html
- 35. WHAT IS GEOGRAPHY?
• Geography is the study of the earth’s surface as the space within
which human populations live
• Geography combines characteristics of both the natural and
social sciences and literally bridges the gap between the two -
more on this later.
• Geography is a generalized as opposed to a specialized field of
study
• Space is the unifying theme for geographers
• Geography is the science of space and place
• Geographers are interested in …
• Where things are located on the earth’s surface
• Why they are located where they are
• How places differ from one another
• How people interact with the environment
• Geographers were among the first scientists to sound the alarm
that human-induced changes to the environment are beginning
to threaten the balance of life
- 36. What was wrong with Geography?
Geography had a number of problems, including:
1. It was overly descriptive
Geography followed a set format for the inventory of physical and
cultural features
2. It was almost purely educational
Regions don't really exist
3. It failed to explain geographic patterns
Geography was descriptive and did not explain why patterns
were the way they were
Where attempts at explanation did exist, they favored historical
approaches
4. The biggest problem of geography was the fact that it was
unscientific
…the Nomothetic & Idiographic debate in geography begins!
- 37. Introduction to Spatial Analysis
Topics
•Description versus Analysis
•The concepts of Process, Pattern and
Analysis
•Issues and challenges in spatial data
analysis
•Measuring space
- 38. Process, Pattern and Analysis
Processes operating in space produce
patterns
Spatial Analysis is aimed at:
1., 2. Identifying and describing the pattern
3., 4. Identifying and understanding the
process
- 39. Complete Spatial Randomness
Deviations from spatial randomness suggest underlying social processes.
“Every observable effect has a physical cause” (Thales)
Perhaps the most profound insight: causality is a rejection of randomness.
(Figure panels: Randomized Variable – 500 meter cell; Total TTL Count – 500 meter cell)
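One simple way to see a deviation from complete spatial randomness (CSR) is a quadrat count: overlay a grid, count points per cell, and compare the variance of the counts with their mean. The sketch below is an illustration under stated assumptions, not the method used to produce the maps on the slide: the unit-square study area, the 10 x 10 grid, the cluster centers, and the spread of 0.03 are all hypothetical choices.

```python
import random
from statistics import mean, pvariance

def quadrat_counts(points, n_side):
    """Count points in each cell of an n_side x n_side grid over the unit square."""
    counts = [0] * (n_side * n_side)
    for x, y in points:
        col = min(int(x * n_side), n_side - 1)
        row = min(int(y * n_side), n_side - 1)
        counts[row * n_side + col] += 1
    return counts

def variance_mean_ratio(counts):
    """Roughly 1 under complete spatial randomness; well above 1 when clustered."""
    return pvariance(counts) / mean(counts)

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Complete spatial randomness: every location is equally likely.
csr = [(rng.random(), rng.random()) for _ in range(1000)]

# A clustered pattern: points scattered tightly around a few hypothetical centers.
centers = [(0.2, 0.2), (0.7, 0.6), (0.4, 0.8)]
clustered = []
for _ in range(1000):
    cx, cy = rng.choice(centers)
    x = min(max(rng.gauss(cx, 0.03), 0.0), 1.0)
    y = min(max(rng.gauss(cy, 0.03), 0.0), 1.0)
    clustered.append((x, y))

vmr_csr = variance_mean_ratio(quadrat_counts(csr, 10))
vmr_clustered = variance_mean_ratio(quadrat_counts(clustered, 10))
print(f"VMR random: {vmr_csr:.2f}  VMR clustered: {vmr_clustered:.2f}")
```

A ratio far above 1 says the counts are more uneven than chance alone would produce, which is the cue to go looking for a process.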
- 41. Description vs. Analysis
Description
Most GIS systems are used by
governments and private
companies to describe the real
world; this helps the organization
“do its job”
For example, manage sewer and water
networks, manage land resources
Most GIS systems are primarily
designed for this purpose
They are used to develop spatial databases to
describe the real world and help manage it.
- 42. Description vs. Analysis
Analysis
Tries to understand the
processes which cause or
create the patterns in the
real world
Understanding processes:
Helps the organization do its
job better
Make better decisions, for example
Helps us understand the
phenomenon itself
This is the role of science
- 43. Description vs. Analysis
Are the locations of the software industry
different from those of the telecommunications
industry?
Here, we are using “centrographic statistics” to
help answer this question.
- 44. Dr. Snow maps cholera in Soho London (1854)
- 55. The first example of Spatial Analysis
• John Snow’s maps of cholera in 1850s London
Was it ESDA or hypothesis testing?
• Did he discover the association between water and
cholera after drawing the map: ESDA
• Did he draw the map in order to prove the
association: using a map for hypothesis testing
- 56. Spatial Analysis: successive levels of sophistication
Four levels of Spatial Analysis:
--Each is more advanced (more difficult!)
Spatial data description (the primitives)
Exploratory Spatial Data Analysis (ESDA)
Spatial statistical analysis and hypothesis testing
Spatial modeling and prediction
We will look at all 4 levels in this lecture series
- 58. Spatial Analysis: successive levels of sophistication
1. Spatial data description (primitive):
Focus is on describing the world,
and representing it in a digital
format
--computer map
--computer database
Uses classic GIS capabilities
--buffering, map layer overlay
--spatial queries & measurement
- 60. Spatial Analysis: successive levels of sophistication
2. Exploratory Spatial Data Analysis
Searching for patterns and possible explanations
GeoVisualization through calculation and display
of Centrographic statistics and other spatially
descriptive statistics
- 61. Spatial Analysis: successive levels of sophistication
2. Exploratory Spatial Data Analysis
Centrographics - Moments of Data
Map showing changes to the mean center of population for the
United States, 1790–2010 (U.S. Census Bureau)[1]
http://en.wikipedia.org/wiki/Moment_(mathematics)
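The two centrographic statistics most often computed first are the mean center (a first moment) and the standard distance (a second moment). A minimal sketch, assuming planar coordinates in arbitrary units and invented populations:

```python
import math

# Hypothetical places: (x, y, population). Coordinates are in arbitrary planar units.
places = [(0.0, 0.0, 100), (10.0, 0.0, 300), (10.0, 10.0, 400), (0.0, 10.0, 200)]

def mean_center(pts):
    """First moment: population-weighted average location."""
    total = sum(w for _, _, w in pts)
    cx = sum(x * w for x, _, w in pts) / total
    cy = sum(y * w for _, y, w in pts) / total
    return cx, cy

def standard_distance(pts):
    """Second moment: weighted root-mean-square distance from the mean center."""
    cx, cy = mean_center(pts)
    total = sum(w for _, _, w in pts)
    spread = sum(w * ((x - cx) ** 2 + (y - cy) ** 2) for x, y, w in pts) / total
    return math.sqrt(spread)

center = mean_center(places)
spread = standard_distance(places)
print(center, round(spread, 2))
```

Computing the mean center for each census year and connecting the results is how a map like the Census Bureau's center-of-population track is built, although real work uses projected or great-circle coordinates rather than this flat toy grid.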
- 63. Spatial Analysis: successive levels of sophistication
The Geography of the Nazi Vote:
Context, Confession, and Class in the
Reichstag Election of 1930 Author(s):
John O'Loughlin, Colin Flint, Luc
Anselin Source: Annals of the
Association of American Geographers
- 64. Spatial Analysis: successive levels of sophistication
3. Spatial statistical analysis and hypothesis testing
Are the data “to be expected” or “unexpected” relative to some
statistical model, usually of a random process (pure chance)?
(Figure: bell curve centered on 0, with 2.5% of the area in each tail beyond -1.96 and 1.96)
We can test whether the spatial pattern of voting behavior in Germany in
1930 is in fact clustered or random.
The Geography of the Nazi Vote: Context, Confession, and Class in the Reichstag Election of 1930 Author(s): John O'Loughlin,
Colin Flint, Luc Anselin Source: Annals of the Association of American Geographers
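The -1.96 and 1.96 cutoffs on this slide are the critical values of a two-tailed test at the 5% level under a standard normal model. A short sketch using Python's standard library; the example z-scores passed to `is_unexpected` are hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Two-tailed test at the 5% level: 2.5% of the probability in each tail.
alpha = 0.05
critical = standard_normal.inv_cdf(1 - alpha / 2)

def is_unexpected(z_score, threshold=critical):
    """True when a z-score falls in either rejection tail."""
    return abs(z_score) > threshold

print(round(critical, 2))
print(is_unexpected(2.5), is_unexpected(0.4))
```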
- 65. Making things even harder...
• Inward and outward asymptotics i.e. increasing
spatial extent, increasing temporal lags, finer
spatial resolution, finer temporal resolution.
• Increased number of cross sections.
• …visual correlations and visual detection of
change over space and time do not exist.
• Apophenia is real!
• Spatial Analysis and Geographic Pattern
Recognition will reduce patternicity (Sherman,
2008).
- 67. Spatially Random or Spatially Clustered?
Moran’s I: 0.689 (first map)
Moran’s I: 0.003 (second map)
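To make the two Moran's I values above concrete, here is a hand-rolled sketch of the statistic on a regular lattice with binary rook-contiguity weights, plus a simple permutation test. It is not GeoDa's or PySAL's implementation; the 10 x 10 grid, the gradient surface, the seed, and the 199 permutations are assumptions chosen for illustration.

```python
import random

def rook_neighbors(n_rows, n_cols):
    """Binary rook-contiguity weights for a regular lattice, as an adjacency list."""
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cell = r * n_cols + c
            neighbors[cell] = [
                rr * n_cols + cc
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
    return neighbors

def morans_i(values, neighbors):
    """Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean."""
    n = len(values)
    avg = sum(values) / n
    z = [v - avg for v in values]
    s0 = sum(len(adj) for adj in neighbors.values())  # sum of all binary weights
    cross = sum(z[i] * z[j] for i, adj in neighbors.items() for j in adj)
    return (n / s0) * cross / sum(zi * zi for zi in z)

n_rows = n_cols = 10
w = rook_neighbors(n_rows, n_cols)

# A smooth west-to-east gradient (clustered) versus the same values shuffled.
clustered = [float(c) for r in range(n_rows) for c in range(n_cols)]
rng = random.Random(1)
shuffled = clustered[:]
rng.shuffle(shuffled)

i_clustered = morans_i(clustered, w)
i_shuffled = morans_i(shuffled, w)

# Permutation inference: how often does a random rearrangement match the observed I?
draws = []
for _ in range(199):
    perm = clustered[:]
    rng.shuffle(perm)
    draws.append(morans_i(perm, w))
pseudo_p = (1 + sum(d >= i_clustered for d in draws)) / (1 + len(draws))
print(round(i_clustered, 3), round(i_shuffled, 3), pseudo_p)
```

A value near 0, like the 0.003 on the slide, is what a spatially random arrangement tends to give; a value like 0.689 indicates that similar values sit next to each other far more often than chance would produce.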
- 68. Spatial Analysis: successive levels of sophistication
4. Spatial modeling: prediction
Construct models (of processes) to predict spatial outcomes
(patterns)
Coefficient: % Poverty Coefficient: % FB Coefficient: % Elderly Coefficient: % Black
- 70. Spatial Analysis: successive levels of sophistication
Statistically significant global variables that exhibit strong regional variation inform local policy.
Statistically significant global variables that exhibit little regional variation inform region-wide policy.
- 71. Spatial Analysis: successive levels of sophistication
Local R2 informs
us where the
model is
performing well
and where it is
performing
poorly.
The poor results
in the south may
indicate that an
important
variable is
missing from our
model.
- 72. Issues/Challenges/Problems
in Spatial Analysis
Summarize these now.
Talk in greater detail about
them throughout this
lecture series.
- 79. Critical Issues in Spatial Analysis
• Spatial autocorrelation
– Data from locations near to each other are usually more similar than data from
locations far away from each other
• Modifiable areal unit problem (MAUP zone effect)
– Results may depend on the specific geographic unit used in the study
– Province or county; county or city
• Scale affects representation and results
– Cities may be represented as points or polygons
– Results depend on the scale at which the analysis is conducted: province or county
– MAUP—scale effect
• Ecological fallacy
– Results obtained from aggregated data (e.g. provinces) cannot be assumed to
apply to individual people
– MAUP—individual effect
• Non-uniformity of Space
– Phenomena are not distributed evenly in space
– Be careful how you interpret results!
• Edge issues
– Edges of the map, beyond which there is no data, can significantly affect results
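The zone and scale effects listed above are easy to demonstrate on a toy transect. In this sketch the eight cell values and the three zoning schemes are invented; the point is only that the same underlying data produce different zone-level pictures depending on where boundaries fall and how large the zones are.

```python
from statistics import mean

# Hypothetical rates (percent) observed in eight cells along a transect.
cells = [10, 10, 90, 90, 10, 10, 90, 90]

def zone_means(values, zones):
    """Aggregate cell values to zones by averaging the member cells."""
    return [mean(values[i] for i in zone) for zone in zones]

# Zone effect: zones of similar size, with boundaries shifted by one cell.
zoning_a = [[0, 1], [2, 3], [4, 5], [6, 7]]
zoning_b = [[0], [1, 2], [3, 4], [5, 6], [7]]
# Scale effect: the same cells aggregated to fewer, larger zones.
coarse = [[0, 1, 2, 3], [4, 5, 6, 7]]

means_a = zone_means(cells, zoning_a)
means_b = zone_means(cells, zoning_b)
means_coarse = zone_means(cells, coarse)
print(means_a, means_b, means_coarse)
```

Zoning A shows sharp contrasts between zones, zoning B smooths most of them away, and the coarse zoning hides the variation entirely, even though no cell value changed.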
- 80. The common problems...
http://www.amazon.com/GIS-20-Essential-Skills/dp/1589482565
- 82. Fundamental Spatial Concepts
Distance
The magnitude of spatial separation
Euclidean (straight line) distance often only an
approximation
Adjacency or neighborhood
Nominal or binary (0,1) equivalent of distance
Levels of adjacency exist: 1st, 2nd, 3rd nearest
neighbor, etc.
Interaction
The strength of the relationship between entities
An inverse function of distance
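The three concepts above can be sketched together. The site names and coordinates are hypothetical, nearest-neighbor order is only one of several ways to define adjacency, and the squared inverse distance used for interaction is an assumed (gravity-style) choice of exponent.

```python
import math

# Hypothetical locations in planar coordinates.
sites = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (6.0, 8.0), "D": (0.0, 1.0)}

def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def nearest_neighbors(name, k=1):
    """Adjacency by order: the k nearest other sites (1st, 2nd, 3rd, ...)."""
    others = [s for s in sites if s != name]
    return sorted(others, key=lambda s: euclidean(sites[name], sites[s]))[:k]

def is_adjacent(a, b, k=1):
    """Binary (0/1) adjacency: 1 if b is among a's k nearest neighbors."""
    return int(b in nearest_neighbors(a, k))

def interaction(a, b, power=2.0):
    """Interaction as an inverse function of distance (inverse-distance weight)."""
    return 1.0 / euclidean(sites[a], sites[b]) ** power

print(euclidean(sites["A"], sites["B"]))
print(nearest_neighbors("A", k=2))
print(is_adjacent("A", "C"), interaction("A", "B"))
```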
- 83. Review (Part 1)
What is Spatial Analysis?
What are the four levels of Spatial Analysis?
What are the three measures?
- 84. Take a Break!
- 85. Nontraditional Spatial Analysis
Traditional spatial analyses grew up in an era of sparse data and very weak
computational power. Today, both of those circumstances are reversed and
many of the old solutions are no longer suitable to answer today's questions.
"Spatial Analysis and Data Mining" reflects this change and combines two things
which, until recently, engaged quite different groups of researchers and
practitioners. Together, they require particular techniques and a sophisticated
understanding of the special problems associated with spatial data. This
geographic data mining, or Geographic Knowledge Discovery (GKD), is not new,
but is developing and changing rapidly as both more, and different, data becomes
available, and people see new applications. The days of ‘Big Data’ require fresh
thinking.
The aim of geographic data mining (GKD) is to assist in the generation of
hypotheses, which can be tested, about interesting or anomalous spatial patterns
which may be discovered in very large databases. It is important that the patterns
discovered should not be statistical or sampling artifacts, and should be nontrivial
and useful. The intent is not to build a system that makes decisions or
interpretations automatically, but one that supports humans in these tasks. Also, GKD is
not synonymous with statistical analysis; such tools have a role in testing the
hypotheses generated by GKD but not in GKD itself.
- 86. DATA is the new OIL…
- 87. Long Tail of Big Data
Head: Big Data. Large continuous datasets coincident over time and space. Ideal for multivariate analysis.
Long Tail: Intelligence Reporting, Science Data, Dark Data.
The tail (a power-law distribution) is good for business but suboptimal for governance. Data in the tail are
often individually curated and unmaintained beyond their initially designed use case. As a result, the data are
discontiguous from other research efforts and discontinuous over space and time.
Dark data is suspected to exist, or ought to exist, but is difficult or impossible to find. The problem of dark data is
real and prevalent in the tail. The long tail is an intractably large management problem.
- 88. Long Tail of NSF data…
Power law            80%             20%
Number of Grants     7,478           1,869
Dollar Amount        $938,548,595    $1,199,088,125
Total Grants (NSF07): 9,347 (Count); $2,137,636,716 (Amount)
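The split is easier to read as shares. The sketch below takes the four figures from the slide and computes what fraction of grants and of dollars falls in each column. The group labels and the `shares` helper are illustrative names, and the totals are summed from the two columns rather than taken from the slide's total row.

```python
# Figures copied from the slide (NSF grants, 2007).
grants = {"80% of grants": 7_478, "20% of grants": 1_869}
dollars = {"80% of grants": 938_548_595, "20% of grants": 1_199_088_125}


def shares(values):
    """Return each group's fraction of the column total."""
    total = sum(values.values())
    return {group: value / total for group, value in values.items()}


grant_share = shares(grants)
dollar_share = shares(dollars)

if __name__ == "__main__":
    for group in grants:
        print(f"{group}: {grant_share[group]:.1%} of grants, "
              f"{dollar_share[group]:.1%} of dollars")
```

On these figures, roughly a fifth of the grants account for a little over half of the dollars, which is the head-and-tail shape the slide describes.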
- 89. Long Tail of data science…
Head                                     Tail
Homogenous                               Heterogeneous
Centralized curation                     Individual curation
Maintained                               Unmaintained
Continuous over S & T                    Discontinuous over S & T
Visibly accessible                       DARK Data
High Velocity                            Slow or NO velocity
High Volume                              Low Volume
Easier Data Integration                  Harder Data Integration
Unreasonable Effectiveness of Data       Reasonable Effectiveness of Data
Open Innovation – Integrated Research    Closed Innovation – Vertical Research
- 90. The Open Innovation Model
In the new model of open innovation, a company commercializes both its own
ideas as well as innovations from other firms and seeks ways to bring its in-
house ideas to market by deploying pathways outside its current businesses.
Note that the boundary between the company and its surrounding
environment is porous (represented by a dashed line), enabling innovations to
move more easily between the two.
Henry W. Chesbrough, "The Era of Open Innovation," MIT Sloan Management Review, Spring 2003.
- 91. “The Unreasonable Effectiveness of Mathematics in the Natural Sciences”
Eugene Wigner (1960), Nobel Laureate
“The Unreasonable Effectiveness of Data”
Peter Norvig, Director of Research at Google Inc.
- 92. Big Data, Small Theory
Spatial Simpson’s Paradox
Global standards will always compete with local social phenomena.
[Figure: two panels showing violence in the north and violence in the south. Left panel: global models average regionally variant phenomena. Right panel: local models account for regional variation.]
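A small numerical sketch makes the paradox concrete. The data below are synthetic and purely illustrative: `x` stands for some hypothetical predictor and `y` for a violence measure, with a "north" and a "south" region. Within each region the relationship is negative, yet a single global fit over the pooled data comes out positive.

```python
def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic regions (assumed values, not real data).
north = {"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]}
south = {"x": [6, 7, 8, 9, 10], "y": [10, 9, 8, 7, 6]}

# Local models: one fit per region.
local_slopes = {
    "north": ols_slope(north["x"], north["y"]),
    "south": ols_slope(south["x"], south["y"]),
}

# Global model: one fit over the pooled observations.
global_slope = ols_slope(north["x"] + south["x"], north["y"] + south["y"])

if __name__ == "__main__":
    print("local slopes:", local_slopes)
    print("global slope:", round(global_slope, 3))
```

The sign flip arises because the global fit picks up the difference in level between the two regions instead of the trend inside each one.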
- 93. New Aged Experimentation
George Box
“The only way to understand complex
systems is to shock those systems and
observe the way they react”
New motivation for experimentation
especially in quasi-experimental methods.
(...more later)
- 95. Nontraditional Datasets
Twitter – A sampled, ongoing collection of tweets with UserId and time.
Some tweets also carry precise location data, but this is not the norm. The
collection pulls roughly 1-2 million tweets per day.
Example Proxy Problems:
Discovery of crowd-sourced phenomena (e.g., people posting to beware of a certain
neighborhood)
Discovery of correlated trends (e.g., finding that people posting about a certain topic in an
area correlates to higher crime in that area)
Tracking sentiment on certain topics and issues
Tracking language usage in areas to determine abnormal language presence in an area
- 96. What is Geographic Knowledge Discovery?
• How can we infer movement patterns from vast amounts of what
appears to be just point data collected in time and associated with an
identifier?
• Technique is applicable to Twitter, FourSquare and MANY others.
Volume plot of photos binned by area on log scale
Paris as seen from Flickr over all time
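A volume plot of this kind starts from a simple binning step. The sketch below counts points per square grid cell and takes log10 of each count. It is only an assumed reconstruction of the idea: the cell size, the function names, and the synthetic points (loosely placed around Paris) are illustrative and do not come from the Flickr data.

```python
import math
from collections import Counter


def bin_counts(points, cell_size):
    """Count (lon, lat) points per square grid cell of side `cell_size` degrees."""
    counts = Counter()
    for lon, lat in points:
        cell = (math.floor(lon / cell_size), math.floor(lat / cell_size))
        counts[cell] += 1
    return counts


def log_scale(counts):
    """Map each cell's count to log10(count) so dense cells do not swamp sparse ones."""
    return {cell: math.log10(n) for cell, n in counts.items()}


# Synthetic photo locations: one busy spot, one moderate spot, one single photo.
photos = [(2.2945, 48.8584)] * 100 + [(2.3376, 48.8606)] * 10 + [(2.3499, 48.8530)]

counts = bin_counts(photos, cell_size=0.01)
volumes = log_scale(counts)

if __name__ == "__main__":
    for cell, volume in sorted(volumes.items()):
        print(cell, counts[cell], round(volume, 2))
```

On a linear scale the busiest cell would dominate the map; the log scale compresses that range so quieter areas remain visible.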
- 97. What is Geographic Knowledge Discovery?
Aggregate micro-pathing on a world of photo metadata with no speed,
time, or distance restrictions
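One way to read "micro-pathing" is sketched below: group point records by identifier, order each group by time, link consecutive points into segments, and then count how many times each segment occurs across all identifiers. This is an assumed interpretation, not the author's implementation. The record layout `(user_id, timestamp, cell)` and the function name are hypothetical, and, as on the slide, no speed, time, or distance filter is applied.

```python
from collections import Counter, defaultdict


def micro_paths(records):
    """Aggregate consecutive-point segments across all identifiers.

    records: iterable of (user_id, timestamp, cell) tuples, in any order.
    Returns a Counter mapping (from_cell, to_cell) to the number of traversals.
    """
    by_user = defaultdict(list)
    for user_id, timestamp, cell in records:
        by_user[user_id].append((timestamp, cell))

    segments = Counter()
    for visits in by_user.values():
        visits.sort(key=lambda visit: visit[0])  # order each user's points by time
        for (_, origin), (_, destination) in zip(visits, visits[1:]):
            if origin != destination:  # staying in the same cell is not a movement
                segments[(origin, destination)] += 1
    return segments


# Synthetic records: cells are labeled A, B, C for readability.
records = [
    ("u1", 1, "A"), ("u1", 3, "C"), ("u1", 2, "B"),
    ("u2", 5, "A"), ("u2", 9, "B"),
    ("u3", 4, "B"), ("u3", 6, "B"),
]

paths = micro_paths(records)

if __name__ == "__main__":
    for (origin, destination), n in paths.most_common():
        print(f"{origin} -> {destination}: {n}")
```

Segments shared by many identifiers surface as the dominant paths. In practice, filters on elapsed time or distance between consecutive points would likely be needed to suppress implausible jumps.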
- 98. Personal Notes
Richard Heimann
Office: UMBC Common Faculty Area 3rd Floor
Phone: 571-403-0119 (C)
Office hours:
Tues. 6:30-7:00 (Virtual);
or by appointment (send e-mail)
I promptly respond to emails. Phone calls are another
matter.
Email: rheimann@umbc.edu or
heimann.richard@gmail.com
- 99. Thank you…
Data Tactics Corporation
https://www.data-tactics-corp.com/
http://datatactics.blogspot.com/
Twitter: @DataTactics
Rich Heimann
Twitter: @rheimann