Spatial Analysis and Geomatics
GES673 UMBC MPS in GIS; Spatial Analysis and Geoprocessing using traditional and nontraditional datasets.

Spatial Analysis and Geomatics Presentation Transcript

  • 1. Geoprocessing & Spatial Analysis, GES673 at Shady Grove. Richard Heimann. © 2013 Richard Heimann. Thursday, February 21, 2013.
  • 2. Course Description. The increased access to spatial data and the overall improved application of spatial analytical methods present certain challenges to social scientific research. This graduate course is designed to focus on substantive social science research topics while exposing students to the rewards and potential risks involved in the application of geographic information systems (GIS), spatial analysis, and spatial statistics in their own research.
    The course will also highlight connections between spatial concepts and data availability. Both traditional spatial science data and new, emerging social media data will be used; the latter better reflect some of the more recent developments in Big Data, most notably the socially critical exploration of such data. Substantive foci will include readings and discussions of spatially explicit theory, leaning toward acknowledgment of a social and spatial turn in Big Data and an enhanced role and extension of spatial analysis to keep up with such trends.
    Throughout the course, lectures and discussions will be complemented with lab sessions introducing spatial analysis methods and GIS and spatial analysis software. The lab sessions will include the use of, among other software, GeoDa and ArcGIS. These lab sessions will introduce many methodological and technical issues relevant to spatial analysis. Assignments for the course include up to two writing assignments, up to four lab assignments, and a final project, which will be presented as a short 15-minute presentation and submitted as a term paper. The writing assignments will include an annotated bibliography/brief literature review within a selected theme area of spatial thinking/perspectives/methods. The lab assignments will focus on building geospatial databases, basic spatial analysis, exploratory spatial data analysis, and spatial regression modeling. The course will include other labs and assignments that will be completed for no grade; these are intended as mechanisms/opportunities for developing and enhancing familiarity with selected software, data resources, and analytic methods.
    Course Objectives:
    - Examine methods and literature of geographic information science, spatial analysis and geographic knowledge discovery.
    - Learn about solving problems and answering questions using GIS and quantitative methods.
    - Use GIS software to learn some of the analytical tools available: ArcGIS Desktop & GeoDa.
    - Gain experience working with traditional and nontraditional social science data (e.g. Flickr, Twitter).
  • 3. Course Notes. Text:
    1. Geospatial Analysis, 3rd edition. By: Michael J. de Smith, Michael Goodchild, and Paul A. Longley. The text is available as an Adobe readable file for download (uses special secure PDF reader), a version for the Kindle, online via a website, and as a printed book. See http://www.spatialanalysisonline.com/ for further information.
    2. Making Spatial Decisions Using GIS: A Workbook. 2nd edition. By: Kathryn Keranen and Robert Kolvoord. Should be available in the Shady Grove Bookstore, from ESRI Press, or on Amazon: http://www.amazon.com/Making-Spatial-Decisions-Using-GIS/dp/1589482808
    3. GeoDa User Guide 0.9.3 (UG). The documentation will be somewhat unsynchronized with the software, but not so much that you will be prevented from completing labs. https://geodacenter.asu.edu/software/documentation
    4. Exploring Spatial Data with GeoDa: A Workbook (UGW). http://www.csiss.org/clearinghouse/GeoDa/geodaworkbook.pdf
    5. Other readings will be required and further suggested. They will be noted in the syllabus and either provided or cited for your discovery.
    Optional Text:
    a. The GIS 20: Essential Skills - http://www.amazon.com/GIS-20-Essential-Skills/dp/1589482565
    Evaluation:
    - Midterm exam (15%): 20 “T/F with explanation” questions based on lectures and readings (open book)
    - Lab Assignments: 50 points (25%) (5 x 10)
    - Reading Labs: 40 points (20%) (4 x 10)
    - Paper (60 points) & Presentation (20 points) (40%)
  • 4. What will we discuss…?
    Methods: Visual Data Analysis; Spatial Analysis; ESDA; Geographic Knowledge Discovery; Spatial Econometrics; Spatial Modeling.
    Theory: First Law of Geography; Spatial Heterogeneity; Spatially Explicit Theory.
    Data: Big Data; Small Data vs. Big Data; MAUP, Ecological Fallacy, Atomistic Fallacy, etc.
  • 5. Why GeoDa, Python, and R? GeoDa is not a GIS, but…
    • Complements all major GIS packages.
    • Windows based, so familiar interface.
    • Relies on the same programming/math as the R package spdep and extends into Python using PySAL.
    • Incorporates more sophisticated statistical routines into spatial analysis than a GIS (e.g. ArcGIS Desktop).
    • Developed by Dr. Luc Anselin, Arizona State U.
    • FREE!
    • Python is an open source (OS), interpreted, object-oriented, high-level programming language.
    • R is an open source (OS), strongly functional language and environment for statistically exploring and analyzing datasets.
  • 6. What do I mean when I say OS? Free and Open Source: you can think of it as “free” as in “free speech,” and “free” as in “free beer.” OpenGeoDa is a cross-platform, open source version. PySAL is the underlying open source library with extended functionality.
  • 7. Introductions: Name; Background; Experience w/ Spatial Analysis; Expectations…; Recently watched movie or book read…
  • 8. Geoprocessing & Spatial Analysis (GES673). What will we talk about today? Just an introduction... but we will be gaining momentum. What is GIS? Spatial Analysis? Why Spatial Analysis, and what are the four levels? The Social Turn in Big Data, and neospatial analysis and mining for knowledge discovery.
  • 9. What is GIS? This is NOT a GIS class. Geographic Information: information is knowledge about “what is where when.” Geographic/geospatial: synonymous; ...spatial: subtly different. What is the ‘S’ in GIS? Systems: the technology. Science: the concepts and theory. Studies: the societal context.
  • 10. Defining Geographic Information Systems (GIS)
    “The common ground between information processing and the many fields using spatial analysis techniques.” (Tomlinson, 1972)
    “A powerful set of tools for collecting, storing, retrieving, transforming, and displaying spatial data from the real world.” (Burrough, 1986)
    “A computerised database management system for the capture, storage, retrieval, analysis and display of spatial (locationally defined) data.” (NCGIA, 1987)
    “A decision support system involving the integration of spatially referenced data in a problem solving environment.” (Cowen, 1988)
  • 11. Geographic Information System: intuitive description. A map with a database behind it; a virtual representation of the real world and its infrastructure.
  • 12. GI Systems, Science and Studies: Which will we do?
    Systems: Advanced Seminar in GIS (GES670); Professional Seminar in Geospatial Technologies (GES659); *Geoprocessing and Spatial Analysis (GES673); *Spatial Social Science (GES679).
    Science: *Geoprocessing and Spatial Analysis (GES673); GIS Modeling Techniques (GES773); Spatial Social Science (GES679); *Spatial Statistics (GES774); Advanced Visualization and Presentation.
    Studies: *Geoprocessing and Spatial Analysis (GES673); GIS Modeling Techniques (GES773); *Spatial Social Science (GES679).
    *Combine hands-on technical training with an understanding of the underlying science, and an emphasis on multidisciplinary applications.
  • 13. Where Most UMBC Students Work and Live
  • 14. The GIS Data Model
  • 15. The GIS Data Model: Purpose. Allows geographic features to be digitally represented and stored in a database so that they can be abstractly presented in map (analog) form, and can also be worked with and manipulated to address some problem. (See associated diagrams.)
  • 16.
  • 17. A layer-cake of information: GIS Data Model
  • 18. Spatial and Attribute Data. Spatial data (where) specifies location; it is stored in a shapefile, geodatabase or similar geographic file. Attribute (descriptive) data (what, how much, when) specifies characteristics at that location, natural or human-created; it is stored in a database table. GIS systems traditionally maintain spatial and attribute data separately, then “join” them for display or analysis.
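The “join” described on this slide can be sketched in a few lines of plain Python. This is only an illustration, not how any particular GIS stores its data: the feature ids, coordinates, and attribute values below are made-up placeholders, and `join_attributes` is a hypothetical helper name.

```python
# Hypothetical spatial table: feature id -> geometry (a point stored as (x, y)).
# Coordinates are illustrative placeholders, not surveyed locations.
spatial = {
    "MD": (-76.6, 39.0),
    "VA": (-78.7, 37.5),
    "DE": (-75.5, 39.0),
}

# Hypothetical attribute table kept separately, keyed by the same id.
# The "value" column is invented for the example.
attributes = [
    {"id": "MD", "name": "MARYLAND", "value": 10},
    {"id": "VA", "name": "VIRGINIA", "value": 20},
]


def join_attributes(spatial, attributes, key="id"):
    """Attach attribute rows to spatial features that share the same key."""
    lookup = {row[key]: row for row in attributes}
    joined = {}
    for feature_id, geometry in spatial.items():
        row = lookup.get(feature_id, {})  # features without a match keep geometry only
        record = {"geometry": geometry}
        record.update({k: v for k, v in row.items() if k != key})
        joined[feature_id] = record
    return joined


layer = join_attributes(spatial, attributes)
print(layer["MD"])
```

Features with no matching attribute row (here `DE`) are kept with geometry only, which mirrors a left join on the spatial table.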
  • 19. Spatial and Attribute Data. Lack of Locational Invariance (Goodchild et al):
    • fundamental property of spatial analysis
    • results change when location changes
    where matters
    [Example attribute table, state name and abbreviation: ALABAMA AL; ALASKA AK; ARIZONA AZ; ARKANSAS AR; CALIFORNIA CA; COLORADO CO; CONNECTICUT CT; DELAWARE DE; DISTRICT OF COLUMBIA DC; FLORIDA FL; GEORGIA GA; HAWAII HI; IDAHO ID; ILLINOIS IL; INDIANA IN; IOWA IA; KANSAS KS; KENTUCKY KY; LOUISIANA LA; MAINE ME; MARYLAND MD; MASSACHUSETTS MA; MICHIGAN MI; MINNESOTA MN; MISSISSIPPI MS; MISSOURI MO; MONTANA MT; NEBRASKA NE; NEVADA NV; NEW HAMPSHIRE NH; NEW JERSEY NJ; NEW MEXICO NM; NEW YORK NY; NORTH CAROLINA NC; NORTH DAKOTA ND; OHIO OH; OKLAHOMA OK; OREGON OR; PENNSYLVANIA PA; RHODE ISLAND RI; SOUTH CAROLINA SC; SOUTH DAKOTA SD; TENNESSEE TN; TEXAS TX; UTAH UT; VERMONT VT; VIRGINIA VA; WASHINGTON WA; WEST VIRGINIA WV; WISCONSIN WI; WYOMING WY]
  • 20. Representing Data with Raster and Vector Models.
    Raster Model: The area is covered by a grid with (usually) equal-sized, square cells; regular lattices. Attributes are recorded by assigning each cell a single value based on the majority feature (attribute) in the cell, such as land use type. Image data is a special case of raster data in which the “attribute” is a reflectance value from the electromagnetic spectrum. Cells in image data are often called pixels (picture elements).
    Vector Model: The fundamental concept of vector GIS is that all geographic features in the real world can be represented as: Points or dots (nodes): cities, human sensors, individual observations. Lines (arcs): movement, connectedness, networks. Areas (polygons): countries, states, census tracts, cities; irregular lattices, multivariate in nature.
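The raster rule on this slide (each cell takes the majority attribute found inside it) can be sketched as follows. The sample points, the 1-unit cell size, and the function names are assumptions made for illustration; real raster conversion tools handle areas, not just sample points.

```python
from collections import Counter


def majority_value(samples):
    """Most common value in one cell; ties go to the value seen first."""
    return Counter(samples).most_common(1)[0][0]


def rasterize(points, cell_size, n_rows, n_cols):
    """Build a grid of majority classes from (x, y, land_use) samples.

    Cells that contain no samples are left as None.
    """
    buckets = {}
    for x, y, value in points:
        row, col = int(y // cell_size), int(x // cell_size)
        if row in range(n_rows) and col in range(n_cols):
            buckets.setdefault((row, col), []).append(value)
    return [
        [majority_value(buckets[(r, c)]) if (r, c) in buckets else None
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical land-use samples on a 2 x 2 grid of 1-unit cells.
samples = [
    (0.5, 0.5, "forest"),
    (0.7, 0.2, "forest"),
    (0.1, 0.9, "urban"),
    (1.5, 0.5, "water"),
]
grid = rasterize(samples, cell_size=1.0, n_rows=2, n_cols=2)
print(grid)
```

Note that the "urban" sample disappears from the output: a single value per cell is exactly the information loss the raster model accepts.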
  • 21. Lattice Data; Yes or No?
  • 22. Lattice Data; Yes or No? [Figure labels: Irregular Lattice; Regular Lattice; Irregular Lattice]
  • 23. What is spatial analysis? From Data to Information ...beyond mapping: transformations, manipulations and application of analytical methods to spatial (geographic) data. Lack of locational invariance (Goodchild et al) is the fundamental property of spatial analysis: these are analyses where the outcome changes when the locations of the objects under study change. Examples: median center vs. median; standard deviational ellipses; autocorrelation vs. spatial autocorrelation. Where matters: in an absolute sense (coordinates) and in a relative sense (spatial arrangement, distance).
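A small sketch of what “lack of locational invariance” means in practice: an aspatial summary (the mean) is unchanged when the same values are moved to different places, while a spatial summary (here a weighted mean center) changes. The coordinates and values are invented for the example, and `weighted_mean_center` is a hypothetical helper rather than a function from any GIS package.

```python
from statistics import mean


def weighted_mean_center(coords, weights):
    """Mean center of the coordinates, weighted by the attribute values."""
    total = sum(weights)
    x = sum(w * cx for (cx, _), w in zip(coords, weights)) / total
    y = sum(w * cy for (_, cy), w in zip(coords, weights)) / total
    return (x, y)


# Four hypothetical locations at the corners of a square.
coords = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
values = [1.0, 2.0, 3.0, 4.0]
# Same set of values, but the first and last observations trade places.
swapped = [4.0, 2.0, 3.0, 1.0]

print(mean(values), mean(swapped))            # aspatial: identical
print(weighted_mean_center(coords, values))   # spatial: depends on where
print(weighted_mean_center(coords, swapped))
```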
  • 24. Spatial analysis as a process: Problem formulation; Data gathering; Exploratory analysis; Hypothesis formulation; Modeling and testing; Consultation and review; Reporting and implementation.
  • 25. Analytical methodologies: Mitchell (2005); Draper et al. (2005)
  • 26. Analytical methodologies: PPDAC. Mackay & Oldford (2002)
  • 27. Components of Spatial Analysis. Visualization: showing interesting patterns. Exploratory Spatial Data Analysis (ESDA): finding interesting patterns. Spatial Modeling, Regression: explaining interesting patterns.
  • 28. THE PROBLEM … GEOGRAPHICAL LITERACY. Despite having a highly educated society, Americans are arguably the world’s most geographically ignorant people. By comparison, children throughout much of the world are exposed to geographic training in both primary and secondary schools. Most Americans learn what little geography they know in elementary or middle school. In the United States, the last time a student hears the word “geography” is usually in the third grade. Discussion of geography at any higher level is hidden under the heading “social studies.” Concern over geographical illiteracy led President Reagan to declare November 15-21, 1987 as the first Geography Awareness Week (a joint resolution of the One Hundredth Congress).
  • 29. GEOGRAPHY TODAY. The National Geographic Society released the Roper Public Affairs 2006 Geographic Literacy Study in May 2006. 510 interviews were conducted among a sample of 18- to 24-year-old adults in the continental United States between December 17, 2005 and January 20, 2006. The sample has a margin of error of +/- 4.4% at the 95% confidence level. Survey results: Over six in ten (63%) of those surveyed could not locate Iraq on a map of the Middle East. Nearly nine in ten (88%) could not identify Afghanistan on a map of Asia. Seven in ten (70%) could not find North Korea on a map, and 63% did not know its border with South Korea is the most heavily fortified in the world. Sizeable percentages did not know that Sudan and Rwanda are located in Africa (54% and 40%, respectively).
  • 30. GEOGRAPHY TODAY (CONTINUED). Three-quarters could not find Indonesia on a world map and were unaware that a majority of Indonesia’s population is Muslim, making it the largest Muslim country in the world. A third or more could not find Louisiana or Mississippi on a map of the United States. Only 18% could correctly answer a multiple-choice question about the most widely spoken native language in the world. (5 Part Questionnaire) Although half said map reading skills are “absolutely necessary” in today’s world, many Americans lack basic practical skills necessary for safety and employment in today’s world. One-third (34%) would go in the wrong direction in the event of an evacuation. One-third (32%) would miss a conference call scheduled with colleagues in another time zone. Recommended link: 2006 National Geographic – Roper Survey of Geographic Literacy, http://www.nationalgeographic.com/roper2006/findings.html
  • 31. Advanced Placement Human Geography. This college-level course introduces students to the systematic study of patterns and processes that have shaped human understanding, use, and alteration of Earth's surface. Students employ spatial concepts and landscape analyses to analyze human social organization and its environmental consequences. They also learn about the methods and tools geographers use in their science and practice.
    Score  Percent
    5      11.6%
    4      16.7%
    3      21.9%
    2      16.6%
    1      33.2%
    In the 2009 administration, 50,730 students took the exam and the mean score was 2.57.
  • 32. Human Geography
  • 33. Human Geography http://www.benjaminbarber.com/bio.html
  • 34. Human Geography
  • 35. WHAT IS GEOGRAPHY?
    • Geography is the study of the earth’s surface as the space within which human populations live.
    • Geography combines characteristics of both the natural and social sciences and literally bridges the gap between the two (more on this later).
    • Geography is a generalized as opposed to a specialized field of study.
    • Space is the unifying theme for geographers.
    • Geography is the science of space and place.
    • Geographers are interested in: where things are located on the earth’s surface; why they are located where they are; how places differ from one another; how people interact with the environment.
    • Geographers were among the first scientists to sound the alarm that human-induced changes to the environment are beginning to threaten the balance of life.
  • 36. What was wrong with Geography? Geography had a number of problems, including: 1. It was overly descriptive: geography followed a set format for the inventory of physical and cultural features. 2. It was almost purely educational: regions don't really exist. 3. It failed to explain geographic patterns: geography was descriptive and did not explain why patterns were the way they were; where attempts at explanation did exist, they favored historical approaches. 4. The biggest problem of geography was the fact that it was unscientific. …the Nomothetic & Idiographic debate in geography begins!
  • 37. Introduction to Spatial Analysis. Topics:
    • Description versus Analysis
    • The concepts of Process, Pattern and Analysis
    • Issues and challenges in spatial data analysis
    • Measuring space
  • 38. Process, Pattern and Analysis. Processes operating in space produce patterns. Spatial Analysis is aimed at: 1. Identifying and describing the pattern. 2. Identifying and understanding the process.
  • 39. Complete Spatial Randomness. Deviations from spatial randomness suggest underlying social processes. “Every observable effect has a physical cause” (Thales). Perhaps the most profound insight: causality is a rejection of randomness. [Figure panels: Randomized Variable, 500 meter cell; Total TTL Count, 500 meter cell]
  • 40. Complete Spatial Randomness. [Figure panels: Randomized Variable, 500 meter cell; Total TTL Count, 500 meter cell] “Every observable effect has a physical cause” (Thales). Perhaps the most profound insight: causality is a rejection of randomness.
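One common way to compare observed counts per grid cell against complete spatial randomness is the variance-to-mean ratio (VMR) of quadrat counts: under a random (Poisson) process it is close to 1, while clustering pushes it well above 1. The sketch below is not the analysis behind the slide's figure; it assumes a hypothetical 5 km by 5 km extent, simulated points, and the 500 meter cell size named on the slide.

```python
import random
from statistics import mean, variance


def quadrat_counts(points, cell, n_rows, n_cols):
    """Count points per square cell and return the counts as a flat list."""
    counts = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        row, col = int(y // cell), int(x // cell)
        if row in range(n_rows) and col in range(n_cols):
            counts[row][col] += 1
    return [n for row in counts for n in row]


def variance_mean_ratio(counts):
    """Sample variance of the counts divided by their mean."""
    return variance(counts) / mean(counts)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
extent = 5000.0         # assumed study area: 5 km x 5 km, so 10 x 10 cells

# Simulated pattern 1: uniformly random locations.
random_pts = [(rng.uniform(0, extent), rng.uniform(0, extent)) for _ in range(500)]
# Simulated pattern 2: locations clustered around the center of the area.
clustered_pts = [(rng.gauss(2500, 300), rng.gauss(2500, 300)) for _ in range(500)]

vmr_random = variance_mean_ratio(quadrat_counts(random_pts, 500.0, 10, 10))
vmr_clustered = variance_mean_ratio(quadrat_counts(clustered_pts, 500.0, 10, 10))
print(vmr_random, vmr_clustered)
```

The VMR depends on the cell size chosen, so a different grid can give a different answer for the same points.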
  • 41. Description vs. Analysis. Description: Most GIS systems are used by governments and private companies to describe the real world; this helps the organization “do its job”, for example, manage sewer and water networks or manage land resources. Most GIS systems are primarily designed for this purpose. They are used to develop spatial databases to describe the real world and help manage it.
  • 42. Description vs. Analysis. Analysis: Tries to understand the processes which cause or create the patterns in the real world. Understanding processes helps the organization do its job better (make better decisions, for example) and helps us understand the phenomenon itself. This is the role of science.
  • 43. Description vs. Analysis. Is the location of the software industry different from that of the telecommunications industry? Here, we are using “centrographic statistics” to help answer this question. Analysis: Tries to understand the processes which cause or create the patterns in the real world. Understanding processes helps the organization do its job better (make better decisions, for example) and helps us understand the phenomenon itself. This is the role of science.
  • 44. Dr. Snow maps cholera in Soho, London (1854)
  • 55. The first example of Spatial Analysis: John Snow’s maps of cholera in 1850s London. Was it ESDA or hypothesis testing? Did he discover the association between water and cholera after drawing the map (ESDA)? Or did he draw the map in order to prove the association (using a map for hypothesis testing)?
  • 56. Spatial Analysis: successive levels of sophistication. Four levels of Spatial Analysis, each more advanced (more difficult!): 1. Spatial data description (the primitives); 2. Exploratory Spatial Data Analysis (ESDA); 3. Spatial statistical analysis and hypothesis testing; 4. Spatial modeling and prediction. We will look at all 4 levels in this lecture series.
  • 58. Spatial Analysis: successive levels of sophistication. 1. Spatial data description (primitive): Focus is on describing the world and representing it in a digital format (computer map, computer database). Uses classic GIS capabilities: buffering, map layer overlay, spatial queries & measurement.
  • 60. Spatial Analysis: successive levels of sophistication. 2. Exploratory Spatial Data Analysis: Searching for patterns and possible explanations. GeoVisualization through calculation and display of centrographic statistics and other spatially descriptive statistics.
  • 61. Spatial Analysis: successive levels of sophistication. 2. Exploratory Spatial Data Analysis: Centrographics, moments of data. [Figure: map showing changes to the mean center of population for the United States, 1790–2010 (U.S. Census Bureau)] http://en.wikipedia.org/wiki/Moment_(mathematics)
  • 62. Spatial Analysis: successive levels of sophistication
  • 63. Spatial Analysis: successive levels of sophistication. The Geography of the Nazi Vote: Context, Confession, and Class in the Reichstag Election of 1930. Author(s): John O'Loughlin, Colin Flint, Luc Anselin. Source: Annals of the Association of American Geographers.
  • 64. Spatial Analysis: successive levels of sophistication. 3. Spatial statistical analysis and hypothesis testing: Are data “to be expected” or are they “unexpected” relative to some statistical model, usually of a random process (pure chance)? [Figure: standard normal curve centered at 0 with 2.5% rejection regions in each tail, beyond -1.96 and 1.96] We can test whether the spatial pattern of voting behavior in Germany in 1930 is in fact clustered or random. The Geography of the Nazi Vote: Context, Confession, and Class in the Reichstag Election of 1930. Author(s): John O'Loughlin, Colin Flint, Luc Anselin. Source: Annals of the Association of American Geographers.
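The “expected vs. unexpected” test with the ±1.96 cutoffs can be sketched as a z-score of an observed statistic against values of the same statistic produced by a random process. This is a minimal sketch, assuming the simulated values are roughly normal; the function names and the toy reference values are hypothetical, and nothing here reproduces the cited study.

```python
from statistics import mean, stdev


def z_score(observed, simulated):
    """Standardize an observed statistic against values from a random process."""
    return (observed - mean(simulated)) / stdev(simulated)


def is_unexpected(observed, simulated, critical=1.96):
    """Two-sided test at the 5% level: 2.5% in each tail beyond +/-1.96."""
    return abs(z_score(observed, simulated)) > critical


# Toy reference distribution (mean 0, standard deviation 1), for illustration only.
reference = [-1.0, 0.0, 1.0]
print(is_unexpected(2.5, reference))  # far in the upper tail
print(is_unexpected(1.0, reference))  # within the expected range
```

In practice the reference values would come from many random permutations of the observed data over the map, not from three hand-picked numbers.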
  • 65. Making things even harder...
    • Inward and outward asymptotics, i.e. increasing spatial extent, increasing temporal lags, finer spatial resolution, finer temporal resolution.
    • Increased number of cross sections.
    • …visual correlations and visual detection of change over space and time do not exist.
    • Apophenia is real!
    • Spatial Analysis and Geographic Pattern Recognition will reduce patternicity (Sherman, 2008).
  • 66. Spatially Random or Spatially Clustered?
  • 67. Spatially Random or Spatially Clustered? [Figure: two maps, one with Moran’s I: 0.689 and one with Moran’s I: 0.003]
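For readers who want to see how a Moran's I value like those on the slide is computed, here is a minimal sketch for a regular grid with binary rook contiguity (cells sharing an edge are neighbors). The grids are made-up toy data and do not reproduce the 0.689 or 0.003 values; GeoDa and PySAL offer other weighting schemes and significance tests that this sketch omits.

```python
def rook_neighbors(n_rows, n_cols):
    """Map each cell (row, col) to the cells that share an edge with it."""
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            neighbors[(r, c)] = [
                (i, j) for i, j in candidates
                if i in range(n_rows) and j in range(n_cols)
            ]
    return neighbors


def morans_i(grid):
    """Global Moran's I with binary rook weights (no row standardization)."""
    n_rows, n_cols = len(grid), len(grid[0])
    neighbors = rook_neighbors(n_rows, n_cols)
    n = n_rows * n_cols
    avg = sum(sum(row) for row in grid) / n
    dev = {(r, c): grid[r][c] - avg for r in range(n_rows) for c in range(n_cols)}
    s0 = sum(len(cells) for cells in neighbors.values())  # sum of all weights
    cross = sum(dev[i] * dev[j] for i in neighbors for j in neighbors[i])
    total = sum(d * d for d in dev.values())
    return (n * cross) / (s0 * total)


# Toy data: like values grouped together vs. a checkerboard.
clustered = [[1, 1, 0, 0]] * 4
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
print(morans_i(clustered))     # positive: similar values are neighbors
print(morans_i(checkerboard))  # negative: neighbors are always different
```

Values near zero indicate no spatial autocorrelation; judging whether a value is significantly different from random still requires a reference distribution.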
  • 68. Spatial Analysis: successive levels of sophistication. 4. Spatial modeling: prediction. Construct models (of processes) to predict spatial outcomes (patterns). [Map panels: Coefficient: % Poverty; Coefficient: % FB; Coefficient: % Elderly; Coefficient: % Black]
  • 69. Spatial Analysis: successive levels of sophistication
  • 70. Spatial Analysis: successive levels of sophistication. Statistically significant global variables that exhibit strong regional variation inform local policy. Statistically significant global variables that exhibit little regional variation inform region-wide policy.
  • 71. Spatial Analysis: successive levels of sophistication. Local R2 informs us where the model is performing well and where it is performing poorly. The poor results in the south may indicate that an important variable is missing from our model.
  • 72. Issues/Challenges/Problems in Spatial Analysis. Summarize these now. Talk in greater detail about them throughout this lecture series.
  • 73. Critical Issues in Spatial Analysis
  • 79. Critical Issues in Spatial Analysis
    • Spatial autocorrelation: Data from locations near to each other are usually more similar than data from locations far away from each other.
    • Modifiable areal unit problem (MAUP, zone effect): Results may depend on the specific geographic unit used in the study (province or county; county or city).
    • Scale affects representation and results (MAUP, scale effect): Cities may be represented as points or polygons. Results depend on the scale at which the analysis is conducted: province or county.
    • Ecological fallacy (MAUP, individual effect): Results obtained from aggregated data (e.g. provinces) cannot be assumed to apply to individual people.
    • Non-uniformity of space: Phenomena are not distributed evenly in space. Be careful how you interpret results!
    • Edge issues: Edges of the map, beyond which there is no data, can significantly affect results.
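The MAUP and the ecological fallacy listed above can be illustrated with a toy calculation: the same eight individual observations give different correlations depending on how they are grouped into zones, and both zonings overstate the individual-level relationship. The data and zone definitions below are synthetic, chosen only to make the effect visible.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def aggregate(values, zones):
    """Average the values of the units that fall in each zone."""
    return [mean(values[i] for i in zone) for zone in zones]


# Synthetic individual-level data for eight units laid out along a line.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3, 1, 4, 2, 7, 5, 8, 6]

# Two hypothetical zonings of the same eight units.
zones_a = [(0, 1), (2, 3), (4, 5), (6, 7)]    # four zones of two units
zones_b = [(0,), (1, 2), (3, 4), (5, 6), (7,)]  # same width, shifted boundaries

r_individual = pearson_r(x, y)
r_zones_a = pearson_r(aggregate(x, zones_a), aggregate(y, zones_a))
r_zones_b = pearson_r(aggregate(x, zones_b), aggregate(y, zones_b))
print(round(r_individual, 3), round(r_zones_a, 3), round(r_zones_b, 3))
```

The gap between the individual and zonal correlations is the ecological fallacy risk; the gap between the two zonings is the zone effect.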
  • 80. The common problems... http://www.amazon.com/GIS-20-Essential-Skills/dp/1589482565
  • 81. Measuring Space
  • 82. Fundamental Spatial Concepts
    Distance: The magnitude of spatial separation. Euclidean (straight line) distance is often only an approximation.
    Adjacency or neighborhood: The nominal or binary (0,1) equivalent of distance. Levels of adjacency exist: 1st, 2nd, 3rd nearest neighbor, etc.
    Interaction: The strength of the relationship between entities. An inverse function of distance.
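The three measures on this slide can each be written as a short function. This is a sketch under simple assumptions: planar coordinates, a distance threshold as the adjacency rule, and interaction defined as one over distance raised to a power. The point coordinates and function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def k_nearest_neighbors(points, i, k):
    """Indices of the k points closest to point i (1st, 2nd, ... nearest)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: euclidean(points[i], points[j]))
    return others[:k]


def binary_adjacency(points, threshold):
    """0/1 matrix: 1 when two distinct points lie within the threshold distance."""
    n = len(points)
    return [
        [1 if i != j and threshold >= euclidean(points[i], points[j]) else 0
         for j in range(n)]
        for i in range(n)
    ]


def interaction(a, b, power=1.0):
    """Interaction strength modeled as an inverse function of distance."""
    return 1.0 / euclidean(a, b) ** power


# Hypothetical point locations in arbitrary planar units.
pts = [(0, 0), (3, 4), (6, 8), (0, 1)]
print(euclidean(pts[0], pts[1]))
print(k_nearest_neighbors(pts, 0, 2))
print(binary_adjacency(pts, 5.0)[0])
print(interaction(pts[0], pts[1], power=2))
```

A threshold is only one way to define adjacency; shared borders (for polygons) and k nearest neighbors are common alternatives.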
  • 83. Review (Part 1) What is Spatial Analysis? What are the four levels of Spatial Analysis? What are the three measures?
  • 84. Take a Break!
  • 85. Nontraditional Spatial Analysis Traditional spatial analyses grew up in an era of sparse data and very weak computational power. Today, both of those circumstances are reversed, and many of the old solutions are no longer suitable for answering today's questions. "Spatial Analysis and Data Mining" reflects this change and combines two things which, until recently, engaged quite different groups of researchers and practitioners. Together, they require particular techniques and a sophisticated understanding of the special problems associated with spatial data. This geographic data mining, or Geographic Knowledge Discovery (GKD), is not new, but it is developing and changing rapidly as more, and different, data become available and people see new applications. The days of ‘Big Data’ require fresh thinking. The aim of geographic data mining (GKD) is to assist in generating testable hypotheses about interesting or anomalous spatial patterns that may be discovered in very large databases. It is important that the patterns discovered are not statistical or sampling artifacts, and that they are nontrivial and useful. The intent is not to build a system that makes decisions or interpretations automatically, but one that supports humans in these tasks. GKD is also not synonymous with statistical analysis: statistical tools have a role in testing the hypotheses generated by GKD, but not in GKD itself.
  • 86. DATA is the new OIL…
  • 87. Long Tail of Big Data Head: Big Data. Long Tail: Intelligence Reporting, Science Data – Dark Data. Head: Big Data – large continuous datasets, coincident over time and space; ideal for multivariate analysis. Tail: the tail of the power-law distribution is good for business but suboptimal for governance. Data in the tail is often individually curated and left unmaintained beyond its initially designed use case. As a result, the data is discontiguous from other research efforts and discontinuous over space and time. Dark data is data that is suspected to exist, or ought to exist, but is difficult or impossible to find. The problem of dark data is real and prevalent in the tail. The long tail is an intractably large management problem.
  • 88. Long Tail of NSF data… Power law: 80% of grants (7,478) account for $938,548,595; 20% of grants (1,869) account for $1,199,088,125. Total grants (NSF07): 9,347 (count), $2,137,636,716 (amount).
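The arithmetic behind the slide above can be checked directly. This sketch assumes the flattened table reads as two columns (the "80%" group and the "20%" group of grants) with counts and dollar amounts in that order; the dictionary keys and variable names are made up for the example, and the totals are used exactly as stated on the slide.

```python
# Figures as they appear on the slide (NSF07); column assignment is assumed.
grants = {"80% group": 7_478, "20% group": 1_869}
dollars = {"80% group": 938_548_595, "20% group": 1_199_088_125}
total_grants = 9_347
total_dollars = 2_137_636_716  # total as stated on the slide

# Share of the grant count and of the dollar amount held by each group
count_share = {k: v / total_grants for k, v in grants.items()}
dollar_share = {k: v / total_dollars for k, v in dollars.items()}

# Average award size per group shows how uneven the distribution is
mean_award = {k: dollars[k] / grants[k] for k in grants}

for group in grants:
    print(f"{group}: {count_share[group]:.1%} of grants, "
          f"{dollar_share[group]:.1%} of dollars, "
          f"mean award ${mean_award[group]:,.0f}")
```

Under this reading, the many small grants that make up about 80% of the count hold roughly 44% of the money, while the remaining 20% of grants hold roughly 56%, with an average award several times larger.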
  • 89. Long Tail of data science… Head vs. Tail • Homogeneous vs. Heterogeneous • Centralized curation vs. Individual curation • Maintained vs. Unmaintained • Continuous over S & T vs. Discontinuous over S & T • Visibly accessible vs. DARK Data • High Velocity vs. Slow or NO velocity • High Volume vs. Low Volume • Easier Data Integration vs. Harder Data Integration • Unreasonable Effectiveness of Data vs. Reasonable Effectiveness of Data • Open Innovation – Integrated Research vs. Closed Innovation – Vertical Research
  • 90. The Open Innovation Model In the new model of open innovation, a company commercializes both its own ideas as well as innovations from other firms and seeks ways to bring its in-house ideas to market by deploying pathways outside its current businesses. Note that the boundary between the company and its surrounding environment is porous (represented by a dashed line), enabling innovations to move more easily between the two. Henry W. Chesbrough, “The Era of Open Innovation,” MIT Sloan Management Review, Spring 2003.
  • 91. “The Unreasonable Effectiveness of Mathematics in the Natural Sciences” – Eugene Wigner (1960), Nobel Laureate. “The Unreasonable Effectiveness of Data” – Peter Norvig, Director of Research at Google Inc.
  • 92. Big Data, Small Theory: Spatial Simpson’s Paradox Global standards will always compete with local social phenomena. [Figure: two panels contrasting violence in the north with violence in the south.] Global models average regionally variant phenomena. Local models account for regional variation.
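A small numeric sketch may make the spatial Simpson's paradox concrete: a global model fitted to pooled data can report the opposite trend from local models fitted region by region. The numbers below are invented purely for illustration, and the names `ols_slope`, `north`, and `south` are hypothetical; `x` stands for any predictor and `y` for a violence rate.

```python
def ols_slope(xs, ys):
    """Slope of a simple least-squares line y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Toy (x, y) observations for two regions; values are made up.
north = {"x": [1, 2, 3], "y": [5, 4, 3]}
south = {"x": [6, 7, 8], "y": [10, 9, 8]}

# Local models: one slope per region
local_north = ols_slope(north["x"], north["y"])
local_south = ols_slope(south["x"], south["y"])

# Global model: pool both regions and fit a single slope
global_slope = ols_slope(north["x"] + south["x"], north["y"] + south["y"])

print(local_north, local_south)  # both negative
print(global_slope)              # positive: the pooled trend reverses
```

Within each region the relationship is negative, yet the pooled fit is positive because the regions differ in their overall levels. This is the sense in which a global model averages over regionally variant phenomena.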
  • 93. New Aged Experimentation George Box: “The only way to understand complex systems is to shock those systems and observe the way they react.” New motivation for experimentation, especially in quasi-experimental methods. (...more later)
  • 94. New Aged Experimentation
  • 95. Nontraditional Datasets Twitter – Sampled, ongoing collection of tweets with UserId and time. Some tweets also carry precise location data, but this is not the norm. The collection pulls roughly 1 to 2 million tweets per day. Example proxy problems: • Discovery of crowd-sourced phenomena (e.g., people posting to beware of a certain neighborhood) • Discovery of correlated trends (e.g., finding that posts about a certain topic in an area correlate with higher crime in that area) • Tracking sentiment on certain topics and issues • Tracking language usage to detect abnormal language presence in an area
  • 96. What is Geographic Knowledge Discovery? • How can we infer movement patterns from vast amounts of what appears to be just point data, collected over time and associated with an identifier? • The technique is applicable to Twitter, FourSquare and many others. [Figures: volume plot of photos binned by area on a log scale; Paris as seen from Flickr over all time.]
  • 97. What is Geographic Knowledge Discovery? Aggregate micro-pathing on a world of photo metadata with no speed, time, or distance restrictions
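One hedged reading of the question on the two slides above: group point records by identifier, order each group by time, and connect consecutive points into short path segments, with no speed, time, or distance filters applied. The record layout `(user_id, timestamp, lon, lat)`, the function names, the grid cell size, and the toy records are assumptions for this sketch, not the method actually used for the Flickr figures.

```python
import math
from collections import Counter, defaultdict


def micro_paths(records):
    """Connect each identifier's consecutive points (by time) into segments."""
    by_user = defaultdict(list)
    for user, t, lon, lat in records:
        by_user[user].append((t, lon, lat))
    segments = []
    for user, pts in by_user.items():
        pts.sort()  # order by timestamp
        for (_, x0, y0), (_, x1, y1) in zip(pts, pts[1:]):
            segments.append((user, (x0, y0), (x1, y1)))
    return segments


def log_binned_counts(records, cell=1.0):
    """Count points per grid cell and return log10 of each count."""
    counts = Counter((math.floor(lon / cell), math.floor(lat / cell))
                     for _, _, lon, lat in records)
    return {c: math.log10(n) for c, n in counts.items()}


# Toy records: (user_id, timestamp, lon, lat); values are made up.
records = [
    ("a", 3, 2.5, 48.5),
    ("a", 1, 2.1, 48.2),
    ("b", 2, 2.7, 48.9),
    ("a", 2, 3.4, 48.6),
    ("b", 5, 3.1, 49.2),
]

segments = micro_paths(records)
binned = log_binned_counts(records)
print(segments)
print(binned)
```

Aggregating many such segments across identifiers is what turns isolated points into a picture of movement, and the log scale keeps a few very busy cells from drowning out the rest of the map.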
  • 98. Personal Notes Richard Heimann Office: UMBC Common Faculty Area 3rd Floor Phone: 571-403-0119 (C) Office hours: Tues. 6:30-7:00 (Virtual); or by appointment (send e-mail) I promptly respond to emails. Phone calls are another matter. Email: rheimann@umbc.edu or heimann.richard@gmail.com
  • 99. Thank you… Data Tactics Corporation https://www.data-tactics-corp.com/ http://datatactics.blogspot.com/ Twitter: @DataTactics Rich Heimann Twitter: @rheimann