6.
1. Why we use R for analyzing spatial data
2. The relation between R and geographical information
systems (GIS)
3. What spatial data are, and the types of spatial data
we distinguish
4. The challenges posed by their storage and display
5. The analysis of observed spatial data in relation to
processes thought to have generated them
6. Sources of information about the use of R for spatial
data analysis and the structure of the book.
Presentation Layout
8.
What is R?
Created by Ross Ihaka and Robert Gentleman at the University of
Auckland, New Zealand.
Currently developed by the R Development Core
Team
“R is a free software environment for statistical
computing and graphics. It compiles and runs on a
wide variety of UNIX platforms, Windows and MacOS.”
Everything in R is an object
10.
Interactive Language
Data Structure
Graphics
Missing Values
Functions as First-Class Objects
Packages
Community
https://www.rstudio.com/products/rstudio/features/
Why use R?
12.
R has a package system that makes it
extremely easy for people to add their
own functionality so it is
indistinguishable from the central part
of R. And people have. There are
thousands of packages that do all sorts
of extraordinary things.
Packages
17.
What is GIS
Geographical Information System
“…a powerful set of tools for collecting, storing,
retrieving at will, transforming, and displaying
spatial data from the real world for a particular set of
purposes” - Burrough and McDonnell (1998, p. 11)
“…checking, manipulating, and analysing data,
which are spatially referenced to the Earth”
19.
“Generally speaking, spatial data
represents the location, size and shape
of an object on planet Earth such as a
building, lake, mountain or township.
Spatial data may also include attributes
that provide more information about
the entity that is being represented.”
28.
Two parts:
The first presents the shared R packages, functions,
classes, and methods for handling spatial data.
The second showcases more specialized kinds of spatial data
analysis, in which the relative position of
observations in space may contribute to
understanding the data generation process.
Layout of the book
29.
If you have any questions, please
don’t hesitate to ask them now, because
once I say “thank you”, no more
questions will be answered.
Q & A
Does the spatial patterning of disease incidences give rise to the conclusion that they are clustered, and if so, are the clusters found related to factors such as age, relative poverty, or pollution sources?
Given a number of observed soil samples, which part of a study area is polluted?
Given scattered air quality measurements, how many people are exposed to high levels of black smoke or particulate matter (e.g. PM10), and where do they live?
Do governments tend to compare their policies with those of their neighbours, or do they behave independently?
Agricultural land
Weather – Rainfall
Population Density
R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team, of which John Chambers, the principal designer of S, is a member. R is named partly after the first names of the first two R authors and partly as a play on the name of S.
“R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. To download R, please choose your preferred CRAN mirror.”
R implements the language S, an object-oriented language designed for data analysis.
R is used mostly in academia, S-Plus more in corporate businesses
Everything in R is an object.
R uses a database (the workspace) where it stores its objects; this is empty or loaded on start-up, and (possibly) saved on exit.
During run-time, R does everything in memory, unless you load or save data from/to disk or a connection.
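The points above can be sketched in a few lines of base R. The object names (`x`, `f`) and the temporary file are illustrative assumptions and do not come from the original slides.

```r
# Everything is an object: data and functions alike.
x <- c(2.5, 3.1, 4.8)          # a numeric vector
f <- function(v) v * 2         # a function is an object too

class(x)                       # "numeric"
class(f)                       # "function"

# Objects live in memory; ls() lists the names in the current environment.
objs <- ls()

# Nothing is written to disk until you ask for it explicitly.
tmp <- tempfile(fileext = ".RData")
save(x, f, file = tmp)         # write the two objects to a file
rm(x, f)                       # remove them from memory
load(tmp)                      # restore them from the saved file
restored <- exists("x") && exists("f")
```

Saving the whole workspace on exit works the same way, only with every object rather than a chosen few.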
www.r-project.org:
Data analysis is inherently an interactive process — what you see at one stage determines what you want to do next. Interactivity is important. Language is important. The two together — an interactive language — is even more than their sum.
R has a fantastic mechanism for creating data structures. Obviously if you are doing data analysis, you want to be able to put your data into a natural form. You don’t have to warp your data into a particular structure because that is all that is available.
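As a rough illustration of putting data into a natural form, the sketch below stores a few soil samples in a data frame and bundles them with related information in a list. All names and values (`samples`, `survey`, the lead readings, the threshold of 100) are made up for the example.

```r
# A data frame: one row per observation, one typed column per variable.
samples <- data.frame(
  site = c("A", "B", "C"),
  x    = c(10.2, 11.7, 9.8),     # easting (arbitrary units)
  y    = c(50.1, 49.6, 50.9),    # northing (arbitrary units)
  lead = c(35, 120, 58),         # hypothetical lead concentration
  stringsAsFactors = FALSE
)

# A list can bundle objects of different kinds into one structure.
survey <- list(
  name     = "demo survey",
  data     = samples,
  polluted = samples$site[samples$lead > 100]   # sites above the assumed threshold
)

str(survey, max.level = 1)       # compact overview of the structure
```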
Graphics should be central to data analysis. Humans are predominantly visual; we don’t intuitively grasp numbers the way we do pictures. It is easy to produce graphs for exploring data. The default graphs can be tweaked to get publication-quality graphs.
Functions, like mean and median, are objects that you can use like data. You can easily change your analysis to use the median (or some strange estimate you make up on the spot) rather than the mean.
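A small sketch of what this looks like in practice. The helper `summarise_with` and the `incomes` values are hypothetical, introduced only for illustration.

```r
# A function that takes another function as an argument.
summarise_with <- function(values, estimator = mean) {
  estimator(values)
}

incomes <- c(21, 23, 25, 27, 250)   # made-up values with one outlier

summarise_with(incomes)             # mean: 69.2
summarise_with(incomes, median)     # median: 25
# An estimate made up on the spot: the midpoint of the range
summarise_with(incomes, function(v) (min(v) + max(v)) / 2)   # 135.5
```

Swapping the estimator changes one argument and leaves the rest of the analysis untouched.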
Real data have missing values. Missing values are an integral part of the R language. Many functions have arguments that control how missing values are to be handled.
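For example, with a made-up vector of rainfall readings (the name `rain` and its values are assumptions for this sketch), the `na.rm` argument of `mean` controls whether missing values are dropped:

```r
rain <- c(12.4, NA, 8.1, 0, NA, 3.5)   # NA marks a missing reading

is.na(rain)                # TRUE where a reading is missing
sum(is.na(rain))           # 2 missing readings
mean(rain)                 # NA: missing values propagate by default
mean(rain, na.rm = TRUE)   # 6: mean of the four observed values
```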
Spatial data, also known as geospatial data, is information about a physical object that can be represented by numerical values in a geographic coordinate system.
Generally speaking, spatial data represents the location, size and shape of an object on planet Earth such as a building, lake, mountain or township. Spatial data may also include attributes that provide more information about the entity that is being represented. Geographic Information Systems (GIS) or other specialized software applications can be used to access, visualize, manipulate and analyze geospatial data.
Point, a single point location, such as a GPS reading or a geocoded address
Line, a set of ordered points, connected by straight line segments
Polygon, an area, marked by one or more enclosing lines, possibly containing holes
Grid, a collection of points or rectangular cells, organised in a regular lattice
The first three are vector data models and represent entities as exactly as possible,
while the final data model is a raster data model, representing continuous surfaces by using a regular tessellation.
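To make the four data models concrete, here is a minimal sketch using plain base R structures rather than dedicated spatial classes. All coordinates, names (`pt`, `ln`, `ring`, `elev`) and the cell size are invented for illustration.

```r
# Point: a single coordinate pair.
pt <- c(x = 2, y = 3)

# Line: ordered points joined by straight segments (one row per vertex).
ln  <- rbind(c(0, 0), c(3, 4), c(3, 10))
seg <- diff(ln)                              # per-segment dx and dy
line_length <- sum(sqrt(rowSums(seg^2)))     # 5 + 6 = 11

# Polygon: a closed ring (first vertex repeated at the end).
ring <- rbind(c(0, 0), c(4, 0), c(4, 3), c(0, 3), c(0, 0))
n <- nrow(ring)
# Shoelace formula for the area of a simple polygon.
poly_area <- abs(sum(ring[-n, 1] * ring[-1, 2] -
                     ring[-1, 1] * ring[-n, 2])) / 2   # 12

# Grid (raster): values on a regular lattice of cells.
elev <- matrix(c(1, 2, 3,
                 4, 5, 6), nrow = 2, byrow = TRUE)
cell_size <- 10    # assumed cell width and height, in map units
dim(elev)          # 2 rows, 3 columns
```

The vector models store exact vertex coordinates, while the grid stores only cell values plus the lattice geometry (origin and cell size).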
Kept in memory
Update online
Displayed on maps
Other properties: coastlines, rivers, administrative boundaries, etc.
R provides on-screen graphics and has many graphics drivers, for example for vector graphics output to PostScript, Windows metafiles, PDF, and many bitmapped graphics formats.
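A short sketch of how the same plotting code can be sent to different graphics drivers, here the `pdf` and `postscript` devices from the standard grDevices package. The helper `draw` and the plotted values are hypothetical.

```r
# The plotting code itself does not depend on the output device.
draw <- function() {
  plot(c(1, 2, 3, 4), c(2, 4, 3, 5), type = "b",
       xlab = "x", ylab = "y", main = "Same plot, different device")
}

# Vector output to PDF.
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file, width = 6, height = 4)
draw()
invisible(dev.off())       # close the device so the file is written

# Vector output to PostScript.
ps_file <- tempfile(fileext = ".ps")
postscript(ps_file, width = 6, height = 4)
draw()
invisible(dev.off())
```

Calling `draw()` with no file device open sends the plot to the on-screen device instead.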