Essentials of R
Bas Kempen
© ISRIC-World Soil Information, 2017. Reproduction or dissemination of the work as a whole or parts is not permitted without
consent of the author. Sale or placement on a website where payment must be made to access the document is strictly prohibited. https://creativecommons.org/licenses/by-nc-nd/4.0/
What is R?
• R is a free software environment for statistical computing and
graphics
• R provides a wide variety of statistical (linear and nonlinear
modelling, classical statistical tests, time-series analysis,
classification, clustering, …) and graphical techniques, and is highly
extensible.
– Base functionality (comes with R installation)
– Extension via ‘Packages’ (~6,700)
• www.r-project.org/
• www.r-tutor.com
Why R?
• It is free
• It runs on a variety of platforms
• Platform for (advanced) statistical data analyses
• State-of-the-art graphic capabilities
• Connects with other software (SAGA GIS, Google Earth, Python)
• Very large user community on the web; lots of resources
Drawbacks:
• R has a steep learning curve
• Thousands of packages, not always easy to find what you are
looking for
• Sometimes cryptic error messages
• Not a GIS
Working with R
RStudio
First steps
• Tell R where to find and save your files by setting the
working directory with the ‘setwd’ command (in Windows paths, use
forward slashes or doubled backslashes):
– setwd("D:/Bas/SpringSchool/Rintro/workingdir")
– setwd("D:\\Bas\\SpringSchool\\Rintro\\workingdir")
• Load the packages required for your analyses:
– library(gstat)
– require(gstat)
(packages must first be installed on your computer, e.g. with install.packages("gstat"))
• Note: R is case-sensitive!
Setting up your session
R scripts and data objects
• R scripts are saved as “<name>.R” files
• R objects in the environment can be saved for future
use.
• Save the entire environment or only selected
objects.
• Objects are saved as “<name>.rda” or
“<name>.RData” files
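A minimal sketch of saving and restoring objects with base R's save(), save.image() and load(). The object names are hypothetical, and tempfile() is used only so the example does not write into your working directory.

```r
# Two example objects (hypothetical names and values)
soil_ph  <- c(5.2, 6.1, 7.0)
site_ids <- c("A", "B", "C")

# Save only selected objects to an .RData file
obj_file <- tempfile(fileext = ".RData")
save(soil_ph, site_ids, file = obj_file)

# Save the entire environment (all objects in the global environment)
env_file <- tempfile(fileext = ".RData")
save.image(file = env_file)

# Remove the objects, then restore them from disk
rm(soil_ph, site_ids)
load(obj_file)
print(soil_ph)
```

In a real session you would pass a file name such as "mydata.RData" instead of tempfile().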
Basic Data Types
• Numeric
• Integer
• Logical
• Character
• Factor
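The basic data types listed above can be inspected with class(). This is a short illustration; the values are made up.

```r
x_num <- 3.14                                 # numeric (double precision)
x_int <- 42L                                  # integer (note the L suffix)
x_lgl <- TRUE                                 # logical
x_chr <- "clay loam"                          # character
x_fac <- factor(c("sand", "clay", "sand"))    # factor: categories with levels

# class() reports the type of each object
types <- sapply(list(x_num, x_int, x_lgl, x_chr, x_fac), class)
types           # "numeric" "integer" "logical" "character" "factor"

# levels() lists the categories of a factor (sorted alphabetically)
levels(x_fac)   # "clay" "sand"
```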
Vectors
• Sequence of data elements of the same basic
type.
• Vectors can be combined with the c() function.
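A small sketch of creating and combining vectors; the depth values are illustrative only. The last part shows what happens when types are mixed: R coerces all elements to the most general type.

```r
# c() ("combine") creates a vector from individual elements
depth <- c(0, 30, 60, 100)

# Vectors can be combined with c() as well
more_depth <- c(depth, 200)
length(more_depth)    # 5

# All elements share one type: mixing types coerces them,
# here everything becomes character
mixed <- c(1, "a", TRUE)
class(mixed)          # "character"

# Regular sequences
seq(0, 100, by = 25)  # 0 25 50 75 100
1:5                   # 1 2 3 4 5
```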
Vector arithmetic
• Vectorized operations: most operations work on vectors with
the same syntax as they work on scalars (no need for looping)
• Vector arithmetic:
• Recycling of vector elements: if two vectors differ in length, the
shorter one is repeated to match the longer one.
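A minimal illustration of vectorized arithmetic and recycling with made-up numbers:

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30, 40)

# Element-wise arithmetic, no loop needed
a + b             # 11 22 33 44
a * 2             # 2 4 6 8 (the scalar 2 is recycled)

# Recycling: c(100, 200) is repeated to length 4
a + c(100, 200)   # 101 202 103 204

# If the longer length is not a multiple of the shorter one,
# R still recycles but issues a warning:
# a + c(1, 2, 3)
```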
Other data structures
• Matrices
• Lists
• Data frames
• The data frame is the fundamental data structure
for statistical modelling in R.
• A data frame is a table with columns and rows
(fields and records).
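A brief sketch of the other two structures: a matrix holds elements of a single type in two dimensions, while a list can hold elements of different types and lengths. The names and values are hypothetical.

```r
# Matrix: two-dimensional, all elements of one type
m <- matrix(1:6, nrow = 2)   # filled column by column
dim(m)                       # 2 3
m[2, 1]                      # 2

# List: elements may differ in type and length
profile <- list(id = "P01", depths = c(0, 30, 60), sampled = TRUE)
profile$depths               # 0 30 60
```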
Data frame
• Columns can have different data types
(numeric, integer, logical, character, factor)
• All columns must have the same length
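A data frame mixing the column types listed above can be built with data.frame(); str() then shows the type of each column. The site data are made up.

```r
soil <- data.frame(
  site    = c("A", "B", "C"),                # character (factor in R < 4.0)
  ph      = c(5.2, 6.1, 7.0),                # numeric
  n_layer = c(3L, 4L, 2L),                   # integer
  organic = c(TRUE, FALSE, TRUE),            # logical
  texture = factor(c("clay", "sand", "clay")) # factor
)
str(soil)    # one line per column, with its type
nrow(soil)   # 3 rows (records)
ncol(soil)   # 5 columns (fields)
```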
Selecting subsets
• Selection is done with ‘[...]’
• Vector:
• Data frame:
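Selection with square brackets works by position, by negative position (exclusion), by name, or by a logical condition. A data frame takes two indices, [rows, columns]. The values below are illustrative.

```r
v <- c(12, 7, 30, 18)
v[2]              # 7         (second element)
v[c(1, 3)]        # 12 30     (several elements)
v[-1]             # 7 30 18   (all but the first)
v[v > 10]         # 12 30 18  (logical condition)

soil <- data.frame(site = c("A", "B", "C"), ph = c(5.2, 6.1, 7.0),
                   stringsAsFactors = FALSE)
soil[2, ]                # row 2, all columns
soil[, "ph"]             # column ph as a vector (same as soil$ph)
soil[soil$ph > 6, ]      # rows where ph exceeds 6
```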
Functions
• Data analyses and modelling are done through functions.
• These can be very simple:
• More complex functions have multiple arguments (inputs)
• Each argument expects a specific kind of input (e.g. a number, a
data frame or a formula)
• Access help with ?, e.g. ?fit.variogram (gstat package)
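A short sketch of calling functions with one and with several arguments, and of writing your own function. The to_kelvin function is a hypothetical example, not part of any package.

```r
ph <- c(5.2, 6.1, NA, 7.0)

# A simple function with one argument
length(ph)                 # 4

# Extra, named arguments change the behaviour: na.rm = TRUE drops
# missing values (NA) before computing the mean
mean(ph)                   # NA
mean(ph, na.rm = TRUE)     # 6.1

# Writing your own function (hypothetical example)
to_kelvin <- function(celsius) {
  celsius + 273.15
}
to_kelvin(c(0, 25))        # 273.15 298.15

# args() lists the arguments a function accepts
args(mean)
```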
Plotting
• A large number of packages and functions generate plots,
ranging from basic functionality to ‘high-level’ packages
such as lattice and ggplot2.
• The basic function for plotting is ‘plot’
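A minimal use of plot() with axis labels and a title; the depth and carbon values are made up for illustration.

```r
depth  <- c(0, 30, 60, 100)      # depth (cm), made-up values
carbon <- c(2.5, 1.4, 0.8, 0.3)  # organic carbon (%), made-up values

# type = "b" draws both points and connecting lines
plot(depth, carbon, type = "b",
     xlab = "Depth (cm)", ylab = "Organic carbon (%)",
     main = "Example profile")
```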
ggplot2 (http://ggplot2.org/)
• library(ggplot2)
• Build your plot layer by layer
• Building blocks:
– geom: the geometric object that describes the
type of plot that is produced.
– aes: ‘aesthetics’, defines the visual properties of
the variables that are going to be plotted.
– scales: control how data values are mapped to visual
properties, including the axes and the legend
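A sketch of building a ggplot2 plot layer by layer with the three building blocks above: aes() maps columns to visual properties, geom_point() sets the plot type, and a scale controls the colour legend. The data are made up; the "Set1" palette is just one choice among the ColorBrewer palettes.

```r
library(ggplot2)

soil <- data.frame(
  ph      = c(5.2, 6.1, 7.0, 6.5, 5.8),
  carbon  = c(2.1, 1.6, 0.9, 1.2, 1.9),
  texture = c("clay", "sand", "clay", "loam", "sand")
)

p <- ggplot(soil, aes(x = ph, y = carbon, colour = texture)) +  # data + aesthetics
  geom_point(size = 3) +                                        # geom: points
  scale_colour_brewer(palette = "Set1") +                       # scale: colour legend
  labs(x = "pH", y = "Organic carbon (%)")                      # axis labels
print(p)
```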
Plotting with ggplot
Importing data
• Importing from tables:
– csv: read.csv()
– txt: read.table()
– xlsx: read.xlsx() [requires package ‘xlsx’]
• Data is imported as a data.frame
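A self-contained sketch: it first writes a tiny CSV file to a temporary location (made-up content), then reads it back with read.csv() and read.table().

```r
# Create a small CSV file so the example does not depend on external data
csv_file <- tempfile(fileext = ".csv")
writeLines(c("site,ph,texture",
             "A,5.2,clay",
             "B,6.1,sand"), csv_file)

soil <- read.csv(csv_file)
class(soil)   # "data.frame"
str(soil)

# read.table() reads delimited text too; header and separator
# must then be given explicitly
soil2 <- read.table(csv_file, header = TRUE, sep = ",")
```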
Exporting data
• A variety of export formats for tabular data (e.g. write.csv(),
write.table())
• Saving plots to file (e.g. pdf(), png())
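A minimal sketch of exporting a table and a plot with base R. The data are made up, and the temporary file paths stand in for your own file names. For plots, the pattern is: open a file device, draw, then close it with dev.off().

```r
soil <- data.frame(site = c("A", "B"), ph = c(5.2, 6.1))

# Tabular data: comma-separated and tab-delimited text
out_csv <- tempfile(fileext = ".csv")
write.csv(soil, out_csv, row.names = FALSE)
out_txt <- tempfile(fileext = ".txt")
write.table(soil, out_txt, sep = "\t", row.names = FALSE)

# Plots: open a PDF device, draw, close the device to finish the file
out_pdf <- tempfile(fileext = ".pdf")
pdf(out_pdf, width = 7, height = 5)
plot(soil$ph, type = "h", ylab = "pH")
invisible(dev.off())
```

png() works the same way for raster images; ggplot2 plots can also be saved with ggsave().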
Working with spatial data in R
Spatial Data in R
• R offers a wide variety of packages and tools that
can handle spatial data.
• Note: R is not a GIS.
• R is not very memory efficient: objects are held in memory (RAM).
• Relevant packages:
– sp: handling spatial data
– raster: reading/manipulating/writing spatial raster
data
– rgdal: reading/writing spatial data
– maptools: reading/manipulating/writing spatial
polygon data (not maintained anymore)
Spatial data classes and formats
• Vector: points, lines and polygons (areal).
• For storing data that has discrete boundaries, such as
country borders, land parcels, and streets.
• Format: shapefile
Spatial data classes and formats
• Raster: surface divided into a regular grid of cells.
• For storing data that varies continuously, as in a
satellite image, a surface of chemical concentrations,
or an elevation surface.
• Format: GeoTIFF (allows embedding spatial reference
information, metadata and color legends. It also
supports internal compression)
• Other formats: ASCII grid, ESRI Grid
Structures for spatial data
• In its simplest form, spatial (point) data is a data frame
with columns holding the X and Y coordinates.
• Example:
• Let’s now take a look at R classes for spatial data
(sp package)
Spatial data classes I
• Convert a data frame to a SpatialPointsDataFrame
object with the coordinates function.
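A minimal sketch of the conversion, using the meuse example data set that ships with the sp package (a plain data frame with x and y columns). The formula ~ x + y names the coordinate columns.

```r
library(sp)

data(meuse)        # example data frame shipped with sp
class(meuse)       # "data.frame"

# Assigning the coordinate columns turns it into a SpatialPointsDataFrame
coordinates(meuse) <- ~ x + y
class(meuse)       # "SpatialPointsDataFrame"
bbox(meuse)        # bounding box of the points
```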
Spatial data classes I
• Data frame with data points at regular
intervals.
Spatial data classes II
• Create a grid object with the gridded function:
SpatialPixelsDataFrame
Spatial data classes III
• Create a full grid object with the fullgrid function:
SpatialGridDataFrame
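A sketch of both grid conversions, using the regularly spaced meuse.grid data frame that ships with sp. gridded() keeps only the cells present in the data (SpatialPixelsDataFrame); fullgrid() expands to the complete rectangular grid (SpatialGridDataFrame).

```r
library(sp)

data(meuse.grid)                 # regular grid points as a data frame
coordinates(meuse.grid) <- ~ x + y

# Points at regular intervals -> SpatialPixelsDataFrame
gridded(meuse.grid) <- TRUE
is_pixels <- is(meuse.grid, "SpatialPixelsDataFrame")
class(meuse.grid)

# Full rectangular grid -> SpatialGridDataFrame
fullgrid(meuse.grid) <- TRUE
class(meuse.grid)
```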
Spatial data classes IV
• The sp class for polygon data is the
SpatialPolygonsDataFrame.
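A sketch of building a SpatialPolygonsDataFrame by hand with sp's Polygon, Polygons and SpatialPolygons constructors. The square and its ID "parcel1" are hypothetical. In practice, polygons are usually imported from a shapefile rather than built like this.

```r
library(sp)

# Corner coordinates of a 10 x 10 square; the ring must be closed
# (first point equals last point)
sq <- cbind(x = c(0, 0, 10, 10, 0),
            y = c(0, 10, 10, 0, 0))

# Polygon -> Polygons (with an ID) -> SpatialPolygons
p1      <- Polygon(sq)
ps1     <- Polygons(list(p1), ID = "parcel1")
sp_poly <- SpatialPolygons(list(ps1))

# Attach attribute data; row names must match the Polygons IDs
attr_df <- data.frame(landuse = "cropland", row.names = "parcel1")
spdf    <- SpatialPolygonsDataFrame(sp_poly, attr_df)

spdf@polygons[[1]]@area   # area of the square: 100
```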
Spatial data classes V
• The raster package comes with its own class
for raster data: RasterLayer.
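A small sketch, assuming the raster package is installed: a matrix of made-up values is turned into a RasterLayer. Without explicit extent arguments, raster() places the grid between 0 and 1 in both x and y.

```r
library(raster)

# A 3 x 4 matrix of made-up values converted to a RasterLayer
vals <- matrix(1:12, nrow = 3, ncol = 4)
r <- raster(vals)

class(r)      # "RasterLayer"
dim(r)        # 3 4 1 (rows, columns, layers)
res(r)        # cell size; the default extent runs from 0 to 1
```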
Importing spatial data
• rgdal: readOGR (vector), readGDAL (raster)
• maptools: readShapePoly, readShapePoints,
readShapeLines (functions not maintained anymore)
• raster: raster
Projections
• Once you have loaded your spatial data in R, you have to tell R
its coordinate reference system (projection), if it is not already defined.
• Check the current projection: proj4string
function.
• Setting a projection: CRS function.
• Reprojecting to another coordinate system:
spTransform function.
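A sketch of the three steps on sp's meuse data. It assumes a recent sp version (1.4 or later) with the sf package installed, which recent sp versions use for coordinate transformations. That the meuse coordinates are in the Dutch national grid (taken here as EPSG:28992, RD New) follows sp's documentation of the data set; the EPSG codes are our assumption for illustration.

```r
library(sp)

data(meuse)
coordinates(meuse) <- ~ x + y

# Check the current projection: NA, nothing assigned yet
proj4string(meuse)

# Set the projection (Dutch national grid, assumed EPSG:28992)
proj4string(meuse) <- CRS(SRS_string = "EPSG:28992")

# Reproject to geographic WGS84 longitude/latitude (EPSG:4326)
meuse_ll <- spTransform(meuse, CRS(SRS_string = "EPSG:4326"))
head(coordinates(meuse_ll))
```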
Plotting
• sp package: spplot
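A minimal spplot() call on sp's meuse points, colouring them by the zinc column (topsoil zinc in ppm, according to sp's documentation). spplot() returns a lattice (trellis) object, which print() draws.

```r
library(sp)

data(meuse)
coordinates(meuse) <- ~ x + y

p <- spplot(meuse, "zinc", main = "Zinc concentration (ppm)")
print(p)
```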
ggplot I
ggplot II
ggplot III
Interactive maps
• Interactive maps can be generated with the
leaflet package.
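A minimal sketch, assuming the leaflet package is installed: two made-up locations in WGS84 longitude/latitude are shown as markers on an OpenStreetMap base layer (the default of addTiles()). The site names and coordinates are hypothetical.

```r
library(leaflet)

# Hypothetical sites in longitude/latitude (WGS84)
sites <- data.frame(name = c("Site A", "Site B"),
                    lng  = c(5.66, 5.70),
                    lat  = c(51.97, 51.99))

m <- leaflet(sites)
m <- addTiles(m)                                           # OpenStreetMap tiles
m <- addMarkers(m, lng = ~lng, lat = ~lat, popup = ~name)  # clickable markers

# Display in the RStudio viewer or a web browser
if (interactive()) print(m)
```

Spatial data must be in longitude/latitude (WGS84) for leaflet, so projected data need spTransform() first.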
Exporting spatial data
• rgdal: writeOGR (vector), writeGDAL (raster)
• maptools: writePolyShape, writePointsShape
• raster: writeRaster
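A round-trip sketch with the raster package (assumed installed): a made-up RasterLayer with an explicit extent is written to GeoTIFF with writeRaster() and read back with raster(). The output format is inferred from the .tif extension; integer values are used so they survive the default 32-bit float storage exactly.

```r
library(raster)

# Made-up 3 x 4 raster with an explicit extent (100 m cells)
r <- raster(matrix(1:12, nrow = 3),
            xmn = 0, xmx = 400, ymn = 0, ymx = 300)

# Write to GeoTIFF (format inferred from the file extension)
tif <- tempfile(fileext = ".tif")
writeRaster(r, tif, overwrite = TRUE)

# Read it back and compare the cell values
r2 <- raster(tif)
all(values(r2) == values(r))   # TRUE if the round trip preserved the data
```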
Now let's practice. Have fun!
