Spatial Analysis with R - the Good, the Bad, and the Pretty

This presentation was given to the Davis R Users' Group on May 23, 2013 by Robert Hijmans.

Published in: Education, Technology

  1. 1. The good, the bad & the pretty
     Spatial data analysis with R
     Robert Hijmans
     University of California, Davis
     May 2013
  2. 2. Spatial is special
     • Complex: geometry and attributes
     • Earth is flat? Map projections
     • Size: lots and lots of it, multivariate, time series
     • Special plots: maps
     • First Law of Geography: nearby things are similar
       – Statistical assumptions: violated
       – Interpolation: possible
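The point that interpolation is possible because nearby things are similar can be illustrated with inverse distance weighting (IDW). This is a minimal base-R sketch, not the kriging approach shown later in the talk; the function name `idw`, the `power` parameter, and the four sample points are hypothetical and chosen only for illustration.

```r
# Inverse distance weighting (IDW): a simple interpolator, not kriging.
# xy: two-column matrix of coordinates, z: observed values,
# target: c(x, y) of the location to estimate.
idw <- function(xy, z, target, power = 2) {
  # Euclidean distance from the target to every observation
  d <- sqrt((xy[, 1] - target[1])^2 + (xy[, 2] - target[2])^2)
  # An observation exactly at the target returns its own value
  if (any(d == 0)) return(mean(z[d == 0]))
  w <- 1 / d^power        # nearby observations get larger weights
  sum(w * z) / sum(w)     # weighted mean of the observed values
}

# Hypothetical observations on the corners of a unit square
xy <- cbind(x = c(0, 1, 0, 1), y = c(0, 0, 1, 1))
z  <- c(10, 20, 30, 40)

centre <- idw(xy, z, c(0.5, 0.5))  # equidistant from all points: plain mean, 25
near   <- idw(xy, z, c(0.1, 0.1))  # dominated by the first point (value 10)
```

The estimate near the first observation stays close to that observation's value, which is the "nearby things are similar" assumption written as arithmetic.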
  3. 3. Don't we have GIS for that?
     GIS*
     • Visual interaction
     • Data management
     • Geometric operations
     • Standard workflows
     • Single map production
     • Click, click, click & click
     • Speed of execution
     • Cumbersome
     R
     • Data & model focused
     • Analysis
     • Attributes as important
     • Creativity & innovation
     • Many (simpler) maps
     • Repeatability (single script)
     • Speed of development
     • Easy & powerful (& free)
     * there are many different GISs and they evolve
  4. 4. Geometry of spatial objects ('vector'): points, lines, polygons
     [Figure: example objects drawn on X and Y axes]
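  5. 5. Geometry of spatial field (grid / raster data)
     [Figure: grid of 6 rows (row 1 to row 6) by 5 columns (col 1 to col 5), cells numbered 1 to 30 row by row from the top-left; corners labelled (Xmin, Ymax) and (Xmax, Ymin); dimensions labelled dimX and dimY]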
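The row-wise cell numbering on this slide (6 rows, 5 columns, cells 1 to 30) can be sketched in base R. The helper names and the extent values below are hypothetical; the cell-centre arithmetic assumes a regular grid whose first row is at the top (Ymax).

```r
# Cell numbering for a regular grid, numbered row by row from the top-left.
nrows <- 6
ncols <- 5

# Cell number from (row, col)
cell_from_rowcol <- function(row, col, ncols) (row - 1) * ncols + col

# (row, col) from cell number
rowcol_from_cell <- function(cell, ncols) {
  c(row = (cell - 1) %/% ncols + 1, col = (cell - 1) %% ncols + 1)
}

# Cell-centre coordinates, given a hypothetical extent
xmin <- 0; xmax <- 50; ymin <- 0; ymax <- 60
resx <- (xmax - xmin) / ncols   # cell width
resy <- (ymax - ymin) / nrows   # cell height
xy_from_cell <- function(cell) {
  rc <- rowcol_from_cell(cell, ncols)
  c(x = unname(xmin + (rc["col"] - 0.5) * resx),
    y = unname(ymax - (rc["row"] - 0.5) * resy))  # rows count down from ymax
}

last_cell <- cell_from_rowcol(nrows, ncols, ncols)  # bottom-right cell: 30
rc7 <- rowcol_from_cell(7, ncols)                   # row 2, col 2
xy7 <- xy_from_cell(7)                              # centre of cell 7
```

Because only the extent and the number of rows and columns are stored, the coordinates of any cell can be computed on demand rather than kept in memory.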
  6. 6. MODIS, 22 May, 2013
  7. 7. Representing spatial data
     sp classes:
     • SpatialPointsDataFrame
     • SpatialLinesDataFrame
     • SpatialPolygonsDataFrame
     • SpatialGridDataFrame
     • SpatialPixelsDataFrame
     rgdal: read/write of object (vector) and raster data (shapefiles, GeoTIFF)
     > library(rgdal)
     > city <- readOGR("d:/data", "city")
     > elev <- readGDAL("d:/data/elevation.tif")
  8. 8. Map projections
     Coordinate reference system; class: CRS
     proj4string(city) <- CRS("+proj=lonlat +datum=WGS84")
     cityutm <- spTransform(city, CRS("+proj=utm +zone=51"))
  9. 9. Types of spatial analysis*
     • Query and reasoning: Where is? How much is this here? How to get from A to B?
     • Measurement: Area, Distance, Length, Slope
     • Transformation: Buffering, overlay, interpolation
     • Exploration and description: clusters, trends, spatial dependence, fragmentation
     • Optimization: Site selection, re-districting, traveling salesman
     • Inference: Samples from a population, problem of spatial autocorrelation
     • Modeling: Climate change effects, impact of nuclear accident, dispersal
     * After Michael Goodchild: http://www.csiss.org/aboutus/presentations/files/goodchild_qmss_oct02.pdf
  10. 10. Spatial statistics
      • Point pattern analysis
      • Geostatistics (kriging)
      • Inference (hypothesis testing)
  11. 11. Point patterns
      1. Location of points is of prime interest
      2. Points are not a sample
      3. Points are within a defined study area
      4. Points should be true incidents (not centroids)
  12. 12. Point patterns
      > library(spatstat); library(maptools)
      > cityOwin <- as(city, "owin")
      > pts <- coordinates(crime)
      > p <- ppp(pts[,1], pts[,2], window=cityOwin)
      > s <- smooth.ppp(p)
      > e <- envelope(p)
      http://www.spatstat.org/
  13. 13. Geostatistics
      1. Measurements are of prime interest (not locations)
      2. Points are a sample
      3. Unbiased estimates for locations that were not sampled
      > library(gstat)
      > data(meuse)
      > coordinates(meuse) <- ~x+y
      > spplot(meuse, "zinc")
  14. 14. > x <- krige(log(zinc)~1, meuse, meuse.grid, model = m)
      > spplot(x["var1.pred"], main="ordinary kriging predictions")
      > spplot(x["var1.var"], main = "ordinary kriging variance")
  15. 15. Regression with spatial data
      > f1 <- houseValue ~ age + nBedrooms
      > m <- lm(f1, data=hh)
      > summary(m)
      Call:
      lm(formula = f1, data = hh)
      Residuals:
          Min      1Q  Median      3Q     Max
      -222541  -67489   -6128   60509  217655
      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)  -628578     233217  -2.695  0.00931 **
      age            12695       2480   5.119 4.05e-06 ***
      nBedrooms     191889      76756   2.500  0.01543 *
  16. 16. Analyse the model residuals for spatial autocorrelation (SA), e.g. with Moran's I
  17. 17. > library(spdep)
      > cb <- poly2nb(ca)
      > lw <- nb2listw(cb)
      > plot(ca)
      > plot(lw, coordinates(ca), add=TRUE, col="red")
      > moran.test(residuals, lw)
      Moran's I test under randomisation
      Moran I statistic standard deviate = 2.6926, p-value = 0.003545
      alternative hypothesis: greater
      sample estimates:
      Moran I statistic       Expectation          Variance
            0.158977893      -0.010101010       0.003943149
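To make the Moran's I statistic less of a black box, here is a minimal base-R sketch that computes it by hand on a toy chain of six regions. It is not the spdep implementation and performs no significance test; the function name `morans_i` and the toy data are hypothetical. The weights are row-standardised, which is the default style ("W") of `nb2listw`. Under no spatial autocorrelation the expected value is -1/(n - 1); the Expectation of -0.0101 in the output above is consistent with n = 100 observations.

```r
# Moran's I computed by hand (sketch; no significance test).
# x: numeric vector of values, W: n x n spatial weights matrix.
morans_i <- function(x, W) {
  z <- x - mean(x)                 # deviations from the mean
  n <- length(x)
  (n / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# Toy study area: 6 regions in a chain, each bordering the next one
n <- 6
W <- matrix(0, n, n)
for (i in 1:(n - 1)) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}
W <- W / rowSums(W)                # row-standardise the weights

trend       <- 1:6                     # neighbours are similar
alternating <- c(1, -1, 1, -1, 1, -1)  # neighbours are opposite

I_trend <- morans_i(trend, W)        # positive autocorrelation
I_alt   <- morans_i(alternating, W)  # perfect negative autocorrelation: -1
I_null  <- -1 / (n - 1)              # expectation under no autocorrelation
```

A smooth trend gives a clearly positive value, while the alternating pattern gives -1; regression residuals that look like the first case are the situation the following slides address.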
  18. 18. If SA is ‘significant’ then you could
      • Re-specify your model
      • Permit the coefficients, β, to vary spatially (GWR)
      • Modify the regression model to incorporate the SA
      • Proceed and ignore SA?
  19. 19. OLS: Y = Xβ + e
      Autoregressive model: Y = ρWY + e
      Simultaneous Autoregressive Models:
      SAR-lag: Y = ρWY + Xβ + e (endogenous, inherent spatial autocorrelation, diffusion)
      SAR-err: Y = Xβ + λWu + e (exogenous, induced spatial autocorrelation)
      SAR-mix: Y = ρWY + Xβ + WXγ + e
      CAR
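The SAR-lag equation has Y on both sides, so generating data from it requires solving for Y: Y = (I - ρW)^(-1) (Xβ + e). The base-R sketch below simulates from that reduced form and checks that the result satisfies the structural equation. All values (ρ, β, the five-region chain, the noise level) are hypothetical; fitting such a model is a separate step not shown here.

```r
# Simulate from a SAR-lag model via its reduced form (sketch).
set.seed(1)
n <- 5

# Row-standardised weights for a chain of 5 regions (hypothetical layout)
W <- matrix(0, n, n)
for (i in 1:(n - 1)) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}
W <- W / rowSums(W)

rho  <- 0.5                      # spatial lag coefficient (assumed)
beta <- c(1, 2)                  # intercept and slope (assumed)
X    <- cbind(1, rnorm(n))       # design matrix with one covariate
e    <- rnorm(n, sd = 0.1)       # independent errors

# Reduced form: solve (I - rho * W) y = X beta + e for y
y <- solve(diag(n) - rho * W, X %*% beta + e)

# Check: y satisfies the structural form Y = rho W Y + X beta + e
structural_gap <- max(abs(y - (rho * W %*% y + X %*% beta + e)))
```

With row-standardised weights and |ρ| < 1 the matrix I - ρW is invertible, which is why `solve()` succeeds here.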
  20. 20. raster package
      • new classes (‘S4’) for raster data
      • no file size restrictions
      • file formats: gdal, ncdf, ‘native’
      • > 200 functions
  21. 21. RasterLayer
      > library(raster)
      > x <- raster(ncol=10, nrow=5)
      > x <- raster("volcano.tif")
      > x
      class       : RasterLayer
      dimensions  : 87, 61, 5307  (nrow, ncol, ncell)
      resolution  : 10, 10  (x, y)
      extent      : 2667400, 2668010, 6478700, 6479570  (xmin, xmax, …
      coord. ref. : +proj=nzmg +lat_0=-41 +lon_0=173 +x_0=251
      values      : d:\data\volcano.tif
      min value   : 94
      max value   : 195
  22. 22. RasterLayer
      > str(x)
      Formal class 'RasterLayer' [package "raster"] with 16 slots
        ..@ file      :Formal class '.RasterFile' [package "raster"] with 9 slots
        .. .. ..@ name    : chr "d:\data\volcano.tif"
        .. .. ..@ driver  : chr "gdal"
        ..@ data      :Formal class '.SingleLayerData' [package "raster"] with 11 slots
        .. .. ..@ values  : logi(0)
        .. .. ..@ inmemory: logi FALSE
        .. .. ..@ min     : num 94
        .. .. ..@ max     : num 195
        ..@ extent    :Formal class 'Extent' [package "raster"] with 4 slots
        .. .. ..@ xmin: num 2667400
        .. .. ..@ xmax: num 2668010
        ..@ rotation  :Formal class '.Rotation' [package "raster"] with 2 slots
        .. .. ..@ geotrans: num(0)
        .. .. ..@ transfun:function ()
        ..@ ncols     : int 61
        ..@ nrows     : int 87
        ..@ crs       :Formal class 'CRS' [package "sp"] with 1 slots
        .. .. ..@ projargs: chr " +proj=nzmg +lat_0=-41 +lon_0=173 +x_0=2510000 +y_0=6023150
        ..@ layernames: chr "volcano"
  23. 23. Multiple layers
      RasterStack - many files
      RasterBrick - single file
      > s <- stack(x, x*2, sqrt(x))
      > s
      class       : RasterStack
      dimensions  : 87, 61, 5307, 3  (nrow, ncol, ncell, nlayers)
      resolution  : 0.01639344, 0.01149425  (x, y)
      extent      : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
      coord. ref. : NA
      min values  : 94.0, 188.0, 9.7
      max values  : 195, 390, 14
      layer names : layer.1, layer.2, layer.3
  24. 24. Daily rainfall
      [Map; legend classes: 0, 1 – 10, 11 – 25, 26 – 50, 51 – 100, > 100]
  25. 25. Some functions
      “Low level”:
      • ncell(x)
      • xyFromCell(x, 10)
      • getValues(x, row)
      • adjacent(x, 10)
      • writeRaster(x, filename, …)
      “High level”:
      • merge, crop, project, aggregate, reclass, resample, rasterize, distance, focal …
  26. 26. Raster algebra
      r <- raster(nc=10, nr=10)
      values(r) <- 1:ncell(r)
      q <- sqrt(r)
      x <- (q + r) * 2
      s <- stack(r, q, x)
      ss <- s * r
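Raster algebra works cell by cell. As a rough base-R analogue (not the raster package itself), the sketch below uses a plain matrix in place of a RasterLayer and a 3-D array in place of a stack; the variable names mirror the slide, but the objects are ordinary matrices and arrays, which is an assumption made purely for illustration.

```r
# Base-R analogue of raster algebra: a matrix stands in for a RasterLayer.
r <- matrix(1:100, nrow = 10, ncol = 10, byrow = TRUE)  # values 1..100, row by row
q <- sqrt(r)          # a function applied to every cell
x <- (q + r) * 2      # layers combined cell by cell

# A 3-D array stands in for a stack of three layers
s <- array(c(r, q, x), dim = c(10, 10, 3))

# Multiply every layer of the stack by r, cell by cell
ss <- sweep(s, c(1, 2), r, `*`)
```

The same expressions applied to raster objects also carry the extent and coordinate reference system along, and can process files too large to hold in memory, which plain matrices cannot do.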
  27. 27. Modeling bigfoot (after Hickerson et al., 2008)
      Data from: http://www.bfro.net/news/google_earth.asp
      > elev <- getData("worldclim", var="alt", res=2.5)
      > usa1 <- getData("GADM", country="USA", level=1)
      > ca <- usa1[usa1$NAME_1 == "California", ]
      > bio <- getData("worldclim", var="bio", res=5)
      > library(dismo)
      > bg <- sampleRandom(bio, ext=extent(ca), size=1000)
      > obs <- extract(bio, bigfoot)
      > alt <- crop(elev, ca)
      > alt <- mask(alt, ca)
      > plot(alt)
      > points(bigfoot)
  28. 28. Likelihood of occurrence
      > d <- data.frame(pa=c(rep(1, nrow(obs)), rep(0, nrow(bg))), rbind(obs, bg))
      > library(randomForest)
      > rf <- randomForest(pa~., data=d)
      > pred <- predict(bio, rf)
      > plot(pred)
      > plot(ca, add=TRUE)
      > points(sel2, col="blue", pch=20)
  29. 29. Visualization
      • plot
      • plotRGB
      • contour
      • plot3D
      • …
  30. 30. > library(rasterVis)
      > plot(s, addfun=function() plot(esp, add=TRUE))
  31. 31. > library(rasterVis)
      > alt <- getData("worldclim", var="alt", res=2.5)
      > usa1 <- getData("GADM", country="USA", level=1)
      > ca <- usa1[usa1$NAME_1 == "California", ]
      > alt <- crop(alt, extent(ca) + 0.5)
      > alt <- mask(alt, ca)
      > levelplot(alt, par.settings=GrTheme)
  32. 32. http://www.revolutionanalytics.com/news-events/free-webinars/2012/ggplot2-with-hadley-wickham/
      http://spatialanalysis.co.uk/2012/02/great-maps-ggplot2/
  33. 33. > library(dismo)
      > g <- gmap("Mountain View, CA")
      > plot(g, interpolate=TRUE)
      > xy <- geocode("2600 Casey Ave, Mountain View, CA")
      > points(Mercator(xy[,2:3]), col="red", pch="*", cex=5)
  34. 34. http://flowingdata.com/2011/05/11/how-to-map-connections-with-great-circles/
      > library(geosphere)
      > inter <- gcIntermediate(lonlat1, lonlat2, n=100)
      > lines(inter, col=colors, lwd=lwd)
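For readers curious about what computing intermediate points on a great circle involves, the following base-R sketch interpolates along the shortest path on a sphere. It is an illustration of the geometry under a spherical-Earth assumption, not the geosphere implementation; the function name `gc_points` is hypothetical, and the formula is undefined for identical or exactly antipodal points.

```r
# Intermediate points on a great circle between two lon/lat points (degrees).
# Sketch on a sphere; undefined when p1 == p2 or when the points are antipodal.
gc_points <- function(p1, p2, n = 100) {
  to_xyz <- function(p) {
    lon <- p[1] * pi / 180
    lat <- p[2] * pi / 180
    c(cos(lat) * cos(lon), cos(lat) * sin(lon), sin(lat))  # unit vector
  }
  a <- to_xyz(p1)
  b <- to_xyz(p2)
  d <- acos(max(-1, min(1, sum(a * b))))             # central angle (radians)
  f <- seq(0, 1, length.out = n + 2)[-c(1, n + 2)]   # n interior fractions
  # Spherical linear interpolation between the two unit vectors
  xyz <- (outer(sin((1 - f) * d), a) + outer(sin(f * d), b)) / sin(d)
  cbind(lon = atan2(xyz[, 2], xyz[, 1]) * 180 / pi,
        lat = atan2(xyz[, 3], sqrt(xyz[, 1]^2 + xyz[, 2]^2)) * 180 / pi)
}

# Midpoint along the equator between longitude 0 and longitude 90
mid <- gc_points(c(0, 0), c(90, 0), n = 1)

# Ten points between two hypothetical locations
path <- gc_points(c(-122, 38), c(5, 52), n = 10)
```

The returned matrix has `lon` and `lat` columns, so it can be passed to `lines()` in the same way as the result of `gcIntermediate` on the slide.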
  35. 35. More info: http://cran.r-project.org/web/views/Spatial.html
