Introduction to R software, by Leire Ibaibarriaga

Key lecture for the EURO-BASIN Training Workshop on Introduction to Statistical Modelling for Habitat Model Development, 26-28 Oct, AZTI-Tecnalia, Pasaia, Spain (www.euro-basin.eu)

Published in: Education, Technology

  1. EURO-BASIN, www.euro-basin.eu. Introduction to Statistical Modelling Tools for Habitat Models Development, 26-28th Oct 2011
  2. OUTLINE
     IN THIS TALK
     • What is R?
     • Installation
     • First session in R
     • Working directory
     • Getting help in R
     • Editors and GUIs for R
     • Installing and updating packages
     • Useful packages for habitat modelling
     • Documentation
     FOR BACKGROUND
     • R language: type of objects, functions to manipulate them, …
     • Import/export data
     • Plots in R
     • Linear models, generalised linear models (very introductory)
     • Programming in R
  3. WHAT IS R?
     • R is a language and environment for statistical computing and graphics.
     • R provides a wide variety of statistical and graphical techniques, and is highly extensible.
     • R is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories by John Chambers and colleagues. R is also known as "GNU S".
     • R is completely free and it is available as Free Software under the terms of the Free Software Foundation GNU General Public License in source code form.
     • It compiles and runs on a wide variety of UNIX platforms and similar systems, Windows and MacOS.
     • R is object-oriented.
     • R is mostly command-line driven (although various graphical interfaces have been developed).
     • R has developed rapidly, and has been extended by a large collection of packages.
     • Web page: www.r-project.org
  4. INSTALLATION
     • Sources, binaries and documentation for R can be obtained via CRAN, the "Comprehensive R Archive Network".
     • For Windows: download the binary installer "R-2.13.2-win.exe", double-click on the icon and follow the instructions. The default path is "C:\Program Files\R\R-2.13.1".
  5. FIRST SESSION IN R
     Ways to open a session in R:
     1. If you double-click on the icon, "Rgui" (graphical user interface) will open.
     2. From a system window, execute "Rterm".
     3. Open R from Tinn-R, XEmacs or similar.
  6. WHAT DOES IT LOOK LIKE? [Figure: the R console]
  7. THE R CONSOLE
     • > indicates that R is waiting for a new command.
     • + indicates that the previous command was incomplete and that R continues reading.
     • Different commands are given on different lines, or on the same line separated by ;
     • Comments are written after #. Everything after this symbol is ignored by R.
     • R distinguishes capital and lower case letters. "A" is not the same as "a".
     • Types of commands:
       – Expressions: the command is evaluated and printed on the screen. Nothing is saved. Examples: 3+2 or sum(3,2)
       – Assignments: the command is evaluated and saved as an object using <- and nothing is printed. Type the name of the object or use the function print() to see it. Examples: a <- 3+2 then a or print(a)
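The difference between expressions and assignments can be sketched as a short console session. This is only an illustration; the object names `a` and `b` are arbitrary.

```r
# An expression is evaluated and printed; nothing is stored.
3 + 2

# An assignment stores the result and prints nothing.
a <- 3 + 2

# Typing the name (or calling print) shows the stored value.
a
print(a)

# Two commands on one line, separated by a semicolon.
b <- a * 2; b

# R is case-sensitive: "A" is a different name from "a".
exists("a")
exists("A")
```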
  8. WORKING DIRECTORY
     • To know the working directory of the current session type: getwd()
     • To change the working directory: setwd(whatever), where whatever is the path given as a character string
     • Alternatively, execute R from the directory using a shortcut:
       – Create a directory
       – Right-click the R icon, go to "Properties" and, under "Shortcut", enter the directory path in "Start in"
     • If we use Tinn-R, the default working directory is the one in which the R script is saved.
     • To save the current workspace use save.image(). By default the workspace will be called ".RData". We can specify a name using save.image("myworkspace.RData")
     • To quit an R session: q()
     • Be careful: in Windows the backslash must be doubled or replaced by a forward slash, so paths should be given either as setwd("C:\\tmp\\Rcourse") or as setwd("C:/tmp/Rcourse")
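A minimal sketch of these commands used together. It assumes the session's temporary directory (tempdir()) as a stand-in for a real course directory, and the object `x` is only there so that the workspace is not empty.

```r
# Remember where the session currently points.
old_dir <- getwd()

# Move to a scratch directory; tempdir() always exists during a session.
setwd(tempdir())
getwd()

# Save all objects of the workspace under a chosen file name.
x <- 1:3
save.image("myworkspace.RData")
saved <- file.exists("myworkspace.RData")
saved

# Return to the original directory.
setwd(old_dir)
```

The saved workspace can later be restored with load("myworkspace.RData") from the same directory.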
  9. GETTING HELP
     • ?sum
     • help(sum)
     • help("+")
     • help.start()
     • ?help.search
     • help.search("linear models")
     • ?apropos
     • apropos("lm")
     • ?demo
     • demo(graphics); demo(persp)
  10. EDITORS
      • It is useful to have a text editor in which to keep the code scripts (commented, ordered, etc.).
      • Desirable properties for the text editor: syntax highlighting, parenthesis matching, and the ability to send code directly to R without copy-paste.
      • R for Windows has a small text editor (File/Open/New script). It links directly with R (select code and press Ctrl + R) but does not offer syntax highlighting, etc.
      • I use Tinn-R (only for Windows): http://sourceforge.net/projects/tinn-r
      • It requires R to run in SDI mode and the packages R2HTML and SciViews to be installed. The file Rprofile.site might need to be changed.
      • Other alternatives: Emacs/ESS, RStudio, Vim, jEdit, JGR, Eclipse, …
      • See a complete list at: http://sciviews.org/_rgui/projects/Editors.html
  11. GUIs FOR R
      • The R command line interface (CLI) is powerful because it allows direct control of calculations and it is flexible. However, good knowledge of the language is required. The CLI is intimidating for beginners. The learning curve is typically longer than with a graphical user interface (GUI), although it is recognized that the effort is profitable and leads to better practice.
      • Several projects are developing alternative user interfaces. See ongoing projects: http://sciviews.org/_rgui/
      • An example: Rcmdr http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/
  12. PACKAGES
      • All R functions and datasets are stored in packages. Only when a package is loaded are its contents available.
      • To see which packages are installed at your site, issue the command: library()
      • To load a particular package called "mgcv" use a command like: library(mgcv)
      • To see which packages are currently loaded, use: search()
      • Help for a specific package: library(help="mgcv") or help("mgcv-package")
      • Detach the loaded package: detach("package:mgcv")
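The load/inspect/detach cycle can be tried with any installed package. The sketch below uses `splines` instead of `mgcv`, purely as an assumption that makes it run on any installation, since `splines` ships with R itself.

```r
# Packages currently attached to the search path.
search()

# Attach a package that is part of every R installation.
library(splines)
attached <- "package:splines" %in% search()
attached

# Detach it again when it is no longer needed.
detach("package:splines")
still_attached <- "package:splines" %in% search()
still_attached
```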
  13. PACKAGES
      • The standard (or base) packages are considered part of the R source code. They should be automatically available in any R installation.
      • There are thousands of contributed packages for R, written by many different authors. Some of these packages implement specialized statistical methods, others give access to data or hardware, and others are designed to complement textbooks. Some (the recommended packages) are distributed with every binary distribution of R. The remaining packages should be downloaded individually from CRAN.
      • The packages in CRAN can be installed and updated in two ways:
        – From the Rgui menu
        – From the R console using the commands: install.packages("R2WinBUGS") and update.packages(oldPkgs="R2WinBUGS")
  14. USEFUL PACKAGES
      • http://cran.r-project.org/web/views/Environmetrics.html
      • http://cran.r-project.org/web/views/MachineLearning.html
      • http://cran.r-project.org/web/views/Spatial.html
  15. DOCUMENTATION
      • R manuals
      • FAQs, Wiki, …
      • Reference cards
      • R News
      • R Journal
      • A lot of material on the web, e.g.:
        – The R Graph Gallery: http://addictedtor.free.fr/graphiques/
        – R bloggers: http://www.r-bloggers.com/
  16. OBJECTS
      • (Almost) everything in R is an object.
      • Objects are the entities that are created and saved in an R session.
      • They can be numbers, characters, functions, vectors, matrices, etc.
      • ls() or objects() shows the objects created.
      • rm(a) removes an object called "a".
  17. TYPES OF OBJECTS
      • Vector: one-dimensional collection of elements of the same type (numbers, TRUE/FALSE, characters, …)
      • Matrix: two-dimensional collection of elements of the same type
      • Array: multidimensional collection of elements of the same type
      • Data frame: like a matrix, but allowing each column to be of a different type
      • Function: code
      • Factor: categorical vector
      • List: a generalised vector. Each component can be of a different type and can in turn have its own components.
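As a rough illustration, one object of each type can be created and inspected as follows. All object names and values here are made up for the example.

```r
v   <- c(1.5, 2, 3)                                   # vector
m   <- matrix(1:6, nrow = 2)                          # matrix
arr <- array(1:24, dim = c(2, 3, 4))                  # array
df  <- data.frame(id = 1:3, name = c("a", "b", "c"))  # data frame
f   <- factor(c("low", "high", "low"))                # factor
l   <- list(numbers = v, table = df)                  # list
sq  <- function(x) x^2                                # function

# class() reports the type of each object; str() shows its structure.
class(v); class(m); class(arr); class(df); class(f); class(l); class(sq)
str(l)
```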
  18. NUMERIC VECTORS
      • ?c ; ?rep ; ?seq
      • a <- c(1,2,3) # c: concatenate
      • c(a,0,9,a)
      • rep(1,3) # rep: repeat
      • rep(a, each=3); rep(x=a, each=3)
      • rep(a, times=3)
      • rep(c(1,2), times=c(4,5))
      • 1:7
      • 7:1
      • 2*1:5
      • n <- 10; (1:n)-1; 1:n-1
      • seq(-2,3,by=4) # seq: sequence
      • seq(-2,3,length=4)
      • seq(5)
  19. ARITHMETIC OPERATORS
      • + , - , * , / , ^ , %% , %/%
      • log( ), exp( ), sqrt( ),
      • sin( ), cos( ), tan( ), abs( )
      • ?Arithmetic
      • x <- 1:3
      • 2*x
      • 2^3
      • (3*2)^2/log(4)
      • sqrt(-1)
      • log(10); log(10, base=10)
  20. LOGICAL VECTORS
      • A logical value can take the value TRUE (T), FALSE (F) or NA (not available)
      • a <- c(TRUE,FALSE,NA)
      • b <- c(T,F,NA)
      • rep(a,3)
  21. LOGICAL OPERATORS
      • < , > , == , >= , <= , & , | , !
      • ?Logic
      • ?Comparison
      • x <- c(-3,0,6)
      • x > 0
      • x >= 0
      • x < 0
      • x <= 0
      • x == 0
      • !x == 0
      • x < 2 & x > -1 ; x < 2 | x > -1
      • any(x < 2) ; any(x > -10) ; all(x < 2); all(x > -10)
  22. MISSING VALUES
      • NA (not available)
        – x <- c(1, 2, NA, 4)
        – is.na(x)
        – sum(x); sum(x, na.rm=T)
      • NaN (not a number)
        – 0/0; Inf - Inf
        – 3/0
        – x <- c(3, NA, NaN)
        – is.na(x); is.nan(x)
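A short sketch of how NA and NaN behave in practice. The vectors are the ones from the slide; the names `n_missing`, `na_flags` and `nan_flags` are only illustrative.

```r
x <- c(1, 2, NA, 4)

# Any summary of a vector containing NA is itself NA ...
mean(x)

# ... unless the missing values are explicitly removed.
mean(x, na.rm = TRUE)

# Count the missing values.
n_missing <- sum(is.na(x))
n_missing

# is.na() flags both NA and NaN; is.nan() flags only NaN.
y <- c(3, NA, NaN)
na_flags  <- is.na(y)
nan_flags <- is.nan(y)
na_flags
nan_flags
```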
  23. CHARACTER VECTORS
      • Character strings are delimited by double quotes " "
      • \n new line, \t tab, \b backspace
      • c("h","o","l","a")
      • paste("h","o","l","a")
      • paste("h","o","l","a", sep="")
      • paste("x",1:3, sep="")
      • nchar("hola")
      • substring("hola", 1:4, 1:4)
  24. ACCESSING PART OF A VECTOR
      • Namevector[index]
      • x <- seq(-1,7)
      • y <- x <= 5
      • x[1]; x[c(1,6)]; x[1:4]; x[c(2, 5:6)]
      • x[y]; x[!y]; x[x > 0]
      • x[-1]; x[-(3:4)]
      • x[7] <- 0; x[3:4] <- c(11,9)
      • x[y] <- NA
      • is.na(x)
      • x[is.na(x)] <- 0
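The three ways of indexing (by position, by exclusion and by logical condition) can be combined as in this small sketch. The names `first_two`, `without_first` and `positive` are hypothetical.

```r
x <- seq(-1, 7)

# By position: positive indices select elements.
first_two <- x[1:2]

# By exclusion: negative indices drop elements.
without_first <- x[-1]

# By condition: a logical vector keeps the elements where it is TRUE.
positive <- x[x > 0]

# Indexing also works on the left-hand side of an assignment.
x[x < 0] <- NA       # mark negative values as missing
x[is.na(x)] <- 0     # then replace the missing values by zero
x
```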
  25. FACTORS
      • ?factor
      • x <- c(rep("blue",2), "green", rep("red",4))
      • x
      • x <- factor(x)
      • x
      • z <- factor(substring("hola",1:4,1:4), levels=letters)
      • z
      • y <- factor(1:4)
      • y
  26. MATRICES
      • ?matrix
      • matrix(1:20, ncol=4, nrow=5, byrow=T)
      • a <- matrix(1:20, ncol=4, nrow=5, byrow=T)
      • dim(a); nrow(a); ncol(a);
      • a[1,4]
      • a[2,]
      • a[,3]
      • t(a) # transpose
      • cbind(1, c(3,2), c(4,7)) # column combine
      • rbind(1, c(3,2), c(4,7)) # row combine
  27. LISTS
      • ?list
      • a <- list(country="China", measurements=c(34,38,32), station=34)
      • a$country; a$measurements; a$station
      • a[1]
      • a[[1]]
      • a[[2]][1]
      • names(a)
      • length(a)
      • dim(a)
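The difference between single and double brackets is easy to miss, so here is a brief sketch using the list from the slide. The component `depth` added at the end is hypothetical.

```r
a <- list(country = "China", measurements = c(34, 38, 32), station = 34)

# Single brackets return a sub-list (still a list, of length 1).
sub <- a[1]
is.list(sub)

# Double brackets (or $) return the component itself.
val <- a[[1]]
is.character(val)

# Components can be indexed further once extracted.
second <- a[["measurements"]][2]
second

# New components are added by assignment.
a$depth <- 120
names(a)
```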
  28. DATA FRAME
      • a <- data.frame(Long=rep(c(-3:-1), rep(11, 3)), Lat=rep(seq(43,48, by=0.5),3))
      • names(a)
      • dim(a)
      • a$Long; a$Lat
      • a[1,]; a[,1]
      • a[a$Long==-1,]
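A sketch of the same longitude/latitude grid, followed by a subset on two conditions. The name `south` and the thresholds used are only illustrative.

```r
# 3 longitudes x 11 latitudes = 33 rows.
a <- data.frame(Long = rep(-3:-1, each = 11),
                Lat  = rep(seq(43, 48, by = 0.5), times = 3))
dim(a)
str(a)

# Rows are selected with a logical condition, columns are left empty
# to keep all of them.
south <- a[a$Long == -1 & a$Lat < 44, ]
south
```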
  29. FUNCTIONS
      • Examples: ls(); a <- sum(1:6); rm(a)
      • General structure of a function call: name(arg1, arg2, arg3)
      • The arguments can be given in order, name(arg1, arg2, arg3), or by name, name(arg2=x2, arg3=x3, arg1=x1)
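Passing arguments by position or by name gives the same result, as this sketch with seq() shows. The names `by_position`, `by_name` and `reordered` are arbitrary.

```r
# Arguments matched by position: from, to, by.
by_position <- seq(0, 1, 0.25)

# The same call with every argument named.
by_name <- seq(from = 0, to = 1, by = 0.25)

# Named arguments can be given in any order.
reordered <- seq(by = 0.25, to = 1, from = 0)

identical(by_position, by_name)
identical(by_name, reordered)
```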
  30. READ R DATA FILES
      • data()
      • data(package="datasets")
      • ls()
      • data(trees)
      • ?trees
      • ls()
      • names(trees)
  31. READ DATA FROM FILES
      • ?read.table
      • A <- read.table("datos.txt", header=T)
      • A <- read.table("C:/use/datos.txt", header=T)
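Since the file "datos.txt" is not provided here, the following sketch first writes a small, made-up table to the session's temporary directory and then reads it back. The column names and values are invented for the example.

```r
# Hypothetical data, written to a temporary text file.
datos <- data.frame(station = 1:3, temp = c(14.2, 15.1, 13.8))
path  <- file.path(tempdir(), "datos.txt")
write.table(datos, path, row.names = FALSE, quote = FALSE)

# header = TRUE tells read.table that the first line holds column names.
A <- read.table(path, header = TRUE)
str(A)
```

write.table() is the export counterpart of read.table(); read.csv() and write.csv() are convenience versions for comma-separated files.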
  32. BASIC STATISTICS
      • sum, mean, median, var, sd, quantile, min, max, range, sort, unique, summary
      • data(iris)
      • summary(iris)
      • mean(iris$Sepal.Length)
      • quantile(iris$Sepal.Length, 0.25)
      • quantile(iris$Sepal.Length, seq(0,1,0.25))
      • table(iris$Species) # contingency table
      • tapply(iris$Sepal.Length, iris$Species, mean)
      • cor(iris[,1:4]) # correlation matrix of the numeric columns
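A brief sketch of summaries by group on the iris data. Note that cor() only accepts numeric columns, so the factor Species is left out; aggregate() is shown as one possible alternative to tapply() and is not part of the original slide.

```r
data(iris)

# Correlation matrix of the four numeric columns.
cor_mat <- cor(iris[, 1:4])
round(cor_mat, 2)

# Mean sepal length per species, as a named vector.
by_species <- tapply(iris$Sepal.Length, iris$Species, mean)
by_species

# The same summary returned as a data frame.
aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
```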
  33. HISTOGRAMS
      • ?hist; ?barplot
      • a <- rnorm(1000, 0, 1)
      • hist(a)
      • hist(a, breaks=10)
      • hist(a, breaks=seq(-6,6))
      • hist(a, breaks=10, prob=T)
      • hist(a, breaks=10, prob=T, labels=T)
      • hist(a, col=3)
      • hist(a, border=4)
      • b <- c(3,2,4,7,1,9)
      • barplot(b)
  34. STEM
      • ?stem
      • a <- rnorm(1000, 0, 1)
      • stem(a)
      • stem(a, scale=2)
  35. BOXPLOTS
      • ?boxplot
      • b <- c(rep("A", 100), rep("B", 100), rep("C", 100), rep("D", 100), rep("E", 100))
      • a <- rnorm(500)
      • datos <- data.frame(a=a, b=b)
      • boxplot(datos$a)
      • boxplot(a ~ b, data=datos)
      • boxplot(datos$a, notch=T)
      • boxplot(datos$a, notch=T, col=2, border=4)
      • boxplot(a ~ b, data=datos, col=1:5)
  36. BOXPLOTS
      • data(iris)
      • names(iris)
      • boxplot(iris$Petal.Length)
      • boxplot(iris$Petal.Length, notch=T)
      • boxplot(iris$Petal.Length, notch=T, col=2, border=4)
      • boxplot(Petal.Length ~ Species, data=iris)
      • boxplot(Petal.Length ~ Species, data=iris, col=2:4)
      • boxplot(iris[,1:4], col=2:5)
  37. QQPLOTS
      • ?qqplot
      • a <- rnorm(1000, 0, 1)
      • qqnorm(a)
      • qqline(a, col=2)
      • data(precip)
      • qqnorm(precip, ylab="Precipitation [in/yr] for 70 US cities", col=2)
  38. CONDITIONED PLOTS
      • ?plot.default; ?pairs; ?coplot
      • pairs(iris)
      • pairs(iris[, 1:4])
      • pairs(iris[, 1:4], panel=panel.smooth, main="Iris data")
      • coplot(Petal.Width ~ Petal.Length | Species, data=iris, rows=1)
      • coplot(Petal.Width ~ Petal.Length | Sepal.Length, data=iris, rows=1)
      • coplot(Petal.Width ~ Petal.Length | Sepal.Length, given.values=co.intervals(iris$Sepal.Length, 3), data=iris, rows=1)
  39. PLOT
      • ?plot; ?plot.default
      • a <- rnorm(1000, 0, 1)
      • plot(a)
      • plot(a, type="l")
      • plot(a, type="h")
      • plot(a, type="b")
      • plot(a, col=2); plot(a, col="red")
      • plot(a, pch="*")
      • plot(a, pch=2)
      • plot(a, pch=3, cex=0.6)
      • plot(a, pch=3, cex=0.6, col=6, xlab=" ", ylab="a", main="Residuals")
  40. PLOT
      • data(cars)
      • plot(cars)
      • plot(cars$speed, cars$dist)
      • plot(cars$dist, cars$speed)
      • plot(cars, type="p", pch=2, col=3, xlab="Velocity", ylab="Distance", main="About cars", xlim=c(10,20), ylim=c(0,80))
      • ?par
  41. INTERACTING WITH FIGURES
      • ?identify
      • ?locator
      • plot(cars)
      • identify(cars, labels=cars$speed)
      • locator(1)
      • locator(10)
  42. ADD LINES AND POINTS
      • ?lines; ?points; ?abline; ?text
      • plot(cars, type="n")
      • points(cars$speed[1:10], cars$dist[1:10], col=6)
      • lines(cars)
      • lines(cars$speed[1:10], cars$dist[1:10], col=4)
      • lines(lowess(cars))
      • abline(v=10)
      • abline(h=40)
      • abline(a=0, b=1)
      • text(cars$speed, cars$dist, labels=cars$dist)
  43. ADD LEGENDS
      • ?legend
      • boxplot(iris$Sepal.Length ~ iris$Species, col=2:4)
      • legend(1, 8, levels(iris$Species), fill=2:4)
      • boxplot(iris$Sepal.Length ~ iris$Species, col=2:4)
      • loc <- locator(1)
      • legend(loc, levels(iris$Species), fill=2:4)
  44. LINEAR REGRESSION
      • ?lm
      • mod.lm <- lm(Petal.Width ~ Petal.Length, data=iris)
      • mod2.lm <- lm(Petal.Width ~ Petal.Length - 1, data=iris)
      • mod3.lm <- lm(Petal.Width ~ Petal.Length + Species, data=iris)
      • mod4.lm <- lm(Petal.Width ~ Petal.Length * Species, data=iris)
      • mod5.lm <- lm(Petal.Width ~ Petal.Length, subset=Species=="setosa", data=iris)
  45. LINEAR REGRESSION
      • mod.lm
      • summary(mod.lm)
      • coef(mod.lm)
      • residuals(mod.lm)
      • predict(mod.lm)
      • names(mod.lm)
      • plot(iris$Petal.Length, iris$Petal.Width, type="n")
      • points(iris$Petal.Length[iris$Species=="setosa"], iris$Petal.Width[iris$Species=="setosa"], col=2)
      • abline(coef(mod.lm))
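A minimal sketch of fitting a model, predicting at new values and comparing two nested models. The data frame `new` and its values are made up, and the anova() comparison is an addition that does not appear on the slides.

```r
data(iris)

# Simple regression of petal width on petal length.
mod.lm <- lm(Petal.Width ~ Petal.Length, data = iris)
coef(mod.lm)

# Predictions need a data frame whose column names match the predictors.
new  <- data.frame(Petal.Length = c(2, 4, 6))
pred <- predict(mod.lm, newdata = new)
pred

# Does adding Species improve the fit? F-test between nested models.
mod3.lm <- lm(Petal.Width ~ Petal.Length + Species, data = iris)
anova(mod.lm, mod3.lm)
```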
  46. R PROGRAMMING
      • if (condition) { instructions }
      • if (condition) { instructions } else { instructions }
      • while (condition) { instructions }
      • for (i in index) { instructions }
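The control structures above can be sketched with a toy example. The keywords must be written in lower case because R is case-sensitive; the vector and the variable names are arbitrary.

```r
x <- c(-3, 0, 6)
signs <- character(length(x))

# for loops over an index; if/else chooses between two branches.
for (i in seq_along(x)) {
  if (x[i] > 0) {
    signs[i] <- "positive"
  } else {
    signs[i] <- "not positive"
  }
}
signs

# while repeats as long as the condition holds:
# find the smallest n such that 2^n is at least 100.
n <- 0
while (2^n < 100) {
  n <- n + 1
}
n
```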
  47. OWN FUNCTIONS
      • Open a text editor to correct/create functions: fix(name)
      • Structure:
        function(arg1, arg2, arg3) {
          instructions
          return(result)
        }
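As an illustration of this structure, here is a small user-defined function. The function `cv` (coefficient of variation) and its argument `na.rm` are hypothetical choices for the example.

```r
# Coefficient of variation: standard deviation divided by the mean.
# na.rm has a default value, so it can be omitted when calling cv().
cv <- function(x, na.rm = FALSE) {
  result <- sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
  return(result)
}

cv(c(2, 4, 6))
cv(c(2, 4, 6, NA), na.rm = TRUE)
```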