Introducing R

What is R and what can I do with it?
R is freely available software for Windows, Mac OS and Linux
To download R in New Zealand: http://cran.stat.auckland.ac.nz/
What is R?
A very simple programming language
A place for you to input data
A collection of tools for you to perform calculations
A tool for producing graphics
A statistics suite that can be downloaded on to any PC, Mac or Linux system
A software package that can run on high performance computing clusters

With R you can:
Perform simple or advanced statistical tests and analyses
e.g. standard deviation, t-test, principal component analysis
Read and manipulate data from existing files
e.g. tables in Excel files, trees in nexus files, data on websites
Write data or figures to files
e.g. export a figure to .pdf, export a .csv file
Produce simple or advanced figures

http://dx.doi.org/10.1098/rspb.2014.0806
Figure 2. A reconstruction of the evolutionary history of
carotenoid pigmentation in feathers. The likelihood that
ancestors could display carotenoid feather pigments has
been reconstructed using ‘hidden’ transition rates in three
rate categories (AIC = 4002.5, 11 transition rates) [33]. The
POEs (defined in Material and methods) for carotenoid
feather pigmentation are identified by red circles.
Branches are coloured according to the proportional
likelihood of carotenoid-consistent colours at the
preceding node. Solid purple points indicate species for
which carotenoid feather pigments were confirmed
present from chemical analysis; open black points
represent those for which carotenoids were not
detected in feathers after chemical analysis. Supertree
phylogeny from [21].

Who is this guide for?
Starting at ground level and shaping you into a confident R user
Are you…
Completely new to R?
An infrequent R user who wants a refresher?
The material in these slides may not be useful for confident R users.
An Introduction to R
W. N. Venables, D. M. Smith and the R Core Team
http://cran.r-project.org/doc/manuals/R-intro.pdf

What does this guide cover?
Part zero: Getting started
Interacting with R
Part one: Objects
Vectors, Matrices, Character arrays
Part two: Data manipulation
Analysing data, T-test
Part three: External data
Reading data into R, ANOVA
Part four: Packages and libraries
Installing new packages into R
Part five: Scripts
Using pre-written code
Part six: Logic (programming)
Other functions in R

Starting
This guide will demonstrate the R Console (command-line input) for R 3.0.2 running in Windows 7.
For Mac OS, R can be executed from terminal. For Unix, seek professional help…
The only point of difference should be the initial starting of R and the visual appearance:
Console commands will be the same for all operating systems.

#Throughout this guide a hash symbol (i.e. number sign ‘#’) will identify a
comment or instruction
#Start R by finding the R application on your computer
#You will be presented with the R console

#There are a variety of ways of using R, and we will start out with the most basic
#We are going to enter lines of code into R by typing or pasting them into the R
console
#At its most basic, R is just a calculator
> 1+1
[1] 2
> 1*3
[1] 3
> 4-7
[1] -3
> 20/4
[1] 5
>
#The lines above this have come from the R Console. Remember to remove
the > symbol if you copy text directly from these slides and paste it into R

#Some more basic mathematical operations in R
> 12--2
[1] 14
> 2^2
[1] 4
> sqrt(9)
[1] 3
> 4*(1+2)
[1] 12

Part zero: Exercise
#Use R to find the length of the hypotenuse in the triangle shown below
#Side a has length 3, Side b has length 4, and the hypotenuse has length h
h^2 = a^2 + b^2
h = √(a^2 + b^2)
[Figure: right-angled triangle with sides a = 3 and b = 4 and hypotenuse h]

Part zero: Exercise
#Use R to find the length of the hypotenuse in the triangle shown below
> sqrt(3^2+4^2)
[1] 5
[Figure: right-angled triangle with sides 3 and 4 and hypotenuse h]

Part one: Objects
#R is more than just a basic calculator…
#Most operations in R will use objects, which are values stored in R
#Type x=1 into the R console
#You have now input a number into R by storing that number as an object. For this
example, the name of our object is x
#Object names should start with a letter, and can contain letters, numbers, full stops (.) and underscores (_)
#Object names cannot include spaces
> x=1
>
#Congratulations, you have just programmed R to store an object.
#Type x into the R console to recall your object
> x
[1] 1
>
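#A minimal sketch of the naming rules above. The object names body_mass, sample.2 and total are made up for illustration. The arrow <- is another way of storing an object and works like =

```r
# Valid names: start with a letter; may contain letters, numbers, . and _
body_mass <- 12.5     # <- is the assignment arrow; it works like =
sample.2 = 3          # = does the same job
total <- body_mass * sample.2
total                 # type the name to recall the stored value
```

#R is case-sensitive, so Total and total would be two different objects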

Part one: Objects
#We will now replace the value of x with 10
> x
[1] 1
> x=10
> x
[1] 10
>
#As you can see, the value of an object can be easily replaced by simply making
the object equal to a new value

Part one: Objects
#Let’s make y into a vector - a one dimensional array
#There are several ways of making a vector in R. These methods introduce
functions.
#A function is an operation performed on numbers and/or objects.
#The two easiest ways of making a vector in R use different functions:
#Use the concatenate function c and place numbers inside parentheses
> y=c(10,11,12,13,14,15,16,17,18,19,20)
> y
[1] 10 11 12 13 14 15 16 17 18 19 20
#Use the array function and place numbers inside parentheses
> y=array(10:20)
> y
[1] 10 11 12 13 14 15 16 17 18 19 20
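#Two more ways of making a vector, shown as a short sketch. The object name evens is made up for illustration

```r
# The colon operator on its own already returns a vector
y <- 10:20
# seq() builds a sequence with a chosen step size
evens <- seq(10, 20, by = 2)
# length() reports how many values a vector holds
length(y)
length(evens)
```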

Part one: Objects
#Just as we replaced x with a single value, we can also replace a single value
within our vector
#Let’s replace the fifth number in our vector with 0
> y
[1] 10 11 12 13 14 15 16 17 18 19 20
> y[5]=0
> y
[1] 10 11 12 13 0 15 16 17 18 19 20
>
#Square brackets [] placed after a vector will instruct R that we are interested in
only a part of the vector. In the example above, we are referring to the fifth
position in the vector

Part one: Objects
#Try these vector manipulations as well:
> y[1]=y[2]
> y
[1] 11 11 12 13 0 15 16 17 18 19 20
>
#The value of the first position was changed to be the same as the value in the
second position
> y[c(1,3,5)]=5
> y
[1] 5 11 5 13 5 15 16 17 18 19 20
>
#The values in the first, third and fifth positions were made equal to 5
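#Square brackets can also be used to read values without replacing them. A short sketch, starting from the vector as it stands at the end of this slide

```r
y <- c(5, 11, 5, 13, 5, 15, 16, 17, 18, 19, 20)
y[2]         # the second value
y[2:4]       # the second to fourth values
y[-1]        # everything except the first value
y[y > 15]    # only the values greater than 15
```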

Part one: Objects
#Onward! We will make a new object, a two-dimensional matrix, and call it z
#Our matrix will have ten rows and ten columns, and we will start out by filling
all the cells with 0
> z=matrix(0,ncol=10,nrow=10)
> z
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 0 0 0 0 0 0 0 0 0
[2,] 0 0 0 0 0 0 0 0 0 0
[3,] 0 0 0 0 0 0 0 0 0 0
[4,] 0 0 0 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 0 0 0 0
[6,] 0 0 0 0 0 0 0 0 0 0
[7,] 0 0 0 0 0 0 0 0 0 0
[8,] 0 0 0 0 0 0 0 0 0 0
[9,] 0 0 0 0 0 0 0 0 0 0
[10,] 0 0 0 0 0 0 0 0 0 0
>

Part one: Objects
#We can replace parts of our matrix, like we did with our vector
> z
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 0 0 0 0 0 0 0 0 0
[2,] 0 0 0 0 0 0 0 0 0 0
[3,] 0 0 0 0 0 0 0 0 0 0
[4,] 0 0 0 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 0 0 0 0
[6,] 0 0 0 0 0 0 0 0 0 0
[7,] 0 0 0 0 0 0 0 0 0 0
[8,] 0 0 0 0 0 0 0 0 0 0
[9,] 0 0 0 0 0 0 0 0 0 0
[10,] 0 0 0 0 0 0 0 0 0 0
> z[1,3]=33
> z
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 0 33 0 0 0 0 0 0 0
[2,] 0 0 0 0 0 0 0 0 0 0
[3,] 0 0 0 0 0 0 0 0 0 0
[4,] 0 0 0 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 0 0 0 0
[6,] 0 0 0 0 0 0 0 0 0 0
[7,] 0 0 0 0 0 0 0 0 0 0
[8,] 0 0 0 0 0 0 0 0 0 0
[9,] 0 0 0 0 0 0 0 0 0 0
[10,] 0 0 0 0 0 0 0 0 0 0
#Here, the two numbers inside the square brackets are a coordinate for the matrix:
first row, third column

Part one: Objects
#We can replace an entire row by not providing a column coordinate
> z[1,]=33
> z
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 33 33 33 33 33 33 33 33 33 33
[2,] 0 0 0 0 0 0 0 0 0 0
[3,] 0 0 0 0 0 0 0 0 0 0
[4,] 0 0 0 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 0 0 0 0
[6,] 0 0 0 0 0 0 0 0 0 0
[7,] 0 0 0 0 0 0 0 0 0 0
[8,] 0 0 0 0 0 0 0 0 0 0
[9,] 0 0 0 0 0 0 0 0 0 0
[10,] 0 0 0 0 0 0 0 0 0 0
>
#Likewise, we can replace an entire column
> z[,3]=c(1:10)
> z
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 33 33 1 33 33 33 33 33 33 33
[2,] 0 0 2 0 0 0 0 0 0 0
[3,] 0 0 3 0 0 0 0 0 0 0
[4,] 0 0 4 0 0 0 0 0 0 0
[5,] 0 0 5 0 0 0 0 0 0 0
[6,] 0 0 6 0 0 0 0 0 0 0
[7,] 0 0 7 0 0 0 0 0 0 0
[8,] 0 0 8 0 0 0 0 0 0 0
[9,] 0 0 9 0 0 0 0 0 0 0
[10,] 0 0 10 0 0 0 0 0 0 0
>
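#The same coordinates can be used to read from a matrix. A short sketch that rebuilds z and then looks at parts of it

```r
z <- matrix(0, nrow = 10, ncol = 10)
z[1, ] <- 33       # fill the first row
z[, 3] <- 1:10     # fill the third column
dim(z)             # number of rows, then number of columns
z[1, 3]            # a single cell: first row, third column
z[2:3, 3]          # rows two and three of column three
```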

Part one: Objects
#Lastly, we will make a character array, which is like a vector or a matrix except
that it stores text (character strings). Any numbers placed in it are stored as text
> w=matrix("df",ncol=10,nrow=10)
> w
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df"
>
#So, this covers the basics of creating objects for storing data in R.
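#A short sketch of what happens when a number is placed into a character matrix: R stores it as text, so it must be converted before doing arithmetic

```r
w <- matrix("df", ncol = 10, nrow = 10)
class(w[1, 1])             # "character"
w[1, 1] <- 7               # the number is stored as the text "7"
w[1, 1]
# Convert back to a number before doing arithmetic
as.numeric(w[1, 1]) + 1
```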

Part one: Objects
#Let’s clean out the objects that we made in Part One
> ls()
[1] "w" "x" "y" "z"
>
#The list objects command ls() will show us which objects are stored in R
#We can permanently remove a specific object with the rm() function
> rm(x)
> ls()
[1] "w" "y" "z"
>
#We can also remove all objects
> rm(list = ls())
> ls()
character(0)

Part one: Exercise
#Make a new matrix object with three columns and seven rows, and fill every cell
with the number 9. Use your first name as the name of the matrix object.
#Make a new vector object with the numbers 101, 898 and -3. Use your surname
as the name of the vector object.
#Replace the fourth row of your matrix with your vector.

Part one: Exercise
#Make a new matrix object with three columns, seven rows, and fill every cell with
the number 9. Use your first name as the name of the matrix object.
> daniel=matrix(9,ncol=3,nrow=7)
> daniel
[,1] [,2] [,3]
[1,] 9 9 9
[2,] 9 9 9
[3,] 9 9 9
[4,] 9 9 9
[5,] 9 9 9
[6,] 9 9 9
[7,] 9 9 9
#Make a new vector object with the numbers 101, 898 and -3. Use your surname
as the name of the vector object.
> thomas=c(101,898,-3)
> thomas
[1] 101 898 -3
#Replace the fourth row of your matrix with your vector.
> daniel[4,]=thomas
> daniel
[,1] [,2] [,3]
[1,] 9 9 9
[2,] 9 9 9
[3,] 9 9 9
[4,] 101 898 -3
[5,] 9 9 9
[6,] 9 9 9
[7,] 9 9 9

HELP!
#You can call on the help function if you become lost or stuck when using R
#Can’t remember how to make a matrix?
> ?matrix
>

#This will be a worked example for a Student’s T-test for the means of two
samples, showcasing the storage and analysis of data in R

#Make x a vector containing 1000 random numbers
> set.seed(1)
> x=rnorm(1000)
#Make y a vector containing 1000 random numbers
> set.seed(100)
> y=rnorm(1000)
#The random numbers in R are not truly random; they are pseudo-random numbers produced by
an algorithm, and they have many characteristics of random data. Using the set.seed function, we
can fix the starting point of that algorithm so that it returns the same ‘random’ numbers every time. This will mean that we
should all get the same results from our ‘random’ numbers
#We will use Student’s T-test to see if the mean of x and mean of y are significantly
different
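#A short sketch of what set.seed does. The object names first_draw, second_draw and third_draw are made up for illustration

```r
set.seed(1)
first_draw <- rnorm(5)
set.seed(1)
second_draw <- rnorm(5)
identical(first_draw, second_draw)   # TRUE: same seed, same numbers
third_draw <- rnorm(5)               # no reset, so the sequence carries on
identical(first_draw, third_draw)    # FALSE: these are new numbers
```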

#What are the assumptions for a T-test?
#1) That the two samples (x and y) are each normally distributed
#2) That the two samples have the same variance
#3) That the two samples are independent
#These are calculated data so we will assume that 3) is true.
#We should test 1) and 2) if we want our T-test results to be meaningful!

#We will use the Shapiro-Wilk test¹ to see if the data are normally distributed
#The Shapiro-Wilk test calculates a normality statistic (W) and tests the null hypothesis that
the data are normally distributed
#We would reject the null hypothesis for our sample if we received a p-value of <0.05
#To perform a Shapiro-Wilk test in R we use the shapiro.test function
> shapiro.test(x)
Shapiro-Wilk normality test
data: x
W = 0.9988, p-value = 0.7256
>
> shapiro.test(y)
data: y
W = 0.9993, p-value = 0.9765
¹ Shapiro SS & Wilk MB. 1965. An analysis of variance test for normality (complete samples). Biometrika 52: 591–611
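#The result of a test can be stored as an object, just like a number. A short sketch of pulling the p-value out of a stored Shapiro-Wilk result (sw is a made-up object name)

```r
set.seed(1)
x <- rnorm(1000)
sw <- shapiro.test(x)
names(sw)            # the pieces stored in the result
sw$statistic         # the W statistic
sw$p.value           # the p-value
sw$p.value < 0.05    # TRUE would mean rejecting normality at the 0.05 level
```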

#We will use an F-test¹ to see if x and y have equal variances
#The null hypothesis of this F-test is that the two datasets have equal variances, and this
hypothesis is rejected if the p-value is <0.05
#We calculate an F-test for equal variances in R using the var.test function
> var.test(x,y)
F test to compare two variances
data: x and y
F = 1.0084, num df = 999, denom df = 999, p-value = 0.8947
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.890733 1.141648
sample estimates:
ratio of variances
1.008417
¹ Box GEP. 1953. Non-normality and tests on variances. Biometrika 40 (3/4): 318–335

#Let’s perform the Student’s T-test and see if the mean of x and the mean of y are
significantly different
#We will use a simple form of the t.test function. This test requires three pieces of
information: x, y, and information about equal variance
> t.test(x,y,var.equal=TRUE)
Two Sample t-test
data: x and y
t = -0.6161, df = 1998, p-value = 0.5379
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.11903134 0.06212487
sample estimates:
mean of x mean of y
-0.01164814 0.01680509
#The null hypothesis for this test is that x and y have the same mean value. The
confidence level was set at 0.95 (a significance level of 0.05), so the rejection criterion would be a p-value less than
0.05. Did we reject the null hypothesis?
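#To see what t.test is doing, here is a sketch that calculates the equal-variance t statistic by hand and compares it with the function. The object names pooled_var, t_manual, p_manual and res are made up for illustration

```r
set.seed(1)
x <- rnorm(1000)
set.seed(100)
y <- rnorm(1000)
n_x <- length(x)
n_y <- length(y)
# Pooled variance: a weighted average of the two sample variances
pooled_var <- ((n_x - 1) * var(x) + (n_y - 1) * var(y)) / (n_x + n_y - 2)
# t statistic: difference in means divided by its standard error
t_manual <- (mean(x) - mean(y)) / sqrt(pooled_var * (1 / n_x + 1 / n_y))
# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = n_x + n_y - 2)
res <- t.test(x, y, var.equal = TRUE)
all.equal(t_manual, unname(res$statistic))   # same t statistic
all.equal(p_manual, res$p.value)             # same p-value
```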

Part two: Exercise
#Generate vector objects a and b as below
> set.seed(10)
> a=rnorm(1000,sd=2)
> set.seed(50)
> b=rnorm(1000,sd=1)
#Is the mean of a significantly different from the mean of b? Is it appropriate to
use a Student’s T-test to address this question?

Part two: Exercise
> shapiro.test(a)
data: a
W = 0.9979, p-value = 0.2538
> shapiro.test(b)
data: b
W = 0.9978, p-value = 0.2242
> var.test(a,b)
F test to compare two variances
data: a and b
F = 3.7431, num df = 999, denom df = 999,
p-value < 2.2e-16
alternative hypothesis: true ratio of
variances is not equal to 1
95 percent confidence interval:
3.306307 4.237678
sample estimates:
ratio of variances
3.743136
> t.test(a,b,var.equal=F)
Welch Two Sample t-test
data: a and b
t = 0.3949, df = 1497.218, p-value = 0.693
alternative hypothesis: true difference in
means is not equal to 0
95 percent confidence interval:
-0.1106290 0.1663946
sample estimates:
mean of x mean of y
0.022749483 -0.005133326
>
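Is the mean of a different from the mean of
b?
p-value = 0.693
We fail to reject the null hypothesis that the
means are equal.
The variances are not equal (F-test p-value < 0.05), so a Student’s T-test is not appropriate. Setting var.equal=F runs Welch’s t-test instead.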

#Datasets can often be too large to type into R. This section of the guide will show you
how to automatically read data into R and then perform an analysis
#For this test we will perform a one-way analysis of variance (ANOVA)
[Embedded worksheet: IUCN dataset]
#Right click on the embedded dataset, move the mouse to ‘Macro-Enabled Worksheet Object’, click Open, and then save the table as IUCN.csv (a comma
separated values file) to a folder on your computer
#The dataset contains a count of endangered species for sixty randomly selected
countries in three different regions. These data have been extracted from Table 6a of
the IUCN Red List summary statistics:
http://www.iucnredlist.org/documents/summarystatistics/2010_3RL_Stats_Table_6a.pdf

#We are going to use a one-way ANOVA to see if the mean number of endangered
species is different in different regions (AFRICA, ASIA and EUROPE).
#First step: we will now tell R where to look for the file, using the setwd() function
> setwd("H:/Projects/Teaching/R")
#Hint: your working directory will be different to mine
#Note: we use forward slashes / and not backslashes
#Second step: we read the file into R as a new object called IUCN. The term sep=","
is used because values in the dataset are separated by commas. The term header=T
is used because the first row of the IUCN table contains column names
> IUCN=read.table("IUCN.csv",sep=",",header=T)
#Alternatively, if we know the full file path, then we could read the file into R without
using setwd()
> IUCN=read.table("H:/Projects/Teaching/R/IUCN.csv",sep=",",header=T)
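#If you do not have IUCN.csv to hand, this sketch writes a tiny made-up table to a temporary folder and reads it back in. The rows are invented, and the column name Country is an assumption; Region and Endangered_species are the names used later in this guide

```r
# Hypothetical three-row table; the real IUCN.csv has many more rows
demo <- data.frame(Country = c("A", "B", "C"),
                   Region = c("AFRICA", "ASIA", "EUROPE"),
                   Endangered_species = c(12, 7, 4))
demo_file <- file.path(tempdir(), "demo.csv")
write.csv(demo, demo_file, row.names = FALSE)
# read.csv() is read.table() with sep="," and header=TRUE as defaults
demo_in <- read.csv(demo_file)
str(demo_in)      # column names and types
head(demo_in)     # first rows of the table
```

#It is worth running str() and head() on any table straight after reading it, to check that the columns came in as expected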

#What are the assumptions for a one-way ANOVA?
#1) That the data in each group have been randomly selected from a normal distribution
#2) That each group of data have the same variance
#3) That each group of data is independent
#Assumption 3) may be unlikely but we will assume it is true.
#We should test 1) and 2) if we want our ANOVA results to be meaningful!

#We will use the Shapiro-Wilk test to see if the data from each region (AFRICA, ASIA and
EUROPE) are normally distributed
#First though, we will separate out the data for each region so that we can test for
normality separately
> af=IUCN[which(IUCN[,2]=="AFRICA"),3]
#Let’s take a closer look:
IUCN[,2] calls up the second column of the IUCN object
#The which() function is asking ‘which of the values in column 2 of the IUCN object
are equal to “AFRICA”?’ which(IUCN[,2]=="AFRICA"). This gives us the Africa
row values.
#Now we can use the Africa row values to find the number of Endangered species for
each African country. These species counts are stored in column 3 of the IUCN object.
IUCN[which(IUCN[,2]=="AFRICA"),3]
#Now we store the endangered species counts for African countries as the af object
af=IUCN[which(IUCN[,2]=="AFRICA"),3]

#Repeat for ASIA and EUROPE
> ai=IUCN[which(IUCN[,2]=="ASIA"),3]
> eu=IUCN[which(IUCN[,2]=="EUROPE"),3]
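#A sketch of the subsetting and the normality test on a made-up stand-in for the IUCN table. The object mock and its counts are invented for illustration and are not the real data

```r
# Made-up stand-in for the IUCN table (not the real counts)
set.seed(1)
mock <- data.frame(Country = paste0("country", 1:30),
                   Region = rep(c("AFRICA", "ASIA", "EUROPE"), each = 10),
                   Endangered_species = rpois(30, lambda = 10))
# which() returns the row positions where the condition is TRUE
africa_rows <- which(mock[, 2] == "AFRICA")
africa_rows
# The same subset using column names and a logical test
af <- mock$Endangered_species[mock$Region == "AFRICA"]
shapiro.test(af)
# One Shapiro-Wilk p-value per region
pvals <- tapply(mock$Endangered_species, mock$Region,
                function(v) shapiro.test(v)$p.value)
pvals
```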

#We will use a Bartlett Test of Homogeneity of Variances¹ to test if variance is equal
across our three groups (AFRICA, ASIA, EUROPE).
#The function for the Bartlett test is simply bartlett.test() (R is case-sensitive, so use a lowercase b). The terms for this
function will be the Endangered species column of the IUCN object, and the
Region column of the IUCN object. Column 3 and column 2 respectively.
#A Bartlett test operates similarly to an F-test. The null hypothesis for this Bartlett test is that the
groups have equal variances.
#We would reject the null hypothesis for our dataset if we received a p-value of <0.05.
> bartlett.test(IUCN[,3]~IUCN[,2])
Bartlett test of homogeneity of variances
data: IUCN[, 3] by IUCN[, 2]
Bartlett's K-squared = 11.6261, df = 2, p-value = 0.002988
¹ Box GEP. 1953. Non-normality and tests on variances. Biometrika 40 (3/4): 318–335

#Here we reject the null hypothesis – at least one Region has a variance that is not equal to the
variance of another Region in the dataset.
#Our dataset does not satisfy the second assumption of the ANOVA. We can still proceed
however.
#The ANOVA test is robust to violations of this second assumption. This means that it can still
produce meaningful results even if the groups do not have equal variances. As a rule of thumb,
we can proceed if the maximum variance of our groups is less than 4 times the
minimum variance of our groups.
> var(af)
[1] 25.07692
> var(ai)
[1] 9.002849
> var(eu)
[1] 7.464387
>
#The variance of the number of endangered species in Africa is substantially greater than the
other two variance values. However, the Africa group variance (the largest) is less than 4 times the variance
of the Europe group (the smallest)
> var(af)<4*var(eu)
[1] TRUE
#So, we will proceed, but we need to be aware that with unequal variances it will be tougher
for an analysis of variance to find a significant result.
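#The rule of thumb can also be checked as a single ratio. This sketch types in the three variances reported by var() in this guide; group_var and var_ratio are made-up object names

```r
# Group variances as reported by var(af), var(ai) and var(eu)
group_var <- c(AFRICA = 25.07692, ASIA = 9.002849, EUROPE = 7.464387)
var_ratio <- max(group_var) / min(group_var)
var_ratio         # largest variance divided by smallest variance
var_ratio < 4     # TRUE means the rule of thumb is met
```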

#Perform the one-way ANOVA using the aov() function with the following
syntax, and store the results as an object called IUCN_ANOVA
> IUCN_ANOVA=aov(Endangered_species~Region,data=IUCN)
#You can see the ANOVA results by calling up the IUCN_ANOVA object
> IUCN_ANOVA
Call:
aov(formula = Endangered_species ~ Region, data = IUCN)
Terms:
Region Residuals
Sum of Squares 703.284 1080.148
Deg. of Freedom 2 78
Residual standard error: 3.721297
Estimated effects may be unbalanced
>

#Use the summary() function to find out more about the ANOVA
> summary(IUCN_ANOVA)
Df Sum Sq Mean Sq F value Pr(>F)
Region 2 703.3 351.6 25.39 3.21e-09 ***
Residuals 78 1080.1 13.8
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
#Interpretation: How do we read this table to find out if the mean number of
endangered species is different in different regions?
#The null hypothesis for this test is that the mean number of endangered species is the
same in each region. We would reject this null hypothesis if the p-value (i.e. Pr(>F))
is less than the significance level for this test (i.e. <0.05). So, we reject the null
hypothesis, and conclude that the mean number of endangered species is significantly
different between regions.
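#The p-value can be pulled out of the ANOVA table instead of being read by eye. This sketch swaps in PlantGrowth, a dataset that ships with R, so that it runs without IUCN.csv; the same steps should apply to the IUCN_ANOVA object

```r
# PlantGrowth ships with R: plant weight for three treatment groups
fit <- aov(weight ~ group, data = PlantGrowth)
anova_table <- summary(fit)[[1]]
anova_table
# The first row holds the grouping term; the second row is the residuals
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]
p_value < 0.05    # TRUE means rejecting the null hypothesis of equal means
```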

#Is the number of endangered animals different between all regions, or just different
for one region? To find out we will use Tukey’s Honest Significant Difference test.
#The function for Tukey’s HSD is simply TukeyHSD(). The test uses the following
syntax
> TukeyHSD(IUCN_ANOVA,"Region")
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Endangered_species ~ Region, data = IUCN)
$Region
diff lwr upr p adj
ASIA-AFRICA -4.185185 -6.605050 -1.7653208 0.0002620
EUROPE-AFRICA -7.185185 -9.605050 -4.7653208 0.0000000
EUROPE-ASIA -3.000000 -5.419864 -0.5801356 0.0111684
#Tukey’s HSD provides a pairwise test of each group in the ANOVA. Any Region pair with
a p adj value <0.05 had a significantly different number of endangered species.

#Bonus: Let’s plot our IUCN data to better visualise these results
> boxplot(Endangered_species~Region,data=IUCN)
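[Figure: box plots of Endangered_species for AFRICA, ASIA and EUROPE, with a y-axis running from 5 to 25. Annotations on a second copy of the plot mark the maximum (excl. outliers), upper quartile, median (the thick central line), lower quartile, minimum (excl. outliers) and an outlier.]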

Part three: Exercise
#Plotting basics
#To quickly generate a plot in R using only default options, simply use the
plot() function.
> plot(af)
>
#There are many arguments that you can change to improve the look of your plots
> plot(af,xlab="Country",main="Africa",col=rainbow(100),pch=16,ylab="Endangered species (number)",cex=2,font=6)
> barplot(af,col="red",names.arg=IUCN[which(IUCN[,2]=="AFRICA"),1],las=2,ylab="Endangered species (count)",main="Africa")
#Use ?plot and ?barplot to learn about the arguments you can change when
plotting data
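#A plot can also be written straight to a .pdf file. This is a minimal sketch; the counts in af_demo and the file name africa_demo.pdf are made up for illustration

```r
af_demo <- c(12, 7, 20, 9, 15)    # made-up counts, for illustration only
plot_file <- file.path(tempdir(), "africa_demo.pdf")
pdf(plot_file, width = 6, height = 4)    # open a PDF file as the plotting device
plot(af_demo, xlab = "Country", ylab = "Endangered species (number)",
     main = "Africa", pch = 16, col = "red")
dev.off()                                # close the device to finish writing the file
file.exists(plot_file)
```

#The file is only complete after dev.off() has been run. Replace tempdir() with your own folder to keep the figure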

#You have been using some of the basic functions that are packaged with R, and you
have been either generating or importing datasets
#Anyone can write a new function in R though, or make a dataset, and these functions
and datasets can be bundled together into a package
#R is modular, which means you can download and install new packages to give you
access to new functions and/or datasets
#There is an automatic and a manual method for installing packages. This guide will
teach you how to manually install packages in R
#Why the manual method you ask? Because R requires internet access to download
packages, which can be complicated by a University proxy. I can’t guarantee that the
proxy won’t be an issue. That’s why. Well that, and it will be good for you.

#This will be an exercise in downloading the ‘Analyses of Phylogenetics and Evolution’
package, first written by Emmanuel Paradis in 2008
#The abbreviation for this package is ape

#Open a web browser and enter http://cran.r-project.org/web/packages/ape/index.html
into the address bar – go to the website. The page should be mostly black text on a
white background.
#Find the Downloads section towards the bottom of the website.
#For mac users: download the Mac OS X binary (ape_3.1-4.tgz)
#For PC users: download the Windows binary (ape_3.1-4.zip)
#For UNIX users: again, seek professional help
#Save the ape_3.1-4.xxx file somewhere on your computer that you can easily find
#Note to future users: the file name may be slightly different if Paradis has updated ape

#Run R
#Use the install.packages function with the following syntax to install the ape
package
> install.packages("H:/Teaching/ape_3.1-4.zip")
#Remember to replace my file path “H:/Teaching/” with the file path of the
folder where you downloaded the ape package
#You should see text like this appear after you enter the install.packages
command
Installing package into ‘C:/Documents/R/win-library/3.1’
(as ‘lib’ is unspecified)
inferring 'repos = NULL' from the file name
package ‘ape’ successfully unpacked and MD5 sums checked
#Congratulations, you have now added functions and datasets written by Emmanuel
Paradis to your own copy of R

#You only need to install a package into R once. The package is now available as a
‘library’. If you want to use the ape library in your current R session, then you need to
load the library into R
> library(ape)
>
#So, you install a package once, and load a library many times (every time you run R and
want to use the library)
#The ape library is now available for you to use. Ape is a library of datasets and tools that
have been designed around phylogenetic analyses. We will quickly explore some of the
data and functions in ape:
> data(bird.orders)
#The data function loads a dataset into R. Here we have loaded the bird orders dataset
that is part of the ape library

> plot(bird.orders)
#The plot function detects that bird.orders is a special type of object – it is a
‘phylo’ class of object. This type of object is a different object class from the vectors,
matrices and data frames that we have been working with
#The ape library has a special plot function, plot.phylo, for plotting ‘phylo’ objects. R used this special
plot function in place of the normal plot function when we tried to plot
bird.orders.
#Don’t worry! All of this happened automatically because we loaded the ape library

#Test: Use the ? (help) function for plot.phylo to learn how to plot the
bird.orders dataset as a fan, as below
> ?plot.phylo

Part four: Exercise
#Download, install and load two packages: ggplot2 and labeling
#Get the packages using Google ‘r ggplot2 cran’ and ‘r labeling cran’ or use the
links below
http://cran.r-project.org/web/packages/labeling/index.html
http://cran.r-project.org/web/packages/ggplot2/index.html
#Use the new data and functions provided by these packages to plot the density of
diamonds against their weight (carat).
> qplot(carat, data = diamonds, geom = "density", colour = color)
>
#For more information on ggplot see http://ggplot2.org/book/qplot.pdf

Part five: Scripts
#One of the best features of R is the ability to automatically carry out many
commands, one after another. For this type of operation we would first write all
of our commands into a script, and then enter the entire script into R in one action
#We are going to use previously scripted code for this section of the guide. Our script will
generate, analyse and plot some data.
#Go ahead and open this embedded text file by right clicking on it and clicking
‘Packager Shell Object Object’, then ‘Activate Contents’
#Copy the entire contents of this notepad document and paste it all into R
#Now, read through the notepad document to find out what has taken place

#There are many functions in R that do more than just basic mathematical operations
#We have seen one already, the which() function. This function looked through an
object and returned the positions of the values that we wanted.
> which(IUCN[,2]=="AFRICA")
#Here we will focus on loops, which we access using the for() function.
#A loop is written as follows
> for(i in 1:10){
}
# for starts the loop
# i is a value that will be updated as the loop iterates
# 1 is the starting value for i
# 10 is the final value for i
#The curly brackets {} enclose the calculations that are looped

#Make j = 1
> j=1
#We will use a loop to increase the value of j by i through ten iterations
> for(i in 1:10){
j+i
}
#We don’t get to see what happens inside a loop unless we specifically ask for it
> for(i in 1:10){
+ print(j+i)
+ }
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11
>

#What is the new value of j? j is still 1, because we did not store the changed value.
> for(i in 1:10){
+ j=j+1
+ }
> j
[1] 11
#j is now equal to 11. How did that happen?
> j=1
> for(i in 1:10){
+ j=j+1
+ print(j)
+ }
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11

Part six: Exercise
#Make a vector of ten random numbers
#Using a loop, add 100 to each number in the vector, in sequence. For example, in
the first iteration of your loop you will add 100 to the first value of your vector, in
the second iteration of your loop you will add 100 to the second value of your
vector, and so on.

Part six: Exercise
> x=rnorm(10)
> x
[1] -0.81673186 0.35409408 0.69619606 -2.04003445 -1.02832503 -0.31418186
[7] 0.09717105 0.78778455 -0.15048025 1.86026573
>
>
> for(i in 1:length(x)){
+ x[i]=x[i]+100
+ }
> x
[1] 99.18327 100.35409 100.69620 97.95997 98.97167 99.68582 100.09717
[8] 100.78778 99.84952 101.86027
>
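#The loop is good practice, but R can also add 100 to every value in one step. A short sketch comparing the two approaches; looped and vectorised are made-up object names

```r
set.seed(1)
x <- rnorm(10)
# Loop version: update one position per iteration
looped <- x
for (i in seq_along(looped)) {
  looped[i] <- looped[i] + 100
}
# Vectorised version: R adds 100 to every value in one step
vectorised <- x + 100
all.equal(looped, vectorised)   # both approaches give the same vector
```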

How far does a Duvaucel's gecko travel after release?
Department of Conservation
Reference: 10039929
Photograph by Chris Smuts-Kennedy
Grid of monitored stations
Methods:
• Record the grid coordinates of the station
where the gecko is released
• Each day for three subsequent days
measure the grid coordinates of the station
where the gecko is found
• Calculate the distance between recorded
stations
• 10 m by 10 m grid of 1 m by 1 m cells

#Step one: Set up the monitoring grid data for each day. 0 means that the gecko
was not observed in that grid cell, 1 means that the gecko was observed in that
grid cell.
#sample(1:100,1) picks one of the 100 grid cells at random for each day
#Release day
set.seed(1)
d0=rep(0,100)
d0[sample(1:100,1)]=1
day.zero=matrix(d0,ncol=10,nrow=10)
#Day one check
set.seed(2)
d1=rep(0,100)
d1[sample(1:100,1)]=1
day.one=matrix(d1,ncol=10,nrow=10)
#Day two check
set.seed(3)
d2=rep(0,100)
d2[sample(1:100,1)]=1
day.two=matrix(d2,ncol=10,nrow=10)
#Day three check
set.seed(4)
d3=rep(0,100)
d3[sample(1:100,1)]=1
day.three=matrix(d3,ncol=10,nrow=10)

#Step two: Combine all of the grid data into one list. This will help us quickly
analyse the data as a single batch.
days=list(day.zero,day.one,day.two,day.three)
#Step three: Create a matrix where we will store the grid locations for the gecko
location, and calculate the daily distance.
movement=matrix(0,ncol=3,nrow=length(days))
colnames(movement)=c("Easting","Northing","Displacement (m)")
#Step four: Find the grid cell for the location of the gecko on each day and store
that information in the movement matrix.
for(i in 1:length(days)){
movement[i,1]=which(days[[i]]==1, arr.ind=TRUE)[1]
movement[i,2]=which(days[[i]]==1, arr.ind=TRUE)[2]
}
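#A small sketch of what which() returns when arr.ind=TRUE is used on a matrix. The 3 by 3 matrix called grid is made up for illustration

```r
grid <- matrix(0, nrow = 3, ncol = 3)
grid[2, 3] <- 1                          # the gecko sits in row 2, column 3
cell <- which(grid == 1, arr.ind = TRUE)
cell                                     # one row with the columns "row" and "col"
cell[1]                                  # the row index
cell[2]                                  # the column index
```

#This is why the loop in step four uses [1] for the first coordinate and [2] for the second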

#Step five: Calculate the distance that the gecko travelled each day.
for(j in 2:length(days)){
movement[j,3]=sqrt((movement[j,1]-movement[j-1,1])^2+(movement[j,2]-movement[j-1,2])^2)
}
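#The distance in step five is the same hypotenuse calculation as in Part zero. A sketch using a made-up helper function, cell_distance, which is not part of the original script

```r
# Hypothetical helper: straight-line distance between two grid cells
cell_distance <- function(row1, col1, row2, col2) {
  sqrt((row2 - row1)^2 + (col2 - col1)^2)
}
cell_distance(3, 4, 6, 8)    # 3 cells one way and 4 the other: a 3-4-5 triangle
cell_distance(2, 2, 2, 2)    # no movement
```

#Each grid cell is 1 m wide, so the result can be read directly as metres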
#Step six: Plot the distance between each station where the gecko was found
on each subsequent day.
barplot(movement[,3],xlab="Day",ylab="Displacement (m)",main="Gecko distance")

Conclusion
#By now you should have a good understanding of how to use R
#We have covered all of the basic ways of interacting with R:
- Storing data
- Plotting data
- Analysing data with functions
- Loading new functions for data analysis
#There is so much further you can take this though – your imagination is the limit!
#You should think of this tutorial as a quick reference guide to help get you on your feet
#You can also check out tutorial videos at
illuminatingaotearoa.wordpress.com/zoostar
