Introducing R
What is R and what can I do with it?
R is freely available software for Windows, Mac OS and Linux 
To download R in New Zealand: http://cran.stat.auckland.ac.nz/ 
What is R? 
A very simple programming language 
A place for you to input data 
A collection of tools for you to perform calculations 
A tool for producing graphics 
A statistics suite that can be downloaded on to any PC, Mac or Linux system 
A software package that can run on high performance computing clusters
What is R and what can I do with it?
With R you can: 
Perform simple or advanced statistical tests and analyses 
e.g. standard deviation, t-test, principal component analysis 
Read and manipulate data from existing files 
e.g. tables in Excel files, trees in nexus files, data on websites 
Write data or figures to files 
e.g. export a figure to .pdf, export a .csv file 
Produce simple or advanced figures
What is R and what can I do with it?
http://dx.doi.org/10.1098/rspb.2014.0806 
Figure 2. A reconstruction of the evolutionary history of 
carotenoid pigmentation in feathers. The likelihood that 
ancestors could display carotenoid feather pigments has 
been reconstructed using ‘hidden’ transition rates in three 
rate categories (AIC = 4002.5, 11 transition rates) [33]. The 
POEs (defined in Material and methods) for carotenoid 
feather pigmentation are identified by red circles. 
Branches are coloured according to the proportional 
likelihood of carotenoid-consistent colours at the 
preceding node. Solid purple points indicate species for 
which carotenoid feather pigments were confirmed 
present from chemical analysis; open black points 
represent those for which carotenoids were not
detected in feathers after chemical analysis. Supertree 
phylogeny from [21].
Who is this guide for? 
Starting at ground level and shaping you into a confident R user 
Are you… 
Completely new to R? 
An infrequent R user who wants a refresher? 
The material in these slides may not be useful for confident R users. 
An Introduction to R 
W. N. Venables, D. M. Smith and the R Core Team 
http://cran.r-project.org/doc/manuals/R-intro.pdf
What does this guide cover? 
Part zero: Getting started 
Interacting with R 
Part one: Objects 
Vectors, Matrices, Character arrays 
Part two: Data manipulation 
Analysing data, T-test 
Part three: External data 
Reading data into R, ANOVA 
Part four: Packages and libraries 
Installing new packages into R 
Part five: Scripts 
Using pre-written code 
Part six: Logic (programming) 
Other functions in R
Starting 
This guide will demonstrate the R Console (command-line input) for R 3.0.2 running in Windows 7.
For Mac OS, R can be executed from terminal. For Unix, seek professional help… 
The only point of difference should be the initial starting of R and the visual appearance: 
Console commands will be the same for all operating systems.
Part zero: Getting started 
#Throughout this guide a hashtag (i.e. number sign ‘#’) will identify a 
comment or instruction 
#Start R by finding the R application on your computer 
#You will be presented with the R console
Part zero: Getting started 
#There are a variety of ways of using R, and we will start out with the most basic 
#We are going to enter lines of code into R by typing or pasting them into the R 
console 
#At its most basic, R is just a calculator 
> 1+1 
[1] 2 
> 1*3 
[1] 3 
> 4-7 
[1] -3 
> 20/4 
[1] 5 
> 
#The lines above this have come from the R Console. Remember to remove 
the > symbol if you copy text directly from these slides and paste it into R
Part zero: Getting started 
#Some more basic mathematical operations in R 
> 12--2 
[1] 14 
> 2^2 
[1] 4 
> sqrt(9) 
[1] 3 
> 4*(1+2) 
[1] 12
Part zero: Exercise 
#Use R to find the length of the hypotenuse in the triangle shown below 
#Side a has length 3, Side b has length 4, and the hypotenuse has length h 
h^2 = a^2 + b^2
h = sqrt(a^2 + b^2)
[Figure: right-angled triangle with sides of length 3 and 4 and hypotenuse h]
Part zero: Exercise 
#Use R to find the length of the hypotenuse in the triangle shown below 
> sqrt(3^2+4^2) 
[1] 5 
[Figure: right-angled triangle with sides of length 3 and 4 and hypotenuse h]
Part one: Objects 
#R is more than just a basic calculator… 
#Most operations in R will use objects, which are values stored in R 
#Type x=1 into the R console 
#You have now input a number into R by storing that number as an object. For this 
example, the name of our object is x 
#Object names should start with a letter, which can be followed by more letters, numbers, full stops or underscores
#Object names cannot include spaces 
> x=1 
> 
#Congratulations, you have just programmed R to store an object. 
#Type x into the R console to recall your object 
> x 
[1] 1 
>
Part one: Objects 
#We will now replace the value of x with 10 
> x 
[1] 1 
> x=10 
> x 
[1] 10 
> 
#As you can see, the value of an object can be easily replaced by simply making 
the object equal to a new value
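#A short sketch of the same idea, assuming you would like to try the arrow operator <- as well. In R, <- assigns a value in the same way that = does in the examples above. The object name x_doubled is made up for this example.

```r
# Assign with the arrow operator; this has the same effect as x=10
x <- 10

# Objects can be used in calculations just like numbers
x_doubled <- x * 2
print(x_doubled)
```

#Either operator works for the examples in this guide.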
Part one: Objects 
#Let’s make y into a vector - a one dimensional array 
#There are several ways of making a vector in R. These methods introduce 
functions. 
#A function is an operation performed on numbers and/or objects. 
#The two easiest ways of making a vector in R use different functions: 
#Use the concatenate function c and place numbers inside parentheses 
> y=c(10,11,12,13,14,15,16,17,18,19,20) 
> y 
[1] 10 11 12 13 14 15 16 17 18 19 20 
#Use the array function and place numbers inside parentheses 
> y=array(10:20) 
> y 
[1] 10 11 12 13 14 15 16 17 18 19 20
Part one: Objects 
#Just as we replaced x with a single value, we can also replace a single value 
within our vector 
#Let’s replace the fifth number in our vector with 0 
> y 
[1] 10 11 12 13 14 15 16 17 18 19 20 
> y[5]=0 
> y 
[1] 10 11 12 13 0 15 16 17 18 19 20 
> 
#Square brackets [] placed after a vector will instruct R that we are interested in 
only a part of the vector. In the example above, we are referring to the fifth 
position in the vector
Part one: Objects 
#Try these vector manipulations as well: 
> y[1]=y[2] 
> y 
[1] 11 11 12 13 0 15 16 17 18 19 20 
> 
#The value of the first position was changed to be the same as the value in the 
second position 
> y[c(1,3,5)]=5 
> y 
[1] 5 11 5 13 5 15 16 17 18 19 20 
> 
#The values in the first, third and fifth positions were made equal to 5
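#Square brackets accept more than single positions. The sketch below rebuilds a shorter y so that it runs on its own, and shows three other common ways of picking out parts of a vector. The object names first_three, without_first and big_values are made up for this example.

```r
y <- c(10, 11, 12, 13, 14, 15)

first_three <- y[1:3]     # positions 1 to 3
without_first <- y[-1]    # every position except the first
big_values <- y[y > 12]   # only the values greater than 12

print(first_three)
print(without_first)
print(big_values)
```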
Part one: Objects 
#Onward! We will make a new object, a two-dimensional matrix, and call it z 
#Our matrix will have ten rows and ten columns, and we will start out by filling 
all the cells with 0 
> z=matrix(0,ncol=10,nrow=10) 
> z 
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] 
[1,] 0 0 0 0 0 0 0 0 0 0 
[2,] 0 0 0 0 0 0 0 0 0 0 
[3,] 0 0 0 0 0 0 0 0 0 0 
[4,] 0 0 0 0 0 0 0 0 0 0 
[5,] 0 0 0 0 0 0 0 0 0 0 
[6,] 0 0 0 0 0 0 0 0 0 0 
[7,] 0 0 0 0 0 0 0 0 0 0 
[8,] 0 0 0 0 0 0 0 0 0 0 
[9,] 0 0 0 0 0 0 0 0 0 0 
[10,] 0 0 0 0 0 0 0 0 0 0 
>
Part one: Objects 
#We can replace parts of our matrix, like we did with our vector 
> z 
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] 
[1,] 0 0 0 0 0 0 0 0 0 0 
[2,] 0 0 0 0 0 0 0 0 0 0 
[3,] 0 0 0 0 0 0 0 0 0 0 
[4,] 0 0 0 0 0 0 0 0 0 0 
[5,] 0 0 0 0 0 0 0 0 0 0 
[6,] 0 0 0 0 0 0 0 0 0 0 
[7,] 0 0 0 0 0 0 0 0 0 0 
[8,] 0 0 0 0 0 0 0 0 0 0 
[9,] 0 0 0 0 0 0 0 0 0 0 
[10,] 0 0 0 0 0 0 0 0 0 0 
> z[1,3]=33 
> z 
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] 
[1,] 0 0 33 0 0 0 0 0 0 0 
[2,] 0 0 0 0 0 0 0 0 0 0 
[3,] 0 0 0 0 0 0 0 0 0 0 
[4,] 0 0 0 0 0 0 0 0 0 0 
[5,] 0 0 0 0 0 0 0 0 0 0 
[6,] 0 0 0 0 0 0 0 0 0 0 
[7,] 0 0 0 0 0 0 0 0 0 0 
[8,] 0 0 0 0 0 0 0 0 0 0 
[9,] 0 0 0 0 0 0 0 0 0 0 
[10,] 0 0 0 0 0 0 0 0 0 0 
#Here, the two numbers inside the square brackets are a coordinate for the matrix: 
first row, third column
Part one: Objects 
#We can replace an entire row by not providing a column coordinate 
> z[1,]=33 
> z 
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] 
[1,] 33 33 33 33 33 33 33 33 33 33 
[2,] 0 0 0 0 0 0 0 0 0 0 
[3,] 0 0 0 0 0 0 0 0 0 0 
[4,] 0 0 0 0 0 0 0 0 0 0 
[5,] 0 0 0 0 0 0 0 0 0 0 
[6,] 0 0 0 0 0 0 0 0 0 0 
[7,] 0 0 0 0 0 0 0 0 0 0 
[8,] 0 0 0 0 0 0 0 0 0 0 
[9,] 0 0 0 0 0 0 0 0 0 0 
[10,] 0 0 0 0 0 0 0 0 0 0 
> 
#Likewise, we can replace an entire column 
> z[,3]=c(1:10) 
> z 
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] 
[1,] 33 33 1 33 33 33 33 33 33 33 
[2,] 0 0 2 0 0 0 0 0 0 0 
[3,] 0 0 3 0 0 0 0 0 0 0 
[4,] 0 0 4 0 0 0 0 0 0 0 
[5,] 0 0 5 0 0 0 0 0 0 0 
[6,] 0 0 6 0 0 0 0 0 0 0 
[7,] 0 0 7 0 0 0 0 0 0 0 
[8,] 0 0 8 0 0 0 0 0 0 0 
[9,] 0 0 9 0 0 0 0 0 0 0 
[10,] 0 0 10 0 0 0 0 0 0 0 
>
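#Leaving out a coordinate also works when reading from a matrix, not just when replacing values. A minimal sketch, using a fresh copy of z and the made-up object names third_column and cell:

```r
z <- matrix(0, ncol = 10, nrow = 10)
z[, 3] <- 1:10

third_column <- z[, 3]   # the whole third column, returned as a vector
cell <- z[4, 3]          # the value in row 4, column 3

print(dim(z))            # number of rows, then number of columns
print(cell)
```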
Part one: Objects 
#Lastly, we will make a character array, which is like a vector or a matrix except
that it holds text. Any numbers stored in a character array are treated as text
> w=matrix("df",ncol=10,nrow=10) 
> w 
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] 
[1,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" 
[2,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" 
[3,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" 
[4,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" 
[5,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" 
[6,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" 
[7,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" 
[8,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" 
[9,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" 
[10,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" 
> 
#So, this covers the basics of creating objects for storing data in R.
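#To see how R treats numbers and letters that are stored together, here is a small sketch. The object names mixed, numbers_as_text and total are made up for this example.

```r
# Mixing numbers and text in one vector turns everything into text
mixed <- c(1, 2, "three")
print(class(mixed))

# as.numeric() converts text that looks like a number back into a number
numbers_as_text <- c("4", "5")
total <- sum(as.numeric(numbers_as_text))
print(total)
```

#This is worth remembering when reading data from files: a column of numbers that contains even one word will be stored as text.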
Part one: Objects 
#Let’s clean out the objects that we made in Part One 
> ls() 
[1] "w" "x" "y" "z" 
> 
#The list objects command ls() will show us which objects are stored in R 
#We can permanently remove a specific object with the rm() function 
> rm(x) 
> ls() 
[1] "w" "y" "z" 
> 
#We can also remove all objects 
> rm(list = ls()) 
> ls() 
character(0)
Part one: Exercise 
#Make a new matrix object with three columns and seven rows, and fill every cell 
with the number 9. Use your first name as the name of the matrix object. 
#Make a new vector object with the numbers 101, 898 and -3. Use your surname 
as the name of the vector object. 
#Replace the fourth row of your matrix with your vector.
Part one: Exercise 
#Make a new matrix object with three columns, seven rows, and fill every cell with 
the number 9. Use your first name as the name of the matrix object. 
> daniel=matrix(9,ncol=3,nrow=7) 
> daniel 
[,1] [,2] [,3] 
[1,] 9 9 9 
[2,] 9 9 9 
[3,] 9 9 9 
[4,] 9 9 9 
[5,] 9 9 9 
[6,] 9 9 9 
[7,] 9 9 9 
#Make a new vector object with the numbers 101, 898 and -3. Use your surname 
as the name of the vector object. 
> thomas=c(101,898,-3) 
> thomas 
[1] 101 898 -3 
#Replace the fourth row of your matrix with your vector. 
> daniel[4,]=thomas 
> daniel 
[,1] [,2] [,3] 
[1,] 9 9 9 
[2,] 9 9 9 
[3,] 9 9 9 
[4,] 101 898 -3 
[5,] 9 9 9 
[6,] 9 9 9 
[7,] 9 9 9
HELP! 
#You can call on the help function if you become lost or unstuck when using R 
#Can’t remember how to make a matrix? 
> ?matrix 
>
Part two: Data manipulation 
#This will be a worked example for a Student’s T-test for the means of two 
samples, showcasing the storage and analysis of data in R
Part two: Data manipulation 
#Make x a vector containing 1000 random numbers 
> set.seed(1) 
> x=rnorm(1000) 
#Make y a vector containing 1000 random numbers 
> set.seed(100) 
> y=rnorm(1000) 
#The random numbers in R are not truly random: they are produced by a pseudo-random number
generator, an algorithm whose output has many characteristics of random data. Using the
set.seed function, we can fix the starting point of that algorithm, so that the same ‘random’
numbers are produced every time. This means that we should all get the same results from our
‘random’ numbers
#We will use Student’s T-test to see if the mean of x and mean of y are significantly 
different
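#A quick way to convince yourself that set.seed works is to draw the same numbers twice. This is only a sketch; the object names first_draw, second_draw, third_draw and same are made up for this example.

```r
set.seed(1)
first_draw <- rnorm(5)

set.seed(1)                # reset the generator to the same starting point
second_draw <- rnorm(5)

same <- identical(first_draw, second_draw)
print(same)

third_draw <- rnorm(5)     # no reset here, so the sequence simply continues
print(identical(first_draw, third_draw))
```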
Part two: Data manipulation 
#What are the assumptions for a T-test? 
#1) That the two samples (x and y) are each normally distributed 
#2) That the two samples have the same variance 
#3) That the two samples are independent 
#These are calculated data so we will assume that 3) is true. 
#We should test 1) and 2) if we want our T-test results to be meaningful!
Part two: Data manipulation 
#We will use the Shapiro-Wilk test¹ to see if the data are normally distributed 
#The Shapiro-Wilk test calculates a normality statistic (W) and tests the hypothesis that 
the data are normal 
#We would reject the null hypothesis for our sample if we received a p-value of <0.05 
#To perform a Shapiro-Wilk test in R we use the shapiro.test function 
> shapiro.test(x) 
Shapiro-Wilk normality test 
data: x 
W = 0.9988, p-value = 0.7256 
> 
> shapiro.test(y) 
Shapiro-Wilk normality test 
data: y 
W = 0.9993, p-value = 0.9765 
¹ Shapiro SS & Wilk MB. 1965. An analysis of variance test for normality (complete samples). Biometrika 52: 591–611
Part two: Data manipulation 
#We will use an F-test¹ to see if x and y have equal variances 
#The null hypothesis of this F-test is that the two datasets have equal variances, and this 
hypothesis is rejected if the p-value is <0.05 
#We calculate an F-test for equal variances in R using the var.test function 
> var.test(x,y) 
F test to compare two variances 
data: x and y 
F = 1.0084, num df = 999, denom df = 999, p-value = 0.8947 
alternative hypothesis: true ratio of variances is not equal to 1 
95 percent confidence interval: 
0.890733 1.141648 
sample estimates: 
ratio of variances 
1.008417 
¹ Box GEP. 1953. Non-normality and tests on variances. Biometrika 40 (3/4): 318–335
Part two: Data manipulation 
#Let’s perform the Student’s T-test and see if the mean of x and the mean of y are 
significantly different 
#We will use a simple form of the t.test function. This test requires three pieces of 
information: x, y, and information about equal variance 
> t.test(x,y,var.equal=TRUE) 
Two Sample t-test 
data: x and y 
t = -0.6161, df = 1998, p-value = 0.5379 
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval: 
-0.11903134 0.06212487 
sample estimates: 
mean of x mean of y 
-0.01164814 0.01680509 
#The null hypothesis for this test is that x and y have the same mean value. The 
confidence level was set at 0.95 (a significance level of 0.05), so the rejection criterion is a p-value less than 
0.05. Did we reject the null hypothesis?
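#The result of a test can also be stored as an object, so that you can pull out single values instead of reading them off the screen. A minimal sketch, assuming the same x and y as above; the object names result, p and reject_null are made up for this example.

```r
set.seed(1)
x <- rnorm(1000)
set.seed(100)
y <- rnorm(1000)

# Store the whole test result as an object
result <- t.test(x, y, var.equal = TRUE)

# The p-value is stored inside the result under the name p.value
p <- result$p.value
reject_null <- p < 0.05

print(p)
print(reject_null)
```

#Use names(result) to see what else is stored in the test result.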
Part two: Exercise 
#Generate vector objects a and b as below 
> set.seed(10) 
> a=rnorm(1000,sd=2) 
> set.seed(50) 
> b=rnorm(1000,sd=1) 
#Is the mean of a significantly different from the mean of b? Is it appropriate to 
use a Student’s T-test to address this question?
Part two: Exercise 
> shapiro.test(a) 
Shapiro-Wilk normality test 
data: a 
W = 0.9979, p-value = 0.2538 
> shapiro.test(b) 
Shapiro-Wilk normality test 
data: b 
W = 0.9978, p-value = 0.2242 
> var.test(a,b) 
F test to compare two variances 
data: a and b 
F = 3.7431, num df = 999, denom df = 999, 
p-value < 2.2e-16 
alternative hypothesis: true ratio of 
variances is not equal to 1 
95 percent confidence interval: 
3.306307 4.237678 
sample estimates: 
ratio of variances 
3.743136 
> t.test(a,b,var.equal=F) 
Welch Two Sample t-test 
data: a and b 
t = 0.3949, df = 1497.218, p-value = 0.693 
alternative hypothesis: true difference in 
means is not equal to 0 
95 percent confidence interval: 
-0.1106290 0.1663946 
sample estimates: 
mean of x mean of y 
0.022749483 -0.005133326 
> 
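Is the mean of a different from the mean of 
b? 
p-value = 0.693 
Fail to reject the null hypothesis that the 
means are equal. 
The F-test shows that a and b do not have equal variances, so Student’s T-test is not 
appropriate. Welch’s T-test (var.equal=F) is used instead.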
Part three: External data 
#Datasets can often be too large to type into R. This section of the guide will show you 
how to automatically read data into R and then perform an analysis 
#For this test we will perform a one-way analysis of variance (ANOVA) 
#Right click on the dataset embedded above the arrow, move the mouse to ‘Macro-Enabled 
Worksheet Object’, click Open, and then save the table as IUCN.csv (a comma 
separated values file) to a folder on your computer 
#The dataset contains a count of endangered species for sixty randomly selected 
countries in three different regions. These data have been extracted from Table 6a of 
the IUCN Red List summary statistics: 
http://www.iucnredlist.org/documents/summarystatistics/2010_3RL_Stats_Table_6a.pdf
Part three: External data 
#We are going to use a one-way ANOVA to see if the mean number of endangered 
species is different in different regions (AFRICA, ASIA and EUROPE). 
#First step: we will now tell R where to look for the file, using the setwd() function 
> setwd("H:/Projects/Teaching/R") 
#Hint: your working directory will be different to mine 
#Note: we use forward slashes / and not backslashes \ 
#Second step: we read the file into R as a new object called IUCN. The term sep="," 
is used because values in the dataset are separated by commas. The term header=T 
is used because the first row of the IUCN table contains column names 
> IUCN=read.table("IUCN.csv",sep=",",header=T) 
#Alternatively, if we know the full file path, then we could read the file into R without 
using setwd() 
> IUCN=read.table("H:/Projects/Teaching/R/IUCN.csv",sep=",",header=T)
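#If you do not have the IUCN.csv file to hand, you can still practise reading a file. The sketch below writes a tiny made-up table to a temporary file and reads it back in. The three rows are invented for illustration and are not IUCN data; the object names csv_path and demo are also made up. read.csv is a shortcut for read.table with sep="," and header=TRUE already set.

```r
# Write a small, invented table to a temporary .csv file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Country,Region,Endangered_species",
             "A,AFRICA,12",
             "B,ASIA,7",
             "C,EUROPE,3"), csv_path)

# Read the file into R as a new object
demo <- read.csv(csv_path)

# Check what was read: column names, column types and table size
str(demo)
print(dim(demo))
```

#Checking a newly read object with str() and dim() is a quick way to catch a wrong separator or a missing header row.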
Part three: External data 
#What are the assumptions for a one-way ANOVA? 
#1) That the data in each group have been randomly selected from a normal distribution 
#2) That each group of data have the same variance 
#3) That each group of data is independent 
#Assumption 3) may be unlikely but we will assume it is true. 
#We should test 1) and 2) if we want our ANOVA results to be meaningful!
Part three: External data 
#We will use the Shapiro-Wilk test to see if the data from each region (AFRICA, ASIA and 
EUROPE) are normally distributed 
#First though, we will separate out the data for each region so that we can test for 
normality separately 
> af=IUCN[which(IUCN[,2]=="AFRICA"),3] 
#Let’s take a closer look: 
#IUCN[,2] calls up the second column of the IUCN object 
#The which() function is asking ‘which of the values in column 2 of the IUCN object 
are equal to “AFRICA”?’ which(IUCN[,2]=="AFRICA"). This gives us the Africa 
row values. 
#Now we can use the Africa row values to find the number of Endangered species for 
each African country. These species counts are stored in column 3 of the IUCN object. 
IUCN[which(IUCN[,2]=="AFRICA"),3] 
#Now we store the endangered species counts for African countries as the af object 
af=IUCN[which(IUCN[,2]=="AFRICA"),3]
Part three: External data 
#Repeat for ASIA and EUROPE 
> ai=IUCN[which(IUCN[,2]=="ASIA"),3] 
> eu=IUCN[which(IUCN[,2]=="EUROPE"),3]
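#With the regions separated, the normality test can be run on each group. The sketch below shows one way of doing this, using invented data so that it runs without the IUCN file: the numbers come from rnorm and are not real species counts, and the object names demo and p_values are made up. The tapply function applies a function to each group in turn.

```r
# Invented data: three regions with 27 made-up counts each
set.seed(42)
demo <- data.frame(
  Region = rep(c("AFRICA", "ASIA", "EUROPE"), each = 27),
  Endangered_species = c(rnorm(27, 15, 5), rnorm(27, 11, 3), rnorm(27, 8, 3))
)

# Pick out one region, in the same spirit as the which() example
af <- demo[demo$Region == "AFRICA", "Endangered_species"]

# Run shapiro.test on every region and keep only the p-values
p_values <- tapply(demo$Endangered_species, demo$Region,
                   function(v) shapiro.test(v)$p.value)
print(p_values)
```

#With the real data you could equally type shapiro.test(af), shapiro.test(ai) and shapiro.test(eu) one at a time.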
Part three: External data 
#We will use a Bartlett Test of Homogeneity of Variances¹ to test if variance is equal 
across our three groups (AFRICA, ASIA, EUROPE). 
#The function for the Bartlett test is simply bartlett.test(). The terms for this 
function will be the Endangered species column of the IUCN object, and the 
Region column of the IUCN object. Column 3 and column 2 respectively. 
#A Bartlett test operates similarly to an F-test. The null hypothesis for this Bartlett test is that the 
groups have equal variances. 
#We would reject the null hypothesis for our dataset if we received a p-value of <0.05. 
> bartlett.test(IUCN[,3]~IUCN[,2]) 
Bartlett test of homogeneity of variances 
data: IUCN[, 3] by IUCN[, 2] 
Bartlett's K-squared = 11.6261, df = 2, p-value = 0.002988 
¹ Box GEP. 1953. Non-normality and tests on variances. Biometrika 40 (3/4): 318–335
Part three: External data 
#Here we reject the null hypothesis – at least one Region has a variance that is not equal to the 
variance of another Region in the dataset. 
#Our dataset does not satisfy the second assumption of the ANOVA. We can still proceed 
however. 
#The ANOVA test is robust to violations of this second assumption. This means that it can still 
produce meaningful results even if the groups do not have equal variances. As a rule of thumb, 
we can proceed if the maximum variance of our groups is less than 4 times greater than the 
minimum variance of our groups. 
> var(af) 
[1] 25.07692 
> var(ai) 
[1] 9.002849 
> var(eu) 
[1] 7.464387 
> 
#The variance of the number of endangered species in Africa is substantially greater than the 
other two variance values. However, the Africa group variance (the maximum) is less than 4 times 
the variance of the Europe group (the minimum) 
> var(af)<4*var(eu) 
[1] TRUE 
#So, we will proceed, but we need to be aware that with unequal variances it will be tougher 
for an analysis of variance to find a significant result.
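#The rule of thumb can also be checked as a single ratio of the largest variance to the smallest. A small sketch, typing in the three variance values printed on this slide; the object names group_var, ratio and within_rule are made up for this example.

```r
# Variances of the three groups, copied from the output above
group_var <- c(af = 25.07692, ai = 9.002849, eu = 7.464387)

# Largest variance divided by smallest variance
ratio <- max(group_var) / min(group_var)
print(ratio)

# The rule of thumb asks for a ratio of less than 4
within_rule <- ratio < 4
print(within_rule)
```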
Part three: External data 
#Perform the one-way ANOVA using the aov() function with the following 
syntax, and store the results as an object called IUCN_ANOVA 
> IUCN_ANOVA=aov(Endangered_species~Region,data=IUCN) 
#You can see the ANOVA results by calling up the IUCN_ANOVA object 
> IUCN_ANOVA 
Call: 
aov(formula = Endangered_species ~ Region, data = IUCN) 
Terms: 
Region Residuals 
Sum of Squares 703.284 1080.148 
Deg. of Freedom 2 78 
Residual standard error: 3.721297 
Estimated effects may be unbalanced 
>
Part three: External data 
#Use the summary() function to find out more about the ANOVA 
> summary(IUCN_ANOVA) 
Df Sum Sq Mean Sq F value Pr(>F) 
Region 2 703.3 351.6 25.39 3.21e-09 *** 
Residuals 78 1080.1 13.8 
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
> 
#Interpretation: How do we read this table to find out if the mean number of 
endangered species is different in different regions? 
#The null hypothesis for this test is that the mean number of endangered species is the 
same in each region. We would reject this null hypothesis if the p-value (i.e. Pr(>F)) 
is less than the significance level for this test (i.e. <0.05). So, we reject the null 
hypothesis, and conclude that the mean number of endangered species is significantly 
different between regions.
Part three: External data 
#Is the number of endangered animals different between all regions, or just different 
for one region? To find out we will use Tukey’s Honest Significant Difference test. 
#The function for Tukey’s HSD is simply TukeyHSD(). The test uses the following 
syntax 
> TukeyHSD(IUCN_ANOVA,"Region") 
Tukey multiple comparisons of means 
95% family-wise confidence level 
Fit: aov(formula = Endangered_species ~ Region, data = IUCN) 
$Region 
diff lwr upr p adj 
ASIA-AFRICA -4.185185 -6.605050 -1.7653208 0.0002620 
EUROPE-AFRICA -7.185185 -9.605050 -4.7653208 0.0000000 
EUROPE-ASIA -3.000000 -5.419864 -0.5801356 0.0111684 
#Tukey’s HSD provides a pairwise test of each group in the ANOVA. Any Region pair with 
a p adj value <0.05 had a significantly different number of endangered species.
Part three: External data 
#Bonus: Let’s plot our IUCN data to better visualise these results 
> boxplot(Endangered_species~Region,data=IUCN) 
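[Figure: box plot of Endangered_species (y axis, 5 to 25) by Region (AFRICA, ASIA, EUROPE). In each box the thick line is the median and the box edges are the lower and upper quartiles. The whiskers extend to the minimum and maximum values excluding outliers, and outliers are drawn as separate points]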
Part three: Exercise 
#Plotting basics 
#To quickly generate a plot in R using only default options, simply use the 
plot() function. 
> plot(af) 
> 
#There are many variables that you can change to improve the look of your plots 
> plot(af,xlab="Country",main="Africa",col=rainbow(100),pch=16,ylab="Endangered species (number)",cex=2,font=6) 
> barplot(af,col="red",names.arg=IUCN[which(IUCN[,2]=="AFRICA"),1],las=2,ylab="Endangered species (count)",main="Africa") 
#Use ?plot and ?barplot to learn about the variables you can change when 
plotting data
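#Plots can also be written straight to a file, such as a .pdf. The sketch below is one way of doing this. The counts in af_demo are invented and the file is saved to a temporary folder, so replace both with your own data and file path.

```r
af_demo <- c(12, 7, 20, 15, 9)   # invented counts, for illustration only

# Open a pdf file, draw the plot into it, then close the file
pdf_path <- file.path(tempdir(), "africa_plot.pdf")
pdf(pdf_path)
plot(af_demo, xlab = "Country", ylab = "Endangered species (number)",
     main = "Africa", pch = 16)
dev.off()

# Confirm that the file was written
print(file.exists(pdf_path))
```

#Nothing appears on screen while the pdf file is open; the plot goes into the file until dev.off() is called.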
Part four: Packages and libraries 
#You have been using some of the basic functions that are packaged with R, and you 
have been either generating or importing datasets 
#Anyone can write a new function in R though, or make a dataset, and these functions 
and datasets can be bundled together into a package 
#R is modular, which means you can download and install new packages to give you 
access to new functions and/or datasets 
#There is an automatic and a manual method for installing packages. This guide will 
teach you how to manually install packages in R 
#Why the manual method you ask? Because R requires internet access to download 
packages, which can be complicated by a University proxy. I can’t guarantee that the 
proxy won’t be an issue. That’s why. Well that, and it will be good for you.
Part four: Packages and libraries 
#This will be an exercise in downloading the ‘Analyses of Phylogenetics and Evolution’ 
package, first written by Emmanuel Paradis in 2008 
#The abbreviation for this package is ape
Part four: Packages and libraries 
#Open a web browser and enter http://cran.r-project.org/web/packages/ape/index.html 
into the address bar – go to the website. The page should be mostly black text on a 
white background. 
#Find the Downloads section towards the bottom of the website. 
#For mac users: download the Mac OS X binary (ape_3.1-4.tgz) 
#For PC users: download the Windows binary (ape_3.1-4.zip) 
#For UNIX users: again, seek professional help 
#Save the ape_3.1-4.xxx file somewhere on your computer that you can easily find 
#Note to future users: the file name may be slightly different if Paradis has updated ape
Part four: Packages and libraries 
#Run R 
#Use the install.packages function with the following syntax to install the ape 
package 
> install.packages("H:/Teaching/ape_3.1-4.zip") 
#Remember to replace my file path “H:/Teaching/” with the file path of the 
folder where you downloaded the ape package 
#You should see text like this appear after you enter the install.packages 
command 
Installing package into ‘C:/Documents/R/win-library/3.1’ 
(as ‘lib’ is unspecified) 
inferring 'repos = NULL' from the file name 
package ‘ape’ successfully unpacked and MD5 sums checked 
#Congratulations, you have now added functions and datasets written by Emmanuel 
Paradis to your own copy of R
Part four: Packages and libraries 
#You only need to install a package into R once. The package is now available as a 
‘library’. If you want to use the ape library in your current R session, then you need to 
load the library into R 
> library(ape) 
> 
#So, you install a package once, and load a library many times (every time you run R and 
want to use the library) 
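#If you are not sure whether a package has already been installed, you can ask R before trying to load it. A minimal sketch using the requireNamespace function; the object name have_ape is made up for this example.

```r
# TRUE if the ape package is installed on this computer, FALSE otherwise
have_ape <- requireNamespace("ape", quietly = TRUE)

if (have_ape) {
  library(ape)
  message("ape is installed and loaded")
} else {
  message("ape is not installed yet; install it before calling library(ape)")
}
```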
#The ape library is now available for you to use. ape is a library of datasets and tools that 
have been designed around phylogenetic analyses. We will quickly explore some of the 
data and functions in ape: 
> data(bird.orders) 
#The data function loads a dataset into R. Here we have loaded the bird orders dataset 
that is part of the ape library
Part four: Packages and libraries 
> plot(bird.orders) 
#The plot function detects that bird.orders is a special type of object – it is a 
‘phylo’ class of object. This type of object is a different object class from the vectors, 
matrices and data frames that we have been working with 
#The ape library has a special plot function for plotting ‘phylo’ objects. R used this special 
plot function in place of the normal plot function when we tried to plot 
bird.orders. 
#Don’t worry! All of this happened automatically because we loaded the ape library
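#For the curious, here is a toy sketch of how R picks a function based on the class of an object. It uses summary rather than plot so that no figure window is needed. The class name "gecko" and the object name tiny are invented for this example and have nothing to do with ape.

```r
# A special version of summary() for objects of the invented class "gecko"
summary.gecko <- function(object, ...) {
  cat("A gecko called", object$name, "\n")
  invisible(object)
}

# Make an object and label it with the class "gecko"
tiny <- structure(list(name = "Duvaucel"), class = "gecko")

# R sees the class and uses summary.gecko automatically
summary(tiny)
```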
Part four: Packages and libraries 
#Test: Use the ? (help) function for plot.phylo to learn how to plot the 
bird.orders dataset as a fan, as below 
> ?plot.phylo
Part four: Exercise 
#Download, install and load two packages: ggplot2 and labeling 
#Get the packages using Google ‘r ggplot2 cran’ and ‘r labeling cran’ or use the 
links below 
http://cran.r-project.org/web/packages/labeling/index.html 
http://cran.r-project.org/web/packages/ggplot2/index.html 
#Use the new data and functions provided by these packages to plot the density of 
diamonds against their weight (carat). 
> qplot(carat, data = diamonds, geom = "density", colour = color) 
> 
#For more information on ggplot see http://ggplot2.org/book/qplot.pdf
Part five: Scripts 
#One of the best features of R is the ability to automatically carry out many 
commands, one after another. For this type of operation we would first write all 
of our commands into a script, and then enter the entire script into R in one action 
#We are going to use previously scripted code for this section of the guide. Our script will 
generate, analyse and plot some data. 
#Go ahead and open this embedded text file by right clicking on it and clicking 
‘Packager Shell Object Object’ and then ‘Activate Contents’ 
#Copy the entire contents of this notepad document and paste it all into R 
#Now, read through the notepad document to find out what has taken place
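#Pasting is not the only way to run a script. A script saved as a file can be run in one step with the source function. The sketch below writes a three-line script to a temporary file and then runs it; the script contents and the object name script_path are made up for this example.

```r
# Save a small script to a temporary file
script_path <- tempfile(fileext = ".R")
writeLines(c("a <- 3",
             "b <- 4",
             "h <- sqrt(a^2 + b^2)"), script_path)

# Run every line of the script, one after another
source(script_path)

# Objects made by the script are now available
print(h)
```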
Part six: Logic (programming) 
#There are many functions in R that do more than just basic mathematical operations 
#We have seen one already, the which() function. This function looked through an 
object to find a particular value that we wanted. 
> which(IUCN[,2]=="AFRICA") 
#Here we will focus on loops, which we access using the for() function. 
#A loop is written as follows 
> for(i in 1:10){ 
+ } 
# for starts the loop 
# i is a value that will be updated as the loop iterates 
# 1 is the starting value for i 
# 10 is the final value for i 
#The curly brackets {} enclose the calculations that are looped
Part six: Logic (programming) 
#Make j = 1 
> j=1 
#We will use a loop to increase the value of j by i through ten iterations 
> for(i in 1:10){ 
+ j+i 
+ } 
#We don’t get to see what happens inside a loop unless we specifically ask for it 
> for(i in 1:10){ 
+ print(j+i) 
+ } 
[1] 2 
[1] 3 
[1] 4 
[1] 5 
[1] 6 
[1] 7 
[1] 8 
[1] 9 
[1] 10 
[1] 11 
>
Part six: Logic (programming) 
#What is the new value of j? j is still 1, because we did not store the changed value. 
> for(i in 1:10){ 
+ j=j+1 
+ } 
> j 
[1] 11 
#j is now equal to 11. How did that happen? 
> j=1 
> for(i in 1:10){ 
+ j=j+1 
+ print(j) 
+ } 
[1] 2 
[1] 3 
[1] 4 
[1] 5 
[1] 6 
[1] 7 
[1] 8 
[1] 9 
[1] 10 
[1] 11
Part six: Exercise 
#Make a vector of ten random numbers 
#Using a loop, add 100 to each number in the vector, in sequence. For example, in 
the first iteration of your loop you will add 100 to the first value of your vector, in 
the second iteration of your loop you will add 100 to the second value of your 
vector, and so on.
Part six: Exercise 
> x=rnorm(10) 
> x 
[1] -0.81673186 0.35409408 0.69619606 -2.04003445 -1.02832503 -0.31418186 
[7] 0.09717105 0.78778455 -0.15048025 1.86026573 
> 
> 
> for(i in 1:length(x)){ 
+ x[i]=x[i]+100 
+ } 
> x 
[1] 99.18327 100.35409 100.69620 97.95997 98.97167 99.68582 100.09717 
[8] 100.78778 99.84952 101.86027 
>
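#A loop is good practice here, but R can also add 100 to every value of a vector in a single step. The sketch below compares the two approaches; the object names looped and vectorised are made up for this example, and seq_along(looped) is another way of writing 1:length(looped).

```r
set.seed(1)
x <- rnorm(10)

# Approach one: a loop, as in the exercise
looped <- x
for (i in seq_along(looped)) {
  looped[i] <- looped[i] + 100
}

# Approach two: add 100 to the whole vector at once
vectorised <- x + 100

# Both approaches give the same answer
print(all.equal(looped, vectorised))
```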
How far does a Duvaucel's gecko travel after release? 
Department of Conservation 
Reference: 10039929 
Photograph by Chris Smuts-Kennedy 
Grid of monitored stations 
Methods: 
• Record the grid coordinates of the station 
where the gecko is released 
• Each day for three subsequent days 
measure the grid coordinates of the station 
where the gecko is found 
• Calculate the distance between recorded 
stations 
• 10 m by 10 m grid 
1 m 
1 m
#Step one: Set up the monitoring grid data for each day. 0 means that the gecko 
was not observed in that grid cell, 1 means that the gecko was observed in that 
grid cell. 
#Release day 
set.seed(1) 
d0=rep(0,100) 
d0[round(runif(1,min=0,max=100))]=1 
day.zero=matrix(d0,ncol=10,nrow=10) 
#Day one check 
set.seed(2) 
d1=rep(0,100) 
d1[round(runif(1,min=0,max=100))]=1 
day.one=matrix(d1,ncol=10,nrow=10) 
#Day two check 
set.seed(3) 
d2=rep(0,100) 
d2[round(runif(1,min=0,max=100))]=1 
day.two=matrix(d2,ncol=10,nrow=10) 
#Day three check 
set.seed(4) 
d3=rep(0,100) 
d3[round(runif(1,min=0,max=100))]=1 
day.three=matrix(d3,ncol=10,nrow=10)
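#One thing to watch in the code above: round(runif(1,min=0,max=100)) can occasionally return 0, and R ignores position 0, so no grid cell would be marked. The sketch below is an alternative that always picks a position from 1 to 100. It also wraps the four repeated steps into a function. The names make_day and demo_days are made up for this example, and the gecko positions it produces will differ from those produced by the code above.

```r
# Build one 10 by 10 grid with a single gecko observation
make_day <- function(seed) {
  set.seed(seed)
  cells <- rep(0, 100)
  cells[sample.int(100, 1)] <- 1   # always a position from 1 to 100
  matrix(cells, ncol = 10, nrow = 10)
}

# Release day and three daily checks, using seeds 1 to 4
demo_days <- lapply(1:4, make_day)

# Each grid should contain exactly one observation
print(sapply(demo_days, sum))
```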
#Step two: Combine all of the grid data into one list. This will help us quickly 
analyse the data as a single batch. 
days=list(day.zero,day.one,day.two,day.three) 
#Step three: Create a matrix where we will store the grid locations for the gecko 
location, and calculate the daily distance. 
movement=matrix(0,ncol=3,nrow=length(days)) 
colnames(movement)=c("Easting","Northing","Displacement (m)") 
#Step four: Find the grid cell for the location of the gecko on each day and store 
that information in the movement matrix. 
for(i in 1:length(days)){ 
movement[i,1]=which(days[[i]]==1, arr.ind=TRUE)[1] 
movement[i,2]=which(days[[i]]==1, arr.ind=TRUE)[2] 
}
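#To see what which() is doing in step four, it can help to try it on a small grid. This is an illustrative sketch only: the 3 by 3 grid g and the object loc are made up for the example. With arr.ind=TRUE, which() returns the row and column of each matching cell, so [1] in the loop above is the row (stored as "Easting") and [2] is the column (stored as "Northing").

```r
#A small 3 by 3 grid with the gecko in row 2, column 3
g=matrix(0,ncol=3,nrow=3)
g[2,3]=1

#which() with arr.ind=TRUE returns a one-row matrix with columns "row" and "col"
loc=which(g==1,arr.ind=TRUE)
loc

#Both coordinates can be read from the result of a single call
loc[1,"row"]
loc[1,"col"]
```

#Storing the result once in loc avoids calling which() twice for the same grid.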
#Step five: Calculate the distance that the gecko travelled each day. 
for(j in 2:length(days)){ 
movement[j,3]=sqrt(((abs(movement[j,1]-movement[j-1,1]))^2)+((abs(movement[j,2]-movement[j-1,2]))^2)) 
} 
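#The distance in step five is the hypotenuse of a right-angled triangle whose sides are the change in row and the change in column between two days. The sketch below shows the same calculation without a loop, using made-up positions; the object names pos, steps and dist.per.day are hypothetical and are not part of the original slides.

```r
#Made-up grid positions for four days (one row per day: row, column)
pos=matrix(c(1,1,
             4,5,
             4,5,
             7,9),ncol=2,byrow=TRUE)

#diff() on a matrix gives the change between consecutive rows
steps=diff(pos)

#Straight-line distance for each day, with 0 for the release day
dist.per.day=c(0,sqrt(rowSums(steps^2)))
dist.per.day
#[1] 0 5 0 5

#Total distance over the monitoring period
sum(dist.per.day)
#[1] 10
```

#Squaring already removes any minus sign, so abs() is not needed before ^2 in this form.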
#Step six: Plot the distance that the gecko travelled on each day. 
barplot(movement[,3],xlab="Day",ylab="Displacement (m)",main="Gecko distance")
Conclusion 
#By now you should have a good understanding of how to use R 
#We have covered all of the basic ways of interacting with R: 
- Storing data 
- Plotting data 
- Analysing data with functions 
- Loading new functions for data analysis 
#There is much more that you can do with R, though – your imagination is the limit! 
#You should think of this tutorial as a quick reference guide to help get you on your feet 
#You can also check out tutorial videos at 
illuminatingaotearoa.wordpress.com/zoostar

Introducing R

  • 2. What is and what can I do with it? R is freely available software for Windows, Mac OS and Linux To download R in New Zealand: http://cran.stat.auckland.ac.nz/ What is R? A very simple programming language A place for you to input data A collection of tools for you to perform calculations A tool for producing graphics A statistics suite that can be downloaded on to any PC, Mac or Linux system A software package that can run on high performance computing clusters
  • 3. What is and what can I do with it? With R you can: Perform simple or advanced statistical tests and analyses e.g. standard deviation, t-test, principal component analysis Read and manipulate data from existing files e.g. tables in Excel files, trees in nexus files, data on websites Write data or figures to files e.g. export a figure to .pdf, export a .csv file Produce simple or advanced figures
  • 4. What is and what can I do with it? http://dx.doi.org/10.1098/rspb.2014.0806 Figure 2. A reconstruction of the evolutionary history of carotenoid pigmentation in feathers. The likelihood that ancestors could display carotenoid feather pigments has been reconstructed using ‘hidden’ transition rates in three rate categories (AIC = 4002.5, 11 transition rates) [33]. The POEs (defined in Material and methods) for carotenoid feather pigmentation are identified by red circles. Branches are coloured according to the proportional likelihood of carotenoid-consistent colours at the preceding node. Solid purple points indicate species for which carotenoid feather pigments were confirmed present from chemical analysis; open black points represent those for which where carotenoids were not detected in feathers after chemical analysis. Supertree phylogeny from [21].
  • 5. Who is this guide for? Starting at ground level and shaping you into a confident R user Are you… Completely new to R? An infrequent R user who wants a refresher? The material in these slides may not be useful for confident R users. An Introduction to R W. N. Venables, D. M. Smith and the R Core Team http://cran.r-project.org/doc/manuals/R-intro.pdf
  • 6. What does this guide cover? Part zero: Getting started Interacting with R Part one: Objects Vectors, Matrices, Character arrays Part two: Data manipulation Analysing data, T-test Part three: External data Reading data into R, ANOVA Part four: Packages and libraries Installing new packages into R Part five: Scripts Using pre-written code Part six: Logic (programming) Other functions in R
  • 7. Starting This guide will demonstrate the R Console (command-line input) for R 3.02 running in Windows 7. For Mac OS, R can be executed from terminal. For Unix, seek professional help… The only point of difference should be the initial starting of R and the visual appearance: Console commands will be the same for all operating systems.
  • 8. Part zero: Getting started #Throughout this guide a hashtag (i.e. number sign ‘#’) will identify a comment or instruction #Start R by finding the R application on your computer #You will be presented with the R console
  • 9. Part zero: Getting started #There are a variety of ways of using R, and we will start out with the most basic #We are going to enter lines of code into R by typing or pasting them into the R console #At its most basic, R is just a calculator > 1+1 [1] 2 > 1*3 [1] 3 > 4-7 [1] -3 > 20/4 [1] 5 > #The lines above this have come from the R Console. Remember to remove the > symbol if you copy text directly from these slides and paste it into R
  • 10. Part zero: Getting started #Some more basic mathematical operations in R > 12--2 [1] 14 > 2^2 [1] 4 > sqrt(9) [1] 3 > 4*(1+2) [1] 12
  • 11. Part zero: Exercise #Use R to find the length of the hypotenuse in the triangle shown below #Side a has length 3, Side b has length 4, and the hypotenuse has length h h2=a2+b2 h= √(a2+b2) 3 4 h
  • 12. Part zero: Exercise #Use R to find the length of the hypotenuse in the triangle shown below > sqrt(3^2+4^2) [1] 5 3 4 h
  • 13. Part one: Objects #R is more than just a basic calculator… #Most operations in R will use objects, which are values stored in R #Type x=1 into the R console #You have now input a number into R by storing that number as an object. For this example, the name of our object is x #Objects must be named using letters alone, or letters followed by other symbols #Object names cannot include spaces > x=1 > #Congratulations, you have just programmed R to store an object. #Type x into the R console to recall your object > x [1] 1 >
  • 14. Part one: Objects #We will now replace the value of x with 10 > x [1] 1 > x=10 > x [1] 10 > #As you can see, the value of an object can be easily replaced by simply making the object equal to a new value
  • 15. Part one: Objects #Let’s make y into a vector - a one dimensional array #There are several ways of making a vector in R. These methods introduce functions. #A function is an operation performed on numbers and/or objects. #The two easiest ways of making a vector in R use different functions: #Use the concatenate function c and place numbers inside parentheses > y=c(10,11,12,13,14,15,16,17,18,19,20) > y [1] 10 11 12 13 14 15 16 17 18 19 20 #Use the array function and place numbers inside parentheses > y=array(10:20) > y [1] 10 11 12 13 14 15 16 17 18 19 20
  • 16. Part one: Objects #Just as we replaced x with a single value, we can also replace a single value within our vector #Let’s replace the fifth number in our vector with 0 > y [1] 10 11 12 13 14 15 16 17 18 19 20 > y[5]=0 > y [1] 10 11 12 13 0 15 16 17 18 19 20 > #Square brackets [] placed after a vector will instruct R that we are interested in only a part of the vector. In the example above, we are referring to the fifth position in the vector
  • 17. Part one: Objects #Try these vector manipulations as well: > y[1]=y[2] > y [1] 11 11 12 13 0 15 16 17 18 19 20 > #The value of the first position was changed to be the same as the value in the second position > y[c(1,3,5)]=5 > y [1] 5 11 5 13 5 15 16 17 18 19 20 > #The values in the first, third and fifth positions were made equal to 5
  • 18. Part one: Objects #Onward! We will make a new object, a two-dimensional matrix, and call it z #Our matrix will have ten rows and ten columns, and we will start out by filling all the cells with 0 > z=matrix(0,ncol=10,nrow=10) > z [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] 0 0 0 0 0 0 0 0 0 0 [2,] 0 0 0 0 0 0 0 0 0 0 [3,] 0 0 0 0 0 0 0 0 0 0 [4,] 0 0 0 0 0 0 0 0 0 0 [5,] 0 0 0 0 0 0 0 0 0 0 [6,] 0 0 0 0 0 0 0 0 0 0 [7,] 0 0 0 0 0 0 0 0 0 0 [8,] 0 0 0 0 0 0 0 0 0 0 [9,] 0 0 0 0 0 0 0 0 0 0 [10,] 0 0 0 0 0 0 0 0 0 0 >
  • 19. Part one: Objects #We can replace parts of our matrix, like we did with our vector > z [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] 0 0 0 0 0 0 0 0 0 0 [2,] 0 0 0 0 0 0 0 0 0 0 [3,] 0 0 0 0 0 0 0 0 0 0 [4,] 0 0 0 0 0 0 0 0 0 0 [5,] 0 0 0 0 0 0 0 0 0 0 [6,] 0 0 0 0 0 0 0 0 0 0 [7,] 0 0 0 0 0 0 0 0 0 0 [8,] 0 0 0 0 0 0 0 0 0 0 [9,] 0 0 0 0 0 0 0 0 0 0 [10,] 0 0 0 0 0 0 0 0 0 0 > z[1,3]=33 > z [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] 0 0 33 0 0 0 0 0 0 0 [2,] 0 0 0 0 0 0 0 0 0 0 [3,] 0 0 0 0 0 0 0 0 0 0 [4,] 0 0 0 0 0 0 0 0 0 0 [5,] 0 0 0 0 0 0 0 0 0 0 [6,] 0 0 0 0 0 0 0 0 0 0 [7,] 0 0 0 0 0 0 0 0 0 0 [8,] 0 0 0 0 0 0 0 0 0 0 [9,] 0 0 0 0 0 0 0 0 0 0 [10,] 0 0 0 0 0 0 0 0 0 0 #Here, the two numbers inside the square brackets are a coordinate for the matrix: first row, third column
  • 20. Part one: Objects #We can replace an entire row by not providing a column coordinate > z[1,]=33 > z [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] 33 33 33 33 33 33 33 33 33 33 [2,] 0 0 0 0 0 0 0 0 0 0 [3,] 0 0 0 0 0 0 0 0 0 0 [4,] 0 0 0 0 0 0 0 0 0 0 [5,] 0 0 0 0 0 0 0 0 0 0 [6,] 0 0 0 0 0 0 0 0 0 0 [7,] 0 0 0 0 0 0 0 0 0 0 [8,] 0 0 0 0 0 0 0 0 0 0 [9,] 0 0 0 0 0 0 0 0 0 0 [10,] 0 0 0 0 0 0 0 0 0 0 > #Likewise, we can replace an entire column > z[,3]=c(1:10) > z [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] 33 33 1 33 33 33 33 33 33 33 [2,] 0 0 2 0 0 0 0 0 0 0 [3,] 0 0 3 0 0 0 0 0 0 0 [4,] 0 0 4 0 0 0 0 0 0 0 [5,] 0 0 5 0 0 0 0 0 0 0 [6,] 0 0 6 0 0 0 0 0 0 0 [7,] 0 0 7 0 0 0 0 0 0 0 [8,] 0 0 8 0 0 0 0 0 0 0 [9,] 0 0 9 0 0 0 0 0 0 0 [10,] 0 0 10 0 0 0 0 0 0 0 >
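The row and column coordinates work the same way on a matrix of any size. A small sketch, assuming a hypothetical 3 by 3 matrix called m so that the output is easier to read:

```r
# m is a hypothetical 3 x 3 matrix used only for this illustration
m = matrix(0, ncol = 3, nrow = 3)
m[2, ] = 7          # replace the whole second row
m[, 3] = c(1:3)     # replace the whole third column
dim(m)              # number of rows, then number of columns: 3 3
m[2, 3]             # single cell at row 2, column 3: 2
m[2, ]              # the second row is now 7 7 2
```

Note that the column replacement happened last, so it overwrote the 7 that the row replacement had placed in row 2, column 3.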
  • 21. Part one: Objects #Lastly, we will make a character array, which is like a vector or a matrix except that it can hold numbers and letters > w=matrix("df",ncol=10,nrow=10) > w [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" [2,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" [3,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" [4,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" [5,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" [6,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" [7,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" [8,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" [9,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" [10,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" > #So, this covers the basics of creating objects for storing data in R.
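One caveat worth knowing: a matrix stores a single type of value, so any numbers placed in a character matrix are stored as text. A minimal sketch, assuming a hypothetical object called mixed:

```r
# mixed is a hypothetical object used only for this illustration
mixed = matrix(c(1, "a", 2, "b"), ncol = 2, nrow = 2)
mixed                         # every cell is shown in quotes, including "1" and "2"
class(mixed[1, 1])            # "character"
as.numeric(mixed[1, 1]) + 1   # convert back to a number before doing arithmetic: 2
```

This is why arithmetic on a character matrix fails unless the values are first converted with as.numeric().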
  • 22. Part one: Objects #Let’s clean out the objects that we made in Part One > ls() [1] "w" "x" "y" "z" > #The list objects command ls() will show us which objects are stored in R #We can permanently remove a specific object with the rm() function > rm(x) > ls() [1] "w" "y" "z" > #We can also remove all objects > rm(list = ls()) > ls() character(0) > #character(0) means that no objects are left
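If you only want to know whether one particular object is stored, the exists() function gives a yes or no answer. A short sketch, assuming a hypothetical object called scratch_value:

```r
# scratch_value is a hypothetical object used only for this illustration
scratch_value = 5
exists("scratch_value")   # TRUE: the object is stored in R
rm(scratch_value)
exists("scratch_value")   # FALSE: the object has been removed
```

The object name is given to exists() in quotes, because we are asking about the name rather than the value.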
  • 23. Part one: Exercise #Make a new matrix object with three columns and seven rows, and fill every cell with the number 9. Use your first name as the name of the matrix object. #Make a new vector object with the numbers 101, 898 and -3. Use your surname as the name of the vector object. #Replace the fourth row of your matrix with your vector.
  • 24. Part one: Exercise #Make a new matrix object with three columns, seven rows, and fill every cell with the number 9. Use your first name as the name of the matrix object. > daniel=matrix(9,ncol=3,nrow=7) > daniel [,1] [,2] [,3] [1,] 9 9 9 [2,] 9 9 9 [3,] 9 9 9 [4,] 9 9 9 [5,] 9 9 9 [6,] 9 9 9 [7,] 9 9 9 #Make a new vector object with the numbers 101, 898 and -3. Use your surname as the name of the vector object. > thomas=c(101,898,-3) > thomas [1] 101 898 -3 #Replace the fourth row of your matrix with your vector. > daniel[4,]=thomas > daniel [,1] [,2] [,3] [1,] 9 9 9 [2,] 9 9 9 [3,] 9 9 9 [4,] 101 898 -3 [5,] 9 9 9 [6,] 9 9 9 [7,] 9 9 9
  • 25. HELP! #You can call on the help function if you become lost or stuck when using R #Can’t remember how to make a matrix? > ?matrix >
  • 26. Part two: Data manipulation #This will be a worked example for a Student’s T-test for the means of two samples, showcasing the storage and analysis of data in R
  • 27. Part two: Data manipulation #Make x a vector containing 1000 random numbers > set.seed(1) > x=rnorm(1000) #Make y a vector containing 1000 random numbers > set.seed(100) > y=rnorm(1000) #The random numbers in R are not truly random. They are generated by an algorithm, and the numbers it produces have many characteristics of random data. Using the set.seed function, we can fix the starting point of that algorithm and so define a set of ‘random’ numbers for use in our calculations. This means that we should all get the same results from our ‘random’ numbers #We will use Student’s T-test to see if the mean of x and the mean of y are significantly different
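The effect of set.seed() can be checked directly. A minimal sketch, using hypothetical object names first, second and third:

```r
set.seed(1)
first = rnorm(5)
set.seed(1)                  # reset the generator to the same starting point
second = rnorm(5)
identical(first, second)     # TRUE: the same seed gives the same numbers
third = rnorm(5)             # drawn without resetting the seed
identical(first, third)      # FALSE: the generator has moved on
```

This is why the seed is set immediately before each call to rnorm() in this guide.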
  • 28. Part two: Data manipulation #What are the assumptions for a T-test? #1) That the two samples (x and y) are each normally distributed #2) That the two samples have the same variance #3) That the two samples are independent #These are calculated data so we will assume that 3) is true. #We should test 1) and 2) if we want our T-test results to be meaningful!
  • 29. Part two: Data manipulation #We will use the Shapiro-Wilk test [1] to see if the data are normally distributed #The Shapiro-Wilk test calculates a normality statistic (W) and tests the null hypothesis that the data are normally distributed #We would reject the null hypothesis for our sample if we received a p-value of <0.05 #To perform a Shapiro-Wilk test in R we use the shapiro.test function > shapiro.test(x) Shapiro-Wilk normality test data: x W = 0.9988, p-value = 0.7256 > > shapiro.test(y) Shapiro-Wilk normality test data: y W = 0.9993, p-value = 0.9765 [1] Shapiro SS & Wilk MB. 1965. An analysis of variance test for normality (complete samples). Biometrika 52: 591–611
  • 30. Part two: Data manipulation #We will use an F-test [1] to see if x and y have equal variances #The null hypothesis of this F-test is that the two datasets have equal variances, and this hypothesis is rejected if the p-value is <0.05 #We calculate an F-test for equal variances in R using the var.test function > var.test(x,y) F test to compare two variances data: x and y F = 1.0084, num df = 999, denom df = 999, p-value = 0.8947 alternative hypothesis: true ratio of variances is not equal to 1 95 percent confidence interval: 0.890733 1.141648 sample estimates: ratio of variances 1.008417 [1] Box, G.E.P. (1953). "Non-Normality and Tests on Variances". Biometrika 40 (3/4): 318–335.
  • 31. Part two: Data manipulation #Let’s perform the Student’s T-test and see if the mean of x and the mean of y are significantly different #We will use a simple form of the t.test function. This test requires three pieces of information: x, y, and information about equal variance > t.test(x,y,var.equal=TRUE) Two Sample t-test data: x and y t = -0.6161, df = 1998, p-value = 0.5379 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -0.11903134 0.06212487 sample estimates: mean of x mean of y -0.01164814 0.01680509 #The null hypothesis for this test is that x and y have the same mean value. The confidence level was set at 0.95 (a significance level of 0.05), so the rejection criterion would be a p-value less than 0.05. Did we reject the null hypothesis?
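The printed output is convenient to read, but the result of a test can also be stored as an object so that individual values can be pulled out. A sketch under the assumption that we call the stored result t_result (a hypothetical name):

```r
set.seed(1)
x = rnorm(1000)
set.seed(100)
y = rnorm(1000)
t_result = t.test(x, y, var.equal = TRUE)   # store the test instead of printing it
names(t_result)            # lists the pieces stored in the result
t_result$p.value           # the p-value on its own
t_result$p.value < 0.05    # TRUE would mean that we reject the null hypothesis
```

Storing the result is useful later on, for example when many tests are run inside a loop.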
  • 32. Part two: Exercise #Generate vector objects a and b as below > set.seed(10) > a=rnorm(1000,sd=2) > set.seed(50) > b=rnorm(1000,sd=1) #Is the mean of a significantly different from the mean of b? Is it appropriate to use a Student’s T-test to address this question?
  • 33. Part two: Exercise > shapiro.test(a) Shapiro-Wilk normality test data: a W = 0.9979, p-value = 0.2538 > shapiro.test(b) Shapiro-Wilk normality test data: b W = 0.9978, p-value = 0.2242 > var.test(a,b) F test to compare two variances data: a and b F = 3.7431, num df = 999, denom df = 999, p-value < 2.2e-16 alternative hypothesis: true ratio of variances is not equal to 1 95 percent confidence interval: 3.306307 4.237678 sample estimates: ratio of variances 3.743136 > t.test(a,b,var.equal=F) Welch Two Sample t-test data: a and b t = 0.3949, df = 1497.218, p-value = 0.693 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -0.1106290 0.1663946 sample estimates: mean of x mean of y 0.022749483 -0.005133326 > #Is the mean of a significantly different from the mean of b? p-value = 0.693, so we fail to reject the null hypothesis that the means are equal. #The F-test rejected equal variances, so a Student’s T-test is not appropriate here. Setting var.equal=F makes R use Welch’s T-test instead, which does not assume equal variances.
  • 34. Part three: External data #Datasets can often be too large to type into R. This section of the guide will show you how to automatically read data into R and then perform an analysis #For this test we will perform a one-way analysis of variance (ANOVA) #Right click on the dataset embedded above the arrow, move the mouse to ‘Macro-Enabled Worksheet Object’, click Open, and then save the table as IUCN.csv (a comma separated values file) to a folder on your computer #The dataset contains a count of endangered species for sixty randomly selected countries in three different regions. These data have been extracted from Table 6a of the IUCN Red List summary statistics: http://www.iucnredlist.org/documents/summarystatistics/2010_3RL_Stats_Table_6a.pdf
  • 35. Part three: External data #We are going to use a one-way ANOVA to see if the mean number of endangered species is different in different regions (AFRICA, ASIA and EUROPE). #First step: we will now tell R where to look for the file, using the setwd() function > setwd("H:/Projects/Teaching/R") #Hint: your working directory will be different to mine #Note: we use forward slashes / and not backslashes #Second step: we read the file into R as a new object called IUCN. The term sep="," is used because values in the dataset are separated by commas. The term header=T is used because the first row of the IUCN table contains column names > IUCN=read.table("IUCN.csv",sep=",",header=T) #Alternatively, if we know the full file path, then we could read the file into R without using setwd() > IUCN=read.table("H:/Projects/Teaching/R/IUCN.csv",sep=",",header=T)
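After reading a file it is worth checking that R has understood it as intended. The sketch below writes a tiny made-up file to a temporary location and reads it back. The file contents, the first column name Country and the object names example_file and example are assumptions for illustration only; they are not the real IUCN data.

```r
# The rows below are invented for illustration; they are not the real IUCN data
example_file = tempfile(fileext = ".csv")
writeLines(c("Country,Region,Endangered_species",
             "A,AFRICA,12",
             "B,ASIA,7",
             "C,EUROPE,3"), example_file)
example = read.table(example_file, sep = ",", header = TRUE)
head(example)    # show the first rows of the table
str(example)     # show the type of data stored in each column
dim(example)     # number of rows, then number of columns: 3 3
```

Writing header=TRUE in full is a slightly safer habit than header=T, because T is an ordinary object name that can be overwritten by accident.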
  • 36. Part three: External data #What are the assumptions for a one-way ANOVA? #1) That the data in each group have been randomly selected from a normal distribution #2) That each group of data have the same variance #3) That each group of data is independent #Assumption 3) may be unlikely but we will assume it is true. #We should test 1) and 2) if we want our ANOVA results to be meaningful!
  • 37. Part three: External data #We will use the Shapiro-Wilk test to see if the data from each region (AFRICA, ASIA and EUROPE) are normally distributed #First though, we will separate out the data for each region so that we can test for normality separately > af=IUCN[which(IUCN[,2]=="AFRICA"),3] #Let’s take a closer look: IUCN[,2] calls up the second column of the IUCN object #The which() function is asking ‘which of the values in column 2 of the IUCN object are equal to "AFRICA"?’ which(IUCN[,2]=="AFRICA"). This gives us the row numbers of the Africa entries. #Now we can use the Africa row numbers to find the number of Endangered species for each African country. These species counts are stored in column 3 of the IUCN object. IUCN[which(IUCN[,2]=="AFRICA"),3] #Now we store the endangered species counts for African countries as the af object af=IUCN[which(IUCN[,2]=="AFRICA"),3]
  • 38. Part three: External data #Repeat for ASIA and EUROPE > ai=IUCN[which(IUCN[,2]=="ASIA"),3] > eu=IUCN[which(IUCN[,2]=="EUROPE"),3]
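The same selection can be written with column names instead of column numbers, which some people find easier to read. A sketch using a made-up table called toy as a stand-in for the IUCN object (the values and the Country column name are assumptions for illustration):

```r
# toy is a hypothetical stand-in for the IUCN table; the values are invented
toy = data.frame(Country = c("A", "B", "C", "D"),
                 Region = c("AFRICA", "ASIA", "AFRICA", "EUROPE"),
                 Endangered_species = c(12, 7, 9, 3))
which(toy[, 2] == "AFRICA")                      # row numbers of the African countries: 1 3
toy[which(toy[, 2] == "AFRICA"), 3]              # their species counts: 12 9
toy$Endangered_species[toy$Region == "AFRICA"]   # same result using column names
split(toy$Endangered_species, toy$Region)        # all regions at once, as a list
```

The split() function is a shortcut when every group is needed, because it avoids repeating the which() line once per region.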
  • 39. Part three: External data #We will use a Bartlett Test of Homogeneity of Variances [1] to test if variance is equal across our three groups (AFRICA, ASIA, EUROPE). #The function for the Bartlett test is simply bartlett.test(). Note the lower case b: R is case sensitive. The terms for this function will be the Endangered species column of the IUCN object, and the Region column of the IUCN object. Column 3 and column 2 respectively. #A Bartlett test operates similarly to an F-test. The null hypothesis for this Bartlett test is that the groups have equal variances. #We would reject the null hypothesis for our dataset if we received a p-value of <0.05. > bartlett.test(IUCN[,3]~IUCN[,2]) Bartlett test of homogeneity of variances data: IUCN[, 3] by IUCN[, 2] Bartlett's K-squared = 11.6261, df = 2, p-value = 0.002988 [1] Box, G.E.P. (1953). "Non-Normality and Tests on Variances". Biometrika 40 (3/4): 318–335.
  • 40. Part three: External data #Here we reject the null hypothesis: at least one Region has a variance that is not equal to the variance of another Region in the dataset. #Our dataset does not satisfy the second assumption of the ANOVA. We can still proceed however. #The ANOVA test is robust to violations of this second assumption. This means that it can still produce meaningful results even if the groups do not have equal variances. As a rule of thumb, we can proceed if the maximum variance of our groups is less than 4 times the minimum variance of our groups. > var(af) [1] 25.07692 > var(ai) [1] 9.002849 > var(eu) [1] 7.464387 > #The variance of the number of endangered species in Africa is substantially greater than the other two variance values. However, the Africa group variance is less than 4 times the variance of the Europe group > var(af)<4*var(eu) [1] TRUE #So, we will proceed, but we need to be aware that with unequal variances it will be tougher for an analysis of variance to find a significant result.
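The rule of thumb compares the largest group variance with the smallest, and that comparison can be computed for all groups in one step. A minimal sketch with made-up counts (the numbers and the object names counts, region, group_var and ratio are assumptions for illustration, not the IUCN data):

```r
# The counts below are invented for illustration; they are not the IUCN data
counts = c(2, 9, 15, 21, 4, 6, 9, 11, 3, 5, 6, 8)
region = rep(c("AFRICA", "ASIA", "EUROPE"), each = 4)
group_var = tapply(counts, region, var)   # one variance per region
group_var
ratio = max(group_var) / min(group_var)   # largest variance divided by smallest
ratio
ratio < 4                                 # TRUE means the rule of thumb is met
```

The tapply() function applies var() to the counts of each region separately, so no region needs to be separated out by hand.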
  • 41. Part three: External data #Perform the one-way ANOVA using the aov() function with the following syntax, and store the results as an object called IUCN_ANOVA > IUCN_ANOVA=aov(Endangered_species~Region,data=IUCN) #You can see the ANOVA results by calling up the IUCN_ANOVA object > IUCN_ANOVA Call: aov(formula = Endangered_species ~ Region, data = IUCN) Terms: Region Residuals Sum of Squares 703.284 1080.148 Deg. of Freedom 2 78 Residual standard error: 3.721297 Estimated effects may be unbalanced >
  • 42. Part three: External data #Use the summary() function to find out more about the ANOVA > summary(IUCN_ANOVA) Df Sum Sq Mean Sq F value Pr(>F) Region 2 703.3 351.6 25.39 3.21e-09 *** Residuals 78 1080.1 13.8 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > #Interpretation: How do we read this table to find out if the mean number of endangered species is different in different regions? #The null hypothesis for this test is that the mean number of endangered species is the same in each region. We would reject this null hypothesis if the p-value (i.e. Pr(>F)) is less than the significance level for this test (i.e. <0.05). So, we reject the null hypothesis, and conclude that the mean number of endangered species is significantly different between regions.
  • 43. Part three: External data #Is the number of endangered species different between all regions, or just different for one region? To find out we will use Tukey’s Honest Significant Difference test. #The function for Tukey’s HSD is simply TukeyHSD(). The test uses the following syntax > TukeyHSD(IUCN_ANOVA,"Region") Tukey multiple comparisons of means 95% family-wise confidence level Fit: aov(formula = Endangered_species ~ Region, data = IUCN) $Region diff lwr upr p adj ASIA-AFRICA -4.185185 -6.605050 -1.7653208 0.0002620 EUROPE-AFRICA -7.185185 -9.605050 -4.7653208 0.0000000 EUROPE-ASIA -3.000000 -5.419864 -0.5801356 0.0111684 #Tukey’s HSD provides a pairwise test of each group in the ANOVA. Any Region pair with a p adj value <0.05 had a significantly different mean number of endangered species.
  • 44. Part three: External data #Bonus: Let’s plot our IUCN data to better visualise these results > boxplot(Endangered_species~Region,data=IUCN) [Figure: boxplot of Endangered_species by Region (AFRICA, ASIA, EUROPE), y-axis from 5 to 25. Each box shows, from top to bottom: maximum (excl. outliers), upper quartile, median, lower quartile, minimum (excl. outliers). Outliers are drawn as separate points.]
  • 45. Part three: Exercise #Plotting basics #To quickly generate a plot in R using only default options, simply use the plot() function. > plot(af) > #There are many variables that you can change to improve the look of your plots > plot(af,xlab="Country",main="Africa",col=rainbow(100),pch=16,ylab="Endangered species (number)",cex=2,font=6) > barplot(af,col="red",names.arg=IUCN[which(IUCN[,2]=="AFRICA"),1],las=2,ylab="Endangered species (count)",main="Africa") #Use ?plot and ?barplot to learn about the variables you can change when plotting data
  • 46. Part four: Packages and libraries #You have been using some of the basic functions that are packaged with R, and you have been either generating or importing datasets #Anyone can write a new function in R though, or make a dataset, and these functions and datasets can be bundled together into a package #R is modular, which means you can download and install new packages to give you access to new functions and/or datasets #There is an automatic and a manual method for installing packages. This guide will teach you how to manually install packages in R #Why the manual method you ask? Because R requires internet access to download packages, which can be complicated by a University proxy. I can’t guarantee that the proxy won’t be an issue. That’s why. Well that, and it will be good for you.
  • 47. Part four: Packages and libraries #This will be an exercise in downloading the ‘Analyses of Phylogenetics and Evolution’ package, first written by Emmanuel Paradis in 2008 #The abbreviation for this package is ape
  • 48. Part four: Packages and libraries #Open a web browser and enter http://cran.r-project.org/web/packages/ape/index.html into the address bar – go to the website. The page should be mostly black text on a white background. #Find the Downloads section towards the bottom of the website. #For mac users: download the Mac OS X binary (ape_3.1-4.tgz) #For PC users: download the Windows binary (ape_3.1-4.zip) #For UNIX users: again, seek professional help #Save the ape_3.1-4.xxx file somewhere on your computer that you can easily find #Note to future users: the file name may be slightly different if Paradis has updated ape
  • 49. Part four: Packages and libraries #Run R #Use the install.packages function with the following syntax to install the ape package > install.packages("H:/Teaching/ape_3.1-4.zip") #Remember to replace my file path “H:/Teaching/” with the file path of the folder where you downloaded the ape package #You should see text like this appear after you enter the install.packages command Installing package into ‘C:/Documents/R/win-library/3.1’ (as ‘lib’ is unspecified) inferring 'repos = NULL' from the file name package ‘ape’ successfully unpacked and MD5 sums checked #Congratulations, you have now added functions and datasets written by Emmanuel Paradis to your own copy of R
  • 50. Part four: Packages and libraries #You only need to install a package into R once. The package is now available as a ‘library’. If you want to use the ape library in your current R session, then you need to load the library into R > library(ape) > #So, you install a package once, and load a library many times (every time you run R and want to use the library) #The ape library is now available for you to use. Ape is a library of datasets and tools that have been designed around phylogenetic analyses. We will quickly explore some of the data and functions in ape: > data(bird.orders) #The data function loads a dataset into R. Here we have loaded the bird orders dataset that is part of the ape library
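If you are not sure whether a package has already been installed, R can check before you try to load it. A minimal sketch using the base R function requireNamespace(); the message text is only an example:

```r
# requireNamespace() returns TRUE if the package is installed and FALSE if it is not
if (requireNamespace("ape", quietly = TRUE)) {
  library(ape)
  data(bird.orders)
  class(bird.orders)   # shows what type of object the dataset is
} else {
  message("ape is not installed yet; install it before calling library(ape)")
}
```

Calling library(ape) for a package that is not installed stops with an error, so a check like this is handy at the top of a script that other people will run.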
  • 51. Part four: Packages and libraries > plot(bird.orders) #The plot function detects that bird.orders is a special type of object: it is a ‘phylo’ class of object. This type of object is a different object class from the vectors, matrices and data frames that we have been working with #The ape library has a special plot function for plotting ‘phylo’ objects. This special plot function was used in place of the normal plot function when we plotted bird.orders. #Don’t worry! All of this happened automatically because we loaded the ape library
  • 52. Part four: Packages and libraries #Test: Use the ? (help) function for plot.phylo to learn how to plot the bird.orders dataset as a fan, as below > ?plot.phylo
  • 53. Part four: Exercise #Download, install and load two packages: ggplot2 and labeling #Get the packages using Google ‘r ggplot2 cran’ and ‘r labeling cran’ or use the links below http://cran.r-project.org/web/packages/labeling/index.html http://cran.r-project.org/web/packages/ggplot2/index.html #Use the new data and functions provided by these packages to plot the density of diamonds against their weight (carat). > qplot(carat, data = diamonds, geom = "density", colour = color) > #For more information on ggplot see http://ggplot2.org/book/qplot.pdf
  • 54. Part five: Scripts #One of the best features of R is the ability to automatically carry out many commands, one after another. For this type of operation we would first write all of our commands into a script, and then enter the entire script into R in one action #We are going to use previously scripted code for this section of the guide. Our script will generate, analyse and plot some data. #Go ahead and open this embedded text file by right clicking on it and clicking ‘Packager Shell Object Object’, then ‘Activate Contents’ #Copy the entire contents of this notepad document and paste it all into R #Now, read through the notepad document to find out what has taken place
  • 55. Part six: Logic (programming) #There are many functions in R that do more than just basic mathematical operations #We have seen one already, the which() function. This function looked through an object to find a particular value that we wanted. > which(IUCN[,2]=="AFRICA") #Here we will focus on loops, which we access using the for() function. #A loop is written as follows > for(i in 1:10){ } # for starts the loop # i is a value that will be updated as the loop iterates # 1 is the starting value for i # 10 is the final value for i #The curly brackets {} enclose the calculations that are looped
  • 56. Part six: Logic (programming) #Make j = 1 > j=1 #We will use a loop to increase the value of j by i through ten iterations > for(i in 1:10){ j+i } #We don’t get to see what happens inside a loop unless we specifically ask for it > for(i in 1:10){ + print(j+i) + } [1] 2 [1] 3 [1] 4 [1] 5 [1] 6 [1] 7 [1] 8 [1] 9 [1] 10 [1] 11 >
  • 57. Part six: Logic (programming) #What is the new value of j? j is still 1, because we did not store the changed value. > for(i in 1:10){ + j=j+1 + } > j [1] 11 #j is now equal to 11. How did that happen? > j=1 > for(i in 1:10){ + j=j+1 + print(j) + } [1] 2 [1] 3 [1] 4 [1] 5 [1] 6 [1] 7 [1] 8 [1] 9 [1] 10 [1] 11
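The same pattern of storing the updated value on every iteration can be used to build up a running total. A minimal sketch, assuming a hypothetical object called total:

```r
total = 0
for (i in 1:10) {
  total = total + i   # store the updated value on every iteration
}
total        # 55
sum(1:10)    # 55: the built-in function gives the same answer
```

Here the value added changes with i on each pass through the loop, which is the difference from adding a constant 1 each time.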
  • 58. Part six: Exercise #Make a vector of ten random numbers #Using a loop, add 100 to each number in the vector, in sequence. For example, in the first iteration of your loop you will add 100 to the first value of your vector, in the second iteration of your loop you will add 100 to the second value of your vector, and so on.
  • 59. Part six: Exercise > x=rnorm(10) > x [1] -0.81673186 0.35409408 0.69619606 -2.04003445 -1.02832503 -0.31418186 [7] 0.09717105 0.78778455 -0.15048025 1.86026573 > > > for(i in 1:length(x)){ + x[i]=x[i]+100 + } > x [1] 99.18327 100.35409 100.69620 97.95997 98.97167 99.68582 100.09717 [8] 100.78778 99.84952 101.86027 >
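The loop is good practice, but for simple arithmetic R can work on a whole vector in one step. A sketch comparing the two approaches; the seed and the object names looped and vectorised are arbitrary choices for illustration:

```r
set.seed(1)                  # seed chosen arbitrarily for this illustration
x = rnorm(10)
looped = x
for (i in 1:length(looped)) {
  looped[i] = looped[i] + 100
}
vectorised = x + 100         # R adds 100 to every element in one step
all.equal(looped, vectorised)   # TRUE: both approaches give the same values
```

Loops become necessary when each step depends on the result of the previous one, as in the gecko example later in this guide.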
  • 60. How far does a Duvaucel's gecko travel after release? Department of Conservation Reference: 10039929 Photograph by Chris Smuts-Kennedy Grid of monitored stations Methods: • Record the grid coordinates of the station where the gecko is released • Each day for three subsequent days measure the grid coordinates of the station where the gecko is found • Calculate the distance between recorded stations • 10 m by 10 m grid 1 m 1 m
  • 61. #Step one: Set up the monitoring grid data for each day. 0 means that the gecko was not observed in that grid cell, 1 means that the gecko was observed in that grid cell. #Release day set.seed(1) d0=rep(0,100) d0[round(runif(1,min=0,max=100))]=1 day.zero=matrix(d0,ncol=10,nrow=10) #Day one check set.seed(2) d1=rep(0,100) d1[round(runif(1,min=0,max=100))]=1 day.one=matrix(d1,ncol=10,nrow=10) #Day two check set.seed(3) d2=rep(0,100) d2[round(runif(1,min=0,max=100))]=1 day.two=matrix(d2,ncol=10,nrow=10) #Day three check set.seed(4) d3=rep(0,100) d3[round(runif(1,min=0,max=100))]=1 day.three=matrix(d3,ncol=10,nrow=10)
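One caveat with round(runif(1,min=0,max=100)) is that it can occasionally return 0, which is not a valid position in a vector, so no cell would be marked. A sketch of an alternative that always picks a whole number from 1 to 100, using sample(). Note that this is a swapped-in technique and will not reproduce the same cells as the seeds used in this guide; the object names cell and grid are hypothetical:

```r
set.seed(1)
cell = sample(1:100, 1)      # one whole number between 1 and 100
d0 = rep(0, 100)
d0[cell] = 1
sum(d0)                      # 1: exactly one cell is marked
grid = matrix(d0, ncol = 10, nrow = 10)
which(grid == 1, arr.ind = TRUE)   # row and column of the marked cell
```

Checking sum(d0) is a quick way to confirm that the gecko was placed in exactly one grid cell.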
  • 62. #Step two: Combine all of the grid data into one list. This will help us quickly analyse the data as a single batch. days=list(day.zero,day.one,day.two,day.three) #Step three: Create a matrix where we will store the grid locations for the gecko location, and calculate the daily distance. movement=matrix(0,ncol=3,nrow=length(days)) colnames(movement)=c("Easting","Northing","Displacement (m)") #Step four: Find the grid cell for the location of the gecko on each day and store that information in the movement matrix. for(i in 1:length(days)){ movement[i,1]=which(days[[i]]==1, arr.ind=TRUE)[1] movement[i,2]=which(days[[i]]==1, arr.ind=TRUE)[2] }
  • 63. #Step five: Calculate the distance that the gecko travelled each day. for(j in 2:length(days)){ movement[j,3]=sqrt(((abs(movement[j,1]-movement[j-1,1]))^2)+((abs(movement[j,2]-movement[j-1,2]))^2)) } #Step six: Plot the distance between each station where the gecko was found on each subsequent day. barplot(movement[,3],xlab="Day",ylab="Displacement (m)",main="Gecko distance")
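The distance in step five is Pythagoras' theorem: the square root of the squared change in row plus the squared change in column. A minimal sketch with made-up grid positions (the values and the object names pos, step and distance are assumptions for illustration), using diff() instead of a loop:

```r
# Hypothetical grid positions (row, column) on four consecutive days
pos = matrix(c(1, 1,
               7, 9,
               7, 9,
               4, 5), ncol = 2, byrow = TRUE)
step = diff(pos)                   # change in each coordinate between days
distance = sqrt(rowSums(step^2))   # Pythagoras: sqrt(dx^2 + dy^2)
distance                           # 10 0 5
```

Because each difference is squared, its sign does not matter, so the abs() calls in step five are harmless but not strictly needed.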
  • 64. Conclusion #By now you should have a good understanding of how to use R #We have covered all of the basic ways of interacting with R: - Storing data - Plotting data - Analysing data with functions - Loading new functions for data analysis #There is so much further you can take this though – your imagination is the limit! #You should think of this tutorial as a quick reference guide to help get you on your feet #You can also check out tutorial videos at illuminatingaotearoa.wordpress.com/zoostar

Editor's Notes

  1. http://cran.r-project.org/doc/manuals/R-intro.pdf
  2. http://cran.r-project.org/doc/manuals/R-intro.pdf
  3. points(c(0,0),c(Afr.up,Afr.down),type="l") points(c(1,1),c(Afr.up,Afr.down),type="l")
  4. Open embedded notepad file