Access, select & ordering
Day 1 - Introduction to R for Life Sciences
Accessing vectors, matrices, data.frames
Positions within vectors, matrices and data.frames are accessed
using [ ]:
> v <- c(10, 3, 5, 10)
> v[2]
[1] 3
[ ] can also be used to assign (write) new values, e.g. v[2] <- 10
( ) are used for function calls (and for grouping expressions, more on that later),
for instance: myvector <- c( ), mymatrix <- matrix( ), mydata <- data.frame( )
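A minimal sketch of reading and writing with [ ], reusing the vector v from above. The replacement values 0 and 7 are arbitrary and only there for illustration.

```r
v <- c(10, 3, 5, 10)

second <- v[2]          # read: the element at position 2
v[2] <- 10              # write: overwrite position 2
v[c(1, 3)] <- c(0, 7)   # write several positions at once (arbitrary values)

print(second)
print(v)
```

Note the division of labour: c( ) is a function call that builds the index, [ ] applies it.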
Three ways to access values from vectors, matrices
and data.frames
Integers: specify the positions of the elements you mean
Logical: specify (using TRUE/FALSE) which elements you want
Character: specify their names
(only possible if your vector/matrix/data.frame has names, preferably unique ones!)
All these selections are made with vectors.
These selection vectors are sometimes called indexes (or indices).
Examples:
chromlength <- c(230218, 813184, 316620, 1531933)
Integer:
chromlength[ c(4, 2) ] => 1531933, 813184
Logical:
chromlength[ c(FALSE, FALSE, TRUE, FALSE) ] => 316620
Character:
names(chromlength) <- c("chrI", "chrII", "chrIII", "chrIV")
chromlength[ c("chrIII", "chrI") ] => 316620, 230218
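The three styles above can be run as one script. As a sketch, the last lines also build the logical index from a comparison instead of typing it by hand; the threshold of 1e6 is an assumed, arbitrary value.

```r
chromlength <- c(230218, 813184, 316620, 1531933)
names(chromlength) <- c("chrI", "chrII", "chrIII", "chrIV")

by_position <- chromlength[c(4, 2)]                        # integer index
by_logical  <- chromlength[c(FALSE, FALSE, TRUE, FALSE)]   # logical index
by_name     <- chromlength[c("chrIII", "chrI")]            # character index

# A logical index is usually computed, not typed out.
# 1e6 is an arbitrary example threshold.
is_long <- chromlength > 1e6
long_chromosomes <- chromlength[is_long]
print(long_chromosomes)
```

Once names are set, every selection carries them along, which makes the output easier to read.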
Specifics for lists & data.frames
lists
> mylist <- list(analysis="GSEA", genes=c("Foxo3a", "TP53"), cutoff=0.05)
> mylist$analysis
> mylist$genes[2]
data.frames
> mydata[ , "id"]
> mydata$id # does the same thing
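A runnable sketch of the list and data.frame access shown above. The [[ ]] form is an addition that the slide does not show, and mydata is a small stand-in borrowed from the Ordering slide, since no data.frame has been defined at this point.

```r
mylist <- list(analysis = "GSEA", genes = c("Foxo3a", "TP53"), cutoff = 0.05)

gene_dollar  <- mylist$genes[2]        # "TP53"
gene_bracket <- mylist[["genes"]][2]   # same element, selected by name with [[ ]]
still_a_list <- mylist["genes"]        # single [ ] returns a list of length 1

# Stand-in data.frame, for illustration only
mydata <- data.frame(id = c(1, 3, 4, 2), value = c(-0.2, 1.5, -3, 3))

id_a <- mydata[, "id"]   # column as a plain vector
id_b <- mydata$id        # does the same thing
id_c <- mydata[["id"]]   # and so does this
```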
Dimensions of data.frames and matrices
> x <- matrix(1:6, nrow=2, byrow=TRUE)
> x
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
x[ i, j ]
index before the comma: indicates the row(s). If missing: all rows
index after the comma: indicates column(s). If missing: all columns
Example
> x
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
Using integers:
> x[2, 3] # the value on the second row, third column
> x[ , 2] # all rows, second column. So: the whole 2nd column
> x[ , c(1,3)] # the first and third column (new data.frame or matrix!)
> x[ , -2] # everything but the second column
> x[ , 1:3] # first up to and including third column
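One detail worth trying out with the integer examples above: selecting a single column returns a plain vector, not a one-column matrix. The sketch below shows this, together with the drop = FALSE argument of [ ], which keeps the matrix shape.

```r
x <- matrix(1:6, nrow = 2, byrow = TRUE)

col2_vector <- x[, 2]                 # plain vector: 2 5
col2_matrix <- x[, 2, drop = FALSE]   # stays a 2 x 1 matrix
without_2   <- x[, -2]                # columns 1 and 3, still a matrix

print(dim(col2_vector))   # NULL: a vector has no dimensions
print(dim(col2_matrix))   # 2 1
print(without_2)
```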
Using logicals:
      delA delB delC
geneA    1    2    3
geneB    4    5    6
> ind <- c(FALSE, TRUE, TRUE)
> x[ 1 , ind] # first row; 1st column: no, 2nd and 3rd column: yes
delB delC
   2    3
Using characters:
> x <- matrix(1:6, nrow=2, byrow=TRUE,
    dimnames=list( c("geneA", "geneB"), c("delA", "delB", "delC")))
> x
      delA delB delC
geneA    1    2    3
geneB    4    5    6
> x["geneB", "delA"] # selects the value of geneB in delA
> x[, c("delA", "delC")] # selects columns delA and delC
Logical vector and selection
Logical vectors are often built on the fly, inside the selection itself
      delA delB delC
geneA    1    2    3
geneB    4    5    6
> ind <- x["geneA", ] > 1
> ind
 delA  delB  delC
FALSE  TRUE  TRUE
> x["geneA", ind]
delB delC
   2    3
> x["geneA", x["geneA", ] > 1] # same as above, without the intermediate variable
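The same logical vector can select whole columns instead of values from one row. A small sketch, assuming the named matrix x from the previous slide; which() is an addition here and simply reports the positions of the TRUE entries.

```r
x <- matrix(1:6, nrow = 2, byrow = TRUE,
            dimnames = list(c("geneA", "geneB"), c("delA", "delB", "delC")))

keep <- x["geneA", ] > 1    # one TRUE/FALSE per column
selected <- x[, keep]       # all rows, only the columns where geneA > 1
positions <- which(keep)    # positions of the TRUE entries

print(selected)
print(positions)
```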
Operators
< # Less than
> # Greater than
== # Equal to. Note: don’t confuse with = (assignment)
>= # Greater than or equal to
<= # Less than or equal to
& # AND
| # OR
Note: x <- 2 is an assignment,
x < -2 is a comparison! Use extra spaces or parentheses to make clear which one you mean
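The pitfall in the note is easy to reproduce. In this sketch the starting value 5 is arbitrary.

```r
x <- 5

is_smaller <- x < -2   # comparison: is 5 less than -2?
print(is_smaller)      # FALSE
print(x)               # x is unchanged: 5

x<-2                   # without spaces this is parsed as an assignment
print(x)               # x is now 2
```

Writing (x < -2) with parentheses and spaces makes the intent unambiguous.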
AND (&), OR (|) , NOT (!)
a      b      a & b
FALSE  FALSE  FALSE
FALSE  TRUE   FALSE
TRUE   FALSE  FALSE
TRUE   TRUE   TRUE
FALSE  NA     FALSE
TRUE   NA     NA

a      b      a | b
FALSE  FALSE  FALSE
FALSE  TRUE   TRUE
TRUE   FALSE  TRUE
TRUE   TRUE   TRUE
FALSE  NA     NA
TRUE   NA     TRUE

a      ! a
FALSE  TRUE
TRUE   FALSE
NA     NA
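The NA rows of the tables above can be checked directly. The rule of thumb: the result is NA only when the unknown value could still change the outcome.

```r
a <- c(FALSE, TRUE, NA)

and_false <- a & FALSE   # FALSE FALSE FALSE: anything AND FALSE is FALSE
and_true  <- a & TRUE    # FALSE TRUE NA
or_true   <- a | TRUE    # TRUE TRUE TRUE: anything OR TRUE is TRUE
or_false  <- a | FALSE   # FALSE TRUE NA
negated   <- !a          # TRUE FALSE NA

print(and_true)
print(or_false)
```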
Auto-recycling of vector content
If you combine vectors of different lengths, R automatically
‘recycles’ the content of the shorter vector until it matches the length of
the longer one (with a warning if the longer length is not a multiple of the shorter):
> mynumbers <- c(10.4, 5, 8.4, 3)
> mynumbers2 <- mynumbers + 1
> mynumbers2
11.4, 6, 9.4, 4 # In fact, mynumbers + c(1, 1, 1, 1) is done
But also:
> mynumbers2 + c(2, 30)
13.4, 36, 11.4, 34 # Here, mynumbers2 + c(2, 30, 2, 30) is done.
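A sketch of what happens when the lengths do not divide evenly; the vector c(1, 2, 3) is an arbitrary example. R still recycles, but the last line emits the warning "longer object length is not a multiple of shorter object length".

```r
mynumbers <- c(10.4, 5, 8.4, 3)

even_fit <- mynumbers + c(2, 30)     # 4 is a multiple of 2: silent recycling
print(even_fit)

# 4 is not a multiple of 3: c(1, 2, 3) is extended to c(1, 2, 3, 1)
# and R issues a warning.
uneven_fit <- mynumbers + c(1, 2, 3)
print(uneven_fit)
```

Such a warning usually points to a real mistake (vectors that were meant to have the same length), so it is worth investigating rather than ignoring.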
Recycling also works with logical operators
Comparison of equal length vectors (no recycling needed) :
> v1 <- c(10, 5, 5, 1)
> v2 <- c(10, 3, 5, 2)
> v1 == v2
TRUE, FALSE, TRUE, FALSE
Comparison of unequal length vectors:
> v1 == 5 # The value 5 is recycled to get an equal length vector.
# So in fact, v1 == c(5,5,5,5) is done
FALSE, TRUE, TRUE, FALSE
Operators
      delA delB delC
geneA    1    2    3
geneB    4    5    6
> ind <- x["geneA", ] > 1 & x["geneA", ] < 3
> x["geneA", ind]
delB
   2
Combining logical operators
The AND operator (&) has precedence over the OR operator (|), and NOT (!) has precedence over both
(like in mathematics: *, / have precedence over -, +)
Group them with parentheses if needed, or for clarity
> ind <- ( x < -1.7 | x > 2 ) & !is.na(x)
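A short sketch of both points: precedence, and why the !is.na(x) term is there. The vector x below is made-up example data containing one missing value.

```r
no_parens   <- TRUE | FALSE & FALSE     # read as TRUE | (FALSE & FALSE)
with_parens <- (TRUE | FALSE) & FALSE
print(no_parens)     # TRUE
print(with_parens)   # FALSE

x <- c(-3, 0.5, NA, 4)                  # made-up values, one of them missing
ind_raw <- x < -1.7 | x > 2             # TRUE FALSE NA TRUE
ind     <- (x < -1.7 | x > 2) & !is.na(x)

print(x[ind_raw])    # -3 NA 4: an NA in the index yields an NA in the result
print(x[ind])        # -3 4
```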
Select statements
Special (common) functions that return logical values
is.na() # one TRUE/FALSE per element
is.numeric() (and also is.character(), is.factor(), is.matrix(), is.data.frame() ) # a single TRUE/FALSE for the whole object
duplicated() # one TRUE/FALSE per element
! # (exclamation mark): logical NOT, i.e. negation
These are used a lot to check the consistency of your data, or of the
arguments passed to a function
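As a sketch of such a consistency check, the lines below clean up a vector of identifiers. The ids vector is hypothetical example data.

```r
ids <- c("geneA", "geneB", NA, "geneA")   # hypothetical identifiers

missing  <- is.na(ids)        # FALSE FALSE TRUE FALSE
repeated <- duplicated(ids)   # FALSE FALSE FALSE TRUE (flags the second "geneA")
is_text  <- is.character(ids) # a single TRUE for the whole vector

# Keep only entries that are neither missing nor repeated
clean <- ids[!missing & !repeated]
print(clean)
```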
Ordering
(Re)order a data.frame or matrix by the values of a single
column using order(). order() returns the row positions in sorted
order, which you then use as a row index:
> mydata <- data.frame( id=c(1,3,4,2),
    name=c("geneB", "geneA", "geneD", "geneC"),
    value=c(-0.2, 1.5, -3, 3))
> mydata[order(mydata[, "id"]), ] # sort on id
> mydata[order(mydata[, "name"]), ] # sort on name
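To see what order() itself returns, it helps to store its result first. The sketch below sorts on the value column and also uses the decreasing argument of order(), which the slide does not show.

```r
mydata <- data.frame(id = c(1, 3, 4, 2),
                     name = c("geneB", "geneA", "geneD", "geneC"),
                     value = c(-0.2, 1.5, -3, 3))

ord <- order(mydata$value)   # row positions, smallest value first
print(ord)                   # 3 1 2 4

by_value      <- mydata[ord, ]
by_value_desc <- mydata[order(mydata$value, decreasing = TRUE), ]
print(by_value_desc)
```

order() does not change mydata itself; assign the result to a name if you want to keep the sorted version.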
