Iteration and functions
Day 4 - Introduction to R for Life Sciences
What is iteration?
Applying the same process to each value in a set of values
In R this is often automatic (vectorized):
c <- a + b # element-wise; works for vectors and matrices
Reminder: value-recycling
matrix(1:4, nrow=2, byrow=TRUE) + c(70,90) →
[,1] [,2]
[1,] 71 72
[2,] 93 94 # (the recycling is column-wise!)
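Here is a small runnable sketch of vectorization and recycling. The vectors `a` and `b` are made-up example values. `sweep()` (base R) is included only as one possible way to make the recycling run along the columns instead of down them; it is not part of the original slide.

```r
# Vectorized arithmetic: R loops over the elements for you
a <- c(1, 2, 3)
b <- c(10, 20, 30)
total <- a + b                    # 11 22 33

m <- matrix(1:4, nrow = 2, byrow = TRUE)   # rows: 1 2 / 3 4

# The short vector is recycled in column-major order, so
# 70 is added to row 1 and 90 to row 2
by.row <- m + c(70, 90)

# sweep() applies the vector along margin 2 (columns) instead:
# 70 is added to column 1 and 90 to column 2
by.col <- sweep(m, 2, c(70, 90), "+")

print(by.row)
print(by.col)
```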
Aggregation
You often need to "summarize" a vector as one number:
length(), min(), max(), mean(), median(), sd(), var(), mad(), IQR(),
sum(), prod(), all(), any(), ... # (see also: summary() )
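A short sketch of several of these summary functions applied to one vector. The values in `x` are invented for illustration.

```r
# A made-up vector of measurements
x <- c(2, 4, 4, 5, 5, 10)

# Each function reduces the whole vector to a single number
stats <- c(n = length(x), min = min(x), max = max(x),
           mean = mean(x), median = median(x), sum = sum(x))
print(stats)           # n 6, min 2, max 10, mean 5, median 4.5, sum 30

print(any(x > 8))      # TRUE: at least one value exceeds 8
print(all(x > 0))      # TRUE: every value is positive
print(summary(x))      # min, quartiles, median, mean and max in one go
```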
How do you do this for every row or column of a matrix? Use apply()!
result <- apply(mat, dir, FUNC, ...)
mat: a matrix
dir: direction (margin): 1 = rows, 2 = columns
FUNC: a function that takes a vector and returns a single value
...: additional argument(s), passed on to FUNC
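A minimal sketch of how the `...` of apply() passes extra arguments on to FUNC. The matrix, including its missing value, is made up for illustration. `colMeans()` is a base R built-in that computes the same column means directly.

```r
# A small made-up matrix with one missing value (NA)
mat <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1   NA    5
# [2,]    2    4    6

col.means.raw <- apply(mat, 2, mean)                # NA propagates: 1.5 NA 5.5
col.means     <- apply(mat, 2, mean, na.rm = TRUE)  # na.rm = TRUE is handed to mean(): 1.5 4 5.5
row.max       <- apply(mat, 1, max, na.rm = TRUE)   # 5 6

# Base R also has fast built-ins for the most common cases
col.means.builtin <- colMeans(mat, na.rm = TRUE)
print(col.means)
print(row.max)
```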
apply() cont’d
> m <- matrix(1:6, nrow=2)
> m
[,1] [,2] [,3]
[1,] 1 3 5 # filled column-wise by default
[2,] 2 4 6
> apply(m, 1, sum) # row-wise sum
[1] 9 12
> apply(m, 2, prod, na.rm=TRUE) # column-wise product, ignoring NAs
[1] 2 12 30
And also:
lapply(lst, FUNC) # iterate over list-contents
tapply(), sapply(), mapply(), …
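A brief sketch of two of the other members of the family. The expression values and group labels below are hypothetical.

```r
# Hypothetical data: one measurement for each of four samples in two groups
expr  <- c(5.1, 4.9, 6.3, 6.0)
group <- c("ctrl", "ctrl", "treat", "treat")

# tapply(): apply a function to the values within each group
group.means <- tapply(expr, group, mean)          # ctrl 5.00, treat 6.15

# mapply(): walk over several vectors in parallel, element by element
sums <- mapply(function(x, y) x + y, 1:3, 4:6)    # 5 7 9

print(group.means)
print(sums)
```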
lapply() and sapply()
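Work on lists (or vectors) rather than matrices; lapply() always returns a list, sapply() simplifies the result to a vector or matrix when it can
> lst <- list(a=runif(7), b=runif(2), <etc>) # random numbers: your values will differ
> lst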
$a
[1] 0.971 0.380 0.287 0.787 0.721 0.938 0.364
$b
[1] 0.609 0.658
<etc>
> lapply(lst, mean)
$a
[1] 0.635
$b
[1] 0.633
<etc>
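A sketch contrasting lapply() and sapply() on the same list. `set.seed()` makes the random numbers reproducible. The third element `c` exists only for this illustration.

```r
set.seed(42)   # makes the random numbers reproducible
lst <- list(a = runif(7), b = runif(2), c = runif(5))

means.list <- lapply(lst, mean)   # a list with elements $a, $b, $c
means.vec  <- sapply(lst, mean)   # simplified to a named numeric vector

# When FUNC returns more than one value, sapply() builds a matrix:
# one column per list element, two rows here (min and max)
ranges <- sapply(lst, range)

print(means.vec)
print(ranges)
```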
Custom built functions
apply() can use existing functions, but they may not suffice
e.g.: the number of values lying more than 2 standard deviations from the mean
Define a function n.exceeding2sd(), then call apply() as before
v <- apply(m, 1, n.exceeding2sd)
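For a one-off task, apply() also accepts a function written inline (an anonymous function), so you can skip naming it. In this minimal sketch, the threshold of 3 is arbitrary and chosen only for illustration.

```r
m <- matrix(1:6, nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# The function is defined inside the call and receives one row at a time
n.above.3 <- apply(m, 1, function(row) sum(row > 3))   # 1 2
print(n.above.3)
```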
Functions
Functions are "recipes" that automate actions
Needed for reuse and clarity
Easy and cheap
Have to be defined
Can be called
Inputs are called arguments (which can have defaults)
special argument: the triple-dot ..., as in plot(x, y, ...); it passes any extra arguments on to other functions
Functions can have local variables (try to avoid global variables)
Outputs are called return values
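The sketch below puts these ideas into one function: default arguments, the triple-dot, local variables and the return value. `rescale()` is a hypothetical helper written for illustration and does not come from the course material.

```r
# Hypothetical helper: rescale x linearly to the range [to.min, to.max]
rescale <- function(x, to.min = 0, to.max = 1, ...) {
  rng <- range(x, ...)                         # '...' forwards extra arguments, e.g. na.rm = TRUE
  scaled <- (x - rng[1]) / (rng[2] - rng[1])   # rng and scaled are local: invisible outside
  to.min + scaled * (to.max - to.min)          # last value is returned
}

r1 <- rescale(c(2, 4, 6))                  # defaults used: 0 0.5 1
r2 <- rescale(c(2, 4, 6), to.max = 100)    # named argument overrides a default: 0 50 100
r3 <- rescale(c(2, NA, 6), na.rm = TRUE)   # na.rm reaches range() via '...': 0 NA 1
print(r1); print(r2); print(r3)
```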
Full function example
Definition:
sum.of.squares <- function(x) {  # argument(s)
  s <- x^2                       # variables x, s and tot are local
  tot <- sum(s)                  # note: indentation!
  tot                            # last value is returned
}                                # end of definition
Call:
ss <- sum.of.squares(some.vector)
ss.percolumn <- apply(data, 2, sum.of.squares)
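A runnable version of the calls above. `some.vector` and `data` are given made-up values here, and the result is checked against the vectorized built-in `colSums()`.

```r
sum.of.squares <- function(x) {
  s <- x^2
  tot <- sum(s)
  tot            # could also be written as return(tot)
}

some.vector <- c(1, 2, 3)
ss <- sum.of.squares(some.vector)                 # 1 + 4 + 9 = 14

data <- matrix(1:6, nrow = 2)                     # columns: (1,2) (3,4) (5,6)
ss.percolumn <- apply(data, 2, sum.of.squares)    # 5 25 61

# Same result without apply(), using vectorized built-ins
same <- colSums(data^2)
print(ss.percolumn)
```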
More complex example
Definition:
n.exceeding.SDs <- function(x, n=2, na.rm=FALSE) {
  m <- mean(x, na.rm=na.rm)
  s <- sd(x, na.rm=na.rm)
  abs.z <- abs((x - m)/s)          # Z-scores, all made positive
  sum(abs.z > n, na.rm=na.rm)      # last value is returned
}
Call:
n <- n.exceeding.SDs(some.vector)
outliers.per.column <- apply(data, 2, n.exceeding.SDs) # R names are case-sensitive
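This sketch runs n.exceeding.SDs() on invented data. It shows the default n = 2, how apply() passes an extra argument (n = 3) through to the function, and the effect of na.rm.

```r
n.exceeding.SDs <- function(x, n = 2, na.rm = FALSE) {
  m <- mean(x, na.rm = na.rm)
  s <- sd(x, na.rm = na.rm)
  abs.z <- abs((x - m) / s)          # Z-scores, all made positive
  sum(abs.z > n, na.rm = na.rm)      # count of values beyond n SDs
}

# Made-up data: column a has one clear outlier (100), column b has none
data <- cbind(a = c(rep(10, 9), 100), b = 1:10)

per.col <- apply(data, 2, n.exceeding.SDs)           # a = 1, b = 0
strict  <- apply(data, 2, n.exceeding.SDs, n = 3)    # n = 3 passed via '...': a = 0, b = 0

with.na   <- c(rep(10, 9), 100, NA)
n.default <- n.exceeding.SDs(with.na)                # NA: mean() and sd() return NA
n.removed <- n.exceeding.SDs(with.na, na.rm = TRUE)  # 1
print(per.col); print(strict)
```

With only 10 values, no single point can have |z| above (10 - 1)/sqrt(10), which is about 2.85. That is why the outlier in column a is caught at n = 2 but not at n = 3.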
