Stat405 Data structures
Hadley Wickham
Thursday, 11 November 2010
Assessment
• Final: there is no ﬁnal
• Two groups still haven’t sent me
electronic versions of their project
• Many of you STILL HAVEN’T returned
your team evaluations
Thursday, 11 November 2010
1. Basic data types
2. Vectors, matrices & arrays
3. Lists & data.frames
Thursday, 11 November 2010
1d Vector List
2d Matrix Data frame
nd Array
Same types Different types
Thursday, 11 November 2010
When vectors of
character as.character(c(T, F))
different types occur
as.character(seq_len(5)) in an expression,
as.logical(c(0, 1, 100)) they will be
numeric automatically
as.logical(c("T", "F", "a"))
as.numeric(c("A", "100")) coerced to the same
logical as.numeric(c(T, F))
type: character >
numeric > logical
mode()
names() Optional, but useful
length() A scalar is a vector of length 1
Technically, these are all atomic vectors
Thursday, 11 November 2010
Your turn
Experiment with automatic coercion.
What is happening in the following cases?
104 & 2 < 4
mean(diamonds$cut == "Good")
c(T, F, T, T, "F")
c(1, 2, 3, 4, F)
Thursday, 11 November 2010
Matrix (2d) (1, 12)
Array (>2d) 1 2 3 4 5 6 7 8 9 10 11 12
Just like a vector. (4, 3) (2, 6)
Has mode() and 1 5 9 1 3 5 7 9 11
length().
2 6 10 2 4 6 8 10 12
Create with matrix 3 7 11
() or array(), or a <- seq_len(12)
4 8 12
from a vector by dim(a) <- c(1, 12)
setting dim() dim(a) <- c(4, 3)
dim(a) <- c(2, 6)
as.vector() dim(a) <- c(3, 2, 2)
converts back to a
vector
Thursday, 11 November 2010
List
Is also a vector (so has c(1, 2, c(3, 4))
mode, length and names), list(1, 2, list(3, 4))
but is different in that it can
store any other vector inside c("a", T, 1:3)
it (including lists). list("a", T, 1:3)
Use unlist() to convert to a <- list(1:3, 1:5)
a vector. Use as.list() to unlist(a)
convert a vector to a list. as.list(a)
b <- list(1:3, "a", "b")
unlist(b)
Technically a recursive vector
Thursday, 11 November 2010
Data frame
List of vectors, each of the
same length. (Cross
between list and matrix)
Different to matrix in that
each column can have a
different type
Thursday, 11 November 2010
# How do you convert a matrix to a data frame?
# How do you convert a data frame to a matrix?
# What is different?
# What does these subsetting operations do?
# Why do they work? (Remember to use str)
diamonds[1]
diamonds[[1]]
diamonds[["cut"]]
Thursday, 11 November 2010
x <- sample(12)
# What's the difference between a & b?
a <- matrix(x, 4, 3)
b <- array(x, c(4, 3))
# What's the difference between x & y
y <- matrix(x, 12)
# How are these subsetting operations different?
a[, 1]
a[, 1, drop = FALSE]
a[1, ]
a[1, , drop = FALSE]
Thursday, 11 November 2010
b <- seq_len(10)
a <- letters[b]
# What sort of matrix does this create?
rbind(a, b)
cbind(a, b)
# Why would you want to use a data frame here?
# How would you create it?
Thursday, 11 November 2010
load(url("http://had.co.nz/stat405/data/quiz.rdata"))
# What is a? What is b?
# How are they different? How are they similar?
# How can you turn a in to b?
# How can you turn b in to a?
# What are c, d, and e?
# How are they different? How are they similar?
# How can you turn one into another?
# What is f?
# How can you extract the first element?
# How can you extract the first value in the first
# element?
Thursday, 11 November 2010
# a is numeric vector, containing the numbers 1 to 10
# b is a list of numeric scalars
# they contain the same values, but in a different format
identical(a[1], b[[1]])
identical(a, unlist(b))
identical(b, as.list(a))
# c is a named list
# d is a data.frame
# e is a numeric matrix
# From most to least general: c, d, e
identical(c, as.list(d))
identical(d, as.data.frame(c))
identical(e, data.matrix(d))
Thursday, 11 November 2010
# f is a list of matrices of different dimensions
f[[1]]
f[[1]][1, 2]
Thursday, 11 November 2010
# What does these subsetting operations do?
# Why do they work? (Remember to use str)
diamonds[1]
diamonds[[1]]
diamonds[["cut"]]
diamonds[["cut"]][1:10]
diamonds$cut[1:10]
Thursday, 11 November 2010
# What's the difference between a & b?
a <- matrix(x, 4, 3)
b <- array(x, c(4, 3))
# What's the difference between x & y
y <- matrix(x, 12)
# How are these subsetting operations different?
a[, 1]
a[, 1, drop = FALSE]
a[1, ]
a[1, , drop = FALSE]
Thursday, 11 November 2010
b <- seq_len(10)
a <- letters[b]
# What sort of matrix does this create?
rbind(a, b)
cbind(a, b)
# Why would you want to use a data frame here?
# How would you create it?
Thursday, 11 November 2010