Stat405: Data structures

Hadley Wickham
Wednesday, 30 September 2009
11 Data Structures

  1. Stat405: Data structures. Hadley Wickham. Wednesday, 30 September 2009
  2. 1. Outline for next couple of weeks
     2. Basic data types
     3. Vectors, matrices & arrays
     4. Lists & data.frames
  3. Outline. More details about the underlying data structures we’ve been using so far. Then pulling apart those data structures (particularly data frames), operating on each piece and putting them back together again. An alternative to for loops when each piece is independent.
  4.         Same types    Different types
       1d    Vector        List
       2d    Matrix        Data frame
       nd    Array
  5. When vectors of different types occur in an expression, they will be automatically coerced to the same type: character > numeric > logical.

         as.character(c(T, F))
         as.character(seq_len(5))
         as.logical(c(0, 1, 100))
         as.logical(c("T", "F", "a"))
         as.numeric(c("A", "100"))
         as.numeric(c(T, F))

     Every vector has mode() and length(); names() is optional, but useful. A scalar is a vector of length 1. Technically, these are all atomic vectors.
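The coercion order on the slide above can be checked directly. This is a minimal sketch using only base R; the variable names are illustrative and not from the original deck.

```r
# Combining types in c() coerces everything to the most general type present.
x_lgl_num <- c(TRUE, 2)        # logical + numeric   -> numeric
x_num_chr <- c(1, "a")         # numeric + character -> character
x_all     <- c(TRUE, 2, "a")   # all three           -> character

class(x_lgl_num)
class(x_num_chr)
class(x_all)

# Logical values act as 0/1 in arithmetic, so sum() counts the TRUEs
# and mean() gives the proportion of TRUEs.
flags     <- c(TRUE, FALSE, TRUE, TRUE)
n_true    <- sum(flags)
prop_true <- mean(flags)
```

The last two lines are the same mechanism that makes an expression such as mean(some_logical_test) return a proportion.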
  6. Your turn. Experiment with automatic coercion. What is happening in the following cases?

         104 & 2 < 4
         mean(diamonds$cut == "Good")
         c(T, F, T, T, "F")
         c(1, 2, 3, 4, F)
  7. Matrix (2d), Array (>2d). Just like a vector. Has mode() and length(). Create with matrix() or array(), or from a vector by setting dim(). as.vector() converts back to a vector.

         a <- seq_len(12)
         dim(a) <- c(1, 12)
         dim(a) <- c(4, 3)
         dim(a) <- c(2, 6)
         dim(a) <- c(3, 2, 2)

     The same 12 values under three of these shapes:

         (1, 12)    1  2  3  4  5  6  7  8  9 10 11 12

         (4, 3)     1  5  9
                    2  6 10
                    3  7 11
                    4  8 12

         (2, 6)     1  3  5  7  9 11
                    2  4  6  8 10 12
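As a small sketch of the point above, the following sets dim() on copies of the same vector and confirms that values fill column by column. The names m, arr and back are illustrative and not from the deck.

```r
a <- seq_len(12)

# Setting dim() turns a vector into a matrix; values fill column by column.
m <- a
dim(m) <- c(4, 3)
m_2_3 <- m[2, 3]          # row 2 of the third column (9, 10, 11, 12)

# Three dimensions give an array; the first index varies fastest.
arr <- a
dim(arr) <- c(3, 2, 2)
arr_1_2_2 <- arr[1, 2, 2]

# as.vector() drops the dim attribute and returns the original vector.
back <- as.vector(arr)

# mode() and length() are unchanged by the reshaping.
m_mode   <- mode(m)
m_length <- length(m)
```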
  8. List. Is also a vector (so has mode, length and names), but is different in that it can store any other vector inside it (including lists). Use unlist() to convert to a vector. Use as.list() to convert a vector to a list. Technically a recursive vector.

         c(1, 2, c(3, 4))
         list(1, 2, list(3, 4))

         c("a", T, 1:3)
         list("a", T, 1:3)

         a <- list(1:3, 1:5)
         unlist(a)
         as.list(a)

         b <- list(1:3, "a", "b")
         unlist(b)
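One way to see the difference between c() and list() in the slide above is to compare lengths and element types. This is a sketch with illustrative variable names, not part of the original deck.

```r
# c() flattens its arguments into one atomic vector; list() keeps the nesting.
flat   <- c(1, 2, c(3, 4))
nested <- list(1, 2, list(3, 4))
len_flat   <- length(flat)     # four numbers
len_nested <- length(nested)   # three elements, the last one a list

# c() coerces mixed types to a common type; list() preserves each type.
mixed_vec <- c("a", TRUE, 1:3)
mixed_lst <- list("a", TRUE, 1:3)
lst_classes <- sapply(mixed_lst, class)

# unlist() flattens a list, coercing to the most general type it finds.
b <- list(1:3, "a", "b")
b_flat <- unlist(b)
```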
  9. Data frame. List of vectors, each of the same length. (Cross between list and matrix.) Different to matrix in that each column can have a different type.
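The "cross between list and matrix" description can be illustrated with a tiny data frame. The data frame df and its column names below are made up for this sketch.

```r
# A hypothetical three-column data frame with a different type per column.
df <- data.frame(
  id    = 1:3,
  label = c("a", "b", "c"),
  ok    = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# List-like: it is a list, and its length is the number of columns.
df_is_list <- is.list(df)
n_cols     <- length(df)

# Matrix-like: it has two dimensions.
df_dim <- dim(df)

# Unlike a matrix, each column keeps its own type.
col_classes <- sapply(df, class)
```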
  10.    # How do you convert a matrix to a data frame?
         # How do you convert a data frame to a matrix?
         # What is different?

         # What do these subsetting operations do?
         # Why do they work? (Remember to use str)
         diamonds[1]
         diamonds[[1]]
         diamonds[["cut"]]
  11.    x <- sample(12)

         # What's the difference between a & b?
         a <- matrix(x, 4, 3)
         b <- array(x, c(4, 3))

         # What's the difference between x & y?
         y <- matrix(x, 12)

         # How are these subsetting operations different?
         a[, 1]
         a[, 1, drop = FALSE]
         a[1, ]
         a[1, , drop = FALSE]
  12.    1d   names()                  length()          c()
         2d   colnames(), rownames()   ncol(), nrow()    cbind(), rbind()
         nd   dimnames()               dim()             abind() (special package)
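The functions in the table above can be tried on small objects. This sketch uses base R only, so abind() is left out because it lives in a separate package; all object names are illustrative.

```r
# 1d: names(), length(), c()
v <- c(a = 1, b = 2, c = 3)
v_names <- names(v)
v_len   <- length(v)

# 2d: rownames()/colnames(), nrow()/ncol(), rbind()/cbind()
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("x", "y", "z")))
m_more_rows <- rbind(m, r3 = 7:9)     # adds a row
m_more_cols <- cbind(m, w = 10:11)    # adds a column

# nd: dim() and dimnames() generalise the 2d functions
arr <- array(1:24, dim = c(2, 3, 4))
dimnames(arr) <- list(c("r1", "r2"), NULL, NULL)
arr_dim <- dim(arr)
```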
  13.    b <- seq_len(10)
         a <- letters[b]

         # What sort of matrix does this create?
         rbind(a, b)
         cbind(a, b)

         # Why would you want to use a data frame here?
         # How would you create it?
  14.    load(url("http://had.co.nz/stat405/data/quiz.rdata"))

         # What is a? What is b?
         # How are they different? How are they similar?
         # How can you turn a into b?
         # How can you turn b into a?

         # What are c, d, and e?
         # How are they different? How are they similar?
         # How can you turn one into another?

         # What is f?
         # How can you extract the first element?
         # How can you extract the first value in the first
         # element?
  15.    # a is a numeric vector, containing the numbers 1 to 10
         # b is a list of numeric scalars
         # they contain the same values, but in a different format
         identical(a[1], b[[1]])
         identical(a, unlist(b))
         identical(b, as.list(a))

         # c is a named list
         # d is a data.frame
         # e is a numeric matrix
         # From most to least general: c, d, e
         identical(c, as.list(d))
         identical(d, as.data.frame(c))
         identical(e, data.matrix(d))
  16.    # f is a list of matrices of different dimensions
         f[[1]]
         f[[1]][1, 2]
  17.    # What do these subsetting operations do?
         # Why do they work? (Remember to use str)
         diamonds[1]
         diamonds[[1]]
         diamonds[["cut"]]
         diamonds[["cut"]][1:10]
         diamonds$cut[1:10]
  18.    Vectors             x[1:4]                   -
         Matrices, arrays    x[1:4, ], x[, 2:3, ]     x[1:4, , drop = F]
         Lists               x[[1]], x$name           x[1]
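The list forms x[[1]], x$name and x[1] in the slide above differ in what they return. The sketch below uses a made-up list and, as a stand-in for the diamonds data (which needs the ggplot2 package), the built-in mtcars data frame; that substitution is an assumption of this example, not something the deck uses.

```r
# A hypothetical named list.
x <- list(name = 1:3, other = "a")

one_elt_list <- x[1]      # single bracket: still a list, of length 1
contents     <- x[[1]]    # double bracket: the element itself
by_name      <- x$name    # same element, selected by name

# A data frame is a list of columns, so the same rules apply.
# mtcars (datasets package) stands in for diamonds here.
first_col_df  <- mtcars[1]        # one-column data frame
first_col_vec <- mtcars[[1]]      # plain numeric vector
same_column   <- identical(mtcars[["mpg"]], mtcars$mpg)
```

Running str() on each result, as the slides suggest, shows the difference in structure at a glance.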
  19.    # What's the difference between a & b?
         a <- matrix(x, 4, 3)
         b <- array(x, c(4, 3))

         # What's the difference between x & y?
         y <- matrix(x, 12)

         # How are these subsetting operations different?
         a[, 1]
         a[, 1, drop = FALSE]
         a[1, ]
         a[1, , drop = FALSE]
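One way to explore these questions is to compare the dim() of each result. The sketch below fixes x to 1:12 instead of sample(12) so that the outcome is reproducible; that substitution is an assumption of this example.

```r
x <- 1:12                    # fixed values in place of sample(12)

a <- matrix(x, 4, 3)
b <- array(x, c(4, 3))
same_shape  <- identical(dim(a), dim(b))
same_values <- all(a == b)
b_is_matrix <- is.matrix(b)  # a 2d array is a matrix

y <- matrix(x, 12)           # 12 x 1 matrix, whereas x has no dim at all

# Selecting a single row or column drops to a plain vector by default;
# drop = FALSE keeps the result as a matrix.
col_vec <- a[, 1]
col_mat <- a[, 1, drop = FALSE]
row_vec <- a[1, ]
row_mat <- a[1, , drop = FALSE]
```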
  20.    1d   c()
         2d   matrix(), t(), data.frame()
         nd   array(), aperm()
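Of the functions listed above, t() and aperm() rearrange dimensions rather than create objects. A minimal sketch with illustrative object names:

```r
# t() swaps the rows and columns of a matrix.
m  <- matrix(1:6, nrow = 2)        # 2 x 3
tm <- t(m)                         # 3 x 2
t_ok <- tm[3, 1] == m[1, 3]

# aperm() generalises transposition to arrays: the result's dimensions
# are dim(arr)[perm], here c(4, 2, 3).
arr <- array(1:24, dim = c(2, 3, 4))
p   <- aperm(arr, c(3, 1, 2))
p_dim <- dim(p)
p_ok  <- p[4, 2, 3] == arr[2, 3, 4]

# With no perm argument, aperm() reverses the dimensions,
# which for a matrix gives the same values as t().
same_as_t <- all(aperm(m) == t(m))
```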
  21.    b <- seq_len(10)
         a <- letters[b]

         # What sort of matrix does this create?
         rbind(a, b)
         cbind(a, b)

         # Why would you want to use a data frame here?
         # How would you create it?
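One possible way to work through this last exercise is sketched below. Binding a character vector to a numeric one forces a single type on the whole matrix, while a data frame keeps each column's type. The names m_rows, m_cols and df are illustrative, and stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
b <- seq_len(10)
a <- letters[b]

# A matrix holds one type, so the numbers are coerced to character.
m_rows <- rbind(a, b)      # 2 x 10
m_cols <- cbind(a, b)      # 10 x 2
m_mode <- mode(m_cols)

# A data frame keeps one type per column, so b stays numeric.
df <- data.frame(a = a, b = b, stringsAsFactors = FALSE)
df_classes <- sapply(df, class)
b_total    <- sum(df$b)    # arithmetic still works on the b column
```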