Manipulating data in R
2013-02-22 @HSPH
Kazuki Yoshida, M.D.
MPH-CLE student

FREEDOM TO KNOW
Manipulating data in R

n   What are Objects?
n   What is Class attribute?
n   Various data objects you will see in R.
Objects

n   Just about everything named in R is an object
n   An object is a container that
     n   knows its class (label for what’s inside).
     n   has contents (eg, Actual numbers).
Examples of objects
n   dataset, which you use for analysis (various
     classes)
n   functions, which perform analysis (function class)
n   results, which come out of analysis (various
     classes)
     n   In effect, you always get a new dataset filled
          with results when you analyze data.
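As a minimal sketch of these three kinds of objects, the snippet below builds a small dataset, looks at a function, and stores an analysis result. The data values and the names `dat` and `fit` are made up for illustration, and `lm()` is just one convenient analysis function.

```r
# A tiny made-up dataset (values are illustrative only)
dat <- data.frame(weight = c(60, 72, 55, 80),
                  height = c(160, 175, 158, 182))
class(dat)     # "data.frame": a dataset object

class(mean)    # "function": functions are objects too

# The result of an analysis is itself an object with its own class
fit <- lm(weight ~ height, data = dat)
class(fit)     # "lm"
is.list(fit)   # TRUE: the result is a container filled with results
names(fit)     # includes "coefficients", "residuals", ...
```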
Classes of data values
      inside data objects
n   Numeric: Continuous variables
n   Factor: Categorical variables
n   Logical: TRUE/FALSE binary variables
n   etc...
Class?

n   An object’s class tells R how the object should be
     handled.
n   For example, summarizing data should work
     differently for numbers and categories!
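A quick way to see this is to call the same function, `summary()`, on a numeric vector and on a factor. The names `values` and `groups` are illustrative only.

```r
values <- c(2, 4, 4, 10)                  # numbers
groups <- factor(c("a", "b", "b", "c"))   # categories

summary(values)   # minimum, quartiles, mean, maximum
summary(groups)   # counts per category: a = 1, b = 2, c = 1
```

The call is identical in both cases; R picks the behavior based on the class of the object it receives.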
Object

[Figure: an object drawn as a box, with categorical variables inside and the class attribute as its label. Image: http://en.wikipedia.org/wiki/File:3_D-Box.jpg]
Data objects

n   Vector (contains single class of data values)
     n   Array including Matrix
n   List (contains multiple classes of data values)
     n   Data frame
Vector
n   Smallest building block of data objects
n   Single dimension
n   Combination of values of same class
n   vec1 <- c(2013, 2, 15, -10) # combine
n   vec2 <- 1:16 # integers 1 to 16
Vector
[Figure: a vector, 1-dimensional]
Array/Matrix
n   Vector folded into a multidimensional structure
n   2-dimensional array is a matrix
n   vec3 <- 1:16
n   dim(vec3) <- c(4, 4) # 4 x 4 structure
n   dim(vec3) <- c(2, 2, 4) # 2 x 2 x 4 structure
n   arr1 <- array(1:60, dim = c(3,4,5))
Matrix
[Figure: a matrix, a folded vector with dimensions]
List
n   Combination of any values or objects
n   Can contain objects of multiple classes
n   eg, a list of two vectors, a matrix, three arrays
n   List_name$Variable_name operation with $ operator
n   list1 <- list(first = 1:17, second = matrix(letters, 13,2))
n   list2 <- list(alpha = c(1,4,5,7), beta = c("h","s","p","h"))
List
[Figure: a list, a multi-part object. Can contain vectors, arrays, or lists!]
Data frame
n   Special case of a list
n   List of same-length vectors vertically aligned
n   df1 <- data.frame(list2)
n   list3 <- list(small = letters, large = LETTERS,
     number = 1:26)
n   df2 <- data.frame(list3)
Data Frame
[Figure: a data frame, multiple vectors of the same length tied together!]
Access by indexes
n   letters[3] # 1-dimensional object
n   arr1[1,2,3] # 3-dimensional object
n   arr1[1, ,3] # implies 1,(all),3
n   df1[ ,3] # implies (all),3
n   list1[[1]] # list needs [[ ]]
Access named elements
n   list3
n   list3$small
n   list3[["small"]]
n   df1$large
n   df1[, "large"]
20130222 Data structures and manipulation in R
