19 tables

Rice University
Oct. 28, 2010


  1. Today we will use the reshape2 and xtable packages, and the movies.csv.bz2 dataset. install.packages(c("reshape2", "xtable"))
  2. Tables. Garrett Grolemund, PhD Student, Rice University Department of Statistics
  3. 1. Melt and Cast (reshape) 2. Making tables with R (xtable) 3. Advanced styling (booktabs)
  4. Melt and Cast
  5. reshape2 Improved version of reshape - much faster. melt() works the same as in reshape.
  6. Recall Sometimes a single variable is strewn across multiple columns. We can recombine it with a 3 step process: 1. protect the good columns 2. melt the rest 3. subset out false rows
  7. Example:

          tips1 <- melt(tips, id = c("customer_id", "total_bill", "tip"))
          names(tips1)[4] <- "gender"
          tips1 <- subset(tips1, value == 1)[ , 1:4]

     Before:

          customer_id total_bill  tip female male
                    1      20.53 4.00      0    1
                    2      21.70 4.30      0    1
                    3      19.65 3.00      1    0

     After:

          customer_id total_bill  tip gender
                    1      20.53 4.00   male
                    2      21.70 4.30   male
                    3      19.65 3.00 female

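
      The three steps can be tried end to end on a tiny data frame. This is a minimal sketch: the `tips` data frame below is a hypothetical stand-in built from the three rows shown on the slide, not the real data set.

      ```r
      library(reshape2)

      # Hypothetical stand-in for the tips data (three rows copied from the slide)
      tips <- data.frame(
        customer_id = 1:3,
        total_bill  = c(20.53, 21.70, 19.65),
        tip         = c(4.00, 4.30, 3.00),
        female      = c(0, 0, 1),
        male        = c(1, 1, 0)
      )

      # Steps 1 and 2: protect the good columns with id, melt the rest
      tips1 <- melt(tips, id = c("customer_id", "total_bill", "tip"))

      # The new "variable" column holds the old column names (female / male)
      names(tips1)[4] <- "gender"

      # Step 3: keep only the true rows (value == 1) and drop the value column
      tips1 <- subset(tips1, value == 1)[, 1:4]
      tips1
      ```

      The rows come back in melt order (all female rows, then all male rows), so sort by customer_id if the original order matters.
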
  8. Your turn Use melt() to combine the genre columns of movies.csv into one column that displays each movie’s genre.
  9. Solution:

          movies <- read.csv("movies.csv.bz2", stringsAsFactors = FALSE)
          library(reshape2)
          m.movies <- melt(movies, id = names(movies)[1:17])
          names(m.movies)[18] <- "genre"
          movies <- subset(m.movies, value == 1)[ , 1:18]

  10. Cast
  11. What is the difference between these data frames?

          subject     age  weight  height
          John Smith   33      90    1.87
          Mary Smith                 1.54

          subject     variable  value
          John Smith  age       33
          John Smith  weight    90
          John Smith  height    1.87
          Mary Smith  height    1.54

          variable  John Smith  Mary Smith
          age       33
          weight    90
          height    1.87        1.54

  12. The same three data frames: same measurements, different arrangements
  13. dcast (or cast in reshape) transforms a melted data frame into whatever arrangement you like. The data frame must contain a column named "value".

          smiths
          m.smiths <- melt(smiths[ , -2])
          dcast(m.smiths, subject ~ variable)
          dcast(m.smiths, variable ~ subject)

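
      To see both arrangements side by side, here is a small sketch. It rebuilds `smiths` by hand from the values shown on the earlier slides (an assumption made so the example is self-contained) and passes the id column to melt() explicitly.

      ```r
      library(reshape2)

      # Stand-in for the smiths data frame, values copied from the slides
      smiths <- data.frame(
        subject = c("John Smith", "Mary Smith"),
        time    = c(1, 1),
        age     = c(33, NA),
        weight  = c(90, NA),
        height  = c(1.87, 1.54)
      )

      # Drop the time column, then melt with subject as the only id column
      m.smiths <- melt(smiths[, -2], id = "subject")

      # One row per subject, one column per variable
      wide_by_subject <- dcast(m.smiths, subject ~ variable)

      # One row per variable, one column per subject
      wide_by_variable <- dcast(m.smiths, variable ~ subject)

      wide_by_subject
      wide_by_variable
      ```
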
  14. General strategy Think of dcast() as making a table
  15. dcast(m.smiths, subject ~ variable): the first argument (m.smiths) is the melted dataset
  16. dcast(m.smiths, subject ~ variable): the left side of ~ (subject) says what IDs to put in the left hand column
  17. dcast(m.smiths, subject ~ variable): the right side of ~ (variable) says what IDs to put along the top
  18. dcast(m.smiths, subject ~ variable): whatever was in the value column will get spread across the cells of the table
  19. Multiple variables

          m.smiths$gender <- rep(c("male", "female"), 3)
          dcast(m.smiths, subject + gender ~ variable)

      subject becomes left hand column 1 and gender becomes left hand column 2.
  20. Multiple variables: what does dcast(m.smiths, subject ~ gender + variable) give?
  21. Multiple variables: dcast(m.smiths, subject ~ gender + variable) still has only one header row. The names get combined.
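
      Both multi-variable layouts can be checked with a short sketch. The data frame is a hand-built stand-in for smiths, and gender is derived from the subject name with ifelse() (my substitution for rep(), so the example does not depend on row order).

      ```r
      library(reshape2)

      # Stand-in for smiths without the time column (values from the slides)
      smiths <- data.frame(
        subject = c("John Smith", "Mary Smith"),
        age     = c(33, NA),
        weight  = c(90, NA),
        height  = c(1.87, 1.54)
      )
      m.smiths <- melt(smiths, id = "subject")

      # Add a second identifying variable to the melted data
      m.smiths$gender <- ifelse(m.smiths$subject == "John Smith", "male", "female")

      # Two variables on the left of ~ : two left hand columns
      two_left <- dcast(m.smiths, subject + gender ~ variable)

      # Two variables on the right of ~ : one header row with combined names
      combined_top <- dcast(m.smiths, subject ~ gender + variable)

      names(two_left)
      names(combined_top)
      ```
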
  22. Your turn: Use reshape2 and the smiths data set to create the following tables

            variable John Smith Mary Smith
          1      age      33.00         NA
          2   weight      90.00         NA
          3   height       1.87       1.54

               subject variable 1.54 1.87
          1 John Smith     time   NA    1
          2 John Smith      age   NA   33
          3 John Smith   weight   NA   90
          4 Mary Smith     time    1   NA
          5 Mary Smith      age   NA   NA
          6 Mary Smith   weight   NA   NA

  23. Aggregating. Q: Often reshaping a data set will lead to two values being put into the same cell. How do we combine them? For example,

          game  player       variable       value
          1     Yao Ming     shot attempts  16
          2     Yao Ming     shot attempts  12
          2     Jordan Hill  shot attempts   9

          player       shot attempts
          Jordan Hill  9
          Yao Ming     ?

  24. Aggregating. A: However you want.

          player       shot attempts
          Jordan Hill   9
          Yao Ming     28

          dcast(data, left ~ top, agg.method)

      agg.method is the R function to use to combine the values.
  25. For example:

          df <- data.frame(game = c(1, 2, 2),
                           player = c("Yao Ming", "Yao Ming", "Jordan Hill"),
                           variable = c("shot attempts", "shot attempts", "shot attempts"),
                           value = c(16, 12, 9))
          dcast(df, player ~ variable, sum)

  26. Useful aggregating functions: length (default), the number of times the combination appeared; sum, the total of all values in that cell; mean, the average of all values in that cell. Remember to use na.rm = T.
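
      The three aggregating functions can be compared on the shot attempts data from the slides. A minimal sketch; the result names (totals, averages, counts) are mine.

      ```r
      library(reshape2)

      # Shot attempts data from the slides
      df <- data.frame(
        game     = c(1, 2, 2),
        player   = c("Yao Ming", "Yao Ming", "Jordan Hill"),
        variable = "shot attempts",
        value    = c(16, 12, 9)
      )

      # Same table, three different ways of combining values that share a cell
      totals   <- dcast(df, player ~ variable, sum)
      averages <- dcast(df, player ~ variable, mean, na.rm = TRUE)
      counts   <- dcast(df, player ~ variable, length)

      totals
      averages
      counts
      ```

      Extra arguments such as na.rm = TRUE are handed on to the aggregating function.
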
  27. Your turn Use the movie data set to create a table that shows: 1. The total number of movies in each genre by year. (hint: years in the rows, genres in the columns) 2. The average rating, length, and budget of movies from each year (hint: years in rows)
  28. Solution:

          dcast(movies, year ~ genre, length)

          movies1 <- melt(movies[ , c(2:5)], id = "year")
          movies1 <- dcast(movies1, year ~ variable, mean, na.rm = T)
          qplot(year, length, data = movies1, geom = "line")

  29. Margins. It's often useful to see the total for each row and column. These totals are known as margins or marginal distributions.
  30. Margins. To add a margin column:

          dcast(data, left ~ top, margins = "name of last variable on left side of ~")

      To add a margin row:

          dcast(data, left ~ top, margins = "name of last variable on right side of ~")

      To add both:

          dcast(data, left ~ top, margins = c("name1", "name2"))

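
      A quick way to see what margins look like is to ask for all of them at once. This sketch uses margins = TRUE, which to my understanding of reshape2 computes every possible margin; the totals show up under the label "(all)". The data frame is the shot attempts example, reshaped by player and game.

      ```r
      library(reshape2)

      # Shot attempts data from the slides
      df <- data.frame(
        game   = c(1, 2, 2),
        player = c("Yao Ming", "Yao Ming", "Jordan Hill"),
        value  = c(16, 12, 9)
      )

      # Players in rows, games in columns, totals in an "(all)" row and column
      margin_table <- dcast(df, player ~ game, sum, margins = TRUE)
      margin_table
      ```

      Passing variable names instead of TRUE, as described above, restricts which margins are computed.
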
  31. Your turn Add a margin row and column to your table of genres by year.
  32. Making tables with R
  33. Problem: Data analysis done in R -> you + hand entry -> Tables published in LaTeX
  34. Solution: Data analysis done in R -> xtable -> Tables published in LaTeX
  35. xtable converts objects in R to a table in LaTeX code with xtable() and print()

          library(xtable)
          head(smiths)
          print(xtable(smiths))
          print(xtable(smiths), floating = FALSE)

  36. Captions: print(xtable(smiths, caption = "John is taller than Mary."))
  37. Centering: print(xtable(smiths), latex.environments = "center")
  38. Column alignment:

          smith.table <- xtable(smiths)
          align(smith.table) <- "|cc|cccc|"
          print(smith.table)

  39. Labels: print(xtable(smiths, label = "tab:smith"))
  40. Suppress row names: print(xtable(smiths), include.rownames = FALSE)
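
      These options can be combined in one call. A minimal sketch, using a hand-built stand-in for smiths; capture.output() is only there so the generated LaTeX can be inspected as a character vector (latex_lines is my name for it).

      ```r
      library(xtable)

      # Stand-in for the smiths data frame (values from the slides)
      smiths <- data.frame(
        subject = c("John Smith", "Mary Smith"),
        time    = c(1, 1),
        age     = c(33, NA),
        weight  = c(90, NA),
        height  = c(1.87, 1.54)
      )

      # Caption and label are set when the xtable object is created
      smith.table <- xtable(smiths,
                            caption = "John is taller than Mary.",
                            label   = "tab:smith")

      # Row names are suppressed at print time; capture the LaTeX as text
      latex_lines <- capture.output(print(smith.table, include.rownames = FALSE))
      cat(latex_lines, sep = "\n")
      ```
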
  41. More: Remember that variable names like "year_2010" won't print out in LaTeX. You'll have to change them to "year\_2010" first. For more examples of xtable, visit: vignettes/xtableGallery.pd
  42. Your turn: Create LaTeX code for the genre by year table. Give the table an appropriate caption and label and remove row names.
  43. Solution:

          genres <- dcast(movies, year ~ genre, length)
          print(xtable(genres, label = "tab:genres",
                       caption = "Table of genres by year. Action movies win."),
                include.rownames = FALSE)

  44. Advanced table styling
  45. 6 rules for pretty tables From 1. Never use vertical lines 2. Never use double lines 3. Put units in the column heading, not in body of table 4. Always precede a decimal point by a digit 5. Never use "ditto" signs to repeat a previous value 6. Use significant digits consistently within columns
  46. booktabs package: booktabs is a LaTeX package that provides tools for making tables even better. Specifically it provides \toprule and \bottomrule, which are thicker and provide better spacing than using \hline to begin and end your tables. \midrule is thinner and can be used within the table.
  47. booktabs package: To use \toprule, \bottomrule, and \midrule, first declare \usepackage{booktabs} at the top of your tex document.
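
      The booktabs rules can also be produced directly from R. This sketch assumes a version of xtable whose print method accepts booktabs = TRUE (an option added after these slides were written); the data frame is again a hand-built stand-in for smiths.

      ```r
      library(xtable)

      # Stand-in for the smiths data frame (values from the slides)
      smiths <- data.frame(
        subject = c("John Smith", "Mary Smith"),
        age     = c(33, NA),
        weight  = c(90, NA),
        height  = c(1.87, 1.54)
      )

      # booktabs = TRUE swaps the default \hline rules for booktabs rules
      booktabs_lines <- capture.output(
        print(xtable(smiths), booktabs = TRUE, include.rownames = FALSE)
      )
      cat(booktabs_lines, sep = "\n")
      ```

      The tex document that includes this output still needs \usepackage{booktabs} in its preamble.
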
  48. This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit 3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.