Upcoming SlideShare
×

# 19 tables

1,296 views
1,232 views

Published on

3 Likes
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

Views
Total views
1,296
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
20
0
Likes
3
Embeds 0
No embeds

No notes for slide

### 19 tables

1. 1. Today we will use the reshape2 and xtables packages, and the movies.csv.bz2 dataset. install.packages(c("reshape2", "xtables"))
2. 2. Garrett Grolemund Phd Student / Rice University Department of Statistics Tables
3. 3. 1. Melt and Cast (reshape) 2. Making tables with R (xtables) 3. Advanced styling (booktabs)
4. 4. Melt and Cast
5. 5. reshape2 Improved version of reshape - much faster. melt() works the same as in reshape.
6. 6. Recall Sometimes a single variable is strewn across multiple columns. We can recombine it with a 3 step process: 1. protect the good columns 2. melt the rest 3. subset out false rows
7. 7. tips1 <- melt(tips, id = c("customer_ID", "total_bill", "tip")) names(tips1)[4] <- "gender" tips1 <- subset(tips1, value == 1)[ , 1:4] customer_id total_bill tip female male 1 20.53 4.00 0 1 2 21.70 4.30 0 1 3 19.65 3.00 1 0 customer_id total_bill tip gender 1 20.53 4.00 male 2 21.70 4.30 male 3 19.65 3.00 female
8. 8. Your turn Use melt() to combine the genre columns of movies.csv into one column that displays each movie’s genre.
9. 9. movies <- read.csv("movies.csv.bz2", stringsAsFactors = FALSE) library(reshape) m.movies <-melt(movies, id = names(movies)[1:17]) names(movies)[18] <- "genre" movies <- subset(m.movies, value == 1)[ , 1:18]
10. 10. Cast
11. 11. subject age weight height John Smith 33 90 1.87 Mary Smith 1.54 subject variable value John Smith age 33 John Smith weight 90 John Smith height 1.87 Mary Smith height 1.54 variable John Smith Mary Smith age 33 weight 90 height 1.87 1.54 What is the difference between these data frames?
12. 12. subject age weight height John Smith 33 90 1.87 Mary Smith 1.54 subject variable value John Smith age 33 John Smith weight 90 John Smith height 1.87 Mary Smith height 1.54 variable John Smith Mary Smith age 33 weight 90 height 1.87 1.54 Same measurements, different arrangements
13. 13. dcast (or cast in reshape) transforms a melted dataframe into whatever arrangement you like. Dataframe must contain a column named “value.” smiths m.smiths <- melt(smiths[ , -2]) dcast(m.smiths, subject ~ variable) dcast(m.smiths, variable ~ subject)
14. 14. General strategy Think of dcast() as making a table
15. 15. dcast(m.smiths, subject ~ variable) melted dataset
16. 16. dcast(m.smiths, subject ~ variable) what IDs to put in left hand column
17. 17. dcast(m.smiths, subject ~ variable) what IDs to put along the top
18. 18. dcast(m.smiths, subject ~ variable) Whatever was in the value column will get spread across the cells of the table
19. 19. dcast(m.smith, subject + gender ~ variable) left hand column 1 1 2 left hand column 2 m.smiths\$gender <- rep(c(“male”, “female”), 2) Multiple variables
20. 20. dcast(m.smith, subject ~ gender + variable) Multiple variables ?
21. 21. dcast(m.smith, subject ~ gender + variable) still only one header row: names get combined Multiple variables
22. 22. Your turn Use reshape2 and the smiths data set to create the following tables variable John Smith Mary Smith 1 age 33.00 NA 2 weight 90.00 NA 3 height 1.87 1.54 subject variable 1.54 1.87 1 John Smith time NA 1 2 John Smith age NA 33 3 John Smith weight NA 90 4 Mary Smith time 1 NA 5 Mary Smith age NA NA 6 Mary Smith weight NA NA
23. 23. Q: Often reshaping a data set will lead to two values being put into the same cell. How do we combine them? For example, Aggregating game player variable value 1 Yao Ming shot attempts 16 2 Yao Ming shot attempts 12 2 Jordan Hill shot attempts 9 player shot attempts Jordan Hill 9 Yao Ming ?
24. 24. A: However you want. Aggregating player shot attempts Jordan Hill 9 Yao Ming 28 dcast(data, left ~ top, agg.method) R function to use to combine the values
25. 25. df <- data.frame(game = c(1,2,2), player = c("Yao Ming", "Yao Ming", "Jordan Hill"), variable = c("shot attempts", "shot attempts", "shot attempts"), value = c(16, 12, 9)) dcast(df, player ~ variable, sum)
26. 26. Useful aggregating functions length (default) number of times the combination appeared sum total number of all values in that cell mean average number of all values in that cell * remember to use na.rm = T
27. 27. Your turn Use the movie data set to create a table that shows: 1. The total number of movies in each genre by year. (hint: years in the rows, genres in the columns) 2. The average rating, length, and budget of movies from each year (hint: years in rows)
28. 28. dcast(movies, year ~ genre, length) movies1 <- melt(movies[ , c(2:5)], id = "year") movies1 <- dcast(movies1, year ~ variable, mean, na.rm = T) qplot(year, length, data = movies1, geom = "line")
29. 29. Its often useful to see the total for each row and column. These totals are known as margins or marginal distributions. Margins
30. 30. To add a margin column: dcast(data, left ~ top, margins = "name of last variable on left side of ~") To add a margin row: dcast(data, left ~ top, margins = "name of last variable on right side of ~") To add both: dcast(data, left ~ top, margins = c("name1", "name2")) Margins
31. 31. Your turn Add a margin row and column to your table of genres by year.
32. 32. Making tables with R
33. 33. Data analysis done in R Problem Tables published in latex?you + hand entry
34. 34. Data analysis done in R solution Tables published in latex?xtable
35. 35. library(xtable) xtable converts objects in R to a table in latex code with xtable() and print() head(smiths) print(xtable(smiths)) print(xtable(smiths), ﬂoating = FALSE)
36. 36. Captions print(xtable(smiths, caption = "John is taller than Mary."))
37. 37. Centering print(xtable(smiths), latex.environment = "center")
38. 38. Column Alignment smith.table <- xtable(smiths) align(smith.table) <- "|cc|cccc|" print(smith.table)
39. 39. Labels print(xtable(smiths, label = "tab:smith"))
40. 40. Suppress row names print(xtable(smiths), include.rownames = FALSE)
41. 41. more Remember that variable names like “year_2010” won’t print out in latex. You’ll have to change them to “year_2010” ﬁrst. For more examples of xtables, visit: cran.r-project.org/web/packages/xtable/ vignettes/xtableGallery.pd
42. 42. Create latex code for the genre by year table. Give the table an appropriate caption and label and remove row names. Your turn
43. 43. genres <- dcast(movies, year ~ genre, length) print(xtable(genres, label = "tab:genres", caption = "Table of genres by year. Action movies win."), include.rownames = FALSE)