Advertisement

17 polishing

Rice Unversity
Oct. 21, 2010
Advertisement

More Related Content

Advertisement

17 polishing

  1. Stat405 Polishing graphics for presentation Hadley Wickham Thursday, 21 October 2010
  2. Mark Schoenhals • Rice alum & former stat major • Visiting on Monday. Two talks: 11am (Four designs for ecommerce experiments), 4pm (Using data smarter faster: How one small store sold $500 million of music gear to over 1 million customers) • Undergrads are invited to lunch with him. Email me if you’re interested Thursday, 21 October 2010
  3. # Who is the most accurate shooter in the NBA? library(plyr) nba <- read.csv("nba-0809.csv.bz2") shots <- subset(nba, etype == "shot") success <- ddply(shots, c("team", "player"), summarise, total = length(player), made = sum(result == "made")) success$prop <- success$made / success$total success <- arrange(success, desc(prop)) Thursday, 21 October 2010
  4. team player total made prop 1 NYK Eddy Curry 1 1 1.0000000 2 OKC Steven Hill 1 1 1.0000000 3 SAS Marcus Williams 2 2 1.0000000 4 CHA Dwayne Jones 4 3 0.7500000 5 OKC Mouhamed Sene 7 5 0.7142857 6 SAS Pops Mensah-Bonsu 7 5 0.7142857 7 BOS J.R. Giddens 3 2 0.6666667 8 LAL Yue Sun 3 2 0.6666667 9 MIL Eddie Gill 9 6 0.6666667 10 DAL Erick Dampier 269 175 0.6505576 11 LAC DeAndre Jordan 132 85 0.6439394 12 ORL Adonal Foyle 11 7 0.6363636 13 BOS Bill Walker 58 36 0.6206897 14 POR Joel Przybilla 261 161 0.6168582 15 POR Shavlik Randolph 13 8 0.6153846 16 DET Amir Johnson 152 92 0.6052632 17 PHX Shaquille O'Neal 813 491 0.6039360 18 DEN Nene Hilario 708 427 0.6031073 19 ATL Solomon Jones 93 56 0.6021505 20 BOS Mikki Moore 85 51 0.6000000 Thursday, 21 October 2010
  5. 1.0 ● 0.8 ● ● ● ● ● ● ● ● ● ● ● 0.6 ●● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●● ● ●● ● ● ●● ● ● ●● ● ● ● ● ●● ●● ●● ● ● ● ● ●● ●● ●● ● ● ●● ● ● ●● ● ● prop ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ●● ● ●● ● ● ● ●● ● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ●●● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ●●●● ● ●● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ●● ● ● ●● ● ● ● ● ● ● ●● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ●● ●● ●● ●● ●● ●●● ● ● ● ● ●● ●● ● ● ● ●● ● ● ● ●● ●● ●● ● ● ● 0.4 ● ●●●● ●●● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ●● ●●● ● ● ● ● ● ● ● ●● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●●● ● ● ●● 0.2 ● ● ● ● 0.0 ● ● 500 1000 1500 total Thursday, 21 October 2010
  6. 1. ggplot() practice 2. Communication graphics 3. Polishing a plot: scales and themes Thursday, 21 October 2010
  7. 50 ● ● ●● ● ● ●● ● ● ●● ●● ● ● ● ●● ●● ● ● ●● ● ● ● ● ● ● ●●● 45 ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ●● ● ● ● ●● ●● ● ●● ● ●●● ● ● ●● ● ● ●●● ● ● ●● ● ●● ● ● ●●● ● ● ● ● ● ●● ●● ●● ●● % cancelled ● ●● ● ● ●● ● ● 0.0 ●● 40 ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ●●● ● ●●●●● ● 0.2 ● ●● ● ●● ●● ●● ●● ● 0.4 35 ●● ● ● ● ● ●●● ● ● ●● ● ●● ● ● ●● ● ● 0.6 ●● ●● ● ● ● ● ●●●● ● ●● ● ●●●● ● ●●●● ● 0.8 ● ● ●●● ● ● ●● ●● ●● ●●●●● ●● ● 1.0 30 ●●● ●●● ● ●●●● ● ● ●● ●● ● ●● ● ● ●● 25 ● −120 −110 −100 −90 −80 −70 Thursday, 21 October 2010
  8. Your turn Identify the data and layers in the flight delays data, then write the ggplot2 code to create it. library(ggplot2) library(maps) usa <- map_data("state") feb13 <- read.csv("delays-feb-13-2007.csv") Thursday, 21 October 2010
  9. ggplot(feb13, aes(long, lat)) + geom_point(aes(size = 1), colour = "white") + geom_polygon(aes(group = group), data = usa, colour = "grey70", fill = NA) + geom_point(aes(size = ncancelw / ntot), colour = alpha("black", 1/2)) # Polishing: up next last_plot() + scale_area("% cancelled", to = c(1, 8), breaks = seq(0, 1, by = 0.2), limits = c(0, 1)) scale_x_continuous("", limits = c(-125, -67)), scale_y_continuous("", limits = c(24, 50)) Thursday, 21 October 2010
  10. Communication graphics When you need to communicate your findings, you need to spend a lot of time polishing your graphics to eliminate distractions and focus on the story. Now it’s time to pay attention to the small stuff: labels, colour choices, tick marks... Thursday, 21 October 2010
  11. Context Thursday, 21 October 2010
  12. Consumption Thursday, 21 October 2010
  13. 36 What’s wrong with this 34 plot? bin 32 < 1000 < 1e4 lat < 1e5 < 1e6 30 < 1e7 28 26 −106 −104 −102 −100 −98 −96 −94 long Thursday, 21 October 2010
  14. Some problems Incorrect coordinate system Bad colour scheme Unnecessary axis labels Legend needs improvement: better title and better key labels No title Thursday, 21 October 2010
  15. Thursday, 21 October 2010
  16. 1. Scales: used to override default perceptual mappings, and tune parameters of axes and legends. 2. Themes: control presentation of non-data elements. 3. Saving your work: to include in reports, presentations, etc. Thursday, 21 October 2010
  17. Scales Thursday, 21 October 2010
  18. Scales Control how data is mapped to perceptual properties, and produce guides (axes and legends) which allow us to read the plot. Important parameters: name, breaks & labels, limits. Naming scheme: scale_aesthetic_name. All default scales have name continuous or discrete. Thursday, 21 October 2010
  19. # Default scales scale_x_continuous() scale_y_discrete() scale_colour_discrete() # Custom scales scale_colour_hue() scale_x_log10() scale_fill_brewer() # Scales with parameters scale_x_continuous("X Label", limits = c(1, 10)) scale_colour_gradient(low = "blue", high = "red") Thursday, 21 October 2010
  20. # First argument (name) controls axis label scale_y_continuous("Latitude") scale_x_continuous("") # Breaks and labels control tick marks scale_x_continuous(breaks = -c(106,100,94)) scale_fill_discrete(labels = c("< 1000" = "< 1000", "< 1e4" = "< 10,000", "< 1e5" = "< 100,000", "< 1e6" = "< 1,000,000", "< 1e7" = "1,000,000+")) scale_y_continuous(breaks = NA) # Limits control range of data scale_y_continuous(limits = c(26, 32)) # same as: p + ylim(26, 32) Thursday, 21 October 2010
  21. options(stringsAsFactors = FALSE) pop <- read.csv("tx-pop.csv") pop$bin <- cut(log10(pop$pop), breaks = 2:7, labels = c("< 1000", "< 1e4", "< 1e5", "< 1e6", "< 1e7")) borders <- read.csv("tx-borders.csv") choro <- join(borders, pop) qplot(long, lat, data = choro, geom = "polygon", group = group, fill = bin) Thursday, 21 October 2010
  22. Your turn Fix the axis and legend related problems that we have identified. Thursday, 21 October 2010
  23. qplot(long, lat, data = choro, geom = "polygon", group = group, fill = bin) + scale_fill_discrete("Population", labels = c("< 1000" = "< 1000" , "< 1e4" = "< 10,000", "< 1e5" = "< 100,000", "< 1e6" = "< 1,000,000", "< 1e7" = "1,000,000+")) + scale_x_continuous("") + scale_y_continuous("") + coord_map() Thursday, 21 October 2010
  24. Alternate scales Can also override the default choice of scales. You are most likely to want to do this with colour, as it is the most important aesthetic after position. Need a little background to be able to use colour effectively: colour spaces & colour blindness. Thursday, 21 October 2010
  25. Colour spaces Most familiar is rgb: defines colour as mixture of red, green and blue. Matches the physics of eye, but the brain does a lot of post-processing, so it’s hard to directly perceive these components. A more useful colour space is hcl: hue, chroma and luminance Thursday, 21 October 2010
  26. hue luminance chroma Thursday, 21 October 2010
  27. Default colour scales Discrete: evenly spaced hues of equal chroma and luminance. No colour appears more important than any other. Does not imply order. Continuous: evenly spaced hues between two colours. Thursday, 21 October 2010
  28. Colour blindness 7-10% of men are red-green colour “blind”. (Many other rarer types of colour blindness) Solutions: avoid red-green contrasts; use redundant mappings; test. I like color oracle: http://colororacle.cartography.ch Thursday, 21 October 2010
  29. Alternatives Discrete: brewer, grey Continuous: gradient2, gradientn Thursday, 21 October 2010
  30. Your turn Modify the fill scale to use a Brewer colour palette of your choice. (Hint: you will need to change the name of the scale) Use RColorBrewer::display.brewer.all to list all palettes. Thursday, 21 October 2010
  31. Themes Thursday, 21 October 2010
  32. Visual appearance So far have only discussed how to get the data displayed the way you want, focussing on the essence of the plot. Themes give you a huge amount of control over the appearance of the plot, the choice of background colours, fonts and so on. Thursday, 21 October 2010
  33. # Two built in themes. The default: qplot(carat, price, data = diamonds) # And a theme with a white background: qplot(carat, price, data = diamonds) + theme_bw() # Use theme_set if you want it to apply to every # future plot. theme_set(theme_bw()) # This is the best way of seeing all the default # options theme_bw() theme_grey() Thursday, 21 October 2010
  34. Plot title The plot theme also controls the plot title. You can change this for an individual plot by adding opts(title = "My title") Thursday, 21 October 2010
  35. Your turn Add an informative title and see what the plot looks like with a white background. Thursday, 21 October 2010
  36. Elements You can also make your own theme, or modify and existing. Themes are made up of elements which can be one of: theme_line, theme_segment, theme_text, theme_rect, theme_blank Gives you a lot of control over plot appearance. Thursday, 21 October 2010
  37. Elements Axis: axis.line, axis.text.x, axis.text.y, axis.ticks, axis.title.x, axis.title.y Legend: legend.background, legend.key, legend.text, legend.title Panel: panel.background, panel.border, panel.grid.major, panel.grid.minor Strip: strip.background, strip.text.x, strip.text.y Thursday, 21 October 2010
  38. # To modify a plot p + opts(plot.title = theme_text(size = 12, face = "bold")) p + opts(plot.title = theme_text(colour = "red")) p + opts(plot.title = theme_text(angle = 45)) p + opts(plot.title = theme_text(hjust = 1)) Thursday, 21 October 2010
  39. # If we want, we could also remove the axes: last_plot() + opts( axis.text.x = theme_blank(), axis.text.y = theme_blank(), axis.title.x = theme_blank(), axis.title.y = theme_blank(), axis.ticks.length = unit(0, "cm"), axis.ticks.margin = unit(0, "cm")) Thursday, 21 October 2010
  40. Thursday, 21 October 2010
Advertisement