Grammar Of Graphics: past, present, future

Transcript

  • 1. A grammar of graphics: past, present, and future Hadley Wickham Rice University http://had.co.nz/ Friday, 29 May 2009
  • 2. Past
  • 3. “If any number of magnitudes are each the same multiple of the same number of other magnitudes, then the sum is that multiple of the sum.” Euclid, ~300 BC
  • 4. “If any number of magnitudes are each the same multiple of the same number of other magnitudes, then the sum is that multiple of the sum.” Euclid, ~300 BC m(Σx) = Σ(mx)
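Written out term by term, the proposition quoted on slides 3 and 4 is the distributive law of multiplication over addition. As a sketch, with x_1, ..., x_n standing for the magnitudes and m for the common multiple (the indexed notation is an assumption added here; the slide gives only the compact form):

```latex
\[
  m x_1 + m x_2 + \cdots + m x_n \;=\; m\,(x_1 + x_2 + \cdots + x_n),
  \qquad\text{i.e.}\qquad
  \sum_{i=1}^{n} m x_i \;=\; m \sum_{i=1}^{n} x_i .
\]
```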
  • 5. The grammar of graphics
      • An abstraction which makes thinking, reasoning and communicating graphics easier
      • Developed by Leland Wilkinson, particularly in “The Grammar of Graphics” 1999/2005
      • One of the cornerstones of my research (I’ll talk about the others later)
  • 6. Present
  • 7. ggplot2
      • High-level R package for creating statistical graphics. A rich set of components + user-friendly wrappers
      • Inspired by “The Grammar of Graphics”, Leland Wilkinson, 1999
      • John Chambers award in 2006
      • Philosophy of ggplot
      • Examples from a recent paper
      • New methods facilitated by ggplot
  • 8. [Diagram: the grammar. Labels on the slide: Data, Trans, Element, Scale, Guide, Coord; Layer (Data, Mapping, Geom, Stat, Position), Scale, Coord, Facet; + defaults]
  • 9. Philosophy
      • Make graphics easier
      • Use the grammar to facilitate research into new types of display
      • Continuum of expertise:
          • start simple by using the results of the theory
          • grow in power by understanding the theory
          • begin to contribute new components
      • Orthogonal components and minimal special cases should make learning easy(er?)
  • 10. Examples
      • J. Hobbs, H. Wickham, H. Hofmann, and D. Cook. Glaciers melt as mountains warm: A graphical case study. Computational Statistics. Special issue for ASA Statistical Computing and Graphics Data Expo 2006.
      • Exploratory graphics created with GGobi, Mondrian, Manet, Gauguin and R, but needed consistent high-quality graphics that worked in black and white for publication
      • So... used ggplot2 to recreate the graphics
  • 11.
      qplot(long, lat, data = expo, geom = "tile", fill = ozone,
            facets = year ~ month) +
        scale_fill_gradient(low = "white", high = "black") +
        map
  • 12. [Plot: map with longitude on the x-axis and latitude on the y-axis]
      ggplot(df, aes(x = long + res * x, y = lat + res * y)) +
        map +
        geom_polygon(aes(group = interaction(long, lat)), fill = NA, colour = "black")
  • 13. [Plot: map with longitude on the x-axis and latitude on the y-axis]
      ggplot(df, aes(x = long + res * x, y = lat + res * y)) +
        map +
        geom_polygon(aes(group = interaction(long, lat)), fill = NA, colour = "black")
  • 14. [Plot: points on axes running from −1.0 to 1.0; no other text on slide]
  • 15. [Two plots: points on axes running from −1.0 to 1.0, and points on axes running from 0.0 to 1.0; no other text on slide]
  • 16. [Plot: map with longitude on the x-axis and latitude on the y-axis. Annotation: “Initially created with correlation tour”]
      ggplot(rexpo, aes(x = long + res * rtime, y = lat + res * rpressure)) +
        map +
        geom_line(aes(group = id))
  • 17.
      library(maps)
      outlines <- as.data.frame(map("world", xlim = -c(113.8, 56.2), ylim = c(-21.2, 36.2)))
      map <- c(
        geom_path(aes(x = x, y = y), data = outlines, colour = alpha("grey20", 0.2)),
        scale_x_continuous("", limits = c(-113.8, -56.2), breaks = c(-110, -85, -60)),
        scale_y_continuous("", limits = c(-21.2, 36.2))
      )
  • 18. [Plot: temperature (270 to 310) against date (1995 to 2000), one line per id]
      qplot(date, temperature, data = clustered, group = id, geom = "line") +
        pacific +
        elnino(clustered$temperature)
  • 19.
      pacific <- brush(cluster %in% c(5,6))
      brush <- function(condition, background = "grey60", brush = "red") {
        cond_string <- deparse(substitute(condition), width = 500)
        list(
          aes_string(colour = cond_string, order = cond_string, size = cond_string),
          scale_colour_manual(values = c("FALSE" = background, "TRUE" = brush)),
          opts(legend.position = "none")
        )
      }
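The `brush()` helper on slide 19 relies on `deparse(substitute(condition))` to turn an unevaluated R expression into a string that can later be mapped to an aesthetic. A minimal base-R sketch of just that capture step follows. `capture_condition()` and the data frame `toy` are hypothetical names introduced for illustration and are not part of the talk.

```r
# Hypothetical helper: capture an unevaluated condition as a string,
# the same trick brush() uses before handing the string to aes_string().
capture_condition <- function(condition) {
  # substitute() returns the expression the caller typed, unevaluated;
  # deparse() converts it to text. width.cutoff = 500 (the largest value
  # deparse() accepts) keeps long conditions on a single line.
  deparse(substitute(condition), width.cutoff = 500)
}

cond_string <- capture_condition(cluster %in% c(5, 6))

# Toy data (made up for this sketch) showing that the string evaluates
# to a TRUE/FALSE vector, which is what the manual colour scale keys on.
toy <- data.frame(id = 1:4, cluster = c(5, 2, 6, 3))
highlight <- eval(parse(text = cond_string), toy)
```

Because the condition ends up as a logical vector, the `"FALSE"` and `"TRUE"` names in `scale_colour_manual()` on the slide line up with its two possible values.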
  • 20. [No text on slide]
  • 21.
      ggplot(clustered, aes(x = long, y = lat)) +
        geom_tile(aes(width = 2.5, height = 2.5, fill = factor(cluster))) +
        facet_grid(cluster ~ .) +
        map +
        scale_fill_brewer(palette = "Spectral")

      qplot(date, value, data = clusterm, group = id, geom = "line",
            facets = cluster ~ variable, colour = factor(cluster)) +
        scale_y_continuous("", breaks = NA) +
        scale_colour_brewer(palette = "Spectral")
  • 22. New methods
      • Supplemental statistical summaries
      • Iterating between graphics and models
      • Tables of plots
      • Inspired by ideas of Tukey (and others)
      • Exploratory graphics, not (as) pretty
  • 23. Intro to data
      • Response of trees to gypsy moth attack
      • 5 varieties of tree: Dan-2, Sau-2, Sau-3, Wau-1, Wau-2
      • 2 treatments: NGM / GM
      • 2 nutrient levels: low / high
      • 5 reps
      • Measured: weight, N, tannin, salicylates
  • 24. [Plot: weight against genotype (Dan-2, Sau-2, Sau-3, Wau-1, Wau-2)]
      qplot(genotype, weight, data = b)
  • 25. [Plot: weight against genotype (Dan-2, Sau-2, Sau-3, Wau-1, Wau-2), points coloured by nutr (Low, High)]
      qplot(genotype, weight, data = b, colour = nutr)
  • 26. [Plot: weight against genotype, points coloured by nutr (Low, High), genotypes reordered to Sau-3, Dan-2, Sau-2, Wau-2, Wau-1]
  • 27. Comparing means
      • For inference, interested in comparing the means of the groups
      • But hard to do visually - eyes naturally compare ranges
      • What can we do? - Visual ANOVA
  • 28. Supplemental summaries
      • smry <- stat_summary(fun = "mean_cl_boot", conf.int = 0.68, geom = "crossbar", width = 0.3)
        [Annotation on mean_cl_boot: “From Hmisc”]
      • Adds another layer with summary statistics: mean + bootstrap estimate of standard error
      • Motivation: still exploratory, so minimise distributional assumptions, will model explicitly later
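As a rough illustration of the kind of summary that `mean_cl_boot` supplies to the crossbar, the base-R sketch below computes a mean together with a percentile bootstrap interval at the 68% level (roughly one standard error either side of the mean when the sampling distribution is close to normal). `boot_mean_interval()` is a hypothetical helper written for this note and `toy_weight` holds made-up numbers. Hmisc's own implementation may differ in detail.

```r
# Hypothetical helper: mean plus a percentile bootstrap interval.
boot_mean_interval <- function(x, conf.int = 0.68, B = 1000) {
  # Resample x with replacement B times and record each resample's mean.
  boot_means <- replicate(B, mean(sample(x, replace = TRUE)))
  tail_prob <- (1 - conf.int) / 2
  limits <- quantile(boot_means, c(tail_prob, 1 - tail_prob), names = FALSE)
  # y / ymin / ymax are the names a crossbar-style summary is drawn from.
  c(y = mean(x), ymin = limits[1], ymax = limits[2])
}

set.seed(2009)                         # make the resampling reproducible
toy_weight <- c(12, 18, 25, 31, 44)    # made-up values, not the talk's data
summary_row <- boot_mean_interval(toy_weight)
```

Resampling avoids assuming a particular distribution for the weights, which matches the slide's stated motivation of minimising distributional assumptions at the exploratory stage.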
  • 29. [Plot: weight against genotype (Sau-3, Dan-2, Sau-2, Wau-2, Wau-1), points coloured by nutr (Low, High)]
      qplot(genotype, weight, data = b, colour = nutr)
  • 30. [Plot: weight against genotype (Sau-3, Dan-2, Sau-2, Wau-2, Wau-1), points coloured by nutr (Low, High), with summary crossbars]
      qplot(genotype, weight, data = b, colour = nutr) + smry
  • 31. Iterating graphics and modelling
      • Clearly strong genotype effect. Is there a nutr effect? Is there a nutr-genotype interaction?
      • Hard to see from this plot - what if we remove the genotype main effect? What if we remove the nutr main effect?
      • How does this compare to an ANOVA?
  • 32. [Plot: weight against genotype (Sau-3, Dan-2, Sau-2, Wau-2, Wau-1), points coloured by nutr (Low, High), with summary crossbars]
      qplot(genotype, weight, data = b, colour = nutr) + smry
  • 33. [Plot: weight2 against genotype (Sau-3, Dan-2, Sau-2, Wau-2, Wau-1), points coloured by nutr (Low, High), with summary crossbars]
      b$weight2 <- resid(lm(weight ~ genotype, data = b))
      qplot(genotype, weight2, data = b, colour = nutr) + smry
  • 34. [Plot: weight3 against genotype (Sau-3, Dan-2, Sau-2, Wau-2, Wau-1), points coloured by nutr (Low, High), with summary crossbars]
      b$weight3 <- resid(lm(weight ~ genotype + nutr, data = b))
      qplot(genotype, weight3, data = b, colour = nutr) + smry
  • 35.
      anova(lm(weight ~ genotype * nutr, data = b))

                      Df  Sum Sq  Mean Sq  F value   Pr(>F)
      genotype         4   13331     3333    36.22  8.4e-13 ***
      nutr             1    1053     1053    11.44   0.0016 **
      genotype:nutr    4     144       36     0.39   0.8141
      Residuals       40    3681       92
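The sequence on slides 33 to 35 (strip out one main effect at a time, look at the residuals, then check against the ANOVA table) can be sketched in base R as below. The data frame `toy` is simulated purely for illustration with the same 5 x 2 x 5 layout as the design on slide 23. It is not the gypsy moth data, and its effect sizes are arbitrary assumptions.

```r
set.seed(1)

# Simulated stand-in for the talk's data frame `b`:
# 5 genotypes x 2 nutrient levels x 5 reps = 50 rows.
toy <- expand.grid(
  genotype = c("Dan-2", "Sau-2", "Sau-3", "Wau-1", "Wau-2"),
  nutr = c("Low", "High"),
  rep = 1:5
)

# Arbitrary effects: a genotype shift, a nutrient shift, plus noise.
toy$weight <- 20 + 8 * as.integer(toy$genotype) +
  5 * (toy$nutr == "High") + rnorm(nrow(toy), sd = 5)

# Step 1: remove the genotype main effect and keep what is left over.
toy$weight2 <- resid(lm(weight ~ genotype, data = toy))

# Step 2: remove the nutr main effect as well.
toy$weight3 <- resid(lm(weight ~ genotype + nutr, data = toy))

# Residuals from step 1 average to zero within each genotype, so any
# pattern left in a plot of weight2 comes from nutr, the interaction, or noise.
genotype_means <- tapply(toy$weight2, toy$genotype, mean)

# The formal counterpart: sequential ANOVA table for the full model.
fit_table <- anova(lm(weight ~ genotype * nutr, data = toy))
```

Plotting `weight2` and then `weight3` against genotype, coloured by `nutr`, would mirror slides 33 and 34. The degrees of freedom in `fit_table` (4, 1, 4, 40) match the table on slide 35 because the simulated layout is the same.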
  • 36. Graphics ➙ Model
      • In the previous example, we used graphics to iteratively build up a model - a la stepwise regression!
      • But: here interested in gestalt, not accurate prediction, and must remember that this is just one possible model
      • What about model ➙ graphics?
  • 37. Model ➙ Graphics
      • If we model first, we need graphical tools to summarise model results, e.g. post-hoc comparison of levels
      • We can do better than SAS! But it’s hard work: effects, multcomp and multcompView
      • Rich research area
  • 38. [Plot: weight against genotype (Sau3, Dan2, Sau2, Wau2, Wau1), points coloured by nutr (Low, High), with post-hoc group letters a, a, b, bc, c along the bottom]
  • 39. [Plot: weight against genotype (Sau3, Dan2, Sau2, Wau2, Wau1), points coloured by nutr (Low, High), with post-hoc group letters a, a, b, bc, c along the bottom]
      ggplot(b, aes(x = genotype, y = weight)) +
        geom_hline(intercept = mean(b$weight)) +
        geom_crossbar(aes(y = fit, min = lower, max = upper), data = geffect) +
        geom_point(aes(colour = nutr)) +
        geom_text(aes(label = group), data = geffect)
  • 40. Tables of plots
      • Often interested in marginal, as well as conditional, relationships
      • Or comparing one subset to the whole, rather than to other subsets
      • As in a contingency table, we often want to see the margins as well
  • 41. [Table of plots: n (2.0 to 4.5) against trt (GM, NGM), with a legend for trt (NGM, GM); columns by genotype (Dan-2, Sau-2, Sau-3, Wau-1, Wau-2), rows by nutrient level (High, Low)]
  • 42. [Table of plots: n (2.0 to 4.5) against trt (GM, NGM), with a legend for trt (NGM, GM); columns by genotype (Dan-2, Sau-2, Sau-3, Wau-1, Wau-2) plus an “(all)” margin, rows by nutrient level (High, Low) plus an “(all)” margin]
  • 43. Arranging plots
      • Facilitate comparisons of interest
      • Small differences need to be closer together (big differences can be far apart)
      • Connections to model?
  • 44. Summary
      • Need to move beyond canned statistical graphics to experimenting with new graphical methods
      • Strong links between graphics and models, how can we best use them?
      • Static graphics often aren’t enough
  • 45. Future
  • 46. [Diagram of projects: GGobi, ggplot2, reshape, plyr, rggobi, data expo 09, data sets, classifly, meifly, clusterfly]
  • 47. [Diagram of projects: GGobi, ggplot2, reshape, plyr, rggobi, data expo 09, data sets, classifly, meifly, clusterfly; grouped under the headings “Foundations of statistical graphics”, “New methods” and “Data analysis”]
  • 48. [Diagram of projects: GGobi, ggplot2, reshape2, ??, plyr, A grammar of interactive graphics, ??, rggobi, data expo 09, data sets, classifly, meifly, clusterfly, Product/area graphics, Tourr; grouped under the headings “Foundations of statistical graphics”, “New methods” and “Data analysis”]
  • 49. Questions?