- 1. A grammar of graphics: past, present, and future. Hadley Wickham, Rice University (http://had.co.nz/). Friday, 29 May 2009
- 2. Past
- 3. “If any number of magnitudes are each the same multiple of the same number of other magnitudes, then the sum is that multiple of the sum.” Euclid, ~300 BC
- 4. “If any number of magnitudes are each the same multiple of the same number of other magnitudes, then the sum is that multiple of the sum.” Euclid, ~300 BC. m(Σx) = Σ(mx)
- 5. The grammar of graphics
  - An abstraction which makes thinking about, reasoning about and communicating graphics easier
  - Developed by Leland Wilkinson, particularly in “The Grammar of Graphics” 1999/2005
  - One of the cornerstones of my research (I’ll talk about the others later)
- 6. Present
- 7. ggplot2
  - High-level R package for creating statistical graphics. A rich set of components + user-friendly wrappers
  - Inspired by “The Grammar of Graphics”, Leland Wilkinson, 1999
  - John Chambers award in 2006
  - Philosophy of ggplot
  - Examples from a recent paper
  - New methods facilitated by ggplot
- 8. [Diagram of the grammar. Component labels: Data, Trans, Element, Scale, Guide, Coord, Layer, Mapping, Geom, Stat, Position, Facet, plus defaults]
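As a rough illustration of how these components fit together, the sketch below spells out each piece of a single-layer plot in current ggplot2 syntax. The built-in `mtcars` data set, the chosen variables and the axis titles are assumptions made for the example; they do not come from the talk.

```r
library(ggplot2)

# Defaults: data and aesthetic mapping shared by all layers.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # One layer = geom + stat + position (data and mapping are inherited).
  layer(geom = "point", stat = "identity", position = "identity") +
  # Scales control how data values map to aesthetics and how guides are labelled.
  scale_x_continuous(name = "Weight (1000 lb)") +
  scale_y_continuous(name = "Miles per gallon") +
  # Coordinate system.
  coord_cartesian() +
  # Facets split the data into one panel per value of cyl.
  facet_wrap(~ cyl)
```

Printing `p` draws the plot. Shortcuts such as `geom_point()` build the same kind of layer with the stat and position filled in by default.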
- 9. Philosophy
  - Make graphics easier
  - Use the grammar to facilitate research into new types of display
  - Continuum of expertise:
    - start simple by using the results of the theory
    - grow in power by understanding the theory
    - begin to contribute new components
  - Orthogonal components and minimal special cases should make learning easy(er?)
- 10. Examples
  - J. Hobbs, H. Wickham, H. Hofmann, and D. Cook. Glaciers melt as mountains warm: A graphical case study. Computational Statistics. Special issue for ASA Statistical Computing and Graphics Data Expo 2006.
  - Exploratory graphics created with GGobi, Mondrian, Manet, Gauguin and R, but needed consistent high-quality graphics that worked in black and white for publication
  - So... used ggplot2 to recreate the graphics
- 11. qplot(long, lat, data = expo, geom = "tile", fill = ozone, facets = year ~ month) +
      scale_fill_gradient(low = "white", high = "black") + map
- 12. [Figure: map with x axis from −110 to −60 and y axis from −20 to 30]
      ggplot(df, aes(x = long + res * x, y = lat + res * y)) + map +
      geom_polygon(aes(group = interaction(long, lat)), fill = NA, colour = "black")
- 13. [Figure: map with x axis from −110 to −60 and y axis from −20 to 30]
      ggplot(df, aes(x = long + res * x, y = lat + res * y)) + map +
      geom_polygon(aes(group = interaction(long, lat)), fill = NA, colour = "black")
- 14. [Figure: plot with both axes running from −1.0 to 1.0]
- 15. [Figure: two plots, one with both axes from −1.0 to 1.0 and one with both axes from 0.0 to 1.0]
- 16. [Figure: map with x axis from −110 to −60 and y axis from −20 to 30. Annotation: “Initially created with correlation tour”]
      ggplot(rexpo, aes(x = long + res * rtime, y = lat + res * rpressure)) + map +
      geom_line(aes(group = id))
- 17. library(maps)
      outlines <- as.data.frame(map("world", xlim = -c(113.8, 56.2), ylim = c(-21.2, 36.2)))
      map <- c(
        geom_path(aes(x = x, y = y), data = outlines, colour = alpha("grey20", 0.2)),
        scale_x_continuous("", limits = c(-113.8, -56.2), breaks = c(-110, -85, -60)),
        scale_y_continuous("", limits = c(-21.2, 36.2))
      )
- 18. [Figure: temperature (270 to 310) against date (1995 to 2000)]
      qplot(date, temperature, data = clustered, group = id, geom = "line") + pacific + elnino(clustered$temperature)
- 19. brush <- function(condition, background = "grey60", brush = "red") {
        cond_string <- deparse(substitute(condition), width = 500)
        list(
          aes_string(colour = cond_string, order = cond_string, size = cond_string),
          scale_colour_manual(values = c("FALSE" = background, "TRUE" = brush)),
          opts(legend.position = "none")
        )
      }
      pacific <- brush(cluster %in% c(5, 6))
- 20. [Figure only; no text on slide]
- 21. ggplot(clustered, aes(x = long, y = lat)) +
      geom_tile(aes(width = 2.5, height = 2.5, fill = factor(cluster))) +
      facet_grid(cluster ~ .) + map + scale_fill_brewer(palette = "Spectral")
      qplot(date, value, data = clusterm, group = id, geom = "line", facets = cluster ~ variable, colour = factor(cluster)) +
      scale_y_continuous("", breaks = NA) + scale_colour_brewer(palette = "Spectral")
- 22. New methods
  - Supplemental statistical summaries
  - Iterating between graphics and models
  - Tables of plots
  - Inspired by ideas of Tukey (and others)
  - Exploratory graphics, not (as) pretty
- 23. Intro to data
  - Response of trees to gypsy moth attack
  - 5 varieties of tree: Dan-2, Sau-2, Sau-3, Wau-1, Wau-2
  - 2 treatments: NGM / GM
  - 2 nutrient levels: low / high
  - 5 reps
  - Measured: weight, N, tannin, salicylates
- 24. [Figure: weight (10 to 70) by genotype (Dan−2, Sau−2, Sau−3, Wau−1, Wau−2)]
      qplot(genotype, weight, data = b)
- 25. [Figure: weight (10 to 70) by genotype (Dan−2, Sau−2, Sau−3, Wau−1, Wau−2), points coloured by nutr (Low, High)]
      qplot(genotype, weight, data = b, colour = nutr)
- 26. [Figure: weight (10 to 70) by genotype, points coloured by nutr (Low, High), with genotypes reordered: Sau−3, Dan−2, Sau−2, Wau−2, Wau−1]
- 27. Comparing means
  - For inference, interested in comparing the means of the groups
  - But hard to do visually: eyes naturally compare ranges
  - What can we do? Visual ANOVA
- 28. Supplemental summaries
  - smry <- stat_summary(fun = "mean_cl_boot", conf.int = 0.68, geom = "crossbar", width = 0.3) (annotation: “From Hmisc”)
  - Adds another layer with summary statistics: mean + bootstrap estimate of standard error
  - Motivation: still exploratory, so minimise distributional assumptions; will model explicitly later
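To make the bootstrap summary concrete, here is a minimal base R sketch of a bootstrap interval for a group mean, roughly the kind of quantity `mean_cl_boot` reports. The helper name `boot_mean_ci`, the percentile method and the simulated weights are assumptions for illustration; the Hmisc implementation may differ in detail.

```r
# Hypothetical helper: percentile bootstrap interval for the mean of x.
boot_mean_ci <- function(x, conf.int = 0.68, B = 1000) {
  # Resample x with replacement B times and record each resample's mean.
  boot_means <- replicate(B, mean(sample(x, replace = TRUE)))
  alpha <- (1 - conf.int) / 2
  limits <- quantile(boot_means, c(alpha, 1 - alpha), names = FALSE)
  c(y = mean(x), ymin = limits[1], ymax = limits[2])
}

set.seed(1)
weights <- rnorm(10, mean = 40, sd = 10)  # simulated, not the talk's data
ci <- boot_mean_ci(weights)
ci
```

A 68% interval is used because, for an approximately normal sampling distribution, it corresponds to roughly the mean plus or minus one standard error.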
- 29. [Figure: weight (10 to 70) by genotype (Sau−3, Dan−2, Sau−2, Wau−2, Wau−1), points coloured by nutr (Low, High)]
      qplot(genotype, weight, data = b, colour = nutr)
- 30. [Figure: weight (10 to 70) by genotype (Sau−3, Dan−2, Sau−2, Wau−2, Wau−1), points coloured by nutr (Low, High), with the smry layer added]
      qplot(genotype, weight, data = b, colour = nutr) + smry
- 31. Iterating graphics and modelling
  - Clearly strong genotype effect. Is there a nutr effect? Is there a nutr-genotype interaction?
  - Hard to see from this plot. What if we remove the genotype main effect? What if we remove the nutr main effect?
  - How does this compare to an ANOVA?
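One way to "remove a main effect" is to fit a model containing only that effect and keep the residuals. The sketch below does this on simulated data; the data frame `d`, its genotype labels and its effect sizes are invented for illustration and are not the talk's data.

```r
set.seed(42)
# Simulated balanced design: 5 genotypes x 2 nutrient levels x 5 reps.
d <- expand.grid(
  genotype = factor(c("A", "B", "C", "D", "E")),
  nutr     = factor(c("Low", "High")),
  rep      = 1:5
)
d$weight <- 30 + 5 * as.integer(d$genotype) +
  ifelse(d$nutr == "High", 6, 0) + rnorm(nrow(d), sd = 4)

# Remove the genotype main effect: residuals average zero within each genotype.
d$weight2 <- resid(lm(weight ~ genotype, data = d))
genotype_means <- tapply(d$weight2, d$genotype, mean)

# What remains (here, the simulated nutr effect) is easier to see.
nutr_means <- tapply(d$weight2, d$nutr, mean)
nutr_means
```

Plotting `weight2` instead of `weight` then shows the nutrient effect without the large genotype differences dominating the scale.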
- 32. [Figure: weight (10 to 70) by genotype (Sau−3, Dan−2, Sau−2, Wau−2, Wau−1), points coloured by nutr (Low, High), with the smry layer added]
      qplot(genotype, weight, data = b, colour = nutr) + smry
- 33. [Figure: weight2 (−20 to 20) by genotype (Sau−3, Dan−2, Sau−2, Wau−2, Wau−1), points coloured by nutr (Low, High)]
      b$weight2 <- resid(lm(weight ~ genotype, data = b))
      qplot(genotype, weight2, data = b, colour = nutr) + smry
- 34. [Figure: weight3 (−20 to 10) by genotype (Sau−3, Dan−2, Sau−2, Wau−2, Wau−1), points coloured by nutr (Low, High)]
      b$weight3 <- resid(lm(weight ~ genotype + nutr, data = b))
      qplot(genotype, weight3, data = b, colour = nutr) + smry
- 35. anova(lm(weight ~ genotype * nutr, data = b))

                      Df  Sum Sq  Mean Sq  F value   Pr(>F)
      genotype         4   13331     3333    36.22  8.4e-13 ***
      nutr             1    1053     1053    11.44   0.0016 **
      genotype:nutr    4     144       36     0.39   0.8141
      Residuals       40    3681       92
- 36. Graphics ➙ Model
  - In the previous example, we used graphics to iteratively build up a model, à la stepwise regression!
  - But: here interested in gestalt, not accurate prediction, and must remember that this is just one possible model
  - What about model ➙ graphics?
- 37. Model ➙ Graphics
  - If we model first, we need graphical tools to summarise model results, e.g. post-hoc comparison of levels
  - We can do better than SAS! But it’s hard work: effects, multcomp and multcompView
  - Rich research area
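As a small starting point for model-first graphics, base R can already compute and draw post-hoc comparisons of levels. The sketch uses the built-in `PlantGrowth` data as a stand-in (an assumption; it is not the data from the talk) and `TukeyHSD()` rather than the packages named on the slide.

```r
# One-way model on a built-in data set (stand-in for the talk's data).
fit <- aov(weight ~ group, data = PlantGrowth)

# Pairwise differences between group means with family-wise 95% intervals.
hsd <- TukeyHSD(fit, "group", conf.level = 0.95)
hsd$group

# Base graphics display of the intervals; pairs whose interval excludes 0 differ.
plot(hsd)
```

The default plot shows only the pairwise differences, not the data, which is part of why richer displays that overlay comparisons on the raw points are worth the extra work.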
- 38. [Figure: weight (0 to 60) by genotype (Sau3, Dan2, Sau2, Wau2, Wau1), points coloured by nutr (Low, High), with group letters a, a, b, bc, c under the genotypes]
- 39. [Figure: weight (0 to 60) by genotype (Sau3, Dan2, Sau2, Wau2, Wau1), points coloured by nutr (Low, High), with group letters a, a, b, bc, c under the genotypes]
      ggplot(b, aes(x = genotype, y = weight)) +
      geom_hline(intercept = mean(b$weight)) +
      geom_crossbar(aes(y = fit, min = lower, max = upper), data = geffect) +
      geom_point(aes(colour = nutr)) +
      geom_text(aes(label = group), data = geffect)
- 40. Tables of plots
  - Often interested in marginal, as well as conditional, relationships
  - Or comparing one subset to the whole, rather than to other subsets
  - As in a contingency table, we often want to see margins as well
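In current ggplot2, one way to get such margins is the `margins` argument of `facet_grid()`. The sketch below uses the built-in `mtcars` data and arbitrary variables as stand-ins; both are assumptions for illustration, not the talk's data.

```r
library(ggplot2)

# Rows and columns of small multiples, plus "(all)" margins that pool
# across each faceting variable, like the totals of a contingency table.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(am ~ cyl, margins = TRUE)

p
```

The "(all)" row and column let each subset be compared with the whole, while the interior panels still show the conditional relationships.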
- 41. [Figure: table of plots of n (2.0 to 4.5) against trt (GM, NGM), coloured by trt. Columns: Dan−2, Sau−2, Sau−3, Wau−1, Wau−2. Rows: High, Low]
- 42. [Figure: table of plots of n (2.0 to 4.5) against trt (GM, NGM), coloured by trt. Columns: Dan−2, Sau−2, Sau−3, Wau−1, Wau−2, (all). Rows: High, Low, (all)]
- 43. Arranging plots
  - Facilitate comparisons of interest
  - Small differences need to be closer together (big differences can be far apart)
  - Connections to model?
- 44. Summary
  - Need to move beyond canned statistical graphics to experimenting with new graphical methods
  - Strong links between graphics and models: how can we best use them?
  - Static graphics often aren’t enough
- 45. Future
- 46. [Diagram of projects: GGobi, ggplot2, reshape, plyr, rggobi, data expo 09, data sets, classifly, meifly, clusterfly]
- 47. [Diagram of projects: GGobi, ggplot2, reshape, plyr, rggobi, data expo 09, data sets, classifly, meifly, clusterfly. Group labels: Foundations of statistical graphics, New methods, Data analysis]
- 48. [Diagram of projects: GGobi, ggplot2, reshape2 (??), plyr, A grammar of interactive graphics (??), rggobi, data expo 09, data sets, classifly, meifly, clusterfly, Product/area graphics, Tourr. Group labels: Foundations of statistical graphics, New methods, Data analysis]
- 49. Questions?
