Introduction to ggplot2

Published in: Design, Technology, Education

  1. Introduction to ggplot2
     Elegant Graphics for Data Analysis
     Maik Röder
     15.12.2011
     RUGBCN and Barcelona Code Meetup
     Friday 16 December 2011
  2. Data Analysis Steps
     • Prepare data
       • e.g. using the reshape framework for restructuring data
     • Plot data
       • e.g. using ggplot2 instead of base graphics and lattice
     • Summarize the data and refine the plots
     • Iterative process
  3. ggplot2 grammar of graphics
  4. Grammar
     • Oxford English Dictionary:
       • The fundamental principles or rules of an art or science
       • A book presenting these in methodical form. (Now rare; formerly common in the titles of books.)
     • System of rules underlying a given language
     • An abstraction which facilitates thinking, reasoning and communicating
  5. The grammar of graphics
     • Move beyond named graphics (e.g. “scatterplot”)
       • gain insight into the deep structure that underlies statistical graphics
     • Powerful and flexible system for
       • constructing abstract graphs (set of points) mathematically
       • realizing physical representations as graphics by mapping aesthetic attributes (size, colour) to graphs
     • Lacking openly available implementation
  6. Specification
     Concise description of components of a graphic
     • DATA - data operations that create variables from datasets. Reshaping using an Algebra with operations
     • TRANS - variable transformations
     • SCALE - scale transformations
     • ELEMENT - graphs and their aesthetic attributes
     • COORD - a coordinate system
     • GUIDE - one or more guides
  7. Birth/Death Rate
     Source: http://www.scalloway.org.uk/popu6.htm
  8. Excess birth (vs. death) rates in selected countries
     Source: The Grammar of Graphics, p. 13
  9. Grammar of Graphics
     Specification can be run in GPL, implemented in SPSS
         DATA: source("demographics")
         DATA: longitude, latitude = map(source("World"))
         TRANS: bd = max(birth - death, 0)
         COORD: project.mercator()
         ELEMENT: point(position(lon * lat), size(bd), color(color.red))
         ELEMENT: polygon(position(longitude * latitude))
     Source: The Grammar of Graphics, p. 13
  10. Rearrangement of Components
      Grammar of Graphics:
      • Data
      • Trans
      • Element
      • Scale
      • Guide
      • Coord
      Layered Grammar of Graphics:
      • Defaults
        • Data
        • Mapping
      • Layer
        • Data
        • Mapping
        • Geom
        • Stat
        • Position
      • Scale
      • Coord
      • Facet
  11. Layered Grammar of Graphics
      Implementation embedded in R using ggplot2
          w <- world
          d <- demographics
          d <- transform(d, bd = pmax(birth - death, 0))
          p <- ggplot(d, aes(lon, lat))
          p <- p + geom_polygon(data = w)
          p <- p + geom_point(aes(size = bd), colour = "red")
          p <- p + coord_map(projection = "mercator")
          p
  12. ggplot2
      • Author: Hadley Wickham
      • Open Source implementation of the layered grammar of graphics
      • High-level R package for creating publication-quality statistical graphics
      • Carefully chosen defaults following basic graphical design rules
      • Flexible set of components for creating any type of graphics
  13. ggplot2 installation
      • In R console:
          install.packages("ggplot2")
          library(ggplot2)
  14. qplot
      • Quickly plot something with qplot
        • for exploring ideas interactively
      • Same options as plot converted to ggplot2
          qplot(carat, price, data = diamonds, main = "Diamonds", asp = 1)
  16. Exploring with qplot
      First try:
          qplot(carat, price, data = diamonds)
      Log transform using functions on the variables:
          qplot(log(carat), log(price), data = diamonds)
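Transforming the variables, as in the slide above, labels the axes in log units. A minimal sketch of an alternative, which is not in the original slides: apply the transform in the scales with the standard ggplot2 functions `scale_x_log10()` and `scale_y_log10()`, so the axes stay labelled in the original units. The sketch uses base-10 logs in both variants so that the two plots are directly comparable.

```r
library(ggplot2)

# Variant 1: transform the variables; axes are labelled in log10 units
p_vars <- ggplot(diamonds, aes(log10(carat), log10(price))) +
  geom_point()

# Variant 2: transform the scales; points land in the same places,
# but the axes are labelled in carats and dollars
p_scales <- ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()

print(p_scales)
```

Both variants place the points identically; only the axis labelling differs.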
  18. from qplot to ggplot
          qplot(carat, price, data = diamonds, main = "Diamonds", asp = 1)

          p <- ggplot(diamonds, aes(carat, price))
          p <- p + geom_point()
          # opts() is defunct in current ggplot2; labs() and theme() replace it
          p <- p + labs(title = "Diamonds") + theme(aspect.ratio = 1)
          p
  19. Data and mapping
      • If you need to flexibly restructure and aggregate data beforehand, use Reshape
        • data is considered an independent concern
      • Need a mapping from variables to aesthetics
        • weight => x, height => y, age => size
      • Mappings are defined in scales
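The weight/height/age mapping on this slide can be sketched as follows. The data frame `people` and its values are invented purely for illustration; only the variable names come from the slide.

```r
library(ggplot2)

# Hypothetical data, invented for illustration only
people <- data.frame(
  weight = c(55, 62, 70, 81, 90),
  height = c(160, 165, 172, 180, 185),
  age    = c(21, 35, 28, 47, 52)
)

# weight => x, height => y, age => size
p <- ggplot(people, aes(x = weight, y = height, size = age)) +
  geom_point()

print(p)
```

The data frame itself is untouched by the mapping, which is one way to read "data is considered an independent concern".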
  20. Statistical Transformations
      • a stat transforms data
      • can add new variables to a dataset
        • that can be used in aesthetic mappings
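A minimal sketch of a stat adding variables, assuming a recent ggplot2 (the `after_stat()` helper needs version 3.3.0 or later, and `layer_data()` is used here only to inspect the result). The bin width of 0.5 is an arbitrary choice for the example.

```r
library(ggplot2)

# geom_histogram() uses the bin stat, which replaces the raw rows
# with one row per bin
p <- ggplot(diamonds, aes(carat)) +
  geom_histogram(binwidth = 0.5)

# layer_data() returns the layer's data after the stat has run
binned <- layer_data(p)
print(names(binned))  # includes computed variables such as count and density

# A computed variable can then be used in an aesthetic mapping
p_density <- ggplot(diamonds, aes(carat, after_stat(density))) +
  geom_histogram(binwidth = 0.5)

print(p_density)
```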
  21. stat_smooth
      • Fits a smoother to the data
      • Displays a smooth and its standard error
          ggplot(diamonds, aes(carat, price)) + geom_point() + geom_smooth()
  23. Geometric Object
      • Control the type of plot
      • A geom can only display certain aesthetics
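Both points can be sketched with the `economics` dataset that ships with ggplot2 (the dataset choice is an assumption, not from the slides). The same data and mapping give different plot types depending on the geom, and the exported `GeomPoint` and `GeomLine` objects report which aesthetics each geom understands.

```r
library(ggplot2)

# Same data and mapping, three different geoms
base <- ggplot(economics, aes(date, unemploy))
p_line  <- base + geom_line()
p_point <- base + geom_point()
p_area  <- base + geom_area()

# Each geom understands its own set of aesthetics
point_aes <- GeomPoint$aesthetics()
line_aes  <- GeomLine$aesthetics()

print(setdiff(point_aes, line_aes))  # understood by points but not lines
print(setdiff(line_aes, point_aes))  # understood by lines but not points
```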
  24. geom_histogram
      • Distribution of carats shown in a histogram
          ggplot(diamonds, aes(carat)) + geom_histogram()
  26. Position adjustments
      • Tweak positioning of geometric objects
      • Avoid overlaps
  27. position_jitter
      • Avoid overplotting by jittering points
          x <- c(0, 0, 0, 0, 0)
          y <- c(0, 0, 0, 0, 0)
          overplotted <- data.frame(x, y)
          ggplot(overplotted, aes(x, y)) +
            geom_point(position = position_jitter(width = 0.1, height = 0.1))
  29. Scales
      • Control mapping from data to aesthetic attributes
      • One scale per aesthetic
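"One scale per aesthetic" can be sketched with a plot that uses three aesthetics (x, y, colour) and overrides two of their scales. The `mpg` dataset and the particular colours are assumptions chosen for the example; `scale_x_continuous()` and `scale_colour_manual()` are standard ggplot2 functions.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point() +
  # scale for the x aesthetic: fix the axis limits
  scale_x_continuous(limits = c(1, 7)) +
  # scale for the colour aesthetic: choose one colour per drive type
  scale_colour_manual(values = c("4" = "red", "f" = "blue", "r" = "darkgreen"))
  # the y aesthetic keeps its default continuous scale

print(p)
```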
  30. scale_x_continuous, scale_y_continuous
          x <- c(0, 0, 0, 0, 0)
          y <- c(0, 0, 0, 0, 0)
          overplotted <- data.frame(x, y)
          ggplot(overplotted, aes(x, y)) +
            geom_point(position = position_jitter(width = 0.1, height = 0.1)) +
            scale_x_continuous(limits = c(-1, 1)) +
            scale_y_continuous(limits = c(-1, 1))
  32. Coordinate System
      • Maps the position of objects into the plane
      • Affect all position variables simultaneously
      • Change appearance of geoms (unlike scales)
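A minimal sketch of how a coordinate system changes the appearance of a geom, using the `mpg` dataset as an assumed example: the same bar layer is drawn as vertical bars, horizontal bars, or wedges depending only on the coord.

```r
library(ggplot2)

# One bar layer, default Cartesian coordinates: vertical bars
bars <- ggplot(mpg, aes(class)) + geom_bar()

# Swap x and y: the same bars are drawn horizontally
flipped <- bars + coord_flip()

# Polar coordinates: the same bars are drawn as wedges
polar <- bars + coord_polar()

print(polar)
```

Neither the data nor the scales change between the three plots.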
  33. coord_map
          library("maps")
          map <- map("nz", plot = FALSE)[c("x", "y")]
          m <- data.frame(map)
          n <- qplot(x, y, data = m, geom = "path")
          n
          # name the columns x and y so geom_point() can find them
          d <- data.frame(x = 0, y = 0)
          n + geom_point(data = d, colour = "red")
  35. Faceting
      • lay out multiple plots on a page
      • split data into subsets
      • plot subsets into different panels
  36. Facet Types
      • 2D grid of panels
      • 1D ribbon of panels wrapped into 2D
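The two facet types can be sketched with ggplot2's `facet_grid()` and `facet_wrap()`. The slide does not name the functions, so pairing them with the two descriptions is an assumption, as is the choice of the `mpg` dataset and its variables.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# 2D grid of panels: rows by drive type, columns by cylinder count
grid <- p + facet_grid(drv ~ cyl)

# 1D ribbon of panels (one per vehicle class) wrapped into three columns
ribbon <- p + facet_wrap(~ class, ncol = 3)

print(grid)
print(ribbon)
```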
  37. Faceting
          aesthetics <- aes(carat, ..density..)
          p <- ggplot(diamonds, aesthetics)
          p <- p + geom_histogram(binwidth = 0.2)
          p + facet_grid(clarity ~ cut)
  39. Faceting Formula
      . ~ .           no faceting
      . ~ a           single row, multiple columns
      b ~ .           single column, multiple rows
      a ~ b           multiple rows and columns
      . ~ a + b       multiple variables in rows and/or columns
      a + b ~ .       multiple variables in rows and/or columns
      a + b ~ c + d   multiple variables in rows and/or columns
  40. Scales in Facets
          facet_grid(. ~ cyl, scales = "free_x")
      scales value    free
      fixed           -
      free            x, y
      free_x          x
      free_y          y
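The `facet_grid()` call on this slide needs a plot to attach to. A minimal sketch, assuming the built-in `mtcars` dataset (which has a `cyl` column; the slide does not say which dataset was used):

```r
library(ggplot2)

p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()

# Default: every panel shares the same x and y ranges
fixed <- p + facet_grid(. ~ cyl)

# free_x: each column gets its own x range, y stays shared
free_x <- p + facet_grid(. ~ cyl, scales = "free_x")

print(free_x)
```

With `free_x`, each cylinder group fills its panel horizontally, while the shared y axis keeps the weights comparable across panels.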
  41. Layers
      • Iteratively update a plot
        • change a single feature at a time
      • Think about the high level aspects of the plot in isolation
      • Instead of choosing a static type of plot, create new types of plots on the fly
      • Cure against immobility
      • Developers can easily develop new layers without affecting other layers
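Iterative updating can be sketched as a sequence of single-feature changes. The `mpg` dataset and the particular steps are assumptions chosen for illustration.

```r
library(ggplot2)

# Start with data and mapping only; no layers yet
p <- ggplot(mpg, aes(displ, hwy))

# Add one layer: the raw points
p <- p + geom_point()

# Add a second layer on top: a loess smoother with its standard error band
p <- p + geom_smooth(method = "loess", formula = y ~ x)

# Change one more feature: split the plot by drive type
p <- p + facet_wrap(~ drv)

print(p)
```

Each step leaves the earlier ones untouched, so any intermediate `p` can be printed to check the plot before moving on.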
  42. Hierarchy of defaults
      Omitted layer    Default chosen by layer
      Stat             Geom
      Geom             Stat
      Mapping          Plot default
      Coord            Cartesian coordinates
      Scale            Chosen depending on aesthetic and type of variable
      Position         Linear scaling for continuous variables, integers for categorical variables
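Some of these defaults can be inspected directly. This is a sketch that peeks at the layer and plot objects ggplot2 builds; reading the `stat`, `geom`, and `coordinates` fields this way is an assumption about object internals that may differ between ggplot2 versions.

```r
library(ggplot2)

# Stat omitted: the geom supplies its default stat
hist_layer <- geom_histogram()
print(class(hist_layer$stat)[1])

# Geom omitted: the stat supplies its default geom
bin_layer <- stat_bin()
print(class(bin_layer$geom)[1])

# Coord omitted: the plot falls back to Cartesian coordinates
p <- ggplot(mpg, aes(displ, hwy)) + geom_point()
print(class(p$coordinates)[1])
```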
  43. Thanks!
      • Visit the ggplot2 homepage:
        • http://had.co.nz/ggplot2/
      • Get the ggplot2 book:
        • http://amzn.com/0387981403
      • Get the Grammar of Graphics book from Leland Wilkinson:
        • http://amzn.com/0387245448
