Introduction to ggplot2

  • 1. Introduction to ggplot2: Elegant Graphics for Data Analysis
      • Maik Röder, 15.12.2011
      • RUGBCN and Barcelona Code Meetup
      • Friday 16 December 2011
  • 2. Data Analysis Steps
      • Prepare data, e.g. using the reshape framework for restructuring data
      • Plot data, e.g. using ggplot2 instead of base graphics and lattice
      • Summarize the data and refine the plots
      • Iterative process
  • 3. ggplot2: grammar of graphics
  • 4. Grammar
      • Oxford English Dictionary:
          • The fundamental principles or rules of an art or science
          • A book presenting these in methodical form. (Now rare; formerly common in the titles of books.)
          • System of rules underlying a given language
      • An abstraction which facilitates thinking, reasoning and communicating
  • 5. The grammar of graphics
      • Moves beyond named graphics (e.g. “scatterplot”)
      • Gains insight into the deep structure that underlies statistical graphics
      • A powerful and flexible system for
          • constructing abstract graphs (sets of points) mathematically
          • realizing physical representations as graphics by mapping aesthetic attributes (size, colour) to graphs
      • Lacking an openly available implementation
  • 6. Specification: a concise description of the components of a graphic
      • DATA: data operations that create variables from datasets (reshaping using an algebra with operations)
      • TRANS: variable transformations
      • SCALE: scale transformations
      • ELEMENT: graphs and their aesthetic attributes
      • COORD: a coordinate system
      • GUIDE: one or more guides
  • 7. Birth/Death Rate (figure). Source: http://www.scalloway.org.uk/popu6.htm
  • 8. Excess birth (vs. death) rates in selected countries (figure). Source: The Grammar of Graphics, p. 13
  • 9. The Grammar of Graphics specification can be run in GPL, as implemented in SPSS:
        DATA: source("demographics")
        DATA: longitude, latitude = map(source("World"))
        TRANS: bd = max(birth - death, 0)
        COORD: project.mercator()
        ELEMENT: point(position(lon * lat), size(bd), color(color.red))
        ELEMENT: polygon(position(longitude * latitude))
      Source: The Grammar of Graphics, p. 13
  • 10. Rearrangement of Components
      • Grammar of Graphics: Data, Trans, Element, Scale, Guide, Coord
      • Layered Grammar of Graphics: Defaults (Data, Mapping); Layer (Data, Mapping, Geom, Stat, Position); Scale; Coord; Facet
  • 11. Layered Grammar of Graphics: the same example, implemented in R using ggplot2
        w <- world          # world map polygons (placeholder; not a dataset shipped with ggplot2)
        d <- demographics   # birth and death rates (placeholder; not shipped with ggplot2)
        d <- transform(d, bd = pmax(birth - death, 0))
        p <- ggplot(d, aes(lon, lat))
        p <- p + geom_polygon(data = w)
        p <- p + geom_point(aes(size = bd), colour = "red")
        p <- p + coord_map(projection = "mercator")
        p
  • 12. ggplot2
      • Author: Hadley Wickham
      • Open-source implementation of the layered grammar of graphics
      • High-level R package for creating publication-quality statistical graphics
      • Carefully chosen defaults following basic graphical design rules
      • Flexible set of components for creating any type of graphics
  • 13. ggplot2 installation
      • In the R console:
        install.packages("ggplot2")
        library(ggplot2)
  • 14. qplot
      • Quickly plot something with qplot, for exploring ideas interactively
      • Accepts the same options as base plot(), converted to ggplot2
        qplot(carat, price, data = diamonds, main = "Diamonds", asp = 1)
  • 16. Exploring with qplot
      • First try:
        qplot(carat, price, data = diamonds)
      • Log transform using functions on the variables:
        qplot(log(carat), log(price), data = diamonds)
  • 18. From qplot to ggplot
        qplot(carat, price, data = diamonds, main = "Diamonds", asp = 1)
      • is equivalent to:
        p <- ggplot(diamonds, aes(carat, price))
        p <- p + geom_point()
        # title and aspect ratio; 2011-era releases wrote opts(title = "Diamonds", aspect.ratio = 1)
        p <- p + labs(title = "Diamonds") + theme(aspect.ratio = 1)
        p
  • 19. Data and mapping
      • If you need to flexibly restructure and aggregate data beforehand, use reshape
          • data is considered an independent concern
      • You need a mapping that says which variables are mapped to which aesthetics
          • e.g. weight => x, height => y, age => size
      • Scales define how the mapped data values become aesthetic values
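The weight => x, height => y, age => size mapping from slide 19 can be sketched as follows. The data frame `people` and its numbers are hypothetical, made up only to have something to map; `aes()` records which variable feeds which aesthetic, and `layer_data()` shows the values after the scales have been applied.

```r
library(ggplot2)

# Hypothetical toy data, invented only for this illustration
people <- data.frame(
  weight = c(60, 72, 85, 90),
  height = c(165, 178, 182, 175),
  age    = c(25, 40, 33, 58)
)

# aes() is the mapping: weight => x, height => y, age => size
p <- ggplot(people, aes(x = weight, y = height, size = age)) +
  geom_point()
p

# The mapped values after the scales are applied, one row per point
layer_data(p)[, c("x", "y", "size")]
```

The `size` column holds point sizes chosen by the size scale, not the raw ages.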
  • 20. Statistical Transformations
      • A stat transforms data
      • It can add new variables to a dataset
      • These new variables can be used in aesthetic mappings
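A minimal sketch of a stat adding variables, assuming ggplot2 3.3.0 or later (where `after_stat()` is the way to refer to computed variables): `stat_bin()`, the default stat of `geom_histogram()`, adds columns such as `count` and `density`, and here `density` is mapped to y.

```r
library(ggplot2)

# geom_histogram() uses stat_bin(), which adds computed variables
# (count, density, ...) to the data; after_stat() maps one of them to y
p <- ggplot(diamonds, aes(carat)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.2)
p

# The data as it leaves the stat: count and density are new columns
binned <- layer_data(p)
head(binned[, c("x", "count", "density")])
```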
  • 21. stat_smooth
      • Fits a smoother to the data
      • Displays a smooth and its standard error
        ggplot(diamonds, aes(carat, price)) + geom_point() + geom_smooth()
  • 23. Geometric Object
      • Controls the type of plot
      • A geom can only display certain aesthetics
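A sketch of slide 23 using the `mpg` dataset that ships with ggplot2: swapping the geom changes the kind of plot, and each geom's ggproto object (`GeomPoint`, `GeomBar`) can report the aesthetics it understands. The exact list returned by `$aesthetics()` may differ between ggplot2 versions.

```r
library(ggplot2)

# Same data and mapping, different geoms: the geom decides the plot type
by_class <- ggplot(mpg, aes(class, hwy))
by_class + geom_point()    # one point per car
by_class + geom_boxplot()  # one box per class (summaries from stat_boxplot)

# The aesthetics each geom can display
GeomPoint$aesthetics()  # includes "shape" and "size"
GeomBar$aesthetics()    # bars have no "shape"
```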
  • 24. geom_histogram
      • Distribution of carats shown in a histogram
        ggplot(diamonds, aes(carat)) + geom_histogram()
  • 26. Position adjustments
      • Tweak the positioning of geometric objects
      • Avoid overlaps
  • 27. position_jitter
      • Avoid overplotting by jittering points
        x <- c(0, 0, 0, 0, 0)
        y <- c(0, 0, 0, 0, 0)
        overplotted <- data.frame(x, y)
        ggplot(overplotted, aes(x, y)) +
          geom_point(position = position_jitter(width = 0.1, height = 0.1))
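Jittering is one position adjustment among several. A hedged sketch of the others on bar charts, using the `diamonds` data that comes with ggplot2 (the choice of `cut` and `clarity` is only illustrative):

```r
library(ggplot2)

# Bar counts for each cut, split by clarity: the position adjustment decides
# how the bars sharing one x value are arranged
bars <- ggplot(diamonds, aes(cut, fill = clarity))
bars + geom_bar()                    # default "stack": piled on top of each other
bars + geom_bar(position = "dodge")  # side by side
bars + geom_bar(position = "fill")   # stacked, each stack rescaled to height 1

# geom_jitter() is a shortcut for geom_point(position = "jitter")
ggplot(mpg, aes(cyl, hwy)) + geom_jitter(width = 0.1, height = 0)
```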
  • 29. Scales
      • Control the mapping from data to aesthetic attributes
      • One scale per aesthetic
  • 30. scale_x_continuous, scale_y_continuous
        x <- c(0, 0, 0, 0, 0)
        y <- c(0, 0, 0, 0, 0)
        overplotted <- data.frame(x, y)
        ggplot(overplotted, aes(x, y)) +
          geom_point(position = position_jitter(width = 0.1, height = 0.1)) +
          scale_x_continuous(limits = c(-1, 1)) +
          scale_y_continuous(limits = c(-1, 1))
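Scales can also transform data, an alternative to wrapping the variables in `log()` as on slide 16. A minimal sketch, assuming base-10 logs are acceptable: the stat and the points work on log10 values, while the axis labels stay in carats and dollars.

```r
library(ggplot2)

# Transform through the scales instead of through the variables:
# the layer sees log10(carat) and log10(price), the axes show original units
p <- ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()
p
```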
  • 32. Coordinate System
      • Maps the position of objects onto the plane
      • Affects all position variables simultaneously
      • Changes the appearance of geoms (unlike scales)
  • 33. coord_map
        library(maps)   # provides map(); coord_map() additionally needs the mapproj package
        nz <- data.frame(map("nz", plot = FALSE)[c("x", "y")])
        n <- qplot(x, y, data = nz, geom = "path")
        n + coord_map()
        d <- data.frame(x = c(0), y = c(0))
        n + geom_point(data = d, colour = "red")
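Slide 32's point that a coordinate system changes how geoms look, which a scale does not, can be illustrated with one bar chart in three coordinate systems. This is a sketch using `diamonds`; drawing all cuts as a single stacked bar is purely an illustrative choice.

```r
library(ggplot2)

# A single stacked bar of diamond counts, filled by cut
bar <- ggplot(diamonds, aes(x = factor(1), fill = cut)) +
  geom_bar(width = 1)

bar                             # Cartesian coordinates: a rectangle
bar + coord_polar(theta = "y")  # polar coordinates: the same bar becomes a pie chart
bar + coord_flip()              # x and y swapped: a horizontal bar
```

The data and scales are identical in all three plots; only the coordinate system differs.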
  • 35. Faceting
      • Lay out multiple plots on a page
      • Split data into subsets
      • Plot subsets into different panels
  • 36. Facet Types
      • 2D grid of panels: facet_grid()
      • 1D ribbon of panels wrapped into 2D: facet_wrap()
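A sketch contrasting the two facet types on the `mpg` data shipped with ggplot2; the faceting variables (`year`, `drv`, `class`) are chosen only for illustration.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# 2D grid: one row per model year, one column per drive type
p + facet_grid(year ~ drv)

# 1D ribbon: one panel per vehicle class, wrapped into rows of three panels
p + facet_wrap(~ class, ncol = 3)
```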
  • 37. Faceting
        aesthetics <- aes(carat, after_stat(density))
        p <- ggplot(diamonds, aesthetics)
        p <- p + geom_histogram(binwidth = 0.2)
        p + facet_grid(clarity ~ cut)
  • 39. Faceting Formula
        no faceting                                   . ~ .
        single row, multiple columns                  . ~ a
        single column, multiple rows                  b ~ .
        multiple rows and columns                     a ~ b
        multiple variables in rows and/or columns     . ~ a + b
                                                      a + b ~ .
                                                      a + b ~ c + d
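The formulas on slide 39 translate directly into `facet_grid()` calls. A sketch with `mpg`, where `cyl` and `drv` play the roles of the placeholder variables `a` and `b`:

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) + geom_point()

p + facet_grid(. ~ cyl)        # single row, one column per value of cyl
p + facet_grid(drv ~ .)        # single column, one row per value of drv
p + facet_grid(drv ~ cyl)      # rows by drv, columns by cyl
p + facet_grid(. ~ drv + cyl)  # two variables combined in the columns
```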
  • 40. Scales in Facets
        facet_grid(. ~ cyl, scales = "free_x")
        scales value    free axes
        "fixed"         none
        "free"          x, y
        "free_x"        x
        "free_y"        y
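A complete version of the slide 40 call, assuming the `cyl` in it refers to the built-in `mtcars` data (the slide shows neither the data nor the mapping; `mpg` and `wt` are picked here for illustration):

```r
library(ggplot2)

# One column per cylinder count; each panel gets its own x range,
# while the y axis stays shared across panels
p <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point() +
  facet_grid(. ~ cyl, scales = "free_x")
p
```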
  • 41. Layers
      • Iteratively update a plot
          • change a single feature at a time
      • Think about the high-level aspects of the plot in isolation
      • Instead of choosing a static type of plot, create new types of plots on the fly
      • A cure for immobility
      • Developers can easily develop new layers without affecting other layers
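A sketch of building a plot iteratively, one change per step, with the `mpg` data; every `+` returns a new plot object, so each intermediate `p` can be printed and inspected. The linear smoother (`method = "lm"`) and the axis labels are illustrative choices.

```r
library(ggplot2)

# Build the plot one change at a time
p <- ggplot(mpg, aes(displ, hwy))
p <- p + geom_point()                                # first layer: raw observations
p <- p + geom_smooth(method = "lm", formula = y ~ x) # second, independent layer
p <- p + aes(colour = drv)                           # new default mapping, used by both layers
p <- p + labs(x = "Displacement (litres)", y = "Highway miles per gallon")
p
```

Because the colour mapping is added to the plot defaults, the smoother is refitted separately for each drive type without touching the layer definitions.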
  • 42. Hierarchy of defaults
        Omitted layer component   Default chosen by
        Stat                      Geom
        Geom                      Stat
        Mapping                   Plot default
        Coord                     Cartesian coordinates
        Scale                     Chosen depending on aesthetic and type of variable
        Position                  Linear scaling for continuous variables, integers for categorical variables
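The defaults on slide 42 can be observed directly. A sketch, assuming a recent ggplot2: leaving out the stat or the geom gives the same binned data either way, and leaving out coord and scales falls back on Cartesian coordinates with integer positions for the categories of `cut`.

```r
library(ggplot2)

# Stat omitted: geom_histogram() falls back on its default stat, stat_bin()
h_geom <- ggplot(diamonds, aes(carat)) + geom_histogram(binwidth = 0.1)

# Geom omitted: stat_bin() falls back on its default geom, bars
h_stat <- ggplot(diamonds, aes(carat)) + stat_bin(binwidth = 0.1)

# Coord and scales omitted: Cartesian coordinates, a continuous scale for
# carat and a discrete scale (integer positions 1 to 5) for the factor cut
box <- ggplot(diamonds, aes(cut, carat)) + geom_boxplot()

h_geom
h_stat
box
```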
  • 43. Thanks!
      • Visit the ggplot2 homepage:
          http://had.co.nz/ggplot2/
      • Get the ggplot2 book:
          http://amzn.com/0387981403
      • Get the Grammar of Graphics book from Leland Wilkinson:
          http://amzn.com/0387245448