A Backstage Tour of ggplot2 with Hadley Wickham

Ggplot2 is one of R’s most popular, widely used packages, developed by Rice University’s Hadley Wickham. Ggplot2’s exploratory graphics capabilities are driving the use of R as a complement to legacy analytics tools such as SAS. SAS is well-regarded for its strength in data management and "production" statistics, where you know what you want to do and need to do it repeatedly. On the other hand, R is strong in data analysis and exploration in situations where figuring out what is needed is the biggest challenge. In this important way, SAS and R are strong companions.

This webinar will provide an all-access pass to Hadley’s latest work. He’ll discuss:

* A brief overview of ggplot2, and how it's different to other plotting systems
* A sneak peek at some of the new features coming to the next version of ggplot2
* What’s been learned about good development practices in the 5 years since first starting to develop ggplot
* Some of the internals of ggplot2, and how he is gradually making it easier for others to contribute.

Published in: Education, Technology

A Backstage Tour of ggplot2 with Hadley Wickham

  1. ggplot2: A backstage tour. Hadley Wickham, Assistant Professor / Dobelman Family Junior Chair, Department of Statistics, Rice University. February 2012. Slides dated Wednesday, February 8, 2012.
  2. Outline: (1) Why ggplot2? (2) Sneak peek and new features (3) Best practices (4) Questions
  3. Poll: What graphics system are you currently using?
  4. Why ggplot2?
  5. [Plot: whc against day, 2004; legend "WHC" with levels 02H, 02M, 12H]
  7. “Nothing is as practical as a good theory” (Kurt Lewin). “[A good model] will bring together in a coherent way things that previously appeared unrelated and which also will provide a basis for dealing systematically with new situations” (David Cox).
  8. A plot is made up of multiple layers. A layer consists of data, a set of mappings between variables and aesthetics, a geometric object and a statistical transformation. Scales control the details of the mapping. All components are independent and reusable.
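
The layered description on slide 8 can be sketched in code. This is a minimal illustration, not taken from the slides: it assumes the `mpg` dataset that ships with ggplot2, and the choice of variables, the linear smoother and the log scale are arbitrary.

```r
library(ggplot2)

# Data and default aesthetic mappings, shared by every layer
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  # Layer 1: point geom with the identity stat, plus an extra colour mapping
  geom_point(aes(colour = class)) +
  # Layer 2: line geom fed by a smoothing stat (here a linear model fit)
  geom_smooth(method = "lm", se = FALSE) +
  # Scale: controls the details of how displ is mapped to x position
  scale_x_log10()

print(p)
```

Each component is added independently, so either layer or the scale can be swapped without touching the others.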
  9. Interesting ggplot example. Layered grammar + ggplot2. James Cheshire, http://bit.ly/xqHhAs
  10. Charlotte Wickham, http://cwick.co.nz/
  11. David B Sparks, http://bit.ly/hn54NW
  12. Claudia Beleites, http://bit.ly/yNqlpz
  13. Poll: What resources are most helpful to you when improving your R skills?
  14. Learning ggplot2
      ggplot2 mailing list: http://groups.google.com/group/ggplot2
      stackoverflow: http://stackoverflow.com/tags/ggplot2
      Lattice to ggplot2 conversion: http://learnr.wordpress.com/?s=lattice
      Cookbook for common graphics: http://wiki.stdout.org/rcookbook/Graphs/
      ggplot2 book: http://amzn.com/0387981403
  15. Sneak peek
  16. Poll: Why do you use visualisation?
  17. # Getting started
      # To get the CRAN version
      install.packages("ggplot2")
      # To get the development version
      install.packages("devtools")
      library(devtools)
      dev_mode() # don't overwrite your existing install
      install_github("ggplot2")
  18. [Side-by-side images: development version and CRAN version]
  19. New geoms to deal with overplotting (by Winston Chang). [Plot: hwy against class, points]
  20. New geoms to deal with overplotting (by Winston Chang). [Plot: hwy against class, points]
      qplot(class, hwy, data = mpg)
  21. [Plot: hwy against class, jittered points]
      qplot(class, hwy, data = mpg, geom = "jitter")
  22. [Plot: hwy against class, violins]
      qplot(class, hwy, data = mpg, geom = "violin")
  23. [Plot: hwy against class, centred dot plot]
  24. [Plot: hwy against class, centred dot plot]
      qplot(class, hwy, data = mpg, geom = "dotplot",
        stackdir = "center", binaxis = "y",
        stackratio = 1, binwidth = 1)
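
The four `qplot()` calls on slides 20 to 24 can also be written with the full `ggplot()` interface. The sketch below assumes the `mpg` dataset bundled with ggplot2 and reuses the dot plot parameters from slide 24; the object names are illustrative only.

```r
library(ggplot2)

# Shared data and mappings: highway mileage by vehicle class
base <- ggplot(mpg, aes(x = class, y = hwy))

p_point  <- base + geom_point()    # heavy overplotting
p_jitter <- base + geom_jitter()   # random noise spreads the points out
p_violin <- base + geom_violin()   # density estimate per class
p_dot    <- base + geom_dotplot(   # binned dots stacked around the centre
  binaxis = "y", stackdir = "center",
  stackratio = 1, binwidth = 1
)

print(p_dot)
```

Because the data and mappings live in `base`, changing the geom is the only difference between the four plots.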
  25. Better legends (by Kohske Takahashi). [Plot: y against x, points coloured by colour; legend keys 0.5 to 2.5]
  26. Better legends (by Kohske Takahashi). [Plot: y against x, points coloured by colour; legend keys 0.5 to 2.5]
      df <- data.frame(x = runif(100), y = runif(100))
      df$colour <- with(df, x ^ 2 + y + runif(100))
      qplot(x, y, data = df, colour = colour)
  27. [Plot: y against x; legend keys laid out in two rows]
      qplot(x, y, data = df, colour = colour) +
        guides(colour = guide_legend(nrow = 2, byrow = TRUE))
  28. [Plot: y against x; continuous colour bar legend]
      qplot(x, y, data = df, colour = colour) +
        guides(colour = guide_colorbar())
  29. qplot(x, y, data = df, colour = colour, alpha = I(1/4))
  30. qplot(x, y, data = df, colour = colour, alpha = I(1/4)) +
        guides(colour = guide_legend(
          override.aes = list(alpha = 1, size = 2)))
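
A self-contained version of the legend example on slides 26 to 30, written with `ggplot()` rather than `qplot()`. The fixed seed is an assumption added here so the sketch is reproducible; the slides do not set one.

```r
library(ggplot2)

set.seed(42)  # assumption: not in the slides, added for reproducibility
df <- data.frame(x = runif(100), y = runif(100))
df$colour <- with(df, x^2 + y + runif(100))

# Semi-transparent points make the default legend keys hard to read,
# so override the key appearance without changing the plotted points.
p <- ggplot(df, aes(x, y, colour = colour)) +
  geom_point(alpha = 1/4) +
  guides(colour = guide_legend(
    override.aes = list(alpha = 1, size = 2)
  ))

print(p)
```

Only the legend keys are drawn opaque and larger; the points in the panel keep `alpha = 1/4`.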
  31. # Better layout
      df <- data.frame(x = 1:10, y = 10:1, colour = 1:2)
      qplot(x, y, data = df) + coord_fixed()
      qplot(x, y, data = df) + facet_wrap(~ colour)
      # Internally, there has been a big rewrite of
      # the facetting data processing and rendering
      # systems. This lays the foundation for new
      # features, and fixes some annoying long-standing
      # bugs.
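
The layout example on slide 31 relies on `data.frame()` recycling `colour = 1:2` across all ten rows. A short sketch of the same two plots with `ggplot()`, with that recycling made visible; the object names are illustrative only.

```r
library(ggplot2)

# colour = 1:2 is recycled to length 10: 1, 2, 1, 2, ...
df <- data.frame(x = 1:10, y = 10:1, colour = 1:2)

# Fixed aspect ratio: one unit on x is as long as one unit on y
p_fixed <- ggplot(df, aes(x, y)) + geom_point() + coord_fixed()

# One panel per value of colour
p_facet <- ggplot(df, aes(x, y)) + geom_point() + facet_wrap(~ colour)

print(p_facet)
```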
  32. # Speed improvements
      system.time(
        print(qplot(carat, price, data = diamonds))
      )
      # Includes new tools for figuring out what's
      # taking all the time
      benchplot(qplot(carat, price, data = diamonds))
      # See also geom_raster and geom_map
      # Still a lot of work to do. The emphasis in
      # ggplot2 is reducing the amount of thinking
      # time by making it easier to go from the plot in
      # your brain to the plot on the page.
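
The two timing tools on slide 32 can be run without opening a window by drawing to a null PDF device. This sketch assumes the `diamonds` dataset from ggplot2; the use of `pdf(NULL)` is a choice made here, not something shown in the slides, and the timings will vary by machine.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(carat, price)) + geom_point()

pdf(NULL)                         # draw to a device that writes no file
total <- system.time(print(p))    # wall-clock cost of the whole plot
steps <- benchplot(p)             # cost broken down by stage
dev.off()

total[["elapsed"]]
steps
```

`benchplot()` returns a data frame with one row per stage, which makes it easier to see whether building or drawing dominates.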
  33. 30 s with geom_tile, 8 s with annotation_raster
  34. library(ggplot2)
      library(reshape2)
      library(RgoogleMaps)
      library(ggmap)
      theft <- subset(crime, offense == "theft" &
        lat > 29 & lat < 30.2 & lon > -95.8)
      lonr <- range(theft$lon)
      latr <- range(theft$lat)
      h_map <- GetMap.bbox(lonr, latr, size = c(1024, 1024))
      h_raster <- as.raster(h_map$myTile)
      benchplot(ggplot(theft, aes(lon, lat)) +
        annotation_raster(h_raster, lonr[1], lonr[2], latr[1], latr[2]) +
        geom_density2d(colour = "black"))
      h_data <- melt(as.matrix(h_raster))
      h_data$lat <- seq(latr[2], latr[1], length = 640)[h_data$Var1]
      h_data$lon <- seq(lonr[1], lonr[2], length = 640)[h_data$Var2]
      benchplot(ggplot(theft, aes(lon, lat)) +
        geom_tile(aes(fill = value), data = h_data) +
        scale_fill_identity() +
        geom_density2d(colour = "black"))
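
The comparison on slides 33 and 34 needs a map download, so here is a smaller sketch of the same idea using a synthetic greyscale image. Everything in it (the 64 by 64 image, the three points, the object names) is made up for illustration; only `annotation_raster()`, `geom_tile()` and `scale_fill_identity()` come from the slides.

```r
library(ggplot2)

n <- 64
# Synthetic image: an n x n matrix of grey colour strings
shade <- matrix(grey(seq(0, 1, length.out = n * n)), nrow = n)
img <- as.raster(shade)
pts <- data.frame(x = c(0.2, 0.5, 0.8), y = c(0.3, 0.6, 0.4))

# Approach 1: draw the image as a single raster annotation
p_raster <- ggplot(pts, aes(x, y)) +
  annotation_raster(img, xmin = 0, xmax = 1, ymin = 0, ymax = 1) +
  geom_point(colour = "red")

# Approach 2: one data row, and one rectangle, per pixel.
# as.vector() is column-major, so x varies slowly and y quickly.
px <- data.frame(
  x = rep(seq(0, 1, length.out = n), each = n),
  y = rep(seq(1, 0, length.out = n), times = n),
  fill = as.vector(shade)
)
p_tile <- ggplot(pts, aes(x, y)) +
  geom_tile(aes(fill = fill), data = px) +
  scale_fill_identity() +
  geom_point(colour = "red")

nrow(px)  # number of rectangles the tile version has to draw
```

The tile version scales with the number of pixels, which is a plausible reason for the gap in timings reported on slide 33.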
  35. ggplot2 0.9 scheduled for release on March 1
  36. Poll: How big is your data?
  37. # Future work: big visualisation
      # (Sponsored by Revolution Analytics)
      # How can you make a plot of 100 million
      # observations?
      # In less than one minute.
  43. ~100,000 points: 0.06 s to bin, 0.20 s to convert, 6.0 s to plot
  44. ~1.2 million: 10 s to bin. ~100,000 points: 0.06 s to bin, 0.20 s to convert, 6.0 s to plot
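
The timings above separate binning from plotting. A minimal sketch of that bin-then-plot idea follows, using simulated data and plain base R for the binning. The sample size, the 50 by 50 grid and all object names are assumptions for illustration; this is not the implementation used in the talk.

```r
library(ggplot2)

set.seed(1)
n <- 1e5
x <- rnorm(n)
y <- x + rnorm(n)

nbins <- 50
# Map each value to a bin index in 1..nbins; the maximum is clamped
# into the last bin so every observation is counted.
bin_index <- function(v, nbins) {
  scaled <- (v - min(v)) / (max(v) - min(v)) * nbins
  pmin(floor(scaled) + 1, nbins)
}
bx <- bin_index(x, nbins)
by <- bin_index(y, nbins)

# Count observations per 2D cell with a single tabulate() call
counts <- tabulate((bx - 1) * nbins + by, nbins = nbins^2)
binned <- data.frame(
  bx = rep(seq_len(nbins), each = nbins),
  by = rep(seq_len(nbins), times = nbins),
  n  = counts
)

# Plot at most nbins^2 cells instead of n points
p <- ggplot(binned[binned$n > 0, ], aes(bx, by, fill = n)) +
  geom_raster()

print(p)
```

The plotting cost now depends on the grid size rather than on the number of observations.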
  45. Best practices
  46. Poll: How do you learn about new packages?
  47. Package best practices
      • Namespace
      • Documentation
      • Unit tests
      • Read the source!
      • (ggplot2 not always the best example: it was my second R package - I have now written around 30. I now know a lot more!)
  49. # Namespaces
      library(ggplot2)
      ddply
      # Note that plyr, reshape etc aren't automatically
      # loaded. This is good development practice -
      # it's better to be explicit than implicit.
      # Look at the NAMESPACE file.
  50. export("%+%")
      export(aes_all)
      export(aes_auto)
      export(aes_string)
      export(aes)
      export(annotate)
      export(annotation_custom)
      export(annotation_map)
      export(annotation_raster)
      export(autoplot)
      export(benchplot)
      export(borders)
      export(continuous_scale)
      export(coord_cartesian)
      export(coord_equal)
      export(coord_fixed)
      export(coord_flip)
      export(coord_map)
      export(coord_polar)
      ...
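
The exports listed in the NAMESPACE file can also be inspected from a running session. This sketch uses base R's `getNamespaceExports()` and `search()`; the particular names checked are just examples picked from slide 50.

```r
library(ggplot2)

# Everything ggplot2 makes available to users
exports <- getNamespaceExports("ggplot2")
head(sort(exports))

# Functions from the export list on the slide
c("aes", "benchplot", "coord_flip") %in% exports

# Attaching ggplot2 does not attach its dependencies; this only
# shows plyr if you attached it yourself.
"package:plyr" %in% search()
```

Calling a function from an unattached package explicitly, as in `pkg::fun()`, keeps the dependency visible in the code.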
  51. # Unit tests
      # Look in tests/ or inst/tests/
      library(testthat)
      test_package("ggplot2")
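
For readers who have not used testthat, a single test looks roughly like the sketch below. The helper `count_layers()` and the expectation are hypothetical and written for this illustration; they are not taken from ggplot2's own test suite.

```r
library(testthat)
library(ggplot2)

# Hypothetical helper: number of layers added to a plot
count_layers <- function(plot) length(plot$layers)

test_that("each geom adds one layer", {
  p <- ggplot(mpg, aes(class, hwy)) + geom_point() + geom_jitter()
  expect_equal(count_layers(p), 2)
})
```

Files of such tests under `tests/` or `inst/tests/` are what `test_package()` runs.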
  52. # Documentation
      # Function level in man/
      ?geom_point
      ?facet_wrap
      package?ggplot2
      # Vignettes in inst/doc
      # (ggplot2 doesn't have any)
      # Publications
      citation("ggplot2")
  53. Questions
  54. Learning ggplot2
      ggplot2 mailing list: http://groups.google.com/group/ggplot2
      stackoverflow: http://stackoverflow.com/tags/ggplot2
      Lattice to ggplot2 conversion: http://learnr.wordpress.com/?s=lattice
      Cookbook for common graphics: http://wiki.stdout.org/rcookbook/Graphs/
      ggplot2 book: http://amzn.com/0387981403