A Backstage Tour of ggplot2 with Hadley Wickham
 

Ggplot2 is one of R’s most popular, widely used packages, developed by Rice University’s Hadley Wickham. Ggplot2’s exploratory graphics capabilities are driving the use of R as a complement to legacy analytics tools such as SAS. SAS is well-regarded for its strength in data management and "production" statistics, where you know what you want to do and need to do it repeatedly. On the other hand, R is strong in data analysis and exploration in situations where figuring out what is needed is the biggest challenge. In this important way, SAS and R are strong companions.

This webinar will provide an all-access pass to Hadley’s latest work. He’ll discuss:

* A brief overview of ggplot2, and how it's different to other plotting systems
* A sneak peek at some of the new features coming to the next version of ggplot2
* What’s been learned about good development practices in the 5 years since first starting to develop ggplot
* Some of the internals of ggplot2, and how he is gradually making it easier for others to contribute

A Backstage Tour of ggplot2 with Hadley Wickham Presentation Transcript

  • 1. ggplot2: A backstage tour. Hadley Wickham, Assistant Professor / Dobelman Family Junior Chair, Department of Statistics, Rice University. February 2012. Wednesday, February 8, 12
  • 2. 1. Why ggplot2? 2. Sneak peek and new features 3. Best practices 4. Questions
  • 3. Poll: What graphics system are you currently using?
  • 4. Why ggplot2?
  • 5. [Plot: whc against day for 2004; WHC legend with levels 02H, 02M, 12H]
  • 7. “Nothing is as practical as a good theory” —Kurt Lewin “[A good model] will bring together in a coherent way things that previously appeared unrelated and which also will provide a basis for dealing systematically with new situations” —David Cox
  • 8. A plot is made up of multiple layers. A layer consists of data, a set of mappings between variables and aesthetics, a geometric object and a statistical transformation. Scales control the details of the mapping. All components are independent and reusable.
  • 9. Interesting ggplot example Layered grammar + ggplot2 James Cheshire, http://bit.ly/xqHhAs
  • 10. Charlotte Wickham, http://cwick.co.nz/
  • 11. David B Sparks, http://bit.ly/hn54NW
  • 12. Claudia Beleites, http://bit.ly/yNqlpz
  • 13. Poll: What resources are most helpful to you when improving your R skills?
  • 14. Learning ggplot2
    ggplot2 mailing list: http://groups.google.com/group/ggplot2
    Stack Overflow: http://stackoverflow.com/tags/ggplot2
    Lattice to ggplot2 conversion: http://learnr.wordpress.com/?s=lattice
    Cookbook for common graphics: http://wiki.stdout.org/rcookbook/Graphs/
    ggplot2 book: http://amzn.com/0387981403
  • 15. Sneak peek
  • 16. Poll: Why do you use visualisation?
  • 17. # Getting started
    # To get the CRAN version
    install.packages("ggplot2")
    # To get the development version
    install.packages("devtools")
    library(devtools)
    dev_mode() # don't overwrite your existing install
    install_github("tidyverse/ggplot2")
  • 18. Development version CRAN version
  • 19. New geoms to deal with overplotting (by Winston Chang). [Plot: hwy against class]
  • 20. New geoms to deal with overplotting (by Winston Chang). [Plot: hwy against class]
    qplot(class, hwy, data = mpg)
  • 21. [Plot: hwy against class]
    qplot(class, hwy, data = mpg, geom = "jitter")
  • 22. [Plot: hwy against class]
    qplot(class, hwy, data = mpg, geom = "violin")
  • 23. [Plot: hwy against class]
  • 24. [Plot: hwy against class]
    qplot(class, hwy, data = mpg, geom = "dotplot",
      stackdir = "center", binaxis = "y",
      stackratio = 1, binwidth = 1)
  • 25. Better legends (by Kohske Takahashi). [Plot: y against x, with a colour legend]
  • 26. Better legends (by Kohske Takahashi). [Plot: y against x, with a colour legend]
    df <- data.frame(x = runif(100), y = runif(100))
    df$colour <- with(df, x ^ 2 + y + runif(100))
    qplot(x, y, data = df, colour = colour)
  • 27. [Plot: y against x, with a colour legend]
    qplot(x, y, data = df, colour = colour) +
      guides(colour = guide_legend(nrow = 2, byrow = TRUE))
  • 28. [Plot: y against x, with a colour legend]
    qplot(x, y, data = df, colour = colour) +
      guides(colour = guide_colorbar())
  • 29. qplot(x, y, data = df, colour = colour, alpha = I(1/4))
  • 30. qplot(x, y, data = df, colour = colour, alpha = I(1/4)) +
      guides(colour = guide_legend(
        override.aes = list(alpha = 1, size = 2)))
  • 31. # Better layout
    df <- data.frame(x = 1:10, y = 10:1, colour = 1:2)
    qplot(x, y, data = df) + coord_fixed()
    qplot(x, y, data = df) + facet_wrap(~ colour)
    # Internally, there has been a big rewrite of
    # the facetting data processing and rendering
    # systems. This lays the foundation for new
    # features, and fixes some annoying long-standing
    # bugs.
  • 32. # Speed improvements
    system.time(
      print(qplot(carat, price, data = diamonds))
    )
    # Includes new tools for figuring out what's
    # taking all the time
    benchplot(qplot(carat, price, data = diamonds))
    # See also geom_raster and geom_map
    # Still a lot of work to do. The emphasis in
    # ggplot2 is reducing the amount of thinking
    # time by making it easier to go from the plot in
    # your brain to the plot on the page.
  • 33. 30s with geom_tile, 8s with annotation_raster
  • 34. library(ggplot2)
    library(reshape2)
    library(RgoogleMaps)
    library(ggmap)
    theft <- subset(crime, offense == "theft" &
      lat > 29 & lat < 30.2 & lon > -95.8)
    lonr <- range(theft$lon)
    latr <- range(theft$lat)
    h_map <- GetMap.bbox(lonr, latr, size = c(1024, 1024))
    h_raster <- as.raster(h_map$myTile)
    benchplot(ggplot(theft, aes(lon, lat)) +
      annotation_raster(h_raster, lonr[1], lonr[2], latr[1], latr[2]) +
      geom_density2d(colour = "black"))
    h_data <- melt(as.matrix(h_raster))
    h_data$lat <- seq(latr[2], latr[1], length = 640)[h_data$Var1]
    h_data$lon <- seq(lonr[1], lonr[2], length = 640)[h_data$Var2]
    benchplot(ggplot(theft, aes(lon, lat)) +
      geom_tile(aes(fill = value), data = h_data) +
      scale_fill_identity() +
      geom_density2d(colour = "black"))
  • 35. ggplot2 0.9 scheduled for release on March 1
  • 36. Poll: How big is your data?
  • 37. # Future work: big visualisation
    # (Sponsored by Revolution Analytics)
    # How can you make a plot of 100 million
    # observations?
    # In less than one minute.
  • 43. ~100,000 points: 0.06 s to bin, 0.20 s to convert, 6.0 s to plot
  • 44. ~1.2 million: 10 s to bin. ~100,000 points: 0.06 s to bin, 0.20 s to convert, 6.0 s to plot
  • 45. Best practices
  • 46. Poll: How do you learn about new packages?
  • 47. Package best practices
    • Namespace
    • Documentation
    • Unit tests
    • Read the source!
    • (ggplot2 not always the best example: it was my second R package - I have now written around 30. I now know a lot more!)
  • 49. # Namespaces
    library(ggplot2)
    ddply
    # Note that plyr, reshape etc. aren't automatically
    # loaded. This is good development practice -
    # it's better to be explicit than implicit.
    # Look at the NAMESPACE file.
  • 50. export("%+%")
    export(aes_all)
    export(aes_auto)
    export(aes_string)
    export(aes)
    export(annotate)
    export(annotation_custom)
    export(annotation_map)
    export(annotation_raster)
    export(autoplot)
    export(benchplot)
    export(borders)
    export(continuous_scale)
    export(coord_cartesian)
    export(coord_equal)
    export(coord_fixed)
    export(coord_flip)
    export(coord_map)
    export(coord_polar)
    ...
  • 51. # Unit tests
    # Look in tests/ or inst/tests/
    library(testthat)
    test_package("ggplot2")
  • 52. # Documentation
    # Function level in man/
    ?geom_point
    ?facet_wrap
    package?ggplot2
    # Vignettes in inst/doc
    # (ggplot2 doesn't have any)
    # Publications
    citation("ggplot2")
  • 53. Questions
  • 54. Learning ggplot2
    ggplot2 mailing list: http://groups.google.com/group/ggplot2
    Stack Overflow: http://stackoverflow.com/tags/ggplot2
    Lattice to ggplot2 conversion: http://learnr.wordpress.com/?s=lattice
    Cookbook for common graphics: http://wiki.stdout.org/rcookbook/Graphs/
    ggplot2 book: http://amzn.com/0387981403
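
Slide 8 describes a layer as data, aesthetic mappings, a geometric object and a statistical transformation, with scales controlling the details of each mapping. The sketch below spells out one such layer explicitly. It assumes a current ggplot2 install and uses its layer() function and bundled mpg dataset; the choice of variables and axis labels is illustrative and does not come from the talk.

```r
library(ggplot2)

# One layer written out in full: data, aesthetic mappings, a geometric
# object, a statistical transformation and a position adjustment.
p <- ggplot() +
  layer(
    data = mpg,
    mapping = aes(x = displ, y = hwy, colour = class),
    geom = "point",
    stat = "identity",
    position = "identity"
  ) +
  # Scales are separate components that control how each mapping is drawn.
  scale_x_continuous(name = "Engine displacement (litres)") +
  scale_y_continuous(name = "Highway miles per gallon")

print(p)
```

Because the components are independent, swapping geom = "point" for another geom, or changing a scale, leaves the rest of the specification untouched.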
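
Slides 43 and 44 break the cost of a large plot into three steps: bin, convert, plot. The slides give only timings, so the following is a rough base R sketch of the first two steps under assumed choices (simulated data, a 100 x 100 grid of fixed-width bins, and hypothetical names such as nbins and binned). It is not the implementation used in the talk.

```r
set.seed(1)
n <- 1e5
x <- rnorm(n)
y <- x + rnorm(n)

nbins <- 100L
x_breaks <- seq(min(x), max(x), length.out = nbins + 1L)
y_breaks <- seq(min(y), max(y), length.out = nbins + 1L)

# Step 1, bin: assign every point to a grid cell and count points per cell.
# all.inside = TRUE keeps every index within 1..nbins.
xi <- findInterval(x, x_breaks, all.inside = TRUE)
yi <- findInterval(y, y_breaks, all.inside = TRUE)
counts <- tabulate((yi - 1L) * nbins + xi, nbins = nbins^2)

# Step 2, convert: one row per non-empty cell, located at the cell centre.
mid <- function(b) (head(b, -1) + tail(b, -1)) / 2
binned <- data.frame(
  x = rep(mid(x_breaks), times = nbins),
  y = rep(mid(y_breaks), each = nbins),
  n = counts
)
binned <- binned[binned$n > 0, ]
```

The plotting step then draws at most nbins^2 rectangles however many observations there were, for example with ggplot(binned, aes(x, y, fill = n)) + geom_tile().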
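
Slide 51 points to the unit tests that ship with ggplot2. For readers who have not written tests with testthat, here is a small self-contained sketch. The helper rescale01 is hypothetical and written only for this illustration; test_that() and expect_equal() are testthat functions.

```r
library(testthat)

# Hypothetical helper under test: linearly rescale a numeric vector to [0, 1].
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

test_that("rescale01 maps the observed range onto [0, 1]", {
  expect_equal(rescale01(c(2, 4, 6)), c(0, 0.5, 1))
  expect_equal(range(rescale01(c(-5, 0, 10))), c(0, 1))
})

test_that("rescale01 keeps missing values in place", {
  expect_equal(rescale01(c(1, NA, 3)), c(0, NA, 1))
})
```

In a package, files like this would live under tests/ (or inst/tests/ in older layouts, as the slide notes), so that the whole suite can be run in one call.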