16 critique
Transcript

  • 1. Stat405 Graphical theory & critique Hadley Wickham Tuesday, 19 October 2010
  • 2. Project
      • Generally excellent
      • Common problems: lack of proofreading, lack of flow
      • If you’re going to throw away 85% of the data, I want to know how it differs from the data that you kept
  • 3. Project
      • Don’t forget to set up a meeting time with me this week
      • (I’ll be travelling from Saturday until the project is due, so if you can’t meet with me this week, email Garrett to set up a time)
  • 4. [No text on this slide]
  • 5. Exploratory graphics: Are for you (not others). You need to be able to create them rapidly, because your first attempt will never be the most revealing. Iteration is crucial for developing the best display of your data. This gives rise to two key questions:
  • 6. What should I plot? How can I plot it?
  • 7. Two general tools. Plot critique toolkit: “graphics are like pumpkin pie”. Theory behind ggplot2: “A layered grammar of graphics”. Plus lots of practice...
  • 8. Graphics are like pumpkin pie: the four C’s of critiquing a graphic
  • 9. Content
  • 10. Construction
  • 11. Context
  • 12. Consumption
  • 13. Content: What data (variables) does the graph display? What non-data is present? What is pumpkin (the essence of the graphic) and what is spice (useful additional info)?
  • 14. Your turn: Identify the data and non-data on “Napoleon's march” and “Building an electoral victory”. Which features are the most important? Which are just useful background information?
  • 15. Results. Minard’s march: (top) latitude, longitude, number of troops, direction, branch, city name; (bottom) longitude, temperature, date. Building an electoral victory: state, number of electoral college votes, winner, margin of victory.
  • 16. Construction: How many layers are on the plot? What data does each layer display? What sort of geometric object does it use? Is it a summary of the raw data? How are variables mapped to aesthetics?
  • 17. Perceptual mapping (for continuous variables only!). From best to worst: 1. Position along a common scale 2. Position along nonaligned scale 3. Length 4. Angle/slope 5. Area 6. Volume 7. Colour
  • 18. Your turn: Answer the following questions for “Napoleon's march” and “Flight delays”: How many layers are on the plot? What data does each layer display? How does it display it?
  • 19. Results. Napoleon’s march: (top) (1) path plot with width mapped to number of troops, colour to direction, separate group for each branch; (2) labels giving city names; (bottom) (1) line plot with longitude on x-axis and temperature on y-axis; (2) text labels giving dates. Flight delays: (1) white circles showing 100% cancellation, (2) outline of states, (3) points with size proportional to percent cancellations at each airport.
  • 20. We can explain the composition of a graphic in words, but how do we create it?
  • 21. “If any number of magnitudes are each the same multiple of the same number of other magnitudes, then the sum is that multiple of the sum.” Euclid, ~300 BC
  • 22. “If any number of magnitudes are each the same multiple of the same number of other magnitudes, then the sum is that multiple of the sum.” Euclid, ~300 BC m(Σx) = Σ(mx)
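The algebraic form on slide 22 is easy to check numerically. A minimal sketch in base R, using arbitrary illustrative values for `m` and `x` that are not taken from the slides:

```r
# Arbitrary illustrative values; any numbers would do.
m <- 3
x <- c(2, 5, 7, 11)

lhs <- m * sum(x)  # the multiple of the sum
rhs <- sum(m * x)  # the sum of the multiples

# Compare the two sides of m(Σx) = Σ(mx).
all.equal(lhs, rhs)
```

The point of the slide is that the one-line notation says the same thing as Euclid's sentence, only far more compactly.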
  • 23. The grammar of graphics: An abstraction which makes thinking about, reasoning about and communicating graphics easier. Developed by Leland Wilkinson, particularly in “The Grammar of Graphics” 1999/2005. You’ve been using it in ggplot2 without knowing it! But to do more, you need to learn more about the theory.
  • 24. What is a layer?
      • Data
      • Mappings from variables to aesthetics (aes)
      • A geometric object (geom)
      • A statistical transformation (stat)
      • A position adjustment (position)
  • 25. layer(geom, stat, position, data, mapping, ...)

        layer(
          data = mpg,
          mapping = aes(x = displ, y = hwy),
          geom = "point",
          stat = "identity",
          position = "identity"
        )

        layer(
          data = diamonds,
          mapping = aes(x = carat),
          geom = "bar",
          stat = "bin",
          position = "stack"
        )
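One way to see the five components in action is to add a fully specified layer to an empty plot and inspect the result. This is a sketch, not part of the original slides. It assumes a reasonably recent ggplot2 in which the plot object exposes its layers as `p$layers`, and the variable name `p` is purely illustrative.

```r
library(ggplot2)

# Spell out every component of the layer explicitly.
p <- ggplot() +
  layer(
    data = mpg,
    mapping = aes(x = displ, y = hwy),
    geom = "point",
    stat = "identity",
    position = "identity"
  )

# The plot stores its layers in a list; this one holds a single layer.
length(p$layers)

# The strings "point" and "identity" were resolved to ggplot2 objects.
inherits(p$layers[[1]]$geom, "GeomPoint")
inherits(p$layers[[1]]$stat, "StatIdentity")
```

Printing `p` draws the scatterplot of `hwy` against `displ`.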
  • 26. # A lot of typing!
        layer(
          data = mpg,
          mapping = aes(x = displ, y = hwy),
          geom = "point",
          stat = "identity",
          position = "identity"
        )

        # Every geom has an associated default statistic
        # (and vice versa), and position adjustment.
        geom_point(aes(displ, hwy), data = mpg)
        geom_histogram(aes(displ), data = mpg)
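The defaults that the shortcut functions fill in can be read back from the layer objects they return. A small sketch, assuming the `$geom`, `$stat` and `$position` fields that layer objects carry in current ggplot2 releases; the names `pt_layer` and `hist_layer` are illustrative only.

```r
library(ggplot2)

pt_layer <- geom_point(aes(displ, hwy), data = mpg)
hist_layer <- geom_histogram(aes(displ), data = mpg)

# geom_point() defaults to the identity stat and identity position.
inherits(pt_layer$stat, "StatIdentity")
inherits(pt_layer$position, "PositionIdentity")

# geom_histogram() is a bar geom combined with the bin stat and stacking.
inherits(hist_layer$geom, "GeomBar")
inherits(hist_layer$stat, "StatBin")
inherits(hist_layer$position, "PositionStack")
```

Read this way, the histogram shortcut is the same layer as the second `layer()` call on slide 25, applied to a different variable.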
  • 27. # To actually create the plot
        ggplot() +
          geom_point(aes(displ, hwy), data = mpg)

        ggplot() +
          geom_histogram(aes(displ), data = mpg)
  • 28. # Multiple layers
        ggplot() +
          geom_point(aes(displ, hwy), data = mpg) +
          geom_smooth(aes(displ, hwy), data = mpg)

        # Avoid redundancy:
        ggplot(mpg, aes(displ, hwy)) +
          geom_point() +
          geom_smooth()
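To confirm that the two spellings on slide 28 describe the same plot, you can compare the data each layer ends up drawing. This sketch uses `layer_data()`, which returns the computed data frame for a given layer. The names `p_long`, `p_short`, `pts_long` and `pts_short` are illustrative.

```r
library(ggplot2)

# Data and mapping repeated in every layer.
p_long <- ggplot() +
  geom_point(aes(displ, hwy), data = mpg) +
  geom_smooth(aes(displ, hwy), data = mpg)

# Data and mapping set once as plot defaults and inherited by each layer.
p_short <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth()

# Compare the positions drawn by the first (point) layer of each plot.
pts_long <- layer_data(p_long, 1)[c("x", "y")]
pts_short <- layer_data(p_short, 1)[c("x", "y")]
all.equal(pts_long, pts_short)
```

Defaults given to `ggplot()` are only a convenience: each layer still has its own data and mapping, just inherited rather than restated.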
  • 29. # Different layers can have different aesthetics
        ggplot(mpg, aes(displ, hwy)) +
          geom_point(aes(colour = class)) +
          geom_smooth()

        ggplot(mpg, aes(displ, hwy)) +
          geom_point(aes(colour = class)) +
          geom_smooth(aes(group = class), method = "lm", se = FALSE)
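The effect of the per-layer `group` aesthetic can be checked by counting how many separate smooths each plot computes. A sketch under the same assumptions as before (`layer_data()` from ggplot2; the names `p_one`, `p_many`, `n_one` and `n_many` are illustrative):

```r
library(ggplot2)

# colour is mapped only in the point layer, so the smooth sees one group.
p_one <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth()

# Adding group = class to the smooth layer fits one line per class.
p_many <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(aes(group = class), method = "lm", se = FALSE)

n_one <- length(unique(layer_data(p_one, 2)$group))
n_many <- length(unique(layer_data(p_many, 2)$group))

n_one
n_many == length(unique(mpg$class))
```

Because the colour mapping lives in the point layer rather than in the plot defaults, the smooth lines in the second plot are grouped by class but not coloured by it.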
  • 30. Your turn: For each of the following plots created with qplot, write the equivalent ggplot code.
        qplot(price, carat, data = diamonds)
        qplot(hwy, cty, data = mpg, geom = "jitter")
        qplot(reorder(class, hwy), hwy, data = mpg, geom = c("jitter", "boxplot"))
        qplot(log10(price), log10(carat), data = diamonds, colour = color) + geom_smooth(method = "lm")
  • 31. ggplot(diamonds, aes(price, carat)) + geom_point()
        ggplot(mpg, aes(hwy, cty)) + geom_jitter()
        ggplot(mpg, aes(reorder(class, hwy), hwy)) + geom_jitter() + geom_boxplot()
        ggplot(diamonds, aes(log10(price), log10(carat), colour = color)) + geom_point() + geom_smooth(method = "lm")
  • 32. More geoms & stats: See http://had.co.nz/ggplot2 for a complete list with helpful icons. Geoms: (0d) point, (1d) line, path, (2d) boxplot, bar, tile, text, polygon. Stats: bin, summary, sum.
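As an illustration of pairing a geom with a stat other than its default, the sketch below overlays per-class means on the raw points using the summary stat. It is not from the slides. The `fun` argument assumes ggplot2 3.3.0 or later (older releases used `fun.y`), and the names `p`, `summary_data` and `class_means` are illustrative.

```r
library(ggplot2)

p <- ggplot(mpg, aes(class, hwy)) +
  geom_point(colour = "grey60") +
  # Summary stat drawn with the point geom: one mean per class.
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3)

# The second layer's data holds one summarised row per class.
summary_data <- layer_data(p, 2)
nrow(summary_data)

# Cross-check the summarised values against means computed directly.
class_means <- tapply(mpg$hwy, mpg$class, mean)
all.equal(sort(summary_data$y), sort(as.numeric(class_means)))
```

The first layer displays the raw data and the second a statistical summary of it, which is the distinction slide 16 asks about.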
  • 33. Your turn: Go back to the descriptions of “Minard’s march” and “Flight delays” that you created before. Start converting your textual description to ggplot2 code.
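One possible starting point for the top panel of the march plot, following the layer description on slide 19 (a path with width mapped to troop numbers, colour to direction, a group per branch, plus city labels). Everything in the two data frames is made-up toy data for illustration and is NOT Minard's data. The column names are hypothetical, and the `linewidth` aesthetic assumes ggplot2 3.4.0 or later (older releases used `size` for line width).

```r
library(ggplot2)

# Made-up toy data for illustration only; these are not Minard's figures.
troops <- data.frame(
  long = c(24, 28, 32, 36, 36, 32, 28, 24),
  lat = c(55.0, 55.2, 55.5, 55.8, 55.6, 55.0, 54.6, 54.4),
  survivors = c(400, 350, 250, 150, 100, 60, 30, 10),
  direction = rep(c("advance", "retreat"), each = 4),
  branch = 1
)
cities <- data.frame(
  long = c(24, 36),
  lat = c(55.0, 55.8),
  city = c("Start", "End")
)

p <- ggplot(troops, aes(long, lat)) +
  # Layer 1: path with width, colour and group mapped as on slide 19.
  geom_path(aes(linewidth = survivors, colour = direction, group = branch)) +
  # Layer 2: text labels from a second data frame, inheriting x and y.
  geom_text(aes(label = city), data = cities)

length(p$layers)
```

Each layer described in words on slide 19 becomes one `geom_*()` call, and each "mapped to" becomes one entry in `aes()`.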
