
- 1. Stat405: Graphic tips & tricks. Hadley Wickham. Wednesday, 9 September 2009
- 2. Outline
  1. Homework
  2. Reading a scatterplot
  3. Scatterplot techniques for large data
  4. Iteration & storytelling
  5. Project & homework
- 3. Homework: great start! Remember the grading scheme: 4.5–5 = A+, 4–4.5 = A, 3.5–4 = A-. Shorter is better than longer. Check aspect ratios. Read the comments!
- 4. Revision: reading a scatterplot
  - Big patterns
  - Small patterns
  - Deviations from the pattern
  - Strange patterns
- 6. Strong linear relationship. A number of outliers.
- 8. Unusual striations. Two groups? Little relationship between table and price?
- 10. Curved (exponential?) relationship. Outliers mostly cheaper than expected.
- 12. But what’s the problem with all these plots? In pairs, brainstorm solutions for 2 minutes. `qplot(carat, price, data = diamonds)`
- 13. Ideas
  - If x is discrete, use boxplots.
  - Use semi-transparent points.
  - Divide into bins and count the number of points in each bin (a 2d histogram).
  - Display a statistical summary.
- 14. Box and whisker plots
- 15. Boxplots: they show less information than a histogram, but take up much less space. We have already seen them used with discrete x values. They can also be used with continuous x values, by specifying how we want the data grouped.
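To see what grouping a continuous variable actually does, here is a minimal base-R sketch of the numbers behind each box. It uses simulated stand-in data rather than `diamonds` (the variable names and distributions are assumptions for illustration): rounding `x` plays the role of `group = round(table)`, and `boxplot.stats()` returns the five statistics that a box and its whiskers draw.

```r
# Simulated stand-in data (hypothetical): a continuous x, like `table`,
# and a skewed response, like `price`.
set.seed(1)
x <- runif(1000, min = 50, max = 70)
y <- exp(rnorm(1000, mean = 8, sd = 1))

# Rounding x gives one group per integer value, the same idea as
# group = round(table): one boxplot per group.
groups <- round(x)

# boxplot.stats()$stats holds: lower whisker, lower hinge, median,
# upper hinge, upper whisker.
box_stats <- t(sapply(split(y, groups), function(v) boxplot.stats(v)$stats))
colnames(box_stats) <- c("lower", "q1", "median", "q3", "upper")

print(head(round(box_stats)))
```

Each row is one box; in the ggplot2 version the same grouping happens inside the plot call.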
- 16. `qplot(table, price, data = diamonds)`
- 17. [Plot: boxplot of price (5,000–15,000) against table (50–90)] `qplot(table, price, data = diamonds, geom = "boxplot")`
- 19. [Plot: boxplots of price against table, one per rounded value of table] `qplot(table, price, data = diamonds, geom = "boxplot", group = round(table))`. The `group` aesthetic gives one boxplot for each unique value of `round(table)`.
- 20. Alpha blending
- 21. `qplot(carat, price, data = diamonds, alpha = I(1/10))`
- 22. `qplot(carat, price, data = diamonds, alpha = I(1/50))`
- 23. `qplot(carat, price, data = diamonds, alpha = I(1/250))`
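Why do smaller alpha values reveal more structure? Assuming standard "over" compositing of identically coloured points (an assumption about the graphics device, not something stated on the slides), k stacked points each with alpha a reach an opacity of `1 - (1 - a)^k`. The sketch below tabulates this for the three alpha values used above: roughly 1/a overlapping points are needed before a region becomes about 63% opaque, so a lower alpha lets very dense regions be told apart from merely dense ones.

```r
# Opacity after k identically coloured points are drawn on top of each
# other, each with alpha a, under standard "over" compositing.
stacked_opacity <- function(k, a) 1 - (1 - a)^k

alphas <- c(1/10, 1/50, 1/250)
k <- c(1, 10, 50, 250)

# Rows: number of stacked points; columns: per-point alpha.
opacity <- outer(k, alphas, stacked_opacity)
dimnames(opacity) <- list(points = k, alpha = c("1/10", "1/50", "1/250"))

print(round(opacity, 2))
```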
- 24. Statistical summary
- 25. `qplot(carat, price, data = diamonds) + geom_smooth()`
- 26. `qplot(log10(carat), log10(price), data = diamonds) + geom_smooth()`
- 27. `qplot(log10(carat), log10(price), data = diamonds) + geom_smooth(method = "lm")`
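A straight line on log-log axes corresponds to a power law: if `log10(price) = b0 + b1 * log10(carat)`, then `price = 10^b0 * carat^b1`. The sketch below simulates data from a made-up power law (the constant 4000 and exponent 1.7 are hypothetical, not estimates from `diamonds`) and shows that `lm()` on the logged variables recovers both.

```r
set.seed(405)
n <- 500
carat <- runif(n, min = 0.2, max = 3)

# Hypothetical power law with multiplicative noise:
# price = 4000 * carat^1.7 * error
price <- 4000 * carat^1.7 * exp(rnorm(n, sd = 0.1))

# Fit a straight line on the log10 scale, as geom_smooth(method = "lm")
# does on the log10(carat), log10(price) plot.
fit <- lm(log10(price) ~ log10(carat))
b <- coef(fit)

exponent <- unname(b[2])       # slope on the log scale = the power
multiplier <- unname(10^b[1])  # back-transformed intercept

cat("price ~", round(multiplier), "* carat ^", round(exponent, 2), "\n")
```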
- 28. 2d bins
- 29.

  ```r
  library(ggplot2)

  # Very basic cleaning
  diamonds$x[diamonds$x == 0] <- NA
  diamonds$y[diamonds$y == 0] <- NA
  diamonds$y[diamonds$y > 12] <- NA

  qplot(x, y, data = diamonds)
  qplot(x, y, data = diamonds, geom = "bin2d")
  # geom = "hex" also needs the hexbin package installed
  qplot(x, y, data = diamonds, geom = "hex")
  qplot(x, y, data = diamonds, geom = "bin2d", bins = 100)
  qplot(x, y, data = diamonds, geom = "hex", bins = 100)

  # Zoom in
  qplot(x, y, data = diamonds, geom = "bin2d", bins = 100) + xlim(4, 7) + ylim(4, 7)
  qplot(x, y, data = diamonds, geom = "bin2d", bins = 100) + xlim(4, 5) + ylim(4, 5)
  ```
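`geom = "bin2d"` divides the plane into a grid of rectangles and counts the points in each. A rough base-R equivalent, assuming equal-width bins (ggplot2's exact break placement may differ), uses `cut()` on each axis and `table()` to cross-tabulate. The data here are simulated stand-ins for `x` and `y`, not the diamonds measurements.

```r
set.seed(2)
n <- 10000
# Simulated stand-ins: y is nearly equal to x, as for round diamonds.
x <- rnorm(n, mean = 5.7, sd = 1.1)
y <- x + rnorm(n, sd = 0.05)

bins <- 30
# cut(v, breaks = bins) makes `bins` equal-width intervals spanning range(v).
x_bin <- cut(x, breaks = bins)
y_bin <- cut(y, breaks = bins)

# Cross-tabulating the two binned variables gives the count in each cell.
counts <- table(x_bin, y_bin)

cat("non-empty cells:", sum(counts > 0), "of", length(counts), "\n")
```

Most cells away from the diagonal are empty, so colouring cells by count exposes the dense core that overplotted points hide; raising `bins` trades smoothness for detail.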
- 31.

  ```r
  qplot(x, x / y, data = diamonds, geom = "bin2d")
  qplot(x, log(x / y), data = diamonds, geom = "bin2d")
  clean <- subset(diamonds, abs(log(x / y)) < 0.1)
  qplot(x, log(x / y), data = clean, geom = "bin2d")
  qplot(x, log(x / y), data = clean, geom = "bin2d", bins = 80)
  ```

  What would be a good name for `log(x / y)`? What other variable might you create to go with it?
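Why `log(x / y)` rather than `x / y`? On the log scale the ratio is symmetric, `log(x / y) = -log(y / x)`, and 0 means x and y are equal, so `abs(log(x / y)) < 0.1` treats "x bigger" and "y bigger" alike. The toy values below are made up for illustration; they show that the filter keeps ratios between `exp(-0.1)` (about 0.905) and `exp(0.1)` (about 1.105).

```r
# Made-up widths and lengths (mm), for illustration only
x <- c(5.0, 5.5, 4.5, 6.0)
y <- c(5.0, 5.0, 5.0, 4.0)

log_ratio <- log(x / y)
keep <- abs(log_ratio) < 0.1  # same filter as in the subset() call

print(data.frame(x, y, ratio = round(x / y, 3),
                 log_ratio = round(log_ratio, 3), keep))
```

Note that a ratio of 1.1 passes but 0.9 does not: the cut-off is symmetric in `log(x / y)`, not in `x / y`.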
- 32. Your turn: continue to explore the relationship between x, y, z and depth. Create new variables as necessary. (Hint: rerun the cleaning code from last week, and add more as necessary.) Some good ideas here: http://www.diamondhelpers.com/fivesteps/4-certified-diamonds.shtml
- 33. [Diagram: a diamond with x, table width and z labelled] depth = z / diameter * 100; table = table width / x * 100
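As a worked example of the two formulas, the sketch below takes the diameter to be the mean of x and y (an assumption; the slide just says "diameter"), so depth = z / ((x + y) / 2) * 100 = 200 * z / (x + y). The dimensions and the table widths are made-up values for illustration; table width is not a column in `diamonds`.

```r
# Made-up dimensions (mm) for two diamonds
x <- c(4.00, 6.50)
y <- c(4.02, 6.46)
z <- c(2.46, 4.00)
table_width <- c(2.24, 3.70)  # hypothetical values

diameter <- (x + y) / 2               # assumed: mean of x and y
depth_pct <- z / diameter * 100       # total depth percentage
table_pct <- table_width / x * 100    # table width as a percentage of x

print(data.frame(x, y, z, depth = round(depth_pct, 1), table = round(table_pct, 1)))
```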
- 34.

  ```r
  # Flag impossible values: zero dimensions, or implausibly large y and z
  y_big <- diamonds$y > 10
  z_big <- diamonds$z > 6
  x_zero <- diamonds$x == 0
  y_zero <- diamonds$y == 0
  z_zero <- diamonds$z == 0

  # Replace them with missing values
  diamonds$x[x_zero] <- NA
  diamonds$y[y_zero | y_big] <- NA
  diamonds$z[z_zero | z_big] <- NA
  ```
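Since this cleaning has to be rerun each session, one option is to wrap it in a function. The sketch below is a hypothetical helper (`clean_dims()` is not part of ggplot2) using the same thresholds as the code above, applied to a made-up three-row data frame.

```r
# Hypothetical helper: set impossible dimensions to NA.
# Zero is impossible for any dimension; the upper limits follow the
# thresholds used above (y > 10, z > 6).
clean_dims <- function(df, y_max = 10, z_max = 6) {
  df$x[df$x == 0] <- NA
  df$y[df$y == 0 | df$y > y_max] <- NA
  df$z[df$z == 0 | df$z > z_max] <- NA
  df
}

# Made-up rows: one plausible, one with zeros, one with huge y and z
toy <- data.frame(x = c(4.0, 0, 5.0), y = c(4.0, 4.1, 58.9), z = c(2.5, 0, 31.8))
cleaned <- clean_dims(toy)
print(cleaned)
```

With ggplot2 loaded, the same call would be `diamonds <- clean_dims(diamonds)`.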
- 35.

  ```r
  qplot(z / y * 100, depth, data = diamonds)
  last_plot() + xlim(50, 100)
  last_plot() + xlim(50, 80) + ylim(50, 80)
  qplot(z / x * 100, depth, data = diamonds) + xlim(50, 80) + ylim(50, 80)
  qplot(z / x * 100, depth / (z / x), data = diamonds)
  last_plot() + xlim(50, 80) + ylim(80, 120)
  last_plot() + ylim(95, 105)
  # ...
  ```
- 36. Iteration & stories
- 37. Stories: the best data analyses tell a story, with a natural flow from beginning to end. For homework, try to come up with three plots that tell a story. Stories about a small sample of the data can work well.
- 38.

  ```r
  qplot(cty, hwy, data = mpg)
  qplot(cty, hwy, data = mpg, geom = "jitter")
  qplot(cty, hwy, data = mpg, geom = "jitter", colour = class)
  qplot(cty, cty / hwy, data = mpg, geom = "jitter", colour = class)
  qplot(cty, cty / hwy, data = mpg, colour = class)
  qplot(displ, cty / hwy, data = mpg, colour = class)
  qplot(displ, cty / hwy, data = mpg) + facet_wrap(~ class)
  qplot(displ, cty / hwy, data = mpg) + facet_wrap(~ class) + geom_smooth(se = FALSE)
  qplot(displ, cty / hwy, data = mpg) + facet_wrap(~ class) + geom_smooth(method = "lm", se = FALSE)
  qplot(displ, cty, data = mpg) + facet_wrap(~ class)
  ```
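`qplot()` has since been deprecated in newer ggplot2 releases in favour of `ggplot()`. As a sketch, assuming a current ggplot2 installation, the step with the per-class linear smooth could be written as follows; the mapping and layers are the same, only the syntax differs.

```r
library(ggplot2)

# Ratio of city to highway mileage against engine displacement,
# one panel per vehicle class, with a straight-line fit in each panel.
p <- ggplot(mpg, aes(x = displ, y = cty / hwy)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ class)

print(p)
```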
- 39. Project: due in 3.5 weeks. A bigger group data analysis project (we will be discussing group dynamics on Monday). The homework is to get you started working with the data.
- 40. Next week: checking on a slot machine; learning how to write functions; the basics of simulation.
- 41. Feedback: http://hadley.wufoo.com/forms/stat405-feedback/
