05 Tips Tricks


  1. Stat405: Graphic tips & tricks. Hadley Wickham. Wednesday, 9 September 2009
  2. Outline: 1. Homework. 2. Reading a scatterplot. 3. Scatterplot techniques for large data. 4. Iteration & storytelling. 5. Project & homework.
  3. Homework. Great start! Remember the grading scheme: 4.5–5 = A+, 4–4.5 = A, 3.5–4 = A-. Shorter is better than longer. Check aspect ratios. Read the comments!
  4. Revision: reading a scatterplot. • Big patterns • Small patterns • Deviations from the pattern • Strange patterns
  5. [Plot only.]
  6. Strong linear relationship. A number of outliers.
  7. [Plot only.]
  8. Unusual striations. Two groups? Little relationship between table and price?
  9. [Plot only.]
  10. Curved (exponential?) relationship. Outliers mostly cheaper than expected.
  11. But what’s the problem with all these plots? qplot(carat, price, data = diamonds)
  12. But what’s the problem with all these plots? In pairs, brainstorm solutions for 2 minutes. qplot(carat, price, data = diamonds)
  13. Ideas. If x is discrete, use boxplots. Use semi-transparent points. Divide into bins and count the number of points in each bin (2d histogram). Display a statistical summary.
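The "divide into bins and count" idea can be tried by hand before reaching for a plotting geom. The sketch below uses base R only and simulated stand-in data (the variables `x` and `y`, the sample size and the 20-bin grid are all assumptions for illustration, not the diamonds data).

```r
# Simulated stand-in data with heavy overplotting (not the diamonds data).
set.seed(405)
n <- 50000
x <- rexp(n, rate = 1.5) + 0.2
y <- 4000 * x^1.7 * exp(rnorm(n, sd = 0.25))

# Divide each axis into 20 equal-width bins.
x_bin <- cut(x, breaks = 20)
y_bin <- cut(y, breaks = 20)

# Count the points that fall into each (x, y) cell: a 2d histogram.
counts <- table(x_bin, y_bin)

sum(counts)        # every point lands in exactly one cell
max(counts)        # size of the most crowded cell
mean(counts == 0)  # share of cells that are empty
```

The count matrix is what a 2d-bin plot maps to fill colour, so the crowded cells that a plain scatterplot hides become visible.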
  14. Box and whisker plots
  15. Boxplots. Less information than a histogram, but take up much less space. Already seen them used with discrete x values. Can also use with continuous x values, by specifying how we want the data grouped.
  16. qplot(table, price, data = diamonds)
  17. [Plot: a single boxplot of price against table.] qplot(table, price, data = diamonds, geom = "boxplot")
  18. [Plot: boxplots of price against table, one per rounded value of table.] qplot(table, price, data = diamonds, geom = "boxplot", group = round(table))
  19. [Same plot and code as slide 18, with an annotation on the group argument:] One boxplot for each unique value of this aesthetic.
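To see what grouping a continuous x by its rounded value does, the box statistics can be computed without drawing anything. This is a base R sketch on simulated data; `table_pct`, `price` and their distributions are made up for illustration.

```r
# Simulated stand-in data (not the diamonds data).
set.seed(405)
n <- 2000
table_pct <- round(rnorm(n, mean = 57.5, sd = 2), 1)  # continuous-looking x
price     <- rexp(n, rate = 1 / 4000)                 # skewed y

# Grouping variable: one group per distinct rounded x value.
grp <- round(table_pct)

# plot = FALSE returns the box statistics instead of drawing the plot.
bp <- boxplot(price ~ grp, plot = FALSE)

length(bp$names)  # number of boxes, one per unique value of grp
bp$stats[3, ]     # median price within each group
```

Rounding to a coarser or finer unit changes the number of boxes, which is the trade-off between detail and the number of points behind each box.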
  20. Alpha blending
  21. qplot(carat, price, data = diamonds, alpha = I(1/10))
  22. qplot(carat, price, data = diamonds, alpha = I(1/50))
  23. qplot(carat, price, data = diamonds, alpha = I(1/250))
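A rough way to reason about the three alpha values is to ask how opaque a spot becomes when k points land on it. The sketch below assumes standard "over" compositing of identical points, which is an assumption about the graphics device rather than something stated on the slides; `stacked_opacity` is a hypothetical helper name.

```r
# Combined opacity of k identical points, each with opacity `alpha`,
# drawn on top of each other (assumes "over" compositing).
stacked_opacity <- function(alpha, k) 1 - (1 - alpha)^k

alphas <- c(1/10, 1/50, 1/250)
k <- c(1, 10, 50, 250, 1000)

# Rows: number of overlapping points; columns: the three alpha values.
opacity <- outer(k, alphas, function(k, a) stacked_opacity(a, k))
dimnames(opacity) <- list(k = k, alpha = c("1/10", "1/50", "1/250"))
round(opacity, 2)
```

Under this model, 1/alpha overlapping points give roughly 63 to 65 percent opacity for the three values above, so a smaller alpha reserves dark ink for the regions that are truly dense.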
  24. Statistical summary
  25. qplot(carat, price, data = diamonds) + geom_smooth()
  26. qplot(log10(carat), log10(price), data = diamonds) + geom_smooth()
  27. qplot(log10(carat), log10(price), data = diamonds) + geom_smooth(method = "lm")
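A straight line on log-log axes corresponds to a power law on the original scale, which is why the linear smoother becomes reasonable after the transformation. The sketch below checks this on simulated data; the exponent 1.7, the constant 4000 and the noise level are assumptions chosen for illustration, not estimates from the diamonds data.

```r
# Simulated power law with multiplicative noise:
#   price = 4000 * carat^1.7 * noise
set.seed(405)
n <- 5000
carat <- runif(n, 0.2, 3)
price <- 4000 * carat^1.7 * 10^rnorm(n, sd = 0.1)

# The same straight-line fit that geom_smooth(method = "lm") draws
# when both variables are log10-transformed.
fit <- lm(log10(price) ~ log10(carat))
coef(fit)

# The slope estimates the exponent; 10^intercept estimates the constant.
slope    <- coef(fit)[["log10(carat)"]]
constant <- 10^coef(fit)[["(Intercept)"]]
```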
  28. 2d bins
  29. # Very basic cleaning
      diamonds$x[diamonds$x == 0] <- NA
      diamonds$y[diamonds$y == 0] <- NA
      diamonds$y[diamonds$y > 12] <- NA
      qplot(x, y, data = diamonds)
      qplot(x, y, data = diamonds, geom = "bin2d")
      qplot(x, y, data = diamonds, geom = "hex")
      qplot(x, y, data = diamonds, geom = "bin2d", bins = 100)
      qplot(x, y, data = diamonds, geom = "hex", bins = 100)
      # Zoom in
      qplot(x, y, data = diamonds, geom = "bin2d", bins = 100) + xlim(4, 7) + ylim(4, 7)
      qplot(x, y, data = diamonds, geom = "bin2d", bins = 100) + xlim(4, 5) + ylim(4, 5)
  30. qplot(x, x / y, data = diamonds, geom = "bin2d")
      qplot(x, log(x / y), data = diamonds, geom = "bin2d")
      clean <- subset(diamonds, abs(log(x / y)) < 0.1)
      qplot(x, log(x / y), data = clean, geom = "bin2d")
      qplot(x, log(x / y), data = clean, geom = "bin2d", bins = 80)
  31. (Code as on slide 30.) What would be a good name for log(x / y)? What other variable might you create to go with it?
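One reason to plot log(x / y) rather than x / y is that the log ratio treats "x bigger than y" and "y bigger than x" symmetrically. A small base R sketch with made-up measurements shows this, along with what the 0.1 cutoff used for `clean` means on the ratio scale.

```r
# Made-up measurements for illustration.
x <- c(4.0, 5.0, 6.0)
y <- c(4.1, 4.9, 6.0)

log_ratio <- log(x / y)

# Swapping x and y only flips the sign of the log ratio.
all.equal(log(y / x), -log_ratio)

# abs(log(x / y)) < 0.1 keeps ratios between these two bounds,
# i.e. x within roughly 10% of y in either direction.
bounds <- exp(c(-0.1, 0.1))
round(bounds, 3)
```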
  32. Your turn. Continue to explore the relationship between x, y, z and depth. Create new variables as necessary. (Hint: rerun the cleaning code from last week, and add more as necessary.) Some good ideas here: http://www.diamondhelpers.com/fivesteps/4-certified-diamonds.shtml
  33. [Diagram of a diamond, labelled with x, table width and z.] depth = z / diameter. table = table width / x * 100.
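The two formulas can be checked on a single stone. In the sketch below the measurements are illustrative values, and two things are assumptions not stated on the slide: that "diameter" is taken as the mean of x and y, and that depth is reported as a percentage like table.

```r
# Illustrative measurements in mm for one hypothetical stone.
x <- 3.95
y <- 3.98
z <- 2.43

# Assumption: diameter is the average of the two horizontal measurements.
diameter <- mean(c(x, y))

# depth = z / diameter, expressed as a percentage (assumption).
depth_pct <- 100 * z / diameter
depth_pct

# table = table width / x * 100, rearranged to recover the table width
# from a hypothetical table percentage of 55.
table_pct   <- 55
table_width <- table_pct / 100 * x
table_width
```

Comparing a value computed this way with the recorded depth column is one way to test which definition of diameter the data actually uses.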
  34. y_big <- diamonds$y > 10
      z_big <- diamonds$z > 6
      x_zero <- diamonds$x == 0
      y_zero <- diamonds$y == 0
      z_zero <- diamonds$z == 0
      diamonds$x[x_zero] <- NA
      diamonds$y[y_zero | y_big] <- NA
      diamonds$z[z_zero | z_big] <- NA
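The same cleaning pattern can be tried on a tiny data frame to see exactly which values get replaced. The data frame `d` below is made up; the thresholds are the ones from the slide. Computing all the logical flags first means every comparison sees the original values, before any of them has been turned into NA.

```r
# Toy data with one zero in each column and one implausibly large y.
d <- data.frame(
  x = c(4.0, 0.0, 5.1, 6.2),
  y = c(4.1, 5.0, 0.0, 31.8),
  z = c(2.5, 3.1, 0.0, 3.9)
)

# Flag the suspicious values first, using the untouched data.
y_big  <- d$y > 10
z_big  <- d$z > 6
x_zero <- d$x == 0
y_zero <- d$y == 0
z_zero <- d$z == 0

# Then replace the flagged values with NA.
d$x[x_zero] <- NA
d$y[y_zero | y_big] <- NA
d$z[z_zero | z_big] <- NA

colSums(is.na(d))  # number of values set to NA in each column
```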
  35. qplot(z/y * 100, depth, data = diamonds)
      last_plot() + xlim(50, 100)
      last_plot() + xlim(50, 80) + ylim(50, 80)
      qplot(z/x * 100, depth, data = diamonds) + xlim(50, 80) + ylim(50, 80)
      qplot(z/x * 100, depth / (z/x), data = diamonds)
      last_plot() + xlim(50, 80) + ylim(80, 120)
      last_plot() + ylim(95, 105)
      # ...
  36. Iteration & stories
  37. Stories. The best data analyses tell a story, with a natural flow from beginning to end. For homework, try to come up with three plots that tell a story. Stories about a small sample of the data can work well.
  38. qplot(cty, hwy, data = mpg)
      qplot(cty, hwy, data = mpg, geom = "jitter")
      qplot(cty, hwy, data = mpg, geom = "jitter", colour = class)
      qplot(cty, cty / hwy, data = mpg, geom = "jitter", colour = class)
      qplot(cty, cty / hwy, data = mpg, colour = class)
      qplot(displ, cty / hwy, data = mpg, colour = class)
      qplot(displ, cty / hwy, data = mpg) + facet_wrap(~ class)
      qplot(displ, cty / hwy, data = mpg) + facet_wrap(~ class) + geom_smooth(se = FALSE)
      qplot(displ, cty / hwy, data = mpg) + facet_wrap(~ class) + geom_smooth(method = "lm", se = FALSE)
      qplot(displ, cty, data = mpg) + facet_wrap(~ class)
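The second plot in the sequence above switches to jittering because integer-valued variables put many points on exactly the same spot. The base R sketch below shows the effect on simulated integer data; the variables are stand-ins for cty and hwy, not the mpg data, and the jitter amount of 0.4 is an arbitrary choice for illustration.

```r
# Simulated integer-valued pairs (stand-ins for cty and hwy).
set.seed(405)
n <- 234
cty <- sample(10:30, n, replace = TRUE)
hwy <- cty + sample(4:10, n, replace = TRUE)

# Many rows share identical coordinates, so they overplot exactly.
pairs_before <- nrow(unique(data.frame(cty, hwy)))

# Add a small amount of uniform noise to each coordinate.
cty_j <- jitter(cty, amount = 0.4)
hwy_j <- jitter(hwy, amount = 0.4)
pairs_after <- nrow(unique(data.frame(cty_j, hwy_j)))

c(rows = n, distinct_before = pairs_before, distinct_after = pairs_after)
```

Keeping the noise below half a unit means each jittered point still sits closest to its true integer value.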
  39. Project. Due in 3.5 weeks. Bigger group data analysis project. (We will be discussing group dynamics on Monday.) Homework is to get you started working with the data.
  40. Next week. Checking on a slot machine. Learning how to write functions. Basics of simulation.
  41. Feedback: http://hadley.wufoo.com/forms/stat405-feedback/
