
Grammar of Graphics - Darya Vanichkina

Presented as part of ResBaz 2018 in Sydney.

Published in: Data & Analytics

  1. The Grammar of Graphics - Darya Vanichkina
  2. … a personal commentary/interpretation of … http://vita.had.co.nz/papers/layered-grammar.html
  3. Grammar: “the fundamental principles or rules of an art or science” (OED Online 1989)
  4. Simple case
  5. Simple case
  6. Components of the grammar. Data: variables of interest. Aesthetics: x-axis, y-axis, colour, fill, size, labels, alpha, shape, line width, line type. Geometries: point, line, histogram, bar, boxplot. Facets: columns, rows. Statistics: binning, smoothing, descriptive, inferential. Coordinates: cartesian, fixed, polar, limits. Themes: not data, but important for overall impact. (Adapted from DataCamp)
  7. Exploring the diamonds data
  8. Relationship between table (width of top of diamond relative to widest point) and price? diamonds %>% ggplot(aes(x = table, y = price)) + geom_point()
  9. Relationship between carat (weight) and price? diamonds %>% ggplot(aes(x = carat, y = price)) + geom_point()
  10. Relationship between carat (weight) and price? [How many diamonds?] diamonds %>% ggplot(aes(x = carat, y = price)) + geom_point(alpha = 0.1)
  11. Relationship between carat (weight) and price? [Lots of small diamonds!] diamonds %>% ggplot(aes(x = carat, y = price)) + geom_hex()
  12. Components of the grammar. Data: variables of interest. Aesthetics: x-axis, y-axis, colour, fill, size, labels, alpha, shape, line width, line type. Geometries: point, line, histogram, bar, boxplot. Facets: columns, rows. Statistics: binning, smoothing, descriptive, inferential. Coordinates: cartesian, fixed, polar, limits. Themes: not data, but important for overall impact. (Adapted from DataCamp)
  13. Relationship between clarity and price? Worse → Better. diamonds %>% ggplot(aes(x = clarity, y = price)) + geom_boxplot()
  14. How many diamonds with each clarity? Worse → Better. diamonds %>% ggplot(aes(x = clarity, fill = clarity)) + geom_bar()
  15. Relationship between cut and price? Worse → Better. diamonds %>% ggplot(aes(x = cut, y = price)) + geom_boxplot()
  16. Relationship between cut and price? Worse → Better. diamonds %>% ggplot(aes(x = cut, y = price)) + geom_boxplot()
  17. Relationship between cut and price? Worse → Better. diamonds %>% ggplot(aes(x = cut, y = price)) + geom_violin()
  18. Components of the grammar. Data: variables of interest. Aesthetics: x-axis, y-axis, colour, fill, size, labels, alpha, shape, line width, line type. Geometries: point, line, histogram, bar, boxplot. Facets: columns, rows. Statistics: binning, smoothing, descriptive, inferential. Coordinates: cartesian, fixed, polar, limits. Themes: not data, but important for overall impact. (Adapted from DataCamp)
  19. Relationship between price and carat grouped by cut and clarity? diamonds %>% ggplot(aes(x = carat, y = price)) + geom_point() + facet_grid(cut ~ clarity)
  20. Relationship between price and carat grouped by cut and clarity (with trend)? diamonds %>% ggplot(aes(x = carat, y = price)) + geom_point() + facet_grid(cut ~ clarity) + geom_smooth(method = "lm")
  21. Relationship between table parameter (width of top of diamond relative to widest point, converted on the fly into a factor) and price, faceted by cut? diamonds %>% mutate(TableIsSmall = factor(table < median(table))) %>% ggplot(aes(x = carat, y = price, colour = TableIsSmall)) + geom_point() + facet_grid(. ~ cut)
  22. Relationship between table parameter (width of top of diamond relative to widest point, converted on the fly into a factor) and price, faceted by cut? diamonds %>% mutate(TableIsSmall = factor(table < median(table))) %>% ggplot(aes(x = carat, y = price, colour = TableIsSmall)) + geom_point() + facet_grid(. ~ cut) Most diamonds with a small table (TableIsSmall == TRUE) have an “Ideal” cut…
  23. Components of the grammar. Data: variables of interest. Aesthetics: x-axis, y-axis, colour, fill, size, labels, alpha, shape, line width, line type. Geometries: point, line, histogram, bar, boxplot. Facets: columns, rows. Statistics: binning, smoothing, descriptive, inferential. Coordinates: cartesian, fixed, polar, limits. Themes: not data, but important for overall impact. (Adapted from DataCamp)
  24. Relationship between cut and price? Worse → Better. diamonds %>% ggplot(aes(x = cut, y = price)) + geom_boxplot()
  25. Relationship between cut and price? [log-scale] Worse → Better. diamonds %>% ggplot(aes(x = cut, y = price)) + geom_boxplot() + scale_y_log10() Worse → Better. diamonds %>% ggplot(aes(x = cut, y = price)) + geom_boxplot()
  26. How many diamonds with each clarity? Worse → Better. diamonds %>% ggplot(aes(x = clarity, fill = clarity)) + geom_bar()
  27. How many diamonds with each clarity? Worse → Better. diamonds %>% ggplot(aes(x = clarity, fill = clarity)) + geom_bar() diamonds %>% ggplot(aes(x = clarity, fill = clarity)) + geom_bar(width = 1) + coord_polar()
  28. Components of the grammar. Data: variables of interest. Aesthetics: x-axis, y-axis, colour, fill, size, labels, alpha, shape, line width, line type. Geometries: point, line, histogram, bar, boxplot. Facets: columns, rows. Statistics: binning, smoothing, descriptive, inferential. Coordinates: cartesian, fixed, polar, limits. Themes: not data, but important for overall impact. (Adapted from DataCamp)
  29. Finishing up with the diamonds data. diamonds %>% ggplot(aes(x = carat, y = price)) + geom_point() + facet_grid(cut ~ clarity, scales = "free_x") + geom_smooth(method = "lm") diamonds %>% ggplot(aes(x = cut, y = price)) + geom_boxplot() + scale_y_log10()
  30. Finishing up with the diamonds data. diamonds %>% ggplot(aes(x = carat, y = price)) + geom_point() + facet_grid(cut ~ clarity, scales = "free_x") + geom_smooth(method = "lm") + theme_bw() diamonds %>% ggplot(aes(x = cut, y = price)) + geom_boxplot() + scale_y_log10() + theme_minimal()
  31. Just changing the theme. diamonds %>% ggplot(aes(x = carat, y = price)) + geom_point(col = "red") + facet_grid(cut ~ clarity, scales = "free_x") + geom_smooth(method = "lm") + theme_black() diamonds %>% ggplot(aes(x = cut, y = price)) + geom_boxplot() + scale_y_log10() + theme_dark()
  32. The Grammar of Graphics. Data: variables of interest. Aesthetics: x-axis, y-axis, colour, fill, size, labels, alpha, shape, line width, line type. Geometries: point, line, histogram, bar, boxplot. Facets: columns, rows. Statistics: binning, smoothing, descriptive, inferential. Coordinates: cartesian, fixed, polar, limits. Themes: not data, but important for overall impact.
  33. Limitations
  34. Limitations https://www.andrewheiss.com/blog/2017/08/10/exploring-minards-1812-plot-with-ggplot2/
  35. The Grammar of Graphics. Data: variables of interest. Aesthetics: x-axis, y-axis, colour, fill, size, labels, alpha, shape, line width, line type. Geometries: point, line, histogram, bar, boxplot. Facets: columns, rows. Statistics: binning, smoothing, descriptive, inferential. Coordinates: cartesian, fixed, polar, limits. Themes: not data, but important for overall impact. https://github.com/dvanic/18ResBazPresentation
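
The components listed on the summary slides can be read as layers added one at a time to a single plot object. The following is a minimal sketch of that idea, not code from the talk: it assumes only that ggplot2 is installed, calls ggplot(diamonds, ...) directly instead of using the %>% pipe so that no other package is needed, and the variable name p and the y-axis limit of 20000 are illustrative choices.

```r
library(ggplot2)  # also provides the diamonds data set used in the slides

# Data + aesthetics: which variables map to which visual channels
p <- ggplot(diamonds, aes(x = carat, y = price))

# Geometry: one semi-transparent point per diamond
p <- p + geom_point(alpha = 0.1)

# Statistics: a linear trend, fitted separately within each panel
p <- p + geom_smooth(method = "lm")

# Facets: cut in rows, clarity in columns, with independent x ranges
p <- p + facet_grid(cut ~ clarity, scales = "free_x")

# Coordinates: zoom the y-axis without discarding any data (illustrative limit)
p <- p + coord_cartesian(ylim = c(0, 20000))

# Theme: non-data appearance only
p <- p + theme_bw()
```

Typing p at the console (or calling print(p)) draws the plot. Because every step returns a plot object, any single layer can be swapped out, for example theme_bw() for theme_minimal(), without touching the others.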

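The observation on slide 22 about small-table diamonds and cut can also be checked numerically rather than read off the scatter plot. This is a hedged sketch under assumptions: it recreates the TableIsSmall factor from the slide using base R instead of mutate(), and the names d, counts, and shares are introduced here for illustration only.

```r
library(ggplot2)  # only needed for the diamonds data set

d <- diamonds

# Same derived variable as on the slide: TRUE when table is below the median
d$TableIsSmall <- factor(d$table < median(d$table))

# Number of diamonds of each cut within each TableIsSmall level
counts <- table(TableIsSmall = d$TableIsSmall, cut = d$cut)

# Row-wise proportions: the share of each cut among small-table
# diamonds (TRUE row) and among all other diamonds (FALSE row)
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

The TRUE row of shares shows how the small-table diamonds are distributed across cuts, which is the quantity the slide comments on.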