
The Lesser Known Stars of the Tidyverse


These are my slides for my RStudio::conf presentation on February 2, 2018. Recording of the talk will be available soon.

Published in: Data & Analytics


  1. The Lesser Known Stars of the Tidyverse. Emily Robinson, @robinson_es
  2. About Me: Data Analyst at Etsy; R user for ~6 years; enjoy talking about A/B testing, building and finding data science community, and R
  3. Disclaimers
  4. This talk represents my own views, not those of Etsy
  5. It’s not Base R vs. Tidyverse
  6. Talk Goals
  7. Goal 1: Keep you hip to the lingo
  8. Goal 2: Stop you from doing this ...
  9. … by sharing useful functions
  10. Goal 3: Point you to resources
  11. The Tidyverse
  12. An opinionated collection of R packages designed for data science that share an underlying design philosophy, grammar, and data structures
  13. Tidyverse ?=
  14. Tidyverse !=
  15. Tibble
  16. Tidyverse != Hadleyverse
  17. Tidyverse != Hadleyverse: many other contributors
  18. Demo
  19. Step 1: print your dataset! Problem: it takes over the console
  20. Solution: as_tibble(), which prints only 10 rows and the columns that fit on the screen
  21. Step 2: examine your NAs. Problem: your NAs aren’t actually NAs
  22. Solution: na_if() to replace certain values with NA
  23. Step 3: examine your numeric columns. Problem: how can I do this quickly? Solution: dplyr::select_if() + skimr::skim()
  24. Step 4: examine a single column. Problem: it has multiple answers in each row
  25. Solution: stringr::str_split() …
  26. Solution: stringr::str_split() and tidyr::unnest()
  27. Step 5: make a scatterplot! Problem: it’s a mess
  28. Solution: fct_reorder() to order one axis by the other: ggplot(WorkChallenges, aes(x = fct_reorder(question, perc_problem), y = perc_problem)) + geom_point()
  29. Step 6: make a bar chart! Problem: your scale is mis-ordered
  30. Solution: fct_relevel() to manually order your factor: ggplot(aes(x = fct_relevel(response, "Rarely", "Sometimes", "Often", "Most of the time"))) + geom_bar()
  31. Final step: do something cool and new! Problem:
  32. One solution: make a minimal reproducible example
  33. Part 0 (optional): use tribble() to make a toy dataset
  34. Part 1: Use reprex() to find any problems. Credit: Nick Tierney, https://www.njtierney.com/post/2017/01/11/magic-reprex/
  35. Part 2: Use reprex() to post your question or issue. Credit: Nick Tierney, https://www.njtierney.com/post/2017/01/11/magic-reprex/
  36. Review: stringr::str_split, tidyr::unnest, forcats::fct_reorder, forcats::fct_relevel, reprex::reprex, tibble::as_tibble, tibble::tribble, dplyr::na_if, dplyr::select_if, skimr::skim
  37. Resources
  38. r4ds.had.co.nz
  39. #rstats Twitter
  40. #rstats Twitter
  41. Datacamp.com
  42. Base R to Tidyverse Translation: www.significantdigits.org/2017/10/switching-from-base-r-to-tidyverse/
  43. Tidyverse.org, community.rstudio.com/c/tidyverse, https://www.rstudio.com/resources/cheatsheets/, https://medium.com/@kierisi/r4ds-the-next-iteration-d51e0a1b0b82, and much more!
  44. The tidyverse: come for the stickers and package names … stay for the friendly community and happy workflow.
  45. Thank You! tiny.cc/rstudiotalk robinsones.github.io @robinson_es
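
A minimal sketch of a few of the functions from the review slide, chained together on a toy dataset. Everything about the data is an assumption made for illustration and is not from the talk: the `survey` tibble, its column names, and the use of -99 as a stand-in for missing values are all hypothetical.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(stringr)
library(forcats)

# Hypothetical toy survey data (not from the talk), written row by row with tribble()
survey <- tribble(
  ~respondent, ~age, ~tools,
  1,           34,   "R,SQL",
  2,           -99,  "R",
  3,           28,   "R,Python,SQL"
)

# Replace the assumed sentinel value -99 with a real NA
survey <- survey %>%
  mutate(age = na_if(age, -99))

# Split the multi-answer column into a list column, then give each answer its own row
tools_long <- survey %>%
  mutate(tools = str_split(tools, ",")) %>%
  unnest(tools)

# Count each answer and order the factor levels by that count, ready for plotting
tool_counts <- tools_long %>%
  count(tools) %>%
  mutate(tools = fct_reorder(tools, n))

print(tool_counts)
```

Passing `tool_counts` to ggplot() with `tools` on one axis and `n` on the other would then draw the categories in order of their counts, which is the same idea as the fct_reorder() scatterplot slide.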
