NY R Conference talk

My talk in April 2018 at the NY R Conference on the Lesser Known Stars of the Tidyverse.

  1. The Lesser Known Stars of the Tidyverse. Emily Robinson, @robinson_es
  2. About Me ➔ R user for ~6 years ➔ Data Scientist at DataCamp ➔ Enjoy talking about: ◆ A/B testing ◆ Building and finding data science community ◆ R
  3. Talk Goals
  4. Goal 1: Keep you hip to the lingo
  5. Goal 2: Stop you from doing this …
  6. … by sharing useful functions
  7. Goal 3: Point you to resources
  8. The Tidyverse
  9. Coherent system of packages for data manipulation, exploration, and visualization that share a common design philosophy
  10. Tidyverse = ?
  11. Tidyverse = !
  12. Tidyverse != Hadleyverse
  13. Tidyverse != Hadleyverse. Many other contributors
  14. Demo
  15. Some steps of a data analysis workflow ➔ View dataset in console ➔ Inspect missing values ➔ Examine some columns ➔ Make a plot ➔ Do something cool and new!
  16. Step 1: print your dataset! Problem: it takes over the console.
  17. Solution: as_tibble(). Prints only 10 rows and the columns that fit on the screen.
  18. Step 2: examine your NAs. Problem: how do you do this for every column?
  19. Answer: purrr::map_df() to “map” a function over each column. Problem: missing values aren’t actually NA.
  20. Solution: na_if() to replace certain values with NA
  21. Step 3: examine your numeric columns. Problem: how can I do this quickly? Solution: dplyr::select_if() + skimr::skim()
  22. Step 4: examine a single column. Problem: it has multiple answers in each row.
  23. Solution: stringr::str_split() …
  24. Solution: stringr::str_split() and tidyr::unnest()
  25. Step 5: make a plot! Problem: it’s a mess.
  26. Solution: coord_flip() … But they’re not ordered
  27. + forcats::fct_reorder()
  28. Final step: do something cool and new! Problem:
  29. One solution: make a minimal reproducible example
  30. Part 0 (optional): use tribble() to make a toy dataset
  31. Part 1: Use reprex() to find any problems. Credit: Nick Tierney, https://www.njtierney.com/post/2017/01/11/magic-reprex/
  32. Part 2: Use reprex() to post your question or issue. Credit: Nick Tierney, https://www.njtierney.com/post/2017/01/11/magic-reprex/
  33. Review: stringr::str_split, tidyr::unnest, ggplot2::coord_flip, forcats::fct_reorder, tibble::tribble, reprex::reprex, tibble::as_tibble, purrr::map_df, dplyr::na_if, dplyr::select_if, skimr::skim
  34. Resources
  35. R4ds.had.co.nz
  36. #rstats twitter
  37. #rstats twitter
  38. Rstudio.com/resources/cheatsheets
  39. DataCamp.com
  40. Learn | https://datacamp.com/courses
  41. Conclusion
  42. The tidyverse: Come for the stickers and package names … Stay for the friendly community and happy workflow
  43. Thank you! tiny.cc/nyrtalk, hookedondata.org, @robinson_es
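
The transcript does not include the demo code, so the following is only a rough sketch of how the functions named in the demo slides (steps 2 to 5) might fit together. The toy dataset, its column names, the `;` separator, and the `-99` and `"#N/A"` placeholder values are all invented for illustration and are not taken from the talk.

```r
library(tibble)
library(dplyr)
library(purrr)
library(tidyr)
library(stringr)
library(forcats)
library(ggplot2)

# Part 0 from the reprex slides: a toy dataset built with tribble().
# All values here are hypothetical.
survey <- tribble(
  ~id, ~age, ~tools,
  1,   34,   "R;Python",
  2,   -99,  "R",
  3,   28,   "R;SQL;Python",
  4,   41,   "#N/A"
)

# Step 2: recode placeholder values as real NAs with na_if(),
# then "map" an NA counter over every column with map_df().
survey_clean <- survey %>%
  mutate(
    age   = na_if(age, -99),
    tools = na_if(tools, "#N/A")
  )
na_counts <- survey_clean %>% map_df(~ sum(is.na(.x)))

# Step 3: keep only the numeric columns.
# (The talk pipes this into skimr::skim(); omitted here to keep
# the sketch free of an extra dependency.)
numeric_cols <- survey_clean %>% select_if(is.numeric)

# Step 4: split the multi-answer column and give each answer its own row.
tools_long <- survey_clean %>%
  filter(!is.na(tools)) %>%
  mutate(tool = str_split(tools, ";")) %>%
  unnest(tool)

# Step 5: count answers, order the bars by count, and flip the axes.
tool_counts <- tools_long %>% count(tool)
p <- ggplot(tool_counts, aes(x = fct_reorder(tool, n), y = n)) +
  geom_col() +
  coord_flip()
```

Printing `p` draws the chart. Wrapping a snippet like this in `reprex::reprex()` is the approach the later slides suggest for checking that an example is self-contained before posting a question.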