
Practical Workflows in R


  1. Practical R Workflows: How I smooth the bumps (thedatacollective, @danwwilson, #useR2018)
  2. Who am I?
  3. Less of…
  4. More of…
  5. [Timeline figure: 2017, 2018, 2019; labels: July, September, Founded]
  6. Who should be here?
  12. Where to start?
  13. It’s all about the data
  14. Set your requirements… then be flexible
  15. Make it yours
  16. Organise things*
      • Name things well: http://bit.ly/Jenny_naming
      • Determine your folder structure and use it consistently
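
A consistent folder structure is easier to keep if a small helper creates it for every project. The sketch below is an assumption, not the layout from the talk: only `src/` appears on the slides, and the `data/` and `output/` folder names, the example project name, and the `create_project()` helper are hypothetical.

```r
# Hypothetical helper: create the same folder skeleton for every project.
# Only "src" is mentioned in the slides; "data" and "output" are assumed names.
create_project <- function(path, folders = c("data", "src", "output")) {
  for (folder in folders) {
    dir.create(file.path(path, folder), recursive = TRUE, showWarnings = FALSE)
  }
  # Return the created paths invisibly so callers can check or reuse them.
  invisible(file.path(path, folders))
}

# Usage: build a throwaway project inside the session's temporary directory.
project_dir <- file.path(tempdir(), "2018-07_client_appeal")
created <- create_project(project_dir)
```

Because the helper is idempotent (existing folders are left alone), it can be re-run safely on a project that already exists.
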
  17. Identify points of friction
  18. Workflows take thinking
      • What work is repetitive?
      • What data can you standardize across projects?
      • How should you name things?
      • How should you structure projects?
      • What steps in a project slow you down?
  19. Building skills
  20. Example Project [diagram labels: Client, thedatacollective; steps: Receive Data, Standardise Data, Review Summary, Analyse Data, Review data, Zip Outputs]
  21. Re-use code
  22. Create functions
      get_recency(tx_date, current_date)
      get_frequency(num_gifts)
      get_value(gift_amount)
      All in src/99_functions.R

      get_value <- function(amount) {
        # Coerce to numeric; unparseable values become NA without a warning
        amount <- suppressWarnings(as.numeric(amount))
        # Send missing values to -Inf so they land in the "ERROR" band
        x <- replace(amount, is.na(amount), -Inf)
        # Bands are closed on the left: 10 falls in "$10-$24.99", not "<$10"
        cut(x,
            breaks = c(-Inf, 0.00001, 10, 25, 50, 100, 250, 500, 1000, Inf),
            labels = c("ERROR", "<$10", "$10-$24.99", "$25-$49.99",
                       "$50-$99.99", "$100-$249.99", "$250-$499.99",
                       "$500-$999.99", "$1,000+"),
            include.lowest = TRUE, right = FALSE)
      }
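
To see how the bands behave at their edges, the `get_value()` function from the slide can be run on a few test amounts. The block repeats the slide's definition so that it runs on its own; the input values are made up for illustration.

```r
# Definition as shown on the slide.
get_value <- function(amount) {
  amount <- suppressWarnings(as.numeric(amount))
  x <- replace(amount, is.na(amount), -Inf)
  cut(x,
      breaks = c(-Inf, 0.00001, 10, 25, 50, 100, 250, 500, 1000, Inf),
      labels = c("ERROR", "<$10", "$10-$24.99", "$25-$49.99",
                 "$50-$99.99", "$100-$249.99", "$250-$499.99",
                 "$500-$999.99", "$1,000+"),
      include.lowest = TRUE, right = FALSE)
}

# Illustrative inputs: band edges, a non-numeric string, zero and a missing value.
amounts <- c("5", "10", "24.99", "1000", "abc", "0", NA)
bands <- as.character(get_value(amounts))
print(data.frame(amount = amounts, band = bands))
```

Non-numeric, missing and zero amounts all fall into the "ERROR" band, and each lower bound is inclusive, so 10 is banded as "$10-$24.99" and 1000 as "$1,000+".
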
  23. But…
  24. Create a package
      segmentr: http://bit.ly/segmentr
      • http://bit.ly/pkg_hadley
      • http://bit.ly/pkg_hilary
      • http://bit.ly/pkg_rstudio
      • http://bit.ly/pkg_karl
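
The slides do not say which tooling was used to build the package. As one possible starting point, base R's `utils::package.skeleton()` can turn a functions file into a package directory. The sketch below assumes a placeholder function and a hypothetical package name (`demopkg`), and works entirely in a temporary folder.

```r
# Write a one-function stand-in for src/99_functions.R to a temporary folder.
work_dir <- file.path(tempdir(), "pkg_sketch")
dir.create(work_dir, showWarnings = FALSE)
functions_file <- file.path(work_dir, "99_functions.R")
writeLines(
  c("# Placeholder function, not taken from the talk",
    "say_hello <- function(name) paste(\"Hello\", name)"),
  functions_file
)

# Generate a package skeleton (DESCRIPTION, NAMESPACE, R/, man/) from that file.
utils::package.skeleton(
  name = "demopkg",            # hypothetical package name
  code_files = functions_file,
  path = work_dir,
  force = TRUE                 # overwrite a skeleton left by an earlier run
)

pkg_dir <- file.path(work_dir, "demopkg")
list.files(pkg_dir)
```

The generated DESCRIPTION and help files are stubs that still need editing before the package can be checked or shared.
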
  25. Start at your level, and improve
      • Start simple
      • Re-usable code snippets
      • Keep them together
      • Build a file of commonly used functions (99_functions.R)
      • Build a package
      • When you can make the time, or skills are ready
  26. Reducing friction
  27. What slows you down?
      • R is great even with its quirks
      > paste("Dan", NA, "Wilson")
      # Dan NA Wilson
      paste_na()
      > paste_na("Dan", NA, "Wilson")
      # Dan Wilson
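
The slide shows what `paste_na()` returns but not how it is written. One possible implementation, offered as a sketch rather than the author's code, drops missing values position by position before pasting. The `sep` argument and the recycling of shorter inputs are assumptions.

```r
# Sketch of an NA-skipping paste; not the implementation from the talk.
paste_na <- function(..., sep = " ") {
  cols <- lapply(list(...), as.character)
  n <- max(lengths(cols))
  vapply(seq_len(n), function(i) {
    # Take the i-th element of every argument, recycling shorter ones.
    vals <- vapply(cols, function(col) col[(i - 1) %% length(col) + 1],
                   character(1))
    # Drop the missing pieces, then join what is left.
    paste(vals[!is.na(vals)], collapse = sep)
  }, character(1))
}

paste_na("Dan", NA, "Wilson")
paste_na(c("Dan", "Ann"), c(NA, "B."), "Wilson")
```

Compared with `paste()`, which converts `NA` to the string "NA", this version also avoids the doubled separator that would appear if the missing value were replaced by an empty string.
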
  28. Standardised data
      > data(package = "segmentr")
      Data sets in package 'segmentr':
      ask_conversion          Ask Conversion Table
      lookup_channel          Channel Code Lookup
      lookup_classification   Classification Code Lookup
      segments                Segments
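
A lookup table shipped with a package is typically joined onto project data to turn codes into labels. The slides list the data set names only, so the sketch below uses a hypothetical stand-in for `lookup_channel`; its column names, codes and labels are all assumptions.

```r
# Hypothetical stand-in for a packaged lookup table such as lookup_channel.
# The columns and values are assumed, not the package's real contents.
lookup_channel <- data.frame(
  channel_code = c("DM", "EM", "TM"),
  channel_name = c("Direct mail", "Email", "Telemarketing"),
  stringsAsFactors = FALSE
)

# Made-up project data containing one code that is not in the lookup.
gifts <- data.frame(
  gift_id = 1:4,
  channel_code = c("EM", "DM", "XX", "EM"),
  stringsAsFactors = FALSE
)

# Left join: every gift is kept, and unknown codes get NA so they are easy to spot.
labelled <- merge(gifts, lookup_channel, by = "channel_code", all.x = TRUE)
labelled <- labelled[order(labelled$gift_id), ]
print(labelled)
```

Keeping the lookup in one shared place means a corrected label only has to be fixed once, rather than in every project that uses the code.
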
  29. Streamlined process*
      • Build templates
      • Start the project quickly
      • Next step is client specific templates
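
The talk does not show its templates. As a minimal illustration of the idea, a script template with placeholders can be filled in and written out when a project starts. The `{{placeholder}}` syntax, the `fill_template()` helper, and the project and client names are hypothetical.

```r
# Hypothetical script template; {{...}} marks the values to fill in.
template <- c(
  "# Project: {{project}}",
  "# Client:  {{client}}",
  "source(\"src/99_functions.R\")"
)

# Replace each {{key}} with its value, treating the placeholder as literal text.
fill_template <- function(template, values) {
  for (key in names(values)) {
    template <- gsub(paste0("{{", key, "}}"), values[[key]], template,
                     fixed = TRUE)
  }
  template
}

script <- fill_template(
  template,
  list(project = "Winter appeal", client = "Example Charity")
)

# Write the filled-in script to a temporary location for this demonstration.
script_path <- file.path(tempdir(), "01_analysis.R")
writeLines(script, script_path)
```

A client-specific template, the next step named on the slide, could be a second template vector with that client's defaults already filled in.
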
  30. Getting data out of R*
      • Copying and pasting from the console is painful (or impossible)
      • Writing to CSV and opening to copy/paste from is too many steps
      • copy_clip()
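
The talk names `copy_clip()` but does not show its code. As a rough sketch of the idea, a data frame can be rendered as tab-separated text, which spreadsheet programs accept when pasting, and then sent to the clipboard. `clipr::write_clip()` and `clipr::clipr_available()` are real functions from the clipr package, but whether the author's function uses clipr is an assumption, and both helper names below are hypothetical.

```r
# Hypothetical helper: render a data frame as tab-separated text.
to_clip_text <- function(df) {
  lines <- utils::capture.output(
    utils::write.table(df, sep = "\t", quote = FALSE, row.names = FALSE)
  )
  paste(lines, collapse = "\n")
}

# Hypothetical helper: copy to the clipboard when clipr is installed and a
# clipboard is reachable; otherwise just return the text.
copy_clip_sketch <- function(df) {
  text <- to_clip_text(df)
  if (requireNamespace("clipr", quietly = TRUE) && clipr::clipr_available()) {
    clipr::write_clip(text)
  } else {
    message("Clipboard not available; returning the text instead.")
  }
  invisible(text)
}

summary_table <- data.frame(segment = c("A", "B"), donors = c(120, 45))
clip_text <- to_clip_text(summary_table)
cat(clip_text, "\n")
```

After calling `copy_clip_sketch(summary_table)` in an interactive session, the table can be pasted straight into a spreadsheet without writing a CSV first.
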
  31. Build solutions to common problems
      • Limit manual intervention
      • Make objects (data, functions, etc.) accessible
      • Reduce repetition
      • Simplify integration with other tools/software
  32. Key take outs
      1. Workflows require a little bit of forethought… take the time
      2. Start at your level and build your skills
      3. Ease the friction
  33. Thanks @danwwilson
