
Practical Workflows in R

How to improve R workflows

Published in: Data & Analytics

  1. thedatacollective @danwwilson #useR2018 Practical R Workflows: How I smooth the bumps
  2. Who am I?
  3. Less of…
  4. More of…
  5. 2017 2018 2019 • July • September • Founded
  6. Who should be here?
  12. Where to start?
  13. It’s all about the data
  14. Set your requirements… then be flexible
  15. Make it yours
  16. Organise things*
      • Name things well: http://bit.ly/Jenny_naming
      • Determine your folder structure and use it consistently
  17. Identify points of friction
  18. Workflows take thinking
      • What work is repetitive?
      • What data can you standardize across projects?
      • How should you name things?
      • How should you structure projects?
      • What steps in a project slow you down?
  19. Building skills
  20. Example Project: Receive Data • Standardise Data • Review Summary • Client • thedatacollective • Analyse Data • Review data • Zip Outputs
  21. Re-use code
  22. Create functions
      get_recency(tx_date, current_date)
      get_frequency(num_gifts)
      get_value(gift_amount)
      All in src/99_functions.R

      get_value <- function(amount) {
        # Coerce to numeric; values that cannot be parsed become NA
        amount <- suppressWarnings(as.numeric(amount))
        # Send NA to -Inf so it lands in the "ERROR" band
        x <- replace(amount, is.na(amount), -Inf)
        # Left-closed bands: [lower, upper); zero and negatives are "ERROR"
        cut(x,
            breaks = c(-Inf, 0.00001, 10, 25, 50, 100, 250, 500, 1000, Inf),
            labels = c("ERROR", "<$10", "$10-$24.99", "$25-$49.99",
                       "$50-$99.99", "$100-$249.99", "$250-$499.99",
                       "$500-$999.99", "$1,000+"),
            include.lowest = TRUE, right = FALSE)
      }
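The slide lists get_recency(tx_date, current_date) but does not show its body. A minimal sketch in the same style as get_value might look like this. The name get_recency_sketch, the band boundaries, the labels and the treatment of future dates are all assumptions made for illustration, not the author's definitions.

```r
# Hypothetical stand-in for get_recency(); bands and labels are assumed.
get_recency_sketch <- function(tx_date, current_date = Sys.Date()) {
  # Whole days between the transaction and the reference date
  days <- as.numeric(as.Date(current_date) - as.Date(tx_date))
  # Missing dates and dates in the future are sent to the "ERROR" band
  x <- replace(days, is.na(days) | days < 0, -Inf)
  # Left-closed bands, mirroring the cut() call used in get_value()
  cut(x,
      breaks = c(-Inf, 0, 366, 731, 1096, Inf),
      labels = c("ERROR", "0-12 months", "13-24 months",
                 "25-36 months", "37+ months"),
      right = FALSE)
}

# Example: one recent gift, one older gift, one missing date
tx <- as.Date(c("2018-06-01", "2016-12-25", NA))
recency <- get_recency_sketch(tx, current_date = as.Date("2018-07-10"))
```

Keeping the same "coerce, replace bad values, cut" shape across the three functions means they can sit side by side in src/99_functions.R and later move into a package unchanged.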
  23. But…
  24. Create a package
      segmentr: http://bit.ly/segmentr
      • http://bit.ly/pkg_hadley
      • http://bit.ly/pkg_hilary
      • http://bit.ly/pkg_rstudio
      • http://bit.ly/pkg_karl
  25. Start at your level, and improve
      • Start simple
      • Re-usable code snippets
      • Keep them together
      • Build a file of commonly used functions (99_functions.R)
      • Build a package
      • When you can make the time, or skills are ready
  26. Reducing friction
  27. What slows you down?
      • R is great even with its quirks
      > paste("Dan", NA, "Wilson")
      # Dan NA Wilson
      paste_na()
      > paste_na("Dan", NA, "Wilson")
      # Dan Wilson
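The slide shows what paste_na() returns but not how it is written. One way such a helper could work is sketched below, assuming the goal is simply to drop NA values element by element before pasting. The name paste_na_sketch and the sep argument are illustrative; the real function in segmentr may differ.

```r
# Hypothetical stand-in for paste_na(): paste values together, skipping NAs.
paste_na_sketch <- function(..., sep = " ") {
  args <- list(...)
  # Paste one "row" of values, dropping any that are NA
  paste_row <- function(...) {
    x <- c(...)
    paste(x[!is.na(x)], collapse = sep)
  }
  # mapply() walks the inputs in parallel, so vectors work as well as scalars
  do.call(mapply, c(list(FUN = paste_row), args, list(USE.NAMES = FALSE)))
}

paste_na_sketch("Dan", NA, "Wilson")
# "Dan Wilson"
```

Because the NA is removed before paste() runs, no doubled separator is left behind where the missing value used to be.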
  28. Standardised data
      > data(package = "segmentr")
      Data sets in package 'segmentr':
      ask_conversion         Ask Conversion Table
      lookup_channel         Channel Code Lookup
      lookup_classification  Classification Code Lookup
      segments               Segments
  29. Streamlined process*
      • Build templates
      • Start the project quickly
      • Next step is client specific templates
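A project template can be as small as a function that lays down the same folders every time. The sketch below assumes a layout of data, src and output folders plus the src/99_functions.R file mentioned earlier in the talk. The function name and every folder name other than src are assumptions, not the author's actual template.

```r
# Hypothetical project scaffold; the folder layout is assumed for illustration.
use_project_template_sketch <- function(path,
                                        dirs = c("data", "src", "output")) {
  # Always include src/, since the shared functions file lives there
  dirs <- unique(c(dirs, "src"))
  for (d in file.path(path, dirs)) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  # Seed the functions file only if it is not already present
  fn_file <- file.path(path, "src", "99_functions.R")
  if (!file.exists(fn_file)) {
    writeLines("# Commonly used functions for this project", fn_file)
  }
  invisible(path)
}

# Example: scaffold a throwaway project under the session's temp directory
demo_root <- file.path(tempdir(), "demo_project")
use_project_template_sketch(demo_root)
```

A client-specific template could then be a thin wrapper that calls this function with a different dirs vector.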
  30. Getting data out of R*
      • Copying and pasting from the console is painful (or impossible)
      • Writing to CSV and opening to copy/paste from is too many steps
      • copy_clip()
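The slide names copy_clip() without showing it. A rough sketch of the idea, assuming the aim is to put a data frame on the clipboard as tab-delimited text ready to paste into a spreadsheet, is shown below. The name copy_clip_sketch and the dest argument are illustrative, and the default "clipboard" destination is a Windows-specific assumption; other platforms would need a different destination.

```r
# Hypothetical stand-in for copy_clip(): write a data frame as tab-delimited
# text. The default dest = "clipboard" is assumed to be used on Windows only.
copy_clip_sketch <- function(x, dest = "clipboard") {
  utils::write.table(x, file = dest, sep = "\t",
                     row.names = FALSE, quote = FALSE)
  invisible(x)
}

# Example: write to a temporary file instead of the clipboard
bands <- data.frame(id = 1:2, band = c("<$10", "$1,000+"))
out_file <- tempfile(fileext = ".tsv")
copy_clip_sketch(bands, dest = out_file)
```

Tab-delimited output is chosen here because spreadsheet applications split pasted text into columns on tabs, which removes the write-to-CSV-and-reopen step the slide complains about.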
  31. Build solutions to common problems
      • Limit manual intervention
      • Make objects (data, functions, etc.) accessible
      • Reduce repetition
      • Simplify integration with other tools/software
  32. Key take outs
      1. Workflows require a little bit of forethought… take the time
      2. Start at your level and build your skills
      3. Ease the friction
  33. Thanks @danwwilson