Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
data wrangling in R
@cjlortie
more than half the battle
benefit of wrangling in R versus Excel
reproducibility
transparent
know thy data
values, variables, and observations
tidy your data with variables in columns
vectors of values are easier to work with in R
column headers should be variable names
no special signs such as $#*/ within dataframe
one variable per column and one class
one observational unit per table
missing values
happen
is.na()

na.omit
na.rm=TRUE
simplify & selections from dataframes
drplyr
select for columns
filter for rows
summarise & mutate
%>% to build logical wrangling pipelines
tidy data semantics from Journal of Statistical Software
reading such as ‘tidy data’ idea paper
Upcoming SlideShare
Loading in …5
×

Data wrangling in R: be a wrangleR

804 views

Published on

A short deck illustrating some of the issues resolved using R to wrangle data. Missing values and dataframe simplification are the two issues primarily addressed. A few common data structure issues are also identified.

Published in: Data & Analytics
  • Be the first to comment

  • Be the first to like this

Data wrangling in R: be a wrangleR

  1. 1. data wrangling in R @cjlortie
  2. 2. more than half the battle
  3. 3. benefit of wrangling in R versus Excel
  4. 4. reproducibility transparent
  5. 5. know thy data
  6. 6. values, variables, and observations
  7. 7. tidy your data with variables in columns vectors of values are easier to work with in R
  8. 8. column headers should be variable names
  9. 9. no special signs such as $#*/ within dataframe
  10. 10. one variable per column and one class
  11. 11. one observational unit per table
  12. 12. missing values happen
  13. 13. is.na()
 na.omit na.rm=TRUE
  14. 14. simplify & selections from dataframes
  15. 15. drplyr
  16. 16. select for columns
  17. 17. filter for rows
  18. 18. summarise & mutate
  19. 19. %>% to build logical wrangling pipelines
  20. 20. tidy data semantics from Journal of Statistical Software reading such as ‘tidy data’ idea paper

×