NY R Conference talk

The Lesser Known Stars of the Tidyverse
Emily Robinson
@robinson_es
About Me
➔ R User ~ 6 years
➔ Data Scientist at DataCamp
➔ Enjoy talking about:
◆ A/B testing
◆ Building and finding data science community
◆ R
Talk Goals
1. Keep you hip to the lingo
2. Stop you from doing this ….
…. by sharing useful functions
3. Point you to resources
The Tidyverse
Coherent system of packages for data manipulation, exploration, and visualization that share a common design philosophy
Tidyverse = ?
Tidyverse = !
Tidyverse != Hadleyverse
Many other contributors
Demo
Some steps of a data analysis workflow
➔ View dataset in console
➔ Inspect missing values
➔ Examine some columns
➔ Make a plot
➔ Do something cool and new!
Step 1: print your dataset!
Problem: it takes over the console
Solution: as_tibble()
Prints only 10 rows and the columns that fit on the screen
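A minimal sketch of this step. The built-in iris data frame is an assumption used only for illustration; it is not the dataset from the talk.

```r
library(tibble)

# iris is a plain data.frame: print(iris) would write all 150 rows to the console.
# Converting it to a tibble changes only how it prints, not the data.
iris_tbl <- as_tibble(iris)

# Shows the first 10 rows and only the columns that fit on the screen.
print(iris_tbl)
```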
Step 2: examine your NAs
Problem: how do you do this for every column?
Answer: purrr::map_df() to “map” a function over each column
Problem: missing values aren’t actually NA
Solution: na_if() to replace certain values with NA
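One way the two ideas above might fit together. The survey tibble and its placeholder values ("" and -99) are hypothetical, invented for this sketch.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical toy data: "" and -99 stand in for missing answers.
survey <- tibble(
  age    = c(25, -99, 41, NA),
  gender = c("Female", "", "Male", "Female")
)

# map_df() applies the function to every column and binds the
# per-column results into a one-row tibble of NA counts.
na_before <- survey %>% map_df(~ sum(is.na(.x)))

# na_if() turns the placeholder values into real NAs.
survey_clean <- survey %>%
  mutate(
    age    = na_if(age, -99),
    gender = na_if(gender, "")
  )

# Counting again now picks up the recoded values as well.
na_after <- survey_clean %>% map_df(~ sum(is.na(.x)))
```

Comparing na_before with na_after shows how many missing values were hidden behind placeholders.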
Step 3: examine your numeric columns
Problem: how can I do this quickly?
Solution: dplyr::select_if() + skimr::skim()
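A short sketch of the select_if() plus skim() combination, again using the built-in iris data as a stand-in dataset (an assumption, not the talk's data).

```r
library(dplyr)
library(skimr)

# Keep only the columns for which is.numeric() returns TRUE.
numeric_cols <- iris %>% select_if(is.numeric)

# skim() prints summary statistics for each remaining column.
numeric_summary <- skim(numeric_cols)
print(numeric_summary)
```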
Step 4: examine a single column
Problem: it has multiple answers in each row
Solution: stringr::str_split() and tidyr::unnest()
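A sketch of splitting a multi-answer column into one row per answer. The responses tibble, its tools column, and the comma separator are hypothetical.

```r
library(dplyr)
library(stringr)
library(tidyr)
library(tibble)

# Hypothetical survey column with several answers packed into one string.
responses <- tibble(
  id    = 1:3,
  tools = c("R,Python", "SQL", "R,SQL,Excel")
)

tools_long <- responses %>%
  # str_split() turns each string into a character vector (a list-column).
  mutate(tools = str_split(tools, ",")) %>%
  # unnest() expands the list-column so each answer gets its own row.
  unnest(tools)
```

After this, each answer can be counted with an ordinary count(tools).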
Step 5: make a plot!
Problem: it’s a mess
Solution: coord_flip() …
But they’re not ordered
+ forcats::fct_reorder
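A minimal sketch of a flipped, ordered bar chart. The tool_counts data and the choice of geom_col() are assumptions for illustration.

```r
library(dplyr)
library(forcats)
library(ggplot2)
library(tibble)

# Hypothetical counts per category.
tool_counts <- tibble(
  tool = c("Excel", "Python", "R", "SQL"),
  n    = c(10, 45, 30, 60)
)

# fct_reorder() orders the factor levels of tool by the values of n.
ordered_counts <- tool_counts %>%
  mutate(tool = fct_reorder(tool, n))

# coord_flip() swaps the axes so long category labels sit on the y-axis.
p <- ggplot(ordered_counts, aes(x = tool, y = n)) +
  geom_col() +
  coord_flip()
```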
Final step: do something cool and new!
Problem:
One solution: make a minimal reproducible example
Part 0 (optional): use tribble() to make a toy dataset
Part 1: Use reprex() to find any problems
Credit: Nick Tierney, https://www.njtierney.com/post/2017/01/11/magic-reprex/
Part 2: Use reprex() to post your question or issue
Credit: Nick Tierney, https://www.njtierney.com/post/2017/01/11/magic-reprex/
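A sketch of how the parts above might look in practice. The toy data is hypothetical, and the reprex() call is guarded so it only runs in an interactive session where the reprex package is installed.

```r
library(tibble)

# Part 0: tribble() lays out a small dataset row by row, as it reads on screen.
toy <- tribble(
  ~id, ~tools,
  1,   "R,Python",
  2,   "SQL"
)

# Parts 1 and 2: reprex() runs the code in a fresh session and formats the
# code plus its output for posting (venue = "gh" targets GitHub markdown).
if (interactive() && requireNamespace("reprex", quietly = TRUE)) {
  reprex::reprex({
    library(tibble)
    toy <- tribble(
      ~id, ~tools,
      1,   "R,Python",
      2,   "SQL"
    )
    toy
  }, venue = "gh")
}
```

Because the code runs in a clean session, anything it silently depends on (an unloaded package, an object that exists only in your workspace) shows up as an error before you post.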
Review
stringr::str_split
tidyr::unnest
ggplot2::coord_flip
forcats::fct_reorder
tibble::tribble
reprex::reprex
tibble::as_tibble
purrr::map_df
dplyr::na_if
dplyr::select_if
skimr::skim
Resources
R4ds.had.co.nz
#rstats twitter
Rstudio.com/resources/cheatsheets
DataCamp.com
Learn | https://datacamp.com/courses
Conclusion
The tidyverse
Come for the stickers and package names …
Stay for the friendly community and happy workflow
Thank you!
tiny.cc/nyrtalk
hookedondata.org
@robinson_es