SlideShare a Scribd company logo
1 of 33
Download to read offline
thedatacollective@danwwilson #useR2018
Practical R Workflows
How I smooth the bumps
thedatacollective@danwwilson #useR2018
Who am I?
thedatacollective@danwwilson #useR2018
Less of…
thedatacollective@danwwilson #useR2018
More of..
thedatacollective@danwwilson #useR2018
2017 2018 2019JulySeptember
Founded
thedatacollective@danwwilson #useR2018
Who should be here?
thedatacollective@danwwilson #useR2018
thedatacollective@danwwilson #useR2018
thedatacollective@danwwilson #useR2018
thedatacollective@danwwilson #useR2018
thedatacollective@danwwilson #useR2018
thedatacollective@danwwilson #useR2018
Where to start?
thedatacollective@danwwilson #useR2018
It’s all about the data
thedatacollective@danwwilson #useR2018
Set your requirements…
…then be flexible
thedatacollective@danwwilson #useR2018
Make it yours
thedatacollective@danwwilson #useR2018
*
Organise things*
• Name things well http://bit.ly/Jenny_naming
• Determine your folder structure and use it consistently
thedatacollective@danwwilson #useR2018
Identify points of friction
thedatacollective@danwwilson #useR2018
Workflows take thinking
• What work is repetitive?
• What data can you standardize across projects?
• How should you name things?
• How should you structure projects?
• What steps in a project slow you down?
thedatacollective@danwwilson #useR2018
Building skills
thedatacollective@danwwilson #useR2018
Example Project
Receive
Data
Standardise
Data
Review
Summary
Client
Thedata
collective
Analyse
Data
Review data
Zip Outputs
thedatacollective@danwwilson #useR2018
Re-use code
thedatacollective@danwwilson #useR2018
Create functions
get_recency(tx_date, current_date)
get_frequency(num_gifts)
get_value(gift_amount)
All in src/99_functions.R
get_value <- function(amount) {
amount <- suppressWarnings(as.numeric(amount))
x <- replace(amount, is.na(amount), -Inf)
cut(x, breaks = c(-Inf, 0.00001, 10, 25, 50, 100, 250, 500, 1000, Inf),
labels = c("ERROR", "<$10", "$10-$24.99", "$25-$49.99", "$50-$99.99",
"$100-$249.99", "$250-$499.99", "$500-$999.99", "$1,000+"),
include.lowest = TRUE, right = FALSE)
}
thedatacollective@danwwilson #useR2018
But…
thedatacollective@danwwilson #useR2018
Create a package
segmentr - http://bit.ly/segmentr
• http://bit.ly/pkg_hadley
• http://bit.ly/pkg_hilary
• http://bit.ly/pkg_rstudio
• http://bit.ly/pkg_karl
thedatacollective@danwwilson #useR2018
Start at your level, and improve
• Start simple
• Re-usable code snippets
• Keep them together
• Build a file of commonly used functions (99_functions.R)
• Build a package
• When you can make the time, or skills are ready
thedatacollective@danwwilson #useR2018
Reducing friction
thedatacollective@danwwilson #useR2018
What slows you down?
• R is great even with its quirks
> paste(“Dan”, NA, “Wilson”)
# Dan NA Wilson
paste_na()
> paste_na(“Dan”, NA, “Wilson”)
# Dan Wilson
thedatacollective@danwwilson #useR2018
Standardised data
> data(package = "segmentr")
Data sets in package 'segmentr’:
ask_conversion Ask Conversion Table
lookup_channel Channel Code Lookup
lookup_classification Classification Code Lookup
segments Segments
thedatacollective@danwwilson #useR2018
Streamlined process*
• Build templates
• Start the project quickly
• Next step is client specific templates
thedatacollective@danwwilson #useR2018
Getting data out of R*
• Copying and pasting from the console is painful (or impossible)
• Writing to CSV and opening to copy/paste from is too many steps
• copy_clip()
thedatacollective@danwwilson #useR2018
Build solutions to common problems
• Limit manual intervention
• Make objects (data, functions, etc) accessible
• Reduce repetition
• Simplify integration with other tools/software
thedatacollective@danwwilson #useR2018
Key take outs
1. Workflows require a little bit of forethought… take the time
2. Start at your level and build your skills
3. Ease the friction
thedatacollective@danwwilson #useR2018
Thanks
@danwwilson

More Related Content

Similar to Practical Workflows in R

Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...
Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...
Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...Thiago de Faria
 
Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...
Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...
Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...Codemotion
 
Data In Action: Business Value of Data
Data In Action: Business Value of DataData In Action: Business Value of Data
Data In Action: Business Value of DataMatt Turner
 
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Kent Graziano
 
.NET Development for SQL Server Developer - Sql Saturday #264 Ancona
.NET Development for SQL Server Developer - Sql Saturday #264 Ancona.NET Development for SQL Server Developer - Sql Saturday #264 Ancona
.NET Development for SQL Server Developer - Sql Saturday #264 AnconaMarco Parenzan
 
Managing Database Indexes: A Data-Driven Approach - Amadeus Magrabi
Managing Database Indexes: A Data-Driven Approach - Amadeus MagrabiManaging Database Indexes: A Data-Driven Approach - Amadeus Magrabi
Managing Database Indexes: A Data-Driven Approach - Amadeus MagrabiAmadeus Magrabi
 
Analytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopAnalytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopCCG
 
Data Con LA 2022 - Supercharge your Snowflake Data Cloud from a Snowflake Dat...
Data Con LA 2022 - Supercharge your Snowflake Data Cloud from a Snowflake Dat...Data Con LA 2022 - Supercharge your Snowflake Data Cloud from a Snowflake Dat...
Data Con LA 2022 - Supercharge your Snowflake Data Cloud from a Snowflake Dat...Data Con LA
 
Data Security and Protection in DevOps
Data Security and Protection in DevOps Data Security and Protection in DevOps
Data Security and Protection in DevOps Karen Lopez
 
RedisConf18 - Redis Memory Optimization
RedisConf18 - Redis Memory OptimizationRedisConf18 - Redis Memory Optimization
RedisConf18 - Redis Memory OptimizationRedis Labs
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB
 
The Path to Truly Understanding Your MongoDB Data
The Path to Truly Understanding Your MongoDB DataThe Path to Truly Understanding Your MongoDB Data
The Path to Truly Understanding Your MongoDB DataMongoDB
 
Data-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsData-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsDATAVERSITY
 
NDC Oslo : A Practical Introduction to Data Science
NDC Oslo : A Practical Introduction to Data ScienceNDC Oslo : A Practical Introduction to Data Science
NDC Oslo : A Practical Introduction to Data ScienceMark West
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteCaserta
 
Move a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloudMove a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloudIke Ellis
 
Overview of running R in the Oracle Database
Overview of running R in the Oracle DatabaseOverview of running R in the Oracle Database
Overview of running R in the Oracle DatabaseBrendan Tierney
 
Democratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseDemocratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseJesus Rodriguez
 
DataEd Webinar: Metadata Strategies
DataEd Webinar:  Metadata StrategiesDataEd Webinar:  Metadata Strategies
DataEd Webinar: Metadata StrategiesDATAVERSITY
 

Similar to Practical Workflows in R (20)

Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...
Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...
Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...
 
Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...
Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...
Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...
 
Data In Action: Business Value of Data
Data In Action: Business Value of DataData In Action: Business Value of Data
Data In Action: Business Value of Data
 
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
 
.NET Development for SQL Server Developer - Sql Saturday #264 Ancona
.NET Development for SQL Server Developer - Sql Saturday #264 Ancona.NET Development for SQL Server Developer - Sql Saturday #264 Ancona
.NET Development for SQL Server Developer - Sql Saturday #264 Ancona
 
Managing Database Indexes: A Data-Driven Approach - Amadeus Magrabi
Managing Database Indexes: A Data-Driven Approach - Amadeus MagrabiManaging Database Indexes: A Data-Driven Approach - Amadeus Magrabi
Managing Database Indexes: A Data-Driven Approach - Amadeus Magrabi
 
Analytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopAnalytics in a Day Virtual Workshop
Analytics in a Day Virtual Workshop
 
Data Con LA 2022 - Supercharge your Snowflake Data Cloud from a Snowflake Dat...
Data Con LA 2022 - Supercharge your Snowflake Data Cloud from a Snowflake Dat...Data Con LA 2022 - Supercharge your Snowflake Data Cloud from a Snowflake Dat...
Data Con LA 2022 - Supercharge your Snowflake Data Cloud from a Snowflake Dat...
 
Data Security and Protection in DevOps
Data Security and Protection in DevOps Data Security and Protection in DevOps
Data Security and Protection in DevOps
 
BD-ACA Week6
BD-ACA Week6BD-ACA Week6
BD-ACA Week6
 
RedisConf18 - Redis Memory Optimization
RedisConf18 - Redis Memory OptimizationRedisConf18 - Redis Memory Optimization
RedisConf18 - Redis Memory Optimization
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
The Path to Truly Understanding Your MongoDB Data
The Path to Truly Understanding Your MongoDB DataThe Path to Truly Understanding Your MongoDB Data
The Path to Truly Understanding Your MongoDB Data
 
Data-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsData-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling Fundamentals
 
NDC Oslo : A Practical Introduction to Data Science
NDC Oslo : A Practical Introduction to Data ScienceNDC Oslo : A Practical Introduction to Data Science
NDC Oslo : A Practical Introduction to Data Science
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
 
Move a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloudMove a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloud
 
Overview of running R in the Oracle Database
Overview of running R in the Oracle DatabaseOverview of running R in the Oracle Database
Overview of running R in the Oracle Database
 
Democratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseDemocratizing Data Science in the Enterprise
Democratizing Data Science in the Enterprise
 
DataEd Webinar: Metadata Strategies
DataEd Webinar:  Metadata StrategiesDataEd Webinar:  Metadata Strategies
DataEd Webinar: Metadata Strategies
 

Recently uploaded

Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...ThinkInnovation
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformationAnnie Melnic
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are successPratikSingh115843
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfnikeshsingh56
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...ThinkInnovation
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfNicoChristianSunaryo
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etclalithasri22
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 

Recently uploaded (16)

Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
 
2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformation
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are success
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdf
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdf
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etc
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 

Practical Workflows in R