Sharing 101
Code Reproducibility & Sharing
Series
Omnia Mohamed
Data Analytics Engineer , IBM
What you will NOT learn during this session?
How to Code in R
How to be a professional Git user
What you’ll get from this session?
How to configure Git and R to play nice together
How to organize your R projects
How to publish your first R project on GitHub
Some tips to make your code more shareable
Why do we need Git?
Version Control
Secure organized location for your code
Computer crashed?
Building a Career
● An essential skill for the job market
● Your GitHub account will be a portfolio of your data science projects
● Base for blogging
Team Collaboration
No need for shared folders
Easier tracking of changes
Code merging capabilities
Easy finger pointing
Who is who?
Git
Open source project for version control originally developed in 2005.
GitHub
Web-based Git repository hosting service, which offers all of the distributed revision control and source code management (SCM) functionality of Git.
Where do I start?
Install R & RStudio
https://cloud.r-project.org/
https://www.rstudio.com/products/rstudio/download/
Install Git
https://gitforwindows.org/
Configure your account in your local Git
Open Git Bash and run the following commands:
git config --global user.name 'Jane Doe'
git config --global user.email 'jane@example.com'
git config --global --list #this should show the configurations you just set
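The same check can also be run from inside R. This is a minimal sketch, assuming Git is available on the system PATH; it only reads the global configuration and changes nothing.

```r
# Locate the git executable; Sys.which() returns "" when it is not found.
git_path <- Sys.which("git")

if (nzchar(git_path)) {
  # List the global settings (user.name, user.email, ...).
  # suppressWarnings() hides the warning R raises if no global config exists yet.
  git_settings <- suppressWarnings(
    system2(git_path, c("config", "--global", "--list"),
            stdout = TRUE, stderr = TRUE)
  )
  print(git_settings)
} else {
  message("Git was not found on the PATH; install it first.")
}
```

If the printed lines include user.name and user.email, R and Git can see the same configuration.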
Create your first repository
From the GitHub website:
Create new repository
Create your first repository
Get it local
Using RStudio:
1. File -> New Project -> Version Control -> Git
2. Paste the repository URL shown on the GitHub repository page
Or from the Git Bash command line:
git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY.git
Write your first script
File -> New File -> R Script
Generate some random data
x <- rnorm(1000)
y <- x * 2 + rnorm(1000)
df <- data.frame(x, y)
Visualize it
library(ggplot2)  # ggplot(), aes() and geom_point() come from ggplot2
ggplot(data = df, mapping = aes(x, y)) + geom_point()
Save!
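A quick way to confirm the random data behaves as intended is to fit a line and look at the slope, which should come out close to 2. This is a sketch in base R; the seed value 42 is an arbitrary choice made here so the check is repeatable, it is not part of the original script.

```r
# Fix the seed so the same numbers are generated on every run (42 is arbitrary).
set.seed(42)

# Same data-generating steps as the first script.
x <- rnorm(1000)
y <- x * 2 + rnorm(1000)
df <- data.frame(x, y)

# Fit y as a linear function of x and pull out the estimated slope.
fit <- lm(y ~ x, data = df)
slope <- unname(coef(fit)["x"])
print(round(slope, 2))
```

With 1000 points the estimate should land very near the true slope of 2 that was used to build y.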
Let’s land on git
Simple git workflow
Let’s land on git
Commit to local repository
Add a commit message
Push to remote repository
Check it out on the web
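The commit steps above can be sketched from R by calling the git command line on a throwaway repository. This assumes Git is installed and a Unix-like shell; the folder name demo_repo, the file name and the commit message are made up for illustration, and the push is left commented out because it needs a configured remote (the branch name main is also an assumption).

```r
# Throwaway repository in the session's temporary directory (hypothetical name).
repo_dir <- file.path(tempdir(), "demo_repo")
dir.create(repo_dir, showWarnings = FALSE)

# Helper: run a git command inside repo_dir and capture its output.
run_git <- function(...) {
  suppressWarnings(
    system2("git", c("-C", shQuote(repo_dir), ...),
            stdout = TRUE, stderr = TRUE)
  )
}

log_lines <- character(0)
if (nzchar(Sys.which("git"))) {
  run_git("init")
  writeLines("x <- rnorm(1000)", file.path(repo_dir, "first_script.R"))

  # Stage the new file, then commit it locally with a message.
  run_git("add", "first_script.R")
  run_git("-c", shQuote("user.name=Jane Doe"),
          "-c", shQuote("user.email=jane@example.com"),
          "commit", "-m", shQuote("Add first script"))

  # run_git("push", "origin", "main")  # publishes to the remote; needs a remote set up

  # Show the commit history, one line per commit.
  log_lines <- run_git("log", "--oneline")
  print(log_lines)
}
```

RStudio's Git pane performs the same add, commit and push steps through buttons, so this is only meant to show what happens underneath.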
Tips for Git newcomers
Write a meaningful message for every commit
Commit frequently
Push only tested code
Pull frequently
Sharing data science projects
The Ikea Mode vs. the Plug & Play Mode
The keys of a plug & play project
● Has a README file
● Standard coding convention
● Organized project directory
● Reproducible code
● Executable outputs
README File
Project Title
Project scope
Environment and version info
Prerequisites
Installation guide
Example of usage
Authors
Contribution
License
You don’t need to include all sections, only the ones that apply to your project.
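One way to get started is to generate a skeleton with the sections listed above and then delete what does not apply. This is a sketch in base R; writing the file as Markdown and placing it in a temporary folder are assumptions made here, in a real project it would sit in the project root.

```r
# Section headings taken from the list above, written as Markdown headings.
readme_lines <- c(
  "# Project Title",
  "",
  "## Project scope",
  "",
  "## Environment and version info",
  "",
  "## Prerequisites",
  "",
  "## Installation guide",
  "",
  "## Example of usage",
  "",
  "## Authors",
  "",
  "## Contribution",
  "",
  "## License"
)

# Hypothetical location; use the project root in a real project.
readme_path <- file.path(tempdir(), "README.md")
writeLines(readme_lines, readme_path)
print(readLines(readme_path))
```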
Project Directory organization
● Script files: kept together in one folder, also known as the “scripts” folder
● Markdown reports: each report has its own folder inside the reports folder
● Data: saved under 2 folders, “Raw” for original data and “Preprocessed” for manipulated and cleaned data
● Shiny apps: each app has its own folder under the apps folder
● Additional folders as you need, like docs or figs
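A layout like this can be created in one go. The sketch below uses base R; the folder names (scripts, reports, data/raw, data/preprocessed, apps, docs, figs) and the project name my_project are illustrative choices, not names fixed by the slides.

```r
# Hypothetical project root inside the session's temporary directory.
project_root <- file.path(tempdir(), "my_project")

# Folder names are assumptions chosen to mirror the layout described above.
folders <- c(
  "scripts",
  "reports",
  file.path("data", "raw"),
  file.path("data", "preprocessed"),
  "apps",
  "docs",
  "figs"
)

# recursive = TRUE also creates parent folders such as "data".
for (folder in folders) {
  dir.create(file.path(project_root, folder),
             recursive = TRUE, showWarnings = FALSE)
}

print(list.dirs(project_root, full.names = FALSE, recursive = TRUE))
```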
Standard coding convention
Tidyverse style guide
Google R style guide
Make it Readable
File names: meaningful, with no special characters, and prefixed with a sequence number if the files should run in order, e.g. 00_dataprep_functions.R
Attribute names: lowercase with underscores, e.g. expiry_date
Assignment: use <- instead of =, e.g. x <- 5 (RStudio shortcut: Alt + -)
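The numeric prefix pays off when scripts are run in a loop, because sorting the file names gives the intended order. A small sketch in base R; the second file name 01_analysis.R and the contents of both scripts are made up for illustration.

```r
# Temporary folder holding two tiny scripts (names and contents are hypothetical).
scripts_dir <- file.path(tempdir(), "ordered_scripts")
dir.create(scripts_dir, showWarnings = FALSE)

writeLines("expiry_date <- as.Date('2030-01-01')",
           file.path(scripts_dir, "00_dataprep_functions.R"))
writeLines("days_left <- as.numeric(expiry_date - as.Date('2029-12-31'))",
           file.path(scripts_dir, "01_analysis.R"))

# Sorting by name puts 00_ before 01_, so the second script can rely on the first.
script_files <- sort(list.files(scripts_dir, pattern = "\\.R$", full.names = TRUE))
for (script_file in script_files) {
  source(script_file)
}

print(basename(script_files))
print(days_left)
```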
Functions naming and commenting
Same naming as objects, e.g.:
#' Drop last column of dataframe
#' @param data A dataframe.
#' @return dataframe after dropping last column.
#' @examples
#' drop_last_col(iris)
drop_last_col <- function(data) {
  dropped_data <- data[-c(length(data))]
  return(dropped_data)
}
Function objective
Function parameters
Name is lowercase with no special characters; the opening brace comes right after the function definition
The closing brace goes at the end on a separate line
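To see the documented function in action, here is a short usage sketch on the built-in iris data set, which has five columns with Species as the last one. The function is repeated so the snippet runs on its own.

```r
# Same function as above, repeated so this snippet is self-contained.
drop_last_col <- function(data) {
  dropped_data <- data[-length(data)]
  return(dropped_data)
}

trimmed_iris <- drop_last_col(iris)

# Column names before and after dropping the last column.
print(names(iris))
print(names(trimmed_iris))
```

The result is still a data frame, just without the Species column.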
Make it Reproducible - here
here():
library(here)
file_name <- here("data", "file.csv")
# file_name now holds the path "myprojectrootfolder/data/file.csv"
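A slightly more defensive sketch of the same idea, for machines where the here package might not be installed yet. The fallback branch based on getwd() is an assumption added here for illustration only; it is not what here() does internally.

```r
if (requireNamespace("here", quietly = TRUE)) {
  # Path built relative to the project root detected by the here package.
  file_name <- here::here("data", "file.csv")
} else {
  # Illustration-only fallback: build the path from the working directory.
  file_name <- file.path(getwd(), "data", "file.csv")
}

print(file_name)

# Only try to read the file if it actually exists on this machine.
if (file.exists(file_name)) {
  my_data <- read.csv(file_name)
}
```

Either way the script contains no hard-coded absolute path, so it keeps working when the project folder is moved or cloned elsewhere.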
Make it Reproducible - Seed
For data or results that depend on random number generation, use set.seed() to ensure the same results every time.
par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- rnorm(1000)
  hist(x, main = paste0("fig", i))
}
Make it Reproducible - Seed
par(mfrow = c(2, 2))
for (i in 1:4) {
  # Resetting the seed before each draw makes all four histograms identical
  set.seed(123)
  x <- rnorm(1000)
  hist(x, main = paste0("fig", i))
}
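The effect of the seed can also be checked without plots. A minimal sketch in base R, reusing the seed value 123 from the example above:

```r
# Same seed, same draw: the two vectors match exactly.
set.seed(123)
first_draw <- rnorm(5)

set.seed(123)
second_draw <- rnorm(5)

print(identical(first_draw, second_draw))

# Drawing again without resetting the seed continues the random stream,
# so these values differ from the first draw.
third_draw <- rnorm(5)
print(identical(first_draw, third_draw))
```

In a real analysis the seed is usually set once at the top of the script rather than inside a loop.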
Make it reproducible - pacman
Make sure that the packages you use are installed on the machine running the code:
# Install pacman if it is not already available
if (!require(pacman)) {
  install.packages("pacman")
}
# p_load() installs any missing packages, then loads them all into the environment
pacman::p_load("tidyverse", "caTools", "glmnet")
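If you would rather see what is missing before anything gets installed, a check in base R could look like the sketch below. It is an alternative to pacman, not how pacman works internally, and the install step is left commented out on purpose.

```r
required_packages <- c("tidyverse", "caTools", "glmnet")

# requireNamespace() returns TRUE when a package is installed and can be loaded.
is_installed <- vapply(required_packages, requireNamespace,
                       logical(1), quietly = TRUE)

missing_packages <- required_packages[!is_installed]

if (length(missing_packages) > 0) {
  message("Missing packages: ", paste(missing_packages, collapse = ", "))
  # install.packages(missing_packages)  # uncomment to install them
} else {
  message("All required packages are installed.")
}
```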
Make it Reproducible
Environment practices:
● Use packrat for library management
● Use checkpoint to pin package versions to a CRAN snapshot date
● Use Docker for full environment sharing
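Even before adopting any of these tools, a lightweight first step is to record the R version and loaded packages next to your results. This sketch uses base R only and is a complement to the tools above, not a replacement; the output file name session_info.txt and the temporary folder are assumptions.

```r
# Hypothetical output location; in a project this could live in the docs folder.
session_file <- file.path(tempdir(), "session_info.txt")

# capture.output() turns the printed session summary into lines of text.
writeLines(capture.output(sessionInfo()), session_file)

print(R.version.string)
print(head(readLines(session_file)))
```

The saved text is also a convenient source for the "Environment and version info" section of a README.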
Make it executable
● Use R Markdown for reporting analysis (will have a session on it later ;) )
● Use Shiny apps for tools and interactive reports
● Use APIs for accessible models (plumber is your friend)
● Create packages
Now it’s your turn
Fork the repo containing the world life expectancy dataset:
https://github.com/rfordatascience/tidytuesday/tree/master/data/2018/2018-07-03
Create your own project
Organize it your way
Find out :
● Top 3 countries with the highest life expectancy in 2015.
● Top 3 countries that improved the most over the past 20 years.
Share your repo with us on the meetup website
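If you are not sure where to begin, the two questions can be approached as in the sketch below. It runs on a tiny made-up table, not the real dataset: the country labels and numbers are invented for illustration, and the column names country, year and life_expectancy are assumptions that should be checked against the actual file.

```r
# Invented toy data; replace with the real dataset in your own project.
life_data <- data.frame(
  country = rep(c("Country A", "Country B", "Country C", "Country D"), each = 2),
  year = rep(c(1995, 2015), times = 4),
  life_expectancy = c(60, 70, 75, 80, 50, 65, 78, 79),
  stringsAsFactors = FALSE
)

# Question 1: highest life expectancy in 2015.
latest <- life_data[life_data$year == 2015, ]
top_latest <- head(latest[order(-latest$life_expectancy), ], 3)
print(top_latest$country)

# Question 2: largest improvement between the first and last year.
earliest <- life_data[life_data$year == 1995, ]
merged <- merge(earliest, latest, by = "country", suffixes = c("_start", "_end"))
merged$improvement <- merged$life_expectancy_end - merged$life_expectancy_start
top_improved <- head(merged[order(-merged$improvement), ], 3)
print(top_improved$country)
```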
The first 3 to submit following the mentioned guidelines will win a voucher worth 50 LE.
May the odds be ever in your favor!
References
Git Resources:
https://git-scm.com/book/en/v2
https://happygitwithr.com/install-git.html#install-git-windows
https://www.javaworld.com/article/2113465/git-smart-20-essential-tips-for-git-and-github-users.html
References - cont.
Reproducibility and project organization:
https://swcarpentry.github.io/r-novice-gapminder/02-project-intro/
https://kbroman.org/steps2rr/pages/organize.html
https://github.com/swcarpentry/good-enough-practices-in-scientific-computing/blob/gh-pages/good-enough-practices-for-scientific-computing.pdf
Read me template:
https://gist.github.com/PurpleBooth/109311bb0361f32d87a2
References - Cont.
Style guides:
https://style.tidyverse.org/files.html#names
https://google.github.io/styleguide/Rguide.xml
Environment packaging :
https://rstudio.github.io/packrat/walkthrough.html
https://colinfay.me/docker-r-reproducibility/
Thank you!
Cairo
Meetup


