Code sharing and review
in the open with
rOpenSci
Julia Gustavsen, PhD
About me
• Background:
• PhD from University of British Columbia,
(Vancouver, Canada) in marine microbial ecology
• Current job:
• Bioinformatician at health tech company SOPHiA
GENETICS (St-Sulpice, VD, CH)
• Other activities:
• rOpenSci reviewer
• Software Carpentry instructor
Scientific data access and
analysis via R packages
Tutorials
UnConferences
Supporting post-docs and
other researchers
rOpenSci
Promoting tools and
best practices
https://ropensci.org
The issue of reproducibility in science
Vines et al.
The availability of research data declines rapidly with article age.
Current Biology, 2014. 24(1):94-97
Limited availability of data (and code) from analyses can be a problem for the reproducibility of results.
[Figure: The effect of article age on receiving data from the authors. x-axis: age of paper (years); y-axis: probability that data are available or can be shared.]
rOpenSci: enabling access to scientific
data and reproducibility in analyses
• R packages = Software written in the
R programming language
• rOpenSci has more than 307 R packages available
• What kinds of packages and data?
• Altmetrics
• Databases
• Geospatial
• Image Processing
• Text mining and language processing
• Computing Infrastructure
• Security
• Taxonomy
https://ropensci.org/packages/
Example of facilitating database
access: NCBI
• The package “rentrez” is used to access NCBI’s databases, which hold a large amount of publication and biological data.
• NCBI can be accessed through the web interface (“Entrez”), through the FTP site, or through the application programming interface (API). rentrez uses the API to make it easier to get data.
• Many tutorials at:
ropensci.org/tutorials
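To make concrete what a wrapper package such as rentrez saves the user from doing by hand, here is a minimal sketch of a raw search request against NCBI's E-utilities API. It is written in Python only so that it is self-contained and runs offline; it is not how rentrez itself is implemented. In R, rentrez hides this kind of request behind functions such as `entrez_search()`. The helper names `build_esearch_url` and `parse_esearch_response` and the sample response are hypothetical and exist only for illustration.

```python
import json
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities "esearch" endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=5):
    """Build the URL for a search of one NCBI database (hypothetical helper)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


def parse_esearch_response(text):
    """Extract the total hit count and the returned record IDs from a JSON reply."""
    result = json.loads(text)["esearchresult"]
    return int(result["count"]), result["idlist"]


# Hand-written stand-in for a server reply, so the example needs no network access.
# The IDs are made up and do not refer to real records.
sample_response = '{"esearchresult": {"count": "2", "idlist": ["111", "222"]}}'

url = build_esearch_url("pubmed", "marine microbial ecology")
count, ids = parse_esearch_response(sample_response)
print(url)
print(count, ids)
```

Even this small example has to deal with URL encoding, response formats and field names. A package that wraps the API takes care of those details so that researchers can focus on the data.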
R packages submitted to rOpenSci
Submit
• Author submits R package to rOpenSci
Code review
• Editor assigns two reviewers
Revisions
• Rounds of revisions with author and reviewers
Decision
• Decision by editor; R package included as part of rOpenSci
Why does this code review work well?
• Code review is done in the open.
• Happens on GitHub
• Info available to other authors and
interested parties
• Less fear of unfairly negative review for
authors
• Code of conduct and reviewing guide
• Overall benefits of the code review
• Improved code
• Author and reviewer learn something
• About code
• About data accessibility
• Warm fuzzy feeling of helping another researcher
• Networking
Get involved with rOpenSci:
• Use their packages and discuss!
https://discuss.ropensci.org/
• Get involved with onboarding (submit a package, review a package)
https://github.com/ropensci/onboarding
• More ways to get involved
https://ropensci.org/community/
@rOpenSci
ropensci