Meet a 100% R-based CRO
The summary of a 5-year journey
Adrian Olszewski
Principal Biostatistician at 2KMM CRO
The R/Pharma 2022 Conference, Nov 10th 2022
10min
www.2kmm.eu
aolszewski@2kmm.pl
Disclaimer
This presentation shows the biostatistician’s perspective first.
Lots of exploratory research, involving tens of statistical tests, complex survival models and non-parametric
methods. Here producing TFLs is important but secondary. I need the NUMBERS first to populate them.
It's not to "blame" or "unfairly criticize".
My job is to analyse the trial's data on time and within the budget.
If something does not work and I cannot meet the deadline, if fixing things exceeds the budget, if the situation
seems hopeless, then there's no time for sentiments, ideology, or hiding problems. I'm going to be held
accountable for the effects of my work, not for my "love for the tool".
"You get what you pay for. It's free software. Stop complaining."
The fact that something is free does not mean it cannot be improved. The first step is to admit that problems
exist, diagnose them and be honest. Counteracting effectively takes a sober assessment of the situation.
"So why do you waste your time? Buy XYZ® and be happy."
I really want to make things better. I ❤️ R. If I did not, I'd have abandoned it in 2000. Things get better but
won't "fix themselves" magically. Relationships can be tough, but that's no reason to give up! Besides, it's fun!
Introduction ► Who we are
⦿ The 2KMM - a small Polish CRO with a global reach.
⦿ 100% R-based:
trial design • DM • datasets • analysis & research • TFL • documents •
consulting • tutoring • making tools
⦿ 28 projects: RCTs + observational studies (in several therapeutic areas)
lots of ad hoc research
Introduction ► Who we are
Our specifics:
• No CDISC yet; data sources based on SQL views
• Lots of planned exploratory analyses with complex scenarios
• Sometimes asked to use dinosaur tools vs. the freshest, widely used methods
• Being a CRO, we have less decision power than a big pharmaceutical company:
o a Sponsor may have its own vision and demand that we follow it
o our proposals may be questioned (sometimes without discussion)
• Very diverse requests from different sponsors:
o make tables like X, make tables like Y
o use this format, use that format
o we prefer X, we HATE X. ABC is important vs. ABC is negligible vs. please decide
→ It's difficult to work out a common approach, workflow, template.
Introduction ► History
⦿ When we started, a few questions had to be answered:
• Can we rely on R entirely? Will it suffice? Everyone around uses SAS
• What are the hidden costs of using open source (no free lunch)?
• Can we trust R? How to validate it?
• What packages do we need to start? Collection of requirements
• How to organize the working environment (SOPs, technical aspects)?
In general, we were rather optimistic in 2018.
Introduction ► Opinions
⦿ After 5 years we have some opinions:
• Did R suffice to complete our work? Mostly…
• Could we just "launch R and focus on the work"? Partially…
• Could we trust R on faith? Did we fail? No. / Painfully.
• What are the hidden costs of using open source? Non-negligible
• How many packages did we end up with? 230+
• Describe the experience briefly? Annoyance, determination, fixing stuff, reporting issues, researching, satisfaction
• Are we happy with R? Will we stay with it? It's a tough ♥ / Yes
• Why? It's flexible. It's getting better. It's worth it. We learned "HOW"
Introduction ► Costs
It is not true that using free software does not cost a penny. It costs time
that could be spent doing the analysis, but is instead spent on:
• Collecting the library of necessary tools. That's not easy; I will show why.
• Validating the selected tools (making sure 2+2=4)
• Realizing that an important package fails or is gone (hello, CRAN!)
• Contacting authors or the entire group, reporting issues on GitHub
• Searching for a replacement (+ validation), which may lack features
• If there is no response, researching the problem on your own
• Paying for external consultancy, books, pay-walled articles to move on
Introduction ► Costs
How much did it cost?
An equivalent of a few 1yr licenses of a „good commercial software”.
Wait, what!? So where are the savings then?!
1. The cost is distributed over time (a year, say)
2. Such a big cost is rather one-off, at the beginning of the process.
Occasional costs will occur, though (new versions, "retired packages")
3. You get what you need (mostly), not what others decide you need
4. Once done, it can be reused an unlimited number of times (no per-user licenses)
5. You have better control over what you have, because you are the one who made it.
6. You get the code, so there is at least a little chance to fix things with your own hands
7. As long as your repository (library) is validated and frozen, you sleep well.
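Point 7 can be made checkable. The following is only a minimal sketch, assuming the "frozen" state is recorded as a plain package/version manifest. The helper names `snapshot_library()` and `library_drift()` are hypothetical, and R's base library (`.Library`) merely stands in for a validated library directory.

```r
# Record package names and versions found in one library directory.
snapshot_library <- function(lib) {
  ip <- utils::installed.packages(lib.loc = lib)
  data.frame(Package = unname(ip[, "Package"]),
             Version = unname(ip[, "Version"]),
             stringsAsFactors = FALSE)
}

# List packages that were added, removed or changed since the manifest was taken.
library_drift <- function(manifest, lib) {
  current   <- snapshot_library(lib)
  pkgs      <- union(manifest$Package, current$Package)
  frozen_v  <- manifest$Version[match(pkgs, manifest$Package)]
  current_v <- current$Version[match(pkgs, current$Package)]
  changed   <- is.na(frozen_v) | is.na(current_v) | frozen_v != current_v
  data.frame(Package = pkgs, Frozen = frozen_v, Current = current_v,
             stringsAsFactors = FALSE)[changed, , drop = FALSE]
}

# Freeze the library (illustrative location), then check it later.
manifest <- snapshot_library(.Library)
drift    <- library_drift(manifest, .Library)  # zero rows while nothing has changed
```

In practice the manifest would be saved next to the validation records (for example with `saveRDS()`), so that any later drift is visible before an analysis starts.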
Introduction ► Costs
"Oh, c'mon. You have all the code! It's open source! Why don't you just fix the
problems and go back to work? What's the problem? I think you exaggerate!"
Resources (staff + time + money) allocated to "employ the Open Source":
Big company: 15 specialists, X$
Small company: 2 specialists, Y$, Y << X
Introduction ► Step 1: organization of work ; technical infrastructure + SOPs
[Diagram, labels only: projects; R 3.x; VHDX container; VPN; "wild" and "validated" libraries; SOPs]
Introduction ► Step 1: organization of work ; portable R
https://sourceforge.net/projects/rportable/
• Allows one to test new stuff and mix different versions of R core in a single analysis.
• Easy: no installation (VMs, containers), no extra packages / dependencies / setups
• No elevated rights needed
• Regular directories, easy management!
• When "matured", can be packed into a VHDX container
• Easily selectable as the current engine in RStudio
Introduction ► Step 1: organization of work ; portable R
• This allows us to mix not only packages in different versions (with all necessary dependencies) in a
single analysis, but also versions of the R core itself, when a certain package needs a higher/lower
version of the R core.
• Combines RPortable + rscript.exe + a naming convention for [input data]-[output results] files.
• Each version-dependent piece of code knows where to read the data from and where to store the results.
• Fully isolated code. Data exchanged via regular R objects (RDS or feather)
Warning in install.packages :
package ‘emmeans’ is not available (for R version 3.6.3)
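The isolation idea above can be sketched roughly as follows. This is an assumption-laden illustration, not the actual 2KMM setup: the helper `run_isolated()` and the file names `step.R`, `step_in.rds` and `step_out.rds` are hypothetical, and the currently running R stands in for a portable copy of another R version.

```r
# Run one version-dependent step in a separate R process.
# `rscript` would normally point at the Rscript executable of the R version needed.
run_isolated <- function(rscript, script, input_rds, output_rds) {
  status <- system2(rscript, args = shQuote(c(script, input_rds, output_rds)))
  if (status != 0) stop("Isolated step failed with status ", status)
  readRDS(output_rds)
}

work_dir <- tempfile("isolated_")
dir.create(work_dir)
script     <- file.path(work_dir, "step.R")
input_rds  <- file.path(work_dir, "step_in.rds")
output_rds <- file.path(work_dir, "step_out.rds")

# The step knows, by convention, where to read its input and where to store its result.
writeLines(c(
  "args <- commandArgs(trailingOnly = TRUE)",
  "dat  <- readRDS(args[1])",
  "res  <- list(r_version = R.version.string, mean_x = mean(dat$x))",
  "saveRDS(res, args[2])"
), script)

saveRDS(data.frame(x = c(1, 2, 3, 4)), input_rds)

# Stand-in: the Rscript that belongs to the running R session.
rscript <- file.path(R.home("bin"), "Rscript")
result  <- run_isolated(rscript, script, input_rds, output_rds)
```

Because only RDS files cross the process boundary, the calling session never has to load the packages (or the R version) that the isolated step depends on.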
Introduction ► Step 2: Simple, automated template-based workflow
DOCx template
- Headers, footers
- Styles
- Content placeholders (definitions)
Rmarkdown "manager"
- Reads the DOCx template for TODOs
- Does the "TODOs"
- Replaces "TODOs" with TFLs
- Becomes the HTML LOG
DOCx report
- Headers, footers preserved
- Styles utilized
- Placeholders hold actual T/F/Ls
HTML log
- All R commands
- All messages
- All (simplified) results
Introduction ► Step 2: Simple, automated template-based workflow
RMarkdown file
- Creates the environment
- Reads the DOCx template
- Loads the Word parsing "engine" (DOCx reading engine)
- The engine:
  - iterates through definitions of placeholders
  - parses the fields
  - loads the R files per convention
  - executes the code
  - replaces placeholders with actual TFLs
- Auto-updates (appends) the HTML to LOG
## Preparing the objects storing the content of the report in both MS Word and MS Excel formats
```{r}
word_report_document_name <- paste0(target_report_document_name, ".docx")
excel_report_document_name <- paste0(target_report_document_name, ".xlsx")
word_report_template_name <- paste0(target_report_document_name, "_template.docx")
doc_report <- read_docx(word_report_document_name)
doc_content <- docx_summary(doc_report)
xls_report <- createWorkbook()
```
# Data analysis
```{r child="rendering_engine.rmd", echo=TRUE, results='asis'}
```
```{r}
print(doc_report, target = word_report_document_name)
saveWorkbook(wb = xls_report, file = excel_report_document_name, overwrite = TRUE)
```
```{r}
# Placeholder definitions are the template paragraphs that start with "[Table]"
table_defs <- subset(doc_content, grepl("^\\[Table\\]", doc_content$text), text)
table_defs <- gsub("\\[Table\\] ", "", table_defs$text)
for (def in table_defs) {
  # Fields are separated by "@"; the text before the first "@" is dropped
  split_defs <- strsplit(def, "@")[[1]][-1]
  table_title <- trimws(gsub("title:(.*)", "\\1", split_defs[grep("^title", split_defs)]))
  table_number <- trimws(gsub("table_num:(.*)", "\\1", split_defs[grep("^table_num", split_defs)]))
  force_table_num <- trimws(gsub("force_table_num:(.*)", "\\1", split_defs[grep("^force_table_num", split_defs)]))
  table_sufix <- trimws(gsub("table_sufix:(.*)", "\\1", split_defs[grep("^table_sufix", split_defs)]))
  r_file <- trimws(gsub("r_code:(.*)", "\\1", split_defs[grep("^r_code", split_defs)]))
  r_prn_file <- trimws(gsub("r_printer_code:(.*)", "\\1", split_defs[grep("^r_printer_code", split_defs)]))
  exclude <- trimws(gsub("exclude:(.*)", "\\1", split_defs[grep("^exclude", split_defs)]))
  table_title <- iconv(table_title, from = "UTF-8", to = "UTF-8")
  exclude <- ifelse(identical(exclude, character(0)), FALSE, as.logical(exclude))
  # …………………………………………… (elided on the slide)
  # Fall back to the naming convention when no R file is given explicitly
  if (identical(r_file, character(0)) || r_file == "") {
    r_file <- paste0("Table", table_number, table_sufix, ".r")
  }
  # …………………………………………… (elided on the slide)
  r_file <- file.path(r_code_location, r_file)
  # Build a child chunk from the table's R file and knit it into the log
  chunk <- c(paste("#### Table ", paste0(table_number, table_sufix), "-", table_title, "\n"),
             paste("```{r ", r_file, "}\n"),
             readLines(r_file),
             "```\n")
  cat(knit_child(text = chunk, quiet = TRUE), sep = '\n')
  # …………………………………………… (elided on the slide)
}
```
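To make the field parsing easier to follow, here is a small self-contained sketch. The placeholder text is an assumption inferred from the regular expressions in the loop (fields introduced by "@", written as key: value); it is not copied from a real template, and `parse_definition()` is a hypothetical helper rather than part of the engine.

```r
# A hypothetical placeholder paragraph as it might appear in the DOCx template.
def_line <- "[Table] @title: Demographics by arm @table_num: 1 @table_sufix: a @exclude: FALSE"

parse_definition <- function(line) {
  body   <- sub("^\\[Table\\]\\s*", "", line)            # drop the "[Table]" tag
  fields <- strsplit(body, "@", fixed = TRUE)[[1]][-1]   # text before the first "@" is dropped
  keys   <- trimws(sub(":.*$", "", fields))              # part before the first colon
  values <- trimws(sub("^[^:]*:", "", fields))           # part after the first colon
  as.list(stats::setNames(values, keys))
}

parsed <- parse_definition(def_line)

# Same fallback as in the loop: derive the R file name from number and suffix.
r_file <- paste0("Table", parsed$table_num, parsed$table_sufix, ".r")
```

Returning a named list keeps every field available by name, which avoids one `gsub()`/`grep()` pair per field.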
Introduction ► Step 2: Simple, automated template-based workflow
Regular R files. Naming convention. Triplet per table.
Suffix:
_data: reads data from RDATA / DBI / XML / CSV / XLSX
_an: performs the analyses; stores results in RDATA
_print: reads the RDATA, generates DOCx tables, XLSx files, EMF graphs and HTML output for the LOG
Example: Table_01_data.r, Table_01_an.r, Table_01_print.r
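The triplet convention could be wrapped in a couple of small helpers. This is a sketch only: `table_triplet()` and `run_table()` are hypothetical names, the directory `r_code` is illustrative, and sharing one environment between the three stages is a simplification of the RDATA hand-over described above.

```r
# Build the three file names that belong to one table.
table_triplet <- function(table_id, code_dir = ".") {
  stages <- c("data", "an", "print")
  files  <- file.path(code_dir, paste0(table_id, "_", stages, ".r"))
  stats::setNames(files, stages)
}

# Source the stages in order: read data, analyse, print.
run_table <- function(table_id, code_dir = ".", envir = new.env()) {
  for (f in table_triplet(table_id, code_dir)) {
    if (!file.exists(f)) stop("Missing file: ", f)
    sys.source(f, envir = envir)
  }
  invisible(envir)
}

triplet <- table_triplet("Table_01", code_dir = "r_code")
```

Failing early on a missing file makes an incomplete triplet visible before any output is written to the report.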
𝑦 = 𝛽0 + 𝛽1X
Introduction ► Step 3: defining tasks → finding tools → making a library
• Modelling, longitudinal analysis
• Inference (testing, CIs, MCP)
• Summaries
• Effect size
• Advanced survival
• Making complex tables
• Dose – Response
• PK, PD, DF
• Questionnaires
• Generating documents (DOCx, RTF, PDF)
• Documenting (log) the analysis
• Data I/O
• Technical / Programming
• Trial design & simulation
• Plotting
• Randomization
• Data manipulation
• Meta-analysis
• CDISC-related
• Missing data – patterns and imputation
• Model diagnostics
Introduction ► Step 3: defining tasks → finding tools → making a library
Trial design & simulation:
CRTSize, faux, gsDesign, ldbounds, MAMS, Mediana, PowerTOST, pwr, RCTdesign, RPACT,
samplesizeCMH, simstudy, SSRMST, ThreeArmedTrials, TrialSize
Modelling, longitudinal analysis:
CRTgeeDR, drgee, gee, geeasy, geepack, geesmv, GLMMadaptive, glmmTMB, glmtoolbox, ipw,
lavaSearch2, lme4, lmerTest, lqmm, MASS, mmmgee, multgee, MuMIn, nlme, ordinal,
QRLMM, repolr, rms, robustlmm, sasLM, simplexreg, wgeesel, lqr, gam, quantreg…
Advanced survival:
bshazard, cmprsk, ComparisonSurv, controlTest, coxphw, CoxR2, FHtest,
frailtypack, frailtysurv, landest, maxcombo, mstate, muhaz, nph,
npsurvSS, pammtools, reda, reReg, RMST, surv2sampleComp, survival,
Survmisc, survRM2
Introduction ► Step 3: defining tasks → finding tools → making a library
ARTool, Asbio, BaylorEdPsych,
bear, betareg, bindrcpp,
binom, biostatUZH, blockrand,
boot, broom, bshazard, car,
clinPK, clubSandwich, cmprsk,
coin, ComparisonSurv,
compute.es, confintr,
conflicted, contrast,
controlTest, correlation,
coxphw, CoxR2, CRTgeeDR,
CRTSize, cvcqv, dabestr,
DataEditR, DBI, DescTools,
devEMF, diffdf, DoseFinding,
dplyr, drgee, dunn.test,
e1071, effectsize, effsize,
emmeans, epiR, equatiomatic,
faux, FHtest, fitdistrplus,
flextable, forcats, foreign,
forestmodel, frailtypack,
frailtysurv, gam, gee, geeasy,
geepack, geesmv, GFD,
ggalluvial, GGally, ggeffects,
gghalves, ggmosaic, ggplot2,
ggpol, ggrepel, ggridges,
ggside, ggsignif, ggstance,
ggtext, GLMMadaptive, glmmTMB,
glmnet, glmtoolbox, glue,
gmodels, gridExtra, gsDesign,
haven, Hmisc,
InformativeCensoring,
interactions, ipw, irr,
Kendall, knitr,
kSamples, landest, latex2exp,
lavaSearch2, lawstat,
ldbounds, likert, lme4,
lmerTest, lmPerm, logR,
logspline, lqmm, lqr,
lubridate, magrittr, MAMS,
MarginalEffects, margins,
MASS, maxcombo, MCPMod, mcr,
Mediana, meta, metafor,
metaviz, mice, Minirand,
MissMech, misty, Mkinfer,
mmmgee, modelbased, mstate,
muhaz, multcomp, multgee,
multxpert, MuMIn, MVN,
mvnormalTest, mvtnorm, naniar,
nlme, nortest, nparcomp,
nparLD, nph, npsurvSS,
officer, onlineFDR, openxlsx,
ordinal, PairedData,
pammtools, patchwork,
pbkrtest, PearsonDS, permuco,
PK, PKconverter, PKfit,
PKPDmodels, pkr, PMCMRplus,
polycor, PowerTOST, PropCIs,
Publish, purrr, pwr, qqplotr,
QRLMM, Qtools, quantreg,
r2rtf, randomizeR, rankFD,
ratesci, Rcmdr, rcompanion,
RCTdesign, readr, reda,
repolr, reReg, rlang,
Rmarkdown, rms, RMST,
robustbase, robustlmm, RODBC,
RPACT, rstatix, RVAideMemoire,
rvg, samplesizeCMH, sandwich,
sasLM, SASxport, scales,
simplexreg, simstudy, sqldf,
SSRMST, statsExpressions,
stringr, summarytools,
SuppDist, surv2sampleComp,
survival, Survmisc, survRM2,
testthat, ThreeArmedTrials,
tidyquery, tidyr, tidytext,
TOSTER, trend, TrialSize,
UpSetR, VGAM, VIM, wgeesel,
WRS2, xml2, mitml, jomo,
psych, irr, SimplyAgree,
mmrm, ggh4x, ggformula,
DescrTab2
230 so far, 200 in
daily use.
Introduction ► Step 3: defining tasks → finding tools → Hall of fame! (incomplete!)
• nlme::gls() (MMRM)
• emmeans
• geepack, geeM, geesmv, wgeesel, CRTgeeDR, multgee, repolr, ipw
• Officeverse: officer, flextable, rvg
• SASxport
• r2rtf
• glmmTMB
• Tidyverse
• nparLD, GFD, rankFD, ARTool, nparcomp, WRS2
• Mediana, gsDesign, RCTdesign, rpact, MAMS
• survival, cmprsk, nph, reda, reReg, frailtypack, survRM2
• RMarkdown
• broom
• boot
• sandwich, clubSandwich
• DBI, RODBC
• effectsize
• margins, MarginalEffects
• sasLM
Introduction ► So many sources of packages!
Packages come from: GitHub, CRAN, CRAN archive, RForge, External (PKfit), Bioconductor
• Versions may differ
• Different ways of reporting issues
Challenges ► Numerical validation
https://www.researchgate.net/publication/345778861_Numerical_validation_as_a_critical_aspect_in_bringing_R_to_the_Clinical_Research
Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R
As long as one uses only the basic tools, problems may never occur.
And this scope may be sufficient for quite a lot of scenarios!
• “group by” summaries with N, %, mean, median, SD, Q1, Q3, min, max…
• aov()
• kruskal.test(), wilcox.test(), t.test()…
• lmer(post_value ~ treatment * time + baseline + baseline:time + (1|PatID))
• plot(survfit(Surv(time, status) ~ treatment))
BTW, did you see median()?
Is it equal to quantile()["50%"]? Always?
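A quick way to probe that question is to compare `median()` with `quantile()` under different `type` settings. The four-value sample below is made up purely for illustration.

```r
x <- c(1, 2, 3, 4)

m         <- median(x)                           # average of the two middle values
q_default <- quantile(x, probs = 0.5)            # type = 7, R's default
q_type1   <- quantile(x, probs = 0.5, type = 1)  # inverse of the empirical CDF

# All nine definitions implemented in quantile(), side by side
all_types <- sapply(1:9, function(t) unname(quantile(x, probs = 0.5, type = t)))
```

For this sample the default type agrees with `median()`, while `type = 1` returns the lower middle value. So the answer depends on which quantile definition is in use, and other software may default to a different one.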
https://stats.stackexchange.com/questions/578387/serious-coding-error-in-qic-function-in-geepack
Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R
Like it or not – the fact is that SAS® IS the industry standard in clinical trials and
people will use it to re-create your analyses – and NATURALLY ask if the
numbers don’t agree.
[Diagram: SAS® in the centre, surrounded by: Regulatory agency, Journal, Sponsor-side biostat team, Your colleague, Validator]
- it’s not about a “crusade”:
“R is better! No! SAS® is better!
No! Excel is better!”
- it’s not about favoring anyone
(“you think it’s better because
expensive!?”)
- It’s about the reality.
Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R
If they ask you about the discrepancies, you can:
1. ignore it (can you?),
2. say „I don’t know, it just happened, but R is right!”
3. investigate it and respond:
1. both are right, just different approach 🤷
2. well, R is wrong, I’m gonna fix it or message the authors
But to respond – you need to know what happened.
A much worse situation: NOBODY found a difference, and you just
published the results with errors.
Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R
We do not care whether a package has "good marketing". It must work well.
Has vignettes!
Has active community!
5 stars in rankings. YouTube tutorials.
Top popular download on GitHub
Has unfixed errors that nobody cares about…
Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R
• nlme: Priority: recommended; linear mixed models with almost all the stuff SAS has
• MCPMod: design and analysis of dose-finding studies
• PMCMRplus: lots of popular non-parametric stuff, dose-response findings (Williams)
• MASS: Priority: recommended; lots of stuff, including glmmPQL!
• boot: Priority: recommended
• nparcomp: lots of non-parametric methods
• frailtySurv: shared frailty models
• rms: regression modeling strategies by Prof. Harrell
• geesmv: small-sample Morel's correction for the GEE sandwich SEs
• ipw: inverse-probability weighting, for GEE under MAR
• multxpert: common multiple testing procedures and gatekeeping procedures by Prof. Dmitrienko
• PropCIs: a must-have; CIs for proportions
• PKfit: one of the most important tools for PK; not even on CRAN
• bear: one of the best available tools for PK; not even on CRAN
• cmprsk: survival with competing risks
These packages have no marketing. Would you exclude them from your toolkit?
Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7327187/
Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R
And the real problem is that R is discrepant not only against SAS, but even… itself
Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R
After failing several times, we finally decided to validate as much as possible.
This consumes a lot of time and effort, but it lets us sleep better.
Package | Function | Version | Dataset | Test | Completed | Soft1 | Soft2 | Soft3 | Discrep. | Decision | Justif.
pkg1 | fn1() | 0.6.2 | Trial 1 | #23 | OK | OK | OK | FAILED | …………. | OK | ………
pkg1 | fn1() | 0.6.2 | Journal 2 | #23 | OK | OK | FAILED | OK | ………… | FAILED | ………
pkg1 | fn1() | 0.6.2 | Journal 2 | #24 | OK | OK | OK | OK | ………… | OK | ………
pkg1 | fn2() | 0.6.2 | Journal 2 | #25 | OK | FAILED | FAILED | FAILED | ………… | FAILED | ………
Validation against:
• Reference software
• Textbook formulas, by hand
• Other trusted package
• Published results: journals/books
• Published results: manuals
• Code inspection
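One row of such a validation log could be produced along these lines. This is a sketch under assumptions: the helper `check_against_reference()`, the tolerance, the test ID and the data are all made up for illustration, and the "reference" is a textbook formula computed by hand rather than output from other software.

```r
# Compare an R result with a reference value within a stated tolerance.
check_against_reference <- function(test_id, r_value, ref_value, tolerance = 1e-6) {
  abs_diff <- abs(r_value - ref_value)
  data.frame(Test    = test_id,
             R       = r_value,
             Ref     = ref_value,
             Discrep = abs_diff,
             Result  = ifelse(abs_diff <= tolerance, "OK", "FAILED"),
             stringsAsFactors = FALSE)
}

# Textbook formula "by hand": pooled-variance two-sample t statistic.
a <- c(5.1, 4.9, 6.2, 5.8, 6.0)
b <- c(4.2, 4.8, 5.0, 4.4, 4.6)
sp2 <- ((length(a) - 1) * var(a) + (length(b) - 1) * var(b)) /
  (length(a) + length(b) - 2)
t_by_hand <- (mean(a) - mean(b)) / sqrt(sp2 * (1 / length(a) + 1 / length(b)))

# The function under validation.
t_from_r <- unname(t.test(a, b, var.equal = TRUE)$statistic)

validation_row <- check_against_reference("#1", t_from_r, t_by_hand)
```

Stating the tolerance explicitly matters: results from different software rarely agree to the last digit, so "OK" has to mean "within an agreed numerical margin".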
Challenges ► Ocean of possibilities. But be careful! It’s deep!
Open Source gives you an ocean of possibilities (for doing THE SAME)!
OK! Diversity is good overall, but it can be overdone! Let's imagine I want
procedure ABC. R has 10 functions in 5 packages to do ABC in 8 ways. My
day has only 24 hours and I have my work, and a life after hours.
Challenges ► Documentation
Documentation quality varies a lot: from dedicated web-books with numerous
examples ( https://ardata-fr.github.io/flextable-book/ ) to just a raw manual with
no formulas and with references to a paid article or a rare book.
SAS, NCSS, SPSS, Stata have awesome tutorials and manuals, almost
courses in statistics. NCSS even gives the input data and results!
[Scale: "Just a basic manual" … "You can do a PhD with it!"]
Challenges ► Why cannot things be simple?
SAS ®: PROC MIXED EMPIRICAL… REPEATED … CS … KR … LSMEANS
R: Kenward-Roger? … a-ha! Use lme4! But wait, I want a marginal model with CS.
Random-intercept ≠ CS for negative within-subject correlations! I could use glmmTMB
for this, but pbkrtest doesn’t support it.
But there’s nlme! Take nlme::gls(). But pbkrtest doesn’t support nlme::gls().
OK then, let’s use Satterthwaite!
OK. nlme::gls() + emmeans (for LS-means + Satterthwaite). Now I want the robust HC0
(„sandwich”) estimator. Get clubSandwich and use the emmeans to provide the
adjusted Var-Cov. Follow it by emmeans::joint_tests(). Double check the DF, as
car::Anova() may have a problem here.
Done! Sigh! … Did you check GitHub for open issues?
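The remark that a random intercept is not the same as CS for negative within-subject correlations can be illustrated with a small simulation. This is only a sketch on made-up data (two visits per subject, true correlation -0.5) and it fits only the marginal model with `nlme::gls()`; the robust-SE and `emmeans` steps from the walkthrough are not shown.

```r
library(nlme)

set.seed(2022)
n_subj <- 200
rho    <- -0.5

# Two observations per subject with negative within-subject correlation
e1 <- rnorm(n_subj)
e2 <- rnorm(n_subj)
dat <- data.frame(
  id    = factor(rep(seq_len(n_subj), each = 2)),
  visit = rep(1:2, times = n_subj),
  y     = as.vector(rbind(e1, rho * e1 + sqrt(1 - rho^2) * e2))
)

# Marginal model with compound-symmetry residual correlation
fit_cs <- gls(y ~ 1, data = dat,
              correlation = corCompSymm(form = ~ 1 | id))

# Estimated within-subject correlation on its natural scale
rho_hat <- unname(coef(fit_cs$modelStruct$corStruct, unconstrained = FALSE))

# A random-intercept model implies a within-subject correlation of
# var_b / (var_b + var_e), which cannot be negative, so it cannot
# reproduce a negative rho_hat.
```

With the marginal CS model the estimate is free to be negative; a random-intercept model fitted to the same data would push the intercept variance towards zero instead.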
Statistics UX
Challenges ► Packages removed from CRAN
BaylorEdPsych , cvcqv, MissMech, PKPDmodels, SASxport, mixor, coxinterval,
quantreg (restored), brunnermunzel (restored), MomTrunc (restored), tlrmvnmvt
(restored), normtest, flow, nlmixr
Challenges ► Let’s combine it together!
[Flowchart, nodes as extracted: START!; Package removed from CRAN; Search for a replacement; Learn the new package; "What does this thing do?!"; "Something is wrong!"; Email the author…; Create new issue; FIXED?; "It works!"; "Sorry, I'm busy."; "No, it doesn't"; "Partially managed…"; "What now!?"; Another package is needed]
Future plans
We plan:
• To research a couple of new tools:
• For work: MMRM (!)
• For CDISC: admiral, sassy, definer, metacore
• For RTF: rtftables, gt
• Out of curiosity: tplyr
• For technical work: box
• To focus on CDISC and preparation for the first big submission.
• To extend the numerical validation of packages
Overall impression
Employing Open Source means accepting the consequences.
The effort, costs and extra work cannot be taken lightly in a small CRO.
But it is definitely worth the effort.
In moments of doubt, it's good to remember that no big deal comes easy.
R is and will be our friend. Even if a demanding one.
THANK YOU
This is just the beginning…

 
Flextable and Officer
Flextable and OfficerFlextable and Officer
Flextable and Officer
 
Why are data transformations a bad choice in statistics
Why are data transformations a bad choice in statisticsWhy are data transformations a bad choice in statistics
Why are data transformations a bad choice in statistics
 
Modern statistical techniques
Modern statistical techniquesModern statistical techniques
Modern statistical techniques
 
Dealing with outliers in Clinical Research
Dealing with outliers in Clinical ResearchDealing with outliers in Clinical Research
Dealing with outliers in Clinical Research
 
The use of R statistical package in controlled infrastructure. The case of Cl...
The use of R statistical package in controlled infrastructure. The case of Cl...The use of R statistical package in controlled infrastructure. The case of Cl...
The use of R statistical package in controlled infrastructure. The case of Cl...
 
Rcommander - a menu-driven GUI for R
Rcommander - a menu-driven GUI for RRcommander - a menu-driven GUI for R
Rcommander - a menu-driven GUI for R
 
GNU R in Clinical Research and Evidence-Based Medicine
GNU R in Clinical Research and Evidence-Based MedicineGNU R in Clinical Research and Evidence-Based Medicine
GNU R in Clinical Research and Evidence-Based Medicine
 

Recently uploaded

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 

Recently uploaded (20)

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 

Meet a 100% R-based CRO's 5-year journey

  • 1. Meet a 100% R-based CRO The summary of a 5-year journey Adrian Olszewski Principal Biostatistician at 2KMM CRO The R/Pharma 2022 Conference, Nov 10th 2022 10min www.2kmm.eu aolszewski@2kmm.pl
  • 2. Disclaimer This presentation shows the biostatistician’s perspective first. Lots of exploratory research, involving tens of statistical tests, complex survival models and non-parametric methods. Here producing TFLs is important but secondary. I need the NUMBERS first to populate them. It’s not to „blame” or „unfairly criticize”. My job is to analyze the trial’s data on time and within the budget. If something does not work so that I cannot meet the deadline, if fixing things exceeds the budget, if the situation seems hopeless – there’s no time for sentiments, ideology, and hiding problems. I’m going to be held accountable for the effects of my work, not for my „love for the tool”. „You get what you pay for. It’s free software. Stop complaining.” The fact that something is „for free” does not mean it cannot be improved. The first step is to admit problems exist, diagnose them and be honest. It takes a sober assessment of the situation to counteract effectively. „So why do you „waste your time”? Buy XYZ® and be happy.” I really want to make things better. I ❤️ R. If I did not, I’d have abandoned it in 2000. Things get better but won’t „fix themselves” magically. Relationships can be tough, but that's no reason to give up! Besides, it’s fun!
  • 3. Introduction ► Who we are ⦿ The 2KMM - a small Polish CRO with a global reach. ⦿ 100% R-based: trial design • DM • datasets • analysis & research • TFL • documents • consulting • tutoring • making tools ⦿ 28 projects: RCTs + observational studies (in several therapeutic areas) lots of ad hoc research
  • 4. Introduction ► Who we are Our specifics: • No CDISC yet; data sources based on SQL views • Lots of planned exploratory analyses with complex scenarios • Sometimes asked to use dinosaur tools vs. the freshest method widely widespread • Being a CRO, we have less decision power than a big pharmaceutical company: o a Sponsor may have its own vision and demand that we follow it o our proposals may be questioned (sometimes without a discussion) • Very differentiated requests from different sponsors: o make tables like X, make tables like Y o use this format, use that format o we prefer X, we HATE X. ABC is important vs. ABC is negligible vs. please decide → It’s difficult to work out a common approach, workflow, template.
  • 5. Introduction ► History ⦿ When we started, a few questions had to be answered: • Can we rely on R entirely? Will it suffice? Everyone around uses SAS • What are the hidden costs of using open source (no free lunch)? • Can we trust R? How to validate it? • What packages do we need to start? Collection of requirements • How to organize the working environment (SOPs, technical aspects)? In general – we were rather optimistic in 2018
  • 6. Introduction ► Opinions ⦿ After 5 years we have some opinions: • Did R suffice to complete our work? Mostly… • Could we just „launch R and focus on the work”? Partially… • Could we trust R on faith? Did we fail? No. / Painfully. • What are the hidden costs of using open source? Non-negligible • How many packages did we end up with? 230+ • Describe the experience briefly? Annoyance, determination, fixing stuff, reporting issues, researching, satisfaction • Are we happy with R? Will we stay with it? It’s a tough ♥ / Yes • Why? It’s flexible. It’s getting better. It’s worth it. We learned „HOW”
  • 7. Introduction ► Costs It is not true that using free software does not cost a penny. It costs time that could have been spent on the analysis but goes instead to: • Collecting the library of necessary tools. That’s not easy; I will show why. • Validating the selected tools (making sure 2+2=4) • Realizing that an important package fails or is gone (hello, CRAN!) • Contacting authors or the entire group, reporting issues on GitHub • Searching for a replacement (+ validation), which may lack features • If there is no response, researching the problem on your own • Paying for external consultancy, books, pay-walled articles to move on
  • 8. Introduction ► Costs How much did it cost? An equivalent of a few 1-year licenses of a „good commercial software”. Wait, what!? So where are the savings then?! 1. The cost is distributed over time (a year, say) 2. Such a big cost is rather one-off, at the beginning of the process. Occasional costs will still occur, though (new versions, „retired packages”) 3. You get what you need (mostly), not what others decide you need 4. Once done, it can be reused any number of times (no per-user licenses) 5. You control what you have better, because you are the one who made it. 6. You get the code, so there is at least a small chance to fix things with your own hands 7. As long as your repository (library) is validated and frozen, you sleep well.
  • 9. Introduction ► Costs “Oh, c’mon. You have all the code! It’s open source! Why don’t you just fix the problems and go back to work? What’s the problem? I think you exaggerate!” Resources (staff + time + money) allocated to “employ the Open Source”: • Big company: 15 specialists, X$ • Small company: 2 specialists, Y$, Y << X
  • 10. Introduction ► Step 1: organization of work ; technical infrastructure + SOPs [Diagram; labels: projects, R 3.x, VHDX container, VPN, wild, validated, SOP]
  • 11. Introduction ► Step 1: organization of work ; portable R https://sourceforge.net/projects/rportable/ • Allows one to test new stuff and mix different versions of R core in a single analysis. • Easy – no installation (VMs, containers), no extra packages / dependencies / setups • No elevated rights needed • Regular directories – easy management! • When „matured”, can be packed into a VHDX container. • Easily selectable as the current engine in RStudio:
  • 12. Introduction ► Step 1: organization of work ; portable R
  • 13. Introduction ► Step 1: organization of work ; portable R • This allows us to mix not only packages in different versions (with all necessary dependencies) in a single analysis, but also versions of the R core itself, when a certain package needs a higher/lower version of the R core. • Combines RPortable + rscript.exe + a naming convention for [input data]-[output results] files. • Each version-dependent piece of code knows where to read the data from and where to store the results. • Fully isolated code. Data exchanged via regular R objects (RDS or feather). Example of the problem: Warning in install.packages : package ‘emmeans’ is not available (for R version 3.6.3)
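The mechanism on this slide (one Rscript process per R version, with data handed over as RDS files) can be sketched roughly as below. This is only an illustration under assumptions, not the setup used by 2KMM: `run_step()` and the file names are hypothetical, and the demo calls the currently running R's own `Rscript` as a stand-in for a second portable R installation.

```r
# Hypothetical helper: run one version-specific step in a separate R process.
# `rscript` is the path to the Rscript binary of the R version that step needs.
run_step <- function(rscript, script, input_rds, output_rds) {
  status <- system2(rscript, args = shQuote(c(script, input_rds, output_rds)))
  if (status != 0) stop("Step failed: ", script)
  readRDS(output_rds)
}

# A worker script that follows the [input data]-[output results] convention:
# it reads its input path and output path from the command line.
worker <- tempfile(fileext = ".R")
writeLines(c(
  "args <- commandArgs(trailingOnly = TRUE)",
  "dat <- readRDS(args[1])",
  "res <- list(r_version = as.character(getRversion()), mean_y = mean(dat$y))",
  "saveRDS(res, args[2])"
), worker)

input_rds  <- tempfile(fileext = ".rds")
output_rds <- tempfile(fileext = ".rds")

# Note: saveRDS() writes serialization format 3 by default since R 3.6.0;
# pass version = 2 if the receiving R is older than 3.5.0.
saveRDS(data.frame(y = c(1, 2, 3, 6)), input_rds)

# Stand-in for the other R installation (assumption: same R as the caller).
rscript <- file.path(R.home("bin"), "Rscript")
step_result <- run_step(rscript, worker, input_rds, output_rds)
print(step_result$r_version)
```

With a real second installation, only the `rscript` path would change; the calling code never loads the packages of the other version, which is what keeps the steps isolated.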
  • 14. Introduction ► Step 2: Simple, automated workflow
  • 15. Introduction ► Step 2: Simple, automated workflow
  • 16. Introduction ► Step 2: Simple, automated template-based workflow
    DOCx template: - Headers, footers - Styles - Content placeholders
    Rmarkdown „manager”: - Reads the DOCx template for TODOs - Does the „TODOs” - Replaces „TODOs” with TFLs - Becomes the HTML LOG
    DOCx report: - Headers, footers preserved - Styles utilized - Placeholders hold actual T/F/Ls
    HTML log: - All R commands - All messages - All (simplified) results
  • 17. Introduction ► Step 2: Simple, automated template-based workflow RMarkdown file: - Creates the environment - Reads the DOCx template - Loads the Word parsing „engine” - The engine: - iterates through definitions of placeholders - parses the fields - loads the R files per convention - executes the code - replaces placeholders with actual TFLs - Auto-updates (appends) the HTML to LOG [Diagram: DOCx template with placeholder definitions, DOCx reading engine]
  • 18. Introduction ► Step 2: Simple, automated template-based workflow RMarkdown file: - Creates the environment - Reads the DOCx template - Loads the Word parsing „engine” - The engine: - iterates through definitions of placeholders - parses the fields - loads the R files per convention - executes the code - replaces placeholders with actual TFLs - Auto-updates (appends) the HTML to LOG

## Preparing the objects storing the content of the report in both MS Word and MS Excel formats
```{r}
word_report_document_name  <- paste0(target_report_document_name, ".docx")
excel_report_document_name <- paste0(target_report_document_name, ".xlsx")
word_report_template_name  <- paste0(target_report_document_name, "_template.docx")
doc_report  <- read_docx(word_report_document_name)
doc_content <- docx_summary(doc_report)
xls_report  <- createWorkbook()
```
# Data analysis
```{r child="rendering_engine.rmd", echo=TRUE, results='asis'}
```
```{r}
print(doc_report, target = word_report_document_name)
saveWorkbook(wb = xls_report, file = excel_report_document_name, overwrite = TRUE)
```
  • 19. Introduction ► Step 2: Simple, automated template-based workflow DOCx reading engine:

```{r}
# Pick the paragraphs of the template that start with "[Table]" and strip that tag
table_defs <- subset(doc_content, grepl("^\\[Table\\]", doc_content$text), text)
table_defs <- gsub("\\[Table\\] ", "", table_defs$text)

for (def in table_defs) {
  # Fields are separated by "@"; the piece before the first "@" is dropped
  split_defs <- strsplit(def, "@")[[1]][-1]

  table_title     <- trimws(gsub("title:(.*)", "\\1", split_defs[grep("^title", split_defs)]))
  table_number    <- trimws(gsub("table_num:(.*)", "\\1", split_defs[grep("^table_num", split_defs)]))
  force_table_num <- trimws(gsub("force_table_num:(.*)", "\\1", split_defs[grep("^force_table_num", split_defs)]))
  table_sufix     <- trimws(gsub("table_sufix:(.*)", "\\1", split_defs[grep("^table_sufix", split_defs)]))
  r_file          <- trimws(gsub("r_code:(.*)", "\\1", split_defs[grep("^r_code", split_defs)]))
  r_prn_file      <- trimws(gsub("r_printer_code:(.*)", "\\1", split_defs[grep("^r_printer_code", split_defs)]))
  exclude         <- trimws(gsub("exclude:(.*)", "\\1", split_defs[grep("^exclude", split_defs)]))

  table_title <- iconv(table_title, from = "UTF-8", to = "UTF-8")
  exclude <- ifelse(identical(exclude, character(0)), FALSE, as.logical(exclude))
  # ……………………………………………
  # No explicit file given: fall back to the naming convention
  if (identical(r_file, character(0)) || r_file == "") {
    r_file <- paste0("Table", table_number, table_sufix, ".r")
  }
  # ……………………………………………
  r_file <- file.path(r_code_location, r_file)

  # Wrap the R file in a chunk and knit it as a child document
  chunk <- c(paste("#### Table ", paste0(table_number, table_sufix), "-", table_title, "\n"),
             paste("```{r ", r_file, "}\n"),
             readLines(r_file),
             "```\n")
  cat(knit_child(text = chunk, quiet = TRUE), sep = '\n')
  # ……………………………………………
}
```
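The field parsing done by the engine can be tried in isolation with base R only. The sketch below is a simplified stand-in, not the engine's code: `parse_placeholder()` and the example definition text are hypothetical, and only the `[Table]` tag, the `@key: value` layout and the fallback file name follow what the slide shows.

```r
# Hypothetical helper: turn one placeholder paragraph into a named list of fields.
parse_placeholder <- function(text) {
  body   <- sub("^\\[Table\\]\\s*", "", text)           # drop the "[Table]" tag
  fields <- strsplit(body, "@", fixed = TRUE)[[1]][-1]  # pieces after each "@"
  keys   <- trimws(sub(":.*$", "", fields))             # text before the first ":"
  values <- trimws(sub("^[^:]*:", "", fields))          # text after the first ":"
  as.list(setNames(values, keys))
}

# Made-up definition, written the way the engine seems to expect it.
def <- parse_placeholder("[Table] @title: Demographics @table_num: 1 @table_sufix: a")

# Same fallback as on the slide: no r_code field means a conventional file name.
r_file <- if (is.null(def$r_code) || def$r_code == "") {
  paste0("Table", def$table_num, def$table_sufix, ".r")
} else {
  def$r_code
}

print(def$title)
print(r_file)
```

Splitting on the first colon only means that a title may itself contain a colon; a title containing `@` would still break the parsing, in this sketch as well as in the original loop.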
  • 20. Introduction ► Step 2: Simple, automated template-based workflow DOCx template: - Headers, footers - Styles - Content placeholders. Regular R files. Naming convention. Triplet per table. Suffix: • _data reads data from RDATA / DBI / XML / CSV / XLSX • _an performs the analyses; stores results in RDATA • _print reads the RDATA, generates DOCx tables, XLSx files, EMF graphs and HTML output for the LOG. Example: Table_01_data.r, Table_01_an.r, Table_01_print.r
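A minimal sketch of how such a triplet could be located and run in order is given below. It is an assumption-laden illustration: `triplet_files()` and `run_triplet()` are hypothetical names, and for brevity the three steps share one environment instead of exchanging RDATA files as the slide describes.

```r
# Hypothetical helper: build the three conventional file names for one table.
triplet_files <- function(table_id, dir = ".") {
  file.path(dir, paste0(table_id, c("_data", "_an", "_print"), ".r"))
}

# Hypothetical helper: source the steps in the fixed order data -> analysis -> print.
run_triplet <- function(table_id, dir = ".") {
  env <- new.env()
  for (f in triplet_files(table_id, dir)) {
    if (!file.exists(f)) stop("Missing step file: ", f)
    source(f, local = env)
  }
  invisible(env)
}

# Demo with three tiny made-up step files in a temporary directory.
demo_dir <- tempfile("tables")
dir.create(demo_dir)
writeLines("dat <- data.frame(y = c(2, 4, 6))", file.path(demo_dir, "Table_01_data.r"))
writeLines("res <- mean(dat$y)",                file.path(demo_dir, "Table_01_an.r"))
writeLines("out <- sprintf('Mean: %.1f', res)", file.path(demo_dir, "Table_01_print.r"))

env <- run_triplet("Table_01", demo_dir)
print(get("out", envir = env))
```

Failing early on a missing file is a deliberate choice here: with a naming convention, a typo in a file name would otherwise silently skip a step.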
  • 21. Introduction ► Step 2: Simple, automated template-based workflow [Diagram: DOCx template with placeholder definitions, R code, DOCx report, HTML log]
  • 22. Introduction ► Step 2: Simple, automated template-based workflow [Diagram: DOCx template with placeholder definitions, R code, DOCx report, HTML log]
  • 23. Introduction ► Step 3: defining tasks → finding tools → making a library Modelling, longitudinal analysis • Inference (testing, CIs, MCP) • Summaries • Effect size • Advanced survival • Making complex tables • Dose – Response • PK, PD, DF • Questionnaires • Generating documents (DOCx, RTF, PDF) • Documenting (log) the analysis • Data I/O • Technical / Programming • Trial design & simulation • Plotting • Randomization • Data manipulation • Meta-analysis • CDISC-related • Missing data – patterns and imputation • Model diagnostics
  • 24. Introduction ► Step 3: defining tasks → finding tools → making a library Modelling, longitudinal analysis • Inference (testing, CIs, MCP) • Summaries • Effect size • Advanced survival • Making complex tables • Dose – Response • PK, PD, DF • Questionnaires • Generating documents (DOCx, RTF, PDF) • Documenting (log) the analysis • Data I/O • Technical / Programming • Trial design & simulation • Plotting • Randomization • Data manipulation • Meta-analysis • CDISC-related • Missing data – patterns and imputation • Model diagnostics
    Trial design & simulation: CRTSize, faux, gsDesign, ldbounds, MAMS, Mediana, PowerTOST, pwr, RCTdesign, RPACT, samplesizeCMH, simstudy, SSRMST, ThreeArmedTrials, TrialSize
    Modelling, longitudinal analysis: CRTgeeDR, drgee, gee, geeasy, geepack, geesmv, GLMMadaptive, glmmTMB, glmtoolbox, ipw, lavaSearch2, lme4, lmerTest, lqmm, MASS, mmmgee, multgee, MuMIn, nlme, ordinal, QRLMM, repolr, rms, robustlmm, sasLM, simplexreg, wgeesel, lqr, gam, quantreg…
    Advanced survival: bshazard, cmprsk, ComparisonSurv, controlTest, coxphw, CoxR2, FHtest, frailtypack, frailtysurv, landest, maxcombo, mstate, muhaz, nph, npsurvSS, pammtools, reda, reReg, RMST, surv2sampleComp, survival, Survmisc, survRM2
  • 25. Introduction ► Step 3: defining tasks  finding tools  making a library ARTool, Asbio, BaylorEdPsych, bear, betareg, bindrcpp, binom, biostatUZH, blockrand, boot, broom, bshazard, car, clinPK, clubSandwich, cmprsk, coin, ComparisonSurv, compute.es, confintr, conflicted, contrast, controlTest, correlation, coxphw, CoxR2, CRTgeeDR, CRTSize, cvcqv, dabestr, DataEditR , DBI, DescTools, devEMF, diffdf, DoseFinding, dplyr, drgee, dunn.test, e1071, effectsize, effsize, emmeans, epiR, equatiomatic, faux, FHtest, fitdistrplus, flextable, forcats, foreign, forestmodel, frailtypack, frailtysurv, gam, gee, geeasy, geepack , geesmv, GFD, ggalluvial, GGally, ggeffects, gghalves, ggmosaic, ggplot2, ggpol, ggrepel, ggridges , ggside, ggsignif, ggstance, ggtext, GLMMadaptive, glmmTMB, glmnet, glmtoolbox, glue, gmodels, gridExtra , gsDesign, haven, Hmisc, InformativeCensoring, interactions, ipw, irr, Kendall, knitr, knitr, kSamples, landest, latex2exp, lavaSearch2, lawstat, ldbounds, likert, lme4, lmerTest, lmPerm, logR, logspline, lqmm, lqr, lubridate, magrittr, MAMS, MarginalEffects, margins, MASS, maxcombo, MCPMod, mcr, Mediana, meta, metafor, metaviz, mice, Minirand, MissMech, misty, Mkinfer, mmmgee, modelbased, mstate, muhaz, multcomp, multgee, multxpert, MuMIn, MVN, mvnormalTest, mvtnorm, naniar, nlme, nortest, nparcomp, nparLD, nph, npsurvSS, officer, onlineFDR, openxlsx, ordinal, PairedData, pammtools, patchwork, pbkrtest, PearsonDS, permuco, PK, PKconverter, PKfit, PKPDmodels, pkr, PMCMRplus, polycor, PowerTOST, PropCIs, Publish, purrr, pwr, qqplotr, QRLMM, Qtools, quantreg, r2rtf, randomizeR, rankFD, ratesci, Rcmdr, rcompanion, RCTdesign, readr, reda, repolr, reReg, rlang, Rmarkdown, rms, RMST, robustbase, robustlmm, RODBC, RPACT, rstatix, RVAideMemoire, rvg, samplesizeCMH, sandwich, sasLM, SASxport, scales, simplexreg, simstudy, sqldf, SSRMST, statsExpressions, stringr, summarytools, SuppDist, surv2sampleComp, survival, Survmisc, survRM2, testthat, 
ThreeArmedTrials, tidyquery, tidyr, tidytext, TOSTER, trend, TrialSize, UpSetR, VGAM, VIM, wgeesel, WRS2, xml2, mitml, jomo psych, irr, SimplyAgree mmrm, ggh4x, ggformula, DescrTab2 230 so far, 200 in daily use.
  • 26. Introduction ► Step 3: defining tasks → finding tools → Hall of fame! (incomplete!) nlme::gls() - MMRM • emmeans • geepack, geeM, geesmv, wgeesel, CRTgeeDR, multgee, repolr, ipw • Officeverse: officer, flextable, rvg • SASxport • r2rtf • glmmTMB • Tidyverse • nparLD, GFD, rankFD, ARTool, nparcomp, WRS2 • Mediana, gsDesign, RCTdesign, rpact, MAMS • survival, cmprsk, nph, reda, reReg, frailtypack, survRM2 • RMarkdown • broom • boot • sandwich, clubSandwich • DBI, RODBC • effectsize • margins, MarginalEffects • sasLM
  • 27. Introduction ► Step 3: defining tasks → finding tools → Hall of fame! (incomplete!)
  • 28. Introduction ► Step 3: defining tasks → finding tools → Hall of fame! (incomplete!)
  • 29. Introduction ► So many sources of packages! Packages GitHub CRAN CRAN archive RForge External (PKfit) Bioconductor • Versions may differ • Different ways of reporting issues
  • 30. Introduction ► So many sources of packages! Packages GitHub CRAN CRAN archive RForge External (PKfit) Bioconductor • Versions may differ • Different ways of reporting issues
  • 31. Challenges ► Numerical validation https://www.researchgate.net/publication/345778861_Numerical_validation_as_a_critical_aspect_in_bringing_R_to_the_Clinical_Research
  • 32. Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R As long as somebody uses just the basic tools, problems may never occur. And this scope may be just sufficient for quite a lot of scenarios! • “group by” summaries with N, %, mean, median, SD, Q1, Q3, min, max… • aov() • kruskal.test(), wilcox.test(), t.test()… • lmer(post_value ~ treatment * time + baseline + baseline:time + (1|PatID)) • plot(survfit(Surv(time, status) ~ treatment)) BTW, did you see median()? Is it equal to quantile()[“50%”]? Always?
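Even the "basic" summaries in this list hide a choice: `quantile()` in base R implements nine definitions, selected with `type`, and other software may default to a different one than R does. The toy example below (made-up data) shows two of R's own definitions disagreeing on Q1; which `type` matches a given reference package is something to confirm in that package's documentation, not assumed here.

```r
x <- 1:10  # toy data, chosen so the results are easy to check by hand

q1_type7 <- unname(quantile(x, 0.25, type = 7))  # R's default definition: 3.25
q1_type2 <- unname(quantile(x, 0.25, type = 2))  # averaging at discontinuities: 3

med   <- median(x)                 # 5.5
med_q <- unname(quantile(x, 0.5))  # default type 7, also 5.5 for these data

print(c(type7 = q1_type7, type2 = q1_type2))
print(c(median = med, q50 = med_q))
```

For these data `median()` and the default 50% quantile agree, but the two are computed by different formulas, so comparing them with `all.equal()` is safer than `==`, and a non-default `type` can move the 50% quantile as well.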
  • 34. Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R Like it or not – the fact is that SAS® IS the industry standard in clinical trials and people will use it to re-create your analyses – and NATURALLY ask if the numbers don’t agree. SAS® Regulatory agency Journal Sponsor-side biostat team Your colleague Validator - it’s not about a “crusade”: “R is better! No! SAS® is better! No! Excel is better!” - it’s not about favoring anyone (“you think it’s better because expensive!?”) - It’s about the reality.
  • 35. Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R If they ask you about the discrepancies, you can: 1. ignore it (can you?), 2. say „I don’t know, it just happened, but R is right!” 3. investigate it and respond: 1. both are right, just different approach 🤷 2. well, R is wrong, I’m gonna fix it or message the authors But to respond – you need to know what happened. A much worse situation: NOBODY found a difference, and you just published the results with errors.
  • 36. Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R We do not care, if a package has a „good marketing”. It must be working well. Has vignettes! Has active community! 5 in rankings. YouTube tutorials. Top popular download on GitHub Has unfixed errors that nobody cares…
  • 37. Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R • nlme: Priority: recommended; linear mixed models with almost all the stuff SAS has • MCPMod: Design and Analysis of Dose-Finding Studies • PMCMRplus: lots of popular non-parametric stuff, dose-response findings (Williams) • MASS: Priority: recommended; lots of stuff, including glmmPQL! • boot: Priority: recommended • nparcomp: lots of non-parametric methods • frailtySurv: shared frailty models • rms: strategies for regression modeling by Prof. Harrell • geesmv: Morel’s small-sample correction for the GEE sandwich SEs • ipw: Inverse-Probability Weighting, for GEE under MAR • multxpert: Common Multiple Testing Procedures and Gatekeeping Procedures by Prof. Dmitrienko • PropCIs: a must-have; CIs for proportions • pkfit: one of the most important tools for PK; not even on CRAN • bear: one of the best available tools for PK; not even on CRAN • cmprsk: survival with competing risks. These packages have no marketing. Would you exclude them from your toolkit?
  • 38. Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7327187/
  • 39. Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R And the real problem is that R disagrees not only with SAS, but even with… itself
  • 40. Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R After failing several times, we finally decided to validate as much as possible. This consumes a lot of time and effort, but it lets us sleep better.
    Package | Function | Version | Dataset   | Test | Completed | Soft1  | Soft2  | Soft3  | Discrep. | Decision | Justif.
    pkg1    | fn1()    | 0.6.2   | Trial 1   | #23  | OK        | OK     | OK     | FAILED | …        | OK       | …
    pkg1    | fn1()    | 0.6.2   | Journal 2 | #23  | OK        | OK     | FAILED | OK     | …        | FAILED   | …
    pkg1    | fn1()    | 0.6.2   | Journal 2 | #24  | OK        | OK     | OK     | OK     | …        | OK       | …
    pkg1    | fn2()    | 0.6.2   | Journal 2 | #25  | OK        | FAILED | FAILED | FAILED | …        | FAILED   | …
    Validation sources: reference software; textbook formulas (by hand); other trusted package; published results (journals/books); published results (manuals); code inspection.
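A validation log like the one on this slide could be kept as a plain data frame. The following is a minimal sketch only, not the authors' actual tooling: the helper `compare_to_reference()`, the tolerance, the toy data and the reduced set of columns are all assumptions made for illustration. It checks `var()` against the textbook formula computed by hand, plus one deliberately wrong reference to show a FAILED entry.

```r
# Hypothetical helper: compare an R result with a reference value
# within a relative tolerance and return the log entry.
compare_to_reference <- function(r_value, ref_value, tol = 1e-8) {
  same <- isTRUE(all.equal(unname(r_value), unname(ref_value), tolerance = tol))
  if (same) "OK" else "FAILED"
}

# Toy data (illustrative assumption)
x <- c(4.1, 5.3, 2.8, 6.0, 5.5)

r_result  <- var(x)                                  # function under test
by_hand   <- sum((x - mean(x))^2) / (length(x) - 1)  # textbook sample variance
wrong_ref <- sum((x - mean(x))^2) / length(x)        # population variance, wrong on purpose

validation_log <- data.frame(
  Package   = "stats",
  Function  = "var()",
  Version   = as.character(getRversion()),
  Dataset   = "toy vector",
  Test      = c("#1", "#2"),
  Reference = c("textbook formula, by hand", "population formula (deliberately wrong)"),
  Result    = c(compare_to_reference(r_result, by_hand),
                compare_to_reference(r_result, wrong_ref)),
  stringsAsFactors = FALSE
)

print(validation_log)
```

In practice the reference values would come from the sources listed on the slide (other software, published results, manuals), and a FAILED row would still need a human decision and a written justification.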
  • 41. Challenges ► Ocean of possibilities. But be careful! It’s deep! Open Source gives you an ocean of possibilities (for doing THE SAME thing)! OK! Diversity is good overall, but not when overdone! Let’s imagine I want procedure ABC. R has 10 functions in 5 packages to do ABC in 8 ways. My day has only 24 hours, and I have my work and my life after hours.
  • 42. Challenges ► Documentation Documentation quality varies a lot: from dedicated web-books with numerous examples ( https://ardata-fr.github.io/flextable-book/ ) to just a raw manual with no formulas and with references to a paid article or a rare book. SAS, NCSS, SPSS, Stata have awesome tutorials and manuals, almost courses in statistics. NCSS even gives the input data and results! “Just a basic manual” vs. “You can do a PhD with it!”
  • 43. Challenges ► Why can’t things be simple? SAS®: PROC MIXED EMPIRICAL … REPEATED … CS … KR … LSMEANS. R: Kenward-Roger? … a-ha! Use lme4! But wait, I want a marginal model with CS. Random-intercept ≠ CS for negative within-subject correlations! I could use glmmTMB for this, but pbkrtest doesn’t support it. But there’s nlme! Take nlme::gls(). But pbkrtest doesn’t support nlme::gls(). OK then, let’s use Satterthwaite! OK. nlme::gls() + emmeans (for LS-means + Satterthwaite). Now I want the robust HC0 („sandwich”) estimator. Get clubSandwich and pass the adjusted Var-Cov to emmeans. Follow it with emmeans::joint_tests(). Double-check the DF, as car::Anova() may have a problem here. Done! Sigh! … Did you check GitHub for open issues? Statistics UX
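The remark that a random intercept is not the same as CS for negative within-subject correlations can be illustrated with a small simulation. This is a sketch under stated assumptions (simulated data, two visits per subject, a true correlation of -0.5, all variable names invented here); it covers only the nlme::gls() step and not the Satterthwaite, sandwich or emmeans steps from the slide.

```r
library(nlme)  # a "recommended" package shipped with standard R installations

set.seed(2022)
n_subj <- 200

# Two visits per subject with a true within-subject correlation of -0.5
z1 <- rnorm(n_subj)
z2 <- rnorm(n_subj)
d <- data.frame(
  id   = factor(rep(seq_len(n_subj), each = 2)),
  time = factor(rep(c("v1", "v2"), times = n_subj)),
  # rbind() + as.vector() interleaves visit 1 and visit 2 within each subject
  y    = as.vector(rbind(z1, -0.5 * z1 + sqrt(0.75) * z2))
)

# Marginal model with a compound-symmetry residual correlation
fit_cs <- gls(y ~ time, data = d,
              correlation = corCompSymm(form = ~ 1 | id))

# Estimated within-subject correlation on its natural scale
rho_hat <- coef(fit_cs$modelStruct$corStruct, unconstrained = FALSE)
print(rho_hat)
```

The compound-symmetry structure can estimate a negative correlation here. A random-intercept model implies a within-subject correlation equal to the between-subject variance divided by the total variance, which cannot be negative, so it cannot reproduce this result.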
  • 44. Challenges ► Packages removed from CRAN BaylorEdPsych, cvcqv, MissMech, PKPDmodels, SASxport, mixor, coxinterval, quantreg (restored), brunnermunzel (restored), MomTrunc (restored), tlrmvnmvt (restored), normtest, flow, nlmixr
  • 46. Challenges ► Packages removed from CRAN
  • 47. Challenges ► Let’s combine it together! (Flowchart; node labels:) START! · Package removed from CRAN · Search for a replacement · Email the author… · Create new issue. · Learn the new package · What does this thing do?! · Something is wrong! · It works! · FIXED? · What now!? · Another package is needed · It works! · Sorry, I’m busy. · No, it doesn’t · Partially managed…
  • 48. Future plans We plan: • To research a couple of new tools: - For work: MMRM (!) - For CDISC: admiral, sassy, definer, metacore - For RTF: rtftables, gt - Out of curiosity: tplyr - For technical work: box • To focus on CDISC and on preparing for the first big submission. • To extend the numerical validation of packages.
  • 49. Overall impression Employing Open Source means accepting the consequences. The effort, costs and extra work cannot be taken lightly in a small CRO. But it is definitely worth the effort. In moments of doubt, it’s good to remember that no big things come easy. R is and will be our friend. Even if a demanding one.
  • 50. THANK YOU This is just the beginning…