Meet a 100% R-based CRO
The summary of a 5-year journey
Adrian Olszewski
Principal Biostatistician at 2KMM CRO
The R/Pharma 2022 Conference, Nov 10th 2022
10min
www.2kmm.eu
aolszewski@2kmm.pl
Disclaimer
This presentation shows the biostatistician’s perspective first.
Lots of exploratory research, involving tens of statistical tests, complex survival models and non-parametric
methods. Here producing TFLs is important but secondary. I need the NUMBERS first to populate them.
It’s not to „blame” or „unfairly criticize”.
My job is to analyse the trial’s data on time and within budget.
If something does not work and I cannot meet the deadline, if fixing things exceeds the budget and the situation
seems hopeless, there is no time for sentiment, ideology, or hiding problems. I am going to be held
accountable for the results of my work, not for my „love for the tool”.
„You get what you pay for. It’s free software. Stop complaining.”
The fact that something is „free” does not mean it cannot be improved. The first step is to admit that problems
exist, diagnose them and be honest. Counteracting effectively requires a sober assessment of the situation.
„So why do you „waste your time”? Buy XYZ® and be happy.”
I really want to make things better. I ❤️ R. If I did not, I’d have abandoned it in 2000. Things get better, but
they won’t „fix themselves” magically. Relationships can be tough, but that's no reason to give up! Besides, it’s fun! 🙂
Introduction ► Who we are
⦿ The 2KMM - a small Polish CRO with a global reach.
⦿ 100% R-based:
trial design • DM • datasets • analysis & research • TFL • documents •
consulting • tutoring • making tools
⦿ 28 projects: RCTs + observational studies (in several therapeutic areas)
lots of ad hoc research
Introduction ► Who we are
Our specifics:
• No CDISC yet ; data sources based on SQL views
• Lots of planned exploratory analyses with complex scenarios
• Sometimes asked to use dinosaur tools vs. the freshest method widely widespread
• Being a CRO we are not as powerful in decisions as a big pharmaceutical company:
o a Sponsor may have its own vision and require us to follow it
o our proposals may be questioned (sometimes without a discussion)
• Widely varying requests from different sponsors:
• make tables like X, make tables like Y
• use this format, use that format
• we prefer X, we HATE X. ABC is important vs. ABC is negligible vs. please decide
→ It’s difficult to work out a common approach, workflow, or template.
Introduction ► History
⦿ When we started, a few questions had to be answered:
✓ Can we rely on R entirely? Will it suffice? Everyone around uses SAS.
✓ What are the hidden costs of using open source (no free lunch)?
✓ Can we trust R? How to validate it?
✓ What packages do we need to start? Collection of requirements.
✓ How to organize the working environment (SOPs, technical aspects)?
In general, we were rather optimistic in 2018 🙂
Introduction ► Opinions
⦿ After 5 years we have some opinions:
✓ Did R suffice to complete our work? Mostly…
✓ Could we just „launch R and focus on the work”? Partially…
✓ Could we trust R on faith? Did we fail? No. / Painfully.
✓ What are the hidden costs of using open source? Non-negligible.
✓ How many packages did we end up with? 230+ 🙂
✓ Describe the experience briefly? Annoyance, determination,
fixing stuff, reporting issues,
researching, satisfaction.
✓ Are we happy with R? Will we stay with it? It’s a tough ♥ / Yes.
✓ Why? It’s flexible. It’s getting better.
It’s worth it. We learned „HOW”.
Introduction ► Costs
It is not true that using free software does not cost a penny. It costs time
that could have gone into the analysis and is instead spent on:
✓ Collecting the library of necessary tools. That’s not easy; I will show why.
✓ Validating the selected tools (making sure 2+2=4).
✓ Realizing that an important package fails or is gone (hello, CRAN!).
✓ Contacting authors or the entire group, reporting issues on GitHub.
✓ Searching for a replacement (+ validation), which may lack features.
✓ If there is no response, researching the problem on your own.
✓ Paying for external consultancy, books, and pay-walled articles to move on.
Introduction ► Costs
How much did it cost?
An equivalent of a few 1yr licenses of a „good commercial software”.
Wait, what!? So where are the savings then?!
1. The cost is distributed over time (a year, say)
2. Such a big cost is rather one-off - at the beginning of the process
Occasional costs will take place, though (new versions, „retired packages”)
3. You get what you need (mostly), not what others decide you need
4. Once done, it can be reused an unlimited number of times (no per-user licenses)
5. You have better control over what you have, because you are the one who made it.
6. You get the code, so there is at least a small chance to fix things with your own hands
7. As long as your repository (library) is validated and frozen, you sleep well.
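One way to check that a library really stays „frozen” is to record package versions at validation time and compare them later. The sketch below is only an illustration in base R, not the tooling used at 2KMM; the function names `snapshot_library()` and `library_drift()` and the toy packages `pkgA`, `pkgB`, `pkgC` are made up for this example.

```r
# Record the name and version of every installed package (first library path wins).
snapshot_library <- function(lib = .libPaths()) {
  ip <- utils::installed.packages(lib.loc = lib)
  ip <- ip[!duplicated(ip[, "Package"]), , drop = FALSE]
  data.frame(package = unname(ip[, "Package"]),
             version = unname(ip[, "Version"]),
             stringsAsFactors = FALSE)
}

# Compare a frozen snapshot with the current one; return only the differences.
library_drift <- function(frozen, current) {
  m <- merge(frozen, current, by = "package", all = TRUE,
             suffixes = c("_frozen", "_current"))
  changed <- is.na(m$version_frozen) | is.na(m$version_current) |
    m$version_frozen != m$version_current
  m[changed, , drop = FALSE]
}

# Toy data (hypothetical packages) to show the idea without touching a real library.
frozen  <- data.frame(package = c("pkgA", "pkgB"),
                      version = c("1.0.0", "2.1.0"),
                      stringsAsFactors = FALSE)
current <- data.frame(package = c("pkgA", "pkgB", "pkgC"),
                      version = c("1.0.0", "2.2.0", "0.1.0"),
                      stringsAsFactors = FALSE)
drift <- library_drift(frozen, current)
drift
```

In real use, `snapshot_library()` would be saved (for example with `saveRDS()`) when the library is validated, and `library_drift()` run before each analysis; an empty result means nothing has changed.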
Introduction ► Costs
“Oh, c’mon. You have all the code! It’s open source! Why don’t you just fix the
problems and go back to work? What’s the problem? I think you exaggerate!”
Resources (staff + time + money) allocated to “employ the Open Source”:
Big company: 15 specialists, X$ 🙁
Small company: 2 specialists, Y$, Y << X 🙂
Introduction ► Step 1: organization of work ; technical infrastructure + SOPs
[Diagram. Labels: projects; R 3.x; VHDX container; V, P, N; „wild” / „validated”; SOPs]
Introduction ► Step 1: organization of work ; portable R
https://sourceforge.net/projects/rportable/
✓ Allows one to test new stuff and mix different
versions of the R core in a single analysis.
✓ Easy: no installation (VMs, containers), no extra
packages / dependencies / setups
✓ No elevated rights needed
✓ Regular directories, so easy management!
✓ When „matured”, can be packed into a VHDX container.
✓ Easily selectable as the current engine in RStudio.
Introduction ► Step 1: organization of work ; portable R
✓ This allows us to mix not only packages in different versions (with
all necessary dependencies) in a single analysis, but also versions of
the R core itself, when a certain package needs a higher or lower
version of the R core.
✓ Combines RPortable + rscript.exe + a naming convention for
[input data]-[output results] files.
✓ Each piece of version-dependent code knows where to read the data from
and where to store the results.
✓ Fully isolated code. Data exchanged via regular R objects
(RDS or feather).
Warning in install.packages :
package ‘emmeans’ is not available (for R version 3.6.3)
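The isolation idea described above (a separate R engine started through Rscript, with data passed in and out as RDS files) can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual 2KMM setup: `run_isolated()` and the worker script are hypothetical, and the currently running R is reused as a stand-in for a second, portable R installation so that the example can be run anywhere.

```r
# Run one analysis step in a separate R process; exchange data through RDS files.
run_isolated <- function(rscript, code_file, input_rds, output_rds) {
  status <- system2(rscript,
                    args = c("--vanilla", shQuote(code_file),
                             shQuote(input_rds), shQuote(output_rds)))
  if (status != 0) stop("Isolated step failed with exit status ", status)
  readRDS(output_rds)
}

# Assumption: in real use this would be the path to Rscript of another
# (portable) R version; here we point at the R that is already running.
rscript <- file.path(R.home("bin"), "Rscript")

# Hypothetical worker script: reads the input, computes, stores the result.
worker <- tempfile(fileext = ".R")
writeLines(c(
  "args <- commandArgs(trailingOnly = TRUE)",
  "input <- readRDS(args[1])",
  "saveRDS(list(r_version = R.version.string, total = sum(input$x)), args[2])"
), worker)

input_rds  <- tempfile(fileext = ".rds")
output_rds <- tempfile(fileext = ".rds")
saveRDS(data.frame(x = 1:3), input_rds)

result <- run_isolated(rscript, worker, input_rds, output_rds)
result$r_version  # version string of the R engine that did the work
result$total
```

Because the calling session only sees the output file, the worker can load whatever package versions its own R installation provides without affecting the main analysis.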
Introduction ► Step 2: Simple, automated workflow
Introduction ► Step 2: Simple, automated template-based workflow
DOCx template
- Headers, footers
- Styles
- Content placeholders (definitions)
Rmarkdown „manager”
- Reads the DOCx template for TODOs
- Does the „TODOs”
- Replaces „TODOs” with TFLs
- Becomes the HTML LOG
DOCx report
- Headers, footers preserved
- Styles utilized
- Placeholders hold actual T/F/Ls
HTML log
- All R commands
- All messages
- All (simplified) results
Introduction ► Step 2: Simple, automated template-based workflow
RMarkdown file
- Creates the environment
- Reads the DOCx template
- Loads the Word parsing „engine” (the DOCx reading engine)
- The engine:
  - iterates through definitions of placeholders
  - parses the fields
  - loads the R files per convention
  - executes the code
  - replaces placeholders with actual TFLs
- Auto-updates (appends) the HTML to LOG
## Preparing the objects storing the content of the report in both MS Word and MS Excel formats
```{r}
library(officer)   # read_docx(), docx_summary()
library(openxlsx)  # createWorkbook(), saveWorkbook()
# target_report_document_name is expected to be defined earlier in the document
word_report_document_name <- paste0(target_report_document_name, ".docx")
excel_report_document_name <- paste0(target_report_document_name, ".xlsx")
word_report_template_name <- paste0(target_report_document_name, "_template.docx")
doc_report <- read_docx(word_report_document_name)
doc_content <- docx_summary(doc_report)
xls_report <- createWorkbook()
```
# Data analysis
```{r child="rendering_engine.rmd", echo=TRUE, results='asis'}
```
```{r}
print(doc_report, target = word_report_document_name)
saveWorkbook(wb = xls_report, file = excel_report_document_name, overwrite = TRUE)
```
```{r}
# rendering_engine.rmd: find the paragraphs of the template that hold table definitions
table_defs <- subset(doc_content, grepl("^\\[Table\\]", doc_content$text), text)
table_defs <- gsub("^\\[Table\\] ", "", table_defs$text)
for (def in table_defs) {
  # Each definition is a sequence of "@key: value" fields
  split_defs <- strsplit(def, "@")[[1]][-1]
  table_title <- trimws(gsub("title:(.*)", "\\1", split_defs[grep("^title", split_defs)]))
  table_number <- trimws(gsub("table_num:(.*)", "\\1", split_defs[grep("^table_num", split_defs)]))
  force_table_num <- trimws(gsub("force_table_num:(.*)", "\\1", split_defs[grep("^force_table_num", split_defs)]))
  table_sufix <- trimws(gsub("table_sufix:(.*)", "\\1", split_defs[grep("^table_sufix", split_defs)]))
  r_file <- trimws(gsub("r_code:(.*)", "\\1", split_defs[grep("^r_code", split_defs)]))
  r_prn_file <- trimws(gsub("r_printer_code:(.*)", "\\1", split_defs[grep("^r_printer_code", split_defs)]))
  exclude <- trimws(gsub("exclude:(.*)", "\\1", split_defs[grep("^exclude", split_defs)]))
  table_title <- iconv(table_title, from = "UTF-8", to = "UTF-8")
  exclude <- ifelse(identical(exclude, character(0)), FALSE, as.logical(exclude))
  # ……………………………………………
  # No file given in the definition: fall back to the naming convention
  if (identical(r_file, character(0)) || r_file == "") {
    r_file <- paste0("Table", table_number, table_sufix, ".r")
  }
  # ……………………………………………
  r_file <- file.path(r_code_location, r_file)
  # Build a child chunk from the R file and knit it into the log
  chunk <- c(paste("#### Table ", paste0(table_number, table_sufix), "-", table_title, "\n"),
             paste("```{r ", r_file, "}\n"),
             readLines(r_file),
             "```\n")
  cat(knit_child(text = chunk, quiet = TRUE), sep = '\n')
  # ……………………………………………
}
```
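To make the parsing step easier to follow, here is a small stand-alone sketch of how one placeholder definition might be split into fields. The `@key: value` layout is inferred from the engine code above; the sample definition text and the helper `parse_definition()` are hypothetical and are not part of the original engine.

```r
# Hypothetical placeholder text, as it might appear in a paragraph of the template
def <- "[Table] @title: Demographics @table_num: 01 @r_code: Table_01.r @exclude: FALSE"

# Split a "[Table] @key: value @key: value ..." definition into a named list
parse_definition <- function(def) {
  body   <- sub("^\\[Table\\]\\s*", "", def)            # drop the marker
  fields <- strsplit(body, "@", fixed = TRUE)[[1]][-1]   # text before the first "@" is empty
  keys   <- trimws(sub(":.*$", "", fields))              # part before the first colon
  values <- trimws(sub("^[^:]*:", "", fields))           # part after the first colon
  setNames(as.list(values), keys)
}

parsed <- parse_definition(def)
parsed$title
parsed$r_code
```

Returning a named list keeps the lookup of optional fields simple: a field that is absent from the definition comes back as `NULL`, which can then trigger the naming-convention fallback.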
Introduction ► Step 2: Simple, automated template-based workflow
Regular R files. Naming convention. Triplet per table.
Suffix:
_data reads data from RDATA / DBI / XML / CSV / XLSX
_an performs the analyses; stores results in RDATA
_print reads the RDATA; generates DOCx tables, XLSx files, EMF graphs and HTML output for the LOG
Table_01_data.r Table_01_an.r Table_01_print.r
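The triplet convention can be illustrated with a few lines of base R. This is a simplified sketch, not the actual engine: `triplet_files()` and `run_triplet()` are hypothetical helpers, the three demo scripts are toy stand-ins, and the steps share one environment here, whereas the slides describe passing results between steps through RDATA files.

```r
# Build the three file names for a table, in execution order.
triplet_files <- function(table_id, dir = ".") {
  steps <- c("data", "an", "print")
  setNames(file.path(dir, paste0(table_id, "_", steps, ".r")), steps)
}

# Source the data, analysis and print steps one after another.
run_triplet <- function(table_id, dir = ".", envir = new.env()) {
  for (f in triplet_files(table_id, dir)) {
    source(f, local = envir)
  }
  invisible(envir)
}

# Toy triplet written to a temporary directory
demo_dir <- tempfile("tfl_")
dir.create(demo_dir)
writeLines("raw <- c(1, 2, 3, 4)", file.path(demo_dir, "Table_01_data.r"))
writeLines("result <- mean(raw)", file.path(demo_dir, "Table_01_an.r"))
writeLines("printed <- sprintf('Mean: %.1f', result)", file.path(demo_dir, "Table_01_print.r"))

env <- run_triplet("Table_01", demo_dir)
env$printed
```

Keeping the three steps in separate files means the costly analysis step can be skipped when only the layout of a table changes.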
Introduction ► Step 3: defining tasks → finding tools → making a library
• Modelling, longitudinal analysis
• Inference (testing, CIs, MCP)
• Summaries
• Effect size
• Advanced survival
• Making complex tables
• Dose – Response, PK, PD, DF
• Questionnaires
• Generating documents (DOCx, RTF, PDF)
• Documenting (log) the analysis
• Data I/O
• Technical / Programming
• Trial design & simulation
• Plotting
• Randomization
• Data manipulation
• Meta-analysis
• CDISC-related
• Missing data – patterns and imputation
• Model diagnostics
Introduction ► Step 3: defining tasks → finding tools → making a library
Trial design & simulation:
CRTSize, faux, gsDesign, ldbounds, MAMS, Mediana, PowerTOST, pwr, RCTdesign, RPACT,
samplesizeCMH, simstudy, SSRMST, ThreeArmedTrials, TrialSize
Modelling, longitudinal analysis:
CRTgeeDR, drgee, gee, geeasy, geepack, geesmv, GLMMadaptive, glmmTMB, glmtoolbox, ipw,
lavaSearch2, lme4, lmerTest, lqmm, MASS, mmmgee, multgee, MuMIn, nlme, ordinal,
QRLMM, repolr, rms, robustlmm, sasLM, simplexreg, wgeesel, lqr, gam, quantreg…
Advanced survival:
bshazard, cmprsk, ComparisonSurv, controlTest, coxphw, CoxR2, FHtest,
frailtypack, frailtysurv, landest, maxcombo, mstate, muhaz, nph,
npsurvSS, pammtools, reda, reReg, RMST, surv2sampleComp, survival,
Survmisc, survRM2
Introduction ► Step 3: defining tasks → finding tools → making a library
ARTool, Asbio, BaylorEdPsych,
bear, betareg, bindrcpp,
binom, biostatUZH, blockrand,
boot, broom, bshazard, car,
clinPK, clubSandwich, cmprsk,
coin, ComparisonSurv,
compute.es, confintr,
conflicted, contrast,
controlTest, correlation,
coxphw, CoxR2, CRTgeeDR,
CRTSize, cvcqv, dabestr,
DataEditR , DBI, DescTools,
devEMF, diffdf, DoseFinding,
dplyr, drgee, dunn.test,
e1071, effectsize, effsize,
emmeans, epiR, equatiomatic,
faux, FHtest, fitdistrplus,
flextable, forcats, foreign,
forestmodel, frailtypack,
frailtysurv, gam, gee, geeasy,
geepack , geesmv, GFD,
ggalluvial, GGally, ggeffects,
gghalves, ggmosaic, ggplot2,
ggpol, ggrepel, ggridges ,
ggside, ggsignif, ggstance,
ggtext, GLMMadaptive, glmmTMB,
glmnet, glmtoolbox, glue,
gmodels, gridExtra , gsDesign,
haven, Hmisc,
InformativeCensoring,
interactions, ipw, irr,
Kendall, knitr,
kSamples, landest, latex2exp,
lavaSearch2, lawstat,
ldbounds, likert, lme4,
lmerTest, lmPerm, logR,
logspline, lqmm, lqr,
lubridate, magrittr, MAMS,
MarginalEffects, margins,
MASS, maxcombo, MCPMod, mcr,
Mediana, meta, metafor,
metaviz, mice, Minirand,
MissMech, misty, Mkinfer,
mmmgee, modelbased, mstate,
muhaz, multcomp, multgee,
multxpert, MuMIn, MVN,
mvnormalTest, mvtnorm, naniar,
nlme, nortest, nparcomp,
nparLD, nph, npsurvSS,
officer, onlineFDR, openxlsx,
ordinal, PairedData,
pammtools, patchwork,
pbkrtest, PearsonDS, permuco,
PK, PKconverter, PKfit,
PKPDmodels, pkr, PMCMRplus,
polycor, PowerTOST, PropCIs,
Publish, purrr, pwr, qqplotr,
QRLMM, Qtools, quantreg,
r2rtf, randomizeR, rankFD,
ratesci, Rcmdr, rcompanion,
RCTdesign, readr, reda,
repolr, reReg, rlang,
Rmarkdown, rms, RMST,
robustbase, robustlmm, RODBC,
RPACT, rstatix, RVAideMemoire,
rvg, samplesizeCMH, sandwich,
sasLM, SASxport, scales,
simplexreg, simstudy, sqldf,
SSRMST, statsExpressions,
stringr, summarytools,
SuppDist, surv2sampleComp,
survival, Survmisc, survRM2,
testthat, ThreeArmedTrials,
tidyquery, tidyr, tidytext,
TOSTER, trend, TrialSize,
UpSetR, VGAM, VIM, wgeesel,
WRS2, xml2, mitml, jomo,
psych, irr, SimplyAgree,
mmrm, ggh4x, ggformula,
DescrTab2
230 so far, 200 in daily use.
Introduction ► Step 3: defining tasks → finding tools → Hall of fame! (incomplete!)
• nlme::gls() (MMRM)
• emmeans
• geepack, geeM, geesmv, wgeesel, CRTgeeDR, multgee, repolr, ipw
• Officeverse: officer, flextable, rvg
• SASxport
• r2rtf
• glmmTMB
• Tidyverse
• nparLD, GFD, rankFD, ARTool, nparcomp, WRS2
• Mediana, gsDesign, RCTdesign, rpact, MAMS
• survival, cmprsk, nph, reda, reReg, frailtypack, survRM2
• RMarkdown
• broom
• boot
• sandwich, clubSandwich
• DBI, RODBC
• effectsize
• margins, MarginalEffects
• sasLM
Introduction ► So many sources of packages!
Packages come from:
• GitHub
• CRAN
• CRAN archive
• RForge
• External (PKfit)
• Bioconductor
Consequences:
• Versions may differ
• Different ways of reporting issues
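With so many sources, it helps to record where each installed package actually came from. The sketch below reads the installed DESCRIPTION metadata with `utils::packageDescription()`. It is a heuristic only: the helper `package_source()` is hypothetical, and the assumption is that installers such as remotes or pak write a `RemoteType` field, that Bioconductor packages carry a `biocViews` field, and that CRAN builds carry `Repository`; packages installed in other ways may end up as "unknown".

```r
# Guess the origin of an installed package from its DESCRIPTION fields.
package_source <- function(pkg) {
  d <- utils::packageDescription(pkg)
  if (identical(d[["Priority"]], "base")) {
    src <- "base R"
  } else if (!is.null(d[["RemoteType"]])) {
    src <- d[["RemoteType"]]        # e.g. "github" when installed with remotes
  } else if (!is.null(d[["biocViews"]])) {
    src <- "Bioconductor"
  } else if (!is.null(d[["Repository"]])) {
    src <- d[["Repository"]]        # e.g. "CRAN"
  } else {
    src <- "unknown"
  }
  data.frame(package = pkg, version = d[["Version"]], source = src,
             stringsAsFactors = FALSE)
}

# Example with packages that ship with every R installation
report <- do.call(rbind, lapply(c("stats", "utils"), package_source))
report
```

Such a report, stored next to the validation records, shows at a glance which issue tracker or maintainer to contact when a package misbehaves.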
Challenges ► Numerical validation
https://www.researchgate.net/publication/345778861_Numerical_validation_as_a_critical_aspect_in_bringing_R_to_the_Clinical_Research
Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R
As long as somebody uses just the basic tools, problems may never occur.
And this scope may be sufficient for quite a lot of scenarios!
• “group by” summaries with N, %, mean, median, SD, Q1, Q3, min, max…
• aov()
• kruskal.test(), wilcox.test(), t.test()…
• lmer(post_value ~ treatment * time + baseline + baseline:time + (1|PatID))
• plot(survfit(Surv(time, status) ~ treatment))
BTW, did you see median()?
Is it equal to quantile()[“50%”]? Always? ☹ 😐 ☹
https://stats.stackexchange.com/questions/578387/serious-coding-error-in-qic-function-in-geepack
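One possible reading of the median() question above: `quantile()` supports several definitions of a sample quantile through its `type` argument, and not all of them agree with `median()`. The toy vector below is an assumption made purely for illustration; other software may default to a different definition than R does, which is one way such discrepancies arise.

```r
x <- c(1, 2, 3, 4)

m         <- median(x)                      # mean of the two middle values
q_default <- quantile(x, 0.5)               # type = 7, R's default definition
q_type1   <- quantile(x, 0.5, type = 1)     # inverse of the empirical CDF

comparison <- c(median = m,
                type7  = unname(q_default),
                type1  = unname(q_type1))
comparison
```

Here the default quantile matches the median (2.5), while type 1 returns an observed value (2), so a validator using another definition would report a different number from the same data.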
Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R
Like it or not – the fact is that SAS® IS the industry standard in clinical trials and
people will use it to re-create your analyses – and NATURALLY ask if the
numbers don’t agree.
[Diagram: SAS® and those who may use it to re-create your analyses: Regulatory agency, Journal, Sponsor-side biostat team, Your colleague, Validator]
- it’s not about a “crusade”:
“R is better! No! SAS® is better!
No! Excel is better!”
- it’s not about favoring anyone
(“you think it’s better because
expensive!?”)
- It’s about the reality.
Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R
If they ask you about the discrepancies, you can:
1. ignore it (can you?),
2. say „I don’t know, it just happened, but R is right!”
3. investigate it and respond:
1. both are right, just different approach 🤷
2. well, R is wrong, I’m gonna fix it or message the authors
But to respond – you need to know what happened.
A much worse situation: NOBODY found a difference, and you just
published the results with errors.
Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R
We do not care whether a package has „good marketing”. It must work well.
Has vignettes!
Has active community!
5 in rankings. YouTube tutorials.
Top popular download on GitHub
Has unfixed errors that nobody cares…
Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R
• nlme: Priority: recommended; linear mixed models with almost all the stuff SAS has
• MCPMod: Design and Analysis of Dose-Finding Studies
• PMCMRplus Lots of popular non-parametric stuff, dose-response finding - Williams
• MASS Priority: recommended; lots of stuff, including glmmPQL!
• boot Priority: recommended
• nparcomp Lots of non-parametric methods
• frailtySurv Shared frailty models
• rms Strategies for regression modeling by Prof. Harrell
• geesmv Small-sample Morel’s correction for the GEE sandwich SEs
• ipw Inverse-Probability Weighting – for GEE under MAR
• multxpert Common Multiple Testing Procedures and Gatekeeping Procedures by Prof. Dmitrienko
• PropCIs A must have – CIs for proportion
• PKfit One of the most important tools for PK; not even on CRAN
• bear One of the best available tools for the PK, not even on CRAN
• cmprsk Survival with competing risks
These packages have no marketing. Would you exclude them from your toolkit?
Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7327187/
Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R
And the real problem is that R is discrepant not only against SAS, but even… itself
Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R
After failing several times, we finally decided to validate as much as possible.
This consumes a lot of time and effort, but it lets us sleep better.
Package  Function  Version  Dataset    Test  Completed  Soft1   Soft2   Soft3   Discrep.  Decision  Justif.
pkg1     fn1()     0.6.2    Trial 1    #23   OK         OK      OK      FAILED  ………….     OK        ………
pkg1     fn1()     0.6.2    Journal 2  #23   OK         OK      FAILED  OK      …………      FAILED    ………
pkg1     fn1()     0.6.2    Journal 2  #24   OK         OK      OK      OK      …………      OK        ………
pkg1     fn2()     0.6.2    Journal 2  #25   OK         FAILED  FAILED  FAILED  …………      FAILED    ………
Validation sources:
• Reference software
• Textbook formulas (by hand)
• Other trusted package
• Published results: journals/books
• Published results: manuals
• Code inspection
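A single row of such a validation log could be produced along these lines. This is a minimal sketch of the „textbook formula by hand” route only, not the validation framework used at 2KMM: the helper `validate()`, the tolerance and the toy data are assumptions, and the example checks base R's `t.test()` (Welch variant, the default) against the hand-computed statistic.

```r
# Compare a function's result with a reference value and return one log row.
validate <- function(package, fn, test_id, actual, reference, tol = 1e-8) {
  ok <- isTRUE(all.equal(actual, reference, tolerance = tol,
                         check.attributes = FALSE))
  data.frame(Package  = package,
             Function = fn,
             Version  = as.character(utils::packageVersion(package)),
             Test     = test_id,
             Result   = if (ok) "OK" else "FAILED",
             stringsAsFactors = FALSE)
}

# Toy data (made up for this example)
x <- c(5.1, 4.9, 6.2, 5.8, 6.0)
y <- c(4.2, 4.8, 5.0, 4.4, 4.9)

# Welch t statistic from the textbook formula
t_hand <- (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
# The same statistic as reported by R
t_r <- unname(t.test(x, y)$statistic)

log_row <- validate("stats", "t.test()", "#1", actual = t_r, reference = t_hand)
log_row
```

Rows produced this way can be bound together with `rbind()` into a table like the one above, with further columns for the reference software, the discrepancy and the decision.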
Challenges ► Ocean of possibilities. But be careful! It’s deep!
Open Source gives you an ocean of possibilities (for doing THE SAME)!
OK! Diversity is good overall, but not when overdone! Let’s imagine I want
procedure ABC. R has 10 functions in 5 packages to do ABC in 8 ways. My
day has only 24 hours, and I have my work and a life after hours.
Challenges ► Documentation
Documentation quality varies a lot: from dedicated web-books with numerous
examples ( https://ardata-fr.github.io/flextable-book/ ) to just a raw manual with
no formulas and with references to a paid article or a rare book.
SAS, NCSS, SPSS, Stata have awesome tutorials and manuals, almost
courses in statistics 🙂 NCSS even gives the input data and results!
Just a basic manual vs. You can do a PhD with it!
Challenges ► Why cannot things be simple?
SAS ®: PROC MIXED EMPIRICAL… REPEATED … CS … KR … LSMEANS
R: Kenward-Roger? … a-ha! Use lme4! But wait, I want a marginal model with CS.
Random-intercept ≠ CS for negative within-subject correlations! I could use glmmTMB
for this, but pbkrtest doesn’t support it.
But there’s nlme! Take nlme::gls(). But pbkrtest doesn’t support nlme::gls().
OK then, let’s use Satterthwaite!
OK. nlme::gls() + emmeans (for LS-means + Satterthwaite). Now I want the robust HC0
(„sandwich”) estimator. Get clubSandwich and use the emmeans to provide the
adjusted Var-Cov. Follow it by emmeans::joint_tests(). Double check the DF, as
car::Anova() may have a problem here.
Done! Sigh! … Did you check GitHub, if there are no opened issues?
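The first link of this chain, that a random intercept is not the same as compound symmetry when the within-subject correlation is negative, can be checked on simulated data. A random-intercept model implies a correlation of σ_b² / (σ_b² + σ²), which cannot be negative, whereas `nlme::gls()` with `corCompSymm()` estimates the correlation directly. The sketch below covers only this step (not the emmeans or clubSandwich parts); the simulated data, the seed and the true correlation of -0.5 are assumptions made for illustration.

```r
library(nlme)

set.seed(1)
n        <- 200      # subjects, two measurements each
rho_true <- -0.5     # negative within-subject correlation

# Two standard-normal measurements per subject with correlation rho_true
z1 <- rnorm(n)
z2 <- rnorm(n)
y1 <- z1
y2 <- rho_true * z1 + sqrt(1 - rho_true^2) * z2

d <- data.frame(
  id   = factor(rep(seq_len(n), each = 2)),
  time = factor(rep(c("t1", "t2"), times = n)),
  y    = as.vector(rbind(y1, y2))   # interleave: subject 1 t1, subject 1 t2, ...
)

# Marginal model with a compound-symmetry correlation structure
fit <- gls(y ~ time, data = d, correlation = corCompSymm(form = ~ 1 | id))

# Estimated CS correlation on its natural scale
rho_hat <- coef(fit$modelStruct$corStruct, unconstrained = FALSE)
rho_hat
```

The estimate comes out negative, close to the simulated value, which a random-intercept fit could not represent.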
Statistics UX
Challenges ► Packages removed from CRAN
BaylorEdPsych , cvcqv, MissMech, PKPDmodels, SASxport, mixor, coxinterval,
quantreg (restored), brunnermunzel (restored), MomTrunc (restored), tlrmvnmvt
(restored), normtest, flow, nlmixr
Challenges ► Let’s combine it together!
[Flowchart; node labels in extraction order:]
START! · Package removed from CRAN · Search for a replacement · Email the author… · Create new issue. · Learn the new package · What does this thing do?! · Something is wrong! · It works! · FIXED? · What now!? · Another package is needed · It works! · Sorry, I’m busy. · No, it doesn’t · Partially managed…
Future plans
We plan:
• To research a couple of new tools:
• For work: MMRM (!)
• For CDISC: admiral, sassy, definer, metacore
• For RTF: rtftables, gt
• Out of curiosity: tplyr
• For technical work: box
• To focus on CDISC and on preparation for the first big submission.
• To extend the numerical validation of packages
Overall impression
Employing Open Source means accepting the consequences.
The effort, costs and extra work cannot be taken lightly in a small CRO.
But it is definitely worth the effort.
In moments of doubt, it’s good to remember that nothing big comes easy.
R is and will be our friend. Even if a demanding one 🙂
THANK YOU
This is just the beginning…

 
Data science tools of the trade
Data science tools of the tradeData science tools of the trade
Data science tools of the trade
 
A simple test paper from Chen
A simple test paper from ChenA simple test paper from Chen
A simple test paper from Chen
 
Chen's second test slides
Chen's second test slidesChen's second test slides
Chen's second test slides
 
A simple test paper from Chen
A simple test paper from ChenA simple test paper from Chen
A simple test paper from Chen
 
A simple test paper from Chen
A simple test paper from ChenA simple test paper from Chen
A simple test paper from Chen
 
Chen's second test slides again
Chen's second test slides againChen's second test slides again
Chen's second test slides again
 
Computer Tools for Academic Research
Computer Tools for Academic ResearchComputer Tools for Academic Research
Computer Tools for Academic Research
 
Big Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil GamesBig Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil Games
 
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
 
Programming of c++
Programming of c++Programming of c++
Programming of c++
 
Managing Software Dependencies and the Supply Chain_ MIT EM.S20.pdf
Managing Software Dependencies and the Supply Chain_ MIT EM.S20.pdfManaging Software Dependencies and the Supply Chain_ MIT EM.S20.pdf
Managing Software Dependencies and the Supply Chain_ MIT EM.S20.pdf
 
Automation and machine learning in the enterprise
Automation and machine learning in the enterpriseAutomation and machine learning in the enterprise
Automation and machine learning in the enterprise
 
Printing without printers
Printing without printersPrinting without printers
Printing without printers
 
“Adoption DSpace 7 and 8 Challenges and Solutions from Real Migration Experie...
“Adoption DSpace 7 and 8 Challenges and Solutions from Real Migration Experie...“Adoption DSpace 7 and 8 Challenges and Solutions from Real Migration Experie...
“Adoption DSpace 7 and 8 Challenges and Solutions from Real Migration Experie...
 

More from Adrian Olszewski

Logistic regression vs. logistic classifier. History of the confusion and the...
Logistic regression vs. logistic classifier. History of the confusion and the...Logistic regression vs. logistic classifier. History of the confusion and the...
Logistic regression vs. logistic classifier. History of the confusion and the...Adrian Olszewski
 
Logistic regression - one of the key regression tools in experimental research
Logistic regression - one of the key regression tools in experimental researchLogistic regression - one of the key regression tools in experimental research
Logistic regression - one of the key regression tools in experimental researchAdrian Olszewski
 
Meet a 100% R-based CRO - The summary of a 5-year journey
Meet a 100% R-based CRO - The summary of a 5-year journeyMeet a 100% R-based CRO - The summary of a 5-year journey
Meet a 100% R-based CRO - The summary of a 5-year journeyAdrian Olszewski
 
Why are data transformations a bad choice in statistics
Why are data transformations a bad choice in statisticsWhy are data transformations a bad choice in statistics
Why are data transformations a bad choice in statisticsAdrian Olszewski
 
Modern statistical techniques
Modern statistical techniquesModern statistical techniques
Modern statistical techniquesAdrian Olszewski
 
Dealing with outliers in Clinical Research
Dealing with outliers in Clinical ResearchDealing with outliers in Clinical Research
Dealing with outliers in Clinical ResearchAdrian Olszewski
 
The use of R statistical package in controlled infrastructure. The case of Cl...
The use of R statistical package in controlled infrastructure. The case of Cl...The use of R statistical package in controlled infrastructure. The case of Cl...
The use of R statistical package in controlled infrastructure. The case of Cl...Adrian Olszewski
 
Rcommander - a menu-driven GUI for R
Rcommander - a menu-driven GUI for RRcommander - a menu-driven GUI for R
Rcommander - a menu-driven GUI for RAdrian Olszewski
 
GNU R in Clinical Research and Evidence-Based Medicine
GNU R in Clinical Research and Evidence-Based MedicineGNU R in Clinical Research and Evidence-Based Medicine
GNU R in Clinical Research and Evidence-Based MedicineAdrian Olszewski
 

More from Adrian Olszewski (10)

Logistic regression vs. logistic classifier. History of the confusion and the...
Logistic regression vs. logistic classifier. History of the confusion and the...Logistic regression vs. logistic classifier. History of the confusion and the...
Logistic regression vs. logistic classifier. History of the confusion and the...
 
Logistic regression - one of the key regression tools in experimental research
Logistic regression - one of the key regression tools in experimental researchLogistic regression - one of the key regression tools in experimental research
Logistic regression - one of the key regression tools in experimental research
 
Meet a 100% R-based CRO - The summary of a 5-year journey
Meet a 100% R-based CRO - The summary of a 5-year journeyMeet a 100% R-based CRO - The summary of a 5-year journey
Meet a 100% R-based CRO - The summary of a 5-year journey
 
Flextable and Officer
Flextable and OfficerFlextable and Officer
Flextable and Officer
 
Why are data transformations a bad choice in statistics
Why are data transformations a bad choice in statisticsWhy are data transformations a bad choice in statistics
Why are data transformations a bad choice in statistics
 
Modern statistical techniques
Modern statistical techniquesModern statistical techniques
Modern statistical techniques
 
Dealing with outliers in Clinical Research
Dealing with outliers in Clinical ResearchDealing with outliers in Clinical Research
Dealing with outliers in Clinical Research
 
The use of R statistical package in controlled infrastructure. The case of Cl...
The use of R statistical package in controlled infrastructure. The case of Cl...The use of R statistical package in controlled infrastructure. The case of Cl...
The use of R statistical package in controlled infrastructure. The case of Cl...
 
Rcommander - a menu-driven GUI for R
Rcommander - a menu-driven GUI for RRcommander - a menu-driven GUI for R
Rcommander - a menu-driven GUI for R
 
GNU R in Clinical Research and Evidence-Based Medicine
GNU R in Clinical Research and Evidence-Based MedicineGNU R in Clinical Research and Evidence-Based Medicine
GNU R in Clinical Research and Evidence-Based Medicine
 

Recently uploaded

Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsAndrey Dotsenko
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 

Recently uploaded (20)

Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 

Meet a 100% R-based CRO. The summary of a 5-year journey

  • 1. Meet a 100% R-based CRO The summary of a 5-year journey Adrian Olszewski Principal Biostatistician at 2KMM CRO The R/Pharma 2022 Conference, Nov 10th 2022 10min www.2kmm.eu aolszewski@2kmm.pl
  • 2. Disclaimer
    This presentation shows the biostatistician’s perspective first.
      Lots of exploratory research, involving tens of statistical tests, complex survival models and non-parametric methods. Here producing TFLs is important but secondary. I need the NUMBERS first to populate them.
    It’s not to „blame” or „unfairly criticize”.
      My job is to analyze the trial’s data on time and within the budget. If something does not work, so that I cannot meet the deadline, fixing things exceeds the budget, and the situation seems hopeless – there’s no time for sentiments, ideology, and hiding problems. I’m going to be held accountable for the effects of my work, not for my „love for the tool”.
    „You get what you pay for. It’s free software. Stop complaining.”
      The fact that something is „for free” does not mean it cannot be improved. The first step is to admit that problems exist, diagnose them and be honest. It takes a sober assessment of the situation to counteract effectively.
    So why do you „waste your time”? Buy XYZ® and be happy.
      I really want to make things better. I ❤️ R. If I did not, I’d have abandoned it in 2000. Things get better, but they won’t „fix themselves” magically. Relationships can be tough, but that's no reason to give up! Besides, it’s fun!
  • 3. Introduction ► Who we are ⦿ The 2KMM - a small Polish CRO with a global reach. ⦿ 100% R-based: trial design • DM • datasets • analysis & research • TFL • documents • consulting • tutoring • making tools ⦿ 28 projects: RCTs + observational studies (in several therapeutic areas) lots of ad hoc research
  • 4. Introduction ► Who we are
    Our specifics:
      • No CDISC yet; data sources based on SQL views
      • Lots of planned exploratory analyses with complex scenarios
      • Sometimes asked to use dinosaur tools vs. the freshest widely used methods
      • Being a CRO, we have less decision power than a big pharmaceutical company:
        o a Sponsor may have its own vision and require us to follow it
        o our proposals may be questioned (sometimes without discussion)
      • Widely varying requests from different sponsors:
        o make tables like X, make tables like Y
        o use this format, use that format
        o we prefer X, we HATE X; ABC is important vs. ABC is negligible vs. please decide
    It’s difficult to work out a common approach, workflow, template.
  • 5. Introduction ► History
    ⦿ When we started, a few questions had to be answered:
      • Can we rely on R entirely? Will it suffice? Everyone around uses SAS.
      • What are the hidden costs of using open source (no free lunch)?
      • Can we trust R? How to validate it?
      • What packages do we need to start? Collection of requirements.
      • How to organize the working environment (SOPs, technical aspects)?
    In general – we were rather optimistic in 2018.
  • 6. Introduction ► Opinions
    ⦿ After 5 years we have some opinions:
      • Did R suffice to complete our work? Mostly…
      • Could we just „launch R and focus on the work”? Partially…
      • Could we trust R on faith? Did we fail? No. / Painfully.
      • What are the hidden costs of using open source? Non-negligible.
      • How many packages did we end up with? 230+
      • Describe the experience briefly? Annoyance, determination, fixing stuff, reporting issues, researching, satisfaction.
      • Are we happy with R? Will we stay with it? It’s a tough ♥ / Yes.
      • Why? It’s flexible. It’s getting better. It’s worth it. We learned „HOW”.
  • 7. Introduction ► Costs
    It is not true that using free software does not cost a penny. It costs time that could otherwise go into the analysis, spent on:
      • Collecting the library of necessary tools. That’s not easy; I will show why.
      • Validating the selected tools (making sure 2+2=4).
      • Realizing that an important package fails or is gone (hello, CRAN!):
        o contacting the authors or the entire group, reporting issues on GitHub
        o searching for a replacement (+ validation), which may lack features
        o if there is no response, researching the problem on your own
      • Paying for external consultancy, books, pay-walled articles to move on.
  • 8. Introduction ► Costs
    How much did it cost? An equivalent of a few 1-year licenses of a „good commercial software”.
    Wait, what!? So where are the savings then?!
      1. The cost is distributed over time (a year, say).
      2. Such a big cost is rather one-off, at the beginning of the process. Occasional costs will occur, though (new versions, „retired packages”).
      3. You get what you need (mostly), not what others decide you need.
      4. Once done, it can be reused an unlimited number of times (no per-user licenses).
      5. You have better control over what you have, because you are the one who made it.
      6. You get the code, so there is at least a small chance to fix things with your own hands.
      7. As long as your repository (library) is validated and frozen, you sleep well.
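    One way to make the "validated and frozen" point on this slide checkable is to keep a manifest of package versions and compare the current library against it. The sketch below is only an illustration of that idea, not the workflow used at 2KMM: the helper names `snapshot_library()` and `compare_to_manifest()` are hypothetical, and the version numbers in the demo are made up.

```r
# Record package names and versions found in one library folder.
# Typical use (not run here): saveRDS(snapshot_library(), "frozen_manifest.rds")
snapshot_library <- function(lib = .libPaths()[1]) {
  ip <- installed.packages(lib.loc = lib)
  data.frame(package = unname(ip[, "Package"]),
             version = unname(ip[, "Version"]),
             stringsAsFactors = FALSE)
}

# Return the packages whose current version differs from the frozen manifest,
# or which are missing from the current library altogether.
compare_to_manifest <- function(manifest, current) {
  merged <- merge(manifest, current, by = "package", all.x = TRUE,
                  suffixes = c("_frozen", "_current"))
  changed <- is.na(merged$version_current) |
    merged$version_frozen != merged$version_current
  merged[changed, , drop = FALSE]
}

# Demo with invented version numbers (assumption, for illustration only)
frozen  <- data.frame(package = c("survival", "emmeans"),
                      version = c("3.2-13", "1.7.0"),
                      stringsAsFactors = FALSE)
current <- data.frame(package = c("survival", "emmeans"),
                      version = c("3.2-13", "1.8.2"),
                      stringsAsFactors = FALSE)

drift <- compare_to_manifest(frozen, current)
print(drift)
```

    An empty result would mean the library still matches the frozen state; any row returned is a package that would need re-validation.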
  • 9. Introduction ► Costs
    “Oh, c’mon. You have all the code! It’s open source! Why don’t you just fix the problems and go back to work? What’s the problem? I think you exaggerate!”
    Resources (staff + time + money) allocated to “employ the Open Source”:
      Big company: 15 specialists, X$
      Small company: 2 specialists, Y$, Y << X
  • 10. Introduction ► Step 1: organization of work; technical infrastructure + SOPs
    [Diagram labels: projects, R 3.x, VHDX container, VPN, wild, validated, SOP]
  • 11. Introduction ► Step 1: organization of work; portable R
    https://sourceforge.net/projects/rportable/
      • Allows one to test new stuff and mix different versions of the R core in a single analysis.
      • Easy: no installation (VMs, containers), no extra packages / dependencies / setups.
      • No elevated rights needed.
      • Regular directories: easy management!
      • When „matured”, can be packed into a VHDX container.
      • Easily selectable as the current engine in RStudio.
  • 12. Introduction ► Step 1: organization of work ; portable R
  • 13. Introduction ► Step 1: organization of work; portable R
      • This allows us to mix not only packages in different versions (with all necessary dependencies) in a single analysis, but also versions of the R core itself, when a certain package needs a higher/lower version of the R core.
      • Combines RPortable + rscript.exe + a naming convention for [input data]-[output results] files.
      • Each version-dependent piece of code knows where to read the data from and where to store the results.
      • Fully isolated code. Data is exchanged via regular R objects (RDS or feather).
    Warning in install.packages : package ‘emmeans’ is not available (for R version 3.6.3)
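    The mechanism on this slide (a separate R process per version-dependent step, with data handed over as RDS files) could look roughly like the sketch below. It is a minimal illustration under stated assumptions: the helper `run_step()`, the file names and the toy step are all hypothetical, and the current installation's own Rscript stands in for a portable R of another version so that the example runs anywhere.

```r
# Hypothetical helper: run one version-dependent step in a separate R process.
# `rscript` would normally point at a specific portable R installation
# (e.g. some ".../bin/Rscript.exe"); that path is not given in the talk.
run_step <- function(rscript, script, input_rds, output_rds) {
  status <- system2(rscript,
                    args = c("--vanilla", shQuote(script),
                             shQuote(input_rds), shQuote(output_rds)))
  if (status != 0) stop("Step failed: ", script)
  readRDS(output_rds)
}

# Stand-in for a portable installation: the Rscript of the running R
exe     <- if (.Platform$OS.type == "windows") "Rscript.exe" else "Rscript"
rscript <- file.path(R.home("bin"), exe)

# Naming convention (assumed for the demo): <step>-input.rds / <step>-output.rds
work_dir   <- tempfile("mixed_r_")
dir.create(work_dir)
input_rds  <- file.path(work_dir, "step1-input.rds")
output_rds <- file.path(work_dir, "step1-output.rds")
script     <- file.path(work_dir, "step1.R")

# The isolated step: reads its input, computes, stores its result
writeLines(c(
  "args <- commandArgs(trailingOnly = TRUE)",
  "dat  <- readRDS(args[1])",
  "res  <- list(r_version = as.character(getRversion()), mean_x = mean(dat$x))",
  "saveRDS(res, args[2])"
), script)

saveRDS(data.frame(x = c(1, 2, 3, 4)), input_rds)
result <- run_step(rscript, script, input_rds, output_rds)
print(result)
```

    Because the step only sees its input file and writes its output file, the calling session never has to load the package (or R version) the step depends on.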
  • 14. Introduction ► Step 2: Simple, automated workflow
  • 15. Introduction ► Step 2: Simple, automated workflow
  • 16. Introduction ► Step 2: Simple, automated template-based workflow
    DOCx template
      - Headers, footers
      - Styles
      - Content placeholders (definitions)
    Rmarkdown „manager”
      - Reads the DOCx template for TODOs
      - Does the „TODOs”
      - Replaces „TODOs” with TFLs
      - Becomes the HTML LOG
    DOCx report
      - Headers, footers preserved
      - Styles utilized
      - Placeholders hold actual T/F/Ls
    HTML log
      - All R commands
      - All messages
      - All (simplified) results
  • 17. Introduction ► Step 2: Simple, automated template-based workflow
    RMarkdown file
      - Creates the environment
      - Reads the DOCx template
      - Loads the Word parsing „engine”
      - The engine:
        - iterates through definitions of placeholders
        - parses the fields
        - loads the R files per convention
        - executes the code
        - replaces placeholders with actual TFLs
      - Auto-updates (appends) the HTML to LOG
    [Diagram labels: DOCx template with placeholder definitions, DOCx reading engine]
  • 18. Introduction ► Step 2: Simple, automated template-based workflow
    The RMarkdown file itself (steps as listed on slide 17):

    ## Preparing the objects storing the content of the report in both MS Word and MS Excel formats
    ```{r}
    # read_docx() and docx_summary() come from officer;
    # createWorkbook() and saveWorkbook() come from openxlsx
    word_report_document_name  <- paste0(target_report_document_name, ".docx")
    excel_report_document_name <- paste0(target_report_document_name, ".xlsx")
    word_report_template_name  <- paste0(target_report_document_name, "_template.docx")
    doc_report  <- read_docx(word_report_document_name)
    doc_content <- docx_summary(doc_report)
    xls_report  <- createWorkbook()
    ```
    # Data analysis
    ```{r child="rendering_engine.rmd", echo=TRUE, results='asis'}
    ```
    ```{r}
    print(doc_report, target = word_report_document_name)
    saveWorkbook(wb = xls_report, file = excel_report_document_name, overwrite = TRUE)
    ```
  • 19. Introduction ► Step 2: Simple, automated template-based workflow
    The DOCx reading engine (excerpt; the RMarkdown file is the same as on slide 18):

    ```{r}
    # Collect the placeholder paragraphs, i.e. those starting with "[Table]"
    table_defs <- subset(doc_content, grepl("^\\[Table\\]", doc_content$text), text)
    table_defs <- gsub("\\[Table\\] ", "", table_defs$text)

    for (def in table_defs) {
      # Fields are separated by "@"; the element before the first "@" is dropped
      split_defs <- strsplit(def, "@")[[1]][-1]

      table_title     <- trimws(gsub("title:(.*)", "\\1", split_defs[grep("^title", split_defs)]))
      table_number    <- trimws(gsub("table_num:(.*)", "\\1", split_defs[grep("^table_num", split_defs)]))
      force_table_num <- trimws(gsub("force_table_num:(.*)", "\\1", split_defs[grep("^force_table_num", split_defs)]))
      table_sufix     <- trimws(gsub("table_sufix:(.*)", "\\1", split_defs[grep("^table_sufix", split_defs)]))
      r_file          <- trimws(gsub("r_code:(.*)", "\\1", split_defs[grep("^r_code", split_defs)]))
      r_prn_file      <- trimws(gsub("r_printer_code:(.*)", "\\1", split_defs[grep("^r_printer_code", split_defs)]))
      exclude         <- trimws(gsub("exclude:(.*)", "\\1", split_defs[grep("^exclude", split_defs)]))

      table_title <- iconv(table_title, from = "UTF-8", to = "UTF-8")
      exclude <- ifelse(identical(exclude, character(0)), FALSE, as.logical(exclude))
      ……………………………………………
      # Fall back to the naming convention when no R file is given explicitly
      if (identical(r_file, character(0)) || r_file == "") {
        r_file <- paste0("Table", table_number, table_sufix, ".r")
      }
      ……………………………………………
      r_file <- file.path(r_code_location, r_file)
      # Wrap the table's R file in a chunk and knit it as a child document
      chunk <- c(paste("#### Table ", paste0(table_number, table_sufix), "-", table_title, "\n"),
                 paste("```{r ", r_file, "}\n"),
                 readLines(r_file),
                 "```\n")
      cat(knit_child(text = chunk, quiet = TRUE), sep = '\n')
      ……………………………………………
    }
    ```
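    The field parsing done by the engine on slide 19 can be tried in isolation. The sketch below is a simplified stand-in, not the author's code: the helper `parse_table_def()` is hypothetical, the "[Table] @key: value @key: value" layout is inferred from the engine excerpt, and the sample definition text is invented.

```r
# Parse one placeholder definition into a named list of fields.
# Assumes values never contain "@" (the field separator).
parse_table_def <- function(def) {
  body   <- sub("^\\[Table\\]\\s*", "", def)            # drop the "[Table]" marker
  fields <- strsplit(body, "@", fixed = TRUE)[[1]][-1]  # text before first "@" is dropped
  keys   <- trimws(sub(":.*$", "", fields))             # part before the first ":"
  values <- trimws(sub("^[^:]*:", "", fields))          # part after the first ":"
  as.list(setNames(values, keys))
}

# Invented example of a placeholder paragraph in the DOCx template
def    <- "[Table] @title: Demographics by arm @table_num: 01 @table_sufix: a @exclude: FALSE"
parsed <- parse_table_def(def)

# Same fallback as in the engine excerpt: derive the R file name from the
# table number and suffix when no r_code field is present
r_file <- if (is.null(parsed$r_code) || parsed$r_code == "") {
  paste0("Table", parsed$table_num, parsed$table_sufix, ".r")
} else {
  parsed$r_code
}

str(parsed)
print(r_file)
```

    Returning all fields as one named list avoids repeating a `gsub()`/`grep()` pair per field, at the cost of accepting any key name without checking it.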
  • 20. Introduction ► Step 2: Simple, automated template-based workflow
    Regular R files. Naming convention. Triplet per table. Suffix:
      _data   reads data from RDATA / DBI / XML / CSV / XLSX
      _an     performs the analyses; stores results in RDATA
      _print  reads the RDATA, generates DOCx tables, XLSx files, EMF graphs and HTML output for the LOG
    Example: Table_01_data.r, Table_01_an.r, Table_01_print.r
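    The triplet convention on slide 20 can be sketched as a small runner that builds the three file names for a table and sources them in order. This is an illustration only: `table_scripts()` and `run_table()` are hypothetical helpers, the toy scripts are invented, and passing results between stages through `.RData` files follows the slide's description rather than any code shown in the talk.

```r
# Build the three script paths for one table, following the slide's example
# names (Table_01_data.r, Table_01_an.r, Table_01_print.r).
table_scripts <- function(table_id, code_dir = ".") {
  steps <- c("data", "an", "print")
  file.path(code_dir, paste0("Table_", table_id, "_", steps, ".r"))
}

# Run the triplet in order; each script gets a fresh environment, so the
# stages can only communicate through the files they save and load.
run_table <- function(table_id, code_dir = ".") {
  scripts <- table_scripts(table_id, code_dir)
  missing <- scripts[!file.exists(scripts)]
  if (length(missing) > 0) {
    stop("Missing script(s): ", paste(basename(missing), collapse = ", "))
  }
  for (script in scripts) {
    # chdir = TRUE lets the scripts use paths relative to their own folder
    source(script, local = new.env(), chdir = TRUE)
  }
  invisible(scripts)
}

# Toy triplet written to a temporary folder (contents invented for the demo)
code_dir <- tempfile("tables_")
dir.create(code_dir)
writeLines('dat <- data.frame(x = 1:4); save(dat, file = "Table_01_data.RData")',
           file.path(code_dir, "Table_01_data.r"))
writeLines('load("Table_01_data.RData"); res <- mean(dat$x); save(res, file = "Table_01_an.RData")',
           file.path(code_dir, "Table_01_an.r"))
writeLines('load("Table_01_an.RData"); print(res)',
           file.path(code_dir, "Table_01_print.r"))

run_table("01", code_dir)
```

    Keeping the stages in separate environments means the printing stage can be re-run alone (for a layout change, say) as long as the analysis stage's `.RData` file is still on disk.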
  • 21. Introduction ► Step 2: Simple, automated template-based workflow
    [Diagram labels: DOCx template with placeholder definitions, R code, DOCx report, HTML log]
  • 22. Introduction ► Step 2: Simple, automated template-based workflow
  • 23. Introduction ► Step 3: defining tasks → finding tools → making a library
      • Modelling, longitudinal analysis
      • Inference (testing, CIs, MCP)
      • Summaries
      • Effect size
      • Advanced survival
      • Making complex tables
      • Dose – Response
      • PK, PD, DF
      • Questionnaires
      • Generating documents (DOCx, RTF, PDF)
      • Documenting (log) the analysis
      • Data I/O
      • Technical / Programming
      • Trial design & simulation
      • Plotting
      • Randomization
      • Data manipulation
      • Meta-analysis
      • CDISC-related
      • Missing data – patterns and imputation
      • Model diagnostics
  • 24. Introduction ► Step 3: defining tasks → finding tools → making a library
    Same task groups as on slide 23, with example packages shown for three of them:
      Trial design & simulation: CRTSize, faux, gsDesign, ldbounds, MAMS, Mediana, PowerTOST, pwr, RCTdesign, RPACT, samplesizeCMH, simstudy, SSRMST, ThreeArmedTrials, TrialSize
      Modelling, longitudinal analysis: CRTgeeDR, drgee, gee, geeasy, geepack, geesmv, GLMMadaptive, glmmTMB, glmtoolbox, ipw, lavaSearch2, lme4, lmerTest, lqmm, MASS, mmmgee, multgee, MuMIn, nlme, ordinal, QRLMM, repolr, rms, robustlmm, sasLM, simplexreg, wgeesel, lqr, gam, quantreg…
      Advanced survival: bshazard, cmprsk, ComparisonSurv, controlTest, coxphw, CoxR2, FHtest, frailtypack, frailtysurv, landest, maxcombo, mstate, muhaz, nph, npsurvSS, pammtools, reda, reReg, RMST, surv2sampleComp, survival, Survmisc, survRM2
  • 25. Introduction ► Step 3: defining tasks → finding tools → making a library. ARTool, Asbio, BaylorEdPsych, bear, betareg, bindrcpp, binom, biostatUZH, blockrand, boot, broom, bshazard, car, clinPK, clubSandwich, cmprsk, coin, ComparisonSurv, compute.es, confintr, conflicted, contrast, controlTest, correlation, coxphw, CoxR2, CRTgeeDR, CRTSize, cvcqv, dabestr, DataEditR, DBI, DescTools, devEMF, diffdf, DoseFinding, dplyr, drgee, dunn.test, e1071, effectsize, effsize, emmeans, epiR, equatiomatic, faux, FHtest, fitdistrplus, flextable, forcats, foreign, forestmodel, frailtypack, frailtysurv, gam, gee, geeasy, geepack, geesmv, GFD, ggalluvial, GGally, ggeffects, gghalves, ggmosaic, ggplot2, ggpol, ggrepel, ggridges, ggside, ggsignif, ggstance, ggtext, GLMMadaptive, glmmTMB, glmnet, glmtoolbox, glue, gmodels, gridExtra, gsDesign, haven, Hmisc, InformativeCensoring, interactions, ipw, irr, Kendall, knitr, kSamples, landest, latex2exp, lavaSearch2, lawstat, ldbounds, likert, lme4, lmerTest, lmPerm, logR, logspline, lqmm, lqr, lubridate, magrittr, MAMS, MarginalEffects, margins, MASS, maxcombo, MCPMod, mcr, Mediana, meta, metafor, metaviz, mice, Minirand, MissMech, misty, Mkinfer, mmmgee, modelbased, mstate, muhaz, multcomp, multgee, multxpert, MuMIn, MVN, mvnormalTest, mvtnorm, naniar, nlme, nortest, nparcomp, nparLD, nph, npsurvSS, officer, onlineFDR, openxlsx, ordinal, PairedData, pammtools, patchwork, pbkrtest, PearsonDS, permuco, PK, PKconverter, PKfit, PKPDmodels, pkr, PMCMRplus, polycor, PowerTOST, PropCIs, Publish, purrr, pwr, qqplotr, QRLMM, Qtools, quantreg, r2rtf, randomizeR, rankFD, ratesci, Rcmdr, rcompanion, RCTdesign, readr, reda, repolr, reReg, rlang, Rmarkdown, rms, RMST, robustbase, robustlmm, RODBC, RPACT, rstatix, RVAideMemoire, rvg, samplesizeCMH, sandwich, sasLM, SASxport, scales, simplexreg, simstudy, sqldf, SSRMST, statsExpressions, stringr, summarytools, SuppDist, surv2sampleComp, survival, Survmisc, survRM2, testthat, ThreeArmedTrials, tidyquery, tidyr, tidytext, TOSTER, trend, TrialSize, UpSetR, VGAM, VIM, wgeesel, WRS2, xml2, mitml, jomo, psych, SimplyAgree, mmrm, ggh4x, ggformula, DescrTab2. 230 so far, 200 in daily use.
  • 26. Introduction ► Step 3: defining tasks → finding tools → Hall of fame! (incomplete!) nlme::gls() for MMRM; emmeans; geepack, geeM, geesmv, wgeesel, CRTgeeDR, multgee, repolr, ipw; Officeverse (officer, flextable, rvg); SASxport; r2rtf; glmmTMB; Tidyverse; nparLD, GFD, rankFD, ARTool, nparcomp, WRS2; Mediana, gsDesign, RCTdesign, rpact, MAMS; survival, cmprsk, nph, reda, reReg, frailtypack, survRM2; RMarkdown; broom; boot; sandwich, clubSandwich; DBI, RODBC; effectsize; margins, MarginalEffects; sasLM.
  • 29. Introduction ► So many sources of packages! Package sources: GitHub, CRAN, the CRAN archive, R-Forge, external sites (PKfit), Bioconductor. • Versions may differ • Different ways of reporting issues
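With packages coming from several sources, it can help to keep an inventory of what is installed and where it came from. The following is a minimal sketch in base R, not the workflow used by the authors; the `source` classification is an assumption made for illustration, and the `RemoteType` field is only present for packages installed by tools that record it in the DESCRIPTION file.

```r
# Ask for extra DESCRIPTION fields; fields a package lacks come back as NA.
inv <- as.data.frame(
  installed.packages(fields = c("Repository", "RemoteType")),
  stringsAsFactors = FALSE
)

# Keep the columns that matter for a version/source inventory.
inv <- inv[, c("Package", "Version", "Priority", "Repository", "RemoteType")]
rownames(inv) <- NULL

# Rough classification (an assumption of this sketch, not an official scheme):
# base packages first, then the recorded repository, then the remote type.
inv$source <- ifelse(!is.na(inv$Priority) & inv$Priority == "base", "base R",
              ifelse(!is.na(inv$Repository), inv$Repository,
              ifelse(!is.na(inv$RemoteType), inv$RemoteType, "unknown")))

print(table(inv$source))
```

Saving such a table alongside each analysis makes it easier to tell later which version of a package, from which source, produced a given result.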
  • 31. Challenges ► Numerical validation https://www.researchgate.net/publication/345778861_Numerical_validation_as_a_critical_aspect_in_bringing_R_to_the_Clinical_Research
  • 32. Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R. As long as somebody uses just the basic tools, problems may never occur. And this scope may be sufficient for quite a lot of scenarios! • “group by” summaries with N, %, mean, median, SD, Q1, Q3, min, max… • aov() • kruskal.test(), wilcox.test(), t.test()… • lmer(post_value ~ treatment * time + baseline + baseline:time + (1|PatID)) • plot(survfit(Surv(time, status) ~ treatment)) BTW, have you looked at median()? Is it equal to quantile()["50%"]? Always?
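The closing question about median() and quantile() can be explored with a short sketch. R's quantile() implements nine definitions selected through its `type` argument (the default is type 7), and other software may default to a different one, so even "basic" summaries such as Q1 or the median can disagree. The data below are made up for illustration.

```r
# Four observations: an even n makes the differences between definitions visible.
x <- c(1, 2, 3, 4)

print(median(x))                   # mean of the two middle values
print(unname(quantile(x, 0.5)))    # default definition, type = 7

# The nine definitions implemented in quantile(), applied to the median and Q1.
types <- 1:9
med <- sapply(types, function(k) unname(quantile(x, 0.50, type = k)))
q1  <- sapply(types, function(k) unname(quantile(x, 0.25, type = k)))
print(data.frame(type = types, median = med, Q1 = q1))

# Results that agree mathematically may still differ in the last bits,
# so compare with a tolerance rather than with `==`.
same <- isTRUE(all.equal(median(x), unname(quantile(x, 0.5))))
print(same)
```

When a number differs from another tool's output, checking which quantile definition each side used is a cheap first step before suspecting anything more serious.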
  • 34. Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R. Like it or not, the fact is that SAS® IS the industry standard in clinical trials, and people will use it to re-create your analyses and will NATURALLY ask why if the numbers don’t agree. Who may check your results with SAS®: a regulatory agency, a journal, the sponsor-side biostat team, your colleague, a validator. - It’s not about a “crusade”: “R is better! No! SAS® is better! No! Excel is better!” - It’s not about favoring anyone (“you think it’s better because it’s expensive!?”) - It’s about the reality.
  • 35. Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R. If they ask you about the discrepancies, you can: 1. ignore it (can you?), 2. say “I don’t know, it just happened, but R is right!”, 3. investigate it and respond either (a) both are right, the approaches just differ 🤷, or (b) well, R is wrong, I’m going to fix it or message the authors. But to respond, you need to know what happened. A much worse situation: NOBODY found a difference, and you have just published results with errors.
  • 36. Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R. We do not care whether a package has “good marketing”. It must work well. Has vignettes! Has an active community! 5 in rankings. YouTube tutorials. Top popular download on GitHub. Has unfixed errors that nobody cares about…
  • 37. Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R • nlme: Priority: recommended; linear mixed models with almost all the stuff SAS has • MCPMod: design and analysis of dose-finding studies • PMCMRplus: lots of popular non-parametric stuff, dose-response findings (Williams) • MASS: Priority: recommended; lots of stuff, including glmmPQL! • boot: Priority: recommended • nparcomp: lots of non-parametric methods • frailtySurv: shared frailty models • rms: strategies for regression modeling by Prof. Harrell • geesmv: small-sample Morel’s correction for the GEE sandwich SEs • ipw: inverse-probability weighting, for GEE under MAR • multxpert: common multiple testing procedures and gatekeeping procedures by Prof. Dmitrienko • PropCIs: a must-have, CIs for proportions • PKfit: one of the most important tools for PK; not even on CRAN • bear: one of the best available tools for PK; not even on CRAN • cmprsk: survival with competing risks. These packages have no marketing. Would you exclude them from your toolkit?
  • 38. Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7327187/
  • 39. Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R. And the real problem is that R disagrees not only with SAS, but even… with itself.
  • 40. Challenges ► Numerical validation – R vs. SAS®, vs. SPSS®, vs. Stata®, vs… R. After failing several times, we finally decided to validate as much as possible. This consumes a lot of time and effort, but it lets us sleep better.
    Package | Function | Version | Dataset | Test | Completed | Soft1 | Soft2 | Soft3 | Discrep. | Decision | Justif.
    pkg1 | fn1() | 0.6.2 | Trial 1 | #23 | OK | OK | OK | FAILED | …………. | OK | ………
    pkg1 | fn1() | 0.6.2 | Journal 2 | #23 | OK | OK | FAILED | OK | ………… | FAILED | ………
    pkg1 | fn1() | 0.6.2 | Journal 2 | #24 | OK | OK | OK | OK | ………… | OK | ………
    pkg1 | fn2() | 0.6.2 | Journal 2 | #25 | OK | FAILED | FAILED | FAILED | ………… | FAILED | ………
    Validation references: reference software; textbook formulas (by hand); other trusted package; published results (journals/books); published results (manuals); code inspection.
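One of the reference types listed above, "textbook formulas by hand", can be sketched as follows. This is only an illustration under stated assumptions, not the validation framework described on the slide: the data are made up, and `welch_by_hand()` and `validate()` are hypothetical helper names. The check compares stats::t.test() (Welch by default) with the textbook Welch formulas and reports OK or FAILED within a tolerance.

```r
# Hypothetical example data; in practice this would be a trial or published dataset.
x <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5)
y <- c(4.2, 4.8, 5.0, 4.4, 5.1, 4.6, 4.9)

# Reference result "by hand", following the textbook Welch formulas.
welch_by_hand <- function(x, y) {
  vx <- var(x) / length(x)
  vy <- var(y) / length(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  c(statistic = t_stat, df = df, p.value = 2 * pt(-abs(t_stat), df))
}

# Compare a function's output with the reference, within a tolerance.
validate <- function(actual, reference, tolerance = 1e-8) {
  ok <- isTRUE(all.equal(unname(actual), unname(reference), tolerance = tolerance))
  if (ok) "OK" else "FAILED"
}

fit <- t.test(x, y)  # Welch test is the default (var.equal = FALSE)
actual <- c(fit$statistic, fit$parameter, fit$p.value)
result <- validate(actual, welch_by_hand(x, y))
print(result)
```

Each such check would fill one cell of a validation table like the one above, keyed by package, function, version, dataset, and test number.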
  • 41. Challenges ► Ocean of possibilities. But be careful! It’s deep! Open Source gives you an ocean of possibilities (for doing THE SAME thing)! OK! Diversity is good overall, but it should not be overdone! Let’s imagine I want procedure ABC. R has 10 functions in 5 packages to do ABC in 8 ways. My day has only 24 hours, and I have my work and a life after hours.
  • 42. Challenges ► Documentation. Documentation quality varies a lot: from dedicated web-books with numerous examples ( https://ardata-fr.github.io/flextable-book/ ) to just a raw manual with no formulas and only references to a paid article or a rare book. SAS, NCSS, SPSS, Stata have awesome tutorials and manuals, almost courses in statistics. NCSS even gives the input data and results! [Scale: from “Just a basic manual” to “You can do a PhD with it!”]
  • 43. Challenges ► Why cannot things be simple? (Statistics UX) SAS®: PROC MIXED EMPIRICAL … REPEATED … CS … KR … LSMEANS. R: Kenward-Roger? … a-ha! Use lme4! But wait, I want a marginal model with CS. Random-intercept ≠ CS for negative within-subject correlations! I could use glmmTMB for this, but pbkrtest doesn’t support it. But there’s nlme! Take nlme::gls(). But pbkrtest doesn’t support nlme::gls(). OK then, let’s use Satterthwaite! OK. nlme::gls() + emmeans (for LS-means + Satterthwaite). Now I want the robust HC0 (“sandwich”) estimator. Get clubSandwich and use emmeans to provide the adjusted Var-Cov. Follow it with emmeans::joint_tests(). Double-check the DF, as car::Anova() may have a problem here. Done! Sigh! … Did you check GitHub to see whether there are any open issues?
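The point that a random intercept is not the same as compound symmetry (CS) can be illustrated with a small sketch. A random-intercept model implies a within-subject correlation of σ_b² / (σ_b² + σ²), which cannot be negative, whereas a marginal model fitted with nlme::gls() and corCompSymm() can estimate a negative common correlation. The data below are simulated with an assumed true correlation of -0.4; the later steps from the slide (emmeans, clubSandwich) are deliberately not shown here.

```r
library(nlme)  # a "recommended" package shipped with standard R installations

set.seed(1)
n   <- 200    # subjects, two visits each (simulated, hypothetical data)
rho <- -0.4   # true within-subject correlation, deliberately negative
z1  <- rnorm(n)
z2  <- rnorm(n)
d <- data.frame(
  id   = factor(rep(seq_len(n), each = 2)),
  time = factor(rep(c("t1", "t2"), times = n)),
  # interleave visit 1 and visit 2 so rows are ordered by subject
  y    = as.vector(rbind(z1, rho * z1 + sqrt(1 - rho^2) * z2))
)

# Marginal model with a compound-symmetry residual correlation structure.
fit_cs <- gls(y ~ time, data = d,
              correlation = corCompSymm(form = ~ 1 | id))

# Estimated common correlation on its natural scale.
rho_hat <- coef(fit_cs$modelStruct$corStruct, unconstrained = FALSE)
print(rho_hat)
```

A random-intercept fit of the same data could at best push the intercept variance towards zero, which is why the slide reaches for a marginal model instead.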
  • 44. Challenges ► Packages removed from CRAN: BaylorEdPsych, cvcqv, MissMech, PKPDmodels, SASxport, mixor, coxinterval, quantreg (restored), brunnermunzel (restored), MomTrunc (restored), tlrmvnmvt (restored), normtest, flow, nlmixr
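Removals like these are easier to handle when they are noticed early. The sketch below checks a watchlist of package names against a repository index; `not_in_repo()` is a hypothetical helper, and the `snapshot` vector is made up so the example runs offline. By default the helper would query the configured repositories through utils::available.packages(), which needs network access.

```r
# Packages named on the slide as removed from CRAN at some point.
watchlist <- c("BaylorEdPsych", "cvcqv", "MissMech", "PKPDmodels", "SASxport",
               "mixor", "coxinterval", "quantreg", "brunnermunzel", "MomTrunc",
               "tlrmvnmvt", "normtest", "flow", "nlmixr")

# Hypothetical helper: which of `pkgs` are absent from the repository index?
# The default argument is only evaluated when `available` is not supplied.
not_in_repo <- function(pkgs, available = rownames(utils::available.packages())) {
  setdiff(pkgs, available)
}

# Offline illustration with a made-up snapshot of repository contents.
snapshot <- c("quantreg", "brunnermunzel", "survival")
missing_pkgs <- not_in_repo(watchlist, available = snapshot)
print(missing_pkgs)
```

Run on a schedule against the real index, a check like this gives some warning before a missing dependency breaks a fresh installation.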
  • 47. Challenges ► Let’s put it all together! [Flowchart nodes: START!; Package removed from CRAN; Search for a replacement; Email the author…; Create a new issue; Learn the new package; What does this thing do?!; Something is wrong!; It works!; FIXED?; What now!?; Another package is needed; It works!; Sorry, I’m busy; No, it doesn’t; Partially managed…]
  • 48. Future plans. We plan: • to research a couple of new tools: for work: MMRM (!); for CDISC: admiral, sassy, definer, metacore; for RTF: rtftables, gt; out of curiosity: tplyr; for technical work: box • to focus on CDISC and on preparing for the first big submission • to extend the numerical validation of packages
  • 49. Overall impression. Employing Open Source means accepting the consequences. The effort, costs, and extra work cannot be taken lightly in a small CRO. But it is definitely worth the effort. In moments of doubt, it’s good to remember that nothing great comes easy. R is and will be our friend, even if a demanding one.
  • 50. THANK YOU. This is just the beginning…