Docopt, beautiful command-line options for R, user2014
Docopt, beautiful command-line options for R
Edwin de Jonge (@edwindjonge), Statistics Netherlands (CBS)
July 2014, UseR!2014
What is docopt?
docopt is a utility R library for parsing command-line options. It is
a port of docopt.py (Python).
How does it work?
You supply a properly formed help description.
docopt creates a fully functional command-line parser from it.
Why Command-line?
R is used more and more:
Ad hoc, interactive analysis, e.g.
R REPL shell
RStudio
interactive data analysis
Creating R libraries with vi, RStudio, etc.
no data analysis
But also for repetitive batch jobs:
Rscript my_script.R arg1 arg2 ...
R -f my_script.R --args arg1 arg2 ...
reproducible data processing
So R is used more and more from the command line, too!
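As a minimal sketch of how both invocations hand their arguments to a script: base R exposes them through commandArgs(trailingOnly = TRUE). The helper name get_script_args and its args parameter are hypothetical, added only so the call can be tried without a real command line.

```r
# Return the arguments that follow the script name on the command line.
# `args` defaults to what R actually received; passing a character vector
# simulates a call. get_script_args is a hypothetical helper name.
get_script_args <- function(args = commandArgs(trailingOnly = TRUE)) {
  as.character(args)
}

# Simulates: Rscript my_script.R arg1 arg2
script_args <- get_script_args(c("arg1", "arg2"))
print(script_args)
```

Everything beyond this raw character vector (switches, values, defaults) still has to be interpreted by the script itself.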
Rscript example
#!/usr/bin/Rscript
my_model <- glm( data=iris
, Sepal.Width ~ Sepal.Length
)
print(coef(my_model))
Hmm, that script only works for this specific data set.
I Need Arguments and Options!
Command-line parameters
Parsing command-line parameters seems easy, but what about:
Switches? e.g. --debug, --help
Short names and long names? -d, -h vs --debug, --help?
Options with a value? --output=garbage.csv
Arguments, e.g. input_file.csv?
Optional arguments?
Default values for options?
Documenting all options and arguments?
That is a lot of work for just a batch script...
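To make the amount of work concrete, here is a rough sketch of parsing just one switch, one option with a value and one positional argument by hand in base R. The function parse_by_hand and the option names are hypothetical and chosen only for illustration; it still lacks help text, the short form of --output and optional arguments.

```r
# Hand-rolled parsing of: [-d | --debug] [--output=<file>] INPUT
# parse_by_hand is a hypothetical helper, not part of any package.
parse_by_hand <- function(args) {
  opts <- list(debug = FALSE, output = "out.csv", input = NULL)
  for (arg in args) {
    if (arg %in% c("-d", "--debug")) {
      opts$debug <- TRUE                          # switch, short or long name
    } else if (startsWith(arg, "--output=")) {
      opts$output <- sub("^--output=", "", arg)   # option with a value
    } else if (startsWith(arg, "-")) {
      stop("Unknown option: ", arg)
    } else if (is.null(opts$input)) {
      opts$input <- arg                           # first positional argument
    } else {
      stop("Unexpected argument: ", arg)
    }
  }
  if (is.null(opts$input)) stop("Missing required argument: INPUT")
  opts
}

by_hand <- parse_by_hand(c("--debug", "--output=garbage.csv", "input_file.csv"))
str(by_hand)
```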
Retrieving command-line options
Which libraries are available?
base::commandArgs (very primitive)
library(getopt) (basic)
library(argparse) (Python dependency)
library(optparse) (very nice, Python inspired)
These are all fine, but they result in a lot of parsing or setup code
in your script (and that is not what your script is about...).
docopt is different.
What is Docopt?
Originally a Python lib: http://docopt.org
It is a Command Line Interface Specification language:
You specify your help text and the docopt parser takes care of
everything.
The documentation = the specification.
Your script starts with the command-line help
docopt automatically provides a --help or -h switch that shows the help
to users of your script.
It will stop when obligatory options are not set or when non-existent
options are set.
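A small sketch of that stopping behaviour, assuming docopt signals an ordinary R error when the command line does not match the usage pattern (the exact message is not relied on here). The usage string is made up for illustration, and the args parameter of docopt() is used to simulate a command line.

```r
library(docopt)

# Hypothetical usage pattern: one obligatory option and one argument.
doc <- "Usage: my_script.R --output=<file> INPUT"

# Simulate a call without any arguments; catch the resulting error
# instead of letting the script stop.
result <- tryCatch(
  docopt(doc, args = character(0)),
  error = function(e) NULL  # we only record that parsing failed
)

is.null(result)
```

In a real batch script you would normally not catch the error: letting docopt stop is the desired behaviour.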
Simple example
#!/usr/bin/Rscript
"This is my incredible script
Usage: my_inc_script.R [-v --output=<output>] FILE
" -> doc
library(docopt)
my_opts <- docopt(doc)
That’s all you need to handle your command-line options.
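To see what the parser returns without leaving the R session, the command line can be simulated through the args parameter of docopt(). This is a sketch; the file names in.csv and res.csv are made up.

```r
library(docopt)

doc <- "This is my incredible script

Usage: my_inc_script.R [-v --output=<output>] FILE
"

# Simulates: my_inc_script.R -v --output=res.csv in.csv
my_opts <- docopt(doc, args = c("-v", "--output=res.csv", "in.csv"))

my_opts$FILE           # positional argument
my_opts[["--output"]]  # value of the option
my_opts[["-v"]]        # switch
```

The result is a named list, so options whose names start with dashes are most easily read with [[ ]].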
Options
Docopt lets you parse:
Both short and long options
Default values
Descriptions of parameters
Optional parameters: my_script.R [-a -b]
Commands: my_script.R (lm | summary)
Positional arguments
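A sketch that combines several of these elements (commands, optional switches and a positional argument) in one pattern. The usage string and data.csv are illustrative only; the command line is simulated through the args parameter.

```r
library(docopt)

doc <- "Usage: my_script.R (lm | summary) [-a -b] FILE"

# Simulates: my_script.R summary -a data.csv
opts <- docopt(doc, args = c("summary", "-a", "data.csv"))

opts$summary    # the command that was given
opts$lm         # the command that was not given
opts[["-a"]]    # switch that was set
opts[["-b"]]    # switch that was not set
opts$FILE       # positional argument
```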
Usage patterns
Syntax is defined at http://docopt.org
Start with Usage:
"Usage:
  script.R --option <argument>
  script.R [<optional-argument>]
  script.R --another-option=<with-argument>
  script.R (--either-that-option | <or-this-argument>)
  script.R <repeating-argument> <repeating-argument>...
" -> doc
Longer example
#!/usr/bin/Rscript
"This is my useful script that I use on everything

Usage: my_uf_script.R [options] FILE

Options:
  -b --bogus          This is a bogus switch
  -o --output=OUTPUT  output file [default: out.csv]

Arguments:
  FILE  the input file" -> doc
library(docopt)
my_opts <- docopt(doc)
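The effect of the [default: ...] annotation can be checked by simulating two command lines through the args parameter. This is a sketch; data.csv and result.csv are made-up file names, and the option descriptions are separated from the option names by at least two spaces, as the docopt syntax expects.

```r
library(docopt)

doc <- "This is my useful script that I use on everything

Usage: my_uf_script.R [options] FILE

Options:
  -b --bogus          This is a bogus switch
  -o --output=OUTPUT  output file [default: out.csv]

Arguments:
  FILE  the input file"

# Simulates: my_uf_script.R data.csv
defaults <- docopt(doc, args = "data.csv")
defaults[["--output"]]  # falls back to the documented default
defaults[["--bogus"]]   # switch not set

# Simulates: my_uf_script.R -b -o result.csv data.csv
custom <- docopt(doc, args = c("-b", "-o", "result.csv", "data.csv"))
custom[["--output"]]
custom[["--bogus"]]
```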
Recall first example
Let's make a CLI for our script:
#!/usr/bin/Rscript
my_model <- glm( data=iris
, Sepal.Width ~ Sepal.Length
)
print(coef(my_model))
Preparing...
#!/usr/bin/Rscript
main <- function(DATA, response, terms, family) {
  data <- read.csv(DATA)
  f <- as.formula(paste0(response, " ~ ", terms))
  my_model <- glm(f, family = family, data = data)
  print(coef(my_model))
}
Done!
"Usage: my_script.R --response=<y> --terms=<x>
                   [--family=<family>] DATA

Options:
  -r --response=<y>     Response for glm
  -t --terms=<x>        Terms for glm
  -f --family=<family>  Family [default: gaussian]

Arguments:
  DATA  Input data frame" -> doc
main <- function( DATA, response, terms, family){...}
opt <- docopt::docopt(doc)
main(opt$DATA, opt[["--response"]], opt[["--terms"]],
     opt[["--family"]])
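The finished script can be tried end to end from within R. The sketch below repeats the usage text and the main function from the previous slides, writes iris to a temporary CSV file (an assumption made only to have an input file) and simulates the command line through the args parameter of docopt().

```r
library(docopt)

doc <- "Usage: my_script.R --response=<y> --terms=<x> [--family=<family>] DATA

Options:
  -r --response=<y>     Response for glm
  -t --terms=<x>        Terms for glm
  -f --family=<family>  Family [default: gaussian]

Arguments:
  DATA  Input data frame"

main <- function(DATA, response, terms, family) {
  data <- read.csv(DATA)
  f <- as.formula(paste0(response, " ~ ", terms))
  my_model <- glm(f, family = family, data = data)
  print(coef(my_model))
}

# Write iris to a temporary CSV so that the sketch has an input file.
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)

# Simulates:
#   my_script.R --response=Sepal.Width --terms=Sepal.Length <csv_path>
opt <- docopt(doc, args = c("--response=Sepal.Width",
                            "--terms=Sepal.Length",
                            csv_path))

# --family is not given, so its documented default is used.
coefs <- main(opt$DATA, opt[["--response"]], opt[["--terms"]],
              opt[["--family"]])
```

This fits the same model as the hard-coded first example, but the data set, response and terms now come from the command line.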
Implementation
docopt is implemented:
using Reference Classes (R5) in pure R.
It is a port of the original Python project: http://docopt.org
Available from CRAN and
https://github.com/edwindj/docopt.R
Very functional, except for:
multiple identical arguments (-vvv)
repeating arguments
(both will be fixed soon)
Questions?
$ my_talk.R --help
Edwin's talk on docopt
Usage: my_talk.R (--questions | --fell-asleep)
Options:
  -q --questions    Anyone any questions?
  -f --fell-asleep  Wake up! Next UseR talk!
$ my_talk.R --questions