Docopt, beautiful command-line options for R, user2014

6,815 views

Published on

Presentation given at UseR!2014, July 2nd 2014.

Published in: Software, Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
6,815
On SlideShare
0
From Embeds
0
Number of Embeds
61
Actions
Shares
0
Downloads
18
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Docopt, beautiful command-line options for R, user2014

  1. 1. Docopt, beautiful command-line options for R Edwin de Jonge (@edwindjonge), Statistics Netherlands (CBS) July 2014, UseR!2014 Edwin de Jonge (@edwindjonge), Statistics Netherlands (CBS) Docopt, beautiful command-line options for R
  2. 2. What is docopt? docopt is an utility R library for parsing command-line options. It is a port of docopt.py (python). How does it work? You supply a properly formed help description docopt creates from this a fully functional command-line parser Edwin de Jonge (@edwindjonge), Statistics Netherlands (CBS) Docopt, beautiful command-line options for R
  3. 3. Why Command-line? R is used more and more: Ad hoc, interactive analysis, e.g R REPL shell RStudio interactive data analysis Creating R libraries with vi, Rstudio etc. no data analysis But also for repetitive batch jobs: Rscript my_script.R arg1 arg2 . . . R -f my_script.R --args arg1 arg2 . . . reproducible data processing So also more and more Command-line! Edwin de Jonge (@edwindjonge), Statistics Netherlands (CBS) Docopt, beautiful command-line options for R
  4. 4. Rscript example #!/usr/bin/Rscript my_model <- glm( data=iris , Sepal.Width ~ Sepal.Length ) print(coef(my_model)) Hmm, that script only works for this specific data set. I Need Arguments and Options! Edwin de Jonge (@edwindjonge), Statistics Netherlands (CBS) Docopt, beautiful command-line options for R
  5. 5. Command-line parameters Parsing command-line parameters seems easy, but what about: Switches? e.g. --debug, --help Short names and long names? -d, -h vs --debug, --help? Options with a value? --output=garbage.csv Arguments e.g. input_file.csv? Optional arguments? default values for options? documenting all options and arguments? That is a lot of work for just a batch script. . . Edwin de Jonge (@edwindjonge), Statistics Netherlands (CBS) Docopt, beautiful command-line options for R
  6. 6. Retrieving command-line options What libraries available? base::commandArgs (very primitive) library(getopt): (basic) library(argparse), Python dependency library(optparse) very nice, Python inspired These are all fine, but result in a lot of parsing or settting-up code in your script. (and that is not what your script is about. . . ) docopt is different. Edwin de Jonge (@edwindjonge), Statistics Netherlands (CBS) Docopt, beautiful command-line options for R
  7. 7. What is Docopt? Originally a Python lib: http://docopt.org It is a Command Line Interface Specification language: You specify your help and docopt parser takes care of everything. The documentation = the specification. Your script starts with the command-line help docopt automatically has --help or -h switch to supply help to users of your script. It will stop when obligatory switch are not set or non existing options are set. Edwin de Jonge (@edwindjonge), Statistics Netherlands (CBS) Docopt, beautiful command-line options for R
  8. 8. Simple example #!/usr/bin/Rscript "This is my incredible script Usage: my_inc_script.R [-v --output=<output>] FILE " -> doc library(docopt) my_opts <- docopt(doc) That’s all you need to handle your command-line options. Edwin de Jonge (@edwindjonge), Statistics Netherlands (CBS) Docopt, beautiful command-line options for R
  9. 9. Options Docopt lets you parse: Both short as long options Default values Descriptions of parameters Optional parameters: my_script.R [-a -b] Commands: my_script.R (lm | summary) Positional arguments Edwin de Jonge (@edwindjonge), Statistics Netherlands (CBS) Docopt, beautiful command-line options for R
  10. 10. Usage patterns Syntax is defined at http://docopt.org Start with Usage: "Usage: script.R --option <argument> script.R [<optional-argument>] script.R --another-option=<with-argument> script.R (--either-that-option | <or-this-argument>) script.R <repeating-argument> <repeating-argument>... " -> doc Edwin de Jonge (@edwindjonge), Statistics Netherlands (CBS) Docopt, beautiful command-line options for R
  11. 11. Longer example #!/usr/bin/Rscript "This is my useful scriptI I use on everything Usage: my_uf_script.R [options] FILE Options: -b --bogus This is a bogus switch -o --output=OUTPUT output file [default: out.csv] Arguments: FILE the input file" -> doc library(docopt) my_opts <- docopt(doc) Edwin de Jonge (@edwindjonge), Statistics Netherlands (CBS) Docopt, beautiful command-line options for R
  12. 12. Recall first example Lets make a CLI for our script #!/usr/bin/Rscript my_model <- glm( data=iris , Sepal.Width ~ Sepal.Length ) print(coef(my_model)) Edwin de Jonge (@edwindjonge), Statistics Netherlands (CBS) Docopt, beautiful command-line options for R
  13. 13. Preparing. . . #!/usr/bin/Rscript main <- function( DATA, response, terms, family){ data <- read.csv(DATA) f <- as.formula(paste0(response, " ~ ", terms)) my_model <- glm(f, family=family, data=data) print(coef(my_model)) } Edwin de Jonge (@edwindjonge), Statistics Netherlands (CBS) Docopt, beautiful command-line options for R
  14. 14. Done! "Usage: my_script.R --response=<y> --terms=<x> [--family=<family>] DATA Options: -r --response=<y> Response for glm -t --terms=<x> Terms for glm -f --family=<family> Family [default: gaussian] Arguments: DATA Input data frame" -> doc main <- function( DATA, response, terms, family){...} opt <- docopt::docopt(doc) main(opt$DATA, opt[["--response"]], opt[["--terms"]], opt[["--family"]]) Edwin de Jonge (@edwindjonge), Statistics Netherlands (CBS) Docopt, beautiful command-line options for R
  15. 15. Implementation Docopt is implemented: using Reference classes (R5) in pure R. It is port of the original Python project: http://docopt.org Available from: CRAN and https://github.com/edwindj/docopt.R Very functional, except for: multiple identical arguments -vvv repeating arguments (both will be fixed soon) Edwin de Jonge (@edwindjonge), Statistics Netherlands (CBS) Docopt, beautiful command-line options for R
  16. 16. Questions? $ my_talk.R --help Edwins talk on docopt Usage: my_talk.R (--questions | --fell-asleep) Options: -q --questions Anyone any questions? -f --fell-asleep Wake up! Next UseR talk! $ my_talk.R --questions Edwin de Jonge (@edwindjonge), Statistics Netherlands (CBS) Docopt, beautiful command-line options for R
  17. 17. Questions? Thanks for listening! Edwin de Jonge (@edwindjonge), Statistics Netherlands (CBS) Docopt, beautiful command-line options for R

×