R journal 2010-2



The R Journal
Volume 2/2, December 2010

A peer-reviewed, open-access publication of the R Foundation for Statistical Computing

Contents

Editorial

Contributed Research Articles
  • Solving Differential Equations in R
  • Source References
  • hglm: A Package for Fitting Hierarchical Generalized Linear Models
  • dclone: Data Cloning in R
  • stringr: modern, consistent string processing
  • Bayesian Estimation of the GARCH(1,1) Model with Student-t Innovations
  • cudaBayesreg: Bayesian Computation in CUDA
  • binGroup: A Package for Group Testing
  • The RecordLinkage Package: Detecting Errors in Data
  • spikeslab: Prediction and Variable Selection Using Spike and Slab Regression

From the Core
  • What’s New?

News and Notes
  • useR! 2010
  • Forthcoming Events: useR! 2011
  • Changes in R
  • Changes on CRAN
  • News from the Bioconductor Project
  • R Foundation News
The R Journal is a peer-reviewed publication of the R Foundation for Statistical Computing. Communications regarding this publication should be addressed to the editors. All articles are copyrighted by the respective authors.

Prospective authors will find detailed and up-to-date submission instructions on the Journal’s homepage.

Editor-in-Chief:
Peter Dalgaard
Center for Statistics
Copenhagen Business School
Solbjerg Plads 3
2000 Frederiksberg
Denmark

Editorial Board: Vince Carey, Martyn Plummer, and Heather Turner.

Editor Programmer’s Niche: Bill Venables

Editor Help Desk: Uwe Ligges

Editor Book Reviews:
G. Jay Kerns
Department of Mathematics and Statistics
Youngstown State University
Youngstown, Ohio 44555-0002
USA
gkerns@ysu.edu

R Journal Homepage: http://journal.r-project.org/

Email of editors and editorial board: firstname.lastname@R-project.org

The R Journal is indexed/abstracted by EBSCO, DOAJ.

The R Journal Vol. 2/2, December 2010 ISSN 2073-4859
Editorial

by Peter Dalgaard

Welcome to the 2nd issue of the 2nd volume of The R Journal.

I am pleased to say that we can offer ten peer-reviewed papers this time. Many thanks go to the authors and the reviewers who ensure that our articles live up to high academic standards. The transition from R News to The R Journal is now nearly completed. We are now listed by EBSCO and the registration procedure with Thomson Reuters is well on the way. We thereby move into the framework of scientific journals and away from the grey-literature newsletter format; however, it should be stressed that R News was a fairly high-impact piece of grey literature: A cited reference search turned up around 1300 references to the just over 200 papers that were published in R News!

I am particularly happy to see the paper by Soetaert et al. on differential equation solvers. In many fields of research, the natural formulation of models is via local relations at the infinitesimal level, rather than via closed form mathematical expressions, and quite often solutions rely on simplifying assumptions. My own PhD work, some 25 years ago, concerned diffusion of substances within the human eye, with the ultimate goal of measuring the state of the blood-retinal barrier. Solutions for this problem could be obtained for short timespans, if one assumed that the eye was completely spherical. Extending the solutions to accommodate more realistic models (a necessity for fitting actual experimental data) resulted in quite unwieldy formulas, and even then, did not give you the kind of modelling freedom that you really wanted to elucidate the scientific issue.

In contrast, numerical procedures could fairly easily be set up and modified to better fit reality. The main problem was that they tended to be computationally demanding. Especially for transient solutions in two or three spatial dimensions, computers simply were not fast enough in a time where numerical performance was measured in fractions of a MFLOPS (million floating point operations per second). Today, the relevant measure is GFLOPS and we should be getting much closer to practicable solutions.

However, raw computing power is not sufficient; there are non-obvious aspects of numerical analysis that should not be taken lightly, notably issues of stability and accuracy. There is a reason that numerical analysis is a scientific field in its own right.

From a statistician’s perspective, being able to fit models to actual data is of prime importance. For models with only a few parameters, you can get quite far with nonlinear regression and a good numerical solver. For ill-posed problems with functional parameters (the so-called “inverse problems”), and for stochastic differential equations, there still appears to be work to be done. Soetaert et al. do not go into these issues, but I hope that their paper will be an inspiration for further work.

With this issue, in accordance with the rotation rules of the Editorial Board, I step down as Editor-in-Chief, to be succeeded by Heather Turner. Heather has already played a key role in the transition from R News to The R Journal, as well as being probably the most efficient Associate Editor on the Board. The Editorial Board will be losing last year’s Editor-in-Chief, Vince Carey, who has now been on board for the full four years. We shall miss Vince, who has always been good for a precise and principled argument and in the process taught at least me several new words. We also welcome Hadley Wickham as a new Associate Editor and member of the Editorial Board.

Season’s greetings and best wishes for a happy 2011!
Contributed Research Articles

Solving Differential Equations in R

by Karline Soetaert, Thomas Petzoldt and R. Woodrow Setzer [1]

[1] The views expressed in this paper are those of the authors and do not necessarily reflect the views or policies of the U.S. Environmental Protection Agency.

Abstract Although R is still predominantly applied for statistical analysis and graphical representation, it is rapidly becoming more suitable for mathematical computing. One of the fields where considerable progress has been made recently is the solution of differential equations. Here we give a brief overview of differential equations that can now be solved by R.

Introduction

Differential equations describe exchanges of matter, energy, information or any other quantities, often as they vary in time and/or space. Their thorough analytical treatment forms the basis of fundamental theories in mathematics and physics, and they are increasingly applied in chemistry, life sciences and economics.

Differential equations are solved by integration, but unfortunately, for many practical applications in science and engineering, systems of differential equations cannot be integrated to give an analytical solution, but rather need to be solved numerically.

Many advanced numerical algorithms that solve differential equations are available as (open-source) computer codes, written in programming languages like FORTRAN or C and that are available from repositories like GAMS (http://gams.nist.gov/) or NETLIB (www.netlib.org).

Depending on the problem, mathematical formalisations may consist of ordinary differential equations (ODE), partial differential equations (PDE), differential algebraic equations (DAE), or delay differential equations (DDE). In addition, a distinction is made between initial value problems (IVP) and boundary value problems (BVP).

With the introduction of R-package odesolve (Setzer, 2001), it became possible to use R (R Development Core Team, 2009) for solving very simple initial value problems of systems of ordinary differential equations, using the lsoda algorithm of Hindmarsh (1983) and Petzold (1983). However, many real-life applications, including physical transport modeling, equilibrium chemistry or the modeling of electrical circuits, could not be solved with this package.

Since odesolve, much effort has been made to improve R’s capabilities to handle differential equations, mostly by incorporating published and well tested numerical codes, such that now a much more complete repertoire of differential equations can be numerically solved.

More specifically, the following types of differential equations can now be handled with add-on packages in R:

  • Initial value problems (IVP) of ordinary differential equations (ODE), using package deSolve (Soetaert et al., 2010b).
  • Initial value differential algebraic equations (DAE), package deSolve.
  • Initial value partial differential equations (PDE), packages deSolve and ReacTran (Soetaert and Meysman, 2010).
  • Boundary value problems (BVP) of ordinary differential equations, using package bvpSolve (Soetaert et al., 2010a), or ReacTran and rootSolve (Soetaert, 2009).
  • Initial value delay differential equations (DDE), using packages deSolve or PBSddesolve (Couture-Beil et al., 2010).
  • Stochastic differential equations (SDE), using packages sde (Iacus, 2008) and pomp (King et al., 2008).

In this short overview, we demonstrate how to solve the first four types of differential equations in R. It is beyond the scope to give an exhaustive overview about the vast number of methods to solve these differential equations and their theory, so the reader is encouraged to consult one of the numerous textbooks (e.g., Ascher and Petzold, 1998; Press et al., 2007; Hairer et al., 2009; Hairer and Wanner, 2010; LeVeque, 2007, and many others).

In addition, a large number of analytical and numerical methods exists for the analysis of bifurcations and stability properties of deterministic systems, the efficient simulation of stochastic differential equations or the estimation of parameters. We do not deal with these methods here.

Types of differential equations

Ordinary differential equations

Ordinary differential equations describe the change of a state variable y as a function f of one independent variable t (e.g., time or space), of y itself, and, optionally, a set of other variables p, often called parameters:

$$ y' = \frac{dy}{dt} = f(t, y, p) $$
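To make the form y' = f(t, y, p) concrete, the following base-R sketch writes f as an R function and advances it with a hand-coded classical fourth-order Runge-Kutta step. The first-order decay model, the names f and rk4_step, and the step size h are illustrative assumptions and do not come from any of the packages discussed here; for real work a tested solver is preferable.

```r
# Right-hand side in the form f(t, y, p): first-order decay, y' = -k * y
# (hypothetical example model, chosen because its exact solution is known)
f <- function(t, y, p) -p[["k"]] * y

# One classical fourth-order Runge-Kutta step of size h
rk4_step <- function(f, t, y, h, p) {
  k1 <- f(t, y, p)
  k2 <- f(t + h / 2, y + h / 2 * k1, p)
  k3 <- f(t + h / 2, y + h / 2 * k2, p)
  k4 <- f(t + h, y + h * k3, p)
  y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
}

p <- c(k = 0.5)                 # parameter vector
h <- 0.1                        # fixed step size
times <- seq(0, 2, by = h)
y <- numeric(length(times))
y[1] <- 1                       # value of y at the first time point
for (i in seq_along(times)[-1]) {
  y[i] <- rk4_step(f, times[i - 1], y[i - 1], h, p)
}

# Compare the last value with the exact solution exp(-k * t) at t = 2
err <- abs(y[length(y)] - exp(-0.5 * 2))
```

Note that the loop cannot start without a value for y at the first time point; the right-hand side f alone does not determine a unique solution.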
In many cases, solving differential equations requires the introduction of extra conditions. In the following, we concentrate on the numerical treatment of two classes of problems, namely initial value problems and boundary value problems.

Initial value problems

If the extra conditions are specified at the initial value of the independent variable, the differential equations are called initial value problems (IVP).

There exist two main classes of algorithms to numerically solve such problems, so-called Runge-Kutta formulas and linear multistep formulas (Hairer et al., 2009; Hairer and Wanner, 2010). The latter contains two important families, the Adams family and the backward differentiation formulae (BDF).

Another important distinction is between explicit and implicit methods, where the latter methods can solve a particular class of equations (so-called “stiff” equations) where explicit methods have problems with stability and efficiency. Stiffness occurs for instance if a problem has components with different rates of variation according to the independent variable. Very often there will be a tradeoff between using explicit methods that require little work per integration step and implicit methods which are able to take larger integration steps, but need (much) more work for one step.

In R, initial value problems can be solved with functions from package deSolve (Soetaert et al., 2010b), which implements many solvers from ODEPACK (Hindmarsh, 1983), the code vode (Brown et al., 1989), the differential algebraic equation solver daspk (Brenan et al., 1996), all belonging to the linear multistep methods, and comprising Adams methods as well as backward differentiation formulae. The former methods are explicit, the latter implicit. In addition, this package contains a de-novo implementation of a rather general Runge-Kutta solver based on Dormand and Prince (1980); Prince and Dormand (1981); Bogacki and Shampine (1989); Cash and Karp (1990) and using ideas from Butcher (1987) and Press et al. (2007). Finally, the implicit Runge-Kutta method radau (Hairer et al., 2009) has been added recently.

Boundary value problems

If the extra conditions are specified at different values of the independent variable, the differential equations are called boundary value problems (BVP). A standard textbook on this subject is Ascher et al. (1995).

Package bvpSolve (Soetaert et al., 2010a) implements three methods to solve boundary value problems. The simplest solution method is the single shooting method, which combines initial value problem integration with a nonlinear root finding algorithm (Press et al., 2007). Two more stable solution methods implement a mono implicit Runge-Kutta (MIRK) code, based on the FORTRAN code twpbvpC (Cash and Mazzia, 2005), and the collocation method, based on the FORTRAN code colnew (Bader and Ascher, 1987). Some boundary value problems can also be solved with functions from packages ReacTran and rootSolve (see below).

Partial differential equations

In contrast to ODEs where there is only one independent variable, partial differential equations (PDE) contain partial derivatives with respect to more than one independent variable, for instance t (time) and x (a spatial dimension). To distinguish this type of equations from ODEs, the derivatives are represented with the ∂ symbol, e.g.

$$ \frac{\partial y}{\partial t} = f\left(t, x, y, \frac{\partial y}{\partial x}, p\right) $$

Partial differential equations can be solved by subdividing one or more of the continuous independent variables in a number of grid cells, and replacing the derivatives by discrete, algebraic approximate equations, so-called finite differences (cf. LeVeque, 2007; Hundsdorfer and Verwer, 2003).

For time-varying cases, it is customary to discretise the spatial coordinate(s) only, while time is left in continuous form. This is called the method-of-lines, and in this way, one PDE is translated into a large number of coupled ordinary differential equations, that can be solved with the usual initial value problem solvers (cf. Hamdi et al., 2007). This applies to parabolic PDEs such as the heat equation, and to hyperbolic PDEs such as the wave equation.

For time-invariant problems, usually all independent variables are discretised, and the derivatives approximated by algebraic equations, which are solved by root-finding techniques. This technique applies to elliptic PDEs.

R-package ReacTran provides functions to generate finite differences on a structured grid. After that, the resulting time-varying cases can be solved with specially-designed functions from package deSolve, while time-invariant cases can be solved with root-solving methods from package rootSolve.

Differential algebraic equations

Differential-algebraic equations (DAE) contain a mixture of differential (f) and algebraic equations (g), the latter e.g. for maintaining mass-balance conditions:

$$
\begin{aligned}
y' &= f(t, y, p) \\
0 &= g(t, y, p)
\end{aligned}
$$

Important for the solution of a DAE is its index. The index of a DAE is the number of differentiations needed until a system consisting only of ODEs is obtained.

Function daspk (Brenan et al., 1996) from package deSolve solves (relatively simple) DAEs of index at most 1, while function radau (Hairer et al., 2009) solves DAEs of index up to 3.

Implementation details

The implemented solver functions are explained by means of the ode-function, used for the solution of initial value problems. The interfaces to the other solvers have an analogous definition:

    ode(y, times, func, parms,
        method = c("lsoda", "lsode", "lsodes", "lsodar",
                   "vode", "daspk", "euler", "rk4",
                   "ode23", "ode45", "radau", "bdf",
                   "bdf_d", "adams", "impAdams",
                   "impAdams_d"), ...)

To use this, the system of differential equations can be defined as an R-function (func) that computes derivatives in the ODE system (the model definition) according to the independent variable (e.g. time t). func can also be a function in a dynamically loaded shared library (Soetaert et al., 2010c) and, in addition, some solvers support also the supply of an analytically derived function of partial derivatives (Jacobian matrix).

If func is an R-function, it must be defined as:

    func <- function(t, y, parms, ...)

where t is the actual value of the independent variable (e.g. the current time point in the integration), y is the current estimate of the variables in the ODE system, parms is the parameter vector and ... can be used to pass additional arguments to the function.

The return value of func should be a list, whose first element is a vector containing the derivatives of y with respect to t, and whose next elements are optional global values that can be recorded at each point in times. The derivatives must be specified in the same order as the state variables y.

Depending on the algorithm specified in argument method, numerical simulation proceeds either exactly at the time steps specified in times, or using time steps that are independent from times and where the output is generated by interpolation. With the exception of method euler and several fixed-step Runge-Kutta methods all algorithms have automatic time stepping, which can be controlled by setting accuracy requirements (see below) or by using optional arguments like hini (initial time step), hmin (minimal time step) and hmax (maximum time step). Specific details, e.g. about the applied interpolation methods can be found in the manual pages and the original literature cited there.

Numerical accuracy

Numerical solution of a system of differential equations is an approximation and therefore prone to numerical errors, originating from several sources:

  1. time step and accuracy order of the solver,
  2. floating point arithmetics,
  3. properties of the differential system and stability of the solution algorithm.

For methods with automatic stepsize selection, accuracy of the computation can be adjusted using the non-negative arguments atol (absolute tolerance) and rtol (relative tolerance), which control the local errors of the integration.

Like R itself, all solvers use double-precision floating-point arithmetics according to IEEE Standard 754 (2008), which means that it can represent numbers between approx. $\pm 2.25 \cdot 10^{-308}$ to approx. $\pm 1.8 \cdot 10^{308}$ and with 16 significant digits. It is therefore not advisable to set rtol below $10^{-16}$, except setting it to zero with the intention to use absolute tolerance exclusively.

The solvers provided by the packages presented below have proven to be quite robust in most practical cases, however users should always be aware about the problems and limitations of numerical methods and carefully check results for plausibility. The section “Troubleshooting” in the package vignette (Soetaert et al., 2010d) should be consulted as a first source for solving typical problems.

Examples

An initial value ODE

Consider the famous van der Pol equation (van der Pol and van der Mark, 1927), that describes a non-conservative oscillator with non-linear damping and which was originally developed for electrical circuits employing vacuum tubes. The oscillation is described by means of a 2nd order ODE:

$$ z'' - \mu (1 - z^2) z' + z = 0 $$

Such a system can be routinely rewritten as a system of two 1st order ODEs, if we substitute z with $y_1$ and z' with $y_2$:

$$
\begin{aligned}
y_1' &= y_2 \\
y_2' &= \mu \cdot (1 - y_1^2) \cdot y_2 - y_1
\end{aligned}
$$

There is one parameter, µ, and two differential variables, $y_1$ and $y_2$ with initial values (at t = 0):

$$
\begin{aligned}
y_{1(t=0)} &= 2 \\
y_{2(t=0)} &= 0
\end{aligned}
$$
The van der Pol equation is often used as a test problem for ODE solvers, as, for large µ, its dynamics consists of parts where the solution changes very slowly, alternating with regions of very sharp changes. This “stiffness” makes the equation quite challenging to solve.

In R, this model is implemented as a function (vdpol) whose inputs are the current time (t), the values of the state variables (y), and the parameters (mu); the function returns a list with as first element the derivatives, concatenated.

    vdpol <- function (t, y, mu) {
      list(c(
        y[2],                               # dy1/dt
        mu * (1 - y[1]^2) * y[2] - y[1]     # dy2/dt
      ))
    }

After defining the initial condition of the state variables (yini), the model is solved, and output written at selected time points (times), using deSolve’s integration function ode. The default routine lsoda, which is invoked by ode, automatically switches between stiff and non-stiff methods, depending on the problem (Petzold, 1983).

We run the model for a typically stiff (mu = 1000) and nonstiff (mu = 1) situation:

    library(deSolve)
    yini <- c(y1 = 2, y2 = 0)
    stiff <- ode(y = yini, func = vdpol,
                 times = 0:3000, parms = 1000)

    nonstiff <- ode(y = yini, func = vdpol,
                    times = seq(0, 30, by = 0.01),
                    parms = 1)

The model returns a matrix, of class deSolve, with in its first column the time values, followed by the values of the state variables:

    head(stiff, n = 3)

         time       y1            y2
    [1,]    0 2.000000  0.0000000000
    [2,]    1 1.999333 -0.0006670373
    [3,]    2 1.998666 -0.0006674088

Figures are generated using the S3 plot method for objects of class deSolve:

    plot(stiff, type = "l", which = "y1",
         lwd = 2, ylab = "y",
         main = "IVP ODE, stiff")

    plot(nonstiff, type = "l", which = "y1",
         lwd = 2, ylab = "y",
         main = "IVP ODE, nonstiff")

Figure 1: Solution of the van der Pol equation, an initial value ordinary differential equation, stiff case, µ = 1000.

Figure 2: Solution of the van der Pol equation, an initial value ordinary differential equation, non-stiff case, µ = 1.

    solver   non-stiff    stiff
    ode23         0.37   271.19
    lsoda         0.26     0.23
    adams         0.13   616.13
    bdf           0.15     0.22
    radau         0.53     0.72

Table 1: Comparison of solvers for a stiff and a non-stiff parametrisation of the van der Pol equation (time in seconds, mean values of ten simulations on an AMD AM2 X2 3000 CPU).

A comparison of timings for two explicit solvers, the Runge-Kutta method (ode23) and the adams method, with the implicit multistep solver (bdf, backward differentiation formula) shows a clear advantage for the latter in the stiff case (Figure 1). The default solver (lsoda) is not necessarily the fastest, but shows robust behavior due to automatic stiffness detection. It uses the explicit multistep Adams method for the non-stiff case and the BDF method for the stiff case. The accuracy is comparable for all solvers with atol = rtol = $10^{-6}$, the default.

A boundary value ODE

The webpage of Jeff Cash (Cash, 2009) contains many test cases, including their analytical solution (see below), that BVP solvers should be able to solve. We use equation no. 14 from this webpage as an example:

$$ \xi y'' - y = -(\xi \pi^2 + 1) \cos(\pi x) $$

on the interval [−1, 1], and subject to the boundary conditions:

$$
\begin{aligned}
y_{(x=-1)} &= 0 \\
y_{(x=+1)} &= 0
\end{aligned}
$$

The second-order equation first is rewritten as two first-order equations:

$$
\begin{aligned}
y_1' &= y_2 \\
y_2' &= 1/\xi \cdot \left(y_1 - (\xi \pi^2 + 1) \cos(\pi x)\right)
\end{aligned}
$$

It is implemented in R as:

    Prob14 <- function(x, y, xi) {
      list(c(
        y[2],                                          # dy1/dx
        1/xi * (y[1] - (xi*pi*pi+1) * cos(pi*x))       # dy2/dx
      ))
    }

With decreasing values of ξ, this problem becomes increasingly difficult to solve. We use three values of ξ, and solve the problem with the shooting, the MIRK and the collocation method (Ascher et al., 1995).

Note how the initial conditions yini and the conditions at the end of the integration interval yend are specified, where NA denotes that the value is not known. The independent variable is called x here (rather than times in ode).

    library(bvpSolve)
    x <- seq(-1, 1, by = 0.01)
    shoot <- bvpshoot(yini = c(0, NA),
                      yend = c(0, NA), x = x, parms = 0.01,
                      func = Prob14)
    twp <- bvptwp(yini = c(0, NA), yend = c(0, NA),
                  x = x, parms = 0.0025,
                  func = Prob14)
    coll <- bvpcol(yini = c(0, NA),
                   yend = c(0, NA), x = x, parms = 1e-04,
                   func = Prob14)

The numerical approximation generated by bvptwp is very close to the analytical solution, e.g. for ξ = 0.0025:

    xi <- 0.0025
    analytic <- cos(pi * x) + exp((x - 1)/sqrt(xi)) +
      exp(-(x + 1)/sqrt(xi))
    max(abs(analytic - twp[, 2]))

    [1] 7.788209e-10

A similarly low discrepancy ($4 \cdot 10^{-11}$) is noted for ξ = 0.0001 as solved by bvpcol; the shooting method is considerably less precise ($1.4 \cdot 10^{-5}$), although the same tolerance (atol = $10^{-8}$) was used for all runs.

The plot shows how the shape of the solution is affected by the parameter ξ, becoming more and more steep near the boundaries, and therefore more and more difficult to solve, as ξ gets smaller.

    plot(shoot[, 1], shoot[, 2], type = "l", lwd = 2,
         ylim = c(-1, 1), col = "blue",
         xlab = "x", ylab = "y", main = "BVP ODE")
    lines(twp[, 1], twp[, 2], col = "red", lwd = 2)
    lines(coll[, 1], coll[, 2], col = "green", lwd = 2)
    legend("topright", legend = c("0.01", "0.0025",
           "0.0001"), col = c("blue", "red", "green"),
           title = expression(xi), lwd = 2)

Figure 3: Solution of the BVP ODE problem, for different values of parameter ξ.

Differential algebraic equations

The so called “Rober problem” describes an autocatalytic reaction (Robertson, 1966) between three chemical species, $y_1$, $y_2$ and $y_3$. The problem can be formulated either as an ODE (Mazzia and Magherini, 2008), or as a DAE:

$$
\begin{aligned}
y_1' &= -0.04 y_1 + 10^4 y_2 y_3 \\
y_2' &= 0.04 y_1 - 10^4 y_2 y_3 - 3 \cdot 10^7 y_2^2 \\
1 &= y_1 + y_2 + y_3
\end{aligned}
$$
where the first two equations are differential equations that specify the dynamics of chemical species $y_1$ and $y_2$, while the third algebraic equation ensures that the summed concentration of the three species remains 1.

The DAE has to be specified by the residual function instead of the rates of change (as in ODEs).

$$
\begin{aligned}
r_1 &= -y_1' - 0.04 y_1 + 10^4 y_2 y_3 \\
r_2 &= -y_2' + 0.04 y_1 - 10^4 y_2 y_3 - 3 \cdot 10^7 y_2^2 \\
r_3 &= -1 + y_1 + y_2 + y_3
\end{aligned}
$$

Implemented in R this becomes:

    daefun <- function(t, y, dy, parms) {
      # residuals of the two differential equations
      res1 <- - dy[1] - 0.04 * y[1] +
                1e4 * y[2] * y[3]
      res2 <- - dy[2] + 0.04 * y[1] -
                1e4 * y[2] * y[3] - 3e7 * y[2]^2
      # residual of the algebraic (mass balance) equation
      res3 <- y[1] + y[2] + y[3] - 1
      list(c(res1, res2, res3),
           error = as.vector(y[1] + y[2] + y[3]) - 1)
    }

    yini  <- c(y1 = 1, y2 = 0, y3 = 0)
    dyini <- c(-0.04, 0.04, 0)
    times <- 10 ^ seq(-6, 6, 0.1)

The input arguments of function daefun are the current time (t), the values of the state variables and their derivatives (y, dy) and the parameters (parms). It returns the residuals, concatenated, and an output variable, the error in the algebraic equation. The latter is added to check upon the accuracy of the results.

For DAEs solved with daspk, both the state variables and their derivatives need to be initialised (y and dy). Here we make sure that the initial conditions for y obey the algebraic constraint, while also the initial condition of the derivatives is consistent with the dynamics.

    library(deSolve)
    print(system.time(out <- daspk(y = yini,
      dy = dyini, times = times, res = daefun,
      parms = NULL)))

       user  system elapsed
       0.07    0.00    0.11

An S3 plot method can be used to plot all variables at once:

    plot(out, ylab = "conc.", xlab = "time",
         type = "l", lwd = 2, log = "x")
    mtext("IVP DAE", side = 3, outer = TRUE,
          line = -1)

There is a very fast initial change in concentrations, mainly due to the quick reaction between $y_1$ and $y_2$ and amongst $y_2$. After that, the slow reaction of $y_1$ with $y_2$ causes the system to change much more smoothly. This is typical for stiff problems.

Figure 4: Solution of the DAE problem for the substances $y_1$, $y_2$, $y_3$; mass balance error: deviation of total sum from one.

Partial differential equations

In partial differential equations (PDE), the function has several independent variables (e.g. time and depth) and contains their partial derivatives.

Many partial differential equations can be solved by numerical approximation (finite differencing) after rewriting them as a set of ODEs (see Schiesser, 1991; LeVeque, 2007; Hundsdorfer and Verwer, 2003).

Functions tran.1D, tran.2D, and tran.3D from R package ReacTran (Soetaert and Meysman, 2010) implement finite difference approximations of the diffusive-advective transport equation which, for the 1-D case, is:

$$ -\frac{1}{A_x} \cdot \left[ \frac{\partial}{\partial x} A_x \left( -D \cdot \frac{\partial C}{\partial x} \right) - \frac{\partial}{\partial x} (A_x \cdot u \cdot C) \right] $$

Here D is the “diffusion coefficient”, u is the “advection rate”, and $A_x$ is some property (e.g. surface area) that depends on the independent variable, x.

It should be noted that the accuracy of the finite difference approximations can not be specified in the ReacTran functions. It is up to the user to make sure that the solutions are sufficiently accurate, e.g. by including more grid points.

One dimensional PDE

Diffusion-reaction models are a fundamental class of models which describe how concentration of matter, energy, information, etc. evolves in space and time under the influence of diffusive transport and transformation (Soetaert and Herman, 2009).
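The idea of rewriting a PDE as a set of ODEs by finite differencing can be sketched without any package. The example below is an illustration under stated assumptions, not ReacTran's implementation: it takes the pure diffusion (heat) equation ∂C/∂t = D ∂²C/∂x² on [0, 1] with C = 0 at both ends, replaces the spatial derivative by a central difference on a hypothetical grid of N intervals, and integrates the resulting ODE system with a simple fixed-step explicit Euler loop. The names rhs, dt and n_steps are chosen here for illustration only.

```r
# Method of lines for dC/dt = D * d2C/dx2 on [0, 1], C = 0 at x = 0 and x = 1
D  <- 1
N  <- 20
dx <- 1 / N
x  <- (1:(N - 1)) * dx                 # interior grid points only

# Right-hand side of the ODE system: one equation per interior grid point
rhs <- function(C) {
  C_ext <- c(0, C, 0)                        # attach the boundary values
  D * diff(C_ext, differences = 2) / dx^2    # central second difference
}

C  <- sin(pi * x)                      # initial concentration profile
dt <- 0.001                            # below the explicit limit dx^2 / (2 * D)
n_steps <- 100
for (i in seq_len(n_steps)) {
  C <- C + dt * rhs(C)                 # explicit Euler step in time
}

# Exact solution of this particular problem at the final time
C_exact <- exp(-pi^2 * D * dt * n_steps) * sin(pi * x)
max_err <- max(abs(C - C_exact))
```

The discrepancy max_err combines the spatial discretisation error and the time-stepping error; refining the grid reduces the former but also tightens the stability limit on dt for an explicit scheme, which is one reason implicit ODE solvers are commonly used for such systems.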
As an example, consider the 1-D diffusion-reaction model in [0, 10]:

$$ \frac{\partial C}{\partial t} = \frac{\partial}{\partial x} \left( D \cdot \frac{\partial C}{\partial x} \right) - Q $$

with C the concentration, t the time, x the distance from the origin, Q the consumption rate, and with boundary conditions (values at the model edges):

$$
\begin{aligned}
\left. \frac{\partial C}{\partial x} \right|_{x=0} &= 0 \\
C_{x=10} &= C_{ext}
\end{aligned}
$$

To solve this model in R, first the 1-D model Grid is defined; it divides 10 cm (L) into 1000 boxes (N).

    library(ReacTran)
    Grid <- setup.grid.1D(N = 1000, L = 10)

The model equation includes a transport term, approximated by ReacTran function tran.1D, and a consumption term (Q). The downstream boundary condition, prescribed as a concentration (C.down), needs to be specified; the zero-gradient at the upstream boundary is the default:

    pde1D <- function(t, C, parms) {
      tran <- tran.1D(C = C, D = D,
                      C.down = Cext, dx = Grid)$dC
      list(tran - Q)  # return value: rate of change
    }

The model parameters are:

    D    <- 1    # diffusion constant
    Q    <- 1    # uptake rate
    Cext <- 20   # concentration at the downstream boundary

In a first application, the model is solved to steady-state, which retrieves the condition where the concentrations are invariant:

$$ 0 = \frac{\partial}{\partial x} \left( D \cdot \frac{\partial C}{\partial x} \right) - Q $$

In R, steady-state conditions can be estimated using functions from package rootSolve which implement amongst others a Newton-Raphson algorithm (Press et al., 2007). For 1-dimensional models, steady.1D is most efficient. The initial “guess” of the steady-state solution (y) is unimportant; here we take simply N random numbers. Argument nspec = 1 informs the solver that only one component is described.

Although a system of 1000 equations needs to be solved, this takes only a fraction of a second:

    library(rootSolve)
    print(system.time(
      std <- steady.1D(y = runif(Grid$N),
        func = pde1D, parms = NULL, nspec = 1)
    ))

       user  system elapsed
       0.02    0.00    0.02

The values of the state-variables (y) are plotted against the distance, in the middle of the grid cells (Grid$x.mid).

    plot(Grid$x.mid, std$y, type = "l",
         lwd = 2, main = "steady-state PDE",
         xlab = "x", ylab = "C", col = "red")

Figure 5: Steady-state solution of the 1-D diffusion-reaction model.

The analytical solution compares well with the numerical approximation:

    analytical <- Q/2/D*(Grid$x.mid^2 - 10^2) + Cext
    max(abs(analytical - std$y))

    [1] 1.250003e-05

Next the model is run dynamically for 100 time units using deSolve function ode.1D, and starting with a uniform concentration:

    require(deSolve)
    times <- seq(0, 100, by = 1)
    system.time(
      out <- ode.1D(y = rep(1, Grid$N),
        times = times, func = pde1D,
        parms = NULL, nspec = 1)
    )

       user  system elapsed
       0.61    0.02    0.63

Here, out is a matrix, whose 1st column contains the output times, and the next columns the values of the state variables in the different boxes; we print the first columns of the last three rows of this matrix:

    tail(out[, 1:4], n = 3)

           time         1         2         3
     [99,]   98 -27.55783 -27.55773 -27.55754
    [100,]   99 -27.61735 -27.61725 -27.61706
    [101,]  100 -27.67542 -27.67532 -27.67513
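The steady-state comparison above can be cross-checked in base R, because for constant D and Q the discretised steady-state problem is a linear system. The sketch below is an assumption-laden illustration rather than the code used by ReacTran or rootSolve: it uses a coarser hypothetical grid (N = 100 cells), builds the finite-difference matrix by hand with zero flux at x = 0 and the fixed concentration Cext half a cell beyond the last midpoint, and solves it directly with solve.

```r
# Steady state of 0 = D * d2C/dx2 - Q on [0, L], zero gradient at x = 0,
# C = Cext at x = L, discretised on N equal cells (values at cell midpoints)
D <- 1; Q <- 1; Cext <- 20; L <- 10
N <- 100                               # coarser than the grid in the text
dx <- L / N
x_mid <- (seq_len(N) - 0.5) * dx

A <- matrix(0, N, N)
for (i in seq_len(N)) {
  if (i > 1) {                         # exchange through the left face of cell i
    A[i, i - 1] <- A[i, i - 1] + D / dx^2
    A[i, i]     <- A[i, i]     - D / dx^2
  }                                    # i == 1: zero-gradient boundary, no flux
  if (i < N) {                         # exchange through the right face of cell i
    A[i, i + 1] <- A[i, i + 1] + D / dx^2
    A[i, i]     <- A[i, i]     - D / dx^2
  }
}
# Right face of the last cell: fixed concentration Cext half a cell away
A[N, N] <- A[N, N] - 2 * D / dx^2
b <- rep(Q, N)
b[N] <- b[N] - 2 * D * Cext / dx^2

C_num <- solve(A, b)                   # discrete steady-state profile
C_ana <- Q / (2 * D) * (x_mid^2 - L^2) + Cext
max_err <- max(abs(C_num - C_ana))
```

On this coarse grid the maximum deviation from the analytical profile is of the order of 10⁻³, and it shrinks as the number of cells increases. The analytical profile is close to −30 near x = 0, so the negative values in the dynamic output follow from the constant consumption term Q in the model rather than from a numerical artefact.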
We plot the result using a blue-yellow-red color scheme, and using deSolve's S3 method image. Figure 6 shows that, as time proceeds, gradients develop from the uniform distribution, until the system almost reaches steady-state at the end of the simulation.

    image(out, xlab = "time, days",
          ylab = "Distance, cm",
          main = "PDE", add.contour = TRUE)

Figure 6: Dynamic solution of the 1-D diffusion-reaction model.

It should be noted that the steady-state model is effectively a boundary value problem, while the transient model is a prototype of a “parabolic” partial differential equation (LeVeque, 2007).

R can also solve the other two main classes of PDEs, i.e. those of the “hyperbolic” and “elliptic” type, but elaborating on them is well beyond the scope of this paper.

Discussion

Although R is still predominantly applied for statistical analysis and graphical representation, it is more and more suitable for mathematical computing, e.g. in the field of matrix algebra (Bates and Maechler, 2008). Thanks to the differential equation solvers, R is also emerging as a powerful environment for dynamic simulations (Petzoldt, 2003; Soetaert and Herman, 2009; Stevens, 2009).

The new package deSolve has retained all the functionality of its predecessor odesolve (Setzer, 2001), such as the potential to define models both in R code and in compiled languages. However, compared to odesolve, it includes a more complete set of integrators and a more extensive set of options to tune the integration routines, it provides more complete output, and it has extended the applicability domain to include DDEs, DAEs and PDEs.

Thanks to the DAE solvers daspk (Brenan et al., 1996) and radau (Hairer and Wanner, 2010) it is now also possible to model electronic circuits or equilibrium chemical systems. These problems are often of index ≤ 1. In many mechanical systems, physical constraints lead to DAEs of index up to 3, and these more complex problems can be solved with radau.

The inclusion of BVP and PDE solvers has opened up the application area to the field of reactive transport modelling (Soetaert and Meysman, 2010), such that R can now be used to describe quantities that change not only in time, but also along one or more spatial axes. We use it to model how ecosystems change along rivers, or in sediments, but it could equally serve to model the growth of a tumor in human brains, or the dispersion of toxicants in human tissues.

The open source matrix language R has great potential for dynamic modelling, and the tools currently available are suitable for solving a wide variety of practical and scientific problems. The performance is sufficient even for larger systems, especially when models can be formulated using matrix algebra or are implemented in compiled languages like C or Fortran (Soetaert et al., 2010b). Indeed, there is emerging interest in performing statistical analysis on differential equations, e.g. in package nlmeODE (Tornøe et al., 2004) for fitting non-linear mixed-effects models using differential equations, package FME (Soetaert and Petzoldt, 2010) for sensitivity analysis, parameter estimation and Markov chain Monte-Carlo analysis, or package ccems for combinatorially complex equilibrium model selection (Radivoyevitch, 2008).

However, there is ample room for extensions and improvements. For instance, the PDE solvers are quite memory intensive, and could benefit from the implementation of sparse matrix solvers that are more efficient in this respect (for instance, the “preconditioned Krylov” part of the daspk method is not yet supported). In addition, the methods implemented in ReacTran handle equations defined on very simple shapes only. Extending the PDE approach to finite elements (Strang and Fix, 1973) would open up the application domain of R to any irregular geometry. Other spatial discretisation schemes could be added, e.g. for use in fluid dynamics.

Our models are often applied to derive unknown parameters by fitting them against data; this relies on the availability of apt parameter fitting algorithms.

Discussion of these items is highly welcomed, in the new special interest group about dynamic models in R (https://stat.ethz.ch/mailman/listinfo/r-sig-dynamic-models).

Bibliography

U. Ascher, R. Mattheij, and R. Russell. Numerical Solution of Boundary Value Problems for Ordinary Differential Equations. Philadelphia, PA, 1995.

U. M. Ascher and L. R. Petzold. Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations. SIAM, Philadelphia, 1998.

G. Bader and U. Ascher. A new basis implementation for a mixed order boundary value ODE solver. SIAM J. Scient. Stat. Comput., 8:483–500, 1987.

D. Bates and M. Maechler. Matrix: A Matrix Package for R, 2008. R package version 0.999375-9.

P. Bogacki and L. Shampine. A 3(2) pair of Runge-Kutta formulas. Appl. Math. Lett., 2:1–9, 1989.

K. E. Brenan, S. L. Campbell, and L. R. Petzold. Numerical Solution of Initial-Value Problems in Differential-Algebraic Equations. SIAM Classics in Applied Mathematics, 1996.

P. N. Brown, G. D. Byrne, and A. C. Hindmarsh. VODE, a variable-coefficient ODE solver. SIAM J. Sci. Stat. Comput., 10:1038–1051, 1989.

J. C. Butcher. The Numerical Analysis of Ordinary Differential Equations, Runge-Kutta and General Linear Methods. Wiley, Chichester, New York, 1987.

J. R. Cash. 35 Test Problems for Two Way Point Boundary Value Problems, 2009. URL http://www.ma.ic.ac.uk/~jcash/BVP_software/PROBLEMS.PDF.

J. R. Cash and A. H. Karp. A variable order Runge-Kutta method for initial value problems with rapidly varying right-hand sides. ACM Transactions on Mathematical Software, 16:201–222, 1990.

J. R. Cash and F. Mazzia. A new mesh selection algorithm, based on conditioning, for two-point boundary value codes. J. Comput. Appl. Math., 184:362–381, 2005.

A. Couture-Beil, J. T. Schnute, and R. Haigh. PBSddesolve: Solver for Delay Differential Equations, 2010. R package version 1.08.11.

J. R. Dormand and P. J. Prince. A family of embedded Runge-Kutta formulae. J. Comput. Appl. Math., 6:19–26, 1980.

E. Hairer and G. Wanner. Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems. Second Revised Edition. Springer-Verlag, Heidelberg, 2010.

E. Hairer, S. P. Nørsett, and G. Wanner. Solving Ordinary Differential Equations I: Nonstiff Problems. Second Revised Edition. Springer-Verlag, Heidelberg, 2009.

S. Hamdi, W. E. Schiesser, and G. W. Griffiths. Method of lines. Scholarpedia, 2(7):2859, 2007.

A. C. Hindmarsh. ODEPACK, a systematized collection of ODE solvers. In R. Stepleman, editor, Scientific Computing, Vol. 1 of IMACS Transactions on Scientific Computation, pages 55–64. IMACS / North-Holland, Amsterdam, 1983.

W. Hundsdorfer and J. Verwer. Numerical Solution of Time-Dependent Advection-Diffusion-Reaction Equations. Springer Series in Computational Mathematics. Springer-Verlag, Berlin, 2003.

S. M. Iacus. sde: Simulation and Inference for Stochastic Differential Equations, 2008. R package version 2.0.3.

IEEE Standard 754. IEEE standard for floating-point arithmetic, Aug 2008.

A. A. King, E. L. Ionides, and C. M. Breto. pomp: Statistical Inference for Partially Observed Markov Processes, 2008. R package version 0.21-3.

R. J. LeVeque. Finite Difference Methods for Ordinary and Partial Differential Equations, Steady State and Time Dependent Problems. SIAM, 2007.

F. Mazzia and C. Magherini. Test Set for Initial Value Problem Solvers, release 2.4. Department of Mathematics, University of Bari, Italy, 2008. URL http://pitagora.dm.uniba.it/~testset. Report 4/2008.

L. R. Petzold. Automatic selection of methods for solving stiff and nonstiff systems of ordinary differential equations. SIAM J. Sci. Stat. Comput., 4:136–148, 1983.

T. Petzoldt. R as a simulation platform in ecological modelling. R News, 3(3):8–16, 2003.

W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, 3rd edition, 2007.

P. J. Prince and J. R. Dormand. High order embedded Runge-Kutta formulae. J. Comput. Appl. Math., 7:67–75, 1981.

R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2009. URL http://www.R-project.org. ISBN 3-900051-07-0.

T. Radivoyevitch. Equilibrium model selection: dTTP induced R1 dimerization. BMC Systems Biology, 2:15, 2008.

H. H. Robertson. The solution of a set of reaction rate equations. In J. Walsh, editor, Numerical Analysis: An Introduction, pages 178–182. Academic Press, London, 1966.

W. E. Schiesser. The Numerical Method of Lines: Integration of Partial Differential Equations. Academic Press, San Diego, 1991.

R. W. Setzer. The odesolve Package: Solvers for Ordinary Differential Equations, 2001. R package version 0.1-1.

K. Soetaert. rootSolve: Nonlinear Root Finding, Equilibrium and Steady-State Analysis of Ordinary Differential Equations, 2009. R package version 1.6.

K. Soetaert and P. M. J. Herman. A Practical Guide to Ecological Modelling. Using R as a Simulation Platform. Springer, 2009. ISBN 978-1-4020-8623-6.

K. Soetaert and F. Meysman. ReacTran: Reactive Transport Modelling in 1D, 2D and 3D, 2010. R package version 1.2.

K. Soetaert and T. Petzoldt. Inverse modelling, sensitivity and Monte Carlo analysis in R using package FME. Journal of Statistical Software, 33(3):1–28, 2010. URL http://www.jstatsoft.org/v33/i03/.

K. Soetaert, J. R. Cash, and F. Mazzia. bvpSolve: Solvers for Boundary Value Problems of Ordinary Differential Equations, 2010a. R package version 1.2.

K. Soetaert, T. Petzoldt, and R. W. Setzer. Solving differential equations in R: Package deSolve. Journal of Statistical Software, 33(9):1–25, 2010b. ISSN 1548-7660. URL http://www.jstatsoft.org/v33/i09.

K. Soetaert, T. Petzoldt, and R. W. Setzer. R Package deSolve: Writing Code in Compiled Languages, 2010c. deSolve vignette, R package version 1.8.

K. Soetaert, T. Petzoldt, and R. W. Setzer. R Package deSolve: Solving Initial Value Differential Equations, 2010d. deSolve vignette, R package version 1.8.

M. H. H. Stevens. A Primer of Ecology with R. Use R Series. Springer, 2009. ISBN 978-0-387-89881-0.

G. Strang and G. Fix. An Analysis of The Finite Element Method. Prentice Hall, 1973.

C. W. Tornøe, H. Agersø, E. N. Jonsson, H. Madsen, and H. A. Nielsen. Non-linear mixed-effects pharmacokinetic/pharmacodynamic modelling in nlme using differential equations. Computer Methods and Programs in Biomedicine, 76:31–40, 2004.

B. van der Pol and J. van der Mark. Frequency demultiplication. Nature, 120:363–364, 1927.

Karline Soetaert
Netherlands Institute of Ecology
K.Soetaert@nioo.knaw.nl

Thomas Petzoldt
Technische Universität Dresden
Thomas.Petzoldt@tu-dresden.de

R. Woodrow Setzer
US Environmental Protection Agency
Setzer.Woodrow@epamail.epa.gov
Table 2: Summary of the main functions that solve differential equations.

Function        Package      Description
ode             deSolve      IVP of ODEs, full, banded or arbitrary sparse Jacobian
ode.1D          deSolve      IVP of ODEs resulting from 1-D reaction-transport problems
ode.2D          deSolve      IVP of ODEs resulting from 2-D reaction-transport problems
ode.3D          deSolve      IVP of ODEs resulting from 3-D reaction-transport problems
daspk           deSolve      IVP of DAEs of index ≤ 1, full or banded Jacobian
radau           deSolve      IVP of DAEs of index ≤ 3, full or banded Jacobian
dde             PBSddesolve  IVP of delay differential equations, based on Runge-Kutta formulae
dede            deSolve      IVP of delay differential equations, based on Adams and BDF formulae
bvpshoot        bvpSolve     BVP of ODEs; the shooting method
bvptwp          bvpSolve     BVP of ODEs; mono-implicit Runge-Kutta formula
bvpcol          bvpSolve     BVP of ODEs; collocation formula
steady          rootSolve    steady-state of ODEs; full, banded or arbitrary sparse Jacobian
steady.1D       rootSolve    steady-state of ODEs resulting from 1-D reaction-transport problems
steady.2D       rootSolve    steady-state of ODEs resulting from 2-D reaction-transport problems
steady.3D       rootSolve    steady-state of ODEs resulting from 3-D reaction-transport problems
tran.1D         ReacTran     numerical approximation of 1-D advective-diffusive transport problems
tran.2D         ReacTran     numerical approximation of 2-D advective-diffusive transport problems
tran.3D         ReacTran     numerical approximation of 3-D advective-diffusive transport problems

Table 3: Summary of the auxiliary functions that solve differential equations.

Function        Package      Description
lsoda           deSolve      IVP ODEs, full or banded Jacobian, automatic choice for stiff or non-stiff method
lsodar          deSolve      same as lsoda, but includes a root-solving procedure
lsode, vode     deSolve      IVP ODEs, full or banded Jacobian, user specifies if stiff or non-stiff
lsodes          deSolve      IVP ODEs, arbitrary sparse Jacobian, stiff method
rk4, rk, euler  deSolve      IVP ODEs, using Runge-Kutta and Euler methods
zvode           deSolve      IVP ODEs, same as vode, but for complex variables
runsteady       rootSolve    steady-state ODEs by dynamically running, full or banded Jacobian
stode           rootSolve    steady-state ODEs by Newton-Raphson method, full or banded Jacobian
stodes          rootSolve    steady-state ODEs by Newton-Raphson method, arbitrary sparse Jacobian
Source References

by Duncan Murdoch

Abstract Since version 2.10.0, R includes expanded support for source references in R code and ‘.Rd’ files. This paper describes the origin and purposes of source references, and current and future support for them.

One of the strengths of R is that it allows “computation on the language”, i.e. the parser returns an R object which can be manipulated, not just evaluated. This has applications in quality control checks, debugging, and elsewhere. For example, the codetools package (Tierney, 2009) examines the structure of parsed source code to look for common programming errors. Functions marked by debug() can be executed one statement at a time, and the trace() function can insert debugging statements into any function.

Computing on the language is often enhanced by being able to refer to the original source code, rather than just to a deparsed (reconstructed) version of it based on the parsed object. To support this, we added source references to R 2.5.0 in 2007. These are attributes attached to the result of parse() or (as of 2.10.0) parse_Rd() to indicate where a particular part of an object originated. In this article I will describe their structure and how they are used in R. The article is aimed at developers who want to create debuggers or other tools that use the source references, at users who are curious about R internals, and also at users who want to use the existing debugging facilities. The latter group may wish to skip over the gory details and go directly to the section “Using source references”.

The R parsers

We start with a quick introduction to the R parser. The parse() function returns an R object of type "expression". This is a list of statements; the statements can be of various types. For example, consider the R source shown in Figure 1.

    1: x <- 1:10       # Initialize x
    2: for (i in x) {
    3:   print(i)      # Print each entry
    4: }
    5: x

Figure 1: The contents of ‘sample.R’.

If we parse this file, we obtain an expression of length 3:

    > parsed <- parse("sample.R")
    > length(parsed)
    [1] 3
    > typeof(parsed)
    [1] "expression"

The first element is the assignment, the second element is the for loop, and the third is the single x at the end:

    > parsed[[1]]
    x <- 1:10
    > parsed[[2]]
    for (i in x) {
        print(i)
    }
    > parsed[[3]]
    x

The first two elements are both of type "language", and are made up of smaller components. The difference between an "expression" and a "language" object is mainly internal: the former is based on the generic vector type (i.e. type "list"), whereas the latter is based on the "pairlist" type. Pairlists are rarely encountered explicitly in any other context. From a user point of view, they act just like generic vectors.

The third element x is of type "symbol". There are other possible types, such as "NULL", "double", etc.: essentially any simple R object could be an element.

The comments in the source code and the white space making up the indentation of the third line are not part of the parsed object.

The parse_Rd() function parses ‘.Rd’ documentation files. It also returns a recursive structure containing objects of different types (Murdoch and Urbanek, 2009; Murdoch, 2010).

Source reference structure

As described above, the result of parse() is essentially a list (the "expression" object) of objects that may be lists (the "language" objects) themselves, and so on recursively. Each element of this structure from the top down corresponds to some part of the source file used to create it: in our example, parsed[[1]] corresponds to the first line of ‘sample.R’, parsed[[2]] is the second through fourth lines, and parsed[[3]] is the fifth line.

The comments and indentation, though helpful to the human reader, are not part of the parsed object. However, by default the parsed object does contain a "srcref" attribute:

    > attr(parsed, "srcref")
    [[1]]
    x <- 1:10

    [[2]]
    for (i in x) {
      print(i)      # Print each entry
    }

    [[3]]
    x

Although it appears that the "srcref" attribute contains the source, in fact it only references it, and the print.srcref() method retrieves it for printing. If we remove the class from each element, we see the true structure:

    > lapply(attr(parsed, "srcref"), unclass)

    [[1]]
    [1] 1 1 1 9 1 9
    attr(,"srcfile")
    sample.R

    [[2]]
    [1] 2 1 4 1 1 1
    attr(,"srcfile")
    sample.R

    [[3]]
    [1] 5 1 5 1 1 1
    attr(,"srcfile")
    sample.R

Each element is a vector of 6 integers: (first line, first byte, last line, last byte, first character, last character). The values refer to the position of the source for each element in the original source file; the details of the source file are contained in a "srcfile" attribute on each reference.

The reason both bytes and characters are recorded in the source reference is historical. When source references were introduced, they were mainly used for retrieving source code for display; for this, bytes are needed. Since R 2.9.0, they have also been used to aid in error messages. Since some characters take up more than one byte, users need to be informed about character positions, not byte positions, and so the last two entries were added.

The "srcfile" attribute is also not as simple as it looks. For example,

    > srcref <- attr(parsed, "srcref")[[1]]
    > srcfile <- attr(srcref, "srcfile")
    > typeof(srcfile)
    [1] "environment"

    > ls(srcfile)
    [1] "Enc"       "encoding"  "filename"
    [4] "timestamp" "wd"

The "srcfile" attribute is actually an environment containing an encoding, a filename, a timestamp, and a working directory. These give information about the file from which the parser was reading. The reason it is an environment is that environments are reference objects: even though all three source references contain this attribute, in actuality there is only one copy stored. This was done to save memory, since there are often hundreds of source references from each file.

Source references in objects returned by parse_Rd() use the same structure as those returned by parse(). The main difference is that in Rd objects source references are attached to every component, whereas parse() only constructs source references for complete statements, not for their component parts, and they are attached to the container of the statements. Thus, for example, a braced list of statements processed by parse() will receive a "srcref" attribute containing source references for each statement within, while the statements themselves will not hold their own source references, and sub-expressions within each statement will not generate source references at all. In contrast, the "srcref" attribute for a section in an ‘.Rd’ file will be a source reference for the whole section, and each component part in the section will have its own source reference.

Relation to the "source" attribute

By default the R parser also creates an attribute named "source" when it parses a function definition. When available, this attribute is used by default in lieu of deparsing to display the function definition. It is unrelated to the "srcref" attribute, which is intended to point to the source, rather than to duplicate the source. An integrated development environment (IDE) would need to know the correspondence between R code in R and the true source, and "srcref" attributes are intended to provide this.

When are "srcref" attributes added?

As mentioned above, the parser adds a "srcref" attribute by default. For this, it is assumed that options("keep.source") is left at its default setting of TRUE, and that parse() is given a filename as argument file, or a character vector as argument text. In the latter case, there is no source file to reference, so parse() copies the lines of source into a "srcfilecopy" object, which is simply a "srcfile" object that contains a copy of the text.

Developers may wish to add source references in other situations. To do that, an object inheriting from class "srcfile" should be passed as the srcfile argument to parse().

The other situation in which source references are likely to be created in R code is when calling
source(). The source() function calls parse(), creating the source references, and then evaluates the resulting code. At this point newly created functions will have source references attached to the body of the function.

The section “Breakpoints” below discusses how to make sure that source references are created in package code.

Using source references

Error locations

For the most part, users need not be concerned with source references, but they interact with them frequently. For example, error messages make use of them to report on the location of syntax errors:

    > source("error.R")
    Error in source("error.R") : error.R:4:1: unexpected 'else'
    3: print( "less" )
    4: else
       ^

A more recent addition is the use of source references in code being executed. When R evaluates a function, it evaluates each statement in turn, keeping track of any associated source references. As of R 2.10.0, these are reported by the debugging support functions traceback(), browser(), recover(), and dump.frames(), and are returned as an attribute on each element returned by sys.calls(). For example, consider the function shown in Figure 2.

    1: # Compute the absolute value
    2: badabs <- function(x) {
    3:   if (x < 0)
    4:     x <- -x
    5:   x
    6: }

Figure 2: The contents of ‘badabs.R’.

This function is syntactically correct, and works to calculate the absolute value of scalar values, but is not a valid way to calculate the absolute values of the elements of a vector, and when called it will generate an incorrect result and a warning:

    > source("badabs.R")
    > badabs( c(5, -10) )
    [1]   5 -10

    Warning message:
    In if (x < 0) x <- -x :
      the condition has length > 1 and only the first
      element will be used

In this simple example it is easy to see where the problem occurred, but in a more complex function it might not be so simple. To find it, we can convert the warning to an error using

    > options(warn=2)

and then re-run the code to generate an error. After generating the error, we can display a stack trace:

    > traceback()
    5: doWithOneRestart(return(expr), restart)
    4: withOneRestart(expr, restarts[[1L]])
    3: withRestarts({
           .Internal(.signalCondition(
               simpleWarning(msg, call), msg, call))
           .Internal(.dfltWarn(msg, call))
       }, muffleWarning = function() NULL) at badabs.R#2
    2: .signalSimpleWarning("the condition has length
         > 1 and only the first element will be used",
         quote(if (x < 0) x <- -x)) at badabs.R#3
    1: badabs(c(5, -10))

To read a traceback, start at the bottom. We see our call from the console as line “1:”, and the warning being signalled in line “2:”. At the end of line “2:” it says that the warning originated “at badabs.R#3”, i.e. line 3 of the ‘badabs.R’ file.

Breakpoints

Users may also make use of source references when setting breakpoints. The trace() function lets us set breakpoints in particular R functions, but we need to specify which function and where to do the setting. The setBreakpoint() function is a more friendly front end that uses source references to construct a call to trace(). For example, if we wanted to set a breakpoint on ‘badabs.R’ line 3, we could use

    > setBreakpoint("badabs.R#3")

    D:svnpaperssrcrefsbadabs.R#3:
     badabs step 2 in <environment: R_GlobalEnv>

This tells us that we have set a breakpoint in step 2 of the function badabs found in the global environment. When we run it, we will see

    > badabs( c(5, -10) )

    badabs.R#3
    Called from: badabs(c(5, -10))
    Browse[1]>

telling us that we have broken into the browser at the requested line, and it is waiting for input. We could then examine x, single step through the code, or do any other action of which the browser is capable.

By default, most packages are built without source reference information, because it adds quite substantially to the size of the code. However, setting
the environment variable R_KEEP_PKG_SOURCE=yes before installing a source package will tell R to keep the source references, and then breakpoints may be set in package source code. The envir argument to setBreakpoint() will need to be set in order to tell it to search outside the global environment when setting breakpoints.

The #line directive

In some cases, R source code is written by a program, not by a human being. For example, Sweave() extracts lines of code from Sweave documents before sending the lines to R for parsing and evaluation. To support such preprocessors, the R 2.10.0 parser recognizes a new directive of the form

    #line nn "filename"

where nn is an integer. As with the same-named directive in the C language, this tells the parser to assume that the next line of source is line nn from the given filename for the purpose of constructing source references. The Sweave() function doesn’t currently make use of this, but in the future, it (and other preprocessors) could output #line directives so that source references and syntax errors refer to the original source location rather than to an intermediate file.

The #line directive was a late addition to R 2.10.0. Support for this in Sweave() appeared in R 2.12.0.

The future

The source reference structure could be improved. First, it adds quite a lot of bulk to R objects in memory. Each source reference is an integer vector of length 6 with a class and "srcfile" attribute. It is hard to measure exactly how much space this takes because much is shared with other source references, but it is on the order of 100 bytes per reference. Clearly a more efficient design is possible, at the expense of moving support code to C from R. As part of this move, the use of environments for the "srcfile" attribute could be dropped: they were used as the only available R-level reference objects. For developers, this means that direct access to particular parts of a source reference should be localized as much as possible: they should write functions to extract particular information, and use those functions where needed, rather than extracting information directly. Then, if the implementation changes, only those extractor functions will need to be updated.

Finally, source level debugging could be implemented to make use of source references, to single step through the actual source files, rather than displaying a line at a time as the browser() does.

Bibliography

D. Murdoch. Parsing Rd files. 2010. URL http://developer.r-project.org/parseRd.pdf.

D. Murdoch and S. Urbanek. The new R help system. The R Journal, 1/2:60–65, 2009.

L. Tierney. codetools: Code Analysis Tools for R, 2009. R package version 0.2-2.

Duncan Murdoch
Dept. of Statistical and Actuarial Sciences
University of Western Ontario
London, Ontario, Canada
murdoch@stats.uwo.ca
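The source reference structure described in this article can be explored without a file on disk. The sketch below assumes a recent version of R in which parse() accepts keep.source = TRUE together with the text argument. The helper names srcref_first_line() and srcref_last_line() are hypothetical; they are introduced only to illustrate the advice in “The future” to wrap access to a source reference in small extractor functions. Newer versions of R may store more than the six integers described here, which is one reason the helpers touch only the first and third entries.

```r
# The contents of 'sample.R' from Figure 1, supplied as a character vector
code <- c("x <- 1:10       # Initialize x",
          "for (i in x) {",
          "  print(i)      # Print each entry",
          "}",
          "x")

# keep.source = TRUE asks the parser to attach source references
parsed <- parse(text = code, keep.source = TRUE)
refs   <- attr(parsed, "srcref")   # one reference per top-level statement

# Hypothetical extractor functions: the only places that know the layout
srcref_first_line <- function(ref) as.integer(ref)[1]
srcref_last_line  <- function(ref) as.integer(ref)[3]

first.lines <- vapply(refs, srcref_first_line, integer(1))
last.lines  <- vapply(refs, srcref_last_line,  integer(1))

# The referenced source text is retrieved on demand, without the comment
first.stmt <- as.character(refs[[1]])

# With text input there is no file, so the "srcfile" is a "srcfilecopy"
srcfile <- attr(refs[[1]], "srcfile")

print(data.frame(first = first.lines, last = last.lines))
print(first.stmt)
print(class(srcfile))
```

If the internal layout of a source reference changes, only the two helper functions in this sketch would need to be revised; the code that uses them stays the same.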