Experience
                                    R project
                                     R spatial
  Managing research in collaborative networks




  Open Source software, research and higher
       education: a practitioner’s view
GoOpen 2010 (Fou thread), Aker Brygge, Oslo, 19–20 April.


                                     Roger Bivand

                        Department of Economics
        Norwegian School of Economics and Business Administration
                             Bergen, Norway


                                     20 April 2010



                                Roger Bivand     A practitioner’s view

Outline
      This talk will examine how open source software development
      and use may interact with their institutional contexts in
      research and higher education

      The talk will be based on experience of open source
      development in applied statistics and geospatial applications

      Reasons will be discussed for the mismatch between the ways in
      which open source development occurs and an institutional
      context that prefers secrecy when applying for funding,
      restricted deliverables, and races to publication

      In particular, the roles of mutual trust and community-building
      in open source development will be stressed; these factors
      appear to express externalities between developers and users
      of software that are neglected in the exclusive management
      models prevalent in research and higher education

Contextual background

     To justify presenting a “practitioner’s view”, some background
     information beyond my affiliation may be useful

     Although employed in the Department of Economics at
     Norges Handelshøyskole, I am an academic geographer,
     educated in Cambridge and at the London School of Economics

     My specialities within geography are quantitative methods
     and geographical information systems, and I have used and
     developed software since 1973, for research and teaching

     During the EU 5th Framework, I was involved in the
     evaluation of three open source Information Society
     Technologies (IST) calls; I also founded the MBA programmes
     at Warsaw University of Technology in 1991/92


Little languages
      My first “open source” publication was an extra module for the
      proprietary program Systat, with both source code and DOS binaries
      available for FTP download, and an accompanying paper in
      Computers & Geosciences in 1992

      While much early software (Fortran, later C) was compiled (I
      had only limited exposure to BASIC), by the 1980s little languages,
      generally interpreted, began to appear as glue for compiled programs

      The languages covered in two of my papers published in 1996 and
      1997 were the Unix shell scripting language and AWK, used as glue
      for the GRASS GIS, and for GMT for map production; I have been
      using Unix/Linux since 1985

      In these papers and other work in the mid 1990s, I pointed up the
      benefits of scripting in permitting work to be reproduced and
      audited, in contrast to the non-journalling GUIs that were becoming
      prevalent in academic practice

Glimpse from 1997
  Here is a slide from a talk given in Italy about software for
  handling geographical information (GI) in early 1997:


  MAPPING GI USERS:

  Left side: MORE standardised tasks, LESS need for open software
  Right side: LESS standardised tasks, MORE need for open software

  PRODUCTION (upper left): high training costs, application
  specific macro languages, few linking requirements (cf. COTS)

  PROFESSIONALS (upper right): as consultants customising GI
  handling technologies for clients in long/medium term
  relationships; as researchers in GI handling technologies

  CASUAL (lower left): generic likeness to familiar GUI, looks &
  behaves like Excel or Netscape (cf. plug-ins)

  CURIOUS (lower right): as researchers analysing geographic
  information; as citizens challenging the use of GI by private
  companies and public administration





Using the R project

      My first message to the R project was in mid-January 1997, as
      I had begun using early alpha releases to re-implement a
      number of spatial analysis functions

      The initial motivation for systematising code for spatial
      data analysis functions was a course given in the University of
      Bergen Department of Geography; we were a joint department
      until administrative changes split us

      By 1998, Albrecht Gebhardt (Klagenfurt, Austria) and I had
      provided code for most simple spatial data analysis in R,
      either porting existing code or writing fresh contributions
      (presentation at a congress in Vienna)

      But what is the R project?


www.r-project.org
  While its website offers no eye candy, R is becoming a central resource for
  statistical and computational data analysis across the sciences and
  in business:





The R project

     R is a language and environment for statistical computing and
     graphics; it is a GNU project which is similar to the S language
     and environment which was developed at Bell Laboratories (formerly
     AT&T, now Alcatel-Lucent) by John Chambers and colleagues

     R can be considered as a different implementation of S. There are
     some important differences, but much code written for S runs
     unaltered under R

     The term “environment” is intended to characterize it as a fully
     planned and coherent system, rather than an incremental accretion
     of very specific and inflexible tools, as is frequently the case with
     other data analysis software

     Many users think of R as a statistics system. We prefer to think of
     it as an environment within which statistical techniques are
     implemented; R can be extended (easily) via packages


 The R foundation

 The R project began as an academic
initiative with no funding in Auckland,
New Zealand, and was licensed under the
GPL as more collaborators joined. This
group was strengthened by academic
contributors to S, who began to work
with R in the late 1990s. By 2002, a
more formal structure was needed, and a
foundation was formed. I was invited to
join as an ordinary member in March
2003, so I have seen things “from the
kitchen” since then.



 The R community

 While the software system was intended
to be “fully planned and coherent”, the
community that has grown up around R
is neither planned nor coherent. Since
1997, there have been two main mailing
lists, one for users, the other for
developers. John Fox (another non-core
ordinary foundation member) has
described the social structure of the
project in a recent paper in the R
Journal, from which this graph is taken:




 CRAN and contributed packages

 The community has also grown thanks
to the ease with which packages may be
contributed. Neither writing packages
nor checking them formally against R is
hard; the check process executes all
the examples on the help pages and
other documentation. The
Comprehensive R Archive Network
(CRAN) distributes R itself (source
and binaries for multiple platforms) and
packages (source and binaries), and
packages may also be installed and
updated from within R.
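
 The principle that documented examples are executed as checks can be
illustrated with a loose analogue. The sketch below uses Python's
standard doctest module rather than R, and it is not R's check
mechanism; the names centroid and check_examples are hypothetical and
chosen only for illustration.

```python
import doctest


def centroid(points):
    """Return the mean x and y of a list of (x, y) points.

    The example below is documentation, but it is also executed as a
    check, so it cannot silently go stale.

    >>> centroid([(0.0, 0.0), (2.0, 4.0)])
    (1.0, 2.0)
    """
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def check_examples(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Build a test from the docstring, giving the examples access to func.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted
```

 Calling check_examples(centroid) returns (0, 1): one example attempted,
none failed. If the function changed without its documented example
being updated, the failure count would be non-zero, which is the kind
of signal a nightly check run is meant to surface.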



CRAN





 CRAN task views

 Since so many packages have been
contributed to R, and distributed
through CRAN, it became necessary to
provide a mechanism for guiding users
towards solutions to their problems. It is
helpful to see the complexity of CRAN
as an advantage, with “ecologically” more
fit packages establishing themselves in
“niches”, possibly even in competition
with other packages providing similar
facilities. Task views have been added as
a light-weight non-authoritative way of
offering suggestions:



 R Forge

 In addition to CRAN running the
released, patched, and development
versions of R on the CRAN packages’
examples nightly, packages may also be
hosted on the R Forge repository. This
provides the usual *forge services, such
as SVN, but also builds Windows and
OS X binary packages, and checks
package source on multiple platforms
nightly. So even alpha or beta packages
may be made available, and may begin
to harvest user input, before being
released to CRAN:



R spatial
      In 1999 I interfaced R and the open source GIS GRASS, and
      presented a paper on this at a Scandinavian GIS meeting; the
      paper was rejected by Norsk Geografisk Tidsskrift, but published in
      extended form in Computers & Geosciences in 2000

      This, together with the publication in Journal of Geographical
      Systems of a paper based on my 1998 presentation with Albrecht
      Gebhardt, and a presentation with Markus Neteler, the lead GRASS
      developer, at the 2000 GeoComputation conference, led to closer
      personal contacts with R core

      Kurt Hornik, who runs CRAN, encouraged me to talk about R and
      GIS at the March 2001 Distributed Statistical Computing meeting in
      Vienna, at which I got to know active developers personally

      By the next DSC meeting in March 2003, I was organising a
      thematic session on spatial statistics, and a crucial fringe developers’
      workshop to discuss how to advance spatial data analysis in R

 CRAN Spatial task view

 Since 2003, a number of
community-building steps have been
taken over and above developing
contributed packages. From the CRAN
side, the Spatial task view is the hub,
from which traffic is channelled to package
pages and to ancillary websites, as well
as the special interest group mailing list.
Some package authors contact me to
ask to be included, others are asked
whether they want to be added to the
web of information.



 R-sig-geo mailing list
 Following the 2003 workshop, we
started a project on Sourceforge to
permit joint development, and a mailing
list served within the family of R lists
from Zurich. Traffic on the list has
grown steadily, with a subscribed
membership in April 2010 of over 1600.
Naturally, many of these “lurk” without
posting, while others post without
helping, and many fewer help by
answering posted questions. This final
group is however growing, and since the
list archives are also kept on Nabble,
they are easy to search for information.

 [Figure: monthly number of emails on r-sig-geo, 2004 to 2010; vertical axis: # of emails, 0 to 300]





  The sp package

 In 2003, we agreed that a shared system of
new-style classes to contain spatial data would
permit many-to-one and one-to-many conversion
of representations, avoiding the then prevalent
many-to-many conversion problem. The idea
was to make it easier for GIS people and stats
people to work together by creating objects that
“looked” familiar to both groups, although the
groups differ a lot in how they “see” data
objects. Package dependencies have grown: here
the upper diagram shows packages depending
on sp in April 2008, the lower diagram those in
April 2010:
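
 The conversion argument behind shared classes can be sketched in a few
lines. The sketch is in Python rather than R, and it does not reproduce
the sp classes: SharedPoints, the two toy formats ("rows" and
"columns") and all function names are hypothetical. Each format needs
only one reader into the shared class and one writer out of it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SharedPoints:
    """Hypothetical shared class: point coordinates plus one value per point."""
    coords: List[Tuple[float, float]]
    values: List[float]


def from_rows(rows):
    # Format "rows": a list of (x, y, value) tuples.
    return SharedPoints([(x, y) for x, y, _ in rows], [v for _, _, v in rows])


def to_rows(pts):
    return [(x, y, v) for (x, y), v in zip(pts.coords, pts.values)]


def from_columns(cols):
    # Format "columns": a dict of equal-length lists keyed "x", "y", "value".
    return SharedPoints(list(zip(cols["x"], cols["y"])), list(cols["value"]))


def to_columns(pts):
    return {
        "x": [x for x, _ in pts.coords],
        "y": [y for _, y in pts.coords],
        "value": list(pts.values),
    }


# One reader (many-to-one) and one writer (one-to-many) per format.
READERS = {"rows": from_rows, "columns": from_columns}
WRITERS = {"rows": to_rows, "columns": to_columns}


def convert(data, source, target):
    """Convert between any two registered formats via the shared class."""
    return WRITERS[target](READERS[source](data))


def converters_needed(n_formats):
    """Return (direct pairwise converters, converters via a shared class)."""
    return n_formats * (n_formats - 1), 2 * n_formats
```

 With ten formats, converters_needed(10) gives (90, 20): ninety direct
converters if every pair is handled separately, against twenty when
everything passes through the shared class. Adding an eleventh format
then costs two new functions rather than twenty.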



 R Wiki
 In addition to the “coordinated”
information sources, a community Wiki
does exist. While it seems to suit some
users, the general impression (among
older people?) is that there is little
feeling of responsibility for following up
tips given there. On the mailing list and
its archive, experienced developers or
users will usually clarify
misunderstandings, while on the Wiki,
posters do not feel obliged to update
their contributions, as when examples
stop working (they are never run,
unlike CRAN package examples):


 Spatial on R Forge

 R Forge is used actively by individuals
and groups in developing packages for
spatial data analysis, with 52 projects
registered in April 2010. Some projects
are registered in more than one topical
area, some may never mature, but some
are already in active use; the raster
package is already frequently discussed
on R-sig-geo — it was released to CRAN
in late March 2010 after a gestation of
16 months.




 Book website
 Finally, I’ll mention a book that I wrote
with Edzer Pebesma and Virgilio
Gómez-Rubio, and published in the
Springer useR series in 2008. Not only
does the book seem to be doing OK, but
the website with dataset and code
download is visited frequently (450–600
unique visitors per month). The code is
run nightly against current R and the
various required contributed packages. It
may be of interest to note that the text
was written using the literate
programming tool Sweave in R, which is
designed to support reproducible
research (as indeed is this talk).

Managing research in higher education
      While the links between the knowledge economy and Open
      Source software are evident, there are very real challenges to
      the management of research and higher education in policy
      terms that need to be addressed

      Most research and higher education organisations have been
      rationalised and subjected to the styles of management
      practices introduced in commercial corporations years and
      even decades ago

      In particular, budget discipline is a favoured tool in attempting
      to point organisational units in directions seen as being
      appropriate

      Given that these organisations clearly face a “missing market”,
      in that neither potential students nor grant-giving bodies are
      analogues of customers in a fast-food restaurant, those
      responsible for management have a measurement problem

Grant processes
      Universities and research institutions appear to “compete” in
      grant processes, and thereby seem to have an interest in
      locking potential competitors out, by securing privileged
      access to knowledge

      While such advantage may be quite real in the case of
      laboratory skills and quality (the institution does deliver
      services of higher quality), or when the institution has secured
      the services of high-flying academics, this model is not
      directly transferable to software

      Given the steadily increasing importance of software in
      teaching and research, it seems clear that care is needed in
      constructing management tools for activities which may
      produce or modify software (see the UEA “climategate”
      scandal)

Software deliverables

      It does make sense for institutions to develop expertise in
      customising software, in training, and in publishing materials
      of benefit to software users on a for-profit basis

      It does not in general, however, make sense to mandate source
      closure in research programs or projects, just as
      mandating openness might be mistaken

      The question of whether software deliverables, or software
      developed in the process of creating deliverables, should be
      opened is one that is relevant in all grant processes

      It is also highly relevant in evaluation routines associated with
      program and project execution



Handling software in research projects

      In grant awarding and evaluation processes, the grant-making
      body should consider at least two factors: the importance of
      Open Source for enhanced efficiency in providing the software
      needed in a project, and the importance of reproducibility and
      peer-review in the scientific process generally

      It can thus be argued that the management of the boundary
      between what the institution “owns”, what can sensibly be
      commercialised on a for-profit basis, and research productivity
      and efficiency deserves attention

      Otherwise, naive and rather outdated management practices
      can endanger research quality and productivity with regard to
      software innovation and incremental improvement by seeing
      products where one should see services


Software, research and higher education
      There are clearly cases in which source code should not be
      opened, although when the public purse has funded the
      research involved, the number of real cases will in practice be
      very few, even for projects with very small communities of
      interest

      It is important for enabling and empowering actors in the
      knowledge economy that unnecessary barriers to the diffusion
      of knowledge be removed, and that new ones not be permitted
      to emerge

      As a corollary, researchers should perhaps be given incentives
      in career terms to contribute to the pool of knowledge by
      opening source code, and by contributing to the improvement
      of software in their domain of science, in the same way that
      publications are rewarded

Round-up
     As far as I am aware, no research council has played any
     relevant role in the progress of the R project directly

     Indirectly, research council funded projects have included
     software deliverables defined as contributed packages,
     including spatial packages (but none that I have handled)

     Even more indirectly, people in research council funded
     doctoral and post-doctoral positions have not only used R and
     R spatial, but have contributed to software development, even
     though this was not required or mentioned in their projects

     Finally, the diffuseness and unpredictability of collaborative
     networks of “amateur” developers make it very hard to reply to
     calls; if a research council wanted to be pro-active, it might
     fund travel for active developers to enable them to meet, or
     similar enabling measures
  • 5.
    Experience R project R spatial Managing research in collaborative networks Glimpse from 1997 Here is a slide from a talk given in Italy about software for handling geographical information (GI) in early 1997: MAPPING GI USERS: PRODUCTION: high training costs, PROFESSIONALS: as consultants customising application specific macro languages, GI handling technologies for clients in long/ few linking requirements (cf. COTS) medium term relationships; as researchers in GI handling technologies MORE STANDARDISED TASKS LESS LESS NEED OPEN SOFTWARE MORE CASUAL: generic likeness to CURIOUS: as researchers analysing familiar GUI, looks & behaves geographic information; as citizens like Excel or Netscape (cf. challenging the use of GI by private plug-ins) companies and public administration Roger Bivand A practitioner’s view
  • 6.
    Experience R project R spatial Managing research in collaborative networks Using the R project My first message to the R project was in mid January 1997, as I had begun using early alpha releases to re-implement a number of spatial analysis functions The initial motivation to systematise code for functions for spatial data analysis was for a course given in the University of Bergen Department of Geography; we were a joint department until administrative changes split us By 1998, Albrecht Gebhardt (Klagenfurt, Austria) and I had provided code for most simple spatial data analysis for R, either porting existing code, or writing fresh contributions (presentation at a congress in Vienna) But what is the R project? Roger Bivand A practitioner’s view
  • 7.
    Experience R project R spatial Managing research in collaborative networks www.r-project.org While its website is non-candy, R is becoming a central resource for statistical and computational data analysis across the sciences and in business: Roger Bivand A practitioner’s view
  • 8.
    Experience R project R spatial Managing research in collaborative networks The R project R is a language and environment for statistical computing and graphics — it is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Alcatel–Lucent) by John Chambers and colleagues R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R The term “environment” is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software Many users think of R as a statistics system. We prefer to think of it of an environment within which statistical techniques are implemented — R can be extended (easily) via packages Roger Bivand A practitioner’s view
  • 9.
    Experience R project R spatial Managing research in collaborative networks The R foundation The R project began as an academic initiative with no funding in Auckland, New Zealand, and was licensed under GPL as more collaborators joined. This group was strengthed by academic contributors to S, who began to work with R in the late 1990s. By 2002, a more formal structure was needed, and a foundation was formed. I was invited to join as an ordinary member in March 2003, so have seen things “from the kitchen” since then. Roger Bivand A practitioner’s view
  • 10.
    Experience R project R spatial Managing research in collaborative networks The R community While the software system was intended to be “fully planned and coherent” the , community that has grown up around R is neither planned nor coherent. Since 1997, there have been two main mailing lists, one for users, the other for developers. John Fox (another non-core ordinary foundation member) has described the social structure of the project in a recent paper in the R Journal, from which this graph is taken: Roger Bivand A practitioner’s view
  • 11.
    Experience R project R spatial Managing research in collaborative networks CRAN and contributed packages The community has also grown thanks to the ease with which packages may be contributed. Both writing packages, and their formal checking against R are not hard — the check process executes all the examples on the help pages and other documentation. The comprehensive R archive network (CRAN) thus distributes R itself (source and binaries for multiple platforms) and packages (source and binaries), and packages may also be installed and updated from within R. Roger Bivand A practitioner’s view
  • 12.
    Experience R project R spatial Managing research in collaborative networks CRAN Roger Bivand A practitioner’s view
  • 13.
    Experience R project R spatial Managing research in collaborative networks CRAN task views Since so many packages have been contributed to R, and distributed through CRAN, it became necessary to provide a mechanism for guiding users towards solutions to their problems. It is helpful to see the complexity of CRAN as an advantage, with “ecologically” more fit packages establishing themselves in “niches” possibly even in competition with other packages providing similar facilities. Task views have been added as a light-weight non-authoritative way of offering suggestions: Roger Bivand A practitioner’s view
  • 14.
    Experience R project R spatial Managing research in collaborative networks R Forge In addition to CRAN running the released, patched, and development versions of R on the CRAN packages’ examples nightly, packages may also be hosted on the R Forge repository. This provides the usual *forge services, such as SVN, but also builds Windows and OSX binary packages, and checks package source on multiple platforms nightly. So even alpha or beta packages may be made available, and may begin to harvest user input, before being released to CRAN: Roger Bivand A practitioner’s view
  • 15.
    Experience R project R spatial Managing research in collaborative networks R spatial In 1999 I had interfaced R and the open source GIS GRASS, and presented a paper on this at a Scandinavian GIS meeting — the paper was rejected by Norsk Geografisk Tidsskrift, but published in extended form in Computers & Geosciences in 2000 This, and the publication of a paper based on my 1998 presentation with Albrecht Gebhardt in Journal of Geographical Systems, and a presentation with Markus Neteler, the lead GRASS developer at the 2000 GeoComputation conference, led to closer personal contacts with R core Kurt Hornik, who runs CRAN, encouraged me to talk about R and GIS at the March 2001 Distributed Statistical Computing meeting in Vienna, at which I got to know active developers personally By the next DSC meeting in March 2003, I was organising a thematic session on spatial statistics, and a crucial fringe developers’ workshop to discuss how to advance spatial data analysis in R Roger Bivand A practitioner’s view
  • 16.
    Experience R project R spatial Managing research in collaborative networks CRAN Spatial task view Since 2003, a number of community-building steps have been made over and above developing contributed packages. From the CRAN side, the Spatial task view is the hub, to which traffic is channelled to package pages and to ancilliary websites, as well as the special interest group mailing list. Some package authors contact me to ask to be included, others are asked whether they want to be added to the web of information Roger Bivand A practitioner’s view
  • 17.
    Experience R project R spatial Managing research in collaborative networks R-sig-geo mailing list Following the 2003 workshop, we monthly number of emails on r−sig−geo started a project on Sourceforge to q permit joint development, and a mailing 300 list served within the family of R lists q 250 from Zurich. Traffic on the list has q q q q grown steadily, with a subscribed q q qq q q 200 q q # of emails membership in April 2010 of over 1600. q q q q q q q Naturally, many of these “lurk” without q 150 q q q q q q q q posting, while others post without q qq q q 100 q q q helping, and many fewer help by q q qq q q q q answering posted questions. This final 50 q q q q q q q q q q qq q qq group is however growing, and since the qqq q qqqq qq q q q q q q q list archives are also kept on Nabble, qq q 0 they are easy to search for information. 2004 2005 2006 2007 2008 2009 2010 Roger Bivand A practitioner’s view
  • 18.
    Experience R project R spatial Managing research in collaborative networks The sp package In 2003, we agreed that a shared system of new-style classes to contain spatial data would permit many-to-one and on-to-many conversion of representations, avoiding the then prevalent many-to-many conversion problem. The idea was to make it easier for GIS people and stats people to work together by creating objects that “looked” familiar to both groups, although the groups differ a lot in how they “see” data objects. Package dependencies have grown, here the upper diagram shows packages depending on sp in April 2008, the lower diagram in April 2010: Roger Bivand A practitioner’s view
  • 19.
    Experience R project R spatial Managing research in collaborative networks R Wiki In addition to the “coordinated” information sources, a community Wiki does exist. While it seems to suit some users, the general impression (among older people?) is that there is little feeling of responsibility for following up tips given there. On the mailing list and its archive, usually experienced developers or users will clarify misunderstandings, while on the Wiki, posters do not feel obliged to update their contributions, as when examples stop working (they are not run ever, unlike CRAN package examples): Roger Bivand A practitioner’s view
  • 20.
    Experience R project R spatial Managing research in collaborative networks Spatial on R Forge R Forge is used actively by individuals and groups in developing packages for spatial data analysis, with 52 projects registered in April 2010. Some projects are registered in more than one topical area, some may never mature, but some are already in active use; the raster package is already frequently discussed on R-sig-geo — it was released to CRAN in late March 2010 after a gestation of 16 months. Roger Bivand A practitioner’s view
  • 21.
    Experience R project R spatial Managing research in collaborative networks Book website Finally, I’ll mention a book that I wrote with Edzer Pebesma and Virgilio G´mez-Rubio, and published in the o Springer useR series in 2008. Not only does the book seem to be doing OK, but the website with dataset and code download is visited frequently (450–600 unique visitors per month). The code is run nightly against current R and the various required contributed packages. It may be of interest to note that the text was written using the literate programming tool Sweave in R, which is designed to support reproducible research (as indeed is this talk). Roger Bivand A practitioner’s view
  • 22.
    Experience R project R spatial Managing research in collaborative networks Managing research in higher education While the links between the knowledge economy and Open Source software are evident, there are very real challenges to the management of research and higher education in policy terms that need to be addressed Most research and higher education organisations have been rationalised and subjected to the styles of management practices introduced in commercial corporations years and even decades ago In particular, budget discipline is a favoured tool in attempting to point organisational units in directions seen as being appropriate Given that these organisations clearly face a “missing market” , in that neither potential students nor grant-giving bodies are analogues of customers in a fast-food restaurant, those responsible for management have a measurement problem Roger Bivand A practitioner’s view
  • 23.
    Experience R project R spatial Managing research in collaborative networks Grant processes Universities and research institutions appear to “compete” in grant processes, and thereby seem to have an interest in locking potential competitors out, by securing privileged access to knowledge While such advantage may be quite real in the case of laboratory skills and quality — the institution does deliver services of higher quality, or when the institution has secured the services of high-flying academics — this model is not directly transferable to software Given the steadily increasing importance of software in teaching and research, it seems clear that care is needed in constructing management tools for activities which may produce or modify software (see the UEA “climategate” scandal) Roger Bivand A practitioner’s view
  • 24.
    Experience R project R spatial Managing research in collaborative networks Software deliverables It does make sense for institutions to develop expertise in customising software, in training, and in publishing materials of benefit to software users on a for-profit basis It does not in general, however, make sense to mandate source closure in research programs or projects, in the same way that mandating openness might be mistaken The question as to whether software deliverables, or software developed in the process of creating deliverables should be opened is one that is relevant in all grant processes It is also highly relevant in evaluation routines associated with program and project execution Roger Bivand A practitioner’s view
  • 25.
    Experience R project R spatial Managing research in collaborative networks Handling software in research projects In grant awarding and evaluation processes, the grant-making body should consider at least two factors: the importance of Open Source for enhanced efficiency in providing the software needed in a project, and the importance of reproducibility and peer-review in the scientific process generally It can thus be argued that the management of the boundary between what the institution “owns”, what can sensibly be commercialised on a for-profit basis, and research productivity and efficiency deserves attention Otherwise, naive and rather outdated management practises can endanger research quality and productivity with regard to software innovation and incremental improvement by seeing products where one should see services Roger Bivand A practitioner’s view
  • 26.
    Experience R project R spatial Managing research in collaborative networks Software, research and higher education There are clearly cases in which source code should not be opened, although when the public purse has funded the research involves, the number of real cases will in practice be very few, even for projects with very small communities of interest It is of importance for the enabling, for the empowering of actors in the knowledge economy, that unnecessary barriers to the diffusion of knowledge be removed, and that new ones not be permitted to emerge As a corollary, researchers should perhaps be given incentives in career terms to contribute to the pool of knowledge by opening source code, and by contributing to the improvement of software in their domain of science, in the same way that publications are rewarded Roger Bivand A practitioner’s view
  • 27.
    Experience R project R spatial Managing research in collaborative networks Round-up As far as I am aware, no research council has played any relevant role in the progress of the R project directly Indirectly, research council funded projects have included software deliverables defined as contributed packages, including spatial packages (but none that I have handled) Even more indirectly, people in research council funded doctoral and post-doctoral positions have not only used R and R spatial, but have contributed to software development, even though this was not required or mentioned in their projects Finally, the diffuseness and unpredictability of collaborative networks of “amator” developers makes it very hard to reply to calls; if a research council wanted to be pro-active, it might fund travel for active developers to enable them to meet, or similar enabling measures Roger Bivand A practitioner’s view