GoOpen 2010: Roger Bivand


  1. Open Source software, research and higher education: a practitioner’s view. GoOpen 2010 (Fou thread), Aker Brygge, Oslo, 19–20 April. Roger Bivand, Department of Economics, Norwegian School of Economics and Business Administration, Bergen, Norway, 20 April 2010. Talk sections: Experience; R project; R spatial; Managing research in collaborative networks.
  2. Outline. This talk will examine how open source software development and use may interact with their institutional contexts in research and higher education. The talk will be based on experience of open source development in applied statistics and geospatial applications. Reasons for the mismatch between an institutional context that prefers secrecy when applying for funding, restricted deliverables, and races to publication, and the ways in which open source development occurs, will be discussed. In particular, the roles of mutual trust and community-building in open source development will be stressed; these factors appear to express externalities between developers and users of software that are neglected in the exclusive management models prevalent in research and higher education.
  3. Contextual background. In order to provide some justification for presenting a “practitioner’s view”, some background information beyond my affiliation may be useful. Although employed in the Department of Economics at Norges Handelshøyskole, I am an academic geographer, educated in Cambridge and at the London School of Economics. My specialities within geography are quantitative methods and geographical information systems, and I have used and developed software since 1973, for research and teaching. During the EU 5th Framework, I was involved in the evaluation of three open source Information Society Technologies (IST) calls; I also founded the MBA programmes at Warsaw University of Technology in 1991/92.
  4. Little languages. My first “open source” publication was an extra module for the proprietary program Systat, with both source code and DOS binaries available for FTP download, and an accompanying paper in Computers & Geosciences in 1992. While much early software (Fortran, later C) was compiled (I only had limited exposure to BASIC), by the 1980s little languages, generally interpreted, began to appear as glue for compiled programs. The languages covered in two of my papers published in 1996 and 1997 were the Unix shell scripting language and AWK, used as glue for the GRASS GIS, and for GMT for map production; I have been using Unix/Linux since 1985. In these papers and other work in the mid 1990s, I pointed up the benefits of scripting in permitting work to be reproduced and audited, in contrast with the non-journalling GUIs that were becoming prevalent in academic practice.
  5. Glimpse from 1997. Here is a slide from a talk given in Italy in early 1997 about software for handling geographical information (GI), titled “Mapping GI users”. It placed four user groups on two axes: how standardised the tasks are (more to less) and the need for open software (less to more). PRODUCTION: high training costs, application-specific macro languages, few linking requirements (cf. COTS). PROFESSIONALS: as consultants customising GI handling technologies for clients in long/medium term relationships; as researchers in GI handling technologies. CASUAL: generic likeness to familiar GUI, looks and behaves like Excel or Netscape (cf. plug-ins). CURIOUS: as researchers analysing geographic information; as citizens challenging the use of GI by private companies and public administration.
  6. Using the R project. My first message to the R project was in mid January 1997, as I had begun using early alpha releases to re-implement a number of spatial analysis functions. The initial motivation to systematise code for functions for spatial data analysis was a course given in the University of Bergen Department of Geography; we were a joint department until administrative changes split us. By 1998, Albrecht Gebhardt (Klagenfurt, Austria) and I had provided code for most simple spatial data analysis for R, either porting existing code or writing fresh contributions (presentation at a congress in Vienna). But what is the R project?
  7. www.r-project.org. While its website is non-candy, R is becoming a central resource for statistical and computational data analysis across the sciences and in business.
  8. The R project. R is a language and environment for statistical computing and graphics; it is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Alcatel–Lucent) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R. The term “environment” is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software. Many users think of R as a statistics system. We prefer to think of it as an environment within which statistical techniques are implemented; R can be extended (easily) via packages.
  9. The R foundation. The R project began as an academic initiative with no funding in Auckland, New Zealand, and was licensed under GPL as more collaborators joined. This group was strengthened by academic contributors to S, who began to work with R in the late 1990s. By 2002, a more formal structure was needed, and a foundation was formed. I was invited to join as an ordinary member in March 2003, so have seen things “from the kitchen” since then.
  10. The R community. While the software system was intended to be “fully planned and coherent”, the community that has grown up around R is neither planned nor coherent. Since 1997, there have been two main mailing lists, one for users, the other for developers. John Fox (another non-core ordinary foundation member) has described the social structure of the project in a recent paper in the R Journal, from which this graph is taken.
  11. CRAN and contributed packages. The community has also grown thanks to the ease with which packages may be contributed. Neither writing packages nor checking them formally against R is hard: the check process executes all the examples on the help pages and in other documentation. The Comprehensive R Archive Network (CRAN) thus distributes R itself (source and binaries for multiple platforms) and packages (source and binaries), and packages may also be installed and updated from within R.
  12. CRAN
  13. CRAN task views. Since so many packages have been contributed to R, and distributed through CRAN, it became necessary to provide a mechanism for guiding users towards solutions to their problems. It is helpful to see the complexity of CRAN as an advantage, with “ecologically” more fit packages establishing themselves in “niches”, possibly even in competition with other packages providing similar facilities. Task views have been added as a light-weight, non-authoritative way of offering suggestions.
  14. R Forge. In addition to CRAN running the released, patched, and development versions of R on the CRAN packages’ examples nightly, packages may also be hosted on the R Forge repository. This provides the usual *forge services, such as SVN, but also builds Windows and OS X binary packages, and checks package source on multiple platforms nightly. So even alpha or beta packages may be made available, and may begin to harvest user input, before being released to CRAN.
  15. R spatial. In 1999 I had interfaced R and the open source GIS GRASS, and presented a paper on this at a Scandinavian GIS meeting; the paper was rejected by Norsk Geografisk Tidsskrift, but published in extended form in Computers & Geosciences in 2000. This, and the publication of a paper based on my 1998 presentation with Albrecht Gebhardt in Journal of Geographical Systems, and a presentation with Markus Neteler, the lead GRASS developer, at the 2000 GeoComputation conference, led to closer personal contacts with R core. Kurt Hornik, who runs CRAN, encouraged me to talk about R and GIS at the March 2001 Distributed Statistical Computing meeting in Vienna, at which I got to know active developers personally. By the next DSC meeting in March 2003, I was organising a thematic session on spatial statistics, and a crucial fringe developers’ workshop to discuss how to advance spatial data analysis in R.
  16. CRAN Spatial task view. Since 2003, a number of community-building steps have been made over and above developing contributed packages. From the CRAN side, the Spatial task view is the hub, from which traffic is channelled to package pages and to ancillary websites, as well as the special interest group mailing list. Some package authors contact me to ask to be included, others are asked whether they want to be added to the web of information.
  17. R-sig-geo mailing list. Following the 2003 workshop, we started a project on Sourceforge to permit joint development, and a mailing list served within the family of R lists from Zurich. Traffic on the list has grown steadily, with a subscribed membership in April 2010 of over 1600. Naturally, many of these “lurk” without posting, while others post without helping, and many fewer help by answering posted questions. This final group is however growing, and since the list archives are also kept on Nabble, they are easy to search for information. [Figure: monthly number of emails on r-sig-geo, 2004 to 2010; vertical axis: # of emails, 0 to 300.]
  18. The sp package. In 2003, we agreed that a shared system of new-style classes to contain spatial data would permit many-to-one and one-to-many conversion of representations, avoiding the then prevalent many-to-many conversion problem. The idea was to make it easier for GIS people and stats people to work together by creating objects that “looked” familiar to both groups, although the groups differ a lot in how they “see” data objects. Package dependencies have grown: the upper diagram shows packages depending on sp in April 2008, the lower diagram in April 2010.
  19. R Wiki. In addition to the “coordinated” information sources, a community Wiki does exist. While it seems to suit some users, the general impression (among older people?) is that there is little feeling of responsibility for following up tips given there. On the mailing list and its archive, experienced developers or users will usually clarify misunderstandings, while on the Wiki, posters do not feel obliged to update their contributions, for instance when examples stop working (they are never run, unlike CRAN package examples).
  20. Spatial on R Forge. R Forge is used actively by individuals and groups in developing packages for spatial data analysis, with 52 projects registered in April 2010. Some projects are registered in more than one topical area, some may never mature, but some are already in active use; the raster package is already frequently discussed on R-sig-geo, and was released to CRAN in late March 2010 after a gestation of 16 months.
  21. Book website. Finally, I’ll mention a book that I wrote with Edzer Pebesma and Virgilio Gómez-Rubio, and published in the Springer useR series in 2008. Not only does the book seem to be doing OK, but the website with dataset and code download is visited frequently (450–600 unique visitors per month). The code is run nightly against current R and the various required contributed packages. It may be of interest to note that the text was written using the literate programming tool Sweave in R, which is designed to support reproducible research (as indeed is this talk).
  22. Managing research in higher education. While the links between the knowledge economy and Open Source software are evident, there are very real challenges to the management of research and higher education in policy terms that need to be addressed. Most research and higher education organisations have been rationalised and subjected to the styles of management practices introduced in commercial corporations years and even decades ago. In particular, budget discipline is a favoured tool in attempting to point organisational units in directions seen as being appropriate. Given that these organisations clearly face a “missing market”, in that neither potential students nor grant-giving bodies are analogues of customers in a fast-food restaurant, those responsible for management have a measurement problem.
  23. Grant processes. Universities and research institutions appear to “compete” in grant processes, and thereby seem to have an interest in locking potential competitors out, by securing privileged access to knowledge. While such advantage may be quite real in the case of laboratory skills and quality (the institution does deliver services of higher quality), or when the institution has secured the services of high-flying academics, this model is not directly transferable to software. Given the steadily increasing importance of software in teaching and research, it seems clear that care is needed in constructing management tools for activities which may produce or modify software (see the UEA “climategate” scandal).
  24. Software deliverables. It does make sense for institutions to develop expertise in customising software, in training, and in publishing materials of benefit to software users on a for-profit basis. It does not in general, however, make sense to mandate source closure in research programs or projects, just as mandating openness might be mistaken. The question as to whether software deliverables, or software developed in the process of creating deliverables, should be opened is one that is relevant in all grant processes. It is also highly relevant in evaluation routines associated with program and project execution.
  25. Handling software in research projects. In grant awarding and evaluation processes, the grant-making body should consider at least two factors: the importance of Open Source for enhanced efficiency in providing the software needed in a project, and the importance of reproducibility and peer-review in the scientific process generally. It can thus be argued that the management of the boundary between what the institution “owns”, what can sensibly be commercialised on a for-profit basis, and research productivity and efficiency deserves attention. Otherwise, naive and rather outdated management practices can endanger research quality and productivity with regard to software innovation and incremental improvement by seeing products where one should see services.
  26. Software, research and higher education. There are clearly cases in which source code should not be opened, although when the public purse has funded the research involved, the number of real cases will in practice be very few, even for projects with very small communities of interest. It is of importance for the enabling, for the empowering of actors in the knowledge economy, that unnecessary barriers to the diffusion of knowledge be removed, and that new ones not be permitted to emerge. As a corollary, researchers should perhaps be given incentives in career terms to contribute to the pool of knowledge by opening source code, and by contributing to the improvement of software in their domain of science, in the same way that publications are rewarded.
  27. Round-up. As far as I am aware, no research council has played any relevant role in the progress of the R project directly. Indirectly, research council funded projects have included software deliverables defined as contributed packages, including spatial packages (but none that I have handled). Even more indirectly, people in research council funded doctoral and post-doctoral positions have not only used R and R spatial, but have contributed to software development, even though this was not required or mentioned in their projects. Finally, the diffuseness and unpredictability of collaborative networks of “amator” developers make it very hard to reply to calls; if a research council wanted to be pro-active, it might fund travel for active developers to enable them to meet, or similar enabling measures.
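
The many-to-many conversion problem mentioned on the sp package slide (slide 18) can be illustrated with a little arithmetic. This is only a sketch under stated assumptions, not something taken from the talk: it assumes every converter is one-directional, and that the shared classes act as a hub that is not itself one of the existing representations. The function names are made up for this illustration.

```python
def pairwise_converters(n_formats: int) -> int:
    """One-directional converters needed when every representation
    converts directly to every other one (many-to-many)."""
    return n_formats * (n_formats - 1)


def hub_converters(n_formats: int) -> int:
    """One-directional converters needed when every representation
    converts only to and from one shared class system (assumed hub)."""
    return 2 * n_formats


if __name__ == "__main__":
    # Compare the two strategies for a few hypothetical format counts.
    for n in (3, 5, 10):
        print(n, pairwise_converters(n), hub_converters(n))
```

Under these assumptions the direct approach grows quadratically with the number of representations while the hub approach grows linearly, which is one way to read the motivation for agreeing on shared classes.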
