A presentation given at the co-ordination workshop on Open Access to Scientific Information on Wednesday 4th May 2011 at the EU DG Information Society & Media, Avenue de Beaulieu 25, Brussels.
New ways to communicate in science: perspectives from biodiversity research
1. New ways to communicate in science: perspectives from biodiversity research. Vince Smith, Natural History Museum, London [email_address]. ViBRANT: Virtual Biodiversity
6. ViBRANT: virtual biodiversity research. 17 partners in 9 countries (universities, museums & SMEs). Building a natively digital scholarly communication system for European biodiversity research (interoperability, workflows, service sharing & information modeling).
10. Third party data publishing. Specimen records on Scratchpads automatically pushed to 3rd party specialist data publishers. >18K specimen records (local small scale coverage) > >276M specimen records (worldwide coverage). http://scratchpads.eu > http://gbif.org
11. Third party data publishing. Specimen records on Scratchpads pushed by author to 3rd party specialist data publishers. >18K specimen records (local small scale coverage) > >56k species assessed, 18k threatened (worldwide coverage). http://scratchpads.eu > http://iucn.org
12. Next generation article publishing. Paper assembled from Scratchpad database. XML submission, peer review & marked-up publication by Pensoft. 5-step workflow for selecting data, adding metadata & previewing. Published in ZooKeys & PhytoKeys (worldwide coverage). http://scratchpads.eu > http://pensoft.net. doi:10.3897/zookeys.50.539 (PDF, HTML, XML)
Slide 1: TITLE SLIDE Thanks for that introduction. Good morning everybody, my name is Vince Smith and I am the coordinator of the ViBRANT project. ViBRANT is a 3-year EU FP7 Research Infrastructure project that started in December last year, funded under the Virtual Research Communities call. ViBRANT stands for Virtual Biodiversity Research and Access Network for Taxonomy, and in many ways that is a very good description of what we are doing. ViBRANT is about setting up the means, tools and infrastructure to produce a more rational and more effective framework for European biodiversity research. Fundamentally, this means linking up the people, the data and the science of biodiversity through a common and interoperable infrastructure.
Slide 2: TALK OUTLINE At its heart, ViBRANT is about building a research infrastructure to support scholarly communication in biodiversity science. In this talk I want to start by highlighting two key trends which I think set the tone for any discussion about new ways to communicate in science. The first of these trends concerns how both publishers and authors are moving away from physical paper as their medium of choice for publishing information. The second concerns how changes in research practice are shifting the unit of scholarly communication away from the article. I then want to say a few words about why these challenges to the traditional scholarly communication system are particularly relevant to biodiversity science. My community has some specific needs in this area, which may explain why we are in the vanguard of exploring new ways to communicate in science. I’ll then introduce the ViBRANT project and explore the components that are particularly concerned with scholarly publishing. Even within a single project like ViBRANT we are exploring a range of different mechanisms for scholarly communication. This is because even in a relatively narrow community like biodiversity science, no one approach is going to suit all its participants. Finally, I’ll briefly talk about incentives and metrics. If we are to encourage people to engage in new ways to communicate in science, we will need to think carefully about how we reward this activity. I will then close with some words on future directions in ViBRANT, which link up some of the issues I have raised. First off, then, I want to briefly mention two trends which I think are common to scholarly communication, regardless of the discipline.
Slide 3: Trend 1: the death of paper The first of these is the death of physical paper as a medium of publication. A survey of major academic publishers in 2008 found that, on average, 90% of all academic journals are available electronically. The rate varies a little with the discipline: it’s slightly higher in science, technical and medical (STM) publishing and slightly lower in the humanities. This is not to say that these publishers have abandoned paper; indeed, there are very few e-only journals. However, most journals are now online. Second, we know that subscriptions to academic journals are increasingly e-only. Amongst the major universities in the UK (the so-called Russell Group universities), more than 75% of all subscriptions are electronic. Third, we know that the year-on-year compound growth in the usage of electronic journals is rising. To sum up those points: most journals are online, more subscriptions are electronic, and more people than ever are using them online. These transitions are largely being driven by cost. There are very significant economies of scale, both for the publisher and for the subscribing institution, if content moves online. This is a particular challenge, however, for small-scale publishers, which are common in some sectors (such as biodiversity science), and for smaller institutions, which don’t have the negotiating power to purchase large journal subscription packages. Finally, it is worth noting that the data suggest Open Access is not the biggest factor driving this transition to e-publishing. Confidence in the longevity of electronic content, differences in scholarly culture, and cost are potentially bigger drivers than Open Access.
Slide 4: Trend 2: the death of “the” paper The second major transition that I want to comment on concerns the movement away from the scholarly paper (or article) as the major means of scholarly communication. For something like 350 years, the scientific paper has been the primary way of establishing the precedence and validity of scholarly claims. However, as research practices have changed, other forms of scholarly communication have become better suited to transmitting information. Research is increasingly a collaborative, data-intensive and networked activity. Scholarly databases, in their many different forms, are increasingly a better means of communicating information than scholarly papers, because their contents can be re-used in many different ways, whereas scientific papers frequently bury this information. Yet, in large part, our scholarly communication system has not adapted. PDF papers are still the predominant form of scholarly communication, and these hide or exclude the wealth of information and effort (the so-called “dark data”) that led to the construction of the paper. As an example, I recently published a paper that had just 2,500 words but represented the collective effort of 6 people on three different continents who had been building the underlying dataset for the past 10 years. While those 2,500 words have a certain value, it’s the underlying data, covering hundreds of species, several genes and hundreds of hours of analysis, that represents the lasting wealth of information we created. In effect, what we need is a natively digital scholarly communication system that supports the life-cycle of these data from their creation through to their synthesis into scholarly papers. In many ways my thinking in this area is summed up by this quote from a digital librarian, which nicely expresses the problem.
“ the future scholarly communication system should closely resemble—and be intertwined with—the scholarly endeavor itself, rather than being its after-thought or annex ”
Slide 5: Communicating biodiversity science The reason the article is a particularly poor way of communicating biodiversity science is perhaps best understood if I say a little about what we are trying to achieve. Our discipline has an enormously ambitious goal: to inventory all the Earth’s species, to document their relationships (not just their evolutionary relationships but also their ecological relationships), and to publish and apply these data. In effect, taxonomy in one form or another is the foundation stone of biology. The dataset for this endeavour comprises the 1.8M species we have described, in approximately 300M pages of published text over the last 250 years, and the billions of specimens we have collected and enshrined in our museums and collections, estimated at something in the region of 1.5-3 billion specimens. To do all this work we have a relatively tiny workforce: there are estimated to be no more than 4,000-8,000 professional taxonomists. Fortunately, this is an endeavour that many people are attracted to in a non-professional capacity, and there are estimated to be something in the region of 30,000-40,000 “pro-amateurs” working in taxonomy worldwide. These are people who are not paid for their work but who work to professional standards. And finally, taxonomy and biodiversity are subjects that engender enormous enthusiasm amongst the general public, and there are potentially many thousands, perhaps millions, of people worldwide who would be prepared to devote a little of their time to the cause of describing all life on Earth. So, in sum, we have an enormously big question that we are trying to answer; a huge and deeply interconnected dataset of species names, publications and specimens; and a tiny number of people in a position to help. Communicating this information is not best achieved through papers, let alone on physical paper.
Rather, we need a natively digital system for communicating biodiversity information, and in many ways that is exactly what the ViBRANT project is trying to achieve.
Slide 6: ViBRANT: virtualising biodiversity research So ViBRANT is a consortium of 17 partners in 9 different countries, all of whom are specialists in managing and building services on biodiversity data. The project’s goal is to link their systems together to provide a more effective framework for European biodiversity research. In effect, we are building a natively digital scholarly communication system for European biodiversity research. In more practical terms this concerns interoperability (making sure that systems work together), workflows (so that users can move seamlessly between these linked systems), service sharing (so that we can access specialist services) and information modeling (so that we can manage the data more effectively).
Slide 7: Main publishing components of ViBRANT In practical terms, just three of the partners have a major role in the scholarly communication parts of ViBRANT, and it is the activities of these partners that I’ll focus on for the rest of the talk. I’ll introduce these partners first, and then highlight the different publishing services they offer. First off, we have the Scratchpads. These are the primary user access point for ViBRANT services: hosted websites for biodiversity scientists. In effect, we have an ecosystem of research communities (more than 230) using these tools. Each Scratchpad acts as a research and publication platform for its users. They are highly flexible, so they can meet the very different needs of these communities and support the workflows required by their scholarly activities. At present we have about 3,000 users spread across these 230 communities, who have built about 300,000 pages of content in the system since 2007. A second key component in scholarly communication is Pensoft, a publisher based in Sofia, Bulgaria, that specialises in low-cost open access publishing of biodiversity science. They produce a wide range of biodiversity books, e-books and scholarly journals, and what is special about Pensoft is that, as part of the ViBRANT collaboration, they are providing tools that enable scholars to quickly turn their biodiversity databases into scholarly papers. Finally, ViBRANT is working with a number of major biodiversity databases. One of the biggest is GBIF (the Global Biodiversity Information Facility), which specialises in publishing primary biodiversity data (mainly museum specimens and field observations) from a network of distributed databases. GBIF currently publishes more than 276M data records from 12k datasets by 336 publishers. Together, these partners are providing four different types of scholarly publishing service.
These are 1) a low cost journal infrastructure; 2) community web publishing; 3) biodiversity observation data publishing; and 4) next generation publishing services producing scholarly articles from databases. In the next part of this talk I want to provide a little detail on these services:
Slide 8: Low cost journal infrastructure So first off, a low-cost journal infrastructure. A small number of Scratchpad communities are using their Scratchpads as a scientific journal to publish PDF articles. These are the websites of biodiversity societies that publish their society journals through a Scratchpad. I know two of these societies in detail. One is a specialist society on mosquitoes that publishes the European Mosquito Bulletin, which looks at the spread of mosquitoes across Europe. Another is a specialist group on stick insects, of which there are about 5,000 species. Each of these journals has independent editorial control and peer review, managed by its respective society. They are free to publish in, entirely Open Access, and have no page or colour limits. The infrastructure is free, and the only costs involved are the time taken by peer reviewers and the editorial board, which is managed by the societies. However, the journals are only available electronically. Each society has an ISSN for its journal, but because they are not part of any formal publishing house the articles have no DOIs, are not indexed in PubMed and have no ISI impact factor. The societies are also relatively unsophisticated in their use of the site. There is no online submission system (everything is managed by e-mail), and the societies are just publishing PDF articles.
Slide 9: Community self publishing, beyond the PDF In fact, societies publishing PDF articles on their Scratchpads is a relatively niche activity restricted to just a few sites. By far the biggest use of the Scratchpads, although one with much wider variation in quality than the society publishing, is the direct use of the sites to publish biodiversity information. A majority of the sites are specialist resources for information on specific taxa, or on the flora and fauna of specific regions. Scratchpads have been developed to support the many specialist data types found in these taxonomic datasets. For example, users can build databases of observations, specimen records, distribution maps, DNA sequences, evolutionary trees, image galleries and identification keys, to name but a few. Most of the communities using these tools specialise in providing this information for particular taxa. They rely on community editing by interested parties and on community peer review. As a condition of use of the system, their content must be free, open access and available to others under a Creative Commons licence. Quality is governed by the reputation of the authors, with those who have a particular reputation to defend, such as scholarly authors, often providing the highest-quality information. However, for the most part this information is not pushed to other specialist data repositories where it might be aggregated with related information; it is confined to particular Scratchpads.
Slide 10: Third party data publishing Nevertheless, some Scratchpad users are sharing particular kinds of data more widely with specialist data publishers. Where a data standard is available, we have enabled services that allow Scratchpad users to share their data with larger data publishers. One area where this is currently working is specimen information. Across the Scratchpads we have over 18k specimen records and biological observations, where people have recorded information about particular taxonomic specimens in a standardised way. These data are connected to the GBIF network and added to the 276 million specimen records already available there. The records are crucial for mapping the distribution of species for a wide range of studies, including climate change, changes in agricultural patterns and changing land use.
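The slide does not say which data standard the specimen records use. GBIF exchanges occurrence records using the Darwin Core vocabulary, so the Python sketch below assumes Darwin Core and illustrates what recording specimen information "in a standardised way" might involve before a record is shared. The Scratchpad-side field names (`taxon`, `catalogue_no`, `lat` and so on), the `FIELD_MAP` table and the `to_darwin_core` function are hypothetical illustrations, not actual Scratchpad or GBIF code, and the minimal checks are our own choice rather than GBIF's published requirements.

```python
# Assumption: hypothetical Scratchpad field names mapped onto real Darwin Core terms.
FIELD_MAP = {
    "taxon": "scientificName",
    "catalogue_no": "catalogNumber",
    "institution": "institutionCode",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "collected_on": "eventDate",
    "country": "country",
}

# Our own minimal requirement for this sketch, not GBIF's official rule set.
REQUIRED = ("scientificName", "basisOfRecord")


def to_darwin_core(record: dict) -> dict:
    """Rename known fields to Darwin Core terms, drop the rest, and run basic checks."""
    dwc = {FIELD_MAP[key]: value for key, value in record.items() if key in FIELD_MAP}
    # Assumption: every record here describes a physical museum specimen.
    dwc["basisOfRecord"] = "PreservedSpecimen"
    # Coordinates, if present, must be valid decimal degrees.
    if "decimalLatitude" in dwc and not -90 <= float(dwc["decimalLatitude"]) <= 90:
        raise ValueError("decimalLatitude out of range")
    if "decimalLongitude" in dwc and not -180 <= float(dwc["decimalLongitude"]) <= 180:
        raise ValueError("decimalLongitude out of range")
    missing = [term for term in REQUIRED if term not in dwc]
    if missing:
        raise ValueError(f"missing required terms: {missing}")
    return dwc


if __name__ == "__main__":
    example = {"taxon": "Aedes albopictus", "lat": "51.5", "lon": "-0.17", "notes": "free text"}
    print(to_darwin_core(example))
```

Mapping local fields onto a shared vocabulary is what allows records from a small local site to be aggregated alongside millions of records from other providers; in this sketch anything without a standard term (such as the free-text notes) is simply dropped.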
Slide 11: Third party data publishing Another area where we are involved in third-party data publishing is with the International Union for Conservation of Nature (IUCN). This is a body that specialises in assessing the conservation status of species. Crucially, this requires specimen and observation data, which give us information on the abundance and distribution of species, and these data in turn feed into the construction of the formal assessments that tell us whether a species is critically endangered, endangered, threatened or not threatened.
Slide 12: Next generation article publishing Arguably our most ambitious developments in scholarly communication concern the semi-automated construction of scholarly papers from databases. In many cases Scratchpads contain extensive data relating to the biology and description of particular species. These data are highly suited to formal specialist publication, but publishing them requires the laborious assembly of the data into a manuscript. As part of the ViBRANT collaboration with Pensoft, and in particular with their two journals specialising in the formal taxonomic description of plants and animals, we have developed a 5-step workflow that lets users select appropriate data from their Scratchpad, add metadata describing this information so that it conforms with the norms of descriptive taxonomic publications, and finally preview and submit the manuscript to the publisher. From here it goes through a standard peer review process. Review recommendations are then incorporated into a revised version of the manuscript, which is sent back to the publisher and published simultaneously on the Scratchpad and in the publisher’s journal. In addition to dramatically speeding up the construction of scientific papers, this has two further advantages for the user. First, the manuscript can be revised and updated on the Scratchpad as new information comes to light, while the original version is preserved as part of the scholarly record. Second, the publisher extracts relevant information from the submitted database, which is then automatically submitted to the specialist data repositories. So, in sum, these are the four ways in which ViBRANT is facilitating the scholarly communication of biodiversity information: 1) through low-cost journal infrastructure; 2) by community web publishing; 3) by biodiversity observation data publishing; and 4) by next generation publishing services producing scholarly articles from databases.
Slide 13: Incentives and metrics To conclude, I want to summarise what we have achieved so far and highlight the one problem that we have yet to address properly within ViBRANT. To do this it is necessary to set out the central tenets of scholarly publication. Any system of scholarly communication needs to perform five basic functions. It needs to 1) register a scholarly claim, so that the author can claim precedence for the work; 2) certify that claim to establish its validity, usually through some form of review such as peer review; 3) enable others to become aware of the work, in other words make it findable, for example through some indexing mechanism; 4) archive the claim to preserve it in the scholarly record; and finally 5) reward the effort of publishing the claim, for example by providing the author with some form of credit. This credit typically takes the form of impact metrics and citations, such as the impact factor of the journal or the citation count of the scholarly work. In practical terms, the ViBRANT infrastructure currently achieves most of these functions. Where it fails is in the reward component. As I said, reward is traditionally achieved via metrics such as the journal impact factor, or sometimes the personal h-index of particular authors. We need to expand this concept of reward, firstly to embrace other kinds of scholarly unit beyond published articles, and secondly to base it on the citation and reuse of these units. In effect we need some kind of “reward hub” in which we can track the different and very varied contributions of authors, who in turn need some kind of author identifier through which their work can be tracked. Such a system of metrics is likely to be essential to pave the way for greater social acceptance of the value of contributions beyond the conventional scholarly article.
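The personal h-index mentioned above has a simple definition: an author has index h if h of their outputs have each been cited at least h times. The Python sketch below computes it from a list of citation counts; the function name `h_index` is our own. Nothing in the definition is specific to articles, so the same calculation could, in principle, be applied to citations of datasets or other scholarly units, which is the kind of extension argued for here. That use is an illustration only, not an existing ViBRANT service.

```python
def h_index(citations: list[int]) -> int:
    """Return the largest h such that h items have at least h citations each."""
    # Rank outputs from most to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        # The item at position `rank` must itself have at least `rank` citations.
        if count >= rank:
            h = rank
        else:
            break
    return h


if __name__ == "__main__":
    print(h_index([10, 8, 5, 4, 3]))
```

For example, `h_index([10, 8, 5, 4, 3])` gives 4: four outputs have at least four citations each, but there are not five with at least five.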
Slide 14: Future directions So it is in this direction that ViBRANT is headed. Firstly, we need to embrace units of publication beyond the article. We need to reward the production of this content through metrics that measure its citation and reuse. Essential to this is a common author identifier, an issue that many publishing houses are working on at the moment, particularly in the context of systems like CrossRef’s Author-ID. In ViBRANT we need to apply these metrics to a wide range of synthetic datasets (e.g. taxonomic checklists, identification keys, species threat assessments), and these datasets are going to be specific to particular communities. One concrete step in this direction that we are developing is to enable users to formally “publish” metadata descriptions of datasets in a traditional journal. These data publications will allow authors to describe their datasets, provide a mechanism for their citation, and incentivize authors to produce them. They will also allow more traditional metrics to be used to track the use of these datasets, for example through the DOIs of the metadata publications, and so provide some measure of their impact. Therefore, as part of ViBRANT’s work, we are setting up a special Pensoft journal for data publication in 2011. Perhaps the last message I want to leave you with is that no single approach is likely to work for any one community. Realistically, even in a small community like biodiversity science, a range of approaches is needed, and this is essential if they are to be socially accepted by their potential users.
Slide 15: Questions With that I’ll stop and take questions. Thank you very much.