I'm going to give an introduction to the Scratchpad project, which is based at the NHM but is part of the EDIT initiative. Specifically I’m going to: provide some background on the origins of the Scratchpad project; say a little about Drupal - the content management system on which the Scratchpads are based; give a technical overview of how the Scratchpads work and how we manage them; run through the functionalities of the Scratchpads; and finally give you some perspectives on where the project is going and what the future is for the Scratchpad initiative.
Before I do this let me say a few words about the case for the Scratchpads: why do we need them? Taxonomy and biodiversity studies have lots of big questions which require large aggregations of data to address: issues like climate change, changing patterns of land use, protection of food supplies, ecosystem services etc.
The problem is that the way we do taxonomy doesn’t make it possible to address these questions. There is no collective focus for the discipline. Instead we are all specialists in our particular field. We need some way of connecting the very small process of doing taxonomic research with the very grand ambitions we have for the discipline as the foundation for much of biological science. And no matter how we do this, we have to do it in a way that functions both technically (providing some way of aggregating all the diverse inputs into taxonomy) and sociologically, so that it fits the practical ways in which most taxonomists work.
The way that the NHM have tackled this problem is perhaps typical of many other institutions. Within the NHM we have a small team of people that are very good at building carefully managed websites. These people understand the technology required to do this, and through a carefully managed process of permissions, various websites get built - you can see some of them here. This process has a few advantages. The sites are often very high quality… However, there are also many problems. The authors of the content are quite removed from the process of building the website. Consequently they have very limited control of what actually appears and how it appears… Without doubt the biggest problem is that the process of building these sites is enormously protracted and slow. New sites appear at the rate of about three or four a year, sometimes less.
This is for various reasons, but principally it is because of the way the project is managed. Every year the NHM’s web development team produced a huge list of all the projects on the waiting list which they planned to do in the coming year, and I was always struck by the fact that the lists would never grow in length - the number of science projects was always more or less static. And if you examined this list close up you realised it was because a number of projects were no longer relevant, because the authors had either died on the waiting list or had subsequently left the museum. To my eyes there are two basic bottlenecks when it comes to building these sites. The first is the fact that the authors had to go to a group of people that had the technical proficiency to build the content. The second is that the process is so tightly managed that getting permission to add or alter content on the web was exceedingly difficult.
No one outside the NHM can have failed to notice the explosion of content that has appeared on the web in the past few years. This is largely because these tools have blown away the technical barriers associated with publishing content on the web, and the management barriers associated with getting permission to publish content. Most of these tools are very specific - allowing users to do one particular thing (perhaps publish a snippet of text, like Twitter, or an image on a photo-sharing site like Flickr). But some tools are highly generic.
These tools - often called content management systems - enable users to very rapidly build customised websites with a generic set of functions tailored to the needs of a particular audience. Four of the most popular are listed here, and they all provide basic web publishing functionality. Probably the most popular of these, and the one that we selected to use for the Scratchpads, is Drupal.
Drupal is a very popular CMS. It’s the foundation for the websites of many universities, colleges, libraries, societies and even government institutions. It has a very active developer community, some businesses rely entirely on Drupal, and it arguably provides a much more sustainable platform than some of the other lesser known CMSs. One of the principal reasons why it’s so popular is that it is particularly focused on community management of content. The social processes needed to mediate the publication of content on the web are built in to Drupal very effectively.
This has been independently recognised in a number of studies comparing the functions of CMSs. For example this report by Idealware was published early this year, and rates Drupal very highly, especially in the area of community functionality.
So Drupal was selected to be the foundation for the Scratchpad project. It supports the central management of multiple sites, providing us with a mechanism to supply customised community-based websites to anyone that requests a site. This is the home page of the Scratchpad project. Users can access a variety of resources from here. In particular, they can apply for a new site.
Fill in form…
They get an empty template site that they can populate and to which they can add new users. The Scratchpads have been built to support a wide range of activities required by taxonomists. These functions are easy to navigate, helping people to construct their sites quickly and easily. Crucially, users can theme and focus their site on one or more of a range of activities. This is crucial to building a sense of community amongst the userbase.
The project has been running since March 2007 and in that time we have built up… These form an ecosystem of different communities working toward their own goals in a self-managed environment. A high proportion of the sites fail - probably about 80% have relatively little content. The system is constructed to accommodate failure.
Some use cases for the Scratchpads are illustrated on this rather elaborate slide here….
From a technical perspective the Scratchpads sit on a single virtual machine in the NHM server room. Currently they occupy about 100 GB of space. The server architecture is based on a LAMP configuration. It’s all backed up using Tivoli Storage Manager. Eventually the backups go to tape as part of the NHM’s storage management, and are picked up by a courier and stored off site. The marginal costs of creating a new site are exceptionally low - probably about £3 per site, taking into account storage and staff time.
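As a rough illustration of how one server can serve many sites, the sketch below assumes a Drupal-style multisite layout in which each incoming hostname is matched against directories under sites/, from most to least specific, with sites/default as the fallback. It is a simplified Python sketch of that idea, not Drupal's own code (which is PHP and also considers ports and URL paths), and the hostnames and the function names candidate_site_dirs and resolve_site_dir are hypothetical.

```python
from typing import List, Set


def candidate_site_dirs(hostname: str) -> List[str]:
    """Return site directory names to try, most specific first."""
    labels = hostname.lower().split(".")
    # Drop leading labels one at a time: a.b.org -> b.org -> org
    candidates = [".".join(labels[i:]) for i in range(len(labels))]
    # Fall back to the shared default configuration
    candidates.append("default")
    return candidates


def resolve_site_dir(hostname: str, existing_dirs: Set[str]) -> str:
    """Pick the first candidate directory that exists on the server."""
    for name in candidate_site_dirs(hostname):
        if name in existing_dirs:
            return "sites/" + name
    return "sites/default"


if __name__ == "__main__":
    # Hypothetical directories present under sites/ on the server
    dirs = {"example.org", "default"}
    print(resolve_site_dir("lice.example.org", dirs))  # sites/example.org
```

With a layout like this, adding a new site is mostly a matter of adding a directory and a database, which is consistent with the low marginal cost per site.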
There is a developers’ site with instructions on how to set up your own Scratchpad server. You can also access individual modules from here. Functionality is partitioned into modules of code which work with the Drupal core. Modules that are not
Scratchpads: A standard implementation using Drupal
Vincent S. Smith & Simon Rycroft
Macro taxonomy: The foundation for biodiversity research
Goal…
- Inventory the Earth’s species
- Document their relationships
- “Publish” & apply these data
Data set…
- 1.8 M described spp. (10M names)
- 300M pages (over last 250 years)
- 1.5-3B specimens
People…
- 4-6,000 scientists
- 30-40,000 “pro-amateurs”
- Many more citizen scientists?
Micro taxonomy: The process of taxonomic research
Sociology…
- Parochial
- Specialised experts
- Fragmented & distributed
Methodology…
- Different (domain specific)
- Communities of practice
- Non transferable skills
Output…
- Heterogeneous & scattered
- High volume, low impact
- Hard to find (use)
ViBRANT: INFRA-2010-1.2.3 (VRC), 17 Partners, Approx €4M
- Distribute hosting infrastructure
- Web services on major data types
- Registry of sites and data services
- Common data portal
- New services (keys / phylog.)
- Vocab. / ontology site (GBIF)
- User support
- Sociological research