Smith, VS. 2012. Scratchpad 2, Virtual Research Environment: Project Update. A presentation given at the EOL Content Summit, Smithsonian Tropical Research Institute, Barro Colorado Island, Panama. 17-20 Jan. 2012.
Good afternoon everybody. I’m going to give a very quick update on the Scratchpad project, which is being developed under the auspices of an EU funded project called ViBRANT, and a UK Research Council project called eMonocot. The Scratchpads belong to a class of software called Virtual Research Environments, which enable real-time collaboration and dissemination of information and are creating new opportunities to manage data. The Scratchpads project has been running for approximately 5 years, and in the last year we have been redeveloping the software to improve the functionality and sustainability of the system. In this talk I’m going to quickly go over some of these new developments.
So I think most of you will be familiar with the fundamental problem the Scratchpads are trying to address. Much of the science we are doing, especially in the field of taxonomy and biodiversity studies, needs to happen on a global scale. It requires global standards so that software can talk to each other, global workflows to make that process of interaction seamless, and the cooperation of major projects working toward shared goals. However, the practice of taxonomy happens locally. It’s carried out by local groups of scientists, using local infrastructures, and usually with local funders. So the challenge is to link these local activities up to these global projects. And this is fundamentally what the Scratchpad project is about.
The Scratchpad project originally started on a shoestring budget under the EU funded EDIT project, and this continued under the ViBRANT initiative, plus some other projects, whose funding is currently scheduled to finish at the end of 2013. As many of you will know, the Scratchpads are hosted websites for taxonomists. These websites can be customised by the users who own and manage the content on them. They typically focus on a particular taxon or the flora and fauna of a region, or in some cases are about the activities of a particular society. The sites act as a research & publication platform for their users, and they support the taxonomic workflow through a series of modules. The entire project is currently maintained and developed by just two software developers, and these two support an entire ecosystem of about 300 user communities.
The types of information held in Scratchpads vary widely, but the majority of sites are principally about taxa. These sites hold structured information on taxa across a range of data types, including classifications, taxon profiles, specimens, literature, images, maps, phenotypic, genotypic & morphometric datasets, keys and phylogenies. Very occasionally you will find a Scratchpad that has a little of all this information for a group, but more often a site will have a specific focus. For example, many taxon sites concentrate on the compilation of bibliographic data, because it’s easy to share and benefits all parties. In addition to taxa there are many sites with very different agendas. Some focus on conservation issues like the drafting of Red List threat assessments. Others focus on specific science projects or the flora and fauna of a particular region. Others concern specific societies. The construction of these Scratchpads is very much a bottom-up endeavor, and we don’t seek to alter or influence this decision in any way, except through the structure and functionality we create in the sites.
At present there are about 320 different Scratchpad sites, and in total we have about 6,000 registered users, of whom about 5,000 are active users. In other words these active users have logged in and done something on the site. In fact I’ve started to stop paying attention to these total numbers. More interesting is the monthly and weekly use. So for example last week we had 273 users log in, and over the month that figure was 759. This monthly figure was across about 85 different Scratchpads, and gives an idea of the breadth of use and “stickiness” of the sites. The number of users on any one site ranges from 1 to over 1000, with an average of 15 per site and the most common number being just 1. We don’t collect precise numbers on who these users are, but broadly they fall into professional scientists and informed amateurs. We do have plans for some citizen science applications for the Scratchpads but this will come later. What is interesting is that the number of users shot up with the start of the ViBRANT project, and once Scratchpads 2 has bedded in during 2012, it’s likely this momentum will be maintained.
So the overall goal of the Scratchpad project remains the same with the release of Scratchpad 2. We aim to create a scholarly communication system that is intertwined with the pursuit of taxonomy and systematics. In other words we want the Scratchpads to be embedded into the day to day work practices of the user community. Using the system should be seen as something that makes researchers’ lives easier, rather than being something that is just an add-on or optional extra to their activities. And this is the real justification for Scratchpads 2. The original version of the Scratchpads was put together on an ad hoc basis with a tiny budget. To use a British expression, it is rather a kludge. New funding has allowed us to step back from this and reconsider how we change the technical delivery of the Scratchpads to make them more scalable and sustainable, as well as making front end enhancements to improve the functionality and ease of use. The fact that the underlying content management system was undergoing an upgrade from version 6 to 7 also gave us a good reason to do this now. In terms of release timeline, at present all the technical development is done and we are in the process of completing the theming (in other words the presentation and views). The current plan, and I’m pretty confident this is accurate, is to have a complete version by the 3rd Feb, which we will release on the Sandbox for testing, comment and bug fixes. By early March we will push Scratchpads 2 to all new users. In other words new site requests will get Scratchpads 2 from this point on. By April we will offer an opt-in switch over for existing users. And by mid 2012 we will essentially force SP1 users to switch over, but preserve the option to stick with SP1 for some users. I’m certain some users will not want to switch because of the specific ways they are using their sites, or because the new theming breaks their sense of ownership of the site.
So what are the improvements in SP2? As I said previously, they consist of back end enhancements to make the system more sustainable, and front end enhancements to make the system more functional. I’m not going to go through the back end enhancements in detail, but in short they comprise: an Aegir hosting environment (automated site management & migration); themed project profiles (e.g. GBIF NPT, COMBER & potentially LifeDesks); Git code management (distributed source code version control); Scratchpad wide search (Apache Lucene, dedicated VM); distributed physical hosting (not just at the NHM); and data services (Darwin Core Archives & Extensions). It’s worth noting that several back end improvements are still to come after the SP2 release.
This is the list of front end enhancements, and rather than go through the list I have a slide on most of these, which I will go through quickly.
Probably the major difference people will see when it comes to SP2 is the new theming and presentation. The original was rather idiosyncratic, and left the user to customize the color scheme and layout. This resulted in some rather unprofessional looking sites, but did reinforce the users’ sense of ownership of the site and its content. It is perhaps for this reason that we are likely to have a problem moving some users to SP2, because they are so wedded to their original theme. The new theme is much more consistent, with less clutter and clearer navigation, and looks more professional. The goal was to preserve the flexibility we offered with SP1 but offer a more consistent set of navigational structures across the whole site.
So for example users see these little settings wheels where they can easily configure content and the display. Likewise we have predefined swatches that enable users to pick a series of complementary colours for their site, rather than the hodgepodge that we currently have under SP1.
Another major improvement in SP2 is a much easier editing and administration interface. All editing now takes place in overlays that sit on top of the display, providing a logical link between editing content and how it’s presented. Likewise the interface to these is much more consistent, removing a lot of the clutter that was previously present.
One of the biggest improvements in SP2 is a tabular data management environment. Almost all categories of content can be displayed in a spreadsheet-like environment that supports one-click Excel import from templates that are dynamically created by the Scratchpad to reflect the field structure of the content type. A key feature of this is the ability to instantly filter the display on any of the fields, and export this content. Even better is the ability to download the data, update it offline and then reimport it with the corrections. We are expecting this to be heavily used and to significantly drive up content within the Scratchpads.
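To make that round trip a little more concrete, here is a minimal sketch in Python. It is not the Scratchpads implementation: it uses CSV rather than Excel so that it runs with the standard library alone, and the field list, the function names and the sample rows are all hypothetical. It shows a template generated from a content type's field structure, a filtered export, and a reimport that matches corrected rows on an identifier.

```python
import csv
import io

# Hypothetical field structure for a "specimen" content type (an assumption).
FIELDS = ["id", "taxon", "locality", "collector"]


def make_template(fields):
    """Return a CSV template (header row only) mirroring the field structure."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()


def export_rows(rows, fields, **filters):
    """Export the rows whose fields equal every given filter value."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if all(row.get(name) == value for name, value in filters.items()):
            writer.writerow(row)
    return buf.getvalue()


def reimport(rows, csv_text, key="id"):
    """Apply corrected rows from csv_text onto rows, matching on the key field."""
    by_key = {row[key]: dict(row) for row in rows}
    for corrected in csv.DictReader(io.StringIO(csv_text)):
        by_key.setdefault(corrected[key], {}).update(corrected)
    return list(by_key.values())


# Placeholder records, including a deliberate typo to correct offline.
rows = [
    {"id": "1", "taxon": "Aus bus", "locality": "Londn", "collector": "Smith"},
    {"id": "2", "taxon": "Aus cus", "locality": "Paris", "collector": "Jones"},
]
exported = export_rows(rows, FIELDS, collector="Smith")
corrected = exported.replace("Londn", "London")  # stands in for the offline edit
updated = reimport(rows, corrected)
```

Matching on a stable identifier is what lets a partial, filtered export be reimported without disturbing the rows that were never downloaded.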
One of the biggest problems in SP1 was that many activities had to be performed in a particular sequence, and that this sequence wasn’t especially intuitive. As a result many users simply had to know or learn the sequence in order to achieve their goals. This is fixed in SP2 with the workflows module, which allows us to chain activities together, such as the processes involved in site setup, adding new users, or creating a new taxonomy. The result is that the whole system should be much more intuitive to the user. This is one part of SP2 that has not been themed yet so there is nothing too special to see. But the goal is that users can follow a more logical structure to running their site, without having to look at the help resources or contact the core Scratchpad team.
Another big improvement is the incorporation of a faceted search interface into all the content types. For those unfamiliar with faceted search, it’s a bit like shopping on Amazon and being presented with an intelligent list of search categories, depending on what you are looking at. So for example when you go to the default bibliography view, users see lists of authors, years and journals through which they can rapidly discover the information they are looking for. As I’ve already mentioned, for content displayed in a grid users can similarly search on any field to discover content too.
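For anyone who wants to see the idea behind faceted search in code, the following is a small Python sketch. It only illustrates the technique, not the Lucene-backed search the Scratchpads use; the records and the names `facet_counts` and `apply_facets` are made up for the example. Each facet is a field, the interface shows how many records carry each value, and picking values narrows the result list.

```python
from collections import Counter

# Made-up bibliography records, for illustration only.
REFERENCES = [
    {"author": "Smith", "year": 2010, "journal": "ZooKeys"},
    {"author": "Smith", "year": 2011, "journal": "PhytoKeys"},
    {"author": "Jones", "year": 2011, "journal": "ZooKeys"},
]


def facet_counts(records, facet_fields):
    """Count how many records carry each value of each facet field."""
    return {
        name: Counter(rec[name] for rec in records if name in rec)
        for name in facet_fields
    }


def apply_facets(records, selected):
    """Keep only the records that match every selected facet value."""
    return [
        rec for rec in records
        if all(rec.get(name) == value for name, value in selected.items())
    ]


counts = facet_counts(REFERENCES, ["author", "year", "journal"])
narrowed = apply_facets(REFERENCES, {"journal": "ZooKeys", "year": 2011})
```

In a real interface the counts would be recomputed on the narrowed list after each selection, so the user only ever sees categories that still lead somewhere.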
This faceted search interface has been incorporated into the media gallery, which has a completely new look and also includes video support. As part of this it is our intention (and I say intention because I don’t know whether this has been done yet) to embed YouTube and Vimeo videos. Likewise users should be able to embed links to specific Flickr images too. This means we don’t have to host the actual content if users want to place it elsewhere because of the benefits conferred by those other sites.
As part of the redesign we have completely changed our taxon pages. Previously these were presented as a single page and showed a mix of content from the user’s site as well as a wide range of third party content. Frankly the interface for this wasn’t very good. It was very hard to navigate and customize. Also the quality of the third party information was often very poor. As a result, users usually just displayed their own content, and not the third party content, and in fact the taxon pages simply were not very well used. We have tried to address these shortcomings in SP2 by moving all content to tabs, and limiting the sources for third party content. This uses the EOL API and at the moment only displays third party content from EOL, although we plan to expand these sources later this year.
The new interface is designed to encourage easy publishing of structured textual data (at present these are the SPM fields) to EOL. Users can also directly edit this content from the taxon page. As with SP1, the taxon pages support parent-child inheritance, so optionally data on, for example, a species page is available on the genus page too. Frankly the taxon pages are still in the process of being themed (this was only done last week) so these still have a little way to go before they are fully presentable.
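The parent-child inheritance just described can be pictured with a short Python sketch. This is an illustration under assumptions rather than the Scratchpads code: the `Taxon` class, the `page_content` function and the placeholder names are hypothetical. The optional `inherit` switch mirrors the fact that showing child content on a parent page is a choice.

```python
from dataclasses import dataclass, field


@dataclass
class Taxon:
    """A node in a classification, with content attached directly to it."""
    name: str
    content: list = field(default_factory=list)
    children: list = field(default_factory=list)


def page_content(taxon, inherit=True):
    """Return the content shown on a taxon page.

    With inherit=True the page also shows content from every descendant,
    so data entered on a species page appears on its genus page too.
    """
    items = list(taxon.content)
    if inherit:
        for child in taxon.children:
            items.extend(page_content(child, inherit=True))
    return items


# Placeholder classification: one genus with two species.
species_a = Taxon("Aus bus", content=["image of Aus bus"])
species_b = Taxon("Aus cus", content=["description of Aus cus"])
genus = Taxon("Aus", content=["genus overview"], children=[species_a, species_b])
```

Calling `page_content(genus)` gathers the genus overview plus both species items, while `page_content(genus, inherit=False)` returns the genus overview alone.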
With regard to maps, in SP1 users had a rather confusing choice of three map types. They could make use of the GBIF occurrence maps, although interestingly many users chose not to display these because of concerns about data quality. Alternatively they could display maps dynamically constructed from the specimen records in their own site. Or they could construct regional maps identifying the presence or absence of taxa in a particular country, according to the TDWG standard, which we supported to level 4.
Within SP2 the goal is to integrate all these map types, and ensure that even the third party data can be locally edited. This has already been achieved for point and TDWG region data. In addition users can now define flexible polygons, lines or annotations on the map, all with structured metadata. Technically at this stage it is also possible to import GBIF points onto the map along with their metadata, which can then be edited, and the location of points changed (or, more likely, the points suppressed on the map). However the danger here is that we are potentially recreating GBIF for certain taxa or regions within the site, and that is not the intention. Also there are major issues displaying very large numbers of points, although it is worth noting that many of these issues have now been fixed. Thus there is still some work to do here before the mapping module is fully realized with all this functionality.
Within SP2 we have made major headway in allowing users to formally publish content from their site to other resources. These publishing options range from full manuscripts containing taxon descriptions, to data sets, or even individual data projects. Essentially these publishing options are the payback for users who have engaged with the site and taken the trouble to structure their data. By doing this the system enables them to very easily reuse this information and gain credit for it via publishing outlets. In the context of formal manuscripts, a prototype system has already been in operation in SP1. This enables users to create a paper for peer review and publication in the journals ZooKeys or PhytoKeys. This is done by entering the basic metadata about the paper, selecting which Scratchpad content forms part of the paper, organising the manuscript through a dedicated interface, and then submitting it to Pensoft as structured XML using TaxPub markup. At present we have had three papers with new species descriptions published this way. As part of the SP2 work, Pensoft are altering the XML structure and we have changed the interface to make this even easier to use.
Shortly after the release of SP2 we will also be providing the option for users to do this for scientific datasets. Many researchers have checklists, or ecological, phenotypic, genotypic or morphometric data, that on their own don’t justify a traditional scientific paper. However, with sufficient descriptive metadata and a mechanism to offer this information to others in a structured way, they would justify publication. Parts of the Scratchpad, like the Character project tool, provide the means for users to import or build these datasets very quickly, and in partnership with Pensoft we will provide a mechanism through which they can be formally published at the touch of a button.
This same metaphor extends to smaller amounts of data. For example, it is our intention to support the direct publication of taxon names to ZooBank, once sufficient metadata has been deposited in the taxon editor. Likewise these principles extend to EOL SPM content. Once sufficient content is present, it should activate a button that enables (at the user’s discretion) a mechanism to push data to these third party databases. I should add that this level of granular publication has not been completed yet, although Scratchpads do (and will continue to) support the publication of species profile information to EOL. However, we hope to add these functions in the near future.
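The "button that activates once sufficient content is present" amounts to a check against a minimum data standard. Here is a minimal Python sketch of that check. The required field names below are invented purely for illustration and are not the actual requirements of ZooBank or EOL, and `missing_fields` and `can_publish` are hypothetical names.

```python
# Hypothetical minimum data standards. These field names are assumptions
# made for this example, not the real ZooBank or EOL requirements.
MINIMUM_STANDARD = {
    "zoobank": ("name", "author", "year", "publication"),
    "eol": ("name", "description", "licence"),
}


def missing_fields(record, target):
    """List the required fields that are absent or empty for a target."""
    missing = []
    for name in MINIMUM_STANDARD[target]:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def can_publish(record, target):
    """True when the record meets the minimum standard for the target."""
    return not missing_fields(record, target)


# Placeholder record that is not yet complete enough for the first target.
record = {"name": "Aus bus", "author": "Smith", "year": 2012}
before = can_publish(record, "zoobank")
gaps = missing_fields(record, "zoobank")
record["publication"] = "ZooKeys"
after = can_publish(record, "zoobank")
```

Returning the list of gaps, rather than a bare yes or no, means an interface could tell the user exactly what still needs filling in before the button becomes available.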
I just wanted to finish up with a few discussion points that particularly relate to the Scratchpads and EOL. These might be best discussed after the LifeDesk presentation, but essentially they are the issues we face either in making greater use of EOL in Scratchpads or in pushing content to EOL.
Scratchpad 2, Virtual Research Environment: Project Update Vince Smith Natural History Museum, London [email_address] EOL Content Summit Panama, 17-20 Jan. 2012
The problem <ul><li>Science is global </li></ul><ul><ul><li>It needs global standards </li></ul></ul><ul><ul><li>Global workflows </li></ul></ul><ul><ul><li>Cooperation of global players </li></ul></ul><ul><li>Science is carried out “locally” </li></ul><ul><ul><li>By local scientists </li></ul></ul><ul><ul><li>Being part of local infrastructures </li></ul></ul><ul><ul><li>Having local funders </li></ul></ul>
Scratchpads <ul><li>EDIT (07-11) , ViBRANT / eMonocot (11-13) </li></ul><ul><li>Hosted websites for taxonomists </li></ul><ul><li>Taxonomic, regional or societal </li></ul><ul><li>Research & publication platform </li></ul><ul><li>Supports the taxonomic workflow </li></ul><ul><li>Modular (Drupal) & flexible </li></ul><ul><li>Two full time developers </li></ul><ul><li>Ecosystem of communities (~300) </li></ul>http://scratchpads.eu
Scratchpad 2 Overview Justification for SP2 Release Timeline <ul><li>Backend enhancements (technical sustainability & scalability) </li></ul><ul><li>Frontend enhancements (improved functionality & ease of use) </li></ul><ul><li>Move from Drupal 6 to Drupal 7 (4 year upgrade cycle, UI + entities) </li></ul>Goal “a scholarly communication system that is intertwined with the pursuit of natural history, rather than its after-thought or annex” <ul><li>Feb. 3rd (Sandbox release for testing, comments & bug fixes) </li></ul><ul><li>March 2nd (released for new site requests) </li></ul><ul><li>April 2nd (opt in switch over for SP1 users) </li></ul><ul><li>Mid 2012 (opt out of automatic migration for all SP1 users) </li></ul>
<ul><li>Aegir hosting environment (automated site management & migration) </li></ul><ul><li>Themed project profiles (e.g. GBIF NPT, COMBER & potentially LifeDesks) </li></ul><ul><li>Git code management (distributed source code version control) </li></ul><ul><li>Scratchpad wide search (Apache Lucene, dedicated VM) </li></ul><ul><li>Distributed physical hosting (not just at the NHM) </li></ul><ul><li>Data services (Darwin Core Archives & Extensions) </li></ul><ul><li>Scratchpad site registry with metadata and usage metrics </li></ul><ul><li>Integrated Single Sign On (SSO) (e.g. Google, Facebook, author-ID) </li></ul><ul><li>Proper GUIDs beyond URLs (LSIDs, DataCite DOIs?) </li></ul>Scratchpad 2 backend enhancements Still to come…
<ul><li>Consistent theming, navigation & less clutter (more scholarly, still flexible) </li></ul><ul><li>Easier administration (editing overlays, simple content & user management) </li></ul><ul><li>Tabular data management (1 step Excel import & export) </li></ul><ul><li>Guided workflows (linking site functionality) </li></ul><ul><li>Faceted search (with multisite search options) </li></ul><ul><li>New multi-media gallery (including video support) </li></ul><ul><li>New EOL linked taxon pages (inspired by Natural History guides) </li></ul><ul><li>Consistent mapping (integrating points, polygons, regions & 3rd party data) </li></ul><ul><li>Manuscript & data publication (enhanced publication module) </li></ul><ul><li>Key construction, analysis toolbox and more data publishing outlets </li></ul>Scratchpad 2 frontend enhancements Still to come…
Scratchpad 2 taxon pages <ul><li>Easy publishing to EOL </li></ul><ul><li>Better content control </li></ul><ul><li>Direct editing </li></ul><ul><li>Parent-child data inheritance </li></ul><ul><li>Inspired by Natural History Guides </li></ul>
Scratchpad 2 mapping Three map types supported in SP1 User defined TDWG regions (up to Level 4) GBIF Maps User defined Point localities via DwC records
Scratchpad 2 mapping <ul><li>Point & TDWG region maps </li></ul><ul><li>Flexible polygon maps & annotations </li></ul><ul><li>GBIF point imports (record limits) </li></ul><ul><li>Edit point location & metadata </li></ul>
Scratchpad 2 “publication” options <ul><li>Manuscripts from Scratchpads to ZooKeys & PhytoKeys </li></ul><ul><li>Produced directly from the database </li></ul><ul><li>Three exemplar papers produced via SP1 </li></ul><ul><li>Comprehensive new data structure in SP2 </li></ul>Taxon descriptions 1. Define the publication 2. Enter metadata 3. Select taxa & content 4. Organise manuscript 5. Submit to journal
Scratchpad 2 “publication” options <ul><li>New Pensoft data publication journal </li></ul><ul><li>Published metadata descriptions of structured datasets </li></ul><ul><li>Dataset pushed to 3rd party repository </li></ul>Datasets
Scratchpad 2 “publication” options <ul><li>Name & metadata to ZooBank (GNA) </li></ul><ul><li>Taxon SPM data to EOL </li></ul><ul><li>1-Click publication (subject to minimum data standard) </li></ul>Data items
EOL Issues – discussion points <ul><li>Incomplete </li></ul><ul><li>Multiple records </li></ul><ul><li>Licensing complexity </li></ul><ul><li>Seeding Scratchpad SPM fields ??? </li></ul>API <ul><li>Profile development </li></ul><ul><li>Mapping content </li></ul><ul><li>Resources </li></ul><ul><li>Timing & policy </li></ul>LifeDesks <ul><li>Often has better content, less duplication & no licensing issues </li></ul><ul><li>Would like to see greater integration with EOL </li></ul>Wikipedia