Scratchpads
Virtual Research Environments for taxonomic and biodiversity-related data

Dr. Vince Smith
Informatics Research Leader
The Natural History Museum London
Where to find and how to cite this presentation:

Smith, V.S., Koureas, D., & Livermore, L. 2014. Scratchpads introductory presentation. Slideshare.
http://www.slideshare.net/vsmithuk/Scratchpad-2014-Introduction
Current taxonomic data production

Typically generated by small communities for “local” research projects

Figure from Costello M.J et al, 2013. doi: 10.1126/science.1230318

Publications based on countless specimens, images, maps, keys and datasets
However…
not publicly accessible
lack sufficient contextual metadata
published in formats that require time-consuming manual extraction
difficulty in publishing valuable datasets (i.a. local or regional Floras, Faunas)

Published knowledge cannot easily be mobilised
Vast amounts of unpublished taxonomic “knowledge”
On the other hand:
Estimates of 7.5 million species still undescribed [1]

[1] How Many Species Are There on Earth and in the Ocean? Mora C et al. doi:10.1371/journal.pbio.1001127
Expected volume of taxonomic and biodiversity data

Need for extracting, aggregating and linking data on a global level
The four nodes of the data cycle

1. We collect and generate data
2. We curate, link and structure data
3. We analyse data
4. We publish data
The four nodes of the data cycle

What are the bottlenecks in the workflow?

Data collection & generation
Data curation
Data analysis
Data publishing
What we need is… a seamless workflow

Data collection & generation
Data curation
Data analysis
Data publishing
To achieve this…

“Link together evolutionary data… by developing analytical tools and proper documentation and then use this framework to conduct comparative analyses, studies of evolutionary process and biodiversity analyses”
Cyndy Parr, Rob Guralnick, Nico Cellinese and Rod Page. TREE. doi:10.1016/j.tree.2011.11.001

This requires data, information & knowledge to be…

• Digital: not printed paper
• Openly accessible: not behind barriers (e.g. paywalls)
• Linked-up: not in silos
Scratchpads
Virtual Research Environments

Making taxonomy digital, open & linked
so… what are the Scratchpads?
What are Scratchpads?

• Hosted websites for biodiversity data
• Virtual research & publication platform
• Completely open access & open source
• Modular & flexible
What are Scratchpads?
Scratchpads facilitate the development of online research communities through a standardised environment for entering and curating data, which allows sharing, interlinking and dissemination of research products
The Scratchpads concept
A Scratchpad is a website that holds data for you and your community

Your data

External data & services
The Scratchpads concept
Examples of use:

Taxa
(Classifications, taxon profiles, specimens, literature, images, maps, phenotypic, genotypic & morphometric datasets, keys, phylogenies)

Conservation

Projects

Regions

Societies
Examples of use:
Red List conservation assessments
Examples of use:
Bulbous monocot genera listed in CITES
Examples of use:
Global Invasive Alien Species Information Partnership
Examples of use:
Belgian Network for DNA Barcoding
Major integrated projects

• Online resource for monocot plants
• Collaboration between Kew, Oxford University and NHM
• Data to be open and usable by other scientists
Major integrated projects

• 21+ open community sites and growing
• Over 45 internationally collaborating scientists

• Site data feeds into a “Portal”

Site List: http://about.e-monocot.org/list-emonocot-scratchpads
Major integrated projects

• Retrieve information on any Monocot plant
• Rich downloadable data
• Identification keys
• Model example of linked attributed data
eMonocot Portal: http://e-monocot.org/
Are Scratchpads sustainable?

665 Scratchpads communities by 7,334 active registered users, covering 162,432 taxa in 735,660 pages.

In total more than 1,300,000 visitors
81 paper citations in 2012

Unique visitors per month to Scratchpads sites: 65,000
Are Scratchpads sustainable?

Timeline: 2007, 2011, 2014

ViBRANT
Virtual Biodiversity Research

Other grants in the pipeline
New Proposals
The main features

Classification term oriented system

Biological classifications
Taxonomies

Non-biological classifications
Hierarchical controlled vocabularies
The main features
Dynamic Biological Classifications

Manually entered or imported

Auto generated
The main features
Taxon pages
Overview of data related to taxon

Generated from tagged content
The main features
Bibliography management

An inbuilt Bibliography manager

Faceted browsing
Taxon tagging and free keywords
Import from and export to all major formats
The main features
Specimen/Observation data

Annotated full specimen/observation records
Linked to images and georeferenced
Linked to GenBank accession numbers
The main features
Distribution maps
Google maps based

Data layers
Occurrence data

Distribution data
TDWG regions

GBIF data
The main features

Example regional distribution
Create phylogenetic trees
Based on Newick/NeXML
Different views
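
For readers unfamiliar with Newick, the format writes a tree as nested parentheses with optional node names and branch lengths. The sketch below is an illustration only, not the parser Scratchpads uses; the function names `parse_newick` and `leaf_names` are hypothetical, and quoted labels and comments are not handled.

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    text = text.strip().rstrip(";")
    pos = 0

    def read_token(stop_chars):
        # Read characters up to the next structural character.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stop_chars:
            pos += 1
        return text[start:pos].strip()

    def read_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children.append(read_node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                children.append(read_node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses in Newick string")
            pos += 1  # consume ")"
        name = read_token(",():")
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1  # consume ":"
            length = float(read_token(",()"))
        return (name, length, children)

    tree = read_node()
    if pos != len(text):
        raise ValueError("unexpected trailing text in Newick string")
    return tree


def leaf_names(tree):
    """Return the names of all terminal nodes, left to right."""
    name, _length, children = tree
    if not children:
        return [name]
    names = []
    for child in children:
        names.extend(leaf_names(child))
    return names


tree = parse_newick("((A:0.1,B:0.2)AB:0.3,C:0.4)root;")
print(leaf_names(tree))
```

The nested tuples are enough to drive different views of the same tree, for example listing the terminal taxa or walking the clades.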
The main features
Character matrices – Key construction

Quantitative or qualitative characters
Auto generation of keys
Taxon based matrices
[Specimens based character matrices]
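
One way to picture automatic key generation from a taxon-based matrix of qualitative characters is sketched below. This is an assumption-laden illustration, not the algorithm Scratchpads implements: the matrix contents, the names `build_key` and `identify`, and the rule for choosing the next character (the one whose largest group is smallest) are all hypothetical.

```python
# Hypothetical taxon-by-character matrix with qualitative states.
MATRIX = {
    "Taxon A": {"wings": "present", "antennae": "short"},
    "Taxon B": {"wings": "present", "antennae": "long"},
    "Taxon C": {"wings": "absent", "antennae": "short"},
}


def build_key(matrix, taxa=None):
    """Return a taxon name, or (character, {state: subkey}) for a couplet."""
    taxa = sorted(matrix) if taxa is None else taxa
    if len(taxa) == 1:
        return taxa[0]
    best_character, best_groups = None, None
    for character in matrix[taxa[0]]:
        groups = {}
        for taxon in taxa:
            groups.setdefault(matrix[taxon][character], []).append(taxon)
        if len(groups) < 2:
            continue  # this character does not separate the remaining taxa
        largest = max(len(group) for group in groups.values())
        if best_groups is None or largest < max(len(g) for g in best_groups.values()):
            best_character, best_groups = character, groups
    if best_groups is None:
        return taxa  # the matrix cannot separate these taxa any further
    return (
        best_character,
        {state: build_key(matrix, group) for state, group in best_groups.items()},
    )


def identify(key, observed):
    """Follow the key using observed character states until a taxon is reached."""
    while isinstance(key, tuple):
        character, branches = key
        key = branches[observed[character]]
    return key


key = build_key(MATRIX)
print(identify(key, {"wings": "present", "antennae": "long"}))
```

Taxa that share every character state come back as a list, which signals that the matrix needs another character before the key can separate them.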
The main features
Media handling

Bulk upload
Metadata
(EXIF & Audubon Core)

Media galleries
The main features
Generation of custom pages

Tagged or not
External RSS
Twitter feeds
Media files
The main features
Enhanced communication tools

Working groups

Forums
Blog entries
Webforms
Newsletters
RSS syndication
Inbuilt comments
The main features

Analytical tools

OBOE service
i.a. Ecological informatics, Phylogenetics, Sequence alignment
Phylogenies
MCMC methods to estimate the posterior distribution of model parameters

Sequence alignment
Multiple sequence alignment

Microsatellite repeats finder
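
To give a feel for what a microsatellite repeats finder does, here is a minimal sketch using a regular expression with a back-reference. It is not the OBOE service's implementation; the function name `find_microsatellites` and the default thresholds (units of 2 to 6 bases, at least 3 repeats) are assumptions chosen for illustration.

```python
import re


def find_microsatellites(sequence, min_unit=2, max_unit=6, min_repeats=3):
    """Return (start, unit, repeat_count) for each tandem repeat found."""
    # Group 1 captures a short unit; \1{n,} requires it to repeat in tandem.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        repeats = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, repeats))
    return hits


print(find_microsatellites("TTACACACACGG"))
```

Start positions are zero-based. A run of a single base is reported with a two-base unit, so a real tool would likely treat homopolymers separately.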
External services Integration
data mobilisation

more on the way…
IUCN data integration
GBIF data integration
Help & Support
• In-site Support

• Wiki
• Training Courses (12 in 2012)

• Ambassadors Programme
• Embedded Issues Queue

• Sandbox Site

http://help.scratchpads.eu
Data publishing

a seamless workflow

Data collection & generation
Data curation
Data analysis
Data publishing
The vision

Helping researchers take credit for all research products
Publication module
The main features
The Publication module

Open-access journal
What does the Biodiversity Data Journal (BDJ) publish?
• Single taxon treatments and nomenclatural acts
• Local or regional checklists
• Sampling reports and occasional inventories
• Habitat-based checklists and inventories
• Ecological and biological observations of species and communities
• Single identification keys
• Biodiversity-related databases, including genomic, ecological and environmental data (data papers)
• Biodiversity-related software tools
How do Scratchpads and the BDJ interact?
Working in a single environment

Allow submission of datasets for publication without reformatting and restructuring, based on a standardised XML schema
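
The idea of exporting structured records as XML, so nothing has to be retyped for the journal, can be sketched as follows. The element names (`specimen`, `taxon`, `country`, `collector`) and the function `specimen_to_xml` are hypothetical; the actual schema used for BDJ submission is not shown in these slides.

```python
import xml.etree.ElementTree as ET


def specimen_to_xml(record):
    """Serialise one specimen record (a dict of fields) as an XML string."""
    root = ET.Element("specimen")  # hypothetical element name
    for field, value in record.items():
        child = ET.SubElement(root, field)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical record as it might already be held in a site database.
record = {"taxon": "Aus bus", "country": "United Kingdom", "collector": "A. Collector"}
xml_text = specimen_to_xml(record)
print(xml_text)

# The receiving system can read the same fields back without manual extraction.
parsed = ET.fromstring(xml_text)
print(parsed.find("taxon").text)
```

Because both sides agree on the structure, the record moves from the research site to the manuscript with its field labels intact.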
Assembling a manuscript

• Work on multiple manuscripts
• Allocate different people to different manuscripts

• Handle permissions
Assembling a manuscript
Data included in manuscript in a structured annotated format
Author names and affiliations
Assembling a manuscript
Taxon descriptions
Assembling a manuscript
Specimen data
Figures and Tables
Supplementary files

Select from existing or upload new
Assembling a manuscript
References

Easily cite bibliography

Auto compile list of references
Assembling a manuscript
Texts
The publication module
Author names and affiliations
Taxon descriptions
Specimen data

Figures and Tables
XML
Keys
References
Supplementary files
Texts
Previewing your manuscript
Submission & enhanced peer review

• Manuscript data validation
• One-click submission to BDJ
• Traditional peer review and optional panel/public review
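
Manuscript data validation, in the sense of checking for important missing fields before submission, might look like the sketch below. The field names in `REQUIRED_FIELDS` and the function `missing_fields` are assumptions for illustration and do not reflect the real validation rules.

```python
# Hypothetical set of fields a manuscript must have before submission.
REQUIRED_FIELDS = ("title", "authors", "taxon_name")


def missing_fields(manuscript, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in the manuscript."""
    return [field for field in required if not manuscript.get(field)]


draft = {"title": "A regional checklist", "authors": []}
problems = missing_fields(draft)
if problems:
    print("Cannot submit, missing:", ", ".join(problems))
else:
    print("Ready for submission")
```

An empty list means the draft passes this check; anything else can be shown to the author before the submission step is offered.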
The workflow

Scratchpads → XML submission → Pensoft Journal System (PJS 2.0) → manuscript published (XML, PDF)

Other diagram labels: Community, Archive, datasets, Occurrence data, Taxon treatments, Plazi, Taxon names, Wiki
Scratchpads are an integrated system to Enter, Curate, Mark-up, Link and Publish data

A taxonomic workflow in a single virtual environment
Acknowledgements
Scratchpads technical development
- Vince Smith, Simon Rycroft, Ben Scott, Ed Baker, Alice Heaton, Katherine Boutton

Scratchpads outreach
- Laurence Livermore, Isa van de Velde & Dimitris Koureas

e-Monocot
- Paul Wilkin & the Kew team, Charles Godfray & the Oxford team

ViBRANT
- Vince Smith, Dave Roberts & Lucy Reeve

Pensoft
- Lyubomir Penev and the Pensoft team

Our 7000+ users
Thank you

Data collection & generation
Data curation
Data analysis
Data publishing
Editor's Notes

  • #6 With almost 8 million species still to be described,
  • #12 The Scratchpads platform has been developed over the last 5 years under this framework, to provide researchers with the necessary tools to make taxonomy digital, open and linked, and to facilitate the development of virtual research environments.
  • #24 The project has more than 21 eMonocot Scratchpads, with over 45 international collaborating scientists. The eMonocot Scratchpads cover over 15 families, with more planned through additional workshops which will take place this year at Monocots V in New York. The Scratchpad to eMonocot Portal link is now active, and the public can browse all the Scratchpad data combined with other external monocot resources.
  • #25 All of the information is brought together in the eMonocot portal. The information presented here will be especially useful for anyone studying the ecology or evolution of the monocot plants, or who wants to understand monocot biodiversity and conservation. The portal provides taxon descriptions, distribution maps, taxonomies and keys, all of which are downloadable and attributed to the author and contributing site.
  • #31 Intuitive, professional-looking layout. Easy to compile taxon pages without any knowledge of web design. Taxonomy provides the crucial backbone, linking content together, and is easily updateable. On this page you can see the classification browser in the side bar, detailed nomenclatural information, images and a diagnostic summary.
  • #57 No need to restructure, correctly format or gather additional data. Once entered in a Scratchpad, data can automatically be added to the publication.
  • #66 Prior to submission all data are validated to ensure there are no important missing fields. Once you have finished your manuscript there is a simple one-click submission process where all your specified authors and contributors are given access to the article in the Pensoft Journal System and will be updated on the review process.