
Scratchpad 2014-introduction


Scratchpad Introduction by Smith, V.S., Koureas, D., & Livermore, L. Updated for Feb. 2014.

Published in: Technology

  1. 1. Scratchpads Virtual Research Environments for taxonomic and biodiversity related data Dr. Vince Smith Informatics Research Leader The Natural History Museum London
  2. 2. Where to find and how to cite this presentation: Smith, V.S., Koureas, D., & Livermore, L. 2014. Scratchpads introductory presentation. Slideshare.
  3. 3. Current taxonomic data production Typically generated by small communities for “local” research projects. Publications based on countless specimens, images, maps, keys and datasets. Figure from Costello M.J. et al., 2013. doi: 10.1126/science.1230318
  4. 4. However… not publicly accessible; lack sufficient contextual metadata; published in formats that require time-consuming manual extraction; difficulty in publishing valuable datasets (i.a. local or regional Floras, Faunas). Published knowledge cannot easily be mobilised. Vast amounts of unpublished taxonomic “knowledge”.
  5. 5. On the other hand: Estimates of 7.5 million species still undescribed [1]. [1] How Many Species Are There on Earth and in the Ocean? Mora C et al. doi:10.1371/journal.pbio.1001127
  6. 6. Expected volume of taxonomic and biodiversity data. Need of extracting, aggregating and linking data on a global level.
  7. 7. The four nodes of data cycle 1. We collect and generate data 2. We curate, link and structure data 3. We analyse data 4. We publish data
  8. 8. The four nodes of data cycle What are the bottlenecks in the workflow? Data collection & generation, Data curation, Data analysis, Data publishing
  9. 9. What we need is… a seamless workflow Data collection & generation, Data curation, Data analysis, Data publishing
  10. 10. To achieve this… “Link together evolutionary data… by developing analytical tools and proper documentation and then use this framework to conduct comparative analyses, studies of evolutionary process and biodiversity analyses” Cyndy Parr, Rob Guralnick, Nico Cellinese and Rod Page. TREE. doi:10.1016/j.tree.2011.11.001 This requires data, information & knowledge to be… • Digital: not printed paper • Openly accessible: not behind barriers (e.g. paywalls) • Linked-up: not in silos
  11. 11. Scratchpads Virtual Research Environments Making taxonomy digital, open & linked
  12. 12. so… what are the Scratchpads?
  13. 13. What are Scratchpads? • Hosted websites for biodiversity data • Virtual research & publication platform • Completely open access & open source • Modular & flexible
  14. 14. What are Scratchpads? They facilitate the development of online research communities through a standardized environment for entering and curating data that allows sharing, interlinking and dissemination of research products
  15. 15. The Scratchpads concept A Scratchpad is a website that holds data for you and your community Your data External data & services
  16. 16. The Scratchpads concept
  17. 17. Examples of use: Taxa (Classifications, taxon profiles, specimens, literature, images, maps, phenotypic, genotypic & morphometric datasets, keys, phylogenies), Conservation, Projects, Regions, Societies
  18. 18. Examples of use: Red List conservation assessments
  19. 19. Examples of use: Bulbous monocot genera listed in CITES
  20. 20. Examples of use: Global Invasive Alien Species Information Partnership
  21. 21. Examples of use: Belgian Network for DNA Barcoding
  22. 22. Major integrated projects • Online resource for monocot plants • Collaboration between Kew, Oxford University and NHM • Data to be open and usable by other scientists
  23. 23. Major integrated projects • 21+ open community sites and growing • Over 45 internationally collaborating scientists • Site data feeds into a “Portal” Site List:
  24. 24. Major integrated projects • Retrieve information on any Monocot plant • Rich downloadable data • Identification keys • Model example of linked attributed data eMonocot Portal:
  25. 25. Are Scratchpads sustainable? 665 Scratchpads Communities by 7,334 active registered users covering 162,432 taxa in 735,660 pages. In total more than 1,300,000 visitors. 81 paper citations in 2012. Chart: unique visitors per month to Scratchpads sites (65,000 unique visitors/month)
  26. 26. Are Scratchpads sustainable? 2007 2011 2014 ViBRANT Virtual Biodiversity Research & & Other grants in the pipeline New Proposals
  27. 27. the main features
  28. 28. The main features Classification term oriented system Biological classifications Taxonomies Non-biological classifications Hierarchical controlled vocabularies
  29. 29. The main features Dynamic Biological Classifications Manually entered or imported Auto generated
  30. 30. The main features Taxon pages Overview of data related to taxon Generated from tagged content
  31. 31. The main features Bibliography management An inbuilt Bibliography manager Faceted browsing Taxon tagging and free keywords Import from and export to all major formats
  32. 32. The main features Specimen/Observation data Annotated full specimen/observation records Linked to images and georeferenced Linked to GenBank accession numbers
  33. 33. The main features Distribution maps Google maps based Data layers Occurrence data Distribution data TDWG regions GBIF data
  34. 34. The main features Example regional distribution
  35. 35. Create phylogenetic trees Based on Newick/NeXML Different views
  36. 36. The main features Character matrices – Key construction Quantitative or qualitative characters Auto generation of keys Taxon based matrices [Specimens based character matrices]
  37. 37. The main features Media handling Bulk upload Metadata (EXIF & Audubon Core) Media galleries
  38. 38. The main features Generation of custom pages Tagged or not External RSS Twitter feeds Media files
  39. 39. The main features Enhanced communication tools Working groups Forums Blog entries Webforms Newsletters RSS syndication Inbuilt comments
  40. 40. The main features analytical tools OBOE service i.a. Ecological informatics, Phylogenetics, Sequence alignment
  41. 41. Phylogenies MCMC methods to estimate the posterior distribution of model parameters Sequence alignment Multiple sequence alignment Microsatellite repeats finder
  42. 42. External services Integration data mobilisation more on the way…
  43. 43. IUCN data integration
  44. 44. GBIF data integration
  45. 45. Help & Support • In-site Support • Wiki • Training Courses (12 in 2012) • Ambassadors Programme • Embedded Issues Queue • Sandbox Site
  46. 46. Data publishing: a seamless workflow Data collection & generation, Data curation, Data analysis, Data publishing
  47. 47. The vision Helping researchers take credit for all research products
  48. 48. Publication module
  49. 49. The main features The Publication module Open-access journal
  50. 50. What does the BDJ publish? • Single taxon treatments and nomenclatural acts • Local or regional checklists • Sampling reports and occasional inventories • Habitat-based checklists and inventories • Ecological and biological observations of species and communities • Single identification keys • Biodiversity-related databases, including genomic, ecological and environmental data (data papers) • Biodiversity-related software tools
  51. 51. How do Scratchpads and the BDJ interact?
  52. 52. Working in a single environment Allow submission of datasets for publication without reformatting and restructuring based on standardised XML schema
  53. 53. Assembling a manuscript • Work on multiple manuscripts • Allocate different people to different manuscripts • Handle permissions
  54. 54. Assembling a manuscript Data included in manuscript in a structured annotated format Author names and affiliations
  55. 55. Assembling a manuscript Taxon descriptions
  56. 56. Assembling a manuscript Specimen data
  57. 57. Figures and Tables
  58. 58. Supplementary files Select from existing or upload new
  59. 59. Assembling a manuscript References Easily cite bibliography Auto compile list of references
  60. 60. Assembling a manuscript Texts
  61. 61. The publication module Author names and affiliations Taxon descriptions Specimen data Figures and Tables XML Keys References Supplementary files Texts
  62. 62. Previewing your manuscript
  63. 63. Submission & enhanced peer review • Manuscript data validation • One-click submission to BDJ • Traditional peer review and optional panel/public review
  64. 64. The workflow (diagram): Community, SCRATCHPADS, XML submission, PENSOFT JOURNAL SYSTEM (PJS 2.0), MANUSCRIPT PUBLISHED (XML, PDF), Archive datasets, Occurrence data, Taxon treatments, Plazi, Taxon names, Wiki
  65. 65. Scratchpads are an integrated system to Enter, Curate, Mark-up, Link and Publish data: a taxonomic workflow in a single virtual environment
  66. 66. Acknowledgements Scratchpads technical development - Vince Smith, Simon Rycroft, Ben Scott, Ed Baker, Alice Heaton, Katherine Boutton; Scratchpads outreach - Laurence Livermore, Isa van de Velde & Dimitris Koureas; e-Monocot - Paul Wilkin & the Kew team, Charles Godfray & the Oxford team; ViBRANT - Vince Smith, Dave Roberts & Lucy Reeve; Pensoft - Lyubomir Penev and the Pensoft team; Our 7000+ users
  67. 67. Thank you Data collection & generation, Data curation, Data analysis, Data publishing
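
Slide 35 names Newick as one of the formats behind the phylogenetic tree feature. For readers unfamiliar with the format, the following is a minimal sketch of how a simple Newick string maps onto a tree structure. It is not Scratchpads code: the names `Node`, `parse_newick` and `leaf_names` are hypothetical, the example tree is invented, and the parser assumes plain labels with no quoted names or bracketed comments.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One tree node: optional label, optional branch length, child nodes."""
    name: str = ""
    length: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse a simple Newick string (assumption: no quoted labels or comments)."""
    s = text.strip()
    if not s.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    s = s[:-1]
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        # An opening parenthesis starts a comma-separated list of children.
        if pos < len(s) and s[pos] == "(":
            pos += 1
            while True:
                node.children.append(parse_node())
                if pos >= len(s):
                    raise ValueError("unbalanced parentheses")
                if s[pos] == ",":
                    pos += 1
                elif s[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected character {s[pos]!r}")
        # Optional label, read up to the next structural character.
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        node.name = s[start:pos].strip()
        # Optional branch length after ':'.
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            node.length = float(s[start:pos])
        return node

    root = parse_node()
    if pos != len(s):
        raise ValueError("trailing characters after tree")
    return root


def leaf_names(node: Node) -> List[str]:
    """Collect leaf labels in left-to-right order."""
    if not node.children:
        return [node.name]
    names: List[str] = []
    for child in node.children:
        names.extend(leaf_names(child))
    return names


if __name__ == "__main__":
    tree = parse_newick("((A:0.1,B:0.2)AB:0.3,C:0.4)root;")
    print(leaf_names(tree))
```

In the invented example, the nested parentheses group taxa A and B under an internal node labelled AB, with C as its sister; the numbers after each colon are branch lengths.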