Making your data work for you: Scratchpads, publishing & the biodiversity data journal
This is a derivative of a talk I gave at the Linnean Society on 20th Sept. 2012. This version was given at the i4Life Environmental Genomics workshop on 25th Sept. and was refocused to look at the dark taxa problem and at developing published descriptions of molecular sequence clusters.

Published in: Technology

  1. Making your data work for you: Scratchpads, publishing & the Biodiversity Data Journal. EBI, UK, 25 September 2012. Vince Smith [1], Dave Roberts [1] & Lyubomir Penev [2]. [1] Natural History Museum, London; [2] Pensoft Publishers, Sofia, Bulgaria. vince@vsmith.info
  2. Our informatics grand challenge… "Link together evolutionary data… by developing analytical tools and proper documentation and then use this framework to conduct comparative analyses, studies of evolutionary process and biodiversity analyses" Cyndy Parr, Rob Guralnick, Nico Cellinese and Rod Page. TREE. doi:10.1016/j.tree.2011.11.001
  3. Our informatics grand challenge… "Link together evolutionary data… by developing analytical tools and proper documentation and then use this framework to conduct comparative analyses, studies of evolutionary process and biodiversity analyses" Cyndy Parr, Rob Guralnick, Nico Cellinese and Rod Page. TREE. doi:10.1016/j.tree.2011.11.001
     This requires data, information & knowledge to be…
     • Digital: not printed paper
     • Openly accessible: not behind barriers
     • Linked-up: not in silos
  4. Most of our output is not digital, open or linked
     • 15-20k new spp. described annually (2M total) [1]
     • 30k nomenclatural acts (12M total) [1]
     • 20k phylogenies (750k total) [2]
     • 31k taxa sequenced (360k taxa total) [3]
     • 800k BioMed papers (40M total pp. of taxonomy) [4]
     • Countless specimens, images, maps, keys…
     Typically generated by small communities for "local" research projects.
     Figures from [1] Zhang, Zootaxa 2011 4, 1-4; [2] Web-of-Science; [3] Genbank and [4] PubMed.
  5. Scratchpad Virtual Research Environments: Making taxonomy digital, open & linked
  6. What is a Scratchpad? A website for you & your community
     • Step 1: Your data
     • Step 2: Uploaded & tagged
     • Step 3: "Published" & reviewed on your site
     Fast, intuitive, fit for use
  7. Scratchpads (http://scratchpads.eu)
     • EDIT (07-11), ViBRANT / eMonocot (11-13)
     • Hosted websites for taxonomists
     • Taxonomic, regional or societal
     • Research & publication platform
     • Supports the taxonomic workflow
     • Modular (Drupal) & flexible
     • Two full time developers
     • Ecosystem of communities (~450)
  8. Categories of Scratchpads
     • Taxa (classifications, taxon profiles, specimens, literature, images, maps, phenotypic, genotypic & morphometric datasets, keys, phylogenies)
     • Conservation
     • Projects
     • Regions
     • Societies
  9. Summary of what Scratchpads can do
     • Taxon pages, generated from tagged content (plant/animal)
     • Bibliography management
     • Character matrices
     • Specimen records
     • Distribution maps (from specimens and regional)
     • Images, video and sound (bulk import)
     • Excel spreadsheet import (dynamically generated)
     • Darwin Core Archive export
     • Tabular data editing
     • Custom content
     • User management
     • Custom webforms
     • EOL data import (taxonomy, species information)
     • GBIF Map integration
  10. Scratchpad v.1 usage (2007 - Mar. 2012)
      • Nodes: 430,948
      • Sites: 326
      • Users: 6809
      • Active users: 5733 (273 w / 759 m)
      • Users per site: range 1-1049, mean 15, mode 1
      • Prof. scientists
      • Amateur naturalists
      • Citizen scientists
      (Chart labels: ViBRANT, SP 2)
  11. Scratchpad 2: the new version of Scratchpads
      • Launched March 2012
      • 120 sites to date
      • EOL Fellows
      • SP1 migration ongoing
      • More professional
      • Easier to…
        - configure (workflows)
        - navigate (facets)
        - & populate (MS Excel templates)
      • Greater standardisation
      • Still highly flexible
      • Project profiles (eMonocot)
      • Framework for integration
      e.g. http://ihs.myspecies.info/
  12. Getting data in and out of Scratchpads 2
  13. Online community revision
      • Taxonomy is in perpetual beta
        - Constantly evolving
        - Changing contributors
        - Small granular contributions
      • Sustainability
        - A permanent space to work
        - Guaranteed access (2016)
        - Easy ways to get the data out
      • Open science
        - Beyond Open Access
        - New ways of working
        - Data management plans
      • Need incentives to use
        - More efficient (functions & reuse)
        - Attribution & provenance
        - Credit via citation
      • New forms of publication
      Example site: Freeloader flies, http://milichiidae.info
  14. Publishing observations & taxon data: http://scratchpads.eu > http://gbif.org & http://eol.org
      • Specimen records & species pages on Scratchpads (>19K specimen records, >122k species pages)
      • Darwin Core Archive (DwCA)
      • Pushed to GBIF & EOL (requires site registration with GBIF & EOL): >377M specimen records in GBIF, >1M species pages in EOL
  15. Experiments with article publishing: http://scratchpads.eu > http://pensoft.net
      • Paper assembled from Scratchpad database: 5-step workflow for selecting data, adding metadata & previewing
      • XML submission, peer review & marked-up publication by Pensoft (XML, HTML, PDF), e.g. doi:10.3897/zookeys.50.539
      • Published in ZooKeys & PhytoKeys (worldwide coverage)
  16. Example papers via Scratchpads…
      • Blagoderov V, Hippa H, Nel A (2010). ZooKeys 50: 79–90. doi: 10.3897/zookeys.50.506. Live version: http://sciaroidea.info/node/44428
      • Faulwetter S, Chatzigeorgiou G, Galil BS, Nicolaidou A, Arvanitidis C (2011). ZooKeys 150: 327–345. doi: 10.3897/zookeys.150.1877. Live version: http://polychaetes.marbigen.org/node/35
      • Brake I, von Tschirnhaus M (2010). ZooKeys 50: 91–96. doi: 10.3897/zookeys.50.505. Live version: http://milichiidae.info/node/14995
      The URLs point to live (updated) versions of these papers.
  17. BDJ: The Biodiversity Data Journal. Making small data big!
  18. Why do we need another new journal!!! Taxonomy needs less fragmentation, not more! BUT…
      • We need to encourage taxonomists to mobilize & describe their data
      • This takes considerable effort (e.g. Scratchpads)
      • "Arguably" this is best rewarded through credit
      • This means papers and citations
      • Process must be very easy for authors
      • Process must facilitate data reuse
      • Meet "Open Data" policy commitments
      • The Biodiversity Data Journal is very different…
  19. Biodiversity Data Journal (BDJ)
      • All data matters: No lower or upper limit of manuscript size!
      • Multiple publishing routes (not just Scratchpads)
      • ALL within a single online collaborative platform, including the writing of the manuscript!
      • New collaborative article authoring tool
      • Community peer review with "open" & "public" options
      • This is in addition to conventional peer-review
      • Online editorial process and version control
      • Standards-compliant (Darwin Core, Dublin Core, NLM etc.)
      • Pre-defined Code-compliant article templates
  20. BDJ publication & dissemination workflow
      • Inputs:
        - GBIF-generated manuscripts from metadata descriptions
        - Manuscripts generated from authors' databases
        - Scratchpads-generated manuscripts
        - Authors: conventional manuscripts (MS Word, Open Office)
      • Pensoft Writing Tool (PWT) and Pensoft Journal System (PJS)
      • Marked up final publication in PDF, HTML and XML formats
  21. Pensoft manuscript writing tool
      • Collaborative online editing
      • Rich text capabilities
      • Various templates for taxon treatments
      • Identification keys builder
      • Species occurrence data import (Darwin Core compliant)
      • Smart citation for figures, tables, references & automated positioning
      • Assembling plates from single figures
      • References import (CrossRef, PubMed Central, etc.)
      Diagram: the lead author creates a template-based manuscript (taxon treatment, interactive key, checklist or data paper) and invites co-authors (authoring) and contributors (contributing: mentor, linguistic editor, copy editor, potential reviewer, colleague/friend).
  22. Testing screenshots of the writing tool: Manuscript preview; Multi-figure plates; Plate layout; ID Key preview; ID Key builder
  23. Why publish in the BDJ?
      • Joining (small) data into a large data pool
      • Open-access, archiving and re-using your data through data aggregators
      • Providing citation record and creditability for data in the form of peer-reviewed publications
      • Facilitating online article authoring and editorial process for authors, reviewers and editors
      • Using a truly innovative dissemination of atomized content
      • Very low-cost. Free in the launch phase, thereafter at a fee that anyone can afford!
  24. What will BDJ publish?
      • Single taxon treatments and nomenclatural acts
      • Local or regional checklists
      • Sampling reports and occasional inventories
      • Habitat-based checklists and inventories
      • Ecological and biological observations of species and communities?
      • Single identification keys
      • ANY KIND of biodiversity-related database, including genomic, ecological and environmental data (data papers)
      • Biodiversity-related software tools
      Starting late 2012, early 2013. Recruiting editors now.
  25. BDJ: Barcoding, genomic & environmental sequence papers. Making small data big!
  26. Mammal taxa added to Genbank annually. Chart legend: "Aus sp." = "dark taxa", taxa (specimens) that aren't identified to a known species; proper Linnaean names.
  27. Proportion of mammal dark taxa in Genbank. Chart legend: "Aus sp."; proper Linnaean names.
  28. Proportion of invert. dark taxa in Genbank. Chart label: BOLD.
  29. Dark taxa are the norm for bacteria
  30. A lesson in principles for dealing with dark taxa: Roth v. Wikipedia. http://www.newyorker.com/online/blogs/books/2012/09/an-open-letter-to-wikipedia.html
  31. But Wikipedia said "no". "I understand your point that the author is the greatest authority on their own work," writes the Wikipedia Administrator, "but we require secondary sources."
  32. But Wikipedia said "no". One of Wikipedia's core principles, along with things like neutrality, is verifiability: a reader must be able to look at a statement in a Wikipedia article and find out where it comes from. http://quominus.org/archives/981
  33. Lessons for taxonomy & dark taxa…
      • Taxonomic statements should be verifiable
      • Literature is the evidence base for taxonomy
      • Literature should be the evidence base for dark taxa
      http://quominus.org/archives/981
  34. Example templates & dissemination
      • Templates: Occurrence data; "Dark" taxon data; Genome descriptions; Environmental sequence data; Morphometric data; Image galleries; Any other data
      • Biodiversity manuscript > XML mark up > Articles: structured text (data!)
      • Extracted content: Taxon treatments; Taxon names; Bibliographies; Occurrence data
      • Dissemination: COL, Plazi, Wiki, BHL
  35. Example template & data fields
  36. Workflow describing "Dark Taxa"
      • Dark taxon sequenced. Metadata: voucher specimen, images, locality, etc.
      • Automated submission to Pensoft Writing Tool
      • PWT: collaborative article authoring tool
      • Manuscript finalisation & submission
      • BDJ: peer-review
      • Manuscript published
      • Automated update of bibliographic metadata, taxon name, Zoobank record, etc.
  37. Data published: Nomenclature; Literature; Descriptions; Plazi; Images; Occurrences
  38. "Dark Taxon" papers
      • Should contain…
        - The scope of the taxonomic, ecological & geographic coverage
        - The sources of voucher specimens
        - The sampling & lab. protocols used
        - The process used to ID taxa to which vouchers belong
      • Possible data fields include…
        - Average no. of records per taxon
        - Range of records per taxon (min-max)
        - Average, min. and max. sequence length
        - Range of intraspecific variation
        - Median variation within taxon (X%)
        - Range of divergence to closest known taxon pairs (min & max?)
        - Median divergence between closest taxon pair
  39. Possible discussion points…
      • The concept…
        - Is it a good approach to incentivize data publishing & good metadata practices?
        - The suitability for "Dark Taxa", new genomes and env. sequence data
        - Is this more suitable for some data papers (e.g. dark taxa) than others?
      • The practicalities…
        - The fit to existing systems (both for data collection and dissemination)
        - The data fields ("Dark Taxa", new genomes and env. sequence data)
        - Next steps in developing this concept
  40. Acknowledgements
      • Scratchpad technical development: Simon Rycroft, Ben Scott, Ed Baker, Alice Heaton, Katherine Boulton
      • Scratchpad outreach: Irina Brake, Laurence Livermore, Dimitris Koureas
      • E-Monocot: Paul Wilkin & the Kew team, Charles Godfray & the Oxford team
      • ViBRANT: Dave Roberts, Lucy Reeve & many many more
      • Pensoft: Lyubomir Penev, Teodor Georgiev & colleagues
      • Our 7,000+ users
  41. Pensoft Writing Tool (PWT) and Pensoft Journal System (PJS): peer-review options
      • Review types: Closed Review; Public Review; Community Review
      • Diagram labels: Authors; Collaborative online writing; Online editing; Review requests; Editor; Nominated reviewers; Panel reviewers; Public reviewers; All reviews assembled into a single online version; Editorial decision & feedback; Author's revised manuscript; Publication & dissemination
  42. Why we need new methods of publishing… Re-use of content. Publishing and sharing of primary data. Primary data. Drawings: Slavena Peneva
  43. Source: Wikipedia
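
The "Aus sp." definition of dark taxa (slide 26) and the possible data fields for a "Dark Taxon" paper (slide 38) could be computed along the following lines. This is only a minimal sketch, not the BDJ or Pensoft implementation: the function names, the list of placeholder markers, and the use of an uncorrected p-distance on pre-aligned, equal-length sequences as the measure of variation and divergence are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, median

# Assumption: open-nomenclature markers that flag a label as a "dark taxon".
PLACEHOLDERS = {"sp", "spp", "cf", "aff", "nr"}


def is_dark_taxon(name):
    """Return True if a label is not a plain Linnaean binomial (e.g. "Aus sp.")."""
    tokens = name.split()
    if len(tokens) < 2:
        return True  # genus-only or empty label
    return any(
        tok.rstrip(".").lower() in PLACEHOLDERS or any(ch.isdigit() for ch in tok)
        for tok in tokens[1:]
    )


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def _spread(values):
    """Median, minimum and maximum of a list, or None when the list is empty."""
    if not values:
        return None
    return {"median": median(values), "min": min(values), "max": max(values)}


def summarise(records):
    """Summarise a non-empty list of (taxon_label, aligned_sequence) pairs."""
    by_taxon = defaultdict(list)
    for label, seq in records:
        by_taxon[label].append(seq)

    counts = [len(seqs) for seqs in by_taxon.values()]
    lengths = [len(seq) for _, seq in records]

    # Pairwise distances within each taxon (intraspecific variation).
    intra = [
        p_distance(a, b)
        for seqs in by_taxon.values()
        for a, b in combinations(seqs, 2)
    ]

    # For each taxon, the smallest distance to any sequence of another taxon.
    nearest = []
    for label, seqs in by_taxon.items():
        others = [s for other, group in by_taxon.items() if other != label for s in group]
        if others:
            nearest.append(min(p_distance(a, b) for a in seqs for b in others))

    return {
        "taxa": len(by_taxon),
        "dark_taxa": sum(is_dark_taxon(label) for label in by_taxon),
        "mean_records_per_taxon": mean(counts),
        "records_per_taxon_range": (min(counts), max(counts)),
        "mean_sequence_length": mean(lengths),
        "sequence_length_range": (min(lengths), max(lengths)),
        "intraspecific": _spread(intra),
        "nearest_neighbour": _spread(nearest),
    }
```

Calling `summarise` on a list such as `[("Aus sp. 1", "ACGTACGT"), ("Aus sp. 1", "ACGTACGA"), ("Aus sp. 2", "TCGTACCA")]` returns one dictionary whose keys mirror the slide's data fields. A real data paper would more likely use a model-corrected distance and an alignment step, both of which are left out here.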