Sharing re-usable phylogenetic data: we're not there yet

My talk given at TDWG (Florence, Italy), 9am 31st October 2013


Transcript

  • 1. Sharing reusable phylogenetic data: we're not there yet Ross Mounce @rmounce http://orcid.org/0000-0002-3520-2046
  • 2. A talk of two halves 1.) Outlining the extent of the problem (lack of) sharing, standards, care (?) 2.) What I'm trying to do about it: Digging data out of PDFs Re-releasing as
  • 3. Where's the data? Just ~4% of published phylogenetic studies in 2010 publicly archived their supporting phylo data in Stoltzfus A, O'Meara B, Whitacre J, Mounce R, Gillespie E, Kumar S, Rosauer D, & Vos R. 2012 Sharing and re-use of phylogenetic trees (and associated data) to facilitate synthesis BMC Research Notes 10.1186/1756-0500-5-574 Check our data yourself on Dryad here: 10.5061/dryad.h6pf365t
  • 4. Scientists cannot be relied upon to share published data upon request. This has been known for a while now, e.g. (in Psychology) Wicherts et al. 2006. But it has been confirmed to be true for phylogenetics too: Drew et al. 2013, 'Lost Branches in the Tree of Life', report that just ~16% of researchers contacted supplied the requested ('published') phylo data. My own experience tallies with this – I soon stopped bothering to ask people via email for a copy of their published data. It's a waste of time.
  • 5. The (Single) Supplementary Data File was a Y2K solution – a dump. Many legacy journal supplementary data systems bury data and leave it there to decompose. Often not re-usable in form, e.g. a lazy PDF. Sometimes 'typeset', corrupting the data. A jumble of words & data where the bit you want is on page 92 (no programmatic access). Research data: BURIED and really not very discoverable. Do reviewers even look at it? I think not tbh.
  • 6. I wasted too much of my PhD trying to get usable data to re-analyze. This is what I felt like... So I tried to do something about it... An open letter in support of palaeontology data archiving: www.supportpalaeodatarchiving.co.uk Which was picked up by Nature News, which in turn got me in touch with:
  • 7. Part 2: Since few will help you to re-use their data, you've got to dig it out and make it re-usable yourself AND re-release it openly so no-one else wastes their time doing this.
  • 8. It's not just phylogenetics. I learned from the Open Knowledge Conference (Berlin 2011) that a lot of different academic fields also seem to struggle to make re-usable published data available. If it's a common, shared problem... why not seek a shared, cross-disciplinary solution?
  • 9. AMI (Amanuensis) Building upon tools first developed in computational chemistry by the Murray-Rust lab, e.g. ChemicalTagger → PhyloTagger (entity tagging), (Chem)PubCrawler → (Phylo)PubCrawler (to get 10,000+ PDFs to work on) https://bitbucket.org/nickday/pub-crawler http://www-ucc.ch.cam.ac.uk/products/software/chemicaltagger Open Source
  • 10. BBSRC grant approved “PLUTo: Phyloinformatic Literature Unlocking Tools” Software for making published phyloinformatic data discoverable, open, and reusable ...I just need to get my PhD viva done & rubber-stamped Instructions for getting the current working setup here: (multiple repositories, dependencies & requirements!) http://rossmounce.co.uk/2013/10/06/setting-up-ami2-on-windows/
  • 11. PDF  HTML  AMI Evolution of ultraviolet vision in the largest avian radiation - the passerines. Anders Ödeen¹*, Olle Håstad²,³ and Per Alström⁴. Styles, superscripts and diåcritics preserved!
  • 12. PDF  Turdus iliacus Taeniopygia guttata Serinus canaria Lanius excubitor Melopsittacus undulatus Pavo cristatus Sturnus vulgaris Dolichonyx oryzivorus Ficedula hypoleuca Vaccinium myrtillus Falco tinnunculus Turdus Pomatostomus Leothrix Amytornis Acanthisitta Orthonyx x 2 Malurus Cnemophilus x 4 Philesturnus x 2 Motacilla x 2 Toxorhampus x 2
  • 13. Typical phylo tree: 60 nodes, complex and minuscule annotation, vertical text, hyphenation and valuable branch lengths. AMI extracts ALL
  • 14. AMI. Posterior probability: 0.84, 0.91, 0.93, 0.95. Branch lengths: 23.12, 34.54, 37.21, 38.55. NexML, HTML. Genus: Acanthisitta, Acrocephalus, Ailuroedus, Ailuroedus, Amytornis, Camptostoma. Family: Acanthisittidae, Acanthizidae, Acrocephalidae, Callaeidae, Campephagidae, Cnemophilidae, Corvidae.
  • 15. Acknowledgements & Thanks For the Panton Fellowship, inspiration and support To the organisers of both the session: Nico, Hilmar, Rutger and the conference as a whole! For travel & accommodation support, without which I couldn't possibly attend TDWG My main collaborators on PLUTo: Matthew Wills and Peter Murray-Rust
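
Slides 13 and 14 describe AMI recovering taxon labels, branch lengths and posterior probabilities from a published tree figure. To make concrete what a re-usable form of such values could look like, here is a minimal Python sketch that holds them in a small tree structure and writes them out as a Newick string. It is not AMI's code and does not produce NexML: the `Node` class, the `to_newick` function and the tree topology are hypothetical, and the genus names and numbers are borrowed from slide 14 with an arbitrary pairing, purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of a rooted tree (hypothetical structure, not AMI's)."""
    name: str = ""                    # taxon label; empty for internal nodes
    length: Optional[float] = None    # length of the branch leading to this node
    support: Optional[float] = None   # clade support, e.g. posterior probability
    children: List["Node"] = field(default_factory=list)


def to_newick(node: Node) -> str:
    """Serialize a subtree to Newick, using support as the internal node label."""
    if node.children:
        inner = ",".join(to_newick(child) for child in node.children)
        label = "" if node.support is None else f"{node.support:g}"
        text = f"({inner}){label}"
    else:
        text = node.name
    if node.length is not None:
        text += f":{node.length:g}"
    return text


# Illustrative tree only: the values come from slide 14, but which value
# belongs to which branch is an assumption made for this example.
clade = Node(support=0.95, length=23.12, children=[
    Node("Acanthisitta", length=34.54),
    Node("Amytornis", length=37.21),
])
root = Node(children=[clade, Node("Camptostoma", length=38.55)])

newick = to_newick(root) + ";"
print(newick)
```

Once the numbers live in a plain-text tree format like this rather than in a figure, they can be read by standard phylogenetics software without anyone having to re-digitize the image.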
