Sharing re-usable phylogenetic data: we're not there yet



My talk given at TDWG (Florence, Italy), 9am 31st October 2013







Usage Rights

CC Attribution License


    Presentation Transcript

    • Sharing reusable phylogenetic data: we're not there yet Ross Mounce @rmounce http://orcid.org/0000-0002-3520-2046
    • A talk of two halves. 1.) Outlining the extent of the problem: (lack of) sharing, standards, care (?). 2.) What I'm trying to do about it: Digging data out of PDFs; Re-releasing as
    • Where's the data? Just ~4% of published phylogenetic studies in 2010 publicly archived their supporting phylo data. Stoltzfus A, O'Meara B, Whitacre J, Mounce R, Gillespie E, Kumar S, Rosauer D, & Vos R. 2012. Sharing and re-use of phylogenetic trees (and associated data) to facilitate synthesis. BMC Research Notes. doi:10.1186/1756-0500-5-574. Check our data yourself on Dryad here: doi:10.5061/dryad.h6pf365t
    • Scientists cannot be relied upon to share published data upon request. This has been known for a while now, e.g. (in psychology) Wicherts et al. 2006. But it has been confirmed to be true for phylogenetics too: Drew et al. 2013, 'Lost Branches in the Tree of Life', report that just ~16% of researchers contacted supplied the requested ('published') phylo data. My own experience tallies with this: I soon stopped bothering to ask people via email for a copy of their published data. It's a waste of time.
    • The (Single) Supplementary Data File was a Y2K solution: a dump. Many legacy journal supplementary data systems bury data and leave it there to decompose. Often not re-usable in form, e.g. a lazy PDF. Sometimes 'typeset', corrupting the data. A jumble of words & data where the bit you want is on page 92 (no programmatic access). Research data: BURIED and really not very discoverable. Do reviewers even look at it? I think not, tbh.
    • I wasted too much of my PhD trying to get usable data to re-analyze. This is what I felt like... So I tried to do something about it: an open letter in support of palaeontology data archiving (www.supportpalaeodatarchiving.co.uk), which was picked up by Nature News, which in turn got me in touch with:
    • Part 2. Since few will help you to re-use their data, you've got to dig it out and make it re-usable yourself AND re-release it openly so no-one else wastes their time doing this.
    • It's not just phylogenetics. I learned from the Open Knowledge Conference (Berlin 2011) that a lot of different academic fields also seem to struggle to make re-usable published data available. If it's a common, shared problem... why not seek a shared, cross-disciplinary solution?
    • AMI (Amanuensis): building upon tools first developed in computational chemistry by the Murray-Rust lab, e.g. ChemicalTagger → PhyloTagger (entity tagging); (Chem)PubCrawler → (Phylo)PubCrawler (to get 10,000+ PDFs to work on). https://bitbucket.org/nickday/pub-crawler http://www-ucc.ch.cam.ac.uk/products/software/chemicaltagger Open Source
    • BBSRC grant approved: “PLUTo: Phyloinformatic Literature Unlocking Tools”. Software for making published phyloinformatic data discoverable, open, and reusable. ...I just need to get my PhD viva done & rubber-stamped. Instructions for getting the current working setup are here (multiple repositories, dependencies & requirements!): http://rossmounce.co.uk/2013/10/06/setting-up-ami2-on-windows/
    • PDF  HTML  AMI Evolution of ultraviolet vision in the largest avian radiation - the passerines Anders Ödeen 1* , Olle Håstad and Per Alström 4 2,3 Styles , superscripts And diåcritics preserved!
    • PDF  Turdus iliacus Taeniopygia guttata Serinus canaria Lanius excubitor Melopsittacus undulatus Pavo cristatus Sturnus vulgaris Dolichonyx oryzivorus Ficedula hypoleuca Vaccinium myrtillus Falco tinnunculus Turdus Pomatostomus Leothrix Amytornis Acanthisitta Orthonyx x 2 Malurus Cnemophilus x 4 Philesturnus x 2 Motacilla x 2 Toxorhampus x 2
    • Typical phylo tree: 60 nodes, complex and minuscule annotation, vertical text, hyphenation and valuable branch lengths. AMI extracts ALL.
    • AMI output (NexML, HTML). Posterior probability: 0.84, 0.91, 0.93, 0.95. Branch lengths: 23.12, 34.54, 37.21, 38.55. Genus: Acanthisitta, Acrocephalus, Ailuroedus, Ailuroedus, Amytornis, Camptostoma. Family: Acanthisittidae, Acanthizidae, Acrocephalidae, Callaeidae, Campephagidae, Cnemophilidae, Corvidae.
    • Acknowledgements & Thanks. For the Panton Fellowship, inspiration and support. To the organisers of both the session (Nico, Hilmar, Rutger) and the conference as a whole! For travel & accommodation support, without which I couldn't possibly attend TDWG. My main collaborators on PLUTo: Matthew Wills and Peter Murray-Rust.
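
The "entity tagging" step that the AMI slide attributes to PhyloTagger can be illustrated with a deliberately naive Python sketch. This is not PhyloTagger's or ChemicalTagger's actual method, which the slides do not describe. The function name `tag_binomials` and the lookup table `KNOWN_GENERA` are hypothetical, and the genus names are simply borrowed from the tree labels shown in the transcript. The idea is only to show what tagging taxon names in extracted text might look like: find a known genus followed by a lowercase species epithet.

```python
import re

# Hypothetical lookup table (an assumption for illustration only);
# the genus names are taken from the tree labels in the slide transcript.
KNOWN_GENERA = {"Turdus", "Taeniopygia", "Serinus", "Lanius", "Falco"}


def tag_binomials(text, genera=KNOWN_GENERA):
    """Return (genus, epithet) pairs: a known genus followed by a lowercase word."""
    # Split the text into alphabetic tokens, dropping punctuation and digits.
    tokens = re.findall(r"[A-Za-z]+", text)
    hits = []
    # Walk over adjacent token pairs and keep those that look like binomials.
    for genus, epithet in zip(tokens, tokens[1:]):
        if genus in genera and epithet.islower():
            hits.append((genus, epithet))
    return hits


sample = ("We sampled Turdus iliacus and Taeniopygia guttata; "
          "Falco tinnunculus was used for comparison.")
for genus, epithet in tag_binomials(sample):
    print(genus, epithet)
```

A dictionary lookup like this misses any genus not in the table and will wrongly tag a known genus followed by an ordinary lowercase word, so it is a starting point for discussion rather than a usable tagger.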