'Open' as a strategy for durability, reproducibility, and scalability (BOSC 2014)


About the 'Open' in the Open Tree of Life project.


'Open' as a strategy for durability, reproducibility, and scalability (BOSC 2014) Presentation Transcript

  • 1. 'Open' as a strategy for durability, reproducibility, and scalability Jonathan A. Rees National Evolutionary Synthesis Center BOSC 12 July 2014
  • 2. Outline ● The challenge ● Open Tree of Life ● Threats ● 'Open' as defense ● General remarks
  • 3. ● The challenge
  • 4. (c) Ning Wang et al., CC-BY
  • 5. Problems ● Information about tree of life is hard to find ● Technical incompatibilities make integration hard (e.g. names / identifiers) ● Scientific differences are rampant but hard to see ● No single view on what is known
  • 6. ● Open Tree of Life
  • 7. Stephen Smith
  • 8. ● Name gathering (taxonomy) ● Study upload ● Curation interface (OTU matching etc.) ● Deposit to study repository ● Synthesis ● Access (browser & API)
  • 9. But... ● Trees are hard to obtain (Drew et al. 2013) – ~4% relatively easily available (Treebase, Dryad) – Another ~12% available in response to email – Others available only as JPEGs (cf. Mounce) ● Lots of manual labor (curation) – Obtaining trees – Ingroup and outgroup – OTU matching
  • 10. ● Threats
  • 11. Threats (1) ● Under- and non-funding ● Software and/or data could be lost or captured – volunteers may not want to invest if they think work might be lost or captured ● Software could become brittle and unimprovable – volunteers may not want to invest if they think the data is 'captured' by the software
  • 12. Threats (2) ● Scientific decision making could be opaque – volunteers could turn away if system doesn't make sense or if its claims are hard to confirm
  • 13. ● 'Open' as defense
  • 14. Legal 'open' ● Hybrid CC0 + 'facts are free' (open data) – aids persistence (link to Dryad) – makes corpus more valuable ● Free software (including marked Javascript) – aids persistence – encourages experiments ● Open access publications (CC-BY)
  • 15. Technical 'open' ● Data as JSON – 'open' to larger part of ecosystem ● Trees as NeXML (in JSON syntax) ● Data on github ('phylesystem' repo + index) – what you see is everything – not locked up in a database – helps assure community that data won't be lost ● Scripting
  • 16. Process 'open' ● Open process ('open science') – public issue trackers – public discussion groups – routing user feedback to github issue tracker ● Reproducibility is another kind of 'open' – hard to reproduce → not so open – links to source material – scripting
  • 17. ● General remarks
  • 18. life = free software? ● copy ● interpret ● disseminate ● compete ● change ● merge (Image: "Three cell growth types". Licensed under Creative Commons Attribution-Share Alike 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Three_cell_growth_types.png)
  • 19. ● I hereby acknowledge technical debt
  • 20. ● tree.opentreeoflife.org ● opentreeoflife.org ● github.com/opentreeoflife ● twitter.com/opentreeoflife ● freenode #opentreeoflife ● opentreeoflife google group ● opentreeoflife-software google group
  • 21. Software: Jim Allman, Joseph Brown, Karen Cranston, Cody Hinchliff, Mark Holder, Jonathan Leto, Emily McTavish, Peter Midford, Rick Ree, Jonathan Rees, Stephen Smith. PIs: Gordon Burleigh, Keith Crandall, Karen Cranston, Karl Gude, David Hibbett, Mark Holder, Laura Katz, Rick Ree, Stephen Smith, Doug Soltis, Tiffani Williams. Funding: NSF. Curators: Joseph Brown, Bryan Drew, Romina Gazis, Jessica Grant, Cody Hinchliff, Dail Laughinghouse, Chris Owen.
  • 22. © 2014 Jonathan A. Rees / CC-BY 4.0