Open science, open-source, and open data: Collaboration as an emergent property?

Talk I gave as part of the panel "How will cyberinfrastructure capabilities shape the future of scientific collaboration?" at the Cyberinfrastructure for Collaborative Science workshop, held at the National Evolutionary Synthesis Center (NESCent), May 18-20, 2011.

More information about the workshop at
https://www.nescent.org/wg_collabsci/2011_Workshop

Published in: Education, Technology

  • Describing how the work of the summer interns got picked up by others while it was happening, based on the web presence of their evolving outputs. For example, multiple outside groups stepped in to flesh out and maintain the database of funder policies on data archiving that Nic launched. Valerie’s resource pages on data citation led to students being invited by DataCite to contribute to the metadata model behind data DOIs. The interns also posted regularly to a project blog and shared their bibliographies on Mendeley, and their end-of-summer poster was likely seen by a much larger number of people online than at the conference, based on the Twitter buzz we observed.

    Downsides: There was fear (founded or not) that having things out there representing DataONE that might be seen as shoddy, or embarrassing to publishers, etc., could hurt the organization's reputation, and it made us add more disclaimers/qualifiers to the web documentation. We discovered that there were some weaknesses with the platforms we used (e.g. OpenWetWare). This particular batch of interns didn't seem to be intimidated. And the open communication led to some efficiencies with the larger mentor group; I'm not aware of inefficiencies that were introduced. The blog may not have always been the best investment of time from a research-productivity standpoint, but was a useful learning experience for the students nonetheless.
  • “Researchers have found many reasons for making their notebook open: better collaborations, greater visibility and broader impact for funding requirements, sharing detailed methods, solving the dilemma of null results publishing, helping other scientists advance, demonstrating the viability of the idea or participation in a community of open science.”
  • Dryad provides a way to make data available not just for validation, but also for new method development, meta-analysis, and other types of synthetic science; it also gives researchers credit for their data as a first-class scholarly product.
  • An example of how valuable such archived data can be. NESCent postdoc Amy Zanne and colleagues led a group that in 2009 deposited to Dryad a dataset compiling wood anatomy data from 8412 plant species. This dataset has already been downloaded over 600 times! While some of these downloads may lead to citations, there is probably a good deal of data reuse for educational purposes, and likely also exploration of analytical methods on this unique dataset.

    The inset from the corresponding Ecology Letters article shows the geographical distribution of wood density in North and South America. Each data point is the mean wood density value of all unique species occurrences in that cell. Wood density clearly varies in a very predictable way with temperature, precipitation, and seasonality. Dryad contains the data underlying this figure, but without Dryad, researchers would be unable to reconstruct the original data from this image for testing new hypotheses.
  • Contrast this with the newly announced TRY database, a similar community compilation of trait data, where access is guarded by a formidable two-tiered process.
  • And contrast that with another model for protecting the professional needs of researchers in their published data. Many of the Dryad partner journals allow authors to embargo data for a limited period, but they must archive it publicly upon publication, and reuse cannot be controlled after the embargo has been lifted. This is consistent with the idea that the original authors have a conflict of interest in the reuse of that data after publication, and should not be empowered to suppress independent reanalysis any more than they should for the contents of the article itself. Dryad supports one-year, no-questions-asked embargoes. In this case, a published dataset from some of the TRY contributors has been embargoed for three years at the discretion of the journal editor.
  • Mailing list traffic has gone, in the case of GBrowse, from less than 50/month in 2007 to sometimes over 400/month in 2010. We know that there have been over 200 new installations of GBrowse per month, based on registrations.
  • 8 participating R packages: Ade4, Ape, apTreeshape, Geiger, Laser, OUCH, PaleoTS, Picante. Before the hackathon: 1/8 were using version control (but none public), 1/8 had a mailing list.

    1. Open science, open-source, and open data: Collaboration as an emergent property? Cyberinfrastructure for Collaborative Science, a workshop at NESCent, Durham, NC, May 18-20. Hilmar Lapp and Todd Vision, NESCent
    2. “Openness is the new flower power” “We create the scholarship. We create the meta-data. We create the tools. We can reclaim and reinvent the way that scientific scholarship is created and disseminated.” Peter Murray-Rust http://blogs.ch.cam.ac.uk/pmr/2010/08/02/flowerpoint-step-by-step/
    3. E-access and interoperability are enabling factors. Sidlauskas et al (2010)
    4. What does “open” mean? “A piece of content or data is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and share-alike.”
    5. What does “open” mean? “A piece of content or data is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and share-alike.”
    6. Open Notebook Science: An experiment within the DataONE Summer Internship Program
    7. Open e-collaboration sites
    8. Dryad: archiving, finding, and sharing open data • Data archived at publication • Data given persistent GUIDs • Option for limited-term embargo • Metadata searchable • Easy submission process • API for data exchange • Data may be peer reviewed • Capacity for versioning/updates • Persistent link from paper to data • Long-term preservation • Data in public domain (CCZero) • Community governance
    9. Dryad: archiving, finding, and sharing open data • Data archived at publication • Data given persistent GUIDs • Option for limited-term embargo • Metadata searchable • Easy submission process • API for data exchange • Data may be peer reviewed • Capacity for versioning/updates • Persistent link from paper to data • Long-term preservation • Data in public domain (CCZero) • Community governance
    10. 612 downloads. Amy Zanne
    11. • Two-layered process for gaining access to data • “Individuals or groups of individuals that would like to use the TRY database for a scientific project […] should submit a proposal to TRY” • “TRY will give the proponent the list of individuals […] that will necessarily have to be […] consulted […]”
    12. Another model for controlled sharing: • Data description published • Data embargoed 1 (or 3) yrs post publication • Afterwards it is released under CCZero
    13. Open Source: The value of community. [Charts: web traffic; mailing list membership, 2008 vs. 2010, for the lists ajax, announce, architecture, cmap, devel, gbrowse, schema.] Example: Generic Model Organism Database
    14. Community participation in Phenoscape • Anatomy Ontology Contributors: 25 • Taxonomy Ontology Contributors: 11 • Community data annotators: 10 • Workshop Participants: ~75 • Software testing volunteers: 33 • Internship students: 13 • Phenotype Ontology RCN: http://phenotypercn.org
    15. Sustaining informatics resources over the long term • 875 modules in core, >422,000 lines of code • Most widely used Perl toolkit in the life sciences • Active for 16 years • No direct grant funding • Continues to recruit new contributors • Leadership baton passed 5 times, dozens of committers • Stajich et al (2002) Genome Biology: cited 509x
    16. Open development: Social coding
    17. Adopting open development influenced by incentives. Comparative Methods in R Hackathon, Dec 2007
    18. Steve Kembel, Matt Helmus
    19. Summary • The principles in common are freedom to • Reuse • Modify and recombine • Redistribute • These are also fundamental to enabling synthetic collaborative science.
    20. Will openness make us better collaborators?
    21. Will openness make us better collaborators? Hum Nat (2007) 18:88–108
    22. Will openness make us better collaborators? Hum Nat (2007) 18:88–108
    23. Will openness make us better collaborators? Hum Nat (2007) 18:88–108
    24. Will openness make us better collaborators?
    25. Whenever you do a thing, act as if all the world were watching. Thomas Jefferson
    26. Acknowledgments • NESCent Directors and resident scientists • NESCent Informatics team: R. Scherle, J. Balhoff, C. Kothari, D. Clements, X. Liu, V. Gapeyev, J. Auman • Dryad team, UNC/MRC collaborators, including Heather Piwowar • Phenoscape team (PIs P. Mabee, M. Westerfield, T. Vision), curators, and other collaborators • EvoInfo Working Group participants • Participants and co-organizers of 5 hackathons • Google OSPO, Google Summer of Code students & mentors • The O|B|F community and fellow Bio* developers • Comp. Phyloinformatics Summer Course students & instructors